🗃 Find all MGnify Analyses for a given Study#

On this page, we show how to navigate MGnify resources starting from a study and moving to the analyses associated with it using MGni.py.


Introduction#

A common pattern when retrieving data from MGnify is to begin with one biological entity, inspect its details, explore the relationships it exposes, and then traverse those relationships to reach related records.

With MGni.py you can use proxies and their links rather than manually constructing API requests and preprocessing the responses.
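For contrast, this is roughly what building the requests by hand looks like. It is a minimal sketch using only the standard library: the helper names `study_url` and `study_analyses_url` are made up for illustration, and the URL layout is copied from the example request URLs printed further down this page. No request is actually sent.

```python
from urllib.parse import urlencode, urljoin

# Base of the MGnify API v2, as it appears in the example request URLs on this page
API_ROOT = "https://www.ebi.ac.uk/metagenomics/api/v2/"


def study_url(accession: str) -> str:
    """Hypothetical helper: URL of the detail record for one study."""
    return urljoin(API_ROOT, f"studies/{accession}")


def study_analyses_url(accession: str, page: int = 1) -> str:
    """Hypothetical helper: URL of one page of a study's analyses."""
    query = urlencode({"page": page})
    return f"{study_url(accession)}/analyses?{query}"


print(study_url("MGYS00010442"))
print(study_analyses_url("MGYS00010442", page=2))
```

With the proxies, this bookkeeping (paths, query strings, paging) is handled for you.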

By the end, you will know how to:

  • load a study by accession

  • fetch the study details from MGnify

  • inspect the relationships available on a study object

  • traverse from a study to its related analyses

  • retrieve and organize analysis details in a convenient tabular form

In this example we start from a single study, but the same workflow and traversal can be applied to other resources.


# Uncomment the line below if running in Colab
#!pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple mgnipy
# asyncio ships with the Python standard library, so it does not need to be installed

Starting from a Study Accession#

For this demonstration we use the study accession MGYS00010442, but you can replace it with any other public study accession to explore different data.

study_accession: str = "MGYS00010442"

Here we import and initialize MGnipy.

import asyncio  # optional, for async requests
import pandas as pd  # also not necessary, only for demo and type annotation
import mgnipy  # for type annotation

from mgnipy import MGnipy

# init
MG = MGnipy()

# access study resource
study = MG.study(study_accession)

# check it out
print(study)
MGnifier instance for resource: study
I.e., mgnipy.V2.proxies.StudyDetail
----------------------------------------
Base URL: https://www.ebi.ac.uk/
Parameters: {'accession': 'MGYS00010442'}
Endpoint module: mgnipy.emgapi_v2_client.api.studies.get_mgnify_study
Example request URL: https://www.ebi.ac.uk/metagenomics/api/v2/studies/MGYS00010442
Returns paginated results: False

Let’s go ahead and fetch the details of the given study.

await study.aget()

# check it out
study.to_list()


[{'accession': 'MGYS00010442',
  'ena_accessions': ['PRJEB37289', 'ERP120598'],
  'title': 'TKI',
  'biome': {'biome_name': 'Digestive system',
   'lineage': 'root:Host-associated:Human:Digestive system'},
  'updated_at': '2026-04-21T08:55:57.196000+00:00',
  'downloads': [{'file_type': 'tsv',
    'download_type': 'Taxonomic analysis',
    'short_description': 'Summary of DADA2-PR2 taxonomies',
    'long_description': 'Summary of DADA2-PR2 taxonomic assignments, across all runs in the study',
    'alias': 'ERP120598_DADA2-PR2_16S-V3-V4_study_summary.tsv',
    'download_group': 'study_summary.v6.amplicon',
    'file_size_bytes': None,
    'index_files': None,
    'url': 'https://ftp.ebi.ac.uk/pub/databases/metagenomics/mgnify_results/ERP120/ERP120598/study-summaries/ERP120598_DADA2-PR2_16S-V3-V4_study_summary.tsv'},
   {'file_type': 'tsv',
    'download_type': 'Taxonomic analysis',
    'short_description': 'Summary of PR2 taxonomies',
    'long_description': 'Summary of PR2 taxonomic assignments, across all runs in the study',
    'alias': 'ERP120598_PR2_study_summary.tsv',
    'download_group': 'study_summary.v6.amplicon',
    'file_size_bytes': None,
    'index_files': None,
    'url': 'https://ftp.ebi.ac.uk/pub/databases/metagenomics/mgnify_results/ERP120/ERP120598/study-summaries/ERP120598_PR2_study_summary.tsv'},
   {'file_type': 'tsv',
    'download_type': 'Taxonomic analysis',
    'short_description': 'Summary of DADA2-SILVA taxonomies',
    'long_description': 'Summary of DADA2-SILVA taxonomic assignments, across all runs in the study',
    'alias': 'ERP120598_DADA2-SILVA_16S-V3-V4_study_summary.tsv',
    'download_group': 'study_summary.v6.amplicon',
    'file_size_bytes': None,
    'index_files': None,
    'url': 'https://ftp.ebi.ac.uk/pub/databases/metagenomics/mgnify_results/ERP120/ERP120598/study-summaries/ERP120598_DADA2-SILVA_16S-V3-V4_study_summary.tsv'},
   {'file_type': 'tsv',
    'download_type': 'Taxonomic analysis',
    'short_description': 'Summary of SILVA-SSU taxonomies',
    'long_description': 'Summary of SILVA-SSU taxonomic assignments, across all runs in the study',
    'alias': 'ERP120598_SILVA-SSU_study_summary.tsv',
    'download_group': 'study_summary.v6.amplicon',
    'file_size_bytes': None,
    'index_files': None,
    'url': 'https://ftp.ebi.ac.uk/pub/databases/metagenomics/mgnify_results/ERP120/ERP120598/study-summaries/ERP120598_SILVA-SSU_study_summary.tsv'},
   {'file_type': 'csv',
    'download_type': 'Taxonomic analysis',
    'short_description': 'DwC-Ready summary of 16S-V3-V4 ASV taxonomies using -PR2 as ref DB',
    'long_description': 'DwC-Ready summary of 16S-V3-V4 ASV taxonomies using -PR2 as ref DB, across all runs in the study',
    'alias': 'ERP120598_DADA2-PR2_16S-V3-V4_dwcready.csv',
    'download_group': 'study_summary.v6.amplicon',
    'file_size_bytes': None,
    'index_files': None,
    'url': 'https://ftp.ebi.ac.uk/pub/databases/metagenomics/mgnify_results/ERP120/ERP120598/study-summaries/ERP120598_DADA2-PR2_16S-V3-V4_dwcready.csv'},
   {'file_type': 'csv',
    'download_type': 'Taxonomic analysis',
    'short_description': 'DwC-Ready summary of 16S-V3-V4 ASV taxonomies using -SILVA as ref DB',
    'long_description': 'DwC-Ready summary of 16S-V3-V4 ASV taxonomies using -SILVA as ref DB, across all runs in the study',
    'alias': 'ERP120598_DADA2-SILVA_16S-V3-V4_dwcready.csv',
    'download_group': 'study_summary.v6.amplicon',
    'file_size_bytes': None,
    'index_files': None,
    'url': 'https://ftp.ebi.ac.uk/pub/databases/metagenomics/mgnify_results/ERP120/ERP120598/study-summaries/ERP120598_DADA2-SILVA_16S-V3-V4_dwcready.csv'},
   {'file_type': 'csv',
    'download_type': 'Taxonomic analysis',
    'short_description': 'DwC-Ready summary of closed-ref taxonomies using SILVA-SSU as ref DB',
    'long_description': 'DwC-Ready summary of closed-reference taxonomies using SILVA-SSU as ref DB, across all runs in the study',
    'alias': 'ERP120598_closedref_SILVA-SSU_dwcready.csv',
    'download_group': 'study_summary.v6.amplicon',
    'file_size_bytes': None,
    'index_files': None,
    'url': 'https://ftp.ebi.ac.uk/pub/databases/metagenomics/mgnify_results/ERP120/ERP120598/study-summaries/ERP120598_closedref_SILVA-SSU_dwcready.csv'},
   {'file_type': 'csv',
    'download_type': 'Taxonomic analysis',
    'short_description': 'DwC-Ready summary of closed-ref taxonomies using PR2 as ref DB',
    'long_description': 'DwC-Ready summary of closed-reference taxonomies using PR2 as ref DB, across all runs in the study',
    'alias': 'ERP120598_closedref_PR2_dwcready.csv',
    'download_group': 'study_summary.v6.amplicon',
    'file_size_bytes': None,
    'index_files': None,
    'url': 'https://ftp.ebi.ac.uk/pub/databases/metagenomics/mgnify_results/ERP120/ERP120598/study-summaries/ERP120598_closedref_PR2_dwcready.csv'}],
  'metadata': {},
  'first_accession': 'ERP120598'}]
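Because to_list() returns plain Python dictionaries, the usual list and dict tools apply. The sketch below picks out the CSV downloads. To run without a network call it uses a hand-trimmed copy of the record above (two of the eight download entries, and only some of their keys); the variable names are ours.

```python
# Hand-trimmed copy of the study record shown above (two downloads only)
study_records = [
    {
        "accession": "MGYS00010442",
        "downloads": [
            {
                "file_type": "tsv",
                "alias": "ERP120598_PR2_study_summary.tsv",
            },
            {
                "file_type": "csv",
                "alias": "ERP120598_closedref_PR2_dwcready.csv",
            },
        ],
    }
]

# A detail endpoint yields a single record, wrapped in a list
record = study_records[0]

# Keep only the aliases of downloads with the requested file type
csv_aliases = [
    download["alias"]
    for download in record["downloads"]
    if download["file_type"] == "csv"
]
print(csv_aliases)
```

The same filter applied to the full record would also return the other DwC-Ready CSV files listed above.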

What relationships exist with our study?#

From the study detail we can list its analyses. We can check which other relationships are supported using .list_relationships().

study.list_relationships()
['samples', 'analyses', 'publications']

Listing the MGnify Analyses of the study#

Back to finding the analyses. We can access the list of associated MGnify Analyses via the .analyses attribute.

Accessing the list endpoint only builds the queries lazily; no request is sent yet.

We can preview the requests that would be made by get() or aget() using the .explain() helper method.

# traverse to analyses for the given study detail
study_analyses_list = study.analyses
print(type(study_analyses_list))

# more info about the endpoint
study_analyses_list.describe_endpoint()

# preview how many analyses to retrieve
study_analyses_list.explain()
<class 'mgnipy.V2.proxies.Analyses'>
List MGnify Analyses associated with this Study

MGnify analyses correspond to an individual Run or Assembly within this study,analysed by a MGnify
Pipelione.

Supported parameters:
- accession: (str)
- page: (int | Unset) Default: 1.
- page_size: (int | None | Unset)
Planning the API call with params:
{'accession': 'MGYS00010442'}
Total pages to retrieve: 3
Total records to retrieve: 63
https://www.ebi.ac.uk/metagenomics/api/v2/studies/MGYS00010442/analyses?page=1
https://www.ebi.ac.uk/metagenomics/api/v2/studies/MGYS00010442/analyses?page=2
https://www.ebi.ac.uk/metagenomics/api/v2/studies/MGYS00010442/analyses?page=3
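The page count in the plan follows from the record count and the page size. As a rough check, assuming a page size of 25 (our assumption; the output above does not show a page_size value), 63 records need ceil(63 / 25) = 3 pages. The helper name `page_urls` is hypothetical.

```python
import math

BASE = "https://www.ebi.ac.uk/metagenomics/api/v2/studies/MGYS00010442/analyses"


def page_urls(total_records: int, page_size: int) -> list[str]:
    """Hypothetical helper: one URL per page needed to cover all records."""
    n_pages = math.ceil(total_records / page_size)
    return [f"{BASE}?page={page}" for page in range(1, n_pages + 1)]


# page_size=25 is assumed, not taken from the output above
urls = page_urls(total_records=63, page_size=25)
print(len(urls))
for url in urls:
    print(url)
```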

And now if we want to execute the list queries:

# async get
await study_analyses_list.aget()

# check it out
study_analyses_list.to_df().head()
Retrieving pages: 100%|██████████| 3/3 [00:01<00:00,  2.47it/s]
experiment_type study_accession accession run sample assembly pipeline_version
0 Amplicon MGYS00010442 MGYA01021267 {'experiment_type': 'Amplicon', 'instrument_mo... {'accession': 'SAMEA8156379', 'ena_accessions'... None V6
1 Amplicon MGYS00010442 MGYA01021246 {'experiment_type': 'Amplicon', 'instrument_mo... {'accession': 'SAMEA8156386', 'ena_accessions'... None V6
2 Amplicon MGYS00010442 MGYA01021249 {'experiment_type': 'Amplicon', 'instrument_mo... {'accession': 'SAMEA8156340', 'ena_accessions'... None V6
3 Amplicon MGYS00010442 MGYA01021266 {'experiment_type': 'Amplicon', 'instrument_mo... {'accession': 'SAMEA8156355', 'ena_accessions'... None V6
4 Amplicon MGYS00010442 MGYA01021237 {'experiment_type': 'Amplicon', 'instrument_mo... {'accession': 'SAMEA8156360', 'ena_accessions'... None V6

We can get the details for each of the analyses in several ways. The easiest is to index into our study_analyses_list instance.

Note: When accessing an element by index we do not need to call get() or aget() manually. The fetch happens automatically when the element is accessed.

# to start, let's get the detail for the first analysis

# by index
first_analysis = study_analyses_list[0]
# display(first_analysis.to_df())

# or by accession
first_analysis = study_analyses_list["MGYA01021267"]
display(first_analysis.to_df())  # same as above

# more info about the analysis detail
print("Type: \n", type(first_analysis), "\n")
print("Endpoint description: ")
first_analysis.describe_endpoint()
print("\nAnalysis details: \n", first_analysis)
experiment_type study_accession accession run sample assembly pipeline_version read_run quality_control_summary downloads results_dir metadata
0 Amplicon MGYS00010442 MGYA01021267 {'experiment_type': 'Amplicon', 'instrument_mo... {'accession': 'SAMEA8156379', 'ena_accessions'... None V6 [{'experiment_type': 'Amplicon', 'instrument_m... {'sequencing': 'paired end (251 cycles + 251 c... [{'file_type': 'html', 'download_type': 'Quali... https://ftp.ebi.ac.uk/pub/databases/metagenomi... {'marker_gene_summary': {'asv': {'amplified_re...
Type: 
 <class 'mgnipy.V2.proxies.AnalysisDetail'> 

Endpoint description: 
Get MGnify analysis by accession

MGnify analyses are accessioned with an MYGA-prefixed identifier and correspond to an individual Run
or Assembly analysed by a Pipeline.

Supported parameters:
- accession: (str)

Analysis details: 
 MGnifier instance for resource: analysis
I.e., mgnipy.V2.proxies.AnalysisDetail
----------------------------------------
Base URL: https://www.ebi.ac.uk/
Parameters: {'accession': 'MGYA01021267'}
Endpoint module: mgnipy.emgapi_v2_client.api.analyses.get_mgnify_analysis
Example request URL: https://www.ebi.ac.uk/metagenomics/api/v2/analyses/MGYA01021267
Returns paginated results: False
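To make the fetch-on-access idea concrete, here is a toy model of a list proxy that accepts either a position or an accession and only builds the detail object when it is asked for. This is not how MGni.py is implemented: `LazyDetail`, `LazyList`, and the `fetched` flag are invented for this sketch, and no API call is made.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class LazyDetail:
    """Toy detail proxy: remembers its accession, 'fetches' on demand."""

    accession: str
    fetched: bool = False

    def get(self) -> "LazyDetail":
        # A real proxy would call the API here; we only flip a flag
        self.fetched = True
        return self


@dataclass
class LazyList:
    """Toy list proxy supporting lookup by position or by accession."""

    accessions: list[str]
    _cache: dict[str, LazyDetail] = field(default_factory=dict)

    def __getitem__(self, key: Union[int, str]) -> LazyDetail:
        # Integers are positions; strings are accessions
        accession = self.accessions[key] if isinstance(key, int) else key
        if accession not in self.accessions:
            raise KeyError(accession)
        if accession not in self._cache:
            # The fetch happens here, the first time the element is accessed
            self._cache[accession] = LazyDetail(accession).get()
        return self._cache[accession]


analyses = LazyList(["MGYA01021267", "MGYA01021246"])
by_index = analyses[0]
by_accession = analyses["MGYA01021267"]
print(by_index is by_accession, by_index.fetched)
```

Caching by accession means both lookups return the same object, so the record is fetched only once.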

We can also get the AnalysisDetail for each entry in our Analyses list by iterating over it, here study_analyses_list. Both sync and async iteration are supported.

Note: When iterating we do not need to call get() or aget() manually. As with indexing, the fetch happens automatically when each element is accessed.

# init a dict to store details for all analyses
analysis_detail_dfs: dict[str, pd.DataFrame] = {}

# get the details as dfs
async for analysis in study_analyses_list:
    analysis_detail_dfs[analysis.identifier] = analysis.to_df()

# concat into one df
df_analysis_details = pd.concat(analysis_detail_dfs, ignore_index=True)
# check it out
df_analysis_details.head()
experiment_type study_accession accession run sample assembly pipeline_version read_run quality_control_summary downloads results_dir metadata
0 Amplicon MGYS00010442 MGYA01021267 {'experiment_type': 'Amplicon', 'instrument_mo... {'accession': 'SAMEA8156379', 'ena_accessions'... None V6 [{'experiment_type': 'Amplicon', 'instrument_m... {'sequencing': 'paired end (251 cycles + 251 c... [{'file_type': 'html', 'download_type': 'Quali... https://ftp.ebi.ac.uk/pub/databases/metagenomi... {'marker_gene_summary': {'asv': {'amplified_re...
1 Amplicon MGYS00010442 MGYA01021246 {'experiment_type': 'Amplicon', 'instrument_mo... {'accession': 'SAMEA8156386', 'ena_accessions'... None V6 [{'experiment_type': 'Amplicon', 'instrument_m... {'sequencing': 'paired end (251 cycles + 251 c... [{'file_type': 'html', 'download_type': 'Quali... https://ftp.ebi.ac.uk/pub/databases/metagenomi... {'marker_gene_summary': {'asv': {}, 'closed_re...
2 Amplicon MGYS00010442 MGYA01021249 {'experiment_type': 'Amplicon', 'instrument_mo... {'accession': 'SAMEA8156340', 'ena_accessions'... None V6 [{'experiment_type': 'Amplicon', 'instrument_m... {'sequencing': 'paired end (251 cycles + 251 c... [{'file_type': 'html', 'download_type': 'Quali... https://ftp.ebi.ac.uk/pub/databases/metagenomi... {'marker_gene_summary': {'asv': {'amplified_re...
3 Amplicon MGYS00010442 MGYA01021266 {'experiment_type': 'Amplicon', 'instrument_mo... {'accession': 'SAMEA8156355', 'ena_accessions'... None V6 [{'experiment_type': 'Amplicon', 'instrument_m... {'sequencing': 'paired end (251 cycles + 251 c... [{'file_type': 'html', 'download_type': 'Quali... https://ftp.ebi.ac.uk/pub/databases/metagenomi... {'marker_gene_summary': {'asv': {'amplified_re...
4 Amplicon MGYS00010442 MGYA01021237 {'experiment_type': 'Amplicon', 'instrument_mo... {'accession': 'SAMEA8156360', 'ena_accessions'... None V6 [{'experiment_type': 'Amplicon', 'instrument_m... {'sequencing': 'paired end (251 cycles + 251 c... [{'file_type': 'html', 'download_type': 'Quali... https://ftp.ebi.ac.uk/pub/databases/metagenomi... {'marker_gene_summary': {'asv': {'amplified_re...
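Several columns, such as run, sample and metadata, hold nested dictionaries, which are awkward to filter or sort on. One option is to flatten them into dotted keys before building a table. The sketch below does this in plain Python; the `flatten` helper is ours, and the example row contains only keys and values that are visible in the truncated output above.

```python
def flatten(record: dict, prefix: str = "") -> dict:
    """Hypothetical helper: turn nested dicts into a flat dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse, extending the dotted prefix
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# Only the keys that are visible in the (truncated) table above
row = {
    "accession": "MGYA01021267",
    "pipeline_version": "V6",
    "run": {"experiment_type": "Amplicon"},
    "sample": {"accession": "SAMEA8156379"},
}

flat_row = flatten(row)
print(flat_row)
```

If you are already working with pandas, pd.json_normalize offers similar flattening for a list of records.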

Alternatively, MGnifyList proxies (the plural ones, e.g., studies, analyses, samples) also offer .collect_details() and .acollect_details(), which return a list or dict of the associated MGnifyDetail proxies (e.g., study, analysis).

# or if you want to keep them as objects, you can do
analysis_details: dict[str, mgnipy.V2.proxies.AnalysisDetail] = (
    await study_analyses_list.acollect_details(
        fetch=True,  # fetch now, not lazily
        by_id=True,  # dict keyed by identifier; otherwise a list
    )
)

# for example, for further relationship traversal
# annotations = analysis_details['MGYA01021267'].annotations
Processing: 100%|██████████| 63/63 [00:08<00:00,  7.28it/s]
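One generic way to collect many detail records concurrently with asyncio is to cap the number of requests in flight with a semaphore and gather the results into a dict. The sketch below shows that pattern only. `fetch_detail` is a stand-in that sleeps instead of calling the API, the limit of 5 is arbitrary, and none of this is taken from the MGni.py source.

```python
import asyncio


async def fetch_detail(accession: str) -> dict:
    """Stand-in for a real API call: waits briefly, returns a fake record."""
    await asyncio.sleep(0.01)
    return {"accession": accession}


async def collect_details(accessions: list[str], limit: int = 5) -> dict[str, dict]:
    """Fetch all records with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(accession: str) -> dict:
        async with semaphore:
            return await fetch_detail(accession)

    # gather preserves input order, so zip pairs each accession with its record
    records = await asyncio.gather(*(bounded(a) for a in accessions))
    return dict(zip(accessions, records))


details = asyncio.run(collect_details(["MGYA01021267", "MGYA01021246"]))
print(sorted(details))
```

In a notebook, where an event loop is already running, use `await collect_details(...)` instead of `asyncio.run(...)`.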

Next ⏭#

Next, we explore traversal across multiple relationships: getting all metadata from study > samples > runs > analyses.