Find all MGnify Analyses for a given Study
On this page, we show how to navigate MGnify resources starting from a study and moving to the analyses associated with it using MGni.py.
Introduction
A common pattern when retrieving data from MGnify is to begin with one biological entity, inspect its details, explore the relationships it exposes, and then traverse those relationships to reach related records.
With MGni.py, you can follow proxies and their relationship links instead of manually constructing API requests and preprocessing the responses.
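For comparison, here is roughly what the manual route starts with. This minimal sketch uses only the Python standard library and assumes the v2 URL layout that MGni.py prints for this study later on this page (…/api/v2/studies/<accession> and …/analyses?page=<n>). The helper names are hypothetical; the sketch only builds URLs, and fetch_json (which needs network access) returns the decoded JSON without assuming anything about its structure. Parsing, pagination and relationship handling would still be up to you.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: API root taken from the example request URLs printed by MGni.py
API_ROOT = "https://www.ebi.ac.uk/metagenomics/api/v2"


def study_url(accession: str) -> str:
    """Hypothetical helper: URL of a single study's detail endpoint."""
    return f"{API_ROOT}/studies/{accession}"


def study_analyses_url(accession: str, page: int = 1) -> str:
    """Hypothetical helper: URL of one page of a study's analyses list."""
    return f"{study_url(accession)}/analyses?{urlencode({'page': page})}"


def fetch_json(url: str) -> object:
    """Send a GET request and decode the JSON body (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)


print(study_url("MGYS00010442"))
print(study_analyses_url("MGYS00010442", page=2))
```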
By the end, you will know how to:
- load a study by accession
- fetch the study details from MGnify
- inspect the relationships available on a study object
- traverse from a study to its related analyses
- retrieve and organize analysis details in a convenient tabular form
In this example we start from a single study, but the same workflow and traversal apply to the other MGnify resources.
# uncomment the line below if running in Google Colab
#!pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple mgnipy
# note: asyncio is part of the Python standard library, so it does not need to be installed
Starting from a Study Accession
For this demonstration we use the study accession MGYS00010442, but you can replace it with any other public study accession to explore different data.
study_accession: str = "MGYS00010442"
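If you swap in a different accession, a quick format check can catch typos before any request is sent. The sketch below assumes MGnify study accessions follow the same pattern as the one used here, "MGYS" followed by eight digits; looks_like_study_accession is a hypothetical helper, not part of MGni.py.

```python
import re

# Assumption: study accessions look like "MGYS" + 8 digits, e.g. MGYS00010442
STUDY_ACCESSION_RE = re.compile(r"MGYS\d{8}")


def looks_like_study_accession(value: str) -> bool:
    """Hypothetical helper: True if value matches the assumed study accession format."""
    return STUDY_ACCESSION_RE.fullmatch(value.strip()) is not None


study_accession: str = "MGYS00010442"
if not looks_like_study_accession(study_accession):
    raise ValueError(f"Unexpected study accession format: {study_accession!r}")
```

A format check only rules out obvious mistakes; whether the study exists and is public is still decided by the API.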
Next, we import and initialize MGnipy:
import asyncio # optional, for async requests
import pandas as pd # also not necessary, only for demo and type annotation
import mgnipy # for type annotation
from mgnipy import MGnipy
# init
MG = MGnipy()
# access study resource
study = MG.study(study_accession)
# check it out
print(study)
MGnifier instance for resource: study
I.e., mgnipy.V2.proxies.StudyDetail
----------------------------------------
Base URL: https://www.ebi.ac.uk/
Parameters: {'accession': 'MGYS00010442'}
Endpoint module: mgnipy.emgapi_v2_client.api.studies.get_mgnify_study
Example request URL: https://www.ebi.ac.uk/metagenomics/api/v2/studies/MGYS00010442
Returns paginated results: False
Let's go ahead and fetch the study's details:
await study.aget()
# check it out
study.to_list()
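The top-level await above works because Jupyter and Colab run an event loop for you. In a plain Python script, the same coroutine calls have to be driven by asyncio.run(). The sketch below shows only that pattern: FakeStudyProxy is a hypothetical stand-in for the real study proxy so that the example runs offline.

```python
import asyncio


class FakeStudyProxy:
    """Hypothetical stand-in for a detail proxy with an async aget(); not part of MGni.py."""

    def __init__(self, accession: str) -> None:
        self.accession = accession
        self.data: dict = {}

    async def aget(self) -> None:
        await asyncio.sleep(0)  # placeholder for the real HTTP request
        self.data = {"accession": self.accession}


async def main(accession: str) -> dict:
    study = FakeStudyProxy(accession)
    await study.aget()  # same call shape as the notebook cell above
    return study.data


if __name__ == "__main__":
    # asyncio.run() starts and closes its own event loop; in Jupyter/Colab use `await main(...)` instead
    print(asyncio.run(main("MGYS00010442")))
```

With the real library, main() would create the proxy with MGnipy().study(accession) and return what you need from it, e.g. study.to_list().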
What relationships exist with our study?
From the study detail we can list its analyses. To check which other relationships are supported, use .list_relationships():
study.list_relationships()
['samples', 'analyses', 'publications']
Listing the MGnify Analyses of the study
Back to finding the analyses: the associated MGnify Analyses are available through the .analyses attribute.
Accessing this list endpoint is lazy: it builds the queries but does not send them yet.
The .explain() helper method previews the requests that get() or aget() would make.
# traverse to analyses for the given study detail
study_analyses_list = study.analyses
print(type(study_analyses_list))
# more info about the endpoint
study_analyses_list.describe_endpoint()
# preview how many analyses to retrieve
study_analyses_list.explain()
<class 'mgnipy.V2.proxies.Analyses'>
List MGnify Analyses associated with this Study
MGnify analyses correspond to an individual Run or Assembly within this study,analysed by a MGnify
Pipelione.
Supported parameters:
- accession: (str)
- page: (int | Unset) Default: 1.
- page_size: (int | None | Unset)
Planning the API call with params:
{'accession': 'MGYS00010442'}
Total pages to retrieve: 3
Total records to retrieve: 63
https://www.ebi.ac.uk/metagenomics/api/v2/studies/MGYS00010442/analyses?page=1
https://www.ebi.ac.uk/metagenomics/api/v2/studies/MGYS00010442/analyses?page=2
https://www.ebi.ac.uk/metagenomics/api/v2/studies/MGYS00010442/analyses?page=3
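The plan printed by explain() can be reproduced with simple arithmetic: the number of pages is the record count divided by the page size, rounded up, and each page becomes one URL. This is a sketch, not MGni.py's implementation; plan_pages is a hypothetical helper, and the page size of 25 is an assumption, since the output above does not show it.

```python
import math
from urllib.parse import urlencode

API_ROOT = "https://www.ebi.ac.uk/metagenomics/api/v2"


def plan_pages(accession: str, total_records: int, page_size: int = 25) -> list[str]:
    """Hypothetical helper: build (but do not send) one URL per page of analyses.

    page_size=25 is an assumed default, not a value confirmed by MGni.py.
    """
    # Round up: a final, partially filled page still needs its own request
    n_pages = math.ceil(total_records / page_size)
    base = f"{API_ROOT}/studies/{accession}/analyses"
    return [f"{base}?{urlencode({'page': page})}" for page in range(1, n_pages + 1)]


for url in plan_pages("MGYS00010442", total_records=63):
    print(url)
```

Building the URLs separately from sending them is what makes a dry-run preview like explain() cheap: nothing touches the network until get() or aget() is called.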
To execute the list queries:
# async get
await study_analyses_list.aget()
# check it out
study_analyses_list.to_df().head()
Retrieving pages: 100%|██████████| 3/3 [00:01<00:00, 2.47it/s]
| | experiment_type | study_accession | accession | run | sample | assembly | pipeline_version |
|---|---|---|---|---|---|---|---|
| 0 | Amplicon | MGYS00010442 | MGYA01021267 | {'experiment_type': 'Amplicon', 'instrument_mo... | {'accession': 'SAMEA8156379', 'ena_accessions'... | None | V6 |
| 1 | Amplicon | MGYS00010442 | MGYA01021246 | {'experiment_type': 'Amplicon', 'instrument_mo... | {'accession': 'SAMEA8156386', 'ena_accessions'... | None | V6 |
| 2 | Amplicon | MGYS00010442 | MGYA01021249 | {'experiment_type': 'Amplicon', 'instrument_mo... | {'accession': 'SAMEA8156340', 'ena_accessions'... | None | V6 |
| 3 | Amplicon | MGYS00010442 | MGYA01021266 | {'experiment_type': 'Amplicon', 'instrument_mo... | {'accession': 'SAMEA8156355', 'ena_accessions'... | None | V6 |
| 4 | Amplicon | MGYS00010442 | MGYA01021237 | {'experiment_type': 'Amplicon', 'instrument_mo... | {'accession': 'SAMEA8156360', 'ena_accessions'... | None | V6 |
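The run and sample columns hold nested dictionaries. If you want them as ordinary columns, one option is to flatten each record into dotted keys before building a table (pandas.json_normalize does the same for a list of dicts). The record below is illustrative and only contains values visible in the table above; flatten_record is a hypothetical helper.

```python
from typing import Any


def flatten_record(record: dict[str, Any], prefix: str = "", sep: str = ".") -> dict[str, Any]:
    """Hypothetical helper: flatten nested dicts into one level (lists are left as-is)."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse so that deeper levels get joined keys such as "a.b.c"
            flat.update(flatten_record(value, name, sep))
        else:
            flat[name] = value
    return flat


# Illustrative record built only from values visible in the first table row
row = {
    "accession": "MGYA01021267",
    "experiment_type": "Amplicon",
    "run": {"experiment_type": "Amplicon"},
    "sample": {"accession": "SAMEA8156379"},
}
print(flatten_record(row))
```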
There are several ways to get the details of each analysis. The easiest is to index into our study_analyses_list instance, either by position or by accession.
Note: When accessing an element by index, we do not need to call get() or aget() manually; the request is made automatically when the element is accessed.
# to start, let's get the details of the first analysis
# by index
first_analysis = study_analyses_list[0]
# display(first_analysis.to_df())
# or by accession
first_analysis = study_analyses_list["MGYA01021267"]
display(first_analysis.to_df()) # same result as above
# more info about the analysis detail
print("Type: \n", type(first_analysis), "\n")
print("Endpoint description: ")
first_analysis.describe_endpoint()
print("\nAnalysis details: \n", first_analysis)
| | experiment_type | study_accession | accession | run | sample | assembly | pipeline_version | read_run | quality_control_summary | downloads | results_dir | metadata |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Amplicon | MGYS00010442 | MGYA01021267 | {'experiment_type': 'Amplicon', 'instrument_mo... | {'accession': 'SAMEA8156379', 'ena_accessions'... | None | V6 | [{'experiment_type': 'Amplicon', 'instrument_m... | {'sequencing': 'paired end (251 cycles + 251 c... | [{'file_type': 'html', 'download_type': 'Quali... | https://ftp.ebi.ac.uk/pub/databases/metagenomi... | {'marker_gene_summary': {'asv': {'amplified_re... |
Type:
<class 'mgnipy.V2.proxies.AnalysisDetail'>
Endpoint description:
Get MGnify analysis by accession
MGnify analyses are accessioned with an MYGA-prefixed identifier and correspond to an individual Run
or Assembly analysed by a Pipeline.
Supported parameters:
- accession: (str)
Analysis details:
MGnifier instance for resource: analysis
I.e., mgnipy.V2.proxies.AnalysisDetail
----------------------------------------
Base URL: https://www.ebi.ac.uk/
Parameters: {'accession': 'MGYA01021267'}
Endpoint module: mgnipy.emgapi_v2_client.api.analyses.get_mgnify_analysis
Example request URL: https://www.ebi.ac.uk/metagenomics/api/v2/analyses/MGYA01021267
Returns paginated results: False
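The indexing behavior described above, where an element can be addressed by position or by accession and its details are fetched when it is accessed, can be modeled with Python's __getitem__. The sketch below is a hypothetical illustration of that idea, not MGni.py's code: LazyDetailList and fake_fetch are invented names, and fake_fetch returns a dict instead of calling the API.

```python
from typing import Callable, Union


class LazyDetailList:
    """Hypothetical list proxy (not MGni.py's class): stores accessions, fetches details on access."""

    def __init__(self, accessions: list[str], fetch: Callable[[str], dict]) -> None:
        self._accessions = list(accessions)
        self._fetch = fetch
        self._cache: dict = {}

    def __len__(self) -> int:
        return len(self._accessions)

    def __getitem__(self, key: Union[int, str]) -> dict:
        # An int is treated as a position, a str as an accession
        accession = self._accessions[key] if isinstance(key, int) else key
        if accession not in self._accessions:
            raise KeyError(accession)
        if accession not in self._cache:  # request the detail only on first access
            self._cache[accession] = self._fetch(accession)
        return self._cache[accession]


calls: list[str] = []


def fake_fetch(accession: str) -> dict:
    """Hypothetical fetch function; records calls instead of hitting the API."""
    calls.append(accession)
    return {"accession": accession}


analyses = LazyDetailList(["MGYA01021267", "MGYA01021246"], fake_fetch)
first = analyses[0]              # by position
same = analyses["MGYA01021267"]  # by accession, served from the cache
```

Caching is a choice made for this sketch so that repeated access does not repeat the request; this page does not show whether MGni.py caches in the same way.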
We can also get the AnalysisDetail for each entry of an Analyses list, e.g. study_analyses_list, by iterating over it (both sync and async iteration are supported).
Note: When iterating, we do not need to call get() or aget() manually. As with indexing, each request is made automatically when the element is accessed.
# init a dict to store details for all analyses
analysis_detail_dfs: dict[str, pd.DataFrame] = {}
# get the details as dfs
async for analysis in study_analyses_list:
    analysis_detail_dfs[analysis.identifier] = analysis.to_df()
# concat into one df
df_analysis_details = pd.concat(analysis_detail_dfs, ignore_index=True)
# check it out
df_analysis_details.head()
| | experiment_type | study_accession | accession | run | sample | assembly | pipeline_version | read_run | quality_control_summary | downloads | results_dir | metadata |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Amplicon | MGYS00010442 | MGYA01021267 | {'experiment_type': 'Amplicon', 'instrument_mo... | {'accession': 'SAMEA8156379', 'ena_accessions'... | None | V6 | [{'experiment_type': 'Amplicon', 'instrument_m... | {'sequencing': 'paired end (251 cycles + 251 c... | [{'file_type': 'html', 'download_type': 'Quali... | https://ftp.ebi.ac.uk/pub/databases/metagenomi... | {'marker_gene_summary': {'asv': {'amplified_re... |
| 1 | Amplicon | MGYS00010442 | MGYA01021246 | {'experiment_type': 'Amplicon', 'instrument_mo... | {'accession': 'SAMEA8156386', 'ena_accessions'... | None | V6 | [{'experiment_type': 'Amplicon', 'instrument_m... | {'sequencing': 'paired end (251 cycles + 251 c... | [{'file_type': 'html', 'download_type': 'Quali... | https://ftp.ebi.ac.uk/pub/databases/metagenomi... | {'marker_gene_summary': {'asv': {}, 'closed_re... |
| 2 | Amplicon | MGYS00010442 | MGYA01021249 | {'experiment_type': 'Amplicon', 'instrument_mo... | {'accession': 'SAMEA8156340', 'ena_accessions'... | None | V6 | [{'experiment_type': 'Amplicon', 'instrument_m... | {'sequencing': 'paired end (251 cycles + 251 c... | [{'file_type': 'html', 'download_type': 'Quali... | https://ftp.ebi.ac.uk/pub/databases/metagenomi... | {'marker_gene_summary': {'asv': {'amplified_re... |
| 3 | Amplicon | MGYS00010442 | MGYA01021266 | {'experiment_type': 'Amplicon', 'instrument_mo... | {'accession': 'SAMEA8156355', 'ena_accessions'... | None | V6 | [{'experiment_type': 'Amplicon', 'instrument_m... | {'sequencing': 'paired end (251 cycles + 251 c... | [{'file_type': 'html', 'download_type': 'Quali... | https://ftp.ebi.ac.uk/pub/databases/metagenomi... | {'marker_gene_summary': {'asv': {'amplified_re... |
| 4 | Amplicon | MGYS00010442 | MGYA01021237 | {'experiment_type': 'Amplicon', 'instrument_mo... | {'accession': 'SAMEA8156360', 'ena_accessions'... | None | V6 | [{'experiment_type': 'Amplicon', 'instrument_m... | {'sequencing': 'paired end (251 cycles + 251 c... | [{'file_type': 'html', 'download_type': 'Quali... | https://ftp.ebi.ac.uk/pub/databases/metagenomi... | {'marker_gene_summary': {'asv': {'amplified_re... |
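Iterating with async for relies on Python's asynchronous iteration protocol (__aiter__ / __anext__), which an async generator method provides. The following hypothetical sketch, not MGni.py's implementation, shows a list proxy whose async iteration fetches each detail just before yielding it; AsyncDetailList and afetch are invented names, and afetch only simulates a request.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable


class AsyncDetailList:
    """Hypothetical list proxy supporting `async for` with fetch-on-access."""

    def __init__(self, accessions: list[str], afetch: Callable[[str], Awaitable[dict]]) -> None:
        self._accessions = list(accessions)
        self._afetch = afetch

    async def __aiter__(self) -> AsyncIterator[dict]:
        # An async generator method: each detail is awaited right before it is yielded
        for accession in self._accessions:
            yield await self._afetch(accession)


async def afetch(accession: str) -> dict:
    await asyncio.sleep(0)  # placeholder for the real HTTP request
    return {"accession": accession}


async def collect(accessions: list[str]) -> dict[str, dict]:
    details: dict[str, dict] = {}
    async for detail in AsyncDetailList(accessions, afetch):
        details[detail["accession"]] = detail
    return details


# In a notebook, use `await collect([...])` instead of asyncio.run()
result = asyncio.run(collect(["MGYA01021267", "MGYA01021246"]))
print(list(result))
```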
Alternatively, MGnifyList proxies (the plural resources, e.g. studies, analyses, samples) provide .collect_details() and .acollect_details(), which return a list or dict of the associated MGnifyDetail proxies (the singular resources, e.g. study, analysis).
# or, to keep them as proxy objects:
analysis_details: dict[str, mgnipy.V2.proxies.AnalysisDetail] = (
    await study_analyses_list.acollect_details(
        fetch=True,  # fetch the details now instead of lazily
        by_id=True,  # return a dict keyed by accession instead of a list
    )
)
# e.g., for further relationship traversal
# annotations = analysis_details['MGYA01021267'].annotations
Processing: 100%|██████████| 63/63 [00:08<00:00, 7.28it/s]
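One common way to fetch many detail records concurrently with asyncio is asyncio.gather() combined with a Semaphore that caps how many requests run at once. The sketch below illustrates that generic pattern only; it is not how MGni.py's acollect_details() is implemented. collect_details_sketch, its max_concurrency parameter and fake_fetch are hypothetical, and by_id merely mirrors the dict-versus-list choice described above.

```python
import asyncio
from typing import Awaitable, Callable, Union


async def collect_details_sketch(
    accessions: list[str],
    afetch: Callable[[str], Awaitable[dict]],
    by_id: bool = True,
    max_concurrency: int = 5,
) -> Union[dict[str, dict], list[dict]]:
    """Hypothetical helper: fetch all details concurrently, at most max_concurrency at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(accession: str) -> dict:
        async with semaphore:  # waits while max_concurrency fetches are already running
            return await afetch(accession)

    # gather() returns results in the same order as the input accessions
    details = await asyncio.gather(*(fetch_one(acc) for acc in accessions))
    return dict(zip(accessions, details)) if by_id else list(details)


running = 0
peak = 0


async def fake_fetch(accession: str) -> dict:
    """Hypothetical fetch that simulates a request and tracks peak concurrency."""
    global running, peak
    running += 1
    peak = max(peak, running)
    await asyncio.sleep(0.01)  # placeholder for the real HTTP request
    running -= 1
    return {"accession": accession}


ids = ["MGYA01021267", "MGYA01021246", "MGYA01021249"]
as_dict = asyncio.run(collect_details_sketch(ids, fake_fetch, by_id=True, max_concurrency=2))
as_list = asyncio.run(collect_details_sketch(ids, fake_fetch, by_id=False, max_concurrency=2))
```

Capping concurrency keeps the number of simultaneous requests to a public API modest while still overlapping network waits.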
Next
In the next example, we traverse multiple relationships to get all metadata along study > samples > runs > analyses.