Downloading MGnify Study data

The MGnify API provides access to study and analysis datasets for download. On this page we demonstrate how to:

  • Discover what datasets are available

  • Download the datasets

  • Stream or read in the datasets


# Uncomment the line below when running on Google Colab
#!pip install mgnipy

🎯 The Goal: Retrieve taxonomic datasets of tomato rhizosphere studies

Let’s request tomato rhizosphere datasets and metadata from the MGnify API.

Recall the typical workflow (from "What is MGni.Py?"):

  1. Start up a mgnipy.MGnipy client with your desired configuration

  2. Search in MGnify resources using a MGnifier glass

  3. Receive a MGazine of MGnify datasets

We will follow these three steps in this notebook.

1. and 2. mgnipy.MGnipy().studies

The cell below takes care of:

  • ✅ 1. setting up our MGnipy instance

  • ✅ 2.a) preparing a search for tomato studies using the studies-specific MGnifier, mgnipy.V2.proxies.studies.Studies

  • ✅ 2.b) populating the list of studies

  • ✅ 2.c) retrieving the details (i.e., the full StudyDetails) for every study in the list

from mgnipy import MGnipy

# 1. init with default config
MG = MGnipy(
    #cache_dir="downloads"
)

# 2.a) set up the studies MGnifier (build the query)
tomato_studies = MG.studies(
    biome_lineage="root:Host-associated:Plants:Rhizosphere", search="tomato"
)

# 2.b) execute the list query (get the study list)
tomato_studies.bulk_fetch()

# 2.c) execute all detail queries (get the details for every study in the list)
tomato_studies.enrich_details()

# take a look at the study details as a pandas DataFrame
tomato_studies.to_df(expand_nested_dicts=True)


accession ena_accessions title updated_at biome__biome_name biome__lineage
0 MGYS00010296 [ERP166137, PRJEB82448] Microbiome-mediated tolerance of wild tomato t... 2026-05-28T15:46:48.615000+00:00 Rhizosphere root:Host-associated:Plants:Rhizosphere
1 MGYS00010297 [SRP333165, PRJNA755742] Tomato rhizosphere microbiome in the pot exper... 2026-05-28T15:46:48.649000+00:00 Rhizosphere root:Host-associated:Plants:Rhizosphere
2 MGYS00006231 [ERP139927, PRJEB55060] EMG produced TPA metagenomics assembly of PRJN... 2026-05-28T15:46:58.486000+00:00 Rhizosphere root:Host-associated:Plants:Rhizosphere
3 MGYS00006204 [ERP140102, PRJEB55219] EMG produced TPA metagenomics assembly of PRJN... 2026-05-28T15:47:01.715000+00:00 Rhizosphere root:Host-associated:Plants:Rhizosphere
4 MGYS00006205 [ERP140107, PRJEB55224] EMG produced TPA metagenomics assembly of PRJN... 2026-05-28T15:47:01.734000+00:00 Rhizosphere root:Host-associated:Plants:Rhizosphere
5 MGYS00006208 [ERP140115, PRJEB55232] EMG produced TPA metagenomics assembly of PRJN... 2026-05-28T15:47:01.747000+00:00 Rhizosphere root:Host-associated:Plants:Rhizosphere
6 MGYS00006230 [ERP139923, PRJEB55057] EMG produced TPA metagenomics assembly of PRJN... 2026-05-28T15:47:01.755000+00:00 Rhizosphere root:Host-associated:Plants:Rhizosphere
7 MGYS00010257 [SRP456588, PRJNA1004080] Combined effect of microplastics and fungicide... 2026-05-28T15:46:48.538000+00:00 Soil root:Host-associated:Plants:Rhizosphere:Soil
8 MGYS00010264 [PRJEB91717] Metagenome assembly of PRJNA1127303 data set (... 2026-05-28T15:46:48.743000+00:00 Soil root:Host-associated:Plants:Rhizosphere:Soil
9 MGYS00010251 [SRP338795, PRJNA766489] Culture-independent analysis of rhizosphere mi... 2026-05-28T15:46:48.756000+00:00 Soil root:Host-associated:Plants:Rhizosphere:Soil
10 MGYS00010245 [PRJEB82447, ERP166136] Unveiling diversity and adaptations of the wil... 2026-05-28T15:46:48.558000+00:00 Soil root:Host-associated:Plants:Rhizosphere:Soil
11 MGYS00010324 [PRJEB95772] Metagenome assembly of PRJNA777724 data set (T... 2026-05-28T15:46:48.576000+00:00 Soil root:Host-associated:Plants:Rhizosphere:Soil
12 MGYS00010258 [SRP517058, PRJNA1127303] Metagenomic data of rhizosphere soil during to... 2026-05-28T15:46:48.578000+00:00 Soil root:Host-associated:Plants:Rhizosphere:Soil
13 MGYS00010298 [PRJNA777724, SRP344777] Tomato heritable microbiome 2026-05-28T15:46:48.670000+00:00 Soil root:Host-associated:Plants:Rhizosphere:Soil
14 MGYS00010262 [PRJEB91684] Metagenome assembly of PRJNA1004080 data set (... 2026-05-28T15:46:48.688000+00:00 Soil root:Host-associated:Plants:Rhizosphere:Soil
15 MGYS00010250 [PRJNA755741, SRP333163] Tomato rhizosphere microbiome in the belowgrou... 2026-05-28T15:46:48.730000+00:00 Soil root:Host-associated:Plants:Rhizosphere:Soil
16 MGYS00010253 [PRJNA789467, SRP351203] Disentangling the genetic basis of rhizosphere... 2026-05-28T15:46:48.818000+00:00 Soil root:Host-associated:Plants:Rhizosphere:Soil
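
Note that the result above includes studies whose lineage is root:Host-associated:Plants:Rhizosphere:Soil, so the biome_lineage filter appears to match child biomes as well as the exact lineage requested. The sketch below is not mgnipy code. It is a plain-Python illustration of that kind of hierarchical matching, and `is_within_lineage` is a hypothetical helper name introduced only for this example.

```python
def is_within_lineage(lineage: str, ancestor: str) -> bool:
    """Return True if `lineage` equals `ancestor` or is one of its descendants."""
    parts = lineage.split(":")
    ancestor_parts = ancestor.split(":")
    # A descendant shares every level of the ancestor, in the same order.
    # Comparing whole levels avoids false matches on partial names.
    return parts[: len(ancestor_parts)] == ancestor_parts


query = "root:Host-associated:Plants:Rhizosphere"
lineages = [
    "root:Host-associated:Plants:Rhizosphere",
    "root:Host-associated:Plants:Rhizosphere:Soil",
    "root:Environmental:Terrestrial:Soil",
]

matches = [lin for lin in lineages if is_within_lineage(lin, query)]
print(matches)
```

Splitting on ":" rather than using a simple string prefix check keeps a lineage level such as "RhizosphereX" from being treated as a match for "Rhizosphere".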

3. Accessing the MGazine of datasets

  • Study details come with a mgnipy.MGazine, which allows us to download and interact with the study-level datasets output by MGnify.

  • We can use mgnipy.MGazine to download the datasets to disk or read them into our notebook.

  • To access the MGazine of the studies, use .datasets

  • The str representation of the MGazine gives us a peek at the pipeline versions within, the number of downloads, and the short_description categories.

# access study mgazine
MZ = tomato_studies.datasets

# print for more info
print(MZ)

# view the full list of downloads as a DataFrame
MZ.downloads_df()


MGazine containing:
- MGnify pipeline versions: ['v5']
- Number of downloads: 35
- Short descriptions: ['Complete GO annotation',
 'GO slim annotation',
 'InterPro matches',
 'Phylum level taxonomies LSU',
 'Phylum level taxonomies SSU',
 'Taxonomic assignments LSU',
 'Taxonomic assignments SSU']
file_type download_type short_description long_description alias download_group file_size_bytes index_files url accession pipeline_version
0 tsv Taxonomic analysis Phylum level taxonomies SSU Phylum level taxonomies SSU (TSV) ERP139927_phylum_taxonomy_abundances_SSU_v5.0.tsv study_summary.v5.0.taxonomic_analysis_ssu_rrna None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006231 v5
1 tsv Taxonomic analysis Taxonomic assignments SSU Taxonomic assignments SSU (TSV) ERP139927_taxonomy_abundances_SSU_v5.0.tsv study_summary.v5.0.taxonomic_analysis_ssu_rrna None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006231 v5
2 tsv Taxonomic analysis Phylum level taxonomies LSU Phylum level taxonomies LSU (TSV) ERP139927_phylum_taxonomy_abundances_LSU_v5.0.tsv study_summary.v5.0.taxonomic_analysis_lsu_rrna None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006231 v5
3 tsv Taxonomic analysis Taxonomic assignments LSU Taxonomic assignments LSU (TSV) ERP139927_taxonomy_abundances_LSU_v5.0.tsv study_summary.v5.0.taxonomic_analysis_lsu_rrna None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006231 v5
4 tsv Functional analysis InterPro matches InterPro matches (TSV) ERP139927_IPR_abundances_v5.0.tsv study_summary.v5.0.functional_analysis None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006231 v5
5 tsv Functional analysis GO slim annotation GO slim annotation ERP139927_GO-slim_abundances_v5.0.tsv study_summary.v5.0.functional_analysis None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006231 v5
6 tsv Functional analysis Complete GO annotation Complete GO annotation ERP139927_GO_abundances_v5.0.tsv study_summary.v5.0.functional_analysis None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006231 v5
7 tsv Taxonomic analysis Phylum level taxonomies SSU Phylum level taxonomies SSU (TSV) ERP140102_phylum_taxonomy_abundances_SSU_v5.0.tsv study_summary.v5.0.taxonomic_analysis_ssu_rrna None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006204 v5
8 tsv Taxonomic analysis Taxonomic assignments SSU Taxonomic assignments SSU (TSV) ERP140102_taxonomy_abundances_SSU_v5.0.tsv study_summary.v5.0.taxonomic_analysis_ssu_rrna None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006204 v5
9 tsv Taxonomic analysis Phylum level taxonomies LSU Phylum level taxonomies LSU (TSV) ERP140102_phylum_taxonomy_abundances_LSU_v5.0.tsv study_summary.v5.0.taxonomic_analysis_lsu_rrna None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006204 v5
10 tsv Taxonomic analysis Taxonomic assignments LSU Taxonomic assignments LSU (TSV) ERP140102_taxonomy_abundances_LSU_v5.0.tsv study_summary.v5.0.taxonomic_analysis_lsu_rrna None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006204 v5
11 tsv Functional analysis InterPro matches InterPro matches (TSV) ERP140102_IPR_abundances_v5.0.tsv study_summary.v5.0.functional_analysis None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006204 v5
12 tsv Functional analysis GO slim annotation GO slim annotation ERP140102_GO-slim_abundances_v5.0.tsv study_summary.v5.0.functional_analysis None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006204 v5
13 tsv Functional analysis Complete GO annotation Complete GO annotation ERP140102_GO_abundances_v5.0.tsv study_summary.v5.0.functional_analysis None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006204 v5
14 tsv Taxonomic analysis Phylum level taxonomies SSU Phylum level taxonomies SSU (TSV) ERP140107_phylum_taxonomy_abundances_SSU_v5.0.tsv study_summary.v5.0.taxonomic_analysis_ssu_rrna None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006205 v5
15 tsv Taxonomic analysis Taxonomic assignments SSU Taxonomic assignments SSU (TSV) ERP140107_taxonomy_abundances_SSU_v5.0.tsv study_summary.v5.0.taxonomic_analysis_ssu_rrna None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006205 v5
16 tsv Taxonomic analysis Phylum level taxonomies LSU Phylum level taxonomies LSU (TSV) ERP140107_phylum_taxonomy_abundances_LSU_v5.0.tsv study_summary.v5.0.taxonomic_analysis_lsu_rrna None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006205 v5
17 tsv Taxonomic analysis Taxonomic assignments LSU Taxonomic assignments LSU (TSV) ERP140107_taxonomy_abundances_LSU_v5.0.tsv study_summary.v5.0.taxonomic_analysis_lsu_rrna None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006205 v5
18 tsv Functional analysis InterPro matches InterPro matches (TSV) ERP140107_IPR_abundances_v5.0.tsv study_summary.v5.0.functional_analysis None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006205 v5
19 tsv Functional analysis GO slim annotation GO slim annotation ERP140107_GO-slim_abundances_v5.0.tsv study_summary.v5.0.functional_analysis None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006205 v5
20 tsv Functional analysis Complete GO annotation Complete GO annotation ERP140107_GO_abundances_v5.0.tsv study_summary.v5.0.functional_analysis None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006205 v5
21 tsv Taxonomic analysis Phylum level taxonomies SSU Phylum level taxonomies SSU (TSV) ERP140115_phylum_taxonomy_abundances_SSU_v5.0.tsv study_summary.v5.0.taxonomic_analysis_ssu_rrna None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006208 v5
22 tsv Taxonomic analysis Taxonomic assignments SSU Taxonomic assignments SSU (TSV) ERP140115_taxonomy_abundances_SSU_v5.0.tsv study_summary.v5.0.taxonomic_analysis_ssu_rrna None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006208 v5
23 tsv Taxonomic analysis Phylum level taxonomies LSU Phylum level taxonomies LSU (TSV) ERP140115_phylum_taxonomy_abundances_LSU_v5.0.tsv study_summary.v5.0.taxonomic_analysis_lsu_rrna None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006208 v5
24 tsv Taxonomic analysis Taxonomic assignments LSU Taxonomic assignments LSU (TSV) ERP140115_taxonomy_abundances_LSU_v5.0.tsv study_summary.v5.0.taxonomic_analysis_lsu_rrna None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006208 v5
25 tsv Functional analysis InterPro matches InterPro matches (TSV) ERP140115_IPR_abundances_v5.0.tsv study_summary.v5.0.functional_analysis None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006208 v5
26 tsv Functional analysis GO slim annotation GO slim annotation ERP140115_GO-slim_abundances_v5.0.tsv study_summary.v5.0.functional_analysis None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006208 v5
27 tsv Functional analysis Complete GO annotation Complete GO annotation ERP140115_GO_abundances_v5.0.tsv study_summary.v5.0.functional_analysis None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006208 v5
28 tsv Taxonomic analysis Phylum level taxonomies SSU Phylum level taxonomies SSU (TSV) ERP139923_phylum_taxonomy_abundances_SSU_v5.0.tsv study_summary.v5.0.taxonomic_analysis_ssu_rrna None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006230 v5
29 tsv Taxonomic analysis Taxonomic assignments SSU Taxonomic assignments SSU (TSV) ERP139923_taxonomy_abundances_SSU_v5.0.tsv study_summary.v5.0.taxonomic_analysis_ssu_rrna None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006230 v5
30 tsv Taxonomic analysis Phylum level taxonomies LSU Phylum level taxonomies LSU (TSV) ERP139923_phylum_taxonomy_abundances_LSU_v5.0.tsv study_summary.v5.0.taxonomic_analysis_lsu_rrna None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006230 v5
31 tsv Taxonomic analysis Taxonomic assignments LSU Taxonomic assignments LSU (TSV) ERP139923_taxonomy_abundances_LSU_v5.0.tsv study_summary.v5.0.taxonomic_analysis_lsu_rrna None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006230 v5
32 tsv Functional analysis InterPro matches InterPro matches (TSV) ERP139923_IPR_abundances_v5.0.tsv study_summary.v5.0.functional_analysis None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006230 v5
33 tsv Functional analysis GO slim annotation GO slim annotation ERP139923_GO-slim_abundances_v5.0.tsv study_summary.v5.0.functional_analysis None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006230 v5
34 tsv Functional analysis Complete GO annotation Complete GO annotation ERP139923_GO_abundances_v5.0.tsv study_summary.v5.0.functional_analysis None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006230 v5
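
The 35 downloads listed above come from five studies with seven files each. A quick way to confirm that kind of breakdown is to group the records by accession and by short description. The sketch below rebuilds a stripped-down copy of the table by hand (only the accession and short_description columns, copied from the output above) so that it runs without network access; it does not call mgnipy.

```python
from collections import Counter
from itertools import product

# Accessions and short descriptions copied from the downloads table above.
accessions = [
    "MGYS00006231",
    "MGYS00006204",
    "MGYS00006205",
    "MGYS00006208",
    "MGYS00006230",
]
short_descriptions = [
    "Phylum level taxonomies SSU",
    "Taxonomic assignments SSU",
    "Phylum level taxonomies LSU",
    "Taxonomic assignments LSU",
    "InterPro matches",
    "GO slim annotation",
    "Complete GO annotation",
]

# One record per (study, file) pair, mirroring the rows of the table.
records = [
    {"accession": acc, "short_description": desc}
    for acc, desc in product(accessions, short_descriptions)
]

per_study = Counter(r["accession"] for r in records)
per_description = Counter(r["short_description"] for r in records)

print(len(records))
print(per_study)
print(per_description)
```

Assuming downloads_df() returns a pandas DataFrame, the same check on the live table would be a groupby on the accession column followed by size().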

You can filter by short description by passing it in square brackets, as you would an index (i.e., via __getitem__):

# we want the taxonomic assignments
ssu = MZ["Taxonomic assignments SSU"]

# checking out what it is
print(type(ssu))
print(ssu)

# downloads_df again
ssu.downloads_df()


MGazine Curation TaxaMGazine containing:
- MGnify pipeline versions: ['v5']
- Number of downloads: 5
- Short descriptions: ['Taxonomic assignments SSU']
-----------------------
Next steps: Use `.load()` to initialize.

<class 'mgnipy.V2.datasets.taxonomic.TaxaMGazine'>
MGazine Curation TaxaMGazine containing:
- MGnify pipeline versions: ['v5']
- Number of downloads: 5
- Short descriptions: ['Taxonomic assignments SSU']
file_type download_type short_description long_description alias download_group file_size_bytes index_files url accession pipeline_version
0 tsv Taxonomic analysis Taxonomic assignments SSU Taxonomic assignments SSU (TSV) ERP139927_taxonomy_abundances_SSU_v5.0.tsv study_summary.v5.0.taxonomic_analysis_ssu_rrna None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006231 v5
1 tsv Taxonomic analysis Taxonomic assignments SSU Taxonomic assignments SSU (TSV) ERP140102_taxonomy_abundances_SSU_v5.0.tsv study_summary.v5.0.taxonomic_analysis_ssu_rrna None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006204 v5
2 tsv Taxonomic analysis Taxonomic assignments SSU Taxonomic assignments SSU (TSV) ERP140107_taxonomy_abundances_SSU_v5.0.tsv study_summary.v5.0.taxonomic_analysis_ssu_rrna None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006205 v5
3 tsv Taxonomic analysis Taxonomic assignments SSU Taxonomic assignments SSU (TSV) ERP140115_taxonomy_abundances_SSU_v5.0.tsv study_summary.v5.0.taxonomic_analysis_ssu_rrna None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006208 v5
4 tsv Taxonomic analysis Taxonomic assignments SSU Taxonomic assignments SSU (TSV) ERP139923_taxonomy_abundances_SSU_v5.0.tsv study_summary.v5.0.taxonomic_analysis_ssu_rrna None None https://ftp.ebi.ac.uk/pub/databases/metagenomi... MGYS00006230 v5

  • For more options for filtering MGazines, see the MGazine information page.

  • The information page also covers how to download the files.
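
The square-bracket filtering shown above can be pictured with a small stand-in class. This is a minimal sketch of the general pattern, not mgnipy's implementation: `Download` and `Catalog` are hypothetical names, and unlike the real MGazine (which returned a TaxaMGazine above) this toy version always returns the same class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Download:
    accession: str
    short_description: str


class Catalog:
    """Hypothetical stand-in for a collection of downloadable files."""

    def __init__(self, downloads):
        self.downloads = list(downloads)

    def __getitem__(self, short_description: str) -> "Catalog":
        # Return a new, smaller catalog instead of mutating this one.
        kept = [d for d in self.downloads if d.short_description == short_description]
        if not kept:
            raise KeyError(short_description)
        return Catalog(kept)

    def __len__(self) -> int:
        return len(self.downloads)


catalog = Catalog(
    [
        Download("MGYS00006231", "Taxonomic assignments SSU"),
        Download("MGYS00006231", "InterPro matches"),
        Download("MGYS00006204", "Taxonomic assignments SSU"),
    ]
)

ssu_only = catalog["Taxonomic assignments SSU"]
print(len(catalog), len(ssu_only))
```

Returning a new object keeps the original collection intact, which matches how MZ remains usable after MZ["Taxonomic assignments SSU"] is taken in the cell above.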

Given our goal, we will carry on with the filtered TaxaMGazine. It is one of the analysis-type-specific MGazines.

For example, it can combine the taxonomic assignment results from all studies into a single table via .to_pandas(), .to_polars(), or .X()

# load the datasets first
ssu.load()

ssu.to_pandas().head()


TaxaMGazine loaded with 5 datasets. 
Cached runs results: 0 of total 149.
taxonomy ERZ12343720 ERZ12343730 ERZ12343740 ERZ12343750 ERZ12343760 ERZ12343770 ERZ12343780 ERZ12343721 ERZ12343731 ... ERZ12633581 ERZ12633572 ERZ12633582 ERZ12633573 ERZ12633574 ERZ12633594 ERZ12633576 ERZ12633579 ERZ12590661 ERZ12590669
0 sk__Archaea 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... NaN NaN NaN NaN NaN NaN NaN NaN 0.0 2.0
1 sk__Archaea;k__;p__Crenarchaeota 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 sk__Archaea;k__;p__Crenarchaeota;c__Thermoprot... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 sk__Archaea;k__;p__Thaumarchaeota;c__Nitrososp... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 sk__Archaea;k__;p__Thaumarchaeota;c__Nitrososp... 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0 1.0 1.0 2.0 3.0 3.0

5 rows × 150 columns
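
The NaN values in the combined table are consistent with an outer join on the taxonomy column: a taxon that is missing from one study's file has no value for that study's runs. Whether mgnipy combines the files exactly this way is an assumption. The sketch below shows the idea in plain Python on two made-up miniature tables; the run names and counts are invented for illustration, and `outer_join` is a hypothetical helper.

```python
import math

# Made-up miniature tables: taxonomy -> {run accession -> count}
study_a = {
    "sk__Archaea": {"RUN_A1": 0.0, "RUN_A2": 1.0},
    "sk__Bacteria": {"RUN_A1": 5.0, "RUN_A2": 3.0},
}
study_b = {
    "sk__Archaea": {"RUN_B1": 2.0},
    "sk__Eukaryota": {"RUN_B1": 4.0},
}


def outer_join(*tables):
    """Combine tables on taxonomy, filling runs that lack a taxon with NaN."""
    taxa = sorted(set().union(*tables))
    combined = {taxon: {} for taxon in taxa}
    for table in tables:
        # Every run seen in this table becomes a column of the result.
        runs = sorted({run for row in table.values() for run in row})
        for taxon in taxa:
            row = table.get(taxon, {})
            for run in runs:
                combined[taxon][run] = row.get(run, math.nan)
    return combined


merged = outer_join(study_a, study_b)
for taxon, row in merged.items():
    print(taxon, row)
```

With pandas, the equivalent operation on two DataFrames is a merge on "taxonomy" with how="outer". When analysing the combined table, keep in mind that NaN (taxon not reported for that study) is not the same as 0.0 (taxon reported with zero count), so filling NaN with zeros is a modelling decision rather than a neutral cleanup.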

There is also the option to enrich the data with additional metadata:

  1. From already retrieved MGnifier results, you can set runs_results, samples_results, studies_results, etc., or

  2. use .enrich_runs(), .enrich_biosamples, etc., which make the GET requests for the additional metadata.

ssu.enrich_runs(
    limit=200,  # default
)


ssu.to_anndata()
AnnData object with n_obs × n_vars = 558 × 149
    obs: 'Superkingdom', 'Kingdom', 'Phylum', 'Class', 'Order', 'Family', 'Genus', 'Species'
    var: 'updated_at', 'run_accession', 'sample_accession', 'reads_study_accession', 'assembly_study_accession', 'assembler_name', 'assembler_version', 'status'
ssu.clear_cache()
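
The obs columns in the AnnData output above are taxonomic ranks, which suggests that each lineage string (for example sk__Archaea;k__;p__Crenarchaeota) is split on its rank prefixes. The sketch below illustrates one way to do that in plain Python. It is not mgnipy's code: `split_lineage` is a hypothetical helper, and the prefixes o__, f__, g__ and s__ are assumed from the common convention, since only sk__, k__, p__ and c__ are visible in the output on this page.

```python
# Assumed mapping from lineage prefixes to the rank names shown in `obs`.
RANK_PREFIXES = {
    "sk": "Superkingdom",
    "k": "Kingdom",
    "p": "Phylum",
    "c": "Class",
    "o": "Order",
    "f": "Family",
    "g": "Genus",
    "s": "Species",
}


def split_lineage(lineage: str) -> dict:
    """Split 'sk__X;k__;p__Y' into a rank -> name mapping (None if unassigned)."""
    ranks = {name: None for name in RANK_PREFIXES.values()}
    for token in lineage.split(";"):
        prefix, _, name = token.partition("__")
        # Empty names such as 'k__' mean the rank is unassigned.
        if prefix in RANK_PREFIXES and name:
            ranks[RANK_PREFIXES[prefix]] = name
    return ranks


ranks = split_lineage("sk__Archaea;k__;p__Crenarchaeota")
print(ranks)
```

Ranks that are empty or absent in the lineage string stay as None, so every taxon ends up with the same set of eight columns.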

Wrap Up

This page was a quick-start demonstration of how to:

  1. ✅ Start up a mgnipy.MGnipy client with your desired configuration

  2. ✅ Search in MGnify resources using a MGnifier glass

  3. ✅ Receive a MGazine of MGnify datasets