Explore MGnify Biomes#
Introduction#
The GOLD ecosystem classifications organize environmental samples into a hierarchical taxonomy of biome types, from broad categories like "Engineered" to specific environments like "Plant rhizosphere."
This demo will show you how to:
Query biomes: Discover available biome classifications and explore the hierarchy
Preview before fetching: Use filtering and preview methods to confirm your query before retrieving full results
Access results flexibly: Retrieve biome data as lists, DataFrames, or hierarchical trees
Navigate relationships: Follow links between biomes and associated studies
By the end, we hope you'll be comfortable querying the MGnify biome resource to find relevant studies.
# uncomment below if running in Colab
#!pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple mgnipy
# note: asyncio ships with the Python standard library, so no separate install is needed
Queries can be initiated using either mgnipy.MGnipy or proxies.Biomes.
The start: Preparing queries#
Option 1. mgnipy.MGnipy#
The MGnipy client offers a unified interface to access various MGnify API endpoints, including biomes. This approach is convenient if you want to manage multiple types of queries or resources through a single client object.
Instantiate MGnipy to configure your API access and manage requests.
Use .biomes to create a biome query with your desired parameters.
Use list_supported_params() to see all available filters and options.
The filter() method allows you to refine your query further.
The explain() method previews the constructed API URLs and the first few results.
The client also provides a helper function to list and describe the available resources.
from mgnipy import MGnipy
# init
mg = MGnipy(
# configuration
)
# access proxy
biomes = mg.biomes
print("Initial url: ", biomes.request_url)
Initial url: https://www.ebi.ac.uk/metagenomics/api/v2/biomes?page=1
mg.describe_resources("biomes")
List all biomes
List all biomes in the MGnify database.
Supported parameters:
- biome_lineage: (None | str | Unset) The lineage to match, including all descendant biomes
- max_depth: (int | None | Unset) Maximum depth of the biome lineage to include, e.g. `root` is 1 and `root:Host-Associated:Human` is level 3
- page: (int | Unset) Default: 1.
- page_size: (int | None | Unset)
If you would like to know which parameters an endpoint supports, there is a helper method you can use: .list_supported_params()
# if you are not sure which kwargs are supported
print("Supported kwargs for biomes: ", biomes.list_supported_params())
Supported kwargs for biomes: ['biome_lineage', 'max_depth', 'page', 'page_size']
As with describe_resources(), there is also a describe_endpoint():
biomes.describe_endpoint(as_dict=True)
{'title': 'List all biomes',
'description': 'List all biomes in the MGnify database.',
'args': {'biome_lineage': '(None | str | Unset) The lineage to match, including all descendant biomes',
'max_depth': '(int | None | Unset) Maximum depth of the biome lineage to include, e.g. `root` is 1 and `root:Host-Associated:Human` is level 3',
'page': '(int | Unset) Default: 1.',
'page_size': '(int | None | Unset)'},
'raises': 'errors.UnexpectedStatus: If the server returns an undocumented status code and Client.raise_on_unexpected_status is True.\n httpx.TimeoutException: If the request takes longer than Client.timeout.',
'returns': 'NinjaPaginationResponseSchemaBiome',
'examples': '',
'notes': '',
'schema_link': 'https://www.ebi.ac.uk/metagenomics/api/v2/openapi.json'}
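To make the `max_depth` parameter concrete: a biome's depth is the number of colon-separated segments in its lineage, so `root` is depth 1 and `root:Host-Associated:Human` is depth 3. A plain-Python sketch of this relationship (independent of mgnipy):

```python
def lineage_depth(lineage: str) -> int:
    """Depth of a biome in the hierarchy: `root` is 1, each colon adds a level."""
    return len(lineage.split(":"))

print(lineage_depth("root"))                        # 1
print(lineage_depth("root:Host-Associated:Human"))  # 3
```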
These parameters can then be passed as kwargs to .filter():
biomes = biomes.filter(
page_size=15,
max_depth=6,
)
print("Filtered url: ", biomes.request_url)
Filtered url: https://www.ebi.ac.uk/metagenomics/api/v2/biomes?max_depth=6&page=1&page_size=15
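The filtered URL is simply the base endpoint plus the query parameters in alphabetical order. A rough illustration of how such a URL could be assembled with the standard library (this is an assumption for illustration, not mgnipy's internal code):

```python
from urllib.parse import urlencode

base = "https://www.ebi.ac.uk/metagenomics/api/v2/biomes"
params = {"max_depth": 6, "page": 1, "page_size": 15}

# urlencode over alphabetically sorted keys reproduces the URL shown above
url = f"{base}?{urlencode(dict(sorted(params.items())))}"
print(url)
# https://www.ebi.ac.uk/metagenomics/api/v2/biomes?max_depth=6&page=1&page_size=15
```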
Option 2. Proxies#
The Biomes proxy provides a direct way to query biome information from the MGnify API. You can customize your query using parameters such as page_size and max_depth to control the number of results and the depth of the biome hierarchy, and you can use the same filtering and previewing methods as with the MGnipy client, such as filter(), list_supported_params(), and explain().
from mgnipy.V2.proxies import Biomes
biomes = Biomes(
page_size=50,
)
print("Init url: ", biomes.request_url)
# if you are not sure which kwargs are supported
print("Supported kwargs for biomes: ", biomes.list_supported_params())
# and then
biomes = biomes.filter(
page_size=15,
max_depth=6,
)
print("Filtered url: ", biomes.request_url)
Init url: https://www.ebi.ac.uk/metagenomics/api/v2/biomes?page=1&page_size=50
Supported kwargs for biomes: ['biome_lineage', 'max_depth', 'page', 'page_size']
Filtered url: https://www.ebi.ac.uk/metagenomics/api/v2/biomes?max_depth=6&page=1&page_size=15
Previewing your requests#
There is an optional intermediary step before fetching:
.preview() retrieves the first page of results.
.dry_run() prints the number of pages and records to request.
.explain() prints the planned request URLs before .get()ting all the result pages.
biomes.explain(head=5)
# or
# biomes.dry_run()
# or
# biomes.preview()
Planning the API call with params:
{'page_size': 15, 'max_depth': 6}
Total pages to retrieve: 33
Total records to retrieve: 492
https://www.ebi.ac.uk/metagenomics/api/v2/biomes?max_depth=6&page=1&page_size=15
https://www.ebi.ac.uk/metagenomics/api/v2/biomes?max_depth=6&page=2&page_size=15
https://www.ebi.ac.uk/metagenomics/api/v2/biomes?max_depth=6&page=3&page_size=15
https://www.ebi.ac.uk/metagenomics/api/v2/biomes?max_depth=6&page=4&page_size=15
https://www.ebi.ac.uk/metagenomics/api/v2/biomes?max_depth=6&page=5&page_size=15
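The page count reported by explain() follows directly from the record total and page_size; a quick sanity check in plain Python:

```python
import math

total_records, page_size = 492, 15

# number of pages needed to cover all records
total_pages = math.ceil(total_records / page_size)
print(total_pages)  # 33

# the final page holds whatever is left over
print(total_records - (total_pages - 1) * page_size)  # 12
```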
Carry out requests#
If you are happy with the plan, proceed with the asynchronous (.aget()) or synchronous (.get()) requests.
# asynchronously get the data
await biomes.aget()
Retrieving pages: 100%|██████████| 33/33 [00:02<00:00, 16.18it/s]
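A bare `await` works in notebooks because Jupyter already runs an event loop; in a plain Python script you would drive the coroutine with `asyncio.run()` instead. A generic sketch using a stand-in coroutine (`fetch_pages` is hypothetical, not a mgnipy function):

```python
import asyncio

async def fetch_pages(n: int) -> list[int]:
    """Stand-in for an async getter such as biomes.aget()."""
    await asyncio.sleep(0)  # yield control, as a real HTTP call would
    return list(range(1, n + 1))

# In a script (no running event loop), run the coroutine like this:
pages = asyncio.run(fetch_pages(3))
print(pages)  # [1, 2, 3]
```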
Exploring the results#
There are several ways to access the retrieved results.
as a list#
biomes.to_list()[:5]
[{'biome_name': 'Algoconsortia',
'lineage': 'root:Engineered:Lab enrichment:Defined media:Marine media:Algoconsortia'},
{'biome_name': 'Undefined media',
'lineage': 'root:Engineered:Lab enrichment:Undefined media'},
{'biome_name': 'Lab Synthesis', 'lineage': 'root:Engineered:Lab Synthesis'},
{'biome_name': 'Genetic cross',
'lineage': 'root:Engineered:Lab Synthesis:Genetic cross'},
{'biome_name': 'Modeled', 'lineage': 'root:Engineered:Modeled'}]
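Since to_list() returns plain dicts, ordinary list comprehensions are enough for post-filtering. For example, keeping only records under a given lineage prefix (sketched on the sample records shown above, not a live query):

```python
records = [
    {"biome_name": "Algoconsortia",
     "lineage": "root:Engineered:Lab enrichment:Defined media:Marine media:Algoconsortia"},
    {"biome_name": "Undefined media",
     "lineage": "root:Engineered:Lab enrichment:Undefined media"},
    {"biome_name": "Lab Synthesis", "lineage": "root:Engineered:Lab Synthesis"},
]

# keep only biomes under the "Lab enrichment" subtree
prefix = "root:Engineered:Lab enrichment"
hits = [r["biome_name"] for r in records if r["lineage"].startswith(prefix)]
print(hits)  # ['Algoconsortia', 'Undefined media']
```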
as a pandas dataframe#
biomes.to_df().head()
|   | biome_name | biome_lineage |
|---|---|---|
| 0 | Algoconsortia | root:Engineered:Lab enrichment:Defined media:M... |
| 1 | Undefined media | root:Engineered:Lab enrichment:Undefined media |
| 2 | Lab Synthesis | root:Engineered:Lab Synthesis |
| 3 | Genetic cross | root:Engineered:Lab Synthesis:Genetic cross |
| 4 | Modeled | root:Engineered:Modeled |
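Because to_df() yields an ordinary pandas DataFrame, the usual pandas tooling applies. A sketch that derives a lineage-depth column, built on sample rows rather than a live query:

```python
import pandas as pd

df = pd.DataFrame({
    "biome_name": ["Undefined media", "Lab Synthesis", "Modeled"],
    "biome_lineage": [
        "root:Engineered:Lab enrichment:Undefined media",
        "root:Engineered:Lab Synthesis",
        "root:Engineered:Modeled",
    ],
})

# depth = number of colon-separated segments in the lineage
df["depth"] = df["biome_lineage"].str.split(":").str.len()
print(df[["biome_name", "depth"]])
```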
as a dictionary#
where each key is a page number
# look at first 5 records of page 1
biomes.results[1][:5]
[{'biome_name': 'root', 'lineage': 'root'},
{'biome_name': 'Control', 'lineage': 'root:Control'},
{'biome_name': 'Engineered', 'lineage': 'root:Engineered'},
{'biome_name': 'Biogas plant', 'lineage': 'root:Engineered:Biogas plant'},
{'biome_name': 'Wet fermentation',
'lineage': 'root:Engineered:Biogas plant:Wet fermentation'}]
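If you need one flat list across pages, the page-keyed dict flattens easily. Sketched on mock pages with the same shape as the results dict above:

```python
pages = {
    1: [{"biome_name": "root", "lineage": "root"},
        {"biome_name": "Control", "lineage": "root:Control"}],
    2: [{"biome_name": "Engineered", "lineage": "root:Engineered"}],
}

# iterate pages in order and concatenate their record lists
flat = [record for page in sorted(pages) for record in pages[page]]
print(len(flat))  # 3
```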
Specific to biomes, results can also be visualized as a tree ("print", "hshow", or "vshow"):
biomes.show_tree()
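The tree view groups biomes by shared lineage prefixes. A minimal sketch of how such a tree can be built from lineage strings using nested dicts (an illustration, not mgnipy's implementation):

```python
def build_tree(lineages: list[str]) -> dict:
    """Nest each colon-separated lineage into a dict-of-dicts tree."""
    tree: dict = {}
    for lineage in lineages:
        node = tree
        for part in lineage.split(":"):
            node = node.setdefault(part, {})
    return tree

tree = build_tree([
    "root:Engineered:Biogas plant:Wet fermentation",
    "root:Engineered:Modeled",
])
print(sorted(tree["root"]["Engineered"]))  # ['Biogas plant', 'Modeled']
```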
Extra: Finding studies for a given biome#
# getting the biome_detail for a specific biome
a_biome = biomes["root:Engineered:Biogas plant:Wet fermentation"]
# what relationships can we traverse from biome detail?
a_biome.list_relationships()
Retrieving pages: 100%|██████████| 1/1 [00:00<00:00, 16131.94it/s]
['studies']
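When a lineage like this is sent as a query parameter, its colons and spaces must be URL-encoded. A standard-library illustration of that encoding (not part of mgnipy):

```python
from urllib.parse import quote_plus

lineage = "root:Engineered:Biogas plant:Wet fermentation"

# colons become %3A and spaces become + in the query string
print(quote_plus(lineage))
# root%3AEngineered%3ABiogas+plant%3AWet+fermentation
```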
# lazily access the studies related to this biome (this only prepares the query)
their_studies_list = a_biome.studies
# preview the requests that will be made to get the studies list
their_studies_list.explain()
# asynchronously get the studies list
await their_studies_list.aget()
# look at results
their_studies_list.to_df().head()
Planning the API call with params:
{'biome_lineage': 'root:Engineered:Biogas plant:Wet fermentation'}
Total pages to retrieve: 1
Total records to retrieve: 12
https://www.ebi.ac.uk/metagenomics/api/v2/studies?biome_lineage=root%3AEngineered%3ABiogas+plant%3AWet+fermentation&page=1
Retrieving pages: 100%|██████████| 1/1 [00:00<00:00, 1.71it/s]
|   | accession | ena_accessions | title | biome | updated_at |
|---|---|---|---|---|---|
| 0 | MGYS00000364 | [ERP005249, PRJEB5813] | Metagenome sequencing of biogas plant operatin... | {'biome_name': 'Wet fermentation', 'lineage': ... | 2026-04-27T16:48:24.092000+00:00 |
| 1 | MGYS00001584 | [ERP008939, PRJEB7938] | Functional redundant and similar microbial com... | {'biome_name': 'Wet fermentation', 'lineage': ... | 2026-04-28T05:52:47.418000+00:00 |
| 2 | MGYS00001776 | [ERP023030, PRJEB20841] | Extraction and sequencing methodology affects ... | {'biome_name': 'Wet fermentation', 'lineage': ... | 2026-04-28T07:55:07.516000+00:00 |
| 3 | MGYS00001781 | [ERP023045, PRJEB20855] | Whole shotgun metagenome sequencing of AD micr... | {'biome_name': 'Wet fermentation', 'lineage': ... | 2026-04-28T07:56:52.466000+00:00 |
| 4 | MGYS00001815 | [SRP027584, PRJNA212723] | Substrate variations triggered the emergent of... | {'biome_name': 'Wet fermentation', 'lineage': ... | 2026-04-28T08:15:54.407000+00:00 |