MGni.py#
MGni.py (‘mæɡ-ni-paɪ’) is a lightweight Python client and toolkit for the MGnify API.

Features#
FAIR: More findable MGnify analyses and metadata, returned in familiar metagenomics data formats (e.g., GFF, Darwin Core, DataFrames [pandas, polars, anndata])
Simplifies API interactions: Let MGni.py handle the complexity of building, executing, and parsing API calls so you can focus on the data!
Fast: MGni.py uses caching to speed up API exploration and supports both sync and async API calls
Available API Endpoints#
Studies: MGnify studies are based on ENA studies/projects, and are collections of samples, runs, assemblies, and analyses associated with a certain set of experiments.
Samples: MGnify samples are based on ENA/BioSamples samples, and represent individual biological samples.
Runs: Sequencing runs (ENA run accessions; individual sequencing runs of a sample).
Assemblies: Metagenome assemblies (equivalent to ENA assemblies for one or more runs).
Analyses: MGnify analyses are runs of a standard pipeline on an individual sequencing run or assembly. They can include collections of taxonomic and functional annotations.
Publications: Publications (e.g. journal articles) may describe or analyse the content of MGnify Studies or their corresponding datasets in ENA.
Genomes: MGnify Genomes are annotated draft genomes based on either isolates, or metagenome-assembled genomes (MAGs). They are arranged in biome-specific catalogues.
Biomes: Biomes from the hierarchical GOLD ecosystem classification that are represented in MGnify.
Note: Private Data#
To access your private data through any of these API endpoints, you only need your MGnify username and password to obtain a valid sliding auth token via the MGnify Authentication endpoints.
For example, you can put your login credentials in a .env file in your working directory (see .env.example), and mgnipy.MGnipyConfig takes care of getting and caching the auth token so that you can easily access your private data using MGni.py 🎉
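To make the .env idea concrete, here is a minimal standard-library sketch of how KEY=VALUE credentials in such a file could be read. It is not how mgnipy.MGnipyConfig works internally; the helper `read_env_file` and the variable names `MGNIFY_USERNAME` and `MGNIFY_PASSWORD` are hypothetical placeholders, so check .env.example for the names the package actually expects.

```python
import tempfile
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value
        values[key.strip()] = value.strip().strip("'\"")
    return values


# Demonstration with a throwaway file; the variable names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text(
        "# MGnify credentials (placeholder names)\n"
        "MGNIFY_USERNAME=my_user\n"
        "MGNIFY_PASSWORD='not-a-real-password'\n",
        encoding="utf-8",
    )
    credentials = read_env_file(env_path)

print(sorted(credentials))
```

Keeping credentials in a .env file (and out of version control) means notebooks and scripts never need to contain the password itself.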
Installation#
From PyPI#
pip install mgnipy
Development installation#
git clone https://github.com/EBI-Metagenomics/mgnipy.git
cd mgnipy
uv sync --all-groups # or: pip install -e ".[dev,docs]"
Quick Start#
🚀 1. Initialize mgnipy.MGnipy#
from mgnipy import MGnipy
# Create the main client, with default configuration
mg = MGnipy()
# See available endpoints
mg.list_resources()
🔎 2. Search resources with a mgnipy.MGnifier#
Building the query set#
# Search for studies by keyword
studies = mg.studies(
search="disease"
)
# Can preview requests before fetching
studies.explain()
Executing the queries#
# Get page by page via .get(), fetching 3 pages
for _ in range(3):
    studies.get()
# or via .page(), fetching another 3 pages
for i in range(4, 7):
    studies.page(i)
# OR potentially all at once in large batches (also async option .abulk_fetch())
studies.bulk_fetch()
# then can enrich with detailed metadata
studies.enrich_details()
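The difference between fetching page by page and fetching in async batches can be illustrated with a small generic sketch. This is not mgnipy's implementation of .abulk_fetch(); `fetch_page` is a hypothetical stand-in that simulates a network request, and `max_concurrent` is an assumed parameter showing one common way to cap the number of in-flight requests.

```python
import asyncio


async def fetch_page(page_number):
    """Stand-in for one HTTP request; returns fake records for the page."""
    await asyncio.sleep(0.01)  # simulate network latency
    return [f"record-{page_number}-{i}" for i in range(2)]


async def fetch_pages(page_numbers, max_concurrent=3):
    """Fetch pages concurrently, capping the number of in-flight requests."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def limited(page_number):
        async with semaphore:
            return await fetch_page(page_number)

    # gather returns results in the same order as the inputs
    pages = await asyncio.gather(*(limited(n) for n in page_numbers))
    return [record for page in pages for record in page]


records = asyncio.run(fetch_pages(range(1, 7)))
print(len(records))
```

Capping concurrency keeps a bulk fetch polite towards the API while still overlapping the waiting time of several requests.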
Viewing the metadata#
# as pandas
pd_metadata = studies.to_df()
# As polars DataFrame
pl_metadata = studies.to_polars()
# as json
json_metadata = studies.to_json()
# with all details
detailed_metadata = studies.details_df()
🗃️ 3. Explore a mgnipy.MGzine of datasets#
# accessing the mgazine of datasets
mgazine = studies.datasets
# preview
print(mgazine)
Downloading the data#
# download file by file
mgazine.download(to_dir="downloads_folder", alias="mgnify_file_alias.fasta.gz")
# or download all
mgazine.download_all(to_dir="downloads_folder")
Reading in the data#
# support for tsv, csv, txt, jsonl
taxa_table = mgazine.stream(alias="mgnify_file_alias.tsv", df_engine="polars")
# support for fasta, gff, biom via skbio
skbio_fasta = mgazine.stream(alias="mgnify_file_alias.fasta.gz")
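For readers curious what reading a compressed tabular file involves, the sketch below reads a gzipped TSV row by row with only the standard library. It does not reproduce mgazine.stream(); the helper `stream_tsv_rows`, the file name, and the table contents are made up for illustration.

```python
import csv
import gzip
import tempfile
from pathlib import Path


def stream_tsv_rows(path):
    """Yield each row of an (optionally gzipped) TSV file as a dict."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", newline="") as handle:
        yield from csv.DictReader(handle, delimiter="\t")


with tempfile.TemporaryDirectory() as tmp:
    # Write a tiny example table so the sketch is self-contained
    tsv_path = Path(tmp) / "example_taxa.tsv.gz"
    with gzip.open(tsv_path, "wt", encoding="utf-8", newline="") as handle:
        handle.write("taxon\tcount\nBacteroides\t12\nPrevotella\t7\n")
    # Consume the generator while the temporary file still exists
    rows = list(stream_tsv_rows(tsv_path))

print(rows)
```

Yielding rows one at a time means large tables never need to be decompressed to disk or held in memory in full.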
Additional Documentation#
Development#
See Contributing.md
License#
TODO
Citation#
TODO
Thank you#
A list of people who have contributed to this repository. Please add your name and github or email if you’d like.
From the Microbiome Informatics team @ The European Bioinformatics Institute (EMBL-EBI):
Angel L. P. (angelphanth) - visiting PhD student
Mahfouz Shehu (MGS-sails) - MGnify Website Developer
Christian Atallah (chrisAta) - Bioinformatician MGnify
Sandy Rogers (SandyRogers) - MGnify Web and Platform Project Leader
Martin Beracochea (mberacochea) - MGnify Production Project Leader
Robert Finn (rdf [at] ebi.ac.uk) - Section Head, Team Leader and Senior Scientist
From the Multiomics Network Analytics team @ Danmarks Tekniske Universitet (DTU):
Angel L. P. (angelphanth) - PhD student
Maria Barranco - Postdoc
Sebastián Ayala Ruano (sayalaruano) - Previous MSc student and Research Assistant
Alberto Santos Delgado (albsantosdel) - Senior Researcher and BRIGHT Informatics Platform Director
Extra special thanks to:
Prof. Rob Finn and everyone in the Microbiome Informatics team and Finn Group at EMBL-EBI for their mentorship
My Prof. Alberto Santos for making this collaboration happen and for his constant guidance and patience
The Informatics Platform at BRIGHT at DTU for their mentorship
The creators of this python_package template and especially Henry Webel (enryH)
The creators and maintainers of openapi-python-client