MGni.py

MGni.py (‘mæɡ-ni-paɪ’) is a lightweight Python client and toolkit for the MGnify API.

Supports Python 3.11 to 3.13.

Figure: mgnipy schematic

Features

  • FAIR: Makes MGnify analyses and metadata more findable, and returns them in familiar metagenomics data formats (e.g., GFF, Darwin Core, and tabular objects from pandas, polars, and anndata)

  • Simplifies API interactions: Let MGni.py handle the complexity of building, executing, and parsing API calls so you can focus on the data!

  • Fast: MGni.py uses caching to speed up API exploration, and supports both synchronous and asynchronous API calls
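The caching and sync/async points above can be illustrated with a small, standard-library-only sketch. It does not reflect MGni.py's internals: `fetch_page`, `afetch_page`, and `afetch_pages` are hypothetical stand-ins for real HTTP requests. `functools.lru_cache` memoizes repeated requests, and `asyncio.gather` runs several page requests concurrently.

```python
import asyncio
import functools

CALLS = {"count": 0}  # counts how often the (hypothetical) backend is actually hit


@functools.lru_cache(maxsize=128)
def fetch_page(endpoint: str, page: int) -> tuple:
    """Hypothetical synchronous fetch; a real client would call the API here."""
    CALLS["count"] += 1
    # Return an immutable placeholder so cached values cannot be mutated.
    return tuple(f"{endpoint}-p{page}-item{i}" for i in range(3))


async def afetch_page(endpoint: str, page: int) -> tuple:
    """Hypothetical async fetch; sleeps briefly to mimic network latency."""
    await asyncio.sleep(0.01)
    return fetch_page(endpoint, page)


async def afetch_pages(endpoint: str, pages: range) -> list:
    # Schedule all page requests at once and wait for them together.
    return await asyncio.gather(*(afetch_page(endpoint, p) for p in pages))


if __name__ == "__main__":
    fetch_page("studies", 1)
    fetch_page("studies", 1)  # served from the cache, no second backend call
    print(CALLS["count"])  # 1
    results = asyncio.run(afetch_pages("studies", range(1, 4)))
    print(len(results), CALLS["count"])  # 3 3 (page 1 came from the cache)
```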

Available API Endpoints

  • Studies: MGnify studies are based on ENA studies/projects, and are collections of samples, runs, assemblies, and analyses associated with a certain set of experiments.

  • Samples: MGnify samples are based on ENA/BioSamples samples, and represent individual biological samples.

  • Runs: MGnify runs are based on ENA runs (run accessions), and represent individual sequencing runs of a sample.

  • Assemblies: Metagenome assemblies (equivalent to ENA assemblies for one or more runs).

  • Analyses: MGnify analyses are runs of a standard pipeline on an individual sequencing run or assembly. They can include collections of taxonomic and functional annotations.

  • Publications: Publications (e.g. journal articles) may describe or analyse the content of MGnify Studies or their corresponding datasets in ENA.

  • Genomes: MGnify Genomes are annotated draft genomes based on either isolates or metagenome-assembled genomes (MAGs). They are arranged in biome-specific catalogues.

  • Biomes: The hierarchical GOLD ecosystem classifications (biomes) represented in MGnify.
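Each endpoint above is a REST resource, and a client assembles the request URLs for you. As a hedged sketch of that step, the hypothetical `build_url` helper below joins a base URL, an endpoint name, and URL-encoded query parameters. The base URL is an assumption (the root of MGnify's public v1 API) and may differ from the API version MGni.py targets; the `search` and `page` parameters are illustrative.

```python
from urllib.parse import urlencode

# Assumption: root of MGnify's public v1 API; MGni.py may target another version.
BASE_URL = "https://www.ebi.ac.uk/metagenomics/api/v1"

# Endpoint names as listed in this README, lower-cased.
ENDPOINTS = {
    "studies", "samples", "runs", "assemblies",
    "analyses", "publications", "genomes", "biomes",
}


def build_url(endpoint: str, base_url: str = BASE_URL, **params) -> str:
    """Return a request URL for `endpoint` with URL-encoded query parameters."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint!r}")
    url = f"{base_url.rstrip('/')}/{endpoint}"
    # Drop parameters explicitly set to None so they are not sent.
    query = {k: v for k, v in params.items() if v is not None}
    return f"{url}?{urlencode(query)}" if query else url


if __name__ == "__main__":
    print(build_url("studies", search="disease", page=2))
```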

Note: Private Data

  • To access your private data through any of these API endpoints, you only need your MGnify username and password, which are used to obtain a valid sliding auth token via the MGnify Authentication endpoints.

  • For example, you can put your login credentials in a .env file in your working directory (see .env.example), and mgnipy.MGnipyConfig takes care of getting and caching the auth token so that you can easily access your private data using MGni.py 🎉

Installation

From PyPI

pip install mgnipy

Development installation

git clone https://github.com/EBI-Metagenomics/mgnipy.git
cd mgnipy
uv sync --all-groups  # or: pip install -e ".[dev,docs]"

Quick Start

🚀 1. Initialize mgnipy.MGnipy

from mgnipy import MGnipy

# Create the main client with the default configuration
mg = MGnipy()

# See available endpoints
mg.list_resources()

🔎 2. Search resources with a mgnipy.MGnifier

Building the query set

# Search for studies by keyword
studies = mg.studies(
    search="disease"
)

# Preview the requests before fetching
studies.explain()

Executing the queries

# Get pages one at a time via .get(); this fetches 3 pages
for _ in range(3):
    studies.get()

# Or request specific pages via .page(); here, another 3 pages (4, 5, and 6)
for i in range(4, 7):
    studies.page(i)

# Or fetch all pages at once in large batches (async variant: .abulk_fetch())
studies.bulk_fetch()

# Then enrich the results with detailed metadata
studies.enrich_details()

Viewing the metadata

# As a pandas DataFrame
pd_metadata = studies.to_df()

# As a polars DataFrame
pl_metadata = studies.to_polars()

# As JSON
json_metadata = studies.to_json()

# With all details, as a DataFrame
detailed_metadata = studies.details_df()
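Exporting to pandas or polars means turning nested JSON records into flat columns. The columns MGni.py actually produces are not described here, so as a hedged illustration the stdlib-only sketch below flattens nested dictionaries into dotted column names, similar in spirit to `pandas.json_normalize`. The `flatten` and `to_rows` helpers and the sample records are hypothetical.

```python
def flatten(record: dict, parent: str = "", sep: str = ".") -> dict:
    """Flatten nested dicts into a single level with `sep`-joined keys."""
    flat: dict = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value  # lists and scalars are kept as-is
    return flat


def to_rows(records: list[dict]) -> tuple[list[str], list[list]]:
    """Return (columns, rows); fields missing from a record become None."""
    flat_records = [flatten(r) for r in records]
    columns: list[str] = []
    for fr in flat_records:
        for col in fr:
            if col not in columns:  # keep first-seen column order
                columns.append(col)
    rows = [[fr.get(c) for c in columns] for fr in flat_records]
    return columns, rows


if __name__ == "__main__":
    sample = [
        {"id": "S1", "attributes": {"name": "A", "biome": {"name": "soil"}}},
        {"id": "S2", "attributes": {"name": "B"}},
    ]
    print(to_rows(sample))
```

The resulting `columns` and `rows` can be passed directly to `pandas.DataFrame(rows, columns=columns)`.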

🗃️ 3. Explore a mgnipy.MGzine of datasets

# Access the mgazine of datasets for these studies
mgazine = studies.datasets

# Preview its contents
print(mgazine)

Downloading the data

# Download a single file by its alias
mgazine.download(to_dir="downloads_folder", alias="mgnify_file_alias.fasta.gz")

# Or download all files
mgazine.download_all(to_dir="downloads_folder")

Reading in the data

# Tabular formats (TSV, CSV, TXT, JSONL), here read with polars
taxa_table = mgazine.stream(alias="mgnify_file_alias.tsv", df_engine="polars")

# FASTA, GFF, and BIOM formats, read via scikit-bio (skbio)
skbio_fasta = mgazine.stream(alias="mgnify_file_alias.fasta.gz")

Additional Documentation

Development

See Contributing.md.

License

TODO

Citation

TODO

Thank you

A list of people who have contributed to this repository. Please add your name and GitHub handle or email if you’d like.

From the Microbiome Informatics team @ The European Bioinformatics Institute (EMBL-EBI):

  • Angel L. P. (angelphanth) - visiting PhD student

  • Mahfouz Shehu (MGS-sails) - MGnify Website Developer

  • Christian Atallah (chrisAta) - MGnify Bioinformatician

  • Sandy Rogers (SandyRogers) - MGnify Web and Platform Project Leader

  • Martin Beracochea (mberacochea) - MGnify Production Project Leader

  • Robert Finn (rdf [at] ebi.ac.uk) - Section Head, Team Leader and Senior Scientist

From the Multiomics Network Analytics team @ Danmarks Tekniske Universitet (DTU):

  • Angel L. P. (angelphanth) - PhD student

  • Maria Barranco - Postdoc

  • Sebastián Ayala Ruano (sayalaruano) - Previous MSc student and Research Assistant

  • Alberto Santos Delgado (albsantosdel) - Senior Researcher and BRIGHT Informatics Platform Director

Extra special thanks to:

  • Prof. Rob Finn and everyone in the Microbiome Informatics team and Finn Group at EMBL-EBI for their mentorship

  • My professor, Alberto Santos, for making this collaboration happen and for his constant guidance and patience

  • The Informatics Platform at BRIGHT at DTU for their mentorship

  • The creators of this python_package template, especially Henry Webel (enryH)

  • The creators and maintainers of openapi-python-client