MGnipy – Capabilities Demo#

This notebook demonstrates everything the library can do right now and explicitly marks what is broken or not yet implemented. It is organized as a live tour ahead of the draft PyPI release.

Sections:

  1. Setup

  2. MGnifier – direct low-level API (works)

  3. Output formatters: to_df, to_polars, to_json, to_list (works)

  4. Query planning: dry_run, preview, explain (works)

  5. Immutable filter cloning (works)

  6. Pagination: page(n), get(limit) (works)

  7. MGnipy facade + proxy classes (partially works)

  8. Async: aget, apage, afirst (works)

  9. ⚠️ What is currently broken

  10. 🗺️ What comes next


1. Setup#

# Verify installation
import mgnipy
print(f"mgnipy version: {mgnipy.__version__}")
mgnipy version: 0.0.1.dev0+g73d819d64.d20260407
from mgnipy.V2.core import MGnifier
from mgnipy.V2.query_set import QuerySet
from mgnipy import MGnipy
print("All imports OK")
All imports OK

2. MGnifier – direct low-level API#

MGnifier is the core query object. It wraps QuerySet (which does query building) and delegates HTTP calls to QueryExecutor. You can use it directly or through the higher-level MGnipy facade.

# Create a query for studies (no API call yet)
mg = MGnifier(resource="studies", params={"page_size": 5})
print(mg)
MGnifier instance for resource: studies
I.e., mgnipy.V2.core.MGnifier
----------------------------------------
Base URL: https://www.ebi.ac.uk/
Parameters: {'page_size': 5}
Endpoint module: mgnipy.emgapi_v2_client.api.studies.list_mgnify_studies
Example request URL: https://www.ebi.ac.uk/metagenomics/api/v2/studies?page=1&page_size=5
Returns paginated results: True
# Fetch just the first page (one API call)
mg.first()
print(f"Pages fetched so far: {list(mg._results.keys())}")
Planning the API call with params:
{'page_size': 5}
Total pages to retrieve: 1110
Total records to retrieve: 5546
Pages fetched so far: [1]

3. Output formatters#

All four formatters work once data is in _results.

# pandas DataFrame
df = mg.to_df()
print(f"to_df() β†’ {type(df).__name__}, shape: {df.shape}")
df.head()
to_df() β†’ DataFrame, shape: (5, 5)
accession ena_accessions title biome updated_at
0 MGYS00000653 [DRP000157, PRJDA46243] Metatranscriptomic Analysis for Eukaryotic Fun... {'biome_name': 'Soil', 'lineage': 'root:Enviro... 2025-01-27T15:22:29.059000+00:00
1 MGYS00001632 [DRP000423, PRJDA68519] The Usefulness and Reproducibility of Pyrosequ... {'biome_name': 'Bioreactor', 'lineage': 'root:... 2026-04-27T12:08:49.939000+00:00
2 MGYS00001846 [DRP000450, PRJDA72133] food metagenome Metagenome {'biome_name': 'Fermented seafood', 'lineage':... 2025-01-27T15:22:38.826000+00:00
3 MGYS00001633 [DRP000451, PRJDA67149] microbial community of traditional Korean alco... {'biome_name': 'Fermented beverages', 'lineage... 2025-01-27T15:22:37.053000+00:00
4 MGYS00000624 [DRP000487, PRJDA73169] Metagenomic analysis of soil microorganisms {'biome_name': 'Soil', 'lineage': 'root:Enviro... 2025-01-27T15:22:28.785000+00:00
# Polars DataFrame
pl_df = mg.to_polars()
print(f"to_polars() β†’ {type(pl_df).__name__}, shape: {pl_df.shape}")
pl_df.head()
to_polars() β†’ DataFrame, shape: (5, 5)
shape: (5, 5)
accession | ena_accessions | title | biome | updated_at
str | list[str] | str | struct[2] | str
"MGYS00000653" | ["DRP000157", "PRJDA46243"] | "Metatranscriptomic Analysis fo… | {"Soil","root:Environmental:Terrestrial:Soil"} | "2025-01-27T15:22:29.059000+00:…
"MGYS00001632" | ["DRP000423", "PRJDA68519"] | "The Usefulness and Reproducibi… | {"Bioreactor","root:Engineered:Bioreactor"} | "2026-04-27T12:08:49.939000+00:…
"MGYS00001846" | ["DRP000450", "PRJDA72133"] | "food metagenome Metagenome" | {"Fermented seafood","root:Engineered:Food production:Fermented seafood"} | "2025-01-27T15:22:38.826000+00:…
"MGYS00001633" | ["DRP000451", "PRJDA67149"] | "microbial community of traditi… | {"Fermented beverages","root:Engineered:Food production:Fermented beverages"} | "2025-01-27T15:22:37.053000+00:…
"MGYS00000624" | ["DRP000487", "PRJDA73169"] | "Metagenomic analysis of soil m… | {"Soil","root:Environmental:Terrestrial:Soil"} | "2025-01-27T15:22:28.785000+00:…
# List of dicts
records = mg.to_list()
print(f"to_list() β†’ {type(records).__name__}, length: {len(records)}")
print(f"First record keys: {list(records[0].keys()) if records else 'none'}")
to_list() → list, length: 5
First record keys: ['accession', 'ena_accessions', 'title', 'biome', 'updated_at']
# JSON string (newline-delimited by default)
json_str = mg.to_json()
print(f"to_json() β†’ {type(json_str).__name__}, {len(json_str)} chars")
print(json_str[:200], "...")
to_json() β†’ str, 1434 chars
{"accession":"MGYS00000653","ena_accessions":["DRP000157","PRJDA46243"],"title":"Metatranscriptomic Analysis for Eukaryotic Functional Genes in Forest Soil","biome":{"biome_name":"Soil","lineage":"roo ...

4. Query planning: dry_run, preview, explain#

Before fetching everything, you can inspect the plan: how many records, how many pages, which URLs.

# dry_run: makes one small API call (page_size=1) to learn total count, then prints the plan
planner = MGnifier(resource="analyses", params={"page_size": 10})
planner.dry_run()
Planning the API call with params:
{'page_size': 10}
Total pages to retrieve: 1420
Total records to retrieve: 14198
# After dry_run, count and total_pages are populated
print(f"Total records: {planner.count}")
print(f"Total pages (at page_size=10): {planner.total_pages}")
Total records: 14198
Total pages (at page_size=10): 1420
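
The page count follows directly from the record count: total_pages is the count rounded up to the page size. A one-line check against the plan above (illustrative arithmetic only):

# ceil(14198 / 10) == 1420, matching the dry_run plan
import math
assert planner.total_pages == math.ceil(planner.count / 10)
print("total_pages == ceil(count / page_size)")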
# explain: print the first N request URLs without making them
planner.explain(head=3)
https://www.ebi.ac.uk/metagenomics/api/v2/analyses?page=1&page_size=10
https://www.ebi.ac.uk/metagenomics/api/v2/analyses?page=2&page_size=10
https://www.ebi.ac.uk/metagenomics/api/v2/analyses?page=3&page_size=10
# list_urls: returns the full URL list
urls = planner.list_urls()
print(f"{len(urls)} URLs total. First: {urls[0]}")
1420 URLs total. First: https://www.ebi.ac.uk/metagenomics/api/v2/analyses?page=1&page_size=10
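
Because list_urls() returns plain strings, the plan can be handed straight to external tooling, e.g. saved one URL per line for a shell-based batch fetch (a small convenience sketch; the file name is arbitrary):

# Persist the planned request URLs, one per line
from pathlib import Path
Path("analyses_urls.txt").write_text("\n".join(urls))
print(f"Wrote {len(urls)} URLs to analyses_urls.txt")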
# preview: fetches page 1 and returns a DataFrame immediately
preview_df = MGnifier(resource="samples", params={"page_size": 5}).preview()
print(f"preview() β†’ {type(preview_df).__name__}, shape: {preview_df.shape}")
preview_df.head()
Planning the API call with params:
{'page_size': 5}
Total pages to retrieve: 7060
Total records to retrieve: 35300
preview() → DataFrame, shape: (5, 5)
accession ena_accessions sample_title biome updated_at
0 SAMEA113539431 [SAMEA113539431, ERS15535852] Study_1322_RNA {'biome_name': 'Fecal', 'lineage': 'root:Host-... 2026-04-24T16:01:41.365000+00:00
1 SAMEA113539645 [ERS15536066, SAMEA113539645] Study_1665_DNA {'biome_name': 'Fecal', 'lineage': 'root:Host-... 2026-04-24T16:02:06.759000+00:00
2 SAMEA113539284 [ERS15535705, SAMEA113539284] Study_963_DNA {'biome_name': 'Fecal', 'lineage': 'root:Host-... 2026-04-24T16:01:23.909000+00:00
3 SAMEA113540517 [ERS15536938, SAMEA113540517] Study_5298_DNA {'biome_name': 'Fecal', 'lineage': 'root:Host-... 2026-04-24T16:03:53.546000+00:00
4 SAMEA115284684 [ERS18228651, SAMEA115284684] ATZ_IGR_046_V1 None 2026-04-24T16:07:38.087000+00:00

5. Immutable filter cloning#

filter() returns a new QuerySet with the updated params; the original is untouched. This makes it safe to build queries incrementally.
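
For intuition, the pattern behind this is clone-on-write: filter() merges the new params into a copy and returns a fresh instance. A minimal sketch of the idea (illustrative only, not the library's source):

# Toy illustration of clone-on-filter; TinyQuery is hypothetical
class TinyQuery:
    def __init__(self, resource, params=None):
        self.resource = resource
        self.params = dict(params or {})
    def filter(self, **kwargs):
        # Merge into a copy; the original instance is never mutated
        return TinyQuery(self.resource, {**self.params, **kwargs})

a = TinyQuery("studies")
b = a.filter(page_size=5)
print(a.params, b.params, a is b)  # {} {'page_size': 5} False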

base = QuerySet(resource="studies")
filtered = base.filter(biome_lineage="root:Environmental:Aquatic", page_size=5)

print(f"base params:     {base.params}")
print(f"filtered params: {filtered.params}")
print(f"Same object?     {base is filtered}")
base params:     {}
filtered params: {'biome_lineage': 'root:Environmental:Aquatic', 'page_size': 5}
Same object?     False
# Chain filters: each step is a new clone
qs = (
    QuerySet(resource="studies")
    .filter(biome_lineage="root:Environmental")
    .page_size(3)
)
print(f"Chained params: {qs.params}")
print(f"Request URL: {qs.request_url}")
Chained params: {'biome_lineage': 'root:Environmental', 'page_size': 3}
Request URL: https://www.ebi.ac.uk/metagenomics/api/v2/studies?biome_lineage=root%3AEnvironmental&page=1&page_size=3
# Fetch the filtered results
qs.first()
df = qs.to_df()
print(f"Rows returned: {len(df)}")
df.head()
Planning the API call with params:
{'biome_lineage': 'root:Environmental', 'page_size': 3}
Total pages to retrieve: 889
Total records to retrieve: 2665
Rows returned: 3
accession ena_accessions title biome updated_at
0 MGYS00000274 [SRP000664] Windshield splatter {'biome_name': 'Air', 'lineage': 'root:Environ... 2025-01-27T15:22:25.672000+00:00
1 MGYS00010288 [PRJEB93890] Metagenome assembly of PRJNA270248 data set (M... {'biome_name': 'Sediment', 'lineage': 'root:En... 2025-07-14T20:41:25.062000+00:00
2 MGYS00002009 [ERP104175, PRJEB22494] EMG produced TPA metagenomics assembly of the ... {'biome_name': 'Salt marsh', 'lineage': 'root:... 2025-05-16T10:58:55.790000+00:00

6. Pagination: page(n) and get(limit)#

Pages are fetched individually and cached. Already-fetched pages are not re-requested.

pg = MGnifier(resource="biomes", params={"page_size": 5})
pg.dry_run()
print(f"Total pages: {pg.total_pages}")
Planning the API call with params:
{'page_size': 5}
Total pages to retrieve: 99
Total records to retrieve: 492
Total pages: 99
# Fetch page 1
pg.page(1)
print(f"Page 1 in results: {pg._is_in_results(1)}")
print(f"Page 2 in results: {pg._is_in_results(2)}")
Page 1 in results: True
Page 2 in results: False
# Fetch page 3 (skipping page 2; non-contiguous fetch is fine)
pg.page(3)
print(f"Pages fetched: {sorted(pg._results.keys())}")
print(f"DataFrame has {len(pg.to_df())} rows (2 pages Γ— 5 per page)")
Pages fetched: [1, 3]
DataFrame has 10 rows (2 pages Γ— 5 per page)
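
Because already-fetched pages are cached, re-requesting one should be a no-op. A quick check against the internal _results cache (relies on the private attribute shown above):

# page(1) is already cached, so this should not trigger a new request
before = sorted(pg._results.keys())
pg.page(1)
print(f"Cache unchanged after re-requesting page 1: {sorted(pg._results.keys()) == before}")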
# get(limit=N) fetches however many pages are needed to satisfy limit records
# Requires dry_run() first (or pass safety=False to skip)
limited = MGnifier(resource="samples", params={"page_size": 5})
limited.dry_run()
limited.get(limit=12)  # will fetch 3 pages (5+5+5 = 15, enough for 12)
print(f"Records retrieved: {len(limited.to_df())} (asked for 12, got nearest page boundary)")
Planning the API call with params:
{'page_size': 5}
Total pages to retrieve: 7060
Total records to retrieve: 35300
Records retrieved: 15 (asked for 12, got nearest page boundary)
Retrieving pages: 100%|██████████| 3/3 [00:00<00:00, 21.45it/s]
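
Since get(limit=N) rounds up to the nearest page boundary, trim client-side when you need exactly N rows:

# Keep exactly the 12 requested records; the 3 surplus rows are dropped
exact = limited.to_df().head(12)
print(f"Trimmed to {len(exact)} rows")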

7. MGnipy facade + proxy classes#

MGnipy is the top-level entry point. It uses __getattr__ to dispatch attribute access to typed proxy classes.

⚠️ Known limitation (since fixed by M2): the config passed to MGnipy() was not forwarded to the proxy, so custom base URLs and auth tokens were silently ignored. The demonstration below confirms the fix. ✅
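
For intuition, the __getattr__ dispatch boils down to looking the attribute name up in a registry of proxy classes. A minimal sketch of the pattern (hypothetical names, not the library's actual code):

# Toy dispatch-by-attribute facade; TinyFacade and its registry are illustrative
class TinyFacade:
    _proxies = {"studies": dict, "samples": dict}  # stand-in proxy classes

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails
        try:
            return self._proxies[name]()  # instantiate the matching proxy
        except KeyError:
            raise AttributeError(name) from None

print(type(TinyFacade().studies).__name__)  # dict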

client = MGnipy()
print(f"Available resources: {client.list_resources()}")
Available resources: ['analyses', 'analysis', 'assemblies', 'assembly', 'genomes', 'genome', 'publications', 'publication', 'samples', 'sample', 'studies', 'study', 'runs', 'run', 'biomes', 'biome', 'catalogues', 'catalogue']
# Accessing a resource returns a typed proxy (list-type)
studies_proxy = client.studies
print(type(studies_proxy))
print(studies_proxy)
<class 'mgnipy.V2.proxies.Studies'>
MGnifier instance for resource: studies
I.e., mgnipy.V2.proxies.Studies
----------------------------------------
Base URL: https://www.ebi.ac.uk/
Parameters: {}
Endpoint module: mgnipy.emgapi_v2_client.api.studies.list_mgnify_studies
Example request URL: https://www.ebi.ac.uk/metagenomics/api/v2/studies?page=1
Returns paginated results: True
# Proxies expose the same query-building API as MGnifier
filtered_proxy = studies_proxy.filter(biome_lineage="root:Environmental", page_size=3)
print(f"Filter returned new object: {filtered_proxy is not studies_proxy}")
print(f"Filtered params: {filtered_proxy.params}")
Filter returned new object: True
Filtered params: {'biome_lineage': 'root:Environmental', 'page_size': 3}
# Fetch first page through the proxy
filtered_proxy.first()
df = filtered_proxy.to_df()
print(f"Rows: {len(df)}")
df.head()
Planning the API call with params:
{'biome_lineage': 'root:Environmental', 'page_size': 3}
Total pages to retrieve: 889
Total records to retrieve: 2665
Rows: 3
accession ena_accessions title biome updated_at
0 MGYS00000274 [SRP000664] Windshield splatter {'biome_name': 'Air', 'lineage': 'root:Environ... 2025-01-27T15:22:25.672000+00:00
1 MGYS00010288 [PRJEB93890] Metagenome assembly of PRJNA270248 data set (M... {'biome_name': 'Sediment', 'lineage': 'root:En... 2025-07-14T20:41:25.062000+00:00
2 MGYS00002009 [ERP104175, PRJEB22494] EMG produced TPA metagenomics assembly of the ... {'biome_name': 'Salt marsh', 'lineage': 'root:... 2025-05-16T10:58:55.790000+00:00
# Biomes has a special tree visualisation (after fetching); here we just show the lineages
biomes = client.biomes
biomes.first()
print(f"Biome lineages: {biomes.lineages[:5]}")
Planning the API call with params:
{}
Total pages to retrieve: 20
Total records to retrieve: 492
Biome lineages: ['root', 'root:Control', 'root:Engineered', 'root:Engineered:Biogas plant', 'root:Engineered:Biogas plant:Wet fermentation']
# Config handling check: custom base_url used to be silently ignored (fixed in M2)
from mgnipy._models.config import MgnipyConfig
default_url = str(MgnipyConfig().base_url)

custom_client = MGnipy(base_url="https://custom.example.com")
proxy = custom_client.studies

print(f"Custom URL given to MGnipy: https://custom.example.com")
print(f"URL actually used by proxy: {proxy._base_url}")
print(f"Bug present: {str(proxy._base_url) == default_url}  ← should be False after M2 fix")
Custom URL given to MGnipy: https://custom.example.com
URL actually used by proxy: https://custom.example.com/
Bug present: False  ← should be False after M2 fix

8. Async: aget, apage, afirst#

Every sync method has an async counterpart. Use these when you need to concurrently fetch many resources.

import asyncio

async def demo_async():
    mg = MGnifier(resource="runs", params={"page_size": 5})
    await mg.afirst()
    df = mg.to_df()
    print(f"Async fetch β†’ {len(df)} rows")
    return df

# In Jupyter, use await directly (event loop already running)
df_async = await demo_async()
df_async.head()
Planning the API call with params:
{'page_size': 5}
Total pages to retrieve: 7703
Total records to retrieve: 38514
Async fetch → 5 rows
experiment_type accession instrument_model instrument_platform sample_accession study_accession
0 Amplicon DRR019176 None None SAMD00004051 MGYS00001632
1 Amplicon DRR001168 None None SAMD00009393 MGYS00001632
2 Amplicon DRR001169 None None SAMD00009394 MGYS00001632
3 Amplicon DRR001167 None None SAMD00009395 MGYS00001632
4 Amplicon DRR001170 None None SAMD00009396 MGYS00001632
async def demo_concurrent_pages():
    mg = MGnifier(resource="samples", params={"page_size": 10})
    mg.dry_run()
    # aget fetches all pages concurrently (with semaphore to protect the server)
    await mg.aget(limit=30, safety=False)
    return mg.to_df()

df_concurrent = await demo_concurrent_pages()
print(f"Concurrent fetch β†’ {len(df_concurrent)} rows")
Planning the API call with params:
{'page_size': 10}
Total pages to retrieve: 3530
Total records to retrieve: 35300
Concurrent fetch β†’ 30 rows
Retrieving pages: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 3/3 [00:00<00:00, 28.81it/s]
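
The async methods also compose across resources. A sketch fetching the first page of several resources concurrently with asyncio.gather (uses only afirst() as demonstrated above):

# Fire three first-page requests concurrently, then read the cached results
async def demo_multi_resource():
    resources = ("studies", "samples", "biomes")
    queries = [MGnifier(resource=r, params={"page_size": 3}) for r in resources]
    await asyncio.gather(*(q.afirst() for q in queries))
    for name, q in zip(resources, queries):
        print(f"{name}: {len(q.to_df())} rows")

await demo_multi_resource()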

9. ⚠️ What is currently broken#

These cells document bugs and missing features. They are expected to fail until the corresponding milestone is fixed.

✅ M1 – cli.py was missing (now fixed)#

# This should succeed after M1 is fixed
try:
    import mgnipy.cli
    print(f"βœ… mgnipy.cli imported, main={mgnipy.cli.main}")
except ModuleNotFoundError as e:
    print(f"❌ M1 not fixed: {e}")
    print("   Fix: create mgnipy/cli.py with a main() function")
βœ… mgnipy.cli imported, main=<function main at 0x111b67ce0>

✅ M2 – Config not passed to proxies (now fixed)#

# After M2 is fixed, proxy._base_url should match the custom URL
from mgnipy import MGnipy
from mgnipy._models.config import MgnipyConfig

custom = MGnipy(base_url="https://staging.example.com/")
proxy = custom.studies

expected = "https://staging.example.com/"
actual = str(proxy._base_url)

if actual == expected:
    print("✅ M2 fixed: config flows through")
else:
    print(f"❌ M2 not fixed: proxy uses '{actual}' instead of '{expected}'")
    print("   Fix: mgnipy/mgnipy.py:52 – pass base_url to proxy constructor")
✅ M2 fixed: config flows through

✅ Previously not implemented: SingleResource (accession lookup)#

✅ Instead of SingleResource, MGnifyDetail was implemented; it covers the same idea.

from mgnipy.V2.proxies import StudyDetail, Studies

Studies(search="MGYS00001422").preview()
Planning the API call with params:
{'search': 'MGYS00001422'}
Total pages to retrieve: 1
Total records to retrieve: 1
accession ena_accessions title biome updated_at
0 MGYS00001422 [ERP014234, PRJEB12735] Amplicon sequencing of four biogas plants {'biome_name': 'Biogas plant', 'lineage': 'roo... 2026-04-27T12:05:32.095000+00:00
# The plan (Day 2) calls for: mgnipy.studies["MGYS00001422"] β†’ lazy SingleResource
# Currently __getitem__ on the proxy requires data to already be fetched
from mgnipy.V2.proxies import StudyDetail, Studies

detail = StudyDetail("MGYS00001422")

s = Studies(search="MGYS00001422")
s.get()

try:
    item = s["MGYS00001422"]
    print(f"✅ Accession lookup returned: {type(item).__name__}")
    # or, equivalently, via the detail object created above:
    print(f"✅ Accession lookup returned: {type(detail).__name__}")
except (AttributeError, KeyError, TypeError) as e:
    print(f"❌ SingleResource not implemented: {type(e).__name__}: {e}")
    print("   Fix: implement SingleResource class and update proxy __getitem__")
Planning the API call with params:
{'search': 'MGYS00001422'}
Total pages to retrieve: 1
Total records to retrieve: 1
Planning the API call with params:
{'accession': 'MGYS00001422'}
Total pages to retrieve: 1
Total records to retrieve: 1
✅ Accession lookup returned: StudyDetail
✅ Accession lookup returned: StudyDetail
Retrieving pages: 100%|██████████| 1/1 [00:00<00:00, 18236.10it/s]

❌ Not-yet-implemented: .order_by() and .exists()#

from mgnipy.V2.query_set import QuerySet

qs = QuerySet(resource="studies")

for method in ("order_by", "exists"):
    if hasattr(qs, method):
        print(f"✅ {method}() exists")
    else:
        print(f"❌ {method}() not implemented yet")
❌ order_by() not implemented yet
❌ exists() not implemented yet
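
Until .exists() lands, the dry-run record count is a cheap stand-in: one small request answers "does anything match?". A workaround sketch using only the planning API from section 4:

# Does at least one study match this search? dry_run() populates .count.
probe = MGnifier(resource="studies", params={"search": "MGYS00001422"})
probe.dry_run()
print(f"exists: {probe.count > 0}")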

✅ Previously a stub: describe_resources()#

from mgnipy import MGnipy

client = MGnipy()
result = client.describe_resources()

if result is not None:
    print(f"✅ describe_resources() returned: {result}")
else:
    print("❌ describe_resources() is a stub – returns None")
    print("   Fix: implement in mgnipy/mgnipy.py")
List all analyses (MGYAs) available from MGnify

Each analysis is the result of a Pipeline execution on a reads dataset (either a raw read-run, or an
assembly).

Supported parameters:
- page: (int | Unset) Default: 1.
- page_size: (int | None | Unset)
Get MGnify analysis by accession

MGnify analyses are accessioned with an MGYA-prefixed identifier and correspond to an individual Run
or Assembly analysed by a Pipeline.

Supported parameters:
- accession: (str)
List all assemblies available in MGnify

Each assembly represents a collection of contigs generated by assembling sequencing reads from an
MGnify run

Supported parameters:
- page: (int | Unset) Default: 1.
- page_size: (int | None | Unset)
Get assembly by accession

Get detailed information about a specific assembly.

Supported parameters:
- accession: (str)
List all genomes across MGnify Genome catalogues

MGnify Genomes are either isolates, or MAGs derived from binned metagenomes.

Supported parameters:
- page: (int | Unset) Default: 1.
- page_size: (int | None | Unset)
Get the detail of a single MGnify Genome

MGnify Genomes are either isolates, or MAGs derived from binned metagenomes.

Supported parameters:
- accession: (str)
List all publications

List all publications in the MGnify database.

Supported parameters:
- order: (ListMgnifyPublicationsOrderType0 | None | Unset)
- published_after: (int | None | Unset) Filter by minimum publication year
- published_before: (int | None | Unset) Filter by maximum publication year
- title: (None | str | Unset) Search within publication titles
- page: (int | Unset) Default: 1.
- page_size: (int | None | Unset)
Get the detail of a single publication

Get detailed information about a publication, including associated studies.

Supported parameters:
- pubmed_id: (int)
List all samples analysed by MGnify

MGnify samples inherit directly from samples (or BioSamples) in ENA.

Supported parameters:
- biome_lineage: (None | str | Unset) The lineage to match, including all descendant biomes
- search: (None | str | Unset) Search within sample titles and accessions
- order: (ListMgnifySamplesOrderType0 | None | Unset)
- page: (int | Unset) Default: 1.
- page_size: (int | None | Unset)
Get the detail of a single sample analysed by MGnify

MGnify samples inherit directly from samples (or BioSamples) in ENA.

Supported parameters:
- accession: (str)
List all studies analysed by MGnify

MGnify studies inherit directly from studies (or projects) in ENA.

Supported parameters:
- order: (ListMgnifyStudiesOrderType0 | None | Unset)
- biome_lineage: (None | str | Unset) The lineage to match, including all descendant biomes
- has_analyses_from_pipeline: (None | PipelineVersions | Unset) If set, will only show studies with analyses from the specified MGnify pipeline version
- search: (None | str | Unset) Search within study titles and accessions
- page: (int | Unset) Default: 1.
- page_size: (int | None | Unset)
Get the detail of a single study analysed by MGnify

MGnify studies inherit directly from studies (or projects) in ENA.

Supported parameters:
- accession: (str)
List all analysed runs

List all analysed runs in the MGnify database.

Supported parameters:
- has_experiment_type: (ExperimentTypes | None | Unset) If set, will only show runs with the specified experiment type
- page: (int | Unset) Default: 1.
- page_size: (int | None | Unset)
Get the detail of a single analysed run

Get the detail of a single analysed run in the MGnify database.

Supported parameters:
- accession: (str)
List all biomes

List all biomes in the MGnify database.

Supported parameters:
- biome_lineage: (None | str | Unset) The lineage to match, including all descendant biomes
- max_depth: (int | None | Unset) Maximum depth of the biome lineage to include, e.g. `root` is 1 and `root:Host-Associated:Human` is level 3
- page: (int | Unset) Default: 1.
- page_size: (int | None | Unset)
List all biomes

List all biomes in the MGnify database.

Supported parameters:
- biome_lineage: (None | str | Unset) The lineage to match, including all descendant biomes
- max_depth: (int | None | Unset) Maximum depth of the biome lineage to include, e.g. `root` is 1 and `root:Host-Associated:Human` is level 3
- page: (int | Unset) Default: 1.
- page_size: (int | None | Unset)
List all genome catalogues

MGnify Genomes Catalogues are biome-specific collections of isolate and MAG genomes.

Supported parameters:
- page: (int | Unset) Default: 1.
- page_size: (int | None | Unset)
Get genome catalogue by ID

Supported parameters:
- catalogue_id: (str)
✅ describe_resources() returned: {}

10. 🗺️ What comes next#

Milestones for the 2-hour PyPI publish session#

| #  | Fix                                   | File                | Time   |
|----|---------------------------------------|---------------------|--------|
| M1 | Create cli.py with main()             | mgnipy/cli.py (new) | 15 min |
| M2 | Pass config to proxy constructors     | mgnipy/mgnipy.py:52 | 10 min |
| M3 | Narrow testpaths in pytest config     | pyproject.toml      | 5 min  |
| M4 | Build wheel and test-install          | –                   | 15 min |
| M5 | Rewrite README with accurate examples | README.md           | 20 min |
| M6 | Add CHANGELOG                         | CHANGELOG.md (new)  | 5 min  |
| M7 | Tag version and push                  | git                 | 10 min |

To run the milestone tests#

# All milestones (offline, no API calls)
uv run pytest tests/milestones/test_milestones.py -v

# Include live-API regression tests
uv run pytest tests/milestones/test_milestones.py -v -m live_api

# After fixing M1, remove @pytest.mark.xfail from TestM1_CLI.test_after_fix_*
# and re-run β€” those tests should now be green

Deferred post-publish#

  • SingleResource – lazy accession-keyed objects (studies["MGYS00001422"])

  • .order_by(), .exists() methods

  • describe_resources() implementation

  • 85% test coverage with mocked API calls

  • Full docstring pass