mgnipy.V2 package

class mgnipy.V2.MGnifier(resource, *, config=None, params=None, **kwargs)[source]#

Bases: QuerySet

(Facade) MGnifier is the main user-facing class representing a queryable MGnify resource. It provides methods for fetching and navigating data from the MGnify API.

Parameters:
  • resource (Literal ['biomes', 'biome', 'studies', 'study', 'samples', 'sample', 'runs', 'run', 'genomes', 'genome', 'analyses', 'analysis', 'assemblies', 'assembly', 'publications', 'publication', 'catalogues', 'catalogue'])

  • config (dict | None)

  • params (dict [str , Any ] | None)

get(*args, **kwargs)[source]#
async aget(*args, **kwargs)[source]#
page(*args, **kwargs)[source]#
async apage(*args, **kwargs)[source]#
async afirst()#

Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

property base_url: str #
property data: dict [int , list [dict [str , Any ]]]#

The retrieved results for the current resource.

describe_endpoint(as_dict=False)#
Parameters:

as_dict (bool )

Return type:

dict [str , str ] | None

describe_relationships()#
dry_run(*, verbose=True)#

Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.

Return type:

None

Parameters:

verbose (bool )
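
As a rough illustration of the estimate that dry_run() describes, the number of pages follows from the record count and the page size by ceiling division. The sketch below is standalone Python, not mgnipy code; the names estimate_pages and print_plan and the example numbers are assumptions made for illustration.

```python
import math


def estimate_pages(count: int, page_size: int) -> int:
    """Return the number of pages needed to hold `count` records.

    Hypothetical helper, not part of mgnipy.
    """
    if page_size <= 0:
        raise ValueError("page_size must be a positive integer")
    # Ceiling division: a partially filled last page still costs one request.
    return math.ceil(count / page_size)


def print_plan(count: int, page_size: int) -> None:
    """Print a short summary, similar in spirit to a dry-run plan."""
    pages = estimate_pages(count, page_size)
    print(f"records: {count}, page size: {page_size}, pages: {pages}")


print_plan(count=1234, page_size=25)
```

Running such a check before a full retrieval makes it easier to judge whether a smaller filter or a larger page size is worthwhile.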

property emgapi_docs: str #
property emgapi_resource: str | None #

Retrieves the name of the endpoint resource based on the endpoint module.

Returns:

The name of the endpoint resource, or None if the endpoint module is not set.

Return type:

str or None

property endpoint_module: Callable #
explain(head=None)#

Print example URLs that would be called. Actual requests are handled by the client.

Parameters:

head (int | None)

Return type:

None

filter(**filters)#

Update the parameters for the API call to filter results.

Parameters:

**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.

Returns:

A new QuerySet instance with updated parameters for filtering results.

Return type:

QuerySet

first()#

Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

property id_param_key: str #
property identifier: str | None #

Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.

Returns:

The identifier value corresponding to the resource type, or None if not available.

Return type:

str or None

list_relationships()#
Return type:

list [str ]

list_supported_params()#

Lists supported keyword arguments for the endpoint module.

Returns:

List of supported keyword argument names.

Return type:

list of str

list_urls()#

Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.

Returns:

A list of URLs corresponding to each API request that would be made.

Return type:

list of str
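
The idea behind list_urls() can be sketched as building one URL per page from a base URL and the current parameters. This is a minimal standalone sketch, not the library's implementation; the host example.org, the parameter names page and page_size, and the helper build_page_urls are all assumptions.

```python
from urllib.parse import urlencode


def build_page_urls(base_url: str, params: dict, total_pages: int) -> list[str]:
    """Build one URL per page, numbering pages from 1 (hypothetical helper)."""
    urls = []
    for page in range(1, total_pages + 1):
        # Merge the shared parameters with the page number for this request.
        query = urlencode({**params, "page": page})
        urls.append(f"{base_url}?{query}")
    return urls


# Placeholder host and parameter names, chosen only for illustration.
urls = build_page_urls("https://example.org/api/studies", {"page_size": 25}, 3)
for url in urls:
    print(url)
```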

page_size(n)#

Set the page size for paginated API calls.

Parameters:

n (int )

Returns:

A new QuerySet instance with the updated page size parameter.

Return type:

QuerySet
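
Both filter() and page_size() are documented as returning a new QuerySet rather than modifying the current one. The sketch below shows one common way to get that behaviour with a frozen dataclass. SketchQuery is a hypothetical stand-in, not the mgnipy class, and the filter name biome_lineage is used only as an example.

```python
from dataclasses import dataclass, field, replace
from typing import Any


@dataclass(frozen=True)
class SketchQuery:
    """Hypothetical stand-in for a QuerySet; not the mgnipy class."""

    resource: str
    params: dict[str, Any] = field(default_factory=dict)

    def filter(self, **filters: Any) -> "SketchQuery":
        # Merge into a copy so the original query is left unchanged.
        return replace(self, params={**self.params, **filters})

    def page_size(self, n: int) -> "SketchQuery":
        if n <= 0:
            raise ValueError("page size must be positive")
        return self.filter(page_size=n)


base = SketchQuery("studies")
narrowed = base.filter(biome_lineage="root:Environmental").page_size(50)
print(base.params)
print(narrowed.params)
```

Because each call returns a fresh object, a base query can be reused as the starting point for several differently filtered queries without one affecting another.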

property pagination_status: bool #

Check if the current resource requires pagination based on its supported keyword arguments.

Returns:

True if the resource is paginated, False otherwise.

Return type:

bool

property params: dict [str , Any ]#
preview()#

Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.

Returns:

A DataFrame containing the metadata from the first page of results.

Return type:

pd.DataFrame

Raises:

RuntimeError – If the API call fails or if no data is available to preview.

property request_url: str #

Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.

Returns:

The constructed URL for the API request.

Return type:

str

resolve_query_string(**kwargs)#

Resolves the query string for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the query string.

Returns:

The resolved query string.

Return type:

str

property resource: SupportedEndpoints#
property results: dict [int , list [dict ]]#
property results_ids: list [str ] | None #

Get a list of accessions from the retrieved metadata results, if available.

Returns:

A list of accession strings if available, otherwise None.

Return type:

list of str or None

sub_url(**kwargs)#

Constructs the sub-URL for the endpoint based on the current parameters.

Returns:

The constructed sub-URL, or None if the endpoint module is not set.

Return type:

str or None

to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#

Convert the current or provided metadata to a pandas DataFrame.

Parameters:
  • data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.

  • expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.

  • rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.

  • **kwargs – Additional keyword arguments passed to pd.DataFrame.

Returns:

DataFrame containing the metadata.

Return type:

pd.DataFrame | None

Raises:

RuntimeError – If no data is available to convert.
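
To make the expand_nested_dicts and rename_columns options more concrete, here is a pure-Python sketch of the same two steps applied to a list of records. It does not use pandas or mgnipy; the helper expand_and_rename, the dotted column naming, and the record fields are assumptions for illustration, and the library's actual column naming may differ.

```python
from typing import Any


def expand_and_rename(
    records: list[dict[str, Any]],
    expand: list[str],
    rename: dict[str, str],
) -> list[dict[str, Any]]:
    """Flatten selected nested dicts, then rename columns (hypothetical helper)."""
    rows = []
    for record in records:
        row: dict[str, Any] = {}
        for key, value in record.items():
            if key in expand and isinstance(value, dict):
                # Prefix nested keys with the parent key to avoid clashes.
                for sub_key, sub_value in value.items():
                    row[f"{key}.{sub_key}"] = sub_value
            else:
                row[key] = value
        # Apply the old-name -> new-name mapping; unmapped names are kept.
        rows.append({rename.get(k, k): v for k, v in row.items()})
    return rows


# Invented record, shaped only to demonstrate the transformation.
records = [{"accession": "MGYS00001234", "biome": {"lineage": "root", "name": "root"}}]
rows = expand_and_rename(records, expand=["biome"], rename={"biome.lineage": "lineage"})
print(rows)
```

A list of flat dictionaries like rows is a shape that pd.DataFrame accepts directly.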

to_json(data=None, orient='records', lines=True, **json_kwargs)#

Convert the current metadata to a JSON string or save it to a file.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **json_kwargs – Additional keyword arguments passed to the JSON serialization function.

  • orient (str )

  • lines (bool )

Returns:

The JSON string representation of the metadata, or None if no data is available.

Return type:

str or None

Raises:

RuntimeError – If no data is available to convert.
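
The defaults orient='records' and lines=True match the pandas DataFrame.to_json options that produce JSON Lines, that is, one JSON object per line. Whether mgnipy delegates to pandas here is not stated in this reference, so the following is only a standard-library sketch of that output shape for paginated data; the helper to_json_lines and the sample records are assumptions.

```python
import json


def to_json_lines(pages: dict[int, list[dict]]) -> str:
    """Serialize paginated records as JSON Lines (hypothetical helper)."""
    lines = []
    # Walk pages in ascending page order so the output order is stable.
    for page_number in sorted(pages):
        for record in pages[page_number]:
            lines.append(json.dumps(record, sort_keys=True))
    return "\n".join(lines)


# Pages are deliberately out of order to show the sorting step.
pages = {2: [{"accession": "C"}], 1: [{"accession": "A"}, {"accession": "B"}]}
text = to_json_lines(pages)
print(text)
```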

to_list(data=None)#

Convert the current or provided metadata to a list of dictionaries.

Parameters:

data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.

Returns:

A list of metadata records as dictionaries, or None if no data is available.

Return type:

list of dict | None

Raises:

RuntimeError – If no data is available to convert.

to_polars(data=None, **polars_kwargs)#

Convert the current metadata to a Polars DataFrame.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **polars_kwargs – Additional keyword arguments passed to pl.DataFrame.

Returns:

A Polars DataFrame containing the metadata.

Return type:

pl.DataFrame

Raises:

RuntimeError – If no data is available to convert.

url_path(**kwargs)#

Constructs the full URL path for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the URL construction.

Returns:

The constructed URL path.

Return type:

str

validate_endpoint_kwargs(**kwargs)#

Validates the provided keyword arguments against the supported parameters of the endpoint module.

Parameters:

**kwargs – Keyword arguments to validate.

Returns:

The validated keyword arguments.

Return type:

dict of str to Any

Raises:

ValueError – If any provided keyword argument is not supported by the endpoint module.
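
The validation described above amounts to comparing the supplied keys against a known set and raising ValueError for anything else. A minimal standalone sketch follows; the helper validate_kwargs and the parameter names in SUPPORTED are hypothetical and are not taken from mgnipy.

```python
from typing import Any


def validate_kwargs(supported: set[str], **kwargs: Any) -> dict[str, Any]:
    """Return kwargs unchanged if every key is supported, else raise."""
    unknown = sorted(set(kwargs) - supported)
    if unknown:
        raise ValueError(
            f"Unsupported parameter(s): {', '.join(unknown)}. "
            f"Supported: {', '.join(sorted(supported))}"
        )
    return dict(kwargs)


SUPPORTED = {"page", "page_size", "biome_lineage"}  # hypothetical names
print(validate_kwargs(SUPPORTED, page_size=25))

try:
    validate_kwargs(SUPPORTED, pagesize=25)  # misspelled key
except ValueError as exc:
    print(exc)
```

Listing the supported names in the error message lets a caller spot a misspelled key without consulting list_supported_params() separately.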

class mgnipy.V2.Biomes(*, params=None, config=None, **kwargs)[source]#

Bases: MGnifyList, BiomesTreeMixin

RESOURCE: ClassVar [Literal ['biomes']] = 'biomes'#
async acollect_details(*, fetch=True, by_id=False, concurrency=None, hide_progress=False)#
Return type:

list ['QuerySet'] | dict [str , 'QuerySet']

async afirst()#

Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

async aget(*args, **kwargs)#
async aget_detail(access_param, fetch=True)#

Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Examples

sample = await samples.aget_detail({"accession": "MGYS00001234"})

Return type:

QuerySet

async aiter_details(fetch=True)#

Async version of iter_details.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An async iterator that yields child detail proxies.

Return type:

AsyncIterator of QuerySet

async apage(*args, **kwargs)#
property base_url: str #
collect_details(*, fetch=True, by_id=False)#

Collect child detail proxies into a list or dict.

Parameters:
  • fetch (bool ) – Whether to immediately fetch the details after creating the proxies.

  • by_id (bool ) – Whether to return a dict keyed by identifier instead of a list.

Returns:

A list or dict of child detail proxies.

Return type:

list of QuerySet or dict of str to QuerySet

Example

sample_detail = samples.collect_details(fetch=True, by_id=True)
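
The by_id flag switches the return shape between a list and a dict keyed by identifier. The sketch below shows that switch on plain dictionaries rather than on detail proxies; the helper collect and the key name accession are assumptions for illustration.

```python
from typing import Union


def collect(
    records: list[dict], id_key: str, by_id: bool = False
) -> Union[list[dict], dict[str, dict]]:
    """Return records as a list, or as a dict keyed by `id_key` (hypothetical)."""
    if by_id:
        # Later records overwrite earlier ones that share an identifier.
        return {str(record[id_key]): record for record in records}
    return list(records)


records = [{"accession": "A1", "n": 1}, {"accession": "A2", "n": 2}]
as_list = collect(records, "accession")
as_dict = collect(records, "accession", by_id=True)
print(as_list)
print(as_dict["A2"])
```

The dict form suits lookups by identifier, while the list form preserves the original order and any duplicates.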

property data: dict [int , list [dict [str , Any ]]]#

The retrieved results for the current resource.

describe_endpoint(as_dict=False)#
Parameters:

as_dict (bool )

Return type:

dict [str , str ] | None

describe_relationships()#
dry_run(*, verbose=True)#

Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.

Return type:

None

Parameters:

verbose (bool )

property emgapi_docs: str #
property emgapi_resource: str | None #

Retrieves the name of the endpoint resource based on the endpoint module.

Returns:

The name of the endpoint resource, or None if the endpoint module is not set.

Return type:

str or None

property endpoint_module: Callable #
explain(head=None)#

Print example URLs that would be called. Actual requests are handled by the client.

Parameters:

head (int | None)

Return type:

None

filter(**filters)#

Update the parameters for the API call to filter results.

Parameters:

**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.

Returns:

A new QuerySet instance with updated parameters for filtering results.

Return type:

QuerySet

first()#

Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

get(*args, **kwargs)#
get_detail(access_param, fetch=True)#

Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Parameters:
  • access_param (dict [str , str ]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • resource_name (Optional[str ]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

sample = samples.get_detail({"accession": "MGYS00001234"})

property id_param_key: str #
property identifier: str | None #

Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.

Returns:

The identifier value corresponding to the resource type, or None if not available.

Return type:

str or None

iter_details(fetch=True)#

Lazily iterate over child detail proxies.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An iterator that yields child detail proxies.

Return type:

Iterator of QuerySet

Example

for sample in samples.iter_details():
    sample.get()

property lineages: list [str ]#
list_relationships()#
Return type:

list [str ]

list_supported_params()#

Lists supported keyword arguments for the endpoint module.

Returns:

List of supported keyword argument names.

Return type:

list of str

list_urls()#

Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.

Returns:

A list of URLs corresponding to each API request that would be made.

Return type:

list of str

page(*args, **kwargs)#
page_size(n)#

Set the page size for paginated API calls.

Parameters:

n (int )

Returns:

A new QuerySet instance with the updated page size parameter.

Return type:

QuerySet

property pagination_status: bool #

Check if the current resource requires pagination based on its supported keyword arguments.

Returns:

True if the resource is paginated, False otherwise.

Return type:

bool

property params: dict [str , Any ]#
preview()#

Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.

Returns:

A DataFrame containing the metadata from the first page of results.

Return type:

pd.DataFrame

Raises:

RuntimeError – If the API call fails or if no data is available to preview.

property request_url: str #

Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.

Returns:

The constructed URL for the API request.

Return type:

str

resolve_query_string(**kwargs)#

Resolves the query string for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the query string.

Returns:

The resolved query string.

Return type:

str

property resource: SupportedEndpoints#
property results: dict [int , list [dict ]]#
property results_ids: list [str ] | None #

Get a list of accessions from the retrieved metadata results, if available.

Returns:

A list of accession strings if available, otherwise None.

Return type:

list of str or None

show_tree(method='compact')#
Parameters:

method (Literal ['compact', 'show', 'print', 'horizontal', 'hshow', 'h', 'hprint', 'vertical', 'vshow', 'v', 'vprint'])

sub_url(**kwargs)#

Constructs the sub-URL for the endpoint based on the current parameters.

Returns:

The constructed sub-URL, or None if the endpoint module is not set.

Return type:

str or None

to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#

Convert the current or provided metadata to a pandas DataFrame.

Parameters:
  • data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.

  • expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.

  • rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.

  • **kwargs – Additional keyword arguments passed to pd.DataFrame.

Returns:

DataFrame containing the metadata.

Return type:

pd.DataFrame | None

Raises:

RuntimeError – If no data is available to convert.

to_json(data=None, orient='records', lines=True, **json_kwargs)#

Convert the current metadata to a JSON string or save it to a file.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **json_kwargs – Additional keyword arguments passed to the JSON serialization function.

  • orient (str )

  • lines (bool )

Returns:

The JSON string representation of the metadata, or None if no data is available.

Return type:

str or None

Raises:

RuntimeError – If no data is available to convert.

to_list(data=None)#

Convert the current or provided metadata to a list of dictionaries.

Parameters:

data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.

Returns:

A list of metadata records as dictionaries, or None if no data is available.

Return type:

list of dict | None

Raises:

RuntimeError – If no data is available to convert.

to_polars(data=None, **polars_kwargs)#

Convert the current metadata to a Polars DataFrame.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **polars_kwargs – Additional keyword arguments passed to pl.DataFrame.

Returns:

A Polars DataFrame containing the metadata.

Return type:

pl.DataFrame

Raises:

RuntimeError – If no data is available to convert.

property tree: Tree#

Convert the biomes metadata to a tree structure for visualization or analysis.

Returns:

A tree representation of the biomes and their relationships.

Return type:

Tree
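
One way to picture the conversion from biome lineages to a tree is to split each lineage string into its levels and nest them. The sketch below assumes colon-separated lineages such as root:Environmental:Aquatic and uses plain nested dicts; the Tree type returned by the property is not described in this reference, and build_tree and render are hypothetical helpers.

```python
def build_tree(lineages: list[str], sep: str = ":") -> dict:
    """Nest lineage strings into a dict of dicts keyed by biome name."""
    root: dict = {}
    for lineage in lineages:
        node = root
        for part in lineage.split(sep):
            # Create the child on first sight, then descend into it.
            node = node.setdefault(part, {})
    return root


def render(tree: dict, indent: int = 0) -> list[str]:
    """Return an indented, alphabetically sorted outline of the tree."""
    lines = []
    for name in sorted(tree):
        lines.append("  " * indent + name)
        lines.extend(render(tree[name], indent + 1))
    return lines


# Example lineages; the separator and names are assumptions.
lineages = ["root", "root:Environmental", "root:Environmental:Aquatic", "root:Engineered"]
tree = build_tree(lineages)
print("\n".join(render(tree)))
```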

url_path(**kwargs)#

Constructs the full URL path for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the URL construction.

Returns:

The constructed URL path.

Return type:

str

validate_endpoint_kwargs(**kwargs)#

Validates the provided keyword arguments against the supported parameters of the endpoint module.

Parameters:

**kwargs – Keyword arguments to validate.

Returns:

The validated keyword arguments.

Return type:

dict of str to Any

Raises:

ValueError – If any provided keyword argument is not supported by the endpoint module.

class mgnipy.V2.Studies(*, params=None, config=None, **kwargs)[source]#

Bases: MGnifyList

RESOURCE: ClassVar [Literal ['studies']] = 'studies'#
async acollect_details(*, fetch=True, by_id=False, concurrency=None, hide_progress=False)#
Return type:

list ['QuerySet'] | dict [str , 'QuerySet']
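
The concurrency argument of acollect_details() is not documented here. A common way to cap the number of simultaneous requests in asyncio code is a semaphore, sketched below. This is an assumption about the general technique, not a description of mgnipy internals; fetch_detail is a stand-in that sleeps instead of calling the API, and collect_bounded is a hypothetical helper.

```python
import asyncio


async def fetch_detail(identifier: str) -> dict:
    """Stand-in for a network call; sleeps briefly instead of fetching."""
    await asyncio.sleep(0.01)
    return {"accession": identifier}


async def collect_bounded(identifiers: list[str], concurrency: int = 2) -> list[dict]:
    """Fetch all details with at most `concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(identifier: str) -> dict:
        # Each task waits here until one of the `concurrency` slots is free.
        async with semaphore:
            return await fetch_detail(identifier)

    # gather() returns results in the same order as its inputs.
    return await asyncio.gather(*(guarded(i) for i in identifiers))


details = asyncio.run(collect_bounded(["A1", "A2", "A3"], concurrency=2))
print(details)
```

Bounding concurrency this way keeps a large collection from opening hundreds of connections at once while still overlapping the waiting time of individual requests.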

async afirst()#

Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

async aget(*args, **kwargs)#
async aget_detail(access_param, fetch=True)#

Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Examples

sample = await samples.aget_detail({"accession": "MGYS00001234"})

Return type:

QuerySet

async aiter_details(fetch=True)#

Async version of iter_details.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An async iterator that yields child detail proxies.

Return type:

AsyncIterator of QuerySet

async apage(*args, **kwargs)#
property base_url: str #
collect_details(*, fetch=True, by_id=False)#

Collect child detail proxies into a list or dict.

Parameters:
  • fetch (bool ) – Whether to immediately fetch the details after creating the proxies.

  • by_id (bool ) – Whether to return a dict keyed by identifier instead of a list.

Returns:

A list or dict of child detail proxies.

Return type:

list of QuerySet or dict of str to QuerySet

Example

sample_detail = samples.collect_details(fetch=True, by_id=True)

property data: dict [int , list [dict [str , Any ]]]#

The retrieved results for the current resource.

describe_endpoint(as_dict=False)#
Parameters:

as_dict (bool )

Return type:

dict [str , str ] | None

describe_relationships()#
dry_run(*, verbose=True)#

Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.

Return type:

None

Parameters:

verbose (bool )

property emgapi_docs: str #
property emgapi_resource: str | None #

Retrieves the name of the endpoint resource based on the endpoint module.

Returns:

The name of the endpoint resource, or None if the endpoint module is not set.

Return type:

str or None

property endpoint_module: Callable #
explain(head=None)#

Print example URLs that would be called. Actual requests are handled by the client.

Parameters:

head (int | None)

Return type:

None

filter(**filters)#

Update the parameters for the API call to filter results.

Parameters:

**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.

Returns:

A new QuerySet instance with updated parameters for filtering results.

Return type:

QuerySet

first()#

Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

get(*args, **kwargs)#
get_detail(access_param, fetch=True)#

Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Parameters:
  • access_param (dict [str , str ]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • resource_name (Optional[str ]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

sample = samples.get_detail({"accession": "MGYS00001234"})

property id_param_key: str #
property identifier: str | None #

Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.

Returns:

The identifier value corresponding to the resource type, or None if not available.

Return type:

str or None

iter_details(fetch=True)#

Lazily iterate over child detail proxies.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An iterator that yields child detail proxies.

Return type:

Iterator of QuerySet

Example

for sample in samples.iter_details():
    sample.get()

list_relationships()#
Return type:

list [str ]

list_supported_params()#

Lists supported keyword arguments for the endpoint module.

Returns:

List of supported keyword argument names.

Return type:

list of str

list_urls()#

Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.

Returns:

A list of URLs corresponding to each API request that would be made.

Return type:

list of str

page(*args, **kwargs)#
page_size(n)#

Set the page size for paginated API calls.

Parameters:

n (int )

Returns:

A new QuerySet instance with the updated page size parameter.

Return type:

QuerySet

property pagination_status: bool #

Check if the current resource requires pagination based on its supported keyword arguments.

Returns:

True if the resource is paginated, False otherwise.

Return type:

bool

property params: dict [str , Any ]#
preview()#

Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.

Returns:

A DataFrame containing the metadata from the first page of results.

Return type:

pd.DataFrame

Raises:

RuntimeError – If the API call fails or if no data is available to preview.

property request_url: str #

Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.

Returns:

The constructed URL for the API request.

Return type:

str

resolve_query_string(**kwargs)#

Resolves the query string for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the query string.

Returns:

The resolved query string.

Return type:

str

property resource: SupportedEndpoints#
property results: dict [int , list [dict ]]#
property results_ids: list [str ] | None #

Get a list of accessions from the retrieved metadata results, if available.

Returns:

A list of accession strings if available, otherwise None.

Return type:

list of str or None

sub_url(**kwargs)#

Constructs the sub-URL for the endpoint based on the current parameters.

Returns:

The constructed sub-URL, or None if the endpoint module is not set.

Return type:

str or None

to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#

Convert the current or provided metadata to a pandas DataFrame.

Parameters:
  • data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.

  • expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.

  • rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.

  • **kwargs – Additional keyword arguments passed to pd.DataFrame.

Returns:

DataFrame containing the metadata.

Return type:

pd.DataFrame | None

Raises:

RuntimeError – If no data is available to convert.

to_json(data=None, orient='records', lines=True, **json_kwargs)#

Convert the current metadata to a JSON string or save it to a file.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **json_kwargs – Additional keyword arguments passed to the JSON serialization function.

  • orient (str )

  • lines (bool )

Returns:

The JSON string representation of the metadata, or None if no data is available.

Return type:

str or None

Raises:

RuntimeError – If no data is available to convert.

to_list(data=None)#

Convert the current or provided metadata to a list of dictionaries.

Parameters:

data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.

Returns:

A list of metadata records as dictionaries, or None if no data is available.

Return type:

list of dict | None

Raises:

RuntimeError – If no data is available to convert.

to_polars(data=None, **polars_kwargs)#

Convert the current metadata to a Polars DataFrame.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **polars_kwargs – Additional keyword arguments passed to pl.DataFrame.

Returns:

A Polars DataFrame containing the metadata.

Return type:

pl.DataFrame

Raises:

RuntimeError – If no data is available to convert.

url_path(**kwargs)#

Constructs the full URL path for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the URL construction.

Returns:

The constructed URL path.

Return type:

str

validate_endpoint_kwargs(**kwargs)#

Validates the provided keyword arguments against the supported parameters of the endpoint module.

Parameters:

**kwargs – Keyword arguments to validate.

Returns:

The validated keyword arguments.

Return type:

dict of str to Any

Raises:

ValueError – If any provided keyword argument is not supported by the endpoint module.

child_resource: str #
config: MgnipyConfig#
exec: QueryExecutor#
count: int | None #
total_pages: int | None #
default_page_size: int #
request_urls: list [str ] | None #
class mgnipy.V2.Samples(*, params=None, config=None, **kwargs)[source]#

Bases: MGnifyList

RESOURCE: ClassVar [Literal ['samples']] = 'samples'#
async acollect_details(*, fetch=True, by_id=False, concurrency=None, hide_progress=False)#
Return type:

list ['QuerySet'] | dict [str , 'QuerySet']

async afirst()#

Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

async aget(*args, **kwargs)#
async aget_detail(access_param, fetch=True)#

Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Examples

sample = await samples.aget_detail({"accession": "MGYS00001234"})

Return type:

QuerySet

async aiter_details(fetch=True)#

Async version of iter_details.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An async iterator that yields child detail proxies.

Return type:

AsyncIterator of QuerySet

async apage(*args, **kwargs)#
property base_url: str #
collect_details(*, fetch=True, by_id=False)#

Collect child detail proxies into a list or dict.

Parameters:
  • fetch (bool ) – Whether to immediately fetch the details after creating the proxies.

  • by_id (bool ) – Whether to return a dict keyed by identifier instead of a list.

Returns:

A list or dict of child detail proxies.

Return type:

list of QuerySet or dict of str to QuerySet

Example

sample_detail = samples.collect_details(fetch=True, by_id=True)

property data: dict [int , list [dict [str , Any ]]]#

The retrieved results for the current resource.

describe_endpoint(as_dict=False)#
Parameters:

as_dict (bool )

Return type:

dict [str , str ] | None

describe_relationships()#
dry_run(*, verbose=True)#

Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.

Return type:

None

Parameters:

verbose (bool )

property emgapi_docs: str #
property emgapi_resource: str | None #

Retrieves the name of the endpoint resource based on the endpoint module.

Returns:

The name of the endpoint resource, or None if the endpoint module is not set.

Return type:

str or None

property endpoint_module: Callable #
explain(head=None)#

Print example URLs that would be called. Actual requests are handled by the client.

Parameters:

head (int | None)

Return type:

None

filter(**filters)#

Update the parameters for the API call to filter results.

Parameters:

**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.

Returns:

A new QuerySet instance with updated parameters for filtering results.

Return type:

QuerySet

first()#

Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

get(*args, **kwargs)#
get_detail(access_param, fetch=True)#

Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Parameters:
  • access_param (dict [str , str ]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • resource_name (Optional[str ]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

sample = samples.get_detail({"accession": "MGYS00001234"})

property id_param_key: str #
property identifier: str | None #

Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.

Returns:

The identifier value corresponding to the resource type, or None if not available.

Return type:

str or None

iter_details(fetch=True)#

Lazily iterate over child detail proxies.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An iterator that yields child detail proxies.

Return type:

Iterator of QuerySet

Example

for sample in samples.iter_details():
    sample.get()

list_relationships()#
Return type:

list [str ]

list_supported_params()#

Lists supported keyword arguments for the endpoint module.

Returns:

List of supported keyword argument names.

Return type:

list of str

list_urls()#

Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.

Returns:

A list of URLs corresponding to each API request that would be made.

Return type:

list of str
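One way to picture the returned list is one URL per page of results. The sketch below derives the page count from a record count and a page size, then builds the URLs. The function name, the query parameter names page and page_size, and the host are all assumptions for illustration; the real URLs come from list_urls() itself.

```python
from math import ceil
from urllib.parse import urlencode


def sketch_list_urls(base_url: str, count: int, page_size: int, **params: str) -> list[str]:
    """Build one URL per page needed to cover `count` records."""
    total_pages = ceil(count / page_size)
    urls = []
    for page in range(1, total_pages + 1):
        # Extra filters come first, followed by the pagination parameters.
        query = urlencode({**params, "page": page, "page_size": page_size})
        urls.append(f"{base_url}?{query}")
    return urls


# Placeholder host, chosen only for illustration.
urls = sketch_list_urls("https://example.org/api/studies", count=120, page_size=50)
```

With 120 records and 50 records per page, the sketch produces three URLs, the last of which covers the final 20 records.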

page(*args, **kwargs)#
page_size(n)#

Set the page size for paginated API calls.

Parameters:

n (int )

Returns:

A new QuerySet instance with the updated page size parameter.

Return type:

QuerySet

property pagination_status: bool #

Check if the current resource requires pagination based on its supported keyword arguments.

Returns:

True if the resource is paginated, False otherwise.

Return type:

bool

property params: dict [str , Any ]#
preview()#

Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.

Returns:

A DataFrame containing the metadata from the first page of results.

Return type:

pd.DataFrame

Raises:

RuntimeError – If the API call fails or if no data is available to preview.

property request_url: str #

Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.

Returns:

The constructed URL for the API request.

Return type:

str

resolve_query_string(**kwargs)#

Resolves the query string for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the query string.

Returns:

The resolved query string.

Return type:

str

property resource: SupportedEndpoints#
property results: dict [int , list [dict ]]#
property results_ids: list [str ] | None #

Get a list of accessions from the retrieved metadata results, if available.

Returns:

A list of accession strings if available, otherwise None.

Return type:

list of str or None

sub_url(**kwargs)#

Constructs the sub-URL for the endpoint based on the current parameters.

Returns:

The constructed sub-URL, or None if the endpoint module is not set.

Return type:

str or None

to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#

Convert the current or provided metadata to a pandas DataFrame.

Parameters:
  • data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.

  • expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.

  • rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.

  • **kwargs – Additional keyword arguments passed to pd.DataFrame.

Returns:

DataFrame containing the metadata.

Return type:

pd.DataFrame | None

Raises:

RuntimeError – If no data is available to convert.

to_json(data=None, orient='records', lines=True, **json_kwargs)#

Convert the current metadata to a JSON string or save it to a file.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **json_kwargs – Additional keyword arguments passed to the JSON serialization function.

  • orient (str )

  • lines (bool )

Returns:

The JSON string representation of the metadata, or None if no data is available.

Return type:

str or None

Raises:

RuntimeError – If no data is available to convert.

to_list(data=None)#

Convert the current or provided metadata to a list of dictionaries.

Parameters:

data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.

Returns:

A list of metadata records as dictionaries, or None if no data is available.

Return type:

list of dict | None

Raises:

RuntimeError – If no data is available to convert.
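The input type (dict of int to list of dict) suggests that records are grouped by page. A minimal sketch of flattening that shape into a single list, in page order, is shown below. The function is a hypothetical stand-in, and raising on empty input (rather than returning None) is an assumption taken from the Raises entry above.

```python
from typing import Any


def sketch_to_list(pages: dict[int, list[dict[str, Any]]]) -> list[dict[str, Any]]:
    """Flatten {page_number: [records]} into one list, in page order."""
    if not pages:
        raise RuntimeError("No data is available to convert.")
    # Sort the page numbers so records keep their original order
    # even if pages were stored out of sequence.
    return [record for page in sorted(pages) for record in pages[page]]


pages = {2: [{"accession": "C"}], 1: [{"accession": "A"}, {"accession": "B"}]}
records = sketch_to_list(pages)
```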

to_polars(data=None, **polars_kwargs)#

Convert the current metadata to a Polars DataFrame.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **polars_kwargs – Additional keyword arguments passed to pl.DataFrame.

Returns:

A Polars DataFrame containing the metadata.

Return type:

pl.DataFrame

Raises:

RuntimeError – If no data is available to convert.

url_path(**kwargs)#

Constructs the full URL path for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the URL construction.

Returns:

The constructed URL path.

Return type:

str

validate_endpoint_kwargs(**kwargs)#

Validates the provided keyword arguments against the supported parameters of the endpoint module.

Parameters:

**kwargs – Keyword arguments to validate.

Returns:

The validated keyword arguments.

Return type:

dict of str to Any

Raises:

ValueError – If any provided keyword argument is not supported by the endpoint module.
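The validation described above can be sketched as a set comparison between the supplied keyword names and the supported ones. The function and the parameter names in SUPPORTED are hypothetical; list_supported_params() reports the real names for a given resource.

```python
from typing import Any

# Hypothetical parameter names; call list_supported_params() for the real ones.
SUPPORTED = {"accession", "page", "page_size"}


def sketch_validate(supported: set[str], **kwargs: Any) -> dict[str, Any]:
    """Return kwargs unchanged, or raise ValueError naming the unsupported keys."""
    unknown = sorted(set(kwargs) - supported)
    if unknown:
        raise ValueError(f"Unsupported parameter(s): {', '.join(unknown)}")
    return dict(kwargs)


validated = sketch_validate(SUPPORTED, page=1, page_size=25)

try:
    sketch_validate(SUPPORTED, colour="blue")
except ValueError as err:
    message = str(err)
```

Naming the offending keys in the error message makes a misspelled filter easy to spot before any request is sent.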

child_resource: str #
config: MgnipyConfig#
exec: QueryExecutor#
count: int | None #
total_pages: int | None #
default_page_size: int #
request_urls: list [str ] | None #
class mgnipy.V2.Analyses(*, params=None, config=None, **kwargs)[source]#

Bases: MGnifyList

RESOURCE: ClassVar [Literal ['analyses']] = 'analyses'#
async acollect_details(*, fetch=True, by_id=False, concurrency=None, hide_progress=False)#
Return type:

list ['QuerySet'] | dict [str , 'QuerySet']
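The signature exposes a concurrency argument, but this page does not say how it is applied. A common way to cap the number of simultaneous requests is an asyncio.Semaphore, sketched below under that assumption. sketch_fetch and sketch_collect are hypothetical stand-ins that make no network calls.

```python
import asyncio


async def sketch_fetch(identifier: str) -> dict[str, str]:
    # Stand-in for a network request; sleeps briefly instead of calling an API.
    await asyncio.sleep(0.01)
    return {"accession": identifier}


async def sketch_collect(ids: list[str], concurrency: int = 2, by_id: bool = False):
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(identifier: str) -> dict[str, str]:
        # At most `concurrency` coroutines hold the semaphore at once.
        async with semaphore:
            return await sketch_fetch(identifier)

    # gather() returns results in input order, regardless of completion order.
    details = await asyncio.gather(*(bounded(i) for i in ids))
    if by_id:
        return dict(zip(ids, details))
    return list(details)


collected = asyncio.run(sketch_collect(["A1", "A2", "A3"], by_id=True))
```

With by_id=True the sketch returns a dict keyed by identifier, mirroring the two return shapes listed above.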

async afirst()#

Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

async aget(*args, **kwargs)#
async aget_detail(access_param, fetch=True)#

Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Examples

sample = await samples.aget_detail({"accession": "MGYS00001234"})

Return type:

QuerySet

async aiter_details(fetch=True)#

Async version of iter_details.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An async iterator that yields child detail proxies.

Return type:

AsyncIterator of QuerySet

async apage(*args, **kwargs)#
property base_url: str #
collect_details(*, fetch=True, by_id=False)#

Collect child detail proxies into a list or dict.

Parameters:
  • fetch (bool ) – Whether to immediately fetch the details after creating the proxies.

  • by_id (bool ) – Whether to return a dict keyed by identifier instead of a list.

Returns:

A list or dict of child detail proxies.

Return type:

list of QuerySet or dict of str to QuerySet

Example

sample_detail = samples.collect_details(fetch=True, by_id=True)

property data: dict [int , list [dict [str , Any ]]]#

Paginated results for the current resource.

describe_endpoint(as_dict=False)#
Parameters:

as_dict (bool )

Return type:

dict [str , str ] | None

describe_relationships()#
dry_run(*, verbose=True)#

Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.

Return type:

None

Parameters:

verbose (bool )

property emgapi_docs: str #
property emgapi_resource: str | None #

Retrieves the name of the endpoint resource based on the endpoint module.

Returns:

The name of the endpoint resource, or None if the endpoint module is not set.

Return type:

str or None

property endpoint_module: Callable #
explain(head=None)#

Print example URLs that would be called. The actual requests are handled by the client.

Parameters:

head (int | None)

Return type:

None

filter(**filters)#

Update the parameters for the API call to filter results.

Parameters:

**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.

Returns:

A new QuerySet instance with updated parameters for filtering results.

Return type:

QuerySet

first()#

Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

get(*args, **kwargs)#
get_detail(access_param, fetch=True)#

Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Parameters:
  • access_param (dict [str , str ]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • resource_name (Optional[str ]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

sample = samples.get_detail({"accession": "MGYS00001234"})

property id_param_key: str #
property identifier: str | None #

Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.

Returns:

The identifier value corresponding to the resource type, or None if not available.

Return type:

str or None

iter_details(fetch=True)#

Lazily iterate over child detail proxies.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An iterator that yields child detail proxies.

Return type:

Iterator of QuerySet

Example

for sample in samples.iter_details():
    sample.get()

list_relationships()#
Return type:

list [str ]

list_supported_params()#

Lists supported keyword arguments for the endpoint module.

Returns:

List of supported keyword argument names.

Return type:

list of str

list_urls()#

Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.

Returns:

A list of URLs corresponding to each API request that would be made.

Return type:

list of str

page(*args, **kwargs)#
page_size(n)#

Set the page size for paginated API calls.

Parameters:

n (int )

Returns:

A new QuerySet instance with the updated page size parameter.

Return type:

QuerySet

property pagination_status: bool #

Check if the current resource requires pagination based on its supported keyword arguments.

Returns:

True if the resource is paginated, False otherwise.

Return type:

bool

property params: dict [str , Any ]#
preview()#

Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.

Returns:

A DataFrame containing the metadata from the first page of results.

Return type:

pd.DataFrame

Raises:

RuntimeError – If the API call fails or if no data is available to preview.

property request_url: str #

Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.

Returns:

The constructed URL for the API request.

Return type:

str

resolve_query_string(**kwargs)#

Resolves the query string for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the query string.

Returns:

The resolved query string.

Return type:

str

property resource: SupportedEndpoints#
property results: dict [int , list [dict ]]#
property results_ids: list [str ] | None #

Get a list of accessions from the retrieved metadata results, if available.

Returns:

A list of accession strings if available, otherwise None.

Return type:

list of str or None

sub_url(**kwargs)#

Constructs the sub-URL for the endpoint based on the current parameters.

Returns:

The constructed sub-URL, or None if the endpoint module is not set.

Return type:

str or None

to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#

Convert the current or provided metadata to a pandas DataFrame.

Parameters:
  • data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.

  • expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.

  • rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.

  • **kwargs – Additional keyword arguments passed to pd.DataFrame.

Returns:

DataFrame containing the metadata.

Return type:

pd.DataFrame | None

Raises:

RuntimeError – If no data is available to convert.

to_json(data=None, orient='records', lines=True, **json_kwargs)#

Convert the current metadata to a JSON string or save it to a file.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **json_kwargs – Additional keyword arguments passed to the JSON serialization function.

  • orient (str )

  • lines (bool )

Returns:

The JSON string representation of the metadata, or None if no data is available.

Return type:

str or None

Raises:

RuntimeError – If no data is available to convert.
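The defaults orient='records' and lines=True match the option names of pandas' DataFrame.to_json, where that combination yields line-delimited JSON (one object per line). Whether to_json() delegates to pandas is not stated here. The sketch below reproduces the same output shape with the standard library only; the function name is hypothetical.

```python
import json
from typing import Any


def sketch_to_json_lines(pages: dict[int, list[dict[str, Any]]]) -> str:
    """Serialise every record as one JSON object per line (JSON Lines)."""
    records = [record for page in sorted(pages) for record in pages[page]]
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


text = sketch_to_json_lines(
    {1: [{"accession": "A", "n": 1}], 2: [{"accession": "B", "n": 2}]}
)
```

Line-delimited output can be appended to a file and read back record by record, which suits large paginated downloads.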

to_list(data=None)#

Convert the current or provided metadata to a list of dictionaries.

Parameters:

data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.

Returns:

A list of metadata records as dictionaries, or None if no data is available.

Return type:

list of dict | None

Raises:

RuntimeError – If no data is available to convert.

to_polars(data=None, **polars_kwargs)#

Convert the current metadata to a Polars DataFrame.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **polars_kwargs – Additional keyword arguments passed to pl.DataFrame.

Returns:

A Polars DataFrame containing the metadata.

Return type:

pl.DataFrame

Raises:

RuntimeError – If no data is available to convert.

url_path(**kwargs)#

Constructs the full URL path for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the URL construction.

Returns:

The constructed URL path.

Return type:

str

validate_endpoint_kwargs(**kwargs)#

Validates the provided keyword arguments against the supported parameters of the endpoint module.

Parameters:

**kwargs – Keyword arguments to validate.

Returns:

The validated keyword arguments.

Return type:

dict of str to Any

Raises:

ValueError – If any provided keyword argument is not supported by the endpoint module.

child_resource: str #
config: MgnipyConfig#
exec: QueryExecutor#
count: int | None #
total_pages: int | None #
default_page_size: int #
request_urls: list [str ] | None #
class mgnipy.V2.Genomes(*, params=None, config=None, **kwargs)[source]#

Bases: MGnifyList

RESOURCE: ClassVar [Literal ['genomes']] = 'genomes'#
async acollect_details(*, fetch=True, by_id=False, concurrency=None, hide_progress=False)#
Return type:

list ['QuerySet'] | dict [str , 'QuerySet']

async afirst()#

Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

async aget(*args, **kwargs)#
async aget_detail(access_param, fetch=True)#

Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Examples

sample = await samples.aget_detail({"accession": "MGYS00001234"})

Return type:

QuerySet

async aiter_details(fetch=True)#

Async version of iter_details.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An async iterator that yields child detail proxies.

Return type:

AsyncIterator of QuerySet
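An async iterator is consumed with async for inside a coroutine. The sketch below shows that consumption pattern with a hypothetical async generator standing in for aiter_details(); the plain dicts it yields are placeholders for detail proxies.

```python
import asyncio
from typing import AsyncIterator


async def sketch_aiter_details(ids: list[str], fetch: bool = True) -> AsyncIterator[dict]:
    """Yield one detail stand-in per identifier, lazily."""
    for identifier in ids:
        detail = {"accession": identifier, "fetched": False}
        if fetch:
            # Stand-in for awaiting a network request.
            await asyncio.sleep(0)
            detail["fetched"] = True
        yield detail


async def sketch_consume() -> list[str]:
    seen = []
    # Each item is produced only when the loop asks for it.
    async for detail in sketch_aiter_details(["G1", "G2"]):
        seen.append(detail["accession"])
    return seen


seen = asyncio.run(sketch_consume())
```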

async apage(*args, **kwargs)#
property base_url: str #
collect_details(*, fetch=True, by_id=False)#

Collect child detail proxies into a list or dict.

Parameters:
  • fetch (bool ) – Whether to immediately fetch the details after creating the proxies.

  • by_id (bool ) – Whether to return a dict keyed by identifier instead of a list.

Returns:

A list or dict of child detail proxies.

Return type:

list of QuerySet or dict of str to QuerySet

Example

sample_detail = samples.collect_details(fetch=True, by_id=True)

property data: dict [int , list [dict [str , Any ]]]#

Paginated results for the current resource.

describe_endpoint(as_dict=False)#
Parameters:

as_dict (bool )

Return type:

dict [str , str ] | None

describe_relationships()#
dry_run(*, verbose=True)#

Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.

Return type:

None

Parameters:

verbose (bool )

property emgapi_docs: str #
property emgapi_resource: str | None #

Retrieves the name of the endpoint resource based on the endpoint module.

Returns:

The name of the endpoint resource, or None if the endpoint module is not set.

Return type:

str or None

property endpoint_module: Callable #
explain(head=None)#

Print example URLs that would be called. The actual requests are handled by the client.

Parameters:

head (int | None)

Return type:

None

filter(**filters)#

Update the parameters for the API call to filter results.

Parameters:

**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.

Returns:

A new QuerySet instance with updated parameters for filtering results.

Return type:

QuerySet

first()#

Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

get(*args, **kwargs)#
get_detail(access_param, fetch=True)#

Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Parameters:
  • access_param (dict [str , str ]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • resource_name (Optional[str ]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

sample = samples.get_detail({"accession": "MGYS00001234"})

property id_param_key: str #
property identifier: str | None #

Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.

Returns:

The identifier value corresponding to the resource type, or None if not available.

Return type:

str or None

iter_details(fetch=True)#

Lazily iterate over child detail proxies.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An iterator that yields child detail proxies.

Return type:

Iterator of QuerySet

Example

for sample in samples.iter_details():
    sample.get()

list_relationships()#
Return type:

list [str ]

list_supported_params()#

Lists supported keyword arguments for the endpoint module.

Returns:

List of supported keyword argument names.

Return type:

list of str

list_urls()#

Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.

Returns:

A list of URLs corresponding to each API request that would be made.

Return type:

list of str

page(*args, **kwargs)#
page_size(n)#

Set the page size for paginated API calls.

Parameters:

n (int )

Returns:

A new QuerySet instance with the updated page size parameter.

Return type:

QuerySet

property pagination_status: bool #

Check if the current resource requires pagination based on its supported keyword arguments.

Returns:

True if the resource is paginated, False otherwise.

Return type:

bool

property params: dict [str , Any ]#
preview()#

Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.

Returns:

A DataFrame containing the metadata from the first page of results.

Return type:

pd.DataFrame

Raises:

RuntimeError – If the API call fails or if no data is available to preview.

property request_url: str #

Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.

Returns:

The constructed URL for the API request.

Return type:

str

resolve_query_string(**kwargs)#

Resolves the query string for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the query string.

Returns:

The resolved query string.

Return type:

str

property resource: SupportedEndpoints#
property results: dict [int , list [dict ]]#
property results_ids: list [str ] | None #

Get a list of accessions from the retrieved metadata results, if available.

Returns:

A list of accession strings if available, otherwise None.

Return type:

list of str or None

sub_url(**kwargs)#

Constructs the sub-URL for the endpoint based on the current parameters.

Returns:

The constructed sub-URL, or None if the endpoint module is not set.

Return type:

str or None

to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#

Convert the current or provided metadata to a pandas DataFrame.

Parameters:
  • data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.

  • expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.

  • rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.

  • **kwargs – Additional keyword arguments passed to pd.DataFrame.

Returns:

DataFrame containing the metadata.

Return type:

pd.DataFrame | None

Raises:

RuntimeError – If no data is available to convert.
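The expand_nested_dicts parameter is described as a list of keys to expand into separate columns, although the signature default is False. The sketch below follows the description and shows what such an expansion could look like on plain records. The function is hypothetical, and the "parent.child" column naming is an assumption, not mgnipy's documented behaviour.

```python
from typing import Any


def sketch_expand(records: list[dict[str, Any]], keys: list[str]) -> list[dict[str, Any]]:
    """Lift the sub-keys of each nested dict in `keys` into top-level columns."""
    rows = []
    for record in records:
        # Keep every field that is not being expanded.
        row = {k: v for k, v in record.items() if k not in keys}
        for key in keys:
            nested = record.get(key) or {}
            for sub_key, value in nested.items():
                # "parent.child" naming is an assumption of this sketch.
                row[f"{key}.{sub_key}"] = value
        rows.append(row)
    return rows


rows = sketch_expand(
    [{"accession": "G1", "biome": {"name": "soil", "id": 7}}],
    keys=["biome"],
)
```

Each resulting row is flat, so it maps directly onto one DataFrame row with one column per key.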

to_json(data=None, orient='records', lines=True, **json_kwargs)#

Convert the current metadata to a JSON string or save it to a file.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **json_kwargs – Additional keyword arguments passed to the JSON serialization function.

  • orient (str )

  • lines (bool )

Returns:

The JSON string representation of the metadata, or None if no data is available.

Return type:

str or None

Raises:

RuntimeError – If no data is available to convert.

to_list(data=None)#

Convert the current or provided metadata to a list of dictionaries.

Parameters:

data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.

Returns:

A list of metadata records as dictionaries, or None if no data is available.

Return type:

list of dict | None

Raises:

RuntimeError – If no data is available to convert.

to_polars(data=None, **polars_kwargs)#

Convert the current metadata to a Polars DataFrame.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **polars_kwargs – Additional keyword arguments passed to pl.DataFrame.

Returns:

A Polars DataFrame containing the metadata.

Return type:

pl.DataFrame

Raises:

RuntimeError – If no data is available to convert.

url_path(**kwargs)#

Constructs the full URL path for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the URL construction.

Returns:

The constructed URL path.

Return type:

str

validate_endpoint_kwargs(**kwargs)#

Validates the provided keyword arguments against the supported parameters of the endpoint module.

Parameters:

**kwargs – Keyword arguments to validate.

Returns:

The validated keyword arguments.

Return type:

dict of str to Any

Raises:

ValueError – If any provided keyword argument is not supported by the endpoint module.

child_resource: str #
config: MgnipyConfig#
exec: QueryExecutor#
count: int | None #
total_pages: int | None #
default_page_size: int #
request_urls: list [str ] | None #
class mgnipy.V2.Client(base_url, *, raise_on_unexpected_status=False, cookies=NOTHING, headers=NOTHING, timeout=None, verify_ssl=True, follow_redirects=False, httpx_args=NOTHING)[source]#

Bases: object

A class for keeping track of data related to the API

The following are accepted as keyword arguments and will be used to construct httpx Clients internally:

base_url: The base URL for the API, all requests are made to a relative path to this URL

cookies: A dictionary of cookies to be sent with every request

headers: A dictionary of headers to be sent with every request

timeout: The maximum amount of time a request can take. API functions will raise httpx.TimeoutException if this is exceeded.

verify_ssl: Whether or not to verify the SSL certificate of the API server. This should be True in production, but can be set to False for testing purposes.

follow_redirects: Whether or not to follow redirects. Default value is False.

httpx_args: A dictionary of additional arguments to be passed to the httpx.Client and httpx.AsyncClient constructor.

raise_on_unexpected_status#

Whether or not to raise an errors.UnexpectedStatus if the API returns a status code that was not documented in the source OpenAPI document. Can also be provided as a keyword argument to the constructor.

Type:

bool

raise_on_unexpected_status: bool #
with_headers(headers)[source]#

Get a new client matching this one with additional headers

Parameters:

headers (dict [str , str ])

Return type:

Client

with_cookies(cookies)[source]#

Get a new client matching this one with additional cookies

Parameters:

cookies (dict [str , str ])

Return type:

Client

with_timeout(timeout)[source]#

Get a new client matching this one with a new timeout configuration

Parameters:

timeout (Timeout)

Return type:

Client

set_httpx_client(client)[source]#

Manually set the underlying httpx.Client

NOTE: This will override any other settings on the client, including cookies, headers, and timeout.

Parameters:

client (Client)

Return type:

Client

get_httpx_client()[source]#

Get the underlying httpx.Client, constructing a new one if not previously set

Return type:

Client

set_async_httpx_client(async_client)[source]#

Manually set the underlying httpx.AsyncClient

NOTE: This will override any other settings on the client, including cookies, headers, and timeout.

Parameters:

async_client (AsyncClient)

Return type:

Client

get_async_httpx_client()[source]#

Get the underlying httpx.AsyncClient, constructing a new one if not previously set

Return type:

AsyncClient

class mgnipy.V2.AuthenticatedClient(base_url, token, prefix='Bearer', auth_header_name='Authorization', *, raise_on_unexpected_status=False, cookies=NOTHING, headers=NOTHING, timeout=None, verify_ssl=True, follow_redirects=False, httpx_args=NOTHING)[source]#

Bases: object

A Client which has been authenticated for use on secured endpoints

The following are accepted as keyword arguments and will be used to construct httpx Clients internally:

base_url: The base URL for the API, all requests are made to a relative path to this URL

cookies: A dictionary of cookies to be sent with every request

headers: A dictionary of headers to be sent with every request

timeout: The maximum amount of time a request can take. API functions will raise httpx.TimeoutException if this is exceeded.

verify_ssl: Whether or not to verify the SSL certificate of the API server. This should be True in production, but can be set to False for testing purposes.

follow_redirects: Whether or not to follow redirects. Default value is False.

httpx_args: A dictionary of additional arguments to be passed to the httpx.Client and httpx.AsyncClient constructor.

raise_on_unexpected_status#

Whether or not to raise an errors.UnexpectedStatus if the API returns a status code that was not documented in the source OpenAPI document. Can also be provided as a keyword argument to the constructor.

Type:

bool

token#

The token to use for authentication

Type:

str

prefix#

The prefix to use for the Authorization header

Type:

str

auth_header_name#

The name of the Authorization header

Type:

str
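The token, prefix, and auth_header_name attributes together determine a single request header. The sketch below assumes the conventional "<prefix> <token>" format, with the bare token sent when the prefix is empty. The helper is hypothetical and the token is a placeholder.

```python
def sketch_auth_header(
    token: str,
    prefix: str = "Bearer",
    auth_header_name: str = "Authorization",
) -> dict[str, str]:
    """Build the header an authenticated request would carry."""
    # An empty prefix sends the bare token (an assumption of this sketch).
    value = f"{prefix} {token}" if prefix else token
    return {auth_header_name: value}


header = sketch_auth_header("my-token")
custom = sketch_auth_header("my-token", prefix="", auth_header_name="X-API-Key")
```

The defaults give a standard bearer-token header, while the second call shows how the same attributes can describe an API-key style header.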

raise_on_unexpected_status: bool #
token: str #
prefix: str #
auth_header_name: str #
with_headers(headers)[source]#

Get a new client matching this one with additional headers

Parameters:

headers (dict [str , str ])

Return type:

AuthenticatedClient

with_cookies(cookies)[source]#

Get a new client matching this one with additional cookies

Parameters:

cookies (dict [str , str ])

Return type:

AuthenticatedClient

with_timeout(timeout)[source]#

Get a new client matching this one with a new timeout configuration

Parameters:

timeout (Timeout)

Return type:

AuthenticatedClient

set_httpx_client(client)[source]#

Manually set the underlying httpx.Client

NOTE: This will override any other settings on the client, including cookies, headers, and timeout.

Parameters:

client (Client)

Return type:

AuthenticatedClient

get_httpx_client()[source]#

Get the underlying httpx.Client, constructing a new one if not previously set

Return type:

Client

set_async_httpx_client(async_client)[source]#

Manually set the underlying httpx.AsyncClient

NOTE: This will override any other settings on the client, including cookies, headers, and timeout.

Parameters:

async_client (AsyncClient)

Return type:

AuthenticatedClient

get_async_httpx_client()[source]#

Get the underlying httpx.AsyncClient, constructing a new one if not previously set

Return type:

AsyncClient

Submodules#

mgnipy.V2.core module#

class mgnipy.V2.core.MGnifier(resource, *, config=None, params=None, **kwargs)[source]#

Bases: QuerySet

(Facade) MGnifier is the main user-facing class representing a queryable MGnify resource. It provides methods for fetching and navigating data from the MGnify API.

Parameters:
  • resource (Literal ['biomes', 'biome', 'studies', 'study', 'samples', 'sample', 'runs', 'run', 'genomes', 'genome', 'analyses', 'analysis', 'assemblies', 'assembly', 'publications', 'publication', 'catalogues', 'catalogue'])

  • config (MgnipyConfig)

  • params (dict [str , Any ] | None)

get(*args, **kwargs)[source]#
async aget(*args, **kwargs)[source]#
page(*args, **kwargs)[source]#
async apage(*args, **kwargs)[source]#
async afirst()#

Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

property base_url: str #
property data: dict [int , list [dict [str , Any ]]]#

results based on the current resource.

describe_endpoint(as_dict=False)#
Parameters:

as_dict (bool )

Return type:

dict [str , str ] | None

describe_relationships()#
dry_run(*, verbose=True)#

Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.

Return type:

None

Parameters:

verbose (bool )
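
The page estimate in such a plan comes down to a ceiling division of the record count by the page size. A minimal sketch of that arithmetic, assuming the API reports a total record count; plan_pages is a hypothetical helper and not part of mgnipy:

```python
import math


def plan_pages(count: int, page_size: int) -> int:
    """Return how many pages are needed to retrieve `count` records."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # A partially filled last page still costs one request, hence the ceiling.
    return math.ceil(count / page_size)


# Hypothetical numbers: 1234 records fetched 25 at a time.
total_pages = plan_pages(count=1234, page_size=25)
```

With these numbers the plan would need 50 requests, which is the kind of figure worth checking before calling get().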

property emgapi_docs: str #
property emgapi_resource: str | None #

Retrieves the name of the endpoint resource based on the endpoint module.

Returns:

The name of the endpoint resource, or None if the endpoint module is not set.

Return type:

str or None

property endpoint_module: Callable #
explain(head=None)#

Print example URLs that would be called. Actual requests handled by client.

Parameters:

head (int | None)

Return type:

None

filter(**filters)#

Update the parameters for the API call to filter results.

Parameters:

**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.

Returns:

A new QuerySet instance with updated parameters for filtering results.

Return type:

QuerySet
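
Because filter() returns a new QuerySet rather than changing the current one, calls can be chained without side effects on the original object. The pattern can be sketched as follows, assuming a frozen dataclass as a stand-in; SketchQuery and the filter names are hypothetical and do not reproduce the mgnipy implementation:

```python
from dataclasses import dataclass, field, replace
from typing import Any


@dataclass(frozen=True)
class SketchQuery:
    """Hypothetical stand-in for a QuerySet; not the mgnipy class."""

    resource: str
    params: dict[str, Any] = field(default_factory=dict)

    def filter(self, **filters: Any) -> "SketchQuery":
        # Merge the new filters into a copy and leave the original untouched.
        return replace(self, params={**self.params, **filters})

    def page_size(self, n: int) -> "SketchQuery":
        # Same pattern: return a new object carrying the extra parameter.
        return self.filter(page_size=n)


base = SketchQuery("studies")
narrowed = base.filter(biome_lineage="root:Environmental").page_size(50)
```

Here base keeps its empty parameters while narrowed carries both the filter and the page size, so intermediate queries can be reused safely.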

first()#

Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

property id_param_key: str #
property identifier: str | None #

Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.

Returns:

The identifier value corresponding to the resource type, or None if not available.

Return type:

str or None

list_relationships()#
Return type:

list [str ]

list_supported_params()#

Lists supported keyword arguments for the endpoint module.

Returns:

List of supported keyword argument names.

Return type:

list of str

list_urls()#

Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.

Returns:

A list of URLs corresponding to each API request that would be made.

Return type:

list of str
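
As an illustration of what such a list might contain, the sketch below builds one URL per page from a base URL, a path, and the current parameters. The base URL, the "page" parameter name, and sketch_list_urls are assumptions made for the example, not the URLs or code mgnipy actually uses:

```python
from urllib.parse import urlencode


def sketch_list_urls(base_url: str, path: str, params: dict, total_pages: int) -> list[str]:
    """Build one request URL per page (hypothetical URL layout)."""
    urls = []
    for page in range(1, total_pages + 1):
        # Each request repeats the shared parameters and adds its page number.
        query = urlencode({**params, "page": page})
        urls.append(f"{base_url}/{path}?{query}")
    return urls


urls = sketch_list_urls("https://example.org/api", "studies", {"page_size": 25}, 3)
```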

page_size(n)#

Set the page size for paginated API calls.

Parameters:

n (int )

Returns:

A new QuerySet instance with the updated page size parameter.

Return type:

QuerySet

property pagination_status: bool #

Check if the current resource requires pagination based on its supported keyword arguments.

Returns:

True if the resource is paginated, False otherwise.

Return type:

bool

property params: dict [str , Any ]#
preview()#

Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.

Returns:

A DataFrame containing the metadata from the first page of results.

Return type:

pd.DataFrame

Raises:

RuntimeError – If the API call fails or if no data is available to preview.

property request_url: str #

Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.

Returns:

The constructed URL for the API request.

Return type:

str

resolve_query_string(**kwargs)#

Resolves the query string for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the query string.

Returns:

The resolved query string.

Return type:

str

property resource: SupportedEndpoints#
property results: dict [int , list [dict ]]#
property results_ids: list [str ] | None #

Get a list of accessions from the retrieved metadata results, if available.

Returns:

A list of accession strings if available, otherwise None.

Return type:

list of str or None

sub_url(**kwargs)#

Constructs the sub-URL for the endpoint based on the current parameters.

Returns:

The constructed sub-URL, or None if the endpoint module is not set.

Return type:

str or None

to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#

Convert the current or provided metadata to a pandas DataFrame.

Parameters:
  • data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.

  • expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.

  • rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.

  • **kwargs – Additional keyword arguments passed to pd.DataFrame.

Returns:

DataFrame containing the metadata.

Return type:

pd.DataFrame | None

Raises:

RuntimeError – If no data is available to convert.

to_json(data=None, orient='records', lines=True, **json_kwargs)#

Convert the current metadata to a JSON string or save it to a file.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **json_kwargs – Additional keyword arguments passed to the JSON serialization function.

  • orient (str )

  • lines (bool )

Returns:

The JSON string representation of the metadata, or None if no data is available.

Return type:

str or None

Raises:

RuntimeError – If no data is available to convert.
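
The defaults orient='records' and lines=True suggest JSON Lines output, one record per line. A minimal sketch of that conversion for page-keyed data, assuming the dict of page number to list of records shape documented for data; pages_to_jsonl is a hypothetical helper and uses the standard json module rather than whatever serializer mgnipy calls:

```python
import json
from typing import Any


def pages_to_jsonl(pages: dict[int, list[dict[str, Any]]]) -> str:
    """Flatten page-keyed records and emit one JSON object per line."""
    if not pages:
        raise RuntimeError("No data is available to convert.")
    # Walk the pages in numeric order so record order follows the API order.
    records = [record for page in sorted(pages) for record in pages[page]]
    return "\n".join(json.dumps(record) for record in records)


# Hypothetical two-page result.
pages = {
    2: [{"accession": "MGYS00000003"}],
    1: [{"accession": "MGYS00000001"}, {"accession": "MGYS00000002"}],
}
jsonl_text = pages_to_jsonl(pages)
```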

to_list(data=None)#

Convert the current or provided metadata to a list of dictionaries.

Parameters:

data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.

Returns:

A list of metadata records as dictionaries, or None if no data is available.

Return type:

list of dict | None

Raises:

RuntimeError – If no data is available to convert.

to_polars(data=None, **polars_kwargs)#

Convert the current metadata to a Polars DataFrame.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **polars_kwargs – Additional keyword arguments passed to pl.DataFrame.

Returns:

A Polars DataFrame containing the metadata.

Return type:

pl.DataFrame

Raises:

RuntimeError – If no data is available to convert.

url_path(**kwargs)#

Constructs the full URL path for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the URL construction.

Returns:

The constructed URL path.

Return type:

str

validate_endpoint_kwargs(**kwargs)#

Validates the provided keyword arguments against the supported parameters of the endpoint module.

Parameters:

**kwargs – Keyword arguments to validate.

Returns:

The validated keyword arguments.

Return type:

dict of str to Any

Raises:

ValueError – If any provided keyword argument is not supported by the endpoint module.
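
The validation described here amounts to comparing the supplied keyword names against the supported ones. A sketch under that assumption; sketch_validate_kwargs and the SUPPORTED list are hypothetical and only mirror the documented behaviour (return the kwargs, raise ValueError on unknown names):

```python
from typing import Any


def sketch_validate_kwargs(supported: list[str], **kwargs: Any) -> dict[str, Any]:
    """Return kwargs unchanged if every name is supported, else raise."""
    unsupported = sorted(set(kwargs) - set(supported))
    if unsupported:
        raise ValueError(
            f"Unsupported parameters: {unsupported}; expected one of {supported}"
        )
    return dict(kwargs)


# Hypothetical list, as list_supported_params() might return it.
SUPPORTED = ["accession", "page", "page_size"]
validated = sketch_validate_kwargs(SUPPORTED, page=2, page_size=50)
```

Failing early on an unknown name avoids sending a request whose filter would be silently ignored.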

config: MgnipyConfig#
exec: QueryExecutor#
count: int | None #
total_pages: int | None #
default_page_size: int #
request_urls: list [str ] | None #

mgnipy.V2.datasets module#

class mgnipy.V2.datasets.MGazine(accession)[source]#

Bases: MGnifyAnalysisWithAnnotations

Essentially an extended data class built on MGnifyAnalysisWithAnnotations.

Parameters:

accession (str )

property url_dict: dict [str , dict ]#

returns a dict of alias: url for all downloads

property downloads_df: DataFrame#

returns a dataframe of all downloads with columns alias, url, file_type

property url_list#

returns a list of all download urls

stream_tsv(url, sep='\t', chunksize=None, max_skip=5, **pd_kwargs)[source]#

Reads a tsv file from a url and returns an iterator of pandas dataframes. Handles potential issues with extra header rows (causing pd.errors.ParserError) by trying to read the file with increasing skiprows until it succeeds or reaches max_skip.

Parameters:
  • url (str ) – The url of the tsv file to stream.

  • sep (str , optional) – The separator used in the tsv file. Default is tab.

  • chunksize (int , optional) – The number of rows to include in each chunk. Default is None.

  • max_skip (int , optional) – The maximum number of rows to skip before raising an error. Default is 5.

  • pd_kwargs (dict , optional) – Additional keyword arguments to pass to pandas read_csv.

Returns:

An iterator of pandas dataframes.

Return type:

pd.DataFrame | pd.io.parsers.readers.TextFileReader
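
The retry-with-skiprows idea can be sketched without pandas. In the example below a strict stdlib parser stands in for pandas read_csv, and a ValueError stands in for pd.errors.ParserError; parse_strict and read_with_skip are hypothetical names, and whether the real method tries max_skip or max_skip + 1 offsets is an assumption:

```python
import csv
import io


def parse_strict(text: str, sep: str = "\t", skiprows: int = 0) -> list[list[str]]:
    """Parse rows after skipping `skiprows` lines; raise if row widths differ."""
    lines = text.splitlines()[skiprows:]
    rows = list(csv.reader(io.StringIO("\n".join(lines)), delimiter=sep))
    if not rows:
        raise ValueError("no rows left to parse")
    width = len(rows[0])
    for row in rows:
        if len(row) != width:
            raise ValueError("inconsistent number of fields")
    return rows


def read_with_skip(text: str, sep: str = "\t", max_skip: int = 5) -> list[list[str]]:
    """Retry parsing with 0, 1, ..., max_skip leading lines skipped."""
    for skiprows in range(max_skip + 1):
        try:
            return parse_strict(text, sep=sep, skiprows=skiprows)
        except ValueError:
            continue  # an extra header line is likely; skip one more
    raise ValueError(f"could not parse after skipping up to {max_skip} rows")


# Hypothetical file with one extra comment line above the real header.
tsv_text = "# generated by pipeline\nid\tcount\nA\t3\nB\t5\n"
rows = read_with_skip(tsv_text)
```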

stream_html(url, **web_kwargs)[source]#

Streams an html file from a url and opens it in the default web browser.

Parameters:
  • url (str ) – The url of the html file to stream.

  • web_kwargs (dict , optional) – Additional keyword arguments to pass to webbrowser.open.

Returns:

True if the url was opened successfully, False otherwise.

Return type:

bool

stream_txt(url, chunksize=None, httpx_client=None, **httpx_kwargs)[source]#

Streams a txt file from a url and returns an iterator of strings.

Parameters:
  • url (str ) – The url of the txt file to stream.

  • chunksize (Optional[int ], optional) – The number of characters to include in each chunk. Default is None.

  • httpx_kwargs (dict , optional) – Additional keyword arguments to pass to the httpx client.

  • httpx_client (Client | None)

Returns:

An iterator of strings.

Return type:

Generator[str , None, None]
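
Chunking by character count can be illustrated with a small re-chunking generator. This is a sketch only: iter_text_chunks is a hypothetical helper, and the input list stands in for text pieces arriving from an HTTP stream.

```python
from typing import Iterable, Iterator


def iter_text_chunks(pieces: Iterable[str], chunksize: int) -> Iterator[str]:
    """Regroup arbitrary text pieces into chunks of `chunksize` characters."""
    buffer = ""
    for piece in pieces:
        buffer += piece
        # Emit as many full chunks as the buffer currently holds.
        while len(buffer) >= chunksize:
            yield buffer[:chunksize]
            buffer = buffer[chunksize:]
    if buffer:
        yield buffer  # final, possibly shorter, chunk


chunks = list(iter_text_chunks(["abcde", "fgh", "ij"], chunksize=4))
```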

stream_fasta(url, **skbio_kwargs)[source]#

Streams a fasta file from a url and returns an iterator of tuples (header, sequence).

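Parameters:
  • url (str ) – The url of the fasta file to stream.

  • skbio_kwargs (dict , optional) – Additional keyword arguments to pass to the skbio parser.
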
Returns:

An iterator of tuples (header, sequence).

Return type:

Generator[tuple [str , str ], None, None]
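
To show the shape of the yielded (header, sequence) tuples, here is a plain-Python sketch of a FASTA iterator. It does not use skbio and is not the parser behind stream_fasta; iter_fasta and the sample records are hypothetical.

```python
from typing import Iterable, Iterator


def iter_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) tuples from FASTA-formatted lines."""
    header = None
    parts: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


records = list(iter_fasta([">seq1 example", "ACGT", "TTGA", ">seq2", "GGCC"]))
```

Because the function is a generator, only one record is held in memory at a time, which is the point of streaming large sequence files.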

stream_gff(url, **skbio_kwargs)[source]#

Streams a gff file from a url and returns an iterator of parsed gff records.

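Parameters:
  • url (str ) – The url of the gff file to stream.

  • skbio_kwargs (dict , optional) – Additional keyword arguments to pass to the skbio parser.
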
Returns:

A generator of tuples (seq_id of str type, skbio.metadata.IntervalMetadata).

Return type:

Generator[skbio.io._gff3.GFF3Record, None, None]

stream_biom(url, **skbio_kwargs)[source]#

Streams a biom file from a url and returns an iterator of parsed biom records.

Parameters:
  • url (str ) – The url of the biom file to stream.

  • skbio_kwargs (dict , optional) – Additional keyword arguments to pass to the skbio parser.

Returns:

An iterator of parsed biom records as dictionaries.

Return type:

Generator[dict , None, None]

stream_gzipped(url, chunksize=None, httpx_client=None, decode=False, encoding='utf-8', errors='replace', **httpx_kwargs)[source]#

Streams a gzipped file from a url and returns a file-like object that can be read in chunks. Written using GPT-5.3-Codex. Uses httpx for streaming and zlib for decompression.

Parameters:
  • url (str ) – The url of the gzipped file to stream.

  • chunksize (int , optional) – The size of each chunk to read from the stream.

  • httpx_client (httpx.Client, optional) – The httpx client to use for streaming.

  • decode (bool , default False) – Whether to decode the decompressed bytes to a string.

  • encoding (str , default "utf-8") – The encoding to use for decoding bytes to a string.

  • errors (str , default "replace") – The error handling strategy for decoding bytes to a string.

  • **httpx_kwargs (dict ) – Additional keyword arguments to pass to the httpx client.

Returns:

A file-like object that can be read in chunks. If chunksize is None, returns the full decompressed content as bytes, or string based on decode.

Return type:

bytes | str | io.BufferedReader | io.TextIOWrapper

stream_jsonl(url, orient=None, chunksize=None, **pd_kwargs)[source]#

Streams a jsonl file from a url and returns the parsed json as a dictionary.

Parameters:
  • url (str ) – The url of the json file to stream.

  • sep (str , optional) – The separator to use when parsing the json file. Default is “ “.

  • chunksize (Optional[int ], optional) – The size of the chunks to read from the stream. Default is None.

  • max_skip (int , optional) – The maximum number of rows to skip before raising an error. Default is 5.

  • **pd_kwargs (dict ) – Additional keyword arguments to pass to the pandas parser.

  • orient (Literal ['records', 'split', 'index', 'columns', 'values', 'table'] | None)

Returns:

The parsed json as a dictionary.

Return type:

dict

stream_json(url, chunksize=None, httpx_client=None, **httpx_kwargs)[source]#

Streams a json file from a url and returns the parsed json as a dictionary or an iterator of dictionaries if chunksize is specified.

Parameters:
  • url (str ) – The url of the json file to stream.

  • chunksize (Optional[int ], optional) – The size of the chunks to read from the stream. Default is None.

  • **httpx_kwargs (dict ) – Additional keyword arguments to pass to the httpx client.

  • httpx_client (Client | None)

Returns:

The parsed json as a dictionary, or an iterator of dictionaries if chunksize is specified.

Return type:

dict | Generator

stream_tree(url, **skbio_kwargs)[source]#

Streams a tree file from a url and returns an iterator of parsed tree records.

Parameters:
  • url (str ) – The url of the tree file to stream.

  • skbio_kwargs (dict , optional) – Additional keyword arguments to pass to the skbio parser.

Returns:

An iterator of parsed tree records as dictionaries.

Return type:

Generator[dict , None, None]

stream(*, alias=None, url=None, chunksize=None, max_skip=5, **kwargs)[source]#

Streams a download based on its alias or url. If neither alias nor url is provided, streams all downloads. If chunksize is specified, the content is read lazily in chunks.

Parameters:
  • alias (Optional[str ]) – The alias of the download to stream.

  • url (Optional[HttpUrl]) – The url of the download to stream.

  • chunksize (Optional[int ]) – The size of the chunks to read from the stream.

  • max_skip (int , optional) – The maximum number of rows to skip before raising an error. Default is 5.

  • **kwargs – Additional keyword arguments to pass to the streamer function.

Returns:

A dictionary of alias: streamer_function for the requested downloads.

Return type:

dict [str , Callable]
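
One way to picture the returned mapping of alias to streamer function is a dispatch on file type. The sketch below is an assumption about the general shape only: the STREAMERS table, the DOWNLOADS entries, and build_streamers are all hypothetical, and the stand-in streamers return strings instead of reading anything.

```python
from functools import partial
from typing import Callable, Optional


def sketch_stream_tsv(url: str) -> str:
    return f"tsv reader for {url}"


def sketch_stream_txt(url: str) -> str:
    return f"txt reader for {url}"


# Hypothetical dispatch table keyed by file type.
STREAMERS: dict[str, Callable[[str], str]] = {
    "tsv": sketch_stream_tsv,
    "txt": sketch_stream_txt,
}

# Hypothetical downloads, keyed by alias.
DOWNLOADS = {
    "taxonomy": {"url": "https://example.org/taxonomy.tsv", "file_type": "tsv"},
    "summary": {"url": "https://example.org/summary.txt", "file_type": "txt"},
}


def build_streamers(downloads: dict, alias: Optional[str] = None) -> dict[str, Callable]:
    """Map each selected alias to a zero-argument streamer for its url."""
    selected = downloads if alias is None else {alias: downloads[alias]}
    return {
        name: partial(STREAMERS[info["file_type"]], info["url"])
        for name, info in selected.items()
    }


all_streamers = build_streamers(DOWNLOADS)
one_streamer = build_streamers(DOWNLOADS, alias="summary")
```

Returning callables rather than data defers the actual read until the caller invokes the streamer for the alias it needs.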

download(to_dir, alias=None, *, url=None, filename=None, httpx_client=None, hide_progress=False)[source]#

Downloads a file from a url or alias to a specified directory.

Parameters:
  • to_dir (DirectoryPath) – The directory to download the file to.

  • alias (Optional[str ], optional) – The alias of the file to download. If not provided, url must be provided. Default is None.

  • url (Optional[str ], optional) – The url of the file to download. If not provided, alias must be provided. Default is None.

  • filename (Optional[str ], optional) – The name to save the file as. If not provided, the alias will be used as the filename. Default is None.

  • httpx_client (Client | None)

  • hide_progress (bool )

Raises:

ValueError – If neither alias nor url is provided, or if url is provided without a corresponding alias in the downloads.

async adownload(to_dir, alias=None, *, url=None, filename=None, httpx_aclient=None, hide_progress=False)[source]#

Asynchronously downloads a file from a url or alias to a specified directory.

Parameters:
  • to_dir (DirectoryPath) – The directory to download the file to.

  • alias (Optional[str ], optional) – The alias of the file to download. If not provided, url must be provided. Default is None.

  • url (Optional[str ], optional) – The url of the file to download. If not provided, alias must be provided. Default is None.

  • filename (Optional[str ], optional) – The name to save the file as. If not provided, the alias will be used as the filename. Default is None. Note that if url is provided without a corresponding alias in the downloads, filename must be provided since there is no alias to use as the filename.

  • httpx_aclient (Optional[httpx.AsyncClient], optional) – An optional httpx.AsyncClient to use for the download. If not provided, a new client will be created using the mgnifier helper. Default is None.

  • hide_progress (bool )

async adownload_all(to_dir, hide_progress=False)[source]#

Asynchronously downloads all files in the downloads to a specified directory.

Parameters:
  • to_dir (DirectoryPath) – The directory to download the files to.

  • hide_progress (bool , optional) – Whether to hide the progress bars. Default is False.

Note

This method will use the adownload method for each file, so it will respect the same parameters and behavior for handling aliases, urls, filenames, and httpx clients. If you want to customize those parameters for each file, you can call adownload directly for each file instead of using this method.

download_all(to_dir, hide_progress=False)[source]#

Downloads all files in the downloads to a specified directory. (TODO: fix.)

Parameters:
  • to_dir (DirectoryPath) – The directory to download the files to.

  • hide_progress (bool , optional) – Whether to hide the progress bars. Default is False.

Note

This method will use the download method for each file, so it will respect the same parameters and behavior for handling aliases, urls, filenames, and httpx clients. If you want to customize those parameters for each file, you can call download directly for each file instead of using this method.

experiment_type#
study_accession#
accession#
run#
sample#
assembly#
pipeline_version#
read_run#
quality_control_summary#
annotations#
downloads#
results_dir#
metadata#
additional_properties#
property additional_keys: list [str ]#
classmethod from_dict(src_dict)#
Parameters:

src_dict (Mapping [str , Any ])

Return type:

T

to_dict()#
Return type:

dict [str , Any ]

class mgnipy.V2.datasets.MGazineCurator(*mgazines)[source]#

Bases: object

go_terms()[source]#

mgnipy.V2.endpoints module#

mgnipy.V2.mixins module#

class mgnipy.V2.mixins.ResultsHandlerMixin[source]#

Bases: object

property data: dict [int , list [dict [str , Any ]]]#

results based on the current resource.

to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)[source]#

Convert the current or provided metadata to a pandas DataFrame.

Parameters:
  • data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.

  • expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.

  • rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.

  • **kwargs – Additional keyword arguments passed to pd.DataFrame.

Returns:

DataFrame containing the metadata.

Return type:

pd.DataFrame | None

Raises:

RuntimeError – If no data is available to convert.

to_list(data=None)[source]#

Convert the current or provided metadata to a list of dictionaries.

Parameters:

data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.

Returns:

A list of metadata records as dictionaries, or None if no data is available.

Return type:

list of dict | None

Raises:

RuntimeError – If no data is available to convert.

to_json(data=None, orient='records', lines=True, **json_kwargs)[source]#

Convert the current metadata to a JSON string or save it to a file.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **json_kwargs – Additional keyword arguments passed to the JSON serialization function.

  • orient (str )

  • lines (bool )

Returns:

The JSON string representation of the metadata, or None if no data is available.

Return type:

str or None

Raises:

RuntimeError – If no data is available to convert.

to_polars(data=None, **polars_kwargs)[source]#

Convert the current metadata to a Polars DataFrame.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **polars_kwargs – Additional keyword arguments passed to pl.DataFrame.

Returns:

A Polars DataFrame containing the metadata.

Return type:

pl.DataFrame

Raises:

RuntimeError – If no data is available to convert.

class mgnipy.V2.mixins.BiomesTreeMixin[source]#

Bases: object

property lineages: list [str ]#
property tree: Tree#

Convert the biomes metadata to a tree structure for visualization or analysis.

Returns:

A tree representation of the biomes and their relationships.

Return type:

Tree
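
Turning lineage strings into a hierarchy can be sketched with nested dictionaries. This assumes colon-separated lineages such as "root:Environmental:Aquatic" and uses plain dicts in place of the Tree type returned by the property; build_tree is a hypothetical helper.

```python
def build_tree(lineages: list[str], sep: str = ":") -> dict:
    """Nest each lineage's parts so shared prefixes share a branch."""
    tree: dict = {}
    for lineage in lineages:
        node = tree
        for part in lineage.split(sep):
            # setdefault creates the child on first sight and reuses it after.
            node = node.setdefault(part, {})
    return tree


# Hypothetical lineages.
lineages = [
    "root:Environmental:Aquatic",
    "root:Environmental:Terrestrial",
    "root:Engineered",
]
tree = build_tree(lineages)
```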

show_tree(method='compact')[source]#
Parameters:

method (Literal ['compact', 'show', 'print', 'horizontal', 'hshow', 'h', 'hprint', 'vertical', 'vshow', 'v', 'vprint'])

class mgnipy.V2.mixins.DescribeEmgapiMixin[source]#

Bases: object

endpoint_module()[source]#
list_supported_params()[source]#

Lists supported keyword arguments for the endpoint module.

Returns:

List of supported keyword argument names.

Return type:

list of str

validate_endpoint_kwargs(**kwargs)[source]#

Validates the provided keyword arguments against the supported parameters of the endpoint module.

Parameters:

**kwargs – Keyword arguments to validate.

Returns:

The validated keyword arguments.

Return type:

dict of str to Any

Raises:

ValueError – If any provided keyword argument is not supported by the endpoint module.

property emgapi_resource: str | None #

Retrieves the name of the endpoint resource based on the endpoint module.

Returns:

The name of the endpoint resource, or None if the endpoint module is not set.

Return type:

str or None

sub_url(**kwargs)[source]#

Constructs the sub-URL for the endpoint based on the current parameters.

Returns:

The constructed sub-URL, or None if the endpoint module is not set.

Return type:

str or None

resolve_query_string(**kwargs)[source]#

Resolves the query string for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the query string.

Returns:

The resolved query string.

Return type:

str

url_path(**kwargs)[source]#

Constructs the full URL path for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the URL construction.

Returns:

The constructed URL path.

Return type:

str

property emgapi_docs: str #
describe_endpoint(as_dict=False)[source]#
Parameters:

as_dict (bool )

Return type:

dict [str , str ] | None

mgnipy.V2.proxies module#

class mgnipy.V2.proxies.MGnifyList(*, config=None, params=None, **kwargs)[source]#

Bases: MGnifier

Parameters:
  • config (MgnipyConfig)

  • params (Optional[dict [str , Any]])

RESOURCE: ClassVar [Literal ['biomes', 'studies', 'samples', 'runs', 'analyses', 'genomes', 'assemblies', 'publications', 'catalogues', 'private_studies'] | None ] = None#
child_resource: str #
iter_details(fetch=True)[source]#

Lazily iterate over child detail proxies.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An iterator that yields child detail proxies.

Return type:

Iterator of QuerySet

Example

for sample in samples.iter_details():
    sample.get()

collect_details(*, fetch=True, by_id=False)[source]#

Collect child detail proxies into a list or dict.

Parameters:
  • fetch (bool ) – Whether to immediately fetch the details after creating the proxies.

  • by_id (bool ) – Whether to return a dict keyed by identifier instead of a list.

Returns:

A list or dict of child detail proxies.

Return type:

list of QuerySet or dict of str to QuerySet

Example

sample_detail = samples.collect_details(fetch=True, by_id=True)

async aiter_details(fetch=True)[source]#

Async version of iter_details.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An async iterator that yields child detail proxies.

Return type:

AsyncIterator of QuerySet

async acollect_details(*, fetch=True, by_id=False, concurrency=None, hide_progress=False)[source]#
Parameters:
Return type:

list [QuerySet] | dict [str , QuerySet]
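
A concurrency argument of this kind is commonly implemented with a semaphore around each request. The sketch below shows that pattern with asyncio only; fetch_detail, collect_bounded, and the second accession are hypothetical, and nothing here is taken from the mgnipy source.

```python
import asyncio
from typing import Optional, Union


async def fetch_detail(identifier: str) -> dict:
    """Stand-in for one detail request."""
    await asyncio.sleep(0)  # yields control, as a network call would
    return {"accession": identifier}


async def collect_bounded(
    identifiers: list[str],
    concurrency: Optional[int] = None,
    by_id: bool = False,
) -> Union[list[dict], dict[str, dict]]:
    """Fetch all details, running at most `concurrency` requests at once."""
    semaphore = asyncio.Semaphore(concurrency) if concurrency else None

    async def worker(identifier: str) -> dict:
        if semaphore is None:
            return await fetch_detail(identifier)
        async with semaphore:
            return await fetch_detail(identifier)

    # gather preserves input order, so results line up with identifiers.
    results = await asyncio.gather(*(worker(i) for i in identifiers))
    if by_id:
        return dict(zip(identifiers, results))
    return list(results)


details = asyncio.run(
    collect_bounded(["MGYS00001234", "MGYS00001235"], concurrency=2, by_id=True)
)
```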

get_detail(access_param, fetch=True)[source]#

Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Parameters:
  • access_param (dict [str , str ]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • resource_name (Optional[str ]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

sample = samples.get_detail({"accession": "MGYS00001234"})

async aget_detail(access_param, fetch=True)[source]#

Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Examples

sample = await samples.aget_detail({"accession": "MGYS00001234"})

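Parameters:
  • access_param (dict [str , str ]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.
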
Return type:

QuerySet

async afirst()#

Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

async aget(*args, **kwargs)#
async apage(*args, **kwargs)#
property base_url: str #
property data: dict [int , list [dict [str , Any ]]]#

results based on the current resource.

describe_endpoint(as_dict=False)#
Parameters:

as_dict (bool )

Return type:

dict [str , str ] | None

describe_relationships()#
dry_run(*, verbose=True)#

Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.

Return type:

None

Parameters:

verbose (bool )

property emgapi_docs: str #
property emgapi_resource: str | None #

Retrieves the name of the endpoint resource based on the endpoint module.

Returns:

The name of the endpoint resource, or None if the endpoint module is not set.

Return type:

str or None

property endpoint_module: Callable #
explain(head=None)#

Print example URLs that would be called. Actual requests handled by client.

Parameters:

head (int | None)

Return type:

None

filter(**filters)#

Update the parameters for the API call to filter results.

Parameters:

**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.

Returns:

A new QuerySet instance with updated parameters for filtering results.

Return type:

QuerySet

first()#

Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

get(*args, **kwargs)#
property id_param_key: str #
property identifier: str | None #

Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.

Returns:

The identifier value corresponding to the resource type, or None if not available.

Return type:

str or None

list_relationships()#
Return type:

list [str ]

list_supported_params()#

Lists supported keyword arguments for the endpoint module.

Returns:

List of supported keyword argument names.

Return type:

list of str

list_urls()#

Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.

Returns:

A list of URLs corresponding to each API request that would be made.

Return type:

list of str

page(*args, **kwargs)#
page_size(n)#

Set the page size for paginated API calls.

Parameters:

n (int )

Returns:

A new QuerySet instance with the updated page size parameter.

Return type:

QuerySet

property pagination_status: bool #

Check if the current resource requires pagination based on its supported keyword arguments.

Returns:

True if the resource is paginated, False otherwise.

Return type:

bool

property params: dict [str , Any ]#
preview()#

Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.

Returns:

A DataFrame containing the metadata from the first page of results.

Return type:

pd.DataFrame

Raises:

RuntimeError – If the API call fails or if no data is available to preview.

property request_url: str #

Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.

Returns:

The constructed URL for the API request.

Return type:

str

resolve_query_string(**kwargs)#

Resolves the query string for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the query string.

Returns:

The resolved query string.

Return type:

str

property resource: SupportedEndpoints#
property results: dict [int , list [dict ]]#
property results_ids: list [str ] | None #

Get a list of accessions from the retrieved metadata results, if available.

Returns:

A list of accession strings if available, otherwise None.

Return type:

list of str or None

sub_url(**kwargs)#

Constructs the sub-URL for the endpoint based on the current parameters.

Returns:

The constructed sub-URL, or None if the endpoint module is not set.

Return type:

str or None

to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#

Convert the current or provided metadata to a pandas DataFrame.

Parameters:
  • data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.

  • expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.

  • rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.

  • **kwargs – Additional keyword arguments passed to pd.DataFrame.

Returns:

DataFrame containing the metadata.

Return type:

pd.DataFrame | None

Raises:

RuntimeError – If no data is available to convert.

to_json(data=None, orient='records', lines=True, **json_kwargs)#

Convert the current metadata to a JSON string or save it to a file.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **json_kwargs – Additional keyword arguments passed to the JSON serialization function.

  • orient (str )

  • lines (bool )

Returns:

The JSON string representation of the metadata, or None if no data is available.

Return type:

str or None

Raises:

RuntimeError – If no data is available to convert.

to_list(data=None)#

Convert the current or provided metadata to a list of dictionaries.

Parameters:

data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.

Returns:

A list of metadata records as dictionaries, or None if no data is available.

Return type:

list of dict | None

Raises:

RuntimeError – If no data is available to convert.

to_polars(data=None, **polars_kwargs)#

Convert the current metadata to a Polars DataFrame.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **polars_kwargs – Additional keyword arguments passed to pl.DataFrame.

Returns:

A Polars DataFrame containing the metadata.

Return type:

pl.DataFrame

Raises:

RuntimeError – If no data is available to convert.

url_path(**kwargs)#

Constructs the full URL path for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the URL construction.

Returns:

The constructed URL path.

Return type:

str

validate_endpoint_kwargs(**kwargs)#

Validates the provided keyword arguments against the supported parameters of the endpoint module.

Parameters:

**kwargs – Keyword arguments to validate.

Returns:

The validated keyword arguments.

Return type:

dict of str to Any

Raises:

ValueError – If any provided keyword argument is not supported by the endpoint module.
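The validation contract described here (return the accepted keyword arguments, raise ValueError for anything unsupported) can be sketched in a few lines. This is only an illustration: validate_kwargs and the parameter names page, page_size and search are assumptions, and the real set of supported names comes from list_supported_params().

```python
from typing import Any


def validate_kwargs(supported: set[str], /, **kwargs: Any) -> dict[str, Any]:
    """Return kwargs unchanged if all names are supported, else raise ValueError."""
    unsupported = sorted(set(kwargs) - supported)
    if unsupported:
        raise ValueError(f"Unsupported parameters: {unsupported}")
    return dict(kwargs)


# Hypothetical set of supported names for some endpoint.
supported = {"page", "page_size", "search"}
validated = validate_kwargs(supported, page=1, search="soil")
```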

config: MgnipyConfig#
exec: QueryExecutor#
count: int | None #
total_pages: int | None #
default_page_size: int #
request_urls: list [str ] | None #
class mgnipy.V2.proxies.MGnifyDetail(id, config=None, **kwargs)[source]#

Bases: MGnifier

Parameters:
  • id (str )

  • config (MgnipyConfig)

RESOURCE: ClassVar [Literal ['biome', 'study', 'sample', 'run', 'analysis', 'genome', 'assembly', 'publication', 'catalogue'] | None ] = None#
get_list(resource, access_param, fetch=True, explain=False)[source]#

Get list proxy for a specific accession/pubmed_id/catalogue_id detail.

Parameters:
  • resource (str ) – Valid child resource name, as returned by list_relationships(), such as "samples" for a study detail or "analyses" for a run detail.

  • access_param (dict [str , str ]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

  • explain (bool ) – Whether to print example URLs that would be called.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

samples = study.get_list("samples", {"accession": "MGYS00001234"})

async aget_list(resource, access_param, fetch=True, explain=False)[source]#

Get list proxy for a specific accession/pubmed_id/catalogue_id detail.

Parameters:
  • resource (str ) – Valid child resource name, as returned by list_relationships(), such as "samples" for a study detail or "analyses" for a run detail.

  • access_param (dict [str , str ]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

  • explain (bool ) – Whether to print example URLs that would be called.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

samples = await study.aget_list("samples", {"accession": "MGYS00001234"})

async afirst()#

Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

async aget(*args, **kwargs)#
async apage(*args, **kwargs)#
property base_url: str #
property data: dict [int , list [dict [str , Any ]]]#

Results based on the current resource.

describe_endpoint(as_dict=False)#
Parameters:

as_dict (bool )

Return type:

dict [str , str ] | None

describe_relationships()#
dry_run(*, verbose=True)#

Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.

Return type:

None

Parameters:

verbose (bool )

property emgapi_docs: str #
property emgapi_resource: str | None #

Retrieves the name of the endpoint resource based on the endpoint module.

Returns:

The name of the endpoint resource, or None if the endpoint module is not set.

Return type:

str or None

property endpoint_module: Callable #
explain(head=None)#

Print example URLs that would be called. The actual requests are handled by the client.

Parameters:

head (int | None)

Return type:

None

filter(**filters)#

Update the parameters for the API call to filter results.

Parameters:

**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.

Returns:

A new QuerySet instance with updated parameters for filtering results.

Return type:

QuerySet
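Because filter() is documented as returning a new QuerySet rather than modifying the existing one, calls can be chained without affecting the original object. The stand-alone sketch below shows that pattern with a frozen dataclass. SketchQuery and the filter names search and page_size are hypothetical stand-ins, not mgnipy classes or confirmed API parameters.

```python
from dataclasses import dataclass, field, replace
from typing import Any


@dataclass(frozen=True)
class SketchQuery:
    """Toy immutable query object used only to illustrate the pattern."""

    resource: str
    params: dict[str, Any] = field(default_factory=dict)

    def filter(self, **filters: Any) -> "SketchQuery":
        # Build a new instance with merged parameters; self is left untouched.
        return replace(self, params={**self.params, **filters})


base = SketchQuery("studies")
narrowed = base.filter(search="soil").filter(page_size=50)
```

Here base keeps its empty parameter set, so it can be reused as the starting point for several differently filtered queries.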

first()#

Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

get(*args, **kwargs)#
property id_param_key: str #
property identifier: str | None #

Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.

Returns:

The identifier value corresponding to the resource type, or None if not available.

Return type:

str or None

list_relationships()#
Return type:

list [str ]

list_supported_params()#

Lists supported keyword arguments for the endpoint module.

Returns:

List of supported keyword argument names.

Return type:

list of str

list_urls()#

Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.

Returns:

A list of URLs corresponding to each API request that would be made.

Return type:

list of str
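To make the idea of one URL per page concrete, here is a small sketch that derives the page count from a record count and builds one query URL for each page. The base URL is a placeholder, and the query parameter names page and page_size are assumptions about the request format rather than confirmed details of the MGnify API.

```python
import math
from urllib.parse import urlencode


def page_urls(base_url: str, count: int, page_size: int, **params) -> list[str]:
    """Build one URL per page needed to cover `count` records (illustrative)."""
    total_pages = math.ceil(count / page_size)
    urls = []
    for page in range(1, total_pages + 1):
        query = urlencode({**params, "page": page, "page_size": page_size})
        urls.append(f"{base_url}?{query}")
    return urls


# Placeholder endpoint; 45 records at 20 per page need 3 requests.
urls = page_urls("https://example.org/api/studies", count=45, page_size=20, search="soil")
```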

page(*args, **kwargs)#
page_size(n)#

Set the page size for paginated API calls.

Parameters:

n (int )

Returns:

A new QuerySet instance with the updated page size parameter.

Return type:

QuerySet

property pagination_status: bool #

Check if the current resource requires pagination based on its supported keyword arguments.

Returns:

True if the resource is paginated, False otherwise.

Return type:

bool

property params: dict [str , Any ]#
preview()#

Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.

Returns:

A DataFrame containing the metadata from the first page of results.

Return type:

pd.DataFrame

Raises:

RuntimeError – If the API call fails or if no data is available to preview.

property request_url: str #

Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.

Returns:

The constructed URL for the API request.

Return type:

str

resolve_query_string(**kwargs)#

Resolves the query string for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the query string.

Returns:

The resolved query string.

Return type:

str

property resource: SupportedEndpoints#
property results: dict [int , list [dict ]]#
property results_ids: list [str ] | None #

Get a list of accessions from the retrieved metadata results, if available.

Returns:

A list of accession strings if available, otherwise None.

Return type:

list of str or None

sub_url(**kwargs)#

Constructs the sub-URL for the endpoint based on the current parameters.

Returns:

The constructed sub-URL, or None if the endpoint module is not set.

Return type:

str or None

to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#

Convert the current or provided metadata to a pandas DataFrame.

Parameters:
  • data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.

  • expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.

  • rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.

  • **kwargs – Additional keyword arguments passed to pd.DataFrame.

Returns:

DataFrame containing the metadata.

Return type:

pd.DataFrame | None

Raises:

RuntimeError – If no data is available to convert.

to_json(data=None, orient='records', lines=True, **json_kwargs)#

Convert the current metadata to a JSON string or save it to a file.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **json_kwargs – Additional keyword arguments passed to the JSON serialization function.

  • orient (str )

  • lines (bool )

Returns:

The JSON string representation of the metadata, or None if no data is available.

Return type:

str or None

Raises:

RuntimeError – If no data is available to convert.

to_list(data=None)#

Convert the current or provided metadata to a list of dictionaries.

Parameters:

data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.

Returns:

A list of metadata records as dictionaries, or None if no data is available.

Return type:

list of dict | None

Raises:

RuntimeError – If no data is available to convert.

to_polars(data=None, **polars_kwargs)#

Convert the current metadata to a Polars DataFrame.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **polars_kwargs – Additional keyword arguments passed to pl.DataFrame.

Returns:

A Polars DataFrame containing the metadata.

Return type:

pl.DataFrame

Raises:

RuntimeError – If no data is available to convert.

url_path(**kwargs)#

Constructs the full URL path for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the URL construction.

Returns:

The constructed URL path.

Return type:

str

validate_endpoint_kwargs(**kwargs)#

Validates the provided keyword arguments against the supported parameters of the endpoint module.

Parameters:

**kwargs – Keyword arguments to validate.

Returns:

The validated keyword arguments.

Return type:

dict of str to Any

Raises:

ValueError – If any provided keyword argument is not supported by the endpoint module.

config: MgnipyConfig#
exec: QueryExecutor#
count: int | None #
total_pages: int | None #
default_page_size: int #
request_urls: list [str ] | None #
class mgnipy.V2.proxies.Analyses(*, params=None, config=None, **kwargs)[source]#

Bases: MGnifyList

Parameters:
  • params (Optional[dict [str , Any]])

  • config (MgnipyConfig)

RESOURCE: ClassVar [Literal ['analyses']] = 'analyses'#
async acollect_details(*, fetch=True, by_id=False, concurrency=None, hide_progress=False)#
Return type:

list ['QuerySet'] | dict [str , 'QuerySet']
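The concurrency and by_id arguments point to a bounded fan-out over many detail requests. One common way to do that with asyncio is a semaphore around each request combined with asyncio.gather. The sketch below shows the general technique only; fetch_detail and collect are hypothetical, and mgnipy's own implementation may differ.

```python
import asyncio


async def fetch_detail(identifier: str) -> dict:
    """Stand-in for a network request; returns a fake detail record."""
    await asyncio.sleep(0)
    return {"accession": identifier}


async def collect(identifiers: list[str], *, by_id: bool = False, concurrency=None):
    """Fetch all details, running at most `concurrency` requests at once."""
    semaphore = asyncio.Semaphore(concurrency or max(len(identifiers), 1))

    async def bounded(identifier: str) -> dict:
        async with semaphore:
            return await fetch_detail(identifier)

    # gather() returns results in the same order as the inputs.
    details = await asyncio.gather(*(bounded(i) for i in identifiers))
    if by_id:
        return dict(zip(identifiers, details))
    return list(details)


result = asyncio.run(collect(["A1", "A2", "A3"], by_id=True, concurrency=2))
```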

async afirst()#

Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

async aget(*args, **kwargs)#
async aget_detail(access_param, fetch=True)#

Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Examples

sample = await samples.aget_detail({"accession": "MGYS00001234"})

Return type:

QuerySet

async aiter_details(fetch=True)#

Async version of iter_details.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An async iterator that yields child detail proxies.

Return type:

AsyncIterator of QuerySet
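An async iterator is consumed with async for inside a coroutine. The following sketch uses a toy async generator in place of the real proxies to show the consumption pattern; aiter_sketch and the record shape are assumptions made for illustration.

```python
import asyncio
from typing import AsyncIterator


async def aiter_sketch(identifiers: list[str]) -> AsyncIterator[dict]:
    """Toy async generator yielding one fake detail record at a time."""
    for identifier in identifiers:
        await asyncio.sleep(0)  # stand-in for awaiting a request
        yield {"accession": identifier}


async def main() -> list[str]:
    seen = []
    async for detail in aiter_sketch(["A1", "A2"]):
        seen.append(detail["accession"])
    return seen


seen = asyncio.run(main())
```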

async apage(*args, **kwargs)#
property base_url: str #
collect_details(*, fetch=True, by_id=False)#

Collect child detail proxies into a list or dict.

Parameters:
  • fetch (bool ) – Whether to immediately fetch the details after creating the proxies.

  • by_id (bool ) – Whether to return a dict keyed by identifier instead of a list.

Returns:

A list or dict of child detail proxies.

Return type:

list of QuerySet or dict of str to QuerySet

Example

sample_detail = samples.collect_details(fetch=True, by_id=True)

property data: dict [int , list [dict [str , Any ]]]#

Results based on the current resource.

describe_endpoint(as_dict=False)#
Parameters:

as_dict (bool )

Return type:

dict [str , str ] | None

describe_relationships()#
dry_run(*, verbose=True)#

Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.

Return type:

None

Parameters:

verbose (bool )

property emgapi_docs: str #
property emgapi_resource: str | None #

Retrieves the name of the endpoint resource based on the endpoint module.

Returns:

The name of the endpoint resource, or None if the endpoint module is not set.

Return type:

str or None

property endpoint_module: Callable #
explain(head=None)#

Print example URLs that would be called. The actual requests are handled by the client.

Parameters:

head (int | None)

Return type:

None

filter(**filters)#

Update the parameters for the API call to filter results.

Parameters:

**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.

Returns:

A new QuerySet instance with updated parameters for filtering results.

Return type:

QuerySet

first()#

Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

get(*args, **kwargs)#
get_detail(access_param, fetch=True)#

Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Parameters:
  • access_param (dict [str , str ]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • resource_name (Optional[str ]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

sample = samples.get_detail({"accession": "MGYS00001234"})

property id_param_key: str #
property identifier: str | None #

Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.

Returns:

The identifier value corresponding to the resource type, or None if not available.

Return type:

str or None

iter_details(fetch=True)#

Lazily iterate over child detail proxies.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An iterator that yields child detail proxies.

Return type:

Iterator of QuerySet

Example

for sample in samples.iter_details():
    sample.get()
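Lazy iteration means no detail is created or fetched until the loop asks for the next item. A plain Python generator has the same behaviour, which the sketch below demonstrates by recording which identifiers have been touched. iter_sketch and the fetched list are hypothetical and exist only for this illustration.

```python
from typing import Iterator

fetched: list[str] = []  # records which identifiers were "requested"


def iter_sketch(identifiers: list[str], fetch: bool = True) -> Iterator[dict]:
    """Yield fake detail records one at a time, fetching only on demand."""
    for identifier in identifiers:
        if fetch:
            fetched.append(identifier)  # stand-in for a network request
        yield {"accession": identifier}


iterator = iter_sketch(["A1", "A2", "A3"])
first = next(iterator)  # only the first item has been processed so far
```

Stopping the loop early therefore avoids the remaining requests, which is the main advantage over collecting everything up front.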

list_relationships()#
Return type:

list [str ]

list_supported_params()#

Lists supported keyword arguments for the endpoint module.

Returns:

List of supported keyword argument names.

Return type:

list of str

list_urls()#

Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.

Returns:

A list of URLs corresponding to each API request that would be made.

Return type:

list of str

page(*args, **kwargs)#
page_size(n)#

Set the page size for paginated API calls.

Parameters:

n (int )

Returns:

A new QuerySet instance with the updated page size parameter.

Return type:

QuerySet

property pagination_status: bool #

Check if the current resource requires pagination based on its supported keyword arguments.

Returns:

True if the resource is paginated, False otherwise.

Return type:

bool

property params: dict [str , Any ]#
preview()#

Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.

Returns:

A DataFrame containing the metadata from the first page of results.

Return type:

pd.DataFrame

Raises:

RuntimeError – If the API call fails or if no data is available to preview.

property request_url: str #

Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.

Returns:

The constructed URL for the API request.

Return type:

str

resolve_query_string(**kwargs)#

Resolves the query string for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the query string.

Returns:

The resolved query string.

Return type:

str
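For orientation, a query string is the URL-encoded form of the parameter dictionary. The sketch below builds one with the standard library. Dropping None values and sorting the keys are assumptions chosen to keep the example deterministic, and the parameter names are illustrative only.

```python
from urllib.parse import urlencode


def query_string(**params) -> str:
    """Encode non-None parameters as a URL query string (illustrative)."""
    present = {key: value for key, value in params.items() if value is not None}
    return urlencode(sorted(present.items()))


qs = query_string(search="marine sediment", page=2, ordering=None)
```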

property resource: SupportedEndpoints#
property results: dict [int , list [dict ]]#
property results_ids: list [str ] | None #

Get a list of accessions from the retrieved metadata results, if available.

Returns:

A list of accession strings if available, otherwise None.

Return type:

list of str or None

sub_url(**kwargs)#

Constructs the sub-URL for the endpoint based on the current parameters.

Returns:

The constructed sub-URL, or None if the endpoint module is not set.

Return type:

str or None

to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#

Convert the current or provided metadata to a pandas DataFrame.

Parameters:
  • data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.

  • expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.

  • rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.

  • **kwargs – Additional keyword arguments passed to pd.DataFrame.

Returns:

DataFrame containing the metadata.

Return type:

pd.DataFrame | None

Raises:

RuntimeError – If no data is available to convert.

to_json(data=None, orient='records', lines=True, **json_kwargs)#

Convert the current metadata to a JSON string or save it to a file.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **json_kwargs – Additional keyword arguments passed to the JSON serialization function.

  • orient (str )

  • lines (bool )

Returns:

The JSON string representation of the metadata, or None if no data is available.

Return type:

str or None

Raises:

RuntimeError – If no data is available to convert.

to_list(data=None)#

Convert the current or provided metadata to a list of dictionaries.

Parameters:

data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.

Returns:

A list of metadata records as dictionaries, or None if no data is available.

Return type:

list of dict | None

Raises:

RuntimeError – If no data is available to convert.

to_polars(data=None, **polars_kwargs)#

Convert the current metadata to a Polars DataFrame.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **polars_kwargs – Additional keyword arguments passed to pl.DataFrame.

Returns:

A Polars DataFrame containing the metadata.

Return type:

pl.DataFrame

Raises:

RuntimeError – If no data is available to convert.

url_path(**kwargs)#

Constructs the full URL path for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the URL construction.

Returns:

The constructed URL path.

Return type:

str

validate_endpoint_kwargs(**kwargs)#

Validates the provided keyword arguments against the supported parameters of the endpoint module.

Parameters:

**kwargs – Keyword arguments to validate.

Returns:

The validated keyword arguments.

Return type:

dict of str to Any

Raises:

ValueError – If any provided keyword argument is not supported by the endpoint module.

child_resource: str #
config: MgnipyConfig#
exec: QueryExecutor#
count: int | None #
total_pages: int | None #
default_page_size: int #
request_urls: list [str ] | None #
class mgnipy.V2.proxies.Runs(*, params=None, config=None, **kwargs)[source]#

Bases: MGnifyList

Parameters:
  • params (Optional[dict [str , Any]])

  • config (MgnipyConfig)

RESOURCE: ClassVar [Literal ['runs']] = 'runs'#
async acollect_details(*, fetch=True, by_id=False, concurrency=None, hide_progress=False)#
Return type:

list ['QuerySet'] | dict [str , 'QuerySet']

async afirst()#

Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

async aget(*args, **kwargs)#
async aget_detail(access_param, fetch=True)#

Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Examples

sample = await samples.aget_detail({"accession": "MGYS00001234"})

Return type:

QuerySet

async aiter_details(fetch=True)#

Async version of iter_details.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An async iterator that yields child detail proxies.

Return type:

AsyncIterator of QuerySet

async apage(*args, **kwargs)#
property base_url: str #
collect_details(*, fetch=True, by_id=False)#

Collect child detail proxies into a list or dict.

Parameters:
  • fetch (bool ) – Whether to immediately fetch the details after creating the proxies.

  • by_id (bool ) – Whether to return a dict keyed by identifier instead of a list.

Returns:

A list or dict of child detail proxies.

Return type:

list of QuerySet or dict of str to QuerySet

Example

sample_detail = samples.collect_details(fetch=True, by_id=True)

property data: dict [int , list [dict [str , Any ]]]#

Results based on the current resource.

describe_endpoint(as_dict=False)#
Parameters:

as_dict (bool )

Return type:

dict [str , str ] | None

describe_relationships()#
dry_run(*, verbose=True)#

Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.

Return type:

None

Parameters:

verbose (bool )
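The page estimate in such a plan is simple arithmetic: the record count divided by the page size, rounded up. A minimal sketch follows, assuming the count and page size are already known; the function plan and its return shape are hypothetical and do not reflect what dry_run() prints.

```python
import math


def plan(count: int, page_size: int) -> dict[str, int]:
    """Estimate how many page requests are needed to retrieve `count` records."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return {
        "records": count,
        "page_size": page_size,
        "pages": math.ceil(count / page_size),
    }


# 1234 records at 25 per page: 49 full pages plus one partial page.
summary = plan(count=1234, page_size=25)
```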

property emgapi_docs: str #
property emgapi_resource: str | None #

Retrieves the name of the endpoint resource based on the endpoint module.

Returns:

The name of the endpoint resource, or None if the endpoint module is not set.

Return type:

str or None

property endpoint_module: Callable #
explain(head=None)#

Print example URLs that would be called. The actual requests are handled by the client.

Parameters:

head (int | None)

Return type:

None

filter(**filters)#

Update the parameters for the API call to filter results.

Parameters:

**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.

Returns:

A new QuerySet instance with updated parameters for filtering results.

Return type:

QuerySet

first()#

Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

get(*args, **kwargs)#
get_detail(access_param, fetch=True)#

Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Parameters:
  • access_param (dict [str , str ]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • resource_name (Optional[str ]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

sample = samples.get_detail({"accession": "MGYS00001234"})

property id_param_key: str #
property identifier: str | None #

Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.

Returns:

The identifier value corresponding to the resource type, or None if not available.

Return type:

str or None

iter_details(fetch=True)#

Lazily iterate over child detail proxies.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An iterator that yields child detail proxies.

Return type:

Iterator of QuerySet

Example

for sample in samples.iter_details():
    sample.get()

list_relationships()#
Return type:

list [str ]

list_supported_params()#

Lists supported keyword arguments for the endpoint module.

Returns:

List of supported keyword argument names.

Return type:

list of str

list_urls()#

Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.

Returns:

A list of URLs corresponding to each API request that would be made.

Return type:

list of str

page(*args, **kwargs)#
page_size(n)#

Set the page size for paginated API calls.

Parameters:

n (int )

Returns:

A new QuerySet instance with the updated page size parameter.

Return type:

QuerySet

property pagination_status: bool #

Check if the current resource requires pagination based on its supported keyword arguments.

Returns:

True if the resource is paginated, False otherwise.

Return type:

bool

property params: dict [str , Any ]#
preview()#

Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.

Returns:

A DataFrame containing the metadata from the first page of results.

Return type:

pd.DataFrame

Raises:

RuntimeError – If the API call fails or if no data is available to preview.

property request_url: str #

Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.

Returns:

The constructed URL for the API request.

Return type:

str

resolve_query_string(**kwargs)#

Resolves the query string for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the query string.

Returns:

The resolved query string.

Return type:

str

property resource: SupportedEndpoints#
property results: dict [int , list [dict ]]#
property results_ids: list [str ] | None #

Get a list of accessions from the retrieved metadata results, if available.

Returns:

A list of accession strings if available, otherwise None.

Return type:

list of str or None
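Collecting identifiers from paginated results amounts to walking every page and picking one field from each record. The sketch below assumes each record stores its identifier under an "accession" key, which is an assumption about the record layout; extract_ids is a hypothetical helper.

```python
from typing import Any, Optional


def extract_ids(
    pages: dict[int, list[dict[str, Any]]], key: str = "accession"
) -> Optional[list[str]]:
    """Return the `key` value of every record that has one, or None if none do."""
    ids = [rec[key] for page in sorted(pages) for rec in pages[page] if key in rec]
    return ids or None


pages = {
    1: [{"accession": "X1"}, {"name": "record without accession"}],
    2: [{"accession": "X2"}],
}
ids = extract_ids(pages)
```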

sub_url(**kwargs)#

Constructs the sub-URL for the endpoint based on the current parameters.

Returns:

The constructed sub-URL, or None if the endpoint module is not set.

Return type:

str or None

to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#

Convert the current or provided metadata to a pandas DataFrame.

Parameters:
  • data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.

  • expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.

  • rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.

  • **kwargs – Additional keyword arguments passed to pd.DataFrame.

Returns:

DataFrame containing the metadata.

Return type:

pd.DataFrame | None

Raises:

RuntimeError – If no data is available to convert.
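Expanding nested dictionaries into separate columns can be pictured as flattening each record before it becomes a table row. The pure-Python sketch below does this for a chosen list of keys. The dotted column naming and the helper expand_nested are assumptions for illustration; the column names that to_df() actually produces may differ.

```python
from typing import Any


def expand_nested(record: dict[str, Any], keys: list[str], sep: str = ".") -> dict[str, Any]:
    """Flatten the nested dicts stored under `keys` into prefixed top-level entries."""
    flat = {k: v for k, v in record.items() if k not in keys}
    for key in keys:
        nested = record.get(key)
        if isinstance(nested, dict):
            for sub_key, value in nested.items():
                flat[f"{key}{sep}{sub_key}"] = value
        elif key in record:
            flat[key] = nested  # leave non-dict values as they are
    return flat


record = {"accession": "X1", "attributes": {"biome": "soil", "depth": 5}}
row = expand_nested(record, ["attributes"])
```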

to_json(data=None, orient='records', lines=True, **json_kwargs)#

Convert the current metadata to a JSON string or save it to a file.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **json_kwargs – Additional keyword arguments passed to the JSON serialization function.

  • orient (str )

  • lines (bool )

Returns:

The JSON string representation of the metadata, or None if no data is available.

Return type:

str or None

Raises:

RuntimeError – If no data is available to convert.

to_list(data=None)#

Convert the current or provided metadata to a list of dictionaries.

Parameters:

data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.

Returns:

A list of metadata records as dictionaries, or None if no data is available.

Return type:

list of dict | None

Raises:

RuntimeError – If no data is available to convert.

to_polars(data=None, **polars_kwargs)#

Convert the current metadata to a Polars DataFrame.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **polars_kwargs – Additional keyword arguments passed to pl.DataFrame.

Returns:

A Polars DataFrame containing the metadata.

Return type:

pl.DataFrame

Raises:

RuntimeError – If no data is available to convert.

url_path(**kwargs)#

Constructs the full URL path for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the URL construction.

Returns:

The constructed URL path.

Return type:

str

validate_endpoint_kwargs(**kwargs)#

Validates the provided keyword arguments against the supported parameters of the endpoint module.

Parameters:

**kwargs – Keyword arguments to validate.

Returns:

The validated keyword arguments.

Return type:

dict of str to Any

Raises:

ValueError – If any provided keyword argument is not supported by the endpoint module.

child_resource: str #
config: MgnipyConfig#
exec: QueryExecutor#
count: int | None #
total_pages: int | None #
default_page_size: int #
request_urls: list [str ] | None #
class mgnipy.V2.proxies.Samples(*, params=None, config=None, **kwargs)[source]#

Bases: MGnifyList

Parameters:
  • params (Optional[dict [str , Any]])

  • config (MgnipyConfig)

RESOURCE: ClassVar [Literal ['samples']] = 'samples'#
async acollect_details(*, fetch=True, by_id=False, concurrency=None, hide_progress=False)#
Return type:

list ['QuerySet'] | dict [str , 'QuerySet']

async afirst()#

Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

async aget(*args, **kwargs)#
async aget_detail(access_param, fetch=True)#

Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Examples

sample = await samples.aget_detail({"accession": "MGYS00001234"})

Return type:

QuerySet

async aiter_details(fetch=True)#

Async version of iter_details.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An async iterator that yields child detail proxies.

Return type:

AsyncIterator of QuerySet

async apage(*args, **kwargs)#
property base_url: str #
collect_details(*, fetch=True, by_id=False)#

Collect child detail proxies into a list or dict.

Parameters:
  • fetch (bool ) – Whether to immediately fetch the details after creating the proxies.

  • by_id (bool ) – Whether to return a dict keyed by identifier instead of a list.

Returns:

A list or dict of child detail proxies.

Return type:

list of QuerySet or dict of str to QuerySet

Example

sample_detail = samples.collect_details(fetch=True, by_id=True)

property data: dict [int , list [dict [str , Any ]]]#

Results based on the current resource.

describe_endpoint(as_dict=False)#
Parameters:

as_dict (bool )

Return type:

dict [str , str ] | None

describe_relationships()#
dry_run(*, verbose=True)#

Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.

Return type:

None

Parameters:

verbose (bool )

property emgapi_docs: str #
property emgapi_resource: str | None #

Retrieves the name of the endpoint resource based on the endpoint module.

Returns:

The name of the endpoint resource, or None if the endpoint module is not set.

Return type:

str or None

property endpoint_module: Callable #
explain(head=None)#

Print example URLs that would be called. Actual requests handled by client.

Parameters:

head (int | None)

Return type:

None

filter(**filters)#

Update the parameters for the API call to filter results.

Parameters:

**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.

Returns:

A new QuerySet instance with updated parameters for filtering results.

Return type:

QuerySet
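
The copy-on-filter behaviour described above can be sketched as follows. This is a minimal stand-in, not mgnipy's implementation: the class `SketchQuerySet` and the `search` parameter name are hypothetical and only show how returning a new instance leaves the original query untouched.

```python
from dataclasses import dataclass, field, replace
from typing import Any


@dataclass(frozen=True)
class SketchQuerySet:
    """Hypothetical stand-in for a query object; not part of mgnipy."""

    resource: str
    params: dict[str, Any] = field(default_factory=dict)

    def filter(self, **filters: Any) -> "SketchQuerySet":
        # Merge the new filters into a copy of the parameters and
        # return a new object instead of mutating this one.
        return replace(self, params={**self.params, **filters})


base = SketchQuerySet("samples")
narrowed = base.filter(search="soil")  # "search" is an assumed parameter name

print(base.params)      # the original query is unchanged
print(narrowed.params)  # the new query carries the filter
```

Because each call returns a fresh object, a base query can be reused safely as the starting point for several differently filtered queries.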

first()#

Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

get(*args, **kwargs)#
get_detail(access_param, fetch=True)#

Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Parameters:
  • access_param (dict [str , str ]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • resource_name (Optional[str ]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

sample = samples.get_detail({"accession": "MGYS00001234"})

property id_param_key: str #
property identifier: str | None #

Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.

Returns:

The identifier value corresponding to the resource type, or None if not available.

Return type:

str or None

iter_details(fetch=True)#

Lazily iterate over child detail proxies.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An iterator that yields child detail proxies.

Return type:

Iterator of QuerySet

Example

for sample in samples.iter_details():
    sample.get()

list_relationships()#
Return type:

list [str ]

list_supported_params()#

Lists supported keyword arguments for the endpoint module.

Returns:

List of supported keyword argument names.

Return type:

list of str

list_urls()#

Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.

Returns:

A list of URLs corresponding to each API request that would be made.

Return type:

list of str

page(*args, **kwargs)#
page_size(n)#

Set the page size for paginated API calls.

Parameters:

n (int )

Returns:

A new QuerySet instance with the updated page size parameter.

Return type:

QuerySet

property pagination_status: bool #

Check if the current resource requires pagination based on its supported keyword arguments.

Returns:

True if the resource is paginated, False otherwise.

Return type:

bool

property params: dict [str , Any ]#
preview()#

Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.

Returns:

A DataFrame containing the metadata from the first page of results.

Return type:

pd.DataFrame

Raises:

RuntimeError – If the API call fails or if no data is available to preview.

property request_url: str #

Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.

Returns:

The constructed URL for the API request.

Return type:

str

resolve_query_string(**kwargs)#

Resolves the query string for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the query string.

Returns:

The resolved query string.

Return type:

str

property resource: SupportedEndpoints#
property results: dict [int , list [dict ]]#
property results_ids: list [str ] | None #

Get a list of accessions from the retrieved metadata results, if available.

Returns:

A list of accession strings if available, otherwise None.

Return type:

list of str or None

sub_url(**kwargs)#

Constructs the sub-URL for the endpoint based on the current parameters.

Returns:

The constructed sub-URL, or None if the endpoint module is not set.

Return type:

str or None

to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#

Convert the current or provided metadata to a pandas DataFrame.

Parameters:
  • data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.

  • expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.

  • rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.

  • **kwargs – Additional keyword arguments passed to pd.DataFrame.

Returns:

DataFrame containing the metadata.

Return type:

pd.DataFrame | None

Raises:

RuntimeError – If no data is available to convert.

to_json(data=None, orient='records', lines=True, **json_kwargs)#

Convert the current metadata to a JSON string or save it to a file.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **json_kwargs – Additional keyword arguments passed to the JSON serialization function.

  • orient (str )

  • lines (bool )

Returns:

The JSON string representation of the metadata, or None if no data is available.

Return type:

str or None

Raises:

RuntimeError – If no data is available to convert.

to_list(data=None)#

Convert the current or provided metadata to a list of dictionaries.

Parameters:

data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.

Returns:

A list of metadata records as dictionaries, or None if no data is available.

Return type:

list of dict | None

Raises:

RuntimeError – If no data is available to convert.
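
As a rough illustration of the conversion described above, paginated data in the documented `dict of int to list of dict` shape can be flattened as below. The accessions are placeholders, and visiting pages in ascending page order is an assumption, not confirmed behaviour of `to_list()`.

```python
from typing import Any

# Paginated results keyed by page number, in the shape documented for `data`.
pages: dict[int, list[dict[str, Any]]] = {
    2: [{"accession": "EXAMPLE0003"}],
    1: [{"accession": "EXAMPLE0001"}, {"accession": "EXAMPLE0002"}],
}


def flatten_pages(data: dict[int, list[dict[str, Any]]]) -> list[dict[str, Any]]:
    """Concatenate the records of every page, in ascending page order."""
    return [record for page in sorted(data) for record in data[page]]


records = flatten_pages(pages)
print([r["accession"] for r in records])
```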

to_polars(data=None, **polars_kwargs)#

Convert the current metadata to a Polars DataFrame.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **polars_kwargs – Additional keyword arguments passed to pl.DataFrame.

Returns:

A Polars DataFrame containing the metadata.

Return type:

pl.DataFrame

Raises:

RuntimeError – If no data is available to convert.

url_path(**kwargs)#

Constructs the full URL path for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the URL construction.

Returns:

The constructed URL path.

Return type:

str

validate_endpoint_kwargs(**kwargs)#

Validates the provided keyword arguments against the supported parameters of the endpoint module.

Parameters:

**kwargs – Keyword arguments to validate.

Returns:

The validated keyword arguments.

Return type:

dict of str to Any

Raises:

ValueError – If any provided keyword argument is not supported by the endpoint module.
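
The validation contract above (return the arguments if all are supported, raise `ValueError` otherwise) can be sketched in a few lines. The helper `validate_kwargs` and the `SUPPORTED` set are hypothetical; the real list of supported names differs per resource and is reported by `list_supported_params()`.

```python
from typing import Any

# Hypothetical set of supported parameters, for illustration only.
SUPPORTED = {"page", "page_size", "search"}


def validate_kwargs(supported: set[str], **kwargs: Any) -> dict[str, Any]:
    """Return kwargs unchanged if every key is supported, else raise ValueError."""
    unknown = sorted(set(kwargs) - supported)
    if unknown:
        raise ValueError(f"Unsupported parameter(s): {', '.join(unknown)}")
    return dict(kwargs)


print(validate_kwargs(SUPPORTED, page=1, search="soil"))

try:
    validate_kwargs(SUPPORTED, colour="blue")
except ValueError as err:
    print(err)
```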

child_resource: str #
config: MgnipyConfig#
exec: QueryExecutor#
count: int | None #
total_pages: int | None #
default_page_size: int #
request_urls: list [str ] | None #
class mgnipy.V2.proxies.Studies(*, params=None, config=None, **kwargs)[source]#

Bases: MGnifyList

Parameters:
  • params (Optional[dict [str , Any]])

  • config (MgnipyConfig)

RESOURCE: ClassVar [Literal ['studies']] = 'studies'#
async acollect_details(*, fetch=True, by_id=False, concurrency=None, hide_progress=False)#
Return type:

list ['QuerySet'] | dict [str , 'QuerySet']

async afirst()#

Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

async aget(*args, **kwargs)#
async aget_detail(access_param, fetch=True)#

Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Examples

sample = await samples.aget_detail({"accession": "MGYS00001234"})

Return type:

QuerySet

async aiter_details(fetch=True)#

Async version of iter_details.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An async iterator that yields child detail proxies.

Return type:

AsyncIterator of QuerySet

async apage(*args, **kwargs)#
property base_url: str #
collect_details(*, fetch=True, by_id=False)#

Collect child detail proxies into a list or dict.

Parameters:
  • fetch (bool ) – Whether to immediately fetch the details after creating the proxies.

  • by_id (bool ) – Whether to return a dict keyed by identifier instead of a list.

Returns:

A list or dict of child detail proxies.

Return type:

list of QuerySet or dict of str to QuerySet

Example

sample_detail = samples.collect_details(fetch=True, by_id=True)

property data: dict [int , list [dict [str , Any ]]]#

Retrieved results for the current resource.

describe_endpoint(as_dict=False)#
Parameters:

as_dict (bool )

Return type:

dict [str , str ] | None

describe_relationships()#
dry_run(*, verbose=True)#

Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.

Return type:

None

Parameters:

verbose (bool )
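
The page estimate mentioned above amounts to a ceiling division of the record count by the page size. The following is a sketch of that arithmetic only; `estimate_pages` is a hypothetical helper and the numbers are illustrative, not real MGnify counts.

```python
import math


def estimate_pages(count: int, page_size: int) -> int:
    """Number of pages needed to hold `count` records at `page_size` per page."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return math.ceil(count / page_size)


# Illustrative numbers only.
print(estimate_pages(count=101, page_size=25))
```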

property emgapi_docs: str #
property emgapi_resource: str | None #

Retrieves the name of the endpoint resource based on the endpoint module.

Returns:

The name of the endpoint resource, or None if the endpoint module is not set.

Return type:

str or None

property endpoint_module: Callable #
explain(head=None)#

Print example URLs that would be called. Actual requests handled by client.

Parameters:

head (int | None)

Return type:

None

filter(**filters)#

Update the parameters for the API call to filter results.

Parameters:

**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.

Returns:

A new QuerySet instance with updated parameters for filtering results.

Return type:

QuerySet

first()#

Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

get(*args, **kwargs)#
get_detail(access_param, fetch=True)#

Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Parameters:
  • access_param (dict [str , str ]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • resource_name (Optional[str ]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

sample = samples.get_detail({"accession": "MGYS00001234"})

property id_param_key: str #
property identifier: str | None #

Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.

Returns:

The identifier value corresponding to the resource type, or None if not available.

Return type:

str or None

iter_details(fetch=True)#

Lazily iterate over child detail proxies.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An iterator that yields child detail proxies.

Return type:

Iterator of QuerySet

Example

for sample in samples.iter_details():
    sample.get()

list_relationships()#
Return type:

list [str ]

list_supported_params()#

Lists supported keyword arguments for the endpoint module.

Returns:

List of supported keyword argument names.

Return type:

list of str

list_urls()#

Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.

Returns:

A list of URLs corresponding to each API request that would be made.

Return type:

list of str
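
To make the idea of "one URL per request" concrete, here is a minimal sketch that builds a URL for each page with the standard library. `BASE_URL` is a placeholder, `build_page_urls` is a hypothetical helper, and the exact path and query layout used by mgnipy may differ.

```python
from urllib.parse import urlencode

# Placeholder base URL; the real one is exposed by the `base_url` property.
BASE_URL = "https://example.org/api"


def build_page_urls(resource: str, total_pages: int, **params: object) -> list[str]:
    """Build one URL per page, repeating the same filters on each request."""
    urls = []
    for page in range(1, total_pages + 1):
        query = urlencode({**params, "page": page})
        urls.append(f"{BASE_URL}/{resource}/?{query}")
    return urls


for url in build_page_urls("studies", total_pages=3, page_size=25):
    print(url)
```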

page(*args, **kwargs)#
page_size(n)#

Set the page size for paginated API calls.

Parameters:

n (int )

Returns:

A new QuerySet instance with the updated page size parameter.

Return type:

QuerySet

property pagination_status: bool #

Check if the current resource requires pagination based on its supported keyword arguments.

Returns:

True if the resource is paginated, False otherwise.

Return type:

bool

property params: dict [str , Any ]#
preview()#

Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.

Returns:

A DataFrame containing the metadata from the first page of results.

Return type:

pd.DataFrame

Raises:

RuntimeError – If the API call fails or if no data is available to preview.

property request_url: str #

Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.

Returns:

The constructed URL for the API request.

Return type:

str

resolve_query_string(**kwargs)#

Resolves the query string for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the query string.

Returns:

The resolved query string.

Return type:

str

property resource: SupportedEndpoints#
property results: dict [int , list [dict ]]#
property results_ids: list [str ] | None #

Get a list of accessions from the retrieved metadata results, if available.

Returns:

A list of accession strings if available, otherwise None.

Return type:

list of str or None

sub_url(**kwargs)#

Constructs the sub-URL for the endpoint based on the current parameters.

Returns:

The constructed sub-URL, or None if the endpoint module is not set.

Return type:

str or None

to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#

Convert the current or provided metadata to a pandas DataFrame.

Parameters:
  • data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.

  • expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.

  • rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.

  • **kwargs – Additional keyword arguments passed to pd.DataFrame.

Returns:

DataFrame containing the metadata.

Return type:

pd.DataFrame | None

Raises:

RuntimeError – If no data is available to convert.

to_json(data=None, orient='records', lines=True, **json_kwargs)#

Convert the current metadata to a JSON string or save it to a file.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **json_kwargs – Additional keyword arguments passed to the JSON serialization function.

  • orient (str )

  • lines (bool )

Returns:

The JSON string representation of the metadata, or None if no data is available.

Return type:

str or None

Raises:

RuntimeError – If no data is available to convert.
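
The defaults `orient='records'` and `lines=True` suggest JSON Lines output, that is, one JSON object per record per line. The sketch below reproduces that layout with the standard `json` module under that assumption; it does not call `to_json()` itself, and the records are placeholders.

```python
import json

pages = {
    1: [{"accession": "EXAMPLE0001", "biome": "root"}],
    2: [{"accession": "EXAMPLE0002", "biome": "root"}],
}

# One JSON object per line, which is what orient="records" combined with
# lines=True conventionally means in pandas-style serialisers.
json_lines = "\n".join(
    json.dumps(record) for page in sorted(pages) for record in pages[page]
)
print(json_lines)
```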

to_list(data=None)#

Convert the current or provided metadata to a list of dictionaries.

Parameters:

data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.

Returns:

A list of metadata records as dictionaries, or None if no data is available.

Return type:

list of dict | None

Raises:

RuntimeError – If no data is available to convert.

to_polars(data=None, **polars_kwargs)#

Convert the current metadata to a Polars DataFrame.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **polars_kwargs – Additional keyword arguments passed to pl.DataFrame.

Returns:

A Polars DataFrame containing the metadata.

Return type:

pl.DataFrame

Raises:

RuntimeError – If no data is available to convert.

url_path(**kwargs)#

Constructs the full URL path for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the URL construction.

Returns:

The constructed URL path.

Return type:

str

validate_endpoint_kwargs(**kwargs)#

Validates the provided keyword arguments against the supported parameters of the endpoint module.

Parameters:

**kwargs – Keyword arguments to validate.

Returns:

The validated keyword arguments.

Return type:

dict of str to Any

Raises:

ValueError – If any provided keyword argument is not supported by the endpoint module.

child_resource: str #
config: MgnipyConfig#
exec: QueryExecutor#
count: int | None #
total_pages: int | None #
default_page_size: int #
request_urls: list [str ] | None #
class mgnipy.V2.proxies.PrivateStudies(*, params=None, config=None, **kwargs)[source]#

Bases: MGnifyList

Parameters:
  • params (Optional[dict [str , Any]])

  • config (MgnipyConfig)

RESOURCE: ClassVar [Literal ['private_studies']] = 'private_studies'#
config: MgnipyConfig#
async acollect_details(*, fetch=True, by_id=False, concurrency=None, hide_progress=False)#
Return type:

list ['QuerySet'] | dict [str , 'QuerySet']

async afirst()#

Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

async aget(*args, **kwargs)#
async aget_detail(access_param, fetch=True)#

Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Examples

sample = await samples.aget_detail({"accession": "MGYS00001234"})

Return type:

QuerySet

async aiter_details(fetch=True)#

Async version of iter_details.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An async iterator that yields child detail proxies.

Return type:

AsyncIterator of QuerySet
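
An async iterator of this kind is consumed with `async for`. The sketch below uses a hypothetical `aiter_sketch` generator and a `fake_fetch` coroutine in place of real network calls; it only illustrates the consumption pattern and the role of a `fetch` flag, not mgnipy internals.

```python
import asyncio
from typing import AsyncIterator


async def fake_fetch(identifier: str) -> dict:
    """Stand-in for a network call; yields control once, then returns."""
    await asyncio.sleep(0)
    return {"accession": identifier, "fetched": True}


async def aiter_sketch(ids: list[str], fetch: bool = True) -> AsyncIterator[dict]:
    """Yield one detail at a time, fetching each only when it is requested."""
    for identifier in ids:
        if fetch:
            yield await fake_fetch(identifier)
        else:
            yield {"accession": identifier, "fetched": False}


async def main() -> list[str]:
    seen = []
    async for detail in aiter_sketch(["EXAMPLE0001", "EXAMPLE0002"]):
        seen.append(detail["accession"])
    return seen


seen = asyncio.run(main())
print(seen)
```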

async apage(*args, **kwargs)#
property base_url: str #
collect_details(*, fetch=True, by_id=False)#

Collect child detail proxies into a list or dict.

Parameters:
  • fetch (bool ) – Whether to immediately fetch the details after creating the proxies.

  • by_id (bool ) – Whether to return a dict keyed by identifier instead of a list.

Returns:

A list or dict of child detail proxies.

Return type:

list of QuerySet or dict of str to QuerySet

Example

sample_detail = samples.collect_details(fetch=True, by_id=True)
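
The effect of `by_id` on the return shape can be sketched with a stand-in proxy. `DetailSketch` and `collect` are hypothetical names, and keying the dict by an `identifier` attribute is an assumption based on the description above.

```python
from typing import Union


class DetailSketch:
    """Hypothetical detail proxy holding only its identifier."""

    def __init__(self, identifier: str) -> None:
        self.identifier = identifier


def collect(
    ids: list[str], by_id: bool = False
) -> Union[list[DetailSketch], dict[str, DetailSketch]]:
    """Return proxies as a list, or as a dict keyed by identifier."""
    proxies = [DetailSketch(i) for i in ids]
    if by_id:
        return {p.identifier: p for p in proxies}
    return proxies


as_list = collect(["EXAMPLE0001", "EXAMPLE0002"])
as_dict = collect(["EXAMPLE0001", "EXAMPLE0002"], by_id=True)
print(len(as_list), sorted(as_dict))
```

The dict form is convenient when details are looked up by accession later; the list form preserves the order in which the proxies were created.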

property data: dict [int , list [dict [str , Any ]]]#

Retrieved results for the current resource.

describe_endpoint(as_dict=False)#
Parameters:

as_dict (bool )

Return type:

dict [str , str ] | None

describe_relationships()#
dry_run(*, verbose=True)#

Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.

Return type:

None

Parameters:

verbose (bool )

property emgapi_docs: str #
property emgapi_resource: str | None #

Retrieves the name of the endpoint resource based on the endpoint module.

Returns:

The name of the endpoint resource, or None if the endpoint module is not set.

Return type:

str or None

property endpoint_module: Callable #
explain(head=None)#

Print example URLs that would be called. Actual requests handled by client.

Parameters:

head (int | None)

Return type:

None

filter(**filters)#

Update the parameters for the API call to filter results.

Parameters:

**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.

Returns:

A new QuerySet instance with updated parameters for filtering results.

Return type:

QuerySet

first()#

Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

get(*args, **kwargs)#
get_detail(access_param, fetch=True)#

Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Parameters:
  • access_param (dict [str , str ]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • resource_name (Optional[str ]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

sample = samples.get_detail({"accession": "MGYS00001234"})

property id_param_key: str #
property identifier: str | None #

Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.

Returns:

The identifier value corresponding to the resource type, or None if not available.

Return type:

str or None

iter_details(fetch=True)#

Lazily iterate over child detail proxies.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An iterator that yields child detail proxies.

Return type:

Iterator of QuerySet

Example

for sample in samples.iter_details():
    sample.get()

list_relationships()#
Return type:

list [str ]

list_supported_params()#

Lists supported keyword arguments for the endpoint module.

Returns:

List of supported keyword argument names.

Return type:

list of str

list_urls()#

Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.

Returns:

A list of URLs corresponding to each API request that would be made.

Return type:

list of str

page(*args, **kwargs)#
page_size(n)#

Set the page size for paginated API calls.

Parameters:

n (int )

Returns:

A new QuerySet instance with the updated page size parameter.

Return type:

QuerySet

property pagination_status: bool #

Check if the current resource requires pagination based on its supported keyword arguments.

Returns:

True if the resource is paginated, False otherwise.

Return type:

bool

property params: dict [str , Any ]#
preview()#

Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.

Returns:

A DataFrame containing the metadata from the first page of results.

Return type:

pd.DataFrame

Raises:

RuntimeError – If the API call fails or if no data is available to preview.

property request_url: str #

Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.

Returns:

The constructed URL for the API request.

Return type:

str

resolve_query_string(**kwargs)#

Resolves the query string for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the query string.

Returns:

The resolved query string.

Return type:

str

property resource: SupportedEndpoints#
property results: dict [int , list [dict ]]#
property results_ids: list [str ] | None #

Get a list of accessions from the retrieved metadata results, if available.

Returns:

A list of accession strings if available, otherwise None.

Return type:

list of str or None

sub_url(**kwargs)#

Constructs the sub-URL for the endpoint based on the current parameters.

Returns:

The constructed sub-URL, or None if the endpoint module is not set.

Return type:

str or None

to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#

Convert the current or provided metadata to a pandas DataFrame.

Parameters:
  • data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.

  • expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.

  • rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.

  • **kwargs – Additional keyword arguments passed to pd.DataFrame.

Returns:

DataFrame containing the metadata.

Return type:

pd.DataFrame | None

Raises:

RuntimeError – If no data is available to convert.
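
What "expanding" nested keys and renaming columns might look like on plain records is sketched below, without pandas. `expand_and_rename` is a hypothetical helper, and the `parent.child` column naming is an assumption borrowed from the default of `pandas.json_normalize`; `to_df()` may name columns differently.

```python
from typing import Any


def expand_and_rename(
    records: list[dict[str, Any]],
    expand_keys: list[str],
    rename: dict[str, str],
) -> list[dict[str, Any]]:
    """Flatten selected nested dicts into columns, then rename columns."""
    rows = []
    for record in records:
        row: dict[str, Any] = {}
        for key, value in record.items():
            if key in expand_keys and isinstance(value, dict):
                # Promote each nested entry to its own "parent.child" column.
                for sub_key, sub_value in value.items():
                    row[f"{key}.{sub_key}"] = sub_value
            else:
                row[key] = value
        rows.append({rename.get(k, k): v for k, v in row.items()})
    return rows


rows = expand_and_rename(
    [{"accession": "EXAMPLE0001", "biome": {"lineage": "root"}}],
    expand_keys=["biome"],
    rename={"biome.lineage": "lineage"},
)
print(rows)
```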

to_json(data=None, orient='records', lines=True, **json_kwargs)#

Convert the current metadata to a JSON string or save it to a file.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **json_kwargs – Additional keyword arguments passed to the JSON serialization function.

  • orient (str )

  • lines (bool )

Returns:

The JSON string representation of the metadata, or None if no data is available.

Return type:

str or None

Raises:

RuntimeError – If no data is available to convert.

to_list(data=None)#

Convert the current or provided metadata to a list of dictionaries.

Parameters:

data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.

Returns:

A list of metadata records as dictionaries, or None if no data is available.

Return type:

list of dict | None

Raises:

RuntimeError – If no data is available to convert.

to_polars(data=None, **polars_kwargs)#

Convert the current metadata to a Polars DataFrame.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **polars_kwargs – Additional keyword arguments passed to pl.DataFrame.

Returns:

A Polars DataFrame containing the metadata.

Return type:

pl.DataFrame

Raises:

RuntimeError – If no data is available to convert.

url_path(**kwargs)#

Constructs the full URL path for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the URL construction.

Returns:

The constructed URL path.

Return type:

str

validate_endpoint_kwargs(**kwargs)#

Validates the provided keyword arguments against the supported parameters of the endpoint module.

Parameters:

**kwargs – Keyword arguments to validate.

Returns:

The validated keyword arguments.

Return type:

dict of str to Any

Raises:

ValueError – If any provided keyword argument is not supported by the endpoint module.

child_resource: str #
exec: QueryExecutor#
count: int | None #
total_pages: int | None #
default_page_size: int #
request_urls: list [str ] | None #
class mgnipy.V2.proxies.Biomes(*, params=None, config=None, **kwargs)[source]#

Bases: MGnifyList, BiomesTreeMixin

Parameters:
  • params (Optional[dict [str , Any]])

  • config (MgnipyConfig)

RESOURCE: ClassVar [Literal ['biomes']] = 'biomes'#
async acollect_details(*, fetch=True, by_id=False, concurrency=None, hide_progress=False)#
Return type:

list ['QuerySet'] | dict [str , 'QuerySet']
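
The `concurrency` argument in the signature suggests a cap on simultaneous requests. One common way to implement such a cap, shown here purely as an assumption about the pattern and not as mgnipy's code, is an `asyncio.Semaphore` around each fetch combined with `asyncio.gather`. `fetch_one` and `collect_all` are hypothetical, and the lineage strings are illustrative.

```python
import asyncio


async def fetch_one(identifier: str, limit: asyncio.Semaphore) -> dict:
    # The semaphore caps how many fetches run at the same time.
    async with limit:
        await asyncio.sleep(0)  # stand-in for a network request
        return {"id": identifier}


async def collect_all(ids: list[str], concurrency: int = 2) -> list[dict]:
    limit = asyncio.Semaphore(concurrency)
    # gather() returns results in the order the awaitables were passed in.
    return await asyncio.gather(*(fetch_one(i, limit) for i in ids))


details = asyncio.run(
    collect_all(["root", "root:Environmental", "root:Engineered"])
)
print([d["id"] for d in details])
```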

async afirst()#

Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

async aget(*args, **kwargs)#
async aget_detail(access_param, fetch=True)#

Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Examples

sample = await samples.aget_detail({"accession": "MGYS00001234"})

Return type:

QuerySet

async aiter_details(fetch=True)#

Async version of iter_details.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An async iterator that yields child detail proxies.

Return type:

AsyncIterator of QuerySet

async apage(*args, **kwargs)#
property base_url: str #
collect_details(*, fetch=True, by_id=False)#

Collect child detail proxies into a list or dict.

Parameters:
  • fetch (bool ) – Whether to immediately fetch the details after creating the proxies.

  • by_id (bool ) – Whether to return a dict keyed by identifier instead of a list.

Returns:

A list or dict of child detail proxies.

Return type:

list of QuerySet or dict of str to QuerySet

Example

sample_detail = samples.collect_details(fetch=True, by_id=True)

property data: dict [int , list [dict [str , Any ]]]#

Retrieved results for the current resource.

describe_endpoint(as_dict=False)#
Parameters:

as_dict (bool )

Return type:

dict [str , str ] | None

describe_relationships()#
dry_run(*, verbose=True)#

Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.

Return type:

None

Parameters:

verbose (bool )

property emgapi_docs: str #
property emgapi_resource: str | None #

Retrieves the name of the endpoint resource based on the endpoint module.

Returns:

The name of the endpoint resource, or None if the endpoint module is not set.

Return type:

str or None

property endpoint_module: Callable #
explain(head=None)#

Print example URLs that would be called. Actual requests handled by client.

Parameters:

head (int | None)

Return type:

None

filter(**filters)#

Update the parameters for the API call to filter results.

Parameters:

**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.

Returns:

A new QuerySet instance with updated parameters for filtering results.

Return type:

QuerySet

first()#

Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

get(*args, **kwargs)#
get_detail(access_param, fetch=True)#

Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Parameters:
  • access_param (dict [str , str ]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • resource_name (Optional[str ]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

sample = samples.get_detail({"accession": "MGYS00001234"})
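One way to picture how a single-key access_param could identify a detail record is to turn it into a relative path. This is a sketch under stated assumptions: `detail_path` and the `resource/identifier` layout are hypothetical, and the real URL construction may differ.

```python
from urllib.parse import quote


def detail_path(resource: str, access_param: dict[str, str]) -> str:
    """Build a relative path for one detail record (hypothetical layout)."""
    if len(access_param) != 1:
        raise ValueError("access_param must contain exactly one key")
    (value,) = access_param.values()
    # Percent-encode the identifier so characters such as ':' stay in one segment.
    return f"{resource}/{quote(value, safe='')}"


print(detail_path("studies", {"accession": "MGYS00001234"}))
```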

property id_param_key: str #
property identifier: str | None #

Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.

Returns:

The identifier value corresponding to the resource type, or None if not available.

Return type:

str or None

iter_details(fetch=True)#

Lazily iterate over child detail proxies.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An iterator that yields child detail proxies.

Return type:

Iterator of QuerySet

Example

for sample in samples.iter_details():
    sample.get()
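Lazy iteration means that no work is done for an item until the loop asks for it. The generator below sketches that behaviour with stand-in data; `iter_items` and the `calls` log are hypothetical and only illustrate the idea.

```python
from typing import Iterator

calls: list[str] = []  # records which identifiers have been processed so far


def iter_items(identifiers: list[str]) -> Iterator[dict]:
    """Yield one stand-in detail record at a time (hypothetical helper)."""
    for identifier in identifiers:
        calls.append(identifier)  # stands in for creating or fetching a proxy
        yield {"id": identifier}


items = iter_items(["A1", "A2", "A3"])
first = next(items)  # only the first identifier has been processed here
print(first, calls)
```

Stopping the loop early therefore skips the remaining items entirely.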

property lineages: list [str ]#
list_relationships()#
Return type:

list [str ]

list_supported_params()#

Lists supported keyword arguments for the endpoint module.

Returns:

List of supported keyword argument names.

Return type:

list of str

list_urls()#

Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.

Returns:

A list of URLs corresponding to each API request that would be made.

Return type:

list of str
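A list of per-page URLs can be sketched with the standard library. The base URL below is a placeholder, and `build_page_urls` and the `page` and `page_size` query names are assumptions made for illustration rather than mgnipy's internals.

```python
from urllib.parse import urlencode


def build_page_urls(base_url: str, params: dict, total_pages: int) -> list[str]:
    """Return one URL per page, numbering pages from 1 (hypothetical helper)."""
    urls = []
    for page in range(1, total_pages + 1):
        # Keep the shared parameters and vary only the page number.
        query = urlencode({**params, "page": page})
        urls.append(f"{base_url}?{query}")
    return urls


# Placeholder endpoint; substitute the real base URL when adapting this.
urls = build_page_urls("https://example.org/api/studies", {"page_size": 25}, 3)
for url in urls:
    print(url)
```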

page(*args, **kwargs)#
page_size(n)#

Set the page size for paginated API calls.

Parameters:

n (int )

Returns:

A new QuerySet instance with the updated page size parameter.

Return type:

QuerySet

property pagination_status: bool #

Check if the current resource requires pagination based on its supported keyword arguments.

Returns:

True if the resource is paginated, False otherwise.

Return type:

bool
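A check based on supported keyword arguments could look like the sketch below. The marker names `page` and `page_size` and the helper `requires_pagination` are assumptions; the source does not say which arguments the property inspects.

```python
def requires_pagination(supported_params: list[str],
                        markers: tuple[str, ...] = ("page", "page_size")) -> bool:
    """Return True if any pagination marker is among the supported parameters."""
    # The marker names are assumed for illustration.
    return any(name in supported_params for name in markers)


print(requires_pagination(["accession"]))
print(requires_pagination(["search", "page", "page_size"]))
```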

property params: dict [str , Any ]#
preview()#

Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.

Returns:

A DataFrame containing the metadata from the first page of results.

Return type:

pd.DataFrame

Raises:

RuntimeError – If the API call fails or if no data is available to preview.

property request_url: str #

Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.

Returns:

The constructed URL for the API request.

Return type:

str

resolve_query_string(**kwargs)#

Resolves the query string for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the query string.

Returns:

The resolved query string.

Return type:

str

property resource: SupportedEndpoints#
property results: dict [int , list [dict ]]#
property results_ids: list [str ] | None #

Get a list of accessions from the retrieved metadata results, if available.

Returns:

A list of accession strings if available, otherwise None.

Return type:

list of str or None

show_tree(method='compact')#
Parameters:

method (Literal ['compact', 'show', 'print', 'horizontal', 'hshow', 'h', 'hprint', 'vertical', 'vshow', 'v', 'vprint'])

sub_url(**kwargs)#

Constructs the sub-URL for the endpoint based on the current parameters.

Returns:

The constructed sub-URL, or None if the endpoint module is not set.

Return type:

str or None

to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#

Convert the current or provided metadata to a pandas DataFrame.

Parameters:
  • data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.

  • expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.

  • rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.

  • **kwargs – Additional keyword arguments passed to pd.DataFrame.

Returns:

DataFrame containing the metadata.

Return type:

pd.DataFrame | None

Raises:

RuntimeError – If no data is available to convert.
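Expanding nested dictionaries into separate columns amounts to flattening each record before the table is built. The sketch below shows the idea with plain dictionaries; `expand_nested`, the dotted column names, and the sample record are assumptions and may not match the columns that to_df() produces.

```python
def expand_nested(record: dict, keys: list[str], sep: str = ".") -> dict:
    """Flatten the selected nested dicts of one record into top-level keys."""
    flat = {}
    for name, value in record.items():
        if name in keys and isinstance(value, dict):
            # Promote each inner key to its own column, prefixed by the parent key.
            for inner, inner_value in value.items():
                flat[f"{name}{sep}{inner}"] = inner_value
        else:
            flat[name] = value
    return flat


# Made-up record for illustration.
record = {"accession": "X1", "biome": {"lineage": "root", "name": "Root"}}
print(expand_nested(record, ["biome"]))
```

A list of records flattened this way can be passed directly to a DataFrame constructor.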

to_json(data=None, orient='records', lines=True, **json_kwargs)#

Convert the current metadata to a JSON string or save it to a file.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **json_kwargs – Additional keyword arguments passed to the JSON serialization function.

  • orient (str )

  • lines (bool )

Returns:

The JSON string representation of the metadata, or None if no data is available.

Return type:

str or None

Raises:

RuntimeError – If no data is available to convert.
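The defaults orient='records' and lines=True suggest line-delimited JSON, with one object per record. The sketch below produces that layout with the standard library; `to_json_lines` is a hypothetical helper and the records are made up, so treat it as an illustration of the format rather than of the method's internals.

```python
import json


def to_json_lines(records: list[dict]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    if not records:
        raise RuntimeError("No data is available to convert.")
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


# Made-up records for illustration.
text = to_json_lines([{"accession": "X1", "n": 1}, {"accession": "X2", "n": 2}])
print(text)
```

This layout can be read back one line at a time, which is convenient for large result sets.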

to_list(data=None)#

Convert the current or provided metadata to a list of dictionaries.

Parameters:

data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.

Returns:

A list of metadata records as dictionaries, or None if no data is available.

Return type:

list of dict | None

Raises:

RuntimeError – If no data is available to convert.
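Paginated data is described as a dict of page number to list of records, so converting it to a list means concatenating the pages in order. A minimal sketch, assuming pages should be joined in ascending page order; `flatten_pages` is a hypothetical helper and the records are made up.

```python
from typing import Any


def flatten_pages(pages: dict[int, list[dict[str, Any]]]) -> list[dict[str, Any]]:
    """Concatenate paginated records in ascending page order."""
    if not pages:
        raise RuntimeError("No data is available to convert.")
    records: list[dict[str, Any]] = []
    for page_number in sorted(pages):
        records.extend(pages[page_number])
    return records


# Pages deliberately given out of order to show the sort.
pages = {2: [{"accession": "X3"}], 1: [{"accession": "X1"}, {"accession": "X2"}]}
records = flatten_pages(pages)
print(records)
```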

to_polars(data=None, **polars_kwargs)#

Convert the current metadata to a Polars DataFrame.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **polars_kwargs – Additional keyword arguments passed to pl.DataFrame.

Returns:

A Polars DataFrame containing the metadata.

Return type:

pl.DataFrame

Raises:

RuntimeError – If no data is available to convert.

property tree: Tree#

Convert the biomes metadata to a tree structure for visualization or analysis.

Returns:

A tree representation of the biomes and their relationships.

Return type:

Tree

url_path(**kwargs)#

Constructs the full URL path for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the URL construction.

Returns:

The constructed URL path.

Return type:

str

validate_endpoint_kwargs(**kwargs)#

Validates the provided keyword arguments against the supported parameters of the endpoint module.

Parameters:

**kwargs – Keyword arguments to validate.

Returns:

The validated keyword arguments.

Return type:

dict of str to Any

Raises:

ValueError – If any provided keyword argument is not supported by the endpoint module.
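Validation of this kind compares the supplied names against an allow-list and rejects anything else. The sketch below shows the pattern; `validate_kwargs`, the supported set, and the error message are assumptions, not the library's own.

```python
from typing import Any


def validate_kwargs(supported: set[str], **kwargs: Any) -> dict[str, Any]:
    """Return kwargs unchanged if every name is supported, else raise ValueError."""
    unknown = sorted(set(kwargs) - supported)
    if unknown:
        raise ValueError(f"Unsupported parameter(s): {', '.join(unknown)}")
    return dict(kwargs)


supported = {"page", "page_size", "search"}  # assumed parameter names
print(validate_kwargs(supported, page=1, search="soil"))
```

Failing early on an unknown name surfaces typos before any request is sent.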

child_resource: str #
config: MgnipyConfig#
exec: QueryExecutor#
count: int | None #
total_pages: int | None #
default_page_size: int #
request_urls: list [str ] | None #
class mgnipy.V2.proxies.Assemblies(*, params=None, config=None, **kwargs)[source]#

Bases: MGnifyList

Parameters:
  • params (Optional[dict [str , Any]])

  • config (MgnipyConfig)

RESOURCE: ClassVar [Literal ['assemblies']] = 'assemblies'#
async acollect_details(*, fetch=True, by_id=False, concurrency=None, hide_progress=False)#
Return type:

list ['QuerySet'] | dict [str , 'QuerySet']
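The concurrency and by_id arguments suggest a bounded fan-out whose results come back as a list or as a dict keyed by identifier. The sketch below shows one common way to do that with asyncio.Semaphore. `fetch_detail` and `collect` are hypothetical, no network call is made, and the library may limit concurrency differently.

```python
import asyncio


async def fetch_detail(identifier: str) -> dict:
    """Hypothetical fetch; yields to the event loop instead of calling an API."""
    await asyncio.sleep(0)
    return {"id": identifier}


async def collect(identifiers: list[str], concurrency: int = 2, by_id: bool = False):
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(identifier: str) -> dict:
        # At most `concurrency` fetches run at the same time.
        async with semaphore:
            return await fetch_detail(identifier)

    # gather() returns results in the same order as its inputs.
    details = await asyncio.gather(*(bounded(i) for i in identifiers))
    if by_id:
        return dict(zip(identifiers, details))
    return list(details)


result = asyncio.run(collect(["A1", "A2", "A3"], by_id=True))
print(result)
```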

async afirst()#

Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

async aget(*args, **kwargs)#
async aget_detail(access_param, fetch=True)#

Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Examples

sample = await samples.aget_detail({"accession": "MGYS00001234"})

Return type:

QuerySet

async aiter_details(fetch=True)#

Async version of iter_details.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An async iterator that yields child detail proxies.

Return type:

AsyncIterator of QuerySet
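An async iterator is consumed with `async for` inside a coroutine. The sketch below uses a stand-in async generator to show the calling pattern; `aiter_items` is hypothetical and performs no requests.

```python
import asyncio
from typing import AsyncIterator


async def aiter_items(identifiers: list[str]) -> AsyncIterator[dict]:
    """Yield stand-in detail records one at a time (hypothetical helper)."""
    for identifier in identifiers:
        await asyncio.sleep(0)  # stands in for an awaited request
        yield {"id": identifier}


async def main() -> list[str]:
    seen = []
    async for item in aiter_items(["A1", "A2"]):
        seen.append(item["id"])
    return seen


seen = asyncio.run(main())
print(seen)
```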

async apage(*args, **kwargs)#
property base_url: str #
collect_details(*, fetch=True, by_id=False)#

Collect child detail proxies into a list or dict.

Parameters:
  • fetch (bool ) – Whether to immediately fetch the details after creating the proxies.

  • by_id (bool ) – Whether to return a dict keyed by identifier instead of a list.

Returns:

A list or dict of child detail proxies.

Return type:

list of QuerySet or dict of str to QuerySet

Example

sample_detail = samples.collect_details(fetch=True, by_id=True)

property data: dict [int , list [dict [str , Any ]]]#

Results based on the current resource.

describe_endpoint(as_dict=False)#
Parameters:

as_dict (bool )

Return type:

dict [str , str ] | None

describe_relationships()#
dry_run(*, verbose=True)#

Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.

Return type:

None

Parameters:

verbose (bool )

property emgapi_docs: str #
property emgapi_resource: str | None #

Retrieves the name of the endpoint resource based on the endpoint module.

Returns:

The name of the endpoint resource, or None if the endpoint module is not set.

Return type:

str or None

property endpoint_module: Callable #
explain(head=None)#

Print example URLs that would be called. The actual requests are handled by the client.

Parameters:

head (int | None)

Return type:

None

filter(**filters)#

Update the parameters for the API call to filter results.

Parameters:

**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.

Returns:

A new QuerySet instance with updated parameters for filtering results.

Return type:

QuerySet

first()#

Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

get(*args, **kwargs)#
get_detail(access_param, fetch=True)#

Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Parameters:
  • access_param (dict [str , str ]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • resource_name (Optional[str ]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

sample = samples.get_detail({"accession": "MGYS00001234"})

property id_param_key: str #
property identifier: str | None #

Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.

Returns:

The identifier value corresponding to the resource type, or None if not available.

Return type:

str or None

iter_details(fetch=True)#

Lazily iterate over child detail proxies.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An iterator that yields child detail proxies.

Return type:

Iterator of QuerySet

Example

for sample in samples.iter_details():
    sample.get()

list_relationships()#
Return type:

list [str ]

list_supported_params()#

Lists supported keyword arguments for the endpoint module.

Returns:

List of supported keyword argument names.

Return type:

list of str

list_urls()#

Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.

Returns:

A list of URLs corresponding to each API request that would be made.

Return type:

list of str

page(*args, **kwargs)#
page_size(n)#

Set the page size for paginated API calls.

Parameters:

n (int )

Returns:

A new QuerySet instance with the updated page size parameter.

Return type:

QuerySet

property pagination_status: bool #

Check if the current resource requires pagination based on its supported keyword arguments.

Returns:

True if the resource is paginated, False otherwise.

Return type:

bool

property params: dict [str , Any ]#
preview()#

Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.

Returns:

A DataFrame containing the metadata from the first page of results.

Return type:

pd.DataFrame

Raises:

RuntimeError – If the API call fails or if no data is available to preview.

property request_url: str #

Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.

Returns:

The constructed URL for the API request.

Return type:

str

resolve_query_string(**kwargs)#

Resolves the query string for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the query string.

Returns:

The resolved query string.

Return type:

str

property resource: SupportedEndpoints#
property results: dict [int , list [dict ]]#
property results_ids: list [str ] | None #

Get a list of accessions from the retrieved metadata results, if available.

Returns:

A list of accession strings if available, otherwise None.

Return type:

list of str or None

sub_url(**kwargs)#

Constructs the sub-URL for the endpoint based on the current parameters.

Returns:

The constructed sub-URL, or None if the endpoint module is not set.

Return type:

str or None

to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#

Convert the current or provided metadata to a pandas DataFrame.

Parameters:
  • data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.

  • expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.

  • rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.

  • **kwargs – Additional keyword arguments passed to pd.DataFrame.

Returns:

DataFrame containing the metadata.

Return type:

pd.DataFrame | None

Raises:

RuntimeError – If no data is available to convert.

to_json(data=None, orient='records', lines=True, **json_kwargs)#

Convert the current metadata to a JSON string or save it to a file.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **json_kwargs – Additional keyword arguments passed to the JSON serialization function.

  • orient (str )

  • lines (bool )

Returns:

The JSON string representation of the metadata, or None if no data is available.

Return type:

str or None

Raises:

RuntimeError – If no data is available to convert.

to_list(data=None)#

Convert the current or provided metadata to a list of dictionaries.

Parameters:

data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.

Returns:

A list of metadata records as dictionaries, or None if no data is available.

Return type:

list of dict | None

Raises:

RuntimeError – If no data is available to convert.

to_polars(data=None, **polars_kwargs)#

Convert the current metadata to a Polars DataFrame.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **polars_kwargs – Additional keyword arguments passed to pl.DataFrame.

Returns:

A Polars DataFrame containing the metadata.

Return type:

pl.DataFrame

Raises:

RuntimeError – If no data is available to convert.

url_path(**kwargs)#

Constructs the full URL path for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the URL construction.

Returns:

The constructed URL path.

Return type:

str

validate_endpoint_kwargs(**kwargs)#

Validates the provided keyword arguments against the supported parameters of the endpoint module.

Parameters:

**kwargs – Keyword arguments to validate.

Returns:

The validated keyword arguments.

Return type:

dict of str to Any

Raises:

ValueError – If any provided keyword argument is not supported by the endpoint module.

child_resource: str #
config: MgnipyConfig#
exec: QueryExecutor#
count: int | None #
total_pages: int | None #
default_page_size: int #
request_urls: list [str ] | None #
class mgnipy.V2.proxies.Genomes(*, params=None, config=None, **kwargs)[source]#

Bases: MGnifyList

Parameters:
  • params (Optional[dict [str , Any]])

  • config (MgnipyConfig)

RESOURCE: ClassVar [Literal ['genomes']] = 'genomes'#
async acollect_details(*, fetch=True, by_id=False, concurrency=None, hide_progress=False)#
Return type:

list ['QuerySet'] | dict [str , 'QuerySet']

async afirst()#

Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

async aget(*args, **kwargs)#
async aget_detail(access_param, fetch=True)#

Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Examples

sample = await samples.aget_detail({"accession": "MGYS00001234"})

Return type:

QuerySet

async aiter_details(fetch=True)#

Async version of iter_details.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An async iterator that yields child detail proxies.

Return type:

AsyncIterator of QuerySet

async apage(*args, **kwargs)#
property base_url: str #
collect_details(*, fetch=True, by_id=False)#

Collect child detail proxies into a list or dict.

Parameters:
  • fetch (bool ) – Whether to immediately fetch the details after creating the proxies.

  • by_id (bool ) – Whether to return a dict keyed by identifier instead of a list.

Returns:

A list or dict of child detail proxies.

Return type:

list of QuerySet or dict of str to QuerySet

Example

sample_detail = samples.collect_details(fetch=True, by_id=True)

property data: dict [int , list [dict [str , Any ]]]#

Results based on the current resource.

describe_endpoint(as_dict=False)#
Parameters:

as_dict (bool )

Return type:

dict [str , str ] | None

describe_relationships()#
dry_run(*, verbose=True)#

Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.

Return type:

None

Parameters:

verbose (bool )

property emgapi_docs: str #
property emgapi_resource: str | None #

Retrieves the name of the endpoint resource based on the endpoint module.

Returns:

The name of the endpoint resource, or None if the endpoint module is not set.

Return type:

str or None

property endpoint_module: Callable #
explain(head=None)#

Print example URLs that would be called. The actual requests are handled by the client.

Parameters:

head (int | None)

Return type:

None

filter(**filters)#

Update the parameters for the API call to filter results.

Parameters:

**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.

Returns:

A new QuerySet instance with updated parameters for filtering results.

Return type:

QuerySet

first()#

Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

get(*args, **kwargs)#
get_detail(access_param, fetch=True)#

Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Parameters:
  • access_param (dict [str , str ]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • resource_name (Optional[str ]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

sample = samples.get_detail({"accession": "MGYS00001234"})

property id_param_key: str #
property identifier: str | None #

Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.

Returns:

The identifier value corresponding to the resource type, or None if not available.

Return type:

str or None

iter_details(fetch=True)#

Lazily iterate over child detail proxies.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An iterator that yields child detail proxies.

Return type:

Iterator of QuerySet

Example

for sample in samples.iter_details():
    sample.get()

list_relationships()#
Return type:

list [str ]

list_supported_params()#

Lists supported keyword arguments for the endpoint module.

Returns:

List of supported keyword argument names.

Return type:

list of str

list_urls()#

Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.

Returns:

A list of URLs corresponding to each API request that would be made.

Return type:

list of str

page(*args, **kwargs)#
page_size(n)#

Set the page size for paginated API calls.

Parameters:

n (int )

Returns:

A new QuerySet instance with the updated page size parameter.

Return type:

QuerySet

property pagination_status: bool #

Check if the current resource requires pagination based on its supported keyword arguments.

Returns:

True if the resource is paginated, False otherwise.

Return type:

bool

property params: dict [str , Any ]#
preview()#

Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.

Returns:

A DataFrame containing the metadata from the first page of results.

Return type:

pd.DataFrame

Raises:

RuntimeError – If the API call fails or if no data is available to preview.

property request_url: str #

Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.

Returns:

The constructed URL for the API request.

Return type:

str

resolve_query_string(**kwargs)#

Resolves the query string for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the query string.

Returns:

The resolved query string.

Return type:

str

property resource: SupportedEndpoints#
property results: dict [int , list [dict ]]#
property results_ids: list [str ] | None #

Get a list of accessions from the retrieved metadata results, if available.

Returns:

A list of accession strings if available, otherwise None.

Return type:

list of str or None

sub_url(**kwargs)#

Constructs the sub-URL for the endpoint based on the current parameters.

Returns:

The constructed sub-URL, or None if the endpoint module is not set.

Return type:

str or None

to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#

Convert the current or provided metadata to a pandas DataFrame.

Parameters:
  • data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.

  • expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.

  • rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.

  • **kwargs – Additional keyword arguments passed to pd.DataFrame.

Returns:

DataFrame containing the metadata.

Return type:

pd.DataFrame | None

Raises:

RuntimeError – If no data is available to convert.

to_json(data=None, orient='records', lines=True, **json_kwargs)#

Convert the current metadata to a JSON string or save it to a file.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **json_kwargs – Additional keyword arguments passed to the JSON serialization function.

  • orient (str )

  • lines (bool )

Returns:

The JSON string representation of the metadata, or None if no data is available.

Return type:

str or None

Raises:

RuntimeError – If no data is available to convert.

to_list(data=None)#

Convert the current or provided metadata to a list of dictionaries.

Parameters:

data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.

Returns:

A list of metadata records as dictionaries, or None if no data is available.

Return type:

list of dict | None

Raises:

RuntimeError – If no data is available to convert.

to_polars(data=None, **polars_kwargs)#

Convert the current metadata to a Polars DataFrame.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **polars_kwargs – Additional keyword arguments passed to pl.DataFrame.

Returns:

A Polars DataFrame containing the metadata.

Return type:

pl.DataFrame

Raises:

RuntimeError – If no data is available to convert.

url_path(**kwargs)#

Constructs the full URL path for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the URL construction.

Returns:

The constructed URL path.

Return type:

str

validate_endpoint_kwargs(**kwargs)#

Validates the provided keyword arguments against the supported parameters of the endpoint module.

Parameters:

**kwargs – Keyword arguments to validate.

Returns:

The validated keyword arguments.

Return type:

dict of str to Any

Raises:

ValueError – If any provided keyword argument is not supported by the endpoint module.

child_resource: str #
config: MgnipyConfig#
exec: QueryExecutor#
count: int | None #
total_pages: int | None #
default_page_size: int #
request_urls: list [str ] | None #
class mgnipy.V2.proxies.Publications(*, params=None, config=None, **kwargs)[source]#

Bases: MGnifyList

Parameters:
  • params (Optional[dict [str , Any]])

  • config (MgnipyConfig)

RESOURCE: ClassVar [Literal ['publications']] = 'publications'#
async acollect_details(*, fetch=True, by_id=False, concurrency=None, hide_progress=False)#
Return type:

list ['QuerySet'] | dict [str , 'QuerySet']

async afirst()#

Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

async aget(*args, **kwargs)#
async aget_detail(access_param, fetch=True)#

Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Examples

sample = await samples.aget_detail({"accession": "MGYS00001234"})

Return type:

QuerySet

async aiter_details(fetch=True)#

Async version of iter_details.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An async iterator that yields child detail proxies.

Return type:

AsyncIterator of QuerySet

async apage(*args, **kwargs)#
property base_url: str #
collect_details(*, fetch=True, by_id=False)#

Collect child detail proxies into a list or dict.

Parameters:
  • fetch (bool ) – Whether to immediately fetch the details after creating the proxies.

  • by_id (bool ) – Whether to return a dict keyed by identifier instead of a list.

Returns:

A list or dict of child detail proxies.

Return type:

list of QuerySet or dict of str to QuerySet

Example

sample_detail = samples.collect_details(fetch=True, by_id=True)

property data: dict [int , list [dict [str , Any ]]]#

Results based on the current resource.

describe_endpoint(as_dict=False)#
Parameters:

as_dict (bool )

Return type:

dict [str , str ] | None

describe_relationships()#
dry_run(*, verbose=True)#

Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.

Return type:

None

Parameters:

verbose (bool )

property emgapi_docs: str #
property emgapi_resource: str | None #

Retrieves the name of the endpoint resource based on the endpoint module.

Returns:

The name of the endpoint resource, or None if the endpoint module is not set.

Return type:

str or None

property endpoint_module: Callable #
explain(head=None)#

Print example URLs that would be called. The actual requests are handled by the client.

Parameters:

head (int | None)

Return type:

None

filter(**filters)#

Update the parameters for the API call to filter results.

Parameters:

**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.

Returns:

A new QuerySet instance with updated parameters for filtering results.

Return type:

QuerySet

first()#

Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

get(*args, **kwargs)#
get_detail(access_param, fetch=True)#

Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Parameters:
  • access_param (dict [str , str ]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • resource_name (Optional[str ]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

sample = samples.get_detail({"accession": "MGYS00001234"})

property id_param_key: str #
property identifier: str | None #

Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.

Returns:

The identifier value corresponding to the resource type, or None if not available.

Return type:

str or None

iter_details(fetch=True)#

Lazily iterate over child detail proxies.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An iterator that yields child detail proxies.

Return type:

Iterator of QuerySet

Example

for sample in samples.iter_details():
    sample.get()

list_relationships()#
Return type:

list [str ]

list_supported_params()#

Lists supported keyword arguments for the endpoint module.

Returns:

List of supported keyword argument names.

Return type:

list of str

list_urls()#

Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.

Returns:

A list of URLs corresponding to each API request that would be made.

Return type:

list of str

page(*args, **kwargs)#
page_size(n)#

Set the page size for paginated API calls.

Parameters:

n (int )

Returns:

A new QuerySet instance with the updated page size parameter.

Return type:

QuerySet

property pagination_status: bool #

Check if the current resource requires pagination based on its supported keyword arguments.

Returns:

True if the resource is paginated, False otherwise.

Return type:

bool

property params: dict [str , Any ]#
preview()#

Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.

Returns:

A DataFrame containing the metadata from the first page of results.

Return type:

pd.DataFrame

Raises:

RuntimeError – If the API call fails or if no data is available to preview.

property request_url: str #

Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.

Returns:

The constructed URL for the API request.

Return type:

str

resolve_query_string(**kwargs)#

Resolves the query string for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the query string.

Returns:

The resolved query string.

Return type:

str

property resource: SupportedEndpoints#
property results: dict [int , list [dict ]]#
property results_ids: list [str ] | None #

Get a list of accessions from the retrieved metadata results, if available.

Returns:

A list of accession strings if available, otherwise None.

Return type:

list of str or None

sub_url(**kwargs)#

Constructs the sub-URL for the endpoint based on the current parameters.

Returns:

The constructed sub-URL, or None if the endpoint module is not set.

Return type:

str or None

to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#

Convert the current or provided metadata to a pandas DataFrame.

Parameters:
  • data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.

  • expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.

  • rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.

  • **kwargs – Additional keyword arguments passed to pd.DataFrame.

Returns:

DataFrame containing the metadata.

Return type:

pd.DataFrame | None

Raises:

RuntimeError – If no data is available to convert.

to_json(data=None, orient='records', lines=True, **json_kwargs)#

Convert the current metadata to a JSON string or save it to a file.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **json_kwargs – Additional keyword arguments passed to the JSON serialization function.

  • orient (str )

  • lines (bool )

Returns:

The JSON string representation of the metadata, or None if no data is available.

Return type:

str or None

Raises:

RuntimeError – If no data is available to convert.
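The defaults `orient='records'` and `lines=True` resemble the pandas `DataFrame.to_json` options that emit one JSON object per line. Assuming that line-delimited layout is the intended output, the following standard-library sketch shows the idea; `to_json_lines_sketch` is a hypothetical helper and does not reproduce the library's own serialization.

```python
import json
from typing import Any


def to_json_lines_sketch(data: dict[int, list[dict[str, Any]]]) -> str:
    """Serialize paginated records as JSON Lines: one JSON object per line."""
    if not data:
        raise RuntimeError("no data is available to convert")
    # Flatten the pages in page-number order before serializing.
    records = [record for page in sorted(data) for record in data[page]]
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


pages = {2: [{"accession": "B"}], 1: [{"accession": "A"}]}
text = to_json_lines_sketch(pages)
```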

to_list(data=None)#

Convert the current or provided metadata to a list of dictionaries.

Parameters:

data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.

Returns:

A list of metadata records as dictionaries, or None if no data is available.

Return type:

list of dict | None

Raises:

RuntimeError – If no data is available to convert.

to_polars(data=None, **polars_kwargs)#

Convert the current metadata to a Polars DataFrame.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **polars_kwargs – Additional keyword arguments passed to pl.DataFrame.

Returns:

A Polars DataFrame containing the metadata.

Return type:

pl.DataFrame

Raises:

RuntimeError – If no data is available to convert.

url_path(**kwargs)#

Constructs the full URL path for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the URL construction.

Returns:

The constructed URL path.

Return type:

str

validate_endpoint_kwargs(**kwargs)#

Validates the provided keyword arguments against the supported parameters of the endpoint module.

Parameters:

**kwargs – Keyword arguments to validate.

Returns:

The validated keyword arguments.

Return type:

dict of str to Any

Raises:

ValueError – If any provided keyword argument is not supported by the endpoint module.
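The validation step described above amounts to comparing the supplied keyword names with the supported ones. A minimal sketch, assuming the supported names are available as a list (as `list_supported_params()` suggests); `validate_kwargs_sketch` and the parameter names used below are hypothetical:

```python
from typing import Any


def validate_kwargs_sketch(supported: list[str], **kwargs: Any) -> dict[str, Any]:
    """Return the kwargs unchanged, or raise ValueError naming the unsupported ones."""
    unsupported = sorted(set(kwargs) - set(supported))
    if unsupported:
        raise ValueError(
            f"unsupported parameter(s): {', '.join(unsupported)}; "
            f"supported: {', '.join(supported)}"
        )
    return dict(kwargs)


SUPPORTED = ["page", "page_size", "search"]  # illustrative names only
validated = validate_kwargs_sketch(SUPPORTED, page=2, search="soil")

try:
    validate_kwargs_sketch(SUPPORTED, colour="blue")
except ValueError as exc:
    error_message = str(exc)
```

Listing the supported names in the error message lets the caller correct a misspelled parameter without consulting the endpoint documentation.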

child_resource: str #
config: MgnipyConfig#
exec: QueryExecutor#
count: int | None #
total_pages: int | None #
default_page_size: int #
request_urls: list [str ] | None #
class mgnipy.V2.proxies.Catalogues(*, params=None, config=None, **kwargs)[source]#

Bases: MGnifyList

Parameters:
  • params (Optional[dict [str , Any]])

  • config (MgnipyConfig)

RESOURCE: ClassVar [Literal ['catalogues']] = 'catalogues'#
async acollect_details(*, fetch=True, by_id=False, concurrency=None, hide_progress=False)#
Return type:

list ['QuerySet'] | dict [str , 'QuerySet']
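The `concurrency` argument is not described here. Assuming it caps the number of simultaneous requests, the usual asyncio pattern is a semaphore around each fetch. The sketch below uses a stub coroutine in place of a real API call; `fetch_detail_stub` and `acollect_sketch` are hypothetical names.

```python
import asyncio


async def fetch_detail_stub(accession: str) -> dict[str, str]:
    """Stand-in for one detail request; yields control once, then returns a record."""
    await asyncio.sleep(0)
    return {"accession": accession}


async def acollect_sketch(accessions, *, by_id=False, concurrency=None):
    # Assumption: `concurrency` limits how many fetches run at the same time.
    semaphore = asyncio.Semaphore(concurrency or len(accessions) or 1)

    async def bounded(accession):
        async with semaphore:
            return await fetch_detail_stub(accession)

    # gather() returns results in the order the awaitables were passed in.
    details = await asyncio.gather(*(bounded(a) for a in accessions))
    if by_id:
        return {d["accession"]: d for d in details}
    return list(details)


as_list = asyncio.run(acollect_sketch(["A1", "A2"], concurrency=1))
as_dict = asyncio.run(acollect_sketch(["A1", "A2"], by_id=True))
```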

async afirst()#

Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

async aget(*args, **kwargs)#
async aget_detail(access_param, fetch=True)#

Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Examples

sample = await samples.aget_detail({"accession": "MGYS00001234"})

Return type:

QuerySet

async aiter_details(fetch=True)#

Async version of iter_details.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An async iterator that yields child detail proxies.

Return type:

AsyncIterator of QuerySet

async apage(*args, **kwargs)#
property base_url: str #
collect_details(*, fetch=True, by_id=False)#

Collect child detail proxies into a list or dict.

Parameters:
  • fetch (bool ) – Whether to immediately fetch the details after creating the proxies.

  • by_id (bool ) – Whether to return a dict keyed by identifier instead of a list.

Returns:

A list or dict of child detail proxies.

Return type:

list of QuerySet or dict of str to QuerySet

Example

sample_detail = samples.collect_details(fetch=True, by_id=True)

property data: dict [int , list [dict [str , Any ]]]#

Results for the current resource, keyed by page number.

describe_endpoint(as_dict=False)#
Parameters:

as_dict (bool )

Return type:

dict [str , str ] | None

describe_relationships()#
dry_run(*, verbose=True)#

Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.

Return type:

None

Parameters:

verbose (bool )
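The page estimate in such a plan follows from the record count and the page size: the number of pages is the count divided by the page size, rounded up. A small sketch of that arithmetic, with `plan_sketch` as a hypothetical helper and the dictionary keys chosen only for illustration:

```python
import math


def plan_sketch(count: int, page_size: int) -> dict[str, int]:
    """Estimate how many paginated requests are needed to retrieve `count` records."""
    if page_size < 1:
        raise ValueError("page_size must be a positive integer")
    return {
        "count": count,
        "page_size": page_size,
        "total_pages": math.ceil(count / page_size),
    }


plan = plan_sketch(count=1234, page_size=50)
```

For 1234 records at 50 per page this gives 25 pages, the last of which is only partly filled.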

property emgapi_docs: str #
property emgapi_resource: str | None #

Retrieves the name of the endpoint resource based on the endpoint module.

Returns:

The name of the endpoint resource, or None if the endpoint module is not set.

Return type:

str or None

property endpoint_module: Callable #
explain(head=None)#

Print example URLs that would be called. Actual requests handled by client.

Parameters:

head (int | None)

Return type:

None

filter(**filters)#

Update the parameters for the API call to filter results.

Parameters:

**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.

Returns:

A new QuerySet instance with updated parameters for filtering results.

Return type:

QuerySet

first()#

Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

get(*args, **kwargs)#
get_detail(access_param, fetch=True)#

Get detail proxy for a specific accession/pubmed_id/catalogue_id.

Parameters:
  • access_param (dict [str , str ]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • resource_name (Optional[str ]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

sample = samples.get_detail({"accession": "MGYS00001234"})

property id_param_key: str #
property identifier: str | None #

Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.

Returns:

The identifier value corresponding to the resource type, or None if not available.

Return type:

str or None

iter_details(fetch=True)#

Lazily iterate over child detail proxies.

Parameters:

fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.

Returns:

An iterator that yields child detail proxies.

Return type:

Iterator of QuerySet

Example

for sample in samples.iter_details():
    sample.get()

list_relationships()#
Return type:

list [str ]

list_supported_params()#

Lists supported keyword arguments for the endpoint module.

Returns:

List of supported keyword argument names.

Return type:

list of str

list_urls()#

Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.

Returns:

A list of URLs corresponding to each API request that would be made.

Return type:

list of str

page(*args, **kwargs)#
page_size(n)#

Set the page size for paginated API calls.

Parameters:

n (int )

Returns:

A new QuerySet instance with the updated page size parameter.

Return type:

QuerySet

property pagination_status: bool #

Check if the current resource requires pagination based on its supported keyword arguments.

Returns:

True if the resource is paginated, False otherwise.

Return type:

bool

property params: dict [str , Any ]#
preview()#

Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.

Returns:

A DataFrame containing the metadata from the first page of results.

Return type:

pd.DataFrame

Raises:

RuntimeError – If the API call fails or if no data is available to preview.

property request_url: str #

Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.

Returns:

The constructed URL for the API request.

Return type:

str

resolve_query_string(**kwargs)#

Resolves the query string for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the query string.

Returns:

The resolved query string.

Return type:

str

property resource: SupportedEndpoints#
property results: dict [int , list [dict ]]#
property results_ids: list [str ] | None #

Get a list of accessions from the retrieved metadata results, if available.

Returns:

A list of accession strings if available, otherwise None.

Return type:

list of str or None
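Collecting identifiers from paginated results can be sketched as a flatten-and-pick operation. The example assumes each record is a dictionary that may carry an `accession` key; `results_ids_sketch` is a hypothetical helper, and the real property may use a different key depending on the resource.

```python
from typing import Any, Optional


def results_ids_sketch(
    results: dict[int, list[dict[str, Any]]], id_key: str = "accession"
) -> Optional[list[str]]:
    """Return identifiers in page order, or None when nothing has been retrieved."""
    if not results:
        return None
    return [
        str(record[id_key])
        for page in sorted(results)
        for record in results[page]
        if id_key in record
    ]


ids = results_ids_sketch(
    {1: [{"accession": "MGYS1"}], 2: [{"accession": "MGYS2"}, {"title": "no id"}]}
)
```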

sub_url(**kwargs)#

Constructs the sub-URL for the endpoint based on the current parameters.

Returns:

The constructed sub-URL, or None if the endpoint module is not set.

Return type:

str or None

to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#

Convert the current or provided metadata to a pandas DataFrame.

Parameters:
  • data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.

  • expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.

  • rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.

  • **kwargs – Additional keyword arguments passed to pd.DataFrame.

Returns:

DataFrame containing the metadata.

Return type:

pd.DataFrame | None

Raises:

RuntimeError – If no data is available to convert.
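Expanding nested dictionaries and renaming columns can be illustrated on a single record without pandas. This is a sketch under assumptions: the dotted `parent.child` naming for expanded keys is chosen for the example and may differ from what the library produces, and `flatten_record_sketch` is a hypothetical helper.

```python
from typing import Any, Iterable, Optional


def flatten_record_sketch(
    record: dict[str, Any],
    expand: Iterable[str] = (),
    rename: Optional[dict[str, str]] = None,
) -> dict[str, Any]:
    """Expand selected nested dicts into separate keys, then rename keys."""
    expand = set(expand)
    rename = rename or {}
    flat: dict[str, Any] = {}
    for key, value in record.items():
        if key in expand and isinstance(value, dict):
            # Assumed naming scheme for expanded keys: "<parent>.<child>".
            for inner_key, inner_value in value.items():
                flat[f"{key}.{inner_key}"] = inner_value
        else:
            flat[key] = value
    return {rename.get(key, key): value for key, value in flat.items()}


row = flatten_record_sketch(
    {"accession": "A1", "biome": {"lineage": "root", "name": "Root"}},
    expand=["biome"],
    rename={"biome.name": "biome_name"},
)
```

A list of rows flattened this way has one scalar value per key, which is the shape a tabular constructor such as `pd.DataFrame` expects.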

to_json(data=None, orient='records', lines=True, **json_kwargs)#

Convert the current metadata to a JSON string or save it to a file.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **json_kwargs – Additional keyword arguments passed to the JSON serialization function.

  • orient (str )

  • lines (bool )

Returns:

The JSON string representation of the metadata, or None if no data is available.

Return type:

str or None

Raises:

RuntimeError – If no data is available to convert.

to_list(data=None)#

Convert the current or provided metadata to a list of dictionaries.

Parameters:

data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.

Returns:

A list of metadata records as dictionaries, or None if no data is available.

Return type:

list of dict | None

Raises:

RuntimeError – If no data is available to convert.

to_polars(data=None, **polars_kwargs)#

Convert the current metadata to a Polars DataFrame.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **polars_kwargs – Additional keyword arguments passed to pl.DataFrame.

Returns:

A Polars DataFrame containing the metadata.

Return type:

pl.DataFrame

Raises:

RuntimeError – If no data is available to convert.

url_path(**kwargs)#

Constructs the full URL path for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the URL construction.

Returns:

The constructed URL path.

Return type:

str

validate_endpoint_kwargs(**kwargs)#

Validates the provided keyword arguments against the supported parameters of the endpoint module.

Parameters:

**kwargs – Keyword arguments to validate.

Returns:

The validated keyword arguments.

Return type:

dict of str to Any

Raises:

ValueError – If any provided keyword argument is not supported by the endpoint module.

child_resource: str #
config: MgnipyConfig#
exec: QueryExecutor#
count: int | None #
total_pages: int | None #
default_page_size: int #
request_urls: list [str ] | None #
class mgnipy.V2.proxies.StudyDetail(id=None, *, accession=None, config=None, **kwargs)[source]#

Bases: MGnifyDetail

Parameters:
  • id (Optional[str ])

  • accession (Optional[str ])

  • config (MgnipyConfig)

RESOURCE: ClassVar [Literal ['study']] = 'study'#
async afirst()#

Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

async aget(*args, **kwargs)#
async aget_list(resource, access_param, fetch=True, explain=False)#

Get list proxy for a specific accession/pubmed_id/catalogue_id detail.

Parameters:
  • resource (str ) – A valid child resource name, as returned by list_relationships(), such as "samples" for a study detail or "analyses" for a run detail.

  • access_param (dict [str , str ]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

  • explain (bool ) – Whether to print example URLs that would be called.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

samples = await study.aget_list("samples", {"accession": "MGYS00001234"})

async apage(*args, **kwargs)#
property base_url: str #
property data: dict [int , list [dict [str , Any ]]]#

Results for the current resource, keyed by page number.

describe_endpoint(as_dict=False)#
Parameters:

as_dict (bool )

Return type:

dict [str , str ] | None

describe_relationships()#
dry_run(*, verbose=True)#

Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.

Return type:

None

Parameters:

verbose (bool )

property emgapi_docs: str #
property emgapi_resource: str | None #

Retrieves the name of the endpoint resource based on the endpoint module.

Returns:

The name of the endpoint resource, or None if the endpoint module is not set.

Return type:

str or None

property endpoint_module: Callable #
explain(head=None)#

Print example URLs that would be called. Actual requests handled by client.

Parameters:

head (int | None)

Return type:

None

filter(**filters)#

Update the parameters for the API call to filter results.

Parameters:

**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.

Returns:

A new QuerySet instance with updated parameters for filtering results.

Return type:

QuerySet

first()#

Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

get(*args, **kwargs)#
get_list(resource, access_param, fetch=True, explain=False)#

Get list proxy for a specific accession/pubmed_id/catalogue_id detail.

Parameters:
  • resource (str ) – A valid child resource name, as returned by list_relationships(), such as "samples" for a study detail or "analyses" for a run detail.

  • access_param (dict [str , str ]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

  • explain (bool ) – Whether to print example URLs that would be called.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

samples = study.get_list("samples", {"accession": "MGYS00001234"})
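Moving from a detail to one of its child lists involves two checks: the requested resource must be one of the detail's relationships, and the parent identifier must be known. The sketch below shows that logic with a hypothetical helper; the `parent/identifier/child` path layout is an assumption for illustration and is not taken from the MGnify API documentation.

```python
def child_list_path_sketch(
    parent: str, identifier: str, resource: str, relationships: list[str]
) -> str:
    """Build a nested path for a child list after validating the relationship."""
    if resource not in relationships:
        raise ValueError(
            f"{resource!r} is not a relationship of {parent!r}; "
            f"expected one of {relationships}"
        )
    # Assumed layout: <parent>/<identifier>/<child resource>
    return f"{parent}/{identifier}/{resource}"


path = child_list_path_sketch(
    "studies", "MGYS00001234", "samples", ["samples", "analyses"]
)
```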

property id_param_key: str #
property identifier: str | None #

Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.

Returns:

The identifier value corresponding to the resource type, or None if not available.

Return type:

str or None

list_relationships()#
Return type:

list [str ]

list_supported_params()#

Lists supported keyword arguments for the endpoint module.

Returns:

List of supported keyword argument names.

Return type:

list of str

list_urls()#

Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.

Returns:

A list of URLs corresponding to each API request that would be made.

Return type:

list of str

page(*args, **kwargs)#
page_size(n)#

Set the page size for paginated API calls.

Parameters:

n (int )

Returns:

A new QuerySet instance with the updated page size parameter.

Return type:

QuerySet

property pagination_status: bool #

Check if the current resource requires pagination based on its supported keyword arguments.

Returns:

True if the resource is paginated, False otherwise.

Return type:

bool

property params: dict [str , Any ]#
preview()#

Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.

Returns:

A DataFrame containing the metadata from the first page of results.

Return type:

pd.DataFrame

Raises:

RuntimeError – If the API call fails or if no data is available to preview.

property request_url: str #

Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.

Returns:

The constructed URL for the API request.

Return type:

str

resolve_query_string(**kwargs)#

Resolves the query string for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the query string.

Returns:

The resolved query string.

Return type:

str
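A query string of this kind can be produced with `urllib.parse.urlencode`. The sketch assumes that parameters set to None are simply omitted, which is a common convention but is not stated in this reference; `query_string_sketch` and the parameter names are hypothetical.

```python
from urllib.parse import urlencode


def query_string_sketch(**params) -> str:
    """Encode the parameters that are set; skip those left as None."""
    present = {key: value for key, value in params.items() if value is not None}
    return urlencode(present)


qs = query_string_sketch(page=1, search="marine sediment", biome=None)
```

`urlencode` percent-encodes values and, by default, writes spaces as `+`, so the result can be appended to a URL after a `?` without further escaping.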

property resource: SupportedEndpoints#
property results: dict [int , list [dict ]]#
property results_ids: list [str ] | None #

Get a list of accessions from the retrieved metadata results, if available.

Returns:

A list of accession strings if available, otherwise None.

Return type:

list of str or None

sub_url(**kwargs)#

Constructs the sub-URL for the endpoint based on the current parameters.

Returns:

The constructed sub-URL, or None if the endpoint module is not set.

Return type:

str or None

to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#

Convert the current or provided metadata to a pandas DataFrame.

Parameters:
  • data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.

  • expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.

  • rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.

  • **kwargs – Additional keyword arguments passed to pd.DataFrame.

Returns:

DataFrame containing the metadata.

Return type:

pd.DataFrame | None

Raises:

RuntimeError – If no data is available to convert.

to_json(data=None, orient='records', lines=True, **json_kwargs)#

Convert the current metadata to a JSON string or save it to a file.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **json_kwargs – Additional keyword arguments passed to the JSON serialization function.

  • orient (str )

  • lines (bool )

Returns:

The JSON string representation of the metadata, or None if no data is available.

Return type:

str or None

Raises:

RuntimeError – If no data is available to convert.

to_list(data=None)#

Convert the current or provided metadata to a list of dictionaries.

Parameters:

data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.

Returns:

A list of metadata records as dictionaries, or None if no data is available.

Return type:

list of dict | None

Raises:

RuntimeError – If no data is available to convert.

to_polars(data=None, **polars_kwargs)#

Convert the current metadata to a Polars DataFrame.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **polars_kwargs – Additional keyword arguments passed to pl.DataFrame.

Returns:

A Polars DataFrame containing the metadata.

Return type:

pl.DataFrame

Raises:

RuntimeError – If no data is available to convert.

url_path(**kwargs)#

Constructs the full URL path for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the URL construction.

Returns:

The constructed URL path.

Return type:

str

validate_endpoint_kwargs(**kwargs)#

Validates the provided keyword arguments against the supported parameters of the endpoint module.

Parameters:

**kwargs – Keyword arguments to validate.

Returns:

The validated keyword arguments.

Return type:

dict of str to Any

Raises:

ValueError – If any provided keyword argument is not supported by the endpoint module.

config: MgnipyConfig#
exec: QueryExecutor#
count: int | None #
total_pages: int | None #
default_page_size: int #
request_urls: list [str ] | None #
class mgnipy.V2.proxies.SampleDetail(id=None, *, accession=None, config=None, **kwargs)[source]#

Bases: MGnifyDetail

Parameters:
  • id (Optional[str ])

  • accession (Optional[str ])

  • config (MgnipyConfig)

RESOURCE: ClassVar [Literal ['sample']] = 'sample'#
async afirst()#

Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

async aget(*args, **kwargs)#
async aget_list(resource, access_param, fetch=True, explain=False)#

Get list proxy for a specific accession/pubmed_id/catalogue_id detail.

Parameters:
  • resource (str ) – A valid child resource name, as returned by list_relationships(), such as "samples" for a study detail or "analyses" for a run detail.

  • access_param (dict [str , str ]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

  • explain (bool ) – Whether to print example URLs that would be called.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

samples = await study.aget_list("samples", {"accession": "MGYS00001234"})

async apage(*args, **kwargs)#
property base_url: str #
property data: dict [int , list [dict [str , Any ]]]#

Results for the current resource, keyed by page number.

describe_endpoint(as_dict=False)#
Parameters:

as_dict (bool )

Return type:

dict [str , str ] | None

describe_relationships()#
dry_run(*, verbose=True)#

Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.

Return type:

None

Parameters:

verbose (bool )

property emgapi_docs: str #
property emgapi_resource: str | None #

Retrieves the name of the endpoint resource based on the endpoint module.

Returns:

The name of the endpoint resource, or None if the endpoint module is not set.

Return type:

str or None

property endpoint_module: Callable #
explain(head=None)#

Print example URLs that would be called. Actual requests handled by client.

Parameters:

head (int | None)

Return type:

None

filter(**filters)#

Update the parameters for the API call to filter results.

Parameters:

**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.

Returns:

A new QuerySet instance with updated parameters for filtering results.

Return type:

QuerySet

first()#

Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

get(*args, **kwargs)#
get_list(resource, access_param, fetch=True, explain=False)#

Get list proxy for a specific accession/pubmed_id/catalogue_id detail.

Parameters:
  • resource (str ) – A valid child resource name, as returned by list_relationships(), such as "samples" for a study detail or "analyses" for a run detail.

  • access_param (dict [str , str ]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

  • explain (bool ) – Whether to print example URLs that would be called.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

samples = study.get_list("samples", {"accession": "MGYS00001234"})

property id_param_key: str #
property identifier: str | None #

Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.

Returns:

The identifier value corresponding to the resource type, or None if not available.

Return type:

str or None

list_relationships()#
Return type:

list [str ]

list_supported_params()#

Lists supported keyword arguments for the endpoint module.

Returns:

List of supported keyword argument names.

Return type:

list of str

list_urls()#

Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.

Returns:

A list of URLs corresponding to each API request that would be made.

Return type:

list of str

page(*args, **kwargs)#
page_size(n)#

Set the page size for paginated API calls.

Parameters:

n (int )

Returns:

A new QuerySet instance with the updated page size parameter.

Return type:

QuerySet

property pagination_status: bool #

Check if the current resource requires pagination based on its supported keyword arguments.

Returns:

True if the resource is paginated, False otherwise.

Return type:

bool
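Since the check is described as being based on the supported keyword arguments, one plausible reading is that a resource counts as paginated when its endpoint accepts a paging parameter. The sketch below encodes that reading; the parameter names `page` and `page_size` and the helper `pagination_status_sketch` are assumptions, not confirmed by this reference.

```python
def pagination_status_sketch(supported_params: list[str]) -> bool:
    """Treat an endpoint as paginated if it accepts an (assumed) paging parameter."""
    paging_params = {"page", "page_size"}  # assumed names
    return any(name in paging_params for name in supported_params)


list_like = pagination_status_sketch(["page", "page_size", "search"])
detail_like = pagination_status_sketch(["accession"])
```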

property params: dict [str , Any ]#
preview()#

Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.

Returns:

A DataFrame containing the metadata from the first page of results.

Return type:

pd.DataFrame

Raises:

RuntimeError – If the API call fails or if no data is available to preview.

property request_url: str #

Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.

Returns:

The constructed URL for the API request.

Return type:

str

resolve_query_string(**kwargs)#

Resolves the query string for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the query string.

Returns:

The resolved query string.

Return type:

str

property resource: SupportedEndpoints#
property results: dict [int , list [dict ]]#
property results_ids: list [str ] | None #

Get a list of accessions from the retrieved metadata results, if available.

Returns:

A list of accession strings if available, otherwise None.

Return type:

list of str or None

sub_url(**kwargs)#

Constructs the sub-URL for the endpoint based on the current parameters.

Returns:

The constructed sub-URL, or None if the endpoint module is not set.

Return type:

str or None

to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#

Convert the current or provided metadata to a pandas DataFrame.

Parameters:
  • data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.

  • expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.

  • rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.

  • **kwargs – Additional keyword arguments passed to pd.DataFrame.

Returns:

DataFrame containing the metadata.

Return type:

pd.DataFrame | None

Raises:

RuntimeError – If no data is available to convert.

to_json(data=None, orient='records', lines=True, **json_kwargs)#

Convert the current metadata to a JSON string or save it to a file.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **json_kwargs – Additional keyword arguments passed to the JSON serialization function.

  • orient (str )

  • lines (bool )

Returns:

The JSON string representation of the metadata, or None if no data is available.

Return type:

str or None

Raises:

RuntimeError – If no data is available to convert.

to_list(data=None)#

Convert the current or provided metadata to a list of dictionaries.

Parameters:

data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.

Returns:

A list of metadata records as dictionaries, or None if no data is available.

Return type:

list of dict | None

Raises:

RuntimeError – If no data is available to convert.

to_polars(data=None, **polars_kwargs)#

Convert the current metadata to a Polars DataFrame.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **polars_kwargs – Additional keyword arguments passed to pl.DataFrame.

Returns:

A Polars DataFrame containing the metadata.

Return type:

pl.DataFrame

Raises:

RuntimeError – If no data is available to convert.

url_path(**kwargs)#

Constructs the full URL path for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the URL construction.

Returns:

The constructed URL path.

Return type:

str

validate_endpoint_kwargs(**kwargs)#

Validates the provided keyword arguments against the supported parameters of the endpoint module.

Parameters:

**kwargs – Keyword arguments to validate.

Returns:

The validated keyword arguments.

Return type:

dict of str to Any

Raises:

ValueError – If any provided keyword argument is not supported by the endpoint module.
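
The validation described above can be pictured as a set difference between the supplied keyword names and the supported ones. The following is a minimal sketch under that assumption; `validate_kwargs`, the `supported` set, and the parameter names are hypothetical and do not reflect how mgnipy inspects its endpoint modules.

```python
from typing import Any


def validate_kwargs(supported: set[str], /, **kwargs: Any) -> dict[str, Any]:
    """Return kwargs unchanged if every key is supported, otherwise raise ValueError."""
    unsupported = sorted(set(kwargs) - supported)
    if unsupported:
        raise ValueError(f"Unsupported keyword argument(s): {', '.join(unsupported)}")
    return dict(kwargs)


# Hypothetical supported parameters for some endpoint.
supported = {"accession", "page", "page_size"}
validated = validate_kwargs(supported, accession="EXAMPLE_1", page_size=25)
```

Failing early with the full list of unsupported names lets the caller fix every problem in one pass instead of discovering them one request at a time.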

config: MgnipyConfig#
exec: QueryExecutor#
count: int | None #
total_pages: int | None #
default_page_size: int #
request_urls: list [str ] | None #
class mgnipy.V2.proxies.RunDetail(id=None, *, accession=None, config=None, **kwargs)[source]#

Bases: MGnifyDetail

Parameters:
  • id (Optional[str ])

  • accession (Optional[str ])

  • config (MgnipyConfig)

RESOURCE: ClassVar [Literal ['run']] = 'run'#
async afirst()#

Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

async aget(*args, **kwargs)#
async aget_list(resource, access_param, fetch=True, explain=False)#

Get a list proxy for a child resource of a specific detail, identified by accession, pubmed_id, or catalogue_id.

Parameters:
  • resource (str ) – Valid child resource name, e.g. one returned by list_relationships(), such as "samples" for a study detail or "analyses" for a run detail.

  • access_param (dict [str , str ]) – A dictionary containing the parameter needed to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

  • explain (bool ) – Whether to print example URLs that would be called.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

samples = await study.aget_list("samples", {"accession": "MGYS00001234"})

async apage(*args, **kwargs)#
property base_url: str #
property data: dict [int , list [dict [str , Any ]]]#

Retrieved results for the current resource, keyed by page number.

describe_endpoint(as_dict=False)#
Parameters:

as_dict (bool )

Return type:

dict [str , str ] | None

describe_relationships()#
dry_run(*, verbose=True)#

Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.

Return type:

None

Parameters:

verbose (bool )
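
The page estimate that such a plan reports is typically a ceiling division of the record count by the page size. The sketch below illustrates that arithmetic only; `plan_pages` and the numbers are hypothetical, and how mgnipy obtains the record count is not shown on this page.

```python
import math


def plan_pages(count: int, page_size: int) -> int:
    """Return how many pages are needed to retrieve `count` records."""
    if page_size <= 0:
        raise ValueError("page_size must be a positive integer")
    return math.ceil(count / page_size)


# Hypothetical numbers: 101 matching records at 25 records per page.
total_pages = plan_pages(count=101, page_size=25)
print(f"{total_pages} page(s) would be requested")
```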

property emgapi_docs: str #
property emgapi_resource: str | None #

Retrieves the name of the endpoint resource based on the endpoint module.

Returns:

The name of the endpoint resource, or None if the endpoint module is not set.

Return type:

str or None

property endpoint_module: Callable #
explain(head=None)#

Print example URLs that would be called. Actual requests are handled by the client.

Parameters:

head (int | None)

Return type:

None

filter(**filters)#

Update the parameters for the API call to filter results.

Parameters:

**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.

Returns:

A new QuerySet instance with updated parameters for filtering results.

Return type:

QuerySet
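
Because filter() returns a new instance rather than mutating the current one, calls can be chained without side effects on the original query. The sketch below illustrates that pattern with a frozen dataclass; `SketchQuery` and the parameter names are hypothetical stand-ins, not the real QuerySet.

```python
from dataclasses import dataclass, field, replace
from typing import Any


@dataclass(frozen=True)
class SketchQuery:
    """Hypothetical stand-in for a query object with copy-on-write parameters."""

    resource: str
    params: dict[str, Any] = field(default_factory=dict)

    def filter(self, **filters: Any) -> "SketchQuery":
        # Build a new instance; the original params dict is left untouched.
        return replace(self, params={**self.params, **filters})

    def page_size(self, n: int) -> "SketchQuery":
        return self.filter(page_size=n)


base = SketchQuery("analyses")
narrowed = base.filter(accession="EXAMPLE_1").page_size(50)
```

Here `base` still has empty parameters after the chain, so it can be reused as the starting point for other queries.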

first()#

Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

get(*args, **kwargs)#
get_list(resource, access_param, fetch=True, explain=False)#

Get a list proxy for a child resource of a specific detail, identified by accession, pubmed_id, or catalogue_id.

Parameters:
  • resource (str ) – Valid child resource name, e.g. one returned by list_relationships(), such as "samples" for a study detail or "analyses" for a run detail.

  • access_param (dict [str , str ]) – A dictionary containing the parameter needed to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

  • explain (bool ) – Whether to print example URLs that would be called.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

samples = study.get_list("samples", {"accession": "MGYS00001234"})

property id_param_key: str #
property identifier: str | None #

Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.

Returns:

The identifier value corresponding to the resource type, or None if not available.

Return type:

str or None

list_relationships()#
Return type:

list [str ]

list_supported_params()#

Lists supported keyword arguments for the endpoint module.

Returns:

List of supported keyword argument names.

Return type:

list of str

list_urls()#

Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.

Returns:

A list of URLs corresponding to each API request that would be made.

Return type:

list of str
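
One way to picture the returned list is as a single URL per page, each carrying the same query parameters plus a page number. The sketch below builds such a list with the standard library. The base URL, the `page` and `page_size` parameter names, and the helper `build_page_urls` are all assumptions for illustration, not values taken from mgnipy or the MGnify API.

```python
from typing import Any
from urllib.parse import urlencode


def build_page_urls(
    base_url: str, sub_url: str, params: dict[str, Any], total_pages: int
) -> list[str]:
    """Return one URL per page, appending a `page` query parameter to each."""
    path = f"{base_url.rstrip('/')}/{sub_url.strip('/')}/"
    return [
        f"{path}?{urlencode({**params, 'page': page})}"
        for page in range(1, total_pages + 1)
    ]


# Placeholder host and parameters, chosen only for the example.
urls = build_page_urls("https://api.example.org/v2", "runs", {"page_size": 25}, 3)
```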

page(*args, **kwargs)#
page_size(n)#

Set the page size for paginated API calls.

Parameters:

n (int )

Returns:

A new QuerySet instance with the updated page size parameter.

Return type:

QuerySet

property pagination_status: bool #

Check if the current resource requires pagination based on its supported keyword arguments.

Returns:

True if the resource is paginated, False otherwise.

Return type:

bool

property params: dict [str , Any ]#
preview()#

Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.

Returns:

A DataFrame containing the metadata from the first page of results.

Return type:

pd.DataFrame

Raises:

RuntimeError – If the API call fails or if no data is available to preview.

property request_url: str #

Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.

Returns:

The constructed URL for the API request.

Return type:

str

resolve_query_string(**kwargs)#

Resolves the query string for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the query string.

Returns:

The resolved query string.

Return type:

str

property resource: SupportedEndpoints#
property results: dict [int , list [dict ]]#
property results_ids: list [str ] | None #

Get a list of accessions from the retrieved metadata results, if available.

Returns:

A list of accession strings if available, otherwise None.

Return type:

list of str or None
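
A minimal sketch of how accessions might be collected from paginated results follows. It assumes each record stores its accession under an "accession" key, which this page does not state; `collect_accessions` and the sample records are hypothetical.

```python
from typing import Any, Optional


def collect_accessions(
    pages: dict[int, list[dict[str, Any]]],
) -> Optional[list[str]]:
    """Return accessions in page order, or None when none are present."""
    accessions = [
        record["accession"]
        for page in sorted(pages)
        for record in pages[page]
        if "accession" in record
    ]
    return accessions or None


# Hypothetical results; the second record on page 1 has no accession.
pages = {
    1: [{"accession": "EXAMPLE_1"}, {"title": "no accession here"}],
    2: [{"accession": "EXAMPLE_2"}],
}
ids = collect_accessions(pages)
```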

sub_url(**kwargs)#

Constructs the sub-URL for the endpoint based on the current parameters.

Returns:

The constructed sub-URL, or None if the endpoint module is not set.

Return type:

str or None

to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#

Convert the current or provided metadata to a pandas DataFrame.

Parameters:
  • data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.

  • expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.

  • rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.

  • **kwargs – Additional keyword arguments passed to pd.DataFrame.

Returns:

DataFrame containing the metadata.

Return type:

pd.DataFrame | None

Raises:

RuntimeError – If no data is available to convert.

to_json(data=None, orient='records', lines=True, **json_kwargs)#

Convert the current metadata to a JSON string or save it to a file.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **json_kwargs – Additional keyword arguments passed to the JSON serialization function.

  • orient (str )

  • lines (bool )

Returns:

The JSON string representation of the metadata, or None if no data is available.

Return type:

str or None

Raises:

RuntimeError – If no data is available to convert.

to_list(data=None)#

Convert the current or provided metadata to a list of dictionaries.

Parameters:

data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.

Returns:

A list of metadata records as dictionaries, or None if no data is available.

Return type:

list of dict | None

Raises:

RuntimeError – If no data is available to convert.

to_polars(data=None, **polars_kwargs)#

Convert the current metadata to a Polars DataFrame.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **polars_kwargs – Additional keyword arguments passed to pl.DataFrame.

Returns:

A Polars DataFrame containing the metadata.

Return type:

pl.DataFrame

Raises:

RuntimeError – If no data is available to convert.

url_path(**kwargs)#

Constructs the full URL path for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the URL construction.

Returns:

The constructed URL path.

Return type:

str

validate_endpoint_kwargs(**kwargs)#

Validates the provided keyword arguments against the supported parameters of the endpoint module.

Parameters:

**kwargs – Keyword arguments to validate.

Returns:

The validated keyword arguments.

Return type:

dict of str to Any

Raises:

ValueError – If any provided keyword argument is not supported by the endpoint module.

config: MgnipyConfig#
exec: QueryExecutor#
count: int | None #
total_pages: int | None #
default_page_size: int #
request_urls: list [str ] | None #
class mgnipy.V2.proxies.AnalysisDetail(id=None, *, accession=None, config=None, **kwargs)[source]#

Bases: MGnifyDetail

Parameters:
  • id (Optional[str ])

  • accession (Optional[str ])

  • config (MgnipyConfig)

RESOURCE: ClassVar [Literal ['analysis']] = 'analysis'#
async afirst()#

Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

async aget(*args, **kwargs)#
async aget_list(resource, access_param, fetch=True, explain=False)#

Get a list proxy for a child resource of a specific detail, identified by accession, pubmed_id, or catalogue_id.

Parameters:
  • resource (str ) – Valid child resource name, e.g. one returned by list_relationships(), such as "samples" for a study detail or "analyses" for a run detail.

  • access_param (dict [str , str ]) – A dictionary containing the parameter needed to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

  • explain (bool ) – Whether to print example URLs that would be called.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

samples = await study.aget_list("samples", {"accession": "MGYS00001234"})

async apage(*args, **kwargs)#
property base_url: str #
property data: dict [int , list [dict [str , Any ]]]#

Retrieved results for the current resource, keyed by page number.

describe_endpoint(as_dict=False)#
Parameters:

as_dict (bool )

Return type:

dict [str , str ] | None

describe_relationships()#
dry_run(*, verbose=True)#

Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.

Return type:

None

Parameters:

verbose (bool )

property emgapi_docs: str #
property emgapi_resource: str | None #

Retrieves the name of the endpoint resource based on the endpoint module.

Returns:

The name of the endpoint resource, or None if the endpoint module is not set.

Return type:

str or None

property endpoint_module: Callable #
explain(head=None)#

Print example URLs that would be called. Actual requests are handled by the client.

Parameters:

head (int | None)

Return type:

None

filter(**filters)#

Update the parameters for the API call to filter results.

Parameters:

**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.

Returns:

A new QuerySet instance with updated parameters for filtering results.

Return type:

QuerySet

first()#

Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

get(*args, **kwargs)#
get_list(resource, access_param, fetch=True, explain=False)#

Get a list proxy for a child resource of a specific detail, identified by accession, pubmed_id, or catalogue_id.

Parameters:
  • resource (str ) – Valid child resource name, e.g. one returned by list_relationships(), such as "samples" for a study detail or "analyses" for a run detail.

  • access_param (dict [str , str ]) – A dictionary containing the parameter needed to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

  • explain (bool ) – Whether to print example URLs that would be called.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

samples = study.get_list("samples", {"accession": "MGYS00001234"})

property id_param_key: str #
property identifier: str | None #

Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.

Returns:

The identifier value corresponding to the resource type, or None if not available.

Return type:

str or None

list_relationships()#
Return type:

list [str ]

list_supported_params()#

Lists supported keyword arguments for the endpoint module.

Returns:

List of supported keyword argument names.

Return type:

list of str

list_urls()#

Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.

Returns:

A list of URLs corresponding to each API request that would be made.

Return type:

list of str

page(*args, **kwargs)#
page_size(n)#

Set the page size for paginated API calls.

Parameters:

n (int )

Returns:

A new QuerySet instance with the updated page size parameter.

Return type:

QuerySet

property pagination_status: bool #

Check if the current resource requires pagination based on its supported keyword arguments.

Returns:

True if the resource is paginated, False otherwise.

Return type:

bool

property params: dict [str , Any ]#
preview()#

Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.

Returns:

A DataFrame containing the metadata from the first page of results.

Return type:

pd.DataFrame

Raises:

RuntimeError – If the API call fails or if no data is available to preview.

property request_url: str #

Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.

Returns:

The constructed URL for the API request.

Return type:

str

resolve_query_string(**kwargs)#

Resolves the query string for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the query string.

Returns:

The resolved query string.

Return type:

str

property resource: SupportedEndpoints#
property results: dict [int , list [dict ]]#
property results_ids: list [str ] | None #

Get a list of accessions from the retrieved metadata results, if available.

Returns:

A list of accession strings if available, otherwise None.

Return type:

list of str or None

sub_url(**kwargs)#

Constructs the sub-URL for the endpoint based on the current parameters.

Returns:

The constructed sub-URL, or None if the endpoint module is not set.

Return type:

str or None

to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#

Convert the current or provided metadata to a pandas DataFrame.

Parameters:
  • data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.

  • expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.

  • rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.

  • **kwargs – Additional keyword arguments passed to pd.DataFrame.

Returns:

DataFrame containing the metadata.

Return type:

pd.DataFrame | None

Raises:

RuntimeError – If no data is available to convert.
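
Expanding nested dictionaries into separate columns amounts to lifting each inner key to the top level of the record before the table is built. The sketch below shows that step on plain dictionaries. The `outer.inner` column naming, the helper `expand_nested`, and the sample record are assumptions; the naming mgnipy actually uses is not documented here.

```python
from typing import Any


def expand_nested(record: dict[str, Any], keys: list[str]) -> dict[str, Any]:
    """Lift the inner keys of the selected nested dicts to top-level columns."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        if key in keys and isinstance(value, dict):
            for inner_key, inner_value in value.items():
                flat[f"{key}.{inner_key}"] = inner_value
        else:
            flat[key] = value
    return flat


# Hypothetical record with one nested dictionary.
record = {"accession": "EXAMPLE_1", "biome": {"lineage": "root", "name": "Root"}}
row = expand_nested(record, ["biome"])
```

A list of such flattened rows can then be passed to a DataFrame constructor, with each lifted key becoming its own column.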

to_json(data=None, orient='records', lines=True, **json_kwargs)#

Convert the current metadata to a JSON string or save it to a file.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **json_kwargs – Additional keyword arguments passed to the JSON serialization function.

  • orient (str )

  • lines (bool )

Returns:

The JSON string representation of the metadata, or None if no data is available.

Return type:

str or None

Raises:

RuntimeError – If no data is available to convert.

to_list(data=None)#

Convert the current or provided metadata to a list of dictionaries.

Parameters:

data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.

Returns:

A list of metadata records as dictionaries, or None if no data is available.

Return type:

list of dict | None

Raises:

RuntimeError – If no data is available to convert.

to_polars(data=None, **polars_kwargs)#

Convert the current metadata to a Polars DataFrame.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **polars_kwargs – Additional keyword arguments passed to pl.DataFrame.

Returns:

A Polars DataFrame containing the metadata.

Return type:

pl.DataFrame

Raises:

RuntimeError – If no data is available to convert.

url_path(**kwargs)#

Constructs the full URL path for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the URL construction.

Returns:

The constructed URL path.

Return type:

str

validate_endpoint_kwargs(**kwargs)#

Validates the provided keyword arguments against the supported parameters of the endpoint module.

Parameters:

**kwargs – Keyword arguments to validate.

Returns:

The validated keyword arguments.

Return type:

dict of str to Any

Raises:

ValueError – If any provided keyword argument is not supported by the endpoint module.

config: MgnipyConfig#
exec: QueryExecutor#
count: int | None #
total_pages: int | None #
default_page_size: int #
request_urls: list [str ] | None #
class mgnipy.V2.proxies.GenomeDetail(id=None, *, accession=None, config=None, **kwargs)[source]#

Bases: MGnifyDetail

Parameters:
  • id (Optional[str ])

  • accession (Optional[str ])

  • config (MgnipyConfig)

RESOURCE: ClassVar [Literal ['genome']] = 'genome'#
async afirst()#

Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

async aget(*args, **kwargs)#
async aget_list(resource, access_param, fetch=True, explain=False)#

Get a list proxy for a child resource of a specific detail, identified by accession, pubmed_id, or catalogue_id.

Parameters:
  • resource (str ) – Valid child resource name, e.g. one returned by list_relationships(), such as "samples" for a study detail or "analyses" for a run detail.

  • access_param (dict [str , str ]) – A dictionary containing the parameter needed to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

  • explain (bool ) – Whether to print example URLs that would be called.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

samples = await study.aget_list("samples", {"accession": "MGYS00001234"})
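
Since the asynchronous variants are coroutines, several lookups can be awaited concurrently instead of one after another. The sketch below shows the general pattern with `asyncio.gather`; `fetch_page` is a hypothetical stand-in that fabricates placeholder records rather than calling mgnipy or the network.

```python
import asyncio
from typing import Any


async def fetch_page(page: int) -> list[dict[str, Any]]:
    """Stand-in for a network call; returns a placeholder record."""
    await asyncio.sleep(0)  # yield control, as a real request would
    return [{"page": page, "accession": f"EXAMPLE_{page}"}]


async def fetch_pages(pages: list[int]) -> dict[int, list[dict[str, Any]]]:
    """Fetch all pages concurrently and key the results by page number."""
    results = await asyncio.gather(*(fetch_page(page) for page in pages))
    return dict(zip(pages, results))


data = asyncio.run(fetch_pages([1, 2, 3]))
```

`asyncio.gather` returns results in the order the awaitables were passed, so zipping them with the page numbers is safe.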

async apage(*args, **kwargs)#
property base_url: str #
property data: dict [int , list [dict [str , Any ]]]#

Retrieved results for the current resource, keyed by page number.

describe_endpoint(as_dict=False)#
Parameters:

as_dict (bool )

Return type:

dict [str , str ] | None

describe_relationships()#
dry_run(*, verbose=True)#

Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.

Return type:

None

Parameters:

verbose (bool )

property emgapi_docs: str #
property emgapi_resource: str | None #

Retrieves the name of the endpoint resource based on the endpoint module.

Returns:

The name of the endpoint resource, or None if the endpoint module is not set.

Return type:

str or None

property endpoint_module: Callable #
explain(head=None)#

Print example URLs that would be called. Actual requests are handled by the client.

Parameters:

head (int | None)

Return type:

None

filter(**filters)#

Update the parameters for the API call to filter results.

Parameters:

**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.

Returns:

A new QuerySet instance with updated parameters for filtering results.

Return type:

QuerySet

first()#

Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

get(*args, **kwargs)#
get_list(resource, access_param, fetch=True, explain=False)#

Get a list proxy for a child resource of a specific detail, identified by accession, pubmed_id, or catalogue_id.

Parameters:
  • resource (str ) – Valid child resource name, e.g. one returned by list_relationships(), such as "samples" for a study detail or "analyses" for a run detail.

  • access_param (dict [str , str ]) – A dictionary containing the parameter needed to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

  • explain (bool ) – Whether to print example URLs that would be called.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

samples = study.get_list("samples", {"accession": "MGYS00001234"})

property id_param_key: str #
property identifier: str | None #

Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.

Returns:

The identifier value corresponding to the resource type, or None if not available.

Return type:

str or None

list_relationships()#
Return type:

list [str ]

list_supported_params()#

Lists supported keyword arguments for the endpoint module.

Returns:

List of supported keyword argument names.

Return type:

list of str

list_urls()#

Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.

Returns:

A list of URLs corresponding to each API request that would be made.

Return type:

list of str

page(*args, **kwargs)#
page_size(n)#

Set the page size for paginated API calls.

Parameters:

n (int )

Returns:

A new QuerySet instance with the updated page size parameter.

Return type:

QuerySet

property pagination_status: bool #

Check if the current resource requires pagination based on its supported keyword arguments.

Returns:

True if the resource is paginated, False otherwise.

Return type:

bool

property params: dict [str , Any ]#
preview()#

Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.

Returns:

A DataFrame containing the metadata from the first page of results.

Return type:

pd.DataFrame

Raises:

RuntimeError – If the API call fails or if no data is available to preview.

property request_url: str #

Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.

Returns:

The constructed URL for the API request.

Return type:

str

resolve_query_string(**kwargs)#

Resolves the query string for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the query string.

Returns:

The resolved query string.

Return type:

str

property resource: SupportedEndpoints#
property results: dict [int , list [dict ]]#
property results_ids: list [str ] | None #

Get a list of accessions from the retrieved metadata results, if available.

Returns:

A list of accession strings if available, otherwise None.

Return type:

list of str or None

sub_url(**kwargs)#

Constructs the sub-URL for the endpoint based on the current parameters.

Returns:

The constructed sub-URL, or None if the endpoint module is not set.

Return type:

str or None

to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#

Convert the current or provided metadata to a pandas DataFrame.

Parameters:
  • data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.

  • expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.

  • rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.

  • **kwargs – Additional keyword arguments passed to pd.DataFrame.

Returns:

DataFrame containing the metadata.

Return type:

pd.DataFrame | None

Raises:

RuntimeError – If no data is available to convert.

to_json(data=None, orient='records', lines=True, **json_kwargs)#

Convert the current metadata to a JSON string or save it to a file.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **json_kwargs – Additional keyword arguments passed to the JSON serialization function.

  • orient (str )

  • lines (bool )

Returns:

The JSON string representation of the metadata, or None if no data is available.

Return type:

str or None

Raises:

RuntimeError – If no data is available to convert.

to_list(data=None)#

Convert the current or provided metadata to a list of dictionaries.

Parameters:

data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.

Returns:

A list of metadata records as dictionaries, or None if no data is available.

Return type:

list of dict | None

Raises:

RuntimeError – If no data is available to convert.

to_polars(data=None, **polars_kwargs)#

Convert the current metadata to a Polars DataFrame.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **polars_kwargs – Additional keyword arguments passed to pl.DataFrame.

Returns:

A Polars DataFrame containing the metadata.

Return type:

pl.DataFrame

Raises:

RuntimeError – If no data is available to convert.

url_path(**kwargs)#

Constructs the full URL path for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the URL construction.

Returns:

The constructed URL path.

Return type:

str

validate_endpoint_kwargs(**kwargs)#

Validates the provided keyword arguments against the supported parameters of the endpoint module.

Parameters:

**kwargs – Keyword arguments to validate.

Returns:

The validated keyword arguments.

Return type:

dict of str to Any

Raises:

ValueError – If any provided keyword argument is not supported by the endpoint module.

config: MgnipyConfig#
exec: QueryExecutor#
count: int | None #
total_pages: int | None #
default_page_size: int #
request_urls: list [str ] | None #
class mgnipy.V2.proxies.AssemblyDetail(id=None, *, accession=None, config=None, **kwargs)[source]#

Bases: MGnifyDetail

Parameters:
  • id (Optional[str ])

  • accession (Optional[str ])

  • config (MgnipyConfig)

RESOURCE: ClassVar [Literal ['assembly']] = 'assembly'#
async afirst()#

Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

async aget(*args, **kwargs)#
async aget_list(resource, access_param, fetch=True, explain=False)#

Get a list proxy for a child resource of a specific detail, identified by accession, pubmed_id, or catalogue_id.

Parameters:
  • resource (str ) – Valid child resource name, e.g. one returned by list_relationships(), such as "samples" for a study detail or "analyses" for a run detail.

  • access_param (dict [str , str ]) – A dictionary containing the parameter needed to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

  • explain (bool ) – Whether to print example URLs that would be called.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

samples = await study.aget_list("samples", {"accession": "MGYS00001234"})

async apage(*args, **kwargs)#
property base_url: str #
property data: dict [int , list [dict [str , Any ]]]#

Retrieved results for the current resource, keyed by page number.

describe_endpoint(as_dict=False)#
Parameters:

as_dict (bool )

Return type:

dict [str , str ] | None

describe_relationships()#
dry_run(*, verbose=True)#

Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.

Return type:

None

Parameters:

verbose (bool )

property emgapi_docs: str #
property emgapi_resource: str | None #

Retrieves the name of the endpoint resource based on the endpoint module.

Returns:

The name of the endpoint resource, or None if the endpoint module is not set.

Return type:

str or None

property endpoint_module: Callable #
explain(head=None)#

Print example URLs that would be called. Actual requests are handled by the client.

Parameters:

head (int | None)

Return type:

None

filter(**filters)#

Update the parameters for the API call to filter results.

Parameters:

**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.

Returns:

A new QuerySet instance with updated parameters for filtering results.

Return type:

QuerySet
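
Because filter() is documented as returning a new QuerySet rather than changing the current one, queries can be narrowed step by step without side effects. The sketch below illustrates that pattern with a hypothetical stand-in class, `SketchQuery`; it is not the mgnipy QuerySet, and the filter value is only illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace
from typing import Any


@dataclass(frozen=True)
class SketchQuery:
    """Hypothetical stand-in for a query object; not the mgnipy QuerySet."""

    resource: str
    params: dict[str, Any] = field(default_factory=dict)

    def filter(self, **filters: Any) -> SketchQuery:
        # Merge into a copy so the original query keeps its parameters.
        return replace(self, params={**self.params, **filters})


base = SketchQuery("studies")
narrowed = base.filter(biome_lineage="root")
print(base.params, narrowed.params)
```

The original object stays usable as a starting point for other, differently filtered queries.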

first()#

Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

get(*args, **kwargs)#
get_list(resource, access_param, fetch=True, explain=False)#

Get a list proxy for the detail identified by a specific accession, pubmed_id, or catalogue_id.

Parameters:
  • resource (str ) – Name of a valid child resource, as returned by list_relationships(), such as "samples" for a study detail or "analyses" for a run detail.

  • access_param (dict [str , str ]) – A dictionary containing the parameter needed to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

  • explain (bool ) – Whether to print example URLs that would be called.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

samples = study.get_list("samples", {"accession": "MGYS00001234"})

property id_param_key: str #
property identifier: str | None #

Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.

Returns:

The identifier value corresponding to the resource type, or None if not available.

Return type:

str or None

list_relationships()#
Return type:

list [str ]

list_supported_params()#

Lists supported keyword arguments for the endpoint module.

Returns:

List of supported keyword argument names.

Return type:

list of str

list_urls()#

Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.

Returns:

A list of URLs corresponding to each API request that would be made.

Return type:

list of str
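
As a rough picture of what a per-page URL list looks like, the sketch below builds one URL per page with the standard library. The base URL, the `page` query parameter name, and the helper `build_page_urls` are assumptions made for illustration; the real URLs come from list_urls().

```python
from __future__ import annotations

from typing import Any
from urllib.parse import urlencode


def build_page_urls(base_url: str, params: dict[str, Any], total_pages: int) -> list[str]:
    """Return one URL per page, adding a page number to the shared parameters."""
    urls = []
    for page in range(1, total_pages + 1):
        query = urlencode({**params, "page": page})
        urls.append(f"{base_url}?{query}")
    return urls


# Hypothetical endpoint and parameters.
urls = build_page_urls("https://example.org/api/studies", {"page_size": 25}, 3)
for url in urls:
    print(url)
```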

page(*args, **kwargs)#
page_size(n)#

Set the page size for paginated API calls.

Parameters:

n (int )

Returns:

A new QuerySet instance with the updated page size parameter.

Return type:

QuerySet

property pagination_status: bool #

Check if the current resource requires pagination based on its supported keyword arguments.

Returns:

True if the resource is paginated, False otherwise.

Return type:

bool

property params: dict [str , Any ]#
preview()#

Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.

Returns:

A DataFrame containing the metadata from the first page of results.

Return type:

pd.DataFrame

Raises:

RuntimeError – If the API call fails or if no data is available to preview.

property request_url: str #

Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.

Returns:

The constructed URL for the API request.

Return type:

str

resolve_query_string(**kwargs)#

Resolves the query string for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the query string.

Returns:

The resolved query string.

Return type:

str

property resource: SupportedEndpoints#
property results: dict [int , list [dict ]]#
property results_ids: list [str ] | None #

Get a list of accessions from the retrieved metadata results, if available.

Returns:

A list of accession strings if available, otherwise None.

Return type:

list of str or None
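
The "list or None" contract can be sketched as follows. This assumes each record carries a top-level "accession" key, which the documentation above does not confirm; `collect_accessions` and the second accession value are hypothetical.

```python
from __future__ import annotations

from typing import Any, Optional


def collect_accessions(pages: dict[int, list[dict[str, Any]]]) -> Optional[list[str]]:
    """Gather accession values in page order, or return None if there are none."""
    found = [
        record["accession"]
        for number in sorted(pages)
        for record in pages[number]
        if "accession" in record
    ]
    return found or None


# Placeholder records keyed by page number.
pages = {1: [{"accession": "MGYS00001234"}], 2: [{"accession": "MGYS00005678"}]}
print(collect_accessions(pages))
print(collect_accessions({}))
```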

sub_url(**kwargs)#

Constructs the sub-URL for the endpoint based on the current parameters.

Returns:

The constructed sub-URL, or None if the endpoint module is not set.

Return type:

str or None

to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#

Convert the current or provided metadata to a pandas DataFrame.

Parameters:
  • data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.

  • expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.

  • rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.

  • **kwargs – Additional keyword arguments passed to pd.DataFrame.

Returns:

DataFrame containing the metadata.

Return type:

pd.DataFrame | None

Raises:

RuntimeError – If no data is available to convert.

to_json(data=None, orient='records', lines=True, **json_kwargs)#

Convert the current metadata to a JSON string or save it to a file.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **json_kwargs – Additional keyword arguments passed to the JSON serialization function.

  • orient (str )

  • lines (bool )

Returns:

The JSON string representation of the metadata, or None if no data is available.

Return type:

str or None

Raises:

RuntimeError – If no data is available to convert.

to_list(data=None)#

Convert the current or provided metadata to a list of dictionaries.

Parameters:

data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.

Returns:

A list of metadata records as dictionaries, or None if no data is available.

Return type:

list of dict | None

Raises:

RuntimeError – If no data is available to convert.
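
The paginated structure (a dict mapping integers to lists of records) flattens naturally into a single list. A minimal sketch of that conversion, under the assumption that the integer keys are page numbers; `flatten_pages` and the record values are hypothetical and this is not the library's code.

```python
from __future__ import annotations

from typing import Any


def flatten_pages(pages: dict[int, list[dict[str, Any]]]) -> list[dict[str, Any]]:
    """Concatenate the records of every page in ascending key order."""
    if not pages:
        raise RuntimeError("No data is available to convert.")
    records: list[dict[str, Any]] = []
    for number in sorted(pages):
        records.extend(pages[number])
    return records


# Pages deliberately inserted out of order; values are placeholders.
pages = {
    2: [{"accession": "MGYS00000003"}],
    1: [{"accession": "MGYS00000001"}, {"accession": "MGYS00000002"}],
}
records = flatten_pages(pages)
print(len(records), records[0])
```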

to_polars(data=None, **polars_kwargs)#

Convert the current metadata to a Polars DataFrame.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **polars_kwargs – Additional keyword arguments passed to pl.DataFrame.

Returns:

A Polars DataFrame containing the metadata.

Return type:

pl.DataFrame

Raises:

RuntimeError – If no data is available to convert.

url_path(**kwargs)#

Constructs the full URL path for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the URL construction.

Returns:

The constructed URL path.

Return type:

str

validate_endpoint_kwargs(**kwargs)#

Validates the provided keyword arguments against the supported parameters of the endpoint module.

Parameters:

**kwargs – Keyword arguments to validate.

Returns:

The validated keyword arguments.

Return type:

dict of str to Any

Raises:

ValueError – If any provided keyword argument is not supported by the endpoint module.
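
The validation contract described here (return the arguments if all are supported, raise ValueError otherwise) can be sketched with a set difference. The supported names and the helper `validate_kwargs` are hypothetical; in mgnipy the supported names would come from list_supported_params().

```python
from __future__ import annotations

from typing import Any, Iterable

# Hypothetical set of supported parameter names.
SUPPORTED = frozenset({"accession", "page", "page_size"})


def validate_kwargs(supported: Iterable[str], **kwargs: Any) -> dict[str, Any]:
    """Return kwargs unchanged if every name is supported, else raise ValueError."""
    unknown = sorted(set(kwargs) - set(supported))
    if unknown:
        raise ValueError(f"Unsupported parameter(s): {', '.join(unknown)}")
    return dict(kwargs)


print(validate_kwargs(SUPPORTED, page=2, page_size=50))
```

Failing early on an unknown name avoids sending a request whose filter would otherwise be ignored or rejected.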

config: MgnipyConfig#
exec: QueryExecutor#
count: int | None #
total_pages: int | None #
default_page_size: int #
request_urls: list [str ] | None #
class mgnipy.V2.proxies.BiomeDetail(id=None, *, biome_lineage=None, config=None, **kwargs)[source]#

Bases: MGnifyDetail, BiomesTreeMixin

Parameters:
  • id (Optional[str ])

  • biome_lineage (Optional[str ])

  • config (MgnipyConfig)

RESOURCE: ClassVar [Literal ['biome']] = 'biome'#
async afirst()#

Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

async aget(*args, **kwargs)#
async aget_list(resource, access_param, fetch=True, explain=False)#

Asynchronously get a list proxy for the detail identified by a specific accession, pubmed_id, or catalogue_id.

Parameters:
  • resource (str ) – Name of a valid child resource, as returned by list_relationships(), such as "samples" for a study detail or "analyses" for a run detail.

  • access_param (dict [str , str ]) – A dictionary containing the parameter needed to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

  • explain (bool ) – Whether to print example URLs that would be called.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

samples = await study.aget_list("samples", {"accession": "MGYS00001234"})

async apage(*args, **kwargs)#
property base_url: str #
property data: dict [int , list [dict [str , Any ]]]#

Paginated results for the current resource.

describe_endpoint(as_dict=False)#
Parameters:

as_dict (bool )

Return type:

dict [str , str ] | None

describe_relationships()#
dry_run(*, verbose=True)#

Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.

Return type:

None

Parameters:

verbose (bool )

property emgapi_docs: str #
property emgapi_resource: str | None #

Retrieves the name of the endpoint resource based on the endpoint module.

Returns:

The name of the endpoint resource, or None if the endpoint module is not set.

Return type:

str or None

property endpoint_module: Callable #
explain(head=None)#

Print example URLs that would be called. The actual requests are handled by the client.

Parameters:

head (int | None)

Return type:

None

filter(**filters)#

Update the parameters for the API call to filter results.

Parameters:

**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.

Returns:

A new QuerySet instance with updated parameters for filtering results.

Return type:

QuerySet

first()#

Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

get(*args, **kwargs)#
get_list(resource, access_param, fetch=True, explain=False)#

Get a list proxy for the detail identified by a specific accession, pubmed_id, or catalogue_id.

Parameters:
  • resource (str ) – Name of a valid child resource, as returned by list_relationships(), such as "samples" for a study detail or "analyses" for a run detail.

  • access_param (dict [str , str ]) – A dictionary containing the parameter needed to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

  • explain (bool ) – Whether to print example URLs that would be called.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

samples = study.get_list("samples", {"accession": "MGYS00001234"})

property id_param_key: str #
property identifier: str | None #

Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.

Returns:

The identifier value corresponding to the resource type, or None if not available.

Return type:

str or None

property lineages: list [str ]#
list_relationships()#
Return type:

list [str ]

list_supported_params()#

Lists supported keyword arguments for the endpoint module.

Returns:

List of supported keyword argument names.

Return type:

list of str

list_urls()#

Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.

Returns:

A list of URLs corresponding to each API request that would be made.

Return type:

list of str

page(*args, **kwargs)#
page_size(n)#

Set the page size for paginated API calls.

Parameters:

n (int )

Returns:

A new QuerySet instance with the updated page size parameter.

Return type:

QuerySet

property pagination_status: bool #

Check if the current resource requires pagination based on its supported keyword arguments.

Returns:

True if the resource is paginated, False otherwise.

Return type:

bool

property params: dict [str , Any ]#
preview()#

Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.

Returns:

A DataFrame containing the metadata from the first page of results.

Return type:

pd.DataFrame

Raises:

RuntimeError – If the API call fails or if no data is available to preview.

property request_url: str #

Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.

Returns:

The constructed URL for the API request.

Return type:

str

resolve_query_string(**kwargs)#

Resolves the query string for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the query string.

Returns:

The resolved query string.

Return type:

str

property resource: SupportedEndpoints#
property results: dict [int , list [dict ]]#
property results_ids: list [str ] | None #

Get a list of accessions from the retrieved metadata results, if available.

Returns:

A list of accession strings if available, otherwise None.

Return type:

list of str or None

show_tree(method='compact')#
Parameters:

method (Literal ['compact', 'show', 'print', 'horizontal', 'hshow', 'h', 'hprint', 'vertical', 'vshow', 'v', 'vprint'])

sub_url(**kwargs)#

Constructs the sub-URL for the endpoint based on the current parameters.

Returns:

The constructed sub-URL, or None if the endpoint module is not set.

Return type:

str or None

to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#

Convert the current or provided metadata to a pandas DataFrame.

Parameters:
  • data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.

  • expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.

  • rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.

  • **kwargs – Additional keyword arguments passed to pd.DataFrame.

Returns:

DataFrame containing the metadata.

Return type:

pd.DataFrame | None

Raises:

RuntimeError – If no data is available to convert.

to_json(data=None, orient='records', lines=True, **json_kwargs)#

Convert the current metadata to a JSON string or save it to a file.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **json_kwargs – Additional keyword arguments passed to the JSON serialization function.

  • orient (str )

  • lines (bool )

Returns:

The JSON string representation of the metadata, or None if no data is available.

Return type:

str or None

Raises:

RuntimeError – If no data is available to convert.
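
The defaults orient='records' and lines=True resemble the options of pandas' DataFrame.to_json; whether to_json() delegates to pandas is not stated here. The sketch below only shows what line-delimited record output looks like, using the standard library; `to_json_lines` and the sample records are hypothetical.

```python
from __future__ import annotations

import json
from typing import Any


def to_json_lines(pages: dict[int, list[dict[str, Any]]]) -> str:
    """Serialise paginated records as one JSON object per line."""
    records = [record for number in sorted(pages) for record in pages[number]]
    if not records:
        raise RuntimeError("No data is available to convert.")
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


# Placeholder records keyed by page number.
pages = {
    1: [{"biome_lineage": "root"}],
    2: [{"biome_lineage": "root:Environmental"}],
}
text = to_json_lines(pages)
print(text)
```

One object per line keeps large result sets streamable, since each line can be parsed on its own.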

to_list(data=None)#

Convert the current or provided metadata to a list of dictionaries.

Parameters:

data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.

Returns:

A list of metadata records as dictionaries, or None if no data is available.

Return type:

list of dict | None

Raises:

RuntimeError – If no data is available to convert.

to_polars(data=None, **polars_kwargs)#

Convert the current metadata to a Polars DataFrame.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **polars_kwargs – Additional keyword arguments passed to pl.DataFrame.

Returns:

A Polars DataFrame containing the metadata.

Return type:

pl.DataFrame

Raises:

RuntimeError – If no data is available to convert.

property tree: Tree#

Convert the biomes metadata to a tree structure for visualization or analysis.

Returns:

A tree representation of the biomes and their relationships.

Return type:

Tree
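
To illustrate how a list of lineages maps onto a tree, the sketch below nests plain dictionaries. It assumes colon-separated lineage strings such as "root:Environmental:Aquatic" and does not use the Tree type returned by this property; `build_tree`, `render`, and the sample lineages are hypothetical.

```python
from __future__ import annotations


def build_tree(lineages: list[str]) -> dict:
    """Nest each colon-separated lineage into a dictionary of dictionaries."""
    tree: dict = {}
    for lineage in lineages:
        node = tree
        for part in lineage.split(":"):
            # Reuse the existing child or create it on first sight.
            node = node.setdefault(part, {})
    return tree


def render(tree: dict, indent: int = 0) -> list[str]:
    """Return indented lines, children sorted by name."""
    lines = []
    for name in sorted(tree):
        lines.append("  " * indent + name)
        lines.extend(render(tree[name], indent + 1))
    return lines


tree = build_tree(["root:Environmental:Aquatic", "root:Engineered"])
print("\n".join(render(tree)))
```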

url_path(**kwargs)#

Constructs the full URL path for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the URL construction.

Returns:

The constructed URL path.

Return type:

str

validate_endpoint_kwargs(**kwargs)#

Validates the provided keyword arguments against the supported parameters of the endpoint module.

Parameters:

**kwargs – Keyword arguments to validate.

Returns:

The validated keyword arguments.

Return type:

dict of str to Any

Raises:

ValueError – If any provided keyword argument is not supported by the endpoint module.

config: MgnipyConfig#
exec: QueryExecutor#
count: int | None #
total_pages: int | None #
default_page_size: int #
request_urls: list [str ] | None #
class mgnipy.V2.proxies.PublicationDetail(id=None, *, accession=None, config=None, **kwargs)[source]#

Bases: MGnifyDetail

Parameters:
  • id (Optional[str ])

  • accession (Optional[str ])

  • config (MgnipyConfig)

RESOURCE: ClassVar [Literal ['publication']] = 'publication'#
async afirst()#

Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

async aget(*args, **kwargs)#
async aget_list(resource, access_param, fetch=True, explain=False)#

Asynchronously get a list proxy for the detail identified by a specific accession, pubmed_id, or catalogue_id.

Parameters:
  • resource (str ) – Name of a valid child resource, as returned by list_relationships(), such as "samples" for a study detail or "analyses" for a run detail.

  • access_param (dict [str , str ]) – A dictionary containing the parameter needed to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

  • explain (bool ) – Whether to print example URLs that would be called.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

samples = await study.aget_list("samples", {"accession": "MGYS00001234"})

async apage(*args, **kwargs)#
property base_url: str #
property data: dict [int , list [dict [str , Any ]]]#

Paginated results for the current resource.

describe_endpoint(as_dict=False)#
Parameters:

as_dict (bool )

Return type:

dict [str , str ] | None

describe_relationships()#
dry_run(*, verbose=True)#

Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.

Return type:

None

Parameters:

verbose (bool )

property emgapi_docs: str #
property emgapi_resource: str | None #

Retrieves the name of the endpoint resource based on the endpoint module.

Returns:

The name of the endpoint resource, or None if the endpoint module is not set.

Return type:

str or None

property endpoint_module: Callable #
explain(head=None)#

Print example URLs that would be called. The actual requests are handled by the client.

Parameters:

head (int | None)

Return type:

None

filter(**filters)#

Update the parameters for the API call to filter results.

Parameters:

**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.

Returns:

A new QuerySet instance with updated parameters for filtering results.

Return type:

QuerySet

first()#

Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

get(*args, **kwargs)#
get_list(resource, access_param, fetch=True, explain=False)#

Get a list proxy for the detail identified by a specific accession, pubmed_id, or catalogue_id.

Parameters:
  • resource (str ) – Name of a valid child resource, as returned by list_relationships(), such as "samples" for a study detail or "analyses" for a run detail.

  • access_param (dict [str , str ]) – A dictionary containing the parameter needed to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

  • explain (bool ) – Whether to print example URLs that would be called.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

samples = study.get_list("samples", {"accession": "MGYS00001234"})

property id_param_key: str #
property identifier: str | None #

Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.

Returns:

The identifier value corresponding to the resource type, or None if not available.

Return type:

str or None

list_relationships()#
Return type:

list [str ]

list_supported_params()#

Lists supported keyword arguments for the endpoint module.

Returns:

List of supported keyword argument names.

Return type:

list of str

list_urls()#

Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.

Returns:

A list of URLs corresponding to each API request that would be made.

Return type:

list of str

page(*args, **kwargs)#
page_size(n)#

Set the page size for paginated API calls.

Parameters:

n (int )

Returns:

A new QuerySet instance with the updated page size parameter.

Return type:

QuerySet

property pagination_status: bool #

Check if the current resource requires pagination based on its supported keyword arguments.

Returns:

True if the resource is paginated, False otherwise.

Return type:

bool

property params: dict [str , Any ]#
preview()#

Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.

Returns:

A DataFrame containing the metadata from the first page of results.

Return type:

pd.DataFrame

Raises:

RuntimeError – If the API call fails or if no data is available to preview.

property request_url: str #

Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.

Returns:

The constructed URL for the API request.

Return type:

str

resolve_query_string(**kwargs)#

Resolves the query string for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the query string.

Returns:

The resolved query string.

Return type:

str

property resource: SupportedEndpoints#
property results: dict [int , list [dict ]]#
property results_ids: list [str ] | None #

Get a list of accessions from the retrieved metadata results, if available.

Returns:

A list of accession strings if available, otherwise None.

Return type:

list of str or None

sub_url(**kwargs)#

Constructs the sub-URL for the endpoint based on the current parameters.

Returns:

The constructed sub-URL, or None if the endpoint module is not set.

Return type:

str or None

to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#

Convert the current or provided metadata to a pandas DataFrame.

Parameters:
  • data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.

  • expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.

  • rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.

  • **kwargs – Additional keyword arguments passed to pd.DataFrame.

Returns:

DataFrame containing the metadata.

Return type:

pd.DataFrame | None

Raises:

RuntimeError – If no data is available to convert.
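
Expanding nested dictionaries into separate columns can be pictured as flattening selected keys of each record before a table is built. This is a sketch of the idea only; the helper `expand_nested`, the dot separator, and the sample record are assumptions and may differ from how to_df() names its columns.

```python
from __future__ import annotations

from typing import Any, Iterable


def expand_nested(record: dict[str, Any], keys: Iterable[str], sep: str = ".") -> dict[str, Any]:
    """Lift the entries of the chosen nested dicts to top-level keys."""
    wanted = set(keys)
    flat: dict[str, Any] = {}
    for key, value in record.items():
        if key in wanted and isinstance(value, dict):
            for inner_key, inner_value in value.items():
                flat[f"{key}{sep}{inner_key}"] = inner_value
        else:
            # Keys not selected (or not dicts) are copied through untouched.
            flat[key] = value
    return flat


# Placeholder record with one nested dictionary.
record = {"accession": "MGYS00001234", "attributes": {"title": "Example", "year": 2020}}
print(expand_nested(record, ["attributes"]))
```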

to_json(data=None, orient='records', lines=True, **json_kwargs)#

Convert the current metadata to a JSON string or save it to a file.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **json_kwargs – Additional keyword arguments passed to the JSON serialization function.

  • orient (str )

  • lines (bool )

Returns:

The JSON string representation of the metadata, or None if no data is available.

Return type:

str or None

Raises:

RuntimeError – If no data is available to convert.

to_list(data=None)#

Convert the current or provided metadata to a list of dictionaries.

Parameters:

data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.

Returns:

A list of metadata records as dictionaries, or None if no data is available.

Return type:

list of dict | None

Raises:

RuntimeError – If no data is available to convert.

to_polars(data=None, **polars_kwargs)#

Convert the current metadata to a Polars DataFrame.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **polars_kwargs – Additional keyword arguments passed to pl.DataFrame.

Returns:

A Polars DataFrame containing the metadata.

Return type:

pl.DataFrame

Raises:

RuntimeError – If no data is available to convert.

url_path(**kwargs)#

Constructs the full URL path for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the URL construction.

Returns:

The constructed URL path.

Return type:

str

validate_endpoint_kwargs(**kwargs)#

Validates the provided keyword arguments against the supported parameters of the endpoint module.

Parameters:

**kwargs – Keyword arguments to validate.

Returns:

The validated keyword arguments.

Return type:

dict of str to Any

Raises:

ValueError – If any provided keyword argument is not supported by the endpoint module.

config: MgnipyConfig#
exec: QueryExecutor#
count: int | None #
total_pages: int | None #
default_page_size: int #
request_urls: list [str ] | None #
class mgnipy.V2.proxies.CatalogueDetail(id=None, *, accession=None, config=None, **kwargs)[source]#

Bases: MGnifyDetail

Parameters:
  • id (Optional[str ])

  • accession (Optional[str ])

  • config (MgnipyConfig)

RESOURCE: ClassVar [Literal ['catalogue']] = 'catalogue'#
async afirst()#

Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

async aget(*args, **kwargs)#
async aget_list(resource, access_param, fetch=True, explain=False)#

Asynchronously get a list proxy for the detail identified by a specific accession, pubmed_id, or catalogue_id.

Parameters:
  • resource (str ) – Name of a valid child resource, as returned by list_relationships(), such as "samples" for a study detail or "analyses" for a run detail.

  • access_param (dict [str , str ]) – A dictionary containing the parameter needed to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

  • explain (bool ) – Whether to print example URLs that would be called.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

samples = await study.aget_list("samples", {"accession": "MGYS00001234"})
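
Since aget_list() is a coroutine, several such calls can be awaited concurrently from one event loop. The sketch below shows that calling pattern with asyncio.gather and a hypothetical stand-in coroutine, `fetch_related`, instead of mgnipy itself; the second accession is a placeholder.

```python
from __future__ import annotations

import asyncio


async def fetch_related(resource: str, accession: str) -> dict[str, str]:
    """Hypothetical stand-in for an awaitable call such as aget_list()."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"resource": resource, "accession": accession}


async def main() -> list[dict[str, str]]:
    # gather() runs the coroutines concurrently and keeps argument order.
    return await asyncio.gather(
        fetch_related("samples", "MGYS00001234"),
        fetch_related("samples", "MGYS00005678"),
    )


results = asyncio.run(main())
print(results)
```

In a notebook or other environment that already runs an event loop, `await main()` would be used in place of `asyncio.run(main())`.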

async apage(*args, **kwargs)#
property base_url: str #
property data: dict [int , list [dict [str , Any ]]]#

Paginated results for the current resource.

describe_endpoint(as_dict=False)#
Parameters:

as_dict (bool )

Return type:

dict [str , str ] | None

describe_relationships()#
dry_run(*, verbose=True)#

Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.

Return type:

None

Parameters:

verbose (bool )

property emgapi_docs: str #
property emgapi_resource: str | None #

Retrieves the name of the endpoint resource based on the endpoint module.

Returns:

The name of the endpoint resource, or None if the endpoint module is not set.

Return type:

str or None

property endpoint_module: Callable #
explain(head=None)#

Print example URLs that would be called. The actual requests are handled by the client.

Parameters:

head (int | None)

Return type:

None

filter(**filters)#

Update the parameters for the API call to filter results.

Parameters:

**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.

Returns:

A new QuerySet instance with updated parameters for filtering results.

Return type:

QuerySet

first()#

Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

get(*args, **kwargs)#
get_list(resource, access_param, fetch=True, explain=False)#

Get a list proxy for the detail identified by a specific accession, pubmed_id, or catalogue_id.

Parameters:
  • resource (str ) – Name of a valid child resource, as returned by list_relationships(), such as "samples" for a study detail or "analyses" for a run detail.

  • access_param (dict [str , str ]) – A dictionary containing the parameter needed to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.

  • fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.

  • explain (bool ) – Whether to print example URLs that would be called.

Returns:

A proxy for the next resource.

Return type:

QuerySet

Examples

samples = study.get_list("samples", {"accession": "MGYS00001234"})

property id_param_key: str #
property identifier: str | None #

Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.

Returns:

The identifier value corresponding to the resource type, or None if not available.

Return type:

str or None

list_relationships()#
Return type:

list [str ]

list_supported_params()#

Lists supported keyword arguments for the endpoint module.

Returns:

List of supported keyword argument names.

Return type:

list of str

list_urls()#

Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.

Returns:

A list of URLs corresponding to each API request that would be made.

Return type:

list of str

page(*args, **kwargs)#
page_size(n)#

Set the page size for paginated API calls.

Parameters:

n (int )

Returns:

A new QuerySet instance with the updated page size parameter.

Return type:

QuerySet

property pagination_status: bool #

Check if the current resource requires pagination based on its supported keyword arguments.

Returns:

True if the resource is paginated, False otherwise.

Return type:

bool

property params: dict [str , Any ]#
preview()#

Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.

Returns:

A DataFrame containing the metadata from the first page of results.

Return type:

pd.DataFrame

Raises:

RuntimeError – If the API call fails or if no data is available to preview.

property request_url: str #

Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.

Returns:

The constructed URL for the API request.

Return type:

str

resolve_query_string(**kwargs)#

Resolves the query string for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the query string.

Returns:

The resolved query string.

Return type:

str

property resource: SupportedEndpoints#
property results: dict [int , list [dict ]]#
property results_ids: list [str ] | None #

Get a list of accessions from the retrieved metadata results, if available.

Returns:

A list of accession strings if available, otherwise None.

Return type:

list of str or None
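
A minimal sketch of how accessions might be collected from paginated results shaped like `dict[int, list[dict]]`. It assumes each record stores its identifier under an `accession` key; the helper name `collect_accessions` and the sample accession strings are made up for illustration.

```python
def collect_accessions(results):
    """Gather `accession` values from paginated results (hypothetical helper)."""
    if not results:
        return None
    accessions = []
    for page in sorted(results):
        for record in results[page]:
            # Assumption: each record carries its ID under the "accession" key.
            if "accession" in record:
                accessions.append(record["accession"])
    return accessions or None


# Placeholder records, not real MGnify data.
pages = {
    1: [{"accession": "MGYS00000001"}, {"accession": "MGYS00000002"}],
    2: [{"accession": "MGYS00000003"}],
}
ids = collect_accessions(pages)
```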

sub_url(**kwargs)#

Constructs the sub-URL for the endpoint based on the current parameters.

Returns:

The constructed sub-URL, or None if the endpoint module is not set.

Return type:

str or None

to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#

Convert the current or provided metadata to a pandas DataFrame.

Parameters:
  • data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.

  • expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.

  • rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.

  • **kwargs – Additional keyword arguments passed to pd.DataFrame.

Returns:

DataFrame containing the metadata.

Return type:

pd.DataFrame | None

Raises:

RuntimeError – If no data is available to convert.
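
To make the idea behind expand_nested_dicts concrete, the sketch below lifts selected nested dictionaries into top-level columns in plain Python. The `key.subkey` column naming and the helper `expand_nested` are assumptions for this example; the naming that mgnipy actually produces is not documented here.

```python
def expand_nested(records, keys):
    """Lift the listed nested dict keys into top-level 'key.subkey' columns."""
    flat = []
    for record in records:
        row = {}
        for name, value in record.items():
            if name in keys and isinstance(value, dict):
                # Promote each nested entry to its own column.
                for sub_name, sub_value in value.items():
                    row[f"{name}.{sub_name}"] = sub_value
            else:
                row[name] = value
        flat.append(row)
    return flat


# Placeholder record, not real MGnify data.
records = [{"accession": "A1", "biome": {"lineage": "root:Soil", "name": "Soil"}}]
rows = expand_nested(records, ["biome"])
```

The resulting list of flat dictionaries is the kind of input that pd.DataFrame accepts directly.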

to_json(data=None, orient='records', lines=True, **json_kwargs)#

Convert the current metadata to a JSON string or save it to a file.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **json_kwargs – Additional keyword arguments passed to the JSON serialization function.

  • orient (str )

  • lines (bool )

Returns:

The JSON string representation of the metadata, or None if no data is available.

Return type:

str or None

Raises:

RuntimeError – If no data is available to convert.
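
The defaults orient='records' and lines=True share their names with arguments of pandas' DataFrame.to_json, where that combination produces line-delimited JSON (one object per line). Whether mgnipy delegates to pandas is not stated here, so the sketch below only illustrates the output shape with the standard library; `to_json_lines` is a hypothetical helper.

```python
import json


def to_json_lines(data):
    """Serialize paginated records as JSON Lines: one JSON object per line."""
    # Flatten pages in page-number order before serializing.
    records = [record for page in sorted(data) for record in data[page]]
    return "\n".join(json.dumps(record) for record in records)


# Placeholder records, not real MGnify data.
data = {1: [{"accession": "A1"}, {"accession": "A2"}], 2: [{"accession": "A3"}]}
text = to_json_lines(data)
```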

to_list(data=None)#

Convert the current or provided metadata to a list of dictionaries.

Parameters:

data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.

Returns:

A list of metadata records as dictionaries, or None if no data is available.

Return type:

list of dict | None

Raises:

RuntimeError – If no data is available to convert.

to_polars(data=None, **polars_kwargs)#

Convert the current metadata to a Polars DataFrame.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **polars_kwargs – Additional keyword arguments passed to pl.DataFrame.

Returns:

A Polars DataFrame containing the metadata.

Return type:

pl.DataFrame

Raises:

RuntimeError – If no data is available to convert.

url_path(**kwargs)#

Constructs the full URL path for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the URL construction.

Returns:

The constructed URL path.

Return type:

str

validate_endpoint_kwargs(**kwargs)#

Validates the provided keyword arguments against the supported parameters of the endpoint module.

Parameters:

**kwargs – Keyword arguments to validate.

Returns:

The validated keyword arguments.

Return type:

dict of str to Any

Raises:

ValueError – If any provided keyword argument is not supported by the endpoint module.
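
The validation described above can be sketched as a set difference between the supplied and the supported names. This is an illustration under assumptions, not the mgnipy implementation; `validate_kwargs` and the parameter names are hypothetical.

```python
def validate_kwargs(supported, **kwargs):
    """Return kwargs unchanged if every key is supported, else raise ValueError."""
    unsupported = sorted(set(kwargs) - set(supported))
    if unsupported:
        raise ValueError(
            f"Unsupported parameter(s): {unsupported}. Supported: {sorted(supported)}"
        )
    return dict(kwargs)


supported = ["biome", "page", "page_size"]  # hypothetical parameter names
ok = validate_kwargs(supported, biome="root", page=1)

try:
    validate_kwargs(supported, colour="blue")
except ValueError as exc:
    error_message = str(exc)
```

Listing the supported names in the error message lets the caller correct a misspelled parameter without consulting list_supported_params() separately.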

config: MgnipyConfig#
exec: QueryExecutor#
count: int | None #
total_pages: int | None #
default_page_size: int #
request_urls: list [str ] | None #

mgnipy.V2.query_executor module#

class mgnipy.V2.query_executor.QueryExecutor(query_set)[source]#

Bases: object

Parameters:

query_set (QuerySet)

require_pagination()[source]#
async map_with_concurrency(items, worker, *, concurrency=None, hide_progress=False)[source]#

Map a worker function over a list of items with controlled concurrency. In plain English, it is a “process these things in parallel, but not too many at once” helper.

Example

    results = await self.map_with_concurrency(
        items=pages,
        worker=lambda p: self.apage(p, client),
        concurrency=8,
    )

Parameters:
  • concurrency (int | None)

  • hide_progress (bool )
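
One common way to get "in parallel, but not too many at once" behaviour is an asyncio.Semaphore wrapped around each worker call. The sketch below shows that general technique; it is not the QueryExecutor code, and `map_with_limit` and `fake_fetch` are hypothetical names standing in for the real method and a network call.

```python
import asyncio


async def map_with_limit(items, worker, concurrency=4):
    """Run `worker(item)` for every item, with at most `concurrency` in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(item):
        # Each call waits here until one of the `concurrency` slots is free.
        async with semaphore:
            return await worker(item)

    # gather() returns results in the same order as the input items.
    return await asyncio.gather(*(guarded(item) for item in items))


async def fake_fetch(page):
    """Stand-in for a network call; returns a fake page payload."""
    await asyncio.sleep(0)
    return {"page": page}


results = asyncio.run(map_with_limit([1, 2, 3], fake_fetch, concurrency=2))
```

Because gather() preserves input order, the results line up with the requested pages even when the calls finish out of order.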

get_pageinated_counts(*args, **kwargs)[source]#
get_any_first()[source]#

Retrieve the first page of metadata for the current resource and parameters.

For unpaginated endpoints, this retrieves all of the metadata, which fits in a single response. For paginated endpoints, it retrieves only the first page of results.

async aget_any_first()[source]#

Asynchronously retrieve the first page of metadata for the current resource and parameters.

For unpaginated endpoints, this retrieves all of the metadata, which fits in a single response. For paginated endpoints, it retrieves only the first page of results.

page(*args, **kwargs)[source]#
apage(*args, **kwargs)[source]#
get(limit=None, *, pages=None, safety=False, hide_progress=False)[source]#

Get all results.

async aget(limit=None, *, pages=None, safety=False, hide_progress=False)[source]#

Asynchronously get all results.

mgnipy.V2.query_set module#

class mgnipy.V2.query_set.QuerySet(resource, *, config=None, params=None, **kwargs)[source]#

Bases: ResultsHandlerMixin, DescribeEmgapiMixin

Plans, builds, validates, and previews queries based on the endpoint_module and params of the owning MGnifier. Stores the request URLs. If the owning MGnifier changes, the QuerySet should be re-instantiated to update the URLs and other information.

Parameters:
  • resource (Literal ['biomes', 'biome', 'studies', 'study', 'samples', 'sample', 'runs', 'run', 'genomes', 'genome', 'analyses', 'analysis', 'assemblies', 'assembly'])

  • config (MgnipyConfig)

  • params (dict [str , Any ] | None)

config: MgnipyConfig#
exec: QueryExecutor#
count: int | None #
total_pages: int | None #
default_page_size: int #
request_urls: list [str ] | None #
property request_url: str #

Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.

Returns:

The constructed URL for the API request.

Return type:

str

property endpoint_module: Callable #
property params: dict [str , Any ]#
property results: dict [int , list [dict ]]#
property results_ids: list [str ] | None #

Get a list of accessions from the retrieved metadata results, if available.

Returns:

A list of accession strings if available, otherwise None.

Return type:

list of str or None

property resource: SupportedEndpoints#
filter(**filters)[source]#

Update the parameters for the API call to filter results.

Parameters:

**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.

Returns:

A new QuerySet instance with updated parameters for filtering results.

Return type:

QuerySet

page_size(n)[source]#

Set the page size for paginated API calls.

Parameters:

n (int )

Returns:

A new QuerySet instance with the updated page size parameter.

Return type:

QuerySet
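
Both filter() and page_size() are documented as returning a new QuerySet rather than modifying the existing one. The toy class below sketches that pattern with a frozen dataclass; `ToyQuery` is a hypothetical stand-in and does not reproduce how QuerySet stores its parameters.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class ToyQuery:
    """Toy stand-in for a query object; not the mgnipy QuerySet."""

    resource: str
    params: dict = field(default_factory=dict)

    def filter(self, **filters):
        # Build a new object with merged params; the original is left alone.
        return replace(self, params={**self.params, **filters})

    def page_size(self, n):
        return self.filter(page_size=n)


base = ToyQuery("studies")
narrowed = base.filter(biome="root").page_size(50)
```

Returning a new object on every call means a base query can be reused for several differently filtered variants without one variant affecting another.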

property base_url: str #
property pagination_status: bool #

Check if the current resource requires pagination based on its supported keyword arguments.

Returns:

True if the resource is paginated, False otherwise.

Return type:

bool

dry_run(*, verbose=True)[source]#

Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.

Return type:

None

Parameters:

verbose (bool )

list_urls()[source]#

Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.

Returns:

A list of URLs corresponding to each API request that would be made.

Return type:

list of str

explain(head=None)[source]#

Print example URLs that would be called. Actual requests handled by client.

Parameters:

head (int | None)

Return type:

None

preview()[source]#

Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.

Returns:

A DataFrame containing the metadata from the first page of results.

Return type:

pd.DataFrame

Raises:

RuntimeError – If the API call fails or if no data is available to preview.

first()[source]#

Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

async afirst()[source]#

Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.

Return type:

dict

property id_param_key: str #
property identifier: str | None #

Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.

Returns:

The identifier value corresponding to the resource type, or None if not available.

Return type:

str or None

property data: dict [int , list [dict [str , Any ]]]#

Results based on the current resource.

describe_endpoint(as_dict=False)#
Parameters:

as_dict (bool )

Return type:

dict [str , str ] | None

property emgapi_docs: str #
property emgapi_resource: str | None #

Retrieves the name of the endpoint resource based on the endpoint module.

Returns:

The name of the endpoint resource, or None if the endpoint module is not set.

Return type:

str or None

list_supported_params()#

Lists supported keyword arguments for the endpoint module.

Returns:

List of supported keyword argument names.

Return type:

list of str

resolve_query_string(**kwargs)#

Resolves the query string for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the query string.

Returns:

The resolved query string.

Return type:

str

sub_url(**kwargs)#

Constructs the sub-URL for the endpoint based on the current parameters.

Returns:

The constructed sub-URL, or None if the endpoint module is not set.

Return type:

str or None

to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#

Convert the current or provided metadata to a pandas DataFrame.

Parameters:
  • data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.

  • expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.

  • rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.

  • **kwargs – Additional keyword arguments passed to pd.DataFrame.

Returns:

DataFrame containing the metadata.

Return type:

pd.DataFrame | None

Raises:

RuntimeError – If no data is available to convert.

to_json(data=None, orient='records', lines=True, **json_kwargs)#

Convert the current metadata to a JSON string or save it to a file.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **json_kwargs – Additional keyword arguments passed to the JSON serialization function.

  • orient (str )

  • lines (bool )

Returns:

The JSON string representation of the metadata, or None if no data is available.

Return type:

str or None

Raises:

RuntimeError – If no data is available to convert.

to_list(data=None)#

Convert the current or provided metadata to a list of dictionaries.

Parameters:

data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.

Returns:

A list of metadata records as dictionaries, or None if no data is available.

Return type:

list of dict | None

Raises:

RuntimeError – If no data is available to convert.

to_polars(data=None, **polars_kwargs)#

Convert the current metadata to a Polars DataFrame.

Parameters:
  • data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self._results.

  • **polars_kwargs – Additional keyword arguments passed to pl.DataFrame.

Returns:

A Polars DataFrame containing the metadata.

Return type:

pl.DataFrame

Raises:

RuntimeError – If no data is available to convert.

url_path(**kwargs)#

Constructs the full URL path for the endpoint based on the current parameters.

Parameters:

**kwargs – Keyword arguments to validate and include in the URL construction.

Returns:

The constructed URL path.

Return type:

str

validate_endpoint_kwargs(**kwargs)#

Validates the provided keyword arguments against the supported parameters of the endpoint module.

Parameters:

**kwargs – Keyword arguments to validate.

Returns:

The validated keyword arguments.

Return type:

dict of str to Any

Raises:

ValueError – If any provided keyword argument is not supported by the endpoint module.

list_relationships()[source]#
Return type:

list [str ]

describe_relationships()[source]#