mgnipy.V2 package#
- class mgnipy.V2.MGnifier(resource, *, config=None, params=None, **kwargs)#
Bases:
QuerySet
(Facade) MGnifier is the main user-facing class representing a queryable MGnify resource. It provides methods for fetching and navigating data from the MGnify API.
- Parameters:
- async afirst()#
Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- describe_relationships()#
- dry_run(*, verbose=True)#
Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.
- Return type:
None
- Parameters:
verbose (bool)
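The page estimate that dry_run() reports comes down to ceiling division once the total record count is known. A minimal sketch, assuming a hypothetical helper plan_pages() that is not part of mgnipy and a record count already obtained from the API:

```python
import math


def plan_pages(total_records: int, page_size: int) -> dict:
    """Estimate how many paginated requests a retrieval would need.

    Hypothetical helper for illustration; not part of mgnipy.
    """
    if page_size <= 0:
        raise ValueError("page_size must be a positive integer")
    if total_records < 0:
        raise ValueError("total_records cannot be negative")
    # Ceiling division: a partially filled last page still costs one request.
    pages = math.ceil(total_records / page_size)
    return {"records": total_records, "page_size": page_size, "pages": pages}


plan = plan_pages(total_records=1234, page_size=25)
print(f"{plan['records']} records over {plan['pages']} pages")
```

Reviewing such a plan before calling get() makes it easier to spot a query that would trigger far more requests than intended.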
- property emgapi_resource: str | None #
Retrieves the name of the endpoint resource based on the endpoint module.
- Returns:
The name of the endpoint resource, or None if the endpoint module is not set.
- Return type:
str or None
- explain(head=None)#
Print example URLs that would be called. Actual requests handled by client.
- Parameters:
head (int | None)
- Return type:
None
- filter(**filters)#
Update the parameters for the API call to filter results.
- Parameters:
**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.
- Returns:
A new QuerySet instance with updated parameters for filtering results.
- Return type:
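Because filter() is documented as returning a new QuerySet, calls can be chained without altering the object they start from. The sketch below illustrates that pattern with a stand-in class; ToyQuerySet and the filter names biome and page_size are assumptions for illustration, not mgnipy's implementation or its supported parameters:

```python
class ToyQuerySet:
    """Stand-in for a query object; not mgnipy's QuerySet."""

    def __init__(self, resource, params=None):
        self.resource = resource
        self._params = dict(params or {})

    @property
    def params(self):
        # Return a copy so callers cannot mutate internal state.
        return dict(self._params)

    def filter(self, **filters):
        # Merge the new filters over the existing ones and return a new
        # object; the instance the method was called on is left untouched.
        merged = {**self._params, **filters}
        return ToyQuerySet(self.resource, merged)


base = ToyQuerySet("studies")
narrowed = base.filter(biome="root").filter(page_size=10)
print(base.params)
print(narrowed.params)
```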
- first()#
Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- property identifier: str | None #
Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.
- Returns:
The identifier value corresponding to the resource type, or None if not available.
- Return type:
str or None
- list_supported_params()#
Lists supported keyword arguments for the endpoint module.
- list_urls()#
Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.
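A list like the one list_urls() describes can be pictured as one URL per page of results. The following is a rough sketch only: build_page_urls() is a hypothetical helper, the base URL is a placeholder, and the page and page_size query parameter names are assumptions rather than confirmed API details:

```python
from urllib.parse import urlencode


def build_page_urls(base_url, params, total_pages, head=None):
    """Return one URL per page; `head` limits how many are returned.

    Hypothetical helper; the `page` parameter name is an assumption.
    """
    urls = []
    for page in range(1, total_pages + 1):
        query = urlencode({**params, "page": page})
        urls.append(f"{base_url}?{query}")
    return urls if head is None else urls[:head]


urls = build_page_urls(
    "https://example.org/api/studies",  # placeholder, not the MGnify endpoint
    {"page_size": 25},
    total_pages=3,
)
for url in urls:
    print(url)
```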
- page_size(n)#
Set the page size for paginated API calls.
- property pagination_status: bool #
Check if the current resource requires pagination based on its supported keyword arguments.
- Returns:
True if the resource is paginated, False otherwise.
- Return type:
bool
- preview()#
Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.
- Returns:
A DataFrame containing the metadata from the first page of results.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If the API call fails or if no data is available to preview.
- property request_url: str #
Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.
- Returns:
The constructed URL for the API request.
- Return type:
str
- resolve_query_string(**kwargs)#
Resolves the query string for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the query string.
- Returns:
The resolved query string.
- Return type:
- property resource: SupportedEndpoints#
- property results_ids: list [str ] | None #
Get a list of accessions from the retrieved metadata results, if available.
- sub_url(**kwargs)#
Constructs the sub-URL for the endpoint based on the current parameters.
- Returns:
The constructed sub-URL, or None if the endpoint module is not set.
- Return type:
str or None
- to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#
Convert the current or provided metadata to a pandas DataFrame.
- Parameters:
data (list of dict, optional) – List of records to convert. If None, uses self._results or self._previewed_page.
expand_nested_dicts (list of str, optional) – List of keys to expand into separate columns.
rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.
**kwargs – Additional keyword arguments passed to pd.DataFrame.
- Returns:
DataFrame containing the metadata.
- Return type:
pd.DataFrame | None
- Raises:
RuntimeError – If no data is available to convert.
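The effect of expand_nested_dicts and rename_columns can be pictured on plain records before any DataFrame is built. This is a sketch under assumptions: flatten_records() is a hypothetical helper, the record fields are invented sample data, and the "key.subkey" column naming is a guess rather than mgnipy's documented behaviour:

```python
def flatten_records(records, expand_keys=(), rename_columns=None):
    """Flatten selected nested dicts and rename columns in a list of records.

    Hypothetical helper; the resulting rows could be handed to pd.DataFrame.
    """
    if not records:
        raise RuntimeError("No data is available to convert.")
    rename_columns = rename_columns or {}
    rows = []
    for record in records:
        row = {}
        for key, value in record.items():
            if key in expand_keys and isinstance(value, dict):
                # Promote each nested field to its own prefixed column.
                for sub_key, sub_value in value.items():
                    row[f"{key}.{sub_key}"] = sub_value
            else:
                row[key] = value
        # Apply the old-name -> new-name mapping last.
        rows.append({rename_columns.get(k, k): v for k, v in row.items()})
    return rows


records = [{"accession": "MGYS00001234", "biome": {"lineage": "root", "name": "root"}}]
rows = flatten_records(
    records,
    expand_keys=["biome"],
    rename_columns={"biome.lineage": "lineage"},
)
print(rows)
```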
- to_json(data=None, orient='records', lines=True, **json_kwargs)#
Convert the current metadata to a JSON string or save it to a file.
- Parameters:
- Returns:
The JSON string representation of the metadata, or None if no data is available.
- Return type:
str or None
- Raises:
RuntimeError – If no data is available to convert.
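The defaults orient='records' and lines=True match the pandas to_json options that produce JSON Lines, that is, one JSON object per record per line. Whether mgnipy delegates to pandas is an assumption; the sketch below reproduces only the output shape using the standard library, with a hypothetical helper name:

```python
import json


def records_to_json_lines(records):
    """Serialize records as JSON Lines (one object per line).

    Hypothetical helper; not part of mgnipy.
    """
    if not records:
        raise RuntimeError("No data is available to convert.")
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


text = records_to_json_lines(
    [{"accession": "MGYS00001234"}, {"accession": "MGYS00005678"}]
)
print(text)
```

Each line can be parsed independently, which is convenient when appending to a file or streaming large result sets.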
- to_list(data=None)#
Convert the current or provided metadata to a list of dictionaries.
- Parameters:
data (dict of int to list of dict, optional) – The paginated data to convert. If None, uses self.data.
- Returns:
A list of metadata records as dictionaries, or None if no data is available.
- Return type:
- Raises:
RuntimeError – If no data is available to convert.
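Since the data argument is described as a mapping from page number to a list of records, the conversion amounts to concatenating the pages. A minimal sketch, assuming a hypothetical helper pages_to_list() and assuming that records should come out in ascending page order:

```python
def pages_to_list(pages):
    """Concatenate {page_number: [records]} into one flat list.

    Hypothetical helper; ordering by page number is an assumption.
    """
    if not pages:
        raise RuntimeError("No data is available to convert.")
    records = []
    for page_number in sorted(pages):
        records.extend(pages[page_number])
    return records


pages = {2: [{"accession": "C"}], 1: [{"accession": "A"}, {"accession": "B"}]}
flat = pages_to_list(pages)
print([record["accession"] for record in flat])
```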
- to_polars(data=None, **polars_kwargs)#
Convert the current metadata to a Polars DataFrame.
- Parameters:
- Returns:
A Polars DataFrame containing the metadata.
- Return type:
pl.DataFrame
- Raises:
RuntimeError – If no data is available to convert.
- url_path(**kwargs)#
Constructs the full URL path for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the URL construction.
- Returns:
The constructed URL path.
- Return type:
- validate_endpoint_kwargs(**kwargs)#
Validates the provided keyword arguments against the supported parameters of the endpoint module.
- Parameters:
**kwargs – Keyword arguments to validate.
- Returns:
The validated keyword arguments.
- Return type:
dict of str to Any
- Raises:
ValueError – If any provided keyword argument is not supported by the endpoint module.
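The validation described above can be sketched as a set difference between the supplied keyword names and the supported ones. The helper name validate_kwargs() and the parameter names used here are hypothetical and serve only to show the ValueError behaviour:

```python
def validate_kwargs(supported, **kwargs):
    """Return kwargs unchanged if every key is supported, else raise.

    Hypothetical helper; not mgnipy's validate_endpoint_kwargs().
    """
    unsupported = sorted(set(kwargs) - set(supported))
    if unsupported:
        raise ValueError(f"Unsupported parameter(s): {', '.join(unsupported)}")
    return dict(kwargs)


supported = {"page", "page_size", "search"}  # assumed names for illustration
print(validate_kwargs(supported, page_size=25))

try:
    validate_kwargs(supported, colour="blue")
except ValueError as err:
    print(err)
```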
- class mgnipy.V2.Biomes(*, params=None, config=None, **kwargs)#
Bases:
MGnifyList, BiomesTreeMixin
- async acollect_details(*, fetch=True, by_id=False, concurrency=None, hide_progress=False)#
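acollect_details() is not described further here, but its concurrency argument suggests that detail requests run in parallel up to a limit. One common way to bound concurrency in asyncio is a semaphore, sketched below. This is an assumption about the general technique, not a description of mgnipy's internals; fetch_detail() is a placeholder that sleeps instead of making a request:

```python
import asyncio

in_flight = 0  # number of simulated requests currently running
peak = 0       # highest value in_flight reached


async def fetch_detail(accession):
    """Placeholder for a network call; sleeps instead of requesting."""
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)
    in_flight -= 1
    return {"accession": accession}


async def collect_with_limit(accessions, concurrency=5):
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(accession):
        # At most `concurrency` coroutines get past this point at once.
        async with semaphore:
            return await fetch_detail(accession)

    # gather returns results in the same order as its inputs.
    return await asyncio.gather(*(bounded(a) for a in accessions))


details = asyncio.run(collect_with_limit(["A1", "A2", "A3"], concurrency=2))
print(details)
print("peak concurrency:", peak)
```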
- async afirst()#
Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- async aget(*args, **kwargs)#
- async aget_detail(access_param, fetch=True)#
Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.
Examples
sample = await samples.aget_detail({"accession": "MGYS00001234"})
- async aiter_details(fetch=True)#
Async version of iter_details.
- async apage(*args, **kwargs)#
- collect_details(*, fetch=True, by_id=False)#
Collect child detail proxies into a list or dict.
- Parameters:
- Returns:
A list or dict of child detail proxies.
- Return type:
Example
sample_detail = samples.collect_details(fetch=True, by_id=True)
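The return value is described as a list or a dict, and the by_id flag presumably selects between the two. The sketch below shows that choice on plain dictionaries; collect() and the accession key are hypothetical stand-ins, since the parameters are not documented here:

```python
def collect(items, by_id=False, id_key="accession"):
    """Return items as a list, or as a dict keyed by identifier.

    Hypothetical helper; the key name is an assumption.
    """
    if by_id:
        return {item[id_key]: item for item in items}
    return list(items)


items = [{"accession": "A1", "n": 1}, {"accession": "A2", "n": 2}]
print(collect(items))
print(collect(items, by_id=True))
```

A dict keyed by identifier suits direct lookups, while a list preserves the order in which the details were collected.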
- describe_relationships()#
- dry_run(*, verbose=True)#
Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.
- Return type:
None
- Parameters:
verbose (bool)
- property emgapi_resource: str | None #
Retrieves the name of the endpoint resource based on the endpoint module.
- Returns:
The name of the endpoint resource, or None if the endpoint module is not set.
- Return type:
str or None
- explain(head=None)#
Print example URLs that would be called. Actual requests handled by client.
- Parameters:
head (int | None)
- Return type:
None
- filter(**filters)#
Update the parameters for the API call to filter results.
- Parameters:
**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.
- Returns:
A new QuerySet instance with updated parameters for filtering results.
- Return type:
- first()#
Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- get(*args, **kwargs)#
- get_detail(access_param, fetch=True)#
Get detail proxy for a specific accession/pubmed_id/catalogue_id.
- Parameters:
access_param (dict[str, str]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
resource_name (Optional[str]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.
fetch (bool) – Whether to immediately fetch the detail after creating the proxy.
- Returns:
A proxy for the next resource.
- Return type:
Examples
sample = samples.get_detail({"accession": "MGYS00001234"})
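The fetch flag separates creating a detail proxy from loading its data. A minimal sketch of that idea, using a stand-in class: ToyDetailProxy and fake_loader() are hypothetical and only illustrate eager versus deferred loading, not mgnipy's proxy objects:

```python
class ToyDetailProxy:
    """Stand-in proxy that loads its data eagerly or on demand."""

    def __init__(self, access_param, loader, fetch=True):
        self.access_param = dict(access_param)
        self._loader = loader
        self.data = None
        if fetch:
            self.get()

    def get(self):
        # Load once; later calls reuse the cached result.
        if self.data is None:
            self.data = self._loader(self.access_param)
        return self.data


calls = []  # records every simulated request


def fake_loader(access_param):
    calls.append(access_param)
    return {"accession": access_param["accession"], "loaded": True}


lazy = ToyDetailProxy({"accession": "MGYS00001234"}, fake_loader, fetch=False)
print("requests after creating the proxy:", len(calls))
lazy.get()
print("requests after get():", len(calls))
```

Passing fetch=False is useful when many proxies are created but only a few will actually be inspected.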
- property identifier: str | None #
Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.
- Returns:
The identifier value corresponding to the resource type, or None if not available.
- Return type:
str or None
- iter_details(fetch=True)#
Lazily iterate over child detail proxies.
- Parameters:
fetch (bool) – Whether to immediately fetch each detail after creating the proxy.
- Returns:
An iterator that yields child detail proxies.
- Return type:
Iterator of QuerySet
Example
for sample in samples.iter_details():
    sample.get()
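Lazy iteration means a detail is only produced when the loop asks for it, so stopping early avoids work on the remaining items. The generator below is a simplified stand-in for that behaviour; the function body and the produced list are illustrative assumptions, not mgnipy code:

```python
from itertools import islice

produced = []  # records which items were actually generated


def iter_details(accessions, fetch=True):
    """Yield one detail per accession, only when requested."""
    for accession in accessions:
        produced.append(accession)
        yield {"accession": accession, "fetched": fetch}


# Take only the first two details; the rest are never generated.
first_two = list(islice(iter_details(["A", "B", "C", "D"]), 2))
print(first_two)
print(produced)
```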
- list_supported_params()#
Lists supported keyword arguments for the endpoint module.
- list_urls()#
Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.
- page(*args, **kwargs)#
- page_size(n)#
Set the page size for paginated API calls.
- property pagination_status: bool #
Check if the current resource requires pagination based on its supported keyword arguments.
- Returns:
True if the resource is paginated, False otherwise.
- Return type:
bool
- preview()#
Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.
- Returns:
A DataFrame containing the metadata from the first page of results.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If the API call fails or if no data is available to preview.
- property request_url: str #
Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.
- Returns:
The constructed URL for the API request.
- Return type:
str
- resolve_query_string(**kwargs)#
Resolves the query string for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the query string.
- Returns:
The resolved query string.
- Return type:
- property resource: SupportedEndpoints#
- property results_ids: list [str ] | None #
Get a list of accessions from the retrieved metadata results, if available.
- show_tree(method='compact')#
- Parameters:
method (Literal ['compact', 'show', 'print', 'horizontal', 'hshow', 'h', 'hprint', 'vertical', 'vshow', 'v', 'vprint'])
- sub_url(**kwargs)#
Constructs the sub-URL for the endpoint based on the current parameters.
- Returns:
The constructed sub-URL, or None if the endpoint module is not set.
- Return type:
str or None
- to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#
Convert the current or provided metadata to a pandas DataFrame.
- Parameters:
data (list of dict, optional) – List of records to convert. If None, uses self._results or self._previewed_page.
expand_nested_dicts (list of str, optional) – List of keys to expand into separate columns.
rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.
**kwargs – Additional keyword arguments passed to pd.DataFrame.
- Returns:
DataFrame containing the metadata.
- Return type:
pd.DataFrame | None
- Raises:
RuntimeError – If no data is available to convert.
- to_json(data=None, orient='records', lines=True, **json_kwargs)#
Convert the current metadata to a JSON string or save it to a file.
- Parameters:
- Returns:
The JSON string representation of the metadata, or None if no data is available.
- Return type:
str or None
- Raises:
RuntimeError – If no data is available to convert.
- to_list(data=None)#
Convert the current or provided metadata to a list of dictionaries.
- Parameters:
data (dict of int to list of dict, optional) – The paginated data to convert. If None, uses self.data.
- Returns:
A list of metadata records as dictionaries, or None if no data is available.
- Return type:
- Raises:
RuntimeError – If no data is available to convert.
- to_polars(data=None, **polars_kwargs)#
Convert the current metadata to a Polars DataFrame.
- Parameters:
- Returns:
A Polars DataFrame containing the metadata.
- Return type:
pl.DataFrame
- Raises:
RuntimeError – If no data is available to convert.
- property tree: Tree#
Convert the biomes metadata to a tree structure for visualization or analysis.
- Returns:
A tree representation of the biomes and their relationships.
- Return type:
Tree
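A biome tree can be pictured as nested nodes built from lineage strings. The sketch below assumes colon-separated lineages such as "root:Environmental:Aquatic" and uses plain dictionaries; build_tree() and render() are hypothetical helpers and do not reproduce the Tree type returned by this property:

```python
def build_tree(lineages, sep=":"):
    """Build a nested dict from lineage strings (separator is assumed)."""
    tree = {}
    for lineage in lineages:
        node = tree
        for part in lineage.split(sep):
            # Descend into the child node, creating it if missing.
            node = node.setdefault(part, {})
    return tree


def render(tree, indent=0):
    """Return the tree as indented lines, children sorted by name."""
    lines = []
    for name in sorted(tree):
        lines.append("  " * indent + name)
        lines.extend(render(tree[name], indent + 1))
    return lines


lineages = [
    "root:Environmental:Aquatic",
    "root:Environmental:Terrestrial",
    "root:Engineered",
]
tree = build_tree(lineages)
print("\n".join(render(tree)))
```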
- url_path(**kwargs)#
Constructs the full URL path for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the URL construction.
- Returns:
The constructed URL path.
- Return type:
- validate_endpoint_kwargs(**kwargs)#
Validates the provided keyword arguments against the supported parameters of the endpoint module.
- Parameters:
**kwargs – Keyword arguments to validate.
- Returns:
The validated keyword arguments.
- Return type:
dict of str to Any
- Raises:
ValueError – If any provided keyword argument is not supported by the endpoint module.
- class mgnipy.V2.Studies(*, params=None, config=None, **kwargs)#
Bases:
MGnifyList
- async acollect_details(*, fetch=True, by_id=False, concurrency=None, hide_progress=False)#
- async afirst()#
Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- async aget(*args, **kwargs)#
- async aget_detail(access_param, fetch=True)#
Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.
Examples
sample = await samples.aget_detail({"accession": "MGYS00001234"})
- async aiter_details(fetch=True)#
Async version of iter_details.
- async apage(*args, **kwargs)#
- collect_details(*, fetch=True, by_id=False)#
Collect child detail proxies into a list or dict.
- Parameters:
- Returns:
A list or dict of child detail proxies.
- Return type:
Example
sample_detail = samples.collect_details(fetch=True, by_id=True)
- describe_relationships()#
- dry_run(*, verbose=True)#
Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.
- Return type:
None
- Parameters:
verbose (bool)
- property emgapi_resource: str | None #
Retrieves the name of the endpoint resource based on the endpoint module.
- Returns:
The name of the endpoint resource, or None if the endpoint module is not set.
- Return type:
str or None
- explain(head=None)#
Print example URLs that would be called. Actual requests handled by client.
- Parameters:
head (int | None)
- Return type:
None
- filter(**filters)#
Update the parameters for the API call to filter results.
- Parameters:
**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.
- Returns:
A new QuerySet instance with updated parameters for filtering results.
- Return type:
- first()#
Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- get(*args, **kwargs)#
- get_detail(access_param, fetch=True)#
Get detail proxy for a specific accession/pubmed_id/catalogue_id.
- Parameters:
access_param (dict[str, str]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
resource_name (Optional[str]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.
fetch (bool) – Whether to immediately fetch the detail after creating the proxy.
- Returns:
A proxy for the next resource.
- Return type:
Examples
sample = samples.get_detail({"accession": "MGYS00001234"})
- property identifier: str | None #
Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.
- Returns:
The identifier value corresponding to the resource type, or None if not available.
- Return type:
str or None
- iter_details(fetch=True)#
Lazily iterate over child detail proxies.
- Parameters:
fetch (bool) – Whether to immediately fetch each detail after creating the proxy.
- Returns:
An iterator that yields child detail proxies.
- Return type:
Iterator of QuerySet
Example
for sample in samples.iter_details():
    sample.get()
- list_supported_params()#
Lists supported keyword arguments for the endpoint module.
- list_urls()#
Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.
- page(*args, **kwargs)#
- page_size(n)#
Set the page size for paginated API calls.
- property pagination_status: bool #
Check if the current resource requires pagination based on its supported keyword arguments.
- Returns:
True if the resource is paginated, False otherwise.
- Return type:
bool
- preview()#
Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.
- Returns:
A DataFrame containing the metadata from the first page of results.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If the API call fails or if no data is available to preview.
- property request_url: str #
Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.
- Returns:
The constructed URL for the API request.
- Return type:
str
- resolve_query_string(**kwargs)#
Resolves the query string for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the query string.
- Returns:
The resolved query string.
- Return type:
- property resource: SupportedEndpoints#
- property results_ids: list [str ] | None #
Get a list of accessions from the retrieved metadata results, if available.
- sub_url(**kwargs)#
Constructs the sub-URL for the endpoint based on the current parameters.
- Returns:
The constructed sub-URL, or None if the endpoint module is not set.
- Return type:
str or None
- to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#
Convert the current or provided metadata to a pandas DataFrame.
- Parameters:
data (list of dict, optional) – List of records to convert. If None, uses self._results or self._previewed_page.
expand_nested_dicts (list of str, optional) – List of keys to expand into separate columns.
rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.
**kwargs – Additional keyword arguments passed to pd.DataFrame.
- Returns:
DataFrame containing the metadata.
- Return type:
pd.DataFrame | None
- Raises:
RuntimeError – If no data is available to convert.
- to_json(data=None, orient='records', lines=True, **json_kwargs)#
Convert the current metadata to a JSON string or save it to a file.
- Parameters:
- Returns:
The JSON string representation of the metadata, or None if no data is available.
- Return type:
str or None
- Raises:
RuntimeError – If no data is available to convert.
- to_list(data=None)#
Convert the current or provided metadata to a list of dictionaries.
- Parameters:
data (dict of int to list of dict, optional) – The paginated data to convert. If None, uses self.data.
- Returns:
A list of metadata records as dictionaries, or None if no data is available.
- Return type:
- Raises:
RuntimeError – If no data is available to convert.
- to_polars(data=None, **polars_kwargs)#
Convert the current metadata to a Polars DataFrame.
- Parameters:
- Returns:
A Polars DataFrame containing the metadata.
- Return type:
pl.DataFrame
- Raises:
RuntimeError – If no data is available to convert.
- url_path(**kwargs)#
Constructs the full URL path for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the URL construction.
- Returns:
The constructed URL path.
- Return type:
- validate_endpoint_kwargs(**kwargs)#
Validates the provided keyword arguments against the supported parameters of the endpoint module.
- Parameters:
**kwargs – Keyword arguments to validate.
- Returns:
The validated keyword arguments.
- Return type:
dict of str to Any
- Raises:
ValueError – If any provided keyword argument is not supported by the endpoint module.
- config: MgnipyConfig#
- exec: QueryExecutor#
- class mgnipy.V2.Samples(*, params=None, config=None, **kwargs)#
Bases:
MGnifyList
- async acollect_details(*, fetch=True, by_id=False, concurrency=None, hide_progress=False)#
- async afirst()#
Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- async aget(*args, **kwargs)#
- async aget_detail(access_param, fetch=True)#
Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.
Examples
sample = await samples.aget_detail({"accession": "MGYS00001234"})
- async aiter_details(fetch=True)#
Async version of iter_details.
- async apage(*args, **kwargs)#
- collect_details(*, fetch=True, by_id=False)#
Collect child detail proxies into a list or dict.
- Parameters:
- Returns:
A list or dict of child detail proxies.
- Return type:
Example
sample_detail = samples.collect_details(fetch=True, by_id=True)
- describe_relationships()#
- dry_run(*, verbose=True)#
Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.
- Return type:
None
- Parameters:
verbose (bool)
- property emgapi_resource: str | None #
Retrieves the name of the endpoint resource based on the endpoint module.
- Returns:
The name of the endpoint resource, or None if the endpoint module is not set.
- Return type:
str or None
- explain(head=None)#
Print example URLs that would be called. Actual requests handled by client.
- Parameters:
head (int | None)
- Return type:
None
- filter(**filters)#
Update the parameters for the API call to filter results.
- Parameters:
**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.
- Returns:
A new QuerySet instance with updated parameters for filtering results.
- Return type:
- first()#
Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- get(*args, **kwargs)#
- get_detail(access_param, fetch=True)#
Get detail proxy for a specific accession/pubmed_id/catalogue_id.
- Parameters:
access_param (dict[str, str]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
resource_name (Optional[str]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.
fetch (bool) – Whether to immediately fetch the detail after creating the proxy.
- Returns:
A proxy for the next resource.
- Return type:
Examples
sample = samples.get_detail({"accession": "MGYS00001234"})
- property identifier: str | None #
Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.
- Returns:
The identifier value corresponding to the resource type, or None if not available.
- Return type:
str or None
- iter_details(fetch=True)#
Lazily iterate over child detail proxies.
- Parameters:
fetch (bool) – Whether to immediately fetch each detail after creating the proxy.
- Returns:
An iterator that yields child detail proxies.
- Return type:
Iterator of QuerySet
Example
for sample in samples.iter_details():
    sample.get()
- list_supported_params()#
Lists supported keyword arguments for the endpoint module.
- list_urls()#
Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.
- page(*args, **kwargs)#
- page_size(n)#
Set the page size for paginated API calls.
- property pagination_status: bool #
Check if the current resource requires pagination based on its supported keyword arguments.
- Returns:
True if the resource is paginated, False otherwise.
- Return type:
bool
- preview()#
Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.
- Returns:
A DataFrame containing the metadata from the first page of results.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If the API call fails or if no data is available to preview.
- property request_url: str #
Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.
- Returns:
The constructed URL for the API request.
- Return type:
str
- resolve_query_string(**kwargs)#
Resolves the query string for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the query string.
- Returns:
The resolved query string.
- Return type:
- property resource: SupportedEndpoints#
- property results_ids: list [str ] | None #
Get a list of accessions from the retrieved metadata results, if available.
- sub_url(**kwargs)#
Constructs the sub-URL for the endpoint based on the current parameters.
- Returns:
The constructed sub-URL, or None if the endpoint module is not set.
- Return type:
str or None
- to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#
Convert the current or provided metadata to a pandas DataFrame.
- Parameters:
data (list of dict, optional) – List of records to convert. If None, uses self._results or self._previewed_page.
expand_nested_dicts (list of str, optional) – List of keys to expand into separate columns.
rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.
**kwargs – Additional keyword arguments passed to pd.DataFrame.
- Returns:
DataFrame containing the metadata.
- Return type:
pd.DataFrame | None
- Raises:
RuntimeError – If no data is available to convert.
- to_json(data=None, orient='records', lines=True, **json_kwargs)#
Convert the current metadata to a JSON string or save it to a file.
- Parameters:
- Returns:
The JSON string representation of the metadata, or None if no data is available.
- Return type:
str or None
- Raises:
RuntimeError – If no data is available to convert.
- to_list(data=None)#
Convert the current or provided metadata to a list of dictionaries.
- Parameters:
data (dict of int to list of dict, optional) – The paginated data to convert. If None, uses self.data.
- Returns:
A list of metadata records as dictionaries, or None if no data is available.
- Return type:
- Raises:
RuntimeError – If no data is available to convert.
- to_polars(data=None, **polars_kwargs)#
Convert the current metadata to a Polars DataFrame.
- Parameters:
- Returns:
A Polars DataFrame containing the metadata.
- Return type:
pl.DataFrame
- Raises:
RuntimeError – If no data is available to convert.
- url_path(**kwargs)#
Constructs the full URL path for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the URL construction.
- Returns:
The constructed URL path.
- Return type:
- validate_endpoint_kwargs(**kwargs)#
Validates the provided keyword arguments against the supported parameters of the endpoint module.
- Parameters:
**kwargs – Keyword arguments to validate.
- Returns:
The validated keyword arguments.
- Return type:
dict of str to Any
- Raises:
ValueError – If any provided keyword argument is not supported by the endpoint module.
- config: MgnipyConfig#
- exec: QueryExecutor#
- class mgnipy.V2.Analyses(*, params=None, config=None, **kwargs)#
Bases:
MGnifyList
- async acollect_details(*, fetch=True, by_id=False, concurrency=None, hide_progress=False)#
- async afirst()#
Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- async aget(*args, **kwargs)#
- async aget_detail(access_param, fetch=True)#
Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.
Examples
sample = await samples.aget_detail({"accession": "MGYS00001234"})
- async aiter_details(fetch=True)#
Async version of iter_details.
- async apage(*args, **kwargs)#
- collect_details(*, fetch=True, by_id=False)#
Collect child detail proxies into a list or dict.
- Parameters:
- Returns:
A list or dict of child detail proxies.
- Return type:
Example
sample_detail = samples.collect_details(fetch=True, by_id=True)
- describe_relationships()#
- dry_run(*, verbose=True)#
Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.
- Return type:
None
- Parameters:
verbose (bool)
- property emgapi_resource: str | None #
Retrieves the name of the endpoint resource based on the endpoint module.
- Returns:
The name of the endpoint resource, or None if the endpoint module is not set.
- Return type:
str or None
- explain(head=None)#
Print example URLs that would be called. Actual requests handled by client.
- Parameters:
head (int | None)
- Return type:
None
- filter(**filters)#
Update the parameters for the API call to filter results.
- Parameters:
**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.
- Returns:
A new QuerySet instance with updated parameters for filtering results.
- Return type:
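Because filter() is documented as returning a new QuerySet rather than modifying the current one, calls can be chained safely. The pattern can be sketched with a frozen dataclass; `SketchQuery` and the filter names used here are hypothetical stand-ins, not part of mgnipy.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class SketchQuery:
    """Hypothetical stand-in for a query object; not mgnipy's QuerySet."""

    resource: str
    params: dict = field(default_factory=dict)

    def filter(self, **filters) -> "SketchQuery":
        # Merge the new filters over the existing ones and return a copy,
        # leaving the original object untouched.
        return replace(self, params={**self.params, **filters})


base = SketchQuery("analyses")
narrowed = base.filter(experiment_type="amplicon").filter(page_size=10)

print(base.params)      # the original query keeps its empty parameters
print(narrowed.params)  # the derived query carries both filters
```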
- first()#
Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- get(*args, **kwargs)#
- get_detail(access_param, fetch=True)#
Get detail proxy for a specific accession/pubmed_id/catalogue_id.
- Parameters:
access_param (dict[str, str]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
resource_name (Optional[str ]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.
fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.
- Returns:
A proxy for the next resource.
- Return type:
Examples
sample = samples.get_detail({"accession": "MGYS00001234"})
- property identifier: str | None #
Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.
- Returns:
The identifier value corresponding to the resource type, or None if not available.
- Return type:
str or None
- iter_details(fetch=True)#
Lazily iterate over child detail proxies.
- Parameters:
fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.
- Returns:
An iterator that yields child detail proxies.
- Return type:
Iterator of QuerySet
Example
for sample in samples.iter_details():
    sample.get()
- list_supported_params()#
Lists supported keyword arguments for the endpoint module.
- list_urls()#
Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.
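One way to picture the output of list_urls() is a single URL per page, each with the same query parameters. The sketch below is an assumption-laden illustration: the helper name, the example.org base URL, and the 1-based `page` query parameter are all hypothetical.

```python
from urllib.parse import urlencode


def sketch_list_urls(base_url: str, params: dict, total_pages: int) -> list[str]:
    """Hypothetical helper: one URL per page for the same query parameters."""
    urls = []
    for page in range(1, total_pages + 1):
        # Assumes the API takes a 1-based `page` query parameter.
        query = urlencode({**params, "page": page})
        urls.append(f"{base_url}?{query}")
    return urls


for url in sketch_list_urls("https://example.org/api/analyses", {"page_size": 25}, 3):
    print(url)
```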
- page(*args, **kwargs)#
- page_size(n)#
Set the page size for paginated API calls.
- property pagination_status: bool #
Check if the current resource requires pagination based on its supported keyword arguments.
- Returns:
True if pagination, False otherwise.
- Return type:
bool
- preview()#
Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.
- Returns:
A DataFrame containing the metadata from the first page of results.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If the API call fails or if no data is available to preview.
- property request_url: str #
Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.
- Returns:
The constructed URL for the API request.
- Return type:
str
- resolve_query_string(**kwargs)#
Resolves the query string for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the query string.
- Returns:
The resolved query string.
- Return type:
- property resource: SupportedEndpoints#
- property results_ids: list [str ] | None #
Get a list of accessions from the retrieved metadata results, if available.
- sub_url(**kwargs)#
Constructs the sub-URL for the endpoint based on the current parameters.
- Returns:
The constructed sub-URL, or None if the endpoint module is not set.
- Return type:
str or None
- to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#
Convert the current or provided metadata to a pandas DataFrame.
- Parameters:
data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.
expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.
rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.
**kwargs – Additional keyword arguments passed to pd.DataFrame.
- Returns:
DataFrame containing the metadata.
- Return type:
pd.DataFrame | None
- Raises:
RuntimeError – If no data is available to convert.
- to_json(data=None, orient='records', lines=True, **json_kwargs)#
Convert the current metadata to a JSON string or save it to a file.
- Parameters:
- Returns:
The JSON string representation of the metadata, or None if no data is available.
- Return type:
str or None
- Raises:
RuntimeError – If no data is available to convert.
- to_list(data=None)#
Convert the current or provided metadata to a list of dictionaries.
- Parameters:
data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.
- Returns:
A list of metadata records as dictionaries, or None if no data is available.
- Return type:
- Raises:
RuntimeError – If no data is available to convert.
- to_polars(data=None, **polars_kwargs)#
Convert the current metadata to a Polars DataFrame.
- Parameters:
- Returns:
A Polars DataFrame containing the metadata.
- Return type:
pl.DataFrame
- Raises:
RuntimeError – If no data is available to convert.
- url_path(**kwargs)#
Constructs the full URL path for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the URL construction.
- Returns:
The constructed URL path.
- Return type:
- validate_endpoint_kwargs(**kwargs)#
Validates the provided keyword arguments against the supported parameters of the endpoint module.
- Parameters:
**kwargs – Keyword arguments to validate.
- Returns:
The validated keyword arguments.
- Return type:
dict of str to Any
- Raises:
ValueError – If any provided keyword argument is not supported by the endpoint module.
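The validation contract described above (return the accepted keyword arguments, raise ValueError for unsupported ones) can be sketched as follows. The helper and the parameter names in `SUPPORTED` are placeholders chosen for illustration, not the library's actual supported parameters.

```python
def sketch_validate_kwargs(supported: set[str], **kwargs) -> dict:
    """Hypothetical validator: reject keyword arguments that are not supported."""
    unsupported = sorted(set(kwargs) - supported)
    if unsupported:
        raise ValueError(f"Unsupported parameters: {', '.join(unsupported)}")
    return dict(kwargs)


SUPPORTED = {"page", "page_size", "search"}  # placeholder parameter names

print(sketch_validate_kwargs(SUPPORTED, page=1, search="soil"))

try:
    sketch_validate_kwargs(SUPPORTED, colour="blue")
except ValueError as err:
    print(err)
```

Reporting every unsupported name at once, rather than stopping at the first, lets a caller fix all mistakes in one pass.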
- config: MgnipyConfig#
- exec: QueryExecutor#
- class mgnipy.V2.Genomes(*, params=None, config=None, **kwargs)[source]#
Bases:
MGnifyList
- async acollect_details(*, fetch=True, by_id=False, concurrency=None, hide_progress=False)#
- async afirst()#
Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- async aget(*args, **kwargs)#
- async aget_detail(access_param, fetch=True)#
Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.
Examples
sample = await samples.aget_detail({"accession": "MGYS00001234"})
- async aiter_details(fetch=True)#
Async version of iter_details.
- async apage(*args, **kwargs)#
- collect_details(*, fetch=True, by_id=False)#
Collect child detail proxies into a list or dict.
- Parameters:
- Returns:
A list or dict of child detail proxies.
- Return type:
Example
sample_detail = samples.collect_details(fetch=True, by_id=True)
- describe_relationships()#
- dry_run(*, verbose=True)#
Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.
- Return type:
None
- Parameters:
verbose (bool )
- property emgapi_resource: str | None #
Retrieves the name of the endpoint resource based on the endpoint module.
- Returns:
The name of the endpoint resource, or None if the endpoint module is not set.
- Return type:
str or None
- explain(head=None)#
Print example URLs that would be called. Actual requests handled by client.
- Parameters:
head (int | None)
- Return type:
None
- filter(**filters)#
Update the parameters for the API call to filter results.
- Parameters:
**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.
- Returns:
A new QuerySet instance with updated parameters for filtering results.
- Return type:
- first()#
Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- get(*args, **kwargs)#
- get_detail(access_param, fetch=True)#
Get detail proxy for a specific accession/pubmed_id/catalogue_id.
- Parameters:
access_param (dict[str, str]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
resource_name (Optional[str ]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.
fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.
- Returns:
A proxy for the next resource.
- Return type:
Examples
sample = samples.get_detail({"accession": "MGYS00001234"})
- property identifier: str | None #
Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.
- Returns:
The identifier value corresponding to the resource type, or None if not available.
- Return type:
str or None
- iter_details(fetch=True)#
Lazily iterate over child detail proxies.
- Parameters:
fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.
- Returns:
An iterator that yields child detail proxies.
- Return type:
Iterator of QuerySet
Example
for sample in samples.iter_details():
    sample.get()
- list_supported_params()#
Lists supported keyword arguments for the endpoint module.
- list_urls()#
Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.
- page(*args, **kwargs)#
- page_size(n)#
Set the page size for paginated API calls.
- property pagination_status: bool #
Check if the current resource requires pagination based on its supported keyword arguments.
- Returns:
True if pagination, False otherwise.
- Return type:
bool
- preview()#
Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.
- Returns:
A DataFrame containing the metadata from the first page of results.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If the API call fails or if no data is available to preview.
- property request_url: str #
Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.
- Returns:
The constructed URL for the API request.
- Return type:
str
- resolve_query_string(**kwargs)#
Resolves the query string for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the query string.
- Returns:
The resolved query string.
- Return type:
- property resource: SupportedEndpoints#
- property results_ids: list [str ] | None #
Get a list of accessions from the retrieved metadata results, if available.
- sub_url(**kwargs)#
Constructs the sub-URL for the endpoint based on the current parameters.
- Returns:
The constructed sub-URL, or None if the endpoint module is not set.
- Return type:
str or None
- to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#
Convert the current or provided metadata to a pandas DataFrame.
- Parameters:
data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.
expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.
rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.
**kwargs – Additional keyword arguments passed to pd.DataFrame.
- Returns:
DataFrame containing the metadata.
- Return type:
pd.DataFrame | None
- Raises:
RuntimeError – If no data is available to convert.
- to_json(data=None, orient='records', lines=True, **json_kwargs)#
Convert the current metadata to a JSON string or save it to a file.
- Parameters:
- Returns:
The JSON string representation of the metadata, or None if no data is available.
- Return type:
str or None
- Raises:
RuntimeError – If no data is available to convert.
- to_list(data=None)#
Convert the current or provided metadata to a list of dictionaries.
- Parameters:
data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.
- Returns:
A list of metadata records as dictionaries, or None if no data is available.
- Return type:
- Raises:
RuntimeError – If no data is available to convert.
- to_polars(data=None, **polars_kwargs)#
Convert the current metadata to a Polars DataFrame.
- Parameters:
- Returns:
A Polars DataFrame containing the metadata.
- Return type:
pl.DataFrame
- Raises:
RuntimeError – If no data is available to convert.
- url_path(**kwargs)#
Constructs the full URL path for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the URL construction.
- Returns:
The constructed URL path.
- Return type:
- validate_endpoint_kwargs(**kwargs)#
Validates the provided keyword arguments against the supported parameters of the endpoint module.
- Parameters:
**kwargs – Keyword arguments to validate.
- Returns:
The validated keyword arguments.
- Return type:
dict of str to Any
- Raises:
ValueError – If any provided keyword argument is not supported by the endpoint module.
- config: MgnipyConfig#
- exec: QueryExecutor#
- class mgnipy.V2.Client(base_url, *, raise_on_unexpected_status=False, cookies=NOTHING, headers=NOTHING, timeout=None, verify_ssl=True, follow_redirects=False, httpx_args=NOTHING)[source]#
Bases:
object
A class for keeping track of data related to the API
The following are accepted as keyword arguments and will be used to construct httpx Clients internally:
base_url: The base URL for the API; all requests are made to paths relative to this URL.
cookies: A dictionary of cookies to be sent with every request.
headers: A dictionary of headers to be sent with every request.
timeout: The maximum amount of time a request can take. API functions will raise httpx.TimeoutException if this is exceeded.
verify_ssl: Whether or not to verify the SSL certificate of the API server. This should be True in production, but can be set to False for testing purposes.
follow_redirects: Whether or not to follow redirects. Default value is False.
httpx_args: A dictionary of additional arguments to be passed to the httpx.Client and httpx.AsyncClient constructor.
- Parameters:
- raise_on_unexpected_status#
Whether or not to raise an errors.UnexpectedStatus if the API returns a status code that was not documented in the source OpenAPI document. Can also be provided as a keyword argument to the constructor.
- Type:
- with_timeout(timeout)[source]#
Get a new client matching this one with a new timeout configuration
- Parameters:
timeout (Timeout)
- Return type:
- set_httpx_client(client)[source]#
Manually set the underlying httpx.Client
NOTE: This will override any other settings on the client, including cookies, headers, and timeout.
- Parameters:
client (Client)
- Return type:
- get_httpx_client()[source]#
Get the underlying httpx.Client, constructing a new one if not previously set
- Return type:
Client
- class mgnipy.V2.AuthenticatedClient(base_url, token, prefix='Bearer', auth_header_name='Authorization', *, raise_on_unexpected_status=False, cookies=NOTHING, headers=NOTHING, timeout=None, verify_ssl=True, follow_redirects=False, httpx_args=NOTHING)[source]#
Bases:
object
A Client which has been authenticated for use on secured endpoints
The following are accepted as keyword arguments and will be used to construct httpx Clients internally:
base_url: The base URL for the API; all requests are made to paths relative to this URL.
cookies: A dictionary of cookies to be sent with every request.
headers: A dictionary of headers to be sent with every request.
timeout: The maximum amount of time a request can take. API functions will raise httpx.TimeoutException if this is exceeded.
verify_ssl: Whether or not to verify the SSL certificate of the API server. This should be True in production, but can be set to False for testing purposes.
follow_redirects: Whether or not to follow redirects. Default value is False.
httpx_args: A dictionary of additional arguments to be passed to the httpx.Client and httpx.AsyncClient constructor.
- Parameters:
- raise_on_unexpected_status#
Whether or not to raise an errors.UnexpectedStatus if the API returns a status code that was not documented in the source OpenAPI document. Can also be provided as a keyword argument to the constructor.
- Type:
- with_headers(headers)[source]#
Get a new client matching this one with additional headers
- Parameters:
- Return type:
- with_cookies(cookies)[source]#
Get a new client matching this one with additional cookies
- Parameters:
- Return type:
- with_timeout(timeout)[source]#
Get a new client matching this one with a new timeout configuration
- Parameters:
timeout (Timeout)
- Return type:
- set_httpx_client(client)[source]#
Manually set the underlying httpx.Client
NOTE: This will override any other settings on the client, including cookies, headers, and timeout.
- Parameters:
client (Client)
- Return type:
- get_httpx_client()[source]#
Get the underlying httpx.Client, constructing a new one if not previously set
- Return type:
Client
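The `token`, `prefix`, and `auth_header_name` arguments of AuthenticatedClient suggest that the token is sent as a request header of the form `Authorization: Bearer <token>`. The sketch below shows that assumed combination only; `build_auth_headers` is a hypothetical helper, and the exact handling of an empty prefix is an assumption rather than documented behaviour.

```python
def build_auth_headers(
    token: str,
    prefix: str = "Bearer",
    auth_header_name: str = "Authorization",
) -> dict[str, str]:
    """Hypothetical helper showing how the three settings could combine."""
    # Assumption: an empty prefix sends the bare token as the header value.
    value = f"{prefix} {token}" if prefix else token
    return {auth_header_name: value}


print(build_auth_headers("example-token"))
print(build_auth_headers("example-token", prefix="", auth_header_name="X-API-Key"))
```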
Submodules#
mgnipy.V2.core module#
- class mgnipy.V2.core.MGnifier(resource, *, config=None, params=None, **kwargs)[source]#
Bases:
QuerySet
(Facade) MGnifier is the main user-facing class representing a queryable MGnify resource. It provides methods for fetching and navigating data from the MGnify API.
- Parameters:
- async afirst()#
Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- describe_relationships()#
- dry_run(*, verbose=True)#
Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.
- Return type:
None
- Parameters:
verbose (bool )
- property emgapi_resource: str | None #
Retrieves the name of the endpoint resource based on the endpoint module.
- Returns:
The name of the endpoint resource, or None if the endpoint module is not set.
- Return type:
str or None
- explain(head=None)#
Print example URLs that would be called. Actual requests handled by client.
- Parameters:
head (int | None)
- Return type:
None
- filter(**filters)#
Update the parameters for the API call to filter results.
- Parameters:
**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.
- Returns:
A new QuerySet instance with updated parameters for filtering results.
- Return type:
- first()#
Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- property identifier: str | None #
Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.
- Returns:
The identifier value corresponding to the resource type, or None if not available.
- Return type:
str or None
- list_supported_params()#
Lists supported keyword arguments for the endpoint module.
- list_urls()#
Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.
- page_size(n)#
Set the page size for paginated API calls.
- property pagination_status: bool #
Check if the current resource requires pagination based on its supported keyword arguments.
- Returns:
True if pagination, False otherwise.
- Return type:
bool
- preview()#
Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.
- Returns:
A DataFrame containing the metadata from the first page of results.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If the API call fails or if no data is available to preview.
- property request_url: str #
Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.
- Returns:
The constructed URL for the API request.
- Return type:
str
- resolve_query_string(**kwargs)#
Resolves the query string for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the query string.
- Returns:
The resolved query string.
- Return type:
- property resource: SupportedEndpoints#
- property results_ids: list [str ] | None #
Get a list of accessions from the retrieved metadata results, if available.
- sub_url(**kwargs)#
Constructs the sub-URL for the endpoint based on the current parameters.
- Returns:
The constructed sub-URL, or None if the endpoint module is not set.
- Return type:
str or None
- to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#
Convert the current or provided metadata to a pandas DataFrame.
- Parameters:
data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.
expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.
rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.
**kwargs – Additional keyword arguments passed to pd.DataFrame.
- Returns:
DataFrame containing the metadata.
- Return type:
pd.DataFrame | None
- Raises:
RuntimeError – If no data is available to convert.
- to_json(data=None, orient='records', lines=True, **json_kwargs)#
Convert the current metadata to a JSON string or save it to a file.
- Parameters:
- Returns:
The JSON string representation of the metadata, or None if no data is available.
- Return type:
str or None
- Raises:
RuntimeError – If no data is available to convert.
- to_list(data=None)#
Convert the current or provided metadata to a list of dictionaries.
- Parameters:
data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.
- Returns:
A list of metadata records as dictionaries, or None if no data is available.
- Return type:
- Raises:
RuntimeError – If no data is available to convert.
- to_polars(data=None, **polars_kwargs)#
Convert the current metadata to a Polars DataFrame.
- Parameters:
- Returns:
A Polars DataFrame containing the metadata.
- Return type:
pl.DataFrame
- Raises:
RuntimeError – If no data is available to convert.
- url_path(**kwargs)#
Constructs the full URL path for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the URL construction.
- Returns:
The constructed URL path.
- Return type:
- validate_endpoint_kwargs(**kwargs)#
Validates the provided keyword arguments against the supported parameters of the endpoint module.
- Parameters:
**kwargs – Keyword arguments to validate.
- Returns:
The validated keyword arguments.
- Return type:
dict of str to Any
- Raises:
ValueError – If any provided keyword argument is not supported by the endpoint module.
- config: MgnipyConfig#
- exec: QueryExecutor#
mgnipy.V2.datasets module#
- class mgnipy.V2.datasets.MGazine(accession)[source]#
Bases:
MGnifyAnalysisWithAnnotations
More of an extended data class.
- Parameters:
accession (str )
- property downloads_df: DataFrame#
returns a dataframe of all downloads with columns alias, url, file_type
- property url_list#
returns a list of all download urls
- stream_tsv(url, sep='\t', chunksize=None, max_skip=5, **pd_kwargs)[source]#
Reads a tsv file from a url and returns an iterator of pandas dataframes. Handles potential issues with extra header rows (causing pd.errors.ParserError) by trying to read the file with increasing skiprows until it succeeds or reaches max_skip.
- Parameters:
url (str ) – The url of the tsv file to stream.
sep (str , optional) – The separator used in the tsv file. Default is tab.
chunksize (int , optional) – The number of rows to include in each chunk. Default is None.
max_skip (int , optional) – The maximum number of rows to skip before raising an error. Default is 5.
pd_kwargs (dict , optional) – Additional keyword arguments to pass to pandas read_csv.
- Returns:
An iterator of pandas dataframes.
- Return type:
pd.DataFrame | pd.io.parsers.readers.TextFileReader
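The retry strategy described for stream_tsv (skip one more leading line on each failed parse, up to max_skip) can be sketched with the standard library. This is not the library's code: it uses `csv` instead of pandas, treats a row with more fields than the header as the failure condition, and `read_tsv_skipping_preamble` is a hypothetical name.

```python
import csv


def read_tsv_skipping_preamble(
    text: str, sep: str = "\t", max_skip: int = 5
) -> list[dict]:
    """Parse delimited text, retrying with more skipped lines on ragged rows."""
    lines = text.splitlines()
    for skip in range(max_skip + 1):
        rows = list(csv.reader(lines[skip:], delimiter=sep))
        if not rows:
            break
        header, body = rows[0], rows[1:]
        # Treat a row with more fields than the header as a parse failure,
        # which is roughly the situation in which pandas raises ParserError.
        if all(len(row) <= len(header) for row in body):
            return [dict(zip(header, row)) for row in body]
    raise ValueError(f"Could not parse the table after skipping {max_skip} lines")


raw = "# generated by a pipeline\nid\tcount\nA\t3\nB\t5\n"
print(read_tsv_skipping_preamble(raw))
```

Capping the retries with max_skip keeps a genuinely malformed file from being silently truncated line by line.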
- stream_html(url, **web_kwargs)[source]#
Streams an html file from a url and opens it in the default web browser.
- stream_txt(url, chunksize=None, httpx_client=None, **httpx_kwargs)[source]#
Streams a txt file from a url and returns an iterator of strings.
- Parameters:
- Returns:
An iterator of strings.
- Return type:
Generator[str , None, None]
- stream_fasta(url, **skbio_kwargs)[source]#
Streams a fasta file from a url and returns an iterator of tuples (header, sequence).
- Parameters:
url (str ) – The url of the fasta file to stream.
skbio_kwargs (dict , optional) – Additional keyword arguments to pass to the skbio parsers. https://scikit.bio/docs/latest/generated/skbio.io.format.fasta.html
- Returns:
An iterator of tuples (header, sequence).
- Return type:
- stream_gff(url, **skbio_kwargs)[source]#
Streams a gff file from a url and returns an iterator of parsed gff records.
- Parameters:
url (str ) – The url of the gff file to stream.
skbio_kwargs (dict , optional) – Additional keyword arguments to pass to the skbio parser. https://scikit.bio/docs/latest/generated/skbio.io.format.gff3.html
- Returns:
A generator of tuples (seq_id as str, skbio.metadata.IntervalMetadata).
- Return type:
Generator[skbio.io._gff3.GFF3Record, None, None]
- stream_biom(url, **skbio_kwargs)[source]#
Streams a biom file from a url and returns an iterator of parsed biom records.
- stream_gzipped(url, chunksize=None, httpx_client=None, decode=False, encoding='utf-8', errors='replace', **httpx_kwargs)[source]#
Streams a gzipped file from a url and returns a file-like object that can be read in chunks. Written using GPT-5.3-Codex. Uses httpx for streaming and zlib for decompression.
- Parameters:
url (str ) – The url of the gzipped file to stream.
chunksize (int , optional) – The size of each chunk to read from the stream.
httpx_client (httpx.Client, optional) – The httpx client to use for streaming.
decode (bool , default False) – Whether to decode the decompressed bytes to a string.
encoding (str , default "utf-8") – The encoding to use for decoding bytes to a string.
errors (str , default "replace") – The error handling strategy for decoding bytes to a string.
**httpx_kwargs (dict ) – Additional keyword arguments to pass to the httpx client.
- Returns:
A file-like object that can be read in chunks. If chunksize is None, returns the full decompressed content as bytes, or string based on decode.
- Return type:
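Chunked gzip decompression with zlib, as mentioned in the stream_gzipped description, can be sketched in isolation. The example below simulates the network stream with slices of an in-memory payload instead of httpx, so it shows only the decompression step; `gunzip_chunks` is a hypothetical helper, not the library's function.

```python
import gzip
import zlib
from typing import Iterable, Iterator


def gunzip_chunks(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Incrementally decompress gzip data that arrives as byte chunks."""
    # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
    for chunk in chunks:
        data = decompressor.decompress(chunk)
        if data:
            yield data
    # Emit anything still buffered once the input is exhausted.
    tail = decompressor.flush()
    if tail:
        yield tail


payload = gzip.compress(b"line one\nline two\n")
# Simulate a network stream by slicing the compressed payload into small pieces.
pieces = (payload[i:i + 8] for i in range(0, len(payload), 8))
text = b"".join(gunzip_chunks(pieces)).decode("utf-8", errors="replace")
print(text)
```

Decoding after joining avoids splitting a multi-byte character across chunk boundaries; a streaming decoder would need `codecs.getincrementaldecoder` for the same guarantee.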
- stream_jsonl(url, orient=None, chunksize=None, **pd_kwargs)[source]#
Streams a jsonl file from a url and returns the parsed json as a dictionary.
- Parameters:
url (str ) – The url of the json file to stream.
sep (str , optional) – The separator to use when parsing the json file. Default is “ “.
chunksize (Optional[int ], optional) – The size of the chunks to read from the stream. Default is None.
max_skip (int , optional) – The maximum number of rows to skip before raising an error. Default is 5.
**pd_kwargs (dict ) – Additional keyword arguments to pass to the pandas parser.
orient (Literal ['records', 'split', 'index', 'columns', 'values', 'table'] | None)
- Returns:
The parsed json as a dictionary.
- Return type:
- stream_json(url, chunksize=None, httpx_client=None, **httpx_kwargs)[source]#
Streams a json file from a url and returns the parsed json as a dictionary or an iterator of dictionaries if chunksize is specified.
- Parameters:
- Returns:
The parsed json as a dictionary, or an iterator of dictionaries if chunksize is specified.
- Return type:
dict | Generator
- stream_tree(url, **skbio_kwargs)[source]#
Streams a tree file from a url and returns an iterator of parsed tree records.
- stream(*, alias=None, url=None, chunksize=None, max_skip=5, **kwargs)[source]#
Streams a download based on its alias or url. If neither alias nor url is provided, streams all downloads. If chunksize is specified, the data is read lazily, chunk by chunk.
- Parameters:
alias (Optional[str ]) – The alias of the download to stream.
url (Optional[HttpUrl]) – The url of the download to stream.
chunksize (Optional[int ]) – The size of the chunks to read from the stream.
max_skip (int , optional) – The maximum number of rows to skip before raising an error. Default is 5.
**kwargs – Additional keyword arguments to pass to the streamer function.
- Returns:
A dictionary of alias: streamer_function for the requested downloads.
- Return type:
- download(to_dir, alias=None, *, url=None, filename=None, httpx_client=None, hide_progress=False)[source]#
Downloads a file from a url or alias to a specified directory.
- Parameters:
to_dir (DirectoryPath) – The directory to download the file to.
alias (Optional[str ], optional) – The alias of the file to download. If not provided, url must be provided. Default is None.
url (Optional[str ], optional) – The url of the file to download. If not provided, alias must be provided. Default is None.
filename (Optional[str ], optional) – The name to save the file as. If not provided, the alias will be used as the filename. Default is None.
httpx_client (Client | None)
hide_progress (bool )
- Raises:
ValueError – If neither alias nor url is provided, or if url is provided without a corresponding alias in the downloads.
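The alias, url, and filename rules documented for download() can be sketched as a small resolver. This is a hypothetical reading of the documented behaviour, using a plain dict of alias to url as a stand-in for the downloads attribute; it performs no file transfer.

```python
from typing import Optional


def resolve_download(
    downloads: dict[str, str],
    alias: Optional[str] = None,
    url: Optional[str] = None,
    filename: Optional[str] = None,
) -> tuple[str, str]:
    """Hypothetical resolver: return (url, filename) for a single download."""
    if alias is None and url is None:
        raise ValueError("Provide either an alias or a url.")
    if alias is None:
        # Find the alias registered for this url; unknown urls are rejected.
        alias = next((name for name, known in downloads.items() if known == url), None)
        if alias is None:
            raise ValueError(f"No alias is registered for url: {url}")
    elif alias not in downloads:
        raise ValueError(f"Unknown alias: {alias}")
    # Fall back to the alias when no explicit filename is given.
    return downloads[alias], filename or alias


downloads = {"summary": "https://example.org/files/summary.tsv"}  # placeholder data
print(resolve_download(downloads, alias="summary"))
print(resolve_download(downloads, url="https://example.org/files/summary.tsv", filename="s.tsv"))
```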
- async adownload(to_dir, alias=None, *, url=None, filename=None, httpx_aclient=None, hide_progress=False)[source]#
Asynchronously downloads a file from a url or alias to a specified directory.
- Parameters:
to_dir (DirectoryPath) – The directory to download the file to.
alias (Optional[str ], optional) – The alias of the file to download. If not provided, url must be provided. Default is None.
url (Optional[str ], optional) – The url of the file to download. If not provided, alias must be provided. Default is None.
filename (Optional[str ], optional) – The name to save the file as. If not provided, the alias will be used as the filename. Default is None. Note that if url is provided without a corresponding alias in the downloads, filename must be provided since there is no alias to use as the filename.
httpx_aclient (Optional[httpx.AsyncClient], optional) – An optional httpx.AsyncClient to use for the download. If not provided, a new client will be created using the mgnifier helper. Default is None.
hide_progress (bool )
- async adownload_all(to_dir, hide_progress=False)[source]#
Asynchronously downloads all files in the downloads to a specified directory.
- Parameters:
to_dir (DirectoryPath) – The directory to download the files to.
hide_progress (bool , optional) – Whether to hide the progress bars. Default is False.
Note
This method will use the adownload method for each file, so it will respect the same parameters and behavior for handling aliases, urls, filenames, and httpx clients. If you want to customize those parameters for each file, you can call adownload directly for each file instead of using this method.
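Running one download coroutine per file and awaiting them together is a common way to implement a method like this. Whether adownload_all actually schedules its downloads concurrently is an assumption; the sketch uses stand-in coroutines (`fake_adownload`, `fake_adownload_all`) that perform no network I/O.

```python
import asyncio


async def fake_adownload(alias: str, to_dir: str) -> str:
    """Stand-in for a per-file download coroutine; performs no network I/O."""
    await asyncio.sleep(0)  # yield control, as a real transfer would while waiting
    return f"{to_dir}/{alias}"


async def fake_adownload_all(aliases: list[str], to_dir: str) -> list[str]:
    # Schedule one coroutine per file and wait for all of them together.
    return await asyncio.gather(*(fake_adownload(a, to_dir) for a in aliases))


saved = asyncio.run(fake_adownload_all(["summary", "taxonomy"], "out"))
print(saved)
```

`asyncio.gather` returns results in the order the coroutines were passed, regardless of which finishes first.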
- download_all(to_dir, hide_progress=False)[source]#
TODO: fix. Downloads all files in the downloads to a specified directory.
- Parameters:
to_dir (DirectoryPath) – The directory to download the files to.
hide_progress (bool, optional) – Whether to hide the progress bars. Default is False.
Note
This method will use the download method for each file, so it will respect the same parameters and behavior for handling aliases, urls, filenames, and httpx clients. If you want to customize those parameters for each file, you can call download directly for each file instead of using this method.
- experiment_type#
- study_accession#
- accession#
- run#
- sample#
- assembly#
- pipeline_version#
- read_run#
- quality_control_summary#
- annotations#
- downloads#
- results_dir#
- metadata#
- additional_properties#
mgnipy.V2.endpoints module#
mgnipy.V2.mixins module#
- class mgnipy.V2.mixins.ResultsHandlerMixin[source]#
Bases:
object
- to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)[source]#
Convert the current or provided metadata to a pandas DataFrame.
- Parameters:
data (list of dict, optional) – List of records to convert. If None, uses self._results or self._previewed_page.
expand_nested_dicts (list of str, optional) – List of keys to expand into separate columns.
rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.
**kwargs – Additional keyword arguments passed to pd.DataFrame.
- Returns:
DataFrame containing the metadata.
- Return type:
pd.DataFrame | None
- Raises:
RuntimeError – If no data is available to convert.
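To illustrate what expanding nested dictionaries into separate columns can look like, here is a pure-Python sketch that flattens selected keys of one record. expand_keys is a hypothetical helper, and the "parent.child" column naming is an assumption, not necessarily what to_df produces.

```python
def expand_keys(record: dict, keys: list) -> dict:
    """Copy record, replacing each nested dict listed in keys with flat columns."""
    flat = {}
    for name, value in record.items():
        if name in keys and isinstance(value, dict):
            # One new column per nested key, prefixed with the parent name.
            for sub_name, sub_value in value.items():
                flat[f"{name}.{sub_name}"] = sub_value
        else:
            flat[name] = value
    return flat
```

A list of records flattened this way can be passed to pd.DataFrame directly, since every value is then a scalar rather than a dict.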
- to_list(data=None)[source]#
Convert the current or provided metadata to a list of dictionaries.
- Parameters:
data (dict of int to list of dict, optional) – The paginated data to convert. If None, uses self.data.
- Returns:
A list of metadata records as dictionaries, or None if no data is available.
- Return type:
list of dict or None
- Raises:
RuntimeError – If no data is available to convert.
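The conversion from paginated data (page number to list of records) into one flat list can be sketched as follows. pages_to_list is a hypothetical stand-in, and visiting pages in ascending order is an assumption.

```python
def pages_to_list(pages: dict) -> list:
    """Concatenate the records of every page, in ascending page order."""
    if not pages:
        raise RuntimeError("No data available to convert.")
    return [record for page in sorted(pages) for record in pages[page]]
```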
- to_json(data=None, orient='records', lines=True, **json_kwargs)[source]#
Convert the current metadata to a JSON string or save it to a file.
- Parameters:
- Returns:
The JSON string representation of the metadata, or None if no data is available.
- Return type:
str or None
- Raises:
RuntimeError – If no data is available to convert.
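The defaults orient='records' and lines=True suggest line-delimited JSON, with one record per line. The sketch below shows that layout using only the standard library. records_to_json is a hypothetical helper, and the real method may delegate to pandas and differ in details such as a trailing newline.

```python
import json


def records_to_json(records: list, lines: bool = True) -> str:
    """Serialize records as JSON Lines (one object per line) or as one JSON array."""
    if lines:
        return "\n".join(json.dumps(record) for record in records)
    return json.dumps(records)
```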
- to_polars(data=None, **polars_kwargs)[source]#
Convert the current metadata to a Polars DataFrame.
- Parameters:
- Returns:
A Polars DataFrame containing the metadata.
- Return type:
pl.DataFrame
- Raises:
RuntimeError – If no data is available to convert.
- class mgnipy.V2.mixins.BiomesTreeMixin[source]#
Bases:
object
- property tree: Tree#
Convert the biomes metadata to a tree structure for visualization or analysis.
- Returns:
A tree representation of the biomes and their relationships.
- Return type:
Tree
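As a rough picture of how biome metadata can become a tree, the sketch below nests lineage strings into dictionaries. It assumes colon-separated lineages such as "root:Host-associated:Human". build_tree is a hypothetical helper and the nested dict only stands in for the Tree type.

```python
def build_tree(lineages: list) -> dict:
    """Nest colon-separated lineages into a dict of dicts, one level per part."""
    tree = {}
    for lineage in lineages:
        node = tree
        for part in lineage.split(":"):
            # Reuse the child if it exists, otherwise create an empty one.
            node = node.setdefault(part, {})
    return tree
```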
- class mgnipy.V2.mixins.DescribeEmgapiMixin[source]#
Bases:
object
- validate_endpoint_kwargs(**kwargs)[source]#
Validates the provided keyword arguments against the supported parameters of the endpoint module.
- Parameters:
**kwargs – Keyword arguments to validate.
- Returns:
The validated keyword arguments.
- Return type:
dict of str to Any
- Raises:
ValueError – If any provided keyword argument is not supported by the endpoint module.
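The validation step can be pictured as a set comparison between the supplied keyword arguments and the parameters an endpoint supports. validate_kwargs and the supported argument are hypothetical. The real method reads the supported names from the endpoint module.

```python
def validate_kwargs(supported: set, **kwargs) -> dict:
    """Return kwargs unchanged, or raise if any name is not in supported."""
    unknown = sorted(set(kwargs) - set(supported))
    if unknown:
        raise ValueError(f"Unsupported parameters: {unknown}")
    return dict(kwargs)
```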
- property emgapi_resource: str | None #
Retrieves the name of the endpoint resource based on the endpoint module.
- Returns:
The name of the endpoint resource, or None if the endpoint module is not set.
- Return type:
str or None
- sub_url(**kwargs)[source]#
Constructs the sub-URL for the endpoint based on the current parameters.
- Returns:
The constructed sub-URL, or None if the endpoint module is not set.
- Return type:
str or None
- resolve_query_string(**kwargs)[source]#
Resolves the query string for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the query string.
- Returns:
The resolved query string.
- Return type:
mgnipy.V2.proxies module#
- class mgnipy.V2.proxies.MGnifyList(*, config=None, params=None, **kwargs)[source]#
Bases:
MGnifier
- RESOURCE: ClassVar[Literal['biomes', 'studies', 'samples', 'runs', 'analyses', 'genomes', 'assemblies', 'publications', 'catalogues', 'private_studies'] | None] = None#
- iter_details(fetch=True)[source]#
Lazily iterate over child detail proxies.
- Parameters:
fetch (bool) – Whether to immediately fetch each detail after creating the proxy.
- Returns:
An iterator that yields child detail proxies.
- Return type:
Iterator of QuerySet
Example
for sample in samples.iter_details():
    sample.get()
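Lazy iteration means each detail proxy is created (and optionally fetched) only when the loop asks for it. The generator below sketches that behaviour with a stand-in class. FakeDetail and this iter_details function are hypothetical and do not call the MGnify API.

```python
class FakeDetail:
    """Stand-in for a detail proxy that records whether it was fetched."""

    def __init__(self, accession: str):
        self.accession = accession
        self.fetched = False

    def get(self) -> "FakeDetail":
        self.fetched = True  # a real proxy would request the detail here
        return self


def iter_details(accessions, fetch: bool = True):
    """Yield one detail at a time; nothing is created until it is requested."""
    for accession in accessions:
        detail = FakeDetail(accession)
        if fetch:
            detail.get()
        yield detail
```

Stopping the loop early therefore avoids creating or fetching the remaining details.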
- collect_details(*, fetch=True, by_id=False)[source]#
Collect child detail proxies into a list or dict.
- Parameters:
- Returns:
A list or dict of child detail proxies.
- Return type:
Example
sample_detail = samples.collect_details(fetch=True, by_id=True)
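The by_id flag switches the return shape between a list and a dict keyed by identifier. A sketch with plain dictionaries standing in for detail proxies follows. This collect_details function is hypothetical, and using the accession as the key is an assumption.

```python
def collect_details(accessions, by_id: bool = False):
    """Return detail stand-ins as a list, or as a dict keyed by accession."""
    details = [{"accession": accession} for accession in accessions]
    if by_id:
        return {detail["accession"]: detail for detail in details}
    return details
```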
- get_detail(access_param, fetch=True)[source]#
Get detail proxy for a specific accession/pubmed_id/catalogue_id.
- Parameters:
access_param (dict[str, str]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
resource_name (Optional[str]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.
fetch (bool) – Whether to immediately fetch the detail after creating the proxy.
- Returns:
A proxy for the next resource.
- Return type:
Examples
sample = samples.get_detail({"accession": "MGYS00001234"})
- async aget_detail(access_param, fetch=True)[source]#
Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.
Examples
sample = await samples.aget_detail({"accession": "MGYS00001234"})
- async afirst()#
Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- async aget(*args, **kwargs)#
- async apage(*args, **kwargs)#
- describe_relationships()#
- dry_run(*, verbose=True)#
Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.
- Return type:
None
- Parameters:
verbose (bool)
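The page estimate in such a plan is a ceiling division of the record count by the page size. plan_pages is a hypothetical helper shown only to make the arithmetic concrete. How dry_run obtains the record count is not described here.

```python
import math


def plan_pages(total_records: int, page_size: int) -> int:
    """Number of pages needed to retrieve total_records at page_size per page."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return math.ceil(total_records / page_size)
```

For example, 101 records at 25 per page need 5 requests, because the last page holds a single record.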
- property emgapi_resource: str | None #
Retrieves the name of the endpoint resource based on the endpoint module.
- Returns:
The name of the endpoint resource, or None if the endpoint module is not set.
- Return type:
str or None
- explain(head=None)#
Print example URLs that would be called. Actual requests handled by client.
- Parameters:
head (int | None)
- Return type:
None
- filter(**filters)#
Update the parameters for the API call to filter results.
- Parameters:
**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.
- Returns:
A new QuerySet instance with updated parameters for filtering results.
- Return type:
- first()#
Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- get(*args, **kwargs)#
- property identifier: str | None #
Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.
- Returns:
The identifier value corresponding to the resource type, or None if not available.
- Return type:
str or None
- list_supported_params()#
Lists supported keyword arguments for the endpoint module.
- list_urls()#
Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.
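One way to picture the result is a list with one URL per page, each carrying the same filters plus a page number. The sketch below uses a placeholder base URL. list_page_urls, the URL layout, and the page parameter name are assumptions, not the library's actual format.

```python
from urllib.parse import urlencode


def list_page_urls(base_url: str, resource: str, n_pages: int, **params) -> list:
    """Build one request URL per page, appending the page number to the query."""
    urls = []
    for page in range(1, n_pages + 1):
        query = urlencode({**params, "page": page})
        urls.append(f"{base_url}/{resource}/?{query}")
    return urls
```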
- page(*args, **kwargs)#
- page_size(n)#
Set the page size for paginated API calls.
- property pagination_status: bool #
Check if the current resource requires pagination based on its supported keyword arguments.
- Returns:
True if the resource is paginated, False otherwise.
- Return type:
bool
- preview()#
Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.
- Returns:
A DataFrame containing the metadata from the first page of results.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If the API call fails or if no data is available to preview.
- property request_url: str #
Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.
- Returns:
The constructed URL for the API request.
- Return type:
str
- resolve_query_string(**kwargs)#
Resolves the query string for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the query string.
- Returns:
The resolved query string.
- Return type:
- property resource: SupportedEndpoints#
- property results_ids: list [str ] | None #
Get a list of accessions from the retrieved metadata results, if available.
- sub_url(**kwargs)#
Constructs the sub-URL for the endpoint based on the current parameters.
- Returns:
The constructed sub-URL, or None if the endpoint module is not set.
- Return type:
str or None
- to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#
Convert the current or provided metadata to a pandas DataFrame.
- Parameters:
data (list of dict, optional) – List of records to convert. If None, uses self._results or self._previewed_page.
expand_nested_dicts (list of str, optional) – List of keys to expand into separate columns.
rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.
**kwargs – Additional keyword arguments passed to pd.DataFrame.
- Returns:
DataFrame containing the metadata.
- Return type:
pd.DataFrame | None
- Raises:
RuntimeError – If no data is available to convert.
- to_json(data=None, orient='records', lines=True, **json_kwargs)#
Convert the current metadata to a JSON string or save it to a file.
- Parameters:
- Returns:
The JSON string representation of the metadata, or None if no data is available.
- Return type:
str or None
- Raises:
RuntimeError – If no data is available to convert.
- to_list(data=None)#
Convert the current or provided metadata to a list of dictionaries.
- Parameters:
data (dict of int to list of dict, optional) – The paginated data to convert. If None, uses self.data.
- Returns:
A list of metadata records as dictionaries, or None if no data is available.
- Return type:
list of dict or None
- Raises:
RuntimeError – If no data is available to convert.
- to_polars(data=None, **polars_kwargs)#
Convert the current metadata to a Polars DataFrame.
- Parameters:
- Returns:
A Polars DataFrame containing the metadata.
- Return type:
pl.DataFrame
- Raises:
RuntimeError – If no data is available to convert.
- url_path(**kwargs)#
Constructs the full URL path for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the URL construction.
- Returns:
The constructed URL path.
- Return type:
- validate_endpoint_kwargs(**kwargs)#
Validates the provided keyword arguments against the supported parameters of the endpoint module.
- Parameters:
**kwargs – Keyword arguments to validate.
- Returns:
The validated keyword arguments.
- Return type:
dict of str to Any
- Raises:
ValueError – If any provided keyword argument is not supported by the endpoint module.
- config: MgnipyConfig#
- exec: QueryExecutor#
- class mgnipy.V2.proxies.MGnifyDetail(id, config=None, **kwargs)[source]#
Bases:
MGnifier
- Parameters:
id (str)
config (MgnipyConfig)
- RESOURCE: ClassVar[Literal['biome', 'study', 'sample', 'run', 'analysis', 'genome', 'assembly', 'publication', 'catalogue'] | None] = None#
- get_list(resource, access_param, fetch=True, explain=False)[source]#
Get list proxy for a specific accession/pubmed_id/catalogue_id detail.
- Parameters:
resource (str) – Valid child resource name, e.g. one listed in list_relationships(), such as "samples" for a study detail, or "analyses" for a run detail.
access_param (dict[str, str]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
fetch (bool) – Whether to immediately fetch the list after creating the proxy.
explain (bool) – Whether to print example URLs that would be called.
- Returns:
A proxy for the next resource.
- Return type:
Examples
samples = study.get_list("samples", {"accession": "MGYS00001234"})
- async aget_list(resource, access_param, fetch=True, explain=False)[source]#
Async version of get_list. Get list proxy for a specific accession/pubmed_id/catalogue_id detail.
- Parameters:
resource (str) – Valid child resource name, e.g. one listed in list_relationships(), such as "samples" for a study detail, or "analyses" for a run detail.
access_param (dict[str, str]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
fetch (bool) – Whether to immediately fetch the list after creating the proxy.
explain (bool) – Whether to print example URLs that would be called.
- Returns:
A proxy for the next resource.
- Return type:
Examples
samples = await study.aget_list("samples", {"accession": "MGYS00001234"})
- async afirst()#
Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- async aget(*args, **kwargs)#
- async apage(*args, **kwargs)#
- describe_relationships()#
- dry_run(*, verbose=True)#
Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.
- Return type:
None
- Parameters:
verbose (bool)
- property emgapi_resource: str | None #
Retrieves the name of the endpoint resource based on the endpoint module.
- Returns:
The name of the endpoint resource, or None if the endpoint module is not set.
- Return type:
str or None
- explain(head=None)#
Print example URLs that would be called. Actual requests handled by client.
- Parameters:
head (int | None)
- Return type:
None
- filter(**filters)#
Update the parameters for the API call to filter results.
- Parameters:
**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.
- Returns:
A new QuerySet instance with updated parameters for filtering results.
- Return type:
- first()#
Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- get(*args, **kwargs)#
- property identifier: str | None #
Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.
- Returns:
The identifier value corresponding to the resource type, or None if not available.
- Return type:
str or None
- list_supported_params()#
Lists supported keyword arguments for the endpoint module.
- list_urls()#
Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.
- page(*args, **kwargs)#
- page_size(n)#
Set the page size for paginated API calls.
- property pagination_status: bool #
Check if the current resource requires pagination based on its supported keyword arguments.
- Returns:
True if the resource is paginated, False otherwise.
- Return type:
bool
- preview()#
Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.
- Returns:
A DataFrame containing the metadata from the first page of results.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If the API call fails or if no data is available to preview.
- property request_url: str #
Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.
- Returns:
The constructed URL for the API request.
- Return type:
str
- resolve_query_string(**kwargs)#
Resolves the query string for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the query string.
- Returns:
The resolved query string.
- Return type:
- property resource: SupportedEndpoints#
- property results_ids: list [str ] | None #
Get a list of accessions from the retrieved metadata results, if available.
- sub_url(**kwargs)#
Constructs the sub-URL for the endpoint based on the current parameters.
- Returns:
The constructed sub-URL, or None if the endpoint module is not set.
- Return type:
str or None
- to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#
Convert the current or provided metadata to a pandas DataFrame.
- Parameters:
data (list of dict, optional) – List of records to convert. If None, uses self._results or self._previewed_page.
expand_nested_dicts (list of str, optional) – List of keys to expand into separate columns.
rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.
**kwargs – Additional keyword arguments passed to pd.DataFrame.
- Returns:
DataFrame containing the metadata.
- Return type:
pd.DataFrame | None
- Raises:
RuntimeError – If no data is available to convert.
- to_json(data=None, orient='records', lines=True, **json_kwargs)#
Convert the current metadata to a JSON string or save it to a file.
- Parameters:
- Returns:
The JSON string representation of the metadata, or None if no data is available.
- Return type:
str or None
- Raises:
RuntimeError – If no data is available to convert.
- to_list(data=None)#
Convert the current or provided metadata to a list of dictionaries.
- Parameters:
data (dict of int to list of dict, optional) – The paginated data to convert. If None, uses self.data.
- Returns:
A list of metadata records as dictionaries, or None if no data is available.
- Return type:
list of dict or None
- Raises:
RuntimeError – If no data is available to convert.
- to_polars(data=None, **polars_kwargs)#
Convert the current metadata to a Polars DataFrame.
- Parameters:
- Returns:
A Polars DataFrame containing the metadata.
- Return type:
pl.DataFrame
- Raises:
RuntimeError – If no data is available to convert.
- url_path(**kwargs)#
Constructs the full URL path for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the URL construction.
- Returns:
The constructed URL path.
- Return type:
- validate_endpoint_kwargs(**kwargs)#
Validates the provided keyword arguments against the supported parameters of the endpoint module.
- Parameters:
**kwargs – Keyword arguments to validate.
- Returns:
The validated keyword arguments.
- Return type:
dict of str to Any
- Raises:
ValueError – If any provided keyword argument is not supported by the endpoint module.
- config: MgnipyConfig#
- exec: QueryExecutor#
- class mgnipy.V2.proxies.Analyses(*, params=None, config=None, **kwargs)[source]#
Bases:
MGnifyList
- async acollect_details(*, fetch=True, by_id=False, concurrency=None, hide_progress=False)#
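The concurrency parameter suggests that detail requests can be capped at a fixed number in flight. A common way to do this with asyncio is a semaphore, sketched below with a stand-in coroutine. fetch_detail and collect_with_limit are hypothetical, and the meaning of concurrency=None in the real method is not stated here.

```python
import asyncio


async def fetch_detail(accession: str) -> dict:
    """Stand-in for one detail request."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"accession": accession}


async def collect_with_limit(accessions, concurrency: int = 2) -> list:
    """Fetch all details while allowing at most `concurrency` at a time."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(accession: str) -> dict:
        async with semaphore:  # waits here when the limit is reached
            return await fetch_detail(accession)

    # gather preserves the input order of the accessions.
    return await asyncio.gather(*(bounded(a) for a in accessions))


results = asyncio.run(collect_with_limit(["A", "B", "C"]))
```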
- async afirst()#
Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- async aget(*args, **kwargs)#
- async aget_detail(access_param, fetch=True)#
Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.
Examples
sample = await samples.aget_detail({"accession": "MGYS00001234"})
- async aiter_details(fetch=True)#
Async version of iter_details.
- async apage(*args, **kwargs)#
- collect_details(*, fetch=True, by_id=False)#
Collect child detail proxies into a list or dict.
- Parameters:
- Returns:
A list or dict of child detail proxies.
- Return type:
Example
sample_detail = samples.collect_details(fetch=True, by_id=True)
- describe_relationships()#
- dry_run(*, verbose=True)#
Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.
- Return type:
None
- Parameters:
verbose (bool)
- property emgapi_resource: str | None #
Retrieves the name of the endpoint resource based on the endpoint module.
- Returns:
The name of the endpoint resource, or None if the endpoint module is not set.
- Return type:
str or None
- explain(head=None)#
Print example URLs that would be called. Actual requests handled by client.
- Parameters:
head (int | None)
- Return type:
None
- filter(**filters)#
Update the parameters for the API call to filter results.
- Parameters:
**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.
- Returns:
A new QuerySet instance with updated parameters for filtering results.
- Return type:
- first()#
Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- get(*args, **kwargs)#
- get_detail(access_param, fetch=True)#
Get detail proxy for a specific accession/pubmed_id/catalogue_id.
- Parameters:
access_param (dict[str, str]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
resource_name (Optional[str]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.
fetch (bool) – Whether to immediately fetch the detail after creating the proxy.
- Returns:
A proxy for the next resource.
- Return type:
Examples
sample = samples.get_detail({"accession": "MGYS00001234"})
- property identifier: str | None #
Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.
- Returns:
The identifier value corresponding to the resource type, or None if not available.
- Return type:
str or None
- iter_details(fetch=True)#
Lazily iterate over child detail proxies.
- Parameters:
fetch (bool) – Whether to immediately fetch each detail after creating the proxy.
- Returns:
An iterator that yields child detail proxies.
- Return type:
Iterator of QuerySet
Example
for sample in samples.iter_details():
    sample.get()
- list_supported_params()#
Lists supported keyword arguments for the endpoint module.
- list_urls()#
Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.
- page(*args, **kwargs)#
- page_size(n)#
Set the page size for paginated API calls.
- property pagination_status: bool #
Check if the current resource requires pagination based on its supported keyword arguments.
- Returns:
True if the resource is paginated, False otherwise.
- Return type:
bool
- preview()#
Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.
- Returns:
A DataFrame containing the metadata from the first page of results.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If the API call fails or if no data is available to preview.
- property request_url: str #
Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.
- Returns:
The constructed URL for the API request.
- Return type:
str
- resolve_query_string(**kwargs)#
Resolves the query string for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the query string.
- Returns:
The resolved query string.
- Return type:
- property resource: SupportedEndpoints#
- property results_ids: list [str ] | None #
Get a list of accessions from the retrieved metadata results, if available.
- sub_url(**kwargs)#
Constructs the sub-URL for the endpoint based on the current parameters.
- Returns:
The constructed sub-URL, or None if the endpoint module is not set.
- Return type:
str or None
- to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#
Convert the current or provided metadata to a pandas DataFrame.
- Parameters:
data (list of dict, optional) – List of records to convert. If None, uses self._results or self._previewed_page.
expand_nested_dicts (list of str, optional) – List of keys to expand into separate columns.
rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.
**kwargs – Additional keyword arguments passed to pd.DataFrame.
- Returns:
DataFrame containing the metadata.
- Return type:
pd.DataFrame | None
- Raises:
RuntimeError – If no data is available to convert.
- to_json(data=None, orient='records', lines=True, **json_kwargs)#
Convert the current metadata to a JSON string or save it to a file.
- Parameters:
- Returns:
The JSON string representation of the metadata, or None if no data is available.
- Return type:
str or None
- Raises:
RuntimeError – If no data is available to convert.
- to_list(data=None)#
Convert the current or provided metadata to a list of dictionaries.
- Parameters:
data (dict of int to list of dict, optional) – The paginated data to convert. If None, uses self.data.
- Returns:
A list of metadata records as dictionaries, or None if no data is available.
- Return type:
list of dict or None
- Raises:
RuntimeError – If no data is available to convert.
- to_polars(data=None, **polars_kwargs)#
Convert the current metadata to a Polars DataFrame.
- Parameters:
- Returns:
A Polars DataFrame containing the metadata.
- Return type:
pl.DataFrame
- Raises:
RuntimeError – If no data is available to convert.
- url_path(**kwargs)#
Constructs the full URL path for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the URL construction.
- Returns:
The constructed URL path.
- Return type:
- validate_endpoint_kwargs(**kwargs)#
Validates the provided keyword arguments against the supported parameters of the endpoint module.
- Parameters:
**kwargs – Keyword arguments to validate.
- Returns:
The validated keyword arguments.
- Return type:
dict of str to Any
- Raises:
ValueError – If any provided keyword argument is not supported by the endpoint module.
- config: MgnipyConfig#
- exec: QueryExecutor#
- class mgnipy.V2.proxies.Runs(*, params=None, config=None, **kwargs)[source]#
Bases:
MGnifyList
- async acollect_details(*, fetch=True, by_id=False, concurrency=None, hide_progress=False)#
- async afirst()#
Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- async aget(*args, **kwargs)#
- async aget_detail(access_param, fetch=True)#
Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.
Examples
sample = await samples.aget_detail({"accession": "MGYS00001234"})
- async aiter_details(fetch=True)#
Async version of iter_details.
- async apage(*args, **kwargs)#
- collect_details(*, fetch=True, by_id=False)#
Collect child detail proxies into a list or dict.
- Parameters:
- Returns:
A list or dict of child detail proxies.
- Return type:
Example
sample_detail = samples.collect_details(fetch=True, by_id=True)
- describe_relationships()#
- dry_run(*, verbose=True)#
Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.
- Return type:
None
- Parameters:
verbose (bool)
- property emgapi_resource: str | None #
Retrieves the name of the endpoint resource based on the endpoint module.
- Returns:
The name of the endpoint resource, or None if the endpoint module is not set.
- Return type:
str or None
- explain(head=None)#
Print example URLs that would be called. Actual requests handled by client.
- Parameters:
head (int | None)
- Return type:
None
- filter(**filters)#
Update the parameters for the API call to filter results.
- Parameters:
**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.
- Returns:
A new QuerySet instance with updated parameters for filtering results.
- Return type:
- first()#
Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- get(*args, **kwargs)#
- get_detail(access_param, fetch=True)#
Get detail proxy for a specific accession/pubmed_id/catalogue_id.
- Parameters:
access_param (dict[str, str]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
resource_name (Optional[str]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.
fetch (bool) – Whether to immediately fetch the detail after creating the proxy.
- Returns:
A proxy for the next resource.
- Return type:
Examples
sample = samples.get_detail({"accession": "MGYS00001234"})
- property identifier: str | None #
Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.
- Returns:
The identifier value corresponding to the resource type, or None if not available.
- Return type:
str or None
- iter_details(fetch=True)#
Lazily iterate over child detail proxies.
- Parameters:
fetch (bool) – Whether to immediately fetch each detail after creating the proxy.
- Returns:
An iterator that yields child detail proxies.
- Return type:
Iterator of QuerySet
Example
for sample in samples.iter_details():
    sample.get()
- list_supported_params()#
Lists supported keyword arguments for the endpoint module.
- list_urls()#
Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.
- page(*args, **kwargs)#
- page_size(n)#
Set the page size for paginated API calls.
- property pagination_status: bool #
Check if the current resource requires pagination based on its supported keyword arguments.
- Returns:
True if the resource is paginated, False otherwise.
- Return type:
bool
- preview()#
Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.
- Returns:
A DataFrame containing the metadata from the first page of results.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If the API call fails or if no data is available to preview.
- property request_url: str #
Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.
- Returns:
The constructed URL for the API request.
- Return type:
- resolve_query_string(**kwargs)#
Resolves the query string for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the query string.
- Returns:
The resolved query string.
- Return type:
- property resource: SupportedEndpoints#
- property results_ids: list [str ] | None #
Get a list of accessions from the retrieved metadata results, if available.
- sub_url(**kwargs)#
Constructs the sub-URL for the endpoint based on the current parameters.
- Returns:
The constructed sub-URL, or None if the endpoint module is not set.
- Return type:
str or None
- to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#
Convert the current or provided metadata to a pandas DataFrame.
- Parameters:
data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.
expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.
rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.
**kwargs – Additional keyword arguments passed to pd.DataFrame.
- Returns:
DataFrame containing the metadata.
- Return type:
pd.DataFrame | None
- Raises:
RuntimeError – If no data is available to convert.
- to_json(data=None, orient='records', lines=True, **json_kwargs)#
Convert the current metadata to a JSON string or save it to a file.
- Parameters:
- Returns:
The JSON string representation of the metadata, or None if no data is available.
- Return type:
str or None
- Raises:
RuntimeError – If no data is available to convert.
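The defaults `orient='records'` and `lines=True` share their names with options of pandas' `DataFrame.to_json`, where that combination produces line-delimited JSON (one object per record, one record per line). Whether mgnipy delegates to pandas here is an assumption; the sketch below uses only the standard library to show that output shape with made-up records.

```python
import json

records = [
    {"accession": "ACC_A", "biome": "root"},
    {"accession": "ACC_B", "biome": "root"},
]

# One JSON object per record, one record per line.
json_lines = "\n".join(json.dumps(record) for record in records)

# Reading it back: parse each non-empty line independently.
round_trip = [json.loads(line) for line in json_lines.splitlines() if line]
```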
- to_list(data=None)#
Convert the current or provided metadata to a list of dictionaries.
- Parameters:
data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.
- Returns:
A list of metadata records as dictionaries, or None if no data is available.
- Return type:
- Raises:
RuntimeError – If no data is available to convert.
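The conversion from paginated data (dict of int to list of dict) to a flat list can be sketched as follows. `to_list_sketch` is a hypothetical helper written for this example, and ordering by page number is an assumption rather than documented behaviour.

```python
from typing import Dict, List, Optional


def to_list_sketch(pages: Optional[Dict[int, List[dict]]]) -> List[dict]:
    """Flatten {page_number: [records]} into one list, in page order."""
    if not pages:
        # Mirrors the documented RuntimeError when nothing can be converted.
        raise RuntimeError("No data available to convert.")
    flat: List[dict] = []
    for page_number in sorted(pages):
        flat.extend(pages[page_number])
    return flat


pages = {
    2: [{"accession": "ACC_C"}],
    1: [{"accession": "ACC_A"}, {"accession": "ACC_B"}],
}
flat_records = to_list_sketch(pages)
```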
- to_polars(data=None, **polars_kwargs)#
Convert the current metadata to a Polars DataFrame.
- Parameters:
- Returns:
A Polars DataFrame containing the metadata.
- Return type:
pl.DataFrame
- Raises:
RuntimeError – If no data is available to convert.
- url_path(**kwargs)#
Constructs the full URL path for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the URL construction.
- Returns:
The constructed URL path.
- Return type:
- validate_endpoint_kwargs(**kwargs)#
Validates the provided keyword arguments against the supported parameters of the endpoint module.
- Parameters:
**kwargs – Keyword arguments to validate.
- Returns:
The validated keyword arguments.
- Return type:
dict of str to Any
- Raises:
ValueError – If any provided keyword argument is not supported by the endpoint module.
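A minimal sketch of this kind of validation, assuming the supported parameters are available as a collection of names. `validate_kwargs_sketch` and the parameter names in `SUPPORTED` are hypothetical and only illustrate the documented contract (return the validated arguments, raise ValueError on an unsupported one).

```python
from typing import Any, Dict, Iterable


def validate_kwargs_sketch(supported: Iterable[str], **kwargs: Any) -> Dict[str, Any]:
    """Return kwargs unchanged if every key is supported, else raise ValueError."""
    supported_set = set(supported)
    unsupported = sorted(set(kwargs) - supported_set)
    if unsupported:
        raise ValueError(
            f"Unsupported parameter(s): {unsupported}. "
            f"Supported: {sorted(supported_set)}"
        )
    return dict(kwargs)


# Hypothetical parameter names for illustration.
SUPPORTED = ("biome_lineage", "page", "page_size")
validated = validate_kwargs_sketch(SUPPORTED, biome_lineage="root", page=1)
```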
- config: MgnipyConfig#
- exec: QueryExecutor#
- class mgnipy.V2.proxies.Samples(*, params=None, config=None, **kwargs)[source]#
Bases:
MGnifyList
- async acollect_details(*, fetch=True, by_id=False, concurrency=None, hide_progress=False)#
- async afirst()#
Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- async aget(*args, **kwargs)#
- async aget_detail(access_param, fetch=True)#
Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.
Examples
sample = await samples.aget_detail({"accession": "MGYS00001234"})
- async aiter_details(fetch=True)#
Async version of iter_details.
- async apage(*args, **kwargs)#
- collect_details(*, fetch=True, by_id=False)#
Collect child detail proxies into a list or dict.
- Parameters:
- Returns:
A list or dict of child detail proxies.
- Return type:
Example
sample_detail = samples.collect_details(fetch=True, by_id=True)
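The "list or dict" return shape can be pictured with a small stand-in. In this sketch the details are plain dictionaries and the dict form is keyed by an `accession` field; both choices are assumptions for illustration, and `collect_sketch` is not part of mgnipy.

```python
from typing import Dict, List, Union


def collect_sketch(details: List[dict], by_id: bool = False) -> Union[List[dict], Dict[str, dict]]:
    """Return the details as a list, or keyed by accession when by_id is True."""
    if by_id:
        return {detail["accession"]: detail for detail in details}
    return list(details)


details = [{"accession": "ACC_A", "n": 1}, {"accession": "ACC_B", "n": 2}]
as_list = collect_sketch(details)
as_dict = collect_sketch(details, by_id=True)
```

The keyed form is convenient when later code needs to look up one detail by its identifier instead of scanning the list.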
- describe_relationships()#
- dry_run(*, verbose=True)#
Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.
- Return type:
None
- Parameters:
verbose (bool )
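The page estimate in such a plan comes down to a ceiling division of the record count by the page size. The sketch below shows that arithmetic only; `plan_sketch` and the keys of the returned dictionary are hypothetical, and the record count would in practice come from the API.

```python
from math import ceil


def plan_sketch(total_records: int, page_size: int) -> dict:
    """Estimate how many page requests a full retrieval would need."""
    if page_size <= 0:
        raise ValueError("page_size must be a positive integer")
    return {
        "records": total_records,
        "page_size": page_size,
        "pages": ceil(total_records / page_size),
    }


plan = plan_sketch(total_records=1234, page_size=25)
```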
- property emgapi_resource: str | None #
Retrieves the name of the endpoint resource based on the endpoint module.
- Returns:
The name of the endpoint resource, or None if the endpoint module is not set.
- Return type:
str or None
- explain(head=None)#
Print example URLs that would be called. Actual requests handled by client.
- Parameters:
head (int | None)
- Return type:
None
- filter(**filters)#
Update the parameters for the API call to filter results.
- Parameters:
**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.
- Returns:
A new QuerySet instance with updated parameters for filtering results.
- Return type:
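Returning a new instance instead of mutating the current one allows filters to be chained without side effects on earlier objects. The following is a sketch of that pattern using a frozen dataclass; `QuerySketch` is a hypothetical class, not mgnipy's QuerySet, and the parameter names are placeholders.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict


@dataclass(frozen=True)
class QuerySketch:
    """Hypothetical immutable query object; not mgnipy's QuerySet."""
    params: Dict[str, Any] = field(default_factory=dict)

    def filter(self, **filters: Any) -> "QuerySketch":
        # Build a new object with merged parameters; the original is untouched.
        return replace(self, params={**self.params, **filters})


base = QuerySketch()
rooted = base.filter(biome_lineage="root")
paged = rooted.filter(page_size=50)
```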
- first()#
Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- get(*args, **kwargs)#
- get_detail(access_param, fetch=True)#
Get detail proxy for a specific accession/pubmed_id/catalogue_id.
- Parameters:
access_param (dict[str, str]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
resource_name (Optional[str]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.
fetch (bool) – Whether to immediately fetch the detail after creating the proxy.
- Returns:
A proxy for the next resource.
- Return type:
Examples
sample = samples.get_detail({"accession": "MGYS00001234"})
- property identifier: str | None #
Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.
- Returns:
The identifier value corresponding to the resource type, or None if not available.
- Return type:
str or None
- iter_details(fetch=True)#
Lazily iterate over child detail proxies.
- Parameters:
fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.
- Returns:
An iterator that yields child detail proxies.
- Return type:
Iterator of QuerySet
Example
for sample in samples.iter_details():
    sample.get()
- list_supported_params()#
Lists supported keyword arguments for the endpoint module.
- list_urls()#
Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.
- page(*args, **kwargs)#
- page_size(n)#
Set the page size for paginated API calls.
- property pagination_status: bool #
Check if the current resource requires pagination based on its supported keyword arguments.
- Returns:
True if the resource is paginated, False otherwise.
- Return type:
- preview()#
Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.
- Returns:
A DataFrame containing the metadata from the first page of results.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If the API call fails or if no data is available to preview.
- property request_url: str #
Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.
- Returns:
The constructed URL for the API request.
- Return type:
- resolve_query_string(**kwargs)#
Resolves the query string for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the query string.
- Returns:
The resolved query string.
- Return type:
- property resource: SupportedEndpoints#
- property results_ids: list [str ] | None #
Get a list of accessions from the retrieved metadata results, if available.
- sub_url(**kwargs)#
Constructs the sub-URL for the endpoint based on the current parameters.
- Returns:
The constructed sub-URL, or None if the endpoint module is not set.
- Return type:
str or None
- to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#
Convert the current or provided metadata to a pandas DataFrame.
- Parameters:
data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.
expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.
rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.
**kwargs – Additional keyword arguments passed to pd.DataFrame.
- Returns:
DataFrame containing the metadata.
- Return type:
pd.DataFrame | None
- Raises:
RuntimeError – If no data is available to convert.
- to_json(data=None, orient='records', lines=True, **json_kwargs)#
Convert the current metadata to a JSON string or save it to a file.
- Parameters:
- Returns:
The JSON string representation of the metadata, or None if no data is available.
- Return type:
str or None
- Raises:
RuntimeError – If no data is available to convert.
- to_list(data=None)#
Convert the current or provided metadata to a list of dictionaries.
- Parameters:
data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.
- Returns:
A list of metadata records as dictionaries, or None if no data is available.
- Return type:
- Raises:
RuntimeError – If no data is available to convert.
- to_polars(data=None, **polars_kwargs)#
Convert the current metadata to a Polars DataFrame.
- Parameters:
- Returns:
A Polars DataFrame containing the metadata.
- Return type:
pl.DataFrame
- Raises:
RuntimeError – If no data is available to convert.
- url_path(**kwargs)#
Constructs the full URL path for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the URL construction.
- Returns:
The constructed URL path.
- Return type:
- validate_endpoint_kwargs(**kwargs)#
Validates the provided keyword arguments against the supported parameters of the endpoint module.
- Parameters:
**kwargs – Keyword arguments to validate.
- Returns:
The validated keyword arguments.
- Return type:
dict of str to Any
- Raises:
ValueError – If any provided keyword argument is not supported by the endpoint module.
- config: MgnipyConfig#
- exec: QueryExecutor#
- class mgnipy.V2.proxies.Studies(*, params=None, config=None, **kwargs)[source]#
Bases:
MGnifyList
- async acollect_details(*, fetch=True, by_id=False, concurrency=None, hide_progress=False)#
- async afirst()#
Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- async aget(*args, **kwargs)#
- async aget_detail(access_param, fetch=True)#
Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.
Examples
sample = await samples.aget_detail({"accession": "MGYS00001234"})
- async aiter_details(fetch=True)#
Async version of iter_details.
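An async counterpart of lazy iteration is an async generator consumed with `async for`. The sketch below simulates the fetch with `asyncio.sleep(0)`; `fetch_detail_sketch` and `aiter_details_sketch` are hypothetical names and make no claim about how mgnipy issues its requests.

```python
import asyncio
from typing import AsyncIterator, List


async def fetch_detail_sketch(accession: str) -> dict:
    """Simulated fetch; a real client would await an HTTP request here."""
    await asyncio.sleep(0)
    return {"accession": accession}


async def aiter_details_sketch(accessions: List[str]) -> AsyncIterator[dict]:
    """Yield each detail as soon as its (simulated) fetch completes."""
    for accession in accessions:
        yield await fetch_detail_sketch(accession)


async def main() -> List[str]:
    seen = []
    async for detail in aiter_details_sketch(["ACC_A", "ACC_B"]):
        seen.append(detail["accession"])
    return seen


seen_accessions = asyncio.run(main())
```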
- async apage(*args, **kwargs)#
- collect_details(*, fetch=True, by_id=False)#
Collect child detail proxies into a list or dict.
- Parameters:
- Returns:
A list or dict of child detail proxies.
- Return type:
Example
sample_detail = samples.collect_details(fetch=True, by_id=True)
- describe_relationships()#
- dry_run(*, verbose=True)#
Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.
- Return type:
None
- Parameters:
verbose (bool )
- property emgapi_resource: str | None #
Retrieves the name of the endpoint resource based on the endpoint module.
- Returns:
The name of the endpoint resource, or None if the endpoint module is not set.
- Return type:
str or None
- explain(head=None)#
Print example URLs that would be called. Actual requests handled by client.
- Parameters:
head (int | None)
- Return type:
None
- filter(**filters)#
Update the parameters for the API call to filter results.
- Parameters:
**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.
- Returns:
A new QuerySet instance with updated parameters for filtering results.
- Return type:
- first()#
Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- get(*args, **kwargs)#
- get_detail(access_param, fetch=True)#
Get detail proxy for a specific accession/pubmed_id/catalogue_id.
- Parameters:
access_param (dict[str, str]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
resource_name (Optional[str]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.
fetch (bool) – Whether to immediately fetch the detail after creating the proxy.
- Returns:
A proxy for the next resource.
- Return type:
Examples
sample = samples.get_detail({"accession": "MGYS00001234"})
- property identifier: str | None #
Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.
- Returns:
The identifier value corresponding to the resource type, or None if not available.
- Return type:
str or None
- iter_details(fetch=True)#
Lazily iterate over child detail proxies.
- Parameters:
fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.
- Returns:
An iterator that yields child detail proxies.
- Return type:
Iterator of QuerySet
Example
for sample in samples.iter_details():
    sample.get()
- list_supported_params()#
Lists supported keyword arguments for the endpoint module.
- list_urls()#
Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.
- page(*args, **kwargs)#
- page_size(n)#
Set the page size for paginated API calls.
- property pagination_status: bool #
Check if the current resource requires pagination based on its supported keyword arguments.
- Returns:
True if the resource is paginated, False otherwise.
- Return type:
- preview()#
Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.
- Returns:
A DataFrame containing the metadata from the first page of results.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If the API call fails or if no data is available to preview.
- property request_url: str #
Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.
- Returns:
The constructed URL for the API request.
- Return type:
- resolve_query_string(**kwargs)#
Resolves the query string for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the query string.
- Returns:
The resolved query string.
- Return type:
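For orientation, a query string of this kind can be produced with `urllib.parse.urlencode`. The sketch assumes that unset parameters are represented as None and dropped before encoding; that rule, the parameter names, and `query_string_sketch` itself are illustrative assumptions.

```python
from typing import Any, Optional
from urllib.parse import urlencode


def query_string_sketch(**params: Optional[Any]) -> str:
    """Encode parameters as a URL query string, skipping unset (None) values."""
    present = {key: value for key, value in params.items() if value is not None}
    return urlencode(present)


query = query_string_sketch(search="soil sample", page=2, ordering=None)
```

Note that `urlencode` percent-encodes values and turns spaces into `+`, so callers do not need to escape values themselves.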
- property resource: SupportedEndpoints#
- property results_ids: list [str ] | None #
Get a list of accessions from the retrieved metadata results, if available.
- sub_url(**kwargs)#
Constructs the sub-URL for the endpoint based on the current parameters.
- Returns:
The constructed sub-URL, or None if the endpoint module is not set.
- Return type:
str or None
- to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#
Convert the current or provided metadata to a pandas DataFrame.
- Parameters:
data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.
expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.
rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.
**kwargs – Additional keyword arguments passed to pd.DataFrame.
- Returns:
DataFrame containing the metadata.
- Return type:
pd.DataFrame | None
- Raises:
RuntimeError – If no data is available to convert.
- to_json(data=None, orient='records', lines=True, **json_kwargs)#
Convert the current metadata to a JSON string or save it to a file.
- Parameters:
- Returns:
The JSON string representation of the metadata, or None if no data is available.
- Return type:
str or None
- Raises:
RuntimeError – If no data is available to convert.
- to_list(data=None)#
Convert the current or provided metadata to a list of dictionaries.
- Parameters:
data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.
- Returns:
A list of metadata records as dictionaries, or None if no data is available.
- Return type:
- Raises:
RuntimeError – If no data is available to convert.
- to_polars(data=None, **polars_kwargs)#
Convert the current metadata to a Polars DataFrame.
- Parameters:
- Returns:
A Polars DataFrame containing the metadata.
- Return type:
pl.DataFrame
- Raises:
RuntimeError – If no data is available to convert.
- url_path(**kwargs)#
Constructs the full URL path for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the URL construction.
- Returns:
The constructed URL path.
- Return type:
- validate_endpoint_kwargs(**kwargs)#
Validates the provided keyword arguments against the supported parameters of the endpoint module.
- Parameters:
**kwargs – Keyword arguments to validate.
- Returns:
The validated keyword arguments.
- Return type:
dict of str to Any
- Raises:
ValueError – If any provided keyword argument is not supported by the endpoint module.
- config: MgnipyConfig#
- exec: QueryExecutor#
- class mgnipy.V2.proxies.PrivateStudies(*, params=None, config=None, **kwargs)[source]#
Bases:
MGnifyList
- config: MgnipyConfig#
- async acollect_details(*, fetch=True, by_id=False, concurrency=None, hide_progress=False)#
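The `concurrency` argument suggests a cap on the number of requests in flight at once. One common way to get that behaviour in asyncio is a semaphore combined with `asyncio.gather`, sketched below with a simulated fetch. Whether mgnipy uses this mechanism is an assumption, and `fetch_one` and `acollect_sketch` are hypothetical names.

```python
import asyncio
from typing import List, Optional


async def fetch_one(accession: str, limiter: asyncio.Semaphore) -> dict:
    """Simulated fetch that holds one concurrency slot while it runs."""
    async with limiter:
        await asyncio.sleep(0)  # placeholder for an awaited HTTP request
        return {"accession": accession}


async def acollect_sketch(accessions: List[str], concurrency: Optional[int] = None) -> List[dict]:
    """Fetch all details concurrently, with at most `concurrency` in flight."""
    limiter = asyncio.Semaphore(concurrency or len(accessions) or 1)
    tasks = [fetch_one(accession, limiter) for accession in accessions]
    # gather returns results in the order the awaitables were passed in.
    return await asyncio.gather(*tasks)


collected = asyncio.run(acollect_sketch(["ACC_A", "ACC_B", "ACC_C"], concurrency=2))
```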
- async afirst()#
Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- async aget(*args, **kwargs)#
- async aget_detail(access_param, fetch=True)#
Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.
Examples
sample = await samples.aget_detail({"accession": "MGYS00001234"})
- async aiter_details(fetch=True)#
Async version of iter_details.
- async apage(*args, **kwargs)#
- collect_details(*, fetch=True, by_id=False)#
Collect child detail proxies into a list or dict.
- Parameters:
- Returns:
A list or dict of child detail proxies.
- Return type:
Example
sample_detail = samples.collect_details(fetch=True, by_id=True)
- describe_relationships()#
- dry_run(*, verbose=True)#
Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.
- Return type:
None
- Parameters:
verbose (bool )
- property emgapi_resource: str | None #
Retrieves the name of the endpoint resource based on the endpoint module.
- Returns:
The name of the endpoint resource, or None if the endpoint module is not set.
- Return type:
str or None
- explain(head=None)#
Print example URLs that would be called. Actual requests handled by client.
- Parameters:
head (int | None)
- Return type:
None
- filter(**filters)#
Update the parameters for the API call to filter results.
- Parameters:
**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.
- Returns:
A new QuerySet instance with updated parameters for filtering results.
- Return type:
- first()#
Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- get(*args, **kwargs)#
- get_detail(access_param, fetch=True)#
Get detail proxy for a specific accession/pubmed_id/catalogue_id.
- Parameters:
access_param (dict[str, str]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
resource_name (Optional[str]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.
fetch (bool) – Whether to immediately fetch the detail after creating the proxy.
- Returns:
A proxy for the next resource.
- Return type:
Examples
sample = samples.get_detail({"accession": "MGYS00001234"})
- property identifier: str | None #
Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.
- Returns:
The identifier value corresponding to the resource type, or None if not available.
- Return type:
str or None
- iter_details(fetch=True)#
Lazily iterate over child detail proxies.
- Parameters:
fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.
- Returns:
An iterator that yields child detail proxies.
- Return type:
Iterator of QuerySet
Example
for sample in samples.iter_details():
    sample.get()
- list_supported_params()#
Lists supported keyword arguments for the endpoint module.
- list_urls()#
Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.
- page(*args, **kwargs)#
- page_size(n)#
Set the page size for paginated API calls.
- property pagination_status: bool #
Check if the current resource requires pagination based on its supported keyword arguments.
- Returns:
True if the resource is paginated, False otherwise.
- Return type:
- preview()#
Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.
- Returns:
A DataFrame containing the metadata from the first page of results.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If the API call fails or if no data is available to preview.
- property request_url: str #
Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.
- Returns:
The constructed URL for the API request.
- Return type:
- resolve_query_string(**kwargs)#
Resolves the query string for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the query string.
- Returns:
The resolved query string.
- Return type:
- property resource: SupportedEndpoints#
- property results_ids: list [str ] | None #
Get a list of accessions from the retrieved metadata results, if available.
- sub_url(**kwargs)#
Constructs the sub-URL for the endpoint based on the current parameters.
- Returns:
The constructed sub-URL, or None if the endpoint module is not set.
- Return type:
str or None
- to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#
Convert the current or provided metadata to a pandas DataFrame.
- Parameters:
data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.
expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.
rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.
**kwargs – Additional keyword arguments passed to pd.DataFrame.
- Returns:
DataFrame containing the metadata.
- Return type:
pd.DataFrame | None
- Raises:
RuntimeError – If no data is available to convert.
- to_json(data=None, orient='records', lines=True, **json_kwargs)#
Convert the current metadata to a JSON string or save it to a file.
- Parameters:
- Returns:
The JSON string representation of the metadata, or None if no data is available.
- Return type:
str or None
- Raises:
RuntimeError – If no data is available to convert.
- to_list(data=None)#
Convert the current or provided metadata to a list of dictionaries.
- Parameters:
data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.
- Returns:
A list of metadata records as dictionaries, or None if no data is available.
- Return type:
- Raises:
RuntimeError – If no data is available to convert.
- to_polars(data=None, **polars_kwargs)#
Convert the current metadata to a Polars DataFrame.
- Parameters:
- Returns:
A Polars DataFrame containing the metadata.
- Return type:
pl.DataFrame
- Raises:
RuntimeError – If no data is available to convert.
- url_path(**kwargs)#
Constructs the full URL path for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the URL construction.
- Returns:
The constructed URL path.
- Return type:
- validate_endpoint_kwargs(**kwargs)#
Validates the provided keyword arguments against the supported parameters of the endpoint module.
- Parameters:
**kwargs – Keyword arguments to validate.
- Returns:
The validated keyword arguments.
- Return type:
dict of str to Any
- Raises:
ValueError – If any provided keyword argument is not supported by the endpoint module.
- exec: QueryExecutor#
- class mgnipy.V2.proxies.Biomes(*, params=None, config=None, **kwargs)[source]#
Bases:
MGnifyList, BiomesTreeMixin
- async acollect_details(*, fetch=True, by_id=False, concurrency=None, hide_progress=False)#
- async afirst()#
Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- async aget(*args, **kwargs)#
- async aget_detail(access_param, fetch=True)#
Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.
Examples
sample = await samples.aget_detail({"accession": "MGYS00001234"})
- async aiter_details(fetch=True)#
Async version of iter_details.
- async apage(*args, **kwargs)#
- collect_details(*, fetch=True, by_id=False)#
Collect child detail proxies into a list or dict.
- Parameters:
- Returns:
A list or dict of child detail proxies.
- Return type:
Example
sample_detail = samples.collect_details(fetch=True, by_id=True)
- describe_relationships()#
- dry_run(*, verbose=True)#
Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.
- Return type:
None
- Parameters:
verbose (bool )
- property emgapi_resource: str | None #
Retrieves the name of the endpoint resource based on the endpoint module.
- Returns:
The name of the endpoint resource, or None if the endpoint module is not set.
- Return type:
str or None
- explain(head=None)#
Print example URLs that would be called. Actual requests handled by client.
- Parameters:
head (int | None)
- Return type:
None
- filter(**filters)#
Update the parameters for the API call to filter results.
- Parameters:
**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.
- Returns:
A new QuerySet instance with updated parameters for filtering results.
- Return type:
- first()#
Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- get(*args, **kwargs)#
- get_detail(access_param, fetch=True)#
Get detail proxy for a specific accession/pubmed_id/catalogue_id.
- Parameters:
access_param (dict[str, str]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
resource_name (Optional[str]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.
fetch (bool) – Whether to immediately fetch the detail after creating the proxy.
- Returns:
A proxy for the next resource.
- Return type:
Examples
sample = samples.get_detail({"accession": "MGYS00001234"})
- property identifier: str | None #
Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.
- Returns:
The identifier value corresponding to the resource type, or None if not available.
- Return type:
str or None
- iter_details(fetch=True)#
Lazily iterate over child detail proxies.
- Parameters:
fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.
- Returns:
An iterator that yields child detail proxies.
- Return type:
Iterator of QuerySet
Example
for sample in samples.iter_details():
    sample.get()
- list_supported_params()#
Lists supported keyword arguments for the endpoint module.
- list_urls()#
Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.
- page(*args, **kwargs)#
- page_size(n)#
Set the page size for paginated API calls.
- property pagination_status: bool #
Check if the current resource requires pagination based on its supported keyword arguments.
- Returns:
True if the resource is paginated, False otherwise.
- Return type:
- preview()#
Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.
- Returns:
A DataFrame containing the metadata from the first page of results.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If the API call fails or if no data is available to preview.
- property request_url: str #
Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.
- Returns:
The constructed URL for the API request.
- Return type:
- resolve_query_string(**kwargs)#
Resolves the query string for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the query string.
- Returns:
The resolved query string.
- Return type:
- property resource: SupportedEndpoints#
- property results_ids: list [str ] | None #
Get a list of accessions from the retrieved metadata results, if available.
- show_tree(method='compact')#
- Parameters:
method (Literal ['compact', 'show', 'print', 'horizontal', 'hshow', 'h', 'hprint', 'vertical', 'vshow', 'v', 'vprint'])
- sub_url(**kwargs)#
Constructs the sub-URL for the endpoint based on the current parameters.
- Returns:
The constructed sub-URL, or None if the endpoint module is not set.
- Return type:
str or None
- to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#
Convert the current or provided metadata to a pandas DataFrame.
- Parameters:
data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.
expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.
rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.
**kwargs – Additional keyword arguments passed to pd.DataFrame.
- Returns:
DataFrame containing the metadata.
- Return type:
pd.DataFrame | None
- Raises:
RuntimeError – If no data is available to convert.
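The effect of expanding nested dictionaries and renaming columns can be sketched on a single record with the standard library alone. The `flatten_record` helper and the dotted `key.subkey` column names are assumptions for illustration (the dotted form is borrowed from the default separator of `pandas.json_normalize`); mgnipy's actual column naming may differ.

```python
from typing import Optional


def flatten_record(
    record: dict,
    expand_keys: list[str],
    rename_columns: Optional[dict[str, str]] = None,
) -> dict:
    """Expand selected nested dicts into top-level columns, then rename columns."""
    flat = {}
    for key, value in record.items():
        if key in expand_keys and isinstance(value, dict):
            # One new column per nested key, named "<key>.<sub_key>".
            for sub_key, sub_value in value.items():
                flat[f"{key}.{sub_key}"] = sub_value
        else:
            flat[key] = value
    renames = rename_columns or {}
    return {renames.get(name, name): value for name, value in flat.items()}


record = {"accession": "A1", "attributes": {"name": "demo", "count": 3}}
row = flatten_record(record, ["attributes"], {"attributes.name": "name"})
print(row)
```

A list of such flattened rows is the kind of input `pd.DataFrame` accepts directly.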
- to_json(data=None, orient='records', lines=True, **json_kwargs)#
Convert the current metadata to a JSON string or save it to a file.
- Parameters:
- Returns:
The JSON string representation of the metadata, or None if no data is available.
- Return type:
str or None
- Raises:
RuntimeError – If no data is available to convert.
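The `orient` and `lines` argument names match those of pandas `DataFrame.to_json`, where `orient='records'` with `lines=True` produces JSON Lines (one object per line). Whether mgnipy delegates to pandas is an assumption; the sketch below reproduces the two output shapes with the standard library, using a hypothetical `records_to_json` helper.

```python
import json


def records_to_json(records: list[dict], lines: bool = True) -> str:
    """Serialise records as JSON Lines, or as one JSON array when lines=False."""
    if not records:
        raise RuntimeError("No data available to convert.")
    if lines:
        return "\n".join(json.dumps(record) for record in records)
    return json.dumps(records)


records = [{"accession": "A1", "n": 1}, {"accession": "A2", "n": 2}]
print(records_to_json(records))
print(records_to_json(records, lines=False))
```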
- to_list(data=None)#
Convert the current or provided metadata to a list of dictionaries.
- Parameters:
data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.
- Returns:
A list of metadata records as dictionaries, or None if no data is available.
- Return type:
list of dict or None
- Raises:
RuntimeError – If no data is available to convert.
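Given the documented input shape (a dict mapping page numbers to lists of records), the conversion amounts to concatenating the pages. A minimal sketch, assuming pages are concatenated in ascending page order; `pages_to_list` is a hypothetical helper, not part of mgnipy.

```python
from itertools import chain


def pages_to_list(pages: dict[int, list[dict]]) -> list[dict]:
    """Concatenate paginated records into one flat list, ordered by page number."""
    if not pages:
        raise RuntimeError("No data available to convert.")
    return list(chain.from_iterable(pages[number] for number in sorted(pages)))


pages = {
    2: [{"accession": "A3"}],
    1: [{"accession": "A1"}, {"accession": "A2"}],
}
print(pages_to_list(pages))
```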
- to_polars(data=None, **polars_kwargs)#
Convert the current metadata to a Polars DataFrame.
- Parameters:
- Returns:
A Polars DataFrame containing the metadata.
- Return type:
pl.DataFrame
- Raises:
RuntimeError – If no data is available to convert.
- property tree: Tree#
Convert the biomes metadata to a tree structure for visualization or analysis.
- Returns:
A tree representation of the biomes and their relationships.
- Return type:
Tree
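To give a feel for what a biome tree holds, the sketch below nests colon-separated lineage strings into a dictionary and renders it with indentation. The colon separator, the sample lineages, and the `build_tree` and `render` helpers are assumptions for illustration; the `Tree` type returned by this property is not modelled here.

```python
def build_tree(lineages: list[str], sep: str = ":") -> dict:
    """Nest lineage strings such as 'root:Environmental:Aquatic' into a dict tree."""
    tree: dict = {}
    for lineage in lineages:
        node = tree
        for part in lineage.split(sep):
            node = node.setdefault(part, {})
    return tree


def render(tree: dict, indent: int = 0) -> list[str]:
    """Return one indented line per node, children sorted by name."""
    lines = []
    for name in sorted(tree):
        lines.append("  " * indent + name)
        lines.extend(render(tree[name], indent + 1))
    return lines


lineages = [
    "root:Environmental:Aquatic",
    "root:Environmental:Terrestrial",
    "root:Engineered",
]
print("\n".join(render(build_tree(lineages))))
```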
- url_path(**kwargs)#
Constructs the full URL path for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the URL construction.
- Returns:
The constructed URL path.
- Return type:
- validate_endpoint_kwargs(**kwargs)#
Validates the provided keyword arguments against the supported parameters of the endpoint module.
- Parameters:
**kwargs – Keyword arguments to validate.
- Returns:
The validated keyword arguments.
- Return type:
dict of str to Any
- Raises:
ValueError – If any provided keyword argument is not supported by the endpoint module.
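The validation contract (return the accepted arguments, raise `ValueError` on anything unsupported) can be sketched as a set difference. `SUPPORTED_PARAMS` and `validate_kwargs` are hypothetical; in mgnipy the supported names come from the endpoint module.

```python
from typing import Any

# Hypothetical set of supported parameters for one endpoint.
SUPPORTED_PARAMS = {"page", "page_size", "biome_lineage"}


def validate_kwargs(supported: set[str], **kwargs: Any) -> dict[str, Any]:
    """Return kwargs unchanged if every name is supported, else raise ValueError."""
    unsupported = sorted(set(kwargs) - supported)
    if unsupported:
        raise ValueError(f"Unsupported parameter(s): {', '.join(unsupported)}")
    return dict(kwargs)


print(validate_kwargs(SUPPORTED_PARAMS, page=1, biome_lineage="root"))
```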
- config: MgnipyConfig#
- exec: QueryExecutor#
- class mgnipy.V2.proxies.Assemblies(*, params=None, config=None, **kwargs)[source]#
Bases:
MGnifyList
- async acollect_details(*, fetch=True, by_id=False, concurrency=None, hide_progress=False)#
- async afirst()#
Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- async aget(*args, **kwargs)#
- async aget_detail(access_param, fetch=True)#
Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.
Examples
sample = await samples.aget_detail({"accession": "MGYS00001234"})
- async aiter_details(fetch=True)#
Async version of iter_details.
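Consuming an async iterator looks slightly different from the synchronous loop. The sketch below is a stand-alone illustration with a simulated fetch; `fetch_detail`, the local `aiter_details` function, and the accessions are hypothetical and do not call mgnipy.

```python
import asyncio
from typing import AsyncIterator


async def fetch_detail(accession: str) -> dict:
    """Simulated fetch; asyncio.sleep(0) stands in for a network round trip."""
    await asyncio.sleep(0)
    return {"accession": accession, "fetched": True}


async def aiter_details(accessions: list[str], fetch: bool = True) -> AsyncIterator[dict]:
    """Yield one detail at a time, fetching it first when fetch is True."""
    for accession in accessions:
        if fetch:
            yield await fetch_detail(accession)
        else:
            yield {"accession": accession, "fetched": False}


async def main() -> list[dict]:
    # `async for` drives the async generator one item at a time.
    return [detail async for detail in aiter_details(["A1", "A2"])]


details = asyncio.run(main())
print(details)
```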
- async apage(*args, **kwargs)#
- collect_details(*, fetch=True, by_id=False)#
Collect child detail proxies into a list or dict.
- Parameters:
- Returns:
A list or dict of child detail proxies.
- Return type:
Example
sample_detail = samples.collect_details(fetch=True, by_id=True)
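The difference between the list and dict return shapes can be sketched as follows. Keying the dict on each record's accession is an assumption about what `by_id=True` does; the `collect` helper and the accessions are hypothetical.

```python
from typing import Union


def collect(accessions: list[str], by_id: bool = False) -> Union[list[dict], dict[str, dict]]:
    """Return details as a list, or as a dict keyed by accession when by_id is True."""
    details = [{"accession": accession} for accession in accessions]
    if by_id:
        return {detail["accession"]: detail for detail in details}
    return details


as_list = collect(["A1", "A2"])
as_dict = collect(["A1", "A2"], by_id=True)
print(as_list)
print(as_dict["A2"])
```

The dict form is convenient when details are looked up by identifier later; the list form preserves the order of the parent results.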
- describe_relationships()#
- dry_run(*, verbose=True)#
Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.
- Return type:
None
- Parameters:
verbose (bool )
- property emgapi_resource: str | None #
Retrieves the name of the endpoint resource based on the endpoint module.
- Returns:
The name of the endpoint resource, or None if the endpoint module is not set.
- Return type:
str or None
- explain(head=None)#
Print example URLs that would be called. Actual requests handled by client.
- Parameters:
head (int | None)
- Return type:
None
- filter(**filters)#
Update the parameters for the API call to filter results.
- Parameters:
**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.
- Returns:
A new QuerySet instance with updated parameters for filtering results.
- Return type:
QuerySet
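Because `filter()` returns a new instance, calls can be chained without altering the original query. The sketch below shows that pattern with a frozen dataclass; the `Query` class is a hypothetical stand-in, not mgnipy's `QuerySet`.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Query:
    """Hypothetical immutable query: filter() returns a copy with merged params."""

    resource: str
    params: dict = field(default_factory=dict)

    def filter(self, **filters) -> "Query":
        # Build a new dict so the original query's params stay untouched.
        return replace(self, params={**self.params, **filters})


base = Query("samples")
by_biome = base.filter(biome_lineage="root")
paged = by_biome.filter(page_size=10)
print(base.params, by_biome.params, paged.params)
```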
- first()#
Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- get(*args, **kwargs)#
- get_detail(access_param, fetch=True)#
Get detail proxy for a specific accession/pubmed_id/catalogue_id.
- Parameters:
access_param (dict[str, str]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
resource_name (Optional[str ]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.
fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.
- Returns:
A proxy for the next resource.
- Return type:
Examples
sample = samples.get_detail({"accession": "MGYS00001234"})
- property identifier: str | None #
Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.
- Returns:
The identifier value corresponding to the resource type, or None if not available.
- Return type:
str or None
- iter_details(fetch=True)#
Lazily iterate over child detail proxies.
- Parameters:
fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.
- Returns:
An iterator that yields child detail proxies.
- Return type:
Iterator of QuerySet
Example
for sample in samples.iter_details():
    sample.get()
- list_supported_params()#
Lists supported keyword arguments for the endpoint module.
- list_urls()#
Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.
- page(*args, **kwargs)#
- page_size(n)#
Set the page size for paginated API calls.
- property pagination_status: bool #
Check if the current resource requires pagination based on its supported keyword arguments.
- Returns:
True if the resource is paginated, False otherwise.
- Return type:
bool
- preview()#
Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.
- Returns:
A DataFrame containing the metadata from the first page of results.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If the API call fails or if no data is available to preview.
- property request_url: str #
Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.
- Returns:
The constructed URL for the API request.
- Return type:
str
- resolve_query_string(**kwargs)#
Resolves the query string for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the query string.
- Returns:
The resolved query string.
- Return type:
- property resource: SupportedEndpoints#
- property results_ids: list [str ] | None #
Get a list of accessions from the retrieved metadata results, if available.
- sub_url(**kwargs)#
Constructs the sub-URL for the endpoint based on the current parameters.
- Returns:
The constructed sub-URL, or None if the endpoint module is not set.
- Return type:
str or None
- to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#
Convert the current or provided metadata to a pandas DataFrame.
- Parameters:
data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.
expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.
rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.
**kwargs – Additional keyword arguments passed to pd.DataFrame.
- Returns:
DataFrame containing the metadata.
- Return type:
pd.DataFrame | None
- Raises:
RuntimeError – If no data is available to convert.
- to_json(data=None, orient='records', lines=True, **json_kwargs)#
Convert the current metadata to a JSON string or save it to a file.
- Parameters:
- Returns:
The JSON string representation of the metadata, or None if no data is available.
- Return type:
str or None
- Raises:
RuntimeError – If no data is available to convert.
- to_list(data=None)#
Convert the current or provided metadata to a list of dictionaries.
- Parameters:
data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.
- Returns:
A list of metadata records as dictionaries, or None if no data is available.
- Return type:
list of dict or None
- Raises:
RuntimeError – If no data is available to convert.
- to_polars(data=None, **polars_kwargs)#
Convert the current metadata to a Polars DataFrame.
- Parameters:
- Returns:
A Polars DataFrame containing the metadata.
- Return type:
pl.DataFrame
- Raises:
RuntimeError – If no data is available to convert.
- url_path(**kwargs)#
Constructs the full URL path for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the URL construction.
- Returns:
The constructed URL path.
- Return type:
- validate_endpoint_kwargs(**kwargs)#
Validates the provided keyword arguments against the supported parameters of the endpoint module.
- Parameters:
**kwargs – Keyword arguments to validate.
- Returns:
The validated keyword arguments.
- Return type:
dict of str to Any
- Raises:
ValueError – If any provided keyword argument is not supported by the endpoint module.
- config: MgnipyConfig#
- exec: QueryExecutor#
- class mgnipy.V2.proxies.Genomes(*, params=None, config=None, **kwargs)[source]#
Bases:
MGnifyList
- async acollect_details(*, fetch=True, by_id=False, concurrency=None, hide_progress=False)#
- async afirst()#
Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- async aget(*args, **kwargs)#
- async aget_detail(access_param, fetch=True)#
Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.
Examples
sample = await samples.aget_detail({"accession": "MGYS00001234"})
- async aiter_details(fetch=True)#
Async version of iter_details.
- async apage(*args, **kwargs)#
- collect_details(*, fetch=True, by_id=False)#
Collect child detail proxies into a list or dict.
- Parameters:
- Returns:
A list or dict of child detail proxies.
- Return type:
Example
sample_detail = samples.collect_details(fetch=True, by_id=True)
- describe_relationships()#
- dry_run(*, verbose=True)#
Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.
- Return type:
None
- Parameters:
verbose (bool )
- property emgapi_resource: str | None #
Retrieves the name of the endpoint resource based on the endpoint module.
- Returns:
The name of the endpoint resource, or None if the endpoint module is not set.
- Return type:
str or None
- explain(head=None)#
Print example URLs that would be called. Actual requests handled by client.
- Parameters:
head (int | None)
- Return type:
None
- filter(**filters)#
Update the parameters for the API call to filter results.
- Parameters:
**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.
- Returns:
A new QuerySet instance with updated parameters for filtering results.
- Return type:
QuerySet
- first()#
Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- get(*args, **kwargs)#
- get_detail(access_param, fetch=True)#
Get detail proxy for a specific accession/pubmed_id/catalogue_id.
- Parameters:
access_param (dict[str, str]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
resource_name (Optional[str ]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.
fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.
- Returns:
A proxy for the next resource.
- Return type:
Examples
sample = samples.get_detail({"accession": "MGYS00001234"})
- property identifier: str | None #
Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.
- Returns:
The identifier value corresponding to the resource type, or None if not available.
- Return type:
str or None
- iter_details(fetch=True)#
Lazily iterate over child detail proxies.
- Parameters:
fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.
- Returns:
An iterator that yields child detail proxies.
- Return type:
Iterator of QuerySet
Example
for sample in samples.iter_details():
    sample.get()
- list_supported_params()#
Lists supported keyword arguments for the endpoint module.
- list_urls()#
Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.
- page(*args, **kwargs)#
- page_size(n)#
Set the page size for paginated API calls.
- property pagination_status: bool #
Check if the current resource requires pagination based on its supported keyword arguments.
- Returns:
True if the resource is paginated, False otherwise.
- Return type:
bool
- preview()#
Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.
- Returns:
A DataFrame containing the metadata from the first page of results.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If the API call fails or if no data is available to preview.
- property request_url: str #
Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.
- Returns:
The constructed URL for the API request.
- Return type:
str
- resolve_query_string(**kwargs)#
Resolves the query string for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the query string.
- Returns:
The resolved query string.
- Return type:
- property resource: SupportedEndpoints#
- property results_ids: list [str ] | None #
Get a list of accessions from the retrieved metadata results, if available.
- sub_url(**kwargs)#
Constructs the sub-URL for the endpoint based on the current parameters.
- Returns:
The constructed sub-URL, or None if the endpoint module is not set.
- Return type:
str or None
- to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#
Convert the current or provided metadata to a pandas DataFrame.
- Parameters:
data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.
expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.
rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.
**kwargs – Additional keyword arguments passed to pd.DataFrame.
- Returns:
DataFrame containing the metadata.
- Return type:
pd.DataFrame | None
- Raises:
RuntimeError – If no data is available to convert.
- to_json(data=None, orient='records', lines=True, **json_kwargs)#
Convert the current metadata to a JSON string or save it to a file.
- Parameters:
- Returns:
The JSON string representation of the metadata, or None if no data is available.
- Return type:
str or None
- Raises:
RuntimeError – If no data is available to convert.
- to_list(data=None)#
Convert the current or provided metadata to a list of dictionaries.
- Parameters:
data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.
- Returns:
A list of metadata records as dictionaries, or None if no data is available.
- Return type:
list of dict or None
- Raises:
RuntimeError – If no data is available to convert.
- to_polars(data=None, **polars_kwargs)#
Convert the current metadata to a Polars DataFrame.
- Parameters:
- Returns:
A Polars DataFrame containing the metadata.
- Return type:
pl.DataFrame
- Raises:
RuntimeError – If no data is available to convert.
- url_path(**kwargs)#
Constructs the full URL path for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the URL construction.
- Returns:
The constructed URL path.
- Return type:
- validate_endpoint_kwargs(**kwargs)#
Validates the provided keyword arguments against the supported parameters of the endpoint module.
- Parameters:
**kwargs – Keyword arguments to validate.
- Returns:
The validated keyword arguments.
- Return type:
dict of str to Any
- Raises:
ValueError – If any provided keyword argument is not supported by the endpoint module.
- config: MgnipyConfig#
- exec: QueryExecutor#
- class mgnipy.V2.proxies.Publications(*, params=None, config=None, **kwargs)[source]#
Bases:
MGnifyList
- async acollect_details(*, fetch=True, by_id=False, concurrency=None, hide_progress=False)#
- async afirst()#
Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- async aget(*args, **kwargs)#
- async aget_detail(access_param, fetch=True)#
Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.
Examples
sample = await samples.aget_detail({"accession": "MGYS00001234"})
- async aiter_details(fetch=True)#
Async version of iter_details.
- async apage(*args, **kwargs)#
- collect_details(*, fetch=True, by_id=False)#
Collect child detail proxies into a list or dict.
- Parameters:
- Returns:
A list or dict of child detail proxies.
- Return type:
Example
sample_detail = samples.collect_details(fetch=True, by_id=True)
- describe_relationships()#
- dry_run(*, verbose=True)#
Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.
- Return type:
None
- Parameters:
verbose (bool )
- property emgapi_resource: str | None #
Retrieves the name of the endpoint resource based on the endpoint module.
- Returns:
The name of the endpoint resource, or None if the endpoint module is not set.
- Return type:
str or None
- explain(head=None)#
Print example URLs that would be called. Actual requests handled by client.
- Parameters:
head (int | None)
- Return type:
None
- filter(**filters)#
Update the parameters for the API call to filter results.
- Parameters:
**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.
- Returns:
A new QuerySet instance with updated parameters for filtering results.
- Return type:
QuerySet
- first()#
Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- get(*args, **kwargs)#
- get_detail(access_param, fetch=True)#
Get detail proxy for a specific accession/pubmed_id/catalogue_id.
- Parameters:
access_param (dict[str, str]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
resource_name (Optional[str ]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.
fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.
- Returns:
A proxy for the next resource.
- Return type:
Examples
sample = samples.get_detail({"accession": "MGYS00001234"})
- property identifier: str | None #
Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.
- Returns:
The identifier value corresponding to the resource type, or None if not available.
- Return type:
str or None
- iter_details(fetch=True)#
Lazily iterate over child detail proxies.
- Parameters:
fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.
- Returns:
An iterator that yields child detail proxies.
- Return type:
Iterator of QuerySet
Example
for sample in samples.iter_details():
    sample.get()
- list_supported_params()#
Lists supported keyword arguments for the endpoint module.
- list_urls()#
Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.
- page(*args, **kwargs)#
- page_size(n)#
Set the page size for paginated API calls.
- property pagination_status: bool #
Check if the current resource requires pagination based on its supported keyword arguments.
- Returns:
True if the resource is paginated, False otherwise.
- Return type:
bool
- preview()#
Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.
- Returns:
A DataFrame containing the metadata from the first page of results.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If the API call fails or if no data is available to preview.
- property request_url: str #
Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.
- Returns:
The constructed URL for the API request.
- Return type:
str
- resolve_query_string(**kwargs)#
Resolves the query string for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the query string.
- Returns:
The resolved query string.
- Return type:
- property resource: SupportedEndpoints#
- property results_ids: list [str ] | None #
Get a list of accessions from the retrieved metadata results, if available.
- sub_url(**kwargs)#
Constructs the sub-URL for the endpoint based on the current parameters.
- Returns:
The constructed sub-URL, or None if the endpoint module is not set.
- Return type:
str or None
- to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#
Convert the current or provided metadata to a pandas DataFrame.
- Parameters:
data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.
expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.
rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.
**kwargs – Additional keyword arguments passed to pd.DataFrame.
- Returns:
DataFrame containing the metadata.
- Return type:
pd.DataFrame | None
- Raises:
RuntimeError – If no data is available to convert.
- to_json(data=None, orient='records', lines=True, **json_kwargs)#
Convert the current metadata to a JSON string or save it to a file.
- Parameters:
- Returns:
The JSON string representation of the metadata, or None if no data is available.
- Return type:
str or None
- Raises:
RuntimeError – If no data is available to convert.
- to_list(data=None)#
Convert the current or provided metadata to a list of dictionaries.
- Parameters:
data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.
- Returns:
A list of metadata records as dictionaries, or None if no data is available.
- Return type:
list of dict or None
- Raises:
RuntimeError – If no data is available to convert.
- to_polars(data=None, **polars_kwargs)#
Convert the current metadata to a Polars DataFrame.
- Parameters:
- Returns:
A Polars DataFrame containing the metadata.
- Return type:
pl.DataFrame
- Raises:
RuntimeError – If no data is available to convert.
- url_path(**kwargs)#
Constructs the full URL path for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the URL construction.
- Returns:
The constructed URL path.
- Return type:
- validate_endpoint_kwargs(**kwargs)#
Validates the provided keyword arguments against the supported parameters of the endpoint module.
- Parameters:
**kwargs – Keyword arguments to validate.
- Returns:
The validated keyword arguments.
- Return type:
dict of str to Any
- Raises:
ValueError – If any provided keyword argument is not supported by the endpoint module.
- config: MgnipyConfig#
- exec: QueryExecutor#
- class mgnipy.V2.proxies.Catalogues(*, params=None, config=None, **kwargs)[source]#
Bases:
MGnifyList
- async acollect_details(*, fetch=True, by_id=False, concurrency=None, hide_progress=False)#
- async afirst()#
Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- async aget(*args, **kwargs)#
- async aget_detail(access_param, fetch=True)#
Async version of get_detail. Get detail proxy for a specific accession/pubmed_id/catalogue_id.
Examples
sample = await samples.aget_detail({"accession": "MGYS00001234"})
- async aiter_details(fetch=True)#
Async version of iter_details.
- async apage(*args, **kwargs)#
- collect_details(*, fetch=True, by_id=False)#
Collect child detail proxies into a list or dict.
- Parameters:
- Returns:
A list or dict of child detail proxies.
- Return type:
Example
sample_detail = samples.collect_details(fetch=True, by_id=True)
- describe_relationships()#
- dry_run(*, verbose=True)#
Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.
- Return type:
None
- Parameters:
verbose (bool )
- property emgapi_resource: str | None #
Retrieves the name of the endpoint resource based on the endpoint module.
- Returns:
The name of the endpoint resource, or None if the endpoint module is not set.
- Return type:
str or None
- explain(head=None)#
Print example URLs that would be called. Actual requests handled by client.
- Parameters:
head (int | None)
- Return type:
None
- filter(**filters)#
Update the parameters for the API call to filter results.
- Parameters:
**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.
- Returns:
A new QuerySet instance with updated parameters for filtering results.
- Return type:
QuerySet
- first()#
Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- get(*args, **kwargs)#
- get_detail(access_param, fetch=True)#
Get detail proxy for a specific accession/pubmed_id/catalogue_id.
- Parameters:
access_param (dict[str, str]) – A dictionary containing the necessary parameter to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
resource_name (Optional[str ]) – The name of the resource to get the next instance of. If None, will use the first or only linked resource.
fetch (bool ) – Whether to immediately fetch the detail after creating the proxy.
- Returns:
A proxy for the next resource.
- Return type:
Examples
sample = samples.get_detail({"accession": "MGYS00001234"})
- property identifier: str | None #
Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.
- Returns:
The identifier value corresponding to the resource type, or None if not available.
- Return type:
str or None
- iter_details(fetch=True)#
Lazily iterate over child detail proxies.
- Parameters:
fetch (bool ) – Whether to immediately fetch each detail after creating the proxy.
- Returns:
An iterator that yields child detail proxies.
- Return type:
Iterator of QuerySet
Example
for sample in samples.iter_details():
    sample.get()
- list_supported_params()#
Lists supported keyword arguments for the endpoint module.
- list_urls()#
Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.
- page(*args, **kwargs)#
- page_size(n)#
Set the page size for paginated API calls.
- property pagination_status: bool #
Check if the current resource requires pagination based on its supported keyword arguments.
- Returns:
True if the resource is paginated, False otherwise.
- Return type:
bool
- preview()#
Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.
- Returns:
A DataFrame containing the metadata from the first page of results.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If the API call fails or if no data is available to preview.
- property request_url: str #
Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.
- Returns:
The constructed URL for the API request.
- Return type:
str
- resolve_query_string(**kwargs)#
Resolves the query string for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the query string.
- Returns:
The resolved query string.
- Return type:
- property resource: SupportedEndpoints#
- property results_ids: list [str ] | None #
Get a list of accessions from the retrieved metadata results, if available.
- sub_url(**kwargs)#
Constructs the sub-URL for the endpoint based on the current parameters.
- Returns:
The constructed sub-URL, or None if the endpoint module is not set.
- Return type:
str or None
- to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#
Convert the current or provided metadata to a pandas DataFrame.
- Parameters:
data (list of dict, optional) – List of records to convert. If None, uses self._results or self._previewed_page.
expand_nested_dicts (list of str, optional) – List of keys to expand into separate columns.
rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.
**kwargs – Additional keyword arguments passed to pd.DataFrame.
- Returns:
DataFrame containing the metadata.
- Return type:
pd.DataFrame | None
- Raises:
RuntimeError – If no data is available to convert.
- to_json(data=None, orient='records', lines=True, **json_kwargs)#
Convert the current metadata to a JSON string or save it to a file.
- Parameters:
- Returns:
The JSON string representation of the metadata, or None if no data is available.
- Return type:
str or None
- Raises:
RuntimeError – If no data is available to convert.
- to_list(data=None)#
Convert the current or provided metadata to a list of dictionaries.
- Parameters:
data (dict of int to list of dict, optional) – The paginated data to convert. If None, uses self.data.
- Returns:
A list of metadata records as dictionaries, or None if no data is available.
- Return type:
- Raises:
RuntimeError – If no data is available to convert.
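The data argument is described as a mapping from page number to that page's records. A minimal sketch of flattening such a mapping into one list follows; flatten_pages is a hypothetical helper written for illustration and may differ from what to_list() does internally.

```python
def flatten_pages(pages):
    """Flatten {page_number: [record, ...]} into a single list of records."""
    if not pages:
        # Mirrors the documented behaviour of raising when nothing is available.
        raise RuntimeError("No data is available to convert.")
    records = []
    for page_number in sorted(pages):  # keep records in page order
        records.extend(pages[page_number])
    return records


pages = {
    2: [{"accession": "C"}],
    1: [{"accession": "A"}, {"accession": "B"}],
}
flat = flatten_pages(pages)
print(flat)
```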
- to_polars(data=None, **polars_kwargs)#
Convert the current metadata to a Polars DataFrame.
- Parameters:
- Returns:
A Polars DataFrame containing the metadata.
- Return type:
pl.DataFrame
- Raises:
RuntimeError – If no data is available to convert.
- url_path(**kwargs)#
Constructs the full URL path for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the URL construction.
- Returns:
The constructed URL path.
- Return type:
- validate_endpoint_kwargs(**kwargs)#
Validates the provided keyword arguments against the supported parameters of the endpoint module.
- Parameters:
**kwargs – Keyword arguments to validate.
- Returns:
The validated keyword arguments.
- Return type:
dict of str to Any
- Raises:
ValueError – If any provided keyword argument is not supported by the endpoint module.
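The validation described above amounts to comparing the supplied keyword names with a set of supported names. A minimal sketch, assuming the supported names are available as a plain set; validate_kwargs and SUPPORTED are hypothetical and stand in for whatever the endpoint module actually exposes.

```python
SUPPORTED = {"accession", "biome_lineage", "page_size"}  # illustrative names only


def validate_kwargs(supported, **kwargs):
    """Return kwargs unchanged if every key is supported, else raise ValueError."""
    unsupported = sorted(set(kwargs) - set(supported))
    if unsupported:
        raise ValueError(f"Unsupported parameter(s): {', '.join(unsupported)}")
    return dict(kwargs)


validated = validate_kwargs(SUPPORTED, accession="MGYS00001234")
print(validated)

try:
    validate_kwargs(SUPPORTED, colour="red")
except ValueError as exc:
    print(exc)
```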
- config: MgnipyConfig#
- exec: QueryExecutor#
- class mgnipy.V2.proxies.StudyDetail(id=None, *, accession=None, config=None, **kwargs)[source]#
Bases:
MGnifyDetail
- async afirst()#
Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- async aget(*args, **kwargs)#
- async aget_list(resource, access_param, fetch=True, explain=False)#
Get a list proxy for a specific accession/pubmed_id/catalogue_id detail.
- Parameters:
resource (str) – Valid child resource name (e.g. one listed by list_relationships()), such as "samples" for a study detail or "analyses" for a run detail.
access_param (dict[str, str]) – A dictionary containing the parameter needed to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
fetch (bool) – Whether to immediately fetch the detail after creating the proxy.
explain (bool) – Whether to print example URLs that would be called.
- Returns:
A proxy for the next resource.
- Return type:
Examples
samples = await study.aget_list("samples", {"accession": "MGYS00001234"})
- async apage(*args, **kwargs)#
- describe_relationships()#
- dry_run(*, verbose=True)#
Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.
- Return type:
None
- Parameters:
verbose (bool)
- property emgapi_resource: str | None #
Retrieves the name of the endpoint resource based on the endpoint module.
- Returns:
The name of the endpoint resource, or None if the endpoint module is not set.
- Return type:
str or None
- explain(head=None)#
Print example URLs that would be called. The actual requests are handled by the client.
- Parameters:
head (int | None)
- Return type:
None
- filter(**filters)#
Update the parameters for the API call to filter results.
- Parameters:
**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.
- Returns:
A new QuerySet instance with updated parameters for filtering results.
- Return type:
- first()#
Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- get(*args, **kwargs)#
- get_list(resource, access_param, fetch=True, explain=False)#
Get a list proxy for a specific accession/pubmed_id/catalogue_id detail.
- Parameters:
resource (str) – Valid child resource name (e.g. one listed by list_relationships()), such as "samples" for a study detail or "analyses" for a run detail.
access_param (dict[str, str]) – A dictionary containing the parameter needed to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
fetch (bool) – Whether to immediately fetch the detail after creating the proxy.
explain (bool) – Whether to print example URLs that would be called.
- Returns:
A proxy for the next resource.
- Return type:
Examples
samples = study.get_list("samples", {"accession": "MGYS00001234"})
- property identifier: str | None #
Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.
- Returns:
The identifier value corresponding to the resource type, or None if not available.
- Return type:
str or None
- list_supported_params()#
Lists supported keyword arguments for the endpoint module.
- list_urls()#
Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.
- page(*args, **kwargs)#
- page_size(n)#
Set the page size for paginated API calls.
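The page size determines how many requests a full retrieval needs, which is the kind of estimate dry_run() is described as reporting. A minimal sketch of that arithmetic, assuming the API reports a total record count; estimate_pages is a hypothetical helper, not part of mgnipy.

```python
import math


def estimate_pages(total_records, page_size):
    """Number of requests needed to fetch total_records at page_size per page."""
    if page_size <= 0:
        raise ValueError("page_size must be a positive integer")
    # A partially filled last page still costs one request, hence the ceiling.
    return math.ceil(total_records / page_size)


print(estimate_pages(total_records=1042, page_size=25))
```

A larger page size means fewer round trips but bigger individual responses.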
- property pagination_status: bool #
Check if the current resource requires pagination based on its supported keyword arguments.
- Returns:
True if the resource is paginated, False otherwise.
- Return type:
bool
- preview()#
Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.
- Returns:
A DataFrame containing the metadata from the first page of results.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If the API call fails or if no data is available to preview.
- property request_url: str #
Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.
- Returns:
The constructed URL for the API request.
- Return type:
str
- resolve_query_string(**kwargs)#
Resolves the query string for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the query string.
- Returns:
The resolved query string.
- Return type:
- property resource: SupportedEndpoints#
- property results_ids: list[str] | None #
Get a list of accessions from the retrieved metadata results, if available.
- sub_url(**kwargs)#
Constructs the sub-URL for the endpoint based on the current parameters.
- Returns:
The constructed sub-URL, or None if the endpoint module is not set.
- Return type:
str or None
- to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#
Convert the current or provided metadata to a pandas DataFrame.
- Parameters:
data (list of dict, optional) – List of records to convert. If None, uses self._results or self._previewed_page.
expand_nested_dicts (list of str, optional) – List of keys to expand into separate columns.
rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.
**kwargs – Additional keyword arguments passed to pd.DataFrame.
- Returns:
DataFrame containing the metadata.
- Return type:
pd.DataFrame | None
- Raises:
RuntimeError – If no data is available to convert.
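What expanding nested keys into columns and renaming columns mean for a single record can be sketched without pandas. The helper expand_record, the dotted column naming, and the sample record are assumptions made for illustration; the column names that to_df() actually produces may differ.

```python
def expand_record(record, keys_to_expand, rename=None):
    """Lift the items of selected nested dicts into top-level columns."""
    rename = rename or {}
    row = {}
    for key, value in record.items():
        if key in keys_to_expand and isinstance(value, dict):
            # One column per nested item, prefixed with the parent key.
            for sub_key, sub_value in value.items():
                row[f"{key}.{sub_key}"] = sub_value
        else:
            row[key] = value
    # Apply the old-name -> new-name mapping last.
    return {rename.get(name, name): value for name, value in row.items()}


record = {"accession": "MGYS00001234", "biome": {"lineage": "root", "name": "Root"}}
row = expand_record(record, ["biome"], rename={"biome.name": "biome_name"})
print(row)
```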
- to_json(data=None, orient='records', lines=True, **json_kwargs)#
Convert the current metadata to a JSON string or save it to a file.
- Parameters:
- Returns:
The JSON string representation of the metadata, or None if no data is available.
- Return type:
str or None
- Raises:
RuntimeError – If no data is available to convert.
- to_list(data=None)#
Convert the current or provided metadata to a list of dictionaries.
- Parameters:
data (dict of int to list of dict, optional) – The paginated data to convert. If None, uses self.data.
- Returns:
A list of metadata records as dictionaries, or None if no data is available.
- Return type:
- Raises:
RuntimeError – If no data is available to convert.
- to_polars(data=None, **polars_kwargs)#
Convert the current metadata to a Polars DataFrame.
- Parameters:
- Returns:
A Polars DataFrame containing the metadata.
- Return type:
pl.DataFrame
- Raises:
RuntimeError – If no data is available to convert.
- url_path(**kwargs)#
Constructs the full URL path for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the URL construction.
- Returns:
The constructed URL path.
- Return type:
- validate_endpoint_kwargs(**kwargs)#
Validates the provided keyword arguments against the supported parameters of the endpoint module.
- Parameters:
**kwargs – Keyword arguments to validate.
- Returns:
The validated keyword arguments.
- Return type:
dict of str to Any
- Raises:
ValueError – If any provided keyword argument is not supported by the endpoint module.
- config: MgnipyConfig#
- exec: QueryExecutor#
- class mgnipy.V2.proxies.SampleDetail(id=None, *, accession=None, config=None, **kwargs)[source]#
Bases:
MGnifyDetail
- async afirst()#
Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- async aget(*args, **kwargs)#
- async aget_list(resource, access_param, fetch=True, explain=False)#
Get a list proxy for a specific accession/pubmed_id/catalogue_id detail.
- Parameters:
resource (str) – Valid child resource name (e.g. one listed by list_relationships()), such as "samples" for a study detail or "analyses" for a run detail.
access_param (dict[str, str]) – A dictionary containing the parameter needed to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
fetch (bool) – Whether to immediately fetch the detail after creating the proxy.
explain (bool) – Whether to print example URLs that would be called.
- Returns:
A proxy for the next resource.
- Return type:
Examples
samples = await study.aget_list("samples", {"accession": "MGYS00001234"})
- async apage(*args, **kwargs)#
- describe_relationships()#
- dry_run(*, verbose=True)#
Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.
- Return type:
None
- Parameters:
verbose (bool)
- property emgapi_resource: str | None #
Retrieves the name of the endpoint resource based on the endpoint module.
- Returns:
The name of the endpoint resource, or None if the endpoint module is not set.
- Return type:
str or None
- explain(head=None)#
Print example URLs that would be called. The actual requests are handled by the client.
- Parameters:
head (int | None)
- Return type:
None
- filter(**filters)#
Update the parameters for the API call to filter results.
- Parameters:
**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.
- Returns:
A new QuerySet instance with updated parameters for filtering results.
- Return type:
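Because filter() is documented as returning a new QuerySet rather than modifying the current one, a base query can be reused safely. A minimal sketch of that pattern with a frozen dataclass; the Query class is a hypothetical stand-in and is not how mgnipy implements QuerySet.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Query:
    """Hypothetical stand-in for a query object with copy-on-filter semantics."""

    resource: str
    params: dict = field(default_factory=dict)

    def filter(self, **filters):
        # Build a new instance with merged parameters; self is left untouched.
        return replace(self, params={**self.params, **filters})


base = Query("samples")
narrowed = base.filter(biome_lineage="root")
print(base.params, narrowed.params)
```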
- first()#
Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- get(*args, **kwargs)#
- get_list(resource, access_param, fetch=True, explain=False)#
Get a list proxy for a specific accession/pubmed_id/catalogue_id detail.
- Parameters:
resource (str) – Valid child resource name (e.g. one listed by list_relationships()), such as "samples" for a study detail or "analyses" for a run detail.
access_param (dict[str, str]) – A dictionary containing the parameter needed to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
fetch (bool) – Whether to immediately fetch the detail after creating the proxy.
explain (bool) – Whether to print example URLs that would be called.
- Returns:
A proxy for the next resource.
- Return type:
Examples
samples = study.get_list("samples", {"accession": "MGYS00001234"})
- property identifier: str | None #
Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.
- Returns:
The identifier value corresponding to the resource type, or None if not available.
- Return type:
str or None
- list_supported_params()#
Lists supported keyword arguments for the endpoint module.
- list_urls()#
Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.
- page(*args, **kwargs)#
- page_size(n)#
Set the page size for paginated API calls.
- property pagination_status: bool #
Check if the current resource requires pagination based on its supported keyword arguments.
- Returns:
True if the resource is paginated, False otherwise.
- Return type:
bool
- preview()#
Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.
- Returns:
A DataFrame containing the metadata from the first page of results.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If the API call fails or if no data is available to preview.
- property request_url: str #
Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.
- Returns:
The constructed URL for the API request.
- Return type:
str
- resolve_query_string(**kwargs)#
Resolves the query string for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the query string.
- Returns:
The resolved query string.
- Return type:
- property resource: SupportedEndpoints#
- property results_ids: list[str] | None #
Get a list of accessions from the retrieved metadata results, if available.
- sub_url(**kwargs)#
Constructs the sub-URL for the endpoint based on the current parameters.
- Returns:
The constructed sub-URL, or None if the endpoint module is not set.
- Return type:
str or None
- to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#
Convert the current or provided metadata to a pandas DataFrame.
- Parameters:
data (list of dict, optional) – List of records to convert. If None, uses self._results or self._previewed_page.
expand_nested_dicts (list of str, optional) – List of keys to expand into separate columns.
rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.
**kwargs – Additional keyword arguments passed to pd.DataFrame.
- Returns:
DataFrame containing the metadata.
- Return type:
pd.DataFrame | None
- Raises:
RuntimeError – If no data is available to convert.
- to_json(data=None, orient='records', lines=True, **json_kwargs)#
Convert the current metadata to a JSON string or save it to a file.
- Parameters:
- Returns:
The JSON string representation of the metadata, or None if no data is available.
- Return type:
str or None
- Raises:
RuntimeError – If no data is available to convert.
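The defaults orient='records' and lines=True use the same keyword names as pandas DataFrame.to_json, where that combination produces line-delimited JSON with one record per line. Whether to_json() here delegates to pandas is an assumption; the sketch below only shows the shape of such output using the standard library, with made-up records.

```python
import json

records = [
    {"accession": "MGYS00000001", "samples_count": 3},
    {"accession": "MGYS00000002", "samples_count": 7},
]

# One compact JSON object per line, in record order (JSON Lines).
json_lines = "\n".join(json.dumps(record) for record in records)
print(json_lines)

# Reading it back: parse each non-empty line independently.
parsed = [json.loads(line) for line in json_lines.splitlines() if line]
```

Line-delimited output can be appended to and streamed record by record, which a single JSON array does not allow.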
- to_list(data=None)#
Convert the current or provided metadata to a list of dictionaries.
- Parameters:
data (dict of int to list of dict, optional) – The paginated data to convert. If None, uses self.data.
- Returns:
A list of metadata records as dictionaries, or None if no data is available.
- Return type:
- Raises:
RuntimeError – If no data is available to convert.
- to_polars(data=None, **polars_kwargs)#
Convert the current metadata to a Polars DataFrame.
- Parameters:
- Returns:
A Polars DataFrame containing the metadata.
- Return type:
pl.DataFrame
- Raises:
RuntimeError – If no data is available to convert.
- url_path(**kwargs)#
Constructs the full URL path for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the URL construction.
- Returns:
The constructed URL path.
- Return type:
- validate_endpoint_kwargs(**kwargs)#
Validates the provided keyword arguments against the supported parameters of the endpoint module.
- Parameters:
**kwargs – Keyword arguments to validate.
- Returns:
The validated keyword arguments.
- Return type:
dict of str to Any
- Raises:
ValueError – If any provided keyword argument is not supported by the endpoint module.
- config: MgnipyConfig#
- exec: QueryExecutor#
- class mgnipy.V2.proxies.RunDetail(id=None, *, accession=None, config=None, **kwargs)[source]#
Bases:
MGnifyDetail
- async afirst()#
Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- async aget(*args, **kwargs)#
- async aget_list(resource, access_param, fetch=True, explain=False)#
Get a list proxy for a specific accession/pubmed_id/catalogue_id detail.
- Parameters:
resource (str) – Valid child resource name (e.g. one listed by list_relationships()), such as "samples" for a study detail or "analyses" for a run detail.
access_param (dict[str, str]) – A dictionary containing the parameter needed to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
fetch (bool) – Whether to immediately fetch the detail after creating the proxy.
explain (bool) – Whether to print example URLs that would be called.
- Returns:
A proxy for the next resource.
- Return type:
Examples
samples = await study.aget_list("samples", {"accession": "MGYS00001234"})
- async apage(*args, **kwargs)#
- describe_relationships()#
- dry_run(*, verbose=True)#
Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.
- Return type:
None
- Parameters:
verbose (bool)
- property emgapi_resource: str | None #
Retrieves the name of the endpoint resource based on the endpoint module.
- Returns:
The name of the endpoint resource, or None if the endpoint module is not set.
- Return type:
str or None
- explain(head=None)#
Print example URLs that would be called. The actual requests are handled by the client.
- Parameters:
head (int | None)
- Return type:
None
- filter(**filters)#
Update the parameters for the API call to filter results.
- Parameters:
**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.
- Returns:
A new QuerySet instance with updated parameters for filtering results.
- Return type:
- first()#
Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- get(*args, **kwargs)#
- get_list(resource, access_param, fetch=True, explain=False)#
Get a list proxy for a specific accession/pubmed_id/catalogue_id detail.
- Parameters:
resource (str) – Valid child resource name (e.g. one listed by list_relationships()), such as "samples" for a study detail or "analyses" for a run detail.
access_param (dict[str, str]) – A dictionary containing the parameter needed to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
fetch (bool) – Whether to immediately fetch the detail after creating the proxy.
explain (bool) – Whether to print example URLs that would be called.
- Returns:
A proxy for the next resource.
- Return type:
Examples
samples = study.get_list("samples", {"accession": "MGYS00001234"})
- property identifier: str | None #
Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.
- Returns:
The identifier value corresponding to the resource type, or None if not available.
- Return type:
str or None
- list_supported_params()#
Lists supported keyword arguments for the endpoint module.
- list_urls()#
Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.
- page(*args, **kwargs)#
- page_size(n)#
Set the page size for paginated API calls.
- property pagination_status: bool #
Check if the current resource requires pagination based on its supported keyword arguments.
- Returns:
True if the resource is paginated, False otherwise.
- Return type:
bool
- preview()#
Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.
- Returns:
A DataFrame containing the metadata from the first page of results.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If the API call fails or if no data is available to preview.
- property request_url: str #
Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.
- Returns:
The constructed URL for the API request.
- Return type:
str
- resolve_query_string(**kwargs)#
Resolves the query string for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the query string.
- Returns:
The resolved query string.
- Return type:
- property resource: SupportedEndpoints#
- property results_ids: list[str] | None #
Get a list of accessions from the retrieved metadata results, if available.
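Collecting accessions from retrieved records, with None when nothing has been retrieved, can be sketched as follows. The record layout (a top-level "accession" key) and the helper collect_accessions are assumptions for illustration, not the library's implementation.

```python
def collect_accessions(records):
    """Return the accession of each record, or None if there are no records."""
    if not records:
        return None
    # Skip records that lack the key instead of failing on them.
    return [record["accession"] for record in records if "accession" in record]


records = [
    {"accession": "ERR0000001", "instrument": "x"},
    {"instrument": "y"},
    {"accession": "ERR0000002"},
]
ids = collect_accessions(records)
print(ids)
```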
- sub_url(**kwargs)#
Constructs the sub-URL for the endpoint based on the current parameters.
- Returns:
The constructed sub-URL, or None if the endpoint module is not set.
- Return type:
str or None
- to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#
Convert the current or provided metadata to a pandas DataFrame.
- Parameters:
data (list of dict, optional) – List of records to convert. If None, uses self._results or self._previewed_page.
expand_nested_dicts (list of str, optional) – List of keys to expand into separate columns.
rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.
**kwargs – Additional keyword arguments passed to pd.DataFrame.
- Returns:
DataFrame containing the metadata.
- Return type:
pd.DataFrame | None
- Raises:
RuntimeError – If no data is available to convert.
- to_json(data=None, orient='records', lines=True, **json_kwargs)#
Convert the current metadata to a JSON string or save it to a file.
- Parameters:
- Returns:
The JSON string representation of the metadata, or None if no data is available.
- Return type:
str or None
- Raises:
RuntimeError – If no data is available to convert.
- to_list(data=None)#
Convert the current or provided metadata to a list of dictionaries.
- Parameters:
data (dict of int to list of dict, optional) – The paginated data to convert. If None, uses self.data.
- Returns:
A list of metadata records as dictionaries, or None if no data is available.
- Return type:
- Raises:
RuntimeError – If no data is available to convert.
- to_polars(data=None, **polars_kwargs)#
Convert the current metadata to a Polars DataFrame.
- Parameters:
- Returns:
A Polars DataFrame containing the metadata.
- Return type:
pl.DataFrame
- Raises:
RuntimeError – If no data is available to convert.
- url_path(**kwargs)#
Constructs the full URL path for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the URL construction.
- Returns:
The constructed URL path.
- Return type:
- validate_endpoint_kwargs(**kwargs)#
Validates the provided keyword arguments against the supported parameters of the endpoint module.
- Parameters:
**kwargs – Keyword arguments to validate.
- Returns:
The validated keyword arguments.
- Return type:
dict of str to Any
- Raises:
ValueError – If any provided keyword argument is not supported by the endpoint module.
- config: MgnipyConfig#
- exec: QueryExecutor#
- class mgnipy.V2.proxies.AnalysisDetail(id=None, *, accession=None, config=None, **kwargs)[source]#
Bases:
MGnifyDetail
- async afirst()#
Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
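The a-prefixed methods (afirst, aget, aget_list, apage) are coroutines, so they must be awaited inside an event loop. A minimal sketch of awaiting several first-page requests concurrently; FakeDetail is a stand-in written for this example and performs no network access, and the accessions are made up.

```python
import asyncio


class FakeDetail:
    """Stand-in for a detail proxy; not part of mgnipy."""

    def __init__(self, accession):
        self.accession = accession

    async def afirst(self):
        await asyncio.sleep(0)  # placeholder for the network round trip
        return {"accession": self.accession}


async def fetch_all(accessions):
    details = [FakeDetail(accession) for accession in accessions]
    # gather() runs the coroutines concurrently and keeps the input order.
    return await asyncio.gather(*(detail.afirst() for detail in details))


pages = asyncio.run(fetch_all(["MGYA00000001", "MGYA00000002"]))
print(pages)
```

In a notebook or another environment that already runs an event loop, the coroutine would be awaited directly instead of being passed to asyncio.run().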
- async aget(*args, **kwargs)#
- async aget_list(resource, access_param, fetch=True, explain=False)#
Get a list proxy for a specific accession/pubmed_id/catalogue_id detail.
- Parameters:
resource (str) – Valid child resource name (e.g. one listed by list_relationships()), such as "samples" for a study detail or "analyses" for a run detail.
access_param (dict[str, str]) – A dictionary containing the parameter needed to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
fetch (bool) – Whether to immediately fetch the detail after creating the proxy.
explain (bool) – Whether to print example URLs that would be called.
- Returns:
A proxy for the next resource.
- Return type:
Examples
samples = await study.aget_list("samples", {"accession": "MGYS00001234"})
- async apage(*args, **kwargs)#
- describe_relationships()#
- dry_run(*, verbose=True)#
Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.
- Return type:
None
- Parameters:
verbose (bool)
- property emgapi_resource: str | None #
Retrieves the name of the endpoint resource based on the endpoint module.
- Returns:
The name of the endpoint resource, or None if the endpoint module is not set.
- Return type:
str or None
- explain(head=None)#
Print example URLs that would be called. The actual requests are handled by the client.
- Parameters:
head (int | None)
- Return type:
None
- filter(**filters)#
Update the parameters for the API call to filter results.
- Parameters:
**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.
- Returns:
A new QuerySet instance with updated parameters for filtering results.
- Return type:
- first()#
Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- get(*args, **kwargs)#
- get_list(resource, access_param, fetch=True, explain=False)#
Get a list proxy for a specific accession/pubmed_id/catalogue_id detail.
- Parameters:
resource (str) – Valid child resource name (e.g. one listed by list_relationships()), such as "samples" for a study detail or "analyses" for a run detail.
access_param (dict[str, str]) – A dictionary containing the parameter needed to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
fetch (bool) – Whether to immediately fetch the detail after creating the proxy.
explain (bool) – Whether to print example URLs that would be called.
- Returns:
A proxy for the next resource.
- Return type:
Examples
samples = study.get_list("samples", {"accession": "MGYS00001234"})
- property identifier: str | None #
Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.
- Returns:
The identifier value corresponding to the resource type, or None if not available.
- Return type:
str or None
- list_supported_params()#
Lists supported keyword arguments for the endpoint module.
- list_urls()#
Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.
- page(*args, **kwargs)#
- page_size(n)#
Set the page size for paginated API calls.
- property pagination_status: bool #
Check if the current resource requires pagination based on its supported keyword arguments.
- Returns:
True if the resource is paginated, False otherwise.
- Return type:
bool
- preview()#
Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.
- Returns:
A DataFrame containing the metadata from the first page of results.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If the API call fails or if no data is available to preview.
- property request_url: str #
Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.
- Returns:
The constructed URL for the API request.
- Return type:
str
- resolve_query_string(**kwargs)#
Resolves the query string for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the query string.
- Returns:
The resolved query string.
- Return type:
- property resource: SupportedEndpoints#
- property results_ids: list[str] | None #
Get a list of accessions from the retrieved metadata results, if available.
- sub_url(**kwargs)#
Constructs the sub-URL for the endpoint based on the current parameters.
- Returns:
The constructed sub-URL, or None if the endpoint module is not set.
- Return type:
str or None
- to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#
Convert the current or provided metadata to a pandas DataFrame.
- Parameters:
data (list of dict, optional) – List of records to convert. If None, uses self._results or self._previewed_page.
expand_nested_dicts (list of str, optional) – List of keys to expand into separate columns.
rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.
**kwargs – Additional keyword arguments passed to pd.DataFrame.
- Returns:
DataFrame containing the metadata.
- Return type:
pd.DataFrame | None
- Raises:
RuntimeError – If no data is available to convert.
- to_json(data=None, orient='records', lines=True, **json_kwargs)#
Convert the current metadata to a JSON string or save it to a file.
- Parameters:
- Returns:
The JSON string representation of the metadata, or None if no data is available.
- Return type:
str or None
- Raises:
RuntimeError – If no data is available to convert.
- to_list(data=None)#
Convert the current or provided metadata to a list of dictionaries.
- Parameters:
data (dict of int to list of dict, optional) – The paginated data to convert. If None, uses self.data.
- Returns:
A list of metadata records as dictionaries, or None if no data is available.
- Return type:
- Raises:
RuntimeError – If no data is available to convert.
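The conversion described for to_list() can be pictured as flattening a mapping of page numbers to record lists. The following is a minimal standalone sketch, not the library's implementation: the helper name pages_to_list and the sample records are hypothetical, and ordering the output by page number is an assumption.

```python
from typing import Any, Optional


def pages_to_list(
    pages: Optional[dict[int, list[dict[str, Any]]]],
) -> list[dict[str, Any]]:
    """Flatten {page_number: [records]} into one list, ordered by page number."""
    if not pages:
        # Mirrors the documented RuntimeError when there is nothing to convert.
        raise RuntimeError("No data is available to convert.")
    records: list[dict[str, Any]] = []
    for page_number in sorted(pages):
        records.extend(pages[page_number])
    return records


# Hypothetical paginated payload; pages may arrive out of order.
pages = {
    2: [{"accession": "C"}],
    1: [{"accession": "A"}, {"accession": "B"}],
}
flat = pages_to_list(pages)
```

Sorting by page number keeps the result deterministic even when pages were fetched concurrently.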
- to_polars(data=None, **polars_kwargs)#
Convert the current metadata to a Polars DataFrame.
- Parameters:
- Returns:
A Polars DataFrame containing the metadata.
- Return type:
pl.DataFrame
- Raises:
RuntimeError – If no data is available to convert.
- url_path(**kwargs)#
Constructs the full URL path for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the URL construction.
- Returns:
The constructed URL path.
- Return type:
- validate_endpoint_kwargs(**kwargs)#
Validates the provided keyword arguments against the supported parameters of the endpoint module.
- Parameters:
**kwargs – Keyword arguments to validate.
- Returns:
The validated keyword arguments.
- Return type:
dict of str to Any
- Raises:
ValueError – If any provided keyword argument is not supported by the endpoint module.
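The validation contract above (return the accepted keyword arguments, raise ValueError on anything unsupported) can be sketched in a few lines. This is an illustration only: validate_kwargs and the SUPPORTED set are hypothetical stand-ins, and the real method derives the supported names from the endpoint module.

```python
from typing import Any


def validate_kwargs(supported: set[str], **kwargs: Any) -> dict[str, Any]:
    """Return kwargs unchanged if every name is supported, else raise ValueError."""
    unknown = sorted(set(kwargs) - supported)
    if unknown:
        raise ValueError(f"Unsupported parameter(s): {', '.join(unknown)}")
    return dict(kwargs)


# Hypothetical set of supported parameter names for some endpoint.
SUPPORTED = {"accession", "page", "page_size"}

validated = validate_kwargs(SUPPORTED, accession="MGYS00001234", page=1)
```

Reporting every unknown name at once, rather than stopping at the first, saves the caller a round of trial and error.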
- config: MgnipyConfig#
- exec: QueryExecutor#
- class mgnipy.V2.proxies.GenomeDetail(id=None, *, accession=None, config=None, **kwargs)[source]#
Bases:
MGnifyDetail
- async afirst()#
Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- async aget(*args, **kwargs)#
- async aget_list(resource, access_param, fetch=True, explain=False)#
Get list proxy for a specific accession/pubmed_id/catalogue_id detail.
- Parameters:
resource (str) – Name of a valid child resource, as listed by list_relationships(); for example "samples" for a study detail or "analyses" for a run detail.
access_param (dict[str, str]) – A dictionary holding the parameter that identifies the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
fetch (bool) – Whether to fetch the detail immediately after creating the proxy.
explain (bool) – Whether to print example URLs that would be called.
- Returns:
A proxy for the next resource.
- Return type:
Examples
samples = await study.aget_list("samples", {"accession": "MGYS00001234"})
- async apage(*args, **kwargs)#
- describe_relationships()#
- dry_run(*, verbose=True)#
Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.
- Return type:
None
- Parameters:
verbose (bool )
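The page estimate that dry_run() reports comes down to simple arithmetic once the total record count and the page size are known. A minimal sketch, assuming the API exposes a total count; the helper name plan_pages is hypothetical and this is not the library's code:

```python
import math


def plan_pages(total_records: int, page_size: int) -> int:
    """Estimate how many requests are needed to retrieve every record."""
    if page_size <= 0:
        raise ValueError("page_size must be a positive integer")
    # A partially filled last page still costs one request, hence the ceiling.
    return math.ceil(total_records / page_size)


full_pages = plan_pages(100, 25)
with_remainder = plan_pages(101, 25)
```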
- property emgapi_resource: str | None #
Retrieves the name of the endpoint resource based on the endpoint module.
- Returns:
The name of the endpoint resource, or None if the endpoint module is not set.
- Return type:
str or None
- explain(head=None)#
Print example URLs that would be called. Actual requests handled by client.
- Parameters:
head (int | None)
- Return type:
None
- filter(**filters)#
Update the parameters for the API call to filter results.
- Parameters:
**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.
- Returns:
A new QuerySet instance with updated parameters for filtering results.
- Return type:
- first()#
Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- get(*args, **kwargs)#
- get_list(resource, access_param, fetch=True, explain=False)#
Get list proxy for a specific accession/pubmed_id/catalogue_id detail.
- Parameters:
resource (str) – Name of a valid child resource, as listed by list_relationships(); for example "samples" for a study detail or "analyses" for a run detail.
access_param (dict[str, str]) – A dictionary holding the parameter that identifies the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
fetch (bool) – Whether to fetch the detail immediately after creating the proxy.
explain (bool) – Whether to print example URLs that would be called.
- Returns:
A proxy for the next resource.
- Return type:
Examples
samples = study.get_list("samples", {"accession": "MGYS00001234"})
- property identifier: str | None #
Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.
- Returns:
The identifier value corresponding to the resource type, or None if not available.
- Return type:
str or None
- list_supported_params()#
Lists supported keyword arguments for the endpoint module.
- list_urls()#
Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.
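To make the idea behind list_urls() concrete, the sketch below builds one URL per page from a base URL and a set of query parameters. The base URL, the parameter names page and page_size, and the helper build_page_urls are all assumptions made for illustration; the actual URL layout is decided by the library and the MGnify API.

```python
from urllib.parse import urlencode


def build_page_urls(base_url: str, params: dict, total_pages: int) -> list[str]:
    """Return one request URL per page, numbering pages from 1."""
    urls = []
    for page in range(1, total_pages + 1):
        # Merge the fixed filters with the page number for this request.
        query = urlencode({**params, "page": page})
        urls.append(f"{base_url}?{query}")
    return urls


# Placeholder host and path, not a real MGnify endpoint.
urls = build_page_urls("https://example.org/api/samples", {"page_size": 25}, 3)
```

Inspecting such a list before calling get() shows how many requests a query will trigger.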
- page(*args, **kwargs)#
- page_size(n)#
Set the page size for paginated API calls.
- property pagination_status: bool #
Check if the current resource requires pagination based on its supported keyword arguments.
- Returns:
True if the resource is paginated, False otherwise.
- Return type:
bool
- preview()#
Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.
- Returns:
A DataFrame containing the metadata from the first page of results.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If the API call fails or if no data is available to preview.
- property request_url: str #
Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.
- Returns:
The constructed URL for the API request.
- Return type:
str
- resolve_query_string(**kwargs)#
Resolves the query string for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the query string.
- Returns:
The resolved query string.
- Return type:
- property resource: SupportedEndpoints#
- property results_ids: list [str ] | None #
Get a list of accessions from the retrieved metadata results, if available.
- sub_url(**kwargs)#
Constructs the sub-URL for the endpoint based on the current parameters.
- Returns:
The constructed sub-URL, or None if the endpoint module is not set.
- Return type:
str or None
- to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#
Convert the current or provided metadata to a pandas DataFrame.
- Parameters:
data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.
expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.
rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.
**kwargs – Additional keyword arguments passed to pd.DataFrame.
- Returns:
DataFrame containing the metadata.
- Return type:
pd.DataFrame | None
- Raises:
RuntimeError – If no data is available to convert.
- to_json(data=None, orient='records', lines=True, **json_kwargs)#
Convert the current metadata to a JSON string or save it to a file.
- Parameters:
- Returns:
The JSON string representation of the metadata, or None if no data is available.
- Return type:
str or None
- Raises:
RuntimeError – If no data is available to convert.
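The defaults orient='records' and lines=True follow the pandas naming for newline-delimited JSON, where each record becomes one JSON object on its own line. The sketch below reproduces that layout with the standard library only. It assumes to_json() follows the pandas convention; the helper records_to_json and the sample records are hypothetical.

```python
import json
from typing import Any


def records_to_json(records: list[dict[str, Any]], lines: bool = True) -> str:
    """Serialise records as JSON Lines (lines=True) or as a single JSON array."""
    if not records:
        # Mirrors the documented RuntimeError when nothing is available.
        raise RuntimeError("No data is available to convert.")
    if lines:
        # One JSON object per line, with no enclosing array.
        return "\n".join(json.dumps(record) for record in records)
    return json.dumps(records)


records = [{"accession": "A", "n": 1}, {"accession": "B", "n": 2}]
json_lines = records_to_json(records)
json_array = records_to_json(records, lines=False)
```

The line-delimited form can be appended to and streamed record by record, which suits large paginated downloads.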
- to_list(data=None)#
Convert the current or provided metadata to a list of dictionaries.
- Parameters:
data (dict of int to list of dict, optional) – The paginated data to convert. If None, uses self.data.
- Returns:
A list of metadata records as dictionaries, or None if no data is available.
- Return type:
- Raises:
RuntimeError – If no data is available to convert.
- to_polars(data=None, **polars_kwargs)#
Convert the current metadata to a Polars DataFrame.
- Parameters:
- Returns:
A Polars DataFrame containing the metadata.
- Return type:
pl.DataFrame
- Raises:
RuntimeError – If no data is available to convert.
- url_path(**kwargs)#
Constructs the full URL path for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the URL construction.
- Returns:
The constructed URL path.
- Return type:
- validate_endpoint_kwargs(**kwargs)#
Validates the provided keyword arguments against the supported parameters of the endpoint module.
- Parameters:
**kwargs – Keyword arguments to validate.
- Returns:
The validated keyword arguments.
- Return type:
dict of str to Any
- Raises:
ValueError – If any provided keyword argument is not supported by the endpoint module.
- config: MgnipyConfig#
- exec: QueryExecutor#
- class mgnipy.V2.proxies.AssemblyDetail(id=None, *, accession=None, config=None, **kwargs)[source]#
Bases:
MGnifyDetail
- async afirst()#
Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- async aget(*args, **kwargs)#
- async aget_list(resource, access_param, fetch=True, explain=False)#
Get list proxy for a specific accession/pubmed_id/catalogue_id detail.
- Parameters:
resource (str) – Name of a valid child resource, as listed by list_relationships(); for example "samples" for a study detail or "analyses" for a run detail.
access_param (dict[str, str]) – A dictionary holding the parameter that identifies the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
fetch (bool) – Whether to fetch the detail immediately after creating the proxy.
explain (bool) – Whether to print example URLs that would be called.
- Returns:
A proxy for the next resource.
- Return type:
Examples
samples = await study.aget_list("samples", {"accession": "MGYS00001234"})
- async apage(*args, **kwargs)#
- describe_relationships()#
- dry_run(*, verbose=True)#
Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.
- Return type:
None
- Parameters:
verbose (bool )
- property emgapi_resource: str | None #
Retrieves the name of the endpoint resource based on the endpoint module.
- Returns:
The name of the endpoint resource, or None if the endpoint module is not set.
- Return type:
str or None
- explain(head=None)#
Print example URLs that would be called. Actual requests handled by client.
- Parameters:
head (int | None)
- Return type:
None
- filter(**filters)#
Update the parameters for the API call to filter results.
- Parameters:
**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.
- Returns:
A new QuerySet instance with updated parameters for filtering results.
- Return type:
- first()#
Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- get(*args, **kwargs)#
- get_list(resource, access_param, fetch=True, explain=False)#
Get list proxy for a specific accession/pubmed_id/catalogue_id detail.
- Parameters:
resource (str) – Name of a valid child resource, as listed by list_relationships(); for example "samples" for a study detail or "analyses" for a run detail.
access_param (dict[str, str]) – A dictionary holding the parameter that identifies the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
fetch (bool) – Whether to fetch the detail immediately after creating the proxy.
explain (bool) – Whether to print example URLs that would be called.
- Returns:
A proxy for the next resource.
- Return type:
Examples
samples = study.get_list("samples", {"accession": "MGYS00001234"})
- property identifier: str | None #
Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.
- Returns:
The identifier value corresponding to the resource type, or None if not available.
- Return type:
str or None
- list_supported_params()#
Lists supported keyword arguments for the endpoint module.
- list_urls()#
Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.
- page(*args, **kwargs)#
- page_size(n)#
Set the page size for paginated API calls.
- property pagination_status: bool #
Check if the current resource requires pagination based on its supported keyword arguments.
- Returns:
True if the resource is paginated, False otherwise.
- Return type:
bool
- preview()#
Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.
- Returns:
A DataFrame containing the metadata from the first page of results.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If the API call fails or if no data is available to preview.
- property request_url: str #
Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.
- Returns:
The constructed URL for the API request.
- Return type:
str
- resolve_query_string(**kwargs)#
Resolves the query string for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the query string.
- Returns:
The resolved query string.
- Return type:
- property resource: SupportedEndpoints#
- property results_ids: list [str ] | None #
Get a list of accessions from the retrieved metadata results, if available.
- sub_url(**kwargs)#
Constructs the sub-URL for the endpoint based on the current parameters.
- Returns:
The constructed sub-URL, or None if the endpoint module is not set.
- Return type:
str or None
- to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#
Convert the current or provided metadata to a pandas DataFrame.
- Parameters:
data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.
expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.
rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.
**kwargs – Additional keyword arguments passed to pd.DataFrame.
- Returns:
DataFrame containing the metadata.
- Return type:
pd.DataFrame | None
- Raises:
RuntimeError – If no data is available to convert.
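The expand_nested_dicts option is described as a list of keys whose nested dictionaries are spread into separate columns. The sketch below shows one plausible reading of that behaviour on plain records, before any DataFrame is built. The helper expand_nested, the dotted column naming, and the sample record are assumptions, not the library's actual output format.

```python
from typing import Any


def expand_nested(
    records: list[dict[str, Any]], keys: list[str], sep: str = "."
) -> list[dict[str, Any]]:
    """Replace each listed key holding a dict with one flat column per inner key."""
    rows = []
    for record in records:
        row: dict[str, Any] = {}
        for key, value in record.items():
            if key in keys and isinstance(value, dict):
                # Spread the nested dict into "<key><sep><inner_key>" columns.
                for inner_key, inner_value in value.items():
                    row[f"{key}{sep}{inner_key}"] = inner_value
            else:
                row[key] = value
        rows.append(row)
    return rows


# Hypothetical record with one nested field.
records = [{"accession": "A", "attributes": {"length": 10, "kind": "draft"}}]
flat_rows = expand_nested(records, ["attributes"])
```

Rows shaped like this can be passed straight to a DataFrame constructor, since every value is then a scalar.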
- to_json(data=None, orient='records', lines=True, **json_kwargs)#
Convert the current metadata to a JSON string or save it to a file.
- Parameters:
- Returns:
The JSON string representation of the metadata, or None if no data is available.
- Return type:
str or None
- Raises:
RuntimeError – If no data is available to convert.
- to_list(data=None)#
Convert the current or provided metadata to a list of dictionaries.
- Parameters:
data (dict of int to list of dict, optional) – The paginated data to convert. If None, uses self.data.
- Returns:
A list of metadata records as dictionaries, or None if no data is available.
- Return type:
- Raises:
RuntimeError – If no data is available to convert.
- to_polars(data=None, **polars_kwargs)#
Convert the current metadata to a Polars DataFrame.
- Parameters:
- Returns:
A Polars DataFrame containing the metadata.
- Return type:
pl.DataFrame
- Raises:
RuntimeError – If no data is available to convert.
- url_path(**kwargs)#
Constructs the full URL path for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the URL construction.
- Returns:
The constructed URL path.
- Return type:
- validate_endpoint_kwargs(**kwargs)#
Validates the provided keyword arguments against the supported parameters of the endpoint module.
- Parameters:
**kwargs – Keyword arguments to validate.
- Returns:
The validated keyword arguments.
- Return type:
dict of str to Any
- Raises:
ValueError – If any provided keyword argument is not supported by the endpoint module.
- config: MgnipyConfig#
- exec: QueryExecutor#
- class mgnipy.V2.proxies.BiomeDetail(id=None, *, biome_lineage=None, config=None, **kwargs)[source]#
Bases:
MGnifyDetail, BiomesTreeMixin
- async afirst()#
Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- async aget(*args, **kwargs)#
- async aget_list(resource, access_param, fetch=True, explain=False)#
Get list proxy for a specific accession/pubmed_id/catalogue_id detail.
- Parameters:
resource (str) – Name of a valid child resource, as listed by list_relationships(); for example "samples" for a study detail or "analyses" for a run detail.
access_param (dict[str, str]) – A dictionary holding the parameter that identifies the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
fetch (bool) – Whether to fetch the detail immediately after creating the proxy.
explain (bool) – Whether to print example URLs that would be called.
- Returns:
A proxy for the next resource.
- Return type:
Examples
samples = await study.aget_list("samples", {"accession": "MGYS00001234"})
- async apage(*args, **kwargs)#
- describe_relationships()#
- dry_run(*, verbose=True)#
Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.
- Return type:
None
- Parameters:
verbose (bool )
- property emgapi_resource: str | None #
Retrieves the name of the endpoint resource based on the endpoint module.
- Returns:
The name of the endpoint resource, or None if the endpoint module is not set.
- Return type:
str or None
- explain(head=None)#
Print example URLs that would be called. Actual requests handled by client.
- Parameters:
head (int | None)
- Return type:
None
- filter(**filters)#
Update the parameters for the API call to filter results.
- Parameters:
**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.
- Returns:
A new QuerySet instance with updated parameters for filtering results.
- Return type:
- first()#
Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- get(*args, **kwargs)#
- get_list(resource, access_param, fetch=True, explain=False)#
Get list proxy for a specific accession/pubmed_id/catalogue_id detail.
- Parameters:
resource (str) – Name of a valid child resource, as listed by list_relationships(); for example "samples" for a study detail or "analyses" for a run detail.
access_param (dict[str, str]) – A dictionary holding the parameter that identifies the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
fetch (bool) – Whether to fetch the detail immediately after creating the proxy.
explain (bool) – Whether to print example URLs that would be called.
- Returns:
A proxy for the next resource.
- Return type:
Examples
samples = study.get_list("samples", {"accession": "MGYS00001234"})
- property identifier: str | None #
Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.
- Returns:
The identifier value corresponding to the resource type, or None if not available.
- Return type:
str or None
- list_supported_params()#
Lists supported keyword arguments for the endpoint module.
- list_urls()#
Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.
- page(*args, **kwargs)#
- page_size(n)#
Set the page size for paginated API calls.
- property pagination_status: bool #
Check if the current resource requires pagination based on its supported keyword arguments.
- Returns:
True if the resource is paginated, False otherwise.
- Return type:
bool
- preview()#
Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.
- Returns:
A DataFrame containing the metadata from the first page of results.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If the API call fails or if no data is available to preview.
- property request_url: str #
Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.
- Returns:
The constructed URL for the API request.
- Return type:
str
- resolve_query_string(**kwargs)#
Resolves the query string for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the query string.
- Returns:
The resolved query string.
- Return type:
- property resource: SupportedEndpoints#
- property results_ids: list [str ] | None #
Get a list of accessions from the retrieved metadata results, if available.
- show_tree(method='compact')#
- Parameters:
method (Literal ['compact', 'show', 'print', 'horizontal', 'hshow', 'h', 'hprint', 'vertical', 'vshow', 'v', 'vprint'])
- sub_url(**kwargs)#
Constructs the sub-URL for the endpoint based on the current parameters.
- Returns:
The constructed sub-URL, or None if the endpoint module is not set.
- Return type:
str or None
- to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#
Convert the current or provided metadata to a pandas DataFrame.
- Parameters:
data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.
expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.
rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.
**kwargs – Additional keyword arguments passed to pd.DataFrame.
- Returns:
DataFrame containing the metadata.
- Return type:
pd.DataFrame | None
- Raises:
RuntimeError – If no data is available to convert.
- to_json(data=None, orient='records', lines=True, **json_kwargs)#
Convert the current metadata to a JSON string or save it to a file.
- Parameters:
- Returns:
The JSON string representation of the metadata, or None if no data is available.
- Return type:
str or None
- Raises:
RuntimeError – If no data is available to convert.
- to_list(data=None)#
Convert the current or provided metadata to a list of dictionaries.
- Parameters:
data (dict of int to list of dict, optional) – The paginated data to convert. If None, uses self.data.
- Returns:
A list of metadata records as dictionaries, or None if no data is available.
- Return type:
- Raises:
RuntimeError – If no data is available to convert.
- to_polars(data=None, **polars_kwargs)#
Convert the current metadata to a Polars DataFrame.
- Parameters:
- Returns:
A Polars DataFrame containing the metadata.
- Return type:
pl.DataFrame
- Raises:
RuntimeError – If no data is available to convert.
- property tree: Tree#
Convert the biomes metadata to a tree structure for visualization or analysis.
- Returns:
A tree representation of the biomes and their relationships.
- Return type:
Tree
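As a rough picture of what a biome tree contains, the sketch below nests lineage strings into a dictionary and renders it with indentation. It assumes lineages are colon-separated paths starting at "root", which is how MGnify biome lineages are commonly written. The functions build_tree and render are hypothetical and unrelated to the Tree type returned by the property.

```python
def build_tree(lineages: list[str], sep: str = ":") -> dict:
    """Nest lineage paths into a dict of dicts keyed by biome name."""
    tree: dict = {}
    for lineage in lineages:
        node = tree
        for part in lineage.split(sep):
            # Create the child on first sight, then descend into it.
            node = node.setdefault(part, {})
    return tree


def render(tree: dict, indent: int = 0) -> list[str]:
    """Return the tree as indented lines, children sorted by name."""
    lines = []
    for name in sorted(tree):
        lines.append("  " * indent + name)
        lines.extend(render(tree[name], indent + 1))
    return lines


tree = build_tree(["root:Environmental:Aquatic", "root:Engineered"])
outline = render(tree)
```

Joining outline with newlines gives a compact text view of the hierarchy.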
- url_path(**kwargs)#
Constructs the full URL path for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the URL construction.
- Returns:
The constructed URL path.
- Return type:
- validate_endpoint_kwargs(**kwargs)#
Validates the provided keyword arguments against the supported parameters of the endpoint module.
- Parameters:
**kwargs – Keyword arguments to validate.
- Returns:
The validated keyword arguments.
- Return type:
dict of str to Any
- Raises:
ValueError – If any provided keyword argument is not supported by the endpoint module.
- config: MgnipyConfig#
- exec: QueryExecutor#
- class mgnipy.V2.proxies.PublicationDetail(id=None, *, accession=None, config=None, **kwargs)[source]#
Bases:
MGnifyDetail
- async afirst()#
Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- async aget(*args, **kwargs)#
- async aget_list(resource, access_param, fetch=True, explain=False)#
Get list proxy for a specific accession/pubmed_id/catalogue_id detail.
- Parameters:
resource (str) – Name of a valid child resource, as listed by list_relationships(); for example "samples" for a study detail or "analyses" for a run detail.
access_param (dict[str, str]) – A dictionary holding the parameter that identifies the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
fetch (bool) – Whether to fetch the detail immediately after creating the proxy.
explain (bool) – Whether to print example URLs that would be called.
- Returns:
A proxy for the next resource.
- Return type:
Examples
samples = await study.aget_list("samples", {"accession": "MGYS00001234"})
- async apage(*args, **kwargs)#
- describe_relationships()#
- dry_run(*, verbose=True)#
Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.
- Return type:
None
- Parameters:
verbose (bool )
- property emgapi_resource: str | None #
Retrieves the name of the endpoint resource based on the endpoint module.
- Returns:
The name of the endpoint resource, or None if the endpoint module is not set.
- Return type:
str or None
- explain(head=None)#
Print example URLs that would be called. Actual requests handled by client.
- Parameters:
head (int | None)
- Return type:
None
- filter(**filters)#
Update the parameters for the API call to filter results.
- Parameters:
**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.
- Returns:
A new QuerySet instance with updated parameters for filtering results.
- Return type:
- first()#
Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- get(*args, **kwargs)#
- get_list(resource, access_param, fetch=True, explain=False)#
Get list proxy for a specific accession/pubmed_id/catalogue_id detail.
- Parameters:
resource (str) – Name of a valid child resource, as listed by list_relationships(); for example "samples" for a study detail or "analyses" for a run detail.
access_param (dict[str, str]) – A dictionary holding the parameter that identifies the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
fetch (bool) – Whether to fetch the detail immediately after creating the proxy.
explain (bool) – Whether to print example URLs that would be called.
- Returns:
A proxy for the next resource.
- Return type:
Examples
samples = study.get_list("samples", {"accession": "MGYS00001234"})
- property identifier: str | None #
Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.
- Returns:
The identifier value corresponding to the resource type, or None if not available.
- Return type:
str or None
- list_supported_params()#
Lists supported keyword arguments for the endpoint module.
- list_urls()#
Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.
- page(*args, **kwargs)#
- page_size(n)#
Set the page size for paginated API calls.
- property pagination_status: bool #
Check if the current resource requires pagination based on its supported keyword arguments.
- Returns:
True if the resource is paginated, False otherwise.
- Return type:
bool
- preview()#
Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.
- Returns:
A DataFrame containing the metadata from the first page of results.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If the API call fails or if no data is available to preview.
- property request_url: str #
Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.
- Returns:
The constructed URL for the API request.
- Return type:
str
- resolve_query_string(**kwargs)#
Resolves the query string for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the query string.
- Returns:
The resolved query string.
- Return type:
- property resource: SupportedEndpoints#
- property results_ids: list [str ] | None #
Get a list of accessions from the retrieved metadata results, if available.
- sub_url(**kwargs)#
Constructs the sub-URL for the endpoint based on the current parameters.
- Returns:
The constructed sub-URL, or None if the endpoint module is not set.
- Return type:
str or None
- to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#
Convert the current or provided metadata to a pandas DataFrame.
- Parameters:
data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.
expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.
rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.
**kwargs – Additional keyword arguments passed to pd.DataFrame.
- Returns:
DataFrame containing the metadata.
- Return type:
pd.DataFrame | None
- Raises:
RuntimeError – If no data is available to convert.
- to_json(data=None, orient='records', lines=True, **json_kwargs)#
Convert the current metadata to a JSON string or save it to a file.
- Parameters:
- Returns:
The JSON string representation of the metadata, or None if no data is available.
- Return type:
str or None
- Raises:
RuntimeError – If no data is available to convert.
- to_list(data=None)#
Convert the current or provided metadata to a list of dictionaries.
- Parameters:
data (dict of int to list of dict, optional) – The paginated data to convert. If None, uses self.data.
- Returns:
A list of metadata records as dictionaries, or None if no data is available.
- Return type:
- Raises:
RuntimeError – If no data is available to convert.
- to_polars(data=None, **polars_kwargs)#
Convert the current metadata to a Polars DataFrame.
- Parameters:
- Returns:
A Polars DataFrame containing the metadata.
- Return type:
pl.DataFrame
- Raises:
RuntimeError – If no data is available to convert.
- url_path(**kwargs)#
Constructs the full URL path for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the URL construction.
- Returns:
The constructed URL path.
- Return type:
- validate_endpoint_kwargs(**kwargs)#
Validates the provided keyword arguments against the supported parameters of the endpoint module.
- Parameters:
**kwargs – Keyword arguments to validate.
- Returns:
The validated keyword arguments.
- Return type:
dict of str to Any
- Raises:
ValueError – If any provided keyword argument is not supported by the endpoint module.
- config: MgnipyConfig#
- exec: QueryExecutor#
- class mgnipy.V2.proxies.CatalogueDetail(id=None, *, accession=None, config=None, **kwargs)[source]#
Bases:
MGnifyDetail
- async afirst()#
Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- async aget(*args, **kwargs)#
- async aget_list(resource, access_param, fetch=True, explain=False)#
Get list proxy for a specific accession/pubmed_id/catalogue_id detail.
- Parameters:
resource (str) – Valid child resource name, e.g. one listed by list_relationships(), such as "samples" for a study detail or "analyses" for a run detail.
access_param (dict[str, str]) – A dictionary containing the parameter needed to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
fetch (bool) – Whether to immediately fetch the detail after creating the proxy.
explain (bool) – Whether to print example URLs that would be called.
- Returns:
A proxy for the next resource.
- Return type:
Examples
samples = await study.aget_list("samples", {"accession": "MGYS00001234"})
- async apage(*args, **kwargs)#
- describe_relationships()#
- dry_run(*, verbose=True)#
Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.
- Return type:
None
- Parameters:
verbose (bool )
- property emgapi_resource: str | None #
Retrieves the name of the endpoint resource based on the endpoint module.
- Returns:
The name of the endpoint resource, or None if the endpoint module is not set.
- Return type:
str or None
- explain(head=None)#
Print example URLs that would be called. Actual requests handled by client.
- Parameters:
head (int | None)
- Return type:
None
- filter(**filters)#
Update the parameters for the API call to filter results.
- Parameters:
**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.
- Returns:
A new QuerySet instance with updated parameters for filtering results.
- Return type:
- first()#
Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- get(*args, **kwargs)#
- get_list(resource, access_param, fetch=True, explain=False)#
Get list proxy for a specific accession/pubmed_id/catalogue_id detail.
- Parameters:
resource (str) – Valid child resource name, e.g. one listed by list_relationships(), such as "samples" for a study detail or "analyses" for a run detail.
access_param (dict[str, str]) – A dictionary containing the parameter needed to identify the detail resource, such as {"accession": "MGYS00001234"} or {"biome_lineage": "root"}.
fetch (bool) – Whether to immediately fetch the detail after creating the proxy.
explain (bool) – Whether to print example URLs that would be called.
- Returns:
A proxy for the next resource.
- Return type:
Examples
samples = study.get_list("samples", {"accession": "MGYS00001234"})
- property identifier: str | None #
Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.
- Returns:
The identifier value corresponding to the resource type, or None if not available.
- Return type:
str or None
- list_supported_params()#
Lists supported keyword arguments for the endpoint module.
- list_urls()#
Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.
- page(*args, **kwargs)#
- page_size(n)#
Set the page size for paginated API calls.
- property pagination_status: bool #
Check if the current resource requires pagination based on its supported keyword arguments.
- Returns:
True if the resource is paginated, False otherwise.
- Return type:
- preview()#
Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.
- Returns:
A DataFrame containing the metadata from the first page of results.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If the API call fails or if no data is available to preview.
- property request_url: str #
Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.
- Returns:
The constructed URL for the API request.
- Return type:
- resolve_query_string(**kwargs)#
Resolves the query string for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the query string.
- Returns:
The resolved query string.
- Return type:
- property resource: SupportedEndpoints#
- property results_ids: list [str ] | None #
Get a list of accessions from the retrieved metadata results, if available.
- sub_url(**kwargs)#
Constructs the sub-URL for the endpoint based on the current parameters.
- Returns:
The constructed sub-URL, or None if the endpoint module is not set.
- Return type:
str or None
- to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#
Convert the current or provided metadata to a pandas DataFrame.
- Parameters:
data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.
expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.
rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.
**kwargs – Additional keyword arguments passed to pd.DataFrame.
- Returns:
DataFrame containing the metadata.
- Return type:
pd.DataFrame | None
- Raises:
RuntimeError – If no data is available to convert.
- to_json(data=None, orient='records', lines=True, **json_kwargs)#
Convert the current metadata to a JSON string or save it to a file.
- Parameters:
- Returns:
The JSON string representation of the metadata, or None if no data is available.
- Return type:
str or None
- Raises:
RuntimeError – If no data is available to convert.
- to_list(data=None)#
Convert the current or provided metadata to a list of dictionaries.
- Parameters:
data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.
- Returns:
A list of metadata records as dictionaries, or None if no data is available.
- Return type:
- Raises:
RuntimeError – If no data is available to convert.
- to_polars(data=None, **polars_kwargs)#
Convert the current metadata to a Polars DataFrame.
- Parameters:
- Returns:
A Polars DataFrame containing the metadata.
- Return type:
pl.DataFrame
- Raises:
RuntimeError – If no data is available to convert.
- url_path(**kwargs)#
Constructs the full URL path for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the URL construction.
- Returns:
The constructed URL path.
- Return type:
- validate_endpoint_kwargs(**kwargs)#
Validates the provided keyword arguments against the supported parameters of the endpoint module.
- Parameters:
**kwargs – Keyword arguments to validate.
- Returns:
The validated keyword arguments.
- Return type:
dict of str to Any
- Raises:
ValueError – If any provided keyword argument is not supported by the endpoint module.
- config: MgnipyConfig#
- exec: QueryExecutor#
mgnipy.V2.query_executor module#
- class mgnipy.V2.query_executor.QueryExecutor(query_set)[source]#
Bases:
object
- Parameters:
query_set (QuerySet)
- async map_with_concurrency(items, worker, *, concurrency=None, hide_progress=False)[source]#
Map a worker function over a list of items with controlled concurrency. In plain English, it is a “process these things in parallel, but not too many at once” helper.
Example
results = await self.map_with_concurrency(
    items=pages, worker=lambda p: self.apage(p, client), concurrency=8,
)
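The "in parallel, but not too many at once" behaviour can be sketched with a semaphore from the standard library. This is a minimal illustration of the general technique, not the QueryExecutor implementation; map_with_limit and fake_fetch are hypothetical names introduced here.

```python
import asyncio


async def map_with_limit(items, worker, *, concurrency=4):
    """Run worker(item) for each item with at most `concurrency` in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(item):
        # Each task waits here until one of the `concurrency` slots is free.
        async with semaphore:
            return await worker(item)

    # asyncio.gather returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(item) for item in items))


async def fake_fetch(page):
    # Hypothetical stand-in for a single page request.
    await asyncio.sleep(0.01)
    return {"page": page}


results = asyncio.run(map_with_limit([1, 2, 3, 4, 5], fake_fetch, concurrency=2))
```

Because the semaphore is acquired inside each task, all tasks can be scheduled up front while only a bounded number run the worker at any moment.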
- get_any_first()[source]#
Retrieve the first page of metadata for the current resource and parameters.
For unpaginated endpoints, this retrieves all of the metadata, which is a single response. For paginated endpoints, this retrieves only the first page of results.
- async aget_any_first()[source]#
Asynchronously retrieve the first page of metadata for the current resource and parameters.
For unpaginated endpoints, this retrieves all of the metadata, which is a single response. For paginated endpoints, this retrieves only the first page of results.
mgnipy.V2.query_set module#
- class mgnipy.V2.query_set.QuerySet(resource, *, config=None, params=None, **kwargs)[source]#
Bases:
ResultsHandlerMixin, DescribeEmgapiMixin
Plans, builds, validates and previews queries based on the endpoint_module and params of the MGnifier owner. Stores the request URLs. If the MGnifier owner changes, the QuerySet should be re-instantiated to update the URLs and other information.
- Parameters:
- config: MgnipyConfig#
- exec: QueryExecutor#
- property request_url: str #
Get the URL for the API request based on the current resource and parameters. This is a single URL that represents the request for the current page of results.
- Returns:
The constructed URL for the API request.
- Return type:
- property results_ids: list [str ] | None #
Get a list of accessions from the retrieved metadata results, if available.
- property resource: SupportedEndpoints#
- filter(**filters)[source]#
Update the parameters for the API call to filter results.
- Parameters:
**filters – Keyword arguments corresponding to the supported parameters for the current resource. These will be used to filter the results returned by the API.
- Returns:
A new QuerySet instance with updated parameters for filtering results.
- Return type:
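Since filter() is documented as returning a new QuerySet rather than changing the existing one, the pattern can be illustrated with a small immutable stand-in. The Query class below is hypothetical and only mimics the "copy with merged parameters" idea; it is not the mgnipy class.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Query:
    """Hypothetical stand-in for a query object holding a resource and params."""

    resource: str
    params: dict = field(default_factory=dict)

    def filter(self, **filters):
        # Build a new instance; the original object and its params are untouched.
        return replace(self, params={**self.params, **filters})


base = Query("studies")
narrowed = base.filter(biome_lineage="root")
```

Returning a fresh object makes it safe to derive several filtered queries from one shared base.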
- property pagination_status: bool #
Check if the current resource requires pagination based on its supported keyword arguments.
- Returns:
True if the resource is paginated, False otherwise.
- Return type:
- dry_run(*, verbose=True)[source]#
Plan the API call by validating parameters and estimating the number of pages and records available. Prints the plan details for the user to review before executing the full data retrieval. This method can be called before get() to ensure that the parameters are valid and to understand the scope of the data retrieval.
- Return type:
None
- Parameters:
verbose (bool )
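The page estimate that dry_run() reports can be thought of as a ceiling division of the record count by the page size. The sketch below shows that arithmetic only; estimate_pages is a hypothetical helper, and the numbers are made up for illustration.

```python
import math


def estimate_pages(total_records, page_size):
    """Number of pages needed to hold total_records at page_size per page."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return math.ceil(total_records / page_size)


# Illustrative values, not taken from a real API response.
plan = {"records": 1234, "page_size": 25, "pages": estimate_pages(1234, 25)}
```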
- list_urls()[source]#
Generate and return a list of URLs for all the API requests that would be made to retrieve the data based on the current parameters. This allows the user to see exactly which endpoints and query parameters will be used in the API calls before executing them.
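One way to picture what list_urls() produces is a loop that appends a page number to a fixed set of query parameters. The sketch below assumes a "page" query parameter and uses a placeholder base URL; both are assumptions for illustration, and build_page_urls is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_page_urls(base_url, params, total_pages):
    """Return one URL per page, keeping the other query parameters fixed."""
    urls = []
    for page in range(1, total_pages + 1):
        # urlencode handles escaping; "page" is an assumed parameter name.
        query = urlencode({**params, "page": page})
        urls.append(f"{base_url}?{query}")
    return urls


# Placeholder URL, not a real endpoint.
urls = build_page_urls("https://example.org/api/studies/", {"page_size": 25}, 3)
```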
- explain(head=None)[source]#
Print example URLs that would be called. Actual requests handled by client.
- Parameters:
head (int | None)
- Return type:
None
- preview()[source]#
Preview the first page of metadata for the current resource and parameters, without retrieving all pages. This allows the user to quickly check the structure and content of the data before deciding to retrieve everything.
- Returns:
A DataFrame containing the metadata from the first page of results.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If the API call fails or if no data is available to preview.
- first()[source]#
Retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- async afirst()[source]#
Asynchronously retrieve the first page of metadata for the current resource and parameters. Same as preview() but returns the raw dictionary instead of a DataFrame.
- Return type:
- property identifier: str | None #
Get the identifier value from the parameters based on the resource type. This is used for constructing URLs for related resources.
- Returns:
The identifier value corresponding to the resource type, or None if not available.
- Return type:
str or None
- property emgapi_resource: str | None #
Retrieves the name of the endpoint resource based on the endpoint module.
- Returns:
The name of the endpoint resource, or None if the endpoint module is not set.
- Return type:
str or None
- list_supported_params()#
Lists supported keyword arguments for the endpoint module.
- resolve_query_string(**kwargs)#
Resolves the query string for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the query string.
- Returns:
The resolved query string.
- Return type:
- sub_url(**kwargs)#
Constructs the sub-URL for the endpoint based on the current parameters.
- Returns:
The constructed sub-URL, or None if the endpoint module is not set.
- Return type:
str or None
- to_df(data=None, expand_nested_dicts=False, rename_columns=None, **kwargs)#
Convert the current or provided metadata to a pandas DataFrame.
- Parameters:
data (list of dict , optional) – List of records to convert. If None, uses self._results or self._previewed_page.
expand_nested_dicts (list of str , optional) – List of keys to expand into separate columns.
rename_columns (dict of str to str, optional) – A dictionary mapping old column names to new column names.
**kwargs – Additional keyword arguments passed to pd.DataFrame.
- Returns:
DataFrame containing the metadata.
- Return type:
pd.DataFrame | None
- Raises:
RuntimeError – If no data is available to convert.
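To make the expand_nested_dicts and rename_columns options concrete, the following standard-library sketch expands chosen nested dictionaries into separate columns and then renames them. The "key.subkey" column naming and the expand_and_rename helper are assumptions made for this illustration; the library's actual column names may differ.

```python
def expand_and_rename(records, expand_keys=(), rename=None):
    """Flatten selected nested dicts into columns, then rename columns."""
    rename = rename or {}
    rows = []
    for record in records:
        row = {}
        for key, value in record.items():
            if key in expand_keys and isinstance(value, dict):
                # Assumed naming scheme: parent key and child key joined by a dot.
                for sub_key, sub_value in value.items():
                    row[f"{key}.{sub_key}"] = sub_value
            else:
                row[key] = value
        rows.append({rename.get(name, name): value for name, value in row.items()})
    return rows


rows = expand_and_rename(
    [{"accession": "MGYS00001234", "biome": {"lineage": "root"}}],
    expand_keys=["biome"],
    rename={"biome.lineage": "biome_lineage"},
)
```

The resulting list of flat dictionaries is the shape that a DataFrame constructor typically expects.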
- to_json(data=None, orient='records', lines=True, **json_kwargs)#
Convert the current metadata to a JSON string or save it to a file.
- Parameters:
- Returns:
The JSON string representation of the metadata, or None if no data is available.
- Return type:
str or None
- Raises:
RuntimeError – If no data is available to convert.
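The defaults orient='records' and lines=True match the pandas convention for JSON Lines output, where each record is written as one JSON object per line. Whether to_json() delegates to pandas is not stated here, so the sketch below reproduces only the output shape with the standard library; records_to_json_lines is a hypothetical helper and the second accession is a made-up sample value.

```python
import json


def records_to_json_lines(records):
    """Serialize a list of dicts as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


# Sample records for illustration only.
records = [{"accession": "MGYS00001234"}, {"accession": "MGYS00005678"}]
text = records_to_json_lines(records)
```

Each line can be parsed independently, which suits streaming and appending to files.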
- to_list(data=None)#
Convert the current or provided metadata to a list of dictionaries.
- Parameters:
data (dict of int to list of dict , optional) – The paginated data to convert. If None, uses self.data.
- Returns:
A list of metadata records as dictionaries, or None if no data is available.
- Return type:
- Raises:
RuntimeError – If no data is available to convert.
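The conversion from paginated data (a dict of page number to list of records) to a single list can be sketched as below. This is an illustration of the flattening step under the documented input shape; pages_to_records is a hypothetical name, and ordering by page number is an assumption.

```python
def pages_to_records(pages):
    """Flatten {page_number: [record, ...]} into one list, in page order."""
    if not pages:
        raise RuntimeError("No data is available to convert.")
    records = []
    for page_number in sorted(pages):
        records.extend(pages[page_number])
    return records


# Pages may arrive out of order when fetched concurrently.
flat = pages_to_records({2: [{"id": "c"}], 1: [{"id": "a"}, {"id": "b"}]})
```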
- to_polars(data=None, **polars_kwargs)#
Convert the current metadata to a Polars DataFrame.
- Parameters:
- Returns:
A Polars DataFrame containing the metadata.
- Return type:
pl.DataFrame
- Raises:
RuntimeError – If no data is available to convert.
- url_path(**kwargs)#
Constructs the full URL path for the endpoint based on the current parameters.
- Parameters:
**kwargs – Keyword arguments to validate and include in the URL construction.
- Returns:
The constructed URL path.
- Return type:
- validate_endpoint_kwargs(**kwargs)#
Validates the provided keyword arguments against the supported parameters of the endpoint module.
- Parameters:
**kwargs – Keyword arguments to validate.
- Returns:
The validated keyword arguments.
- Return type:
dict of str to Any
- Raises:
ValueError – If any provided keyword argument is not supported by the endpoint module.
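The validation contract described for validate_endpoint_kwargs() (return the arguments if all are supported, raise ValueError otherwise) can be sketched as follows. The validate_kwargs helper and the explicit supported list are hypothetical; in the library the supported parameters come from the endpoint module.

```python
def validate_kwargs(supported, **kwargs):
    """Return kwargs unchanged if every key is supported, else raise ValueError."""
    unsupported = sorted(set(kwargs) - set(supported))
    if unsupported:
        raise ValueError(f"Unsupported parameter(s): {', '.join(unsupported)}")
    return dict(kwargs)


# Hypothetical supported-parameter list for illustration.
validated = validate_kwargs(["accession", "page_size"], accession="MGYS00001234")
```

Reporting every unsupported key at once saves the caller from fixing them one error at a time.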