Understanding the cache#
This page is a quick guide to the cache managed by DiskCheckpointer in mgnipy.V2.mixins.
# uncomment below if colab
#!pip install mgnipy
Introduction#
Every proxy / resource-specific MGnifier query (i.e., with a specific set of params) has its own deterministic cache key, which is a subdirectory inside the containing config.cache_dir.
The cache subdirs are derived from the resource name plus the query parameters via hashlib.sha256.
In each subdir there is a manifest file and one JSON file per request number.
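To make the idea of a deterministic key concrete, here is a minimal sketch. The helper name `sketch_cache_key` and the JSON serialisation are assumptions made for illustration; the exact way mgnipy serialises the resource name and params before hashing is not described on this page, so the digests produced here will not match real mgnipy cache keys.

```python
import hashlib
import json


def sketch_cache_key(resource: str, params: dict) -> str:
    """Hypothetical helper: hash the resource name plus the query params."""
    # Sorting the keys makes the serialised text, and so the digest,
    # independent of the order in which the params were supplied.
    payload = json.dumps({"resource": resource, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key_a = sketch_cache_key("samples", {"page_size": 25, "search": "soil"})
key_b = sketch_cache_key("samples", {"search": "soil", "page_size": 25})
key_c = sketch_cache_key("studies", {"page_size": 25, "search": "soil"})

print(len(key_a))      # length of a SHA-256 hex digest
print(key_a == key_b)  # same query, params given in a different order
print(key_a == key_c)  # same params, different resource
```

The point of the sketch is only that identical queries map to the same subdirectory name, while any change in resource or params maps to a different one.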
An example of what it looks like:#
the_main_cache_dir/
├── 3fddd8853bdd0204eeaeda6c5b9b42b48c8a25ca4f034132d94eb1f93e01ac48/
│   ├── mgnipy_manifest.json
│   ├── mgnipy_page_1.json
│   ├── mgnipy_page_2.json
│   └── ...
├── hash_for_some_other_query/
│   ├── mgnipy_manifest.json
│   ├── mgnipy_page_1.json
│   └── ...
└── ...
How writing to cache is handled#
Response items for a given page/request number are stored on disk in the corresponding mgnipy_page_<n>.json.
Responses are cached after every request made.
If the response already exists in the cache (i.e., mgnipy_page_1.json exists), then the response is read from the cache rather than making another request.
A mgnipy_manifest.json file stores the query details, such as the given params and the resource being searched (more info on the manifest below).
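The check-then-fetch behaviour described above can be sketched as follows. This is not the DiskCheckpointer implementation: `get_page` and `fake_fetch` are hypothetical names, and the assumption that a page file holds a plain JSON list of items is made only for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def get_page(subdir: Path, page_num: int, fetch: Callable[[int], list]) -> list:
    """Hypothetical helper: return cached items for a page, else fetch and cache."""
    page_file = subdir / f"mgnipy_page_{page_num}.json"
    if page_file.exists():
        # Cache hit: read from disk, no request is made.
        return json.loads(page_file.read_text())
    # Cache miss: make the request, then write the response to disk.
    items = fetch(page_num)
    subdir.mkdir(parents=True, exist_ok=True)
    page_file.write_text(json.dumps(items))
    return items


calls = []


def fake_fetch(page_num: int) -> list:
    """Stand-in for a real API request; records every call it receives."""
    calls.append(page_num)
    return [{"accession": f"FAKE{page_num}"}]


with tempfile.TemporaryDirectory() as tmp:
    subdir = Path(tmp) / "some_cache_key"
    first = get_page(subdir, 1, fake_fetch)   # miss: fetches and writes the file
    second = get_page(subdir, 1, fake_fetch)  # hit: served from the file

print(calls)            # page numbers that triggered a (fake) request
print(first == second)  # both calls return the same items
```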
How loading from cache is handled#
At every proxy / resource-specific MGnifier instantiation, an attempt is made to load any existing records in the cache into the instance, e.g., into MGnifier().results.
If you don't want these to be loaded from the cache, then clear it before instantiating the query.
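As a rough picture of what loading from the cache could involve, the sketch below gathers items from every page file in a cache subdirectory, in numeric page order. `load_cached_items` is a hypothetical helper, and it assumes each page file contains a JSON list of items; how mgnipy actually populates the instance is not shown on this page.

```python
import json
import re
import tempfile
from pathlib import Path

PAGE_PATTERN = re.compile(r"mgnipy_page_(\d+)\.json$")


def load_cached_items(subdir: Path) -> list:
    """Hypothetical helper: collect items from all cached pages, in page order."""
    numbered = []
    for page_file in subdir.glob("mgnipy_page_*.json"):
        match = PAGE_PATTERN.match(page_file.name)
        if match:
            numbered.append((int(match.group(1)), page_file))
    items = []
    # Sort numerically so that page 10 comes after page 2, not before it.
    for _, page_file in sorted(numbered):
        items.extend(json.loads(page_file.read_text()))
    return items


with tempfile.TemporaryDirectory() as tmp:
    subdir = Path(tmp)
    for n in (10, 2, 1):
        (subdir / f"mgnipy_page_{n}.json").write_text(json.dumps([{"page": n}]))
    loaded = load_cached_items(subdir)

print([item["page"] for item in loaded])  # pages listed in numeric order
```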
Below are guides for:#
Configuring the cache#
The outermost containing directory can be configured with mgnipy.MGnipyConfig, or can be passed as a dict argument to config.
Only mgnipy.MGnipy will also accept config_kwargs.
More information can be found on the config setup page
Where to save#
The default cache_dir is based on platformdirs, but you can choose another path.
# if using MGnipyConfig directly
from mgnipy import MGnipyConfig
config = MGnipyConfig(cache_dir="temp_example")
# which can then be passed to MGnipy or proxies/mgnifier
config
MGnipyConfig(api_version=<SupportedApiVersions.V2: 'v2'>, base_url=HttpUrl('https://www.ebi.ac.uk/'), cache_dir=PosixPath('temp_example'))
Disabling the cache#
You can do this easily by configuring cache_dir as None.
from mgnipy import MGnipy, MGnipyConfig
from mgnipy.V2.proxies import Samples
MG = MGnipy(cache_dir=None)
# or
config = MGnipyConfig(cache_dir=None)
MG = MGnipy(config=config)
# or at proxy level, only config as dict, not as kwargs
samples = Samples(config=config)
Locating the cache#
For your given mgnipy / mgnifier instance this cache directory path can be found using .cache_dir
Setting and finding the main cache directory#
The cache directory is configured via mgnipy.MGnipyConfig or by passing a dict to config.
More information can be found on the config setup page
You can find the cache_dir directly from the MGnipy delegator / client:
from mgnipy import MGnipy
MG = MGnipy(
# config=config,
# or
cache_dir="temp_example"
)
MG.cache_dir
PosixPath('temp_example')
and also from the resource-specific MGnifiers aka proxies if wanted:
# init samples proxy
samples = MG.samples
print("general cache dir:", samples.config.cache_dir)
general cache dir: temp_example
Finding the sub-cache corresponding to a query#
The cache subdirs or cache keys within config.cache_dir are derived from the resource name plus the query parameters via hashlib.sha256.
Here is how to find the full path:
# option 1: .cache_dir
print(
"Planned cache directory based on params and resource:\n", samples.cache_dir
)
Planned cache directory based on params and resource:
temp_example/3fddd8853bdd0204eeaeda6c5b9b42b48c8a25ca4f034132d94eb1f93e01ac48
The cache directory path is also included in the string representation of the proxy instance :)
# option 2: __str__
print(samples)
MGnifier instance for resource: samples
I.e., mgnipy.V2.proxies.samples.Samples
----------------------------------------
Base URL: https://www.ebi.ac.uk/
Parameters: {}
Example request URL: https://www.ebi.ac.uk/metagenomics/api/v2/samples?page=1
Endpoint module: mgnipy.emgapi_v2_client.api.samples.list_mgnify_samples
Is list endpoint (returns paginated results): True
Cache directory: temp_example/3fddd8853bdd0204eeaeda6c5b9b42b48c8a25ca4f034132d94eb1f93e01ac48
Inspecting Cache#
The Manifest#
The manifest records:
The MGnify API resource the requests were made to, e.g., biomes, biome, studies, analyses.
The query params, e.g., accession, page_size, search.
The count of total items/records for the entire query. For list endpoints this is the total number of listed items across all paginated responses. For detail endpoints the count is 1 (as a detail corresponds to a single accession/id) or 0 if there is no match.
The total_pages, i.e., the total number of request URLs needed to obtain all count items. For list endpoints the total_pages, or number of requests, depends on count / page_size (page_size being the max number of items to return per request, default page_size=25). For detail endpoints the total_pages is again simply 1 or 0.
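The relationship between count, page_size and total_pages can be worked through with a small sketch. Rounding up with `math.ceil` is an assumption about how a partially filled last page is counted; `expected_total_pages` is a hypothetical helper, not part of mgnipy.

```python
import math


def expected_total_pages(count: int, page_size: int = 25) -> int:
    """Hypothetical helper: number of requests needed to retrieve `count` items."""
    # A final, partially filled page still needs its own request, hence ceil.
    return math.ceil(count / page_size)


pages_none = expected_total_pages(0)                  # no matches, no pages
pages_detail = expected_total_pages(1)                # a single item fits on one page
pages_default = expected_total_pages(60)              # 25 + 25 + 10 items
pages_larger = expected_total_pages(60, page_size=50) # 50 + 10 items

print(pages_none, pages_detail, pages_default, pages_larger)
```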
The pages#
Each mgnipy_page_<n>.json corresponds to a given request number.
For a list endpoint, the items are listed in the JSON.
For a detail endpoint, there is just one item, in mgnipy_page_1.json.
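If you want to check by hand how much of a query is already cached, something along these lines may help. It is a sketch under assumptions: `cache_summary` is a hypothetical helper, and the manifest is assumed to be a flat JSON object with `resource` and `total_pages` keys, which may differ from the real file layout. The demo writes its own fake manifest rather than reading a real one.

```python
import json
import tempfile
from pathlib import Path


def cache_summary(subdir: Path) -> dict:
    """Hypothetical helper: compare cached page files against the manifest."""
    manifest = json.loads((subdir / "mgnipy_manifest.json").read_text())
    cached_pages = len(list(subdir.glob("mgnipy_page_*.json")))
    return {
        "resource": manifest.get("resource"),
        "total_pages": manifest.get("total_pages"),
        "cached_pages": cached_pages,
        "complete": cached_pages == manifest.get("total_pages"),
    }


with tempfile.TemporaryDirectory() as tmp:
    subdir = Path(tmp)
    # Fake manifest and pages, written only for this demonstration.
    fake_manifest = {"resource": "samples", "params": {}, "count": 60, "total_pages": 3}
    (subdir / "mgnipy_manifest.json").write_text(json.dumps(fake_manifest))
    for n in (1, 2):
        (subdir / f"mgnipy_page_{n}.json").write_text(json.dumps([]))
    summary = cache_summary(subdir)

print(summary)
```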
Clearing the cache#
You can clear specific cache keys / subdirectories,
but also all subdirectories in the cache via mgnipy.MGnipy.clear_subcaches().
Clearing a single cache key#
The cache relevant to a given query instance can be cleared via its .clear_cache() method:
samples.clear_cache()
Clear ALL cached queries#
Proceed at your own risk and be careful what path you pass to cache_dir:
this will delete all "mgnipy_manifest.json" and "mgnipy_page_*.json" files and
remove the resulting empty cache key subdirs
in whatever path is passed via cache_dir.
# check path again
MG.cache_dir
PosixPath('temp_example')
Looks right, let's clear:
MG.clear_subcaches()
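For a sense of why the path matters, the clearing behaviour described above can be approximated with pathlib. This is an illustrative sketch, not the code behind clear_subcaches(): `sketch_clear_subcaches` is a hypothetical helper that removes only the manifest and page files from each immediate subdirectory and deletes a subdirectory only if it ends up empty.

```python
import tempfile
from pathlib import Path


def sketch_clear_subcaches(cache_dir: Path) -> int:
    """Hypothetical helper: delete mgnipy cache files; return how many were removed."""
    removed = 0
    for subdir in [p for p in cache_dir.iterdir() if p.is_dir()]:
        targets = [subdir / "mgnipy_manifest.json", *subdir.glob("mgnipy_page_*.json")]
        for cache_file in targets:
            if cache_file.is_file():
                cache_file.unlink()
                removed += 1
        # Only remove the subdirectory if nothing else is left inside it.
        if not any(subdir.iterdir()):
            subdir.rmdir()
    return removed


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)
    key_one = cache_dir / "key_one"
    key_two = cache_dir / "key_two"
    key_one.mkdir()
    key_two.mkdir()
    (key_one / "mgnipy_manifest.json").write_text("{}")
    (key_one / "mgnipy_page_1.json").write_text("[]")
    (key_two / "mgnipy_manifest.json").write_text("{}")
    (key_two / "notes.txt").write_text("unrelated file")

    removed_count = sketch_clear_subcaches(cache_dir)
    key_one_exists = key_one.exists()
    key_two_contents = sorted(p.name for p in key_two.iterdir())

print(removed_count, key_one_exists, key_two_contents)
```

In the sketch, unrelated files survive and keep their subdirectory alive, which matches the description of removing only the resulting empty cache key subdirs.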
That's it really.