DiskCache
- class sdgx.cachers.disk_cache.DiskCache(cache_dir: str | Path | None = None, identity: str | None = None, *args, **kwargs)
Bases: Cacher
A Cacher that caches data on disk in Parquet format.
- Parameters:
blocksize (int) – The blocksize of the cache.
cache_dir (str | Path | None, optional) – The directory where the cache will be stored. Defaults to None.
identity (str | None, optional) – The identity of the data source. Defaults to None.
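The two optional constructor parameters decide where cache files end up. The sketch below shows one plausible way they could combine. The helper `resolve_cache_dir`, the placeholder fallback folder `.cache`, and the rule that `identity` becomes a sub-directory are all assumptions made for illustration, not documented sdgx behavior.

```python
from pathlib import Path
from typing import Optional, Union


def resolve_cache_dir(
    cache_dir: Union[str, Path, None] = None, identity: Optional[str] = None
) -> Path:
    """Hypothetical helper: choose the directory a disk cache writes to.

    Assumptions for illustration only: a missing cache_dir falls back to a
    placeholder ".cache" folder in the working directory, and identity becomes
    a sub-directory so that each data source gets its own cache files.
    """
    base = Path(cache_dir) if cache_dir is not None else Path.cwd() / ".cache"
    if identity:
        base = base / identity  # keep different data sources apart
    return base.expanduser().resolve()


shared = resolve_cache_dir("/tmp/sdgx-demo")
per_source = resolve_cache_dir("/tmp/sdgx-demo", identity="adult.csv")
```

Keying the directory on the data source means two sources read at the same offsets would not overwrite each other's cache files.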
Todo
Add partial cache when blocksize > chunksize
Improve cache invalidation
Improve performance if blocksize > chunksize
- _refresh(offset: int, data: DataFrame) → None
Refresh the cache by writing data to the cache file for offset in Parquet format.
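A minimal sketch of the refresh step, assuming one cache file per chunk keyed by its starting offset. The helper names `cache_file` and `refresh` and the `{offset}.pkl` file name are hypothetical, and `pickle` stands in for Parquet so the example runs on the standard library alone; with pandas and pyarrow installed, the equivalent write would be `data.to_parquet(path)`.

```python
import pickle
import tempfile
from pathlib import Path
from typing import List


def cache_file(cache_dir: Path, offset: int) -> Path:
    # Hypothetical naming scheme: one file per chunk, keyed by its starting row offset.
    return cache_dir / f"{offset}.pkl"


def refresh(cache_dir: Path, offset: int, rows: List[dict]) -> None:
    """Write (or overwrite) the cached chunk that starts at `offset`.

    pickle replaces Parquet here so the sketch needs only the standard library.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    with open(cache_file(cache_dir, offset), "wb") as f:
        pickle.dump(rows, f)


tmp = Path(tempfile.mkdtemp())
refresh(tmp, 0, [{"age": 39}, {"age": 50}])
with open(cache_file(tmp, 0), "rb") as f:
    restored = pickle.load(f)
```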
- is_cached(offset: int) → bool
Check whether the data at offset is cached, i.e. whether its cache file exists.
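Since the check is purely file existence, it reduces to a path lookup. This sketch assumes the same hypothetical `{offset}.pkl` naming used for illustration elsewhere; the real cache writes Parquet files whose names may differ.

```python
import tempfile
from pathlib import Path


def is_cached(cache_dir: Path, offset: int) -> bool:
    # A chunk counts as cached exactly when its file is present on disk;
    # "{offset}.pkl" is a hypothetical naming scheme for this sketch.
    return (cache_dir / f"{offset}.pkl").exists()


tmp = Path(tempfile.mkdtemp())
before = is_cached(tmp, 100)
(tmp / "100.pkl").write_bytes(b"")  # simulate a previously written chunk
after = is_cached(tmp, 100)
```

Because only existence is checked, a file left over from older or different data would still count as a hit, which relates to the cache invalidation item in the Todo list.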
- iter(chunksize: int, data_connector: DataConnector) → Generator[DataFrame, None, None]
Load data in chunks of chunksize rows, from data_connector or from the cache.
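Chunked iteration can be sketched as a generator that keeps calling a load-like function with an advancing offset. The name `iter_chunks`, the `load(offset, chunksize)` callable, and the stopping rule (stop at the first empty chunk) are assumptions for illustration; plain lists stand in for DataFrames.

```python
from typing import Callable, Iterator, List


def iter_chunks(chunksize: int, load: Callable[[int, int], List]) -> Iterator[List]:
    """Yield consecutive chunks until the source is exhausted.

    `load(offset, chunksize)` plays the role of a cache-aware load call.
    """
    offset = 0
    while True:
        chunk = load(offset, chunksize)
        if not chunk:  # empty (or None) chunk: nothing left to read
            return
        yield chunk
        offset += len(chunk)  # advance by the rows actually returned


source = list(range(7))
chunks = list(iter_chunks(3, lambda off, n: source[off:off + n]))
# chunks == [[0, 1, 2], [3, 4, 5], [6]]
```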
- load(offset: int, chunksize: int, data_connector: DataConnector) → DataFrame
Load chunksize rows starting at offset, from data_connector or from the cache.
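A sketch of cache-first loading, under the assumption that a cache hit skips the connector entirely while a miss reads from the source and then writes the chunk back, much as _refresh is described above. `read(offset, limit)` is a hypothetical stand-in for data_connector, `pickle` replaces Parquet, and lists replace DataFrames.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Callable, List

Reader = Callable[[int, int], List[int]]


def load(offset: int, chunksize: int, read: Reader, cache_dir: Path) -> List[int]:
    """Return up to `chunksize` rows starting at `offset`, preferring the disk cache."""
    path = cache_dir / f"{offset}.pkl"  # hypothetical file naming
    if path.exists():  # cache hit: the reader is not called
        with open(path, "rb") as f:
            return pickle.load(f)[:chunksize]
    rows = read(offset, chunksize)  # cache miss: read from the source
    if rows:  # persist non-empty chunks for later calls
        cache_dir.mkdir(parents=True, exist_ok=True)
        with open(path, "wb") as f:
            pickle.dump(rows, f)
    return rows


calls: List[int] = []


def read(offset: int, limit: int) -> List[int]:
    calls.append(offset)  # record every trip to the "connector"
    return list(range(offset, offset + limit))


tmp = Path(tempfile.mkdtemp())
first = load(0, 3, read, tmp)   # miss: the reader runs
second = load(0, 3, read, tmp)  # hit: served from the cached file
```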
- load_all(data_connector: DataConnector) → DataFrame
Load all data from data_connector or from the cache.
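Loading everything amounts to reading every chunk and concatenating the results. This sketch assumes the reader returns full chunks until the data runs out; `load_all` here is a hypothetical free function over a `read(offset, limit)` callable and plain lists, not the method itself.

```python
from itertools import chain, count, takewhile
from typing import Callable, List

Reader = Callable[[int, int], List[int]]


def load_all(read: Reader, chunksize: int = 1000) -> List[int]:
    """Concatenate fixed-size chunks until the first empty one.

    Assumes every chunk except the last is full, so offsets advance by chunksize.
    """
    chunks = (read(offset, chunksize) for offset in count(0, chunksize))
    return list(chain.from_iterable(takewhile(bool, chunks)))


data = list(range(2500))
everything = load_all(lambda offset, limit: data[offset:offset + limit])
```

With pandas, the same final step would join the chunk DataFrames with `pd.concat(chunks, ignore_index=True)`.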