Metadata-Version: 2.4
Name: kerchunk
Version: 0.2.10
Summary: Functions to make reference descriptions for ReferenceFileSystem
Author-email: Martin Durant <martin.durant@alumni.utoronto.ca>
License: MIT
Project-URL: Documentation, https://fsspec.github.io/kerchunk
Project-URL: issue-tracker, https://github.com/fsspec/kerchunk/issues
Project-URL: source-code, https://github.com/fsspec/kerchunk
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python
Classifier: Topic :: Scientific/Engineering
Classifier: Development Status :: 3 - Alpha
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: fsspec>=2025.2.0
Requires-Dist: numcodecs
Requires-Dist: numpy
Requires-Dist: ujson
Requires-Dist: zarr>=3.0.1
Provides-Extra: cftime
Requires-Dist: cftime; extra == "cftime"
Provides-Extra: fits
Requires-Dist: xarray; extra == "fits"
Provides-Extra: hdf
Requires-Dist: h5py; extra == "hdf"
Requires-Dist: xarray; extra == "hdf"
Provides-Extra: grib2
Requires-Dist: cfgrib; extra == "grib2"
Provides-Extra: netcdf3
Requires-Dist: scipy; extra == "netcdf3"
Provides-Extra: dev
Requires-Dist: cfgrib; extra == "dev"
Requires-Dist: cftime; extra == "dev"
Requires-Dist: dask; extra == "dev"
Requires-Dist: fastparquet>=2024.11.0; extra == "dev"
Requires-Dist: gcsfs; extra == "dev"
Requires-Dist: h5netcdf; extra == "dev"
Requires-Dist: h5py; extra == "dev"
Requires-Dist: jinja2; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Requires-Dist: netcdf4; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-subtests; extra == "dev"
Requires-Dist: s3fs; extra == "dev"
Requires-Dist: scipy; extra == "dev"
Requires-Dist: types-ujson; extra == "dev"
Requires-Dist: xarray>=2024.10.0; extra == "dev"
Dynamic: license-file

# kerchunk

Cloud-friendly access to archival data

[![Docs](https://github.com/fsspec/kerchunk/actions/workflows/default.yml/badge.svg)](https://fsspec.github.io/kerchunk/)
[![Tests](https://github.com/fsspec/kerchunk/actions/workflows/tests.yml/badge.svg)](https://github.com/fsspec/kerchunk/actions/workflows/tests.yml)
[![Pypi](https://img.shields.io/pypi/v/kerchunk.svg)](https://pypi.python.org/pypi/kerchunk/)
[![Conda-forge](https://img.shields.io/conda/vn/conda-forge/kerchunk.svg)](https://anaconda.org/conda-forge/kerchunk)

Kerchunk is a library that provides a unified way to represent a variety of chunked, compressed
data formats (e.g. NetCDF, HDF5, GRIB),
allowing efficient access to the data from traditional file systems or cloud object storage.
It also provides a flexible way to create
virtual datasets from multiple files.  It does this by extracting the byte ranges,
compression information and other information about the
data and storing this metadata in a new, separate object.  This means that you can
create a virtual aggregate dataset over potentially many source
files, for efficient, parallel and cloud-friendly *in-situ* access without having to copy or
translate the originals. It is a gateway to in-the-cloud massive data processing while
the data providers still insist on using legacy formats for archival storage.

*Why Kerchunk*:

We provide the following things:

- completely serverless architecture
- metadata consolidation, so you can understand a many-file dataset (metadata plus physical storage) in a single read
- read from all of the storage backends supported by fsspec, including object storage (s3, gcs, abfs, alibaba), http,
  cloud user storage (dropbox, gdrive) and network protocols (ftp, ssh, hdfs, smb...)
- loading of various file types (currently netcdf4/HDF, grib2, tiff, fits, zarr), potentially heterogeneous within a
  single dataset, without a need to go via the specific driver (e.g., no need for h5py)
- asynchronous concurrent fetch of many data chunks in one go, amortizing the cost of latency
- parallel access with a library like zarr without any locks
- logical datasets viewing many (>~millions) data files, and direct access/subselection to them via coordinate
  indexing across an arbitrary number of dimensions


<img alt="logo" src="./kerchunk.png" width="200"/>


For further information, please see [the documentation](https://fsspec.github.io/kerchunk/).
