xbatcher.BatchGenerator

class xbatcher.BatchGenerator(ds: Union[Dataset, DataArray], input_dims: Dict[Hashable, int], input_overlap: Dict[Hashable, int] = {}, batch_dims: Dict[Hashable, int] = {}, concat_input_dims: bool = False, preload_batch: bool = True, cache: Optional[Dict[str, Any]] = None, cache_preprocess: Optional[Callable] = None)

Create a generator for iterating through Xarray DataArrays / Datasets in batches.

Parameters:
ds : xarray.Dataset or xarray.DataArray

The data to iterate over.

input_dims : dict

A dictionary specifying the size of the inputs in each dimension, e.g. {'lat': 30, 'lon': 30}. These are the dimensions the ML library will see. All other dimensions will be stacked into one dimension called sample (see the usage sketch below).

input_overlap : dict, optional

A dictionary specifying the overlap along each dimension, e.g. {'lat': 3, 'lon': 3}.

batch_dims : dict, optional

A dictionary specifying the size of the batch along each dimension, e.g. {'time': 10}. These will always be iterated over.

concat_input_dims : bool, optional

If True, the dimension chunks specified in input_dims will be concatenated and stacked into the sample dimension. The batch index will be included as a new level input_batch in the sample coordinate. If False, the dimension chunks specified in input_dims will be iterated over.

preload_batch : bool, optional

If True, each batch will be loaded into memory before reshaping / processing, triggering any dask arrays to be computed.

cache : dict, optional

Dict-like object to cache batches in (e.g., Zarr DirectoryStore); see the caching sketch below. Note: The caching API is experimental and subject to change.

cache_preprocess : callable, optional

A function to apply to batches prior to caching. Note: The caching API is experimental and subject to change.

Yields:
ds_slice : xarray.Dataset or xarray.DataArray

Slices of the array matching the given batch size specification.
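
A minimal usage sketch, assuming a dataset with time, lat, and lon dimensions (the synthetic data, the variable name air, and the sizes below are illustrative, not part of the API):

>>> import numpy as np
>>> import xarray as xr
>>> import xbatcher
>>> ds = xr.Dataset(
...     {"air": (("time", "lat", "lon"), np.random.rand(100, 20, 30))}
... )
>>> bgen = xbatcher.BatchGenerator(
...     ds,
...     input_dims={"lat": 5, "lon": 5},
...     input_overlap={"lat": 1, "lon": 1},
...     batch_dims={"time": 10},
... )
>>> for batch in bgen:
...     # each batch is an xarray.Dataset; per the parameter descriptions above,
...     # the lat and lon sizes match input_dims and the remaining dimensions
...     # are stacked into the sample dimension
...     pass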
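
With concat_input_dims=True, the lat/lon patches within each batch are concatenated and stacked into the sample dimension rather than yielded one at a time, with the batch index appearing as an input_batch level in the sample coordinate (per the parameter description above); the call is otherwise unchanged:

>>> bgen = xbatcher.BatchGenerator(
...     ds,
...     input_dims={"lat": 5, "lon": 5},
...     batch_dims={"time": 10},
...     concat_input_dims=True,
... )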
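
Because cache accepts any dict-like mapping and cache_preprocess any callable, the experimental caching hooks can be sketched with a plain in-memory dict; the dict store and the to_float32 helper below are illustrative assumptions, not a documented recipe:

>>> def to_float32(batch):
...     # hypothetical preprocessing applied to each batch before it is cached
...     return batch.astype("float32")
>>> cache = {}  # any dict-like store, e.g. a Zarr DirectoryStore
>>> bgen = xbatcher.BatchGenerator(
...     ds,
...     input_dims={"lat": 5, "lon": 5},
...     cache=cache,
...     cache_preprocess=to_float32,
... )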

__init__(ds: Union[Dataset, DataArray], input_dims: Dict[Hashable, int], input_overlap: Dict[Hashable, int] = {}, batch_dims: Dict[Hashable, int] = {}, concat_input_dims: bool = False, preload_batch: bool = True, cache: Optional[Dict[str, Any]] = None, cache_preprocess: Optional[Callable] = None)

Methods

__init__(ds, input_dims[, input_overlap, ...])

Attributes

batch_dims

concat_input_dims

input_dims

input_overlap

preload_batch