xbatcher.BatchSchema

class xbatcher.BatchSchema(ds: Union[Dataset, DataArray], input_dims: Dict[Hashable, int], input_overlap: Optional[Dict[Hashable, int]] = None, batch_dims: Optional[Dict[Hashable, int]] = None, concat_input_bins: bool = True, preload_batch: bool = True)

A representation of the indices and stacking/transposing parameters needed to generate batches from Xarray DataArrays and Datasets using xbatcher.BatchGenerator.

Parameters
ds : xarray.Dataset or xarray.DataArray

The data to iterate over. Unlike for the BatchGenerator, the data is not retained as a class attribute for the BatchSchema.

input_dims : dict

A dictionary specifying the size of the inputs in each dimension, e.g. {'lat': 30, 'lon': 30}. These are the dimensions the ML library will see. All other dimensions will be stacked into one dimension called sample.

input_overlap : dict, optional

A dictionary specifying the overlap along each dimension, e.g. {'lat': 3, 'lon': 3}.

batch_dims : dict, optional

A dictionary specifying the size of the batch along each dimension, e.g. {'time': 10}. These will always be iterated over.

concat_input_bins : bool, optional

If True, the dimension chunks specified in input_dims will be concatenated and stacked into the sample dimension. The batch index will be included as a new level input_batch in the sample coordinate. If False, the dimension chunks specified in input_dims will be iterated over.

preload_batch : bool, optional

If True, each batch will be loaded into memory before reshaping / processing, triggering any dask arrays to be computed.

Notes

The BatchSchema is experimental and subject to change without notice.
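
As a minimal sketch of constructing a schema (the dataset, variable name "air", and dimension sizes below are illustrative assumptions, not taken from the xbatcher documentation), one might write:

import numpy as np
import xarray as xr
import xbatcher

# Illustrative dataset: variable name, sizes, and coordinates are assumptions
# made for this sketch only.
ds = xr.Dataset(
    {"air": (("time", "lat", "lon"), np.random.rand(10, 90, 90))},
    coords={
        "time": np.arange(10),
        "lat": np.linspace(-45, 45, 90),
        "lon": np.linspace(0, 89, 90),
    },
)

# Describe 30x30 lat/lon patches overlapping by 3 cells in each direction,
# iterating over batches of 2 time steps.
schema = xbatcher.BatchSchema(
    ds,
    input_dims={"lat": 30, "lon": 30},
    input_overlap={"lat": 3, "lon": 3},
    batch_dims={"time": 2},
)

As noted above, the resulting schema records the indexing and stacking parameters but does not retain the data itself.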

__init__(ds: Union[Dataset, DataArray], input_dims: Dict[Hashable, int], input_overlap: Optional[Dict[Hashable, int]] = None, batch_dims: Optional[Dict[Hashable, int]] = None, concat_input_bins: bool = True, preload_batch: bool = True)

Methods

__init__(ds, input_dims[, input_overlap, ...])