Datasets

`terratorch.datasets.biomassters` #

`BioMasstersNonGeo` #

Bases: BioMassters

BioMassters Dataset for Aboveground Biomass prediction.

Dataset intended for Aboveground Biomass (AGB) prediction over Finnish forests based on Sentinel 1 and 2 data with corresponding target AGB mask values generated by Light Detection and Ranging (LiDAR).

Dataset Format:

.tif files for Sentinel 1 and 2 data
.tif file for pixel wise AGB target mask
.csv files for metadata regarding features and targets

Dataset Features:

13,000 target AGB masks of size (256x256px)
12 months of data per target mask
Sentinel 1 and Sentinel 2 data for each location
Sentinel 1 available for every month
Sentinel 2 available for most months (some months are missing because ESA halted acquisition over the region during particular periods)

If you use this dataset in your research, please cite the following paper:

https://nascetti-a.github.io/BioMasster/

New in version 0.5.

`__init__(root='data', split='train', bands=BAND_SETS['all'], transform=None, mask_mean=63.4584, mask_std=72.21242, sensors=['S1', 'S2'], as_time_series=False, metadata_filename=default_metadata_filename, max_cloud_percentage=None, max_red_mean=None, include_corrupt=True, subset=1, seed=42, use_four_frames=False)` #

Initialize a new instance of BioMassters dataset.

If as_time_series=False (the default), each time step becomes its own sample with the target being shared across multiple samples.
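
The effect of the two modes on the number of samples can be illustrated with a minimal sketch. It does not call the dataset class; the `chips` mapping, the file names, and the `build_index` helper are hypothetical stand-ins for however the dataset indexes its files.

```python
# Hypothetical index: one target mask per chip, several monthly images per chip.
chips = {
    "chip_a": {"target": "chip_a_agbm.tif", "months": ["S2_00.tif", "S2_01.tif", "S2_02.tif"]},
    "chip_b": {"target": "chip_b_agbm.tif", "months": ["S2_00.tif", "S2_01.tif"]},
}


def build_index(chips, as_time_series):
    """Return a list of (images, target) pairs, one pair per sample."""
    samples = []
    for chip in chips.values():
        if as_time_series:
            # One sample per location: all time steps are kept together.
            samples.append((list(chip["months"]), chip["target"]))
        else:
            # One sample per time step: the same target is repeated.
            for image in chip["months"]:
                samples.append(([image], chip["target"]))
    return samples


per_step = build_index(chips, as_time_series=False)
per_chip = build_index(chips, as_time_series=True)
print(len(per_step), len(per_chip))
```

With `as_time_series=False` the index grows with the number of available months, so the same target mask is seen several times per epoch.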

Parameters:

Name	Type	Description	Default
`root`		root directory where dataset can be found	`'data'`
`split`	`str`	train or test split	`'train'`
`sensors`	`Sequence[str]`	which sensors to consider for the sample, Sentinel 1 and/or Sentinel 2 ('S1', 'S2')	`['S1', 'S2']`
`as_time_series`	`bool`	whether or not to return all available time-steps or just a single one for a given target location	`False`
`metadata_filename`	`str`	metadata file to be used	`default_metadata_filename`
`max_cloud_percentage`	`float \| None`	maximum allowed cloud percentage for images	`None`
`max_red_mean`	`float \| None`	maximum allowed red_mean value for images	`None`
`include_corrupt`	`bool`	whether to include images marked as corrupted	`True`
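
As a rough illustration of how the three filtering parameters could interact, the sketch below filters a list of metadata rows. The column names (`cloud_percentage`, `red_mean`, `corrupt`), the inclusive thresholds, and the `filter_rows` helper are assumptions made for this example and are not taken from the dataset's metadata file.

```python
# Hypothetical metadata rows; the real column names may differ.
rows = [
    {"chip_id": "a", "cloud_percentage": 5.0, "red_mean": 900.0, "corrupt": False},
    {"chip_id": "b", "cloud_percentage": 60.0, "red_mean": 1200.0, "corrupt": False},
    {"chip_id": "c", "cloud_percentage": 2.0, "red_mean": 4000.0, "corrupt": False},
    {"chip_id": "d", "cloud_percentage": 1.0, "red_mean": 800.0, "corrupt": True},
]


def filter_rows(rows, max_cloud_percentage=None, max_red_mean=None, include_corrupt=True):
    """Keep rows that pass every active filter; None disables a threshold."""
    kept = []
    for row in rows:
        if max_cloud_percentage is not None and row["cloud_percentage"] > max_cloud_percentage:
            continue
        if max_red_mean is not None and row["red_mean"] > max_red_mean:
            continue
        if not include_corrupt and row["corrupt"]:
            continue
        kept.append(row)
    return kept


everything = filter_rows(rows)
strict = filter_rows(rows, max_cloud_percentage=10.0, max_red_mean=3000.0, include_corrupt=False)
print([row["chip_id"] for row in strict])
```

With the defaults (`None`, `None`, `True`) nothing is filtered out, which matches the defaults listed in the table.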

Raises:

Type	Description
`AssertionError`	if `split` or `sensors` is invalid
`DatasetNotFoundError`	If dataset is not found.

`plot(sample, show_titles=True, suptitle=None)` #

Plot a sample from the dataset.

Parameters:

Name	Type	Description	Default
`sample`	`dict[str, Tensor]`	a sample returned by `__getitem__`	required
`show_titles`	`bool`	flag indicating whether to show titles above each panel	`True`
`suptitle`	`str \| None`	optional suptitle to use for figure	`None`

Returns:

Type	Description
`Figure`	a matplotlib Figure with the rendered sample

`terratorch.datasets.burn_intensity` #

`BurnIntensityNonGeo` #

Bases: NonGeoDataset

Dataset implementation for Burn Intensity classification.

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, use_full_data=True, no_data_replace=0.0001, no_label_replace=-1, use_metadata=False)` #

Initialize the BurnIntensity dataset.

Parameters:

Name	Type	Description	Default
`data_root`	`str`	Path to the data root directory.	required
`split`	`str`	One of 'train' or 'val'.	`'train'`
`bands`	`Sequence[str]`	Bands to output. Defaults to all bands.	`BAND_SETS['all']`
`transform`	`Optional[Compose]`	Albumentations transform to be applied.	`None`
`use_metadata`	`bool`	Whether to return metadata info (location).	`False`
`use_full_data`	`bool`	Whether to use full data or data with less than 25 percent zeros.	`True`
`no_data_replace`	`Optional[float]`	Value to replace NaNs in images.	`0.0001`
`no_label_replace`	`Optional[int]`	Value to replace NaNs in labels.	`-1`
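
The two replacement parameters can be pictured with a small NumPy sketch. It assumes that missing values are encoded as NaN, as the table states, and uses a hypothetical `replace_nan` helper; the dataset's own implementation may differ.

```python
import numpy as np

image = np.array([[0.2, np.nan], [0.5, 0.1]], dtype=np.float32)
label = np.array([[1.0, np.nan], [0.0, 2.0]], dtype=np.float32)


def replace_nan(array, value):
    """Replace NaNs with `value`; None leaves the array untouched."""
    if value is None:
        return array
    return np.where(np.isnan(array), value, array)


clean_image = replace_nan(image, 0.0001)
# Labels become integers once the NaNs are gone.
clean_label = replace_nan(label, -1).astype(np.int64)
print(clean_image)
print(clean_label)
```

A negative label such as `-1` is commonly used as an ignore index, so that replaced pixels can be excluded from the loss.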

`plot(sample, suptitle=None)` #

Plot a sample from the dataset.

Parameters:

Name	Type	Description	Default
`sample`	`dict[str, Tensor]`	A sample returned by `__getitem__`.	required
`suptitle`	`str \| None`	Optional string to use as a suptitle.	`None`

Returns:

Type	Description
`Any`	A matplotlib Figure with the rendered sample.

`terratorch.datasets.carbonflux` #

`CarbonFluxNonGeo` #

Bases: NonGeoDataset

Dataset for Carbon Flux regression from HLS images and MERRA data.

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, gpp_mean=None, gpp_std=None, no_data_replace=0.0001, use_metadata=False, modalities=('image', 'merra_vars'))` #

Initialize the CarbonFluxNonGeo dataset.

Parameters:

Name	Type	Description	Default
`data_root`	`str`	Path to the data root directory.	required
`split`	`str`	'train' or 'test'.	`'train'`
`bands`	`Sequence[str]`	Bands to use. Defaults to all bands.	`BAND_SETS['all']`
`transform`	`Optional[Compose]`	Albumentations transform to be applied.	`None`
`use_metadata`	`bool`	Whether to return metadata (coordinates and date).	`False`
`merra_means`	`Sequence[float]`	Means for MERRA data normalization.	required
`merra_stds`	`Sequence[float]`	Standard deviations for MERRA data normalization.	required
`gpp_mean`	`float`	Mean for GPP normalization.	`None`
`gpp_std`	`float`	Standard deviation for GPP normalization.	`None`
`no_data_replace`	`Optional[float]`	Value to replace NO_DATA values in images.	`0.0001`
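
The mean and standard deviation parameters suggest standard z-score normalization. The sketch below assumes that formula, which the source does not state explicitly, and uses made-up statistics for three MERRA variables and for GPP.

```python
import numpy as np

# Hypothetical statistics; real values depend on the training data.
merra_means = np.array([280.0, 1.5, 0.3])
merra_stds = np.array([10.0, 0.5, 0.1])
gpp_mean, gpp_std = 4.0, 2.0

merra_vars = np.array([290.0, 1.0, 0.3])
gpp = 6.0

# Assumed z-score normalization: subtract the mean, divide by the std.
merra_norm = (merra_vars - merra_means) / merra_stds
gpp_norm = (gpp - gpp_mean) / gpp_std

# Inverting the normalization returns a prediction to its original units.
gpp_restored = gpp_norm * gpp_std + gpp_mean
print(merra_norm, gpp_norm, gpp_restored)
```

Keeping `gpp_mean` and `gpp_std` around is what makes it possible to report regression metrics in the original GPP units.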

`plot(sample, suptitle=None)` #

Plot a sample from the dataset.

Parameters:

Name	Type	Description	Default
`sample`	`dict[str, Any]`	A sample returned by `__getitem__`.	required
`suptitle`	`str \| None`	Optional title for the figure.	`None`

Returns:

Type	Description
`Any`	A matplotlib figure with the rendered sample.

`terratorch.datasets.forestnet` #

`ForestNetNonGeo` #

Bases: NonGeoDataset

NonGeo dataset implementation for ForestNet.

`__init__(data_root, split='train', label_map=default_label_map, transform=None, fraction=1.0, bands=BAND_SETS['all'], use_metadata=False)` #

Initialize the ForestNetNonGeo dataset.

Parameters:

Name	Type	Description	Default
`data_root`	`str`	Path to the data root directory.	required
`split`	`str`	One of 'train', 'val', or 'test'.	`'train'`
`label_map`	`Dict[str, int]`	Mapping from label names to integer labels.	`default_label_map`
`transform`	`Compose \| None`	Transformations to be applied to the images.	`None`
`fraction`	`float`	Fraction of the dataset to use. Defaults to 1.0 (use all data).	`1.0`

`map_label(index)` #

Map the label name to an integer label.
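
A label map of type `Dict[str, int]` can be applied as in the sketch below. The class names and integer values are hypothetical; the actual `default_label_map` may use different names or ordering.

```python
# Hypothetical mapping; the real default_label_map may differ.
label_map = {
    "Plantation": 0,
    "Smallholder agriculture": 1,
    "Grassland shrubland": 2,
    "Other": 3,
}

label_names = ["Other", "Plantation", "Grassland shrubland"]
encoded = [label_map[name] for name in label_names]
print(encoded)
```

Passing a custom `label_map` is a way to merge or reorder classes without touching the files on disk.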

`terratorch.datasets.fire_scars` #

`FireScarsHLS` #

Bases: RasterDataset

RasterDataset implementation for fire scars input images.

`FireScarsNonGeo` #

Bases: NonGeoDataset

NonGeo dataset implementation for fire scars.

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, no_data_replace=0, no_label_replace=-1, use_metadata=False)` #

Initialize the FireScarsNonGeo dataset.

Parameters:

Name	Type	Description	Default
`data_root`	`str`	Path to the data root directory.	required
`bands`	`list[str]`	Bands that should be output by the dataset. Defaults to all bands.	`BAND_SETS['all']`
`transform`	`Compose \| None`	Albumentations transform to be applied. Should end with ToTensorV2(). If used through the corresponding data module, should not include normalization. Defaults to None, which applies ToTensorV2().	`None`
`no_data_replace`	`float \| None`	Replace NaN values in input images with this value. If None, does no replacement. Defaults to 0.	`0`
`no_label_replace`	`int \| None`	Replace NaN values in label with this value. If None, does no replacement. Defaults to -1.	`-1`
`use_metadata`	`bool`	Whether to return metadata info (time and location).	`False`

`plot(sample, suptitle=None)` #

Plot a sample from the dataset.

Parameters:

Name	Type	Description	Default
`sample`	`dict[str, Tensor]`	a sample returned by `__getitem__`	required
`suptitle`	`str \| None`	optional string to use as a suptitle	`None`

Returns:

Type	Description
`Figure`	a matplotlib Figure with the rendered sample

`FireScarsSegmentationMask` #

Bases: RasterDataset

RasterDataset implementation for fire scars segmentation mask. Can be easily merged with input images using the & operator.

`terratorch.datasets.landslide4sense` #

`Landslide4SenseNonGeo` #

Bases: NonGeoDataset

NonGeo dataset implementation for Landslide4Sense.

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None)` #

Initialize the Landslide4Sense dataset.

Parameters:

Name	Type	Description	Default
`data_root`	`str`	Path to the data root directory.	required
`split`	`str`	One of 'train', 'validation', or 'test'.	`'train'`
`bands`	`Sequence[str]`	Bands to be used. Defaults to all bands.	`BAND_SETS['all']`
`transform`	`Compose \| None`	Albumentations transform to be applied. Defaults to None, which applies default_transform().	`None`

`terratorch.datasets.m_eurosat` #

`MEuroSATNonGeo` #

Bases: NonGeoDataset

NonGeo dataset implementation for M-EuroSAT.

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default')` #

Initialize the dataset.

Parameters:

Name	Type	Description	Default
`data_root`	`str`	Path to the data root directory.	required
`split`	`str`	One of 'train', 'val', or 'test'.	`'train'`
`bands`	`Sequence[str]`	Bands to be used. Defaults to all bands.	`BAND_SETS['all']`
`transform`	`Compose \| None`	Albumentations transform to be applied. Defaults to None, which applies default_transform().	`None`
`partition`	`str`	Partition name for the dataset splits. Defaults to 'default'.	`'default'`

`plot(sample, suptitle=None)` #

Plot a sample from the dataset.

Parameters:

Name	Type	Description	Default
`sample`	`dict[str, Tensor]`	A sample returned by `__getitem__`.	required
`suptitle`	`str \| None`	Optional string to use as a suptitle.	`None`

Returns:

Type	Description
`Figure`	A matplotlib Figure with the rendered sample.

`terratorch.datasets.m_bigearthnet` #

`MBigEarthNonGeo` #

Bases: NonGeoDataset

NonGeo dataset implementation for M-BigEarthNet.

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default')` #

Initialize the dataset.

Parameters:

Name	Type	Description	Default
`data_root`	`str`	Path to the data root directory.	required
`split`	`str`	One of 'train', 'val', or 'test'.	`'train'`
`bands`	`Sequence[str]`	Bands to be used. Defaults to all bands.	`BAND_SETS['all']`
`transform`	`Compose \| None`	Albumentations transform to be applied. Defaults to None, which applies default_transform().	`None`
`partition`	`str`	Partition name for the dataset splits. Defaults to 'default'.	`'default'`

`plot(sample, suptitle=None)` #

Plot a sample from the dataset.

Parameters:

Name	Type	Description	Default
`sample`	`dict[str, Tensor]`	A sample returned by `__getitem__`.	required
`suptitle`	`str \| None`	Optional string to use as a suptitle.	`None`

Returns:

Type	Description
`Figure`	A matplotlib Figure with the rendered sample.

`terratorch.datasets.m_brick_kiln` #

`MBrickKilnNonGeo` #

Bases: NonGeoDataset

NonGeo dataset implementation for M-BrickKiln.

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default')` #

Initialize the dataset.

Parameters:

Name	Type	Description	Default
`data_root`	`str`	Path to the data root directory.	required
`split`	`str`	One of 'train', 'val', or 'test'.	`'train'`
`bands`	`Sequence[str]`	Bands to be used. Defaults to all bands.	`BAND_SETS['all']`
`transform`	`Compose \| None`	Albumentations transform to be applied. Defaults to None, which applies default_transform().	`None`
`partition`	`str`	Partition name for the dataset splits. Defaults to 'default'.	`'default'`

`plot(sample, suptitle=None)` #

Plot a sample from the dataset.

Parameters:

Name	Type	Description	Default
`sample`	`dict[str, Tensor]`	A sample returned by `__getitem__`.	required
`suptitle`	`str \| None`	Optional string to use as a suptitle.	`None`

Returns:

Type	Description
`Figure`	A matplotlib Figure with the rendered sample.

`terratorch.datasets.m_forestnet` #

`MForestNetNonGeo` #

Bases: NonGeoDataset

NonGeo dataset implementation for M-ForestNet.

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default', use_metadata=False)` #

Initialize the dataset.

Parameters:

Name	Type	Description	Default
`data_root`	`str`	Path to the data root directory.	required
`split`	`str`	One of 'train', 'val', or 'test'.	`'train'`
`bands`	`Sequence[str]`	Bands to be used. Defaults to all bands.	`BAND_SETS['all']`
`transform`	`Compose \| None`	Albumentations transform to be applied. Defaults to None, which applies default_transform().	`None`
`partition`	`str`	Partition name for the dataset splits. Defaults to 'default'.	`'default'`
`use_metadata`	`bool`	Whether to return metadata info (time and location).	`False`

`plot(sample, suptitle=None)` #

Plot a sample from the dataset.

Parameters:

Name	Type	Description	Default
`sample`	`dict[str, Tensor]`	A sample returned by `__getitem__`.	required
`suptitle`	`str \| None`	Optional string to use as a suptitle.	`None`

Returns:

Type	Description
`Figure`	A matplotlib Figure with the rendered sample.

`terratorch.datasets.m_so2sat` #

`MSo2SatNonGeo` #

Bases: NonGeoDataset

NonGeo dataset implementation for M-So2Sat.

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default')` #

Initialize the dataset.

Parameters:

Name	Type	Description	Default
`data_root`	`str`	Path to the data root directory.	required
`split`	`str`	One of 'train', 'val', or 'test'.	`'train'`
`bands`	`Sequence[str]`	Bands to be used. Defaults to all bands.	`BAND_SETS['all']`
`transform`	`Compose \| None`	Albumentations transform to be applied. Defaults to None, which applies default_transform().	`None`
`partition`	`str`	Partition name for the dataset splits. Defaults to 'default'.	`'default'`

`plot(sample, suptitle=None)` #

Plot a sample from the dataset.

Parameters:

Name	Type	Description	Default
`sample`	`dict[str, Tensor]`	A sample returned by `__getitem__`.	required
`suptitle`	`str \| None`	Optional string to use as a suptitle.	`None`

Returns:

Type	Description
`Figure`	A matplotlib Figure with the rendered sample.

`terratorch.datasets.m_pv4ger` #

`MPv4gerNonGeo` #

Bases: NonGeoDataset

NonGeo dataset implementation for M-PV4GER.

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default', use_metadata=False)` #

Initialize the dataset.

Parameters:

Name	Type	Description	Default
`data_root`	`str`	Path to the data root directory.	required
`split`	`str`	One of 'train', 'val', or 'test'.	`'train'`
`bands`	`Sequence[str]`	Bands to be used. Defaults to all bands.	`BAND_SETS['all']`
`transform`	`Compose \| None`	Albumentations transform to be applied. Defaults to None, which applies default_transform().	`None`
`partition`	`str`	Partition name for the dataset splits. Defaults to 'default'.	`'default'`
`use_metadata`	`bool`	Whether to return metadata info (location coordinates).	`False`

`plot(sample, suptitle=None)` #

Plot a sample from the dataset.

Parameters:

Name	Type	Description	Default
`sample`	`dict[str, Tensor]`	A sample returned by `__getitem__`.	required
`suptitle`	`str \| None`	Optional string to use as a suptitle.	`None`

Returns:

Type	Description
`Figure`	A matplotlib Figure with the rendered sample.

`terratorch.datasets.m_cashew_plantation` #

`MBeninSmallHolderCashewsNonGeo` #

Bases: NonGeoDataset

NonGeo dataset implementation for M-BeninSmallHolderCashews.

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default', use_metadata=False)` #

Initialize the dataset.

Parameters:

Name	Type	Description	Default
`data_root`	`str`	Path to the data root directory.	required
`split`	`str`	One of 'train', 'val', or 'test'.	`'train'`
`bands`	`Sequence[str]`	Bands to be used. Defaults to all bands.	`BAND_SETS['all']`
`transform`	`Compose \| None`	Albumentations transform to be applied. Defaults to None, which applies default_transform().	`None`
`partition`	`str`	Partition name for the dataset splits. Defaults to 'default'.	`'default'`
`use_metadata`	`bool`	Whether to return metadata info (time).	`False`

`plot(sample, suptitle=None)` #

Plot a sample from the dataset.

Parameters:

Name	Type	Description	Default
`sample`	`dict[str, Tensor]`	A sample returned by `__getitem__`.	required
`suptitle`	`str \| None`	Optional string to use as a suptitle.	`None`

Returns:

Type	Description
`Figure`	A matplotlib Figure with the rendered sample.

`terratorch.datasets.m_nz_cattle` #

`MNzCattleNonGeo` #

Bases: NonGeoDataset

NonGeo dataset implementation for M-NZ-Cattle.

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default', use_metadata=False)` #

Initialize the dataset.

Parameters:

Name	Type	Description	Default
`data_root`	`str`	Path to the data root directory.	required
`split`	`str`	One of 'train', 'val', or 'test'.	`'train'`
`bands`	`Sequence[str]`	Bands to be used. Defaults to all bands.	`BAND_SETS['all']`
`transform`	`Compose \| None`	Albumentations transform to be applied. Defaults to None, which applies default_transform().	`None`
`partition`	`str`	Partition name for the dataset splits. Defaults to 'default'.	`'default'`
`use_metadata`	`bool`	Whether to return metadata info (time and location).	`False`

`plot(sample, suptitle=None)` #

Plot a sample from the dataset.

Parameters:

Name	Type	Description	Default
`sample`	`dict[str, Tensor]`	A sample returned by `__getitem__`.	required
`suptitle`	`str \| None`	Optional string to use as a suptitle.	`None`

Returns:

Type	Description
`Figure`	A matplotlib Figure with the rendered sample.

`terratorch.datasets.m_chesapeake_landcover` #

`MChesapeakeLandcoverNonGeo` #

Bases: NonGeoDataset

NonGeo dataset implementation for M-ChesapeakeLandcover.

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default')` #

Initialize the dataset.

Parameters:

Name	Type	Description	Default
`data_root`	`str`	Path to the data root directory.	required
`split`	`str`	One of 'train', 'val', or 'test'.	`'train'`
`bands`	`Sequence[str]`	Bands to be used. Defaults to all bands.	`BAND_SETS['all']`
`transform`	`Compose \| None`	Albumentations transform to be applied. Defaults to None, which applies default_transform().	`None`
`partition`	`str`	Partition name for the dataset splits. Defaults to 'default'.	`'default'`

`plot(sample, suptitle=None)` #

Plot a sample from the dataset.

Parameters:

Name	Type	Description	Default
`sample`	`dict[str, Tensor]`	A sample returned by `__getitem__`.	required
`suptitle`	`str \| None`	Optional string to use as a suptitle.	`None`

Returns:

Type	Description
`Figure`	A matplotlib Figure with the rendered sample.

`terratorch.datasets.m_pv4ger_seg` #

`MPv4gerSegNonGeo` #

Bases: NonGeoDataset

NonGeo dataset implementation for M-PV4GER-SEG.

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default', use_metadata=False)` #

Initialize the dataset.

Parameters:

Name	Type	Description	Default
`data_root`	`str`	Path to the data root directory.	required
`split`	`str`	One of 'train', 'val', or 'test'.	`'train'`
`bands`	`Sequence[str]`	Bands to be used. Defaults to all bands.	`BAND_SETS['all']`
`transform`	`Compose \| None`	Albumentations transform to be applied. Defaults to None, which applies default_transform().	`None`
`partition`	`str`	Partition name for the dataset splits. Defaults to 'default'.	`'default'`
`use_metadata`	`bool`	Whether to return metadata info (location coordinates).	`False`

`plot(sample, suptitle=None)` #

Plot a sample from the dataset.

Parameters:

Name	Type	Description	Default
`sample`	`dict[str, Tensor]`	A sample returned by `__getitem__`.	required
`suptitle`	`str \| None`	Optional string to use as a suptitle.	`None`

Returns:

Type	Description
`Figure`	A matplotlib Figure with the rendered sample.

`terratorch.datasets.m_SA_crop_type` #

`MSACropTypeNonGeo` #

Bases: NonGeoDataset

NonGeo dataset implementation for M-SA-Crop-Type.

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default')` #

Initialize the dataset.

Parameters:

Name	Type	Description	Default
`data_root`	`str`	Path to the data root directory.	required
`split`	`str`	One of 'train', 'val', or 'test'.	`'train'`
`bands`	`Sequence[str]`	Bands to be used. Defaults to all bands.	`BAND_SETS['all']`
`transform`	`Compose \| None`	Albumentations transform to be applied. Defaults to None, which applies default_transform().	`None`
`partition`	`str`	Partition name for the dataset splits. Defaults to 'default'.	`'default'`

`plot(sample, suptitle=None)` #

Plot a sample from the dataset.

Parameters:

Name	Type	Description	Default
`sample`	`dict[str, Tensor]`	A sample returned by `__getitem__`.	required
`suptitle`	`str \| None`	Optional string to use as a suptitle.	`None`

Returns:

Type	Description
`Figure`	A matplotlib Figure with the rendered sample.

`terratorch.datasets.m_neontree` #

`MNeonTreeNonGeo` #

Bases: NonGeoDataset

NonGeo dataset implementation for M-NeonTree.

`__init__(data_root, split='train', bands=rgb_bands, transform=None, partition='default')` #

Initialize the dataset.

Parameters:

Name	Type	Description	Default
`data_root`	`str`	Path to the data root directory.	required
`split`	`str`	One of 'train', 'val', or 'test'.	`'train'`
`bands`	`Sequence[str]`	Bands to be used. Defaults to RGB bands.	`rgb_bands`
`transform`	`Compose \| None`	Albumentations transform to be applied. Defaults to None, which applies default_transform().	`None`
`partition`	`str`	Partition name for the dataset splits. Defaults to 'default'.	`'default'`

`plot(sample, suptitle=None)` #

Plot a sample from the dataset.

Parameters:

Name	Type	Description	Default
`sample`	`dict[str, Tensor]`	A sample returned by `__getitem__`.	required
`suptitle`	`str \| None`	Optional string to use as a suptitle.	`None`

Returns:

Type	Description
`Figure`	A matplotlib Figure with the rendered sample.

`terratorch.datasets.multi_temporal_crop_classification` #

`MultiTemporalCropClassification` #

Bases: NonGeoDataset

NonGeo dataset implementation for multi-temporal crop classification.

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, no_data_replace=None, no_label_replace=None, expand_temporal_dimension=True, reduce_zero_label=True, use_metadata=False, metadata_file_name='chips_df.csv')` #

Initialize the MultiTemporalCropClassification dataset.

Parameters:

Name	Type	Description	Default
`data_root`	`str`	Path to the data root directory.	required
`split`	`str`	one of 'train' or 'val'.	`'train'`
`bands`	`list[str]`	Bands that should be output by the dataset. Defaults to all bands.	`BAND_SETS['all']`
`transform`	`Compose \| None`	Albumentations transform to be applied. Should end with ToTensorV2(). If used through the corresponding data module, should not include normalization. Defaults to None, which applies ToTensorV2().	`None`
`no_data_replace`	`float \| None`	Replace NaN values in input images with this value. If None, does no replacement. Defaults to None.	`None`
`no_label_replace`	`int \| None`	Replace NaN values in label with this value. If None, does no replacement. Defaults to None.	`None`
`expand_temporal_dimension`	`bool`	Go from shape (time*channels, h, w) to (channels, time, h, w). Defaults to True.	`True`
`reduce_zero_label`	`bool`	Subtract 1 from all labels. Useful when labels start from 1 instead of the expected 0. Defaults to True.	`True`
`use_metadata`	`bool`	Whether to return metadata info (time and location).	`False`
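
The shape change behind `expand_temporal_dimension` and the label shift behind `reduce_zero_label` can be sketched with NumPy. The sketch assumes a time-major layout of the flattened axis (all channels of step 0, then all channels of step 1, and so on). That ordering is an assumption of this example; for channel-major data the intermediate reshape would be `(channels, time, h, w)` with no transpose.

```python
import numpy as np

time, channels, h, w = 3, 2, 4, 4

# Assumption: time-major layout of the stacked axis.
flat = np.arange(time * channels * h * w, dtype=np.float32).reshape(time * channels, h, w)

# (time*channels, h, w) -> (time, channels, h, w) -> (channels, time, h, w)
expanded = flat.reshape(time, channels, h, w).transpose(1, 0, 2, 3)
print(expanded.shape)

# reduce_zero_label: labels 1..N become 0..N-1.
labels = np.array([[1, 2], [3, 1]])
reduced = labels - 1
print(reduced)
```

Under this layout, channel `c` at time step `t` of the expanded array corresponds to index `t * channels + c` of the flattened array.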

`plot(sample, suptitle=None)` #

Plot a sample from the dataset.

Parameters:

Name	Type	Description	Default
`sample`	`dict[str, Tensor]`	a sample returned by `__getitem__`	required
`suptitle`	`str \| None`	optional string to use as a suptitle	`None`

Returns:

Type	Description
`Figure`	a matplotlib Figure with the rendered sample

`terratorch.datasets.open_sentinel_map` #

`OpenSentinelMap` #

Bases: NonGeoDataset

PyTorch Dataset class to load samples from the OpenSentinelMap dataset, supporting multiple bands and temporal sampling strategies.

`__init__(data_root, split='train', bands=None, transform=None, spatial_interpolate_and_stack_temporally=True, pad_image=None, truncate_image=None, target=0, pick_random_pair=True)` #

Parameters:

Name	Type	Description	Default
`data_root`	`str`	Path to the root directory of the dataset.	required
`split`	`str`	Dataset split to load. Options are 'train', 'val', or 'test'. Defaults to 'train'.	`'train'`
`bands`	`list of str`	List of band names to load. Defaults to ['gsd_10', 'gsd_20', 'gsd_60'].	`None`
`transform`	`Compose`	Albumentations transformations to apply to the data.	`None`
`spatial_interpolate_and_stack_temporally`	`bool`	If True, the bands are interpolated and concatenated over time. Default is True.	`True`
`pad_image`	`int`	Number of timesteps to pad the time dimension of the image. If None, no padding is applied.	`None`
`truncate_image`	`int`	Number of timesteps to truncate the time dimension of the image. If None, no truncation is performed.	`None`
`target`	`int`	Specifies which target class to use from the mask. Default is 0.	`0`
`pick_random_pair`	`bool`	If True, selects two random images from the temporal sequence. Default is True.	`True`
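
Padding and truncating along the time dimension can be pictured as follows. This is a sketch under stated assumptions: time is taken to be axis 0, truncation keeps the earliest steps, and padding appends zeros at the end. The helpers `truncate_time` and `pad_time` are hypothetical and the dataset may make different choices.

```python
import numpy as np


def truncate_time(image, length):
    """Keep at most `length` time steps on axis 0; None disables truncation."""
    return image if length is None else image[:length]


def pad_time(image, length):
    """Zero-pad axis 0 up to `length` time steps; None disables padding."""
    if length is None or image.shape[0] >= length:
        return image
    pad_width = [(0, length - image.shape[0])] + [(0, 0)] * (image.ndim - 1)
    return np.pad(image, pad_width, mode="constant", constant_values=0)


short_series = np.ones((2, 8, 8, 3))  # 2 time steps
long_series = np.ones((6, 8, 8, 3))   # 6 time steps

fixed_short = pad_time(truncate_time(short_series, 4), 4)
fixed_long = pad_time(truncate_time(long_series, 4), 4)
print(fixed_short.shape, fixed_long.shape)
```

Using the same value for both parameters gives every sample the same temporal length, which is what batching with a default collate function requires.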

`terratorch.datasets.openearthmap` #

`OpenEarthMapNonGeo` #

Bases: NonGeoDataset

OpenEarthMapNonGeo Dataset for non-georeferenced imagery.

This dataset class handles non-georeferenced image data from the OpenEarthMap dataset. It supports configurable band sets and transformations, and performs cropping operations to ensure that the images conform to the required input dimensions. The dataset is split into "train", "test", and "val" subsets based on the provided split parameter.

`__init__(data_root, bands=BAND_SETS['all'], transform=None, split='train', crop_size=256, random_crop=True)` #

Initialize a new instance of the OpenEarthMapNonGeo dataset.

Parameters:

Name	Type	Description	Default
`data_root`	`str`	The root directory containing the dataset files.	required
`bands`	`Sequence[str]`	A list of band names to be used. Default is BAND_SETS["all"].	`BAND_SETS['all']`
`transform`	`Compose or None`	A transformation pipeline to be applied to the data. If None, a default transform converting the data to a tensor is applied.	`None`
`split`	`str`	The dataset split to use ("train", "test", or "val"). Default is "train".	`'train'`
`crop_size`	`int`	The size (in pixels) of the crop to apply to images. Must be greater than 0. Default is 256.	`256`
`random_crop`	`bool`	If True, performs a random crop; otherwise, performs a center crop. Default is True.	`True`

Raises:

Type	Description
`Exception`	If the provided split is not one of "train", "test", or "val".
`AssertionError`	If crop_size is not greater than 0.
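
The difference between a random crop and a center crop can be sketched with NumPy. The `crop` helper, the channels-last layout, and the image size are assumptions of this example, and the input is assumed to be at least `crop_size` pixels on each side.

```python
import numpy as np


def crop(image, crop_size, random_crop, rng):
    """Cut a square patch from an (h, w, c) image, either random or centered."""
    assert crop_size > 0, "crop_size must be greater than 0"
    h, w = image.shape[:2]
    if random_crop:
        # Upper bound is exclusive, so the patch always fits in the image.
        top = int(rng.integers(0, h - crop_size + 1))
        left = int(rng.integers(0, w - crop_size + 1))
    else:
        top = (h - crop_size) // 2
        left = (w - crop_size) // 2
    return image[top:top + crop_size, left:left + crop_size]


rng = np.random.default_rng(0)
image = np.zeros((1024, 1024, 3), dtype=np.uint8)

center_patch = crop(image, 256, random_crop=False, rng=rng)
random_patch = crop(image, 256, random_crop=True, rng=rng)
print(center_patch.shape, random_patch.shape)
```

Random crops act as a light augmentation during training, while center crops give reproducible patches for validation and testing.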

`terratorch.datasets.pastis` #

`PASTIS` #

Bases: NonGeoDataset

PyTorch Dataset class to load samples from the PASTIS dataset, for semantic and panoptic segmentation.

`__init__(data_root, norm=True, target='semantic', folds=None, reference_date='2018-09-01', date_interval=(-200, 600), class_mapping=None, transform=None, truncate_image=None, pad_image=None, satellites=['S2'])` #

Parameters:

Name	Type	Description	Default
`data_root`	`str`	Path to the dataset.	required
`norm`	`bool`	If true, images are standardised using pre-computed channel-wise means and standard deviations.	`True`
`reference_date`	`str`	Reference date in 'YYYY-MM-DD' format, relative to which all observation dates are expressed. Along with the image time series and the target tensor, the dataset yields the sequence of observation dates as the number of days since the reference date. This sequence is used, for instance, for positional encoding in attention-based approaches.	`'2018-09-01'`
`target`	`str`	'semantic' or 'instance'. Defines which type of target is returned. If 'semantic', the target is a tensor containing the class of each pixel. If 'instance', the target is the concatenation of several signals needed to train the Parcel-as-Points module: the centerness heatmap, the instance ids, the Voronoi partitioning of the patch with regard to the parcels' centers, the (height, width) size of each parcel, the semantic label of each parcel, and the semantic label of each pixel.	`'semantic'`
`folds`	`list`	List of ints specifying which of the 5 official folds to load. By default (when None is specified), all folds are loaded.	`None`
`class_mapping`	`dict`	A dictionary to define a mapping between the default 18 class nomenclature and another class grouping. If not provided, the default class mapping is used.	`None`
`transform`	`callable`	A transform to apply to the loaded data (images, dates, and masks). By default, no transformation is applied.	`None`
`truncate_image`	`int`	Truncate the time dimension of the image to a specified number of timesteps. If None, no truncation is performed.	`None`
`pad_image`	`int`	Pad the time dimension of the image to a specified number of timesteps. If None, no padding is applied.	`None`
`satellites`	`list`	Defines the satellites to use. If you are using PASTIS-R, you have access to Sentinel-2 imagery and Sentinel-1 observations in Ascending and Descending orbits, respectively S2, S1A, and S1D. For example, use satellites=['S2', 'S1A'] for Sentinel-2 + Sentinel-1 ascending time series, or satellites=['S2', 'S1A', 'S1D'] to retrieve all time series. If you are using PASTIS, only S2 observations are available.	`['S2']`
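
Expressing observation dates as days since `reference_date` can be sketched with the standard library. The observation dates below are made up for illustration; only the default reference date comes from the table above.

```python
from datetime import date

reference_date = date.fromisoformat("2018-09-01")

# Hypothetical acquisition dates for one patch.
observations = ["2018-09-11", "2018-08-22", "2019-09-01"]

# Signed number of days relative to the reference date.
offsets = [(date.fromisoformat(day) - reference_date).days for day in observations]
print(offsets)
```

Dates before the reference date give negative offsets, which is consistent with the negative lower bound in the default `date_interval`.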

`terratorch.datasets.sen1floods11` #

`Sen1Floods11NonGeo` #

Bases: NonGeoDataset

NonGeo dataset implementation for sen1floods11.

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, constant_scale=0.0001, no_data_replace=0, no_label_replace=-1, use_metadata=False)` #

Initialize the Sen1Floods11NonGeo dataset.

Parameters:

Name	Type	Description	Default
`data_root`	`str`	Path to the data root directory.	required
`split`	`str`	one of 'train', 'val' or 'test'.	`'train'`
`bands`	`list[str]`	Bands that should be output by the dataset. Defaults to all bands.	`BAND_SETS['all']`
`transform`	`Compose \| None`	Albumentations transform to be applied. Should end with ToTensorV2(). Defaults to None, which applies ToTensorV2().	`None`
`constant_scale`	`float`	Factor to multiply image values by. Defaults to 0.0001.	`0.0001`
`no_data_replace`	`float \| None`	Replace NaN values in input images with this value. If None, does no replacement. Defaults to 0.	`0`
`no_label_replace`	`int \| None`	Replace NaN values in label with this value. If None, does no replacement. Defaults to -1.	`-1`
`use_metadata`	`bool`	Whether to return metadata info (time and location).	`False`

`plot(sample, suptitle=None)` #

Plot a sample from the dataset.

Parameters:

Name	Type	Description	Default
`sample`	`dict[str, Tensor]`	a sample returned by `__getitem__`	required
`suptitle`	`str \| None`	optional string to use as a suptitle	`None`

Returns:

Type	Description
`Figure`	a matplotlib Figure with the rendered sample

`terratorch.datasets.sen4agrinet` #

`Sen4AgriNet` #

Bases: NonGeoDataset

`__init__(data_root, bands=None, scenario='random', split='train', transform=None, truncate_image=4, pad_image=4, spatial_interpolate_and_stack_temporally=True, seed=42)` #

PyTorch Dataset class to load samples from the Sen4AgriNet dataset, supporting multiple scenarios for splitting the data.

Parameters:

Name	Type	Description	Default
`data_root`	`str`	Root directory of the dataset.	required
`bands`	`list of str`	List of band names to load. Defaults to all available bands.	`None`
`scenario`	`str`	Splitting scenario to use. Options are 'random' (random split of the data), 'spatial' (split by geographical region: Catalonia and France), and 'spatio-temporal' (split by region and year: France 2019 and Catalonia 2020).	`'random'`
`split`	`str`	Specifies the dataset split. Options are 'train', 'val', or 'test'.	`'train'`
`transform`	`Compose`	Albumentations transformations to apply to the data.	`None`
`truncate_image`	`int`	Number of timesteps to truncate the time dimension of the image. If None, no truncation is applied. Default is 4.	`4`
`pad_image`	`int`	Number of timesteps to pad the time dimension of the image. If None, no padding is applied. Default is 4.	`4`
`spatial_interpolate_and_stack_temporally`	`bool`	Whether to interpolate bands and concatenate them over time	`True`
`seed`	`int`	Random seed used for data splitting.	`42`
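
The effect of `truncate_image` and `pad_image` on the time dimension can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the helper name is hypothetical, the input is assumed to have shape (T, C, H, W), truncation is assumed to keep the first timesteps, and padding is assumed to append zero-filled timesteps.

```python
import numpy as np


def fix_time_length(image, truncate_image=4, pad_image=4):
    """Hypothetical helper: bring a (T, C, H, W) array to a fixed number of timesteps."""
    if truncate_image is not None:
        # Assumption: keep the first `truncate_image` timesteps.
        image = image[:truncate_image]
    if pad_image is not None and image.shape[0] < pad_image:
        missing = pad_image - image.shape[0]
        # Pad only the time axis; assumption: zero-filled timesteps at the end.
        pad_width = [(0, missing)] + [(0, 0)] * (image.ndim - 1)
        image = np.pad(image, pad_width, mode="constant", constant_values=0)
    return image
```

With both values set to 4, every sample ends up with exactly four timesteps, which makes samples with differing numbers of acquisitions stackable into a batch.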

`terratorch.datasets.sen4map` #

`Sen4MapDatasetMonthlyComposites` #

Bases: Dataset

Sen4Map Dataset for Monthly Composites.

Dataset intended for land-cover and crop classification tasks based on monthly composites derived from multi-temporal satellite data stored in HDF5 files.

Dataset Format:

HDF5 files containing multi-temporal acquisitions with spectral bands (e.g., B2, B3, …, B12)
Composite images computed as the median across available acquisitions for each month.
Classification labels provided via HDF5 attributes (e.g., 'lc1') with mappings defined for:
- Land-cover: using land_cover_classification_map
- Crops: using crop_classification_map

Dataset Features:

Supports two classification tasks: "land-cover" (default) and "crops".
Pre-processing options include center cropping, reverse tiling, and resizing.
Option to save the HDF5 keys for later filtering.
Input channel selection via a mapping between available bands and input bands.
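
The monthly compositing described above (a median across the acquisitions available in each month) can be sketched in NumPy. The function name, the (T, C, H, W) layout and the separate `months` sequence are assumptions made for illustration; the dataset itself reads acquisitions from HDF5.

```python
import numpy as np


def monthly_median_composites(images, months):
    """Hypothetical helper.

    images: array of shape (T, C, H, W), one entry per acquisition.
    months: length-T sequence giving the month (1-12) of each acquisition.
    Returns a dict mapping month -> (C, H, W) median composite.
    """
    months = np.asarray(months)
    composites = {}
    for month in np.unique(months):
        # Select this month's acquisitions and reduce over the time axis.
        composites[int(month)] = np.median(images[months == month], axis=0)
    return composites
```

Months without any acquisition simply do not appear in the result; how the dataset handles such gaps is not covered by this sketch.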

`__init__(h5py_file_object, h5data_keys=None, crop_size=None, dataset_bands=None, input_bands=None, resize=False, resize_to=[224, 224], resize_interpolation=InterpolationMode.BILINEAR, resize_antialiasing=True, reverse_tile=False, reverse_tile_size=3, save_keys_path=None, classification_map='land-cover')` #

Initialize a new instance of Sen4MapDatasetMonthlyComposites.

This dataset loads data from an HDF5 file object containing multi-temporal satellite data and computes monthly composite images by aggregating acquisitions (via median).

Parameters:

Name	Type	Description	Default
`h5py_file_object`	`File`	An open h5py.File object containing the dataset.	required
`h5data_keys`		Optional list of keys to select a subset of data samples from the HDF5 file. If None, all keys are used.	`None`
`crop_size`	`None \| int`	Optional integer specifying the square crop size for the output image.	`None`
`dataset_bands`	`list[HLSBands \| int] \| None`	Optional list of bands available in the dataset.	`None`
`input_bands`	`list[HLSBands \| int] \| None`	Optional list of bands to be used as input channels. Must be provided along with `dataset_bands`.	`None`
`resize`		Boolean flag indicating whether the image should be resized. Default is False.	`False`
`resize_to`		Target dimensions [height, width] for resizing. Default is [224, 224].	`[224, 224]`
`resize_interpolation`		Interpolation mode used for resizing. Default is InterpolationMode.BILINEAR.	`BILINEAR`
`resize_antialiasing`		Boolean flag to apply antialiasing during resizing. Default is True.	`True`
`reverse_tile`		Boolean flag indicating whether to apply reverse tiling to the image. Default is False.	`False`
`reverse_tile_size`		Kernel size for the reverse tiling operation. Must be an odd number >= 3. Default is 3.	`3`
`save_keys_path`		Optional file path to save the list of dataset keys.	`None`
`classification_map`		String specifying the classification mapping to use ("land-cover" or "crops"). Default is "land-cover".	`'land-cover'`

Raises:

Type	Description
`ValueError`	If `input_bands` is provided without specifying `dataset_bands`.
`ValueError`	If an invalid `classification_map` is provided.
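
The relationship between `dataset_bands` and `input_bands`, including the first `ValueError` above, can be illustrated with a small sketch. The function name is hypothetical, and plain strings stand in for the `HLSBands | int` values the real parameters accept.

```python
def band_indices(dataset_bands, input_bands):
    """Hypothetical helper: channel indices of `input_bands` within `dataset_bands`."""
    if input_bands is not None and dataset_bands is None:
        raise ValueError("input_bands requires dataset_bands to be specified")
    if input_bands is None:
        # No selection requested: the caller keeps all channels.
        return None
    return [dataset_bands.index(band) for band in input_bands]
```

For example, with `dataset_bands=["B2", "B3", "B4", "B8"]` and `input_bands=["B4", "B3", "B2"]` the helper returns `[2, 1, 0]`, which can then be used to index the channel axis of an image array.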

`reverse_tiling_pytorch(img_tensor, kernel_size=3)` #

Upscales an image so that every pixel is expanded into kernel_size * kernel_size pixels. Used to test whether the benefit of resizing images to the pre-trained size comes from the bilinearly interpolated pixels, or whether the same benefit would be realized without any interpolated pixels.
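
The pixel expansion can be sketched with NumPy. This is an equivalent illustration, not the PyTorch implementation referenced above, and the function name is hypothetical.

```python
import numpy as np


def reverse_tile(image, kernel_size=3):
    """Hypothetical helper: repeat every pixel of a (..., H, W) array
    into a kernel_size x kernel_size block, giving (..., H*k, W*k)."""
    rows_expanded = np.repeat(image, kernel_size, axis=-2)
    return np.repeat(rows_expanded, kernel_size, axis=-1)
```

A 2x2 input with `kernel_size=3` yields a 6x6 output in which each original value fills one 3x3 block, so no new (interpolated) values are introduced.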

Last update: March 23, 2025

`terratorch.datasets.biomassters` #

`BioMasstersNonGeo` #

`plot(sample, show_titles=True, suptitle=None)` #

`terratorch.datasets.burn_intensity` #

`BurnIntensityNonGeo` #

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, use_full_data=True, no_data_replace=0.0001, no_label_replace=-1, use_metadata=False)` #

`plot(sample, suptitle=None)` #

`terratorch.datasets.carbonflux` #

`CarbonFluxNonGeo` #

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, gpp_mean=None, gpp_std=None, no_data_replace=0.0001, use_metadata=False, modalities=('image', 'merra_vars'))` #

`plot(sample, suptitle=None)` #

`terratorch.datasets.forestnet` #

`ForestNetNonGeo` #

`__init__(data_root, split='train', label_map=default_label_map, transform=None, fraction=1.0, bands=BAND_SETS['all'], use_metadata=False)` #

`map_label(index)` #

`terratorch.datasets.fire_scars` #

`FireScarsHLS` #

`FireScarsNonGeo` #

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, no_data_replace=0, no_label_replace=-1, use_metadata=False)` #

`plot(sample, suptitle=None)` #

`FireScarsSegmentationMask` #

`terratorch.datasets.landslide4sense` #

`Landslide4SenseNonGeo` #

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None)` #

`terratorch.datasets.m_eurosat` #

`MEuroSATNonGeo` #

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default')` #

`plot(sample, suptitle=None)` #

`terratorch.datasets.m_bigearthnet` #

`MBigEarthNonGeo` #

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default')` #

`plot(sample, suptitle=None)` #

`terratorch.datasets.m_brick_kiln` #

`MBrickKilnNonGeo` #

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default')` #

`plot(sample, suptitle=None)` #

`terratorch.datasets.m_forestnet` #

`MForestNetNonGeo` #

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default', use_metadata=False)` #

`plot(sample, suptitle=None)` #

`terratorch.datasets.m_so2sat` #

`MSo2SatNonGeo` #

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default')` #

`plot(sample, suptitle=None)` #

`terratorch.datasets.m_pv4ger` #

`MPv4gerNonGeo` #

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default', use_metadata=False)` #

`plot(sample, suptitle=None)` #

`terratorch.datasets.m_cashew_plantation` #

`MBeninSmallHolderCashewsNonGeo` #

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default', use_metadata=False)` #

`plot(sample, suptitle=None)` #

`terratorch.datasets.m_nz_cattle` #

`MNzCattleNonGeo` #

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default', use_metadata=False)` #

`plot(sample, suptitle=None)` #

`terratorch.datasets.m_chesapeake_landcover` #

`MChesapeakeLandcoverNonGeo` #

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default')` #

`plot(sample, suptitle=None)` #

`terratorch.datasets.m_pv4ger_seg` #

`MPv4gerSegNonGeo` #

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default', use_metadata=False)` #

`plot(sample, suptitle=None)` #

`terratorch.datasets.m_SA_crop_type` #

`MSACropTypeNonGeo` #

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default')` #

`plot(sample, suptitle=None)` #

`terratorch.datasets.m_neontree` #

`MNeonTreeNonGeo` #

`__init__(data_root, split='train', bands=rgb_bands, transform=None, partition='default')` #

`plot(sample, suptitle=None)` #

`terratorch.datasets.multi_temporal_crop_classification` #

`MultiTemporalCropClassification` #

`__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, no_data_replace=None, no_label_replace=None, expand_temporal_dimension=True, reduce_zero_label=True, use_metadata=False, metadata_file_name='chips_df.csv')` #

`plot(sample, suptitle=None)` #

`terratorch.datasets.open_sentinel_map` #

`OpenSentinelMap` #

`__init__(data_root, split='train', bands=None, transform=None, spatial_interpolate_and_stack_temporally=True, pad_image=None, truncate_image=None, target=0, pick_random_pair=True)` #

`terratorch.datasets.openearthmap` #