Generic Datasets

`terratorch.datasets.generic_pixel_wise_dataset`

Module containing generic dataset classes

`GenericNonGeoPixelwiseRegressionDataset`

Bases: GenericPixelWiseDataset

GenericNonGeoPixelwiseRegressionDataset

`__init__(data_root, label_data_root=None, image_grep='*', label_grep='*', split=None, ignore_split_file_extensions=True, allow_substring_split_file=True, rgb_indices=None, dataset_bands=None, output_bands=None, constant_scale=1, transform=None, no_data_replace=None, no_label_replace=None, expand_temporal_dimension=False, reduce_zero_label=False)`

Constructor

Parameters:

Name	Type	Description	Default
`data_root`	`Path`	Path to data root directory	required
`label_data_root`	`Path`	Path to data root directory with labels. If not specified, will use the same as for images.	`None`
`image_grep`	`str`	Glob pattern appended to data_root to find input images. Defaults to "*".	`'*'`
`label_grep`	`str`	Glob pattern appended to data_root to find ground truth masks. Defaults to "*".	`'*'`
`split`	`Path`	Path to a file listing the files to use for this split. The file should contain newline-separated prefixes of the desired file names. Files are searched using glob with the form Path(data_root).glob(prefix + [image or label grep])	`None`
`ignore_split_file_extensions`	`bool`	Whether to disregard extensions when using the split file to determine which files to include in the dataset. E.g. necessary for EuroSAT, since the split files specify ".jpg" but the files are actually ".tif". Defaults to True.	`True`
`allow_substring_split_file`	`bool`	Whether the split files contain substrings that must be present in file names to be included (as in mmsegmentation), or exact matches (e.g. EuroSAT). Defaults to True.	`True`
`rgb_indices`	`list[int] \| None`	Indices of RGB channels. If None, [0, 1, 2] is used.	`None`
`dataset_bands`	`list[HLSBands \| int] \| None`	Bands present in the dataset.	`None`
`output_bands`	`list[HLSBands \| int] \| None`	Bands that should be output by the dataset.	`None`
`constant_scale`	`float`	Factor to multiply image values by. Defaults to 1.	`1`
`transform`	`Compose \| None`	Albumentations transform to be applied. Should end with ToTensorV2(). If used through the generic_data_module, should not include normalization. Not supported for multi-temporal data. Defaults to None, which simply applies ToTensorV2().	`None`
`no_data_replace`	`float \| None`	Replace nan values in input images with this value. If none, does no replacement. Defaults to None.	`None`
`no_label_replace`	`int \| None`	Replace nan values in label with this value. If none, does no replacement. Defaults to None.	`None`
`expand_temporal_dimension`	`bool`	Go from shape (time*channels, h, w) to (channels, time, h, w). Defaults to False.	`False`
`reduce_zero_label`	`bool`	Subtract 1 from all labels. Useful when labels start from 1 instead of the expected 0. Defaults to False.	`False`
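
The interaction between `split` and the grep patterns can be sketched with the standard library alone. This is a minimal illustration of the `Path(data_root).glob(prefix + grep)` form described in the table, not the library's implementation; the helper `files_for_split` and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def files_for_split(data_root: Path, split_file: Path, grep: str = "*") -> list[Path]:
    """Collect files under data_root that match any prefix listed in split_file."""
    prefixes = [line.strip() for line in split_file.read_text().splitlines() if line.strip()]
    matched: set[Path] = set()
    for prefix in prefixes:
        # Same form as in the parameter description: glob(prefix + grep).
        matched.update(data_root.glob(prefix + grep))
    return sorted(matched)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical data layout: images and masks side by side in one directory.
    for name in ["tile_001_img.tif", "tile_001_mask.tif", "tile_002_img.tif", "tile_003_img.tif"]:
        (root / name).touch()
    split_file = root / "train.txt"
    split_file.write_text("tile_001\ntile_003\n")

    image_names = [p.name for p in files_for_split(root, split_file, "*_img.tif")]
    mask_names = [p.name for p in files_for_split(root, split_file, "*_mask.tif")]

print(image_names)
print(mask_names)
```

Only files whose names begin with a listed prefix and also satisfy the grep pattern are kept, so `tile_002_img.tif` is excluded from the split.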

`plot(sample, suptitle=None)`

Plot a sample from the dataset.

Parameters:

Name	Type	Description	Default
`sample`	`dict[str, Tensor]`	A sample returned by `__getitem__`.	required
`suptitle`	`str \| None`	Optional string to use as a suptitle.	`None`

Returns:

Type	Description
`Figure`	a matplotlib Figure with the rendered sample

Added in version 0.2.

`GenericNonGeoSegmentationDataset`

Bases: GenericPixelWiseDataset

GenericNonGeoSegmentationDataset

`__init__(data_root, num_classes, label_data_root=None, image_grep='*', label_grep='*', split=None, ignore_split_file_extensions=True, allow_substring_split_file=True, rgb_indices=None, dataset_bands=None, output_bands=None, class_names=None, constant_scale=1, transform=None, no_data_replace=None, no_label_replace=None, expand_temporal_dimension=False, reduce_zero_label=False)`

Constructor

Parameters:

Name	Type	Description	Default
`data_root`	`Path`	Path to data root directory	required
`num_classes`	`int`	Number of classes in the dataset	required
`label_data_root`	`Path`	Path to data root directory with labels. If not specified, will use the same as for images.	`None`
`image_grep`	`str`	Glob pattern appended to data_root to find input images. Defaults to "*".	`'*'`
`label_grep`	`str`	Glob pattern appended to data_root to find ground truth masks. Defaults to "*".	`'*'`
`split`	`Path`	Path to a file listing the files to use for this split. The file should contain newline-separated prefixes of the desired file names. Files are searched using glob with the form Path(data_root).glob(prefix + [image or label grep])	`None`
`ignore_split_file_extensions`	`bool`	Whether to disregard extensions when using the split file to determine which files to include in the dataset. E.g. necessary for EuroSAT, since the split files specify ".jpg" but the files are actually ".tif". Defaults to True.	`True`
`allow_substring_split_file`	`bool`	Whether the split files contain substrings that must be present in file names to be included (as in mmsegmentation), or exact matches (e.g. EuroSAT). Defaults to True.	`True`
`rgb_indices`	`list[int] \| None`	Indices of RGB channels. If None, [0, 1, 2] is used.	`None`
`dataset_bands`	`list[HLSBands \| int] \| None`	Bands present in the dataset.	`None`
`output_bands`	`list[HLSBands \| int] \| None`	Bands that should be output by the dataset.	`None`
`class_names`	`list[str]`	Class names. Defaults to None.	`None`
`constant_scale`	`float`	Factor to multiply image values by. Defaults to 1.	`1`
`transform`	`Compose \| None`	Albumentations transform to be applied. Should end with ToTensorV2(). If used through the generic_data_module, should not include normalization. Not supported for multi-temporal data. Defaults to None, which simply applies ToTensorV2().	`None`
`no_data_replace`	`float \| None`	Replace nan values in input images with this value. If none, does no replacement. Defaults to None.	`None`
`no_label_replace`	`int \| None`	Replace nan values in label with this value. If none, does no replacement. Defaults to None.	`None`
`expand_temporal_dimension`	`bool`	Go from shape (time*channels, h, w) to (channels, time, h, w). Defaults to False.	`False`
`reduce_zero_label`	`bool`	Subtract 1 from all labels. Useful when labels start from 1 instead of the expected 0. Defaults to False.	`False`
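
The two label options, `no_label_replace` and `reduce_zero_label`, can be illustrated on a small mask. The sketch below uses plain nested lists in place of arrays; the helper `prepare_label` is hypothetical, and the order of operations (replacement first, then subtraction applied to every label, including replaced ones) is an assumption rather than something the table states.

```python
import math


def prepare_label(mask, no_label_replace=None, reduce_zero_label=False):
    """Apply NaN replacement and the optional -1 shift to a 2D mask of numbers."""
    prepared = []
    for row in mask:
        new_row = []
        for value in row:
            # Replace NaN labels only when a replacement value was given.
            if no_label_replace is not None and math.isnan(value):
                value = no_label_replace
            # "Subtract 1 from all labels": assumed to apply after replacement.
            if reduce_zero_label:
                value = value - 1
            new_row.append(value)
        prepared.append(new_row)
    return prepared


# Labels start at 1 and one pixel has no label (NaN).
mask = [[1.0, 2.0], [float("nan"), 3.0]]
shifted = prepare_label(mask, no_label_replace=0, reduce_zero_label=True)
print(shifted)
```

Under this assumed ordering, a replaced pixel is shifted together with the real labels, so the replacement value should be chosen with the shift in mind.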

`plot(sample, suptitle=None)`

Plot a sample from the dataset.

Parameters:

Name	Type	Description	Default
`sample`	`dict[str, Tensor]`	A sample returned by `__getitem__`.	required
`suptitle`	`str \| None`	Optional string to use as a suptitle.	`None`

Returns:

Type	Description
`Figure`	a matplotlib Figure with the rendered sample

Added in version 0.2.

`GenericPixelWiseDataset`

Bases: NonGeoDataset, ABC

This is a generic dataset class to be used for instantiating datasets from arguments. Ideally, one would create a dataset class specific to a dataset.

`__init__(data_root, label_data_root=None, image_grep='*', label_grep='*', split=None, ignore_split_file_extensions=True, allow_substring_split_file=True, rgb_indices=None, dataset_bands=None, output_bands=None, constant_scale=1, transform=None, no_data_replace=None, no_label_replace=None, expand_temporal_dimension=False, reduce_zero_label=False)`

Constructor

Parameters:

Name	Type	Description	Default
`data_root`	`Path`	Path to data root directory	required
`label_data_root`	`Path`	Path to data root directory with labels. If not specified, will use the same as for images.	`None`
`image_grep`	`str`	Glob pattern appended to data_root to find input images. Defaults to "*".	`'*'`
`label_grep`	`str`	Glob pattern appended to data_root to find ground truth masks. Defaults to "*".	`'*'`
`split`	`Path`	Path to a file listing the files to use for this split. The file should contain newline-separated prefixes of the desired file names. Files are searched using glob with the form Path(data_root).glob(prefix + [image or label grep])	`None`
`ignore_split_file_extensions`	`bool`	Whether to disregard extensions when using the split file to determine which files to include in the dataset. E.g. necessary for EuroSAT, since the split files specify ".jpg" but the files are actually ".tif". Defaults to True.	`True`
`allow_substring_split_file`	`bool`	Whether the split files contain substrings that must be present in file names to be included (as in mmsegmentation), or exact matches (e.g. EuroSAT). Defaults to True.	`True`
`rgb_indices`	`list[int] \| None`	Indices of RGB channels. If None, [0, 1, 2] is used.	`None`
`dataset_bands`	`list[HLSBands \| int \| tuple[int, int] \| str] \| None`	Bands present in the dataset. This parameter names input channels (bands) using HLSBands, ints, int ranges, or strings, so that they can then be referred to by output_bands. Defaults to None.	`None`
`output_bands`	`list[HLSBands \| int \| tuple[int, int] \| str] \| None`	Bands that should be output by the dataset as named by dataset_bands.	`None`
`constant_scale`	`float`	Factor to multiply image values by. Defaults to 1.	`1`
`transform`	`Compose \| None`	Albumentations transform to be applied. Should end with ToTensorV2(). If used through the generic_data_module, should not include normalization. Not supported for multi-temporal data. Defaults to None, which simply applies ToTensorV2().	`None`
`no_data_replace`	`float \| None`	Replace nan values in input images with this value. If none, does no replacement. Defaults to None.	`None`
`no_label_replace`	`int \| None`	Replace nan values in label with this value. If none, does no replacement. Defaults to None.	`None`
`expand_temporal_dimension`	`bool`	Go from shape (time*channels, h, w) to (channels, time, h, w). Defaults to False.	`False`
`reduce_zero_label`	`bool`	Subtract 1 from all labels. Useful when labels start from 1 instead of the expected 0. Defaults to False.	`False`
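
How `dataset_bands` names the input channels and `output_bands` then selects among them can be sketched as an index lookup. This is an illustration under stated assumptions, not the library's code: the helpers `expand_bands` and `output_band_indices` are hypothetical, plain strings stand in for `HLSBands` members, and a tuple `(start, end)` is assumed to denote an inclusive integer range, which the table does not specify.

```python
def expand_bands(bands):
    """Flatten a band list, turning (start, end) tuples into runs of ints."""
    names = []
    for band in bands:
        if isinstance(band, tuple):
            start, end = band
            # Assumption: the range includes both endpoints.
            names.extend(range(start, end + 1))
        else:
            names.append(band)
    return names


def output_band_indices(dataset_bands, output_bands):
    """Return the channel index of every requested output band."""
    available = expand_bands(dataset_bands)
    requested = expand_bands(output_bands)
    # list.index raises ValueError if an output band was never named.
    return [available.index(band) for band in requested]


# Seven input channels: three named, three numbered by a range, one named.
dataset_bands = ["BLUE", "GREEN", "RED", (3, 5), "NIR"]
indices = output_band_indices(dataset_bands, ["RED", "GREEN", "BLUE", "NIR"])
print(indices)
```

The resulting indices can be used to pick and reorder channels along the band axis, which is why every entry of `output_bands` has to appear in `dataset_bands`.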

`terratorch.datasets.generic_scalar_label_dataset`

Module containing generic dataset classes

`GenericNonGeoClassificationDataset`

Bases: GenericScalarLabelDataset

GenericNonGeoClassificationDataset

`__init__(data_root, num_classes, split=None, ignore_split_file_extensions=True, allow_substring_split_file=True, rgb_indices=None, dataset_bands=None, output_bands=None, class_names=None, constant_scale=1, transform=None, no_data_replace=0, expand_temporal_dimension=False)`

A generic Non-Geo dataset for classification.

Parameters:

Name	Type	Description	Default
`data_root`	`Path`	Path to data root directory	required
`num_classes`	`int`	Number of classes in the dataset	required
`split`	`Path`	Path to a file listing the files to use for this split. The file should contain newline-separated prefixes of the desired file names. Files are searched using glob with the form Path(data_root).glob(prefix + [image or label grep])	`None`
`ignore_split_file_extensions`	`bool`	Whether to disregard extensions when using the split file to determine which files to include in the dataset. E.g. necessary for EuroSAT, since the split files specify ".jpg" but the files are actually ".tif". Defaults to True.	`True`
`allow_substring_split_file`	`bool`	Whether the split files contain substrings that must be present in file names to be included (as in mmsegmentation), or exact matches (e.g. EuroSAT). Defaults to True.	`True`
`rgb_indices`	`list[int] \| None`	Indices of RGB channels. If None, [0, 1, 2] is used.	`None`
`dataset_bands`	`list[HLSBands \| int] \| None`	Bands present in the dataset.	`None`
`output_bands`	`list[HLSBands \| int] \| None`	Bands that should be output by the dataset.	`None`
`class_names`	`list[str]`	Class names. Defaults to None.	`None`
`constant_scale`	`float`	Factor to multiply image values by. Defaults to 1.	`1`
`transform`	`Compose \| None`	Albumentations transform to be applied. Should end with ToTensorV2(). If used through the generic_data_module, should not include normalization. Not supported for multi-temporal data. Defaults to None, which simply applies ToTensorV2().	`None`
`no_data_replace`	`float`	Replace nan values in input images with this value. Defaults to 0.	`0`
`expand_temporal_dimension`	`bool`	Go from shape (time*channels, h, w) to (channels, time, h, w). Defaults to False.	`False`
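
The reshaping performed by `expand_temporal_dimension`, from (time*channels, h, w) to (channels, time, h, w), can be sketched on a flat list of planes. The helper `expand_temporal` is hypothetical, each string stands in for one (h, w) plane, and the channel-major ordering of the flat axis (all time steps of the first channel, then the second, and so on) is an assumption that the table does not confirm.

```python
def expand_temporal(planes, channels):
    """Regroup a flat list of time*channels planes into [channel][time]."""
    if len(planes) % channels != 0:
        raise ValueError("number of planes must be a multiple of channels")
    time = len(planes) // channels
    # Assumption: the flat axis is channel-major, so each channel owns a contiguous run.
    return [planes[c * time:(c + 1) * time] for c in range(channels)]


# Three channels observed at two time steps, stored as six stacked planes.
planes = ["R_t0", "R_t1", "G_t0", "G_t1", "B_t0", "B_t1"]
grouped = expand_temporal(planes, channels=3)
print(grouped)
```

If a dataset stores its planes time-major instead, the stacked file would need to be reordered before this option gives the intended grouping.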

`GenericScalarLabelDataset`

Bases: NonGeoDataset, ImageFolder, ABC

This is a generic dataset class to be used for instantiating datasets from arguments. Ideally, one would create a dataset class specific to a dataset.

`__init__(data_root, split=None, ignore_split_file_extensions=True, allow_substring_split_file=True, rgb_indices=None, dataset_bands=None, output_bands=None, constant_scale=1, transform=None, no_data_replace=0, expand_temporal_dimension=False)`

Constructor

Parameters:

Name	Type	Description	Default
`data_root`	`Path`	Path to data root directory	required
`split`	`Path`	Path to a file listing the files to use for this split. The file should contain newline-separated prefixes of the desired file names. Files are searched using glob with the form Path(data_root).glob(prefix + [image or label grep])	`None`
`ignore_split_file_extensions`	`bool`	Whether to disregard extensions when using the split file to determine which files to include in the dataset. E.g. necessary for EuroSAT, since the split files specify ".jpg" but the files are actually ".tif". Defaults to True.	`True`
`allow_substring_split_file`	`bool`	Whether the split files contain substrings that must be present in file names to be included (as in mmsegmentation), or exact matches (e.g. EuroSAT). Defaults to True.	`True`
`rgb_indices`	`list[int] \| None`	Indices of RGB channels. If None, [0, 1, 2] is used.	`None`
`dataset_bands`	`list[HLSBands \| int \| tuple[int, int] \| str] \| None`	Bands present in the dataset. This parameter gives identifiers to input channels (bands) so that they can then be referred to by output_bands. Can use the HLSBands enum, ints, int ranges, or strings. Defaults to None.	`None`
`output_bands`	`list[HLSBands \| int \| tuple[int, int] \| str] \| None`	Bands that should be output by the dataset as named by dataset_bands.	`None`
`constant_scale`	`float`	Factor to multiply image values by. Defaults to 1.	`1`
`transform`	`Compose \| None`	Albumentations transform to be applied. Should end with ToTensorV2(). If used through the generic_data_module, should not include normalization. Not supported for multi-temporal data. Defaults to None, which simply applies ToTensorV2().	`None`
`no_data_replace`	`float`	Replace nan values in input images with this value. Defaults to 0.	`0`
`expand_temporal_dimension`	`bool`	Go from shape (time*channels, h, w) to (channels, time, h, w). Defaults to False.	`False`
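
Because this class derives from `ImageFolder`, labels come from the directory layout: each subdirectory of `data_root` is one class, and class indices follow the sorted folder names. The sketch below reproduces that convention with the standard library only; the helper `scan_class_folders` and the folder and file names are hypothetical, and it does not apply the split-file filtering described above.

```python
import tempfile
from pathlib import Path


def scan_class_folders(data_root: Path):
    """Map class folders to indices and list (file name, class index) pairs."""
    classes = sorted(p.name for p in data_root.iterdir() if p.is_dir())
    class_to_idx = {name: index for index, name in enumerate(classes)}
    samples = [
        (path.name, class_to_idx[name])
        for name in classes
        for path in sorted((data_root / name).iterdir())
        if path.is_file()
    ]
    return class_to_idx, samples


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout: one folder per class, images inside.
    for folder, files in {"River": ["c.tif"], "Forest": ["a.tif", "b.tif"]}.items():
        (root / folder).mkdir()
        for file_name in files:
            (root / folder / file_name).touch()

    class_to_idx, samples = scan_class_folders(root)

print(class_to_idx)
print(samples)
```

Each sample therefore carries a single scalar label per image, in contrast to the pixel-wise datasets, which pair every image with a mask file.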

Last update: March 23, 2025
