Generic Datamodules

`terratorch.datamodules.generic_pixel_wise_data_module`

This module contains generic data modules for instantiation at runtime.

`GenericNonGeoPixelwiseRegressionDataModule`

Bases: NonGeoDataModule

This is a generic datamodule class for instantiating data modules at runtime. Composes several GenericNonGeoPixelwiseRegressionDatasets

`__init__(batch_size, num_workers, train_data_root, val_data_root, test_data_root, means, stds, predict_data_root=None, img_grep='*', label_grep='*', train_label_data_root=None, val_label_data_root=None, test_label_data_root=None, train_split=None, val_split=None, test_split=None, ignore_split_file_extensions=True, allow_substring_split_file=True, dataset_bands=None, output_bands=None, predict_dataset_bands=None, predict_output_bands=None, constant_scale=1, rgb_indices=None, train_transform=None, val_transform=None, test_transform=None, expand_temporal_dimension=False, reduce_zero_label=False, no_data_replace=None, no_label_replace=None, drop_last=True, pin_memory=False, check_stackability=True, **kwargs)`

Constructor

Parameters:

Name	Type	Description	Default
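`batch_size`	`int`	Number of samples per batch.	required
`num_workers`	`int`	Number of worker processes used by the data loaders.	required
`train_data_root`	`Path`	Root directory of the training data.	required
`val_data_root`	`Path`	Root directory of the validation data.	required
`test_data_root`	`Path`	Root directory of the test data.	required
`predict_data_root`	`Path \| None`	Root directory of the data used for prediction. Defaults to None.	`None`
`img_grep`	`str`	Pattern used to select image files. Defaults to `'*'`.	`'*'`
`label_grep`	`str`	Pattern used to select label files. Defaults to `'*'`.	`'*'`
`means`	`list[float]`	Means used to normalize the input images, one per band.	required
`stds`	`list[float]`	Standard deviations used to normalize the input images, one per band.	required
`train_label_data_root`	`Path \| None`	Root directory of the training labels. Defaults to None.	`None`
`val_label_data_root`	`Path \| None`	Root directory of the validation labels. Defaults to None.	`None`
`test_label_data_root`	`Path \| None`	Root directory of the test labels. Defaults to None.	`None`
`train_split`	`Path \| None`	Path to the split file listing the training samples. Defaults to None.	`None`
`val_split`	`Path \| None`	Path to the split file listing the validation samples. Defaults to None.	`None`
`test_split`	`Path \| None`	Path to the split file listing the test samples. Defaults to None.	`None`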
`ignore_split_file_extensions`	`bool`	Whether to disregard extensions when using the split file to determine which files to include in the dataset. E.g. necessary for Eurosat, since the split files specify ".jpg" but files are actually ".tif". Defaults to True.	`True`
`allow_substring_split_file`	`bool`	Whether the split files contain substrings that must be present in file names to be included (as in mmsegmentation), or exact matches (e.g. eurosat). Defaults to True.	`True`
`dataset_bands`	`list[HLSBands \| int] \| None`	Bands present in the dataset. Defaults to None.	`None`
`output_bands`	`list[HLSBands \| int] \| None`	Bands that should be output by the dataset. Naming must match that of dataset_bands. Defaults to None.	`None`
`predict_dataset_bands`	`list[HLSBands \| int] \| None`	Overwrites dataset_bands with this value at predict time. Defaults to None, which does not overwrite.	`None`
`predict_output_bands`	`list[HLSBands \| int] \| None`	Overwrites output_bands with this value at predict time. Defaults to None, which does not overwrite.	`None`
`constant_scale`	`float`	Constant factor by which input image values are multiplied. Defaults to 1.	`1`
`rgb_indices`	`list[int] \| None`	Indices of the RGB channels. Defaults to None.	`None`
`train_transform`	`Compose \| None`	Albumentations transform to be applied to the train dataset. Should end with ToTensorV2(). If used through the generic_data_module, should not include normalization. Not supported for multi-temporal data. Defaults to None, which simply applies ToTensorV2().	`None`
`val_transform`	`Compose \| None`	Albumentations transform to be applied to the validation dataset. Should end with ToTensorV2(). If used through the generic_data_module, should not include normalization. Not supported for multi-temporal data. Defaults to None, which simply applies ToTensorV2().	`None`
`test_transform`	`Compose \| None`	Albumentations transform to be applied to the test dataset. Should end with ToTensorV2(). If used through the generic_data_module, should not include normalization. Not supported for multi-temporal data. Defaults to None, which simply applies ToTensorV2().	`None`
`no_data_replace`	`float \| None`	Replace NaN values in input images with this value. If None, does no replacement. Defaults to None.	`None`
`no_label_replace`	`int \| None`	Replace NaN values in label with this value. If None, does no replacement. Defaults to None.	`None`
`expand_temporal_dimension`	`bool`	Go from shape (time*channels, h, w) to (channels, time, h, w). Defaults to False.	`False`
`reduce_zero_label`	`bool`	Subtract 1 from all labels. Useful when labels start from 1 instead of the expected 0. Defaults to False.	`False`
`drop_last`	`bool`	Drop the last batch if it is not complete. Defaults to True.	`True`
`pin_memory`	`bool`	If `True`, the data loader will copy Tensors into device/CUDA pinned memory before returning them. Defaults to False.	`False`
`check_stackability`	`bool`	Check if all the files in the dataset have the same size and can be stacked. Defaults to True.	`True`
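
The interaction between `ignore_split_file_extensions` and `allow_substring_split_file` is easier to see on a small example. The following is a minimal sketch of the matching rules as the table describes them, not the library's implementation; the helper `select_split_files` and the file names are hypothetical.

```python
from pathlib import Path


def select_split_files(
    file_names: list[str],
    split_entries: list[str],
    ignore_split_file_extensions: bool = True,
    allow_substring_split_file: bool = True,
) -> list[str]:
    """Return the file names selected by a split file (illustrative only)."""

    def normalize(name: str) -> str:
        # Drop only the final extension when extensions are ignored.
        return Path(name).stem if ignore_split_file_extensions else name

    wanted = [normalize(entry) for entry in split_entries]
    selected = []
    for name in file_names:
        candidate = normalize(name)
        if allow_substring_split_file:
            # A split entry only has to occur somewhere in the file name.
            keep = any(entry in candidate for entry in wanted)
        else:
            # A split entry has to match the file name exactly.
            keep = candidate in wanted
        if keep:
            selected.append(name)
    return selected


# Hypothetical directory content and split-file entries.
files = ["Forest_1.tif", "Forest_12.tif", "River_3.tif"]
split = ["Forest_1.jpg"]

exact = select_split_files(files, split, allow_substring_split_file=False)
substring = select_split_files(files, split, allow_substring_split_file=True)
strict = select_split_files(
    files, split, ignore_split_file_extensions=False, allow_substring_split_file=False
)
```

With exact matching only `Forest_1.tif` is kept, substring matching also keeps `Forest_12.tif`, and keeping the extensions selects nothing because `.jpg` never equals `.tif`.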

`GenericNonGeoSegmentationDataModule`

Bases: NonGeoDataModule

This is a generic datamodule class for instantiating data modules at runtime. Composes several GenericNonGeoSegmentationDatasets

`__init__(batch_size, num_workers, train_data_root, val_data_root, test_data_root, img_grep, label_grep, means, stds, num_classes, predict_data_root=None, train_label_data_root=None, val_label_data_root=None, test_label_data_root=None, train_split=None, val_split=None, test_split=None, ignore_split_file_extensions=True, allow_substring_split_file=True, dataset_bands=None, output_bands=None, predict_dataset_bands=None, predict_output_bands=None, constant_scale=1, rgb_indices=None, train_transform=None, val_transform=None, test_transform=None, expand_temporal_dimension=False, reduce_zero_label=False, no_data_replace=None, no_label_replace=None, drop_last=True, pin_memory=False, **kwargs)`

Constructor

Parameters:

Name	Type	Description	Default
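`batch_size`	`int`	Number of samples per batch.	required
`num_workers`	`int`	Number of worker processes used by the data loaders.	required
`train_data_root`	`Path`	Root directory of the training data.	required
`val_data_root`	`Path`	Root directory of the validation data.	required
`test_data_root`	`Path`	Root directory of the test data.	required
`predict_data_root`	`Path \| None`	Root directory of the data used for prediction. Defaults to None.	`None`
`img_grep`	`str`	Pattern used to select image files.	required
`label_grep`	`str`	Pattern used to select label files.	required
`means`	`list[float]`	Means used to normalize the input images, one per band.	required
`stds`	`list[float]`	Standard deviations used to normalize the input images, one per band.	required
`num_classes`	`int`	Number of classes in the dataset.	required
`train_label_data_root`	`Path \| None`	Root directory of the training labels. Defaults to None.	`None`
`val_label_data_root`	`Path \| None`	Root directory of the validation labels. Defaults to None.	`None`
`test_label_data_root`	`Path \| None`	Root directory of the test labels. Defaults to None.	`None`
`train_split`	`Path \| None`	Path to the split file listing the training samples. Defaults to None.	`None`
`val_split`	`Path \| None`	Path to the split file listing the validation samples. Defaults to None.	`None`
`test_split`	`Path \| None`	Path to the split file listing the test samples. Defaults to None.	`None`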
`ignore_split_file_extensions`	`bool`	Whether to disregard extensions when using the split file to determine which files to include in the dataset. E.g. necessary for Eurosat, since the split files specify ".jpg" but files are actually ".tif". Defaults to True.	`True`
`allow_substring_split_file`	`bool`	Whether the split files contain substrings that must be present in file names to be included (as in mmsegmentation), or exact matches (e.g. eurosat). Defaults to True.	`True`
`dataset_bands`	`list[HLSBands \| int] \| None`	Bands present in the dataset. Defaults to None.	`None`
`output_bands`	`list[HLSBands \| int] \| None`	Bands that should be output by the dataset. Naming must match that of dataset_bands. Defaults to None.	`None`
`predict_dataset_bands`	`list[HLSBands \| int] \| None`	Overwrites dataset_bands with this value at predict time. Defaults to None, which does not overwrite.	`None`
`predict_output_bands`	`list[HLSBands \| int] \| None`	Overwrites output_bands with this value at predict time. Defaults to None, which does not overwrite.	`None`
`constant_scale`	`float`	Constant factor by which input image values are multiplied. Defaults to 1.	`1`
`rgb_indices`	`list[int] \| None`	Indices of the RGB channels. Defaults to None.	`None`
`train_transform`	`Compose \| None`	Albumentations transform to be applied to the train dataset. Should end with ToTensorV2(). If used through the generic_data_module, should not include normalization. Not supported for multi-temporal data. Defaults to None, which simply applies ToTensorV2().	`None`
`val_transform`	`Compose \| None`	Albumentations transform to be applied to the validation dataset. Should end with ToTensorV2(). If used through the generic_data_module, should not include normalization. Not supported for multi-temporal data. Defaults to None, which simply applies ToTensorV2().	`None`
`test_transform`	`Compose \| None`	Albumentations transform to be applied to the test dataset. Should end with ToTensorV2(). If used through the generic_data_module, should not include normalization. Not supported for multi-temporal data. Defaults to None, which simply applies ToTensorV2().	`None`
`no_data_replace`	`float \| None`	Replace NaN values in input images with this value. If None, does no replacement. Defaults to None.	`None`
`no_label_replace`	`int \| None`	Replace NaN values in label with this value. If None, does no replacement. Defaults to None.	`None`
`expand_temporal_dimension`	`bool`	Go from shape (time*channels, h, w) to (channels, time, h, w). Defaults to False.	`False`
`reduce_zero_label`	`bool`	Subtract 1 from all labels. Useful when labels start from 1 instead of the expected 0. Defaults to False.	`False`
`drop_last`	`bool`	Drop the last batch if it is not complete. Defaults to True.	`True`
`pin_memory`	`bool`	If `True`, the data loader will copy Tensors into device/CUDA pinned memory before returning them. Defaults to False.	`False`
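
As a usage illustration, the sketch below collects the required constructor arguments listed in the signature and passes them to the datamodule. All paths, file patterns, statistics, and the class count are hypothetical placeholders. The patterns are assumed to be glob-style, which this reference does not state explicitly. The import sits inside `build_datamodule` so that the argument dictionary can be inspected without `terratorch` installed.

```python
from pathlib import Path

# Hypothetical dataset layout; replace with your own directories.
DATA_ROOT = Path("data/my_dataset")

segmentation_kwargs = {
    # Required arguments, per the constructor signature.
    "batch_size": 8,
    "num_workers": 4,
    "train_data_root": DATA_ROOT / "train",
    "val_data_root": DATA_ROOT / "val",
    "test_data_root": DATA_ROOT / "test",
    "img_grep": "*_img.tif",     # assumed glob-style pattern (hypothetical)
    "label_grep": "*_mask.tif",  # assumed glob-style pattern (hypothetical)
    "means": [0.1, 0.2, 0.3],    # placeholder statistics, one per band
    "stds": [0.05, 0.06, 0.07],  # placeholder statistics, one per band
    "num_classes": 2,
    # A few optional arguments, shown with their documented defaults.
    "reduce_zero_label": False,
    "drop_last": True,
}


def build_datamodule():
    """Instantiate the datamodule; requires terratorch to be installed."""
    from terratorch.datamodules.generic_pixel_wise_data_module import (
        GenericNonGeoSegmentationDataModule,
    )

    return GenericNonGeoSegmentationDataModule(**segmentation_kwargs)
```

Keeping the arguments in a plain dictionary also makes it simple to reuse them for the regression datamodule, after dropping `num_classes`.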

`terratorch.datamodules.generic_scalar_label_data_module`

This module contains generic data modules for instantiation at runtime.

`GenericNonGeoClassificationDataModule`

Bases: NonGeoDataModule

This is a generic datamodule class for instantiating data modules at runtime. Composes several GenericNonGeoClassificationDatasets

`__init__(batch_size, num_workers, train_data_root, val_data_root, test_data_root, means, stds, num_classes, predict_data_root=None, train_split=None, val_split=None, test_split=None, ignore_split_file_extensions=True, allow_substring_split_file=True, dataset_bands=None, predict_dataset_bands=None, output_bands=None, constant_scale=1, rgb_indices=None, train_transform=None, val_transform=None, test_transform=None, expand_temporal_dimension=False, no_data_replace=0, drop_last=True, check_stackability=True, **kwargs)`

Constructor

Parameters:

Name	Type	Description	Default
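`batch_size`	`int`	Number of samples per batch.	required
`num_workers`	`int`	Number of worker processes used by the data loaders.	required
`train_data_root`	`Path`	Root directory of the training data.	required
`val_data_root`	`Path`	Root directory of the validation data.	required
`test_data_root`	`Path`	Root directory of the test data.	required
`means`	`list[float]`	Means used to normalize the input images, one per band.	required
`stds`	`list[float]`	Standard deviations used to normalize the input images, one per band.	required
`num_classes`	`int`	Number of classes in the dataset.	required
`predict_data_root`	`Path \| None`	Root directory of the data used for prediction. Defaults to None.	`None`
`train_split`	`Path \| None`	Path to the split file listing the training samples. Defaults to None.	`None`
`val_split`	`Path \| None`	Path to the split file listing the validation samples. Defaults to None.	`None`
`test_split`	`Path \| None`	Path to the split file listing the test samples. Defaults to None.	`None`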
`ignore_split_file_extensions`	`bool`	Whether to disregard extensions when using the split file to determine which files to include in the dataset. E.g. necessary for Eurosat, since the split files specify ".jpg" but files are actually ".tif". Defaults to True.	`True`
`allow_substring_split_file`	`bool`	Whether the split files contain substrings that must be present in file names to be included (as in mmsegmentation), or exact matches (e.g. eurosat). Defaults to True.	`True`
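`dataset_bands`	`list[HLSBands \| int] \| None`	Bands present in the dataset. Defaults to None.	`None`
`predict_dataset_bands`	`list[HLSBands \| int] \| None`	Overwrites dataset_bands with this value at predict time. Defaults to None, which does not overwrite.	`None`
`output_bands`	`list[HLSBands \| int] \| None`	Bands that should be output by the dataset. Naming must match that of dataset_bands. Defaults to None.	`None`
`constant_scale`	`float`	Constant factor by which input image values are multiplied. Defaults to 1.	`1`
`rgb_indices`	`list[int] \| None`	Indices of the RGB channels. Defaults to None.	`None`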
`train_transform`	`Compose \| None`	Albumentations transform to be applied to the train dataset. Should end with ToTensorV2(). If used through the generic_data_module, should not include normalization. Not supported for multi-temporal data. Defaults to None, which simply applies ToTensorV2().	`None`
`val_transform`	`Compose \| None`	Albumentations transform to be applied to the validation dataset. Should end with ToTensorV2(). If used through the generic_data_module, should not include normalization. Not supported for multi-temporal data. Defaults to None, which simply applies ToTensorV2().	`None`
`test_transform`	`Compose \| None`	Albumentations transform to be applied to the test dataset. Should end with ToTensorV2(). If used through the generic_data_module, should not include normalization. Not supported for multi-temporal data. Defaults to None, which simply applies ToTensorV2().	`None`
`no_data_replace`	`float`	Replace NaN values in input images with this value. Defaults to 0.	`0`
`expand_temporal_dimension`	`bool`	Go from shape (time*channels, h, w) to (channels, time, h, w). Defaults to False.	`False`
`drop_last`	`bool`	Drop the last batch if it is not complete. Defaults to True.	`True`
`check_stackability`	`bool`	Check if all the files in the dataset have the same size and can be stacked. Defaults to True.	`True`
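
The reshaping performed by `expand_temporal_dimension` can be pictured as regrouping a stack of 2D frames by channel and time step. The sketch below uses plain lists in place of arrays and is not the library's code. The helper name and its `channel_major` flag are hypothetical. This reference does not say how the stacked `time*channels` axis is ordered, so both orderings are shown as assumptions.

```python
def expand_temporal_dimension(frames: list, channels: int, channel_major: bool = True) -> list:
    """Regroup a flat list of channels*time frames into [channel][time].

    channel_major=True assumes all time steps of channel 0 come first,
    then all time steps of channel 1, and so on.
    channel_major=False assumes all channels of time step 0 come first.
    """
    if len(frames) % channels != 0:
        raise ValueError("number of frames must be divisible by channels")
    time = len(frames) // channels
    if channel_major:
        # Frame index = c * time + t
        return [frames[c * time:(c + 1) * time] for c in range(channels)]
    # Frame index = t * channels + c
    return [[frames[t * channels + c] for t in range(time)] for c in range(channels)]


# Two channels (B0, B1) at three time steps; each string stands in for an (h, w) array.
channel_major_stack = ["B0_t0", "B0_t1", "B0_t2", "B1_t0", "B1_t1", "B1_t2"]
time_major_stack = ["B0_t0", "B1_t0", "B0_t1", "B1_t1", "B0_t2", "B1_t2"]

from_channel_major = expand_temporal_dimension(channel_major_stack, channels=2)
from_time_major = expand_temporal_dimension(time_major_stack, channels=2, channel_major=False)
```

Both calls produce the same `[channel][time]` grouping, which corresponds to the `(channels, time, h, w)` layout once each placeholder is an `(h, w)` array.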

Last update: March 23, 2025
