Generic Datamodules

terratorch.datamodules.generic_pixel_wise_data_module

This module contains generic data modules for instantiation at runtime.

GenericNonGeoPixelwiseRegressionDataModule

Bases: NonGeoDataModule

This is a generic datamodule class for instantiating data modules at runtime. Composes several GenericNonGeoPixelwiseRegressionDatasets.

__init__(batch_size, num_workers, train_data_root, val_data_root, test_data_root, means, stds, predict_data_root=None, img_grep='*', label_grep='*', train_label_data_root=None, val_label_data_root=None, test_label_data_root=None, train_split=None, val_split=None, test_split=None, ignore_split_file_extensions=True, allow_substring_split_file=True, dataset_bands=None, output_bands=None, predict_dataset_bands=None, predict_output_bands=None, constant_scale=1, rgb_indices=None, train_transform=None, val_transform=None, test_transform=None, expand_temporal_dimension=False, reduce_zero_label=False, no_data_replace=None, no_label_replace=None, drop_last=True, pin_memory=False, check_stackability=True, **kwargs)

Constructor

Parameters:

Name Type Description Default
batch_size int

description

required
num_workers int

description

required
train_data_root Path

description

required
val_data_root Path

description

required
test_data_root Path

description

required
predict_data_root Path

description

None
img_grep str

description

'*'
label_grep str

description

'*'
means list[float]

description

required
stds list[float]

description

required
train_label_data_root Path | None

description. Defaults to None.

None
val_label_data_root Path | None

description. Defaults to None.

None
test_label_data_root Path | None

description. Defaults to None.

None
train_split Path | None

description. Defaults to None.

None
val_split Path | None

description. Defaults to None.

None
test_split Path | None

description. Defaults to None.

None
ignore_split_file_extensions bool

Whether to disregard extensions when using the split file to determine which files to include in the dataset. E.g. necessary for Eurosat, since the split files specify ".jpg" but files are actually ".tif". Defaults to True.

True
allow_substring_split_file bool

Whether the split files contain substrings that must be present in file names to be included (as in mmsegmentation), or exact matches (e.g. eurosat). Defaults to True.

True
dataset_bands list[HLSBands | int] | None

Bands present in the dataset. Defaults to None.

None
output_bands list[HLSBands | int] | None

Bands that should be output by the dataset. Naming must match that of dataset_bands. Defaults to None.

None
predict_dataset_bands list[HLSBands | int] | None

Overwrites dataset_bands with this value at predict time. Defaults to None, which does not overwrite.

None
predict_output_bands list[HLSBands | int] | None

Overwrites output_bands with this value at predict time. Defaults to None, which does not overwrite.

None
constant_scale float

description. Defaults to 1.

1
rgb_indices list[int] | None

description. Defaults to None.

None
train_transform Compose | None

Albumentations transform to be applied to the train dataset. Should end with ToTensorV2(). If used through the generic_data_module, should not include normalization. Not supported for multi-temporal data. Defaults to None, which simply applies ToTensorV2().

None
val_transform Compose | None

Albumentations transform to be applied to the validation dataset. Should end with ToTensorV2(). If used through the generic_data_module, should not include normalization. Not supported for multi-temporal data. Defaults to None, which simply applies ToTensorV2().

None
test_transform Compose | None

Albumentations transform to be applied to the test dataset. Should end with ToTensorV2(). If used through the generic_data_module, should not include normalization. Not supported for multi-temporal data. Defaults to None, which simply applies ToTensorV2().

None
no_data_replace float | None

Replace NaN values in input images with this value. If None, no replacement is done. Defaults to None.

None
no_label_replace int | None

Replace NaN values in the label with this value. If None, no replacement is done. Defaults to None.

None
expand_temporal_dimension bool

Go from shape (time*channels, h, w) to (channels, time, h, w). Defaults to False.

False
reduce_zero_label bool

Subtract 1 from all labels. Useful when labels start from 1 instead of the expected 0. Defaults to False.

False
drop_last bool

Drop the last batch if it is not complete. Defaults to True.

True
pin_memory bool

If True, the data loader will copy Tensors into device/CUDA pinned memory before returning them. Defaults to False.

False
check_stackability bool

Check whether all the files in the dataset have the same size and can be stacked. Defaults to True.

True
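
The interaction between ignore_split_file_extensions and allow_substring_split_file can be easier to see in code. The following is a minimal sketch of the behavior described above, not terratorch's implementation; select_files and the file names are hypothetical.

```python
from pathlib import Path


def select_files(file_names, split_entries, ignore_extensions=True, allow_substring=True):
    """Return the file names selected by the entries of a split file.

    Hypothetical helper that mirrors the two documented flags.
    """

    def key(name):
        # Optionally drop the extension before comparing names.
        return Path(name).stem if ignore_extensions else name

    wanted = {key(entry.strip()) for entry in split_entries if entry.strip()}
    selected = []
    for name in file_names:
        candidate = key(name)
        if allow_substring:
            # An entry only needs to occur somewhere inside the file name.
            matched = any(entry in candidate for entry in wanted)
        else:
            # The entry must match the (possibly extension-less) name exactly.
            matched = candidate in wanted
        if matched:
            selected.append(name)
    return selected


files = ["Forest_1.tif", "Forest_10.tif", "River_2.tif"]
split = ["Forest_1.jpg"]

exact = select_files(files, split, ignore_extensions=True, allow_substring=False)
substring = select_files(files, split, ignore_extensions=True, allow_substring=True)
strict = select_files(files, split, ignore_extensions=False, allow_substring=False)
```

With exact matching only Forest_1.tif is kept, substring matching also picks up Forest_10.tif, and keeping the extensions selects nothing because the split entry ends in ".jpg" while the files end in ".tif".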

GenericNonGeoSegmentationDataModule

Bases: NonGeoDataModule

This is a generic datamodule class for instantiating data modules at runtime. Composes several GenericNonGeoSegmentationDatasets

__init__(batch_size, num_workers, train_data_root, val_data_root, test_data_root, img_grep, label_grep, means, stds, num_classes, predict_data_root=None, train_label_data_root=None, val_label_data_root=None, test_label_data_root=None, train_split=None, val_split=None, test_split=None, ignore_split_file_extensions=True, allow_substring_split_file=True, dataset_bands=None, output_bands=None, predict_dataset_bands=None, predict_output_bands=None, constant_scale=1, rgb_indices=None, train_transform=None, val_transform=None, test_transform=None, expand_temporal_dimension=False, reduce_zero_label=False, no_data_replace=None, no_label_replace=None, drop_last=True, pin_memory=False, **kwargs)

Constructor

Parameters:

Name Type Description Default
batch_size int

description

required
num_workers int

description

required
train_data_root Path

description

required
val_data_root Path

description

required
test_data_root Path

description

required
predict_data_root Path

description

None
img_grep str

description

required
label_grep str

description

required
means list[float]

description

required
stds list[float]

description

required
num_classes int

description

required
train_label_data_root Path | None

description. Defaults to None.

None
val_label_data_root Path | None

description. Defaults to None.

None
test_label_data_root Path | None

description. Defaults to None.

None
train_split Path | None

description. Defaults to None.

None
val_split Path | None

description. Defaults to None.

None
test_split Path | None

description. Defaults to None.

None
ignore_split_file_extensions bool

Whether to disregard extensions when using the split file to determine which files to include in the dataset. E.g. necessary for Eurosat, since the split files specify ".jpg" but files are actually ".tif". Defaults to True.

True
allow_substring_split_file bool

Whether the split files contain substrings that must be present in file names to be included (as in mmsegmentation), or exact matches (e.g. eurosat). Defaults to True.

True
dataset_bands list[HLSBands | int] | None

Bands present in the dataset. Defaults to None.

None
output_bands list[HLSBands | int] | None

Bands that should be output by the dataset. Naming must match that of dataset_bands. Defaults to None.

None
predict_dataset_bands list[HLSBands | int] | None

Overwrites dataset_bands with this value at predict time. Defaults to None, which does not overwrite.

None
predict_output_bands list[HLSBands | int] | None

Overwrites output_bands with this value at predict time. Defaults to None, which does not overwrite.

None
constant_scale float

description. Defaults to 1.

1
rgb_indices list[int] | None

description. Defaults to None.

None
train_transform Compose | None

Albumentations transform to be applied to the train dataset. Should end with ToTensorV2(). If used through the generic_data_module, should not include normalization. Not supported for multi-temporal data. Defaults to None, which simply applies ToTensorV2().

None
val_transform Compose | None

Albumentations transform to be applied to the validation dataset. Should end with ToTensorV2(). If used through the generic_data_module, should not include normalization. Not supported for multi-temporal data. Defaults to None, which simply applies ToTensorV2().

None
test_transform Compose | None

Albumentations transform to be applied to the test dataset. Should end with ToTensorV2(). If used through the generic_data_module, should not include normalization. Not supported for multi-temporal data. Defaults to None, which simply applies ToTensorV2().

None
no_data_replace float | None

Replace NaN values in input images with this value. If None, no replacement is done. Defaults to None.

None
no_label_replace int | None

Replace NaN values in the label with this value. If None, no replacement is done. Defaults to None.

None
expand_temporal_dimension bool

Go from shape (time*channels, h, w) to (channels, time, h, w). Defaults to False.

False
reduce_zero_label bool

Subtract 1 from all labels. Useful when labels start from 1 instead of the expected 0. Defaults to False.

False
drop_last bool

Drop the last batch if it is not complete. Defaults to True.

True
pin_memory bool

If True, the data loader will copy Tensors into device/CUDA pinned memory before returning them. Defaults to False.

False
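
As an illustration of the signature above, the sketch below collects the required arguments in a dictionary and passes them to the constructor. All paths, patterns, statistics and the class count are hypothetical placeholders, and the helper build_datamodule is introduced only for this example. The patterns given for img_grep and label_grep are guesses at typical glob expressions, since their descriptions are not filled in above.

```python
from pathlib import Path

# Hypothetical dataset layout; replace with your own directories.
data_root = Path("data/my_dataset")

segmentation_kwargs = {
    # Required arguments, in the order of the signature.
    "batch_size": 8,
    "num_workers": 4,
    "train_data_root": data_root / "train",
    "val_data_root": data_root / "val",
    "test_data_root": data_root / "test",
    "img_grep": "*_img.tif",      # assumed pattern for image files
    "label_grep": "*_mask.tif",   # assumed pattern for label files
    "means": [0.1, 0.2, 0.3],     # placeholder per-band statistics
    "stds": [0.05, 0.06, 0.07],
    "num_classes": 2,
    # A few optional arguments documented in the table above.
    "no_data_replace": 0.0,
    "no_label_replace": -1,
    "drop_last": True,
}


def build_datamodule(kwargs):
    """Instantiate the datamodule from a dictionary of constructor arguments."""
    # Imported lazily so the dictionary can be inspected without terratorch installed.
    from terratorch.datamodules.generic_pixel_wise_data_module import (
        GenericNonGeoSegmentationDataModule,
    )

    return GenericNonGeoSegmentationDataModule(**kwargs)
```

Calling build_datamodule(segmentation_kwargs) requires terratorch to be installed and the directories to exist. Keeping the arguments in a plain dictionary also makes them easy to move into a configuration file later.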

terratorch.datamodules.generic_scalar_label_data_module

This module contains generic data modules for instantiation at runtime.

GenericNonGeoClassificationDataModule

Bases: NonGeoDataModule

This is a generic datamodule class for instantiating data modules at runtime. Composes several GenericNonGeoClassificationDatasets

__init__(batch_size, num_workers, train_data_root, val_data_root, test_data_root, means, stds, num_classes, predict_data_root=None, train_split=None, val_split=None, test_split=None, ignore_split_file_extensions=True, allow_substring_split_file=True, dataset_bands=None, predict_dataset_bands=None, output_bands=None, constant_scale=1, rgb_indices=None, train_transform=None, val_transform=None, test_transform=None, expand_temporal_dimension=False, no_data_replace=0, drop_last=True, check_stackability=True, **kwargs)

Constructor

Parameters:

Name Type Description Default
batch_size int

description

required
num_workers int

description

required
train_data_root Path

description

required
val_data_root Path

description

required
test_data_root Path

description

required
means list[float]

description

required
stds list[float]

description

required
num_classes int

description

required
predict_data_root Path

description

None
train_split Path | None

description. Defaults to None.

None
val_split Path | None

description. Defaults to None.

None
test_split Path | None

description. Defaults to None.

None
ignore_split_file_extensions bool

Whether to disregard extensions when using the split file to determine which files to include in the dataset. E.g. necessary for Eurosat, since the split files specify ".jpg" but files are actually ".tif". Defaults to True.

True
allow_substring_split_file bool

Whether the split files contain substrings that must be present in file names to be included (as in mmsegmentation), or exact matches (e.g. eurosat). Defaults to True.

True
dataset_bands list[HLSBands | int] | None

description. Defaults to None.

None
predict_dataset_bands list[HLSBands | int] | None

description. Defaults to None.

None
output_bands list[HLSBands | int] | None

description. Defaults to None.

None
constant_scale float

description. Defaults to 1.

1
rgb_indices list[int] | None

description. Defaults to None.

None
train_transform Compose | None

Albumentations transform to be applied to the train dataset. Should end with ToTensorV2(). If used through the generic_data_module, should not include normalization. Not supported for multi-temporal data. Defaults to None, which simply applies ToTensorV2().

None
val_transform Compose | None

Albumentations transform to be applied to the validation dataset. Should end with ToTensorV2(). If used through the generic_data_module, should not include normalization. Not supported for multi-temporal data. Defaults to None, which simply applies ToTensorV2().

None
test_transform Compose | None

Albumentations transform to be applied to the test dataset. Should end with ToTensorV2(). If used through the generic_data_module, should not include normalization. Not supported for multi-temporal data. Defaults to None, which simply applies ToTensorV2().

None
no_data_replace float

Replace NaN values in input images with this value. Defaults to 0.

0
expand_temporal_dimension bool

Go from shape (time*channels, h, w) to (channels, time, h, w). Defaults to False.

False
drop_last bool

Drop the last batch if it is not complete. Defaults to True.

True
check_stackability bool

Check whether all the files in the dataset have the same size and can be stacked. Defaults to True.

True
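
The reshaping described for expand_temporal_dimension, from (time*channels, h, w) to (channels, time, h, w), can be sketched with plain lists. The function expand_temporal is hypothetical, and the sketch assumes a time-major layout (all channels of the first time step, then all channels of the second, and so on). The table above does not specify this ordering, so check it against your data.

```python
def expand_temporal(planes, channels):
    """Regroup a flat sequence of time*channels planes into [channel][time].

    Each element of `planes` stands for one (h, w) image plane.
    Assumption of this sketch: planes are stored time-major.
    """
    if channels <= 0 or len(planes) % channels != 0:
        raise ValueError("number of planes must be a multiple of a positive channel count")
    steps = len(planes) // channels
    # For every channel, collect its plane from each time step in order.
    return [[planes[t * channels + c] for t in range(steps)] for c in range(channels)]


# Strings stand in for (h, w) planes: 3 time steps with 2 channels each.
stacked = ["t0c0", "t0c1", "t1c0", "t1c1", "t2c0", "t2c1"]
expanded = expand_temporal(stacked, channels=2)
```

Here expanded has two entries, one per channel, and each entry holds that channel's three time steps in order.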

Last update: March 23, 2025