
Datamodules

terratorch.datamodules.biomassters #

BioMasstersNonGeoDataModule #

Bases: NonGeoDataModule

NonGeo LightningDataModule implementation for the BioMassters dataset.

__init__(data_root, batch_size=4, num_workers=0, bands=BioMasstersNonGeo.all_band_names, train_transform=None, val_transform=None, test_transform=None, predict_transform=None, aug=None, drop_last=True, sensors=['S1', 'S2'], as_time_series=False, metadata_filename=default_metadata_filename, max_cloud_percentage=None, max_red_mean=None, include_corrupt=True, subset=1, seed=42, use_four_frames=False, **kwargs) #

Initializes the DataModule for the non-geospatial BioMassters dataset.

Parameters:

Name Type Description Default
data_root str

Root directory containing the dataset.

required
batch_size int

Batch size for DataLoaders. Defaults to 4.

4
num_workers int

Number of workers for data loading. Defaults to 0.

0
bands dict[str, Sequence[str]] | Sequence[str]

Band configuration; either a dict mapping sensors to bands or a list for the first sensor. Defaults to BioMasstersNonGeo.all_band_names.

all_band_names
train_transform Compose | None | list[BasicTransform]

Transformations for training data.

None
val_transform Compose | None | list[BasicTransform]

Transformations for validation data.

None
test_transform Compose | None | list[BasicTransform]

Transformations for testing data.

None
predict_transform Compose | None | list[BasicTransform]

Transformations for prediction data.

None
aug AugmentationSequential

Augmentation or normalization to apply. Defaults to normalization if not provided.

None
drop_last bool

Whether to drop the last incomplete batch. Defaults to True.

True
sensors Sequence[str]

List of sensors to use (e.g., ["S1", "S2"]). Defaults to ["S1", "S2"].

['S1', 'S2']
as_time_series bool

Whether to treat data as a time series. Defaults to False.

False
metadata_filename str

Metadata filename. Defaults to "The_BioMassters_-_features_metadata.csv.csv".

default_metadata_filename
max_cloud_percentage float | None

Maximum allowed cloud percentage. Defaults to None.

None
max_red_mean float | None

Maximum allowed red band mean. Defaults to None.

None
include_corrupt bool

Whether to include corrupt data. Defaults to True.

True
subset float

Fraction of the dataset to use. Defaults to 1.

1
seed int

Random seed for reproducibility. Defaults to 42.

42
use_four_frames bool

Whether to use a four frames configuration. Defaults to False.

False
**kwargs Any

Additional keyword arguments.

{}

Returns:

Type Description
None

None.

setup(stage) #

Set up datasets.

Parameters:

Name Type Description Default
stage str

Either fit, validate, test, or predict.

required
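As a usage sketch, this datamodule can be configured through a Lightning CLI YAML file, assuming the class is importable from terratorch.datamodules as documented above; the `data_root` path and the chosen values below are illustrative, not recommendations:

```yaml
data:
  class_path: terratorch.datamodules.BioMasstersNonGeoDataModule
  init_args:
    data_root: /data/biomassters   # placeholder path
    batch_size: 4
    num_workers: 2
    sensors: [S1, S2]
    as_time_series: true           # treat monthly frames as a time series
    subset: 0.5                    # train on half of the dataset
    seed: 42
```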

terratorch.datamodules.burn_intensity #

BurnIntensityNonGeoDataModule #

Bases: NonGeoDataModule

NonGeo LightningDataModule implementation for the BurnIntensity dataset.

__init__(data_root, batch_size=4, num_workers=0, bands=BurnIntensityNonGeo.all_band_names, train_transform=None, val_transform=None, test_transform=None, predict_transform=None, use_full_data=True, no_data_replace=0.0001, no_label_replace=-1, use_metadata=False, **kwargs) #

Initializes the DataModule for the non-geospatial BurnIntensity dataset.

Parameters:

Name Type Description Default
data_root str

Root directory of the dataset.

required
batch_size int

Batch size for DataLoaders. Defaults to 4.

4
num_workers int

Number of workers for data loading. Defaults to 0.

0
bands Sequence[str]

List of bands to use. Defaults to BurnIntensityNonGeo.all_band_names.

all_band_names
train_transform Compose | None | list[BasicTransform]

Transformations for training.

None
val_transform Compose | None | list[BasicTransform]

Transformations for validation.

None
test_transform Compose | None | list[BasicTransform]

Transformations for testing.

None
predict_transform Compose | None | list[BasicTransform]

Transformations for prediction.

None
use_full_data bool

Whether to use the full dataset or data with less than 25 percent zeros. Defaults to True.

True
no_data_replace float | None

Value to replace missing data. Defaults to 0.0001.

0.0001
no_label_replace int | None

Value to replace missing labels. Defaults to -1.

-1
use_metadata bool

Whether to return metadata info (time and location).

False
**kwargs Any

Additional keyword arguments.

{}

setup(stage) #

Set up datasets.

Parameters:

Name Type Description Default
stage str

Either fit, validate, test, or predict.

required
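A minimal Lightning CLI config sketch for this datamodule; the path is a placeholder and the flag values are only illustrative:

```yaml
data:
  class_path: terratorch.datamodules.BurnIntensityNonGeoDataModule
  init_args:
    data_root: /data/burn_intensity   # placeholder path
    batch_size: 4
    use_full_data: false              # keep only samples with less than 25% zeros
    no_data_replace: 0.0001
    no_label_replace: -1
    use_metadata: true                # also return time and location info
```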

terratorch.datamodules.carbonflux #

CarbonFluxNonGeoDataModule #

Bases: NonGeoDataModule

NonGeo LightningDataModule implementation for the Carbon Flux dataset.

__init__(data_root, batch_size=4, num_workers=0, bands=CarbonFluxNonGeo.all_band_names, train_transform=None, val_transform=None, test_transform=None, predict_transform=None, aug=None, no_data_replace=0.0001, use_metadata=False, **kwargs) #

Initializes the CarbonFluxNonGeoDataModule.

Parameters:

Name Type Description Default
data_root str

Root directory of the dataset.

required
batch_size int

Batch size for DataLoaders. Defaults to 4.

4
num_workers int

Number of workers for data loading. Defaults to 0.

0
bands Sequence[str]

List of bands to use. Defaults to CarbonFluxNonGeo.all_band_names.

all_band_names
train_transform Compose | None | list[BasicTransform]

Transformations for training data.

None
val_transform Compose | None | list[BasicTransform]

Transformations for validation data.

None
test_transform Compose | None | list[BasicTransform]

Transformations for testing data.

None
predict_transform Compose | None | list[BasicTransform]

Transformations for prediction data.

None
aug AugmentationSequential

Augmentation sequence; if None, applies multimodal normalization.

None
no_data_replace float | None

Value to replace missing data. Defaults to 0.0001.

0.0001
use_metadata bool

Whether to return metadata info.

False
**kwargs Any

Additional keyword arguments.

{}

setup(stage) #

Set up datasets.

Parameters:

Name Type Description Default
stage str

Either fit, validate, test, or predict.

required
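A corresponding Lightning CLI config sketch, with a placeholder path and illustrative values:

```yaml
data:
  class_path: terratorch.datamodules.CarbonFluxNonGeoDataModule
  init_args:
    data_root: /data/carbon_flux   # placeholder path
    batch_size: 4
    no_data_replace: 0.0001
    use_metadata: true
```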

terratorch.datamodules.forestnet #

ForestNetNonGeoDataModule #

Bases: NonGeoDataModule

NonGeo LightningDataModule implementation for the ForestNet dataset.

__init__(data_root, batch_size=4, num_workers=0, label_map=ForestNetNonGeo.default_label_map, bands=ForestNetNonGeo.all_band_names, train_transform=None, val_transform=None, test_transform=None, predict_transform=None, fraction=1.0, aug=None, use_metadata=False, **kwargs) #

Initializes the ForestNetNonGeoDataModule.

Parameters:

Name Type Description Default
data_root str

Directory containing the dataset.

required
batch_size int

Batch size for data loaders. Defaults to 4.

4
num_workers int

Number of workers for data loading. Defaults to 0.

0
label_map dict[str, int]

Mapping of labels to integers. Defaults to ForestNetNonGeo.default_label_map.

default_label_map
bands Sequence[str]

List of band names to use. Defaults to ForestNetNonGeo.all_band_names.

all_band_names
train_transform Compose | None | list[BasicTransform]

Transformations for training data.

None
val_transform Compose | None | list[BasicTransform]

Transformations for validation data.

None
test_transform Compose | None | list[BasicTransform]

Transformations for testing data.

None
predict_transform Compose | None | list[BasicTransform]

Transformations for prediction.

None
fraction float

Fraction of data to use. Defaults to 1.0.

1.0
aug AugmentationSequential

Augmentation/normalization pipeline; if None, uses Normalize.

None
use_metadata bool

Whether to return metadata info.

False
**kwargs Any

Additional keyword arguments.

{}

setup(stage) #

Set up datasets.

Parameters:

Name Type Description Default
stage str

Either fit, validate, test, or predict.

required
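A Lightning CLI config sketch for this datamodule; the path is a placeholder and `fraction` is set low only to illustrate the parameter:

```yaml
data:
  class_path: terratorch.datamodules.ForestNetNonGeoDataModule
  init_args:
    data_root: /data/forestnet   # placeholder path
    batch_size: 4
    fraction: 0.25               # use a quarter of the data
    use_metadata: false
```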

terratorch.datamodules.fire_scars #

FireScarsDataModule #

Bases: GeoDataModule

Geo Fire Scars data module implementation that merges input data with ground truth segmentation masks.

FireScarsNonGeoDataModule #

Bases: NonGeoDataModule

NonGeo LightningDataModule implementation for Fire Scars dataset.

__init__(data_root, batch_size=4, num_workers=0, bands=FireScarsNonGeo.all_band_names, train_transform=None, val_transform=None, test_transform=None, predict_transform=None, drop_last=True, no_data_replace=0, no_label_replace=-1, use_metadata=False, **kwargs) #

Initializes the FireScarsNonGeoDataModule.

Parameters:

Name Type Description Default
data_root str

Root directory of the dataset.

required
batch_size int

Batch size for DataLoaders. Defaults to 4.

4
num_workers int

Number of workers for data loading. Defaults to 0.

0
bands Sequence[str]

List of band names. Defaults to FireScarsNonGeo.all_band_names.

all_band_names
train_transform Compose | None | list[BasicTransform]

Transformations for training.

None
val_transform Compose | None | list[BasicTransform]

Transformations for validation.

None
test_transform Compose | None | list[BasicTransform]

Transformations for testing.

None
predict_transform Compose | None | list[BasicTransform]

Transformations for prediction.

None
drop_last bool

Whether to drop the last incomplete batch. Defaults to True.

True
no_data_replace float | None

Replacement value for missing data. Defaults to 0.

0
no_label_replace int | None

Replacement value for missing labels. Defaults to -1.

-1
use_metadata bool

Whether to return metadata info.

False
**kwargs Any

Additional keyword arguments.

{}

setup(stage) #

Set up datasets.

Parameters:

Name Type Description Default
stage str

Either fit, validate, test, or predict.

required
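A Lightning CLI config sketch mirroring the documented defaults; the path is a placeholder:

```yaml
data:
  class_path: terratorch.datamodules.FireScarsNonGeoDataModule
  init_args:
    data_root: /data/fire_scars   # placeholder path
    batch_size: 4
    drop_last: true
    no_data_replace: 0
    no_label_replace: -1
```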

terratorch.datamodules.landslide4sense #

Landslide4SenseNonGeoDataModule #

Bases: NonGeoDataModule

NonGeo LightningDataModule implementation for Landslide4Sense dataset.

__init__(data_root, batch_size=4, num_workers=0, bands=Landslide4SenseNonGeo.all_band_names, train_transform=None, val_transform=None, test_transform=None, predict_transform=None, aug=None, **kwargs) #

Initializes the Landslide4SenseNonGeoDataModule.

Parameters:

Name Type Description Default
data_root str

Root directory of the dataset.

required
batch_size int

Batch size for data loaders. Defaults to 4.

4
num_workers int

Number of workers for data loading. Defaults to 0.

0
bands Sequence[str]

List of band names to use. Defaults to Landslide4SenseNonGeo.all_band_names.

all_band_names
train_transform Compose | None | list[BasicTransform]

Transformations for training data.

None
val_transform Compose | None | list[BasicTransform]

Transformations for validation data.

None
test_transform Compose | None | list[BasicTransform]

Transformations for testing data.

None
predict_transform Compose | None | list[BasicTransform]

Transformations for prediction data.

None
aug AugmentationSequential

Augmentation pipeline; if None, applies normalization using computed means and stds.

None
**kwargs Any

Additional keyword arguments.

{}

setup(stage) #

Set up datasets.

Parameters:

Name Type Description Default
stage str

Either fit, validate, test, or predict.

required
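A Lightning CLI config sketch for this datamodule, with a placeholder path and illustrative worker count; when `aug` is omitted, normalization with the computed means and stds is applied as noted above:

```yaml
data:
  class_path: terratorch.datamodules.Landslide4SenseNonGeoDataModule
  init_args:
    data_root: /data/landslide4sense   # placeholder path
    batch_size: 8
    num_workers: 4
```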

terratorch.datamodules.m_eurosat #

MEuroSATNonGeoDataModule #

Bases: GeobenchDataModule

NonGeo LightningDataModule implementation for M-EuroSAT dataset.

__init__(batch_size=8, num_workers=0, data_root='./', bands=None, train_transform=None, val_transform=None, test_transform=None, aug=None, partition='default', **kwargs) #

Initializes the MEuroSATNonGeoDataModule for the MEuroSATNonGeo dataset.

Parameters:

Name Type Description Default
batch_size int

Batch size for DataLoaders. Defaults to 8.

8
num_workers int

Number of workers for data loading. Defaults to 0.

0
data_root str

Root directory of the dataset. Defaults to "./".

'./'
bands Sequence[str] | None

List of bands to use. Defaults to None.

None
train_transform Compose | None | list[BasicTransform]

Transformations for training.

None
val_transform Compose | None | list[BasicTransform]

Transformations for validation.

None
test_transform Compose | None | list[BasicTransform]

Transformations for testing.

None
aug AugmentationSequential

Augmentation/normalization pipeline. Defaults to None.

None
partition str

Partition size. Defaults to "default".

'default'
**kwargs Any

Additional keyword arguments.

{}
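The Geobench-based datamodules on this page (this one and the other M-* modules below) share the same constructor shape, so one Lightning CLI config sketch covers the pattern; the path is a placeholder:

```yaml
data:
  class_path: terratorch.datamodules.MEuroSATNonGeoDataModule
  init_args:
    data_root: /data/geobench   # placeholder path
    batch_size: 8
    partition: default          # Geobench partition to load
```

For the other Geobench modules, swap in the corresponding class name (e.g. MBigEarthNonGeoDataModule, MBrickKilnNonGeoDataModule).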

terratorch.datamodules.m_bigearthnet #

MBigEarthNonGeoDataModule #

Bases: GeobenchDataModule

NonGeo LightningDataModule implementation for M-BigEarthNet dataset.

__init__(batch_size=8, num_workers=0, data_root='./', bands=None, train_transform=None, val_transform=None, test_transform=None, aug=None, partition='default', **kwargs) #

Initializes the MBigEarthNonGeoDataModule for the M-BigEarthNet dataset.

Parameters:

Name Type Description Default
batch_size int

Batch size for DataLoaders. Defaults to 8.

8
num_workers int

Number of workers for data loading. Defaults to 0.

0
data_root str

Root directory of the dataset. Defaults to "./".

'./'
bands Sequence[str] | None

List of bands to use. Defaults to None.

None
train_transform Compose | None | list[BasicTransform]

Transformations for training.

None
val_transform Compose | None | list[BasicTransform]

Transformations for validation.

None
test_transform Compose | None | list[BasicTransform]

Transformations for testing.

None
aug AugmentationSequential

Augmentation/normalization pipeline. Defaults to None.

None
partition str

Partition size. Defaults to "default".

'default'
**kwargs Any

Additional keyword arguments.

{}

terratorch.datamodules.m_brick_kiln #

MBrickKilnNonGeoDataModule #

Bases: GeobenchDataModule

NonGeo LightningDataModule implementation for M-BrickKiln dataset.

__init__(batch_size=8, num_workers=0, data_root='./', bands=None, train_transform=None, val_transform=None, test_transform=None, aug=None, partition='default', **kwargs) #

Initializes the MBrickKilnNonGeoDataModule for the M-BrickKilnNonGeo dataset.

Parameters:

Name Type Description Default
batch_size int

Batch size for DataLoaders. Defaults to 8.

8
num_workers int

Number of workers for data loading. Defaults to 0.

0
data_root str

Root directory of the dataset. Defaults to "./".

'./'
bands Sequence[str] | None

List of bands to use. Defaults to None.

None
train_transform Compose | None | list[BasicTransform]

Transformations for training.

None
val_transform Compose | None | list[BasicTransform]

Transformations for validation.

None
test_transform Compose | None | list[BasicTransform]

Transformations for testing.

None
aug AugmentationSequential

Augmentation/normalization pipeline. Defaults to None.

None
partition str

Partition size. Defaults to "default".

'default'
**kwargs Any

Additional keyword arguments.

{}

terratorch.datamodules.m_forestnet #

MForestNetNonGeoDataModule #

Bases: GeobenchDataModule

NonGeo LightningDataModule implementation for M-ForestNet dataset.

__init__(batch_size=8, num_workers=0, data_root='./', bands=None, train_transform=None, val_transform=None, test_transform=None, aug=None, partition='default', use_metadata=False, **kwargs) #

Initializes the MForestNetNonGeoDataModule for the MForestNetNonGeo dataset.

Parameters:

Name Type Description Default
batch_size int

Batch size for DataLoaders. Defaults to 8.

8
num_workers int

Number of workers for data loading. Defaults to 0.

0
data_root str

Root directory of the dataset. Defaults to "./".

'./'
bands Sequence[str] | None

List of bands to use. Defaults to None.

None
train_transform Compose | None | list[BasicTransform]

Transformations for training.

None
val_transform Compose | None | list[BasicTransform]

Transformations for validation.

None
test_transform Compose | None | list[BasicTransform]

Transformations for testing.

None
aug AugmentationSequential

Augmentation/normalization pipeline. Defaults to None.

None
partition str

Partition size. Defaults to "default".

'default'
use_metadata bool

Whether to return metadata info.

False
**kwargs Any

Additional keyword arguments.

{}

terratorch.datamodules.m_so2sat #

MSo2SatNonGeoDataModule #

Bases: GeobenchDataModule

NonGeo LightningDataModule implementation for M-So2Sat dataset.

__init__(batch_size=8, num_workers=0, data_root='./', bands=None, train_transform=None, val_transform=None, test_transform=None, aug=None, partition='default', **kwargs) #

Initializes the MSo2SatNonGeoDataModule for the MSo2SatNonGeo dataset.

Parameters:

Name Type Description Default
batch_size int

Batch size for DataLoaders. Defaults to 8.

8
num_workers int

Number of workers for data loading. Defaults to 0.

0
data_root str

Root directory of the dataset. Defaults to "./".

'./'
bands Sequence[str] | None

List of bands to use. Defaults to None.

None
train_transform Compose | None | list[BasicTransform]

Transformations for training.

None
val_transform Compose | None | list[BasicTransform]

Transformations for validation.

None
test_transform Compose | None | list[BasicTransform]

Transformations for testing.

None
aug AugmentationSequential

Augmentation/normalization pipeline. Defaults to None.

None
partition str

Partition size. Defaults to "default".

'default'
**kwargs Any

Additional keyword arguments.

{}

terratorch.datamodules.m_pv4ger #

MPv4gerNonGeoDataModule #

Bases: GeobenchDataModule

NonGeo LightningDataModule implementation for M-Pv4ger dataset.

__init__(batch_size=8, num_workers=0, data_root='./', bands=None, train_transform=None, val_transform=None, test_transform=None, aug=None, partition='default', use_metadata=False, **kwargs) #

Initializes the MPv4gerNonGeoDataModule for the MPv4gerNonGeo dataset.

Parameters:

Name Type Description Default
batch_size int

Batch size for DataLoaders. Defaults to 8.

8
num_workers int

Number of workers for data loading. Defaults to 0.

0
data_root str

Root directory of the dataset. Defaults to "./".

'./'
bands Sequence[str] | None

List of bands to use. Defaults to None.

None
train_transform Compose | None | list[BasicTransform]

Transformations for training.

None
val_transform Compose | None | list[BasicTransform]

Transformations for validation.

None
test_transform Compose | None | list[BasicTransform]

Transformations for testing.

None
aug AugmentationSequential

Augmentation/normalization pipeline. Defaults to None.

None
partition str

Partition size. Defaults to "default".

'default'
use_metadata bool

Whether to return metadata info.

False
**kwargs Any

Additional keyword arguments.

{}

terratorch.datamodules.m_cashew_plantation #

MBeninSmallHolderCashewsNonGeoDataModule #

Bases: GeobenchDataModule

NonGeo LightningDataModule implementation for M-Cashew Plantation dataset.

__init__(batch_size=8, num_workers=0, data_root='./', bands=None, train_transform=None, val_transform=None, test_transform=None, aug=None, partition='default', use_metadata=False, **kwargs) #

Initializes the MBeninSmallHolderCashewsNonGeoDataModule for the M-BeninSmallHolderCashewsNonGeo dataset.

Parameters:

Name Type Description Default
batch_size int

Batch size for DataLoaders. Defaults to 8.

8
num_workers int

Number of workers for data loading. Defaults to 0.

0
data_root str

Root directory of the dataset. Defaults to "./".

'./'
bands Sequence[str] | None

List of bands to use. Defaults to None.

None
train_transform Compose | None | list[BasicTransform]

Transformations for training.

None
val_transform Compose | None | list[BasicTransform]

Transformations for validation.

None
test_transform Compose | None | list[BasicTransform]

Transformations for testing.

None
aug AugmentationSequential

Augmentation/normalization pipeline. Defaults to None.

None
partition str

Partition size. Defaults to "default".

'default'
use_metadata bool

Whether to return metadata info.

False
**kwargs Any

Additional keyword arguments.

{}

terratorch.datamodules.m_nz_cattle #

MNzCattleNonGeoDataModule #

Bases: GeobenchDataModule

NonGeo LightningDataModule implementation for M-NZCattle dataset.

__init__(batch_size=8, num_workers=0, data_root='./', bands=None, train_transform=None, val_transform=None, test_transform=None, aug=None, partition='default', use_metadata=False, **kwargs) #

Initializes the MNzCattleNonGeoDataModule for the MNzCattleNonGeo dataset.

Parameters:

Name Type Description Default
batch_size int

Batch size for DataLoaders. Defaults to 8.

8
num_workers int

Number of workers for data loading. Defaults to 0.

0
data_root str

Root directory of the dataset. Defaults to "./".

'./'
bands Sequence[str] | None

List of bands to use. Defaults to None.

None
train_transform Compose | None | list[BasicTransform]

Transformations for training.

None
val_transform Compose | None | list[BasicTransform]

Transformations for validation.

None
test_transform Compose | None | list[BasicTransform]

Transformations for testing.

None
aug AugmentationSequential

Augmentation/normalization pipeline. Defaults to None.

None
partition str

Partition size. Defaults to "default".

'default'
use_metadata bool

Whether to return metadata info.

False
**kwargs Any

Additional keyword arguments.

{}

terratorch.datamodules.m_chesapeake_landcover #

MChesapeakeLandcoverNonGeoDataModule #

Bases: GeobenchDataModule

NonGeo LightningDataModule implementation for M-ChesapeakeLandcover dataset.

__init__(batch_size=8, num_workers=0, data_root='./', bands=None, train_transform=None, val_transform=None, test_transform=None, aug=None, partition='default', **kwargs) #

Initializes the MChesapeakeLandcoverNonGeoDataModule for the M-ChesapeakeLandcover dataset.

Parameters:

Name Type Description Default
batch_size int

Batch size for DataLoaders. Defaults to 8.

8
num_workers int

Number of workers for data loading. Defaults to 0.

0
data_root str

Root directory of the dataset. Defaults to "./".

'./'
bands Sequence[str] | None

List of bands to use. Defaults to None.

None
train_transform Compose | None | list[BasicTransform]

Transformations for training.

None
val_transform Compose | None | list[BasicTransform]

Transformations for validation.

None
test_transform Compose | None | list[BasicTransform]

Transformations for testing.

None
aug AugmentationSequential

Augmentation/normalization pipeline. Defaults to None.

None
partition str

Partition size. Defaults to "default".

'default'
**kwargs Any

Additional keyword arguments.

{}

terratorch.datamodules.m_pv4ger_seg #

MPv4gerSegNonGeoDataModule #

Bases: GeobenchDataModule

NonGeo LightningDataModule implementation for M-Pv4gerSeg dataset.

__init__(batch_size=8, num_workers=0, data_root='./', bands=None, train_transform=None, val_transform=None, test_transform=None, aug=None, partition='default', use_metadata=False, **kwargs) #

Initializes the MPv4gerSegNonGeoDataModule for the MPv4gerSegNonGeo dataset.

Parameters:

Name Type Description Default
batch_size int

Batch size for DataLoaders. Defaults to 8.

8
num_workers int

Number of workers for data loading. Defaults to 0.

0
data_root str

Root directory of the dataset. Defaults to "./".

'./'
bands Sequence[str] | None

List of bands to use. Defaults to None.

None
train_transform Compose | None | list[BasicTransform]

Transformations for training.

None
val_transform Compose | None | list[BasicTransform]

Transformations for validation.

None
test_transform Compose | None | list[BasicTransform]

Transformations for testing.

None
aug AugmentationSequential

Augmentation/normalization pipeline. Defaults to None.

None
partition str

Partition size. Defaults to "default".

'default'
use_metadata bool

Whether to return metadata info.

False
**kwargs Any

Additional keyword arguments.

{}

terratorch.datamodules.m_SA_crop_type #

MSACropTypeNonGeoDataModule #

Bases: GeobenchDataModule

NonGeo LightningDataModule implementation for M-SA-CropType dataset.

__init__(batch_size=8, num_workers=0, data_root='./', bands=None, train_transform=None, val_transform=None, test_transform=None, aug=None, partition='default', **kwargs) #

Initializes the MSACropTypeNonGeoDataModule for the MSACropTypeNonGeo dataset.

Parameters:

Name Type Description Default
batch_size int

Batch size for DataLoaders. Defaults to 8.

8
num_workers int

Number of workers for data loading. Defaults to 0.

0
data_root str

Root directory of the dataset. Defaults to "./".

'./'
bands Sequence[str] | None

List of bands to use. Defaults to None.

None
train_transform Compose | None | list[BasicTransform]

Transformations for training.

None
val_transform Compose | None | list[BasicTransform]

Transformations for validation.

None
test_transform Compose | None | list[BasicTransform]

Transformations for testing.

None
aug AugmentationSequential

Augmentation/normalization pipeline. Defaults to None.

None
partition str

Partition size. Defaults to "default".

'default'
**kwargs Any

Additional keyword arguments.

{}

terratorch.datamodules.m_neontree #

MNeonTreeNonGeoDataModule #

Bases: GeobenchDataModule

NonGeo LightningDataModule implementation for M-NeonTree dataset.

__init__(batch_size=8, num_workers=0, data_root='./', bands=None, train_transform=None, val_transform=None, test_transform=None, aug=None, partition='default', **kwargs) #

Initializes the MNeonTreeNonGeoDataModule for the MNeonTreeNonGeo dataset.

Parameters:

Name Type Description Default
batch_size int

Batch size for DataLoaders. Defaults to 8.

8
num_workers int

Number of workers for data loading. Defaults to 0.

0
data_root str

Root directory of the dataset. Defaults to "./".

'./'
bands Sequence[str] | None

List of bands to use. Defaults to None.

None
train_transform Compose | None | list[BasicTransform]

Transformations for training.

None
val_transform Compose | None | list[BasicTransform]

Transformations for validation.

None
test_transform Compose | None | list[BasicTransform]

Transformations for testing.

None
aug AugmentationSequential

Augmentation/normalization pipeline. Defaults to None.

None
partition str

Partition size. Defaults to "default".

'default'
**kwargs Any

Additional keyword arguments.

{}

terratorch.datamodules.multi_temporal_crop_classification #

MultiTemporalCropClassificationDataModule #

Bases: NonGeoDataModule

NonGeo LightningDataModule implementation for multi-temporal crop classification.

__init__(data_root, batch_size=4, num_workers=0, bands=MultiTemporalCropClassification.all_band_names, train_transform=None, val_transform=None, test_transform=None, predict_transform=None, drop_last=True, no_data_replace=0, no_label_replace=-1, expand_temporal_dimension=True, reduce_zero_label=True, use_metadata=False, metadata_file_name='chips_df.csv', **kwargs) #

Initializes the MultiTemporalCropClassificationDataModule for multi-temporal crop classification.

Parameters:

Name Type Description Default
data_root str

Directory containing the dataset.

required
batch_size int

Batch size for DataLoaders. Defaults to 4.

4
num_workers int

Number of workers for data loading. Defaults to 0.

0
bands Sequence[str]

List of bands to use. Defaults to MultiTemporalCropClassification.all_band_names.

all_band_names
train_transform Compose | None | list[BasicTransform]

Transformations for training data.

None
val_transform Compose | None | list[BasicTransform]

Transformations for validation data.

None
test_transform Compose | None | list[BasicTransform]

Transformations for testing data.

None
predict_transform Compose | None | list[BasicTransform]

Transformations for prediction data.

None
drop_last bool

Whether to drop the last incomplete batch during training. Defaults to True.

True
no_data_replace float | None

Replacement value for missing data. Defaults to 0.

0
no_label_replace int | None

Replacement value for missing labels. Defaults to -1.

-1
expand_temporal_dimension bool

Go from shape (time*channels, h, w) to (channels, time, h, w). Defaults to True.

True
reduce_zero_label bool

Subtract 1 from all labels. Useful when labels start from 1 instead of the expected 0. Defaults to True.

True
use_metadata bool

Whether to return metadata info (time and location).

False
metadata_file_name str

Name of the metadata CSV file. Defaults to "chips_df.csv".

'chips_df.csv'
**kwargs Any

Additional keyword arguments.

{}

setup(stage) #

Set up datasets.

Parameters:

Name Type Description Default
stage str

Either fit, validate, test, or predict.

required
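A Lightning CLI config sketch for this datamodule; the path is a placeholder and the values mirror the documented defaults:

```yaml
data:
  class_path: terratorch.datamodules.MultiTemporalCropClassificationDataModule
  init_args:
    data_root: /data/multi_temporal_crop   # placeholder path
    batch_size: 4
    expand_temporal_dimension: true        # (time*channels, h, w) -> (channels, time, h, w)
    reduce_zero_label: true                # shift labels starting at 1 down to 0
```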

terratorch.datamodules.open_sentinel_map #

OpenSentinelMapDataModule #

Bases: NonGeoDataModule

NonGeo LightningDataModule implementation for Open Sentinel Map.

__init__(bands=None, batch_size=8, num_workers=0, data_root='./', train_transform=None, val_transform=None, test_transform=None, predict_transform=None, spatial_interpolate_and_stack_temporally=True, pad_image=None, truncate_image=None, **kwargs) #

Initializes the OpenSentinelMapDataModule for the Open Sentinel Map dataset.

Parameters:

Name Type Description Default
bands list[str] | None

List of bands to use. Defaults to None.

None
batch_size int

Batch size for DataLoaders. Defaults to 8.

8
num_workers int

Number of workers for data loading. Defaults to 0.

0
data_root str

Root directory of the dataset. Defaults to "./".

'./'
train_transform Compose | None | list[BasicTransform]

Transformations for training data.

None
val_transform Compose | None | list[BasicTransform]

Transformations for validation data.

None
test_transform Compose | None | list[BasicTransform]

Transformations for testing data.

None
predict_transform Compose | None | list[BasicTransform]

Transformations for prediction data.

None
spatial_interpolate_and_stack_temporally bool

If True, the bands are interpolated and concatenated over time. Default is True.

True
pad_image int | None

Number of timesteps to pad the time dimension of the image. If None, no padding is applied.

None
truncate_image int | None

Number of timesteps to truncate the time dimension of the image. If None, no truncation is performed.

None
**kwargs Any

Additional keyword arguments.

{}

setup(stage) #

Set up datasets.

Parameters:

Name Type Description Default
stage str

Either fit, validate, test, or predict.

required
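A Lightning CLI config sketch; the path is a placeholder, and the padding/truncation length of 10 timesteps is only illustrative:

```yaml
data:
  class_path: terratorch.datamodules.OpenSentinelMapDataModule
  init_args:
    data_root: /data/open_sentinel_map   # placeholder path
    batch_size: 8
    spatial_interpolate_and_stack_temporally: true
    pad_image: 10        # pad shorter series to 10 timesteps
    truncate_image: 10   # truncate longer series to 10 timesteps
```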

terratorch.datamodules.openearthmap #

OpenEarthMapNonGeoDataModule #

Bases: NonGeoDataModule

NonGeo LightningDataModule implementation for Open Earth Map.

__init__(batch_size=8, num_workers=0, data_root='./', train_transform=None, val_transform=None, test_transform=None, predict_transform=None, aug=None, **kwargs) #

Initializes the OpenEarthMapNonGeoDataModule for the Open Earth Map dataset.

Parameters:

Name Type Description Default
batch_size int

Batch size for DataLoaders. Defaults to 8.

8
num_workers int

Number of workers for data loading. Defaults to 0.

0
data_root str

Root directory of the dataset. Defaults to "./".

'./'
train_transform Compose | None | list[BasicTransform]

Transformations for training data.

None
val_transform Compose | None | list[BasicTransform]

Transformations for validation data.

None
test_transform Compose | None | list[BasicTransform]

Transformations for test data.

None
predict_transform Compose | None | list[BasicTransform]

Transformations for prediction data.

None
aug AugmentationSequential

Augmentation pipeline; if None, defaults to normalization using computed means and stds.

None
**kwargs Any

Additional keyword arguments. Can include 'bands' (list[str]) to specify the bands; defaults to OpenEarthMapNonGeo.all_band_names if not provided.

{}

setup(stage) #

Set up datasets.

Parameters:

Name Type Description Default
stage str

Either fit, validate, test, or predict.

required
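A Lightning CLI config sketch; the path is a placeholder, and the band names are illustrative of passing 'bands' through **kwargs as described above:

```yaml
data:
  class_path: terratorch.datamodules.OpenEarthMapNonGeoDataModule
  init_args:
    data_root: /data/open_earth_map   # placeholder path
    batch_size: 8
    bands: [RED, GREEN, BLUE]         # illustrative; defaults to all_band_names if omitted
```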

terratorch.datamodules.pastis #

PASTISDataModule #

Bases: NonGeoDataModule

NonGeo LightningDataModule implementation for PASTIS.

__init__(batch_size=8, num_workers=0, data_root='./', truncate_image=None, pad_image=None, train_transform=None, val_transform=None, test_transform=None, predict_transform=None, **kwargs) #

Initializes the PASTISDataModule for the PASTIS dataset.

Parameters:

Name Type Description Default
batch_size int

Batch size for DataLoaders. Defaults to 8.

8
num_workers int

Number of workers for data loading. Defaults to 0.

0
data_root str

Directory containing the dataset. Defaults to "./".

'./'
truncate_image int

Truncate the time dimension of the image to a specified number of timesteps. If None, no truncation is performed.

None
pad_image int

Pad the time dimension of the image to a specified number of timesteps. If None, no padding is applied.

None
train_transform Compose | None | list[BasicTransform]

Transformations for training data.

None
val_transform Compose | None | list[BasicTransform]

Transformations for validation data.

None
test_transform Compose | None | list[BasicTransform]

Transformations for testing data.

None
predict_transform Compose | None | list[BasicTransform]

Transformations for prediction data.

None
**kwargs Any

Additional keyword arguments.

{}
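As a sketch of how the time-dimension options combine (paths and values below are placeholders): truncate_image trims long sequences and pad_image brings short ones up to a fixed length, so setting both yields a constant number of timesteps per sample:

```yaml
data:
  class_path: terratorch.datamodules.pastis.PASTISDataModule
  init_args:
    data_root: /data/pastis   # placeholder path
    batch_size: 8
    truncate_image: 6         # keep at most 6 timesteps
    pad_image: 6              # pad shorter series up to 6 timesteps
```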

setup(stage) #

Set up datasets.

Parameters:

Name Type Description Default
stage str

Either fit, validate, test, or predict.

required

terratorch.datamodules.sen1floods11 #

Sen1Floods11NonGeoDataModule #

Bases: NonGeoDataModule

NonGeo LightningDataModule implementation for Sen1Floods11.

__init__(data_root, batch_size=4, num_workers=0, bands=Sen1Floods11NonGeo.all_band_names, train_transform=None, val_transform=None, test_transform=None, predict_transform=None, drop_last=True, constant_scale=0.0001, no_data_replace=0, no_label_replace=-1, use_metadata=False, **kwargs) #

Initializes the Sen1Floods11NonGeoDataModule.

Parameters:

Name Type Description Default
data_root str

Root directory of the dataset.

required
batch_size int

Batch size for DataLoaders. Defaults to 4.

4
num_workers int

Number of workers for data loading. Defaults to 0.

0
bands Sequence[str]

List of bands to use. Defaults to Sen1Floods11NonGeo.all_band_names.

all_band_names
train_transform Compose | None | list[BasicTransform]

Transformations for training data.

None
val_transform Compose | None | list[BasicTransform]

Transformations for validation data.

None
test_transform Compose | None | list[BasicTransform]

Transformations for test data.

None
predict_transform Compose | None | list[BasicTransform]

Transformations for prediction data.

None
drop_last bool

Whether to drop the last incomplete batch. Defaults to True.

True
constant_scale float

Scale constant applied to the dataset. Defaults to 0.0001.

0.0001
no_data_replace float | None

Replacement value for missing data. Defaults to 0.

0
no_label_replace int | None

Replacement value for missing labels. Defaults to -1.

-1
use_metadata bool

Whether to return metadata info (time and location).

False
**kwargs Any

Additional keyword arguments.

{}
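A hypothetical LightningCLI config fragment using the parameters above (the data root is a placeholder; the other values are the documented defaults):

```yaml
data:
  class_path: terratorch.datamodules.sen1floods11.Sen1Floods11NonGeoDataModule
  init_args:
    data_root: /data/sen1floods11   # placeholder path
    batch_size: 4
    constant_scale: 0.0001          # scale constant applied to the dataset
    no_data_replace: 0
    no_label_replace: -1
    use_metadata: false
```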

setup(stage) #

Set up datasets.

Parameters:

Name Type Description Default
stage str

Either fit, validate, test, or predict.

required

terratorch.datamodules.sen4agrinet #

Sen4AgriNetDataModule #

Bases: NonGeoDataModule

NonGeo LightningDataModule implementation for Sen4AgriNet.

__init__(bands=None, batch_size=8, num_workers=0, data_root='./', train_transform=None, val_transform=None, test_transform=None, predict_transform=None, seed=42, scenario='random', requires_norm=True, binary_labels=False, linear_encoder=None, **kwargs) #

Initializes the Sen4AgriNetDataModule for the Sen4AgriNet dataset.

Parameters:

Name Type Description Default
bands list[str] | None

List of bands to use. Defaults to None.

None
batch_size int

Batch size for DataLoaders. Defaults to 8.

8
num_workers int

Number of workers for data loading. Defaults to 0.

0
data_root str

Root directory of the dataset. Defaults to "./".

'./'
train_transform Compose | None | list[BasicTransform]

Transformations for training data.

None
val_transform Compose | None | list[BasicTransform]

Transformations for validation data.

None
test_transform Compose | None | list[BasicTransform]

Transformations for test data.

None
predict_transform Compose | None | list[BasicTransform]

Transformations for prediction data.

None
seed int

Random seed for reproducibility. Defaults to 42.

42
scenario str

Defines the splitting scenario to use. Options are:
- 'random': Random split of the data.
- 'spatial': Split by geographical regions (Catalonia and France).
- 'spatio-temporal': Split by region and year (France 2019 and Catalonia 2020).

'random'
requires_norm bool

Whether normalization is required. Defaults to True.

True
binary_labels bool

Whether to use binary labels. Defaults to False.

False
linear_encoder dict

Mapping for label encoding. Defaults to None.

None
**kwargs Any

Additional keyword arguments.

{}
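For illustration, selecting one of the splitting scenarios in a LightningCLI config might look like the following; the data root is a placeholder:

```yaml
data:
  class_path: terratorch.datamodules.sen4agrinet.Sen4AgriNetDataModule
  init_args:
    data_root: /data/sen4agrinet    # placeholder path
    batch_size: 8
    seed: 42
    scenario: spatio-temporal       # split by region and year
    requires_norm: true
```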

setup(stage) #

Set up datasets.

Parameters:

Name Type Description Default
stage str

Either fit, validate, test, or predict.

required

terratorch.datamodules.sen4map #

Sen4MapLucasDataModule #

Bases: LightningDataModule

NonGeo LightningDataModule implementation for Sen4map.

__init__(batch_size, num_workers, prefetch_factor=0, train_hdf5_path=None, train_hdf5_keys_path=None, test_hdf5_path=None, test_hdf5_keys_path=None, val_hdf5_path=None, val_hdf5_keys_path=None, **kwargs) #

Initializes the Sen4MapLucasDataModule for handling Sen4Map monthly composites.

Parameters:

Name Type Description Default
batch_size int

Batch size for DataLoaders.

required
num_workers int

Number of worker processes for data loading.

required
prefetch_factor int

Number of samples to prefetch per worker. Defaults to 0.

0
train_hdf5_path str

Path to the training HDF5 file.

None
train_hdf5_keys_path str

Path to the training HDF5 keys file.

None
test_hdf5_path str

Path to the testing HDF5 file.

None
test_hdf5_keys_path str

Path to the testing HDF5 keys file.

None
val_hdf5_path str

Path to the validation HDF5 file.

None
val_hdf5_keys_path str

Path to the validation HDF5 keys file.

None
train_hdf5_keys_save_path str

(from kwargs) Path to save generated train keys.

required
test_hdf5_keys_save_path str

(from kwargs) Path to save generated test keys.

required
val_hdf5_keys_save_path str

(from kwargs) Path to save generated validation keys.

required
shuffle bool

Global shuffle flag.

required
train_shuffle bool

Shuffle flag for training data; defaults to global shuffle if unset.

required
val_shuffle bool

Shuffle flag for validation data.

required
test_shuffle bool

Shuffle flag for test data.

required
train_data_fraction float

Fraction of training data to use. Defaults to 1.0.

required
val_data_fraction float

Fraction of validation data to use. Defaults to 1.0.

required
test_data_fraction float

Fraction of test data to use. Defaults to 1.0.

required
all_hdf5_data_path str

General HDF5 data path for all splits. If provided, overrides specific paths.

required
resize bool

Whether to resize images. Defaults to False.

required
resize_to int or tuple

Target size for resizing images.

required
resize_interpolation str

Interpolation mode for resizing ('bilinear', 'bicubic', etc.).

required
resize_antialiasing bool

Whether to apply antialiasing during resizing. Defaults to True.

required
**kwargs Any

Additional keyword arguments.

{}
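Since each split takes its own HDF5 file and keys file, a config sketch makes the wiring clearer; all paths below are placeholders:

```yaml
data:
  class_path: terratorch.datamodules.sen4map.Sen4MapLucasDataModule
  init_args:
    batch_size: 16
    num_workers: 4
    train_hdf5_path: /data/sen4map/train.h5            # placeholder paths
    train_hdf5_keys_path: /data/sen4map/train_keys.pkl
    val_hdf5_path: /data/sen4map/val.h5
    val_hdf5_keys_path: /data/sen4map/val_keys.pkl
    test_hdf5_path: /data/sen4map/test.h5
    test_hdf5_keys_path: /data/sen4map/test_keys.pkl
```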

setup(stage) #

Set up datasets.

Parameters:

Name Type Description Default
stage str

Either fit or test.

required

terratorch.datamodules.torchgeo_data_module #

Ugly proxy objects so parsing config file works with transforms.

These are necessary since, for LightningCLI to instantiate arguments as objects from the config, they must have type annotations.

In TorchGeo, transforms is passed in **kwargs, so it has no type annotations! To get around that, we create these wrappers that have transforms type annotated. They create the transforms and forward all method and attribute calls to the original TorchGeo datamodule.

Additionally, TorchGeo datasets pass the data to the transforms callable as a dict of tensors.

Albumentations instead expects the data as separate keyword arguments holding numpy arrays. We handle that conversion here.
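That conversion can be sketched in a few lines; this is a simplified illustration of the idea, not the actual terratorch code:

```python
import numpy as np

def torchgeo_sample_to_albumentations(sample: dict) -> dict:
    """Convert a TorchGeo-style sample dict into Albumentations keyword arguments.

    Sketch only: TorchGeo yields dicts like {"image": (C, H, W) tensor,
    "mask": (H, W) tensor}; Albumentations expects numpy arrays passed as
    separate keyword arguments, with images in (H, W, C) layout.
    """
    kwargs = {}
    for key, value in sample.items():
        arr = np.asarray(value)  # CPU torch tensors convert via __array__
        if key == "image" and arr.ndim == 3:
            arr = np.moveaxis(arr, 0, -1)  # (C, H, W) -> (H, W, C)
        kwargs[key] = arr
    return kwargs

# Example: a fake 2-band 4x4 sample
sample = {"image": np.zeros((2, 4, 4)), "mask": np.ones((4, 4))}
converted = torchgeo_sample_to_albumentations(sample)
```

The converted dict can then be splatted into the transform call, e.g. `transform(**converted)`.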

TorchGeoDataModule #

Bases: GeoDataModule

Proxy object for using Geo data modules defined by TorchGeo.

Allows for transforms to be defined and passed using config files. The only reason this class exists is so that we can annotate the transforms argument with a type. This is required for lightningcli and config files. As such, all getattr and setattr will be redirected to the underlying class.

__init__(cls, batch_size=None, num_workers=0, transforms=None, **kwargs) #

Constructor

Parameters:

Name Type Description Default
cls type[GeoDataModule]

TorchGeo DataModule class to be instantiated

required
batch_size int | None

Batch size for DataLoaders. Defaults to None.

None
num_workers int

Number of workers for data loading. Defaults to 0.

0
transforms None | list[BasicTransform]

List of Albumentations Transforms. Should end with ToTensorV2. Defaults to None.

None
**kwargs Any

Arguments passed to instantiate cls.

{}

TorchNonGeoDataModule #

Bases: NonGeoDataModule

Proxy object for using NonGeo data modules defined by TorchGeo.

Allows for transforms to be defined and passed using config files. The only reason this class exists is so that we can annotate the transforms argument with a type. This is required for lightningcli and config files. As such, all getattr and setattr will be redirected to the underlying class.

__init__(cls, batch_size=None, num_workers=0, transforms=None, **kwargs) #

Constructor

Parameters:

Name Type Description Default
cls type[NonGeoDataModule]

TorchGeo DataModule class to be instantiated

required
batch_size int | None

Batch size for DataLoaders. Defaults to None.

None
num_workers int

Number of workers for data loading. Defaults to 0.

0
transforms None | list[BasicTransform]

List of Albumentations Transforms. Should end with ToTensorV2. Defaults to None.

None
**kwargs Any

Arguments passed to instantiate cls.

{}
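To make the proxy concrete, a hypothetical LightningCLI fragment might wrap a TorchGeo datamodule and attach an Albumentations pipeline ending in ToTensorV2; the wrapped class and transform parameters below are illustrative choices, not values from the source:

```yaml
data:
  class_path: terratorch.datamodules.torchgeo_data_module.TorchNonGeoDataModule
  init_args:
    cls: torchgeo.datamodules.EuroSAT100DataModule
    batch_size: 8
    num_workers: 4
    transforms:
      - class_path: albumentations.Resize
        init_args:
          height: 224
          width: 224
      - class_path: albumentations.pytorch.ToTensorV2
```

Any extra keys under init_args are forwarded as **kwargs when the wrapped cls is instantiated.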

Last update: March 23, 2025