Datasets

terratorch.datasets.biomassters #

BioMasstersNonGeo #

Bases: BioMassters

BioMassters Dataset for Aboveground Biomass prediction.

Dataset intended for Aboveground Biomass (AGB) prediction over Finnish forests based on Sentinel 1 and 2 data with corresponding target AGB mask values generated by Light Detection and Ranging (LiDAR).

Dataset Format:

  • .tif files for Sentinel 1 and 2 data
  • .tif file for pixel wise AGB target mask
  • .csv files for metadata regarding features and targets

Dataset Features:

  • 13,000 target AGB masks of size (256x256px)
  • 12 months of data per target mask
  • Sentinel 1 and Sentinel 2 data for each location
  • Sentinel 1 available for every month
  • Sentinel 2 available for almost every month (some months are missing because ESA halted acquisition over the region during particular periods)

If you use this dataset in your research, please cite the following paper:

  • https://nascetti-a.github.io/BioMasster/

Added in version 0.5.

__init__(root='data', split='train', bands=BAND_SETS['all'], transform=None, mask_mean=63.4584, mask_std=72.21242, sensors=['S1', 'S2'], as_time_series=False, metadata_filename=default_metadata_filename, max_cloud_percentage=None, max_red_mean=None, include_corrupt=True, subset=1, seed=42, use_four_frames=False) #

Initialize a new instance of BioMassters dataset.

If as_time_series=False (the default), each time step becomes its own sample with the target being shared across multiple samples.
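The effect of as_time_series on the sample index can be sketched in plain Python. This is only an illustration of the behaviour described above, not the library's implementation; the `rows` list and the `build_index` helper are hypothetical names introduced here.

```python
from collections import defaultdict

# Hypothetical metadata rows: (chip_id, month) pairs, one per available image.
rows = [("chip_a", 0), ("chip_a", 1), ("chip_b", 0)]


def build_index(rows, as_time_series):
    """Return a list of samples; each sample is (chip_id, [months])."""
    if not as_time_series:
        # One sample per time step; samples of the same chip share one target.
        return [(chip, [month]) for chip, month in rows]
    grouped = defaultdict(list)
    for chip, month in rows:
        grouped[chip].append(month)
    # One sample per chip, carrying every available time step.
    return [(chip, sorted(months)) for chip, months in grouped.items()]


per_step = build_index(rows, as_time_series=False)
per_chip = build_index(rows, as_time_series=True)
```

With as_time_series=False the three rows yield three samples, two of which point at the same target; with as_time_series=True they collapse into one sample per chip.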

Parameters:

Name Type Description Default
root

root directory where dataset can be found

'data'
split str

train or test split

'train'
sensors Sequence[str]

which sensors to consider for the sample, Sentinel 1 and/or Sentinel 2 ('S1', 'S2')

['S1', 'S2']
as_time_series bool

whether or not to return all available time-steps or just a single one for a given target location

False
metadata_filename str

metadata file to be used

default_metadata_filename
max_cloud_percentage float | None

maximum allowed cloud percentage for images

None
max_red_mean float | None

maximum allowed red_mean value for images

None
include_corrupt bool

whether to include images marked as corrupted

True

Raises:

Type Description
AssertionError

if split or sensors is invalid

DatasetNotFoundError

If dataset is not found.
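The three filtering parameters (max_cloud_percentage, max_red_mean, include_corrupt) can be pictured as row filters over the metadata file. The sketch below is an assumption about how such filters combine; the record keys, the inclusive thresholds, and the `filter_records` helper are hypothetical and may not match the real metadata columns.

```python
# Hypothetical metadata records; the real CSV column names may differ.
records = [
    {"chip": "a", "cloud_percentage": 5.0, "red_mean": 120.0, "corrupt": False},
    {"chip": "b", "cloud_percentage": 60.0, "red_mean": 90.0, "corrupt": False},
    {"chip": "c", "cloud_percentage": 2.0, "red_mean": 100.0, "corrupt": True},
]


def filter_records(records, max_cloud_percentage=None, max_red_mean=None,
                   include_corrupt=True):
    """Keep records that pass every active filter; None disables a filter."""
    kept = []
    for rec in records:
        if max_cloud_percentage is not None and rec["cloud_percentage"] > max_cloud_percentage:
            continue
        if max_red_mean is not None and rec["red_mean"] > max_red_mean:
            continue
        if not include_corrupt and rec["corrupt"]:
            continue
        kept.append(rec)
    return kept


clean = filter_records(records, max_cloud_percentage=20.0, include_corrupt=False)
```

With the defaults (None, None, True) nothing is filtered out, which matches the documented default values.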

plot(sample, show_titles=True, suptitle=None) #

Plot a sample from the dataset.

Parameters:

Name Type Description Default
sample dict[str, Tensor]

a sample returned by :meth:__getitem__

required
show_titles bool

flag indicating whether to show titles above each panel

True
suptitle str | None

optional suptitle to use for figure

None

Returns:

Type Description
Figure

a matplotlib Figure with the rendered sample

terratorch.datasets.burn_intensity #

BurnIntensityNonGeo #

Bases: NonGeoDataset

Dataset implementation for Burn Intensity classification.

__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, use_full_data=True, no_data_replace=0.0001, no_label_replace=-1, use_metadata=False) #

Initialize the BurnIntensity dataset.

Parameters:

Name Type Description Default
data_root str

Path to the data root directory.

required
split str

One of 'train' or 'val'.

'train'
bands Sequence[str]

Bands to output. Defaults to all bands.

BAND_SETS['all']
transform Optional[Compose]

Albumentations transform to be applied.

None
use_metadata bool

Whether to return metadata info (location).

False
use_full_data bool

Whether to use full data or data with less than 25 percent zeros.

True
no_data_replace Optional[float]

Value to replace NaNs in images.

0.0001
no_label_replace Optional[int]

Value to replace NaNs in labels.

-1

plot(sample, suptitle=None) #

Plot a sample from the dataset.

Parameters:

Name Type Description Default
sample dict[str, Tensor]

A sample returned by __getitem__.

required
suptitle str | None

Optional string to use as a suptitle.

None

Returns:

Type Description
Any

A matplotlib Figure with the rendered sample.

terratorch.datasets.carbonflux #

CarbonFluxNonGeo #

Bases: NonGeoDataset

Dataset for Carbon Flux regression from HLS images and MERRA data.

__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, gpp_mean=None, gpp_std=None, no_data_replace=0.0001, use_metadata=False, modalities=('image', 'merra_vars')) #

Initialize the CarbonFluxNonGeo dataset.

Parameters:

Name Type Description Default
data_root str

Path to the data root directory.

required
split str

'train' or 'test'.

'train'
bands Sequence[str]

Bands to use. Defaults to all bands.

BAND_SETS['all']
transform Optional[Compose]

Albumentations transform to be applied.

None
use_metadata bool

Whether to return metadata (coordinates and date).

False
merra_means Sequence[float]

Means for MERRA data normalization.

required
merra_stds Sequence[float]

Standard deviations for MERRA data normalization.

required
gpp_mean float

Mean for GPP normalization.

None
gpp_std float

Standard deviation for GPP normalization.

None
no_data_replace Optional[float]

Value to replace NO_DATA values in images.

0.0001

plot(sample, suptitle=None) #

Plot a sample from the dataset.

Parameters:

Name Type Description Default
sample dict[str, Any]

A sample returned by __getitem__.

required
suptitle str | None

Optional title for the figure.

None

Returns:

Type Description
Any

A matplotlib figure with the rendered sample.

terratorch.datasets.forestnet #

ForestNetNonGeo #

Bases: NonGeoDataset

NonGeo dataset implementation for ForestNet.

__init__(data_root, split='train', label_map=default_label_map, transform=None, fraction=1.0, bands=BAND_SETS['all'], use_metadata=False) #

Initialize the ForestNetNonGeo dataset.

Parameters:

Name Type Description Default
data_root str

Path to the data root directory.

required
split str

One of 'train', 'val', or 'test'.

'train'
label_map Dict[str, int]

Mapping from label names to integer labels.

default_label_map
transform Compose | None

Transformations to be applied to the images.

None
fraction float

Fraction of the dataset to use. Defaults to 1.0 (use all data).

1.0

map_label(index) #

Map the label name to an integer label.
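As a rough illustration of label_map and fraction, the sketch below maps label names to integers and keeps a leading share of the samples. The label names, the `take_fraction` helper, and the choice to keep the first items (rather than a random subset) are assumptions made for this example only.

```python
# Hypothetical label names; the library's default_label_map may differ.
label_map = {"Plantation": 0, "Smallholder agriculture": 1, "Other": 2}
sample_labels = ["Other", "Plantation", "Other", "Smallholder agriculture"]


def map_labels(names, label_map):
    """Translate label names to integer class ids."""
    return [label_map[name] for name in names]


def take_fraction(items, fraction):
    """Keep the leading share of items; requires 0 < fraction <= 1."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return items[: max(1, int(len(items) * fraction))]


ids = map_labels(sample_labels, label_map)
half = take_fraction(ids, 0.5)
```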

terratorch.datasets.fire_scars #

FireScarsHLS #

Bases: RasterDataset

RasterDataset implementation for fire scars input images.

FireScarsNonGeo #

Bases: NonGeoDataset

NonGeo dataset implementation for fire scars.

__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, no_data_replace=0, no_label_replace=-1, use_metadata=False) #

Constructor

Parameters:

Name Type Description Default
data_root str

Path to the data root directory.

required
bands list[str]

Bands that should be output by the dataset. Defaults to all bands.

BAND_SETS['all']
transform Compose | None

Albumentations transform to be applied. Should end with ToTensorV2(). If used through the corresponding data module, should not include normalization. Defaults to None, which applies ToTensorV2().

None
no_data_replace float | None

Replace nan values in input images with this value. If None, does no replacement. Defaults to 0.

0
no_label_replace int | None

Replace nan values in label with this value. If None, does no replacement. Defaults to -1.

-1
use_metadata bool

whether to return metadata info (time and location).

False
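The no_data_replace and no_label_replace parameters describe a NaN substitution step. A minimal pure-Python sketch of that idea follows; it uses nested lists in place of image arrays, and `replace_nan` is a hypothetical helper, not a function of the library.

```python
import math


def replace_nan(values, replacement):
    """Return a copy of a 2-D grid with NaNs replaced; None means no replacement."""
    if replacement is None:
        return [list(row) for row in values]
    return [[replacement if math.isnan(v) else v for v in row] for row in values]


image_band = [[0.12, float("nan")], [0.30, 0.05]]
label = [[1.0, float("nan")], [0.0, 0.0]]

clean_band = replace_nan(image_band, 0)    # mirrors no_data_replace=0
clean_label = replace_nan(label, -1)       # mirrors no_label_replace=-1
```

Replacing label NaNs with -1 keeps them distinguishable from valid classes, so a loss function can be told to ignore that value.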

plot(sample, suptitle=None) #

Plot a sample from the dataset.

Parameters:

Name Type Description Default
sample dict[str, Tensor]

a sample returned by :meth:__getitem__

required
suptitle str | None

optional string to use as a suptitle

None

Returns:

Type Description
Figure

a matplotlib Figure with the rendered sample

FireScarsSegmentationMask #

Bases: RasterDataset

RasterDataset implementation for fire scars segmentation mask. Can be easily merged with input images using the & operator.

terratorch.datasets.landslide4sense #

Landslide4SenseNonGeo #

Bases: NonGeoDataset

NonGeo dataset implementation for Landslide4Sense.

__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None) #

Initialize the Landslide4Sense dataset.

Parameters:

Name Type Description Default
data_root str

Path to the data root directory.

required
split str

One of 'train', 'validation', or 'test'.

'train'
bands Sequence[str]

Bands to be used. Defaults to all bands.

BAND_SETS['all']
transform Compose | None

Albumentations transform to be applied. Defaults to None, which applies default_transform().

None

terratorch.datasets.m_eurosat #

MEuroSATNonGeo #

Bases: NonGeoDataset

NonGeo dataset implementation for M-EuroSAT.

__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default') #

Initialize the dataset.

Parameters:

Name Type Description Default
data_root str

Path to the data root directory.

required
split str

One of 'train', 'val', or 'test'.

'train'
bands Sequence[str]

Bands to be used. Defaults to all bands.

BAND_SETS['all']
transform Compose | None

Albumentations transform to be applied. Defaults to None, which applies default_transform().

None
partition str

Partition name for the dataset splits. Defaults to 'default'.

'default'

plot(sample, suptitle=None) #

Plot a sample from the dataset.

Parameters:

Name Type Description Default
sample dict[str, Tensor]

A sample returned by :meth:__getitem__.

required
suptitle str | None

Optional string to use as a suptitle.

None

Returns:

Type Description
Figure

matplotlib.figure.Figure: A matplotlib Figure with the rendered sample.

terratorch.datasets.m_bigearthnet #

MBigEarthNonGeo #

Bases: NonGeoDataset

NonGeo dataset implementation for M-BigEarthNet.

__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default') #

Initialize the dataset.

Parameters:

Name Type Description Default
data_root str

Path to the data root directory.

required
split str

One of 'train', 'val', or 'test'.

'train'
bands Sequence[str]

Bands to be used. Defaults to all bands.

BAND_SETS['all']
transform Compose | None

Albumentations transform to be applied. Defaults to None, which applies default_transform().

None
partition str

Partition name for the dataset splits. Defaults to 'default'.

'default'

plot(sample, suptitle=None) #

Plot a sample from the dataset.

Parameters:

Name Type Description Default
sample dict[str, Tensor]

A sample returned by :meth:__getitem__.

required
suptitle str | None

Optional string to use as a suptitle.

None

Returns:

Type Description
Figure

matplotlib.figure.Figure: A matplotlib Figure with the rendered sample.

terratorch.datasets.m_brick_kiln #

MBrickKilnNonGeo #

Bases: NonGeoDataset

NonGeo dataset implementation for M-BrickKiln.

__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default') #

Initialize the dataset.

Parameters:

Name Type Description Default
data_root str

Path to the data root directory.

required
split str

One of 'train', 'val', or 'test'.

'train'
bands Sequence[str]

Bands to be used. Defaults to all bands.

BAND_SETS['all']
transform Compose | None

Albumentations transform to be applied. Defaults to None, which applies default_transform().

None
partition str

Partition name for the dataset splits. Defaults to 'default'.

'default'

plot(sample, suptitle=None) #

Plot a sample from the dataset.

Parameters:

Name Type Description Default
sample dict[str, Tensor]

A sample returned by :meth:__getitem__.

required
suptitle str | None

Optional string to use as a suptitle.

None

Returns:

Type Description
Figure

matplotlib.figure.Figure: A matplotlib Figure with the rendered sample.

terratorch.datasets.m_forestnet #

MForestNetNonGeo #

Bases: NonGeoDataset

NonGeo dataset implementation for M-ForestNet.

__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default', use_metadata=False) #

Initialize the dataset.

Parameters:

Name Type Description Default
data_root str

Path to the data root directory.

required
split str

One of 'train', 'val', or 'test'.

'train'
bands Sequence[str]

Bands to be used. Defaults to all bands.

BAND_SETS['all']
transform Compose | None

Albumentations transform to be applied. Defaults to None, which applies default_transform().

None
partition str

Partition name for the dataset splits. Defaults to 'default'.

'default'
use_metadata bool

Whether to return metadata info (time and location).

False

plot(sample, suptitle=None) #

Plot a sample from the dataset.

Parameters:

Name Type Description Default
sample dict[str, Tensor]

A sample returned by :meth:__getitem__.

required
suptitle str | None

Optional string to use as a suptitle.

None

Returns:

Type Description
Figure

matplotlib.figure.Figure: A matplotlib Figure with the rendered sample.

terratorch.datasets.m_so2sat #

MSo2SatNonGeo #

Bases: NonGeoDataset

NonGeo dataset implementation for M-So2Sat.

__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default') #

Initialize the dataset.

Parameters:

Name Type Description Default
data_root str

Path to the data root directory.

required
split str

One of 'train', 'val', or 'test'.

'train'
bands Sequence[str]

Bands to be used. Defaults to all bands.

BAND_SETS['all']
transform Compose | None

Albumentations transform to be applied. Defaults to None, which applies default_transform().

None
partition str

Partition name for the dataset splits. Defaults to 'default'.

'default'

plot(sample, suptitle=None) #

Plot a sample from the dataset.

Parameters:

Name Type Description Default
sample dict[str, Tensor]

A sample returned by :meth:__getitem__.

required
suptitle str | None

Optional string to use as a suptitle.

None

Returns:

Type Description
Figure

matplotlib.figure.Figure: A matplotlib Figure with the rendered sample.

terratorch.datasets.m_pv4ger #

MPv4gerNonGeo #

Bases: NonGeoDataset

NonGeo dataset implementation for M-PV4GER.

__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default', use_metadata=False) #

Initialize the dataset.

Parameters:

Name Type Description Default
data_root str

Path to the data root directory.

required
split str

One of 'train', 'val', or 'test'.

'train'
bands Sequence[str]

Bands to be used. Defaults to all bands.

BAND_SETS['all']
transform Compose | None

Albumentations transform to be applied. Defaults to None, which applies default_transform().

None
partition str

Partition name for the dataset splits. Defaults to 'default'.

'default'
use_metadata bool

Whether to return metadata info (location coordinates).

False

plot(sample, suptitle=None) #

Plot a sample from the dataset.

Parameters:

Name Type Description Default
sample dict[str, Tensor]

A sample returned by :meth:__getitem__.

required
suptitle str | None

Optional string to use as a suptitle.

None

Returns:

Type Description
Figure

matplotlib.figure.Figure: A matplotlib Figure with the rendered sample.

terratorch.datasets.m_cashew_plantation #

MBeninSmallHolderCashewsNonGeo #

Bases: NonGeoDataset

NonGeo dataset implementation for M-BeninSmallHolderCashews.

__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default', use_metadata=False) #

Initialize the dataset.

Parameters:

Name Type Description Default
data_root str

Path to the data root directory.

required
split str

One of 'train', 'val', or 'test'.

'train'
bands Sequence[str]

Bands to be used. Defaults to all bands.

BAND_SETS['all']
transform Compose | None

Albumentations transform to be applied. Defaults to None, which applies default_transform().

None
partition str

Partition name for the dataset splits. Defaults to 'default'.

'default'
use_metadata bool

Whether to return metadata info (time).

False

plot(sample, suptitle=None) #

Plot a sample from the dataset.

Parameters:

Name Type Description Default
sample dict[str, Tensor]

A sample returned by :meth:__getitem__.

required
suptitle str | None

Optional string to use as a suptitle.

None

Returns:

Type Description
Figure

matplotlib.figure.Figure: A matplotlib Figure with the rendered sample.

terratorch.datasets.m_nz_cattle #

MNzCattleNonGeo #

Bases: NonGeoDataset

NonGeo dataset implementation for M-NZ-Cattle.

__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default', use_metadata=False) #

Initialize the dataset.

Parameters:

Name Type Description Default
data_root str

Path to the data root directory.

required
split str

One of 'train', 'val', or 'test'.

'train'
bands Sequence[str]

Bands to be used. Defaults to all bands.

BAND_SETS['all']
transform Compose | None

Albumentations transform to be applied. Defaults to None, which applies default_transform().

None
partition str

Partition name for the dataset splits. Defaults to 'default'.

'default'
use_metadata bool

Whether to return metadata info (time and location).

False

plot(sample, suptitle=None) #

Plot a sample from the dataset.

Parameters:

Name Type Description Default
sample dict[str, Tensor]

A sample returned by :meth:__getitem__.

required
suptitle str | None

Optional string to use as a suptitle.

None

Returns:

Type Description
Figure

matplotlib.figure.Figure: A matplotlib Figure with the rendered sample.

terratorch.datasets.m_chesapeake_landcover #

MChesapeakeLandcoverNonGeo #

Bases: NonGeoDataset

NonGeo dataset implementation for M-ChesapeakeLandcover.

__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default') #

Initialize the dataset.

Parameters:

Name Type Description Default
data_root str

Path to the data root directory.

required
split str

One of 'train', 'val', or 'test'.

'train'
bands Sequence[str]

Bands to be used. Defaults to all bands.

BAND_SETS['all']
transform Compose | None

Albumentations transform to be applied. Defaults to None, which applies default_transform().

None
partition str

Partition name for the dataset splits. Defaults to 'default'.

'default'

plot(sample, suptitle=None) #

Plot a sample from the dataset.

Parameters:

Name Type Description Default
sample dict[str, Tensor]

A sample returned by :meth:__getitem__.

required
suptitle str | None

Optional string to use as a suptitle.

None

Returns:

Type Description
Figure

matplotlib.figure.Figure: A matplotlib Figure with the rendered sample.

terratorch.datasets.m_pv4ger_seg #

MPv4gerSegNonGeo #

Bases: NonGeoDataset

NonGeo dataset implementation for M-PV4GER-SEG.

__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default', use_metadata=False) #

Initialize the dataset.

Parameters:

Name Type Description Default
data_root str

Path to the data root directory.

required
split str

One of 'train', 'val', or 'test'.

'train'
bands Sequence[str]

Bands to be used. Defaults to all bands.

BAND_SETS['all']
transform Compose | None

Albumentations transform to be applied. Defaults to None, which applies default_transform().

None
partition str

Partition name for the dataset splits. Defaults to 'default'.

'default'
use_metadata bool

Whether to return metadata info (location coordinates).

False

plot(sample, suptitle=None) #

Plot a sample from the dataset.

Parameters:

Name Type Description Default
sample dict[str, Tensor]

A sample returned by :meth:__getitem__.

required
suptitle str | None

Optional string to use as a suptitle.

None

Returns:

Type Description
Figure

matplotlib.figure.Figure: A matplotlib Figure with the rendered sample.

terratorch.datasets.m_SA_crop_type #

MSACropTypeNonGeo #

Bases: NonGeoDataset

NonGeo dataset implementation for M-SA-Crop-Type.

__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, partition='default') #

Initialize the dataset.

Parameters:

Name Type Description Default
data_root str

Path to the data root directory.

required
split str

One of 'train', 'val', or 'test'.

'train'
bands Sequence[str]

Bands to be used. Defaults to all bands.

BAND_SETS['all']
transform Compose | None

Albumentations transform to be applied. Defaults to None, which applies default_transform().

None
partition str

Partition name for the dataset splits. Defaults to 'default'.

'default'

plot(sample, suptitle=None) #

Plot a sample from the dataset.

Parameters:

Name Type Description Default
sample dict[str, Tensor]

A sample returned by :meth:__getitem__.

required
suptitle str | None

Optional string to use as a suptitle.

None

Returns:

Type Description
Figure

matplotlib.figure.Figure: A matplotlib Figure with the rendered sample.

terratorch.datasets.m_neontree #

MNeonTreeNonGeo #

Bases: NonGeoDataset

NonGeo dataset implementation for M-NeonTree.

__init__(data_root, split='train', bands=rgb_bands, transform=None, partition='default') #

Initialize the dataset.

Parameters:

Name Type Description Default
data_root str

Path to the data root directory.

required
split str

One of 'train', 'val', or 'test'.

'train'
bands Sequence[str]

Bands to be used. Defaults to RGB bands.

rgb_bands
transform Compose | None

Albumentations transform to be applied. Defaults to None, which applies default_transform().

None
partition str

Partition name for the dataset splits. Defaults to 'default'.

'default'

plot(sample, suptitle=None) #

Plot a sample from the dataset.

Parameters:

Name Type Description Default
sample dict[str, Tensor]

A sample returned by :meth:__getitem__.

required
suptitle str | None

Optional string to use as a suptitle.

None

Returns:

Type Description
Figure

matplotlib.figure.Figure: A matplotlib Figure with the rendered sample.

terratorch.datasets.multi_temporal_crop_classification #

MultiTemporalCropClassification #

Bases: NonGeoDataset

NonGeo dataset implementation for multi-temporal crop classification.

__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, no_data_replace=None, no_label_replace=None, expand_temporal_dimension=True, reduce_zero_label=True, use_metadata=False, metadata_file_name='chips_df.csv') #

Constructor

Parameters:

Name Type Description Default
data_root str

Path to the data root directory.

required
split str

one of 'train' or 'val'.

'train'
bands list[str]

Bands that should be output by the dataset. Defaults to all bands.

BAND_SETS['all']
transform Compose | None

Albumentations transform to be applied. Should end with ToTensorV2(). If used through the corresponding data module, should not include normalization. Defaults to None, which applies ToTensorV2().

None
no_data_replace float | None

Replace nan values in input images with this value. If None, does no replacement. Defaults to None.

None
no_label_replace int | None

Replace nan values in label with this value. If None, does no replacement. Defaults to None.

None
expand_temporal_dimension bool

Go from shape (time*channels, h, w) to (channels, time, h, w). Defaults to True.

True
reduce_zero_label bool

Subtract 1 from all labels. Useful when labels start from 1 instead of the expected 0. Defaults to True.

True
use_metadata bool

whether to return metadata info (time and location).

False
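The reshaping done by expand_temporal_dimension and the shift done by reduce_zero_label can be sketched without any array library. The example assumes frame-major storage (all channels of the first time step, then all channels of the second, and so on); if the files are stored channel-major, the index arithmetic swaps. Both helpers are hypothetical.

```python
def expand_temporal(stacked, channels):
    """Regroup a flat (time*channels) list of bands into [channel][time].

    Assumes frame-major storage: all channels of t=0, then all of t=1, ...
    """
    if len(stacked) % channels != 0:
        raise ValueError("band count is not a multiple of channels")
    timesteps = len(stacked) // channels
    return [[stacked[t * channels + c] for t in range(timesteps)]
            for c in range(channels)]


def reduce_zero_label(labels):
    """Subtract 1 from every label so that labels starting at 1 start at 0."""
    return [[v - 1 for v in row] for row in labels]


# Each string stands in for one (h, w) band array.
stacked = ["t0_red", "t0_nir", "t1_red", "t1_nir", "t2_red", "t2_nir"]
expanded = expand_temporal(stacked, channels=2)
shifted = reduce_zero_label([[1, 2], [13, 1]])
```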

plot(sample, suptitle=None) #

Plot a sample from the dataset.

Parameters:

Name Type Description Default
sample dict[str, Tensor]

a sample returned by :meth:__getitem__

required
suptitle str | None

optional string to use as a suptitle

None

Returns:

Type Description
Figure

a matplotlib Figure with the rendered sample

terratorch.datasets.open_sentinel_map #

OpenSentinelMap #

Bases: NonGeoDataset

PyTorch Dataset class to load samples from the OpenSentinelMap dataset, supporting multiple bands and temporal sampling strategies.

__init__(data_root, split='train', bands=None, transform=None, spatial_interpolate_and_stack_temporally=True, pad_image=None, truncate_image=None, target=0, pick_random_pair=True) #

Parameters:

Name Type Description Default
data_root str

Path to the root directory of the dataset.

required
split str

Dataset split to load. Options are 'train', 'val', or 'test'. Defaults to 'train'.

'train'
bands list of str

List of band names to load. Defaults to ['gsd_10', 'gsd_20', 'gsd_60'].

None
transform Compose

Albumentations transformations to apply to the data.

None
spatial_interpolate_and_stack_temporally bool

If True, the bands are interpolated and concatenated over time. Default is True.

True
pad_image int

Number of timesteps to pad the time dimension of the image. If None, no padding is applied.

None
truncate_image int

Number of timesteps to truncate the time dimension of the image. If None, no truncation is performed.

None
target int

Specifies which target class to use from the mask. Default is 0.

0
pick_random_pair bool

If True, selects two random images from the temporal sequence. Default is True.

True
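Truncating and padding the time dimension serve to give every sample the same number of timesteps. The sketch below shows one way to do this on a list of frames; keeping the earliest frames and padding at the end with a constant fill value are assumptions, and both helpers are hypothetical.

```python
def truncate_time(frames, length):
    """Keep at most `length` timesteps; None leaves the sequence unchanged."""
    return list(frames) if length is None else list(frames[:length])


def pad_time(frames, length, fill=0):
    """Append `fill` frames until there are `length` timesteps; None means no padding."""
    frames = list(frames)
    if length is None or len(frames) >= length:
        return frames
    return frames + [fill] * (length - len(frames))


series = ["jan", "feb", "mar", "apr", "may", "jun"]
fixed = pad_time(truncate_time(series, 4), 4)
short = pad_time(truncate_time(["jan", "feb"], 4), 4)
```

Using the same value for truncate_image and pad_image gives sequences of one fixed length, which is what batching requires.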

terratorch.datasets.openearthmap #

OpenEarthMapNonGeo #

Bases: NonGeoDataset

OpenEarthMapNonGeo Dataset for non-georeferenced imagery.

This dataset class handles non-georeferenced image data from the OpenEarthMap dataset. It supports configurable band sets and transformations, and performs cropping operations to ensure that the images conform to the required input dimensions. The dataset is split into "train", "test", and "val" subsets based on the provided split parameter.

__init__(data_root, bands=BAND_SETS['all'], transform=None, split='train', crop_size=256, random_crop=True) #

Initialize a new instance of the OpenEarthMapNonGeo dataset.

Parameters:

Name Type Description Default
data_root str

The root directory containing the dataset files.

required
bands Sequence[str]

A list of band names to be used. Default is BAND_SETS["all"].

BAND_SETS['all']
transform Compose or None

A transformation pipeline to be applied to the data. If None, a default transform converting the data to a tensor is applied.

None
split str

The dataset split to use ("train", "test", or "val"). Default is "train".

'train'
crop_size int

The size (in pixels) of the crop to apply to images. Must be greater than 0. Default is 256.

256
random_crop bool

If True, performs a random crop; otherwise, performs a center crop. Default is True.

True

Raises:

Type Description
Exception

If the provided split is not one of "train", "test", or "val".

AssertionError

If crop_size is not greater than 0.
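The difference between random_crop=True and random_crop=False comes down to how the crop origin is chosen. The following sketch computes crop boxes only; `crop_box`, its `rng` argument, and the ValueError for oversized crops are hypothetical additions for illustration.

```python
import random


def crop_box(height, width, crop_size, random_crop, rng=None):
    """Return (top, left, bottom, right) for a square crop inside the image."""
    assert crop_size > 0, "crop_size must be greater than 0"
    if crop_size > height or crop_size > width:
        raise ValueError("crop_size exceeds the image dimensions")
    if random_crop:
        rng = rng or random.Random()
        top = rng.randint(0, height - crop_size)
        left = rng.randint(0, width - crop_size)
    else:
        # Center crop: equal margins on both sides (up to rounding).
        top = (height - crop_size) // 2
        left = (width - crop_size) // 2
    return top, left, top + crop_size, left + crop_size


center = crop_box(1024, 1024, 256, random_crop=False)
rand = crop_box(1024, 1024, 256, random_crop=True, rng=random.Random(0))
```

A random crop is the usual choice for training, while a center crop gives reproducible evaluation patches.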

terratorch.datasets.pastis #

PASTIS #

Bases: NonGeoDataset

PyTorch Dataset class to load samples from the PASTIS dataset, for semantic and panoptic segmentation.

__init__(data_root, norm=True, target='semantic', folds=None, reference_date='2018-09-01', date_interval=(-200, 600), class_mapping=None, transform=None, truncate_image=None, pad_image=None, satellites=['S2']) #

Parameters:

Name Type Description Default
data_root str

Path to the dataset.

required
norm bool

If true, images are standardised using pre-computed channel-wise means and standard deviations.

True
reference_date str

Reference date in 'YYYY-MM-DD' format, relative to which all observation dates are expressed. Along with the image time series and the target tensor, this dataloader yields the sequence of observation dates (as the number of days since the reference date). This sequence of dates is used, for instance, for positional encoding in attention-based approaches.

'2018-09-01'
target str

'semantic' or 'instance'. Defines which type of target is returned by the dataloader.

  • If 'semantic', the target tensor contains the class of each pixel.
  • If 'instance', the target tensor is the concatenation of several signals needed to train the Parcel-as-Points module:
    • the centerness heatmap,
    • the instance ids,
    • the Voronoi partitioning of the patch with regard to the parcels' centers,
    • the (height, width) size of each parcel,
    • the semantic label of each parcel,
    • the semantic label of each pixel.

'semantic'
folds list

List of ints specifying which of the 5 official folds to load. By default (when None is specified), all folds are loaded.

None
class_mapping dict

A dictionary to define a mapping between the default 18 class nomenclature and another class grouping. If not provided, the default class mapping is used.

None
transform callable

A transform to apply to the loaded data (images, dates, and masks). By default, no transformation is applied.

None
truncate_image int

Truncate the time dimension of the image to a specified number of timesteps. If None, no truncation is performed.

None
pad_image int

Pad the time dimension of the image to a specified number of timesteps. If None, no padding is applied.

None
satellites list

Defines the satellites to use. If you are using PASTIS-R, you have access to Sentinel-2 imagery and Sentinel-1 observations in Ascending and Descending orbits, respectively S2, S1A, and S1D. For example, use satellites=['S2', 'S1A'] for Sentinel-2 + Sentinel-1 ascending time series, or satellites=['S2', 'S1A', 'S1D'] to retrieve all time series. If you are using PASTIS, only S2 observations are available.

['S2']
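The reference_date description says observation dates are expressed as days since the reference date. A small standard-library sketch of that conversion is shown below; `days_since` is a hypothetical helper and the observation dates are made up for the example.

```python
from datetime import date


def days_since(reference_date, observation_dates):
    """Express ISO 'YYYY-MM-DD' dates as day offsets from reference_date."""
    ref = date.fromisoformat(reference_date)
    return [(date.fromisoformat(d) - ref).days for d in observation_dates]


offsets = days_since(
    "2018-09-01",
    ["2018-08-22", "2018-09-01", "2018-10-01", "2019-09-01"],
)
```

Dates before the reference date produce negative offsets, which is consistent with the negative lower bound in the default date_interval.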

terratorch.datasets.sen1floods11 #

Sen1Floods11NonGeo #

Bases: NonGeoDataset

NonGeo dataset implementation for sen1floods11.

__init__(data_root, split='train', bands=BAND_SETS['all'], transform=None, constant_scale=0.0001, no_data_replace=0, no_label_replace=-1, use_metadata=False) #

Constructor

Parameters:

Name Type Description Default
data_root str

Path to the data root directory.

required
split str

one of 'train', 'val' or 'test'.

'train'
bands list[str]

Bands that should be output by the dataset. Defaults to all bands.

BAND_SETS['all']
transform Compose | None

Albumentations transform to be applied. Should end with ToTensorV2(). Defaults to None, which applies ToTensorV2().

None
constant_scale float

Factor to multiply image values by. Defaults to 0.0001.

0.0001
no_data_replace float | None

Replace nan values in input images with this value. If None, does no replacement. Defaults to 0.

0
no_label_replace int | None

Replace nan values in label with this value. If None, does no replacement. Defaults to -1.

-1
use_metadata bool

whether to return metadata info (time and location).

False

plot(sample, suptitle=None) #

Plot a sample from the dataset.

Parameters:

Name Type Description Default
sample dict[str, Tensor]

a sample returned by :meth:__getitem__

required
suptitle str | None

optional string to use as a suptitle

None

Returns:

Type Description
Figure

a matplotlib Figure with the rendered sample

terratorch.datasets.sen4agrinet #

Sen4AgriNet #

Bases: NonGeoDataset

__init__(data_root, bands=None, scenario='random', split='train', transform=None, truncate_image=4, pad_image=4, spatial_interpolate_and_stack_temporally=True, seed=42) #

PyTorch Dataset class to load samples from the Sen4AgriNet dataset, supporting multiple scenarios for splitting the data.

Parameters:

Name Type Description Default
data_root str

Root directory of the dataset.

required
bands list of str

List of band names to load. Defaults to all available bands.

None
scenario str

Defines the splitting scenario to use. Options are:

  • 'random': Random split of the data.
  • 'spatial': Split by geographical regions (Catalonia and France).
  • 'spatio-temporal': Split by region and year (France 2019 and Catalonia 2020).

'random'
split str

Specifies the dataset split. Options are 'train', 'val', or 'test'.

'train'
transform Compose

Albumentations transformations to apply to the data.

None
truncate_image int

Number of timesteps to truncate the time dimension of the image. If None, no truncation is applied. Default is 4.

4
pad_image int

Number of timesteps to pad the time dimension of the image. If None, no padding is applied. Default is 4.

4
spatial_interpolate_and_stack_temporally bool

Whether to interpolate bands and concatenate them over time

True
seed int

Random seed used for data splitting.

42
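For the 'random' scenario, the seed parameter makes the split reproducible. The sketch below shows the general technique of a seeded shuffle followed by slicing; the 60/20/20 proportions, the patch names, and the `random_split` helper are illustrative assumptions, not the dataset's actual split logic.

```python
import random


def random_split(ids, seed, fractions=(0.6, 0.2, 0.2)):
    """Shuffle ids deterministically and cut them into train/val/test.

    The 60/20/20 proportions are an illustrative assumption.
    """
    shuffled = sorted(ids)  # fixed starting order, independent of input order
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * fractions[0])
    n_val = int(len(shuffled) * fractions[1])
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }


patches = [f"patch_{i:02d}" for i in range(10)]
first = random_split(patches, seed=42)
second = random_split(patches, seed=42)
```

Because the shuffle uses its own `random.Random(seed)` instance, repeated calls with the same seed return identical splits and do not disturb the global random state.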

terratorch.datasets.sen4map #

Sen4MapDatasetMonthlyComposites #

Bases: Dataset

Sen4Map Dataset for Monthly Composites.

Dataset intended for land-cover and crop classification tasks based on monthly composites derived from multi-temporal satellite data stored in HDF5 files.

Dataset Format:

  • HDF5 files containing multi-temporal acquisitions with spectral bands (e.g., B2, B3, …, B12)
  • Composite images computed as the median across available acquisitions for each month.
  • Classification labels provided via HDF5 attributes (e.g., 'lc1') with mappings defined for:
    • Land-cover: using land_cover_classification_map
    • Crops: using crop_classification_map

Dataset Features:

  • Supports two classification tasks: "land-cover" (default) and "crops".
  • Pre-processing options include center cropping, reverse tiling, and resizing.
  • Option to save the HDF5 keys for later filtering.
  • Input channel selection via a mapping between available bands and input bands.

__init__(h5py_file_object, h5data_keys=None, crop_size=None, dataset_bands=None, input_bands=None, resize=False, resize_to=[224, 224], resize_interpolation=InterpolationMode.BILINEAR, resize_antialiasing=True, reverse_tile=False, reverse_tile_size=3, save_keys_path=None, classification_map='land-cover') #

Initialize a new instance of Sen4MapDatasetMonthlyComposites.

This dataset loads data from an HDF5 file object containing multi-temporal satellite data and computes monthly composite images by aggregating acquisitions (via median).
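The monthly compositing step can be illustrated for a single pixel and band. This is a sketch of median aggregation in general, assuming acquisitions are tagged with a month number; the sample values and the `monthly_composites` helper are hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical acquisitions for one pixel and one band: (month, reflectance).
acquisitions = [(1, 0.10), (1, 0.14), (1, 0.50), (2, 0.20), (2, 0.24)]


def monthly_composites(acquisitions):
    """Return {month: median value} over the acquisitions in each month."""
    by_month = defaultdict(list)
    for month, value in acquisitions:
        by_month[month].append(value)
    return {month: median(values) for month, values in sorted(by_month.items())}


composites = monthly_composites(acquisitions)
```

The median is less sensitive than the mean to a single outlying acquisition, such as the 0.50 value in month 1, which could stand for a cloudy observation.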

Parameters:

Name Type Description Default
h5py_file_object File

An open h5py.File object containing the dataset.

required
h5data_keys

Optional list of keys to select a subset of data samples from the HDF5 file. If None, all keys are used.

None
crop_size None | int

Optional integer specifying the square crop size for the output image.

None
dataset_bands list[HLSBands | int] | None

Optional list of bands available in the dataset.

None
input_bands list[HLSBands | int] | None

Optional list of bands to be used as input channels. Must be provided along with dataset_bands.

None
resize

Boolean flag indicating whether the image should be resized. Default is False.

False
resize_to

Target dimensions [height, width] for resizing. Default is [224, 224].

[224, 224]
resize_interpolation

Interpolation mode used for resizing. Default is InterpolationMode.BILINEAR.

BILINEAR
resize_antialiasing

Boolean flag to apply antialiasing during resizing. Default is True.

True
reverse_tile

Boolean flag indicating whether to apply reverse tiling to the image. Default is False.

False
reverse_tile_size

Kernel size for the reverse tiling operation. Must be an odd number >= 3. Default is 3.

3
save_keys_path

Optional file path to save the list of dataset keys.

None
classification_map

String specifying the classification mapping to use ("land-cover" or "crops"). Default is "land-cover".

'land-cover'

Raises:

Type Description
ValueError

If input_bands is provided without specifying dataset_bands.

ValueError

If an invalid classification_map is provided.
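The relationship between dataset_bands and input_bands amounts to looking up channel positions. A minimal sketch follows, using plain strings where the real API accepts HLSBands members or integers; `band_indices` is a hypothetical helper that also mirrors the documented ValueError.

```python
def band_indices(dataset_bands=None, input_bands=None):
    """Return channel positions of input_bands within dataset_bands, or None."""
    if input_bands is not None and dataset_bands is None:
        raise ValueError("input_bands requires dataset_bands")
    if input_bands is None:
        return None  # no selection requested: keep every channel
    return [dataset_bands.index(band) for band in input_bands]


# Plain strings stand in for HLSBands members here.
available = ["BLUE", "GREEN", "RED", "NIR_NARROW", "SWIR_1", "SWIR_2"]
selected = band_indices(available, ["RED", "GREEN", "BLUE"])
```

The returned indices can then be used to pick and reorder channels from each loaded image.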

reverse_tiling_pytorch(img_tensor, kernel_size=3) #

Upscales an image so that every pixel is expanded into kernel_size*kernel_size pixels. Used to test whether the benefit of resizing images to the pre-trained size comes from the bilinearly interpolated pixels, or whether the same benefit would be realized with no interpolated pixels.
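The pixel expansion described here can be sketched on a small 2-D grid in plain Python. This illustrates the idea only and is not the method's tensor-based implementation; `reverse_tile` is a hypothetical helper.

```python
def reverse_tile(image, kernel_size=3):
    """Repeat each pixel of a 2-D grid into a kernel_size x kernel_size block."""
    out = []
    for row in image:
        # Repeat each value horizontally, then repeat the widened row vertically.
        wide = [value for value in row for _ in range(kernel_size)]
        out.extend(list(wide) for _ in range(kernel_size))
    return out


tiled = reverse_tile([[1, 2], [3, 4]], kernel_size=3)
```

A 2x2 input becomes a 6x6 output in which every value is one of the original pixel values, so no interpolated values are introduced.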


Last update: March 23, 2025