Generic Datasets
terratorch.datasets.generic_pixel_wise_dataset
Module containing generic dataset classes
GenericNonGeoPixelwiseRegressionDataset
Bases: GenericPixelWiseDataset
GenericNonGeoPixelwiseRegressionDataset
__init__(data_root, label_data_root=None, image_grep='*', label_grep='*', split=None, ignore_split_file_extensions=True, allow_substring_split_file=True, rgb_indices=None, dataset_bands=None, output_bands=None, constant_scale=1, transform=None, no_data_replace=None, no_label_replace=None, expand_temporal_dimension=False, reduce_zero_label=False)
Constructor
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_root | Path | Path to data root directory | required |
label_data_root | Path | Path to data root directory with labels. If not specified, the image data root is used. | None |
image_grep | str | Glob pattern appended to data_root to find input images. Defaults to "*". | '*' |
label_grep | str | Glob pattern appended to data_root to find ground truth masks. Defaults to "*". | '*' |
split | Path | Path to a file listing the files to be used for this split. The file should contain newline-separated prefixes of the desired file names. Files will be searched using glob with the form Path(data_root).glob(prefix + [image or label grep]) | None |
ignore_split_file_extensions | bool | Whether to disregard extensions when using the split file to determine which files to include in the dataset. E.g. necessary for Eurosat, since the split files specify ".jpg" but files are actually ".tif". Defaults to True. | True |
allow_substring_split_file | bool | Whether the split file contains substrings that must be present in file names for them to be included (as in mmsegmentation), or exact matches (e.g. eurosat). Defaults to True. | True |
rgb_indices | list[str] | Indices of RGB channels. If None, [0, 1, 2] is used. | None |
dataset_bands | list[HLSBands \| int] \| None | Bands present in the dataset. | None |
output_bands | list[HLSBands \| int] \| None | Bands that should be output by the dataset. | None |
constant_scale | float | Factor to multiply image values by. Defaults to 1. | 1 |
transform | Compose \| None | Albumentations transform to be applied. Should end with ToTensorV2(). If used through the generic_data_module, should not include normalization. Not supported for multi-temporal data. Defaults to None, which simply applies ToTensorV2(). | None |
no_data_replace | float \| None | Replace NaN values in input images with this value. If None, no replacement is done. Defaults to None. | None |
no_label_replace | int \| None | Replace NaN values in labels with this value. If None, no replacement is done. Defaults to None. | None |
expand_temporal_dimension | bool | Go from shape (time*channels, h, w) to (channels, time, h, w). Defaults to False. | False |
reduce_zero_label | bool | Subtract 1 from all labels. Useful when labels start from 1 instead of the expected 0. Defaults to False. | False |
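The split-file lookup described for the split parameter can be illustrated with the standard library alone. This is a minimal sketch, not the library's implementation: the helper name files_for_split, the tile file names, and train.txt are hypothetical, and only the Path(data_root).glob(prefix + grep) form is taken from the description above.

```python
import tempfile
from pathlib import Path


def files_for_split(data_root: Path, split_file: Path, grep: str = "*") -> list[Path]:
    """Collect files matching any prefix listed in the split file (hypothetical helper)."""
    prefixes = [line.strip() for line in split_file.read_text().splitlines() if line.strip()]
    matched: set[Path] = set()
    for prefix in prefixes:
        # Same form as in the parameter description: Path(data_root).glob(prefix + grep)
        matched.update(data_root.glob(prefix + grep))
    return sorted(matched)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Three made-up image tiles; only two of them are listed in the split file.
    for name in ["tile_001_img.tif", "tile_002_img.tif", "tile_003_img.tif"]:
        (root / name).touch()
    split_file = root / "train.txt"
    split_file.write_text("tile_001\ntile_003\n")

    selected_names = [p.name for p in files_for_split(root, split_file, grep="*_img.tif")]
    print(selected_names)
```

Because each prefix is concatenated with the grep pattern, a pattern such as "*_img.tif" keeps label files with a different suffix out of the image list even when they share the same prefix.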
plot(sample, suptitle=None)
Plot a sample from the dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sample | dict[str, Tensor] | a sample returned by :meth: | required |
suptitle | str \| None | optional string to use as a suptitle | None |
Returns:
Type | Description |
---|---|
Figure | a matplotlib Figure with the rendered sample |
.. versionadded:: 0.2
GenericNonGeoSegmentationDataset
Bases: GenericPixelWiseDataset
GenericNonGeoSegmentationDataset
__init__(data_root, num_classes, label_data_root=None, image_grep='*', label_grep='*', split=None, ignore_split_file_extensions=True, allow_substring_split_file=True, rgb_indices=None, dataset_bands=None, output_bands=None, class_names=None, constant_scale=1, transform=None, no_data_replace=None, no_label_replace=None, expand_temporal_dimension=False, reduce_zero_label=False)
Constructor
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_root | Path | Path to data root directory | required |
num_classes | int | Number of classes in the dataset | required |
label_data_root | Path | Path to data root directory with labels. If not specified, the image data root is used. | None |
image_grep | str | Glob pattern appended to data_root to find input images. Defaults to "*". | '*' |
label_grep | str | Glob pattern appended to data_root to find ground truth masks. Defaults to "*". | '*' |
split | Path | Path to a file listing the files to be used for this split. The file should contain newline-separated prefixes of the desired file names. Files will be searched using glob with the form Path(data_root).glob(prefix + [image or label grep]) | None |
ignore_split_file_extensions | bool | Whether to disregard extensions when using the split file to determine which files to include in the dataset. E.g. necessary for Eurosat, since the split files specify ".jpg" but files are actually ".tif". Defaults to True. | True |
allow_substring_split_file | bool | Whether the split file contains substrings that must be present in file names for them to be included (as in mmsegmentation), or exact matches (e.g. eurosat). Defaults to True. | True |
rgb_indices | list[str] | Indices of RGB channels. If None, [0, 1, 2] is used. | None |
dataset_bands | list[HLSBands \| int] \| None | Bands present in the dataset. | None |
output_bands | list[HLSBands \| int] \| None | Bands that should be output by the dataset. | None |
class_names | list[str] | Class names. Defaults to None. | None |
constant_scale | float | Factor to multiply image values by. Defaults to 1. | 1 |
transform | Compose \| None | Albumentations transform to be applied. Should end with ToTensorV2(). If used through the generic_data_module, should not include normalization. Not supported for multi-temporal data. Defaults to None, which simply applies ToTensorV2(). | None |
no_data_replace | float \| None | Replace NaN values in input images with this value. If None, no replacement is done. Defaults to None. | None |
no_label_replace | int \| None | Replace NaN values in labels with this value. If None, no replacement is done. Defaults to None. | None |
expand_temporal_dimension | bool | Go from shape (time*channels, h, w) to (channels, time, h, w). Defaults to False. | False |
reduce_zero_label | bool | Subtract 1 from all labels. Useful when labels start from 1 instead of the expected 0. Defaults to False. | False |
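The effect of no_label_replace and reduce_zero_label on label values can be shown on a flat list of numbers. This is only an illustrative sketch: the helper names replace_nan and shift_labels are hypothetical, the two steps are shown independently, and nothing here makes a claim about the order in which the dataset applies them.

```python
import math


def replace_nan(values, replacement=None):
    """Return a copy with NaN entries swapped for `replacement`; None leaves values unchanged."""
    if replacement is None:
        return list(values)
    return [replacement if math.isnan(v) else v for v in values]


def shift_labels(labels):
    """Subtract 1 from every label so that classes numbered from 1 start at 0."""
    return [label - 1 for label in labels]


# NaN marks a pixel without a label; here it is replaced by -1.
raw_labels = [1.0, 2.0, float("nan"), 3.0]
filled = replace_nan(raw_labels, replacement=-1)

# Labels numbered 1..3 become 0..2.
shifted = shift_labels([1, 2, 3])

print(filled)
print(shifted)
```

Note that under a plain subtraction a label that is already 0 becomes -1, so the shift is only appropriate when 0 is not a valid class in the original masks.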
plot(sample, suptitle=None)
Plot a sample from the dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sample | dict[str, Tensor] | a sample returned by :meth: | required |
suptitle | str \| None | optional string to use as a suptitle | None |
Returns:
Type | Description |
---|---|
Figure | a matplotlib Figure with the rendered sample |
.. versionadded:: 0.2
GenericPixelWiseDataset
Bases: NonGeoDataset, ABC
This is a generic dataset class to be used for instantiating datasets from arguments. Ideally, one would create a dataset class specific to a dataset.
__init__(data_root, label_data_root=None, image_grep='*', label_grep='*', split=None, ignore_split_file_extensions=True, allow_substring_split_file=True, rgb_indices=None, dataset_bands=None, output_bands=None, constant_scale=1, transform=None, no_data_replace=None, no_label_replace=None, expand_temporal_dimension=False, reduce_zero_label=False)
Constructor
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_root | Path | Path to data root directory | required |
label_data_root | Path | Path to data root directory with labels. If not specified, the image data root is used. | None |
image_grep | str | Glob pattern appended to data_root to find input images. Defaults to "*". | '*' |
label_grep | str | Glob pattern appended to data_root to find ground truth masks. Defaults to "*". | '*' |
split | Path | Path to a file listing the files to be used for this split. The file should contain newline-separated prefixes of the desired file names. Files will be searched using glob with the form Path(data_root).glob(prefix + [image or label grep]) | None |
ignore_split_file_extensions | bool | Whether to disregard extensions when using the split file to determine which files to include in the dataset. E.g. necessary for Eurosat, since the split files specify ".jpg" but files are actually ".tif". Defaults to True. | True |
allow_substring_split_file | bool | Whether the split file contains substrings that must be present in file names for them to be included (as in mmsegmentation), or exact matches (e.g. eurosat). Defaults to True. | True |
rgb_indices | list[str] | Indices of RGB channels. If None, [0, 1, 2] is used. | None |
dataset_bands | list[HLSBands \| int \| tuple[int, int] \| str] \| None | Bands present in the dataset. This parameter names input channels (bands) using HLSBands, ints, int ranges, or strings, so that they can then be referred to by output_bands. Defaults to None. | None |
output_bands | list[HLSBands \| int \| tuple[int, int] \| str] \| None | Bands that should be output by the dataset, as named by dataset_bands. | None |
constant_scale | float | Factor to multiply image values by. Defaults to 1. | 1 |
transform | Compose \| None | Albumentations transform to be applied. Should end with ToTensorV2(). If used through the generic_data_module, should not include normalization. Not supported for multi-temporal data. Defaults to None, which simply applies ToTensorV2(). | None |
no_data_replace | float \| None | Replace NaN values in input images with this value. If None, no replacement is done. Defaults to None. | None |
no_label_replace | int \| None | Replace NaN values in labels with this value. If None, no replacement is done. Defaults to None. | None |
expand_temporal_dimension | bool | Go from shape (time*channels, h, w) to (channels, time, h, w). Defaults to False. | False |
reduce_zero_label | bool | Subtract 1 from all labels. Useful when labels start from 1 instead of the expected 0. Defaults to False. | False |
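The relationship between dataset_bands and output_bands amounts to a lookup: dataset_bands gives each input channel a name, and output_bands picks channels by those names. The sketch below uses plain strings as band names and a hypothetical helper band_indices; it does not cover HLSBands members or int ranges and is not the library's own code.

```python
def band_indices(dataset_bands, output_bands):
    """Map each requested output band to its channel position in dataset_bands."""
    missing = [band for band in output_bands if band not in dataset_bands]
    if missing:
        raise ValueError(f"Bands not present in dataset_bands: {missing}")
    return [dataset_bands.index(band) for band in output_bands]


# Made-up band names: six channels on disk, three requested in a different order.
dataset_bands = ["BLUE", "GREEN", "RED", "NIR_NARROW", "SWIR_1", "SWIR_2"]
output_bands = ["RED", "GREEN", "BLUE"]
indices = band_indices(dataset_bands, output_bands)

# One pixel with one value per channel, in dataset_bands order.
pixel = [0.11, 0.22, 0.33, 0.44, 0.55, 0.66]
selected = [pixel[i] for i in indices]

print(indices)
print(selected)
```

Requesting a band that was never named in dataset_bands raises an error in this sketch, which mirrors the idea that output_bands can only refer to channels that dataset_bands has identified.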
terratorch.datasets.generic_scalar_label_dataset
Module containing generic dataset classes
GenericNonGeoClassificationDataset
Bases: GenericScalarLabelDataset
GenericNonGeoClassificationDataset
__init__(data_root, num_classes, split=None, ignore_split_file_extensions=True, allow_substring_split_file=True, rgb_indices=None, dataset_bands=None, output_bands=None, class_names=None, constant_scale=1, transform=None, no_data_replace=0, expand_temporal_dimension=False)
A generic Non-Geo dataset for classification.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_root | Path | Path to data root directory | required |
num_classes | int | Number of classes in the dataset | required |
split | Path | Path to a file listing the files to be used for this split. The file should contain newline-separated prefixes of the desired file names. Files will be searched using glob with the form Path(data_root).glob(prefix + [image or label grep]) | None |
ignore_split_file_extensions | bool | Whether to disregard extensions when using the split file to determine which files to include in the dataset. E.g. necessary for Eurosat, since the split files specify ".jpg" but files are actually ".tif". Defaults to True. | True |
allow_substring_split_file | bool | Whether the split file contains substrings that must be present in file names for them to be included (as in mmsegmentation), or exact matches (e.g. eurosat). Defaults to True. | True |
rgb_indices | list[str] | Indices of RGB channels. If None, [0, 1, 2] is used. | None |
dataset_bands | list[HLSBands \| int] \| None | Bands present in the dataset. | None |
output_bands | list[HLSBands \| int] \| None | Bands that should be output by the dataset. | None |
class_names | list[str] | Class names. Defaults to None. | None |
constant_scale | float | Factor to multiply image values by. Defaults to 1. | 1 |
transform | Compose \| None | Albumentations transform to be applied. Should end with ToTensorV2(). If used through the generic_data_module, should not include normalization. Not supported for multi-temporal data. Defaults to None, which simply applies ToTensorV2(). | None |
no_data_replace | float | Replace NaN values in input images with this value. Defaults to 0. | 0 |
expand_temporal_dimension | bool | Go from shape (time*channels, h, w) to (channels, time, h, w). Defaults to False. | False |
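The reshaping performed by expand_temporal_dimension, from (time*channels, h, w) to (channels, time, h, w), can be sketched with nested lists. The sketch assumes that the stacked first axis is time-major, meaning all channels of the first time step come first, followed by all channels of the second, and so on; that ordering is an assumption made for illustration, and the function here is a hypothetical stand-in rather than the library's code.

```python
def expand_temporal_dimension(planes, channels):
    """Regroup a flat stack of 2D planes into [channel][time] order.

    Assumes the stack is time-major: index = t * channels + c.
    """
    if len(planes) % channels != 0:
        raise ValueError("Number of planes must be a multiple of the channel count")
    timesteps = len(planes) // channels
    return [
        [planes[t * channels + c] for t in range(timesteps)]
        for c in range(channels)
    ]


# Two time steps of three channels; each plane is a 1x1 "image" tagged with its origin.
planes = [[[f"t{t}c{c}"]] for t in range(2) for c in range(3)]
expanded = expand_temporal_dimension(planes, channels=3)

# Outer axis is now channels (3), inner axis is time (2).
print(len(expanded), len(expanded[0]))
print(expanded[1][0], expanded[2][1])
```

If a dataset stores its stack channel-major instead, the index expression would need to change to c * timesteps + t, so it is worth checking how the source files are laid out before enabling the option.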
GenericScalarLabelDataset
Bases: NonGeoDataset, ImageFolder, ABC
This is a generic dataset class to be used for instantiating datasets from arguments. Ideally, one would create a dataset class specific to a dataset.
__init__(data_root, split=None, ignore_split_file_extensions=True, allow_substring_split_file=True, rgb_indices=None, dataset_bands=None, output_bands=None, constant_scale=1, transform=None, no_data_replace=0, expand_temporal_dimension=False)
Constructor
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_root | Path | Path to data root directory | required |
split | Path | Path to a file listing the files to be used for this split. The file should contain newline-separated prefixes of the desired file names. Files will be searched using glob with the form Path(data_root).glob(prefix + [image or label grep]) | None |
ignore_split_file_extensions | bool | Whether to disregard extensions when using the split file to determine which files to include in the dataset. E.g. necessary for Eurosat, since the split files specify ".jpg" but files are actually ".tif". Defaults to True. | True |
allow_substring_split_file | bool | Whether the split file contains substrings that must be present in file names for them to be included (as in mmsegmentation), or exact matches (e.g. eurosat). Defaults to True. | True |
rgb_indices | list[str] | Indices of RGB channels. If None, [0, 1, 2] is used. | None |
dataset_bands | list[HLSBands \| int \| tuple[int, int] \| str] \| None | Bands present in the dataset. This parameter gives identifiers to input channels (bands) so that they can then be referred to by output_bands. Can use the HLSBands enum, ints, int ranges, or strings. Defaults to None. | None |
output_bands | list[HLSBands \| int \| tuple[int, int] \| str] \| None | Bands that should be output by the dataset, as named by dataset_bands. | None |
constant_scale | float | Factor to multiply image values by. Defaults to 1. | 1 |
transform | Compose \| None | Albumentations transform to be applied. Should end with ToTensorV2(). If used through the generic_data_module, should not include normalization. Not supported for multi-temporal data. Defaults to None, which simply applies ToTensorV2(). | None |
no_data_replace | float | Replace NaN values in input images with this value. Defaults to 0. | 0 |
expand_temporal_dimension | bool | Go from shape (time*channels, h, w) to (channels, time, h, w). Defaults to False. | False |
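Since GenericScalarLabelDataset lists ImageFolder among its bases, data_root is presumably expected to follow the usual ImageFolder layout, with one subdirectory per class. The sketch below reproduces that convention using only the standard library, with class indices assigned from the sorted subdirectory names. The helper names, class names, and file names are hypothetical, and how the generic dataset combines this layout with the split file is not shown.

```python
import tempfile
from pathlib import Path


def find_classes(data_root: Path):
    """Return sorted class names and a name-to-index mapping, one class per subdirectory."""
    class_names = sorted(p.name for p in data_root.iterdir() if p.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(class_names)}
    return class_names, class_to_idx


def list_samples(data_root: Path, class_to_idx: dict):
    """Pair every file with the index of the class directory that contains it."""
    samples = []
    for name, idx in class_to_idx.items():
        for path in sorted((data_root / name).iterdir()):
            if path.is_file():
                samples.append((path.name, idx))
    return samples


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Made-up layout: data_root/<class name>/<image file>
    for cls, fname in [("Forest", "a.tif"), ("River", "b.tif"), ("Forest", "c.tif")]:
        (root / cls).mkdir(exist_ok=True)
        (root / cls / fname).touch()

    class_names, class_to_idx = find_classes(root)
    samples = list_samples(root, class_to_idx)

print(class_names)
print(samples)
```

The scalar label of each sample is therefore determined by where the file sits on disk, which is why this class takes no label_data_root or label_grep arguments.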