EncoderDecoderFactory

The Factory

A special factory provided by terratorch is the EncoderDecoderFactory.

This factory leverages the BACKBONE_REGISTRY, DECODER_REGISTRY and NECK_REGISTRY to compose models as encoder + decoder, with optional glue in between provided by the necks. As most current models follow this structure, this is a particularly important factory, allowing great flexibility in combining encoders and decoders from different sources.

The factory allows arguments to be passed to the encoder, decoder and head. Arguments with the prefix backbone_ are routed to the backbone constructor, and decoder_ and head_ work the same way. These prefixed arguments are collected dynamically rather than validated against a fixed signature; any argument that is not consumed by the backbone, decoder or head will raise a ValueError.

Both encoder and decoder may be passed as strings, in which case they will be looked up in the respective registry, or as nn.Modules, in which case they will be used as is. In the latter case, the factory assumes in good faith that the encoder or decoder passed conforms to the expected contract.

Not all decoders will readily accept the raw output of the given encoder. This is where necks come in. Necks are a sequence of operations which are applied to the output of the encoder before it is passed to the decoder. They must be instances of Neck, which is a subclass of nn.Module, meaning they can even define new trainable parameters.

The EncoderDecoderFactory returns a PixelWiseModel or a ScalarOutputModel depending on the task.
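
For example, a segmentation model could be built as follows. This is a minimal sketch: the backbone name, the neck chain and the keyword arguments are illustrative assumptions, not a prescribed configuration.

from terratorch.models.encoder_decoder_factory import EncoderDecoderFactory

factory = EncoderDecoderFactory()

# Prefixed arguments are routed to the respective component:
# backbone_* -> backbone constructor, decoder_* -> decoder, head_* -> head.
model = factory.build_model(
    task="segmentation",
    backbone="prithvi_eo_v2_300",  # assumed to exist in the BACKBONE_REGISTRY
    backbone_pretrained=True,
    necks=[
        {"name": "SelectIndices", "indices": [-1]},  # keep only the final feature map
        {"name": "ReshapeTokensToImage"},            # ViT tokens -> (batch, channels, h, w)
    ],
    decoder="FCNDecoder",
    head_dropout=0.1,  # routed to the SegmentationHead as dropout
    num_classes=2,
)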

terratorch.models.encoder_decoder_factory.EncoderDecoderFactory

Bases: ModelFactory

Source code in terratorch/models/encoder_decoder_factory.py
@MODEL_FACTORY_REGISTRY.register
class EncoderDecoderFactory(ModelFactory):
    def build_model(
        self,
        task: str,
        backbone: str | nn.Module,
        decoder: str | nn.Module,
        num_classes: int | None = None,
        necks: list[dict] | None = None,
        aux_decoders: list[AuxiliaryHead] | None = None,
        rescale: bool = True,  # noqa: FBT002, FBT001
        peft_config: dict | None = None,
        **kwargs,
    ) -> Model:
        """Generic model factory that combines an encoder and decoder, together with a head, for a specific task.

        Further arguments to be passed to the backbone, decoder or head. They should be prefixed with
        `backbone_`, `decoder_` and `head_` respectively.

        Args:
            task (str): Task to be performed. Currently supports "segmentation" and "regression".
            backbone (str, nn.Module): Backbone to be used. If a string, will look for such models in the different
                registries supported (internal terratorch registry, timm, ...). If a torch nn.Module, will use it
                directly. The backbone should have an `out_channels` attribute and its `forward` should return a list[Tensor].
            decoder (Union[str, nn.Module], optional): Decoder to be used for the segmentation model.
                    If a string, will look for such decoders in the different
                    registries supported (internal terratorch registry, smp, ...).
                    If an nn.Module, we expect it to expose a property `decoder.out_channels`.
                    Pixel wise tasks will be concatenated with a Conv2d for the final convolution.
                    Defaults to "FCNDecoder".
            num_classes (int, optional): Number of classes. None for regression tasks.
            necks (list[dict]): nn.Modules to be called in succession on encoder features
                before passing them to the decoder. Should be registered in the NECKS_REGISTRY registry.
                Expects each one to have a key "name" and subsequent keys for arguments, if any.
                Defaults to None, which applies the identity function.
            aux_decoders (list[AuxiliaryHead] | None): List of AuxiliaryHead decoders to be added to the model.
                These decoders take the input from the encoder as well.
            rescale (bool): Whether to apply bilinear interpolation to rescale the model output if its size
                is different from the ground truth. Only applicable to pixel wise models
                (e.g. segmentation, pixel wise regression). Defaults to True.
            peft_config (dict): Configuration options for using [PEFT](https://huggingface.co/docs/peft/index).
                The dictionary should have the following keys:

                - "method": Which PEFT method to use. Should be one implemented in PEFT, a list is available [here](https://huggingface.co/docs/peft/package_reference/peft_types#peft.PeftType).
                - "replace_qkv": String containing a substring of the name of the submodules to replace with QKVSep.
                  This should be used when the qkv matrices are merged together in a single linear layer and the PEFT
                  method should be applied separately to query, key and value matrices (e.g. if LoRA is only desired in
                  Q and V matrices). e.g. If using Prithvi this should be "qkv"
                - "peft_config_kwargs": Dictionary containing keyword arguments which will be passed to [PeftConfig](https://huggingface.co/docs/peft/package_reference/config#peft.PeftConfig)


        Returns:
            nn.Module: Full model with encoder, decoder and head.
        """
        task = task.lower()
        if task not in SUPPORTED_TASKS:
            msg = f"Task {task} not supported. Please choose one of {SUPPORTED_TASKS}"
            raise NotImplementedError(msg)

        backbone_kwargs, kwargs = extract_prefix_keys(kwargs, "backbone_")
        backbone = _get_backbone(backbone, **backbone_kwargs)

        if peft_config is not None:
            if not backbone_kwargs.get("pretrained", False):
                msg = (
                    "You are using PEFT without a pretrained backbone. If you are loading a checkpoint afterwards "
                    "this is probably fine, but if you are training a model check the backbone_pretrained parameter."
                )
                warnings.warn(msg, stacklevel=1)

            backbone = get_peft_backbone(peft_config, backbone)

        try:
            out_channels = backbone.out_channels
        except AttributeError as e:
            msg = "backbone must have out_channels attribute"
            raise AttributeError(msg) from e

        if necks is None:
            necks = []
        neck_list, channel_list = build_neck_list(necks, out_channels)

        # some decoders already include a head
        # for these, we pass the num_classes to them
        # others dont include a head
        # for those, we dont pass num_classes
        decoder_kwargs, kwargs = extract_prefix_keys(kwargs, "decoder_")
        head_kwargs, kwargs = extract_prefix_keys(kwargs, "head_")

        decoder, head_kwargs, decoder_includes_head = _get_decoder_and_head_kwargs(
            decoder, channel_list, decoder_kwargs, head_kwargs, num_classes=num_classes
        )

        if aux_decoders is None:
            _check_all_args_used(kwargs)
            return _build_appropriate_model(
                task,
                backbone,
                decoder,
                head_kwargs,
                necks=neck_list,
                decoder_includes_head=decoder_includes_head,
                rescale=rescale,
            )

        to_be_aux_decoders: list[AuxiliaryHeadWithDecoderWithoutInstantiatedHead] = []
        for aux_decoder in aux_decoders:
            args = aux_decoder.decoder_args if aux_decoder.decoder_args else {}
            aux_decoder_kwargs, args = extract_prefix_keys(args, "decoder_")
            aux_head_kwargs, args = extract_prefix_keys(args, "head_")
            aux_decoder_instance, aux_head_kwargs, aux_decoder_includes_head = _get_decoder_and_head_kwargs(
                aux_decoder.decoder, channel_list, aux_decoder_kwargs, aux_head_kwargs, num_classes=num_classes
            )
            to_be_aux_decoders.append(
                AuxiliaryHeadWithDecoderWithoutInstantiatedHead(aux_decoder.name, aux_decoder_instance, aux_head_kwargs)
            )
            _check_all_args_used(args)

        _check_all_args_used(kwargs)

        return _build_appropriate_model(
            task,
            backbone,
            decoder,
            head_kwargs,
            necks=neck_list,
            decoder_includes_head=decoder_includes_head,
            rescale=rescale,
            auxiliary_heads=to_be_aux_decoders,
        )

build_model(task, backbone, decoder, num_classes=None, necks=None, aux_decoders=None, rescale=True, peft_config=None, **kwargs)

Generic model factory that combines an encoder and decoder, together with a head, for a specific task.

Further arguments to be passed to the backbone, decoder or head. They should be prefixed with backbone_, decoder_ and head_ respectively.

Parameters:
  • task (str) –

    Task to be performed. Currently supports "segmentation" and "regression".

  • backbone (str | Module) –

    Backbone to be used. If a string, will look for such models in the different registries supported (internal terratorch registry, timm, ...). If a torch nn.Module, will use it directly. The backbone should have an out_channels attribute and its forward should return a list[Tensor].

  • decoder (Union[str, Module]) –

    Decoder to be used for the segmentation model. If a string, will look for such decoders in the different registries supported (internal terratorch registry, smp, ...). If an nn.Module, we expect it to expose a property decoder.out_channels. For pixel wise tasks, a head ending in a Conv2d will be appended to the decoder for the final convolution. Defaults to "FCNDecoder".

  • num_classes (int, default: None ) –

    Number of classes. None for regression tasks.

  • necks (list[dict], default: None ) –

    nn.Modules to be called in succession on encoder features before passing them to the decoder. Should be registered in the NECKS_REGISTRY registry. Expects each one to have a key "name" and subsequent keys for arguments, if any. Defaults to None, which applies the identity function.

  • aux_decoders (list[AuxiliaryHead] | None, default: None ) –

    List of AuxiliaryHead decoders to be added to the model. These decoders take the input from the encoder as well.

  • rescale (bool, default: True ) –

    Whether to apply bilinear interpolation to rescale the model output if its size is different from the ground truth. Only applicable to pixel wise models (e.g. segmentation, pixel wise regression). Defaults to True.

  • peft_config (dict, default: None ) –

    Configuration options for using PEFT (https://huggingface.co/docs/peft/index). The dictionary should have the following keys:

    • "method": Which PEFT method to use. Should be one implemented in PEFT; a list is available at https://huggingface.co/docs/peft/package_reference/peft_types#peft.PeftType.
    • "replace_qkv": String containing a substring of the name of the submodules to replace with QKVSep. This should be used when the qkv matrices are merged together in a single linear layer and the PEFT method should be applied separately to the query, key and value matrices (e.g. if LoRA is only desired in the Q and V matrices). For example, when using Prithvi this should be "qkv".
    • "peft_config_kwargs": Dictionary containing keyword arguments which will be passed to PeftConfig (https://huggingface.co/docs/peft/package_reference/config#peft.PeftConfig).
Returns:
  • Model

    nn.Module: Full model with encoder, decoder and head.
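
For illustration only, a peft_config following the keys above might look like the sketch below. The choice of method and the contents of peft_config_kwargs are assumptions made for the example, not defaults.

peft_config = {
    # A PEFT method type implemented in the peft library, e.g. LoRA.
    "method": "LORA",
    # Backbones with a fused qkv linear layer (e.g. Prithvi) need the layer split
    # so the method can target query, key and value separately.
    "replace_qkv": "qkv",
    # Keyword arguments forwarded to the corresponding PeftConfig.
    "peft_config_kwargs": {"r": 8, "lora_alpha": 16},
}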

terratorch.models.pixel_wise_model.PixelWiseModel

terratorch.models.scalar_output_model.ScalarOutputModel

Encoders

To be a valid encoder, an object must be an nn.Module with an additional attribute out_channels, which is a list with the channel dimension of each feature it returns.

Its forward method should return a list of torch.Tensor.
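
A minimal sketch of an encoder satisfying this contract (the architecture and channel sizes are arbitrary, purely for illustration):

import torch
from torch import nn

class ToyEncoder(nn.Module):
    """Toy encoder returning two feature maps and advertising their channel dimensions."""

    def __init__(self):
        super().__init__()
        self.stage1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.stage2 = nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1)
        self.out_channels = [32, 64]  # required: channel dimension of each returned feature

    def forward(self, x: torch.Tensor) -> list[torch.Tensor]:
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        return [f1, f2]  # required: a list of tensors

An instance of such a module can be passed directly as the backbone argument to build_model.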

Necks

Necks are the glue between encoder and decoder. They can perform operations such as selecting elements from the output of the encoder (SelectIndices), reshaping the outputs of ViTs so they are compatible with CNNs (ReshapeTokensToImage), amongst others.

Necks are nn.Modules, with an additional method process_channel_list which informs the EncoderDecoderFactory about how it will alter the channel list provided by encoder.out_channels.
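
As an illustration, a custom neck that keeps only the last feature map (equivalent in spirit to SelectIndices with indices=[-1]) might look like the sketch below. To be referenced by name in a configuration it would additionally need to be registered in the neck registry.

import torch
from terratorch.models.necks import Neck

class KeepLast(Neck):
    """Keep only the final feature map produced by the encoder."""

    def process_channel_list(self, channel_list: list[int]) -> list[int]:
        # Tell the EncoderDecoderFactory that the decoder will only see the last channel dimension.
        return [channel_list[-1]]

    def forward(self, features: list[torch.Tensor]) -> list[torch.Tensor]:
        return [features[-1]]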

terratorch.models.necks.Neck

Bases: ABC, Module

Base class for Neck

A neck must implement self.process_channel_list which returns the new channel list.

Source code in terratorch/models/necks.py
class Neck(ABC, nn.Module):
    """Base class for Neck

    A neck must implement `self.process_channel_list` which returns the new channel list.
    """

    def __init__(self, channel_list: list[int]) -> None:
        super().__init__()
        self.channel_list = channel_list

    @abstractmethod
    def process_channel_list(self, channel_list: list[int]) -> list[int]:
        return channel_list

    @abstractmethod
    def forward(self, channel_list: list[torch.Tensor]) -> list[torch.Tensor]: ...

terratorch.models.necks.SelectIndices

Bases: Neck

Source code in terratorch/models/necks.py
@TERRATORCH_NECK_REGISTRY.register
class SelectIndices(Neck):
    def __init__(self, channel_list: list[int], indices: list[int]):
        """Select indices from the embedding list

        Args:
            indices (list[int]): list of indices to select.
        """
        super().__init__(channel_list)
        self.indices = indices

    def forward(self, features: list[torch.Tensor]) -> list[torch.Tensor]:
        features = [features[i] for i in self.indices]
        return features

    def process_channel_list(self, channel_list: list[int]) -> list[int]:
        channel_list = [channel_list[i] for i in self.indices]
        return channel_list

__init__(channel_list, indices)

Select indices from the embedding list

Parameters:
  • indices (list[int]) –

    list of indices to select.

terratorch.models.necks.PermuteDims

Bases: Neck

Source code in terratorch/models/necks.py
@TERRATORCH_NECK_REGISTRY.register
class PermuteDims(Neck):
    def __init__(self, channel_list: list[int], new_order: list[int]):
        """Permute dimensions of each element in the embedding list

        Args:
            new_order (list[int]): list of indices to be passed to tensor.permute()
        """
        super().__init__(channel_list)
        self.new_order = new_order

    def forward(self, features: list[torch.Tensor]) -> list[torch.Tensor]:
        features = [feat.permute(*self.new_order).contiguous() for feat in features]
        return features

    def process_channel_list(self, channel_list: list[int]) -> list[int]:
        return super().process_channel_list(channel_list)

__init__(channel_list, new_order)

Permute dimensions of each element in the embedding list

Parameters:
  • new_order (list[int]) –

    list of indices to be passed to tensor.permute()

terratorch.models.necks.InterpolateToPyramidal

Bases: Neck

Source code in terratorch/models/necks.py
@TERRATORCH_NECK_REGISTRY.register
class InterpolateToPyramidal(Neck):
    def __init__(self, channel_list: list[int], scale_factor: int = 2, mode: str = "nearest"):
        """Spatially interpolate embeddings so that embedding[i - 1] is scale_factor times larger than embedding[i]

        Useful to make non-pyramidal backbones compatible with hierarchical ones
        Args:
            scale_factor (int): Amount to scale embeddings by each layer. Defaults to 2.
            mode (str): Interpolation mode to be passed to torch.nn.functional.interpolate. Defaults to 'nearest'.
        """
        super().__init__(channel_list)
        self.scale_factor = scale_factor
        self.mode = mode

    def forward(self, features: list[torch.Tensor]) -> list[torch.Tensor]:
        out = []
        scale_exponents = list(range(len(features), 0, -1))
        for x, exponent in zip(features, scale_exponents, strict=True):
            out.append(F.interpolate(x, scale_factor=self.scale_factor**exponent, mode=self.mode))

        return out

    def process_channel_list(self, channel_list: list[int]) -> list[int]:
        return super().process_channel_list(channel_list)

__init__(channel_list, scale_factor=2, mode='nearest')

Spatially interpolate embeddings so that embedding[i - 1] is scale_factor times larger than embedding[i]

Useful to make non-pyramidal backbones compatible with hierarchical ones.

Parameters:
  • scale_factor (int, default: 2 ) –

    Amount to scale embeddings by each layer. Defaults to 2.

  • mode (str, default: 'nearest' ) –

    Interpolation mode to be passed to torch.nn.functional.interpolate. Defaults to 'nearest'.

terratorch.models.necks.MaxpoolToPyramidal

Bases: Neck

Source code in terratorch/models/necks.py
@TERRATORCH_NECK_REGISTRY.register
class MaxpoolToPyramidal(Neck):
    def __init__(self, channel_list: list[int], kernel_size: int = 2):
        """Spatially downsample embeddings so that embedding[i - 1] is scale_factor times smaller than embedding[i]

        Useful to make non-pyramidal backbones compatible with hierarachical ones
        Args:
            kernel_size (int). Base kernel size to use for maxpool. Defaults to 2.
        """
        super().__init__(channel_list)
        self.kernel_size = kernel_size

    def forward(self, features: list[torch.Tensor]) -> list[torch.Tensor]:
        out = []
        scale_exponents = list(range(len(features)))
        for x, exponent in zip(features, scale_exponents, strict=True):
            if exponent == 0:
                out.append(x.clone())
            else:
                out.append(F.max_pool2d(x, kernel_size=self.kernel_size**exponent))

        return out

    def process_channel_list(self, channel_list: list[int]) -> list[int]:
        return super().process_channel_list(channel_list)

__init__(channel_list, kernel_size=2)

Spatially downsample embeddings so that embedding[i] is kernel_size times smaller than embedding[i - 1]

Useful to make non-pyramidal backbones compatible with hierarchical ones.

Parameters:
  • kernel_size (int, default: 2 ) –

    Base kernel size to use for maxpool. Defaults to 2.

terratorch.models.necks.ReshapeTokensToImage

Bases: Neck

Source code in terratorch/models/necks.py
@TERRATORCH_NECK_REGISTRY.register
class ReshapeTokensToImage(Neck):
    def __init__(self, channel_list: list[int], remove_cls_token=True, effective_time_dim: int = 1):  # noqa: FBT002
        """Reshape output of transformer encoder so it can be passed to a conv net.

        Args:
            remove_cls_token (bool, optional): Whether to remove the cls token from the first position.
                Defaults to True.
            effective_time_dim (int, optional): The effective temporal dimension the transformer processes.
                For a ViT, this will be given by `num_frames // tubelet size`. This is used to determine
                the temporal dimension of the embedding, which is concatenated with the embedding dimension.
                For example:
                - A model which processes 1 frame with a tubelet size of 1 has an effective_time_dim of 1.
                    The embedding produced by this model has embedding size embed_dim * 1.
                - A model which processes 3 frames with a tubelet size of 1 has an effective_time_dim of 3.
                    The embedding produced by this model has embedding size embed_dim * 3.
                - A model which processes 12 frames with a tubelet size of 4 has an effective_time_dim of 3.
                    The embedding produced by this model has an embedding size embed_dim * 3.
                Defaults to 1.
        """
        super().__init__(channel_list)
        self.remove_cls_token = remove_cls_token
        self.effective_time_dim = effective_time_dim

    def forward(self, features: list[torch.Tensor]) -> list[torch.Tensor]:
        out = []
        for x in features:
            if self.remove_cls_token:
                x_no_token = x[:, 1:, :]
            else:
                x_no_token = x
            number_of_tokens = x_no_token.shape[1]
            tokens_per_timestep = number_of_tokens // self.effective_time_dim
            h = int(np.sqrt(tokens_per_timestep))
            encoded = rearrange(
                x_no_token,
                "batch (t h w) e -> batch (t e) h w",
                batch=x_no_token.shape[0],
                t=self.effective_time_dim,
                h=h,
            )
            out.append(encoded)
        return out

    def process_channel_list(self, channel_list: list[int]) -> list[int]:
        return super().process_channel_list(channel_list)

__init__(channel_list, remove_cls_token=True, effective_time_dim=1)

Reshape output of transformer encoder so it can be passed to a conv net.

Parameters:
  • remove_cls_token (bool, default: True ) –

    Whether to remove the cls token from the first position. Defaults to True.

  • effective_time_dim (int, default: 1 ) –

    The effective temporal dimension the transformer processes. For a ViT, this will be given by num_frames // tubelet size. This is used to determine the temporal dimension of the embedding, which is concatenated with the embedding dimension. For example:

    • A model which processes 1 frame with a tubelet size of 1 has an effective_time_dim of 1. The embedding produced by this model has embedding size embed_dim * 1.
    • A model which processes 3 frames with a tubelet size of 1 has an effective_time_dim of 3. The embedding produced by this model has embedding size embed_dim * 3.
    • A model which processes 12 frames with a tubelet size of 4 has an effective_time_dim of 3. The embedding produced by this model has an embedding size embed_dim * 3.

    Defaults to 1.

terratorch.models.necks.AddBottleneckLayer

Bases: Neck

Add a layer that reduces the channel dimension of the final embedding by half, and appends the result to the feature list

Useful for compatibility with some smp decoders.

Source code in terratorch/models/necks.py
@TERRATORCH_NECK_REGISTRY.register
class AddBottleneckLayer(Neck):
    """Add a layer that reduces the channel dimension of the final embedding by half, and concatenates it

    Useful for compatibility with some smp decoders.
    """

    def __init__(self, channel_list: list[int]):
        super().__init__(channel_list)
        self.bottleneck = nn.Conv2d(channel_list[-1], channel_list[-1]//2, kernel_size=1)

    def forward(self, features: list[torch.Tensor]) -> list[torch.Tensor]:
        new_embedding = self.bottleneck(features[-1])
        features.append(new_embedding)
        return features

    def process_channel_list(self, channel_list: list[int]) -> list[int]:
        return [*channel_list, channel_list[-1] // 2]

terratorch.models.necks.LearnedInterpolateToPyramidal

Bases: Neck

Use learned convolutions to transform the outputs of a non-pyramidal encoder into pyramidal ones

Always requires exactly 4 embeddings

Source code in terratorch/models/necks.py
@TERRATORCH_NECK_REGISTRY.register
class LearnedInterpolateToPyramidal(Neck):
    """Use learned convolutions to transform the output of a non-pyramidal encoder into pyramidal ones

    Always requires exactly 4 embeddings
    """

    def __init__(self, channel_list: list[int]):
        super().__init__(channel_list)
        if len(channel_list) != 4:
            msg = "This class can only handle exactly 4 input embeddings"
            raise Exception(msg)
        self.fpn1 = nn.Sequential(
            nn.ConvTranspose2d(channel_list[0], channel_list[0] // 2, 2, 2),
            nn.BatchNorm2d(channel_list[0] // 2),
            nn.GELU(),
            nn.ConvTranspose2d(channel_list[0] // 2, channel_list[0] // 4, 2, 2),
        )
        self.fpn2 = nn.Sequential(nn.ConvTranspose2d(channel_list[1], channel_list[1] // 2, 2, 2))
        self.fpn3 = nn.Sequential(nn.Identity())
        self.fpn4 = nn.Sequential(nn.MaxPool2d(kernel_size=2, stride=2))
        self.embedding_dim = [channel_list[0] // 4, channel_list[1] // 2, channel_list[2], channel_list[3]]

    def forward(self, features: list[torch.Tensor]) -> list[torch.Tensor]:
        scaled_inputs = []
        scaled_inputs.append(self.fpn1(features[0]))
        scaled_inputs.append(self.fpn2(features[1]))
        scaled_inputs.append(self.fpn3(features[2]))
        scaled_inputs.append(self.fpn4(features[3]))
        return scaled_inputs

    def process_channel_list(self, channel_list: list[int]) -> list[int]:
        return [channel_list[0] // 4, channel_list[1] // 2, channel_list[2], channel_list[3]]

Decoders

To be a valid decoder, an object must be an nn.Module with an additional attribute out_channels which is an int with the channel dimension of the output.

The first argument to its constructor will be a list of channel dimensions it should expect as input.

Its forward method should accept a list of embeddings.
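
A minimal sketch of a decoder satisfying this contract (illustrative only):

import torch
from torch import nn

class ToyDecoder(nn.Module):
    """Toy decoder that projects the final feature map to a fixed number of channels."""

    def __init__(self, channel_list: list[int], out_channels: int = 64):
        super().__init__()
        # The first constructor argument is the list of channel dimensions the decoder receives.
        self.conv = nn.Conv2d(channel_list[-1], out_channels, kernel_size=3, padding=1)
        self.out_channels = out_channels  # required: channel dimension of the output

    def forward(self, features: list[torch.Tensor]) -> torch.Tensor:
        return self.conv(features[-1])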

Heads

Most decoders require a final head to be added for a specific task (e.g. semantic segmentation vs pixel wise regression).

Registries producing decoders that don't require a head must expose the attribute includes_head=True so that a head is not added. Decoders passed as nn.Modules which don't require a head must expose the same attribute themselves.
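
For example, a decoder passed as an nn.Module that already produces the final per-class output could signal this as in the sketch below (the architecture is illustrative):

import torch
from torch import nn

class ToyDecoderWithHead(nn.Module):
    """Toy decoder whose forward already returns per-class logits, so no extra head is needed."""

    includes_head = True  # tells the factory not to append a SegmentationHead/RegressionHead

    def __init__(self, channel_list: list[int], num_classes: int):
        super().__init__()
        self.project = nn.Conv2d(channel_list[-1], num_classes, kernel_size=1)
        self.out_channels = num_classes

    def forward(self, features: list[torch.Tensor]) -> torch.Tensor:
        return self.project(features[-1])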

terratorch.models.heads.classification_head.ClassificationHead

Bases: Module

Classification head

Source code in terratorch/models/heads/classification_head.py
class ClassificationHead(nn.Module):
    """Classification head"""

    # how to allow cls token?
    def __init__(
        self,
        in_dim: int,
        num_classes: int,
        dim_list: list[int] | None = None,
        dropout: float = 0,
        linear_after_pool: bool = False,
    ) -> None:
        """Constructor

        Args:
            in_dim (int): Input dimensionality
            num_classes (int): Number of output classes
            dim_list (list[int] | None, optional):  List with number of dimensions for each Linear
                layer to be created. Defaults to None.
            dropout (float, optional): Dropout value to apply. Defaults to 0.
            linear_after_pool (bool, optional): Apply pooling first, then apply the linear layer. Defaults to False
        """
        super().__init__()
        self.num_classes = num_classes
        self.linear_after_pool = linear_after_pool
        if dim_list is None:
            pre_head = nn.Identity()
        else:

            def block(in_dim, out_dim):
                return nn.Sequential(nn.Linear(in_features=in_dim, out_features=out_dim), nn.ReLU())

            dim_list = [in_dim, *dim_list]
            pre_head = nn.Sequential(*[block(dim_list[i], dim_list[i + 1]) for i in range(len(dim_list) - 1)])
            in_dim = dim_list[-1]
        dropout = nn.Identity() if dropout == 0 else nn.Dropout(dropout)
        self.head = nn.Sequential(
            pre_head,
            dropout,
            nn.Linear(in_features=in_dim, out_features=num_classes),
        )

    def forward(self, x: Tensor):
        x = x.reshape(x.shape[0], x.shape[1], -1).permute(0, 2, 1)

        if self.linear_after_pool:
            x = x.mean(axis=1)
            out = self.head(x)
        else:
            x = self.head(x)
            out = x.mean(axis=1)
        return out

__init__(in_dim, num_classes, dim_list=None, dropout=0, linear_after_pool=False)

Constructor

Parameters:
  • in_dim (int) –

    Input dimensionality

  • num_classes (int) –

    Number of output classes

  • dim_list (list[int] | None, default: None ) –

    List with number of dimensions for each Linear layer to be created. Defaults to None.

  • dropout (float, default: 0 ) –

    Dropout value to apply. Defaults to 0.

  • linear_after_pool (bool, default: False ) –

    Apply pooling first, then apply the linear layer. Defaults to False

terratorch.models.heads.regression_head.RegressionHead

Bases: Module

Regression head

Source code in terratorch/models/heads/regression_head.py
class RegressionHead(nn.Module):
    """Regression head"""

    def __init__(
        self,
        in_channels: int,
        final_act: nn.Module | str | None = None,
        learned_upscale_layers: int = 0,
        channel_list: list[int] | None = None,
        batch_norm: bool = True,
        dropout: float = 0,
    ) -> None:
        """Constructor

        Args:
            in_channels (int): Number of input channels
            final_act (nn.Module | None, optional): Final activation to be applied. Defaults to None.
            learned_upscale_layers (int, optional): Number of Pixelshuffle layers to create. Each upscales 2x.
                Defaults to 0.
            channel_list (list[int] | None, optional): List with number of channels for each Conv
                layer to be created. Defaults to None.
            batch_norm (bool, optional): Whether to apply batch norm. Defaults to True.
            dropout (float, optional): Dropout value to apply. Defaults to 0.

        """
        super().__init__()
        self.learned_upscale_layers = learned_upscale_layers
        self.final_act = final_act if final_act else nn.Identity()
        if isinstance(final_act, str):
            module_name, class_name = final_act.rsplit(".", 1)
            target_class = getattr(importlib.import_module(module_name), class_name)
            self.final_act = target_class()
        pre_layers = []
        if learned_upscale_layers != 0:
            learned_upscale = nn.Sequential(
                *[PixelShuffleUpscale(in_channels) for _ in range(self.learned_upscale_layers)]
            )
            pre_layers.append(learned_upscale)

        if channel_list is None:
            pre_head = nn.Identity()
        else:

            def block(in_channels, out_channels):
                return nn.Sequential(
                    nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=3, padding=1, bias=False),
                    nn.BatchNorm2d(out_channels),
                    nn.ReLU(inplace=True),
                )

            channel_list = [in_channels, *channel_list]
            pre_head = nn.Sequential(
                *[block(channel_list[i], channel_list[i + 1]) for i in range(len(channel_list) - 1)]
            )
            in_channels = channel_list[-1]
            pre_layers.append(pre_head)
        dropout = nn.Dropout2d(dropout)
        final_layer = nn.Conv2d(in_channels=in_channels, out_channels=1, kernel_size=1)
        self.head = nn.Sequential(*[*pre_layers, dropout, final_layer])

    def forward(self, x):
        output = self.head(x)
        return self.final_act(output)

__init__(in_channels, final_act=None, learned_upscale_layers=0, channel_list=None, batch_norm=True, dropout=0)

Constructor

Parameters:
  • in_channels (int) –

    Number of input channels

  • final_act (Module | None, default: None ) –

    Final activation to be applied. Defaults to None.

  • learned_upscale_layers (int, default: 0 ) –

    Number of Pixelshuffle layers to create. Each upscales 2x. Defaults to 0.

  • channel_list (list[int] | None, default: None ) –

    List with number of channels for each Conv layer to be created. Defaults to None.

  • batch_norm (bool, default: True ) –

    Whether to apply batch norm. Defaults to True.

  • dropout (float, default: 0 ) –

    Dropout value to apply. Defaults to 0.

terratorch.models.heads.segmentation_head.SegmentationHead

Bases: Module

Segmentation head

Source code in terratorch/models/heads/segmentation_head.py
class SegmentationHead(nn.Module):
    """Segmentation head"""

    def __init__(
        self, in_channels: int, num_classes: int, channel_list: list[int] | None = None, dropout: float = 0
    ) -> None:
        """Constructor

        Args:
            in_channels (int): Number of input channels
            num_classes (int): Number of output classes
            channel_list (list[int] | None, optional):  List with number of channels for each Conv
                layer to be created. Defaults to None.
            dropout (float, optional): Dropout value to apply. Defaults to 0.
        """
        super().__init__()
        self.num_classes = num_classes
        if channel_list is None:
            pre_head = nn.Identity()
        else:

            def block(in_channels, out_channels):
                return nn.Sequential(
                    nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=3, padding=1), nn.ReLU()
                )

            channel_list = [in_channels, *channel_list]
            pre_head = nn.Sequential(
                *[block(channel_list[i], channel_list[i + 1]) for i in range(len(channel_list) - 1)]
            )
            in_channels = channel_list[-1]
        dropout = nn.Identity() if dropout == 0 else nn.Dropout(dropout)
        self.head = nn.Sequential(
            pre_head,
            dropout,
            nn.Conv2d(
                in_channels=in_channels,
                out_channels=num_classes,
                kernel_size=1,
            ),
        )

    def forward(self, x):
        return self.head(x)

__init__(in_channels, num_classes, channel_list=None, dropout=0)

Constructor

Parameters:
  • in_channels (int) –

    Number of input channels

  • num_classes (int) –

    Number of output classes

  • channel_list (list[int] | None, default: None ) –

    List with number of channels for each Conv layer to be created. Defaults to None.

  • dropout (float, default: 0 ) –

    Dropout value to apply. Defaults to 0.

Decoder compatibilities

Not all encoders and decoders are compatible. Below we include some caveats.

Some decoders expect pyramidal outputs, but some encoders do not produce such outputs (e.g. vanilla ViT models). In this case, the InterpolateToPyramidal, MaxpoolToPyramidal and LearnedInterpolateToPyramidal necks may be particularly useful.
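
For example, a plain ViT backbone could be made to produce the four pyramidal feature maps expected by many such decoders with a neck chain like the one below (the selected indices are an assumption; they depend on the depth of the backbone):

necks = [
    {"name": "SelectIndices", "indices": [5, 11, 17, 23]},  # pick four transformer blocks
    {"name": "ReshapeTokensToImage"},                       # tokens -> (batch, channels, h, w)
    {"name": "LearnedInterpolateToPyramidal"},              # requires exactly 4 embeddings
]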

SMP decoders

Not all decoders are guaranteed to work with all encoders without additional necks. Please check smp documentation to understand the embedding spatial dimensions expected by each decoder.

In particular, smp seems to assume the first feature in the passed feature list has the same spatial resolution as the input, which may not always be true, and may break some decoders.

In addition, some decoders expect the final 2 features to have the same spatial resolution. Adding the AddBottleneckLayer neck will make the features compatible with such decoders.

Some smp decoders require additional parameters, such as decoder_channels. These must be passed through the factory. In the case of decoder_channels, it would be passed as decoder_decoder_channels (the first decoder_ routes the parameter to the decoder, where it is passed as decoder_channels).
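
As a sketch of the routing described above (the decoder name and channel values are assumptions; check the names actually exposed by the smp registry in your installation):

from terratorch.models.encoder_decoder_factory import EncoderDecoderFactory

factory = EncoderDecoderFactory()
model = factory.build_model(
    task="segmentation",
    backbone="prithvi_eo_v2_300",  # assumed backbone name
    necks=[
        {"name": "SelectIndices", "indices": [5, 11, 17, 23]},
        {"name": "ReshapeTokensToImage"},
        {"name": "LearnedInterpolateToPyramidal"},
    ],
    decoder="UnetDecoder",  # assumed name of an smp decoder in the DECODER_REGISTRY
    decoder_decoder_channels=[256, 128, 64, 32],  # routed to the decoder as decoder_channels
    num_classes=2,
)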

MMSegmentation decoders

MMSegmentation decoders are available through the DECODER_REGISTRY.

Warning

MMSegmentation currently requires mmcv==2.1.0. Pre-built wheels for this only exist for torch==2.1.0. In order to use mmseg without building from source, you must downgrade your torch to this version. Install mmseg with:

pip install -U openmim
mim install mmengine
mim install mmcv==2.1.0
pip install regex ftfy mmsegmentation

We provide access to mmseg decoders as an external source of decoders, but are not directly responsible for the maintenance of that library.

Some mmseg decoders require the parameter in_index, which performs the same function as the SelectIndices neck. For pixel wise regression, mmseg decoders should take num_classes=1.
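
A sketch of the two points above (the decoder name is an assumption; mmseg decoders keep their own argument names, which are routed with the decoder_ prefix, and a given decoder may require further arguments passed the same way):

from terratorch.models.encoder_decoder_factory import EncoderDecoderFactory

factory = EncoderDecoderFactory()
model = factory.build_model(
    task="regression",
    backbone="prithvi_eo_v2_300",  # assumed backbone name
    necks=[{"name": "ReshapeTokensToImage"}],
    decoder="FCNHead",             # assumed name of an mmseg decoder in the registry
    decoder_in_index=-1,           # plays the same role as the SelectIndices neck
    decoder_num_classes=1,         # pixel wise regression: a single output channel
)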