Models

baseRNN

A base class for RNN.

class seq2seq.models.baseRNN.BaseRNN(vocab_size, max_len, hidden_size, input_dropout_p, dropout_p, n_layers, rnn_cell)

Applies a multi-layer RNN to an input sequence.

Note: Do not use this class directly; use one of its subclasses.

Parameters:
  • vocab_size (int) – size of the vocabulary
  • max_len (int) – maximum allowed length for the sequence to be processed
  • hidden_size (int) – number of features in the hidden state h
  • input_dropout_p (float) – dropout probability for the input sequence
  • dropout_p (float) – dropout probability for the output sequence
  • n_layers (int) – number of recurrent layers
  • rnn_cell (str) – type of RNN cell (e.g. 'LSTM', 'GRU')
Inputs: *args, **kwargs
  • *args: variable length argument list.
  • **kwargs: arbitrary keyword arguments.
Variables:
  • SYM_MASK – masking symbol
  • SYM_EOS – end-of-sequence symbol
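Because BaseRNN only holds shared configuration and is meant to be subclassed, its pattern can be sketched as an abstract base whose forward() raises NotImplementedError, plus a case-insensitive lookup of the rnn_cell string. This is a stdlib-only sketch under stated assumptions: the class names and the string-to-cell mapping are hypothetical, and a PyTorch version would map to torch.nn.LSTM / torch.nn.GRU rather than to plain strings.

```python
class BaseRNNSketch:
    """Hypothetical stand-in for BaseRNN: stores config and resolves the cell type."""

    # Hypothetical mapping; a real implementation would map to torch.nn.LSTM / torch.nn.GRU.
    _CELLS = {"lstm": "LSTM", "gru": "GRU"}

    def __init__(self, vocab_size, max_len, hidden_size,
                 input_dropout_p, dropout_p, n_layers, rnn_cell):
        self.vocab_size = vocab_size
        self.max_len = max_len
        self.hidden_size = hidden_size
        self.input_dropout_p = input_dropout_p
        self.dropout_p = dropout_p
        self.n_layers = n_layers
        try:
            # Case-insensitive lookup, so 'LSTM', 'lstm' and 'Lstm' are all accepted.
            self.rnn_cell = self._CELLS[rnn_cell.lower()]
        except KeyError:
            raise ValueError("Unsupported RNN cell: {}".format(rnn_cell)) from None

    def forward(self, *args, **kwargs):
        # Abstract: every subclass must provide its own forward().
        raise NotImplementedError()


class TinyEncoder(BaseRNNSketch):
    """Toy subclass (hypothetical) that only demonstrates overriding forward()."""

    def forward(self, tokens):
        # Truncate the sequence to max_len to show the stored config being used.
        return tokens[: self.max_len]
```

For example, `TinyEncoder(100, 5, 16, 0.0, 0.0, 1, "GRU").forward(list(range(1, 8)))` keeps only the first five tokens, while calling `forward()` on the base class fails loudly.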

EncoderRNN

class seq2seq.models.EncoderRNN.EncoderRNN(vocab_size, max_len, hidden_size, input_dropout_p=0, dropout_p=0, n_layers=1, bidirectional=False, rnn_cell='gru', variable_lengths=False, embedding=None, update_embedding=True)

Applies a multi-layer RNN to an input sequence.

Parameters:
  • vocab_size (int) – size of the vocabulary
  • max_len (int) – a maximum allowed length for the sequence to be processed
  • hidden_size (int) – the number of features in the hidden state h
  • input_dropout_p (float, optional) – dropout probability for the input sequence (default: 0)
  • dropout_p (float, optional) – dropout probability for the output sequence (default: 0)
  • n_layers (int, optional) – number of recurrent layers (default: 1)
  • bidirectional (bool, optional) – if True, becomes a bidirectional encoder (default: False)
  • rnn_cell (str, optional) – type of RNN cell (default: gru)
  • variable_lengths (bool, optional) – whether to use a variable-length RNN (default: False)
  • embedding (torch.Tensor, optional) – pre-trained embedding. The tensor must have the same size as the embedding parameter, (vocab_size, hidden_size). If provided, the embedding layer is initialized with this tensor (default: None).
  • update_embedding (bool, optional) – whether the embedding is updated during training (default: True).
Inputs: inputs, input_lengths
  • inputs: list of sequences, whose length is the batch size and within which each sequence is a list of token IDs.
  • input_lengths (list of int, optional): list that contains the lengths of sequences
    in the mini-batch; it must be provided when using a variable-length RNN (default: None)
Outputs: output, hidden
  • output (batch, seq_len, num_directions * hidden_size): tensor containing the encoded features of the input sequence (num_directions is 2 when bidirectional=True, otherwise 1)
  • hidden (num_layers * num_directions, batch, hidden_size): tensor containing the features in the hidden state h
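With variable_lengths=True, the encoder receives a padded batch together with input_lengths. PyTorch's torch.nn.utils.rnn.pack_padded_sequence, which variable-length RNNs commonly rely on, expects lengths in decreasing order by default. A stdlib-only sketch of preparing such a batch; pad_batch and the padding ID 0 are hypothetical choices, not part of this library.

```python
def pad_batch(sequences, pad_id=0):
    """Hypothetical helper: sort a batch by length (longest first) and pad it.

    Returns (padded, lengths, order), where order[i] is the original index of
    row i, so that results can later be restored to the caller's order.
    """
    # sorted() is stable, so equal-length sequences keep their relative order.
    order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]), reverse=True)
    lengths = [len(sequences[i]) for i in order]
    max_len = lengths[0] if lengths else 0
    # Right-pad every sequence with pad_id up to the longest length.
    padded = [list(sequences[i]) + [pad_id] * (max_len - len(sequences[i])) for i in order]
    return padded, lengths, order
```

The padded rows could then be turned into a (batch, seq_len) tensor of token IDs, with lengths passed as input_lengths.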

Examples:

>>> from seq2seq.models import EncoderRNN
>>> encoder = EncoderRNN(input_vocab_size, max_seq_length, hidden_size)
>>> output, hidden = encoder(input_var)

forward(input_var, input_lengths=None)

Applies a multi-layer RNN to an input sequence.

Parameters:
  • input_var (batch, seq_len) – tensor containing the token IDs of the input sequence.
  • input_lengths (list of int, optional) – a list that contains the lengths of sequences in the mini-batch; required when variable_lengths is True (default: None)
Returns: output, hidden
  • output (batch, seq_len, num_directions * hidden_size): tensor containing the encoded features of the input sequence
  • hidden (num_layers * num_directions, batch, hidden_size): tensor containing the features in the hidden state h
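The two return shapes follow from how a batch-first multi-layer RNN stacks its results. A small sketch that computes the expected shapes, assuming the encoder wraps a PyTorch RNN such as torch.nn.GRU built with batch_first=True (for an LSTM cell, hidden would be an (h, c) pair, each with the hidden shape below); encoder_shapes is a hypothetical helper.

```python
def encoder_shapes(batch, seq_len, hidden_size, n_layers=1, bidirectional=False):
    """Expected (output, hidden) shapes, assuming batch-first PyTorch RNN semantics."""
    num_directions = 2 if bidirectional else 1
    # Per-step outputs are batch-first; forward and backward features are
    # concatenated along the last axis.
    output = (batch, seq_len, num_directions * hidden_size)
    # The final hidden state is stacked per layer and direction, and is NOT batch-first.
    hidden = (n_layers * num_directions, batch, hidden_size)
    return output, hidden
```

For a two-layer bidirectional encoder with hidden_size=256, the features doubled to 512 in output while hidden gains extra rows rather than extra features.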

DecoderRNN

class seq2seq.models.DecoderRNN.DecoderRNN(vocab_size, max_len, hidden_size, sos_id, eos_id, n_layers=1, rnn_cell='gru', bidirectional=False, input_dropout_p=0, dropout_p=0, use_attention=False)

Provides functionality for decoding in a seq2seq framework, with an option for attention.

Parameters:
  • vocab_size (int) – size of the vocabulary
  • max_len (int) – a maximum allowed length for the sequence to be processed
  • hidden_size (int) – the number of features in the hidden state h
  • sos_id (int) – index of the start of sentence symbol
  • eos_id (int) – index of the end of sentence symbol
  • n_layers (int, optional) – number of recurrent layers (default: 1)
  • rnn_cell (str, optional) – type of RNN cell (default: gru)
  • bidirectional (bool, optional) – whether the encoder is bidirectional (default: False)
  • input_dropout_p (float, optional) – dropout probability for the input sequence (default: 0)
  • dropout_p (float, optional) – dropout probability for the output sequence (default: 0)
  • use_attention (bool, optional) – flag indicating whether to use the attention mechanism (default: False)
Variables:
  • KEY_ATTN_SCORE (str) – key used to indicate attention weights in ret_dict
  • KEY_LENGTH (str) – key used to indicate a list representing lengths of output sequences in ret_dict
  • KEY_SEQUENCE (str) – key used to indicate a list of sequences in ret_dict
Inputs: inputs, encoder_hidden, encoder_outputs, function, teacher_forcing_ratio
  • inputs (batch, seq_len, input_size): list of sequences, whose length is the batch size and within which each sequence is a list of token IDs. It is used for teacher forcing when provided (default: None).
  • encoder_hidden (num_layers * num_directions, batch_size, hidden_size): tensor containing the features in the hidden state h of the encoder. Used as the initial hidden state of the decoder (default: None).
  • encoder_outputs (batch, seq_len, hidden_size): tensor containing the outputs of the encoder. Used for the attention mechanism (default: None).
  • function (callable): a function used to generate symbols from the RNN hidden state (default: torch.nn.functional.log_softmax).
  • teacher_forcing_ratio (float): the probability that teacher forcing is used. A number is drawn uniformly between 0 and 1 for every decoding token; if it is smaller than the given value, teacher forcing is used (default: 0).
Outputs: decoder_outputs, decoder_hidden, ret_dict
  • decoder_outputs (seq_len, batch, vocab_size): list of tensors with size (batch_size, vocab_size) containing the outputs of the decoding function.
  • decoder_hidden (num_layers * num_directions, batch, hidden_size): tensor containing the last hidden state of the decoder.
  • ret_dict: dictionary containing additional information as follows {KEY_LENGTH : list of integers representing lengths of output sequences, KEY_SEQUENCE : list of sequences, where each sequence is a list of predicted token IDs }.
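The loop below sketches how decoder_outputs, the KEY_LENGTH entry and the KEY_SEQUENCE entry relate, and applies the teacher-forcing rule from the Inputs list with one uniform draw per decoding token. It is a stdlib-only sketch, not DecoderRNN's implementation: step_fn is a hypothetical stand-in for one RNN step plus the output projection, plain lists replace tensors, and "length"/"sequence" are placeholder keys.

```python
import math
import random


def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return [s - log_z for s in scores]


def greedy_decode(step_fn, batch_size, sos_id, eos_id, max_len,
                  targets=None, teacher_forcing_ratio=0.0, rng=random):
    """Hypothetical greedy decoding loop with optional per-token teacher forcing.

    step_fn(prev_tokens, step) returns one score list per example (batch x vocab).
    targets[b][t] is the gold token for example b at step t; when teacher forcing
    fires, it becomes the input for step t + 1 instead of the prediction.
    """
    prev = [sos_id] * batch_size
    lengths = [max_len] * batch_size      # examples that never emit EOS keep max_len
    finished = [False] * batch_size
    sequences = [[] for _ in range(batch_size)]
    decoder_outputs = []                  # one (batch x vocab) list per step
    for step in range(max_len):
        log_probs = [log_softmax(s) for s in step_fn(prev, step)]
        decoder_outputs.append(log_probs)
        # Greedy choice: the highest log-probability token for each example.
        predicted = [max(range(len(lp)), key=lp.__getitem__) for lp in log_probs]
        for b, tok in enumerate(predicted):
            sequences[b].append(tok)
            if tok == eos_id and not finished[b]:
                finished[b] = True
                lengths[b] = step + 1     # the recorded length counts the EOS token
        # Next input: the gold token if a fresh draw is below the ratio, else the prediction.
        prev = [targets[b][step]
                if (targets is not None and rng.random() < teacher_forcing_ratio)
                else predicted[b]
                for b in range(batch_size)]
    # Plain keys standing in for KEY_LENGTH and KEY_SEQUENCE.
    return decoder_outputs, {"length": lengths, "sequence": sequences}
```

Keeping decoding running past EOS, and only recording the first EOS position in the lengths, lets every example in the batch advance in lockstep.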

TopKDecoder

class seq2seq.models.TopKDecoder.TopKDecoder(decoder_rnn, k)

Top-K decoding with beam search.

Parameters:
  • decoder_rnn (DecoderRNN) – a DecoderRNN instance used for decoding.
  • k (int) – Size of the beam.
Inputs: inputs, encoder_hidden, encoder_outputs, function, teacher_forcing_ratio
  • inputs (seq_len, batch, input_size): list of sequences, whose length is the batch size and within which each sequence is a list of token IDs. It is used for teacher forcing when provided. (default is None)
  • encoder_hidden (num_layers * num_directions, batch_size, hidden_size): tensor containing the features in the hidden state h of encoder. Used as the initial hidden state of the decoder.
  • encoder_outputs (batch, seq_len, hidden_size): tensor containing the outputs of the encoder. Used for the attention mechanism (default: None).
  • function (callable): a function used to generate symbols from the RNN hidden state (default: torch.nn.functional.log_softmax).
  • teacher_forcing_ratio (float): the probability that teacher forcing is used. A number is drawn uniformly between 0 and 1 for every decoding token; if it is smaller than the given value, teacher forcing is used (default: 0).
Outputs: decoder_outputs, decoder_hidden, ret_dict
  • decoder_outputs (batch): batch-length list of tensors with size (max_length, hidden_size) containing the outputs of the decoder.
  • decoder_hidden (num_layers * num_directions, batch, hidden_size): tensor containing the last hidden state of the decoder.
  • ret_dict: dictionary containing additional information as follows {length : list of integers representing lengths of output sequences, topk_length: list of integers representing lengths of beam search sequences, sequence : list of sequences, where each sequence is a list of predicted token IDs, topk_sequence : list of beam search sequences, each beam is a list of token IDs, inputs : target outputs if provided for decoding}.
forward(inputs=None, encoder_hidden=None, encoder_outputs=None, function=<function log_softmax>, teacher_forcing_ratio=0, retain_output_probs=True)

Runs the decoder RNN forward for MAX_LENGTH steps. See seq2seq.models.DecoderRNN.DecoderRNN.forward_rnn() for details.
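TopKDecoder keeps the k best partial hypotheses (the beam) at every step instead of only the single most likely token. The pure-Python sketch below shows plain beam search for one example, with simplifications that are assumptions rather than TopKDecoder's behaviour: log_prob_fn is a hypothetical stand-in for one decoder step, hypotheses that emit EOS leave the beam, and scores are raw summed log-probabilities with no length normalization.

```python
import math


def beam_search(log_prob_fn, sos_id, eos_id, k, max_len):
    """Minimal beam search sketch for a single example.

    log_prob_fn(prefix) returns log-probabilities over the vocabulary for the
    next token given the tokens so far. Returns up to k (score, tokens) pairs,
    best first; tokens include the leading SOS.
    """
    beams = [(0.0, [sos_id])]             # (cumulative log-prob, tokens)
    finished = []
    for _ in range(max_len):
        # Extend every live hypothesis with every vocabulary token.
        candidates = []
        for score, tokens in beams:
            for tok, lp in enumerate(log_prob_fn(tokens)):
                candidates.append((score + lp, tokens + [tok]))
        # Keep only the k highest-scoring extensions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:k]:
            if tokens[-1] == eos_id:
                finished.append((score, tokens))   # complete hypothesis
            else:
                beams.append((score, tokens))
        if not beams:
            break
    finished.extend(beams)                # hypotheses that ran out of steps
    finished.sort(key=lambda c: c[0], reverse=True)
    return finished[:k]
```

Setting k=1 reduces this to greedy decoding; larger beams trade extra computation for a better chance of finding a higher-scoring full sequence.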

attention

class seq2seq.models.attention.Attention(dim)

Applies an attention mechanism on the output features from the decoder.

\[
\begin{array}{ll}
x = \text{context} \cdot \text{output} \\
\text{attn}_i = \dfrac{\exp(x_i)}{\sum_j \exp(x_j)} \\
\text{output} = \tanh\left(w \cdot (\text{attn} \cdot \text{context}) + b \cdot \text{output}\right)
\end{array}
\]
Parameters: dim (int) – the number of expected features in the output
Inputs: output, context
  • output (batch, output_len, dimensions): tensor containing the output features from the decoder.
  • context (batch, input_len, dimensions): tensor containing features of the encoded input sequence.
Outputs: output, attn
  • output (batch, output_len, dimensions): tensor containing the attended output features from the decoder.
  • attn (batch, output_len, input_len): tensor containing attention weights.
Variables:
  • linear_out (torch.nn.Linear) – applies a linear transformation to the incoming data: \(y = Ax + b\).
  • mask (torch.Tensor, optional) – the attention scores at the positions marked in this tensor are set to \(-\infty\) before the softmax, so those positions receive zero weight.
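For a single decoder output vector, the formulas above reduce to dot-product scores, a (masked) softmax, a weighted mix of the context vectors, and a tanh-activated linear layer. A pure-Python sketch for one query; it assumes linear_out acts on the concatenation [mix; output] (a 2·dim to dim map, here the hypothetical w and b), and at least one context position must stay unmasked.

```python
import math


def attend(query, context, w, b, mask=None):
    """Dot-product attention for one decoder output vector (sketch).

    query:   list of d floats (one decoder output)
    context: list of input_len vectors of d floats (encoder outputs)
    w, b:    d x 2d weight rows and d biases, standing in for linear_out
    mask:    optional list of bools, True where a position must be ignored
    """
    # x_i = <context_i, query>
    scores = [sum(c * q for c, q in zip(ctx, query)) for ctx in context]
    if mask is not None:
        scores = [-math.inf if m else s for s, m in zip(scores, mask)]
    # attn_i = exp(x_i) / sum_j exp(x_j), shifted by the max for numerical stability
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    # mix = sum_i attn_i * context_i
    mix = [sum(a * ctx[k] for a, ctx in zip(attn, context)) for k in range(len(query))]
    # output = tanh(W [mix; query] + b)
    combined = mix + query
    output = [math.tanh(sum(wi * x for wi, x in zip(row, combined)) + bias)
              for row, bias in zip(w, b)]
    return output, attn
```

The batched module applies the same computation to every (batch, output_len) position at once, which is why attn has shape (batch, output_len, input_len).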

Examples:

>>> import torch
>>> from seq2seq.models.attention import Attention
>>> attention = Attention(256)
>>> context = torch.randn(5, 3, 256)
>>> output = torch.randn(5, 5, 256)
>>> output, attn = attention(output, context)

set_mask(mask)

Sets indices to be masked

Parameters: mask (torch.Tensor) – tensor containing indices to be masked
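For a padded batch, the positions to mask are those past each sequence's true length. A stdlib-only sketch that builds such a mask from input_lengths with nested lists (True means masked); lengths_to_mask and expand_mask are hypothetical helpers, and the sketch assumes the mask is expanded to the (batch, output_len, input_len) shape of attn before being passed to set_mask.

```python
def lengths_to_mask(lengths, max_len=None):
    """Hypothetical helper: (batch, input_len) mask, True at padding positions."""
    if max_len is None:
        max_len = max(lengths)
    return [[pos >= n for pos in range(max_len)] for n in lengths]


def expand_mask(mask, output_len):
    """Repeat each row for every decoder step: (batch, output_len, input_len)."""
    return [[list(row) for _ in range(output_len)] for row in mask]
```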

seq2seq

class seq2seq.models.seq2seq.Seq2seq(encoder, decoder, decode_function=<function log_softmax>)

Standard sequence-to-sequence architecture with configurable encoder and decoder.

Parameters:
  • encoder (EncoderRNN) – an EncoderRNN instance
  • decoder (DecoderRNN) – a DecoderRNN instance
  • decode_function (func, optional) – function to generate symbols from output hidden states (default: F.log_softmax)
Inputs: input_variable, input_lengths, target_variable, teacher_forcing_ratio
  • input_variable (list, optional): list of sequences, whose length is the batch size and within which each sequence is a list of token IDs. This information is forwarded to the encoder.
  • input_lengths (list of int, optional): a list that contains the lengths of sequences
    in the mini-batch; it must be provided when using a variable-length RNN (default: None)
  • target_variable (list, optional): list of sequences, whose length is the batch size and within which each sequence is a list of token IDs. This information is forwarded to the decoder.
  • teacher_forcing_ratio (float, optional): the probability that teacher forcing is used. A number is drawn uniformly between 0 and 1 for every decoding token; if it is smaller than the given value, teacher forcing is used (default: 0)
Outputs: decoder_outputs, decoder_hidden, ret_dict
  • decoder_outputs (batch): batch-length list of tensors with size (max_length, hidden_size) containing the outputs of the decoder.
  • decoder_hidden (num_layers * num_directions, batch, hidden_size): tensor containing the last hidden state of the decoder.
  • ret_dict: dictionary containing additional information as follows {KEY_LENGTH : list of integers representing lengths of output sequences, KEY_SEQUENCE : list of sequences, where each sequence is a list of predicted token IDs, KEY_INPUT : target outputs if provided for decoding, KEY_ATTN_SCORE : list of sequences, where each list is of attention weights }.
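The Inputs above describe input_variable going to the encoder and target_variable going to the decoder. A stdlib-only sketch of that wiring, where encoder and decoder are any callables standing in for EncoderRNN and DecoderRNN; seq2seq_forward is a hypothetical name, and the keyword names follow the DecoderRNN Inputs list rather than Seq2seq's source.

```python
def seq2seq_forward(encoder, decoder, decode_function, input_variable,
                    input_lengths=None, target_variable=None, teacher_forcing_ratio=0):
    """Sketch of chaining an encoder and a decoder (plain callables as stand-ins)."""
    # 1. Encode the source batch.
    encoder_outputs, encoder_hidden = encoder(input_variable, input_lengths)
    # 2. The encoder's final hidden state seeds the decoder, and its per-step
    #    outputs are what the decoder attends over when attention is enabled.
    return decoder(inputs=target_variable,
                   encoder_hidden=encoder_hidden,
                   encoder_outputs=encoder_outputs,
                   function=decode_function,
                   teacher_forcing_ratio=teacher_forcing_ratio)
```

Because the decoder's return value is passed through unchanged, the result has the same decoder_outputs, decoder_hidden, ret_dict structure listed under Outputs.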