Models¶

baseRNN¶

A base class for RNN.

class seq2seq.models.baseRNN.BaseRNN(vocab_size, max_len, hidden_size, input_dropout_p, dropout_p, n_layers, rnn_cell)¶

Applies a multi-layer RNN to an input sequence.

Note: Do not use this class directly; use one of the subclasses.

Parameters:

vocab_size (int) – size of the vocabulary
max_len (int) – maximum allowed length for the sequence to be processed
hidden_size (int) – number of features in the hidden state h
input_dropout_p (float) – dropout probability for the input sequence
dropout_p (float) – dropout probability for the output sequence
n_layers (int) – number of recurrent layers
rnn_cell (str) – type of RNN cell (e.g. 'LSTM', 'GRU')

Inputs: *args, **kwargs

*args: variable length argument list.
**kwargs: arbitrary keyword arguments.

Variables:

SYM_MASK – masking symbol
SYM_EOS – end-of-sequence symbol

EncoderRNN¶

class seq2seq.models.EncoderRNN.EncoderRNN(vocab_size, max_len, hidden_size, input_dropout_p=0, dropout_p=0, n_layers=1, bidirectional=False, rnn_cell='gru', variable_lengths=False, embedding=None, update_embedding=True)¶

Applies a multi-layer RNN to an input sequence.

Parameters:

vocab_size (int) – size of the vocabulary
max_len (int) – a maximum allowed length for the sequence to be processed
hidden_size (int) – the number of features in the hidden state h
input_dropout_p (float, optional) – dropout probability for the input sequence (default: 0)
dropout_p (float, optional) – dropout probability for the output sequence (default: 0)
n_layers (int, optional) – number of recurrent layers (default: 1)
bidirectional (bool, optional) – if True, becomes a bidirectional encoder (default: False)
rnn_cell (str, optional) – type of RNN cell (default: gru)
variable_lengths (bool, optional) – whether to use a variable-length RNN (default: False)
embedding (torch.Tensor, optional) – Pre-trained embedding. The size of the tensor has to match the size of the embedding parameter: (vocab_size, hidden_size). The embedding layer would be initialized with the tensor if provided (default: None).
update_embedding (bool, optional) – whether the embedding should be updated during training (default: True, as in the class signature).

Inputs: inputs, input_lengths

inputs: list of sequences, whose length is the batch size and within which each sequence is a list of token IDs.
input_lengths (list of int, optional): list that contains the lengths of sequences in the mini-batch; it must be provided when using a variable-length RNN (default: None)
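The relationship between inputs and input_lengths can be illustrated with a minimal sketch in plain Python. The helper name `pad_batch` and the padding ID `PAD_ID = 0` are assumptions made for this example and are not part of the library. The batch is sorted longest-first because PyTorch's sequence-packing utility expects that order by default.

```python
from typing import List, Tuple

PAD_ID = 0  # hypothetical padding token ID, chosen only for this example


def pad_batch(sequences: List[List[int]], pad_id: int = PAD_ID) -> Tuple[List[List[int]], List[int]]:
    """Pad token-ID sequences to a common length and record their true lengths."""
    # Sort longest-first so the lengths are in descending order.
    ordered = sorted(sequences, key=len, reverse=True)
    lengths = [len(seq) for seq in ordered]
    max_len = lengths[0] if lengths else 0
    # Right-pad every sequence up to the longest one in the batch.
    padded = [seq + [pad_id] * (max_len - len(seq)) for seq in ordered]
    return padded, lengths


batch = [[5, 8, 2], [7, 2], [4, 9, 6, 2]]
inputs, input_lengths = pad_batch(batch)
print(inputs)         # [[4, 9, 6, 2], [5, 8, 2, 0], [7, 2, 0, 0]]
print(input_lengths)  # [4, 3, 2]
```

In this sketch, `inputs` and `input_lengths` play the roles of the two documented inputs; converting the padded lists to a tensor is left out.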

Outputs: output, hidden

output (batch, seq_len, hidden_size): tensor containing the encoded features of the input sequence
hidden (num_layers * num_directions, batch, hidden_size): tensor containing the features in the hidden state h

Examples:

>>> encoder = EncoderRNN(input_vocab, max_seq_length, hidden_size)
>>> output, hidden = encoder(input)

forward(input_var, input_lengths=None)¶

Applies a multi-layer RNN to an input sequence.

Parameters:

input_var (batch, seq_len) – tensor containing the features of the input sequence.
input_lengths (list of int, optional) – a list that contains the lengths of sequences in the mini-batch

Returns: output, hidden

output (batch, seq_len, hidden_size): variable containing the encoded features of the input sequence
hidden (num_layers * num_directions, batch, hidden_size): variable containing the features in the hidden state h

DecoderRNN¶

class seq2seq.models.DecoderRNN.DecoderRNN(vocab_size, max_len, hidden_size, sos_id, eos_id, n_layers=1, rnn_cell='gru', bidirectional=False, input_dropout_p=0, dropout_p=0, use_attention=False)¶

Provides functionality for decoding in a seq2seq framework, with an option for attention.

Parameters:

vocab_size (int) – size of the vocabulary
max_len (int) – a maximum allowed length for the sequence to be processed
hidden_size (int) – the number of features in the hidden state h
sos_id (int) – index of the start of sentence symbol
eos_id (int) – index of the end of sentence symbol
n_layers (int, optional) – number of recurrent layers (default: 1)
rnn_cell (str, optional) – type of RNN cell (default: gru)
bidirectional (bool, optional) – if the encoder is bidirectional (default False)
input_dropout_p (float, optional) – dropout probability for the input sequence (default: 0)
dropout_p (float, optional) – dropout probability for the output sequence (default: 0)
use_attention (bool, optional) – flag indicating whether to use the attention mechanism (default: False)

Variables:

KEY_ATTN_SCORE (str) – key used to indicate attention weights in ret_dict
KEY_LENGTH (str) – key used to indicate a list representing lengths of output sequences in ret_dict
KEY_SEQUENCE (str) – key used to indicate a list of sequences in ret_dict

Inputs: inputs, encoder_hidden, encoder_outputs, function, teacher_forcing_ratio

inputs (batch, seq_len, input_size): list of sequences, whose length is the batch size and within which each sequence is a list of token IDs. It is used for teacher forcing when provided. (default None)
encoder_hidden (num_layers * num_directions, batch_size, hidden_size): tensor containing the features in the hidden state h of encoder. Used as the initial hidden state of the decoder. (default None)
encoder_outputs (batch, seq_len, hidden_size): tensor containing the outputs of the encoder. Used for the attention mechanism (default is None).
function (torch.nn.Module): A function used to generate symbols from RNN hidden state (default is torch.nn.functional.log_softmax).
teacher_forcing_ratio (float): The probability that teacher forcing will be used. A random number is drawn uniformly from 0-1 for every decoding token, and if the sample is smaller than the given value, teacher forcing would be used (default is 0).
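The teacher forcing rule described for teacher_forcing_ratio can be sketched in plain Python as follows. This is an illustration of the documented rule only, not the library's code: the function name `decode_with_teacher_forcing` and the toy `predict` callable are hypothetical stand-ins for one decoder step.

```python
import random
from typing import Callable, List, Optional, Tuple


def decode_with_teacher_forcing(
    targets: List[int],
    predict: Callable[[int], int],
    teacher_forcing_ratio: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Return (tokens fed to the decoder, tokens predicted) for one sequence."""
    rng = rng or random.Random()
    decoder_input = targets[0]  # the first target is the start-of-sequence token
    fed, predicted = [], []
    for step in range(1, len(targets)):
        fed.append(decoder_input)
        prediction = predict(decoder_input)
        predicted.append(prediction)
        # Draw a uniform sample in [0, 1); below the ratio means teacher forcing.
        use_teacher_forcing = rng.random() < teacher_forcing_ratio
        decoder_input = targets[step] if use_teacher_forcing else prediction
    return fed, predicted


targets = [1, 10, 11, 12]            # 1 stands in for the SOS token
toy_predict = lambda tok: tok + 100  # hypothetical model that is always wrong

always_fed, _ = decode_with_teacher_forcing(targets, toy_predict, teacher_forcing_ratio=1.0)
never_fed, _ = decode_with_teacher_forcing(targets, toy_predict, teacher_forcing_ratio=0.0)
print(always_fed)  # [1, 10, 11]   ground-truth tokens are fed back
print(never_fed)   # [1, 101, 201] the model's own predictions are fed back
```

With a ratio of 0 the decoder always consumes its own predictions, and with a ratio of 1 it always consumes the ground truth. Values in between mix the two at random.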

Outputs: decoder_outputs, decoder_hidden, ret_dict

decoder_outputs (seq_len, batch, vocab_size): list of tensors with size (batch_size, vocab_size) containing the outputs of the decoding function.
decoder_hidden (num_layers * num_directions, batch, hidden_size): tensor containing the last hidden state of the decoder.
ret_dict: dictionary containing additional information as follows {KEY_LENGTH : list of integers representing lengths of output sequences, KEY_SEQUENCE : list of sequences, where each sequence is a list of predicted token IDs }.
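As a rough illustration of how decoder_outputs relate to the KEY_SEQUENCE and KEY_LENGTH entries, the sketch below applies a log-softmax to each decoding step of a single example, takes the most likely token, and records the length at the first end-of-sequence token. The helper names are hypothetical. Counting the EOS token in the length and continuing to decode after it are assumptions of this sketch.

```python
import math
from typing import List, Optional, Tuple


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over one vector of scores."""
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(x - top) for x in logits))
    return [x - log_sum for x in logits]


def greedy_sequence(step_logits: List[List[float]], eos_id: int) -> Tuple[List[int], int]:
    """Pick the most likely token at each step; return (sequence, length)."""
    sequence: List[int] = []
    length: Optional[int] = None
    for step, logits in enumerate(step_logits):
        log_probs = log_softmax(logits)
        token = max(range(len(log_probs)), key=log_probs.__getitem__)
        sequence.append(token)
        if token == eos_id and length is None:
            length = step + 1  # assumption: the EOS token counts toward the length
    if length is None:
        length = len(sequence)  # no EOS produced: the full sequence counts
    return sequence, length


# Three decoding steps over a vocabulary of size 3, with token 2 as EOS.
step_logits = [[0.1, 2.0, 0.3], [0.2, 0.1, 3.0], [1.5, 0.2, 0.1]]
sequence, length = greedy_sequence(step_logits, eos_id=2)
print(sequence, length)  # [1, 2, 0] 2
```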

TopKDecoder¶

class seq2seq.models.TopKDecoder.TopKDecoder(decoder_rnn, k)¶

Top-K decoding with beam search.

Parameters:

decoder_rnn (DecoderRNN) – An object of DecoderRNN used for decoding.
k (int) – Size of the beam.
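To show what a beam of size k does, here is a minimal, unbatched beam search in plain Python. It is a sketch of the general technique and not TopKDecoder's implementation. The `step_log_probs` callable and the toy model are hypothetical stand-ins for one decoder step.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[List[int], float]  # (token IDs, cumulative log-probability)


def beam_search(
    step_log_probs: Callable[[List[int]], Dict[int, float]],
    sos_id: int,
    eos_id: int,
    k: int,
    max_len: int,
) -> List[Hypothesis]:
    """Keep the k best partial sequences at every step."""
    beams: List[Hypothesis] = [([sos_id], 0.0)]
    for _ in range(max_len):
        candidates: List[Hypothesis] = []
        for tokens, score in beams:
            if tokens[-1] == eos_id:
                candidates.append((tokens, score))  # finished hypotheses carry over
                continue
            for token, log_p in step_log_probs(tokens).items():
                candidates.append((tokens + [token], score + log_p))
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:k]
        if all(tokens[-1] == eos_id for tokens, _ in beams):
            break
    return beams


def toy_model(prefix: List[int]) -> Dict[int, float]:
    """Hypothetical next-token distribution: 0 = SOS, 1 = EOS, 2 and 3 = words."""
    last = prefix[-1]
    if last == 0:
        return {2: math.log(0.6), 3: math.log(0.4)}
    if last == 2:
        return {1: math.log(0.55), 2: math.log(0.45)}
    return {1: math.log(0.9), 3: math.log(0.1)}


greedy = beam_search(toy_model, sos_id=0, eos_id=1, k=1, max_len=5)
beam = beam_search(toy_model, sos_id=0, eos_id=1, k=2, max_len=5)
print(greedy[0][0])  # [0, 2, 1] with probability 0.6 * 0.55 = 0.33
print(beam[0][0])    # [0, 3, 1] with probability 0.4 * 0.9 = 0.36
```

With k=1 the search is greedy and commits to the locally best first token. With k=2 it also keeps the second-best prefix, which ends up as the higher-scoring full sequence.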

Inputs: inputs, encoder_hidden, encoder_outputs, function, teacher_forcing_ratio

inputs (seq_len, batch, input_size): list of sequences, whose length is the batch size and within which each sequence is a list of token IDs. It is used for teacher forcing when provided. (default is None)
encoder_hidden (num_layers * num_directions, batch_size, hidden_size): tensor containing the features in the hidden state h of encoder. Used as the initial hidden state of the decoder.
encoder_outputs (batch, seq_len, hidden_size): tensor containing the outputs of the encoder. Used for the attention mechanism (default is None).
function (torch.nn.Module): A function used to generate symbols from RNN hidden state (default is torch.nn.functional.log_softmax).
teacher_forcing_ratio (float): The probability that teacher forcing will be used. A random number is drawn uniformly from 0-1 for every decoding token, and if the sample is smaller than the given value, teacher forcing would be used (default is 0).

Outputs: decoder_outputs, decoder_hidden, ret_dict

decoder_outputs (batch): batch-length list of tensors with size (max_length, hidden_size) containing the outputs of the decoder.
decoder_hidden (num_layers * num_directions, batch, hidden_size): tensor containing the last hidden state of the decoder.
ret_dict: dictionary containing additional information as follows {length : list of integers representing lengths of output sequences, topk_length: list of integers representing lengths of beam search sequences, sequence : list of sequences, where each sequence is a list of predicted token IDs, topk_sequence : list of beam search sequences, each beam is a list of token IDs, inputs : target outputs if provided for decoding}.

forward(inputs=None, encoder_hidden=None, encoder_outputs=None, function=<function log_softmax>, teacher_forcing_ratio=0, retain_output_probs=True)¶

Forward RNN for MAX_LENGTH steps. See seq2seq.models.DecoderRNN.DecoderRNN.forward_rnn() for details.

attention¶

class seq2seq.models.attention.Attention(dim)¶

Applies an attention mechanism on the output features from the decoder.

\[\begin{array}{l} x = context * output \\ attn = \exp(x_i) / \sum_j \exp(x_j) \\ output = \tanh(w * (attn * context) + b * output) \end{array}\]
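The three equations above can be sketched for a single decoder step in plain Python. This is a hedged illustration, not the library's tensor implementation. The function names are hypothetical, and treating w and b as (dim x dim) weight matrices applied to the attended context and to the decoder output is an assumption about the notation.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def softmax(scores: Vector) -> Vector:
    """attn_i = exp(x_i) / sum_j exp(x_j), computed stably."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(output: Vector, context: List[Vector]) -> Tuple[Vector, Vector]:
    """Return (attn * context, attn) for one decoder output vector."""
    # x_i = context_i . output: one score per encoder position
    scores = [sum(c * o for c, o in zip(ctx, output)) for ctx in context]
    attn = softmax(scores)
    # attn * context: attention-weighted sum of the context vectors
    mix = [sum(a * ctx[d] for a, ctx in zip(attn, context)) for d in range(len(output))]
    return mix, attn


def combine(mix: Vector, output: Vector, w: Matrix, b: Matrix) -> Vector:
    """output = tanh(w * mix + b * output), with w and b as assumed weight matrices."""
    dim = len(output)
    return [
        math.tanh(sum(w[r][c] * mix[c] + b[r][c] * output[c] for c in range(dim)))
        for r in range(dim)
    ]


output = [1.0, 0.0]
context = [[1.0, 0.0], [0.0, 1.0]]  # two encoder positions, dim = 2
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder weights for the example

mix, attn = attend(output, context)
attended = combine(mix, output, identity, identity)
print(attn)      # the first position receives the larger weight
print(attended)
```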

Parameters:	dim (int) – The number of expected features in the output

Inputs: output, context

output (batch, output_len, dimensions): tensor containing the output features from the decoder.
context (batch, input_len, dimensions): tensor containing features of the encoded input sequence.

Outputs: output, attn

output (batch, output_len, dimensions): tensor containing the attended output features from the decoder.
attn (batch, output_len, input_len): tensor containing attention weights.

Variables:

linear_out (torch.nn.Linear) – applies a linear transformation to the incoming data: \(y = Ax + b\).
mask (torch.Tensor, optional) – applies a \(-\infty\) to the indices specified in the Tensor.

Examples:

>>> attention = seq2seq.models.Attention(256)
>>> context = Variable(torch.randn(5, 3, 256))
>>> output = Variable(torch.randn(5, 5, 256))
>>> output, attn = attention(output, context)

set_mask(mask)¶

Sets indices to be masked

Parameters:	mask (torch.Tensor) – tensor containing indices to be masked
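The effect of a mask on the attention weights can be sketched in plain Python: masked positions are set to negative infinity before the softmax, so they receive zero weight. The function name `masked_softmax` and the use of a boolean list in place of a tensor are assumptions made for this example.

```python
import math
from typing import List


def masked_softmax(scores: List[float], mask: List[bool]) -> List[float]:
    """Softmax over scores, where mask[i] == True excludes position i."""
    if all(mask):
        raise ValueError("at least one position must remain unmasked")
    # Replace masked scores with -inf so that exp() maps them to exactly 0.
    masked = [-math.inf if is_masked else s for s, is_masked in zip(scores, mask)]
    top = max(masked)
    exps = [math.exp(s - top) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.5]
mask = [False, False, True]  # e.g. the last encoder position is padding
weights = masked_softmax(scores, mask)
print(weights)  # the third weight is 0.0; the first two sum to 1
```

A typical use is to mask padded encoder positions so that the decoder cannot attend to them.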

seq2seq¶

class seq2seq.models.seq2seq.Seq2seq(encoder, decoder, decode_function=<function log_softmax>)¶

Standard sequence-to-sequence architecture with configurable encoder and decoder.

Parameters:

encoder (EncoderRNN) – object of EncoderRNN
decoder (DecoderRNN) – object of DecoderRNN
decode_function (func, optional) – function to generate symbols from output hidden states (default: F.log_softmax)

Inputs: input_variable, input_lengths, target_variable, teacher_forcing_ratio

input_variable (list, optional): list of sequences, whose length is the batch size and within which each sequence is a list of token IDs. This information is forwarded to the encoder.
input_lengths (list of int, optional): a list that contains the lengths of sequences in the mini-batch; it must be provided when using a variable-length RNN (default: None)
target_variable (list, optional): list of sequences, whose length is the batch size and within which each sequence is a list of token IDs. This information is forwarded to the decoder.
teacher_forcing_ratio (float, optional): The probability that teacher forcing will be used. A random number is drawn uniformly from 0-1 for every decoding token, and if the sample is smaller than the given value, teacher forcing is used (default is 0)

Outputs: decoder_outputs, decoder_hidden, ret_dict

decoder_outputs (batch): batch-length list of tensors with size (max_length, hidden_size) containing the outputs of the decoder.
decoder_hidden (num_layers * num_directions, batch, hidden_size): tensor containing the last hidden state of the decoder.
ret_dict: dictionary containing additional information as follows {KEY_LENGTH : list of integers representing lengths of output sequences, KEY_SEQUENCE : list of sequences, where each sequence is a list of predicted token IDs, KEY_INPUT : target outputs if provided for decoding, KEY_ATTN_SCORE : list of sequences, where each list is of attention weights }.
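The data flow implied by the inputs and outputs above can be sketched with stand-in callables. `ToySeq2seq`, `stub_encoder`, and `stub_decoder` are hypothetical names used only for illustration. The sketch assumes the encoder's results are passed to the decoder under the keyword names documented for DecoderRNN, and it uses plain strings as dictionary keys in place of the KEY_* constants.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


class ToySeq2seq:
    """Hypothetical stand-in that wires an encoder to a decoder."""

    def __init__(self, encoder: Callable, decoder: Callable, decode_function: Optional[Callable] = None):
        self.encoder = encoder
        self.decoder = decoder
        self.decode_function = decode_function

    def __call__(self, input_variable, input_lengths=None, target_variable=None, teacher_forcing_ratio=0):
        # The encoder consumes the source sequences and their lengths.
        encoder_outputs, encoder_hidden = self.encoder(input_variable, input_lengths)
        # The decoder starts from the encoder's hidden state and may attend to its outputs.
        return self.decoder(
            inputs=target_variable,
            encoder_hidden=encoder_hidden,
            encoder_outputs=encoder_outputs,
            function=self.decode_function,
            teacher_forcing_ratio=teacher_forcing_ratio,
        )


def stub_encoder(inputs: List[List[int]], input_lengths=None) -> Tuple[List[List[int]], List[int]]:
    """Toy encoder: echoes the inputs as 'outputs' and sums each sequence as 'hidden'."""
    return [list(seq) for seq in inputs], [sum(seq) for seq in inputs]


def stub_decoder(inputs=None, encoder_hidden=None, encoder_outputs=None,
                 function=None, teacher_forcing_ratio=0) -> Tuple[Any, Any, Dict[str, Any]]:
    """Toy decoder: 'predicts' each source sequence reversed."""
    sequences = [list(reversed(seq)) for seq in encoder_outputs]
    ret_dict = {"length": [len(seq) for seq in sequences], "sequence": sequences}
    return sequences, encoder_hidden, ret_dict


model = ToySeq2seq(stub_encoder, stub_decoder)
decoder_outputs, decoder_hidden, ret_dict = model([[1, 2, 3], [4, 5]], input_lengths=[3, 2])
print(ret_dict["sequence"])  # [[3, 2, 1], [5, 4]]
print(ret_dict["length"])    # [3, 2]
```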