Models
baseRNN
A base class for RNN.
class seq2seq.models.baseRNN.BaseRNN(vocab_size, max_len, hidden_size, input_dropout_p, dropout_p, n_layers, rnn_cell)
Applies a multi-layer RNN to an input sequence.
Note: Do not use this class directly; use one of the subclasses.
Parameters: - vocab_size (int) – size of the vocabulary
- max_len (int) – maximum allowed length for the sequence to be processed
- hidden_size (int) – number of features in the hidden state h
- input_dropout_p (float) – dropout probability for the input sequence
- dropout_p (float) – dropout probability for the output sequence
- n_layers (int) – number of recurrent layers
- rnn_cell (str) – type of RNN cell (e.g. ‘LSTM’, ‘GRU’)
- Inputs: *args, **kwargs
- *args: variable-length argument list.
- **kwargs: arbitrary keyword arguments.
Variables: - SYM_MASK – masking symbol
- SYM_EOS – end-of-sequence symbol
EncoderRNN
class seq2seq.models.EncoderRNN.EncoderRNN(vocab_size, max_len, hidden_size, input_dropout_p=0, dropout_p=0, n_layers=1, bidirectional=False, rnn_cell='gru', variable_lengths=False, embedding=None, update_embedding=True)
Applies a multi-layer RNN to an input sequence.
Parameters: - vocab_size (int) – size of the vocabulary
- max_len (int) – a maximum allowed length for the sequence to be processed
- hidden_size (int) – the number of features in the hidden state h
- input_dropout_p (float, optional) – dropout probability for the input sequence (default: 0)
- dropout_p (float, optional) – dropout probability for the output sequence (default: 0)
- n_layers (int, optional) – number of recurrent layers (default: 1)
- bidirectional (bool, optional) – if True, becomes a bidirectional encoder (default: False)
- rnn_cell (str, optional) – type of RNN cell (default: gru)
- variable_lengths (bool, optional) – whether to use a variable-length RNN (default: False)
- embedding (torch.Tensor, optional) – Pre-trained embedding. The size of the tensor has to match the size of the embedding parameter: (vocab_size, hidden_size). The embedding layer is initialized with the tensor if provided (default: None).
- update_embedding (bool, optional) – Whether the embedding should be updated during training (default: True).
- Inputs: inputs, input_lengths
- inputs: list of sequences, whose length is the batch size and within which each sequence is a list of token IDs.
- input_lengths (list of int, optional): list that contains the lengths of the sequences in the mini-batch; it must be provided when using a variable-length RNN (default: None)
- Outputs: output, hidden
- output (batch, seq_len, hidden_size): tensor containing the encoded features of the input sequence
- hidden (num_layers * num_directions, batch, hidden_size): tensor containing the features in the hidden state h
Examples:
>>> encoder = EncoderRNN(input_vocab, max_seq_length, hidden_size)
>>> output, hidden = encoder(input)
forward(input_var, input_lengths=None)
Applies a multi-layer RNN to an input sequence.
Parameters: - input_var (batch, seq_len) – tensor containing the features of the input sequence.
- input_lengths (list of int, optional) – A list that contains the lengths of sequences in the mini-batch
- Returns: output, hidden
- output (batch, seq_len, hidden_size): variable containing the encoded features of the input sequence
- hidden (num_layers * num_directions, batch, hidden_size): variable containing the features in the hidden state h
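As a rough illustration of the inputs described above, the following plain-Python sketch pads a batch of token-ID sequences to a common length and records the original lengths that would be passed as input_lengths. The helper name pad_batch, the padding ID 0, and the longest-first ordering are assumptions made for this example and are not part of the library; converting the padded lists to a tensor is left out.

```python
def pad_batch(sequences, pad_id=0):
    """Pad token-ID sequences to a common length, longest sequence first.

    pad_batch and pad_id are hypothetical names used only in this sketch.
    Returns (padded, lengths), where lengths holds the unpadded lengths.
    """
    if not sequences:
        return [], []
    # Sort longest first; sorted() is stable, so ties keep their input order.
    ordered = sorted(sequences, key=len, reverse=True)
    lengths = [len(seq) for seq in ordered]
    max_len = lengths[0]
    # Right-pad every sequence up to the length of the longest one.
    padded = [list(seq) + [pad_id] * (max_len - len(seq)) for seq in ordered]
    return padded, lengths


batch = [[4, 9], [7, 3, 5, 2], [8]]
padded, input_lengths = pad_batch(batch)
print(padded)
print(input_lengths)
```

The padded lists correspond to the (batch, seq_len) layout of input_var, and input_lengths lets a variable-length encoder ignore the padding positions.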
DecoderRNN
class seq2seq.models.DecoderRNN.DecoderRNN(vocab_size, max_len, hidden_size, sos_id, eos_id, n_layers=1, rnn_cell='gru', bidirectional=False, input_dropout_p=0, dropout_p=0, use_attention=False)
Provides functionality for decoding in a seq2seq framework, with an option for attention.
Parameters: - vocab_size (int) – size of the vocabulary
- max_len (int) – a maximum allowed length for the sequence to be processed
- hidden_size (int) – the number of features in the hidden state h
- sos_id (int) – index of the start of sentence symbol
- eos_id (int) – index of the end of sentence symbol
- n_layers (int, optional) – number of recurrent layers (default: 1)
- rnn_cell (str, optional) – type of RNN cell (default: gru)
- bidirectional (bool, optional) – if the encoder is bidirectional (default False)
- input_dropout_p (float, optional) – dropout probability for the input sequence (default: 0)
- dropout_p (float, optional) – dropout probability for the output sequence (default: 0)
- use_attention (bool, optional) – flag indicating whether to use the attention mechanism (default: False)
- Inputs: inputs, encoder_hidden, encoder_outputs, function, teacher_forcing_ratio
- inputs (batch, seq_len, input_size): list of sequences, whose length is the batch size and within which each sequence is a list of token IDs. It is used for teacher forcing when provided. (default None)
- encoder_hidden (num_layers * num_directions, batch_size, hidden_size): tensor containing the features in the hidden state h of encoder. Used as the initial hidden state of the decoder. (default None)
- encoder_outputs (batch, seq_len, hidden_size): tensor containing the outputs of the encoder. Used for the attention mechanism (default is None).
- function (torch.nn.Module): A function used to generate symbols from the RNN hidden state (default is torch.nn.functional.log_softmax).
- teacher_forcing_ratio (float): The probability that teacher forcing will be used. A random number is drawn uniformly between 0 and 1 for every decoding token, and if the sample is smaller than the given value, teacher forcing is used (default is 0).
- Outputs: decoder_outputs, decoder_hidden, ret_dict
- decoder_outputs (seq_len, batch, vocab_size): list of tensors with size (batch_size, vocab_size) containing the outputs of the decoding function.
- decoder_hidden (num_layers * num_directions, batch, hidden_size): tensor containing the last hidden state of the decoder.
- ret_dict: dictionary containing additional information as follows {KEY_LENGTH : list of integers representing lengths of output sequences, KEY_SEQUENCE : list of sequences, where each sequence is a list of predicted token IDs }.
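The teacher_forcing_ratio rule described above can be sketched in plain Python. This is a minimal illustration of the per-token draw as the text states it, not the library's own code; the function name pick_decoder_input and the token IDs are made up for the example.

```python
import random


def pick_decoder_input(target_id, predicted_id, teacher_forcing_ratio, rng):
    """Return the token fed to the next decoding step (hypothetical helper).

    rng.random() is uniform on [0, 1); when the draw falls below the ratio,
    the ground-truth token is used (teacher forcing), otherwise the model's
    own prediction is fed back in.
    """
    if rng.random() < teacher_forcing_ratio:
        return target_id
    return predicted_id


rng = random.Random(0)
targets = [11, 12, 13, 14]       # illustrative ground-truth token IDs
predictions = [21, 22, 23, 24]   # illustrative model predictions
mixed = [pick_decoder_input(t, p, 0.5, rng) for t, p in zip(targets, predictions)]
print(mixed)
```

With a ratio of 0 the decoder always consumes its own predictions, and with a ratio of 1 it always consumes the targets; values in between mix the two.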
TopKDecoder
class seq2seq.models.TopKDecoder.TopKDecoder(decoder_rnn, k)
Top-K decoding with beam search.
Parameters: - decoder_rnn (DecoderRNN) – An object of DecoderRNN used for decoding.
- k (int) – Size of the beam.
- Inputs: inputs, encoder_hidden, encoder_outputs, function, teacher_forcing_ratio
- inputs (seq_len, batch, input_size): list of sequences, whose length is the batch size and within which each sequence is a list of token IDs. It is used for teacher forcing when provided. (default is None)
- encoder_hidden (num_layers * num_directions, batch_size, hidden_size): tensor containing the features in the hidden state h of encoder. Used as the initial hidden state of the decoder.
- encoder_outputs (batch, seq_len, hidden_size): tensor containing the outputs of the encoder. Used for the attention mechanism (default is None).
- function (torch.nn.Module): A function used to generate symbols from the RNN hidden state (default is torch.nn.functional.log_softmax).
- teacher_forcing_ratio (float): The probability that teacher forcing will be used. A random number is drawn uniformly between 0 and 1 for every decoding token, and if the sample is smaller than the given value, teacher forcing is used (default is 0).
- Outputs: decoder_outputs, decoder_hidden, ret_dict
- decoder_outputs (batch): batch-length list of tensors with size (max_length, hidden_size) containing the outputs of the decoder.
- decoder_hidden (num_layers * num_directions, batch, hidden_size): tensor containing the last hidden state of the decoder.
- ret_dict: dictionary containing additional information as follows {length : list of integers representing lengths of output sequences, topk_length: list of integers representing lengths of beam search sequences, sequence : list of sequences, where each sequence is a list of predicted token IDs, topk_sequence : list of beam search sequences, each beam is a list of token IDs, inputs : target outputs if provided for decoding}.
forward(inputs=None, encoder_hidden=None, encoder_outputs=None, function=<function log_softmax>, teacher_forcing_ratio=0, retain_output_probs=True)
Runs the forward RNN for MAX_LENGTH steps. See seq2seq.models.DecoderRNN.DecoderRNN.forward_rnn() for details.
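To make the idea of top-k decoding concrete, here is a small beam search over a toy next-token model in plain Python. It is a simplified sketch and not the TopKDecoder implementation: the callback step_log_probs, the toy distribution, and the choice to shrink the beam as hypotheses finish are all assumptions of this example.

```python
import math


def beam_search(step_log_probs, sos_id, eos_id, k, max_len):
    """Return up to k (sequence, score) pairs, best first.

    step_log_probs(prefix) is a hypothetical callback that returns a dict
    {token_id: log_prob} for the next token given the prefix so far.
    """
    beams = [([sos_id], 0.0)]
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = []
        for seq, score in beams:
            for token, logp in step_log_probs(seq).items():
                candidates.append((seq + [token], score + logp))
        # Keep only the k highest-scoring expansions.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = []
        for seq, score in candidates[:k]:
            if seq[-1] == eos_id:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len
    finished.sort(key=lambda item: item[1], reverse=True)
    return finished[:k]


def toy_model(prefix):
    # Hypothetical fixed distribution: after token 1, ending (token 3) is likely.
    if prefix[-1] == 1:
        return {1: math.log(0.1), 2: math.log(0.2), 3: math.log(0.7)}
    return {1: math.log(0.6), 2: math.log(0.3), 3: math.log(0.1)}


results = beam_search(toy_model, sos_id=0, eos_id=3, k=2, max_len=4)
for seq, score in results:
    print(seq, round(math.exp(score), 3))
```

Scores are summed log-probabilities, so the hypothesis with the highest joint probability comes first; k plays the role of the beam size parameter.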
attention
class seq2seq.models.attention.Attention(dim)
Applies an attention mechanism on the output features from the decoder.
\[\begin{array}{ll}
x = context * output \\
attn = \exp(x_i) / \sum_j \exp(x_j) \\
output = \tanh(w * (attn * context) + b * output)
\end{array}\]
Parameters: dim (int) – The number of expected features in the output
- Inputs: output, context
- output (batch, output_len, dimensions): tensor containing the output features from the decoder.
- context (batch, input_len, dimensions): tensor containing features of the encoded input sequence.
- Outputs: output, attn
- output (batch, output_len, dimensions): tensor containing the attended output features from the decoder.
- attn (batch, output_len, input_len): tensor containing attention weights.
Variables: - linear_out (torch.nn.Linear) – applies a linear transformation to the incoming data: \(y = Ax + b\).
- mask (torch.Tensor, optional) – applies a \(-inf\) to the indices specified in the Tensor.
Examples:
>>> attention = seq2seq.models.Attention(256)
>>> context = Variable(torch.randn(5, 3, 256))
>>> output = Variable(torch.randn(5, 5, 256))
>>> output, attn = attention(output, context)
set_mask(mask)
Sets indices to be masked
Parameters: mask (torch.Tensor) – tensor containing indices to be masked
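The attention weights and the effect of a mask can be worked through for a single decoder vector in plain Python. This sketch covers only the dot-product scores, the softmax, and the weighted sum of the context; it omits the final linear layer and tanh, and the function names and the set-of-positions mask format are assumptions of the example rather than the library's interface.

```python
import math


def attention_weights(query, context, masked=()):
    """Softmax over dot products of one decoder vector with each context vector.

    masked holds context positions whose score is set to -inf before the
    softmax, so they receive zero weight. Assumes at least one position
    stays unmasked.
    """
    scores = [sum(q * c for q, c in zip(query, vec)) for vec in context]
    scores = [-math.inf if i in masked else s for i, s in enumerate(scores)]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_context(weights, context):
    """Weighted sum of the context vectors (the attn * context term)."""
    dim = len(context[0])
    return [sum(w * vec[d] for w, vec in zip(weights, context)) for d in range(dim)]


query = [1.0, 0.0]
context = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
weights = attention_weights(query, context, masked={2})
mix = weighted_context(weights, context)
print(weights)
print(mix)
```

Masking position 2 removes it from the softmax entirely, so the remaining weights still sum to one.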
seq2seq
class seq2seq.models.seq2seq.Seq2seq(encoder, decoder, decode_function=<function log_softmax>)
Standard sequence-to-sequence architecture with configurable encoder and decoder.
Parameters: - encoder (EncoderRNN) – object of EncoderRNN
- decoder (DecoderRNN) – object of DecoderRNN
- decode_function (func, optional) – function to generate symbols from output hidden states (default: F.log_softmax)
- Inputs: input_variable, input_lengths, target_variable, teacher_forcing_ratio
- input_variable (list, optional): list of sequences, whose length is the batch size and within which each sequence is a list of token IDs. This information is forwarded to the encoder.
- input_lengths (list of int, optional): A list that contains the lengths of the sequences in the mini-batch; it must be provided when using a variable-length RNN (default: None)
- target_variable (list, optional): list of sequences, whose length is the batch size and within which each sequence is a list of token IDs. This information is forwarded to the decoder.
- teacher_forcing_ratio (float, optional): The probability that teacher forcing will be used. A random number is drawn uniformly between 0 and 1 for every decoding token, and if the sample is smaller than the given value, teacher forcing is used (default is 0)
- Outputs: decoder_outputs, decoder_hidden, ret_dict
- decoder_outputs (batch): batch-length list of tensors with size (max_length, hidden_size) containing the outputs of the decoder.
- decoder_hidden (num_layers * num_directions, batch, hidden_size): tensor containing the last hidden state of the decoder.
- ret_dict: dictionary containing additional information as follows {KEY_LENGTH : list of integers representing lengths of output sequences, KEY_SEQUENCE : list of sequences, where each sequence is a list of predicted token IDs, KEY_INPUT : target outputs if provided for decoding, KEY_ATTN_SCORE : list of sequences, where each list is of attention weights }.
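One way to consume the returned dictionary is to cut each predicted sequence to its reported length. The sketch below assumes the layout described above (one length and one list of token IDs per batch element) and uses the string keys "length" and "sequence" as stand-ins for KEY_LENGTH and KEY_SEQUENCE; the real key values and container types may differ, so treat the names here as hypothetical.

```python
def trim_predictions(ret_dict, length_key="length", sequence_key="sequence"):
    """Cut each predicted sequence to its reported length.

    The key names are placeholders for KEY_LENGTH and KEY_SEQUENCE, and the
    per-example list layout is an assumption of this sketch.
    """
    lengths = ret_dict[length_key]
    sequences = ret_dict[sequence_key]
    return [list(seq[:n]) for seq, n in zip(sequences, lengths)]


# Hypothetical result for a batch of two, with 2 as the EOS ID and 0 as padding.
example = {"length": [3, 2], "sequence": [[5, 8, 2, 0], [6, 2, 0, 0]]}
trimmed = trim_predictions(example)
print(trimmed)
```

Trimming by length drops whatever was generated or padded after the end-of-sequence symbol, leaving only the tokens that belong to each prediction.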