chat (optional)
Properties that describe the chat context.
decoding_method (optional)
Represents the strategy used for picking the tokens during generation of the output text.
When this parameter is set to greedy, each successive token is the highest-probability token given the text that has already been generated. This strategy can lead to repetitive results, especially for longer output sequences. The alternative, sample, generates text by picking each subsequent token from the probability distribution over possible next tokens defined by (i.e., conditioned on) the already-generated text, as shaped by the top_k and top_p parameters described below. The two strategies are contrasted in the sketch below.
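For intuition only, here is a minimal, illustrative sketch (not the service's implementation) contrasting the two strategies over a toy next-token distribution; the fixed seed plays the role of the random_seed parameter described below.

```python
import numpy as np

# Toy probability distribution over a 4-token vocabulary,
# conditioned on the text generated so far.
probs = np.array([0.6, 0.25, 0.1, 0.05])

# greedy: always take the single most likely token.
greedy_token = int(np.argmax(probs))

# sample: draw the next token from the distribution (optionally
# filtered by top_k / top_p first); a fixed seed makes runs repeatable.
rng = np.random.default_rng(42)
sampled_token = int(rng.choice(len(probs), p=probs))

print(greedy_token, sampled_token)
```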
include_stop_sequence (optional)
Pass false to omit matched stop sequences from the end of the output text. The default is true, meaning that the output will end with the stop sequence text when matched.
length_penalty (optional)
Can be used to exponentially increase the likelihood of the text generation terminating once a specified number of tokens have been generated.
max_new_tokens (optional)
The maximum number of new tokens to be generated. The maximum supported value for this field depends on the model being used.
How a 'token' is defined depends on the tokenizer and vocabulary size, which in turn depend on the model. Tokens are often a mix of full words and sub-words, as the sketch below illustrates.
Depending on the user's plan and on the model being used, there may be an enforced maximum number of new tokens.
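To illustrate sub-word tokenization, the sketch below uses a Hugging Face tokenizer as a stand-in; the actual tokens and counts depend on the tokenizer of the model you call.

```python
from transformers import AutoTokenizer

# GPT-2's tokenizer is used purely for illustration; each model has its
# own tokenizer and vocabulary, so real token counts will differ.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Tokenization splits text into full words and sub-words."
tokens = tokenizer.tokenize(text)
print(len(tokens), tokens)  # note how longer words split into pieces
```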
min_new_tokens (optional)
The minimum number of new tokens to be generated. If stop sequences are given, they are ignored until the minimum number of tokens has been generated.
random_seed (optional)
Random number generator seed to use in sampling mode, for experimental repeatability.
repetition_penalty (optional)
The penalty applied to tokens that have already been generated or belong to the context. A value of 1.0 means that there is no penalty.
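The exact formula the service applies is not specified here; a common formulation (introduced by the CTRL paper and used by several open-source implementations) divides positive logits and multiplies negative ones, as sketched below.

```python
import numpy as np

def apply_repetition_penalty(logits, seen_token_ids, penalty):
    """Penalize tokens already generated or present in the context.

    A penalty of 1.0 leaves the logits unchanged (no penalty).
    """
    out = logits.astype(float).copy()
    for t in set(seen_token_ids):
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out

logits = np.array([2.0, 1.0, -0.5])
print(apply_repetition_penalty(logits, seen_token_ids=[0, 2], penalty=1.2))
```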
return_options (optional)
Properties that control what is returned.
stop_sequences (optional)
One or more strings that will cause the text generation to stop if/when they are produced as part of the output. Stop sequences encountered before the minimum number of tokens has been generated are ignored. The snippet below shows how the stop-related parameters combine.
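The stop-related parameters work together. A hypothetical parameters payload (field names taken from this reference, values purely illustrative) might look like:

```python
parameters = {
    "stop_sequences": ["\n\n"],      # stop at the first blank line...
    "min_new_tokens": 20,            # ...but not before 20 tokens exist
    "include_stop_sequence": False,  # trim the matched "\n\n" from the output
}
```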
temperature (optional)
A value used to modify the next-token probabilities in sampling mode. Values less than 1.0 sharpen the probability distribution, resulting in 'less random' output; values greater than 1.0 flatten the probability distribution, resulting in 'more random' output. A value of 1.0 has no effect.
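Temperature rescales the logits before the softmax; the sketch below (the standard formulation, assumed here to match the service) shows the distribution sharpening at 0.5 and flattening at 2.0.

```python
import numpy as np

def softmax(x):
    z = x - x.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5])
for temperature in (0.5, 1.0, 2.0):
    print(temperature, softmax(logits / temperature))
```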
time_limit (optional)
Time limit in milliseconds. If generation is not completed within this time, it stops, and the text generated so far is returned along with the TIME_LIMIT stop reason.
Depending on the user's plan and on the model being used, there may be an enforced maximum time limit.
top_k (optional)
The number of highest-probability vocabulary tokens to keep for top-k filtering. Only applies in sampling mode. When decoding_method is set to sample, only the top_k most likely tokens are considered as candidates for the next generated token.
top_p (optional)
Similar to top_k, except the candidates for the next token are the most likely tokens whose probabilities add up to at least top_p. Also known as nucleus sampling. A value of 1.0 is equivalent to disabled. Both filters are sketched below.
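A minimal sketch of both filters over a toy distribution (illustrative only; the service's internals may differ):

```python
import numpy as np

def top_k_filter(probs, k):
    # Keep only the k most likely tokens, then renormalize.
    keep = np.argsort(probs)[-k:]
    out = np.zeros_like(probs)
    out[keep] = probs[keep]
    return out / out.sum()

def top_p_filter(probs, p):
    # Keep the smallest set of most likely tokens whose cumulative
    # probability is at least p (nucleus sampling); p=1.0 keeps everything.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cumulative, p) + 1]
    out = np.zeros_like(probs)
    out[keep] = probs[keep]
    return out / out.sum()

probs = np.array([0.5, 0.3, 0.15, 0.05])
print(top_k_filter(probs, k=2))    # only the two most likely remain
print(top_p_filter(probs, p=0.9))  # smallest set covering >= 0.9
```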
truncate_input_tokens (optional)
Represents the maximum number of input tokens accepted. This can be used to avoid requests failing because the input is longer than configured limits. If the input is too long, it is truncated from the start (the left), so the end of the input remains the same. If this value exceeds the model's maximum sequence length (refer to the model documentation for this value), the call will fail if the total number of tokens exceeds the maximum sequence length. A value of zero means do not truncate.
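Putting it together, here is a hedged end-to-end sketch of a generation request. The endpoint URL, access token, and model id are placeholders, and the payload shape is an assumption about a JSON-over-HTTP generation API; only the parameter names come from this reference.

```python
import requests

# Placeholder endpoint, token, and model id; substitute your deployment's values.
URL = "https://example.com/v1/text/generation"
payload = {
    "model_id": "your-model-id",
    "input": "Summarize the benefits of sub-word tokenization.",
    "parameters": {
        "decoding_method": "sample",
        "temperature": 0.7,
        "top_k": 50,
        "top_p": 0.9,
        "random_seed": 42,
        "repetition_penalty": 1.1,
        "min_new_tokens": 20,
        "max_new_tokens": 200,
        "stop_sequences": ["\n\n"],
        "include_stop_sequence": False,
        "time_limit": 10_000,           # milliseconds
        "truncate_input_tokens": 1024,  # left-truncate overly long input
    },
}
response = requests.post(
    URL,
    json=payload,
    headers={"Authorization": "Bearer <access-token>"},
    timeout=30,
)
print(response.json())
```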