CrfPlain

class CrfPlain

Plaintext implementation of the Completely Random Forest (CRF) model.

get_num_input_cols(self: pyhelayers.CrfPlain) → int

Returns the number of columns that the input samples should have.

Return type:

int

predict(self: pyhelayers.CrfPlain, input: numpy.ndarray[numpy.float64]) → numpy.ndarray[numpy.float64]

Predicts the categories of the given input samples.

Parameters:

input – The input samples. input.at(s,f) is the f-th feature of the s-th sample.

Returns:

The i-th entry of the returned vector is the predicted category for the i-th input sample.

Return type:

tensor of type double
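
A minimal sketch of checking the input shape before calling predict, using only the two methods documented above. StubModel is a hypothetical stand-in so the snippet runs without pyhelayers, and predict_checked is an illustrative helper name, not part of the library. With a trained CrfPlain object the helper would presumably be called the same way, assuming predict accepts a 2-D float64 array as its signature indicates.

```python
import numpy as np


def predict_checked(model, samples):
    """Validate the sample matrix against the model, then call predict."""
    x = np.asarray(samples, dtype=np.float64)
    if x.ndim != 2:
        raise ValueError("expected a 2-D array of shape (samples, features)")
    expected = model.get_num_input_cols()
    if x.shape[1] != expected:
        raise ValueError(f"expected {expected} columns, got {x.shape[1]}")
    return model.predict(x)


class StubModel:
    """Hypothetical stand-in for a trained CrfPlain (illustration only)."""

    def get_num_input_cols(self):
        return 3

    def predict(self, x):
        # Toy rule, NOT the CRF algorithm: category 1 if the row sum is positive.
        return (x.sum(axis=1) > 0).astype(np.float64)


# Two samples with three features each.
labels = predict_checked(StubModel(), [[0.5, 1.0, -0.2], [-1.0, -2.0, 0.1]])
```

Failing early on a column mismatch gives a clearer error than whatever the underlying model would report for a malformed array.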

predict_count_all_trees(self: pyhelayers.CrfPlain, input: numpy.ndarray[numpy.float64]) → numpy.ndarray[numpy.float64]

For each sample s and tree t, let l_{t,s} be the leaf that sample s reaches in tree t. This function returns, for each category, the number of training samples that reached the leaf l_{t,s} during the training phase.

Parameters:

input – The input samples. input.at(s,f) is the f-th feature of the s-th sample.

Returns:

The per-leaf category counts. output.at(t,s,i) specifies the number of training samples of category i that reached the leaf l_{t,s} (defined above) during the training phase.

Return type:

tensor of type double
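
The (t,s,i) layout described above can be illustrated with a toy NumPy array. This sketch does not call pyhelayers: the numbers are made up, and it assumes the returned array is indexed as [tree, sample, category], matching output.at(t,s,i).

```python
import numpy as np

# Toy counts with shape (trees, samples, categories) = (2, 3, 2).
counts = np.array(
    [
        [[4.0, 1.0], [0.0, 6.0], [2.0, 2.0]],  # tree 0
        [[3.0, 0.0], [1.0, 5.0], [0.0, 0.0]],  # tree 1
    ]
)

num_trees, num_samples, num_categories = counts.shape

# Category counts at the leaf that sample 1 reached in tree 0.
leaf_counts = counts[0, 1]

# Total number of training samples in each reached leaf, per tree and sample.
leaf_sizes = counts.sum(axis=2)
```

In this toy data, sample 2 reaches a leaf in tree 1 that no training sample reached, so its leaf size there is 0.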

predict_proba(self: pyhelayers.CrfPlain, input: numpy.ndarray[numpy.float64]) → numpy.ndarray[numpy.float64]

Predicts, for each input sample, the probability that it belongs to each of the categories 0 and 1.

Parameters:

input – The input samples. input.at(s,f) is the f-th feature of the s-th sample.

Returns:

output.at(s,i) specifies the predicted probability that the s-th sample belongs to the i-th category.

Return type:

tensor of type double

set_category_weights(self: pyhelayers.CrfPlain, category_weights: Tuple[float, float]) → None

Sets the category weights.

Parameters:

category_weights – category_weights[i] specifies the weight given to a tree predicting that the given sample has category i.

set_nan_handling_method(self: pyhelayers.CrfPlain, nan_handling: str, nan_proba: Tuple[float, float] = (0.0, 0.0)) → None

Specifies how to handle cases where both the positive and negative counts are equal to 0.

Parameters:
  • nan_handling (string) – Specifies how “nan” probabilities should be handled. If “nan”, “nan” probabilities are returned. If “data_ratio”, the predicted probability of a category c is set to the weighted ratio of training samples that had the category c (for category weights, see the set_category_weights() description). If “value”, the values specified in “nan_proba” are returned. The default nan handling method of a CrfPlain object is “data_ratio”.

  • nan_proba (Tuple[float, float]) – If “nan_handling” is equal to “value”, specifies the probabilities to return when both positive and negative counts are equal to 0. Defaults to (0, 0).

Raises:

ValueError – If “nan_handling” is not “nan”, “data_ratio” or “value”.
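
The three modes can be sketched in plain NumPy. This is an illustration of the described behavior, not the library's implementation: fill_nan_proba and its data_ratio argument are hypothetical names, and the real “data_ratio” mode derives the ratio from the (weighted) training data rather than taking it as an argument.

```python
import numpy as np


def fill_nan_proba(counts, nan_handling="data_ratio",
                   nan_proba=(0.0, 0.0), data_ratio=(0.5, 0.5)):
    """Turn per-sample (negative, positive) counts into probabilities.

    counts has shape (samples, 2) and is assumed non-negative.
    data_ratio is a hypothetical stand-in for the training-set ratio.
    """
    if nan_handling not in ("nan", "data_ratio", "value"):
        raise ValueError(f"unknown nan_handling: {nan_handling!r}")

    counts = np.asarray(counts, dtype=np.float64)
    totals = counts.sum(axis=1, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        proba = counts / totals  # rows whose counts are all 0 become nan

    empty = totals[:, 0] == 0
    if nan_handling == "data_ratio":
        proba[empty] = data_ratio
    elif nan_handling == "value":
        proba[empty] = nan_proba
    # For "nan", the nan rows are returned unchanged.
    return proba


toy_counts = [[3, 1], [0, 0]]  # the second sample has no counts at all
as_nan = fill_nan_proba(toy_counts, "nan")
as_ratio = fill_nan_proba(toy_counts, "data_ratio", data_ratio=(0.6, 0.4))
as_value = fill_nan_proba(toy_counts, "value", nan_proba=(0.2, 0.8))
```

Only the rows with zero total counts are affected; every other row keeps its ordinary count ratio regardless of the chosen mode.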

set_prediction_method(self: pyhelayers.CrfPlain, pm: str = 'sum') → None

Sets the prediction method.

Parameters:

pm (string) – The prediction method. If pm is “sum”, the counters of the resulting leaves of each tree are summed without normalizing. If pm is “voting”, the counters of the resulting leaves are normalized before summing. Otherwise, a ValueError is raised. Defaults to “sum”.

Raises:

ValueError – If “pm” is not “sum” or “voting”.
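
The difference between the two methods can be sketched on a toy counts array shaped (trees, samples, categories). This is plain NumPy, not the library's code: aggregate_counts is a hypothetical helper, category weights are ignored, and treating an empty leaf as contributing nothing under “voting” is an assumption of this sketch.

```python
import numpy as np


def aggregate_counts(counts, pm="sum"):
    """Combine per-tree leaf counts into one score per sample and category."""
    counts = np.asarray(counts, dtype=np.float64)
    if pm == "sum":
        # Add the raw leaf counters across trees.
        return counts.sum(axis=0)
    if pm == "voting":
        # Normalize each leaf's counters to fractions, then add across trees.
        totals = counts.sum(axis=2, keepdims=True)
        normalized = np.divide(
            counts, totals, out=np.zeros_like(counts), where=totals > 0
        )
        return normalized.sum(axis=0)
    raise ValueError(f"unknown prediction method: {pm!r}")


# One sample, two trees: a large leaf favoring category 0
# and a small leaf favoring category 1.
toy = np.array([[[9.0, 1.0]], [[0.0, 2.0]]])

summed = aggregate_counts(toy, "sum")    # large leaves dominate
voted = aggregate_counts(toy, "voting")  # every tree carries equal weight

summed_choice = int(summed[0].argmax())
voted_choice = int(voted[0].argmax())
```

On this toy input the two methods disagree: summing favors category 0 because the first leaf holds far more training samples, while voting favors category 1 because each tree contributes at most a total of 1.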