CrfPlain#

class CrfPlain#

Plaintext implementation of a Completely Random Forest model.

get_num_input_cols(self: pyhelayers.CrfPlain) → int#

Returns the number of columns that the input samples should have.

Return type:: int

predict(self: pyhelayers.CrfPlain, input: numpy.ndarray[numpy.float64]) → numpy.ndarray[numpy.float64]#

Predicts the categories of the given input samples.

Parameters:: input – The input samples. input.at(s,f) is the f-th feature of the s-th sample.
Returns:: The i-th integer in the returned vector is the predicted category for the i-th input sample.
Return type:: tensor of type double

predict_count_all_trees(self: pyhelayers.CrfPlain, input: numpy.ndarray[numpy.float64]) → numpy.ndarray[numpy.float64]#

For each sample s, tree t and for the leaf l_{t,s} that the sample s has reached on tree t, this function returns the number of training samples of each category that have reached the leaf l_{t,s} during the training phase.

Parameters:: input – The input samples. input.at(s,f) is the f-th feature of the s-th sample.
Returns:: A tensor output in which output.at(t,s,i) specifies the number of training samples of category i that have reached the leaf l_{t,s} (defined above) during the training phase.
Return type:: tensor of type double
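The (t,s,i) indexing described above can be illustrated with a small pure-Python sketch. It does not call pyhelayers: the two "trees" below (tree_a, tree_b) and the helper count_all_trees are hypothetical stand-ins, each mapping a sample to the per-category training counts of the leaf it reaches.

```python
# Hypothetical toy forest, NOT the pyhelayers implementation.
# Each tree returns the leaf counts [category 0, category 1] for a sample.
def tree_a(sample):
    # Split on feature 0.
    return [8.0, 2.0] if sample[0] < 0.5 else [1.0, 9.0]


def tree_b(sample):
    # Split on feature 1.
    return [5.0, 5.0] if sample[1] < 0.0 else [0.0, 4.0]


def count_all_trees(trees, samples):
    """Return counts[t][s][i]: training samples of category i at leaf l_{t,s}."""
    return [[tree(sample) for sample in samples] for tree in trees]


samples = [[0.2, -1.0], [0.9, 3.0]]  # samples[s][f] is feature f of sample s
counts = count_all_trees([tree_a, tree_b], samples)

# First index is the tree, second the sample, third the category.
print(counts[1][0][0])  # count of category 0 at the leaf sample 0 reaches on tree 1
```

The nested list has one entry per tree, each holding one entry per sample, each holding one count per category, which mirrors the three indices of output.at(t,s,i).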

predict_proba(self: pyhelayers.CrfPlain, input: numpy.ndarray[numpy.float64]) → numpy.ndarray[numpy.float64]#

Predicts, for each given input sample, the probability that it belongs to each of the categories 0 and 1.

Parameters:: input – The input samples. input.at(s,f) is the f-th feature of the s-th sample.
Returns:: A tensor output in which output.at(s,i) specifies the predicted probability that the s-th sample belongs to the i-th category.
Return type:: tensor of type double

set_category_weights(self: pyhelayers.CrfPlain, category_weights: Tuple[float, float]) → None#

Sets the category weights.

Parameters:: category_weights – category_weights[i] specifies the weight given to a tree predicting that the given sample has category i.

set_nan_handling_method(self: pyhelayers.CrfPlain, nan_handling: str, nan_proba: Tuple[float, float] = (0.0, 0.0)) → None#

Specifies how to handle cases in which both the positive and the negative counts are equal to 0.

Parameters:

nan_handling (string) – Specifies how “nan” probabilities are handled. If “nan”, “nan” probabilities are returned. If “data_ratio”, the predicted probability of a category c is set to the weighted ratio of training samples that had category c (for category weights, see the set_category_weights() description). If “value”, the values specified in “nan_proba” are returned. The default nan handling method of a CrfPlain object is “data_ratio”.
nan_proba (Tuple[float, float]) – If “nan_handling” is “value”, specifies the probabilities to return when both positive and negative counts are equal to 0. Defaults to (0, 0).

Raises:

ValueError – If “nan_handling” is not “nan”, “data_ratio” or “value”.
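The three handling modes can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's code: resolve_proba is a hypothetical helper, category 1 is assumed to be the "positive" category, and the weighted training-data ratio is assumed to be precomputed and passed in as data_ratio.

```python
import math


def resolve_proba(neg_count, pos_count, nan_handling="data_ratio",
                  nan_proba=(0.0, 0.0), data_ratio=(0.5, 0.5)):
    """Return (p0, p1) for one sample. Hypothetical helper for illustration."""
    if nan_handling not in ("nan", "data_ratio", "value"):
        raise ValueError(f"unknown nan_handling: {nan_handling!r}")
    total = neg_count + pos_count
    if total > 0:
        # Usual case: the probabilities are well defined.
        return (neg_count / total, pos_count / total)
    # Both counts are 0, so the ratio would be 0/0.
    if nan_handling == "nan":
        return (math.nan, math.nan)
    if nan_handling == "data_ratio":
        # Assumed to hold the weighted ratio of training samples per category.
        return tuple(data_ratio)
    # nan_handling == "value": return the user-supplied probabilities.
    return tuple(nan_proba)


print(resolve_proba(1, 3))                                  # well-defined case
print(resolve_proba(0, 0, "value", nan_proba=(0.3, 0.7)))   # user values
print(resolve_proba(0, 0, "data_ratio", data_ratio=(0.25, 0.75)))
```

Only the zero-count case is affected by the chosen mode; samples with at least one non-zero count follow the usual path.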

set_prediction_method(self: pyhelayers.CrfPlain, pm: str = 'sum') → None#

Sets the prediction method.

Parameters:: pm (string) – The prediction method. If pm is “sum”, the counters of the leaves reached in each tree are summed without normalization. If pm is “voting”, the counters of each reached leaf are normalized before summing. Any other value raises a ValueError exception. Defaults to “sum”.
Raises:: ValueError – If “pm” is not “sum” or “voting”.
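The difference between “sum” and “voting” can be seen in a minimal pure-Python sketch that works on counts laid out as counts[t][s][i] (tree, sample, category). The helper combine_counts is hypothetical, and two choices in it are assumptions rather than documented behavior: empty leaves are skipped under “voting”, and the summed values are divided by their total to obtain probabilities.

```python
import math


def combine_counts(counts, pm="sum"):
    """Turn counts[t][s][i] into per-sample probabilities [p0, p1] (sketch)."""
    if pm not in ("sum", "voting"):
        raise ValueError(f"unknown prediction method: {pm!r}")
    num_samples = len(counts[0])
    result = []
    for s in range(num_samples):
        totals = [0.0, 0.0]
        for tree_counts in counts:
            leaf = tree_counts[s]
            if pm == "voting":
                leaf_total = leaf[0] + leaf[1]
                if leaf_total == 0:
                    continue  # assumption: an empty leaf casts no vote
                # Normalize so every tree contributes a total weight of 1.
                leaf = [c / leaf_total for c in leaf]
            totals[0] += leaf[0]
            totals[1] += leaf[1]
        grand = totals[0] + totals[1]
        if grand > 0:
            result.append([totals[0] / grand, totals[1] / grand])
        else:
            result.append([math.nan, math.nan])
    return result


# One sample, two trees: a large leaf favoring category 0 and a small leaf
# that contains only category 1.
counts = [[[8.0, 2.0]], [[0.0, 4.0]]]
sum_proba = combine_counts(counts, "sum")
voting_proba = combine_counts(counts, "voting")
print(sum_proba, voting_proba)
```

In this example “sum” favors category 0 because the larger leaf dominates, while “voting” favors category 1 because each tree counts equally after normalization.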

class CrfPlain : public helayers::Saveable#

Public Functions

inline CrfPlain()#: A constructor. Creates an empty object.

inline CrfPlain(const HeContext &he)#

A constructor from HeContext.

Creates an empty object. Note that the received HeContext is ignored; this constructor exists for consistency with other classes that inherit from Saveable. This consistency is needed to support macros that are used in these classes (see REGISTER_SAVEABLE in Saveable.h, for example).

Parameters:: he – The ignored HeContext.

inline void setCategoryWeights(const std::pair<double, double> &categoryWeights)#

Sets the category weights.

Parameters:: categoryWeights – categoryWeights[i] specifies the weight given to a tree predicting that the given sample has category i.

inline void setPredictionMethod(CrfPredictMethod pm)#

Sets the prediction method.

Parameters:: pm – The prediction method (see also CrfPredictMethod documentation).

inline void setNanHandlingMethod(CrfNanHandlingMethod _nanHandlingMethod, const std::pair<double, double> &_nanProba = {0, 0})#

Sets how samples are predicted when the usual prediction algorithm would result in “nan” probabilities.

See also the CrfNanHandlingMethod documentation.

Parameters:

_nanHandlingMethod – the nan handling method
_nanProba – If _nanHandlingMethod is REPLACE_WITH_USER_VALUE, the values replacing the “nan” probabilities are taken from this pair. _nanProba[c] is the probability of category c.

void predictCountAllTrees(DoubleTensor &output, const DoubleTensor &input) const#

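For each sample s and tree t, let l_{t,s} be the leaf that the sample s reaches on tree t. This function reports the number of training samples of each category that have reached the leaf l_{t,s} during the training phase.

Parameters: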

output – The output prediction will be stored here. output.at(t,s,i) specifies the number of training samples of category i that have reached the leaf l_{t,s} (defined above) during the training phase.
input – The input samples. input.at(s,f) is the f-th feature of the s-th sample.

void predict(DoubleTensor &output, const DoubleTensor &input) const#

Predicts the categories of the input samples given in input tensor.

Parameters:

output – The output prediction will be stored here. output.at(i) specifies the predicted category for the i-th input sample.
input – The input samples. input.at(s,f) is the f-th feature of the s-th sample.

void predictProba(DoubleTensor &output, const DoubleTensor &input) const#

Predicts the probability of the samples in the given input tensor to belong to each of the categories 0 and 1.

Parameters:

output – The output prediction will be stored here. output.at(s,i) specifies the predicted probability of the s-th sample to belong to the i-th category.
input – The input samples. input.at(s,f) is the f-th feature of the s-th sample.

inline size_t getNumInputCols() const#: Returns the number of columns that the input samples should have.

virtual void debugPrint(const std::string &title = "", Verbosity verbosity = VERBOSITY_REGULAR, std::ostream &out = std::cout) const override#

Prints debug info about this CrfPlain class.

For each leaf l, this debug info reports the condition that samples should fulfill to reach the leaf l. Moreover, for each category c, the debug info contains the number of samples of category c that have reached the leaf l during the training phase.

Parameters:

title – Text to add to the print
verbosity – Verbosity level
out – Output stream

Public Static Functions

static void assessResults(const DoubleTensor &predictions, const DoubleTensor &origLabels, int &tp, int &fp, int &tn, int &fn)#

Assess the prediction results.

Parameters:

predictions – The predictions of this CrfPlain. predictions.at(s) should specify the predicted category of the s-th sample.
origLabels – The original labels. origLabels.at(s) should specify the original label of the s-th sample.
tp – Output: the number of true positives.
fp – Output: the number of false positives.
tn – Output: the number of true negatives.
fn – Output: the number of false negatives.
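The four counters can be illustrated with a short pure-Python sketch. The function assess_results below is a hypothetical analogue that returns the counters as a tuple instead of writing them through reference parameters, and it assumes that category 1 is the positive category.

```python
def assess_results(predictions, orig_labels):
    """Return (tp, fp, tn, fn) for binary labels. Illustrative sketch only."""
    if len(predictions) != len(orig_labels):
        raise ValueError("predictions and orig_labels must have equal length")
    tp = fp = tn = fn = 0
    for predicted, label in zip(predictions, orig_labels):
        if predicted == 1 and label == 1:
            tp += 1  # predicted positive, actually positive
        elif predicted == 1:
            fp += 1  # predicted positive, actually negative
        elif label == 1:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, tn, fn


predictions = [1, 1, 0, 0, 1]
orig_labels = [1, 0, 0, 1, 1]
tp, fp, tn, fn = assess_results(predictions, orig_labels)
print(tp, fp, tn, fn)  # 2 1 1 1
```

The counters always sum to the number of samples, which is a convenient sanity check when comparing predictions against the original labels.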