Base class for random forests models from which all specific models are derived using CRTP.
|
| randomForestBase () |
| Default constructor.
|
|
| randomForestBase (const int num_trees, const int num_levels) |
| Full constructor.
|
|
bool | readFromFile (const std::string filename, const int trees_used=-1, const int max_depth_used=-1) |
| Read a pre-trained model in from a file.
|
|
bool | writeToFile (const std::string filename) const |
| Write a trained model to a .tr file to be stored and re-used.
|
|
bool | isValid () const |
| Check whether a forest model is valid.
|
|
void | setFeatureDefinitionString (const std::string &header_str, const std::string &feat_str) |
| Store arbitrary strings that define parameters of the feature extraction process.
|
|
void | getFeatureDefinitionString (std::string &feat_str) const |
| Retrieve a stored feature string.
|
|
template<class TIdIterator , class TLabelIterator , class TFeatureFunctor , class TParameterFunctor > |
void | train (const TIdIterator first_id, const TIdIterator last_id, const TLabelIterator first_label, TFeatureFunctor &&feature_functor, TParameterFunctor &&parameter_functor, const unsigned num_param_combos_to_test, const bool bagging=true, const float bag_proportion=C_DEFAULT_BAGGING_PROPORTION, const bool fit_split_nodes=true, const unsigned min_training_data=C_DEFAULT_MIN_TRAINING_DATA) |
| Train the random forest model on training data.
|
|
template<class TIdIterator , class TOutputIterator , class TFeatureFunctor > |
void | predictDistGroupwise (TIdIterator first_id, const TIdIterator last_id, TOutputIterator out_it, TFeatureFunctor &&feature_functor) const |
| Predict the output distribution for a number of IDs.
|
|
template<class TIdIterator , class TOutputIterator , class TFeatureFunctor > |
void | predictDistSingle (TIdIterator first_id, const TIdIterator last_id, TOutputIterator out_it, TFeatureFunctor &&feature_functor) const |
| Predict the output distribution for a number of IDs.
|
|
template<class TIdIterator , class TLabelIterator , class TOutputIterator , class TFeatureFunctor > |
void | probabilityGroupwise (TIdIterator first_id, const TIdIterator last_id, TLabelIterator label_it, TOutputIterator out_it, const bool single_label, TFeatureFunctor &&feature_functor) const |
| Evaluate the probability of a certain value of the label for a set of data points.
|
|
template<class TIdIterator , class TLabelIterator , class TOutputIterator , class TFeatureFunctor > |
void | probabilitySingle (TIdIterator first_id, const TIdIterator last_id, TLabelIterator label_it, TOutputIterator out_it, const bool single_label, TFeatureFunctor &&feature_functor) const |
| Evaluate the probability of a certain value of the label for a set of data points.
|
|
template<class TIdIterator , class TLabelIterator , class TOutputIterator , class TBinaryFunction , class TFeatureFunctor , class TPDFFunctor > |
void | probabilityGroupwiseBase (TIdIterator first_id, const TIdIterator last_id, TLabelIterator label_it, TOutputIterator out_it, const bool single_label, TBinaryFunction &&binary_function, TFeatureFunctor &&feature_functor, TPDFFunctor &&pdf_functor) const |
| A generalised version of the probabilityGroupwise() function that enables the creation of more general functions.
|
|
template<class TIdIterator , class TLabelIterator , class TOutputIterator , class TBinaryFunction , class TFeatureFunctor , class TPDFFunctor > |
void | probabilitySingleBase (TIdIterator first_id, const TIdIterator last_id, TLabelIterator label_it, TOutputIterator out_it, const bool single_label, TBinaryFunction &&binary_function, TFeatureFunctor &&feature_functor, TPDFFunctor &&pdf_functor) const |
| A generalised version of the probabilitySingle() function that enables the creation of more general functions.
|
|
|
template<class TLabelIterator > |
static double | fastDiscreteEntropy (const std::vector< int > &internal_index, const int n_labels, const TLabelIterator first_label, const std::vector< double > &xlogx_precalc) |
| Calculates the entropy of the discrete labels of a set of data points using an efficient method.
|
|
template<class TLabelIterator > |
static int | fastDiscreteEntropySplit (const std::vector< scoreInternalIndexStruct > &data_structs, const int n_labels, const TLabelIterator first_label, const std::vector< double > &xlogx_precalc, double &best_split_impurity, float &thresh) |
| Find the split in a set of training data that results in the best information gain for discrete labels.
|
|
static std::vector< double > | preCalculateXlogX (const int N) |
| Calculate an array of x*log(x) for integer x.
|
|
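The two entropy helpers listed above suggest a table-lookup scheme. The following is a minimal sketch of one way such a scheme could work, not the library's implementation: the function names, the table size (indices 0 to N inclusive) and the use of natural logarithms are all assumptions made for illustration. It relies on the identity H = log(n) - (1/n) * sum_k c_k*log(c_k), where c_k is the count of label k and n is the number of data points.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: table of x*log(x) for x = 0..N inclusive (N >= 0),
// with 0*log(0) taken as 0 (its limit).
std::vector<double> xlogxTableSketch(const int N)
{
    std::vector<double> table(static_cast<std::size_t>(N) + 1, 0.0);
    for (int x = 1; x <= N; ++x)
        table[x] = x * std::log(static_cast<double>(x));
    return table;
}

// Hypothetical helper: entropy (in nats) of integer labels in [0, n_labels).
// The table must cover every index up to labels.size().
double discreteEntropySketch(const std::vector<int>& labels, const int n_labels,
                             const std::vector<double>& xlogx)
{
    if (labels.empty())
        return 0.0;
    // Count how often each label occurs.
    std::vector<int> counts(n_labels, 0);
    for (const int l : labels)
        ++counts[l];
    // Sum c_k*log(c_k) by table lookup instead of calling log() per label.
    double sum = 0.0;
    for (const int c : counts)
        sum += xlogx[c];
    const double n = static_cast<double>(labels.size());
    return std::log(n) - sum / n;
}
```

With this arrangement the logarithm is evaluated once per table entry rather than once per candidate split, which is presumably the kind of saving the precalculated array is meant to provide.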
template<class TDerived, class TLabel, class TNodeDist, class TOutputDist, unsigned TNumParams>
class canopy::randomForestBase< TDerived, TLabel, TNodeDist, TOutputDist, TNumParams >
Base class for random forests models from which all specific models are derived using CRTP.
This class implements the basic training and testing routines, and some utility functions that may be used by derived classes. This class cannot be used directly.
- Template Parameters
-
TDerived | The type of the derived random forests model (e.g. classifier, regressor). Having the derived class as a template parameter implements the curiously recurring template pattern (CRTP) idiom, which allows for static polymorphism. |
TLabel | The type of the label that the model is used to predict. This is the output type of the forest model, for example an integer for a classifier or a float for a 1D regressor. |
TNodeDist | The type of the node distribution, which is the distribution stored at each leaf node. The node distribution must have certain characteristics. |
TOutputDist | The type of the output distribution, which is the type of the distribution predicted by the forest model. This may be the same as or different from TNodeDist. The output distribution must have certain characteristics. |
TNumParams | The number of parameters used by the features callback. |
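As an illustration of the CRTP idiom described for TDerived, here is a minimal generic sketch. The class and method names (forestBaseSketch, classifierSketch, regressorSketch, modelName) are hypothetical and are not part of this library; the sketch only shows how a base class template can call into its derived class without virtual functions.

```cpp
#include <string>

// Hypothetical base class: shared logic lives here and is written once.
template <class TDerived>
class forestBaseSketch
{
public:
    std::string describe() const
    {
        // The cast is valid because TDerived inherits from forestBaseSketch<TDerived>.
        // The call is resolved at compile time (static polymorphism).
        return "model: " + static_cast<const TDerived*>(this)->modelName();
    }
};

// Hypothetical derived models supply only the model-specific part.
class classifierSketch : public forestBaseSketch<classifierSketch>
{
public:
    std::string modelName() const { return "classifier"; }
};

class regressorSketch : public forestBaseSketch<regressorSketch>
{
public:
    std::string modelName() const { return "regressor"; }
};
```

Calling describe() on a classifierSketch dispatches to classifierSketch::modelName() with no virtual table lookup, which is the benefit the idiom is usually chosen for.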
template<class TDerived , class TLabel , class TNodeDist, class TOutputDist , unsigned TNumParams>
template<class TIdIterator , class TFeatureFunctor >
void canopy::randomForestBase< TDerived, TLabel, TNodeDist, TOutputDist, TNumParams >::findLeavesGroupwise |
( |
TIdIterator |
first_id, |
|
|
const TIdIterator |
last_id, |
|
|
const int |
treenum, |
|
|
std::vector< const TNodeDist * > & |
leaves, |
|
|
TFeatureFunctor && |
feature_functor |
|
) |
| const |
|
protected |
Function to query a single tree model with a set of data points and store a pointer to the leaf distribution that each reaches.
This is a basic operation that is used by higher-level processes. Using this method, the features needed by a single node are requested from the feature functor for all the IDs with a single function call. This involves some overhead, but may permit efficiencies resulting from calculating multiple features at once.
- Template Parameters
-
TIdIterator | Type of the iterator to the IDs. Must be a random access iterator and dereference to the TId type expected by the feature functor. |
TFeatureFunctor | The type of the feature functor object. Must meet the specifications for a groupwise feature functor object, meaning it must define operator() with a certain form. |
- Parameters
-
first_id | Iterator to the ID of the first data point for which the leaf distribution is to be found. |
last_id | Iterator to the ID of the last data point for which the leaf distribution is to be found. |
treenum | Index of the tree to use. |
leaves | After the function, this array contains pointers to the leaf distribution reached by the corresponding elements in the ID list. Expects to be pre-allocated to the correct size. |
feature_functor | The feature functor object to be used as a callback to calculate the features. Must be safe to call from multiple threads simultaneously. |
template<class TDerived , class TLabel , class TNodeDist , class TOutputDist , unsigned TNumParams>
template<class TIdIterator , class TOutputIterator , class TFeatureFunctor >
void canopy::randomForestBase< TDerived, TLabel, TNodeDist, TOutputDist, TNumParams >::predictDistGroupwise |
( |
TIdIterator |
first_id, |
|
|
const TIdIterator |
last_id, |
|
|
TOutputIterator |
out_it, |
|
|
TFeatureFunctor && |
feature_functor |
|
) |
| const |
Predict the output distribution for a number of IDs.
This function uses the forest model to predict the full output distribution for each of a number of data points, where each data point is identified by an ID variable.
These ID variables are passed in as a pair of iterators pointing to the first and last IDs to be processed. The output distribution for each of these IDs is placed in a second container accessed by iterators.
In this version of the function, the features needed by a single node are requested from the feature functor for all the IDs with a single function call. This involves some overhead, but may permit efficiencies resulting from calculating multiple features at once.
Uses OpenMP to query the multiple tree models in parallel.
- Template Parameters
-
TIdIterator | Type of the iterator to the IDs. Must be a random access iterator and dereference to the TId type expected by the feature functor. |
TOutputIterator | Type of the iterator to the output distributions. Must be a forward output iterator that dereferences to TOutputDist. |
TFeatureFunctor | The type of the feature functor object. Must meet the specifications for a groupwise feature functor object, meaning it must define operator() with a certain form. |
- Parameters
-
first_id | Iterator to the first ID whose output is to be predicted. |
last_id | Iterator to the last ID whose output is to be predicted. |
out_it | Iterator to the output distribution corresponding to the first ID. The container of output distributions must already exist, and contain enough elements for all of the IDs between first_id and last_id. At the end of this function, the output distributions in this container relate to the corresponding elements of the id container. |
feature_functor | The feature functor object to be used as a callback to calculate the features. Must be safe to call from multiple threads simultaneously. |
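To make the groupwise calling convention concrete, the sketch below shows a functor that computes one feature for a whole range of IDs in a single call. This page does not state the required form of operator(), so the signature, the struct name, the meaning of the parameters and the treatment of last_id as a past-the-end iterator are all assumptions; the library's functor specification is the authority on the real form.

```cpp
#include <array>
#include <vector>

// Hypothetical groupwise feature functor (assumed signature).
// Each ID indexes a row of a feature table; params[0] selects the column.
struct groupwiseFeatureSketch
{
    const std::vector<std::array<float, 3>>* table;

    // One call writes the feature value for every ID in [first_id, last_id).
    template <class TIdIterator, class TOutIterator>
    void operator()(TIdIterator first_id, const TIdIterator last_id,
                    const std::array<int, 1>& params, TOutIterator out_it) const
    {
        for (; first_id != last_id; ++first_id, ++out_it)
            *out_it = (*table)[*first_id][params[0]];
    }
};
```

Because the functor only reads shared data and holds no mutable state, concurrent calls from several threads do not interfere with each other, which is one simple way to meet the thread-safety requirement.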
template<class TDerived , class TLabel , class TNodeDist , class TOutputDist , unsigned TNumParams>
template<class TIdIterator , class TOutputIterator , class TFeatureFunctor >
void canopy::randomForestBase< TDerived, TLabel, TNodeDist, TOutputDist, TNumParams >::predictDistSingle |
( |
TIdIterator |
first_id, |
|
|
const TIdIterator |
last_id, |
|
|
TOutputIterator |
out_it, |
|
|
TFeatureFunctor && |
feature_functor |
|
) |
| const |
Predict the output distribution for a number of IDs.
This function uses the forest model to predict the full output distribution for each of a number of data points, where each data point is identified by an ID variable.
These ID variables are passed in as a pair of iterators pointing to the first and last IDs to be processed. The output distribution for each of these IDs is placed in a second container accessed by iterators.
In this version of the function, the features needed by a single node are requested from the feature functor one-by-one.
Uses OpenMP to query the multiple tree models in parallel.
- Template Parameters
-
TIdIterator | Type of the iterator to the IDs. Must be a random access iterator and dereference to the TId type expected by the feature functor. |
TOutputIterator | Type of the iterator to the output distributions. Must be a forward output iterator that dereferences to TOutputDist. |
TFeatureFunctor | The type of the feature functor object. Must meet the specifications for a single feature functor object, meaning it must define operator() with a certain form. |
- Parameters
-
first_id | Iterator to the first ID whose output is to be predicted. |
last_id | Iterator to the last ID whose output is to be predicted. |
out_it | Iterator to the output distribution corresponding to the first ID. The container of output distributions must already exist, and contain enough elements for all of the IDs between first_id and last_id. At the end of this function, the output distributions in this container relate to the corresponding elements of the id container. |
feature_functor | The feature functor object to be used as a callback to calculate the features. Must be safe to call from multiple threads simultaneously. |
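For contrast with the groupwise variant, a single feature functor is asked for one ID at a time. The sketch below is illustrative only: the operator() signature, the struct name and the helper evaluateOneByOneSketch are assumptions, since this page does not give the required form.

```cpp
#include <array>
#include <vector>

// Hypothetical single feature functor (assumed signature).
// Each ID indexes a row of a feature table; params[0] selects the column.
struct singleFeatureSketch
{
    const std::vector<std::array<float, 3>>* table;

    // Returns the feature value for exactly one ID.
    float operator()(const int id, const std::array<int, 1>& params) const
    {
        return (*table)[id][params[0]];
    }
};

// Hypothetical caller: requests the feature one-by-one for each ID.
inline std::vector<float> evaluateOneByOneSketch(const std::vector<int>& ids,
                                                 const std::array<int, 1>& params,
                                                 const singleFeatureSketch& functor)
{
    std::vector<float> values;
    values.reserve(ids.size());
    for (const int id : ids)
        values.push_back(functor(id, params));
    return values;
}
```

This form is simpler to write but makes one call per data point, so any per-call setup cost is paid for every ID.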
template<class TDerived , class TLabel , class TNodeDist , class TOutputDist , unsigned TNumParams>
template<class TIdIterator , class TLabelIterator , class TOutputIterator , class TFeatureFunctor >
void canopy::randomForestBase< TDerived, TLabel, TNodeDist, TOutputDist, TNumParams >::probabilityGroupwise |
( |
TIdIterator |
first_id, |
|
|
const TIdIterator |
last_id, |
|
|
TLabelIterator |
label_it, |
|
|
TOutputIterator |
out_it, |
|
|
const bool |
single_label, |
|
|
TFeatureFunctor && |
feature_functor |
|
) |
| const |
Evaluate the probability of a certain value of the label for a set of data points.
This function uses the forest model to evaluate the probability of a given value of the label (output) variable for a number of data points, where each data point is identified by an ID variable.
These ID variables are passed in as a pair of iterators pointing to the first and last IDs to be processed. The value of the label for which the probability should be evaluated is passed in as a second iterator. The probability of the label for each of these IDs is placed in a third container accessed by iterators.
In this version of the function, the features needed by a single node are requested from the feature functor for all the IDs with a single function call. This involves some overhead, but may permit efficiencies resulting from calculating multiple features at once.
Uses OpenMP to query the multiple tree models in parallel.
- Template Parameters
-
TIdIterator | Type of the iterator to the IDs. Must be a random access iterator and dereference to the TId type expected by the feature functor. |
TLabelIterator | Type of the iterator to the labels. Must be a random access iterator and dereference to the TLabel type of the forest (or to something trivially convertible to that type). |
TOutputIterator | Type of the iterator to the output. Must be a forward output iterator that dereferences to a type that supports assignment to float. |
TFeatureFunctor | The type of the feature functor object. Must meet the specifications for a groupwise feature functor object, meaning it must define operator() with a certain form. |
- Parameters
-
first_id | Iterator to the ID of the first data point for which the probability of the label is to be evaluated. |
last_id | Iterator to the ID of the last data point for which the probability of the label is to be evaluated. |
label_it | Iterator to the label variable whose probability is to be evaluated. |
out_it | Iterator to the output probability value for the first ID. The container of output values must already exist, and contain enough elements for all of the IDs between first_id and last_id. At the end of this function, the output values in this container relate to the corresponding elements of the id container. |
single_label | If true, the value of the label whose probability is evaluated is the same for all the data points. This means that the label_it iterator is never advanced. If false, the value of the label is not necessarily the same for all data points, and the label_it iterator is advanced for each data point to give the value of the label to use. |
feature_functor | The feature functor object to be used as a callback to calculate the features. Must be safe to call from multiple threads simultaneously. |
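The single_label convention can be sketched with a stand-alone loop. This is not the library's code: evaluateLabelsSketch and its pdf callback are hypothetical stand-ins for the forest, and last_id is assumed to be a past-the-end iterator. The sketch only shows when label_it is advanced.

```cpp
#include <vector>

// Hypothetical stand-in for the forest query: pdf(id, label) returns a probability.
template <class TIdIterator, class TLabelIterator, class TOutputIterator, class TPdf>
void evaluateLabelsSketch(TIdIterator first_id, const TIdIterator last_id,
                          TLabelIterator label_it, TOutputIterator out_it,
                          const bool single_label, TPdf pdf)
{
    for (; first_id != last_id; ++first_id, ++out_it)
    {
        *out_it = pdf(*first_id, *label_it);
        // With single_label the same label is reused for every ID,
        // so the label container needs only one element.
        if (!single_label)
            ++label_it;
    }
}
```

With single_label set to true, the caller can pass an iterator to a single label value; with false, the label container must hold one label per ID.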
template<class TDerived , class TLabel , class TNodeDist , class TOutputDist , unsigned TNumParams>
template<class TIdIterator , class TLabelIterator , class TOutputIterator , class TBinaryFunction , class TFeatureFunctor , class TPDFFunctor >
void canopy::randomForestBase< TDerived, TLabel, TNodeDist, TOutputDist, TNumParams >::probabilityGroupwiseBase |
( |
TIdIterator |
first_id, |
|
|
const TIdIterator |
last_id, |
|
|
TLabelIterator |
label_it, |
|
|
TOutputIterator |
out_it, |
|
|
const bool |
single_label, |
|
|
TBinaryFunction && |
binary_function, |
|
|
TFeatureFunctor && |
feature_functor, |
|
|
TPDFFunctor && |
pdf_functor |
|
) |
| const |
A generalised version of the probabilityGroupwise() function that enables the creation of more general functions.
A generalised version of the probabilityGroupwise() function. There are two generalisations:
- The pdf value may be calculated from the node distribution in some way other than calling the pdf() method. This enables, for example, accessing one distribution from a node distribution that contains multiple distributions over different variables. This behaviour is controlled by the pdf_functor object.
- The output probability value may be used for something other than simple assignment to a variable. This may be used, for example, to use the output value to update some other variable (via multiplication, addition, etc.) in a single step without having to store results in a temporary array. This behaviour is controlled by the binary_function functor object.
Unless otherwise specified, the behaviour is the same as the probabilityGroupwise() function.
- Template Parameters
-
TIdIterator | Type of the iterator to the IDs. Must be a random access iterator and dereference to the TId type expected by the feature functor. |
TLabelIterator | Type of the iterator to the labels. Must be a random access iterator and dereference to the TLabel type of the forest (or to something trivially convertible to that type). |
TOutputIterator | Type of the iterator to the output. Must be a forward output iterator that dereferences to a type that supports assignment to float. |
TBinaryFunction | The type of the binary_function argument. Must be a function object that has an operator() of the form float operator()(TOutput, float) where TOutput is the type that TOutputIterator dereferences to. |
TFeatureFunctor | The type of the feature functor object. Must meet the specifications for a groupwise feature functor object, meaning it must define operator() with a certain form. |
TPDFFunctor | The type of the pdf_functor argument. Must be a function object that has an operator() of the form float operator()(TNodeDist*, TLabel, TId). |
- Parameters
-
first_id | Iterator to the ID of the first data point for which the probability of the label is to be evaluated. |
last_id | Iterator to the ID of the last data point for which the probability of the label is to be evaluated. |
label_it | Iterator to the label variable whose probability is to be evaluated. |
out_it | Iterator to the output probability value for the first ID. The container of output values must already exist, and contain enough elements for all of the IDs between first_id and last_id. At the end of this function, the output values in this container relate to the corresponding elements of the id container. |
single_label | If true, the value of the label whose probability is evaluated is the same for all the data points. This means that the label_it iterator is never advanced. If false, the value of the label is not necessarily the same for all data points, and the label_it iterator is advanced for each data point to give the value of the label to use. |
binary_function | A function object that takes the current value of the output variable (first argument) and the forest's predicted probability value (second argument) and returns the value that is then assigned to the output variable. |
feature_functor | The feature functor object to be used as a callback to calculate the features. Must be safe to call from multiple threads simultaneously. |
pdf_functor | A function object that takes a pointer to the leaf distribution reached by the forest (first argument), a label value (second argument), and an ID (third argument) and returns the value used as the pdf for that leaf distribution. |
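The role of binary_function can be illustrated with a short stand-alone sketch. The helpers combineSketch and multiplyIntoSketch are hypothetical, and the vector of probabilities stands in for the values the forest would predict; only the documented contract is modelled, namely that the functor receives the current output value and the predicted probability and returns the value assigned back to the output.

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-in for the update step: each output element is replaced by
// binary_function(current output value, predicted probability).
template <class TOutputIterator, class TBinaryFunction>
void combineSketch(const std::vector<float>& probs, TOutputIterator out_it,
                   TBinaryFunction&& binary_function)
{
    for (const float p : probs)
    {
        *out_it = binary_function(*out_it, p);
        ++out_it;
    }
}

// Multiplies each probability into a running product held in 'running',
// with no temporary array of probabilities needed by the caller.
inline void multiplyIntoSketch(const std::vector<float>& probs, std::vector<float>& running)
{
    combineSketch(probs, running.begin(), std::multiplies<float>());
}
```

Passing std::plus<float>() instead would accumulate a sum, and a functor that ignores its first argument and returns the second would reproduce plain assignment.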
template<class TDerived , class TLabel , class TNodeDist , class TOutputDist , unsigned TNumParams>
template<class TIdIterator , class TLabelIterator , class TOutputIterator , class TFeatureFunctor >
void canopy::randomForestBase< TDerived, TLabel, TNodeDist, TOutputDist, TNumParams >::probabilitySingle |
( |
TIdIterator |
first_id, |
|
|
const TIdIterator |
last_id, |
|
|
TLabelIterator |
label_it, |
|
|
TOutputIterator |
out_it, |
|
|
const bool |
single_label, |
|
|
TFeatureFunctor && |
feature_functor |
|
) |
| const |
Evaluate the probability of a certain value of the label for a set of data points.
This function uses the forest model to evaluate the probability of a given value of the label (output) variable for a number of data points, where each data point is identified by an ID variable.
These ID variables are passed in as a pair of iterators pointing to the first and last IDs to be processed. The value of the label for which the probability should be evaluated is passed in as a second iterator. The probability of the label for each of these IDs is placed in a third container accessed by iterators.
In this version of the function, the features needed by a single node are requested from the feature functor one-by-one.
Uses OpenMP to query the multiple tree models in parallel.
- Template Parameters
-
TIdIterator | Type of the iterator to the IDs. Must be a random access iterator and dereference to the TId type expected by the feature functor. |
TLabelIterator | Type of the iterator to the labels. Must be a random access iterator and dereference to the TLabel type of the forest (or to something trivially convertible to that type). |
TOutputIterator | Type of the iterator to the output. Must be a forward output iterator that dereferences to a type that supports assignment to float. |
TFeatureFunctor | The type of the feature functor object. Must meet the specifications for a single feature functor object, meaning it must define operator() with a certain form. |
- Parameters
-
first_id | Iterator to the ID of the first data point for which the probability of the label is to be evaluated. |
last_id | Iterator to the ID of the last data point for which the probability of the label is to be evaluated. |
label_it | Iterator to the label variable whose probability is to be evaluated. |
out_it | Iterator to the output probability value for the first ID. The container of output values must already exist, and contain enough elements for all of the IDs between first_id and last_id. At the end of this function, the output values in this container relate to the corresponding elements of the id container. |
single_label | If true, the value of the label whose probability is evaluated is the same for all the data points. This means that the label_it iterator is never advanced. If false, the value of the label is not necessarily the same for all data points, and the label_it iterator is advanced for each data point to give the value of the label to use. |
feature_functor | The feature functor object to be used as a callback to calculate the features. Must be safe to call from multiple threads simultaneously. |
template<class TDerived , class TLabel , class TNodeDist , class TOutputDist , unsigned TNumParams>
template<class TIdIterator , class TLabelIterator , class TOutputIterator , class TBinaryFunction , class TFeatureFunctor , class TPDFFunctor >
void canopy::randomForestBase< TDerived, TLabel, TNodeDist, TOutputDist, TNumParams >::probabilitySingleBase |
( |
TIdIterator |
first_id, |
|
|
const TIdIterator |
last_id, |
|
|
TLabelIterator |
label_it, |
|
|
TOutputIterator |
out_it, |
|
|
const bool |
single_label, |
|
|
TBinaryFunction && |
binary_function, |
|
|
TFeatureFunctor && |
feature_functor, |
|
|
TPDFFunctor && |
pdf_functor |
|
) |
| const |
A generalised version of the probabilitySingle() function that enables the creation of more general functions.
A generalised version of the probabilitySingle() function. There are two generalisations:
- The pdf value may be calculated from the node distribution in some way other than calling the pdf() method. This enables, for example, accessing one distribution from a node distribution that contains multiple distributions over different variables. This behaviour is controlled by the pdf_functor object.
- The output probability value may be used for something other than simple assignment to a variable. This may be used, for example, to use the output value to update some other variable (via multiplication, addition, etc.) in a single step without having to store results in a temporary array. This behaviour is controlled by the binary_function functor object.
Unless otherwise specified, the behaviour is the same as the probabilitySingle() function.
- Template Parameters
-
TIdIterator | Type of the iterator to the IDs. Must be a random access iterator and dereference to the TId type expected by the feature functor. |
TLabelIterator | Type of the iterator to the labels. Must be a random access iterator and dereference to the TLabel type of the forest (or to something trivially convertible to that type). |
TOutputIterator | Type of the iterator to the output. Must be a forward output iterator that dereferences to a type that supports assignment to float. |
TBinaryFunction | The type of the binary_function argument. Must be a function object that has an operator() of the form float operator()(TOutput, float) where TOutput is the type that TOutputIterator dereferences to. |
TFeatureFunctor | The type of the feature functor object. Must meet the specifications for a single feature functor, meaning it must define operator() with a certain form. |
TPDFFunctor | The type of the pdf_functor argument. Must be a function object that has an operator() of the form float operator()(TNodeDist*, TLabel, TId). |
- Parameters
-
first_id | Iterator to the ID of the first data point for which the probability of the label is to be evaluated. |
last_id | Iterator to the ID of the last data point for which the probability of the label is to be evaluated. |
label_it | Iterator to the label variable whose probability is to be evaluated. |
out_it | Iterator to the output probability value for the first ID. The container of output values must already exist, and contain enough elements for all of the IDs between first_id and last_id. At the end of this function, the output values in this container relate to the corresponding elements of the id container. |
single_label | If true, the value of the label whose probability is evaluated is the same for all the data points. This means that the label_it iterator is never advanced. If false, the value of the label is not necessarily the same for all data points, and the label_it iterator is advanced for each data point to give the value of the label to use. |
binary_function | A function object that takes the current value of the output variable (first argument) and the forest's predicted probability value (second argument) and returns the value that is then assigned to the output variable. |
feature_functor | The feature functor object to be used as a callback to calculate the features. Must be safe to call from multiple threads simultaneously. |
pdf_functor | A function object that takes a pointer to the leaf distribution reached by the forest (first argument), a label value (second argument), and an ID (third argument) and returns the value used as the pdf for that leaf distribution. |
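A pdf_functor of the documented form float operator()(TNodeDist*, TLabel, TId) might look like the sketch below, which picks one distribution out of a node distribution that holds two. The types pairedDistSketch and colourPdfSketch are hypothetical, TLabel and TId are both taken to be int, and the pointer is taken as const on the assumption that prediction does not modify the leaf.

```cpp
#include <vector>

// Hypothetical node distribution holding two discrete distributions
// over different variables.
struct pairedDistSketch
{
    std::vector<float> shape_probs;
    std::vector<float> colour_probs;
};

// Hypothetical pdf functor: evaluates only the colour distribution,
// ignoring the shape distribution and the ID.
struct colourPdfSketch
{
    float operator()(const pairedDistSketch* dist, const int label, const int /*id*/) const
    {
        return dist->colour_probs[label];
    }
};
```

The ID argument is unused here, but it would allow the returned value to depend on per-data-point information if a model needed that.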
template<class TDerived , class TLabel , class TNodeDist , class TOutputDist , unsigned TNumParams>
template<class TIdIterator , class TLabelIterator , class TFeatureFunctor , class TParameterFunctor >
void canopy::randomForestBase< TDerived, TLabel, TNodeDist, TOutputDist, TNumParams >::train |
( |
const TIdIterator |
first_id, |
|
|
const TIdIterator |
last_id, |
|
|
const TLabelIterator |
first_label, |
|
|
TFeatureFunctor && |
feature_functor, |
|
|
TParameterFunctor && |
parameter_functor, |
|
|
const unsigned |
num_param_combos_to_test, |
|
|
const bool |
bagging = true , |
|
|
const float |
bag_proportion = C_DEFAULT_BAGGING_PROPORTION , |
|
|
const bool |
train_split_nodes = true , |
|
|
const unsigned |
min_training_data = C_DEFAULT_MIN_TRAINING_DATA |
|
) |
| |
Train the random forest model on training data.
This function trains the random forest model to produce a valid model that may be used for predictions or stored for future use. It takes iterators pointing to the IDs of the training data and the corresponding label variables, and functors to generate parameters of the feature functor and evaluate the features.
This function uses OpenMP to train the trees in parallel threads.
- Template Parameters
-
TIdIterator | Type of the iterator used to access the training IDs. Must be a random access iterator that dereferences to the ID type expected by feature_functor. |
TLabelIterator | Type of the iterator used to access the label variables. Must be a random access iterator that dereferences to type TLabel. |
TFeatureFunctor | Type of the feature_functor parameter. Must be a groupwise feature functor object with an operator() of a specified form. |
TParameterFunctor | Type of the parameter_functor parameter. Must be a parameter generator functor object with an operator() of the form void operator()(std::array<int,TNumParams>&) |
- Parameters
-
first_id | Iterator to the ID of the first element in the training list. |
last_id | Iterator to the ID of the last element in the training list. |
first_label | Iterator to the label of the first element in the training list. This iterator will be advanced to find the labels of the subsequent IDs. |
feature_functor | The function object that should be used to evaluate the features when training the split nodes. Must be safe to call from multiple threads simultaneously. |
parameter_functor | The function object that should be called to generate a random set of split node parameters for use in the feature_functor. Should take a std::array<int,TNumParams> by reference and populate the elements with a valid combination of randomly chosen parameters. Must be safe to call from multiple threads simultaneously. |
num_param_combos_to_test | The number of parameter combinations to test when training each split node. |
bagging | If true, a random subset of the training data are used to train each tree. If false, the full set of training data are used to train each tree. Default: true. |
bag_proportion | Proportion of the training data in the bag used to train each tree if bagging is true. If bagging is false, this parameter is ignored. If the value is not in the range 0 to 1, the training procedure will fail immediately. Default: C_DEFAULT_BAGGING_PROPORTION . |
train_split_nodes | If true, a node distribution is fitted at every node in the forest, not only at the leaf nodes. This is typically slightly more time-consuming and results in a larger .tr file, but allows the trained model to be tested using a smaller depth than it was trained at. If false, node distributions are fitted only at the leaf nodes. Default: true. |
min_training_data | The threshold number of training data points in a node below which a leaf node is declared during training. Default: C_DEFAULT_MIN_TRAINING_DATA . |
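A parameter generator of the documented form void operator()(std::array<int,TNumParams>&) could be sketched as follows, here with TNumParams = 2. The struct name, the meaning of the two parameters and their ranges are hypothetical; the use of a thread_local random engine is one possible way to satisfy the requirement that the functor be safe to call from multiple threads, not necessarily the approach the library's own examples take.

```cpp
#include <array>
#include <random>

// Hypothetical parameter generator for a feature with two integer parameters:
// an offset in [-max_offset, max_offset] and a channel index in [0, 2].
struct paramGeneratorSketch
{
    int max_offset;

    void operator()(std::array<int, 2>& params) const
    {
        // Each thread gets its own engine, so concurrent calls share no mutable state.
        thread_local std::mt19937 engine{std::random_device{}()};
        std::uniform_int_distribution<int> offset(-max_offset, max_offset);
        std::uniform_int_distribution<int> channel(0, 2);
        params[0] = offset(engine);
        params[1] = channel(engine);
    }
};
```

During training, a functor like this would be called num_param_combos_to_test times per split node, and each resulting parameter combination would be handed to the feature functor for evaluation.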