mlpack master
The BinaryNumericSplit class implements the numeric feature splitting strategy devised by Gama, Rocha, and Medas.
Public Types
typedef BinaryNumericSplitInfo< ObservationType > | SplitInfo
The splitting information required by the BinaryNumericSplit.
Public Member Functions
BinaryNumericSplit (const size_t numClasses)
Create the BinaryNumericSplit object with the given number of classes.
BinaryNumericSplit (const size_t numClasses, const BinaryNumericSplit &other)
Create the BinaryNumericSplit object with the given number of classes, using information from the given other split for other parameters.
void | EvaluateFitnessFunction (double &bestFitness, double &secondBestFitness)
Given the points seen so far, evaluate the fitness function, returning the best possible gain of a binary split.
size_t | MajorityClass () const
The majority class of the points seen so far.
double | MajorityProbability () const
The probability of the majority class given the points seen so far.
size_t | NumChildren () const
template<typename Archive >
void | Serialize (Archive &ar, const unsigned int)
Serialize the object.
void | Split (arma::Col< size_t > &childMajorities, SplitInfo &splitInfo)
Given that a split should happen, return the majority classes of the (two) children and an initialized SplitInfo object.
void | Train (ObservationType value, const size_t label)
Train on the given value with the given label.
Private Attributes
ObservationType | bestSplit
A cached best split point.
arma::Col< size_t > | classCounts
The number of points seen so far in each class (for majority calculations).
bool | isAccurate
If true, the cached best split point is accurate (that is, we have not seen any more samples since we calculated it).
std::multimap< ObservationType, size_t > | sortedElements
The elements seen so far, in sorted order.
The BinaryNumericSplit class implements the numeric feature splitting strategy devised by Gama, Rocha, and Medas in the following paper:
This splitting procedure builds a binary tree on the points it has seen so far; EvaluateFitnessFunction() then finds the best possible split in O(n) time, where n is the number of samples seen so far. Every split of this type produces exactly two children (one for values greater than or equal to the split point, and one for values less than the split point). Because the samples are kept in a std::multimap (sortedElements), each Train() call costs O(log n) for the insertion rather than O(1).
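The bookkeeping described above can be sketched without mlpack. The following is a minimal illustration only, assuming double observations; ToySplitState is a hypothetical name, and its members merely mirror the private attributes listed on this page rather than reproduce the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for the state kept by a streaming numeric split.
// This is an illustration, not mlpack code.
struct ToySplitState
{
  // Observed (value, label) pairs; std::multimap keeps them ordered by value.
  std::multimap<double, std::size_t> sortedElements;
  // Number of points seen so far in each class.
  std::vector<std::size_t> classCounts;
  // True only while a previously computed best split is still valid.
  bool isAccurate = false;

  explicit ToySplitState(const std::size_t numClasses) :
      classCounts(numClasses, 0) { }

  // Record one labelled observation.
  void Train(const double value, const std::size_t label)
  {
    assert(label < classCounts.size());
    sortedElements.emplace(value, label);
    ++classCounts[label];
    isAccurate = false; // Any cached best split is now stale.
  }
};
```

Values can arrive in any order; iterating over sortedElements afterwards always visits them in ascending order, which is what makes a single-pass split evaluation possible.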
Template Parameters
FitnessFunction | Fitness function to use for calculating gain. |
ObservationType | Type of observation used by this dimension. |
Definition at line 47 of file binary_numeric_split.hpp.
typedef BinaryNumericSplitInfo<ObservationType> mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::SplitInfo
The splitting information required by the BinaryNumericSplit.
Definition at line 51 of file binary_numeric_split.hpp.
mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::BinaryNumericSplit(const size_t numClasses)
Create the BinaryNumericSplit object with the given number of classes.
numClasses | Number of classes in dataset. |
mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::BinaryNumericSplit(const size_t numClasses, const BinaryNumericSplit< FitnessFunction, ObservationType > & other)
Create the BinaryNumericSplit object with the given number of classes, using information from the given other split for other parameters.
In this case, there are no other parameters, but this function is required by the HoeffdingTree class.
void mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::EvaluateFitnessFunction(double & bestFitness, double & secondBestFitness)
Given the points seen so far, evaluate the fitness function, returning the best possible gain of a binary split.
Note that this takes O(n) time, where n is the number of points seen so far, so it can become slow once many points have been seen.
The fitness of the best possible split is stored in bestFitness, and the fitness of the second best possible split is stored in secondBestFitness.
bestFitness | Fitness function value for best possible split. |
secondBestFitness | Fitness function value for second best possible split. |
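A linear sweep of this kind can be sketched as follows. This is an illustrative sketch under stated assumptions, not the library's code: ToyGini and ToyEvaluate are hypothetical names, Gini impurity stands in for the FitnessFunction template parameter, observations are assumed to be double, and a gain of 0 is assumed to mean "no useful split found". The extra bestSplit output is also an addition for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Gini impurity of a vector of class counts (hypothetical fitness helper).
double ToyGini(const std::vector<std::size_t>& counts)
{
  std::size_t total = 0;
  for (const std::size_t c : counts)
    total += c;
  if (total == 0)
    return 0.0;

  double impurity = 1.0;
  for (const std::size_t c : counts)
  {
    const double p = static_cast<double>(c) / static_cast<double>(total);
    impurity -= p * p;
  }
  return impurity;
}

// Sweep the sorted points once, tracking the best and second best gain of a
// binary split "value < splitPoint" versus "value >= splitPoint".
void ToyEvaluate(const std::multimap<double, std::size_t>& sorted,
                 const std::size_t numClasses,
                 double& bestGain,
                 double& secondBestGain,
                 double& bestSplit)
{
  bestGain = 0.0;
  secondBestGain = 0.0;
  bestSplit = 0.0;

  // Start with every point on the right side of the split.
  std::vector<std::size_t> left(numClasses, 0), right(numClasses, 0);
  for (const auto& kv : sorted)
  {
    assert(kv.second < numClasses);
    ++right[kv.second];
  }

  const double n = static_cast<double>(sorted.size());
  const double parent = ToyGini(right);
  std::size_t nLeft = 0;
  double previous = 0.0;

  for (const auto& kv : sorted)
  {
    // Only a change in value gives a new candidate split point.
    if (nLeft > 0 && kv.first != previous)
    {
      const double wLeft = nLeft / n;
      const double gain = parent - wLeft * ToyGini(left) -
          (1.0 - wLeft) * ToyGini(right);
      if (gain > bestGain)
      {
        secondBestGain = bestGain;
        bestGain = gain;
        bestSplit = kv.first;
      }
      else if (gain > secondBestGain)
      {
        secondBestGain = gain;
      }
    }

    // Move the current point from the right side to the left side.
    ++left[kv.second];
    --right[kv.second];
    ++nLeft;
    previous = kv.first;
  }
}
```

Each point is visited once, so the sweep is linear in the number of points; recomputing the impurity at every candidate adds a factor of the number of classes in this simplified version.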
size_t mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::MajorityClass() const
The majority class of the points seen so far.
Referenced by mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::NumChildren().
double mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::MajorityProbability() const
The probability of the majority class given the points seen so far.
Referenced by mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::NumChildren().
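Both majority queries can be answered from per-class counts alone. The sketch below is an illustration with hypothetical helper names (ToyMajorityClass, ToyMajorityProbability) operating on a plain std::vector; returning 0 when no points have been seen, and breaking ties in favor of the lowest class index, are assumptions and not documented behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index of the class with the largest count; ties go to the lowest index.
std::size_t ToyMajorityClass(const std::vector<std::size_t>& classCounts)
{
  assert(!classCounts.empty());
  std::size_t best = 0;
  for (std::size_t i = 1; i < classCounts.size(); ++i)
  {
    if (classCounts[i] > classCounts[best])
      best = i;
  }
  return best;
}

// Fraction of all points seen so far that belong to the majority class.
double ToyMajorityProbability(const std::vector<std::size_t>& classCounts)
{
  std::size_t total = 0;
  for (const std::size_t c : classCounts)
    total += c;
  if (total == 0)
    return 0.0; // Assumed convention when nothing has been seen yet.

  const std::size_t majority = classCounts[ToyMajorityClass(classCounts)];
  return static_cast<double>(majority) / static_cast<double>(total);
}
```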
size_t mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::NumChildren() const [inline]
Definition at line 93 of file binary_numeric_split.hpp.
References mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::MajorityClass(), mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::MajorityProbability(), mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::Serialize(), and mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::Split().
void mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::Serialize(Archive & ar, const unsigned int)
Serialize the object.
Referenced by mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::NumChildren().
void mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::Split(arma::Col< size_t > & childMajorities, SplitInfo & splitInfo)
Given that a split should happen, return the majority classes of the (two) children and an initialized SplitInfo object.
childMajorities | Majority classes of the children after the split. |
splitInfo | Split information. |
Referenced by mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::NumChildren().
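Once a split point is chosen, the majority class of each child follows from partitioning the stored points. The sketch below is illustrative only: ToyChildMajorities is a hypothetical helper, the split point is passed in explicitly, and the ordering of the children (index 0 for values less than the split point, index 1 for values greater than or equal to it) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <vector>

// Majority class of the two children produced by a binary split.
std::vector<std::size_t> ToyChildMajorities(
    const std::multimap<double, std::size_t>& sorted,
    const std::size_t numClasses,
    const double splitPoint)
{
  assert(numClasses > 0);

  // counts[child][label]: number of points with that label in that child.
  std::vector<std::vector<std::size_t>> counts(
      2, std::vector<std::size_t>(numClasses, 0));
  for (const auto& kv : sorted)
  {
    assert(kv.second < numClasses);
    const std::size_t child = (kv.first < splitPoint) ? 0 : 1;
    ++counts[child][kv.second];
  }

  // Pick the most frequent label in each child (first one wins on ties).
  std::vector<std::size_t> majorities(2, 0);
  for (std::size_t child = 0; child < 2; ++child)
  {
    const auto it = std::max_element(counts[child].begin(),
                                     counts[child].end());
    majorities[child] = static_cast<std::size_t>(
        std::distance(counts[child].begin(), it));
  }
  return majorities;
}
```

An empty child reports class 0 in this sketch, simply because all of its counts are tied at zero.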
void mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::Train(ObservationType value, const size_t label)
Train on the given value with the given label.
value | The value to train on. |
label | The label to train on. |
ObservationType mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::bestSplit [private]
A cached best split point.
Definition at line 120 of file binary_numeric_split.hpp.
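arma::Col< size_t > mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::classCounts [private]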
The number of points seen so far in each class (for majority calculations).
Definition at line 117 of file binary_numeric_split.hpp.
bool mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::isAccurate [private]
If true, the cached best split point is accurate (that is, we have not seen any more samples since we calculated it).
Definition at line 123 of file binary_numeric_split.hpp.
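std::multimap< ObservationType, size_t > mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::sortedElements [private]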
The elements seen so far, in sorted order.
Definition at line 115 of file binary_numeric_split.hpp.