mlpack
master
The HoeffdingNumericSplit class implements the numeric feature splitting strategy alluded to by Domingos and Hulten in the following paper:
Public Types
typedef NumericSplitInfo< ObservationType > SplitInfo
    The splitting information type required by the HoeffdingNumericSplit.
Public Member Functions
HoeffdingNumericSplit (const size_t numClasses, const size_t bins=10, const size_t observationsBeforeBinning=100)
    Create the HoeffdingNumericSplit class, and specify some basic parameters about how the binning should take place.
HoeffdingNumericSplit (const size_t numClasses, const HoeffdingNumericSplit &other)
    Create the HoeffdingNumericSplit class, using the parameters from the given other split object.
size_t Bins () const
    Return the number of bins.
void EvaluateFitnessFunction (double &bestFitness, double &secondBestFitness) const
    Evaluate the fitness function given what has been calculated so far.
size_t MajorityClass () const
    Return the majority class.
double MajorityProbability () const
    Return the probability of the majority class.
size_t NumChildren () const
    Return the number of children if this node splits on this feature.
template<typename Archive > void Serialize (Archive &ar, const unsigned int)
    Serialize the object.
void Split (arma::Col< size_t > &childMajorities, SplitInfo &splitInfo) const
    Return the majority class of each child to be created, if a split on this dimension was performed.
void Train (ObservationType value, const size_t label)
    Train the HoeffdingNumericSplit on the given observed value (remember that this object only cares about the information for a single feature, not an entire point).
Private Attributes
size_t bins
    The number of bins.
arma::Col< size_t > labels
    This holds the labels of the points before binning.
arma::Col< ObservationType > observations
    Before binning, this holds the points we have seen so far.
size_t observationsBeforeBinning
    The number of observations we must see before binning.
size_t samplesSeen
    The number of samples we have seen so far.
arma::Col< ObservationType > splitPoints
    The split points for the binning (length bins - 1).
arma::Mat< size_t > sufficientStatistics
    After binning, this contains the sufficient statistics.
The HoeffdingNumericSplit class implements the numeric feature splitting strategy alluded to by Domingos and Hulten in the following paper:
The strategy alluded to is very simple: we discretize the numeric features that we see. The difficulty is that points arrive one at a time, so we do not know in advance the range of values a feature takes, and therefore cannot fix the bin boundaries up front. This class uses a fixed number of bins and proposes only one candidate split per feature, with one child per bin. The binning strategy is simple: the split caches the points (and their labels) seen so far, and when the number of points reaches a predefined threshold (observationsBeforeBinning), the range between the minimum and maximum cached values is divided into bins of equal width. After that, each new point only updates the per-bin class counts (the sufficient statistics), and splitting proceeds in the same way as with the categorical splits. This is a simple and stupid strategy, so don't expect it to be the best possible thing you can do.
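The strategy above can be sketched in isolation. The following is a minimal, self-contained illustration under our own assumptions, not mlpack's implementation: the class name `EqualWidthBinner`, its methods, the `counts[bin][label]` layout, and the rule that a value equal to a boundary goes to the upper bin are all hypothetical choices made for this example. It caches `(value, label)` pairs until `observationsBeforeBinning` points have been seen, then places `bins - 1` equally spaced boundaries between the cached minimum and maximum and switches to per-bin class counts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the equal-width binning strategy (not mlpack code).
// Sufficient statistics are stored as counts[bin][label].
class EqualWidthBinner
{
 public:
  EqualWidthBinner(std::size_t numClasses, std::size_t bins,
                   std::size_t observationsBeforeBinning) :
      numClasses(numClasses), bins(bins),
      observationsBeforeBinning(observationsBeforeBinning)
  {
    assert(numClasses > 0 && bins > 0 && observationsBeforeBinning > 0);
  }

  // Observe one value of this feature together with the point's label.
  void Train(double value, std::size_t label)
  {
    assert(label < numClasses);
    if (Binned())
    {
      ++counts[BinOf(value)][label];
      return;
    }

    // Before binning: cache the point until the threshold is reached.
    cachedValues.push_back(value);
    cachedLabels.push_back(label);
    if (cachedValues.size() == observationsBeforeBinning)
      CreateBins();
  }

  bool Binned() const { return !counts.empty(); }
  const std::vector<double>& SplitPoints() const { return splitPoints; }
  const std::vector<std::vector<std::size_t>>& Counts() const
  { return counts; }

 private:
  // Bin index = number of split points <= value, so values below the cached
  // minimum land in bin 0 and values above the cached maximum in the last bin.
  std::size_t BinOf(double value) const
  {
    return static_cast<std::size_t>(
        std::upper_bound(splitPoints.begin(), splitPoints.end(), value) -
        splitPoints.begin());
  }

  // Divide [min, max] of the cached values into `bins` equal-width bins and
  // move the cached points into the per-bin class counts.
  void CreateBins()
  {
    const auto range =
        std::minmax_element(cachedValues.begin(), cachedValues.end());
    const double min = *range.first;
    const double width = (*range.second - min) / static_cast<double>(bins);
    for (std::size_t i = 1; i < bins; ++i)
      splitPoints.push_back(min + static_cast<double>(i) * width);

    counts.assign(bins, std::vector<std::size_t>(numClasses, 0));
    for (std::size_t i = 0; i < cachedValues.size(); ++i)
      ++counts[BinOf(cachedValues[i])][cachedLabels[i]];
    cachedValues.clear();
    cachedLabels.clear();
  }

  std::size_t numClasses, bins, observationsBeforeBinning;
  std::vector<double> cachedValues;
  std::vector<std::size_t> cachedLabels;
  std::vector<double> splitPoints;               // length bins - 1
  std::vector<std::vector<std::size_t>> counts;  // counts[bin][label]
};
```

For example, with `numClasses = 2`, `bins = 4`, and `observationsBeforeBinning = 4`, training on the values 0, 1, 2, 8 produces the boundaries 2, 4, 6; a later value of 5 is then counted in the third bin (index 2).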
Template Parameters
FitnessFunction: Fitness function to use for calculating gain.
ObservationType: Type of observations in this dimension.
Definition at line 53 of file hoeffding_numeric_split.hpp.
typedef NumericSplitInfo<ObservationType> mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::SplitInfo
The splitting information type required by the HoeffdingNumericSplit.
Definition at line 57 of file hoeffding_numeric_split.hpp.
mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::HoeffdingNumericSplit ( const size_t numClasses, const size_t bins = 10, const size_t observationsBeforeBinning = 100 )
Create the HoeffdingNumericSplit class, and specify some basic parameters about how the binning should take place.
Parameters
numClasses: Number of classes.
bins: Number of bins.
observationsBeforeBinning: Number of points to see before binning is performed.
mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::HoeffdingNumericSplit ( const size_t numClasses, const HoeffdingNumericSplit< FitnessFunction, ObservationType > & other )
Create the HoeffdingNumericSplit class, using the parameters from the given other split object.
size_t mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::Bins ( ) const [inline]
Return the number of bins.
Definition at line 120 of file hoeffding_numeric_split.hpp.
References mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::bins, and mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::Serialize().
void mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::EvaluateFitnessFunction ( double & bestFitness, double & secondBestFitness ) const
Evaluate the fitness function given what has been calculated so far.
In this case, if binning has not yet been performed, bestFitness is set to 0 (i.e., no gain). Because this split can only split one possible way, secondBestFitness (the fitness function value for the second-best possible split) is always set to 0.
Parameters
bestFitness: Value of the fitness function for the best possible split.
secondBestFitness: Value of the fitness function for the second-best possible split (always 0 for this split).
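As an illustration of how the two output values could be filled in, here is a hedged sketch that uses Gini impurity reduction as the fitness function. The choice of Gini impurity, the function names `GiniImpurity` and `EvaluateGiniGain`, the `binned` flag, and the `counts[bin][label]` layout are assumptions for this example; the real class delegates the gain computation to its `FitnessFunction` template parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Gini impurity 1 - sum_c p_c^2 of a vector of per-class counts.
double GiniImpurity(const std::vector<std::size_t>& classCounts)
{
  double total = 0.0;
  for (std::size_t c : classCounts)
    total += static_cast<double>(c);
  if (total == 0.0)
    return 0.0;

  double sumSquares = 0.0;
  for (std::size_t c : classCounts)
  {
    const double p = static_cast<double>(c) / total;
    sumSquares += p * p;
  }
  return 1.0 - sumSquares;
}

// Hypothetical sketch: fitness = impurity of the parent minus the weighted
// impurity of the children, with one child per bin (counts[bin][label]).
void EvaluateGiniGain(const std::vector<std::vector<std::size_t>>& counts,
                      bool binned,
                      double& bestFitness,
                      double& secondBestFitness)
{
  // Only one way to split, so there is never a second-best candidate.
  secondBestFitness = 0.0;
  bestFitness = 0.0;
  if (!binned || counts.empty())
    return; // no gain before binning

  // Class counts of the parent = counts summed over all bins.
  std::vector<std::size_t> parent(counts[0].size(), 0);
  double total = 0.0;
  for (const auto& bin : counts)
  {
    assert(bin.size() == parent.size()); // every bin tracks the same classes
    for (std::size_t c = 0; c < bin.size(); ++c)
    {
      parent[c] += bin[c];
      total += static_cast<double>(bin[c]);
    }
  }
  if (total == 0.0)
    return;

  double childImpurity = 0.0;
  for (const auto& bin : counts)
  {
    double binTotal = 0.0;
    for (std::size_t c : bin)
      binTotal += static_cast<double>(c);
    childImpurity += (binTotal / total) * GiniImpurity(bin);
  }
  bestFitness = GiniImpurity(parent) - childImpurity;
}
```

With `counts = {{2, 0}, {0, 2}}` (two pure bins) the sketch reports a gain of 0.5, while `{{1, 1}, {1, 1}}` gives 0 because the bins do not separate the classes at all.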
size_t mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::MajorityClass ( ) const
Return the majority class.
Referenced by mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::NumChildren().
double mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::MajorityProbability ( ) const
Return the probability of the majority class.
Referenced by mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::NumChildren().
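Both the majority class and its probability can be derived from per-class counts. A minimal sketch, assuming the counts are available as a plain vector indexed by label (for instance the cached labels tallied before binning, or the bin counts summed after binning) and that ties go to the lowest label; the struct `MajorityInfo` and the function `ComputeMajority` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: the majority class and its empirical probability.
struct MajorityInfo
{
  std::size_t majorityClass;
  double probability;
};

MajorityInfo ComputeMajority(const std::vector<std::size_t>& classCounts)
{
  assert(!classCounts.empty()); // at least one class

  // max_element returns the first maximum, so ties go to the lowest label.
  const auto it = std::max_element(classCounts.begin(), classCounts.end());
  MajorityInfo info{static_cast<std::size_t>(it - classCounts.begin()), 0.0};

  std::size_t total = 0;
  for (std::size_t c : classCounts)
    total += c;
  if (total > 0)
    info.probability = static_cast<double>(*it) / static_cast<double>(total);
  return info;
}
```

For counts `{3, 5, 2}`, the sketch returns class 1 with probability 0.5.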
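size_t mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::NumChildren ( ) const [inline]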
Return the number of children if this node splits on this feature.
Definition at line 106 of file hoeffding_numeric_split.hpp.
References mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::bins, mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::MajorityClass(), mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::MajorityProbability(), and mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::Split().
template<typename Archive > void mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::Serialize ( Archive & ar, const unsigned int )
Serialize the object.
Referenced by mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::Bins().
void mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::Split ( arma::Col< size_t > & childMajorities, SplitInfo & splitInfo ) const
Return the majority class of each child to be created, if a split on this dimension was performed.
The majority classes are stored in childMajorities, and the split information object is written to splitInfo.
Referenced by mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::NumChildren().
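Since each bin becomes one child, the majority class of a child can be read off as the label with the largest count in that bin. The sketch below shows that per-bin argmax; the `counts[bin][label]` layout, the tie-breaking rule (lowest label wins), and the name `ChildMajorities` are assumptions for illustration, not the mlpack internals.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: the majority class of each would-be child is the
// label with the largest count in that child's bin (counts[bin][label]).
std::vector<std::size_t> ChildMajorities(
    const std::vector<std::vector<std::size_t>>& counts)
{
  std::vector<std::size_t> majorities;
  majorities.reserve(counts.size());
  for (const auto& bin : counts)
  {
    assert(!bin.empty()); // one count per class
    // First maximum wins, so a bin with all-zero counts defaults to label 0.
    const auto it = std::max_element(bin.begin(), bin.end());
    majorities.push_back(static_cast<std::size_t>(it - bin.begin()));
  }
  return majorities;
}
```

For the bins `{{2, 1}, {0, 3}, {0, 0}}` this yields the child majorities `{0, 1, 0}`.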
void mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::Train ( ObservationType value, const size_t label )
Train the HoeffdingNumericSplit on the given observed value (remember that this object only cares about the information for a single feature, not an entire point).
Parameters
value: Value in the dimension that this HoeffdingNumericSplit refers to.
label: Label of the given point.
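Because each split object sees only one feature, a caller has to keep one such object per dimension and feed each of them the matching coordinate of a point, together with the point's label. The sketch below illustrates that routing; `FeatureStats` is a hypothetical stand-in for a per-feature split object and `TrainAllDimensions` is a hypothetical helper, neither is part of the mlpack API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one per-feature split object: it only ever sees
// a single value of its own dimension plus the point's label.
struct FeatureStats
{
  std::vector<double> values;
  std::vector<std::size_t> labels;

  void Train(double value, std::size_t label)
  {
    values.push_back(value);
    labels.push_back(label);
  }
};

// Route each coordinate of a point to the object responsible for it.
void TrainAllDimensions(std::vector<FeatureStats>& perDimension,
                        const std::vector<double>& point,
                        std::size_t label)
{
  assert(perDimension.size() == point.size());
  for (std::size_t d = 0; d < point.size(); ++d)
    perDimension[d].Train(point[d], label);
}
```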
size_t mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::bins [private]
The number of bins.
Definition at line 135 of file hoeffding_numeric_split.hpp.
Referenced by mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::Bins(), and mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::NumChildren().
arma::Col<size_t> mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::labels [private]
This holds the labels of the points before binning.
Definition at line 130 of file hoeffding_numeric_split.hpp.
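arma::Col<ObservationType> mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::observations [private]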
Before binning, this holds the points we have seen so far.
Definition at line 128 of file hoeffding_numeric_split.hpp.
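size_t mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::observationsBeforeBinning [private]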
The number of observations we must see before binning.
Definition at line 137 of file hoeffding_numeric_split.hpp.
size_t mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::samplesSeen [private]
The number of samples we have seen so far.
Definition at line 139 of file hoeffding_numeric_split.hpp.
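arma::Col<ObservationType> mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::splitPoints [private]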
The split points for the binning (length bins - 1).
Definition at line 133 of file hoeffding_numeric_split.hpp.
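With `bins - 1` sorted split points, finding the bin that a new value falls into is a binary search. The sketch assumes that a value equal to a split point belongs to the upper bin and that values outside the cached range fall into the first or last bin; mlpack's exact boundary convention may differ, and `BinIndex` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: with sorted split points p_0 < ... < p_{bins-2}, the
// bin of `value` is the number of split points that are <= value, which is
// always in [0, bins - 1].
std::size_t BinIndex(const std::vector<double>& splitPoints, double value)
{
  assert(std::is_sorted(splitPoints.begin(), splitPoints.end()));
  return static_cast<std::size_t>(
      std::upper_bound(splitPoints.begin(), splitPoints.end(), value) -
      splitPoints.begin());
}
```

With the split points `{2, 4, 6}` (four bins), the values 1, 2, 5, and 100 map to bins 0, 1, 2, and 3.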
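arma::Mat<size_t> mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::sufficientStatistics [private]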
After binning, this contains the sufficient statistics.
Definition at line 142 of file hoeffding_numeric_split.hpp.