mlpack master
Public Types | |
| typedef NumericSplitInfo< ObservationType > | SplitInfo |
| The splitting information type required by the HoeffdingNumericSplit. | |
Public Member Functions | |
| HoeffdingNumericSplit (const size_t numClasses, const size_t bins=10, const size_t observationsBeforeBinning=100) | |
| Create the HoeffdingNumericSplit class, and specify some basic parameters about how the binning should take place. | |
| HoeffdingNumericSplit (const size_t numClasses, const HoeffdingNumericSplit &other) | |
| Create the HoeffdingNumericSplit class, using the parameters from the given other split object. | |
| size_t | Bins () const |
| Return the number of bins. | |
| void | EvaluateFitnessFunction (double &bestFitness, double &secondBestFitness) const |
| Evaluate the fitness function given what has been calculated so far. | |
| size_t | MajorityClass () const |
| Return the majority class. | |
| double | MajorityProbability () const |
| Return the probability of the majority class. | |
| size_t | NumChildren () const |
| Return the number of children if this node splits on this feature. | |
| template<typename Archive > | |
| void | Serialize (Archive &ar, const unsigned int) |
| Serialize the object. | |
| void | Split (arma::Col< size_t > &childMajorities, SplitInfo &splitInfo) const |
| Return the majority class of each child to be created, if a split on this dimension was performed. | |
| void | Train (ObservationType value, const size_t label) |
| Train the HoeffdingNumericSplit on the given observed value (remember that this object only cares about the information for a single feature, not an entire point). | |
Private Attributes | |
| size_t | bins |
| The number of bins. | |
| arma::Col< size_t > | labels |
| This holds the labels of the points before binning. | |
| arma::Col< ObservationType > | observations |
| Before binning, this holds the points we have seen so far. | |
| size_t | observationsBeforeBinning |
| The number of observations we must see before binning. | |
| size_t | samplesSeen |
| The number of samples we have seen so far. | |
| arma::Col< ObservationType > | splitPoints |
| The split points for the binning (length bins - 1). | |
| arma::Mat< size_t > | sufficientStatistics |
| After binning, this contains the sufficient statistics. | |
The HoeffdingNumericSplit class implements the numeric feature splitting strategy alluded to by Domingos and Hulten in the following paper:
The strategy alluded to is very simple: we discretize the numeric features that we see. The difficulty is that we don't know in advance how many bins we have. This class only makes binary splits, and has a maximum number of bins. The binning strategy is simple: the split caches the minimum and maximum value of points seen so far, and when the number of points hits a predefined threshold, the cached minimum-maximum range is split equally into bins, and splitting proceeds in the same way as with the categorical splits. This is a deliberately naive strategy, so don't expect it to be the best possible thing you can do.
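The equal-width binning described above can be sketched in a few lines. This is an illustration only, not the class's actual implementation: the helper names EqualWidthSplitPoints and BinIndex are hypothetical, double stands in for ObservationType, and the input is assumed to be non-empty with bins >= 1.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: divide the [min, max] range of the given values into
// `bins` equal-width bins and return the bins - 1 interior split points.
// Assumes `values` is non-empty and bins >= 1.
std::vector<double> EqualWidthSplitPoints(const std::vector<double>& values,
                                          const std::size_t bins)
{
  const auto range = std::minmax_element(values.begin(), values.end());
  const double min = *range.first;
  const double max = *range.second;
  const double binWidth = (max - min) / static_cast<double>(bins);

  std::vector<double> splitPoints(bins - 1);
  for (std::size_t i = 0; i < splitPoints.size(); ++i)
    splitPoints[i] = min + static_cast<double>(i + 1) * binWidth;
  return splitPoints;
}

// Hypothetical helper: index of the bin that `value` falls into. A value
// equal to a split point is assigned to the lower bin.
std::size_t BinIndex(const std::vector<double>& splitPoints,
                     const double value)
{
  std::size_t bin = 0;
  while (bin < splitPoints.size() && value > splitPoints[bin])
    ++bin;
  return bin;
}
```

Values that arrive later and fall outside the original minimum-maximum range simply land in the first or last bin, which is one reason this strategy is only a rough approximation.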
| FitnessFunction | Fitness function to use for calculating gain. |
| ObservationType | Type of observations in this dimension. |
Definition at line 53 of file hoeffding_numeric_split.hpp.
| typedef NumericSplitInfo<ObservationType> mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::SplitInfo |
The splitting information type required by the HoeffdingNumericSplit.
Definition at line 57 of file hoeffding_numeric_split.hpp.
| mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::HoeffdingNumericSplit | ( | const size_t | numClasses, |
| const size_t | bins = 10, |
| const size_t | observationsBeforeBinning = 100 |
| ) |
Create the HoeffdingNumericSplit class, and specify some basic parameters about how the binning should take place.
| numClasses | Number of classes. |
| bins | Number of bins. |
| observationsBeforeBinning | Number of points to see before binning is performed. |
| mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::HoeffdingNumericSplit | ( | const size_t | numClasses, |
| const HoeffdingNumericSplit< FitnessFunction, ObservationType > & | other | ||
| ) |
Create the HoeffdingNumericSplit class, using the parameters from the given other split object.
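| size_t mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::Bins | ( | ) | const | inline |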
Return the number of bins.
Definition at line 120 of file hoeffding_numeric_split.hpp.
References mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::bins, and mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::Serialize().
| void mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::EvaluateFitnessFunction | ( | double & | bestFitness, |
| double & | secondBestFitness | ||
| ) | const |
Evaluate the fitness function given what has been calculated so far.
In this case, if binning has not yet been performed, bestFitness will be set to 0 (i.e., no gain). Because this split can only split one possible way, secondBestFitness (the fitness function for the second best possible split) will always be set to 0.
| bestFitness | Value of the fitness function for the best possible split. |
| secondBestFitness | Value of the fitness function for the second best possible split (always 0 for this split). |
| size_t mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::MajorityClass | ( | ) | const |
Return the majority class.
Referenced by mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::NumChildren().
| double mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::MajorityProbability | ( | ) | const |
Return the probability of the majority class.
Referenced by mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::NumChildren().
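| size_t mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::NumChildren | ( | ) | const | inline |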
Return the number of children if this node splits on this feature.
Definition at line 106 of file hoeffding_numeric_split.hpp.
References mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::bins, mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::MajorityClass(), mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::MajorityProbability(), and mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::Split().
| void mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::Serialize | ( | Archive & | ar, |
| const unsigned int | | |
| ) |
Serialize the object.
Referenced by mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::Bins().
| void mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::Split | ( | arma::Col< size_t > & | childMajorities, |
| SplitInfo & | splitInfo | ||
| ) | const |
Return the majority class of each child to be created, if a split on this dimension was performed.
Also create the split object.
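A minimal sketch of how per-child majorities could be read off a class-by-bin count table, assuming one child per bin. The Counts alias and the function ChildMajorities are hypothetical and do not reproduce the class's own code; the tie-breaking rule is also an assumption.

```cpp
#include <cstddef>
#include <vector>

// counts[c][b] = number of points of class c observed in bin b
// (a hypothetical stand-in for the sufficient statistics).
using Counts = std::vector<std::vector<std::size_t>>;

// For each bin, return the class with the largest count. Ties go to the
// lowest class index, so an empty bin reports class 0.
std::vector<std::size_t> ChildMajorities(const Counts& counts)
{
  const std::size_t numBins = counts.empty() ? 0 : counts[0].size();
  std::vector<std::size_t> majorities(numBins, 0);
  for (std::size_t b = 0; b < numBins; ++b)
    for (std::size_t c = 1; c < counts.size(); ++c)
      if (counts[c][b] > counts[majorities[b]][b])
        majorities[b] = c;
  return majorities;
}
```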
Referenced by mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::NumChildren().
| void mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::Train | ( | ObservationType | value, |
| const size_t | label | ||
| ) |
Train the HoeffdingNumericSplit on the given observed value (remember that this object only cares about the information for a single feature, not an entire point).
| value | Value in the dimension that this HoeffdingNumericSplit refers to. |
| label | Label of the given point. |
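The two training phases (buffer observations first, then count into fixed bins) can be sketched as a small standalone class. This is a simplified illustration under stated assumptions, not mlpack's implementation: StreamingBinner and its members are hypothetical names, double stands in for ObservationType, bins >= 1 is assumed, and binning is triggered as soon as the buffer reaches the threshold.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the two-phase training behaviour.
class StreamingBinner
{
 public:
  StreamingBinner(const std::size_t numClasses,
                  const std::size_t bins,
                  const std::size_t observationsBeforeBinning) :
      threshold(observationsBeforeBinning),
      binned(false),
      splitPoints(bins - 1),
      counts(numClasses, std::vector<std::size_t>(bins, 0))
  { }

  // Observe one value of this feature together with its class label.
  void Train(const double value, const std::size_t label)
  {
    if (binned)
    {
      ++counts[label][BinOf(value)];
      return;
    }

    // Phase 1: buffer the raw observation.
    observations.push_back(value);
    labels.push_back(label);
    if (observations.size() < threshold)
      return;

    // Threshold reached: fix equal-width split points over [min, max] ...
    const auto range =
        std::minmax_element(observations.begin(), observations.end());
    const double min = *range.first;
    const double width = (*range.second - min) /
        static_cast<double>(splitPoints.size() + 1);
    for (std::size_t i = 0; i < splitPoints.size(); ++i)
      splitPoints[i] = min + static_cast<double>(i + 1) * width;

    // ... then replay the buffer into the per-class, per-bin counts.
    for (std::size_t i = 0; i < observations.size(); ++i)
      ++counts[labels[i]][BinOf(observations[i])];
    observations.clear();
    labels.clear();
    binned = true;
  }

  bool Binned() const { return binned; }
  const std::vector<std::vector<std::size_t>>& Counts() const
  {
    return counts;
  }

 private:
  // Index of the bin that `value` falls into.
  std::size_t BinOf(const double value) const
  {
    std::size_t bin = 0;
    while (bin < splitPoints.size() && value > splitPoints[bin])
      ++bin;
    return bin;
  }

  std::size_t threshold;
  bool binned;
  std::vector<double> observations;
  std::vector<std::size_t> labels;
  std::vector<double> splitPoints;
  std::vector<std::vector<std::size_t>> counts;
};
```

Once the split points are fixed, each further call costs a short scan over the split points, and the raw observations no longer need to be stored.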
| size_t mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::bins | private |
The number of bins.
Definition at line 135 of file hoeffding_numeric_split.hpp.
Referenced by mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::Bins(), and mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::NumChildren().
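| arma::Col< size_t > mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::labels | private |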
This holds the labels of the points before binning.
Definition at line 130 of file hoeffding_numeric_split.hpp.
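| arma::Col< ObservationType > mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::observations | private |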
Before binning, this holds the points we have seen so far.
Definition at line 128 of file hoeffding_numeric_split.hpp.
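| size_t mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::observationsBeforeBinning | private |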
The number of observations we must see before binning.
Definition at line 137 of file hoeffding_numeric_split.hpp.
| size_t mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::samplesSeen | private |
The number of samples we have seen so far.
Definition at line 139 of file hoeffding_numeric_split.hpp.
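| arma::Col< ObservationType > mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::splitPoints | private |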
The split points for the binning (length bins - 1).
Definition at line 133 of file hoeffding_numeric_split.hpp.
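| arma::Mat< size_t > mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::sufficientStatistics | private |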
After binning, this contains the sufficient statistics.
Definition at line 142 of file hoeffding_numeric_split.hpp.