mlpack  master
mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType > Class Template Reference

The HoeffdingNumericSplit class implements the numeric feature splitting strategy alluded to by Domingos and Hulten in "Mining High-Speed Data Streams" (KDD '00).

Public Types

typedef NumericSplitInfo< ObservationType > SplitInfo
 The splitting information type required by the HoeffdingNumericSplit.

Public Member Functions

 HoeffdingNumericSplit (const size_t numClasses, const size_t bins=10, const size_t observationsBeforeBinning=100)
 Create the HoeffdingNumericSplit class, and specify some basic parameters about how the binning should take place.

 HoeffdingNumericSplit (const size_t numClasses, const HoeffdingNumericSplit &other)
 Create the HoeffdingNumericSplit class, using the parameters from the given other split object.

size_t Bins () const
 Return the number of bins.

void EvaluateFitnessFunction (double &bestFitness, double &secondBestFitness) const
 Evaluate the fitness function given what has been calculated so far.

size_t MajorityClass () const
 Return the majority class.

double MajorityProbability () const
 Return the probability of the majority class.

size_t NumChildren () const
 Return the number of children if this node splits on this feature.

template<typename Archive >
void Serialize (Archive &ar, const unsigned int)
 Serialize the object.

void Split (arma::Col< size_t > &childMajorities, SplitInfo &splitInfo) const
 Return the majority class of each child to be created, if a split on this dimension was performed.

void Train (ObservationType value, const size_t label)
 Train the HoeffdingNumericSplit on the given observed value (remember that this object only cares about the information for a single feature, not an entire point).
 

Private Attributes

size_t bins
 The number of bins.

arma::Col< size_t > labels
 This holds the labels of the points before binning.

arma::Col< ObservationType > observations
 Before binning, this holds the points we have seen so far.

size_t observationsBeforeBinning
 The number of observations we must see before binning.

size_t samplesSeen
 The number of samples we have seen so far.

arma::Col< ObservationType > splitPoints
 The split points for the binning (length bins - 1).

arma::Mat< size_t > sufficientStatistics
 After binning, this contains the sufficient statistics.
 

Detailed Description

template<typename FitnessFunction, typename ObservationType = double>
class mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >

The HoeffdingNumericSplit class implements the numeric feature splitting strategy alluded to by Domingos and Hulten in the following paper:

@inproceedings{domingos2000mining,
title={{Mining High-Speed Data Streams}},
author={Domingos, P. and Hulten, G.},
year={2000},
booktitle={Proceedings of the Sixth ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining (KDD '00)},
pages={71--80}
}

The strategy alluded to is very simple: we discretize the numeric features that we see. In this case, though, we do not know in advance how many bins we have, which makes things a little difficult. This class only makes binary splits, and has a maximum number of bins. The binning strategy is as follows: the split stores the points seen so far, and when the number of points reaches a predefined threshold (observationsBeforeBinning), the range between the minimum and maximum value seen is divided into bins of equal width. Splitting then proceeds in the same way as with the categorical splits. This strategy is deliberately naive, so do not expect it to be the best possible approach.
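
The equal-width binning step described above can be sketched in a few lines. This is a minimal illustration only, not the mlpack implementation: the helper names EqualWidthSplitPoints and BinIndex are hypothetical, std::vector stands in for the Armadillo types, and the convention that a value exactly on a split point goes to the lower bin is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: place (bins - 1) equally spaced split points strictly
// inside the [min, max] range of the buffered observations.
std::vector<double> EqualWidthSplitPoints(
    const std::vector<double>& observations,
    const std::size_t bins)
{
  assert(!observations.empty() && bins >= 1);
  const auto range =
      std::minmax_element(observations.begin(), observations.end());
  const double minValue = *range.first;
  const double maxValue = *range.second;
  const double binWidth = (maxValue - minValue) / static_cast<double>(bins);

  std::vector<double> splitPoints(bins - 1);
  for (std::size_t i = 0; i < splitPoints.size(); ++i)
    splitPoints[i] = minValue + static_cast<double>(i + 1) * binWidth;
  return splitPoints;
}

// Hypothetical helper: index of the bin that a value falls into. Values at or
// below the first split point map to bin 0 and values above the last split
// point map to the final bin, so values outside the original range are still
// assigned to a bin.
std::size_t BinIndex(const std::vector<double>& splitPoints,
                     const double value)
{
  return static_cast<std::size_t>(
      std::lower_bound(splitPoints.begin(), splitPoints.end(), value) -
      splitPoints.begin());
}
```

Because the split points exclude the endpoints, only bins - 1 of them are needed, which matches the documented length of splitPoints.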

Template Parameters
FitnessFunction    Fitness function to use for calculating gain.
ObservationType    Type of observations in this dimension.

Definition at line 53 of file hoeffding_numeric_split.hpp.

Member Typedef Documentation

template<typename FitnessFunction , typename ObservationType = double>
typedef NumericSplitInfo<ObservationType> mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::SplitInfo

The splitting information type required by the HoeffdingNumericSplit.

Definition at line 57 of file hoeffding_numeric_split.hpp.

Constructor & Destructor Documentation

template<typename FitnessFunction , typename ObservationType = double>
mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::HoeffdingNumericSplit ( const size_t  numClasses,
const size_t  bins = 10,
const size_t  observationsBeforeBinning = 100 
)

Create the HoeffdingNumericSplit class, and specify some basic parameters about how the binning should take place.

Parameters
numClasses                   Number of classes.
bins                         Number of bins.
observationsBeforeBinning    Number of points to see before binning is performed.

template<typename FitnessFunction , typename ObservationType = double>
mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::HoeffdingNumericSplit ( const size_t  numClasses,
const HoeffdingNumericSplit< FitnessFunction, ObservationType > &  other 
)

Create the HoeffdingNumericSplit class, using the parameters from the given other split object.

Member Function Documentation

template<typename FitnessFunction , typename ObservationType = double>
size_t mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::Bins ( ) const
inline

Return the number of bins.

template<typename FitnessFunction , typename ObservationType = double>
void mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::EvaluateFitnessFunction ( double &  bestFitness,
double &  secondBestFitness 
) const

Evaluate the fitness function given what has been calculated so far.

In this case, if binning has not yet been performed, 0 will be returned (i.e., no gain). Because this split can only split one possible way, secondBestFitness (the fitness function for the second best possible split) will be set to 0.
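
The behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the mlpack code: Gini gain is used as one possible FitnessFunction, the names Counts, GiniImpurity and EvaluateFitnessSketch are hypothetical, and a nested std::vector indexed as counts[class][bin] stands in for the sufficientStatistics matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// counts[c][b]: number of points of class c that fell into bin b.
using Counts = std::vector<std::vector<std::size_t>>;

// Gini impurity of a vector of class counts: 1 - sum_c p_c^2.
double GiniImpurity(const std::vector<std::size_t>& classCounts)
{
  std::size_t total = 0;
  for (const std::size_t count : classCounts)
    total += count;
  if (total == 0)
    return 0.0;

  double impurity = 1.0;
  for (const std::size_t count : classCounts)
  {
    const double p = static_cast<double>(count) / static_cast<double>(total);
    impurity -= p * p;
  }
  return impurity;
}

// Hypothetical analogue of EvaluateFitnessFunction(). Before binning there is
// nothing to evaluate, so both outputs are 0. After binning there is exactly
// one candidate split (the fixed bins), so secondBestFitness stays 0.
void EvaluateFitnessSketch(const Counts& counts,
                           const bool binned,
                           double& bestFitness,
                           double& secondBestFitness)
{
  bestFitness = 0.0;
  secondBestFitness = 0.0;
  if (!binned || counts.empty())
    return;

  // Class totals over all bins describe the node before the split.
  const std::size_t numBins = counts[0].size();
  std::vector<std::size_t> parent(counts.size(), 0);
  std::size_t total = 0;
  for (std::size_t c = 0; c < counts.size(); ++c)
  {
    assert(counts[c].size() == numBins);
    for (const std::size_t count : counts[c])
    {
      parent[c] += count;
      total += count;
    }
  }
  if (total == 0)
    return;

  // Gain = parent impurity minus the size-weighted impurity of each bin.
  double gain = GiniImpurity(parent);
  for (std::size_t b = 0; b < numBins; ++b)
  {
    std::vector<std::size_t> child(counts.size());
    std::size_t childTotal = 0;
    for (std::size_t c = 0; c < counts.size(); ++c)
    {
      child[c] = counts[c][b];
      childTotal += child[c];
    }
    gain -= static_cast<double>(childTotal) / static_cast<double>(total) *
        GiniImpurity(child);
  }
  bestFitness = gain;
}
```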

Parameters
bestFitness          Value of the fitness function for the best possible split.
secondBestFitness    Value of the fitness function for the second best possible split (always 0 for this split).

template<typename FitnessFunction , typename ObservationType = double>
size_t mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::MajorityClass ( ) const

Return the majority class.

template<typename FitnessFunction , typename ObservationType = double>
double mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::MajorityProbability ( ) const

Return the probability of the majority class.
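
As a rough sketch of what the majority class and its probability mean once binning has happened, both can be read off per-class totals. The names below (Counts, ClassTotals, MajorityClassSketch, MajorityProbabilitySketch) are hypothetical, the counts[class][bin] layout is an assumption, and the sketch does not cover the pre-binning case, where only the buffered labels are available.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// counts[c][b]: number of points of class c that fell into bin b.
using Counts = std::vector<std::vector<std::size_t>>;

// Sum each class's counts across all bins.
std::vector<std::size_t> ClassTotals(const Counts& counts)
{
  std::vector<std::size_t> totals(counts.size(), 0);
  for (std::size_t c = 0; c < counts.size(); ++c)
    for (const std::size_t count : counts[c])
      totals[c] += count;
  return totals;
}

// Class with the largest total count (the lowest index wins ties).
std::size_t MajorityClassSketch(const Counts& counts)
{
  assert(!counts.empty());
  const std::vector<std::size_t> totals = ClassTotals(counts);
  return static_cast<std::size_t>(
      std::max_element(totals.begin(), totals.end()) - totals.begin());
}

// Fraction of all points seen that belong to the majority class; 0 if no
// points have been counted.
double MajorityProbabilitySketch(const Counts& counts)
{
  assert(!counts.empty());
  const std::vector<std::size_t> totals = ClassTotals(counts);
  std::size_t sum = 0;
  for (const std::size_t t : totals)
    sum += t;
  if (sum == 0)
    return 0.0;
  const std::size_t best = *std::max_element(totals.begin(), totals.end());
  return static_cast<double>(best) / static_cast<double>(sum);
}
```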

Referenced by mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::NumChildren().

template<typename FitnessFunction , typename ObservationType = double>
size_t mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::NumChildren ( ) const
inline

Return the number of children if this node splits on this feature.

template<typename FitnessFunction , typename ObservationType = double>
template<typename Archive >
void mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::Serialize ( Archive &  ar,
const unsigned int 
)

Serialize the object.

template<typename FitnessFunction , typename ObservationType = double>
void mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::Split ( arma::Col< size_t > &  childMajorities,
SplitInfo &  splitInfo 
) const
) const

Return the majority class of each child to be created, if a split on this dimension was performed.

Also create the split object.
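
To illustrate the first half of this, the majority class of each prospective child can be taken as the most frequent class within the corresponding bin. The sketch below assumes a counts[class][bin] layout and breaks ties in favor of the lowest class index. Both choices, along with the name ChildMajoritiesSketch, are assumptions made for illustration, and the construction of the SplitInfo object is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// counts[c][b]: number of points of class c that fell into bin b.
using Counts = std::vector<std::vector<std::size_t>>;

// For every bin (one prospective child per bin), return the index of the
// class with the highest count in that bin.
std::vector<std::size_t> ChildMajoritiesSketch(const Counts& counts)
{
  assert(!counts.empty());
  const std::size_t numBins = counts[0].size();
  std::vector<std::size_t> childMajorities(numBins, 0);

  for (std::size_t b = 0; b < numBins; ++b)
  {
    std::size_t bestClass = 0;
    for (std::size_t c = 1; c < counts.size(); ++c)
    {
      assert(counts[c].size() == numBins);
      // Strict comparison keeps the lowest class index on ties.
      if (counts[c][b] > counts[bestClass][b])
        bestClass = c;
    }
    childMajorities[b] = bestClass;
  }
  return childMajorities;
}
```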

Referenced by mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::NumChildren().

template<typename FitnessFunction , typename ObservationType = double>
void mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::Train ( ObservationType  value,
const size_t  label 
)

Train the HoeffdingNumericSplit on the given observed value (remember that this object only cares about the information for a single feature, not an entire point).
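
The two phases of training (buffer first, then count per bin) can be sketched as a small state machine. NumericSplitSketch below is a hypothetical stand-in, not the mlpack class: the member names mirror the private attributes documented on this page, but std::vector replaces the Armadillo types, and the exact moment at which binning is triggered and the handling of values on a bin boundary are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a single-feature numeric split.
struct NumericSplitSketch
{
  std::size_t bins;
  std::size_t observationsBeforeBinning;
  std::size_t samplesSeen = 0;
  std::vector<double> observations;   // Buffered values (before binning).
  std::vector<std::size_t> labels;    // Buffered labels (before binning).
  std::vector<double> splitPoints;    // bins - 1 boundaries (after binning).
  std::vector<std::vector<std::size_t>> counts;  // counts[class][bin].

  NumericSplitSketch(const std::size_t numClasses,
                     const std::size_t bins = 10,
                     const std::size_t observationsBeforeBinning = 100) :
      bins(bins),
      observationsBeforeBinning(observationsBeforeBinning),
      counts(numClasses, std::vector<std::size_t>(bins, 0))
  {
    assert(numClasses >= 1 && bins >= 1);
  }

  // Bin index of a value; with no split points everything lands in bin 0.
  std::size_t BinOf(const double value) const
  {
    return static_cast<std::size_t>(
        std::lower_bound(splitPoints.begin(), splitPoints.end(), value) -
        splitPoints.begin());
  }

  void Train(const double value, const std::size_t label)
  {
    assert(label < counts.size());
    ++samplesSeen;

    // After binning: update the per-bin class counts directly.
    if (samplesSeen > observationsBeforeBinning)
    {
      ++counts[label][BinOf(value)];
      return;
    }

    // Before binning: only buffer the observation.
    observations.push_back(value);
    labels.push_back(label);
    if (samplesSeen < observationsBeforeBinning)
      return;

    // Threshold reached: split [min, max] into equal-width bins.
    const auto range =
        std::minmax_element(observations.begin(), observations.end());
    const double minValue = *range.first;
    const double binWidth =
        (*range.second - minValue) / static_cast<double>(bins);
    for (std::size_t i = 0; i + 1 < bins; ++i)
      splitPoints.push_back(minValue + static_cast<double>(i + 1) * binWidth);

    // Flush the buffered points into the counts and release the buffers.
    for (std::size_t i = 0; i < observations.size(); ++i)
      ++counts[labels[i]][BinOf(observations[i])];
    observations.clear();
    labels.clear();
  }
};
```

Once the threshold is reached the buffers are no longer needed, so memory use after binning depends only on the number of classes and bins, not on the number of points seen.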

Parameters
value    Value in the dimension that this HoeffdingNumericSplit refers to.
label    Label of the given point.

Member Data Documentation

template<typename FitnessFunction , typename ObservationType = double>
size_t mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::bins
private

The number of bins.

template<typename FitnessFunction , typename ObservationType = double>
arma::Col<size_t> mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::labels
private

This holds the labels of the points before binning.

Definition at line 130 of file hoeffding_numeric_split.hpp.

template<typename FitnessFunction , typename ObservationType = double>
arma::Col<ObservationType> mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::observations
private

Before binning, this holds the points we have seen so far.

Definition at line 128 of file hoeffding_numeric_split.hpp.

template<typename FitnessFunction , typename ObservationType = double>
size_t mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::observationsBeforeBinning
private

The number of observations we must see before binning.

Definition at line 137 of file hoeffding_numeric_split.hpp.

template<typename FitnessFunction , typename ObservationType = double>
size_t mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::samplesSeen
private

The number of samples we have seen so far.

Definition at line 139 of file hoeffding_numeric_split.hpp.

template<typename FitnessFunction , typename ObservationType = double>
arma::Col<ObservationType> mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::splitPoints
private

The split points for the binning (length bins - 1).

Definition at line 133 of file hoeffding_numeric_split.hpp.

template<typename FitnessFunction , typename ObservationType = double>
arma::Mat<size_t> mlpack::tree::HoeffdingNumericSplit< FitnessFunction, ObservationType >::sufficientStatistics
private

After binning, this contains the sufficient statistics.

Definition at line 142 of file hoeffding_numeric_split.hpp.


The documentation for this class was generated from the following file: