mlpack  master
mlpack::tree::HoeffdingCategoricalSplit< FitnessFunction > Class Template Reference

This is the standard Hoeffding-bound categorical split proposed in the paper cited below.

Public Types

typedef CategoricalSplitInfo SplitInfo
 The type of split information required by the HoeffdingCategoricalSplit.
 

Public Member Functions

 HoeffdingCategoricalSplit (const size_t numCategories, const size_t numClasses)
 Create the HoeffdingCategoricalSplit given a number of categories for this dimension and a number of classes.
 
 HoeffdingCategoricalSplit (const size_t numCategories, const size_t numClasses, const HoeffdingCategoricalSplit &other)
 Create the HoeffdingCategoricalSplit given a number of categories for this dimension, a number of classes, and another HoeffdingCategoricalSplit to take parameters from.
 
void EvaluateFitnessFunction (double &bestFitness, double &secondBestFitness) const
 Given the points seen so far, evaluate the fitness function, returning the gain for the best possible split and the second best possible split.
 
size_t MajorityClass () const
 Get the majority class seen so far.
 
double MajorityProbability () const
 Get the probability of the majority class given the points seen so far.
 
size_t NumChildren () const
 Return the number of children, if the node were to split.
 
template<typename Archive >
void Serialize (Archive &ar, const unsigned int)
 Serialize the categorical split.
 
void Split (arma::Col< size_t > &childMajorities, SplitInfo &splitInfo)
 Gather the information for a split: get the labels of the child majorities, and initialize the SplitInfo object.
 
template<typename eT >
void Train (eT value, const size_t label)
 Train on the given value with the given label.
 

Private Attributes

arma::Mat< size_t > sufficientStatistics
 The sufficient statistics for all points seen so far.
 

Detailed Description

template<typename FitnessFunction>
class mlpack::tree::HoeffdingCategoricalSplit< FitnessFunction >

This is the standard Hoeffding-bound categorical split proposed in the paper below:

@inproceedings{domingos2000mining,
title={{Mining High-Speed Data Streams}},
author={Domingos, P. and Hulten, G.},
year={2000},
booktitle={Proceedings of the Sixth ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining (KDD '00)},
pages={71--80}
}

This class tracks the sufficient statistics of the training points it has seen. The HoeffdingTree class (and other related classes) can use it to track categorical features and to split decision tree nodes.
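The Hoeffding bound behind this split rule can be sketched as follows. For a variable with range R, after n independent observations the true mean lies, with probability at least 1 - δ, within ε = sqrt(R² ln(1/δ) / (2n)) of the observed average; a node therefore splits once the observed best gain beats the runner-up by more than ε. This is a minimal standalone sketch of that rule, not mlpack's HoeffdingTree code: HoeffdingEpsilon and ShouldSplit are hypothetical names, and R is assumed to be the range of the chosen fitness function (for example log2(c) for information gain over c classes). Because this class proposes only one split per feature (its second-best gain is always 0), in a full tree the runner-up is usually taken from the other candidate features.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: Hoeffding bound epsilon = sqrt(R^2 ln(1/delta) / (2n)).
// range: R, the range of the fitness function.
// delta: allowed probability of picking the wrong split.
// n:     number of points seen at the node.
double HoeffdingEpsilon(double range, double delta, double n)
{
  assert(range > 0.0 && delta > 0.0 && delta < 1.0 && n > 0.0);
  return std::sqrt(range * range * std::log(1.0 / delta) / (2.0 * n));
}

// Hypothetical helper: split only when the best candidate's observed gain
// exceeds the runner-up's by more than the Hoeffding bound.
bool ShouldSplit(double bestGain, double secondBestGain, double range,
                 double delta, double n)
{
  return (bestGain - secondBestGain) > HoeffdingEpsilon(range, delta, n);
}
```

With R = 1, δ = e⁻² and n = 100, ε is 0.1, so a gain gap of 0.2 triggers a split while a gap of 0.05 does not; as n grows, ε shrinks and smaller gaps become decisive.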

Template Parameters
FitnessFunction: Fitness function to use for calculating gain.

Definition at line 44 of file hoeffding_categorical_split.hpp.

Member Typedef Documentation

template<typename FitnessFunction >
typedef CategoricalSplitInfo mlpack::tree::HoeffdingCategoricalSplit< FitnessFunction >::SplitInfo

The type of split information required by the HoeffdingCategoricalSplit.

Definition at line 48 of file hoeffding_categorical_split.hpp.

Constructor & Destructor Documentation

template<typename FitnessFunction >
mlpack::tree::HoeffdingCategoricalSplit< FitnessFunction >::HoeffdingCategoricalSplit (const size_t numCategories, const size_t numClasses)

Create the HoeffdingCategoricalSplit given a number of categories for this dimension and a number of classes.

Parameters
numCategories: Number of categories in this dimension.
numClasses: Number of classes (distinct labels) in the training data.

template<typename FitnessFunction >
mlpack::tree::HoeffdingCategoricalSplit< FitnessFunction >::HoeffdingCategoricalSplit (const size_t numCategories, const size_t numClasses, const HoeffdingCategoricalSplit< FitnessFunction > &other)

Create the HoeffdingCategoricalSplit given a number of categories for this dimension, a number of classes, and another HoeffdingCategoricalSplit to take parameters from.

In this particular case, there are no parameters to take, but this constructor is required by the HoeffdingTree class.

Member Function Documentation

template<typename FitnessFunction >
void mlpack::tree::HoeffdingCategoricalSplit< FitnessFunction >::EvaluateFitnessFunction (double &bestFitness, double &secondBestFitness) const

Given the points seen so far, evaluate the fitness function, returning the gain for the best possible split and the second best possible split.

In this splitting technique, we only split one possible way, so secondBestFitness will always be 0.

Parameters
bestFitness: The fitness function result for this split.
secondBestFitness: Always set to 0 (this split only splits one way).

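EvaluateFitnessFunction delegates the actual scoring to the FitnessFunction template parameter. As a minimal sketch, assuming Gini impurity as the fitness function and a nested std::vector in place of arma::Mat (counts[class][category], the same orientation as sufficientStatistics), the gain of the single multiway split is the parent's impurity minus the size-weighted impurity of the per-category children, and the second-best gain is fixed at 0. Gini and EvaluateGiniGain are hypothetical names, not mlpack functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// counts[c][k]: number of points with class c and category k.
using Counts = std::vector<std::vector<std::size_t>>;

// Gini impurity 1 - sum_c p_c^2 of a class histogram (0 for an empty one).
double Gini(const std::vector<std::size_t>& classCounts)
{
  double total = 0.0;
  for (std::size_t n : classCounts) total += n;
  if (total == 0.0) return 0.0;
  double sumSq = 0.0;
  for (std::size_t n : classCounts)
  {
    const double p = n / total;
    sumSq += p * p;
  }
  return 1.0 - sumSq;
}

// Gain of the one multiway split (one child per category):
// parent impurity minus the weighted impurity of the children.
void EvaluateGiniGain(const Counts& counts, double& bestFitness,
                      double& secondBestFitness)
{
  const std::size_t numClasses = counts.size();
  const std::size_t numCategories = numClasses ? counts[0].size() : 0;

  // Class histogram of the parent node (row sums).
  std::vector<std::size_t> parent(numClasses, 0);
  double total = 0.0;
  for (std::size_t c = 0; c < numClasses; ++c)
  {
    assert(counts[c].size() == numCategories);
    for (std::size_t k = 0; k < numCategories; ++k)
    {
      parent[c] += counts[c][k];
      total += counts[c][k];
    }
  }

  // Weighted impurity of each category's child (one column each).
  double childImpurity = 0.0;
  for (std::size_t k = 0; k < numCategories; ++k)
  {
    std::vector<std::size_t> child(numClasses, 0);
    double childTotal = 0.0;
    for (std::size_t c = 0; c < numClasses; ++c)
    {
      child[c] = counts[c][k];
      childTotal += counts[c][k];
    }
    if (total > 0.0) childImpurity += (childTotal / total) * Gini(child);
  }

  bestFitness = Gini(parent) - childImpurity;
  secondBestFitness = 0.0;  // a categorical feature splits only one way
}
```

When every category holds a single class, the children are pure and the gain equals the parent's impurity; when every category has the same class mix as the parent, the gain is 0.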
template<typename FitnessFunction >
size_t mlpack::tree::HoeffdingCategoricalSplit< FitnessFunction >::MajorityClass ( ) const

Get the majority class seen so far.


template<typename FitnessFunction >
double mlpack::tree::HoeffdingCategoricalSplit< FitnessFunction >::MajorityProbability ( ) const

Get the probability of the majority class given the points seen so far.

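MajorityClass and MajorityProbability only need the per-class totals, which are the row sums of the sufficient statistics. A minimal sketch over the same hypothetical counts[class][category] layout (nested std::vector instead of arma::Mat), assuming ties are broken toward the lower class index, which the source does not specify. ClassTotals is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Counts = std::vector<std::vector<std::size_t>>;  // counts[class][category]

// Total number of points seen for each class (sum across categories).
std::vector<std::size_t> ClassTotals(const Counts& counts)
{
  std::vector<std::size_t> totals(counts.size(), 0);
  for (std::size_t c = 0; c < counts.size(); ++c)
    for (std::size_t n : counts[c])
      totals[c] += n;
  return totals;
}

// Index of the most frequent class; ties go to the lowest index.
std::size_t MajorityClass(const Counts& counts)
{
  assert(!counts.empty());
  const std::vector<std::size_t> totals = ClassTotals(counts);
  std::size_t best = 0;
  for (std::size_t c = 1; c < totals.size(); ++c)
    if (totals[c] > totals[best]) best = c;
  return best;
}

// Fraction of all points seen that belong to the majority class.
double MajorityProbability(const Counts& counts)
{
  const std::vector<std::size_t> totals = ClassTotals(counts);
  std::size_t sum = 0;
  for (std::size_t n : totals) sum += n;
  if (sum == 0) return 0.0;
  return double(totals[MajorityClass(counts)]) / double(sum);
}
```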

template<typename FitnessFunction >
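size_t mlpack::tree::HoeffdingCategoricalSplit< FitnessFunction >::NumChildren ( ) const
inline

Return the number of children, if the node were to split. For this categorical split, that is the number of categories, since each category gets its own child.
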
template<typename FitnessFunction >
template<typename Archive >
void mlpack::tree::HoeffdingCategoricalSplit< FitnessFunction >::Serialize (Archive &ar, const unsigned int)
inline

Serialize the categorical split.

template<typename FitnessFunction >
void mlpack::tree::HoeffdingCategoricalSplit< FitnessFunction >::Split (arma::Col< size_t > &childMajorities, SplitInfo &splitInfo)

Gather the information for a split: get the labels of the child majorities, and initialize the SplitInfo object.

Parameters
childMajorities: Majorities of the child nodes to be created.
splitInfo: Information for splitting.

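Split creates one child per category, and each child's majority is the most frequent class within that category's column of the sufficient statistics. A minimal sketch under the same assumptions as above (nested std::vector instead of arma::Col and arma::Mat, ties to the lowest class index); ChildMajorities is a hypothetical name, child k is assumed to correspond to category k, and the sketch does not build a SplitInfo object.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Counts = std::vector<std::vector<std::size_t>>;  // counts[class][category]

// Majority class of each category's column; the result has one entry
// per child, i.e. one per category.
std::vector<std::size_t> ChildMajorities(const Counts& counts)
{
  assert(!counts.empty());
  const std::size_t numClasses = counts.size();
  const std::size_t numCategories = counts[0].size();
  std::vector<std::size_t> majorities(numCategories, 0);
  for (std::size_t k = 0; k < numCategories; ++k)
    for (std::size_t c = 1; c < numClasses; ++c)
      if (counts[c][k] > counts[majorities[k]][k]) majorities[k] = c;
  return majorities;
}
```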

template<typename FitnessFunction >
template<typename eT >
void mlpack::tree::HoeffdingCategoricalSplit< FitnessFunction >::Train (eT value, const size_t label)

Train on the given value with the given label.

Parameters
value: Value to train on (interpreted as the category index).
label: Label to train on.
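Training only has to increment one cell of the sufficient statistics, using the row-per-class, column-per-category layout described under sufficientStatistics. A minimal sketch, assuming the feature value is already a valid category index in [0, numCategories) and using a nested std::vector in place of arma::Mat<size_t>; CategoricalCounts and Count are hypothetical names, not mlpack API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the sufficient statistics:
// one row per class, one column per category, each entry a count.
class CategoricalCounts
{
 public:
  CategoricalCounts(std::size_t numCategories, std::size_t numClasses) :
      counts(numClasses, std::vector<std::size_t>(numCategories, 0)) { }

  // The feature value is assumed to be a non-negative category index.
  template<typename eT>
  void Train(eT value, std::size_t label)
  {
    const std::size_t category = static_cast<std::size_t>(value);
    assert(label < counts.size() && category < counts[label].size());
    ++counts[label][category];
  }

  // Number of points seen with this label in this category.
  std::size_t Count(std::size_t label, std::size_t category) const
  {
    return counts[label][category];
  }

 private:
  std::vector<std::vector<std::size_t>> counts;
};
```

Keeping only counts makes each update O(1) and the memory cost fixed at numClasses × numCategories, which is what allows the split to be evaluated on a stream without storing the points.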

Member Data Documentation

template<typename FitnessFunction >
arma::Mat<size_t> mlpack::tree::HoeffdingCategoricalSplit< FitnessFunction >::sufficientStatistics
private

The sufficient statistics for all points seen so far.

Each column corresponds to a category, and contains a count of each of the classes seen for points in that category.

Definition at line 120 of file hoeffding_categorical_split.hpp.

Referenced by mlpack::tree::HoeffdingCategoricalSplit< FitnessFunction >::NumChildren(), and mlpack::tree::HoeffdingCategoricalSplit< FitnessFunction >::Serialize().

