mlpack  master
mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType > Class Template Reference

The BinaryNumericSplit class implements the numeric feature splitting strategy devised by Gama, Rocha, and Medas in "Accurate Decision Trees for Mining High-Speed Data Streams" (KDD '03).

Public Types

typedef BinaryNumericSplitInfo< ObservationType > SplitInfo
 The splitting information required by the BinaryNumericSplit.
 

Public Member Functions

 BinaryNumericSplit (const size_t numClasses)
 Create the BinaryNumericSplit object with the given number of classes.

 BinaryNumericSplit (const size_t numClasses, const BinaryNumericSplit &other)
 Create the BinaryNumericSplit object with the given number of classes, using information from the given other split for other parameters.

void EvaluateFitnessFunction (double &bestFitness, double &secondBestFitness)
 Given the points seen so far, evaluate the fitness function, returning the best possible gain of a binary split.

size_t MajorityClass () const
 The majority class of the points seen so far.

double MajorityProbability () const
 The probability of the majority class given the points seen so far.

size_t NumChildren () const

template<typename Archive >
void Serialize (Archive &ar, const unsigned int)
 Serialize the object.

void Split (arma::Col< size_t > &childMajorities, SplitInfo &splitInfo)
 Given that a split should happen, return the majority classes of the (two) children and an initialized SplitInfo object.

void Train (ObservationType value, const size_t label)
 Train on the given value with the given label.
 

Private Attributes

ObservationType bestSplit
 A cached best split point.

arma::Col< size_t > classCounts
 The classes we have seen so far (for majority calculations).

bool isAccurate
 If true, the cached best split point is accurate (that is, we have not seen any more samples since we calculated it).

std::multimap< ObservationType, size_t > sortedElements
 The elements seen so far, in sorted order.
 

Detailed Description

template<typename FitnessFunction, typename ObservationType = double>
class mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >

The BinaryNumericSplit class implements the numeric feature splitting strategy devised by Gama, Rocha, and Medas in the following paper:

@inproceedings{gama2003accurate,
title={Accurate Decision Trees for Mining High-Speed Data Streams},
author={Gama, J. and Rocha, R. and Medas, P.},
year={2003},
booktitle={Proceedings of the Ninth ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining (KDD '03)},
pages={523--528}
}

This splitting procedure keeps the points it has seen so far in a sorted binary tree; EvaluateFitnessFunction() then finds the best possible split in O(n) time, where n is the number of samples seen so far. Every split of this type produces exactly two children: one for values less than the split point, and one for values greater than or equal to the split point. The Train() function is documented as taking O(1) time; note that the observations are stored in a std::multimap (sortedElements), for which a single insertion is logarithmic in n.

Template Parameters
FitnessFunction    Fitness function to use for calculating gain.
ObservationType    Type of observation used by this dimension.
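
As an illustration of what a FitnessFunction policy might look like, the following minimal sketch computes Gini impurity gain from a table of per-child class counts. The struct name GiniGainSketch, its Impurity() and Evaluate() signatures, and the use of std::vector instead of Armadillo types are assumptions made for this example; they are not taken from mlpack.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical FitnessFunction-style policy (not mlpack's actual class).
struct GiniGainSketch
{
  // Gini impurity of a single set of class counts; 0 for an empty set.
  static double Impurity(const std::vector<std::size_t>& classCounts)
  {
    std::size_t total = 0;
    for (const std::size_t c : classCounts)
      total += c;
    if (total == 0)
      return 0.0;

    double sumSquares = 0.0;
    for (const std::size_t c : classCounts)
    {
      const double p = static_cast<double>(c) / static_cast<double>(total);
      sumSquares += p * p;
    }
    return 1.0 - sumSquares;
  }

  // counts[child][cls] is the number of points of class cls in that child.
  // Returns the impurity of the unsplit node minus the size-weighted impurity
  // of the children, so larger values indicate better splits.
  static double Evaluate(const std::vector<std::vector<std::size_t>>& counts)
  {
    assert(!counts.empty());
    const std::size_t numClasses = counts[0].size();
    std::vector<std::size_t> parent(numClasses, 0);
    std::vector<std::size_t> childTotals(counts.size(), 0);
    std::size_t total = 0;
    for (std::size_t i = 0; i < counts.size(); ++i)
    {
      assert(counts[i].size() == numClasses);
      for (std::size_t c = 0; c < numClasses; ++c)
      {
        parent[c] += counts[i][c];
        childTotals[i] += counts[i][c];
        total += counts[i][c];
      }
    }
    if (total == 0)
      return 0.0;

    double gain = Impurity(parent);
    for (std::size_t i = 0; i < counts.size(); ++i)
    {
      const double weight =
          static_cast<double>(childTotals[i]) / static_cast<double>(total);
      gain -= weight * Impurity(counts[i]);
    }
    return gain;
  }
};
```

Under these assumptions, a split that separates two classes perfectly has a higher gain than one that leaves both children as mixed as the parent.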

Definition at line 47 of file binary_numeric_split.hpp.

Member Typedef Documentation

template<typename FitnessFunction , typename ObservationType = double>
typedef BinaryNumericSplitInfo<ObservationType> mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::SplitInfo

The splitting information required by the BinaryNumericSplit.

Definition at line 51 of file binary_numeric_split.hpp.

Constructor & Destructor Documentation

template<typename FitnessFunction , typename ObservationType = double>
mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::BinaryNumericSplit ( const size_t  numClasses)

Create the BinaryNumericSplit object with the given number of classes.

Parameters
numClasses    Number of classes in dataset.

template<typename FitnessFunction , typename ObservationType = double>
mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::BinaryNumericSplit ( const size_t  numClasses,
const BinaryNumericSplit< FitnessFunction, ObservationType > &  other 
)

Create the BinaryNumericSplit object with the given number of classes, using information from the given other split for other parameters.

In this case, there are no other parameters, but this function is required by the HoeffdingTree class.

Member Function Documentation

template<typename FitnessFunction , typename ObservationType = double>
void mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::EvaluateFitnessFunction ( double &  bestFitness,
double &  secondBestFitness 
)

Given the points seen so far, evaluate the fitness function, returning the best possible gain of a binary split.

Note that this takes O(n) time, where n is the number of points seen so far, so it can be slow once many points have been seen.

The fitness value of the best possible split is stored in bestFitness, and the fitness value of the second best possible split is stored in secondBestFitness.

Parameters
bestFitness    Fitness function value for best possible split.
secondBestFitness    Fitness function value for second best possible split.
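
The linear-time evaluation can be pictured as a single sweep over the sorted observations, moving one point at a time from the right child to the left child and scoring each candidate split point. The sketch below is a standalone illustration, not mlpack's implementation: the function names GiniSketch and EvaluateSketch, the extra bestSplit output parameter, and the use of Gini gain as the fitness measure are all assumptions. Each candidate costs time proportional to the number of classes, so the sweep is O(n) for a fixed number of classes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Gini impurity of one set of class counts; 0 for an empty set.
double GiniSketch(const std::vector<std::size_t>& counts)
{
  std::size_t total = 0;
  for (const std::size_t c : counts)
    total += c;
  if (total == 0)
    return 0.0;

  double sumSquares = 0.0;
  for (const std::size_t c : counts)
  {
    const double p = static_cast<double>(c) / static_cast<double>(total);
    sumSquares += p * p;
  }
  return 1.0 - sumSquares;
}

// Hypothetical sweep over sorted (value, label) pairs.  A candidate split at
// value v sends values < v to the left child and values >= v to the right.
// Fitness values start at 0, the gain of not splitting at all.
void EvaluateSketch(const std::multimap<double, std::size_t>& sortedElements,
                    const std::size_t numClasses,
                    double& bestFitness,
                    double& secondBestFitness,
                    double& bestSplit)
{
  bestFitness = 0.0;
  secondBestFitness = 0.0;
  bestSplit = 0.0;
  if (sortedElements.empty())
    return;

  // Initially every point is in the right child.
  std::vector<std::size_t> left(numClasses, 0), right(numClasses, 0);
  for (const auto& element : sortedElements)
  {
    assert(element.second < numClasses);
    ++right[element.second];
  }

  const double total = static_cast<double>(sortedElements.size());
  const double parentImpurity = GiniSketch(right);
  std::size_t numLeft = 0;
  double previousValue = sortedElements.begin()->first;

  for (const auto& element : sortedElements)
  {
    // Only score a split where the value changes, so that equal values are
    // never separated.
    if (numLeft > 0 && element.first != previousValue)
    {
      const double leftWeight = static_cast<double>(numLeft) / total;
      const double gain = parentImpurity - leftWeight * GiniSketch(left) -
          (1.0 - leftWeight) * GiniSketch(right);
      if (gain > bestFitness)
      {
        secondBestFitness = bestFitness;
        bestFitness = gain;
        bestSplit = element.first;
      }
      else if (gain > secondBestFitness)
      {
        secondBestFitness = gain;
      }
    }

    // Move this observation from the right child to the left child.
    --right[element.second];
    ++left[element.second];
    ++numLeft;
    previousValue = element.first;
  }
}
```

Tracking the second best value alongside the best one is what lets a caller compare the two, which is why both are returned through reference parameters.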

template<typename FitnessFunction , typename ObservationType = double>
size_t mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::MajorityClass ( ) const

The majority class of the points seen so far.

Referenced by mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::NumChildren().

template<typename FitnessFunction , typename ObservationType = double>
double mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::MajorityProbability ( ) const

The probability of the majority class given the points seen so far.

Referenced by mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::NumChildren().
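
Both majority queries can be answered from the per-class counts alone. The following is a minimal sketch of that idea using std::vector in place of arma::Col< size_t >; the function names MajorityClassSketch and MajorityProbabilitySketch are hypothetical, and returning 0 when no points have been seen is an assumption of this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

// Index of the class with the largest count (the first one on ties).
std::size_t MajorityClassSketch(const std::vector<std::size_t>& classCounts)
{
  assert(!classCounts.empty());
  return static_cast<std::size_t>(std::distance(classCounts.begin(),
      std::max_element(classCounts.begin(), classCounts.end())));
}

// Fraction of all points seen so far that belong to the majority class.
// Returns 0 if no points have been seen (an assumption of this sketch).
double MajorityProbabilitySketch(const std::vector<std::size_t>& classCounts)
{
  const std::size_t total = std::accumulate(classCounts.begin(),
      classCounts.end(), std::size_t(0));
  if (total == 0)
    return 0.0;

  const std::size_t majorityCount =
      *std::max_element(classCounts.begin(), classCounts.end());
  return static_cast<double>(majorityCount) / static_cast<double>(total);
}
```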

template<typename FitnessFunction , typename ObservationType = double>
size_t mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::NumChildren ( ) const
inline

template<typename FitnessFunction , typename ObservationType = double>
template<typename Archive >
void mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::Serialize ( Archive &  ar,
const unsigned int 
)

Serialize the object.

template<typename FitnessFunction , typename ObservationType = double>
void mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::Split ( arma::Col< size_t > &  childMajorities,
SplitInfo &  splitInfo 
)

Given that a split should happen, return the majority classes of the (two) children and an initialized SplitInfo object.

Parameters
childMajorities    Majority classes of the children after the split.
splitInfo    Split information.

Referenced by mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::NumChildren().
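
To make the two-child rule concrete, here is a minimal sketch of how child majorities could be derived once a split point is known: values less than the split point go to child 0, and values greater than or equal to it go to child 1. The function name ChildMajoritiesSketch, its parameters, and the std::vector return type are assumptions for illustration; the real Split() also fills in a SplitInfo object, which this sketch omits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <vector>

// Hypothetical helper: majority class of each of the two children.
std::vector<std::size_t> ChildMajoritiesSketch(
    const std::multimap<double, std::size_t>& sortedElements,
    const std::size_t numClasses,
    const double splitPoint)
{
  assert(numClasses > 0);

  // counts[child][cls]: child 0 holds values < splitPoint, child 1 the rest.
  std::vector<std::vector<std::size_t>> counts(2,
      std::vector<std::size_t>(numClasses, 0));
  for (const auto& element : sortedElements)
  {
    assert(element.second < numClasses);
    const std::size_t child = (element.first < splitPoint) ? 0 : 1;
    ++counts[child][element.second];
  }

  // The majority of each child is the class with the largest count there.
  std::vector<std::size_t> majorities(2, 0);
  for (std::size_t child = 0; child < 2; ++child)
  {
    majorities[child] = static_cast<std::size_t>(std::distance(
        counts[child].begin(),
        std::max_element(counts[child].begin(), counts[child].end())));
  }
  return majorities;
}
```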

template<typename FitnessFunction , typename ObservationType = double>
void mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::Train ( ObservationType  value,
const size_t  label 
)

Train on the given value with the given label.

Parameters
value    The value to train on.
label    The label to train on.
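
The private members listed on this page suggest what a training step has to maintain: the sorted observations, the per-class counts, and the validity of the cached split. The sketch below shows one plausible way to do that; the struct TrainSketch is hypothetical, uses double and std::vector in place of ObservationType and arma::Col< size_t >, and is not mlpack's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in that mirrors the documented private members.
struct TrainSketch
{
  std::multimap<double, std::size_t> sortedElements;  // value -> label
  std::vector<std::size_t> classCounts;               // points seen per class
  bool isAccurate = true;                             // cached split valid?

  explicit TrainSketch(const std::size_t numClasses) :
      classCounts(numClasses, 0) { }

  void Train(const double value, const std::size_t label)
  {
    assert(label < classCounts.size());
    // std::multimap keeps its keys ordered and allows repeated values;
    // a single insertion is logarithmic in the number of stored elements.
    sortedElements.emplace(value, label);
    ++classCounts[label];
    // Any cached best split is now stale.
    isAccurate = false;
  }
};
```

Because the container stays sorted after every insertion, a later evaluation can sweep it in order without sorting first.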

Member Data Documentation

template<typename FitnessFunction , typename ObservationType = double>
ObservationType mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::bestSplit
private

A cached best split point.

Definition at line 120 of file binary_numeric_split.hpp.

template<typename FitnessFunction , typename ObservationType = double>
arma::Col<size_t> mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::classCounts
private

The classes we have seen so far (for majority calculations).

Definition at line 117 of file binary_numeric_split.hpp.

template<typename FitnessFunction , typename ObservationType = double>
bool mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::isAccurate
private

If true, the cached best split point is accurate (that is, we have not seen any more samples since we calculated it).

Definition at line 123 of file binary_numeric_split.hpp.

template<typename FitnessFunction , typename ObservationType = double>
std::multimap<ObservationType, size_t> mlpack::tree::BinaryNumericSplit< FitnessFunction, ObservationType >::sortedElements
private

The elements seen so far, in sorted order.

Definition at line 115 of file binary_numeric_split.hpp.


The documentation for this class was generated from the following file: