mlpack  master
mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType > Class Template Reference

The HoeffdingTree object represents all of the necessary information for a Hoeffding-bound-based decision tree. More...

Public Types

typedef CategoricalSplitType< FitnessFunction > CategoricalSplit
 Allow access to the categorical split type. More...
 
typedef NumericSplitType< FitnessFunction > NumericSplit
 Allow access to the numeric split type. More...
 

Public Member Functions

template<typename MatType >
 HoeffdingTree (const MatType &data, const data::DatasetInfo &datasetInfo, const arma::Row< size_t > &labels, const size_t numClasses, const bool batchTraining=true, const double successProbability=0.95, const size_t maxSamples=0, const size_t checkInterval=100, const size_t minSamples=100, const CategoricalSplitType< FitnessFunction > &categoricalSplitIn=CategoricalSplitType< FitnessFunction >(0, 0), const NumericSplitType< FitnessFunction > &numericSplitIn=NumericSplitType< FitnessFunction >(0))
 Construct the Hoeffding tree with the given parameters and given training data. More...
 
 HoeffdingTree (const data::DatasetInfo &datasetInfo, const size_t numClasses, const double successProbability=0.95, const size_t maxSamples=0, const size_t checkInterval=100, const size_t minSamples=100, const CategoricalSplitType< FitnessFunction > &categoricalSplitIn=CategoricalSplitType< FitnessFunction >(0, 0), const NumericSplitType< FitnessFunction > &numericSplitIn=NumericSplitType< FitnessFunction >(0), std::unordered_map< size_t, std::pair< size_t, size_t >> *dimensionMappings=NULL)
 Construct the Hoeffding tree with the given parameters, but training on no data. More...
 
 HoeffdingTree (const HoeffdingTree &other)
 Copy another tree (warning: this duplicates the entire tree and may use a lot of memory). More...
 
 ~HoeffdingTree ()
 Clean up memory. More...
 
template<typename VecType >
size_t CalculateDirection (const VecType &point) const
 Given a point and that this node is not a leaf, calculate the index of the child node this point would go towards. More...
 
size_t CheckInterval () const
 Get the number of samples before a split check is performed. More...
 
void CheckInterval (const size_t checkInterval)
 Modify the number of samples before a split check is performed. More...
 
const HoeffdingTree & Child (const size_t i) const
 Get a child. More...
 
HoeffdingTree & Child (const size_t i)
 Modify a child. More...
 
template<typename VecType >
size_t Classify (const VecType &point) const
 Classify the given point, using this node and the entire (sub)tree beneath it. More...
 
template<typename VecType >
void Classify (const VecType &point, size_t &prediction, double &probability) const
 Classify the given point and also return an estimate of the probability that the prediction is correct. More...
 
template<typename MatType >
void Classify (const MatType &data, arma::Row< size_t > &predictions) const
 Classify the given points, using this node and the entire (sub)tree beneath it. More...
 
template<typename MatType >
void Classify (const MatType &data, arma::Row< size_t > &predictions, arma::rowvec &probabilities) const
 Classify the given points, using this node and the entire (sub)tree beneath it. More...
 
void CreateChildren ()
 Given that this node should split, create the children. More...
 
size_t MajorityClass () const
 Get the majority class. More...
 
size_t & MajorityClass ()
 Modify the majority class. More...
 
double MajorityProbability () const
 Get the probability of the majority class (based on training samples). More...
 
double & MajorityProbability ()
 Modify the probability of the majority class. More...
 
size_t MaxSamples () const
 Get the maximum number of samples before a split is forced. More...
 
void MaxSamples (const size_t maxSamples)
 Modify the maximum number of samples before a split is forced. More...
 
size_t MinSamples () const
 Get the minimum number of samples for a split. More...
 
void MinSamples (const size_t minSamples)
 Modify the minimum number of samples for a split. More...
 
size_t NumChildren () const
 Get the number of children. More...
 
template<typename Archive >
void Serialize (Archive &ar, const unsigned int)
 Serialize the split. More...
 
size_t SplitCheck ()
 Check if a split would satisfy the conditions of the Hoeffding bound with the node's specified success probability. More...
 
size_t SplitDimension () const
 Get the splitting dimension (size_t(-1) if no split). More...
 
double SuccessProbability () const
 Get the confidence required for a split. More...
 
void SuccessProbability (const double successProbability)
 Modify the confidence required for a split. More...
 
template<typename MatType >
void Train (const MatType &data, const arma::Row< size_t > &labels, const bool batchTraining=true)
 Train on a set of points, either in streaming mode or in batch mode, with the given labels. More...
 
template<typename VecType >
void Train (const VecType &point, const size_t label)
 Train on a single point in streaming mode, with the given label. More...
 

Private Attributes

CategoricalSplitType< FitnessFunction >::SplitInfo categoricalSplit
 If the split is categorical, this holds the splitting information. More...
 
std::vector< CategoricalSplitType< FitnessFunction > > categoricalSplits
 Information for splitting of categorical features (used before split). More...
 
size_t checkInterval
 The number of samples that should be seen before checking for a split. More...
 
std::vector< HoeffdingTree * > children
 If the split has occurred, these are the children. More...
 
const data::DatasetInfo * datasetInfo
 The dataset information. More...
 
std::unordered_map< size_t, std::pair< size_t, size_t > > * dimensionMappings
 This structure is owned by this node only if it is the root of the tree. More...
 
size_t majorityClass
 The majority class of this node. More...
 
double majorityProbability
 The empirical probability of a point this node saw having the majority class. More...
 
size_t maxSamples
 The maximum number of samples we can see before splitting. More...
 
size_t minSamples
 The minimum number of samples for splitting. More...
 
size_t numClasses
 The number of classes this node is trained on. More...
 
NumericSplitType< FitnessFunction >::SplitInfo numericSplit
 If the split is numeric, this holds the splitting information. More...
 
std::vector< NumericSplitType< FitnessFunction > > numericSplits
 Information for splitting of numeric features (used before split). More...
 
size_t numSamples
 The number of samples seen so far by this node. More...
 
bool ownsInfo
 Whether or not we own the dataset information. More...
 
bool ownsMappings
 Indicates whether or not we own the mappings. More...
 
size_t splitDimension
 The dimension that this node has split on. More...
 
double successProbability
 The required probability of success for a split to be performed. More...
 

Detailed Description

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
class mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >

The HoeffdingTree object represents all of the necessary information for a Hoeffding-bound-based decision tree.

This class is able to train on samples in streaming settings and batch settings, and perform splits based on the Hoeffding bound. The Hoeffding tree (also known as the "very fast decision tree" – VFDT) is described in the following paper:

@inproceedings{domingos2000mining,
title={{Mining High-Speed Data Streams}},
author={Domingos, P. and Hulten, G.},
year={2000},
booktitle={Proceedings of the Sixth ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining (KDD '00)},
pages={71--80}
}

The class is modular, and takes three template parameters. The first, FitnessFunction, is the fitness function that should be used to determine whether a split is beneficial; examples might be GiniImpurity or InformationGain. The NumericSplitType determines how numeric attributes are handled, and the CategoricalSplitType determines how categorical attributes are handled. As far as the actual splitting goes, the meat of the splitting procedure will be contained in those two classes.

Template Parameters
FitnessFunction       Fitness function to use.
NumericSplitType      Technique for splitting numeric features.
CategoricalSplitType  Technique for splitting categorical features.

Definition at line 61 of file hoeffding_tree.hpp.

Member Typedef Documentation

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
typedef CategoricalSplitType<FitnessFunction> mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::CategoricalSplit

Allow access to the categorical split type.

Definition at line 67 of file hoeffding_tree.hpp.

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
typedef NumericSplitType<FitnessFunction> mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::NumericSplit

Allow access to the numeric split type.

Definition at line 65 of file hoeffding_tree.hpp.

Constructor & Destructor Documentation

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
template<typename MatType >
mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::HoeffdingTree ( const MatType &  data,
const data::DatasetInfo &  datasetInfo,
const arma::Row< size_t > &  labels,
const size_t  numClasses,
const bool  batchTraining = true,
const double  successProbability = 0.95,
const size_t  maxSamples = 0,
const size_t  checkInterval = 100,
const size_t  minSamples = 100,
const CategoricalSplitType< FitnessFunction > &  categoricalSplitIn = CategoricalSplitType< FitnessFunction >(0, 0),
const NumericSplitType< FitnessFunction > &  numericSplitIn = NumericSplitType< FitnessFunction >(0) 
)

Construct the Hoeffding tree with the given parameters and given training data.

The tree may be trained either in batch mode (which looks at all points before splitting, and propagates these points to the created children for further training), or in streaming mode, where each point is only considered once. (In general, batch mode will give better-performing trees, but will have higher memory and runtime costs for the same dataset.)

Parameters
data                Dataset to train on.
datasetInfo         Information on the dataset (types of each feature).
labels              Labels of each point in the dataset.
numClasses          Number of classes in the dataset.
batchTraining       Whether or not to train in batch.
successProbability  Probability of success required in Hoeffding bounds before a split can happen.
maxSamples          Maximum number of samples before a split is forced (0 never forces a split); ignored in batch training mode.
checkInterval       Number of samples required before each split check; ignored in batch training mode.
minSamples          If the node has seen this many points or fewer, no split will be allowed.
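
The way minSamples, maxSamples, and checkInterval interact in streaming mode can be sketched as below. This is a minimal sketch based only on the parameter descriptions above; the helper NextSplitAction, the SplitAction enum, and the order in which the conditions are tested are assumptions for illustration and may not match the actual implementation.

```cpp
#include <cstddef>

// Hypothetical helper, not part of mlpack: decides what a streaming node
// might do after it has seen numSamples points.
enum class SplitAction { None, CheckBound, ForceSplit };

SplitAction NextSplitAction(const std::size_t numSamples,
                            const std::size_t minSamples,
                            const std::size_t maxSamples,
                            const std::size_t checkInterval)
{
  // No split is allowed until more than minSamples points have been seen.
  if (numSamples <= minSamples)
    return SplitAction::None;

  // maxSamples == 0 means that a split is never forced.
  if (maxSamples != 0 && numSamples >= maxSamples)
    return SplitAction::ForceSplit;

  // Otherwise the Hoeffding bound is only tested every checkInterval samples.
  if (checkInterval != 0 && numSamples % checkInterval == 0)
    return SplitAction::CheckBound;

  return SplitAction::None;
}
```

With the default values (minSamples = 100, maxSamples = 0, checkInterval = 100), this sketch would first test the bound at the 200th sample and would never force a split.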
template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::HoeffdingTree ( const data::DatasetInfo &  datasetInfo,
const size_t  numClasses,
const double  successProbability = 0.95,
const size_t  maxSamples = 0,
const size_t  checkInterval = 100,
const size_t  minSamples = 100,
const CategoricalSplitType< FitnessFunction > &  categoricalSplitIn = CategoricalSplitType< FitnessFunction >(0, 0),
const NumericSplitType< FitnessFunction > &  numericSplitIn = NumericSplitType< FitnessFunction >(0),
std::unordered_map< size_t, std::pair< size_t, size_t >> *  dimensionMappings = NULL 
)

Construct the Hoeffding tree with the given parameters, but training on no data.

The dimensionMappings parameter is only used if it is desired that this node does not create its own dimensionMappings object (for instance, if this is a child of another node in the tree).

Parameters
datasetInfo         Information on the dataset (types of each feature).
numClasses          Number of classes in the dataset.
successProbability  Probability of success required in Hoeffding bound before a split can happen.
maxSamples          Maximum number of samples before a split is forced.
checkInterval       Number of samples required before each split check.
minSamples          If the node has seen this many points or fewer, no split will be allowed.
dimensionMappings   Mappings from dimension indices to positions in numeric and categorical split vectors. If left NULL, a new one will be created.
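
To make the role of dimensionMappings more concrete, the sketch below builds a map of the same type from a list of feature kinds. It assumes that each pair holds (feature kind, position within the split vector for that kind); that interpretation, the FeatureKind enum, and the BuildDimensionMappings helper are all hypothetical and are not taken from mlpack.

```cpp
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical feature-kind tag for this sketch only.
enum FeatureKind : std::size_t { Numeric = 0, Categorical = 1 };

// Builds a map: dimension index -> (feature kind, position within the split
// vector for that kind). The meaning of the pair is an assumption.
std::unordered_map<std::size_t, std::pair<std::size_t, std::size_t>>
BuildDimensionMappings(const std::vector<FeatureKind>& kinds)
{
  std::unordered_map<std::size_t, std::pair<std::size_t, std::size_t>> m;
  std::size_t nextNumeric = 0, nextCategorical = 0;
  for (std::size_t d = 0; d < kinds.size(); ++d)
  {
    const std::size_t kind = static_cast<std::size_t>(kinds[d]);
    // Each kind keeps its own running position counter.
    const std::size_t position =
        (kinds[d] == Categorical) ? nextCategorical++ : nextNumeric++;
    m[d] = std::make_pair(kind, position);
  }
  return m;
}
```

For the kinds {Numeric, Categorical, Numeric}, dimension 2 would map to position 1 among the numeric splits, and dimension 1 to position 0 among the categorical splits.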
template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::HoeffdingTree ( const HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType > &  other)

Copy another tree.

Warning: this duplicates the entire tree and may use a lot of memory, so make sure it is what you want before doing it.

Parameters
other  Tree to copy.
template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::~HoeffdingTree ( )

Clean up memory.

Member Function Documentation

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
template<typename VecType >
size_t mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::CalculateDirection ( const VecType &  point) const

Given a point and that this node is not a leaf, calculate the index of the child node this point would go towards.

This method is primarily used by the Classify() function, but it can be used in a standalone sense too.

Parameters
point  Point to classify.

Referenced by mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::CheckInterval().

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
size_t mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::CheckInterval ( ) const
inline

Get the number of samples before a split check is performed.

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
void mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::CheckInterval ( const size_t  checkInterval)

Modify the number of samples before a split check is performed.

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
const HoeffdingTree& mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::Child ( const size_t  i) const
inline

Get a child.

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
HoeffdingTree& mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::Child ( const size_t  i)
inline

Modify a child.

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
template<typename VecType >
size_t mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::Classify ( const VecType &  point) const

Classify the given point, using this node and the entire (sub)tree beneath it.

The predicted label is returned.

Parameters
point  Point to classify.
Returns
Predicted label of point.
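
The relationship between CalculateDirection() and Classify() can be illustrated with a toy tree. The ToyNode struct below is a simplified stand-in, not mlpack's HoeffdingTree: it assumes a binary numeric split on a hypothetical splitValue threshold, whereas the real class delegates the direction to its split information objects.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Toy node for illustration; it mirrors the roles of splitDimension,
// majorityClass, and children, but is not mlpack's HoeffdingTree.
struct ToyNode
{
  std::size_t splitDimension = static_cast<std::size_t>(-1); // No split yet.
  double splitValue = 0.0; // Hypothetical binary numeric threshold.
  std::size_t majorityClass = 0;
  std::vector<std::unique_ptr<ToyNode>> children;

  // Index of the child that the point would be sent to.
  std::size_t CalculateDirection(const std::vector<double>& point) const
  {
    return (point[splitDimension] <= splitValue) ? 0 : 1;
  }

  // Descend until a leaf is reached, then return its majority class.
  std::size_t Classify(const std::vector<double>& point) const
  {
    if (children.empty())
      return majorityClass;
    return children[CalculateDirection(point)]->Classify(point);
  }
};
```

In this sketch a leaf answers with its own majority class, and every internal node simply forwards the point to the child chosen by CalculateDirection().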

Referenced by mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::CheckInterval().

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
template<typename VecType >
void mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::Classify ( const VecType &  point,
size_t &  prediction,
double &  probability 
) const

Classify the given point and also return an estimate of the probability that the prediction is correct.

(This estimate is simply the probability that a training point was from the majority class in the leaf that this point binned to.)

Parameters
point        Point to classify.
prediction   Predicted label of point.
probability  An estimate of the probability that the prediction is correct.
template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
template<typename MatType >
void mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::Classify ( const MatType &  data,
arma::Row< size_t > &  predictions 
) const

Classify the given points, using this node and the entire (sub)tree beneath it.

The predicted labels for each point are returned.

Parameters
data         Points to classify.
predictions  Predicted labels for each point.
template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
template<typename MatType >
void mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::Classify ( const MatType &  data,
arma::Row< size_t > &  predictions,
arma::rowvec &  probabilities 
) const

Classify the given points, using this node and the entire (sub)tree beneath it.

The predicted labels for each point are returned, as well as an estimate of the probability that the prediction is correct for each point. This estimate is simply the MajorityProbability() for the leaf that each point bins to.

Parameters
data           Points to classify.
predictions    Predicted labels for each point.
probabilities  Probability estimates for each predicted label.
template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
void mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::CreateChildren ( )

Given that this node should split, create the children.

Referenced by mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::CheckInterval().

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
size_t mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::MajorityClass ( ) const
inline

Get the majority class.

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
size_t& mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::MajorityClass ( )
inline

Modify the majority class.

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
double mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::MajorityProbability ( ) const
inline

Get the probability of the majority class (based on training samples).

Definition at line 189 of file hoeffding_tree.hpp.

References mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::majorityProbability.

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
double& mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::MajorityProbability ( )
inline

Modify the probability of the majority class.

Definition at line 191 of file hoeffding_tree.hpp.

References mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::majorityProbability.

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
size_t mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::MaxSamples ( ) const
inline

Get the maximum number of samples before a split is forced.

Definition at line 212 of file hoeffding_tree.hpp.

References mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::maxSamples.

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
void mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::MaxSamples ( const size_t  maxSamples)

Modify the maximum number of samples before a split is forced.

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
size_t mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::MinSamples ( ) const
inline

Get the minimum number of samples for a split.

Definition at line 207 of file hoeffding_tree.hpp.

References mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::minSamples.

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
void mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::MinSamples ( const size_t  minSamples)

Modify the minimum number of samples for a split.

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
size_t mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::NumChildren ( ) const
inline

Get the number of children.

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
template<typename Archive >
void mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::Serialize ( Archive &  ar,
const unsigned int
)

Serialize the split.

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
size_t mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::SplitCheck ( )

Check if a split would satisfy the conditions of the Hoeffding bound with the node's specified success probability.

If so, the number of children that would be created is returned. If not, 0 is returned.
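
The kind of test a Hoeffding-bound split check performs can be sketched as follows. The bound states that after n observations of a quantity with range R, the empirical mean is within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of the true mean with probability 1 - delta. The sketch assumes delta = 1 - successProbability and compares the gap between the two best candidate gains against epsilon; the function names, parameters, and the way gains are supplied are hypothetical and not taken from mlpack.

```cpp
#include <cmath>
#include <cstddef>

// Hoeffding bound epsilon for a quantity with the given range, after
// numSamples observations, holding with probability successProbability.
double HoeffdingEpsilon(const double range,
                        const double successProbability,
                        const std::size_t numSamples)
{
  const double delta = 1.0 - successProbability;
  return std::sqrt(range * range * std::log(1.0 / delta) /
      (2.0 * static_cast<double>(numSamples)));
}

// Hypothetical stand-in for the decision: returns the number of children the
// best split would create, or 0 if the gap is not yet significant.
std::size_t SplitCheckSketch(const double bestGain,
                             const double secondBestGain,
                             const std::size_t bestSplitChildren,
                             const double range,
                             const double successProbability,
                             const std::size_t numSamples)
{
  if (numSamples == 0)
    return 0;
  const double epsilon =
      HoeffdingEpsilon(range, successProbability, numSamples);
  return (bestGain - secondBestGain > epsilon) ? bestSplitChildren : 0;
}
```

Because epsilon shrinks as more samples arrive, a gap between the two best candidates that is too small to act on early may become sufficient later.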

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
size_t mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::SplitDimension ( ) const
inline

Get the splitting dimension (size_t(-1) if no split).
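
The size_t(-1) sentinel is the largest value a size_t can hold, so it cannot collide with a valid dimension index in practice. A small sketch of how calling code might test for it is shown below; the names NoSplit and HasSplit are hypothetical and do not exist in mlpack.

```cpp
#include <cstddef>
#include <limits>

// Sentinel meaning "this node has not split"; size_t(-1) wraps around to the
// largest representable size_t value.
constexpr std::size_t NoSplit = static_cast<std::size_t>(-1);

static_assert(NoSplit == std::numeric_limits<std::size_t>::max(),
              "size_t(-1) is the maximum size_t value");

// Hypothetical helper: true if the reported split dimension is a real one.
inline bool HasSplit(const std::size_t splitDimension)
{
  return splitDimension != NoSplit;
}
```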

Definition at line 181 of file hoeffding_tree.hpp.

References mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::splitDimension.

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
double mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::SuccessProbability ( ) const
inline

Get the confidence required for a split.

Definition at line 202 of file hoeffding_tree.hpp.

References mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::successProbability.

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
void mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::SuccessProbability ( const double  successProbability)

Modify the confidence required for a split.

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
template<typename MatType >
void mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::Train ( const MatType &  data,
const arma::Row< size_t > &  labels,
const bool  batchTraining = true 
)

Train on a set of points, either in streaming mode or in batch mode, with the given labels.

Parameters
data           Data points to train on.
labels         Labels of data points.
batchTraining  If true, perform training in batch.
template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
template<typename VecType >
void mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::Train ( const VecType &  point,
const size_t  label 
)

Train on a single point in streaming mode, with the given label.

Parameters
point  Point to train on.
label  Label of point to train on.

Member Data Documentation

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
CategoricalSplitType<FitnessFunction>::SplitInfo mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::categoricalSplit
private

If the split is categorical, this holds the splitting information.

Definition at line 331 of file hoeffding_tree.hpp.

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
std::vector<CategoricalSplitType<FitnessFunction> > mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::categoricalSplits
private

Information for splitting of categorical features (used before split).

Definition at line 297 of file hoeffding_tree.hpp.

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
size_t mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::checkInterval
private

The number of samples that should be seen before checking for a split.

Definition at line 311 of file hoeffding_tree.hpp.

Referenced by mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::CheckInterval().

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
std::vector<HoeffdingTree*> mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::children
private

If the split has occurred, these are the children.

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
const data::DatasetInfo* mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::datasetInfo
private

The dataset information.

Definition at line 315 of file hoeffding_tree.hpp.

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
std::unordered_map<size_t, std::pair<size_t, size_t> >* mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::dimensionMappings
private

This structure is owned by this node only if it is the root of the tree.

Definition at line 300 of file hoeffding_tree.hpp.

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
size_t mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::majorityClass
private

The majority class of this node.

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
double mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::majorityProbability
private

The empirical probability of a point this node saw having the majority class.

Definition at line 329 of file hoeffding_tree.hpp.

Referenced by mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::MajorityProbability().

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
size_t mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::maxSamples
private

The maximum number of samples we can see before splitting.

Definition at line 309 of file hoeffding_tree.hpp.

Referenced by mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::MaxSamples().

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
size_t mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::minSamples
private

The minimum number of samples for splitting.

Definition at line 313 of file hoeffding_tree.hpp.

Referenced by mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::MinSamples().

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
size_t mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::numClasses
private

The number of classes this node is trained on.

Definition at line 307 of file hoeffding_tree.hpp.

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
NumericSplitType<FitnessFunction>::SplitInfo mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::numericSplit
private

If the split is numeric, this holds the splitting information.

Definition at line 333 of file hoeffding_tree.hpp.

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
std::vector<NumericSplitType<FitnessFunction> > mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::numericSplits
private

Information for splitting of numeric features (used before split).

Definition at line 295 of file hoeffding_tree.hpp.

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
size_t mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::numSamples
private

The number of samples seen so far by this node.

Definition at line 305 of file hoeffding_tree.hpp.

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
bool mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::ownsInfo
private

Whether or not we own the dataset information.

Definition at line 317 of file hoeffding_tree.hpp.

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
bool mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::ownsMappings
private

Indicates whether or not we own the mappings.

Definition at line 302 of file hoeffding_tree.hpp.

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
size_t mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::splitDimension
private

The dimension that this node has split on.

Definition at line 324 of file hoeffding_tree.hpp.

Referenced by mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::SplitDimension().

template<typename FitnessFunction = GiniImpurity, template< typename > class NumericSplitType = HoeffdingDoubleNumericSplit, template< typename > class CategoricalSplitType = HoeffdingCategoricalSplit>
double mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::successProbability
private

The required probability of success for a split to be performed.

Definition at line 319 of file hoeffding_tree.hpp.

Referenced by mlpack::tree::HoeffdingTree< FitnessFunction, NumericSplitType, CategoricalSplitType >::SuccessProbability().


The documentation for this class was generated from the following file: