mlpack  master
mlpack::decision_stump::DecisionStump< MatType > Class Template Reference

This class implements a decision stump.

Public Member Functions

 DecisionStump (const MatType &data, const arma::Row< size_t > &labels, const size_t classes, const size_t bucketSize=10)
 Constructor.

 DecisionStump (const DecisionStump<> &other, const MatType &data, const arma::Row< size_t > &labels, const arma::rowvec &weights)
 Alternate constructor which copies the parameters bucketSize and classes from an already initialized decision stump, other.

 DecisionStump ()
 Create a decision stump without training.

const arma::Col< size_t > BinLabels () const
 Access the labels for each split bin.

arma::Col< size_t > & BinLabels ()
 Modify the labels for each split bin (be careful!).

void Classify (const MatType &test, arma::Row< size_t > &predictedLabels)
 Classification function.

template<typename Archive >
void Serialize (Archive &ar, const unsigned int)
 Serialize the decision stump.

const arma::vec & Split () const
 Access the splitting values.

arma::vec & Split ()
 Modify the splitting values (be careful!).

size_t SplitDimension () const
 Access the splitting dimension.

size_t & SplitDimension ()
 Modify the splitting dimension (be careful!).

void Train (const MatType &data, const arma::Row< size_t > &labels, const size_t classes, const size_t bucketSize)
 Train the decision stump on the given data.

void Train (const MatType &data, const arma::Row< size_t > &labels, const arma::rowvec &weights, const size_t classes, const size_t bucketSize)
 Train the decision stump on the given data, with the given weights.
 

Private Member Functions

template<bool UseWeights, typename VecType , typename WeightVecType >
double CalculateEntropy (const VecType &labels, const WeightVecType &weights)
 Calculate the entropy of the given dimension.

template<typename VecType >
double CountMostFreq (const VecType &subCols)
 Count the most frequently occurring element in subCols.

template<typename VecType >
int IsDistinct (const VecType &featureRow)
 Returns 1 if the values of featureRow are not all the same.

void MergeRanges ()
 After the "split" matrix has been set up, merge ranges with identical class labels.

template<bool UseWeights, typename VecType >
double SetupSplitDimension (const VecType &dimension, const arma::Row< size_t > &labels, const arma::rowvec &weightD)
 Sets up dimension as if it were splitting on it and finds entropy when splitting on dimension.

template<bool UseWeights>
void Train (const MatType &data, const arma::Row< size_t > &labels, const arma::rowvec &weights)
 Train the decision stump on the given data and labels.

template<typename VecType >
void TrainOnDim (const VecType &dimension, const arma::Row< size_t > &labels)
 After having decided the dimension on which to split, train on that dimension.
 

Private Attributes

arma::Col< size_t > binLabels
 Stores the labels for each splitting bin.

size_t bucketSize
 The minimum number of points in a bucket.

size_t classes
 The number of classes (we must store this for boosting).

arma::vec split
 Stores the splitting values after training.

size_t splitDimension
 Stores the value of the dimension on which to split.
 

Detailed Description

template<typename MatType = arma::mat>
class mlpack::decision_stump::DecisionStump< MatType >

This class implements a decision stump.

It constructs a single-level decision tree, i.e., a decision stump. It uses entropy to decide splitting ranges.

The stump is parameterized by a splitting dimension (the dimension on which points are split), a vector of bin split values, and a vector of labels for each bin. Bin i is specified by the range [split[i], split[i + 1]). The last bin extends up to infinity (split[i + 1] does not exist in that case). Points that fall below the first bin take the label of the first bin.

Template Parameters
    MatType     Type of matrix that is being used (sparse or dense).

Definition at line 34 of file decision_stump.hpp.
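
The bin rule described above can be illustrated with a small standalone sketch. It does not use mlpack or Armadillo: StumpSketch, BinIndex and ClassifyPoint are hypothetical names introduced here, std::vector stands in for the Armadillo types, and the split values are assumed to be sorted in ascending order. It is one plausible reading of the description, not the library's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a trained stump. The field names mirror the
// documented members, but this is not mlpack code.
struct StumpSketch
{
  std::size_t splitDimension;          // dimension on which points are split
  std::vector<double> split;           // ascending lower bounds of the bins
  std::vector<std::size_t> binLabels;  // one label per bin
};

// Return the index i of the bin [split[i], split[i + 1]) containing value.
std::size_t BinIndex(const std::vector<double>& split, const double value)
{
  assert(!split.empty());
  // Find the first split value strictly greater than value; the bin that
  // contains value is the one just before it.
  const auto it = std::upper_bound(split.begin(), split.end(), value);
  if (it == split.begin())
    return 0; // below the first bin: use the first bin
  return static_cast<std::size_t>(it - split.begin()) - 1;
}

// Classify a single point, given as a vector of feature values.
std::size_t ClassifyPoint(const StumpSketch& stump,
                          const std::vector<double>& point)
{
  assert(stump.split.size() == stump.binLabels.size());
  assert(stump.splitDimension < point.size());
  return stump.binLabels[BinIndex(stump.split, point[stump.splitDimension])];
}
```

With split = {0.0, 2.5, 5.0}, a value of 4.9 falls in bin 1, a value of 5.0 or larger falls in the last bin, and a value below 0.0 is assigned to bin 0.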

Constructor & Destructor Documentation

template<typename MatType = arma::mat>
mlpack::decision_stump::DecisionStump< MatType >::DecisionStump ( const MatType &  data,
const arma::Row< size_t > &  labels,
const size_t  classes,
const size_t  bucketSize = 10 
)

Constructor.

Trains a decision stump on the provided data.

Parameters
    data        Input, training data.
    labels      Labels of training data.
    classes     Number of distinct classes in labels.
    bucketSize  Minimum size of bucket when splitting.

template<typename MatType = arma::mat>
mlpack::decision_stump::DecisionStump< MatType >::DecisionStump ( const DecisionStump<> &  other,
const MatType &  data,
const arma::Row< size_t > &  labels,
const arma::rowvec &  weights 
)

Alternate constructor which copies the parameters bucketSize and classes from an already initialized decision stump, other.

It appropriately sets the weight vector.

Parameters
    other       The other initialized DecisionStump object from which we copy the values.
    data        The data on which to train this object.
    labels      The labels of data.
    weights     Weight vector to use while training. For boosting purposes.

template<typename MatType = arma::mat>
mlpack::decision_stump::DecisionStump< MatType >::DecisionStump ( )

Create a decision stump without training.

This stump will not be useful: it always returns class 0 for anything that is to be classified, so call Train() after using this constructor.

Member Function Documentation

template<typename MatType = arma::mat>
const arma::Col<size_t> mlpack::decision_stump::DecisionStump< MatType >::BinLabels ( ) const
inline

Access the labels for each split bin.

Definition at line 127 of file decision_stump.hpp.

References mlpack::decision_stump::DecisionStump< MatType >::binLabels.

template<typename MatType = arma::mat>
arma::Col<size_t>& mlpack::decision_stump::DecisionStump< MatType >::BinLabels ( )
inline

Modify the labels for each split bin (be careful!).

Definition at line 129 of file decision_stump.hpp.

References mlpack::decision_stump::DecisionStump< MatType >::binLabels, and mlpack::decision_stump::DecisionStump< MatType >::Serialize().

template<typename MatType = arma::mat>
template<bool UseWeights, typename VecType , typename WeightVecType >
double mlpack::decision_stump::DecisionStump< MatType >::CalculateEntropy ( const VecType &  labels,
const WeightVecType &  weights 
)
private

Calculate the entropy of the given dimension.

Parameters
    labels      Corresponding labels of the dimension.
    classes     Number of classes.
    weights     Weights for this set of labels.
Template Parameters
    UseWeights  If true, the weights in the weight vector will be used (otherwise they are ignored).

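
As an illustration of a weighted versus unweighted entropy computation, here is a minimal standalone sketch. It assumes the standard Shannon entropy in bits; the source does not state the exact formula or sign convention mlpack uses, so treat this as an assumption. EntropySketch is a hypothetical name, std::vector stands in for the Armadillo types, and classes is passed explicitly here rather than read from a member.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Shannon entropy (in bits) of a label vector whose labels lie in
// [0, classes). When UseWeights is false, the weights vector is ignored and
// every point counts with weight 1.
template<bool UseWeights>
double EntropySketch(const std::vector<std::size_t>& labels,
                     const std::vector<double>& weights,
                     const std::size_t classes)
{
  if (UseWeights)
    assert(weights.size() == labels.size());

  // Accumulate the (weighted) mass of each class.
  std::vector<double> mass(classes, 0.0);
  double total = 0.0;
  for (std::size_t i = 0; i < labels.size(); ++i)
  {
    assert(labels[i] < classes);
    const double w = UseWeights ? weights[i] : 1.0;
    mass[labels[i]] += w;
    total += w;
  }

  if (total <= 0.0)
    return 0.0; // no points (or no weight): define the entropy as zero

  // H = -sum_c p_c * log2(p_c), skipping classes with zero mass.
  double entropy = 0.0;
  for (const double m : mass)
  {
    if (m > 0.0)
    {
      const double p = m / total;
      entropy -= p * std::log2(p);
    }
  }
  return entropy;
}
```

Under this definition a pure label set has entropy 0 and an evenly split two-class set has entropy 1.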
template<typename MatType = arma::mat>
void mlpack::decision_stump::DecisionStump< MatType >::Classify ( const MatType &  test,
arma::Row< size_t > &  predictedLabels 
)

Classification function.

After training, classify test, and put the predicted classes in predictedLabels.

Parameters
    test             Testing data or data to classify.
    predictedLabels  Vector to store the predicted classes after classifying test data.

template<typename MatType = arma::mat>
template<typename VecType >
double mlpack::decision_stump::DecisionStump< MatType >::CountMostFreq ( const VecType &  subCols)
private

Count the most frequently occurring element in subCols.

Parameters
    subCols     The vector in which to find the most frequently occurring element.

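
A minimal standalone sketch of finding the most frequently occurring element is shown below. MostFrequentSketch is a hypothetical name, std::vector stands in for VecType, and breaking ties in favour of the smallest value is an assumption of this sketch; the source does not say how mlpack resolves ties.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Return the value that occurs most often in values. Ties are broken in
// favour of the smallest value (an assumption of this sketch).
double MostFrequentSketch(const std::vector<double>& values)
{
  assert(!values.empty());

  // Count the occurrences of each distinct value.
  std::map<double, std::size_t> counts;
  for (const double v : values)
    ++counts[v];

  // std::map iterates in ascending key order, so using a strict comparison
  // keeps the smallest value among equally frequent ones.
  double best = values.front();
  std::size_t bestCount = 0;
  for (const auto& entry : counts)
  {
    if (entry.second > bestCount)
    {
      best = entry.first;
      bestCount = entry.second;
    }
  }
  return best;
}
```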
template<typename MatType = arma::mat>
template<typename VecType >
int mlpack::decision_stump::DecisionStump< MatType >::IsDistinct ( const VecType &  featureRow)
private

Returns 1 if the values of featureRow are not all the same.

Parameters
    featureRow  The dimension which is checked for identical values.

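
The check can be sketched in a few lines of standalone C++. IsDistinctSketch is a hypothetical name, std::vector stands in for VecType, and returning 0 for an empty input is an assumption of this sketch rather than documented behaviour.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Return 1 if featureRow contains at least two different values, and 0 if
// all values are identical (or the vector is empty).
int IsDistinctSketch(const std::vector<double>& featureRow)
{
  if (featureRow.empty())
    return 0;

  // Compare every value against the first one.
  const bool allSame = std::all_of(featureRow.begin(), featureRow.end(),
      [&featureRow](const double v) { return v == featureRow.front(); });
  return allSame ? 0 : 1;
}
```

A dimension for which this returns 0 carries no information for splitting, since every point has the same value in it.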
template<typename MatType = arma::mat>
void mlpack::decision_stump::DecisionStump< MatType >::MergeRanges ( )
private

After the "split" matrix has been set up, merge ranges with identical class labels.
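
One way to merge adjacent ranges that share a class label is sketched below. MergeRangesSketch is a hypothetical free function operating on std::vector arguments, whereas the documented member takes no arguments and works on the stump's own split and binLabels; the in-place compaction shown here is an assumption, not the library's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Merge neighbouring bins that carry the same label. split[i] is the lower
// bound of bin i and binLabels[i] is its label; both are compacted in place.
void MergeRangesSketch(std::vector<double>& split,
                       std::vector<std::size_t>& binLabels)
{
  assert(split.size() == binLabels.size());
  if (split.empty())
    return;

  std::size_t kept = 0; // index of the last bin kept so far
  for (std::size_t i = 1; i < split.size(); ++i)
  {
    if (binLabels[i] != binLabels[kept])
    {
      // A new label starts a new bin at this lower bound.
      ++kept;
      split[kept] = split[i];
      binLabels[kept] = binLabels[i];
    }
    // Otherwise bin i is absorbed into the kept bin, whose lower bound
    // stays unchanged.
  }
  split.resize(kept + 1);
  binLabels.resize(kept + 1);
}
```

For example, split values {0, 1, 2, 3, 4} with labels {0, 0, 1, 1, 0} reduce to split values {0, 2, 4} with labels {0, 1, 0}.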

template<typename MatType = arma::mat>
template<typename Archive >
void mlpack::decision_stump::DecisionStump< MatType >::Serialize ( Archive &  ar,
const unsigned int 
)

Serialize the decision stump.

Referenced by mlpack::decision_stump::DecisionStump< MatType >::BinLabels().

template<typename MatType = arma::mat>
template<bool UseWeights, typename VecType >
double mlpack::decision_stump::DecisionStump< MatType >::SetupSplitDimension ( const VecType &  dimension,
const arma::Row< size_t > &  labels,
const arma::rowvec &  weightD 
)
private

Sets up dimension as if it were splitting on it and finds entropy when splitting on dimension.

Parameters
    dimension   A row from the training data, which might be a candidate for the splitting dimension.
Template Parameters
    UseWeights  Whether we need to run a weighted Decision Stump.

template<typename MatType = arma::mat>
const arma::vec& mlpack::decision_stump::DecisionStump< MatType >::Split ( ) const
inline

Access the splitting values.

Definition at line 122 of file decision_stump.hpp.

References mlpack::decision_stump::DecisionStump< MatType >::split.

template<typename MatType = arma::mat>
arma::vec& mlpack::decision_stump::DecisionStump< MatType >::Split ( )
inline

Modify the splitting values (be careful!).

Definition at line 124 of file decision_stump.hpp.

References mlpack::decision_stump::DecisionStump< MatType >::split.

template<typename MatType = arma::mat>
size_t mlpack::decision_stump::DecisionStump< MatType >::SplitDimension ( ) const
inline

Access the splitting dimension.

Definition at line 117 of file decision_stump.hpp.

References mlpack::decision_stump::DecisionStump< MatType >::splitDimension.

template<typename MatType = arma::mat>
size_t& mlpack::decision_stump::DecisionStump< MatType >::SplitDimension ( )
inline

Modify the splitting dimension (be careful!).

Definition at line 119 of file decision_stump.hpp.

References mlpack::decision_stump::DecisionStump< MatType >::splitDimension.

template<typename MatType = arma::mat>
void mlpack::decision_stump::DecisionStump< MatType >::Train ( const MatType &  data,
const arma::Row< size_t > &  labels,
const size_t  classes,
const size_t  bucketSize 
)

Train the decision stump on the given data.

This completely overwrites any previous training data, so after training the stump may be completely different.

Parameters
    data        Dataset to train on.
    labels      Labels for each point in the dataset.
    classes     Number of classes in the dataset.
    bucketSize  Minimum size of bucket when splitting.

template<typename MatType = arma::mat>
void mlpack::decision_stump::DecisionStump< MatType >::Train ( const MatType &  data,
const arma::Row< size_t > &  labels,
const arma::rowvec &  weights,
const size_t  classes,
const size_t  bucketSize 
)

Train the decision stump on the given data, with the given weights.

This completely overwrites any previous training data, so after training the stump may be completely different.

Parameters
    data        Dataset to train on.
    labels      Labels for each point in the dataset.
    weights     Weights for each point in the dataset.
    classes     Number of classes in the dataset.
    bucketSize  Minimum size of bucket when splitting.

template<typename MatType = arma::mat>
template<bool UseWeights>
void mlpack::decision_stump::DecisionStump< MatType >::Train ( const MatType &  data,
const arma::Row< size_t > &  labels,
const arma::rowvec &  weights 
)
private

Train the decision stump on the given data and labels.

Parameters
    data        Dataset to train on.
    labels      Labels for dataset.
    weights     Weights for this set of labels.
Template Parameters
    UseWeights  If true, the weights in the weight vector will be used (otherwise they are ignored).

template<typename MatType = arma::mat>
template<typename VecType >
void mlpack::decision_stump::DecisionStump< MatType >::TrainOnDim ( const VecType &  dimension,
const arma::Row< size_t > &  labels 
)
private

After having decided the dimension on which to split, train on that dimension.

Parameters
    dimension   The dimension decided on by the constructor, on which the decision stump is now trained.

Member Data Documentation

template<typename MatType = arma::mat>
arma::Col<size_t> mlpack::decision_stump::DecisionStump< MatType >::binLabels
private

Stores the labels for each splitting bin.

Definition at line 146 of file decision_stump.hpp.

Referenced by mlpack::decision_stump::DecisionStump< MatType >::BinLabels().

template<typename MatType = arma::mat>
size_t mlpack::decision_stump::DecisionStump< MatType >::bucketSize
private

The minimum number of points in a bucket.

Definition at line 139 of file decision_stump.hpp.

template<typename MatType = arma::mat>
size_t mlpack::decision_stump::DecisionStump< MatType >::classes
private

The number of classes (we must store this for boosting).

Definition at line 137 of file decision_stump.hpp.

template<typename MatType = arma::mat>
arma::vec mlpack::decision_stump::DecisionStump< MatType >::split
private

Stores the splitting values after training.

Definition at line 144 of file decision_stump.hpp.

Referenced by mlpack::decision_stump::DecisionStump< MatType >::Split().

template<typename MatType = arma::mat>
size_t mlpack::decision_stump::DecisionStump< MatType >::splitDimension
private

Stores the value of the dimension on which to split.

Definition at line 142 of file decision_stump.hpp.

Referenced by mlpack::decision_stump::DecisionStump< MatType >::SplitDimension().


The documentation for this class was generated from the following file: