mlpack
master
This class implements a decision stump.
Public Member Functions | |
DecisionStump (const MatType &data, const arma::Row< size_t > &labels, const size_t classes, const size_t bucketSize=10) | |
Constructor. | |
DecisionStump (const DecisionStump<> &other, const MatType &data, const arma::Row< size_t > &labels, const arma::rowvec &weights) | |
Alternate constructor which copies the parameters bucketSize and classes from an already initialized decision stump, other. | |
DecisionStump () | |
Create a decision stump without training. | |
const arma::Col< size_t > | BinLabels () const |
Access the labels for each split bin. | |
arma::Col< size_t > & | BinLabels () |
Modify the labels for each split bin (be careful!). | |
void | Classify (const MatType &test, arma::Row< size_t > &predictedLabels) |
Classification function. | |
template<typename Archive > | |
void | Serialize (Archive &ar, const unsigned int) |
Serialize the decision stump. | |
const arma::vec & | Split () const |
Access the splitting values. | |
arma::vec & | Split () |
Modify the splitting values (be careful!). | |
size_t | SplitDimension () const |
Access the splitting dimension. | |
size_t & | SplitDimension () |
Modify the splitting dimension (be careful!). | |
void | Train (const MatType &data, const arma::Row< size_t > &labels, const size_t classes, const size_t bucketSize) |
Train the decision stump on the given data. | |
void | Train (const MatType &data, const arma::Row< size_t > &labels, const arma::rowvec &weights, const size_t classes, const size_t bucketSize) |
Train the decision stump on the given data, with the given weights. | |
Private Member Functions | |
template<bool UseWeights, typename VecType , typename WeightVecType > | |
double | CalculateEntropy (const VecType &labels, const WeightVecType &weights) |
Calculate the entropy of the given dimension. | |
template<typename VecType > | |
double | CountMostFreq (const VecType &subCols) |
Count the most frequently occurring element in subCols. | |
template<typename VecType > | |
int | IsDistinct (const VecType &featureRow) |
Returns 1 if the values of featureRow are not all the same. | |
void | MergeRanges () |
After the "split" matrix has been set up, merge ranges with identical class labels. | |
template<bool UseWeights, typename VecType > | |
double | SetupSplitDimension (const VecType &dimension, const arma::Row< size_t > &labels, const arma::rowvec &weightD) |
Sets up dimension as if it were splitting on it and finds entropy when splitting on dimension. | |
template<bool UseWeights> | |
void | Train (const MatType &data, const arma::Row< size_t > &labels, const arma::rowvec &weights) |
Train the decision stump on the given data and labels. | |
template<typename VecType > | |
void | TrainOnDim (const VecType &dimension, const arma::Row< size_t > &labels) |
After having decided the dimension on which to split, train on that dimension. | |
Private Attributes | |
arma::Col< size_t > | binLabels |
Stores the labels for each splitting bin. | |
size_t | bucketSize |
The minimum number of points in a bucket. | |
size_t | classes |
The number of classes (we must store this for boosting). | |
arma::vec | split |
Stores the splitting values after training. | |
size_t | splitDimension |
Stores the value of the dimension on which to split. | |
This class implements a decision stump, i.e., a single-level decision tree. It uses entropy to decide splitting ranges.
The stump is parameterized by a splitting dimension (the dimension on which points are split), a vector of bin split values, and a vector of labels for each bin. Bin i covers the range [split[i], split[i + 1]). The last bin extends to infinity (split[i + 1] does not exist in that case). Points that fall below the first bin take the label of the first bin.
MatType | Type of matrix that is being used (sparse or dense). |
Definition at line 34 of file decision_stump.hpp.
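The bin layout described above can be illustrated with a minimal, self-contained sketch. It uses std::vector instead of Armadillo types, and the helper names FindBin and PredictLabel are hypothetical; they are not part of mlpack and only mirror the documented rule that bin i covers [split[i], split[i + 1]).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of mlpack): return the index of the bin that
// contains `value`, given ascending bin lower bounds in `split`.
// Bin i covers [split[i], split[i + 1]); the last bin is unbounded above, and
// values below split[0] fall into bin 0. Assumes `split` is non-empty.
std::size_t FindBin(const std::vector<double>& split, const double value)
{
  std::size_t bin = 0;
  // Advance while the next bin's lower bound is still <= value.
  while (bin + 1 < split.size() && value >= split[bin + 1])
    ++bin;
  return bin;
}

// Hypothetical helper: the predicted label is the label stored for the bin.
std::size_t PredictLabel(const std::vector<double>& split,
                         const std::vector<std::size_t>& binLabels,
                         const double value)
{
  return binLabels[FindBin(split, value)];
}
```

With split = {1.0, 3.0, 5.0}, a value of 0.0 lands in bin 0 (below the first bin), 3.0 lands in bin 1 (lower bounds are inclusive), and any value of 5.0 or more lands in bin 2.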
mlpack::decision_stump::DecisionStump< MatType >::DecisionStump | ( | const MatType & | data, |
const arma::Row< size_t > & | labels, | ||
const size_t | classes, | ||
const size_t | bucketSize = 10 |
||
) |
Constructor.
Trains a decision stump on the provided data.
data | Input, training data. |
labels | Labels of training data. |
classes | Number of distinct classes in labels. |
bucketSize | Minimum size of bucket when splitting. |
mlpack::decision_stump::DecisionStump< MatType >::DecisionStump | ( | const DecisionStump<> & | other, |
const MatType & | data, | ||
const arma::Row< size_t > & | labels, | ||
const arma::rowvec & | weights | ||
) |
Alternate constructor which copies the parameters bucketSize and classes from an already initialized decision stump, other.
It trains on the given data using the supplied weight vector.
other | The other initialized DecisionStump object from which the values are copied. |
data | The data on which to train this object. |
labels | The labels of data. |
weights | Weight vector to use while training. For boosting purposes. |
mlpack::decision_stump::DecisionStump< MatType >::DecisionStump | ( | ) |
Create a decision stump without training.
An untrained stump is not useful: it always returns class 0 for any point to be classified, so call Train() after using this constructor.
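const arma::Col< size_t > mlpack::decision_stump::DecisionStump< MatType >::BinLabels | ( | ) | const |
inline |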
Access the labels for each split bin.
Definition at line 127 of file decision_stump.hpp.
References mlpack::decision_stump::DecisionStump< MatType >::binLabels.
arma::Col< size_t > & mlpack::decision_stump::DecisionStump< MatType >::BinLabels | ( | ) |
inline |
Modify the labels for each split bin (be careful!).
Definition at line 129 of file decision_stump.hpp.
References mlpack::decision_stump::DecisionStump< MatType >::binLabels, and mlpack::decision_stump::DecisionStump< MatType >::Serialize().
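template<bool UseWeights, typename VecType , typename WeightVecType >
double mlpack::decision_stump::DecisionStump< MatType >::CalculateEntropy | ( | const VecType & | labels, |
const WeightVecType & | weights | ||
) |
private |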
Calculate the entropy of the given dimension.
labels | Corresponding labels of the dimension. |
classes | Number of classes. |
weights | Weights for this set of labels. |
UseWeights | If true, the weights in the weight vector will be used (otherwise they are ignored). |
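To make the role of the UseWeights switch concrete, here is a minimal sketch of a label-entropy computation. It is an illustration only, not mlpack's implementation: the function name LabelEntropy, the explicit numClasses argument, the use of std::vector, and the choice of base-2 Shannon entropy are all assumptions, and the library may differ in sign convention or logarithm base.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch (not mlpack's code): Shannon entropy, in bits, of a set
// of class labels. When UseWeights is true each label contributes its weight
// to its class's mass; otherwise every label counts as 1 and `weights` is
// ignored. Assumes every label is < numClasses.
template<bool UseWeights>
double LabelEntropy(const std::vector<std::size_t>& labels,
                    const std::vector<double>& weights,
                    const std::size_t numClasses)
{
  std::vector<double> mass(numClasses, 0.0);
  double total = 0.0;
  for (std::size_t i = 0; i < labels.size(); ++i)
  {
    const double w = UseWeights ? weights[i] : 1.0;
    mass[labels[i]] += w;
    total += w;
  }

  double entropy = 0.0;
  for (const double m : mass)
  {
    if (m > 0.0)  // Classes with no mass contribute nothing.
    {
      const double p = m / total;
      entropy -= p * std::log2(p);
    }
  }
  return entropy;
}
```

Making UseWeights a template parameter, as the documented signature does, lets the compiler drop the weight lookup entirely in the unweighted case.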
void mlpack::decision_stump::DecisionStump< MatType >::Classify | ( | const MatType & | test, |
arma::Row< size_t > & | predictedLabels | ||
) |
Classification function.
After training, classify test, and put the predicted classes in predictedLabels.
test | Testing data or data to classify. |
predictedLabels | Vector to store the predicted classes after classifying test data. |
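template<typename VecType >
double mlpack::decision_stump::DecisionStump< MatType >::CountMostFreq | ( | const VecType & | subCols | ) |
private |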
Count the most frequently occurring element in subCols.
subCols | The vector in which to find the most frequently occurring element. |
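As an illustration of what finding the most frequent element involves, the following sketch counts occurrences with a std::map. The name MostFrequent, the std::vector input, and the tie-breaking rule are assumptions made for this example and are not taken from mlpack.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical sketch: return the value that occurs most often in `values`.
// Ties are broken in favor of the smallest value. Assumes `values` is
// non-empty.
double MostFrequent(const std::vector<double>& values)
{
  std::map<double, std::size_t> counts;
  for (const double v : values)
    ++counts[v];

  double best = values.front();
  std::size_t bestCount = 0;
  // std::map iterates in ascending key order, so the strict comparison below
  // keeps the smallest value when several values share the highest count.
  for (const auto& entry : counts)
  {
    if (entry.second > bestCount)
    {
      best = entry.first;
      bestCount = entry.second;
    }
  }
  return best;
}
```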
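template<typename VecType >
int mlpack::decision_stump::DecisionStump< MatType >::IsDistinct | ( | const VecType & | featureRow | ) |
private |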
Returns 1 if the values of featureRow are not all the same.
featureRow | The dimension which is checked for identical values. |
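A dimension whose values are all identical cannot separate any points, which is presumably why this check exists. The sketch below shows one way to express it; the name HasDistinctValues and the std::vector input are assumptions for illustration, not mlpack's code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: return 1 if `featureRow` holds at least two different
// values, and 0 if every value is identical (or the row is empty).
int HasDistinctValues(const std::vector<double>& featureRow)
{
  for (std::size_t i = 1; i < featureRow.size(); ++i)
  {
    // Comparing against the first element is enough: if all elements equal
    // the first one, they all equal each other.
    if (featureRow[i] != featureRow[0])
      return 1;
  }
  return 0;
}
```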
void mlpack::decision_stump::DecisionStump< MatType >::MergeRanges | ( | ) |
private |
After the "split" matrix has been set up, merge ranges with identical class labels.
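The merging step can be sketched as follows, assuming the bins are stored as parallel arrays of lower bounds and labels. The function name MergeAdjacentBins and the std::vector storage are hypothetical; mlpack's own routine operates on its internal split and binLabels members and may differ in detail.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: collapse adjacent bins that share a label. `split`
// holds each bin's lower bound and `binLabels` the label of each bin; both
// must have the same length. A merged bin keeps the lower bound of its first
// member.
void MergeAdjacentBins(std::vector<double>& split,
                       std::vector<std::size_t>& binLabels)
{
  if (split.empty())
    return;

  std::size_t kept = 0;  // Index of the last bin retained so far.
  for (std::size_t i = 1; i < split.size(); ++i)
  {
    // Start a new retained bin only when the label changes.
    if (binLabels[i] != binLabels[kept])
    {
      ++kept;
      split[kept] = split[i];
      binLabels[kept] = binLabels[i];
    }
  }
  split.resize(kept + 1);
  binLabels.resize(kept + 1);
}
```

For example, lower bounds {0, 2, 4, 6, 8} with labels {0, 0, 1, 1, 0} reduce to lower bounds {0, 4, 8} with labels {0, 1, 0}, which leaves every point's predicted label unchanged while shortening the lookup.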
template<typename Archive >
void mlpack::decision_stump::DecisionStump< MatType >::Serialize | ( | Archive & | ar, |
const unsigned int | | ||
) |
Serialize the decision stump.
Referenced by mlpack::decision_stump::DecisionStump< MatType >::BinLabels().
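template<bool UseWeights, typename VecType >
double mlpack::decision_stump::DecisionStump< MatType >::SetupSplitDimension | ( | const VecType & | dimension, |
const arma::Row< size_t > & | labels, | ||
const arma::rowvec & | weightD | ||
) |
private |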
Sets up dimension as if it were splitting on it and finds entropy when splitting on dimension.
dimension | A row from the training data, which might be a candidate for the splitting dimension. |
UseWeights | Whether we need to run a weighted Decision Stump. |
const arma::vec & mlpack::decision_stump::DecisionStump< MatType >::Split | ( | ) | const |
inline |
Access the splitting values.
Definition at line 122 of file decision_stump.hpp.
References mlpack::decision_stump::DecisionStump< MatType >::split.
arma::vec & mlpack::decision_stump::DecisionStump< MatType >::Split | ( | ) |
inline |
Modify the splitting values (be careful!).
Definition at line 124 of file decision_stump.hpp.
References mlpack::decision_stump::DecisionStump< MatType >::split.
size_t mlpack::decision_stump::DecisionStump< MatType >::SplitDimension | ( | ) | const |
inline |
Access the splitting dimension.
Definition at line 117 of file decision_stump.hpp.
References mlpack::decision_stump::DecisionStump< MatType >::splitDimension.
size_t & mlpack::decision_stump::DecisionStump< MatType >::SplitDimension | ( | ) |
inline |
Modify the splitting dimension (be careful!).
Definition at line 119 of file decision_stump.hpp.
References mlpack::decision_stump::DecisionStump< MatType >::splitDimension.
void mlpack::decision_stump::DecisionStump< MatType >::Train | ( | const MatType & | data, |
const arma::Row< size_t > & | labels, | ||
const size_t | classes, | ||
const size_t | bucketSize | ||
) |
Train the decision stump on the given data.
This completely overwrites the result of any previous training, so the stump may be completely different afterwards.
data | Dataset to train on. |
labels | Labels for each point in the dataset. |
classes | Number of classes in the dataset. |
bucketSize | Minimum size of bucket when splitting. |
void mlpack::decision_stump::DecisionStump< MatType >::Train | ( | const MatType & | data, |
const arma::Row< size_t > & | labels, | ||
const arma::rowvec & | weights, | ||
const size_t | classes, | ||
const size_t | bucketSize | ||
) |
Train the decision stump on the given data, with the given weights.
This completely overwrites the result of any previous training, so the stump may be completely different afterwards.
data | Dataset to train on. |
labels | Labels for each point in the dataset. |
weights | Weights for each point in the dataset. |
classes | Number of classes in the dataset. |
bucketSize | Minimum size of bucket when splitting. |
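template<bool UseWeights>
void mlpack::decision_stump::DecisionStump< MatType >::Train | ( | const MatType & | data, |
const arma::Row< size_t > & | labels, | ||
const arma::rowvec & | weights | ||
) |
private |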
Train the decision stump on the given data and labels.
data | Dataset to train on. |
labels | Labels for dataset. |
weights | Weights for this set of labels. |
UseWeights | If true, the weights in the weight vector will be used (otherwise they are ignored). |
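template<typename VecType >
void mlpack::decision_stump::DecisionStump< MatType >::TrainOnDim | ( | const VecType & | dimension, |
const arma::Row< size_t > & | labels | ||
) |
private |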
After having decided the dimension on which to split, train on that dimension.
dimension | The dimension chosen for the split, on which the decision stump is now trained. |
arma::Col< size_t > mlpack::decision_stump::DecisionStump< MatType >::binLabels |
private |
Stores the labels for each splitting bin.
Definition at line 146 of file decision_stump.hpp.
Referenced by mlpack::decision_stump::DecisionStump< MatType >::BinLabels().
size_t mlpack::decision_stump::DecisionStump< MatType >::bucketSize |
private |
The minimum number of points in a bucket.
Definition at line 139 of file decision_stump.hpp.
size_t mlpack::decision_stump::DecisionStump< MatType >::classes |
private |
The number of classes (we must store this for boosting).
Definition at line 137 of file decision_stump.hpp.
arma::vec mlpack::decision_stump::DecisionStump< MatType >::split |
private |
Stores the splitting values after training.
Definition at line 144 of file decision_stump.hpp.
Referenced by mlpack::decision_stump::DecisionStump< MatType >::Split().
size_t mlpack::decision_stump::DecisionStump< MatType >::splitDimension |
private |
Stores the value of the dimension on which to split.
Definition at line 142 of file decision_stump.hpp.
Referenced by mlpack::decision_stump::DecisionStump< MatType >::SplitDimension().