mlpack
master
This class implements a generic decision tree learner.
Public Types
typedef CategoricalSplitType< FitnessFunction > | CategoricalSplit
Allow access to the categorical split type.
typedef NumericSplitType< FitnessFunction > | NumericSplit
Allow access to the numeric split type.
Public Member Functions
template<typename MatType >
DecisionTree (const MatType &data, const data::DatasetInfo &datasetInfo, const arma::Row< size_t > &labels, const size_t numClasses, const size_t minimumLeafSize=10)
Construct the decision tree on the given data and labels, where the data may contain both numeric and categorical dimensions.
template<typename MatType >
DecisionTree (const MatType &data, const arma::Row< size_t > &labels, const size_t numClasses, const size_t minimumLeafSize=10)
Construct the decision tree on the given data and labels, assuming that all dimensions are numeric.
DecisionTree (const size_t numClasses=1)
Construct a decision tree without training it.
DecisionTree (const DecisionTree &other)
Copy another tree.
DecisionTree (DecisionTree &&other)
Take ownership of another tree.
~DecisionTree ()
Clean up memory.
template<typename VecType >
size_t | CalculateDirection (const VecType &point) const
Given a point, and assuming that this node is not a leaf, calculate the index of the child node the point would be sent to.
const DecisionTree & | Child (const size_t i) const
Get the child of the given index.
DecisionTree & | Child (const size_t i)
Modify the child of the given index (be careful!).
template<typename VecType >
size_t | Classify (const VecType &point) const
Classify the given point, using the entire tree.
template<typename VecType >
void | Classify (const VecType &point, size_t &prediction, arma::vec &probabilities) const
Classify the given point and also return estimates of the probability for each class in the given vector.
template<typename MatType >
void | Classify (const MatType &data, arma::Row< size_t > &predictions) const
Classify the given points, using the entire tree.
template<typename MatType >
void | Classify (const MatType &data, arma::Row< size_t > &predictions, arma::mat &probabilities) const
Classify the given points and also return estimates of the probabilities for each class in the given matrix.
size_t | NumChildren () const
Get the number of children.
DecisionTree & | operator= (const DecisionTree &other)
Copy another tree.
DecisionTree & | operator= (DecisionTree &&other)
Take ownership of another tree.
template<typename Archive >
void | Serialize (Archive &ar, const unsigned int)
Serialize the tree.
template<typename MatType >
void | Train (const MatType &data, const data::DatasetInfo &datasetInfo, const arma::Row< size_t > &labels, const size_t numClasses, const size_t minimumLeafSize=10)
Train the decision tree on the given data.
template<typename MatType >
void | Train (const MatType &data, const arma::Row< size_t > &labels, const size_t numClasses, const size_t minimumLeafSize=10)
Train the decision tree on the given data, assuming that all dimensions are numeric.
Private Types
typedef CategoricalSplit::template AuxiliarySplitInfo< ElemType > | CategoricalAuxiliarySplitInfo
typedef NumericSplit::template AuxiliarySplitInfo< ElemType > | NumericAuxiliarySplitInfo
Note that this class will also hold the members of the NumericSplit and CategoricalSplit AuxiliarySplitInfo classes, since it inherits from them.
Private Member Functions
template<typename RowType >
void | CalculateClassProbabilities (const RowType &labels, const size_t numClasses)
Calculate the class probabilities of the given labels.
Private Attributes
std::vector< DecisionTree * > | children
The vector of children.
arma::vec | classProbabilities
This vector may hold different things.
size_t | dimensionTypeOrMajorityClass
The type of the dimension that we have split on (if we are not a leaf).
size_t | splitDimension
The dimension this node splits on.
This class implements a generic decision tree learner.
Its behavior can be controlled via its template arguments.
The class inherits from the auxiliary split information classes so that an empty auxiliary split information struct does not add to the size of a node.
Definition at line 31 of file decision_tree.hpp.
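The size argument for inheriting from the auxiliary split information relies on what C++ calls the empty base optimization. The following is a minimal sketch of that effect using only the standard library; `EmptyAux`, `NodeWithMember`, `NodeWithBase`, and `BytesSaved()` are hypothetical names introduced for illustration and are not part of mlpack.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a split type's auxiliary info that stores nothing.
struct EmptyAux {};

// Holding the empty struct as a member: the member still needs its own
// address, so the node grows (typically by one padded word).
struct NodeWithMember
{
  EmptyAux aux;
  std::size_t splitDimension;
};

// Inheriting from the empty struct: the base subobject can occupy zero
// bytes, so the node is only as large as its real data.
struct NodeWithBase : public EmptyAux
{
  std::size_t splitDimension;
};

// Number of bytes the inheritance-based layout saves per node.
std::size_t BytesSaved()
{
  assert(sizeof(NodeWithBase) <= sizeof(NodeWithMember));
  return sizeof(NodeWithMember) - sizeof(NodeWithBase);
}
```

Because a tree can contain many nodes, saving one word per node can add up.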
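typedef CategoricalSplit::template AuxiliarySplitInfo<ElemType> mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::CategoricalAuxiliarySplitInfo [private]
Definition at line 254 of file decision_tree.hpp.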
typedef CategoricalSplitType<FitnessFunction> mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::CategoricalSplit
Allow access to the categorical split type.
Definition at line 41 of file decision_tree.hpp.
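typedef NumericSplit::template AuxiliarySplitInfo<ElemType> mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::NumericAuxiliarySplitInfo [private]
Note that this class will also hold the members of the NumericSplit and CategoricalSplit AuxiliarySplitInfo classes, since it inherits from them.
We'll define some convenience typedefs here.
Definition at line 252 of file decision_tree.hpp.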
typedef NumericSplitType<FitnessFunction> mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::NumericSplit
Allow access to the numeric split type.
Definition at line 39 of file decision_tree.hpp.
mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::DecisionTree (
    const MatType & data,
    const data::DatasetInfo & datasetInfo,
    const arma::Row< size_t > & labels,
    const size_t numClasses,
    const size_t minimumLeafSize = 10
)
Construct the decision tree on the given data and labels, where the data may contain both numeric and categorical dimensions.
Setting minimumLeafSize too small may cause the tree to overfit, but setting it too large may cause it to underfit.
data | Dataset to train on. |
datasetInfo | Type information for each dimension of the dataset. |
labels | Labels for each training point. |
numClasses | Number of classes in the dataset. |
minimumLeafSize | Minimum number of points in each leaf node. |
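One way to picture the role of minimumLeafSize is as a filter on candidate splits: a split is accepted only if every resulting child keeps at least that many points. The sketch below is an assumption about how such a parameter is typically applied, not mlpack's actual implementation; `SplitRespectsMinimumLeafSize` is a hypothetical helper.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns true if a candidate split leaves at least
// minimumLeafSize points in every child it would create.
bool SplitRespectsMinimumLeafSize(const std::vector<std::size_t>& childCounts,
                                  const std::size_t minimumLeafSize)
{
  if (childCounts.empty())
    return false;  // A split must produce at least one child.

  for (const std::size_t count : childCounts)
  {
    if (count < minimumLeafSize)
      return false;
  }
  return true;
}
```

Under this reading, a very small value lets nearly every split through (deeper trees, risk of overfitting), while a very large value rejects most splits (shallow trees, risk of underfitting).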
mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::DecisionTree (
    const MatType & data,
    const arma::Row< size_t > & labels,
    const size_t numClasses,
    const size_t minimumLeafSize = 10
)
Construct the decision tree on the given data and labels, assuming that all dimensions are numeric.
Setting minimumLeafSize too small may cause the tree to overfit, but setting it too large may cause it to underfit.
data | Dataset to train on. |
labels | Labels for each training point. |
numClasses | Number of classes in the dataset. |
minimumLeafSize | Minimum number of points in each leaf node. |
mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::DecisionTree (const size_t numClasses = 1)
Construct a decision tree without training it.
It will be a leaf node with equal probabilities for each class.
numClasses | Number of classes in the dataset. |
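The state of an untrained tree can be sketched as a single leaf whose class probabilities are uniform. The helper below uses `std::vector<double>` in place of `arma::vec`; `UniformClassProbabilities` is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the probabilities held by an untrained leaf:
// every class receives the same probability, 1 / numClasses.
std::vector<double> UniformClassProbabilities(const std::size_t numClasses)
{
  assert(numClasses > 0);
  return std::vector<double>(numClasses,
                             1.0 / static_cast<double>(numClasses));
}
```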
mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::DecisionTree (const DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion > & other)
Copy another tree.
This may use a lot of memory, because every node of the other tree is duplicated; be sure that a copy is what you want.
other | Tree to copy. |
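The memory cost of copying follows from the children being held as raw pointers (`std::vector< DecisionTree * >`): a copy has to clone every descendant. The sketch below illustrates this with a hypothetical, heavily simplified `Node` type; it is not the mlpack implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical simplified node that owns its children through raw pointers.
struct Node
{
  std::vector<Node*> children;

  Node() = default;

  // Deep copy: every descendant of `other` is cloned recursively, so the
  // copy needs as much memory as the original tree.
  Node(const Node& other)
  {
    for (const Node* child : other.children)
      children.push_back(new Node(*child));
  }

  // Copy assignment is left out to keep the sketch short.
  Node& operator=(const Node&) = delete;

  ~Node()
  {
    for (Node* child : children)
      delete child;
  }

  // Number of nodes in the subtree rooted here, including this node.
  std::size_t NumNodes() const
  {
    std::size_t count = 1;
    for (const Node* child : children)
      count += child->NumNodes();
    return count;
  }
};
```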
mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::DecisionTree (DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion > && other)
Take ownership of another tree.
other | Tree to take ownership of. |
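Taking ownership, in contrast to copying, can be sketched as moving the child pointers and leaving the source without children. The `Node` type here is again a hypothetical simplification, and the exact state mlpack leaves the moved-from tree in is not specified by this page.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical simplified node that owns its children through raw pointers.
struct Node
{
  std::vector<Node*> children;

  Node() = default;
  Node(const Node&) = delete;
  Node& operator=(const Node&) = delete;

  // Take ownership: steal the child pointers and leave `other` childless,
  // so no node is cloned and no node is deleted twice.
  Node(Node&& other) noexcept : children(std::move(other.children))
  {
    other.children.clear();
  }

  ~Node()
  {
    for (Node* child : children)
      delete child;
  }
};
```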
mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::~DecisionTree ()
Clean up memory.
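void mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::CalculateClassProbabilities (const RowType & labels, const size_t numClasses) [private]
Calculate the class probabilities of the given labels.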
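A plausible reading of this computation is that each class probability is the fraction of labels belonging to that class. The sketch below assumes exactly that and uses `std::vector` in place of the Armadillo types; `ClassProbabilities` is a hypothetical free function, not the private member itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: the probability of class c is the number of labels
// equal to c divided by the total number of labels.
std::vector<double> ClassProbabilities(const std::vector<std::size_t>& labels,
                                       const std::size_t numClasses)
{
  assert(!labels.empty());

  std::vector<double> probabilities(numClasses, 0.0);
  for (const std::size_t label : labels)
  {
    assert(label < numClasses);
    probabilities[label] += 1.0;
  }

  // Normalize the counts so that they sum to one.
  for (double& p : probabilities)
    p /= static_cast<double>(labels.size());

  return probabilities;
}
```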
size_t mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::CalculateDirection (const VecType & point) const
Given a point, and assuming that this node is not a leaf, calculate the index of the child node the point would be sent to.
This method is primarily used by the Classify() function, but it can also be called on its own.
point | Point to classify. |
Referenced by mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::Child().
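For intuition, the direction computation for a binary split on a numeric dimension could look like the sketch below. The threshold rule, the `NumericSplitSketch` struct, and its `threshold` member are assumptions made for illustration; the actual rule depends on the NumericSplitType and CategoricalSplitType template arguments.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical binary numeric split: points whose value in splitDimension is
// at most `threshold` go to child 0, all other points go to child 1.
struct NumericSplitSketch
{
  std::size_t splitDimension;
  double threshold;

  std::size_t CalculateDirection(const std::vector<double>& point) const
  {
    return (point[splitDimension] <= threshold) ? 0 : 1;
  }
};
```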
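const DecisionTree& mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::Child (const size_t i) const [inline]
Get the child of the given index.
Definition at line 217 of file decision_tree.hpp.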
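DecisionTree& mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::Child (const size_t i) [inline]
Modify the child of the given index (be careful!).
Definition at line 219 of file decision_tree.hpp.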
References mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::CalculateDirection(), and mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::children.
size_t mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::Classify (const VecType & point) const
Classify the given point, using the entire tree.
The predicted label is returned.
point | Point to classify. |
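Classifying "using the entire tree" amounts to descending from the root, one CalculateDirection() call per level, until a leaf is reached. The following sketch shows that loop on a hypothetical simplified `Node` with a threshold split; `MakeStump` is a helper invented here to build a one-split tree, and none of this is mlpack's code.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical simplified node; a node with no children is a leaf.
struct Node
{
  std::vector<std::unique_ptr<Node>> children;
  std::size_t splitDimension = 0;
  double threshold = 0.0;
  std::size_t majorityClass = 0;

  // Assumed binary threshold rule, for illustration only.
  std::size_t CalculateDirection(const std::vector<double>& point) const
  {
    return (point[splitDimension] <= threshold) ? 0 : 1;
  }

  // Descend until a leaf is reached, then return that leaf's class.
  std::size_t Classify(const std::vector<double>& point) const
  {
    const Node* node = this;
    while (!node->children.empty())
      node = node->children[node->CalculateDirection(point)].get();
    return node->majorityClass;
  }
};

// Build a tree with one split and two leaves.
Node MakeStump(const std::size_t dimension, const double threshold,
               const std::size_t leftClass, const std::size_t rightClass)
{
  Node root;
  root.splitDimension = dimension;
  root.threshold = threshold;
  root.children.push_back(std::unique_ptr<Node>(new Node()));
  root.children.push_back(std::unique_ptr<Node>(new Node()));
  root.children[0]->majorityClass = leftClass;
  root.children[1]->majorityClass = rightClass;
  return root;
}
```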
void mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::Classify (
    const VecType & point,
    size_t & prediction,
    arma::vec & probabilities
) const
Classify the given point and also return estimates of the probability for each class in the given vector.
point | Point to classify. |
prediction | This will be set to the predicted class of the point. |
probabilities | This will be filled with class probabilities for the point. |
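The two output arguments can be pictured as being filled from the leaf the point ends up in: the probabilities are the leaf's class probabilities, and the prediction is the most probable class. This is an assumption about the behavior, sketched with `std::vector<double>` in place of `arma::vec`; `PredictFromLeaf` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: given the class probabilities stored in the leaf a
// point reaches, fill both output arguments.
void PredictFromLeaf(const std::vector<double>& leafProbabilities,
                     std::size_t& prediction,
                     std::vector<double>& probabilities)
{
  assert(!leafProbabilities.empty());

  probabilities = leafProbabilities;

  // The predicted class is the index of the largest probability.
  prediction = static_cast<std::size_t>(std::distance(
      leafProbabilities.begin(),
      std::max_element(leafProbabilities.begin(), leafProbabilities.end())));
}
```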
void mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::Classify (
    const MatType & data,
    arma::Row< size_t > & predictions
) const
Classify the given points, using the entire tree.
The predicted labels for each point are stored in the given vector.
data | Set of points to classify. |
predictions | This will be filled with predictions for each point. |
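The batch overload can be thought of as the single-point classifier applied to each point in turn, with one prediction stored per point. The sketch below uses a `std::vector` of points instead of a matrix; `Point`, `ClassifyAll`, and the `classifyOne` callback are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

using Point = std::vector<double>;

// Hypothetical batch wrapper: apply a single-point classifier to every point
// and store one prediction per point, in the same order as the input.
void ClassifyAll(const std::vector<Point>& data,
                 const std::function<std::size_t(const Point&)>& classifyOne,
                 std::vector<std::size_t>& predictions)
{
  predictions.resize(data.size());
  for (std::size_t i = 0; i < data.size(); ++i)
    predictions[i] = classifyOne(data[i]);
}
```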
void mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::Classify (
    const MatType & data,
    arma::Row< size_t > & predictions,
    arma::mat & probabilities
) const
Classify the given points and also return estimates of the probabilities for each class in the given matrix.
The predicted labels for each point are stored in the given vector.
data | Set of points to classify. |
predictions | This will be filled with predictions for each point. |
probabilities | This will be filled with class probabilities for each point. |
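size_t mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::NumChildren () const [inline]
Get the number of children.
Definition at line 214 of file decision_tree.hpp.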
DecisionTree& mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::operator= (const DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion > & other)
Copy another tree.
This may use a lot of memory, because every node of the other tree is duplicated; be sure that a copy is what you want.
other | Tree to copy. |
DecisionTree& mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::operator= (DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion > && other)
Take ownership of another tree.
other | Tree to take ownership of. |
void mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::Serialize (
    Archive & ar,
    const unsigned int
)
Serialize the tree.
void mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::Train (
    const MatType & data,
    const data::DatasetInfo & datasetInfo,
    const arma::Row< size_t > & labels,
    const size_t numClasses,
    const size_t minimumLeafSize = 10
)
Train the decision tree on the given data.
This will overwrite the existing model. The data may have numeric and categorical types, specified by the datasetInfo parameter. Setting minimumLeafSize too small may cause the tree to overfit, but setting it too large may cause it to underfit.
data | Dataset to train on. |
datasetInfo | Type information for each dimension. |
labels | Labels for each training point. |
numClasses | Number of classes in the dataset. |
minimumLeafSize | Minimum number of points in each leaf node. |
void mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::Train (
    const MatType & data,
    const arma::Row< size_t > & labels,
    const size_t numClasses,
    const size_t minimumLeafSize = 10
)
Train the decision tree on the given data, assuming that all dimensions are numeric.
This will overwrite the existing model. Setting minimumLeafSize too small may cause the tree to overfit, but setting it too large may cause it to underfit.
data | Dataset to train on. |
labels | Labels for each training point. |
numClasses | Number of classes in the dataset. |
minimumLeafSize | Minimum number of points in each leaf node. |
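std::vector<DecisionTree*> mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::children [private]
The vector of children.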
Definition at line 233 of file decision_tree.hpp.
Referenced by mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::Child(), and mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::NumChildren().
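arma::vec mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::classProbabilities [private]
This vector may hold different things.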
If the node has no children, then it is guaranteed to hold the probabilities of each class. If the node has children, then it may be used arbitrarily by the split type's CalculateDirection() function and may not necessarily hold class probabilities.
Definition at line 246 of file decision_tree.hpp.
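size_t mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::dimensionTypeOrMajorityClass [private]
The type of the dimension that we have split on (if we are not a leaf).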
If we are a leaf, then this is the index of the majority class.
Definition at line 238 of file decision_tree.hpp.
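The dual meaning of this member can be made explicit with two accessors that each check which kind of node they are called on. The sketch below is illustrative only: `NodeSketch`, its `numChildren` field, and both accessor names are hypothetical and do not appear in mlpack.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node layout: one size_t whose meaning depends on whether the
// node is a leaf (majority class) or an internal node (dimension type).
struct NodeSketch
{
  std::size_t numChildren;
  std::size_t dimensionTypeOrMajorityClass;

  bool IsLeaf() const { return numChildren == 0; }

  // Only meaningful for leaves.
  std::size_t MajorityClass() const
  {
    assert(IsLeaf());
    return dimensionTypeOrMajorityClass;
  }

  // Only meaningful for internal nodes.
  std::size_t SplitDimensionType() const
  {
    assert(!IsLeaf());
    return dimensionTypeOrMajorityClass;
  }
};
```

Sharing one field between the two roles keeps each node small, at the cost of requiring callers to know which kind of node they hold.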
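size_t mlpack::tree::DecisionTree< FitnessFunction, NumericSplitType, CategoricalSplitType, ElemType, NoRecursion >::splitDimension [private]
The dimension this node splits on.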
Definition at line 235 of file decision_tree.hpp.