mlpack
master
|
A density estimation tree is similar to both a decision tree and a space partitioning tree (like a kd-tree). More...
Public Types | |
typedef MatType::elem_type | ElemType |
The actual, underlying type we're working with. More... | |
typedef arma::Col< ElemType > | StatType |
typedef MatType::vec_type | VecType |
Public Member Functions | |
DTree () | |
Create an empty density estimation tree. More... | |
DTree (const DTree &obj) | |
Create a tree that is the copy of the given tree. More... | |
DTree (DTree &&obj) | |
Create a tree by taking ownership of another tree (move constructor). More... | |
DTree (const StatType &maxVals, const StatType &minVals, const size_t totalPoints) | |
Create a density estimation tree with the given bounds and the given number of total points. More... | |
DTree (MatType &data) | |
Create a density estimation tree on the given data. More... | |
DTree (const StatType &maxVals, const StatType &minVals, const size_t start, const size_t end, const double logNegError) | |
Create a child node of a density estimation tree given the bounding box specified by maxVals and minVals, using the size given in start and end and the specified error. More... | |
DTree (const StatType &maxVals, const StatType &minVals, const size_t totalPoints, const size_t start, const size_t end) | |
Create a child node of a density estimation tree given the bounding box specified by maxVals and minVals, using the size given in start and end, and calculating the error with the total number of points given. More... | |
~DTree () | |
Clean up memory allocated by the tree. More... | |
double | AlphaUpper () const |
Return the upper part of the alpha sum. More... | |
TagType | BucketTag () const |
Return the current bucket's ID, if leaf, or -1 otherwise. More... | |
double | ComputeValue (const VecType &query) const |
Compute the logarithm of the density estimate of a given query point. More... | |
void | ComputeVariableImportance (arma::vec &importances) const |
Compute the variable importance of each dimension in the learned tree. More... | |
size_t | End () const |
Return the first index of a point not contained in this node. More... | |
TagType | FindBucket (const VecType &query) const |
Return the tag of the leaf containing the query. More... | |
double | Grow (MatType &data, arma::Col< size_t > &oldFromNew, const bool useVolReg=false, const size_t maxLeafSize=10, const size_t minLeafSize=5) |
Greedily expand the tree. More... | |
DTree * | Left () const |
Return the left child. More... | |
double | LogNegativeError (const size_t totalPoints) const |
Compute the log-negative-error for this point, given the total number of points in the dataset. More... | |
double | LogNegError () const |
Return the log negative error of this node. More... | |
double | LogVolume () const |
Return the inverse of the volume of this node. More... | |
const StatType & | MaxVals () const |
Return the maximum values. More... | |
const StatType & | MinVals () const |
Return the minimum values. More... | |
DTree & | operator= (const DTree &obj) |
Copy the given tree. More... | |
DTree & | operator= (DTree &&obj) |
Take ownership of the given tree (move operator). More... | |
double | PruneAndUpdate (const double oldAlpha, const size_t points, const bool useVolReg=false) |
Perform alpha pruning on a tree. More... | |
double | Ratio () const |
Return the ratio of points in this node to the points in the whole dataset. More... | |
DTree * | Right () const |
Return the right child. More... | |
bool | Root () const |
Return whether or not this is the root of the tree. More... | |
template<typename Archive > | |
void | Serialize (Archive &ar, const unsigned int) |
Serialize the density estimation tree. More... | |
size_t | SplitDim () const |
Return the split dimension of this node. More... | |
ElemType | SplitValue () const |
Return the split value of this node. More... | |
size_t | Start () const |
Return the starting index of points contained in this node. More... | |
size_t | SubtreeLeaves () const |
Return the number of leaves which are descendants of this node. More... | |
double | SubtreeLeavesLogNegError () const |
Return the log negative error of all descendants of this node. More... | |
TagType | TagTree (const TagType &tag=0) |
Index the buckets for possible usage later; this results in every leaf in the tree having a specific tag (accessible with BucketTag()). More... | |
bool | WithinRange (const VecType &query) const |
Return whether a query point is within the range of this node. More... | |
Private Member Functions | |
bool | FindSplit (const MatType &data, size_t &splitDim, ElemType &splitValue, double &leftError, double &rightError, const size_t minLeafSize=5) const |
Find the dimension to split on. More... | |
size_t | SplitData (MatType &data, const size_t splitDim, const ElemType splitValue, arma::Col< size_t > &oldFromNew) const |
Split the data, returning the number of points left of the split. More... | |
Private Attributes | |
double | alphaUpper |
Upper part of alpha sum; used for pruning. More... | |
TagType | bucketTag |
The tag for the leaf, used for hashing points. More... | |
size_t | end |
The index of the last point in the dataset contained in this node (and its children). More... | |
DTree * | left |
The left child. More... | |
double | logNegError |
log-negative-L2-error of the node. More... | |
double | logVolume |
The logarithm of the volume of the node. More... | |
StatType | maxVals |
Upper half of bounding box for this node. More... | |
StatType | minVals |
Lower half of bounding box for this node. More... | |
double | ratio |
Ratio of the number of points in the node to the total number of points. More... | |
DTree * | right |
The right child. More... | |
bool | root |
If true, this node is the root of the tree. More... | |
size_t | splitDim |
The splitting dimension for this node. More... | |
ElemType | splitValue |
The split value on the splitting dimension for this node. More... | |
size_t | start |
The index of the first point in the dataset contained in this node (and its children). More... | |
size_t | subtreeLeaves |
Number of leaves of the subtree. More... | |
double | subtreeLeavesLogNegError |
Sum of the error of the leaves of the subtree. More... | |
A density estimation tree is similar to both a decision tree and a space partitioning tree (like a kd-tree).
Each leaf represents a constant-density hyper-rectangle. The tree is constructed in such a way as to minimize the integrated square error between the probability distribution of the tree and the observed probability distribution of the data. Because the tree is similar to a decision tree, the density estimation tree can provide very fast density estimates for a given point.
For more information, see the following paper:
typedef MatType::elem_type mlpack::det::DTree< MatType, TagType >::ElemType |
typedef arma::Col<ElemType> mlpack::det::DTree< MatType, TagType >::StatType |
typedef MatType::vec_type mlpack::det::DTree< MatType, TagType >::VecType |
mlpack::det::DTree< MatType, TagType >::DTree | ( | ) |
Create an empty density estimation tree.
mlpack::det::DTree< MatType, TagType >::DTree | ( | const DTree< MatType, TagType > & | obj | ) |
Create a tree that is the copy of the given tree.
obj | Tree to copy. |
mlpack::det::DTree< MatType, TagType >::DTree | ( | DTree< MatType, TagType > && | obj | ) |
Create a tree by taking ownership of another tree (move constructor).
obj | Tree to take ownership of. |
mlpack::det::DTree< MatType, TagType >::DTree | ( | const StatType & | maxVals, |
const StatType & | minVals, | ||
const size_t | totalPoints | ||
) |
Create a density estimation tree with the given bounds and the given number of total points.
Children will not be created.
maxVals | Maximum values of the bounding box. |
minVals | Minimum values of the bounding box. |
totalPoints | Total number of points in the dataset. |
mlpack::det::DTree< MatType, TagType >::DTree | ( | MatType & | data | ) |
Create a density estimation tree on the given data.
Children will be created following the procedure outlined in the paper. The data will be modified; it will be reordered similar to the way BinarySpaceTree modifies datasets.
data | Dataset to build tree on. |
mlpack::det::DTree< MatType, TagType >::DTree | ( | const StatType & | maxVals, |
const StatType & | minVals, | ||
const size_t | start, | ||
const size_t | end, | ||
const double | logNegError | ||
) |
Create a child node of a density estimation tree given the bounding box specified by maxVals and minVals, using the size given in start and end and the specified error.
Children of this node will not be created recursively.
maxVals | Upper bound of bounding box. |
minVals | Lower bound of bounding box. |
start | Start of points represented by this node in the data matrix. |
end | End of points represented by this node in the data matrix. |
error | log-negative error of this node. |
mlpack::det::DTree< MatType, TagType >::DTree | ( | const StatType & | maxVals, |
const StatType & | minVals, | ||
const size_t | totalPoints, | ||
const size_t | start, | ||
const size_t | end | ||
) |
Create a child node of a density estimation tree given the bounding box specified by maxVals and minVals, using the size given in start and end, and calculating the error with the total number of points given.
Children of this node will not be created recursively.
maxVals | Upper bound of bounding box. |
minVals | Lower bound of bounding box. |
start | Start of points represented by this node in the data matrix. |
end | End of points represented by this node in the data matrix. |
mlpack::det::DTree< MatType, TagType >::~DTree | ( | ) |
Clean up memory allocated by the tree.
|
inline |
Return the upper part of the alpha sum.
Definition at line 302 of file dtree.hpp.
References mlpack::det::DTree< MatType, TagType >::alphaUpper.
|
inline |
double mlpack::det::DTree< MatType, TagType >::ComputeValue | ( | const VecType & | query | ) | const |
Compute the logarithm of the density estimate of a given query point.
query | Point to estimate density of. |
void mlpack::det::DTree< MatType, TagType >::ComputeVariableImportance | ( | arma::vec & | importances | ) | const |
Compute the variable importance of each dimension in the learned tree.
importances | Vector to store the calculated importances in. |
|
inline |
Return the first index of a point not contained in this node.
Definition at line 279 of file dtree.hpp.
References mlpack::det::DTree< MatType, TagType >::end.
TagType mlpack::det::DTree< MatType, TagType >::FindBucket | ( | const VecType & | query | ) | const |
Return the tag of the leaf containing the query.
This is useful for generating class memberships.
query | Query to search for. |
|
private |
Find the dimension to split on.
Referenced by mlpack::det::DTree< MatType, TagType >::MinVals().
double mlpack::det::DTree< MatType, TagType >::Grow | ( | MatType & | data, |
arma::Col< size_t > & | oldFromNew, | ||
const bool | useVolReg = false , |
||
const size_t | maxLeafSize = 10 , |
||
const size_t | minLeafSize = 5 |
||
) |
Greedily expand the tree.
The points in the dataset will be reordered during tree growth.
data | Dataset to build tree on. |
oldFromNew | Mappings from old points to new points. |
useVolReg | If true, volume regularization is used. |
maxLeafSize | Maximum size of a leaf. |
minLeafSize | Minimum size of a leaf. |
|
inline |
Return the left child.
Definition at line 296 of file dtree.hpp.
References mlpack::det::DTree< MatType, TagType >::left.
double mlpack::det::DTree< MatType, TagType >::LogNegativeError | ( | const size_t | totalPoints | ) | const |
Compute the log-negative-error for this point, given the total number of points in the dataset.
totalPoints | Total number of points in the dataset. |
|
inline |
Return the log negative error of this node.
Definition at line 285 of file dtree.hpp.
References mlpack::det::DTree< MatType, TagType >::logNegError.
|
inline |
Return the inverse of the volume of this node.
Definition at line 294 of file dtree.hpp.
References mlpack::det::DTree< MatType, TagType >::logVolume.
|
inline |
Return the maximum values.
Definition at line 307 of file dtree.hpp.
References mlpack::det::DTree< MatType, TagType >::maxVals.
|
inline |
Return the minimum values.
Definition at line 310 of file dtree.hpp.
References mlpack::det::DTree< MatType, TagType >::FindSplit(), mlpack::det::DTree< MatType, TagType >::minVals, mlpack::det::DTree< MatType, TagType >::Serialize(), and mlpack::det::DTree< MatType, TagType >::SplitData().
DTree& mlpack::det::DTree< MatType, TagType >::operator= | ( | const DTree< MatType, TagType > & | obj | ) |
Copy the given tree.
obj | Tree to copy. |
DTree& mlpack::det::DTree< MatType, TagType >::operator= | ( | DTree< MatType, TagType > && | obj | ) |
Take ownership of the given tree (move operator).
obj | Tree to take ownership of. |
double mlpack::det::DTree< MatType, TagType >::PruneAndUpdate | ( | const double | oldAlpha, |
const size_t | points, | ||
const bool | useVolReg = false |
||
) |
Perform alpha pruning on a tree.
Returns the new value of alpha.
oldAlpha | Old value of alpha. |
points | Total number of points in dataset. |
useVolReg | If true, volume regularization is used. |
|
inline |
Return the ratio of points in this node to the points in the whole dataset.
Definition at line 292 of file dtree.hpp.
References mlpack::det::DTree< MatType, TagType >::ratio.
|
inline |
Return the right child.
Definition at line 298 of file dtree.hpp.
References mlpack::det::DTree< MatType, TagType >::right.
|
inline |
Return whether or not this is the root of the tree.
Definition at line 300 of file dtree.hpp.
References mlpack::det::DTree< MatType, TagType >::root.
void mlpack::det::DTree< MatType, TagType >::Serialize | ( | Archive & | ar, |
const unsigned | int | ||
) |
Serialize the density estimation tree.
Referenced by mlpack::det::DTree< MatType, TagType >::MinVals().
|
private |
Split the data, returning the number of points left of the split.
Referenced by mlpack::det::DTree< MatType, TagType >::MinVals().
|
inline |
Return the split dimension of this node.
Definition at line 281 of file dtree.hpp.
References mlpack::det::DTree< MatType, TagType >::splitDim.
|
inline |
Return the split value of this node.
Definition at line 283 of file dtree.hpp.
References mlpack::det::DTree< MatType, TagType >::splitValue.
|
inline |
Return the starting index of points contained in this node.
Definition at line 277 of file dtree.hpp.
References mlpack::det::DTree< MatType, TagType >::start.
|
inline |
Return the number of leaves which are descendants of this node.
Definition at line 289 of file dtree.hpp.
References mlpack::det::DTree< MatType, TagType >::subtreeLeaves.
|
inline |
Return the log negative error of all descendants of this node.
Definition at line 287 of file dtree.hpp.
References mlpack::det::DTree< MatType, TagType >::subtreeLeavesLogNegError.
TagType mlpack::det::DTree< MatType, TagType >::TagTree | ( | const TagType & | tag = 0 | ) |
Index the buckets for possible usage later; this results in every leaf in the tree having a specific tag (accessible with BucketTag()).
This function calls itself recursively.
tag | Tag for the next leaf; leave at 0 for the initial call. |
bool mlpack::det::DTree< MatType, TagType >::WithinRange | ( | const VecType & | query | ) | const |
Return whether a query point is within the range of this node.
|
private |
Upper part of alpha sum; used for pruning.
Definition at line 268 of file dtree.hpp.
Referenced by mlpack::det::DTree< MatType, TagType >::AlphaUpper().
|
private |
|
private |
The index of the last point in the dataset contained in this node (and its children).
Definition at line 233 of file dtree.hpp.
Referenced by mlpack::det::DTree< MatType, TagType >::End().
|
private |
The left child.
Definition at line 271 of file dtree.hpp.
Referenced by mlpack::det::DTree< MatType, TagType >::Left().
|
private |
log-negative-L2-error of the node.
Definition at line 247 of file dtree.hpp.
Referenced by mlpack::det::DTree< MatType, TagType >::LogNegError().
|
private |
The logarithm of the volume of the node.
Definition at line 262 of file dtree.hpp.
Referenced by mlpack::det::DTree< MatType, TagType >::LogVolume().
|
private |
Upper half of bounding box for this node.
Definition at line 236 of file dtree.hpp.
Referenced by mlpack::det::DTree< MatType, TagType >::MaxVals().
|
private |
Lower half of bounding box for this node.
Definition at line 238 of file dtree.hpp.
Referenced by mlpack::det::DTree< MatType, TagType >::MinVals().
|
private |
Ratio of the number of points in the node to the total number of points.
Definition at line 259 of file dtree.hpp.
Referenced by mlpack::det::DTree< MatType, TagType >::Ratio().
|
private |
The right child.
Definition at line 273 of file dtree.hpp.
Referenced by mlpack::det::DTree< MatType, TagType >::Right().
|
private |
If true, this node is the root of the tree.
Definition at line 256 of file dtree.hpp.
Referenced by mlpack::det::DTree< MatType, TagType >::Root().
|
private |
The splitting dimension for this node.
Definition at line 241 of file dtree.hpp.
Referenced by mlpack::det::DTree< MatType, TagType >::SplitDim().
|
private |
The split value on the splitting dimension for this node.
Definition at line 244 of file dtree.hpp.
Referenced by mlpack::det::DTree< MatType, TagType >::SplitValue().
|
private |
The index of the first point in the dataset contained in this node (and its children).
Definition at line 230 of file dtree.hpp.
Referenced by mlpack::det::DTree< MatType, TagType >::Start().
|
private |
Number of leaves of the subtree.
Definition at line 253 of file dtree.hpp.
Referenced by mlpack::det::DTree< MatType, TagType >::SubtreeLeaves().
|
private |
Sum of the error of the leaves of the subtree.
Definition at line 250 of file dtree.hpp.
Referenced by mlpack::det::DTree< MatType, TagType >::SubtreeLeavesLogNegError().