mlpack  master
Public Types | Public Member Functions | Private Member Functions | Private Attributes | List of all members
mlpack::det::DTree< MatType, TagType > Class Template Reference

A density estimation tree is similar to both a decision tree and a space partitioning tree (like a kd-tree). More...

Public Types

typedef MatType::elem_type ElemType
 The actual, underlying type we're working with. More...
 
typedef arma::Col< ElemTypeStatType
 
typedef MatType::vec_type VecType
 

Public Member Functions

 DTree ()
 Create an empty density estimation tree. More...
 
 DTree (const DTree &obj)
 Create a tree that is the copy of the given tree. More...
 
 DTree (DTree &&obj)
 Create a tree by taking ownership of another tree (move constructor). More...
 
 DTree (const StatType &maxVals, const StatType &minVals, const size_t totalPoints)
 Create a density estimation tree with the given bounds and the given number of total points. More...
 
 DTree (MatType &data)
 Create a density estimation tree on the given data. More...
 
 DTree (const StatType &maxVals, const StatType &minVals, const size_t start, const size_t end, const double logNegError)
 Create a child node of a density estimation tree given the bounding box specified by maxVals and minVals, using the size given in start and end and the specified error. More...
 
 DTree (const StatType &maxVals, const StatType &minVals, const size_t totalPoints, const size_t start, const size_t end)
 Create a child node of a density estimation tree given the bounding box specified by maxVals and minVals, using the size given in start and end, and calculating the error with the total number of points given. More...
 
 ~DTree ()
 Clean up memory allocated by the tree. More...
 
double AlphaUpper () const
 Return the upper part of the alpha sum. More...
 
TagType BucketTag () const
 Return the current bucket's ID, if leaf, or -1 otherwise. More...
 
double ComputeValue (const VecType &query) const
 Compute the logarithm of the density estimate of a given query point. More...
 
void ComputeVariableImportance (arma::vec &importances) const
 Compute the variable importance of each dimension in the learned tree. More...
 
size_t End () const
 Return the first index of a point not contained in this node. More...
 
TagType FindBucket (const VecType &query) const
 Return the tag of the leaf containing the query. More...
 
double Grow (MatType &data, arma::Col< size_t > &oldFromNew, const bool useVolReg=false, const size_t maxLeafSize=10, const size_t minLeafSize=5)
 Greedily expand the tree. More...
 
DTreeLeft () const
 Return the left child. More...
 
double LogNegativeError (const size_t totalPoints) const
 Compute the log-negative-error for this point, given the total number of points in the dataset. More...
 
double LogNegError () const
 Return the log negative error of this node. More...
 
double LogVolume () const
 Return the inverse of the volume of this node. More...
 
const StatTypeMaxVals () const
 Return the maximum values. More...
 
const StatTypeMinVals () const
 Return the minimum values. More...
 
DTreeoperator= (const DTree &obj)
 Copy the given tree. More...
 
DTreeoperator= (DTree &&obj)
 Take ownership of the given tree (move operator). More...
 
double PruneAndUpdate (const double oldAlpha, const size_t points, const bool useVolReg=false)
 Perform alpha pruning on a tree. More...
 
double Ratio () const
 Return the ratio of points in this node to the points in the whole dataset. More...
 
DTreeRight () const
 Return the right child. More...
 
bool Root () const
 Return whether or not this is the root of the tree. More...
 
template<typename Archive >
void Serialize (Archive &ar, const unsigned int)
 Serialize the density estimation tree. More...
 
size_t SplitDim () const
 Return the split dimension of this node. More...
 
ElemType SplitValue () const
 Return the split value of this node. More...
 
size_t Start () const
 Return the starting index of points contained in this node. More...
 
size_t SubtreeLeaves () const
 Return the number of leaves which are descendants of this node. More...
 
double SubtreeLeavesLogNegError () const
 Return the log negative error of all descendants of this node. More...
 
TagType TagTree (const TagType &tag=0)
 Index the buckets for possible usage later; this results in every leaf in the tree having a specific tag (accessible with BucketTag()). More...
 
bool WithinRange (const VecType &query) const
 Return whether a query point is within the range of this node. More...
 

Private Member Functions

bool FindSplit (const MatType &data, size_t &splitDim, ElemType &splitValue, double &leftError, double &rightError, const size_t minLeafSize=5) const
 Find the dimension to split on. More...
 
size_t SplitData (MatType &data, const size_t splitDim, const ElemType splitValue, arma::Col< size_t > &oldFromNew) const
 Split the data, returning the number of points left of the split. More...
 

Private Attributes

double alphaUpper
 Upper part of alpha sum; used for pruning. More...
 
TagType bucketTag
 The tag for the leaf, used for hashing points. More...
 
size_t end
 The index of the last point in the dataset contained in this node (and its children). More...
 
DTreeleft
 The left child. More...
 
double logNegError
 log-negative-L2-error of the node. More...
 
double logVolume
 The logarithm of the volume of the node. More...
 
StatType maxVals
 Upper half of bounding box for this node. More...
 
StatType minVals
 Lower half of bounding box for this node. More...
 
double ratio
 Ratio of the number of points in the node to the total number of points. More...
 
DTreeright
 The right child. More...
 
bool root
 If true, this node is the root of the tree. More...
 
size_t splitDim
 The splitting dimension for this node. More...
 
ElemType splitValue
 The split value on the splitting dimension for this node. More...
 
size_t start
 The index of the first point in the dataset contained in this node (and its children). More...
 
size_t subtreeLeaves
 Number of leaves of the subtree. More...
 
double subtreeLeavesLogNegError
 Sum of the error of the leaves of the subtree. More...
 

Detailed Description

template<typename MatType, typename TagType = int>
class mlpack::det::DTree< MatType, TagType >

A density estimation tree is similar to both a decision tree and a space partitioning tree (like a kd-tree).

Each leaf represents a constant-density hyper-rectangle. The tree is constructed in such a way as to minimize the integrated square error between the probability distribution of the tree and the observed probability distribution of the data. Because the tree is similar to a decision tree, the density estimation tree can provide very fast density estimates for a given point.

For more information, see the following paper:

@incollection{ram2011,
author = {Ram, Parikshit and Gray, Alexander G.},
title = {Density estimation trees},
booktitle = {{Proceedings of the 17th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining}},
series = {KDD '11},
year = {2011},
pages = {627--635}
}

Definition at line 46 of file dtree.hpp.

Member Typedef Documentation

template<typename MatType, typename TagType = int>
typedef MatType::elem_type mlpack::det::DTree< MatType, TagType >::ElemType

The actual, underlying type we're working with.

Definition at line 52 of file dtree.hpp.

template<typename MatType, typename TagType = int>
typedef arma::Col<ElemType> mlpack::det::DTree< MatType, TagType >::StatType

Definition at line 54 of file dtree.hpp.

template<typename MatType, typename TagType = int>
typedef MatType::vec_type mlpack::det::DTree< MatType, TagType >::VecType

Definition at line 53 of file dtree.hpp.

Constructor & Destructor Documentation

template<typename MatType, typename TagType = int>
mlpack::det::DTree< MatType, TagType >::DTree ( )

Create an empty density estimation tree.

template<typename MatType, typename TagType = int>
mlpack::det::DTree< MatType, TagType >::DTree ( const DTree< MatType, TagType > &  obj)

Create a tree that is the copy of the given tree.

Parameters
objTree to copy.
template<typename MatType, typename TagType = int>
mlpack::det::DTree< MatType, TagType >::DTree ( DTree< MatType, TagType > &&  obj)

Create a tree by taking ownership of another tree (move constructor).

Parameters
objTree to take ownership of.
template<typename MatType, typename TagType = int>
mlpack::det::DTree< MatType, TagType >::DTree ( const StatType maxVals,
const StatType minVals,
const size_t  totalPoints 
)

Create a density estimation tree with the given bounds and the given number of total points.

Children will not be created.

Parameters
maxValsMaximum values of the bounding box.
minValsMinimum values of the bounding box.
totalPointsTotal number of points in the dataset.
template<typename MatType, typename TagType = int>
mlpack::det::DTree< MatType, TagType >::DTree ( MatType &  data)

Create a density estimation tree on the given data.

Children will be created following the procedure outlined in the paper. The data will be modified; it will be reordered similar to the way BinarySpaceTree modifies datasets.

Parameters
dataDataset to build tree on.
template<typename MatType, typename TagType = int>
mlpack::det::DTree< MatType, TagType >::DTree ( const StatType maxVals,
const StatType minVals,
const size_t  start,
const size_t  end,
const double  logNegError 
)

Create a child node of a density estimation tree given the bounding box specified by maxVals and minVals, using the size given in start and end and the specified error.

Children of this node will not be created recursively.

Parameters
maxValsUpper bound of bounding box.
minValsLower bound of bounding box.
startStart of points represented by this node in the data matrix.
endEnd of points represented by this node in the data matrix.
errorlog-negative error of this node.
template<typename MatType, typename TagType = int>
mlpack::det::DTree< MatType, TagType >::DTree ( const StatType maxVals,
const StatType minVals,
const size_t  totalPoints,
const size_t  start,
const size_t  end 
)

Create a child node of a density estimation tree given the bounding box specified by maxVals and minVals, using the size given in start and end, and calculating the error with the total number of points given.

Children of this node will not be created recursively.

Parameters
maxValsUpper bound of bounding box.
minValsLower bound of bounding box.
startStart of points represented by this node in the data matrix.
endEnd of points represented by this node in the data matrix.
template<typename MatType, typename TagType = int>
mlpack::det::DTree< MatType, TagType >::~DTree ( )

Clean up memory allocated by the tree.

Member Function Documentation

template<typename MatType, typename TagType = int>
double mlpack::det::DTree< MatType, TagType >::AlphaUpper ( ) const
inline

Return the upper part of the alpha sum.

Definition at line 302 of file dtree.hpp.

References mlpack::det::DTree< MatType, TagType >::alphaUpper.

template<typename MatType, typename TagType = int>
TagType mlpack::det::DTree< MatType, TagType >::BucketTag ( ) const
inline

Return the current bucket's ID, if leaf, or -1 otherwise.

Definition at line 304 of file dtree.hpp.

template<typename MatType, typename TagType = int>
double mlpack::det::DTree< MatType, TagType >::ComputeValue ( const VecType query) const

Compute the logarithm of the density estimate of a given query point.

Parameters
queryPoint to estimate density of.
template<typename MatType, typename TagType = int>
void mlpack::det::DTree< MatType, TagType >::ComputeVariableImportance ( arma::vec &  importances) const

Compute the variable importance of each dimension in the learned tree.

Parameters
importancesVector to store the calculated importances in.
template<typename MatType, typename TagType = int>
size_t mlpack::det::DTree< MatType, TagType >::End ( ) const
inline

Return the first index of a point not contained in this node.

Definition at line 279 of file dtree.hpp.

References mlpack::det::DTree< MatType, TagType >::end.

template<typename MatType, typename TagType = int>
TagType mlpack::det::DTree< MatType, TagType >::FindBucket ( const VecType query) const

Return the tag of the leaf containing the query.

This is useful for generating class memberships.

Parameters
queryQuery to search for.
template<typename MatType, typename TagType = int>
bool mlpack::det::DTree< MatType, TagType >::FindSplit ( const MatType &  data,
size_t &  splitDim,
ElemType splitValue,
double &  leftError,
double &  rightError,
const size_t  minLeafSize = 5 
) const
private

Find the dimension to split on.

Referenced by mlpack::det::DTree< MatType, TagType >::MinVals().

template<typename MatType, typename TagType = int>
double mlpack::det::DTree< MatType, TagType >::Grow ( MatType &  data,
arma::Col< size_t > &  oldFromNew,
const bool  useVolReg = false,
const size_t  maxLeafSize = 10,
const size_t  minLeafSize = 5 
)

Greedily expand the tree.

The points in the dataset will be reordered during tree growth.

Parameters
dataDataset to build tree on.
oldFromNewMappings from old points to new points.
useVolRegIf true, volume regularization is used.
maxLeafSizeMaximum size of a leaf.
minLeafSizeMinimum size of a leaf.
template<typename MatType, typename TagType = int>
DTree* mlpack::det::DTree< MatType, TagType >::Left ( ) const
inline

Return the left child.

Definition at line 296 of file dtree.hpp.

References mlpack::det::DTree< MatType, TagType >::left.

template<typename MatType, typename TagType = int>
double mlpack::det::DTree< MatType, TagType >::LogNegativeError ( const size_t  totalPoints) const

Compute the log-negative-error for this point, given the total number of points in the dataset.

Parameters
totalPointsTotal number of points in the dataset.
template<typename MatType, typename TagType = int>
double mlpack::det::DTree< MatType, TagType >::LogNegError ( ) const
inline

Return the log negative error of this node.

Definition at line 285 of file dtree.hpp.

References mlpack::det::DTree< MatType, TagType >::logNegError.

template<typename MatType, typename TagType = int>
double mlpack::det::DTree< MatType, TagType >::LogVolume ( ) const
inline

Return the inverse of the volume of this node.

Definition at line 294 of file dtree.hpp.

References mlpack::det::DTree< MatType, TagType >::logVolume.

template<typename MatType, typename TagType = int>
const StatType& mlpack::det::DTree< MatType, TagType >::MaxVals ( ) const
inline

Return the maximum values.

Definition at line 307 of file dtree.hpp.

References mlpack::det::DTree< MatType, TagType >::maxVals.

template<typename MatType, typename TagType = int>
const StatType& mlpack::det::DTree< MatType, TagType >::MinVals ( ) const
inline
template<typename MatType, typename TagType = int>
DTree& mlpack::det::DTree< MatType, TagType >::operator= ( const DTree< MatType, TagType > &  obj)

Copy the given tree.

Parameters
objTree to copy.
template<typename MatType, typename TagType = int>
DTree& mlpack::det::DTree< MatType, TagType >::operator= ( DTree< MatType, TagType > &&  obj)

Take ownership of the given tree (move operator).

Parameters
objTree to take ownership of.
template<typename MatType, typename TagType = int>
double mlpack::det::DTree< MatType, TagType >::PruneAndUpdate ( const double  oldAlpha,
const size_t  points,
const bool  useVolReg = false 
)

Perform alpha pruning on a tree.

Returns the new value of alpha.

Parameters
oldAlphaOld value of alpha.
pointsTotal number of points in dataset.
useVolRegIf true, volume regularization is used.
Returns
New value of alpha.
template<typename MatType, typename TagType = int>
double mlpack::det::DTree< MatType, TagType >::Ratio ( ) const
inline

Return the ratio of points in this node to the points in the whole dataset.

Definition at line 292 of file dtree.hpp.

References mlpack::det::DTree< MatType, TagType >::ratio.

template<typename MatType, typename TagType = int>
DTree* mlpack::det::DTree< MatType, TagType >::Right ( ) const
inline

Return the right child.

Definition at line 298 of file dtree.hpp.

References mlpack::det::DTree< MatType, TagType >::right.

template<typename MatType, typename TagType = int>
bool mlpack::det::DTree< MatType, TagType >::Root ( ) const
inline

Return whether or not this is the root of the tree.

Definition at line 300 of file dtree.hpp.

References mlpack::det::DTree< MatType, TagType >::root.

template<typename MatType, typename TagType = int>
template<typename Archive >
void mlpack::det::DTree< MatType, TagType >::Serialize ( Archive &  ar,
const unsigned  int 
)

Serialize the density estimation tree.

Referenced by mlpack::det::DTree< MatType, TagType >::MinVals().

template<typename MatType, typename TagType = int>
size_t mlpack::det::DTree< MatType, TagType >::SplitData ( MatType &  data,
const size_t  splitDim,
const ElemType  splitValue,
arma::Col< size_t > &  oldFromNew 
) const
private

Split the data, returning the number of points left of the split.

Referenced by mlpack::det::DTree< MatType, TagType >::MinVals().

template<typename MatType, typename TagType = int>
size_t mlpack::det::DTree< MatType, TagType >::SplitDim ( ) const
inline

Return the split dimension of this node.

Definition at line 281 of file dtree.hpp.

References mlpack::det::DTree< MatType, TagType >::splitDim.

template<typename MatType, typename TagType = int>
ElemType mlpack::det::DTree< MatType, TagType >::SplitValue ( ) const
inline

Return the split value of this node.

Definition at line 283 of file dtree.hpp.

References mlpack::det::DTree< MatType, TagType >::splitValue.

template<typename MatType, typename TagType = int>
size_t mlpack::det::DTree< MatType, TagType >::Start ( ) const
inline

Return the starting index of points contained in this node.

Definition at line 277 of file dtree.hpp.

References mlpack::det::DTree< MatType, TagType >::start.

template<typename MatType, typename TagType = int>
size_t mlpack::det::DTree< MatType, TagType >::SubtreeLeaves ( ) const
inline

Return the number of leaves which are descendants of this node.

Definition at line 289 of file dtree.hpp.

References mlpack::det::DTree< MatType, TagType >::subtreeLeaves.

template<typename MatType, typename TagType = int>
double mlpack::det::DTree< MatType, TagType >::SubtreeLeavesLogNegError ( ) const
inline

Return the log negative error of all descendants of this node.

Definition at line 287 of file dtree.hpp.

References mlpack::det::DTree< MatType, TagType >::subtreeLeavesLogNegError.

template<typename MatType, typename TagType = int>
TagType mlpack::det::DTree< MatType, TagType >::TagTree ( const TagType &  tag = 0)

Index the buckets for possible usage later; this results in every leaf in the tree having a specific tag (accessible with BucketTag()).

This function calls itself recursively.

Parameters
tagTag for the next leaf; leave at 0 for the initial call.
template<typename MatType, typename TagType = int>
bool mlpack::det::DTree< MatType, TagType >::WithinRange ( const VecType query) const

Return whether a query point is within the range of this node.

Member Data Documentation

template<typename MatType, typename TagType = int>
double mlpack::det::DTree< MatType, TagType >::alphaUpper
private

Upper part of alpha sum; used for pruning.

Definition at line 268 of file dtree.hpp.

Referenced by mlpack::det::DTree< MatType, TagType >::AlphaUpper().

template<typename MatType, typename TagType = int>
TagType mlpack::det::DTree< MatType, TagType >::bucketTag
private

The tag for the leaf, used for hashing points.

Definition at line 265 of file dtree.hpp.

template<typename MatType, typename TagType = int>
size_t mlpack::det::DTree< MatType, TagType >::end
private

The index of the last point in the dataset contained in this node (and its children).

Definition at line 233 of file dtree.hpp.

Referenced by mlpack::det::DTree< MatType, TagType >::End().

template<typename MatType, typename TagType = int>
DTree* mlpack::det::DTree< MatType, TagType >::left
private

The left child.

Definition at line 271 of file dtree.hpp.

Referenced by mlpack::det::DTree< MatType, TagType >::Left().

template<typename MatType, typename TagType = int>
double mlpack::det::DTree< MatType, TagType >::logNegError
private

log-negative-L2-error of the node.

Definition at line 247 of file dtree.hpp.

Referenced by mlpack::det::DTree< MatType, TagType >::LogNegError().

template<typename MatType, typename TagType = int>
double mlpack::det::DTree< MatType, TagType >::logVolume
private

The logarithm of the volume of the node.

Definition at line 262 of file dtree.hpp.

Referenced by mlpack::det::DTree< MatType, TagType >::LogVolume().

template<typename MatType, typename TagType = int>
StatType mlpack::det::DTree< MatType, TagType >::maxVals
private

Upper half of bounding box for this node.

Definition at line 236 of file dtree.hpp.

Referenced by mlpack::det::DTree< MatType, TagType >::MaxVals().

template<typename MatType, typename TagType = int>
StatType mlpack::det::DTree< MatType, TagType >::minVals
private

Lower half of bounding box for this node.

Definition at line 238 of file dtree.hpp.

Referenced by mlpack::det::DTree< MatType, TagType >::MinVals().

template<typename MatType, typename TagType = int>
double mlpack::det::DTree< MatType, TagType >::ratio
private

Ratio of the number of points in the node to the total number of points.

Definition at line 259 of file dtree.hpp.

Referenced by mlpack::det::DTree< MatType, TagType >::Ratio().

template<typename MatType, typename TagType = int>
DTree* mlpack::det::DTree< MatType, TagType >::right
private

The right child.

Definition at line 273 of file dtree.hpp.

Referenced by mlpack::det::DTree< MatType, TagType >::Right().

template<typename MatType, typename TagType = int>
bool mlpack::det::DTree< MatType, TagType >::root
private

If true, this node is the root of the tree.

Definition at line 256 of file dtree.hpp.

Referenced by mlpack::det::DTree< MatType, TagType >::Root().

template<typename MatType, typename TagType = int>
size_t mlpack::det::DTree< MatType, TagType >::splitDim
private

The splitting dimension for this node.

Definition at line 241 of file dtree.hpp.

Referenced by mlpack::det::DTree< MatType, TagType >::SplitDim().

template<typename MatType, typename TagType = int>
ElemType mlpack::det::DTree< MatType, TagType >::splitValue
private

The split value on the splitting dimension for this node.

Definition at line 244 of file dtree.hpp.

Referenced by mlpack::det::DTree< MatType, TagType >::SplitValue().

template<typename MatType, typename TagType = int>
size_t mlpack::det::DTree< MatType, TagType >::start
private

The index of the first point in the dataset contained in this node (and its children).

Definition at line 230 of file dtree.hpp.

Referenced by mlpack::det::DTree< MatType, TagType >::Start().

template<typename MatType, typename TagType = int>
size_t mlpack::det::DTree< MatType, TagType >::subtreeLeaves
private

Number of leaves of the subtree.

Definition at line 253 of file dtree.hpp.

Referenced by mlpack::det::DTree< MatType, TagType >::SubtreeLeaves().

template<typename MatType, typename TagType = int>
double mlpack::det::DTree< MatType, TagType >::subtreeLeavesLogNegError
private

Sum of the error of the leaves of the subtree.

Definition at line 250 of file dtree.hpp.

Referenced by mlpack::det::DTree< MatType, TagType >::SubtreeLeavesLogNegError().


The documentation for this class was generated from the following file: