mlpack
master
The RASearch class: This class provides a generic manner to perform rank-approximate search via random-sampling. More...
Public Types | |
typedef TreeType< MetricType, RAQueryStat< SortPolicy >, MatType > | Tree |
Convenience typedef. More... | |
Public Member Functions | |
RASearch (const MatType &referenceSet, const bool naive=false, const bool singleMode=false, const double tau=5, const double alpha=0.95, const bool sampleAtLeaves=false, const bool firstLeafExact=false, const size_t singleSampleLimit=20, const MetricType metric=MetricType()) | |
Initialize the RASearch object, passing a reference dataset (the dataset that will be searched). More... | |
RASearch (MatType &&referenceSet, const bool naive=false, const bool singleMode=false, const double tau=5, const double alpha=0.95, const bool sampleAtLeaves=false, const bool firstLeafExact=false, const size_t singleSampleLimit=20, const MetricType metric=MetricType()) | |
Initialize the RASearch object, passing a reference dataset (the dataset that will be searched) and taking ownership of it. More... | |
RASearch (Tree *referenceTree, const bool singleMode=false, const double tau=5, const double alpha=0.95, const bool sampleAtLeaves=false, const bool firstLeafExact=false, const size_t singleSampleLimit=20, const MetricType metric=MetricType()) | |
Initialize the RASearch object with the given pre-constructed reference tree. More... | |
RASearch (const bool naive=false, const bool singleMode=false, const double tau=5, const double alpha=0.95, const bool sampleAtLeaves=false, const bool firstLeafExact=false, const size_t singleSampleLimit=20, const MetricType metric=MetricType()) | |
Create an RASearch object with no reference data. More... | |
~RASearch () | |
Delete the RASearch object. More... | |
double | Alpha () const |
Get the desired success probability. More... | |
double & | Alpha () |
Modify the desired success probability. More... | |
bool | FirstLeafExact () const |
Get whether or not we traverse to the first leaf without approximation. More... | |
bool & | FirstLeafExact () |
Modify whether or not we traverse to the first leaf without approximation. More... | |
bool | Naive () const |
Get whether or not naive (brute-force) search is used. More... | |
bool & | Naive () |
Modify whether or not naive (brute-force) search is used. More... | |
const MatType & | ReferenceSet () const |
Access the reference set. More... | |
void | ResetQueryTree (Tree *queryTree) const |
This function recursively resets the RAQueryStat of the given query tree to set 'bound' to SortPolicy::WorstDistance and 'numSamplesMade' to 0. More... | |
bool | SampleAtLeaves () const |
Get whether or not sampling is done at the leaves. More... | |
bool & | SampleAtLeaves () |
Modify whether or not sampling is done at the leaves. More... | |
void | Search (const MatType &querySet, const size_t k, arma::Mat< size_t > &neighbors, arma::mat &distances) |
Compute the rank approximate nearest neighbors of each query point in the query set and store the output in the given matrices. More... | |
void | Search (Tree *queryTree, const size_t k, arma::Mat< size_t > &neighbors, arma::mat &distances) |
Compute the rank approximate nearest neighbors of each point in the pre-built query tree and store the output in the given matrices. More... | |
void | Search (const size_t k, arma::Mat< size_t > &neighbors, arma::mat &distances) |
Compute the rank approximate nearest neighbors of each point in the reference set (that is, the query set is taken to be the reference set), and store the output in the given matrices. More... | |
template<typename Archive > | |
void | Serialize (Archive &ar, const unsigned int) |
Serialize the object. More... | |
bool | SingleMode () const |
Get whether or not single-tree search is used. More... | |
bool & | SingleMode () |
Modify whether or not single-tree search is used. More... | |
size_t | SingleSampleLimit () const |
Get the limit on the size of a node that can be approximated. More... | |
size_t & | SingleSampleLimit () |
Modify the limit on the size of a node that can be approximated. More... | |
double | Tau () const |
Get the rank-approximation in percentile of the data. More... | |
double & | Tau () |
Modify the rank-approximation in percentile of the data. More... | |
void | Train (const MatType &referenceSet) |
"Train" the model on the given reference set. More... | |
void | Train (MatType &&referenceSet) |
"Train" the model on the given reference set, taking ownership of the data matrix. More... | |
Private Attributes | |
double | alpha |
The desired success probability (between 0 and 1). More... | |
bool | firstLeafExact |
If true, we will traverse to the first leaf without approximation. More... | |
MetricType | metric |
Instantiation of the metric. More... | |
bool | naive |
Indicates if naive random sampling on the set is being used. More... | |
std::vector< size_t > | oldFromNewReferences |
Permutations of reference points during tree building. More... | |
const MatType * | referenceSet |
Reference dataset. In some situations we may own this dataset. More... | |
Tree * | referenceTree |
Pointer to the root of the reference tree. More... | |
bool | sampleAtLeaves |
Whether or not sampling is done at the leaves. Faster, but less accurate. More... | |
bool | setOwner |
If true, we are responsible for deleting the dataset. More... | |
bool | singleMode |
Indicates if single-tree search is being used (as opposed to dual-tree). More... | |
size_t | singleSampleLimit |
The limit on the number of points in the largest node that can be approximated by sampling. More... | |
double | tau |
The rank-approximation in percentile of the data (between 0 and 100). More... | |
bool | treeOwner |
If true, this object created the trees and is responsible for them. More... | |
The RASearch class: This class provides a generic manner to perform rank-approximate search via random-sampling.
If the 'naive' option is chosen, this rank-approximate search will be done by randomly sampling from the whole set. If the 'naive' option is not chosen, the sampling is done in a stratified manner in the tree as mentioned in the algorithms in Figure 2 of the following paper:
@inproceedings{ram2009rank, title={{Rank-Approximate Nearest Neighbor Search: Retaining Meaning and Speed in High Dimensions}}, author={{Ram, P. and Lee, D. and Ouyang, H. and Gray, A. G.}}, booktitle={{Advances in Neural Information Processing Systems}}, year={2009} }
RASearch is currently known to not work with ball trees (#356).
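To give a feel for why random sampling can deliver a rank guarantee, the sketch below computes how many independent uniform samples (drawn with replacement) are needed so that at least one of them lands among the top t ranked reference points with probability alpha. This is a simplified bound for k = 1 only; it is not the stratified scheme of the paper, nor the computation mlpack performs. The function name `SamplesForOneHit` is hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>

// Number of independent uniform draws (with replacement) from n points needed
// so that at least one draw falls in the top-t set with probability >= alpha.
// Derived from 1 - (1 - t/n)^m >= alpha.  Hypothetical helper, for
// illustration only; it is not part of mlpack.
std::size_t SamplesForOneHit(const std::size_t n,
                             const std::size_t t,
                             const double alpha)
{
  if (n == 0 || t == 0 || t > n || alpha <= 0.0 || alpha >= 1.0)
    throw std::invalid_argument("SamplesForOneHit(): invalid arguments");
  if (t == n)
    return 1; // Every draw is in the top-t set.

  const double missProbability = 1.0 - double(t) / double(n);
  // missProbability^m <= 1 - alpha  <=>  m >= log(1 - alpha) / log(miss).
  return static_cast<std::size_t>(
      std::ceil(std::log(1.0 - alpha) / std::log(missProbability)));
}
```

With 1000 reference points, a top-50 target (tau = 5) and alpha = 0.95, this bound asks for 59 samples, far fewer than the 1000 distance computations of an exact scan.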
SortPolicy | The sort policy for distances; see NearestNeighborSort. |
MetricType | The metric to use for computation. |
TreeType | The tree type to use. |
Definition at line 71 of file ra_search.hpp.
typedef TreeType<MetricType, RAQueryStat<SortPolicy>, MatType> mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::Tree |
Convenience typedef.
Definition at line 75 of file ra_search.hpp.
mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::RASearch | ( | const MatType & | referenceSet, |
const bool | naive = false , |
||
const bool | singleMode = false , |
||
const double | tau = 5 , |
||
const double | alpha = 0.95 , |
||
const bool | sampleAtLeaves = false , |
||
const bool | firstLeafExact = false , |
||
const size_t | singleSampleLimit = 20 , |
||
const MetricType | metric = MetricType() |
||
) |
Initialize the RASearch object, passing a reference dataset (the dataset that will be searched).
Optionally, perform the computation in naive mode or single-tree mode. An initialized distance metric can be given, for cases where the metric has internal data (e.g. the distance::MahalanobisDistance class).
This method will copy the matrices to internal copies, which are rearranged during tree-building. You can avoid this extra copy by pre-constructing the trees and using the appropriate constructor, or by using the constructor that takes an rvalue reference to the data with std::move().
tau, the rank-approximation parameter, specifies that we are looking for k neighbors with probability alpha of being in the top tau percent of nearest neighbors. So, as an example, if our dataset has 1000 points, and we want 5 nearest neighbors with 95% probability of being in the top 5% of nearest neighbors (or, the top 50 nearest neighbors), we set k = 5, tau = 5, and alpha = 0.95.
The method will fail (and throw a std::invalid_argument exception) if the value of tau is too low: tau must be set such that the number of points in the corresponding percentile of the data is greater than k. Thus, if we choose tau = 0.1 with a dataset of 1000 points and k = 5, then we are attempting to choose 5 nearest neighbors out of the closest 1 point – this is invalid.
referenceSet | Set of reference points. |
naive | If true, the rank-approximate search will be performed by directly sampling the whole set instead of using the stratified sampling on the tree. |
singleMode | If true, single-tree search will be used (as opposed to dual-tree search). This is useful when Search() will be called with few query points. |
metric | An optional instance of the MetricType class. |
tau | The rank-approximation in percentile of the data. The default value is 5%. |
alpha | The desired success probability. The default value is 0.95. |
sampleAtLeaves | Sample at leaves for faster but less accurate computation. This defaults to 'false'. |
firstLeafExact | Traverse to the first leaf without approximation. This can ensure that the query definitely finds its (near) duplicate if there exists one. This defaults to 'false' for now. |
singleSampleLimit | The limit on the largest node that can be approximated by sampling. This defaults to 20. |
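The constraint on tau described above can be checked with a few lines of arithmetic. The following is a minimal sketch that takes the text literally (the number of points in the top tau percent must be greater than k). The helper names are hypothetical, and the rounding (ceiling) and the strictness of the comparison are assumptions rather than a description of what mlpack does internally.

```cpp
#include <cmath>
#include <cstddef>

// Number of reference points that fall in the top tau percent of n points.
// Rounding up is an assumption made for this sketch.
std::size_t PointsInPercentile(const std::size_t n, const double tau)
{
  return static_cast<std::size_t>(std::ceil(tau * double(n) / 100.0));
}

// True if k neighbors can be drawn from the top tau percent, following the
// "greater than k" wording of the documentation above.
bool TauIsValid(const std::size_t n, const double tau, const std::size_t k)
{
  return PointsInPercentile(n, tau) > k;
}
```

For the two cases discussed above, `TauIsValid(1000, 5.0, 5)` holds because the top 5% contains 50 points, while `TauIsValid(1000, 0.1, 5)` does not because the top 0.1% contains a single point.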
mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::RASearch | ( | MatType && | referenceSet, |
const bool | naive = false , |
||
const bool | singleMode = false , |
||
const double | tau = 5 , |
||
const double | alpha = 0.95 , |
||
const bool | sampleAtLeaves = false , |
||
const bool | firstLeafExact = false , |
||
const size_t | singleSampleLimit = 20 , |
||
const MetricType | metric = MetricType() |
||
) |
Initialize the RASearch object, passing a reference dataset (the dataset that will be searched).
Optionally, perform the computation in naive mode or single-tree mode. An initialized distance metric can be given, for cases where the metric has internal data (e.g. the distance::MahalanobisDistance class).
This method will take ownership of the given reference set, avoiding a copy. If you need to use the reference set for other purposes, too, consider using the constructor that takes a const reference.
tau, the rank-approximation parameter, specifies that we are looking for k neighbors with probability alpha of being in the top tau percent of nearest neighbors. So, as an example, if our dataset has 1000 points, and we want 5 nearest neighbors with 95% probability of being in the top 5% of nearest neighbors (or, the top 50 nearest neighbors), we set k = 5, tau = 5, and alpha = 0.95.
The method will fail (and throw a std::invalid_argument exception) if the value of tau is too low: tau must be set such that the number of points in the corresponding percentile of the data is greater than k. Thus, if we choose tau = 0.1 with a dataset of 1000 points and k = 5, then we are attempting to choose 5 nearest neighbors out of the closest 1 point – this is invalid.
referenceSet | Set of reference points. |
naive | If true, the rank-approximate search will be performed by directly sampling the whole set instead of using the stratified sampling on the tree. |
singleMode | If true, single-tree search will be used (as opposed to dual-tree search). This is useful when Search() will be called with few query points. |
metric | An optional instance of the MetricType class. |
tau | The rank-approximation in percentile of the data. The default value is 5%. |
alpha | The desired success probability. The default value is 0.95. |
sampleAtLeaves | Sample at leaves for faster but less accurate computation. This defaults to 'false'. |
firstLeafExact | Traverse to the first leaf without approximation. This can ensure that the query definitely finds its (near) duplicate if there exists one. This defaults to 'false' for now. |
singleSampleLimit | The limit on the largest node that can be approximated by sampling. This defaults to 20. |
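The difference between the const-reference constructor (which copies) and the rvalue-reference constructor (which takes ownership) can be illustrated without mlpack. In this sketch the class `ReferenceHolder` is hypothetical and `std::vector<double>` merely stands in for MatType; it shows only the copy-versus-move behaviour, not tree building.

```cpp
#include <utility>
#include <vector>

// Hypothetical stand-in for an object that stores its own reference data.
class ReferenceHolder
{
 public:
  // Copying constructor: the caller's data is left untouched.
  explicit ReferenceHolder(const std::vector<double>& data) :
      referenceSet(data) { }

  // Owning constructor: the caller's buffer is moved in, so no copy is made.
  explicit ReferenceHolder(std::vector<double>&& data) :
      referenceSet(std::move(data)) { }

  // Read-only access to the stored data.
  const std::vector<double>& ReferenceSet() const { return referenceSet; }

 private:
  std::vector<double> referenceSet;
};
```

Constructing with `ReferenceHolder h(std::move(data));` leaves `data` empty afterwards, which is why the const-reference overload is the one to use when the caller still needs the data.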
mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::RASearch | ( | Tree * | referenceTree, |
const bool | singleMode = false , |
||
const double | tau = 5 , |
||
const double | alpha = 0.95 , |
||
const bool | sampleAtLeaves = false , |
||
const bool | firstLeafExact = false , |
||
const size_t | singleSampleLimit = 20 , |
||
const MetricType | metric = MetricType() |
||
) |
Initialize the RASearch object with the given pre-constructed reference tree.
It is assumed that the points in the tree's dataset correspond to the reference set. Optionally, choose to use single-tree mode. Naive mode is not available as an option for this constructor; instead, to run naive computation, use a different constructor. Additionally, an instantiated distance metric can be given, for cases where the distance metric holds data.
There is no copying of the data matrices in this constructor (because tree-building is not necessary), so this is the constructor to use when copies absolutely must be avoided.
tau, the rank-approximation parameter, specifies that we are looking for k neighbors with probability alpha of being in the top tau percent of nearest neighbors. So, as an example, if our dataset has 1000 points, and we want 5 nearest neighbors with 95% probability of being in the top 5% of nearest neighbors (or, the top 50 nearest neighbors), we set k = 5, tau = 5, and alpha = 0.95.
The method will fail (and throw a std::invalid_argument exception) if the value of tau is too low: tau must be set such that the number of points in the corresponding percentile of the data is greater than k. Thus, if we choose tau = 0.1 with a dataset of 1000 points and k = 5, then we are attempting to choose 5 nearest neighbors out of the closest 1 point – this is invalid.
referenceTree | Pre-built tree for reference points. |
singleMode | Whether single-tree computation should be used (as opposed to dual-tree computation). |
tau | The rank-approximation in percentile of the data. The default value is 5%. |
alpha | The desired success probability. The default value is 0.95. |
sampleAtLeaves | Sample at leaves for faster but less accurate computation. This defaults to 'false'. |
firstLeafExact | Traverse to the first leaf without approximation. This can ensure that the query definitely finds its (near) duplicate if there exists one. This defaults to 'false' for now. |
singleSampleLimit | The limit on the largest node that can be approximated by sampling. This defaults to 20. |
metric | Instantiated distance metric. |
mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::RASearch | ( | const bool | naive = false , |
const bool | singleMode = false , |
||
const double | tau = 5 , |
||
const double | alpha = 0.95 , |
||
const bool | sampleAtLeaves = false , |
||
const bool | firstLeafExact = false , |
||
const size_t | singleSampleLimit = 20 , |
||
const MetricType | metric = MetricType() |
||
) |
Create an RASearch object with no reference data.
If Search() is called before a reference set is set with Train(), an exception will be thrown.
naive | Whether naive (brute-force) search should be used. |
singleMode | Whether single-tree computation should be used (as opposed to dual-tree computation). |
tau | The rank-approximation in percentile of the data. The default value is 5%. |
alpha | The desired success probability. The default value is 0.95. |
sampleAtLeaves | Sample at leaves for faster but less accurate computation. This defaults to 'false'. |
firstLeafExact | Traverse to the first leaf without approximation. This can ensure that the query definitely finds its (near) duplicate if there exists one. This defaults to 'false' for now. |
singleSampleLimit | The limit on the largest node that can be approximated by sampling. This defaults to 20. |
metric | Instantiated distance metric. |
mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::~RASearch | ( | ) |
Delete the RASearch object.
The tree is the only member we are responsible for deleting. The others will take care of themselves.
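double mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::Alpha | ( | ) | const |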
inline |
Get the desired success probability.
Definition at line 398 of file ra_search.hpp.
double& mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::Alpha | ( | ) |
inline |
Modify the desired success probability.
Definition at line 400 of file ra_search.hpp.
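bool mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::FirstLeafExact | ( | ) | const |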
inline |
Get whether or not we traverse to the first leaf without approximation.
Definition at line 408 of file ra_search.hpp.
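bool& mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::FirstLeafExact | ( | ) |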
inline |
Modify whether or not we traverse to the first leaf without approximation.
Definition at line 410 of file ra_search.hpp.
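bool mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::Naive | ( | ) | const |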
inline |
Get whether or not naive (brute-force) search is used.
Definition at line 383 of file ra_search.hpp.
bool& mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::Naive | ( | ) |
inline |
Modify whether or not naive (brute-force) search is used.
Definition at line 385 of file ra_search.hpp.
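const MatType& mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::ReferenceSet | ( | ) | const |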
inline |
Access the reference set.
Definition at line 380 of file ra_search.hpp.
void mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::ResetQueryTree | ( | Tree * | queryTree | ) | const |
This function recursively resets the RAQueryStat of the given query tree to set 'bound' to SortPolicy::WorstDistance and 'numSamplesMade' to 0.
This allows a user to perform multiple searches with the same query tree, possibly with different levels of approximation, without having to build a new pair of trees for every new (approximate) search.
If Search() is called multiple times with the same query tree without calling ResetQueryTree(), the results may not satisfy the theoretical guarantees provided by the rank-approximate neighbor search algorithm.
queryTree | Tree whose statistics should be reset. |
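The recursive reset described above can be sketched on a toy tree. The `QueryStat` and `Node` types below are hypothetical and far simpler than mlpack's tree types; only the two fields named in the description are modelled. The worst distance for a nearest-neighbor sort is assumed here to be the largest representable double.

```cpp
#include <cstddef>
#include <limits>
#include <memory>
#include <vector>

// Hypothetical per-node statistic with the two fields named above.
struct QueryStat
{
  double bound = std::numeric_limits<double>::max();
  std::size_t numSamplesMade = 0;
};

// Hypothetical minimal tree node; real tree types carry much more state.
struct Node
{
  QueryStat stat;
  std::vector<std::unique_ptr<Node>> children;
};

// Reset the statistic of `node` and of every descendant.  worstDistance
// stands in for SortPolicy::WorstDistance(); its default value is an
// assumption that matches a nearest-neighbor sort.
void ResetStats(Node& node,
                const double worstDistance = std::numeric_limits<double>::max())
{
  node.stat.bound = worstDistance;
  node.stat.numSamplesMade = 0;
  for (auto& child : node.children)
    ResetStats(*child, worstDistance);
}
```

Calling the reset between two searches on the same query tree ensures the second search starts from a clean bound and sample count rather than inheriting those of the first.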
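bool mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::SampleAtLeaves | ( | ) | const |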
inline |
Get whether or not sampling is done at the leaves.
Definition at line 403 of file ra_search.hpp.
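bool& mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::SampleAtLeaves | ( | ) |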
inline |
Modify whether or not sampling is done at the leaves.
Definition at line 405 of file ra_search.hpp.
void mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::Search | ( | const MatType & | querySet, |
const size_t | k, | ||
arma::Mat< size_t > & | neighbors, | ||
arma::mat & | distances | ||
) |
Compute the rank approximate nearest neighbors of each query point in the query set and store the output in the given matrices.
The matrices will be set to the size of n columns by k rows, where n is the number of points in the query dataset and k is the number of neighbors being searched for.
If querySet is small or only contains one point, it can be faster to do single-tree search; single-tree search can be set with the SingleMode() function or in the constructor.
querySet | Set of query points (can be a single point). |
k | Number of neighbors to search for. |
neighbors | Matrix storing lists of neighbors for each query point. |
distances | Matrix storing distances of neighbors for each query point. |
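The output orientation (k rows by n columns) means that each column holds the results for one query point. The sketch below mimics that layout with a flat column-major buffer; the `NeighborTable` type and `NeighborsOf` function are hypothetical and use only the standard library. With the Armadillo matrices that Search() fills, the equivalent accesses are `neighbors(j, i)` for the j-th neighbor of query i and `neighbors.col(i)` for all k of them.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical column-major table: k rows (neighbor rank) by n columns
// (query index), the same orientation as the output matrices.
struct NeighborTable
{
  std::size_t k;
  std::size_t n;
  std::vector<std::size_t> data; // k * n entries, one column after another.

  NeighborTable(const std::size_t kIn, const std::size_t nIn) :
      k(kIn), n(nIn), data(kIn * nIn, 0) { }

  // Entry for the given neighbor rank (row) and query index (column).
  std::size_t& At(const std::size_t rank, const std::size_t query)
  {
    return data[rank + query * k];
  }
};

// Copy out the k neighbor indices of one query; they are contiguous in
// memory because the storage is column-major.
std::vector<std::size_t> NeighborsOf(const NeighborTable& table,
                                     const std::size_t query)
{
  const auto first = table.data.begin() + query * table.k;
  return std::vector<std::size_t>(first, first + table.k);
}
```

Because a query's results are contiguous, iterating over columns in the outer loop and ranks in the inner loop is the cache-friendly way to consume the output.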
void mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::Search | ( | Tree * | queryTree, |
const size_t | k, | ||
arma::Mat< size_t > & | neighbors, | ||
arma::mat & | distances | ||
) |
Compute the rank approximate nearest neighbors of each point in the pre-built query tree and store the output in the given matrices.
The matrices will be set to the size of n columns by k rows, where n is the number of points in the query dataset and k is the number of neighbors being searched for.
If singleMode or naive is enabled, then this method will throw a std::invalid_argument exception; calling this function implies a dual-tree algorithm.
queryTree | Tree built on query points. |
k | Number of neighbors to search for. |
neighbors | Matrix storing lists of neighbors for each query point. |
distances | Matrix storing distances of neighbors for each query point. |
void mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::Search | ( | const size_t | k, |
arma::Mat< size_t > & | neighbors, | ||
arma::mat & | distances | ||
) |
Compute the rank approximate nearest neighbors of each point in the reference set (that is, the query set is taken to be the reference set), and store the output in the given matrices.
The matrices will be set to the size of n columns by k rows, where n is the number of points in the reference set (which also serves as the query set here) and k is the number of neighbors being searched for.
k | Number of neighbors to search for. |
neighbors | Matrix storing lists of neighbors for each point. |
distances | Matrix storing distances of neighbors for each query point. |
void mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::Serialize | ( | Archive & | ar, |
const unsigned int | | |
) |
Serialize the object.
Referenced by mlpack::neighbor::RASearch< tree::RStarTree >::SingleSampleLimit().
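bool mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::SingleMode | ( | ) | const |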
inline |
Get whether or not single-tree search is used.
Definition at line 388 of file ra_search.hpp.
bool& mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::SingleMode | ( | ) |
inline |
Modify whether or not single-tree search is used.
Definition at line 390 of file ra_search.hpp.
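size_t mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::SingleSampleLimit | ( | ) | const |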
inline |
Get the limit on the size of a node that can be approximated.
Definition at line 413 of file ra_search.hpp.
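size_t& mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::SingleSampleLimit | ( | ) |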
inline |
Modify the limit on the size of a node that can be approximated.
Definition at line 415 of file ra_search.hpp.
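double mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::Tau | ( | ) | const |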
inline |
Get the rank-approximation in percentile of the data.
Definition at line 393 of file ra_search.hpp.
double& mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::Tau | ( | ) |
inline |
Modify the rank-approximation in percentile of the data.
Definition at line 395 of file ra_search.hpp.
void mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::Train | ( | const MatType & | referenceSet | ) |
"Train" the model on the given reference set.
If tree-based search is being used (if Naive() is false), this means rebuilding the reference tree. This particular method will make a copy of the given reference data. To avoid that copy, use the Train() method that takes an rvalue reference with std::move().
referenceSet | New reference set to use. |
void mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::Train | ( | MatType && | referenceSet | ) |
"Train" the model on the given reference set, taking ownership of the data matrix.
If tree-based search is being used (if Naive() is false), this also means rebuilding the reference tree. If you need to keep a copy of the reference data, use the Train() method that takes a const reference to the data.
referenceSet | New reference set to use. |
double mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::alpha |
private |
The desired success probability (between 0 and 1).
Definition at line 442 of file ra_search.hpp.
Referenced by mlpack::neighbor::RASearch< tree::RStarTree >::Alpha().
bool mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::firstLeafExact |
private |
If true, we will traverse to the first leaf without approximation.
Definition at line 446 of file ra_search.hpp.
Referenced by mlpack::neighbor::RASearch< tree::RStarTree >::FirstLeafExact().
MetricType mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::metric |
private |
Instantiation of the metric.
Definition at line 452 of file ra_search.hpp.
bool mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::naive |
private |
Indicates if naive random sampling on the set is being used.
Definition at line 435 of file ra_search.hpp.
Referenced by mlpack::neighbor::RASearch< tree::RStarTree >::Naive().
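std::vector<size_t> mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::oldFromNewReferences |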
private |
Permutations of reference points during tree building.
Definition at line 423 of file ra_search.hpp.
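const MatType* mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::referenceSet |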
private |
Reference dataset. In some situations we may own this dataset.
Definition at line 427 of file ra_search.hpp.
Referenced by mlpack::neighbor::RASearch< tree::RStarTree >::ReferenceSet().
Tree* mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::referenceTree |
private |
Pointer to the root of the reference tree.
Definition at line 425 of file ra_search.hpp.
bool mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::sampleAtLeaves |
private |
Whether or not sampling is done at the leaves. Faster, but less accurate.
Definition at line 444 of file ra_search.hpp.
Referenced by mlpack::neighbor::RASearch< tree::RStarTree >::SampleAtLeaves().
bool mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::setOwner |
private |
If true, we are responsible for deleting the dataset.
Definition at line 432 of file ra_search.hpp.
bool mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::singleMode |
private |
Indicates if single-tree search is being used (as opposed to dual-tree).
Definition at line 437 of file ra_search.hpp.
Referenced by mlpack::neighbor::RASearch< tree::RStarTree >::SingleMode().
size_t mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::singleSampleLimit |
private |
The limit on the number of points in the largest node that can be approximated by sampling.
Definition at line 449 of file ra_search.hpp.
Referenced by mlpack::neighbor::RASearch< tree::RStarTree >::SingleSampleLimit().
double mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::tau |
private |
The rank-approximation in percentile of the data (between 0 and 100).
Definition at line 440 of file ra_search.hpp.
Referenced by mlpack::neighbor::RASearch< tree::RStarTree >::Tau().
bool mlpack::neighbor::RASearch< SortPolicy, MetricType, MatType, TreeType >::treeOwner |
private |
If true, this object created the trees and is responsible for them.
Definition at line 430 of file ra_search.hpp.