DBSCAN(const double epsilon, const size_t minPoints, RangeSearchType rangeSearch = RangeSearchType(), PointSelectionPolicy pointSelector = PointSelectionPolicy())
Construct the DBSCAN object with the given parameters.

template<typename MatType>
size_t Cluster(const MatType& data, arma::mat& centroids)
Performs DBSCAN clustering on the data, returning the number of clusters and the centroid of each cluster.

template<typename MatType>
size_t Cluster(const MatType& data, arma::Row<size_t>& assignments)
Performs DBSCAN clustering on the data, returning the number of clusters and the list of cluster assignments.

template<typename MatType>
size_t Cluster(const MatType& data, arma::Row<size_t>& assignments, arma::mat& centroids)
Performs DBSCAN clustering on the data, returning the number of clusters, the centroid of each cluster, and the list of cluster assignments.
template<typename RangeSearchType = range::RangeSearch<>, typename PointSelectionPolicy = RandomPointSelection>
class mlpack::dbscan::DBSCAN< RangeSearchType, PointSelectionPolicy >
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a clustering technique described in the following paper:
@inproceedings{ester1996density,
title={A density-based algorithm for discovering clusters in large spatial
databases with noise.},
author={Ester, M. and Kriegel, H.-P. and Sander, J. and Xu, X.},
booktitle={Proceedings of the Second International Conference on Knowledge
Discovery and Data Mining (KDD '96)},
pages={226--231},
year={1996}
}
The DBSCAN algorithm iteratively clusters points using range searches with a specified radius parameter (epsilon). This implementation allows the range search technique and the point selection strategy to be configured through template parameters.
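To illustrate what the range-search step supplies, here is a minimal brute-force sketch in standard C++. It is not mlpack's range::RangeSearch; it assumes points stored as std::vector<double> and Euclidean distance, and the helper names EuclideanDistance and BruteForceRangeSearch are hypothetical. It produces one neighbor list and one distance list per point, the same shape as the neighbors and distances arguments of ProcessPoint. Following the paper's definition of an epsilon-neighborhood, each point is counted in its own neighborhood; an actual range-search class may or may not return the query point itself.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical representation: one point is a vector of coordinates.
using Point = std::vector<double>;

// Euclidean distance between two points of equal dimension.
double EuclideanDistance(const Point& a, const Point& b)
{
  assert(a.size() == b.size());
  double sum = 0.0;
  for (std::size_t d = 0; d < a.size(); ++d)
  {
    const double diff = a[d] - b[d];
    sum += diff * diff;
  }
  return std::sqrt(sum);
}

// Brute-force range search: for every point i, collect the indices of, and
// distances to, all points j with dist(i, j) <= epsilon (including i itself).
void BruteForceRangeSearch(const std::vector<Point>& data,
                           const double epsilon,
                           std::vector<std::vector<std::size_t>>& neighbors,
                           std::vector<std::vector<double>>& distances)
{
  const std::size_t n = data.size();
  neighbors.assign(n, std::vector<std::size_t>());
  distances.assign(n, std::vector<double>());
  for (std::size_t i = 0; i < n; ++i)
  {
    for (std::size_t j = 0; j < n; ++j)
    {
      const double dist = EuclideanDistance(data[i], data[j]);
      if (dist <= epsilon)
      {
        neighbors[i].push_back(j);
        distances[i].push_back(dist);
      }
    }
  }
}
```

This performs O(n^2) distance evaluations, which is the cost a tree-based range searcher plugged in as RangeSearchType is meant to reduce.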
Template Parameters
- RangeSearchType: Class to use for range searching.
- PointSelectionPolicy: Strategy for selecting the next point to cluster with.
Definition at line 46 of file dbscan.hpp.
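The PointSelectionPolicy parameter decides which unvisited point seeds the next cluster. The sketch below shows two hypothetical policies in standard C++: one picks uniformly at random among unvisited points (in the spirit of the default RandomPointSelection, but not its actual code) and one always takes the lowest unvisited index. A std::vector<bool> stands in for the boost::dynamic_bitset<> used by ProcessPoint, and the class and method names are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical policy: choose a uniformly random index among unvisited points.
class RandomUnvisitedSelector
{
 public:
  explicit RandomUnvisitedSelector(unsigned seed = 42) : rng(seed) { }

  // Precondition: at least one entry of `unvisited` is true.
  std::size_t Select(const std::vector<bool>& unvisited)
  {
    std::vector<std::size_t> candidates;
    for (std::size_t i = 0; i < unvisited.size(); ++i)
      if (unvisited[i])
        candidates.push_back(i);
    assert(!candidates.empty());
    std::uniform_int_distribution<std::size_t> pick(0, candidates.size() - 1);
    return candidates[pick(rng)];
  }

 private:
  std::mt19937 rng;
};

// Hypothetical policy: always choose the lowest-index unvisited point, which
// makes the visiting order deterministic.
struct OrderedSelector
{
  std::size_t Select(const std::vector<bool>& unvisited) const
  {
    for (std::size_t i = 0; i < unvisited.size(); ++i)
      if (unvisited[i])
        return i;
    assert(false && "no unvisited points left");
    return unvisited.size();
  }
};
```

With random selection, the seed of each cluster can change between runs, so cluster numbering (and the cluster that claims a border point reachable from two clusters) may differ; the ordered policy is reproducible.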
template<typename RangeSearchType = range::RangeSearch<>, typename PointSelectionPolicy = RandomPointSelection>
template<typename MatType >
size_t mlpack::dbscan::DBSCAN<RangeSearchType, PointSelectionPolicy>::Cluster(
    const MatType& data,
    arma::Row<size_t>& assignments,
    arma::mat& centroids)
Performs DBSCAN clustering on the data, returning the number of clusters, the centroid of each cluster, and the list of cluster assignments.
If assignments[i] == assignments.n_elem - 1, the point is considered "noise".
Template Parameters
- MatType: Type of matrix (arma::mat or arma::sp_mat).
Parameters
- data: Dataset to cluster.
- assignments: Vector to store cluster assignments.
- centroids: Matrix in which centroids are stored.
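Once cluster labels exist, the centroids output can be derived by averaging the points in each cluster. The following is a sketch of such averaging under stated assumptions, not mlpack's code: it uses plain std::vector containers instead of arma::mat and arma::Row<size_t>, and it treats any label greater than or equal to the number of clusters as noise, which covers the assignments.n_elem - 1 convention stated above whenever there are fewer clusters than points. ComputeCentroids is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Point = std::vector<double>;

// Hypothetical helper: average the points of each cluster. Labels in
// [0, numClusters) are clusters; any other label is treated as noise and
// skipped, so noise points do not pull any centroid.
std::vector<Point> ComputeCentroids(const std::vector<Point>& data,
                                    const std::vector<std::size_t>& assignments,
                                    const std::size_t numClusters)
{
  assert(data.size() == assignments.size());
  const std::size_t dim = data.empty() ? 0 : data[0].size();
  std::vector<Point> centroids(numClusters, Point(dim, 0.0));
  std::vector<std::size_t> counts(numClusters, 0);

  // Accumulate coordinate sums and point counts per cluster.
  for (std::size_t i = 0; i < data.size(); ++i)
  {
    const std::size_t c = assignments[i];
    if (c >= numClusters)
      continue;  // Noise point.
    for (std::size_t d = 0; d < dim; ++d)
      centroids[c][d] += data[i][d];
    ++counts[c];
  }

  // Divide sums by counts to obtain means.
  for (std::size_t c = 0; c < numClusters; ++c)
    if (counts[c] > 0)
      for (std::size_t d = 0; d < dim; ++d)
        centroids[c][d] /= static_cast<double>(counts[c]);

  return centroids;
}
```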
template<typename RangeSearchType = range::RangeSearch<>, typename PointSelectionPolicy = RandomPointSelection>
template<typename MatType >
size_t mlpack::dbscan::DBSCAN<RangeSearchType, PointSelectionPolicy>::ProcessPoint(
    const MatType& data,
    boost::dynamic_bitset<>& unvisited,
    const size_t index,
    arma::Row<size_t>& assignments,
    const size_t currentCluster,
    const std::vector<std::vector<size_t>>& neighbors,
    const std::vector<std::vector<double>>& distances,
    const bool topLevel = true)
[private]
Processes the point at the given index.
It marks the point as visited and checks whether it is a core point. If it is, the cluster is expanded from it; otherwise the function returns.
Template Parameters
- MatType: Type of matrix (arma::mat or arma::sp_mat).
Parameters
- data: Dataset to cluster.
- unvisited: Records whether each point has been visited.
- index: Index of the point to visit now.
- assignments: Vector to store cluster assignments.
- currentCluster: Index of the cluster to assign to points in the current cluster.
- neighbors: For each point, the list of neighbors that fall within its epsilon-neighborhood.
- distances: For each point, the list of distances to the neighbors that fall within its epsilon-neighborhood.
- topLevel: If true, the current point is the first point in the current cluster; this helps in detecting noise.
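The steps ProcessPoint performs (mark the point visited, test whether it is a core point, expand the cluster if so) can be sketched end to end as follows. This standalone C++ version is an illustration under stated assumptions, not mlpack's implementation: it takes precomputed neighbor lists (each assumed to include the point itself), calls a point core when its neighborhood holds at least minPoints points, visits seeds in index order, and expands clusters with an explicit stack instead of the recursion suggested by the topLevel flag. It also uses SIZE_MAX as its own noise marker rather than the assignments.n_elem - 1 convention documented for Cluster(). ClusterFromNeighbors and kNoise are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Noise marker used by this sketch only (an assumption, see lead-in).
constexpr std::size_t kNoise = std::numeric_limits<std::size_t>::max();

// Labels every point from precomputed epsilon-neighborhoods and returns the
// number of clusters found. Points in no cluster keep the label kNoise.
std::size_t ClusterFromNeighbors(
    const std::vector<std::vector<std::size_t>>& neighbors,
    const std::size_t minPoints,
    std::vector<std::size_t>& assignments)
{
  const std::size_t n = neighbors.size();
  std::vector<bool> unvisited(n, true);
  assignments.assign(n, kNoise);
  std::size_t currentCluster = 0;

  for (std::size_t start = 0; start < n; ++start)
  {
    if (!unvisited[start])
      continue;
    unvisited[start] = false;

    // A non-core seed stays noise for now; a later cluster may still claim
    // it as a border point.
    if (neighbors[start].size() < minPoints)
      continue;

    // The seed is a core point: grow a new cluster from it.
    assignments[start] = currentCluster;
    std::vector<std::size_t> toVisit(neighbors[start].begin(),
                                     neighbors[start].end());
    while (!toVisit.empty())
    {
      const std::size_t p = toVisit.back();
      toVisit.pop_back();
      if (assignments[p] == kNoise)
        assignments[p] = currentCluster;  // Core or border point joins.
      if (!unvisited[p])
        continue;
      unvisited[p] = false;
      // Only core points propagate the cluster further.
      if (neighbors[p].size() >= minPoints)
        toVisit.insert(toVisit.end(), neighbors[p].begin(), neighbors[p].end());
    }
    ++currentCluster;
  }
  return currentCluster;
}
```

The explicit stack avoids deep recursion on large, densely connected clusters, and a point first seen as a non-core seed can still be absorbed later as a border point of a cluster that reaches it.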