mlpack  master
mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy, LloydStepType, MatType > Class Template Reference

This class implements K-Means clustering, using a variety of possible implementations of Lloyd's algorithm.

Public Member Functions

 KMeans (const size_t maxIterations=1000, const MetricType metric=MetricType(), const InitialPartitionPolicy partitioner=InitialPartitionPolicy(), const EmptyClusterPolicy emptyClusterAction=EmptyClusterPolicy())
Create a K-Means object and (optionally) set the parameters which K-Means will be run with.
 
void Cluster (const MatType &data, const size_t clusters, arma::Row< size_t > &assignments, const bool initialGuess=false)
Perform k-means clustering on the data, returning a list of cluster assignments.
 
void Cluster (const MatType &data, const size_t clusters, arma::mat &centroids, const bool initialGuess=false)
Perform k-means clustering on the data, returning the centroids of each cluster in the centroids matrix.
 
void Cluster (const MatType &data, const size_t clusters, arma::Row< size_t > &assignments, arma::mat &centroids, const bool initialAssignmentGuess=false, const bool initialCentroidGuess=false)
Perform k-means clustering on the data, returning a list of cluster assignments and also the centroids of each cluster.
 
const EmptyClusterPolicy & EmptyClusterAction () const
Get the empty cluster policy.
 
EmptyClusterPolicy & EmptyClusterAction ()
Modify the empty cluster policy.
 
size_t MaxIterations () const
Get the maximum number of iterations.
 
size_t & MaxIterations ()
Set the maximum number of iterations.
 
const MetricType & Metric () const
Get the distance metric.
 
MetricType & Metric ()
Modify the distance metric.
 
const InitialPartitionPolicy & Partitioner () const
Get the initial partitioning policy.
 
InitialPartitionPolicy & Partitioner ()
Modify the initial partitioning policy.
 
template<typename Archive >
void Serialize (Archive &ar, const unsigned int version)
Serialize the k-means object.
 

Private Attributes

EmptyClusterPolicy emptyClusterAction
Instantiated empty cluster policy.
 
size_t maxIterations
Maximum number of iterations before giving up.
 
MetricType metric
Instantiated distance metric.
 
InitialPartitionPolicy partitioner
Instantiated initial partitioning policy.
 

Detailed Description

template<typename MetricType = metric::EuclideanDistance, typename InitialPartitionPolicy = SampleInitialization, typename EmptyClusterPolicy = MaxVarianceNewCluster, template< class, class > class LloydStepType = NaiveKMeans, typename MatType = arma::mat>
class mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy, LloydStepType, MatType >

This class implements K-Means clustering, using a variety of possible implementations of Lloyd's algorithm.

Five template parameters can optionally be supplied: the distance metric to use, the policy for finding the initial partition of the data, the action to take when an empty cluster is encountered, the implementation of a single Lloyd step to use, and the type of the data matrix.
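To make "a single Lloyd step" concrete, here is a minimal standalone C++ sketch of one naive iteration: assign every point to its nearest centroid under squared Euclidean distance, then move each centroid to the mean of its assigned points. It does not use mlpack or Armadillo; the `Point` alias, `SquaredDistance()` and `LloydStep()` are hypothetical names, and points are assumed to be stored as `std::vector<double>` rows rather than Armadillo columns. It illustrates the idea behind NaiveKMeans, not its actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;  // Hypothetical point type for this sketch.

// Squared Euclidean distance between two points of equal dimension.
double SquaredDistance(const Point& a, const Point& b)
{
  assert(a.size() == b.size());
  double sum = 0.0;
  for (std::size_t d = 0; d < a.size(); ++d)
  {
    const double diff = a[d] - b[d];
    sum += diff * diff;
  }
  return sum;
}

// One naive Lloyd iteration. Clusters that receive no points keep their old
// centroid; the return value is the number of such empty clusters (a full
// implementation would hand these to an empty cluster policy).
std::size_t LloydStep(const std::vector<Point>& data,
                      std::vector<Point>& centroids,
                      std::vector<std::size_t>& assignments)
{
  assert(!centroids.empty());
  const std::size_t k = centroids.size();
  const std::size_t dim = centroids[0].size();
  assignments.assign(data.size(), 0);

  // Assignment step: nearest centroid for every point.
  for (std::size_t i = 0; i < data.size(); ++i)
  {
    double best = std::numeric_limits<double>::max();
    for (std::size_t c = 0; c < k; ++c)
    {
      const double dist = SquaredDistance(data[i], centroids[c]);
      if (dist < best)
      {
        best = dist;
        assignments[i] = c;
      }
    }
  }

  // Update step: accumulate per-cluster sums and counts, then average.
  std::vector<Point> sums(k, Point(dim, 0.0));
  std::vector<std::size_t> counts(k, 0);
  for (std::size_t i = 0; i < data.size(); ++i)
  {
    for (std::size_t d = 0; d < dim; ++d)
      sums[assignments[i]][d] += data[i][d];
    ++counts[assignments[i]];
  }

  std::size_t empty = 0;
  for (std::size_t c = 0; c < k; ++c)
  {
    if (counts[c] == 0)
    {
      ++empty;
      continue;
    }
    for (std::size_t d = 0; d < dim; ++d)
      centroids[c][d] = sums[c][d] / counts[c];
  }
  return empty;
}
```

Repeating this step until the centroids stop moving (or an iteration limit is reached) gives the full algorithm.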

A simple example of how to run K-Means clustering is shown below.

#include <mlpack/methods/kmeans/kmeans.hpp>

using namespace mlpack::kmeans;

extern arma::mat data; // Dataset we want to run K-Means on.
void RunExamples()
{
  arma::Row<size_t> assignments; // Cluster assignments.
  arma::mat centroids; // Cluster centroids.
  KMeans<> k; // Default options.
  k.Cluster(data, 3, assignments, centroids); // 3 clusters.

  // Cluster using the Manhattan distance, 100 iterations maximum, saving only
  // the centroids.
  KMeans<mlpack::metric::ManhattanDistance> k2(100);
  k2.Cluster(data, 6, centroids); // 6 clusters.
}
Template Parameters
MetricType: The distance metric to use for this KMeans; see metric::LMetric for an example.
InitialPartitionPolicy: Initial partitioning policy; must implement a default constructor and either 'void Cluster(const arma::mat&, const size_t, arma::Row<size_t>&)' or 'void Cluster(const arma::mat&, const size_t, arma::mat&)'.
EmptyClusterPolicy: Policy for what to do on an empty cluster; must implement a default constructor and 'void EmptyCluster(const arma::mat& data, const size_t emptyCluster, const arma::mat& oldCentroids, arma::mat& newCentroids, arma::Col<size_t>& counts, MetricType& metric, const size_t iteration)'.
LloydStepType: Implementation of a single Lloyd step to use.
MatType: Type of matrix (arma::mat or arma::sp_mat).
See also
RandomPartition, SampleInitialization, RefinedStart, AllowEmptyClusters, MaxVarianceNewCluster, NaiveKMeans, ElkanKMeans

Definition at line 73 of file kmeans.hpp.
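The EmptyClusterPolicy hook exists because a Lloyd step can leave a cluster with no points. The following hedged sketch shows one reasonable reaction, loosely following the idea that the name MaxVarianceNewCluster suggests: find the non-empty cluster with the largest variance and hand its point furthest from the centroid to the empty cluster. `MaxVarianceSketch` and its simplified `EmptyCluster()` signature are hypothetical; they use standard containers instead of Armadillo and do not satisfy the signature required above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Point = std::vector<double>;  // Hypothetical point type for this sketch.

// Squared Euclidean distance between two points of equal dimension.
double SquaredDistance(const Point& a, const Point& b)
{
  assert(a.size() == b.size());
  double sum = 0.0;
  for (std::size_t d = 0; d < a.size(); ++d)
    sum += (a[d] - b[d]) * (a[d] - b[d]);
  return sum;
}

// Hypothetical empty-cluster policy with a simplified signature.
struct MaxVarianceSketch
{
  void EmptyCluster(const std::vector<Point>& data,
                    const std::size_t emptyCluster,
                    std::vector<Point>& centroids,
                    std::vector<std::size_t>& assignments,
                    std::vector<std::size_t>& counts) const
  {
    const std::size_t k = centroids.size();

    // Per-cluster sum of squared distances to the cluster's centroid.
    std::vector<double> scatter(k, 0.0);
    for (std::size_t i = 0; i < data.size(); ++i)
      scatter[assignments[i]] +=
          SquaredDistance(data[i], centroids[assignments[i]]);

    // Largest-variance cluster among those that can spare a point.
    std::size_t donor = k;  // k means "no donor found".
    double bestVariance = -1.0;
    for (std::size_t c = 0; c < k; ++c)
    {
      if (counts[c] < 2)
        continue;
      const double variance = scatter[c] / counts[c];
      if (variance > bestVariance)
      {
        bestVariance = variance;
        donor = c;
      }
    }
    if (donor == k)
      return;  // Nothing to split; the cluster stays empty.

    // Move the donor's furthest point into the empty cluster. The donor's
    // centroid is not recomputed here; the next Lloyd step will do that.
    std::size_t furthest = 0;
    double furthestDist = -1.0;
    for (std::size_t i = 0; i < data.size(); ++i)
    {
      if (assignments[i] != donor)
        continue;
      const double dist = SquaredDistance(data[i], centroids[donor]);
      if (dist > furthestDist)
      {
        furthestDist = dist;
        furthest = i;
      }
    }
    centroids[emptyCluster] = data[furthest];
    assignments[furthest] = emptyCluster;
    --counts[donor];
    ++counts[emptyCluster];
  }
};
```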

Constructor & Destructor Documentation

template<typename MetricType = metric::EuclideanDistance, typename InitialPartitionPolicy = SampleInitialization, typename EmptyClusterPolicy = MaxVarianceNewCluster, template< class, class > class LloydStepType = NaiveKMeans, typename MatType = arma::mat>
mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy, LloydStepType, MatType >::KMeans ( const size_t  maxIterations = 1000,
const MetricType  metric = MetricType(),
const InitialPartitionPolicy  partitioner = InitialPartitionPolicy(),
const EmptyClusterPolicy  emptyClusterAction = EmptyClusterPolicy() 
)

Create a K-Means object and (optionally) set the parameters which K-Means will be run with.

Parameters
maxIterations: Maximum number of iterations allowed before giving up (0 is valid and removes the limit, but then the algorithm may never terminate).
metric: Optional MetricType object, for use when the metric has state it needs to store.
partitioner: Optional InitialPartitionPolicy object, for use when a specially initialized partitioning policy is required.
emptyClusterAction: Optional EmptyClusterPolicy object, for use when a specially initialized empty cluster policy is required.
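The metric argument matters for metrics that carry state. In this hedged standalone sketch, a hypothetical `WeightedEuclidean` class stores per-dimension weights, and a generic `NearestCentroid()` template calls `Evaluate(a, b)` on the instance it is given, so the stored weights change which centroid is nearest. The `Evaluate()` member name is an assumption modeled on the convention of metric::LMetric; neither type is part of mlpack.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

using Point = std::vector<double>;  // Hypothetical point type for this sketch.

// Hypothetical stateful metric: a weighted Euclidean distance. Because it
// stores per-dimension weights, it has to be passed around as an object.
class WeightedEuclidean
{
 public:
  explicit WeightedEuclidean(std::vector<double> weightsIn)
      : weights(std::move(weightsIn)) {}

  // Evaluate(a, b) = sqrt(sum over d of w_d * (a_d - b_d)^2).
  double Evaluate(const Point& a, const Point& b) const
  {
    assert(a.size() == weights.size() && b.size() == weights.size());
    double sum = 0.0;
    for (std::size_t d = 0; d < weights.size(); ++d)
      sum += weights[d] * (a[d] - b[d]) * (a[d] - b[d]);
    return std::sqrt(sum);
  }

 private:
  std::vector<double> weights;
};

// Generic nearest-centroid search over any MetricType with Evaluate(a, b).
// The metric instance is passed in, so whatever state it holds is used.
template<typename MetricType>
std::size_t NearestCentroid(const Point& p,
                            const std::vector<Point>& centroids,
                            const MetricType& metric)
{
  std::size_t best = 0;
  double bestDist = std::numeric_limits<double>::infinity();
  for (std::size_t c = 0; c < centroids.size(); ++c)
  {
    const double dist = metric.Evaluate(p, centroids[c]);
    if (dist < bestDist)
    {
      bestDist = dist;
      best = c;
    }
  }
  return best;
}
```

A stateless metric could get away with a default-constructed object; a weighted one cannot, which is the case the constructor's metric parameter covers.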

Member Function Documentation

template<typename MetricType = metric::EuclideanDistance, typename InitialPartitionPolicy = SampleInitialization, typename EmptyClusterPolicy = MaxVarianceNewCluster, template< class, class > class LloydStepType = NaiveKMeans, typename MatType = arma::mat>
void mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy, LloydStepType, MatType >::Cluster ( const MatType &  data,
const size_t  clusters,
arma::Row< size_t > &  assignments,
const bool  initialGuess = false 
)

Perform k-means clustering on the data, returning a list of cluster assignments.

Optionally, the vector of assignments can be set to an initial guess of the cluster assignments; to do this, set initialGuess to true.

Template Parameters
MatType: Type of matrix (arma::mat or arma::sp_mat).
Parameters
data: Dataset to cluster.
clusters: Number of clusters to compute.
assignments: Vector to store cluster assignments in.
initialGuess: If true, then it is assumed that assignments has a list of initial cluster assignments.
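When initialGuess is true, the given assignments have to be turned into starting centroids somehow. One straightforward way, sketched here under the assumption that each initial centroid is simply the mean of the points assigned to it, is the following. `CentroidsFromAssignments()` is a hypothetical standalone helper, not mlpack code, and it gives an empty cluster an all-zero centroid only to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Point = std::vector<double>;  // Hypothetical point type for this sketch.

// Average the points in each cluster to obtain initial centroids.
std::vector<Point> CentroidsFromAssignments(
    const std::vector<Point>& data,
    const std::vector<std::size_t>& assignments,
    const std::size_t clusters)
{
  assert(data.size() == assignments.size());
  const std::size_t dim = data.empty() ? 0 : data[0].size();
  std::vector<Point> centroids(clusters, Point(dim, 0.0));
  std::vector<std::size_t> counts(clusters, 0);

  // Sum the points of each cluster.
  for (std::size_t i = 0; i < data.size(); ++i)
  {
    const std::size_t c = assignments[i];
    assert(c < clusters);  // Every label must name a valid cluster.
    for (std::size_t d = 0; d < dim; ++d)
      centroids[c][d] += data[i][d];
    ++counts[c];
  }

  // Divide by the cluster sizes; empty clusters stay at the origin.
  for (std::size_t c = 0; c < clusters; ++c)
    if (counts[c] > 0)
      for (std::size_t d = 0; d < dim; ++d)
        centroids[c][d] /= counts[c];

  return centroids;
}
```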
template<typename MetricType = metric::EuclideanDistance, typename InitialPartitionPolicy = SampleInitialization, typename EmptyClusterPolicy = MaxVarianceNewCluster, template< class, class > class LloydStepType = NaiveKMeans, typename MatType = arma::mat>
void mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy, LloydStepType, MatType >::Cluster ( const MatType &  data,
const size_t  clusters,
arma::mat &  centroids,
const bool  initialGuess = false 
)

Perform k-means clustering on the data, returning the centroids of each cluster in the centroids matrix.

Optionally, the initial centroids can be specified by filling the centroids matrix with the initial centroids and specifying initialGuess = true.

Template Parameters
MatType: Type of matrix (arma::mat or arma::sp_mat).
Parameters
data: Dataset to cluster.
clusters: Number of clusters to compute.
centroids: Matrix in which centroids are stored.
initialGuess: If true, then it is assumed that centroids contains the initial cluster centroids.
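The following is a hedged sketch of the two paths for the centroids-only overload. With initialGuess set, the caller's centroids are kept after a shape check; otherwise `clusters` distinct data points are drawn at random as starting centroids, which is similar in spirit to the default SampleInitialization policy but is not its implementation. `InitializeCentroids()`, its exception messages, and the use of std::mt19937 are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <stdexcept>
#include <vector>

using Point = std::vector<double>;  // Hypothetical point type for this sketch.

void InitializeCentroids(const std::vector<Point>& data,
                         const std::size_t clusters,
                         std::vector<Point>& centroids,
                         const bool initialGuess,
                         std::mt19937& rng)
{
  if (initialGuess)
  {
    // Keep the caller's guess, but check that its shape matches the problem.
    if (centroids.size() != clusters)
      throw std::invalid_argument("initial guess has wrong number of centroids");
    for (const Point& c : centroids)
      if (!data.empty() && c.size() != data[0].size())
        throw std::invalid_argument("initial centroid has wrong dimension");
    return;
  }

  // No guess: shuffle the point indices and take the first `clusters` points.
  assert(clusters <= data.size());
  std::vector<std::size_t> indices(data.size());
  std::iota(indices.begin(), indices.end(), std::size_t{0});
  std::shuffle(indices.begin(), indices.end(), rng);
  centroids.clear();
  for (std::size_t i = 0; i < clusters; ++i)
    centroids.push_back(data[indices[i]]);
}
```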
template<typename MetricType = metric::EuclideanDistance, typename InitialPartitionPolicy = SampleInitialization, typename EmptyClusterPolicy = MaxVarianceNewCluster, template< class, class > class LloydStepType = NaiveKMeans, typename MatType = arma::mat>
void mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy, LloydStepType, MatType >::Cluster ( const MatType &  data,
const size_t  clusters,
arma::Row< size_t > &  assignments,
arma::mat &  centroids,
const bool  initialAssignmentGuess = false,
const bool  initialCentroidGuess = false 
)

Perform k-means clustering on the data, returning a list of cluster assignments and also the centroids of each cluster.

Optionally, the vector of assignments can be set to an initial guess of the cluster assignments; to do this, set initialAssignmentGuess to true. Another way to set initial cluster guesses is to fill the centroids matrix with the centroid guesses, and then set initialCentroidGuess to true. initialAssignmentGuess supersedes initialCentroidGuess, so if both are set to true, the assignments vector is used.

Template Parameters
MatType: Type of matrix (arma::mat or arma::sp_mat).
Parameters
data: Dataset to cluster.
clusters: Number of clusters to compute.
assignments: Vector to store cluster assignments in.
centroids: Matrix in which centroids are stored.
initialAssignmentGuess: If true, then it is assumed that assignments has a list of initial cluster assignments.
initialCentroidGuess: If true, then it is assumed that centroids contains the initial centroids of each cluster.
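The precedence rule in the description reduces to a small decision, sketched here with a hypothetical `ChooseInitialization()` helper and `InitSource` enum (neither exists in mlpack). The fallback to the InitialPartitionPolicy when no guess is supplied is an assumption based on the role of that template parameter.

```cpp
#include <cassert>

// Hypothetical labels for where the starting point of the run comes from.
enum class InitSource { Assignments, Centroids, Partitioner };

// initialAssignmentGuess wins over initialCentroidGuess; with neither set,
// the initial partitioning policy is assumed to supply the starting point.
InitSource ChooseInitialization(const bool initialAssignmentGuess,
                                const bool initialCentroidGuess)
{
  if (initialAssignmentGuess)
    return InitSource::Assignments;
  if (initialCentroidGuess)
    return InitSource::Centroids;
  return InitSource::Partitioner;
}
```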
template<typename MetricType = metric::EuclideanDistance, typename InitialPartitionPolicy = SampleInitialization, typename EmptyClusterPolicy = MaxVarianceNewCluster, template< class, class > class LloydStepType = NaiveKMeans, typename MatType = arma::mat>
const EmptyClusterPolicy& mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy, LloydStepType, MatType >::EmptyClusterAction ( ) const
inline
template<typename MetricType = metric::EuclideanDistance, typename InitialPartitionPolicy = SampleInitialization, typename EmptyClusterPolicy = MaxVarianceNewCluster, template< class, class > class LloydStepType = NaiveKMeans, typename MatType = arma::mat>
EmptyClusterPolicy& mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy, LloydStepType, MatType >::EmptyClusterAction ( )
inline
template<typename MetricType = metric::EuclideanDistance, typename InitialPartitionPolicy = SampleInitialization, typename EmptyClusterPolicy = MaxVarianceNewCluster, template< class, class > class LloydStepType = NaiveKMeans, typename MatType = arma::mat>
size_t mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy, LloydStepType, MatType >::MaxIterations ( ) const
inline
template<typename MetricType = metric::EuclideanDistance, typename InitialPartitionPolicy = SampleInitialization, typename EmptyClusterPolicy = MaxVarianceNewCluster, template< class, class > class LloydStepType = NaiveKMeans, typename MatType = arma::mat>
size_t& mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy, LloydStepType, MatType >::MaxIterations ( )
inline
template<typename MetricType = metric::EuclideanDistance, typename InitialPartitionPolicy = SampleInitialization, typename EmptyClusterPolicy = MaxVarianceNewCluster, template< class, class > class LloydStepType = NaiveKMeans, typename MatType = arma::mat>
const MetricType& mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy, LloydStepType, MatType >::Metric ( ) const
inline
template<typename MetricType = metric::EuclideanDistance, typename InitialPartitionPolicy = SampleInitialization, typename EmptyClusterPolicy = MaxVarianceNewCluster, template< class, class > class LloydStepType = NaiveKMeans, typename MatType = arma::mat>
MetricType& mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy, LloydStepType, MatType >::Metric ( )
inline
template<typename MetricType = metric::EuclideanDistance, typename InitialPartitionPolicy = SampleInitialization, typename EmptyClusterPolicy = MaxVarianceNewCluster, template< class, class > class LloydStepType = NaiveKMeans, typename MatType = arma::mat>
const InitialPartitionPolicy& mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy, LloydStepType, MatType >::Partitioner ( ) const
inline
template<typename MetricType = metric::EuclideanDistance, typename InitialPartitionPolicy = SampleInitialization, typename EmptyClusterPolicy = MaxVarianceNewCluster, template< class, class > class LloydStepType = NaiveKMeans, typename MatType = arma::mat>
InitialPartitionPolicy& mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy, LloydStepType, MatType >::Partitioner ( )
inline
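These accessors follow a common C++ convention: the const overload reads a value, and the non-const overload returns a reference that can be assigned to, which is why each accessor appears twice. A minimal standalone sketch of the pattern, using a hypothetical `IterationSettings` class rather than KMeans itself:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class illustrating the getter / reference-setter convention.
class IterationSettings
{
 public:
  explicit IterationSettings(const std::size_t maxIterationsIn = 1000)
      : maxIterations(maxIterationsIn) {}

  // Const overload: read the value.
  std::size_t MaxIterations() const { return maxIterations; }

  // Non-const overload: return a reference so the caller can assign to it.
  std::size_t& MaxIterations() { return maxIterations; }

 private:
  std::size_t maxIterations;
};
```

Given the non-const `size_t& MaxIterations()` signature above, the same style applies to a KMeans object `k`: `k.MaxIterations() = 50;`.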
template<typename MetricType = metric::EuclideanDistance, typename InitialPartitionPolicy = SampleInitialization, typename EmptyClusterPolicy = MaxVarianceNewCluster, template< class, class > class LloydStepType = NaiveKMeans, typename MatType = arma::mat>
template<typename Archive >
void mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy, LloydStepType, MatType >::Serialize ( Archive &  ar,
const unsigned int  version 
)

Member Data Documentation

template<typename MetricType = metric::EuclideanDistance, typename InitialPartitionPolicy = SampleInitialization, typename EmptyClusterPolicy = MaxVarianceNewCluster, template< class, class > class LloydStepType = NaiveKMeans, typename MatType = arma::mat>
EmptyClusterPolicy mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy, LloydStepType, MatType >::emptyClusterAction
private
template<typename MetricType = metric::EuclideanDistance, typename InitialPartitionPolicy = SampleInitialization, typename EmptyClusterPolicy = MaxVarianceNewCluster, template< class, class > class LloydStepType = NaiveKMeans, typename MatType = arma::mat>
size_t mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy, LloydStepType, MatType >::maxIterations
private

Maximum number of iterations before giving up.

Definition at line 185 of file kmeans.hpp.

Referenced by mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy, LloydStepType, MatType >::MaxIterations().
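How maxIterations bounds a run can be sketched as a do/while loop. In this hedged standalone version, `RunUntilConverged()` is hypothetical, `step` stands in for one Lloyd iteration that returns how far the centroids moved, and the movement-below-tolerance stopping test is an assumption. Because the counter is incremented before it is compared, a limit of 0 is never reached and acts as "no limit", consistent with the constructor's note that the algorithm may then never terminate.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Run `step` until the reported movement drops to `tolerance` or below, or
// until `maxIterations` iterations have run (0 means no iteration limit).
std::size_t RunUntilConverged(const std::function<double()>& step,
                              const std::size_t maxIterations,
                              const double tolerance)
{
  assert(static_cast<bool>(step));  // Require a callable step.
  std::size_t iteration = 0;
  double movement = 0.0;
  do
  {
    movement = step();
    ++iteration;
  } while (movement > tolerance && iteration != maxIterations);
  return iteration;  // Number of iterations actually performed.
}
```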

template<typename MetricType = metric::EuclideanDistance, typename InitialPartitionPolicy = SampleInitialization, typename EmptyClusterPolicy = MaxVarianceNewCluster, template< class, class > class LloydStepType = NaiveKMeans, typename MatType = arma::mat>
MetricType mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy, LloydStepType, MatType >::metric
private
template<typename MetricType = metric::EuclideanDistance, typename InitialPartitionPolicy = SampleInitialization, typename EmptyClusterPolicy = MaxVarianceNewCluster, template< class, class > class LloydStepType = NaiveKMeans, typename MatType = arma::mat>
InitialPartitionPolicy mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy, LloydStepType, MatType >::partitioner
private

Instantiated initial partitioning policy.

Definition at line 189 of file kmeans.hpp.

Referenced by mlpack::kmeans::KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy, LloydStepType, MatType >::Partitioner().
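Instead of sampling starting centroids, a partitioning policy can produce initial assignments directly. The following sketch gives each point a uniformly random cluster label, which is what the name RandomPartition suggests; `RandomPartitionSketch` is hypothetical, uses std::vector instead of Armadillo types, and only mirrors the shape of the assignments-based `Cluster()` signature listed for InitialPartitionPolicy.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

using Point = std::vector<double>;  // Hypothetical point type for this sketch.

// Hypothetical partitioning policy: every point gets a uniformly random
// cluster label in [0, clusters). Only the number of points is used.
class RandomPartitionSketch
{
 public:
  explicit RandomPartitionSketch(const unsigned int seed = 0) : rng(seed) {}

  void Cluster(const std::vector<Point>& data,
               const std::size_t clusters,
               std::vector<std::size_t>& assignments)
  {
    assert(clusters > 0);
    std::uniform_int_distribution<std::size_t> label(0, clusters - 1);
    assignments.resize(data.size());
    for (std::size_t& a : assignments)
      a = label(rng);
  }

 private:
  std::mt19937 rng;  // Seeded so that runs are reproducible.
};
```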


The documentation for this class was generated from the following file: