mlpack
master
This class contains methods which can fit a GMM to observations using the EM algorithm.
Public Member Functions
EMFit (const size_t maxIterations=300, const double tolerance=1e-10, InitialClusteringType clusterer=InitialClusteringType(), CovarianceConstraintPolicy constraint=CovarianceConstraintPolicy())
Construct the EMFit object, optionally passing an InitialClusteringType object (just in case it needs to store state).
const InitialClusteringType & | Clusterer () const
Get the clusterer.
InitialClusteringType & | Clusterer ()
Modify the clusterer.
const CovarianceConstraintPolicy & | Constraint () const
Get the covariance constraint policy class.
CovarianceConstraintPolicy & | Constraint ()
Modify the covariance constraint policy class.
void | Estimate (const arma::mat &observations, std::vector< distribution::GaussianDistribution > &dists, arma::vec &weights, const bool useInitialModel=false)
Fit the observations to a Gaussian mixture model (GMM) using the EM algorithm.
void | Estimate (const arma::mat &observations, const arma::vec &probabilities, std::vector< distribution::GaussianDistribution > &dists, arma::vec &weights, const bool useInitialModel=false)
Fit the observations to a Gaussian mixture model (GMM) using the EM algorithm, taking into account the probabilities of each point being from this mixture.
size_t | MaxIterations () const
Get the maximum number of iterations of the EM algorithm.
size_t & | MaxIterations ()
Modify the maximum number of iterations of the EM algorithm.
template<typename Archive >
void | Serialize (Archive &ar, const unsigned int version)
Serialize the fitter.
double | Tolerance () const
Get the tolerance for the convergence of the EM algorithm.
double & | Tolerance ()
Modify the tolerance for the convergence of the EM algorithm.
Private Member Functions
void | InitialClustering (const arma::mat &observations, std::vector< distribution::GaussianDistribution > &dists, arma::vec &weights)
Run the clusterer, and then turn the cluster assignments into Gaussians.
double | LogLikelihood (const arma::mat &data, const std::vector< distribution::GaussianDistribution > &dists, const arma::vec &weights) const
Calculate the log-likelihood of a model.
Private Attributes
InitialClusteringType | clusterer
Object which will perform the clustering.
CovarianceConstraintPolicy | constraint
Object which applies constraints to the covariance matrix.
size_t | maxIterations
Maximum iterations of EM algorithm.
double | tolerance
Tolerance for convergence of EM.
This class contains methods which can fit a GMM to observations using the EM algorithm.
It requires an initial clustering mechanism, which is by default the KMeans algorithm. The clustering mechanism must implement the following method:
This method should create 'clusters' clusters, and return the assignment of each point to a cluster.
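The method signature itself did not survive on this page. As a rough illustration of the shape such a clusterer might take, the sketch below uses standard-library stand-ins (a vector of points rather than an Armadillo matrix) and a hypothetical NearestSeedClusterer. The parameter names and types are assumptions for illustration, not the interface mlpack declares.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Stand-in for a data matrix: one inner vector per observation.
using Points = std::vector<std::vector<double>>;

// Hypothetical clusterer: uses the first `clusters` points as seeds and
// assigns every point to its nearest seed (squared Euclidean distance).
struct NearestSeedClusterer
{
  void Cluster(const Points& observations,
               const std::size_t clusters,
               std::vector<std::size_t>& assignments) const
  {
    assignments.assign(observations.size(), 0);
    for (std::size_t i = 0; i < observations.size(); ++i)
    {
      double best = std::numeric_limits<double>::max();
      for (std::size_t c = 0; c < clusters && c < observations.size(); ++c)
      {
        double dist = 0.0;
        for (std::size_t d = 0; d < observations[i].size(); ++d)
        {
          const double diff = observations[i][d] - observations[c][d];
          dist += diff * diff;
        }
        if (dist < best)
        {
          best = dist;
          assignments[i] = c;
        }
      }
    }
  }
};
```

The point of the sketch is only the contract: the caller passes the data and the desired number of clusters, and the method fills in one cluster index per observation.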
Definition at line 43 of file em_fit.hpp.
mlpack::gmm::EMFit< InitialClusteringType, CovarianceConstraintPolicy >::EMFit | ( | const size_t | maxIterations = 300 , |
const double | tolerance = 1e-10 , |
||
InitialClusteringType | clusterer = InitialClusteringType() , |
||
CovarianceConstraintPolicy | constraint = CovarianceConstraintPolicy() |
||
) |
Construct the EMFit object, optionally passing an InitialClusteringType object (just in case it needs to store state).
Setting the maximum number of iterations to 0 means that the EM algorithm will iterate until convergence (with the given tolerance).
The constraint parameter supplies the CovarianceConstraintPolicy object that is applied to the covariance matrices during fitting, for example to keep each one positive definite. Such a check can be time-consuming, so if you know your data is well-behaved, a policy that skips it can save some runtime.
maxIterations | Maximum number of iterations for EM. |
tolerance | Log-likelihood tolerance required for convergence. |
clusterer | Object which will perform the initial clustering. |
constraint | Object which applies constraints to each covariance matrix. |
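The interaction between maxIterations and tolerance can be sketched as a simple loop. This is a minimal illustration, not the code in em_fit.hpp: IterateUntilConverged and the step callback (assumed to run one EM iteration and return the new log-likelihood) are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <limits>

// Runs `step` (assumed to perform one EM iteration and return the new
// log-likelihood) until the change drops to `tolerance` or below, or until
// `maxIterations` iterations have run. maxIterations == 0 disables the cap.
// Returns the number of iterations performed.
std::size_t IterateUntilConverged(const std::function<double()>& step,
                                  const std::size_t maxIterations,
                                  const double tolerance)
{
  double previous = -std::numeric_limits<double>::infinity();
  std::size_t iteration = 0;
  while (maxIterations == 0 || iteration < maxIterations)
  {
    const double current = step();
    ++iteration;
    if (std::abs(current - previous) <= tolerance)
      break;
    previous = current;
  }
  return iteration;
}
```

With a cap of 0 the loop ends only when two successive log-likelihood values agree to within the tolerance, so a very small tolerance can lead to long runs on data that converges slowly.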
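const InitialClusteringType & mlpack::gmm::EMFit< InitialClusteringType, CovarianceConstraintPolicy >::Clusterer | ( | ) | const |
inline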
Get the clusterer.
Definition at line 112 of file em_fit.hpp.
References mlpack::gmm::EMFit< InitialClusteringType, CovarianceConstraintPolicy >::clusterer.
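InitialClusteringType & mlpack::gmm::EMFit< InitialClusteringType, CovarianceConstraintPolicy >::Clusterer | ( | ) |
inline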
Modify the clusterer.
Definition at line 114 of file em_fit.hpp.
References mlpack::gmm::EMFit< InitialClusteringType, CovarianceConstraintPolicy >::clusterer.
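const CovarianceConstraintPolicy & mlpack::gmm::EMFit< InitialClusteringType, CovarianceConstraintPolicy >::Constraint | ( | ) | const |
inline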
Get the covariance constraint policy class.
Definition at line 117 of file em_fit.hpp.
References mlpack::gmm::EMFit< InitialClusteringType, CovarianceConstraintPolicy >::constraint.
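CovarianceConstraintPolicy & mlpack::gmm::EMFit< InitialClusteringType, CovarianceConstraintPolicy >::Constraint | ( | ) |
inline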
Modify the covariance constraint policy class.
Definition at line 119 of file em_fit.hpp.
References mlpack::gmm::EMFit< InitialClusteringType, CovarianceConstraintPolicy >::constraint.
void mlpack::gmm::EMFit< InitialClusteringType, CovarianceConstraintPolicy >::Estimate | ( | const arma::mat & | observations, |
std::vector< distribution::GaussianDistribution > & | dists, | ||
arma::vec & | weights, | ||
const bool | useInitialModel = false |
||
) |
Fit the observations to a Gaussian mixture model (GMM) using the EM algorithm.
The size of the dists vector and of weights (indicating the number of components) must already be set. Optionally, if useInitialModel is set to true, then the model given in the dists and weights parameters is used as the initial model, instead of the result of InitialClusteringType::Cluster().
observations | List of observations to train on. |
dists | Vector of Gaussian distributions to store the trained means and covariances in. |
weights | Vector to store a priori weights in. |
useInitialModel | If true, the given model is used as the starting point instead of running the initial clustering. |
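For reference, the textbook EM updates for a Gaussian mixture are sketched below. The symbols ($r_{ik}$ for the responsibility of component $k$ for point $x_i$, $N_k$ for its effective count, $n$ for the number of observations) are notation introduced here, and this page does not confirm that the implementation follows these formulas step for step.

```latex
% E-step: responsibility of component k for observation x_i
r_{ik} = \frac{w_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}
              {\sum_{j} w_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}

% M-step: re-estimate each component from the responsibilities
N_k = \sum_{i=1}^{n} r_{ik}, \qquad
\mu_k = \frac{1}{N_k} \sum_{i=1}^{n} r_{ik} \, x_i, \qquad
\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{n} r_{ik} \, (x_i - \mu_k)(x_i - \mu_k)^{\top}, \qquad
w_k = \frac{N_k}{n}
```

Here $\mu_k$ and $\Sigma_k$ correspond to the mean and covariance held by each entry of dists, and $w_k$ to the entries of weights.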
void mlpack::gmm::EMFit< InitialClusteringType, CovarianceConstraintPolicy >::Estimate | ( | const arma::mat & | observations, |
const arma::vec & | probabilities, | ||
std::vector< distribution::GaussianDistribution > & | dists, | ||
arma::vec & | weights, | ||
const bool | useInitialModel = false |
||
) |
Fit the observations to a Gaussian mixture model (GMM) using the EM algorithm, taking into account the probabilities of each point being from this mixture.
The size of the dists vector and of weights (indicating the number of components) must already be set. Optionally, if useInitialModel is set to true, then the model given in the dists and weights parameters is used as the initial model, instead of the result of InitialClusteringType::Cluster().
observations | List of observations to train on. |
probabilities | Probability of each point being from this model. |
dists | Vector of Gaussian distributions to store the trained means and covariances in. |
weights | Vector to store a priori weights in. |
useInitialModel | If true, the given model is used as the starting point instead of running the initial clustering. |
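One common way to account for per-point probabilities in EM is to scale each point's contribution in the M-step. The formulas below are a sketch of that idea under assumed notation ($p_i$ for the probability of point $x_i$, $r_{ik}$ for the usual posterior responsibility of component $k$ for $x_i$); they are an assumption about, not a statement of, what this overload computes.

```latex
% Effective count of component k, with each point scaled by p_i
N_k = \sum_{i=1}^{n} p_i \, r_{ik}

% Weighted re-estimates of mean, covariance and mixing weight
\mu_k = \frac{1}{N_k} \sum_{i=1}^{n} p_i \, r_{ik} \, x_i, \qquad
\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{n} p_i \, r_{ik} \, (x_i - \mu_k)(x_i - \mu_k)^{\top}, \qquad
w_k = \frac{N_k}{\sum_{i=1}^{n} p_i}
```

Setting every $p_i = 1$ recovers the unweighted updates, which is a quick sanity check on the notation.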
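void mlpack::gmm::EMFit< InitialClusteringType, CovarianceConstraintPolicy >::InitialClustering | ( | const arma::mat & | observations, |
std::vector< distribution::GaussianDistribution > & | dists, | ||
arma::vec & | weights | ||
) |
private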
Run the clusterer, and then turn the cluster assignments into Gaussians.
This is a helper function for both overloads of Estimate(). The dists vector and weights must already be sized to the number of clusters.
observations | List of observations. |
dists | Vector of Gaussian distributions to store the means and covariances in. |
weights | Vector to store a priori weights in. |
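Turning hard cluster assignments into Gaussians amounts to computing a mean, a spread, and a weight per cluster. The sketch below shows this for one-dimensional data only, with hypothetical names (Component1D, AssignmentsToComponents) and standard-library containers; the real helper works on multivariate data and may differ in details such as the variance normalization.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 1-D Gaussian component; real components are multivariate.
struct Component1D
{
  double mean = 0.0;
  double variance = 0.0;
  double weight = 0.0;
};

// Turns hard cluster assignments into per-cluster mean, variance and weight.
// Assumes assignments[i] < clusters for every point.
std::vector<Component1D> AssignmentsToComponents(
    const std::vector<double>& data,
    const std::vector<std::size_t>& assignments,
    const std::size_t clusters)
{
  std::vector<Component1D> comps(clusters);
  std::vector<std::size_t> counts(clusters, 0);

  // Accumulate sums, then divide to get means.
  for (std::size_t i = 0; i < data.size(); ++i)
  {
    comps[assignments[i]].mean += data[i];
    ++counts[assignments[i]];
  }
  for (std::size_t c = 0; c < clusters; ++c)
    if (counts[c] > 0)
      comps[c].mean /= counts[c];

  // Accumulate squared deviations for variances.
  for (std::size_t i = 0; i < data.size(); ++i)
  {
    const double diff = data[i] - comps[assignments[i]].mean;
    comps[assignments[i]].variance += diff * diff;
  }
  for (std::size_t c = 0; c < clusters; ++c)
  {
    // Unbiased estimate when a cluster has more than one point.
    if (counts[c] > 1)
      comps[c].variance /= (counts[c] - 1);
    comps[c].weight = data.empty() ? 0.0
        : static_cast<double>(counts[c]) / data.size();
  }
  return comps;
}
```

The weight of each component is simply the fraction of points assigned to it, which gives EM a starting mixture that already sums to one.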
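double mlpack::gmm::EMFit< InitialClusteringType, CovarianceConstraintPolicy >::LogLikelihood | ( | const arma::mat & | data, |
const std::vector< distribution::GaussianDistribution > & | dists, | ||
const arma::vec & | weights | ||
) | const |
private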
Calculate the log-likelihood of a model.
This duplicates the log-likelihood computation in the GMM code. Note that the log-likelihood may not be the best criterion for deciding whether the EM algorithm has converged.
data | Data matrix. |
dists | Vector of Gaussian distributions (means and covariance matrices). |
weights | Vector of a priori weights. |
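The quantity in question is the sum, over all observations, of the log of the weighted component densities. A minimal one-dimensional sketch follows; MixtureLogLikelihood and its vector-based parameters are hypothetical stand-ins for the multivariate version, and the direct summation shown here makes no attempt at numerical stability for very small densities.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Log-likelihood of 1-D data under a mixture with the given means,
// variances and mixing weights (one entry per component in each vector).
double MixtureLogLikelihood(const std::vector<double>& data,
                            const std::vector<double>& means,
                            const std::vector<double>& variances,
                            const std::vector<double>& weights)
{
  const double twoPi = 2.0 * std::acos(-1.0);
  double logLikelihood = 0.0;
  for (const double x : data)
  {
    // Weighted sum of the component densities at x.
    double density = 0.0;
    for (std::size_t k = 0; k < means.size(); ++k)
    {
      const double diff = x - means[k];
      density += weights[k] * std::exp(-0.5 * diff * diff / variances[k])
          / std::sqrt(twoPi * variances[k]);
    }
    logLikelihood += std::log(density);
  }
  return logLikelihood;
}
```

Comparing this value between successive iterations against the tolerance is what the convergence test relies on.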
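size_t mlpack::gmm::EMFit< InitialClusteringType, CovarianceConstraintPolicy >::MaxIterations | ( | ) | const |
inline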
Get the maximum number of iterations of the EM algorithm.
Definition at line 122 of file em_fit.hpp.
References mlpack::gmm::EMFit< InitialClusteringType, CovarianceConstraintPolicy >::maxIterations.
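size_t & mlpack::gmm::EMFit< InitialClusteringType, CovarianceConstraintPolicy >::MaxIterations | ( | ) |
inline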
Modify the maximum number of iterations of the EM algorithm.
Definition at line 124 of file em_fit.hpp.
References mlpack::gmm::EMFit< InitialClusteringType, CovarianceConstraintPolicy >::maxIterations.
void mlpack::gmm::EMFit< InitialClusteringType, CovarianceConstraintPolicy >::Serialize | ( | Archive & | ar, |
const unsigned int | version | ||
) |
Serialize the fitter.
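double mlpack::gmm::EMFit< InitialClusteringType, CovarianceConstraintPolicy >::Tolerance | ( | ) | const |
inline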
Get the tolerance for the convergence of the EM algorithm.
Definition at line 127 of file em_fit.hpp.
References mlpack::gmm::EMFit< InitialClusteringType, CovarianceConstraintPolicy >::tolerance.
double & mlpack::gmm::EMFit< InitialClusteringType, CovarianceConstraintPolicy >::Tolerance | ( | ) |
inline
Modify the tolerance for the convergence of the EM algorithm.
Definition at line 129 of file em_fit.hpp.
References mlpack::gmm::EMFit< InitialClusteringType, CovarianceConstraintPolicy >::tolerance.
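InitialClusteringType mlpack::gmm::EMFit< InitialClusteringType, CovarianceConstraintPolicy >::clusterer
private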
Object which will perform the clustering.
Definition at line 170 of file em_fit.hpp.
Referenced by mlpack::gmm::EMFit< InitialClusteringType, CovarianceConstraintPolicy >::Clusterer().
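CovarianceConstraintPolicy mlpack::gmm::EMFit< InitialClusteringType, CovarianceConstraintPolicy >::constraint
private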
Object which applies constraints to the covariance matrix.
Definition at line 172 of file em_fit.hpp.
Referenced by mlpack::gmm::EMFit< InitialClusteringType, CovarianceConstraintPolicy >::Constraint().
size_t mlpack::gmm::EMFit< InitialClusteringType, CovarianceConstraintPolicy >::maxIterations
private
Maximum iterations of EM algorithm.
Definition at line 166 of file em_fit.hpp.
Referenced by mlpack::gmm::EMFit< InitialClusteringType, CovarianceConstraintPolicy >::MaxIterations().
double mlpack::gmm::EMFit< InitialClusteringType, CovarianceConstraintPolicy >::tolerance
private
Tolerance for convergence of EM.
Definition at line 168 of file em_fit.hpp.
Referenced by mlpack::gmm::EMFit< InitialClusteringType, CovarianceConstraintPolicy >::Tolerance().