mlpack  master
mlpack::data Namespace Reference

Functions to load and save matrices and models.

Classes

class  CustomImputation
 A simple custom imputation class.

class  DatasetMapper
 Auxiliary information for a dataset, including mappings to/from strings and the datatype of each dimension.

struct  FirstArrayShim
 A first shim for arrays.

struct  FirstNormalArrayShim
 A first shim for arrays without a Serialize() method.

struct  FirstShim
 The first shim: simply holds the object and its name.

struct  HasSerialize

struct  HasSerializeFunction

class  Imputer
 Given a dataset of a particular datatype, replace user-specified missing value with a variable dependent on the StrategyType and MapperType.

class  IncrementPolicy
 IncrementPolicy is used as a helper class for DatasetMapper.

class  ListwiseDeletion
 A complete-case analysis to remove the values containing mappedValue.

class  MeanImputation
 A simple mean imputation class.

class  MedianImputation
 This is a class implementation of simple median imputation.

class  MissingPolicy
 MissingPolicy is used as a helper class for DatasetMapper.

struct  PointerShim
 A shim for pointers.

struct  SecondArrayShim
 A shim for objects in an array; this is basically like the SecondShim, but for arrays that hold objects that have Serialize() methods instead of serialize() methods.

struct  SecondNormalArrayShim
 A shim for objects in an array which do not have a Serialize() function.

struct  SecondShim
 The second shim: wrap the call to Serialize() inside of a serialize() function, so that an archive type can call serialize() on a SecondShim object and this gets forwarded correctly to our object's Serialize() function.
 

Typedefs

using DatasetInfo = DatasetMapper< data::IncrementPolicy >
 

Enumerations

enum  Datatype : bool {
  numeric = 0,
  categorical = 1
}
 The Datatype enum specifies the types of data mlpack algorithms can use.
 
enum  format {
  autodetect,
  text,
  xml,
  binary
}
 Define the formats we can read through boost::serialization.
 

Functions

template<typename T >
void Binarize (const arma::Mat< T > &input, arma::Mat< T > &output, const double threshold)
 Given an input dataset and threshold, set values greater than threshold to 1 and values less than or equal to the threshold to 0.

template<typename T >
void Binarize (const arma::Mat< T > &input, arma::Mat< T > &output, const double threshold, const size_t dimension)
 Given an input dataset and threshold, set values greater than threshold to 1 and values less than or equal to the threshold to 0.

template<typename T >
FirstArrayShim< T > CreateArrayNVP (T *t, const size_t len, const std::string &name, typename std::enable_if_t< HasSerialize< T >::value > *=0)
 Call this function to produce a name-value pair for an array; this is similar to boost::serialization::make_array(), but provides a nicer wrapper, allows types that have a Serialize() function, and allows you to give a name to your array.

template<typename T >
FirstNormalArrayShim< T > CreateArrayNVP (T *t, const size_t len, const std::string &name, typename std::enable_if_t<!HasSerialize< T >::value > *=0)
 Call this function to produce a name-value pair for an array; this is similar to boost::serialization::make_array(), but provides a nicer wrapper, allows types that have a Serialize() function, and allows you to give a name to your array.

template<typename T >
FirstShim< T > CreateNVP (T &t, const std::string &name, typename std::enable_if_t< HasSerialize< T >::value > *=0)
 Call this function to produce a name-value pair; this is similar to BOOST_SERIALIZATION_NVP(), but should be used for types that have a Serialize() function (or contain a type that has a Serialize() function) instead of a serialize() function.

template<typename T >
const boost::serialization::nvp< T > CreateNVP (T &t, const std::string &name, typename std::enable_if_t<!HasSerialize< T >::value > *=0, typename std::enable_if_t<!std::is_pointer< T >::value > *=0)
 Call this function to produce a name-value pair; this is similar to BOOST_SERIALIZATION_NVP(), but should be used for types that have a Serialize() function (or contain a type that has a Serialize() function) instead of a serialize() function.

template<typename T >
const boost::serialization::nvp< PointerShim< T > * > CreateNVP (T *&t, const std::string &name, typename std::enable_if_t< HasSerialize< T >::value > *=0)
 Call this function to produce a name-value pair; this is similar to BOOST_SERIALIZATION_NVP(), but should be used for types that have a Serialize() function (or contain a type that has a Serialize() function) instead of a serialize() function.

template<typename T >
const boost::serialization::nvp< T * > CreateNVP (T *&t, const std::string &name, typename std::enable_if_t<!HasSerialize< T >::value > *=0)
 Call this function to produce a name-value pair; this is similar to BOOST_SERIALIZATION_NVP(), but should be used for types that have a Serialize() function (or contain a type that has a Serialize() function) instead of a serialize() function.
 
std::string Extension (const std::string &filename)
 
 HAS_MEM_FUNC (Serialize, HasSerializeCheck)
 
template<typename eT >
bool Load (const std::string &filename, arma::Mat< eT > &matrix, const bool fatal=false, const bool transpose=true)
 Loads a matrix from file, guessing the filetype from the extension.

template<typename eT , typename PolicyType >
bool Load (const std::string &filename, arma::Mat< eT > &matrix, DatasetMapper< PolicyType > &info, const bool fatal=false, const bool transpose=true)
 Loads a matrix from a file, guessing the filetype from the extension and mapping categorical features with a DatasetMapper object.

template<typename T >
bool Load (const std::string &filename, const std::string &name, T &t, const bool fatal=false, format f=format::autodetect)
 Load a model from a file, guessing the filetype from the extension, or, optionally, loading the specified format.

template<typename eT >
void LoadARFF (const std::string &filename, arma::Mat< eT > &matrix)
 A utility function to load an ARFF dataset as numeric features (that is, as an Armadillo matrix without any modification).

template<typename eT , typename PolicyType >
void LoadARFF (const std::string &filename, arma::Mat< eT > &matrix, DatasetMapper< PolicyType > &info)
 A utility function to load an ARFF dataset as numeric and categorical features, using the DatasetInfo structure for mapping.

template<typename eT , typename RowType >
void NormalizeLabels (const RowType &labelsIn, arma::Row< size_t > &labels, arma::Col< eT > &mapping)
 Given a set of labels of a particular datatype, convert them to unsigned labels in the range [0, n) where n is the number of different labels.
 
template<typename Archive , typename T >
Archive & operator& (Archive &ar, FirstShim< T > t)
 Catch when we call operator& with a FirstShim object.

template<typename Archive , typename T >
Archive & operator& (Archive &ar, FirstArrayShim< T > t)
 Catch when we call operator& with a FirstArrayShim object.

template<typename Archive , typename T >
Archive & operator& (Archive &ar, FirstNormalArrayShim< T > t)
 Catch when we call operator& with a FirstNormalArrayShim object.

template<typename Archive , typename T >
Archive & operator<< (Archive &ar, FirstShim< T > t)
 Catch when we call operator<< with a FirstShim object.

template<typename Archive , typename T >
Archive & operator<< (Archive &ar, FirstArrayShim< T > t)
 Catch when we call operator<< with a FirstArrayShim object.

template<typename Archive , typename T >
Archive & operator<< (Archive &ar, FirstNormalArrayShim< T > t)
 Catch when we call operator<< with a FirstNormalArrayShim object.

template<typename Archive , typename T >
Archive & operator>> (Archive &ar, FirstShim< T > t)
 Catch when we call operator>> with a FirstShim object.

template<typename Archive , typename T >
Archive & operator>> (Archive &ar, FirstArrayShim< T > t)
 Catch when we call operator>> with a FirstArrayShim object.

template<typename Archive , typename T >
Archive & operator>> (Archive &ar, FirstNormalArrayShim< T > t)
 Catch when we call operator>> with a FirstNormalArrayShim object.
 
template<typename eT >
void RevertLabels (const arma::Row< size_t > &labels, const arma::Col< eT > &mapping, arma::Row< eT > &labelsOut)
 Given a set of labels that have been mapped to the range [0, n), map them back to the original labels given by the 'mapping' vector.

template<typename eT >
bool Save (const std::string &filename, const arma::Mat< eT > &matrix, const bool fatal=false, bool transpose=true)
 Saves a matrix to file, guessing the filetype from the extension.

template<typename T >
bool Save (const std::string &filename, const std::string &name, T &t, const bool fatal=false, format f=format::autodetect)
 Saves a model to file, guessing the filetype from the extension, or, optionally, saving the specified format.

template<typename T , typename U >
void Split (const arma::Mat< T > &input, const arma::Row< U > &inputLabel, arma::Mat< T > &trainData, arma::Mat< T > &testData, arma::Row< U > &trainLabel, arma::Row< U > &testLabel, const double testRatio)
 Given an input dataset and labels, split into a training set and test set.

template<typename T >
void Split (const arma::Mat< T > &input, arma::Mat< T > &trainData, arma::Mat< T > &testData, const double testRatio)
 Given an input dataset, split into a training set and test set.

template<typename T , typename U >
std::tuple< arma::Mat< T >, arma::Mat< T >, arma::Row< U >, arma::Row< U > > Split (const arma::Mat< T > &input, const arma::Row< U > &inputLabel, const double testRatio)
 Given an input dataset and labels, split into a training set and test set.

template<typename T >
std::tuple< arma::Mat< T >, arma::Mat< T > > Split (const arma::Mat< T > &input, const double testRatio)
 Given an input dataset, split into a training set and test set.
 

Detailed Description

Functions to load and save matrices and models.

Typedef Documentation

using mlpack::data::DatasetInfo = DatasetMapper< data::IncrementPolicy >

Definition at line 162 of file dataset_mapper.hpp.

Enumeration Type Documentation

enum mlpack::data::Datatype : bool

The Datatype enum specifies the types of data mlpack algorithms can use.

The vast majority of mlpack algorithms can only use numeric data (i.e. float/double/etc.), but some algorithms can use categorical data, specified via this Datatype enum and the DatasetMapper class.

Enumerator
numeric 
categorical 

Definition at line 24 of file datatype.hpp.

enum mlpack::data::format

Define the formats we can read through boost::serialization.

Enumerator
autodetect 
text 
xml 
binary 

Definition at line 20 of file format.hpp.

Function Documentation

template<typename T >
void mlpack::data::Binarize ( const arma::Mat< T > &  input,
arma::Mat< T > &  output,
const double  threshold 
)

Given an input dataset and threshold, set values greater than threshold to 1 and values less than or equal to the threshold to 0.

This overload applies the changes to all dimensions.

arma::Mat<double> input = loadData();
arma::Mat<double> output;
double threshold = 0.5;
// Binarize the whole matrix. All values greater than 0.5 will be set to 1 and
// the values less than or equal to 0.5 will become 0.
Binarize<double>(input, output, threshold);
Parameters
  input      Input matrix to binarize.
  output     Matrix to store the binarized data in.
  threshold  Threshold value; can be any number.

Definition at line 41 of file binarize.hpp.

template<typename T >
void mlpack::data::Binarize ( const arma::Mat< T > &  input,
arma::Mat< T > &  output,
const double  threshold,
const size_t  dimension 
)

Given an input dataset and threshold, set values greater than threshold to 1 and values less than or equal to the threshold to 0.

This overload takes a dimension and applies the changes to the given dimension only.

arma::Mat<double> input = loadData();
arma::Mat<double> output;
double threshold = 0.5;
size_t dimension = 0;
// Binarize the first dimension. All values in the first dimension greater
// than 0.5 will be set to 1 and the values less than or equal to 0.5 will
// become 0.
Binarize<double>(input, output, threshold, dimension);
Parameters
  input      Input matrix to binarize.
  output     Matrix to store the binarized data in.
  threshold  Threshold value; can be any number.
  dimension  Feature (dimension) to apply the Binarize function to.

Definition at line 83 of file binarize.hpp.
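
The thresholding rule shared by both Binarize() overloads can be sketched without Armadillo. The following is a minimal standalone illustration, not mlpack's implementation: `Matrix` and `BinarizeSketch` are hypothetical names, a nested std::vector stands in for arma::Mat, and it is assumed that each row is a dimension and that the single-dimension overload copies the other dimensions unchanged.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for arma::Mat<T>: data[dimension][point].
template<typename T>
using Matrix = std::vector<std::vector<T>>;

// Binarize every dimension: values strictly greater than the threshold
// become 1; everything else (including values equal to it) becomes 0.
template<typename T>
void BinarizeSketch(const Matrix<T>& input, Matrix<T>& output,
                    const double threshold)
{
  output = input;
  for (auto& row : output)
    for (auto& value : row)
      value = (value > threshold) ? T(1) : T(0);
}

// Binarize a single dimension; the other dimensions are copied unchanged
// (an assumption of this sketch).
template<typename T>
void BinarizeSketch(const Matrix<T>& input, Matrix<T>& output,
                    const double threshold, const std::size_t dimension)
{
  output = input;
  for (auto& value : output.at(dimension))
    value = (value > threshold) ? T(1) : T(0);
}
```

Note that a value exactly equal to the threshold maps to 0, which matches the "less than or equal to" wording above.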

template<typename T >
FirstArrayShim<T> mlpack::data::CreateArrayNVP ( T *  t,
const size_t  len,
const std::string &  name,
typename std::enable_if_t< HasSerialize< T >::value > *  = 0 
)
inline

Call this function to produce a name-value pair for an array; this is similar to boost::serialization::make_array(), but provides a nicer wrapper, allows types that have a Serialize() function, and allows you to give a name to your array.

This particular overload is used by classes that have a Serialize() function.

Definition at line 214 of file serialization_shim.hpp.

template<typename T >
FirstNormalArrayShim<T> mlpack::data::CreateArrayNVP ( T *  t,
const size_t  len,
const std::string &  name,
typename std::enable_if_t<!HasSerialize< T >::value > *  = 0 
)
inline

Call this function to produce a name-value pair for an array; this is similar to boost::serialization::make_array(), but provides a nicer wrapper, allows types that have a Serialize() function, and allows you to give a name to your array.

This particular overload is used by classes that do not have a Serialize() function or primitive types.

Definition at line 231 of file serialization_shim.hpp.
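
The two CreateArrayNVP() overloads (and the CreateNVP() overloads below) are selected at compile time by whether the type has a Serialize() member. The sketch below shows that selection mechanism in isolation. All names in it (`HasSerializeSketch`, `DummyArchive`, `DescribeNVP`, `WithSerialize`, `WithoutSerialize`) are hypothetical, and the trait is a simplified substitute for the one HAS_MEM_FUNC generates.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Placeholder archive type used only to probe for Serialize().
struct DummyArchive { };

// Primary template: by default, T is assumed to have no Serialize().
template<typename T, typename = void>
struct HasSerializeSketch : std::false_type { };

// Chosen only when t.Serialize(archive, version) is a valid expression.
template<typename T>
struct HasSerializeSketch<T, decltype(void(std::declval<T&>().Serialize(
    std::declval<DummyArchive&>(), 0u)))> : std::true_type { };

// Overload enabled when T has Serialize(): a shim would be returned here.
template<typename T>
std::string DescribeNVP(T& /* t */, const std::string& name,
    std::enable_if_t<HasSerializeSketch<T>::value>* = nullptr)
{
  return name + ": shim";
}

// Overload enabled otherwise: the object can go to the archive directly.
template<typename T>
std::string DescribeNVP(T& /* t */, const std::string& name,
    std::enable_if_t<!HasSerializeSketch<T>::value>* = nullptr)
{
  return name + ": plain";
}

// Hypothetical user types for illustration.
struct WithSerialize
{
  template<typename Archive>
  void Serialize(Archive& /* ar */, const unsigned int /* version */) { }
};

struct WithoutSerialize { };
```

Because exactly one of the two conditions holds for any T, a call such as `DescribeNVP(obj, "name")` is never ambiguous, and the caller never spells out the trailing defaulted parameter.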

template<typename T >
FirstShim<T> mlpack::data::CreateNVP ( T &  t,
const std::string &  name,
typename std::enable_if_t< HasSerialize< T >::value > *  = 0 
)
inline

Call this function to produce a name-value pair; this is similar to BOOST_SERIALIZATION_NVP(), but should be used for types that have a Serialize() function (or contain a type that has a Serialize() function) instead of a serialize() function.

The template type should be automatically deduced, and the two std::enable_if_t<> parameters are automatically deduced too. So usage looks like

MyType t;
CreateNVP(t, "my_name_for_t");

Note that the second parameter, 'name', must be a valid XML identifier.

This function does not return a boost::serialization::nvp<T> object, but instead a shim type (FirstShim<T>).

This particular overload is used by classes that have a Serialize() function.

Parameters
  t     Object to create NVP (name-value pair) with.
  name  Name of object (must be a valid XML identifier).

Definition at line 94 of file serialization_shim.hpp.

Referenced by mlpack::tree::BinaryNumericSplitInfo< ObservationType >::Serialize(), mlpack::tree::NumericSplitInfo< ObservationType >::Serialize(), mlpack::range::RangeSearchStat::Serialize(), mlpack::amf::GivenInitialization::Serialize(), mlpack::distribution::RegressionDistribution::Serialize(), mlpack::neighbor::RAQueryStat< SortPolicy >::Serialize(), mlpack::kernel::PolynomialKernel::Serialize(), mlpack::kernel::HyperbolicTangentKernel::Serialize(), mlpack::tree::AxisParallelProjVector::Serialize(), mlpack::gmm::EigenvalueRatioConstraint::Serialize(), mlpack::adaboost::AdaBoostModel::Serialize(), mlpack::kernel::TriangularKernel::Serialize(), mlpack::kmeans::RefinedStart::Serialize(), mlpack::neighbor::NeighborSearchStat< neighbor::NearestNeighborSort >::Serialize(), mlpack::kernel::LaplacianKernel::Serialize(), mlpack::fastmks::FastMKSStat::Serialize(), mlpack::kernel::SphericalKernel::Serialize(), mlpack::tree::HoeffdingCategoricalSplit< FitnessFunction >::Serialize(), mlpack::regression::LinearRegression::Serialize(), mlpack::data::DatasetMapper< PolicyType >::Serialize(), mlpack::tree::HyperplaneBase< BoundT, ProjVectorT >::Serialize(), mlpack::hmm::HMMModel::Serialize(), mlpack::tree::ProjVector::Serialize(), mlpack::distribution::LaplaceDistribution::Serialize(), mlpack::distribution::GaussianDistribution::Serialize(), mlpack::kernel::GaussianKernel::Serialize(), mlpack::tree::HoeffdingTreeModel::Serialize(), mlpack::amf::SVDBatchLearning::Serialize(), mlpack::regression::SoftmaxRegression< OptimizerType >::Serialize(), mlpack::tree::XTreeAuxiliaryInformation< TreeType >::SplitHistoryStruct::Serialize(), mlpack::distribution::DiscreteDistribution::Serialize(), mlpack::tree::XTreeAuxiliaryInformation< TreeType >::Serialize(), mlpack::data::SecondArrayShim< T >::serialize(), mlpack::SerializeObject(), and mlpack::SerializePointerObject().

template<typename T >
const boost::serialization::nvp<T> mlpack::data::CreateNVP ( T &  t,
const std::string &  name,
typename std::enable_if_t<!HasSerialize< T >::value > *  = 0,
typename std::enable_if_t<!std::is_pointer< T >::value > *  = 0 
)
inline

Call this function to produce a name-value pair; this is similar to BOOST_SERIALIZATION_NVP(), but should be used for types that have a Serialize() function (or contain a type that has a Serialize() function) instead of a serialize() function.

The template type should be automatically deduced, and the two std::enable_if_t<> parameters are automatically deduced too. So usage looks like

MyType t;
CreateNVP(t, "my_name_for_t");

Note that the second parameter, 'name', must be a valid XML identifier.

This particular overload is used by classes that do not have a Serialize() function (so, no shim is necessary) or primitive types that aren't pointers.

Parameters
  t     Object to create NVP (name-value pair) with.
  name  Name of object (must be a valid XML identifier).

Definition at line 128 of file serialization_shim.hpp.

template<typename T >
const boost::serialization::nvp<PointerShim<T>*> mlpack::data::CreateNVP ( T *&  t,
const std::string &  name,
typename std::enable_if_t< HasSerialize< T >::value > *  = 0 
)
inline

Call this function to produce a name-value pair; this is similar to BOOST_SERIALIZATION_NVP(), but should be used for types that have a Serialize() function (or contain a type that has a Serialize() function) instead of a serialize() function.

The template type should be automatically deduced, and the two std::enable_if_t<> parameters are automatically deduced too. So usage looks like

MyType t;
CreateNVP(t, "my_name_for_t");

Note that the second parameter, 'name', must be a valid XML identifier.

This particular overload is used by pointers to classes that have a Serialize() function.

Parameters
  t     Object to create NVP (name-value pair) with.
  name  Name of object (must be a valid XML identifier).

Definition at line 163 of file serialization_shim.hpp.

template<typename T >
const boost::serialization::nvp<T*> mlpack::data::CreateNVP ( T *&  t,
const std::string &  name,
typename std::enable_if_t<!HasSerialize< T >::value > *  = 0 
)
inline

Call this function to produce a name-value pair; this is similar to BOOST_SERIALIZATION_NVP(), but should be used for types that have a Serialize() function (or contain a type that has a Serialize() function) instead of a serialize() function.

The template type should be automatically deduced, and the two std::enable_if_t<> parameters are automatically deduced too. So usage looks like

MyType t;
CreateNVP(t, "my_name_for_t");

Note that the second parameter, 'name', must be a valid XML identifier.

This particular overload is used by pointers to classes that do not have a Serialize() function, or pointers to non-classes.

Parameters
  t     Object to create NVP (name-value pair) with.
  name  Name of object (must be a valid XML identifier).

Definition at line 198 of file serialization_shim.hpp.

std::string mlpack::data::Extension ( const std::string &  filename )
inline

Definition at line 21 of file extension.hpp.

mlpack::data::HAS_MEM_FUNC ( Serialize  ,
HasSerializeCheck   
)

template<typename eT >
bool mlpack::data::Load ( const std::string &  filename,
arma::Mat< eT > &  matrix,
const bool  fatal = false,
const bool  transpose = true 
)

Loads a matrix from file, guessing the filetype from the extension.

This will transpose the matrix at load time (unless the transpose parameter is set to false). If the filetype cannot be determined, an error will be given.

The supported types of files are the same as found in Armadillo:

  • CSV (csv_ascii), denoted by .csv, or optionally .txt
  • TSV (raw_ascii), denoted by .tsv, .csv, or .txt
  • ASCII (raw_ascii), denoted by .txt
  • Armadillo ASCII (arma_ascii), also denoted by .txt
  • PGM (pgm_binary), denoted by .pgm
  • PPM (ppm_binary), denoted by .ppm
  • Raw binary (raw_binary), denoted by .bin
  • Armadillo binary (arma_binary), denoted by .bin
  • HDF5, denoted by .hdf, .hdf5, .h5, or .he5

If the file extension is not one of those types, an error will be given. This is preferable to Armadillo's default behavior of loading an unknown filetype as raw_binary, which can have very confusing effects.

If the parameter 'fatal' is set to true, a std::runtime_error exception will be thrown if the matrix does not load successfully. The parameter 'transpose' controls whether or not the matrix is transposed after loading. In most cases, because data is generally stored in a row-major format and mlpack requires column-major matrices, this should be left at its default value of 'true'.
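
To make the effect of the 'transpose' parameter concrete, here is a minimal standalone sketch that does not use mlpack or Armadillo. `ParseCsvRows` and `TransposeSketch` are hypothetical helpers: the first reads CSV text in which each line is one data point, and the second flips the result so that each column holds one point, which is the layout described above.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Parse CSV text in which each line is one data point (row-major on disk).
std::vector<std::vector<double>> ParseCsvRows(const std::string& text)
{
  std::vector<std::vector<double>> rows;
  std::istringstream lines(text);
  std::string line;
  while (std::getline(lines, line))
  {
    if (line.empty())
      continue;
    std::vector<double> row;
    std::istringstream cells(line);
    std::string cell;
    while (std::getline(cells, cell, ','))
      row.push_back(std::stod(cell));
    rows.push_back(row);
  }
  return rows;
}

// Transpose so that result[d][p] is dimension d of point p, i.e. each
// column is one point. Throws std::out_of_range if a row is too short.
std::vector<std::vector<double>> TransposeSketch(
    const std::vector<std::vector<double>>& rows)
{
  if (rows.empty())
    return {};
  std::vector<std::vector<double>> out(rows[0].size(),
      std::vector<double>(rows.size()));
  for (std::size_t p = 0; p < rows.size(); ++p)
    for (std::size_t d = 0; d < out.size(); ++d)
      out[d][p] = rows[p].at(d);
  return out;
}
```

A file with 2 lines of 3 values each therefore becomes a 3 x 2 matrix: 3 dimensions, 2 points.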

Parameters
  filename   Name of file to load.
  matrix     Matrix to load contents of file into.
  fatal      If an error should be reported as fatal (default false).
  transpose  If true, transpose the matrix after loading.

Returns
  Boolean value indicating success or failure of load.

template<typename eT , typename PolicyType >
bool mlpack::data::Load ( const std::string &  filename,
arma::Mat< eT > &  matrix,
DatasetMapper< PolicyType > &  info,
const bool  fatal = false,
const bool  transpose = true 
)

Loads a matrix from a file, guessing the filetype from the extension and mapping categorical features with a DatasetMapper object.

This will transpose the matrix (unless the transpose parameter is set to false). This particular overload of Load() can only load text-based formats, such as those given below:

  • CSV (csv_ascii), denoted by .csv, or optionally .txt
  • TSV (raw_ascii), denoted by .tsv, .csv, or .txt
  • ASCII (raw_ascii), denoted by .txt

If the file extension is not one of those types, an error will be given. This is preferable to Armadillo's default behavior of loading an unknown filetype as raw_binary, which can have very confusing effects.

If the parameter 'fatal' is set to true, a std::runtime_error exception will be thrown if the matrix does not load successfully. The parameter 'transpose' controls whether or not the matrix is transposed after loading. In most cases, because data is generally stored in a row-major format and mlpack requires column-major matrices, this should be left at its default value of 'true'.

The DatasetMapper object passed to this function will be re-created, so any mappings from previous loads will be lost.

Parameters
  filename   Name of file to load.
  matrix     Matrix to load contents of file into.
  info       DatasetMapper object to populate with mappings and data types.
  fatal      If an error should be reported as fatal (default false).
  transpose  If true, transpose the matrix after loading.

Returns
  Boolean value indicating success or failure of load.

template<typename T >
bool mlpack::data::Load ( const std::string &  filename,
const std::string &  name,
T &  t,
const bool  fatal = false,
format  f = format::autodetect 
)

Load a model from a file, guessing the filetype from the extension, or, optionally, loading the specified format.

If automatic extension detection is used and the filetype cannot be determined, an error will be given.

The supported types of files are the same as what is supported by the boost::serialization library:

  • text, denoted by .txt
  • xml, denoted by .xml
  • binary, denoted by .bin

The format parameter can take any of the values in the 'format' enum: 'format::autodetect', 'format::text', 'format::xml', and 'format::binary'. The autodetect functionality operates on the file extension (so, "file.txt" would be autodetected as text).

The name parameter should be specified to indicate the name of the structure to be loaded. This should be the same as the name that was used to save the structure (otherwise, the loading procedure will fail).

If the parameter 'fatal' is set to true, then an exception will be thrown in the event of load failure. Otherwise, the method will return false and the relevant error information will be printed to Log::Warn.
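
The autodetection rule described above (an explicit format wins, otherwise the extension decides) can be sketched as follows. This is an illustration only: `FormatSketch`, `ExtensionSketch`, and `ResolveFormat` are hypothetical names, the enum is a local copy of the one documented above, and the sketch always throws on an unknown extension, whereas Load() only throws when 'fatal' is true.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Local copy of the format enum, so the sketch is self-contained.
enum class FormatSketch { autodetect, text, xml, binary };

// Return the text after the last '.', or "" if there is none.
// (Hypothetical helper; the behavior of data::Extension() is assumed.)
std::string ExtensionSketch(const std::string& filename)
{
  const std::size_t dot = filename.rfind('.');
  if (dot == std::string::npos)
    return "";
  return filename.substr(dot + 1);
}

// Resolve autodetect to a concrete format; explicit formats pass through.
FormatSketch ResolveFormat(const std::string& filename, const FormatSketch f)
{
  if (f != FormatSketch::autodetect)
    return f;

  const std::string ext = ExtensionSketch(filename);
  if (ext == "txt")
    return FormatSketch::text;
  if (ext == "xml")
    return FormatSketch::xml;
  if (ext == "bin")
    return FormatSketch::binary;

  throw std::runtime_error("cannot detect format of '" + filename + "'");
}
```

Passing an explicit format is the way to load a model whose filename does not carry one of the three recognized extensions.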

template<typename eT >
void mlpack::data::LoadARFF ( const std::string &  filename,
arma::Mat< eT > &  matrix 
)

A utility function to load an ARFF dataset as numeric features (that is, as an Armadillo matrix without any modification).

An exception will be thrown if any features are non-numeric.

template<typename eT , typename PolicyType >
void mlpack::data::LoadARFF ( const std::string &  filename,
arma::Mat< eT > &  matrix,
DatasetMapper< PolicyType > &  info 
)

A utility function to load an ARFF dataset as numeric and categorical features, using the DatasetInfo structure for mapping.

An exception will be thrown upon failure.

A pre-existing DatasetInfo object can be passed in, but if the dimensionality of the given DatasetInfo object (info.Dimensionality()) does not match the dimensionality of the data, a std::invalid_argument exception will be thrown. If an empty DatasetInfo object is given (constructed with the default constructor or otherwise, so that info.Dimensionality() is 0), it will be set to the right dimensionality.

This ability to pass in pre-existing DatasetInfo objects is very necessary when, e.g., loading a test set after training. If the same DatasetInfo from loading the training set is not used, then the test set may be loaded with different mappings—which can cause horrible problems!
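
The mapping mismatch described above is easy to reproduce with a toy mapper. The sketch below is a much-simplified, hypothetical stand-in for DatasetMapper with an incrementing policy (`MapperSketch` and `MapColumn` are not mlpack names): each new string seen in a dimension is assigned the next unused integer.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified mapper: strings in each dimension are numbered
// 0, 1, 2, ... in the order in which they are first seen.
class MapperSketch
{
 public:
  std::size_t MapString(const std::string& value, const std::size_t dimension)
  {
    std::map<std::string, std::size_t>& table = maps[dimension];
    const auto it = table.find(value);
    if (it != table.end())
      return it->second;

    const std::size_t id = table.size();
    table[value] = id;
    return id;
  }

 private:
  // One string-to-integer table per dimension.
  std::map<std::size_t, std::map<std::string, std::size_t>> maps;
};

// Map one categorical column (stored as dimension 0) of a dataset.
std::vector<std::size_t> MapColumn(MapperSketch& info,
                                   const std::vector<std::string>& values)
{
  std::vector<std::size_t> out;
  for (const std::string& v : values)
    out.push_back(info.MapString(v, 0));
  return out;
}
```

If a training column {"red", "green", "red", "blue"} and a test column {"blue", "red"} are mapped with the same MapperSketch, "blue" is 2 in both sets. Mapping the test column with a fresh MapperSketch instead gives "blue" the value 0, which meant "red" during training.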

Parameters
  filename  Name of ARFF file to load.
  matrix    Matrix to load data into.
  info      DatasetInfo object; can be default-constructed or pre-existing from another call to LoadARFF().

template<typename eT , typename RowType >
void mlpack::data::NormalizeLabels ( const RowType &  labelsIn,
arma::Row< size_t > &  labels,
arma::Col< eT > &  mapping 
)

Given a set of labels of a particular datatype, convert them to unsigned labels in the range [0, n) where n is the number of different labels.

Also, a reverse mapping from the new label to the old value is stored in the 'mapping' vector.
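
The relationship between the normalized labels and the 'mapping' vector can be sketched with standard containers. `NormalizeLabelsSketch` and `RevertLabelsSketch` are hypothetical names, std::vector replaces the Armadillo types, and numbering labels in order of first appearance is an assumption of this sketch rather than a documented guarantee.

```cpp
#include <cstddef>
#include <vector>

// Map arbitrary labels to [0, n). mapping[i] holds the original label that
// was assigned the new label i.
template<typename eT>
void NormalizeLabelsSketch(const std::vector<eT>& labelsIn,
                           std::vector<std::size_t>& labels,
                           std::vector<eT>& mapping)
{
  labels.clear();
  mapping.clear();
  for (const eT& label : labelsIn)
  {
    // Linear search for a label that has already been seen.
    std::size_t index = 0;
    while (index < mapping.size() && !(mapping[index] == label))
      ++index;

    if (index == mapping.size())
      mapping.push_back(label);  // First time this label appears.
    labels.push_back(index);
  }
}

// Undo the normalization by looking each label up in the mapping.
template<typename eT>
void RevertLabelsSketch(const std::vector<std::size_t>& labels,
                        const std::vector<eT>& mapping,
                        std::vector<eT>& labelsOut)
{
  labelsOut.clear();
  for (const std::size_t label : labels)
    labelsOut.push_back(mapping.at(label));
}
```

Normalizing and then reverting with the same mapping returns the original labels, which is the round trip that NormalizeLabels() and RevertLabels() are meant to support.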

Parameters
  labelsIn  Input labels of arbitrary datatype.
  labels    Vector that unsigned labels will be stored in.
  mapping   Reverse mapping to convert new labels back to old labels.

template<typename Archive , typename T >
Archive& mlpack::data::operator& ( Archive &  ar,
FirstShim< T >  t 
)

Catch when we call operator& with a FirstShim object.

In this case, we make the second-level shim and use it. Note that this second-level shim can be used as an lvalue, which is what's necessary for this whole thing to work. The first-level shim can't be an lvalue (this is why we need two levels of shims).

Definition at line 385 of file serialization_shim.hpp.

References mlpack::data::FirstShim< T >::name, and mlpack::data::FirstShim< T >::t.

Referenced by mlpack::bound::HRectBound< MetricType >::MinWidth().
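
The reason for two levels of shims can be shown with a toy archive. In the sketch below, all names (`ToyArchive`, `FirstShimSketch`, `SecondShimSketch`, `MakeNVP`, `Point`) are hypothetical and the archive is far simpler than a boost::serialization archive. It assumes only that the archive's own operator& accepts non-const lvalues, so the temporary returned by the NVP function cannot be passed to it directly.

```cpp
#include <string>
#include <vector>

// Toy archive: records the names of the objects it serializes. Its own
// operator& binds only to non-const lvalues.
struct ToyArchive
{
  std::vector<std::string> log;

  template<typename T>
  ToyArchive& operator&(T& object)
  {
    object.serialize(*this);
    return *this;
  }
};

// First-level shim: only holds the object and its name.
template<typename T>
struct FirstShimSketch
{
  T& t;
  std::string name;
};

// Second-level shim: provides serialize() and forwards to T::Serialize().
template<typename T>
struct SecondShimSketch
{
  T& t;
  std::string name;

  template<typename Archive>
  void serialize(Archive& ar)
  {
    ar.log.push_back(name);
    t.Serialize(ar);
  }
};

// Returns a temporary, which cannot bind to ToyArchive::operator&(T&).
template<typename T>
FirstShimSketch<T> MakeNVP(T& t, const std::string& name)
{
  return FirstShimSketch<T>{t, name};
}

// Catch the temporary by value, then hand a named (lvalue) second-level
// shim to the archive.
template<typename T>
ToyArchive& operator&(ToyArchive& ar, FirstShimSketch<T> first)
{
  SecondShimSketch<T> second{first.t, first.name};
  return ar & second;
}

// Hypothetical user type with a capitalized Serialize().
struct Point
{
  bool visited = false;

  template<typename Archive>
  void Serialize(Archive& /* ar */) { visited = true; }
};
```

With these definitions, `ar & MakeNVP(p, "point")` selects the free operator&, which builds the second-level shim and lets the archive call its serialize(); that call in turn reaches Point::Serialize().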

template<typename Archive , typename T >
Archive& mlpack::data::operator& ( Archive &  ar,
FirstArrayShim< T >  t 
)

Catch when we call operator& with a FirstArrayShim object.

In this case, we make the second-level array shim and use it. Note that this second-level shim can be used as an lvalue, which is what's necessary for this whole thing to work. The first-level shim can't be an lvalue (this is why we need two levels of shims).

Definition at line 427 of file serialization_shim.hpp.

References mlpack::data::FirstArrayShim< T >::len, mlpack::data::FirstArrayShim< T >::name, and mlpack::data::FirstArrayShim< T >::t.

template<typename Archive , typename T >
Archive& mlpack::data::operator& ( Archive &  ar,
FirstNormalArrayShim< T >  t 
)

Catch when we call operator& with a FirstNormalArrayShim object.

In this case, we make the second-level array shim and use it. Note that this second-level shim can be used as an lvalue, which is necessary if we want to use make_nvp() safely. The first-level shim can't be an lvalue (this is why we need two levels of shims).

Definition at line 469 of file serialization_shim.hpp.

References mlpack::data::FirstNormalArrayShim< T >::len, mlpack::data::FirstNormalArrayShim< T >::name, and mlpack::data::FirstNormalArrayShim< T >::t.

template<typename Archive , typename T >
Archive& mlpack::data::operator<< ( Archive &  ar,
FirstShim< T >  t 
)

Catch when we call operator<< with a FirstShim object.

In this case, we make the second-level shim and use it. Note that this second-level shim can be used as an lvalue, which is what's necessary for this whole thing to work. The first-level shim can't be an lvalue (this is why we need two levels of shims).

Definition at line 371 of file serialization_shim.hpp.

template<typename Archive , typename T >
Archive& mlpack::data::operator<< ( Archive &  ar,
FirstArrayShim< T >  t 
)

Catch when we call operator<< with a FirstArrayShim object.

In this case, we make the second-level array shim and use it. Note that this second-level shim can be used as an lvalue, which is what's necessary for this whole thing to work. The first-level shim can't be an lvalue (this is why we need two levels of shims).

Definition at line 413 of file serialization_shim.hpp.

template<typename Archive , typename T >
Archive& mlpack::data::operator<< ( Archive &  ar,
FirstNormalArrayShim< T >  t 
)

Catch when we call operator<< with a FirstNormalArrayShim object.

In this case, we make the second-level array shim and use it. Note that this second-level shim can be used as an lvalue, which is necessary if we want to use make_nvp() safely. The first-level shim can't be an lvalue (this is why we need two levels of shims).

Definition at line 455 of file serialization_shim.hpp.

template<typename Archive , typename T >
Archive& mlpack::data::operator>> ( Archive &  ar,
FirstShim< T >  t 
)

Catch when we call operator>> with a FirstShim object.

In this case, we make the second-level shim and use it. Note that this second-level shim can be used as an lvalue, which is what's necessary for this whole thing to work. The first-level shim can't be an lvalue (this is why we need two levels of shims).

Definition at line 399 of file serialization_shim.hpp.

References mlpack::data::FirstShim< T >::name, and mlpack::data::FirstShim< T >::t.

template<typename Archive , typename T >
Archive& mlpack::data::operator>> ( Archive &  ar,
FirstArrayShim< T >  t 
)

Catch when we call operator>> with a FirstArrayShim object.

In this case, we make the second-level array shim and use it. Note that this second-level shim can be used as an lvalue, which is what's necessary for this whole thing to work. The first-level shim can't be an lvalue (this is why we need two levels of shims).

Definition at line 441 of file serialization_shim.hpp.

References mlpack::data::FirstArrayShim< T >::len, mlpack::data::FirstArrayShim< T >::name, and mlpack::data::FirstArrayShim< T >::t.

template<typename Archive , typename T >
Archive& mlpack::data::operator>> ( Archive &  ar,
FirstNormalArrayShim< T >  t 
)

Catch when we call operator>> with a FirstNormalArrayShim object.

In this case, we make the second-level array shim and use it. Note that this second-level shim can be used as an lvalue, which is necessary if we want to use make_nvp() safely. The first-level shim can't be an lvalue (this is why we need two levels of shims).

Definition at line 483 of file serialization_shim.hpp.

References mlpack::data::FirstNormalArrayShim< T >::len, mlpack::data::FirstNormalArrayShim< T >::name, and mlpack::data::FirstNormalArrayShim< T >::t.

template<typename eT >
void mlpack::data::RevertLabels ( const arma::Row< size_t > &  labels,
const arma::Col< eT > &  mapping,
arma::Row< eT > &  labelsOut 
)

Given a set of labels that have been mapped to the range [0, n), map them back to the original labels given by the 'mapping' vector.

Parameters
labels     Set of normalized labels to convert.
mapping    Mapping to use to convert labels.
labelsOut  Vector to store new labels in.
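
The reverse mapping amounts to a table lookup: each normalized label is an index into the mapping vector. The sketch below shows that idea with std::vector in place of the Armadillo types. RevertLabelsSketch is a hypothetical name, and the bounds check is an assumption of this example rather than documented behavior of RevertLabels().

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for RevertLabels() using std::vector:
// labelsOut[i] = mapping[labels[i]].
template<typename eT>
void RevertLabelsSketch(const std::vector<std::size_t>& labels,
                        const std::vector<eT>& mapping,
                        std::vector<eT>& labelsOut)
{
  labelsOut.clear();
  labelsOut.reserve(labels.size());
  for (const std::size_t label : labels)
  {
    // Each normalized label must lie in [0, n), where n = mapping.size().
    // (This check is an assumption of the sketch.)
    if (label >= mapping.size())
      throw std::out_of_range("label has no entry in mapping");
    labelsOut.push_back(mapping[label]);
  }
}
```
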
template<typename eT >
bool mlpack::data::Save ( const std::string filename,
const arma::Mat< eT > &  matrix,
const bool  fatal = false,
bool  transpose = true 
)

Saves a matrix to file, guessing the filetype from the extension.

By default, the matrix is transposed at save time (see the 'transpose' parameter). If the filetype cannot be determined from the extension, the save fails and an error is reported.

The supported types of files are the same as found in Armadillo:

  • CSV (csv_ascii), denoted by .csv, or optionally .txt
  • ASCII (raw_ascii), denoted by .txt
  • Armadillo ASCII (arma_ascii), also denoted by .txt
  • PGM (pgm_binary), denoted by .pgm
  • PPM (ppm_binary), denoted by .ppm
  • Raw binary (raw_binary), denoted by .bin
  • Armadillo binary (arma_binary), denoted by .bin
  • HDF5 (hdf5_binary), denoted by .hdf5, .hdf, .h5, or .he5

If the file extension is not one of those types, an error will be given. If the 'fatal' parameter is set to true, a std::runtime_error exception will be thrown upon failure. If the 'transpose' parameter is set to true, the matrix will be transposed before saving. Generally, because mlpack stores matrices in a column-major format and most datasets are stored on disk as row-major, this parameter should be left at its default value of 'true'.

Parameters
filename   Name of file to save to.
matrix     Matrix to save into file.
fatal      If an error should be reported as fatal (default false).
transpose  If true, transpose the matrix before saving.
Returns
Boolean value indicating success or failure of save.
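
To make the effect of the 'transpose' parameter concrete, the sketch below writes a small column-major matrix as CSV text, with and without transposition. It is a sketch only and does not use mlpack or Armadillo: ToyMatrix and ToCsv() are hypothetical names, and the sketch assumes that each column of the in-memory matrix holds one data point.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical column-major matrix: element (r, c) is data[c * rows + r].
struct ToyMatrix
{
  std::size_t rows;
  std::size_t cols;
  std::vector<double> data;
};

// Write the matrix as CSV text. With transpose == true, each column (one
// data point) becomes one output line; otherwise each row becomes one line.
std::string ToCsv(const ToyMatrix& m, const bool transpose = true)
{
  const std::size_t outer = transpose ? m.cols : m.rows;
  const std::size_t inner = transpose ? m.rows : m.cols;
  std::ostringstream out;
  for (std::size_t i = 0; i < outer; ++i)
  {
    for (std::size_t j = 0; j < inner; ++j)
    {
      const std::size_t r = transpose ? j : i;
      const std::size_t c = transpose ? i : j;
      out << m.data[c * m.rows + r] << (j + 1 < inner ? "," : "\n");
    }
  }
  return out.str();
}
```

For a matrix with 2 rows and 3 columns, the transposed output has 3 lines of 2 values each, which matches the one-point-per-line layout that most on-disk datasets use.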
template<typename T >
bool mlpack::data::Save ( const std::string filename,
const std::string name,
T &  t,
const bool  fatal = false,
format  f = format::autodetect 
)

Saves a model to file, guessing the filetype from the extension, or, optionally, saving in the specified format.

If automatic extension detection is used and the filetype cannot be determined, an error will be given.

The supported types of files are the same as what is supported by the boost::serialization library:

  • text, denoted by .txt
  • xml, denoted by .xml
  • binary, denoted by .bin

The format parameter can take any of the values in the 'format' enum: 'format::autodetect', 'format::text', 'format::xml', and 'format::binary'. The autodetect functionality operates on the file extension (so, "file.txt" would be autodetected as text).

The name parameter should be specified to indicate the name of the structure to be saved. If Load() is later called on the generated file, the name used to load should be the same as the name used for this call to Save().

If the parameter 'fatal' is set to true, then an exception will be thrown in the event of a save failure. Otherwise, the method will return false and the relevant error information will be printed to Log::Warn.
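
The format selection described above can be sketched as a small lookup on the file extension. This is an illustration under stated assumptions, not mlpack's code: ToyFormat and ResolveFormat() are hypothetical names, the 'unknown' enumerator is added here only to signal a failed detection, and the comparison is case-sensitive for simplicity.

```cpp
#include <string>

// Hypothetical mirror of the 'format' enum described above; 'unknown' is an
// addition of this sketch.
enum class ToyFormat { autodetect, text, xml, binary, unknown };

// Resolve the format to use. An explicitly requested format wins; otherwise
// the file extension decides, and an unrecognized extension gives 'unknown'.
ToyFormat ResolveFormat(const std::string& filename,
                        const ToyFormat f = ToyFormat::autodetect)
{
  if (f != ToyFormat::autodetect)
    return f;

  const std::string::size_type dot = filename.rfind('.');
  if (dot == std::string::npos)
    return ToyFormat::unknown;

  const std::string ext = filename.substr(dot + 1);
  if (ext == "txt")
    return ToyFormat::text;
  if (ext == "xml")
    return ToyFormat::xml;
  if (ext == "bin")
    return ToyFormat::binary;
  return ToyFormat::unknown;
}
```

Passing an explicit format skips the extension check entirely, which is why a file with an unrecognized extension can still be saved when the caller names the format.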

template<typename T , typename U >
void mlpack::data::Split ( const arma::Mat< T > &  input,
const arma::Row< U > &  inputLabel,
arma::Mat< T > &  trainData,
arma::Mat< T > &  testData,
arma::Row< U > &  trainLabel,
arma::Row< U > &  testLabel,
const double  testRatio 
)

Given an input dataset and labels, split into a training set and test set.

Example usage below. This overload places the split dataset into the four output parameters given (trainData, testData, trainLabel, and testLabel).

arma::mat input = loadData();
arma::Row<size_t> label = loadLabel();
arma::mat trainData;
arma::mat testData;
arma::Row<size_t> trainLabel;
arma::Row<size_t> testLabel;
math::RandomSeed(100); // Set the seed if you like.
// Split the dataset into a training and test set, with 30% of the data being
// held out for the test set.
Split(input, label, trainData,
      testData, trainLabel, testLabel, 0.3);
Parameters
input       Input dataset to split.
inputLabel  Input labels to split.
trainData   Matrix to store training data into.
testData    Matrix to store test data into.
trainLabel  Vector to store training labels into.
testLabel   Vector to store test labels into.
testRatio   Fraction of the dataset to use for the test set (between 0 and 1).

Definition at line 49 of file split_data.hpp.

Referenced by Split().
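
A shuffled train/test split with paired labels can be sketched with the standard library alone. The code below is a hedged illustration, not the contents of split_data.hpp: SplitSketch and its 'seed' parameter are hypothetical, std::vector stands in for the Armadillo types, and the sketch assumes the test set holds floor(n * testRatio) points.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the labeled Split() overload. Each element of
// 'input' is one data point, paired by index with one entry of 'inputLabel'.
template<typename T, typename U>
void SplitSketch(const std::vector<T>& input,
                 const std::vector<U>& inputLabel,
                 std::vector<T>& trainData,
                 std::vector<T>& testData,
                 std::vector<U>& trainLabel,
                 std::vector<U>& testLabel,
                 const double testRatio,
                 const unsigned int seed = 100)
{
  if (input.size() != inputLabel.size())
    throw std::invalid_argument("input and inputLabel differ in length");

  // Assumption of this sketch: the test set holds floor(n * testRatio) points.
  const std::size_t testSize =
      static_cast<std::size_t>(input.size() * testRatio);
  const std::size_t trainSize = input.size() - testSize;

  // Shuffle indices rather than data so points and labels stay paired.
  std::vector<std::size_t> order(input.size());
  std::iota(order.begin(), order.end(), std::size_t(0));
  std::mt19937 rng(seed);
  std::shuffle(order.begin(), order.end(), rng);

  trainData.clear();
  testData.clear();
  trainLabel.clear();
  testLabel.clear();
  for (std::size_t i = 0; i < order.size(); ++i)
  {
    const bool toTrain = (i < trainSize);
    (toTrain ? trainData : testData).push_back(input[order[i]]);
    (toTrain ? trainLabel : testLabel).push_back(inputLabel[order[i]]);
  }
}
```

Shuffling a vector of indices, instead of the data itself, is what keeps every point next to its own label after the split.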

template<typename T >
void mlpack::data::Split ( const arma::Mat< T > &  input,
arma::Mat< T > &  trainData,
arma::Mat< T > &  testData,
const double  testRatio 
)

Given an input dataset, split into a training set and test set.

Example usage below. This overload places the split dataset into the two output parameters given (trainData, testData).

arma::mat input = loadData();
arma::mat trainData;
arma::mat testData;
math::RandomSeed(100); // Set the seed if you like.
// Split the dataset into a training and test set, with 30% of the data being
// held out for the test set.
Split(input, trainData, testData, 0.3);
Parameters
input      Input dataset to split.
trainData  Matrix to store training data into.
testData   Matrix to store test data into.
testRatio  Fraction of the dataset to use for the test set (between 0 and 1).

Definition at line 103 of file split_data.hpp.

template<typename T , typename U >
std::tuple<arma::Mat<T>, arma::Mat<T>, arma::Row<U>, arma::Row<U> > mlpack::data::Split ( const arma::Mat< T > &  input,
const arma::Row< U > &  inputLabel,
const double  testRatio 
)

Given an input dataset and labels, split into a training set and test set.

Example usage below. This overload returns the split dataset as a std::tuple with four elements: an arma::Mat<T> containing the training data, an arma::Mat<T> containing the test data, an arma::Row<U> containing the training labels, and an arma::Row<U> containing the test labels.

arma::mat input = loadData();
arma::Row<size_t> label = loadLabel();
auto splitResult = Split(input, label, 0.2);
Parameters
input       Input dataset to split.
inputLabel  Input labels to split.
testRatio   Fraction of the dataset to use for the test set (between 0 and 1).
Returns
std::tuple containing trainData (arma::Mat<T>), testData (arma::Mat<T>), trainLabel (arma::Row<U>), and testLabel (arma::Row<U>).

Definition at line 148 of file split_data.hpp.

References Split().

template<typename T >
std::tuple<arma::Mat<T>, arma::Mat<T> > mlpack::data::Split ( const arma::Mat< T > &  input,
const double  testRatio 
)

Given an input dataset, split into a training set and test set.

Example usage below. This overload returns the split dataset as a std::tuple with two elements: an arma::Mat<T> containing the training data and an arma::Mat<T> containing the test data.

arma::mat input = loadData();
auto splitResult = Split(input, 0.2);
Parameters
input      Input dataset to split.
testRatio  Fraction of the dataset to use for the test set (between 0 and 1).
Returns
std::tuple containing trainData (arma::Mat<T>) and testData (arma::Mat<T>).

Definition at line 184 of file split_data.hpp.

References Split().
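
The tuple-returning overloads forward to the output-parameter overloads and package the results. The sketch below shows that wrapping pattern and how a caller can unpack the tuple. It is an illustration only: SplitInto() and SplitTuple() are hypothetical names, std::vector stands in for arma::Mat, and shuffling is omitted to keep the example short.

```cpp
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical output-parameter splitter (no shuffling in this sketch): the
// last floor(n * testRatio) values become the test set.
void SplitInto(const std::vector<double>& input,
               std::vector<double>& train,
               std::vector<double>& test,
               const double testRatio)
{
  const std::size_t testSize =
      static_cast<std::size_t>(input.size() * testRatio);
  const std::size_t trainSize = input.size() - testSize;
  train.assign(input.begin(), input.begin() + trainSize);
  test.assign(input.begin() + trainSize, input.end());
}

// Tuple-returning wrapper that forwards to the overload above.
std::tuple<std::vector<double>, std::vector<double>>
SplitTuple(const std::vector<double>& input, const double testRatio)
{
  std::vector<double> train;
  std::vector<double> test;
  SplitInto(input, train, test, testRatio);
  return std::make_tuple(std::move(train), std::move(test));
}
```

A caller can unpack the result either with std::tie(train, test) = SplitTuple(input, 0.25), which assigns into existing variables, or with std::get<0>(result) and std::get<1>(result) on a stored tuple.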