mlpack  master
mlpack::data::DatasetMapper< PolicyType > Class Template Reference

Auxiliary information for a dataset, including mappings to/from strings and the datatype of each dimension.

Public Member Functions

 DatasetMapper (const size_t dimensionality=0)
 Create the DatasetMapper object with the given dimensionality.
 
 DatasetMapper (PolicyType &policy, const size_t dimensionality=0)
 Create the DatasetMapper object with the given policy and dimensionality.
 
size_t Dimensionality () const
 Get the dimensionality of the DatasetMapper object (that is, how many dimensions it has information for).
 
PolicyType::MappedType MapString (const std::string &string, const size_t dimension)
 Given the string and the dimension to which it belongs, return its numeric mapping.
 
template<typename eT >
void MapTokens (const std::vector< std::string > &tokens, size_t &row, arma::Mat< eT > &matrix)
 MapTokens turns a vector of strings into numeric variables and puts them into a given matrix.
 
size_t NumMappings (const size_t dimension) const
 Get the number of mappings for a particular dimension.
 
const PolicyType & Policy () const
 Return the policy of the mapper.
 
PolicyType & Policy ()
 Modify the policy of the mapper (be careful!).
 
void Policy (PolicyType &&policy)
 Modify (replace) the policy of the mapper with a new policy.
 
template<typename Archive >
void Serialize (Archive &ar, const unsigned int)
 Serialize the dataset information.
 
Datatype Type (const size_t dimension) const
 Return the type of a given dimension (numeric or categorical).
 
Datatype & Type (const size_t dimension)
 Modify the type of a given dimension (be careful!).
 
const std::string & UnmapString (const size_t value, const size_t dimension)
 Return the string that corresponds to a given value in a given dimension.
 
PolicyType::MappedType UnmapValue (const std::string &string, const size_t dimension)
 Return the value that corresponds to a given string in a given dimension.
 

Private Types

using BiMapType = boost::bimap< std::string, typename PolicyType::MappedType >
 
using MapType = std::unordered_map< size_t, std::pair< BiMapType, size_t >>
 

Private Attributes

MapType maps
 maps object stores string and numerical pairs.
 
PolicyType policy
 policy object tells dataset mapper how the categorical values should be
 
std::vector< Datatype > types
 Types of each dimension.
 

Detailed Description

template<typename PolicyType>
class mlpack::data::DatasetMapper< PolicyType >

Auxiliary information for a dataset, including mappings to/from strings and the datatype of each dimension.

DatasetMapper objects are optionally produced by data::Load(), and store the type of each dimension (Datatype::numeric or Datatype::categorical) as well as mappings from strings to unsigned integers and vice versa.
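
To make the description above concrete, here is a minimal standalone sketch of the kind of state such an object holds: one type per dimension, plus per-dimension lookups in both directions. It uses only the standard library and is not mlpack's implementation. The names `ToyDatatype` and `ToyDatasetInfo` are hypothetical, and the choice to start every dimension as numeric is an assumption made for illustration.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for mlpack::data::Datatype.
enum class ToyDatatype { numeric, categorical };

// Hypothetical model of the information a DatasetMapper holds:
// one type per dimension, and per-dimension string <-> integer lookups.
struct ToyDatasetInfo
{
  // Assumption for this sketch: every dimension starts out numeric.
  explicit ToyDatasetInfo(const std::size_t dimensionality) :
      types(dimensionality, ToyDatatype::numeric),
      forward(dimensionality),
      reverse(dimensionality) { }

  // Number of dimensions this object has information for.
  std::size_t Dimensionality() const { return types.size(); }

  std::vector<ToyDatatype> types;
  // forward[d] maps a string to its integer code in dimension d.
  std::vector<std::unordered_map<std::string, std::size_t>> forward;
  // reverse[d][code] is the string for that code in dimension d.
  std::vector<std::vector<std::string>> reverse;
};
```

In this model the dimensionality is fixed by the constructor argument, mirroring the documented restriction that it cannot be changed after construction.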

Template Parameters
PolicyType  Mapping policy used to specify MapString().

Definition at line 36 of file dataset_mapper.hpp.

Member Typedef Documentation

template<typename PolicyType >
using mlpack::data::DatasetMapper< PolicyType >::BiMapType = boost::bimap<std::string, typename PolicyType::MappedType>
private

Definition at line 146 of file dataset_mapper.hpp.

template<typename PolicyType >
using mlpack::data::DatasetMapper< PolicyType >::MapType = std::unordered_map<size_t, std::pair<BiMapType, size_t>>
private

Definition at line 151 of file dataset_mapper.hpp.

Constructor & Destructor Documentation

template<typename PolicyType >
mlpack::data::DatasetMapper< PolicyType >::DatasetMapper ( const size_t  dimensionality = 0)
explicit

Create the DatasetMapper object with the given dimensionality.

Note that the dimensionality cannot be changed later; you will have to create a new DatasetMapper object.

template<typename PolicyType >
mlpack::data::DatasetMapper< PolicyType >::DatasetMapper ( PolicyType &  policy,
const size_t  dimensionality = 0 
)
explicit

Create the DatasetMapper object with the given policy and dimensionality.

Note that the dimensionality cannot be changed later; you will have to create a new DatasetMapper object. The policy, however, can be changed later through the Policy() modifiers.

Member Function Documentation

template<typename PolicyType >
size_t mlpack::data::DatasetMapper< PolicyType >::Dimensionality ( ) const

Get the dimensionality of the DatasetMapper object (that is, how many dimensions it has information for).

If this object was created by a call to mlpack::data::Load(), then the dimensionality will be the same as the number of rows (dimensions) in the dataset.

template<typename PolicyType >
PolicyType::MappedType mlpack::data::DatasetMapper< PolicyType >::MapString ( const std::string &  string,
const size_t  dimension 
)

Given the string and the dimension to which it belongs, return its numeric mapping.

If no mapping yet exists, the string is added to the list of mappings for the given dimension. The dimension parameter refers to the index of the dimension of the string (i.e. the row in the dataset).

Parameters
string  String to find/create mapping for.
dimension  Index of the dimension of the string.
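
The find-or-create behaviour described for MapString() can be sketched for a single dimension as follows. This is an illustrative model built on the standard library, not mlpack's boost::bimap-based code. `ToyDimensionMap` is a hypothetical name, and assigning codes sequentially from 0 is an assumption; in mlpack the actual values depend on the mapping policy.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical single-dimension mapper; not mlpack's BiMapType.
class ToyDimensionMap
{
 public:
  // Return the existing code for `s`, or assign the next unused code.
  std::size_t MapString(const std::string& s)
  {
    const auto it = codes.find(s);
    if (it != codes.end())
      return it->second;

    // Assumption: new codes are handed out sequentially from 0.
    const std::size_t code = strings.size();
    codes.emplace(s, code);
    strings.push_back(s);
    return code;
  }

  // Number of distinct strings mapped so far.
  std::size_t NumMappings() const { return strings.size(); }

 private:
  std::unordered_map<std::string, std::size_t> codes;
  std::vector<std::string> strings;
};
```

Mapping the same string twice returns the same code and leaves the number of mappings unchanged, which is the property the description above relies on.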
template<typename PolicyType >
template<typename eT >
void mlpack::data::DatasetMapper< PolicyType >::MapTokens ( const std::vector< std::string > &  tokens,
size_t &  row,
arma::Mat< eT > &  matrix 
)

MapTokens turns a vector of strings into numeric variables and puts them into a given matrix.

It uses the mapping policy to store categorical values in the maps. How it determines whether a value is categorical, how it stores the categorical value in the map, and how it replaces it with a numerical value all depend on the mapping policy object's MapTokens() function.

Template Parameters
eT  Type of Armadillo matrix.
Parameters
tokens  Vector of variables inside a dimension.
row  Position of the given tokens.
matrix  Matrix to save the data into.
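
Since the behaviour is delegated to the policy, the sketch below shows one possible policy rule rather than what any particular mlpack policy does. Under the assumed rule, a dimension whose tokens all parse as numbers is stored as numbers, and otherwise every token in that dimension is replaced by an integer code. `ParseNumber` and `MapTokensSketch` are hypothetical helpers, and a plain std::vector<double> stands in for one row of the arma::Mat.

```cpp
#include <cstddef>
#include <exception>
#include <string>
#include <unordered_map>
#include <vector>

// Try to parse the whole token as a double; return false on failure.
inline bool ParseNumber(const std::string& token, double& out)
{
  try
  {
    std::size_t pos = 0;
    out = std::stod(token, &pos);
    return pos == token.size();  // Reject trailing garbage such as "12abc".
  }
  catch (const std::exception&)
  {
    return false;
  }
}

// Hypothetical policy step: convert one dimension's tokens to doubles.
// If any token is non-numeric, the whole dimension is treated as
// categorical and every token is replaced by its integer code.
// Returns true when the dimension was treated as categorical.
inline bool MapTokensSketch(
    const std::vector<std::string>& tokens,
    std::unordered_map<std::string, std::size_t>& codes,
    std::vector<double>& rowOut)
{
  rowOut.assign(tokens.size(), 0.0);

  bool allNumeric = true;
  for (std::size_t i = 0; i < tokens.size() && allNumeric; ++i)
    allNumeric = ParseNumber(tokens[i], rowOut[i]);

  if (allNumeric)
    return false;

  for (std::size_t i = 0; i < tokens.size(); ++i)
  {
    // emplace() leaves an existing entry untouched, so repeated tokens
    // keep the code they were first given.
    const auto result = codes.emplace(tokens[i], codes.size());
    rowOut[i] = static_cast<double>(result.first->second);
  }
  return true;
}
```

Note that under this assumed rule a numeric-looking token such as "1" in a categorical dimension is mapped like any other string, so the stored value is its code, not the number it spells.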
template<typename PolicyType >
size_t mlpack::data::DatasetMapper< PolicyType >::NumMappings ( const size_t  dimension) const

Get the number of mappings for a particular dimension.

If the dimension is numeric, then this will return 0.

template<typename PolicyType >
const PolicyType& mlpack::data::DatasetMapper< PolicyType >::Policy ( ) const

Return the policy of the mapper.

Referenced by mlpack::data::DatasetMapper< PolicyType >::Serialize().

template<typename PolicyType >
PolicyType& mlpack::data::DatasetMapper< PolicyType >::Policy ( )

Modify the policy of the mapper (be careful!).

template<typename PolicyType >
void mlpack::data::DatasetMapper< PolicyType >::Policy ( PolicyType &&  policy)

Modify (Replace) the policy of the mapper with a new policy.

template<typename PolicyType >
template<typename Archive >
void mlpack::data::DatasetMapper< PolicyType >::Serialize ( Archive &  ar,
const unsigned int 
)
inline

Serialize the dataset information.

template<typename PolicyType >
Datatype mlpack::data::DatasetMapper< PolicyType >::Type ( const size_t  dimension) const

Return the type of a given dimension (numeric or categorical).

template<typename PolicyType >
Datatype& mlpack::data::DatasetMapper< PolicyType >::Type ( const size_t  dimension)

Modify the type of a given dimension (be careful!).

template<typename PolicyType >
const std::string& mlpack::data::DatasetMapper< PolicyType >::UnmapString ( const size_t  value,
const size_t  dimension 
)

Return the string that corresponds to a given value in a given dimension.

If the value is not a valid mapping in the given dimension, a std::invalid_argument is thrown.

Parameters
value  Mapped value for string.
dimension  Dimension to unmap string from.
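
The error behaviour described for UnmapString() can be illustrated with a small reverse lookup that throws std::invalid_argument for a value with no associated string. This is a sketch under stated assumptions, not mlpack's code: `UnmapStringSketch` is a hypothetical helper, and it assumes the codes of one dimension are the indices of a vector of strings.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical reverse lookup for one dimension: strings[code] holds the
// string that was mapped to `code`.
inline const std::string& UnmapStringSketch(
    const std::vector<std::string>& strings,
    const std::size_t value)
{
  // A value with no associated string is reported as an invalid argument.
  if (value >= strings.size())
  {
    throw std::invalid_argument(
        "no string is mapped to value " + std::to_string(value));
  }
  return strings[value];
}
```

Callers that cannot guarantee the value was produced by an earlier mapping call should be prepared to catch the exception.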
template<typename PolicyType >
PolicyType::MappedType mlpack::data::DatasetMapper< PolicyType >::UnmapValue ( const std::string &  string,
const size_t  dimension 
)

Return the value that corresponds to a given string in a given dimension.

If the string is not a valid mapping in the given dimension, a std::invalid_argument is thrown.

Parameters
string  Mapped string for value.
dimension  Dimension to unmap string from.

Member Data Documentation

template<typename PolicyType >
MapType mlpack::data::DatasetMapper< PolicyType >::maps
private

maps object stores string and numerical pairs.

Definition at line 154 of file dataset_mapper.hpp.

Referenced by mlpack::data::DatasetMapper< PolicyType >::Serialize().

template<typename PolicyType >
PolicyType mlpack::data::DatasetMapper< PolicyType >::policy
private

policy object tells dataset mapper how the categorical values should be

Definition at line 158 of file dataset_mapper.hpp.

template<typename PolicyType >
std::vector<Datatype> mlpack::data::DatasetMapper< PolicyType >::types
private

Types of each dimension.

Definition at line 143 of file dataset_mapper.hpp.

Referenced by mlpack::data::DatasetMapper< PolicyType >::Serialize().


The documentation for this class was generated from the following file: dataset_mapper.hpp