mlpack
master
Auxiliary information for a dataset, including mappings to/from strings and the datatype of each dimension.
Public Member Functions | |
DatasetMapper (const size_t dimensionality=0) | |
Create the DatasetMapper object with the given dimensionality. More... | |
DatasetMapper (PolicyType &policy, const size_t dimensionality=0) | |
Create the DatasetMapper object with the given policy and dimensionality. More... | |
size_t | Dimensionality () const |
Get the dimensionality of the DatasetMapper object (that is, how many dimensions it has information for). More... | |
PolicyType::MappedType | MapString (const std::string &string, const size_t dimension) |
Given the string and the dimension to which it belongs, return its numeric mapping. More... | |
template<typename eT > | |
void | MapTokens (const std::vector< std::string > &tokens, size_t &row, arma::Mat< eT > &matrix) |
MapTokens turns vector of strings into numeric variables and puts them into a given matrix. More... | |
size_t | NumMappings (const size_t dimension) const |
Get the number of mappings for a particular dimension. More... | |
const PolicyType & | Policy () const |
Return the policy of the mapper. More... | |
PolicyType & | Policy () |
Modify the policy of the mapper (be careful!). More... | |
void | Policy (PolicyType &&policy) |
Modify (Replace) the policy of the mapper with a new policy. More... | |
template<typename Archive > | |
void | Serialize (Archive &ar, const unsigned int) |
Serialize the dataset information. More... | |
Datatype | Type (const size_t dimension) const |
Return the type of a given dimension (numeric or categorical). More... | |
Datatype & | Type (const size_t dimension) |
Modify the type of a given dimension (be careful!). More... | |
const std::string & | UnmapString (const size_t value, const size_t dimension) |
Return the string that corresponds to a given value in a given dimension. More... | |
PolicyType::MappedType | UnmapValue (const std::string &string, const size_t dimension) |
Return the value that corresponds to a given string in a given dimension. More... | |
Private Types | |
using | BiMapType = boost::bimap< std::string, typename PolicyType::MappedType > |
using | MapType = std::unordered_map< size_t, std::pair< BiMapType, size_t >> |
Private Attributes | |
MapType | maps |
maps object stores string and numerical pairs. More... | |
PolicyType | policy |
policy object tells dataset mapper how the categorical values should be More... | |
std::vector< Datatype > | types |
Types of each dimension. More... | |
Auxiliary information for a dataset, including mappings to/from strings and the datatype of each dimension.
DatasetMapper objects are optionally produced by data::Load(), and store the type of each dimension (Datatype::numeric or Datatype::categorical) as well as mappings from strings to unsigned integers and vice versa.
PolicyType | Mapping policy that specifies the behavior of MapString(). |
Definition at line 36 of file dataset_mapper.hpp.
|
private |
Definition at line 146 of file dataset_mapper.hpp.
|
private |
Definition at line 151 of file dataset_mapper.hpp.
|
explicit |
Create the DatasetMapper object with the given dimensionality.
Note that the dimensionality cannot be changed later; you will have to create a new DatasetMapper object.
|
explicit |
Create the DatasetMapper object with the given policy and dimensionality.
Note that the dimensionality cannot be changed later; you will have to create a new DatasetMapper object. The policy, by contrast, can be changed afterwards through the non-const Policy() accessors.
size_t mlpack::data::DatasetMapper< PolicyType >::Dimensionality | ( | ) | const |
Get the dimensionality of the DatasetMapper object (that is, how many dimensions it has information for).
If this object was created by a call to mlpack::data::Load(), then the dimensionality will be the same as the number of rows (dimensions) in the dataset.
PolicyType::MappedType mlpack::data::DatasetMapper< PolicyType >::MapString | ( | const std::string & | string, |
const size_t | dimension | ||
) |
Given the string and the dimension to which it belongs, return its numeric mapping.
If no mapping yet exists, the string is added to the list of mappings for the given dimension. The dimension parameter refers to the index of the dimension of the string (i.e. the row in the dataset).
string | String to find/create mapping for. |
dimension | Index of the dimension of the string. |
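The find-or-create behavior described above can be sketched in standard C++. This is a simplified stand-in, not mlpack's implementation: the class name `ToyStringMapper`, its internal layout, and the choice to number new strings 0, 1, 2, ... within each dimension are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for illustration only; this is not mlpack code.
// Each dimension keeps its own string -> index table.
class ToyStringMapper
{
 public:
  explicit ToyStringMapper(const std::size_t dimensionality) :
      forward(dimensionality) { }

  // Return the existing mapping for `s` in `dimension`, or create one.
  std::size_t MapString(const std::string& s, const std::size_t dimension)
  {
    auto& table = forward.at(dimension);
    const auto it = table.find(s);
    if (it != table.end())
      return it->second;

    // New string: assign the next unused index in this dimension
    // (an assumption of this sketch).
    const std::size_t next = table.size();
    table.emplace(s, next);
    return next;
  }

  // Number of distinct strings seen so far in `dimension`.
  std::size_t NumMappings(const std::size_t dimension) const
  {
    return forward.at(dimension).size();
  }

 private:
  std::vector<std::unordered_map<std::string, std::size_t>> forward;
};

// Small self-check: repeated strings reuse their mapping.
inline void ToyStringMapperExample()
{
  ToyStringMapper mapper(2);
  assert(mapper.MapString("red", 0) == 0);
  assert(mapper.MapString("blue", 0) == 1);
  assert(mapper.MapString("red", 0) == 0);   // Already mapped; no new entry.
  assert(mapper.MapString("blue", 1) == 0);  // Dimensions are independent.
  assert(mapper.NumMappings(0) == 2);
}
```

In the sketch, the same string can receive different values in different dimensions, because each dimension has its own table.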
void mlpack::data::DatasetMapper< PolicyType >::MapTokens | ( | const std::vector< std::string > & | tokens, |
size_t & | row, | ||
arma::Mat< eT > & | matrix | ||
) |
MapTokens turns a vector of strings into numeric variables and puts them into a given matrix.
It uses the mapping policy to store categorical values in the maps. How it determines whether a value is categorical, how it stores the categorical value in the map, and how it replaces it with a numerical value all depend on the mapping policy object's MapTokens() function.
eT | Type of armadillo matrix. |
tokens | Vector of variables inside a dimension. |
row | Position of the given tokens. |
matrix | Matrix to save the data into. |
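To make the role of the policy concrete, here is one possible policy, written as a self-contained sketch. Everything in it is hypothetical: `ToyPolicy`, `ParseNumber`, the rule "a row is numeric only if every token parses as a number", and the use of `std::vector<std::vector<double>>` in place of `arma::Mat<eT>`. An actual mlpack policy may decide differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Try to parse the whole token as a double; returns false on failure.
inline bool ParseNumber(const std::string& token, double& out)
{
  if (token.empty())
    return false;
  char* end = nullptr;
  out = std::strtod(token.c_str(), &end);
  return end == token.c_str() + token.size();
}

// Hypothetical policy, for illustration only (not an mlpack class).
// A row is numeric only if every token parses as a number; otherwise
// every token in that row is treated as a category label.
struct ToyPolicy
{
  // Per-row table of category label -> numeric code.
  std::unordered_map<std::size_t,
      std::unordered_map<std::string, std::size_t>> categories;

  // Fills matrix[row]; returns true if the row was treated as categorical.
  bool MapTokens(const std::vector<std::string>& tokens,
                 const std::size_t row,
                 std::vector<std::vector<double>>& matrix)
  {
    std::vector<double> values(tokens.size());
    bool numeric = true;
    for (std::size_t i = 0; i < tokens.size(); ++i)
    {
      if (!ParseNumber(tokens[i], values[i]))
      {
        numeric = false;
        break;
      }
    }

    if (!numeric)
    {
      auto& table = categories[row];
      for (std::size_t i = 0; i < tokens.size(); ++i)
      {
        // emplace() keeps the existing code if the label is already known.
        const auto it = table.emplace(tokens[i], table.size()).first;
        values[i] = static_cast<double>(it->second);
      }
    }

    matrix.at(row) = std::move(values);
    return !numeric;
  }
};
```

With this rule, a single non-numeric token turns the whole row categorical, so a token such as "1" in that row is stored as a category code rather than as the number 1.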
size_t mlpack::data::DatasetMapper< PolicyType >::NumMappings | ( | const size_t | dimension | ) | const |
Get the number of mappings for a particular dimension.
If the dimension is numeric, then this will return 0.
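One way to see why a numeric dimension reports 0 is to key the mapping storage by dimension, in the spirit of the private `MapType` above, so that a dimension that never received a string has no entry at all. The sketch below is a hypothetical model (`ToyDatasetInfo`, `AddLabel`, and the default of `Datatype::numeric` are assumptions), not the mlpack class itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Mirrors the two datatypes named in the text.
enum class Datatype { numeric, categorical };

// Hypothetical model for illustration only; not the mlpack class.
struct ToyDatasetInfo
{
  std::vector<Datatype> types;
  // Only dimensions that have received at least one label get an entry.
  std::unordered_map<std::size_t, std::vector<std::string>> labels;

  // Assumption of this sketch: every dimension starts out numeric.
  explicit ToyDatasetInfo(const std::size_t dimensionality) :
      types(dimensionality, Datatype::numeric) { }

  // Record a label and mark the dimension as categorical.
  void AddLabel(const std::size_t dimension, const std::string& label)
  {
    types.at(dimension) = Datatype::categorical;
    labels[dimension].push_back(label);
  }

  // A dimension without an entry (that is, a numeric one) has 0 mappings.
  std::size_t NumMappings(const std::size_t dimension) const
  {
    const auto it = labels.find(dimension);
    return it == labels.end() ? 0 : it->second.size();
  }
};

// Small self-check of the behavior described above.
inline void ToyDatasetInfoExample()
{
  ToyDatasetInfo info(2);
  info.AddLabel(1, "yes");
  info.AddLabel(1, "no");
  assert(info.NumMappings(0) == 0);  // Numeric dimension.
  assert(info.NumMappings(1) == 2);  // Categorical dimension.
}
```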
const PolicyType& mlpack::data::DatasetMapper< PolicyType >::Policy | ( | ) | const |
Return the policy of the mapper.
Referenced by mlpack::data::DatasetMapper< PolicyType >::Serialize().
PolicyType& mlpack::data::DatasetMapper< PolicyType >::Policy | ( | ) |
Modify the policy of the mapper (be careful!).
void mlpack::data::DatasetMapper< PolicyType >::Policy | ( | PolicyType && | policy | ) |
Replace the policy of the mapper with a new policy.
|
inline |
Serialize the dataset information.
Definition at line 126 of file dataset_mapper.hpp.
References mlpack::data::CreateNVP(), mlpack::data::DatasetMapper< PolicyType >::maps, mlpack::data::DatasetMapper< PolicyType >::Policy(), and mlpack::data::DatasetMapper< PolicyType >::types.
Datatype mlpack::data::DatasetMapper< PolicyType >::Type | ( | const size_t | dimension | ) | const |
Return the type of a given dimension (numeric or categorical).
Datatype& mlpack::data::DatasetMapper< PolicyType >::Type | ( | const size_t | dimension | ) |
Modify the type of a given dimension (be careful!).
const std::string& mlpack::data::DatasetMapper< PolicyType >::UnmapString | ( | const size_t | value, |
const size_t | dimension | ||
) |
Return the string that corresponds to a given value in a given dimension.
If the value has no valid mapping in the given dimension, a std::invalid_argument is thrown.
value | Mapped value for string. |
dimension | Dimension to unmap string from. |
PolicyType::MappedType mlpack::data::DatasetMapper< PolicyType >::UnmapValue | ( | const std::string & | string, |
const size_t | dimension | ||
) |
Return the value that corresponds to a given string in a given dimension.
If the string has no valid mapping in the given dimension, a std::invalid_argument is thrown.
string | Mapped string for value. |
dimension | Dimension to unmap string from. |
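The two reverse lookups, UnmapString() and UnmapValue(), and their error behavior can be illustrated with a pair of hash maps for a single dimension. This is a sketch under stated assumptions: `ToyBiMap` and `Insert` are hypothetical names, and the real class stores its pairs in a `boost::bimap` per dimension rather than in two separate maps.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical single-dimension lookup table; illustration only.
class ToyBiMap
{
 public:
  // Register a (string, value) pair. The sketch assumes each string and
  // each value is inserted at most once.
  void Insert(const std::string& s, const std::size_t value)
  {
    toValue.emplace(s, value);
    toString.emplace(value, s);
  }

  // Value -> string; throws if the value was never mapped.
  const std::string& UnmapString(const std::size_t value) const
  {
    const auto it = toString.find(value);
    if (it == toString.end())
    {
      throw std::invalid_argument(
          "no string mapped to value " + std::to_string(value));
    }
    return it->second;
  }

  // String -> value; throws if the string was never mapped.
  std::size_t UnmapValue(const std::string& s) const
  {
    const auto it = toValue.find(s);
    if (it == toValue.end())
      throw std::invalid_argument("no value mapped to string '" + s + "'");
    return it->second;
  }

 private:
  std::unordered_map<std::string, std::size_t> toValue;
  std::unordered_map<std::size_t, std::string> toString;
};

// Returns true if looking up an unknown value throws std::invalid_argument.
inline bool ToyBiMapThrowsOnUnknownValue()
{
  ToyBiMap map;
  map.Insert("cat", 0);
  assert(map.UnmapString(0) == "cat");
  assert(map.UnmapValue("cat") == 0);
  try
  {
    map.UnmapString(7);
  }
  catch (const std::invalid_argument&)
  {
    return true;
  }
  return false;
}
```

Callers that cannot guarantee a value or string was mapped beforehand would wrap the lookup in a `try`/`catch` for `std::invalid_argument`, as the helper at the end of the sketch does.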
|
private |
The maps object stores pairs of strings and their numeric values.
Definition at line 154 of file dataset_mapper.hpp.
Referenced by mlpack::data::DatasetMapper< PolicyType >::Serialize().
|
private |
The policy object tells the dataset mapper how the categorical values should be mapped.
Definition at line 158 of file dataset_mapper.hpp.
|
private |
Types of each dimension.
Definition at line 143 of file dataset_mapper.hpp.
Referenced by mlpack::data::DatasetMapper< PolicyType >::Serialize().