This class converts any array of characters to the English alphabet, removing accent marks and replacing all non-English characters (e.g. "ü" is converted to "ue" and "á" to "a").
File | common/lib/transliterator.h |
Classes |
Transliterator |
Class that converts any array of characters to the English alphabet.
class Transliterator {
    typedef struct {
        const char * i; // utf8-encoded string with one or more ucs2 source characters
        const char * o; // utf8-encoded string holding one or more ucs2 destination characters
    } MapEntryUtf8;
    typedef std::map TUCS2CharToStringMap;
    TUCS2CharToStringMap ucs2CharToStringMap;
    bool SetRuleBasedTransliteration1(MapEntryUtf8 * map, unsigned szMap);
    const word * MapUCS2CharToString(word ucs2Char);
    void MapUCS2Set(word ucs2Char, word * ucs2String, unsigned len);
    void MapUCS2Remove(word ucs2Char);
    void ToLower(char * s, unsigned &len);
    void ToLower(char * s);
public:
    Transliterator();
    virtual ~Transliterator();
    std::list TransformWords(const char * wordsIn, bool unique);
    char * Transform(const char * wordIn);
};
Constructor.
TransformWords splits the input string on whitespace, converts each word to lower case and transliterates it to the English alphabet. The output is a list of strings without accent marks or non-English characters.
const char * wordsIn | The string that will be split on whitespace and processed. |
bool unique | If true, the output list will not contain duplicate strings. |
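As a rough illustration of the behaviour described above, the following self-contained C++ sketch splits a string on whitespace, lower-cases each word and optionally drops duplicates. It does not use the real class: SketchTransformWords and SketchNormalizeWord are hypothetical names, the std::list<std::string> return type is an assumption (the declaration does not show the element type), the transliteration step is reduced to ASCII lower-casing, and keeping the first occurrence of a duplicate is likewise an assumption.

```cpp
#include <cassert>
#include <list>
#include <set>
#include <sstream>
#include <string>

// Hypothetical stand-in for the per-word transliteration step. It only
// lower-cases ASCII letters and copies every other byte unchanged.
std::string SketchNormalizeWord(const std::string &word) {
    std::string out;
    for (char c : word) {
        if (c >= 'A' && c <= 'Z') c = static_cast<char>(c - 'A' + 'a');
        out += c;
    }
    return out;
}

// Assumed shape of TransformWords: split, normalize, optionally deduplicate.
std::list<std::string> SketchTransformWords(const char *wordsIn, bool unique) {
    assert(wordsIn != nullptr);
    std::list<std::string> result;
    std::set<std::string> seen;        // normalized words already emitted
    std::istringstream stream(wordsIn);
    std::string word;
    while (stream >> word) {           // operator>> skips runs of whitespace
        const std::string normalized = SketchNormalizeWord(word);
        // With unique == true, skip words whose normalized form was seen before.
        if (unique && !seen.insert(normalized).second) continue;
        result.push_back(normalized);
    }
    return result;
}
```

With this sketch, the input "Foo  bar FOO" gives three entries when unique is false and two ("foo", "bar") when it is true, because duplicates are detected after normalization.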
Transform returns the input string transliterated to the English alphabet.
const char * wordIn | The string that will be transliterated. |
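The rule-based mapping suggested by MapEntryUtf8 can be sketched as a small table lookup over UTF-8 input. This is only an illustration under stated assumptions: SketchMapEntry, kSketchRules and SketchTransform are hypothetical names, the table holds just the two examples from the class description, the result is returned as std::string rather than char *, and bytes without a rule are copied unchanged instead of being replaced.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical rule entry, loosely modelled on MapEntryUtf8.
struct SketchMapEntry {
    const char *in;   // UTF-8 encoded source character
    const char *out;  // replacement in the English alphabet
};

// Illustrative rules only; the real rule set is not shown in this document.
static const SketchMapEntry kSketchRules[] = {
    {"\xC3\xBC", "ue"},  // ü
    {"\xC3\xA1", "a"},   // á
};

std::string SketchTransform(const char *wordIn) {
    assert(wordIn != nullptr);
    const std::string in(wordIn);
    std::string out;
    std::size_t pos = 0;
    while (pos < in.size()) {
        bool matched = false;
        for (const SketchMapEntry &rule : kSketchRules) {
            const std::string key(rule.in);
            // Does the input contain this rule's source bytes at pos?
            if (in.compare(pos, key.size(), key) == 0) {
                out += rule.out;
                pos += key.size();
                matched = true;
                break;
            }
        }
        if (!matched) {
            out += in[pos];  // no rule applies: copy the byte as-is
            ++pos;
        }
    }
    return out;
}
```

Called with the UTF-8 encoding of "Müller", the sketch yields "Mueller". A std::map keyed by the decoded character, as the ucs2CharToStringMap member suggests, would avoid the linear scan over the rule table.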