Transliterator

This class converts an array of characters to the English alphabet by removing accent marks and replacing non-English characters (e.g. "ü" is converted to "ue" and "á" to "a").

File information

File common/lib/transliterator.h

Classes Transliterator

Classes

Transliterator

Class that converts an array of characters to the English alphabet.

class Transliterator {
    typedef struct {
        const char * i; // utf8-encoded string with one or more ucs2 source characters
        const char * o; // utf8-encoded string holding one or more ucs2 destination characters
    } MapEntryUtf8;

    typedef std::map TUCS2CharToStringMap;
    TUCS2CharToStringMap ucs2CharToStringMap;

    bool SetRuleBasedTransliteration1(MapEntryUtf8 * map, unsigned szMap);
    const word * MapUCS2CharToString(word ucs2Char);
    void MapUCS2Set(word ucs2Char, word * ucs2String, unsigned len);
    void MapUCS2Remove(word ucs2Char);
    void ToLower(char * s, unsigned &len);
    void ToLower(char * s);

public:
    Transliterator();
    virtual ~Transliterator();
    std::list TransformWords(const char * wordsIn, bool unique);
    char * Transform(const char * wordIn);
};

Transliterator

Constructor.

TransformWords

Splits the input string on whitespace, converts each word to lower case, and transliterates it to the English alphabet. The output is a list of strings without accent marks or non-English characters.

Parameters
const char * wordsIn The string that will be split on whitespace and processed.
bool unique If true, the output list will not contain duplicate strings.
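
The split, lower-case, and de-duplicate steps described above can be sketched as follows. This is a minimal illustration, not the code in transliterator.h: the names SplitWordsSketch and LowerAscii are hypothetical, the lower-casing handles ASCII only, the transliteration step itself is left out, and keeping the first occurrence of a duplicate in input order is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <list>
#include <sstream>
#include <string>

// Hypothetical helper: lower-cases ASCII letters only; other bytes pass through.
static std::string LowerAscii(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Hypothetical sketch of the word pipeline: split on whitespace, lower-case
// each word, and (if unique is true) skip words that were already emitted.
std::list<std::string> SplitWordsSketch(const std::string &wordsIn, bool unique) {
    std::list<std::string> out;
    std::istringstream in(wordsIn);
    std::string word;
    while (in >> word) {  // operator>> splits on any run of whitespace
        std::string w = LowerAscii(word);
        if (unique && std::find(out.begin(), out.end(), w) != out.end())
            continue;  // assumption: the first occurrence wins
        out.push_back(w);
    }
    return out;
}

// Usage: with unique set, "Foo" and "FOO" collapse into a single "foo".
static void SplitWordsSketchDemo() {
    std::list<std::string> words = SplitWordsSketch("Foo bar FOO", true);
    assert(words.size() == 2);
}
```

The linear std::find keeps the sketch short; for long inputs a std::set of already-seen words would avoid the quadratic scan.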

Transform

Returns the input string transliterated to the English alphabet.

Parameters
const char * wordIn The string that will be transliterated.
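
To illustrate rule-based transliteration of a single string, the sketch below applies a small table of UTF-8 source sequences and their ASCII replacements, loosely modeled on the MapEntryUtf8 entries in the class declaration. It is an assumption-laden example rather than the class's implementation: RuleSketch, kRules, and TransliterateSketch are hypothetical names, the four rules are sample entries only, and the function returns std::string instead of char * to sidestep buffer ownership, which the declaration does not document.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical rule entry: a UTF-8 encoded source sequence and its replacement.
struct RuleSketch {
    const char *in;
    const char *out;
};

// Sample rules only; a real table would cover many more characters.
static const RuleSketch kRules[] = {
    {"\xC3\xBC", "ue"},  // U+00FC, u with diaeresis
    {"\xC3\xB6", "oe"},  // U+00F6, o with diaeresis
    {"\xC3\xA1", "a"},   // U+00E1, a with acute
    {"\xC3\xA9", "e"},   // U+00E9, e with acute
};

// Scans the input left to right; where a rule's source sequence matches at the
// current position, its replacement is appended, otherwise the byte is copied.
std::string TransliterateSketch(const std::string &wordIn) {
    std::string out;
    std::size_t i = 0;
    while (i < wordIn.size()) {
        bool matched = false;
        for (const RuleSketch &r : kRules) {
            const std::size_t n = std::strlen(r.in);
            if (wordIn.compare(i, n, r.in) == 0) {
                out += r.out;
                i += n;
                matched = true;
                break;
            }
        }
        if (!matched)
            out += wordIn[i++];
    }
    return out;
}

// Usage: "m\xC3\xBCller" is the UTF-8 encoding of "müller".
static void TransliterateSketchDemo() {
    assert(TransliterateSketch("m\xC3\xBCller") == "mueller");
}
```

Bytes with no matching rule are copied unchanged here; whether the real class drops, keeps, or replaces unmapped characters is not stated on this page.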