hermesAI™

The Thai character confounding NLP engines

5 min

Bangkok, Thailand

If you’ve ever attempted to learn Thai, you can imagine that this Southeast Asian language is extremely difficult, if not the most difficult, for machines to understand as well.

Thai is a character-based language with numerous quirks that disrupt natural language processing algorithms. Because of these quirks, leading NLP engines fail to understand Thai beyond the surface level, which results in an underwhelming customer experience. Why?

Thai is a notoriously difficult language for natural language processing engines to understand.

First, a single Thai sentence can contain several types of interjection words. Many of these words carry no meaning relevant to the sentence’s intent; they are most often used to express emotion or politeness. At this point, you might be thinking: easy, just remove these words. Proto’s NLP team would have loved that too!

However, Thai is written without spaces between words, and there is presently no fully accurate word separation technique for the language.

In English, we handle such interjection words with a simple Part-of-Speech Tagging technique: a dictionary defines the problematic words and the NLP engine removes them. For example:

“Oh! I can fly now.”

We would remove 'Oh' without disrupting the sentence’s intent. It is not so straightforward in Thai.
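The English case can be sketched in a few lines of Python. This is a minimal illustration of the dictionary lookup described above, not Proto's code; the `INTERJECTIONS` list and the `strip_interjections` name are assumptions made up for the example.

```python
import string

# Hypothetical, hand-picked interjection dictionary (an assumption for this sketch).
INTERJECTIONS = {"oh", "wow", "ah", "hmm", "oops"}


def strip_interjections(sentence: str) -> str:
    """Drop every whitespace-separated token that is a known interjection."""
    kept = []
    for token in sentence.split():
        # Compare the bare word, ignoring case and attached punctuation.
        bare = token.strip(string.punctuation).lower()
        if bare not in INTERJECTIONS:
            kept.append(token)
    return " ".join(kept)


print(strip_interjections("Oh! I can fly now."))
```

Note that the sketch leans on `sentence.split()`: English already marks word boundaries with spaces, which is exactly what Thai text does not provide.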

Let’s examine ค่ะ and คะ, interjection words used frequently by female speakers to indicate politeness. The POS Tagging technique is unfeasible here because a Thai dictionary cannot accurately separate out ค่ะ or คะ in every context in which those characters appear. For example:

“อยากรู้มั้ยคะ (Do you want to know?)”
“อยากรู้มั้ยคะแนนเท่าไหร่ (Do you want to know the score?)” 

In the first sentence, คะ is an interjection word indicating the politeness with which the speaker asks อยากรู้มั้ย (Do you want to know?).

In the second sentence, คะ is a character within the word คะแนน (score). So you see, in order to identify ค่ะ or คะ in all their various forms, an NLP engine that actually understands Thai would require a significantly more advanced algorithm with additional pre-processing steps.
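The two sentences above can be used to show why a plain search-and-remove fails and why a segmentation step is needed first. The sketch below is an illustration under stated assumptions: the six-word `DICTIONARY` is a toy lexicon invented for this example, and greedy longest matching is only one classic segmentation heuristic, not the method Proto uses. It happens to work on these two sentences, but longest matching is known to mis-segment other inputs, which is the accuracy problem the article describes.

```python
# Toy lexicon, assumed for illustration only; a real Thai lexicon is far larger.
DICTIONARY = {"อยาก", "รู้", "มั้ย", "คะ", "ค่ะ", "คะแนน", "เท่าไหร่"}
POLITE_PARTICLES = {"คะ", "ค่ะ"}
MAX_WORD_LEN = max(len(word) for word in DICTIONARY)

polite_question = "อยากรู้มั้ยคะ"            # Do you want to know?
score_question = "อยากรู้มั้ยคะแนนเท่าไหร่"  # Do you want to know the score?


def naive_strip(text: str) -> str:
    """Remove the particles as raw substrings, ignoring word boundaries."""
    for particle in POLITE_PARTICLES:
        text = text.replace(particle, "")
    return text


def segment(text: str) -> list[str]:
    """Greedy longest-match segmentation against DICTIONARY.

    Characters that start no known word are emitted one at a time.
    """
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in DICTIONARY or size == 1:
                tokens.append(candidate)
                i += size
                break
    return tokens


def strip_particles(text: str) -> str:
    """Segment first, then drop only the tokens that are polite particles."""
    return "".join(t for t in segment(text) if t not in POLITE_PARTICLES)


print(naive_strip(score_question))      # damages the word คะแนน (score)
print(segment(score_question))          # คะแนน survives as a single token
print(strip_particles(polite_question)) # the trailing particle is removed
```

The naive version deletes the first two characters of คะแนน, while the segment-then-filter version leaves the score question untouched and strips only the standalone particle.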

With the POS Tagging technique rendered useless, we could look to another method called Word Embedding.

This method converts each word in a language into a vector of numbers that represents some aspect of its meaning. Embedding is a common technique, nearly universal across a wide range of NLP tasks; however, in the case of our คะ conundrum, embedding also has disqualifying limitations.

Popular word embedding libraries with Thai capability, such as word2vec and fastText, are trained on shallow language-modelling tasks that result in a loss of context, which in turn results in a misunderstanding of intent. For example:

“I stole money from the bank.”
“The bank of the river overflowed.”

The word “bank” has a different meaning in each sentence, depending on its context. The word embedding technique assigns a single vector to each word, and that one vector is forced to represent the whole range of possible meanings. In Thai, relying solely on these vectors without context is a recipe for (shall we say politely) 'intellectually-challenged' chatbots.
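The single-vector limitation is easy to see in code. The sketch below treats a static embedding model as a lookup table; the three-dimensional numbers in `STATIC_VECTORS` are invented for illustration (trained word2vec or fastText vectors have hundreds of dimensions), and `embed` is a hypothetical helper name.

```python
# Invented toy vectors (an assumption); real models learn these from a corpus.
STATIC_VECTORS = {
    "i": [0.1, 0.0, 0.2],
    "stole": [0.7, 0.1, 0.0],
    "money": [0.9, 0.0, 0.1],
    "from": [0.0, 0.1, 0.1],
    "the": [0.0, 0.0, 0.1],
    "bank": [0.5, 0.5, 0.0],
    "of": [0.0, 0.1, 0.0],
    "river": [0.0, 0.9, 0.1],
    "overflowed": [0.1, 0.8, 0.0],
}
UNKNOWN = [0.0, 0.0, 0.0]


def embed(sentence: str) -> list[list[float]]:
    """Look up one fixed vector per word; the rest of the sentence is ignored."""
    return [STATIC_VECTORS.get(word, UNKNOWN) for word in sentence.lower().split()]


money_sentence = embed("I stole money from the bank")
river_sentence = embed("The bank of the river overflowed")

# "bank" is the 6th word of the first sentence and the 2nd word of the second.
print(money_sentence[5] == river_sentence[1])
```

Because the lookup never consults the neighbouring words, the financial "bank" and the riverside "bank" receive identical vectors.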

These limitations across various techniques have led to demand from Proto's clients for what they call “real Thai NLP”. Specifically, they want chatbots that understand the language’s quirks.

So, without revealing Proto’s secret sauce, our NLP team developed a deep-learning model that trains a neural network to map word vectors according to (1) the entire sentence and (2) a word’s surrounding words. As a result, this deep-learning method delivers Thai chatbots powered by a more contextual word embedding algorithm.
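The article does not disclose Proto's model, so the following is only a toy illustration of the general idea of a context-dependent word vector, not the actual technique. A hand-written averaging rule stands in for a trained neural network: each word's vector is blended with (1) the mean of the whole sentence and (2) the mean of its nearby words. The two-dimensional vectors, the 0.5/0.25/0.25 weights, and the window size are all assumptions chosen for the example.

```python
# Invented 2-d vectors: first axis "finance-like", second axis "nature-like".
STATIC = {
    "stole": [0.9, 0.1],
    "money": [1.0, 0.0],
    "bank": [0.5, 0.5],
    "river": [0.0, 1.0],
    "overflowed": [0.1, 0.9],
}
ZERO = [0.0, 0.0]


def mean(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean; an empty list yields the zero vector."""
    if not vectors:
        return ZERO
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def contextual_vectors(sentence: str, window: int = 3) -> list[list[float]]:
    """Blend each word's static vector with sentence-level and local context."""
    words = sentence.lower().split()
    static = [STATIC.get(word, ZERO) for word in words]
    sentence_mean = mean(static)
    result = []
    for i, vec in enumerate(static):
        # Neighbours within `window` positions on either side, excluding the word.
        neighbours = static[max(0, i - window):i] + static[i + 1:i + 1 + window]
        local_mean = mean(neighbours)
        result.append([
            0.5 * v + 0.25 * loc + 0.25 * sen
            for v, loc, sen in zip(vec, local_mean, sentence_mean)
        ])
    return result


money_bank = contextual_vectors("I stole money from the bank")[5]
river_bank = contextual_vectors("The bank of the river overflowed")[1]
print(money_bank, river_bank)
```

Even with this crude rule, "bank" leans toward the finance axis in the first sentence and toward the nature axis in the second, which is the behaviour a static lookup table cannot produce.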

Thus far, the algorithm has proven robust across various NLP tasks such as sentiment analysis and intent classification. In the example below, a Thai job application chatbot understands human intent with and without ค่ะ.

Proto's deep-learning technique maps Thai word vectors with a contextual embedding algorithm.

The commercial applications of this deep-learning technique from Proto are far-reaching: enterprises can now deploy chatbots that deliver not only a savings advantage, but also a more humanized Thai customer experience compared to the competition.

Stay tuned for more innovations and insights from the NLP team at Proto!

Dr. Natapon Pantuwong is an NLP engineer at Proto, specializing in deep-learning techniques for the Thai language. He served on the faculty of the computer science department at King Mongkut's Institute of Technology Ladkrabang for eleven years. To reach Dr. Pantuwong, please write to him at natapon@proto.cx.
