Multilingual Sentiment Analysis For Customer Insights
Automation is key to operational and economic efficiency. But when machine translation misses the cultural subtleties, contextual references, and colloquialisms in comments and reviews, we have to ask whether it really serves its purpose. A solution that relies on translation produces inaccurate results for exactly this reason: it cannot capture the nuances of the language it is translating. This article looks at why a multilingual sentiment analysis solution should analyze the Voice of the Customer (VoC) in the customer’s native language, without resorting to translation.
Why we need multilingual sentiment analysis
English is the global language of business, but it is not the native language of most of the world’s customers. In fact, only 13% of the world speaks English. What about the other 87%? Businesses need feedback from all of their customers so they can serve them regardless of geography and language. After all, a business’s main goals are to acquire new customers and keep the ones it already has.
Companies therefore need to understand the feelings and opinions of their customers, even when those customers express them in their native language. This is great in theory but difficult in practice for most marketers: manually analyzing every comment and review is time-consuming and, in the long run, very expensive. It is not a sustainable model.
That’s why a native, multilingual approach to sentiment analysis is vital for identifying and analyzing customer emotions across social media, surveys, and customer service tickets. Thankfully, AI and machine learning have made automating VoC analysis simpler. Companies just need to make sure the model they choose is equally accurate in every language they intend to use it for.
Language And The Challenge Of Translation
Translation is the process of transferring words or text from one language into another. Sounds simple, right? Wrong. Because of the complexity of language in general, and the specific differences between languages, translation is rarely a word-for-word transfer from, say, English to Italian, or vice versa. Translation must deal with contrasts in basic linguistic building blocks such as grammar, syntax, semantics, lexicon, and morphology, and even tone in tonal languages such as Mandarin, Thai, and Punjabi. Add in all the literary devices people use, such as idioms, sarcasm, and slang, and it’s easy to see why accurately capturing the intended meaning of a text through translation is a daunting task.
But when businesses try to understand their consumers, what they really want is the true meaning behind their feelings about prices, product features, customer service, quality, and so on. In other words, when it comes to analyzing the Voice of the Customer, accuracy is everything.
Multilingual Sentiment Analysis
So what is multilingual sentiment analysis? Most of us know what sentiment analysis is: the identification, extraction, and analysis of consumer feelings and opinions expressed on social media or in customer surveys. Multilingual sentiment analysis extracts brand insights from customer feedback in its native language, without using translation. It is one of the most important features a sentiment analysis tool can offer.
How does Repustate perform Sentiment Analysis in 23 Languages?
Repustate uses a group of semantic technologies to calculate sentiment by applying language-specific rules to each piece of text. That means there isn’t one “true” algorithm; what works in English doesn’t necessarily apply to Arabic.
Repustate performs sentiment analysis natively for each language. No intermediary translation is done, which means accuracy is much higher.
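To make “natively” concrete, here is a minimal Python sketch of per-language dispatch. It assumes a hypothetical registry, `PIPELINES`, that maps each language code to its own processing function, so text is never routed through a translator. The tokenizers are crude illustrative stand-ins, not Repustate’s code; a real system would plug a full language-specific pipeline into each entry.

```python
from typing import Callable, Dict, List

Pipeline = Callable[[str], List[str]]

def whitespace_tokens(text: str) -> List[str]:
    # Adequate for toy English or Spanish input only.
    return text.lower().split()

def character_tokens(text: str) -> List[str]:
    # Crude stand-in for a Chinese word segmenter: one token per character.
    return [ch for ch in text if not ch.isspace()]

# Hypothetical registry: one native pipeline per language code, no translation step.
PIPELINES: Dict[str, Pipeline] = {
    "en": whitespace_tokens,
    "es": whitespace_tokens,
    "zh": character_tokens,
}

def analyze_natively(text: str, lang: str) -> List[str]:
    """Run `text` through the pipeline registered for its own language."""
    pipeline = PIPELINES.get(lang)
    if pipeline is None:
        # Fail loudly instead of silently falling back to machine translation.
        raise ValueError(f"no native pipeline for language {lang!r}")
    return pipeline(text)

print(analyze_natively("Me encanta este producto", "es"))
print(analyze_natively("很好", "zh"))
```

With this design, supporting a new language means registering a new native pipeline rather than adding a translation hop, and unsupported languages raise an error instead of being quietly translated.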
With that said, regardless of the language being analyzed, each block of text goes through the following set of transformations:
Step 1: Part of speech tagging
This involves classifying each word grammatically: identifying which words are nouns, verbs, adjectives, adverbs, etc., and which words are objects of others. Repustate identifies conjunctions and subordinate clauses, prepositional phrases and noun phrases, all to help the Repustate engine “understand” the true meaning of the text.
Repustate has developed its own part-of-speech tagger for each supported language. Building a tagger starts with accumulating a massive corpus of pre-tagged text (i.e. text in which humans have labeled each word with its part of speech). Repustate trains a tagger on this corpus that relies on probabilities to determine the most likely part of speech for a given word in a given context. For example, the word “like” can be a verb (“I like you”) or a preposition (“He looks like his brother”). In the first case, “like” connotes positive sentiment; in the second, it does not.
In English, a handful of words have this double (or triple) meaning, carrying sentiment in some contexts and none in others. Arabic sentiment analysis is far more complex: an Arabic word can have up to 12 different meanings depending on the surrounding context.
To perform sentiment analysis accurately, you need a finely tuned, well-trained part-of-speech tagger. It must be language-specific, because some languages have much more complex morphology than others.
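The idea of learning tags from a pre-tagged corpus can be illustrated with a toy tagger. The sketch below is a simplification, not Repustate’s tagger. It counts how often each tag follows a given (previous tag, word) pair in a tiny hypothetical corpus and greedily picks the most frequent one, which is enough to separate the verb “like” from the preposition “like”. Tag names follow the Universal Dependencies convention (`ADP` covers prepositions).

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List, Tuple

TaggedSentence = List[Tuple[str, str]]

# Tiny hand-tagged training corpus (hypothetical, for illustration only).
CORPUS: List[TaggedSentence] = [
    [("i", "PRON"), ("like", "VERB"), ("you", "PRON")],
    [("we", "PRON"), ("like", "VERB"), ("coffee", "NOUN")],
    [("he", "PRON"), ("looks", "VERB"), ("like", "ADP"), ("his", "PRON"), ("brother", "NOUN")],
    [("it", "PRON"), ("seems", "VERB"), ("like", "ADP"), ("rain", "NOUN")],
]

def train(corpus: List[TaggedSentence]):
    """Count tags per (previous tag, word) context and per word alone."""
    by_context: DefaultDict[Tuple[str, str], Counter] = defaultdict(Counter)
    by_word: DefaultDict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        prev = "<s>"  # sentence-start marker
        for word, tag in sentence:
            by_context[(prev, word)][tag] += 1
            by_word[word][tag] += 1
            prev = tag
    return by_context, by_word

def tag(words: List[str], by_context, by_word, default: str = "NOUN") -> List[str]:
    """Greedily pick the most frequent tag for each word given the previous tag.

    Backs off to the word's overall most frequent tag for unseen contexts,
    and to `default` for unseen words.
    """
    tags: List[str] = []
    prev = "<s>"
    for word in words:
        counts = by_context.get((prev, word)) or by_word.get(word)
        best = counts.most_common(1)[0][0] if counts else default
        tags.append(best)
        prev = best
    return tags

by_context, by_word = train(CORPUS)
print(tag("we like you".split(), by_context, by_word))
print(tag("he looks like you".split(), by_context, by_word))
```

With these toy counts, “we like you” is tagged `['PRON', 'VERB', 'PRON']` and “he looks like you” `['PRON', 'VERB', 'ADP', 'PRON']`. Production taggers train on far larger corpora and typically decode whole sentences (for example with the Viterbi algorithm) rather than choosing word by word.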
Step 2: Lemmatization
The next step is to lemmatize each word where applicable. Lemmatization is the process of reducing a word to its base form, or lemma. For example, “loved”, “loving”, and “loves” all share the lemma “love”. To make sure no word goes unanalyzed, a proper lemmatizer is required, and again, it must be language-specific: the rules for inflecting nouns and verbs by number, gender, tense, etc. differ wildly from language to language. Repustate handles all of this for each language.
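A lemmatizer can be sketched as an exception table for irregular forms plus suffix rules that are accepted only when they yield a known lemma. The tables below are hypothetical and cover a handful of English words. Each language would need its own tables, which is why a language-specific lemmatizer matters.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical English-only tables; every language needs its own.
IRREGULAR: Dict[str, str] = {"was": "be", "were": "be", "went": "go", "better": "good"}
KNOWN_LEMMAS: Set[str] = {"love", "hate", "like", "product", "box", "good", "be", "go"}
# (suffix to strip, text to append), tried in order; ("ed", "e") turns "loved" into "love".
SUFFIX_RULES: List[Tuple[str, str]] = [
    ("ing", "e"), ("ing", ""), ("ed", "e"), ("ed", ""), ("es", ""), ("s", ""),
]

def lemmatize(word: str) -> str:
    """Return the lemma of `word`, or the word itself if no rule applies."""
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    if w in KNOWN_LEMMAS:
        return w
    for suffix, replacement in SUFFIX_RULES:
        if w.endswith(suffix):
            candidate = w[: -len(suffix)] + replacement
            if candidate in KNOWN_LEMMAS:  # accept only real lemmas
                return candidate
    return w

print([lemmatize(w) for w in ["loved", "loving", "loves", "boxes", "was"]])
```

Checking candidates against a lemma list avoids the classic over-stripping error of naive suffix removal (“loved” becoming “lov”).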
Step 3: Prior Polarity
Many words connote sentiment immediately, even without any surrounding context. Words like “love”, “hate”, and “despise” have an immediate polarizing effect. Sentiment analysis relies on an exhaustive list of terms with such prior polarity as the foundation for determining sentiment.
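A prior-polarity lexicon is essentially a lookup table. In this minimal sketch the scores are made up, and the table is keyed on (lemma, part of speech) so that the output of the tagging and lemmatization steps decides whether “like” counts as sentiment at all.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon: (lemma, part of speech) -> prior polarity in [-1, 1].
PRIOR_POLARITY: Dict[Tuple[str, str], float] = {
    ("love", "VERB"): 1.0,
    ("like", "VERB"): 0.5,  # "like" as a preposition (ADP) is deliberately absent
    ("hate", "VERB"): -1.0,
    ("despise", "VERB"): -1.0,
    ("boring", "ADJ"): -0.5,
}

def prior_polar_terms(tagged: List[Tuple[str, str]]) -> List[Tuple[str, float]]:
    """Return each (lemma, score) whose lemma/tag pair carries sentiment on its own."""
    return [
        (lemma, PRIOR_POLARITY[(lemma, pos)])
        for lemma, pos in tagged
        if (lemma, pos) in PRIOR_POLARITY
    ]

print(prior_polar_terms([("i", "PRON"), ("like", "VERB"), ("you", "PRON")]))
print(prior_polar_terms([("he", "PRON"), ("look", "VERB"), ("like", "ADP"), ("his", "PRON")]))
```

The first call finds `('like', 0.5)`; the second finds nothing, because the preposition “like” has no entry.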
Step 4: Negations, amplifiers & other grammatical constructs
We now have almost all the tools we need, but if we stopped here, Repustate’s sentiment analysis would be inaccurate in all but the most trivial cases. Repustate now layers on nuanced grammatical aspects, unique to each language, including negations and amplifiers.
A negation reverses the polarity of the following (or sometimes the preceding) term. Consider the difference between “I like coffee” and “I do not like coffee”. In some languages the negation comes before the term, in others it comes after, and in some (Turkish, for example) it appears at the end of the sentence. Repustate accounts for all of these language-specific nuances.
The phrase “could not have been” is what we call an “amplifier”. Even though it contains a negation (“not”), it actually amplifies the term or phrase that follows it (e.g. “This vacation could not have been better”).
Conjunctions and subordinate clauses often work to contradict the component that precedes them. Consider the sentence “I wanted to like the movie, but it was so boring”. The first half indicates positive sentiment, but the conjunction “but” then counters it.
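Here is a minimal sketch of negation handling, assuming a simple rule: flip the sign of the nearest polar term within a small window of a negator. The `scope` parameter encodes the direction. English “not” usually precedes the term it negates, while in German “Ich mag Kaffee nicht” (“I don’t like coffee”) the negator follows the verb. The lexicon scores and window size are hypothetical, and in practice the direction depends on the construction as well as the language (German “nicht” precedes an adjective, as in “nicht gut”).

```python
from typing import Dict, List, Set

def apply_negation(
    tokens: List[str],
    polarity: Dict[str, float],
    negators: Set[str],
    scope: str = "after",
    window: int = 3,
) -> List[float]:
    """Flip the sign of the nearest polar term within `window` tokens of a negator.

    scope="after":  the negator precedes the term it negates (English "not like").
    scope="before": the negator follows the term (German "Ich mag Kaffee nicht").
    """
    scores = [polarity.get(tok, 0.0) for tok in tokens]
    for i, tok in enumerate(tokens):
        if tok not in negators:
            continue
        if scope == "after":
            candidates = range(i + 1, min(len(tokens), i + 1 + window))
        elif scope == "before":
            candidates = range(i - 1, max(-1, i - 1 - window), -1)
        else:
            raise ValueError(f"unknown scope: {scope!r}")
        for j in candidates:
            if scores[j] != 0.0:
                scores[j] = -scores[j]  # reverse the first polar term found
                break
    return scores

english = apply_negation("i do not like coffee".split(), {"like": 0.5}, {"not"}, scope="after")
german = apply_negation("ich mag kaffee nicht".split(), {"mag": 0.5}, {"nicht"}, scope="before")
print(english, german)
```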
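Amplifiers and contrastive conjunctions can be layered on in the same rule-based style. This sketch assumes, for illustration only, a hypothetical list of amplifier phrases that double the polar term immediately after them, and a rule that halves every score before the first “but”. The weights are invented, not Repustate’s.

```python
from typing import Dict, List

# Hypothetical amplifier phrases and weights, for illustration only.
AMPLIFIERS: List[List[str]] = [["could", "not", "have", "been"], ["so"], ["very"]]
AMPLIFY_FACTOR = 2.0       # multiplies the token right after an amplifier
PRE_CONTRAST_WEIGHT = 0.5  # down-weights everything before "but"

def weigh_constructs(tokens: List[str], polarity: Dict[str, float]) -> List[float]:
    scores = [polarity.get(tok, 0.0) for tok in tokens]
    i = 0
    while i < len(tokens):
        for phrase in AMPLIFIERS:
            end = i + len(phrase)
            if tokens[i:end] == phrase:
                # The "not" inside "could not have been" is consumed here,
                # so a separate negation step should skip these tokens.
                if end < len(tokens):
                    scores[end] *= AMPLIFY_FACTOR
                i = end - 1  # jump past the phrase (the loop adds 1 below)
                break
        i += 1
    if "but" in tokens:
        # The clause after the first "but" dominates; discount the clause before it.
        for j in range(tokens.index("but")):
            scores[j] *= PRE_CONTRAST_WEIGHT
    return scores

vacation = weigh_constructs("this vacation could not have been better".split(), {"better": 0.5})
movie = weigh_constructs(
    "i wanted to like the movie but it was so boring".split(), {"like": 0.5, "boring": -0.5}
)
print(sum(vacation), sum(movie))
```

Under these rules the vacation sentence sums to 1.0, and the movie review sums to -0.75 (“like” drops to 0.25 while “so boring” doubles to -1.0), so the clause after “but” wins.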
Negations, amplifiers and other grammatical constructs are what make determining sentiment complex. Repustate’s sentiment analysis handles these complexities quickly and thoroughly for all languages.
Step 5: Wrapping it all up using machine learning
Repustate uses machine learning to calculate a sentiment score that combines various factors, including the presence of terms with prior polarity, any negations, amplifiers, or other grammatical constructs, and the length of the text. Shorter text with a high ratio of polarizing to non-polarizing terms produces a score closer to the bounds of -1 (strongly negative) and 1 (strongly positive). A score of 0, or very close to 0 (within ±0.05), can be interpreted as neutral: either no sentiment was expressed or it was ambiguous.
This step-by-step approach allows our API to overcome some of the most complicated challenges in sentiment analysis.
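The article does not describe the model itself, so the following is only a hand-written stand-in for the learned combination. It assumes the per-term scores from the previous steps, divides their sum by the text length so that short, dense texts score more extremely, and squashes the result into (-1, 1) with `tanh`. `STEEPNESS` is a hypothetical constant that a trained model would effectively learn. The ±0.05 neutral band comes from the text above.

```python
import math
from typing import List

NEUTRAL_BAND = 0.05  # from the article: |score| <= 0.05 reads as neutral
STEEPNESS = 4.0      # hypothetical; a trained model would learn this weighting

def sentiment_score(term_scores: List[float]) -> float:
    """Combine per-token scores into one value in (-1, 1), favoring dense texts."""
    if not term_scores:
        return 0.0
    density = sum(term_scores) / len(term_scores)  # polarity per token
    return math.tanh(STEEPNESS * density)

def label(score: float) -> str:
    if abs(score) <= NEUTRAL_BAND:
        return "neutral"  # no sentiment, or positives and negatives cancel out
    return "positive" if score > 0 else "negative"

short = sentiment_score([1.0, 0.0])        # e.g. "love it"
long = sentiment_score([1.0] + [0.0] * 19)  # one "love" in a 20-token review
print(round(short, 3), round(long, 3), label(short), label(long))
```

A two-token “love it” scores about 0.96, while the same single “love” buried in a 20-token review scores about 0.20, in line with the observation that shorter, denser texts land closer to the bounds.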
What is Voice of the Customer?
Voice of the Customer, or VoC, is the practice of analyzing customer feedback to improve your product, solution, or service. VoC is typically gauged with customer survey tools and feedback systems. Most companies understand how customer feedback analysis can supercharge their customer experience efforts, but many organizations still rely on archaic methods to extract the data they need to understand what their customers are trying to say.
Marketing departments, PR agencies, and even some automated social media listening tools often use machine translators such as Google’s and Amazon’s to convert foreign-language text into English before running sentiment analysis on it. One major disadvantage of machine translation is its inability to pick up on cultural nuances, contextual clues, and local slang. The result is content that can feel mechanical, stilted, and full of errors.
By applying Voice of the Customer analysis to customer feedback, organizations can learn where they are doing a good job and where they need work. Gaining consumer insights doesn’t have to be overcomplicated; Repustate offers a 21st-century solution for getting the best results. Get into the minds of your customers using opinion mining and read between the survey lines. To be fully leveraged, these insights must be precise, which means analyzing feedback in the language in which it was originally expressed.
How is Multilingual Sentiment Analysis related to VoC?
A multilingual approach to sentiment analysis begins with a simple belief: to understand the Voice of the Customer, you must analyze consumer opinions and feelings in the language in which they were originally expressed. If the comments are in Portuguese, they must be analyzed in Portuguese, and so on. For practitioners of this approach to text analytics, translation and inaccuracy go hand in hand, and together they are the two most severe errors a brand marketer, product manager, or market researcher can commit.
Translating any text data, be it a social media post or a product review, can reduce sentiment analysis accuracy by almost 20%. That’s huge if you are using the results to guide change management. In any form of data analytics, inaccuracy is a cardinal sin.
Multilingual Semantic Similarity
Repustate provides sentiment analysis in 23 languages and dialects. The engine reads every language natively, thanks to dedicated natural language processing algorithms for each language and dialect. Because of this, and powered by AI and machine learning, Repustate’s sentiment analysis solution understands the semantic similarity between words and phrases across languages. This leads to accurate analysis of reviews and comments in any industry, from healthcare to restaurants.
Languages natively supported are English, Arabic, Portuguese, German, Dutch, Danish, Italian, Swedish, Finnish, Norwegian, Polish, Russian, French, Thai, Korean, Spanish, Urdu, Chinese, Turkish, Hebrew, Malaysian, Japanese, and Indonesian.
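The article does not specify how cross-language similarity is modeled. One common technique is to place words from different languages in a shared embedding space and compare them with cosine similarity. The sketch below shows only that comparison step. The three-dimensional vectors are made up for illustration, whereas real multilingual embeddings have hundreds of dimensions and are learned from data.

```python
import math
from typing import Dict, List

# Made-up vectors standing in for a shared multilingual embedding space.
EMBEDDINGS: Dict[str, List[float]] = {
    "en:excellent": [0.9, 0.1, 0.0],
    "es:excelente": [0.85, 0.15, 0.05],
    "en:terrible": [-0.8, 0.2, 0.1],
}

def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between u and v: 1 = same direction, -1 = opposite."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

same_meaning = cosine_similarity(EMBEDDINGS["en:excellent"], EMBEDDINGS["es:excelente"])
opposite = cosine_similarity(EMBEDDINGS["en:excellent"], EMBEDDINGS["en:terrible"])
print(round(same_meaning, 3), round(opposite, 3))
```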