Multilingual Sentiment Analysis For Customer Insights
Automation is key to operational and economic efficiency. But when machine translation does not pick up on the cultural subtleties, contextual references, and colloquialisms expressed in comments and reviews, we have to ask whether it really serves its purpose. A solution that relies on translation in its algorithm gives inaccurate results precisely for this reason - it cannot capture the nuances of the language it is translating. This article looks at the importance of a multilingual sentiment analysis solution that analyzes the Voice of the Customer (VoC) in the native language, without having to resort to translation.
Why we need multilingual sentiment analysis
English is the global language of business, but it is not the native language of a majority of the world’s customers. In fact, only 13% of the world speaks English. What about the other 87% of consumers? Businesses value the feedback of all their customers so that they can cater to all of them regardless of their geography and language. After all, the main goal of businesses is to acquire new customers, and keep the ones they already have.
Companies thus need to understand the feelings and opinions of their customers, even if they express them in their native language. Although great in theory, this is difficult for most marketers since the manual analysis of each comment and review is not only time-consuming but also a very expensive task in the long run. It is not a sustainable model.
That’s why it is vital to take a native, multilingual approach to sentiment analysis when identifying and analyzing customer emotions across social media, surveys, and customer service tickets. Thankfully, AI and machine learning have made automating VoC analysis simpler. All that companies need to ensure is that the model they choose offers the same high accuracy for every language they intend to use it with.
Language And The Challenge Of Translation
Translation is the process of transferring words or text from one language into another. Sounds simple, right? Wrong. Due to the complexities of language in general, and the specific differences between various languages, translation is rarely a word-for-word transfer from, say, English to Italian, or vice versa. Translations must deal with contrasts between basic linguistic building blocks such as grammar, syntax, semantics, lexicons, morphology, and even tonality in the case of languages like Mandarin, Thai and Punjabi. Add in the various literary devices that people use, such as idioms, sarcasm and slang, and it’s easy to see that accurately capturing the intended meaning of a text through translation can be a daunting task.
But when businesses try to know their consumers, what they really want to understand is the true intended meaning of customers’ feelings about prices, product features, customer service, quality, etc. In other words, when it comes to analyzing the Voice of the Customer, accuracy is everything.
Multilingual Sentiment Analysis
So what is multilingual sentiment analysis? Many of us know what sentiment analysis is: the identification, extraction and analysis of consumer feelings and opinions expressed through social media or customer surveys. Multilingual sentiment analysis extracts brand insights from customer feedback in the native language, without using translation. It is one of the most important features of a sentiment analysis tool.
How does Repustate perform Sentiment Analysis in 23 Languages?
Repustate uses a group of semantic technologies to calculate sentiment by applying language-specific rules to each piece of text. That means there isn’t one “true” algorithm; what works in English doesn’t necessarily apply to Arabic.
Repustate’s sentiment analysis is performed natively for each language. There are no intermediary translations, which means the accuracy is much higher.
With that said, regardless of the language being analyzed, each block of text goes through the following set of transformations:
Step 1: Part of speech tagging
This involves classifying each word at a grammatical level. That is, identify which words are nouns, verbs, adjectives, adverbs etc. and which words are objects of others. Repustate identifies conjunctions and subordinate clauses, prepositional phrases and noun phrases - all to help the Repustate engine “understand” the true meaning of the text.
Repustate has developed its own part of speech tagger for each language used. Part of speech tagging is done by first accumulating a massive corpus of pre-tagged text (i.e. humans have gone through and tagged words into their respective part of speech tags). With this information, Repustate trains a part of speech tagger and relies on probabilities to determine the correct part of speech for a given word in a given context. For example, the word “like” can be a verb (“I like you”) and a preposition (“He looks like his brother”). In the first case, “like” connotes positive sentiment, but in the second case, it does not.
In English, there are a handful of words with this double (or triple) meaning, where a word carries sentiment in some contexts and not in others. Arabic sentiment analysis is considerably more complex, because an Arabic word can have up to 12 different meanings depending on the surrounding context.
To perform sentiment analysis accurately, you need a very finely tuned and well-trained part of speech tagger. It must be language-specific, as some languages have a much more complex morphology than others.
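To make the idea of "relying on probabilities" concrete, here is a minimal sketch of a frequency-based tagger. It is not Repustate's tagger: the four-sentence corpus, the tag names, and the `train` and `tag` functions are all made up for illustration, and a production tagger would be trained on a vastly larger corpus with a richer model.

```python
from collections import Counter, defaultdict

# Toy hand-tagged corpus (hypothetical data, for illustration only).
TAGGED_CORPUS = [
    [("I", "PRON"), ("like", "VERB"), ("you", "PRON")],
    [("we", "PRON"), ("like", "VERB"), ("coffee", "NOUN")],
    [("he", "PRON"), ("looks", "VERB"), ("like", "ADP"),
     ("his", "PRON"), ("brother", "NOUN")],
    [("she", "PRON"), ("sings", "VERB"), ("like", "ADP"),
     ("a", "DET"), ("bird", "NOUN")],
]


def train(corpus):
    """Count how often each word gets each tag, with and without the previous tag."""
    by_context = defaultdict(Counter)  # (previous tag, word) -> tag counts
    by_word = defaultdict(Counter)     # word -> tag counts
    for sentence in corpus:
        prev_tag = "<START>"
        for word, tag_name in sentence:
            key = word.lower()
            by_context[(prev_tag, key)][tag_name] += 1
            by_word[key][tag_name] += 1
            prev_tag = tag_name
    return by_context, by_word


def tag(words, model):
    """Greedily pick the most frequent tag for each word given the previous tag."""
    by_context, by_word = model
    prev_tag = "<START>"
    tagged = []
    for word in words:
        key = word.lower()
        # Prefer counts that match the context; fall back to word-only counts.
        counts = by_context.get((prev_tag, key)) or by_word.get(key)
        best = counts.most_common(1)[0][0] if counts else "UNK"
        tagged.append((word, best))
        prev_tag = best
    return tagged


model = train(TAGGED_CORPUS)
print(tag(["I", "like", "coffee"], model))
print(tag(["he", "looks", "like", "his", "brother"], model))
```

In this toy model, “like” is tagged as a verb after a pronoun and as a preposition (ADP) after a verb, which is the distinction the sentiment step depends on.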
Step 2: Lemmatization
The next step is to lemmatize each word where applicable. Lemmatization is the process of reducing a word to its root (dictionary) form. For example, “loved”, “loving” and “loves” are all based on the root word “love”. To make sure no word goes unanalyzed, a proper lemmatizer is required and, again, it must be language-specific. The rules for inflecting nouns and conjugating verbs based on number, gender, tense etc. differ wildly from language to language. Repustate handles all of this for each language.
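As a rough illustration of why the lemmatizer has to be language-specific, the sketch below keeps a separate rule set per language. Everything in it (the `LEXICONS` table, its tiny vocabularies, and the `lemmatize` function) is hypothetical and heavily simplified; real lemmatizers use much larger dictionaries and take the part of speech into account.

```python
# Hypothetical, heavily simplified per-language resources.
LEXICONS = {
    "en": {
        "lemmas": {"love", "walk", "go"},
        "irregular": {"went": "go"},
        # (suffix to strip, replacement to append), tried in order
        "suffixes": [("ing", ""), ("ing", "e"), ("ed", ""), ("ed", "e"), ("s", "")],
    },
    "es": {
        "lemmas": {"amar", "hablar", "ir"},
        "irregular": {"fui": "ir"},
        "suffixes": [("ando", "ar"), ("amos", "ar"), ("o", "ar")],
    },
}


def lemmatize(word, language):
    """Return the dictionary form of a word using that language's rules."""
    lexicon = LEXICONS[language]
    key = word.lower()
    if key in lexicon["irregular"]:
        return lexicon["irregular"][key]
    if key in lexicon["lemmas"]:
        return key
    for suffix, replacement in lexicon["suffixes"]:
        if key.endswith(suffix):
            candidate = key[: -len(suffix)] + replacement
            # Accept a candidate only if it is a known lemma.
            if candidate in lexicon["lemmas"]:
                return candidate
    return key  # unknown words are returned unchanged


print([lemmatize(w, "en") for w in ["loved", "loving", "walked", "went"]])
print([lemmatize(w, "es") for w in ["amando", "hablamos", "fui"]])
```

Checking each candidate against a list of known lemmas keeps the suffix rules from producing non-words, at the cost of needing a vocabulary for every language.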
Step 3: Prior Polarity
There are many words that even without any surrounding context, immediately connote sentiment. Words like “love”, “hate”, “despise” etc. have an immediate polarizing effect. Sentiment analysis relies on having an exhaustive list of terms that have prior polarity in order to provide a foundation for determining sentiment.
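A prior polarity list can be pictured as a per-language lookup table. The sketch below assumes the text has already been reduced to lemmas; the `PRIOR_POLARITY` entries, their weights, and the `polar_terms` function are invented for illustration and are not Repustate's actual lexicon.

```python
# Hypothetical polarity lexicons; the weights are illustrative only.
PRIOR_POLARITY = {
    "en": {"love": 1.0, "hate": -1.0, "despise": -1.0, "great": 0.8, "boring": -0.6},
    "es": {"amar": 1.0, "odiar": -1.0},
}


def polar_terms(lemmas, language):
    """Return (lemma, polarity) pairs for every lemma found in the lexicon."""
    lexicon = PRIOR_POLARITY[language]
    return [(lemma, lexicon[lemma]) for lemma in lemmas if lemma in lexicon]


print(polar_terms(["i", "love", "this", "boring", "film"], "en"))
```

On its own this lookup ignores context entirely, which is why it only serves as a foundation for the later steps.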
Step 4: Negations, amplifiers & other grammatical constructs
We almost have all the tools we need, but if we stopped right here, Repustate’s sentiment analysis would be inaccurate in any but the most trivial cases. Repustate now layers on nuanced grammatical aspects, unique to each language, including negations and amplifiers.
A negation reverses the polarity of the following (or sometimes, preceding) term. Consider the difference between “I like coffee” and “I do not like coffee”. In some languages, the negation comes first, in some, it comes after, and in some it appears at the end of the sentence (Turkish for example). Repustate is aware of all these language specific nuances.
The phrase “could not have been” is what we call an “amplifier”. Even though it contains a negation (“not”), what it actually does is amplify the term or phrase that follows it (e.g. “This vacation could not have been better”).
Conjunctions and subordinate clauses often act to contradict their preceding component. Consider the phrase “I wanted to like the movie, but it was so boring”. The first half of the sentence indicates positive sentiment, but the conjunction “but” then works to counter that sentiment.
Negations, amplifiers and other grammatical constructs are what make determining sentiment complex. Repustate’s sentiment analysis handles these complexities quickly and thoroughly for all languages.
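The examples in this step can be traced with a small English-only sketch. It is a toy under stated assumptions, not Repustate's engine: the word lists, the weights `AMPLIFIER_FACTOR` and `CONTRAST_WEIGHT`, and both functions are hypothetical, and the negation rule only covers languages where the negation precedes the term it modifies.

```python
# Hypothetical English-only resources; all weights are illustrative.
POLARITY = {"like": 0.6, "love": 1.0, "better": 0.5, "good": 0.6, "boring": -0.7}
NEGATIONS = {"not", "never", "no"}
AMPLIFIERS = [("could", "not", "have", "been")]
AMPLIFIER_FACTOR = 2.0   # how strongly an amplifier boosts the next polar term
CONTRAST_WEIGHT = 0.25   # how much the clause before "but" still counts


def clause_score(tokens):
    """Sum term polarities, applying negations and amplifiers to the next polar term."""
    total = 0.0
    modifier = 1.0
    i = 0
    while i < len(tokens):
        # Check multi-word amplifiers first so their "not" is not read as a negation.
        matched = next(
            (a for a in AMPLIFIERS if tuple(tokens[i:i + len(a)]) == a), None
        )
        if matched:
            modifier *= AMPLIFIER_FACTOR
            i += len(matched)
            continue
        token = tokens[i]
        if token in NEGATIONS:
            modifier *= -1.0
        elif token in POLARITY:
            total += modifier * POLARITY[token]
            modifier = 1.0  # a modifier applies to one polar term only
        i += 1
    return total


def sentence_score(text):
    """Score a sentence, down-weighting whatever precedes the first 'but'."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    if "but" in tokens:
        split = tokens.index("but")
        before = clause_score(tokens[:split])
        after = clause_score(tokens[split + 1:])
        return CONTRAST_WEIGHT * before + after
    return clause_score(tokens)


for sentence in [
    "I like coffee",
    "I do not like coffee",
    "This vacation could not have been better",
    "I wanted to like the movie, but it was so boring",
]:
    print(sentence, "->", round(sentence_score(sentence), 2))
```

Matching the amplifier phrase before looking for single-word negations is the key design choice here: otherwise the “not” inside “could not have been” would flip the sentiment instead of strengthening it.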
Step 5: Wrapping it all up using machine learning
Repustate uses machine learning to calculate a sentiment score that combines various factors, including the presence of terms with prior polarity, any negations, amplifiers or other grammatical constructs, as well as the length of the text. Shorter text with a high ratio of polarizing terms to non-polarizing terms leads to a score closer to the bounds of -1 (strongly negative) and 1 (strongly positive). A score of 0 or very close to 0 (±0.05) can be interpreted as neutral; either no sentiment was expressed or it was ambiguous.
This step-by-step approach allows our API to overcome some of the most complicated sentiment analysis challenges.
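The score range and the neutral band described above can be read as follows. The `label` function applies the ±0.05 band from the text; `toy_score` is a hand-written stand-in for the learned model, included only to show how the ratio of polarizing terms to text length can push a score toward the bounds. Both names are hypothetical and the formula is an assumption, not Repustate's model.

```python
NEUTRAL_BAND = 0.05  # scores within ±0.05 of zero are treated as neutral


def toy_score(term_polarities, token_count):
    """Stand-in for a learned model: net polarity divided by text length, clipped to [-1, 1]."""
    if token_count == 0:
        return 0.0
    raw = sum(term_polarities) / token_count
    return max(-1.0, min(1.0, raw))


def label(score):
    """Map a score in [-1, 1] to a sentiment label."""
    if abs(score) <= NEUTRAL_BAND:
        return "neutral"
    return "positive" if score > 0 else "negative"


# One strongly positive term in a 2-word text versus the same term in a 40-word text.
print(label(toy_score([1.0], 2)))
print(label(toy_score([1.0], 40)))
print(label(toy_score([-1.0, -0.7], 2)))
```

In this simplified version, a single polar word dominates a short text but is diluted to neutral in a long one, which mirrors the behavior described for short, highly polarized text.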
Monolingual Sentiment Analysis And Multilingual Sentiment Analysis
Monolingual sentiment analysis is the process of emotion mining from data that is in a single language. Multilingual sentiment analysis refers to the process of gathering emotion insights from data that may be in several languages. This is usually the case when extracting sentiment from customer feedback data from a multi-ethnic audience or from different geographical locations with a homogenous customer base.
Monolingual Vs Multilingual Sentiment Analysis
While monolingual sentiment analysis can be fairly easy, analyzing sentiment in multilingual customer feedback or brand experience data needs additional effort to ensure that you get accurate insights from the data. A sentiment analysis platform needs to have the capability to read, understand, and analyze every language in its native tongue.
This means that the model needs part-of-speech taggers for each language it processes. As a marketer, this is of utmost importance because otherwise you will end up with software on your hands that uses machine translation for multilingual sentiment analysis. The most critical disadvantage of a sentiment analysis tool that uses machine translation is its inability to pick up on cultural nuances, contextual clues, and local slang. This is because grammar rules, semantics, sentence structures, etc. can vary enormously between language groups (Germanic, Romance, etc.). Machine-translated data can thus feel mechanical, stilted, and full of errors, be it a social media post or a product review, and can reduce sentiment analysis accuracy by almost 20%.
This may not be a major issue if you are using machine translations for the purposes of travel or leisure, but it’s a huge one if you are looking to use the results to guide your growth and management strategies.
What Are The Different Types Of Sentiment Analysis?
There are three main types of sentiment analysis based on the granularity of insights you need.
1. Document-based sentiment analysis
In document-based multilingual data analysis, emotion mining is conducted based on word representation, the composition of the document, and sentence structures. This kind of sentiment analysis is apt for documents with short answers such as surveys with character limits or forced-choice surveys, so long as the responses are straightforward.
2. Topic-based sentiment analysis
This type of sentiment analysis is used for consumer feedback data that is more descriptive in nature. Multilingual sentiment analysis of such data means that native natural language processing tasks analyze customer reviews and comments by breaking down the text into topics that it extracts from the data.
For example, in banking reviews, topics may be “online banking”, “customer service”, or “deposits”, and for a restaurant, it may be “food”, “convenience”, or “ambiance”, etc. Sentiment analysis thus derived can give marketers and companies a very good idea as to what kind of improvements they need to make to make customers happy.
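To show what a per-topic result might look like, here is a minimal sketch that groups already-scored sentences by topic and averages their scores. It assumes topics are detected with fixed keyword sets, which is a simplification for illustration; the `TOPIC_KEYWORDS` table, the sample sentences, and their scores are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical keyword sets per topic (banking example).
TOPIC_KEYWORDS = {
    "online banking": {"app", "website", "login"},
    "customer service": {"agent", "support", "staff"},
    "deposits": {"deposit", "deposits"},
}


def topic_sentiment(scored_sentences):
    """Average sentence scores per topic; input is (sentence, score) pairs."""
    buckets = defaultdict(list)
    for sentence, score in scored_sentences:
        words = set(sentence.lower().replace(".", " ").replace(",", " ").split())
        for topic, keywords in TOPIC_KEYWORDS.items():
            if words & keywords:  # the sentence mentions this topic
                buckets[topic].append(score)
    return {topic: mean(scores) for topic, scores in buckets.items()}


reviews = [
    ("The app keeps crashing at login.", -0.8),
    ("Support staff were friendly.", 0.7),
    ("The website is fast.", 0.4),
]
print(topic_sentiment(reviews))
```

Topics that no sentence mentions (here, “deposits”) simply do not appear in the result, so a marketer sees scores only for what customers actually talked about.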
3. Aspect-based sentiment analysis
Aspect-based sentiment analysis provides the most granular insights because it further breaks down topics and categorizes the most important aspects it discovers in them. Thus, from the topic “online banking”, it can find elements that it can categorize into different aspects such as “online balance checks”, “online transactions”, “customer care”, and so on. Doing so gives even more insight into common customer grievances, as well as the chance to prioritize investment in areas that are working well or need attention. Multilingual sentiment analysis can obtain rich insights from aspect-based sentiment analysis only if the data is processed by a tool using native part-of-speech taggers. This is the only way you can be sure that you harness all aspects of your data for fine-grained customer emotion mining.
Multilingual Semantic Similarity
Repustate provides a sentiment analysis API in 23 languages and dialects. The engine reads all the languages natively, thanks to dedicated natural language processing algorithms for each language and dialect. Because of this ability, and powered by AI and machine learning, Repustate’s sentiment analysis solution understands the semantic similarity between different languages. This leads to accurate analysis of reviews and comments in any industry, from healthcare to restaurants.
Languages natively supported are English, Arabic, Portuguese, German, Dutch, Danish, Italian, Swedish, Finnish, Norwegian, Polish, Russian, French, Thai, Korean, Spanish, Urdu, Chinese, Turkish, Hebrew, Malaysian, Japanese, and Indonesian.
Repustate’s AI-powered sentiment analysis solution for customer experience, brand experience, and employee experience analyzes data in 23 languages and dialects through native natural language processing. Built to scale, the platform has a processing speed of 1,000 reviews per second and provides accuracy in insights upwards of 85% for multilingual sentiment analysis.