Everything you need to know about sentiment analysis accuracy
When potential customers approach us, one of the first questions we’re asked is “How accurate is your sentiment analysis engine?” Well, as any good MBA graduate would tell you, the correct answer is “It depends.” That might sound like a cop-out and a way to avoid answering the question, but in fact it’s the most accurate response one can give.
Let’s see why. Take this sentence:
“It is sunny today.”
Positive or negative? Most would perhaps say positive. Repustate would score it as neutral, because no opinion or intent is being stated, merely a fact. For those who would argue that this sentence is positive, let’s tweak it:
“It is sunny today, my crops are getting dried out and will cause me to go bankrupt.”
Well, that changes things, doesn’t it? Placed in a wider context, the polarity (positive or negative sentiment) of a phrase can change dramatically. That’s why it is difficult to achieve 100% accuracy with sentiment.
Let’s take a look at another example. From a review of a horror movie:
“It will make you quake in fear.”
Positive or negative? Well, for a horror movie, this is positive, because horror movies are supposed to be scary. But if we were talking about watching a graphic video about torture, the sentiment would be negative. Again, context matters here more than the actual words themselves.
What about when we introduce conjunctions into the mix:
“I thought the movie was great, but the popcorn was stale and too salty.”
The first part of the sentence is positive, but the second part is negative. So what is the sentiment of this sentence? At Repustate, we would say “It depends.” The sentiment for the movie is positive; the sentiment surrounding the popcorn is negative. The question is: which topic are you interested in analyzing?
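To make the idea of per-topic sentiment concrete, here is a minimal Python sketch. It is a toy illustration, not Repustate’s engine or API: the tiny lexicon, its weights, and the function names `score` and `sentiment_by_topic` are all assumptions made up for this example. It splits the sentence at contrastive conjunctions and scores each clause only for the topics that clause mentions.

```python
import re

# Toy lexicon invented for this example; real engines use far richer models.
LEXICON = {"great": 1, "stale": -1, "salty": -1}


def score(text):
    """Sum the lexicon weights of the words in `text` (unknown words count as 0)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


def sentiment_by_topic(sentence, topics):
    """Split on contrastive conjunctions and score each clause per topic it mentions."""
    clauses = re.split(
        r",?\s+\b(?:but|however|although)\b\s+", sentence, flags=re.IGNORECASE
    )
    result = {}
    for clause in clauses:
        for topic in topics:
            if topic in clause.lower():
                result[topic] = result.get(topic, 0) + score(clause)
    return result


sentence = "I thought the movie was great, but the popcorn was stale and too salty."
scores = sentiment_by_topic(sentence, ["movie", "popcorn"])
print(scores)           # per-topic scores
print(score(sentence))  # single score for the whole sentence
```

With this toy lexicon, the movie clause comes out positive and the popcorn clause negative, while the single whole-sentence score blends the two and hides both opinions. That is why the topic you care about has to be part of the question.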
There is no one TRUE sentiment engine
As the previous examples have demonstrated, sentiment analysis is tricky and highly contextual. As a result, any benchmarks that companies post have to be taken with a pinch of salt. Repustate’s approach is the following:
- Make our global model as flexible as possible to catch as many cases as possible without being too inconsistent
- Allow customers to load their own custom, domain-specific definitions via our API (e.g. make the phrase “quake in fear” positive for movie reviews)
- Allow sentiment to be scored in the context of only one topic, again, via our API
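The second point, domain-specific definitions, can be sketched in a few lines of Python. This is a hedged illustration only: it does not show Repustate’s actual API, and the lexicon entries, the `horror_reviews` domain name, and the `score_with_domain` function are hypothetical names chosen for this example. The idea is that phrase-level overrides for a domain take precedence over the general word-level scores.

```python
import re

# Toy general-purpose lexicon; entries and weights are invented for illustration.
BASE_LEXICON = {"fear": -1, "great": 1, "stale": -1}

# Hypothetical domain-specific phrase overrides supplied by a customer.
DOMAIN_OVERRIDES = {"horror_reviews": {"quake in fear": 2}}


def score_with_domain(text, domain=None):
    """Score `text`, letting a domain's phrase overrides win over the base lexicon."""
    text = text.lower()
    total = 0
    for phrase, weight in DOMAIN_OVERRIDES.get(domain, {}).items():
        total += weight * text.count(phrase)
        # Remove the matched phrase so its words are not scored a second time.
        text = text.replace(phrase, " ")
    words = re.findall(r"[a-z']+", text)
    total += sum(BASE_LEXICON.get(word, 0) for word in words)
    return total


review = "It will make you quake in fear."
general_score = score_with_domain(review)
horror_score = score_with_domain(review, domain="horror_reviews")
print(general_score, horror_score)
```

Under these assumptions the same review scores negative with the general lexicon and positive once the horror-review override is loaded, which mirrors the “quake in fear” example above.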
When shopping for sentiment analysis solutions, ask potential vendors about these points. Make sure whichever solution you end up going with can be tailored to your specific domain, because chances are your data has unique attributes and characteristics that might be overlooked or incorrectly accounted for by a one-size-fits-all sentiment engine.