Top 8 Text Analytics Tools for Natural Language Processing
In this article, we’ll look at 8 Text Analytics tools and the most important text analytics features. We’ll also compare them based on the accuracy and semantic similarity of the results they generate.
From 2019 to 2020 there was a significant uptick in the use of text analytics tools (an 11% increase), with 61% of organizations using text analytics as a tool for market research in 2020. Given the power of text analytics to generate useful insight into customer behavior and opinion, this comes as no surprise.
Text analytics is an AI-powered technique that uses natural language processing (NLP) to extract company-relevant information from large volumes of data, which can then be passed on for sentiment analysis. It includes named entity recognition (NER), which ensures that important information specific to your industry or brand is recognized and extracted. For example, NER can pick up a name and identify it as a person, brand, place, and so on. The richer the NER, the richer the information. For instance, given a sportsperson’s name, a rich NER capability can tell you their career, where they were born, which club they play for, and which politician or charity they support.
Which are the most important text analytics features?
Here are some of the key features of text analytics software:
Accuracy: Offers accurate results based on its training
Granularity: Identifies individual elements for feature and aspect classification
Multilingual: Automatic language detection and analysis
Speed: High data processing speed even at scale
Knowledge graph: Maps the relationships between entities (people, places, events)
Dashboards: Presents insights on easy-to-understand, customizable dashboards
Ease of use: Can be operated by non-IT staff
Installation: Installs simply and does not need third-party supporting technology
Text Analytics versus Sentiment Analysis
Pure text analytics does not derive emotion from the text it breaks down. Instead, it performs a semantic search, extracts data, and characterizes useful bits of text to form patterns. Sentiment analysis, on the other hand, is the next step. It discerns meaning in the emotional content of a piece.
To explain text analysis, let’s take a simple sentence.
The Adventures of Sherlock Holmes is an outstanding collection of short stories by Scottish-born author Sir Arthur Conan Doyle
Text analytics will produce the following information, categorizing it by using NER:
“The Adventures of Sherlock Holmes” = Title
“Arthur Conan Doyle” = Author
“Sir” = Honorific
“Collection of Short Stories” = Product Type
“Scottish-born author” = Product Detail
Sentiment analysis then picks out the adjective (“outstanding”), which denotes a value judgment, and categorizes the mention as positive. In terms of sentiment, the sentence above expresses approval.
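The two steps described above can be illustrated with a minimal Python sketch. It is a toy only: the function names (`tag_entities`, `score_sentiment`), the hand-written rules, and the small word lists are all assumptions made for this illustration, and none of the tools in this article work this way internally. Production systems use trained models rather than fixed patterns.

```python
import re

# Toy lexicons, invented for this illustration only.
HONORIFICS = {"Sir", "Dame", "Dr", "Mr", "Mrs", "Ms"}
POSITIVE = {"outstanding", "excellent", "great", "good"}
NEGATIVE = {"terrible", "awful", "poor", "bad"}

SENTENCE = (
    "The Adventures of Sherlock Holmes is an outstanding collection of "
    "short stories by Scottish-born author Sir Arthur Conan Doyle"
)


def tag_entities(sentence):
    """Label parts of a book-review sentence using two hand-written rules."""
    entities = {}
    # Rule 1: treat everything before the first " is " as the title.
    title, separator, _ = sentence.partition(" is ")
    if separator:
        entities["Title"] = title.strip()
    # Rule 2: capitalised words following "author" form the name,
    # with a leading honorific split off if it is in the lexicon.
    match = re.search(r"\bauthor\s+((?:[A-Z][a-z]+\.?\s*)+)", sentence)
    if match:
        words = match.group(1).split()
        if words[0].rstrip(".") in HONORIFICS:
            entities["Honorific"] = words[0]
            words = words[1:]
        if words:
            entities["Author"] = " ".join(words)
    return entities


def score_sentiment(sentence):
    """Count lexicon hits and map the balance to a coarse label."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(tag_entities(SENTENCE))
print(score_sentiment(SENTENCE))
```

The split mirrors the distinction drawn in this section: `tag_entities` only extracts and labels text, while `score_sentiment` is the separate step that assigns a value judgment.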
It’s important not to confuse text analytics with sentiment analysis, since the two serve different purposes. Let’s look at the most widely used text mining solutions, and what exactly they do.
Top 8 Text Analytics Tools
The Repustate Text Analytics API is designed to work across as wide a range of media as possible, including search inside video and social media listening.
Wherever text can be found, whether it’s on-screen captions in YouTube videos, audio transcripts of podcasts, or Amazon product reviews, our text analytics tools can derive the insight you need. The solution reads 23 languages natively, without using translations, thus giving you accurate results in all languages. It also has the most robust NER capability.
Undeniably powerful, since it can draw on Google’s search algorithms, Google’s NLP offering includes what it calls AutoML (automated machine learning), whereby the algorithm becomes more efficient over time without you having to reprogram it.
In our benchmarking test, it scored 75% accuracy, significantly higher than many of its competitors. It also performed well on the granularity of the information it was able to derive, but it did not do well on the speed of data gathering.
Another big contender is Microsoft, whose cloud-based APIs do not require user expertise in machine learning. Even so, this is really a set of tools for developers rather than market researchers or salespeople. Microsoft offers comprehensive blogs, learning modules and support for anyone keen to get to grips with Azure.
However, you may want a more specialist tool for pure text analytics. In our benchmarking test, Azure scored an accuracy rating of just 50% across six languages. As with Google’s NLP offering, this cloud-based solution also has a lag in data retrieval, and at 3210 milliseconds it is not inconsiderable.
Dandelion fares a little better in benchmarking on both accuracy and speed (63% and 250ms respectively).
Semantic similarity is one useful feature Dandelion includes, facilitating news gathering and anti-plagiarism uses. It will tell users whether compared pieces of text carry the same meaning, even if their sentence structure differs widely.
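To make the idea of a similarity score concrete, here is a minimal Python sketch. It is not Dandelion’s method: it uses bag-of-words cosine similarity, a much cruder stand-in that ignores word order but knows nothing about meaning. The function names are assumptions for this illustration. A genuinely semantic comparison would typically rely on learned text embeddings instead of raw word counts.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count how often each word occurs."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Return the cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    counts_a, counts_b = bag_of_words(text_a), bag_of_words(text_b)
    # Only words present in both texts contribute to the dot product.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


# Same words in a different order score as near-identical.
print(cosine_similarity("the report was published on Monday",
                        "on Monday the report was published"))
# Texts with no words in common score zero, even if they mean the same thing.
print(cosine_similarity("the film was superb", "an excellent movie"))
```

The second call shows the limit of the word-count approach: two phrases with the same meaning but different vocabulary score zero, which is exactly the gap that a semantic similarity feature is meant to close.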
Dandelion has also bundled in a sentiment analysis API, making this a powerful set of tools with which to analyze a piece of text. That said, its comparatively low accuracy score may indicate that a lot of parameters must be tweaked to return all the data you need.
Describing itself as a source of “end-to-end news intelligence”, Aylien will search over 80,000 significant news outlets for brand mentions. It has already categorized more than 300 million news articles in its database, to which it continues to add. This saves users time, as they don’t have to cover ground that has been analyzed previously.
Aylien is also reasonably fast, returning results within 150ms in our benchmarking test. The app’s media analytics reports are easy to read and attractively designed.
With an accuracy rating of just 42%, however, the value of its analysis is open to question. There may be a lot of junk data to weed out from initial results.
Amazon’s Comprehend has a vast archive of Amazon product reviews to draw upon. At 67% accuracy, it proved to be one of the strongest contenders in our competitive benchmarking tests.
Uniquely, Amazon offers a Comprehend Medical version to help search for often-challenging medical terms (especially when spelled incorrectly), which may prove useful for brands operating in the competitive supplement or lifestyle space.
A big benefit of going with Amazon would be its wider integration into the ever-expanding landscape of AWS products. For instance, it has recently released an audio transcription API, Amazon Transcribe Call Analytics, which could help you improve the performance of your support line or call center, or focus on product improvements.
SpaCy is unusual in that it runs on local servers only, which explains its speed relative to the other text analytics solutions we tested: it can return results on local data in 30ms. Version 3.0 has recently been released, which may improve on the 45% accuracy we recorded in benchmarking.
It’s undeniably sophisticated but, as an open-source API library built on Python, it does require some programming expertise to get it to work well. If you want to run experimental NLP searches, it’s a great facility. However, if you’re looking for a user-friendly tool for your market research department, this may not be the right app for you.
Running on both AWS cloud and physical hardware, TextRazor is built in C++ and permits limitless customization in terms of both keywords and logic. This will allow you to build very specific custom searches.
Unlike SpaCy, TextRazor does not require programming expertise to use. In our benchmark test, it achieved 61% accuracy across 12 languages, placing it squarely in the middle for performance.
The use cases given on their website include classification, custom entities and disambiguation, debugging, and reference. It may not be the right tool if you are more interested in analyzing news stories or product reviews, which by their nature are highly variable in terms of style, grammar, spelling, and vocabulary.
Text Analytics – A Powerful Tool
In the race for customers, text mining solutions and analytics are invaluable additions to your toolkit. With the vast profusion of media which constitutes today’s internet, you want to remain on top of all your product mentions, queries, complaints, reviews, and news stories. A tool such as Repustate’s Text Analysis API can do all that – and more. Text Analytics is indeed the must-have tool in your toolbox.
Contact us for more information on how a text analytics solution can help you.