Until this point, Repustate has been concerned with analyzing text structurally. Part-of-speech tagging, grammatical analysis, even sentiment analysis are really all about the structure of the text: the order in which words come, and the use of conjunctions, adjectives or adverbs to denote sentiment. All of this is a great first step in understanding the content around you, but it's just that, a first step.
Today we're proud and excited to announce Semantic Analysis by Repustate. We consider this the biggest product release in Repustate's history and the one we're most proud of (although Arabic sentiment analysis was a doozy as well!).
Repustate can determine the subject matter of any piece of text. We know that a tweet saying "I love shooting hoops with my friends" has to do with sports, namely, basketball. Using Repustate's semantic analysis API you can now determine the theme or subject matter of any tweet, comment or blog post.
But beyond just identifying the subject matter of a piece of text, Repustate can dig deeper and understand each and every key entity in the text and disambiguate based on context.
Repustate's semantic analysis tool extracts each and every one of these entities and tells you the context. Repustate knows that the term "Obama" refers to "Barack Obama", the President of the United States. Repustate knows that in the sentence "I can't wait to see the new Scorsese film", "Scorsese" refers to "Martin Scorsese", the director. With very little context (and sometimes no context at all), Repustate knows what an arbitrary piece of text is talking about. Take the following examples:
Here we have three instances where the term "Obama" is used in different contexts. In the first example, there is no context, just the name "Obama". Repustate uses its internal probability model to determine that the most likely usage of this term is in the name "Barack Obama", hence an API call will return "Barack Obama". Similarly, in the second example, the word "President" acts as a hint for the single term "Obama" and, again, the API call will return "Barack Obama". But what about the third example?
Here, Repustate is smart enough to see the phrase "First Lady". This tells Repustate to select 'Michelle Obama' instead of Barack. Pretty neat, huh?
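To make the idea concrete, here is a minimal sketch of how a prior probability combined with context hints could pick between candidates. It is not Repustate's actual model or API: the `PRIORS` and `CONTEXT_CUES` tables, their numbers, the `disambiguate` function and the sample sentences are all hypothetical, made up purely for illustration.

```python
# Toy entity disambiguation: prior probability multiplied by context-cue boosts.
# All names and numbers here are illustrative assumptions, not Repustate's data.

# Hypothetical priors: how likely a bare term is to refer to each entity.
PRIORS = {
    "obama": {"Barack Obama": 0.9, "Michelle Obama": 0.1},
}

# Hypothetical context cues: phrases that boost a candidate when present.
CONTEXT_CUES = {
    "Barack Obama": {"president": 5.0},
    "Michelle Obama": {"first lady": 50.0},
}


def disambiguate(term, text=""):
    """Return the most likely entity for `term` given the surrounding `text`."""
    candidates = PRIORS.get(term.lower())
    if not candidates:
        return None  # unknown term: nothing to disambiguate

    lowered = text.lower()
    scores = {}
    for entity, prior in candidates.items():
        score = prior
        # Multiply in the weight of every cue phrase found in the text.
        for cue, weight in CONTEXT_CUES.get(entity, {}).items():
            if cue in lowered:
                score *= weight
        scores[entity] = score

    # Pick the candidate with the highest combined score.
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(disambiguate("Obama"))
    print(disambiguate("Obama", "President Obama spoke today"))
    print(disambiguate("Obama", "First Lady Obama visited a school"))
```

With no context the prior alone decides, so the bare term resolves to the most common referent. A strong enough cue such as "First Lady" outweighs the prior and flips the result. A real system would learn these weights from data rather than hard-code them.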
Like every other feature Repustate offers, no language takes a back seat, and that's why semantic analysis will be available in every language Repustate supports. Currently only English is publicly available, but we're rolling out every other language in the coming weeks.
Repustate's ontology currently has over 5.5 million entities, including people, places, brands, companies and ideas. There are over 500 categorizations of entities, and over 30 themes with which to classify a piece of text's subject matter. And an infinite number of ways to use Repustate to transform your text analysis.
Head on over to the Semantic Analysis Tour to see more.