Grammar rules vary from one language to another, including the rules of verb conjugation, subject-verb agreement and negation.
Russian differs from English in a number of ways. Applying the techniques and language models that work for English sentiment analysis to Russian text would yield highly inaccurate results.
That's why Repustate developed Russian-specific tools for Russian sentiment analysis, including a Russian part-of-speech tagger, a Russian lemmatizer and, of course, Russian-specific sentiment models.
Russian part-of-speech tagging allows Repustate to narrow in on where sentiment may lie within a block of text. Verbs, nouns and adjectives provide the cues necessary to determine sentiment.
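As a rough illustration of that narrowing step, the sketch below keeps only the nouns, verbs and adjectives from text that has already been tagged. The tag names (Universal POS style), the hand-tagged sample sentence and the function name `sentiment_candidates` are assumptions made for this example, not part of Repustate's API.

```python
# Assumption: tokens arrive as (token, tag) pairs with Universal POS style tags.
SENTIMENT_BEARING_TAGS = {"NOUN", "VERB", "ADJ"}


def sentiment_candidates(tagged_tokens):
    """Return the tokens tagged as a noun, verb, or adjective."""
    return [token for token, tag in tagged_tokens if tag in SENTIMENT_BEARING_TAGS]


# Hand-tagged sample: "Этот фильм очень понравился зрителям"
# ("The audience liked this film very much").
tagged = [
    ("Этот", "DET"),
    ("фильм", "NOUN"),
    ("очень", "ADV"),
    ("понравился", "VERB"),
    ("зрителям", "NOUN"),
]

print(sentiment_candidates(tagged))
```

Only the words that survive this filter would then need to be examined for sentiment.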
Creating a fast and accurate Russian part-of-speech tagger requires a massive corpus of manually tagged Russian text. This text can then be fed into a machine learning algorithm to train the tagger.
The larger the corpus and, more importantly, the more varied the corpus, the better the resulting Russian part-of-speech tagger. Repustate has created a massive corpus of Russian text drawn from a variety of sources to ensure good coverage.
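To make the idea of "tagged corpus in, tagger out" concrete, here is a minimal sketch of a most-frequent-tag baseline. It is not Repustate's algorithm; it is a deliberately simple stand-in, and the three-sentence corpus, the function names and the fallback tag are all assumptions for illustration.

```python
from collections import Counter, defaultdict


def train_unigram_tagger(tagged_sentences):
    """Learn, for each word, the tag it carries most often in the corpus."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tag_counts.most_common(1)[0][0] for word, tag_counts in counts.items()}


def tag(model, words, default_tag="NOUN"):
    """Tag each word with its learned tag; unseen words get default_tag."""
    return [(word, model.get(word.lower(), default_tag)) for word in words]


# Hypothetical, tiny hand-tagged corpus; a real one would be vastly larger.
corpus = [
    [("хороший", "ADJ"), ("фильм", "NOUN")],
    [("плохой", "ADJ"), ("сервис", "NOUN")],
    [("фильм", "NOUN"), ("понравился", "VERB")],
]

model = train_unigram_tagger(corpus)
print(tag(model, ["Плохой", "фильм"]))
```

Any word missing from the training data falls back to the default tag, which shows in miniature why a larger and more varied corpus gives better coverage.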
Repustate has developed sentiment language models specific to Russian to capture the phrases, idioms and expressions that help define sentiment in written Russian. Understanding the grammatical aspects of Russian that make it very different from English is what allows Repustate's Russian sentiment analysis to be both fast and accurate.
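The role of phrases, idioms and negation can be sketched with a toy lexicon scorer. Everything in it is an assumption for illustration: the lexicon entries, their scores, the single negator and the rule that a negator flips only the token or phrase directly after it. It is far simpler than a trained sentiment model and does not represent Repustate's models.

```python
# Hypothetical toy lexicon; entries and scores are illustrative assumptions.
PHRASE_SCORES = {
    ("так", "себе"): -0.5,   # idiom meaning "so-so"
    ("понравился",): 1.0,    # "liked"
    ("хороший",): 1.0,       # "good"
    ("плохой",): -1.0,       # "bad"
}
NEGATORS = {"не"}            # "not"
MAX_PHRASE_LEN = max(len(phrase) for phrase in PHRASE_SCORES)


def score(tokens):
    """Sum lexicon scores, preferring longer phrases and flipping after a negator."""
    tokens = [t.lower() for t in tokens]
    total = 0.0
    negate = False
    i = 0
    while i < len(tokens):
        if tokens[i] in NEGATORS:
            negate = True
            i += 1
            continue
        # Try the longest phrase first so idioms win over single words.
        for n in range(MAX_PHRASE_LEN, 0, -1):
            phrase = tuple(tokens[i:i + n])
            if len(phrase) == n and phrase in PHRASE_SCORES:
                value = PHRASE_SCORES[phrase]
                total += -value if negate else value
                i += n
                break
        else:
            i += 1
        # Negation applies only to the token or phrase right after the negator.
        negate = False
    return total


print(score(["Фильм", "не", "понравился"]))  # "Didn't like the film"
print(score(["Сервис", "так", "себе"]))       # "The service is so-so"
```

Matching the two-word idiom before its individual words is the design choice that matters here: a word-by-word English-style lookup would miss it entirely.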