Not all languages are the same

Grammar rules, such as those for verb conjugation, noun-verb agreement and negation, vary from one language to another.

Chinese differs from English in a number of ways. For example, written Chinese does not separate words with spaces, and its verbs are not conjugated. Applying the techniques and language models built for English sentiment analysis to Chinese text would yield highly inaccurate results.

That's why Repustate developed tools specifically for Chinese sentiment analysis, including a Chinese part-of-speech tagger, a Chinese lemmatizer and, of course, Chinese-specific sentiment models.

Chinese part-of-speech tagging

Chinese part-of-speech tagging allows Repustate to narrow in on where sentiment may lie within a block of text. Verbs, nouns and adjectives provide the cues necessary to determine sentiment.
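
To make the idea concrete, here is a minimal Python sketch of how tagged output can be narrowed down to likely sentiment cues. It is not Repustate's implementation: the tag names (NOUN, VERB, ADJ and so on), the function name and the sample sentence are all assumptions chosen for illustration.

```python
# Illustrative only: the tag set and the tagged sentence below are assumptions,
# not the output of Repustate's tagger.
CONTENT_TAGS = {"NOUN", "VERB", "ADJ"}

def sentiment_candidates(tagged_tokens):
    """Keep only the words whose tag marks them as likely sentiment cues."""
    return [word for word, tag in tagged_tokens if tag in CONTENT_TAGS]

# "This movie is wonderful", already segmented and tagged by hand.
tagged = [
    ("这", "DET"),
    ("部", "CLF"),
    ("电影", "NOUN"),
    ("很", "ADV"),
    ("精彩", "ADJ"),
]

candidates = sentiment_candidates(tagged)
print(candidates)
```

Only the noun and the adjective survive the filter, so a later sentiment step has fewer words to consider.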

Creating a fast and accurate Chinese part-of-speech tagger requires a massive corpus of manually tagged Chinese text. This text is then fed into a machine learning algorithm to produce the tagger.

The larger the corpus and, more importantly, the more varied it is, the better the resulting tagger. Repustate has built a massive corpus of Chinese text, drawing data from a variety of sources to ensure good coverage.
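
As a rough illustration of how a manually tagged corpus becomes a tagger, the sketch below trains a most-frequent-tag baseline, one of the simplest possible approaches. It is a stand-in for whatever algorithm Repustate actually uses, which the text does not specify. The three-sentence corpus, the tag names and the fallback tag for unseen words are all assumptions.

```python
from collections import Counter, defaultdict

def train_tagger(tagged_sentences):
    """Learn, for each word, the tag it carries most often in the corpus."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag(model, words, default="NOUN"):
    """Tag pre-segmented words; unseen words fall back to an assumed default."""
    return [(word, model.get(word, default)) for word in words]

# A toy hand-tagged corpus (hypothetical data, far too small for real use).
corpus = [
    [("我", "PRON"), ("喜欢", "VERB"), ("这", "DET"), ("家", "CLF"), ("餐厅", "NOUN")],
    [("服务", "NOUN"), ("很", "ADV"), ("好", "ADJ")],
    [("我", "PRON"), ("很", "ADV"), ("喜欢", "VERB")],
]

model = train_tagger(corpus)
result = tag(model, ["我", "喜欢", "服务"])
print(result)
```

Any word missing from the corpus gets the fallback tag, which is one reason a larger and more varied corpus gives better results: fewer words are unseen at tagging time.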

Chinese language sentiment models

Repustate has developed sentiment language models specific to Chinese that capture the phrases, idioms and expressions that help define sentiment in written Chinese. Accounting for the grammatical features that make Chinese so different from English is what allows Repustate's Chinese sentiment analysis to be both fast and accurate.
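
One simple way to picture phrase-level and idiom-level sentiment is a lexicon lookup that prefers the longest matching phrase and flips the score after a negation character. The sketch below is a toy under stated assumptions, not Repustate's model: the lexicon entries, their scores, the negation list and the function name are all hypothetical.

```python
# Hypothetical lexicon: phrases and scores are made up for illustration.
LEXICON = {
    "喜欢": 1.0,       # to like
    "好": 1.0,         # good
    "物超所值": 2.0,    # idiom: worth more than the price paid
    "失望": -1.5,      # disappointed
    "差": -1.0,        # poor
}
NEGATIONS = ("不", "没")
MAX_LEN = max(len(phrase) for phrase in LEXICON)

def score(text):
    """Sum lexicon scores using greedy longest match over unsegmented text.

    A negation character flips the sign of a lexicon phrase only when that
    phrase follows it immediately.
    """
    total = 0.0
    i = 0
    negate = False
    while i < len(text):
        if text[i] in NEGATIONS:
            negate = True
            i += 1
            continue
        # Try the longest candidate first so idioms win over single characters.
        for length in range(min(MAX_LEN, len(text) - i), 0, -1):
            phrase = text[i:i + length]
            if phrase in LEXICON:
                value = LEXICON[phrase]
                total += -value if negate else value
                negate = False
                i += length
                break
        else:
            # No lexicon phrase starts here; drop any pending negation.
            negate = False
            i += 1
    return total

positive = score("这款手机物超所值")
negative = score("我不喜欢这家餐厅，很失望")
print(positive, negative)
```

A lookup like this breaks down quickly on real text, for example when a negation character is part of an ordinary word, which is why language-specific models are needed rather than simple word lists.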




Have a question about Chinese sentiment analysis? Ask us!