Deploy the Repustate Sentiment Analysis API on any of your servers anywhere in the world

Overview

The Repustate Server is a self-contained executable that provides the full functionality of the Repustate API, with the privacy of being hosted in your own data center. There are no quotas or usage restrictions when using the Repustate Server. It is the ideal product for organizations that need text analytics at a very large scale.

Requirements

The Repustate Server can be installed on any 64-bit operating system including:

Windows 7, Windows 8, Windows 10
Ubuntu, Debian, Red Hat, CentOS
OS X

Because it is platform agnostic, the Repustate Server can be installed on dedicated hardware in your own private data center or on cloud infrastructure such as Azure, Google Compute Engine or AWS.

The recommended specs for hardware needed to run the Repustate Server are:

At least 16GB RAM for sentiment; if using Deep Semantic Search entities, then 64GB is recommended
At least 60GB disk space (SSDs work best)
Multi-core CPU - the more CPU threads the better as Repustate is CPU bound.

Note about Deep Semantic Search: To enable Deep Semantic Search and/or to use the entity extraction API call, your server's CPU must support the AVX and AVX2 instructions.
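On Linux, one way to check the AVX requirement before installing is to read the CPU feature flags from /proc/cpuinfo. The following is a minimal sketch, not part of the Repustate Server; the function names are illustrative, and it assumes an x86 Linux host where the kernel reports flags on a `flags` line.

```python
from pathlib import Path

REQUIRED_FLAGS = {"avx", "avx2"}


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags


def missing_avx_flags(cpuinfo_text: str) -> set[str]:
    """Return whichever of avx/avx2 are absent (an empty set means both are present)."""
    return REQUIRED_FLAGS - cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():  # only present on Linux
        missing = missing_avx_flags(cpuinfo.read_text())
        if missing:
            print("Missing CPU instructions:", ", ".join(sorted(missing)))
        else:
            print("AVX and AVX2 are available")
```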

Install

Once you've obtained your license, a link to your installer will be emailed to you. The steps that follow assume you've downloaded the installer executable onto the server where you plan to host Repustate.

Set the environment variable REPUSTATE_HOME to be the path to the directory where you'd like Repustate to be installed. By default, REPUSTATE_HOME is the current directory you're in.
Run the installer you downloaded. On Windows you can double-click the installer.exe file; on Linux-like systems you can run it as `./installer` (make sure it's executable, e.g. chmod +x installer). This will create a variety of directories, download the model files Repustate needs, and download the Repustate executable itself.
To start the Repustate Server, execute the binary, optionally passing the "--port" argument to specify which port it should listen on. By default, Repustate runs on port 9000.
All API calls you see on the Repustate API documentation page are supported, but instead of sending requests to https://api.repustate.com, you send them to the IP address of your server (or localhost) e.g. http://localhost:9000/v4/$YOUR_API_KEY/score.json
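As an illustration of the last step, here is a minimal Python sketch that sends a request to a locally running server using only the standard library. The helper names are hypothetical, and the form-encoded `text` field is an assumption based on the public API; check the API documentation for the exact parameters each call accepts.

```python
import json
import urllib.parse
import urllib.request


def endpoint_url(api_key: str, call: str, host: str = "localhost", port: int = 9000) -> str:
    """Build the URL for an API call served by a self-hosted Repustate Server."""
    return f"http://{host}:{port}/v4/{api_key}/{call}.json"


def score(text: str, api_key: str, **server) -> dict:
    """POST text to the score call and decode the JSON reply.

    Assumption: the call accepts a form-encoded `text` field, as on the public API.
    """
    body = urllib.parse.urlencode({"text": text}).encode("utf-8")
    request = urllib.request.Request(endpoint_url(api_key, "score", **server), data=body)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

With the server running on the default port, `score("I love this product", "YOUR_API_KEY")` would send the request to http://localhost:9000/v4/YOUR_API_KEY/score.json.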

Configuration

If you'll be using Deep Semantic Search, you must configure the backend used for storing data. The Repustate installer creates a directory named config. In this directory, you'll find a file deepsearch which has sample configurations for all supported backends. Create a file called deepsearch and specify the backend type and connection parameters for your backend. For example, if you were using PostgreSQL, your config/deepsearch file would look like:

[storage]
type = "postgres"
dsn = "user=postgres dbname=deepsearch sslmode=disable host=localhost port=5432 password=123"

The following backends are supported:

PostgreSQL and any database that implements the PostgreSQL wire protocol (e.g. CockroachDB)
MySQL and any database that implements the MySQL wire protocol (e.g. MariaDB, MemSQL)
SQLite
Microsoft SQL Server
Solr
Elasticsearch
MongoDB
HBase
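If you generate the config/deepsearch file from a deployment script, for instance to avoid hard-coding the database password, a sketch along these lines reproduces the PostgreSQL example above. The helper names are hypothetical, only the "postgres" type string comes from the example, and values containing spaces or quotes are not handled.

```python
def postgres_dsn(**params) -> str:
    """Join keyword/value pairs into a space-separated PostgreSQL DSN."""
    return " ".join(f"{key}={value}" for key, value in params.items())


def deepsearch_config(storage_type: str, dsn: str) -> str:
    """Render the [storage] section in the layout of the PostgreSQL example."""
    return f'[storage]\ntype = "{storage_type}"\ndsn = "{dsn}"\n'


if __name__ == "__main__":
    dsn = postgres_dsn(
        user="postgres",
        dbname="deepsearch",
        sslmode="disable",
        host="localhost",
        port=5432,
        password="123",  # in practice, read this from a secret store
    )
    print(deepsearch_config("postgres", dsn), end="")
```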

Usage

The great thing about the Repustate Server is that you can use the same code and clients for the public API as you can for the Server, so it's easy to switch from one to the other.

All API calls that you see on our public API work exactly the same on your Repustate Server. The only difference is instead of sending your API calls to https://api.repustate.com, you send them to the IP address(es) of your Repustate Server instance(s).

The server also accepts command-line options to configure its behaviour (run ./repustate -h to see them). Alternatively, the same options can be set via environment variables and/or a .env file in the $REPUSTATE_HOME directory.

Option	Environment variable	Default	Description
--host	REPUSTATE_HOST	localhost (127.0.0.1)	Specify the IP address the Repustate Server should bind to
--port	REPUSTATE_PORT	9000	Specify which port the Repustate Server should listen to for incoming API calls
--langs	REPUSTATE_LANGS	All	Specify which languages to include at startup time. If you're only interested in analyzing a few languages, specify a comma separated list of language codes. This will help reduce startup time. e.g. --langs en,de,fr would enable only English, German and French
--verbose	REPUSTATE_VERBOSE		If included, the Repustate Server will output various status messages and periodically display mean response time for API calls. If setting via environment variable, set it to 1.
--license			If included, displays when your current license expires
--version			Display which version of the Repustate Server you're using
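When the server is started from a script, the environment variables in the table can be assembled programmatically. The sketch below is illustrative only: the function names are hypothetical, and the KEY=VALUE layout for the .env file is an assumption based on the common dotenv convention.

```python
def server_env(host=None, port=None, langs=None, verbose=False) -> dict:
    """Map settings onto the environment variables listed in the options table."""
    env = {}
    if host is not None:
        env["REPUSTATE_HOST"] = host
    if port is not None:
        env["REPUSTATE_PORT"] = str(port)
    if langs:
        env["REPUSTATE_LANGS"] = ",".join(langs)  # comma-separated language codes
    if verbose:
        env["REPUSTATE_VERBOSE"] = "1"
    return env


def dotenv_text(env: dict) -> str:
    """Render the settings as KEY=VALUE lines (assumed .env layout)."""
    return "".join(f"{key}={value}\n" for key, value in env.items())
```

The resulting dictionary can be merged into `os.environ` and passed as the `env` argument of `subprocess.Popen` when launching the binary, or written to $REPUSTATE_HOME/.env with `dotenv_text`.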

Updates & restarts

When Repustate releases a new version of the Server, you will receive an email with a link to download an updated installer. Download and run the installer. It will create a new Repustate executable that is meant to replace the existing one you have. While you could stop the old server, replace the executable binary, and then restart, this would result in downtime of a minute or two.

To switch to the new version without downtime, replace the old executable with the new one and send a USR2 signal to the process ID of the old process. Any existing or in-flight API calls will be handled by the old process; once they're all done, the old process will shut down and the new one will take over. The process ID can be found in a file called `repustate.pid` in the same directory as the executable itself. For Linux users (this is not yet supported on Windows), this is how a graceful restart can be accomplished:

kill -USR2 `cat repustate.pid`
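The same restart can be triggered from a deployment script. This is a minimal Python sketch of the shell command above; the function names are illustrative, and it assumes the script runs in the directory containing `repustate.pid` (or is given its path) with permission to signal the server process.

```python
import os
import signal
from pathlib import Path


def read_pid(pid_file: str = "repustate.pid") -> int:
    """Read the process ID the running server wrote to its pid file."""
    return int(Path(pid_file).read_text().strip())


def graceful_restart(pid_file: str = "repustate.pid") -> int:
    """Send USR2 to the running server and return the PID that was signalled."""
    pid = read_pid(pid_file)
    os.kill(pid, signal.SIGUSR2)  # SIGUSR2 is not available on Windows
    return pid
```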

Distributed deployment

In order to increase throughput and to add redundancy in the event of an unexpected outage, it is advisable to deploy Repustate across multiple servers. By putting a load balancer in front of your servers, such as HAProxy or nginx, you can round-robin your requests and spread the workload around to your many Repustate Servers.

To accommodate this sort of architecture, Repustate Servers come with a built-in feature called Repustate Sync. Repustate will periodically poll your specified endpoints and update the rules on each server with the results the endpoints return. This allows you to add as many nodes as you want, so long as they can reach the endpoints you define.

To enable Repustate Sync, you must create a file called `sync` and put it in the `config` directory. The contents of the sync file are as follows:

sync_interval = 24h
sentiment_server = "http://example.com/sentiment-rules"
filter_server = "http://example.com/filters"
entity_server = "http://example.com/entities"

sync_interval: how often Repustate should poll your endpoints. Units are 's', 'm', 'h' (seconds, minutes, hours)
sentiment_server: the HTTP endpoint to retrieve your custom sentiment rules
filter_server: the HTTP endpoint to retrieve your custom filter rules
entity_server: the HTTP endpoint to retrieve your custom entities

Each endpoint is expected to respond with JSON and an HTTP 200 status. A status code other than 200 means the data will not be refreshed server side. An HTTP code of 304 means the content hasn't changed and Repustate won't do any updates. The following is the expected response format for each endpoint:

Sentiment

{
  "apikey":"xxxxx",
  "rules":[
    {
      "lang":"en",
      "subaccount":"xxx", // optional
      "text":"my rule",
      "sentiment":"pos",
      "id":"myid" // optional
    }
  ]
}

Filters

{
  "apikey":"xxxx",
  "filters":[
    {
      "subaccount":"xxx", // optional
      "label":"canadian cities",
      "rule":"Montreal OR Toronto OR Vancouver"
    }
  ]
}

Entities

{
  "apikey":"xxxx",
  "entities":[
    {
      "subaccount":"xxx", // optional
      "lang":"en", // optional, default is 'en'
      "title":"Repustate",
      "classifications":[
        "Org.software_company",
        "Org.saas_business"
      ],
      "aliases":[
        "Repustate Inc",
        "R-State"
      ]
    }
  ]
}
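To make the endpoint contract concrete, here is a minimal sketch of a sentiment-rules endpoint built on Python's standard library. It is not Repustate code: the handler, the port and the placeholder rules are illustrative. Whether the poller sends an If-None-Match header is an assumption; if it does not, this endpoint simply answers 200 every time.

```python
import hashlib
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder rules in the Sentiment response format shown above.
SENTIMENT_RULES = {
    "apikey": "xxxxx",
    "rules": [{"lang": "en", "text": "my rule", "sentiment": "pos"}],
}


def render(payload: dict) -> tuple[bytes, str]:
    """Serialize the payload and derive a quoted ETag from its bytes."""
    body = json.dumps(payload).encode("utf-8")
    return body, '"' + hashlib.sha256(body).hexdigest() + '"'


class SyncHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/sentiment-rules":
            self.send_error(404)
            return
        body, etag = render(SENTIMENT_RULES)
        # Assumption: the poller sends If-None-Match with the last ETag it saw.
        if self.headers.get("If-None-Match") == etag:
            self.send_response(304)  # unchanged, so no body is sent
            self.send_header("ETag", etag)
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("ETag", etag)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Serve the rules until interrupted."""
    HTTPServer(("", port), SyncHandler).serve_forever()
```

Note that the `// optional` annotations in the sample responses are explanatory only; strict JSON does not allow comments, so a real response should leave them out, as `json.dumps` does here.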

Best Practices

There are a few tweaks you can make to your servers to optimize the performance of the Repustate Server. Firstly, we suggest not running anything other than Repustate on your server because, depending on your workload, Repustate can be very resource intensive, particularly with RAM. We also suggest making the following changes to your /etc/sysctl.conf:

net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_local_port_range = 2000 65535

After making these changes, reload your sysctl.conf with sudo sysctl -p. This lets your OS reuse sockets in the TIME_WAIT state for new connections and widens the range of local ports, so the server is less likely to run out of ports during heavier loads.

We also recommend bumping up the total number of open files allowed by editing /etc/security/limits.conf and adding two entries:

* hard nofile 65536
* soft nofile 65536

The * refers to which users should have their open file limit increased. If you want to restrict the change to just one user, replace the * with the relevant username.
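To confirm the new limit is in effect for the user that runs Repustate, you can query it from a fresh login session. The following sketch uses Python's standard `resource` module (Unix only); the function name and the comparison against 65536 are illustrative.

```python
import resource

RECOMMENDED_NOFILE = 65536


def nofile_shortfall(soft: int, recommended: int = RECOMMENDED_NOFILE) -> int:
    """Return how far the soft limit falls short of the recommendation (0 if it meets it)."""
    if soft == resource.RLIM_INFINITY:
        return 0  # unlimited
    return max(0, recommended - soft)


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"soft={soft} hard={hard} shortfall={nofile_shortfall(soft)}")
```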