Overview

The Repustate Server is a self-contained executable that provides the full functionality of the Repustate API with the privacy of being hosted in your own data center. There are no quotas or usage restrictions when using the Repustate Server, which makes it the ideal product for organizations that need text analytics at a very large scale.

Technical Requirements

The Repustate Server can be installed on any 64-bit operating system, including:

  • Windows 7, Windows 8, Windows 10
  • Ubuntu, Debian, Red Hat, CentOS
  • OS X

Because it is platform-agnostic, the Repustate Server can be installed on dedicated hardware in your own private data center or on cloud infrastructure such as Azure, Google Compute Engine or AWS.

The recommended hardware specifications for running the Repustate Server are:

  • At least 16 GB of RAM for sentiment analysis; 64 GB is recommended if you use Deep Search entities
  • At least 60 GB of disk space (SSDs work best)
  • A multi-core CPU: Repustate is CPU-bound, so the more CPU threads, the better
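Before installing, it can help to check a machine against these recommendations. The following is a minimal sketch using only Python's standard library; the function names are hypothetical, the thresholds are copied from the list above, and the disk check assumes it is the free space at the install location (REPUSTATE_HOME) that matters.

```python
import os
import shutil

# Thresholds from the recommended specs above, in GB.
MIN_RAM_GB = 16
MIN_RAM_DEEP_SEARCH_GB = 64
MIN_DISK_GB = 60
GB = 1024 ** 3


def check_requirements(ram_gb, disk_gb, cpu_threads, deep_search=False):
    """Return a list of warnings; an empty list means every check passed."""
    warnings = []
    needed_ram = MIN_RAM_DEEP_SEARCH_GB if deep_search else MIN_RAM_GB
    if ram_gb < needed_ram:
        warnings.append(f"RAM: {ram_gb:.1f} GB found, {needed_ram} GB recommended")
    if disk_gb < MIN_DISK_GB:
        warnings.append(f"Disk: {disk_gb:.1f} GB free, {MIN_DISK_GB} GB recommended")
    if cpu_threads < 2:
        warnings.append(f"CPU: {cpu_threads} thread(s) found, a multi-core CPU is recommended")
    return warnings


def probe_system(path="."):
    """Measure total RAM and free disk space at `path` (both in GB) plus CPU threads.

    os.sysconf is not available on Windows, so this probe targets Linux/macOS.
    """
    ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GB
    disk = shutil.disk_usage(path).free / GB
    return ram, disk, os.cpu_count() or 1
```

For example, check_requirements(*probe_system(os.environ.get("REPUSTATE_HOME", "."))) checks the current machine. Note that a machine sold with 16 GB of RAM usually reports slightly less to the operating system, so a RAM warning just under the threshold may be harmless.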

Installation

Once you've obtained your license, a link to your installer will be emailed to you. The steps that follow assume you've downloaded the installer executable onto the server that will host Repustate.

  1. Set the environment variable REPUSTATE_HOME to the path of the directory where you'd like Repustate to be installed. If REPUSTATE_HOME is not set, the current directory is used.
  2. Run the installer you downloaded. On Windows, double-click the installer.exe file; on Linux-like systems, make sure the installer is executable (e.g. chmod +x installer) and run it as `./installer`. The installer creates a directory called "models", downloads the model files Repustate needs, and downloads the repustate executable. Both the models directory and the main Repustate executable will then reside in the directory defined by REPUSTATE_HOME.
  3. To start the Repustate Server, execute the binary, optionally passing the --port argument to specify which port it should listen on. By default, Repustate listens on port 9000.
  4. All API calls described on the Repustate API documentation page are supported, but instead of sending requests to api.repustate.com, you send them to the IP address of your server (or localhost), e.g. http://localhost:9000/v3/$YOUR_API_KEY/score.json
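As a minimal sketch of step 4 using only Python's standard library, the request below targets a local server. The helper names are hypothetical, and the assumption that score.json accepts the text as a form-encoded "text" field mirrors the public API; check the API documentation for the exact parameters.

```python
import json
import urllib.parse
import urllib.request

# Base URL of your self-hosted server; use the server's IP instead of localhost as needed.
BASE_URL = "http://localhost:9000"


def build_score_request(api_key, text, base_url=BASE_URL):
    """Build a POST request for the score.json endpoint.

    Assumption: the endpoint reads the text from a form-encoded "text" field.
    """
    url = f"{base_url.rstrip('/')}/v3/{urllib.parse.quote(api_key)}/score.json"
    body = urllib.parse.urlencode({"text": text}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def score(api_key, text, base_url=BASE_URL, timeout=30):
    """Send the request and decode the JSON response body."""
    request = build_score_request(api_key, text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Because only the base URL changes, pointing the same code back at the public API is a matter of passing base_url="https://api.repustate.com".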

Usage

The great thing about the Repustate Server is that you can use the same code and clients for both the public API and the Server, so it's easy to switch from one to the other.

All API calls that you see on our public API work exactly the same on your Repustate Server. The only difference is instead of sending your API calls to https://api.repustate.com, you send them to the IP address(es) of your Repustate Server instance(s).

The server also accepts command-line options to configure its behaviour (run ./repustate -h to list them):

  • --host (default: localhost, i.e. 127.0.0.1): the IP address the Repustate Server should bind to
  • --port (default: 9000): the port on which the Repustate Server listens for incoming API calls
  • --langs (default: all languages): the languages to load at startup. If you're only interested in analyzing a few languages, specify a comma-separated list of language codes; this helps reduce startup time. For example, --langs en,de,fr enables only English, German and French
  • --verbose: if included, the Repustate Server outputs various status messages and periodically displays the mean response time for API calls
  • --license: if included, displays when your current license expires
  • --version: displays which version of the Repustate Server you're using
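When the server is started from a provisioning or deployment script, the options above can be assembled programmatically. This is a sketch under stated assumptions: the helper name build_command is hypothetical, it assumes the Linux binary is named repustate and lives in REPUSTATE_HOME, and it assumes option values are passed as separate arguments (e.g. --port 9000).

```python
import os


def build_command(home=None, host=None, port=None, langs=None, verbose=False):
    """Assemble the command line for starting the Repustate Server.

    Any option left as None (or empty) is omitted, so the server's own
    default from the options list applies.
    """
    home = home or os.environ.get("REPUSTATE_HOME", ".")
    cmd = [os.path.join(home, "repustate")]
    if host is not None:
        cmd += ["--host", host]
    if port is not None:
        cmd += ["--port", str(port)]
    if langs:
        # --langs takes a comma-separated list of language codes.
        cmd += ["--langs", ",".join(langs)]
    if verbose:
        cmd.append("--verbose")
    return cmd
```

The returned list can be handed to subprocess.run, e.g. subprocess.run(build_command(port=9000, langs=["en", "de", "fr"]), check=True). Building a list rather than a shell string avoids quoting problems.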

Special note about the clients: you're welcome to use the Repustate client libraries we provide, but make sure to change the host name that your requests are sent to. By default, all client libraries send requests to https://api.repustate.com.

Updates & Restarts

When Repustate releases a new version of the Server, you will receive an email with a link to download an updated installer. Download and run the installer; it will create a new Repustate executable that replaces your existing one. You could stop the old server, replace the executable and then restart, but this would cause a minute or two of downtime.

To have API calls handled by the new version without downtime, replace the old executable with the new one and send a USR2 signal to the process ID of the old process. Any existing or in-flight API calls will still be handled by the old process; once they're all done, the old process shuts down and the new one takes over. The process ID can be found in a file called `repustate.pid` in the same directory as the executable. For Linux users (this is not yet supported on Windows), a graceful restart can be accomplished like this:

kill -USR2 `cat repustate.pid`
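For scripted upgrades, the same signal can be sent from Python. A minimal sketch, assuming repustate.pid contains only the decimal process ID and sits in REPUSTATE_HOME next to the executable, as described above; the helper names are hypothetical, and like the kill command this only works on Linux-like systems.

```python
import os
import signal


def read_pid(home):
    """Read the running server's process ID from repustate.pid in `home`."""
    with open(os.path.join(home, "repustate.pid")) as f:
        return int(f.read().strip())


def graceful_restart(home=None):
    """Send SIGUSR2 to the running Repustate Server and return its PID.

    Equivalent to: kill -USR2 `cat repustate.pid`
    Replace the executable with the new one before calling this.
    """
    home = home or os.environ.get("REPUSTATE_HOME", ".")
    pid = read_pid(home)
    os.kill(pid, signal.SIGUSR2)
    return pid
```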
            

Distributed Deployment

To increase throughput and add redundancy in the event of an unexpected outage, we advise deploying Repustate across multiple servers. By putting a load balancer such as HAProxy or nginx in front of your servers, you can distribute requests round-robin and spread the workload across your Repustate Servers.
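HAProxy or nginx handles this for you. Purely to illustrate the round-robin policy, with simple failover for the redundancy case, here is a client-side sketch; the class name and server addresses are hypothetical, and a production deployment should rely on the load balancer rather than this code.

```python
import itertools


class RoundRobin:
    """Cycle through Repustate Server base URLs the way a round-robin balancer does."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._count = len(servers)
        self._cycle = itertools.cycle(servers)

    def next_server(self):
        """Return the next server in turn, regardless of its health."""
        return next(self._cycle)

    def next_healthy(self, is_healthy):
        """Return the next server for which is_healthy(server) is True.

        Each server is tried at most once per call, so a server that is down
        is skipped instead of failing the request.
        """
        for _ in range(self._count):
            server = next(self._cycle)
            if is_healthy(server):
                return server
        raise RuntimeError("no healthy Repustate Server available")
```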

To accommodate this sort of architecture, Repustate Servers come with a built-in feature called Repustate Sync. Whenever you add new custom sentiment rules or new filter rules, Repustate Sync distributes them to all of your Repustate Server instances.

To enable Repustate Sync, you must create two files, called "server" and "clients", in a directory called "sync" inside your $REPUSTATE_HOME directory on EACH server that hosts a Repustate Server instance. Your filesystem layout should look like this:

$REPUSTATE_HOME/
    repustate
    installer
    custom/
        ...
    models/
        ...
    sync/
        server 
        clients
                

sync/server contains a single line of text: the IP address and port of this server, which MUST be reachable by all peers. It can be the server's public IP if the peers access it over a public network, or an IP address internal to your network; all that matters is that the peers can reach it. Note: the port must be different from the port you're running Repustate on. For example, if you run Repustate on port 9000, configure your sync server to use port 9001 (or any port other than 9000).

sync/clients contains the IP address and port number of each peer you want to sync with, one per line. Again, all that matters is that each peer can reach the other peers, so these don't have to be the servers' public IPs.

Here are sample server and clients configuration files.
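The sample files themselves are not reproduced in this text. As an illustration only, the sketch below writes both files in the layout described above. The helper name and addresses are hypothetical, and the host:port line format is an assumption, since the text only says each line holds an IP address and a port.

```python
import os


def write_sync_config(home, server_addr, peers, api_port=9000):
    """Create $REPUSTATE_HOME/sync/server and sync/clients.

    server_addr and each peer are "host:port" strings (assumed format).
    Refuses a sync port equal to the API port, as required above.
    """
    host, _, port = server_addr.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {server_addr!r}")
    if int(port) == api_port:
        raise ValueError("the sync port must differ from the port Repustate runs on")
    sync_dir = os.path.join(home, "sync")
    os.makedirs(sync_dir, exist_ok=True)
    # sync/server: exactly one line, this server's reachable address.
    with open(os.path.join(sync_dir, "server"), "w") as f:
        f.write(server_addr + "\n")
    # sync/clients: one peer per line.
    with open(os.path.join(sync_dir, "clients"), "w") as f:
        f.write("".join(peer + "\n" for peer in peers))
    return sync_dir
```

For instance, on a server at 10.0.0.1 with two peers, write_sync_config(home, "10.0.0.1:9001", ["10.0.0.2:9001", "10.0.0.3:9001"]) would be run, with the mirror-image call on each peer.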

If you've configured everything correctly, during startup you'll see a message stating "Repustate Sync enabled. Configured with $N peers" where $N is the number of client IP addresses you listed in sync/clients.
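To check this automatically, for example in a deployment script that captures the server's startup output, a sketch like the following compares the reported peer count with the number of entries in sync/clients. The helper names are hypothetical, and the regular expression assumes the message is worded exactly as quoted above (it also accepts the singular "peer").

```python
import os
import re

# Matches e.g. "Repustate Sync enabled. Configured with 2 peers"
SYNC_MSG = re.compile(r"Repustate Sync enabled\. Configured with (\d+) peers?")


def count_peers(home):
    """Count the non-blank lines in sync/clients, i.e. the configured peers."""
    with open(os.path.join(home, "sync", "clients")) as f:
        return sum(1 for line in f if line.strip())


def sync_peer_count_matches(startup_output, home):
    """True if the startup output reports as many peers as sync/clients lists."""
    match = SYNC_MSG.search(startup_output)
    return match is not None and int(match.group(1)) == count_peers(home)
```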

Special note for AWS EC2 users: Configure your server addresses using 0.0.0.0 as the host OR by using the private internal IP EC2 assigns. This private IP can be found either via ifconfig or by looking at the EC2 admin panel. Do not use the public IP assigned by Amazon.

If you change the address or port of your sync server or of any peer, or if you add or remove peers, simply update the configuration files on all servers. Repustate will automatically detect these changes, update itself, and ensure all new peers are synced up; there is no need to restart manually.

Best Practices

There are a few tweaks you can make to your servers to optimize the performance of the Repustate Server. First, we suggest not running anything other than Repustate on your server, because depending on your workload, Repustate can be very resource-intensive, particularly with RAM. We also suggest making the following changes to your /etc/sysctl.conf:

  • net.ipv4.tcp_tw_reuse = 1
  • net.ipv4.ip_local_port_range = 2000 65535

These changes let the OS reuse sockets in the TIME_WAIT state for new outgoing connections and widen the range of local ports available, so the server is less likely to run out of ports during heavier loads.
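After editing /etc/sysctl.conf (running `sudo sysctl -p` reloads it), the live values can be read back from /proc/sys on Linux to confirm they took effect. A minimal sketch; the function name is hypothetical, and it simply reports any setting whose value differs from the recommendation above.

```python
import os

# Recommended values from the list above, keyed by their path under /proc/sys.
RECOMMENDED = {
    "net/ipv4/tcp_tw_reuse": "1",
    "net/ipv4/ip_local_port_range": "2000 65535",
}


def check_sysctl(proc_sys="/proc/sys"):
    """Compare live kernel settings with the recommended values (Linux only).

    Returns {setting: (current, recommended)} for every setting that differs.
    Whitespace is normalised because the kernel separates the port range with a tab.
    """
    mismatches = {}
    for key, wanted in RECOMMENDED.items():
        with open(os.path.join(proc_sys, key)) as f:
            current = " ".join(f.read().split())
        if current != wanted:
            mismatches[key.replace("/", ".")] = (current, wanted)
    return mismatches
```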

We also recommend raising the limit on the number of open files by editing /etc/security/limits.conf and adding two entries:

  • * hard nofile 65536
  • * soft nofile 65536

The * specifies which users should have their open-file limit increased. To restrict the change to a single user, replace the * with that user's username.
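To confirm the new limit applies, a process can query its own limits with Python's resource module (Unix only). Because limits.conf is applied when a user session starts, run the check from a fresh login as the user that runs Repustate. The function name is hypothetical.

```python
import resource

RECOMMENDED_NOFILE = 65536


def check_nofile(recommended=RECOMMENDED_NOFILE):
    """Return (soft, hard, ok) for this process's open-file limits.

    ok is True when both limits are at least `recommended`;
    an unlimited value (RLIM_INFINITY) also counts as sufficient.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    def sufficient(limit):
        return limit == resource.RLIM_INFINITY or limit >= recommended

    return soft, hard, sufficient(soft) and sufficient(hard)
```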