Overview

The Repustate Server is a self-contained executable that provides the full functionality of the Repustate API, with the privacy of being hosted in your own data center. There are no quotas or usage restrictions when using the Repustate Server. It is the ideal product for organizations that need text analytics at a very large scale.

Technical Requirements

The Repustate Server can be installed on any 64-bit operating system including:

  • Windows 7, Windows 8, Windows 10
  • Ubuntu, Debian, Red Hat, CentOS
  • OS X

Because it is platform-agnostic, the Repustate Server can be installed on dedicated hardware in your own private data center or on cloud infrastructure such as Azure, Google Compute Engine or AWS.

The recommended hardware specifications for running the Repustate Server are:

  • At least 16GB RAM
  • At least 30GB disk space (SSDs work best)
  • Quad-core CPU

The Repustate Server is CPU-bound, so the faster your CPU and the more hardware threads it provides, the more text Repustate can analyze in a given amount of time.
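If you'd like to check a machine against these recommendations before installing, a minimal Python sketch along these lines may help. It is not part of the Repustate Server and the function names are hypothetical. RAM detection relies on os.sysconf, so it only works on Linux and OS X, and os.cpu_count() reports logical CPUs (hardware threads) rather than physical cores.

```python
import os
import shutil

# Recommended minimums from the list above.
GIB = 1024 ** 3
MIN_RAM_BYTES = 16 * GIB
MIN_DISK_BYTES = 30 * GIB
MIN_LOGICAL_CPUS = 4


def total_ram_bytes():
    """Total physical RAM in bytes, or None where os.sysconf is unavailable (e.g. Windows)."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


def check_host(install_dir):
    """Return a list of warnings for each recommendation this machine falls short of.

    install_dir must be an existing directory on the volume you plan to install to.
    """
    warnings = []
    ram = total_ram_bytes()
    if ram is not None and ram < MIN_RAM_BYTES:
        warnings.append(f"RAM: {ram / GIB:.1f} GB (at least 16 GB recommended)")
    free = shutil.disk_usage(install_dir).free  # free space on the install volume
    if free < MIN_DISK_BYTES:
        warnings.append(f"Free disk: {free / GIB:.1f} GB (at least 30 GB recommended)")
    cpus = os.cpu_count() or 0  # logical CPUs; os.cpu_count() may return None
    if cpus < MIN_LOGICAL_CPUS:
        warnings.append(f"CPUs: {cpus} logical CPUs (quad core recommended)")
    return warnings


if __name__ == "__main__":
    # Check the volume REPUSTATE_HOME points at (defaults to the current directory).
    home = os.environ.get("REPUSTATE_HOME", os.getcwd())
    for line in check_host(home) or ["All recommended minimums are met."]:
        print(line)
```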

Installation

Once you've obtained your license, a link to your installer will be emailed to you. The steps that follow assume you've downloaded the installer executable onto the server that will host Repustate.

  1. Set the environment variable REPUSTATE_HOME to the path of the directory where you'd like Repustate to be installed. If REPUSTATE_HOME is not set, it defaults to your current working directory.
  2. Run the installer you downloaded. On Windows, double-click the installer.exe file; on Linux-like systems, make the file executable (e.g. `chmod +x installer`) and run it as `./installer`. The installer creates a directory called "models", downloads the various model files Repustate needs, and downloads the repustate executable. Both the models directory and the main Repustate executable now reside in the directory defined by REPUSTATE_HOME.
  3. To start the Repustate Server, execute the binary, optionally passing the "-port" argument to specify which port it should listen on. By default, Repustate listens on port 9000.
  4. All API calls listed on the Repustate API documentation page are supported, but instead of sending requests to api.repustate.com, you send them to the IP address of your server (or localhost), e.g. http://localhost:9000/v3/$YOUR_API_KEY/score.json
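Steps 1 and 3 can be scripted if you start the server from your own tooling. The following Python sketch is illustrative only: it assumes the executable is named repustate (repustate.exe on Windows, which is a guess) and sits directly in REPUSTATE_HOME, and the helper names are hypothetical. A port that accepts connections only shows that the process is listening, not necessarily that every model has finished loading.

```python
import os
import socket
import subprocess
import time
from pathlib import Path


def repustate_home():
    # Step 1: REPUSTATE_HOME defaults to the current directory when unset.
    return Path(os.environ.get("REPUSTATE_HOME", os.getcwd()))


def server_command(port=9000):
    # Step 3: the binary takes an optional -port argument (default 9000).
    # ASSUMPTION: the executable is named "repustate" ("repustate.exe" on Windows).
    name = "repustate.exe" if os.name == "nt" else "repustate"
    return [str(repustate_home() / name), "-port", str(port)]


def wait_for_port(host, port, timeout=60.0):
    """Poll until something accepts TCP connections on host:port, or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)
    return False


def start_server(port=9000):
    """Launch the server in the background and wait until its port accepts connections."""
    proc = subprocess.Popen(server_command(port), cwd=str(repustate_home()))
    if not wait_for_port("localhost", port):
        proc.terminate()
        raise RuntimeError(f"Repustate Server did not start listening on port {port}")
    return proc
```

For example, `proc = start_server(9000)` launches the server in the background, and `proc.terminate()` stops it again.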

Usage

The great thing about the Repustate Server is that the same code and client libraries work against both the public API and the Server, so it's easy to switch from one to the other.

All API calls that you see on our public API work exactly the same on your Repustate Server. The only difference is that instead of sending your API calls to api.repustate.com, you send them to the IP address(es) of your Repustate Server instance(s).

Special note about the clients: While you're welcome to use the Repustate Client libraries we provide, make sure to change the host name that you send your requests to. By default, all client libraries are hard-coded to send requests to https://api.repustate.com.
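The same advice applies to your own code: keep the host configurable rather than hard-coded. This Python sketch uses only the standard library; endpoint_url and score are hypothetical helpers, and the assumption that score.json accepts a form-encoded POST with a "text" field should be checked against the Repustate API documentation.

```python
import json
import urllib.parse
import urllib.request

PUBLIC_API = "https://api.repustate.com"


def endpoint_url(api_key, call, base_url=PUBLIC_API):
    """Build a v3 URL of the form {base_url}/v3/{api_key}/{call}.json."""
    return f"{base_url.rstrip('/')}/v3/{urllib.parse.quote(api_key)}/{call}.json"


def score(text, api_key, base_url=PUBLIC_API):
    """Send text to score.json on whichever host base_url points at.

    ASSUMPTION: the endpoint takes a form-encoded POST with a "text" field
    and returns JSON.
    """
    data = urllib.parse.urlencode({"text": text}).encode("utf-8")
    # Passing data= makes urlopen issue a POST request.
    with urllib.request.urlopen(endpoint_url(api_key, "score", base_url), data=data) as resp:
        return json.load(resp)
```

Pointing the same call at your own server is then just `score("I love it", "YOUR_API_KEY", base_url="http://localhost:9000")`.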

Distributed Deployment

To increase throughput and to add redundancy in the event of an unexpected outage, it is advisable to deploy Repustate across multiple servers. By putting a load balancer such as HAProxy or nginx in front of those servers, you can round-robin your requests and spread the workload across all of your Repustate Servers.
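HAProxy or nginx is the recommended way to do this. To illustrate what round-robin with basic failover means, here is a client-side Python sketch; RoundRobinPool is a hypothetical helper, it performs no health checks, and it is not a replacement for a real load balancer.

```python
import itertools


class RoundRobinPool:
    """Hand out Repustate Server base URLs in round-robin order."""

    def __init__(self, base_urls):
        self._urls = list(base_urls)
        if not self._urls:
            raise ValueError("at least one server URL is required")
        self._cycle = itertools.cycle(self._urls)

    def next_url(self):
        return next(self._cycle)

    def call(self, request_fn):
        """Call request_fn(base_url); if a server is unreachable, move on to the next.

        Each server is tried at most once per call. urllib's URLError and the
        socket errors raised for refused connections are subclasses of OSError.
        """
        last_error = None
        for _ in range(len(self._urls)):
            url = self.next_url()
            try:
                return request_fn(url)
            except OSError as exc:
                last_error = exc
        raise last_error
```

With two servers, successive calls to `next_url()` alternate between them, and `call()` skips a server that refuses the connection.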

To support multi-server deployments, every Repustate Server includes a built-in feature called Repustate Sync. Whenever you add new custom sentiment rules or new filter rules, Repustate Sync distributes them to all of your Repustate Server instances.

To enable Repustate Sync, you must create two files, "server" and "clients", in a directory called "sync" inside $REPUSTATE_HOME on EACH server that hosts a Repustate Server instance. Your filesystem layout should look like this:

$REPUSTATE_HOME/
    repustate
    installer
    custom/
        ...
    models/
        ...
    sync/
        server 
        clients
                

sync/server contains exactly one line of text: the IP address and port of this server, which MUST be reachable by all peers. This can be the server's public IP if peers reach it over a public network, or an IP address internal to your network; all that matters is that the peers can reach it.

sync/clients contains the IP address and port of every peer you want to sync with, one per line. Again, all that matters is that each peer can reach the others, so these don't have to be the servers' public IPs.

Here are sample server and clients configuration files.

If you've configured everything correctly, during startup you'll see a message stating "Repustate Sync enabled. Configured with $N peers" where $N is the number of client IP addresses you listed in sync/clients.
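If you manage several servers, you may prefer to generate these files with a script. The Python sketch below is a hypothetical helper, not a Repustate tool, and it assumes each address is written as IP:port (for example 10.0.0.2:9000); confirm the exact format against the sample files before relying on it. expected_peer_count returns the value you should see as $N in the startup message.

```python
from pathlib import Path


def write_sync_config(home, server_addr, peer_addrs):
    """Create <home>/sync/server and <home>/sync/clients; return the peer count.

    ASSUMPTION: addresses are written as "IP:port", one per line.
    """
    sync_dir = Path(home) / "sync"
    sync_dir.mkdir(parents=True, exist_ok=True)
    # sync/server: exactly one line, this server's reachable address.
    (sync_dir / "server").write_text(server_addr.strip() + "\n")
    # sync/clients: one peer per line, skipping blank entries.
    peers = [p.strip() for p in peer_addrs if p.strip()]
    (sync_dir / "clients").write_text("".join(p + "\n" for p in peers))
    return len(peers)


def expected_peer_count(home):
    """Count non-blank lines in sync/clients, i.e. the $N from the startup message."""
    text = (Path(home) / "sync" / "clients").read_text()
    return sum(1 for line in text.splitlines() if line.strip())
```

On the first server of a hypothetical three-node cluster, you might run `write_sync_config(os.environ["REPUSTATE_HOME"], "10.0.0.1:9000", ["10.0.0.2:9000", "10.0.0.3:9000"])` and then expect "Configured with 2 peers" at startup.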

Special note for AWS EC2 users: configure your server addresses using 0.0.0.0 as the host, or use the private internal IP that EC2 assigns. You can find this private IP with ifconfig or in the EC2 console. Do not use the public IP assigned by Amazon.
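If you'd rather not look the private IP up by hand, the Python sketch below asks the operating system which local address it would use for outbound traffic; connecting a UDP socket sends no packets. On an EC2 instance in a VPC this is normally the private IP, because the public IP is not assigned to the instance's network interface, but verify it against the EC2 console. outbound_ipv4 is a hypothetical helper, and 192.0.2.1 is a reserved documentation address used only to make the kernel pick a route.

```python
import ipaddress
import socket


def outbound_ipv4(probe_host="192.0.2.1", probe_port=9):
    """Return the local IPv4 address the OS would use to reach probe_host.

    connect() on a UDP socket sends nothing; it only makes the kernel choose a
    route and a source address. Falls back to 127.0.0.1 if no route exists.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect((probe_host, probe_port))
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"
    finally:
        sock.close()


if __name__ == "__main__":
    addr = outbound_ipv4()
    # On EC2 this should be a private address, e.g. 172.31.x.x in a default VPC.
    kind = "private" if ipaddress.ip_address(addr).is_private else "NOT private"
    print(f"{addr} ({kind})")
```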