The Repustate Server is a self-contained executable that provides the full functionality of the Repustate API, with the privacy of being hosted in your own data center. There are no quotas or usage restrictions when you use the Repustate Server. It is the ideal product for organizations that need text analytics at a very large scale.
The Repustate Server can be installed on any 64-bit operating system including:
Because it is platform agnostic, the Repustate Server can be installed on dedicated hardware in your own private data center or on cloud infrastructure such as Azure, Google Compute Engine or AWS.
The recommended specs for hardware needed to run the Repustate Server are:
Once you've obtained your license, a link to your installer will be emailed to you. The steps that follow assume you've downloaded the installer executable onto the server where you plan to host Repustate.
The great thing about the Repustate Server is that you can use the same code and clients for the Server as you do for the public API, so it's easy to switch from one to the other.
All API calls that you see on our public API work exactly the same on your Repustate Server. The only difference is that instead of sending your API calls to https://api.repustate.com, you send them to the IP address(es) of your Repustate Server instance(s).
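As a rough illustration of that switch, the sketch below keeps the base URL in one variable so the same call can target either the public API or your own Server. The `REPUSTATE_BASE` variable, the helper names, the `/v3/<key>/<endpoint>` path layout and the `text` form field are assumptions made for illustration; substitute the paths and parameters you already use against the public API.

```shell
#!/usr/bin/env bash
# The base URL is the only thing that changes between the public API and a
# self-hosted Server. The variable name is hypothetical.
REPUSTATE_BASE="${REPUSTATE_BASE:-https://api.repustate.com}"

# Build the full URL for an API call. The path layout is an assumption;
# use the path you already send to the public API.
repustate_url() {
  local api_key="$1" endpoint="$2"
  printf '%s/v3/%s/%s\n' "${REPUSTATE_BASE%/}" "$api_key" "$endpoint"
}

# POST a "text" field (field name assumed) to whichever base is configured.
repustate_post() {
  local api_key="$1" endpoint="$2" text="$3"
  curl --silent --show-error --data-urlencode "text=$text" \
    "$(repustate_url "$api_key" "$endpoint")"
}
```

With `REPUSTATE_BASE=http://10.0.0.5:9000` set (a made-up address), `repustate_post YOUR_KEY score.json "great service"` would go to your own instance rather than the public API, with no other change to the calling code.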
The server also allows you to specify some options at the command line to configure behaviour (you can also run ./repustate -h to see these options):
| Option | Default | Description |
|---|---|---|
| --host | localhost (127.0.0.1) | Specify the IP address the Repustate Server should bind to |
| --port | 9000 | Specify which port the Repustate Server should listen to for incoming API calls |
| --langs | All | Specify which languages to include at startup time. If you're only interested in analyzing a few languages, specify a comma separated list of language codes. This will help reduce startup time. e.g. --langs en,de,fr would enable only English, German and French |
| --verbose | | If included, the Repustate Server will output various status messages and periodically display mean response time for API calls |
| --license | | If included will display when your current license expires |
| --version | | Display which version of the Repustate Server you're using |
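To show how the options above combine, here is a small sketch that assembles a startup command. The helper name and the example address are hypothetical; only the flags themselves come from the list above.

```shell
#!/usr/bin/env bash
# Assemble a startup command from the documented options.
# Usage: build_repustate_cmd HOST PORT [LANGS]
build_repustate_cmd() {
  local host="$1" port="$2" langs="$3"
  local cmd=(./repustate --host "$host" --port "$port")
  # --langs is optional; leaving it out loads all languages (the default).
  if [ -n "$langs" ]; then
    cmd+=(--langs "$langs")
  fi
  cmd+=(--verbose)
  # Print the command rather than running it, so it can be reviewed first.
  printf '%s\n' "${cmd[*]}"
}
```

Calling `build_repustate_cmd 10.0.0.5 9000 en,de,fr` prints a command that binds to a non-default address, listens on port 9000, loads only English, German and French, and enables status output.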
Special note about the clients: While you're welcome to use the Repustate Client libraries we provide, make sure to change the host name that you send your requests to. By default, all client libraries are hard-coded to send requests to https://api.repustate.com.
When Repustate releases a new version of the Server, you will receive an email with a link to download an updated installer. Download and run the installer. It will create a new Repustate executable that is meant to replace the existing one you have. While you could stop the old server, replace the executable binary, and then restart, this would result in downtime of a minute or two.
To have API calls handled by the new version without downtime, replace the old executable with the new one and send a USR2 signal to the process ID of the old process. Any existing or in-flight API calls will be handled by the old process, and once they're all done, the old process will shut down and the new one will take over. The process ID can be found in a file called `repustate.pid` in the same directory as the executable itself. For Linux users (this is not yet supported on Windows), this is how a graceful restart can be accomplished:
    kill -USR2 "$(cat repustate.pid)"
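If you script this step, it can help to check the PID file before signalling. The following is a minimal sketch under that assumption; the function name is hypothetical, and it does nothing beyond the `kill -USR2` described above apart from the safety checks.

```shell
#!/usr/bin/env bash
# Send USR2 to the running Server so the replaced executable takes over.
# Takes the directory that holds the executable and repustate.pid.
graceful_restart() {
  local dir="${1:-.}"
  local pid_file="$dir/repustate.pid"

  if [ ! -r "$pid_file" ]; then
    echo "cannot read $pid_file" >&2
    return 1
  fi

  local pid
  pid="$(tr -d '[:space:]' < "$pid_file")"

  # kill -0 sends no signal; it only checks that the process exists.
  if ! kill -0 "$pid" 2>/dev/null; then
    echo "no running process with PID $pid" >&2
    return 1
  fi

  kill -USR2 "$pid"
}
```

Run `graceful_restart /path/to/repustate/dir` after copying the new executable into place; a non-zero exit status means no signal was sent.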
To increase throughput and add redundancy in the event of an unexpected outage, it is advisable to deploy Repustate across multiple servers. By putting a load balancer such as HAProxy or nginx in front of your servers, you can round-robin your requests and spread the workload across your Repustate Servers.
To accommodate this sort of architecture, Repustate Servers come with a built-in feature called Repustate Sync. Whenever you add new custom sentiment rules or new filter rules, Repustate Sync ensures all rules and filters are distributed across all of your Repustate Server instances.
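For the nginx case, one way to set up the round-robin is sketched below as a small script that prints a config for a list of backends. The function name, the `repustate_servers` upstream name, the listen port and the backend addresses are all illustrative assumptions; round-robin is nginx's default method for an `upstream` group, so no extra directive is needed for it.

```shell
#!/usr/bin/env bash
# Print an nginx config that round-robins across the given host:port backends.
# Usage: nginx_repustate_conf HOST:PORT [HOST:PORT ...]
nginx_repustate_conf() {
  echo "upstream repustate_servers {"
  local backend
  for backend in "$@"; do
    echo "    server $backend;"
  done
  echo "}"
  # The server block below forwards every request to the upstream group.
  cat <<'EOF'

server {
    listen 80;
    location / {
        proxy_pass http://repustate_servers;
    }
}
EOF
}
```

For example, `nginx_repustate_conf 10.0.0.11:9000 10.0.0.12:9000 > repustate.conf` produces a file that can typically be placed in nginx's `conf.d` directory and loaded with `nginx -s reload`.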
To enable Repustate Sync, you must create two files, called "server" and "clients" within a directory called "sync" in your $REPUSTATE_HOME directory on EACH server that hosts a Repustate Server instance. Your filesystem layout should look like this:
    $REPUSTATE_HOME/
        repustate
        installer
        custom/
            ...
        models/
            ...
        sync/
            server
            clients

sync/server contains only one line of text: the IP address and port for this server, which MUST be reachable by all peers. It can be the public IP of the server if the peers are accessing it via a public network, or it can be an IP address that is internal to your network. All that matters is that the peers can reach it. Note: the port must be different from the port you're running Repustate on. For example, if you run Repustate on port 9000, configure your sync server to be on port 9001 (or any port other than 9000).
sync/clients contains the IPs and port numbers of all the peers you're interested in syncing with. List each IP and port on its own line. Again, all that is important is that each peer can reach the other peers so the IPs don't necessarily have to be the public IPs of the servers.
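The two files can be written by hand, or with a small helper like the sketch below. The function name and the addresses are hypothetical, and the `IP:PORT` form on each line is an assumption about how the address and port are written; adjust it if your installation expects a different separator.

```shell
#!/usr/bin/env bash
# Write sync/server and sync/clients under the given Repustate home.
# Usage: write_sync_config HOME_DIR THIS_ADDR PEER_ADDR [PEER_ADDR ...]
write_sync_config() {
  if [ "$#" -lt 3 ]; then
    echo "usage: write_sync_config HOME_DIR THIS_ADDR PEER_ADDR..." >&2
    return 1
  fi
  local home="$1" self="$2"
  shift 2
  mkdir -p "$home/sync"
  # sync/server: exactly one line, this server's own sync address.
  printf '%s\n' "$self" > "$home/sync/server"
  # sync/clients: one peer address per line.
  printf '%s\n' "$@" > "$home/sync/clients"
}
```

On a server whose sync address is 10.0.0.11:9001 with two peers, this would be `write_sync_config "$REPUSTATE_HOME" 10.0.0.11:9001 10.0.0.12:9001 10.0.0.13:9001`; each of the other servers runs the same helper with its own address first and the remaining servers as peers.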
If you've configured everything correctly, during startup you'll see a message stating "Repustate Sync enabled. Configured with $N peers" where $N is the number of client IP addresses you listed in sync/clients.
Special note for AWS EC2 users: Configure your server addresses using 0.0.0.0 as the host OR by using the private internal IP EC2 assigns. This private IP can be found either via ifconfig or by looking at the EC2 admin panel. Do not use the public IP assigned by Amazon.
If you change the address or port of your sync server or of any peer, or if you add or remove peers, simply update the configuration files on all servers. Repustate will automatically detect these changes, update itself, and ensure all new peers are synced up. There is no need to restart manually.
There are a few tweaks you can make to your servers to optimize the performance of the Repustate Server. Firstly, we suggest not running anything other than Repustate on your server because, depending on your workload, Repustate might be very resource intensive, particularly with RAM. We also suggest making the following changes to your /etc/sysctl.conf:
These changes will allow your OS to reuse TCP connections quickly and not run out of file descriptors during heavier loads.
We also recommend bumping up the total number of open files allowed by editing /etc/security/limits.conf and adding two entries:
The * refers to which users should have their open file limit increased. If you want to restrict the change to just one user, replace the * with the relevant username.
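To confirm that a new open file limit is in effect, you can inspect the limits of the user that runs Repustate. The sketch below only reads the current values and changes nothing; the function name is hypothetical. Changes to limits.conf typically apply to new login sessions, so check from a fresh login.

```shell
#!/usr/bin/env bash
# Report the soft and hard open-file limits for the current shell session.
show_open_file_limits() {
  printf 'soft=%s hard=%s\n' "$(ulimit -Sn)" "$(ulimit -Hn)"
}
```

Run `show_open_file_limits` as the user that starts the Repustate Server and compare the output with the values you configured.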