How a last-minute, offhand decision led to our most popular API call.
Repustate’s mission is to become the world’s largest collection of natural language processing tools. To get there, we started out with a small set of API calls and keep adding to and improving them every week. Internally, we developed a tool that extracts the most important text from any web page. If you visit almost any site today, there’s usually a menu at the top, footer links at the bottom, maybe some ads on one side, perhaps links to other articles on the other side, and the main article down the middle. Often when data mining, you want just what’s down the middle, the heart of the article.

So we wrote a Python script to do this. On a whim, we decided to expose it through our API as well. Wouldn’t you know it, clean-html is our most popular API call, by far. In fact, about 60% of all of our API calls are to clean-html, which suits us just fine, but it’s kinda funny. A throwaway decision ended up being our most popular feature.

Just goes to show that one man’s simple, utilitarian API call is another man’s invaluable data processing tool. We’re trademarking that last sentence.
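
For the curious, here is a rough sketch of the general idea of pulling the "middle" out of a page. To be clear, this is not the script behind clean-html; it is a toy heuristic written for illustration, using only Python’s standard library. The names `ParagraphCollector` and `extract_main_text`, the list of skipped tags, and the `min_chars` and `max_link_density` thresholds are all assumptions made up for this example. It keeps paragraphs that sit outside menus, footers and sidebars, are reasonably long, and are not mostly link text.

```python
from html.parser import HTMLParser

# Assumption: content inside these tags is rarely part of the main article.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class ParagraphCollector(HTMLParser):
    """Collects the text of <p> elements that sit outside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0        # > 0 while inside any skipped tag
        self.in_paragraph = False
        self.in_link = False
        self.chunks = []           # text pieces of the current paragraph
        self.link_chars = 0        # characters of link text in the current paragraph
        self.paragraphs = []       # (text, link_density) tuples

    def _flush(self):
        """Store the current paragraph, if any, and reset the buffers."""
        if self.in_paragraph:
            text = " ".join("".join(self.chunks).split())
            if text:
                self.paragraphs.append((text, self.link_chars / len(text)))
        self.in_paragraph = False
        self.chunks, self.link_chars = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "p":
            self._flush()          # copes with pages that omit </p>
            self.in_paragraph = self.skip_depth == 0
        elif tag == "a":
            self.in_link = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "p":
            self._flush()
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_paragraph and self.skip_depth == 0:
            self.chunks.append(data)
            if self.in_link:
                self.link_chars += len(data.strip())


def extract_main_text(html, min_chars=40, max_link_density=0.5):
    """Return paragraphs that are long enough and not dominated by links."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    parser._flush()                # pick up a trailing unclosed paragraph
    kept = [
        text
        for text, density in parser.paragraphs
        if len(text) >= min_chars and density <= max_link_density
    ]
    return "\n\n".join(kept)
```

Calling `extract_main_text(page_html)` on a page’s HTML string returns the surviving paragraphs joined by blank lines. Real pages are far messier than this heuristic assumes (plenty of sites put article text in bare `div`s, for instance), so treat it as a starting point rather than a solution.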