Repustate’s API welcomes a new member to the family this week.
Our API will see two updates this week. The first is to our ever-popular “clean-html” API call. We’ve beefed it up to handle more cases and to be more resilient when it meets odd web pages. The next update is something we’re really happy about: our newest API call, ngrams. An n-gram is a sequence of n consecutive tokens. For example, a bi-gram is two tokens, such as “I like”. A tri-gram would be “I like Repustate”. You can count as high as you like, but in English you rarely go above 5-grams.

What’s the importance of n-grams? They let us see the frequencies and common patterns that occur in written text, which of course is crucial to our cause. You know when you type in a Google search and it just happens to know what you’re looking for? That’s largely because it’s returning the most common n-grams people have typed. Google has the world’s largest collection of n-grams; Repustate is trying to get there!
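To make the idea concrete, here is a minimal Python sketch of pulling n-grams out of a piece of text and counting them. It is only an illustration, not the code behind our API call: the function names `ngrams` and `ngram_counts` are made up for this post, and splitting on whitespace is a simplifying assumption (a real tokenizer would also deal with punctuation).

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of width n across the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, n):
    """Count n-grams in text (assumption: lowercase, whitespace tokens)."""
    tokens = text.lower().split()
    return Counter(ngrams(tokens, n))


tokens = "I like Repustate".split()
bigrams = ngrams(tokens, 2)   # the pairs "I like" and "like Repustate"
trigrams = ngrams(tokens, 3)  # the single triple "I like Repustate"
```

Calling `ngram_counts(text, 2).most_common(10)` on a larger body of text would list its ten most frequent bi-grams, which is the kind of frequency information described above.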
As usual, let us know what you want to see in semantic search or what you don’t like. We listen to everyone.