(This is a guest post by Sarah Harmon. Sarah is currently a PhD student at UC Santa Cruz studying artificial intelligence and natural language processing. Her website is NeuroGirl.com.)
Using Repustate’s API to get the most out of TripAdvisor customer reviews
I’m a big traveler, so I often check online ratings, such as those on TripAdvisor, to decide which local hotel or restaurant is worth my time. In this post, I use Repustate’s text categorization API to analyze the sentiment of hotel reviews and, in so doing, work towards a better online hotel rating system.
Five-star scales aren’t good enough
The five-star rating scale is a standard ranking system for many products and services. It’s a fast way for consumers to gauge the quality of something before paying for it, but it’s not always accurate. Five-star ratings are inherently subjective, don’t let raters say everything they want to say, and force a generic label onto a complex experience.
Asking people to write text reviews solves a few of these problems. Instead of relying on the five-star scale, we ask people to highlight the most memorable parts of their experience. Still, who has the time to read hundreds of reviews to get a true sense of what a hotel is like? Even the small sample of reviews shown on the first page could be unhelpful, unreliable, or cleverly submitted by the hotel itself to make itself look good. What we need is a way to summarize the review content, ideally with a summary that’s specific to our own values.
Making a personalized hotel ranking system
Let’s take a look at a website that uses star ratings all the time: TripAdvisor, a travel resource that features an incredible wealth of user-reviewed hotels.
CasperJS, a tool for scripting a headless browser, makes it straightforward to load TripAdvisor’s listing pages and read off the star rating of every hotel listed for New York.
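The parsing half of that job can be sketched in Python with the standard library’s `html.parser`, assuming the page HTML has already been fetched (in this project, CasperJS handled the browsing). The markup the sketch looks for, a hotel name inside an `<a class="hotel-name">` link followed by an element carrying a `data-rating` attribute, is purely hypothetical; TripAdvisor’s real markup is different and changes over time, so the tag and attribute checks would need adapting.

```python
from html.parser import HTMLParser


class StarRatingParser(HTMLParser):
    """Collect (hotel name, star rating) pairs from listing-page HTML.

    Hypothetical markup: each hotel name sits in <a class="hotel-name">,
    followed by any element with a data-rating attribute. This is NOT
    TripAdvisor's actual page structure.
    """

    def __init__(self):
        super().__init__()
        self.results = []      # list of (name, rating) tuples, in page order
        self._in_name = False  # True while inside a hotel-name link
        self._name = None      # name waiting to be paired with a rating

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "hotel-name" in classes:
            self._in_name = True
            self._name = ""
        elif attrs.get("data-rating") is not None and self._name is not None:
            # Pair the most recent hotel name with this rating.
            self.results.append((self._name.strip(), float(attrs["data-rating"])))
            self._name = None

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_name = False

    def handle_data(self, data):
        if self._in_name:
            self._name += data


def extract_star_ratings(html):
    """Return [(hotel name, star rating), ...] found in the given HTML."""
    parser = StarRatingParser()
    parser.feed(html)
    parser.close()
    return parser.results
```

Keeping the parsing separate from the fetching means the same function can be tested on saved pages without launching a browser.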
In a similar fashion, I used Python and CasperJS to retrieve a sample of 100 hotels from each of five major locations (Italy, Spain, Thailand, the US, and the UK) and collected the top 400 English reviews for each hotel as listed on TripAdvisor. To ensure that each stored review was in English, I relied on Python’s Natural Language Toolkit (NLTK). Finally, I called the Repustate API to analyze each review across six hotel-specific categories: food, price, location, accommodations, amenities, and staff.
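The post doesn’t say exactly how NLTK was used for the English check. One common approach counts how many of a review’s words are English stopwords; the sketch below follows that idea but swaps NLTK’s stopword corpus (`nltk.corpus.stopwords.words("english")`) for a small hard-coded subset so it runs without downloading NLTK data. The 0.2 threshold is an arbitrary assumption, and a word-list heuristic like this can misjudge very short reviews.

```python
import re

# Small hand-picked subset of English function words (assumption: stands in
# for NLTK's full English stopword list). Words such as "a", "i" and "in" are
# left out because they are also common in Spanish or Italian text.
ENGLISH_STOPWORDS = {
    "the", "and", "an", "is", "was", "were", "to", "of", "it", "we", "our",
    "my", "for", "with", "on", "at", "this", "that", "very", "but", "not",
    "be", "are", "had", "have", "they", "you",
}


def looks_english(text, threshold=0.2):
    """Guess whether text is English from the share of English stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return False
    hits = sum(1 for tok in tokens if tok in ENGLISH_STOPWORDS)
    return hits / len(tokens) >= threshold


def english_reviews(reviews, limit=400):
    """Keep at most `limit` reviews that pass the English check, in input order."""
    kept = []
    for review in reviews:
        if looks_english(review):
            kept.append(review)
            if len(kept) == limit:
                break
    return kept
```

Because `english_reviews` keeps the first passing reviews in the order given, it assumes the input is already sorted the way TripAdvisor lists them.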
Check out the results in this “HotelRater” demo. Select a location and a niche you care about, and you’ll see hotels ranked from highest to lowest sentiment for that category. To generate those results, each hotel’s sentiment scores were averaged within each category and then rescaled to a range of 1 to 100. (I chose the mean of the sentiment values because it’s easy to calculate and understand.) The TripAdvisor five-star rating is shown for comparison. You can also click on a listed hotel to see how Repustate categorized each of its reviews.
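The averaging-and-rescaling step could look something like the following. It assumes each review’s result has already been reduced to one sentiment score per category it mentions, in the range -1 to 1; that data shape and range are assumptions about the API output, not taken from the post. The post also doesn’t specify the rescaling, so a simple linear map onto 1 to 100 is used here.

```python
from collections import defaultdict
from statistics import mean

CATEGORIES = ("food", "price", "location", "accommodations", "amenities", "staff")


def category_means(reviews):
    """Average the per-category sentiment of one hotel's reviews.

    `reviews` is a list of dicts such as {"food": 0.6, "staff": -0.2}
    (assumed shape); a review only counts toward categories it mentions.
    """
    scores = defaultdict(list)
    for review in reviews:
        for category, score in review.items():
            scores[category].append(score)
    return {cat: mean(vals) for cat, vals in scores.items()}


def to_1_100(score, low=-1.0, high=1.0):
    """Linearly map a score in [low, high] onto the range 1 to 100."""
    return 1 + (score - low) / (high - low) * 99


def rank_hotels(hotels, category):
    """Return (hotel, 1-100 score) pairs for one category, best first.

    `hotels` maps hotel name -> list of review dicts. Hotels with no
    reviews mentioning `category` are left out of the ranking.
    """
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    ranked = []
    for name, reviews in hotels.items():
        means = category_means(reviews)
        if category in means:
            ranked.append((name, to_1_100(means[category])))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

A linear map preserves the order of the averages, so the rescaling only changes how the numbers read, not which hotel ranks first.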
Once I started using the app I built, I could suddenly make sense of TripAdvisor’s abundance of data. A hotel might average only four stars, yet its customers might be very happy with key aspects such as its food, staff, and location. Desirable hotels popped up in my search results that I might never have seen or considered because of their lower average star rating on TripAdvisor. The sentiment scores also helped me tell apart hotels I previously would have struggled to choose between because they all shared the same 4.5-star rating.
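The two effects described here, breaking ties among hotels with the same star rating and surfacing well-liked hotels with fewer stars, can be sketched in a few lines. The names `HotelScore`, `order_with_tiebreak`, and `overlooked` are hypothetical, the threshold is arbitrary, and this isn’t a description of how HotelRater itself presents the comparison.

```python
from typing import NamedTuple


class HotelScore(NamedTuple):
    name: str
    stars: float      # TripAdvisor average star rating
    sentiment: float  # 1-100 score for the chosen category


def order_with_tiebreak(hotels):
    """Sort by star rating, then by sentiment within each star level."""
    return sorted(hotels, key=lambda h: (h.stars, h.sentiment), reverse=True)


def overlooked(hotels, threshold):
    """Names of hotels below the top star rating whose sentiment still
    reaches `threshold` (a hypothetical cutoff)."""
    top = max(h.stars for h in hotels)
    return [h.name for h in hotels if h.stars < top and h.sentiment >= threshold]
```

For example, among several 4.5-star hotels, `order_with_tiebreak` puts the one with the highest category score first, while `overlooked` flags a 4-star hotel whose score beats all of them.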
This demo isn’t a replacement for TripAdvisor by any means, since it holds too little data and offers too few options to help you in your quest for the perfect vacation hotel. That said, it’s a positive step towards a ranking system that’s aligned with our individual values. We can quickly see how people really feel without condensing their more specific thoughts into a blanket statement.
I’d give that five stars any day.