Do Amazon star ratings reflect sentiment in reviews?

I was recently asked about running sentiment analysis over various forms of customer feedback: how can we help with scaling this over large sets of data? This is an area of analytics I haven’t done much work in before, so a bit of research was in order – bring on the Amazon Review Data.

I set myself the task of answering the following question:

Is it really necessary to read Amazon Reviews or can I just be confident in basing my buying decision on the product star rating?

Like many data science projects, digging into this data for the first time opened up many areas that could be followed up in future work, but in this blog I am going to concentrate on my initial analysis of the 130M+ reviews using Tableau Desktop.

Utilizing Kognitio, available on the AWS Marketplace, we used a Python package called textblob to run sentiment analysis over the full set of 130M+ reviews. There was no need to code our own algorithm – just a simple wrapper for the package to pass data from Kognitio to Python and results back. Kognitio automatically scales processing based on the available compute resource. We simply connect Tableau to Kognitio to explore the data. Interested in more technical details? Then check out Mark Marsh’s excellent technical blog.

Sentiment Analysis shows strong correlation between star rating and polarity

Looking for patterns in the sentiment metrics (produced with textblob) by star rating, there appear to be strong correlations.

  • Polarity is an index between -1 and 1 that indicates how negative or positive the review body text is.
  • Subjectivity is a value between 0 and 1 indicating how personal the review is – for example, use of “I”, “my”, etc.

This suggests that the Amazon star rating is a good indicator of sentiment, but the data only includes reviews up to August 2015. Behavior might have changed in the last few years, so let’s look at trends.

Are reviews becoming clearer?

Sentiment Analysis shows that polarity scores by star rating are diverging with time. Are reviews becoming clearer?

In the line chart above I looked at the change in behavior over time. From 2000 (the first year with 1M+ reviews) the average polarity for each star rating appears quite stable until 2011. From there the average polarity values start to diverge. This suggests reviewers are becoming clearer in expressing their opinions.

Differences in subjectivity for lower star ratings are not really prevalent before 2004. After this, 1-star ratings start to become less subjective. This drops further in 2013, when the average subjectivity values all diverge significantly. Are we really more likely to “own” positive feedback than we were in previous years, and vice versa with negative feedback? This requires deeper research into changes in the language used in reviews. I am going to park subjectivity here, purely on the basis of blog length. I may re-visit it in a future blog though.

Why a divergent trend in polarity?

It’s certainly easier than ever to post our opinions – perhaps this increasing clarity reflects that? It occurred to me after my analysis that it would be interesting to look at the length of reviews and how this has changed too. My gut feel is that it is easier for the algorithm to derive sentiment from shorter reviews. Are reviews getting shorter and more succinct? What did I say about initial analysis opening up more questions and areas of interest?

Another possible cause could be an explosion in fake reviews on Amazon, where the reviewer obviously sets out to be clear in their sentiment. There has certainly been quite a lot written about this lately, but I’m not convinced it is driving the divergent polarity seen above. I would have thought fake reviews would be concentrated in 1 and 5 star ratings, but the other star ratings are diverging too.

Sentiment analysis by product category

All product categories show correlation between star rating and polarity in sentiment analysis. Mobile App reviewers are the most negative

Looking at the Top 10 categories by number of reviews posted, we can see slight differences in behavior by category. Books and Music reviews are generally positive. Is it more difficult for the algorithm to isolate the sentiment? It does seem likely, as these reviews may use more florid language. Mobile App reviews are generally more negative regardless of the star rating when compared to other categories. This category is quite new, as is Digital Ebook Purchases; both show stronger negative polarity.

Are new product categories (like Mobile Apps) driving the divergence in polarity over time?

In short: no. Bringing Year into the Pages pane in Tableau allows us to play through the number of reviews and polarity by year – see the video below. On the right we can see the explosion and diversification of reviews as Amazon grows. However, there is a clear divergence of polarity in all categories. By 2014 all categories barring Music have negative polarity in their 1-star rated reviews, and the 5-star reviews are also becoming more positive.

Is any of this really that helpful?

Obviously, for me as a learning exercise – yes. I now have the framework in place for running sentiment analysis at scale (see Mark Marsh’s blog for technical details), and I can swap in my preferred sentiment code and corpus as required.

My client has seen that sentiment analysis can be scaled easily (on-demand) using Kognitio on AWS and readily available python packages. It is possible for them to put the analysis in the hands of end-users. They can use their preferred BI tool without exposure to the underlying algorithms or code base.

Did I answer my question?

Well – sort of. The star ratings are a good indicator of the review text sentiment, so on the face of it reading the content is not necessary. However, if this were a client project there is definitely further work required:

  • some serious data cleansing – yes, the boring but essential stuff still needs doing: de-dups, rules for filtering out junk, you know the stuff (yawn).
  • algorithm development – the textblob python package used here was straightforward to get going, and it also lets you add in your own extensions and corpus. Having the Kognitio framework in place means I can easily try and deploy other algorithms.
  • actionable insight – what does the end user really need to make business decisions based on the sentiment analysis? I did play around a bit further to produce a Tableau dashboard on Helpful Review Analysis but this blog is already too long.
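As a taste of the de-dup step, here is a small sketch. The field names (`review_id`, `review_body`) are hypothetical, and in practice this would be pushed down to Kognitio as SQL rather than run in Python:

```python
import hashlib


def dedup_reviews(reviews):
    """Drop exact duplicate review bodies, keeping the first occurrence.

    `reviews` is an iterable of dicts with hypothetical keys
    'review_id' and 'review_body'. Bodies are normalised (trimmed,
    lower-cased) before hashing so trivial variants collapse together.
    """
    seen = set()
    unique = []
    for review in reviews:
        digest = hashlib.sha1(
            review["review_body"].strip().lower().encode("utf-8")
        ).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(review)
    return unique
```

Hashing the normalised body rather than storing it keeps the seen-set small when running over millions of rows.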

Further exploration

Finally, there are also a few areas of this sentiment analysis I would like to explore more, because I think they may have applications for future client projects:

  • Review length – correlations with polarity. How much of a review is needed to get at its sentiment?
  • Product delivery is a major factor in reviews – while carrying out the analysis I saw that lots of ratings were driven by delivery and damage in transit. Should these reviews be separated out and analysed differently?
  • Fake reviews – I would love to scale out some of the processing in the algorithms being used to flag genuine reviews versus fake ones.
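The review-length idea boils down to a simple correlation check once the scored data is to hand. A sketch with made-up numbers – with the real data the (length, polarity) pairs would come from the Kognitio results table:

```python
def pearson(xs, ys):
    """Plain Pearson correlation coefficient; no numpy needed for a quick check."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    std_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (std_x * std_y)


# Made-up (review length in words, polarity) pairs for illustration only.
lengths = [12, 45, 230, 18, 560, 90]
polarities = [0.8, 0.4, 0.1, -0.9, 0.05, -0.3]
print(pearson(lengths, polarities))
```

A strong coefficient either way would support (or undermine) my hunch that shorter reviews are easier for the algorithm to score.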

Why Kognitio and Tableau?

Kognitio is great for supporting Tableau data discovery and dashboard development as it is specifically designed for running complex SQL and analytics over large data sets. This means you don’t have to sample or extract data prior to discovery – the super-fast response times from Kognitio mean you can follow your train of thought directly from Tableau.
In this analysis I didn’t have to go back into the database at all; I simply built out parameters, filters and metrics in Tableau as I went.

Note: Kognitio supports access from any JDBC/ODBC connection, so if you prefer a different tool you can connect it directly to Kognitio.
