“Too much of anything is bad, but too much good whiskey is barely enough.” – Mark Twain
Whiskey is a popular alcoholic beverage whose price can range from a few dollars to hundreds of dollars a bottle. Consumers would like a trusted source on the quality of whiskeys. This document examines the data on whiskyanalysis.com to answer whether these crowd-sourced reviews provide data good enough to help the whiskey shopper make sound buying decisions. I find this personally interesting, as the store shelf can be very confusing, and the ability to quickly look up the quality of a bottle I have not heard of can be very helpful.
The website is very easy to use, and I scraped the data from the XML table there. It includes almost 1800 reviews and 10 variables. What is interesting and different about this site is that the ratings come from a select group of between 25 and 30 reviewers, rather than from anyone who wishes to submit a review. I plan to use only some of the variables to examine whether there are any biases or peculiarities in the review scores. I will start by plotting some of the data to provide a basic understanding of it. I will then try to determine whether the ratings vary with the other variables, to see if the reviews can be trusted.
Column Descriptions for whiskyanalysis.com
Whisky is the name of the particular whisky. A number followed by “yo” is the age in years stated on the label. Additional information about the bottling, where relevant, appears in brackets.
Meta.Critic score refers to the average normalized score of all reviewers who have reported on that whiskey. This is not a raw score aggregation, but a proper statistical meta-analysis with standardized normalization.
STDEV is the standard deviation of the mean Meta Critic score, a measure of variance.
# is the number of reviewers on which the mean Meta Critic score and standard deviation are based.
Cost is an approximate indicator of the average worldwide price for the whisky in question, measured in dollar signs (one through five plus).
Symbol | Cost Range |
---|---|
$ | <$30 CAD |
$ $ | $30~$50 CAD |
$ $ $ | $50~$70 CAD |
$ $ $ $ | $70~$125 CAD |
$ $ $ $ $ | $125~$300 CAD |
$ $ $ $ $+ | >$300 CAD |
Class groups together whiskeys that share major common characteristics.
Super Cluster and Cluster typically refer to the revised flavour cluster analysis performed here, based on the earlier Wishart analysis (and expanded for all single malt-like whiskies in the dataset). The Super Clusters are groupings of Clusters where the characteristics are similar enough to overlap considerably on the principal component analysis. I will not use these in this analysis.
Country is the country of origin for the whiskey.
Type is the actual source material used for the whiskey.
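To make the Meta.Critic, STDEV, and # columns a little more concrete, here is a minimal sketch of one common way to normalize reviewers who score on different scales: convert each reviewer's scores to z-scores, then average per whisky. This is only my assumption about the general idea; the site's actual meta-analysis may work differently, and the reviewer names, whisky names, and scores below are all invented.

```python
from statistics import mean, pstdev, stdev

# Hypothetical raw scores: reviewer -> {whisky: score}. All values are invented,
# and each reviewer deliberately uses a different scale.
raw_scores = {
    "reviewer_a": {"Whisky X": 90, "Whisky Y": 80, "Whisky Z": 85},
    "reviewer_b": {"Whisky X": 8.0, "Whisky Y": 6.0, "Whisky Z": 7.5},
    "reviewer_c": {"Whisky X": 4.5, "Whisky Y": 3.5},
}


def normalize(scores):
    """Convert one reviewer's scores to z-scores (mean 0, standard deviation 1)."""
    mu = mean(scores.values())
    sigma = pstdev(scores.values())  # assumes the reviewer's scores are not all equal
    return {whisky: (s - mu) / sigma for whisky, s in scores.items()}


def meta_scores(raw):
    """Return whisky -> (mean z-score, standard deviation, number of reviewers)."""
    by_whisky = {}
    for scores in raw.values():
        for whisky, z in normalize(scores).items():
            by_whisky.setdefault(whisky, []).append(z)
    return {
        whisky: (mean(zs), stdev(zs) if len(zs) > 1 else 0.0, len(zs))
        for whisky, zs in by_whisky.items()
    }


summary = meta_scores(raw_scores)
for whisky, (score, sd, n) in summary.items():
    print(f"{whisky}: score={score:.2f} stdev={sd:.2f} n={n}")
```

The z-scores here are centered on zero, unlike the scores on the site, so a real implementation would presumably map them back onto a common rating scale.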
… And with that, let us have fun exploring the Whiskey Database!
Summary Table
Below is a sample table of the variables I plan to use in this analysis.
Exploring the Data
When I found the data on whiskyanalysis.com I wanted to understand the data visually.
First I wanted to see how the whiskeys reviewed varied by cost. The website categorizes whiskey into 6 categories based on the number of dollar signs. The plot shows that the number of whiskeys per category resembles a normal distribution (though skewed a bit).
When we review by class we see that the largest number of whiskeys reviewed were malt whiskeys. This was followed closely by Blended Whiskeys, Bourbons, and Ryes. The other 5 classes had a very low number of reviews.
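For readers who want to reproduce a count like this, here is a rough sketch of tallying whiskeys per cost category in price order. The `COST_ORDER` list, the `count_by_cost` helper, and the sample labels are my own illustrative names and invented values, not the scraped data; I also assume the dollar signs may or may not be separated by spaces.

```python
from collections import Counter

# The six cost symbols from the site's table, cheapest first.
COST_ORDER = ["$", "$$", "$$$", "$$$$", "$$$$$", "$$$$$+"]


def count_by_cost(cost_labels):
    """Count whiskeys per cost category, returned in price order."""
    # Strip spaces so that "$ $ $" and "$$$" land in the same bucket.
    counts = Counter(label.replace(" ", "") for label in cost_labels)
    return [(symbol, counts.get(symbol, 0)) for symbol in COST_ORDER]


# Invented labels standing in for the scraped Cost column.
sample_costs = ["$$", "$$$", "$$$", "$$$$", "$ $ $", "$", "$$$$$+", "$$$$"]

# Print a crude text bar chart, one row per category.
for symbol, n in count_by_cost(sample_costs):
    print(f"{symbol:<7}{'#' * n} ({n})")
```

Keeping the categories in an explicit ordered list matters here, because sorting the symbols alphabetically would not guarantee the price order.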
I was interested to see if there was a difference in ratings per price group using the mean. This box plot also allows us to see the spread and whether there are outliers. The chart shows that, in general, ratings go up as the price of the whiskey goes up.
I was also interested in seeing if there is a difference in ratings between the types of whiskey. Focusing on the top 4 types, Bourbon, Malt, and Rye were similar, but Blends were slightly lower.
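The numbers behind a box plot like this can be sketched in a few lines. The following is an illustration only: the `summarize_by_group` helper is hypothetical, the ratings are invented, and the quartile method used by Python's `statistics.quantiles` may differ slightly from the one a plotting library uses.

```python
from statistics import mean, median, quantiles


def summarize_by_group(rows):
    """rows: iterable of (cost_symbol, rating). Returns per-group summary stats."""
    groups = {}
    for cost, rating in rows:
        groups.setdefault(cost, []).append(rating)

    summary = {}
    for cost, ratings in groups.items():
        q1, _, q3 = quantiles(ratings, n=4)  # quartiles (needs at least 2 ratings)
        iqr = q3 - q1
        # Points more than 1.5 * IQR beyond the box are the usual box plot outliers.
        outliers = [r for r in ratings if r < q1 - 1.5 * iqr or r > q3 + 1.5 * iqr]
        summary[cost] = {
            "n": len(ratings),
            "mean": mean(ratings),
            "median": median(ratings),
            "outliers": outliers,
        }
    return summary


# Invented ratings for two price groups, purely for illustration.
cheap = [7.9, 8.0, 8.0, 8.1, 8.1, 8.2, 8.3, 6.0]
mid_priced = [8.4, 8.5, 8.5, 8.6, 8.6, 8.7, 8.8, 8.5]
rows = [("$", r) for r in cheap] + [("$$$", r) for r in mid_priced]

stats = summarize_by_group(rows)
for cost, s in stats.items():
    print(cost, s)
```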
Doing the same exercise by country creates the busy plot below.
Plotting the number of whiskeys reviewed by country of origin shows that Scotland has the most by far, followed by the USA and Canada. While the top three by volume of reviews have similar results, some of the countries with fewer reviews provide interesting comparisons. Japan has a similar number of whiskeys reviewed to Ireland, which is known for its whiskey, and seems to have slightly better reviews. Sweden and India, which have about one half and two thirds as many reviews as Ireland, respectively, also scored better than Ireland.
The website provided the number of ratings for each whiskey. I was curious if the number of ratings had any relation to the rating.
The boxplot above is interesting but does not seem to indicate a pattern.
A test for heteroscedasticity finds that the variance of the residuals increases with the fitted values for the number of ratings.
In other words, the variance in the ratings depends on the number of people who rated the whiskey.
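As a sketch of what such a test involves, here is the studentized (Koenker) form of the Breusch-Pagan test for a single predictor, written with the standard library only. Treat it as an assumption on my part: this write-up does not say which test was run, the function names are mine, and the data below is invented so that the spread grows with the predictor.

```python
import math
from statistics import mean


def ols(x, y):
    """Fit y = a + b*x by least squares; return (a, b, residuals)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a, b, residuals


def r_squared(x, y):
    """Coefficient of determination of the simple regression of y on x."""
    _, _, residuals = ols(x, y)
    y_bar = mean(y)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    ss_res = sum(r ** 2 for r in residuals)
    return 1 - ss_res / ss_tot


def breusch_pagan(x, y):
    """Return (LM statistic, p-value) for heteroscedasticity of y regressed on x."""
    _, _, residuals = ols(x, y)
    squared = [r ** 2 for r in residuals]
    # Auxiliary regression of squared residuals on x: LM = n * R^2.
    lm = max(len(x) * r_squared(x, squared), 0.0)
    # Chi-square survival function with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lm / 2))
    return lm, p_value


# Invented data: scores whose spread widens as the predictor grows.
n_reviewers = list(range(1, 21))
scores = [8.5 + (-1) ** i * 0.05 * n for i, n in enumerate(n_reviewers)]

lm, p_value = breusch_pagan(n_reviewers, scores)
print(f"LM = {lm:.2f}, p = {p_value:.4f}")
```

A small p-value suggests that the residual variance is not constant across the range of the predictor, which is the situation described above.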
Based on my initial analysis, I would tend to trust the Meta Critic scores that have a higher number of reviews. To examine this further, I would try to get the individual scores from the critics and conduct a correlation analysis of their scores. I would need to get additional data from the website to do this.
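The correlation analysis I have in mind could look roughly like the sketch below, which computes the Pearson correlation between each pair of critics over the whiskeys they both scored. I do not have the individual scores, so the critic names, whisky labels, and numbers are all invented, and the helper names are my own.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return cov / math.sqrt(sxx * syy)


def pairwise_correlations(critic_scores):
    """Correlate every pair of critics over the whiskeys both have scored."""
    results = {}
    for a, b in combinations(sorted(critic_scores), 2):
        shared = sorted(set(critic_scores[a]) & set(critic_scores[b]))
        if len(shared) < 3:
            continue  # too few shared whiskeys to say anything useful
        xs = [critic_scores[a][w] for w in shared]
        ys = [critic_scores[b][w] for w in shared]
        results[(a, b)] = pearson(xs, ys)
    return results


# Invented scores; note that the critics use different scales.
critic_scores = {
    "critic_a": {"W1": 85, "W2": 90, "W3": 78, "W4": 88},
    "critic_b": {"W1": 8.0, "W2": 9.0, "W3": 7.0, "W4": 8.5},
    "critic_c": {"W1": 90, "W2": 80, "W3": 85},
}

correlations = pairwise_correlations(critic_scores)
for pair, r in correlations.items():
    print(pair, round(r, 3))
```

Because Pearson correlation is unaffected by each critic's scale, no normalization is needed before comparing them; critics who consistently disagree with the rest would show up as low or negative correlations.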