Introduction

This data set came from a Kaggle user who scraped wine reviews from WineEnthusiast.com during the week of November 22nd, 2017. There are 129,971 rows, each representing one wine review. Each row has 14 columns, including information about the reviewer, the wine’s region, and the wine’s characteristics. The data points collected for each review are:

##  [1] "V1"                    "country"              
##  [3] "description"           "designation"          
##  [5] "points"                "price"                
##  [7] "province"              "region_1"             
##  [9] "region_2"              "taster_name"          
## [11] "taster_twitter_handle" "title"                
## [13] "variety"               "winery"

Graph One - Wine by Country

This first graph looks at the average wine rating for each country in the data set and shows the 10 highest-rated countries. To remove countries with very few reviews, I excluded any country with fewer than 100 wine reviews in the data set. Quality means little without knowing the price point, so each country’s bar is also filled according to the average price of wine from that country, to indicate whether that country’s wines are a good value.
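The filtering and ranking described above can be sketched in a few lines. This is a minimal Python illustration using only the standard library, not the code that produced the graph. The function name `top_countries`, its parameters, the toy rows, and the choice to skip missing prices when averaging are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

def top_countries(rows, min_reviews=100, top_n=10):
    """Rank countries by mean points, keeping only those with enough reviews."""
    by_country = defaultdict(list)
    for row in rows:
        by_country[row["country"]].append(row)

    summary = []
    for country, group in by_country.items():
        if len(group) < min_reviews:
            continue  # drop countries with fewer than min_reviews reviews
        # Assumption: reviews without a price are skipped when averaging price.
        prices = [r["price"] for r in group if r["price"] is not None]
        summary.append({
            "country": country,
            "reviews": len(group),
            "avg_points": mean(r["points"] for r in group),
            "avg_price": mean(prices) if prices else None,
        })

    # Highest average rating first, then keep the top_n countries.
    summary.sort(key=lambda s: s["avg_points"], reverse=True)
    return summary[:top_n]

# Invented toy rows for illustration only; not taken from the data set.
toy_rows = [
    {"country": "A", "points": 90, "price": 30.0},
    {"country": "A", "points": 92, "price": None},
    {"country": "B", "points": 95, "price": 80.0},
    {"country": "C", "points": 85, "price": 10.0},
    {"country": "C", "points": 87, "price": 20.0},
]
ranked = top_countries(toy_rows, min_reviews=2, top_n=10)
```

With the toy threshold of two reviews, country B is dropped even though it has the highest single score, which is the effect the 100-review cutoff is meant to have on the real data.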

Graph Two - Average Price per Bottle

Secondly, I wanted to examine which types of wine tend to command the highest prices. This graph takes the 10 most reviewed varieties of wine and ranks them by average price per bottle. All wines of the same variety are aggregated and their average price is taken to produce the graph.
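The two-step aggregation (pick the most reviewed varieties, then rank them by average price) might look like the following. Again this is a hedged Python sketch with invented toy rows; `priciest_common_varieties` is a hypothetical name, and skipping reviews with no price is an assumption.

```python
from collections import Counter, defaultdict
from statistics import mean

def priciest_common_varieties(rows, top_n=10):
    """Take the top_n most reviewed varieties and rank them by mean price."""
    counts = Counter(r["variety"] for r in rows)
    common = {variety for variety, _ in counts.most_common(top_n)}

    prices = defaultdict(list)
    for r in rows:
        # Assumption: reviews with no price are left out of the average.
        if r["variety"] in common and r["price"] is not None:
            prices[r["variety"]].append(r["price"])

    ranked = [(variety, mean(p)) for variety, p in prices.items()]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked

# Invented toy rows for illustration only; not taken from the data set.
toy_rows = [
    {"variety": "X", "price": 10.0},
    {"variety": "X", "price": 20.0},
    {"variety": "X", "price": 30.0},
    {"variety": "Y", "price": 50.0},
    {"variety": "Y", "price": 70.0},
    {"variety": "Z", "price": 500.0},
]
ranking = priciest_common_varieties(toy_rows, top_n=2)
```

Variety Z has the most expensive bottle in the toy rows but only one review, so it never enters the ranking. Selecting by review count first keeps rare, expensive varieties from dominating the chart.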

Graph Three - Domestic Wines

As WineEnthusiast is a US publication, over 40% of the wines reviewed come from the US. Because of that, I wanted to dive into which states produced the most wine and what types of wine those states produced. I took the top wine-producing states, ranked by the number of reviews their wines had in the data set. Then I broke out the number of reviews by variety to discover which types of wine are most common from each state.
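A state-by-variety breakdown of this kind can be sketched as a nested count. The Python below is an illustration only. It assumes that US wines are marked with `"US"` in `country` and that the state is stored in `province`; neither is confirmed by the text above. The function name, the `top_states` parameter, and the toy rows are hypothetical.

```python
from collections import Counter

def variety_counts_by_state(rows, top_states=5):
    """Count reviews per variety within the most reviewed US states."""
    # Assumption: country == "US" marks domestic wines, province holds the state.
    us_rows = [r for r in rows if r["country"] == "US"]
    state_counts = Counter(r["province"] for r in us_rows)
    keep = [state for state, _ in state_counts.most_common(top_states)]

    breakdown = {state: Counter() for state in keep}
    for r in us_rows:
        if r["province"] in breakdown:
            breakdown[r["province"]][r["variety"]] += 1
    return breakdown

# Invented toy rows for illustration only; the counts are not from the data set.
toy_rows = [
    {"country": "US", "province": "California", "variety": "Pinot Noir"},
    {"country": "US", "province": "California", "variety": "Pinot Noir"},
    {"country": "US", "province": "California", "variety": "Chardonnay"},
    {"country": "US", "province": "Oregon", "variety": "Pinot Noir"},
    {"country": "France", "province": "Burgundy", "variety": "Chardonnay"},
]
breakdown = variety_counts_by_state(toy_rows, top_states=2)
```

Each state maps to a `Counter`, so `breakdown["California"].most_common(3)` would list that state's three most reviewed varieties.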

Graph Four - Price to Point Analysis

Because wine comes at many different price points, I decided to look at whether there was a correlation between how much a wine costs and how well it has been reviewed. First I identified the 10 most commonly reviewed types of wine in the data set to avoid picking any incredibly uncommon ones. Then I took a random sample of 250 reviews of those wines and plotted price against review points. What we see is a positive correlation between price and points, although not as strong as one might have expected.
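To put a number on "positive but not strong", one option is the Pearson correlation coefficient computed on the sample. The sketch below is a Python illustration, not the author's method: the graph itself only shows a smoothed trend line, and no coefficient is reported in the text. The function names, the fixed seed, and the toy rows are assumptions.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def sampled_correlation(rows, sample_size=250, seed=0):
    """Draw a random sample of priced reviews and correlate price with points."""
    # Assumption: reviews with no price cannot be plotted, so they are dropped.
    usable = [r for r in rows if r["price"] is not None]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = rng.sample(usable, min(sample_size, len(usable)))
    return pearson([r["price"] for r in sample],
                   [r["points"] for r in sample])

# Invented, perfectly linear toy rows; real reviews are far noisier.
toy_rows = [{"price": float(p), "points": 80 + p / 10} for p in range(10, 60, 10)]
r = sampled_correlation(toy_rows, sample_size=3)
```

A coefficient near 1 means price and points rise together almost in lockstep, as in the toy rows. A value closer to 0 would match the weaker relationship described above.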

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'