1 Introduction
2 Importing the Dataset
3 Let us have a look at the dataset
4 Let us load the tidyverse
5 Let us try to count the wine within these reviews based on country
6 Let us try to also figure out the variety of wine listed within these reviews
7 Australian wines
- 7.1 For Australian wines
8 Conclusion

1 Introduction

Greetings!. I got this dataset from Kaggle although for personal convenience i loaded it and saved it to Excel. If you wish to get a look at this dataset, the link is here: https://www.kaggle.com/zynicide/wine-reviews

As a wine fanatic, i’m going to have a look at this partly as a matter of personal interest as well

2 Importing the Dataset

library(readxl)
winemag_data_first150k <- read_excel("D:/Working Directory/winemag_data_first150k.xlsx")
View(winemag_data_first150k)

3 Let us have a look at the dataset

head(winemag_data_first150k)

## # A tibble: 6 x 11
##    X__1 country description designation points price province region_1
##   <dbl> <chr>   <chr>       <chr>        <dbl> <dbl> <chr>    <chr>   
## 1     0 US      This treme~ Martha's V~     96   235 Califor~ Napa Va~
## 2     1 Spain   Ripe aroma~ Carodorum ~     96   110 Norther~ Toro    
## 3     2 US      Mac Watson~ Special Se~     96    90 Califor~ Knights~
## 4     3 US      This spent~ Reserve         96    65 Oregon   Willame~
## 5     4 France  This is th~ La BrÃ»lade     95    66 Provence Bandol  
## 6     5 Spain   Deep, dens~ Numanthia       95    73 Norther~ Toro    
## # ... with 3 more variables: region_2 <chr>, variety <chr>, winery <chr>

str(winemag_data_first150k)

## Classes 'tbl_df', 'tbl' and 'data.frame':    150930 obs. of  11 variables:
##  $ X__1       : num  0 1 2 3 4 5 6 7 8 9 ...
##  $ country    : chr  "US" "Spain" "US" "US" ...
##  $ description: chr  "This tremendous 100% varietal wine hails from Oakville and was aged over three years in oak. Juicy red-cherry f"| __truncated__ "Ripe aromas of fig, blackberry and cassis are softened and sweetened by a slathering of oaky chocolate and vani"| __truncated__ "Mac Watson honors the memory of a wine once made by his mother in this tremendously delicious, balanced and com"| __truncated__ "This spent 20 months in 30% new French oak, and incorporates fruit from Ponzi's Aurora, Abetina and Madrona vin"| __truncated__ ...
##  $ designation: chr  "Martha's Vineyard" "Carodorum SelecciÃ³n Especial Reserva" "Special Selected Late Harvest" "Reserve" ...
##  $ points     : num  96 96 96 96 95 95 95 95 95 95 ...
##  $ price      : num  235 110 90 65 66 73 65 110 65 60 ...
##  $ province   : chr  "California" "Northern Spain" "California" "Oregon" ...
##  $ region_1   : chr  "Napa Valley" "Toro" "Knights Valley" "Willamette Valley" ...
##  $ region_2   : chr  "Napa" NA "Sonoma" "Willamette Valley" ...
##  $ variety    : chr  "Cabernet Sauvignon" "Tinta de Toro" "Sauvignon Blanc" "Pinot Noir" ...
##  $ winery     : chr  "Heitz" "Bodega Carmen RodrÃguez" "Macauley" "Ponzi" ...

summary(winemag_data_first150k)

##       X__1          country          description        designation       
##  Min.   :     0   Length:150930      Length:150930      Length:150930     
##  1st Qu.: 37732   Class :character   Class :character   Class :character  
##  Median : 75465   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 75465                                                           
##  3rd Qu.:113197                                                           
##  Max.   :150929                                                           
##                                                                           
##      points           price           province           region_1        
##  Min.   : 80.00   Min.   :   4.00   Length:150930      Length:150930     
##  1st Qu.: 86.00   1st Qu.:  16.00   Class :character   Class :character  
##  Median : 88.00   Median :  24.00   Mode  :character   Mode  :character  
##  Mean   : 87.89   Mean   :  33.13                                        
##  3rd Qu.: 90.00   3rd Qu.:  40.00                                        
##  Max.   :100.00   Max.   :2300.00                                        
##                   NA's   :13695                                          
##    region_2           variety             winery         
##  Length:150930      Length:150930      Length:150930     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##

names(winemag_data_first150k)

##  [1] "X__1"        "country"     "description" "designation" "points"     
##  [6] "price"       "province"    "region_1"    "region_2"    "variety"    
## [11] "winery"

4 Let us load the tidyverse

library(tidyverse)

## -- Attaching packages ----------------------------------------------------------------------------------- tidyverse 1.2.1 --

## v ggplot2 3.1.0     v purrr   0.2.5
## v tibble  1.4.2     v dplyr   0.7.6
## v tidyr   0.8.1     v stringr 1.3.1
## v readr   1.1.1     v forcats 0.3.0

## -- Conflicts -------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(dplyr)

5 Let us try to count the wine within these reviews based on country

winemag_data_first150k %>%
  group_by(country) %>%
  count() %>%
  arrange(desc(n)) %>%
  
print(winemag_data_first150k)

## # A tibble: 49 x 2
## # Groups:   country [49]
##    country         n
##    <chr>       <int>
##  1 US          62397
##  2 Italy       23478
##  3 France      21098
##  4 Spain        8268
##  5 Chile        5816
##  6 Argentina    5631
##  7 Portugal     5322
##  8 Australia    4957
##  9 New Zealand  3320
## 10 Austria      3057
## # ... with 39 more rows

Impressed that America has the most number of wines within this review?. I am.

6 Let us try to also figure out the variety of wine listed within these reviews

Just out of interest

winemag_data_first150k %>%
  group_by(variety) %>%
  count() %>%
  arrange(desc(n)) %>%
  
print(winemag_data_first150k)

## # A tibble: 632 x 2
## # Groups:   variety [632]
##    variety                      n
##    <chr>                    <int>
##  1 Chardonnay               14482
##  2 Pinot Noir               14291
##  3 Cabernet Sauvignon       12800
##  4 Red Blend                10062
##  5 Bordeaux-style Red Blend  7347
##  6 Sauvignon Blanc           6320
##  7 Syrah                     5825
##  8 Riesling                  5524
##  9 Merlot                    5070
## 10 Zinfandel                 3799
## # ... with 622 more rows

Nothing beats a good Chardonnay!.

7 Australian wines

Since i am interested in Australian wines, let us focus the bulk of our code here

7.1 For Australian wines

winemag_Australian <- winemag_data_first150k %>%
  filter(country == "Australia") %>%
  group_by(variety) %>%


print(winemag_Australian)

## # A tibble: 4,957 x 11
## # Groups:   variety [81]
##     X__1 country description designation points price province region_1
##    <dbl> <chr>   <chr>       <chr>        <dbl> <dbl> <chr>    <chr>   
##  1    61 Austra~ Moorooduc'~ <NA>            91    36 Victoria Morning~
##  2   631 Austra~ Tim Knapps~ The Dagger      90    20 South A~ Adelaid~
##  3   660 Austra~ This crisp~ Applejack ~     88    42 Victoria Yarra V~
##  4   825 Austra~ The Dead A~ The Dead A~     90    65 South A~ McLaren~
##  5   987 Austra~ This is st~ Coal River~     90    65 Tasmania Tasmania
##  6  1066 Austra~ Hints of l~ <NA>            88    25 South A~ Eden Va~
##  7  2025 Austra~ This mediu~ Ned & Henr~     90    25 South A~ Barossa~
##  8  2028 Austra~ From vines~ 1927 Vines      90    33 Victoria Nagambi~
##  9  2148 Austra~ Full-bodie~ The Factor      98   125 South A~ Barossa~
## 10  2325 Austra~ Based in C~ <NA>            90    17 South A~ Adelaid~
## # ... with 4,947 more rows, and 3 more variables: region_2 <chr>,
## #   variety <chr>, winery <chr>

7.1.1 What Aussie state has the most entries in this review?

winemag_Australian %>%
  group_by(province) %>%
  count() %>%
  arrange(desc(n)) %>%

print(winemag_Australian)

## # A tibble: 7 x 2
## # Groups:   province [7]
##   province              n
##   <chr>             <int>
## 1 South Australia    3004
## 2 Victoria            613
## 3 Australia Other     553
## 4 Western Australia   491
## 5 New South Wales     246
## 6 Tasmania             47
## 7 Queensland            3

South Australia obviously has the most number of wines listed in this review. No wonder why they call this state The Wine State.

7.1.2 Variety of wine most commonplace in Australia

winemag_Australian %>%
  group_by(variety) %>%
  count() %>%
  arrange(desc(n)) %>%
  
print(winemag_Australian)

## # A tibble: 81 x 2
## # Groups:   variety [81]
##    variety                      n
##    <chr>                    <int>
##  1 Shiraz                    1434
##  2 Chardonnay                 669
##  3 Cabernet Sauvignon         658
##  4 Riesling                   286
##  5 Pinot Noir                 190
##  6 Red Blend                  136
##  7 Sauvignon Blanc            133
##  8 Grenache                   117
##  9 Merlot                      85
## 10 Bordeaux-style Red Blend    82
## # ... with 71 more rows

Wow!. the Shiraz variety is the common wine in this review

7.1.3 Most expensive Australian wine

Suppose one might ask, what Australian wine in the list is the most expensive?.

winemag_Australian %>%
  arrange(desc(price))

## # A tibble: 4,957 x 11
## # Groups:   variety [81]
##      X__1 country description designation points price province region_1
##     <dbl> <chr>   <chr>       <chr>        <dbl> <dbl> <chr>    <chr>   
##  1  10505 Austra~ This has a~ Grange          98   850 South A~ South A~
##  2  28625 Austra~ This has a~ Grange          98   850 South A~ South A~
##  3 127614 Austra~ This stell~ Hill of Gr~     95   625 South A~ Eden Va~
##  4  10509 Austra~ This is da~ Grange          97   600 South A~ South A~
##  5  28629 Austra~ This is da~ Grange          97   600 South A~ South A~
##  6  57120 Austra~ This is a ~ Hill of Gr~     96   550 South A~ Eden Va~
##  7   3033 Austra~ This Caber~ Bin 707         95   500 South A~ South A~
##  8  61325 Austra~ Readers fo~ Hill of Gr~     93   400 South A~ Eden Va~
##  9  57201 Austra~ Astralis h~ Astralis        95   350 South A~ Clarend~
## 10  96771 Austra~ Astralis h~ Astralis        95   350 South A~ Clarend~
## # ... with 4,947 more rows, and 3 more variables: region_2 <chr>,
## #   variety <chr>, winery <chr>

7.1.4 Trying to convert the price to AUD

Suppose one might ask, how much is the price of these wines in AUD?

The last time i checked, 1 USD is 1.39 AUD.

winemag_Australian <- winemag_Australian %>%
  mutate(price = price * 1.39) %>%
  arrange(desc(price))

print(winemag_Australian)

## # A tibble: 4,957 x 11
## # Groups:   variety [81]
##      X__1 country description designation points price province region_1
##     <dbl> <chr>   <chr>       <chr>        <dbl> <dbl> <chr>    <chr>   
##  1  10505 Austra~ This has a~ Grange          98 1182. South A~ South A~
##  2  28625 Austra~ This has a~ Grange          98 1182. South A~ South A~
##  3 127614 Austra~ This stell~ Hill of Gr~     95  869. South A~ Eden Va~
##  4  10509 Austra~ This is da~ Grange          97  834. South A~ South A~
##  5  28629 Austra~ This is da~ Grange          97  834. South A~ South A~
##  6  57120 Austra~ This is a ~ Hill of Gr~     96  764. South A~ Eden Va~
##  7   3033 Austra~ This Caber~ Bin 707         95  695  South A~ South A~
##  8  61325 Austra~ Readers fo~ Hill of Gr~     93  556  South A~ Eden Va~
##  9  57201 Austra~ Astralis h~ Astralis        95  486. South A~ Clarend~
## 10  96771 Austra~ Astralis h~ Astralis        95  486. South A~ Clarend~
## # ... with 4,947 more rows, and 3 more variables: region_2 <chr>,
## #   variety <chr>, winery <chr>

7.1.5 Visualizing the distribution between price and rating

a <- ggplot(winemag_Australian, aes(x = points, y = price))
a <- a + geom_point()
a <- a + facet_wrap(~province)
a <- a + labs(x = "\n Rating \n")
a <- a + labs(y = "\n Price (in AUD) \n")

print(a)

## Warning: Removed 63 rows containing missing values (geom_point).

7.1.6 Let us filter per rating and visualize

For analysis, let’s assume 90-100 is excellent and 80-89 is okay.

7.1.6.1 Wines with 90 to 100 points

Excellent <- winemag_Australian %>%
  filter(points >= 90)

print(Excellent)

## # A tibble: 1,519 x 11
## # Groups:   variety [51]
##      X__1 country description designation points price province region_1
##     <dbl> <chr>   <chr>       <chr>        <dbl> <dbl> <chr>    <chr>   
##  1  10505 Austra~ This has a~ Grange          98 1182. South A~ South A~
##  2  28625 Austra~ This has a~ Grange          98 1182. South A~ South A~
##  3 127614 Austra~ This stell~ Hill of Gr~     95  869. South A~ Eden Va~
##  4  10509 Austra~ This is da~ Grange          97  834. South A~ South A~
##  5  28629 Austra~ This is da~ Grange          97  834. South A~ South A~
##  6  57120 Austra~ This is a ~ Hill of Gr~     96  764. South A~ Eden Va~
##  7   3033 Austra~ This Caber~ Bin 707         95  695  South A~ South A~
##  8  61325 Austra~ Readers fo~ Hill of Gr~     93  556  South A~ Eden Va~
##  9  57201 Austra~ Astralis h~ Astralis        95  486. South A~ Clarend~
## 10  96771 Austra~ Astralis h~ Astralis        95  486. South A~ Clarend~
## # ... with 1,509 more rows, and 3 more variables: region_2 <chr>,
## #   variety <chr>, winery <chr>

Excellent_Count <- Excellent %>%
  group_by(province) %>%
  count()

print(Excellent_Count)

## # A tibble: 6 x 2
## # Groups:   province [6]
##   province              n
##   <chr>             <int>
## 1 Australia Other      17
## 2 New South Wales      45
## 3 South Australia    1148
## 4 Tasmania              6
## 5 Victoria            179
## 6 Western Australia   124

Excellent_Varieties <- Excellent %>%
  group_by(variety) %>%
  count() %>%
  arrange(desc(n))

Excellent_Varieties5 <- Excellent_Varieties[1:5,]

print(Excellent_Varieties)

## # A tibble: 51 x 2
## # Groups:   variety [51]
##    variety                      n
##    <chr>                    <int>
##  1 Shiraz                     667
##  2 Cabernet Sauvignon         225
##  3 Chardonnay                 113
##  4 Riesling                    74
##  5 Grenache                    49
##  6 Shiraz-Viognier             36
##  7 Bordeaux-style Red Blend    34
##  8 Syrah                       34
##  9 Red Blend                   32
## 10 Muscat                      26
## # ... with 41 more rows

7.1.6.1.1 Aggregating wines

# Per rating
Excellent_Average <- aggregate(points ~ province, Excellent, FUN = mean)
Excellent_Average <- arrange(Excellent_Average, desc(points))

print(Excellent_Average)

##            province   points
## 1          Victoria 91.67039
## 2   South Australia 91.35105
## 3 Western Australia 90.88710
## 4   Australia Other 90.58824
## 5   New South Wales 90.46667
## 6          Tasmania 90.00000

7.1.6.2 Visualizing Number of wines rated 90 or higher and average rating per state

7.1.6.2.1 Number of wines each state has in the ratings

# Number of wines of states in the ratings
a1 <- ggplot(Excellent_Count, aes(x = reorder(province, -n), y = n, FILL = province))
a1 <- a1 + geom_bar(aes(fill = province), stat = "identity")
a1 <- a1 + geom_text(aes(label = round(n, 0), hjust = 0.65, vjust = - 0.7), size = 4)
a1 <- a1 + labs(x = "\n Australian States \n") + labs(y = "\n Number of wines reviewed \n")
a1 <- a1 + labs(title = "\n Australian wines ranked 90 or higher \n")
a1 <- a1 + coord_cartesian(ylim = c(0, 1500))
a1 <- a1 + theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 9.5, face = "bold"))
a1 <- a1 + theme(legend.position = "none")
a1 <- a1 + theme(plot.title = element_text(size = 15, face = "bold"))

print(a1)

7.1.6.2.2 Mean rating per state

# Mean Rating per state
a2 <- ggplot(Excellent_Average, aes(x = reorder(province, -points), y = points, FILL = province))
a2 <- a2 + geom_bar(aes(fill = province), stat = "identity")
a2 <- a2 + geom_text(aes(label = round(points, 0), hjust = 0.65, vjust = - 0.7), size = 4)
a2 <- a2 + labs(x = "\n Australian States \n") + labs(y = "\n Average Rating \n")
a2 <- a2 + labs(title = "\n State averages of wines 90 or higher \n")
a2 <- a2 + coord_cartesian(ylim = c(0, 100))
a2 <- a2 + theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 9.5, face = "bold"))
a2 <- a2 + theme(legend.position = "none")
a2 <- a2 + theme(plot.title = element_text(size = 15, face = "bold"))

print(a2)

7.1.6.2.3 Wine varieties rated

# Wine varieties 
a3 <- ggplot(Excellent_Varieties5, aes(x = reorder(variety, -n), y = n, FILL = variety))
a3 <- a3 + geom_bar(aes(fill = variety), stat = "identity")
a3 <- a3 + geom_text(aes(label = round(n, 0), hjust = 0.65, vjust = - 0.7), size = 4)
a3 <- a3 + labs(x = "\n Variety of wine \n") + labs(y = "\n Number of wines reviewed \n")
a3 <- a3 + labs(title = "\n Top 5 varieties rated 90 or higher \n")
a3 <- a3 + coord_cartesian(ylim = c(0, 800))
a3 <- a3 + theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 9.5, face = "bold"))
a3 <- a3 + theme(legend.position = "none")
a3 <- a3 + theme(plot.title = element_text(size = 15, face = "bold"))

print(a3)

7.1.6.3 Same but with Wines with 80 to 89

Okay <- winemag_Australian %>%
  filter(points <= 89)

print(Okay)

## # A tibble: 3,438 x 11
## # Groups:   variety [76]
##      X__1 country description designation points price province region_1
##     <dbl> <chr>   <chr>       <chr>        <dbl> <dbl> <chr>    <chr>   
##  1 101603 Austra~ Intensely ~ The Armagh      89  243. South A~ Clare V~
##  2 125783 Austra~ Intensely ~ The Armagh      89  243. South A~ Clare V~
##  3 139119 Austra~ A manageab~ Aphrodite       89  221. South A~ Barossa~
##  4  56798 Austra~ An icon in~ Quintet         88  202. Victoria Yarra V~
##  5 117087 Austra~ One of Aus~ Quintet         89  202. Victoria Yarra V~
##  6 130677 Austra~ One of Aus~ Quintet         89  202. Victoria Yarra V~
##  7  86383 Austra~ After rece~ The Pict        88  188. South A~ Barossa~
##  8  48261 Austra~ There's a ~ Graveyard ~     89  174. New Sou~ Hunter ~
##  9  50437 Austra~ Restraint ~ <NA>            89  174. Victoria Yarra V~
## 10  51050 Austra~ In 5â€“10 ~ Emily's Pa~     89  174. Victoria Heathco~
## # ... with 3,428 more rows, and 3 more variables: region_2 <chr>,
## #   variety <chr>, winery <chr>

Okay_Count <- Okay %>%
  group_by(province) %>%
  count()

print(Okay_Count)

## # A tibble: 7 x 2
## # Groups:   province [7]
##   province              n
##   <chr>             <int>
## 1 Australia Other     536
## 2 New South Wales     201
## 3 Queensland            3
## 4 South Australia    1856
## 5 Tasmania             41
## 6 Victoria            434
## 7 Western Australia   367

Okay_Varieties <- Okay %>%
  group_by(variety) %>%
  count() %>%
  arrange(desc(n))

print(Okay_Varieties)

## # A tibble: 76 x 2
## # Groups:   variety [76]
##    variety                       n
##    <chr>                     <int>
##  1 Shiraz                      767
##  2 Chardonnay                  556
##  3 Cabernet Sauvignon          433
##  4 Riesling                    212
##  5 Pinot Noir                  169
##  6 Sauvignon Blanc             120
##  7 Red Blend                   104
##  8 Merlot                       82
##  9 Grenache                     68
## 10 Shiraz-Cabernet Sauvignon    66
## # ... with 66 more rows

7.1.6.4 Aggregating wines that are okay

Okay_Average <- aggregate(points ~ province, Okay, FUN = mean)
Okay_Average <- arrange(Okay_Average, desc(points))

print(Okay_Average)

##            province   points
## 1          Tasmania 87.31707
## 2   South Australia 86.80873
## 3          Victoria 86.55069
## 4 Western Australia 86.54496
## 5   New South Wales 86.28358
## 6        Queensland 85.00000
## 7   Australia Other 84.63060

7.1.6.5 Visualizing

# Number of states in the ratings
a1 <- ggplot(Okay_Count, aes(x = reorder(province, -n), y = n, FILL = province))
a1 <- a1 + geom_bar(aes(fill = province), stat = "identity")
a1 <- a1 + geom_text(aes(label = round(n, 0), hjust = 0.65, vjust = - 0.7), size = 4)
a1 <- a1 + labs(x = "\n Australian States \n") + labs(y = "\n Number of wines reviewed \n")
a1 <- a1 + labs(title = "\n Australian wines ranked 89 or less \n")
a1 <- a1 + coord_cartesian(ylim = c(0, 2000))
a1 <- a1 + theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 9.5, face = "bold"))
a1 <- a1 + theme(legend.position = "none")
a1 <- a1 + theme(plot.title = element_text(size = 15, face = "bold"))

print(a1)

7.1.6.5.1 Mean Rating per state

# Mean rating per state
a2 <- ggplot(Okay_Average, aes(x = reorder(province, -points), y = points, FILL = province))
a2 <- a2 + geom_bar(aes(fill = province), stat = "identity")
a2 <- a2 + geom_text(aes(label = round(points, 0), hjust = 0.65, vjust = - 0.7), size = 4)
a2 <- a2 + labs(x = "\n Australian States \n") + labs(y = "\n Average Rating \n")
a2 <- a2 + labs(title = "\n State averages of wines ranked 89 or less \n")
a2 <- a2 + coord_cartesian(ylim = c(0, 100))
a2 <- a2 + theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 9.5, face = "bold"))
a2 <- a2 + theme(legend.position = "none")
a2 <- a2 + theme(plot.title = element_text(size = 15, face = "bold"))

print(a2)

##### Wine varieties

Okay_Varieties <- Okay_Varieties[1:5,]

#Wine Varieties

a3 <- ggplot(Okay_Varieties, aes(x = reorder(variety, -n), y = n, FILL = variety))
a3 <- a3 + geom_bar(aes(fill = variety), stat = "identity")
a3 <- a3 + geom_text(aes(label = round(n, 0), hjust = 0.65, vjust = - 0.7), size = 4)
a3 <- a3 + labs(x = "\n Variety of wine \n") + labs(y = "\n Number of wines reviewed \n")
a3 <- a3 + labs(title = "\n Top 5 varieties rated 89 or less \n")
a3 <- a3 + coord_cartesian(ylim = c(0, 1000))
a3 <- a3 + theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 9.5, face = "bold"))
a3 <- a3 + theme(legend.position = "none")
a3 <- a3 + theme(plot.title = element_text(size = 15, face = "bold"))

print(a3)

7.1.7 Analysis with regards to price and state

Here, we would want to see the visual relationship between the Price and the number of points in the rating. Never mind the variety and let’s talk about overall picture. .

7.1.8 First with wines ranked excellent

head(Excellent)

## # A tibble: 6 x 11
## # Groups:   variety [1]
##     X__1 country description designation points price province region_1
##    <dbl> <chr>   <chr>       <chr>        <dbl> <dbl> <chr>    <chr>   
## 1  10505 Austra~ This has a~ Grange          98 1182. South A~ South A~
## 2  28625 Austra~ This has a~ Grange          98 1182. South A~ South A~
## 3 127614 Austra~ This stell~ Hill of Gr~     95  869. South A~ Eden Va~
## 4  10509 Austra~ This is da~ Grange          97  834. South A~ South A~
## 5  28629 Austra~ This is da~ Grange          97  834. South A~ South A~
## 6  57120 Austra~ This is a ~ Hill of Gr~     96  764. South A~ Eden Va~
## # ... with 3 more variables: region_2 <chr>, variety <chr>, winery <chr>

7.1.8.1 Visualizing the overall picture

Australia <- ggplot(winemag_Australian, aes(x = price, y = points, col = province))
Australia <- Australia + geom_point()
Australia <- Australia + labs(x = "\n Price \n")
Australia <- Australia + labs(y = "\n Points \n")
Australia <- Australia + labs(title = "\n Price/Points visual distribution (SA) \n")

print(Australia)

## Warning: Removed 63 rows containing missing values (geom_point).

7.1.8.1.1 Filtering by state

# For SA
Excellent_SA <- Excellent %>%
  filter(province == "South Australia") %>%
  mutate(AUD_per_point = price/points)

head(Excellent_SA)

## # A tibble: 6 x 12
## # Groups:   variety [1]
##     X__1 country description designation points price province region_1
##    <dbl> <chr>   <chr>       <chr>        <dbl> <dbl> <chr>    <chr>   
## 1  10505 Austra~ This has a~ Grange          98 1182. South A~ South A~
## 2  28625 Austra~ This has a~ Grange          98 1182. South A~ South A~
## 3 127614 Austra~ This stell~ Hill of Gr~     95  869. South A~ Eden Va~
## 4  10509 Austra~ This is da~ Grange          97  834. South A~ South A~
## 5  28629 Austra~ This is da~ Grange          97  834. South A~ South A~
## 6  57120 Austra~ This is a ~ Hill of Gr~     96  764. South A~ Eden Va~
## # ... with 4 more variables: region_2 <chr>, variety <chr>, winery <chr>,
## #   AUD_per_point <dbl>

nrow(Excellent_SA)

## [1] 1148

# For Victoria
Excellent_Vic <- Excellent %>%
  filter(province == "Victoria") %>%
  mutate(AUD_per_point =  price/points)

head(Excellent_Vic)

## # A tibble: 6 x 12
## # Groups:   variety [3]
##     X__1 country description designation points price province region_1
##    <dbl> <chr>   <chr>       <chr>        <dbl> <dbl> <chr>    <chr>   
## 1  19355 Austra~ Not a Cell~ Rare           100  417. Victoria Rutherg~
## 2  19364 Austra~ Incredibly~ Rare Musca~     97  417. Victoria Rutherg~
## 3  84035 Austra~ Not a Cell~ Rare           100  417. Victoria Rutherg~
## 4  84044 Austra~ Incredibly~ Rare Musca~     97  417. Victoria Rutherg~
## 5 101477 Austra~ This Musca~ Rare            95  417. Victoria Rutherg~
## 6 101780 Austra~ Befitting ~ Rare Musca~     93  417. Victoria Rutherg~
## # ... with 4 more variables: region_2 <chr>, variety <chr>, winery <chr>,
## #   AUD_per_point <dbl>

nrow(Excellent_Vic)

## [1] 179

# For WA
Excellent_WA <- Excellent %>%
  filter(province == "Western Australia") %>%
  mutate(AUD_per_point = price/points)

head(Excellent_WA)

## # A tibble: 6 x 12
## # Groups:   variety [2]
##     X__1 country description designation points price province region_1
##    <dbl> <chr>   <chr>       <chr>        <dbl> <dbl> <chr>    <chr>   
## 1  27625 Austra~ Crisp and ~ Diana Made~     90  152. Western~ Margare~
## 2  72891 Austra~ The 2009 D~ Diana Made~     91  132. Western~ Margare~
## 3  55855 Austra~ With only ~ Diana Made~     93  131. Western~ Margare~
## 4  98815 Austra~ With only ~ Diana Made~     93  131. Western~ Margare~
## 5 121765 Austra~ With only ~ Diana Made~     93  131. Western~ Margare~
## 6  10793 Austra~ Having ear~ Art Series      94  124. Western~ Margare~
## # ... with 4 more variables: region_2 <chr>, variety <chr>, winery <chr>,
## #   AUD_per_point <dbl>

nrow(Excellent_WA)

## [1] 124

# For NSW 
Excellent_NSW <- Excellent %>%
  filter(province == "New South Wales") %>%
  mutate(AUD_per_point = price/points)

head(Excellent_NSW)

## # A tibble: 6 x 12
## # Groups:   variety [3]
##     X__1 country description designation points price province region_1
##    <dbl> <chr>   <chr>       <chr>        <dbl> <dbl> <chr>    <chr>   
## 1 100131 Austra~ One of the~ Graveyard ~     91 174.  New Sou~ Hunter ~
## 2 124281 Austra~ One of the~ Graveyard ~     91 174.  New Sou~ Hunter ~
## 3  77892 Austra~ Brokenwood~ ILR Reserv~     90  66.7 New Sou~ Hunter ~
## 4  68994 Austra~ Only conta~ No. 89          90  62.6 New Sou~ Orange  
## 5 114414 Austra~ Only conta~ No. 89          90  62.6 New Sou~ Orange  
## 6 137184 Austra~ Only conta~ No. 89          90  62.6 New Sou~ Orange  
## # ... with 4 more variables: region_2 <chr>, variety <chr>, winery <chr>,
## #   AUD_per_point <dbl>

nrow(Excellent_NSW)

## [1] 45

# For Wines in Other Aussie states
# I personally do not understand why there is "Australia other"
Excellent_Other <- Excellent %>%
  filter(province == "Australia Other") %>%
  mutate(AUD_per_point = price/points)

head(Excellent_Other)

## # A tibble: 6 x 12
## # Groups:   variety [2]
##     X__1 country description designation points price province region_1
##    <dbl> <chr>   <chr>       <chr>        <dbl> <dbl> <chr>    <chr>   
## 1  68251 Austra~ This big w~ Yattarna        92  90.4 Austral~ South E~
## 2 143222 Austra~ An excelle~ Yattarna        91  90.4 Austral~ South E~
## 3 150571 Austra~ This big w~ Yattarna        92  90.4 Austral~ South E~
## 4  55697 Austra~ A unique d~ Black Noble     90  52.8 Austral~ Austral~
## 5 104334 Austra~ From 40-ye~ Idyll Vine~     91  41.7 Austral~ South E~
## 6 131994 Austra~ From 40-ye~ Idyll Vine~     91  41.7 Austral~ South E~
## # ... with 4 more variables: region_2 <chr>, variety <chr>, winery <chr>,
## #   AUD_per_point <dbl>

nrow(Excellent_Other)

## [1] 17

#For Tasmania
Excellent_Tasmania <- Excellent %>%
  filter(province == "Tasmania") %>%
  mutate(AUD_per_point = price/points)

head(Excellent_Tasmania)

## # A tibble: 6 x 12
## # Groups:   variety [3]
##     X__1 country description designation points price province region_1
##    <dbl> <chr>   <chr>       <chr>        <dbl> <dbl> <chr>    <chr>   
## 1    987 Austra~ This is st~ Coal River~     90  90.4 Tasmania Tasmania
## 2  28707 Austra~ This is st~ Coal River~     90  90.4 Tasmania Tasmania
## 3  38700 Austra~ This pale ~ Premium Br~     90  30.6 Tasmania Tasmania
## 4  24759 Austra~ A lovely, ~ 42Â°S Prem~     90  27.8 Tasmania Tasmania
## 5  44439 Austra~ A lovely, ~ 42Â°S Prem~     90  27.8 Tasmania Tasmania
## 6 122319 Austra~ A lovely, ~ 42Â°S Prem~     90  27.8 Tasmania Tasmania
## # ... with 4 more variables: region_2 <chr>, variety <chr>, winery <chr>,
## #   AUD_per_point <dbl>

nrow(Excellent_Tasmania)

## [1] 6

7.1.8.1.2 Visualizing per state

7.1.8.1.2.1 For SA

SA <- ggplot(Excellent_SA, aes(x = points, y = AUD_per_point, col = price))
SA <- SA + geom_point()
SA <- SA + labs(x = "\n Points \n")
SA <- SA + labs(y = "\n AUD per point \n")
SA <- SA + labs(title = "\n Price/Points visual distribution (SA) \n")
SA <- SA + coord_cartesian(xlim = c(89.5, 100))

print(SA)

## Warning: Removed 25 rows containing missing values (geom_point).

7.1.8.1.2.2 For Victoria

Vic <- ggplot(Excellent_Vic, aes(x = points, y = AUD_per_point, col = price))
Vic <- Vic + geom_point()
Vic <- Vic + labs(x = "\n Points \n")
Vic <- Vic + labs(y = "\n AUD per point \n")
Vic <- Vic + labs(title = "\n Price/Points visual distribution (VIC) \n")
Vic <- Vic + coord_cartesian(xlim = c(89.5, 100))

print(Vic)

## Warning: Removed 1 rows containing missing values (geom_point).

7.1.8.1.2.3 For WA

WA <- ggplot(Excellent_WA, aes(x = points, y = AUD_per_point, col = price))
WA <- WA + geom_point()
WA <- WA + labs(x = "\n Points \n")
WA <- WA + labs(y = "\n AUD per point \n")
WA <- WA + labs(title = "\n Price/Points visual distribution (WA) \n")
WA <- WA + coord_cartesian(xlim = c(89.5, 100))

print(WA)

7.1.8.1.2.4 For NSW

NSW <- ggplot(Excellent_NSW, aes(x = points, y = AUD_per_point, col = price))
NSW <- NSW + geom_point() 
NSW <- NSW + labs(x = "\n Points \n")
NSW <- NSW + labs(y = "\n AUD per point \n")
NSW <- NSW + labs(title = "\n Price/Points visual distribution (NSW) \n")
NSW <- NSW + coord_cartesian(xlim = c(89.5, 100))

print(NSW)

7.1.8.1.2.5 For Other

AUD_Other <- ggplot(Excellent_Other, aes(x = points, y = AUD_per_point, col = price))
AUD_Other <- AUD_Other + geom_point()
AUD_Other <- AUD_Other + labs(x = "\n Points \n")
AUD_Other <- AUD_Other + labs(y = "\n AUD per point \n")
AUD_Other <- AUD_Other + labs(title = "\n Price/Points visual distribution (Other) \n")
AUD_Other <- AUD_Other + coord_cartesian(xlim = c(89.5, 100))

print(AUD_Other)

7.1.8.1.2.6 For Tasmania

TAS <- ggplot(Excellent_Tasmania, aes(x = points, y = AUD_per_point, col = price))
TAS <- TAS + geom_point()
TAS <- TAS + labs(x = "\n Points \n")
TAS <- TAS + labs(y = "\n AUD per point \n")
TAS <- TAS + labs(title = "\n Price/Points visual distribution (TAS) \n")
TAS <- TAS + coord_cartesian(xlim = c(89.5, 100))

print(TAS)

I had to include Tasmania here in spite of the fact that they are more or less a speck onto the “Excellent”. Queensland has no entries rated 90 and above

7.1.9 Now, with wines ranked Okay

Since there are only 3 Tassie wines in the dataset, i might as well have to exclude them.

7.1.9.1 By state

# For SA 
Okay_SA <- Okay %>%
  filter(province == "South Australia") %>%
  mutate(AUD_per_point = price/points)

head(Okay_SA)

## # A tibble: 6 x 12
## # Groups:   variety [3]
##     X__1 country description designation points price province region_1
##    <dbl> <chr>   <chr>       <chr>        <dbl> <dbl> <chr>    <chr>   
## 1 101603 Austra~ Intensely ~ The Armagh      89  243. South A~ Clare V~
## 2 125783 Austra~ Intensely ~ The Armagh      89  243. South A~ Clare V~
## 3 139119 Austra~ A manageab~ Aphrodite       89  221. South A~ Barossa~
## 4  86383 Austra~ After rece~ The Pict        88  188. South A~ Barossa~
## 5 139710 Austra~ Just the t~ <NA>            88  174. South A~ Barossa~
## 6  13233 Austra~ This wine ~ Ashmead Si~     89  150. South A~ Barossa 
## # ... with 4 more variables: region_2 <chr>, variety <chr>, winery <chr>,
## #   AUD_per_point <dbl>

nrow(Okay_SA)

## [1] 1856

# For Victoria
Okay_Vic <- Okay %>%
  filter(province == "Victoria") %>%
  mutate(AUD_per_point =  price/points)

head(Okay_Vic)

## # A tibble: 6 x 12
## # Groups:   variety [3]
##     X__1 country description designation points price province region_1
##    <dbl> <chr>   <chr>       <chr>        <dbl> <dbl> <chr>    <chr>   
## 1  56798 Austra~ An icon in~ Quintet         88  202. Victoria Yarra V~
## 2 117087 Austra~ One of Aus~ Quintet         89  202. Victoria Yarra V~
## 3 130677 Austra~ One of Aus~ Quintet         89  202. Victoria Yarra V~
## 4  50437 Austra~ Restraint ~ <NA>            89  174. Victoria Yarra V~
## 5  51050 Austra~ In 5â€“10 ~ Emily's Pa~     89  174. Victoria Heathco~
## 6  51957 Austra~ Renowned f~ Quintet         85  174. Victoria Yarra V~
## # ... with 4 more variables: region_2 <chr>, variety <chr>, winery <chr>,
## #   AUD_per_point <dbl>

nrow(Okay_Vic)

## [1] 434

# For NSW 
Okay_NSW <- Okay %>%
  filter(province == "New South Wales") %>%
  mutate(AUD_per_point = price/points)

head(Okay_NSW)

## # A tibble: 6 x 12
## # Groups:   variety [3]
##    X__1 country description designation points price province region_1
##   <dbl> <chr>   <chr>       <chr>        <dbl> <dbl> <chr>    <chr>   
## 1 48261 Austra~ There's a ~ Graveyard ~     89 174.  New Sou~ Hunter ~
## 2 91311 Austra~ There's a ~ Graveyard ~     89 174.  New Sou~ Hunter ~
## 3 16250 Austra~ Fits the C~ Vat 47          88  55.6 New Sou~ Hunter ~
## 4 52686 Austra~ A decidedl~ No. 8           85  55.6 New Sou~ Orange  
## 5 70124 Austra~ This Chard~ Vat 47          89  55.6 New Sou~ Hunter ~
## 6 70625 Austra~ A bit on t~ No. 8           85  55.6 New Sou~ Orange  
## # ... with 4 more variables: region_2 <chr>, variety <chr>, winery <chr>,
## #   AUD_per_point <dbl>

nrow(Okay_NSW)

## [1] 201

# For Australia Other
Okay_Other <- Okay %>%
  filter(province == "Australia Other") %>%
  mutate(AUD_per_point = price/points)

head(Okay_Other)

## # A tibble: 6 x 12
## # Groups:   variety [4]
##     X__1 country description designation points price province region_1
##    <dbl> <chr>   <chr>       <chr>        <dbl> <dbl> <chr>    <chr>   
## 1  15417 Austra~ This Chard~ Pyrenees        87  55.6 Austral~ Austral~
## 2 146083 Austra~ "This tawn~ <NA>            85  55.6 Austral~ South E~
## 3  20534 Austra~ This is an~ Old Boys V~     89  51.4 Austral~ South E~
## 4 110828 Austra~ A potentia~ Idyll Vine~     85  41.7 Austral~ South E~
## 5 127058 Austra~ A potentia~ Idyll Vine~     85  41.7 Austral~ South E~
## 6 147143 Austra~ Quite full~ Reserve         87  40.3 Austral~ Austral~
## # ... with 4 more variables: region_2 <chr>, variety <chr>, winery <chr>,
## #   AUD_per_point <dbl>

nrow(Okay_Other)

## [1] 536

# For Tasmania
Okay_Tasmania <- Okay %>%
  filter(province == "Tasmania") %>%
  mutate(AUD_per_point = price/points)

head(Okay_Tasmania)

## # A tibble: 6 x 12
## # Groups:   variety [3]
##    X__1 country description designation points price province region_1
##   <dbl> <chr>   <chr>       <chr>        <dbl> <dbl> <chr>    <chr>   
## 1  7409 Austra~ Pie cherry~ Bicheno Si~     87  90.4 Tasmania Tasmania
## 2 34857 Austra~ Pie cherry~ Bicheno Si~     87  90.4 Tasmania Tasmania
## 3 17530 Austra~ Lavish van~ Zdar            87  65.3 Tasmania Tasmania
## 4 31472 Austra~ Tasmania, ~ CuvÃ©e du ~     88  55.6 Tasmania Tasmania
## 5 15817 Austra~ In a recen~ Brut            89  44.5 Tasmania Tasmania
## 6 41597 Austra~ A zippy, e~ Brut            89  44.5 Tasmania Tasmania
## # ... with 4 more variables: region_2 <chr>, variety <chr>, winery <chr>,
## #   AUD_per_point <dbl>

nrow(Okay_Tasmania)

## [1] 41

Okay_QLD <- Okay %>%
  filter(province == "Queensland") %>%
  mutate(AUD_per_point = price/points)

head(Okay_QLD)

## # A tibble: 3 x 12
## # Groups:   variety [1]
##     X__1 country description designation points price province region_1
##    <dbl> <chr>   <chr>       <chr>        <dbl> <dbl> <chr>    <chr>   
## 1  61296 Austra~ Comes from~ <NA>            85  18.1 Queensl~ Granite~
## 2 112146 Austra~ Comes from~ <NA>            85  18.1 Queensl~ Granite~
## 3 147156 Austra~ Comes from~ <NA>            85  18.1 Queensl~ Granite~
## # ... with 4 more variables: region_2 <chr>, variety <chr>, winery <chr>,
## #   AUD_per_point <dbl>

nrow(Okay_QLD)

## [1] 3

7.1.9.1.1 Visualizing per state

7.1.9.1.1.1 For SA

SA <- ggplot(Okay_SA, aes(x = points, y = AUD_per_point, col = price))
SA <- SA + geom_point()
SA <- SA + labs(x = "\n Points \n")
SA <- SA + labs(y = "\n AUD per point \n")
SA <- SA + labs(title = "\n Price/Points visual distribution (SA) \n")
SA <- SA + coord_cartesian(xlim = c(79.5, 90))

print(SA)

## Warning: Removed 24 rows containing missing values (geom_point).

7.1.9.1.1.2 For Victoria

Vic <- ggplot(Okay_Vic, aes(x = points, y = AUD_per_point, col = price))
Vic <- Vic + geom_point()
Vic <- Vic + labs(x = "\n Points \n")
Vic <- Vic + labs(y = "\n AUD per point \n")
Vic <- Vic + labs(title = "\n Price/Points visual distribution (VIC) \n")
Vic <- Vic + coord_cartesian(xlim = c(79.5, 90))

print(Vic)

## Warning: Removed 3 rows containing missing values (geom_point).

7.1.9.1.1.3 For NSW

NSW <- ggplot(Okay_NSW, aes(x = points, y = AUD_per_point, col = price))
NSW <- NSW + geom_point() 
NSW <- NSW + labs(x = "\n Points \n")
NSW <- NSW + labs(y = "\n AUD per point \n")
NSW <- NSW + labs(title = "\n Price/Points visual distribution (NSW) \n")
NSW <- NSW + coord_cartesian(xlim = c(79.5, 90))

print(NSW)

## Warning: Removed 2 rows containing missing values (geom_point).

7.1.9.1.1.4 For Australia Other

AUD_Other <- ggplot(Okay_Other, aes(x = points, y = AUD_per_point, col = price))
AUD_Other <- AUD_Other + geom_point()
AUD_Other <- AUD_Other + labs(x = "\n Points \n")
AUD_Other <- AUD_Other + labs(y = "\n AUD per point \n")
AUD_Other <- AUD_Other + labs(title = "\n Price/Points visual distribution (Other) \n")
AUD_Other <- AUD_Other + coord_cartesian(xlim = c(79.5, 90))

print(AUD_Other)

## Warning: Removed 2 rows containing missing values (geom_point).

7.1.9.1.1.5 For Tasmania

TAS <- ggplot(Okay_Tasmania, aes(x = points, y = AUD_per_point, col = price))
TAS <- TAS + geom_point()
TAS <- TAS + labs(x = "\n Points \n")
TAS <- TAS + labs(y = "\n AUD per point \n")
TAS <- TAS + labs(title = "\n Price/Points visual distribution (TAS) \n")
TAS <- TAS + coord_cartesian(xlim = c(79.5, 90))

print(TAS)

## Warning: Removed 2 rows containing missing values (geom_point).

7.1.9.1.1.6 For Queensland

QLD <- ggplot(Okay_QLD, aes(x = points, y = AUD_per_point, col = price))
QLD <- QLD + geom_point()
QLD <- QLD + labs(x = "\n Points \n")
QLD <- QLD + labs(y = "\n AUD per point \n")
QLD <- QLD + labs(title = "\n Price/Points visual distribution (QLD) \n")
QLD <- QLD + coord_cartesian(xlim = c(0, 100))

print(QLD)

8 Conclusion

South Australia has the greatest number of wines ranked 90 and above listed in this reeview although Victoria has the highest average ratings of wines rated 90 and above. Shiraz is the most-rated wine of the lot

Of the wines averaging 89 and less, South Australia has the greatest number of wines but Tasmania has the highest average ratings. Shiraz is the most-rated wine of the lot

Based on what we can see at the Points vs AUD per point distribution, it is that the greater AUD per point, the greater the rating and it is also dependent per state although in Victoria it is a different story since, comparing it to South Australia’s highest-rated wine, its highest-rated wine does not require an AUD-per-point ratio of 12 compared to South Australia’s.

Analyzing Wine Reviews for Australian wines, the Tidyverse way

Joel Jr Rudinas

November 14, 2018