Greetings! I got this dataset from Kaggle, although for personal convenience I loaded it and saved it to Excel. If you wish to take a look at the dataset, the link is here: https://www.kaggle.com/zynicide/wine-reviews
As a wine fanatic, I'm going to look at this partly as a matter of personal interest.
library(readxl)
winemag_data_first150k <- read_excel("D:/Working Directory/winemag_data_first150k.xlsx")
View(winemag_data_first150k)
head(winemag_data_first150k)
## # A tibble: 6 x 11
## X__1 country description designation points price province region_1
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
## 1 0 US This treme~ Martha's V~ 96 235 Califor~ Napa Va~
## 2 1 Spain Ripe aroma~ Carodorum ~ 96 110 Norther~ Toro
## 3 2 US Mac Watson~ Special Se~ 96 90 Califor~ Knights~
## 4 3 US This spent~ Reserve 96 65 Oregon Willame~
## 5 4 France This is th~ La Brûlade 95 66 Provence Bandol
## 6 5 Spain Deep, dens~ Numanthia 95 73 Norther~ Toro
## # ... with 3 more variables: region_2 <chr>, variety <chr>, winery <chr>
str(winemag_data_first150k)
## Classes 'tbl_df', 'tbl' and 'data.frame': 150930 obs. of 11 variables:
## $ X__1 : num 0 1 2 3 4 5 6 7 8 9 ...
## $ country : chr "US" "Spain" "US" "US" ...
## $ description: chr "This tremendous 100% varietal wine hails from Oakville and was aged over three years in oak. Juicy red-cherry f"| __truncated__ "Ripe aromas of fig, blackberry and cassis are softened and sweetened by a slathering of oaky chocolate and vani"| __truncated__ "Mac Watson honors the memory of a wine once made by his mother in this tremendously delicious, balanced and com"| __truncated__ "This spent 20 months in 30% new French oak, and incorporates fruit from Ponzi's Aurora, Abetina and Madrona vin"| __truncated__ ...
## $ designation: chr "Martha's Vineyard" "Carodorum Selección Especial Reserva" "Special Selected Late Harvest" "Reserve" ...
## $ points : num 96 96 96 96 95 95 95 95 95 95 ...
## $ price : num 235 110 90 65 66 73 65 110 65 60 ...
## $ province : chr "California" "Northern Spain" "California" "Oregon" ...
## $ region_1 : chr "Napa Valley" "Toro" "Knights Valley" "Willamette Valley" ...
## $ region_2 : chr "Napa" NA "Sonoma" "Willamette Valley" ...
## $ variety : chr "Cabernet Sauvignon" "Tinta de Toro" "Sauvignon Blanc" "Pinot Noir" ...
## $ winery : chr "Heitz" "Bodega Carmen Rodríguez" "Macauley" "Ponzi" ...
summary(winemag_data_first150k)
## X__1 country description designation
## Min. : 0 Length:150930 Length:150930 Length:150930
## 1st Qu.: 37732 Class :character Class :character Class :character
## Median : 75465 Mode :character Mode :character Mode :character
## Mean : 75465
## 3rd Qu.:113197
## Max. :150929
##
## points price province region_1
## Min. : 80.00 Min. : 4.00 Length:150930 Length:150930
## 1st Qu.: 86.00 1st Qu.: 16.00 Class :character Class :character
## Median : 88.00 Median : 24.00 Mode :character Mode :character
## Mean : 87.89 Mean : 33.13
## 3rd Qu.: 90.00 3rd Qu.: 40.00
## Max. :100.00 Max. :2300.00
## NA's :13695
## region_2 variety winery
## Length:150930 Length:150930 Length:150930
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
names(winemag_data_first150k)
## [1] "X__1" "country" "description" "designation" "points"
## [6] "price" "province" "region_1" "region_2" "variety"
## [11] "winery"
library(tidyverse)
## -- Attaching packages ----------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.0 v purrr 0.2.5
## v tibble 1.4.2 v dplyr 0.7.6
## v tidyr 0.8.1 v stringr 1.3.1
## v readr 1.1.1 v forcats 0.3.0
## -- Conflicts -------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(dplyr)
winemag_data_first150k %>%
group_by(country) %>%
count() %>%
arrange(desc(n)) %>%
print()
## # A tibble: 49 x 2
## # Groups: country [49]
## country n
## <chr> <int>
## 1 US 62397
## 2 Italy 23478
## 3 France 21098
## 4 Spain 8268
## 5 Chile 5816
## 6 Argentina 5631
## 7 Portugal 5322
## 8 Australia 4957
## 9 New Zealand 3320
## 10 Austria 3057
## # ... with 39 more rows
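As an aside, the group_by() + count() + arrange() chain above can usually be collapsed into a single count() call with sort = TRUE. Here is a minimal sketch on a made-up toy tibble (toy_reviews is hypothetical and not taken from the dataset):

```r
library(dplyr)

# Hypothetical toy data standing in for the real review table
toy_reviews <- tibble::tibble(
  country = c("US", "Italy", "US", "France", "US", "Italy")
)

# count() tallies the rows for each value of the given column;
# sort = TRUE orders the result from the largest count to the smallest
country_counts <- toy_reviews %>%
  count(country, sort = TRUE)

print(country_counts)
```

Unlike the chain above, the result of count(country) is not left grouped, which tends to be more convenient for later steps.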
Impressed that America has the largest number of wines in this review? I am.
Just out of interest, here are the counts by variety:
winemag_data_first150k %>%
group_by(variety) %>%
count() %>%
arrange(desc(n)) %>%
print()
## # A tibble: 632 x 2
## # Groups: variety [632]
## variety n
## <chr> <int>
## 1 Chardonnay 14482
## 2 Pinot Noir 14291
## 3 Cabernet Sauvignon 12800
## 4 Red Blend 10062
## 5 Bordeaux-style Red Blend 7347
## 6 Sauvignon Blanc 6320
## 7 Syrah 5825
## 8 Riesling 5524
## 9 Merlot 5070
## 10 Zinfandel 3799
## # ... with 622 more rows
Nothing beats a good Chardonnay!
Since I am interested in Australian wines, let us focus the bulk of our code there.
winemag_Australian <- winemag_data_first150k %>%
filter(country == "Australia") %>%
group_by(variety) %>%
print()
## # A tibble: 4,957 x 11
## # Groups: variety [81]
## X__1 country description designation points price province region_1
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
## 1 61 Austra~ Moorooduc'~ <NA> 91 36 Victoria Morning~
## 2 631 Austra~ Tim Knapps~ The Dagger 90 20 South A~ Adelaid~
## 3 660 Austra~ This crisp~ Applejack ~ 88 42 Victoria Yarra V~
## 4 825 Austra~ The Dead A~ The Dead A~ 90 65 South A~ McLaren~
## 5 987 Austra~ This is st~ Coal River~ 90 65 Tasmania Tasmania
## 6 1066 Austra~ Hints of l~ <NA> 88 25 South A~ Eden Va~
## 7 2025 Austra~ This mediu~ Ned & Henr~ 90 25 South A~ Barossa~
## 8 2028 Austra~ From vines~ 1927 Vines 90 33 Victoria Nagambi~
## 9 2148 Austra~ Full-bodie~ The Factor 98 125 South A~ Barossa~
## 10 2325 Austra~ Based in C~ <NA> 90 17 South A~ Adelaid~
## # ... with 4,947 more rows, and 3 more variables: region_2 <chr>,
## # variety <chr>, winery <chr>
winemag_Australian %>%
group_by(province) %>%
count() %>%
arrange(desc(n)) %>%
print()
## # A tibble: 7 x 2
## # Groups: province [7]
## province n
## <chr> <int>
## 1 South Australia 3004
## 2 Victoria 613
## 3 Australia Other 553
## 4 Western Australia 491
## 5 New South Wales 246
## 6 Tasmania 47
## 7 Queensland 3
South Australia clearly has the largest number of wines listed in this review. No wonder they call it the Wine State.
winemag_Australian %>%
group_by(variety) %>%
count() %>%
arrange(desc(n)) %>%
print()
## # A tibble: 81 x 2
## # Groups: variety [81]
## variety n
## <chr> <int>
## 1 Shiraz 1434
## 2 Chardonnay 669
## 3 Cabernet Sauvignon 658
## 4 Riesling 286
## 5 Pinot Noir 190
## 6 Red Blend 136
## 7 Sauvignon Blanc 133
## 8 Grenache 117
## 9 Merlot 85
## 10 Bordeaux-style Red Blend 82
## # ... with 71 more rows
Wow! Shiraz is the most common variety in this review.
Suppose one asks: which Australian wine in the list is the most expensive?
winemag_Australian %>%
arrange(desc(price))
## # A tibble: 4,957 x 11
## # Groups: variety [81]
## X__1 country description designation points price province region_1
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
## 1 10505 Austra~ This has a~ Grange 98 850 South A~ South A~
## 2 28625 Austra~ This has a~ Grange 98 850 South A~ South A~
## 3 127614 Austra~ This stell~ Hill of Gr~ 95 625 South A~ Eden Va~
## 4 10509 Austra~ This is da~ Grange 97 600 South A~ South A~
## 5 28629 Austra~ This is da~ Grange 97 600 South A~ South A~
## 6 57120 Austra~ This is a ~ Hill of Gr~ 96 550 South A~ Eden Va~
## 7 3033 Austra~ This Caber~ Bin 707 95 500 South A~ South A~
## 8 61325 Austra~ Readers fo~ Hill of Gr~ 93 400 South A~ Eden Va~
## 9 57201 Austra~ Astralis h~ Astralis 95 350 South A~ Clarend~
## 10 96771 Austra~ Astralis h~ Astralis 95 350 South A~ Clarend~
## # ... with 4,947 more rows, and 3 more variables: region_2 <chr>,
## # variety <chr>, winery <chr>
Suppose one asks: how much do these wines cost in AUD?
The last time I checked, 1 USD was 1.39 AUD.
winemag_Australian <- winemag_Australian %>%
mutate(price = price * 1.39) %>%
arrange(desc(price))
print(winemag_Australian)
## # A tibble: 4,957 x 11
## # Groups: variety [81]
## X__1 country description designation points price province region_1
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
## 1 10505 Austra~ This has a~ Grange 98 1182. South A~ South A~
## 2 28625 Austra~ This has a~ Grange 98 1182. South A~ South A~
## 3 127614 Austra~ This stell~ Hill of Gr~ 95 869. South A~ Eden Va~
## 4 10509 Austra~ This is da~ Grange 97 834. South A~ South A~
## 5 28629 Austra~ This is da~ Grange 97 834. South A~ South A~
## 6 57120 Austra~ This is a ~ Hill of Gr~ 96 764. South A~ Eden Va~
## 7 3033 Austra~ This Caber~ Bin 707 95 695 South A~ South A~
## 8 61325 Austra~ Readers fo~ Hill of Gr~ 93 556 South A~ Eden Va~
## 9 57201 Austra~ Astralis h~ Astralis 95 486. South A~ Clarend~
## 10 96771 Austra~ Astralis h~ Astralis 95 486. South A~ Clarend~
## # ... with 4,947 more rows, and 3 more variables: region_2 <chr>,
## # variety <chr>, winery <chr>
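Note that the code above overwrites price, so running it a second time would convert the values twice. One way around that, sketched here on hypothetical toy data, is to keep the USD column and write the converted value to a new one. The names usd_to_aud and price_aud are my own inventions, and 1.39 is simply the rate quoted above.

```r
library(dplyr)

# Exchange rate quoted in the text (an assumption, not a live rate)
usd_to_aud <- 1.39

# Hypothetical toy data; prices are in USD
toy_prices <- tibble::tibble(
  designation = c("Bin 707", "Grange"),
  price       = c(500, 850)
)

# Keep the original USD price and add the AUD price alongside it
toy_prices_aud <- toy_prices %>%
  mutate(price_aud = price * usd_to_aud) %>%
  arrange(desc(price_aud))

print(toy_prices_aud)
```

With both columns present, re-running the chunk gives the same result each time.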
a <- ggplot(winemag_Australian, aes(x = points, y = price))
a <- a + geom_point()
a <- a + facet_wrap(~province)
a <- a + labs(x = "\n Rating \n")
a <- a + labs(y = "\n Price (in AUD) \n")
print(a)
## Warning: Removed 63 rows containing missing values (geom_point).
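The warning comes from the 63 rows whose price is NA, which geom_point() drops when drawing. A rough sketch of how one might count and remove such rows before plotting, using a hypothetical toy tibble rather than the real data:

```r
library(dplyr)

# Hypothetical toy data with two missing prices
toy_wines <- tibble::tibble(
  points = c(90, 88, 95, 86),
  price  = c(36, NA, 125, NA)
)

# How many rows would be dropped from a price plot?
n_missing <- sum(is.na(toy_wines$price))

# Keep only the rows that actually have a price
priced_wines <- toy_wines %>%
  filter(!is.na(price))

print(n_missing)
print(priced_wines)
```

Filtering explicitly makes it clear how many wines the plot is really based on.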
For analysis, let’s assume 90-100 is excellent and 80-89 is okay.
Excellent <- winemag_Australian %>%
filter(points >= 90)
print(Excellent)
## # A tibble: 1,519 x 11
## # Groups: variety [51]
## X__1 country description designation points price province region_1
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
## 1 10505 Austra~ This has a~ Grange 98 1182. South A~ South A~
## 2 28625 Austra~ This has a~ Grange 98 1182. South A~ South A~
## 3 127614 Austra~ This stell~ Hill of Gr~ 95 869. South A~ Eden Va~
## 4 10509 Austra~ This is da~ Grange 97 834. South A~ South A~
## 5 28629 Austra~ This is da~ Grange 97 834. South A~ South A~
## 6 57120 Austra~ This is a ~ Hill of Gr~ 96 764. South A~ Eden Va~
## 7 3033 Austra~ This Caber~ Bin 707 95 695 South A~ South A~
## 8 61325 Austra~ Readers fo~ Hill of Gr~ 93 556 South A~ Eden Va~
## 9 57201 Austra~ Astralis h~ Astralis 95 486. South A~ Clarend~
## 10 96771 Austra~ Astralis h~ Astralis 95 486. South A~ Clarend~
## # ... with 1,509 more rows, and 3 more variables: region_2 <chr>,
## # variety <chr>, winery <chr>
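An alternative to building separate tables for each rating band is to label every row with its band and count from there. This is only a sketch on hypothetical toy data; the column name tier is my own and the 90-point cut-off is the assumption stated above.

```r
library(dplyr)

# Hypothetical toy data
toy_ratings <- tibble::tibble(
  province = c("Victoria", "Victoria", "Tasmania", "South Australia"),
  points   = c(91, 88, 90, 85)
)

# Label each wine: 90 and above is "Excellent", everything else is "Okay"
tiered <- toy_ratings %>%
  mutate(tier = if_else(points >= 90, "Excellent", "Okay"))

# One table with the number of wines per province and band
tier_counts <- tiered %>%
  count(province, tier)

print(tier_counts)
```

Both bands then live in one table, which makes side-by-side comparisons of the states easier.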
Excellent_Count <- Excellent %>%
group_by(province) %>%
count()
print(Excellent_Count)
## # A tibble: 6 x 2
## # Groups: province [6]
## province n
## <chr> <int>
## 1 Australia Other 17
## 2 New South Wales 45
## 3 South Australia 1148
## 4 Tasmania 6
## 5 Victoria 179
## 6 Western Australia 124
Excellent_Varieties <- Excellent %>%
group_by(variety) %>%
count() %>%
arrange(desc(n))
Excellent_Varieties5 <- Excellent_Varieties[1:5,]
print(Excellent_Varieties)
## # A tibble: 51 x 2
## # Groups: variety [51]
## variety n
## <chr> <int>
## 1 Shiraz 667
## 2 Cabernet Sauvignon 225
## 3 Chardonnay 113
## 4 Riesling 74
## 5 Grenache 49
## 6 Shiraz-Viognier 36
## 7 Bordeaux-style Red Blend 34
## 8 Syrah 34
## 9 Red Blend 32
## 10 Muscat 26
## # ... with 41 more rows
# Per rating
Excellent_Average <- aggregate(points ~ province, Excellent, FUN = mean)
Excellent_Average <- arrange(Excellent_Average, desc(points))
print(Excellent_Average)
## province points
## 1 Victoria 91.67039
## 2 South Australia 91.35105
## 3 Western Australia 90.88710
## 4 Australia Other 90.58824
## 5 New South Wales 90.46667
## 6 Tasmania 90.00000
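The same per-state average can be written with dplyr instead of aggregate(). The sketch below uses hypothetical toy data and also reports how many wines each average is based on, which matters when a state has only a handful of entries. The column names mean_points and n_wines are my own.

```r
library(dplyr)

# Hypothetical toy data
toy_excellent <- tibble::tibble(
  province = c("Victoria", "Victoria", "Tasmania"),
  points   = c(92, 94, 90)
)

# Mean rating per state, with the number of wines behind each mean
province_means <- toy_excellent %>%
  group_by(province) %>%
  summarise(mean_points = mean(points), n_wines = n()) %>%
  arrange(desc(mean_points))

print(province_means)
```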
# Number of wines per state in the ratings
a1 <- ggplot(Excellent_Count, aes(x = reorder(province, -n), y = n, fill = province))
a1 <- a1 + geom_bar(aes(fill = province), stat = "identity")
a1 <- a1 + geom_text(aes(label = round(n, 0), hjust = 0.65, vjust = - 0.7), size = 4)
a1 <- a1 + labs(x = "\n Australian States \n") + labs(y = "\n Number of wines reviewed \n")
a1 <- a1 + labs(title = "\n Australian wines ranked 90 or higher \n")
a1 <- a1 + coord_cartesian(ylim = c(0, 1500))
a1 <- a1 + theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 9.5, face = "bold"))
a1 <- a1 + theme(legend.position = "none")
a1 <- a1 + theme(plot.title = element_text(size = 15, face = "bold"))
print(a1)
# Mean Rating per state
a2 <- ggplot(Excellent_Average, aes(x = reorder(province, -points), y = points, fill = province))
a2 <- a2 + geom_bar(aes(fill = province), stat = "identity")
a2 <- a2 + geom_text(aes(label = round(points, 0), hjust = 0.65, vjust = - 0.7), size = 4)
a2 <- a2 + labs(x = "\n Australian States \n") + labs(y = "\n Average Rating \n")
a2 <- a2 + labs(title = "\n State averages of wines 90 or higher \n")
a2 <- a2 + coord_cartesian(ylim = c(0, 100))
a2 <- a2 + theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 9.5, face = "bold"))
a2 <- a2 + theme(legend.position = "none")
a2 <- a2 + theme(plot.title = element_text(size = 15, face = "bold"))
print(a2)
# Wine varieties
a3 <- ggplot(Excellent_Varieties5, aes(x = reorder(variety, -n), y = n, fill = variety))
a3 <- a3 + geom_bar(aes(fill = variety), stat = "identity")
a3 <- a3 + geom_text(aes(label = round(n, 0), hjust = 0.65, vjust = - 0.7), size = 4)
a3 <- a3 + labs(x = "\n Variety of wine \n") + labs(y = "\n Number of wines reviewed \n")
a3 <- a3 + labs(title = "\n Top 5 varieties rated 90 or higher \n")
a3 <- a3 + coord_cartesian(ylim = c(0, 800))
a3 <- a3 + theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 9.5, face = "bold"))
a3 <- a3 + theme(legend.position = "none")
a3 <- a3 + theme(plot.title = element_text(size = 15, face = "bold"))
print(a3)
Okay <- winemag_Australian %>%
filter(points <= 89)
print(Okay)
## # A tibble: 3,438 x 11
## # Groups: variety [76]
## X__1 country description designation points price province region_1
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
## 1 101603 Austra~ Intensely ~ The Armagh 89 243. South A~ Clare V~
## 2 125783 Austra~ Intensely ~ The Armagh 89 243. South A~ Clare V~
## 3 139119 Austra~ A manageab~ Aphrodite 89 221. South A~ Barossa~
## 4 56798 Austra~ An icon in~ Quintet 88 202. Victoria Yarra V~
## 5 117087 Austra~ One of Aus~ Quintet 89 202. Victoria Yarra V~
## 6 130677 Austra~ One of Aus~ Quintet 89 202. Victoria Yarra V~
## 7 86383 Austra~ After rece~ The Pict 88 188. South A~ Barossa~
## 8 48261 Austra~ There's a ~ Graveyard ~ 89 174. New Sou~ Hunter ~
## 9 50437 Austra~ Restraint ~ <NA> 89 174. Victoria Yarra V~
## 10 51050 Austra~ In 5–10 ~ Emily's Pa~ 89 174. Victoria Heathco~
## # ... with 3,428 more rows, and 3 more variables: region_2 <chr>,
## # variety <chr>, winery <chr>
Okay_Count <- Okay %>%
group_by(province) %>%
count()
print(Okay_Count)
## # A tibble: 7 x 2
## # Groups: province [7]
## province n
## <chr> <int>
## 1 Australia Other 536
## 2 New South Wales 201
## 3 Queensland 3
## 4 South Australia 1856
## 5 Tasmania 41
## 6 Victoria 434
## 7 Western Australia 367
Okay_Varieties <- Okay %>%
group_by(variety) %>%
count() %>%
arrange(desc(n))
print(Okay_Varieties)
## # A tibble: 76 x 2
## # Groups: variety [76]
## variety n
## <chr> <int>
## 1 Shiraz 767
## 2 Chardonnay 556
## 3 Cabernet Sauvignon 433
## 4 Riesling 212
## 5 Pinot Noir 169
## 6 Sauvignon Blanc 120
## 7 Red Blend 104
## 8 Merlot 82
## 9 Grenache 68
## 10 Shiraz-Cabernet Sauvignon 66
## # ... with 66 more rows
Okay_Average <- aggregate(points ~ province, Okay, FUN = mean)
Okay_Average <- arrange(Okay_Average, desc(points))
print(Okay_Average)
## province points
## 1 Tasmania 87.31707
## 2 South Australia 86.80873
## 3 Victoria 86.55069
## 4 Western Australia 86.54496
## 5 New South Wales 86.28358
## 6 Queensland 85.00000
## 7 Australia Other 84.63060
# Number of wines per state in the ratings
a1 <- ggplot(Okay_Count, aes(x = reorder(province, -n), y = n, fill = province))
a1 <- a1 + geom_bar(aes(fill = province), stat = "identity")
a1 <- a1 + geom_text(aes(label = round(n, 0), hjust = 0.65, vjust = - 0.7), size = 4)
a1 <- a1 + labs(x = "\n Australian States \n") + labs(y = "\n Number of wines reviewed \n")
a1 <- a1 + labs(title = "\n Australian wines ranked 89 or less \n")
a1 <- a1 + coord_cartesian(ylim = c(0, 2000))
a1 <- a1 + theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 9.5, face = "bold"))
a1 <- a1 + theme(legend.position = "none")
a1 <- a1 + theme(plot.title = element_text(size = 15, face = "bold"))
print(a1)
# Mean rating per state
a2 <- ggplot(Okay_Average, aes(x = reorder(province, -points), y = points, fill = province))
a2 <- a2 + geom_bar(aes(fill = province), stat = "identity")
a2 <- a2 + geom_text(aes(label = round(points, 0), hjust = 0.65, vjust = - 0.7), size = 4)
a2 <- a2 + labs(x = "\n Australian States \n") + labs(y = "\n Average Rating \n")
a2 <- a2 + labs(title = "\n State averages of wines ranked 89 or less \n")
a2 <- a2 + coord_cartesian(ylim = c(0, 100))
a2 <- a2 + theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 9.5, face = "bold"))
a2 <- a2 + theme(legend.position = "none")
a2 <- a2 + theme(plot.title = element_text(size = 15, face = "bold"))
print(a2)
# Wine varieties (top 5)
Okay_Varieties <- Okay_Varieties[1:5,]
a3 <- ggplot(Okay_Varieties, aes(x = reorder(variety, -n), y = n, fill = variety))
a3 <- a3 + geom_bar(aes(fill = variety), stat = "identity")
a3 <- a3 + geom_text(aes(label = round(n, 0), hjust = 0.65, vjust = - 0.7), size = 4)
a3 <- a3 + labs(x = "\n Variety of wine \n") + labs(y = "\n Number of wines reviewed \n")
a3 <- a3 + labs(title = "\n Top 5 varieties rated 89 or less \n")
a3 <- a3 + coord_cartesian(ylim = c(0, 1000))
a3 <- a3 + theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 9.5, face = "bold"))
a3 <- a3 + theme(legend.position = "none")
a3 <- a3 + theme(plot.title = element_text(size = 15, face = "bold"))
print(a3)
Here, we want to see the visual relationship between the price and the number of points in the rating. Never mind the variety; let's look at the overall picture.
head(Excellent)
## # A tibble: 6 x 11
## # Groups: variety [1]
## X__1 country description designation points price province region_1
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
## 1 10505 Austra~ This has a~ Grange 98 1182. South A~ South A~
## 2 28625 Austra~ This has a~ Grange 98 1182. South A~ South A~
## 3 127614 Austra~ This stell~ Hill of Gr~ 95 869. South A~ Eden Va~
## 4 10509 Austra~ This is da~ Grange 97 834. South A~ South A~
## 5 28629 Austra~ This is da~ Grange 97 834. South A~ South A~
## 6 57120 Austra~ This is a ~ Hill of Gr~ 96 764. South A~ Eden Va~
## # ... with 3 more variables: region_2 <chr>, variety <chr>, winery <chr>
Australia <- ggplot(winemag_Australian, aes(x = price, y = points, col = province))
Australia <- Australia + geom_point()
Australia <- Australia + labs(x = "\n Price \n")
Australia <- Australia + labs(y = "\n Points \n")
Australia <- Australia + labs(title = "\n Price/Points visual distribution (Australia) \n")
print(Australia)
## Warning: Removed 63 rows containing missing values (geom_point).
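To put a number on the pattern in a scatter plot like this, one option is a rank correlation between price and points. The sketch below runs on hypothetical toy values and is not a result from the dataset. Spearman is my own choice here because wine prices are heavily skewed, and use = "complete.obs" skips rows with a missing price.

```r
# Hypothetical toy data, including one missing price
toy_wines <- data.frame(
  price  = c(20, 35, NA, 120, 850),
  points = c(86, 88, 90, 93, 98)
)

# Spearman rank correlation, computed on complete rows only
rho <- cor(toy_wines$price, toy_wines$points,
           method = "spearman", use = "complete.obs")

print(rho)
```

A value near 1 means higher-priced wines tend to be ranked higher, without assuming the relationship is a straight line.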
# For SA
Excellent_SA <- Excellent %>%
filter(province == "South Australia") %>%
mutate(AUD_per_point = price/points)
head(Excellent_SA)
## # A tibble: 6 x 12
## # Groups: variety [1]
## X__1 country description designation points price province region_1
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
## 1 10505 Austra~ This has a~ Grange 98 1182. South A~ South A~
## 2 28625 Austra~ This has a~ Grange 98 1182. South A~ South A~
## 3 127614 Austra~ This stell~ Hill of Gr~ 95 869. South A~ Eden Va~
## 4 10509 Austra~ This is da~ Grange 97 834. South A~ South A~
## 5 28629 Austra~ This is da~ Grange 97 834. South A~ South A~
## 6 57120 Austra~ This is a ~ Hill of Gr~ 96 764. South A~ Eden Va~
## # ... with 4 more variables: region_2 <chr>, variety <chr>, winery <chr>,
## # AUD_per_point <dbl>
nrow(Excellent_SA)
## [1] 1148
# For Victoria
Excellent_Vic <- Excellent %>%
filter(province == "Victoria") %>%
mutate(AUD_per_point = price/points)
head(Excellent_Vic)
## # A tibble: 6 x 12
## # Groups: variety [3]
## X__1 country description designation points price province region_1
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
## 1 19355 Austra~ Not a Cell~ Rare 100 417. Victoria Rutherg~
## 2 19364 Austra~ Incredibly~ Rare Musca~ 97 417. Victoria Rutherg~
## 3 84035 Austra~ Not a Cell~ Rare 100 417. Victoria Rutherg~
## 4 84044 Austra~ Incredibly~ Rare Musca~ 97 417. Victoria Rutherg~
## 5 101477 Austra~ This Musca~ Rare 95 417. Victoria Rutherg~
## 6 101780 Austra~ Befitting ~ Rare Musca~ 93 417. Victoria Rutherg~
## # ... with 4 more variables: region_2 <chr>, variety <chr>, winery <chr>,
## # AUD_per_point <dbl>
nrow(Excellent_Vic)
## [1] 179
# For WA
Excellent_WA <- Excellent %>%
filter(province == "Western Australia") %>%
mutate(AUD_per_point = price/points)
head(Excellent_WA)
## # A tibble: 6 x 12
## # Groups: variety [2]
## X__1 country description designation points price province region_1
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
## 1 27625 Austra~ Crisp and ~ Diana Made~ 90 152. Western~ Margare~
## 2 72891 Austra~ The 2009 D~ Diana Made~ 91 132. Western~ Margare~
## 3 55855 Austra~ With only ~ Diana Made~ 93 131. Western~ Margare~
## 4 98815 Austra~ With only ~ Diana Made~ 93 131. Western~ Margare~
## 5 121765 Austra~ With only ~ Diana Made~ 93 131. Western~ Margare~
## 6 10793 Austra~ Having ear~ Art Series 94 124. Western~ Margare~
## # ... with 4 more variables: region_2 <chr>, variety <chr>, winery <chr>,
## # AUD_per_point <dbl>
nrow(Excellent_WA)
## [1] 124
# For NSW
Excellent_NSW <- Excellent %>%
filter(province == "New South Wales") %>%
mutate(AUD_per_point = price/points)
head(Excellent_NSW)
## # A tibble: 6 x 12
## # Groups: variety [3]
## X__1 country description designation points price province region_1
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
## 1 100131 Austra~ One of the~ Graveyard ~ 91 174. New Sou~ Hunter ~
## 2 124281 Austra~ One of the~ Graveyard ~ 91 174. New Sou~ Hunter ~
## 3 77892 Austra~ Brokenwood~ ILR Reserv~ 90 66.7 New Sou~ Hunter ~
## 4 68994 Austra~ Only conta~ No. 89 90 62.6 New Sou~ Orange
## 5 114414 Austra~ Only conta~ No. 89 90 62.6 New Sou~ Orange
## 6 137184 Austra~ Only conta~ No. 89 90 62.6 New Sou~ Orange
## # ... with 4 more variables: region_2 <chr>, variety <chr>, winery <chr>,
## # AUD_per_point <dbl>
nrow(Excellent_NSW)
## [1] 45
# For Wines in Other Aussie states
# I personally do not understand why there is "Australia other"
Excellent_Other <- Excellent %>%
filter(province == "Australia Other") %>%
mutate(AUD_per_point = price/points)
head(Excellent_Other)
## # A tibble: 6 x 12
## # Groups: variety [2]
## X__1 country description designation points price province region_1
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
## 1 68251 Austra~ This big w~ Yattarna 92 90.4 Austral~ South E~
## 2 143222 Austra~ An excelle~ Yattarna 91 90.4 Austral~ South E~
## 3 150571 Austra~ This big w~ Yattarna 92 90.4 Austral~ South E~
## 4 55697 Austra~ A unique d~ Black Noble 90 52.8 Austral~ Austral~
## 5 104334 Austra~ From 40-ye~ Idyll Vine~ 91 41.7 Austral~ South E~
## 6 131994 Austra~ From 40-ye~ Idyll Vine~ 91 41.7 Austral~ South E~
## # ... with 4 more variables: region_2 <chr>, variety <chr>, winery <chr>,
## # AUD_per_point <dbl>
nrow(Excellent_Other)
## [1] 17
#For Tasmania
Excellent_Tasmania <- Excellent %>%
filter(province == "Tasmania") %>%
mutate(AUD_per_point = price/points)
head(Excellent_Tasmania)
## # A tibble: 6 x 12
## # Groups: variety [3]
## X__1 country description designation points price province region_1
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
## 1 987 Austra~ This is st~ Coal River~ 90 90.4 Tasmania Tasmania
## 2 28707 Austra~ This is st~ Coal River~ 90 90.4 Tasmania Tasmania
## 3 38700 Austra~ This pale ~ Premium Br~ 90 30.6 Tasmania Tasmania
## 4 24759 Austra~ A lovely, ~ 42°S Prem~ 90 27.8 Tasmania Tasmania
## 5 44439 Austra~ A lovely, ~ 42°S Prem~ 90 27.8 Tasmania Tasmania
## 6 122319 Austra~ A lovely, ~ 42°S Prem~ 90 27.8 Tasmania Tasmania
## # ... with 4 more variables: region_2 <chr>, variety <chr>, winery <chr>,
## # AUD_per_point <dbl>
nrow(Excellent_Tasmania)
## [1] 6
SA <- ggplot(Excellent_SA, aes(x = points, y = AUD_per_point, col = price))
SA <- SA + geom_point()
SA <- SA + labs(x = "\n Points \n")
SA <- SA + labs(y = "\n AUD per point \n")
SA <- SA + labs(title = "\n Price/Points visual distribution (SA) \n")
SA <- SA + coord_cartesian(xlim = c(89.5, 100))
print(SA)
## Warning: Removed 25 rows containing missing values (geom_point).
Vic <- ggplot(Excellent_Vic, aes(x = points, y = AUD_per_point, col = price))
Vic <- Vic + geom_point()
Vic <- Vic + labs(x = "\n Points \n")
Vic <- Vic + labs(y = "\n AUD per point \n")
Vic <- Vic + labs(title = "\n Price/Points visual distribution (VIC) \n")
Vic <- Vic + coord_cartesian(xlim = c(89.5, 100))
print(Vic)
## Warning: Removed 1 rows containing missing values (geom_point).
WA <- ggplot(Excellent_WA, aes(x = points, y = AUD_per_point, col = price))
WA <- WA + geom_point()
WA <- WA + labs(x = "\n Points \n")
WA <- WA + labs(y = "\n AUD per point \n")
WA <- WA + labs(title = "\n Price/Points visual distribution (WA) \n")
WA <- WA + coord_cartesian(xlim = c(89.5, 100))
print(WA)
NSW <- ggplot(Excellent_NSW, aes(x = points, y = AUD_per_point, col = price))
NSW <- NSW + geom_point()
NSW <- NSW + labs(x = "\n Points \n")
NSW <- NSW + labs(y = "\n AUD per point \n")
NSW <- NSW + labs(title = "\n Price/Points visual distribution (NSW) \n")
NSW <- NSW + coord_cartesian(xlim = c(89.5, 100))
print(NSW)
AUD_Other <- ggplot(Excellent_Other, aes(x = points, y = AUD_per_point, col = price))
AUD_Other <- AUD_Other + geom_point()
AUD_Other <- AUD_Other + labs(x = "\n Points \n")
AUD_Other <- AUD_Other + labs(y = "\n AUD per point \n")
AUD_Other <- AUD_Other + labs(title = "\n Price/Points visual distribution (Other) \n")
AUD_Other <- AUD_Other + coord_cartesian(xlim = c(89.5, 100))
print(AUD_Other)
TAS <- ggplot(Excellent_Tasmania, aes(x = points, y = AUD_per_point, col = price))
TAS <- TAS + geom_point()
TAS <- TAS + labs(x = "\n Points \n")
TAS <- TAS + labs(y = "\n AUD per point \n")
TAS <- TAS + labs(title = "\n Price/Points visual distribution (TAS) \n")
TAS <- TAS + coord_cartesian(xlim = c(89.5, 100))
print(TAS)
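The six per-state tables and plots above repeat the same steps. A more compact route, sketched here on hypothetical toy data, is to compute AUD_per_point once and let facet_wrap() draw one panel per state. The scales = "free_y" setting is my own choice so that the smaller states are not squashed by South Australia's range.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data; prices are assumed to be in AUD already
toy_excellent <- tibble::tibble(
  province = c("South Australia", "South Australia", "Victoria", "Tasmania"),
  points   = c(98, 95, 100, 90),
  price    = c(1182, 695, 417, 90.4)
)

# Compute the ratio once for all states
per_point <- toy_excellent %>%
  mutate(AUD_per_point = price / points)

# One panel per state instead of one plot object per state
p <- ggplot(per_point, aes(x = points, y = AUD_per_point, colour = price)) +
  geom_point() +
  facet_wrap(~province, scales = "free_y") +
  labs(x = "Points", y = "AUD per point")

print(p)
```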
I had to include Tasmania here even though it is little more than a speck in the “Excellent” group, with only 6 entries. Queensland has no entries rated 90 or above.
Since those 6 Tassie entries cover only 3 distinct wines (the rest appear to be duplicate reviews), I might as well exclude them.
# For SA
Okay_SA <- Okay %>%
filter(province == "South Australia") %>%
mutate(AUD_per_point = price/points)
head(Okay_SA)
## # A tibble: 6 x 12
## # Groups: variety [3]
## X__1 country description designation points price province region_1
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
## 1 101603 Austra~ Intensely ~ The Armagh 89 243. South A~ Clare V~
## 2 125783 Austra~ Intensely ~ The Armagh 89 243. South A~ Clare V~
## 3 139119 Austra~ A manageab~ Aphrodite 89 221. South A~ Barossa~
## 4 86383 Austra~ After rece~ The Pict 88 188. South A~ Barossa~
## 5 139710 Austra~ Just the t~ <NA> 88 174. South A~ Barossa~
## 6 13233 Austra~ This wine ~ Ashmead Si~ 89 150. South A~ Barossa
## # ... with 4 more variables: region_2 <chr>, variety <chr>, winery <chr>,
## # AUD_per_point <dbl>
nrow(Okay_SA)
## [1] 1856
# For Victoria
Okay_Vic <- Okay %>%
filter(province == "Victoria") %>%
mutate(AUD_per_point = price/points)
head(Okay_Vic)
## # A tibble: 6 x 12
## # Groups: variety [3]
## X__1 country description designation points price province region_1
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
## 1 56798 Austra~ An icon in~ Quintet 88 202. Victoria Yarra V~
## 2 117087 Austra~ One of Aus~ Quintet 89 202. Victoria Yarra V~
## 3 130677 Austra~ One of Aus~ Quintet 89 202. Victoria Yarra V~
## 4 50437 Austra~ Restraint ~ <NA> 89 174. Victoria Yarra V~
## 5 51050 Austra~ In 5–10 ~ Emily's Pa~ 89 174. Victoria Heathco~
## 6 51957 Austra~ Renowned f~ Quintet 85 174. Victoria Yarra V~
## # ... with 4 more variables: region_2 <chr>, variety <chr>, winery <chr>,
## # AUD_per_point <dbl>
nrow(Okay_Vic)
## [1] 434
# For NSW
Okay_NSW <- Okay %>%
filter(province == "New South Wales") %>%
mutate(AUD_per_point = price/points)
head(Okay_NSW)
## # A tibble: 6 x 12
## # Groups: variety [3]
## X__1 country description designation points price province region_1
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
## 1 48261 Austra~ There's a ~ Graveyard ~ 89 174. New Sou~ Hunter ~
## 2 91311 Austra~ There's a ~ Graveyard ~ 89 174. New Sou~ Hunter ~
## 3 16250 Austra~ Fits the C~ Vat 47 88 55.6 New Sou~ Hunter ~
## 4 52686 Austra~ A decidedl~ No. 8 85 55.6 New Sou~ Orange
## 5 70124 Austra~ This Chard~ Vat 47 89 55.6 New Sou~ Hunter ~
## 6 70625 Austra~ A bit on t~ No. 8 85 55.6 New Sou~ Orange
## # ... with 4 more variables: region_2 <chr>, variety <chr>, winery <chr>,
## # AUD_per_point <dbl>
nrow(Okay_NSW)
## [1] 201
# For Australia Other
Okay_Other <- Okay %>%
filter(province == "Australia Other") %>%
mutate(AUD_per_point = price/points)
head(Okay_Other)
## # A tibble: 6 x 12
## # Groups: variety [4]
## X__1 country description designation points price province region_1
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
## 1 15417 Austra~ This Chard~ Pyrenees 87 55.6 Austral~ Austral~
## 2 146083 Austra~ "This tawn~ <NA> 85 55.6 Austral~ South E~
## 3 20534 Austra~ This is an~ Old Boys V~ 89 51.4 Austral~ South E~
## 4 110828 Austra~ A potentia~ Idyll Vine~ 85 41.7 Austral~ South E~
## 5 127058 Austra~ A potentia~ Idyll Vine~ 85 41.7 Austral~ South E~
## 6 147143 Austra~ Quite full~ Reserve 87 40.3 Austral~ Austral~
## # ... with 4 more variables: region_2 <chr>, variety <chr>, winery <chr>,
## # AUD_per_point <dbl>
nrow(Okay_Other)
## [1] 536
# For Tasmania
Okay_Tasmania <- Okay %>%
filter(province == "Tasmania") %>%
mutate(AUD_per_point = price/points)
head(Okay_Tasmania)
## # A tibble: 6 x 12
## # Groups: variety [3]
## X__1 country description designation points price province region_1
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
## 1 7409 Austra~ Pie cherry~ Bicheno Si~ 87 90.4 Tasmania Tasmania
## 2 34857 Austra~ Pie cherry~ Bicheno Si~ 87 90.4 Tasmania Tasmania
## 3 17530 Austra~ Lavish van~ Zdar 87 65.3 Tasmania Tasmania
## 4 31472 Austra~ Tasmania, ~ Cuvée du ~ 88 55.6 Tasmania Tasmania
## 5 15817 Austra~ In a recen~ Brut 89 44.5 Tasmania Tasmania
## 6 41597 Austra~ A zippy, e~ Brut 89 44.5 Tasmania Tasmania
## # ... with 4 more variables: region_2 <chr>, variety <chr>, winery <chr>,
## # AUD_per_point <dbl>
nrow(Okay_Tasmania)
## [1] 41
Okay_QLD <- Okay %>%
filter(province == "Queensland") %>%
mutate(AUD_per_point = price/points)
head(Okay_QLD)
## # A tibble: 3 x 12
## # Groups: variety [1]
## X__1 country description designation points price province region_1
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
## 1 61296 Austra~ Comes from~ <NA> 85 18.1 Queensl~ Granite~
## 2 112146 Austra~ Comes from~ <NA> 85 18.1 Queensl~ Granite~
## 3 147156 Austra~ Comes from~ <NA> 85 18.1 Queensl~ Granite~
## # ... with 4 more variables: region_2 <chr>, variety <chr>, winery <chr>,
## # AUD_per_point <dbl>
nrow(Okay_QLD)
## [1] 3
SA <- ggplot(Okay_SA, aes(x = points, y = AUD_per_point, col = price))
SA <- SA + geom_point()
SA <- SA + labs(x = "\n Points \n")
SA <- SA + labs(y = "\n AUD per point \n")
SA <- SA + labs(title = "\n Price/Points visual distribution (SA) \n")
SA <- SA + coord_cartesian(xlim = c(79.5, 90))
print(SA)
## Warning: Removed 24 rows containing missing values (geom_point).
Vic <- ggplot(Okay_Vic, aes(x = points, y = AUD_per_point, col = price))
Vic <- Vic + geom_point()
Vic <- Vic + labs(x = "\n Points \n")
Vic <- Vic + labs(y = "\n AUD per point \n")
Vic <- Vic + labs(title = "\n Price/Points visual distribution (VIC) \n")
Vic <- Vic + coord_cartesian(xlim = c(79.5, 90))
print(Vic)
## Warning: Removed 3 rows containing missing values (geom_point).
NSW <- ggplot(Okay_NSW, aes(x = points, y = AUD_per_point, col = price))
NSW <- NSW + geom_point()
NSW <- NSW + labs(x = "\n Points \n")
NSW <- NSW + labs(y = "\n AUD per point \n")
NSW <- NSW + labs(title = "\n Price/Points visual distribution (NSW) \n")
NSW <- NSW + coord_cartesian(xlim = c(79.5, 90))
print(NSW)
## Warning: Removed 2 rows containing missing values (geom_point).
AUD_Other <- ggplot(Okay_Other, aes(x = points, y = AUD_per_point, col = price))
AUD_Other <- AUD_Other + geom_point()
AUD_Other <- AUD_Other + labs(x = "\n Points \n")
AUD_Other <- AUD_Other + labs(y = "\n AUD per point \n")
AUD_Other <- AUD_Other + labs(title = "\n Price/Points visual distribution (Other) \n")
AUD_Other <- AUD_Other + coord_cartesian(xlim = c(79.5, 90))
print(AUD_Other)
## Warning: Removed 2 rows containing missing values (geom_point).
TAS <- ggplot(Okay_Tasmania, aes(x = points, y = AUD_per_point, col = price))
TAS <- TAS + geom_point()
TAS <- TAS + labs(x = "\n Points \n")
TAS <- TAS + labs(y = "\n AUD per point \n")
TAS <- TAS + labs(title = "\n Price/Points visual distribution (TAS) \n")
TAS <- TAS + coord_cartesian(xlim = c(79.5, 90))
print(TAS)
## Warning: Removed 2 rows containing missing values (geom_point).
QLD <- ggplot(Okay_QLD, aes(x = points, y = AUD_per_point, col = price))
QLD <- QLD + geom_point()
QLD <- QLD + labs(x = "\n Points \n")
QLD <- QLD + labs(y = "\n AUD per point \n")
QLD <- QLD + labs(title = "\n Price/Points visual distribution (QLD) \n")
QLD <- QLD + coord_cartesian(xlim = c(0, 100))
print(QLD)
South Australia has the greatest number of wines rated 90 and above in this review, although Victoria has the highest average rating among those wines. Shiraz is the most-reviewed variety of the lot.
Of the wines rated 89 or lower, South Australia again has the greatest number, but Tasmania has the highest average rating. Shiraz is again the most-reviewed variety.
From the points versus AUD-per-point plots, wines with a higher AUD-per-point ratio tend to have higher ratings, and the pattern varies by state. Victoria is a different story: its highest-rated wine (100 points) costs only about 4 AUD per point, whereas South Australia's highest-rated wine (98 points) costs about 12 AUD per point.
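For anyone who wants to check the AUD-per-point comparison between the two top-rated wines, the arithmetic is sketched below. The points and AUD prices are copied from the rounded values displayed in the tables above, so the ratios are approximate.

```r
# Top-rated wine of each state, using the rounded AUD prices shown earlier
top_wines <- data.frame(
  state     = c("South Australia", "Victoria"),
  points    = c(98, 100),
  price_aud = c(1182, 417)
)

# AUD per point = price in AUD divided by rating points
top_wines$AUD_per_point <- top_wines$price_aud / top_wines$points

print(top_wines)
```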