Hello, everybody. My name is Koki Ando. This is my first published RMarkdown report.
I wanted to analyze data about whisky simply because I love it. After a little searching in Safari, I found this site: Classification of whiskies.
Looks interesting, so let’s get started!
library(data.table)
library(tidyverse)
## -- Attaching packages ---------------------------------- tidyverse 1.2.1 --
## √ ggplot2 2.2.1     √ purrr   0.2.4
## √ tibble  1.3.4     √ dplyr   0.7.4
## √ tidyr   0.7.2     √ stringr 1.3.0
## √ readr   1.1.1     √ forcats 0.3.0
## -- Conflicts ------------------------------------- tidyverse_conflicts() --
## x dplyr::between() masks data.table::between()
## x dplyr::filter() masks stats::filter()
## x dplyr::first() masks data.table::first()
## x dplyr::lag() masks stats::lag()
## x dplyr::last() masks data.table::last()
## x purrr::transpose() masks data.table::transpose()
library(sp)
whisky <- fread("http://outreach.mathstat.strath.ac.uk/outreach/nessie/datasets/whiskies.txt", data.table = FALSE)
whisky$Longitude <- as.numeric(whisky$Longitude)
whisky$Latitude <- as.numeric(whisky$Latitude)
str(whisky)
## 'data.frame': 86 obs. of 17 variables:
## $ RowID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Distillery: chr "Aberfeldy" "Aberlour" "AnCnoc" "Ardbeg" ...
## $ Body : int 2 3 1 4 2 2 0 2 2 2 ...
## $ Sweetness : int 2 3 3 1 2 3 2 3 2 3 ...
## $ Smoky : int 2 1 2 4 2 1 0 1 1 2 ...
## $ Medicinal : int 0 0 0 4 0 1 0 0 0 1 ...
## $ Tobacco : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Honey : int 2 4 2 0 1 1 1 2 1 0 ...
## $ Spicy : int 1 3 0 2 1 1 1 1 0 2 ...
## $ Winey : int 2 2 0 0 1 1 0 2 0 0 ...
## $ Nutty : int 2 2 2 1 2 0 2 2 2 2 ...
## $ Malty : int 2 3 2 2 3 1 2 2 2 1 ...
## $ Fruity : int 2 3 3 1 1 1 3 2 2 2 ...
## $ Floral : int 2 2 2 0 1 2 3 1 2 1 ...
## $ Postcode : chr "\tPH15 2EB" "\tAB38 9PJ" "\tAB5 5LI" "\tPA42 7EB" ...
## $ Latitude : num 286580 326340 352960 141560 355350 ...
## $ Longitude : num 749680 842570 839320 646220 829140 ...
It looks like someone reviewed 86 whiskies, one per distillery, and recorded the scores. A dataset like this is certainly subjective and biased, which is a problem in terms of “data science”. However, drinking and reviewing whisky is always subjective, and there is nothing wrong with that, right? I don't mind, because I'm doing this analysis for fun.
First of all, I want to see where the distilleries are located.
library(maps)
##
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
##
## map
world.map <- map_data("world")
glimpse(world.map)
## Observations: 99,338
## Variables: 6
## $ long <dbl> -69.89912, -69.89571, -69.94219, -70.00415, -70.0661...
## $ lat <dbl> 12.45200, 12.42300, 12.43853, 12.50049, 12.54697, 12...
## $ group <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2...
## $ order <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 1...
## $ region <chr> "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Aruba"...
## $ subregion <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
UK.map <- world.map %>% filter(region == "UK")
whiskies.coord <- data.frame(whisky$Latitude, whisky$Longitude)
coordinates(whiskies.coord) <- ~whisky.Latitude + whisky.Longitude
proj4string(whiskies.coord) <- CRS("+init=epsg:27700")
whiskies.coord <- spTransform(whiskies.coord, CRS("+init=epsg:4326"))
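# After the transform, the first coordinate column (named whisky.Latitude,
# originally the easting) holds longitude and the second holds latitude.
whisky.map <-
  data.frame(Distillery = whisky$Distillery,
             long = whiskies.coord$whisky.Latitude,
             lat = whiskies.coord$whisky.Longitude)
UK.map %>%
  filter(subregion == "Scotland") %>%
  ggplot() +
  geom_map(map = UK.map,
           aes(x = long, y = lat, map_id = region),
           fill = "white", colour = "black") +
  coord_map() +
  # colour and alpha are fixed values, so they are set outside aes()
  geom_point(data = whisky.map,
             aes(x = long, y = lat),
             colour = "red", alpha = .9)
## Warning: Ignoring unknown aesthetics: x, y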
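The conversion from British National Grid coordinates (EPSG:27700) to longitude and latitude (EPSG:4326) can also be sketched with the sf package. This is only an illustration under a few assumptions: sf is installed, the two sample rows are copied from the str() output above, and the names grid_pts, easting, northing and whisky_map are made up for this example. It also assumes that the dataset's "Latitude" column holds eastings and its "Longitude" column holds northings, which is how the code above treats them.

```r
library(sf)

# Two sample rows taken from the str() output (Aberfeldy and Ardbeg).
# The column names here are hypothetical; in the dataset the easting is
# stored as "Latitude" and the northing as "Longitude".
grid_pts <- data.frame(
  Distillery = c("Aberfeldy", "Ardbeg"),
  easting    = c(286580, 141560),
  northing   = c(749680, 646220)
)

# Declare the points as British National Grid, then reproject to WGS84.
pts_bng   <- st_as_sf(grid_pts, coords = c("easting", "northing"), crs = 27700)
pts_wgs84 <- st_transform(pts_bng, crs = 4326)

# st_coordinates() returns a matrix with columns X (longitude) and Y (latitude).
lonlat <- st_coordinates(pts_wgs84)
whisky_map <- data.frame(
  Distillery = grid_pts$Distillery,
  long       = lonlat[, "X"],
  lat        = lonlat[, "Y"]
)
whisky_map
```

Naming the columns by what they actually contain makes it harder to mix up the x and y aesthetics when plotting.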
This report helped me a lot.
I've never tasted more than half of them, but I know some. Let's get into it.
whisky <- whisky %>% select(Distillery:Floral)
head(whisky)
##    Distillery Body Sweetness Smoky Medicinal Tobacco Honey Spicy Winey
## 1   Aberfeldy    2         2     2         0       0     2     1     2
## 2    Aberlour    3         3     1         0       0     4     3     2
## 3      AnCnoc    1         3     2         0       0     2     0     0
## 4      Ardbeg    4         1     4         4       0     0     2     0
## 5     Ardmore    2         2     2         0       0     1     1     1
## 6 ArranIsleOf    2         3     1         1       0     1     1     1
##   Nutty Malty Fruity Floral
## 1     2     2      2      2
## 2     2     3      3      2
## 3     2     2      3      2
## 4     1     2      1      0
## 5     2     3      1      1
## 6     0     1      1      2
whisky.score <- whisky %>%
  gather(key = Review.point, value = Score, Body:Floral)
head(whisky.score)
##    Distillery Review.point Score
## 1   Aberfeldy         Body     2
## 2    Aberlour         Body     3
## 3      AnCnoc         Body     1
## 4      Ardbeg         Body     4
## 5     Ardmore         Body     2
## 6 ArranIsleOf         Body     2
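In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). The same wide-to-long reshaping might look like the sketch below. It assumes a newer tidyr than the 0.7.2 loaded in this report, and it uses a small hand-typed sample (scores_wide, with values copied from the head() output above) instead of the full dataset.

```r
library(tidyr)

# Small sample in the same wide layout as `whisky`: one row per distillery,
# one column per review point. Values are copied from the head() output.
scores_wide <- data.frame(
  Distillery = c("Aberfeldy", "Aberlour"),
  Body       = c(2, 3),
  Sweetness  = c(2, 3),
  Smoky      = c(2, 1)
)

# Stack the review-point columns into name/value pairs,
# giving one row per distillery and review point.
scores_long <- pivot_longer(
  scores_wide,
  cols      = Body:Smoky,
  names_to  = "Review.point",
  values_to = "Score"
)
scores_long
```

One difference to keep in mind: gather() stacks the result column by column, while pivot_longer() goes row by row, so the rows come out in a different order even though the content is the same.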
whisky.score %>%
  ggplot(aes(x = Review.point, y = Score, fill = Review.point)) +
  geom_bar(stat = "identity") +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank()) +
  facet_wrap(~ Distillery)
I really enjoyed this analysis. It is so much fun to analyze data about something I'm interested in. I've been teaching myself R for months, and to be honest I'm a little surprised that I still love it. From now on, I want to share my R content and hopefully connect with other R lovers!
I will update this article in the near future because I really want to do a deeper analysis.
Thanks for reading my article!
Koki