Hello, everybody. My name is Koki Ando. This is my first published RMarkdown report.

I wanted to analyze data about whisky, simply because I love it. So I searched a little in Safari and found this site: Classification of whiskies

Looks interesting, so let’s get started!

Preparation

library(data.table)
library(tidyverse)
## -- Attaching packages ---------------------------------- tidyverse 1.2.1 --
## √ ggplot2 2.2.1     √ purrr   0.2.4
## √ tibble  1.3.4     √ dplyr   0.7.4
## √ tidyr   0.7.2     √ stringr 1.3.0
## √ readr   1.1.1     √ forcats 0.3.0
## -- Conflicts ------------------------------------- tidyverse_conflicts() --
## x dplyr::between()   masks data.table::between()
## x dplyr::filter()    masks stats::filter()
## x dplyr::first()     masks data.table::first()
## x dplyr::lag()       masks stats::lag()
## x dplyr::last()      masks data.table::last()
## x purrr::transpose() masks data.table::transpose()
library(sp)
whisky <- fread("http://outreach.mathstat.strath.ac.uk/outreach/nessie/datasets/whiskies.txt", data.table = FALSE)

What’s inside the dataset

whisky$Longitude <- as.numeric(whisky$Longitude)
whisky$Latitude <- as.numeric(whisky$Latitude)
str(whisky)
## 'data.frame':    86 obs. of  17 variables:
##  $ RowID     : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Distillery: chr  "Aberfeldy" "Aberlour" "AnCnoc" "Ardbeg" ...
##  $ Body      : int  2 3 1 4 2 2 0 2 2 2 ...
##  $ Sweetness : int  2 3 3 1 2 3 2 3 2 3 ...
##  $ Smoky     : int  2 1 2 4 2 1 0 1 1 2 ...
##  $ Medicinal : int  0 0 0 4 0 1 0 0 0 1 ...
##  $ Tobacco   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Honey     : int  2 4 2 0 1 1 1 2 1 0 ...
##  $ Spicy     : int  1 3 0 2 1 1 1 1 0 2 ...
##  $ Winey     : int  2 2 0 0 1 1 0 2 0 0 ...
##  $ Nutty     : int  2 2 2 1 2 0 2 2 2 2 ...
##  $ Malty     : int  2 3 2 2 3 1 2 2 2 1 ...
##  $ Fruity    : int  2 3 3 1 1 1 3 2 2 2 ...
##  $ Floral    : int  2 2 2 0 1 2 3 1 2 1 ...
##  $ Postcode  : chr  "\tPH15 2EB" "\tAB38 9PJ" "\tAB5 5LI" "\tPA42 7EB" ...
##  $ Latitude  : num  286580 326340 352960 141560 355350 ...
##  $ Longitude : num  749680 842570 839320 646220 829140 ...

It looks like someone reviewed 86 whiskies and recorded the scores. The dataset is surely subjective and biased, which is a problem from a data science point of view. But drinking and reviewing whisky is always subjective, and there is nothing wrong with that, right? I don't mind, because I'm doing this analysis for fun.

Data Analysis

First of all, I want to see where the distilleries are located.

library(maps)
## 
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
## 
##     map
world.map <- map_data("world")
glimpse(world.map)
## Observations: 99,338
## Variables: 6
## $ long      <dbl> -69.89912, -69.89571, -69.94219, -70.00415, -70.0661...
## $ lat       <dbl> 12.45200, 12.42300, 12.43853, 12.50049, 12.54697, 12...
## $ group     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2...
## $ order     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 1...
## $ region    <chr> "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Aruba"...
## $ subregion <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...

Map Visualization

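# Keep only the UK polygons from the world map
UK.map <- world.map %>% filter(region == "UK")

# The "Latitude" and "Longitude" columns are not degrees. They appear to be
# British National Grid coordinates in metres, with the easting stored in
# "Latitude" and the northing in "Longitude", so "Latitude" goes first (x).
whiskies.coord <- data.frame(whisky$Latitude, whisky$Longitude)
coordinates(whiskies.coord) <- ~whisky.Latitude + whisky.Longitude

# Declare the grid (EPSG:27700), then convert to WGS84 degrees (EPSG:4326)
proj4string(whiskies.coord) <- CRS("+init=epsg:27700")
whiskies.coord <- spTransform(whiskies.coord, CRS("+init=epsg:4326"))

# After the transform, the first coordinate is the longitude
# and the second is the latitude
whisky.map <- 
  data.frame(Distillery = whisky$Distillery,
             long = whiskies.coord$whisky.Latitude,
             lat = whiskies.coord$whisky.Longitude)

UK.map %>%
  filter(subregion == "Scotland") %>% 
  ggplot() + 
  geom_map(map = UK.map, 
           aes(x = long, y = lat, map_id = region),
           fill = "white", colour = "black") + 
  coord_map() + 
  # Constant colour and alpha go outside aes(); inside it they are treated as data
  geom_point(data = whisky.map, 
             aes(x = long, y = lat), colour = "red", alpha = .9)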
## Warning: Ignoring unknown aesthetics: x, y
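
The coordinate conversion above can also be sketched with the sf package. This is an alternative I am assuming here, not what the report itself uses (the report uses sp). The two rows are copied from the str() output earlier, and the column names easting and northing are my own labels, chosen because the values look like British National Grid coordinates rather than degrees.

```r
library(sf)

# Two rows taken from the str() output above; "easting" and "northing"
# are hypothetical labels, not column names from the dataset
grid <- data.frame(
  Distillery = c("Aberfeldy", "Aberlour"),
  easting    = c(286580, 326340),
  northing   = c(749680, 842570)
)

# Declare the points as British National Grid (EPSG:27700) ...
pts <- st_as_sf(grid, coords = c("easting", "northing"), crs = 27700)
# ... and reproject them to WGS84 longitude/latitude (EPSG:4326)
pts_wgs84 <- st_transform(pts, 4326)

# st_coordinates() returns a matrix with columns X (longitude) and Y (latitude)
lonlat <- st_coordinates(pts_wgs84)
whisky_lonlat <- data.frame(Distillery = grid$Distillery,
                            long = lonlat[, "X"],
                            lat  = lonlat[, "Y"])
whisky_lonlat
```

Naming the result columns long and lat only after the transform keeps the x/y order explicit, which makes it harder to swap the axes by accident when plotting.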

This report helped me a lot.

I've never tasted more than half of them, but I know some. Let's get into it.

Making whisky dataset tidy

whisky <- whisky %>% select(Distillery:Floral)
head(whisky)
##    Distillery Body Sweetness Smoky Medicinal Tobacco Honey Spicy Winey
## 1   Aberfeldy    2         2     2         0       0     2     1     2
## 2    Aberlour    3         3     1         0       0     4     3     2
## 3      AnCnoc    1         3     2         0       0     2     0     0
## 4      Ardbeg    4         1     4         4       0     0     2     0
## 5     Ardmore    2         2     2         0       0     1     1     1
## 6 ArranIsleOf    2         3     1         1       0     1     1     1
##   Nutty Malty Fruity Floral
## 1     2     2      2      2
## 2     2     3      3      2
## 3     2     2      3      2
## 4     1     2      1      0
## 5     2     3      1      1
## 6     0     1      1      2
whisky.score <- whisky %>% 
  gather(key = Review.point, value = Score, Body:Floral)
head(whisky.score)
##    Distillery Review.point Score
## 1   Aberfeldy         Body     2
## 2    Aberlour         Body     3
## 3      AnCnoc         Body     1
## 4      Ardbeg         Body     4
## 5     Ardmore         Body     2
## 6 ArranIsleOf         Body     2
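
The wide-to-long reshape done with gather() above can also be written with pivot_longer(), which newer versions of tidyr (1.0.0 and later) provide. This is a minimal sketch on three rows and two flavour columns copied from the head(whisky) output; whisky_small and whisky_long are names made up for this example.

```r
library(tidyr)

# Three rows copied from the head(whisky) output above, two flavour columns only
whisky_small <- data.frame(
  Distillery = c("Aberfeldy", "Aberlour", "AnCnoc"),
  Body       = c(2, 3, 1),
  Sweetness  = c(2, 3, 3)
)

# Same reshape as gather(): one row per distillery and flavour
whisky_long <- pivot_longer(whisky_small,
                            cols = Body:Sweetness,
                            names_to = "Review.point",
                            values_to = "Score")
whisky_long
```

One difference to keep in mind: gather() stacks the result column by column (all Body rows first), while pivot_longer() goes row by row (all flavours of one distillery first). The content is the same, only the row order changes.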

Data Visualization

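# One small bar chart per distillery, one bar per flavour
whisky.score %>% 
  ggplot(aes(x = Review.point, y = Score, fill = Review.point)) + 
  geom_col() +   # same as geom_bar(stat = "identity")
  # Hide the x axis; the fill legend already names each flavour
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank()) + 
  facet_wrap(~ Distillery)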

Conclusion

I really enjoyed this analysis. It is so much fun analyzing data about something I'm interested in. I've been teaching myself R for months, and to be honest I'm a little surprised that I still love it. From now on, I want to share my R content and hopefully connect with other R lovers!

I will update this article in the near future because I really want to do a deeper analysis.

Here are my GitHub and Kaggle.

Thanks for reading my article!

Koki