1 Introduction

I’ve been teaching myself data science for a few months. Last week, while I was analyzing a dataset about Scotch whisky, I came across some odd-looking location data.
In this article I will show how I converted UTM coordinate data into longitude-latitude data.

2 Preparation

library(data.table)
library(sp)
library(tidyverse)
## ── Attaching packages ────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 2.2.1.9000     ✔ purrr   0.2.4     
## ✔ tibble  1.4.2          ✔ dplyr   0.7.4     
## ✔ tidyr   0.8.0          ✔ stringr 1.3.0     
## ✔ readr   1.1.1          ✔ forcats 0.3.0
## ── Conflicts ───────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::between()   masks data.table::between()
## ✖ dplyr::filter()    masks stats::filter()
## ✖ dplyr::first()     masks data.table::first()
## ✖ dplyr::lag()       masks stats::lag()
## ✖ dplyr::last()      masks data.table::last()
## ✖ purrr::transpose() masks data.table::transpose()
## ✖ dplyr::vars()      masks ggplot2::vars()
library(raster)
## 
## Attaching package: 'raster'
## The following object is masked from 'package:dplyr':
## 
##     select
## The following object is masked from 'package:tidyr':
## 
##     extract
## The following object is masked from 'package:ggplot2':
## 
##     calc
## The following object is masked from 'package:data.table':
## 
##     shift
whisky <- fread("http://outreach.mathstat.strath.ac.uk/outreach/nessie/datasets/whiskies.txt", data.table = FALSE)

3 Data Manipulation

3.1 What’s inside?

whisky %>% str()
## 'data.frame':    86 obs. of  17 variables:
##  $ RowID     : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Distillery: chr  "Aberfeldy" "Aberlour" "AnCnoc" "Ardbeg" ...
##  $ Body      : int  2 3 1 4 2 2 0 2 2 2 ...
##  $ Sweetness : int  2 3 3 1 2 3 2 3 2 3 ...
##  $ Smoky     : int  2 1 2 4 2 1 0 1 1 2 ...
##  $ Medicinal : int  0 0 0 4 0 1 0 0 0 1 ...
##  $ Tobacco   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Honey     : int  2 4 2 0 1 1 1 2 1 0 ...
##  $ Spicy     : int  1 3 0 2 1 1 1 1 0 2 ...
##  $ Winey     : int  2 2 0 0 1 1 0 2 0 0 ...
##  $ Nutty     : int  2 2 2 1 2 0 2 2 2 2 ...
##  $ Malty     : int  2 3 2 2 3 1 2 2 2 1 ...
##  $ Fruity    : int  2 3 3 1 1 1 3 2 2 2 ...
##  $ Floral    : int  2 2 2 0 1 2 3 1 2 1 ...
##  $ Postcode  : chr  "\tPH15 2EB" "\tAB38 9PJ" "\tAB5 5LI" "\tPA42 7EB" ...
##  $ Latitude  : int  286580 326340 352960 141560 355350 194050 247670 340754 340754 270820 ...
##  $ Longitude : int  749680 842570 839320 646220 829140 649950 672610 848623 848623 885770 ...

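The dataset looks like this.
I knew the last two variables held location data, but I did not know how to deal with these numbers: they are far too large to be degrees of latitude or longitude.
Then I found that this kind of data is called “UTM location data”. Strictly speaking, the values here are eastings and northings in metres on the British National Grid (EPSG:27700, used below), which is a transverse Mercator grid like UTM but not an actual UTM zone.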
What is UTM data?
UTM convert
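A quick sanity check helped me see why these numbers cannot be degrees. The sketch below uses only base R; the helper name within_bng_extent is my own invention for illustration, and the bounds assume the 700 km x 1300 km extent of the British National Grid in metres.

```r
# Hypothetical helper: TRUE when a pair of values falls inside the
# 700 km x 1300 km extent of the British National Grid (in metres).
within_bng_extent <- function(easting, northing) {
  easting >= 0 & easting <= 700000 & northing >= 0 & northing <= 1300000
}

# First three rows from the str() output above
easting  <- c(286580, 326340, 352960)
northing <- c(749680, 842570, 839320)

# Do the values look like grid coordinates in metres?
within_bng_extent(easting, northing)

# Real latitudes in degrees would never exceed 90 in absolute value
all(abs(northing) <= 90)
```

If the first check passes and the second fails, the columns are most likely projected coordinates rather than latitude and longitude.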

3.2 Data Manipulation

Let’s get started!
First, extract the Latitude and Longitude variables and put them into a simple data frame called lat.long.df. Despite their names, Latitude holds the easting (x) and Longitude holds the northing (y).

lat.long.df <- data.frame(whisky$Latitude, whisky$Longitude) 
str(lat.long.df)
## 'data.frame':    86 obs. of  2 variables:
##  $ whisky.Latitude : int  286580 326340 352960 141560 355350 194050 247670 340754 340754 270820 ...
##  $ whisky.Longitude: int  749680 842570 839320 646220 829140 649950 672610 848623 848623 885770 ...

At this point, the dataset is a plain data frame.

coordinates(lat.long.df) <-  ~whisky.Latitude + whisky.Longitude
str(lat.long.df)
## Formal class 'SpatialPoints' [package "sp"] with 3 slots
##   ..@ coords     : num [1:86, 1:2] 286580 326340 352960 141560 355350 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:86] "1" "2" "3" "4" ...
##   .. .. ..$ : chr [1:2] "whisky.Latitude" "whisky.Longitude"
##   ..@ bbox       : num [1:2, 1:2] 126680 554260 381020 1009260
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:2] "whisky.Latitude" "whisky.Longitude"
##   .. .. ..$ : chr [1:2] "min" "max"
##   ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
##   .. .. ..@ projargs: chr NA

The coordinates() function turns the data frame into a spatial object (here, a SpatialPoints object). See ?coordinates for more information.
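To see what coordinates() does on its own, here is a minimal sketch on a two-row toy data frame. The column names easting and northing are my own choice for illustration; the values are the first two rows of the whisky data.

```r
library(sp)

# Toy data frame with the first two distillery positions
pts <- data.frame(easting  = c(286580, 326340),
                  northing = c(749680, 842570))

# The formula names the x column first, then the y column
coordinates(pts) <- ~ easting + northing

class(pts)        # the data frame has been promoted to a Spatial class
coordinates(pts)  # matrix of x/y values
bbox(pts)         # min/max of each coordinate
```

Because the data frame held nothing but the two coordinate columns, the result is a SpatialPoints object. With extra attribute columns, sp returns a SpatialPointsDataFrame instead.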

proj4string(lat.long.df)
## [1] NA

However, at this point the dataset has no CRS (coordinate reference system). Without a CRS, the numbers cannot be tied to a place on Earth.
Spatial data requires, at least, coordinates and a CRS.
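A small guard can make the missing CRS explicit before going any further. This is only a sketch: has_crs is a hypothetical helper I made up, built on the same proj4string() and CRS() calls used in this article, and the toy points are the first two rows of the whisky data.

```r
library(sp)

pts <- data.frame(easting  = c(286580, 326340),
                  northing = c(749680, 842570))
coordinates(pts) <- ~ easting + northing

# Hypothetical helper: TRUE when a Spatial object has a CRS attached
has_crs <- function(x) !is.na(proj4string(x))

has_crs(pts)                                  # checked before a CRS is assigned
proj4string(pts) <- CRS("+init=epsg:27700")   # same call as in this article
has_crs(pts)                                  # checked after a CRS is assigned
```

Newer versions of sp may print warnings about the “+init=epsg:” syntax; I assume here that it is still accepted.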

proj4string(lat.long.df) <- CRS("+init=epsg:27700")
head(lat.long.df)
## class       : SpatialPoints 
## features    : 1 
## extent      : 286580, 286580, 749680, 749680  (xmin, xmax, ymin, ymax)
## coord. ref. : +init=epsg:27700 +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs +ellps=airy +towgs84=446.448,-125.157,542.060,0.1502,0.2470,0.8421,-20.4894

According to the EPSG Geodetic Parameter Registry, the EPSG code for the British National Grid, the projected grid used in the United Kingdom, is 27700.
We attach this information to the dataset with the CRS() function.

reference1
reference2

Now the dataset has both coordinates and a CRS.
The next step is to convert it to longitude-latitude data.

dist.location <- spTransform(lat.long.df, CRS("+init=epsg:4326"))
dist.location
## class       : SpatialPoints 
## features    : 86 
## extent      : -6.359364, -2.316943, 54.85819, 58.96701  (xmin, xmax, ymin, ymax)
## coord. ref. : +init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0

EPSG:4326 is WGS84 longitude-latitude. In R, passing “+init=epsg:4326” to CRS() returns the details of that EPSG code:
“+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0”

Now we have successfully transformed the dataset.
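For comparison, the same conversion can probably be written with the sf package, which I did not use in this article. The sketch below is an assumption-laden alternative, not what I actually ran: the column names easting and northing are mine, and the two rows are taken from the whisky data.

```r
library(sf)

pts <- data.frame(Distillery = c("Aberfeldy", "Aberlour"),
                  easting    = c(286580, 326340),
                  northing   = c(749680, 842570))

# Build point geometries and declare them as British National Grid (EPSG:27700)
pts_bng <- st_as_sf(pts, coords = c("easting", "northing"), crs = 27700)

# Reproject to WGS84 longitude-latitude (EPSG:4326)
pts_wgs84 <- st_transform(pts_bng, 4326)

# st_coordinates() returns a matrix with columns X (longitude) and Y (latitude)
lonlat <- st_coordinates(pts_wgs84)
data.frame(Distillery = pts$Distillery,
           long = lonlat[, "X"],
           lat  = lonlat[, "Y"])
```

The results should agree with the spTransform() output to within a few metres; small differences can come from the datum shift method the installed PROJ version chooses.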

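# The coordinate columns keep their original names after the transform, so
# whisky.Latitude now holds longitude (x) and whisky.Longitude holds latitude (y).
# The lat and long columns below are therefore swapped; the plot compensates
# by mapping x = lat and y = long.
whisky.map <- 
  data.frame(Distillery = whisky$Distillery,
             lat = dist.location$whisky.Latitude,
             long = dist.location$whisky.Longitude)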
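The column names in the transformed object can be confusing, so here is a sketch of how the result could be labelled by position instead of by name. It mirrors the sp calls used in this article on a two-row toy example; the names pts, pts_ll and named are made up for illustration.

```r
library(sp)

pts <- data.frame(easting  = c(286580, 326340),
                  northing = c(749680, 842570))
coordinates(pts) <- ~ easting + northing
proj4string(pts) <- CRS("+init=epsg:27700")

pts_ll <- spTransform(pts, CRS("+init=epsg:4326"))

# coordinates() returns a matrix: column 1 is x (longitude), column 2 is y (latitude)
xy <- coordinates(pts_ll)
named <- data.frame(Distillery = c("Aberfeldy", "Aberlour"),
                    long = xy[, 1],
                    lat  = xy[, 2])
named
```

Picking the columns by position avoids depending on whatever names the original data frame happened to use.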

4 Quick Map Visualization

library(maps)
## 
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
## 
##     map
world.map <- map_data ("world")
UK.map <- world.map %>% filter(region == "UK")
UK.map %>%
  filter(subregion == "Scotland") %>% 
  ggplot() + 
  geom_map(map = UK.map, 
           aes(x = long, y = lat, map_id = region),
           fill="white", colour = "black") + 
  coord_map() + 
  geom_point(data = whisky.map, 
             aes(x = lat, y = long),   # lat holds longitude here (see whisky.map)
             colour = "red", alpha = .9)
## Warning: Ignoring unknown aesthetics: x, y
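The warning comes from passing x and y to geom_map(). A possible way around it, sketched below under my own assumptions, is to draw the outline with geom_polygon() and to set the point colour outside aes(). The two points are the first two rows of the converted data, with longitude and latitude named correctly.

```r
library(ggplot2)

# Outline of the UK from the maps package (map_data() needs maps installed)
uk <- map_data("world", region = "UK")

# First two converted distillery positions
pts <- data.frame(long = c(-3.850199, -3.229644),
                  lat  = c(56.62519, 57.46739))

p <- ggplot() +
  geom_polygon(data = uk,
               aes(x = long, y = lat, group = group),
               fill = "white", colour = "black") +
  geom_point(data = pts,
             aes(x = long, y = lat),
             colour = "red", alpha = 0.9) +   # fixed values go outside aes()
  coord_quickmap()

p
```

coord_quickmap() is a cheaper approximation of coord_map(), which seems good enough for a quick look at an area this small.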

5 Conclusion

whisky.map %>% head()
##    Distillery       lat     long
## 1   Aberfeldy -3.850199 56.62519
## 2    Aberlour -3.229644 57.46739
## 3      AnCnoc -2.785295 57.44175
## 4      Ardbeg -6.108503 55.64061
## 5     Ardmore -2.743629 57.35056
## 6 ArranIsleOf -5.278895 55.69915

I have only just started learning about spatial data, so there are still many things I don’t know.
Let me know what you think. Any opinions and advice are very welcome.

Finally, if you are interested, please check out my other R projects based on this dataset.
Whisky Suggest App
Whisky Data Analysis
Thanks for reading this article.
Koki