AS - Day 2

Introduction to R - Session 1

Dr. J. Kavanagh

2023-09-14

Importing Data

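The simplest way to import a CSV file into R is base R's read.csv(). The file must be in your working directory, or you can give its full path. Load the tidyverse as well, since the rest of the session uses it.

library(tidyverse)
read.csv('net-receipts-by-commodity.csv', stringsAsFactors = FALSE) -> receipts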
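To practise read.csv() without the receipts file, note that it can also parse CSV text passed through its text argument. A minimal sketch: the four rows are copied from the beer-duty outputs shown later in this session, and the names csv_text and receipts_sample are hypothetical.

```r
# Sketch: read.csv() parses CSV text directly via its `text` argument.
# Rows copied from the beer-duty outputs shown in this session.
csv_text <- "commodity_and_head_of_duty,year,net_receipts_.m
Alcohols Beer - Home,2005,361930565
Alcohols Beer - Home,2004,378081957
Alcohols Beer - Import,2005,95377167
Alcohols Beer - Import,2004,80113005"

read.csv(text = csv_text, stringsAsFactors = FALSE) -> receipts_sample

# One character column and two integer columns, as glimpse() shows for receipts
str(receipts_sample)
```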

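Using the tools from Day 1, explore the dataset: count() tallies the rows for each year, and glimpse() lists every column with its type and first few values.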

receipts %>% count(year)
##    year  n
## 1  2003 30
## 2  2004 30
## 3  2005 29
## 4  2006 30
## 5  2007 28
## 6  2008 28
## 7  2009 28
## 8  2010 35
## 9  2011 35
## 10 2012 35
## 11 2013 36
## 12 2014 36
## 13 2015 36
## 14 2016 35
## 15 2017 35
## 16 2018 36
## 17 2019 36
## 18 2020 36
glimpse(receipts)
## Rows: 594
## Columns: 3
## $ commodity_and_head_of_duty <chr> "Alcohols Beer - Import", "Alcohols Beer - …
## $ year                       <int> 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2…
## $ net_receipts_.m            <int> 183235512, 167879917, 12701490, 40340740, 2…
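As a rough consistency check, the per-year counts printed by count(year) should add up to the 594 rows that glimpse() reports. A sketch in base R, where table() plays the role of count(); the counts are copied from the output above and the variable names are hypothetical.

```r
# Per-year row counts copied from the count(year) output above
year_n <- c(30, 30, 29, 30, 28, 28, 28, 35, 35, 35,
            36, 36, 36, 35, 35, 36, 36, 36)

# Rebuild a year column with those counts, then tally it the base R way
years <- rep(2003:2020, times = year_n)
year_table <- table(years)   # base R counterpart of count(year)

length(years)                       # 594, matching "Rows: 594" from glimpse()
names(year_table)[year_table < 30]  # years with fewer than 30 rows
```

On the real data, table(receipts$year) gives the same tallies as count(year).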

Tax Receipts from Duties on Beer

receipts %>% filter(commodity_and_head_of_duty == "Alcohols Beer - Home") -> receipts_alcohol_home
receipts %>% filter(commodity_and_head_of_duty == "Alcohols Beer - Import") -> receipts_alcohol_import
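filter() with == keeps only rows whose label matches exactly, so a stray space or different capitalisation silently returns zero rows. A sketch in base R, using a hypothetical vector of labels, of checking the distinct labels before filtering:

```r
# Hypothetical labels, the last one with a trailing space
duty_labels <- c("Alcohols Beer - Home", "Alcohols Beer - Import",
                 "Alcohols Beer - Home ")

unique(duty_labels)   # inspect the distinct labels first

# Exact matching: the label with the trailing space is not counted
n_exact <- sum(duty_labels == "Alcohols Beer - Home")
# Trimming surrounding whitespace first catches both rows
n_trimmed <- sum(trimws(duty_labels) == "Alcohols Beer - Home")
c(n_exact, n_trimmed)
```

On the real data, unique(receipts$commodity_and_head_of_duty) lists the exact strings to copy into filter().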

Plot of Domestic Beer Tax Receipts

receipts_alcohol_home %>% ggplot(aes(x=year, y=net_receipts_.m)) + geom_line()

Simplify the Column Names

colnames(receipts_alcohol_home) <- c("Type", "Year", "Tax")

colnames(receipts_alcohol_import) <- c("Type", "Year", "Tax")
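colnames() <- renames by position, so the new names must follow the existing column order. A sketch on hypothetical data frames (values from this session), with a by-name alternative; dplyr's rename(Tax = net_receipts_.m) is the tidyverse counterpart.

```r
# Hypothetical data frame with the original column names
beer_home <- data.frame(
  commodity_and_head_of_duty = c("Alcohols Beer - Home", "Alcohols Beer - Home"),
  year = c(2005L, 2004L),
  net_receipts_.m = c(361930565L, 378081957L)
)

# Positional renaming, as in the session: names must follow column order
colnames(beer_home) <- c("Type", "Year", "Tax")

# Renaming by name still works if the column order changes
beer_one <- data.frame(year = 2004L, net_receipts_.m = 378081957L)
names(beer_one)[names(beer_one) == "net_receipts_.m"] <- "Tax"
names(beer_one)
```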

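Class Exercise

Combine the home and import beer data into a single dataframe and plot both series on the same chart.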

Solution

rbind(receipts_alcohol_home, receipts_alcohol_import) -> receipts_alcohol_unified

receipts_alcohol_unified %>% ggplot(aes(x=Year, y=Tax, colour=Type, group=Type)) + geom_line()
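The solution works because rbind() matches data frame columns by name, which is why both dataframes were renamed first. A sketch with hypothetical home_df and import_df built from values shown in this session:

```r
# Two small data frames with identical column names
home_df   <- data.frame(Type = "Alcohols Beer - Home", Year = c(2005L, 2004L),
                        Tax = c(361930565L, 378081957L))
import_df <- data.frame(Type = "Alcohols Beer - Import", Year = c(2005L, 2004L),
                        Tax = c(95377167L, 80113005L))

# rbind() stacks the rows of both inputs
unified <- rbind(home_df, import_df)
nrow(unified)   # 4

# With a mismatched column name, rbind() stops with an error
import_bad <- setNames(import_df, c("Type", "Year", "Receipts"))
bind_result <- tryCatch(rbind(home_df, import_bad), error = function(e) e)
conditionMessage(bind_result)
```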

Re-check the Data

receipts_alcohol_home %>% tail()
##                    Type Year       Tax
## 12 Alcohols Beer - Home 2009 278308367
## 13 Alcohols Beer - Home 2008 309303625
## 14 Alcohols Beer - Home 2007 306918459
## 15 Alcohols Beer - Home 2006 349185923
## 16 Alcohols Beer - Home 2005 361930565
## 17 Alcohols Beer - Home 2004 378081957
receipts_alcohol_import %>% tail()
##                      Type Year       Tax
## 14 Alcohols Beer - Import 2007 157883543
## 15 Alcohols Beer - Import 2006 111507924
## 16 Alcohols Beer - Import 2005  95377167
## 17 Alcohols Beer - Import 2004  80113005
## 18 Alcohols Beer - Import 2003  82770969
## 19 Alcohols Beer - Import 2003 372619049
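Rather than spotting the repeated year by eye in tail(), duplicated() and anyDuplicated() flag it directly. A sketch on the import years as printed in this session (the name import_years is hypothetical):

```r
# Years from receipts_alcohol_import, in the order printed in this session
import_years <- c(2020:2003, 2003)

anyDuplicated(import_years)             # index of the first repeat: 19 (0 if none)
import_years[duplicated(import_years)]  # the repeated value: 2003
```

On the data frame itself, anyDuplicated(receipts_alcohol_import$Year) performs the same check.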

Fixing the Data

In the receipts_alcohol_import dataframe, the data creators entered the year 2003 twice: the last row (row 19) should be 2002. We therefore need to fix the data.

# This shows all the years
receipts_alcohol_import$Year
##  [1] 2020 2019 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006
## [16] 2005 2004 2003 2003
# Square brackets [] select elements by position; here, the 19th year
receipts_alcohol_import$Year[19]
## [1] 2003
# Assign the corrected year to that position
receipts_alcohol_import$Year[19] <- 2002
# Check the result
receipts_alcohol_import$Year
##  [1] 2020 2019 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006
## [16] 2005 2004 2003 2002
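Hard-coding position 19 only works for this exact file. A sketch of a position-free fix, assuming (as in this session) that exactly one year is repeated and that the later copy is the one that should be 2002; fixed_years is a hypothetical name:

```r
# Years as printed above, before the fix
fixed_years <- c(2020:2003, 2003)

# duplicated() marks the later copy, so only that entry is changed
fixed_years[duplicated(fixed_years)] <- 2002

# Check: no repeats, and each year is one less than the one before
stopifnot(!anyDuplicated(fixed_years))
all(diff(fixed_years) == -1)
```

On the data frame, the same idea reads receipts_alcohol_import$Year[duplicated(receipts_alcohol_import$Year)] <- 2002.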

Exploring Spatial Data & Visualisations

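We will reuse the tidyverse (in particular ggplot2) and add the HistData package, which contains the data behind Charles Minard's map of Napoleon’s 1812 Russian campaign, in order to reproduce that map. The gridExtra package is used at the end to stack the two plots.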

install.packages(c("tidyverse", "ggthemes", "HistData", "lubridate", "gridExtra"))
library(tidyverse); library(HistData); library(gridExtra)
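Re-running install.packages() reinstalls packages you already have. A sketch of a helper that installs only the missing ones; the function names missing_packages() and install_if_missing() are hypothetical, while requireNamespace() and install.packages() are base R.

```r
# Hypothetical helper: packages from `pkgs` that are not installed.
# requireNamespace() returns FALSE, without an error, when a package is absent.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical wrapper: call install.packages() only for what is missing
install_if_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) install.packages(to_install)
  invisible(to_install)
}

# Usage:
# install_if_missing(c("tidyverse", "ggthemes", "HistData", "lubridate", "gridExtra"))
```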

First, load the Minard datasets from the HistData package

data("Minard.troops", "Minard.cities", "Minard.temp")
Minard.troops
##    long  lat survivors direction group
## 1  24.0 54.9    340000         A     1
## 2  24.5 55.0    340000         A     1
## 3  25.5 54.5    340000         A     1
## 4  26.0 54.7    320000         A     1
## 5  27.0 54.8    300000         A     1
## 6  28.0 54.9    280000         A     1
## 7  28.5 55.0    240000         A     1
## 8  29.0 55.1    210000         A     1
## 9  30.0 55.2    180000         A     1
## 10 30.3 55.3    175000         A     1
## 11 32.0 54.8    145000         A     1
## 12 33.2 54.9    140000         A     1
## 13 34.4 55.5    127100         A     1
## 14 35.5 55.4    100000         A     1
## 15 36.0 55.5    100000         A     1
## 16 37.6 55.8    100000         A     1
## 17 37.7 55.7    100000         R     1
## 18 37.5 55.7     98000         R     1
## 19 37.0 55.0     97000         R     1
## 20 36.8 55.0     96000         R     1
## 21 35.4 55.3     87000         R     1
## 22 34.3 55.2     55000         R     1
## 23 33.3 54.8     37000         R     1
## 24 32.0 54.6     24000         R     1
## 25 30.4 54.4     20000         R     1
## 26 29.2 54.3     20000         R     1
## 27 28.5 54.2     20000         R     1
## 28 28.3 54.3     20000         R     1
## 29 27.5 54.5     20000         R     1
## 30 26.8 54.3     12000         R     1
## 31 26.4 54.4     14000         R     1
## 32 25.0 54.4      8000         R     1
## 33 24.4 54.4      4000         R     1
## 34 24.2 54.4      4000         R     1
## 35 24.1 54.4      4000         R     1
## 36 24.0 55.1     60000         A     2
## 37 24.5 55.2     60000         A     2
## 38 25.5 54.7     60000         A     2
## 39 26.6 55.7     40000         A     2
## 40 27.4 55.6     33000         A     2
## 41 28.7 55.5     33000         A     2
## 42 28.7 55.5     33000         R     2
## 43 29.2 54.2     30000         R     2
## 44 28.5 54.1     30000         R     2
## 45 28.3 54.2     28000         R     2
## 46 24.0 55.2     22000         A     3
## 47 24.5 55.3     22000         A     3
## 48 24.6 55.8      6000         A     3
## 49 24.6 55.8      6000         R     3
## 50 24.2 54.4      6000         R     3
## 51 24.1 54.4      6000         R     3

Step 1

## plot path of troops, and another layer for city names
plot_troops <- ggplot(Minard.troops, aes(long, lat)) +
  geom_path(aes(linewidth = survivors, colour = direction, group = group),
            lineend = "round", linejoin = "round")
plot_troops
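group = group tells geom_path() to draw each of the three army columns as its own line; without it, the rows would be joined in order as one path. The same partition can be made explicit in base R with split(). A sketch on the survivors and group columns copied from the printout above (the names paths and path_ends are hypothetical):

```r
# survivors and group columns copied from the Minard.troops printout
survivors <- c(340000, 340000, 340000, 320000, 300000, 280000, 240000, 210000,
               180000, 175000, 145000, 140000, 127100, 100000, 100000, 100000,
               100000, 98000, 97000, 96000, 87000, 55000, 37000, 24000, 20000,
               20000, 20000, 20000, 20000, 12000, 14000, 8000, 4000, 4000, 4000,
               60000, 60000, 60000, 40000, 33000, 33000, 33000, 30000, 30000, 28000,
               22000, 22000, 6000, 6000, 6000, 6000)
group <- rep(1:3, times = c(35, 10, 6))  # rows 1-35, 36-45, 46-51

# One vector per group, i.e. one path per army column
paths <- split(survivors, group)

# First and last recorded survivor counts of each path
path_ends <- sapply(paths, function(s) c(start = s[1], end = s[length(s)]))
path_ends
```

Group 1, the main army, falls from 340,000 to 4,000 along its path.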

Step 2

## Add a text layer for the city names (it inherits the long/lat mapping)
plot_cities <- geom_text(aes(label = city), size = 4, data = Minard.cities)

# Break points for the survivors legend: 100,000, 200,000 and 300,000
breaks <- c(1, 2, 3) * 10^5
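The legend labels in Step 3 come from scales::comma(breaks), which adds thousands separators. A base R sketch producing the same strings, shown only to illustrate what the labels look like (breaks_labels is a hypothetical name):

```r
breaks <- c(1, 2, 3) * 10^5

# Fixed notation with thousands separators, instead of the default 1e+05
breaks_labels <- format(breaks, big.mark = ",", scientific = FALSE)
breaks_labels
```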

Step 3

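# Combine the troop paths and city labels, then add scales, labels and a theme
plot_minard <- plot_troops + plot_cities +
  # survivors is mapped to linewidth in Step 1, so it needs a linewidth scale
  scale_linewidth("Survivors", range = c(1, 10),
                  breaks = breaks, labels = scales::comma(breaks)) +
  scale_color_manual("Direction",
                     values = c("grey50", "red"),
                     labels=c("Advance", "Retreat")) +
  # Longitude limits set explicitly to match the temperature plot
  coord_cartesian(xlim = c(24, 38)) +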
  xlab(NULL) +
  ylab("Latitude") +
  ggtitle("Napoleon's March on Moscow") +
  theme_bw() +
  theme(legend.position=c(.8, .2), legend.box="horizontal")
plot_minard
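With an unnamed values vector, scale_color_manual() assigns the colours to the direction levels in order (A, then R). ggplot2 also accepts a named values vector, which ties each colour to a level explicitly. A base R sketch of that name-based lookup; direction_colours is a hypothetical name.

```r
# Hypothetical named palette: names are the levels of the direction column
direction_colours <- c(A = "grey50", R = "red")

# Name-based lookup returns the right colour for each row, whatever the order
row_colours <- unname(direction_colours[c("R", "A", "R")])
row_colours

# In ggplot2 the same vector could be passed as:
# scale_color_manual("Direction", values = direction_colours,
#                    breaks = c("A", "R"), labels = c("Advance", "Retreat"))
```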

Step 4

## plot temperature vs. longitude, with labels for dates
plot_temp <- ggplot(Minard.temp, aes(long, temp)) +
  # linewidth, not size, sets line thickness from ggplot2 3.4.0 onwards
  geom_path(color="grey", linewidth=1.5) +
  geom_point(size=2) +
  geom_text(aes(label=date)) +
  xlab("Longitude") + ylab("Temperature") +
  coord_cartesian(xlim = c(24, 38)) +
  theme_bw()
plot_temp
## Warning: Removed 1 rows containing missing values (`geom_text()`).
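Minard's original chart gives temperatures in degrees Réaumur, on which water freezes at 0 and boils at 80. Assuming the temp column keeps those units (check ?Minard), a sketch of converting to Celsius; reaumur_to_celsius() is a hypothetical helper.

```r
# 80 degrees Reaumur = 100 degrees Celsius, so multiply by 100/80 = 5/4
reaumur_to_celsius <- function(r) r * 5 / 4

reaumur_to_celsius(c(0, -10, -30))   # 0, -12.5 and -37.5 degrees Celsius
```

Applied to the data, this could feed a hypothetical new column, e.g. Minard.temp$temp_c <- reaumur_to_celsius(Minard.temp$temp).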

Step 5

gridExtra::grid.arrange(plot_minard, plot_temp, nrow=2, heights=c(3,1))
## Warning: Removed 1 rows containing missing values (`geom_text()`).
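In grid.arrange(), plain numeric heights are treated as relative weights, so heights = c(3, 1) splits the page 3:1 between the map and the temperature plot. The arithmetic, as a small sketch (panel_heights is a hypothetical name):

```r
panel_heights <- c(3, 1)

# Share of the total height given to each row of the layout
panel_heights / sum(panel_heights)   # map 0.75, temperature plot 0.25
```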