class: center, middle, inverse, title-slide

# Session 1
## S
### Julian Flowers
### 03-01-2022 (updated: 2022-01-05)

---
class: left, top

# Getting Started

* Install R from https://www.r-project.org/
* Install the latest version of the RStudio IDE<sup>1</sup> from https://www.rstudio.com/products/rstudio/download/

## Optional

* Set up a GitHub account, e.g. https://github.com/julianflowers12
* Set up an RPubs account at https://rpubs.com/users/new
* Open a browser tab with Google
* Open a browser tab with Stack Overflow

.footnote[
[1] Integrated development environment
]

---

### Power of R

---

## Some basics

- You usually need to add packages
    + `install.packages("package name")`
- First lines of code
    + `install.packages("pacman")` ## download and install a universal package manager
    + `library(pacman)` ## load into R
    + `p_load(tidyverse)` ## install and load `tidyverse` - more later

```r
install.packages("pacman", repos = "https://cran.rstudio.com")
```

```
## 
## The downloaded binary packages are in
## 	/var/folders/bk/jrqs03tx5mq9s28mhml5xzhm0000gn/T//RtmpC6NC0l/downloaded_packages
```

```r
library(pacman)
p_load(tidyverse, viridis, gganimate, tweenr)
```

---

### Key ideas

* Tidy data and data wrangling
* End-to-end
* Automation
* Reproducibility
* Open
    + Data
    + Source
    + Code
* Sharing

---

### R difficulties

* Multiple ways of achieving the same result
* Dependencies
* Learning curve

---

### Examples

- In the code chunk below:
    + We read in data from the Coronavirus Dashboard API as a csv file via `read_csv()`
    + (The dataset is daily test positivity by lower tier local authority)
    + We use the `head()` function to show the first 6 rows of `df1`
    + We use the *pipe* operator `%>%`
    + The data is a *data frame* - in this case a `tibble`

```r
df1 <- read_csv("https://api.coronavirus.data.gov.uk/v2/data?areaType=ltla&metric=uniqueCasePositivityBySpecimenDateRollingSum&format=csv", show_col_types = FALSE)

df1 %>%
  head()
```

```
## # A tibble: 6 × 5
##   areaCode  areaName             areaType date       uniqueCasePositivityBySpec…
##   <chr>     <chr>                <chr>    <date>                           <dbl>
## 1 E06000003 Redcar and Cleveland ltla     2021-12-26                        20.3
## 2 E07000040 East Devon           ltla     2021-12-26                        13.8
## 3 E07000090 Havant               ltla     2021-12-26                        21.7
## 4 E07000214 Surrey Heath         ltla     2021-12-26                        22.2
## 5 E07000229 Worthing             ltla     2021-12-26                        19  
## 6 E08000001 Bolton               ltla     2021-12-26                        30  
```

---

### Let's plot some of the data

```r
df1 %>%
  filter(str_detect(areaName, "Leeds")) %>%  ## filter row-wise; `str_detect` is a good strategy for filtering among large numbers of text categories
  ggplot(aes(date, uniqueCasePositivityBySpecimenDateRollingSum)) +
  geom_line(colour = "darkblue") +
  geom_smooth(method = "loess", span = .3) +
  labs(title = "Test positivity") +
  theme(plot.title.position = "plot")
```

```
## `geom_smooth()` using formula 'y ~ x'
```

[Figure: line plot of test positivity over time for Leeds, with a loess smooth]

---

### Further plots

---
class: left, top

### Map code

```r
library(tmap); library(sf)

s2020 <- "https://opendata.arcgis.com/datasets/69d8b52032024edf87561fb60fe07c85_0.geojson"

shp2020 <- st_read(s2020, quiet = TRUE)                 ## read the boundary file (GeoJSON)
shp2020 <- filter(shp2020, str_detect(LAD20CD, "^E"))   ## keep areas whose code starts with "E"

shp2020 <- shp2020 %>%
  left_join(df1, by = c("LAD20CD" = "areaCode"))        ## attach the positivity data by area code

## keep dates from 1 December 2021 onwards (despite the `_nov` suffix)
shp2020_nov <- filter(shp2020, date >= as.Date("2021-12-01"))

g <- ggplot(shp2020_nov) +
  geom_sf(aes(fill = uniqueCasePositivityBySpecimenDateRollingSum,
              colour = uniqueCasePositivityBySpecimenDateRollingSum)) +
  coord_sf() +
  scale_fill_viridis(direction = -1, name = "Test positivity (%)", option = "inferno") +
  scale_colour_viridis(direction = -1, name = "Test positivity (%)", option = "inferno") +
  theme_void() +
  facet_wrap(~date, ncol = 8)   ## one map per date

g
```

---

### Map

[Figure: maps of test positivity by local authority, one panel per date]

---

### Small multiples

```r
## `p` is a ggplot object created earlier (its code is not shown on these slides)
p +
  facet_wrap(~areaName, ncol = 8)
```

---
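### Small multiples: a self-contained sketch

The code that builds `p` is not shown on the slides, so here is a minimal sketch of the same idea that runs on its own. It is not the author's plot: the data frame `toy`, the plot object `p_toy` and the area names are all hypothetical, and the positivity values are made up purely for illustration.

```r
library(ggplot2)

## Toy data: 10 days for each of 3 made-up areas (NOT real positivity figures)
dates <- seq(as.Date("2021-12-01"), by = "day", length.out = 10)
areas <- c("Area A", "Area B", "Area C")

toy <- data.frame(
  date     = rep(dates, times = length(areas)),
  areaName = rep(areas, each = length(dates))
)

## Invented values: a rising trend, shifted upwards for each successive area
toy$positivity <- 10 +
  as.numeric(toy$date - min(toy$date)) +
  5 * match(toy$areaName, areas)

## Base plot: one line per area
p_toy <- ggplot(toy, aes(date, positivity, group = areaName)) +
  geom_line(colour = "darkblue") +
  labs(title = "Toy test positivity", y = "Positivity (%)")

## Small multiples: the same plot, split into one panel per area
p_toy + facet_wrap(~areaName, ncol = 3)
```

Storing the base plot in an object first means the faceting can be added, changed or removed without rebuilding the plot, which is the pattern `p + facet_wrap(...)` relies on.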