Use ADMINOH for timesheet

Artwork by @allison_horst

Introduction

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown, see http://rmarkdown.rstudio.com. For more on how to mix code chunks with narrative text, see https://kbroman.org/knitr_knutshell/pages/Rmarkdown.html.

Today won’t be a full overview of R; there is simply too much to cover. Whether you have never been introduced to R or are already proficient in other languages, my goal is to provide enough information to entice you to explore its capabilities on your own and see how it can fit into your workflows or even power entire projects. A smaller goal is to show you some geospatial workflows and tools that do NOT rely on a GUI like ArcGIS Pro.

I’ll provide some links, but as with most things, a quick Google search will turn up a wealth of information, tools, and code samples!

Enough basics to hit the ground running

R is an open source language and environment for statistical computing and for well-designed, publication-quality plots and graphics. This is its unparalleled strength! It was built for data analysis, so it can be a bit more intuitive when it comes to bringing in data, cleaning and analyzing it, and running statistical methods and diagnostics.

The term “environment” is intended to characterize it as a fully planned and coherent system, rather than a combination of segmented, inflexible tools. https://www.r-project.org/about.html

It’s at the forefront of academia and it’s growing every day!

What you need to get started: a quick download of R and RStudio (an IDE), https://posit.co/download/rstudio-desktop/

Let me show you RStudio briefly now, then we will come back to it later…

Data Cleaning and Processing

R excels in data manipulation and cleaning, crucial for preparing geospatial data. The dplyr and tidyr packages streamline data wrangling tasks, making it easier to manage large and complex spatial datasets.

As in Python, you load the packages you plan to use at the top of your script, here with library(). A package must be installed once, with install.packages(), before it can be loaded for the first time.

library(tidycensus)
library(sf)
library(ggplot2)
library(dplyr)
library(tidyr)
library(leaflet)
library(lubridate)
library(tidyverse)
library(plotly)
library(mapview)
library(tigris)
library(tmap)
library(RColorBrewer)
library(classInt)
tmap_mode("view") # global setting: interactive mode for the tmap maps later on
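Installation is a one-time step per machine, while library() runs in every session. As a minimal sketch (the helper name missing_packages and the shortened package list are illustrative, not part of the original workflow), you can check which packages still need installing before you try to load them:

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed yet
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# A few of the packages used in this document (shortened list for illustration)
pkgs <- c("sf", "dplyr", "tidyr", "lubridate")
to_install <- missing_packages(pkgs)

if (length(to_install) > 0) {
  # The install command is only printed here, not executed; run it once yourself
  message("Run once: install.packages(c(",
          paste0('"', to_install, '"', collapse = ", "), "))")
}
```

Printing the command instead of running it keeps the document from downloading packages every time it is knitted.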

Tip: You can use the “here” package to build paths relative to your project’s top-level directory; for more info visit https://here.r-lib.org/. Otherwise, on Windows, escape each backslash in a path by doubling it. For this example, I am using a hardcoded path.
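As a small sketch of the alternatives to a hardcoded Windows path (the "data" folder name is hypothetical, chosen only for illustration), base R can build a relative path with file.path(), and the here package, if it is installed, can anchor the same path at the project root:

```r
# Base R: file.path() joins path parts with "/", which R also accepts on Windows,
# so no doubled backslashes are needed ("data" is a hypothetical folder name)
relative_path <- file.path("data", "2021-2024_SFD_Response_Time_Data.csv")

# With the here package (only if installed), the same relative path is
# anchored at the project's top-level directory
if (requireNamespace("here", quietly = TRUE)) {
  project_path <- here::here("data", "2021-2024_SFD_Response_Time_Data.csv")
}

relative_path
```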

Using R reminded me a lot of FME:

  • you start by reading in your data

  • you pipe the data along through each step of your workflow

  • the functions are named intuitively for the process I was trying to carry out

  • while I am working, troubleshooting, and building my analysis, I can interact with my data at any step
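To make the piping idea concrete, here is a small sketch on made-up data (the calls data frame and its values are invented for illustration). It uses base R's native pipe |>, which assumes R 4.1 or later; the %>% pipe used elsewhere in this document chains steps in the same way.

```r
# Toy data standing in for a table of 911 calls (values are made up)
calls <- data.frame(
  incident = c("Aid Response", "Car Fire", "Medic Response", "Aid Response"),
  minutes  = c(4.5, 7.0, 6.0, 5.5)
)

# Each step receives the result of the previous one,
# much like transformers chained together in an FME workspace
mean_aid_seconds <- calls |>
  subset(incident == "Aid Response") |>   # keep only aid responses
  transform(seconds = minutes * 60) |>    # add a derived column
  with(mean(seconds))                     # summarize to a single number

mean_aid_seconds
```

Because every intermediate result is an ordinary object, you can stop the chain after any step and inspect the data at that point.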

# A snippet of my cleaning and wrangling of Seattle Fire Department 911 calls, used to analyze response times of medical calls
# The data came from an open data request and needs to be made analysis-ready

# read in csv
sfdcalls <- read.csv("C:\\Users\\DMesler\\OneDrive - TRC\\Documents\\analyst_alliance\\R presentation\\2021-2024_SFD_Response_Time_Data.csv")

# cleaning
cleancalls <- sfdcalls %>%
  mutate(timeassigned = mdy_hm(Time_First_Unit_Assigned), # lubridate: parse text into date-times
         timearrive = mdy_hm(Time_First_Unit_Arrived),
         responsedate = mdy_hm(Response_Date),
         responseyear = year(responsedate)) %>%
  filter(Latitude != 0 & Longitude != 0) %>% # drop records with zero coordinates
  mutate(Latitude = ifelse(nchar(Latitude) == 9, floor(Latitude / 10), Latitude), # drop the extra trailing digit from 9-character latitudes
         lat = as.numeric(Latitude) * 0.000001, # numeric, scaled to decimal degrees
         long = as.numeric(Longitude) * -.000001) %>% # numeric, scaled to decimal degrees and negated (western hemisphere)
  st_as_sf(coords = c("long", "lat"), crs = 4326) %>% # use sf to convert to point geometry (WGS 84)
  dplyr::select(-Time_First_Unit_Assigned, -Time_First_Unit_Arrived, -Response_Date, -Time_PhonePickUp) # remove fields I don't want

# examine unique incident types
# I have way too many incident types and not all of them are medical
distinctcalltype <- unique(cleancalls$Incident_Type)
distinctcalltype
##   [1] "Aid Response"                  "Advised Incident"             
##   [3] "Medic Response"                "Trans to SPD"                 
##   [5] "Scenes Of Violence 7"          "Illegal Burn"                 
##   [7] "2RED - 1 + 1"                  "Aid Response Yellow"          
##   [9] "Nurseline/AMR"                 "Automatic Fire Alarm Resd"    
##  [11] "Fuel Spill"                    "Rubbish Fire"                 
##  [13] "Auto Fire Alarm"               "1RED 1 Unit"                  
##  [15] "Low Acuity Referral"           "AFA4 - Auto Alarm 2 + 1 + 1"  
##  [17] "MVI - Motor Vehicle Incident"  "Medic Response, Overdose"     
##  [19] "Automatic Medical Alarm"       "Investigate Out Of Service"   
##  [21] "Rescue Elevator"               "Ladder Code Yellow"           
##  [23] "Low Acuity Response"           "Dumpster Fire"                
##  [25] "Brush Fire"                    "Medic Response, 7 per Rule"   
##  [27] "EVENT - Special Event"         "Trans to AMR"                 
##  [29] "4RED - 2 + 1 + 1"              "Automatic Fire Alarm False"   
##  [31] "Mutual Aid, Medic"             "Furnace Problem"              
##  [33] "Spill, Non-Hazmat"             "Hang-Up, Aid"                 
##  [35] "Rescue Extrication"            "Activated CO Detector"        
##  [37] "Encampment Fire"               "Medic Response, 6 per Rule"   
##  [39] "Alarm Bell"                    "Electrical Problem"           
##  [41] "SPD Stand By"                  "Poison Control"               
##  [43] "BC Medic Response, 6 per rule" "Car Fire"                     
##  [45] "Encampment Aid"                "Scenes Of Violence Aid"       
##  [47] "Water Job Minor"               "Wires Down"                   
##  [49] "Water Rescue Response"         "Marine Service Response"      
##  [51] "Assist SPD"                    "Engine Code Yellow"           
##  [53] "Aid Response Freeway"          "BC Aid Response"              
##  [55] "Car Fire Freeway"              "Fire in Building"             
##  [57] "Natural Gas Leak Major"        "Bark Fire"                    
##  [59] "MVI Freeway"                   "Food On The Stove"            
##  [61] "Rescue Lock In/Out"            "Tunnel Standby"               
##  [63] "Tunnel Aid"                    "Engine Code Red"              
##  [65] "Crisis Center"                 "TEST - MIS TEST"              
##  [67] "Scenes Of Violence Major"      "Water Job Major"              
##  [69] ""                              "Triaged Incident"             
##  [71] "Single Medic Unit"             "Medic Response, Overdose 14"  
##  [73] "HAZADV - Hazmat Advised"       "Marine Fire On Shore"         
##  [75] "LINK - Link Control Center"    "Tranformer Fire"              
##  [77] "Chimney Fire"                  "Fire Response Freeway"        
##  [79] "BC Medic Response"             "MVI Medic"                    
##  [81] "MVI Freeway Medic"             "Scenes Of Violence 14"        
##  [83] "Ladder Code Red"               "Natural Gas Odor"             
##  [85] "Natural Gas Leak"              "Unk Odor"                     
##  [87] "Shed Fire"                     "Fire In A Highrise"           
##  [89] "Mutual Aid, Ladder"            "Hazardous Mat, Spill-Leak"    
##  [91] "Dumpster Fire W/Exp."          "Rescue Standby"               
##  [93] "Vault Fire (Electrical)"       "Mutual Aid, Engine"           
##  [95] "Vessel Sinking On Shore"       "3RED - 1 +1 + 1"              
##  [97] "Rescue Rope"                   "Car Fire W/Exp."              
##  [99] "Fast Back Up"                  "Food On The Stove Out"        
## [101] "Hang-Up, Fire"                 "FIREWATCH"                    
## [103] "Garage Fire"                   "Brush Fire Freeway"           
## [105] "COMED Poss Patient"            "RMC Chief"                    
## [107] "Help the Fire Fighter"         "Mutual Aid, Task Force"       
## [109] "HazMat Reduced"                "Mutual Aid, Marine"           
## [111] "Rescue Trench"                 "Medic Response Freeway"       
## [113] "Mutual Aid, Aid"               "Water Rescue Standby"         
## [115] "Mutual Aid, Hazmat"            "Multiple Medic Resp 14 Per"   
## [117] "AFAH - Auto Alarm Hazmat"      "Vault Advised"                
## [119] "Public Assembly Assist SPD"    "Rescue Heavy Major"           
## [121] "Multiple Casualty Incident"    "Mutual Aid, Adv. Life"        
## [123] "Vessel Sinking On Water"       "Brush Fire W/Exp."            
## [125] "Explosion Minor"               "Mutual Aid, Strike Eng."      
## [127] "Marine Fire On Water"          "Brush Fire Major"             
## [129] "Encampment Medic"              "HazMat MCI"                   
## [131] "Tunnel MVI"                    "Rescue Confined Space"        
## [133] "AFAHI - Auto Alarm High Rise"  "Tunnel Fire"                  
## [135] "Water Rescue Response Major"   "Hazardous Material w/Fire"    
## [137] "Referral To Agency"            "Scenes Of Violence MCI"       
## [139] "Tunnel North Ops Bldg"         "Tunnel Rescue Standby"        
## [141] "Train Derailment wFireHzmt"    "Testing Only"
# store keywords that identify the medical call types I want for my analysis
medtypes <- c("Encampment Aid", "Hang-up","Aid Response", "BC Aid Response", "Yellow", "Medical", "Medic", "Aid Response Freeway")

# build the medical call dataset
medcleancalls <- cleancalls %>% # starting with the cleaned dataset
  filter(grepl(paste(medtypes, collapse = "|"), Incident_Type)) %>% # keep rows whose Incident_Type matches any keyword (matching is case-sensitive)
  mutate(responsetime = difftime(timearrive, timeassigned, units = "mins"), # response time in minutes, with units set explicitly rather than auto-chosen
         num_responsetime = as.numeric(responsetime)) %>% # plain numeric minutes, needed for statistics and the Moran's I statistic
  filter(timearrive != 0, responsetime != 0)

#examine unique med incident types
distinctmedtypes <- unique(medcleancalls$Incident_Type)
distinctmedtypes
##  [1] "Aid Response"                  "Medic Response"               
##  [3] "Aid Response Yellow"           "Medic Response, Overdose"     
##  [5] "Automatic Medical Alarm"       "Ladder Code Yellow"           
##  [7] "Medic Response, 7 per Rule"    "Mutual Aid, Medic"            
##  [9] "Medic Response, 6 per Rule"    "BC Medic Response, 6 per rule"
## [11] "Encampment Aid"                "Engine Code Yellow"           
## [13] "Aid Response Freeway"          "BC Aid Response"              
## [15] "Single Medic Unit"             "Medic Response, Overdose 14"  
## [17] "BC Medic Response"             "MVI Medic"                    
## [19] "MVI Freeway Medic"             "Medic Response Freeway"       
## [21] "Multiple Medic Resp 14 Per"    "Encampment Medic"
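The keyword filter above collapses the keywords into a single regular expression joined with | (OR) and keeps any incident type that matches. A minimal sketch on a handful of incident types taken from the listing shows how this behaves. Note that grepl() is case-sensitive by default, which would explain why "Hang-Up, Aid" is absent from the medical list even though "Hang-up" is one of the keywords. The ignore.case variant is shown only for comparison; the original analysis does not use it.

```r
# A few incident types copied from the full listing
incident_types <- c("Aid Response", "Hang-Up, Aid", "Car Fire",
                    "Engine Code Yellow", "MVI Medic")

# A subset of the keywords, collapsed into one pattern: "Hang-up|Aid Response|Yellow|Medic"
keywords <- c("Hang-up", "Aid Response", "Yellow", "Medic")
pattern  <- paste(keywords, collapse = "|")

# Default matching is case-sensitive, so "Hang-up" does not match "Hang-Up, Aid"
matched_default <- incident_types[grepl(pattern, incident_types)]

# For comparison only: case-insensitive matching also picks up "Hang-Up, Aid"
matched_nocase <- incident_types[grepl(pattern, incident_types, ignore.case = TRUE)]

matched_default
matched_nocase
```

Broad keywords such as "Yellow" also pull in types like "Engine Code Yellow", so it is worth checking the distinct values after filtering, as done above.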

Spatial handling

R has a robust ecosystem of packages designed specifically for spatial analysis, such as sf and terra, which succeed the older sp, raster, and rgdal packages (rgdal was retired from CRAN in 2023). These packages offer extensive functions for handling spatial data, performing complex spatial operations, and integrating with other GIS software and formats.

Below are some examples of sf usage. sf implements the simple features data model for vector data (points, lines, and polygons) in R.

sf
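Distance and area calculations need a projected coordinate system, and the code in this section uses EPSG 32610 (WGS 84 / UTM zone 10N). As a rough sketch of where that number comes from, the helper below (utm_epsg is a hypothetical name, not an sf function) derives the UTM zone from a longitude and builds the matching EPSG code. It assumes longitudes in the range -180 to 180 (exclusive of 180) and ignores the special-case zones around Norway and Svalbard.

```r
# Hypothetical helper: EPSG code of the WGS 84 / UTM zone containing a point
utm_epsg <- function(lon, lat) {
  zone <- floor((lon + 180) / 6) + 1        # UTM zones are 6 degrees wide, starting at -180
  base <- ifelse(lat >= 0, 32600, 32700)    # 326xx = northern, 327xx = southern hemisphere
  base + zone
}

# Approximate coordinates of Seattle
utm_epsg(-122.33, 47.61)   # 32610
```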

# Download block group geometry from tigris to use for spatial operations,
# then transform to a suitable projected coordinate system (PCS).
# For measuring distance here, that is a UTM zone: EPSG 32610, WGS 84 / UTM zone 10N

# GEOIDs of the far-eastern block groups that don't participate in the dataset
exclude <- c("530330328003", "530330327063", "530330315021", "530330328002", "530330315012", "530330328001")

# download block group geometry
king_block_groups <- tigris::block_groups("WA", "King") %>% # download geometry using tigris
  tigris::erase_water() %>% # remove water bodies
  sf::st_make_valid() %>% # repair invalid geometries
  sf::st_transform(crs = 32610) %>% # transform to projected coordinate system
  dplyr::filter(!GEOID %in% exclude) # drop the excluded block groups
# AGGREGATE RAW POINTS TO POLYGON AREAS
# Methodology pulled from chapter 5 from lesson 4 of https://bookdown.org/mcwimberly/gdswr-book/vector-geospatial-data.html#practice-4
# Join to get GEOID, then calculate the number of calls per block group (call density, used to scale the means) and the mean response time per block group
# Join back to geometry to map

# first, drop non-positive response times and transform to the same CRS as the block groups
medcleancalls_trans <- medcleancalls %>%
  filter(num_responsetime > 0) %>% # get rid of zero or negative response times
  st_transform(., st_crs(king_block_groups))

# spatial join, so each call inherits the attributes of the block group it falls in
med_call_intersect <- st_join(medcleancalls_trans, king_block_groups, left = FALSE)

# calculate mean response time per block group, all medical calls
med_call_drop <- st_drop_geometry(med_call_intersect)

med_call_time_mean <- med_call_drop %>%
  group_by(GEOID) %>%
  summarize(mean_responsetime = mean(num_responsetime),
            n = n(), # number of calls in the block group
            sd_responsetime = sd(num_responsetime)) %>% # standard deviation
  mutate(cv_responsetime = sd_responsetime / mean_responsetime) # coefficient of variation

# join back to geometry and calculate area for the density calculation, all medical calls
med_call_mean_map <- king_block_groups %>% # starting with block group geometry
  left_join(med_call_time_mean, by = "GEOID") %>%
  replace(is.na(.), 0) %>% # NAs (block groups with no calls) become 0 for later calculations
  mutate(area = sf::st_area(king_block_groups), # area in square meters (the CRS units)
         calldens = 10^6 * n / area) # call density in calls per sq km (10^6 converts sq m to sq km)
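The aggregation above boils down to three numbers per block group: the mean response time, its standard deviation, and the coefficient of variation (standard deviation divided by mean), plus a call density scaled from square meters to square kilometers. The sketch below reproduces that arithmetic in base R on invented data; the GEOIDs, response times, and areas are all made up for illustration.

```r
# Invented calls: block group "A" has three calls, "B" has two
calls <- data.frame(
  GEOID   = c("A", "A", "A", "B", "B"),
  minutes = c(4, 6, 8, 5, 5)
)

# Per-group count, mean, and standard deviation
stats <- do.call(rbind, lapply(split(calls, calls$GEOID), function(g) {
  data.frame(GEOID = g$GEOID[1],
             n = nrow(g),
             mean_min = mean(g$minutes),
             sd_min = sd(g$minutes))
}))

# Coefficient of variation: spread relative to the mean
stats$cv <- stats$sd_min / stats$mean_min

# Invented block group areas in square meters; divide by 10^6 to get sq km
area_m2 <- c(A = 2e6, B = 5e5)
stats$calls_per_km2 <- unname(stats$n / (area_m2[stats$GEOID] / 1e6))

stats
```

A block group with a low mean but a high coefficient of variation has inconsistent response times, which the mean alone would hide.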

Visualizations

Static maps and plots are out, interactive and web-based are in!

  • mapview

  • leaflet

  • tmap

  • build interactive web apps with shiny

In this section I am going to show you how to interact with your data via a quick map and then make a sharable map with leaflet. Later in the presentation, I will show you how to build an interactive map with tmap.

Mapview, the best of both worlds!

This is one of the main reasons why I wanted to do an R presentation. It is a game changer in the analysis and troubleshooting phases of work.

#quick clean and transformation
med_map <- med_call_mean_map %>% sf::st_as_sf() %>% #converting to sf object
  filter(mean_responsetime != 0) %>% st_transform(crs= 4326) # get rid of zeros and use geographic coord sys for leaflet

mapview(med_map, zcol = "mean_responsetime") #one line of code for an interactive map

Leaflet

# map response time based on polygon areas + call density 

#MEDICAL leaflet MAP


palette <- brewer.pal(6, "Greens") #https://r-graph-gallery.com/38-rcolorbrewers-palettes.html

max_width <- 50

# Classify mean response times using Jenks natural breaks
jenks_breaks <- classIntervals(med_map$mean_responsetime, n = length(palette), style = "jenks")
jenks_classes <- cut(med_map$mean_responsetime, breaks = jenks_breaks$brks, include.lowest = TRUE)

# Create a color palette with the classification
pal <- colorFactor(palette = palette, domain = jenks_classes)

# Create the leaflet map
leaflet(data = med_map) %>% #setting data object as my filtered data
  addTiles() %>% #set up using default openstreetmap tiles
  addPolygons( #add layers
    fillColor = ~pal(jenks_classes),  # Use discrete color scheme based on Jenks breaks
    color = "black", #outline color
    weight = 1,
    fillOpacity = 1,
    label = ~mean_responsetime,
    popup = paste0(
      "GEOID: ", med_map$GEOID, "<br>",
      "Mean Response Time, in minutes: ", med_map$mean_responsetime, "<br>",
      "Standard Deviation of Mean Response Time: ", med_map$sd_responsetime, "<br>",
      "Coefficient of Variation of Mean Response Time: ", med_map$cv_responsetime, "<br>",
      "Block Group Call Density: ", med_map$calldens
    )
  ) %>% 
  leaflet::addLegend('bottomright',
                     pal = pal,
                     values = jenks_classes,
                     opacity = 0.9, 
                     title = str_wrap("Mean Response Time (minutes) for Medical Calls", max_width))
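If the Jenks step feels like a black box, it can help to run it on a handful of numbers first. This is a minimal sketch using the same two calls as the map code, with 3 classes instead of 6. The vector `times` is made-up data, not values from the call dataset.

```r
library(classInt)

# Hypothetical response times (minutes) forming three obvious clusters
times <- c(1, 2, 3, 10, 11, 12, 30, 31, 32)

# Find the natural breaks, then assign each value to its class
jenks_breaks <- classIntervals(times, n = 3, style = "jenks")
jenks_classes <- cut(times, breaks = jenks_breaks$brks, include.lowest = TRUE)

print(jenks_breaks$brks)     # class boundaries
print(table(jenks_classes))  # how many values landed in each class
```

The factor returned by `cut()` is what `colorFactor()` receives in the map code, so each class maps to one color in the palette.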

Using Census and ACS data

The old way

  • Download geometry shapefiles from the Census website for the geography you are mapping
  • Download CSVs (the non-spatial tabular data you actually want)
  • Load these into a GIS
  • Clean and examine fields to ensure they align
  • Perform joins to get spatial and non-spatial data in a single dataset

With tidycensus

  • Use load_variables() to search for the dataset you want and the geographies it is available in
  • Use get_acs() to request it

That’s it!
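As a rough illustration of those two steps, here is a minimal sketch. The helper name `get_king_income`, the choice of tracts, and the 2022 year are my own assumptions for the example. `B19013_001` is the ACS median household income variable.

```r
library(tidycensus)

# Hypothetical helper: wrapping the request in a function means nothing is
# downloaded until you actually call it.
get_king_income <- function(year = 2022) {
  get_acs(
    geography = "tract",
    variables = "B19013_001",  # ACS median household income
    state = "WA",
    county = "King",
    year = year,
    geometry = TRUE            # return an sf object with tract polygons attached
  )
}

# Typical use (needs an internet connection):
# acs_vars <- load_variables(2022, "acs5", cache = TRUE)  # browse variable codes
# king_income <- get_king_income()
```

With `geometry = TRUE`, the result comes back ready to map, so the shapefile download and manual join from the old way are not needed.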

Introducing tidycensus and tigris!

“The tidycensus package (K. Walker and Herman 2021), first released in 2017, is an R package designed to facilitate the process of acquiring and working with US Census Bureau population data in the R environment. The package has two distinct goals.

First, tidycensus aims to make Census data available to R users in a tidyverse-friendly format, helping kick-start the process of generating insights from US Census data.

Second, the package is designed to streamline the data wrangling process for spatial Census data analysts. With tidycensus, R users can request geometry along with attributes for their Census data, helping facilitate mapping and spatial analysis.”

“The tigris R package simplifies the process for R users of obtaining and using Census geographic datasets. Functions in tigris download a requested Census geographic dataset from the US Census Bureau website, then load the dataset into R as a spatial object. Generally speaking, each type of geographic dataset available in the Census Bureau’s TIGER/Line database is available with a corresponding function in tigris.”

It uses the US Census API, and you don’t need a token (API key) for basic use.

The creator of tidycensus made a really helpful reference book: https://walker-data.com/umich-workshop-2023/spatial-data/#1 https://walker-data.com/tidycensus/

Integration Intro

R integrates with popular GIS software such as QGIS and ArcGIS, and with Python (including arcpy), so you can draw on the strengths of each environment. The RQGIS and arcgisbinding packages facilitate this integration, enabling the execution of R scripts within GIS platforms.

Esri’s Get Started Landing Page

Basic Set-up

Esri webinar providing an overview of R and integrated workflows with Pro

R-ArcGIS Bridge Meets the Cloud: Working with Remote Data

The biggest selling points of the R-ArcGIS Bridge integration, from my perspective, are:

  • Creating GP tools with R (leveraging R's specialty statistical methods, etc.) and Python
  • Developing in R while using feature services or geodatabases (bringing data into R)
  • With Microsoft R, you can process and analyze big data, such as spatiotemporal data, in seconds
    • You can leverage complex Python libraries for deep learning
    • Using the R-ArcGIS Bridge with Microsoft R, you can integrate with larger Microsoft data platforms such as Azure and R Server (so most processing moves off your local machine and onto Azure virtual machines)

#install.packages("arcgisbinding")
library(arcgisbinding)
## *** Please call arc.check_product() to define a desktop license.
library(arcgis)
## Attaching core arcgis packages:
## → arcgisutils v0.3.0
## → arcgislayers v0.2.0
arc.check_product() #initialize connection to arcgis
## product: ArcGIS Pro (13.2.0.49743)
## license: Advanced
## version: 1.0.1.306
arc.check_portal() #check active portal account
## *** Current
##   url        : https://locana.maps.arcgis.com/
##   version    : 2024.1
##   user       : dmesler
##   organization   : Locana ArcGIS Online
## *** Available (signed in)
##   'https://www.arcgis.com/'
ID_Wetlands_fs <- arc.open('https://services.arcgis.com/CQaFpVeTI4SiADzi/arcgis/rest/services/Wetlands_AS/FeatureServer/0') #read in
ID_wetlands <- ID_Wetlands_fs %>%
  arc.select() %>% #read in as R data frame object
  arc.data2sf() #convert format to sf

mapview(ID_wetlands, zcol = "WETLAND_TY")
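The arcgis metapackage loaded above also attaches arcgislayers, which offers another route to the same kind of data. This is a minimal sketch, assuming its `arc_open()` and `arc_select()` functions behave as I understand them. The helper `read_feature_layer` is a hypothetical name of my own. My understanding is that this route talks to the REST service directly rather than going through `arc.check_product()`, but treat that as an assumption to verify.

```r
library(arcgislayers)

# Hypothetical helper: open a feature layer by URL and return its features as sf
read_feature_layer <- function(url) {
  layer <- arc_open(url)  # reads the layer's metadata
  arc_select(layer)       # queries the features and returns an sf object
}

# Typical use (needs an internet connection), with the same wetlands service as above:
# ID_wetlands <- read_feature_layer(
#   "https://services.arcgis.com/CQaFpVeTI4SiADzi/arcgis/rest/services/Wetlands_AS/FeatureServer/0"
# )
```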

Key Takeaways

Artwork by @allison_horst

You as an analyst, and we as a company, should not sleep on R!

It’s powerful on its own and easy to use. I hope you are now feeling incredibly free and empowered to work with data using R alone.

I also hope that we, as a larger company, can leverage the combined Microsoft, Esri, and R integration for future projects!

We covered a wide sampling of topics. I am happy to go back and answer questions about anything we covered, or to hear what else you would like to learn in future presentations.

Some more helpful references

As usual with open source languages, a good Google search or two will get you a long way, but here are some resources to aid you in your R journey!

Comprehensive ebooks:

  • https://r-spatial.org/book/part-1.html
  • https://bookdown.org/mcwimberly/gdswr-book/intro.html
  • https://r.geocompx.org/spatial-class

Reference docs:

  • https://www.datacamp.com/cheat-sheet/getting-started-r
  • https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf