“Whether you want to uncover the secrets of the universe, or you want to pursue a career in the 21st century, basic computer programming is an essential skill to learn”

Stephen Hawking

Class Logistics

Date, Time and Location

  • Date:
    • Monday, October 21 to Friday, October 25, 2019
  • Times:
    • Start & End: 8:30 am - 4:30 pm
    • Last Day End: 11:30 am
  • Class location:
    • The USFWS National Conservation Training Center (Directions)
    • Room: G30 on the lower floor of the John Turner Building

Agenda

Contacts

Eric Kelchlin - Course Leader
The National Conservation Training Center

(304) 876-7453

FAQ

Do I need to bring my laptop to class?

Answer: It depends. If your class is at the National Conservation Training Center (NCTC), then bringing your laptop to class is optional. We’ll provide you with a computer during class, but you may want to work after hours with your own laptop. If your class is not at NCTC, then you must bring your laptop with the software installed and configured according to the steps listed in the Homework section.

Do I need to install the class software on my work computer?

Answer: Yes. You’ll need to have R and the R Studio IDE installed to complete the pre-course homework. Also, we want you to hit the ground running when you come back from class.

My IT staff will not install the software on my work computer. What do I do?

Answer: You cannot persuade with gifts, but you can perform acts of kindness. Open source software is preferred over commercial software in the Federal Government. If you are a DOI employee, R and R Studio are on Apps-to-Go.


Homework

Your learning starts now! Please install the required software and complete the two swirl courses in R Studio.

Install R and R Studio

Download and install the most current versions of R and R Studio for your operating system. Install R in your User folder (e.g., C:\Users\ekelchlin\Documents\R), so you won’t need administrative rights. You may, however, need administrative rights for R Studio.


Install R Packages

The instructions below will guide you through installing the R packages used in class. You can think of packages as “little add-ons” that expand what you can do in R. Tidyverse, for example, is one of my favorite packages for data science.

  • Open R Studio, select File > New File > R Script. This creates a new R script file and opens a new tab called Untitled1 in the Source window pane.


  • Highlight and copy the code below. Paste the code in the new R script file. This is a custom function (geesh! so fancy already) that looks to see if you already have the package installed and if not, it will install the package for you. More on packages later in the class.


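packages <- function(x){
  x <- as.character(match.call()[[2]])        # capture the unquoted package name as text
  if (!require(x,character.only = TRUE)) {    # try to load it; FALSE means it could not be loaded
    install.packages(pkgs = x,repos = "https://cran.r-project.org")
    require(x,character.only = TRUE)          # load the freshly installed package
  }
}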
  • Highlight the entire code block and click the run button or Ctrl + Enter. This will add a new function called packages in the Environment pane.


  • Next, highlight and copy the code below and paste it into the R script file below the packages function section (i.e., line 9).


packages(knitr)
packages(rmarkdown)
packages(tidyverse)
packages(tinytex)
packages(yaml)
packages(installr)
packages(kableExtra)
packages(devtools)
packages(lattice)
packages(boot)
packages(AICcmodavg)
packages(car)
packages(epitools)
packages(gmodels)
packages(gplots)
packages(leaps)
packages(leaflet)
packages(lme4)
packages(maps)
packages(mapview)
packages(maptools)
packages(MASS)
packages(multcomp)
packages(nlme)
packages(odbc)
packages(plotly)
packages(pom)
packages(pwr)
packages(raster)
packages(rgbif)
packages(rgdal)
packages(rms)
packages(ROCR)
packages(sf)
packages(data.table)
packages(shiny)
packages(spocc)
packages(sp)
packages(survey)
packages(spsurvey)
packages(swirl)
packages(here)
  • Highlight this new block of code and click the run button or Ctrl + Enter. This will download packages and add them to your R library. You should see a lot of red, blue and black text scrolling up in the Console pane. The installation process will take about 30 seconds.


Your packages will be saved in the library folder. To find out where all the packages live, copy the code below, paste it into the Source pane, and run it.


.libPaths()
## [1] "C:/Users/ekelchlin/Documents/R/R-3.5.3/library"
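
To confirm that the installation worked, a quick check along these lines may help. It is only a sketch: `wanted` is a hypothetical, shortened version of the class package list, and `missing_pkgs` is a name chosen here for illustration.

```r
# Hypothetical subset of the class package list; extend it as needed.
wanted <- c("knitr", "rmarkdown", "tidyverse", "swirl", "here")

# Names of every package found in your R library folders.
installed <- rownames(installed.packages())

# Packages from the list that are not installed yet.
missing_pkgs <- setdiff(wanted, installed)

if (length(missing_pkgs) == 0) {
  message("All listed packages are installed.")
} else {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
}
```

If anything is reported as missing, rerunning the matching `packages()` line is a reasonable first step.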


Change R Studio Settings

Let’s walk through some R Studio settings to standardize the look and feel of the Integrated Development Environment (IDE). This will allow the instructor(s) to respond more quickly when you run into a glitch or an error. These settings will also speed up your learning process.

Open R Studio and select Tools > Global Options… from the top menu bar.


General R Options

  • Change the default working directory to the class folder.
  • Also, check the boxes as shown below.


Source Code: Editing


Source Code: Display


Source Code: Saving


Source Code: Completion


Source Code: Diagnostics


Appearance and Themes

  • I use the Dracula theme for those who like it dark and bloody, or the Cobalt theme for the late night sophisticated look.

  • I keep the font large, because let’s face it, we are not getting any younger.


Pane Layout


Packages Management


Packages Development


RMarkdown


Sweave on Down



Install WinBUGS

Install the latest version of WinBUGS to analyse data with Bayesian statistics.


Learn the R Language

Open R Studio and type the following R code in the Console pane. This will install a course from the swirl package for later use.

swirl::install_course("Getting and Cleaning Data")

Type the following R code in the Console pane to load the swirl package and launch the training program.

library("swirl")
swirl()


Follow the onscreen instructions to run and complete the two courses listed below. This will take you 3-5 hours to complete.

  • R Programming (complete first)

  • Getting and Cleaning Data


Bring Data to Class

We’ll have time in class for open lab work and consultations, so we encourage you to bring a sample of your data in .xls, .xlsx, .csv, or .txt format. You may also be called on during class to share your data because it fits the lesson at hand. However, we do ask that your data be as clean as possible.

Clean data is free of errors and in a matrix structure of raw values in rows and columns. The columns represent the variables of interest and the rows identify the observed data values collected for each sample. The matrix should not contain summaries and graphs, just the raw boring numbers or text. Please see the example picture below.
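
As a rough illustration of that structure, here is a small made-up table built in base R. The object name `surveys`, the column names, and every value are invented for this sketch and are not class data.

```r
# One row per observation, one column per variable, raw values only.
surveys <- data.frame(
  site    = c("A", "A", "B", "B"),
  date    = as.Date(c("2019-05-01", "2019-05-08", "2019-05-01", "2019-05-08")),
  species = c("brook trout", "brook trout", "brown trout", "brook trout"),
  count   = c(12, 9, 4, 7)
)

# str() shows the type of each column; there are no totals, averages, or graphs mixed in.
str(surveys)
```

Summaries such as site totals are better computed later in R than stored in the sheet itself.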



Common Packages & Functions

Packages

Use require(package) or library(package) in your R script or R Markdown document to load the package. Package-specific functions are only available after you “load” the required package.

  • tidyverse - A family of useful packages such as dplyr, tidyr, ggplot2, readr, and stringr
  • readxl - Import Excel sheets (part of tidyverse, but needs to be loaded separately)
  • lubridate - An easier way to work with dates and times (part of tidyverse; loaded by library(tidyverse) only from tidyverse 2.0.0 onward, otherwise load it separately)
  • data.table - Data frame tools
  • plotly - Turns ggplot objects into interactive figures
  • sf - Manipulate and analyze spatial data
  • mapview and leaflet - Create interactive maps
  • here - Simplifies file management (replaces setwd() and more!)
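
The two loading functions differ mainly in how they fail, which the sketch below tries to show. The package name `aPackageThatDoesNotExist` is made up on purpose to trigger the failure, and `library_result` and `require_result` are illustrative names.

```r
# library() stops with an error when a package is not installed.
library_result <- tryCatch(
  {
    library("aPackageThatDoesNotExist", character.only = TRUE)
    "loaded"
  },
  error = function(e) "error"
)
library_result

# require() returns FALSE (normally with a warning) instead of stopping.
require_result <- suppressWarnings(
  require("aPackageThatDoesNotExist", character.only = TRUE, quietly = TRUE)
)
require_result

# A package that ships with R loads the same way with either function.
library(stats)
```

Because `library()` fails loudly, it is often the safer choice at the top of a script, while `require()` suits code that needs to test whether a package is available.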

Data Importing Functions

The double colon symbol :: identifies the function with its parent package. While in R Studio, put your cursor on the function below and press F1 to call-up the specific help file in the Help Pane.

  • here::here() - Import and save your files relative to your R Studio project root directory
  • readxl::read_excel() - Read xls and xlsx files (sheet = "Sheet1", range = "A1:F10")
  • readr::read_csv() - Read comma separated (“,”) values (.csv files)
  • readr::read_tsv() - Read tab separated ("\t") values (.txt files)
  • data.table::fread() - Read large or difficult to import files
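
A minimal import sketch, assuming the readr package is installed, might look like the following. The temporary file and its contents are invented so the example can run anywhere; `csv_path` and `counts` are illustrative names.

```r
# Write a tiny CSV to a temporary file so the example has something to read.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("site,count", "A,12", "B,4"), csv_path)

# readr::read_csv() is called with :: so no library(readr) line is needed.
# col_types = "cd" declares one character column followed by one double column.
counts <- readr::read_csv(csv_path, col_types = "cd")
counts

# In an R Studio project you would typically build the path with here::here(),
# for example here::here("data", "counts.csv"); that folder and file are hypothetical.
```

Stating the column types up front is optional, but it tends to surface import problems early.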

Data Transformation Functions

  • tidyr::gather() - Convert table from wide to long format (superseded by pivot_longer() as of tidyr 1.0.0)
  • tidyr::spread() - Convert table from long to wide format (superseded by pivot_wider() as of tidyr 1.0.0)
  • tidyr::separate() - Separate one column into multiple columns
  • tidyr::unite() - Combine multiple columns into one column
  • tidyr::drop_na() - Drop records with NAs
  • tidyr::replace_na() - Replace NA observations with a new value
  • base::names() - Get (names(df)) or replace names (names(df) <- newnames)
  • dplyr::rename() - Rename variables
  • dplyr::select() - Select or drop variables (see select_helpers)
  • dplyr::filter() - Subset or drop records
  • dplyr::mutate() - Create new variables (use transmute() to keep only the new variables)
  • dplyr::group_by() - Group data by one or more variables
  • dplyr::summarize() - Collapse a variable into a single summary value; often used with group_by()
  • dplyr::distinct() - Select/find all unique records
  • dplyr::count() - Count the number of occurrences of a categorical value
  • dplyr::arrange() - Sort records ascending or descending (desc)
  • dplyr::left_join() - Keep all records in table1 and add matching records from table2
  • dplyr::inner_join() - Keep only records that have a match in both tables
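
To see a few of these functions working together, here is a small sketch that assumes dplyr and tidyr are installed. The tables `wide` and `sites`, and all of their values, are made up for illustration.

```r
library(dplyr)
library(tidyr)

# A wide table: one column of counts per year.
wide <- tibble(site = c("A", "B"), y2018 = c(10, 3), y2019 = c(12, 4))

# Wide to long: the year columns become key/value pairs.
long <- gather(wide, key = "year", value = "count", y2018, y2019)

# Long back to wide, recovering the original layout.
wide_again <- spread(long, key = year, value = count)

# A lookup table to join on the shared "site" column.
sites <- tibble(site = c("A", "B"), state = c("WV", "MD"))
joined <- left_join(long, sites, by = "site")

# Sort by count, largest first.
arrange(joined, desc(count))
```

The long format is usually the one that group_by(), summarize(), and ggplot2 expect.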

Logical and Relational Operators

<   Less than                    !=       Not equal to
>   Greater than                 %in%     Group membership
==  Equal to                     is.na    is NA
<=  Less than or equal to        !is.na   is not NA
>=  Greater than or equal to     &,|,!    and, or, not
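
The operators above can be tried on short vectors in the Console. The sketch below uses base R only; `counts` and `sites` are made-up example vectors.

```r
counts <- c(12, NA, 4, 7)
sites  <- c("A", "B", "C", "A")

# Relational operators compare element by element; NA stays NA.
over_five <- counts > 5

# %in% tests group membership and never returns NA.
in_group <- sites %in% c("A", "C")

# Combine tests with & (and), | (or), and ! (not).
usable <- !is.na(counts) & counts >= 7

over_five
in_group
usable
```

Testing with `is.na()` first is a common way to keep missing values from leaking into a filter.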

Pattern Matching and Replacement Functions

  • dplyr::if_else() - if_else(condition, true, false, missing); stricter about types than base::ifelse(), so fall back to base::ifelse() if that causes errors
  • dplyr::case_when() - Replaces multiple nested if_else() statements
  • dplyr::na_if() - Replace specific values with NA
  • dplyr::recode() - Recode values in a vector with named arguments (variable, old value = “new value”)
  • stringr::str_replace() - str_replace(variable, pattern, replacement)
  • stringr::str_split() - str_split(variable, pattern)
  • base::paste0() - concatenates objects into a character vector with no separator
  • base::grep() - grep(pattern, variable, value=TRUE) - returns match
  • base::grepl() - grepl(pattern, variable) - returns TRUE or FALSE
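
A short sketch of several of these functions, assuming dplyr and stringr are installed. The vectors `species` and `count`, and the category labels, are invented for the example.

```r
library(dplyr)
library(stringr)

species <- c("Brook Trout", "brown trout", "Creek Chub", NA)
count   <- c(12, 3, NA, 7)

# grepl() returns TRUE/FALSE for each element; NA counts as no match.
has_trout <- grepl("trout", species, ignore.case = TRUE)

# str_replace() swaps the first match of the pattern in each string.
cleaned <- str_replace(species, "Trout", "trout")

# if_else() with an explicit value for missing conditions.
abundance <- if_else(count > 5, "high", "low", missing = "unknown")

# case_when() checks conditions in order and uses the first one that is TRUE.
family <- case_when(
  is.na(species) ~ "unknown",
  has_trout      ~ "salmonid",
  TRUE           ~ "other"
)

# paste0() glues values together with no separator.
labels <- paste0(family, "_", abundance)
```

Order matters in `case_when()`, so the NA check comes first here.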

Data Type Conversion Functions

  • base::as.numeric() - converts value to “numeric” (see also as.integer() and as.double())
  • base::as.character() - converts value to “character”
  • base::as.factor() - converts value to “factor” for graphing and modeling
  • lubridate::as_date() - converts value to “date”
  • lubridate::as_datetime() - converts value to “date-time”
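  • lubridate::ymd() - parses a string written in year, month, day order into a “date”; see also dmy() and mdy()
  • lubridate::dmy_hms(x, tz = "GMT") - parses a string written in day, month, year, hour, minute, second order into a “date-time”
  • base::as.POSIXct(variable * 86400, origin = "1899-12-30", tz = "GMT") - convert Excel serial dates (days since 1899-12-30) to date-times; as.POSIXct() counts in seconds, hence the multiplication by 86400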

See R for Data Science and the lubridate package reference to learn more about date conversions.
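
The sketch below works through a few date conversions, assuming lubridate is installed and that the Excel file uses the default 1900 date system, in which serial numbers count days from 1899-12-30. The serial values are chosen for illustration.

```r
# An Excel serial number is a count of days; the fraction is the time of day.
excel_day   <- 43759     # a whole day
excel_stamp <- 43759.5   # the same day at noon

# as.Date() counts in days, so the serial number can be used directly.
class_date <- as.Date(excel_day, origin = "1899-12-30")

# as.POSIXct() counts in seconds, so convert days to seconds first.
class_time <- as.POSIXct(excel_stamp * 86400, origin = "1899-12-30", tz = "GMT")

# lubridate parsers are named after the order of the parts in the text.
from_ymd <- lubridate::ymd("2019-10-21")
from_mdy <- lubridate::mdy("10/21/2019")
start    <- lubridate::dmy_hms("21/10/2019 08:30:00", tz = "GMT")

format(class_date)
format(class_time, "%Y-%m-%d %H:%M")
```

Both `from_ymd` and `from_mdy` describe the same day as `class_date`, which is a handy cross-check after an import.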

The Piping Workflow for Data Wrangling

Piping %>% allows you to apply a clear sequence of functions to your data to create a single outcome.

# use CTRL+Shift+M to create the pipe %>% 

data %>%                  # identify the data           
  select() %>%            # select the variables with their names or index location
  filter() %>%            # subset records with operators or expressions
  mutate() %>%            # create new variable(s) from existing ones or with expressions
  group_by() %>%          # group the data by the variable(s), and then
  summarize()             # collapse the data to a statistic like mean or standard deviation

Learn more about using pipes
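
Here is one way the template above could be filled in, assuming dplyr is installed. It uses the `mtcars` data set that ships with R; the weight cutoff, the approximate unit conversion, and the names `kpl` and `mpg_summary` are arbitrary choices for this sketch.

```r
library(dplyr)

mpg_summary <- mtcars %>%                     # identify the data
  select(mpg, cyl, wt) %>%                    # keep three variables
  filter(wt < 4) %>%                          # keep cars lighter than 4 (1000 lbs)
  mutate(kpl = mpg * 0.425) %>%               # approximate km per liter from miles per gallon
  group_by(cyl) %>%                           # group by number of cylinders, and then
  summarize(mean_kpl = mean(kpl), n = n())    # one row per group

mpg_summary
```

Reading the pipe as "and then" at the end of each line usually makes the sequence easy to follow.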

Main Components of ggplot2 for Graphing

data %>%                       # identify the data 
  ggplot(aes(x = , y = )) +    # identify what to plot and continue `+` to the next line of code
  geom_boxplot()               # provide a geometric function to draw the layer 

ggsave(filename, plot =, device = "pdf")  # export and save the plot for publication

Learn more about graphing with ggplot2
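
A filled-in version of the graphing template might look like this, assuming ggplot2 is installed. It uses the built-in `mtcars` data; the axis labels, figure size, and file name are illustrative, and the plot is saved to a temporary folder so nothing in your project is overwritten.

```r
library(ggplot2)

# Box plot of fuel economy for each cylinder count.
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  labs(x = "Cylinders", y = "Miles per gallon")

# Save the plot as a PDF; width and height are in inches by default.
out_file <- file.path(tempdir(), "mpg_by_cyl.pdf")
ggsave(out_file, plot = p, device = "pdf", width = 6, height = 4)
```

Passing `plot = p` explicitly avoids relying on whichever plot happened to be drawn last.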