Course Overview Slides

Intro to R and R Studio

R Notebooks


I use R Notebooks (like this one) for lectures.

  • Good for creating a document with text explanation and executable R code together

  • A gray box is a code chunk - something you can execute in R

  • The white box shows the output from the code chunk

# this is a gray box
print("this is the output from the code chunk")
## [1] "this is the output from the code chunk"


You can follow along in R Studio on your computer.



The Four Windows of R Studio




The Console pane


Type into your console, and then press enter:
4 + 5
## [1] 9
14 - 7
## [1] 7

Things to notice:

  • Press return/enter on your keyboard to execute a command in the console
  • R keeps a history of your console commands
    • use up or down arrows to view previous commands



Create new objects (variables):

variable_name <- value of the variable

<- is the ‘assignment operator’; it works like an equals sign

# create a variable to hold the value 9
calc1 <- 4 + 5

# create a variable to hold the value 7
calc2 <- 14 - 7

# add the two variables together
total <- calc1 + calc2

total 
## [1] 16

Things to notice:

  • In R, create a new object with the assignment operator <-
    • keyboard shortcut for “<-”
      • Alt+- (Windows)
      • Option+- (Mac)
    • See all the keyboard shortcuts!
      • Alt+Shift+K (Windows)
      • Option+Shift+K (Mac)
  • A hash # tells R not to run the rest of that line
    • In R-speak, this is called “commenting out”
    • It is good practice to write comments to explain your code
  • When you define an object, the console does not display the value
    • type the object name to return the current value
    • you can see all defined objects in the Environment pane
  • Object names should start with a letter and contain only letters, numbers, _ (underscores) and . (periods).
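
A minimal sketch of these points. The object names district_count and avg.size are made up for illustration and are not part of the class script:

```r
# Assignment is silent: nothing prints until you ask for the value
district_count <- 12
district_count

# Wrapping an assignment in parentheses assigns and prints in one step
(avg.size <- 250.5)

# ls() lists the objects you have defined, like the Environment pane
ls()

# Everything after a hash is ignored, so this line changes nothing
# district_count <- 99
```

Both names follow the naming rule above: they start with a letter and use only letters, an underscore, or a period.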



calc3 <- 4 + calc2


# calc4 <- 4 + seven

Things to notice:

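  • You can perform math operations on numbers and on objects that hold numbers
  • The calc4 line is commented out because seven has not been defined, so running it would cause an error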
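
To see what happens with an undefined object without stopping your session, here is a small sketch. It wraps the calculation in tryCatch(), which is not used elsewhere in this lecture, so treat it as an optional extra:

```r
calc2 <- 14 - 7
calc3 <- 4 + calc2   # works: calc2 holds the number 7
calc3

# `seven` was never created, so R cannot find it.
# tryCatch() captures the error message instead of halting.
result <- tryCatch(4 + seven, error = function(e) conditionMessage(e))
result
```

If you type 4 + seven directly into the console, R prints the same message as an error.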



The Source Pane


You can use the Source Pane to create scripts that save your work and make your analysis reproducible

In-class exercise: Create your first script

  • First, create a file structure so that everything is organized and everyone in the class has the same file structure. This will help a lot when you are helping each other!
  • If you haven’t already, create a folder for this class. Call it “methods1”.
  • Within your methods1 folder, create a new folder called “class1”.
  • Within your class1 folder, create a new folder called “data”.
  • In R Studio, create a new script
    • File > New File > R Script
    • At the top, type “# this is my first script” (make sure you include the hash #)
    • Save the script in your class1 folder as my_first_script.R
    • Create the same variables in your script as above
    • Make sure you add comments to describe your work
  • Run the script by clicking the down arrow next to RUN on the top-right of your Source Window
  • Save your script

#### Do some simple math, and save the values ####

# Calculate 4 + 5
calc1 <- 4 + 5

calc2 <- 14 - 7 # Calculate 14 - 7

# Add our two calculations together
total <- calc1 + calc2

Things to notice:

  • You can save your work easily with a script
  • In R, a hash (#) comments out a line, meaning the computer ignores it
    • Use # to explain your script as you go
    • You can comment before a line, or on the same line
  • There are lots of different ways to run your script
    • Place your cursor at the end of a line, then press Cmd+Return (Mac) / Ctrl+Enter (Windows)
    • Place your cursor at the end of a line, then click RUN
    • Highlight the code to run, then use the keyboard shortcut or click RUN
  • Cmd+S (Mac) / Ctrl+S (Windows) is the keyboard shortcut to save - do it a lot
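
One more way to run a whole script is the base R function source(). The sketch below writes the three lines of the script to a temporary file (a stand-in for my_first_script.R, so it runs on any computer) and then runs that file:

```r
# Write a small script to a temporary file (stand-in for my_first_script.R)
script_path <- tempfile(fileext = ".R")
writeLines(
  c("calc1 <- 4 + 5",
    "calc2 <- 14 - 7",
    "total <- calc1 + calc2"),
  script_path
)

# source() runs every line of the file, top to bottom
source(script_path)
total
```

With your own script you would pass its path instead, for example source("my_first_script.R") when that file is in your working directory.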

Projects

Projects are a good way to keep track of all of the files for a specific task or project. We’ll create projects for each class in this course.

In-class exercise: Create a project

  • Create a new project
    • File > New Project
    • Existing Directory > navigate to your class1 folder
    • Click Create Project

Things to notice:

  • There is now a class1.Rproj file in your class1 folder
    • It keeps track of your file path and a few other things
  • Projects make it easy to use a shorter file path
  • Within a project, the Files window shows all of the files in your project
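
A short sketch of what "shorter file path" means, assuming your project lives in the class1 folder and the dataset used later in this lecture sits in its data folder:

```r
# Inside a project, the working directory is the project folder
getwd()

# So a path can start from there instead of from the top of your computer
csv_path <- file.path("data", "full_data_18_fin_exc.csv")
csv_path

# TRUE only once the file has been moved into the data folder
file.exists(csv_path)
```

file.path() joins folder and file names with a forward slash, which R accepts on both Mac and Windows.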




The Files/Plots/Packages/Help pane


  • The Files window is like a file explorer
    • You should be in your class1 folder to see your script
  • The Plots window displays the charts and maps you create
  • The Packages window lists the packages you have installed
  • The Help window is where you learn about packages and functions



Packages

Packages are collections of functions and datasets developed by the community.

Some, like the tidyverse, have become the backbone of analysis in R.
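
A minimal sketch of the usual package workflow. The install line is commented out because it only needs to run once per computer; the object names has_tidyverse and middle are made up for illustration:

```r
# install.packages("tidyverse")   # run once per computer (needs internet)

# Check whether a package is installed without loading it
has_tidyverse <- requireNamespace("tidyverse", quietly = TRUE)
has_tidyverse

# package::function() calls one function without loading the whole package.
# stats ships with R, so this works on a fresh install.
middle <- stats::median(c(1, 5, 9))
middle
```

library(package_name) loads an installed package for the rest of your session, so you can use its functions without the package:: prefix.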

Help

Look at the help window

In the Help Search bar, type readr, look at the readr-package documentation.

# Alternatively, you can search the help directly in the console
??readr

Import a CSV

In-class exercise: Import a CSV


Download the 2018 education dataset from EdBuild.

Move it to the data folder in your class1 folder.

Now we’ll import our first dataset into R.

# Load the tidyverse, an extremely useful set of packages that you installed before the class
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2     ✓ purrr   0.3.4
## ✓ tibble  3.0.4     ✓ dplyr   1.0.2
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

# Import the education dataset for 2018 using read_csv from the readr package of the tidyverse
ed18 <- read_csv("data/full_data_18_fin_exc.csv")
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_double(),
##   NCESID = col_character(),
##   State = col_character(),
##   NAME = col_character(),
##   CONUM = col_character(),
##   County = col_character(),
##   dType = col_character(),
##   dUrbanicity = col_character(),
##   sd_type = col_character(),
##   state_id = col_character()
## )
## ℹ Use `spec()` for the full column specifications.

# You can use the full path name too (this is the path on my computer)
# We will cover file paths on Windows machines in our next lab -- this only works on Mac
# ed18 <- read_csv("~/spatial/NewSchool/methods1-materials-fall2021/class1/data/full_data_18_fin_exc.csv")

Things to notice:

  • Your imported csv is in your Environment
    • Click the arrow next to the data frame name (ed18) to see its columns
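
The column specification printed above shows that ID columns such as NCESID are read as character. The sketch below hints at why. It uses base R's read.csv() rather than readr's read_csv() so it runs without extra packages, and the two rows are made-up values, not real records from the EdBuild file:

```r
# Write a tiny CSV to a temporary file (hypothetical values)
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("NCESID,State,dEnroll_district",
    "0100005,Alabama,5000",
    "3600001,New York,1200"),
  csv_path
)

# Read the ID column as text so the leading zero survives;
# the other columns are guessed from their contents
toy <- read.csv(csv_path,
                colClasses = c(NCESID = "character"),
                stringsAsFactors = FALSE)

sapply(toy, class)   # type of each column
toy$NCESID[1]        # still has its leading zero
```

If the ID were read as a number instead, the leading zero would be dropped.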



Data frames

Data tables are called data frames in R

Let’s explore our first data frame

# list all of the column names in the console
names(ed18)

# print information about each column
glimpse(ed18)

# open the data frame
View(ed18)

Things to notice:

  • You can Search or Filter when you view the data frame
  • The data type of each column is displayed in the header and in the Environment window
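
If you want to try these functions without the full dataset, here is a sketch on a made-up three-row data frame. The column names are borrowed from ed18 and the values are hypothetical. str() is the base R counterpart to glimpse():

```r
# A made-up, three-row stand-in for ed18
toy <- data.frame(
  State = c("New York", "New York", "Ohio"),
  NAME = c("District A", "District B", "District C"),
  dEnroll_district = c(1200, 850, 430),
  stringsAsFactors = FALSE
)

names(toy)    # column names
str(toy)      # type and first values of each column
dim(toy)      # number of rows, then number of columns
head(toy, 2)  # first two rows
```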



Basic data types in R

  • Numeric
    • Integers (whole numbers)
    • Doubles (numbers that can have decimals)
  • Character (string)
  • Logical (boolean) - TRUE or FALSE

Things to notice:

  • Every value in a column has the same type. If there is even one character value in a numeric column, the whole column becomes type character.
  • NA marks a missing value
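
A small sketch of these types, using made-up values. class() reports the type of an object:

```r
whole   <- 5L          # integer (the L suffix marks an integer)
decimal <- 2.5         # double
label   <- "New York"  # character
flag    <- TRUE        # logical

class(whole)
class(decimal)
class(label)
class(flag)

# Mixing one character value with numbers turns everything into character
mixed <- c(1, 2, "three")
class(mixed)

# NA marks a missing value; na.rm = TRUE tells mean() to skip it
scores <- c(10, NA, 30)
is.na(scores)
mean(scores, na.rm = TRUE)
```

Without na.rm = TRUE, mean(scores) returns NA, because R cannot know the average when one value is missing.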



Create new columns in a data frame

### Sometimes you want an id to be numeric instead of string
# Create a new numeric column for county number
ed18$county_num <- as.numeric(ed18$CONUM)

# Create a new numeric column - share of students eligible for free or reduced-price lunch
# (stored as a proportion; multiply by 100 to get a percent)
# This is a common measure of the level of economic distress in a school
ed18$percent_free_reduced_lunch <- ed18$dFRL/ed18$dEnroll_district

View(ed18)

Things to notice:

  • The new column is added to the end of the data frame
  • You can sort ascending and descending by any column
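
The same two steps on a made-up two-row data frame, so you can check the arithmetic by hand. The values are hypothetical and pct_frl is an extra column name invented for this sketch:

```r
toy <- data.frame(
  CONUM = c("01001", "36001"),
  dFRL = c(50, 30),
  dEnroll_district = c(200, 120),
  stringsAsFactors = FALSE
)

# Text to number: note that the leading zero is dropped
toy$county_num <- as.numeric(toy$CONUM)

# New column computed from two existing columns (a proportion)
toy$percent_free_reduced_lunch <- toy$dFRL / toy$dEnroll_district

# Multiply by 100 if you want a percent
toy$pct_frl <- 100 * toy$percent_free_reduced_lunch

toy
```

Using $ with a name that does not exist yet creates the column; using it with an existing name overwrites that column.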



Select columns from your data frame

### Create a new data frame with fewer columns

ed18_select <- ed18[c("NCESID", "State", "NAME", "dEnroll_district",
                      "percent_free_reduced_lunch", "StPovRate")]

Things to notice:

  • In base R, brackets are used to select columns from a data frame.
  • new_data_frame <- old_data_frame[c(list of all columns to keep, in quotes, comma-separated)]
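
A sketch of the bracket pattern on a made-up data frame (hypothetical values; toy_select and toy_reordered are names invented for this example):

```r
toy <- data.frame(
  State = c("New York", "New York", "Ohio"),
  NAME = c("District A", "District B", "District C"),
  dEnroll_district = c(1200, 850, 430),
  stringsAsFactors = FALSE
)

# Keep two of the three columns
toy_select <- toy[c("State", "NAME")]
names(toy_select)

# Columns come back in the order you list them
toy_reordered <- toy[c("NAME", "State")]
names(toy_reordered)
```

The original data frame is unchanged; the selection is stored in a new object.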



Select rows from your data frame

### Create a new data frame for New York
newyork18 <- subset(ed18_select, State == "New York")

# Calculate how many districts there are in New York
ny_districts <- nrow(newyork18)
ny_districts
## [1] 675

Things to notice:

  • The value for ny_districts in your Environment has an L after it - this means it's an integer
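
The same row selection on a made-up data frame (hypothetical values), plus a check that nrow() really returns an integer:

```r
toy <- data.frame(
  State = c("New York", "New York", "Ohio"),
  NAME = c("District A", "District B", "District C"),
  stringsAsFactors = FALSE
)

# Keep only the rows where the condition is TRUE
toy_ny <- subset(toy, State == "New York")

n_ny <- nrow(toy_ny)
n_ny
is.integer(n_ny)

# The bracket version: rows go before the comma, columns after
toy_ny2 <- toy[toy$State == "New York", ]
nrow(toy_ny2)
```

Note the double equals sign: == tests whether two values are equal, while a single = or <- assigns a value.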



Write your data frame to your computer

### Write the new New York data frame to the data folder
write_csv(newyork18, "data/new_york_18_selected.csv")
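
To confirm that a file was written correctly, you can read it back in. This sketch uses base R's write.csv() and read.csv() instead of the readr functions so it runs without extra packages, and it writes to a temporary file rather than your data folder. The data values are made up:

```r
toy_ny <- data.frame(
  State = c("New York", "New York"),
  NAME = c("District A", "District B"),
  stringsAsFactors = FALSE
)

# Write to a temporary file; row.names = FALSE leaves out the row numbers
out_path <- tempfile(fileext = ".csv")
write.csv(toy_ny, out_path, row.names = FALSE)

# Read it back and compare with what we wrote
check <- read.csv(out_path, stringsAsFactors = FALSE)
file.exists(out_path)
nrow(check)
names(check)
```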




Homework

See assignments for week 1 in Canvas.