I use R Notebooks (like this one) for lectures.
They are good for creating a document that combines text explanation and executable R code.
A gray box is a code chunk: something you can execute in R.
The white box shows the output from the code chunk.
# this is a gray box
print("this is the output from the code chunk")
## [1] "this is the output from the code chunk"
You can follow along in R Studio on your computer.
4 + 5
## [1] 9
14 - 7
## [1] 7
Things to notice:
variable_name <- value of the variable
<- is the ‘assignment operator’; it works like an equal sign
# create a variable to hold the value 9
calc1 <- 4 + 5
# create a variable to hold the value 7
calc2 <- 14 - 7
# add the two variables together
total <- calc1 + calc2
total
## [1] 16
Things to notice:
<-
A # at the start of a line makes it a comment: R does not run that line of code
calc3 <- 4 + calc2
# calc4 <- 4 + seven
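The last line above is commented out because no variable named seven was ever created. As a rough sketch of what R would do without the #, the snippet below runs the same expression inside tryCatch() (a base R function not otherwise used in this lecture), so the error message is captured instead of stopping the script.

```r
# calc2 as defined earlier in the lecture
calc2 <- 14 - 7

# Works: calc2 exists, so R can look up its value
calc3 <- 4 + calc2

# Fails: there is no variable called seven, so R raises an error.
# tryCatch() is used here only to capture the message for display.
error_message <- tryCatch(
  {
    calc4 <- 4 + seven
    "no error"
  },
  error = function(e) conditionMessage(e)
)

calc3          # 11
error_message  # the text of R's error about seven
```

Commenting the line out, as the lecture does, is the simpler way to keep a broken line around without it halting the script.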
Things to notice:
In-class exercise: Create your first script
- First, create a file structure so that everything is organized and everyone in the class has the same file structure. This will help a lot when you are helping each other!
- If you haven’t already, create a folder for this class. Call it “methods1”.
- Within your methods1 folder, create a new folder called “class1”.
- Within your class1 folder, create a new folder called “data”.
- In R Studio, create a new script
- File > New File > R Script
- At the top, type “# this is my first script” (make sure you include the hash #)
- Save the script in your class1 folder as my_first_script.R
- Create the same variables in your script as above
- Make sure you add comments to describe your work
- Run the script by clicking the down arrow next to RUN on the top-right of your Source Window
- Save your script
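The folder structure from this exercise can also be built from R rather than by hand. A minimal sketch, assuming you try it under a temporary directory first; base_dir and data_dir are hypothetical names, and you would point base_dir at wherever you keep your coursework.

```r
# Hypothetical location: a temporary folder, so nothing on your real disk changes
base_dir <- tempdir()

# methods1/class1/data, built with file.path() so separators are handled for you
data_dir <- file.path(base_dir, "methods1", "class1", "data")

# recursive = TRUE creates methods1 and class1 along the way
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

# Confirm that the nested folders now exist
dir.exists(data_dir)
```

Creating the folders by hand, as in the steps above, gives the same result; the point is only that everyone ends up with the same layout.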
#### Do some simple math, and save the values ####
# Calculate 4 + 5
calc1 <- 4 + 5
calc2 <- 14 - 7 # Calculate 14 - 7
# Add our two calculations together
total <- calc1 + calc2
Things to notice:
Projects are a good way to keep track of all of the files for a specific task or project. We’ll create projects for each class in this course.
In-class exercise: Create a project
- Create a new project
- File > New Project
- Existing Directory > navigate to your class1 folder
- Click Create Project
Things to notice:
Packages are collections of functions and datasets developed by the community.
Some, like the tidyverse, have become the backbone of analysis in R.
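A package has to be installed once per machine and then loaded in each session. As a hedged sketch, the snippet below checks whether readr is installed before loading it; has_readr is a hypothetical variable name, and the check is optional if you already know the package is installed.

```r
# requireNamespace() returns TRUE if the package is installed, without attaching it
has_readr <- requireNamespace("readr", quietly = TRUE)

if (has_readr) {
  # Attach the package so its functions (e.g. read_csv) can be called directly
  library(readr)
} else {
  # install.packages() downloads from CRAN and only needs to be run once per machine
  message('readr is not installed; run install.packages("readr") first')
}
```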
Look at the help window
In the Help Search bar, type readr, look at the readr-package documentation.
# Alternately you can search the help directly in the console
??readr
In-class exercise: Download a dataset
Download the 2018 education dataset from EdBuild.
Move it to the data folder in your class1 folder.
Now we’ll import our first dataset into R.
## Load in the tidyverse, an extremely useful set of packages that you installed before the class
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2 ✓ purrr 0.3.4
## ✓ tibble 3.0.4 ✓ dplyr 1.0.2
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## import the education dataset for 2018 using read_csv from the readr package of the tidyverse
ed18 <- read_csv("data/full_data_18_fin_exc.csv")
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## .default = col_double(),
## NCESID = col_character(),
## State = col_character(),
## NAME = col_character(),
## CONUM = col_character(),
## County = col_character(),
## dType = col_character(),
## dUrbanicity = col_character(),
## sd_type = col_character(),
## state_id = col_character()
## )
## ℹ Use `spec()` for the full column specifications.
## You can use the full path name too (this is the path on my computer)
## We will cover filepaths on Windows machines in our next lab -- this only works on Mac
# ed18 <- read_csv("~/spatial/NewSchool/methods1-materials-fall2021/class1/data/full_data_18_fin_exc.csv")
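The short path "data/full_data_18_fin_exc.csv" is relative: R looks for it starting from the current working directory, which inside a project is the project folder. A small sketch for checking that before importing; csv_path is a hypothetical variable name.

```r
# Where is R currently looking? Inside a project this is the project folder.
getwd()

# Build the same relative path used above; file.path() inserts the separators
csv_path <- file.path("data", "full_data_18_fin_exc.csv")
csv_path

# TRUE only if the file really is at that location relative to getwd()
file.exists(csv_path)
```

If file.exists() returns FALSE, the usual causes are that the project is not open or the file was not moved into the data folder.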
Things to notice:
- Your imported csv is in your Environment
- Click the down arrow next to the filename
Data tables are called data frames in R.
Let’s explore our first data frame.
## list all of the column names in the console
names(ed18)
# print information about each column
glimpse(ed18)
# open the data frame
View(ed18)
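To try the same kind of exploration without the downloaded file, here is a sketch on a small made-up data frame. The toy values are hypothetical, not taken from the EdBuild data, and str() is used as a base R stand-in that is similar in spirit to glimpse().

```r
# Hypothetical toy data, not the EdBuild file
toy <- data.frame(
  State = c("New York", "New York", "Ohio"),
  NAME = c("District A", "District B", "District C"),
  dEnroll_district = c(1200, 450, 3100)
)

names(toy)    # column names
dim(toy)      # number of rows, then number of columns
str(toy)      # type and first values of each column
head(toy, 2)  # first two rows
```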
Things to notice:
### Sometimes you want an id to be numeric instead of a string
# Create a new numeric column for county number
ed18$county_num <- as.numeric(ed18$CONUM)
# Create a new numeric column - share of students eligible for free or reduced-price lunch
# (the result is a proportion between 0 and 1; multiply by 100 for a percentage)
# This is a common measure of the level of economic distress in a school
ed18$percent_free_reduced_lunch <- ed18$dFRL/ed18$dEnroll_district
View(ed18)
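The two column operations above can be checked on a few made-up rows. In this sketch the toy values are hypothetical; only the column names mirror the ones used in the lecture.

```r
# Hypothetical toy data reusing the lecture's column names
toy <- data.frame(
  CONUM = c("001", "013", "105"),
  dFRL = c(300, 90, 1550),
  dEnroll_district = c(1200, 450, 3100),
  stringsAsFactors = FALSE
)

# as.numeric() drops the leading zeros: "001" becomes 1
toy$county_num <- as.numeric(toy$CONUM)

# The ratio is a proportion between 0 and 1; multiply by 100 for a percentage
toy$percent_free_reduced_lunch <- toy$dFRL / toy$dEnroll_district

toy
```

Because the leading zeros are lost, it is worth keeping the original character column alongside the numeric one, as the lecture code does.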
Things to notice:
### Create a new data frame with fewer columns
ed18_select <- ed18[c("NCESID", "State", "NAME", "dEnroll_district",
"percent_free_reduced_lunch", "StPovRate")]
Things to notice:
### Create a new data frame for New York
newyork18 <- subset(ed18_select, State == "New York")
# Count how many districts there are in New York
ny_districts <- nrow(newyork18)
ny_districts
## [1] 675
Things to notice:
- The value for ny_districts in your Environment has an L after it - this means it's an integer
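The select, subset, and count steps can be followed end to end on a toy data frame. This is a sketch with hypothetical values; toy, toy_select, toy_ny, and n_districts are made-up names.

```r
# Hypothetical toy data, not the EdBuild file
toy <- data.frame(
  State = c("New York", "New York", "Ohio"),
  NAME = c("District A", "District B", "District C"),
  dEnroll_district = c(1200, 450, 3100)
)

# Keep two columns by name, as in the ed18_select step
toy_select <- toy[c("State", "NAME")]

# Keep only the rows where State equals "New York"
toy_ny <- subset(toy_select, State == "New York")

# nrow() counts the rows and returns an integer
n_districts <- nrow(toy_ny)
n_districts         # 2
class(n_districts)  # "integer", which is why the Environment pane shows an L
```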
### Write the new New York data frame out to the data folder
write_csv(newyork18, "data/new_york_18_selected.csv")
See assignments for week 1 in Canvas.