Intro to R and R Studio

R Notebooks


I use R Notebooks (like this one) for lectures.

  • Good for creating a document with text explanation and executable R code together

  • A gray box is a code chunk - something you can execute in R

  • The white box shows the output from the code chunk

# this is a gray box
print("this is the output from the code chunk")
## [1] "this is the output from the code chunk"


You can follow along in R Studio on your computer.



R Studio Layout




The Console pane

The Console is where you can type code that executes immediately, and where you view the output.


Type each line into your console, and press enter after each one:

4 + 5
## [1] 9
14 - 7
## [1] 7

Things to notice:

  • Press return/enter on your keyboard to execute a command in the console
  • R keeps a history of your console commands
    • use up or down arrows to view previous commands
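
The console works as a calculator for more than addition and subtraction. A few more lines you could try (the numbers are arbitrary, chosen only for illustration):

```r
# multiplication, division, and exponents
6 * 7
## [1] 42
20 / 4
## [1] 5
2 ^ 3
## [1] 8

# parentheses control the order of operations
(4 + 5) * 2
## [1] 18
4 + 5 * 2
## [1] 14
```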



Create new objects (variables):

variable_name <- value of the variable

<- is the ‘assignment operator’; it works like an equal sign

# create a variable to hold the value 9
calc1 <- 4 + 5

# create a variable to hold the value 7
calc2 <- 14 - 7

# add the two variables together
total <- calc1 + calc2

total 
## [1] 16

Things to notice:

  • In R, create a new object with an assignment operator <-
    • keyboard shortcut for “<-”
      • Alt+- (Windows)
      • Option+- (Mac)
    • See all the keyboard shortcuts!
      • Alt+Shift+K (Windows)
      • Option+Shift+K (Mac)
  • A hash # tells R to ignore the rest of that line
    • In R-speak, this is called “commenting out” code
    • It is good practice to write comments to explain your code
  • When you define an object, the console does not display the value
    • type the object name to return the current value
    • you can see all defined objects in the Environment pane
  • Object names should start with a letter, and only contain letters, numbers, _ and .(periods).
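
A short sketch of the points above. The object names (enroll_2018, enroll.total) are made up for illustration. make.names() is a base R function that shows how R would repair a name that breaks the naming rules.

```r
# assignment alone prints nothing in the console
enroll_2018 <- 1200

# typing the name prints the current value
enroll_2018
## [1] 1200

# wrapping the assignment in parentheses assigns and prints in one step
(enroll.total <- enroll_2018 + 300)
## [1] 1500

# a name that starts with a number is not valid;
# make.names() shows the repaired version R would use
make.names("2018_enroll")
## [1] "X2018_enroll"
```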



calc3 <- 4 + calc2


# calc4 <- 4 + seven

Things to notice:

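  • You can do math with numbers and with objects that hold numbers
  • The commented-out calc4 line would stop with an error if you ran it, because no object named seven has been defined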
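
If you are not sure whether an object has been defined, one way to check is the base R function exists(). The sketch below assumes a fresh session; the object seven is created here only to show the calc4 line working.

```r
calc2 <- 14 - 7

# exists() reports whether an object with that name has been defined
exists("calc2")
## [1] TRUE

# once seven is defined as a number, the calculation works
seven <- 7
calc4 <- 4 + seven
calc4
## [1] 11
```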




The Source pane

You can use the Source Pane to write scripts to save your work. You can also open and run existing scripts.


In-class exercise: Create your first script

  • First, create a file structure so that everything is organized and everyone in the class has the same file structure. This will help a lot when you are helping each other!
  • If you haven’t already, create a folder for this class. Call it “methods1”.
  • Within your methods1 folder, create a new folder called “class1”.
  • Within your class1 folder, create a new folder called “data”.
  • In R Studio, create a new script
    • File > New File > R Script
    • At the top, type “# this is my first script” (make sure you include the hash #)
    • Save the script in your class1 folder as my_first_script.R
    • Create the same variables in your script as above
    • Make sure you add comments to describe your work
  • Run the script by clicking the down arrow next to RUN on the top-right of your Source Window
  • Save your script

#### Do some simple math, and save the values ####

# Calculate 4 + 5
calc1 <- 4 + 5

calc2 <- 14 - 7 # Calculate 14 - 7

# Add our two calculations together
total <- calc1 + calc2

Things to notice:

  • You can save your work easily with a script
  • In R, a hash (#) comments out a line, meaning the computer ignores it
    • Use # to explain your script as you go
    • You can comment before a line, or on the same line
  • There are lots of different ways to run your script
    • Place your cursor at the end of a line, then press Cmd+Return (Mac) / Ctrl+Enter (Windows)
    • Place your cursor at the end of a line, then click RUN
    • Highlight the code you want to run, then use the keyboard shortcut or click RUN
  • Cmd+S (Mac) / Ctrl+S (Windows) is the keyboard shortcut to save - do it a lot
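
One more option, not covered in the list above, is to run a whole script from the console with the base R function source(). The sketch below writes a small script to a temporary folder so that it runs on any computer; in class you would point source() at your own my_first_script.R instead.

```r
# write a tiny script to a temporary folder (a stand-in for my_first_script.R)
script_path <- file.path(tempdir(), "my_first_script.R")
writeLines(
  c(
    "# this is my first script",
    "calc1 <- 4 + 5",
    "calc2 <- 14 - 7",
    "total <- calc1 + calc2"
  ),
  script_path
)

# run every line of the script, top to bottom
source(script_path)

# the objects created by the script are now in your workspace
total
## [1] 16
```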




Projects

Projects are a good way to keep track of all of the files for a specific task or project. We’ll create projects for each class in this course.

In-class exercise: Create a project

  • Create a new project
    • File > New Project
    • Existing Directory > navigate to your class1 folder
    • Click Create Project

Things to notice:

  • There is now a class1.Rproj file in your class1 folder
    • It keeps track of your file path and a few other things
  • Projects make it easy to use a shorter file path
  • Within a project, the Files window shows all of the files in your project
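
To see what a "shorter file path" means in practice, here is a minimal sketch. It assumes you opened the class1 project, so that the working directory is your class1 folder. The file name is the dataset used later in this lesson.

```r
# the working directory; inside a project this is the project folder
getwd()

# build a path relative to the working directory
csv_path <- file.path("data", "full_data_18_fin_exc.csv")
csv_path
## [1] "data/full_data_18_fin_exc.csv"

# TRUE only if the file has already been moved into the data folder
file.exists(csv_path)
```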




The Files pane

There are lots of useful tabs in this pane


Files

The Files window is like file explorer

  • You should be in your class1 folder to see your script

Plots

The Plots window displays charts and maps you create

Packages

Packages are collections of functions and datasets developed by the R community.

Some, like the tidyverse, have become the backbone of analysis in R.

The Packages window lists the packages you have installed and provides a user interface to search for other packages and install them.
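
The same tasks can also be done with code. A minimal sketch, assuming you want the tidyverse: install.packages() is needed once per computer, and library() is needed once per R session.

```r
# install once per computer (commented out so it does not reinstall every time)
# install.packages("tidyverse")

# check whether the package is installed, without loading it
has_tidyverse <- requireNamespace("tidyverse", quietly = TRUE)
has_tidyverse

# load the package for the current session
if (has_tidyverse) {
  library(tidyverse)
}
```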

Help

The Help window is where you learn about packages and functions.

In the Help Search bar, type readr, look at the readr-package documentation.

# Alternately you can search the help directly in the console
??readr




The Environment pane

The Environment shows all of the objects that you have in your workspace


If you are following along, you should have at least 4 objects in your Environment.
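
You can also inspect the Environment from the console. A small sketch using the objects from earlier in this lesson: ls() lists object names and rm() removes an object.

```r
# recreate the objects from earlier in the lesson
calc1 <- 4 + 5
calc2 <- 14 - 7
total <- calc1 + calc2
calc3 <- 4 + calc2

# list the names of every object in your workspace
ls()

# remove an object you no longer need
rm(calc3)

# calc3 is gone from the Environment
exists("calc3")
## [1] FALSE
```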




Opening data in R Studio

So far we have created variables by typing directly into the console or in a script. Next we’ll learn how to open a data table in R Studio and work with it.

In-class exercise: Import a CSV


Download the 2018 education dataset from EdBuild.

Move it to the data folder in your class1 folder.

Now we’ll import our first dataset into R.

## Load in the tidyverse, an extremely useful set of packages that you installed before the class
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.8
## ✓ tidyr   1.2.0     ✓ stringr 1.4.0
## ✓ readr   2.1.2     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## import the education dataset for 2018 using read_csv from the readr package of the tidyverse
ed18 <- read_csv("data/full_data_18_fin_exc.csv")
## Rows: 13036 Columns: 41
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (9): NCESID, State, NAME, CONUM, County, dType, dUrbanicity, sd_type, s...
## dbl (32): STATE_FIPS, ENROLL, LRPP, SRPP, SLRPP, LR, SR, SLR, SRPP_cola, LRP...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## You can use the full path name too (this is the path on my computer)
## We will cover filepaths on Windows machines in our next lab -- this only works on Mac
# ed18 <- read_csv("~/spatial/NewSchool/methods1-materials-fall2021/class1/data/full_data_18_fin_exc.csv")

Things to notice:

  • The imported csv is listed in your Environment as the data frame ed18
    • To view details about the data, click the arrow next to the data frame name
  • Notice the messages in the Console
    • They provide information about the table and the process of importing it
    • The data type for each column was automatically determined
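
If the automatic guess is not what you want, read_csv lets you declare the column types yourself. The sketch below is an illustration only: it uses a made-up two-row table typed inline instead of the EdBuild file, and it assumes readr 2.0 or later, where I() marks a string as literal data and not a file path.

```r
library(readr)

# hypothetical two-row table; I() tells read_csv this is data, not a file path
csv_text <- I("NCESID,State,ENROLL\n0100005,Alabama,5000\n3600001,New York,1200\n")

# declare the column types instead of letting read_csv guess
demo <- read_csv(
  csv_text,
  col_types = cols(
    NCESID = col_character(),  # keep the id as text so leading zeros survive
    State  = col_character(),
    ENROLL = col_double()
  )
)

# show the column specification that was used
spec(demo)
```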



Data frames

Data tables are called data frames in R

Let’s explore our first data frame with some functions

## list all of the column names in the console
names(ed18)

# print information about each column (returns the name of each column, the data type, and the first few values)
glimpse(ed18)

# open the data frame
View(ed18)

Things to notice:

  • You can Search or Filter when you view the dataframe
  • The data type of each column is displayed in the column header and in the Environment window
  • You can also open the data frame by clicking on the data frame name in the Environment pane
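
A few more base R functions are handy for a first look at a data frame. The sketch below uses a small made-up data frame (demo) so that it runs without the EdBuild file; the same calls should work on ed18.

```r
# hypothetical three-district table, for illustration only
demo <- data.frame(
  State  = c("New York", "Ohio", "Texas"),
  ENROLL = c(1200, 850, 4300)
)

# number of rows and number of columns
nrow(demo)
## [1] 3
ncol(demo)
## [1] 2

# first two rows
head(demo, 2)

# column names, types, and first values (base R counterpart of glimpse)
str(demo)

# quick summary statistics for one column
summary(demo$ENROLL)
```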



Basic data types in R

  • Numeric
    • Integers (whole numbers)
    • Doubles (numbers that can have a decimal part)
  • Character (string)
  • Logical (boolean) - TRUE or FALSE

Things to notice:

  • A whole column is always one type. If even one value in a numeric column is a character, the whole column will be type = character.
  • NA represents a missing value
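
You can check these types yourself with class() and typeof(). The sketch below also shows the "one character turns the whole column into character" rule on a short made-up vector (mixed), and how NA appears when a value cannot be converted.

```r
class(5L)
## [1] "integer"
class(5.5)
## [1] "numeric"
typeof(5.5)
## [1] "double"
class("five")
## [1] "character"
class(TRUE)
## [1] "logical"

# one character value makes the whole vector character
mixed <- c(1, 2, "three")
class(mixed)
## [1] "character"

# converting back to numeric turns the unreadable value into NA
# (suppressWarnings hides the "NAs introduced by coercion" warning)
converted <- suppressWarnings(as.numeric(mixed))
converted
## [1]  1  2 NA

# most summary functions return NA unless you ask them to skip missing values
mean(converted)
## [1] NA
mean(converted, na.rm = TRUE)
## [1] 1.5
```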



Create new columns in a data frame

### Sometimes you want an id to be numeric instead of string
# Create a new numeric column for county number
ed18$county_num <- as.numeric(ed18$CONUM)

# Create a new numeric column - percent of students eligible for free and reduced-price lunch
# This is a common measure of the level of economic distress in a school
ed18$percent_free_reduced_lunch <- ed18$dFRL/ed18$dEnroll_district

View(ed18)

Things to notice:

  • The new column is added to the end of the data frame
  • You can sort ascending and descending by any column
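
The same steps on a tiny made-up data frame, so you can see exactly what each line does. The column names mirror the ones used above, but the values are invented for illustration. The percent_frl and order() lines are optional extras, not part of the lesson code.

```r
# hypothetical two-district table; values are invented
demo <- data.frame(
  CONUM            = c("01001", "36061"),
  dFRL             = c(250, 600),
  dEnroll_district = c(1000, 1200)
)

# text id to number: "01001" becomes 1001 (the leading zero is dropped)
demo$county_num <- as.numeric(demo$CONUM)

# dividing one column by another works row by row
demo$share_frl <- demo$dFRL / demo$dEnroll_district

# multiply by 100 if you want the value on a 0-100 scale
demo$percent_frl <- demo$share_frl * 100

# sort in code, highest percent first
demo[order(demo$percent_frl, decreasing = TRUE), ]
```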



Select columns from your data frame

### Create a new data frame with fewer columns

ed18_select <- ed18[c("NCESID", "State", "NAME", "dEnroll_district",
                              "percent_free_reduced_lunch", "StPovRate")]

Things to notice:

  • In base R, brackets are used to select columns from a data frame.
  • new_data_frame <- old_data_frame[c(list of all columns to keep, in quotes, comma-separated)]
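
A minimal sketch of the bracket pattern on a made-up data frame (demo; the values are invented). It also shows the comma form you will often see in other people's code, which selects the same columns.

```r
# hypothetical table, for illustration only
demo <- data.frame(
  NCESID = c("0100005", "3600001"),
  State  = c("Alabama", "New York"),
  ENROLL = c(5000, 1200)
)

# keep two of the three columns
demo_select <- demo[c("NCESID", "State")]
names(demo_select)
## [1] "NCESID" "State"

# comma form: rows go before the comma (blank means all rows), columns after
demo_select2 <- demo[, c("NCESID", "State")]
names(demo_select2)
## [1] "NCESID" "State"
```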



Select rows from your data frame

### Create a new data frame for New York
newyork18 <- subset(ed18_select, State == "New York")

# Calculate how many districts there are in New York
ny_districts <- nrow(newyork18)
ny_districts
## [1] 675

Things to notice:

  • The value for ny_districts in your Environment has an L after it - this means it’s an integer
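
The same row filter on a small made-up data frame (demo; the values are invented), with the bracket version shown alongside subset() for comparison. The last lines show where the L comes from: nrow() returns an integer, and R writes integers with an L.

```r
# hypothetical table, for illustration only
demo <- data.frame(
  State  = c("New York", "Ohio", "New York"),
  ENROLL = c(1200, 850, 300)
)

# keep only the rows where State is New York
ny <- subset(demo, State == "New York")

# the same filter with brackets: the condition goes before the comma
ny_bracket <- demo[demo$State == "New York", ]

# count the rows that were kept
n_ny <- nrow(ny)
n_ny
## [1] 2

# nrow() returns an integer; a plain 2 is a double, 2L is an integer
class(n_ny)
## [1] "integer"
class(2)
## [1] "numeric"
class(2L)
## [1] "integer"
```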



Write your data frame to your computer

### write the new New York data frame to the data folder
write_csv(newyork18, "data/new_york_18_selected.csv")
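
To confirm that a file was written, you can check that it exists and read it back. The sketch below writes a made-up data frame to a temporary folder so it does not touch your class1 folder. It assumes readr 2.0 or later for the show_col_types argument.

```r
library(readr)

# hypothetical table, for illustration only
demo <- data.frame(
  State  = c("New York", "New York"),
  ENROLL = c(1200, 300)
)

# write to a temporary folder (in class you would write to your data folder)
out_path <- file.path(tempdir(), "demo_selected.csv")
write_csv(demo, out_path)

# check that the file is there
file.exists(out_path)
## [1] TRUE

# read it back to confirm the contents
check <- read_csv(out_path, show_col_types = FALSE)
nrow(check)
## [1] 2
```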




Homework

See assignments for week 1 in Canvas.