A Beginner's Guide to R

Topics

R and RStudio

  • This section covers the two pieces of software you need to download
  • R is the core piece
  • RStudio is a nice integrated development environment (IDE) that makes it much easier to use R

R

  • To download R for Windows, see this page
  • If you open R itself, it will look very plain
[Screenshot: plain R console]

RStudio

  • RStudio makes R a little more user friendly
  • It's free and can be downloaded at rstudio.com
  • It's not necessary to open RStudio to use R, but in these slides we will assume that RStudio is your interface to R

RStudio

When you first open RStudio, this is what you see:
[Screenshot: RStudio when first opened]

RStudio

  • The left panel is the console for R
  • Type 1 + 1 then hit “Enter” and R will return the answer
[Screenshot: 1 + 1 in the RStudio console]

RStudio

  • It's a good idea to use a script so you can save your code
  • Open a new script by selecting “File” -> “New File” -> “R Script” and it will appear in the top left panel of RStudio
[Screenshot: a new script open in RStudio]

RStudio

  • A script is basically a text document that can be saved (go to “File” -> “Save As”)
  • You can type and run more than one line at a time by highlighting the lines and clicking the “Run” button on the script toolbar
[Screenshot: running several lines from a script in RStudio]

RStudio

  • The bottom right panel can be used to find and open files, view plots, load packages, and look at help pages
  • The top right panel gives you information about what variables you're working with during your R session
  • We'll explain more about what to look for in those panels later

R basics

Doing math

  • Open up a script if you haven't already (“File” -> “New File” -> “R Script”)
  • Try some math by either typing the lines below or copying and pasting the lines into your script
10 + 5
10 - 5
10 * 5
10 / 5
  • Remember, to run the lines, highlight your code and click the “Run” button on the toolbar of the script panel
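
R follows the usual order of operations, and a few more operators are handy early on. The lines below are a small sketch extending the arithmetic above; the numbers and variable names are arbitrary choices for illustration.

```r
# Exponents use ^
power = 10 ^ 2

# Parentheses control the order of operations
no_parens = 10 + 5 * 2      # multiplication happens first
with_parens = (10 + 5) * 2  # addition happens first

# Integer division and remainder
quotient = 10 %/% 3
remainder = 10 %% 3

print(c(power, no_parens, with_parens, quotient, remainder))
```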

Creating a variable

  • A variable is a symbol that can take many different values
  • To create a variable in R we use =
  • Below, we've created the variables x and y by assigning numbers to them
x = 10
y = 5
x + y
[1] 15

(Above, the lines beginning with [1] are the output; the other lines are what you run in your script)
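
You will often see R code written by others that uses an arrow, <-, instead of =. For creating variables the two do the same job. A minimal sketch, with a and b as illustrative names:

```r
# Both lines create a variable; <- is the traditional R assignment arrow
a = 10
b <- 5

total = a + b
print(total)
```

These slides keep using = throughout.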

Creating a variable

In RStudio, you will see the variables we created in the top right panel
[Screenshot: variables x and y listed in the top right panel]

Creating a variable

  • If you've already created a variable, you can replace the value with another value
x
[1] 10
x = 20
x
[1] 20

Creating a variable

In the top right panel you can see that the number stored in the variable x has changed
[Screenshot: updated value of x in the top right panel]

Variable types

R has three main variable types

Type        Description         Examples
character   letters and words   "z", "red", "H2O"
numeric     numbers             1, 3.14, log(10)
logical     binary              TRUE, FALSE
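
If you are unsure which type a value has, the built-in class() function will tell you. A short sketch, with variable names chosen only for illustration:

```r
name = "red"
value = 3.14
flag = TRUE

class(name)    # "character"
class(value)   # "numeric"
class(flag)    # "logical"

# A number inside quotes is stored as text, not as a number
class("42")    # "character"
```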

Grouping Data

There are several ways to group data to make them easier to work with:

  • Vectors - contain multiple values of the same type (e.g., all numbers or all words)
  • Lists - contain multiple values of different types (e.g., some numbers and some words)
  • Matrices - tables, like a spreadsheet, with only one data type
  • Data frames - like matrices, but each column can have its own data type
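
Vectors, lists, and data frames each get their own example later in these slides. For completeness, here is a minimal sketch of a matrix, built with the base R function matrix(); the values are arbitrary.

```r
# A numeric matrix with 2 rows and 3 columns, filled column by column
m = matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
print(m)

dim(m)     # number of rows, then number of columns
m[2, 3]    # the value in row 2, column 3
```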

Vectors

  • Vectors are variables with an ordered set of values
  • They contain only one type of data (numeric, character, or logical)
  • We use c() as a container for vector elements
x = c(1, 2, 3, 4, 5)
x
[1] 1 2 3 4 5
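
A few things you can do with a vector once you have one, sketched with an illustrative vector v. The last lines show what happens if you try to mix types: R converts everything to one type.

```r
v = c(1, 2, 3, 4, 5)

doubled = v * 2   # arithmetic is applied to every element
third = v[3]      # square brackets pick out an element by position (counting from 1)
n = length(v)     # how many elements the vector holds

# Mixing types in c() does not fail; everything is converted to character
mixed = c(1, "a", TRUE)
class(mixed)
```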

Lists

  • Lists are like vectors but can contain any mix of data types
  • We use list() as a container for list items
x = list("Benzene", 1.3, TRUE)
x
[[1]]
[1] "Benzene"

[[2]]
[1] 1.3

[[3]]
[1] TRUE
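
To get a single item back out of a list, use double square brackets. Items can also be given names when the list is created. In the sketch below the names name, value, and flag are made up for illustration; they are not part of the original example.

```r
chem = list(name = "Benzene", value = 1.3, flag = TRUE)

chem[[1]]      # first item, by position
chem$value     # an item by name
class(chem[[1]])
class(chem$flag)
```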

Data frames

  • Data frames are spreadsheet-like tables in R
  • We use data.frame() as a container for many vectors of the same length
student = c("Bob", "Thomas", "Cory")
score = c(90, 15, 6)
pass = c(TRUE, FALSE, FALSE)
my.data = data.frame(student, score, pass)
my.data
  student score  pass
1     Bob    90  TRUE
2  Thomas    15 FALSE
3    Cory     6 FALSE
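
Once a data frame exists, you can inspect it and pull out rows or columns with square brackets, written as [row, column]. A minimal sketch that rebuilds the data frame above so it can be run on its own:

```r
student = c("Bob", "Thomas", "Cory")
score = c(90, 15, 6)
pass = c(TRUE, FALSE, FALSE)
my.data = data.frame(student, score, pass)

str(my.data)           # one line per column, showing its type
my.data[2, ]           # second row, all columns
my.data[ , "score"]    # the score column as a vector

# Keep only the rows where pass is TRUE
# ($ picks out a column by name; it is covered later in these slides)
passed = my.data[my.data$pass, ]
print(passed)
```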

Functions

  • Functions are a way to repeat the same task on different data
  • R has many built-in functions that perform common tasks
x = c(4, 8, 1, 14, 34)
mean(x) # Calculate the mean of the data set
[1] 12.2
y = c(1, 4, 3, 5, 10)
mean(y) # Mean of a different data set
[1] 4.6
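
mean() is only one of many built-in functions that take a numeric vector. The sketch below tries a few others on the same data and checks by hand that the mean is just the sum divided by the number of values.

```r
x = c(4, 8, 1, 14, 34)

total = sum(x)        # add up all the values
count = length(x)     # how many values there are
middle = median(x)    # the middle value once sorted
largest = max(x)      # the biggest value

total / count         # the same number mean(x) returns
```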

Note on commenting

  • To write a comment in your script that will not be evaluated, type # in front of your comment
  • Everything after the # on that line is ignored by R
  • Run all of the code below and see what gets returned in the R console (bottom left panel in RStudio)
# Full line comment
x # partial line comment
"new line"

Functions

  • Back to functions: they all have the form function()
  • function is the name, which usually gives you a clue about what it does
  • () is where you put your data or indicate options
  • To see what goes inside (), type a question mark in front of the function and run it
?mean # the parentheses are not needed here; help(mean) does the same thing

Functions

In RStudio, you will see the help page for mean() in the bottom right corner
[Screenshot: help page for mean() in the bottom right panel]

Functions

  • On the help page, under Usage, you see mean(x, ...)
  • This means that the only necessary thing that has to go into () is x
  • On the help page under Arguments you will find a description of what x needs to be
  • (For most purposes, you will want the x in the mean function to be a numeric vector)
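
The ... in the Usage line stands for optional arguments. For a numeric vector, the same help page lists trim and na.rm among them. The sketch below uses trim on made-up data with one extreme value; the variable names are illustrative.

```r
values = c(1, 2, 3, 4, 100)   # hypothetical data with one extreme value

plain = mean(values)                 # ordinary mean, pulled up by the 100
trimmed = mean(values, trim = 0.2)   # drops the lowest and highest 20% of values first
named = mean(x = values)             # naming the argument gives the same result as mean(values)

print(c(plain, trimmed, named))
```

Arguments can be given by position, as in mean(values), or by name, as in mean(x = values). Optional arguments such as trim are usually given by name.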

Plotting

  • Another example of a function is plot()
  • In its basic form it takes two arguments, plot(x, y)
  • x is a numeric vector that will be the x-axis coordinates of the plot
  • y is a numeric vector (of the same length as x) that will be the y-axis coordinates of the plot

Plotting

score = c(1.3, 4.5, 2.6, 3.4, 6.4)
day = c(1, 2, 3, 4, 5)
plot(x = day, y = score)

[Figure: scatter plot of score against day]

Plotting

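score = c(1.3, 3.5, 2.6, 3.4, 6.4)
day = c(1, 2, 3, 4, 5)
op = par(oma = c(0, 0.5, 0, 0)) # widen the left outer margin; op keeps the old setting
plot(x = day, y = score, cex.axis = 2, cex.lab = 2, cex = 2) # double the size of axis text, labels, and points
par(op) # restore the original setting

[Figure: scatter plot of score against day with enlarged points and text]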

Using R packages

  • R comes with basic functionality, meaning that some functions will always be available when you start an R session
  • However, anyone can write functions for R that are not part of the base functionality and make them available to other R users in a package
  • A package must be installed first, then loaded, before you can use it
  • This is similar to a mobile app: you must first install the R package (like first downloading an app), then you must load the package before using its functions (like opening an app to use it)

Using R packages

  • For example, let's say that R doesn't have a function you need
  • The best way to find out if another R package does have that function is to ask Google
  • Use a search with key words describing what you want the function to do and just add “R package” to the end

Using R packages

  • A function from a package you haven't installed is not available: we need to install the package first (again, like initially downloading an app)
  • In the bottom right panel of RStudio, click on the “Packages” tab then click “Install Packages” in the toolbar
[Screenshot: the Packages tab in RStudio]

Using R packages

  • A window will pop up
  • Start typing the name of the package into the “Packages” box, select that package, and click “Install”
[Screenshot: the Install Packages window]

Using R packages

  • Now that we've installed the package, we still can't use the function we want
  • We've got to load the package first (opening the app)
  • For this, we will use the library() function to load the “descr” package (for example)
library("descr")
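
The install step can also be done by typing install.packages("descr") in the console instead of clicking through the Packages tab. The sketch below only checks whether the package is present and loads it if so; it deliberately does not install anything by itself, since installing needs an internet connection.

```r
# Check whether the "descr" package is already installed on this computer
installed = requireNamespace("descr", quietly = TRUE)

if (installed) {
  library("descr")   # load it for this session
} else {
  # Not run automatically: install once from the console, then load
  message('Run install.packages("descr") once, then library("descr")')
}
```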

Using R packages

  • Remember, when you close down RStudio and start it up again, you don't have to install the package again
  • But you do have to load the package again to use any function that's not in the R core functionality (this is very easy to forget)

Data exploration

About the data

  • Use the code below to obtain the data
data(airquality)
head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6

About the data

  • airquality is a data frame with ozone readings from a monitor in New York
  • What column names are in the data frame?
colnames(airquality)
[1] "Ozone"   "Solar.R" "Wind"    "Temp"    "Month"   "Day"    

About the data

  • How many observations does the dataset contain?
  • We use the nrow() function to get the number of rows
nrow(airquality)
[1] 153
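
A few related built-in functions give a quick overview of a data frame. A short sketch using the same data:

```r
data(airquality)

ncol(airquality)           # number of columns
dim(airquality)            # number of rows, then number of columns
str(airquality)            # the type and first few values of each column
summary(airquality$Wind)   # minimum, quartiles, mean, and maximum of one column
```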

Viewing the data

In RStudio, the View() function opens a data frame in a spreadsheet-like viewer, which makes the data easier to look at

View(airquality)

Working with data frames

  • You can refer to specific columns in a data frame with the $ operator
  • Use this to feed specific columns into a function
mean(airquality$Temp) # Calculate the mean temperature
[1] 77.88
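
The first rows of airquality shown earlier contain NA, which is how R marks a missing value. If a column has any NA in it, mean() returns NA unless you ask it to skip them with na.rm = TRUE. A minimal sketch (missing and ozone.mean are illustrative names):

```r
data(airquality)

mean(airquality$Ozone)                     # NA, because some readings are missing
missing = sum(is.na(airquality$Ozone))     # count the missing readings
ozone.mean = mean(airquality$Ozone, na.rm = TRUE)   # mean of the readings that exist

print(missing)
print(ozone.mean)
```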

Scatter Plots

Take a look at the data using plot(x, y)

plot(airquality$Temp, airquality$Ozone)
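
By default the axis labels are the code you typed. plot() accepts xlab, ylab, and main to set readable labels and a title. In the sketch below the wording of the labels is my own; the units come from the airquality help page (?airquality).

```r
data(airquality)

plot(x = airquality$Temp, y = airquality$Ozone,
     xlab = "Temperature (degrees F)",
     ylab = "Ozone (ppb)",
     main = "Ozone versus temperature")
```

Days with a missing Ozone value are left off the plot.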

Reading data from websites

  • source() can run an .R file directly from a web address
source("http://www.openintro.org/stat/data/cdc.R")
## A data frame called cdc is loaded.
names(cdc)
[1] "genhlth"  "exerany"  "hlthplan" "smoke100" "height"   "weight"  
[7] "wtdesire" "age"      "gender"  
head(cdc)
    genhlth exerany hlthplan smoke100 height weight wtdesire age gender
1      good       0        1        0     70    175      175  77      m
2      good       0        1        1     64    125      115  33      f
3      good       1        1        1     60    105      105  49      f
4      good       1        1        0     66    132      124  42      f
5 very good       0        1        0     61    150      130  55      f
6 very good       1        1        0     64    114      114  55      f
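
source() simply runs every line of an R script, so any variables the script creates end up in your session. The sketch below shows the same idea without needing an internet connection: it writes a tiny script to a temporary file and sources it. The file contents and the name demo.data are made up for illustration.

```r
# Write a one-line script to a temporary file (a stand-in for a script on a website)
script.path = tempfile(fileext = ".R")
writeLines("demo.data = data.frame(id = c(1, 2, 3), height = c(70, 64, 60))", script.path)

# source() runs the file, so demo.data now exists in the session
source(script.path)

names(demo.data)
nrow(demo.data)
```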