About

Setup




Inserting Chunks of Code

R Notebook chunks can be inserted with:

  • Windows keyboard shortcut: Ctrl + Alt + I

  • Mac keyboard shortcut: Cmd + Option + I

  • A chunk always begins with three backticks followed by {r}, and ends with three backticks.

  • The code goes after the {r} header and before the closing backticks.

  • Typically, in an R Notebook you should display one output per chunk. If there is more than one output, it is best to split the code into separate chunks.




Executing Code Chunks

In order to run the code, you have several options.

  • Windows keyboard: Ctrl + Shift + Enter runs the whole chunk; Ctrl + Enter runs only the current line.

  • Mac keyboard: Cmd + Shift + Enter runs the whole chunk; Cmd + Enter runs only the current line.

  • After writing R code inside the chunk (reading a file, in the example below), we can run the chunk by clicking the green arrow button located on the right corner of the chunk.

When the button is clicked, the code runs and its result appears below the chunk.

Finally, to run all chunks, you can use the shortcut Ctrl + Alt + R. The Run menu in the editor toolbar also lists options for running the previous chunks or the next chunk.




Errors

  • When coding, small errors are common. In an R Notebook, the error message points to the line where the error occurred, which allows you to fix it.

  • In the example below, we see that the error occurs because we forgot to set the working directory.

  • This is a very common mistake, and as programmers we will learn to identify these errors quickly.
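One way to guard against the working-directory mistake described above is to check where R is looking before reading a file. The following is only a sketch: it reuses the path data/il_income.csv from the import example later in this guide, and the variable names data_path and status are made up for illustration.

```r
# Path reused from the import example later in this guide
data_path <- "data/il_income.csv"

# Show the folder R is currently working in
print(getwd())

# Only read the file if R can see it from the working directory
if (file.exists(data_path)) {
  il_income <- read.csv(file = data_path)
  status <- "file loaded"
} else {
  status <- "file not found: check getwd() or change the folder with setwd()"
}
print(status)
```

If the message says the file was not found, the working directory most likely does not contain the data folder.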


Inserting Comments

  • It is important to leave comments within your code to understand what you are doing when you refer to it later.

  • A comment is made in the grey code chunk box by putting a “#” symbol in front of text.
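As a small illustration of the points above, a commented chunk might look like the sketch below. The scores are hypothetical values chosen only for this example.

```r
# Average of three exam scores (hypothetical values)
exam_scores <- c(80, 75, 55)

average_score <- mean(exam_scores)  # a comment can also follow code on the same line
average_score
## [1] 70
```

Everything after the # symbol on a line is ignored by R when the chunk runs.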


Inserting Images

To add a picture, use the following code inside a code chunk: knitr::include_graphics("/path/to/image.png")

Below is an example of how this should look.

knitr::include_graphics("imgs/img11.jpg")


Publishing an R Notebook

  • After completing a lab, you will submit a URL to Sakai for evaluation.

  • Create an account on http://rpubs.com

  • After creating an R Notebook, it is important to publish your work on RPubs.

  • At the top left, there is a button that says “Knit”. From the drop-down menu, select “Knit to HTML”.




A pop-up window may appear asking you to install required packages.


From the pop-up HTML document, locate the “Publish” button in the top right corner of the document.


A pop-up window may appear asking you to install required packages.

It is recommended that the notebook title and URL have the same name.

Follow the directions to “Create an Account” if necessary. After doing so, you will be led to “Step 2 of 2”.



knitr::include_graphics("imgs/img14.gif")



Five steps to publish an R Notebook to rpubs.com



Introduction to R commands

First we will begin with a few basic operations.

x = 128
y = 16
z <- 5
vars = c(2,4,8,16,32) # This is a vector created using the generic combine function 'c'
x # display value of variable x
## [1] 128
z # displays value of variable z
## [1] 5
vars[1] #This calls the first value in the vector vars
## [1] 2
vars[2] #This calls the second value in the vector vars
## [1] 4
vars[1:3] #This calls the first through third values in the vector vars
## [1] 2 4 8
vars #This calls the vector 
## [1]  2  4  8 16 32


Common Arithmetic Operations

Below are some simple arithmetic operations.

12*6
## [1] 72
128/16
## [1] 8
9^2
## [1] 81
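A few more operations that often come up alongside the ones above are sketched here: operator precedence, integer division, and the remainder. The numbers are arbitrary.

```r
2 + 3 * 4      # multiplication is done before addition
## [1] 14
(2 + 3) * 4    # parentheses change the order of evaluation
## [1] 20
17 %/% 5       # integer division: how many whole times 5 fits into 17
## [1] 3
17 %% 5        # remainder left over after that division
## [1] 2
```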

Basic Data Types

R works with numerous data types. Some of the most basic types are numeric, integer, logical (Boolean: TRUE/FALSE), and character (string: "TEXT").

#Type: Character                   
#Example:"TRUE",'23.4'

v = "TRUE"                       
class(v)                           
## [1] "character"
#Type: Numeric                
#Example: 12.3,5

v = 23.5                  
class(v)                   
## [1] "numeric"
#Type: Logical    
#Example: TRUE,FALSE

v = TRUE
class(v)
## [1] "logical"
#Type: Factor (nominal, categorical)
#Example: m f m f m

v = as.factor(c("m", "f", "m"))
class(v)
## [1] "factor"
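The integer type mentioned above does not appear in the examples, so here is a brief sketch of it. The variable w is introduced only for comparison.

```r
#Type: Integer
#Example: 5L, 23L

v = 5L            # the L suffix asks R to store the value as an integer
class(v)
## [1] "integer"
w = 5             # without the suffix, the same number is stored as numeric
class(w)
## [1] "numeric"
is.numeric(v)     # integers still count as numeric values
## [1] TRUE
```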


Functions

R functions are invoked by name, followed by parentheses containing zero or more arguments.

# The following applies the function 'c' (seen earlier) to combine three numeric values into a vector 
c(1,2,3)
## [1] 1 2 3
# Example of the function mean() to calculate the mean of three values
mean(c(5,6,7))
## [1] 6
# Square root of a number
sqrt(99)
## [1] 9.949874
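To illustrate the "zero or more arguments" part, the sketch below calls one function with no arguments and two functions with named arguments. The variable names today, rounded, and avg are chosen only for this example.

```r
# A function called with zero arguments: the parentheses are still required
today <- Sys.Date()
class(today)
## [1] "Date"

# Arguments can be passed by name; digits is an argument of round()
rounded <- round(sqrt(99), digits = 2)
rounded
## [1] 9.95

# na.rm = TRUE tells mean() to skip missing values (NA)
avg <- mean(c(5, NA, 7), na.rm = TRUE)
avg
## [1] 6
```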


Importing Data and Variable Assignment

# Here we are reading a file of type csv (comma separated values), typical of many Excel files
il_income = read.csv(file = "data/il_income.csv")
top_il_income = read.csv(file = "data/top_il_income.csv")
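After importing, it is usually worth checking that the data arrived as expected. The sketch below does not use the lab files; it writes a small, invented table to a temporary CSV file and reads it back, so the names sample_df, tmp_path, and imported are assumptions made for illustration.

```r
# Hypothetical data, invented only for this sketch
sample_df <- data.frame(
  county = c("A", "B", "C"),
  per_capita_income = c(30000, 25000, 20000)
)

# Write it to a temporary CSV file, then read it back as we would a real file
tmp_path <- tempfile(fileext = ".csv")
write.csv(sample_df, file = tmp_path, row.names = FALSE)
imported <- read.csv(file = tmp_path)

head(imported)   # first rows of the data
nrow(imported)   # number of rows (observations)
names(imported)  # column names
```

The same three checks can be applied to il_income and top_il_income once they are loaded.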

Arithmetic Operations with Data

We can extract values from the dataset to perform calculations.

DuPage = top_il_income$per_capita_income[1]
Lake = top_il_income$per_capita_income[2]
DuPage-Lake
## [1] 472
DuPage+Lake
## [1] 77390
(DuPage+Lake)/2
## [1] 38695
# Repeat the above arithmetic operations using instead McHenry and Sangamon counties 


McHenry = top_il_income$per_capita_income[3]
Sangamon = top_il_income$per_capita_income[4]
McHenry-Sangamon
## [1] 59
McHenry+Sangamon
## [1] 66177
(McHenry+Sangamon)/2
## [1] 33088.5

Basic Statistics

mean(il_income$per_capita_income)
## [1] 25164.14
median(il_income$per_capita_income)
## [1] 24808.5
quantile(il_income$per_capita_income)
##       0%      25%      50%      75%     100% 
## 14052.00 22666.00 24808.50 26899.75 38931.00
summary(il_income)
##       rank              county   per_capita_income   population     
##  Min.   :  1.00   Adams    : 1   Min.   :14052     Min.   :   4135  
##  1st Qu.: 26.25   Alexander: 1   1st Qu.:22666     1st Qu.:  14284  
##  Median : 51.50   Bond     : 1   Median :24808     Median :  26610  
##  Mean   : 51.50   Boone    : 1   Mean   :25164     Mean   : 126078  
##  3rd Qu.: 76.75   Brown    : 1   3rd Qu.:26900     3rd Qu.:  53319  
##  Max.   :102.00   Bureau   : 1   Max.   :38931     Max.   :5238216  
##                   (Other)  :96                                      
##      region     
##  Min.   :1.000  
##  1st Qu.:3.000  
##  Median :4.000  
##  Mean   :3.735  
##  3rd Qu.:5.000  
##  Max.   :5.000  
## 
# Repeat the basic statistics here using instead the data from the file top_il_income
mean(top_il_income$per_capita_income)
## [1] 32918.5
median(top_il_income$per_capita_income)
quantile(top_il_income$per_capita_income)


Vectors

# vector of numeric values
c(2, 3, 5, 8)
## [1] 2 3 5 8
# vector of logical values.
c(TRUE, FALSE, TRUE)
## [1]  TRUE FALSE  TRUE
# vector of character strings.
c("A", "B", "B-", "C", "D")
## [1] "A"  "B"  "B-" "C"  "D"


Lists

scores = c(80, 75, 55)  # vector of numeric values                   
grades = c("B", "C", "D-")  # vector of character strings.          
office_hours = c(TRUE, FALSE, FALSE) # vector of logical values.

# student = list(scores,grades,office_hours) # list of vectors
student = list(80, "B", TRUE)
student = list(75, "C", FALSE)   # each assignment overwrites the previous one
student = list(55, "D-", FALSE)  # student now holds only this last list

List Slicing

We can retrieve components of the list with the single square bracket [] operator.

student[1]     
## [[1]]
## [1] 55
student[2]
## [[1]]
## [1] "D-"
student[3]
## [[1]]
## [1] FALSE
# first two components of the list
student[1:2]
## [[1]]
## [1] 55
## 
## [[2]]
## [1] "D-"


Member Reference

Using the double square bracket [[]] operator, we can reference a member of the list directly. A single bracket [] returns a shorter list containing that member, not the member itself.

student[[1]] # First member of the list (the score)
## [1] 55

First element of that member

student[[1]][1]
## [1] 55

First three elements of that member; it holds a single value, so positions 2 and 3 are NA

student[[1]][1:3]
## [1] 55 NA NA
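The difference between the single and double bracket operators can be seen by checking the class of each result. This is a minimal sketch using the same student list as above.

```r
student = list(55, "D-", FALSE)

class(student[1])     # single bracket: still a list, of length one
## [1] "list"
class(student[[1]])   # double bracket: the stored value itself
## [1] "numeric"

student[[1]] + 5      # arithmetic works on the extracted value
## [1] 60
# student[1] + 5 would stop with an error, because a list cannot be added to a number
```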


Named List Members

It’s possible to assign names to list members and reference them by names instead of by numeric indexes.

student = list(myscores = scores, mygrades = grades , myoffice_hours = office_hours) 

student
## $myscores
## [1] 80 75 55
## 
## $mygrades
## [1] "B"  "C"  "D-"
## 
## $myoffice_hours
## [1]  TRUE FALSE FALSE
student$myscores
## [1] 80 75 55
student$mygrades
## [1] "B"  "C"  "D-"
student$myoffice_hours
## [1]  TRUE FALSE FALSE


Matrices

All columns in a matrix must have the same data type and the same length.

Create a numeric matrix of 5 rows and 4 columns made of sequential numbers 1:20

x_mat = matrix(1:20, nrow=5, ncol=4)

Retrieve the 4th column of matrix

x_mat[1:5,4]
## [1] 16 17 18 19 20

Retrieve the 3rd row of matrix

x_mat[3,1:4]
## [1]  3  8 13 18

Retrieve rows 2,3,4 of columns 1,2,3

x_mat[2:4,1:3]
##      [,1] [,2] [,3]
## [1,]    2    7   12
## [2,]    3    8   13
## [3,]    4    9   14
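Two further points may help here, shown as a sketch: an index left empty selects every row or column, and mixing types in a matrix forces all elements to one type. The matrix named mixed is made up for this example.

```r
x_mat = matrix(1:20, nrow = 5, ncol = 4)

dim(x_mat)     # number of rows, then number of columns
## [1] 5 4
x_mat[, 4]     # leaving the row index empty selects every row of column 4
## [1] 16 17 18 19 20

# Mixing numbers and text forces every element to become a character string
mixed = matrix(c(1, "a", 2, "b"), nrow = 2)
class(mixed[1, 1])
## [1] "character"
```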


Data Frames

A data frame is more general than a matrix, in that different columns can have different data types (numeric, character, logical, factor). It is a powerful way to work with mixed data structures.

Defining a Data Frame

When we need to store data in table form, we use data frames, which are created by combining lists of vectors of equal length. The variables of a data set are the columns and the observations are the rows.

The str() function helps us to display the internal structure of any R data structure or object to make sure that it’s correct.

str(il_income)
## 'data.frame':    102 obs. of  5 variables:
##  $ rank             : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ county           : Factor w/ 102 levels "Adams","Alexander",..: 16 22 49 99 45 60 101 64 86 10 ...
##  $ per_capita_income: int  30468 38931 38459 30791 30645 23937 24802 30728 23279 26087 ...
##  $ population       : int  5238216 933736 703910 687263 530847 307343 287078 266209 264052 208861 ...
##  $ region           : int  1 2 2 2 2 2 2 5 5 3 ...

Creating a Data Frame

Snapshot of the solar system.

name = c("Earth", "Mars", "Jupiter")
type = c("Terrestrial", "Terrestrial", "Gas giant")
diameter = c(1, 0.532, 11.209)
rotation = c(1, 1.03, 0.41)
rings = c(FALSE, FALSE, TRUE)

Now, by combining the vectors of equal size, we can create a data frame object.

planets_df = data.frame(name,type,diameter,rotation,rings)
planets_df
##      name        type diameter rotation rings
## 1   Earth Terrestrial    1.000     1.00 FALSE
## 2    Mars Terrestrial    0.532     1.03 FALSE
## 3 Jupiter   Gas giant   11.209     0.41  TRUE
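Once a data frame exists, its columns and rows can be accessed much like the lists and matrices seen earlier. The sketch below rebuilds the same planets_df so that it can run on its own; the variable ringed is introduced only for illustration.

```r
planets_df = data.frame(
  name = c("Earth", "Mars", "Jupiter"),
  type = c("Terrestrial", "Terrestrial", "Gas giant"),
  diameter = c(1, 0.532, 11.209),
  rotation = c(1, 1.03, 0.41),
  rings = c(FALSE, FALSE, TRUE)
)

planets_df$diameter                    # one column, returned as a vector
ringed = planets_df[planets_df$rings, ]  # keep only rows where rings is TRUE
ringed
nrow(planets_df)                       # number of observations
## [1] 3
str(planets_df)                        # structure: column names and types
```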


Resources

Exercises - Datacamp

Data Sources

Data samples used in this worksheet were downloaded from the U.S. Census Bureau American FactFinder site.

Overall Impact

  • R Notebooks let you run and edit R code and see the output immediately, without having to run entire sections of code, while creating an interactive document.

  • Please refer to this guide, or ask a TA or the instructor, if you have other questions.

  • http://rmarkdown.rstudio.com/r_notebooks.html