R Learning

Harrisburg University

2020-01-22

Introduction to R

  • R Programming Language & RStudio.

    • Let's get you started by installing R and RStudio.

      • We will explore how R works and the key windows in RStudio, including the console, editor, and graphics windows.

      • We will also look at how to create variables and data sets, and how to import and manipulate data in the R environment.

What is R?

  • R is a free software environment for statistical computing and graphics.

  • It is what’s known as ‘open source’: the people who developed R allow everyone to access their code.

  • In essence, R exists as a base package with a reasonable amount of functionality.

           Think of it as “A Giant TI-85 Calculator”

Why use R?

  • The main advantages of using R are that it is free, and it is a versatile and dynamic environment.

  • Written commands are a more efficient way to work than point-and-click menus, because they can be saved, rerun, and shared.

Install R Programming Language

  • To download the software you can use the following links & instructions:

Select a Server from CRAN

  • Pick the server that is closest to you:

    • When you're downloading packages, R will attempt to download them from the CRAN server you selected.

    • A closer location usually helps with a successful connection.

Choose the Proper OS

  • Choose the proper OS for installing the software:

Installation of RStudio

  • The screen should look like this!

RStudio - The Free Version

  • Install the free version of the software:

RStudio - The Console

The Console’s Purpose

  • The console is where you type commands and run them immediately, but note that nothing you type there is saved for later reference.

    • Use it for one-off commands that do not need to be recorded.

    • It is also where your input and output appear when you are not using a reporting method.

The Console’s Purpose

  • Example: To assign a value to a variable, type either of the following symbols, = or <-, directly into the console.
y = 3
y <- 3
  • Note, this will store the variable in your R environment.

The Console’s Purpose

Example of storing a variable from the console:

RStudio - The Environment Window

  • You can see your variables and the values that have been stored/assigned:

Predefined Functions in R

  • Let's try to create a vector of numbers and store it in our R environment for later use:

    • Use the c() function in your console.
list = c(1,2,3,4,5)
list
## [1] 1 2 3 4 5

Predefined Functions in R

  • Several mathematical functions are predefined in R, such as sqrt(), log(), exp(), and abs().

    • For Example:
sqrt(list)
## [1] 1.000000 1.414214 1.732051 2.000000 2.236068

Predefined Functions in R

  • More Examples:
log(list)
## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379
exp(list)
## [1]   2.718282   7.389056  20.085537  54.598150 148.413159
abs(list)
## [1] 1 2 3 4 5

Help with Packages and Functions

  • To see how to use any function in R, you can type either ?sqrt or help(sqrt).

Basics of R - Variable Types

  • Numeric

    • Integers/Numbers:

      • (e.g. 7, 0, 120)
  • They can easily be stored in your environment under any variable name you’d like.

    • Note, variable names cannot start with numbers.
x = 1
str(x)
##  num 1
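  • A minimal sketch of how R stores the numbers above, assuming nothing beyond base R. The variable names a and b are illustrative and not from the original slides. It shows that a number typed without a suffix is stored as a double, while the L suffix requests an integer.

```r
# Whole numbers typed without a suffix are stored as doubles.
a <- 7
b <- 7L                # the L suffix asks for integer storage

class(a)               # "numeric"
class(b)               # "integer"
is.numeric(b)          # TRUE: integers also count as numeric
```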

Basics of R - Variable Types

  • Strings

    • Characters/Letters:

      • (e.g. ‘Andy’, ‘Idiot’)
  • Strings are necessary to understand because many categorical variables are recorded in text format.

x = "a"
str(x)
##  chr "a"

Basics of R - Variable Types

  • Factors (encoded variables)

    • Uses numbers to represent different groups of data.

      • (e.g., Gender)
list = c("male", "female", "male", "male", "female")
x = as.factor(list)
print(x)
## [1] male   female male   male   female
## Levels: female male
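  • To make the "numbers representing groups" idea concrete, here is a small sketch using base R only. The variable name gender is illustrative. It shows the labels, the integer codes stored behind them, and a count per group.

```r
# Build a factor directly from character values.
gender <- factor(c("male", "female", "male", "male", "female"))

levels(gender)         # "female" "male" (sorted alphabetically by default)
as.integer(gender)     # 2 1 2 2 1, the codes stored behind the labels
table(gender)          # counts per level: female 2, male 3
```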

Basics of R - Variable Types

  • Date (the Date class; POSIXct is used for date-times)

    • Dates

      • (e.g., 21-06-1973, 06-21-73, 21-Jun-1973)
x = Sys.Date()  # Sys.Date() already returns a Date object
str(x)
##  Date[1:1], format: "2020-01-22"
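  • Dates written in different layouts can be parsed by telling as.Date() which format to expect. The sketch below is one way to do it, using two of the example layouts; the variable names d1 and d2 are illustrative. The month-name layout (21-Jun-1973) would use the %b code, which depends on the system locale, so it is left out here.

```r
# Parse the same calendar day written in two different layouts.
d1 <- as.Date("21-06-1973", format = "%d-%m-%Y")   # day-month-4 digit year
d2 <- as.Date("06-21-73",   format = "%m-%d-%y")   # month-day-2 digit year

d1 == d2                   # TRUE: both are 21 June 1973
format(d1, "%Y/%m/%d")     # "1973/06/21"
class(d1)                  # "Date"
```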

Basics of R - Defining Variables

  • To define a vector, use the c() function.

  • Here, c() combines a set of values into a single vector.

x = c(1,2,3,4,5)
  • By typing the name of a variable, or passing it to print(), you can see the values assigned to it:
print(x)
## [1] 1 2 3 4 5

Basics of R - Defining Variables

  • Other ways to define sequences include the : operator and the seq() function.

    • Note, it is good to know multiple ways to make vectors.

      • Each one has its strengths for different scenarios.

Basics of R - Defining Variables

  • The : method
x = 1:5
print(x)
## [1] 1 2 3 4 5
  • The seq() method
y = seq(from=1, to= 5, by= 1)
print(y)
## [1] 1 2 3 4 5
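  • A few more seq() variations, as a sketch of when it is more convenient than the : operator. The variable names are illustrative. length.out asks for a fixed number of evenly spaced values, a negative by counts down, and seq_len() builds 1 to n.

```r
quarters  <- seq(from = 0, to = 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00
countdown <- seq(from = 10, to = 1, by = -3)         # 10 7 4 1
first4    <- seq_len(4)                              # 1 2 3 4

print(quarters)
print(countdown)
print(first4)
```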

Basics of R - Defining Variables

  • We can repeat values several times using the rep() function.

    • For example: below, the number 1 is repeated 5 times.
rep(1, times = 5)
## [1] 1 1 1 1 1
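  • rep() also works on whole vectors. This sketch, with illustrative variable names, contrasts the times and each arguments.

```r
whole   <- rep(c("a", "b"), times = 3)    # "a" "b" "a" "b" "a" "b"
grouped <- rep(c("a", "b"), each = 3)     # "a" "a" "a" "b" "b" "b"
uneven  <- rep(1:2, times = c(3, 1))      # 1 1 1 2

print(whole)
print(grouped)
print(uneven)
```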

Basics of R - Defining Variables

  • We can also define matrices.

  • The nrow argument sets the number of rows of the matrix, and byrow = TRUE fills the matrix row by row.

x <- matrix(1:9, nrow = 3, byrow = TRUE)
print(x)
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
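  • For comparison, here is a sketch of the default column-wise fill. The names m_col and m_row are illustrative. Leaving out byrow fills down the columns, so the result is the transpose of the row-wise matrix.

```r
m_col <- matrix(1:9, nrow = 3)                  # byrow defaults to FALSE
m_row <- matrix(1:9, nrow = 3, byrow = TRUE)

m_col[, 1]                    # 1 2 3 (first column)
m_row[1, ]                    # 1 2 3 (first row)
dim(m_col)                    # 3 3
identical(t(m_col), m_row)    # TRUE
```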

Basics of R - Defining and Indexing Variables

  • Let us define y as a vector using the seq() function and store it in our environment:
y = seq(from = 1, to = 11, by = 2)
print(y)
## [1]  1  3  5  7  9 11

Basics of R - Defining and Indexing Variables

  • We can access any element of y by placing its index inside [] after the variable name.

    • For example: to read the third value of vector y, we can write the following:
y[3]
## [1] 5
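  • Square brackets accept more than a single position. A short sketch, redefining y so that it runs on its own:

```r
y <- seq(from = 1, to = 11, by = 2)   # 1 3 5 7 9 11

y[c(1, 3)]       # 1 5       (several positions at once)
y[2:4]           # 3 5 7     (a range of positions)
y[-1]            # 3 5 7 9 11 (a negative index drops that element)
y[length(y)]     # 11        (the last element)
```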

Basics of R - Indexing Variables

  • Logical statements can be used to extract specific values from the variables.

    • For example:

      • If we want to see all the values of y that are less than 3, we can write the following:
y[y<3]
## [1] 1
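  • It can help to look at the logical vector that does the selecting. This sketch redefines y so it is self-contained and uses a different threshold, 8, purely for illustration.

```r
y <- seq(from = 1, to = 11, by = 2)   # 1 3 5 7 9 11

y < 8                  # TRUE TRUE TRUE TRUE FALSE FALSE
y[y < 8]               # 1 3 5 7
y[y > 3 & y < 11]      # 5 7 9   (two conditions combined with &)
which(y > 8)           # 5 6     (positions, not values)
```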

Basics of R - Data Structures

  • So far, we have seen values, vectors, and matrices stored to variable names within our environment.

  • Data often already comes in a tabular format for us to analyze.

Basics of R - Data Structures

  • For Example:
library(datasets)
data(iris)
str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

Basics of R - Data Structures

  • However, datasets can have different types of structures.

    • Each is unique in its own form and is handled differently when programming.
  • Two main types of data structures exist:

    • Homogeneous (every element has the same type)

    • Heterogeneous (elements can have different types)

Basics of R - Homogeneous Data Structures

  • Vectors

    • We have seen that there are several ways to create vectors in R.
list = c(1,2,3,4,5)
print(list)
## [1] 1 2 3 4 5

Basics of R - Homogeneous Data Structures

  • Matrix

    • The matrix function is a nice method to structure any size matrix you require.
matrix = matrix(2:6, nrow = 5, byrow = FALSE)
print(matrix)
##      [,1]
## [1,]    2
## [2,]    3
## [3,]    4
## [4,]    5
## [5,]    6

Basics of R - Heterogeneous Data Structures

  • List

    • A vector and a list are very similar; however, a list can hold elements of different types, while an atomic vector holds elements of a single type (for example, all numeric or all character).

    • Lists are sometimes called recursive vectors, because a list can contain other lists.

      • This makes them fundamentally different from atomic vectors.

Basics of R - Heterogeneous Data Structures

  • For Example:
x <- list(1:3, "a", c(TRUE, FALSE, TRUE), c(2.3, 5.9))
print(x)
## [[1]]
## [1] 1 2 3
## 
## [[2]]
## [1] "a"
## 
## [[3]]
## [1]  TRUE FALSE  TRUE
## 
## [[4]]
## [1] 2.3 5.9
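  • Getting elements back out of a list uses two kinds of brackets. The sketch below rebuilds the same list so it runs on its own; the named list and its names id and label are hypothetical additions for illustration.

```r
x <- list(1:3, "a", c(TRUE, FALSE, TRUE), c(2.3, 5.9))

x[[1]]           # the element itself: the integer vector 1 2 3
x[1]             # a list of length 1 that still wraps the element
x[[4]][2]        # 5.9, the second value of the fourth element

# Elements can also be named and then pulled out with $.
named <- list(id = 1:3, label = "a")
named$label      # "a"
```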

Basics of R - Heterogeneous Data Structures

  • Data frames

    • They are the most important structure for storing data in R.
    • They combine the power of lists and matrices to make a structure suited to most statistical needs.

Basics of R - Heterogeneous Data Structures

  • For Example:
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
print(df)
##   x y
## 1 1 a
## 2 2 b
## 3 3 c

Basics of R - Heterogeneous Data Structures

  • A data frame is actually a list of equal-length columns that can also be indexed like a matrix.

  • So we can pull a column from the table as a vector, and all that is needed is the ‘$’.

df$x
## [1] 1 2 3

Basics of R - Heterogeneous Data Structures

  • A data frame also can be indexed like a matrix:
df[1:2,1]
## [1] 1 2
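  • A few more ways to index a data frame, as a sketch that rebuilds df so it is self-contained. The name big_x is illustrative. It shows filtering rows with a logical condition and selecting a column by name.

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

big_x <- df[df$x > 1, ]   # keep only the rows where x exceeds 1
print(big_x)

df[, "y"]                 # column y, selected by name
df[["x"]]                 # same values as df$x: 1 2 3
nrow(df)                  # 3
ncol(df)                  # 2
```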

Basics in R: Combining Data Frames

  • You can combine data frames using cbind and rbind:

  • When combining column-wise, the number of rows must match, but row names are ignored.

  • When combining row-wise, both the number and names of columns must match.

Basics in R: Combining Data Frames

  • For Example:

    • By column. Both data frames must have the same number of observations.
cbind(df, data.frame(z = 3:1))
##   x y z
## 1 1 a 3
## 2 2 b 2
## 3 3 c 1

Basics in R: Combining Data Frames

  • For Example:

    • By row.
rbind(df, data.frame(x = 10, y = "z"))
##    x y
## 1  1 a
## 2  2 b
## 3  3 c
## 4 10 z
  • Use plyr::rbind.fill to combine data frames that don’t have the same columns.

Basics in R: Installing Packages

  • Many useful packages can be used to access data, either built into R or imported from external files.

  • To install a package into R, call the following function with the name of the desired package in quotes:

    • install.packages("package_name")

Package Installation - RStudio Package Tool - (Lower-Right)

  • Select ‘Packages’ from the (Lower-Right) window.

  • You should be able to see a list of all the packages currently installed in your library.

Package Installation - RStudio Package Tool - (Lower-Right)

  • To install a new one, select the ‘Install’ feature and a window will open.

    • Simply type the name of the package you want to install and click Install.
  • It is recommended to keep the ‘Install dependencies’ box checked.

Basics in R: Accessing Data Built into R

  • R has built-in data sets that are easily accessed through the datasets package.

  • The package ships with R, so no installation is needed; take a look at only the head of the available data sets:

# datasets is part of the standard R distribution; no install.packages() call is needed
head(ls("package:datasets"))
## [1] "ability.cov"   "airmiles"      "AirPassengers" "airquality"   
## [5] "anscombe"      "attenu"

Basics in R: Accessing Data Built into R

  • All the available data sets:
# datasets is part of the standard R distribution; no install.packages() call is needed
ls("package:datasets")
##   [1] "ability.cov"           "airmiles"             
##   [3] "AirPassengers"         "airquality"           
##   [5] "anscombe"              "attenu"               
##   [7] "attitude"              "austres"              
##   [9] "beaver1"               "beaver2"              
##  [11] "BJsales"               "BJsales.lead"         
##  [13] "BOD"                   "cars"                 
##  [15] "ChickWeight"           "chickwts"             
##  [17] "co2"                   "CO2"                  
##  [19] "crimtab"               "discoveries"          
##  [21] "DNase"                 "esoph"                
##  [23] "euro"                  "euro.cross"           
##  [25] "eurodist"              "EuStockMarkets"       
##  [27] "faithful"              "fdeaths"              
##  [29] "Formaldehyde"          "freeny"               
##  [31] "freeny.x"              "freeny.y"             
##  [33] "HairEyeColor"          "Harman23.cor"         
##  [35] "Harman74.cor"          "Indometh"             
##  [37] "infert"                "InsectSprays"         
##  [39] "iris"                  "iris3"                
##  [41] "islands"               "JohnsonJohnson"       
##  [43] "LakeHuron"             "ldeaths"              
##  [45] "lh"                    "LifeCycleSavings"     
##  [47] "Loblolly"              "longley"              
##  [49] "lynx"                  "mdeaths"              
##  [51] "morley"                "mtcars"               
##  [53] "nhtemp"                "Nile"                 
##  [55] "nottem"                "npk"                  
##  [57] "occupationalStatus"    "Orange"               
##  [59] "OrchardSprays"         "PlantGrowth"          
##  [61] "precip"                "presidents"           
##  [63] "pressure"              "Puromycin"            
##  [65] "quakes"                "randu"                
##  [67] "rivers"                "rock"                 
##  [69] "Seatbelts"             "sleep"                
##  [71] "stack.loss"            "stack.x"              
##  [73] "stackloss"             "state.abb"            
##  [75] "state.area"            "state.center"         
##  [77] "state.division"        "state.name"           
##  [79] "state.region"          "state.x77"            
##  [81] "sunspot.month"         "sunspot.year"         
##  [83] "sunspots"              "swiss"                
##  [85] "Theoph"                "Titanic"              
##  [87] "ToothGrowth"           "treering"             
##  [89] "trees"                 "UCBAdmissions"        
##  [91] "UKDriverDeaths"        "UKgas"                
##  [93] "USAccDeaths"           "USArrests"            
##  [95] "UScitiesD"             "USJudgeRatings"       
##  [97] "USPersonalExpenditure" "uspop"                
##  [99] "VADeaths"              "volcano"              
## [101] "warpbreaks"            "women"                
## [103] "WorldPhones"           "WWWusage"

Basics in R: Accessing Data Built into R

  • Access a data set, looking at the first six rows only (the default for head()):
library(datasets)
data(mtcars)
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Basics in R: Accessing Data Built into R

  • Then access the last six rows of the data (the default for tail()):
tail(mtcars)
##                 mpg cyl  disp  hp drat    wt qsec vs am gear carb
## Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.7  0  1    5    2
## Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
## Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.5  0  1    5    4
## Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.5  0  1    5    6
## Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.6  0  1    5    8
## Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.6  1  1    4    2
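  • Both head() and tail() take an n argument when a different number of rows is wanted. A short sketch using the built-in mtcars data; the names top5 and last3 are illustrative.

```r
top5  <- head(mtcars, n = 5)   # exactly the first five rows
last3 <- tail(mtcars, n = 3)   # exactly the last three rows

print(top5)
print(last3)
dim(mtcars)                    # 32 11 (rows, columns)
```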

Importing Data Into R - The Two Ways

  • There are two methods to import data into the R environment:

    1. Command line
    2. The RStudio ‘Import Dataset’ icon under the Environment tab (top-right corner).
  • You should note that the icon method is not as efficient as the command line because it sometimes changes the properties of the values.

Importing Data Into R - The Icon Method

  • Click the ‘Import Dataset’ button and select the file type from the drop-down menu.

Importing Data Into R - The Icon Method

  • Once you select the type of file to import, a window will open so you can browse to the file location.

  • The window lets you browse for a data file stored locally on your machine and set various import parameters.

Importing Data into R - Line Command

  • Many types of files can be imported into R, such as xls, xlsx, csv, and sav.

    • Each of these requires a different function to properly import the data into the R environment.
  • Let's start by installing the necessary package for each file format.

Importing XLSX Files…

#Microsoft Excel Files...
#install.packages('readxl')
library(readxl)
Texting <- read_excel("./data/Texting.xlsx")
str(Texting)
## Classes 'tbl_df', 'tbl' and 'data.frame':    50 obs. of  3 variables:
##  $ Group     : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Baseline  : num  52 68 85 47 73 57 63 50 66 60 ...
##  $ Six_months: num  32 48 62 16 63 53 59 58 59 57 ...

Importing CSV Files…

#CSV files...
#install.packages('readr')
library(readr)
hiccups = read_csv("./data/Hiccups.csv")
str(hiccups)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 15 obs. of  4 variables:
##  $ Baseline: num  15 13 9 7 11 14 20 9 17 19 ...
##  $ Tongue  : num  9 18 17 15 18 8 3 16 10 10 ...
##  $ Carotid : num  7 7 5 10 7 10 7 12 9 8 ...
##  $ Other   : num  2 4 4 5 4 3 3 3 4 4 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Baseline = col_double(),
##   ..   Tongue = col_double(),
##   ..   Carotid = col_double(),
##   ..   Other = col_double()
##   .. )
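  • CSV files can also be read with base R's read.csv() instead of readr's read_csv(), which avoids an extra package. The sketch below is a self-contained round trip: the scores data frame and its columns are hypothetical, and a temporary file stands in for a real data file.

```r
# Write a small made-up data frame to a temporary CSV, then read it back.
scores <- data.frame(id = 1:3, score = c(9.5, 7.25, 8))
path <- tempfile(fileext = ".csv")

write.csv(scores, path, row.names = FALSE)   # row.names = FALSE avoids an extra column
imported <- read.csv(path)

str(imported)      # inspect the column types that read.csv() chose
unlink(path)       # remove the temporary file
```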

Importing SAV Files…

#SAV Files...
#install.packages('haven')
library(haven)
chickflick = read_spss("./data/ChickFlick.sav")
str(chickflick)
## Classes 'tbl_df', 'tbl' and 'data.frame':    40 obs. of  3 variables:
##  $ gender : 'haven_labelled' num  1 1 1 1 1 1 1 1 1 1 ...
##   ..- attr(*, "label")= chr "Gender of Participant"
##   ..- attr(*, "format.spss")= chr "F8.0"
##   ..- attr(*, "labels")= Named num  1 2
##   .. ..- attr(*, "names")= chr  "Male" "Female"
##  $ film   : 'haven_labelled' num  1 1 1 1 1 1 1 1 1 1 ...
##   ..- attr(*, "label")= chr "Name of Film"
##   ..- attr(*, "format.spss")= chr "F8.0"
##   ..- attr(*, "display_width")= int 18
##   ..- attr(*, "labels")= Named num  1 2
##   .. ..- attr(*, "names")= chr  "Bridget Jones's Diary" "Memento"
##  $ arousal: num  22 13 16 10 18 24 13 14 19 23 ...
##   ..- attr(*, "label")= chr "Psychological Arousal During the Film"
##   ..- attr(*, "format.spss")= chr "F8.0"

Writing R Scripts

  • The console in RStudio is handy for running one-line commands, but none of that code is saved.

  • An easy way to start saving your code is an R script.

  • It opens in the editor window, where you write the programs for your analyses.

    • Store .R files in your working directory or project directory.

Access an R script as follows:

Setting a Working Directory

  • When writing R scripts you will want to import data and save results from an analysis.

  • The working directory is the location that relative file paths are resolved against when importing or exporting files.

  • This is easily accomplished using the setwd() function.

    • For Example:
setwd("data/")
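  • It is often useful to check the current working directory with getwd() and to restore it afterwards. In this sketch tempdir() stands in for a project folder, and the file name note.txt is hypothetical.

```r
old_dir <- getwd()                 # remember where we started
setwd(tempdir())                   # tempdir() stands in for a project folder

writeLines("hello", "note.txt")    # relative path: written inside the new directory
found <- file.exists("note.txt")   # TRUE while we are still in that directory
print(found)

unlink("note.txt")                 # clean up the example file
setwd(old_dir)                     # restore the original working directory
```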

Defining Functions in R

  • In R you can build your own functions that can perform a defined task with provided elements.

    • For Example:
sum.of.squares <- function(x,y) {
  x^2 + y^2
}
sum.of.squares(2,3)
## [1] 13
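  • Function arguments can be given default values, and arithmetic inside a function works element by element on vectors. The sketch below is a variant written for illustration; the name sum_sq and the default y = 0 are assumptions, not part of the original example.

```r
# y defaults to 0, so the function can be called with one argument.
sum_sq <- function(x, y = 0) {
  x^2 + y^2
}

sum_sq(2, 3)                  # 13
sum_sq(4)                     # 16 (y falls back to its default of 0)
sum_sq(1:3, c(1, 1, 1))       # 2 5 10 (applied element by element)
```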

Defining for Loops in R

  • for loops are great to perform tasks repeatedly.

    • For Example:
for (i in 2010:2015){
  print(paste("The year is", i))
}
## [1] "The year is 2010"
## [1] "The year is 2011"
## [1] "The year is 2012"
## [1] "The year is 2013"
## [1] "The year is 2014"
## [1] "The year is 2015"
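  • Loops are often used to store results rather than print them. This sketch, with illustrative variable names, fills a preallocated vector inside a loop and then shows that the vectorized paste() call gives the same result without a loop.

```r
years <- 2010:2015
labels <- character(length(years))     # preallocate space for the results

for (i in seq_along(years)) {          # seq_along() gives 1, 2, ..., length(years)
  labels[i] <- paste("The year is", years[i])
}

labels_vec <- paste("The year is", years)   # vectorized: no loop needed
identical(labels, labels_vec)               # TRUE
```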

Packages to Install

# Install any missing packages, then load them all.
packages <- c("revealjs", "knitr", "readxl", "readr", "haven")
for (pkg in packages) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
}
# datasets ships with R, so it only needs to be loaded.
library(datasets)