R Programming Language & RStudio.
Let's get you started by installing R and RStudio.
We will explore how R works and the key windows in R, including the console, editor, and graphical windows.
We will also look at how to create variables and data-sets, and how to import and manipulate data in the R environment.
R is a free software environment for statistical computing and graphics.
It is what’s known as ‘open source’: the people who developed R allow everyone to access their code.
In essence, R exists as a base package with a reasonable amount of functionality.
The main advantages of using R are that it is free, and it is a versatile and dynamic environment.
Once you are used to them, R’s written commands are a much more efficient way to work than pointing and clicking through menus.
To download the software you can use the following links & instructions:
R Project for Statistical Computing
Pick the server that is closest to you:
When you're downloading packages, R will attempt to download them from the CRAN mirror you have set.
A closer location usually helps with a successful connection.
After installing R, install an IDE (Integrated Development Environment) to make your programming faster and more convenient.
My suggestion is RStudio:
The console is where you type and run your commands, but note that nothing is saved for later reference.
It should be used for one-off commands that do not need to be recorded.
It is also where you see your input and output, if a reporting method is not being used.
Example of storing a variable from the console.
Let's try to create a vector of some numbers and store it in our R environment for later use:
Use the c() function in your console.
## [1] 1 2 3 4 5
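The command behind the output above is not shown. A minimal sketch, assuming the vector is stored under the name x (the name is an assumption):

```r
# Combine five numbers into a vector and store it under the (assumed) name x
x <- c(1, 2, 3, 4, 5)

# Typing the variable name prints the stored values to the console
x
```

The assignment arrow `<-` stores the result in the environment, so x can be reused in later commands.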
Several mathematical functions are predefined in R, such as sqrt(), log(), exp(), and abs().
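These functions work element-wise on a whole vector. A minimal sketch, assuming the input is the numbers 1 to 5 (the variable name x is an assumption):

```r
# Assumed input: the integers 1 to 5
x <- 1:5

sqrt(x)  # square root of each element
log(x)   # natural logarithm of each element
exp(x)   # e raised to the power of each element
abs(x)   # absolute value of each element
```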
## [1] 1.000000 1.414214 1.732051 2.000000 2.236068
## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379
## [1] 2.718282 7.389056 20.085537 54.598150 148.413159
## [1] 1 2 3 4 5
To look up the documentation for a function, use ?(sqrt) or help(sqrt).
Numeric
Integers/Numbers:
They can easily be stored in your environment under any variable name you’d like.
## num 1
Strings
Characters/Letters:
Strings are necessary to understand because many categorical variables are recorded in text format.
## chr "a"
Encoding variables/Factors
Factors use numbers to represent different groups of data.
## [1] male female male male female
## Levels: female male
Date (POSIXct)
Dates
## Date[1:1], format: "2020-01-22"
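The commands that produced the outputs for these four data types are not shown. A minimal sketch that should give equivalent results, with all variable names being assumptions:

```r
# Numeric: a single number stored under an assumed name
num <- 1
str(num)

# String: a single character value
chr <- "a"
str(chr)

# Factor: text labels stored as integer codes plus a set of levels
sex <- factor(c("male", "female", "male", "male", "female"))
sex

# Date: as.Date() creates a Date object; as.POSIXct() would store a date-time instead
day <- as.Date("2020-01-22")
str(day)
```

Factor levels are sorted alphabetically by default, which is why female is listed before male.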
To define a vector you can use the c() function.
Here, you are creating a set of values using a simple c() call, which combines values into a vector.
## [1] 1 2 3 4 5
Other ways to define a sequence include the : operator and the seq() function.
Note that it is good to know several ways to create vectors.
## [1] 1 2 3 4 5
## [1] 1 2 3 4 5
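A minimal sketch of the two sequence methods, assuming the goal is the numbers 1 to 5 (variable names are assumptions):

```r
# The colon operator counts from the left value to the right value in steps of 1
a <- 1:5
a

# seq() does the same but lets you choose the step size with `by`
b <- seq(1, 5, by = 1)
b
```

seq() is the more flexible choice when the step is not 1, for example seq(0, 1, by = 0.25).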
We can repeat values several times using the rep() function.
## [1] 1 1 1 1 1
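A minimal sketch that reproduces this kind of output, assuming the value 1 is repeated five times (the variable name is an assumption):

```r
# Repeat the value 1 five times
ones <- rep(1, times = 5)
ones
```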
We can also define matrices.
The nrow argument sets the number of rows of the matrix, and byrow = TRUE fills the matrix row-wise.
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
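A minimal sketch of a call that builds a 3 x 3 matrix like the one above, assuming the values are 1 to 9 (the variable name m is an assumption):

```r
# Fill a matrix with 1..9, three rows, filling across rows rather than down columns
m <- matrix(1:9, nrow = 3, byrow = TRUE)
m
```

Without byrow = TRUE, R fills matrices column by column, so the first column would hold 1, 2, 3.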
## [1] 1 3 5 7 9 11
We can access any element of \(y\) by its index, using [] after the variable name.
## [1] 5
Logical statements can be used to extract specific values from the variables.
For example:
## [1] 1
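The commands behind these outputs are not shown. A minimal sketch that gives the same results, assuming y holds the odd numbers from 1 to 11 and that the logical condition is y < 3 (both are assumptions):

```r
# Assumed definition of y: odd numbers from 1 to 11
y <- seq(1, 11, by = 2)
y

# Index with a position: the third element
y[3]

# Index with a logical statement: keep only elements smaller than 3
y[y < 3]
```

The expression y < 3 produces a TRUE/FALSE value for every element, and [] keeps the elements where it is TRUE.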
So far, we have seen values, vectors, and matrices stored to variable names within our environment.
Often, data already comes in a tabular format for us to analyze.
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
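Output of this shape comes from calling str() on a data frame. A minimal sketch using the built-in iris data-set, which matches the column names shown above:

```r
# iris ships with R in the datasets package, so no import is needed
str(iris)

# Number of rows and columns
dim(iris)
```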
However, datasets can have different types of structures.
Two main types of data structures exist:
Homogeneous
Heterogeneous
Vectors
## [1] 1 2 3 4 5
Matrix
## [,1]
## [1,] 2
## [2,] 3
## [3,] 4
## [4,] 5
## [5,] 6
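Vectors and matrices are homogeneous: every element has the same type. A minimal sketch that produces structures like the two above, with variable names being assumptions:

```r
# A vector of five integers
v <- 1:5
v

# A matrix with a single column holding 2..6
m <- matrix(2:6, ncol = 1)
m

# Mixing types in a vector forces everything to one common type (here: character)
mixed <- c(1, "a")
class(mixed)
```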
List
A vector and a list are very similar; however, a vector can only hold values of a single type (all numbers, all text, or all logical values), while a list can hold elements of different types.
Lists are sometimes called recursive vectors, because a list can contain other lists.
## [[1]]
## [1] 1 2 3
##
## [[2]]
## [1] "a"
##
## [[3]]
## [1] TRUE FALSE TRUE
##
## [[4]]
## [1] 2.3 5.9
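A minimal sketch of a list like the one printed above, with four elements of different types (the variable name is an assumption):

```r
# Each element keeps its own type and length
my_list <- list(1:3, "a", c(TRUE, FALSE, TRUE), c(2.3, 5.9))
my_list

# Double brackets extract a single element
my_list[[2]]
```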
Data frames
## x y
## 1 1 a
## 2 2 b
## 3 3 c
A data frame behaves like both a matrix and a list: it is a list of equal-length columns arranged as a table.
So we can pull a single column out of the table as a vector, and all that is needed is the ‘$’.
## [1] 1 2 3
## [1] 1 2
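The commands behind these outputs are not shown. A minimal sketch, assuming the data frame is called df and that the second output takes the first two values of column x (both are assumptions):

```r
# Build a small data frame with a numeric and a text column
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
df

# Pull column x out as a vector
df$x

# Index into that vector like any other
df$x[1:2]
```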
You can combine data frames using cbind and rbind:
When combining column-wise, the number of rows must match, but row names are ignored.
When combining row-wise, both the number and names of columns must match.
For example, combining column-wise:
## x y z
## 1 1 a 3
## 2 2 b 2
## 3 3 c 1
For example, combining row-wise:
## x y
## 1 1 a
## 2 2 b
## 3 3 c
## 4 10 z
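A minimal sketch of calls that produce tables like the two above, assuming a starting data frame df with columns x and y (all variable names are assumptions):

```r
# Starting table
df <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))

# Column-wise: the new data frame must have the same number of rows
wide <- cbind(df, data.frame(z = c(3, 2, 1)))
wide

# Row-wise: the new data frame must have the same column names
long <- rbind(df, data.frame(x = 10, y = "z"))
long
```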
Use plyr::rbind.fill to combine data frames that don’t have the same columns.
Many useful packages can be used to access data, either built into R or imported from external files.
To install packages into R use the following sequence with the name of the desired package:
Select the ‘Packages’ tab in the lower-right window.
You should be able to see a list of all packages currently installed.
To install a new one, select the ‘Install’ button and a window will open.
It is recommended to tick the ‘Install dependencies’ box.
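The same installation can be done from the console with install.packages(). A minimal sketch, where install_if_missing is a hypothetical helper written for this example and not part of R:

```r
# Hypothetical helper: install a package only when it is not already available
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # dependencies = TRUE mirrors ticking the 'Install dependencies' box
    install.packages(pkg, dependencies = TRUE)
  }
  # Report whether the package can now be loaded
  requireNamespace(pkg, quietly = TRUE)
}

# 'stats' ships with R, so this call does not download anything
ok <- install_if_missing("stats")
ok
```

Replace "stats" with the name of the package you need, for example "readxl"; installing requires an internet connection.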
R has built-in data-sets that can be easily accessed through the datasets package.
The package ships with R, so there is nothing extra to install; load it and take a look at just the head of the list of available data-sets:
## [1] "ability.cov" "airmiles" "AirPassengers" "airquality"
## [5] "anscombe" "attenu"
## [1] "ability.cov" "airmiles"
## [3] "AirPassengers" "airquality"
## [5] "anscombe" "attenu"
## [7] "attitude" "austres"
## [9] "beaver1" "beaver2"
## [11] "BJsales" "BJsales.lead"
## [13] "BOD" "cars"
## [15] "ChickWeight" "chickwts"
## [17] "co2" "CO2"
## [19] "crimtab" "discoveries"
## [21] "DNase" "esoph"
## [23] "euro" "euro.cross"
## [25] "eurodist" "EuStockMarkets"
## [27] "faithful" "fdeaths"
## [29] "Formaldehyde" "freeny"
## [31] "freeny.x" "freeny.y"
## [33] "HairEyeColor" "Harman23.cor"
## [35] "Harman74.cor" "Indometh"
## [37] "infert" "InsectSprays"
## [39] "iris" "iris3"
## [41] "islands" "JohnsonJohnson"
## [43] "LakeHuron" "ldeaths"
## [45] "lh" "LifeCycleSavings"
## [47] "Loblolly" "longley"
## [49] "lynx" "mdeaths"
## [51] "morley" "mtcars"
## [53] "nhtemp" "Nile"
## [55] "nottem" "npk"
## [57] "occupationalStatus" "Orange"
## [59] "OrchardSprays" "PlantGrowth"
## [61] "precip" "presidents"
## [63] "pressure" "Puromycin"
## [65] "quakes" "randu"
## [67] "rivers" "rock"
## [69] "Seatbelts" "sleep"
## [71] "stack.loss" "stack.x"
## [73] "stackloss" "state.abb"
## [75] "state.area" "state.center"
## [77] "state.division" "state.name"
## [79] "state.region" "state.x77"
## [81] "sunspot.month" "sunspot.year"
## [83] "sunspots" "swiss"
## [85] "Theoph" "Titanic"
## [87] "ToothGrowth" "treering"
## [89] "trees" "UCBAdmissions"
## [91] "UKDriverDeaths" "UKgas"
## [93] "USAccDeaths" "USArrests"
## [95] "UScitiesD" "USJudgeRatings"
## [97] "USPersonalExpenditure" "uspop"
## [99] "VADeaths" "volcano"
## [101] "warpbreaks" "women"
## [103] "WorldPhones" "WWWusage"
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
## mpg cyl disp hp drat wt qsec vs am gear carb
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.7 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.9 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.5 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.5 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.6 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.6 1 1 4 2
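The commands behind the listings above are not shown. A minimal sketch that produces output of the same kind, assuming the names were listed with ls() and that mtcars was previewed with head() and tail():

```r
library(datasets)

# Names of everything the datasets package provides
available <- ls("package:datasets")
head(available)   # first six names only
available         # the full list (its length depends on your R version)

# First and last six rows of the mtcars data-set
head(mtcars)
tail(mtcars)
```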
There are two methods to import data into the R environment: the import icon in RStudio and the command line.
You should note that the icon method is not as efficient as the command line because it sometimes changes the properties of the values.
Once you select the type of file to import, a window will open so you can browse for the file location.
The window provides a section to browse for a data file locally on your machine and set various parameter controls.
Many types of files can be imported into R, such as xls, xlsx, csv, sav, etc.
Let's start by installing the necessary packages for each type of file format.
#Microsoft Excel Files...
#install.packages('readxl')
library(readxl)
Texting <- read_excel("./data/Texting.xlsx")
str(Texting)
## Classes 'tbl_df', 'tbl' and 'data.frame': 50 obs. of 3 variables:
## $ Group : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Baseline : num 52 68 85 47 73 57 63 50 66 60 ...
## $ Six_months: num 32 48 62 16 63 53 59 58 59 57 ...
#CSV files...
#install.packages('readr')
library(readr)
hiccups <- read_csv("./data/Hiccups.csv")
str(hiccups)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 15 obs. of 4 variables:
## $ Baseline: num 15 13 9 7 11 14 20 9 17 19 ...
## $ Tongue : num 9 18 17 15 18 8 3 16 10 10 ...
## $ Carotid : num 7 7 5 10 7 10 7 12 9 8 ...
## $ Other : num 2 4 4 5 4 3 3 3 4 4 ...
## - attr(*, "spec")=
## .. cols(
## .. Baseline = col_double(),
## .. Tongue = col_double(),
## .. Carotid = col_double(),
## .. Other = col_double()
## .. )
#SAV Files...
#install.packages('haven')
library(haven)
chickflick <- read_spss("./data/ChickFlick.sav")
str(chickflick)
## Classes 'tbl_df', 'tbl' and 'data.frame': 40 obs. of 3 variables:
## $ gender : 'haven_labelled' num 1 1 1 1 1 1 1 1 1 1 ...
## ..- attr(*, "label")= chr "Gender of Participant"
## ..- attr(*, "format.spss")= chr "F8.0"
## ..- attr(*, "labels")= Named num 1 2
## .. ..- attr(*, "names")= chr "Male" "Female"
## $ film : 'haven_labelled' num 1 1 1 1 1 1 1 1 1 1 ...
## ..- attr(*, "label")= chr "Name of Film"
## ..- attr(*, "format.spss")= chr "F8.0"
## ..- attr(*, "display_width")= int 18
## ..- attr(*, "labels")= Named num 1 2
## .. ..- attr(*, "names")= chr "Bridget Jones's Diary" "Memento"
## $ arousal: num 22 13 16 10 18 24 13 14 19 23 ...
## ..- attr(*, "label")= chr "Psychological Arousal During the Film"
## ..- attr(*, "format.spss")= chr "F8.0"
The console in RStudio is handy for running one-line commands, but none of the code will be saved.
An easy way to start saving your code is an R script.
It's written in the editor window, where you create the programs for any of your analyses.
When writing R scripts you will want to import data and store results from an analysis.
To do that, set the working directory: the location that files are imported from and exported to.
This is easily accomplished using the setwd() function.
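A minimal sketch of checking and changing the working directory. It uses the session's temporary folder so it runs anywhere; in practice you would pass the path to your own project folder instead:

```r
# Remember where we are now
old_dir <- getwd()

# Move to another folder (tempdir() is only a stand-in for your project path)
setwd(tempdir())
new_dir <- getwd()
new_dir

# Go back to the original folder
setwd(old_dir)
```

Relative paths such as "./data/Hiccups.csv" are resolved against the working directory, so setting it once at the top of a script keeps the import lines short.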
In R you can build your own functions that perform a defined task using the arguments you provide.
## [1] 13
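The function behind this output is not shown. A minimal sketch of a user-defined function that returns the same value, where the function name and the inputs 6 and 7 are assumptions chosen for illustration:

```r
# Hypothetical function: add two numbers and return the sum
add_numbers <- function(a, b) {
  a + b
}

# Call it with two arguments
result <- add_numbers(6, 7)
result
```

The value of the last expression in the body is what the function returns, so an explicit return() is optional here.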
For loops are great for performing a task repeatedly.
## [1] "The year is 2010"
## [1] "The year is 2011"
## [1] "The year is 2012"
## [1] "The year is 2013"
## [1] "The year is 2014"
## [1] "The year is 2015"
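A minimal sketch of a for loop that prints lines like the ones above, assuming it iterates over the years 2010 to 2015 (the loop variable name is an assumption):

```r
# Run the body once for each year in the sequence
for (year in 2010:2015) {
  print(paste("The year is", year))
}
```

paste() joins its arguments with a space, and print() is needed inside a loop because values are not printed automatically there.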
if (!require("revealjs")) {
  install.packages("revealjs")
  library(revealjs)
}
if (!require("knitr")) {
  install.packages("knitr")
  library(knitr)
}
if (!require("datasets")) {
  install.packages("datasets")
  library(datasets)
}
if (!require("readxl")) {
  install.packages("readxl")
  library(readxl)
}
if (!require("readr")) {
  install.packages("readr")
  library(readr)
}
if (!require("haven")) {
  install.packages("haven")
  library(haven)
}