Week One Introduction

The first step is to install R, then RStudio, and then R Markdown.

R Markdown is one way to break up your code and make it easier to work with in RStudio. It also lets you publish your results on the internet.

You can write comments in the white space and insert a chunk of R code by clicking the Insert button and selecting R. Your code needs to go between the opening and closing lines of the chunk.

If nothing else, R can work as a fancy calculator.

To run code in R Markdown, you can hit the play button for each chunk, hit Run at the top, or, if you only want to run some of the code, highlight it and hit Command+Enter (Ctrl+Enter on a PC).

5+5

## [1] 10

10/5

## [1] 2

(10*2+6/80)^3

## [1] 8090.338

Most of the time you will want to assign values to variables so you can work with them later on.

a = 5
b = 10
a

## [1] 5

b

## [1] 10

a/b

## [1] 0.5

a^b

## [1] 9765625

a*b

## [1] 50

You can assign letters and phrases as well, but you need to put them in quotation marks. To print the result, just write the variable name on its own line.

R = "R Rules SPSS Drools!!"
R

## [1] "R Rules SPSS Drools!!"

R has several different types of data that a variable can hold. We will review integer, double, and factor.

Integers and doubles both contain only numbers; the difference is that doubles can store decimals, while integers store whole numbers only.

The c function concatenates its arguments and is one way to combine elements like numbers in R.

integer = as.integer(c(2,4,5,6))
typeof(integer)

## [1] "integer"

double = c(4.5, 6.5, 9, 10)
typeof(double)

## [1] "double"
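As a small sketch of the difference between the two numeric types (the variable names here are made up for illustration), note that a number typed without a decimal is still stored as a double unless you add the L suffix or convert it with as.integer:

```r
# Hypothetical variable names, used only for this illustration
whole_number = 5       # stored as a double by default
integer_number = 5L    # the L suffix asks R for an integer

typeof(whole_number)   # "double"
typeof(integer_number) # "integer"

# as.integer drops the decimal part rather than rounding
as.integer(4.9)        # 4

# An integer and a double holding the same value compare as equal
whole_number == integer_number  # TRUE
```

For most analyses the distinction will not matter, which is why the two types can usually be treated the same way.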

Factors can be either numbers or words. For example, a gender factor could be coded as male, female, and another gender identity, or as 0, 1, and 2.

If gender is coded as numbers, you will want to tell R that it is a factor by converting it with the as.factor function and overwriting the variable (or making a new variable).

If your variable is coded as words, R uses the alphabetically first word as the reference level by default. You can change this by using the relevel function and setting a new reference level.

genderNumbers = as.factor(c(0,1,2))
genderNumbers

## [1] 0 1 2
## Levels: 0 1 2

genderWords = as.factor(c("Male", "Female", "Another gender identity"))
genderWords

## [1] Male                    Female                  Another gender identity
## Levels: Another gender identity Female Male

genderWords = relevel(genderWords, ref = "Male")
genderWords

## [1] Male                    Female                  Another gender identity
## Levels: Male Another gender identity Female
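If your data use number codes but you would rather see words, one possible approach is the factor function with its levels and labels arguments. This is only a sketch: the variable names and the mapping of 0, 1, and 2 to these particular words are assumptions made for the example.

```r
# Hypothetical data: gender stored as the codes 0, 1, 2
genderCodes = c(0, 1, 2, 1, 0)

# Assumed mapping for this example: 0 = Male, 1 = Female, 2 = Another gender identity
genderLabeled = factor(genderCodes,
                       levels = c(0, 1, 2),
                       labels = c("Male", "Female", "Another gender identity"))
genderLabeled

# levels() lists the categories; the first one is the reference level
levels(genderLabeled)

# table() counts how many observations fall in each level
table(genderLabeled)
```

Because the levels are given explicitly here, the reference level is the first label listed ("Male") rather than the alphabetically first word.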

There are several different data structures in R as well. We will cover vectors, matrices, and data frames.

We have mostly been dealing with vectors so far. A vector is a single sequence of values, like one row of data. You can add two vectors of the same length, and each element will be added to the corresponding element.

vector_var1 = as.vector(c(2,3,45))
vector_var1

## [1]  2  3 45

vector_var2 = as.vector(c(5,4,3))

vector_var12 = vector_var1+vector_var2; vector_var12

## [1]  7  7 48
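Addition is not the only operation that works element by element. The sketch below uses two made-up vectors (the names are hypothetical) to show multiplication, multiplying by a single number, and two functions that summarize a whole vector:

```r
# Hypothetical vectors, used only for this illustration
scores_a = c(2, 3, 45)
scores_b = c(5, 4, 3)

# Each element is multiplied by the corresponding element
scores_a * scores_b   # 10 12 135

# A single number is applied to every element
scores_a * 2          # 4 6 90

# length() counts the elements and sum() adds them up
length(scores_a)      # 3
sum(scores_a)         # 50
```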

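We can also arrange values into a matrix. You will need to specify the number of rows and columns. In the example below, a vector of ten values is placed into five rows and two columns, and R fills the matrix one column at a time. If you build a matrix from two variables instead, there should be two columns and one row per data point, so two vectors with three data points each give three rows and two columns. Vectors with differing numbers of data points should not be combined this way. To subset the data you can use [], with the row number before the comma and the column number after it.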

matrix_example = c(1:10); matrix_example

##  [1]  1  2  3  4  5  6  7  8  9 10

matrix_example = matrix(matrix_example, nrow = 5, ncol = 2); matrix_example

##      [,1] [,2]
## [1,]    1    6
## [2,]    2    7
## [3,]    3    8
## [4,]    4    9
## [5,]    5   10

#Rows 
matrix_example[1,]

## [1] 1 6

#Columns
matrix_example[,1]

## [1] 1 2 3 4 5

#Both
matrix_example[1,2]

## [1] 6
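To combine two separate vectors into a matrix, one option is the cbind function, which binds vectors together as columns. The sketch below uses made-up vector names and is meant only to illustrate the idea of two variables becoming two columns:

```r
# Hypothetical vectors with three data points each
col_one = c(2, 3, 45)
col_two = c(5, 4, 3)

# cbind() places each vector in its own column
matrix_combined = cbind(col_one, col_two)
matrix_combined

# dim() reports the number of rows, then the number of columns
dim(matrix_combined)           # 3 2

# Subsetting works the same way as before
matrix_combined[3, ]           # third row
matrix_combined[, "col_two"]   # a column can also be selected by name
```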

The most common data structure you will be working with is the data frame. Data frames need variable names.

You can use the $ to get the variables, use the matrix notation, or use attach and then refer to the variables by their actual names.

data.frame12 = data.frame(vector_var1,vector_var2)
data.frame12

##   vector_var1 vector_var2
## 1           2           5
## 2           3           4
## 3          45           3

data.frameNames = data.frame(var1 = c(1,2,3), var2 = c(4,5,6))
data.frameNames

##   var1 var2
## 1    1    4
## 2    2    5
## 3    3    6

data.frame12$vector_var1

## [1]  2  3 45

data.frame12[,1]

## [1]  2  3 45

attach(data.frameNames)
var1

## [1] 1 2 3

var2

## [1] 4 5 6
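There are a few other ways to reach the variables in a data frame without attaching it. The following is a sketch with a made-up data frame (example_df is a hypothetical name) that shows selecting by name, selecting a single cell, and using with to evaluate an expression inside the data frame:

```r
# Hypothetical data frame, used only for this illustration
example_df = data.frame(var1 = c(1, 2, 3), var2 = c(4, 5, 6))

# names() lists the variable names
names(example_df)

# Double brackets with a name return the variable as a vector
example_df[["var2"]]            # 4 5 6

# Row 2 of the var1 column
example_df[2, "var1"]           # 2

# with() lets you use the variable names directly for one expression
with(example_df, var1 + var2)   # 5 7 9

# $ can also create a new variable in the data frame
example_df$total = example_df$var1 + example_df$var2
example_df
```

One reason to consider these is that they always say which data frame a variable comes from, which can help when two data frames share variable names.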

You can also use logical operations like you would in Excel.

var1 > var2

## [1] FALSE FALSE FALSE

var1 == var2

## [1] FALSE FALSE FALSE

var2 >= var1

## [1] TRUE TRUE TRUE
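The TRUE and FALSE values that these comparisons return can be put to work. As a sketch with made-up vectors (the names are hypothetical), you can use them to pick out elements, count how many cases meet a condition, or combine two conditions with &:

```r
# Hypothetical vectors, used only for this illustration
first_var = c(1, 2, 3)
second_var = c(4, 5, 6)

# Store the result of a comparison
is_large = second_var >= 5
is_large                             # FALSE TRUE TRUE

# Keep only the elements where the condition is TRUE
first_var[is_large]                  # 2 3

# TRUE counts as 1, so sum() counts the cases that meet the condition
sum(is_large)                        # 2

# & requires both conditions to be TRUE
first_var > 1 & second_var < 6       # FALSE TRUE FALSE
```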

Before reading in data, the first thing you want to do is set the working directory. This tells R where you want to read in and store data sets. In RStudio, go to Session, then Set Working Directory, then Choose Directory. Then you can copy that path into the code so you don’t have to do that every time.

Note: I am working on a Mac, so if you have a PC, don’t copy and paste the setwd line directly from this page. Find the specific file path for your own computer instead.

Let’s first export the data set that we have to a csv file because that is the easiest file type to work with. We can use the write.csv function to do that. You will usually want to set row.names to FALSE so that the row numbers are not saved as an extra column.

Then you can read the csv file using the read.csv function. Most of the time the first row in the data set will be the variable names, so you will need to set header to TRUE. You can also use na.strings to specify which values should be treated as missing (NA).

setwd("~/Desktop")
write.csv(data.frameNames, "data.frameNames.csv", row.names = FALSE)
data.frameNames = read.csv("data.frameNames.csv", header = TRUE, na.strings = c("na", " "))
data.frameNames

##   var1 var2
## 1    1    4
## 2    2    5
## 3    3    6
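To see what na.strings does, here is a sketch that writes a small made-up data set containing a missing value and reads it back. The data frame, the column names, and the use of tempfile (which creates a throwaway file so the example does not depend on your working directory) are all assumptions made for the illustration:

```r
# Hypothetical data with one missing score
example_df = data.frame(id = c(1, 2, 3), score = c(10, NA, 7))

# tempfile() gives a temporary file path, so no working directory is needed here
csv_path = tempfile(fileext = ".csv")

# na = "na" writes the missing value as the text na
write.csv(example_df, csv_path, row.names = FALSE, na = "na")

# na.strings tells read.csv that the text na means missing
reloaded = read.csv(csv_path, header = TRUE, na.strings = c("na", " "))
reloaded

# is.na() confirms which values were read as missing
is.na(reloaded$score)                # FALSE TRUE FALSE

# na.rm = TRUE skips the missing value when calculating the mean
mean(reloaded$score, na.rm = TRUE)   # 8.5
```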

To get some summary statistics we will need some additional statistical packages. This means we need to use the install.packages function to install the psych and prettyR packages and then load them with the library function.

You can get summary statistics fairly quickly using summary or describe for continuous variables and describe.factor for ordinal, categorical, and binary variables. Both psych and prettyR have a function named describe; because prettyR is loaded second, its version is the one that runs below.

describe.factor only works with a single variable; however, we will learn how to use it to provide counts and percentages for several variables at a time.

#install.packages("psych")
#install.packages("prettyR")
library(psych)
library(prettyR)

## 
## Attaching package: 'prettyR'

## The following objects are masked from 'package:psych':
## 
##     describe, skew

summary(data.frameNames)

##       var1          var2    
##  Min.   :1.0   Min.   :4.0  
##  1st Qu.:1.5   1st Qu.:4.5  
##  Median :2.0   Median :5.0  
##  Mean   :2.0   Mean   :5.0  
##  3rd Qu.:2.5   3rd Qu.:5.5  
##  Max.   :3.0   Max.   :6.0

describe(data.frameNames)

## Description of data.frameNames

## 
##  Numeric 
##      mean median var sd valid.n
## var1    2      2   1  1       3
## var2    5      5   1  1       3

describe.factor(genderWords)

##            
## genderWords     Male Another gender identity   Female
##     Count    1.00000                 1.00000  1.00000
##     Percent 33.33333                33.33333 33.33333
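If you want the same kind of numbers without any extra packages, base R can produce them too. The sketch below is one possible approach using a made-up data set (the data frame and its column names are hypothetical): mean and sd for continuous variables, and table with prop.table for counts and percentages.

```r
# Hypothetical data set, used only for this illustration
example_df = data.frame(age = c(21, 34, 29, 40),
                        score = c(10, 12, 14, 16),
                        group = factor(c("A", "B", "A", "A")))

# Mean and standard deviation for one variable
mean(example_df$age)   # 31
sd(example_df$age)

# sapply() applies a function to each column, giving several means or sds at once
continuous_vars = example_df[, c("age", "score")]
sapply(continuous_vars, mean)
sapply(continuous_vars, sd)

# Counts for a categorical variable
group_counts = table(example_df$group)
group_counts

# prop.table() turns counts into proportions; multiply by 100 for percentages
round(100 * prop.table(group_counts), 1)
```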

Also, to better understand a function or package, you can use the help function.

help("summary")

Homework: Find a data set that you have access to (let Matt know if you need a data set) and get means and standard deviations for the continuous variables (or all of them). Also, if possible, get counts and percentages for one categorical variable.