Now we are going to review dates and subsetting. Dates are often stored as text in the format m/d/y. R does not recognize this format by default; as.Date expects year-month-day (for example, 2018-02-04). To convert a date variable, use as.Date and give it the current format of the date through the format argument. Once the variable is a Date, we can subset for the date range that we want.
testDate = c("2/4/2018", "3/4/2018")
testDate
## [1] "2/4/2018" "3/4/2018"
testDate = as.Date(testDate, format = "%m/%d/%Y")
testDate
## [1] "2018-02-04" "2018-03-04"
testDate = subset(testDate, testDate > "2018-02-04")
testDate
## [1] "2018-03-04"
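The same steps extend to a range with both a lower and an upper bound. The sketch below is only an illustration: the vector visitDate and the two cutoff dates are made up and are not part of the data used in this session.

```r
# Hypothetical visit dates stored as text in m/d/Y format
visitDate = c("1/15/2018", "2/4/2018", "3/4/2018", "4/20/2018")

# Tell as.Date how the text is currently laid out
visitDate = as.Date(visitDate, format = "%m/%d/%Y")

# Hypothetical cutoffs: keep February 1 through March 31, 2018
startDate = as.Date("2018-02-01")
endDate = as.Date("2018-03-31")
inRange = subset(visitDate, visitDate >= startDate & visitDate <= endDate)
inRange

# format() turns the dates back into m/d/Y text for display
format(inRange, "%m/%d/%Y")
```

Converting the cutoffs with as.Date first makes the comparison explicit, so both sides of >= and <= are dates rather than text.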
Now we are moving to two slightly more advanced functions, ifelse and apply. Apply has related versions such as lapply and mapply, but we will focus on apply. I think the best way to understand apply is through an example. Let us say that we have a PHQ-9 with nine columns of data and we want to create a total score. Let’s run the code below to create the fake data set.
Also, if you want to see the first six rows, you can use head(data set name).
ordvar = c(1,2,3,4,5)
set.seed(124)
PHQ9 = data.frame(item1 = sample(ordvar, 100, replace = TRUE),
                  item2 = sample(ordvar, 100, replace = TRUE),
                  item3 = sample(ordvar, 100, replace = TRUE),
                  item4 = sample(ordvar, 100, replace = TRUE),
                  item5 = sample(ordvar, 100, replace = TRUE),
                  item6 = sample(ordvar, 100, replace = TRUE),
                  item7 = sample(ordvar, 100, replace = TRUE),
                  item8 = sample(ordvar, 100, replace = TRUE),
                  item9 = sample(ordvar, 100, replace = TRUE))
head(PHQ9)
## item1 item2 item3 item4 item5 item6 item7 item8 item9
## 1 1 2 2 2 5 4 3 2 3
## 2 3 3 5 4 2 1 2 2 5
## 3 3 3 1 1 2 2 5 1 3
## 4 2 1 5 4 5 4 4 4 3
## 5 2 1 5 2 3 4 5 4 4
## 6 2 5 2 2 1 5 1 3 4
Just like in Excel, sometimes we want to use an if else statement. If else statements allow us to change data based on some rules. For example, with a satisfaction variable we may want to create a binary variable where all agrees (strongly agree and agree) are 1 and all disagrees (strongly disagree and disagree) are 0. Our fake data set does not have a satisfaction variable, so we will apply the same idea to item1: scores of 4 or higher become 1 and everything else becomes 0. Note that this overwrites the original item1 values.
head(PHQ9$item1, 10)
## [1] 1 3 3 2 2 2 3 3 5 2
PHQ9$item1 = ifelse(PHQ9$item1 >=4, 1, 0)
head(PHQ9$item1, 10)
## [1] 0 0 0 0 0 0 0 0 1 0
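To connect this back to the satisfaction example, here is a small sketch. The satisfaction vector and its 1 to 5 coding (1 = strongly disagree, 5 = strongly agree) are assumptions made up for illustration. Saving the result under a new name keeps the original variable unchanged.

```r
# Hypothetical satisfaction ratings: 1 = strongly disagree ... 5 = strongly agree
satisfaction = c(5, 2, 4, 1, 3, 4)

# 1 if agree or strongly agree (4 or 5), otherwise 0
satisfiedBinary = ifelse(satisfaction >= 4, 1, 0)
satisfiedBinary

# ifelse can be nested when we need more than two groups
satisfiedGroup = ifelse(satisfaction >= 4, "agree",
                        ifelse(satisfaction == 3, "neutral", "disagree"))
table(satisfiedGroup)
```

With the two-group rule, a neutral rating of 3 ends up in the 0 group along with the disagrees, so the nested version is one way to keep neutral responses separate.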
Now we can use the apply function to sum across the nine columns within each row. First we tell R which data set to use; then we say 1, because we want the function applied to each row (2 would apply it to each column); then we tell it which function to use, which is sum. This creates a new variable, PHQ9Total, which we then combine with the original PHQ9 data set. Because item1 was recoded to 0 and 1 above, the totals use the recoded item1 values.
PHQ9Total = apply(PHQ9, 1, sum)
head(PHQ9Total)
## [1] 23 24 18 30 28 23
PHQ9 = data.frame(PHQ9, PHQ9Total)
head(PHQ9)
## item1 item2 item3 item4 item5 item6 item7 item8 item9 PHQ9Total
## 1 0 2 2 2 5 4 3 2 3 23
## 2 0 3 5 4 2 1 2 2 5 24
## 3 0 3 1 1 2 2 5 1 3 18
## 4 0 1 5 4 5 4 4 4 3 30
## 5 0 1 5 2 3 4 5 4 4 28
## 6 0 5 2 2 1 5 1 3 4 23
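One thing to watch for: once a total column has been added to the data set, running apply on the whole data set again would add the total into the sum. The sketch below shows one way to avoid that by naming the item columns first. The scores data set and the itemCols vector are made up for illustration and are not the PHQ9 data above.

```r
# Small made-up data set with three items
scores = data.frame(item1 = c(1, 0, 1),
                    item2 = c(3, 5, 2),
                    item3 = c(4, 1, 2))

# Name the item columns so a total column added later is never re-summed
itemCols = c("item1", "item2", "item3")

# The 1 applies sum to each row; a 2 would apply it to each column
scores$total = apply(scores[, itemCols], 1, sum)
scores

# rowSums() gives the same row totals
all(scores$total == rowSums(scores[, itemCols]))

# Column sums for comparison
colTotals = apply(scores[, itemCols], 2, sum)
colTotals
```

For simple sums and means, rowSums and rowMeans do the same job as apply; apply is more useful when the function is something else, such as max or a function you write yourself.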
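Now we will start with simple graphs. First, we add a fake gender variable and a fake GAD7 score to the PHQ9 data set to create a new data set, datWeekFour.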
genderSamp = c(1,0)
GAD7Samp = c(7:49)
set.seed(123)
datWeekFour = cbind(PHQ9, gender = sample(genderSamp, 100, replace = TRUE), GAD7 = sample(GAD7Samp, 100, replace = TRUE))
head(datWeekFour)
## item1 item2 item3 item4 item5 item6 item7 item8 item9 PHQ9Total gender
## 1 0 2 2 2 5 4 3 2 3 23 1
## 2 0 3 5 4 2 1 2 2 5 24 0
## 3 0 3 1 1 2 2 5 1 3 18 1
## 4 0 1 5 4 5 4 4 4 3 30 0
## 5 0 1 5 2 3 4 5 4 4 28 0
## 6 0 5 2 2 1 5 1 3 4 23 1
## GAD7
## 1 32
## 2 21
## 3 28
## 4 48
## 5 27
## 6 45
We will use ggplot2, one of the most widely used graphics packages in R. First we create the plot with the ggplot function by telling R which data set to use and then, inside the aes function, which variables in that data set we want on the x and y axes. This step alone draws only the empty axes, because we have not yet told ggplot what kind of graph to draw.
#install.packages("ggplot2")
library(ggplot2)
ggplot(datWeekFour, aes(GAD7, PHQ9Total))
Now we are ready to tell ggplot what kind of graph we want. Let us do a scatter plot first. To get a scatter plot, we add geom_point(); for a line graph, we add geom_line().
ggplot(datWeekFour, aes(GAD7, PHQ9Total)) +
  geom_point()
ggplot(datWeekFour, aes(GAD7, PHQ9Total)) +
  geom_line()
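Each piece added with + is a separate layer, so axis labels and a title can be added the same way. The sketch below is only an illustration: the plotData data set, the seed, and the label text are made up, and it assumes ggplot2 is already installed.

```r
library(ggplot2)

# Made-up scores for illustration (not the datWeekFour data above)
set.seed(1)
plotData = data.frame(GAD7 = sample(7:49, 100, replace = TRUE),
                      PHQ9Total = sample(9:45, 100, replace = TRUE))

# Start with the data and the variables, then add one layer at a time with +
p = ggplot(plotData, aes(GAD7, PHQ9Total)) +
  geom_point() +
  labs(x = "GAD-7 total", y = "PHQ-9 total",
       title = "PHQ-9 total by GAD-7 total")

# Printing the saved plot draws it
p
```

Saving the plot to a name such as p makes it easy to add more layers later, for example p + geom_line().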
What other graphics are you all interested in? Are you all interested in online graphics (https://shiny.rstudio.com/gallery/)? Or are you all more interested in statistics?