Data Preparation
Import Dataset.
K2_peaks <- read.table("/Users/mac/Desktop/DataSquad/DataUsage/Login Peaks (ACAD x 2018.09.01-2019.07.01).txt", sep = "\t")
Variables of the raw dataset includes lab names, week names, days of the week. The raw data is in the following format:
Raw Data
| Name |
Name2 |
Sunday |
Monday |
Tuesday |
Wednesday |
Thursday |
Friday |
Saturday |
NA |
| AAHI Misc |
|
|
|
|
|
|
|
|
NA |
|
Week of 8/26/18 |
|
|
|
|
|
|
1 |
NA |
|
Week of 9/2/18 |
1 |
1 |
2 |
1 |
1 |
1 |
1 |
NA |
|
Week of 9/9/18 |
2 |
2 |
1 |
2 |
2 |
3 |
2 |
NA |
|
Week of 9/16/18 |
2 |
3 |
2 |
1 |
1 |
1 |
2 |
NA |
|
Week of 9/23/18 |
2 |
3 |
3 |
2 |
2 |
1 |
2 |
NA |
Since we are only interested in data from Monday through Friday, we exclude the Saturday and Sunday data.
labdata <- K2_peaks[-1,-c(3,9,10)]
Since room name only exist in the first row of each group, we use the following function to add room name to each data row.
#Adding room to each data row instead of just first row in the group
for (i in 1: 3450){
if(labdata[i,1] != ""){
holder <- labdata[i,1]
}
else{
labdata[i,1] <- holder
}
}
Data is now in the following wide format, each row contains data of an entire week.
## Name Week Monday Tuesday Wednesday Thursday Friday
## 2 AAHI Misc
## 3 AAHI Misc Week of 8/26/18
## 4 AAHI Misc Week of 9/2/18 1 2 1 1 1
## 5 AAHI Misc Week of 9/9/18 2 1 2 2 3
## 6 AAHI Misc Week of 9/16/18 3 2 1 1 1
## 7 AAHI Misc Week of 9/23/18 3 3 2 2 1
## 8 AAHI Misc Week of 9/30/18 3 1 3 2 2
## 9 AAHI Misc Week of 10/7/18 2 3 3 3 2
## 10 AAHI Misc Week of 10/14/18 3 4 5 6 4
## 11 AAHI Misc Week of 10/21/18 3 4 2 2 3
We transformed the data from the wide format to the long format, so it becomes an appropriate dataframe for R to interpret. Monday through Friday becomes a category under the Day variable and correspond to a Peak value.
Empty fields represent days where there was no usage, thus are substituted with 0.
Non data-containing rows, such as row ##2 in the above data are dropped.
#Substitute empty cells with 0
labdata[is.na(labdata$Peak), 'Peak'] <- 0
#Change wide-format to long-format
labdata <- labdata %>% gather(key = "Day", value = "Peak", 3:7) %>% filter(Week != "")
The cleaned up data is in the following format.
## Name Week Day Peak
## 1 AAHI Misc Week of 8/26/18 Monday
## 2 AAHI Misc Week of 9/2/18 Monday 1
## 3 AAHI Misc Week of 9/9/18 Monday 2
## 4 AAHI Misc Week of 9/16/18 Monday 3
## 5 AAHI Misc Week of 9/23/18 Monday 3
## 6 AAHI Misc Week of 9/30/18 Monday 3
## 7 AAHI Misc Week of 10/7/18 Monday 2
## 8 AAHI Misc Week of 10/14/18 Monday 3
## 9 AAHI Misc Week of 10/21/18 Monday 3
## 10 AAHI Misc Week of 10/28/18 Monday 2
Change the variable types of Peak and Day.
labdata$Peak <- as.numeric(labdata$Peak)
labdata$Day <- as.factor(labdata$Day)
Group the data by the computer labs.
labdata <- labdata[order(labdata$Name),]
A Carleton acdemic year consists of three terms: the fall, the winter, and the spring terms. Each term spans across 11 weeks, including test days.
Since the data includes values of every week of last year, we matched each week with the corresponding term names. We can now create a categoricla variable 'Term' to label the data.
We filtered out the weeks that are not in school terms and left only the data we are interested in in a new dataframe labdata_term. Then we created the new 'Term' using the following condition.
labdata_term <- filter(labdata, Week %in% c(fallterm, winterterm, springterm))
labdata_term$Term <- ifelse(labdata_term$Week %in% fallterm, labdata_term$Term <- 'fall', ifelse(labdata_term$Week %in% winterterm, labdata_term$Term <- 'winter', labdata_term$Term <- 'spring'))
We would also like to create a 'Building' variable that indicates the campus building each computer lab is located. The following is an example of how Building variable is created for computer labs in CMC, Center for Mathematics and Computing.
labdata_term$Building[labdata_term$Name %in% c('CMC102', 'CMC104', 'CMC109', 'CMC110', 'CMC201', 'CMC304', 'CMC306', 'CMC307','CMC318')] <- 'CMC' #9 CMC labs
We also created a 'WeekId' variable to notate how many weeks it is into the term.
labdata_term$WeekId <- factor(1:11)