Analysis Code For Human Activity Recognition Using Smartphones
Read these files in from the UCI HAR Dataset directory:
From the test directory:
- X_test.txt (measurement data)
- y_test.txt (activity data)
- subject_test.txt (subject data)
From the train directory:
- X_train.txt (measurement data)
- y_train.txt (activity data)
- subject_train.txt (subject data)
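The loading step is not shown in this document, so the following is only a minimal sketch of how the six files could be read and stacked. The helper names `read_har_split` and `read_har` are hypothetical, and stacking the test rows above the train rows is an assumption; any order works as long as measurements, activities, and subjects are combined in the same order.

```r
# Hypothetical helper: read the three files of one split ("test" or "train").
read_har_split <- function(base_dir, split) {
  split_dir <- file.path(base_dir, split)
  list(
    data     = read.table(file.path(split_dir, paste0("X_", split, ".txt"))),
    activity = read.table(file.path(split_dir, paste0("y_", split, ".txt"))),
    subject  = read.table(file.path(split_dir, paste0("subject_", split, ".txt")))
  )
}

# Hypothetical helper: read both splits and stack them row-wise.
# Assumption: test rows first, then train rows, for all three tables alike.
read_har <- function(base_dir = "./UCI HAR Dataset") {
  test  <- read_har_split(base_dir, "test")
  train <- read_har_split(base_dir, "train")
  list(
    data     = rbind(test$data, train$data),
    activity = rbind(test$activity, train$activity),
    subject  = rbind(test$subject, train$subject)
  )
}
```

With the dataset unpacked in the working directory, `har <- read_har()` followed by `data <- har$data`, `activity <- har$activity`, and `subject <- har$subject` would give the three data frames that the rest of the script works on.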
Load and manipulate features data
- The file features.txt is read from the UCI HAR Dataset directory to create the “feats” data set.
- The feats data set is filtered to retain only the mean (mean()) and standard deviation (std()) variables.
- The names of the retained features are changed to improve uniformity, remove confusing symbols, and make them more descriptive.
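To make the filtering and renaming concrete, here is a small sketch that applies the same pattern test and the same substitutions as the script to three sample names written in the style of features.txt. The helper name `clean_feature_name` and the vector `sample_names` are hypothetical and exist only for this illustration.

```r
# Sample feature names in the style of features.txt (illustration only).
sample_names <- c("tBodyAcc-mean()-X", "fBodyAccMag-std()", "fBodyAcc-meanFreq()-X")

# Same test as the script: keep names containing "mean(" or "std(".
# "meanFreq()" is dropped because "mean" is not directly followed by "(".
keep <- grepl("mean\\(", sample_names) | grepl("std\\(", sample_names)

# Hypothetical helper that chains the script's substitutions in the same order.
clean_feature_name <- function(x) {
  x <- tolower(x)
  x <- gsub("-", "", x)
  x <- gsub("\\(", "", x)
  x <- gsub("\\)", "", x)
  x <- gsub("std", "standarddeviation", x)
  x <- gsub("tbody", "timebody", x)
  x <- gsub("tgravity", "timegravity", x)
  x <- gsub("acc", "acceleration", x)
  x <- gsub("fbody", "frequencybody", x)
  x <- gsub("mag", "magnitude", x)
  x
}

cleaned <- clean_feature_name(sample_names[keep])
print(cleaned)
# [1] "timebodyaccelerationmeanx"
# [2] "frequencybodyaccelerationmagnitudestandarddeviation"
```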
Manipulate measurement data to eliminate columns that do not pertain to retained features, and to attach descriptive variable names.
- Columns of the measurement data frame (called “data”) are selected using the index variable (feats$V1) retained from filtering the features data frame.
- Variable names are attached using the features variable (feats$V2).
## This next segment deals with cleaning up the features data set and using it to clean the measurement data set and label the variables
### Load features data set
feats <- read.table("./UCI HAR Dataset/features.txt", stringsAsFactors = FALSE)
### Select only the variables that correspond to the mean and standard deviation measurements
feats <- filter(feats, grepl("mean\\(", V2) | grepl("std\\(", V2))
### Remove unnecessary punctuation and change remaining variable names to lower case
feats$V2 <- tolower(feats$V2)
feats$V2 <- gsub("-", "", feats$V2)
feats$V2 <- gsub("\\(", "", feats$V2)
feats$V2 <- gsub("\\)", "", feats$V2)
### Give features variables more descriptive names
feats$V2 <- gsub("std", "standarddeviation", feats$V2)
feats$V2 <- gsub("tbody", "timebody", feats$V2)
feats$V2 <- gsub("tgravity", "timegravity", feats$V2)
feats$V2 <- gsub("acc", "acceleration", feats$V2)
feats$V2 <- gsub("fbody", "frequencybody", feats$V2)
feats$V2 <- gsub("mag", "magnitude", feats$V2)
### Use the column that holds the feature indices to select the corresponding columns from the measurement data
data <- select(data, feats$V1)
### Set variable names of the measurement data using the features data
colnames(data) <- feats$V2
Attach variables for activity and subject data to measurement data, and give activity descriptive names
- Activity and subject data are given descriptive variable names.
- Activity and subject columns are added to the measurement data to create one unified data frame (data).
- The activity, subject, and feats data frames are removed from the environment, leaving only the unified data frame.
- Numerical identifiers are replaced with descriptive activity names in the activity column.
## This section of the code covers labeling the activity and subject data, attaching them to the measurement data, and giving the activities descriptive names
### Label subject and activity data
colnames(activity) <- "activity"
colnames(subject) <- "subject"
### Attach the subject and activity to the measurement data to create one unified data set
data <- cbind(data, subject, activity)
rm(list = c("activity", "feats", "subject"))
### Give activity data descriptive names
data$activity[data$activity == 1] <- "walking"
data$activity[data$activity == 2] <- "walking upstairs"
data$activity[data$activity == 3] <- "walking downstairs"
data$activity[data$activity == 4] <- "sitting"
data$activity[data$activity == 5] <- "standing"
data$activity[data$activity == 6] <- "laying"
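The six assignments above can also be written as a single lookup. This is a minimal sketch of that alternative, not part of the original script; it assumes the activity column still holds the integer codes 1 to 6, and the names `activity_labels` and `label_activity` are hypothetical.

```r
# Labels in code order: position 1 is the label for code 1, and so on.
activity_labels <- c("walking", "walking upstairs", "walking downstairs",
                     "sitting", "standing", "laying")

# Hypothetical helper: index the label vector with the integer codes.
label_activity <- function(codes) {
  activity_labels[codes]
}

print(label_activity(c(5, 1, 6)))
# [1] "standing" "walking"  "laying"
```

Applied to the unified data frame, this would read `data$activity <- label_activity(data$activity)` and should be run in place of the six assignments, not after them, because by then the column already holds text.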
Create and save tidy data set
- A new data set, called “tidy_df” in the code, is created from the unified data frame by taking the mean of every measurement for each subject and activity.
- The tidy data frame is saved as tidydata.txt in the working directory.
## The final segment creates a second tidy data set that contains the average of each measurement for each activity and subject
### Use dplyr to create an independent tidy data set grouped by subject and activity, then average the measurements for each grouping
tidy_df <- data %>%
  group_by(subject, activity) %>%
  summarise_all(mean)
### Save data set as text file
write.table(tidy_df, file = "tidydata.txt", row.names = FALSE)
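One way to sanity-check the grouped means is to reproduce the step on a toy data frame. The sketch below uses base R's `aggregate()` instead of the dplyr pipeline, and the toy values and the name `toy` are made up for illustration.

```r
# Toy stand-in for the unified data frame: two subjects, one measurement.
toy <- data.frame(
  subject  = c(1, 1, 2, 2),
  activity = c("walking", "walking", "sitting", "sitting"),
  timebodyaccelerationmeanx = c(0.2, 0.4, 0.1, 0.3)
)

# Average every measurement column within each subject/activity pair.
toy_means <- aggregate(. ~ subject + activity, data = toy, FUN = mean)
print(toy_means)
```

Each subject/activity pair collapses to one row, so the toy result has two rows: a mean of 0.3 for subject 1 walking and 0.2 for subject 2 sitting.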