Analysis Code For Human Activity Recognition Using Smartphones

Read the following files from the UCI HAR Dataset directory:

From the test directory:

  • X_test.txt (measurement data)
  • y_test.txt (activity data)
  • subject_test.txt (subject data)

From the train directory:

  • X_train.txt (measurement data)
  • y_train.txt (activity data)
  • subject_train.txt (subject data)

Combine the test and training data to form three unified data sets.

  • The rows of X_train are appended below those of X_test to form a data frame named “data”.
  • The rows of y_train are appended below those of y_test to form the data frame “activity”.
  • The training subject data is appended below the test subject data to form the data frame “subject”.
  • The separate test and training data frames are then removed from the environment, leaving only the combined data sets.
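The binding order matters because the three groups are later joined column-wise by position. The sketch below uses tiny made-up frames (x_test, y_test, s_test and their training counterparts are illustrative names, not objects from the script) to show that binding every group test-first keeps row i of each result describing the same observation.

```r
# Toy stand-ins for the three groups (hypothetical values, not real HAR data)
x_test  <- data.frame(V1 = c(0.1, 0.2))
x_train <- data.frame(V1 = c(0.3, 0.4, 0.5))
y_test  <- data.frame(V1 = c(1L, 2L))
y_train <- data.frame(V1 = c(3L, 4L, 5L))
s_test  <- data.frame(V1 = c(2L, 2L))
s_train <- data.frame(V1 = c(1L, 1L, 3L))

# Bind every group in the same order: test rows first, then training rows
data     <- rbind(x_test, x_train)
activity <- rbind(y_test, y_train)
subject  <- rbind(s_test, s_train)

# Because the order matches, row i of each frame refers to the same observation,
# so a positional column bind lines them up correctly
combined <- cbind(data, subject = subject$V1, activity = activity$V1)
print(combined)
```

If one group were bound training-first, the cbind() would still succeed but would silently pair measurements with the wrong subjects and activities.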
## Load the required package (library() stops with an error if dplyr is not installed)
library(dplyr)
## This segment combines the test and training data
   
   ### The data sets needed for this procedure are the measurement data sets (prefixed X), the activity data sets (prefixed y),
   ### and the subject data sets. For each group (e.g. the measurement data), the test and training files are loaded and
   ### combined with rbind(), always with the test set first so that the rows stay aligned across the three groups.
   
        ###The measurement data set
        test <- read.table("./UCI HAR Dataset/test/X_test.txt")

        train <- read.table("./UCI HAR Dataset/train/X_train.txt")

        data <- rbind(test, train)

        rm(list = c("test", "train"))

        ###The activity data set
        activity_test <- read.table("./UCI HAR Dataset/test/y_test.txt")

        activity_train <- read.table("./UCI HAR Dataset/train/y_train.txt")

        activity <- rbind(activity_test, activity_train)

        rm(list = c("activity_test", "activity_train"))

        ###The subject data set
        subtest <- read.table("./UCI HAR Dataset/test/subject_test.txt")
        subtrain <- read.table("./UCI HAR Dataset/train/subject_train.txt")

        subject <- rbind(subtest, subtrain)

        rm(list = c("subtest", "subtrain"))
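The three read-and-bind blocks above repeat one pattern, so they could be folded into a helper. `read_combined()` below is a hypothetical name, not part of the original script; to keep the sketch runnable without the real dataset, it writes a tiny fake directory tree under tempdir().

```r
# Hypothetical helper: read the test and training files for one group
# (stem = "X", "y" or "subject") and row-bind them, test rows first
read_combined <- function(root, stem) {
  test  <- read.table(file.path(root, "test",  paste0(stem, "_test.txt")))
  train <- read.table(file.path(root, "train", paste0(stem, "_train.txt")))
  rbind(test, train)
}

# Build a throwaway directory tree that mimics the dataset layout
root <- file.path(tempdir(), "UCI HAR Dataset")
dir.create(file.path(root, "test"),  recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "train"), recursive = TRUE, showWarnings = FALSE)
writeLines(c("1", "2"),      file.path(root, "test",  "y_test.txt"))
writeLines(c("3", "4", "5"), file.path(root, "train", "y_train.txt"))

activity <- read_combined(root, "y")
print(activity$V1)  # 1 2 3 4 5
```

Against the real data, the three groups would be `read_combined("./UCI HAR Dataset", "X")`, `read_combined("./UCI HAR Dataset", "y")` and `read_combined("./UCI HAR Dataset", "subject")`, which also removes the need for the rm() calls.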

Load and manipulate the features data

  • The file features.txt is read from the UCI HAR Dataset directory to create the “feats” data set.
  • The feats data set is filtered to retain only the mean (mean()) and standard deviation (std()) variables.
  • The names of the retained features are changed to improve uniformity, remove confusing symbols, and make them more descriptive.

Manipulate the measurement data to drop columns that do not correspond to retained features and to attach descriptive variable names.

  • Columns of the measurement data frame (called “data”) are selected using the index column (feats$V1) retained after filtering the features data frame.
  • Variable names are attached using the feature names column (feats$V2).
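The filter keeps a feature only if its name contains a literal `mean(` or `std(`. The sample names below are hand-picked in the style of features.txt for illustration; anchoring the pattern on the opening parenthesis is what excludes the meanFreq() and angle(...) variables.

```r
# A few feature names in the style of features.txt (small hand-picked sample)
feature_names <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
                   "fBodyAcc-meanFreq()-X", "tBodyAccMag-mean()",
                   "angle(tBodyAccMean,gravity)")

# Same test as the filter() call: a literal "mean(" or "std(" must appear.
# grepl() is case-sensitive, so "Mean" inside angle(...) does not match either.
keep <- grepl("mean\\(", feature_names) | grepl("std\\(", feature_names)
print(feature_names[keep])
```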
## This next segment cleans up the features data set, uses it to trim the measurement data set, and labels the variables

        ###Load features data set
        feats <- read.table("./UCI HAR Dataset/features.txt", stringsAsFactors = FALSE)

        ### Select only the variables that correspond to the mean and standard deviation measurements
        feats <- filter(feats, grepl("mean\\(", V2) | grepl("std\\(", V2))

        ### Remove unnecessary punctuation and change the remaining variable names to lower case
        feats$V2 <- tolower(feats$V2)
        feats$V2 <- gsub("-", "", feats$V2)
        feats$V2 <- gsub("\\(", "", feats$V2)
        feats$V2 <- gsub("\\)", "", feats$V2)
        
        ### Give the feature variables more descriptive names
        feats$V2 <- gsub("std", "standarddeviation", feats$V2) 
        feats$V2 <- gsub("tbody", "timebody", feats$V2)
        feats$V2 <- gsub("tgravity", "timegravity", feats$V2) 
        feats$V2 <- gsub("acc", "acceleration", feats$V2) 
        feats$V2 <- gsub("fbody", "frequencybody", feats$V2) 
        feats$V2 <- gsub("mag", "magnitude", feats$V2) 

        ### Use the column holding each feature's index to select the corresponding columns of the measurement data
        data <- select(data, feats$V1)

        ### Set variable names of the measurement data using the features data 
        colnames(data) <- feats$V2
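The punctuation removal and the six gsub() renames can also be driven by an ordered lookup table. This is an alternative sketch, not the original code: `replacements` and `clean_feature_names()` are hypothetical names, and the substitutions are applied in the same order as the gsub() calls above.

```r
# Ordered lookup table: pattern -> replacement, applied top to bottom
# in the same sequence as the original gsub() calls
replacements <- c(std = "standarddeviation", tbody = "timebody",
                  tgravity = "timegravity", acc = "acceleration",
                  fbody = "frequencybody", mag = "magnitude")

clean_feature_names <- function(x) {
  x <- tolower(x)
  x <- gsub("[-()]", "", x)  # drop hyphens and parentheses in one pass
  for (pattern in names(replacements)) {
    # fixed = TRUE: the patterns are plain substrings, not regular expressions
    x <- gsub(pattern, replacements[[pattern]], x, fixed = TRUE)
  }
  x
}

print(clean_feature_names(c("tBodyAcc-mean()-X", "fBodyAccMag-std()")))
```

Keeping the mapping in one named vector makes it easy to add further expansions without touching the loop.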

Attach the activity and subject variables to the measurement data, and give the activities descriptive names

  • Activity and subject data are given descriptive variable names.
  • Activity and subject columns are added to the measurement data to create one unified data frame (data).
  • Activity, subject, and feats data frames are removed from the environment, leaving only the unified data frame.
  • Numerical identifiers are replaced with descriptive activity names in the activity column.
## This section of the code labels the activity and subject data, attaches them to the measurement data, and gives the activities descriptive names
        
        ### Label the subject and activity data
        colnames(activity) <- "activity"
        colnames(subject) <- "subject"

        ### Attach the subject and activity data to the measurement data to create one unified data set
        data <- cbind(data, subject, activity)

        rm(list = c("activity", "feats", "subject"))

        ###Give activity data descriptive names
        data$activity[data$activity == 1] <- "walking"
        data$activity[data$activity == 2] <- "walking upstairs"
        data$activity[data$activity == 3] <- "walking downstairs"
        data$activity[data$activity == 4] <- "sitting"
        data$activity[data$activity == 5] <- "standing"
        data$activity[data$activity == 6] <- "laying"
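Because the activity codes run from 1 to 6, the six assignments can be replaced by indexing a label vector, or by factor(). A minimal sketch on toy codes; `activity_labels` is a hand-written vector matching the labels used above. The dataset also ships an activity_labels.txt file that could supply the names instead (not done here).

```r
# Activity labels in code order 1..6, spelled as in the assignments above
activity_labels <- c("walking", "walking upstairs", "walking downstairs",
                     "sitting", "standing", "laying")

codes <- c(5L, 1L, 6L, 2L)            # toy activity codes
labelled <- activity_labels[codes]     # integer indexing maps code k to label k
print(labelled)

# factor() alternative: keeps all six levels in code order, even unused ones
as_factor <- factor(codes, levels = 1:6, labels = activity_labels)
print(levels(as_factor))
```

The factor version has the side benefit that group_by() and summaries list activities in their original code order rather than alphabetically.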

Create and save the tidy data set

  • A new data frame (tidy_df in the code) is created from the unified data frame; it holds the mean of every measurement for each subject and activity.
  • The tidy data frame is saved as tidydata.txt in the working directory.
## The final segment creates a second tidy data set that contains the average of each measurement for each activity and subject

        ### Use dplyr to create an independent tidy data set grouped by subject and activity, then average the measurements in each group
        ### across() applies mean() to every non-grouping column
        tidy_df <- data %>%
                group_by(subject, activity) %>%
                summarise(across(everything(), mean), .groups = "drop")
        
        ###Save data set as text file
        write.table(tidy_df, file = "tidydata.txt", row.names = FALSE)
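As an independent cross-check on the dplyr summary, base R's aggregate() (swapped in here; the script itself uses group_by() and summarise()) computes the same per-subject, per-activity means. The toy frame and its two measurement columns are made up for illustration; the write.table() round trip uses the same arguments as the script.

```r
# Toy unified frame (hypothetical values): two measurements plus the id columns
toy <- data.frame(
  timebodyaccelerationmeanx = c(1, 3, 5, 7),
  timebodyaccelerationmeany = c(2, 4, 6, 8),
  subject  = c(1L, 1L, 2L, 2L),
  activity = c("walking", "walking", "walking", "sitting")
)

# Mean of every remaining column for each subject/activity pair
tidy_base <- aggregate(. ~ subject + activity, data = toy, FUN = mean)
print(tidy_base)

# Round-trip through the same write.table() call used in the script
out <- tempfile(fileext = ".txt")
write.table(tidy_base, file = out, row.names = FALSE)
back <- read.table(out, header = TRUE)
print(back)
```

Reading the file back with `header = TRUE` is the matching way to load tidydata.txt, since row.names = FALSE leaves only the column header line.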