Accelerometer & Gyroscope Data Analysis

Data Cleaning


Step 1.) Loading features data


The first line below reads the features.txt file into variable features. In the second line, we use grep to create a vector called featuresLite identifying the feature names that have ‘mean’ and ‘std’ in the names. Finally, we filter on those feature names only and store it in the variable featureNames.

#load features table
features <- read.table('./UCI HAR Dataset/features.txt')

#create vector for features with 'Mean' and 'Std' 
featuresLite <- grep('-[Mm]ean()|-[Ss]td()', features[,2])

#get just the feature names for 'Mean' and 'Std' filtering on logic above
featureNames <- as.character(features[featuresLite,2])

Step 2.) Clean up feature names and make descriptive


#clean up feature names and make them descriptive
featureNames <- gsub('*-meanFreq\\()*', ' MeanFreq ', featureNames)
featureNames <- gsub('*-mean\\()*', ' Mean ', featureNames)
featureNames <- gsub('*-std\\()*', ' StdDev ', featureNames)

Step 3.) Read in activity names as activityLabels and add column names


Here, we name the ID column of the activityLabels table activityID so that we have a common column name to later merge the activityLabels$activity column with the full dataset in Step 6.) below

#read in activity labels as new table
activityLabels <- read.table('./UCI HAR Dataset/activity_labels.txt')
colnames(activityLabels) <- c('activityID','activity')

Step 4.) Read in Training set data


#Read in the Training set data but get only the 'Mean' and 'Std' data
#  according to our featuresLite logical vector created above
subjectTrain <- read.table('./UCI HAR Dataset/train/subject_train.txt')
xtrain <- read.table('./UCI HAR Dataset/train/X_train.txt')[featuresLite]
ytrain <- read.table('./UCI HAR Dataset/train/Y_train.txt')

#Define column names for the training data
colnames(subjectTrain) <- "subjectID"
colnames(xtrain) <- featureNames
colnames(ytrain) <- "activityID"

#combine the subject, y, and x training sets into one
trainSet <- cbind(subjectTrain,ytrain,xtrain)

Step 5.) Read in Test set data


#Read in the Test set data but get only the 'Mean' and 'Std' data
#  according to our featuresLite vector created above
subjectTest <- read.table('./UCI HAR Dataset/test/subject_test.txt')
xtest <- read.table('./UCI HAR Dataset/test/X_test.txt')[featuresLite]
ytest <- read.table('./UCI HAR Dataset/test/Y_test.txt')

colnames(subjectTest) <- "subjectID"
colnames(xtest) <- featureNames
colnames(ytest) <- "activityID"

#combine the subject, y, and x test sets into one
testSet <- cbind(subjectTest,ytest,xtest)

Step 6.) Combine the Test and Training sets into one dataset


In this step, we combine the trainSet and testSet rows into one table using rbind(). We use the merge() function and join the new fullSetto the activityLabels table by the activityID common column. The activityLabels$activity column is then used for the descriptive activity names.

We then use tidyr::gather() to gather the columns and change the table from a wide to a long format.

#combine the training set and test set into one full set
fullSet <- rbind(trainSet,testSet)

#merge the fullSet dataset with the activityLabels table
fullSet <- merge(activityLabels,fullSet,'activityID')

#gather columns into a long format table to create clean dataset
cleanSet <- gather(fullSet,ncol=4:82)

Step 7.) Create tidy data set


The final step is to create a tidy dataset from the above created cleanSet. We do this using the tidyr package and chain operation %>%.
1.) First we separate() the feature column into the three delimited parts and labeling the new cols appropriately: ‘feature’,‘measure’,‘XYZ’.
2.) The next step is to group_by() the columns we want to summarise by. (Here, we drop the direction XYZ column so that we only deal with the average of values for all directions for a subject for a particular measurement)
3.) The third tidying is to summarise() the table to the mean of the values for the features we are grouping by.
4.) Finally we spread() the dataset by the newly created measure column and cleanup the table before writing it to a file.

# separating the feature column into the three delimited parts
#    and labeling the new cols appropriately: 'feature','measure','XYZ'
# grouping by all relevant columns
# summarising by the group above for the average of all values
# spreading the dataset by the newly created 'measure' column
tidySet <- cleanSet %>%
    separate(key, into=c('feature','measure','XYZ'), sep = ' ') %>%
    group_by(feature, activityID, activity, subjectID, XYZ, measure) %>%
    summarise(Value = mean(value)) %>%
    spread(measure, Value)

#clean XYZ column
tidySet$XYZ <- gsub('-','',tidySet$XYZ)
#convert to data frame    
tidySet <- as.data.frame(tidySet)
#save tidySet as Rdata file
save(tidySet, file='tidySet.Rdata')
write.table(tidySet, file='tidySet.txt')