Source file ⇒ Human_Activity_Recognition_Project.Rmd
Introduction: Many technologies today detect what activity a person is doing from their motion in three-dimensional space. Using an accelerometer and a gyroscope, data was collected from 30 volunteers. 70% of these volunteers were placed in the training group and 30% in the test group. This project examines the distribution of activities across these individuals.
For each record, the dataset provides:
. Triaxial acceleration from the accelerometer (total acceleration) and the estimated body acceleration.
. Triaxial Angular velocity from the gyroscope.
. A 561-feature vector with time and frequency domain variables.
. Its activity label.
. An identifier of the subject who carried out the experiment.
The dataset includes the following files:
‘README.txt’
‘features_info.txt’: Shows information about the variables used on the feature vector.
‘features.txt’: List of all features.
‘activity_labels.txt’: Links the class labels with their activity name.
‘train/X_train.txt’: Training set.
‘train/y_train.txt’: Training labels.
‘test/X_test.txt’: Test set.
‘test/y_test.txt’: Test labels.
#reading in data files
getwd()
## [1] "/Users/bharveepatel/Documents/Data Science Projects"
setwd("/Users/bharveepatel/Documents/UCI HAR Dataset")
#extracting vector of activities measured
act_label <- read.table("/Users/bharveepatel/Documents/UCI HAR Dataset/activity_labels.txt", sep = "")
activity <- as.character(act_label$V2)
#extracting vector of features derived from the accelerometer and gyroscope signals
features <- read.table("/Users/bharveepatel/Documents/UCI HAR Dataset/features.txt", sep = "")
feature <- features$V2
#70% of volunteers are part of the training data.
train_x <- read.table("/Users/bharveepatel/Documents/UCI HAR Dataset/train/X_train.txt", sep = "")
names(train_x) <- feature
train_y <- read.table("/Users/bharveepatel/Documents/UCI HAR Dataset/train/y_train.txt", sep = "")
names(train_y) <- "ActivityNumber"
train_y$ActivityNumber <- as.factor(train_y$ActivityNumber)
levels(train_y$ActivityNumber) <- activity
train_subject <- read.table("/Users/bharveepatel/Documents/UCI HAR Dataset/train/subject_train.txt", sep = "")
names(train_subject) <- "Subject"
train <- cbind(train_x, train_y, train_subject)
ncol(train)
## [1] 563
#30% of volunteers are in the test data
test_x <- read.table("/Users/bharveepatel/Documents/UCI HAR Dataset/test/X_test.txt", sep = "")
names(test_x) <- feature
test_y <- read.table("/Users/bharveepatel/Documents/UCI HAR Dataset/test/y_test.txt", sep = "")
names(test_y) <- "ActivityNumber"
test_y$ActivityNumber <- as.factor(test_y$ActivityNumber)
levels(test_y$ActivityNumber) <- activity
test_subject <- read.table("/Users/bharveepatel/Documents/UCI HAR Dataset/test/subject_test.txt", sep = "")
names(test_subject) <- "Subject"
test <- cbind(test_x, test_y, test_subject)
ncol(test)
## [1] 563
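The train and test blocks above repeat the same steps, so they could be folded into one helper. The following is a minimal sketch, assuming the folder layout shown in the file list; `read_split()` and `data_dir` are hypothetical names and are not part of the original analysis. It also maps activity codes to names through `activity_labels.txt` explicitly, rather than relying on every code being present in a split.

```r
# Hypothetical helper: read one split ("train" or "test") of the dataset.
# data_dir is assumed to point at the unzipped "UCI HAR Dataset" folder.
read_split <- function(data_dir, split) {
  features <- read.table(file.path(data_dir, "features.txt"),
                         stringsAsFactors = FALSE)
  labels <- read.table(file.path(data_dir, "activity_labels.txt"),
                       stringsAsFactors = FALSE)

  # Feature matrix, with one column per entry in features.txt
  x <- read.table(file.path(data_dir, split, paste0("X_", split, ".txt")))
  names(x) <- features$V2

  y <- read.table(file.path(data_dir, split, paste0("y_", split, ".txt")))
  subject <- read.table(file.path(data_dir, split,
                                  paste0("subject_", split, ".txt")))

  # Map numeric codes to activity names explicitly, so the mapping does not
  # depend on which codes happen to appear in this split.
  x$ActivityNumber <- factor(y$V1, levels = labels$V1, labels = labels$V2)
  x$Subject <- subject$V1
  x
}
```

Under these assumptions, `train <- read_split(data_dir, "train")` and `test <- read_split(data_dir, "test")` would stand in for the two blocks, with `data_dir` set once instead of repeating the absolute path.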
library(ggplot2)
train$Group <- "Train"
test$Group <- "Test"
volunteers <- rbind(train, test)
volunteers$Group <- as.factor(volunteers$Group)
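The 70/30 split described in the introduction can be checked by counting distinct subjects per group and confirming that no subject appears in both. The sketch below assumes a data frame with `Subject` and `Group` columns, as built above; `subject_split_summary()` is a hypothetical helper and the toy data is made up purely to show the call.

```r
# Summarise how many distinct subjects fall in each group, the percentage
# share of each group, and any subject that appears in more than one group.
# Expects columns named Subject and Group.
subject_split_summary <- function(df) {
  pairs <- unique(df[, c("Subject", "Group")])
  counts <- table(pairs$Group)
  list(
    n_subjects = counts,
    share = round(100 * prop.table(counts)),
    overlapping = sort(unique(pairs$Subject[duplicated(pairs$Subject)]))
  )
}

# Made-up toy data, only to illustrate the call.
toy <- data.frame(
  Subject = c(1, 1, 2, 3, 3, 4),
  Group   = c("Train", "Train", "Train", "Test", "Test", "Train")
)
subject_split_summary(toy)
```

With the real data the call would be `subject_split_summary(volunteers)`; an empty `overlapping` element would indicate that the two groups share no volunteers.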
#counting records per subject, filled by group
ggplot(volunteers, aes(x = Subject, fill = Group)) + geom_bar()
#counting records per subject, filled by activity (no pipe, so no extra package is needed)
ggplot(volunteers, aes(x = Subject)) + geom_bar(aes(fill = ActivityNumber))
The graph above shows that activities are close to evenly distributed among these individuals. This data could help in understanding how the proportion of time spent on each activity relates to certain health issues or benefits, although the volunteers’ health would also need to be recorded for such an analysis.
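The reading that activities are close to evenly distributed can also be checked numerically, with a table of per-subject activity shares. This is a minimal sketch assuming `Subject` and `ActivityNumber` columns as in the combined data; `activity_shares()` is a hypothetical helper and the toy data is made up for illustration.

```r
# For each subject (row), the share of that subject's records that belong
# to each activity (column). Each row sums to 1 before rounding.
activity_shares <- function(df) {
  round(prop.table(table(df$Subject, df$ActivityNumber), margin = 1), 2)
}

# Made-up toy data, only to illustrate the call.
toy <- data.frame(
  Subject = c(1, 1, 1, 1, 2, 2),
  ActivityNumber = c("WALKING", "WALKING", "SITTING", "LAYING",
                     "SITTING", "LAYING")
)
activity_shares(toy)
```

On the real data, `activity_shares(volunteers)` would give one row per volunteer; rows with similar values across the six activities would support the visual impression from the bar chart.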
Many companies could use the findings of this experiment to add features to their health products. For example, Fitbit could use such data to build an activity tracker that recognizes activities like sitting and laying.