Source file ⇒ Human_Activity_Recognition_Project.Rmd

Introduction: Today, many technologies detect which activity a person is performing from their motion in three-dimensional space. For this dataset, accelerometer and gyroscope readings were collected from 30 volunteers. Of these volunteers, 70% were placed in the training group and 30% in the test group. This project examines how the recorded activities are distributed across these individuals.

For each record, the dataset provides:

- Triaxial acceleration from the accelerometer (total acceleration) and the estimated body acceleration.
- Triaxial angular velocity from the gyroscope.
- A 561-feature vector with time and frequency domain variables.
- Its activity label.
- An identifier of the subject who carried out the experiment.

The code below reads the following dataset files: activity_labels.txt, features.txt, and the X, y, and subject files under train/ and test/.

#reading in data files
getwd()
## [1] "/Users/bharveepatel/Documents/Data Science Projects"
setwd("/Users/bharveepatel/Documents/UCI HAR Dataset")

#extracting vector of activities measured
act_label <- read.table("/Users/bharveepatel/Documents/UCI HAR Dataset/activity_labels.txt", sep = "")
activity <- as.character(act_label$V2)

#extracting vector of feature names (derived from accelerometer and gyroscope signals)
features <- read.table("/Users/bharveepatel/Documents/UCI HAR Dataset/features.txt", sep = "")
feature <- features$V2


#70% of volunteers are part of the training data.
train_x <- read.table("/Users/bharveepatel/Documents/UCI HAR Dataset/train/X_train.txt", sep = "")
names(train_x) <- feature

train_y <- read.table("/Users/bharveepatel/Documents/UCI HAR Dataset/train/y_train.txt", sep = "")
names(train_y) <- "ActivityNumber"
train_y$ActivityNumber <- as.factor(train_y$ActivityNumber)
levels(train_y$ActivityNumber) <- activity

train_subject <- read.table("/Users/bharveepatel/Documents/UCI HAR Dataset/train/subject_train.txt", sep = "")
names(train_subject) <- "Subject"

train <- cbind(train_x, train_y, train_subject)
ncol(train)
## [1] 563
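
The relabeling step above assigns `activity` to the factor levels by position, which relies on every activity code being present so that the sorted codes line up with the rows of activity_labels.txt. A slightly more defensive variant, sketched here with a small made-up vector of codes rather than the real files, maps each code to its label explicitly through the `levels` and `labels` arguments of `factor()`. The names `label_map` and `codes` are hypothetical, and the six labels are assumed to match the contents of activity_labels.txt.

```r
# Hypothetical stand-in for activity_labels.txt (V1 = code, V2 = label).
label_map <- data.frame(
  V1 = 1:6,
  V2 = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
         "SITTING", "STANDING", "LAYING")
)

# Made-up activity codes; note that code 3 never occurs here.
codes <- c(5, 5, 1, 6, 2, 4)

# Explicit mapping: each code is matched to its own label, even when
# some codes are absent from the data.
activity_factor <- factor(codes, levels = label_map$V1, labels = label_map$V2)

as.character(activity_factor)
nlevels(activity_factor)
```

With the real data, the same call would take `train_y$V1` (before renaming) in place of `codes` and `act_label` in place of `label_map`.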
#30% of volunteers are in the test data
test_x <- read.table("/Users/bharveepatel/Documents/UCI HAR Dataset/test/X_test.txt", sep = "")
names(test_x) <- feature

test_y <- read.table("/Users/bharveepatel/Documents/UCI HAR Dataset/test/y_test.txt", sep = "")
names(test_y) <- "ActivityNumber"
test_y$ActivityNumber <- as.factor(test_y$ActivityNumber)
levels(test_y$ActivityNumber) <- activity

test_subject <- read.table("/Users/bharveepatel/Documents/UCI HAR Dataset/test/subject_test.txt", sep = "")
names(test_subject) <- "Subject"

test <- cbind(test_x, test_y, test_subject)
ncol(test)
## [1] 563
library(ggplot2)
train$Group <- "Train"
test$Group <- "Test"

volunteers <- rbind(train, test)
volunteers$Group <- as.factor(volunteers$Group)
qplot(data = volunteers, x= Subject, fill= Group)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

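#stacked bar chart of records per subject, filled by activity
#(no pipe needed, so no extra package has to be loaded)
ggplot(volunteers, aes(x = Subject, fill = ActivityNumber)) +
  geom_bar()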

The graph above shows that the activities are close to evenly distributed among these individuals. Data like this could help in understanding how the proportion of time spent in each activity relates to health issues or benefits, although the volunteers’ health would also need to be recorded for such an analysis.
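
To put a number on "close to evenly distributed", one option is to tabulate the share of each activity within each subject. The sketch below uses a tiny invented data frame in place of `volunteers` (the column names `Subject` and `ActivityNumber` follow the code above; the values are made up), so it illustrates the technique rather than the real result.

```r
# Invented stand-in for the `volunteers` data frame; values are illustrative only.
toy <- data.frame(
  Subject = c(1, 1, 1, 1, 2, 2, 2, 2),
  ActivityNumber = factor(c("WALKING", "SITTING", "LAYING", "LAYING",
                            "WALKING", "WALKING", "SITTING", "LAYING"))
)

# Count records per subject and activity, then turn each row into proportions.
counts <- table(toy$Subject, toy$ActivityNumber)
shares <- prop.table(counts, margin = 1)

round(shares, 2)
```

On the real data, replacing `toy` with `volunteers` would give one row per subject. Rows with similar values across the activity columns would support the reading of the graph.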

Many companies could use findings like these to add features to their health products. For example, Fitbit could use such data to build an activity tracker that recognizes activities like sitting and laying.