Human-Centered Computing (HCC) is an emerging research field, based on the scientific study of cognition in human behaviour, that integrates users and their social context with computer systems. HCC focuses in particular on understanding the differences between the perceptual-motor, cognitive, and social aspects of behaviour. Human Activity Recognition (HAR) aims to identify the actions that make up human behaviour. Anguita et al. [1] conducted experiments in sensing human body motion from the actions a person carried out while using a smartphone. The results were released as a public-domain dataset, Human Activity Recognition Using Smartphones [1]. The HAR dataset was built from recordings of 30 subjects performing Activities of Daily Living (ADL) while carrying a waist-mounted smartphone with embedded inertial sensors.
The experiments were carried out with a group of 30 subjects aged 19 to 48 years. Each subject performed six Activities of Daily Living (ADL), namely WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, and LAYING, while wearing a waist-mounted Samsung Galaxy S II smartphone. The phone's embedded accelerometer and gyroscope captured 3-axial linear acceleration and 3-axial angular velocity at a constant rate of 50 Hz. The experiments were video-recorded so that the data could be labeled manually. The obtained dataset was randomly partitioned into two sets: 70% of the volunteers were selected for generating the training data and 30% for the test data.
The accelerometer and gyroscope signals were pre-processed by applying noise filters and then sampled in fixed-width sliding windows of 2.56 s with 50% overlap (128 readings per window). The acceleration signal, which has gravitational and body-motion components, was separated into body acceleration and gravity using a Butterworth low-pass filter. The gravitational force is assumed to have only low-frequency components, so a filter with a 0.3 Hz cutoff frequency was used. From each window, a feature vector was obtained by calculating variables from the time and frequency domains. In total, 17 signals were obtained through this signal processing, and 561 features were extracted to describe each activity window.
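The windowing step can be sketched in base R as follows. This is an illustration under stated assumptions, not the authors' preprocessing code: `make_windows` is a hypothetical helper, the input is a synthetic sine wave, and the noise and Butterworth filtering are omitted. At 50 Hz, a 128-sample window spans 128 / 50 = 2.56 s, and 50% overlap means consecutive windows start 64 samples apart.

```r
# Hypothetical helper: cut a 1-D sensor stream into fixed-width sliding windows.
# width = 128 samples (2.56 s at 50 Hz) and 50% overlap follow the text above.
make_windows <- function(x, width = 128L, overlap = 0.5) {
  step <- as.integer(round(width * (1 - overlap)))  # 64 samples for 50% overlap
  if (length(x) < width) {
    # Not enough samples for a single full window
    return(matrix(numeric(0), nrow = 0, ncol = width))
  }
  starts <- seq(1L, length(x) - width + 1L, by = step)
  # vapply returns one column per window; transpose so each row is a window
  t(vapply(starts, function(s) x[s:(s + width - 1L)], numeric(width)))
}

# Usage: 10 s of a synthetic signal sampled at 50 Hz (500 samples)
fs <- 50
x <- sin(2 * pi * 1.5 * seq_len(10 * fs) / fs)
w <- make_windows(x)
dim(w)      # 6 windows of 128 samples
128 / fs    # window length in seconds: 2.56
```

Each row of `w` is one window, and the last 64 samples of a window are the first 64 samples of the next one.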
The HAR dataset is available in the UCI Machine Learning Repository.
Libraries for analyzing the dataset:
library(ggplot2)
library(reshape2)
The dataset is provided as plain-text (.txt) files, so I used the read.table function to load it.
First, load the activity labels and the feature list of these experiments.
# Data Input: Activity labels
adl.labels <- read.table("UCI HAR Dataset/activity_labels.txt", sep = "")
activityLabels <- as.character(adl.labels$V2)
# Data Input: Feature List
feature.lists <- read.table("UCI HAR Dataset/features.txt", sep = "")
attributeNames <- feature.lists$V2
The activity labels consist of six Activities of Daily Living (ADL):
## [1] "WALKING" "WALKING_UPSTAIRS" "WALKING_DOWNSTAIRS"
## [4] "SITTING" "STANDING" "LAYING"
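The following 17 main signals were used to estimate the feature variables ("-XYZ" denotes a 3-axial signal in the X, Y, and Z directions):
- tBodyAcc-XYZ
- tGravityAcc-XYZ
- tBodyAccJerk-XYZ
- tBodyGyro-XYZ
- tBodyGyroJerk-XYZ
- tBodyAccMag
- tGravityAccMag
- tBodyAccJerkMag
- tBodyGyroMag
- tBodyGyroJerkMag
- fBodyAcc-XYZ
- fBodyAccJerk-XYZ
- fBodyGyro-XYZ
- fBodyAccMag
- fBodyAccJerkMag
- fBodyGyroMag
- fBodyGyroJerkMag
Each main signal has the following set of variables:
- mean()
- std()
- mad()
- max()
- min()
- sma()
- energy()
- iqr()
- entropy()
- arCoeff()
- correlation()
- maxInds()
- meanFreq()
- skewness()
- kurtosis()
- bandsEnergy()
- angle()
There are also additional vectors used on the angle() variable:
- gravityMean
- tBodyAccMean
- tBodyAccJerkMean
- tBodyGyroMean
- tBodyGyroJerkMean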
So, the total number of features is:
## [1] 561
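A few of these per-window variables can be sketched in base R. The definitions follow the UCI feature descriptions where they are stated (energy() is the sum of the squares divided by the number of values, mad() is the median absolute deviation, and angle() is the angle between two vectors). `window_features` and `vec_angle` are hypothetical helpers, and the released features were also normalized to [-1, 1], so raw values computed this way will not match the dataset.

```r
# Hypothetical helper: a handful of time-domain variables for one window `w`
window_features <- function(w) {
  c(
    mean   = mean(w),
    std    = sd(w),
    mad    = mad(w, constant = 1),    # unscaled median absolute deviation
    max    = max(w),
    min    = min(w),
    energy = sum(w^2) / length(w),    # sum of squares / number of values
    iqr    = IQR(w)
  )
}

# Hypothetical helper for angle(): angle (radians) between two vectors
vec_angle <- function(a, b) {
  cos_sim <- sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
  acos(min(1, max(-1, cos_sim)))      # clamp to guard against rounding error
}

window_features(c(1, 2, 3, 4))
vec_angle(c(0, 0, 1), c(1, 0, 0))     # pi / 2
```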
Load the training set, its labels, and the corresponding subject IDs.
# Data Input: Training set
Xtrain <- read.table("UCI HAR Dataset/train/X_train.txt", sep = "")
# Name each column of Xtrain after its feature
names(Xtrain) <- attributeNames
# Data Input: Training set labels
Ytrain <- read.table("UCI HAR Dataset/train/y_train.txt", sep = "")
# Rename the Ytrain column to "Activity"
names(Ytrain) <- "Activity"
# Convert it to a factor
Ytrain$Activity <- as.factor(Ytrain$Activity)
# Map each level of Ytrain$Activity (codes 1 to 6) to its activity label
levels(Ytrain$Activity) <- activityLabels
# Data Input: subject who performed the activity for each window sample
trainSubjects <- read.table("UCI HAR Dataset/train/subject_train.txt", sep = "")
# Rename the trainSubjects column to "subject"
names(trainSubjects) <- "subject"
# Convert it to a factor
trainSubjects$subject <- as.factor(trainSubjects$subject)
Now, let's pair the training set with its activity labels and the subjects who carried out the experiment.
# Combine the subjects, the activity labels, and the features into one data frame
train_set <- cbind(trainSubjects, Ytrain, Xtrain)
Load the test set, its labels, and the corresponding subject IDs.
# Data Input: Test set
Xtest <- read.table("UCI HAR Dataset/test/X_test.txt", sep = "")
# Name each column of Xtest after its feature
names(Xtest) <- attributeNames
# Data Input: Test set labels
Ytest <- read.table("UCI HAR Dataset/test/y_test.txt", sep = "")
# Rename the Ytest column to "Activity"
names(Ytest) <- "Activity"
# Convert it to a factor
Ytest$Activity <- as.factor(Ytest$Activity)
# Map each level of Ytest$Activity (codes 1 to 6) to its activity label
levels(Ytest$Activity) <- activityLabels
# Data Input: subject who performed the activity for each window sample
testSubjects <- read.table("UCI HAR Dataset/test/subject_test.txt", sep = "")
# Rename the testSubjects column to "subject"
names(testSubjects) <- "subject"
# Convert it to a factor
testSubjects$subject <- as.factor(testSubjects$subject)
Then, pair the test set with its activity labels and the subjects who carried out the experiment.
# Combine the subjects, the activity labels, and the features into one data frame
test_set <- cbind(testSubjects, Ytest, Xtest)
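The train and test loading steps above differ only in the folder name and file suffix, so they could be wrapped in one function. This is a sketch, not part of the original analysis: `load_har_partition` and its arguments are hypothetical names, and it assumes the unzipped archive layout used in the paths above. It also uses `factor(levels =, labels =)` instead of `levels<-`, which spells out the code-to-label mapping rather than relying on all six activity codes being present.

```r
# Hypothetical helper: load X, y, and subject files for one partition
# ("train" or "test") and combine them as subject, Activity, features.
load_har_partition <- function(partition, root = "UCI HAR Dataset",
                               feature_names, activity_labels) {
  part_dir <- file.path(root, partition)
  read_part <- function(prefix) {
    read.table(file.path(part_dir, paste0(prefix, "_", partition, ".txt")))
  }
  X <- read_part("X")
  names(X) <- feature_names                     # one column per feature
  y <- read_part("y")
  subjects <- read_part("subject")
  data.frame(
    subject  = factor(subjects$V1),
    # Explicit mapping: activity code k becomes activity_labels[k]
    Activity = factor(y$V1, levels = seq_along(activity_labels),
                      labels = activity_labels),
    X,
    check.names = FALSE                         # keep feature names verbatim
  )
}
```

With the archive unpacked in the working directory, `load_har_partition("test", feature_names = attributeNames, activity_labels = activityLabels)` should give the same columns as `test_set`.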
NA values can cause problems in later analysis, so check every column of the train and test sets for them.
range(colSums(is.na(train_set)))
## [1] 0 0
There are no missing values in the train set.
range(colSums(is.na(test_set)))
## [1] 0 0
There are no missing values in the test set.
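The check above reports the smallest and largest per-column NA count, so `0 0` means that no column, and therefore no row, contains a missing value. A small illustration on a made-up toy data frame shows what a non-zero result looks like, together with two related base-R checks.

```r
# Toy frame with one missing value (illustration only)
toy <- data.frame(a = c(1, NA, 3), b = c(4, 5, 6))

na_per_column <- colSums(is.na(toy))   # a = 1, b = 0
range(na_per_column)                   # 0 1: at least one column has an NA
any(is.na(toy))                        # TRUE: a single yes/no answer
toy[!complete.cases(toy), ]            # the row(s) containing an NA
```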
Insert a new column named Partition in each set to distinguish the two sets.
# Add a `Partition` column to the train data frame and set it to "Train"
train_set$Partition <- "Train"
# Add a `Partition` column to the test data frame and set it to "Test"
test_set$Partition <- "Test"
We need one big data frame consisting of train_set and test_set.
# Combine the train and test sets into one data frame by rows
alldf <- rbind(train_set, test_set)
# Convert it to a factor
alldf$Partition <- as.factor(alldf$Partition)
After combining the train and test sets, the dimensions of the entire dataset are:
## [1] 10299 564
Number of windows per subject in the train set:
## 1 3 5 6 7 8 11 14 15 16 17 19 21 22 23 25 26 27 28 29
## 347 341 302 325 308 281 316 323 328 366 368 360 408 321 372 409 392 376 382 344
## 30
## 383
Number of windows per subject in the test set:
## 2 4 9 10 12 13 18 20 24
## 302 317 288 294 320 327 364 354 381
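As a quick consistency check, the per-subject counts above add up to the dimensions reported earlier. The numbers below are copied from the two summaries; note that the split is by volunteer, since no subject appears in both sets.

```r
# Window counts per subject, copied from the two summaries above
train_counts <- c(347, 341, 302, 325, 308, 281, 316, 323, 328, 366, 368,
                  360, 408, 321, 372, 409, 392, 376, 382, 344, 383)
test_counts  <- c(302, 317, 288, 294, 320, 327, 364, 354, 381)

n_train <- sum(train_counts)   # 7352 windows
n_test  <- sum(test_counts)    # 2947 windows
n_train + n_test               # 10299, the row count of alldf
length(train_counts) / 30      # 0.7: 21 of the 30 volunteers are in the train set
561 + 3                        # 564 columns: 561 features + subject, Activity, Partition
```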
Let's plot the number of windows per subject in the train and test sets.
# features.txt contains duplicated names (e.g. the bandsEnergy() features), so make all column names unique before plotting
colnames(alldf) <- make.unique(names(alldf))
ggplot(alldf, aes(x = subject, fill = Partition)) +
  geom_bar() +
  labs(title = "The Distribution of Subject", x = "Subject", y = "Frequency")
[Figure: The Distribution of Subject, showing the 70%/30% split of subjects between the Train set and the Test set.]
There are six Activities of Daily Living (ADL): WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, and LAYING. Let's look at the distribution of ADL for each subject.
ggplot(alldf, aes(x = subject, fill = Activity)) +
  geom_bar() +
  labs(title = "Activities of Daily Living Distribution", x = "Subject", y = "Frequency")
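The stacked bars show absolute counts, which makes subjects with more windows look larger. To compare activity mixes across subjects, the counts can be normalized per subject. `activity_share` is a hypothetical helper, shown here on toy data; on the real data it could be called as `activity_share(alldf)`.

```r
# Hypothetical helper: share of each activity within each subject's windows
activity_share <- function(df) {
  counts <- table(df$subject, df$Activity)   # subjects in rows, activities in columns
  prop.table(counts, margin = 1)             # each row (subject) sums to 1
}

# Toy data for illustration only (not taken from the HAR dataset)
toy <- data.frame(
  subject  = factor(c(1, 1, 1, 2, 2)),
  Activity = factor(c("WALKING", "SITTING", "SITTING", "WALKING", "LAYING"))
)
round(activity_share(toy), 2)
```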
STANDING and SITTING
The recognition of body transitions such as stand-to-sit requires short time spans (on the order of seconds). Here we compare STANDING and SITTING through the minimum of the body acceleration signal, tBodyAcc-min(), on each axis, using the accelerometer data only. The "t" prefix denotes the time domain, so this feature is the smallest body acceleration value within a window, not a duration.
# Keep only the min() features of the time-domain body acceleration signal
alldf_min <- subset(alldf,
                    select = c("subject", "Activity", "tBodyAcc-min()-X", "tBodyAcc-min()-Y", "tBodyAcc-min()-Z"))
# Keep only the STANDING and SITTING windows
alldf.stand.sit <- alldf_min[alldf_min$Activity == "STANDING" | alldf_min$Activity == "SITTING",]
# Data inspection
str(alldf.stand.sit)
## 'data.frame': 3683 obs. of 5 variables:
## $ subject : Factor w/ 30 levels "1","3","5","6",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Activity : Factor w/ 6 levels "WALKING","WALKING_UPSTAIRS",..: 5 5 5 5 5 5 5 5 5 5 ...
## $ tBodyAcc-min()-X: num 0.853 0.849 0.844 0.844 0.849 ...
## $ tBodyAcc-min()-Y: num 0.686 0.686 0.682 0.682 0.683 ...
## $ tBodyAcc-min()-Z: num 0.814 0.823 0.839 0.838 0.838 ...
# Rename the columns after their axes
names(alldf.stand.sit)[names(alldf.stand.sit) == "tBodyAcc-min()-X"] <- "X.Axis"
names(alldf.stand.sit)[names(alldf.stand.sit) == "tBodyAcc-min()-Y"] <- "Y.Axis"
names(alldf.stand.sit)[names(alldf.stand.sit) == "tBodyAcc-min()-Z"] <- "Z.Axis"
# Melt the three axis columns into long format (a `variable` column and a `value` column)
melt_min <- melt(alldf.stand.sit, id.vars = c("subject", "Activity"))
# Rename the `variable` column to `Axis`
names(melt_min)[names(melt_min) == "variable"] <- "Axis"
# Sort by value in ascending order
temp <- melt_min[order(melt_min$value), ]
# Find the subject and activity with the lowest minimum body acceleration on each axis
bestX <- head(temp[temp$Axis == "X.Axis", ], 1)
bestY <- head(temp[temp$Axis == "Y.Axis", ], 1)
bestZ <- head(temp[temp$Axis == "Z.Axis", ], 1)
# Combine the three rows
comb.best <- rbind(bestX, bestY, bestZ)
comb.best
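Sorting the whole long table only to keep the first row per axis works, but base R can also pick the minimum directly within each group. `lowest_per_axis` is a hypothetical helper that expects the columns produced by the melt step above (`subject`, `Activity`, `Axis`, `value`). Like the stable sort followed by `head()`, it keeps the earliest row when values tie, so `lowest_per_axis(melt_min)` should return the same rows as `comb.best`. The toy values below are for illustration only.

```r
# Hypothetical helper: for each level of `Axis`, keep the row with the
# smallest `value`; which.min() returns the first minimum on ties.
lowest_per_axis <- function(long_df) {
  per_axis <- split(long_df, long_df$Axis, drop = TRUE)
  do.call(rbind, lapply(per_axis, function(d) d[which.min(d$value), ]))
}

# Toy long-format data with the same columns as melt_min (illustration only)
toy <- data.frame(
  subject  = factor(c(1, 1, 2, 2)),
  Activity = factor(c("SITTING", "STANDING", "SITTING", "STANDING")),
  Axis     = factor(c("X.Axis", "Y.Axis", "X.Axis", "Y.Axis")),
  value    = c(0.5, 0.2, 0.1, 0.3)
)
lowest_per_axis(toy)
```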
STANDING and SITTING activities