Objective

In this project we walk step by step through importing data, structuring a table and cleaning it. The final result is a .csv file containing a table that is tidy and comprehensible, has no missing values or outliers, and can be readily imported and used.

For this we will use the “Human Activity Recognition (HAR) Using Smartphones Data Set” collected by CETpD and SmartLab. The dataset is built from the recordings of 30 subjects performing activities of daily living (ADL) while carrying a waist-mounted smartphone with embedded inertial sensors.

The dataset consists of 269 MB of .txt files. Our objective is to assemble this dataset and ensure that it is clean.


Data import

About the Data

The experiments have been carried out with a group of 30 volunteers within an age bracket of 19-48 years. Each person performed six activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING) wearing a smartphone (Samsung Galaxy S II) on the waist. Using its embedded accelerometer and gyroscope, we captured 3-axial linear acceleration and 3-axial angular velocity at a constant rate of 50Hz. The experiments have been video-recorded to label the data manually. The obtained dataset has been randomly partitioned into two sets, where 70% of the volunteers were selected for generating the training data and 30% for the test data.
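
A subject-level split like the one described above can be sketched as follows. This is only an illustration: the seed and the resulting assignment are assumptions and do not reproduce the partition shipped with the dataset.

```r
# Illustrative only: the seed and the resulting assignment are assumptions,
# not the partition used by the dataset authors.
set.seed(42)
subjects <- 1:30

# Draw 70% of the volunteers (21 of 30) for training; the rest go to test
train.subjects <- sort(sample(subjects, size = round(0.7 * length(subjects))))
test.subjects  <- setdiff(subjects, train.subjects)

length(train.subjects)  # 21
length(test.subjects)   # 9
```

Splitting by volunteer rather than by row means that no person contributes rows to both sets.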

The sensor signals (accelerometer and gyroscope) were pre-processed by applying noise filters and then sampled in fixed-width sliding windows of 2.56 sec and 50% overlap (128 readings/window). The sensor acceleration signal, which has gravitational and body motion components, was separated using a Butterworth low-pass filter into body acceleration and gravity. The gravitational force is assumed to have only low frequency components, therefore a filter with 0.3 Hz cutoff frequency was used. From each window, a vector of features was obtained by calculating variables from the time and frequency domain.
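
The window arithmetic quoted above (2.56 s at 50 Hz with 50% overlap) can be checked with a short sketch. The 10-second recording length is a hypothetical value chosen only to show where consecutive windows would start.

```r
sampling.rate  <- 50     # Hz, from the description above
window.seconds <- 2.56   # window width in seconds
overlap        <- 0.5    # 50% overlap between consecutive windows

window.size <- round(sampling.rate * window.seconds)  # readings per window
step.size   <- round(window.size * (1 - overlap))     # readings between window starts

# Start indices of the windows that fit in a hypothetical 10-second recording
n.readings    <- 10 * sampling.rate
window.starts <- seq(1, n.readings - window.size + 1, by = step.size)

window.size     # 128
step.size       # 64
window.starts   # 1 65 129 193 257 321
```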

Please visit the data website for further details.

Download

download.url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"
download.path <- "./UCI HAR Dataset.zip"

# Download the archive only if it is not already present
if (!file.exists(download.path)) {
        download.file(download.url, download.path, mode = "wb") # "wb" keeps the zip intact on Windows
}

# Unzip the archive only if the target directory does not exist yet
unzip.path <- "UCI HAR Dataset"
if (!dir.exists(unzip.path)) {
        unzip(download.path)
}

Import

# List all .txt files in the directory (pattern is a regular expression, so the dot is escaped)
all.files.paths <- list.files(path = "./UCI HAR Dataset", pattern = "\\.txt$", recursive = TRUE, full.names = TRUE)

List of all unzipped files

Path                                                            Description
./UCI HAR Dataset/activity_labels.txt                           Links the class labels with their activity name.
./UCI HAR Dataset/features.txt                                  List of all features
./UCI HAR Dataset/features_info.txt                             Shows information about the variables used on the feature vector.
./UCI HAR Dataset/README.txt                                    README file
./UCI HAR Dataset/test/Inertial Signals/body_acc_x_test.txt     Body Acceleration Signals
./UCI HAR Dataset/test/Inertial Signals/body_acc_y_test.txt     Body Acceleration Signals
./UCI HAR Dataset/test/Inertial Signals/body_acc_z_test.txt     Body Acceleration Signals
./UCI HAR Dataset/test/Inertial Signals/body_gyro_x_test.txt    Angular velocity measurements
./UCI HAR Dataset/test/Inertial Signals/body_gyro_y_test.txt    Angular velocity measurements
./UCI HAR Dataset/test/Inertial Signals/body_gyro_z_test.txt    Angular velocity measurements
./UCI HAR Dataset/test/Inertial Signals/total_acc_x_test.txt    Total Acceleration Signals
./UCI HAR Dataset/test/Inertial Signals/total_acc_y_test.txt    Total Acceleration Signals
./UCI HAR Dataset/test/Inertial Signals/total_acc_z_test.txt    Total Acceleration Signals
./UCI HAR Dataset/test/subject_test.txt                         Each row identifies the subject who performed the activity
./UCI HAR Dataset/test/X_test.txt                               Compiled Test Set
./UCI HAR Dataset/test/y_test.txt                               Test Labels
./UCI HAR Dataset/train/Inertial Signals/body_acc_x_train.txt   Body Acceleration Signals
./UCI HAR Dataset/train/Inertial Signals/body_acc_y_train.txt   Body Acceleration Signals
./UCI HAR Dataset/train/Inertial Signals/body_acc_z_train.txt   Body Acceleration Signals
./UCI HAR Dataset/train/Inertial Signals/body_gyro_x_train.txt  Angular velocity measurements
./UCI HAR Dataset/train/Inertial Signals/body_gyro_y_train.txt  Angular velocity measurements
./UCI HAR Dataset/train/Inertial Signals/body_gyro_z_train.txt  Angular velocity measurements
./UCI HAR Dataset/train/Inertial Signals/total_acc_x_train.txt  Total Acceleration Signals
./UCI HAR Dataset/train/Inertial Signals/total_acc_y_train.txt  Total Acceleration Signals
./UCI HAR Dataset/train/Inertial Signals/total_acc_z_train.txt  Total Acceleration Signals
./UCI HAR Dataset/train/subject_train.txt                       Each row identifies the subject who performed the activity
./UCI HAR Dataset/train/X_train.txt                             Compiled Train Set
./UCI HAR Dataset/train/y_train.txt                             Train Labels

Once the files are downloaded and unzipped, we notice that the data came in several .txt files (as shown above). Taking a closer look at the data website and the README.txt file, we can begin to form an idea of how the final table might be structured.

It seems that:

  1. Most of the measurements are compiled in the X_test.txt and X_train.txt sets.
  2. The README.txt and features_info.txt are explanatory files and don’t constitute the final dataset.

This allows us to discard most of the files, leaving only the ones we need:

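# Pick only the necessary file paths (positions refer to the sorted listing above)
all.files.paths <- all.files.paths[c(1,2,14:16,26:28)]

# Import each file as a data.table; fread() comes from the data.table package
library(data.table)
raw.data <- lapply(X = all.files.paths, FUN = fread)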

List of files required for the final dataset

Path                                       Description
./UCI HAR Dataset/activity_labels.txt      Links the class labels with their activity name.
./UCI HAR Dataset/features.txt             List of all features
./UCI HAR Dataset/test/subject_test.txt    Each row identifies the subject who performed the activity
./UCI HAR Dataset/test/X_test.txt          Compiled Test Set
./UCI HAR Dataset/test/y_test.txt          Test Labels
./UCI HAR Dataset/train/subject_train.txt  Each row identifies the subject who performed the activity
./UCI HAR Dataset/train/X_train.txt        Compiled Train Set
./UCI HAR Dataset/train/y_train.txt        Train Labels
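
Positional indices such as c(1,2,14:16,26:28) depend on the exact order in which the files are listed. As a hedged alternative, the same selection can be expressed by file name. The sketch below hard-codes a handful of paths for illustration, and the names example.paths and needed.paths are hypothetical.

```r
# A few of the paths returned by list.files(), hard-coded here for illustration
example.paths <- c(
  "./UCI HAR Dataset/activity_labels.txt",
  "./UCI HAR Dataset/features.txt",
  "./UCI HAR Dataset/features_info.txt",
  "./UCI HAR Dataset/README.txt",
  "./UCI HAR Dataset/test/Inertial Signals/body_acc_x_test.txt",
  "./UCI HAR Dataset/test/subject_test.txt",
  "./UCI HAR Dataset/test/X_test.txt",
  "./UCI HAR Dataset/test/y_test.txt"
)

# Drop the raw inertial signals and the two explanatory files by name
is.inertial    <- grepl("Inertial Signals", example.paths, fixed = TRUE)
is.explanatory <- basename(example.paths) %in% c("README.txt", "features_info.txt")
needed.paths   <- example.paths[!is.inertial & !is.explanatory]

basename(needed.paths)
# "activity_labels.txt" "features.txt" "subject_test.txt" "X_test.txt" "y_test.txt"
```

Selecting by name keeps working if a file is added to or removed from the archive, whereas fixed positions would silently point at the wrong files.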

Table Assembly

Table Structure

Now that the required files have been imported, we want to have an idea of how they are going to be structured. The image below is a representation of how each txt file has to be bound in the final dataset.

  1. The test set (blue) is spread over three files. These should be put together and identified as “test set”.
  2. The same must be done with the train set (orange).
  3. Only then are both sets to be bound together.
  4. Finally, the sets must have their columns named. The column names are in the features.txt file, which has also been imported.
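
The four steps above can be sketched on toy data. The tiny data frames below are hypothetical stand-ins for the imported files, and base R (data.frame() and rbind()) is used here instead of dplyr only so that the sketch runs without extra packages.

```r
# Toy stand-ins for the imported files (2 test rows, 3 train rows, 2 features)
subject.test  <- data.frame(V1 = c(2L, 2L))
y.test        <- data.frame(V1 = c(5L, 4L))
x.test        <- data.frame(V1 = c(0.25, 0.28), V2 = c(-0.02, -0.01))
subject.train <- data.frame(V1 = c(1L, 1L, 3L))
y.train       <- data.frame(V1 = c(1L, 2L, 6L))
x.train       <- data.frame(V1 = c(0.29, 0.27, 0.30), V2 = c(-0.03, -0.02, -0.01))
feature.names <- c("tBodyAcc-mean()-X", "tBodyAcc-mean()-Y")

# Steps 1 and 2: bind subject, activity, set label and measurements column-wise
assemble <- function(subject, activity, measurements, set.label) {
  data.frame(Subject = subject$V1, Activity = activity$V1, Set = set.label,
             measurements, stringsAsFactors = FALSE)
}
toy.test  <- assemble(subject.test, y.test, x.test, "Test set")
toy.train <- assemble(subject.train, y.train, x.train, "Train set")

# Step 3: stack both sets row-wise
toy.table <- rbind(toy.test, toy.train)

# Step 4: name the measurement columns after the feature list
colnames(toy.table) <- c("Subject", "Activity", "Set", feature.names)

dim(toy.table)  # 5 5
```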

1. “Test” set assembly

library(dplyr) # provides bind_cols(), bind_rows(), select() and %>%
test.table <- bind_cols(raw.data[[3]], raw.data[[5]], Set = rep("Test set", times = nrow(raw.data[[3]])), raw.data[[4]])

2. “Train” set assembly

train.table <- bind_cols(raw.data[[6]], raw.data[[8]], Set = rep("Train set", times = nrow(raw.data[[6]])), raw.data[[7]])

3. Bind “test” and “train” sets

table <- bind_rows(test.table, train.table)

4. Column names

# Set up the column names list
col.names <- c("Subject",
                  "Activity",
                  "Set",
                  unlist(raw.data[[2]][,2]))


Adjustments

Duplicate Values on col.names

Notice that we haven’t assigned col.names to the table yet. That is because a closer look shows that the list contains many duplicate values.

# Check for duplicates
table(duplicated(col.names))
## 
## FALSE  TRUE 
##   480    84

Duplicated values are a problem because they do not allow packages like dplyr to work properly. So, in order to make all values unique, we will add an exclamation point and an ID number (the column position) to the end of each column name.

# Append "!" plus the column position to every variable name. This avoids duplicates
col.names <- paste0(col.names, rep("!", times = dim(table)[2]), 1:dim(table)[2])

# Assign the list to be the column name of the table
colnames(table) <- col.names
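
For comparison, the marker approach used above can be set next to base R's make.unique(), which is a different technique from the one in this walkthrough. The three names below are a toy example.

```r
toy.names <- c("Subject", "fBodyAcc-bandsEnergy()-1,8", "fBodyAcc-bandsEnergy()-1,8")

# Approach used above: append "!" plus the column position to every name
marked.names <- paste0(toy.names, "!", seq_along(toy.names))

# Base R alternative: only the repeated names receive a numeric suffix
unique.names <- make.unique(toy.names)

marked.names
unique.names
anyDuplicated(marked.names)  # 0 means no duplicates remain
```

The position marker has one practical advantage: because every name ends in "!" followed by digits, a single regular expression can strip all markers again later.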

Subset mean and std variables

The assembled table has 564 columns (3 identifiers plus 561 features). If we were to apply a machine learning algorithm to the whole dataset, a lot of computational power and time would be spent. So one of our tasks will be to extract only the mean and std variables of the dataset. This greatly reduces the computational power required to run the code, while keeping the summary statistics that describe each signal.

# Subset the variables corresponding to means and standard deviations
table.subset <- table %>%
        select(starts_with("t"), starts_with("f")) %>%
        select(contains("mean"), contains("std"))

# Replace the previous table with the subsetted one
table <- bind_cols(table[,1:3],table.subset)
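
The selection rule can be tried out on a few names with plain regular expressions. This is a sketch of the same logic in base R, not the dplyr pipeline itself. The names are toy examples, and the numeric suffixes on the last names are illustrative.

```r
# Toy column names; the "!n" suffixes are illustrative
toy.names <- c("Subject!1", "tBodyAcc-mean()-X!4", "tBodyAcc-std()-X!7",
               "tBodyAcc-max()-X!13", "fBodyAcc-meanFreq()-X!297",
               "angle(tBodyAccMean,gravity)!559")

# Same rule as the pipeline: name starts with t or f AND contains mean or std
# (ignore.case = TRUE mirrors the default of the dplyr select helpers)
is.signal   <- grepl("^[tf]", toy.names, ignore.case = TRUE)
is.mean.std <- grepl("mean|std", toy.names, ignore.case = TRUE)
kept.names  <- toy.names[is.signal & is.mean.std]

# Stricter variant that keeps only mean() and std(), excluding meanFreq()
strict.names <- toy.names[is.signal & grepl("(mean|std)\\(\\)", toy.names)]

kept.names
strict.names
```

Matching on "mean" also keeps the meanFreq() variables, as in the column list of this walkthrough. Whether those belong in the subset is a judgment call, and the stricter pattern shows one way to leave them out. Unlike the pipeline, logical indexing keeps the original column order.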

Making the data Tidy

Descriptive column names

Let’s take a look at the column names we have just assigned:

List of all Column Names
V1 V2 V3
Subject!1 fBodyAcc-meanFreq()-Z!299 tBodyAccJerk-std()-Y!88
Activity!2 fBodyAccJerk-mean()-X!348 tBodyAccJerk-std()-Z!89
Set!3 fBodyAccJerk-mean()-Y!349 tBodyGyro-std()-X!127
tBodyAcc-mean()-X!4 fBodyAccJerk-mean()-Z!350 tBodyGyro-std()-Y!128
tBodyAcc-mean()-Y!5 fBodyAccJerk-meanFreq()-X!376 tBodyGyro-std()-Z!129
tBodyAcc-mean()-Z!6 fBodyAccJerk-meanFreq()-Y!377 tBodyGyroJerk-std()-X!167
tGravityAcc-mean()-X!44 fBodyAccJerk-meanFreq()-Z!378 tBodyGyroJerk-std()-Y!168
tGravityAcc-mean()-Y!45 fBodyGyro-mean()-X!427 tBodyGyroJerk-std()-Z!169
tGravityAcc-mean()-Z!46 fBodyGyro-mean()-Y!428 tBodyAccMag-std()!205
tBodyAccJerk-mean()-X!84 fBodyGyro-mean()-Z!429 tGravityAccMag-std()!218
tBodyAccJerk-mean()-Y!85 fBodyGyro-meanFreq()-X!455 tBodyAccJerkMag-std()!231
tBodyAccJerk-mean()-Z!86 fBodyGyro-meanFreq()-Y!456 tBodyGyroMag-std()!244
tBodyGyro-mean()-X!124 fBodyGyro-meanFreq()-Z!457 tBodyGyroJerkMag-std()!257
tBodyGyro-mean()-Y!125 fBodyAccMag-mean()!506 fBodyAcc-std()-X!272
tBodyGyro-mean()-Z!126 fBodyAccMag-meanFreq()!516 fBodyAcc-std()-Y!273
tBodyGyroJerk-mean()-X!164 fBodyBodyAccJerkMag-mean()!519 fBodyAcc-std()-Z!274
tBodyGyroJerk-mean()-Y!165 fBodyBodyAccJerkMag-meanFreq()!529 fBodyAccJerk-std()-X!351
tBodyGyroJerk-mean()-Z!166 fBodyBodyGyroMag-mean()!532 fBodyAccJerk-std()-Y!352
tBodyAccMag-mean()!204 fBodyBodyGyroMag-meanFreq()!542 fBodyAccJerk-std()-Z!353
tGravityAccMag-mean()!217 fBodyBodyGyroJerkMag-mean()!545 fBodyGyro-std()-X!430
tBodyAccJerkMag-mean()!230 fBodyBodyGyroJerkMag-meanFreq()!555 fBodyGyro-std()-Y!431
tBodyGyroMag-mean()!243 tBodyAcc-std()-X!7 fBodyGyro-std()-Z!432
tBodyGyroJerkMag-mean()!256 tBodyAcc-std()-Y!8 fBodyAccMag-std()!507
fBodyAcc-mean()-X!269 tBodyAcc-std()-Z!9 fBodyBodyAccJerkMag-std()!520
fBodyAcc-mean()-Y!270 tGravityAcc-std()-X!47 fBodyBodyGyroMag-std()!533
fBodyAcc-mean()-Z!271 tGravityAcc-std()-Y!48 fBodyBodyGyroJerkMag-std()!546
fBodyAcc-meanFreq()-X!297 tGravityAcc-std()-Z!49
fBodyAcc-meanFreq()-Y!298 tBodyAccJerk-std()-X!87

As we want to export tidy data that is fully comprehensible, we must ensure that our variable names are more descriptive.

col.names <- colnames(table)
col.names <- tolower(col.names) # All to lower case
col.names <- gsub("^t", "time.", col.names) # Replace leading "t" with "time."
col.names <- gsub("^f", "freq.", col.names) # Replace leading "f" with "freq."
col.names <- gsub("-", ".", col.names) # Replace "-" with "."
col.names <- gsub("()", "", col.names, fixed = TRUE) # Remove all "()" characters
col.names <- gsub("!.*", "", col.names) # Remove the "!" markers and the numbers after them

# Now that all markers have been removed, let's check whether the column names contain duplicates
table(duplicated(col.names))
## 
## FALSE 
##    82
# Reassign the list to the column names
colnames(table) <- col.names
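
To see the effect of each substitution, the renaming steps can be wrapped in a small helper and applied to a few of the names listed earlier. The function name clean.names is hypothetical, and the sketch uses fixed = TRUE for the literal replacements.

```r
clean.names <- function(x) {
  x <- tolower(x)
  x <- gsub("^t", "time.", x)            # leading t marks time-domain signals
  x <- gsub("^f", "freq.", x)            # leading f marks frequency-domain signals
  x <- gsub("-", ".", x, fixed = TRUE)   # dashes become dots
  x <- gsub("()", "", x, fixed = TRUE)   # drop the empty parentheses
  gsub("!.*", "", x)                     # drop the uniqueness marker
}

cleaned <- clean.names(c("Subject!1", "tBodyAcc-mean()-X!4",
                         "fBodyBodyGyroJerkMag-std()!546"))
cleaned
# "subject" "time.bodyacc.mean.x" "freq.bodybodygyrojerkmag.std"
```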

Descriptive activity column

The activity column in our table consists of numbers, each indicating which activity the subject performed in the given row. As it stands, the table shows that different activities were performed but does not make clear which ones.

# Check what are the values in the "activity column"
table(table[,activity])
## 
##    1    2    3    4    5    6 
## 1722 1544 1406 1777 1906 1944

Luckily, the activity_labels.txt file contains a list linking each number with its corresponding activity.

We will use the numbers as keys to link each row with its activity, joining both tables. This will make our final table fully comprehensible and complete.

# Join activity table and massive table to add the activities
activity.table <- raw.data[[1]]
table <- merge(activity.table, table, by.x = "V1", by.y = "activity")[,-1]
colnames(table)[1] <- "activity"

# Recheck the "activity column"
table(table[, activity])
## 
##             LAYING            SITTING           STANDING 
##               1944               1777               1906 
##            WALKING WALKING_DOWNSTAIRS   WALKING_UPSTAIRS 
##               1722               1406               1544
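
The key-based join can be illustrated on a toy table. The sketch below uses base data frames rather than data.table, and it adds a match() lookup as an alternative that is not part of the original walkthrough. The code-to-label mapping follows the counts shown above.

```r
activity.labels <- data.frame(V1 = 1:6,
                              V2 = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
                                     "SITTING", "STANDING", "LAYING"),
                              stringsAsFactors = FALSE)
toy <- data.frame(activity = c(5L, 1L, 6L, 1L), subject = c(2L, 2L, 4L, 9L))

# Join on the numeric code, as in the text; note that merge() sorts rows by the key
joined <- merge(activity.labels, toy, by.x = "V1", by.y = "activity")[, -1]
colnames(joined)[1] <- "activity"

# Alternative: look the label up directly, which keeps the original row order
toy$activity <- activity.labels$V2[match(toy$activity, activity.labels$V1)]

joined$activity  # "WALKING" "WALKING" "STANDING" "LAYING"
toy$activity     # "STANDING" "WALKING" "LAYING" "WALKING"
```

Both approaches attach the same labels. They differ only in whether the rows end up sorted by activity code.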

Cleaning

Now that we have our dataset, we want to take a look at what is inside to ensure that it is “clean” for the final user.

str(table)
## Classes 'data.table' and 'data.frame':   10299 obs. of  82 variables:
##  $ activity                         : chr  "WALKING" "WALKING" "WALKING" "WALKING" ...
##  $ subject                          : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ set                              : chr  "Test set" "Test set" "Test set" "Test set" ...
##  $ time.bodyacc.mean.x              : num  0.204 0.249 0.325 0.309 0.266 ...
##  $ time.bodyacc.mean.y              : num  -0.03234 -0.00341 -0.0298 -0.02213 -0.01594 ...
##  $ time.bodyacc.mean.z              : num  -0.0969 -0.056 -0.0778 -0.1321 -0.1207 ...
##  $ time.gravityacc.mean.x           : num  0.92 0.913 0.921 0.925 0.926 ...
##  $ time.gravityacc.mean.y           : num  -0.347 -0.352 -0.352 -0.352 -0.348 ...
##  $ time.gravityacc.mean.z           : num  0.0197 0.025 0.0301 0.0268 0.0218 ...
##  $ time.bodyaccjerk.mean.x          : num  -0.077 0.1098 0.2018 -0.1578 0.0362 ...
##  $ time.bodyaccjerk.mean.y          : num  0.1975 0.242 -0.0531 -0.4142 0.0659 ...
##  $ time.bodyaccjerk.mean.z          : num  0.17867 0.38011 -0.22104 -0.23184 0.00615 ...
##  $ time.bodygyro.mean.x             : num  -0.2783 -0.1484 -0.0143 0.0295 0.025 ...
##  $ time.bodygyro.mean.y             : num  0.2344 0.032 -0.0765 -0.0873 -0.1465 ...
##  $ time.bodygyro.mean.z             : num  0.0817 0.1259 0.1062 0.1063 0.0988 ...
##  $ time.bodygyrojerk.mean.x         : num  -0.0568 -0.0294 0.1152 -0.0182 -0.128 ...
##  $ time.bodygyrojerk.mean.y         : num  -0.196 -0.0706 0.0933 -0.1509 -0.2406 ...
##  $ time.bodygyrojerk.mean.z         : num  0.11924 -0.2054 -0.1921 -0.01534 0.00789 ...
##  $ time.bodyaccmag.mean             : num  -0.356 -0.296 -0.278 -0.291 -0.344 ...
##  $ time.gravityaccmag.mean          : num  -0.356 -0.296 -0.278 -0.291 -0.344 ...
##  $ time.bodyaccjerkmag.mean         : num  -0.391 -0.295 -0.268 -0.318 -0.4 ...
##  $ time.bodygyromag.mean            : num  -0.417 -0.46 -0.482 -0.482 -0.513 ...
##  $ time.bodygyrojerkmag.mean        : num  -0.652 -0.638 -0.641 -0.645 -0.661 ...
##  $ freq.bodyacc.mean.x              : num  -0.362 -0.344 -0.373 -0.443 -0.458 ...
##  $ freq.bodyacc.mean.y              : num  -0.121 -0.1052 0.0859 0.1167 -0.1556 ...
##  $ freq.bodyacc.mean.z              : num  -0.521 -0.458 -0.402 -0.375 -0.483 ...
##  $ freq.bodyacc.meanfreq.x          : num  -0.2387 -0.2067 -0.0966 -0.0414 -0.3042 ...
##  $ freq.bodyacc.meanfreq.y          : num  0.1114 0.1512 0.2123 0.2108 -0.0106 ...
##  $ freq.bodyacc.meanfreq.z          : num  -0.00924 0.42195 -0.01455 0.18375 -0.16149 ...
##  $ freq.bodyaccjerk.mean.x          : num  -0.351 -0.289 -0.332 -0.45 -0.45 ...
##  $ freq.bodyaccjerk.mean.y          : num  -0.2039 -0.1005 0.0728 0.029 -0.2658 ...
##  $ freq.bodyaccjerk.mean.z          : num  -0.648 -0.624 -0.532 -0.539 -0.604 ...
##  $ freq.bodyaccjerk.meanfreq.x      : num  -0.1619 0.0165 0.0103 0.0952 -0.0783 ...
##  $ freq.bodyaccjerk.meanfreq.y      : num  -0.2491 -0.0751 -0.2732 -0.4531 -0.36 ...
##  $ freq.bodyaccjerk.meanfreq.z      : num  -0.339 -0.233 -0.344 -0.13 -0.202 ...
##  $ freq.bodygyro.mean.x             : num  -0.513 -0.496 -0.425 -0.524 -0.537 ...
##  $ freq.bodygyro.mean.y             : num  -0.568 -0.588 -0.568 -0.59 -0.577 ...
##  $ freq.bodygyro.mean.z             : num  -0.433 -0.432 -0.426 -0.487 -0.534 ...
##  $ freq.bodygyro.meanfreq.x         : num  -0.1338 -0.0983 0.2521 -0.2303 -0.0918 ...
##  $ freq.bodygyro.meanfreq.y         : num  -0.2035 -0.2866 -0.0591 -0.2582 -0.0289 ...
##  $ freq.bodygyro.meanfreq.z         : num  0.0638 0.045 0.0683 0.0435 0.041 ...
##  $ freq.bodyaccmag.mean             : num  -0.368 -0.372 -0.299 -0.331 -0.475 ...
##  $ freq.bodyaccmag.meanfreq         : num  0.163 0.39 0.428 0.454 0.332 ...
##  $ freq.bodybodyaccjerkmag.mean     : num  -0.332 -0.225 -0.135 -0.198 -0.4 ...
##  $ freq.bodybodyaccjerkmag.meanfreq : num  0.0317 0.1569 0.0162 0.0454 0.0831 ...
##  $ freq.bodybodygyromag.mean        : num  -0.574 -0.633 -0.571 -0.566 -0.595 ...
##  $ freq.bodybodygyromag.meanfreq    : num  0.0185 0.1961 0.25 0.1382 0.2023 ...
##  $ freq.bodybodygyrojerkmag.mean    : num  -0.69 -0.679 -0.674 -0.68 -0.683 ...
##  $ freq.bodybodygyrojerkmag.meanfreq: num  0.109 0.267 0.372 0.26 0.324 ...
##  $ time.bodyacc.std.x               : num  -0.466 -0.399 -0.465 -0.515 -0.501 ...
##  $ time.bodyacc.std.y               : num  -0.1808 -0.1378 0.0135 0.032 -0.1632 ...
##  $ time.bodyacc.std.z               : num  -0.455 -0.461 -0.368 -0.349 -0.389 ...
##  $ time.gravityacc.std.x            : num  -0.954 -0.97 -0.961 -0.979 -0.971 ...
##  $ time.gravityacc.std.y            : num  -0.947 -0.973 -0.985 -0.983 -0.977 ...
##  $ time.gravityacc.std.z            : num  -0.976 -0.969 -0.97 -0.959 -0.982 ...
##  $ time.bodyaccjerk.std.x           : num  -0.369 -0.257 -0.333 -0.427 -0.44 ...
##  $ time.bodyaccjerk.std.y           : num  -0.1931 -0.0974 0.1332 0.0887 -0.2381 ...
##  $ time.bodyaccjerk.std.z           : num  -0.678 -0.663 -0.587 -0.575 -0.622 ...
##  $ time.bodygyro.std.x              : num  -0.615 -0.603 -0.594 -0.587 -0.615 ...
##  $ time.bodygyro.std.y              : num  -0.543 -0.545 -0.557 -0.537 -0.555 ...
##  $ time.bodygyro.std.z              : num  -0.5 -0.489 -0.51 -0.553 -0.583 ...
##  $ time.bodygyrojerk.std.x          : num  -0.543 -0.534 -0.532 -0.516 -0.51 ...
##  $ time.bodygyrojerk.std.y          : num  -0.739 -0.71 -0.73 -0.733 -0.747 ...
##  $ time.bodygyrojerk.std.z          : num  -0.563 -0.581 -0.537 -0.55 -0.619 ...
##  $ time.bodyaccmag.std              : num  -0.444 -0.439 -0.416 -0.449 -0.52 ...
##  $ time.gravityaccmag.std           : num  -0.444 -0.439 -0.416 -0.449 -0.52 ...
##  $ time.bodyaccjerkmag.std          : num  -0.318 -0.245 -0.121 -0.176 -0.416 ...
##  $ time.bodygyromag.std             : num  -0.546 -0.605 -0.612 -0.574 -0.585 ...
##  $ time.bodygyrojerkmag.std         : num  -0.706 -0.692 -0.69 -0.681 -0.698 ...
##  $ freq.bodyacc.std.x               : num  -0.512 -0.423 -0.506 -0.546 -0.519 ...
##  $ freq.bodyacc.std.y               : num  -0.2668 -0.2105 -0.0923 -0.0824 -0.2201 ...
##  $ freq.bodyacc.std.z               : num  -0.462 -0.506 -0.398 -0.386 -0.387 ...
##  $ freq.bodyaccjerk.std.x           : num  -0.45 -0.289 -0.396 -0.453 -0.48 ...
##  $ freq.bodyaccjerk.std.y           : num  -0.238 -0.159 0.124 0.082 -0.259 ...
##  $ freq.bodyaccjerk.std.z           : num  -0.705 -0.702 -0.641 -0.608 -0.637 ...
##  $ freq.bodygyro.std.x              : num  -0.647 -0.637 -0.649 -0.608 -0.64 ...
##  $ freq.bodygyro.std.y              : num  -0.531 -0.523 -0.552 -0.51 -0.545 ...
##  $ freq.bodygyro.std.z              : num  -0.57 -0.556 -0.585 -0.618 -0.639 ...
##  $ freq.bodyaccmag.std              : num  -0.577 -0.567 -0.585 -0.614 -0.621 ...
##  $ freq.bodybodyaccjerkmag.std      : num  -0.304 -0.273 -0.109 -0.154 -0.437 ...
##  $ freq.bodybodygyromag.std         : num  -0.604 -0.652 -0.716 -0.655 -0.649 ...
##  $ freq.bodybodygyrojerkmag.std     : num  -0.75 -0.732 -0.736 -0.704 -0.741 ...
##  - attr(*, ".internal.selfref")=<externalptr>

All columns seem to have the proper type. Both the activity and set columns are “character”, while the subject column is integer. All the rest refer to experimental measurements and are “numeric”.

summary(table)
##    activity            subject          set            time.bodyacc.mean.x
##  Length:10299       Min.   : 1.00   Length:10299       Min.   :-1.0000    
##  Class :character   1st Qu.: 9.00   Class :character   1st Qu.: 0.2626    
##  Mode  :character   Median :17.00   Mode  :character   Median : 0.2772    
##                     Mean   :16.15                      Mean   : 0.2743    
##                     3rd Qu.:24.00                      3rd Qu.: 0.2884    
##                     Max.   :30.00                      Max.   : 1.0000    
##  time.bodyacc.mean.y time.bodyacc.mean.z time.gravityacc.mean.x
##  Min.   :-1.00000    Min.   :-1.00000    Min.   :-1.0000       
##  1st Qu.:-0.02490    1st Qu.:-0.12102    1st Qu.: 0.8117       
##  Median :-0.01716    Median :-0.10860    Median : 0.9218       
##  Mean   :-0.01774    Mean   :-0.10892    Mean   : 0.6692       
##  3rd Qu.:-0.01062    3rd Qu.:-0.09759    3rd Qu.: 0.9547       
##  Max.   : 1.00000    Max.   : 1.00000    Max.   : 1.0000       
##  time.gravityacc.mean.y time.gravityacc.mean.z time.bodyaccjerk.mean.x
##  Min.   :-1.000000      Min.   :-1.00000       Min.   :-1.00000       
##  1st Qu.:-0.242943      1st Qu.:-0.11671       1st Qu.: 0.06298       
##  Median :-0.143551      Median : 0.03680       Median : 0.07597       
##  Mean   : 0.004039      Mean   : 0.09215       Mean   : 0.07894       
##  3rd Qu.: 0.118905      3rd Qu.: 0.21621       3rd Qu.: 0.09131       
##  Max.   : 1.000000      Max.   : 1.00000       Max.   : 1.00000       
##  time.bodyaccjerk.mean.y time.bodyaccjerk.mean.z time.bodygyro.mean.x
##  Min.   :-1.000000       Min.   :-1.000000       Min.   :-1.00000    
##  1st Qu.:-0.018555       1st Qu.:-0.031552       1st Qu.:-0.04579    
##  Median : 0.010753       Median :-0.001159       Median :-0.02776    
##  Mean   : 0.007948       Mean   :-0.004675       Mean   :-0.03098    
##  3rd Qu.: 0.033538       3rd Qu.: 0.024578       3rd Qu.:-0.01058    
##  Max.   : 1.000000       Max.   : 1.000000       Max.   : 1.00000    
##  time.bodygyro.mean.y time.bodygyro.mean.z time.bodygyrojerk.mean.x
##  Min.   :-1.00000     Min.   :-1.00000     Min.   :-1.00000        
##  1st Qu.:-0.10399     1st Qu.: 0.06485     1st Qu.:-0.11723        
##  Median :-0.07477     Median : 0.08626     Median :-0.09824        
##  Mean   :-0.07472     Mean   : 0.08836     Mean   :-0.09671        
##  3rd Qu.:-0.05110     3rd Qu.: 0.11044     3rd Qu.:-0.07930        
##  Max.   : 1.00000     Max.   : 1.00000     Max.   : 1.00000        
##  time.bodygyrojerk.mean.y time.bodygyrojerk.mean.z time.bodyaccmag.mean
##  Min.   :-1.00000         Min.   :-1.00000         Min.   :-1.0000     
##  1st Qu.:-0.05868         1st Qu.:-0.07936         1st Qu.:-0.9819     
##  Median :-0.04056         Median :-0.05455         Median :-0.8746     
##  Mean   :-0.04232         Mean   :-0.05483         Mean   :-0.5482     
##  3rd Qu.:-0.02521         3rd Qu.:-0.03168         3rd Qu.:-0.1201     
##  Max.   : 1.00000         Max.   : 1.00000         Max.   : 1.0000     
##  time.gravityaccmag.mean time.bodyaccjerkmag.mean time.bodygyromag.mean
##  Min.   :-1.0000         Min.   :-1.0000          Min.   :-1.0000      
##  1st Qu.:-0.9819         1st Qu.:-0.9896          1st Qu.:-0.9781      
##  Median :-0.8746         Median :-0.9481          Median :-0.8223      
##  Mean   :-0.5482         Mean   :-0.6494          Mean   :-0.6052      
##  3rd Qu.:-0.1201         3rd Qu.:-0.2956          3rd Qu.:-0.2454      
##  Max.   : 1.0000         Max.   : 1.0000          Max.   : 1.0000      
##  time.bodygyrojerkmag.mean freq.bodyacc.mean.x freq.bodyacc.mean.y
##  Min.   :-1.0000           Min.   :-1.0000     Min.   :-1.0000    
##  1st Qu.:-0.9923           1st Qu.:-0.9913     1st Qu.:-0.9792    
##  Median :-0.9559           Median :-0.9456     Median :-0.8643    
##  Mean   :-0.7621           Mean   :-0.6228     Mean   :-0.5375    
##  3rd Qu.:-0.5499           3rd Qu.:-0.2646     3rd Qu.:-0.1032    
##  Max.   : 1.0000           Max.   : 1.0000     Max.   : 1.0000    
##  freq.bodyacc.mean.z freq.bodyacc.meanfreq.x freq.bodyacc.meanfreq.y
##  Min.   :-1.0000     Min.   :-1.00000        Min.   :-1.000000      
##  1st Qu.:-0.9832     1st Qu.:-0.41878        1st Qu.:-0.144772      
##  Median :-0.8954     Median :-0.23825        Median : 0.004666      
##  Mean   :-0.6650     Mean   :-0.22147        Mean   : 0.015401      
##  3rd Qu.:-0.3662     3rd Qu.:-0.02043        3rd Qu.: 0.176603      
##  Max.   : 1.0000     Max.   : 1.00000        Max.   : 1.000000      
##  freq.bodyacc.meanfreq.z freq.bodyaccjerk.mean.x freq.bodyaccjerk.mean.y
##  Min.   :-1.00000        Min.   :-1.0000         Min.   :-1.0000        
##  1st Qu.:-0.13845        1st Qu.:-0.9912         1st Qu.:-0.9848        
##  Median : 0.06084        Median :-0.9516         Median :-0.9257        
##  Mean   : 0.04731        Mean   :-0.6567         Mean   :-0.6290        
##  3rd Qu.: 0.24922        3rd Qu.:-0.3270         3rd Qu.:-0.2638        
##  Max.   : 1.00000        Max.   : 1.0000         Max.   : 1.0000        
##  freq.bodyaccjerk.mean.z freq.bodyaccjerk.meanfreq.x
##  Min.   :-1.0000         Min.   :-1.00000           
##  1st Qu.:-0.9873         1st Qu.:-0.29770           
##  Median :-0.9475         Median :-0.04544           
##  Mean   :-0.7436         Mean   :-0.04771           
##  3rd Qu.:-0.5133         3rd Qu.: 0.20447           
##  Max.   : 1.0000         Max.   : 1.00000           
##  freq.bodyaccjerk.meanfreq.y freq.bodyaccjerk.meanfreq.z
##  Min.   :-1.000000           Min.   :-1.00000           
##  1st Qu.:-0.427951           1st Qu.:-0.33139           
##  Median :-0.236530           Median :-0.10246           
##  Mean   :-0.213393           Mean   :-0.12383           
##  3rd Qu.: 0.008651           3rd Qu.: 0.09124           
##  Max.   : 1.000000           Max.   : 1.00000           
##  freq.bodygyro.mean.x freq.bodygyro.mean.y freq.bodygyro.mean.z
##  Min.   :-1.0000      Min.   :-1.0000      Min.   :-1.0000     
##  1st Qu.:-0.9853      1st Qu.:-0.9847      1st Qu.:-0.9851     
##  Median :-0.8917      Median :-0.9197      Median :-0.8877     
##  Mean   :-0.6721      Mean   :-0.7062      Mean   :-0.6442     
##  3rd Qu.:-0.3837      3rd Qu.:-0.4735      3rd Qu.:-0.3225     
##  Max.   : 1.0000      Max.   : 1.0000      Max.   : 1.0000     
##  freq.bodygyro.meanfreq.x freq.bodygyro.meanfreq.y
##  Min.   :-1.00000         Min.   :-1.00000        
##  1st Qu.:-0.27189         1st Qu.:-0.36257        
##  Median :-0.09868         Median :-0.17298        
##  Mean   :-0.10104         Mean   :-0.17428        
##  3rd Qu.: 0.06810         3rd Qu.: 0.01366        
##  Max.   : 1.00000         Max.   : 1.00000        
##  freq.bodygyro.meanfreq.z freq.bodyaccmag.mean freq.bodyaccmag.meanfreq
##  Min.   :-1.00000         Min.   :-1.0000      Min.   :-1.00000        
##  1st Qu.:-0.23240         1st Qu.:-0.9847      1st Qu.:-0.09663        
##  Median :-0.05369         Median :-0.8755      Median : 0.07026        
##  Mean   :-0.05139         Mean   :-0.5860      Mean   : 0.07688        
##  3rd Qu.: 0.12251         3rd Qu.:-0.2173      3rd Qu.: 0.24495        
##  Max.   : 1.00000         Max.   : 1.0000      Max.   : 1.00000        
##  freq.bodybodyaccjerkmag.mean freq.bodybodyaccjerkmag.meanfreq
##  Min.   :-1.0000              Min.   :-1.000000               
##  1st Qu.:-0.9898              1st Qu.:-0.002959               
##  Median :-0.9290              Median : 0.164180               
##  Mean   :-0.6208              Mean   : 0.173220               
##  3rd Qu.:-0.2600              3rd Qu.: 0.357307               
##  Max.   : 1.0000              Max.   : 1.000000               
##  freq.bodybodygyromag.mean freq.bodybodygyromag.meanfreq
##  Min.   :-1.0000           Min.   :-1.00000             
##  1st Qu.:-0.9825           1st Qu.:-0.23436             
##  Median :-0.8756           Median :-0.05210             
##  Mean   :-0.6974           Mean   :-0.04156             
##  3rd Qu.:-0.4514           3rd Qu.: 0.15158             
##  Max.   : 1.0000           Max.   : 1.00000             
##  freq.bodybodygyrojerkmag.mean freq.bodybodygyrojerkmag.meanfreq
##  Min.   :-1.0000               Min.   :-1.00000                 
##  1st Qu.:-0.9921               1st Qu.:-0.01948                 
##  Median :-0.9453               Median : 0.13625                 
##  Mean   :-0.7798               Mean   : 0.12671                 
##  3rd Qu.:-0.6122               3rd Qu.: 0.28896                 
##  Max.   : 1.0000               Max.   : 1.00000                 
##  time.bodyacc.std.x time.bodyacc.std.y time.bodyacc.std.z
##  Min.   :-1.0000    Min.   :-1.00000   Min.   :-1.0000   
##  1st Qu.:-0.9924    1st Qu.:-0.97699   1st Qu.:-0.9791   
##  Median :-0.9430    Median :-0.83503   Median :-0.8508   
##  Mean   :-0.6078    Mean   :-0.51019   Mean   :-0.6131   
##  3rd Qu.:-0.2503    3rd Qu.:-0.05734   3rd Qu.:-0.2787   
##  Max.   : 1.0000    Max.   : 1.00000   Max.   : 1.0000   
##  time.gravityacc.std.x time.gravityacc.std.y time.gravityacc.std.z
##  Min.   :-1.0000       Min.   :-1.0000       Min.   :-1.0000      
##  1st Qu.:-0.9949       1st Qu.:-0.9913       1st Qu.:-0.9866      
##  Median :-0.9819       Median :-0.9759       Median :-0.9665      
##  Mean   :-0.9652       Mean   :-0.9544       Mean   :-0.9389      
##  3rd Qu.:-0.9615       3rd Qu.:-0.9464       3rd Qu.:-0.9296      
##  Max.   : 1.0000       Max.   : 1.0000       Max.   : 1.0000      
##  time.bodyaccjerk.std.x time.bodyaccjerk.std.y time.bodyaccjerk.std.z
##  Min.   :-1.0000        Min.   :-1.0000        Min.   :-1.0000       
##  1st Qu.:-0.9913        1st Qu.:-0.9850        1st Qu.:-0.9892       
##  Median :-0.9513        Median :-0.9250        Median :-0.9543       
##  Mean   :-0.6398        Mean   :-0.6080        Mean   :-0.7628       
##  3rd Qu.:-0.2912        3rd Qu.:-0.2218        3rd Qu.:-0.5485       
##  Max.   : 1.0000        Max.   : 1.0000        Max.   : 1.0000       
##  time.bodygyro.std.x time.bodygyro.std.y time.bodygyro.std.z
##  Min.   :-1.0000     Min.   :-1.0000     Min.   :-1.0000    
##  1st Qu.:-0.9872     1st Qu.:-0.9819     1st Qu.:-0.9850    
##  Median :-0.9016     Median :-0.9106     Median :-0.8819    
##  Mean   :-0.7212     Mean   :-0.6827     Mean   :-0.6537    
##  3rd Qu.:-0.4822     3rd Qu.:-0.4461     3rd Qu.:-0.3379    
##  Max.   : 1.0000     Max.   : 1.0000     Max.   : 1.0000    
##  time.bodygyrojerk.std.x time.bodygyrojerk.std.y time.bodygyrojerk.std.z
##  Min.   :-1.0000         Min.   :-1.0000         Min.   :-1.0000        
##  1st Qu.:-0.9907         1st Qu.:-0.9922         1st Qu.:-0.9926        
##  Median :-0.9348         Median :-0.9548         Median :-0.9503        
##  Mean   :-0.7313         Mean   :-0.7861         Mean   :-0.7399        
##  3rd Qu.:-0.4865         3rd Qu.:-0.6268         3rd Qu.:-0.5097        
##  Max.   : 1.0000         Max.   : 1.0000         Max.   : 1.0000        
##  time.bodyaccmag.std time.gravityaccmag.std time.bodyaccjerkmag.std
##  Min.   :-1.0000     Min.   :-1.0000        Min.   :-1.0000        
##  1st Qu.:-0.9822     1st Qu.:-0.9822        1st Qu.:-0.9907        
##  Median :-0.8437     Median :-0.8437        Median :-0.9288        
##  Mean   :-0.5912     Mean   :-0.5912        Mean   :-0.6278        
##  3rd Qu.:-0.2423     3rd Qu.:-0.2423        3rd Qu.:-0.2733        
##  Max.   : 1.0000     Max.   : 1.0000        Max.   : 1.0000        
##  time.bodygyromag.std time.bodygyrojerkmag.std freq.bodyacc.std.x
##  Min.   :-1.0000      Min.   :-1.0000          Min.   :-1.0000   
##  1st Qu.:-0.9775      1st Qu.:-0.9922          1st Qu.:-0.9929   
##  Median :-0.8259      Median :-0.9403          Median :-0.9416   
##  Mean   :-0.6625      Mean   :-0.7780          Mean   :-0.6034   
##  3rd Qu.:-0.3940      3rd Qu.:-0.6093          3rd Qu.:-0.2493   
##  Max.   : 1.0000      Max.   : 1.0000          Max.   : 1.0000   
##  freq.bodyacc.std.y freq.bodyacc.std.z freq.bodyaccjerk.std.x
##  Min.   :-1.00000   Min.   :-1.0000    Min.   :-1.0000       
##  1st Qu.:-0.97689   1st Qu.:-0.9780    1st Qu.:-0.9920       
##  Median :-0.83261   Median :-0.8398    Median :-0.9562       
##  Mean   :-0.52842   Mean   :-0.6179    Mean   :-0.6550       
##  3rd Qu.:-0.09216   3rd Qu.:-0.3023    3rd Qu.:-0.3203       
##  Max.   : 1.00000   Max.   : 1.0000    Max.   : 1.0000       
##  freq.bodyaccjerk.std.y freq.bodyaccjerk.std.z freq.bodygyro.std.x
##  Min.   :-1.0000        Min.   :-1.0000        Min.   :-1.0000    
##  1st Qu.:-0.9865        1st Qu.:-0.9895        1st Qu.:-0.9881    
##  Median :-0.9280        Median :-0.9590        Median :-0.9053    
##  Mean   :-0.6122        Mean   :-0.7809        Mean   :-0.7386    
##  3rd Qu.:-0.2361        3rd Qu.:-0.5903        3rd Qu.:-0.5225    
##  Max.   : 1.0000        Max.   : 1.0000        Max.   : 1.0000    
##  freq.bodygyro.std.y freq.bodygyro.std.z freq.bodyaccmag.std
##  Min.   :-1.0000     Min.   :-1.0000     Min.   :-1.0000    
##  1st Qu.:-0.9808     1st Qu.:-0.9862     1st Qu.:-0.9829    
##  Median :-0.9061     Median :-0.8915     Median :-0.8547    
##  Mean   :-0.6742     Mean   :-0.6904     Mean   :-0.6595    
##  3rd Qu.:-0.4385     3rd Qu.:-0.4168     3rd Qu.:-0.3823    
##  Max.   : 1.0000     Max.   : 1.0000     Max.   : 1.0000    
##  freq.bodybodyaccjerkmag.std freq.bodybodygyromag.std
##  Min.   :-1.0000             Min.   :-1.0000         
##  1st Qu.:-0.9907             1st Qu.:-0.9781         
##  Median :-0.9255             Median :-0.8275         
##  Mean   :-0.6401             Mean   :-0.7000         
##  3rd Qu.:-0.3082             3rd Qu.:-0.4713         
##  Max.   : 1.0000             Max.   : 1.0000         
##  freq.bodybodygyrojerkmag.std
##  Min.   :-1.0000             
##  1st Qu.:-0.9926             
##  Median :-0.9382             
##  Mean   :-0.7922             
##  3rd Qu.:-0.6437             
##  Max.   : 1.0000

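Every measurement lies between -1 and 1, so all variables share a common range and no obvious outliers stand out. Several variables are nevertheless skewed: for time.bodyaccmag.mean, for example, the median (-0.8746) sits well below the mean (-0.5482).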

sum(is.na(table))
## [1] 0

There don’t seem to be any “NA” values in the dataset.
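
The same checks can be made per column instead of reading them off a long printout. The sketch below runs on a toy data frame with one deliberately missing value, and the [-1, 1] bound is taken from the minima and maxima in the summary above.

```r
toy <- data.frame(activity = c("WALKING", "SITTING", "LAYING"),
                  time.bodyacc.mean.x = c(0.28, -1.00, 1.00),
                  time.bodyacc.std.x  = c(-0.99, -0.25, NA),
                  stringsAsFactors = FALSE)
numeric.cols <- sapply(toy, is.numeric)

# Missing values per column, rather than a single grand total
na.per.column <- colSums(is.na(toy))

# Measurement columns whose values leave the expected [-1, 1] interval
out.of.range <- sapply(toy[numeric.cols],
                       function(x) any(abs(x) > 1, na.rm = TRUE))

na.per.column
out.of.range
```

A per-column count points directly at the variable that needs attention, which a single total cannot do.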


Export

Now that we have a tidy and comprehensible dataset and have checked for missing values and outliers, we can proceed to the final step: exporting it as a .csv file.

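dir.create("./export", showWarnings = FALSE) # write.csv() fails if the folder does not exist
write.csv(table, "./export/tidy_UCI_HAR_Dataset.csv", row.names = FALSE) # skip the row-number column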
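
A quick way to confirm that an export can be read back is a round trip through a temporary file. The sketch below uses a toy data frame and tempfile() rather than the real export path, and it passes row.names = FALSE so that no row-number column is written.

```r
toy <- data.frame(activity = c("WALKING", "LAYING"),
                  subject  = c(2L, 4L),
                  time.bodyacc.mean.x = c(0.25, 0.5),
                  stringsAsFactors = FALSE)

# Write to a temporary file so the sketch leaves nothing behind
csv.path <- tempfile(fileext = ".csv")
write.csv(toy, csv.path, row.names = FALSE)

# Read it back and confirm that shape and column names survived
round.trip <- read.csv(csv.path, stringsAsFactors = FALSE)
same.shape <- identical(dim(round.trip), dim(toy))
same.names <- identical(colnames(round.trip), colnames(toy))
unlink(csv.path)

same.shape
same.names
```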