In this project we go step by step through importing, structuring and cleaning a dataset. The final result is a .csv file containing a table that is tidy and comprehensible, has no missing values or outliers, and can be readily imported and used.
For this we will use the “Human Activity Recognition (HAR) Using Smartphones Data Set” collected by CETpD and SmartLab. The dataset is built from the recordings of 30 subjects performing activities of daily living (ADL) while carrying a waist-mounted smartphone with embedded inertial sensors.
The dataset comprises 269 MB of .txt files. Our objective is to assemble this dataset and ensure that it is clean.
The experiments have been carried out with a group of 30 volunteers within an age bracket of 19-48 years. Each person performed six activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING) wearing a smartphone (Samsung Galaxy S II) on the waist. Using its embedded accelerometer and gyroscope, we captured 3-axial linear acceleration and 3-axial angular velocity at a constant rate of 50Hz. The experiments have been video-recorded to label the data manually. The obtained dataset has been randomly partitioned into two sets, where 70% of the volunteers was selected for generating the training data and 30% the test data.
The sensor signals (accelerometer and gyroscope) were pre-processed by applying noise filters and then sampled in fixed-width sliding windows of 2.56 sec and 50% overlap (128 readings/window). The sensor acceleration signal, which has gravitational and body motion components, was separated using a Butterworth low-pass filter into body acceleration and gravity. The gravitational force is assumed to have only low frequency components, therefore a filter with 0.3 Hz cutoff frequency was used. From each window, a vector of features was obtained by calculating variables from the time and frequency domain.
Please visit the data website for further details.
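The windowing figures quoted above can be checked with a little arithmetic. The sketch below only reproduces the numbers from the description (50 Hz, 2.56 s, 50% overlap); the 10-second recording length is a hypothetical value chosen for illustration.

```r
# Windowing parameters quoted in the dataset description
sampling.rate  <- 50      # readings per second (Hz)
window.seconds <- 2.56    # length of one window in seconds
overlap        <- 0.5     # 50% overlap between consecutive windows

window.size <- round(sampling.rate * window.seconds)   # readings per window
step.size   <- round(window.size * (1 - overlap))      # readings between window starts

# Start index of every full window in a hypothetical 10-second recording
n.readings    <- 10 * sampling.rate
window.starts <- seq(1, n.readings - window.size + 1, by = step.size)

window.size
step.size
window.starts
```

Each row of the feature files corresponds to one such window, not to a single sensor reading.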
# Packages used throughout: data.table for fread(); dplyr for bind_cols(), select() and %>%
library(data.table)
library(dplyr)
download.url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"
download.path <- "./UCI HAR Dataset.zip"
# Check if the file exists; if not, download it
if (!file.exists(download.path)) {
  download.file(download.url, download.path)
}
# Check if the directory exists; if not, create it by unzipping the downloaded file
unzip.path <- "UCI HAR Dataset"
if (!file.exists(unzip.path)) {
  unzip(download.path)
}
# List all .txt files in the directory (the pattern is a regular expression, so the dot is escaped)
all.files.paths <- list.files(path = "./UCI HAR Dataset", pattern = "\\.txt$", recursive = TRUE, full.names = TRUE)
| Paths | File.Description |
|---|---|
| ./UCI HAR Dataset/activity_labels.txt | Links the class labels with their activity name. |
| ./UCI HAR Dataset/features.txt | List of all features |
| ./UCI HAR Dataset/features_info.txt | Shows information about the variables used on the feature vector. |
| ./UCI HAR Dataset/README.txt | README file |
| ./UCI HAR Dataset/test/Inertial Signals/body_acc_x_test.txt | Body Acceleration Signals |
| ./UCI HAR Dataset/test/Inertial Signals/body_acc_y_test.txt | Body Acceleration Signals |
| ./UCI HAR Dataset/test/Inertial Signals/body_acc_z_test.txt | Body Acceleration Signals |
| ./UCI HAR Dataset/test/Inertial Signals/body_gyro_x_test.txt | Angular velocity measurements |
| ./UCI HAR Dataset/test/Inertial Signals/body_gyro_y_test.txt | Angular velocity measurements |
| ./UCI HAR Dataset/test/Inertial Signals/body_gyro_z_test.txt | Angular velocity measurements |
| ./UCI HAR Dataset/test/Inertial Signals/total_acc_x_test.txt | Total Acceleration Signals |
| ./UCI HAR Dataset/test/Inertial Signals/total_acc_y_test.txt | Total Acceleration Signals |
| ./UCI HAR Dataset/test/Inertial Signals/total_acc_z_test.txt | Total Acceleration Signals |
| ./UCI HAR Dataset/test/subject_test.txt | Each row identifies the subject who performed the activity |
| ./UCI HAR Dataset/test/X_test.txt | Compiled Test Set |
| ./UCI HAR Dataset/test/y_test.txt | Test Labels |
| ./UCI HAR Dataset/train/Inertial Signals/body_acc_x_train.txt | Body Acceleration Signals |
| ./UCI HAR Dataset/train/Inertial Signals/body_acc_y_train.txt | Body Acceleration Signals |
| ./UCI HAR Dataset/train/Inertial Signals/body_acc_z_train.txt | Body Acceleration Signals |
| ./UCI HAR Dataset/train/Inertial Signals/body_gyro_x_train.txt | Angular velocity measurements |
| ./UCI HAR Dataset/train/Inertial Signals/body_gyro_y_train.txt | Angular velocity measurements |
| ./UCI HAR Dataset/train/Inertial Signals/body_gyro_z_train.txt | Angular velocity measurements |
| ./UCI HAR Dataset/train/Inertial Signals/total_acc_x_train.txt | Total Acceleration Signals |
| ./UCI HAR Dataset/train/Inertial Signals/total_acc_y_train.txt | Total Acceleration Signals |
| ./UCI HAR Dataset/train/Inertial Signals/total_acc_z_train.txt | Total Acceleration Signals |
| ./UCI HAR Dataset/train/subject_train.txt | Each row identifies the subject who performed the activity |
| ./UCI HAR Dataset/train/X_train.txt | Compiled Train Set |
| ./UCI HAR Dataset/train/y_train.txt | Train Labels |
Once the files are downloaded and unzipped, we notice that the data comes in several .txt files (as shown above). Taking a closer look at the data website and the README.txt file, we can begin to form an idea of how the final table might be structured.
It seems that:
This allows us to dispose of a lot of data, leaving us to:
# Pick only the necessary file paths
all.files.paths <- all.files.paths[c(1,2,14:16,26:28)]
# Data import as data.table
raw.data <- lapply(X = all.files.paths, FUN = fread)
| Paths | File.Description |
|---|---|
| ./UCI HAR Dataset/activity_labels.txt | Links the class labels with their activity name. |
| ./UCI HAR Dataset/features.txt | List of all features |
| ./UCI HAR Dataset/test/subject_test.txt | Each row identifies the subject who performed the activity |
| ./UCI HAR Dataset/test/X_test.txt | Compiled Test Set |
| ./UCI HAR Dataset/test/y_test.txt | Test Labels |
| ./UCI HAR Dataset/train/subject_train.txt | Each row identifies the subject who performed the activity |
| ./UCI HAR Dataset/train/X_train.txt | Compiled Train Set |
| ./UCI HAR Dataset/train/y_train.txt | Train Labels |
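Picking files by position depends on the order in which the paths happen to be listed. As a hedged alternative, the files could be selected by name. The sketch below uses a shortened, hypothetical vector `example.paths` in place of the full listing, and `wanted.files` is a name introduced here for illustration.

```r
# Hypothetical, shortened stand-in for the vector returned by list.files()
example.paths <- c(
  "./UCI HAR Dataset/activity_labels.txt",
  "./UCI HAR Dataset/features.txt",
  "./UCI HAR Dataset/features_info.txt",
  "./UCI HAR Dataset/README.txt",
  "./UCI HAR Dataset/test/Inertial Signals/body_acc_x_test.txt",
  "./UCI HAR Dataset/test/subject_test.txt",
  "./UCI HAR Dataset/test/X_test.txt",
  "./UCI HAR Dataset/test/y_test.txt"
)

# File names to keep, regardless of where they sit in the vector
wanted.files <- c("activity_labels.txt", "features.txt",
                  "subject_test.txt", "X_test.txt", "y_test.txt",
                  "subject_train.txt", "X_train.txt", "y_train.txt")

# basename() strips the directory part so only the file name is compared
selected.paths <- example.paths[basename(example.paths) %in% wanted.files]
selected.paths
```

The selection keeps the original listing order, so later positional references into the imported list would still need to match that order.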
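Now that the required files have been imported, we want an idea of how they are going to be structured. The image below represents how each .txt file has to be bound into the final dataset: within each set, the subject, label and feature files are bound column-wise, and the test and train sets are then stacked row-wise.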
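To make that layout concrete, here is a minimal sketch on toy data using base R's `cbind()` and `rbind()`. All object names and values are hypothetical stand-ins for the imported files, and only two feature columns are shown.

```r
# Toy stand-ins for the subject, label and feature files of each set
test.subject  <- data.frame(Subject = c(2, 2))
test.labels   <- data.frame(Activity = c(5, 1))
test.features <- data.frame(f1 = c(0.1, 0.2), f2 = c(-0.3, 0.4))

train.subject  <- data.frame(Subject = c(1, 1, 3))
train.labels   <- data.frame(Activity = c(4, 6, 2))
train.features <- data.frame(f1 = c(0.5, 0.6, 0.7), f2 = c(0.0, -0.1, 0.9))

# Bind the columns within each set and tag every row with its origin
test.toy  <- cbind(test.subject, test.labels, Set = "Test set", test.features)
train.toy <- cbind(train.subject, train.labels, Set = "Train set", train.features)

# Stack the two sets row-wise into a single table
toy.table <- rbind(test.toy, train.toy)
toy.table
```

The column-wise step relies on the three files of a set having the same number of rows in the same order, since nothing other than row position links them.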
# Bind subject, activity label, set tag and features column-wise for each set
test.table <- bind_cols(raw.data[[3]], raw.data[[5]], Set = rep("Test set", times = nrow(raw.data[[3]])), raw.data[[4]])
train.table <- bind_cols(raw.data[[6]], raw.data[[8]], Set = rep("Train set", times = nrow(raw.data[[6]])), raw.data[[7]])
# Stack the test and train sets row-wise
table <- bind_rows(test.table, train.table)
# Set up the column names list
col.names <- c("Subject",
               "Activity",
               "Set",
               unlist(raw.data[[2]][,2]))
Notice that we haven’t assigned col.names to the table yet. That is because, if we take a closer look at the list, we will see that it contains many duplicate values.
# Check for duplicates
table(duplicated(col.names))
##
## FALSE TRUE
## 480 84
Duplicated names are a problem because they prevent packages like dplyr from working properly. So, in order to make all names unique, we will append an exclamation point and an id number (the column position) to the end of each column name.
# Append "!" and the column position to every variable name. This avoids duplicates
col.names <- paste0(col.names, "!", seq_along(col.names))
# Assign the list to be the column name of the table
colnames(table) <- col.names
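For comparison, base R also offers `make.unique()`, which appends a suffix only to the repeated entries. The sketch below contrasts the two approaches on a small, hypothetical vector of names; it is an illustration, not a replacement for the marker approach used in this project.

```r
# Hypothetical names containing one repeated entry
example.names <- c("Subject",
                   "fBodyAcc-bandsEnergy()-1,8",
                   "fBodyAcc-bandsEnergy()-1,8",
                   "tBodyAcc-mean()-X")

# Approach used in the text: append "!" and the position to every name
marked.names <- paste0(example.names, "!", seq_along(example.names))

# Alternative: base R appends ".1", ".2", ... to repeated entries only
unique.names <- make.unique(example.names)

marked.names
unique.names
```

The "!" marker has one practical advantage here: it does not occur in the original names, so it can later be stripped from every column with a single pattern.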
The combined table has 564 columns: 561 features plus the subject, activity and set columns. Applying a machine learning algorithm to the whole dataset would cost a lot of computational power and time, so one of our tasks is to extract only the mean and standard deviation (std) variables. This greatly reduces the computation required downstream while keeping the variables that summarize each signal.
# Subset the variables corresponding to means and standard deviations
table.subset <- table %>%
  select(starts_with("t"), starts_with("f")) %>%
  select(contains("mean"), contains("std"))
# Replace the previous table with the subsetted one
table <- bind_cols(table[, 1:3], table.subset)
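The same selection rule can be sketched on plain character vectors with `grepl()`, which makes it easy to see which names pass. The names below are a small hypothetical sample in the marked format used above.

```r
# Hypothetical sample of marked column names
example.names <- c("Subject!1", "Activity!2", "Set!3",
                   "tBodyAcc-mean()-X!4", "tBodyAcc-std()-X!7",
                   "tBodyAcc-mad()-X!10", "fBodyAcc-meanFreq()-X!297",
                   "angle(tBodyAccMean,gravity)!559")

# Keep names that start with "t" or "f" and mention "mean" or "std"
is.measure <- grepl("^[tf]", example.names) &
  grepl("mean|std", example.names)

# The three identifier columns are kept by position, as in the text
kept.names <- c(example.names[1:3], example.names[is.measure])
kept.names
```

Note that this rule also keeps the `meanFreq()` variables, and that it drops the `angle(...)` variables because they do not start with "t" or "f". Unlike the two-step `select()`, this sketch keeps the names in their original order instead of grouping all means before all standard deviations.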
Let’s take a look at the column names we have just assigned:
| V1 | V2 | V3 |
|---|---|---|
| Subject!1 | fBodyAcc-meanFreq()-Z!299 | tBodyAccJerk-std()-Y!88 |
| Activity!2 | fBodyAccJerk-mean()-X!348 | tBodyAccJerk-std()-Z!89 |
| Set!3 | fBodyAccJerk-mean()-Y!349 | tBodyGyro-std()-X!127 |
| tBodyAcc-mean()-X!4 | fBodyAccJerk-mean()-Z!350 | tBodyGyro-std()-Y!128 |
| tBodyAcc-mean()-Y!5 | fBodyAccJerk-meanFreq()-X!376 | tBodyGyro-std()-Z!129 |
| tBodyAcc-mean()-Z!6 | fBodyAccJerk-meanFreq()-Y!377 | tBodyGyroJerk-std()-X!167 |
| tGravityAcc-mean()-X!44 | fBodyAccJerk-meanFreq()-Z!378 | tBodyGyroJerk-std()-Y!168 |
| tGravityAcc-mean()-Y!45 | fBodyGyro-mean()-X!427 | tBodyGyroJerk-std()-Z!169 |
| tGravityAcc-mean()-Z!46 | fBodyGyro-mean()-Y!428 | tBodyAccMag-std()!205 |
| tBodyAccJerk-mean()-X!84 | fBodyGyro-mean()-Z!429 | tGravityAccMag-std()!218 |
| tBodyAccJerk-mean()-Y!85 | fBodyGyro-meanFreq()-X!455 | tBodyAccJerkMag-std()!231 |
| tBodyAccJerk-mean()-Z!86 | fBodyGyro-meanFreq()-Y!456 | tBodyGyroMag-std()!244 |
| tBodyGyro-mean()-X!124 | fBodyGyro-meanFreq()-Z!457 | tBodyGyroJerkMag-std()!257 |
| tBodyGyro-mean()-Y!125 | fBodyAccMag-mean()!506 | fBodyAcc-std()-X!272 |
| tBodyGyro-mean()-Z!126 | fBodyAccMag-meanFreq()!516 | fBodyAcc-std()-Y!273 |
| tBodyGyroJerk-mean()-X!164 | fBodyBodyAccJerkMag-mean()!519 | fBodyAcc-std()-Z!274 |
| tBodyGyroJerk-mean()-Y!165 | fBodyBodyAccJerkMag-meanFreq()!529 | fBodyAccJerk-std()-X!351 |
| tBodyGyroJerk-mean()-Z!166 | fBodyBodyGyroMag-mean()!532 | fBodyAccJerk-std()-Y!352 |
| tBodyAccMag-mean()!204 | fBodyBodyGyroMag-meanFreq()!542 | fBodyAccJerk-std()-Z!353 |
| tGravityAccMag-mean()!217 | fBodyBodyGyroJerkMag-mean()!545 | fBodyGyro-std()-X!430 |
| tBodyAccJerkMag-mean()!230 | fBodyBodyGyroJerkMag-meanFreq()!555 | fBodyGyro-std()-Y!431 |
| tBodyGyroMag-mean()!243 | tBodyAcc-std()-X!7 | fBodyGyro-std()-Z!432 |
| tBodyGyroJerkMag-mean()!256 | tBodyAcc-std()-Y!8 | fBodyAccMag-std()!507 |
| fBodyAcc-mean()-X!269 | tBodyAcc-std()-Z!9 | fBodyBodyAccJerkMag-std()!520 |
| fBodyAcc-mean()-Y!270 | tGravityAcc-std()-X!47 | fBodyBodyGyroMag-std()!533 |
| fBodyAcc-mean()-Z!271 | tGravityAcc-std()-Y!48 | fBodyBodyGyroJerkMag-std()!546 |
| fBodyAcc-meanFreq()-X!297 | tGravityAcc-std()-Z!49 | |
| fBodyAcc-meanFreq()-Y!298 | tBodyAccJerk-std()-X!87 | |
As we want to export a tidy dataset that is fully comprehensible, we must ensure that our variable names are more descriptive.
col.names <- colnames(table)
col.names <- tolower(col.names) # Convert everything to lower case
col.names <- gsub("^t", "time.", col.names) # Replace a leading "t" with "time."
col.names <- gsub("^f", "freq.", col.names) # Replace a leading "f" with "freq."
col.names <- gsub("-", ".", col.names) # Replace "-" with "."
col.names <- gsub("()", "", col.names, fixed = TRUE) # Remove all "()" characters
col.names <- gsub("!.*", "", col.names) # Remove the "!<number>" markers
# As all markers have been removed, let's check if there are duplicate values in the column names
table(duplicated(col.names))
##
## FALSE
## 82
# Reassign the list to the column names
colnames(table) <- col.names
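To see the effect of each substitution, the renaming steps can be wrapped in a small helper and applied to a few sample names. The function name `tidy.names` and the sample vector are hypothetical; the substitutions mirror the ones described above.

```r
# Hypothetical helper bundling the renaming steps
tidy.names <- function(x) {
  x <- tolower(x)                        # lower case everywhere
  x <- gsub("^t", "time.", x)            # leading "t" marks the time domain
  x <- gsub("^f", "freq.", x)            # leading "f" marks the frequency domain
  x <- gsub("-", ".", x, fixed = TRUE)   # use "." as the separator
  x <- gsub("()", "", x, fixed = TRUE)   # drop the empty parentheses
  sub("!.*$", "", x)                     # strip the "!<number>" marker
}

example.names <- c("Subject!1",
                   "tBodyAcc-mean()-X!4",
                   "fBodyBodyGyroJerkMag-std()!546")

tidy.names(example.names)
```

The order of the steps matters: lower-casing comes first, so the anchored patterns only need to match a lower-case "t" or "f", and identifier columns such as "subject" are left untouched because they start with neither.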
The activity column in our table consists of numbers, each indicating which activity was performed by the subject in the given row. As it is, the table shows that different activities were performed, but doesn’t make clear which activity each number stands for.
# Check what are the values in the "activity column"
table(table[,activity])
##
## 1 2 3 4 5 6
## 1722 1544 1406 1777 1906 1944
Luckily, the activity_labels.txt file contains a list linking each number with its corresponding activity.
We will use the numbers as keys to link each row with its activity, joining both tables. This will make our final table fully comprehensible and complete.
# Join activity table and massive table to add the activities
activity.table <- raw.data[[1]]
table <- merge(activity.table, table, by.x = "V1", by.y = "activity")[,-1]
colnames(table)[1] <- "activity"
# Recheck the "activity column"
table(table[, activity])
##
## LAYING SITTING STANDING
## 1944 1777 1906
## WALKING WALKING_DOWNSTAIRS WALKING_UPSTAIRS
## 1722 1406 1544
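A join is not the only way to attach the labels. As a hedged alternative, the codes can be translated with `factor()`, which keeps the rows in their original order, whereas `merge()` sorts the result by the key by default. The label table below reproduces the code-to-activity mapping shown in the counts above; `activity.codes` is a hypothetical sample.

```r
# Lookup table in the same shape as the imported activity labels file
activity.labels <- data.frame(
  V1 = 1:6,
  V2 = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
         "SITTING", "STANDING", "LAYING")
)

# Hypothetical activity codes, in row order
activity.codes <- c(5, 1, 6, 1)

# Translate each code to its label without reordering the rows
activity.names <- as.character(
  factor(activity.codes, levels = activity.labels$V1, labels = activity.labels$V2)
)
activity.names
```

Any code that is missing from the lookup table would become `NA`, which makes unmatched values easy to spot.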
Now that we have our dataset, we want to take a look at what is inside to ensure that it is “clean” for the final user.
str(table)
## Classes 'data.table' and 'data.frame': 10299 obs. of 82 variables:
## $ activity : chr "WALKING" "WALKING" "WALKING" "WALKING" ...
## $ subject : int 2 2 2 2 2 2 2 2 2 2 ...
## $ set : chr "Test set" "Test set" "Test set" "Test set" ...
## $ time.bodyacc.mean.x : num 0.204 0.249 0.325 0.309 0.266 ...
## $ time.bodyacc.mean.y : num -0.03234 -0.00341 -0.0298 -0.02213 -0.01594 ...
## $ time.bodyacc.mean.z : num -0.0969 -0.056 -0.0778 -0.1321 -0.1207 ...
## $ time.gravityacc.mean.x : num 0.92 0.913 0.921 0.925 0.926 ...
## $ time.gravityacc.mean.y : num -0.347 -0.352 -0.352 -0.352 -0.348 ...
## $ time.gravityacc.mean.z : num 0.0197 0.025 0.0301 0.0268 0.0218 ...
## $ time.bodyaccjerk.mean.x : num -0.077 0.1098 0.2018 -0.1578 0.0362 ...
## $ time.bodyaccjerk.mean.y : num 0.1975 0.242 -0.0531 -0.4142 0.0659 ...
## $ time.bodyaccjerk.mean.z : num 0.17867 0.38011 -0.22104 -0.23184 0.00615 ...
## $ time.bodygyro.mean.x : num -0.2783 -0.1484 -0.0143 0.0295 0.025 ...
## $ time.bodygyro.mean.y : num 0.2344 0.032 -0.0765 -0.0873 -0.1465 ...
## $ time.bodygyro.mean.z : num 0.0817 0.1259 0.1062 0.1063 0.0988 ...
## $ time.bodygyrojerk.mean.x : num -0.0568 -0.0294 0.1152 -0.0182 -0.128 ...
## $ time.bodygyrojerk.mean.y : num -0.196 -0.0706 0.0933 -0.1509 -0.2406 ...
## $ time.bodygyrojerk.mean.z : num 0.11924 -0.2054 -0.1921 -0.01534 0.00789 ...
## $ time.bodyaccmag.mean : num -0.356 -0.296 -0.278 -0.291 -0.344 ...
## $ time.gravityaccmag.mean : num -0.356 -0.296 -0.278 -0.291 -0.344 ...
## $ time.bodyaccjerkmag.mean : num -0.391 -0.295 -0.268 -0.318 -0.4 ...
## $ time.bodygyromag.mean : num -0.417 -0.46 -0.482 -0.482 -0.513 ...
## $ time.bodygyrojerkmag.mean : num -0.652 -0.638 -0.641 -0.645 -0.661 ...
## $ freq.bodyacc.mean.x : num -0.362 -0.344 -0.373 -0.443 -0.458 ...
## $ freq.bodyacc.mean.y : num -0.121 -0.1052 0.0859 0.1167 -0.1556 ...
## $ freq.bodyacc.mean.z : num -0.521 -0.458 -0.402 -0.375 -0.483 ...
## $ freq.bodyacc.meanfreq.x : num -0.2387 -0.2067 -0.0966 -0.0414 -0.3042 ...
## $ freq.bodyacc.meanfreq.y : num 0.1114 0.1512 0.2123 0.2108 -0.0106 ...
## $ freq.bodyacc.meanfreq.z : num -0.00924 0.42195 -0.01455 0.18375 -0.16149 ...
## $ freq.bodyaccjerk.mean.x : num -0.351 -0.289 -0.332 -0.45 -0.45 ...
## $ freq.bodyaccjerk.mean.y : num -0.2039 -0.1005 0.0728 0.029 -0.2658 ...
## $ freq.bodyaccjerk.mean.z : num -0.648 -0.624 -0.532 -0.539 -0.604 ...
## $ freq.bodyaccjerk.meanfreq.x : num -0.1619 0.0165 0.0103 0.0952 -0.0783 ...
## $ freq.bodyaccjerk.meanfreq.y : num -0.2491 -0.0751 -0.2732 -0.4531 -0.36 ...
## $ freq.bodyaccjerk.meanfreq.z : num -0.339 -0.233 -0.344 -0.13 -0.202 ...
## $ freq.bodygyro.mean.x : num -0.513 -0.496 -0.425 -0.524 -0.537 ...
## $ freq.bodygyro.mean.y : num -0.568 -0.588 -0.568 -0.59 -0.577 ...
## $ freq.bodygyro.mean.z : num -0.433 -0.432 -0.426 -0.487 -0.534 ...
## $ freq.bodygyro.meanfreq.x : num -0.1338 -0.0983 0.2521 -0.2303 -0.0918 ...
## $ freq.bodygyro.meanfreq.y : num -0.2035 -0.2866 -0.0591 -0.2582 -0.0289 ...
## $ freq.bodygyro.meanfreq.z : num 0.0638 0.045 0.0683 0.0435 0.041 ...
## $ freq.bodyaccmag.mean : num -0.368 -0.372 -0.299 -0.331 -0.475 ...
## $ freq.bodyaccmag.meanfreq : num 0.163 0.39 0.428 0.454 0.332 ...
## $ freq.bodybodyaccjerkmag.mean : num -0.332 -0.225 -0.135 -0.198 -0.4 ...
## $ freq.bodybodyaccjerkmag.meanfreq : num 0.0317 0.1569 0.0162 0.0454 0.0831 ...
## $ freq.bodybodygyromag.mean : num -0.574 -0.633 -0.571 -0.566 -0.595 ...
## $ freq.bodybodygyromag.meanfreq : num 0.0185 0.1961 0.25 0.1382 0.2023 ...
## $ freq.bodybodygyrojerkmag.mean : num -0.69 -0.679 -0.674 -0.68 -0.683 ...
## $ freq.bodybodygyrojerkmag.meanfreq: num 0.109 0.267 0.372 0.26 0.324 ...
## $ time.bodyacc.std.x : num -0.466 -0.399 -0.465 -0.515 -0.501 ...
## $ time.bodyacc.std.y : num -0.1808 -0.1378 0.0135 0.032 -0.1632 ...
## $ time.bodyacc.std.z : num -0.455 -0.461 -0.368 -0.349 -0.389 ...
## $ time.gravityacc.std.x : num -0.954 -0.97 -0.961 -0.979 -0.971 ...
## $ time.gravityacc.std.y : num -0.947 -0.973 -0.985 -0.983 -0.977 ...
## $ time.gravityacc.std.z : num -0.976 -0.969 -0.97 -0.959 -0.982 ...
## $ time.bodyaccjerk.std.x : num -0.369 -0.257 -0.333 -0.427 -0.44 ...
## $ time.bodyaccjerk.std.y : num -0.1931 -0.0974 0.1332 0.0887 -0.2381 ...
## $ time.bodyaccjerk.std.z : num -0.678 -0.663 -0.587 -0.575 -0.622 ...
## $ time.bodygyro.std.x : num -0.615 -0.603 -0.594 -0.587 -0.615 ...
## $ time.bodygyro.std.y : num -0.543 -0.545 -0.557 -0.537 -0.555 ...
## $ time.bodygyro.std.z : num -0.5 -0.489 -0.51 -0.553 -0.583 ...
## $ time.bodygyrojerk.std.x : num -0.543 -0.534 -0.532 -0.516 -0.51 ...
## $ time.bodygyrojerk.std.y : num -0.739 -0.71 -0.73 -0.733 -0.747 ...
## $ time.bodygyrojerk.std.z : num -0.563 -0.581 -0.537 -0.55 -0.619 ...
## $ time.bodyaccmag.std : num -0.444 -0.439 -0.416 -0.449 -0.52 ...
## $ time.gravityaccmag.std : num -0.444 -0.439 -0.416 -0.449 -0.52 ...
## $ time.bodyaccjerkmag.std : num -0.318 -0.245 -0.121 -0.176 -0.416 ...
## $ time.bodygyromag.std : num -0.546 -0.605 -0.612 -0.574 -0.585 ...
## $ time.bodygyrojerkmag.std : num -0.706 -0.692 -0.69 -0.681 -0.698 ...
## $ freq.bodyacc.std.x : num -0.512 -0.423 -0.506 -0.546 -0.519 ...
## $ freq.bodyacc.std.y : num -0.2668 -0.2105 -0.0923 -0.0824 -0.2201 ...
## $ freq.bodyacc.std.z : num -0.462 -0.506 -0.398 -0.386 -0.387 ...
## $ freq.bodyaccjerk.std.x : num -0.45 -0.289 -0.396 -0.453 -0.48 ...
## $ freq.bodyaccjerk.std.y : num -0.238 -0.159 0.124 0.082 -0.259 ...
## $ freq.bodyaccjerk.std.z : num -0.705 -0.702 -0.641 -0.608 -0.637 ...
## $ freq.bodygyro.std.x : num -0.647 -0.637 -0.649 -0.608 -0.64 ...
## $ freq.bodygyro.std.y : num -0.531 -0.523 -0.552 -0.51 -0.545 ...
## $ freq.bodygyro.std.z : num -0.57 -0.556 -0.585 -0.618 -0.639 ...
## $ freq.bodyaccmag.std : num -0.577 -0.567 -0.585 -0.614 -0.621 ...
## $ freq.bodybodyaccjerkmag.std : num -0.304 -0.273 -0.109 -0.154 -0.437 ...
## $ freq.bodybodygyromag.std : num -0.604 -0.652 -0.716 -0.655 -0.649 ...
## $ freq.bodybodygyrojerkmag.std : num -0.75 -0.732 -0.736 -0.704 -0.741 ...
## - attr(*, ".internal.selfref")=<externalptr>
All columns seem to be properly classified. Both the activity and set columns are character vectors, while the subject column is an integer. All remaining columns hold experimental measurements and are numeric.
summary(table)
## activity subject set time.bodyacc.mean.x
## Length:10299 Min. : 1.00 Length:10299 Min. :-1.0000
## Class :character 1st Qu.: 9.00 Class :character 1st Qu.: 0.2626
## Mode :character Median :17.00 Mode :character Median : 0.2772
## Mean :16.15 Mean : 0.2743
## 3rd Qu.:24.00 3rd Qu.: 0.2884
## Max. :30.00 Max. : 1.0000
## time.bodyacc.mean.y time.bodyacc.mean.z time.gravityacc.mean.x
## Min. :-1.00000 Min. :-1.00000 Min. :-1.0000
## 1st Qu.:-0.02490 1st Qu.:-0.12102 1st Qu.: 0.8117
## Median :-0.01716 Median :-0.10860 Median : 0.9218
## Mean :-0.01774 Mean :-0.10892 Mean : 0.6692
## 3rd Qu.:-0.01062 3rd Qu.:-0.09759 3rd Qu.: 0.9547
## Max. : 1.00000 Max. : 1.00000 Max. : 1.0000
## time.gravityacc.mean.y time.gravityacc.mean.z time.bodyaccjerk.mean.x
## Min. :-1.000000 Min. :-1.00000 Min. :-1.00000
## 1st Qu.:-0.242943 1st Qu.:-0.11671 1st Qu.: 0.06298
## Median :-0.143551 Median : 0.03680 Median : 0.07597
## Mean : 0.004039 Mean : 0.09215 Mean : 0.07894
## 3rd Qu.: 0.118905 3rd Qu.: 0.21621 3rd Qu.: 0.09131
## Max. : 1.000000 Max. : 1.00000 Max. : 1.00000
## time.bodyaccjerk.mean.y time.bodyaccjerk.mean.z time.bodygyro.mean.x
## Min. :-1.000000 Min. :-1.000000 Min. :-1.00000
## 1st Qu.:-0.018555 1st Qu.:-0.031552 1st Qu.:-0.04579
## Median : 0.010753 Median :-0.001159 Median :-0.02776
## Mean : 0.007948 Mean :-0.004675 Mean :-0.03098
## 3rd Qu.: 0.033538 3rd Qu.: 0.024578 3rd Qu.:-0.01058
## Max. : 1.000000 Max. : 1.000000 Max. : 1.00000
## time.bodygyro.mean.y time.bodygyro.mean.z time.bodygyrojerk.mean.x
## Min. :-1.00000 Min. :-1.00000 Min. :-1.00000
## 1st Qu.:-0.10399 1st Qu.: 0.06485 1st Qu.:-0.11723
## Median :-0.07477 Median : 0.08626 Median :-0.09824
## Mean :-0.07472 Mean : 0.08836 Mean :-0.09671
## 3rd Qu.:-0.05110 3rd Qu.: 0.11044 3rd Qu.:-0.07930
## Max. : 1.00000 Max. : 1.00000 Max. : 1.00000
## time.bodygyrojerk.mean.y time.bodygyrojerk.mean.z time.bodyaccmag.mean
## Min. :-1.00000 Min. :-1.00000 Min. :-1.0000
## 1st Qu.:-0.05868 1st Qu.:-0.07936 1st Qu.:-0.9819
## Median :-0.04056 Median :-0.05455 Median :-0.8746
## Mean :-0.04232 Mean :-0.05483 Mean :-0.5482
## 3rd Qu.:-0.02521 3rd Qu.:-0.03168 3rd Qu.:-0.1201
## Max. : 1.00000 Max. : 1.00000 Max. : 1.0000
## time.gravityaccmag.mean time.bodyaccjerkmag.mean time.bodygyromag.mean
## Min. :-1.0000 Min. :-1.0000 Min. :-1.0000
## 1st Qu.:-0.9819 1st Qu.:-0.9896 1st Qu.:-0.9781
## Median :-0.8746 Median :-0.9481 Median :-0.8223
## Mean :-0.5482 Mean :-0.6494 Mean :-0.6052
## 3rd Qu.:-0.1201 3rd Qu.:-0.2956 3rd Qu.:-0.2454
## Max. : 1.0000 Max. : 1.0000 Max. : 1.0000
## time.bodygyrojerkmag.mean freq.bodyacc.mean.x freq.bodyacc.mean.y
## Min. :-1.0000 Min. :-1.0000 Min. :-1.0000
## 1st Qu.:-0.9923 1st Qu.:-0.9913 1st Qu.:-0.9792
## Median :-0.9559 Median :-0.9456 Median :-0.8643
## Mean :-0.7621 Mean :-0.6228 Mean :-0.5375
## 3rd Qu.:-0.5499 3rd Qu.:-0.2646 3rd Qu.:-0.1032
## Max. : 1.0000 Max. : 1.0000 Max. : 1.0000
## freq.bodyacc.mean.z freq.bodyacc.meanfreq.x freq.bodyacc.meanfreq.y
## Min. :-1.0000 Min. :-1.00000 Min. :-1.000000
## 1st Qu.:-0.9832 1st Qu.:-0.41878 1st Qu.:-0.144772
## Median :-0.8954 Median :-0.23825 Median : 0.004666
## Mean :-0.6650 Mean :-0.22147 Mean : 0.015401
## 3rd Qu.:-0.3662 3rd Qu.:-0.02043 3rd Qu.: 0.176603
## Max. : 1.0000 Max. : 1.00000 Max. : 1.000000
## freq.bodyacc.meanfreq.z freq.bodyaccjerk.mean.x freq.bodyaccjerk.mean.y
## Min. :-1.00000 Min. :-1.0000 Min. :-1.0000
## 1st Qu.:-0.13845 1st Qu.:-0.9912 1st Qu.:-0.9848
## Median : 0.06084 Median :-0.9516 Median :-0.9257
## Mean : 0.04731 Mean :-0.6567 Mean :-0.6290
## 3rd Qu.: 0.24922 3rd Qu.:-0.3270 3rd Qu.:-0.2638
## Max. : 1.00000 Max. : 1.0000 Max. : 1.0000
## freq.bodyaccjerk.mean.z freq.bodyaccjerk.meanfreq.x
## Min. :-1.0000 Min. :-1.00000
## 1st Qu.:-0.9873 1st Qu.:-0.29770
## Median :-0.9475 Median :-0.04544
## Mean :-0.7436 Mean :-0.04771
## 3rd Qu.:-0.5133 3rd Qu.: 0.20447
## Max. : 1.0000 Max. : 1.00000
## freq.bodyaccjerk.meanfreq.y freq.bodyaccjerk.meanfreq.z
## Min. :-1.000000 Min. :-1.00000
## 1st Qu.:-0.427951 1st Qu.:-0.33139
## Median :-0.236530 Median :-0.10246
## Mean :-0.213393 Mean :-0.12383
## 3rd Qu.: 0.008651 3rd Qu.: 0.09124
## Max. : 1.000000 Max. : 1.00000
## freq.bodygyro.mean.x freq.bodygyro.mean.y freq.bodygyro.mean.z
## Min. :-1.0000 Min. :-1.0000 Min. :-1.0000
## 1st Qu.:-0.9853 1st Qu.:-0.9847 1st Qu.:-0.9851
## Median :-0.8917 Median :-0.9197 Median :-0.8877
## Mean :-0.6721 Mean :-0.7062 Mean :-0.6442
## 3rd Qu.:-0.3837 3rd Qu.:-0.4735 3rd Qu.:-0.3225
## Max. : 1.0000 Max. : 1.0000 Max. : 1.0000
## freq.bodygyro.meanfreq.x freq.bodygyro.meanfreq.y
## Min. :-1.00000 Min. :-1.00000
## 1st Qu.:-0.27189 1st Qu.:-0.36257
## Median :-0.09868 Median :-0.17298
## Mean :-0.10104 Mean :-0.17428
## 3rd Qu.: 0.06810 3rd Qu.: 0.01366
## Max. : 1.00000 Max. : 1.00000
## freq.bodygyro.meanfreq.z freq.bodyaccmag.mean freq.bodyaccmag.meanfreq
## Min. :-1.00000 Min. :-1.0000 Min. :-1.00000
## 1st Qu.:-0.23240 1st Qu.:-0.9847 1st Qu.:-0.09663
## Median :-0.05369 Median :-0.8755 Median : 0.07026
## Mean :-0.05139 Mean :-0.5860 Mean : 0.07688
## 3rd Qu.: 0.12251 3rd Qu.:-0.2173 3rd Qu.: 0.24495
## Max. : 1.00000 Max. : 1.0000 Max. : 1.00000
## freq.bodybodyaccjerkmag.mean freq.bodybodyaccjerkmag.meanfreq
## Min. :-1.0000 Min. :-1.000000
## 1st Qu.:-0.9898 1st Qu.:-0.002959
## Median :-0.9290 Median : 0.164180
## Mean :-0.6208 Mean : 0.173220
## 3rd Qu.:-0.2600 3rd Qu.: 0.357307
## Max. : 1.0000 Max. : 1.000000
## freq.bodybodygyromag.mean freq.bodybodygyromag.meanfreq
## Min. :-1.0000 Min. :-1.00000
## 1st Qu.:-0.9825 1st Qu.:-0.23436
## Median :-0.8756 Median :-0.05210
## Mean :-0.6974 Mean :-0.04156
## 3rd Qu.:-0.4514 3rd Qu.: 0.15158
## Max. : 1.0000 Max. : 1.00000
## freq.bodybodygyrojerkmag.mean freq.bodybodygyrojerkmag.meanfreq
## Min. :-1.0000 Min. :-1.00000
## 1st Qu.:-0.9921 1st Qu.:-0.01948
## Median :-0.9453 Median : 0.13625
## Mean :-0.7798 Mean : 0.12671
## 3rd Qu.:-0.6122 3rd Qu.: 0.28896
## Max. : 1.0000 Max. : 1.00000
## time.bodyacc.std.x time.bodyacc.std.y time.bodyacc.std.z
## Min. :-1.0000 Min. :-1.00000 Min. :-1.0000
## 1st Qu.:-0.9924 1st Qu.:-0.97699 1st Qu.:-0.9791
## Median :-0.9430 Median :-0.83503 Median :-0.8508
## Mean :-0.6078 Mean :-0.51019 Mean :-0.6131
## 3rd Qu.:-0.2503 3rd Qu.:-0.05734 3rd Qu.:-0.2787
## Max. : 1.0000 Max. : 1.00000 Max. : 1.0000
## time.gravityacc.std.x time.gravityacc.std.y time.gravityacc.std.z
## Min. :-1.0000 Min. :-1.0000 Min. :-1.0000
## 1st Qu.:-0.9949 1st Qu.:-0.9913 1st Qu.:-0.9866
## Median :-0.9819 Median :-0.9759 Median :-0.9665
## Mean :-0.9652 Mean :-0.9544 Mean :-0.9389
## 3rd Qu.:-0.9615 3rd Qu.:-0.9464 3rd Qu.:-0.9296
## Max. : 1.0000 Max. : 1.0000 Max. : 1.0000
## time.bodyaccjerk.std.x time.bodyaccjerk.std.y time.bodyaccjerk.std.z
## Min. :-1.0000 Min. :-1.0000 Min. :-1.0000
## 1st Qu.:-0.9913 1st Qu.:-0.9850 1st Qu.:-0.9892
## Median :-0.9513 Median :-0.9250 Median :-0.9543
## Mean :-0.6398 Mean :-0.6080 Mean :-0.7628
## 3rd Qu.:-0.2912 3rd Qu.:-0.2218 3rd Qu.:-0.5485
## Max. : 1.0000 Max. : 1.0000 Max. : 1.0000
## time.bodygyro.std.x time.bodygyro.std.y time.bodygyro.std.z
## Min. :-1.0000 Min. :-1.0000 Min. :-1.0000
## 1st Qu.:-0.9872 1st Qu.:-0.9819 1st Qu.:-0.9850
## Median :-0.9016 Median :-0.9106 Median :-0.8819
## Mean :-0.7212 Mean :-0.6827 Mean :-0.6537
## 3rd Qu.:-0.4822 3rd Qu.:-0.4461 3rd Qu.:-0.3379
## Max. : 1.0000 Max. : 1.0000 Max. : 1.0000
## time.bodygyrojerk.std.x time.bodygyrojerk.std.y time.bodygyrojerk.std.z
## Min. :-1.0000 Min. :-1.0000 Min. :-1.0000
## 1st Qu.:-0.9907 1st Qu.:-0.9922 1st Qu.:-0.9926
## Median :-0.9348 Median :-0.9548 Median :-0.9503
## Mean :-0.7313 Mean :-0.7861 Mean :-0.7399
## 3rd Qu.:-0.4865 3rd Qu.:-0.6268 3rd Qu.:-0.5097
## Max. : 1.0000 Max. : 1.0000 Max. : 1.0000
## time.bodyaccmag.std time.gravityaccmag.std time.bodyaccjerkmag.std
## Min. :-1.0000 Min. :-1.0000 Min. :-1.0000
## 1st Qu.:-0.9822 1st Qu.:-0.9822 1st Qu.:-0.9907
## Median :-0.8437 Median :-0.8437 Median :-0.9288
## Mean :-0.5912 Mean :-0.5912 Mean :-0.6278
## 3rd Qu.:-0.2423 3rd Qu.:-0.2423 3rd Qu.:-0.2733
## Max. : 1.0000 Max. : 1.0000 Max. : 1.0000
## time.bodygyromag.std time.bodygyrojerkmag.std freq.bodyacc.std.x
## Min. :-1.0000 Min. :-1.0000 Min. :-1.0000
## 1st Qu.:-0.9775 1st Qu.:-0.9922 1st Qu.:-0.9929
## Median :-0.8259 Median :-0.9403 Median :-0.9416
## Mean :-0.6625 Mean :-0.7780 Mean :-0.6034
## 3rd Qu.:-0.3940 3rd Qu.:-0.6093 3rd Qu.:-0.2493
## Max. : 1.0000 Max. : 1.0000 Max. : 1.0000
## freq.bodyacc.std.y freq.bodyacc.std.z freq.bodyaccjerk.std.x
## Min. :-1.00000 Min. :-1.0000 Min. :-1.0000
## 1st Qu.:-0.97689 1st Qu.:-0.9780 1st Qu.:-0.9920
## Median :-0.83261 Median :-0.8398 Median :-0.9562
## Mean :-0.52842 Mean :-0.6179 Mean :-0.6550
## 3rd Qu.:-0.09216 3rd Qu.:-0.3023 3rd Qu.:-0.3203
## Max. : 1.00000 Max. : 1.0000 Max. : 1.0000
## freq.bodyaccjerk.std.y freq.bodyaccjerk.std.z freq.bodygyro.std.x
## Min. :-1.0000 Min. :-1.0000 Min. :-1.0000
## 1st Qu.:-0.9865 1st Qu.:-0.9895 1st Qu.:-0.9881
## Median :-0.9280 Median :-0.9590 Median :-0.9053
## Mean :-0.6122 Mean :-0.7809 Mean :-0.7386
## 3rd Qu.:-0.2361 3rd Qu.:-0.5903 3rd Qu.:-0.5225
## Max. : 1.0000 Max. : 1.0000 Max. : 1.0000
## freq.bodygyro.std.y freq.bodygyro.std.z freq.bodyaccmag.std
## Min. :-1.0000 Min. :-1.0000 Min. :-1.0000
## 1st Qu.:-0.9808 1st Qu.:-0.9862 1st Qu.:-0.9829
## Median :-0.9061 Median :-0.8915 Median :-0.8547
## Mean :-0.6742 Mean :-0.6904 Mean :-0.6595
## 3rd Qu.:-0.4385 3rd Qu.:-0.4168 3rd Qu.:-0.3823
## Max. : 1.0000 Max. : 1.0000 Max. : 1.0000
## freq.bodybodyaccjerkmag.std freq.bodybodygyromag.std
## Min. :-1.0000 Min. :-1.0000
## 1st Qu.:-0.9907 1st Qu.:-0.9781
## Median :-0.9255 Median :-0.8275
## Mean :-0.6401 Mean :-0.7000
## 3rd Qu.:-0.3082 3rd Qu.:-0.4713
## Max. : 1.0000 Max. : 1.0000
## freq.bodybodygyrojerkmag.std
## Min. :-1.0000
## 1st Qu.:-0.9926
## Median :-0.9382
## Mean :-0.7922
## 3rd Qu.:-0.6437
## Max. : 1.0000
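Every measurement ranges from -1 to 1, which suggests the features are normalized to a common scale, and no value falls outside that range. Several variables are skewed (for example, time.bodyaccmag.mean has a median of -0.8746 but a mean of -0.5482), so the summary alone cannot rule out unusual values within the range, but it shows no obviously invalid ones.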
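A summary table is a coarse check. The sketch below shows two explicit checks that could complement it: counting values outside an expected range, and counting values beyond the usual 1.5 × IQR fences. The toy data, the [-1, 1] range and the helper name `iqr.outliers` are assumptions made for illustration.

```r
# Hypothetical measurements; column b contains one out-of-range value
toy <- data.frame(a = c(-0.9, -0.5, 0.1, 0.3, 1.0),
                  b = c(-1.0,  0.0, 0.2, 0.4, 1.7))

# Check 1: how many values fall outside the expected [-1, 1] range?
out.of.range <- sapply(toy, function(x) sum(x < -1 | x > 1))

# Check 2: how many values lie beyond 1.5 * IQR from the quartiles?
iqr.outliers <- function(x) {
  q     <- unname(quantile(x, c(0.25, 0.75)))
  fence <- 1.5 * (q[2] - q[1])
  sum(x < q[1] - fence | x > q[2] + fence)
}
iqr.flags <- sapply(toy, iqr.outliers)

out.of.range
iqr.flags
```

The IQR rule is only a heuristic: on strongly skewed variables it can flag many legitimate values, so flagged points are candidates for inspection, not errors to delete.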
sum(is.na(table))
## [1] 0
There don’t seem to be any “NA” values in the dataset.
Now that we have a tidy and comprehensible dataset, and we have checked for missing values and outliers, we can proceed to the final step: exporting it as a .csv file.
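When a total count does come back non-zero, it helps to know where the gaps are. A minimal sketch on hypothetical data, using `colSums()` for a per-column count and `complete.cases()` to locate the affected rows:

```r
# Hypothetical table with one missing measurement
toy <- data.frame(subject = c(1, 2, 3),
                  time.bodyacc.mean.x = c(0.2, NA, 0.3))

# Number of missing values in each column
na.per.column <- colSums(is.na(toy))

# Row numbers that contain at least one missing value
rows.with.na <- which(!complete.cases(toy))

na.per.column
rows.with.na
```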
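dir.create("./export", showWarnings = FALSE) # Make sure the export directory exists
write.csv(table, "./export/tidy_UCI_HAR_Dataset.csv", row.names = FALSE) # Omit row numbers so no extra index column is written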
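A quick way to confirm that an export can be "readily imported" is to read it back and compare it with the original. The sketch below does this on a hypothetical toy table written to a temporary file, so it does not touch the real export.

```r
# Hypothetical tidy table
toy <- data.frame(activity = c("WALKING", "LAYING"),
                  subject = c(2L, 5L),
                  time.bodyacc.mean.x = c(0.204, 0.249))

# Write to a temporary file, without row names
export.path <- tempfile(fileext = ".csv")
write.csv(toy, export.path, row.names = FALSE)

# Read the file back and compare its shape and names with the original
reloaded <- read.csv(export.path, stringsAsFactors = FALSE)
identical(names(reloaded), names(toy))
identical(dim(reloaded), dim(toy))
```

If row names were written, the reloaded table would gain an extra leading column, and the comparison of names would fail.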