This project aims to predict the manner in which individuals perform
barbell lifts using data collected from wearable devices. The dataset
includes measurements from accelerometers placed on the body and the
equipment (e.g., belt, forearm, and dumbbell). The classe variable is
the target, representing different lift techniques. The analysis uses a
Random Forest model to classify the lift type.
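The classe variable was reasonably balanced across its
five classes (A, B, C, D, and E): class A was the most frequent at
about 28% of the test split, and the other classes each made up
between 16% and 19%. The dataset was therefore suitable for
classification without additional balancing techniques.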
The dataset was split into training (80%) and testing (20%) subsets.
A Random Forest model was trained with 500 trees, with the number of
features considered at each split (mtry) set to the square root of the
total number of features.
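The split and the square-root rule for mtry can be sketched as follows. This is a minimal illustration in plain Python, not the code behind this report: the function names, the row count of 1000, and the feature count of 52 are hypothetical, and the shuffle is a simple random split, whereas the original split may have been stratified by class.

```python
import math
import random


def train_test_split_indices(n_rows, train_fraction=0.8, seed=42):
    """Shuffle row indices and split them into train and test index lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = int(n_rows * train_fraction)
    return indices[:cut], indices[cut:]


def default_mtry(n_features):
    """Square-root rule for the number of features tried at each split."""
    return max(1, math.floor(math.sqrt(n_features)))


# Hypothetical sizes, chosen only to make the example concrete.
train_idx, test_idx = train_test_split_indices(n_rows=1000)
mtry = default_mtry(n_features=52)

print(len(train_idx), len(test_idx), mtry)
```

Fixing the seed keeps the split reproducible, so the reported metrics can be regenerated from the same rows.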
## Confusion Matrix and Statistics
##
## Reference
## Prediction A B C D E
## A 1116 0 0 0 0
## B 0 759 0 0 0
## C 0 0 684 3 0
## D 0 0 0 640 0
## E 0 0 0 0 721
##
## Overall Statistics
##
## Accuracy : 0.9992
## 95% CI : (0.9978, 0.9998)
## No Information Rate : 0.2845
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.999
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: A Class: B Class: C Class: D Class: E
## Sensitivity 1.0000 1.0000 1.0000 0.9953 1.0000
## Specificity 1.0000 1.0000 0.9991 1.0000 1.0000
## Pos Pred Value 1.0000 1.0000 0.9956 1.0000 1.0000
## Neg Pred Value 1.0000 1.0000 1.0000 0.9991 1.0000
## Prevalence 0.2845 0.1935 0.1744 0.1639 0.1838
## Detection Rate 0.2845 0.1935 0.1744 0.1631 0.1838
## Detection Prevalence 0.2845 0.1935 0.1751 0.1631 0.1838
## Balanced Accuracy 1.0000 1.0000 0.9995 0.9977 1.0000
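The headline statistics can be recomputed directly from the confusion matrix counts. The sketch below is plain Python for illustration (the variable names are assumptions, and it is not the code that produced the output); it derives accuracy, Cohen's kappa, and the per-class sensitivity, specificity, and positive predictive value.

```python
labels = ["A", "B", "C", "D", "E"]
# Rows are predictions, columns are reference (true) classes,
# copied from the confusion matrix in this report.
matrix = [
    [1116, 0, 0, 0, 0],
    [0, 759, 0, 0, 0],
    [0, 0, 684, 3, 0],
    [0, 0, 0, 640, 0],
    [0, 0, 0, 0, 721],
]

total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(len(labels)))
accuracy = correct / total

row_totals = [sum(row) for row in matrix]        # predicted counts per class
col_totals = [sum(col) for col in zip(*matrix)]  # true counts per class

# Cohen's kappa: agreement beyond what the marginal totals alone would give.
expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
kappa = (accuracy - expected) / (1 - expected)

per_class = {}
for i, label in enumerate(labels):
    tp = matrix[i][i]
    fp = row_totals[i] - tp  # predicted as this class, but actually another
    fn = col_totals[i] - tp  # actually this class, but predicted as another
    tn = total - tp - fp - fn
    per_class[label] = {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "pos_pred_value": tp / (tp + fp),
    }

print(f"accuracy={accuracy:.4f} kappa={kappa:.4f}")
for label, stats in per_class.items():
    print(label, {name: round(value, 4) for name, value in stats.items()})
```

The only off-diagonal cell is the 3 in the row for predicted C and the column for reference D, which is why only the C and D statistics fall below 1.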
The model's only errors were three instances of class D misclassified as class C.
## problem_id predicted_classe
## 1 1 B
## 2 2 A
## 3 3 B
## 4 4 A
## 5 5 A
## 6 6 E
## 7 7 D
## 8 8 B
## 9 9 A
## 10 10 A
## 11 11 B
## 12 12 C
## 13 13 B
## 14 14 A
## 15 15 E
## 16 16 E
## 17 17 A
## 18 18 B
## 19 19 B
## 20 20 B
The model generated predictions for the 20 cases in the testing dataset
(pml-testing.csv). The first six predictions were B, A, B, A, A, and E.
Since the true labels for these test cases are unavailable, the accuracy
of these predictions could not be directly evaluated.
The most critical features identified were roll_belt, yaw_belt, and
pitch_belt. These features had the highest impact on model accuracy,
likely due to their ability to capture nuanced motion patterns during
the exercises.
## True_Class Predicted_Class
## 12946 D C
## 12961 D C
## 15322 D C
The few misclassified instances, all of class D predicted as class C,
indicated potential overlap in movement characteristics between these
two classes.
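A table of misclassified rows like this one can be built by comparing true and predicted labels position by position. The sketch below is plain Python on toy labels that are illustrative only and are not the project's data; the variable names are assumptions.

```python
from collections import Counter

# Toy labels for illustration only; not the project's data.
true_labels = ["A", "D", "C", "D", "E", "D"]
predicted_labels = ["A", "C", "C", "C", "E", "D"]

# Keep the position, the true class, and the predicted class of every mismatch.
errors = [
    (index, truth, guess)
    for index, (truth, guess) in enumerate(zip(true_labels, predicted_labels))
    if truth != guess
]

# Count how often each (true, predicted) pair is confused.
confusions = Counter((truth, guess) for _, truth, guess in errors)

for index, truth, guess in errors:
    print(index, truth, guess)
print(confusions.most_common())
```

Counting the (true, predicted) pairs shows at a glance whether the errors concentrate on one pair of classes, as they do here.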
## [1] "Correlation of roll_belt with classe: 0.0621513426488757"
## [1] "Correlation of yaw_belt with classe: 0.0136011047702848"
roll_belt: Weak positive correlation with classe (0.062).
yaw_belt: Extremely weak correlation (0.014).
The boxplots revealed that the roll_belt and yaw_belt distributions were
notably different for Class E, which may explain their importance in
distinguishing this class despite the weak linear correlations.
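A weak correlation does not rule out a useful feature. The sketch below, in plain Python on synthetic values that are not the project's data, assumes the class labels were coded as integers 1 to 5 before the correlation was computed (the report does not say how classe was encoded). The toy feature separates one class perfectly, yet its Pearson correlation with the class codes is zero.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Synthetic example: two rows per class, classes coded 1..5 (an assumption).
class_codes = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
# The feature is 10 for class 3 and 0 elsewhere, so a single threshold
# isolates class 3, but the relationship is not linear in the class code.
feature = [0.0, 0.0, 0.0, 0.0, 10.0, 10.0, 0.0, 0.0, 0.0, 0.0]

r = pearson(feature, class_codes)
print(r)
```

Tree-based models split on thresholds, so they can exploit this kind of class-specific shift even when a linear summary such as a correlation coefficient is close to zero.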
The project successfully demonstrated the application of machine
learning in activity classification using sensor data. The Random Forest
model achieved 99.92% accuracy on the held-out 20% split, with roll_belt and
yaw_belt emerging as key contributors. These findings
underscore the importance of wearable sensors in fitness monitoring and
activity classification.