![]()
Abalone: You can see the rings
![]()
Detailed Rings…
- The data has Rings as the dependent variable. The number of rings + 1.5 gives the age in years. We have added 1.5 to the column (it keeps the name `Rings`) and will be predicting the age directly.
- Overall visualization of the data, colored by sex:
- Female -> RED
- Male -> BLUE
- Infant -> GREEN
- The weights, lengths, etc. of the infants are, as expected, lower than those of the adults.
- We see outliers in some parameters, such as Height and Shell Weight, so we will remove these outliers from the independent variables.
- Also notice the high multicollinearity among the predictors.
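The age conversion and the outlier removal described in the list above can be sketched in base R as follows. This is only an illustration on toy rows: the report does not say which outlier rule was used, so the 1.5 × IQR fence and the helper name `is_outlier` are assumptions.

```r
# Toy stand-in for the abalone data (hypothetical rows; the last Height is an
# obvious outlier).
abalone <- data.frame(
  Height       = c(0.095, 0.090, 0.135, 0.125, 1.130),
  Shell.weight = c(0.150, 0.070, 0.200, 0.155, 0.120),
  Rings        = c(15, 7, 9, 10, 8)
)

# Age in years = rings + 1.5; overwrite the column so the model predicts age.
abalone$Rings <- abalone$Rings + 1.5

# Hypothetical helper: TRUE where x lies outside the k * IQR fences.
# The rule actually used in the analysis is not stated; this is an assumption.
is_outlier <- function(x, k = 1.5) {
  q <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  fence <- k * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

# Flag a row if any of the chosen predictors is an outlier, then drop it.
predictors <- c("Height", "Shell.weight")
flagged <- Reduce(`|`, lapply(abalone[predictors], is_outlier))
abalone_clean <- abalone[!flagged, ]
```

Only the predictors are screened here; the target column is left untouched, in line with the note that outliers are removed from the independent variables.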
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
- Minimal outliers.
## [1] 50
- Making dummy columns for Gender (the `Sex` column), then removing the original column
## 'data.frame': 4139 obs. of 12 variables:
## $ Sex : Factor w/ 3 levels "F","I","M": 3 3 1 3 2 2 1 1 3 1 ...
## $ Length : num 0.455 0.35 0.53 0.44 0.33 0.425 0.53 0.545 0.475 0.55 ...
## $ Diameter : num 0.365 0.265 0.42 0.365 0.255 0.3 0.415 0.425 0.37 0.44 ...
## $ Height : num 0.095 0.09 0.135 0.125 0.08 0.095 0.15 0.125 0.125 0.15 ...
## $ Whole.weight : num 0.514 0.226 0.677 0.516 0.205 ...
## $ Shucked.weight: num 0.2245 0.0995 0.2565 0.2155 0.0895 ...
## $ Viscera.weight: num 0.101 0.0485 0.1415 0.114 0.0395 ...
## $ Shell.weight : num 0.15 0.07 0.21 0.155 0.055 0.12 0.33 0.26 0.165 0.32 ...
## $ Rings : num 16.5 8.5 10.5 11.5 8.5 9.5 21.5 17.5 10.5 20.5 ...
## $ Sex_M : int 1 1 0 1 0 0 0 0 1 0 ...
## $ Sex_F : int 0 0 1 0 0 0 1 1 0 1 ...
## $ Sex_I : int 0 0 0 0 1 1 0 0 0 0 ...
## - attr(*, "na.action")= 'omit' Named int 130 164 165 166 167 169 237 892 1052 1208 ...
## ..- attr(*, "names")= chr "130" "164" "165" "166" ...
## Length Diameter Height Whole.weight Shucked.weight Viscera.weight
## 1 0.455 0.365 0.095 0.5140 0.2245 0.1010
## 2 0.350 0.265 0.090 0.2255 0.0995 0.0485
## 3 0.530 0.420 0.135 0.6770 0.2565 0.1415
## 4 0.440 0.365 0.125 0.5160 0.2155 0.1140
## 5 0.330 0.255 0.080 0.2050 0.0895 0.0395
## 6 0.425 0.300 0.095 0.3515 0.1410 0.0775
## Shell.weight Rings Sex_M Sex_F Sex_I
## 1 0.150 16.5 1 0 0
## 2 0.070 8.5 1 0 0
## 3 0.210 10.5 0 1 0
## 4 0.155 11.5 1 0 0
## 5 0.055 8.5 0 0 1
## 6 0.120 9.5 0 0 1
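A minimal base R sketch of the dummy-column step, using the first six rows shown above. The loop is just one way to do it; the original code is not shown, so treat this as an assumption about the approach rather than the method actually used.

```r
# First six rows from the output above (only Sex and Length are kept here).
abalone <- data.frame(
  Sex    = factor(c("M", "M", "F", "M", "I", "I")),
  Length = c(0.455, 0.350, 0.530, 0.440, 0.330, 0.425)
)

# One 0/1 indicator column per level, named as in the printed data frame.
for (lvl in c("M", "F", "I")) {
  abalone[[paste0("Sex_", lvl)]] <- as.integer(abalone$Sex == lvl)
}

# Drop the original factor once the indicators exist.
abalone$Sex <- NULL
```

Keeping all three indicators means they sum to 1 in every row, so one of them is redundant. That linear dependence is a likely explanation for the near-zero tenth eigenvalue in the PCA output.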
- Separating the data into train and test sets
- Running PCA on the train data, then applying the same fitted transformation to the test data
- Calculating the cumulative % contribution of the components and plotting it
- Choosing the first 6 PCs, which together account for about 99.2% of the variance
- Overall visualization of the 6 PCs
## [1] 7.062347e+00 1.521489e+00 9.126227e-01 2.025126e-01 1.329303e-01
## [6] 8.641709e-02 6.219228e-02 1.274643e-02 6.742391e-03 1.013978e-30
## [1] 0.7062347086 0.1521489122 0.0912622717 0.0202512552 0.0132930334
## [6] 0.0086417088 0.0062192281 0.0012746428 0.0006742391
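The split, the PCA fit on the training rows, and the cumulative contribution can be sketched as below. The toy data, the 80/20 ratio, the seed, and the use of `prcomp()` are all assumptions, since the report shows none of them. Only the eigenvalues are copied from the output above.

```r
set.seed(42)  # arbitrary seed, only to make the toy example reproducible

# Toy predictors standing in for the abalone features (hypothetical data);
# columns a and b are deliberately collinear.
n <- 200
base <- rnorm(n)
X <- data.frame(
  a = base + rnorm(n, sd = 0.1),
  b = base + rnorm(n, sd = 0.1),
  c = rnorm(n),
  d = rnorm(n)
)

# 80/20 split; the ratio used in the original analysis is not stated.
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
X_train <- X[train_idx, ]
X_test  <- X[-train_idx, ]

# Fit PCA on the training rows only, centering and scaling each column.
pca <- prcomp(X_train, center = TRUE, scale. = TRUE)

# Project the test rows with the centering, scaling and rotation learned on train.
pc_train <- pca$x
pc_test  <- predict(pca, newdata = X_test)

# Eigenvalues as printed above; each PC's contribution is its share of the total.
eig <- c(7.062347, 1.521489, 0.9126227, 0.2025126, 0.1329303,
         0.08641709, 0.06219228, 0.01274643, 0.006742391, 1.013978e-30)
contribution <- eig / sum(eig)
cumulative   <- cumsum(contribution)
round(100 * cumulative[6], 1)  # cumulative % explained by the first 6 PCs
```

Fitting on the training rows and reusing the fitted object for the test rows keeps information from the test set out of the transformation.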
- Normalizing the train and test data using their mean and standard deviation
- Then fitting a Keras sequential model of dense layers on the normalized data
## [1] "PC1" "PC2" "PC3" "PC4" "PC5" "PC6" ""
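One common way to do the normalization step is to standardize both sets with the training set's statistics, sketched below on toy PC scores. The report does not make clear whether the test set was scaled with its own mean and standard deviation or with the training ones, so this convention is an assumption.

```r
# Toy PC scores (hypothetical values) standing in for the train and test sets.
pc_train <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), ncol = 2,
                   dimnames = list(NULL, c("PC1", "PC2")))
pc_test  <- matrix(c(2.5, 5, 25, 50), ncol = 2,
                   dimnames = list(NULL, c("PC1", "PC2")))

# Column means and standard deviations come from the training set only.
train_mean <- colMeans(pc_train)
train_sd   <- apply(pc_train, 2, sd)

# scale() subtracts `center` and divides by `scale`, column by column.
train_scaled <- scale(pc_train, center = train_mean, scale = train_sd)
test_scaled  <- scale(pc_test,  center = train_mean, scale = train_sd)
```

With this convention the training columns end up with mean 0 and standard deviation 1, while the test columns are only approximately standardized.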
## ___________________________________________________________________________
## Layer (type) Output Shape Param #
## ===========================================================================
## dense_1 (Dense) (None, 64) 448
## ___________________________________________________________________________
## dropout_1 (Dropout) (None, 64) 0
## ___________________________________________________________________________
## dense_2 (Dense) (None, 64) 4160
## ___________________________________________________________________________
## dropout_2 (Dropout) (None, 64) 0
## ___________________________________________________________________________
## dense_3 (Dense) (None, 1) 65
## ===========================================================================
## Total params: 4,673
## Trainable params: 4,673
## Non-trainable params: 0
## ___________________________________________________________________________
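The parameter counts in the summary above can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit, and dropout layers add none. A small sketch of that arithmetic, where the helper name `dense_params` is made up for illustration:

```r
# Parameters of a dense layer: (inputs + 1 bias) * units.
dense_params <- function(n_inputs, n_units) (n_inputs + 1) * n_units

# 6 PCs in, two hidden layers of 64 units, one output unit.
layer_sizes <- c(6, 64, 64, 1)

# Pair each layer's input width with its output width.
params <- dense_params(head(layer_sizes, -1), tail(layer_sizes, -1))
params       # per-layer counts for dense_1, dense_2, dense_3
sum(params)  # total trainable parameters
```

The 448 parameters of `dense_1` confirm that the network receives 6 inputs, matching the 6 retained principal components.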
- Plotting the training history
- Predicting on the test data
- Showing results: MAE & RMSE (in years of age, since the target is Rings + 1.5)
## [1] "MAE"
## [1] 1.600382
## [1] "RMSE"
## [1] 2.22334
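For reference, the two metrics can be computed as in the sketch below. The prediction vector is made up for illustration and does not come from the model above.

```r
# Mean absolute error: average size of the errors.
mae <- function(actual, predicted) mean(abs(actual - predicted))

# Root mean squared error: penalizes large errors more heavily.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

actual    <- c(16.5, 8.5, 10.5, 11.5)  # ages from the first rows of the data
predicted <- c(14.5, 9.5, 10.5, 12.5)  # hypothetical predictions

mae(actual, predicted)
rmse(actual, predicted)
```

RMSE is never smaller than MAE, and a gap like the one reported above (2.22 vs 1.60) suggests that a few predictions miss by considerably more than the typical error.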