Objective

Predict whether a person is married or divorced from their self-reported responses to 54 statements about their relationship, each answered on a 0 to 4 scale.

Dataset Public Repository

https://archive.ics.uci.edu/ml/datasets/Divorce+Predictors+data+set#

Data Source

Attribute Information

  1. If one of us apologizes when our discussions go badly, the issue does not drag on.
  2. I know we can ignore our differences, even if things get hard sometimes.
  3. When we need to, we can restart our discussions from the beginning and correct them.
  4. When I argue with my spouse, it will eventually work for me to contact him.
  5. The time I spent with my spouse is special for us.
  6. We don’t have time at home as partners.
  7. We are like two strangers who share the same environment at home rather than as a family.
  8. I enjoy our holidays with my spouse.
  9. I enjoy traveling with my spouse.
  10. My spouse and I have most of our goals in common.
  11. I think that some day, my spouse and I will be in harmony with each other.
  12. My spouse and I have similar values regarding personal freedom.
  13. My spouse and I have similar entertainment.
  14. Most of our goals regarding people (children, friends, etc.) are the same.
  15. My dreams of living are similar and harmonious with those of my spouse.
  16. I’m compatible with my spouse about what love should be.
  17. I share the same views with my spouse about being happy.
  18. My spouse and I have similar ideas about how marriage should be.
  19. My spouse and I have similar ideas about how roles should be in marriage.
  20. My spouse and I have similar values regarding trust.
  21. I know exactly what my spouse likes.
  22. I know how my spouse wants to be taken care of when she’s sick.
  23. I know my spouse’s favorite food.
  24. I can tell you what kind of stress my spouse is having in life.
  25. I have knowledge of my spouse’s inner world.
  26. I know my spouse’s basic concerns.
  27. I know what my spouse’s current sources of stress are.
  28. I know my spouse’s hopes and wishes.
  29. I know my spouse very well.
  30. I know my spouse’s friends and their social relationships.
  31. I feel aggressive when I argue with my spouse.
  32. When discussing with my spouse, I usually use expressions such as X, Y, Z.
  33. I can use negative statements about my spouse’s personality during our discussions.
  34. I can use offensive expressions during our discussions.
  35. I can be insulting during our discussions.
  36. I can be humiliating when we argue.
  37. My argument with my spouse is not calm.
  38. I hate my spouse’s way of bringing it up.
  39. Fights often occur suddenly.
  40. We start fighting before I know what’s going on.
  41. When I talk to my spouse about something, my calm suddenly breaks.
  42. When I argue with my spouse, it only snaps in and I don’t say a word.
  43. I’m mostly willing to calm the environment a little bit.
  44. Sometimes I think it’s good for me to leave home for a while.
  45. I’d rather stay silent than argue with my spouse.
  46. Even if I’m right in the argument, I’m willing not to upset the other side.
  47. When I argue with my spouse, I remain silent because I am afraid of not being able to control my anger.
  48. I feel right in our discussions.
  49. I have nothing to do with what I’ve been accused of.
  50. I’m not actually the one who’s guilty of what I’m accused of.
  51. I’m not the one who’s wrong about problems at home.
  52. I wouldn’t hesitate to tell my spouse about her inadequacy.
  53. I remind my spouse of her inadequacies during our discussions.
  54. I’m not afraid to tell my spouse about her incompetence.

Load Dataset and Libraries

Package startup messages, condensed: tidyverse 1.2.1 attaches ggplot2 3.2.1, tibble 2.1.3, tidyr 1.0.0, readr 1.3.1, purrr 0.3.2, dplyr 0.8.3, stringr 1.4.0 and forcats 0.4.0. The session also loads gridExtra, corrplot 0.84, factoextra, caret (which loads lattice) and randomForest 4.6-14. Name conflicts to keep in mind: dplyr::filter() and dplyr::lag() mask stats::filter() and stats::lag(); gridExtra::combine() masks dplyr::combine(); caret::lift() masks purrr::lift(); randomForest::combine() masks the gridExtra and dplyr versions; and randomForest::margin() masks ggplot2::margin().

Preview Dataset

##   atr1 atr2 atr3 atr4 atr5 atr6 atr7 atr8 atr9 atr10 atr11 atr12 atr13
## 1    2    2    4    1    0    0    0    0    0     0     1     0     1
## 2    4    4    4    4    4    0    0    4    4     4     4     3     4
## 3    2    2    2    2    1    3    2    1    1     2     3     4     2
## 4    3    2    3    2    3    3    3    3    3     3     4     3     3
## 5    2    2    1    1    1    1    0    0    0     0     0     1     0
## 6    0    0    1    0    0    2    0    0    0     1     0     2     1
##   atr14 atr15 atr16 atr17 atr18 atr19 atr20 atr21 atr22 atr23 atr24 atr25
## 1     1     0     1     0     0     0     1     0     0     0     0     0
## 2     0     4     4     4     4     3     2     1     1     0     2     2
## 3     3     3     3     3     3     3     2     1     0     1     2     2
## 4     4     3     3     3     3     3     4     1     1     1     1     2
## 5     1     1     1     1     1     2     1     1     0     0     0     0
## 6     0     2     0     2     1     0     1     0     0     0     0     2
##   atr26 atr27 atr28 atr29 atr30 atr31 atr32 atr33 atr34 atr35 atr36 atr37
## 1     0     0     0     0     1     1     2     1     2     0     1     2
## 2     1     2     0     1     1     0     4     2     3     0     2     3
## 3     2     2     2     3     2     3     3     1     1     1     1     2
## 4     1     1     1     1     3     2     3     2     2     1     1     3
## 5     2     1     2     1     1     1     1     1     1     0     0     0
## 6     2     0     0     0     0     4     1     1     1     1     1     1
##   atr38 atr39 atr40 atr41 atr42 atr43 atr44 atr45 atr46 atr47 atr48 atr49
## 1     1     3     3     2     1     1     2     3     2     1     3     3
## 2     4     2     4     2     2     3     4     2     2     2     3     4
## 3     1     3     3     3     3     2     3     2     3     2     3     1
## 4     3     4     4     2     2     3     2     3     2     2     3     3
## 5     0     2     1     0     2     3     0     2     2     1     2     3
## 6     2     0     2     2     1     2     3     0     2     2     1     2
##   atr50 atr51 atr52 atr53 atr54 class
## 1     3     2     3     2     1     1
## 2     4     4     4     2     2     1
## 3     1     1     2     2     2     1
## 4     3     3     2     2     2     1
## 5     2     2     2     1     0     1
## 6     1     1     1     2     0     1
## 'data.frame':    170 obs. of  55 variables:
##  $ atr1 : int  2 4 2 3 2 0 3 2 2 1 ...
##  $ atr2 : int  2 4 2 2 2 0 3 1 2 1 ...
##  $ atr3 : int  4 4 2 3 1 1 3 2 1 1 ...
##  $ atr4 : int  1 4 2 2 1 0 2 2 0 1 ...
##  $ atr5 : int  0 4 1 3 1 0 1 2 0 1 ...
##  $ atr6 : int  0 0 3 3 1 2 3 1 4 2 ...
##  $ atr7 : int  0 0 2 3 0 0 4 0 1 0 ...
##  $ atr8 : int  0 4 1 3 0 0 3 3 3 2 ...
##  $ atr9 : int  0 4 1 3 0 0 2 3 3 2 ...
##  $ atr10: int  0 4 2 3 0 1 2 2 3 2 ...
##  $ atr11: int  1 4 3 4 0 0 2 4 3 3 ...
##  $ atr12: int  0 3 4 3 1 2 2 3 3 0 ...
##  $ atr13: int  1 4 2 3 0 1 2 2 3 0 ...
##  $ atr14: int  1 0 3 4 1 0 3 3 3 2 ...
##  $ atr15: int  0 4 3 3 1 2 2 4 3 1 ...
##  $ atr16: int  1 4 3 3 1 0 3 3 3 0 ...
##  $ atr17: int  0 4 3 3 1 2 3 2 3 1 ...
##  $ atr18: int  0 4 3 3 1 1 3 3 3 2 ...
##  $ atr19: int  0 3 3 3 2 0 3 2 3 1 ...
##  $ atr20: int  1 2 2 4 1 1 2 1 3 0 ...
##  $ atr21: int  0 1 1 1 1 0 3 2 2 0 ...
##  $ atr22: int  0 1 0 1 0 0 3 1 2 0 ...
##  $ atr23: int  0 0 1 1 0 0 3 1 2 0 ...
##  $ atr24: int  0 2 2 1 0 0 3 2 3 1 ...
##  $ atr25: int  0 2 2 2 0 2 2 3 2 1 ...
##  $ atr26: int  0 1 2 1 2 2 3 3 3 1 ...
##  $ atr27: int  0 2 2 1 1 0 3 2 2 1 ...
##  $ atr28: int  0 0 2 1 2 0 2 2 3 1 ...
##  $ atr29: int  0 1 3 1 1 0 2 2 2 1 ...
##  $ atr30: int  1 1 2 3 1 0 2 3 3 1 ...
##  $ atr31: int  1 0 3 2 1 4 1 1 1 1 ...
##  $ atr32: int  2 4 3 3 1 1 2 1 1 1 ...
##  $ atr33: int  1 2 1 2 1 1 2 0 1 0 ...
##  $ atr34: int  2 3 1 2 1 1 1 2 1 1 ...
##  $ atr35: int  0 0 1 1 0 1 1 2 1 0 ...
##  $ atr36: int  1 2 1 1 0 1 2 1 1 0 ...
##  $ atr37: int  2 3 2 3 0 1 3 4 1 1 ...
##  $ atr38: int  1 4 1 3 0 2 2 4 2 1 ...
##  $ atr39: int  3 2 3 4 2 0 2 4 2 2 ...
##  $ atr40: int  3 4 3 4 1 2 3 4 2 2 ...
##  $ atr41: int  2 2 3 2 0 2 3 4 2 1 ...
##  $ atr42: int  1 2 3 2 2 1 3 4 2 2 ...
##  $ atr43: int  1 3 2 3 3 2 3 3 2 3 ...
##  $ atr44: int  2 4 3 2 0 3 4 2 2 2 ...
##  $ atr45: int  3 2 2 3 2 0 3 0 2 2 ...
##  $ atr46: int  2 2 3 2 2 2 3 0 1 2 ...
##  $ atr47: int  1 2 2 2 1 2 2 1 1 0 ...
##  $ atr48: int  3 3 3 3 2 1 3 2 1 2 ...
##  $ atr49: int  3 4 1 3 3 2 2 2 1 2 ...
##  $ atr50: int  3 4 1 3 2 1 3 2 1 2 ...
##  $ atr51: int  2 4 1 3 2 1 3 1 1 2 ...
##  $ atr52: int  3 4 2 2 2 1 2 1 1 4 ...
##  $ atr53: int  2 2 2 2 1 2 2 1 1 3 ...
##  $ atr54: int  1 2 2 2 0 0 2 0 1 3 ...
##  $ class: int  1 1 1 1 1 1 1 1 1 1 ...
## 
##  0  1 
## 86 84

Label and Factorize Outcome Variable

## 'data.frame':    170 obs. of  55 variables:
##  $ atr1 : int  2 4 2 3 2 0 3 2 2 1 ...
##  $ atr2 : int  2 4 2 2 2 0 3 1 2 1 ...
##  $ atr3 : int  4 4 2 3 1 1 3 2 1 1 ...
##  $ atr4 : int  1 4 2 2 1 0 2 2 0 1 ...
##  $ atr5 : int  0 4 1 3 1 0 1 2 0 1 ...
##  $ atr6 : int  0 0 3 3 1 2 3 1 4 2 ...
##  $ atr7 : int  0 0 2 3 0 0 4 0 1 0 ...
##  $ atr8 : int  0 4 1 3 0 0 3 3 3 2 ...
##  $ atr9 : int  0 4 1 3 0 0 2 3 3 2 ...
##  $ atr10: int  0 4 2 3 0 1 2 2 3 2 ...
##  $ atr11: int  1 4 3 4 0 0 2 4 3 3 ...
##  $ atr12: int  0 3 4 3 1 2 2 3 3 0 ...
##  $ atr13: int  1 4 2 3 0 1 2 2 3 0 ...
##  $ atr14: int  1 0 3 4 1 0 3 3 3 2 ...
##  $ atr15: int  0 4 3 3 1 2 2 4 3 1 ...
##  $ atr16: int  1 4 3 3 1 0 3 3 3 0 ...
##  $ atr17: int  0 4 3 3 1 2 3 2 3 1 ...
##  $ atr18: int  0 4 3 3 1 1 3 3 3 2 ...
##  $ atr19: int  0 3 3 3 2 0 3 2 3 1 ...
##  $ atr20: int  1 2 2 4 1 1 2 1 3 0 ...
##  $ atr21: int  0 1 1 1 1 0 3 2 2 0 ...
##  $ atr22: int  0 1 0 1 0 0 3 1 2 0 ...
##  $ atr23: int  0 0 1 1 0 0 3 1 2 0 ...
##  $ atr24: int  0 2 2 1 0 0 3 2 3 1 ...
##  $ atr25: int  0 2 2 2 0 2 2 3 2 1 ...
##  $ atr26: int  0 1 2 1 2 2 3 3 3 1 ...
##  $ atr27: int  0 2 2 1 1 0 3 2 2 1 ...
##  $ atr28: int  0 0 2 1 2 0 2 2 3 1 ...
##  $ atr29: int  0 1 3 1 1 0 2 2 2 1 ...
##  $ atr30: int  1 1 2 3 1 0 2 3 3 1 ...
##  $ atr31: int  1 0 3 2 1 4 1 1 1 1 ...
##  $ atr32: int  2 4 3 3 1 1 2 1 1 1 ...
##  $ atr33: int  1 2 1 2 1 1 2 0 1 0 ...
##  $ atr34: int  2 3 1 2 1 1 1 2 1 1 ...
##  $ atr35: int  0 0 1 1 0 1 1 2 1 0 ...
##  $ atr36: int  1 2 1 1 0 1 2 1 1 0 ...
##  $ atr37: int  2 3 2 3 0 1 3 4 1 1 ...
##  $ atr38: int  1 4 1 3 0 2 2 4 2 1 ...
##  $ atr39: int  3 2 3 4 2 0 2 4 2 2 ...
##  $ atr40: int  3 4 3 4 1 2 3 4 2 2 ...
##  $ atr41: int  2 2 3 2 0 2 3 4 2 1 ...
##  $ atr42: int  1 2 3 2 2 1 3 4 2 2 ...
##  $ atr43: int  1 3 2 3 3 2 3 3 2 3 ...
##  $ atr44: int  2 4 3 2 0 3 4 2 2 2 ...
##  $ atr45: int  3 2 2 3 2 0 3 0 2 2 ...
##  $ atr46: int  2 2 3 2 2 2 3 0 1 2 ...
##  $ atr47: int  1 2 2 2 1 2 2 1 1 0 ...
##  $ atr48: int  3 3 3 3 2 1 3 2 1 2 ...
##  $ atr49: int  3 4 1 3 3 2 2 2 1 2 ...
##  $ atr50: int  3 4 1 3 2 1 3 2 1 2 ...
##  $ atr51: int  2 4 1 3 2 1 3 1 1 2 ...
##  $ atr52: int  3 4 2 2 2 1 2 1 1 4 ...
##  $ atr53: int  2 2 2 2 1 2 2 1 1 3 ...
##  $ atr54: int  1 2 2 2 0 0 2 0 1 3 ...
##  $ class: Factor w/ 2 levels "divorced","married": 1 1 1 1 1 1 1 1 1 1 ...
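The recoding step can be sketched in Python as below. This is not the notebook's R code; it is a minimal illustration assuming, as the `str()` output implies (the first rows, coded 1, become `divorced`), that class 1 means divorced and class 0 means married. `CLASS_LABELS` and `label_outcome` are hypothetical names.

```python
from collections import Counter

# Assumed coding, inferred from the str() output: integer 1 became the
# factor level "divorced", so 0 corresponds to "married".
CLASS_LABELS = {1: "divorced", 0: "married"}


def label_outcome(codes):
    """Map integer class codes to descriptive labels (KeyError on unknown codes)."""
    return [CLASS_LABELS[c] for c in codes]


# Counts taken from the table() output: 86 zeros and 84 ones.
codes = [0] * 86 + [1] * 84
labels = label_outcome(codes)
counts = Counter(labels)
print(counts["divorced"], counts["married"])  # 84 86
```

Failing loudly on an unexpected code is a deliberate choice: a silent default would hide a mis-coded row.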

Create Training and Testing Datasets

## 
##  divorced   married 
## 0.4745763 0.5254237
## 
##  divorced   married 
## 0.5384615 0.4615385
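The proportions above show that the split was not stratified: about 47% of the training rows are divorced versus about 54% of the test rows. A stratified split samples each class separately so both sets keep roughly the overall 84/170 ratio (caret's `createDataPartition()` does this when given a factor outcome). The Python sketch below is an illustration, not the notebook's code; `stratified_split`, the 0.7 training fraction (close to the 118/170 used here) and the seed are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.7, seed=42):
    """Return (train_idx, test_idx), sampling train_frac of each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        k = round(train_frac * len(idx))
        train.extend(idx[:k])
        test.extend(idx[k:])
    return sorted(train), sorted(test)


def proportions(labels, idx):
    """Share of each class among the rows in idx."""
    n = len(idx)
    return {lab: sum(labels[i] == lab for i in idx) / n for lab in sorted(set(labels))}


labels = ["divorced"] * 84 + ["married"] * 86
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
print(proportions(labels, train_idx))
print(proportions(labels, test_idx))
```

With these class sizes the sketch yields 119 training and 51 test rows, with the divorced share differing by well under one percentage point between the two sets.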

Difference in Mean by Outcome Variable (Training Set)

## # A tibble: 2 x 55
##   class  atr1  atr2  atr3  atr4   atr5  atr6   atr7  atr8   atr9 atr10
##   <fct> <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>
## 1 divo… 3.20  2.89  2.91  2.84  3.04   1.11  1.07   2.79  2.88   2.84 
## 2 marr… 0.452 0.371 0.629 0.226 0.0968 0.435 0.0161 0.129 0.0645 0.387
## # … with 44 more variables: atr11 <dbl>, atr12 <dbl>, atr13 <dbl>,
## #   atr14 <dbl>, atr15 <dbl>, atr16 <dbl>, atr17 <dbl>, atr18 <dbl>,
## #   atr19 <dbl>, atr20 <dbl>, atr21 <dbl>, atr22 <dbl>, atr23 <dbl>,
## #   atr24 <dbl>, atr25 <dbl>, atr26 <dbl>, atr27 <dbl>, atr28 <dbl>,
## #   atr29 <dbl>, atr30 <dbl>, atr31 <dbl>, atr32 <dbl>, atr33 <dbl>,
## #   atr34 <dbl>, atr35 <dbl>, atr36 <dbl>, atr37 <dbl>, atr38 <dbl>,
## #   atr39 <dbl>, atr40 <dbl>, atr41 <dbl>, atr42 <dbl>, atr43 <dbl>,
## #   atr44 <dbl>, atr45 <dbl>, atr46 <dbl>, atr47 <dbl>, atr48 <dbl>,
## #   atr49 <dbl>, atr50 <dbl>, atr51 <dbl>, atr52 <dbl>, atr53 <dbl>,
## #   atr54 <dbl>
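The table shows a large gap for most of the first ten items: divorced respondents average close to 3, married respondents mostly below 0.5. The per-class means are presumably computed with a dplyr group-by and summarise; the Python sketch below shows the same computation on hypothetical toy rows (not the real data), with `mean_by_class` as an illustrative name.

```python
from collections import defaultdict


def mean_by_class(rows, labels):
    """Column-wise mean of each feature, computed separately for each class label."""
    sums = defaultdict(lambda: [0.0] * len(rows[0]))
    counts = defaultdict(int)
    for row, lab in zip(rows, labels):
        counts[lab] += 1
        acc = sums[lab]
        for j, value in enumerate(row):
            acc[j] += value
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


# Hypothetical toy data: two features, two respondents per class.
rows = [[4, 3], [2, 3], [0, 1], [1, 0]]
labels = ["divorced", "divorced", "married", "married"]
means = mean_by_class(rows, labels)
print(means)  # {'divorced': [3.0, 3.0], 'married': [0.5, 0.5]}
```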

Visualizing the Distribution Difference with Density Plots

Finding Correlated Variables
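Redundancy between items can be screened by computing pairwise Pearson correlations and flagging pairs above a cutoff. The sketch below is illustrative Python, not the notebook's corrplot code; the 0.9 threshold and the helper names are assumptions. The usage example uses only the first ten observations of atr6, atr8 and atr9 from the `str()` output, so it demonstrates the mechanics rather than the true correlations.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def highly_correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every column pair with |r| >= threshold."""
    pairs = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs


# First ten observations of three items, copied from the str() output.
columns = {
    "atr6": [0, 0, 3, 3, 1, 2, 3, 1, 4, 2],
    "atr8": [0, 4, 1, 3, 0, 0, 3, 3, 3, 2],
    "atr9": [0, 4, 1, 3, 0, 0, 2, 3, 3, 2],
}
pairs = highly_correlated_pairs(columns)
print(pairs)  # [('atr8', 'atr9', 0.978)]
```

Items 8 and 9 (enjoying holidays and enjoying travel with one's spouse) are close paraphrases, so a strong correlation between them is unsurprising.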

Dimensionality Reduction: Principal Component Analysis

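Because many of the 54 features are redundant and highly correlated, we would like to reduce the number of feature variables. One method we can apply is principal component analysis (PCA), a widely used dimensionality-reduction technique that replaces the original features with uncorrelated linear combinations (principal components), ordered by how much of the total variance each explains. In the summary below, PC1 alone accounts for about 74% of the variance, and the first 10 components together explain just over 90%.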
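With standardized features, PCA amounts to an eigendecomposition of the correlation matrix: each eigenvalue is a component's variance (the squared standard deviation in the summary), and dividing by the number of variables gives its proportion of variance. R's `prcomp()` computes this through a singular value decomposition; the Python sketch below swaps in power iteration to find only the first component, on a hypothetical 2x2 correlation matrix. `leading_component` is an illustrative name.

```python
import math


def leading_component(corr, iters=500):
    """Power iteration: return (eigenvalue, unit eigenvector) for the largest
    eigenvalue of a symmetric positive semi-definite matrix such as a correlation matrix."""
    n = len(corr)
    v = [1.0 / (i + 1) for i in range(n)]  # arbitrary non-degenerate start vector
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue for the converged unit vector v.
    cv = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(a * b for a, b in zip(v, cv))
    return eigenvalue, v


corr = [[1.0, 0.8], [0.8, 1.0]]  # hypothetical: two items correlated at r = 0.8
lam, vec = leading_component(corr)
share = lam / len(corr)  # trace of a correlation matrix = number of variables
print(round(lam, 6), [round(x, 6) for x in vec], round(share, 6))
# 1.8 [0.707107, 0.707107] 0.9
```

Two items correlated at 0.8 collapse into one component carrying 90% of their joint variance, which is the same effect, at larger scale, behind PC1's dominance here.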

## Importance of components:
##                           PC1     PC2     PC3     PC4     PC5    PC6
## Standard deviation     6.3107 1.43265 1.28558 1.10536 0.95632 0.8940
## Proportion of Variance 0.7375 0.03801 0.03061 0.02263 0.01694 0.0148
## Cumulative Proportion  0.7375 0.77550 0.80611 0.82874 0.84567 0.8605
##                            PC7     PC8     PC9    PC10    PC11    PC12
## Standard deviation     0.87436 0.78410 0.76637 0.72687 0.68727 0.65491
## Proportion of Variance 0.01416 0.01139 0.01088 0.00978 0.00875 0.00794
## Cumulative Proportion  0.87463 0.88602 0.89689 0.90668 0.91542 0.92337
##                           PC13    PC14    PC15    PC16    PC17    PC18
## Standard deviation     0.63608 0.58895 0.53559 0.52314 0.51719 0.48026
## Proportion of Variance 0.00749 0.00642 0.00531 0.00507 0.00495 0.00427
## Cumulative Proportion  0.93086 0.93728 0.94259 0.94766 0.95262 0.95689
##                           PC19    PC20    PC21   PC22    PC23    PC24
## Standard deviation     0.45829 0.42975 0.42119 0.4027 0.39126 0.36180
## Proportion of Variance 0.00389 0.00342 0.00329 0.0030 0.00283 0.00242
## Cumulative Proportion  0.96078 0.96420 0.96748 0.9705 0.97332 0.97574
##                           PC25    PC26    PC27    PC28    PC29    PC30
## Standard deviation     0.35333 0.34048 0.32257 0.31750 0.29176 0.27841
## Proportion of Variance 0.00231 0.00215 0.00193 0.00187 0.00158 0.00144
## Cumulative Proportion  0.97806 0.98020 0.98213 0.98400 0.98557 0.98701
##                           PC31   PC32    PC33   PC34    PC35    PC36
## Standard deviation     0.27353 0.2546 0.25269 0.2434 0.23082 0.21535
## Proportion of Variance 0.00139 0.0012 0.00118 0.0011 0.00099 0.00086
## Cumulative Proportion  0.98839 0.9896 0.99078 0.9919 0.99286 0.99372
##                           PC37    PC38    PC39    PC40    PC41    PC42
## Standard deviation     0.20213 0.19589 0.18960 0.17684 0.16804 0.15455
## Proportion of Variance 0.00076 0.00071 0.00067 0.00058 0.00052 0.00044
## Cumulative Proportion  0.99447 0.99519 0.99585 0.99643 0.99695 0.99740
##                          PC43    PC44    PC45    PC46    PC47    PC48
## Standard deviation     0.1462 0.13633 0.13259 0.12337 0.11604 0.10787
## Proportion of Variance 0.0004 0.00034 0.00033 0.00028 0.00025 0.00022
## Cumulative Proportion  0.9978 0.99814 0.99846 0.99874 0.99899 0.99921
##                          PC49    PC50    PC51    PC52    PC53    PC54
## Standard deviation     0.1031 0.08839 0.08762 0.08051 0.07336 0.06928
## Proportion of Variance 0.0002 0.00014 0.00014 0.00012 0.00010 0.00009
## Cumulative Proportion  0.9994 0.99955 0.99969 0.99981 0.99991 1.00000
##           PC1        PC2        PC3        PC4        PC5        PC6
## 68  -6.259996 -1.3344072  0.9615677  1.0220966 -1.5267867  1.0619955
## 167  5.161627 -2.0799412  0.4831553 -0.2867672  0.9425143 -0.8157273
## 129  5.485848  1.2755808  3.4167542 -2.2705010 -1.0460598 -1.3683011
## 162  4.846028  2.0288484 -0.7191337 -0.0974272 -0.9512514  2.1142603
## 43  -7.480399  0.4698435 -0.9137240 -1.3869813  0.6703389 -0.2796404
## 14  -8.995516 -0.2129881 -0.8438006 -1.5367287  0.6111641 -0.6286081
##            PC7        PC8         PC9       PC10        PC11         PC12
## 68  -0.5616945  0.8662736  0.25187992  0.4959995  0.04979634 -0.363392924
## 167  0.0370225 -0.1461199  0.59544086  0.7775325 -0.24669965  0.759597405
## 129 -0.2422135  0.4964586  1.83894388 -0.6583357 -0.02918070 -1.117562886
## 162 -0.1072448  0.7273421  0.82690967  0.5469413 -0.64251858 -0.001581089
## 43   0.2915755 -1.1083297 -0.53706334 -0.5314219  0.01032799  0.017828368
## 14   0.2859603 -1.0277410 -0.07126251  0.2776853  0.73190191 -0.101421162
##            PC13       PC14        PC15        PC16       PC17       PC18
## 68  -0.59046229 -0.8145455 -0.34694570  0.05142315  0.8219933 -0.5709258
## 167 -0.47583194  0.1298447  0.06280186  0.04275281 -0.1942569 -0.1017798
## 129  1.61103907  0.3230529 -0.24534365  1.31410663 -0.4367401 -0.3730870
## 162  1.21382002  0.3446329 -0.22929098 -0.91320578 -0.0591777 -0.5052457
## 43   0.29537589  1.2483133  0.20880418 -0.32031367 -0.3361535  0.5385272
## 14   0.03909493  0.2212094  0.17733723 -0.11567159  0.2066434 -0.3181193
##            PC19        PC20         PC21        PC22         PC23
## 68   0.41757059 -0.57180456  0.362900223  0.13162007  0.003875525
## 167  0.07062134 -0.35746537 -0.128668049 -0.07550253  0.034094705
## 129  0.67130324 -0.40420446 -0.429534775 -0.38964109 -0.665138140
## 162 -0.80717947  0.09591556 -1.051969370 -0.75010046 -0.482219411
## 43  -0.28282620 -0.35634619 -0.227045648 -0.54371901  0.309271200
## 14  -0.17667712 -0.04335860 -0.005298769 -0.08699008 -0.131140976
##             PC24        PC25        PC26        PC27        PC28
## 68   0.185311922 -0.32654577 -0.25914581 -0.12776065 -0.13632314
## 167 -0.009135711  0.33333576  0.14606989  0.20440681  0.11193984
## 129  0.207134743 -0.36289397 -0.36796969 -0.62659512  0.28814981
## 162  0.163056091  0.47359407 -0.28859398  0.02014161 -0.75680426
## 43   0.153347184  0.04225617 -0.17405453 -0.22220531 -0.06159581
## 14   0.225202744 -0.11202381 -0.04091147  0.05660232 -0.01148325
##            PC29        PC30        PC31         PC32       PC33
## 68   0.09092058  0.17433000  0.21207672  0.023361127 0.02178616
## 167 -0.19620689  0.13932269  0.02814292  0.181911274 0.25050307
## 129 -0.49861228 -0.03663898 -0.21306284  0.051909498 0.08763653
## 162  0.54149325 -0.21865603  0.14615596  0.216909429 0.44371978
## 43   0.05219967 -0.20314112 -0.04733446 -0.097082597 0.08989949
## 14   0.03723305 -0.03124671 -0.01583253  0.004310976 0.12110513
##             PC34         PC35        PC36         PC37          PC38
## 68  -0.051397134  0.053920964  0.07812843 -0.035183298  0.1295634763
## 167  0.452066641 -0.044172252 -0.15487503  0.243716888  0.0931599076
## 129 -0.248445076 -0.106630694  0.01893183 -0.113123059 -0.0372000835
## 162  0.207782423  0.314142855  0.06067841  0.401655432 -0.2107595783
## 43   0.006576239  0.031901045 -0.16872989  0.017669834 -0.1670168772
## 14  -0.001670217 -0.009402096 -0.05139885  0.005162092  0.0009918929
##            PC39        PC40         PC41         PC42        PC43
## 68   0.01623038  0.08257751  0.028400291 -0.009031755  0.01970152
## 167 -0.01238713 -0.13661605  0.052926715 -0.124593234  0.17341487
## 129  0.13849899  0.12908410  0.095942637  0.021991969  0.24009812
## 162  0.03041638  0.19675623  0.261745098  0.143272794 -0.01013033
## 43  -0.02388117  0.05639081 -0.037846821  0.052624695 -0.13495473
## 14   0.02633626 -0.06628373 -0.003393285 -0.013377218 -0.01985946
##              PC44        PC45         PC46         PC47        PC48
## 68   0.0033521843 -0.03749666  0.038289147 -0.024556551 -0.05361309
## 167 -0.0651303451 -0.01818043 -0.139926729 -0.005122791 -0.07706994
## 129  0.0263172523 -0.04345311  0.128703831 -0.017007282 -0.14981586
## 162  0.0233832761  0.39793099  0.196991816 -0.103414318  0.14139603
## 43  -0.1111785230  0.07883866 -0.077827430 -0.067599260  0.06722406
## 14   0.0006843765  0.01406667 -0.006444003 -0.021034721  0.04764730
##             PC49         PC50         PC51         PC52         PC53
## 68  -0.053705774 -0.038328325  0.011451309 -0.005623588 -0.053125593
## 167  0.029167150 -0.027777368  0.031618514  0.011142500  0.015185277
## 129 -0.150516245  0.031539213 -0.089553873 -0.048958280 -0.069717344
## 162 -0.168703325  0.167553815 -0.072958706  0.102416013  0.048393311
## 43   0.060653177 -0.025027865  0.005867316  0.025421316  0.053846789
## 14   0.004883426 -0.000954366 -0.011922953  0.011424081 -0.005920278
##             PC54    class
## 68   0.013772926 divorced
## 167 -0.039490820  married
## 129 -0.001300610  married
## 162 -0.006493680  married
## 43  -0.026524123 divorced
## 14  -0.001586914 divorced

Visualize Variance Explained by Principal Components
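The proportions in the PCA summary can be reproduced from the component standard deviations: each proportion is sd² divided by the total variance, which equals 54 when the 54 features are standardized (the printed numbers are consistent with that). The Python sketch below checks this for PC1 to PC10 and finds how many components reach a 90% target; the 90% cutoff and the helper names are illustrative assumptions.

```python
from itertools import accumulate

# Standard deviations of PC1-PC10, copied from the PCA summary output.
sdev = [6.3107, 1.43265, 1.28558, 1.10536, 0.95632, 0.8940,
        0.87436, 0.78410, 0.76637, 0.72687]
N_FEATURES = 54  # standardized inputs, so the total variance equals 54


def variance_explained(sdev, total):
    """Proportion and cumulative proportion of variance for each component."""
    prop = [s ** 2 / total for s in sdev]
    return prop, list(accumulate(prop))


def n_components_for(cum, target):
    """Smallest number of components whose cumulative proportion reaches target."""
    for k, c in enumerate(cum, start=1):
        if c >= target:
            return k
    return None


prop, cum = variance_explained(sdev, N_FEATURES)
print(round(prop[0], 4), round(cum[5], 4), n_components_for(cum, 0.90))
# 0.7375 0.8605 10
```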

Effectiveness of First Principal Components in Separating Out Data Points
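In the six rows of PCA scores printed earlier, every divorced respondent has a negative PC1 score and every married respondent a positive one. The Python sketch below applies that sign rule to those six rows only; it illustrates why PC1 separates the classes so well, but it is not an evaluation. `classify_by_pc1` is a hypothetical rule, and because the sign of a principal component is arbitrary, its direction would have to be re-checked after any refit.

```python
# PC1 scores and labels for the six rows printed in the PCA preview.
preview = {
    68: (-6.259996, "divorced"),
    167: (5.161627, "married"),
    129: (5.485848, "married"),
    162: (4.846028, "married"),
    43: (-7.480399, "divorced"),
    14: (-8.995516, "divorced"),
}


def classify_by_pc1(score, threshold=0.0):
    """Hypothetical rule: negative PC1 -> divorced, otherwise married."""
    return "divorced" if score < threshold else "married"


correct = sum(classify_by_pc1(s) == lab for s, lab in preview.values())
print(f"{correct}/{len(preview)} preview rows separated by the sign of PC1")
```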

Biplot of Principal Components

Applying a Machine Learning Method: Logistic Regression on Principal Components

## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction divorced married
##   divorced       56       0
##   married         0      62
##                                      
##                Accuracy : 1          
##                  95% CI : (0.9692, 1)
##     No Information Rate : 0.5254     
##     P-Value [Acc > NIR] : < 2.2e-16  
##                                      
##                   Kappa : 1          
##                                      
##  Mcnemar's Test P-Value : NA         
##                                      
##             Sensitivity : 1.0000     
##             Specificity : 1.0000     
##          Pos Pred Value : 1.0000     
##          Neg Pred Value : 1.0000     
##              Prevalence : 0.4746     
##          Detection Rate : 0.4746     
##    Detection Prevalence : 0.4746     
##       Balanced Accuracy : 1.0000     
##                                      
##        'Positive' Class : divorced   
## 
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction divorced married
##   divorced       28       0
##   married         0      24
##                                      
##                Accuracy : 1          
##                  95% CI : (0.9315, 1)
##     No Information Rate : 0.5385     
##     P-Value [Acc > NIR] : 1.047e-14  
##                                      
##                   Kappa : 1          
##                                      
##  Mcnemar's Test P-Value : NA         
##                                      
##             Sensitivity : 1.0000     
##             Specificity : 1.0000     
##          Pos Pred Value : 1.0000     
##          Neg Pred Value : 1.0000     
##              Prevalence : 0.5385     
##          Detection Rate : 0.5385     
##    Detection Prevalence : 0.5385     
##       Balanced Accuracy : 1.0000     
##                                      
##        'Positive' Class : divorced   
## 
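caret's `confusionMatrix()` reports an exact binomial confidence interval for accuracy and a one-sided binomial test of accuracy against the no-information rate (NIR). When all n predictions are correct, both have closed forms: the lower bound of the 95% Clopper-Pearson interval is 0.025^(1/n), and the p-value is NIR^n. The Python sketch below reproduces the printed values; `perfect_accuracy_stats` is an illustrative name. It also shows that a perfect score on 52 test cases only bounds the true accuracy below at about 93%.

```python
def perfect_accuracy_stats(n, nir, alpha=0.05):
    """For a classifier that got all n cases right, return
    (lower bound of the two-sided exact Clopper-Pearson CI, one-sided p-value).
    The lower bound is (alpha/2)**(1/n); the p-value is P(X >= n | p = nir) = nir**n."""
    lower = (alpha / 2) ** (1 / n)
    p_value = nir ** n
    return lower, p_value


train_lower, _ = perfect_accuracy_stats(118, 62 / 118)
test_lower, test_p = perfect_accuracy_stats(52, 28 / 52)
print(round(train_lower, 4), round(test_lower, 4), f"{test_p:.3e}")
# 0.9692 0.9315 1.047e-14
```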

Logistic Regression with Original Variables

## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction divorced married
##   divorced       56       0
##   married         0      62
##                                      
##                Accuracy : 1          
##                  95% CI : (0.9692, 1)
##     No Information Rate : 0.5254     
##     P-Value [Acc > NIR] : < 2.2e-16  
##                                      
##                   Kappa : 1          
##                                      
##  Mcnemar's Test P-Value : NA         
##                                      
##             Sensitivity : 1.0000     
##             Specificity : 1.0000     
##          Pos Pred Value : 1.0000     
##          Neg Pred Value : 1.0000     
##              Prevalence : 0.4746     
##          Detection Rate : 0.4746     
##    Detection Prevalence : 0.4746     
##       Balanced Accuracy : 1.0000     
##                                      
##        'Positive' Class : divorced   
## 
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction divorced married
##   divorced       27       0
##   married         1      24
##                                           
##                Accuracy : 0.9808          
##                  95% CI : (0.8974, 0.9995)
##     No Information Rate : 0.5385          
##     P-Value [Acc > NIR] : 4.772e-13       
##                                           
##                   Kappa : 0.9614          
##                                           
##  Mcnemar's Test P-Value : 1               
##                                           
##             Sensitivity : 0.9643          
##             Specificity : 1.0000          
##          Pos Pred Value : 1.0000          
##          Neg Pred Value : 0.9600          
##              Prevalence : 0.5385          
##          Detection Rate : 0.5192          
##    Detection Prevalence : 0.5192          
##       Balanced Accuracy : 0.9821          
##                                           
##        'Positive' Class : divorced        
## 
## 
## Call:
## glm(formula = class ~ ., family = "binomial", data = train)
## 
## Deviance Residuals: 
##        Min          1Q      Median          3Q         Max  
## -9.537e-06  -1.261e-06   2.110e-08   2.078e-06   5.527e-06  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept)  2.563e+01  4.034e+05       0        1
## atr1        -1.403e+00  2.249e+05       0        1
## atr2        -1.195e+00  2.309e+05       0        1
## atr3         4.305e-01  1.604e+05       0        1
## atr4        -5.261e+00  1.711e+05       0        1
## atr5         1.627e+00  2.884e+05       0        1
## atr6        -4.401e+00  1.439e+05       0        1
## atr7        -2.183e+00  1.921e+05       0        1
## atr8        -5.297e+00  2.581e+05       0        1
## atr9         2.317e+00  3.710e+05       0        1
## atr10        1.507e+00  3.766e+05       0        1
## atr11       -8.501e+00  3.639e+05       0        1
## atr12        4.628e+00  4.715e+05       0        1
## atr13        3.578e+00  2.585e+05       0        1
## atr14       -1.112e+01  3.138e+05       0        1
## atr15       -6.412e+00  2.695e+05       0        1
## atr16        3.631e-01  5.054e+05       0        1
## atr17       -6.988e+00  3.205e+05       0        1
## atr18        1.314e+01  3.851e+05       0        1
## atr19        8.184e+00  3.069e+05       0        1
## atr20        1.408e+00  5.364e+05       0        1
## atr21       -5.290e-01  2.751e+05       0        1
## atr22        2.892e+00  3.322e+05       0        1
## atr23        2.705e+00  2.428e+05       0        1
## atr24        6.190e+00  2.335e+05       0        1
## atr25        1.645e+00  3.054e+05       0        1
## atr26       -8.749e+00  2.998e+05       0        1
## atr27        2.803e+00  2.106e+05       0        1
## atr28       -2.371e-02  2.487e+05       0        1
## atr29       -2.999e+00  4.093e+05       0        1
## atr30       -3.976e+00  2.632e+05       0        1
## atr31       -5.472e-01  1.210e+05       0        1
## atr32       -2.859e+00  2.362e+05       0        1
## atr33        3.807e+00  2.465e+05       0        1
## atr34        2.421e+00  2.068e+05       0        1
## atr35       -4.430e+00  2.733e+05       0        1
## atr36        1.568e-01  3.224e+05       0        1
## atr37       -1.456e+00  1.599e+05       0        1
## atr38        3.244e+00  2.700e+05       0        1
## atr39        3.553e+00  2.437e+05       0        1
## atr40       -1.212e+01  2.284e+05       0        1
## atr41       -1.311e+00  1.566e+05       0        1
## atr42        2.774e-01  1.770e+05       0        1
## atr43        3.273e+00  1.009e+05       0        1
## atr44       -1.810e+00  1.341e+05       0        1
## atr45       -1.770e+00  1.375e+05       0        1
## atr46        2.583e+00  1.118e+05       0        1
## atr47       -3.112e+00  1.025e+05       0        1
## atr48        6.350e+00  1.609e+05       0        1
## atr49       -4.755e+00  1.984e+05       0        1
## atr50        1.734e+00  1.628e+05       0        1
## atr51        8.476e-01  1.343e+05       0        1
## atr52       -2.961e+00  1.351e+05       0        1
## atr53       -1.034e+00  1.839e+05       0        1
## atr54        4.666e+00  2.271e+05       0        1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1.6328e+02  on 117  degrees of freedom
## Residual deviance: 8.8385e-10  on  63  degrees of freedom
## AIC: 110
## 
## Number of Fisher Scoring iterations: 25
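The warnings and the summary above (residual deviance of about 8.8e-10, standard errors around 1e5, every p-value equal to 1) are the usual signature of complete separation: some combination of the 54 items splits the 118 training rows perfectly, so the likelihood keeps rising as the coefficients grow and no finite maximum-likelihood estimate exists. The coefficients therefore cannot be interpreted, even though the predictions are accurate. The Python sketch below demonstrates the effect on hypothetical, perfectly separated one-dimensional data; `log_likelihood` is an illustrative name, and penalized fits (for example ridge or lasso) are one common remedy.

```python
import math


def log_likelihood(beta, xs, ys):
    """Bernoulli log-likelihood of a no-intercept logistic model p = sigmoid(beta * x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = beta * x
        # log(sigmoid(z)) = -log1p(exp(-z)); log(1 - sigmoid(z)) = -log1p(exp(z))
        total += -math.log1p(math.exp(-z)) if y == 1 else -math.log1p(math.exp(z))
    return total


# Hypothetical, perfectly separated data: every negative x is class 0, every positive x class 1.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]
lls = [log_likelihood(beta, xs, ys) for beta in (1, 5, 25)]
for beta, ll in zip((1, 5, 25), lls):
    print(beta, round(ll, 6))  # climbs toward 0 as beta grows, never peaking
```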

Decision Tree with Original Variables

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction divorced married
##   divorced       56       0
##   married         0      62
##                                      
##                Accuracy : 1          
##                  95% CI : (0.9692, 1)
##     No Information Rate : 0.5254     
##     P-Value [Acc > NIR] : < 2.2e-16  
##                                      
##                   Kappa : 1          
##                                      
##  Mcnemar's Test P-Value : NA         
##                                      
##             Sensitivity : 1.0000     
##             Specificity : 1.0000     
##          Pos Pred Value : 1.0000     
##          Neg Pred Value : 1.0000     
##              Prevalence : 0.4746     
##          Detection Rate : 0.4746     
##    Detection Prevalence : 0.4746     
##       Balanced Accuracy : 1.0000     
##                                      
##        'Positive' Class : divorced   
## 
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction divorced married
##   divorced       28       1
##   married         0      23
##                                           
##                Accuracy : 0.9808          
##                  95% CI : (0.8974, 0.9995)
##     No Information Rate : 0.5385          
##     P-Value [Acc > NIR] : 4.772e-13       
##                                           
##                   Kappa : 0.9612          
##                                           
##  Mcnemar's Test P-Value : 1               
##                                           
##             Sensitivity : 1.0000          
##             Specificity : 0.9583          
##          Pos Pred Value : 0.9655          
##          Neg Pred Value : 1.0000          
##              Prevalence : 0.5385          
##          Detection Rate : 0.5385          
##    Detection Prevalence : 0.5577          
##       Balanced Accuracy : 0.9792          
##                                           
##        'Positive' Class : divorced        
## 
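Every statistic in these confusion-matrix printouts follows from the four cell counts. The Python sketch below recomputes them for the decision tree's test set (28 divorced correct, 1 married predicted as divorced, 23 married correct), with `divorced` as the positive class as in the output. `confusion_metrics` is an illustrative name, and kappa is Cohen's kappa: observed agreement corrected for the agreement expected by chance.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Summary statistics for a 2x2 confusion matrix with 'divorced' as the positive class.
    tp: divorced predicted divorced, fn: divorced predicted married,
    fp: married predicted divorced,  tn: married predicted married."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    # Chance agreement: products of predicted and actual class totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy, "sensitivity": sensitivity, "specificity": specificity,
        "ppv": ppv, "npv": npv, "kappa": kappa,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


m = confusion_metrics(tp=28, fn=0, fp=1, tn=23)
print({k: round(v, 4) for k, v in m.items()})
```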

Radar Plot: Divorced vs. Married

## # A tibble: 2 x 55
##   class  atr1  atr2  atr3  atr4  atr5  atr6   atr7  atr8   atr9 atr10 atr11
##   <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1 divo… 3.19  2.87  2.92  2.73  3.01  1.13  0.988  2.81  2.89   2.77  3.21 
## 2 marr… 0.395 0.465 0.640 0.267 0.105 0.372 0.0116 0.128 0.0581 0.407 0.198
## # … with 43 more variables: atr12 <dbl>, atr13 <dbl>, atr14 <dbl>,
## #   atr15 <dbl>, atr16 <dbl>, atr17 <dbl>, atr18 <dbl>, atr19 <dbl>,
## #   atr20 <dbl>, atr21 <dbl>, atr22 <dbl>, atr23 <dbl>, atr24 <dbl>,
## #   atr25 <dbl>, atr26 <dbl>, atr27 <dbl>, atr28 <dbl>, atr29 <dbl>,
## #   atr30 <dbl>, atr31 <dbl>, atr32 <dbl>, atr33 <dbl>, atr34 <dbl>,
## #   atr35 <dbl>, atr36 <dbl>, atr37 <dbl>, atr38 <dbl>, atr39 <dbl>,
## #   atr40 <dbl>, atr41 <dbl>, atr42 <dbl>, atr43 <dbl>, atr44 <dbl>,
## #   atr45 <dbl>, atr46 <dbl>, atr47 <dbl>, atr48 <dbl>, atr49 <dbl>,
## #   atr50 <dbl>, atr51 <dbl>, atr52 <dbl>, atr53 <dbl>, atr54 <dbl>
##   atr1 atr2 atr3 atr4 atr5 atr6 atr7 atr8 atr9 atr10 atr11 atr12 atr13
## 1    4    4    4    4    4    4    4    4    4     4     4     4     4
##   atr14 atr15 atr16 atr17 atr18 atr19 atr20 atr21 atr22 atr23 atr24 atr25
## 1     4     4     4     4     4     4     4     4     4     4     4     4
##   atr26 atr27 atr28 atr29 atr30 atr31 atr32 atr33 atr34 atr35 atr36 atr37
## 1     4     4     4     4     4     4     4     4     4     4     4     4
##   atr38 atr39 atr40 atr41 atr42 atr43 atr44 atr45 atr46 atr47 atr48 atr49
## 1     4     4     4     4     4     4     4     4     4     4     4     4
##   atr50 atr51 atr52 atr53 atr54
## 1     4     4     4     4     4
##               atr1      atr2      atr3      atr4      atr5     atr6
## 1        4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.000000
## 2        0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.000000
## divorced 3.1904762 2.8690476 2.9166667 2.7261905 3.0119048 1.130952
## married  0.3953488 0.4651163 0.6395349 0.2674419 0.1046512 0.372093
##                atr7     atr8       atr9     atr10     atr11     atr12
## 1        4.00000000 4.000000 4.00000000 4.0000000 4.0000000 4.0000000
## 2        0.00000000 0.000000 0.00000000 0.0000000 0.0000000 0.0000000
## divorced 0.98809524 2.809524 2.89285714 2.7738095 3.2142857 2.9404762
## married  0.01162791 0.127907 0.05813953 0.4069767 0.1976744 0.3953488
##              atr13     atr14     atr15     atr16     atr17      atr18
## 1        4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.00000000
## 2        0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.00000000
## divorced 3.0952381 2.8809524 2.9404762 2.8214286 3.1666667 2.97619048
## married  0.6046512 0.2906977 0.2325581 0.1627907 0.1744186 0.09302326
##              atr19      atr20     atr21      atr22      atr23     atr24
## 1        4.0000000 4.00000000 4.0000000 4.00000000 4.00000000 4.0000000
## 2        0.0000000 0.00000000 0.0000000 0.00000000 0.00000000 0.0000000
## divorced 3.1785714 2.88095238 2.6547619 2.45238095 2.77380952 2.7857143
## married  0.1395349 0.06976744 0.1511628 0.06976744 0.08139535 0.2674419
##              atr25     atr26     atr27      atr28      atr29     atr30
## 1        4.0000000 4.0000000 4.0000000 4.00000000 4.00000000 4.0000000
## 2        0.0000000 0.0000000 0.0000000 0.00000000 0.00000000 0.0000000
## divorced 2.9523810 2.8095238 2.6785714 2.55952381 2.92857143 2.8214286
## married  0.3372093 0.1976744 0.1511628 0.08139535 0.09302326 0.1976744
##              atr31     atr32     atr33     atr34     atr35      atr36
## 1        4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.00000000
## 2        0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.00000000
## divorced 3.4404762 3.4166667 3.3571429 3.2738095 3.2738095 3.21428571
## married  0.8372093 0.7325581 0.2906977 0.5581395 0.1046512 0.03488372
##             atr37     atr38     atr39     atr40     atr41    atr42
## 1        4.000000 4.0000000 4.0000000 4.0000000 4.0000000 4.000000
## 2        0.000000 0.0000000 0.0000000 0.0000000 0.0000000 0.000000
## divorced 3.583333 3.4047619 3.6428571 3.5714286 3.5476190 3.333333
## married  0.627907 0.3488372 0.5697674 0.2093023 0.4767442 1.011628
##             atr43     atr44    atr45    atr46    atr47    atr48    atr49
## 1        4.000000 4.0000000 4.000000 4.000000 4.000000 4.000000 4.000000
## 2        0.000000 0.0000000 0.000000 0.000000 0.000000 0.000000 0.000000
## divorced 3.476190 3.3809524 3.285714 3.166667 3.321429 3.452381 3.511905
## married  1.953488 0.5348837 1.651163 1.953488 1.244186 2.046512 1.279070
##             atr50    atr51    atr52    atr53     atr54
## 1        4.000000 4.000000 4.000000 4.000000 4.0000000
## 2        0.000000 0.000000 0.000000 0.000000 0.0000000
## divorced 3.500000 3.357143 3.488095 3.321429 3.3690476
## married  1.383721 1.616279 1.569767 1.186047 0.6860465
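The final table has the layout that `fmsb::radarchart()` expects. Row 1 holds each axis maximum, which is 4, the top of the response scale. Row 2 holds the minimum, which is 0, and the remaining rows hold the per-class means. The following is a minimal base-R sketch of building such a table. The data are randomly generated toy values, not the survey responses. Using `aggregate()` and the `fmsb` package is an assumption about how the original chart was produced.

```r
# Toy stand-in for the divorce data (hypothetical values): a `class` factor plus
# items scored 0-4. The real data has atr1..atr54; five items keep this short.
set.seed(1)
items <- paste0("atr", 1:5)
df <- data.frame(class = factor(rep(c("divorced", "married"), each = 5)))
for (it in items) df[[it]] <- sample(0:4, nrow(df), replace = TRUE)

# Mean response per class: one row per level of `class`.
means <- aggregate(. ~ class, data = df, FUN = mean)
rownames(means) <- as.character(means$class)
means$class <- NULL

# radarchart() reads the first row as each axis's maximum and the second as its
# minimum, so prepend the scale limits (4 and 0) before the class means.
limits <- as.data.frame(matrix(c(4, 0), nrow = 2, ncol = length(items),
                               dimnames = list(NULL, items)))
radar_df <- rbind(limits, means)
print(radar_df)

# Draw the chart only if the fmsb package is installed.
if (requireNamespace("fmsb", quietly = TRUE)) {
  cols <- c("firebrick", "steelblue")
  fmsb::radarchart(radar_df, pcol = cols, plty = 1, title = "Divorced vs. Married")
  legend("topright", legend = rownames(means), col = cols, lty = 1, bty = "n")
}
```

Fixing the axis limits to the full 0 to 4 scale, rather than to the observed range, keeps the two polygons directly comparable. In the output above, the divorced group's means sit near 3 on most items while the married group's sit near 0. This makes the separation visible at a glance.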