Notes:
-The RIDIT transformation converts ordinal categorical responses into
numerical scores between -1 and 1.
-PCA helps reduce dimensionality and detect patterns in RIDIT scores
or other numeric indicators.
-The plotellipses function (FactoMineR) draws confidence ellipses
around the categories of qualitative variables on the PCA individuals map.
-This R Markdown file produces interactive visualizations and
reports results in an organized document.
-The first few principal components typically explain most of the
variance.
-Using ‘quali.sup’ in PCA allows visualization of categorical
outcomes without including them in variance computation.
-Scree plots help determine the number of principal components to
retain.
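The RIDIT note above can be made concrete with a small sketch. The source does not state which RIDIT variant produced the RIDIT_01 to RIDIT_20 columns, so the formula below (the proportion of responses in lower categories minus the proportion in higher categories) is an assumption, and `ridit_score` is a hypothetical helper name rather than a function from any of the loaded packages. The responses are made up for illustration.

```r
# Hypothetical helper (assumed formula, not confirmed by the source):
# for category i with observed proportion p_i, the score is
#   B_i = sum_{j < i} p_j - sum_{j > i} p_j,
# which always falls in [-1, 1].
ridit_score <- function(x) {
  x <- as.integer(x)
  lv <- sort(unique(x[!is.na(x)]))              # observed categories, ordered
  p <- as.numeric(table(factor(x, levels = lv))) / sum(!is.na(x))
  below <- cumsum(p) - p                        # share in lower categories
  above <- 1 - cumsum(p)                        # share in higher categories
  scores <- below - above
  scores[match(x, lv)]                          # missing responses stay NA
}

# Toy responses (illustrative only, not the claims data)
responses <- c(1, 1, 2, 3, 3, 3, 4, 5, NA, 1)
r <- ridit_score(responses)
round(r, 3)
```

Under this formula the scores average to zero across respondents, so a positive score marks a response in the upper part of that variable's distribution and a negative score a response in the lower part.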
1. Load Required Libraries
library(psych)
library(corrplot)
library(Hmisc)
library(tidyverse)
library(GGally)
library(factoextra)
library(FactoMineR)
library(emuR)
2. Load Dataset
filename <- "/Users/rameshbabuparamkusham/Documents/DAT 610/DAT 610 Auto Accident Personal Injury Claims.csv"
data <- read.csv(filename, header = TRUE)
head(data)
## Claim_Number Policy_ID CLAIM_AMOUNT PAID_AMOUNT CLAIM_SUSPICION_SCORE IND_01
## 1 NA NA NA NA
## 2 5001463 364697 $13,463 $3,646 3 1
## 3 NA NA NA NA
## 4 5004844 426960 $1,246 $594 3 1
## 5 NA NA NA NA
## 6 5005493 426313 $19,883 $15,138 3 1
## IND_02 IND_03 IND_04 IND_05 IND_06 IND_07 IND_08 IND_09 IND_10 IND_11 IND_12
## 1 NA NA NA NA NA NA NA NA NA NA NA
## 2 1 1 4 5 3 3 1 2 2 1 3
## 3 NA NA NA NA NA NA NA NA NA NA NA
## 4 2 1 4 1 1 5 1 2 1 1 5
## 5 NA NA NA NA NA NA NA NA NA NA NA
## 6 1 4 1 1 1 1 2 3 5 4 5
## IND_13 IND_14 IND_15 IND_16 IND_17 IND_18 IND_19 IND_20 RIDIT_01 RIDIT_02
## 1 NA NA NA NA NA NA NA NA NA NA
## 2 5 2 1 2 4 1 2 3 -0.5039841 -0.5059761
## 3 NA NA NA NA NA NA NA NA NA NA
## 4 1 2 2 1 1 5 1 2 -0.5039841 0.2290837
## 5 NA NA NA NA NA NA NA NA NA NA
## 6 1 1 4 1 1 1 3 1 -0.5039841 -0.5059761
## RIDIT_03 RIDIT_04 RIDIT_05 RIDIT_06 RIDIT_07 RIDIT_08 RIDIT_09
## 1 NA NA NA NA NA NA NA
## 2 -0.4701195 0.7888446 0.8964143 0.6513944 0.5916335 -0.5039841 0.2310757
## 3 NA NA NA NA NA NA NA
## 4 -0.4701195 0.7888446 -0.4980080 -0.4920319 0.9143426 -0.5039841 0.2310757
## 5 NA NA NA NA NA NA NA
## 6 0.7988048 -0.4920319 -0.4980080 -0.4920319 -0.5278884 0.2649402 0.6055777
## RIDIT_10 RIDIT_11 RIDIT_12 RIDIT_13 RIDIT_14 RIDIT_15 RIDIT_16
## 1 NA NA NA NA NA NA NA
## 2 0.3147410 -0.5139442 0.6354582 0.9302789 0.2370518 -0.4920319 0.2629482
## 3 NA NA NA NA NA NA NA
## 4 -0.4561753 -0.5139442 0.9203187 -0.4960159 0.2370518 0.2569721 -0.5000000
## 5 NA NA NA NA NA NA NA
## 6 0.9163347 0.7430279 0.9203187 -0.4960159 -0.5139442 0.7968127 -0.5000000
## RIDIT_17 RIDIT_18 RIDIT_19 RIDIT_20
## 1 NA NA NA NA
## 2 0.7549801 -0.4980080 0.3007968 0.6235060
## 3 NA NA NA NA
## 4 -0.5079681 0.8984064 -0.4780876 0.2450199
## 5 NA NA NA NA
## 6 -0.5079681 -0.4980080 0.6533865 -0.5079681
head(data$PAID_AMOUNT)
## [1] "" "$3,646 " "" "$594 " "" "$15,138 "
auto <- data
myData <- auto[1:502, 6:25]
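The head() output above shows two things worth handling before analysis: alternating rows that are entirely missing, and PAID_AMOUNT stored as text such as "$3,646 ". The following is a minimal sketch of one way to clean both, run on a toy frame copied from the first six rows shown above. `parse_currency` and `toy` are hypothetical names introduced here, not part of the original workflow.

```r
# Toy frame mimicking the pattern in the head() output
# (values copied from the first six rows; most columns omitted).
toy <- data.frame(
  Claim_Number = c(NA, 5001463, NA, 5004844, NA, 5005493),
  PAID_AMOUNT  = c("", "$3,646 ", "", "$594 ", "", "$15,138 "),
  IND_01       = c(NA, 1, NA, 1, NA, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: strip "$", thousands separators, and whitespace,
# then convert to numeric. Empty strings become NA.
parse_currency <- function(x) {
  cleaned <- gsub("[$,[:space:]]", "", x)
  cleaned[cleaned == ""] <- NA
  as.numeric(cleaned)
}

toy$PAID_AMOUNT <- parse_currency(toy$PAID_AMOUNT)

# Drop rows in which every column is missing.
toy_clean <- toy[rowSums(!is.na(toy)) > 0, ]
toy_clean$PAID_AMOUNT
```

If the blank rows seen in head() recur throughout the file, a positional slice such as auto[1:502, 6:25] will include them, so filtering out empty rows first and then selecting columns may be the safer order.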
4. Scatterplot of IND_01 vs RIDIT
attach(auto)
# IND_01 and RIDIT_01 are columns 6 and 26; index by name to avoid off-by-one errors
plot(auto$IND_01, auto$RIDIT_01,
main = "Scatterplot of IND_01 vs RIDIT",
xlab = "IND_01",
ylab = "RIDIT_01",
pch = 19)

6. PCA using FactoMineR
# PCA on the 20 original indicator columns (IND_01 to IND_20), standardized to unit variance
res.pca <- PCA(auto[,6:25], scale.unit = TRUE, ncp = 5, graph = TRUE)
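To decide how many components to retain, the eigenvalues and cumulative variance can be tabulated and drawn as a scree plot. The sketch below uses base R `prcomp` on simulated data instead of FactoMineR, so that it runs without extra packages. The simulated columns and the names `sim`, `fit`, and `summary_tbl` are assumptions for illustration only and do not reproduce the claims data.

```r
# Simulated stand-in for the indicator columns (not the claims data).
set.seed(610)
n <- 200
latent <- rnorm(n)
sim <- as.data.frame(
  sapply(1:6, function(j) latent * 0.8 + rnorm(n, sd = 0.6))
)
names(sim) <- sprintf("IND_%02d", 1:6)

# scale. = TRUE standardizes the variables, as scale.unit = TRUE does in PCA().
fit <- prcomp(sim, center = TRUE, scale. = TRUE)

eig <- fit$sdev^2                    # eigenvalues of the correlation matrix
pct <- 100 * eig / sum(eig)          # percent of variance per component
summary_tbl <- data.frame(
  component  = seq_along(eig),
  eigenvalue = eig,
  pct_var    = pct,
  cum_pct    = cumsum(pct)
)
print(round(summary_tbl, 2))

# Scree plot: look for the "elbow" where the curve flattens.
plot(summary_tbl$component, summary_tbl$eigenvalue, type = "b",
     xlab = "Principal component", ylab = "Eigenvalue", main = "Scree plot")
abline(h = 1, lty = 2)               # reference line for the eigenvalue > 1 rule
```

For a FactoMineR fit, the same table is stored in res.pca$eig, and fviz_eig(res.pca) from factoextra draws the corresponding scree plot.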


7. PCA with Qualitative Supplementary Variable