Data
This time we are going to use a typical credit scoring dataset with
predefined “default” variables and personal demographic and income data.
Please take a closer look at the headers and descriptions of each
variable.
Scatterplots
First, let’s visualize the relationships between our quantitative
variables using scatterplots.

Normalizing the skewed distribution of incomes using a log
transformation, then plotting
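A minimal sketch of why the log transformation helps, assuming simulated log-normal incomes rather than the credit scoring data (the variable names and the helper `skewness()` are hypothetical, and only base R is used):

```r
# Hypothetical right-skewed incomes (simulated, not the course data)
set.seed(1)
income <- rlnorm(1000, meanlog = 10, sdlog = 0.8)

# Simple moment-based skewness: mean of the cubed z-scores
skewness <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

log_income <- log(income)

skew_raw <- skewness(income)      # strongly positive for skewed incomes
skew_log <- skewness(log_income)  # close to zero after the transformation

print(c(raw = skew_raw, log = skew_log))
```

Calling `hist(log_income)` afterwards would show the roughly symmetric shape of the transformed variable.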

Estimated Density Plots
To see more closely whether there are any differences between those two
distributions, we add their estimated density plots.



Plots Together:

Giving Density Curves to Scatterplots

## `geom_smooth()` using formula = 'y ~ x'

Correlation coefficients - partial correlation:
## [1] -0.02677729
Percentage of the explained variability (squared coefficient × 100):
## [1] 0.07170234
Difference between that one and the semi-partial (S-P) coefficient:
## [1] 0.6018467
How can we interpret the obtained partial correlation coefficient?
What is the difference between it and the semi-partial coefficient?
## [1] 0.6018467
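The source does not show how these coefficients were computed. As a rough sketch on simulated data (the variables `x`, `y`, `z` are hypothetical), the partial correlation removes the control variable from both variables, while the semi-partial correlation removes it from only one of them:

```r
set.seed(42)
n <- 200
z <- rnorm(n)                      # control variable (hypothetical)
x <- 0.7 * z + rnorm(n)
y <- 0.5 * z + 0.3 * x + rnorm(n)

res_x <- resid(lm(x ~ z))          # part of x not explained by z
res_y <- resid(lm(y ~ z))          # part of y not explained by z

partial      <- cor(res_y, res_x)  # z removed from both x and y
semi_partial <- cor(y, res_x)      # z removed from x only

# The same quantities from the pairwise Pearson correlations
r_xy <- cor(x, y)
r_xz <- cor(x, z)
r_yz <- cor(y, z)
partial_formula      <- (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
semi_partial_formula <- (r_xy - r_xz * r_yz) / sqrt(1 - r_xz^2)

print(c(partial = partial, semi_partial = semi_partial))
```

Because the partial coefficient divides by an extra factor that is at most 1, its absolute value is never smaller than that of the semi-partial coefficient.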
Rank correlation
When two variables are measured on different scales, for example
income (quantitative) vs. education level (ordinal), we cannot use
Pearson’s coefficient. The only possibility is to rank incomes as well,
and lose some of the more detailed information about them.
First, let’s see boxplots of income by education levels.

Kendall’s coefficient of rank correlation (robust to ties):
## [1] -0.01224209
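A small sketch of a rank correlation between an ordinal and a quantitative variable, using made-up values (the vectors below are hypothetical and are not taken from the credit scoring data):

```r
# Hypothetical data: ordinal education level (1 = lowest) and income
education <- c(1, 1, 2, 2, 3, 3, 4, 4)
income    <- c(2100, 2500, 2300, 3100, 2900, 3600, 3400, 5200)

# Kendall's tau works on the ordering of the values only;
# tied education levels are allowed
tau <- cor(income, education, method = "kendall")

# Spearman's rho, another rank-based coefficient, for comparison
rho <- cor(income, education, method = "spearman")

print(c(kendall = tau, spearman = rho))
```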
Point-biserial correlation

Comparing QVar and DVar
## [1] -0.07096966
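As an illustration on simulated data (the `default` and `income` vectors are hypothetical), the point-biserial coefficient is just Pearson's correlation between a quantitative variable and a 0/1 variable, and it can also be written in terms of the two group means:

```r
set.seed(7)
default <- rbinom(300, 1, 0.3)    # hypothetical dichotomous (0/1) variable
income  <- rnorm(300, mean = 3000 - 400 * default, sd = 800)

# Pearson's r with a 0/1 variable is the point-biserial coefficient
r_pb <- cor(income, default)

# Equivalent form based on the difference of group means
m1 <- mean(income[default == 1])
m0 <- mean(income[default == 0])
p  <- mean(default)                               # share of ones
s_pop <- sqrt(mean((income - mean(income))^2))    # SD with divisor n
r_pb_formula <- (m1 - m0) / s_pop * sqrt(p * (1 - p))

print(c(cor = r_pb, formula = r_pb_formula))
```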
Eta Coefficient:
## [1] 0.708378
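One common way to obtain the eta coefficient is as the square root of the between-group share of the total sum of squares. The sketch below assumes simulated incomes in three hypothetical education groups and uses base R only:

```r
set.seed(3)
group <- factor(rep(c("primary", "secondary", "higher"), each = 50))

# Hypothetical group means, looked up by group name
means  <- c(primary = 2500, secondary = 3000, higher = 3800)
income <- rnorm(150, mean = unname(means[as.character(group)]), sd = 600)

# Replace every observation by the mean of its group
group_means <- ave(income, group)

ss_between <- sum((group_means - mean(income))^2)
ss_total   <- sum((income - mean(income))^2)

eta <- sqrt(ss_between / ss_total)
print(eta)
```

The squared value `eta^2` equals the R-squared of the one-way model `lm(income ~ group)`.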
Correlation matrix
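A minimal sketch of a Pearson correlation matrix with base R, assuming a small simulated data frame (the columns `age`, `income`, and `debt` are hypothetical):

```r
set.seed(11)
df <- data.frame(age = rnorm(100, mean = 40, sd = 10))
df$income <- 1500 + 40 * df$age + rnorm(100, sd = 500)
df$debt   <- 0.3 * df$income + rnorm(100, sd = 300)

# Pairwise Pearson correlations between all numeric columns
cor_mat <- round(cor(df, method = "pearson"), 2)
print(cor_mat)
```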

Correlation matrix with scatterplots
## Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
## ℹ Please use tidy evaluation idioms with `aes()`.
## ℹ See also `vignette("ggplot2-in-packages")` for more information.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## Warning in warn_if_args_exist(list(...)): Extra arguments: "aes_string" are
## being ignored. If these are meant to be aesthetics, submit them using the
## 'mapping' variable within ggpairs with ggplot2::aes or ggplot2::aes_string.

Exercise 1. Contingency analysis.
##         Believe
## Gender   Yes  No
##   Female 435 375
##   Male   147 134

## Call: cohen.kappa1(x = x, w = w, n.obs = n.obs, alpha = alpha, levels = levels,
## w.exp = w.exp)
##
## Cohen Kappa and Weighted Kappa correlation coefficients and confidence boundaries
##                   lower estimate upper
## unweighted kappa -0.043    0.011 0.065
## weighted kappa   -0.043    0.011 0.065
##
## Number of subjects = 1091
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: dane
## X-squared = 0.11103, df = 1, p-value = 0.739
##         Believe
## Gender         Yes        No
##   Female 0.3987168 0.3437214
##   Male   0.1347388 0.1228231
## [1] 0.01218871
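The code for this exercise is not shown, so the following is only a sketch of how the printed numbers could be reproduced in base R from the counts above. The object name `dane` comes from the test output; the other names are hypothetical. The last printed value is consistent with the phi coefficient computed from the uncorrected chi-squared statistic:

```r
# Counts taken from the contingency table printed above
dane <- matrix(c(435, 375,
                 147, 134),
               nrow = 2, byrow = TRUE,
               dimnames = list(Gender  = c("Female", "Male"),
                               Believe = c("Yes", "No")))

# For 2x2 tables chisq.test() applies Yates' correction by default
test_yates <- chisq.test(dane)

# Phi from the uncorrected statistic: sqrt(chi^2 / n)
chi2_raw <- unname(chisq.test(dane, correct = FALSE)$statistic)
phi <- sqrt(chi2_raw / sum(dane))

# Cell proportions relative to the grand total
prop <- prop.table(dane)

print(test_yates)
print(prop)
print(phi)
```

A phi this close to zero, together with the large p-value, gives no evidence of an association between the two variables in this table.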


Exercise 2. Contingency analysis for the ‘Titanic’ data.
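Dropping rows with missing values and building the contingency table:
# Packages assumed to be installed (not shown in the original code)
library(dplyr)      # %>%
library(tidyr)      # drop_na()
library(DescTools)  # Phi(), ContCoef(), CramerV(), TschuprowT()
# Drop rows that contain NA values
titanic <- titanic %>% drop_na()
# Treat Status and Gender as categorical variables
titanic$Status <- as.factor(titanic$Status)
titanic$Gender <- as.factor(titanic$Gender)
# Create the contingency table and perform the chi-square test
contingency_table <- table(titanic$Gender, titanic$Status)
chi_square_test <- chisq.test(contingency_table)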
## Warning in chisq.test(contingency_table): Chi-squared approximation may be
## incorrect
Measures of association for the contingency table:
Phi(contingency_table)
## [1] 0.5257638
ContCoef(contingency_table)
## [1] 0.4653638
CramerV(contingency_table)
## [1] 0.5257638
TschuprowT(contingency_table)
## [1] 0.4421129
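All four measures are rescalings of the chi-squared statistic. The sketch below applies the textbook formulas in base R to a made-up 2x3 table (the counts in `tab` are hypothetical, not the Titanic data):

```r
# Hypothetical 2x3 contingency table
tab <- matrix(c(20, 80, 10,
                90, 30, 15),
              nrow = 2, byrow = TRUE)

chi2 <- unname(suppressWarnings(chisq.test(tab, correct = FALSE))$statistic)
n <- sum(tab)
r <- nrow(tab)
k <- ncol(tab)

phi         <- sqrt(chi2 / n)                               # phi
cont_coef   <- sqrt(chi2 / (chi2 + n))                      # contingency coefficient C
cramer_v    <- sqrt(chi2 / (n * min(r - 1, k - 1)))         # Cramer's V
tschuprow_t <- sqrt(chi2 / (n * sqrt((r - 1) * (k - 1))))   # Tschuprow's T

print(c(phi = phi, C = cont_coef, V = cramer_v, T = tschuprow_t))
```

When the smaller table dimension is 2, V coincides with phi, which matches the identical `Phi()` and `CramerV()` values reported above. T is smaller than V whenever the table is not square.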
Plotting a mosaic plot:
mosaicplot(contingency_table)

Plotting a bar plot:
barplot(contingency_table)
### According to the data, it was better to be a woman: Gender and Status are clearly associated (Cramér's V ≈ 0.53).
