Testing the normality of the data and applying the Shapiro-Wilk test
Normality of TotalCharges within each payment method is checked with
P-P plots and the Shapiro-Wilk test.
library(onewaytests)  # provides nor.test()
nor.test(df$TotalCharges ~ PaymentMethod, data = df)
##
## Shapiro-Wilk Normality Test (alpha = 0.05)
## --------------------------------------------------
## data : df$TotalCharges and PaymentMethod
##
##                       Level Statistic      p.value Normality
## 1 Bank transfer (automatic) 0.9228526 2.123728e-27    Reject
## 2   Credit card (automatic) 0.9175288 5.073903e-28    Reject
## 3          Electronic check 0.8496637 9.382990e-43    Reject
## 4              Mailed check 0.7103853 3.209453e-46    Reject
## --------------------------------------------------

The Shapiro-Wilk test gives p-values below 0.05 for all four payment
methods, so the null hypothesis of normality is rejected in every group.
The normality assumption required by one-way ANOVA is therefore not met.
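The same per-group check can be reproduced with base R alone, which helps if onewaytests is not installed. The sketch below runs on a hypothetical simulated data frame `sim` with skewed, log-normal charges in four groups. It is not the real churn data. It runs `shapiro.test()` separately within each payment method via `split()`. Base `shapiro.test()` accepts only 3 to 5000 values per call, so very large groups would need a different test or a subsample.

```r
# Hypothetical stand-in for df: simulated, right-skewed charges in four groups.
set.seed(42)
pay_methods <- c("Bank transfer (automatic)", "Credit card (automatic)",
                 "Electronic check", "Mailed check")
sim <- data.frame(
  PaymentMethod = rep(pay_methods, each = 200),
  TotalCharges  = rlnorm(800, meanlog = 7, sdlog = 1)
)

# Shapiro-Wilk within each payment method.
# Base shapiro.test() accepts only 3 to 5000 non-missing values per call.
sw <- lapply(split(sim$TotalCharges, sim$PaymentMethod), shapiro.test)

sw_table <- data.frame(
  Level     = names(sw),
  Statistic = vapply(sw, function(res) unname(res$statistic), numeric(1)),
  p.value   = vapply(sw, function(res) res$p.value, numeric(1)),
  row.names = NULL
)
sw_table$Normality <- ifelse(sw_table$p.value < 0.05, "Reject", "Not reject")
print(sw_table)
```

With the real data, replace `sim` with `df`. The resulting table should match the `nor.test()` output above.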
Testing for homogeneity of variance
# Applying Levene's test (leveneTest() comes from the car package)
library(car)
leveneTest(TotalCharges ~ PaymentMethod, data = df)
## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value    Pr(>F)
## group    3  218.93 < 2.2e-16 ***
##       7028
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
- Levene's test shows that the variance of TotalCharges differs
significantly across the four payment methods (F(3, 7028) = 218.93,
p < 0.001).
- The homogeneity-of-variance assumption is therefore also not met.
- Because neither ANOVA assumption holds, the nonparametric
Kruskal-Wallis test is used instead of a parametric one-way ANOVA.
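Levene's test with `center = median` is the variant reported above. It is a one-way ANOVA on the absolute deviations of each observation from its own group median. This also explains the degrees of freedom in the output. The numerator df is 3 = 4 groups - 1. The denominator df is 7028 = 7,032 non-missing observations - 4 groups. The minimal sketch below uses hypothetical simulated data with deliberately unequal spreads. The cross-check against `car::leveneTest()` runs only if car is installed.

```r
# Hypothetical simulated data: four groups with deliberately unequal spreads.
set.seed(1)
g <- factor(rep(c("A", "B", "C", "D"), each = 50))
y <- rnorm(200, mean = 100, sd = rep(c(5, 10, 20, 40), each = 50))

# Absolute deviation of each observation from its own group median.
abs_dev <- abs(y - ave(y, g, FUN = median))

# Median-centred Levene's test = one-way ANOVA on these deviations.
fit     <- anova(lm(abs_dev ~ g))
df_num  <- fit$Df[1]             # k - 1
df_den  <- fit$Df[2]             # N - k
F_value <- fit[["F value"]][1]
p_value <- fit[["Pr(>F)"]][1]
cat(sprintf("F(%d, %d) = %.2f, p = %.3g\n", df_num, df_den, F_value, p_value))

# Optional cross-check against car::leveneTest() when car is available.
if (requireNamespace("car", quietly = TRUE)) {
  lt <- car::leveneTest(y, g, center = median)
  stopifnot(isTRUE(all.equal(lt[["F value"]][1], F_value)))
}
```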
Applying the Kruskal-Wallis test
kruskal.test(df$TotalCharges ~ df$PaymentMethod, data = df)
##
## Kruskal-Wallis rank sum test
##
## data: df$TotalCharges by df$PaymentMethod
## Kruskal-Wallis chi-squared = 1077, df = 3, p-value < 2.2e-16
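The Kruskal-Wallis test gives chi-squared = 1077 on 3 degrees of freedom
with p < 0.001, so the null hypothesis is rejected. Total charges are not
identically distributed across payment methods: at least one payment
method tends to have higher or lower total charges (in terms of mean
rank) than the others.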
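Kruskal-Wallis ranks all observations together and compares the groups' rank sums R_i. The statistic is H = 12 / (N(N + 1)) * sum(R_i^2 / n_i) - 3(N + 1), referred to a chi-squared distribution with k - 1 degrees of freedom. This is why it speaks to mean ranks rather than means. The sketch below computes H by hand on hypothetical simulated data and checks it against `kruskal.test()`. The data are continuous, so there are no ties and the tie correction that `kruskal.test()` applies has no effect.

```r
# Hypothetical simulated data: four groups of unequal size, skewed values.
set.seed(7)
sizes <- c(40, 35, 30, 45)
g <- factor(rep(c("A", "B", "C", "D"), times = sizes))
y <- rexp(sum(sizes), rate = rep(c(1, 1, 0.5, 2), times = sizes))

N     <- length(y)
r     <- rank(y)                  # ranks in the pooled sample
R_sum <- tapply(r, g, sum)        # rank sum per group
n_i   <- tapply(r, g, length)     # group sizes

# Kruskal-Wallis H (continuous data, so no ties and no tie correction)
H <- 12 / (N * (N + 1)) * sum(R_sum^2 / n_i) - 3 * (N + 1)
p <- pchisq(H, df = nlevels(g) - 1, lower.tail = FALSE)

kt <- kruskal.test(y ~ g)
print(c(by_hand = H, kruskal.test = unname(kt$statistic)))
```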
Post hoc analysis
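To identify which payment methods differ from each other in total
charges, pairwise Wilcoxon rank-sum tests are applied as a post hoc
analysis. The p-values are adjusted for multiple comparisons with the
Benjamini-Hochberg (BH) method.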
pairwise.wilcox.test(df$TotalCharges, df$PaymentMethod, p.adjust.method = "BH")
##
## Pairwise comparisons using Wilcoxon rank sum test with continuity correction
##
## data: df$TotalCharges and df$PaymentMethod
##
##                         Bank transfer (automatic) Credit card (automatic)
## Credit card (automatic) 0.7                       -
## Electronic check        <2e-16                    <2e-16
## Mailed check            <2e-16                    <2e-16
##                         Electronic check
## Credit card (automatic) -
## Electronic check        -
## Mailed check            <2e-16
##
## P value adjustment method: BH
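`pairwise.wilcox.test()` runs an unpaired, two-sided Wilcoxon rank-sum test for each pair of groups. It then adjusts the whole set of p-values together with `p.adjust()`. The minimal sketch below rebuilds this on hypothetical simulated data with `combn()`, `wilcox.test()` and `p.adjust(method = "BH")`, then compares the result with `pairwise.wilcox.test()`. With large or tied samples, such as the churn data, `wilcox.test()` switches to the normal approximation. That is why the output above mentions a continuity correction.

```r
# Hypothetical simulated data: three groups, two of them nearly identical.
set.seed(3)
g <- factor(rep(c("A", "B", "C"), each = 30))
y <- rnorm(90, mean = rep(c(0, 0.2, 1.5), each = 30))

# Every unordered pair of group levels
pairs <- combn(levels(g), 2, simplify = FALSE)

# Unadjusted two-sided rank-sum p-value for each pair (independent samples)
p_raw <- vapply(pairs, function(pr) {
  wilcox.test(y[g == pr[2]], y[g == pr[1]])$p.value
}, numeric(1))

# Benjamini-Hochberg adjustment applied to the whole family of comparisons
p_bh <- p.adjust(p_raw, method = "BH")
names(p_bh) <- vapply(pairs, paste, character(1), collapse = " vs ")
print(round(p_bh, 4))

# Built-in equivalent: rows are the later level, columns the earlier one
pw <- pairwise.wilcox.test(y, g, p.adjust.method = "BH")
```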
Explanation:
Total charges are highest among bank transfer and credit card customers
and lowest in the mailed check group. Every pair of payment methods
differs significantly (adjusted p < 2e-16) except bank transfer versus
credit card (adjusted p = 0.7). Segmenting customers by payment method
can therefore be recommended, with bank transfer and credit card
customers behaving similarly. Furthermore, mailed check customers could
be encouraged to switch to one of the other three (electronic) payment
methods to increase digitization.
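The Wilcoxon tests show which groups differ but not in which direction. A statement such as "highest among bank transfer and credit card customers" is best backed by group summaries. The minimal sketch below ranks payment methods by median total charge, using medians because the charges are skewed. It runs on a hypothetical simulated data frame `sim` in place of `df`, so its ordering is built into the simulation and says nothing about the real data.

```r
# Hypothetical simulated stand-in for df; replace `sim` with `df` on the real data.
set.seed(11)
pay_methods <- c("Bank transfer (automatic)", "Credit card (automatic)",
                 "Electronic check", "Mailed check")
sim <- data.frame(
  PaymentMethod = rep(pay_methods, each = 100),
  TotalCharges  = rlnorm(400, meanlog = rep(c(8, 8, 7, 6), each = 100),
                         sdlog = 0.8)
)

# Median total charge per payment method, highest first
med <- sort(vapply(split(sim$TotalCharges, sim$PaymentMethod), median,
                   numeric(1)), decreasing = TRUE)
print(round(med, 1))
```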