Shopping Mall Hypothesis Test

Venetia Polyzou

mydata <- read.table("./Shopping Mall Customer Segmentation Data2 .csv",
                     header = TRUE,
                     sep = ",",
                     dec = ".")

head(mydata)

##                            Customer.ID Age Gender Annual.Income Spending.Score
## 1 d410ea53-6661-42a9-ad3a-f554b05fd2a7  30   Male        151479             89
## 2 1770b26f-493f-46b6-837f-4237fb5a314e  58 Female        185088             95
## 3 e81aa8eb-1767-4b77-87ce-1620dc732c5e  62 Female         70912             76
## 4 9795712a-ad19-47bf-8886-4f997d6046e3  23   Male         55460             57
## 5 64139426-2226-4cd6-bf09-91bce4b4db5e  24   Male        153752             76
## 6 7e211337-e92f-4140-8231-5c9ac7a2aa12  42   Male        158335             40

General:

Unit of observation: an individual customer at a shopping mall
Sample size: 200 observations (customers)

Variables:

Customer ID:

A unique identifier for each customer.
Type: Categorical-nominal

Age:

The age of the customer (in years).
Type: Numeric-ratio

Gender:

Gender of the customer; categories: male/female.
Type: Categorical-nominal

Annual Income:

Annual income of the customer in United States Dollars (USD).
Type: Numeric-ratio

Spending Score:

A score assigned by the mall (typically from 1 to 100) based on customer behavior and spending patterns.
Type: Numeric-ratio

Source of the data: kaggle.com (https://www.kaggle.com/datasets/zubairmustafa/shopping-mall-customer-segmentation-data)

mydata$GenderF <- factor(mydata$Gender, 
                         levels = c("Male", "Female"),
                         labels = c("Male", "Female"))

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

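library(tidyr)
# Rename the income column and drop rows with missing values
mydata <- mydata %>%
  rename(annual.income = Annual.Income) %>%
  drop_na()

# Customers with an annual income above 150000 USD
mydataF <- mydata[mydata$annual.income > 150000 , ]

# Male customers only (dplyr is already loaded above)
mydata2F <- mydata %>%
  filter(Gender == "Male" )

# Customers with a spending score between 60 and 80 (inclusive)
mydata3 <- mydata[mydata$Spending.Score >= 60 & mydata$Spending.Score <= 80 , ]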

summary(mydata[ ,c(-1,-3,-6)])

##       Age        annual.income    Spending.Score  
##  Min.   :18.00   Min.   : 22655   Min.   :  1.00  
##  1st Qu.:32.50   1st Qu.: 69202   1st Qu.: 27.00  
##  Median :48.00   Median :111526   Median : 45.00  
##  Mean   :49.57   Mean   :112493   Mean   : 48.38  
##  3rd Qu.:65.00   3rd Qu.:157317   3rd Qu.: 73.50  
##  Max.   :90.00   Max.   :199879   Max.   :100.00

The descriptive statistics above are calculated for the numeric variables only, excluding the ID and the categorical Gender variables.

The mean age is 49.57, which means that the average customer of the shopping mall is 49.57 years old. The minimum is 18.00, which means that the youngest customer is 18 years old.
The median annual income is 111526, which means that half of the customers have an annual income of 111526 USD or less and the other half have more.
The 3rd quartile of the spending score is 73.50, which means that 75% of the customers have a spending score of 73.5 or lower and only 25% have a score above 73.5 (the top 25% are the biggest spenders).
The maximum annual income is 199879, which means that the wealthiest customer in the dataset earns 199879 USD per year.
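The reading of the median and the 3rd quartile can be checked on a small example. The sketch below uses a hypothetical vector of evenly spread scores (not the mall data) and base R only; it simply shows which share of observations falls at or below each cut point.

```r
# Hypothetical, evenly spread scores from 1 to 100 (NOT the mall dataset)
scores <- 1:100

q3  <- quantile(scores, probs = 0.75, names = FALSE)  # third quartile
med <- median(scores)                                 # second quartile

# Share of observations at or below each cut point
share_below_q3  <- mean(scores <= q3)
share_below_med <- mean(scores <= med)

print(c(q3 = q3, median = med))
print(c(share_below_q3 = share_below_q3, share_below_med = share_below_med))
```

With real data the shares are only approximately 75% and 50%, because of ties and the interpolation rule that `quantile()` uses by default.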

library(psych)
describeBy(mydata$Spending.Score, g = mydata$GenderF)

## 
##  Descriptive statistics by group 
## group: Male
##    vars  n  mean    sd median trimmed   mad min max range skew kurtosis   se
## X1    1 93 46.85 28.85     42   46.03 35.58   1 100    99 0.23    -1.09 2.99
## ------------------------------------------------------------ 
## group: Female
##    vars   n  mean   sd median trimmed   mad min max range skew kurtosis   se
## X1    1 106 49.72 28.3   50.5   49.78 34.84   1  98    97 0.02    -1.16 2.75

Research question:

We ask whether there are differences between males and females in their spending score at the shopping mall.

The natural candidate is the independent samples t-test, because we have one numeric variable (Spending.Score) and one categorical variable (factor) with 2 independent groups (Gender: Male/Female).

According to the descriptive statistics, the two sample means differ: the mean spending score of males is 46.85 and the mean spending score of females is 49.72. However, we do not yet know whether this difference is statistically significant, that is, whether it can be generalized to the population. So, we need to run the independent samples t-test. Before that, some specific assumptions, which we examine below, should be met.
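As a minimal illustration of the comparison described above, the group means and their difference can be computed with base R. The data frame `toy` is hypothetical and only mirrors the column names used in this report.

```r
# Hypothetical toy data; column names mirror the report's variables
toy <- data.frame(
  GenderF = factor(c("Male", "Male", "Male", "Female", "Female", "Female"),
                   levels = c("Male", "Female")),
  Spending.Score = c(40, 50, 60, 45, 55, 65)
)

# Mean spending score per group
group_means <- tapply(toy$Spending.Score, toy$GenderF, mean)

# Difference in sample means (Male minus Female)
mean_diff <- unname(group_means["Male"] - group_means["Female"])

print(group_means)
print(mean_diff)
```

A non-zero difference in the sample is only descriptive; whether it generalizes to the population is what the hypothesis test addresses.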

library(ggplot2)

## 
## Attaching package: 'ggplot2'

## The following objects are masked from 'package:psych':
## 
##     %+%, alpha

ggplot(mydata, aes(x = Spending.Score)) +
  geom_histogram(binwidth = 30, colour = "pink") +
  facet_wrap(~GenderF, ncol = 1) +
  ylab("Frequency")

According to the above histograms, the distributions of the spending score for both males and females do not seem to be normal.

The first histogram, for males, appears platykurtic (flat) with a mild positive skew, consistent with the descriptive statistics above (skew = 0.23, kurtosis = -1.09).
The second histogram, for females, is also platykurtic and close to symmetric (skew = 0.02, kurtosis = -1.16).
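The shape of a distribution can also be summarized numerically. The sketch below defines a hypothetical helper, `shape_stats()`, that computes simple moment-based skewness and excess kurtosis. It is an illustration only; packages such as psych may apply slightly different small-sample corrections, so the numbers need not match `describeBy()` exactly.

```r
# Moment-based shape measures (hypothetical helper, simple "type 1" estimators)
shape_stats <- function(x) {
  x  <- x[!is.na(x)]
  d  <- x - mean(x)
  m2 <- mean(d^2)                      # second central moment
  c(skew            = mean(d^3) / m2^1.5,
    excess_kurtosis = mean(d^4) / m2^2 - 3)
}

# A small, perfectly symmetric and flat example (NOT the mall data)
flat_scores <- c(1, 2, 3, 4, 5)
res <- shape_stats(flat_scores)
print(res)
```

A skew near zero indicates symmetry, while a negative excess kurtosis indicates a flatter (platykurtic) shape than the normal distribution.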

To be sure, we run two Shapiro-Wilk tests, one for the normality of the spending score of males and one for the normality of the spending score of females.

##install.packages("rstatix")
library(rstatix)

## 
## Attaching package: 'rstatix'

## The following object is masked from 'package:stats':
## 
##     filter

library(dplyr)
mydata %>%
  group_by(GenderF) %>%
  shapiro_test(Spending.Score)

## # A tibble: 2 × 4
##   GenderF variable       statistic       p
##   <fct>   <chr>              <dbl>   <dbl>
## 1 Male    Spending.Score     0.953 0.00210
## 2 Female  Spending.Score     0.957 0.00177

The hypotheses for the two Shapiro-Wilk tests above are:

For males:

H0: Spending score is normally distributed for males.
H1: Spending score is not normally distributed for males.

The null hypothesis (H0) is rejected at p = 0.002, so we conclude that the distribution is not normal.

For females:

H0: Spending score is normally distributed for females.
H1: Spending score is not normally distributed for females.

The null hypothesis (H0) is rejected at p = 0.002, so we conclude that the distribution is not normal.

So, the assumption of normality is violated in both groups.
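The same per-group check can be sketched with base R only, without rstatix. The data below are simulated (uniform scores, an assumption chosen purely for illustration), so the p-values are not those of the mall data.

```r
set.seed(1)  # arbitrary seed, only for reproducibility of the illustration

# Simulated toy data (NOT the mall dataset): 50 scores per group
toy <- data.frame(
  GenderF = factor(rep(c("Male", "Female"), each = 50),
                   levels = c("Male", "Female")),
  Spending.Score = round(runif(100, min = 1, max = 100))
)

# Shapiro-Wilk p-value within each group
shapiro_p <- tapply(toy$Spending.Score, toy$GenderF,
                    function(x) shapiro.test(x)$p.value)

# Decision at the 5% level: TRUE means H0 (normality) is rejected
reject_normality <- shapiro_p < 0.05

print(shapiro_p)
print(reject_normality)
```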

##install.packages("ggpubr")
library(ggpubr)

ggqqplot(mydata,
         "Spending.Score",
         facet.by = "GenderF")

According to the 2 Quantile-Quantile plots:

The plots support the violation of normality: in both groups several points fall outside the gray confidence band, so the deviations from the reference line are not small.

So, based on all three checks (histograms, Shapiro-Wilk tests, Quantile-Quantile plots), we conclude that normality is violated in both groups, which is why we need the non-parametric Wilcoxon rank-sum test.
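A Quantile-Quantile plot compares the ordered sample values with the quantiles expected under a normal distribution. As a rough numeric companion (not a replacement for the plot or the Shapiro-Wilk test), the sketch below correlates the two sets of quantiles using base R's `qqnorm()`. The helper `qq_cor()` and both example vectors are hypothetical.

```r
# Correlation between theoretical normal quantiles and sample quantiles
# (hypothetical helper; values close to 1 suggest an almost straight Q-Q line)
qq_cor <- function(x) {
  qq <- qqnorm(x, plot.it = FALSE)  # returns theoretical (x) and sample (y) quantiles
  cor(qq$x, qq$y)
}

set.seed(2)                       # arbitrary seed for the simulated normal sample
r_normal  <- qq_cor(rnorm(200))   # simulated normal data
r_uniform <- qq_cor(1:100)        # evenly spread, flat data

print(c(r_normal = r_normal, r_uniform = r_uniform))
```

Flat, evenly spread data still give a fairly high correlation, which is why the plot itself, with its confidence band, remains the more informative diagnostic.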

##install.packages("car")
library(car)

## Loading required package: carData

## 
## Attaching package: 'car'

## The following object is masked from 'package:psych':
## 
##     logit

## The following object is masked from 'package:dplyr':
## 
##     recode

leveneTest(mydata$Spending.Score, group = mydata$GenderF)

## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   1  0.0027 0.9588
##       197

We also use Levene’s test to check the other assumption, the homogeneity of variances of the spending score across the two groups.

For Levene’s test the hypotheses are:

H0 : σ^2(males) = σ^2(females)
H1 : σ^2(males) ≠ σ^2(females)

We cannot reject the null hypothesis (H0) (F = 0.0027, p = 0.959), so we assume that the variances of the spending score are the same for males and females. As a result, we do not need the Welch correction.
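To make the mechanics of the test more concrete: Levene's test with `center = median` amounts to a one-way ANOVA on the absolute deviations of each observation from its group median. The sketch below reproduces that idea in base R on hypothetical toy data; it is an illustration of the technique, not the internals of `car::leveneTest()`.

```r
# Hypothetical toy data (NOT the mall dataset)
toy <- data.frame(
  g = factor(rep(c("Male", "Female"), each = 5), levels = c("Male", "Female")),
  y = c(10, 20, 30, 40, 50, 12, 22, 30, 38, 48)
)

# Absolute deviation of each value from its own group median
toy$z <- abs(toy$y - ave(toy$y, toy$g, FUN = median))

# One-way ANOVA on the absolute deviations
fit <- anova(lm(z ~ g, data = toy))

levene_F <- fit[["F value"]][1]
levene_p <- fit[["Pr(>F)"]][1]

print(fit)
```

A large p-value means that the average spread around the median is similar in the two groups, so equal variances can be assumed.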

t.test(mydata$Spending.Score ~ mydata$GenderF,
       var.equal = TRUE,
       alternative = "two.sided")

## 
##  Two Sample t-test
## 
## data:  mydata$Spending.Score by mydata$GenderF
## t = -0.70671, df = 197, p-value = 0.4806
## alternative hypothesis: true difference in means between group Male and group Female is not equal to 0
## 95 percent confidence interval:
##  -10.869360   5.134322
## sample estimates:
##   mean in group Male mean in group Female 
##             46.84946             49.71698

Because the assumption of normality is not met, the non-parametric Wilcoxon rank-sum test should be used. However, for the purposes of this home assignment only, we also run the parametric independent samples t-test.

Independent samples t-test Hypotheses:

H0: μ(males) = μ(females)
H1: μ(males) ≠ μ(females)

We cannot reject the null hypothesis (H0) (t = -0.71, df = 197, p = 0.481), so we cannot say that the mean spending score of males differs from the mean spending score of females.
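As a cross-check, the pooled-variance t statistic can be recomputed by hand from the group sizes, means and standard deviations reported by `describeBy()` above. Because those inputs are rounded to two decimals, the result agrees with the `t.test()` output only approximately.

```r
# Summary statistics as reported (rounded) in the describeBy() output
n1 <- 93;  m1 <- 46.85; s1 <- 28.85   # Male
n2 <- 106; m2 <- 49.72; s2 <- 28.30   # Female

df <- n1 + n2 - 2                                        # degrees of freedom
sp <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df)     # pooled standard deviation
se <- sp * sqrt(1 / n1 + 1 / n2)                         # standard error of the difference

t_stat  <- (m1 - m2) / se
p_value <- 2 * pt(-abs(t_stat), df)                      # two-sided p-value

print(c(t = t_stat, df = df, p = p_value))
```

The values come out close to t = -0.71 and p = 0.48, in line with the test output.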

##install.packages("effectsize")
library(effectsize)

## 
## Attaching package: 'effectsize'

## The following objects are masked from 'package:rstatix':
## 
##     cohens_d, eta_squared

## The following object is masked from 'package:psych':
## 
##     phi

effectsize::cohens_d(mydata$Spending.Score ~ mydata$GenderF,
                     pooled_sd = FALSE)

## Cohen's d |        95% CI
## -------------------------
## -0.10     | [-0.38, 0.18]
## 
## - Estimated using un-pooled SD.

interpret_cohens_d(0.10, rules = "sawilowsky2009")

## [1] "very small"
## (Rules: sawilowsky2009)

The effect size of the independent samples t-test is d = -0.10 (0.10 in absolute value), which is interpreted as “very small”. This is consistent with the t-test, where we did not have enough evidence to reject the null hypothesis (the difference between the two means was not statistically significant).
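Cohen's d is the difference in means divided by a standard deviation. The sketch below recomputes it from the rounded summary statistics reported earlier, once with the un-pooled standardizer (assumed here to be the square root of the average of the two group variances) and once with the pooled one. The formulas are standard textbook versions and are an assumption about, not a copy of, what the effectsize package does internally.

```r
# Summary statistics as reported (rounded) in the describeBy() output
n1 <- 93;  m1 <- 46.85; s1 <- 28.85   # Male
n2 <- 106; m2 <- 49.72; s2 <- 28.30   # Female

# Un-pooled standardizer: root of the average of the two variances
sd_unpooled <- sqrt((s1^2 + s2^2) / 2)
d_unpooled  <- (m1 - m2) / sd_unpooled

# Pooled standardizer: weights each variance by its degrees of freedom
sd_pooled <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
d_pooled  <- (m1 - m2) / sd_pooled

print(round(c(d_unpooled = d_unpooled, d_pooled = d_pooled), 3))
```

Both versions round to -0.10 here, because the two group standard deviations are almost identical.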

wilcox.test(mydata$Spending.Score ~ mydata$GenderF,
            correct = FALSE,
            exact = FALSE,
            alternative = "two.sided")

## 
##  Wilcoxon rank sum test
## 
## data:  mydata$Spending.Score by mydata$GenderF
## W = 4644, p-value = 0.4819
## alternative hypothesis: true location shift is not equal to 0

However, because the assumption of normality is violated, we should use the non-parametric Wilcoxon rank-sum test.

Wilcoxon rank-sum test:

H0: The distribution location of the spending score is the same for males and females.
H1: The distribution location of the spending score is not the same for males and females.

We cannot reject the null hypothesis (H0) (W = 4644, p = 0.482). So, we cannot say that the location of the spending score distribution differs between males and females.
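With `exact = FALSE` and `correct = FALSE`, the p-value is based on a normal approximation of W. The sketch below redoes that approximation by hand from the reported W and the group sizes. It ignores the correction for tied scores, so it matches the reported p-value only approximately.

```r
# Values taken from the outputs above
n1 <- 93     # males
n2 <- 106    # females
W  <- 4644   # Wilcoxon rank sum statistic reported by wilcox.test()

mu_W <- n1 * n2 / 2                           # expected W under H0
sd_W <- sqrt(n1 * n2 * (n1 + n2 + 1) / 12)    # standard deviation under H0, no tie correction

z        <- (W - mu_W) / sd_W
p_approx <- 2 * pnorm(-abs(z))                # two-sided p-value

print(c(z = z, p = p_approx))
```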

effectsize(wilcox.test(mydata$Spending.Score ~ mydata$GenderF,
                       correct = FALSE,
                       exact = FALSE,
                       alternative = "two.sided"))

## r (rank biserial) |        95% CI
## ---------------------------------
## -0.06             | [-0.22, 0.10]

The effect size of the non-parametric Wilcoxon rank-sum test is r = -0.06 (0.06 in absolute value), which is interpreted as “very small”. This is consistent with the test result, since we did not have enough evidence to reject the null hypothesis (the difference in location between the two groups was not statistically significant).

interpret_rank_biserial(0.06)

## [1] "very small"
## (Rules: funder2019)
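The rank-biserial correlation can be recovered directly from W and the group sizes with the usual formula r = 2W / (n1 * n2) - 1. The sketch below is a hand calculation for illustration, not the effectsize package's code.

```r
# Values taken from the outputs above
n1 <- 93     # males
n2 <- 106    # females
W  <- 4644   # statistic reported by wilcox.test()

# Share of male-female pairs in which the male score is higher (ties count half)
prob_male_higher <- W / (n1 * n2)

# Rank-biserial correlation
r_rb <- 2 * prob_male_higher - 1

print(round(c(prob_male_higher = prob_male_higher, r_rb = r_rb), 3))
```

A value of r close to 0 means that a randomly chosen male is about as likely to have a higher score than a randomly chosen female as the reverse.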

Summing up:

Research question:

We ask whether there are differences between males and females in their spending score at the shopping mall.

The natural candidate is the independent samples t-test, because we have one numeric variable (Spending.Score) and one categorical variable (factor) with 2 independent groups (Gender: Male/Female). We also need to check the assumption of normality of the spending score within each group (males and females) to decide between the parametric and the non-parametric test. Lastly, we need to check the homogeneity of variances to decide whether the Welch correction is needed.

So:

We examined normality through the histograms, the Quantile-Quantile plots and the Shapiro-Wilk tests, and we concluded that it is violated.
We also tested the homogeneity of variances and could not reject the hypothesis that the variances of the spending score are the same for males and females. So, the Welch correction is not needed.
Although normality is violated, we ran the parametric independent samples t-test, only for the purposes of this home assignment.
However, in this case the non-parametric Wilcoxon rank-sum test is more appropriate, because the assumption of normality is violated.
According to the Wilcoxon rank-sum test, we cannot reject the null hypothesis (H0). So, we cannot say that the location of the spending score distribution differs between males and females.

2025-03-26
