Tipping culture is interesting to observe, since it reflects the generosity and social norms of a country. In the US in particular, where the social pressure to tip is high, it is intriguing to examine differences in tipping habits. It is also interesting to see whether these norms differ between the genders, and whether there is a difference in the “generosity” of the genders. Hence, this research question arises: Is there a difference between the relative amounts tipped by men and women?
This research question concerns the whole population of men and women in the US. However, this population cannot be easily examined. Much transaction data is unavailable or inaccessible, and gathering it would be very resource-intensive. Therefore, a sample must be drawn to investigate this research question. This sample should focus on two variables of interest: the gender of each observation and the corresponding relative amount tipped.
The data is imported with read.csv(), and the first 6 rows
are displayed by head().
dt <- read.csv("./tip.csv", header = TRUE, sep = ",", dec = ".")
head(dt)
## total_bill tip sex smoker day time size
## 1 16.99 1.01 Female No Sun Dinner 2
## 2 10.34 1.66 Male No Sun Dinner 3
## 3 21.01 3.50 Male No Sun Dinner 3
## 4 23.68 3.31 Male No Sun Dinner 2
## 5 24.59 3.61 Female No Sun Dinner 4
## 6 25.29 4.71 Male No Sun Dinner 4
Furthermore, all non-numeric variables are factorised.
dt$sex <- factor(dt$sex,
levels = c("Female", "Male"),
labels = c("Female", "Male"))
dt$smoker <- factor(dt$smoker,
levels = c("No", "Yes"),
labels = c("No", "Yes"))
dt$day <- factor(dt$day,
levels = c("Thur", "Fri", "Sat", "Sun"),
labels = c("Thur", "Fri", "Sat", "Sun"))
dt$time <- factor(dt$time,
levels = c("Lunch", "Dinner"),
labels = c("Lunch", "Dinner"))
Explanation of all variables:
total_bill: Variable defining the total amount of the bill in US$ (numeric)
tip: Variable defining the amount of the tip in US$ (numeric)
sex: Variable defining the gender of an observation, Female or Male (string)
smoker: Variable defining whether an observation is a smoker or not, No or Yes (string)
day: Variable defining the day of the week on which the observation occurred, Thur, Fri, Sat, or Sun (string)
time: Variable defining at which meal a specific observation took place, Lunch or Dinner (string)
size: Variable defining the size of the dining group for a specific observation in number of people (numeric)
This dataset contains many useful variables but lacks the most important one: there is no variable for the amount of the tip relative to the price paid. This variable is needed to make tips on bills of different sizes comparable. Therefore, it is created with the following code:
dt$tip_percent <- dt$tip/dt$total_bill
This variable now gives the tip as a proportion of the bill, so a value of 0.16 means that 16% of the bill was given as a tip.
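As a quick sanity check of this calculation, the sketch below recomputes the ratio for the first three rows shown by head() above. The small data frame toy is a hypothetical stand-in for dt, typed in by hand from that output.

```r
# Hypothetical stand-in for dt: first three rows of the head() output above
toy <- data.frame(total_bill = c(16.99, 10.34, 21.01),
                  tip        = c(1.01, 1.66, 3.50))

# Same calculation as in the report: tip as a proportion of the bill
toy$tip_percent <- toy$tip / toy$total_bill

# Multiply by 100 to read the values as percentages
round(100 * toy$tip_percent, 1)
# → 5.9 16.1 16.7
```

The first group tipped about 5.9% of its bill, while the next two tipped roughly 16 to 17%, even though their absolute tips differ considerably.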
The unit of observation is a group of customers in a restaurant on a given day. The sample size of this dataset can be determined using the following code:
nrow(dt)
## [1] 244
We see that this sample has 244 observations.
The sample is from Kaggle.
Descriptive Statistics
When using the describe() function from the psych
library, it is only sensible to use numeric data. Therefore, I will run
this command excluding non-numeric data.
library(psych)
describe(dt[sapply(dt, is.numeric)])
## vars n mean sd median trimmed mad min max range skew
## total_bill 1 244 19.79 8.90 17.80 18.73 7.46 3.07 50.81 47.74 1.12
## tip 2 244 3.00 1.38 2.90 2.84 1.33 1.00 10.00 9.00 1.45
## size 3 244 2.57 0.95 2.00 2.42 0.00 1.00 6.00 5.00 1.43
## tip_percent 4 244 0.16 0.06 0.15 0.16 0.05 0.04 0.71 0.67 3.31
## kurtosis se
## total_bill 1.14 0.57
## tip 3.50 0.09
## size 1.63 0.06
## tip_percent 26.31 0.00
What we can see from this output is that we have four numeric
variables, and for each observation, there is a value for each of those
variables (always n=244). Let me take total_bill to
describe what the individual parameters mean.
As already discussed, n reflects the number of
observations of that variable in the sample. So in total, there are 244
different observations for total_bill.
The mean of total_bill is the average value of the
observations, calculated by dividing the sum of all values by the number
of values. Here, the mean of 19.79 means that on average, the bill for an
observation in this sample was $19.79.
Sd stands for standard deviation and is a measure of spread. It indicates how widely the data is spread around the sample mean. If the data were normally distributed, about 68% of it would lie within one standard deviation of the mean (19.79 ± 8.9); since total_bill is skewed, this share is only a rough guide here.
The median gives us information about the true middle of the
observations. This means that for total_bill, 50% of
observations have a value above 17.8, and 50% of observations have a
value below 17.8.
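The mad is the median absolute deviation, a robust measure of spread: the median of the absolute distances between each datapoint and the median. By default, R scales this value by the constant 1.4826 so that it is comparable to the standard deviation for normally distributed data. The value 7.46 is this scaled median distance for total_bill.
Trimmed stands for trimmed mean. By default, describe() drops the lowest and highest 10% of observations before averaging, which makes the mean less sensitive to outliers.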
Min and max are pretty self-explanatory; in this case, 3.07 is the lowest price for a meal paid and 50.81 is the highest price for a meal paid. Range is the absolute difference of min and max.
The skew indicates whether the distribution of this variable is asymmetric; in a skewed distribution, the mean and the median typically differ. Since the value 1.12 is positive, there is a positive skew, meaning that the data is skewed to the right (a long tail of high values). This is typical for data about spending.
Kurtosis is a measure of how heavy the tails of a distribution are. describe() reports excess kurtosis, so a normal distribution has a value of about 0; the higher the value, the more observations lie far out in the tails.
Lastly, the se, the standard error, is a measure of how dispersed the sample mean estimation is around the true mean.
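To make these definitions concrete, the following sketch computes several of the measures by hand in base R. It uses only the six total_bill values shown by head() above as hypothetical example data, and it assumes the defaults of describe() (10% trimming and the scaled mad); it is an illustration, not a reimplementation of psych.

```r
# Hypothetical example values: the six total_bill entries shown by head() above
x <- c(16.99, 10.34, 21.01, 23.68, 24.59, 25.29)
n <- length(x)

m       <- sum(x) / n                      # mean: sum of values divided by their number
s       <- sqrt(sum((x - m)^2) / (n - 1))  # sample standard deviation
se      <- s / sqrt(n)                     # standard error of the mean
med     <- median(x)                       # middle value of the sorted data
mad_raw <- median(abs(x - med))            # median absolute deviation from the median
mad_std <- 1.4826 * mad_raw                # scaled version, as returned by stats::mad()
# With only six values, floor(0.1 * 6) = 0 observations are dropped per tail,
# so the trimmed mean equals the ordinary mean in this tiny example
trimmed <- mean(x, trim = 0.1)
rng     <- max(x) - min(x)                 # range

c(mean = m, sd = s, se = se, median = med, mad = mad_std,
  trimmed = trimmed, range = rng)

# The se reported for total_bill in the full sample follows the same formula
round(8.90 / sqrt(244), 2)
# → 0.57
```

The last line reproduces the se column of the describe() output for total_bill from its sd and n.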
Summary Statistics
For non-numeric variables it is best to analyse frequencies of observations to get an overview.
summary(dt[!sapply(dt, is.numeric)])
## sex smoker day time
## Female: 87 No :151 Thur:62 Lunch : 68
## Male :157 Yes: 93 Fri :19 Dinner:176
## Sat :87
## Sun :76
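Counts are easier to compare once they are expressed as shares of the sample. The sketch below does this for sex, using the two counts copied from the output above; on the full data, prop.table(table(dt$sex)) would be one way to obtain the same shares.

```r
# Counts copied from the summary() output above
sex_counts <- c(Female = 87, Male = 157)

# Relative frequencies: each count divided by the total of 244 observations
round(sex_counts / sum(sex_counts), 3)
# → Female 0.357, Male 0.643
```

Roughly one third of the observations are female and two thirds are male, so the two groups compared later are of unequal size.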
The variable of interest is the tip as a proportion of the total bill. This variable is numeric and continuous. Within this variable, I hypothesize differences between two groups, namely men and women. Since the grouping variable is categorical with two levels, a two-sample t-test is the natural choice. The observations in this sample are independent of each other, and each observation is measured only once. Therefore, an independent-samples t-test or, if its assumptions are not met, the appropriate non-parametric alternative should be used for the analysis.
Normality Assumption
The variable of interest, in this case, tip percentage of total bill, must be normally distributed for each of the two populations. Since we do not have population data, we use sample data. To test normality for each of the groups, one uses a Shapiro-Wilk normality test. This test has normality as the null hypothesis.
library(rstatix)
library(dplyr)
result_sw <- dt %>%
group_by(sex) %>%
shapiro_test(tip_percent)
print(result_sw)
## # A tibble: 2 × 4
## sex variable statistic p
## <fct> <chr> <dbl> <dbl>
## 1 Female tip_percent 0.898 4.72e- 6
## 2 Male tip_percent 0.745 3.22e-15
The groupwise Shapiro-Wilk test showed that the distribution of the tip percentage among females, as well as the distribution of tip percentage among males, departed significantly from normality (W = 0.898, p < 0.001; W = 0.745, p < 0.001). Therefore, we can reject the null hypothesis of normality for both groups. This already indicates that I have to use a non-parametric test; however, it might still be good practice to test the other assumptions.
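The same groupwise test can also be sketched in base R without rstatix. The example below runs on simulated data, since tip.csv is not bundled here: the data frame toy, its group sizes, and the log-normal parameters are all assumptions chosen only to produce right-skewed values.

```r
set.seed(1)  # arbitrary seed so the simulated example is reproducible

# Hypothetical stand-in for dt: right-skewed tip proportions for two groups
toy <- data.frame(
  sex         = factor(rep(c("Female", "Male"), times = c(40, 60))),
  tip_percent = rlnorm(100, meanlog = log(0.15), sdlog = 0.4)
)

# Split the variable by group and run shapiro.test() on each part
sw <- lapply(split(toy$tip_percent, toy$sex), shapiro.test)

# Collect W and p for each group in one small table
data.frame(
  sex       = names(sw),
  statistic = sapply(sw, function(res) unname(res$statistic)),
  p         = sapply(sw, function(res) res$p.value)
)
```

Replacing toy with dt should give the same W and p values as the rstatix pipeline above, because shapiro_test() wraps the same base R test.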
Independence Assumption
It is somewhat safe to assume that the data comes from two independent samples. It is almost impossible that a person is in both gender groups. However, since this data was collected on multiple days, we cannot exclude the possibility that some people ate two times in a row at this place and hence were sampled two times. Nevertheless, for now, we just assume that this was not the case.
Equal Variance
To test equal variance of two samples Levene’s test is applied. This test has equal variance as null hypothesis. Levene’s test can be conducted the following way.
library(car)
leveneTest(dt$tip_percent, dt$sex)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 0.4592 0.4986
## 242
We cannot reject the null hypothesis of Levene’s Test of equal variances at a 5% alpha level, F(1, 242) = 0.46, p = 0.5. Therefore, equal variance can be assumed. However, since the normality assumption has been rejected anyway, I need to use a non-parametric alternative.
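To see what Levene’s test with center = median does, it can be reproduced by hand: it is a one-way ANOVA on the absolute deviations of each value from its group median. The sketch below shows this on simulated data; the vectors y and g and their parameters are hypothetical stand-ins for dt$tip_percent and dt$sex.

```r
set.seed(2)  # arbitrary seed for a reproducible toy example

# Hypothetical stand-ins for dt$tip_percent and dt$sex
y <- c(rnorm(30, mean = 0.16, sd = 0.05), rnorm(50, mean = 0.15, sd = 0.06))
g <- factor(rep(c("Female", "Male"), times = c(30, 50)))

# Step 1: absolute deviation of each value from its own group median
abs_dev <- abs(y - ave(y, g, FUN = median))

# Step 2: one-way ANOVA on those deviations; its F test is the Levene statistic
levene_by_hand <- anova(lm(abs_dev ~ g))
levene_by_hand
```

The degrees of freedom are the number of groups minus one and the number of observations minus the number of groups, which is why the report’s output shows 1 and 242.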
Graphical Analysis
It is also possible to analyze the normality and the equal variance assumptions graphically using histograms. We can do that by using the following code:
library(ggplot2)
gg_female <- ggplot(dt[dt$sex == "Female",], aes(x = tip_percent))+
geom_histogram(binwidth = 0.005)+
ggtitle("Female")+
xlab("Tip in percent")+
xlim(0,0.9)
gg_male <- ggplot(dt[dt$sex == "Male",], aes(x = tip_percent))+
geom_histogram(binwidth = 0.005)+
ggtitle("Male")+
xlab("Tip in percent")+
xlim(0,0.9)
library(ggpubr)
ggarrange(gg_female, gg_male,
nrow = 2)
What we can see in this graphical analysis is consistent with what we observed with the tests above. We can see that we have to reject the normality assumption because the histograms for both groups do not appear very normally distributed. The histogram for females looks somewhat bimodal, and the one for males has multiple peaks. Additionally, both histograms have at least one outlier.
Regarding the equal variance assumption, because the x-axis covers the same range in both plots and because they are arranged on top of each other, it is quite easy to compare them visually. The spread in both plots appears similar, which supports the finding of Levene’s test that the variances are roughly equal.
Nevertheless, since we cannot assume normality, we require a Wilcoxon rank sum test to analyze this relationship.
The Wilcoxon rank sum test can be performed in the following way:
wilcox.test(dt$tip_percent ~ dt$sex,
paired = FALSE,
correct = FALSE,
exact = FALSE,
alternative = "two.sided")
##
## Wilcoxon rank sum test
##
## data: dt$tip_percent by dt$sex
## W = 7619, p-value = 0.1349
## alternative hypothesis: true location shift is not equal to 0
The null hypothesis of the Wilcoxon rank sum test states that the distributions of the two groups have the same location. With p = 0.135, we fail to reject the null hypothesis at a 5% alpha level. Therefore, we fail to find a significant difference in the location of the two distributions. This can be illustrated with the below density plot of both distributions.
ggplot(dt, aes(x = tip_percent, color = sex, fill = sex)) +
geom_density(size = 1, alpha = 0.5) +
labs(title = "Density Plot of Tip Percent by Sex",
x = "Tip Percent",
y = "Density")
What we can see from this plot is that the distributions are pretty much overlapping. Therefore, the Wilcoxon rank sum test cannot find sufficient evidence for a difference here.
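The statistic W reported by wilcox.test() can be understood through a small calculation by hand: all values are ranked jointly, the ranks of the first group are summed, and the smallest possible rank sum is subtracted. The sketch below uses hypothetical toy values without ties; the vectors female and male are made up for illustration.

```r
# Hypothetical toy values (no ties), standing in for the two groups
female <- c(0.12, 0.18, 0.21, 0.15)
male   <- c(0.10, 0.14, 0.16, 0.11, 0.13)

# Rank all observations jointly, then sum the ranks of the first group
r  <- rank(c(female, male))
n1 <- length(female)
W_by_hand <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2

# wilcox.test() reports the same W for the first sample
W_test <- unname(wilcox.test(female, male, exact = FALSE, correct = FALSE)$statistic)
c(by_hand = W_by_hand, wilcox.test = W_test)
# → both 16
```

W equals the number of (female, male) pairs in which the female value is larger, here 16 out of 4 × 5 = 20 pairs. If the groups do not differ, W is expected to be close to half the number of pairs.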
The effect size can be calculated and interpreted as follows:
library(effectsize)
effectsize(wilcox.test(dt$tip_percent ~ dt$sex,
paired = FALSE,
correct = FALSE,
exact = FALSE,
alternative = "two.sided"))
## r (rank biserial) | 95% CI
## ---------------------------------
## 0.12 | [-0.04, 0.26]
interpret_rank_biserial(0.12)
## [1] "small"
## (Rules: funder2019)
We can see that the effect size with r = 0.12 is small, based on the rules by Funder and Ozer (2019).
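The rank-biserial correlation can also be recovered directly from the test statistic, using the standard relationship r = 2W / (n1 · n2) − 1. The sketch below plugs in W from the Wilcoxon output and the group sizes from the summary of sex; the variable names are chosen only for this illustration.

```r
# Values taken from the outputs above
W  <- 7619   # Wilcoxon rank sum statistic (first group: Female)
n1 <- 87     # number of Female observations
n2 <- 157    # number of Male observations

# Rank-biserial correlation: share of favourable pairs minus share of unfavourable pairs
r_rb <- 2 * W / (n1 * n2) - 1
round(r_rb, 2)
# → 0.12
```

A value of 0 would mean that neither group tends to have higher values than the other, while values of 1 or −1 would mean complete separation of the two groups.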
Research Question:
Is there a difference between the relative amounts tipped by men and women?
To examine this research question, a Wilcoxon rank sum test was applied. This test was used because the normality assumption, tested with a groupwise Shapiro-Wilk test (W = 0.898, p < 0.001; W = 0.745, p < 0.001), was violated for both groups, and therefore a non-parametric alternative to the two-sample t-test was required. The Wilcoxon rank sum test yielded W = 7619, p = 0.135, and hence we fail to reject the null hypothesis. Therefore, we find no statistically significant difference in the relative amounts tipped by men and women.
Note: