2404 respondents responded in a Mother’s Day survey with respect to their budgets and gifts to their mothers. The goal of this project is to explorarily and statistically anayze people’s plan on Mother’s Day, especially differences between mother respondents and non-mother respondents.
# Check if required packages are installed. If not, install them.
packages <- c("xlsx", "ggplot2", "reshape2", "scales", "dplyr", "tibble", "tidyr", "qdapTools", "knitr")
if (length(setdiff(packages, rownames(installed.packages() ) ) ) > 0) {
install.packages(setdiff(packages, rownames(installed.packages() ) ) )
}
lapply(packages, library, character.only = TRUE)
# import the data to R
setwd("F:\\Bin\\PlayR\\Mothers Day Survey Data")
dat <- read.xlsx("Research_Analyst_Hurdle.xlsx", sheetIndex = 2)
dat <- dat[,1:4]
dim(dat)
# Rename the columns
colnames(dat) <- c("ID", "mother", "budget", "gift")
names(dat)
summary(dat)
head(dat)
| ID | mother | budget | gift |
|---|---|---|---|
| 5578350 | Yes | $50-$100 | Jewelry (necklace, bracelet, ring, etc.) |
| 5580556 | No | $101-$150 | Flowers; Jewelry (necklace, bracelet, ring, etc.) |
| 5580593 | Yes | $151-$200 | Flowers; Spa-related gift (mani/pedi, massage, facial, etc.); Fashion/Apparel; Dining Experience (breakfast, lunch, brunch, etc.); A gift card |
| 5580602 | No | $50-$100 | Flowers; Greeting Card; Spa-related gift (mani/pedi, massage, facial, etc.); A gift card |
| 5580599 | Yes | $151-$200 | Time with family; Greeting Card; Fashion/Apparel; Dining Experience (breakfast, lunch, brunch, etc.) |
| 5580600 | No | Less than $50.00 | Flowers; Greeting Card |
Absolute and relative frequencies of buaget levels are calculated for the whole data and segemented (by respondent identity) data. Pie charts are used for data visualization.
# calculate the percetage (%) of each budget categories
Freq_table <- function(data) {
Freq <- cbind(table(data), prop.table(table(data)) )
colnames(Freq) <- c("Count", "Percentage")
Freq[, "Percentage"] <- round(Freq[, "Percentage"], digits = 3)
return(as.data.frame(Freq) )
}
budget.level <- c("Less than $50.00", "$50-$100", "$101-$150",
"$151-$200", "$201-$250", "$251-$300", "Over $300")
dat$budget <- ordered(dat$budget, levels = budget.level)
freq.table.all <- Freq_table(dat$budget)
freq.table.all <- rownames_to_column(freq.table.all, var = "Levels")
freq.table.all$Levels <- ordered(freq.table.all$Levels, levels = budget.level )
# Pie chart
# labels and embeded text position
freq.table.all <- freq.table.all %>%
mutate(percentile = Percentage*100,
labels.freq = paste0(percentile, "%"),
x.pos = c(1, 1, 1, 1.1, 1.2, 1.3, 1.4),
y.pos = 100 - (cumsum(percentile) - percentile/2 ) ) %>%
print
# use coord_polar(theta = "y") to create pie chart
budet.pie <- ggplot(freq.table.all, aes(x = "", y = percentile, fill = Levels) ) +
geom_bar(width = 1, stat = "identity") +
coord_polar(theta = "y") +
scale_fill_brewer("Budget Level") +
labs(x = "", y = "") +
ggtitle("Maximum Budgets for Mother's Day Gifts This Year\n(The Whole Data)") +
geom_text(aes(x = x.pos, y = y.pos, label = labels.freq ), col = "red", size = 4) +
theme(axis.ticks = element_blank(),
panel.grid = element_blank(),
axis.text = element_blank() ) +
theme(plot.title = element_text(size = 13, hjust = 0.5, face = "bold", color = "blue") ) +
theme(legend.text=element_text(size=10) ) +
theme(panel.background = element_rect(fill = "pink"))
# Segmentation Analysis Pie Chart (grid)
freq.table.segment <- function(data) {
freq.table <- cbind(table(data$budget), prop.table(table(data$budget)) )
colnames(freq.table) <- c("Count", "Percentage")
freq.table[, "Percentage"] <- round(freq.table[, "Percentage"], digits = 3)
freq.table <- as.data.frame(freq.table)
freq.table <- rownames_to_column(freq.table, var = "Levels")
freq.table$Levels <- ordered(freq.table$Levels, levels = budget.level)
freq.table$percentile <- freq.table$Percentage*100
freq.table$labels.freq = paste0(freq.table$percentile, "%")
freq.table$x.pos = c(1, 1, 1, 1.1, 1.2, 1.3, 1.4)
freq.table$y.pos = 100 - (cumsum(freq.table$percentile) - freq.table$percentile/2 )
return(freq.table)
}
library(plyr)
freq.table.by.mother <- ddply(dat, .(mother), freq.table.segment)
detach("package:plyr", unload=TRUE)
levels(freq.table.by.mother$mother) <- c("Non Mother", "Mother")
budet.pie.grid <- ggplot(freq.table.by.mother, aes(x = "", y = percentile, fill = Levels) ) +
geom_bar(width = 1, stat = "identity") +
facet_grid(facets=. ~ mother) +
coord_polar(theta = "y") +
scale_fill_brewer("Budget Level") +
labs(x = "", y = "") +
ggtitle("Maximum Budgets for Mother's Day Gifts This Year") +
geom_text(aes(x = x.pos, y = y.pos, label = labels.freq), col = "red", size = 6) +
theme(axis.ticks = element_blank(),
panel.grid = element_blank(),
axis.text = element_blank() ) +
theme(plot.title = element_text(size = 22, hjust = 0.5, face = "bold", color = "blue") ) +
theme(legend.text=element_text(size=10) ) +
theme(strip.text.x = element_text(size = 18, face = "bold", color = "seagreen") ) +
theme(panel.background = element_rect(fill = "pink"))
It seems that for majority people, the budgets for the gifts are less than $100. (34.7% less than $50 and 44.4% between $50 and $100)
It seems that non-monther respondents tend to have higher budgets than non-mother respondents. Expecially, there are 3% non-mother respondents willing to spend over $300 for Mother’s Day gifts, whereas there are only 1.2% mother respondents are as generous.
Absolute and relative frequencies of gift options are calculated for the whole data and segemented (by respondent identity) data. Bar Plots are used for data visualization.
# data cleaning (dirty data within one columns to multiple columns)
dat$gift <- gsub("\\n", "", dat$gift)
m1 <- (mtabulate(strsplit(as.character(dat$gift), "; "))!=0 )
dat <- bind_cols(dat, as.data.frame(m1) )
str(dat)
# frequency table and segemented frequency table
# frequency table
sum.gift <- colSums(dat[,-(1:4)])
sum.gift <- as.data.frame(sum.gift)
names(sum.gift) <- "total.number"
sum.gift <- rownames_to_column(sum.gift, var = "gift")
sum.gift$gift <- as.factor(sum.gift$gift)
sum.gift$gift <- relevel(sum.gift$gift, ref = "None of the above")
# segemented frequency table
library(plyr)
sum.gift.by.mother <- ddply(dat, .(mother), function(x) colSums(x[,-(1:4)]) )
detach("package:plyr", unload=TRUE)
sum.gift.by.mother <- as.data.frame(t(sum.gift.by.mother[,-1]) )
colnames(sum.gift.by.mother) <- c("Non Mother", "Mother")
sum.gift.by.mother <- rownames_to_column(sum.gift.by.mother, var = "gift")
sum.gift.by.mother$gift <- as.factor(sum.gift.by.mother$gift)
sum.gift.by.mother$gift <- relevel(sum.gift.by.mother$gift, ref = "None of the above")
sum.gift.by.mother.reshape <- gather(data = sum.gift.by.mother,
key = "mother.id", value = "total.number", -gift)
# Barplot
gift.bar <- ggplot(data = sum.gift, aes(x = gift, y = total.number) ) +
geom_bar(colour = "black", stat = "identity", fill = "pink", width = 0.8) +
labs(x = "Gifts", y = "Total Number of Respondents") +
ggtitle("Planned Gifts for Mother's Day (The Whole Data)") +
coord_flip() +
scale_y_continuous(breaks = seq(0, 1200, by = 200) ) +
geom_text(aes(label = total.number), hjust = 1.2, color = "black", size = 4) +
theme(plot.title = element_text(size = 20, face = "bold", color = "blue",
hjust = 0.5, vjust = 5)) +
theme(axis.title.y = element_text(face = "italic") ) +
theme(axis.title.x = element_text(face = "italic") ) +
theme(plot.margin = unit(c(1, 1, 1, 1), "cm")) +
theme(text = element_text(size = 16) ) +
theme(axis.text.x = element_text(face = "plain", color = "red", size = 12),
axis.text.y = element_text(face = "bold", color = "purple", size = 16) )
# Segmentation Analysis Barplot (stacked)
gift.bar.stack <- ggplot(data = sum.gift.by.mother.reshape,
aes(x = gift, y = total.number, fill = mother.id) ) +
geom_bar(color = "black", stat = "identity", position = "stack", width = 0.7) +
labs(x = "Gifts", y = "Total Number of Respondents",
fill = "Respondent Identity") +
ggtitle("Planned Gifts for Mother's Day") +
coord_flip() +
scale_y_continuous(breaks = seq(0, 1200, by = 200) ) +
geom_text(aes(label = total.number), hjust = 1, color = "black",
position = position_stack(vjust = 0.98), size = 4) +
theme(plot.title = element_text(size = 20, face = "bold", color = "blue",
hjust = 0.5, vjust = 5) ) +
theme(axis.title.y = element_text(face = "italic") ) +
theme(axis.title.x = element_text(face = "italic") ) +
theme(plot.margin = unit(c(1, 1, 1, 1), "cm") ) +
theme(text = element_text(size = 16) ) +
theme(axis.text.x = element_text(face = "plain", color = "red", size = 12),
axis.text.y = element_text(face = "bold", color = "purple", size = 16) )
It seems that greeting card is the most popular gift option for Mother’s Day. Flowers, dining experience and spending time with family are also favored by many people. This is consistent with the conclusion we made from the pie chart before, since these popular gifts are inexpensive. Extravagant gift options, such as jewelry and fashion/apparel are not very welcome.
The stacked bar plots do not show much differences between the distribution of gift options bwteen non-mother respondents and mother respondents. The seemingly tiny differences lie in the ratio of extravagant gift options (jewelry, spa-related gift and fashion/apparel, etc.) versus inexpensive gift options (greeting card, flowers, dining experience, etc. ). As concluded before, non-monther respondents tend to spend more money on the gifts and mother respondents care more about spiritual happiness.
Exploratory analysis dose give us some ituitive information that there might be some differences of maximum budgets between mother and non-mother. \(\chi^2\) Test will be used to statistical analyze if there is any significant assciation between maximum budgets and respondent identity.
melt.budget <- melt(freq.table.by.mother[, 1:3], id.vars = c("mother","Levels"))
contingency.budget <- dcast(melt.budget, formula = mother ~ Levels)
contingency.matrix.budget <- data.matrix(contingency.budget[, -1])
dimnames(contingency.matrix.budget) <- list(mother.id = contingency.budget$mother,
budget = freq.table.all$Levels)
chisq.test(contingency.matrix.budget)
Contingency Table of Maximum Budgets Between Mother and Non-mother
| Non Mother | Mother | |
|---|---|---|
| Less than $50.00 | 302 | 531 |
| $50-$100 | 413 | 655 |
| $101-$150 | 116 | 160 |
| $151-$200 | 56 | 59 |
| $201-$250 | 18 | 27 |
| $251-$300 | 9 | 13 |
| Over $300 | 28 | 17 |
\(\chi^2\) Test of Assciation Between Maximum Budgets and Respondent Identity
Pearson's Chi-squared test
data: contingency.matrix.budget
X-squared = 18.484, df = 6, p-value = 0.00513
\(H_0\): there is no significant association between maximum budgets and respondent identity
\(H_1\): there is significant association between maximum budgets and respondent identity
For \(\chi^2\) Test, test statistic = 18.484, p-value = 0.00513.
Thus, statistically speaking, non-mother and mother have different maximum budgets on Mother’s Day gifts.
Similarly, Exploratory analysis dose give us some ituitive information that there might be some differences of maximum budgets between mother and non-mother. \(\chi^2\) Test will be used to statistical analyze if there is any significant assciation between gift options and respondent identity.
melt.gift <- melt(freq.table.by.mother[, 1:3], id.vars = c("mother","Levels"))
contingency.gift <- dcast(melt.gift, formula = mother ~ Levels)
contingency.matrix.gift <- data.matrix(sum.gift.by.mother[, -1])
dimnames(contingency.matrix.gift) <- list(gift = sum.gift.by.mother$gift,
mother.id = colnames(contingency.matrix.gift) )
chisq.test(contingency.matrix.gift)
Contingency Table of Gift Options Between Mother and Non-mother
| Non Mother | Mother | |
|---|---|---|
| A gift card | 292 | 494 |
| Dining Experience (breakfast, lunch, brunch, etc.) | 446 | 525 |
| Fashion/Apparel | 145 | 185 |
| Flowers | 430 | 564 |
| Greeting Card | 459 | 646 |
| Jewelry (necklace, bracelet, ring, etc.) | 197 | 250 |
| None of the above | 40 | 71 |
| Spa-related gift (mani/pedi, massage, facial, etc.) | 207 | 266 |
| Time with family | 328 | 513 |
\(\chi^2\) Test of Assciation Between Gift Options and Respondent Identity
Pearson's Chi-squared test
data: contingency.matrix.gift
X-squared = 21.123, df = 8, p-value = 0.006828
\(H_0\): there is no significant association between gift options and respondent identity
\(H_1\): there is significant association between gift options and respondent identity
For \(\chi^2\) Test, test statistic = 21.123, p-value = 0.006828.
Thus, statistically speaking, non-mother and mother have different gift options on Mother’s Day.
Generally, for majority people, the budgets for the gifts are less than $100 and greeting card is the most popular gift option for Mother’s Day.
There are statistically significant diffrerences of budgets and gift options among respondents: non-monther respondents tend to spend more money on the gifts, while mother respondents care more about spiritual happiness.