Catalog

 
gift    

Summary


2404 respondents responded in a Mother’s Day survey with respect to their budgets and gifts to their mothers. The goal of this project is to explorarily and statistically anayze people’s plan on Mother’s Day, especially differences between mother respondents and non-mother respondents.

 
 

Import Data


# Check if required packages are installed. If not, install them.

packages <- c("xlsx", "ggplot2", "reshape2", "scales", "dplyr", "tibble", "tidyr", "qdapTools", "knitr")
if (length(setdiff(packages, rownames(installed.packages() ) ) ) > 0)   {
  install.packages(setdiff(packages, rownames(installed.packages() ) ) )  
}

lapply(packages, library, character.only = TRUE)

# import the data to R

setwd("F:\\Bin\\PlayR\\Mothers Day Survey Data")
dat <- read.xlsx("Research_Analyst_Hurdle.xlsx", sheetIndex = 2)
dat <- dat[,1:4]
dim(dat)

# Rename the columns

colnames(dat) <- c("ID", "mother", "budget", "gift")
names(dat)
summary(dat)
head(dat)

 
 
 

Raw Data (First 6 rows)


ID mother budget gift
5578350 Yes $50-$100 Jewelry (necklace, bracelet, ring, etc.)
5580556 No $101-$150 Flowers; Jewelry (necklace, bracelet, ring, etc.)
5580593 Yes $151-$200 Flowers; Spa-related gift (mani/pedi, massage, facial, etc.); Fashion/Apparel; Dining Experience (breakfast, lunch, brunch, etc.); A gift card
5580602 No $50-$100 Flowers; Greeting Card; Spa-related gift (mani/pedi, massage, facial, etc.); A gift card
5580599 Yes $151-$200 Time with family; Greeting Card; Fashion/Apparel; Dining Experience (breakfast, lunch, brunch, etc.)
5580600 No Less than $50.00 Flowers; Greeting Card

 
 
 

Task 1: Analysis of Maximum Budgets for Mother’s Day Gifts


Absolute and relative frequencies of buaget levels are calculated for the whole data and segemented (by respondent identity) data. Pie charts are used for data visualization.

 

# calculate the percetage (%) of each budget categories

Freq_table <- function(data) {
  Freq <- cbind(table(data), prop.table(table(data)) )
  colnames(Freq) <- c("Count", "Percentage")
  Freq[, "Percentage"] <- round(Freq[, "Percentage"], digits = 3)
  return(as.data.frame(Freq) )
}

budget.level <- c("Less than $50.00", "$50-$100", "$101-$150", 
                  "$151-$200", "$201-$250", "$251-$300", "Over $300")

dat$budget <- ordered(dat$budget, levels =  budget.level)
freq.table.all <- Freq_table(dat$budget)
freq.table.all <- rownames_to_column(freq.table.all, var = "Levels")
freq.table.all$Levels <- ordered(freq.table.all$Levels, levels = budget.level )

# Pie chart

# labels and embeded text position

freq.table.all <- freq.table.all %>% 
  mutate(percentile = Percentage*100,
         labels.freq = paste0(percentile, "%"),
         x.pos = c(1, 1, 1, 1.1, 1.2, 1.3, 1.4),
         y.pos = 100 - (cumsum(percentile) - percentile/2 ) ) %>% 
  print
  

# use coord_polar(theta = "y") to create pie chart

budet.pie <- ggplot(freq.table.all, aes(x = "", y = percentile, fill = Levels) ) +
  geom_bar(width = 1, stat = "identity") +
  coord_polar(theta = "y") +
  scale_fill_brewer("Budget Level") +
  labs(x = "", y = "") +
  ggtitle("Maximum Budgets for Mother's Day Gifts This Year\n(The Whole Data)") +
  geom_text(aes(x = x.pos, y = y.pos, label = labels.freq ), col = "red", size = 4)  +
  theme(axis.ticks = element_blank(), 
        panel.grid = element_blank(), 
        axis.text  = element_blank() ) +
  theme(plot.title = element_text(size = 13, hjust = 0.5, face = "bold", color = "blue") ) +
  theme(legend.text=element_text(size=10) )  +
  theme(panel.background = element_rect(fill = "pink")) 


# Segmentation Analysis Pie Chart (grid)

freq.table.segment <- function(data) {
  freq.table <- cbind(table(data$budget), prop.table(table(data$budget)) )
  colnames(freq.table) <- c("Count", "Percentage")
  freq.table[, "Percentage"] <- round(freq.table[, "Percentage"], digits = 3)
  freq.table <- as.data.frame(freq.table)
  freq.table <- rownames_to_column(freq.table, var = "Levels")
  freq.table$Levels <- ordered(freq.table$Levels, levels = budget.level)
  freq.table$percentile <- freq.table$Percentage*100
  freq.table$labels.freq = paste0(freq.table$percentile, "%")
  freq.table$x.pos = c(1, 1, 1, 1.1, 1.2, 1.3, 1.4)
  freq.table$y.pos = 100 - (cumsum(freq.table$percentile) - freq.table$percentile/2 )
  return(freq.table)
}


library(plyr)
freq.table.by.mother <- ddply(dat, .(mother), freq.table.segment)
detach("package:plyr", unload=TRUE)

levels(freq.table.by.mother$mother) <- c("Non Mother", "Mother")

budet.pie.grid <- ggplot(freq.table.by.mother, aes(x = "", y = percentile, fill = Levels) ) +
  geom_bar(width = 1, stat = "identity") +
  facet_grid(facets=. ~ mother) +
  coord_polar(theta = "y") +
  scale_fill_brewer("Budget Level") +
  labs(x = "", y = "") +
  ggtitle("Maximum Budgets for Mother's Day Gifts This Year") +
  geom_text(aes(x = x.pos, y = y.pos, label = labels.freq), col = "red", size = 6)  +
  theme(axis.ticks = element_blank(), 
        panel.grid = element_blank(), 
        axis.text  = element_blank() ) +
  theme(plot.title = element_text(size = 22, hjust = 0.5, face = "bold", color = "blue") ) +
  theme(legend.text=element_text(size=10) ) +
  theme(strip.text.x = element_text(size = 18, face = "bold", color = "seagreen") )  +
  theme(panel.background = element_rect(fill = "pink")) 

 
 

It seems that for majority people, the budgets for the gifts are less than $100. (34.7% less than $50 and 44.4% between $50 and $100)

 
 
 

It seems that non-monther respondents tend to have higher budgets than non-mother respondents. Expecially, there are 3% non-mother respondents willing to spend over $300 for Mother’s Day gifts, whereas there are only 1.2% mother respondents are as generous.

 
 
 
 

Task 2: Analysis of Gift Options for Mother’s Day


Absolute and relative frequencies of gift options are calculated for the whole data and segemented (by respondent identity) data. Bar Plots are used for data visualization.

 

# data cleaning (dirty data within one columns to multiple columns)

dat$gift <- gsub("\\n", "", dat$gift)
m1 <- (mtabulate(strsplit(as.character(dat$gift), "; "))!=0 )
dat <- bind_cols(dat, as.data.frame(m1) )
str(dat)


# frequency table and segemented frequency table

# frequency table

sum.gift <- colSums(dat[,-(1:4)])
sum.gift <- as.data.frame(sum.gift)
names(sum.gift) <- "total.number"
sum.gift <- rownames_to_column(sum.gift, var = "gift")
sum.gift$gift <- as.factor(sum.gift$gift)
sum.gift$gift <- relevel(sum.gift$gift, ref = "None of the above")

# segemented frequency table

library(plyr)
sum.gift.by.mother <- ddply(dat, .(mother), function(x) colSums(x[,-(1:4)]) )
detach("package:plyr", unload=TRUE)

sum.gift.by.mother <- as.data.frame(t(sum.gift.by.mother[,-1]) )
colnames(sum.gift.by.mother) <- c("Non Mother", "Mother")
sum.gift.by.mother <- rownames_to_column(sum.gift.by.mother, var = "gift")
sum.gift.by.mother$gift <- as.factor(sum.gift.by.mother$gift)
sum.gift.by.mother$gift <- relevel(sum.gift.by.mother$gift, ref = "None of the above")
sum.gift.by.mother.reshape <- gather(data = sum.gift.by.mother, 
                   key = "mother.id", value = "total.number", -gift)


# Barplot 

gift.bar <- ggplot(data = sum.gift, aes(x = gift, y = total.number) )  + 
  geom_bar(colour = "black", stat = "identity", fill = "pink", width = 0.8)  +                        
  labs(x = "Gifts", y = "Total Number of Respondents")  + 
  ggtitle("Planned Gifts for Mother's Day (The Whole Data)")  +     
  coord_flip()  +
  scale_y_continuous(breaks = seq(0, 1200, by = 200) )  +
  geom_text(aes(label = total.number), hjust = 1.2, color = "black", size = 4)  +
  theme(plot.title = element_text(size = 20, face = "bold", color = "blue", 
                                  hjust = 0.5, vjust = 5))  +
  theme(axis.title.y = element_text(face = "italic") )  +
  theme(axis.title.x = element_text(face = "italic") )  + 
  theme(plot.margin = unit(c(1, 1, 1, 1), "cm"))  +
  theme(text = element_text(size = 16) )  +
  theme(axis.text.x = element_text(face = "plain", color = "red", size = 12),
        axis.text.y = element_text(face = "bold", color = "purple", size = 16) )  


# Segmentation Analysis Barplot (stacked)

gift.bar.stack <- ggplot(data = sum.gift.by.mother.reshape, 
                         aes(x = gift, y = total.number, fill = mother.id) )  + 
  geom_bar(color = "black", stat = "identity", position = "stack", width = 0.7)  +                        
  labs(x = "Gifts", y = "Total Number of Respondents", 
       fill = "Respondent Identity")  + 
  ggtitle("Planned Gifts for Mother's Day")  +     
  coord_flip()  +
  scale_y_continuous(breaks = seq(0, 1200, by = 200) )  +
  geom_text(aes(label = total.number), hjust = 1, color = "black",
            position = position_stack(vjust = 0.98), size = 4)  +
  theme(plot.title = element_text(size = 20, face = "bold", color = "blue", 
                                  hjust = 0.5, vjust = 5) )  +
  theme(axis.title.y = element_text(face = "italic") )  +
  theme(axis.title.x = element_text(face = "italic") )  + 
  theme(plot.margin = unit(c(1, 1, 1, 1), "cm") )  +
  theme(text = element_text(size = 16) )  +
  theme(axis.text.x = element_text(face = "plain", color = "red", size = 12),
        axis.text.y = element_text(face = "bold", color = "purple", size = 16) )

 
 

It seems that greeting card is the most popular gift option for Mother’s Day. Flowers, dining experience and spending time with family are also favored by many people. This is consistent with the conclusion we made from the pie chart before, since these popular gifts are inexpensive. Extravagant gift options, such as jewelry and fashion/apparel are not very welcome.

 
 
 

The stacked bar plots do not show much differences between the distribution of gift options bwteen non-mother respondents and mother respondents. The seemingly tiny differences lie in the ratio of extravagant gift options (jewelry, spa-related gift and fashion/apparel, etc.) versus inexpensive gift options (greeting card, flowers, dining experience, etc. ). As concluded before, non-monther respondents tend to spend more money on the gifts and mother respondents care more about spiritual happiness.

   


 

Task 3: Statistically Analyze the differences of Maximum Budgets Between Mother and Non-mother


Exploratory analysis dose give us some ituitive information that there might be some differences of maximum budgets between mother and non-mother. \(\chi^2\) Test will be used to statistical analyze if there is any significant assciation between maximum budgets and respondent identity.

melt.budget <- melt(freq.table.by.mother[, 1:3], id.vars = c("mother","Levels"))    
contingency.budget <- dcast(melt.budget, formula = mother ~ Levels)
contingency.matrix.budget <- data.matrix(contingency.budget[, -1])
dimnames(contingency.matrix.budget) <- list(mother.id = contingency.budget$mother,
                                            budget = freq.table.all$Levels)

chisq.test(contingency.matrix.budget)
 

Contingency Table of Maximum Budgets Between Mother and Non-mother

Non Mother Mother
Less than $50.00 302 531
$50-$100 413 655
$101-$150 116 160
$151-$200 56 59
$201-$250 18 27
$251-$300 9 13
Over $300 28 17
 

\(\chi^2\) Test of Assciation Between Maximum Budgets and Respondent Identity


    Pearson's Chi-squared test

data:  contingency.matrix.budget
X-squared = 18.484, df = 6, p-value = 0.00513

 

\(H_0\): there is no significant association between maximum budgets and respondent identity
\(H_1\): there is significant association between maximum budgets and respondent identity
For \(\chi^2\) Test, test statistic = 18.484, p-value = 0.00513.
Thus, statistically speaking, non-mother and mother have different maximum budgets on Mother’s Day gifts.

 


 

Task 4: Statistically Analyze the differences of Gift Options Between Mother and Non-mother


Similarly, Exploratory analysis dose give us some ituitive information that there might be some differences of maximum budgets between mother and non-mother. \(\chi^2\) Test will be used to statistical analyze if there is any significant assciation between gift options and respondent identity.

melt.gift <- melt(freq.table.by.mother[, 1:3], id.vars = c("mother","Levels"))    
contingency.gift <- dcast(melt.gift, formula = mother ~ Levels)

contingency.matrix.gift <- data.matrix(sum.gift.by.mother[, -1])
dimnames(contingency.matrix.gift) <- list(gift = sum.gift.by.mother$gift,
                                     mother.id = colnames(contingency.matrix.gift) )

chisq.test(contingency.matrix.gift)
 

Contingency Table of Gift Options Between Mother and Non-mother

Non Mother Mother
A gift card 292 494
Dining Experience (breakfast, lunch, brunch, etc.) 446 525
Fashion/Apparel 145 185
Flowers 430 564
Greeting Card 459 646
Jewelry (necklace, bracelet, ring, etc.) 197 250
None of the above 40 71
Spa-related gift (mani/pedi, massage, facial, etc.) 207 266
Time with family 328 513
 

\(\chi^2\) Test of Assciation Between Gift Options and Respondent Identity


    Pearson's Chi-squared test

data:  contingency.matrix.gift
X-squared = 21.123, df = 8, p-value = 0.006828

 

\(H_0\): there is no significant association between gift options and respondent identity
\(H_1\): there is significant association between gift options and respondent identity
For \(\chi^2\) Test, test statistic = 21.123, p-value = 0.006828.
Thus, statistically speaking, non-mother and mother have different gift options on Mother’s Day.
 
 

Conclusion