# load the library for data cleansing
library(tidyverse)
setwd('D:\\Documents\\PortfolioProject\\Starbucks Costumer Survey Data Analysis')
df <- read_csv("Starbucks_satisfactory_survey.csv", show_col_types = FALSE)
Take a look at the data.
head(df)
## # A tibble: 6 x 21
## Timestamp `1. Your Gender` `2. Your Age` `3. Are you curr~ `4. What is your~
## <chr> <chr> <chr> <chr> <chr>
## 1 2019/10/01~ Female From 20 to 29 Student Less than RM25,0~
## 2 2019/10/01~ Female From 20 to 29 Student Less than RM25,0~
## 3 2019/10/01~ Male From 20 to 29 Employed Less than RM25,0~
## 4 2019/10/01~ Female From 20 to 29 Student Less than RM25,0~
## 5 2019/10/01~ Male From 20 to 29 Student Less than RM25,0~
## 6 2019/10/01~ Female From 20 to 29 Student Less than RM25,0~
## # ... with 16 more variables: 5. How often do you visit Starbucks? <chr>,
## # 6. How do you usually enjoy Starbucks? <chr>,
## # 7. How much time do you normally spend during your visit? <chr>,
## # 8. The nearest Starbucks's outlet to you is...? <chr>,
## # 9. Do you have Starbucks membership card? <chr>,
## # 10. What do you most frequently purchase at Starbucks? <chr>,
## # 11. On average, how much would you spend at Starbucks per visit? <chr>,
## # 12. How would you rate the quality of Starbucks compared to other brands (Coffee Bean, Old Town White Coffee..) to be: <dbl>,
## # 13. How would you rate the price range at Starbucks? <dbl>,
## # 14. How important are sales and promotions in your purchase decision? <dbl>,
## # 15. How would you rate the ambiance at Starbucks? (lighting, music, etc...) <dbl>,
## # 16. You rate the WiFi quality at Starbucks as.. <dbl>,
## # 17. How would you rate the service at Starbucks? (Promptness, friendliness, etc..) <dbl>,
## # 18. How likely you will choose Starbucks for doing business meetings or hangout with friends? <dbl>,
## # 19. How do you come to hear of promotions at Starbucks? Check all that apply. <chr>,
## # 20. Will you continue buying at Starbucks? <chr>
Before dropping the Timestamp column, use it as an indicator to check for duplicate rows.
print(c(nrow(df), df %>% distinct() %>% nrow()))
## [1] 122 122
There are no duplicate rows: the number of rows is still 122 after removing duplicates.
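The same check can be sketched in base R. This is a minimal illustration on a toy data frame; the `toy` object and its values are hypothetical and are not taken from the survey.

```r
# Toy data (hypothetical values, not survey responses):
# rows 2 and 3 are identical, including the timestamp
toy <- data.frame(
  Timestamp = c("t1", "t2", "t2", "t3"),
  answer    = c("Yes", "No", "No", "Yes"),
  stringsAsFactors = FALSE
)

# duplicated() flags every row that repeats an earlier row
dup_flags <- duplicated(toy)

# Compare the row count before and after removing duplicates
n_total  <- nrow(toy)
n_unique <- nrow(unique(toy))
print(c(n_total, n_unique))
```

If the two counts differ, `toy[dup_flags, ]` lists the repeated rows for inspection.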
Drop Timestamp, as it is irrelevant to understanding customer behavior. Rename the columns to shorter names. Convert the character columns to factors in preparation for multiple correspondence analysis (MCA).
# level order for each ordinal factor
age_lv <- c('Below 20', 'From 20 to 29', 'From 30 to 39', '40 and above')
income_lv <- c('Less than RM25,000', 'RM25,000 - RM50,000', 'RM50,000 - RM100,000',
               'RM100,000 - RM150,000', 'More than RM150,000')
frequency_lv <- c('Never', 'Rarely', 'Monthly', 'Weekly', 'Daily')
time_lv <- c('Below 30 minutes', 'Between 30 minutes to 1 hour', 'Between 1 hour to 2 hours',
             'Between 2 hours to 3 hours', 'More than 3 hours')
spending_lv <- c('Zero', 'Less than RM20', 'Around RM20 - RM40', 'More than RM40')
df1 <- df %>%
  select(-Timestamp) %>%
  rename(gender = "1. Your Gender",
         age = "2. Your Age",
         work_status = "3. Are you currently....?",
         annual_income = "4. What is your annual income?",
         visit_frequency = "5. How often do you visit Starbucks?",
         order_type = "6. How do you usually enjoy Starbucks?",
         time_spent = "7. How much time do you normally spend during your visit?",
         store_location = "8. The nearest Starbucks's outlet to you is...?",
         membership_card = "9. Do you have Starbucks membership card?",
         item_purchase = "10. What do you most frequently purchase at Starbucks?",
         average_spending = "11. On average, how much would you spend at Starbucks per visit?",
         quality_rate = "12. How would you rate the quality of Starbucks compared to other brands (Coffee Bean, Old Town White Coffee..) to be:",
         price_rate = "13. How would you rate the price range at Starbucks?",
         sales_promo_rate = "14. How important are sales and promotions in your purchase decision?",
         ambiance_rate = "15. How would you rate the ambiance at Starbucks? (lighting, music, etc...)",
         wifi_rate = "16. You rate the WiFi quality at Starbucks as..",
         service_rate = "17. How would you rate the service at Starbucks? (Promptness, friendliness, etc..)",
         meeting_rate = "18. How likely you will choose Starbucks for doing business meetings or hangout with friends?",
         promo_method = "19. How do you come to hear of promotions at Starbucks? Check all that apply.",
         loyal_customer = "20. Will you continue buying at Starbucks?") %>%
  mutate(gender = factor(gender),
         age = factor(age, levels = age_lv),
         work_status = factor(work_status),
         annual_income = factor(annual_income, levels = income_lv),
         visit_frequency = factor(visit_frequency, levels = frequency_lv),
         order_type = recode_factor(order_type, "Dine in" = "Dine in", "Take away" = "Take away", "Drive-thru" = "Drive-thru", .default = "Never", .missing = "Never"),
         store_location = factor(store_location),
         time_spent = factor(time_spent, levels = time_lv),
         membership_card = factor(membership_card),
         average_spending = factor(average_spending, levels = spending_lv),
         loyal_customer = factor(loyal_customer))
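One thing worth checking when levels are listed by hand: factor() silently turns any value that is not in the levels vector into NA. The sketch below shows one possible way to catch that on toy input. The `responses` vector is hypothetical, and "Sometimes" is deliberately left out of the levels to trigger the problem.

```r
# Hypothetical responses; "Sometimes" is deliberately absent from the levels
responses <- c("Weekly", "Rarely", "Sometimes", "Daily")
frequency_levels <- c("Never", "Rarely", "Monthly", "Weekly", "Daily")

# Values missing from `levels` become NA without any warning
f <- factor(responses, levels = frequency_levels)

# Responses that were not NA before the conversion but are NA after it
unmatched <- unique(responses[is.na(f) & !is.na(responses)])
print(unmatched)
```

An empty result means every response matched a level, so the conversion lost nothing.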
library("FactoMineR")
library(ggrepel)
# subset to run mca
df_mca<- df1 %>%
dplyr::select(gender, age, work_status, annual_income, visit_frequency, order_type, time_spent, store_location, membership_card, average_spending, loyal_customer)
# number of categories per variable
cats <- apply(df_mca, 2, function(x) nlevels(as.factor(x)))
# apply MCA
mca_result<- MCA(df_mca, graph = FALSE)
# data frame with variable coordinates
mca_result_vars_df = data.frame(mca_result$var$coord, Variable = rep(names(cats), cats))
#Create a custom color scale
library(RColorBrewer)
myColors <- brewer.pal(11, "Paired")
names(myColors) <- levels(factor(mca_result_vars_df$Variable))
colScale <- scale_colour_manual(name = "Group",values = myColors)
Interpreting the relationship between loyalty and customer behavior.
How to read the MCA plot:
Length of the line connecting the row label (loyalty) to the origin:
Longer lines indicate that loyalty is strongly associated with some of the column labels.
Length of the line connecting the column label (customer behavior) to the origin:
Longer lines indicate a strong association between that customer behavior and loyalty.
Angle formed between these two lines:
Very small angles indicate a positive association. Angles near 90 degrees indicate no relationship. Angles near 180 degrees indicate a negative association.
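The angle rule can also be computed instead of judged by eye. Below is a minimal sketch, assuming each category is represented by its (Dim.1, Dim.2) coordinates. The helper `angle_deg` and the coordinate values are hypothetical and chosen only to illustrate the three cases.

```r
# Angle in degrees between two vectors that start at the origin
angle_deg <- function(a, b) {
  cos_theta <- sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
  # Clamp to [-1, 1] to guard against floating-point overshoot
  cos_theta <- max(-1, min(1, cos_theta))
  acos(cos_theta) * 180 / pi
}

# Hypothetical coordinates, not taken from the MCA result
loyal_yes  <- c(1, 0)
category_a <- c(2, 0)   # same direction as loyal_yes
category_b <- c(0, 3)   # perpendicular to loyal_yes
category_c <- c(-1, 0)  # opposite direction to loyal_yes

angles <- c(
  a = angle_deg(loyal_yes, category_a),
  b = angle_deg(loyal_yes, category_b),
  c = angle_deg(loyal_yes, category_c)
)
print(round(angles, 1))
```

With the actual result, the two vectors would be rows of the first two columns of mca_result$var$coord. The angle only describes the two plotted dimensions, so it is a reading aid rather than a test of association.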
# plot of variable categories
ggplot(data = mca_result_vars_df,
       aes(x = Dim.1, y = Dim.2, colour = Variable, label = rownames(mca_result_vars_df))) +
  geom_hline(yintercept = 0, colour = "gray70") +
  geom_vline(xintercept = 0, colour = "gray70") +
  # draw a line from the origin to each category
  geom_segment(aes(xend = Dim.1, yend = Dim.2), x = 0, y = 0) +
  geom_point() +
  geom_text_repel(max.overlaps = 5) +
  colScale +
  ggtitle("MCA plot: loyalty and customer behavior")
First, focus on Yes (right-hand side). Customers with an annual income of RM50,000 to RM100,000 form a very small angle with loyal customers, which indicates a strong association. They are followed by customers with an annual income of RM25,000 to RM50,000. This suggests that Starbucks' service image successfully appeals to these customer groups.
Customers aged 30 to 39 have the smallest angle of any age group.
Employed customers tend to be bigger fans of Starbucks than housewives and the self-employed.
Living 1 km to 3 km from the nearest store appears to encourage regular visits.
Now focus on No (left-hand side). Young customers, students, and customers with an annual income below RM25,000 are unlikely to be loyal customers.
Although some groups (aged 40 and above, annual income RM100,000 to RM150,000, and annual income above RM150,000) lie far from the origin, their angle with loyalty is close to 90 degrees, which indicates no clear relationship.
Focusing on Yes again, customers who visit weekly tend to be loyal.
Loyal customers favor drive-thru.
The membership card is an effective sales tool for building a connection with customers.
Customers who spend around RM20 to RM40 per visit are likely to be loyal customers.
library(ggcorrplot)
df1 %>%
  select(quality_rate, price_rate, sales_promo_rate, ambiance_rate, wifi_rate, service_rate, meeting_rate, loyal_customer) %>%
  mutate(loyal_customer = ifelse(loyal_customer == "No", 0, 1)) %>%
  cor() %>%
  ggcorrplot(hc.order = TRUE, type = "lower", lab = TRUE)
Price and quality ratings each have a moderate positive correlation with customer loyalty at Starbucks.
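Because loyalty is coded as 0/1 before calling cor(), its correlation with a rating is a Pearson correlation between a binary and a numeric variable (often called a point-biserial correlation). The sketch below illustrates the coding on made-up values. The `rating` and `loyal` vectors are hypothetical and do not come from the survey.

```r
# Hypothetical ratings and loyalty answers (not survey data)
rating <- c(2, 3, 3, 4, 4, 5, 5, 5)
loyal  <- c("No", "No", "Yes", "No", "Yes", "Yes", "Yes", "Yes")

# Same coding as in the analysis: "No" becomes 0, anything else becomes 1
loyal01 <- ifelse(loyal == "No", 0, 1)

# Pearson correlation between the rating and the 0/1 loyalty indicator
r <- cor(rating, loyal01)

# Reversing the coding only flips the sign of the correlation
r_flipped <- cor(rating, 1 - loyal01)
print(c(r, r_flipped))
```

A positive value means the "Yes" group gives higher ratings on average. The sign depends entirely on which answer is coded as 1.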
Write a helper function to avoid repeating the plotting code.
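# Bar chart of one variable, split into one panel per loyalty answer.
# `column` is looked up in the function's environment, so pass a vector
# aligned with the rows of df1 (for example df1$quality_rate).
bar_loyal <- function(column, title) {
  ggplot(data = df1, aes(x = column, fill = as.factor(loyal_customer))) +
    geom_bar(color = "white") +
    facet_wrap(~loyal_customer) +
    ggtitle(title) +
    theme(axis.title.x = element_blank())
}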
People who said they will continue buying at Starbucks rate its quality higher than that of other brands, while the others see no difference between the coffee brands, and their ratings have a shape similar to a normal distribution. Ambiance and WiFi ratings show a trend similar to quality.
People who said they will continue buying at Starbucks mainly agree with the price range, while the others view it unfavorably.
Both groups agree that sales and promotions influence their purchase decisions. Both groups say the WiFi quality is adequate for their needs. Both groups are satisfied with the service provided.