Dalos Cuisine Relaunch Feasibility Study: An Exploratory & Inferential Analysis of Consumer Preferences in Lagos

Author

Nwodo Ezinne

Published

May 26, 2026

1. Executive Summary

This study analyses consumer demand and preference data collected to assess the feasibility of relaunching Dalos Cuisine — a traditional Nigerian restaurant that previously operated in Aaron’s Mall, Lekki Phase 1, Lagos, in 2020. A structured survey was administered to 100 confirmed Lagos-resident respondents between March and April 2026, producing a dataset of 76 variables covering demographics, dining behaviours, service-quality expectations, spending patterns, and relaunch sentiment.

Five analytical techniques were applied: (1) Exploratory Data Analysis showed that food taste, hygiene, and consistency are the dominant selection criteria, while inconsistent quality and poor hygiene lead all dissatisfaction drivers; (2) Data Visualisation linked spending power, visit frequency, and channel preference in a five-plot narrative; (3) Hypothesis Testing found that spend per meal differs significantly across education groups (Kruskal–Wallis p = 0.009, moderate effect), whereas premium-price willingness did not differ significantly by employment status (simulated chi-squared p = 0.55); (4) Correlation Analysis showed that core food-quality attributes are moderately inter-correlated (Spearman ρ ≈ 0.58–0.61) and that delivery and online-ordering importance move together (ρ ≈ 0.70); and (5) Logistic Regression produced its largest positive odds ratios for food taste importance, visit frequency, and spend band, although no individual predictor reached significance at α = 0.05 in the 70-observation training set (AUC > 0.70).

Key recommendation: Dalos Cuisine should relaunch with an uncompromising quality-first proposition, a day-one delivery channel, and pricing in the ₦3,000–₦8,000 range, targeting employed professionals in the Victoria Island / Lekki / Ikoyi corridor.


2. Professional Disclosure

Job Title: Marketing Communications Lead
Organisation: Knowledge Exchange Centre
Location: Lagos, Nigeria

Why each technique is operationally relevant:

  • EDA: With 76 survey variables spanning Likert scales, categorical fields, and free text, rigorous EDA is essential to surface data quality issues before any inference is drawn. Undetected outliers or encoding errors in a feasibility study directly mislead investment decisions.

  • Data Visualisation: The end audience for this study is a business owner and potential investors — non-technical stakeholders who require charts, not tables. Visualisation translates multi-dimensional preference data into a single, actionable story about who the customer is and what they want.

  • Hypothesis Testing: Pricing strategy and segment targeting require statistical evidence, not intuition. Formal tests with stated α levels convert descriptive observations (“postgraduates seem to spend more”) into defensible business decisions (“postgraduates spend significantly more; p < 0.05”).

  • Correlation Analysis: Understanding attribute co-movement helps design a coherent service proposition and prevents redundant investment. It also flags multicollinearity before regression modelling.

  • Logistic Regression: The business question is ultimately binary — will this person come? Logistic regression quantifies each attribute’s contribution to that probability and produces an odds ratio that management can translate into a concrete action.


3. Data Collection & Sampling

Source: Primary data collected by the author (Nwodo Ezinne) via a structured Google Form survey.
Collection method: Self-administered online questionnaire distributed through WhatsApp, LinkedIn, and direct outreach within the researcher’s professional and social network in Lagos.
Target population: Adults residing in Lagos State who eat traditional Nigerian food outside the home at least occasionally.
Sampling frame: Non-probability convenience/snowball sample targeting respondents across Lagos Mainland, Lekki, Victoria Island, Ajah, and Ikoyi.
Sample size: 117 total responses; 100 confirmed Lagos residents retained after excluding 17 non-Lagos respondents.
Time period: 24 March 2026 – 30 April 2026 (just over five weeks).
Ethical notes: No personally identifiable information was collected. Participation was voluntary; consent was implied by form submission.


4. Data Description

Code
# NOTE: machine-specific path; change it (or open the project folder) before running
setwd("C:/Users/zinny/OneDrive/Desktop/DA EXAM/DA Exam")

library(tidyverse); library(readxl);  library(janitor)
library(skimr);     library(corrplot); library(ggcorrplot)
library(scales);    library(knitr);    library(kableExtra)
library(broom);     library(pROC);     library(car)
library(rstatix);   library(ggpubr);   library(viridis)
library(patchwork); library(effectsize)

raw <- read_excel("Dalos Data.xlsx")
colnames(raw) <- paste0("c", seq_len(ncol(raw)))

df_raw <- raw |> filter(str_trim(as.character(c2)) == "Yes")
cat("Lagos-resident respondents retained:", nrow(df_raw), "\n")
Lagos-resident respondents retained: 100 
Code
cat("Total variables:", ncol(df_raw), "\n")
Total variables: 76 
Code
df <- df_raw |>
  rename(
    timestamp=c1, area_live=c3, area_work=c4, gender=c5,
    education=c6, employment=c7, marital_status=c9, household_size=c10,
    visit_frequency=c11, fav_soups=c12, dining_channel=c15, spend_raw=c16,
    imp_taste=c17, imp_freshness=c18, imp_consistency=c19, imp_portions=c20,
    imp_hygiene=c21, imp_ambience=c22, imp_location=c23, imp_parking=c24,
    imp_speed=c25, imp_staff=c26, imp_delivery=c27, imp_online_ord=c28,
    imp_pricing=c29, imp_variety=c30, imp_takeaway=c31, imp_authentic=c32,
    dissatisfaction=c33, premium_willing=c34, pref_setting=c35,
    aware_dalos=c46, overall_exp=c47, food_quality_exp=c48, relaunch_intent=c50
  )

encode_likert <- function(x) {
  x <- str_to_lower(str_trim(iconv(as.character(x), to="ASCII//TRANSLIT")))
  dplyr::case_when(
    str_detect(x,"not important") ~ 1L, str_detect(x,"slightly")   ~ 2L,
    str_detect(x,"moderately")    ~ 3L, str_detect(x,"^important") ~ 4L,
    str_detect(x,"extremely")     ~ 5L, TRUE ~ NA_integer_
  )
}

likert_cols <- c("imp_taste","imp_freshness","imp_consistency","imp_portions",
                 "imp_hygiene","imp_ambience","imp_location","imp_parking",
                 "imp_speed","imp_staff","imp_delivery","imp_online_ord",
                 "imp_pricing","imp_variety","imp_takeaway","imp_authentic")

df <- df |> mutate(across(all_of(likert_cols), encode_likert)) |>
  mutate(
    spend_num = dplyr::case_when(
      str_detect(as.character(spend_raw),"(?i)below|elow")        ~ 1L,
      str_detect(as.character(spend_raw),"1.?500|1500")           ~ 2L,
      str_detect(as.character(spend_raw),"3.?001|3001")           ~ 3L,
      str_detect(as.character(spend_raw),"5.?001|5001")           ~ 4L,
      str_detect(as.character(spend_raw),"(?i)above|bove|8.?000") ~ 5L,
      TRUE ~ NA_integer_
    ),
    spend_label = factor(dplyr::case_when(
      spend_num==1L~"Below N1,500", spend_num==2L~"N1,500-3,000",
      spend_num==3L~"N3,001-5,000", spend_num==4L~"N5,001-8,000",
      spend_num==5L~"Above N8,000", TRUE~NA_character_),
      levels=c("Below N1,500","N1,500-3,000","N3,001-5,000","N5,001-8,000","Above N8,000")),
    visit_num = dplyr::case_when(
      str_detect(as.character(visit_frequency),"(?i)less")              ~ 1L,
      str_detect(as.character(visit_frequency),"(?i)month")             ~ 2L,
      str_detect(as.character(visit_frequency),"(?i)1.2.*week|1.*2.*week") ~ 3L,
      str_detect(as.character(visit_frequency),"(?i)3.4|3.*4")          ~ 4L,
      str_detect(as.character(visit_frequency),"(?i)daily")             ~ 5L,
      TRUE ~ NA_integer_
    ),
    freq_label = factor(dplyr::case_when(
      visit_num==1L~"< Once/month", visit_num==2L~"1-2x/month",
      visit_num==3L~"1-2x/week",   visit_num==4L~"3-4x/week",
      visit_num==5L~"Daily",        TRUE~NA_character_),
      levels=c("< Once/month","1-2x/month","1-2x/week","3-4x/week","Daily")),
    edu_group = factor(dplyr::case_when(
      str_detect(as.character(education),"(?i)secondary|waec|neco|ond") ~ "Secondary/OND",
      str_detect(as.character(education),"(?i)hnd")                     ~ "HND",
      str_detect(as.character(education),"(?i)bachelor|b\\.sc|b\\.a")   ~ "Bachelor's",
      str_detect(as.character(education),"(?i)postgrad|mba|m\\.sc|ph")  ~ "Postgraduate",
      str_detect(as.character(education),"(?i)professional|cert")       ~ "Professional Cert",
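      TRUE~"Other"),
      # "Other" is listed as a level so unmatched responses are kept instead of silently becoming NA
      levels=c("Secondary/OND","HND","Bachelor's","Postgraduate","Professional Cert","Other")),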
    emp_group = dplyr::case_when(
      str_detect(as.character(employment),"(?i)private")           ~ "Private sector",
      str_detect(as.character(employment),"(?i)self|business")     ~ "Self-employed",
      str_detect(as.character(employment),"(?i)student")           ~ "Student",
      str_detect(as.character(employment),"(?i)unemploy")          ~ "Unemployed",
      str_detect(as.character(employment),"(?i)public|gov|church") ~ "Public/Other",
      TRUE ~ "Other"
    ),
    intent_binary = if_else(
      str_detect(as.character(relaunch_intent),"(?i)very likely|extremely likely"),1L,0L),
    premium_binary = if_else(
      str_detect(as.character(premium_willing),"(?i)definitely yes|probably yes"),1L,0L),
    spend_num = if_else(is.na(spend_num),as.integer(median(spend_num,na.rm=TRUE)),spend_num),
    visit_num = if_else(is.na(visit_num),as.integer(median(visit_num,na.rm=TRUE)),visit_num)
  )

cat("Clean dataset:", nrow(df), "rows\n")
Clean dataset: 100 rows
Code
cat("\nSpend:\n");  print(table(df$spend_label,  useNA="ifany"))

Spend:

Below N1,500 N1,500-3,000 N3,001-5,000 N5,001-8,000 Above N8,000 
           5           30           32           21           12 
Code
cat("\nIntent:\n"); print(table(df$intent_binary, useNA="ifany"))

Intent:

 0  1 
52 48 
Code
tibble(
  `#`=1:11,
  Variable=c("gender","education / edu_group","employment / emp_group",
             "household_size","visit_frequency / visit_num","dining_channel",
             "spend_raw / spend_num","imp_taste … imp_authentic (16 cols)",
             "premium_willing / premium_binary","relaunch_intent / intent_binary","aware_dalos"),
  Type=c("Categorical","Categorical / Grouped","Categorical / Grouped",
         "Ordinal text","Ordinal text / Numeric 1-5","Categorical",
         "Ordinal text / Numeric 1-5","Likert text / Numeric 1-5",
         "Categorical / Binary 0-1","Ordinal text / Binary 0-1","Categorical"),
  Role=c("Demographic","Demographic / Predictor","Demographic / Predictor",
         "Contextual","Predictor","Predictor","Outcome + Predictor",
         "Predictors","Outcome","Primary Outcome","Descriptor")
) |> kable(caption="Variable inventory") |>
  kable_styling(bootstrap_options=c("striped","hover"))
Variable inventory
#   Variable                               Type                         Role
1   gender                                 Categorical                  Demographic
2   education / edu_group                  Categorical / Grouped        Demographic / Predictor
3   employment / emp_group                 Categorical / Grouped        Demographic / Predictor
4   household_size                         Ordinal text                 Contextual
5   visit_frequency / visit_num            Ordinal text / Numeric 1-5   Predictor
6   dining_channel                         Categorical                  Predictor
7   spend_raw / spend_num                  Ordinal text / Numeric 1-5   Outcome + Predictor
8   imp_taste … imp_authentic (16 cols)    Likert text / Numeric 1-5    Predictors
9   premium_willing / premium_binary       Categorical / Binary 0-1     Outcome
10  relaunch_intent / intent_binary        Ordinal text / Binary 0-1    Primary Outcome
11  aware_dalos                            Categorical                  Descriptor
Code
df |>
  select(spend_num,visit_num,imp_taste,imp_freshness,
         imp_consistency,imp_hygiene,imp_pricing,imp_delivery) |>
  skim() |> as_tibble() |>
  select(skim_variable,n_missing,numeric.mean,numeric.sd,
         numeric.p25,numeric.p50,numeric.p75) |>
  kable(digits=2, caption="Summary statistics — key numeric variables") |>
  kable_styling(bootstrap_options=c("striped","hover"))
Summary statistics — key numeric variables
skim_variable    n_missing  numeric.mean  numeric.sd  numeric.p25  numeric.p50  numeric.p75
spend_num                0          3.05        1.10            2            3         4.00
visit_num                0          2.84        1.00            2            3         3.25
imp_taste                0          4.59        0.77            4            5         5.00
imp_freshness            0          4.60        0.71            4            5         5.00
imp_consistency          0          4.61        0.68            4            5         5.00
imp_hygiene              0          4.69        0.66            5            5         5.00
imp_pricing              0          4.43        0.84            4            5         5.00
imp_delivery             0          3.51        1.11            3            4         4.00

5. Technique 1 — Exploratory Data Analysis

5.1 Theory Recap

Exploratory Data Analysis (Tukey, 1977) interrogates data through numerical summaries and graphics before modelling. It identifies missing values, outliers, distributional skewness, and structural patterns. Anscombe’s Quartet (1973) demonstrated that datasets with identical summary statistics can differ radically in shape — making visual EDA non-negotiable.
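The Anscombe point can be checked directly. The sketch below is an illustration only: it uses the `anscombe` data frame that ships with base R and none of the survey data.

```r
# Anscombe's quartet is available in base R as `anscombe`
# (columns x1..x4 and y1..y4).
data(anscombe)

summary_tbl <- data.frame(
  set    = 1:4,
  mean_y = sapply(1:4, function(i) mean(anscombe[[paste0("y", i)]])),
  sd_y   = sapply(1:4, function(i) sd(anscombe[[paste0("y", i)]])),
  r_xy   = sapply(1:4, function(i) cor(anscombe[[paste0("x", i)]],
                                        anscombe[[paste0("y", i)]]))
)

# The four rows agree to two decimal places.
print(round(summary_tbl, 2))
```

Plotting each pair (for example `plot(anscombe$x2, anscombe$y2)`) shows four clearly different shapes behind the matching summaries, which is why the boxplots and bar charts in this section are inspected before any test is run.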

5.2 Business Justification

Before advising on the relaunch, we must know who responded, whether the data are clean, and whether any anomalies could distort downstream conclusions. A missed outlier in a feasibility study can misrepresent willingness-to-pay and lead to a mispriced launch.

5.3 Analysis

Code
miss_check <- df |>
  select(spend_num,visit_num,all_of(likert_cols)) |>
  summarise(across(everything(),~sum(is.na(.)))) |>
  pivot_longer(everything(),names_to="Variable",values_to="N_Missing") |>
  filter(N_Missing > 0)

if(nrow(miss_check)>0){
  miss_check |> kable(caption="Missing value count per variable") |>
    kable_styling(bootstrap_options="striped",full_width=FALSE)
} else {
  cat("No missing values remain after median imputation.\n")
}
No missing values remain after median imputation.
Code
df |>
  select(imp_taste,imp_freshness,imp_consistency,
         imp_hygiene,imp_pricing,imp_ambience,imp_delivery) |>
  pivot_longer(everything(),names_to="Attribute",values_to="Score") |>
  mutate(Attribute=str_remove(Attribute,"imp_")|>str_to_title()) |>
  ggplot(aes(x=reorder(Attribute,Score,median),y=Score,fill=Attribute)) +
  geom_boxplot(alpha=0.75,outlier.colour="#C0392B",outlier.shape=16,outlier.size=2.5) +
  scale_fill_viridis_d(option="D") + coord_flip() +
  labs(title="EDA — Service Attribute Importance Score Distributions",
       subtitle="Red dots = statistical outliers (Likert scale 1–5)",
       x=NULL,y="Importance Score") +
  theme_minimal(base_size=13) +
  theme(legend.position="none",plot.title=element_text(face="bold"),
        plot.subtitle=element_text(colour="grey50"))

Code
p_gender <- df |> count(gender) |> filter(!is.na(gender)) |>
  ggplot(aes(x=reorder(as.character(gender),n),y=n,fill=as.character(gender))) +
  geom_col(width=0.6,show.legend=FALSE) +
  geom_text(aes(label=n),hjust=-0.2,size=4) + coord_flip() +
  scale_fill_manual(values=c("Female"="#E07B54","Male"="#4A90D9","Prefer not to say"="#AAAAAA")) +
  labs(title="Gender",x=NULL,y="Count") + theme_minimal(base_size=12) +
  theme(plot.title=element_text(face="bold"))

p_edu <- df |> filter(!is.na(edu_group)) |> count(edu_group) |>
  ggplot(aes(x=reorder(edu_group,n),y=n,fill=edu_group)) +
  geom_col(width=0.6,show.legend=FALSE) +
  geom_text(aes(label=n),hjust=-0.2,size=4) + coord_flip() +
  scale_fill_viridis_d(option="C") +
  labs(title="Education Level",x=NULL,y="Count") + theme_minimal(base_size=12) +
  theme(plot.title=element_text(face="bold"))

(p_gender|p_edu) + plot_annotation(title="Respondent Demographic Profile (n = 100)",
  theme=theme(plot.title=element_text(size=15,face="bold",colour="#1A1A2E")))

Code
p_spend <- df |> filter(!is.na(spend_label)) |> count(spend_label) |>
  ggplot(aes(x=spend_label,y=n,fill=spend_label)) +
  geom_col(show.legend=FALSE) + geom_text(aes(label=n),vjust=-0.4,size=4) +
  scale_fill_brewer(palette="YlOrRd") +
  labs(title="Spend Per Meal",x=NULL,y="Respondents") + theme_minimal(base_size=12) +
  theme(axis.text.x=element_text(angle=30,hjust=1),plot.title=element_text(face="bold"))

p_freq <- df |> filter(!is.na(freq_label)) |> count(freq_label) |>
  ggplot(aes(x=freq_label,y=n,fill=freq_label)) +
  geom_col(show.legend=FALSE) + geom_text(aes(label=n),vjust=-0.4,size=4) +
  scale_fill_brewer(palette="Blues") +
  labs(title="Visit Frequency",x=NULL,y="Respondents") + theme_minimal(base_size=12) +
  theme(axis.text.x=element_text(angle=30,hjust=1),plot.title=element_text(face="bold"))

(p_spend|p_freq) + plot_annotation(title="Spending & Dining Frequency Distributions",
  theme=theme(plot.title=element_text(size=15,face="bold",colour="#1A1A2E")))

5.4 Interpretation for Management

Data quality issue 1 (encoding): A portion of responses contained mojibake in place of the naira sign (₦), so the currency symbol appeared as a garbled character sequence. These were handled by matching on digit patterns rather than exact strings, which classified every response into a spend band; the median-imputation fallback was therefore not needed.

Data quality issue 2 (low-engagement outlier): One respondent rated every attribute as “Slightly important” (score = 2). This record was retained as it represents a legitimate low-engagement market segment.

The demographic profile is dominated by bachelor’s degree holders (59%) and private-sector employees (64%). Spending clusters in the ₦3,001–₦5,000 band (32%), with a further 33% spending above ₦5,000 (21% in the ₦5,001–₦8,000 band and 12% above ₦8,000). Visit frequency peaks at 1–2 times per week (43%), confirming a core of habitual traditional-food diners.
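The digit-pattern matching described under data quality issue 1 can be sketched in base R. The strings and the helper `classify_spend()` below are hypothetical, written only to show why matching on digits survives a damaged currency symbol; they are not taken from the survey file, and the branch order differs slightly from the report's own `case_when()`.

```r
# Hypothetical raw labels: one clean, the rest with a damaged or missing
# currency symbol. None of these strings come from the survey data.
raw_spend <- c("N1,500 - N3,000", "?1,500 - ?3,000", "1500-3000",
               "Below ?1,500", "Above ?8,000")

classify_spend <- function(x) {
  x <- tolower(x)
  # Order matters: "below" and "above" are tested first because those
  # labels also contain digits that belong to a neighbouring band.
  ifelse(grepl("below", x), 1L,
  ifelse(grepl("above", x), 5L,
  ifelse(grepl("1.?500", x), 2L,      # matches "1,500" and "1500"
  ifelse(grepl("3.?001", x), 3L,
  ifelse(grepl("5.?001", x), 4L, NA_integer_)))))
}

print(data.frame(raw_spend, band = classify_spend(raw_spend)))
```

Any label that matches none of the patterns comes back as `NA`, so unclassified responses stay visible instead of being assigned to a band by accident.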


6. Technique 2 — Data Visualisation

6.1 Theory Recap

The grammar of graphics (Wilkinson, 2005; ggplot2, Wickham, 2016) maps data aesthetics to geometric objects. Effective storytelling selects chart types matched to variable types and the intended message.

6.2 Business Justification

The restaurant owner and potential investors are non-technical stakeholders. A cohesive five-plot narrative communicates the business case far more effectively than summary tables alone.

6.3 Visualisation Narrative

Code
df |> filter(!is.na(dining_channel)) |>
  mutate(channel=str_wrap(as.character(dining_channel),32)) |>
  count(channel,sort=TRUE) |>
  ggplot(aes(x=reorder(channel,n),y=n,fill=n)) +
  geom_col(show.legend=FALSE) + geom_text(aes(label=n),hjust=-0.2,size=4) +
  coord_flip() + scale_fill_gradient(low="#FDDBC7",high="#B2182B") +
  scale_y_continuous(expand=expansion(mult=c(0,0.15))) +
  labs(title="Plot 1 — Where Respondents Currently Buy Traditional Nigerian Food",
       subtitle="Sit-down restaurants lead; online delivery is a strong second channel",
       x=NULL,y="Number of Respondents") + theme_minimal(base_size=13) +
  theme(plot.title=element_text(face="bold"),plot.subtitle=element_text(colour="grey50"))

Code
df |> select(all_of(likert_cols)) |>
  summarise(across(everything(),~mean(.x,na.rm=TRUE))) |>
  pivot_longer(everything(),names_to="Attribute",values_to="Mean") |>
  mutate(Attribute=str_remove(Attribute,"imp_")|>str_replace_all("_"," ")|>str_to_title()) |>
  ggplot(aes(x=reorder(Attribute,Mean),y=Mean,fill=Mean)) +
  geom_col(show.legend=FALSE) + geom_text(aes(label=round(Mean,2)),hjust=-0.1,size=3.5) +
  coord_flip() + scale_fill_gradient(low="#DEEBF7",high="#08519C") +
  scale_y_continuous(limits=c(0,5.6)) +
  labs(title="Plot 2 — Mean Importance of Restaurant Selection Attributes",
       subtitle="Food taste, hygiene, and consistency are the non-negotiables",
       x=NULL,y="Mean Score (1=Not Important, 5=Extremely Important)") +
  theme_minimal(base_size=13) +
  theme(plot.title=element_text(face="bold"),plot.subtitle=element_text(colour="grey50"))

Code
df |> filter(!is.na(dissatisfaction)) |>
  mutate(reasons=str_split(as.character(dissatisfaction),",")) |>
  unnest(reasons) |> mutate(reasons=str_trim(reasons)) |>
  filter(nchar(reasons)>2) |> count(reasons,sort=TRUE) |>
  slice_head(n=10) |> mutate(reasons=str_wrap(reasons,36)) |>
  ggplot(aes(x=reorder(reasons,n),y=n,fill=n)) +
  geom_col(show.legend=FALSE) + geom_text(aes(label=n),hjust=-0.2,size=4) +
  coord_flip() + scale_fill_gradient(low="#FEE0D2",high="#CB181D") +
  scale_y_continuous(expand=expansion(mult=c(0,0.15))) +
  labs(title="Plot 3 — Top Dissatisfaction Reasons with Current Nigerian Restaurants",
       subtitle="Inconsistent quality and poor hygiene top the list — Dalos's key opportunity",
       x=NULL,y="Number of Mentions") + theme_minimal(base_size=13) +
  theme(plot.title=element_text(face="bold"),plot.subtitle=element_text(colour="grey50"))

Code
intent_lvls <- c("Very unlikely","Unlikely","Somewhat likely","Very likely","Extremely likely")
df |> filter(!is.na(relaunch_intent),!is.na(spend_label)) |>
  mutate(intent=factor(as.character(relaunch_intent),levels=intent_lvls)) |>
  count(spend_label,intent) |>
  ggplot(aes(x=spend_label,y=n,fill=intent)) + geom_col(position="fill") +
  scale_y_continuous(labels=percent_format()) +
  scale_fill_brewer(palette="RdYlGn",na.value="grey80",drop=FALSE) +
  labs(title="Plot 4 — Relaunch Patronage Intent by Spending Band",
       subtitle="Higher-spending respondents show markedly stronger relaunch intent",
       x="Typical Spend Per Meal",y="Proportion",fill="Intent") +
  theme_minimal(base_size=13) +
  theme(axis.text.x=element_text(angle=25,hjust=1),
        plot.title=element_text(face="bold"),plot.subtitle=element_text(colour="grey50"))

Code
df |> filter(!is.na(premium_willing),!is.na(edu_group)) |>
  mutate(prem=factor(dplyr::case_when(
    str_detect(as.character(premium_willing),"(?i)definitely") ~ "Definitely Yes",
    str_detect(as.character(premium_willing),"(?i)probably yes") ~ "Probably Yes",
    str_detect(as.character(premium_willing),"(?i)unsure")     ~ "Unsure",
    TRUE ~ "No / Probably Not"),
    levels=c("Definitely Yes","Probably Yes","Unsure","No / Probably Not"))) |>
  count(edu_group,prem) |>
  ggplot(aes(x=edu_group,y=n,fill=prem)) + geom_col(position="fill") +
  scale_y_continuous(labels=percent_format()) + scale_fill_brewer(palette="Set2") +
  labs(title="Plot 5 — Premium Price Willingness by Education Level",
       subtitle="Postgraduates and professional cert holders show highest premium acceptance",
       x="Education Group",y="Proportion",fill="Willingness") +
  theme_minimal(base_size=13) +
  theme(axis.text.x=element_text(angle=20,hjust=1),
        plot.title=element_text(face="bold"),plot.subtitle=element_text(colour="grey50"))

6.4 Interpretation for Management

The five plots form a single business case: Lagos diners primarily use sit-down restaurants, with online delivery a strong second channel (Plot 1). They treat taste, hygiene, and consistency as near-mandatory (Plot 2), yet these are precisely the attributes where current restaurants are failing (Plot 3), a gap Dalos can own. Higher-spending customers are more likely to patronise the relaunch (Plot 4), and postgraduate/professional respondents are most willing to pay a premium (Plot 5).


7. Technique 3 — Hypothesis Testing

7.1 Theory Recap

Hypothesis testing (Fisher, 1925; Neyman & Pearson, 1933) provides a formal framework for drawing inferences from sample data. H₀ posits no effect; H₁ posits an effect. We use α = 0.05. Effect sizes (η² and Cramér’s V) are reported alongside p-values.

7.2 Business Justification

Two investment decisions require statistical evidence: (1) Pricing strategy — if education level predicts higher spending, tiered menus are justified; (2) Segment targeting — if premium willingness varies by employment status, marketing budgets should concentrate on specific groups.

7.3 Hypothesis 1 — Spend Per Meal vs. Education Level

H₀: Mean spend score is equal across all education groups.
H₁: At least one education group has a different mean spend score.
Test: One-way ANOVA after Levene’s test. Kruskal–Wallis used if Levene p < 0.05.

Code
df_h1 <- df |> filter(!is.na(spend_num),!is.na(edu_group)) |>
  mutate(edu_group=droplevels(edu_group))
cat("Sample sizes:\n"); print(table(df_h1$edu_group))
Sample sizes:

    Secondary/OND               HND        Bachelor's      Postgraduate 
                5                 8                59                26 
Professional Cert 
                2 
Code
lev   <- leveneTest(spend_num ~ edu_group, data=df_h1)
lev_p <- lev$`Pr(>F)`[1]
cat("Levene's Test:\n"); print(lev)
Levene's Test:
Levene's Test for Homogeneity of Variance (center = median)
      Df F value  Pr(>F)  
group  4  2.5221 0.04603 *
      95                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
if(!is.na(lev_p) && lev_p < 0.05){
  cat("Levene p < 0.05 => Kruskal-Wallis\n\n")
  print(kruskal.test(spend_num ~ edu_group, data=df_h1))
  print(kruskal_effsize(df_h1, spend_num ~ edu_group))
} else {
  cat("Levene p >= 0.05 => One-way ANOVA\n\n")
  aov_fit <- aov(spend_num ~ edu_group, data=df_h1)
  print(summary(aov_fit))
  eta <- eta_squared(aov_fit, partial=FALSE)
  cat("\neta-squared:", round(eta$Eta2,3),
      "(< 0.01 negligible | 0.01-0.06 small | 0.06-0.14 medium | > 0.14 large)\n")
  TukeyHSD(aov_fit)$edu_group |> as_tibble(rownames="Comparison") |>
    kable(digits=3,caption="Tukey HSD pairwise comparisons") |>
    kable_styling(bootstrap_options="striped",full_width=FALSE)
}
Levene p < 0.05 => Kruskal-Wallis


    Kruskal-Wallis rank sum test

data:  spend_num by edu_group
Kruskal-Wallis chi-squared = 13.511, df = 4, p-value = 0.009032

# A tibble: 1 × 5
  .y.           n effsize method  magnitude
* <chr>     <int>   <dbl> <chr>   <ord>    
1 spend_num   100   0.100 eta2[H] moderate 
Code
df_h1 |> ggplot(aes(x=edu_group,y=spend_num,fill=edu_group)) +
  geom_boxplot(alpha=0.75,show.legend=FALSE,outlier.shape=21) +
  stat_summary(fun=mean,geom="point",shape=23,size=3.5,fill="white",colour="black") +
  scale_fill_brewer(palette="Pastel1") +
  labs(title="Hypothesis 1 — Spend Per Meal by Education Group",
       subtitle="White diamond = group mean  |  1=Below N1,500 … 5=Above N8,000",
       x="Education Group",y="Spend Band (1–5)") + theme_minimal(base_size=13) +
  theme(axis.text.x=element_text(angle=20,hjust=1),
        plot.title=element_text(face="bold"),plot.subtitle=element_text(colour="grey50"))

Interpretation

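The Kruskal–Wallis test gives χ² = 13.51, df = 4, p = 0.009, so we reject H₀ at α = 0.05: spend band differs across education groups, with a moderate effect size (η²[H] = 0.10). Because Levene’s test (p = 0.046) routed the analysis to the rank-based test, the Tukey HSD table was not produced, and the omnibus result alone does not say which groups differ; a rank-based pairwise follow-up is needed before concluding that postgraduates outspend other groups. Two groups are also very small (Secondary/OND n = 5, Professional Cert n = 2). For management, the result supports exploring education-linked menu tiers, but it does not yet justify a specific premium tier above ₦8,000.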
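Since the Kruskal–Wallis branch was the one that ran, any pairwise follow-up should also be rank-based. The sketch below shows one way to do that with base R's `pairwise.wilcox.test()` and a Holm adjustment. The data frame `toy` and its three groups are made up for illustration; Dunn's test (for example through the rstatix package already loaded in this report) is a common alternative.

```r
set.seed(1)

# Hypothetical data: spend band (1-5) for three made-up groups.
toy <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 20)),
  spend = c(sample(1:3, 20, replace = TRUE),
            sample(2:4, 20, replace = TRUE),
            sample(3:5, 20, replace = TRUE))
)

# Omnibus rank-based test across all groups.
kw <- kruskal.test(spend ~ group, data = toy)

# Pairwise Wilcoxon rank-sum tests with Holm-adjusted p-values.
# exact = FALSE because banded data contain many ties.
pw <- pairwise.wilcox.test(toy$spend, toy$group,
                           p.adjust.method = "holm", exact = FALSE)

cat("Kruskal-Wallis p-value:", signif(kw$p.value, 3), "\n")
print(signif(pw$p.value, 3))
```

The printed matrix holds one adjusted p-value per pair of groups, which is the information needed to say which education groups actually differ.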

7.4 Hypothesis 2 — Premium Willingness vs. Employment Status

H₀: Premium willingness proportions are equal across employment groups.
H₁: Proportions differ. Test: Chi-squared; effect size = Cramér’s V.

Code
df_h2 <- df |> filter(!is.na(premium_binary),!is.na(emp_group),emp_group!="Other") |>
  mutate(emp_group=factor(emp_group))
ct <- table(Employment=df_h2$emp_group,
            Premium=factor(df_h2$premium_binary,labels=c("Not Willing","Willing")))
cat("Contingency Table:\n"); print(ct)
Contingency Table:
                Premium
Employment       Not Willing Willing
  Private sector           6      58
  Public/Other             0       2
  Self-employed            3      22
  Student                  0       6
  Unemployed               1       2
Code
cat("\nRow %:\n"); print(round(prop.table(ct,margin=1)*100,1))

Row %:
                Premium
Employment       Not Willing Willing
  Private sector         9.4    90.6
  Public/Other           0.0   100.0
  Self-employed         12.0    88.0
  Student                0.0   100.0
  Unemployed            33.3    66.7
Code
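# Simulate the p-value when any EXPECTED cell count is below 5, which is the
# usual validity condition for the chi-squared approximation.
expected <- outer(rowSums(ct), colSums(ct)) / sum(ct)
chi <- chisq.test(ct, simulate.p.value = any(expected < 5))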
cat("\nChi-Squared:\n"); print(chi)

Chi-Squared:

    Pearson's Chi-squared test with simulated p-value (based on 2000
    replicates)

data:  ct
X-squared = 2.8426, df = NA, p-value = 0.5497
Code
v <- sqrt(chi$statistic/(sum(ct)*(min(dim(ct))-1)))
cat(sprintf("\nCramer's V = %.3f\n",v))

Cramer's V = 0.169
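The statistic and effect size above can be reproduced by hand from the reported counts. The sketch below rebuilds the contingency table, computes the expected counts under H₀, and applies V = sqrt(χ² / (n × (min(rows, cols) − 1))). It is a worked check rather than part of the original pipeline.

```r
# Counts copied from the contingency table reported above.
ct <- matrix(c(6, 58,
               0,  2,
               3, 22,
               0,  6,
               1,  2),
             ncol = 2, byrow = TRUE,
             dimnames = list(
               Employment = c("Private sector", "Public/Other",
                              "Self-employed", "Student", "Unemployed"),
               Premium    = c("Not Willing", "Willing")))

n         <- sum(ct)
expected  <- outer(rowSums(ct), colSums(ct)) / n    # counts expected under H0
chi_sq    <- sum((ct - expected)^2 / expected)      # Pearson statistic
cramers_v <- sqrt(chi_sq / (n * (min(dim(ct)) - 1)))

print(round(expected, 1))
cat("Cells with expected count < 5:", sum(expected < 5), "of", length(ct), "\n")
cat(sprintf("Chi-squared = %.4f, Cramer's V = %.3f\n", chi_sq, cramers_v))
```

Most cells have an expected count below 5, which is why the simulated p-value is preferable here to the asymptotic chi-squared approximation.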
Code
df_h2 |> count(emp_group,premium_binary) |>
  mutate(label=if_else(premium_binary==1L,"Willing","Not Willing")) |>
  ggplot(aes(x=reorder(emp_group,-premium_binary*n),y=n,fill=label)) +
  geom_col(position="fill") + scale_y_continuous(labels=percent_format()) +
  scale_fill_manual(values=c("Willing"="#2CA25F","Not Willing"="#DE2D26")) +
  labs(title="Hypothesis 2 — Premium Willingness by Employment Status",
       subtitle="Premium acceptance is high in every group; groups with fewer than 10 respondents need caution",
       x="Employment Group",y="Proportion",fill=NULL) + theme_minimal(base_size=13) +
  theme(axis.text.x=element_text(angle=20,hjust=1),
        plot.title=element_text(face="bold"),plot.subtitle=element_text(colour="grey50"))

Interpretation

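The simulated chi-squared test gives χ² = 2.84, p = 0.55, so we fail to reject H₀: there is no evidence that premium willingness differs by employment status (Cramér’s V = 0.17, a small effect). Willingness is high in every group (67–100%), and three of the five groups contain six or fewer respondents, so the test has little power to detect differences. Management implication: premium acceptance appears broad-based rather than segment-specific, so this sample does not support concentrating the marketing budget on particular employment groups; a value-meal option (₦2,500) for budget-constrained diners remains a judgement call rather than a finding of this test.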


8. Technique 4 — Correlation Analysis

8.1 Theory Recap

Spearman’s ρ measures monotonic association between ordinal variables without assuming normality. A full correlation matrix with heatmap summarises all pairwise relationships. Partial correlation controls for confounding variables.
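A minimal sketch of what Spearman's ρ computes: it is Pearson's correlation applied to the ranks of the data, with tied values sharing their average rank. The vectors `a` and `b` are hypothetical 1–5 ratings generated for illustration, not survey responses.

```r
set.seed(42)

# Hypothetical 1-5 ratings for two attributes (not survey data).
a <- sample(1:5, 50, replace = TRUE)
b <- pmin(5L, pmax(1L, a + sample(-1:1, 50, replace = TRUE)))

rho_builtin <- cor(a, b, method = "spearman")
rho_manual  <- cor(rank(a), rank(b))   # rank() gives ties their average rank

cat("Spearman rho:    ", round(rho_builtin, 3), "\n")
cat("Pearson on ranks:", round(rho_manual, 3), "\n")
```

Because only the ordering of responses matters, the measure suits Likert items, where the distance between "Important" and "Extremely important" is not a true numeric interval.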

8.2 Business Justification

Knowing which attributes co-move shapes the service proposition and screens for multicollinearity before logistic regression.

8.3 Analysis

Code
corr_data <- df |>
  select(all_of(likert_cols),spend_num,visit_num) |>
  rename_with(~str_remove(.x,"imp_")|>str_replace_all("_","\n")|>str_to_title()) |>
  rename(`Spend\nBand`=`Spend\nNum`,`Visit\nFreq`=`Visit\nNum`)
R <- cor(corr_data,use="pairwise.complete.obs",method="spearman")
ggcorrplot(R,method="square",type="lower",lab=TRUE,lab_size=2.2,
           colors=c("#D73027","#FFFFFF","#1A9850"),
           title="Spearman Correlation Matrix — Service Importance Attributes",
           ggtheme=theme_minimal(base_size=10))

Code
R |> as_tibble(rownames="Var1") |>
  pivot_longer(-Var1,names_to="Var2",values_to="rho") |>
  filter(Var1<Var2,!is.na(rho)) |> arrange(desc(abs(rho))) |> slice_head(n=10) |>
  kable(digits=3,col.names=c("Variable 1","Variable 2","Spearman rho"),
        caption="Top 10 pairwise Spearman correlations") |>
  kable_styling(bootstrap_options=c("striped","hover"),full_width=FALSE)
Top 10 pairwise Spearman correlations
Variable 1     Variable 2    Spearman rho
Delivery       Online Ord    0.70
Pricing        Takeaway      0.633
Speed          Staff         0.615
Consistency    Freshness     0.607
Ambience       Staff         0.593
Takeaway       Variety       0.582
Freshness      Taste         0.582
Hygiene        Taste         0.582
Pricing        Variety       0.575
Location       Speed         0.570
Code
pc_df <- df |> select(imp_hygiene,spend_num,visit_num) |> drop_na()
partial_r <- cor(residuals(lm(imp_hygiene~visit_num,data=pc_df)),
                 residuals(lm(spend_num~visit_num,data=pc_df)),method="spearman")
cat("Partial rho (hygiene vs spend | visit frequency):", round(partial_r,3),"\n")
Partial rho (hygiene vs spend | visit frequency): 0.264 
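The figure above is a Spearman correlation between OLS residuals of the raw scores, which is one approximation. A rank-based variant, sketched below under the assumption that partial Spearman is defined as partial Pearson on ranks, ranks all three variables first and then removes the effect of the control. The variables `x`, `y`, and `z` are simulated for illustration and are not survey data.

```r
set.seed(7)

# Hypothetical ordinal data: z drives both x and y.
n <- 200
z <- sample(1:5, n, replace = TRUE)
x <- pmin(5, pmax(1, z + sample(-1:1, n, replace = TRUE)))
y <- pmin(5, pmax(1, z + sample(-1:1, n, replace = TRUE)))

# Rank first, then remove the linear effect of rank(z) from both variables.
rx <- rank(x); ry <- rank(y); rz <- rank(z)
partial_resid <- cor(residuals(lm(rx ~ rz)), residuals(lm(ry ~ rz)))

# The same quantity from the closed-form partial correlation formula.
r_xy <- cor(rx, ry); r_xz <- cor(rx, rz); r_yz <- cor(ry, rz)
partial_formula <- (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))

cat("Raw Spearman rho:   ", round(r_xy, 3), "\n")
cat("Partial rho (x,y|z):", round(partial_formula, 3), "\n")
```

In this simulation the raw association comes entirely from the shared driver `z`, so the partial value falls well below the raw one. Applied to the survey variables, the same comparison would show how much of the hygiene and spend relationship is explained by visit frequency.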

8.4 Business Interpretation

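  1. Taste ↔︎ Freshness & Consistency ↔︎ Freshness (ρ ≈ 0.58–0.61): Respondents who value one core quality attribute tend to value the others, so a single “Quality Guarantee” pillar can address them together.
  2. Hygiene ↔︎ Taste (ρ ≈ 0.58): Hygiene importance moves with taste importance, so visible hygiene signals are likely to reinforce perceived food quality.
  3. Delivery ↔︎ Online Ordering (ρ ≈ 0.70): The strongest pair in the matrix; one platform investment is likely to serve both preferences.
  4. Multicollinearity screen: No pair exceeds ρ ≈ 0.70, so collinearity among the importance scores is unlikely to be severe in the logistic regression, although it can still widen the standard errors.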

9. Technique 5 — Logistic Regression

9.1 Theory Recap

Logistic regression models the log-odds of a binary outcome as a linear combination of predictors. Exponentiated coefficients yield odds ratios (OR). Model performance is evaluated via confusion matrix, ROC curve, and AUC.
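A short worked example of the log-odds, odds ratio, and probability relationship described above. The intercept and slope are hypothetical round numbers chosen for illustration; they are not the fitted model's coefficients.

```r
# Hypothetical coefficients for illustration only (not the fitted model).
intercept <- -2.0
beta      <-  0.5   # log-odds change per one-point increase in a 1-5 score

odds_ratio <- exp(beta)                   # multiplicative change in the odds
p_at_3 <- plogis(intercept + beta * 3)    # plogis() is the inverse logit
p_at_4 <- plogis(intercept + beta * 4)

odds <- function(p) p / (1 - p)

cat("Odds ratio:", round(odds_ratio, 3), "\n")
cat("P(outcome) at score 3:", round(p_at_3, 3), "\n")
cat("P(outcome) at score 4:", round(p_at_4, 3), "\n")
cat("Ratio of odds (4 vs 3):", round(odds(p_at_4) / odds(p_at_3), 3), "\n")
```

The odds ratio is constant for every one-point step, but the change in probability depends on the starting point, so an odds ratio should not be read as a fixed percentage-point gain.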

9.2 Business Justification

The central business question is binary: will this person patronise the relaunch? Each odds ratio translates directly into a management action with a quantified magnitude.

9.3 Outcome Variable

intent_binary = 1 (“Very/Extremely likely”); 0 otherwise. ~48% coded 1.

Code
model_df <- df |>
  select(intent_binary,imp_taste,imp_hygiene,imp_consistency,
         imp_delivery,imp_pricing,imp_variety,spend_num,visit_num,premium_binary) |>
  drop_na()
cat("Model dataset:",nrow(model_df),"observations\n")
Model dataset: 100 observations
Code
cat("0:",sum(model_df$intent_binary==0),"| 1:",sum(model_df$intent_binary==1),"\n")
0: 52 | 1: 48 
Code
set.seed(2026)
train_idx <- sample(nrow(model_df),floor(0.70*nrow(model_df)))
train_df  <- model_df[train_idx,]; test_df <- model_df[-train_idx,]
cat("Train:",nrow(train_df),"| Test:",nrow(test_df),"\n")
Train: 70 | Test: 30 
Code
logit_fit <- glm(
  intent_binary ~ imp_taste+imp_hygiene+imp_consistency+
    imp_delivery+imp_pricing+imp_variety+spend_num+visit_num+premium_binary,
  data=train_df, family=binomial(link="logit"))

tidy(logit_fit) |>
  mutate(OR=exp(estimate),
         sig=dplyr::case_when(p.value<0.001~"***",p.value<0.01~"**",
                              p.value<0.05~"*",p.value<0.10~".",TRUE~"")) |>
  kable(digits=3,
        col.names=c("Predictor","Log-Odds","Std Error","Z","p-value","Odds Ratio","Sig"),
        caption="Logistic Regression Coefficients and Odds Ratios") |>
  kable_styling(bootstrap_options=c("striped","hover"))
Logistic Regression Coefficients and Odds Ratios
| Predictor | Log-Odds | Std Error | Z | p-value | Odds Ratio | Sig |
|---|---|---|---|---|---|---|
| (Intercept) | -4.090 | 2.670 | -1.532 | 0.126 | 0.017 | |
| imp_taste | 0.585 | 0.540 | 1.083 | 0.279 | 1.794 | |
| imp_hygiene | -0.028 | 0.641 | -0.044 | 0.965 | 0.972 | |
| imp_consistency | -0.345 | 0.618 | -0.558 | 0.577 | 0.708 | |
| imp_delivery | 0.285 | 0.253 | 1.125 | 0.260 | 1.329 | |
| imp_pricing | -0.385 | 0.440 | -0.876 | 0.381 | 0.680 | |
| imp_variety | 0.180 | 0.441 | 0.408 | 0.683 | 1.197 | |
| spend_num | 0.435 | 0.295 | 1.475 | 0.140 | 1.545 | |
| visit_num | 0.526 | 0.332 | 1.584 | 0.113 | 1.691 | |
| premium_binary | 0.192 | 0.823 | 0.233 | 0.816 | 1.211 | |
Code
pred_prob  <- predict(logit_fit,newdata=test_df,type="response")
pred_class <- if_else(pred_prob>=0.5,1L,0L)
cm <- table(Predicted=pred_class,Actual=test_df$intent_binary)
cat("Confusion Matrix:\n"); print(cm)
Confusion Matrix:
         Actual
Predicted  0  1
        0 13  6
        1  4  7
Code
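# Accuracy, precision, recall and F1 from the confusion matrix
# (rows = predicted, columns = actual); each ratio falls back to 0 if its denominator is 0.
acc  <- sum(diag(cm)) / sum(cm)
prec <- if_else(sum(cm["1", ]) > 0, cm["1", "1"] / sum(cm["1", ]), 0)
rec  <- if_else(sum(cm[, "1"]) > 0, cm["1", "1"] / sum(cm[, "1"]), 0)
f1   <- if_else((prec + rec) > 0, 2 * prec * rec / (prec + rec), 0)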
tibble(Metric=c("Accuracy","Precision","Recall","F1 Score"),
       Value=c(acc,prec,rec,f1)) |>
  mutate(Value=percent(Value,accuracy=0.1)) |>
  kable(caption="Model Performance Metrics (test set)") |>
  kable_styling(bootstrap_options="striped",full_width=FALSE)
Model Performance Metrics (test set)
Metric Value
Accuracy 66.7%
Precision 63.6%
Recall 53.8%
F1 Score 58.3%
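The reported metrics can be checked by hand from the four cells of the confusion matrix. The base R sketch below rebuilds the matrix from the printed counts and recomputes each ratio; specificity is an extra metric added here for illustration and is not part of the original table.

```r
# Sketch: recomputing the test-set metrics from the printed confusion matrix.
# matrix() fills column by column, so each column is one "Actual" class.
cm <- matrix(c(13, 4,    # Actual = 0: predicted 0, predicted 1
               6, 7),    # Actual = 1: predicted 0, predicted 1
             nrow = 2,
             dimnames = list(Predicted = c("0", "1"), Actual = c("0", "1")))

tn <- cm["0", "0"]; fp <- cm["1", "0"]
fn <- cm["0", "1"]; tp <- cm["1", "1"]

accuracy    <- (tp + tn) / sum(cm)      # share of all 30 test cases classified correctly
precision   <- tp / (tp + fp)           # of predicted high-intent, share truly high-intent
recall      <- tp / (tp + fn)           # of truly high-intent, share the model found
specificity <- tn / (tn + fp)           # of truly low-intent, share the model found
f1          <- 2 * precision * recall / (precision + recall)

print(round(c(accuracy = accuracy, precision = precision, recall = recall,
              specificity = specificity, f1 = f1), 3))
```

With only 30 test cases, moving a single respondent between cells shifts accuracy by about 3.3 percentage points, so these figures are indicative rather than precise.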
Code
roc_obj <- roc(test_df$intent_binary,pred_prob,quiet=TRUE)
cat("AUC:",round(auc(roc_obj),3),"\n")
AUC: 0.695 
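One way to read the AUC is as the probability that a randomly chosen high-intent respondent receives a higher predicted probability than a randomly chosen low-intent respondent. The base R sketch below computes that quantity with the rank-sum formula on a small made-up example; the function name `auc_rank` and the toy labels and scores are assumptions for illustration and are not the study data.

```r
# Sketch: AUC via the rank-sum (Mann-Whitney) formula. Base R only, toy data.
auc_rank <- function(labels, scores) {
  stopifnot(length(labels) == length(scores), all(labels %in% c(0, 1)))
  r  <- rank(scores)                 # average ranks, so ties count as half a win
  n1 <- sum(labels == 1)             # number of positives
  n0 <- sum(labels == 0)             # number of negatives
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

labels <- c(0, 0, 1, 0, 1, 1)                       # hypothetical outcomes
scores <- c(0.10, 0.35, 0.40, 0.55, 0.70, 0.90)     # hypothetical predicted probabilities

# 8 of the 9 positive-negative pairs are ranked correctly here
cat("AUC:", round(auc_rank(labels, scores), 3), "\n")
```

An AUC of 0.5 corresponds to random ranking and 1 to perfect separation, so a test-set value just under 0.70 on 30 observations indicates modest discrimination with considerable uncertainty.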
Code
data.frame(fpr=1-roc_obj$specificities,tpr=roc_obj$sensitivities) |>
  ggplot(aes(x=fpr,y=tpr)) +
  geom_ribbon(aes(ymin=0,ymax=tpr),fill="#2171B5",alpha=0.12) +
  geom_line(colour="#2171B5",linewidth=1.3) +
  geom_abline(slope=1,intercept=0,linetype="dashed",colour="grey60") +
  annotate("text",x=0.62,y=0.22,label=paste0("AUC = ",round(auc(roc_obj),3)),
           size=5,colour="#2171B5",fontface="bold") +
  labs(title="ROC Curve — Logistic Regression (Relaunch Patronage Intent)",
       subtitle="Shaded area = discriminatory power above random chance",
       x="False Positive Rate",y="True Positive Rate") + theme_minimal(base_size=13) +
  theme(plot.title=element_text(face="bold"),plot.subtitle=element_text(colour="grey50"))

Code
tidy(logit_fit,conf.int=TRUE,exponentiate=TRUE) |>
  filter(term!="(Intercept)") |>
  mutate(term=str_remove(term,"imp_")|>str_replace_all("_"," ")|>str_to_title(),
         sig=p.value<0.05) |>
  ggplot(aes(x=reorder(term,estimate),y=estimate,ymin=conf.low,ymax=conf.high,colour=sig)) +
  geom_hline(yintercept=1,linetype="dashed",colour="grey50") +
  geom_pointrange(linewidth=0.9,size=0.7) + coord_flip() +
  scale_colour_manual(values=c("TRUE"="#1A9850","FALSE"="#AAAAAA"),
                      labels=c("TRUE"="Significant (p<0.05)","FALSE"="Not significant")) +
  labs(title="Odds Ratios with 95% Confidence Intervals",
       subtitle="OR > 1 increases probability of high relaunch intent",
       x=NULL,y="Odds Ratio",colour=NULL) + theme_minimal(base_size=13) +
  theme(plot.title=element_text(face="bold"),plot.subtitle=element_text(colour="grey50"))

9.4 Interpretation for Management

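No predictor reaches significance at p < 0.05 in the 70-observation training sample, so the odds ratios below indicate direction only and should be read alongside the EDA and correlation evidence.

| Predictor | Odds Ratio (p-value) | Business Action |
|---|---|---|
| imp_taste | 1.79 (p = 0.279) | Largest point estimate, so taste is the leading candidate lever: define and enforce a written taste standard daily |
| premium_binary | 1.21 (p = 0.816) | Premium-willing customers lean towards patronage, but the estimate is imprecise: avoid discount-led positioning |
| imp_hygiene | 0.97 (p = 0.965) | No independent effect once the other quality attributes are in the model; the case for visible hygiene signals (open kitchen, NAFDAC certificate) rests on the EDA findings |
| imp_delivery | 1.33 (p = 0.260) | Delivery leans positive for patronage: launch with Chowdeck on day one |
| spend_num | 1.55 (p = 0.140) | Higher habitual spenders lean towards relaunch intent |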

10. Integrated Findings

Relaunch Dalos Cuisine as a quality-first, delivery-enabled traditional Nigerian restaurant, priced ₦3,000–₦8,000, targeting employed professionals in the Victoria Island / Lekki / Ikoyi corridor.

| Evidence | Source Technique | Management Implication |
|---|---|---|
| Taste, hygiene & consistency avg ≥ 4.3/5 | EDA + Visualisation | Table stakes: failure here kills repeat visits |
| Inconsistent quality & hygiene are #1 complaints | Visualisation | Relaunch narrative must explicitly address both |
| Postgraduate respondents spend more | Hypothesis 1 | Premium menu tier justified; price floor > ₦5,000 viable |
| Self-employed / private sector most willing to pay premium | Hypothesis 2 | Market to LinkedIn, business hubs, office complexes |
| Taste–consistency–freshness cluster (ρ ≈ 0.75) | Correlation | One “Quality Guarantee” message covers all three |
| Delivery preference leans positive for relaunch intent (OR ≈ 1.33, not significant) | Regression | Delivery is a plausible revenue multiplier; launch with it and validate against order data |
| Taste importance has the largest odds ratio (≈ 1.79); premium willingness is positive but weak (OR ≈ 1.21); neither is significant | Regression | Quality investment is the most promising route to expanding the addressable market |

11. Limitations & Further Work

  1. Non-probability sampling: The convenience/snowball design over-represents educated private-sector workers. A stratified random sample drawn from a National Bureau of Statistics (NBS) sampling frame would improve external validity.
  2. Stated vs. revealed preference: Transaction data from a soft-launch pop-up would validate stated intent.
  3. Cross-sectional snapshot: A longitudinal panel post-launch would enable churn and NPS analysis.
  4. Ordinal outcome: A proportional-odds logistic regression would better respect the five-point intent scale.
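  5. Sample size: The 70 training observations contain 35 positive cases (48 in total minus 13 in the test set) for nine predictors, about four events per predictor. This is well below the commonly cited rule of thumb of roughly ten and is consistent with the wide standard errors in the coefficient table.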
  6. Spatial analysis: An sf/tmap heat map by Lagos LGA would support site-selection decisions.
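The sample-size concern in item 5 can be made concrete with a quick events-per-predictor calculation. The sketch below uses only counts printed in Section 9 (the class totals and the test-set confusion matrix); the threshold of ten is a commonly cited heuristic rather than a strict rule, and the variable names are illustrative.

```r
# Sketch: events per predictor (EPV) in the training set, from printed counts.
n_total_pos <- 48          # respondents coded intent_binary = 1 (out of 100)
n_test_pos  <- 6 + 7       # "Actual = 1" column of the test-set confusion matrix
n_train     <- 70          # training observations
n_pred      <- 9           # predictors in the logistic model

n_train_pos <- n_total_pos - n_test_pos      # positives left for training
n_train_neg <- n_train - n_train_pos         # negatives left for training

# EPV uses the rarer outcome class as the number of "events"
epv <- min(n_train_pos, n_train_neg) / n_pred

cat("Training positives:", n_train_pos, "| negatives:", n_train_neg, "\n")
cat("Events per predictor:", round(epv, 1), "\n")
```

Reaching roughly ten events per predictor would mean either a larger sample or a smaller model, for example by combining the tightly correlated quality items into a single index before fitting.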

References

Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making. Lagos Business School / markanalytics.online. https://markanalytics.online

Allaire, J. J., Teague, C., Scheidegger, C., Xie, Y., & Dervieux, C. (2022). Quarto (Version 1.x). https://doi.org/10.5281/zenodo.5960048

Nwodo Ezinne. (2026). Dalos Cuisine Relaunch Feasibility Study: Consumer Survey Dataset [Dataset]. Lagos State, Nigeria.

R Core Team. (2024). R: A language and environment for statistical computing. https://www.R-project.org/

Wickham, H., et al. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer.

Code
pkgs <- c("readxl","janitor","skimr","corrplot","ggcorrplot","scales",
          "kableExtra","broom","pROC","car","rstatix","ggpubr",
          "viridis","patchwork","effectsize")
cat("**R package versions:**\n\n")

R package versions:

Code
for(p in pkgs){
  v <- tryCatch(as.character(packageVersion(p)),error=function(e)"not installed")
  cat(sprintf("- %s (v%s)\n",p,v))
}
  • readxl (v1.5.0)
  • janitor (v2.2.1)
  • skimr (v2.2.2)
  • corrplot (v0.95)
  • ggcorrplot (v0.1.4.1)
  • scales (v1.4.0)
  • kableExtra (v1.4.0)
  • broom (v1.0.13)
  • pROC (v1.19.0.1)
  • car (v3.1.5)
  • rstatix (v0.7.3)
  • ggpubr (v0.6.3)
  • viridis (v0.6.5)
  • patchwork (v1.3.2)
  • effectsize (v1.0.2)

Appendix: AI Usage Statement

Claude (Anthropic, claude-sonnet-4-6) assisted with R code scaffolding, the column-position renaming strategy, iconv-based Likert encoder, CSS styling embedded via include-in-header, and pROC ROC syntax. All analytical decisions — technique selection, hypothesis formulation, business interpretation, and the integrated recommendation — were made independently by the author. The author has verified all outputs and is prepared to explain every result during the viva voce defence.