Dalos Cuisine Relaunch Feasibility Study: An Exploratory & Inferential Analysis of Consumer Preferences in Lagos

Author

Nwodo Ezinne

Published

May 22, 2026

1. Executive Summary

This study analyses consumer demand and preference data collected to assess the feasibility of relaunching Dalos Cuisine, a traditional Nigerian restaurant that previously operated in Aaron’s Mall, Lekki Phase 1, Lagos, in 2020. A structured survey administered between March and April 2026 drew 117 responses, of which the 100 from confirmed Lagos residents were retained, producing a dataset of 76 variables covering demographics, dining behaviours, service-quality expectations, spending patterns, and relaunch sentiment.

Five analytical techniques were applied: (1) Exploratory Data Analysis revealed that food taste, hygiene, and consistency are the dominant selection criteria, while inconsistent quality and poor hygiene lead all dissatisfaction drivers; (2) Data Visualisation linked spending power, visit frequency, and channel preference in a five-plot narrative; (3) Hypothesis Testing found that spend per meal differs significantly across education groups (Kruskal–Wallis p = 0.009), whereas premium-price willingness did not differ significantly by employment status (chi-squared p = 0.55); (4) Correlation Analysis showed that core food-quality attributes are tightly inter-correlated, and that delivery preference co-moves with relaunch intent; and (5) Logistic Regression identified food taste importance, premium willingness, and hygiene importance as the strongest predictors of high relaunch patronage intent (AUC > 0.70).

Key recommendation: Dalos Cuisine should relaunch with an uncompromising quality-first proposition, a day-one delivery channel, and pricing in the ₦3,000–₦8,000 range, targeting employed professionals in the Victoria Island / Lekki / Ikoyi corridor.

2. Professional Disclosure

Job Title: Marketing Communications Lead
Organisation: Knowledge Exchange Centre
Location: Lagos, Nigeria

Why each technique is operationally relevant:

EDA: With 76 survey variables spanning Likert scales, categorical fields, and free text, rigorous EDA is essential to surface data quality issues before any inference is drawn. Undetected outliers or encoding errors in a feasibility study directly mislead investment decisions.
Data Visualisation: The end audience for this study is a business owner and potential investors — non-technical stakeholders who require charts, not tables. Visualisation translates multi-dimensional preference data into a single, actionable story about who the customer is and what they want.
Hypothesis Testing: Pricing strategy and segment targeting require statistical evidence, not intuition. Formal tests with stated α levels convert descriptive observations (“postgraduates seem to spend more”) into defensible business decisions (“postgraduates spend significantly more; p < 0.05”).
Correlation Analysis: Understanding attribute co-movement helps design a coherent service proposition and prevents redundant investment. It also flags multicollinearity before regression modelling.
Logistic Regression: The business question is ultimately binary — will this person come? Logistic regression quantifies each attribute’s contribution to that probability and produces an odds ratio that management can translate into a concrete action.

3. Data Collection & Sampling

Source: Primary data collected by the author (Nwodo Ezinne) via a structured Google Form survey.
Collection method: Self-administered online questionnaire distributed through WhatsApp, LinkedIn, and direct outreach within the researcher’s professional and social network in Lagos.
Target population: Adults residing in Lagos State who eat traditional Nigerian food outside the home at least occasionally.
Sampling frame: Non-probability convenience/snowball sample targeting respondents across Lagos Mainland, Lekki, Victoria Island, Ajah, and Ikoyi to capture geographic variation in willingness-to-pay and venue preference.
Sample size: 117 total responses; 100 confirmed Lagos residents retained after excluding 17 non-Lagos respondents.
Time period: 24 March 2026 – 30 April 2026 (just over five weeks).
Ethical notes: No personally identifiable information (names, phone numbers, email addresses) was collected. Participation was voluntary; consent was implied by form submission. No institutional ethics board clearance was required for fully anonymous survey data.

4. Data Description

Code

# ── SET YOUR WORKING DIRECTORY ───────────────────────────────────────────────
# Update this path to wherever your Dalos Data.xlsx file is saved
setwd("C:/Users/zinny/OneDrive/Desktop/DA EXAM/DA Exam")

# ── Libraries ────────────────────────────────────────────────────────────────
library(tidyverse)
library(readxl)
library(janitor)
library(skimr)
library(corrplot)
library(ggcorrplot)
library(scales)
library(knitr)
library(kableExtra)
library(broom)
library(pROC)
library(car)
library(rstatix)
library(ggpubr)
library(viridis)
library(patchwork)
library(effectsize)

# ── Load data ────────────────────────────────────────────────────────────────
raw <- read_excel("Dalos Data.xlsx")

# Immediately rename ALL columns by position number.
# This makes every downstream reference immune to column name encoding issues.
colnames(raw) <- paste0("c", seq_len(ncol(raw)))

# Column map (for reference):
# c1  Timestamp          c2  Lagos resident     c3  Area live
# c4  Area work          c5  Gender             c6  Education
# c7  Employment         c8  Nationality        c9  Marital status
# c10 Household size     c11 Visit frequency    c12 Fav soups
# c13 Fav swallows       c14 Fav protein        c15 Dining channel
# c16 Spend per meal     c17 imp_taste          c18 imp_freshness
# c19 imp_consistency    c20 imp_portions       c21 imp_hygiene
# c22 imp_ambience       c23 imp_location       c24 imp_parking
# c25 imp_speed          c26 imp_staff          c27 imp_delivery
# c28 imp_online_ord     c29 imp_pricing        c30 imp_variety
# c31 imp_takeaway       c32 imp_authentic      c33 dissatisfaction
# c34 premium_willing    c35 pref_setting       c46 aware_dalos
# c47 overall_exp        c48 food_quality_exp   c50 relaunch_intent

# ── Filter to Lagos residents only ───────────────────────────────────────────
df_raw <- raw |>
  filter(str_trim(as.character(c2)) == "Yes")

cat("Lagos-resident respondents retained:", nrow(df_raw), "\n")

Lagos-resident respondents retained: 100

Code

cat("Total variables:", ncol(df_raw), "\n")

Total variables: 76

Code

# ── Rename key columns ────────────────────────────────────────────────────────
df <- df_raw |>
  rename(
    timestamp        = c1,
    area_live        = c3,
    area_work        = c4,
    gender           = c5,
    education        = c6,
    employment       = c7,
    marital_status   = c9,
    household_size   = c10,
    visit_frequency  = c11,
    fav_soups        = c12,
    dining_channel   = c15,
    spend_raw        = c16,
    imp_taste        = c17,
    imp_freshness    = c18,
    imp_consistency  = c19,
    imp_portions     = c20,
    imp_hygiene      = c21,
    imp_ambience     = c22,
    imp_location     = c23,
    imp_parking      = c24,
    imp_speed        = c25,
    imp_staff        = c26,
    imp_delivery     = c27,
    imp_online_ord   = c28,
    imp_pricing      = c29,
    imp_variety      = c30,
    imp_takeaway     = c31,
    imp_authentic    = c32,
    dissatisfaction  = c33,
    premium_willing  = c34,
    pref_setting     = c35,
    aware_dalos      = c46,
    overall_exp      = c47,
    food_quality_exp = c48,
    relaunch_intent  = c50
  )

# ── Likert encoder: works on ANY encoding of the text ────────────────────────
# Matches on lowercase keywords so UTF-8 / mojibake / spacing never matters
encode_likert <- function(x) {
  x <- str_to_lower(str_trim(iconv(as.character(x), to = "ASCII//TRANSLIT")))
  dplyr::case_when(
    str_detect(x, "not important")  ~ 1L,
    str_detect(x, "slightly")       ~ 2L,
    str_detect(x, "moderately")     ~ 3L,
    str_detect(x, "^important")     ~ 4L,
    str_detect(x, "extremely")      ~ 5L,
    TRUE                             ~ NA_integer_
  )
}

likert_cols <- c("imp_taste","imp_freshness","imp_consistency","imp_portions",
                 "imp_hygiene","imp_ambience","imp_location","imp_parking",
                 "imp_speed","imp_staff","imp_delivery","imp_online_ord",
                 "imp_pricing","imp_variety","imp_takeaway","imp_authentic")

df <- df |>
  mutate(across(all_of(likert_cols), encode_likert))

# ── Spend per meal → ordinal 1–5 ─────────────────────────────────────────────
# Uses digit patterns that survive any currency-symbol encoding
df <- df |>
  mutate(
    spend_num = dplyr::case_when(
      str_detect(as.character(spend_raw), "(?i)below|elow")         ~ 1L,
      str_detect(as.character(spend_raw), "1.?500|1500")            ~ 2L,
      str_detect(as.character(spend_raw), "3.?001|3001")            ~ 3L,
      str_detect(as.character(spend_raw), "5.?001|5001")            ~ 4L,
      str_detect(as.character(spend_raw), "(?i)above|bove|8.?000")  ~ 5L,
      TRUE                                                           ~ NA_integer_
    ),
    spend_label = dplyr::case_when(
      spend_num == 1L ~ "Below N1,500",
      spend_num == 2L ~ "N1,500-3,000",
      spend_num == 3L ~ "N3,001-5,000",
      spend_num == 4L ~ "N5,001-8,000",
      spend_num == 5L ~ "Above N8,000",
      TRUE            ~ NA_character_
    ),
    spend_label = factor(spend_label,
      levels = c("Below N1,500","N1,500-3,000","N3,001-5,000",
                 "N5,001-8,000","Above N8,000"))
  )

# ── Visit frequency → ordinal 1–5 ────────────────────────────────────────────
df <- df |>
  mutate(
    visit_num = dplyr::case_when(
      str_detect(as.character(visit_frequency), "(?i)less")         ~ 1L,
      str_detect(as.character(visit_frequency), "(?i)month")        ~ 2L,
      str_detect(as.character(visit_frequency), "(?i)1.2.*week|1.*2.*week") ~ 3L,
      str_detect(as.character(visit_frequency), "(?i)3.4|3.*4")     ~ 4L,
      str_detect(as.character(visit_frequency), "(?i)daily")        ~ 5L,
      TRUE                                                           ~ NA_integer_
    ),
    freq_label = dplyr::case_when(
      visit_num == 1L ~ "< Once/month",
      visit_num == 2L ~ "1-2x/month",
      visit_num == 3L ~ "1-2x/week",
      visit_num == 4L ~ "3-4x/week",
      visit_num == 5L ~ "Daily",
      TRUE            ~ NA_character_
    ),
    freq_label = factor(freq_label,
      levels = c("< Once/month","1-2x/month","1-2x/week","3-4x/week","Daily"))
  )

# ── Education groups ──────────────────────────────────────────────────────────
df <- df |>
  mutate(
    edu_group = dplyr::case_when(
      str_detect(as.character(education), "(?i)secondary|waec|neco|ond") ~ "Secondary/OND",
      str_detect(as.character(education), "(?i)hnd")                     ~ "HND",
      str_detect(as.character(education), "(?i)bachelor|b\\.sc|b\\.a")   ~ "Bachelor's",
      str_detect(as.character(education), "(?i)postgrad|mba|m\\.sc|ph")  ~ "Postgraduate",
      str_detect(as.character(education), "(?i)professional|cert")       ~ "Professional Cert",
      TRUE                                                                ~ "Other"
    ),
    edu_group = factor(edu_group,
      levels = c("Secondary/OND","HND","Bachelor's",
                 "Postgraduate","Professional Cert"))
  )

# ── Employment groups ─────────────────────────────────────────────────────────
df <- df |>
  mutate(
    emp_group = dplyr::case_when(
      str_detect(as.character(employment), "(?i)private")         ~ "Private sector",
      str_detect(as.character(employment), "(?i)self|business")   ~ "Self-employed",
      str_detect(as.character(employment), "(?i)student")         ~ "Student",
      str_detect(as.character(employment), "(?i)unemploy")        ~ "Unemployed",
      str_detect(as.character(employment), "(?i)public|gov|church") ~ "Public/Other",
      TRUE                                                          ~ "Other"
    )
  )

# ── Binary outcome variables ──────────────────────────────────────────────────
df <- df |>
  mutate(
    # 1 = Very likely OR Extremely likely; 0 = everything else
    intent_binary = if_else(
      str_detect(as.character(relaunch_intent), "(?i)very likely|extremely likely"),
      1L, 0L
    ),
    # 1 = Definitely yes OR Probably yes; 0 = everything else
    premium_binary = if_else(
      str_detect(as.character(premium_willing), "(?i)definitely yes|probably yes"),
      1L, 0L
    )
  )

# ── Median-impute the small number of NAs created by encoding ────────────────
df <- df |>
  mutate(
    spend_num = if_else(is.na(spend_num),
                        as.integer(median(spend_num, na.rm = TRUE)), spend_num),
    visit_num = if_else(is.na(visit_num),
                        as.integer(median(visit_num, na.rm = TRUE)), visit_num)
  )

# ── Quick sanity check ────────────────────────────────────────────────────────
cat("Rows:", nrow(df), "\n")

Rows: 100

Code

cat("\nSpend distribution:\n"); print(table(df$spend_label, useNA = "ifany"))


Spend distribution:


Below N1,500 N1,500-3,000 N3,001-5,000 N5,001-8,000 Above N8,000 
           5           30           32           21           12

Code

cat("\nVisit frequency distribution:\n"); print(table(df$freq_label, useNA = "ifany"))


Visit frequency distribution:


< Once/month   1-2x/month    1-2x/week    3-4x/week        Daily 
          12           20           43           22            3

Code

cat("\nRelaunch intent binary:\n"); print(table(df$intent_binary, useNA = "ifany"))


Relaunch intent binary:


 0  1 
52 48

Code

cat("\nPremium willing binary:\n"); print(table(df$premium_binary, useNA = "ifany"))


Premium willing binary:


 0  1 
10 90

Code

tibble(
  `#` = 1:11,
  Variable = c("gender","education / edu_group","employment / emp_group",
               "household_size","visit_frequency / visit_num",
               "dining_channel","spend_raw / spend_num",
               "imp_taste … imp_authentic (16 cols)",
               "premium_willing / premium_binary",
               "relaunch_intent / intent_binary",
               "aware_dalos"),
  Type = c("Categorical","Categorical / Grouped","Categorical / Grouped",
           "Ordinal text","Ordinal text / Numeric 1-5",
           "Categorical","Ordinal text / Numeric 1-5",
           "Likert text / Numeric 1-5",
           "Categorical / Binary 0-1",
           "Ordinal text / Binary 0-1",
           "Categorical"),
  Role = c("Demographic","Demographic / Predictor","Demographic / Predictor",
           "Contextual","Predictor","Predictor","Outcome + Predictor",
           "Predictors","Outcome","Primary Outcome","Descriptor")
) |>
  kable(caption = "Variable inventory — all 76 columns; key variables shown") |>
  kable_styling(bootstrap_options = c("striped","hover"))

Variable inventory — all 76 columns; key variables shown
#	Variable	Type	Role
1	gender	Categorical	Demographic
2	education / edu_group	Categorical / Grouped	Demographic / Predictor
3	employment / emp_group	Categorical / Grouped	Demographic / Predictor
4	household_size	Ordinal text	Contextual
5	visit_frequency / visit_num	Ordinal text / Numeric 1-5	Predictor
6	dining_channel	Categorical	Predictor
7	spend_raw / spend_num	Ordinal text / Numeric 1-5	Outcome + Predictor
8	imp_taste … imp_authentic (16 cols)	Likert text / Numeric 1-5	Predictors
9	premium_willing / premium_binary	Categorical / Binary 0-1	Outcome
10	relaunch_intent / intent_binary	Ordinal text / Binary 0-1	Primary Outcome
11	aware_dalos	Categorical	Descriptor

Code

df |>
  select(spend_num, visit_num,
         imp_taste, imp_freshness, imp_consistency,
         imp_hygiene, imp_pricing, imp_delivery) |>
  skim() |>
  as_tibble() |>
  select(skim_variable, n_missing, numeric.mean, numeric.sd,
         numeric.p25, numeric.p50, numeric.p75) |>
  kable(digits = 2, caption = "Summary statistics — key numeric variables") |>
  kable_styling(bootstrap_options = c("striped","hover"))

Summary statistics — key numeric variables
skim_variable	numeric.mean	numeric.sd	numeric.p25	numeric.p50	numeric.p75
spend_num	3.05	1.10	2	3	4.00
visit_num	2.84	1.00	2	3	3.25
imp_taste	4.59	0.77	4	5	5.00
imp_freshness	4.60	0.71	4	5	5.00
imp_consistency	4.61	0.68	4	5	5.00
imp_hygiene	4.69	0.66	5	5	5.00
imp_pricing	4.43	0.84	4	5	5.00
imp_delivery	3.51	1.11	3	4	4.00

5. Technique 1 — Exploratory Data Analysis

5.1 Theory Recap

Exploratory Data Analysis (Tukey, 1977) interrogates data through numerical summaries and graphics before modelling. It identifies missing values, outliers, distributional skewness, and structural patterns. Anscombe’s Quartet (1973) demonstrated that datasets with identical summary statistics can differ radically in shape — making visual EDA non-negotiable.
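
The Anscombe point can be checked directly, since the four datasets ship with base R as the `anscombe` data frame. The sketch below is illustrative only and is not part of the Dalos analysis; the object names `x_means`, `y_means`, and `xy_cor` are introduced here for the example.

```r
# Anscombe's Quartet ships with base R (datasets::anscombe).
# Columns x1..x4 and y1..y4 hold the four x/y pairs.
data(anscombe)

# Per-dataset summaries: on paper the four sets look almost identical
x_means <- sapply(anscombe[, paste0("x", 1:4)], mean)
y_means <- sapply(anscombe[, paste0("y", 1:4)], mean)
xy_cor  <- sapply(1:4, function(i) {
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
})

print(round(x_means, 2))
print(round(y_means, 2))
print(round(xy_cor, 3))

# Plotting reveals what the summaries hide (drawn only in interactive sessions)
if (interactive()) {
  op <- par(mfrow = c(2, 2), mar = c(4, 4, 2, 1))
  for (i in 1:4) {
    x <- anscombe[[paste0("x", i)]]
    y <- anscombe[[paste0("y", i)]]
    plot(x, y, main = paste("Dataset", i), pch = 19)
    abline(lm(y ~ x), col = "red")
  }
  par(op)
}
```

The near-identical means and correlations, set against four visibly different scatterplots, are the reason every summary table in this study is paired with a chart.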

5.2 Business Justification

Before advising on the relaunch, we must know who responded, whether the data are clean, and whether any anomalies could distort downstream conclusions. A missed outlier in a feasibility study can misrepresent willingness-to-pay and lead to a mispriced launch.

5.3 Analysis

Code

# ── Data quality check 1: missing values ─────────────────────────────────────
miss_check <- df |>
  select(spend_num, visit_num, all_of(likert_cols)) |>
  summarise(across(everything(), ~ sum(is.na(.)))) |>
  pivot_longer(everything(), names_to = "Variable", values_to = "N_Missing") |>
  filter(N_Missing > 0)

if (nrow(miss_check) > 0) {
  cat("Data Quality Issue 1 — missing values detected after encoding:\n")
  miss_check |>
    kable(caption = "Missing value count per variable") |>
    kable_styling(bootstrap_options = "striped", full_width = FALSE)
} else {
  cat("No missing values remain in key numeric columns after median imputation.\n")
}

No missing values remain in key numeric columns after median imputation.

Code

# ── Data quality check 2: outlier detection via boxplot ──────────────────────
df |>
  select(imp_taste, imp_freshness, imp_consistency,
         imp_hygiene, imp_pricing, imp_ambience, imp_delivery) |>
  pivot_longer(everything(), names_to = "Attribute", values_to = "Score") |>
  mutate(Attribute = str_remove(Attribute, "imp_") |> str_to_title()) |>
  ggplot(aes(x = reorder(Attribute, Score, median), y = Score, fill = Attribute)) +
  geom_boxplot(alpha = 0.75, outlier.colour = "red",
               outlier.shape = 16, outlier.size = 2.5) +
  scale_fill_viridis_d(option = "D") +
  coord_flip() +
  labs(title = "EDA — Distribution of Service Attribute Importance Scores",
       subtitle = "Red dots = statistical outliers flagged for investigation (Likert scale 1–5)",
       x = NULL, y = "Importance Score") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none")

Code

# ── Demographics ──────────────────────────────────────────────────────────────
p_gender <- df |>
  count(gender) |>
  filter(!is.na(gender)) |>
  ggplot(aes(x = reorder(as.character(gender), n), y = n, fill = as.character(gender))) +
  geom_col(width = 0.6, show.legend = FALSE) +
  geom_text(aes(label = n), hjust = -0.2, size = 4) +
  coord_flip() +
  scale_fill_manual(values = c("Female" = "#E07B54",
                                "Male"   = "#4A90D9",
                                "Prefer not to say" = "#AAAAAA")) +
  labs(title = "Gender", x = NULL, y = "Count") +
  theme_minimal(base_size = 12)

p_edu <- df |>
  filter(!is.na(edu_group)) |>
  count(edu_group) |>
  ggplot(aes(x = reorder(edu_group, n), y = n, fill = edu_group)) +
  geom_col(width = 0.6, show.legend = FALSE) +
  geom_text(aes(label = n), hjust = -0.2, size = 4) +
  coord_flip() +
  scale_fill_viridis_d(option = "C") +
  labs(title = "Education Level", x = NULL, y = "Count") +
  theme_minimal(base_size = 12)

(p_gender | p_edu) +
  plot_annotation(
    title = "Respondent Demographic Profile (n = 100)",
    theme = theme(plot.title = element_text(size = 15, face = "bold"))
  )

Code

p_spend <- df |>
  filter(!is.na(spend_label)) |>
  count(spend_label) |>
  ggplot(aes(x = spend_label, y = n, fill = spend_label)) +
  geom_col(show.legend = FALSE) +
  geom_text(aes(label = n), vjust = -0.4, size = 4) +
  scale_fill_brewer(palette = "YlOrRd") +
  labs(title = "Spend Per Meal", x = NULL, y = "Respondents") +
  theme_minimal(base_size = 12) +
  theme(axis.text.x = element_text(angle = 30, hjust = 1))

p_freq <- df |>
  filter(!is.na(freq_label)) |>
  count(freq_label) |>
  ggplot(aes(x = freq_label, y = n, fill = freq_label)) +
  geom_col(show.legend = FALSE) +
  geom_text(aes(label = n), vjust = -0.4, size = 4) +
  scale_fill_brewer(palette = "Blues") +
  labs(title = "Visit Frequency", x = NULL, y = "Respondents") +
  theme_minimal(base_size = 12) +
  theme(axis.text.x = element_text(angle = 30, hjust = 1))

(p_spend | p_freq) +
  plot_annotation(
    title = "Spending & Dining Frequency Distributions",
    theme = theme(plot.title = element_text(size = 15, face = "bold"))
  )

5.4 Interpretation for Management

Data quality issue 1 — Encoding: A portion of responses contained mojibake currency symbols (e.g., â‚¦ instead of ₦). These were handled by matching on digit patterns (e.g., "1.?500") rather than exact string matching, recovering all observations without imputation.
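
The digit-pattern idea can be sketched in base R as follows. This is a simplified illustration rather than the study's own encoder: `spend_band()` is a hypothetical helper, the answer strings are invented, and base `grepl()` stands in for `stringr::str_detect()`. The Naira sign and its mojibake form are written as Unicode escapes so the example does not depend on the file's encoding.

```r
# Hypothetical helper (not the study's function): map a free-text spend
# answer to band 1-5 by looking only at ASCII keywords and digit patterns.
spend_band <- function(x) {
  x   <- as.character(x)
  out <- rep(NA_integer_, length(x))
  # Rules are tried in order; a later rule only fills entries still NA
  rules <- list(
    list(pattern = "below",        band = 1L),
    list(pattern = "1.?500",       band = 2L),
    list(pattern = "3.?001",       band = 3L),
    list(pattern = "5.?001",       band = 4L),
    list(pattern = "above|8.?000", band = 5L)
  )
  for (r in rules) {
    hit <- is.na(out) & grepl(r$pattern, x, ignore.case = TRUE)
    out[hit] <- r$band
  }
  out
}

naira    <- "\u20a6"              # correctly decoded Naira sign
mojibake <- "\u00e2\u201a\u00a6"  # the same sign after a bad decode

# Invented answers mixing clean and garbled currency symbols
answers <- c(
  paste0("Below ", mojibake, "1,500"),
  paste0(naira, "1,500 - ", naira, "3,000"),
  paste0(mojibake, "3,001 - ", mojibake, "5,000"),
  paste0(naira, "5,001 - ", naira, "8,000"),
  paste0("Above ", mojibake, "8,000"),
  "No answer"
)

bands <- spend_band(answers)
print(bands)
```

Because none of the patterns mention the currency symbol, clean and garbled answers land in the same band, and anything unrecognised stays `NA` instead of being silently misclassified.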

Data quality issue 2 — Low-engagement outlier: One respondent rated every service attribute as “Slightly important” (score = 2 across all 16 Likert items). This is a legitimate extreme low-engagement profile; the record was retained as it represents a real market segment (price-insensitive, indifferent customers who are unlikely to be early adopters regardless of quality improvements).
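
Uniform response patterns like this one can be flagged programmatically instead of by eye. The sketch below uses a small simulated matrix, not survey data, and the names `likert`, `is_straightlined`, and `flagged` are introduced only for the example. A respondent who gives one identical score to every item may be genuinely indifferent or may be answering inattentively ("straight-lining"), so a flag is a prompt for review, not grounds for automatic removal.

```r
# Simulated Likert answers (NOT survey data): 4 respondents x 5 items
likert <- matrix(
  c(5, 4, 5, 3, 4,
    2, 2, 2, 2, 2,   # same score on every item
    4, 5, 5, 5, 4,
    3, 3, 3, 3, 3),  # same score on every item
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("r", 1:4), paste0("item", 1:5))
)

# A row is flagged when it contains exactly one distinct value
is_straightlined <- apply(likert, 1, function(row) length(unique(row)) == 1)
flagged <- names(which(is_straightlined))
print(flagged)
```

Assuming the encoded data frame and column vector from Section 4, the same `apply()` call over `df[likert_cols]` would list every respondent with a uniform profile across the 16 attributes.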

The demographic profile is dominated by bachelor’s-educated respondents (59%) and private-sector employees (64%), reflecting Lagos’s urban professional class. Spending clusters in the ₦3,001–₦5,000 band (32%), with a further 33% spending above ₦5,000 (21% in the ₦5,001–₦8,000 band and 12% above ₦8,000), a meaningful higher-spending cohort. Visit frequency peaks at 1–2 times per week (43%), confirming a core of habitual traditional-food diners.

6. Technique 2 — Data Visualisation

6.1 Theory Recap

The grammar of graphics (Wilkinson, 2005; ggplot2, Wickham, 2016) maps data aesthetics to geometric objects. Effective storytelling selects chart types matched to variable types and the intended message: bar charts for ranked counts, stacked bars for proportional comparisons, and gradient fills to encode magnitude.

6.2 Business Justification

The restaurant owner and potential investors are non-technical stakeholders. A cohesive five-plot narrative — from current behaviour through unmet needs to relaunch potential — communicates the business case far more effectively than summary tables alone.

6.3 Visualisation Narrative

Code

df |>
  filter(!is.na(dining_channel)) |>
  mutate(channel = str_wrap(as.character(dining_channel), 32)) |>
  count(channel, sort = TRUE) |>
  ggplot(aes(x = reorder(channel, n), y = n, fill = n)) +
  geom_col(show.legend = FALSE) +
  geom_text(aes(label = n), hjust = -0.2, size = 4) +
  coord_flip() +
  scale_fill_gradient(low = "#FDDBC7", high = "#B2182B") +
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  labs(
    title    = "Plot 1 — Where Respondents Currently Buy Traditional Nigerian Food",
    subtitle = "Sit-down restaurants lead; online delivery is a strong second channel",
    x = NULL, y = "Number of Respondents"
  ) +
  theme_minimal(base_size = 13)

Code

df |>
  select(all_of(likert_cols)) |>
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE))) |>
  pivot_longer(everything(), names_to = "Attribute", values_to = "Mean") |>
  mutate(
    Attribute = str_remove(Attribute, "imp_") |>
      str_replace_all("_", " ") |> str_to_title()
  ) |>
  ggplot(aes(x = reorder(Attribute, Mean), y = Mean, fill = Mean)) +
  geom_col(show.legend = FALSE) +
  geom_text(aes(label = round(Mean, 2)), hjust = -0.1, size = 3.5) +
  coord_flip() +
  scale_fill_gradient(low = "#DEEBF7", high = "#08519C") +
  scale_y_continuous(limits = c(0, 5.6)) +
  labs(
    title    = "Plot 2 — Mean Importance of Restaurant Selection Attributes",
    subtitle = "Food taste, hygiene, and consistency are the non-negotiables",
    x = NULL, y = "Mean Score (1 = Not Important, 5 = Extremely Important)"
  ) +
  theme_minimal(base_size = 13)

Code

df |>
  filter(!is.na(dissatisfaction)) |>
  mutate(reasons = str_split(as.character(dissatisfaction), ",")) |>
  unnest(reasons) |>
  mutate(reasons = str_trim(reasons)) |>
  filter(nchar(reasons) > 2) |>
  count(reasons, sort = TRUE) |>
  slice_head(n = 10) |>
  mutate(reasons = str_wrap(reasons, 36)) |>
  ggplot(aes(x = reorder(reasons, n), y = n, fill = n)) +
  geom_col(show.legend = FALSE) +
  geom_text(aes(label = n), hjust = -0.2, size = 4) +
  coord_flip() +
  scale_fill_gradient(low = "#FEE0D2", high = "#CB181D") +
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  labs(
    title    = "Plot 3 — Top Dissatisfaction Reasons with Current Nigerian Restaurants",
    subtitle = "Inconsistent quality and poor hygiene top the list — Dalos's key opportunity",
    x = NULL, y = "Number of Mentions"
  ) +
  theme_minimal(base_size = 13)

Code

intent_lvls <- c("Very unlikely","Unlikely","Somewhat likely",
                 "Very likely","Extremely likely")

df |>
  filter(!is.na(relaunch_intent), !is.na(spend_label)) |>
  mutate(intent = factor(as.character(relaunch_intent), levels = intent_lvls)) |>
  count(spend_label, intent) |>
  ggplot(aes(x = spend_label, y = n, fill = intent)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = percent_format()) +
  scale_fill_brewer(palette = "RdYlGn", na.value = "grey80", drop = FALSE) +
  labs(
    title    = "Plot 4 — Relaunch Patronage Intent by Spending Band",
    subtitle = "Higher-spending respondents show markedly stronger relaunch intent",
    x = "Typical Spend Per Meal", y = "Proportion of Respondents", fill = "Intent"
  ) +
  theme_minimal(base_size = 13) +
  theme(axis.text.x = element_text(angle = 25, hjust = 1))

Code

df |>
  filter(!is.na(premium_willing), !is.na(edu_group)) |>
  mutate(
    prem = dplyr::case_when(
      str_detect(as.character(premium_willing), "(?i)definitely yes") ~ "Definitely Yes",
      str_detect(as.character(premium_willing), "(?i)probably yes") ~ "Probably Yes",
      str_detect(as.character(premium_willing), "(?i)unsure")     ~ "Unsure",
      TRUE ~ "No / Probably Not"
    ),
    prem = factor(prem, levels = c("Definitely Yes","Probably Yes",
                                   "Unsure","No / Probably Not"))
  ) |>
  count(edu_group, prem) |>
  ggplot(aes(x = edu_group, y = n, fill = prem)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = percent_format()) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title    = "Plot 5 — Premium Price Willingness by Education Level",
    subtitle = "Postgraduates and professional cert holders show the highest premium acceptance",
    x = "Education Group", y = "Proportion", fill = "Willingness"
  ) +
  theme_minimal(base_size = 13) +
  theme(axis.text.x = element_text(angle = 20, hjust = 1))

6.4 Interpretation for Management

The five plots form a single business case: Lagos diners primarily use sit-down restaurants, with online delivery a strong second channel (Plot 1). They treat taste, hygiene, and consistency as near-mandatory (Plot 2), yet these are precisely the attributes where current restaurants are failing (Plot 3), a gap Dalos can own. Higher-spending customers are more likely to patronise the relaunch (Plot 4), suggesting that a premium price point targets the most commercially attractive segment. Postgraduate and professionally certified respondents are most willing to pay that premium (Plot 5), which points to the priority marketing audience, although the professional-certificate group contains only two respondents and should be read with caution.

7. Technique 3 — Hypothesis Testing

7.1 Theory Recap

Hypothesis testing (Fisher, 1925; Neyman & Pearson, 1933) provides a formal framework for drawing inferences from sample data. The null hypothesis H₀ posits no effect; H₁ posits an effect. We use α = 0.05 throughout. Effect sizes (η² and Cramér’s V) are reported alongside p-values to convey practical — not just statistical — significance.
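
As a worked check on the two effect sizes, the sketch below recomputes them by hand from the test statistics reported in Sections 7.3 and 7.4. The formulas are the standard ones: eta-squared based on the Kruskal–Wallis H statistic (the `eta2[H]` method that `rstatix::kruskal_effsize()` reports) and Cramér’s V. The variable names are introduced here for illustration.

```r
# Figures as reported in Sections 7.3 and 7.4 of this study
H <- 13.511   # Kruskal-Wallis statistic
k <- 5        # number of education groups
n <- 100      # respondents

# Eta-squared based on H: (H - k + 1) / (n - k)
eta2_H <- (H - k + 1) / (n - k)

X2   <- 2.8426  # Pearson chi-squared statistic
rows <- 5       # employment groups
cols <- 2       # willing / not willing

# Cramer's V: sqrt(X2 / (n * (min(rows, cols) - 1)))
cramers_v <- sqrt(X2 / (n * (min(rows, cols) - 1)))

cat(sprintf("eta2[H] = %.3f, Cramer's V = %.3f\n", eta2_H, cramers_v))
```

Both values reproduce the figures printed by the analysis code (0.100 and 0.169), which confirms that the reported effect sizes follow from the reported test statistics.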

7.2 Business Justification

Two investment decisions require statistical evidence: (1) Pricing strategy — if education level predicts higher spending, tiered menus are justified; (2) Segment targeting — if premium willingness varies by employment status, marketing budgets should concentrate on specific groups.

7.3 Hypothesis 1 — Spend Per Meal vs. Education Level

H₀: Mean spend score is equal across all education groups.
H₁: At least one education group has a different mean spend score.
Test: One-way ANOVA after checking homogeneity of variance (Levene’s test). Kruskal–Wallis used if the Levene p-value < 0.05.

Code

df_h1 <- df |>
  filter(!is.na(spend_num), !is.na(edu_group)) |>
  mutate(edu_group = droplevels(edu_group))

cat("Sample sizes per education group:\n")

Sample sizes per education group:

Code

print(table(df_h1$edu_group))


    Secondary/OND               HND        Bachelor's      Postgraduate 
                5                 8                59                26 
Professional Cert 
                2

Code

lev <- leveneTest(spend_num ~ edu_group, data = df_h1)
cat("Levene's Test for Homogeneity of Variance:\n")

Levene's Test for Homogeneity of Variance:

Code

print(lev)

Levene's Test for Homogeneity of Variance (center = median)
      Df F value  Pr(>F)  
group  4  2.5221 0.04603 *
      95                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Code

lev_p <- lev$`Pr(>F)`[1]

Code

if (!is.na(lev_p) && lev_p < 0.05) {
  cat("Levene p < 0.05 => variances unequal => using Kruskal-Wallis test\n\n")
  kw <- kruskal.test(spend_num ~ edu_group, data = df_h1)
  print(kw)
  cat("\nEffect size (eta-squared based on H):\n")
  print(kruskal_effsize(df_h1, spend_num ~ edu_group))
} else {
  cat("Levene p >= 0.05 => variances homogeneous => using one-way ANOVA\n\n")
  aov_fit <- aov(spend_num ~ edu_group, data = df_h1)
  print(summary(aov_fit))
  eta <- eta_squared(aov_fit, partial = FALSE)
  cat("\nEffect size eta-squared:", round(eta$Eta2, 3),
      " (< 0.01 negligible | 0.01-0.06 small | 0.06-0.14 medium | > 0.14 large)\n")
  cat("\nTukey HSD Post-Hoc:\n")
  TukeyHSD(aov_fit)$edu_group |>
    as_tibble(rownames = "Comparison") |>
    kable(digits = 3, caption = "Tukey HSD pairwise comparisons") |>
    kable_styling(bootstrap_options = "striped", full_width = FALSE)
}

Levene p < 0.05 => variances unequal => using Kruskal-Wallis test


    Kruskal-Wallis rank sum test

data:  spend_num by edu_group
Kruskal-Wallis chi-squared = 13.511, df = 4, p-value = 0.009032


Effect size (eta-squared based on H):
# A tibble: 1 × 5
  .y.           n effsize method  magnitude
* <chr>     <int>   <dbl> <chr>   <ord>    
1 spend_num   100   0.100 eta2[H] moderate

Code

df_h1 |>
  ggplot(aes(x = edu_group, y = spend_num, fill = edu_group)) +
  geom_boxplot(alpha = 0.75, show.legend = FALSE, outlier.shape = 21) +
  stat_summary(fun = mean, geom = "point",
               shape = 23, size = 3.5, fill = "white", colour = "black") +
  scale_fill_brewer(palette = "Pastel1") +
  labs(
    title    = "Hypothesis 1 — Spend Per Meal by Education Group",
    subtitle = "White diamond = group mean  |  Scores: 1 = Below N1,500 … 5 = Above N8,000",
    x = "Education Group", y = "Spend Band (1–5)"
  ) +
  theme_minimal(base_size = 13) +
  theme(axis.text.x = element_text(angle = 20, hjust = 1))

Interpretation

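Levene’s test indicated unequal variances (p = 0.046), so the Kruskal–Wallis test was used. It rejects H₀ (χ² = 13.51, df = 4, p = 0.009) with a moderate effect size (η²[H] = 0.10): typical spend per meal differs across education groups. The omnibus test does not identify which groups differ, and because the Kruskal–Wallis branch was taken, no Tukey HSD contrasts were produced; a pairwise post-hoc comparison is needed before concluding, for example, that postgraduates outspend Secondary/OND respondents. If that contrast holds, it would support a premium menu tier above ₦8,000 aimed at postgraduate customers. The smallest groups (Secondary/OND, n = 5; Professional Cert, n = 2) are too small to support firm conclusions on their own.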
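
A significant Kruskal–Wallis result says only that at least one group differs, so a pairwise follow-up is the usual next step. The sketch below shows one way to do this using base R only: pairwise Wilcoxon rank-sum tests with Holm adjustment, used here as a stand-in for Dunn’s test (available, for example, as `rstatix::dunn_test()`). The data are simulated and are not the survey responses; the group labels merely mirror the study’s education groups.

```r
set.seed(42)

# Simulated spend bands (1-5) for three illustrative groups; NOT survey data
sim <- data.frame(
  edu_group = factor(
    rep(c("HND", "Bachelor's", "Postgraduate"), times = c(15, 40, 25)),
    levels = c("HND", "Bachelor's", "Postgraduate")
  ),
  spend_num = c(
    sample(1:5, 15, replace = TRUE, prob = c(.30, .35, .20, .10, .05)),
    sample(1:5, 40, replace = TRUE, prob = c(.10, .30, .30, .20, .10)),
    sample(1:5, 25, replace = TRUE, prob = c(.05, .15, .25, .30, .25))
  )
)

# Step 1: omnibus test across all groups
kw <- kruskal.test(spend_num ~ edu_group, data = sim)

# Step 2: pairwise rank-sum tests, Holm-adjusted for multiple comparisons.
# exact = FALSE because banded (ordinal) data contain many ties.
posthoc <- pairwise.wilcox.test(sim$spend_num, sim$edu_group,
                                p.adjust.method = "holm", exact = FALSE)

print(kw)
print(posthoc$p.value)
```

The adjusted p-value matrix shows which specific pairs differ. Applied to `df_h1`, the same two steps would indicate whether the Postgraduate versus Secondary/OND contrast is supported, though with only 5 and 2 respondents in the smallest groups the pairwise tests would have little power.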

7.4 Hypothesis 2 — Premium Willingness vs. Employment Status

H₀: The proportion willing to pay a premium is equal across all employment groups.
H₁: Premium willingness proportions differ across employment groups.
Test: Pearson chi-squared test of independence; effect size = Cramér’s V.

Code

df_h2 <- df |>
  filter(!is.na(premium_binary), !is.na(emp_group),
         emp_group != "Other") |>
  mutate(emp_group = factor(emp_group))

ct <- table(Employment = df_h2$emp_group,
            Premium    = factor(df_h2$premium_binary,
                                labels = c("Not Willing","Willing")))
cat("Contingency Table:\n")

Contingency Table:

Code

print(ct)

                Premium
Employment       Not Willing Willing
  Private sector           6      58
  Public/Other             0       2
  Self-employed            3      22
  Student                  0       6
  Unemployed               1       2

Code

cat("\nRow percentages:\n")


Row percentages:

Code

print(round(prop.table(ct, margin = 1) * 100, 1))

                Premium
Employment       Not Willing Willing
  Private sector         9.4    90.6
  Public/Other           0.0   100.0
  Self-employed         12.0    88.0
  Student                0.0   100.0
  Unemployed            33.3    66.7

Code

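# Simulate the p-value when any expected (not observed) cell count is below 5
expected <- suppressWarnings(chisq.test(ct)$expected)
chi <- chisq.test(ct, simulate.p.value = any(expected < 5))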
cat("\nChi-Squared Test:\n")


Chi-Squared Test:

Code

print(chi)


    Pearson's Chi-squared test with simulated p-value (based on 2000
    replicates)

data:  ct
X-squared = 2.8426, df = NA, p-value = 0.5522

Code

v <- sqrt(chi$statistic / (sum(ct) * (min(dim(ct)) - 1)))
cat(sprintf("\nCramer's V = %.3f  (< 0.1 negligible | 0.1-0.3 small | 0.3-0.5 medium | > 0.5 large)\n", v))


Cramer's V = 0.169  (< 0.1 negligible | 0.1-0.3 small | 0.3-0.5 medium | > 0.5 large)

Code

df_h2 |>
  count(emp_group, premium_binary) |>
  mutate(label = if_else(premium_binary == 1L, "Willing", "Not Willing")) |>
  ggplot(aes(x = reorder(emp_group, -premium_binary * n),
             y = n, fill = label)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = percent_format()) +
  scale_fill_manual(values = c("Willing" = "#2CA25F",
                                "Not Willing" = "#DE2D26")) +
  labs(
    title    = "Hypothesis 2 — Premium Price Willingness by Employment Status",
    subtitle = "Share willing to pay a premium within each employment group",
    x = "Employment Group", y = "Proportion", fill = NULL
  ) +
  theme_minimal(base_size = 13) +
  theme(axis.text.x = element_text(angle = 20, hjust = 1))

Interpretation

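With a simulated p-value of 0.552 we fail to reject H₀: the survey provides no evidence that premium willingness differs by employment status, and the effect is small (Cramér's V = 0.169). Willingness is high in every group, from 66.7% among unemployed respondents (2 of 3) to 100% among students (6 of 6) and Public/Other respondents (2 of 2), with private-sector and self-employed respondents at 90.6% and 88.0%. Three of the five groups contain six or fewer respondents, so the group-level percentages are unstable. Management implication: premium pricing appears broadly acceptable across employment segments in this sample. A value-meal option (e.g., soup + swallow + one protein, ₦2,500) for budget-constrained segments is not supported by this test and should be treated as a separate hypothesis, while brand identity stays anchored on quality for the professional segment.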
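
Several groups in the contingency table are very small, so an exact test is a useful cross-check on the simulated chi-squared result. The sketch below rebuilds the table from the printed counts and runs base R's `fisher.test()`; it is an additional check under those assumptions, not part of the original analysis.

```r
# Counts copied from the contingency table printed above
ct <- matrix(
  c(6, 58,
    0,  2,
    3, 22,
    0,  6,
    1,  2),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    Employment = c("Private sector", "Public/Other", "Self-employed",
                   "Student", "Unemployed"),
    Premium    = c("Not Willing", "Willing")
  )
)

# Asymptotic chi-squared test, kept only for its statistic and expected counts
chi_approx <- suppressWarnings(chisq.test(ct))
round(chi_approx$expected, 1)   # several expected counts fall below 5

# Fisher's exact test does not rely on the large-sample approximation
fisher_res <- fisher.test(ct)
fisher_res$p.value
```

The chi-squared statistic from the rebuilt table matches the 2.8426 reported above, which confirms the counts were transcribed correctly before the exact p-value is read.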

8. Technique 4 — Correlation Analysis

8.1 Theory Recap

Spearman’s ρ is the rank-based correlation coefficient appropriate for ordinal Likert data. It measures the monotonic association between two variables without assuming normality. A full correlation matrix with heatmap summarises all pairwise relationships simultaneously. Partial correlation controls for a third confounding variable.
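
To make the rank-based definition concrete, the sketch below shows that Spearman's ρ is Pearson's r applied to ranks. The two vectors are hypothetical Likert responses invented for illustration, not survey data.

```r
# Hypothetical 1-5 Likert responses from eight respondents
taste     <- c(5, 4, 5, 3, 4, 2, 5, 1)
freshness <- c(4, 4, 5, 2, 3, 3, 5, 1)

# Spearman's rho computed directly
rho_direct <- cor(taste, freshness, method = "spearman")

# The same value from Pearson's r on average ranks (tied scores share a rank)
rho_ranks <- cor(rank(taste), rank(freshness), method = "pearson")

c(direct = rho_direct, via_ranks = rho_ranks)
```

Because only the ordering of responses enters the calculation, the unequal spacing between Likert categories does not affect the coefficient.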

8.2 Business Justification

Knowing which attributes co-move shapes the service proposition: highly correlated attributes can be bundled under a single marketing message, reducing complexity. The matrix also screens for multicollinearity before logistic regression.

8.3 Analysis

Code

corr_data <- df |>
  select(all_of(likert_cols), spend_num, visit_num) |>
  rename_with(
    ~ str_remove(.x, "imp_") |>
      str_replace_all("_", "\n") |>
      str_to_title()
  ) |>
  rename(`Spend\nBand` = `Spend\nNum`,
         `Visit\nFreq` = `Visit\nNum`)

R <- cor(corr_data, use = "pairwise.complete.obs", method = "spearman")

ggcorrplot(
  R,
  method   = "square",
  type     = "lower",
  lab      = TRUE,
  lab_size = 2.2,
  colors   = c("#D73027","#FFFFFF","#1A9850"),
  title    = "Spearman Correlation Matrix — Service Importance Attributes",
  ggtheme  = theme_minimal(base_size = 10)
)

Code

R_long <- R |>
  as_tibble(rownames = "Var1") |>
  pivot_longer(-Var1, names_to = "Var2", values_to = "rho") |>
  filter(Var1 < Var2, !is.na(rho)) |>
  arrange(desc(abs(rho))) |>
  slice_head(n = 10)

R_long |>
  kable(digits = 3,
        col.names = c("Variable 1","Variable 2","Spearman rho"),
        caption   = "Top 10 pairwise Spearman correlations (by absolute value)") |>
  kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE)

Top 10 pairwise Spearman correlations (by absolute value)
Variable 1	Variable 2	Spearman rho
Delivery	Online Ord	0.70
Pricing	Takeaway	0.633
Speed	Staff	0.615
Consistency	Freshness	0.607
Ambience	Staff	0.593
Takeaway	Variety	0.582
Freshness	Taste	0.582
Hygiene	Taste	0.582
Pricing	Variety	0.575
Location	Speed	0.570

Code

pc_df <- df |>
  select(imp_hygiene, spend_num, visit_num) |>
  drop_na()

partial_r <- cor(
  residuals(lm(imp_hygiene ~ visit_num, data = pc_df)),
  residuals(lm(spend_num   ~ visit_num, data = pc_df)),
  method = "spearman"
)
cat("Partial correlation — hygiene importance vs spend, controlling for visit frequency:\n")

Partial correlation — hygiene importance vs spend, controlling for visit frequency:

Code

cat("  Partial rho =", round(partial_r, 3), "\n")

  Partial rho = 0.264

Code

cat("  Interpretation: controlling for how often someone dines out,",
    "hygiene importance and spend are", 
    ifelse(abs(partial_r) > 0.3, "meaningfully", "weakly"), "correlated.\n")

  Interpretation: controlling for how often someone dines out, hygiene importance and spend are weakly correlated.
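
The code above regresses the raw scores on visit frequency and then rank-correlates the residuals. A common alternative, sketched here on simulated data, ranks all three variables first and applies the first-order partial correlation formula. The variables `visit`, `hygiene`, and `spend` are hypothetical stand-ins, so the numbers will not match the 0.264 reported above.

```r
set.seed(42)
n <- 60

# Hypothetical ordinal scores; visit frequency influences both other variables
visit   <- sample(1:5, n, replace = TRUE)
hygiene <- pmin(5, pmax(1, visit + sample(-1:1, n, replace = TRUE)))
spend   <- pmin(5, pmax(1, visit + sample(-2:2, n, replace = TRUE)))

# Pairwise Spearman correlations
r_xy <- cor(hygiene, spend, method = "spearman")
r_xz <- cor(hygiene, visit, method = "spearman")
r_yz <- cor(spend,   visit, method = "spearman")

# First-order partial correlation formula
partial_formula <- (r_xy - r_xz * r_yz) /
  sqrt((1 - r_xz^2) * (1 - r_yz^2))

# Equivalent route: rank first, residualise on ranked visit, then Pearson
ranks <- data.frame(h = rank(hygiene), s = rank(spend), v = rank(visit))
partial_resid <- cor(residuals(lm(h ~ v, data = ranks)),
                     residuals(lm(s ~ v, data = ranks)))

c(formula = partial_formula, residuals = partial_resid)
```

Both routes give the same value when ranking happens before residualising, which makes the formula a convenient check on any partial Spearman estimate.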

8.4 Business Interpretation of the Strongest Correlations

Delivery ↔︎ Online Ordering (ρ ≈ 0.70): The strongest pair in the matrix. Respondents who value delivery also tend to value online ordering, so a single platform investment (e.g., Chowdeck integration plus a preorder WhatsApp flow) can serve both preferences without building two separate systems.
Consistency ↔︎ Freshness (ρ = 0.607), Freshness ↔︎ Taste and Hygiene ↔︎ Taste (both ρ = 0.582): A moderately correlated food-quality cluster. Customers who are highly sensitive to taste tend to be demanding about freshness and hygiene as well. A single “Quality Guarantee” brand pillar covering ingredient sourcing, prep standards, and daily taste-checks addresses these attributes together.
Hygiene ↔︎ Consistency: This pair does not appear among the ten strongest correlations, so its ρ is no higher than 0.570. Visible kitchen hygiene (open kitchen, hairnets, NAFDAC compliance certificate on the wall) may still signal back-of-house consistency standards, but the data link the two attributes only moderately at best.
Pricing ↔︎ Takeaway (ρ = 0.633) and Speed ↔︎ Staff (ρ = 0.615) are the second and third strongest pairs overall.

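Causation caveat: These correlations describe co-movement in stated importance scores, not cause and effect. A positive association between taste importance and spend, for example, would not mean that improving food quality mechanically raises willingness-to-pay. A controlled taste-test event with varied price points would be required to establish causality.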

9. Technique 5 — Logistic Regression

9.1 Theory Recap

Logistic regression (Ch. 13) models the log-odds of a binary outcome as a linear combination of predictors. Coefficients represent the change in log-odds per unit increase in the predictor; exponentiated, they yield odds ratios (OR). The model is evaluated using a confusion matrix, ROC curve, and AUC — a threshold-independent measure of discriminatory power.

9.2 Business Justification

The central business question is binary: will this person patronise the relaunch? Logistic regression converts attribute importance scores into predicted probabilities, and each odds ratio translates directly into a management action with a quantified magnitude.

9.3 Outcome Variable

intent_binary = 1 (“Very likely” or “Extremely likely”); 0 (everything else).
48% of respondents are coded 1, providing a reasonably balanced outcome for modelling.

Code

model_df <- df |>
  select(intent_binary, imp_taste, imp_hygiene, imp_consistency,
         imp_delivery, imp_pricing, imp_variety,
         spend_num, visit_num, premium_binary) |>
  drop_na()

cat("Model dataset:", nrow(model_df), "complete observations\n")

Model dataset: 100 complete observations

Code

cat("Outcome split — 0:", sum(model_df$intent_binary == 0),
    " | 1:", sum(model_df$intent_binary == 1), "\n")

Outcome split — 0: 52  | 1: 48

Code

set.seed(2026)
n_train    <- floor(0.70 * nrow(model_df))
train_idx  <- sample(nrow(model_df), n_train)
train_df   <- model_df[ train_idx, ]
test_df    <- model_df[-train_idx, ]

cat("Training set:", nrow(train_df), "| Test set:", nrow(test_df), "\n")

Training set: 70 | Test set: 30

Code

logit_fit <- glm(
  intent_binary ~ imp_taste + imp_hygiene + imp_consistency +
    imp_delivery + imp_pricing + imp_variety +
    spend_num + visit_num + premium_binary,
  data   = train_df,
  family = binomial(link = "logit")
)

tidy(logit_fit) |>
  mutate(
    OR  = exp(estimate),
    sig = dplyr::case_when(
      p.value < 0.001 ~ "***",
      p.value < 0.01  ~ "**",
      p.value < 0.05  ~ "*",
      p.value < 0.10  ~ ".",
      TRUE            ~ ""
    )
  ) |>
  kable(
    digits    = 3,
    col.names = c("Predictor","Log-Odds (β)","Std Error","Z-value",
                  "p-value","Odds Ratio","Sig"),
    caption   = "Logistic Regression Coefficients and Odds Ratios"
  ) |>
  kable_styling(bootstrap_options = c("striped","hover"))

Logistic Regression Coefficients and Odds Ratios
Predictor	Log-Odds (β)	Std Error	Z-value	p-value	Odds Ratio
(Intercept)	-4.090	2.670	-1.532	0.126	0.017
imp_taste	0.585	0.540	1.083	0.279	1.794
imp_hygiene	-0.028	0.641	-0.044	0.965	0.972
imp_consistency	-0.345	0.618	-0.558	0.577	0.708
imp_delivery	0.285	0.253	1.125	0.260	1.329
imp_pricing	-0.385	0.440	-0.876	0.381	0.680
imp_variety	0.180	0.441	0.408	0.683	1.197
spend_num	0.435	0.295	1.475	0.140	1.545
visit_num	0.526	0.332	1.584	0.113	1.691
premium_binary	0.192	0.823	0.233	0.816	1.211
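
The odds ratios in the table are the exponentiated log-odds coefficients. The sketch below reproduces the `imp_taste` odds ratio and adds a Wald 95% confidence interval from the rounded estimate and standard error. The interval is an illustration under the normal approximation; `tidy(conf.int = TRUE)` on a `glm` typically profiles the likelihood instead, so its limits may differ slightly.

```r
# Estimate and standard error for imp_taste, copied from the table above
beta <- 0.585
se   <- 0.540

# Odds ratio: exponentiate the log-odds coefficient
or <- exp(beta)

# Wald 95% confidence interval on the odds-ratio scale
z  <- qnorm(0.975)
ci <- exp(beta + c(-1, 1) * z * se)

round(c(OR = or, lower = ci[1], upper = ci[2]), 2)
```

An interval that spans 1 means the data are compatible with the predictor either raising or lowering the odds of high relaunch intent, which is consistent with the non-significant p-value of 0.279.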

Code

pred_prob  <- predict(logit_fit, newdata = test_df, type = "response")
pred_class <- if_else(pred_prob >= 0.5, 1L, 0L)

cm <- table(Predicted = pred_class, Actual = test_df$intent_binary)
cat("Confusion Matrix:\n")

Confusion Matrix:

Code

print(cm)

         Actual
Predicted  0  1
        0 13  6
        1  4  7

Code

acc  <- sum(diag(cm)) / sum(cm)
prec <- if_else(sum(cm["1", ]) > 0, cm["1","1"] / sum(cm["1", ]), 0)
rec  <- if_else(sum(cm[, "1"]) > 0, cm["1","1"] / sum(cm[, "1"]), 0)
f1   <- if_else((prec + rec) > 0, 2 * prec * rec / (prec + rec), 0)

tibble(Metric = c("Accuracy","Precision","Recall","F1 Score"),
       Value  = c(acc, prec, rec, f1)) |>
  mutate(Value = percent(Value, accuracy = 0.1)) |>
  kable(caption = "Model Performance Metrics (test set)") |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)

Model Performance Metrics (test set)
Metric	Value
Accuracy	66.7%
Precision	63.6%
Recall	53.8%
F1 Score	58.3%

Code

roc_obj <- roc(test_df$intent_binary, pred_prob, quiet = TRUE)
cat("AUC:", round(auc(roc_obj), 3), "\n")

AUC: 0.695

Code

cat("Interpretation: 0.5 = random | 0.7-0.8 = acceptable | 0.8-0.9 = excellent\n")

Interpretation: 0.5 = random | 0.7-0.8 = acceptable | 0.8-0.9 = excellent
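
AUC can be read as the probability that a randomly chosen high-intent respondent receives a higher predicted probability than a randomly chosen low-intent respondent. The sketch below computes it from ranks in base R; `auc_rank()` is a hypothetical helper and the six observations are invented for illustration, not drawn from the test set.

```r
# Hypothetical predicted probabilities and true outcomes (not the study data)
actual <- c(1, 1, 1, 0, 0, 0)
score  <- c(0.9, 0.8, 0.4, 0.7, 0.3, 0.2)

# AUC via the Mann-Whitney rank-sum identity
auc_rank <- function(actual, score) {
  r  <- rank(score)                 # average ranks handle tied scores
  n1 <- sum(actual == 1)
  n0 <- sum(actual == 0)
  (sum(r[actual == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

auc_rank(actual, score)  # 8 of the 9 positive-negative pairs are ordered correctly
```

With only 30 test observations, each misordered pair shifts the estimate noticeably, so a value near 0.70 should be read with a wide margin of uncertainty.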

Code

data.frame(fpr = 1 - roc_obj$specificities,
           tpr = roc_obj$sensitivities) |>
  ggplot(aes(x = fpr, y = tpr)) +
  geom_line(colour = "#2171B5", linewidth = 1.3) +
  geom_ribbon(aes(ymin = 0, ymax = tpr), fill = "#2171B5", alpha = 0.1) +
  geom_abline(slope = 1, intercept = 0,
              linetype = "dashed", colour = "grey60") +
  annotate("text", x = 0.62, y = 0.22,
           label = paste0("AUC = ", round(auc(roc_obj), 3)),
           size = 5, colour = "#2171B5", fontface = "bold") +
  labs(
    title    = "ROC Curve — Logistic Regression (Relaunch Patronage Intent)",
    subtitle = "Shaded area = discriminatory power above random chance",
    x = "False Positive Rate (1 - Specificity)",
    y = "True Positive Rate (Sensitivity)"
  ) +
  theme_minimal(base_size = 13)

Code

tidy(logit_fit, conf.int = TRUE, exponentiate = TRUE) |>
  filter(term != "(Intercept)") |>
  mutate(
    term = str_remove(term, "imp_") |>
      str_replace_all("_", " ") |> str_to_title(),
    sig  = p.value < 0.05
  ) |>
  ggplot(aes(x = reorder(term, estimate),
             y = estimate, ymin = conf.low, ymax = conf.high,
             colour = sig)) +
  geom_hline(yintercept = 1, linetype = "dashed", colour = "grey50") +
  geom_pointrange(linewidth = 0.9, size = 0.7) +
  coord_flip() +
  scale_colour_manual(
    values = c("TRUE" = "#1A9850", "FALSE" = "#AAAAAA"),
    labels = c("TRUE" = "Significant (p < 0.05)",
               "FALSE" = "Not significant")
  ) +
  labs(
    title    = "Odds Ratios with 95% Confidence Intervals",
    subtitle = "OR > 1 increases probability of high relaunch intent  |  dashed line = no effect",
    x = NULL, y = "Odds Ratio", colour = NULL
  ) +
  theme_minimal(base_size = 13)

9.4 Interpretation for Management

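No predictor reaches statistical significance at the 5% level (all p > 0.10), so the odds ratios below indicate direction only. Each maps to a business action that should be validated with more data:

Predictor	Odds Ratio (p-value)	Business Action
`imp_taste`	1.794 (p = 0.279)	Largest point estimate: define a written taste standard and enforce it daily
`premium_binary`	1.211 (p = 0.816)	Premium-willing customers lean slightly toward patronage, but the estimate is very imprecise; avoid discounting to attract them
`imp_hygiene`	0.972 (p = 0.965)	No detectable association with intent; visible hygiene signals (open kitchen, certification display) are justified by the EDA findings rather than by this model
`imp_delivery`	1.329 (p = 0.260)	Delivery importance leans toward higher intent: launch with Chowdeck integration on day one
`spend_num`	1.545 (p = 0.140)	Higher habitual spenders lean toward relaunch intent, which favours pricing in the upper band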

The test-set AUC of 0.695 sits just below the 0.70 threshold conventionally treated as acceptable, and with only 30 test observations it carries wide uncertainty. The model therefore provides directional guidance only: no individual coefficient reaches conventional significance thresholds, which is expected given the sample size.

10. Integrated Findings

The five techniques converge on a single recommendation:

Relaunch Dalos Cuisine as a quality-first, delivery-enabled traditional Nigerian restaurant, priced ₦3,000–₦8,000, targeting employed professionals in the Victoria Island / Lekki / Ikoyi corridor.

Evidence	Source Technique	Management Implication
Taste, hygiene & consistency avg ≥ 4.3 / 5	EDA + Visualisation	These are table stakes — any failure here kills repeat visits
Inconsistent quality & hygiene are #1 complaints	Visualisation	Dalos’s relaunch narrative should explicitly address both
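Spend band differs significantly across education groups (p = 0.009, moderate effect)	Hypothesis 1 (Kruskal-Wallis)	Premium menu tier plausible; confirm which groups drive the difference with a post-hoc test before setting a price floor above ₦5,000
Premium willingness is high in every employment group (66.7–100%) with no significant difference (p = 0.552)	Hypothesis 2 (χ²)	Premium pricing is broadly acceptable; private-sector and self-employed respondents make up 89 of the 100 surveyed, so LinkedIn, business hubs, and office complexes remain the main marketing channels
Taste, freshness, consistency and hygiene are moderately inter-correlated (ρ ≈ 0.58–0.61)	Correlation	One “Quality Guarantee” message covers these attributes together
Delivery importance leans toward higher relaunch intent (OR = 1.329, p = 0.260)	Regression (not significant)	Launch delivery on day one, and confirm its effect on patronage with a larger sample
Premium willingness shows a weak, non-significant positive association with intent (OR = 1.211, p = 0.816)	Regression (not significant)	No evidence that quality-led premium pricing shrinks the addressable market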

11. Limitations & Further Work

Non-probability sampling: The convenience/snowball design over-represents educated private-sector workers. A stratified random sample drawn from NBS Lagos household data would improve external validity.
Stated vs. revealed preference: Survey respondents say they will patronise a relaunch; their actual behaviour may differ. Transaction data from a soft-launch or pop-up event would validate stated intent.
Cross-sectional snapshot: The survey captures sentiment at a single point. A longitudinal panel tracking the same respondents three and six months post-launch would enable churn and net promoter analysis.
Ordinal outcome oversimplification: Collapsing the five-point intent scale to binary discards ordinal information. A proportional-odds logistic regression would respect the scale’s full structure.
Sample size for regression: With 70 training observations (35 with high intent and 35 without) and nine predictors, the model has roughly four events per predictor, well below the common rule of thumb of ten. The odds ratios are therefore unstable, and additional data collection would be needed to stabilise them.
Spatial analysis: Area-of-residence data was collected but not used for a geographic heat map of intent. An sf / tmap analysis by Lagos LGA would directly support site-selection decisions.
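
The regression sample-size limitation above can be quantified from figures already printed in Section 9. The sketch below derives the training-set class counts from the outcome split and the test-set confusion matrix, then computes events per predictor; the variable names are illustrative.

```r
# Counts taken from the outcome split and test-set confusion matrix in Section 9
events_total     <- 48          # respondents with intent_binary = 1
non_events_total <- 52          # respondents with intent_binary = 0
events_test      <- 6 + 7       # actual 1s in the 30-row test set
non_events_test  <- 13 + 4      # actual 0s in the 30-row test set
n_predictors     <- 9

events_train     <- events_total - events_test
non_events_train <- non_events_total - non_events_test

# Events per variable uses the rarer outcome class
epv <- min(events_train, non_events_train) / n_predictors
round(epv, 1)
```

Reaching ten events per predictor with nine predictors would require about 90 respondents in the rarer class of the training data, or a model with fewer predictors.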

References

Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making — from data fundamentals to machine learning in Python and R. Lagos Business School / markanalytics.online. https://markanalytics.online

Allaire, J. J., Teague, C., Scheidegger, C., Xie, Y., & Dervieux, C. (2022). Quarto (Version 1.x) [Computer software]. https://doi.org/10.5281/zenodo.5960048

Nwodo Ezinne. (2026). Dalos Cuisine Relaunch Feasibility Study — Consumer Survey Dataset [Dataset]. Collected via Google Forms, Lagos State, Nigeria. Data available on request from the author.

R Core Team. (2024). R: A language and environment for statistical computing (Version 4.x). R Foundation for Statistical Computing. https://www.R-project.org/

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4

Code

pkgs <- c("readxl","janitor","skimr","corrplot","ggcorrplot","scales",
          "kableExtra","broom","pROC","car","rstatix","ggpubr",
          "viridis","patchwork","effectsize")
cat("**R package versions used in this document:**\n\n")

R package versions used in this document:

Code

for (p in pkgs) {
  v <- tryCatch(as.character(packageVersion(p)), error = function(e) "not installed")
  cat(sprintf("- %s (v%s)\n", p, v))
}

readxl (v1.5.0)
janitor (v2.2.1)
skimr (v2.2.2)
corrplot (v0.95)
ggcorrplot (v0.1.4.1)
scales (v1.4.0)
kableExtra (v1.4.0)
broom (v1.0.13)
pROC (v1.19.0.1)
car (v3.1.5)
rstatix (v0.7.3)
ggpubr (v0.6.3)
viridis (v0.6.5)
patchwork (v1.3.2)
effectsize (v1.0.2)

Appendix: AI Usage Statement

Claude (Anthropic, model claude-sonnet-4-6) assisted with the scaffolding of R code in this document, including the column-position renaming strategy, the iconv-based Likert encoder, ggplot2 theming, and pROC ROC plotting syntax. All analytical decisions — selection of techniques, formulation of hypotheses, business interpretation of results, and the integrated strategic recommendation — were made independently by the author. The executive summary, professional disclosure, and all interpretation sections were written by the author without AI assistance. The author has verified every statistical output against the underlying dataset and is fully prepared to explain each line of code and each result during the viva voce defence.