Sleep, Snoring & Purchase Behaviour

An Exploratory and Inferential Analysis of StopSnoring Brands

Author

Kodinita Terry-Amadi

Published

May 11, 2026


0.1 Professional Disclosure

Name: Kodinita Terry-Amadi
Business: StopSnoring Brands — Sleep Health Company, Lagos, Nigeria
Role: Founder & Chief Executive Officer
GitHub: github.com/stopsnoringng-stack/stopsnoring-data-analytics-By-Kodinita-Terry-Amadi

1 Executive Summary

StopSnoring Brands is a Lagos-based sleep health company offering four products — an anti-snoring nasal device, oral device, mouth tape, and essential oil — designed to help clients achieve restorative sleep and healthier relationships. This report presents a data-driven analysis of 122 survey responses collected from Nigerian adults aged 25 and above, examining the relationship between snoring behaviour, sleep disruption, relationship strain, and purchasing intent.

Five analytical techniques were applied: Exploratory Data Analysis (EDA), Data Visualisation, Hypothesis Testing, Correlation Analysis, and Multiple Linear Regression. The findings show that snoring is fundamentally experienced as a relationship problem, not merely a health inconvenience. Relationship impact score emerged as the strongest predictor of purchase likelihood (r = 0.90; b ≈ 0.97 in the multiple regression, p < 0.001), and it was the only significant predictor once the other variables were controlled for. Married respondents (59%) and those aged 35–54 (62%) dominate the sample and represent the core target segment, and relationship impact did not differ significantly by gender. The essential oil and nasal device are the most preferred products. The data support a Lagos-first go-to-market strategy that prioritises online and pharmacy channels, with relationship-centred messaging at the core of all brand communications.


StopSnoring Brands helps individuals and couples achieve restorative sleep through a curated range of clinically informed, non-invasive anti-snoring products: a nasal device, oral device, mouth tape, and essential oil. The business operates in the Nigerian consumer wellness market, targeting adults aged 25–60 who are affected by snoring, either as snorers themselves or as partners of snorers.

2 Data Collection & Sampling

2.1 Data Source

Primary data was collected via a structured survey administered through Google Forms between 15 March 2026 and 24 April 2026 (41 days). The survey was distributed via WhatsApp broadcast lists, the StopSnoring Brands Instagram page, and direct outreach to personal and professional networks across Lagos and other Nigerian cities. No incentive was offered for participation.

2.2 Sampling Frame & Approach

The target population comprised Nigerian adults aged 25 and above who either snore themselves or share a sleeping environment with a snorer. A convenience sampling approach was employed, appropriate for an exploratory consumer insight study at this stage of business development. The sample is not intended to be statistically representative of the Nigerian population but is sufficient to generate directional insights for strategic decision-making.

2.3 Sample Size

A total of 122 valid responses were collected, exceeding the minimum requirement of 100 observations. All responses were complete across 15 of 16 variables. Question 11 (Satisfaction Score) was intentionally left blank by respondents who had never tried a remedy (n = 87), as this question was conditional on prior product use — this reflects survey design logic, not missing data.
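The claim that the 87 blanks in Q11 are structural (survey logic) rather than genuinely missing can be checked directly. Below is a minimal base-R sketch. The helper `check_conditional_missing()` and the toy data frame are hypothetical. On the real data, the same call would take `df$Q9_Tried_Remedy_Before` and `df$Q11_Satisfaction_Score`. If the design logic held, it would show 87 "No" rows that are all blank and 35 "Yes" rows that are all filled.

```r
# Hypothetical helper: is Q11 blank exactly when Q9 == "No"?
check_conditional_missing <- function(tried, satisfaction) {
  # Cross-tabulate remedy use against whether satisfaction is blank
  tab <- table(Tried = tried, Satisfaction_blank = is.na(satisfaction))
  # Structural missingness: every "No" is blank and every "Yes" is filled
  consistent <- all(is.na(satisfaction) == (tried == "No"))
  list(table = tab, consistent = consistent)
}

# Hypothetical toy data (not survey responses)
toy <- data.frame(
  Q9_Tried_Remedy_Before = c("No", "Yes", "No", "Yes", "Yes"),
  Q11_Satisfaction_Score = c(NA, 7, NA, 8, 5)
)

res <- check_conditional_missing(toy$Q9_Tried_Remedy_Before,
                                 toy$Q11_Satisfaction_Score)
print(res$table)
res$consistent  # TRUE for this toy example
```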

2.4 Ethical Considerations

  • The survey was fully anonymous; no names or contact details were collected.
  • Participation was voluntary, with respondents informed of the research purpose in the survey introduction.
  • No sensitive personal health data beyond self-reported snoring behaviour was collected.
  • Data is stored securely and used solely for academic and business research purposes.

2.5 Dataset Overview

Show Code
# ── Locate the data file robustly ───────────────────────────────────────────
data_path <- "C:/Users/Kodinita/Desktop/DA 2/Exam/StopSnoring_DA/StopSnoring_SurveyData.xlsx"

# Fallback: look next to the .qmd file if the hardcoded path doesn't exist
if (!file.exists(data_path)) {
  data_path <- "StopSnoring_SurveyData.xlsx"
}

if (!file.exists(data_path)) {
  stop("Cannot find StopSnoring_SurveyData.xlsx. Please place it in the same folder as this .qmd file.")
}

df <- read_excel(data_path, sheet = "Raw Survey Data")

df <- df %>%
  mutate(
    Date_Submitted         = as.Date(Date_Submitted, format = "%d/%m/%Y"),
    Q11_Satisfaction_Score = as.numeric(Q11_Satisfaction_Score),
    Q2_Age_Group = factor(Q2_Age_Group,
      levels  = c("25–34", "35–44", "45–54", "55 and above"), ordered = TRUE),
    Q13_Willingness_To_Pay = factor(Q13_Willingness_To_Pay,
      levels  = c("₦1,000 – ₦2,999", "₦3,000 – ₦4,999",
                  "₦5,000 – ₦7,999",  "₦8,000 and above"), ordered = TRUE)
  )

glimpse(df)
Rows: 122
Columns: 16
$ ResponseID                   <chr> "SS001", "SS002", "SS003", "SS004", "SS00…
$ Date_Submitted               <date> 2026-03-15, 2026-03-15, 2026-03-15, 2026…
$ Q1_Gender                    <chr> "Male", "Male", "Female", "Female", "Male…
$ Q2_Age_Group                 <ord> 35–44, 45–54, 55 and above, 35–44, 55 and…
$ Q3_Relationship_Status       <chr> "Cohabiting / Living with partner", "Marr…
$ Q4_State                     <chr> "Lagos", "Lagos", "Lagos", "Lagos", "Othe…
$ Q5_Who_Snores                <chr> "I do (myself)", "My partner", "Both of u…
$ Q6_Nights_Per_Week           <dbl> 7, 6, 7, 2, 3, 7, 7, 6, 6, 6, 5, 3, 7, 6,…
$ Q7_Sleep_Disruption_Score    <dbl> 7, 10, 9, 6, 2, 1, 9, 7, 6, 10, 7, 4, 8, …
$ Q8_Relationship_Impact_Score <dbl> 7, 7, 9, 3, 2, 2, 5, 5, 9, 6, 7, 7, 9, 6,…
$ Q9_Tried_Remedy_Before       <chr> "No", "No", "Yes", "No", "Yes", "Yes", "N…
$ Q10_Product_Interest         <chr> "Mouth tape", "Mouth tape", "Mouth tape",…
$ Q11_Satisfaction_Score       <dbl> NA, NA, 5, NA, 6, 7, NA, 10, 8, 6, NA, NA…
$ Q12_Purchase_Likelihood      <dbl> 7, 9, 7, 5, 2, 1, 3, 5, 8, 5, 8, 7, 8, 8,…
$ Q13_Willingness_To_Pay       <ord> "₦8,000 and above", "₦1,000 – ₦2,999", "₦…
$ Q14_Preferred_Channel        <chr> "Pharmacy", "Online (Instagram, website)"…

3 Data Description

3.1 Variable Summary

Show Code
skim(df)
Data summary
Name df
Number of rows 122
Number of columns 16
_______________________
Column type frequency:
character 8
Date 1
factor 2
numeric 5
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
ResponseID 0 1 5 5 0 122 0
Q1_Gender 0 1 4 6 0 2 0
Q3_Relationship_Status 0 1 6 32 0 3 0
Q4_State 0 1 5 5 0 3 0
Q5_Who_Snores 0 1 10 13 0 4 0
Q9_Tried_Remedy_Before 0 1 2 3 0 2 0
Q10_Product_Interest 0 1 10 27 0 5 0
Q14_Preferred_Channel 0 1 8 27 0 4 0

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
Date_Submitted 0 1 2026-03-15 2026-04-24 2026-04-02 41

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
Q2_Age_Group 0 1 TRUE 4 45–: 40, 35–: 36, 25–: 26, 55 : 20
Q13_Willingness_To_Pay 0 1 TRUE 4 ₦3,: 52, ₦1,: 29, ₦5,: 28, ₦8,: 13

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Q6_Nights_Per_Week 0 1.00 4.81 1.90 0 4.00 5 6 7 ▂▁▅▃▇
Q7_Sleep_Disruption_Score 0 1.00 6.47 2.55 1 5.00 7 8 10 ▂▂▆▇▆
Q8_Relationship_Impact_Score 0 1.00 5.71 2.65 1 3.25 6 7 10 ▆▃▇▇▅
Q11_Satisfaction_Score 87 0.29 7.43 1.80 4 6.00 8 8 10 ▅▃▃▇▆
Q12_Purchase_Likelihood 0 1.00 5.80 2.71 1 4.00 6 8 10 ▅▆▇▇▆

3.2 Variable Reference Table

Show Code
tibble(
  Variable = names(df),
  Type = c("ID", "Date", "Categorical", "Categorical (Ordinal)",
           "Categorical", "Categorical", "Categorical",
           "Numeric (Discrete)", "Numeric (Likert)", "Numeric (Likert)",
           "Categorical (Binary)", "Categorical", "Numeric (Likert)",
           "Numeric (Likert)", "Categorical (Ordinal)", "Categorical"),
  Role = c("Row identifier", "Time variable", "Predictor / grouping",
            "Predictor / grouping", "Predictor / grouping", "Geographic filter",
            "Grouping variable", "Predictor in regression",
            "Predictor / hypothesis testing", "Key predictor in regression",
            "Filter variable", "Grouping factor (Kruskal-Wallis)",
            "Outcome for remedy users", "Dependent variable (regression)",
            "Pricing analysis", "Channel strategy")
) %>% ss_kable("Dataset Variable Overview")
Dataset Variable Overview
Variable Type Role
ResponseID ID Row identifier
Date_Submitted Date Time variable
Q1_Gender Categorical Predictor / grouping
Q2_Age_Group Categorical (Ordinal) Predictor / grouping
Q3_Relationship_Status Categorical Predictor / grouping
Q4_State Categorical Geographic filter
Q5_Who_Snores Categorical Grouping variable
Q6_Nights_Per_Week Numeric (Discrete) Predictor in regression
Q7_Sleep_Disruption_Score Numeric (Likert) Predictor / hypothesis testing
Q8_Relationship_Impact_Score Numeric (Likert) Key predictor in regression
Q9_Tried_Remedy_Before Categorical (Binary) Filter variable
Q10_Product_Interest Categorical Grouping factor (Kruskal-Wallis)
Q11_Satisfaction_Score Numeric (Likert) Outcome for remedy users
Q12_Purchase_Likelihood Numeric (Likert) Dependent variable (regression)
Q13_Willingness_To_Pay Categorical (Ordinal) Pricing analysis
Q14_Preferred_Channel Categorical Channel strategy

4 Technique 1 — Exploratory Data Analysis (EDA)

Business Justification: Before investing in any campaign or product decision, I need to understand the baseline profile of respondents: Who are they? How severe is their snoring problem? What product do they lean toward? EDA answers all of this without any modelling assumptions.

4.1 Summary Statistics — Numeric Variables

Show Code
df %>%
  select(Q6_Nights_Per_Week, Q7_Sleep_Disruption_Score,
         Q8_Relationship_Impact_Score, Q12_Purchase_Likelihood) %>%
  pivot_longer(everything(), names_to = "Variable", values_to = "Value") %>%
  group_by(Variable) %>%
  summarise(
    N      = n(),
    Mean   = round(mean(Value, na.rm = TRUE), 2),
    Median = median(Value, na.rm = TRUE),
    SD     = round(sd(Value, na.rm = TRUE), 2),
    Min    = min(Value, na.rm = TRUE),
    Max    = max(Value, na.rm = TRUE)
  ) %>%
  mutate(Variable = case_when(
    Variable == "Q6_Nights_Per_Week"           ~ "Nights snoring per week",
    Variable == "Q7_Sleep_Disruption_Score"    ~ "Sleep disruption score (1–10)",
    Variable == "Q8_Relationship_Impact_Score" ~ "Relationship impact score (1–10)",
    Variable == "Q12_Purchase_Likelihood"      ~ "Purchase likelihood (1–10)",
    TRUE ~ Variable
  )) %>%
  ss_kable("Descriptive Statistics — Numeric Variables")
Descriptive Statistics — Numeric Variables
Variable N Mean Median SD Min Max
Purchase likelihood (1–10) 122 5.80 6 2.71 1 10
Nights snoring per week 122 4.81 5 1.90 0 7
Sleep disruption score (1–10) 122 6.47 7 2.55 1 10
Relationship impact score (1–10) 122 5.71 6 2.65 1 10

4.2 Frequency Tables — Categorical Variables

Show Code
cat_freq <- function(var, label) {
  df %>%
    count(!!sym(var)) %>%
    mutate(Percent = paste0(round(n / sum(n) * 100, 1), "%")) %>%
    rename(Category = !!sym(var), Count = n) %>%
    ss_kable(paste("Frequency Table:", label))
}

cat_freq("Q1_Gender",              "Gender")
Frequency Table: Gender
Category Count Percent
Female 71 58.2%
Male 51 41.8%
Show Code
cat_freq("Q2_Age_Group",           "Age Group")
Frequency Table: Age Group
Category Count Percent
25–34 26 21.3%
35–44 36 29.5%
45–54 40 32.8%
55 and above 20 16.4%
Show Code
cat_freq("Q3_Relationship_Status", "Relationship Status")
Frequency Table: Relationship Status
Category Count Percent
Cohabiting / Living with partner 27 22.1%
Married 72 59%
Single 23 18.9%
Show Code
cat_freq("Q4_State",               "State of Residence")
Frequency Table: State of Residence
Category Count Percent
Abuja 24 19.7%
Lagos 83 68%
Other 15 12.3%
Show Code
cat_freq("Q5_Who_Snores",          "Who Snores in Household")
Frequency Table: Who Snores in Household
Category Count Percent
Both of us 26 21.3%
I do (myself) 39 32%
My partner 41 33.6%
No one snores 16 13.1%
Show Code
cat_freq("Q9_Tried_Remedy_Before", "Tried a Remedy Before")
Frequency Table: Tried a Remedy Before
Category Count Percent
No 87 71.3%
Yes 35 28.7%
Show Code
cat_freq("Q10_Product_Interest",   "Product Interest")
Frequency Table: Product Interest
Category Count Percent
Anti-snoring essential oil 33 27%
Anti-snoring nasal device 34 27.9%
Mouth tape 25 20.5%
None yet — I am new to this 6 4.9%
Oral device / mouthguard 24 19.7%
Show Code
cat_freq("Q13_Willingness_To_Pay", "Willingness to Pay")
Frequency Table: Willingness to Pay
Category Count Percent
₦1,000 – ₦2,999 29 23.8%
₦3,000 – ₦4,999 52 42.6%
₦5,000 – ₦7,999 28 23%
₦8,000 and above 13 10.7%
Show Code
cat_freq("Q14_Preferred_Channel",  "Preferred Purchase Channel")
Frequency Table: Preferred Purchase Channel
Category Count Percent
Directly from the brand 25 20.5%
Online (Instagram, website) 43 35.2%
Pharmacy 38 31.1%
Supermarket / mall 16 13.1%

4.3 Outlier Check

Show Code
df %>%
  select(Q6_Nights_Per_Week, Q7_Sleep_Disruption_Score,
         Q8_Relationship_Impact_Score, Q12_Purchase_Likelihood) %>%
  pivot_longer(everything(), names_to = "Variable", values_to = "Value") %>%
  mutate(Variable = case_when(
    Variable == "Q6_Nights_Per_Week"           ~ "Nights snoring per week",
    Variable == "Q7_Sleep_Disruption_Score"    ~ "Sleep disruption score (1–10)",
    Variable == "Q8_Relationship_Impact_Score" ~ "Relationship impact score (1–10)",
    Variable == "Q12_Purchase_Likelihood"      ~ "Purchase likelihood (1–10)",
    TRUE ~ Variable
  )) %>%
  ggplot(aes(x = Variable, y = Value, fill = Variable)) +
  geom_boxplot(alpha = 0.75, outlier.colour = "#E74C3C",
               outlier.shape = 16, outlier.size = 2.5) +
  scale_fill_manual(values = c(SS_PURPLE, SS_GOLD, SS_MID, "#A8D8EA")) +
  coord_flip() +
  labs(title    = "Boxplots of Numeric Variables — Outlier Check",
       subtitle = "Red dots indicate potential outliers",
       x = NULL, y = "Score") +
  theme(legend.position = "none")

EDA Interpretation: The sample is predominantly female (58.2%, n = 71) and skews toward middle-aged adults. The 45–54 age bracket is the largest group (32.8%, n = 40).

Snoring is a pervasive and frequent problem. Respondents report snoring on an average of 4.8 nights per week, with a mean sleep disruption score of 6.47 out of 10, so sleep quality is materially impaired for most of them. The relationship impact score averages 5.71 out of 10, which shows that snoring strains intimate relationships, not just personal comfort.

Notably, 71.3% (n = 87) of respondents have never tried any snoring remedy, which represents a large untapped market. Among the four products, the anti-snoring nasal device (n = 34) and essential oil (n = 33) attract the greatest interest.

The only values flagged by the 1.5 × IQR boxplot rule are responses of 0 nights per week, which fall below that variable's lower fence of 4 − 1.5 × 2 = 1. These are plausible answers (for example, from the 16 respondents who report that no one in the household snores), not data errors, so they were retained. All other scores fall within their expected ranges, and the data are suitable for inferential analysis.
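ggplot2 boxplots flag points that lie more than 1.5 × IQR beyond the quartiles, using type-7 quantiles. The sketch below reproduces that rule in base R. The helper names `iqr_fences()` and `flag_outliers()` and the toy vector are hypothetical. With the skim quartiles for nights per week (4 and 6), the lower fence is 1, so any 0-night answers are drawn as red dots.

```r
# Hypothetical helpers reproducing the 1.5 x IQR boxplot rule
iqr_fences <- function(x, k = 1.5) {
  # Type-7 quartiles, the default in R's quantile()
  q <- stats::quantile(x, c(0.25, 0.75), type = 7, names = FALSE, na.rm = TRUE)
  iqr <- q[2] - q[1]
  c(lower = q[1] - k * iqr, upper = q[2] + k * iqr)
}

flag_outliers <- function(x, k = 1.5) {
  f <- iqr_fences(x, k)
  x < f[["lower"]] | x > f[["upper"]]
}

# Hypothetical nights-per-week answers (not survey data)
nights <- c(0, 4, 4, 5, 6, 6, 7)
iqr_fences(nights)            # lower = 1, upper = 9
which(flag_outliers(nights))  # 1: only the 0-night answer is flagged

# Same arithmetic from the skim quartiles of Q6 (p25 = 4, p75 = 6)
4 - 1.5 * (6 - 4)             # lower fence = 1
```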


5 Technique 2 — Data Visualisation

Business Justification: As a founder preparing pharmacy pitches and investor presentations, compelling visuals communicate who my customer is and what pain they experience. These charts are designed to be used directly in business presentations, not just academic reports.

5.1 Chart 1 — Product Interest by Gender

Show Code
df %>%
  count(Q10_Product_Interest, Q1_Gender) %>%
  ggplot(aes(x = reorder(Q10_Product_Interest, n), y = n, fill = Q1_Gender)) +
  geom_col(position = "dodge", width = 0.65) +
  coord_flip() +
  scale_fill_manual(values = c("Male" = SS_PURPLE, "Female" = SS_GOLD)) +
  labs(title    = "Product Interest by Gender",
       subtitle = "Which StopSnoring product does each gender prefer?",
       x = NULL, y = "Number of Respondents", fill = "Gender") +
  theme(legend.position = "top")

5.2 Chart 2 — Relationship Impact Score by Age Group

Show Code
df %>%
  ggplot(aes(x = Q2_Age_Group, y = Q8_Relationship_Impact_Score,
             fill = Q2_Age_Group)) +
  geom_boxplot(alpha = 0.75, outlier.colour = "#E74C3C") +
  scale_fill_manual(values = c("#D1C4E9", "#9575CD", SS_MID, SS_PURPLE)) +
  labs(title    = "Relationship Impact Score by Age Group",
       subtitle = "Which age group feels the greatest relationship strain from snoring?",
       x = "Age Group", y = "Relationship Impact Score (1–10)") +
  theme(legend.position = "none")

5.3 Chart 3 — Purchase Likelihood Distribution

Show Code
mean_pl <- mean(df$Q12_Purchase_Likelihood)

df %>%
  ggplot(aes(x = Q12_Purchase_Likelihood)) +
  geom_histogram(binwidth = 1, fill = SS_PURPLE, colour = "white", alpha = 0.85) +
  geom_vline(xintercept = mean_pl, colour = SS_GOLD,
             linewidth = 1.4, linetype = "dashed") +
  annotate("text", x = mean_pl + 0.6,
           y = max(table(df$Q12_Purchase_Likelihood)),  # height of the tallest bar
           label   = paste("Mean =", round(mean_pl, 1)),
           colour  = SS_GOLD, fontface = "bold", size = 4.2) +
  scale_x_continuous(breaks = 1:10) +
  labs(title    = "Distribution of Purchase Likelihood Scores",
       subtitle = "How likely are respondents to purchase a StopSnoring product?",
       x = "Purchase Likelihood (1 = Very Unlikely, 10 = Very Likely)",
       y = "Number of Respondents")

5.4 Chart 4 — Channel Preference by Relationship Status

Show Code
df %>%
  count(Q14_Preferred_Channel, Q3_Relationship_Status) %>%
  ggplot(aes(x = Q14_Preferred_Channel, y = n, fill = Q3_Relationship_Status)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = percent_format()) +
  scale_fill_manual(values = c(
    "Married"                         = SS_PURPLE,
    "Cohabiting / Living with partner" = SS_MID,
    "Single"                          = SS_GOLD
  )) +
  labs(title    = "Purchase Channel Preference by Relationship Status",
       subtitle = "Does relationship status influence where people prefer to buy?",
       x = "Preferred Channel", y = "Proportion", fill = "Relationship Status") +
  theme(legend.position = "top",
        axis.text.x     = element_text(angle = 12, hjust = 1))

5.5 Chart 5 — Survey Responses Over Time

Show Code
df %>%
  count(Date_Submitted) %>%
  ggplot(aes(x = Date_Submitted, y = n)) +
  geom_col(fill = SS_PURPLE, alpha = 0.8) +
  geom_smooth(method = "loess", se = FALSE, colour = SS_GOLD, linewidth = 1.3) +
  labs(title    = "Survey Response Volume Over Time",
       subtitle = "Daily response count across the 41-day collection period",
       x = "Date", y = "Number of Responses")

Visualisation Interpretation: Chart 1 suggests a gender pattern in product preference. Female respondents show stronger interest in the essential oil and mouth tape, while male respondents lean toward the nasal device and oral device. This pattern was not formally tested, so gender-differentiated product messaging is a hypothesis worth piloting rather than a confirmed finding.

Chart 2 shows that the 45–54 age group reports the highest median relationship impact scores, which reinforces this demographic as the primary target segment. Chart 3 shows that purchase likelihood averages 5.8 out of 10. The market is receptive but not yet fully convinced, a gap that targeted messaging and product trials may help close.

Chart 4 indicates that married respondents lean toward pharmacy and direct brand channels, while single respondents skew more online, which can inform a channel-specific distribution strategy. Chart 5 shows that responses arrived steadily across the 41-day collection period, with slight peaks in the final two weeks. These peaks may reflect word-of-mouth as the survey circulated through social networks.


6 Technique 3 — Hypothesis Testing

Business Justification: Before committing to a marketing message or product priority, I need statistical evidence — not just descriptive patterns — to confirm that observed differences are real and not due to chance.

6.1 Test 1 — Does snoring frequency predict sleep disruption?

H₀: Sleep disruption scores do not differ between low-frequency (< 4 nights/week) and high-frequency (≥ 4 nights/week) snorers.
H₁: Sleep disruption scores differ between the two groups (two-sided test; the expected direction is higher scores among high-frequency snorers).

Show Code
df <- df %>%
  mutate(snore_group = ifelse(Q6_Nights_Per_Week >= 4,
                              "High (≥4 nights)", "Low (<4 nights)"))

# Normality check
df %>%
  group_by(snore_group) %>%
  summarise(
    W       = round(shapiro.test(Q7_Sleep_Disruption_Score)$statistic, 3),
    p_value = round(shapiro.test(Q7_Sleep_Disruption_Score)$p.value,   4)
  ) %>%
  ss_kable("Shapiro-Wilk Normality Test by Snore Group")
Shapiro-Wilk Normality Test by Snore Group
snore_group W p_value
High (≥4 nights) 0.896 0.0000
Low (<4 nights) 0.923 0.0354
Show Code
wilcox_result <- wilcox.test(Q7_Sleep_Disruption_Score ~ snore_group, data = df)
print(wilcox_result)

    Wilcoxon rank sum test with continuity correction

data:  Q7_Sleep_Disruption_Score by snore_group
W = 1998.5, p-value = 7.994e-05
alternative hypothesis: true location shift is not equal to 0
Show Code
effect_result <- effectsize::rank_biserial(
  Q7_Sleep_Disruption_Score ~ snore_group, data = df)
print(effect_result)
r (rank biserial) |       95% CI
--------------------------------
0.48              | [0.28, 0.65]
Show Code
df %>%
  ggplot(aes(x = snore_group, y = Q7_Sleep_Disruption_Score, fill = snore_group)) +
  geom_boxplot(alpha = 0.75) +
  scale_fill_manual(values = c("High (≥4 nights)" = SS_PURPLE,
                               "Low (<4 nights)"  = SS_GOLD)) +
  labs(title    = "Sleep Disruption Score by Snoring Frequency Group",
       subtitle = "Wilcoxon Rank-Sum Test",
       x = "Snoring Frequency Group", y = "Sleep Disruption Score (1–10)") +
  theme(legend.position = "none")
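The rank-biserial correlation printed above can be read straight from the Wilcoxon W statistic. For groups of sizes n1 and n2, r = 2W / (n1 · n2) − 1, which is positive when the first group tends to score higher. The sketch below uses hypothetical toy vectors and a hypothetical helper name, `rank_biserial_from_w()`. It also shows that a directional H₁ corresponds to `alternative = "greater"` in `wilcox.test()`.

```r
# Hypothetical helper: rank-biserial correlation from the Wilcoxon W statistic
rank_biserial_from_w <- function(x, y) {
  # exact = FALSE uses the normal approximation, as with the tied survey scores;
  # the W statistic itself is the same either way
  w <- unname(stats::wilcox.test(x, y, exact = FALSE)$statistic)
  2 * w / (length(x) * length(y)) - 1
}

# Hypothetical scores for two small groups (not survey data)
high <- c(3, 5, 7)
low  <- c(1, 2, 4)

rank_biserial_from_w(high, low)  # W = 8 of 9 pairs, so r = 7/9 (about 0.778)

# One-sided version matching a directional hypothesis (high > low)
stats::wilcox.test(high, low, alternative = "greater", exact = FALSE)$p.value
```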

6.2 Test 2 — Does relationship impact differ by gender?

H₀: Relationship impact scores do not differ significantly between males and females.
H₁: There is a significant gender difference in relationship impact scores.

Show Code
wilcox_gender <- wilcox.test(Q8_Relationship_Impact_Score ~ Q1_Gender, data = df)
print(wilcox_gender)

    Wilcoxon rank sum test with continuity correction

data:  Q8_Relationship_Impact_Score by Q1_Gender
W = 1711, p-value = 0.6042
alternative hypothesis: true location shift is not equal to 0
Show Code
effect_gender <- effectsize::rank_biserial(
  Q8_Relationship_Impact_Score ~ Q1_Gender, data = df)
print(effect_gender)
r (rank biserial) |        95% CI
---------------------------------
-0.05             | [-0.26, 0.15]
Show Code
df %>%
  ggplot(aes(x = Q1_Gender, y = Q8_Relationship_Impact_Score, fill = Q1_Gender)) +
  geom_violin(alpha = 0.6) +
  geom_boxplot(width = 0.14, fill = "white", alpha = 0.85) +
  scale_fill_manual(values = c("Male" = SS_PURPLE, "Female" = SS_GOLD)) +
  labs(title    = "Relationship Impact Score by Gender",
       subtitle = "Violin + Boxplot — Wilcoxon Rank-Sum Test",
       x = "Gender", y = "Relationship Impact Score (1–10)") +
  theme(legend.position = "none")

6.3 Test 3 — Does satisfaction differ across products? (Kruskal-Wallis)

H₀: The distribution of satisfaction scores is the same across the four product groups.
H₁: Satisfaction scores in at least one product group tend to be higher or lower than in the others.

Show Code
tried_df <- df %>%
  filter(Q9_Tried_Remedy_Before == "Yes",
         Q10_Product_Interest   != "None yet — I am new to this",
         !is.na(Q11_Satisfaction_Score))

kruskal_result <- kruskal.test(Q11_Satisfaction_Score ~ Q10_Product_Interest,
                               data = tried_df)
print(kruskal_result)

    Kruskal-Wallis rank sum test

data:  Q11_Satisfaction_Score by Q10_Product_Interest
Kruskal-Wallis chi-squared = 6.1524, df = 3, p-value = 0.1044
Show Code
if (kruskal_result$p.value < 0.05) {
  tried_df %>%
    dunn_test(Q11_Satisfaction_Score ~ Q10_Product_Interest,
              p.adjust.method = "bonferroni") %>%
    ss_kable("Post-hoc Dunn Test (Bonferroni Adjusted)")
}

tried_df %>%
  ggplot(aes(x = reorder(Q10_Product_Interest, Q11_Satisfaction_Score, median),
             y = Q11_Satisfaction_Score, fill = Q10_Product_Interest)) +
  geom_boxplot(alpha = 0.75) +
  coord_flip() +
  scale_fill_manual(values = c(SS_PURPLE, SS_GOLD, SS_MID, "#A8D8EA")) +
  labs(title    = "Satisfaction Score by Product Type",
       subtitle = paste0("Respondents who have tried a remedy (n = ", nrow(tried_df), ")"),
       x = NULL, y = "Satisfaction Score (1–10)") +
  theme(legend.position = "none")
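A p-value alone says little about magnitude, especially in a small subsample. One common effect size for Kruskal-Wallis is epsilon-squared, ε² = H / (n − 1). It is swapped in here as an illustration and is not part of the original analysis. The helper name and toy data below are hypothetical. If all 35 remedy users were retained, the reported H = 6.1524 would give ε² ≈ 6.1524 / 34 ≈ 0.18. The true n is `nrow(tried_df)`.

```r
# Hypothetical helper: epsilon-squared effect size for Kruskal-Wallis
epsilon_squared_kw <- function(h, n) {
  # h: Kruskal-Wallis chi-squared statistic; n: total number of observations
  h / (n - 1)
}

# Hypothetical toy data: three groups of three, no ties (not survey data)
scores <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
groups <- factor(rep(c("a", "b", "c"), each = 3))

kw <- stats::kruskal.test(scores, groups)
h  <- unname(kw$statistic)              # 7.2 for these data
epsilon_squared_kw(h, length(scores))   # 7.2 / 8 = 0.9
```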

6.4 Test 4 — Gender & preferred channel (Chi-Squared)

H₀: Gender and preferred purchase channel are independent.
H₁: Gender and preferred purchase channel are associated.

Show Code
channel_table <- table(df$Q1_Gender, df$Q14_Preferred_Channel)
chisq_result  <- chisq.test(channel_table)
print(chisq_result)

    Pearson's Chi-squared test

data:  channel_table
X-squared = 4.3731, df = 3, p-value = 0.2239
Show Code
as.data.frame.matrix(channel_table) %>%
  rownames_to_column("Gender") %>%
  ss_kable("Observed Counts: Gender × Preferred Channel")
Observed Counts: Gender × Preferred Channel
Gender Directly from the brand Online (Instagram, website) Pharmacy Supermarket / mall
Female 18 21 21 11
Male 7 22 17 5
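The observed counts above are enough to check the chi-squared assumptions and size the effect. The sketch rebuilds the 2 × 4 table in base R and reproduces X² ≈ 4.37. It confirms that every expected count is at least 5, and the smallest, for Male × Supermarket, is about 6.7. It then computes Cramér's V, an effect-size measure added here for illustration that is not part of the original analysis.

```r
# Observed counts copied from the Gender x Preferred Channel table above
channel_counts <- matrix(
  c(18, 21, 21, 11,
     7, 22, 17,  5),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Gender  = c("Female", "Male"),
    Channel = c("Directly from the brand", "Online (Instagram, website)",
                "Pharmacy", "Supermarket / mall")
  )
)

chi <- stats::chisq.test(channel_counts)   # no continuity correction for 2 x 4

# Expected counts under independence: row total x column total / n
round(chi$expected, 2)
all(chi$expected >= 5)                     # rule-of-thumb check

# Row proportions: channel mix within each gender
round(prop.table(channel_counts, margin = 1), 3)

# Cramer's V = sqrt(X^2 / (n * (min(rows, cols) - 1)))
n <- sum(channel_counts)
cramers_v <- sqrt(unname(chi$statistic) / (n * (min(dim(channel_counts)) - 1)))
round(cramers_v, 3)                        # about 0.19, a small association
```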

Hypothesis Testing Interpretation: Four hypothesis tests were conducted to establish statistical evidence for key business decisions.

Test 1 (Wilcoxon Rank-Sum): High-frequency snorers (≥4 nights/week) reported significantly higher sleep disruption scores than low-frequency snorers (W = 1998.5, p < 0.001). The effect size is moderate-to-large (rank-biserial r = 0.48, 95% CI [0.28, 0.65]). This supports snoring frequency as a meaningful proxy for pain severity and suggests that frequent snorers are a priority audience, although the test measures sleep disruption rather than purchase intent.

Test 2 (Wilcoxon Rank-Sum): No statistically significant gender difference was found in relationship impact scores (W = 1711, p = 0.60), and the estimated effect is negligible (r = −0.05, 95% CI [−0.26, 0.15]). Relationship-centred messaging therefore does not appear to need gender-specific campaigns, although the confidence interval cannot rule out a small difference in either direction.

Test 3 (Kruskal-Wallis): Satisfaction scores did not differ significantly across the four product groups (H = 6.15, df = 3, p = 0.10). At most 35 remedy users are split across four groups, and Q10 records product interest rather than necessarily the remedy each respondent used. This non-significant result is therefore inconclusive and should not be read as evidence that all four products deliver equivalent satisfaction.

Test 4 (Chi-Squared): No significant association was found between gender and preferred purchase channel (χ² = 4.37, df = 3, p = 0.22). This supports a unified omnichannel approach over gender-segmented channel strategies.
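The four tests above are run without any adjustment for multiple comparisons. As a sketch, a Holm adjustment can be applied to the four reported p-values with base R's `p.adjust()`. Holm is one standard option and is not part of the original analysis. Only Test 1 stays below 0.05 after adjustment, so the significant/non-significant pattern, and the conclusions drawn from it, are unchanged.

```r
# p-values as reported for Tests 1-4 in this section
p_raw <- c(test1_frequency = 7.994e-05,
           test2_gender    = 0.6042,
           test3_product   = 0.1044,
           test4_channel   = 0.2239)

# Holm step-down adjustment controls the family-wise error rate
p_holm <- stats::p.adjust(p_raw, method = "holm")
round(p_holm, 4)

# Which tests remain significant at the 5% level after adjustment?
names(p_holm)[p_holm < 0.05]   # "test1_frequency"
```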


7 Technique 4 — Correlation Analysis

Business Justification: Correlation tells me how strongly numeric variables move together. If relationship impact is the strongest correlate of purchase likelihood, I know to lead with emotional rather than clinical messaging in all brand communications.

7.1 Pearson Correlation Matrix

Show Code
num_df <- df %>%
  select(Q6_Nights_Per_Week, Q7_Sleep_Disruption_Score,
         Q8_Relationship_Impact_Score, Q12_Purchase_Likelihood) %>%
  rename(
    "Nights/Week"         = Q6_Nights_Per_Week,
    "Sleep Disruption"    = Q7_Sleep_Disruption_Score,
    "Relationship Impact" = Q8_Relationship_Impact_Score,
    "Purchase Likelihood" = Q12_Purchase_Likelihood
  )

cor_matrix <- cor(num_df, method = "pearson", use = "complete.obs")

ggcorrplot(cor_matrix,
           method   = "circle", type = "lower",
           lab      = TRUE, lab_size = 4.5,
           colors   = c(SS_GOLD, "white", SS_PURPLE),
           title    = "Pearson Correlation Matrix — Numeric Variables",
           ggtheme  = stopsnoring_theme())

7.2 Spearman Correlation (Robustness Check)

Show Code
cor_spearman <- cor(num_df, method = "spearman", use = "complete.obs")

ggcorrplot(cor_spearman,
           method   = "circle", type = "lower",
           lab      = TRUE, lab_size = 4.5,
           colors   = c(SS_GOLD, "white", SS_PURPLE),
           title    = "Spearman Correlation Matrix — Robustness Check",
           ggtheme  = stopsnoring_theme())
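Spearman's rho works as a robustness check because it is simply the Pearson correlation computed on ranks, with tied values sharing their average rank. That makes it less sensitive to skew and extreme values, and it captures any monotonic relationship, which suits 1–10 rating scales with many ties. The toy vectors below are hypothetical and only demonstrate the equivalence.

```r
# Hypothetical paired scores with ties (not survey data)
x <- c(1, 2, 2, 3, 5)
y <- c(2, 1, 4, 4, 6)

rho_builtin <- stats::cor(x, y, method = "spearman")

# Pearson correlation of the average ranks gives the same value
rho_by_hand <- stats::cor(rank(x), rank(y), method = "pearson")

all.equal(rho_builtin, rho_by_hand)  # TRUE
```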

7.3 Pairwise Correlation Significance Tests

Show Code
list(
  c("Q8_Relationship_Impact_Score", "Q12_Purchase_Likelihood"),
  c("Q7_Sleep_Disruption_Score",    "Q12_Purchase_Likelihood"),
  c("Q6_Nights_Per_Week",           "Q7_Sleep_Disruption_Score"),
  c("Q6_Nights_Per_Week",           "Q8_Relationship_Impact_Score")
) %>%
  map_dfr(function(pair) {
    test <- cor.test(df[[pair[1]]], df[[pair[2]]], method = "pearson")
    tibble(
      Variable_1  = pair[1],
      Variable_2  = pair[2],
      r           = round(test$estimate, 3),
      p_value     = round(test$p.value,  4),
      Significant = ifelse(test$p.value < 0.05, "Yes ✅", "No ❌")
    )
  }) %>%
  ss_kable("Pairwise Pearson Correlation Significance Tests")
Pairwise Pearson Correlation Significance Tests
Variable_1 Variable_2 r p_value Significant
Q8_Relationship_Impact_Score Q12_Purchase_Likelihood 0.900 0 Yes ✅
Q7_Sleep_Disruption_Score Q12_Purchase_Likelihood 0.542 0 Yes ✅
Q6_Nights_Per_Week Q7_Sleep_Disruption_Score 0.384 0 Yes ✅
Q6_Nights_Per_Week Q8_Relationship_Impact_Score 0.443 0 Yes ✅
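The table reports point estimates only. A 95% confidence interval for each r follows from Fisher's z-transform, which is the method `cor.test()` uses for its `conf.int`. The helper name `fisher_ci()` is hypothetical. Applied to the relationship impact vs purchase likelihood correlation (r = 0.900, n = 122), it gives an interval of roughly 0.86 to 0.93.

```r
# Hypothetical helper: Fisher z confidence interval for a Pearson correlation
fisher_ci <- function(r, n, level = 0.95) {
  z    <- atanh(r)                              # Fisher z-transform
  se   <- 1 / sqrt(n - 3)                       # standard error on the z scale
  crit <- stats::qnorm(1 - (1 - level) / 2)     # 1.96 for a 95% interval
  tanh(c(lower = z - crit * se, upper = z + crit * se))  # back to the r scale
}

# Relationship impact vs purchase likelihood (values from the table above)
round(fisher_ci(0.900, 122), 2)   # lower 0.86, upper 0.93
```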

7.4 Scatter — Relationship Impact vs Purchase Likelihood

Show Code
df %>%
  ggplot(aes(x = Q8_Relationship_Impact_Score, y = Q12_Purchase_Likelihood)) +
  geom_jitter(colour = SS_PURPLE, alpha = 0.5, width = 0.2, height = 0.2, size = 2) +
  geom_smooth(method = "lm", se = TRUE, colour = SS_GOLD, linewidth = 1.4,
              fill = paste0(SS_GOLD, "33")) +
  scale_x_continuous(breaks = 1:10) +
  scale_y_continuous(breaks = 1:10) +
  labs(title    = "Relationship Impact Score vs Purchase Likelihood",
       subtitle = "Linear trend with 95% confidence band",
       x = "Relationship Impact Score (1–10)",
       y = "Purchase Likelihood (1–10)")

Correlation Interpretation: Relationship impact score is the strongest correlate of purchase likelihood (r = 0.90, p < 0.001). Respondents who experience greater relationship strain from snoring report a much higher likelihood of buying a solution.

Sleep disruption score also correlates positively with purchase likelihood (r = 0.54, p < 0.001), but less strongly. This suggests that physical discomfort is a secondary motivator compared with relationship preservation. Snoring frequency correlates moderately with both sleep disruption (r = 0.38, p < 0.001) and relationship impact (r = 0.44, p < 0.001), which is consistent with more frequent snoring producing more severe downstream effects.

The Spearman results are consistent with Pearson across all pairs, so these relationships do not depend on distributional assumptions. The key business implication is that StopSnoring Brands should anchor its marketing strategy in the relationship narrative (“Save Your Sleep. Save Your Relationship.”) rather than leading with clinical or health-focused messaging.


8 Technique 5 — Linear Regression

Business Justification: Regression models purchase likelihood as a function of multiple predictors simultaneously, revealing which factors independently drive intent to buy after controlling for all others — giving me a ranked priority list for the launch strategy.

8.1 Model 1 — Simple Linear Regression

Show Code
model_simple <- lm(Q12_Purchase_Likelihood ~ Q8_Relationship_Impact_Score, data = df)
summary(model_simple)

Call:
lm(formula = Q12_Purchase_Likelihood ~ Q8_Relationship_Impact_Score, 
    data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.21913 -0.97893  0.02107  0.86094  2.10113 

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)    
(Intercept)                   0.53938    0.25660   2.102   0.0376 *  
Q8_Relationship_Impact_Score  0.91994    0.04077  22.565   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.189 on 120 degrees of freedom
Multiple R-squared:  0.8093,    Adjusted R-squared:  0.8077 
F-statistic: 509.2 on 1 and 120 DF,  p-value: < 2.2e-16
```r
tidy(model_simple) %>%
  mutate(across(where(is.numeric), ~round(., 3))) %>%
  ss_kable("Simple Regression: Purchase Likelihood ~ Relationship Impact")
```

Simple Regression: Purchase Likelihood ~ Relationship Impact

| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 0.539 | 0.257 | 2.102 | 0.038 |
| Q8_Relationship_Impact_Score | 0.920 | 0.041 | 22.565 | 0.000 |

8.2 Model 2 — Multiple Linear Regression

```r
df_reg <- df %>%
  mutate(
    Gender_Female = ifelse(Q1_Gender == "Female", 1, 0),
    Tried_Remedy  = ifelse(Q9_Tried_Remedy_Before == "Yes", 1, 0),
    State_Lagos   = ifelse(Q4_State == "Lagos", 1, 0)
  )

model_full <- lm(Q12_Purchase_Likelihood ~
                   Q8_Relationship_Impact_Score +
                   Q7_Sleep_Disruption_Score +
                   Q6_Nights_Per_Week +
                   Gender_Female +
                   Tried_Remedy +
                   State_Lagos,
                 data = df_reg)
summary(model_full)
```

```
Call:
lm(formula = Q12_Purchase_Likelihood ~ Q8_Relationship_Impact_Score + 
    Q7_Sleep_Disruption_Score + Q6_Nights_Per_Week + Gender_Female + 
    Tried_Remedy + State_Lagos, data = df_reg)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.21895 -0.88798 -0.03859  0.90116  2.22071 

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)    
(Intercept)                   0.90296    0.38417   2.350   0.0205 *  
Q8_Relationship_Impact_Score  0.97041    0.05931  16.363   <2e-16 ***
Q7_Sleep_Disruption_Score    -0.10105    0.05949  -1.699   0.0921 .  
Q6_Nights_Per_Week            0.02210    0.07311   0.302   0.7630    
Gender_Female                -0.04774    0.22852  -0.209   0.8349    
Tried_Remedy                 -0.17670    0.26462  -0.668   0.5056    
State_Lagos                  -0.03861    0.24799  -0.156   0.8765    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.196 on 115 degrees of freedom
Multiple R-squared:  0.8151,    Adjusted R-squared:  0.8055 
F-statistic: 84.51 on 6 and 115 DF,  p-value: < 2.2e-16
```
```r
tidy(model_full) %>%
  mutate(
    # Flag significance on the unrounded p-values, then round for display
    Significant = ifelse(p.value < 0.05, "Yes ✅", "No ❌"),
    across(where(is.numeric), ~round(., 3))
  ) %>%
  ss_kable("Multiple Regression: Purchase Likelihood ~ All Predictors")
```

Multiple Regression: Purchase Likelihood ~ All Predictors

| term | estimate | std.error | statistic | p.value | Significant |
|---|---|---|---|---|---|
| (Intercept) | 0.903 | 0.384 | 2.350 | 0.020 | Yes ✅ |
| Q8_Relationship_Impact_Score | 0.970 | 0.059 | 16.363 | 0.000 | Yes ✅ |
| Q7_Sleep_Disruption_Score | -0.101 | 0.059 | -1.699 | 0.092 | No ❌ |
| Q6_Nights_Per_Week | 0.022 | 0.073 | 0.302 | 0.763 | No ❌ |
| Gender_Female | -0.048 | 0.229 | -0.209 | 0.835 | No ❌ |
| Tried_Remedy | -0.177 | 0.265 | -0.668 | 0.506 | No ❌ |
| State_Lagos | -0.039 | 0.248 | -0.156 | 0.877 | No ❌ |

8.3 Model Diagnostics

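```r
# Standard lm diagnostic plots (residuals vs fitted, normal Q-Q,
# scale-location, residuals vs leverage) in a 2 x 2 grid
par(mfrow = c(2, 2))
plot(model_full, col = SS_PURPLE)
```

```r
# Restore the default single-panel layout
par(mfrow = c(1, 1))

# Variance inflation factors (car::vif); values above 5 flag multicollinearity
vif(model_full) %>%
  enframe(name = "Variable", value = "VIF") %>%
  mutate(VIF = round(VIF, 2),
         Concern = ifelse(VIF > 5, "High ⚠️", "OK ✅")) %>%
  ss_kable("Variance Inflation Factors (VIF) — Multicollinearity Check")
```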
Variance Inflation Factors (VIF) — Multicollinearity Check

| Variable | VIF | Concern |
|---|---|---|
| Q8_Relationship_Impact_Score | 2.09 | OK ✅ |
| Q7_Sleep_Disruption_Score | 1.95 | OK ✅ |
| Q6_Nights_Per_Week | 1.64 | OK ✅ |
| Gender_Female | 1.08 | OK ✅ |
| Tried_Remedy | 1.22 | OK ✅ |
| State_Lagos | 1.14 | OK ✅ |
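The VIF values above follow from a simple formula: VIF_j = 1 / (1 − R²_j), where R²_j is the R² from regressing predictor j on all the other predictors. A minimal base-R sketch, using a hypothetical helper `vif_manual()` and simulated predictors (not the survey data); for models whose terms each have one degree of freedom, as in Model 2, it should agree with `car::vif()`.

```r
# Hypothetical helper: VIF_j = 1 / (1 - R^2_j), where R^2_j comes from
# regressing predictor j on all remaining predictors
vif_manual <- function(predictors) {
  stopifnot(is.data.frame(predictors), ncol(predictors) >= 2)
  sapply(names(predictors), function(v) {
    others <- setdiff(names(predictors), v)
    f  <- reformulate(others, response = v)
    r2 <- summary(lm(f, data = predictors))$r.squared
    1 / (1 - r2)
  })
}

# Simulated predictors for illustration only
set.seed(7)
n  <- 122
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)  # built to share variance with x1
x3 <- rnorm(n)                       # unrelated to the others
vifs <- vif_manual(data.frame(x1, x2, x3))
round(vifs, 2)
```

Here `x1` and `x2` should show elevated VIFs (they were built to share roughly 64% of their variance, so roughly 1 / (1 − 0.64) ≈ 2.8 in expectation), while `x3` should stay close to 1.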
```r
# Breusch-Pagan
bptest(model_full)
```

```
	studentized Breusch-Pagan test

data:  model_full
BP = 9.6523, df = 6, p-value = 0.1401
```
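The studentized (Koenker) Breusch-Pagan statistic can be reproduced from an auxiliary regression: regress the squared residuals on the model's regressors and multiply that regression's R² by n. Under constant error variance the statistic is approximately χ² with degrees of freedom equal to the number of predictors (6 for Model 2). A minimal sketch with a hypothetical helper `bp_manual()` and simulated data, not the survey model.

```r
# Hypothetical helper: studentized (Koenker) Breusch-Pagan statistic, n * R^2
# from regressing squared OLS residuals on the model's own regressors
bp_manual <- function(model) {
  X   <- model.matrix(model)   # includes the intercept column
  e2  <- residuals(model)^2
  aux <- lm(e2 ~ X - 1)        # X already carries the intercept
  # Centred R^2 of the auxiliary regression (summary() would report the
  # uncentred version for a no-intercept formula, so compute it directly)
  r2   <- 1 - sum(residuals(aux)^2) / sum((e2 - mean(e2))^2)
  stat <- length(e2) * r2
  df   <- ncol(X) - 1
  c(BP = stat, df = df, p.value = pchisq(stat, df, lower.tail = FALSE))
}

# Simulated data for illustration only
set.seed(3)
n <- 150
x <- runif(n, 1, 10)
y_homo   <- 2 + 0.5 * x + rnorm(n)               # constant error variance
y_hetero <- 2 + 0.5 * x + rnorm(n, sd = 0.4 * x) # variance grows with x

bp_manual(lm(y_homo ~ x))
bp_manual(lm(y_hetero ~ x))
```

The heteroscedastic fit should produce a large statistic with a small p-value. If lmtest is installed, `lmtest::bptest(lm(y_hetero ~ x))` should report the same statistic, since its default (`studentize = TRUE`) uses the model's own regressors.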

Regression Interpretation: The simple regression (Model 1) shows that relationship impact score alone explains about 81% of the variance in purchase likelihood (R² = 0.809), with a slope of 0.92 (p < 0.001): each 1-point increase in relationship impact score is associated with an increase of roughly 0.92 points in purchase likelihood on the 10-point scale. The multiple regression (Model 2) adds five further predictors. Relationship impact score remains the only statistically significant predictor (b = 0.97, p < 0.001), so its effect holds, and even strengthens slightly, after controlling for all other variables. Once relationship impact is accounted for, sleep disruption has no significant independent effect; its coefficient is small and negative (b = −0.10, p = 0.092), which suggests that its positive bivariate association with purchase likelihood largely overlaps with relationship impact. Gender, prior remedy use, snoring frequency, and Lagos residency are also non-significant (all p > 0.5), so purchase intent does not differ detectably across these groups once relationship impact is held constant, consistent with broad product appeal. The extra predictors add almost nothing to model fit: R² rises only from 0.809 to 0.815, and adjusted R² falls slightly from 0.808 to 0.806. An R² this high is unusual for consumer survey data and may partly reflect overlap between two self-reported attitude measures from the same questionnaire. Diagnostics raise no major concerns: all VIFs fall below 5 (maximum 2.09), indicating no problematic multicollinearity, and the Breusch-Pagan test finds no significant evidence of heteroscedasticity (BP = 9.65, df = 6, p = 0.14). The strategic implication is clear: the single most powerful lever for driving purchase intent is addressing the relationship impact of snoring, and all brand communications should lead with this message.
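The Model 1 versus Model 2 comparison can be checked with a little arithmetic on the printed summaries. Adjusted R² is 1 − (1 − R²)(n − 1)/(n − p − 1), and the partial F-test of whether the q = 5 added predictors jointly improve the fit is F = [(R²_full − R²_simple)/q] / [(1 − R²_full)/(n − p_full − 1)]. The sketch below plugs in the rounded values printed above (n = 122); on the fitted objects, `anova(model_simple, model_full)` should give the exact version of the same nested-model test.

```r
# Values copied from the printed summaries above (rounded to 4 d.p.)
n         <- 122
r2_simple <- 0.8093; p_simple <- 1
r2_full   <- 0.8151; p_full   <- 6

adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Partial F-test: do the q extra predictors jointly add explanatory power?
q      <- p_full - p_simple
df_res <- n - p_full - 1
f_stat <- ((r2_full - r2_simple) / q) / ((1 - r2_full) / df_res)
p_val  <- pf(f_stat, q, df_res, lower.tail = FALSE)

round(c(adj_simple = adj_r2(r2_simple, n, p_simple),
        adj_full   = adj_r2(r2_full, n, p_full),
        F          = f_stat,
        p          = p_val), 3)
```

With these inputs F is well below the 5% critical value for 5 and 115 degrees of freedom, in line with the near-identical R² values of the two models.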


9 Integrated Findings

📌 Core Strategic Recommendation: StopSnoring Brands should launch in Lagos targeting married adults aged 35–54 — especially women — via Instagram and pharmacy channels, with relationship-centred messaging: “Save Your Sleep. Save Your Relationship.” The essential oil and nasal device should serve as hero products in all launch materials.

Across five analytical lenses applied to 122 Nigerian respondents, a consistent and actionable picture emerges for StopSnoring Brands.

Who the customer is: EDA and visualisation confirm that the core StopSnoring customer is a married adult aged 35–54, most commonly female, based in Lagos, with snoring occurring on average 4.8 nights per week. This segment reports the highest sleep disruption (mean 6.47/10) and relationship impact scores (mean 5.71/10) — the most motivated and highest-intent buyer profile.

Snoring is primarily a relationship problem. Hypothesis testing establishes that high-frequency snorers report significantly higher sleep disruption scores (Wilcoxon, p < 0.05), and correlation analysis shows that relationship impact score is the strongest correlate of purchase likelihood (r ≈ 0.90, p < 0.001). This supports the brand’s core positioning: snoring is not merely a health inconvenience but a direct threat to relationship quality.

What drives purchase intent: Multiple linear regression identifies relationship impact score as the only significant independent predictor of purchase likelihood (b ≈ 0.97, p < 0.001) after controlling for sleep disruption, snoring frequency, gender, prior remedy use, and location. Sleep disruption has no significant independent effect once relationship impact is accounted for (b ≈ −0.10, p = 0.092). Gender, location, and prior remedy use are likewise not significant, suggesting that the product’s appeal is broad and cross-demographic.

Product and channel priorities: The essential oil (n=33) and nasal device (n=34) attract the most interest. Online/Instagram is the dominant preferred channel (n=43), with pharmacy a close second (n=38). This supports a dual-channel launch strategy with Instagram for awareness and pharmacy for credibility.

The untapped opportunity: 71.3% of respondents have never tried any snoring remedy. This is StopSnoring Brands’ defining market advantage — an enormous first-mover opportunity in a market that is aware of the problem but has not yet adopted a solution.

9.1 Strategic Recommendations

1. Lead with relationship messaging, not clinical messaging. Every marketing asset should foreground the relationship benefit. The data show that relationship impact, rather than sleep disruption or snoring frequency, is what predicts intent to buy. Use couple-centric storytelling, testimonials from partners, and the emotional cost of poor sleep on relationships.

2. Prioritise the 35–54 married female segment. This segment reports the highest pain scores and purchase intent. All paid social campaigns on Instagram should be designed for this persona — language, imagery, and product positioning should speak directly to the experience of sharing a bed with a snorer.

3. Launch with essential oil and nasal device as hero products. These two products attract the most interest and should lead the pharmacy pitch deck, starter bundle, and launch promotions. The oral device and mouth tape can be positioned as complementary upsell products for repeat customers.

4. Pursue a dual-channel strategy: online-first, pharmacy-second. Instagram and the brand website should drive awareness and direct sales, while pharmacy placement builds credibility and enables in-person discovery. Both channels are necessary — neither alone is sufficient.

5. Invest in awareness-stage content before conversion content. 71.3% of respondents have never tried any remedy, so this audience needs to be educated before it can be converted. Explainer videos, sleep quality stories, and before/after couple testimonials will build the funnel before promotional posts can close it.

6. Pilot in Lagos before national expansion. 72% of survey respondents are Lagos-based. A focused Lagos launch reduces distribution risk, enables rapid feedback collection, and builds proof of concept before scaling to Abuja and other cities.


10 Limitations & Further Work

  1. Sampling bias: Convenience sampling means digitally-engaged respondents are over-represented. Older, less tech-savvy adults — who may experience the most severe snoring — are likely under-sampled.

  2. Subgroup power: With only 35 remedy users, the satisfaction analysis has limited statistical power. A larger sample of product users would enable more robust product-level comparisons.

  3. Self-report bias: All measures — snoring frequency, sleep disruption, relationship impact — are self-reported and subject to recall and social desirability bias.

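  4. Unexplained variance: Although the full model explains about 82% of the variance in purchase likelihood (R² = 0.815), nearly all of that comes from a single self-reported item, and the remaining 18% is unexplained. Future models could include income level, brand awareness, and price sensitivity.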

  5. Further work: Logistic regression classifying respondents as “likely buyers” (score ≥ 7) vs “unlikely” (< 7) would sharpen segmentation. A post-launch study tracking actual conversion rates against predicted intent scores would validate this model.
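A minimal sketch of the proposed “likely buyer” classifier, assuming a binary outcome defined as purchase likelihood ≥ 7 and relationship impact score as the single predictor. The data are simulated for illustration only; `sim`, `likely_buyer`, and `logit_fit` are hypothetical names, and on the real survey the same `glm()` call would be run on `df`.

```r
# Simulated survey-like data for illustration only (NOT the survey responses)
set.seed(2026)
n <- 122
sim <- data.frame(
  Q8_Relationship_Impact_Score = sample(1:10, n, replace = TRUE)
)
sim$Q12_Purchase_Likelihood <- pmin(10, pmax(1, round(
  0.5 + 0.9 * sim$Q8_Relationship_Impact_Score + rnorm(n, sd = 1.5))))

# Binary target: "likely buyer" if purchase likelihood is 7 or higher
sim$likely_buyer <- as.integer(sim$Q12_Purchase_Likelihood >= 7)

logit_fit <- glm(likely_buyer ~ Q8_Relationship_Impact_Score,
                 family = binomial, data = sim)
summary(logit_fit)

# Odds ratio per 1-point increase in relationship impact score
exp(coef(logit_fit))[["Q8_Relationship_Impact_Score"]]

# Predicted probability of being a likely buyer at impact scores 3, 6 and 9
p_hat <- predict(logit_fit,
                 newdata = data.frame(Q8_Relationship_Impact_Score = c(3, 6, 9)),
                 type = "response")
p_hat
```

Reporting odds ratios and predicted probabilities at a few representative scores keeps the output readable for a business audience, and the same predicted probabilities could later be compared against observed conversions in the proposed post-launch study.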


11 References

Adi, B. (2026). Data Analytics II: Capstone case study assessment brief. Lagos Business School.

Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01

Fox, J., & Weisberg, S. (2019). An R companion to applied regression (3rd ed.). Sage Publications.

Hair, J. F., Black, W. C., Babin, B. J., & Anderson, R. E. (2019). Multivariate data analysis (8th ed.). Cengage Learning.

Kassambara, A. (2023). rstatix: Pipe-friendly framework for basic statistical tests (R package version 0.7.2). https://CRAN.R-project.org/package=rstatix

Makowski, D., Ben-Shachar, M. S., Patil, I., & Lüdecke, D. (2022). Estimation of model-based indices of effect size for the description of standardized distances between groups. Meta-Psychology, 7. https://doi.org/10.15626/MP.2019.2373

R Core Team. (2025). R: A language and environment for statistical computing (version 4.5.2). R Foundation for Statistical Computing. https://www.R-project.org/

Revelle, W. (2024). psych: Procedures for psychological, psychometric, and personality research (R package version 2.4.3). Northwestern University. https://CRAN.R-project.org/package=psych

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Wickham, H., & Bryan, J. (2023). readxl: Read Excel files (R package version 1.4.3). https://CRAN.R-project.org/package=readxl

Zeileis, A., & Hothorn, T. (2002). Diagnostic checking in regression relationships. R News, 2(3), 7–10. https://CRAN.R-project.org/doc/Rnews/


12 Appendix: AI Usage Statement

Artificial intelligence tools (Claude, Anthropic) were used to assist with the initial structuring of this Quarto document and to suggest appropriate R packages for each analytical technique. All analytical decisions — including the choice of techniques, interpretation of outputs, and business recommendations — were made independently by the author based on knowledge acquired during the Data Analytics II module. All code was reviewed, understood, and verified by the author prior to submission. The data was collected, owned, and managed entirely by the author through StopSnoring Brands.