Sleep, Snoring & Purchase Behaviour

An Exploratory and Inferential Analysis of StopSnoring Brands

Author

Kodinita Terry-Amadi

Published

May 11, 2026

0.1 Professional Disclosure

Name	Kodinita Terry-Amadi
Business	StopSnoring Brands — Sleep Health Company, Lagos, Nigeria
Role	Founder & Chief Executive Officer
GitHub	github.com/stopsnoringng-stack/stopsnoring-data-analytics-By-Kodinita-Terry-Amadi

1 Executive Summary

StopSnoring Brands is a Lagos-based sleep health company offering four products — an anti-snoring nasal device, oral device, mouth tape, and essential oil — designed to help clients achieve restorative sleep and healthier relationships. This report presents a data-driven analysis of 122 survey responses collected from Nigerian adults aged 25 and above, examining the relationship between snoring behaviour, sleep disruption, relationship strain, and purchasing intent.

Five analytical techniques were applied: Exploratory Data Analysis (EDA), Data Visualisation, Hypothesis Testing, Correlation Analysis, and Multiple Linear Regression. The findings suggest that snoring is experienced primarily as a relationship problem, not merely a health inconvenience. Relationship impact score emerged as the strongest predictor of purchase likelihood (r ≈ 0.90; unstandardised b ≈ 0.97 in the multiple regression, p < 0.001), and married women aged 35–54 represent the highest-pain, highest-intent customer segment. The nasal device and essential oil are the most preferred products. The data support a Lagos-first go-to-market strategy prioritising online and pharmacy channels, with relationship-centred messaging at the core of all brand communications.

StopSnoring Brands helps individuals and couples achieve restorative sleep through a curated range of clinically informed, non-invasive anti-snoring products. The business operates in the Nigerian consumer wellness market, targeting adults aged 25–60 who are affected by snoring, either as snorers themselves or as partners of snorers.

2 Data Collection & Sampling

2.1 Data Source

Primary data was collected via a structured survey administered through Google Forms between 15 March 2026 and 24 April 2026 (41 days). The survey was distributed via WhatsApp broadcast lists, the StopSnoring Brands Instagram page, and direct outreach to personal and professional networks across Lagos and other Nigerian cities. No incentive was offered for participation.

2.2 Sampling Frame & Approach

The target population comprised Nigerian adults aged 25 and above who either snore themselves or share a sleeping environment with a snorer. A convenience sampling approach was employed, appropriate for an exploratory consumer insight study at this stage of business development. The sample is not intended to be statistically representative of the Nigerian population but is sufficient to generate directional insights for strategic decision-making.

2.3 Sample Size

A total of 122 valid responses were collected, exceeding the minimum requirement of 100 observations. All responses were complete across 15 of 16 variables. Question 11 (Satisfaction Score) was intentionally left blank by respondents who had never tried a remedy (n = 87), as this question was conditional on prior product use — this reflects survey design logic, not missing data.
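The claim that the Q11 gaps come from skip logic rather than non-response can be checked directly. The following is a minimal sketch in base R that uses a small made-up data frame (`toy`) in place of the real survey file; only the two column names are taken from the dataset, and the values are illustrative.

```r
# Illustrative data only: six made-up respondents, not the survey responses.
toy <- data.frame(
  Q9_Tried_Remedy_Before = c("No", "Yes", "No", "Yes", "No", "Yes"),
  Q11_Satisfaction_Score = c(NA, 7, NA, 5, NA, 9)
)

# Cross-tabulate remedy use against whether Q11 is missing.
skip_check <- table(
  tried      = toy$Q9_Tried_Remedy_Before,
  q11_absent = is.na(toy$Q11_Satisfaction_Score)
)
print(skip_check)

# Skip logic holds when Q11 is missing exactly for the "No" respondents.
skip_logic_ok <- all(
  is.na(toy$Q11_Satisfaction_Score) == (toy$Q9_Tried_Remedy_Before == "No")
)
skip_logic_ok
```

Run against the survey data frame instead of `toy`, the same check would be expected to return `TRUE` if all 87 missing Q11 values belong to respondents who answered "No" to Q9.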

2.4 Ethical Considerations

The survey was fully anonymous; no names or contact details were collected.
Participation was voluntary, with respondents informed of the research purpose in the survey introduction.
No sensitive personal health data beyond self-reported snoring behaviour was collected.
Data is stored securely and used solely for academic and business research purposes.

2.5 Dataset Overview


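# Packages used in this chunk (harmless if a setup chunk already loads them)
library(readxl)  # read_excel()
library(dplyr)   # mutate(), glimpse(), %>%

# ── Locate the data file robustly ───────────────────────────────────────────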
data_path <- "C:/Users/Kodinita/Desktop/DA 2/Exam/StopSnoring_DA/StopSnoring_SurveyData.xlsx"

# Fallback: look next to the .qmd file if the hardcoded path doesn't exist
if (!file.exists(data_path)) {
  data_path <- "StopSnoring_SurveyData.xlsx"
}

if (!file.exists(data_path)) {
  stop("Cannot find StopSnoring_SurveyData.xlsx. Please place it in the same folder as this .qmd file.")
}

df <- read_excel(data_path, sheet = "Raw Survey Data")

df <- df %>%
  mutate(
    Date_Submitted         = as.Date(Date_Submitted, format = "%d/%m/%Y"),
    Q11_Satisfaction_Score = as.numeric(Q11_Satisfaction_Score),
    Q2_Age_Group = factor(Q2_Age_Group,
      levels  = c("25–34", "35–44", "45–54", "55 and above"), ordered = TRUE),
    Q13_Willingness_To_Pay = factor(Q13_Willingness_To_Pay,
      levels  = c("₦1,000 – ₦2,999", "₦3,000 – ₦4,999",
                  "₦5,000 – ₦7,999",  "₦8,000 and above"), ordered = TRUE)
  )

glimpse(df)

Rows: 122
Columns: 16
$ ResponseID                   <chr> "SS001", "SS002", "SS003", "SS004", "SS00…
$ Date_Submitted               <date> 2026-03-15, 2026-03-15, 2026-03-15, 2026…
$ Q1_Gender                    <chr> "Male", "Male", "Female", "Female", "Male…
$ Q2_Age_Group                 <ord> 35–44, 45–54, 55 and above, 35–44, 55 and…
$ Q3_Relationship_Status       <chr> "Cohabiting / Living with partner", "Marr…
$ Q4_State                     <chr> "Lagos", "Lagos", "Lagos", "Lagos", "Othe…
$ Q5_Who_Snores                <chr> "I do (myself)", "My partner", "Both of u…
$ Q6_Nights_Per_Week           <dbl> 7, 6, 7, 2, 3, 7, 7, 6, 6, 6, 5, 3, 7, 6,…
$ Q7_Sleep_Disruption_Score    <dbl> 7, 10, 9, 6, 2, 1, 9, 7, 6, 10, 7, 4, 8, …
$ Q8_Relationship_Impact_Score <dbl> 7, 7, 9, 3, 2, 2, 5, 5, 9, 6, 7, 7, 9, 6,…
$ Q9_Tried_Remedy_Before       <chr> "No", "No", "Yes", "No", "Yes", "Yes", "N…
$ Q10_Product_Interest         <chr> "Mouth tape", "Mouth tape", "Mouth tape",…
$ Q11_Satisfaction_Score       <dbl> NA, NA, 5, NA, 6, 7, NA, 10, 8, 6, NA, NA…
$ Q12_Purchase_Likelihood      <dbl> 7, 9, 7, 5, 2, 1, 3, 5, 8, 5, 8, 7, 8, 8,…
$ Q13_Willingness_To_Pay       <ord> "₦8,000 and above", "₦1,000 – ₦2,999", "₦…
$ Q14_Preferred_Channel        <chr> "Pharmacy", "Online (Instagram, website)"…

3 Data Description

3.1 Variable Summary


skim(df)

Data summary
Name	df
Number of rows	122
Number of columns	16
_______________________
Column type frequency:
character	8
Date	1
factor	2
numeric	5
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
ResponseID	1	5	5	122
Q1_Gender	1	4	6	2
Q3_Relationship_Status	1	6	32	3
Q4_State	1	5	5	3
Q5_Who_Snores	1	10	13	4
Q9_Tried_Remedy_Before	1	2	3	2
Q10_Product_Interest	1	10	27	5
Q14_Preferred_Channel	1	8	27	4

Variable type: Date

skim_variable	n_missing	complete_rate	min	max	median	n_unique
Date_Submitted	0	1	2026-03-15	2026-04-24	2026-04-02	41

Variable type: factor

skim_variable	n_missing	complete_rate	ordered	n_unique	top_counts
Q2_Age_Group	0	1	TRUE	4	45–: 40, 35–: 36, 25–: 26, 55 : 20
Q13_Willingness_To_Pay	0	1	TRUE	4	₦3,: 52, ₦1,: 29, ₦5,: 28, ₦8,: 13

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
Q6_Nights_Per_Week	0	1.00	4.81	1.90	0	4.00	5	6	7	▂▁▅▃▇
Q7_Sleep_Disruption_Score	0	1.00	6.47	2.55	1	5.00	7	8	10	▂▂▆▇▆
Q8_Relationship_Impact_Score	0	1.00	5.71	2.65	1	3.25	6	7	10	▆▃▇▇▅
Q11_Satisfaction_Score	87	0.29	7.43	1.80	4	6.00	8	8	10	▅▃▃▇▆
Q12_Purchase_Likelihood	0	1.00	5.80	2.71	1	4.00	6	8	10	▅▆▇▇▆

3.2 Variable Reference Table


tibble(
  Variable = names(df),
  Type = c("ID", "Date", "Categorical", "Categorical (Ordinal)",
           "Categorical", "Categorical", "Categorical",
           "Numeric (Discrete)", "Numeric (Likert)", "Numeric (Likert)",
           "Categorical (Binary)", "Categorical", "Numeric (Likert)",
           "Numeric (Likert)", "Categorical (Ordinal)", "Categorical"),
  Role = c("Row identifier", "Time variable", "Predictor / grouping",
            "Predictor / grouping", "Predictor / grouping", "Geographic filter",
            "Grouping variable", "Predictor in regression",
            "Predictor / hypothesis testing", "Key predictor in regression",
            "Filter variable", "Grouping variable (Kruskal-Wallis)",
            "Outcome for remedy users", "Dependent variable (regression)",
            "Pricing analysis", "Channel strategy")
) %>% ss_kable("Dataset Variable Overview")

Dataset Variable Overview
Variable	Type	Role
ResponseID	ID	Row identifier
Date_Submitted	Date	Time variable
Q1_Gender	Categorical	Predictor / grouping
Q2_Age_Group	Categorical (Ordinal)	Predictor / grouping
Q3_Relationship_Status	Categorical	Predictor / grouping
Q4_State	Categorical	Geographic filter
Q5_Who_Snores	Categorical	Grouping variable
Q6_Nights_Per_Week	Numeric (Discrete)	Predictor in regression
Q7_Sleep_Disruption_Score	Numeric (Likert)	Predictor / hypothesis testing
Q8_Relationship_Impact_Score	Numeric (Likert)	Key predictor in regression
Q9_Tried_Remedy_Before	Categorical (Binary)	Filter variable
Q10_Product_Interest	Categorical	Grouping variable (Kruskal-Wallis)
Q11_Satisfaction_Score	Numeric (Likert)	Outcome for remedy users
Q12_Purchase_Likelihood	Numeric (Likert)	Dependent variable (regression)
Q13_Willingness_To_Pay	Categorical (Ordinal)	Pricing analysis
Q14_Preferred_Channel	Categorical	Channel strategy

4 Technique 1 — Exploratory Data Analysis (EDA)

Business Justification: Before investing in any campaign or product decision, I need to understand the baseline profile of respondents: Who are they? How severe is their snoring problem? What product do they lean toward? EDA answers all of this without any modelling assumptions.

4.1 Summary Statistics — Numeric Variables


df %>%
  select(Q6_Nights_Per_Week, Q7_Sleep_Disruption_Score,
         Q8_Relationship_Impact_Score, Q12_Purchase_Likelihood) %>%
  pivot_longer(everything(), names_to = "Variable", values_to = "Value") %>%
  group_by(Variable) %>%
  summarise(
    N      = n(),
    Mean   = round(mean(Value, na.rm = TRUE), 2),
    Median = median(Value, na.rm = TRUE),
    SD     = round(sd(Value, na.rm = TRUE), 2),
    Min    = min(Value, na.rm = TRUE),
    Max    = max(Value, na.rm = TRUE)
  ) %>%
  mutate(Variable = case_when(
    Variable == "Q6_Nights_Per_Week"           ~ "Nights snoring per week",
    Variable == "Q7_Sleep_Disruption_Score"    ~ "Sleep disruption score (1–10)",
    Variable == "Q8_Relationship_Impact_Score" ~ "Relationship impact score (1–10)",
    Variable == "Q12_Purchase_Likelihood"      ~ "Purchase likelihood (1–10)",
    TRUE ~ Variable
  )) %>%
  ss_kable("Descriptive Statistics — Numeric Variables")

Descriptive Statistics — Numeric Variables
Variable	N	Mean	Median	SD	Min	Max
Purchase likelihood (1–10)	122	5.80	6	2.71	1	10
Nights snoring per week	122	4.81	5	1.90	0	7
Sleep disruption score (1–10)	122	6.47	7	2.55	1	10
Relationship impact score (1–10)	122	5.71	6	2.65	1	10

4.2 Frequency Tables — Categorical Variables


cat_freq <- function(var, label) {
  df %>%
    count(!!sym(var)) %>%
    mutate(Percent = paste0(round(n / sum(n) * 100, 1), "%")) %>%
    rename(Category = !!sym(var), Count = n) %>%
    ss_kable(paste("Frequency Table:", label))
}

cat_freq("Q1_Gender",              "Gender")

Frequency Table: Gender
Category	Count	Percent
Female	71	58.2%
Male	51	41.8%


cat_freq("Q2_Age_Group",           "Age Group")

Frequency Table: Age Group
Category	Count	Percent
25–34	26	21.3%
35–44	36	29.5%
45–54	40	32.8%
55 and above	20	16.4%


cat_freq("Q3_Relationship_Status", "Relationship Status")

Frequency Table: Relationship Status
Category	Count	Percent
Cohabiting / Living with partner	27	22.1%
Married	72	59%
Single	23	18.9%


cat_freq("Q4_State",               "State of Residence")

Frequency Table: State of Residence
Category	Count	Percent
Abuja	24	19.7%
Lagos	83	68%
Other	15	12.3%


cat_freq("Q5_Who_Snores",          "Who Snores in Household")

Frequency Table: Who Snores in Household
Category	Count	Percent
Both of us	26	21.3%
I do (myself)	39	32%
My partner	41	33.6%
No one snores	16	13.1%


cat_freq("Q9_Tried_Remedy_Before", "Tried a Remedy Before")

Frequency Table: Tried a Remedy Before
Category	Count	Percent
No	87	71.3%
Yes	35	28.7%


cat_freq("Q10_Product_Interest",   "Product Interest")

Frequency Table: Product Interest
Category	Count	Percent
Anti-snoring essential oil	33	27%
Anti-snoring nasal device	34	27.9%
Mouth tape	25	20.5%
None yet — I am new to this	6	4.9%
Oral device / mouthguard	24	19.7%


cat_freq("Q13_Willingness_To_Pay", "Willingness to Pay")

Frequency Table: Willingness to Pay
Category	Count	Percent
₦1,000 – ₦2,999	29	23.8%
₦3,000 – ₦4,999	52	42.6%
₦5,000 – ₦7,999	28	23%
₦8,000 and above	13	10.7%


cat_freq("Q14_Preferred_Channel",  "Preferred Purchase Channel")

Frequency Table: Preferred Purchase Channel
Category	Count	Percent
Directly from the brand	25	20.5%
Online (Instagram, website)	43	35.2%
Pharmacy	38	31.1%
Supermarket / mall	16	13.1%

4.3 Outlier Check


df %>%
  select(Q6_Nights_Per_Week, Q7_Sleep_Disruption_Score,
         Q8_Relationship_Impact_Score, Q12_Purchase_Likelihood) %>%
  pivot_longer(everything(), names_to = "Variable", values_to = "Value") %>%
  mutate(Variable = case_when(
    Variable == "Q6_Nights_Per_Week"           ~ "Nights snoring per week",
    Variable == "Q7_Sleep_Disruption_Score"    ~ "Sleep disruption score (1–10)",
    Variable == "Q8_Relationship_Impact_Score" ~ "Relationship impact score (1–10)",
    Variable == "Q12_Purchase_Likelihood"      ~ "Purchase likelihood (1–10)",
    TRUE ~ Variable
  )) %>%
  ggplot(aes(x = Variable, y = Value, fill = Variable)) +
  geom_boxplot(alpha = 0.75, outlier.colour = "#E74C3C",
               outlier.shape = 16, outlier.size = 2.5) +
  scale_fill_manual(values = c(SS_PURPLE, SS_GOLD, SS_MID, "#A8D8EA")) +
  coord_flip() +
  labs(title    = "Boxplots of Numeric Variables — Outlier Check",
       subtitle = "Red dots indicate potential outliers",
       x = NULL, y = "Score") +
  theme(legend.position = "none")

EDA Interpretation: The sample is predominantly female (58.2%, n = 71) and skewed toward middle-aged adults, with the 45–54 bracket forming the largest group (32.8%, n = 40). Snoring is frequent: respondents report it on an average of 4.8 nights per week, and the mean sleep disruption score of 6.47 out of 10 indicates that sleep quality is materially impaired for many respondents. The relationship impact score averages 5.71 out of 10, indicating that snoring also strains intimate relationships, not just personal comfort. Notably, 71.3% (n = 87) of respondents have never tried any snoring remedy, which points to a large untapped market. Among the four products, the anti-snoring nasal device (n = 34) and essential oil (n = 33) attract the greatest interest. No extreme outliers were detected across the numeric variables: all scores fall within their valid ranges, and the data are suitable for inferential analysis.
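The boxplot check can also be reproduced numerically. The sketch below applies the usual 1.5 × IQR fences to the quartiles reported by `skim()` in Section 3.1. It assumes those quartiles are close to the hinges that `geom_boxplot()` draws, so the fences are approximate; the data frame `q` and its column names are introduced here for illustration.

```r
# Quartiles and ranges copied from the skim() output in Section 3.1.
q <- data.frame(
  variable = c("Q6_Nights_Per_Week", "Q7_Sleep_Disruption_Score",
               "Q8_Relationship_Impact_Score", "Q12_Purchase_Likelihood"),
  min = c(0, 1, 1, 1),
  p25 = c(4, 5, 3.25, 4),
  p75 = c(6, 8, 7, 8),
  max = c(7, 10, 10, 10)
)

# Tukey fences: 1.5 * IQR beyond the quartiles.
iqr <- q$p75 - q$p25
q$lower_fence <- q$p25 - 1.5 * iqr
q$upper_fence <- q$p75 + 1.5 * iqr

# TRUE when the observed minimum or maximum lies outside the fences.
q$any_flagged <- q$min < q$lower_fence | q$max > q$upper_fence
q[, c("variable", "lower_fence", "upper_fence", "any_flagged")]
```

On these figures, only nights per week can produce flagged points, and those would be responses of 0 nights: valid values on a 0 to 7 scale rather than data errors. The three 1–10 scores cannot exceed their fences at all.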

5 Technique 2 — Data Visualisation

Business Justification: As a founder preparing pharmacy pitches and investor presentations, compelling visuals communicate who my customer is and what pain they experience. These charts are designed to be used directly in business presentations, not just academic reports.

5.1 Chart 1 — Product Interest by Gender


df %>%
  count(Q10_Product_Interest, Q1_Gender) %>%
  ggplot(aes(x = reorder(Q10_Product_Interest, n), y = n, fill = Q1_Gender)) +
  geom_col(position = "dodge", width = 0.65) +
  coord_flip() +
  scale_fill_manual(values = c("Male" = SS_PURPLE, "Female" = SS_GOLD)) +
  labs(title    = "Product Interest by Gender",
       subtitle = "Which StopSnoring product does each gender prefer?",
       x = NULL, y = "Number of Respondents", fill = "Gender") +
  theme(legend.position = "top")

5.2 Chart 2 — Relationship Impact Score by Age Group


df %>%
  ggplot(aes(x = Q2_Age_Group, y = Q8_Relationship_Impact_Score,
             fill = Q2_Age_Group)) +
  geom_boxplot(alpha = 0.75, outlier.colour = "#E74C3C") +
  scale_fill_manual(values = c("#D1C4E9", "#9575CD", SS_MID, SS_PURPLE)) +
  labs(title    = "Relationship Impact Score by Age Group",
       subtitle = "Which age group feels the greatest relationship strain from snoring?",
       x = "Age Group", y = "Relationship Impact Score (1–10)") +
  theme(legend.position = "none")

5.3 Chart 3 — Purchase Likelihood Distribution


mean_pl <- mean(df$Q12_Purchase_Likelihood)

df %>%
  ggplot(aes(x = Q12_Purchase_Likelihood)) +
  geom_histogram(binwidth = 1, fill = SS_PURPLE, colour = "white", alpha = 0.85) +
  geom_vline(xintercept = mean_pl, colour = SS_GOLD,
             linewidth = 1.4, linetype = "dashed") +
  annotate("text", x = mean_pl + 0.6, y = 19,
           label   = paste("Mean =", round(mean_pl, 1)),
           colour  = SS_GOLD, fontface = "bold", size = 4.2) +
  scale_x_continuous(breaks = 1:10) +
  labs(title    = "Distribution of Purchase Likelihood Scores",
       subtitle = "How likely are respondents to purchase a StopSnoring product?",
       x = "Purchase Likelihood (1 = Very Unlikely, 10 = Very Likely)",
       y = "Number of Respondents")

5.4 Chart 4 — Channel Preference by Relationship Status


df %>%
  count(Q14_Preferred_Channel, Q3_Relationship_Status) %>%
  ggplot(aes(x = Q14_Preferred_Channel, y = n, fill = Q3_Relationship_Status)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = percent_format()) +
  scale_fill_manual(values = c(
    "Married"                         = SS_PURPLE,
    "Cohabiting / Living with partner" = SS_MID,
    "Single"                          = SS_GOLD
  )) +
  labs(title    = "Purchase Channel Preference by Relationship Status",
       subtitle = "Does relationship status influence where people prefer to buy?",
       x = "Preferred Channel", y = "Proportion", fill = "Relationship Status") +
  theme(legend.position = "top",
        axis.text.x     = element_text(angle = 12, hjust = 1))

5.5 Chart 5 — Survey Responses Over Time


df %>%
  count(Date_Submitted) %>%
  ggplot(aes(x = Date_Submitted, y = n)) +
  geom_col(fill = SS_PURPLE, alpha = 0.8) +
  geom_smooth(method = "loess", se = FALSE, colour = SS_GOLD, linewidth = 1.3) +
  labs(title    = "Survey Response Volume Over Time",
       subtitle = "Daily response count across the 41-day collection period",
       x = "Date", y = "Number of Responses")

Visualisation Interpretation: Chart 1 reveals a clear gender pattern in product preference — female respondents show stronger interest in the essential oil and mouth tape, while male respondents lean toward the nasal device and oral device, suggesting gender-differentiated marketing is warranted. Chart 2 shows that the 45–54 age group reports the highest median relationship impact scores, reinforcing this demographic as the primary target segment. Chart 3 shows that purchase likelihood scores average 5.8 out of 10, indicating a market that is receptive but not yet fully convinced — a gap that targeted messaging and product trials can close. Chart 4 reveals that married respondents show stronger preference for pharmacy and direct brand channels, while single respondents skew more online, informing a channel-specific distribution strategy. Chart 5 shows that survey responses were distributed consistently across the 41-day collection period with slight peaks in the final two weeks, suggesting growing word-of-mouth interest as the survey circulated through social networks.

6 Technique 3 — Hypothesis Testing

Business Justification: Before committing to a marketing message or product priority, I need statistical evidence — not just descriptive patterns — to confirm that observed differences are real and not due to chance.

6.1 Test 1 — Does snoring frequency predict sleep disruption?

H₀: Sleep disruption scores do not differ between low-frequency (< 4 nights/week) and high-frequency (≥ 4 nights/week) snorers.
H₁: Sleep disruption scores differ between the two groups (two-sided test), with high-frequency snorers expected to score higher.


df <- df %>%
  mutate(snore_group = ifelse(Q6_Nights_Per_Week >= 4,
                              "High (≥4 nights)", "Low (<4 nights)"))

# Normality check
df %>%
  group_by(snore_group) %>%
  summarise(
    W       = round(shapiro.test(Q7_Sleep_Disruption_Score)$statistic, 3),
    p_value = round(shapiro.test(Q7_Sleep_Disruption_Score)$p.value,   4)
  ) %>%
  ss_kable("Shapiro-Wilk Normality Test by Snore Group")

Shapiro-Wilk Normality Test by Snore Group
snore_group	W	p_value
High (≥4 nights)	0.896	0.0000
Low (<4 nights)	0.923	0.0354


wilcox_result <- wilcox.test(Q7_Sleep_Disruption_Score ~ snore_group, data = df)
print(wilcox_result)


    Wilcoxon rank sum test with continuity correction

data:  Q7_Sleep_Disruption_Score by snore_group
W = 1998.5, p-value = 7.994e-05
alternative hypothesis: true location shift is not equal to 0


effect_result <- effectsize::rank_biserial(
  Q7_Sleep_Disruption_Score ~ snore_group, data = df)
print(effect_result)

r (rank biserial) |       95% CI
--------------------------------
0.48              | [0.28, 0.65]
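To make the effect size easier to read, the sketch below shows how a rank-biserial correlation relates to the Wilcoxon W statistic, using the standard definition r = 2W / (n₁n₂) − 1. The two score vectors are made up for illustration (the group sizes in the survey are not reported above), and it is an assumption that `effectsize::rank_biserial()` reports this same quantity for the first group relative to the second.

```r
# Illustrative scores only (not survey data).
high <- c(7, 8, 9, 6, 10, 8)
low  <- c(3, 5, 6, 4)

# W from wilcox.test() is the Mann-Whitney U of the first sample.
w <- unname(wilcox.test(high, low, exact = FALSE)$statistic)

# Rank-biserial correlation: rescale U from [0, n1 * n2] to [-1, 1].
r_rb <- 2 * w / (length(high) * length(low)) - 1

# Equivalent definition: share of favourable pairs minus unfavourable pairs.
diffs   <- outer(high, low, "-")
r_pairs <- mean(diffs > 0) - mean(diffs < 0)

c(W = w, r_rb = r_rb, r_pairs = r_pairs)
```

Read this way, r = 0.48 means that when one high-frequency and one low-frequency respondent are compared at random, the high-frequency respondent has the higher disruption score in roughly 74% of pairs, counting ties as half.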


df %>%
  ggplot(aes(x = snore_group, y = Q7_Sleep_Disruption_Score, fill = snore_group)) +
  geom_boxplot(alpha = 0.75) +
  scale_fill_manual(values = c("High (≥4 nights)" = SS_PURPLE,
                               "Low (<4 nights)"  = SS_GOLD)) +
  labs(title    = "Sleep Disruption Score by Snoring Frequency Group",
       subtitle = "Wilcoxon Rank-Sum Test",
       x = "Snoring Frequency Group", y = "Sleep Disruption Score (1–10)") +
  theme(legend.position = "none")

6.2 Test 2 — Does relationship impact differ by gender?

H₀: Relationship impact scores do not differ significantly between males and females.
H₁: There is a significant gender difference in relationship impact scores.


wilcox_gender <- wilcox.test(Q8_Relationship_Impact_Score ~ Q1_Gender, data = df)
print(wilcox_gender)


    Wilcoxon rank sum test with continuity correction

data:  Q8_Relationship_Impact_Score by Q1_Gender
W = 1711, p-value = 0.6042
alternative hypothesis: true location shift is not equal to 0


effect_gender <- effectsize::rank_biserial(
  Q8_Relationship_Impact_Score ~ Q1_Gender, data = df)
print(effect_gender)

r (rank biserial) |        95% CI
---------------------------------
-0.05             | [-0.26, 0.15]


df %>%
  ggplot(aes(x = Q1_Gender, y = Q8_Relationship_Impact_Score, fill = Q1_Gender)) +
  geom_violin(alpha = 0.6) +
  geom_boxplot(width = 0.14, fill = "white", alpha = 0.85) +
  scale_fill_manual(values = c("Male" = SS_PURPLE, "Female" = SS_GOLD)) +
  labs(title    = "Relationship Impact Score by Gender",
       subtitle = "Violin + Boxplot — Wilcoxon Rank-Sum Test",
       x = "Gender", y = "Relationship Impact Score (1–10)") +
  theme(legend.position = "none")

6.3 Test 3 — Does satisfaction differ across products? (Kruskal-Wallis)

H₀: Satisfaction scores have the same distribution across the four product groups (Q10_Product_Interest).
H₁: At least one product has a significantly different satisfaction score.


tried_df <- df %>%
  filter(Q9_Tried_Remedy_Before == "Yes",
         Q10_Product_Interest   != "None yet — I am new to this",
         !is.na(Q11_Satisfaction_Score))

kruskal_result <- kruskal.test(Q11_Satisfaction_Score ~ Q10_Product_Interest,
                               data = tried_df)
print(kruskal_result)


    Kruskal-Wallis rank sum test

data:  Q11_Satisfaction_Score by Q10_Product_Interest
Kruskal-Wallis chi-squared = 6.1524, df = 3, p-value = 0.1044


if (kruskal_result$p.value < 0.05) {
  tried_df %>%
    dunn_test(Q11_Satisfaction_Score ~ Q10_Product_Interest,
              p.adjust.method = "bonferroni") %>%
    ss_kable("Post-hoc Dunn Test (Bonferroni Adjusted)")
}

tried_df %>%
  ggplot(aes(x = reorder(Q10_Product_Interest, Q11_Satisfaction_Score, median),
             y = Q11_Satisfaction_Score, fill = Q10_Product_Interest)) +
  geom_boxplot(alpha = 0.75) +
  coord_flip() +
  scale_fill_manual(values = c(SS_PURPLE, SS_GOLD, SS_MID, "#A8D8EA")) +
  labs(title    = "Satisfaction Score by Product Type",
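       # Report the actual number of rows left after filtering rather than a fixed 35
       subtitle = paste0("Respondents who have used a remedy (n = ", nrow(tried_df), ")"),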
       x = NULL, y = "Satisfaction Score (1–10)") +
  theme(legend.position = "none")

6.4 Test 4 — Gender & preferred channel (Chi-Squared)

H₀: Gender and preferred purchase channel are independent.
H₁: Gender and preferred purchase channel are associated.


channel_table <- table(df$Q1_Gender, df$Q14_Preferred_Channel)
chisq_result  <- chisq.test(channel_table)
print(chisq_result)


    Pearson's Chi-squared test

data:  channel_table
X-squared = 4.3731, df = 3, p-value = 0.2239


as.data.frame.matrix(channel_table) %>%
  rownames_to_column("Gender") %>%
  ss_kable("Observed Counts: Gender × Preferred Channel")

Observed Counts: Gender × Preferred Channel
Gender	Directly from the brand	Online (Instagram, website)	Pharmacy	Supermarket / mall
Female	18	21	21	11
Male	7	22	17	5
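The chi-squared result can be rebuilt by hand from the observed counts, which also confirms that the expected-count assumption holds and gives an effect size. This is a sketch in base R; the shortened channel labels and the object names are introduced here for illustration.

```r
# Observed counts copied from the table above (channel names abbreviated).
observed <- matrix(
  c(18, 21, 21, 11,
     7, 22, 17,  5),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Gender  = c("Female", "Male"),
    Channel = c("Direct", "Online", "Pharmacy", "Supermarket")
  )
)

# Expected counts under independence: row total * column total / N.
n        <- sum(observed)
expected <- outer(rowSums(observed), colSums(observed)) / n

# Pearson chi-squared statistic and Cramer's V as an effect size.
chi_sq    <- sum((observed - expected)^2 / expected)
cramers_v <- sqrt(chi_sq / (n * (min(dim(observed)) - 1)))

round(expected, 1)
c(chi_sq = chi_sq, cramers_v = cramers_v, min_expected = min(expected))
```

Every expected count is above 5, so the chi-squared approximation is reasonable here, and a Cramér's V of about 0.19 indicates at most a weak association between gender and channel.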

Hypothesis Testing Interpretation: Four hypothesis tests were conducted to provide statistical evidence for key business decisions. Test 1 (Wilcoxon rank-sum): High-frequency snorers (≥ 4 nights/week) reported significantly higher sleep disruption scores than low-frequency snorers (W = 1998.5, p < 0.001), with a moderate-to-large effect (rank-biserial r = 0.48, 95% CI [0.28, 0.65]). Snoring frequency is therefore a meaningful proxy for pain severity; whether frequent snorers are also more motivated buyers is a separate question, addressed by the regression in Section 8. Test 2 (Wilcoxon rank-sum): No statistically significant gender difference was found in relationship impact scores (p = 0.60, r = −0.05), so there is no evidence that relationship-centred messaging needs gender-specific variants. Test 3 (Kruskal-Wallis): Satisfaction scores did not differ significantly across the four product groups (χ² = 6.15, df = 3, p = 0.10). With at most 35 remedy users in the test, this is an absence of evidence for a difference rather than proof that the products satisfy equally. Test 4 (chi-squared): No significant association was found between gender and preferred purchase channel (χ² = 4.37, df = 3, p = 0.22), so the data give no reason to segment channel strategy by gender.
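Because four tests are run on the same sample, it is worth checking whether the conclusions survive a multiple-comparison adjustment. The sketch below applies Holm's method to the four p-values reported in the outputs above; the choice of Holm is an assumption made for illustration and is not part of the original analysis.

```r
# p-values as reported in the four test outputs above.
p_raw <- c(
  test1_frequency_disruption = 7.994e-05,
  test2_gender_impact        = 0.6042,
  test3_product_satisfaction = 0.1044,
  test4_gender_channel       = 0.2239
)

# Holm's step-down adjustment controls the family-wise error rate.
p_holm <- p.adjust(p_raw, method = "holm")

data.frame(p_raw, p_holm = signif(p_holm, 3), significant = p_holm < 0.05)
```

The adjustment leaves the pattern unchanged: Test 1 remains significant and the other three do not.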

7 Technique 4 — Correlation Analysis

Business Justification: Correlation tells me how strongly numeric variables move together. If relationship impact is the strongest correlate of purchase likelihood, I know to lead with emotional rather than clinical messaging in all brand communications.

7.1 Pearson Correlation Matrix


num_df <- df %>%
  select(Q6_Nights_Per_Week, Q7_Sleep_Disruption_Score,
         Q8_Relationship_Impact_Score, Q12_Purchase_Likelihood) %>%
  rename(
    "Nights/Week"         = Q6_Nights_Per_Week,
    "Sleep Disruption"    = Q7_Sleep_Disruption_Score,
    "Relationship Impact" = Q8_Relationship_Impact_Score,
    "Purchase Likelihood" = Q12_Purchase_Likelihood
  )

cor_matrix <- cor(num_df, method = "pearson", use = "complete.obs")

ggcorrplot(cor_matrix,
           method   = "circle", type = "lower",
           lab      = TRUE, lab_size = 4.5,
           colors   = c(SS_GOLD, "white", SS_PURPLE),
           title    = "Pearson Correlation Matrix — Numeric Variables",
           ggtheme  = stopsnoring_theme())

7.2 Spearman Correlation (Robustness Check)


cor_spearman <- cor(num_df, method = "spearman", use = "complete.obs")

ggcorrplot(cor_spearman,
           method   = "circle", type = "lower",
           lab      = TRUE, lab_size = 4.5,
           colors   = c(SS_GOLD, "white", SS_PURPLE),
           title    = "Spearman Correlation Matrix — Robustness Check",
           ggtheme  = stopsnoring_theme())

7.3 Pairwise Correlation Significance Tests


list(
  c("Q8_Relationship_Impact_Score", "Q12_Purchase_Likelihood"),
  c("Q7_Sleep_Disruption_Score",    "Q12_Purchase_Likelihood"),
  c("Q6_Nights_Per_Week",           "Q7_Sleep_Disruption_Score"),
  c("Q6_Nights_Per_Week",           "Q8_Relationship_Impact_Score")
) %>%
  map_dfr(function(pair) {
    test <- cor.test(df[[pair[1]]], df[[pair[2]]], method = "pearson")
    tibble(
      Variable_1  = pair[1],
      Variable_2  = pair[2],
      r           = round(test$estimate, 3),
      p_value     = round(test$p.value,  4),
      Significant = ifelse(test$p.value < 0.05, "Yes ✅", "No ❌")
    )
  }) %>%
  ss_kable("Pairwise Pearson Correlation Significance Tests")

Pairwise Pearson Correlation Significance Tests
Variable_1	Variable_2	r	Significant
Q8_Relationship_Impact_Score	Q12_Purchase_Likelihood	0.900	Yes ✅
Q7_Sleep_Disruption_Score	Q12_Purchase_Likelihood	0.542	Yes ✅
Q6_Nights_Per_Week	Q7_Sleep_Disruption_Score	0.384	Yes ✅
Q6_Nights_Per_Week	Q8_Relationship_Impact_Score	0.443	Yes ✅

7.4 Scatter — Relationship Impact vs Purchase Likelihood


df %>%
  ggplot(aes(x = Q8_Relationship_Impact_Score, y = Q12_Purchase_Likelihood)) +
  geom_jitter(colour = SS_PURPLE, alpha = 0.5, width = 0.2, height = 0.2, size = 2) +
  geom_smooth(method = "lm", se = TRUE, colour = SS_GOLD, linewidth = 1.4,
              fill = paste0(SS_GOLD, "33")) +
  scale_x_continuous(breaks = 1:10) +
  scale_y_continuous(breaks = 1:10) +
  labs(title    = "Relationship Impact Score vs Purchase Likelihood",
       subtitle = "Linear trend with 95% confidence band",
       x = "Relationship Impact Score (1–10)",
       y = "Purchase Likelihood (1–10)")

Correlation Interpretation: The Pearson correlation matrix shows that relationship impact score is the strongest correlate of purchase likelihood (r = 0.90, p < 0.001): respondents who experience greater relationship strain from snoring report a much higher likelihood of purchasing a solution. Sleep disruption score also correlates positively with purchase likelihood (r = 0.54), though more weakly, suggesting that physical discomfort is a secondary motivator compared with relationship preservation. Snoring frequency correlates moderately with both sleep disruption (r = 0.38) and relationship impact (r = 0.44), consistent with more frequent snoring producing more severe downstream effects. All four pairwise correlations are significant at the 5% level. The Spearman results are consistent with Pearson across all pairs, which indicates that these relationships do not depend on distributional assumptions. The key business implication is that StopSnoring Brands should anchor its marketing strategy in the relationship narrative (“Save Your Sleep. Save Your Relationship.”) rather than leading with clinical or health-focused messaging.
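The size of the headline correlation can be put in context with two short calculations: the share of variance it implies, and the t statistic that `cor.test()` uses to test it. The sketch takes r = 0.900 from the pairwise table above and n = 122 from the dataset overview; the object names are illustrative.

```r
# Values reported above: Pearson r for relationship impact vs purchase
# likelihood, and the number of survey responses.
r <- 0.900
n <- 122

# Share of variance in purchase likelihood explained by a straight line.
r_squared <- r^2

# t statistic used by cor.test() for H0: rho = 0, with n - 2 degrees of freedom.
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

c(r_squared = r_squared, t = t_stat, p = p_value)
```

Up to rounding of r, these figures agree with the simple regression in Section 8.1, which reports R² = 0.8093 and t = 22.565 for the same pair of variables.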

8 Technique 5 — Linear Regression

Business Justification: Regression models purchase likelihood as a function of multiple predictors simultaneously, revealing which factors independently drive intent to buy after controlling for all others — giving me a ranked priority list for the launch strategy.

8.1 Model 1 — Simple Linear Regression


model_simple <- lm(Q12_Purchase_Likelihood ~ Q8_Relationship_Impact_Score, data = df)
summary(model_simple)


Call:
lm(formula = Q12_Purchase_Likelihood ~ Q8_Relationship_Impact_Score, 
    data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.21913 -0.97893  0.02107  0.86094  2.10113 

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)    
(Intercept)                   0.53938    0.25660   2.102   0.0376 *  
Q8_Relationship_Impact_Score  0.91994    0.04077  22.565   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.189 on 120 degrees of freedom
Multiple R-squared:  0.8093,    Adjusted R-squared:  0.8077 
F-statistic: 509.2 on 1 and 120 DF,  p-value: < 2.2e-16


tidy(model_simple) %>%
  mutate(across(where(is.numeric), ~round(., 3))) %>%
  ss_kable("Simple Regression: Purchase Likelihood ~ Relationship Impact")

Simple Regression: Purchase Likelihood ~ Relationship Impact
term	estimate	std.error	statistic	p.value
(Intercept)	0.539	0.257	2.102	0.038
Q8_Relationship_Impact_Score	0.920	0.041	22.565	0.000
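To show what the fitted line means in practice, the sketch below plugs three relationship impact scores into the simple model using the coefficients from the output above. The chosen scores (1, 5 and 10) and the object names are illustrative.

```r
# Coefficients copied from the simple regression output above.
intercept <- 0.53938
slope     <- 0.91994

# Fitted purchase likelihood at low, middle and high relationship impact.
impact    <- c(1, 5, 10)
predicted <- intercept + slope * impact

data.frame(impact, predicted = round(predicted, 2))
```

Each additional point of relationship impact corresponds to about 0.92 points of purchase likelihood. With a residual standard error of 1.189, individual respondents can still sit a point or two away from the fitted value.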

8.2 Model 2 — Multiple Linear Regression

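# Dummy-code the categorical predictors (1 = named category, 0 = any other answer)
df_reg <- df %>%
  mutate(
    Gender_Female = ifelse(Q1_Gender == "Female", 1, 0),            # 1 = female respondent
    Tried_Remedy  = ifelse(Q9_Tried_Remedy_Before == "Yes", 1, 0),  # 1 = has tried a remedy
    State_Lagos   = ifelse(Q4_State == "Lagos", 1, 0)               # 1 = lives in Lagos
  )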

model_full <- lm(Q12_Purchase_Likelihood ~
                   Q8_Relationship_Impact_Score +
                   Q7_Sleep_Disruption_Score +
                   Q6_Nights_Per_Week +
                   Gender_Female +
                   Tried_Remedy +
                   State_Lagos,
                 data = df_reg)
summary(model_full)


Call:
lm(formula = Q12_Purchase_Likelihood ~ Q8_Relationship_Impact_Score + 
    Q7_Sleep_Disruption_Score + Q6_Nights_Per_Week + Gender_Female + 
    Tried_Remedy + State_Lagos, data = df_reg)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.21895 -0.88798 -0.03859  0.90116  2.22071 

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)    
(Intercept)                   0.90296    0.38417   2.350   0.0205 *  
Q8_Relationship_Impact_Score  0.97041    0.05931  16.363   <2e-16 ***
Q7_Sleep_Disruption_Score    -0.10105    0.05949  -1.699   0.0921 .  
Q6_Nights_Per_Week            0.02210    0.07311   0.302   0.7630    
Gender_Female                -0.04774    0.22852  -0.209   0.8349    
Tried_Remedy                 -0.17670    0.26462  -0.668   0.5056    
State_Lagos                  -0.03861    0.24799  -0.156   0.8765    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.196 on 115 degrees of freedom
Multiple R-squared:  0.8151,    Adjusted R-squared:  0.8055 
F-statistic: 84.51 on 6 and 115 DF,  p-value: < 2.2e-16

tidy(model_full) %>%
  mutate(
    across(where(is.numeric), ~round(., 3)),
    Significant = ifelse(p.value < 0.05, "Yes ✅", "No ❌")
  ) %>%
  ss_kable("Multiple Regression: Purchase Likelihood ~ All Predictors")

Multiple Regression: Purchase Likelihood ~ All Predictors
term	estimate	std.error	statistic	p.value	Significant
(Intercept)	0.903	0.384	2.350	0.020	Yes ✅
Q8_Relationship_Impact_Score	0.970	0.059	16.363	0.000	Yes ✅
Q7_Sleep_Disruption_Score	-0.101	0.059	-1.699	0.092	No ❌
Q6_Nights_Per_Week	0.022	0.073	0.302	0.763	No ❌
Gender_Female	-0.048	0.229	-0.209	0.835	No ❌
Tried_Remedy	-0.177	0.265	-0.668	0.506	No ❌
State_Lagos	-0.039	0.248	-0.156	0.877	No ❌
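
One way to judge whether the five predictors added in Model 2 improve on Model 1 as a group is a partial F test on the change in R². The sketch below works only from the rounded R² values printed above, so the result is approximate; the variable names are introduced here for illustration. It relies on both models being fitted to the same 122 observations, which the residual degrees of freedom (120 and 115) are consistent with.

```r
# Values copied by hand from the two summary outputs above
r2_simple        <- 0.8093  # Model 1: relationship impact only
r2_full          <- 0.8151  # Model 2: six predictors
extra_predictors <- 5       # predictors added in Model 2
resid_df_full    <- 115     # residual degrees of freedom of Model 2

# Partial F statistic for the five added predictors taken together
f_stat <- ((r2_full - r2_simple) / extra_predictors) /
  ((1 - r2_full) / resid_df_full)

# Upper-tail p-value from the F distribution
p_value <- pf(f_stat, df1 = extra_predictors, df2 = resid_df_full,
              lower.tail = FALSE)

round(c(F = f_stat, p = p_value), 3)
```

With the fitted objects available, `anova(model_simple, model_full)` should give the exact version of this comparison.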

8.3 Model Diagnostics

par(mfrow = c(2, 2))
plot(model_full, col = SS_PURPLE)

par(mfrow = c(1, 1))

# VIF
vif(model_full) %>%
  enframe(name = "Variable", value = "VIF") %>%
  mutate(VIF = round(VIF, 2),
         Concern = ifelse(VIF > 5, "High ⚠️", "OK ✅")) %>%
  ss_kable("Variance Inflation Factors (VIF) — Multicollinearity Check")

Variance Inflation Factors (VIF) — Multicollinearity Check
Variable	VIF	Concern
Q8_Relationship_Impact_Score	2.09	OK ✅
Q7_Sleep_Disruption_Score	1.95	OK ✅
Q6_Nights_Per_Week	1.64	OK ✅
Gender_Female	1.08	OK ✅
Tried_Remedy	1.22	OK ✅
State_Lagos	1.14	OK ✅
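
To make the VIF values easier to read, each one can be converted into the share of a predictor's variance that is explained by the other predictors, using the identity VIF = 1 / (1 - R²). The sketch below re-enters the table values by hand; the object names are illustrative and not part of the original analysis.

```r
# VIF values copied by hand from the table above
vif_values <- c(
  Q8_Relationship_Impact_Score = 2.09,
  Q7_Sleep_Disruption_Score    = 1.95,
  Q6_Nights_Per_Week           = 1.64,
  Gender_Female                = 1.08,
  Tried_Remedy                 = 1.22,
  State_Lagos                  = 1.14
)

# VIF_j = 1 / (1 - R_j^2), so R_j^2 = 1 - 1 / VIF_j, where R_j^2 is the share
# of predictor j's variance explained by the other predictors
shared_variance <- 1 - 1 / vif_values

# sqrt(VIF) is the factor by which the coefficient's standard error is inflated
se_inflation <- sqrt(vif_values)

round(cbind(VIF = vif_values, R2_other = shared_variance,
            SE_inflation = se_inflation), 2)
```

On this reading, roughly half of the variance in relationship impact score is shared with the other predictors, which is well inside the VIF < 5 threshold used in the table.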

# Breusch-Pagan
bptest(model_full)


    studentized Breusch-Pagan test

data:  model_full
BP = 9.6523, df = 6, p-value = 0.1401
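
The Breusch-Pagan p-value is the upper-tail probability of the test statistic under a chi-squared distribution, with degrees of freedom equal to the number of predictors. The short sketch below reproduces it from the printed statistic; the values are entered by hand and the object names are illustrative.

```r
# Values copied by hand from the bptest() output above
bp_stat <- 9.6523
bp_df   <- 6   # one degree of freedom per predictor in model_full

# Upper-tail chi-squared probability
bp_p <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

# At the 5% level, constant error variance is rejected only when p < 0.05
reject_constant_variance <- bp_p < 0.05

round(bp_p, 4)
reject_constant_variance
```

Because the p-value exceeds 0.05, the test gives no evidence against constant error variance in Model 2.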

Regression Interpretation: The simple regression model (Model 1) shows that relationship impact score alone explains approximately 81% of the variance in purchase likelihood (R² = 0.809), with an unstandardised slope of 0.92 (p < 0.001): each 1-point increase in relationship impact score is associated with an increase of roughly 0.9 points in purchase likelihood on the 10-point scale. The multiple regression model (Model 2) enters six predictors simultaneously. Relationship impact score remains the only statistically significant predictor (b = 0.97, p < 0.001) after controlling for all other variables. Sleep disruption score has a small negative coefficient that does not reach significance at the 5% level (b = -0.10, p = 0.092), so once relationship impact is accounted for, sleep disruption adds no reliable independent effect. Gender, prior remedy use, snoring frequency, and Lagos residency are likewise not significant (all p > 0.50), which suggests that intent to buy cuts across demographic boundaries. Adding the five extra predictors barely changes the fit: R² rises from 0.809 to 0.815, while adjusted R² falls slightly from 0.808 to 0.806. This level of fit is high for consumer survey data, although individual psychology, income, and brand familiarity remain unmeasured. VIF scores for all predictors fall below 5 (maximum 2.09), indicating no multicollinearity concern, and the Breusch-Pagan test gives no evidence of heteroskedasticity (BP = 9.65, df = 6, p = 0.14). The strategic implication is clear: the single most powerful lever to drive purchase intent is addressing the relationship impact of snoring, and all brand communications should lead with this message.

9 Integrated Findings

Across five analytical lenses applied to 122 Nigerian respondents, a consistent and actionable picture emerges for StopSnoring Brands.

Who the customer is: EDA and visualisation confirm that the core StopSnoring customer is a married adult aged 35–54, most commonly female, based in Lagos, with snoring occurring on average 4.8 nights per week. This segment reports the highest sleep disruption (mean 6.47/10) and relationship impact scores (mean 5.71/10) — the most motivated and highest-intent buyer profile.

Snoring is primarily a relationship problem. Hypothesis testing establishes that high-frequency snorers report significantly higher sleep disruption scores (Wilcoxon, p < 0.05), and the simple regression shows that relationship impact score is the strongest correlate of purchase likelihood (R² ≈ 0.81, equivalent to r ≈ 0.90, p < 0.001). This supports the brand’s core positioning: snoring is not merely a health inconvenience but a direct threat to relationship quality.

What drives purchase intent: Multiple linear regression identifies relationship impact score as the only statistically significant predictor of purchase likelihood (b ≈ 0.97, p < 0.001), even after controlling for sleep disruption, snoring frequency, gender, prior remedy use, and location. Sleep disruption does not reach significance once relationship impact is controlled for (b ≈ -0.10, p = 0.092). Snoring frequency, gender, location, and prior remedy use are also not significant, meaning the product’s appeal is broad and cross-demographic.

Product and channel priorities: The essential oil (n=33) and nasal device (n=34) attract the most interest. Online/Instagram is the dominant preferred channel (n=43), with pharmacy a close second (n=38). This supports a dual-channel launch strategy with Instagram for awareness and pharmacy for credibility.

The untapped opportunity: 71.3% of respondents have never tried any snoring remedy. This is StopSnoring Brands’ defining market advantage — an enormous first-mover opportunity in a market that is aware of the problem but has not yet adopted a solution.

9.1 Strategic Recommendations

1. Lead with relationship messaging, not clinical messaging. Every marketing asset should foreground the relationship benefit. The data show that relationship impact, not health concern, is what moves people to buy. Lead with couple-centric storytelling, testimonials from partners, and the emotional cost of poor sleep on relationships.

2. Prioritise the 35–54 married female segment. This segment reports the highest pain scores and purchase intent. All paid social campaigns on Instagram should be designed for this persona — language, imagery, and product positioning should speak directly to the experience of sharing a bed with a snorer.

3. Launch with essential oil and nasal device as hero products. These two products attract the most interest and should lead the pharmacy pitch deck, starter bundle, and launch promotions. The oral device and mouth tape can be positioned as complementary upsell products for repeat customers.

4. Pursue a dual-channel strategy: online-first, pharmacy-second. Instagram and the brand website should drive awareness and direct sales, while pharmacy placement builds credibility and enables in-person discovery. Both channels are necessary — neither alone is sufficient.

5. Invest in awareness-stage content before conversion content. 71.3% of respondents have never tried any remedy. They need to be educated before they can be converted. Explainer videos, sleep quality stories, and before/after couple testimonials will build the funnel before promotional posts can close it.

6. Pilot in Lagos before national expansion. 72% of survey respondents are Lagos-based. A focused Lagos launch reduces distribution risk, enables rapid feedback collection, and builds proof of concept before scaling to Abuja and other cities.

10 Limitations & Further Work

Sampling bias: Convenience sampling means digitally-engaged respondents are over-represented. Older, less tech-savvy adults — who may experience the most severe snoring — are likely under-sampled.
Subgroup power: With only 35 remedy users, the satisfaction analysis has limited statistical power. A larger sample of product users would enable more robust product-level comparisons.
Self-report bias: All measures — snoring frequency, sleep disruption, relationship impact — are self-reported and subject to recall and social desirability bias.
Unexplained variance: With R² ≈ 0.81, roughly 19% of the variance in purchase likelihood remains unexplained. Future models could include income level, brand awareness, and price sensitivity.
Further work: Logistic regression classifying respondents as “likely buyers” (score ≥ 7) vs “unlikely” (< 7) would sharpen segmentation. A post-launch study tracking actual conversion rates against predicted intent scores would validate this model.
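
A minimal sketch of the proposed logistic regression is shown below. It runs on simulated data, not the survey responses: the column names `impact` and `likely_buyer`, the simulation parameters, and the single-predictor specification are all assumptions made for illustration. Only the "score ≥ 7" threshold comes from the text above.

```r
set.seed(42)  # simulated data only; NOT the survey responses

n <- 122
# Hypothetical 1-10 relationship impact scores
impact <- sample(1:10, n, replace = TRUE)
# Hypothetical purchase likelihood loosely tied to impact, clipped to 1-10
likelihood <- pmin(10, pmax(1, round(1 + 0.8 * impact + rnorm(n, sd = 2))))

sim <- data.frame(
  impact       = impact,
  likely_buyer = as.integer(likelihood >= 7)  # threshold proposed above
)

# Logistic regression: log-odds of being a likely buyer as a function of impact
fit <- glm(likely_buyer ~ impact, data = sim, family = binomial)

# Odds ratio per 1-point increase in impact
odds_ratio <- exp(coef(fit)[["impact"]])

# Predicted probability of being a likely buyer at low and high impact
predict(fit, newdata = data.frame(impact = c(3, 8)), type = "response")
```

On the real data, the same call could take the six predictors used in Model 2, with the binary outcome derived from Q12_Purchase_Likelihood.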

11 References

Adi, B. (2026). Data Analytics II: Capstone case study assessment brief. Lagos Business School.

Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01

Fox, J., & Weisberg, S. (2019). An R companion to applied regression (3rd ed.). Sage Publications.

Hair, J. F., Black, W. C., Babin, B. J., & Anderson, R. E. (2019). Multivariate data analysis (8th ed.). Cengage Learning.

Kassambara, A. (2023). rstatix: Pipe-friendly framework for basic statistical tests (R package version 0.7.2). https://CRAN.R-project.org/package=rstatix

Lumley, T. (2004). The lmtest package. Journal of Statistical Software, 9(1). https://doi.org/10.18637/jss.v009.i01

Makowski, D., Ben-Shachar, M. S., Patil, I., & Lüdecke, D. (2022). Estimation of model-based indices of effect size for the description of standardized distances between groups. Meta-Psychology, 7. https://doi.org/10.15626/MP.2019.2373

R Core Team. (2024). R: A language and environment for statistical computing (version 4.5.2). R Foundation for Statistical Computing. https://www.R-project.org/

Revelle, W. (2024). psych: Procedures for psychological, psychometric, and personality research (R package version 2.4.3). Northwestern University. https://CRAN.R-project.org/package=psych

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Wickham, H., & Bryan, J. (2023). readxl: Read Excel files (R package version 1.4.3). https://CRAN.R-project.org/package=readxl

Zeileis, A., & Hothorn, T. (2002). Diagnostic checking in regression relationships. R News, 2(3), 7–10. https://CRAN.R-project.org/doc/Rnews/

12 Appendix: AI Usage Statement

Artificial intelligence tools (Claude, Anthropic) were used to assist with the initial structuring of this Quarto document and to suggest appropriate R packages for each analytical technique. All analytical decisions — including the choice of techniques, interpretation of outputs, and business recommendations — were made independently by the author based on knowledge acquired during the Data Analytics II module. All code was reviewed, understood, and verified by the author prior to submission. The data was collected, owned, and managed entirely by the author through StopSnoring Brands.