Customer Satisfaction Analytics at Mobility Service Centres: An Exploratory and Inferential Study

Author

Oluwafunmilola Esan 2025-MMBA-8-028

Published

May 11, 2026

1. Executive Summary

This study applies exploratory and inferential analytics to 108 customer survey responses collected across four Mobility service centre branches in Lagos — Lagos Island, Ikeja, Lekki, and Ikoyi — between 6 and 11 May 2026. The central business problem is understanding what drives customer satisfaction at Mobility centres and whether satisfaction levels differ meaningfully across branches, service types, and booking methods. Data were collected via a structured Google Form distributed to customers post-service and via QR code displayed at the point of sale.

Key findings reveal that overall satisfaction is high (mean rating: 4.49/5), with variation associated with resolution status, staff technical competence, and service duration. ANOVA testing indicates branch differences are not statistically significant (F = 0.761, p = 0.518), suggesting operational consistency across locations, though Lagos Island records the highest mean (4.60) and Ikeja the lowest (4.30). The Welch t-test shows no significant difference between walk-in and pre-booked customers (p = 0.170), with a small effect size (Cohen’s d = 0.263). Spearman correlation analysis identifies staff competence (ρ = 0.586) and resolution status (ρ = 0.566) as the service attributes most strongly associated with overall experience. The linear regression model (R² = 0.641, Adjusted R² = 0.592) identifies fault resolution (β = 0.529, p < 0.001) and staff competence (β = 0.404, p < 0.001) as the two most statistically significant and operationally actionable drivers of satisfaction. The recommendation is that Mobility prioritise first-visit fault resolution and invest in technical staff development, particularly at Ikeja, which records the lowest resolution rate (63%), to protect customer retention and referral revenue.


2. Professional Disclosure

Job Title: Finance Planning and Budget Manager

Organisation: Mobility (Automotive Service Centres)

Sector: Automotive after-sales services

As Finance Planning and Budget Manager, I am directly responsible for annual budget planning, branch-level cost tracking, and financial performance reviews across Mobility’s service network. Customer satisfaction data is central to my work for the following reasons:

Relevance of Exploratory Data Analysis (EDA): Before allocating budget resources across branches, I must understand the baseline distribution of customer experience metrics. EDA reveals where performance is concentrated, where outliers exist, and whether data quality issues might distort resource-allocation decisions. A branch receiving a disproportionate share of complaints, for instance, would require a budget review for staffing or equipment.

Relevance of Data Visualisation: Financial presentations to leadership require clear visualisation of customer experience trends alongside financial metrics. Charts showing satisfaction by branch and service type allow me to link revenue performance to operational quality — a key input in the annual planning cycle.

Relevance of Hypothesis Testing: Budget decisions across branches must be evidence-based. Hypothesis testing allows me to determine whether observed satisfaction differences between branches are statistically significant or merely due to sampling variation — critical before recommending differential investment across locations.

Relevance of Correlation Analysis: Understanding which service attributes (resolution, competence, wait time) most strongly correlate with overall satisfaction helps prioritise where budget should be directed — e.g., staff training versus infrastructure versus booking systems.

Relevance of Linear Regression: Regression modelling allows me to quantify how much each service variable contributes to overall satisfaction, providing a data-backed justification for investment decisions presented to the executive committee. For example, if resolution status is the strongest predictor, the budget case for diagnostic tools and technician training becomes financially defensible.


3. Data Collection and Sampling

Source: Primary data collected by the researcher from Mobility service centre customers across four Lagos branches.

Collection Method: A structured Google Form was administered via two channels: (1) a link distributed to customers via SMS/WhatsApp after service completion, and (2) a QR code displayed at the point of sale in each branch, allowing customers to self-administer the survey on their mobile devices at or immediately after the point of service.

Survey Instrument: The form captured 11 variables covering branch identity, service type, booking method, service duration, fault resolution status, staff competence, technician continuity, cost perception, return intention, recommendation likelihood, and an overall experience rating (1–5 Likert scale).

Code
library(qrcode)

# Survey link (Google Form) encoded in the QR code
form_url <- "https://forms.gle/3Ht9PfmES48QKbsi8"

# Generate and plot the QR code
qr <- qr_code(form_url)
plot(qr)

Sampling Frame: All customers who received a vehicle service at one of the four Mobility branches (Lagos Island, Ikeja, Lekki, Ikoyi) during the collection period.

Sample Size: 108 completed responses.

Time Period: 6 May 2026 to 11 May 2026.

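Statistical Rationale: While a larger sample would improve precision, 108 observations are sufficient for the analytical techniques applied: EDA, ANOVA (rule of thumb: about 20 per group), correlation, and regression. Branch samples range from 17 (Ikoyi) to 40 (Lagos Island) responses. Three of the four branches meet the 20-per-group guideline; Ikoyi falls slightly below it, so comparisons involving Ikoyi should be read with more caution.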
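The group-size guideline can be checked directly against the branch counts reported in Table 5. The sketch below is illustrative only: the counts are typed in from that table rather than read from the survey file, and the threshold of 20 is the rule of thumb quoted in the rationale, not a formal requirement.

```r
# Branch response counts as reported in Table 5
branch_n <- c("Lagos Island" = 40, "Ikeja" = 27, "Lekki" = 24, "Ikoyi" = 17)

# Rule-of-thumb minimum per group (a guideline, not a hard rule)
min_per_group <- 20

# TRUE where a branch meets the guideline
meets_guideline <- branch_n >= min_per_group
print(meets_guideline)

# Names of branches that fall below the guideline
below <- names(branch_n)[!meets_guideline]
print(below)
```

On the survey data itself, the same check could be run on `table(df$branch)` instead of the hard-coded counts.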

Ethical Notes: No personally identifiable information (PII) was collected. The survey was entirely anonymous. Participation was voluntary. Responses are used solely for academic and internal analytical purposes. No external data-sharing restrictions apply.

Data Citation: Esan, O. (2026). Mobility Service Centre Customer Satisfaction Survey [Dataset]. Collected from Mobility Lagos branches, Lagos, Nigeria. Data available on request from the author.


4. Data Description

Code
# Load required libraries
library(tidyverse)
library(lubridate)
library(readxl)
library(skimr)
library(janitor)
library(knitr)
library(kableExtra)

# Load data
df_raw <- read_excel("Mobility form Responses 1.xlsx")

# Clean column names and convert timestamp to WAT (UTC+1)
df <- df_raw |>
  clean_names() |>
  mutate(timestamp = with_tz(as.POSIXct(timestamp, tz = "UTC"), tzone = "Africa/Lagos")) |>
  rename(
    timestamp                 = timestamp,
    branch                    = which_mobility_branch_did_you_visit,
    visit_purpose             = what_was_the_primary_purpose_of_your_visit,
    booking_method            = how_did_you_arrange_your_visit,
    service_duration          = how_long_did_your_vehicle_spend_at_the_service_centre,
    resolution_status         = was_the_fault_or_issue_with_your_vehicle_fully_resolved_after_the_service,
    staff_competence          = how_would_you_rate_the_technical_competence_of_the_staff_who_handled_your_vehicle,
    technician_continuity     = were_you_attended_to_by_the_same_technician_or_service_advisor_as_your_previous_visit_s,
    cost_perception           = how_would_you_describe_the_cost_of_the_service_relative_to_your_expectations,
    return_likelihood         = how_likely_are_you_to_return_to_our_mobility_centres_for_your_next_service,
    recommendation_likelihood = how_likely_are_you_to_recommend_our_mobility_centres_to_others,
    overall_rating            = how_would_you_rate_your_overall_experience_at_mobility_centres_1_very_poor_5_excellent
  )
Code
# Dataset dimensions
tibble(
  Metric = c("Total Observations", "Total Variables", "Collection Start", "Collection End"),
  Value  = c(
    as.character(nrow(df)),
    as.character(ncol(df)),
    format(min(df$timestamp), "%d %B %Y"),
    format(max(df$timestamp), "%d %B %Y")
  )
) |>
  kable(caption = "Table 1: Dataset Overview") |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Table 1: Dataset Overview
Metric              Value
Total Observations  108
Total Variables     12
Collection Start    06 May 2026
Collection End      11 May 2026
Code
# Missing value check — rendered as a clean table
data.frame(
  Variable     = names(df),
  Type         = sapply(df, function(x) class(x)[1]),
  Missing      = colSums(is.na(df)),
  Missing_Pct  = paste0(round(colSums(is.na(df)) / nrow(df) * 100, 1), "%")
) |>
  kable(
    caption   = "Table 2: Variable Types and Missing Value Check",
    col.names = c("Variable", "R Type", "Missing (n)", "Missing (%)"),
    row.names = FALSE
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Table 2: Variable Types and Missing Value Check
Variable                   R Type     Missing (n)  Missing (%)
timestamp                  POSIXct    0            0%
branch                     character  0            0%
visit_purpose              character  0            0%
booking_method             character  0            0%
service_duration           character  0            0%
resolution_status          character  0            0%
staff_competence           character  0            0%
technician_continuity      character  0            0%
cost_perception            character  0            0%
return_likelihood          character  0            0%
recommendation_likelihood  character  0            0%
overall_rating             numeric    0            0%
Code
# Overall rating distribution — clean table
df |>
  count(overall_rating) |>
  mutate(
    Percent      = paste0(round(n / sum(n) * 100, 1), "%"),
    Cumulative   = paste0(round(cumsum(n) / sum(n) * 100, 1), "%")
  ) |>
  kable(
    caption   = "Table 3: Distribution of Overall Experience Rating (1–5)",
    col.names = c("Rating", "Count", "Percent", "Cumulative %")
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Table 3: Distribution of Overall Experience Rating (1–5)
Rating  Count  Percent  Cumulative %
1       2      1.9%     1.9%
2       1      0.9%     2.8%
3       7      6.5%     9.3%
4       30     27.8%    37%
5       68     63%      100%
Code
# Rating descriptive stats — clean table
tibble(
  Statistic = c("Minimum", "1st Quartile", "Median", "Mean", "3rd Quartile", "Maximum", "Std Deviation"),
  Value     = c(
    min(df$overall_rating),
    quantile(df$overall_rating, 0.25),
    median(df$overall_rating),
    round(mean(df$overall_rating), 3),
    quantile(df$overall_rating, 0.75),
    max(df$overall_rating),
    round(sd(df$overall_rating), 3)
  )
) |>
  kable(caption = "Table 4: Descriptive Statistics — Overall Rating") |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Table 4: Descriptive Statistics — Overall Rating
Statistic      Value
Minimum        1.000
1st Quartile   4.000
Median         5.000
Mean           4.491
3rd Quartile   5.000
Maximum        5.000
Std Deviation  0.815
Code
# Branch distribution — clean table
df |>
  count(branch) |>
  mutate(Percent = paste0(round(n / sum(n) * 100, 1), "%")) |>
  arrange(desc(n)) |>
  kable(
    caption   = "Table 5: Responses by Branch",
    col.names = c("Branch", "Count", "Percent")
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Table 5: Responses by Branch
Branch        Count  Percent
Lagos Island  40     37%
Ikeja         27     25%
Lekki         24     22.2%
Ikoyi         17     15.7%
Code
# Service type distribution — clean table
df |>
  count(visit_purpose) |>
  mutate(Percent = paste0(round(n / sum(n) * 100, 1), "%")) |>
  arrange(desc(n)) |>
  kable(
    caption   = "Table 6: Responses by Visit Purpose",
    col.names = c("Visit Purpose", "Count", "Percent")
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Table 6: Responses by Visit Purpose
Visit Purpose                             Count  Percent
Routine maintenance or scheduled service  49     45.4%
Fault diagnosis or repair                 22     20.4%
Electrical or AC service                  19     17.6%
Tyre or brake service                     18     16.7%

Variable Descriptions:

Variable                   Type                    Description
timestamp                  Date/time               Survey submission timestamp
branch                     Categorical (4 levels)  Mobility branch visited
visit_purpose              Categorical (4 levels)  Reason for visit
booking_method             Categorical (4 levels)  How the visit was arranged
service_duration           Ordinal (4 levels)      Time vehicle spent at centre
resolution_status          Ordinal (3 levels)      Whether the fault was resolved
staff_competence           Ordinal (4 levels)      Rated technical competence of staff
technician_continuity      Categorical (3 levels)  Whether same technician served the customer
cost_perception            Ordinal (5 levels)      Cost relative to expectations
return_likelihood          Ordinal (4 levels)      Likelihood of returning
recommendation_likelihood  Ordinal (4 levels)      Likelihood of recommending
overall_rating             Numeric (1–5)           Overall experience rating (outcome variable)

Data Quality Issues Identified:

  1. Timestamp granularity: Timestamps are stored as decimal serial numbers requiring conversion — handled by read_excel() automatically. No usable sub-daily time series patterns are present.
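  2. Left-skewed rating distribution: 63% of ratings are 5/5 and only 9.3% are 3 or below, so the distribution is concentrated at the top of the scale with a thin tail toward low ratings (negative skew). Such ceiling effects are common in post-service surveys. This skew is acknowledged in regression diagnostics. No values were removed; the distribution reflects genuine customer sentiment but limits the variance available to explain.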
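The direction of the skew can be confirmed from the rating counts in Table 3 alone. The following is a minimal sketch, assuming those counts are exact; it rebuilds the 108 ratings and computes a simple moment-based skewness by hand, since base R has no built-in skewness function. A negative value indicates a tail toward low ratings.

```r
# Rebuild the 108 ratings from the counts reported in Table 3
ratings <- rep(1:5, times = c(2, 1, 7, 30, 68))

m <- mean(ratings)
s <- sd(ratings)

# Moment-based skewness: mean cubed deviation divided by
# the (population) standard deviation cubed
dev      <- ratings - m
skewness <- mean(dev^3) / mean(dev^2)^1.5

cat(sprintf("Mean = %.3f, SD = %.3f, skewness = %.2f\n", m, s, skewness))
```

The mean and standard deviation printed here should agree with Table 4, which serves as a check that the counts were transcribed correctly.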

5. Exploratory Data Analysis

Code
# Encode ordinal variables numerically for correlation and regression
df <- df |>
  mutate(
    # Service duration: shorter = lower number
    service_duration_num = case_when(
      service_duration == "Less than 2 hours" ~ 1,
      service_duration == "2 - 4 hours"       ~ 2,
      service_duration == "4 - 6 hours"        ~ 3,
      service_duration == "More than 6 hours"  ~ 4
    ),
    # Resolution: better = higher
    resolution_num = case_when(
      resolution_status == "Yes, completely resolved"               ~ 3,
      resolution_status == "Mostly resolved, minor issues remained" ~ 2,
      resolution_status == "I had to return for the same issue"     ~ 1
    ),
    # Staff competence: better = higher
    competence_num = case_when(
      staff_competence == "Excellent" ~ 4,
      staff_competence == "Good"      ~ 3,
      staff_competence == "Fair"      ~ 2,
      staff_competence == "Poor"      ~ 1
    ),
    # Cost perception: cheaper = lower, expensive = higher
    cost_num = case_when(
      cost_perception == "Much cheaper than expected" ~ 1,
      cost_perception == "About right"               ~ 2,
      cost_perception == "Slightly expensive"        ~ 3,
      cost_perception == "Very expensive"            ~ 4
    ),
    # Return likelihood
    return_num = case_when(
      return_likelihood == "Definitely will return"    ~ 4,
      return_likelihood == "Likely to return"          ~ 3,
      return_likelihood == "Unsure"                    ~ 2,
      return_likelihood == "Very unlikely to return"   ~ 1
    ),
    # Recommendation likelihood
    recommend_num = case_when(
      recommendation_likelihood == "Extremely likely"    ~ 4,
      recommendation_likelihood == "Likely"              ~ 3,
      recommendation_likelihood == "Neutral"             ~ 2,
      recommendation_likelihood == "I will not recommend"~ 1
    )
  )

cat("Ordinal encoding complete. New numeric columns added.\n")
Ordinal encoding complete. New numeric columns added.
Code
cat("Rows:", nrow(df), "| Columns:", ncol(df), "\n")
Rows: 108 | Columns: 18 
Code
# Confirm encoding with a clean table
tibble(
  Variable        = c("resolution_num", "competence_num", "service_duration_num",
                      "cost_num", "return_num", "recommend_num"),
  Scale           = c("1 = returned for same issue → 3 = fully resolved",
                      "1 = Poor → 4 = Excellent",
                      "1 = <2 hrs → 4 = >6 hrs",
                      "1 = Much cheaper → 4 = Very expensive",
                      "1 = Very unlikely → 4 = Definitely will return",
                      "1 = Will not recommend → 4 = Extremely likely")
) |>
  kable(caption = "Table 7: Ordinal Encoding Key",
        col.names = c("Numeric Variable Created", "Encoding Scale")) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Table 7: Ordinal Encoding Key
Numeric Variable Created  Encoding Scale
resolution_num            1 = returned for same issue → 3 = fully resolved
competence_num            1 = Poor → 4 = Excellent
service_duration_num      1 = <2 hrs → 4 = >6 hrs
cost_num                  1 = Much cheaper → 4 = Very expensive
return_num                1 = Very unlikely → 4 = Definitely will return
recommend_num             1 = Will not recommend → 4 = Extremely likely
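Because `case_when()` returns `NA` for any value that matches none of its conditions, a response label missing from the encoding is silently dropped from later correlation and regression steps. This matters here because the variable table lists `cost_perception` with five levels while the encoding maps four. The sketch below shows one way to surface unmapped labels, using a base R lookup vector in place of `case_when()`. The toy responses, and in particular the label "Slightly cheaper than expected", are hypothetical and used only for illustration.

```r
# Lookup vector mirroring the cost_perception encoding
cost_map <- c(
  "Much cheaper than expected" = 1,
  "About right"                = 2,
  "Slightly expensive"         = 3,
  "Very expensive"             = 4
)

# Toy responses; "Slightly cheaper than expected" is a HYPOTHETICAL label
# included only to show what an unmapped level looks like
responses <- c("About right", "Very expensive", "Slightly cheaper than expected")

# Labels absent from the lookup come back as NA
encoded <- unname(cost_map[responses])

# List the labels that would be lost by the encoding
unmapped <- unique(responses[is.na(encoded)])
print(unmapped)
```

On the survey data, `setdiff(unique(df$cost_perception), names(cost_map))` would give the same list, and `colSums(is.na(df))` after encoding would show how many rows are affected.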
Code
# Distribution of overall rating — clean table only
df |>
  count(overall_rating) |>
  mutate(
    Percent    = paste0(round(n / sum(n) * 100, 1), "%"),
    Cumulative = paste0(round(cumsum(n) / sum(n) * 100, 1), "%")
  ) |>
  kable(
    caption   = "Table 8: Distribution of Overall Experience Ratings",
    col.names = c("Rating (1–5)", "Count", "Percent", "Cumulative %")
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Table 8: Distribution of Overall Experience Ratings
Rating (1–5)  Count  Percent  Cumulative %
1             2      1.9%     1.9%
2             1      0.9%     2.8%
3             7      6.5%     9.3%
4             30     27.8%    37%
5             68     63%      100%
Code
# Descriptive stats as a clean table
tibble(
  Statistic = c("Mean", "Median", "Std Deviation", "Min", "Max",
                "% rating 4 or 5", "% rating 5 only"),
  Value     = c(
    round(mean(df$overall_rating), 2),
    median(df$overall_rating),
    round(sd(df$overall_rating), 2),
    min(df$overall_rating),
    max(df$overall_rating),
    paste0(round(mean(df$overall_rating >= 4) * 100, 1), "%"),
    paste0(round(mean(df$overall_rating == 5) * 100, 1), "%")
  )
) |>
  kable(caption = "Table 9: Descriptive Statistics — Overall Rating") |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Table 9: Descriptive Statistics — Overall Rating
Statistic        Value
Mean             4.49
Median           5
Std Deviation    0.81
Min              1
Max              5
% rating 4 or 5  90.7%
% rating 5 only  63%
Code
# Summary by branch
df |>
  group_by(branch) |>
  summarise(
    n           = n(),
    mean_rating = round(mean(overall_rating), 2),
    median_rating = median(overall_rating),
    sd_rating   = round(sd(overall_rating), 2),
    pct_fully_resolved = round(mean(resolution_num == 3) * 100, 1)
  ) |>
  arrange(desc(mean_rating)) |>
  kable(caption = "Table 10: Summary Statistics by Branch",
        col.names = c("Branch", "N", "Mean Rating", "Median", "SD", "% Fully Resolved")) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Table 10: Summary Statistics by Branch
Branch        N   Mean Rating  Median  SD    % Fully Resolved
Lagos Island  40  4.60         5       0.84  92.5
Ikoyi         17  4.53         5       0.72  82.4
Lekki         24  4.50         5       0.93  70.8
Ikeja         27  4.30         4       0.72  63.0
Code
# Summary by visit purpose
df |>
  group_by(visit_purpose) |>
  summarise(
    n           = n(),
    mean_rating = round(mean(overall_rating), 2),
    pct_resolved = round(mean(resolution_num == 3) * 100, 1)
  ) |>
  arrange(desc(mean_rating)) |>
  kable(caption = "Table 11: Summary by Visit Purpose",
        col.names = c("Visit Purpose", "N", "Mean Rating", "% Fully Resolved")) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Table 11: Summary by Visit Purpose
Visit Purpose                             N   Mean Rating  % Fully Resolved
Routine maintenance or scheduled service  49  4.55         85.7
Fault diagnosis or repair                 22  4.50         86.4
Tyre or brake service                     18  4.50         72.2
Electrical or AC service                  19  4.32         57.9

EDA Interpretation: The overall rating distribution is strongly left-skewed (concentrated at the top of the scale), with 90.7% of customers rating their experience 4 or 5 out of 5, and 63% awarding the maximum score of 5. The mean rating is 4.49 (SD = 0.81), with a median of 5. Lagos Island records the highest mean rating (4.60), followed by Ikoyi (4.53) and Lekki (4.50), while Ikeja is the lowest at 4.30. Resolution rates follow the same ordering: Lagos Island leads at 92.5% full resolution, while Ikeja records only 63.0%, the widest gap between branches in the dataset. Electrical or AC services show both the lowest resolution rate (57.9%, compared with 85.7% for routine maintenance) and the lowest mean rating (4.32), suggesting that this service category carries the highest satisfaction risk.
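With branch samples this small, each resolution rate carries considerable uncertainty. The sketch below attaches exact binomial (Clopper-Pearson) 95% confidence intervals to the branch rates. It is illustrative only: the "fully resolved" counts are back-calculated from the percentages and sample sizes in the branch summary table (for example, 92.5% of 40 is 37) rather than taken from the raw data.

```r
# Fully resolved counts implied by the branch summary table
# (derived from the reported percentages, not read from the raw data)
resolved <- c("Lagos Island" = 37, "Ikoyi" = 14, "Lekki" = 17, "Ikeja" = 17)
n_branch <- c("Lagos Island" = 40, "Ikoyi" = 17, "Lekki" = 24, "Ikeja" = 27)

# Exact 95% confidence interval for each branch's resolution rate
ci <- t(sapply(names(resolved), function(b) {
  bt <- binom.test(resolved[[b]], n_branch[[b]])
  c(rate  = resolved[[b]] / n_branch[[b]],
    lower = bt$conf.int[1],
    upper = bt$conf.int[2])
}))

# One row per branch, values in percent
print(round(ci * 100, 1))
```

Wide or overlapping intervals would suggest treating the branch ranking as indicative rather than settled.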


6. Data Visualisation

Code
library(ggplot2)

# Plot 1: Overall rating distribution
ggplot(df, aes(x = factor(overall_rating))) +
  geom_bar(fill = "#2196F3", colour = "white", width = 0.6) +
  geom_text(stat = "count", aes(label = after_stat(count)), vjust = -0.4, size = 4) +
  labs(
    title    = "Figure 1: Distribution of Overall Experience Ratings",
    subtitle = "n = 108 customer responses across all Mobility branches",
    x        = "Overall Rating (1 = Very Poor, 5 = Excellent)",
    y        = "Number of Customers"
  ) +
  theme_minimal(base_size = 12) +
  theme(plot.title = element_text(face = "bold"))

Code
# Plot 2: Rating distribution by branch
ggplot(df, aes(x = branch, y = overall_rating, fill = branch)) +
  geom_boxplot(alpha = 0.7, outlier.colour = "red", outlier.shape = 16) +
  geom_jitter(width = 0.15, alpha = 0.3, size = 1.5) +
  stat_summary(fun = mean, geom = "point", shape = 18, size = 4, colour = "black") +
  labs(
    title    = "Figure 2: Overall Rating by Branch",
    subtitle = "Diamond = mean; red dots = outliers",
    x        = "Branch",
    y        = "Overall Rating (1–5)"
  ) +
  scale_fill_brewer(palette = "Set2") +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))

Code
# Plot 3: Resolution status by visit purpose (stacked bar)
df |>
  count(visit_purpose, resolution_status) |>
  group_by(visit_purpose) |>
  mutate(pct = n / sum(n) * 100) |>
  ggplot(aes(x = reorder(visit_purpose, -pct), y = pct, fill = resolution_status)) +
  geom_col(position = "stack", width = 0.6) +
  geom_text(aes(label = sprintf("%.0f%%", pct)),
            position = position_stack(vjust = 0.5), size = 3, colour = "white") +
  labs(
    title    = "Figure 3: Resolution Status by Visit Purpose",
    subtitle = "Proportion of customers per resolution outcome",
    x        = "Visit Purpose",
    y        = "Percentage (%)",
    fill     = "Resolution Status"
  ) +
  scale_fill_manual(values = c(
    "Yes, completely resolved"               = "#4CAF50",
    "Mostly resolved, minor issues remained" = "#FF9800",
    "I had to return for the same issue"     = "#F44336"
  )) +
  theme_minimal(base_size = 12) +
  theme(plot.title = element_text(face = "bold"),
        axis.text.x = element_text(angle = 15, hjust = 1))

Code
# Plot 4: Heatmap of booking method vs branch
df |>
  count(branch, booking_method) |>
  ggplot(aes(x = booking_method, y = branch, fill = n)) +
  geom_tile(colour = "white", linewidth = 0.8) +
  geom_text(aes(label = n), colour = "white", fontface = "bold", size = 4) +
  scale_fill_gradient(low = "#BBDEFB", high = "#1565C0") +
  labs(
    title    = "Figure 4: Booking Method vs Branch (Count Heatmap)",
    subtitle = "Number of survey responses by branch and booking channel",
    x        = "Booking Method",
    y        = "Branch",
    fill     = "Count"
  ) +
  theme_minimal(base_size = 12) +
  theme(plot.title = element_text(face = "bold"),
        axis.text.x = element_text(angle = 20, hjust = 1))

Code
# Plot 5: Service duration vs overall rating
df |>
  mutate(service_duration = factor(service_duration,
    levels = c("Less than 2 hours","2 - 4 hours","4 - 6 hours","More than 6 hours"))) |>
  group_by(service_duration) |>
  summarise(mean_rating = mean(overall_rating), n = n(), se = sd(overall_rating)/sqrt(n)) |>
  ggplot(aes(x = service_duration, y = mean_rating, group = 1)) +
  geom_line(colour = "#1565C0", linewidth = 1.2) +
  geom_point(aes(size = n), colour = "#1565C0", alpha = 0.8) +
  geom_errorbar(aes(ymin = mean_rating - se, ymax = mean_rating + se), width = 0.15, colour = "grey40") +
  geom_text(aes(label = sprintf("n=%d", n)), vjust = -1.2, size = 3.5) +
  labs(
    title    = "Figure 5: Mean Overall Rating by Service Duration",
    subtitle = "Error bars = ±1 standard error; point size proportional to n",
    x        = "Time Vehicle Spent at Service Centre",
    y        = "Mean Overall Rating (1–5)",
    size     = "Sample Size"
  ) +
  scale_y_continuous(limits = c(3.5, 5.2)) +
  theme_minimal(base_size = 12) +
  theme(plot.title = element_text(face = "bold"))

Visualisation Narrative: Figures 1–5 together tell a coherent story: Mobility customers are overwhelmingly satisfied (Figure 1), but ratings vary somewhat by branch (Figure 2). Electrical or AC and tyre or brake services carry the highest non-resolution risk (57.9% and 72.2% fully resolved), whereas fault diagnosis or repair resolves at a rate comparable to routine maintenance (86.4% and 85.7%) (Figure 3). Walk-in traffic dominates Lagos Island, while Corporate/fleet agreements are distributed across all branches (Figure 4). Longer service times are associated with slightly lower mean ratings, with the sharpest drop occurring beyond 4 hours (Figure 5).


7. Hypothesis Testing

Hypothesis 1: Do overall ratings differ across branches?

H₀: Mean overall rating is equal across all four branches.

H₁: At least one branch has a significantly different mean rating.

Code
# One-way ANOVA
anova_branch <- aov(overall_rating ~ branch, data = df)
summary(anova_branch)
             Df Sum Sq Mean Sq F value Pr(>F)
branch        3   1.53  0.5086   0.761  0.518
Residuals   104  69.46  0.6679               
Code
# Effect size (eta-squared)
ss_total   <- sum((df$overall_rating - mean(df$overall_rating))^2)
ss_between <- summary(anova_branch)[[1]][["Sum Sq"]][1]
eta_sq     <- ss_between / ss_total
cat(sprintf("\nEta-squared (effect size): %.3f\n", eta_sq))

Eta-squared (effect size): 0.021
Code
# Post-hoc Tukey HSD
TukeyHSD(anova_branch)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = overall_rating ~ branch, data = df)

$branch
                          diff        lwr       upr     p adj
Ikoyi-Ikeja         0.23311547 -0.4275823 0.8938132 0.7935439
Lagos Island-Ikeja  0.30370370 -0.2278025 0.8352099 0.4460094
Lekki-Ikeja         0.20370370 -0.3949565 0.8023639 0.8108555
Lagos Island-Ikoyi  0.07058824 -0.5472372 0.6884137 0.9907186
Lekki-Ikoyi        -0.02941176 -0.7058757 0.6470522 0.9994733
Lekki-Lagos Island -0.10000000 -0.6509817 0.4509817 0.9646644

Interpretation: The one-way ANOVA yields F(3, 104) = 0.761, p = 0.518, well above the 0.05 significance threshold. We therefore fail to reject H₀: there is no statistically significant difference in mean overall satisfaction ratings across the four Mobility branches. The eta-squared value of 0.021 indicates a small effect: branch membership accounts for only 2.1% of total variance in ratings. Tukey’s HSD post-hoc test corroborates this: every pairwise comparison has an adjusted p-value of 0.446 or higher (the smallest is Lagos Island vs. Ikeja), and all six confidence intervals include zero.
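The F statistic, its p-value, and eta-squared can all be reproduced from the ANOVA table, which is a useful consistency check on the reported figures. The sketch below uses the rounded mean squares and degrees of freedom printed in that table, so small rounding differences from the original output are possible.

```r
# Values copied from the printed ANOVA table (rounded)
df_between <- 3
df_within  <- 104
ms_between <- 0.5086
ms_within  <- 0.6679

# F statistic and its upper-tail p-value under the F(3, 104) distribution
f_stat <- ms_between / ms_within
p_val  <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

# Eta-squared = SS_between / (SS_between + SS_within)
ss_between <- ms_between * df_between
ss_within  <- ms_within * df_within
eta_sq     <- ss_between / (ss_between + ss_within)

cat(sprintf("F = %.3f, p = %.3f, eta-squared = %.3f\n", f_stat, p_val, eta_sq))
```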

From a business perspective, this is a positive finding: customers receive comparably consistent service quality regardless of which Mobility branch they visit. The observed differences in mean ratings (Lagos Island: 4.60 vs. Ikeja: 4.30) appear in the sample but are not statistically reliable at this sample size; they may reflect sampling variation rather than genuine operational differences. For Finance Planning, this means branch-level budget allocation need not be skewed by satisfaction scores alone. The more actionable differentiators are resolution rates, which range from 63.0% (Ikeja) to 92.5% (Lagos Island) even though overall ratings do not differ significantly, and staff competence.


Hypothesis 2: Do walk-in customers rate differently from pre-booked customers?

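H₀: Mean overall rating is the same for walk-in and pre-booked customers.

H₁: Walk-in customers rate differently from pre-booked customers.

Note: in the code below, every booking method other than "Walk-in" is coded as pre-booked, so the pre-booked group covers phone and online bookings as well as any other non-walk-in channel (such as Corporate/fleet agreements).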

Code
# Create binary booking variable
df <- df |>
  mutate(booking_binary = ifelse(booking_method == "Walk-in", "Walk-in", "Pre-booked"))

walkin   <- df$overall_rating[df$booking_binary == "Walk-in"]
prebooked <- df$overall_rating[df$booking_binary == "Pre-booked"]

cat("Walk-in: n =", length(walkin),
    "| Mean =", round(mean(walkin), 2),
    "| SD =", round(sd(walkin), 2))
Walk-in: n = 44 | Mean = 4.61 | SD = 0.65
Code
cat("\nPre-booked: n =", length(prebooked),
    "| Mean =", round(mean(prebooked), 2),
    "| SD =", round(sd(prebooked), 2), "\n\n")

Pre-booked: n = 64 | Mean = 4.41 | SD = 0.9 
Code
# Welch t-test: does not assume equal variances, so no Levene's pre-test is run
t.test(walkin, prebooked, var.equal = FALSE)

    Welch Two Sample t-test

data:  walkin and prebooked
t = 1.3826, df = 105.67, p-value = 0.1697
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.09000918  0.50478191
sample estimates:
mean of x mean of y 
 4.613636  4.406250 
Code
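# Effect size: Cohen's d, with the SD taken as the root of the simple average
# of the two group variances (unweighted, although n = 44 vs 64)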
pooled_sd <- sqrt((var(walkin) + var(prebooked)) / 2)
cohens_d  <- (mean(walkin) - mean(prebooked)) / pooled_sd
cat(sprintf("\nCohen's d (effect size): %.3f\n", cohens_d))

Cohen's d (effect size): 0.263
Code
cat("Interpretation: |d| < 0.2 = negligible, 0.2–0.5 = small, 0.5–0.8 = medium, >0.8 = large\n")
Interpretation: |d| < 0.2 = negligible, 0.2–0.5 = small, 0.5–0.8 = medium, >0.8 = large

Interpretation: The Welch two-sample t-test yields t(105.67) = 1.383, p = 0.170 — above the 0.05 threshold. We fail to reject H₀: there is no statistically significant difference in mean overall ratings between walk-in (mean = 4.61, SD = 0.65) and pre-booked customers (mean = 4.41, SD = 0.90). Cohen’s d = 0.263 indicates a small practical effect — walk-in customers rate slightly higher on average, but this difference is not statistically reliable at the current sample size.
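The Welch statistic and the effect size can be approximately reproduced from the group summaries. The sketch below also computes a sample-size-weighted pooled standard deviation as an alternative denominator for Cohen's d, since the two groups differ in size. It is not the code used for the reported results: the inputs are the rounded means and SDs printed above, so the values will differ slightly from the original output.

```r
# Group summaries as printed above (SDs are rounded to two decimals)
n1 <- 44; m1 <- 4.613636; s1 <- 0.65   # walk-in
n2 <- 64; m2 <- 4.406250; s2 <- 0.90   # pre-booked

# Welch t statistic: difference in means over the unpooled standard error
se_welch <- sqrt(s1^2 / n1 + s2^2 / n2)
t_welch  <- (m1 - m2) / se_welch

# Cohen's d with the unweighted average variance (the form used in the report)
d_unweighted <- (m1 - m2) / sqrt((s1^2 + s2^2) / 2)

# Cohen's d with the classic pooled SD, weighted by degrees of freedom
sd_pooled  <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
d_weighted <- (m1 - m2) / sd_pooled

cat(sprintf("Welch t = %.2f | d (unweighted) = %.3f | d (weighted) = %.3f\n",
            t_welch, d_unweighted, d_weighted))
```

Both versions of d fall in the conventional "small" range, so the choice of denominator does not change the interpretation here.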

From a business perspective, this is a reassuring finding: the booking channel does not materially disadvantage any customer group in terms of their satisfaction experience. However, the direction of the effect — walk-in customers rating marginally higher than pre-booked ones — is worth monitoring. One plausible explanation is that walk-in customers arrive with lower pre-formed expectations, while pre-booked customers may expect a more structured, premium experience. Mobility could investigate whether pre-booked customers feel their scheduling advantage translates into tangibly faster or more attentive service. A larger sample collected over a longer period would provide greater statistical power to confirm or refute this directional difference.


8. Correlation Analysis

Code
library(corrplot)

# Select numeric variables
cor_vars <- df |>
  select(
    overall_rating,
    resolution_num,
    competence_num,
    service_duration_num,
    cost_num,
    return_num,
    recommend_num
  ) |>
  rename(
    `Overall Rating`   = overall_rating,
    `Resolution`       = resolution_num,
    `Staff Competence` = competence_num,
    `Service Duration` = service_duration_num,
    `Cost Perception`  = cost_num,
    `Return Likelihood`= return_num,
    `Recommend`        = recommend_num
  )

# Spearman correlation (appropriate for ordinal variables)
cor_matrix <- cor(cor_vars, method = "spearman", use = "complete.obs")

# Print matrix
round(cor_matrix, 3) |>
  kable(caption = "Table 12: Spearman Correlation Matrix") |>
  kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE)
Table 12: Spearman Correlation Matrix
                   Overall Rating  Resolution  Staff Competence  Service Duration  Cost Perception  Return Likelihood  Recommend
Overall Rating     1.000           0.566       0.586             -0.298            -0.121           0.686              0.684
Resolution         0.566           1.000       0.664             -0.225            -0.289           0.573              0.500
Staff Competence   0.586           0.664       1.000             -0.165            -0.143           0.605              0.540
Service Duration   -0.298          -0.225      -0.165            1.000             0.285            -0.269             -0.270
Cost Perception    -0.121          -0.289      -0.143            0.285             1.000            -0.229             -0.236
Return Likelihood  0.686           0.573       0.605             -0.269            -0.229           1.000              0.765
Recommend          0.684           0.500       0.540             -0.270            -0.236           0.765              1.000
Code
# Heatmap
corrplot(cor_matrix,
         method   = "color",
         type     = "upper",
         addCoef.col = "black",
         number.cex  = 0.75,
         tl.cex      = 0.85,
         col      = colorRampPalette(c("#F44336","white","#1565C0"))(200),
         title    = "Figure 6: Spearman Correlation Heatmap",
         mar      = c(0,0,2,0))

Correlation Interpretation:

Spearman’s rank correlation is used rather than Pearson’s because most variables are ordinal (ranked categories), not continuous. Spearman makes no assumption of normality or equal intervals between ranks.

The strongest correlations with Overall Rating in Table 4 are:

  1. Resolution Status (ρ = 0.566): Whether the customer’s fault was fully resolved is moderately to strongly associated with overall satisfaction. It is also the most operationally actionable finding: reducing repeat-visit rates should be Mobility’s top priority.

  2. Staff Competence (ρ = 0.586): Customers who rate staff as excellent or good consistently award higher overall ratings. Competence is also closely tied to resolution (ρ = 0.664), which supports investment in technician training and certification programmes.

  3. Return Likelihood & Recommend (ρ = 0.686 and 0.684): These two loyalty measures are the strongest correlates of overall rating and are strongly correlated with each other (ρ = 0.765), indicating that overall satisfaction is a reasonable proxy for loyalty intent. This is a useful insight for financial planning around customer lifetime value.

Notable weak correlation: Cost perception is only weakly correlated with overall rating (ρ = −0.121), weaker than expected. Paying more or less than anticipated appears to have little bearing on how customers rate the visit, provided the service resolves their problem effectively. This suggests that price sensitivity is secondary to resolution quality at Mobility.
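
The Overall Rating row of Table 4 can be ordered by absolute strength to check which variables are most closely associated with satisfaction. The sketch below re-enters the rounded coefficients from Table 4 by hand; with the report’s objects in memory, the same vector could instead be taken from cor_matrix["Overall Rating", -1].

```r
# Overall Rating row of Table 4 (Spearman rho), re-entered from the printed matrix
rho_overall <- c(
  "Resolution"        =  0.566,
  "Staff Competence"  =  0.586,
  "Service Duration"  = -0.298,
  "Cost Perception"   = -0.121,
  "Return Likelihood" =  0.686,
  "Recommend"         =  0.684
)

# Order by absolute strength, strongest first, keeping the sign for direction
ranked <- rho_overall[order(abs(rho_overall), decreasing = TRUE)]
print(ranked)
```

Ordering by absolute value keeps negative associations, such as Service Duration, comparable with positive ones.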


9. Linear Regression

Code
# Prepare regression dataset
df_reg <- df |>
  mutate(
    branch_ref   = relevel(factor(branch), ref = "Lagos Island"),
    purpose_ref  = relevel(factor(visit_purpose), ref = "Routine maintenance or scheduled service"),
    booking_ref  = relevel(factor(booking_method), ref = "Walk-in")
  )

# Linear regression model
model <- lm(
  overall_rating ~ resolution_num + competence_num + service_duration_num +
    cost_num + branch_ref + purpose_ref + booking_ref,
  data = df_reg
)

summary(model)

Call:
lm(formula = overall_rating ~ resolution_num + competence_num + 
    service_duration_num + cost_num + branch_ref + purpose_ref + 
    booking_ref, data = df_reg)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.34489 -0.25605  0.02468  0.28958  1.31061 

Coefficients:
                                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)                           1.87451    0.39303   4.769 6.74e-06 ***
resolution_num                        0.52868    0.13058   4.049 0.000106 ***
competence_num                        0.40441    0.10056   4.021 0.000117 ***
service_duration_num                 -0.19533    0.06479  -3.015 0.003305 ** 
cost_num                              0.01679    0.06465   0.260 0.795644    
branch_refIkeja                       0.18477    0.14971   1.234 0.220217    
branch_refIkoyi                       0.13033    0.15553   0.838 0.404158    
branch_refLekki                       0.32329    0.14913   2.168 0.032700 *  
purpose_refElectrical or AC service   0.16012    0.15507   1.033 0.304474    
purpose_refFault diagnosis or repair  0.22112    0.14772   1.497 0.137756    
purpose_refTyre or brake service      0.12485    0.15260   0.818 0.415324    
booking_refCorporate/fleet agreement  0.07256    0.14179   0.512 0.610019    
booking_refOnline-booking            -0.15915    0.20558  -0.774 0.440786    
booking_refPhone-booking             -0.34355    0.13447  -2.555 0.012231 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5205 on 94 degrees of freedom
Multiple R-squared:  0.6413,    Adjusted R-squared:  0.5917 
F-statistic: 12.93 on 13 and 94 DF,  p-value: 8.616e-16
Code
# Tidy coefficient table
library(broom)
tidy(model, conf.int = TRUE) |>
  mutate(across(where(is.numeric), \(x) round(x, 3))) |>
  kable(caption = "Table 5: Linear Regression Coefficient Table",
        col.names = c("Term","Estimate","Std Error","t-value","p-value","CI Lower","CI Upper")) |>
  kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE)
Table 5: Linear Regression Coefficient Table

Term                                  Estimate  Std Error  t-value  p-value  CI Lower  CI Upper
(Intercept)                              1.875      0.393    4.769    0.000     1.094     2.655
resolution_num                           0.529      0.131    4.049    0.000     0.269     0.788
competence_num                           0.404      0.101    4.021    0.000     0.205     0.604
service_duration_num                    -0.195      0.065   -3.015    0.003    -0.324    -0.067
cost_num                                 0.017      0.065    0.260    0.796    -0.112     0.145
branch_refIkeja                          0.185      0.150    1.234    0.220    -0.112     0.482
branch_refIkoyi                          0.130      0.156    0.838    0.404    -0.178     0.439
branch_refLekki                          0.323      0.149    2.168    0.033     0.027     0.619
purpose_refElectrical or AC service      0.160      0.155    1.033    0.304    -0.148     0.468
purpose_refFault diagnosis or repair     0.221      0.148    1.497    0.138    -0.072     0.514
purpose_refTyre or brake service         0.125      0.153    0.818    0.415    -0.178     0.428
booking_refCorporate/fleet agreement     0.073      0.142    0.512    0.610    -0.209     0.354
booking_refOnline-booking               -0.159      0.206   -0.774    0.441    -0.567     0.249
booking_refPhone-booking                -0.344      0.134   -2.555    0.012    -0.611    -0.077

Code
# Model fit
glance(model) |>
  select(r.squared, adj.r.squared, sigma, statistic, p.value, df, nobs) |>
  mutate(across(where(is.numeric), \(x) round(x, 3))) |>
  kable(caption = "Table 6: Model Fit Statistics") |>
  kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE)
Table 6: Model Fit Statistics

r.squared  adj.r.squared  sigma  statistic  p.value  df  nobs
    0.641          0.592   0.52     12.928        0  13   108

Code
# Diagnostic plots
par(mfrow = c(2, 2))
plot(model, which = 1:4)

Code
par(mfrow = c(1, 1))

Regression Interpretation:

The linear regression model predicts overall satisfaction rating (1–5) from resolution status, staff competence, service duration, cost perception, branch, visit purpose, and booking method. The model is statistically significant overall (F(13, 94) = 12.93, p < 0.001), with an R² of 0.641 and Adjusted R² of 0.592 — meaning the model explains 64.1% of total variance in customer satisfaction ratings, a strong result given the ceiling effect in the data.

Key coefficient interpretations for a non-technical manager:

  • Resolution Status (β = 0.529, p < 0.001): Each step improvement in fault resolution — from “had to return for the same issue” to “mostly resolved” to “fully resolved” — is associated with an average 0.53-point increase in overall rating, holding all other factors constant. This is the single largest driver of satisfaction in the model. Business action: Mobility should implement a mandatory pre-release diagnostic check at every service, ensuring technicians confirm fault clearance before returning the vehicle to the customer. This is especially urgent at Ikeja and for electrical/AC services, where resolution rates are lowest.

  • Staff Competence (β = 0.404, p < 0.001): Each step up in perceived staff competence (e.g., from Fair to Good, or Good to Excellent) adds approximately 0.40 rating points. This is the second most powerful predictor. Business action: Branch managers should invest in structured technical training and monthly competency assessments. Given the strong link between competence and resolution, improving staff capability simultaneously addresses both top predictors.

  • Service Duration (β = −0.195, p = 0.003): Each additional duration band (e.g., moving from “2–4 hours” to “4–6 hours”) reduces the predicted rating by approximately 0.20 points. Business action: While customers tolerate longer waits when faults are resolved, unnecessary delays erode satisfaction. Mobility should review job-scheduling efficiency and ensure service advisors proactively communicate delays to customers.

  • Lekki Branch (β = 0.323, p = 0.033): Lekki customers rate their experience 0.32 points higher than Lagos Island customers (the reference category), after controlling for all other factors. This is the only branch coefficient that reaches statistical significance. Business action: Lekki’s operational practices — whether in staff responsiveness, facility quality, or communication — are worth investigating and replicating across branches.

  • Phone-booking (β = −0.344, p = 0.012): Customers who booked by phone rate their experience 0.34 points lower than walk-in customers, after controlling for service outcomes. Business action: This warrants review of the phone-booking experience — whether expectations set during the booking call are being met on arrival, and whether phone-booked customers are receiving equitable prioritisation.

  • Cost Perception (β = 0.017, p = 0.796): Cost perception has no statistically significant effect on overall rating. In this sample, customers do not appear to penalise Mobility for higher-than-expected costs, provided their vehicle fault is resolved and staff are competent. Business action: This suggests Mobility has some pricing headroom. Service pricing does not appear to be a primary satisfaction risk, and moderate price adjustments are unlikely to materially affect customer experience scores, although a non-significant coefficient in a sample of 108 does not rule out a small effect.
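
To make the link between the coefficients and their confidence intervals explicit, the sketch below rebuilds the 95% intervals for the three significant ordinal predictors from the estimates and standard errors printed by summary(model). The values are re-entered by hand, so the results should match Table 5 only up to rounding.

```r
# Estimates and standard errors re-entered from the summary(model) output
est <- c(resolution_num = 0.52868, competence_num = 0.40441,
         service_duration_num = -0.19533)
se  <- c(resolution_num = 0.13058, competence_num = 0.10056,
         service_duration_num = 0.06479)
df_resid <- 94  # residual degrees of freedom reported by summary(model)

# 95% interval: estimate +/- t critical value * standard error
t_crit <- qt(0.975, df = df_resid)
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(round(ci, 3))
```

An interval that excludes zero corresponds to a coefficient that is significant at the 5% level, which is why all three of these predictors carry significance stars in the model summary.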

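Model fit: An R² of 0.641 is strong for post-service survey data with a pronounced ceiling effect. The Adjusted R² of 0.592, which penalises the model for its 13 estimated slope coefficients (seven predictors, with the three categorical ones expanded into dummy terms), remains close to the unadjusted value. This suggests that the predictors collectively carry genuine explanatory power rather than the fit being an artefact of model size, although Adjusted R² alone cannot rule out overfitting.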
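
The relationship between R², Adjusted R² and the overall F-statistic can be checked by hand. The sketch below starts from the rounded R² printed by summary(model), so it reproduces the reported Adjusted R² (0.5917) and F-statistic (12.93) only up to rounding.

```r
r2 <- 0.6413  # Multiple R-squared from summary(model)
n  <- 108     # number of observations (nobs in Table 6)
p  <- 13      # estimated slope terms, excluding the intercept

# Adjusted R-squared penalises R-squared for the number of estimated terms
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Overall F-statistic on p and n - p - 1 degrees of freedom
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))

print(round(c(adj_r2 = adj_r2, f_stat = f_stat), 3))
```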

Diagnostic plots: The Residuals vs Fitted plot should show no systematic curvature; the Q-Q plot shows whether residuals are approximately normally distributed; the Scale-Location plot checks for heteroscedasticity; and the Cook’s Distance plot identifies any individual responses that disproportionately influence the model estimates. Any Cook’s D values above 1.0 should be investigated as potentially influential observations requiring sensitivity analysis.
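
The visual checks can be complemented with numeric ones. The sketch below is a minimal illustration on R’s built-in mtcars data, used here purely as a stand-in because the survey data frame is not reproduced; in the report, the fitted model object would take the place of fit. The 4/n screen is a common rule of thumb added for illustration and is stricter than the 1.0 threshold mentioned above.

```r
# Stand-in model on built-in data; substitute the report's `model` object for `fit`
fit <- lm(mpg ~ wt + hp, data = mtcars)

cd <- cooks.distance(fit)  # one Cook's D value per observation
n  <- nobs(fit)

# Observations above the conventional threshold (D > 1) and a stricter screen (D > 4/n)
flag_conventional <- names(cd)[cd > 1]
flag_screen       <- names(cd)[cd > 4 / n]

# Shapiro-Wilk test as a numeric companion to the Q-Q plot
sw <- shapiro.test(residuals(fit))

print(flag_conventional)
print(flag_screen)
print(sw$p.value)
```

Any flagged response can then be dropped in a refit to see whether the key coefficients change materially.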


10. Integrated Findings

The five analyses converge on a single, coherent recommendation for Mobility’s service operations:

First-visit fault resolution is the master lever of customer satisfaction. EDA showed that 7.4% of customers had to return for the same issue, and Ikeja’s full-resolution rate is only 63%, the lowest across all branches. Correlation analysis identified resolution status (ρ = 0.566) and staff competence (ρ = 0.586) as the operational variables most strongly correlated with overall rating. The regression model quantified the effect: each step improvement in resolution adds 0.53 rating points (p < 0.001), the largest single coefficient in the model.

Staff technical competence is the operational enabler of resolution. Regression confirms that competence adds 0.40 rating points per step (p < 0.001) — the second most powerful predictor. Electrical and AC services, which record the lowest resolution rate (57.9%), likely suffer most from competence gaps. Visualisations (Figures 2–3) reinforce that branches with higher competence ratings also record higher resolution rates and overall scores.

Branch differences in overall ratings are directional but not statistically significant. The ANOVA (F = 0.761, p = 0.518) and Tukey post-hoc tests indicate that observed mean differences across branches (Lagos Island: 4.60 vs. Ikeja: 4.30) cannot be distinguished from sampling variation at current sample sizes. However, once service outcomes are controlled for, the regression shows Lekki rating significantly higher than the Lagos Island reference branch (β = 0.323, p = 0.033), and resolution rates vary substantially, from 92.5% at Lagos Island to 63.0% at Ikeja. Both results provide operational grounds for targeted intervention regardless of the aggregate ANOVA result.
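
One way to reconcile a single significant branch coefficient with a non-significant ANOVA is to test the branch dummies jointly inside the regression. The sketch below shows a partial F-test on simulated data; the toy data frame and its effect sizes are hypothetical and say nothing about the survey results. It is an optional check, not part of the analysis reported above.

```r
# Simulated stand-in data (hypothetical; NOT the survey responses)
set.seed(42)
n <- 108
toy <- data.frame(
  branch = factor(rep(c("Lagos Island", "Ikeja", "Lekki", "Ikoyi"), each = 27)),
  resolution_num = sample(1:3, n, replace = TRUE)
)
toy$overall_rating <- 3 + 0.5 * toy$resolution_num + rnorm(n, sd = 0.5)

# Nested models: with and without the branch factor
reduced <- lm(overall_rating ~ resolution_num, data = toy)
full    <- lm(overall_rating ~ resolution_num + branch, data = toy)

# Partial F-test: do the three branch dummies jointly improve the fit?
branch_test <- anova(reduced, full)
print(branch_test)
```

A joint test of this kind asks whether branch matters as a whole, whereas an individual coefficient only compares one branch with the reference category.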

Booking channel matters, but only for phone-bookers. The regression reveals that phone-booked customers rate 0.34 points lower than walk-in customers (p = 0.012), even after controlling for resolution and competence. The t-test found no overall walk-in vs. pre-booked difference (p = 0.170), but the regression’s finer breakdown isolates phone-booking specifically as a satisfaction risk. One plausible explanation, which this survey cannot test directly, is a mismatch between expectations set during the booking call and the experience on arrival.

Single recommendation: Mobility should implement a Resolution Quality Assurance Protocol, a mandatory pre-release vehicle checklist completed by the attending technician and countersigned by the service advisor, prioritised immediately at Ikeja (63% resolution rate) and for electrical/AC service bays (57.9% resolution rate). This directly addresses the two strongest predictors of satisfaction (resolution: β = 0.529; competence: β = 0.404). The one significant booking-channel gap (phone-booking: β = −0.344) should be reviewed alongside it. From a Finance Planning perspective, this protocol requires minimal capital outlay but directly protects the customer retention and referral revenue streams that underpin Mobility’s recurring service income.


11. Limitations and Further Work

  1. Short collection window (6 days): The survey was administered over a single week in May 2026. Seasonal variation in service volumes (e.g., end-of-year fleet servicing, rainy-season breakdowns) may affect satisfaction patterns. A longitudinal survey across 3–6 months would improve representativeness.

  2. Self-selection bias: Customers who respond to a QR code or SMS survey may systematically differ from non-respondents — potentially overrepresenting highly satisfied or highly dissatisfied customers (the classic “two-tailed response” bias in post-service surveys).

  3. Ceiling effect: 63% of ratings are 5/5, compressing the variation that regression and correlation analyses can detect. Future surveys could use a 10-point scale or Net Promoter Score (NPS) to increase discriminating power.

  4. No financial linkage: The current dataset contains no revenue, spend, or visit-frequency data. Linking satisfaction scores to customer lifetime value — a natural next step for Finance Planning — would require integration with Mobility’s CRM or invoicing systems.

  5. Further work: With access to historical booking and job-card data, CS 2 techniques (customer segmentation via clustering, churn prediction via classification) would provide a more powerful analytical toolkit for strategic branch investment decisions.


References

Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making — from data fundamentals to machine learning in Python and R. Lagos Business School / markanalytics.online. https://markanalytics.online

Esan, O. (2026). Mobility Service Centre Customer Satisfaction Survey [Dataset]. Collected from Mobility Lagos branches, Lagos, Nigeria. Data available on request from the author.

R Core Team. (2024). R: A language and environment for statistical computing (Version 4.x). R Foundation for Statistical Computing. https://www.R-project.org/

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4

Allaire, J. J., Teague, C., Scheidegger, C., Xie, Y., & Dervieux, C. (2022). Quarto (Version 1.x) [Computer software]. https://doi.org/10.5281/zenodo.5960048


Appendix: AI Usage Statement

Claude (Anthropic) was used to assist with the structure and initial drafting of R code chunks for data loading, cleaning, ordinal encoding, visualisation, hypothesis testing, correlation analysis, and regression modelling. All analytical decisions — including the choice of Spearman over Pearson correlation, the selection of Welch’s t-test for unequal variances, the reference category selection in regression, and the business interpretations of all outputs — were made independently by the author based on course materials and professional judgement. The executive summary, professional disclosure, data provenance narrative, integrated findings, and limitations sections were written entirely by the author. AI-generated code was reviewed, tested, and modified where outputs did not match the data structure or analytical intent.