Driver Outsourcing Services in Lagos: An Exploratory & Inferential Analytics Study
Author
[Your Full Name]
Published
May 9, 2026
1 Executive Summary
This study investigates the market for driver outsourcing services in Lagos, Nigeria, drawing on primary survey data collected from 90 respondents across varying demographic and professional profiles. Lagos presents a uniquely challenging urban mobility environment — characterised by severe traffic congestion, safety concerns, and a growing professional class that increasingly outsources driving responsibilities. Despite rapid market growth, there is limited empirical understanding of the factors that drive consumer adoption, satisfaction, and willingness to pay.
Using a structured questionnaire administered between March and May 2026, this analysis applies five complementary techniques: Exploratory Data Analysis (EDA), Data Visualisation, Hypothesis Testing, Correlation Analysis, and Regression Modelling. Key findings reveal that safety, reliability, and driver professionalism are the most important provider selection factors, while high cost and lack of trust are the most cited barriers to more frequent use. The regression model identifies perceived service quality as the strongest predictor of overall satisfaction. The central recommendation is that providers should prioritise driver vetting, transparent safety communication, and tiered pricing to unlock the latent demand evident in the data.
Why These Five Techniques Are Relevant to My Work:
1. Exploratory Data Analysis (EDA): In my day-to-day role I routinely receive raw operational or customer data that must be assessed for completeness and distributional properties before any business decision is taken. EDA is the first gate I apply to any new dataset — identifying quality issues, understanding distributions, and surfacing initial patterns.
2. Data Visualisation: Communicating analytical insights to non-technical stakeholders is a core professional responsibility. Effective visualisation enables me to convey complex patterns concisely. In this project, visual storytelling communicates how Lagos consumers perceive and experience driver outsourcing.
3. Hypothesis Testing: My organisation regularly makes decisions that depend on whether observed differences in performance or satisfaction reflect real effects or merely random variation. Hypothesis testing provides the formal framework to support or challenge business assumptions with data.
4. Correlation Analysis: Understanding which variables move together — and by how much — is essential for identifying levers management can act on. Mapping relationships among service quality dimensions and satisfaction informs investment priorities.
5. Linear Regression: Regression models quantify the relative contribution of different factors to an outcome of interest. Coefficient estimates translate directly into prioritised, actionable service improvement recommendations for non-technical management.
3 Data Collection & Sampling
3.1 Source and Collection Method
The primary dataset was collected via a structured online questionnaire administered using Google Forms between March 20 and May 4, 2026. The survey captured socio-demographic characteristics, usage behaviour, service preferences (Likert-style importance ratings), perceived service quality (10 Likert statements), safety experiences, provider readiness assessments, and an overall satisfaction score (1–5 scale).
The survey link was distributed via professional networks and WhatsApp groups among Lagos-based professionals. Participation was voluntary and anonymous.
3.2 Sampling Frame
Target population: Adults in Lagos who have used or considered using a driver outsourcing service.
Sampling approach: Purposive / snowball sampling targeting individual users, business owners, corporate representatives, and event/logistics coordinators.
Sample size: 90 usable responses.
Time period: March 20 – May 4, 2026 (approximately 6 weeks).
Ethical notes: Informed consent was obtained via the first survey question. No vulnerable populations were targeted. Data are published in anonymised, aggregate form. Email addresses provided by a small number of respondents were removed prior to analysis.
3.3 Technique Justification
Technique | Justification
--------------------------------
EDA | The survey generates wide-ranging variable types (ordinal Likert, nominal categorical, free-text numeric). EDA identifies distributions, missing responses, and outliers before formal methods are applied.
Data Visualisation | With 20+ variables across demographic and attitudinal dimensions, visualisation is the only practical way to surface patterns for business decision-makers.
Hypothesis Testing | Key questions (Does satisfaction differ by user type? Do safety concerns vary by respondent segment?) require formal statistical tests to move beyond descriptive anecdote.
Correlation Analysis | Mapping relationships among service-quality ratings and satisfaction identifies which dimensions have the strongest co-movement with the outcome, guiding investment priorities.
Regression | OLS regression on overall satisfaction yields coefficient estimates that translate directly into specific service improvement recommendations.
Technique: Exploratory Data Analysis | Book Reference: Ch. 4 — Summary statistics, missing-value analysis, outlier detection
Business Justification: Before drawing conclusions about what drives customer satisfaction, we must understand the structure of the data — how complete it is, whether outliers could distort results, and whether key distributions are skewed. This mirrors the first step in any real business intelligence workflow.
# Share of missing responses per variable.
# Assumes `miss_df` (columns: variable, pct_missing) was built in an earlier chunk.
if (nrow(miss_df) > 0) {
  ggplot(miss_df, aes(x = reorder(variable, pct_missing), y = pct_missing)) +
    geom_col(fill = "#2c7fb8") +
    coord_flip() +
    scale_y_continuous(labels = label_percent(scale = 1)) +
    labs(
      title = "Missing Data by Variable",
      subtitle = "Percentage of observations with no response",
      x = NULL, y = "% Missing"
    )
} else {
  cat("No missing values detected in selected variables.\n")
}
Data Quality Finding 1: Variables with the highest missingness correspond to open-text and optional questions. These observations are retained for categorical analyses but excluded listwise from numeric modelling.
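A minimal sketch of how a missingness table of this shape could be built, and of what listwise exclusion does, using base R only. The toy data frame and its column names are illustrative stand-ins rather than the survey data, and `miss_df` is assumed to need only the `variable` and `pct_missing` columns that the plotting code expects.

```r
# Toy data standing in for the survey export (values are illustrative only).
toy <- data.frame(
  overall_satisfaction = c(4, 3, NA, 5, 2, 4),
  imp_cost             = c(5, 4, 4, NA, 3, 5),
  willingness_to_pay   = c(NA, "5000", NA, NA, "7,500 daily", NA),
  respondent_type      = c("Individual", "Business", "Individual",
                           "Corporate", "Individual", "Business")
)

# Percentage of missing responses per variable, largest first.
pct_missing <- colMeans(is.na(toy)) * 100
miss_df <- data.frame(
  variable    = names(pct_missing),
  pct_missing = as.numeric(pct_missing)
)
miss_df <- miss_df[miss_df$pct_missing > 0, ]
miss_df <- miss_df[order(-miss_df$pct_missing), ]
print(miss_df)

# Listwise deletion: keep only rows complete on the numeric modelling variables.
model_vars <- c("overall_satisfaction", "imp_cost")
model_df <- toy[complete.cases(toy[, model_vars]), ]
cat("Rows kept for modelling:", nrow(model_df), "of", nrow(toy), "\n")
```

Restricting `complete.cases()` to the modelling variables keeps rows whose only gaps are in optional open-text fields, which is consistent with retaining those observations for the categorical analyses.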
5.2 Distribution of the Outcome Variable
# Outcome distribution: bar chart stacked over a box plot (uses patchwork).
sat_clean <- df |>
  filter(!is.na(overall_satisfaction))

p1 <- sat_clean |>
  mutate(overall_satisfaction = as.factor(overall_satisfaction)) |>
  ggplot(aes(x = overall_satisfaction, fill = overall_satisfaction)) +
  geom_bar(width = 0.7, show.legend = FALSE) +
  scale_fill_brewer(palette = "Blues") +
  labs(
    title = "Overall Satisfaction Distribution",
    subtitle = "1 = Very Dissatisfied, 5 = Very Satisfied",
    x = "Satisfaction Score", y = "Count"
  )

p2 <- sat_clean |>
  ggplot(aes(x = overall_satisfaction)) +
  geom_boxplot(fill = "#41b6c4", colour = "grey30") +
  labs(title = "Box Plot", x = "Score", y = NULL)

p1 / p2 + plot_layout(heights = c(3, 1))
Data Quality Finding 2: The satisfaction scores show skewness of 3.45, indicating that while most respondents express moderate-to-positive satisfaction, a meaningful minority report dissatisfaction — a signal of inconsistent service quality across providers.
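Sample skewness can be computed directly, which also makes the sign convention easy to check: a positive value indicates a longer right tail (a few unusually high scores), while a negative value indicates a longer left tail (a few unusually low scores). The sketch below uses the moment-based estimator in base R. The function name `sample_skewness` and the toy vectors are illustrative assumptions, not the survey data, and packaged implementations may apply a slightly different small-sample correction.

```r
# Moment-based sample skewness: g1 = m3 / m2^(3/2), using central moments.
sample_skewness <- function(x) {
  x  <- x[!is.na(x)]
  m2 <- mean((x - mean(x))^2)
  m3 <- mean((x - mean(x))^3)
  m3 / m2^1.5
}

# Illustrative 1-5 ratings (not survey data).
mostly_high <- c(5, 5, 4, 4, 4, 5, 4, 3, 1, 2)  # a few low scores: left tail
mostly_low  <- 6 - mostly_high                   # mirrored scale: right tail

skew_high <- sample_skewness(mostly_high)
skew_low  <- sample_skewness(mostly_low)
cat("Skewness, mostly high scores:", round(skew_high, 2), "\n")
cat("Skewness, mostly low scores: ", round(skew_low, 2), "\n")
```

Mirroring the scale flips the sign but not the magnitude, so the sign of a reported skewness is worth reading alongside the bar chart before interpreting which tail is the minority.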
5.3 Outlier Detection — Importance Ratings
# Box plots of the importance ratings (imp_* items), ordered by median score.
df |>
  select(starts_with("imp_")) |>
  pivot_longer(everything(), names_to = "factor", values_to = "score") |>
  mutate(
    factor = str_remove(factor, "imp_") |>
      str_replace_all("_", " ") |>
      str_to_title()
  ) |>
  filter(!is.na(score)) |>
  ggplot(aes(x = reorder(factor, score, median), y = score, fill = factor)) +
  geom_boxplot(show.legend = FALSE, width = 0.6) +
  coord_flip() +
  scale_fill_brewer(palette = "Set2") +
  scale_y_continuous(
    breaks = 1:5,
    labels = c("Not\nImportant", "Somewhat", "Neutral", "Important", "Very\nImportant")
  ) +
  labs(
    title = "Importance of Provider Selection Factors",
    subtitle = "Distribution across all respondents (1-5 scale)",
    x = NULL, y = "Importance Rating"
  )
All values fall within the valid 1–5 scale. The wider dispersion for “Cost” reflects genuine heterogeneity in price sensitivity across user segments — an important market segmentation finding.
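A range check and a boxplot-style outlier rule are one way to back up statements like these numerically. The sketch below is an assumption about how such a check might look in base R; the item names follow the `imp_` prefix used in the plotting code, but the values are invented for illustration, including one deliberate entry error.

```r
# Illustrative importance ratings (not survey data); 7 is a deliberate entry error.
ratings <- data.frame(
  imp_cost   = c(1, 5, 3, 2, 5, 4, 1, 5),
  imp_safety = c(5, 5, 4, 5, 7, 5, 4, 5)
)

# Range check: count values outside the valid 1-5 Likert scale per item.
out_of_range <- sapply(ratings, function(x) sum(x < 1 | x > 5, na.rm = TRUE))
print(out_of_range)

# Boxplot rule: values more than 1.5 * IQR beyond the quartiles.
iqr_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- 1.5 * diff(q)
  x[!is.na(x) & (x < q[1] - fence | x > q[2] + fence)]
}
print(lapply(ratings, iqr_outliers))

# Dispersion per item: a larger IQR signals more heterogeneous ratings.
spread <- sapply(ratings, IQR, na.rm = TRUE)
print(spread)
```

On a bounded five-point scale the range check is the more informative of the two, since the IQR rule can flag legitimate but unpopular answers when most respondents pick the same value.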
6 Analysis Section 2 — Data Visualisation
Technique: Data Visualisation | Book Reference: Ch. 5 — Grammar of graphics, chart selection, storytelling with data
Business Justification: The five plots below form a cohesive narrative — who uses driver outsourcing, how frequently, why, how much they spend, and what holds them back — communicated in a format suitable for senior non-technical leadership.
# Plot 1: respondent type composition.
df |>
  filter(!is.na(respondent_type)) |>
  count(respondent_type) |>
  mutate(
    pct = n / sum(n),
    respondent_type = str_wrap(respondent_type, 25)
  ) |>
  ggplot(aes(x = reorder(respondent_type, pct), y = pct, fill = respondent_type)) +
  geom_col(show.legend = FALSE, width = 0.7) +
  coord_flip() +
  scale_y_continuous(labels = label_percent()) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Plot 1 — Who Responded?",
    subtitle = "Respondent type composition",
    x = NULL, y = "% of Respondents"
  )
# Plot 2: usage frequency distribution.
df |>
  filter(!is.na(use_freq_f)) |>
  count(use_freq_f) |>
  ggplot(aes(x = use_freq_f, y = n, fill = use_freq_f)) +
  geom_col(show.legend = FALSE, width = 0.7) +
  scale_x_discrete(labels = function(x) str_wrap(x, 12)) +
  scale_fill_manual(values = c("#d9f0a3", "#addd8e", "#31a354", "#006837")) +
  labs(
    title = "Plot 2 — How Often Do Users Hire Outsourced Drivers?",
    subtitle = "Usage frequency distribution",
    x = NULL, y = "Count"
  )
# Plot 3: primary stated reason for hiring an outsourced driver.
df |>
  filter(!is.na(primary_reason)) |>
  count(primary_reason) |>
  mutate(pct = n / sum(n)) |>
  ggplot(aes(x = reorder(primary_reason, pct), y = pct)) +
  geom_col(fill = "#2b8cbe", width = 0.7) +
  coord_flip() +
  scale_y_continuous(labels = label_percent()) +
  labs(
    title = "Plot 3 — Why Do Users Hire Outsourced Drivers?",
    subtitle = "Primary stated reason",
    x = NULL, y = "% of Respondents"
  )
# Plot 4: monthly spend mix within each income bracket.
df |>
  filter(!is.na(spend_ord), !is.na(income_ord)) |>
  count(income_ord, spend_ord) |>
  group_by(income_ord) |>
  mutate(pct = n / sum(n)) |>
  ggplot(aes(x = income_ord, y = pct, fill = spend_ord)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = label_percent()) +
  scale_fill_brewer(palette = "YlOrRd", name = "Monthly Spend") +
  scale_x_discrete(labels = function(x) str_wrap(x, 10)) +
  labs(
    title = "Plot 4 — Monthly Spend by Income Bracket",
    subtitle = "Higher earners spend proportionally more on driver services",
    x = "Monthly Income", y = "% of Respondents"
  ) +
  theme(legend.position = "right")
# Plot 5: top 8 barriers, splitting the multi-select field on ";".
df |>
  filter(!is.na(barriers)) |>
  separate_rows(barriers, sep = ";") |>
  mutate(barrier = str_trim(barriers)) |>
  filter(barrier != "") |>
  count(barrier, sort = TRUE) |>
  head(8) |>
  ggplot(aes(x = reorder(barrier, n), y = n)) +
  geom_col(fill = "#e34a33", width = 0.7) +
  coord_flip() +
  labs(
    title = "Plot 5 — What Prevents More Frequent Use?",
    subtitle = "Top 8 stated barriers",
    x = NULL, y = "Number of Mentions"
  )
Visualisation Narrative: The five plots tell a coherent story. The market is dominated by individual users and business executives who use the service infrequently — primarily for personal transportation and corporate travel. Higher-income respondents skew toward larger monthly spends. Yet the dominant barrier to more frequent use is high cost, followed by lack of trust — suggesting that even willing-to-pay customers are held back by trust deficits that driver certification and transparent pricing could address.
Business Justification: Lagos’s driver outsourcing providers serve a heterogeneous market. A key operational question is whether user segments genuinely differ in satisfaction and safety experience, or whether apparent differences are merely sampling noise. Hypothesis testing gives formal, defensible answers.
7.1 Hypothesis 1 — Does Satisfaction Differ by Usage Frequency?
H₀: Mean overall satisfaction is equal across all usage frequency groups
H₁: At least one group has a different mean satisfaction score
Test: One-way ANOVA; Kruskal-Wallis as non-parametric backup
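The test pairing named above can be sketched in base R as follows. This is an illustration under stated assumptions, not the study's own chunk: `toy_h1` is a hypothetical stand-in for `h1_df`, the group labels are invented, and the satisfaction scores are random draws, so the printed statistics say nothing about the survey.

```r
set.seed(42)  # reproducible toy data; not the survey responses

# Hypothetical stand-in for h1_df: satisfaction (1-5) by usage frequency group.
toy_h1 <- data.frame(
  use_freq_f = factor(rep(c("Rarely", "Monthly", "Weekly", "Daily"), each = 15),
                      levels = c("Rarely", "Monthly", "Weekly", "Daily")),
  overall_satisfaction = sample(1:5, 60, replace = TRUE)
)

# One-way ANOVA: tests equality of group means.
fit <- aov(overall_satisfaction ~ use_freq_f, data = toy_h1)
tab <- summary(fit)[[1]]
f_stat  <- tab[["F value"]][1]
p_anova <- tab[["Pr(>F)"]][1]

# Eta squared: between-group sum of squares over total sum of squares.
eta_sq <- tab[["Sum Sq"]][1] / sum(tab[["Sum Sq"]])

# Kruskal-Wallis: rank-based alternative that does not assume normality.
kw <- kruskal.test(overall_satisfaction ~ use_freq_f, data = toy_h1)

cat(sprintf("ANOVA: F = %.2f, p = %.3f, eta^2 = %.3f\n", f_stat, p_anova, eta_sq))
cat(sprintf("Kruskal-Wallis: H = %.2f, df = %d, p = %.3f\n",
            kw$statistic, as.integer(kw$parameter), kw$p.value))
```

Because a five-point satisfaction score is ordinal and often skewed, agreement between the ANOVA and Kruskal-Wallis p-values is a useful robustness check; if they disagree, the rank-based result is usually the safer one to report.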
# Satisfaction by usage frequency; assumes `h1_df` was prepared in an earlier chunk.
h1_df |>
  ggplot(aes(x = use_freq_f, y = overall_satisfaction, fill = use_freq_f)) +
  geom_boxplot(show.legend = FALSE, width = 0.5) +
  stat_summary(
    fun = mean, geom = "point", shape = 21, size = 3,
    fill = "white", colour = "black"
  ) +
  scale_x_discrete(labels = function(x) str_wrap(x, 12)) +
  scale_fill_brewer(palette = "Blues") +
  labs(
    title = "Overall Satisfaction by Usage Frequency",
    subtitle = "White dot = group mean",
    x = "Usage Frequency", y = "Satisfaction (1-5)"
  )
Interpretation: [After rendering — state the F-statistic, p-value, whether H₀ is rejected, and the η² effect size. State the business implication for a Lagos operator.]
7.2 Hypothesis 2 — Is Safety Concern Independent of Respondent Type?
H₀: Safety concern experience is independent of respondent type
H₁: Safety concern experience is associated with respondent type
Cramer's V (adj.) | 95% CI
--------------------------------
0 | [0.00, 1.00]
- One-sided CIs: upper bound fixed at [1.00].
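An adjusted Cramér's V of exactly 0 can look surprising, so the sketch below works through a chi-squared test of independence and both versions of the effect size by hand in base R. It assumes the adjusted figure uses a Bergsma-style bias correction, which is an assumption about the reporting package rather than something the output confirms. The contingency table is invented for illustration and is not the survey cross-tabulation.

```r
# Hypothetical contingency table (illustrative counts, not survey data):
# rows = respondent type, columns = experienced a safety concern.
tab <- matrix(c(18, 12,
                10, 10,
                 9, 11),
              nrow = 3, byrow = TRUE,
              dimnames = list(type = c("Individual", "Business", "Corporate"),
                              concern = c("No", "Yes")))

# Chi-squared test of independence. correct = FALSE switches off Yates'
# continuity correction, which only applies to 2x2 tables in any case.
test <- chisq.test(tab, correct = FALSE)
chi2 <- unname(test$statistic)
n <- sum(tab); r <- nrow(tab); k <- ncol(tab)

# Classic Cramer's V.
v <- sqrt((chi2 / n) / min(r - 1, k - 1))

# Bias-corrected version (Bergsma, 2013): subtracts the value phi^2 is expected
# to take under independence, so weak associations in small samples floor at 0.
phi2_adj <- max(0, chi2 / n - (r - 1) * (k - 1) / (n - 1))
r_adj <- r - (r - 1)^2 / (n - 1)
k_adj <- k - (k - 1)^2 / (n - 1)
v_adj <- sqrt(phi2_adj / min(r_adj - 1, k_adj - 1))

cat(sprintf("chi2 = %.3f, df = %d, p = %.3f\n",
            chi2, as.integer(test$parameter), test$p.value))
cat(sprintf("Cramer's V = %.3f, bias-corrected V = %.3f\n", v, v_adj))
```

In this toy table the uncorrected V is small but positive while the corrected value is floored at zero, which is the pattern to expect whenever the observed chi-squared statistic is below what chance alone would produce for a table of that size.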
# Share reporting a safety concern by respondent type; assumes `h2_df` from an earlier chunk.
h2_df |>
  group_by(respondent_type) |>
  summarise(pct_concern = mean(safety_concern_bin, na.rm = TRUE)) |>
  ggplot(aes(x = reorder(respondent_type, pct_concern), y = pct_concern, fill = pct_concern)) +
  geom_col(show.legend = FALSE, width = 0.7) +
  scale_y_continuous(labels = label_percent()) +
  scale_fill_gradient(low = "#ffffcc", high = "#d73027") +
  coord_flip() +
  labs(
    title = "% Who Experienced a Safety Concern by Respondent Type",
    x = NULL, y = "% Reporting Safety Concern"
  )
Interpretation: [After rendering — state chi-squared, degrees of freedom, p-value, and Cramér’s V. State whether the association is significant and which respondent type carries the highest safety concern rate.]
8 Analysis Section 4 — Correlation Analysis
Technique: Correlation Analysis | Book Reference: Ch. 8 — Pearson, Spearman, Kendall; correlation vs causation
Business Justification: Understanding which service-quality dimensions co-move with overall satisfaction helps management prioritise improvement investments. Spearman rank correlation is used given the ordinal nature of Likert-scale data.
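A ranked Spearman table of this kind could be produced along the following lines. This is a base R sketch on simulated Likert items, not the study's own chunk: the column names `safe`, `trust`, and `cost` and the way the outcome is generated are assumptions made purely so the example runs.

```r
set.seed(1)  # reproducible toy data; not the survey responses

# Hypothetical Likert items (1-5).
n <- 90
toy <- data.frame(
  safe  = sample(1:5, n, replace = TRUE),
  trust = sample(1:5, n, replace = TRUE),
  cost  = sample(1:5, n, replace = TRUE)
)
# Outcome loosely tied to `safe` so that one item stands out in the ranking.
toy$overall_satisfaction <- pmin(5, pmax(1, toy$safe + sample(-1:1, n, replace = TRUE)))

predictors <- setdiff(names(toy), "overall_satisfaction")

# Spearman rho of each item with the outcome, using pairwise-complete cases.
rho <- sapply(predictors, function(v) {
  cor(toy[[v]], toy$overall_satisfaction,
      method = "spearman", use = "pairwise.complete.obs")
})

# Rank from most positive to most negative.
ranked <- data.frame(variable = names(rho), spearman_r = round(unname(rho), 3))
ranked <- ranked[order(-ranked$spearman_r), ]
print(ranked, row.names = FALSE)
```

Sorting by the signed coefficient puts the strongest positive associations first; sorting by `abs(spearman_r)` instead would surface large negative coefficients, which can matter just as much for interpretation.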
Spearman Correlation with Overall Satisfaction (ranked)
variable | spearman_r
--------------------------------
Safe | 0.217
Trust | 0.162
Safety | 0.153
Routes | 0.143
Professionalism | 0.106
Cost | 0.104
Reliability | 0.066
Professional | 0.062
Flexibility | 0.062
Punctual | 0.048
Complaints | 0.037
Poorly Vetted | 0.018
Safety Serious | 0.015
Booking Ease | 0.010
Convenience | -0.099
Consistent | -0.212
Booking Easy | -0.431
Discussion of Key Correlations:
Safe (r = 0.217): [Interpret — what does it mean for this dimension to co-move most closely with satisfaction?]
Trust (r = 0.162): [Interpret the second strongest.]
Safety (r = 0.153): [Interpret the third strongest.]
Causation caveat: These correlations are observational. A high correlation between perceived safety and satisfaction does not confirm that improving safety causes higher satisfaction; confirming that would require a controlled intervention. An observed correlation is consistent with a causal effect but does not establish one, because confounding or reverse causation could produce the same pattern. It nevertheless justifies prioritising safety investments pending experimental evidence.
9 Analysis Section 5 — Linear Regression
Technique: OLS Linear Regression | Book Reference: Ch. 9 — Coefficients, diagnostics, interpretation
Business Justification: Regression quantifies the independent contribution of each predictor to overall satisfaction, holding other variables constant. This converts correlation findings into specific, prioritised recommendations suitable for a board-level decision.
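The "holding other variables constant" reading can be made concrete with a small OLS fit. The sketch below uses base R's `lm()` on simulated data; the predictor names, the data-generating coefficients, and the choice to treat the five-point outcome as continuous are all assumptions for illustration, and the fitted numbers are unrelated to the survey model.

```r
set.seed(7)  # reproducible toy data; not the survey responses

n <- 90
toy <- data.frame(
  safe  = sample(1:5, n, replace = TRUE),
  trust = sample(1:5, n, replace = TRUE),
  cost  = sample(1:5, n, replace = TRUE)
)
# Hypothetical data-generating process, used only to make the example concrete.
toy$overall_satisfaction <- 1 + 0.5 * toy$safe + 0.2 * toy$trust + rnorm(n, sd = 0.5)

# OLS: each coefficient is the expected change in satisfaction for a one-point
# increase in that predictor, holding the other predictors constant.
fit   <- lm(overall_satisfaction ~ safe + trust + cost, data = toy)
coefs <- summary(fit)$coefficients   # Estimate, Std. Error, t value, Pr(>|t|)
r2    <- summary(fit)$r.squared

print(round(coefs, 3))
cat(sprintf("R-squared: %.1f%%\n", 100 * r2))

# Largest significant slope (excluding the intercept) at the 5% level.
slopes <- coefs[-1, , drop = FALSE]
sig    <- slopes[slopes[, "Pr(>|t|)"] < 0.05, , drop = FALSE]
top    <- rownames(sig)[which.max(abs(sig[, "Estimate"]))]
cat("Strongest significant predictor:", top, "\n")
```

Comparing raw slopes in this way is only meaningful because all predictors share the same 1-5 scale; with mixed units, standardised coefficients would be the fairer basis for ranking.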
Coefficient Interpretation for a Non-Technical Manager:
“Our model explains 58.8% of the variation in customer satisfaction scores. [After rendering, identify the largest significant coefficient and write: ‘The single most important predictor is [variable] (β = X.XX): for every one-point increase in how positively a customer rates [variable], overall satisfaction rises by X.XX points on a five-point scale — all else equal. If our drivers improved [variable] from the current average of X to Y, we would expect satisfaction to increase by approximately Z points, which research associates with meaningfully higher customer retention.’]”
10 Integrated Findings
The five analyses collectively tell one coherent story about the Lagos driver outsourcing market:
EDA revealed a heterogeneous respondent pool skewed toward infrequent users, with satisfaction scores that are moderate but variable — indicating inconsistent service experience across providers.
Visualisation showed that the dominant use cases are personal transport and corporate travel, that higher-income users are proportionally bigger spenders, and that cost and trust are the twin structural barriers limiting market growth.
Hypothesis testing [confirmed / did not confirm — complete after rendering] that satisfaction differs significantly by usage frequency (H1) and that safety concern rates differ by respondent type (H2), with [state effect size and business implication].
Correlation analysis identified that Safe and Trust have the strongest positive relationship with overall satisfaction, confirming that service quality — not just price — is central to the customer experience.
Regression modelling isolated the independent drivers of satisfaction, translating correlation into a prioritised action list.
Single Integrated Recommendation: Lagos driver outsourcing providers should invest first in driver professionalism and vetting programmes — the dimension most consistently linked to satisfaction across all five analyses. They should then introduce transparent, tiered pricing to address the cost-and-trust barrier, and implement systematic post-trip feedback mechanisms to generate the operational data loop needed to monitor service consistency at scale.
11 Limitations & Further Work
Data limitations:
Sampling bias: Snowball and purposive sampling over-represent connected, digitally literate Lagos professionals. Findings may not generalise to lower-income users or those outside formal employment.
Self-report bias: Likert-scale responses reflect perceptions, not objective service quality measures. Drivers’ perspectives were not captured.
Cross-sectional design: The survey captures a single point in time; longitudinal data would be needed to assess whether satisfaction improves after service interventions.
Open-text willingness-to-pay field: Responses to Q18 were inconsistently formatted (daily vs monthly, varying currency notation), reducing the utility of that variable for quantitative analysis.
With more data, time, or computing power:
A discrete choice / conjoint experiment would more precisely estimate willingness to pay for each service dimension.
A longitudinal panel tracking the same customers over multiple trips would support stronger, though still not experimental, inference about what drives changes in satisfaction.
Natural language processing on Q19 open-text responses would yield richer qualitative insight to complement the quantitative findings.
A provider-side survey matched to customer-side data would enable multi-level modelling of how firm characteristics mediate individual satisfaction.
12 References
Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making — from data fundamentals to machine learning in Python and R. Lagos Business School / markanalytics.online. https://markanalytics.online
[Your Name]. (2026). Driver outsourcing services survey dataset [Dataset]. Collected via Google Forms from Lagos-based professionals, March–May 2026. Data available on request from the author.
R Core Team. (2024). R: A language and environment for statistical computing (Version 4.x). R Foundation for Statistical Computing. https://www.R-project.org/
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., Francois, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Muller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4
Allaire, J. J., Teague, C., Scheidegger, C., Xie, Y., & Dervieux, C. (2022). Quarto (Version 1.x) [Computer software]. https://doi.org/10.5281/zenodo.5960048
13 Appendix: AI Usage Statement
Claude (Anthropic) was used to assist with (1) generating the initial Quarto document skeleton and section scaffolding, (2) suggesting appropriate R package choices for each analytical technique, and (3) reviewing code syntax for tidyverse and related functions. All analytical decisions — the choice of Case Study 1, the selection of overall satisfaction as the dependent variable, the decision to use Spearman rather than Pearson correlation given the ordinal nature of Likert data, the interpretation of all statistical outputs, and the business recommendations — were made independently by the author. No AI tool generated the professional disclosure, the data collection narrative, or the substantive interpretation of results. The author takes full responsibility for all analytical judgements in this document.