Lagos Business School  •  April 2026
Capstone Case Study
Data Analytics 1

Assessment Brief
Prof Bongo Adi  •  Lagos Business School
⚠️ SELECT ONE CASE STUDY ONLY

Three case studies are provided below. You are required to complete exactly one (1) of them. Read all three before choosing — select the case that best fits the data you can realistically collect from your own professional context.

If you submit more than one case study, only the first one submitted will be marked.

Assessment weight: This case study carries a total of 130 marks, broken into three components: 30 marks for real-data generation and business-operations mapping, 80 marks for the submitted analytical work, and 20 marks for a defence held approximately one week after submission. The 30-mark practical component compensates for the practical assignment portion of the course.

This is an individual assignment: each student must submit and defend their own independent work.

Submission format: Live HTML published on RPubs or Posit Connect Cloud.

Tool: Quarto in RStudio or Positron.
👥 INDIVIDUAL ASSIGNMENT

Each student must independently collect their own data, conduct their own analysis, and submit their own Quarto document. Identical or near-identical submissions, shared datasets presented as independent, or any form of collusion will be treated as academic misconduct. You will be required to defend your work in person — you must be able to explain every line of code and every result.

# | Theme | Five Techniques | Marks (A + B + Defence)
CS 1 | Exploratory & Inferential Analytics | EDA · Visualisation · Hypothesis Testing · Correlation · Regression | 130 (30 + 80 + 20)
CS 2 | Predictive Modelling & Segmentation | Classification · Explainability · Clustering · Dimensionality Reduction · Time Series | 130 (30 + 80 + 20)
CS 3 | Advanced & Operational Analytics | Text Analytics · Monte Carlo · Advanced Forecasting · Customer/People Analytics · Optimisation or Association Rules | 130 (30 + 80 + 20)

Each submission must use real data collected from your own workplace, professional practice, or a Nigerian/African organisation you have direct access to. Simulated or publicly downloaded datasets may only supplement primary data — they cannot replace it.


General Instructions & Assessment Philosophy

This assessment asks you to act as a practising data scientist tackling a real problem in your own organisation or sector. You will collect or extract genuine data, frame an analytical question, apply five techniques from the textbook, and communicate your findings through a reproducible Quarto document.

1.1   The Real-Data Requirement (30 marks)

The defining feature of this assessment — and the source of the 30-mark practical component — is the use of real data that clearly maps to your business operations. These 30 marks are awarded for the quality, transparency, and operational relevance of your data, not for the analyses themselves. Each submission must include:

Academic integrity: Submitting simulated data presented as real data is academic misconduct and will result in a grade of zero for the entire submission. Prof Bongo Adi may ask you to present your data collection process during a viva voce.

30-Mark Practical Component — Breakdown

Real-Data & Business-Operations Component | Marks
Primary data collection: documented methodology, source, tools used | 10
Sampling justification: sample frame, sample size, period, and statistical rationale | 10
Clear mapping to your business operations: each technique linked to a real decision or process in your organisation | 10
SECTION A TOTAL | 30
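To illustrate what a statistical rationale for sample size can look like, here is a minimal sketch in Python. It assumes you are estimating a proportion from a simple random sample and uses Cochran's formula with a finite population correction. The brief does not prescribe this formula; the function name, the default margin of error, and the population figure are all hypothetical, and a different design (stratified sampling, a comparison of means) would call for a different calculation.

```python
import math
from statistics import NormalDist


def required_sample_size(population, margin=0.05, confidence=0.95, proportion=0.5):
    """Cochran sample size for a proportion, with finite population correction.

    population : size of the sampling frame (hypothetical input)
    margin     : acceptable margin of error, e.g. 0.05 for +/- 5 points
    confidence : two-sided confidence level
    proportion : expected proportion; 0.5 is the most conservative choice
    """
    # Two-sided critical value of the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population
    n0 = (z ** 2) * proportion * (1 - proportion) / margin ** 2
    # Shrink it for a finite sampling frame
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


# Hypothetical frame of 2,000 customer accounts
print(required_sample_size(2000))  # 323
```

If you use a calculation like this, state the inputs (frame size, margin, confidence level) in your Data Collection & Sampling section so the examiner can reproduce the figure.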
1.2   Quarto Document Requirements

Your submission is a single Quarto (.qmd) document rendered to HTML. It must:

A minimal YAML header:

---
title: "[Your Case Study Title]"
author: "[Your Full Name]"
date: today
format:
  html:
    theme: flatly
    toc: true
    code-fold: true
    embed-resources: true
---
1.3   Standard Document Structure
Section | Content Required
1. Executive Summary | 150–200 words: business problem, data collected, key findings, and recommendation.
2. Professional Disclosure | Your job title, organisation type/sector, and a paragraph for each technique explaining its operational relevance to your work. (Assessed in the 30-mark component.)
3. Data Collection & Sampling | Source, collection method, sampling frame, sample size, time period covered, and ethical notes or consent statement. (Assessed in the 30-mark component.)
4. Data Description | Variable names, types, and distributions produced with EDA code.
5–9. Analysis (one section per technique) | For each technique: brief theory recap, business justification, code, output, and plain-language interpretation for a non-technical manager.
10. Integrated Findings | How do the five analyses fit together? What single recommendation do they collectively support?
11. Limitations & Further Work | What would you do differently with more data, time, or computing power?
References | APA format. Cite the textbook, R/Python packages (use citation("pkgname")), and data sources.
Appendix: AI Usage Statement | One paragraph describing which AI tools (if any) assisted with coding, and where you exercised independent analytical judgement.
1.4   Full Marking Scheme (130 marks total)

Section A — Real-Data & Business-Operations Component (30 marks)

See Section 1.1 breakdown above.

Section B — Submitted Analytical Work (80 marks)

Analytical Component | Marks
Professional disclosure quality and depth of context linkage | 8
Correct and appropriate application of each of the five techniques (5 × 10 marks) | 50
Depth of business interpretation per technique | 12
Code quality, reproducibility, and document structure | 6
Integrated conclusion and actionable recommendation | 4
SECTION B TOTAL | 80

Section C — Defence / Viva Voce (20 marks) — held approximately one week after submission

The defence is a short individual oral examination (approx. 10–15 minutes) conducted by Prof Bongo Adi or a designated examiner. You will be asked to explain your data collection process, justify your analytical choices, and interpret selected outputs on the spot. No slides are required — bring your live HTML document. The defence confirms that the submitted work is genuinely your own.
Defence Component | Marks
Ability to explain analytical decisions and justify technique selection | 8
Correct interpretation of model outputs and statistical results under questioning | 8
Demonstrated ownership: evidence that the data, code, and conclusions are genuinely yours | 4
SECTION C TOTAL | 20

Grand Summary | Marks
Section A: Real-data generation & business-operations mapping | 30
Section B: Submitted analytical work | 80
Section C: Defence / viva voce (~1 week after submission) | 20
GitHub repository (bonus) | +5
GRAND TOTAL (excl. bonus) | 130

Case Study 1 — Exploratory & Inferential Analytics
Theme: Understanding the story in your data before building models  •  Total marks: 130

1.1   Overview

This case study focuses on the first and most important phase of any analysis: understanding what you have. Before fitting models, a rigorous analyst spends significant time on exploratory data analysis, visualisation, and formal statistical testing. You will apply these foundational techniques to data from your own professional context and demonstrate that you can move fluently between exploratory insight and inferential conclusion.

1.2   Required Techniques

# | Technique | Book Reference
1 | Exploratory Data Analysis (EDA) | Ch. 4: Summary stats, missing-value analysis, outlier detection, Anscombe's Quartet
2 | Data Visualisation | Ch. 5: Grammar of graphics, chart selection, storytelling with data
3 | Hypothesis Testing | Ch. 6: t-test, chi-squared, ANOVA, non-parametric alternatives, effect sizes
4 | Correlation Analysis | Ch. 8: Pearson, Spearman, Kendall; partial correlation; correlation vs causation
5 | Linear or Logistic Regression | Ch. 9 (OLS) or Ch. 13 (logistic): coefficients, diagnostics, interpretation
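The hypothesis-testing row asks for effect sizes as well as test statistics. As a hedged illustration, the sketch below computes Welch's t statistic and Cohen's d for two independent groups using only the Python standard library. The function names and the two small samples are hypothetical, not taken from the textbook, and a full analysis would also report a p-value and check the test's assumptions.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def cohens_d(a, b):
    """Cohen's d: mean difference scaled by the pooled standard deviation."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


# Hypothetical example: a metric measured at two branches
branch_a = [2, 4, 6]
branch_b = [1, 3, 5]
print(round(welch_t(branch_a, branch_b), 3))  # 0.612
print(cohens_d(branch_a, branch_b))           # 0.5
```

Reporting the effect size alongside the test result lets a non-technical manager see how large a difference is, not only whether it is statistically detectable.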

1.3   Business Context Examples (illustrative only — use your own)

1.4   Data Requirements

1.5   Specific Deliverables

1.6   Guiding Questions


Case Study 2 — Predictive Modelling & Segmentation
Theme: Building models that predict outcomes and discover hidden groups  •  Total marks: 130

2.1   Overview

Machine learning transforms data into decisions. In this case study you will move beyond description and inference to build predictive and segmentation models on your own data. You will demonstrate that you understand not just how to run a model, but how to evaluate it, explain it, and connect its output to a concrete business action. You will also show that unsupervised techniques can reveal structure that supervised methods assume away.

2.2   Required Techniques

# | Technique | Book Reference
1 | Classification Model | Ch. 12–15: Logistic regression, decision tree, random forest, or XGBoost
2 | Model Evaluation & Explainability | Ch. 12, 16: Confusion matrix, ROC/AUC, SHAP values, LIME, feature importance
3 | Customer/Entity Segmentation (Clustering) | Ch. 19–21: K-Means, hierarchical, DBSCAN; silhouette score; cluster profiling
4 | Dimensionality Reduction | Ch. 22: PCA, t-SNE, or UMAP; biplot; variance explained
5 | Time Series Analysis | Ch. 23–24: Decomposition, stationarity test, ARIMA or ETS forecast
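For the model-evaluation technique, it helps to be able to derive the headline metrics from a confusion matrix by hand, since you may be asked to do so in the defence. The following is a minimal standard-library sketch for a binary classifier. The function name and the example labels are hypothetical, and in practice you would most likely use your modelling library's own metric functions.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts and derived metrics for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0

    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = customer churned, 0 = customer stayed
actual    = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
print(classification_metrics(actual, predicted))
```

Which metric matters most depends on the business cost of each error type, so link your choice to the decision the model supports.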

2.3   Business Context Examples (illustrative only)

2.4   Data Requirements

2.5   Specific Deliverables

2.6   Guiding Questions


Case Study 3 — Advanced & Operational Analytics
Theme: Specialised methods for text, risk, forecasting, and optimisation  •  Total marks: 130

3.1   Overview

The frontier of business analytics extends well beyond structured tables of numbers. Organisations generate text — customer complaints, employee surveys, board minutes, social media mentions. Decisions involve risk and uncertainty that can be quantified through simulation. Operational systems need demand forecasts and optimal allocation of constrained resources. In this case study you apply five such advanced methods to your own organisational data.

3.2   Required Techniques

# | Technique | Book Reference
1 | Text Analytics & Sentiment Analysis | Ch. 27–28: TF-IDF, bag-of-words, VADER/AFINN sentiment, topic modelling (LDA)
2 | Monte Carlo Simulation | Ch. 55: Distribution fitting, simulation workflow, P10/P50/P90, VaR, tornado chart
3 | Advanced Forecasting | Ch. 25–26: Prophet, LightGBM features, walk-forward CV, or hierarchical forecasting
4 | Customer / People Analytics | Ch. 40–44 or Ch. 53–54: RFM, CLV, churn, survival analysis, attrition drivers
5 | Optimisation or Association Rules | Ch. 18 (Apriori / FP-Growth / market basket) or Ch. 49 (LP / EOQ / transportation)
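As a rough illustration of the Monte Carlo workflow and the P10/P50/P90 summary named in the table, here is a minimal standard-library sketch. Everything in it is an assumption made for illustration: the profit model, the triangular and normal input distributions, and their parameters are hypothetical and would need to be replaced by distributions fitted to your own data. P10, P50, and P90 are treated here as the 10th, 50th, and 90th cumulative percentiles; some industries define them the other way round, so state your convention.

```python
import random
from statistics import quantiles


def simulate_profit(n_runs=10_000, seed=42):
    """Simulate monthly profit as units sold times margin per unit.

    Both input distributions are hypothetical placeholders.
    """
    rng = random.Random(seed)  # seeded so the document renders reproducibly
    outcomes = []
    for _ in range(n_runs):
        units = rng.triangular(800, 1500, 1000)  # low, high, mode
        margin = rng.gauss(250, 40)              # mean, standard deviation
        outcomes.append(units * margin)
    return outcomes


def percentile_summary(outcomes):
    """Return the 10th, 50th, and 90th percentiles of the simulated outcomes."""
    cuts = quantiles(outcomes, n=10)  # nine cut points: 10%, 20%, ..., 90%
    return {"P10": cuts[0], "P50": cuts[4], "P90": cuts[8]}


summary = percentile_summary(simulate_profit())
for label, value in summary.items():
    print(f"{label}: {value:,.0f}")
```

Fixing the seed keeps the rendered numbers stable between runs, which matters when the interpretation text in your document quotes them.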

3.3   Business Context Examples (illustrative only)

3.4   Data Requirements

3.5   Specific Deliverables

3.6   Guiding Questions


Submission, Data Privacy & Honour Code

4.1   Submission Instructions

Deadline: As announced on the LMS. Late submissions lose 5 marks per day. Extensions require documentation submitted at least 48 hours before the deadline. Contact Prof Bongo Adi at badi@lbs.edu.ng.

4.2   Defence / Viva Voce (Section C — 20 marks)

Approximately one week after the submission deadline, each student will attend a short individual defence session with Prof Bongo Adi or a designated examiner. The session lasts approximately 10–15 minutes.

Tip for defence preparation: For each of your five techniques, be ready to answer: (1) Why this technique for this data? (2) What do the key numbers mean? (3) What is the single most important business implication? (4) What assumption might be violated and how would that affect your conclusion?

4.3   Data Privacy & Ethics

If your data contains personally identifiable information (PII) — names, employee IDs, customer account numbers — you must anonymise before submission. Replace identifiers with codes (Customer_001, Employee_A, etc.). Do not publish raw financial data that your organisation treats as confidential. If in doubt, obtain written permission from your organisation before submitting and include a copy in an appendix.
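One simple way to replace identifiers with codes of the kind shown above is sketched below. It is an illustrative assumption, not a required method: the function name and the sample account numbers are hypothetical. Each distinct identifier receives the next sequential code, so repeated records for the same customer stay linked after anonymisation.

```python
def pseudonymise(identifiers, prefix="Customer"):
    """Map each distinct identifier to a sequential code such as Customer_001.

    Returns the coded list and the lookup table. Keep the lookup table
    private; publishing it would undo the anonymisation.
    """
    lookup = {}
    for raw in identifiers:
        if raw not in lookup:
            lookup[raw] = f"{prefix}_{len(lookup) + 1:03d}"
    return [lookup[raw] for raw in identifiers], lookup


# Hypothetical account numbers with one repeat customer
accounts = ["ACC-9912", "ACC-0417", "ACC-9912"]
codes, lookup = pseudonymise(accounts)
print(codes)  # ['Customer_001', 'Customer_002', 'Customer_001']
```

Codes alone do not guarantee anonymity. Rare combinations of other fields (job title, branch, dates) can still identify a person, so review those columns as well before publishing.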

4.4   Academic Integrity & AI Usage

You may use AI coding assistants (GitHub Copilot, Claude, ChatGPT) to help write code, but the analytical decisions — which technique, which model, how to interpret the output, what to recommend — must be yours. Include a brief AI usage statement at the end of your document (one paragraph) describing what you used AI for and where you made independent judgements. Presenting AI-generated interpretation as your own without disclosure constitutes academic misconduct.

4.5   Useful Resources

Resource | URL
Course textbook (see bibliography) | markanalytics.online
Quarto documentation | quarto.org/docs
RPubs publishing | rpubs.com (sign up free, publish from RStudio in one click)
Posit Connect Cloud | connect.posit.cloud (free tier for students)
GitHub Student Pack | education.github.com/pack (free Pro account)
NBS Nigeria | nigerianstat.gov.ng (household survey & labour force data)
CBN Statistics | cbn.gov.ng/documents/statbulletin.asp
Paystack Developer API | developers.paystack.com (fintech transaction data)

Bibliography & Citation Guide

All submitted work must cite sources in APA 7th edition format. The minimum required citations are listed below. Add further references as your analysis demands.

Course Textbook (required citation in every submission)

Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making — from data fundamentals to machine learning in Python and R. Lagos Business School / markanalytics.online. https://markanalytics.online

Software & Package Citations

R and Python packages must be cited. Use the commands below to retrieve the correct citation for each package you use, then format in APA style.

Software / Package | APA 7th Citation (or how to retrieve it)
R language | R Core Team. (2024). R: A language and environment for statistical computing (Version 4.x). R Foundation for Statistical Computing. https://www.R-project.org/
Python | Van Rossum, G., & Drake, F. L. (2009). Python 3 reference manual. CreateSpace. (For specific version, cite platform.python_version() output.)
Any R package | Run citation("packagename") in R and copy the BibTeX or text output. Convert to APA 7 format.
Any Python package | Cite via the package's official documentation or JOSS/PyPI entry. Include version: import pkg; pkg.__version__.
tidyverse | Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686
scikit-learn | Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
ggplot2 | Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4
pandas | McKinney, W. (2010). Data structures for statistical computing in Python. In Proceedings of the 9th Python in Science Conference (pp. 56–61). https://doi.org/10.25080/Majora-92bf1922-00a
Prophet | Taylor, S. J., & Letham, B. (2018). Forecasting at scale. The American Statistician, 72(1), 37–45. https://doi.org/10.1080/00031305.2017.1380080
SHAP | Lundberg, S. M., & Lee, S.-I. (2017). A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems 30 (pp. 4765–4774). Curran Associates.
Quarto | Allaire, J. J., Teague, C., Scheidegger, C., Xie, Y., & Dervieux, C. (2022). Quarto (Version 1.x) [Computer software]. https://doi.org/10.5281/zenodo.5960048
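If you work in Python, the version numbers needed for these citations can be collected in one step. The sketch below is one possible approach using the standard library's importlib.metadata rather than each package's __version__ attribute. The helper name is hypothetical, and the names passed in must be distribution names as installed (for example "scikit-learn", not "sklearn").

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def package_versions(names):
    """Return installed versions for the given distribution names.

    Packages that are not installed are reported as None instead of
    raising an error, so the list can be reviewed in one pass.
    """
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


print("Python", platform.python_version())
# Replace this list with the packages your analysis actually imports
for name, ver in package_versions(["pandas", "scikit-learn"]).items():
    print(name, ver if ver is not None else "not installed")
```

Running this in a chunk near the end of your Quarto document records the exact versions alongside your results, which also supports reproducibility.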

How to Cite Your Data Source

Every dataset you use must be cited. Use the appropriate template below:

Primary data you collected:
[Your Name]. (2026). [Descriptive title of dataset] [Dataset]. Collected from [Organisation/Department], [City, Nigeria]. Data available on request from the author.
Organisational records / internal report:
[Organisation Name]. (Year). [Title of report or data extract] [Internal data]. [Department], [Organisation].
Survey data:
[Your Name]. (2026). [Survey title] [Survey instrument and dataset]. Administered to [population description], [Month Year]. Ethical clearance: [details or N/A].