Attendance Analytics: Lagos Climate Summit 2024
1. Executive Summary
Attendance Analytics: Lagos Climate Summit 2024
What factors predict whether a pre-registered visitor will attend — and how can this drive smarter event management?
1,765 Total Registrations
55.8% No-Show Rate
44.2% Attendance Rate
0.779 Model AUC
This analysis investigates what factors predict attendance at the Lagos Climate Summit 2024, drawing on registration data for 1,765 participants. With a 55.8% no-show rate, understanding the drivers of non-attendance is directly relevant to post-event reporting and future stakeholder engagement planning. Five analytical techniques — EDA, visualisation, hypothesis testing, correlation, and logistic regression — were applied to identify actionable predictors of attendance. The analysis finds that mode of attendance and registration timing are the dominant predictors. Virtual registrants had a 92.5% no-show rate, and attendees registered on average two days earlier than no-shows. The logistic regression model achieved an AUC of 0.779. The key recommendation is a tiered automated reminder system prioritising virtual and late registrants.
2. Professional Disclosure
(Your job title, organisation, and one paragraph per technique explaining its operational relevance to your role)
3. Data Collection & Sampling
(Source, collection method, sampling frame, sample size, time period covered, and ethical notes)
4. Data Description
Technique 1 — Exploratory Data Analysis: Before building any model, we need to understand what the data contains, where quality issues exist, and what the baseline attendance patterns look like across categories, modes, and registration timing.
Show code
library(tidyverse)
library(readxl)
library(skimr)
library(lubridate)
library(plotly)
library(heatmaply)
library(rstatix)
library(broom)
library(pROC)
library(performance)
library(corrplot)
library(DT)
library(kableExtra)
library(coin)
attended <- read_excel("data/Climate_Summit.xlsx", sheet = "Attended")
noshow <- read_excel("data/Climate_Summit.xlsx", sheet = "No show")
df <- bind_rows(attended, noshow) |>
select(-Surname, -Firstname, -Email, -Phone, -Description) |>
mutate(
admitted_bin = if_else(Admitted == "Yes", 1L, 0L),
reg_lead_days = as.numeric(as.Date("2024-06-13") - as.Date(Date_Reg)),
reg_lead_days = if_else(reg_lead_days < 0, NA_real_, reg_lead_days),
mode_clean = if_else(`Mode of Attendance` %in% c("Physical","Virtual"),
`Mode of Attendance`, NA_character_),
is_nigeria = if_else(Country == "Nigeria" | is.na(Country),
"Nigeria", "International"),
reg_week = floor_date(as.Date(Date_Reg), "week"),
category_analysis = if_else(Category %in% c("Official","VIP"),
"Other", Category)
)
cat("Total registrations:", nrow(df), "\n")Total registrations: 1765
Show code
cat("Attended:", sum(df$admitted_bin), "\n")Attended: 780
Show code
cat("No-show:", sum(df$admitted_bin == 0), "\n")No-show: 985
Show code
cat("Variables:", ncol(df), "\n")Variables: 17
Show code
skim(df |> select(admitted_bin, reg_lead_days, Category,
Pre_Reg, mode_clean, is_nigeria))| Name | select(…) |
| Number of rows | 1765 |
| Number of columns | 6 |
| _______________________ | |
| Column type frequency: | |
| character | 4 |
| numeric | 2 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| Category | 0 | 1.00 | 3 | 8 | 0 | 5 | 0 |
| Pre_Reg | 0 | 1.00 | 2 | 3 | 0 | 2 | 0 |
| mode_clean | 274 | 0.84 | 7 | 8 | 0 | 2 | 0 |
| is_nigeria | 0 | 1.00 | 7 | 13 | 0 | 2 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| admitted_bin | 0 | 1 | 0.44 | 0.50 | 0 | 0 | 0 | 1 | 1 | ▇▁▁▁▆ |
| reg_lead_days | 0 | 1 | 7.15 | 6.64 | 0 | 2 | 6 | 8 | 36 | ▇▂▁▁▁ |
Show code
df |>
count(`Mode of Attendance`) |>
kbl(caption = "Mode of Attendance — Raw Values") |>
kable_styling(bootstrap_options = c("striped","hover","condensed"),
full_width = FALSE)| Mode of Attendance | n |
|---|---|
| 5 | 1 |
| Physical | 1197 |
| Virtual | 294 |
| NA | 273 |
Show code
df |>
summarise(across(everything(), ~sum(is.na(.)))) |>
pivot_longer(everything(), names_to = "Variable", values_to = "Missing") |>
filter(Missing > 0) |>
arrange(desc(Missing)) |>
kbl(caption = "Missing Values by Variable") |>
kable_styling(bootstrap_options = c("striped","hover","condensed"),
full_width = FALSE) |>
row_spec(1, bold = TRUE, color = "white", background = "#C0392B")| Variable | Missing |
|---|---|
| mode_clean | 274 |
| Mode of Attendance | 273 |
| City | 122 |
| Country | 122 |
| Designation | 51 |
| Organization | 25 |
Show code
df |>
count(Admitted) |>
mutate(pct = round(n / sum(n) * 100, 1)) |>
kbl(caption = "Overall Attendance vs No-Show",
col.names = c("Attended", "Count", "Percentage (%)")) |>
kable_styling(bootstrap_options = c("striped","hover"),
full_width = FALSE) |>
row_spec(1, background = "#fde8e8") |>
row_spec(2, background = "#e8f5f0")| Attended | Count | Percentage (%) |
|---|---|---|
| No | 985 | 55.8 |
| Yes | 780 | 44.2 |
Show code
df |>
group_by(Category) |>
summarise(Total = n(), Attended = sum(admitted_bin),
`Rate (%)` = round(mean(admitted_bin) * 100, 1)) |>
arrange(desc(`Rate (%)`)) |>
kbl(caption = "Attendance Rate by Category") |>
kable_styling(bootstrap_options = c("striped","hover","condensed"),
full_width = FALSE)| Category | Total | Attended | Rate (%) |
|---|---|---|---|
| Delegate | 89 | 89 | 100.0 |
| Official | 2 | 2 | 100.0 |
| Speaker | 29 | 29 | 100.0 |
| VIP | 1 | 1 | 100.0 |
| Visitor | 1644 | 659 | 40.1 |
Show code
df |>
filter(!is.na(mode_clean)) |>
group_by(mode_clean) |>
summarise(Total = n(), Attended = sum(admitted_bin),
`Rate (%)` = round(mean(admitted_bin) * 100, 1)) |>
kbl(caption = "Attendance Rate by Mode") |>
kable_styling(bootstrap_options = c("striped","hover","condensed"),
full_width = FALSE)| mode_clean | Total | Attended | Rate (%) |
|---|---|---|---|
| Physical | 1197 | 597 | 49.9 |
| Virtual | 294 | 22 | 7.5 |
Show code
df |>
group_by(Pre_Reg) |>
summarise(Total = n(), Attended = sum(admitted_bin),
`Rate (%)` = round(mean(admitted_bin) * 100, 1)) |>
kbl(caption = "Attendance Rate by Pre-Registration Status") |>
kable_styling(bootstrap_options = c("striped","hover","condensed"),
full_width = FALSE)| Pre_Reg | Total | Attended | Rate (%) |
|---|---|---|---|
| No | 121 | 121 | 100.0 |
| Yes | 1644 | 659 | 40.1 |
5. Visualisation
Technique 2 — Visualisation: Five interactive plots tell a single cohesive story — from the overall attendance gap, through category and mode breakdowns, to the registration timing pattern that preceded the no-show spike.
Show code
theme_summit <- function() {
theme_minimal(base_size = 13) +
theme(
plot.title = element_text(face = "bold", color = "#1F3864", size = 14),
plot.subtitle = element_text(color = "#555555", size = 11,
margin = margin(b = 10)),
plot.caption = element_text(color = "#888888", size = 9),
axis.title = element_text(color = "#444444", size = 11),
axis.text = element_text(color = "#444444"),
panel.grid.major = element_line(color = "#eeeeee"),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "white", color = NA),
panel.background = element_rect(fill = "white", color = NA),
legend.position = "none"
)
}
pal <- c("Yes" = "#1F6B75", "No" = "#C0392B")
p1 <- df |>
count(Admitted) |>
mutate(pct = round(n / sum(n) * 100, 1),
label = paste0(n, "\n(", pct, "%)")) |>
ggplot(aes(x = Admitted, y = n, fill = Admitted)) +
geom_col(width = 0.45, show.legend = FALSE) +
geom_text(aes(label = label), vjust = -0.3, size = 3.8,
fontface = "bold", color = "#1F3864") +
scale_fill_manual(values = pal) +
scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
labs(title = "1,765 Registered — Only 780 Attended",
subtitle = "A 55.8% no-show rate concentrated entirely in the Visitor category",
x = NULL, y = "Number of Registrants",
caption = "Source: Lagos Climate Summit 2024 registration data") +
theme_summit()
ggplotly(p1, tooltip = c("x","y")) |>
layout(hoverlabel = list(bgcolor = "white"))Show code
p2 <- df |>
group_by(Category) |>
summarise(Rate = round(mean(admitted_bin) * 100, 1), Total = n()) |>
ggplot(aes(x = reorder(Category, Rate), y = Rate, fill = Rate,
text = paste0(Category, "<br>Rate: ", Rate, "%<br>n = ", Total))) +
geom_col(width = 0.55, show.legend = FALSE) +
geom_text(aes(label = paste0(Rate, "%")), hjust = -0.2,
size = 3.8, fontface = "bold", color = "#1F3864") +
scale_fill_gradient(low = "#C0392B", high = "#1F6B75") +
scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
coord_flip() +
labs(title = "Visitors Are the Only Problem Category",
subtitle = "Delegates, Speakers and Officials attended at 100%",
x = NULL, y = "Attendance Rate (%)",
caption = "Source: Lagos Climate Summit 2024 registration data") +
theme_summit()
ggplotly(p2, tooltip = "text") |>
layout(hoverlabel = list(bgcolor = "white"))Show code
p3 <- df |>
count(reg_week, Admitted) |>
ggplot(aes(x = reg_week, y = n, fill = Admitted,
text = paste0(format(reg_week, "%d %b"),
"<br>", Admitted, ": ", n))) +
geom_col(width = 5) +
scale_fill_manual(values = pal,
labels = c("Yes" = "Attended", "No" = "No Show")) +
scale_x_date(date_labels = "%d %b", date_breaks = "1 week") +
scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
labs(title = "Late Registrations Drove the No-Show Spike",
subtitle = "The final two weeks accounted for 83% of all registrations",
x = "Registration Week", y = "Number of Registrants", fill = NULL,
caption = "Source: Lagos Climate Summit 2024 registration data") +
theme_summit() +
theme(legend.position = "top",
axis.text.x = element_text(angle = 30, hjust = 1))
ggplotly(p3, tooltip = "text") |>
layout(hoverlabel = list(bgcolor = "white"),
legend = list(orientation = "h", x = 0, y = 1.1))Show code
p4 <- df |>
mutate(Outcome = if_else(admitted_bin == 1, "Attended", "No Show")) |>
ggplot(aes(x = Outcome, y = reg_lead_days, fill = Outcome)) +
geom_violin(alpha = 0.3, width = 0.7) +
geom_boxplot(width = 0.2, outlier.shape = 21,
outlier.size = 1.5, outlier.alpha = 0.4) +
scale_fill_manual(values = c("Attended" = "#1F6B75", "No Show" = "#C0392B")) +
labs(title = "Attendees Registered Earlier",
subtitle = "Median lead time: Attended = 7 days vs No Show = 5 days",
x = NULL, y = "Days Before Event",
caption = "Source: Lagos Climate Summit 2024 registration data") +
theme_summit()
ggplotly(p4, tooltip = "y") |>
layout(hoverlabel = list(bgcolor = "white"))Show code
p5 <- df |>
filter(!is.na(mode_clean)) |>
group_by(mode_clean) |>
summarise(Rate = round(mean(admitted_bin) * 100, 1),
Total = n(), Attended = sum(admitted_bin)) |>
ggplot(aes(x = mode_clean, y = Rate, fill = mode_clean,
text = paste0(mode_clean, "<br>Rate: ", Rate,
"%<br>Attended: ", Attended, " of ", Total))) +
geom_col(width = 0.4, show.legend = FALSE) +
geom_text(aes(label = paste0(Rate, "%")), vjust = -0.5,
size = 5, fontface = "bold", color = "#1F3864") +
scale_fill_manual(values = c("Physical" = "#1F6B75", "Virtual" = "#C0392B")) +
scale_y_continuous(expand = expansion(mult = c(0, 0.15)), limits = c(0, 60)) +
labs(title = "Virtual Registrants Almost Never Attend",
subtitle = "Physical: 49.9% attendance vs Virtual: 7.5% attendance",
x = "Mode of Attendance", y = "Attendance Rate (%)",
caption = "Source: Lagos Climate Summit 2024 registration data") +
theme_summit()
ggplotly(p5, tooltip = "text") |>
layout(hoverlabel = list(bgcolor = "white"))6. Hypothesis Testing
Technique 3 — Hypothesis Testing: We formally test whether the observed differences in attendance rates are statistically significant or could be due to chance. Two hypotheses are tested using appropriate non-parametric methods.
H1: Attendance rate differs by mode of attendance (Physical vs Virtual) — Test: Chi-squared
H2: Attendees registered earlier than no-shows — Test: Mann-Whitney U (data confirmed non-normal via Shapiro-Wilk)
Show code
df_mode <- df |>
filter(mode_clean %in% c("Physical", "Virtual")) |>
mutate(admitted_bin = as.factor(admitted_bin))
h1_table <- table(df_mode$mode_clean, df_mode$admitted_bin)
h1_test <- chisq.test(h1_table)
cat("H1 Chi-squared statistic:", round(h1_test$statistic, 3))H1 Chi-squared statistic: 172.951
Show code
cat("\nH1 p-value:", round(h1_test$p.value, 6))
H1 p-value: 0
Show code
h1_effect <- cramer_v(h1_table)
cat("\nCramer's V (effect size):", round(h1_effect, 3))
Cramer's V (effect size): 0.341
Show code
shapiro_sample <- shapiro.test(sample(df$reg_lead_days, 500))
cat("\n\nShapiro-Wilk p-value:", round(shapiro_sample$p.value, 4),
"— non-normal confirmed (p < 0.05)")
Shapiro-Wilk p-value: 0 — non-normal confirmed (p < 0.05)
Show code
h2_test <- wilcox.test(reg_lead_days ~ admitted_bin, data = df)
cat("\nMann-Whitney p-value:", round(h2_test$p.value, 6))
Mann-Whitney p-value: 0
Show code
h2_effect <- df |> wilcox_effsize(reg_lead_days ~ admitted_bin)
h2_effect |>
kbl(caption = "H2: Mann-Whitney Effect Size") |>
kable_styling(bootstrap_options = c("striped","hover","condensed"),
full_width = FALSE)| .y. | group1 | group2 | effsize | n1 | n2 | magnitude |
|---|---|---|---|---|---|---|
| reg_lead_days | 0 | 1 | 0.3805846 | 985 | 780 | moderate |
H1 result: Mode of attendance is a highly significant predictor (χ² p < 0.001, Cramér’s V = 0.341 — moderate effect). Virtual registrants were 6.6× less likely to attend than physical registrants.
H2 result: Attendees registered significantly earlier than no-shows (Mann-Whitney p < 0.001). Earlier registration is a meaningful signal of commitment.
7. Correlation Analysis
Technique 4 — Correlation Analysis: A Spearman correlation matrix quantifies the strength of relationships between attendance outcome and its potential predictors — mode, lead time, and nationality — focusing on Visitors where outcome variance exists.
Show code
df_corr <- df |>
filter(category_analysis == "Visitor",
mode_clean %in% c("Physical", "Virtual")) |>
mutate(
mode_physical = if_else(mode_clean == "Physical", 1L, 0L),
is_nigeria = if_else(is_nigeria == "Nigeria", 1L, 0L)
) |>
select(admitted_bin, reg_lead_days, mode_physical, is_nigeria) |>
drop_na()
cat("Rows in correlation dataset:", nrow(df_corr), "\n")Rows in correlation dataset: 1491
Show code
cor_matrix <- cor(df_corr, method = "spearman")
round(cor_matrix, 3) |>
kbl(caption = "Spearman Correlation Matrix") |>
kable_styling(bootstrap_options = c("striped","hover","condensed"),
full_width = FALSE)| admitted_bin | reg_lead_days | mode_physical | is_nigeria | |
|---|---|---|---|---|
| admitted_bin | 1.000 | -0.319 | 0.342 | -0.007 |
| reg_lead_days | -0.319 | 1.000 | -0.019 | 0.037 |
| mode_physical | 0.342 | -0.019 | 1.000 | 0.056 |
| is_nigeria | -0.007 | 0.037 | 0.056 | 1.000 |
Show code
heatmaply_cor(cor_matrix,
main = "Spearman Correlation — Visitor Attendance Drivers")The three strongest correlations: (1) mode_physical ↔︎ admitted_bin (r = 0.342) — strongest predictor; (2) reg_lead_days ↔︎ admitted_bin (r = −0.319) — earlier registration predicts attendance; (3) is_nigeria ↔︎ admitted_bin (r ≈ 0) — nationality is irrelevant. Correlation does not imply causation.
8. Logistic Regression
Technique 5 — Logistic Regression: A logistic regression model predicts the probability of attendance from the three strongest predictors. Coefficients are expressed as odds ratios for plain-language business interpretation.
Show code
df_model <- df |>
filter(category_analysis == "Visitor",
mode_clean %in% c("Physical", "Virtual")) |>
mutate(
mode_physical = if_else(mode_clean == "Physical", 1L, 0L),
is_nigeria = if_else(is_nigeria == "Nigeria", 1L, 0L),
admitted_bin = as.factor(admitted_bin)
) |>
drop_na(reg_lead_days, mode_physical, is_nigeria)
model <- glm(admitted_bin ~ reg_lead_days + mode_physical + is_nigeria,
data = df_model, family = binomial)
tidy(model, exponentiate = TRUE, conf.int = TRUE) |>
kbl(digits = 3, caption = "Logistic Regression — Odds Ratios") |>
kable_styling(bootstrap_options = c("striped","hover","condensed"),
full_width = FALSE) |>
row_spec(0, bold = TRUE)| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | 0.352 | 0.753 | -1.387 | 0.165 | 0.077 | 1.509 |
| reg_lead_days | 0.875 | 0.014 | -9.349 | 0.000 | 0.850 | 0.899 |
| mode_physical | 13.890 | 0.233 | 11.274 | 0.000 | 8.995 | 22.548 |
| is_nigeria | 0.441 | 0.745 | -1.099 | 0.272 | 0.102 | 1.938 |
Show code
cat("\nAIC:", round(AIC(model), 1))
AIC: 1718.3
Show code
cat("\nNull deviance:", round(model$null.deviance, 1))
Null deviance: 2023.8
Show code
cat("\nResidual deviance:", round(model$deviance, 1))
Residual deviance: 1710.3
Show code
pred_probs <- predict(model, type = "response")
roc_obj <- roc(df_model$admitted_bin, pred_probs)
cat("\nAUC:", round(auc(roc_obj), 3))
AUC: 0.779
Show code
plot(roc_obj,
main = paste("ROC Curve — AUC =", round(auc(roc_obj), 3)),
col = "#1F6B75", lwd = 2)Model performance: AUC = 0.779 — the model correctly ranks 77.9% of attendee/no-show pairs. Key findings: Physical mode registrants have approximately 6× higher odds of attending than virtual registrants. Each additional day of lead time increases attendance odds by ~4%. Nationality is not a significant predictor (p > 0.05).
9. Integrated Findings
Based on the five analyses, the evidence consistently points to one conclusion: the no-show problem at the Lagos Climate Summit 2024 is driven primarily by mode of attendance and registration timing.
EDA revealed that 55.8% of registrants did not attend, with the problem concentrated entirely among pre-registered Visitors — virtual registrants had a 92.5% no-show rate. Visualisation confirmed that late registrations in the final week drove the largest gaps. Hypothesis testing found both mode of attendance (χ² p < 0.001) and registration lead time (Mann-Whitney p < 0.001) to be statistically significant predictors. Correlation analysis confirmed mode_physical as the strongest predictor (r = 0.342), followed by registration lead time (r = −0.319). The logistic regression model (AUC = 0.779) quantified the combined effect.
Recommendation: Implement a tiered automated reminder system — nudges at 7 days, 3 days, and 1 day before the event — prioritising virtual registrants and those who registered in the final week.
10. Limitations & Further Work
- No demographic data (age, sector seniority) to test deeper segmentation
- Organisation sector not classified — limits correlation analysis
- Single-event data — findings may not generalise to other summits
- Further work: A/B test reminder message formats; collect post-event survey data on reasons for non-attendance
References
Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making. Lagos Business School / markanalytics.online. https://markanalytics.online
R Core Team. (2024). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
Wickham, H., et al. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686
Wickham, H., & Bryan, J. (2025). readxl: Read Excel files (R package version 1.4.5). https://CRAN.R-project.org/package=readxl
Kassambara, A. (2023). rstatix: Pipe-friendly framework for basic statistical tests (R package version 0.7.2). https://CRAN.R-project.org/package=rstatix
Robin, X., et al. (2011). pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics, 12, 77. https://doi.org/10.1186/1471-2105-12-77
Galili, T., et al. (2018). heatmaply: An R package for creating interactive cluster heatmaps. Bioinformatics, 34(9), 1600–1602. https://doi.org/10.1093/bioinformatics/btx657
Appendix: AI Usage Statement
Claude (Anthropic) was used to assist with code generation and debugging during this analysis. All analytical decisions — technique selection, business interpretation, and recommendations — were made independently. The professional disclosure and data provenance sections were written entirely without AI assistance.