Attendance Analytics: Lagos Climate Summit 2024

Author

Olabimpe Olajide

Published

May 8, 2026

1. Executive Summary


What factors predict whether a pre-registered visitor will attend — and how can this drive smarter event management?

1,765 Total Registrations

55.8% No-Show Rate

44.2% Attendance Rate

0.779 Model AUC

This analysis investigates which factors predict attendance at the Lagos Climate Summit 2024, using registration data for 1,765 participants. With a 55.8% no-show rate, understanding the drivers of non-attendance matters for post-event reporting and for planning future stakeholder engagement. Five analytical techniques (EDA, visualisation, hypothesis testing, correlation, and logistic regression) were applied to identify actionable predictors of attendance. Mode of attendance and registration timing emerge as the dominant predictors: virtual registrants had a 92.5% no-show rate, and the median attendee registered two days earlier than the median no-show (7 vs 5 days before the event). The logistic regression model achieved an AUC of 0.779. The key recommendation is a tiered automated reminder system prioritising virtual and late registrants.

2. Professional Disclosure

(Your job title, organisation, and one paragraph per technique explaining its operational relevance to your role)

3. Data Collection & Sampling

(Source, collection method, sampling frame, sample size, time period covered, and ethical notes)

4. Data Description

Technique 1 — Exploratory Data Analysis: Before building any model, we need to understand what the data contains, where quality issues exist, and what the baseline attendance patterns look like across categories, modes, and registration timing.

library(tidyverse)
library(readxl)
library(skimr)
library(lubridate)
library(plotly)
library(heatmaply)
library(rstatix)
library(broom)
library(pROC)
library(performance)
library(corrplot)
library(DT)
library(kableExtra)
library(coin)

attended <- read_excel("data/Climate_Summit.xlsx", sheet = "Attended")
noshow   <- read_excel("data/Climate_Summit.xlsx", sheet = "No show")

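df <- bind_rows(attended, noshow) |>
  # Drop personally identifying columns before analysis
  select(-Surname, -Firstname, -Email, -Phone, -Description) |>
  mutate(
    # Binary outcome: 1 = attended, 0 = no-show
    admitted_bin      = if_else(Admitted == "Yes", 1L, 0L),
    # Lead time in days, counted back from 2024-06-13 (the reference date used here);
    # registrations dated after that date become NA
    reg_lead_days     = as.numeric(as.Date("2024-06-13") - as.Date(Date_Reg)),
    reg_lead_days     = if_else(reg_lead_days < 0, NA_real_, reg_lead_days),
    # Keep only the two valid modes; stray values (e.g. "5") become NA
    mode_clean        = if_else(`Mode of Attendance` %in% c("Physical","Virtual"),
                                `Mode of Attendance`, NA_character_),
    # Note: a missing Country is treated as Nigeria
    is_nigeria        = if_else(Country == "Nigeria" | is.na(Country),
                                "Nigeria", "International"),
    reg_week          = floor_date(as.Date(Date_Reg), "week"),
    # Collapse the very small Official and VIP groups into "Other"
    category_analysis = if_else(Category %in% c("Official","VIP"),
                                "Other", Category)
  )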

cat("Total registrations:", nrow(df), "\n")
Total registrations: 1765 
cat("Attended:", sum(df$admitted_bin), "\n")
Attended: 780 
cat("No-show:", sum(df$admitted_bin == 0), "\n")
No-show: 985 
cat("Variables:", ncol(df), "\n")
Variables: 17 
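
The lead-time and mode-cleaning rules in the preparation step can be checked on a handful of values. This is a minimal base-R sketch: the four dates and the `mode_raw` vector are made up for illustration, and only the 2024-06-13 reference date and the two valid mode labels come from the code above.

```r
# Toy registration dates (illustrative only, not taken from the summit data)
date_reg   <- as.Date(c("2024-05-08", "2024-06-07", "2024-06-13", "2024-06-15"))
event_date <- as.Date("2024-06-13")  # same reference date as the preparation code

# Days between registration and the reference date; larger = earlier registration
reg_lead_days <- as.numeric(event_date - date_reg)

# Registrations dated after the reference date give negative values: set to missing
reg_lead_days[reg_lead_days < 0] <- NA_real_
print(reg_lead_days)

# Mode cleaning: anything other than the two valid labels becomes NA
mode_raw   <- c("Physical", "Virtual", "5", NA)
mode_clean <- ifelse(mode_raw %in% c("Physical", "Virtual"), mode_raw, NA_character_)
print(mode_clean)
```

Because `reg_lead_days` counts days before the event, a larger value always means an earlier registration. That matters when reading the sign of any coefficient on this variable.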
skim(df |> select(admitted_bin, reg_lead_days, Category,
                  Pre_Reg, mode_clean, is_nigeria))
Data summary
Name                      select(…)
Number of rows            1765
Number of columns         6
Column type frequency:
  character               4
  numeric                 2
Group variables           None

Variable type: character

skim_variable   n_missing   complete_rate   min   max   empty   n_unique   whitespace
Category                0            1.00     3     8       0          5            0
Pre_Reg                 0            1.00     2     3       0          2            0
mode_clean            274            0.84     7     8       0          2            0
is_nigeria              0            1.00     7    13       0          2            0

Variable type: numeric

skim_variable   n_missing   complete_rate   mean     sd   p0   p25   p50   p75   p100   hist
admitted_bin            0               1   0.44   0.50    0     0     0     1      1   ▇▁▁▁▆
reg_lead_days           0               1   7.15   6.64    0     2     6     8     36   ▇▂▁▁▁
df |>
  count(`Mode of Attendance`) |>
  kbl(caption = "Mode of Attendance — Raw Values") |>
  kable_styling(bootstrap_options = c("striped","hover","condensed"),
                full_width = FALSE)
Mode of Attendance — Raw Values
Mode of Attendance      n
5                       1
Physical             1197
Virtual               294
NA                    273
df |>
  summarise(across(everything(), ~sum(is.na(.)))) |>
  pivot_longer(everything(), names_to = "Variable", values_to = "Missing") |>
  filter(Missing > 0) |>
  arrange(desc(Missing)) |>
  kbl(caption = "Missing Values by Variable") |>
  kable_styling(bootstrap_options = c("striped","hover","condensed"),
                full_width = FALSE) |>
  row_spec(1, bold = TRUE, color = "white", background = "#C0392B")
Missing Values by Variable
Variable             Missing
mode_clean               274
Mode of Attendance       273
City                     122
Country                  122
Designation               51
Organization              25
df |>
  count(Admitted) |>
  mutate(pct = round(n / sum(n) * 100, 1)) |>
  kbl(caption = "Overall Attendance vs No-Show",
      col.names = c("Attended", "Count", "Percentage (%)")) |>
  kable_styling(bootstrap_options = c("striped","hover"),
                full_width = FALSE) |>
  row_spec(1, background = "#fde8e8") |>
  row_spec(2, background = "#e8f5f0")
Overall Attendance vs No-Show
Attended   Count   Percentage (%)
No           985             55.8
Yes          780             44.2
df |>
  group_by(Category) |>
  summarise(Total = n(), Attended = sum(admitted_bin),
            `Rate (%)` = round(mean(admitted_bin) * 100, 1)) |>
  arrange(desc(`Rate (%)`)) |>
  kbl(caption = "Attendance Rate by Category") |>
  kable_styling(bootstrap_options = c("striped","hover","condensed"),
                full_width = FALSE)
Attendance Rate by Category
Category   Total   Attended   Rate (%)
Delegate      89         89      100.0
Official       2          2      100.0
Speaker       29         29      100.0
VIP            1          1      100.0
Visitor     1644        659       40.1
df |>
  filter(!is.na(mode_clean)) |>
  group_by(mode_clean) |>
  summarise(Total = n(), Attended = sum(admitted_bin),
            `Rate (%)` = round(mean(admitted_bin) * 100, 1)) |>
  kbl(caption = "Attendance Rate by Mode") |>
  kable_styling(bootstrap_options = c("striped","hover","condensed"),
                full_width = FALSE)
Attendance Rate by Mode
mode_clean   Total   Attended   Rate (%)
Physical      1197        597       49.9
Virtual        294         22        7.5
df |>
  group_by(Pre_Reg) |>
  summarise(Total = n(), Attended = sum(admitted_bin),
            `Rate (%)` = round(mean(admitted_bin) * 100, 1)) |>
  kbl(caption = "Attendance Rate by Pre-Registration Status") |>
  kable_styling(bootstrap_options = c("striped","hover","condensed"),
                full_width = FALSE)
Attendance Rate by Pre-Registration Status
Pre_Reg   Total   Attended   Rate (%)
No          121        121      100.0
Yes        1644        659       40.1
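
The headline figures can be cross-checked against the category table alone. The sketch below re-enters the reported counts by hand (it does not read the workbook) and recomputes the overall no-show rate and the share of no-shows that are Visitors. The data frame and column names are illustrative.

```r
# Counts copied from the "Attendance Rate by Category" table above
category <- data.frame(
  Category = c("Delegate", "Official", "Speaker", "VIP", "Visitor"),
  Total    = c(89, 2, 29, 1, 1644),
  Attended = c(89, 2, 29, 1, 659)
)

category$NoShow  <- category$Total - category$Attended
category$RatePct <- round(100 * category$Attended / category$Total, 1)

# Overall no-show rate across all registrants
overall_no_show_pct <- round(100 * sum(category$NoShow) / sum(category$Total), 1)

# Share of all no-shows that come from the Visitor category
visitor_share <- category$NoShow[category$Category == "Visitor"] / sum(category$NoShow)

print(category)
cat("Overall no-show rate (%):", overall_no_show_pct, "\n")
cat("Visitor share of no-shows:", visitor_share, "\n")
```

Every non-Visitor category attended in full, so all of the outcome variance sits in the Visitor group. This is why the correlation and regression sections restrict the data to Visitors.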

5. Visualisation

Technique 2 — Visualisation: Five interactive plots tell a single cohesive story — from the overall attendance gap, through category and mode breakdowns, to the registration timing pattern that preceded the no-show spike.

theme_summit <- function() {
  theme_minimal(base_size = 13) +
    theme(
      plot.title       = element_text(face = "bold", color = "#1F3864", size = 14),
      plot.subtitle    = element_text(color = "#555555", size = 11,
                                      margin = margin(b = 10)),
      plot.caption     = element_text(color = "#888888", size = 9),
      axis.title       = element_text(color = "#444444", size = 11),
      axis.text        = element_text(color = "#444444"),
      panel.grid.major = element_line(color = "#eeeeee"),
      panel.grid.minor = element_blank(),
      plot.background  = element_rect(fill = "white", color = NA),
      panel.background = element_rect(fill = "white", color = NA),
      legend.position  = "none"
    )
}

pal <- c("Yes" = "#1F6B75", "No" = "#C0392B")

p1 <- df |>
  count(Admitted) |>
  mutate(pct   = round(n / sum(n) * 100, 1),
         label = paste0(n, "\n(", pct, "%)")) |>
  ggplot(aes(x = Admitted, y = n, fill = Admitted)) +
  geom_col(width = 0.45, show.legend = FALSE) +
  geom_text(aes(label = label), vjust = -0.3, size = 3.8,
            fontface = "bold", color = "#1F3864") +
  scale_fill_manual(values = pal) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  labs(title    = "1,765 Registered — Only 780 Attended",
       subtitle = "A 55.8% no-show rate concentrated entirely in the Visitor category",
       x = NULL, y = "Number of Registrants",
       caption  = "Source: Lagos Climate Summit 2024 registration data") +
  theme_summit()
ggplotly(p1, tooltip = c("x","y")) |>
  layout(hoverlabel = list(bgcolor = "white"))
p2 <- df |>
  group_by(Category) |>
  summarise(Rate = round(mean(admitted_bin) * 100, 1), Total = n()) |>
  ggplot(aes(x = reorder(Category, Rate), y = Rate, fill = Rate,
             text = paste0(Category, "<br>Rate: ", Rate, "%<br>n = ", Total))) +
  geom_col(width = 0.55, show.legend = FALSE) +
  geom_text(aes(label = paste0(Rate, "%")), hjust = -0.2,
            size = 3.8, fontface = "bold", color = "#1F3864") +
  scale_fill_gradient(low = "#C0392B", high = "#1F6B75") +
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  coord_flip() +
  labs(title    = "Visitors Are the Only Problem Category",
       subtitle = "Delegates, Speakers and Officials attended at 100%",
       x = NULL, y = "Attendance Rate (%)",
       caption  = "Source: Lagos Climate Summit 2024 registration data") +
  theme_summit()
ggplotly(p2, tooltip = "text") |>
  layout(hoverlabel = list(bgcolor = "white"))
p3 <- df |>
  count(reg_week, Admitted) |>
  ggplot(aes(x = reg_week, y = n, fill = Admitted,
             text = paste0(format(reg_week, "%d %b"),
                           "<br>", Admitted, ": ", n))) +
  geom_col(width = 5) +
  scale_fill_manual(values = pal,
                    labels = c("Yes" = "Attended", "No" = "No Show")) +
  scale_x_date(date_labels = "%d %b", date_breaks = "1 week") +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  labs(title    = "Late Registrations Drove the No-Show Spike",
       subtitle = "The final two weeks accounted for 83% of all registrations",
       x = "Registration Week", y = "Number of Registrants", fill = NULL,
       caption  = "Source: Lagos Climate Summit 2024 registration data") +
  theme_summit() +
  theme(legend.position = "top",
        axis.text.x     = element_text(angle = 30, hjust = 1))
ggplotly(p3, tooltip = "text") |>
  layout(hoverlabel = list(bgcolor = "white"),
         legend     = list(orientation = "h", x = 0, y = 1.1))
p4 <- df |>
  mutate(Outcome = if_else(admitted_bin == 1, "Attended", "No Show")) |>
  ggplot(aes(x = Outcome, y = reg_lead_days, fill = Outcome)) +
  geom_violin(alpha = 0.3, width = 0.7) +
  geom_boxplot(width = 0.2, outlier.shape = 21,
               outlier.size = 1.5, outlier.alpha = 0.4) +
  scale_fill_manual(values = c("Attended" = "#1F6B75", "No Show" = "#C0392B")) +
  labs(title    = "Attendees Registered Earlier",
       subtitle = "Median lead time: Attended = 7 days vs No Show = 5 days",
       x = NULL, y = "Days Before Event",
       caption  = "Source: Lagos Climate Summit 2024 registration data") +
  theme_summit()
ggplotly(p4, tooltip = "y") |>
  layout(hoverlabel = list(bgcolor = "white"))
p5 <- df |>
  filter(!is.na(mode_clean)) |>
  group_by(mode_clean) |>
  summarise(Rate = round(mean(admitted_bin) * 100, 1),
            Total = n(), Attended = sum(admitted_bin)) |>
  ggplot(aes(x = mode_clean, y = Rate, fill = mode_clean,
             text = paste0(mode_clean, "<br>Rate: ", Rate,
                           "%<br>Attended: ", Attended, " of ", Total))) +
  geom_col(width = 0.4, show.legend = FALSE) +
  geom_text(aes(label = paste0(Rate, "%")), vjust = -0.5,
            size = 5, fontface = "bold", color = "#1F3864") +
  scale_fill_manual(values = c("Physical" = "#1F6B75", "Virtual" = "#C0392B")) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.15)), limits = c(0, 60)) +
  labs(title    = "Virtual Registrants Almost Never Attend",
       subtitle = "Physical: 49.9% attendance vs Virtual: 7.5% attendance",
       x = "Mode of Attendance", y = "Attendance Rate (%)",
       caption  = "Source: Lagos Climate Summit 2024 registration data") +
  theme_summit()
ggplotly(p5, tooltip = "text") |>
  layout(hoverlabel = list(bgcolor = "white"))

6. Hypothesis Testing

Technique 3 — Hypothesis Testing: We formally test whether the observed differences in attendance rates are statistically significant or could be due to chance. Two hypotheses are tested using appropriate non-parametric methods.

H1: Attendance rate differs by mode of attendance (Physical vs Virtual) — Test: Chi-squared

H2: Attendees registered earlier than no-shows — Test: Mann-Whitney U (data confirmed non-normal via Shapiro-Wilk)

df_mode <- df |>
  filter(mode_clean %in% c("Physical", "Virtual")) |>
  mutate(admitted_bin = as.factor(admitted_bin))

h1_table <- table(df_mode$mode_clean, df_mode$admitted_bin)
h1_test  <- chisq.test(h1_table)
cat("H1 Chi-squared statistic:", round(h1_test$statistic, 3))
H1 Chi-squared statistic: 172.951
# format.pval() avoids printing a very small p-value as a misleading 0
cat("\nH1 p-value:", format.pval(h1_test$p.value, eps = 1e-6))

H1 p-value: <1e-06
h1_effect <- cramer_v(h1_table)
cat("\nCramer's V (effect size):", round(h1_effect, 3))

Cramer's V (effect size): 0.341
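# shapiro.test() accepts at most 5000 values; a random subsample of 500 is used here.
# Fix the seed so the subsample (and the p-value) is reproducible.
set.seed(2024)
shapiro_sample <- shapiro.test(sample(df$reg_lead_days, 500))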
cat("\n\nShapiro-Wilk p-value:", round(shapiro_sample$p.value, 4),
    "— non-normal confirmed (p < 0.05)")


Shapiro-Wilk p-value: 0 — non-normal confirmed (p < 0.05)
h2_test <- wilcox.test(reg_lead_days ~ admitted_bin, data = df)
# format.pval() avoids printing a very small p-value as a misleading 0
cat("\nMann-Whitney p-value:", format.pval(h2_test$p.value, eps = 1e-6))

Mann-Whitney p-value: <1e-06
h2_effect <- df |> wilcox_effsize(reg_lead_days ~ admitted_bin)
h2_effect |>
  kbl(caption = "H2: Mann-Whitney Effect Size") |>
  kable_styling(bootstrap_options = c("striped","hover","condensed"),
                full_width = FALSE)
H2: Mann-Whitney Effect Size
.y.             group1   group2     effsize    n1    n2   magnitude
reg_lead_days        0        1   0.3805846   985   780   moderate

H1 result: Mode of attendance is strongly associated with attendance (χ² p < 0.001; Cramér’s V = 0.341, a moderate effect). Virtual registrants were about 6.7× less likely to attend than physical registrants (49.9% vs 7.5% attendance).
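
The H1 figures can be re-derived from the counts in the "Attendance Rate by Mode" table alone. The sketch below assumes the 2×2 table contains exactly those counts. The object names (`h1_counts`, `risk_ratio`) are illustrative, and Cramér's V is computed by hand from the chi-squared statistic rather than with `cramer_v()`.

```r
# 2x2 table rebuilt from the reported counts (rows: mode, columns: outcome)
h1_counts <- matrix(
  c(1197 - 597, 597,   # Physical: no-show, attended
     294 - 22,   22),  # Virtual:  no-show, attended
  nrow = 2, byrow = TRUE,
  dimnames = list(mode = c("Physical", "Virtual"), admitted = c("0", "1"))
)

# chisq.test() applies the Yates continuity correction to 2x2 tables by default
chi <- chisq.test(h1_counts)

# Cramer's V = sqrt(chi2 / (n * min(r - 1, c - 1))); the min term is 1 for a 2x2 table
cramers_v <- sqrt(unname(chi$statistic) / sum(h1_counts))

# Risk ratio: attendance rate of Physical divided by attendance rate of Virtual
rate       <- h1_counts[, "1"] / rowSums(h1_counts)
risk_ratio <- unname(rate["Physical"] / rate["Virtual"])

cat("Chi-squared:", round(unname(chi$statistic), 3), "\n")
cat("p-value:", format.pval(chi$p.value, eps = 1e-6), "\n")
cat("Cramer's V:", round(cramers_v, 3), "\n")
cat("Risk ratio (Physical / Virtual):", round(risk_ratio, 2), "\n")
```

The risk ratio compares attendance probabilities. It is a different quantity from the odds ratio reported by the logistic regression later, so the two numbers are not expected to match.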

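H2 result: Registration lead time differs significantly between attendees and no-shows (Mann-Whitney p < 0.001; r = 0.38, a moderate effect). The test as run is two-sided, so the direction comes from the medians shown in Section 5: 7 days before the event for attendees versus 5 days for no-shows.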
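
The reported effect size also indicates how extreme the test statistic was. As I understand the rstatix documentation, `wilcox_effsize()` reports r = Z / sqrt(N), so the Z statistic can be recovered from the table. The sketch below is back-of-envelope arithmetic on the reported numbers, not a re-run of the test.

```r
r_reported <- 0.3805846   # effsize column from the H2 effect-size table
n_total    <- 985 + 780   # n1 + n2 from the same table

# r = |Z| / sqrt(N)  =>  |Z| = r * sqrt(N)
z_implied <- r_reported * sqrt(n_total)

# Two-sided p-value under the normal approximation
p_implied <- 2 * pnorm(-abs(z_implied))

cat("Implied |Z|:", round(z_implied, 2), "\n")
cat("Implied p-value:", format.pval(p_implied, eps = 1e-6), "\n")
```

A Z statistic of this size explains why the rounded p-value prints as zero: the exact value is far below any conventional threshold.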

7. Correlation Analysis

Technique 4 — Correlation Analysis: A Spearman correlation matrix quantifies the strength of relationships between attendance outcome and its potential predictors — mode, lead time, and nationality — focusing on Visitors where outcome variance exists.

df_corr <- df |>
  filter(category_analysis == "Visitor",
         mode_clean %in% c("Physical", "Virtual")) |>
  mutate(
    mode_physical = if_else(mode_clean == "Physical", 1L, 0L),
    is_nigeria    = if_else(is_nigeria == "Nigeria", 1L, 0L)
  ) |>
  select(admitted_bin, reg_lead_days, mode_physical, is_nigeria) |>
  drop_na()

cat("Rows in correlation dataset:", nrow(df_corr), "\n")
Rows in correlation dataset: 1491 
cor_matrix <- cor(df_corr, method = "spearman")
round(cor_matrix, 3) |>
  kbl(caption = "Spearman Correlation Matrix") |>
  kable_styling(bootstrap_options = c("striped","hover","condensed"),
                full_width = FALSE)
Spearman Correlation Matrix
                admitted_bin   reg_lead_days   mode_physical   is_nigeria
admitted_bin           1.000          -0.319           0.342       -0.007
reg_lead_days         -0.319           1.000          -0.019        0.037
mode_physical          0.342          -0.019           1.000        0.056
is_nigeria            -0.007           0.037           0.056        1.000
heatmaply_cor(cor_matrix,
              main = "Spearman Correlation — Visitor Attendance Drivers")

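Correlations with admitted_bin, in order of strength: (1) mode_physical (ρ = 0.342) is the strongest association. (2) reg_lead_days (ρ = −0.319) comes next. Because reg_lead_days counts days before the event, the negative sign means that, among Visitors with a known mode, longer lead times go with lower attendance. This runs opposite to the full-sample medians in Section 5, which also include non-Visitor categories and registrants with no recorded mode. (3) is_nigeria (ρ ≈ 0) shows no association with attendance. Correlation does not imply causation.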
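
For two binary variables, Spearman's ρ, Pearson's r, and the phi coefficient are the same number, so the mode coefficient can be reproduced from the mode-by-attendance counts. The sketch below rebuilds the two columns from those counts. It assumes the 1,491 rows in the correlation dataset are exactly the Physical and Virtual registrants in the mode table.

```r
# Rebuild the two binary columns from the reported counts
mode_physical <- c(rep(1L, 1197), rep(0L, 294))
admitted_bin  <- c(rep(1L, 597), rep(0L, 600),   # Physical: attended, no-show
                   rep(1L, 22),  rep(0L, 272))   # Virtual:  attended, no-show

rho_spearman <- cor(mode_physical, admitted_bin, method = "spearman")
r_pearson    <- cor(mode_physical, admitted_bin, method = "pearson")

# Phi from the chi-squared statistic WITHOUT continuity correction
chi_uncorrected <- chisq.test(table(mode_physical, admitted_bin), correct = FALSE)
phi <- sqrt(unname(chi_uncorrected$statistic) / length(admitted_bin))

cat("Spearman rho:", round(rho_spearman, 3), "\n")
cat("Pearson r:   ", round(r_pearson, 3), "\n")
cat("Phi:         ", round(phi, 3), "\n")
```

The value is slightly larger than the Cramér's V in Section 6 because that statistic was computed from the continuity-corrected chi-squared.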

8. Logistic Regression

Technique 5 — Logistic Regression: A logistic regression model predicts the probability of attendance from the three strongest predictors. Coefficients are expressed as odds ratios for plain-language business interpretation.

df_model <- df |>
  filter(category_analysis == "Visitor",
         mode_clean %in% c("Physical", "Virtual")) |>
  mutate(
    mode_physical = if_else(mode_clean == "Physical", 1L, 0L),
    is_nigeria    = if_else(is_nigeria == "Nigeria", 1L, 0L),
    admitted_bin  = as.factor(admitted_bin)
  ) |>
  drop_na(reg_lead_days, mode_physical, is_nigeria)

model <- glm(admitted_bin ~ reg_lead_days + mode_physical + is_nigeria,
             data = df_model, family = binomial)

tidy(model, exponentiate = TRUE, conf.int = TRUE) |>
  kbl(digits = 3, caption = "Logistic Regression — Odds Ratios") |>
  kable_styling(bootstrap_options = c("striped","hover","condensed"),
                full_width = FALSE) |>
  row_spec(0, bold = TRUE)
Logistic Regression — Odds Ratios
term            estimate   std.error   statistic   p.value   conf.low   conf.high
(Intercept)        0.352       0.753      -1.387     0.165      0.077       1.509
reg_lead_days      0.875       0.014      -9.349     0.000      0.850       0.899
mode_physical     13.890       0.233      11.274     0.000      8.995      22.548
is_nigeria         0.441       0.745      -1.099     0.272      0.102       1.938
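
The odds ratios in the table above translate into percentage changes in odds and into predicted probabilities with a few lines of arithmetic. The sketch below uses the rounded estimates from the table rather than the fitted `model` object. The helper `predict_prob()` and the example registrant (6 days of lead time, Nigeria-based) are illustrative assumptions.

```r
# Exponentiated coefficients copied from the odds-ratio table (rounded to 3 dp)
or <- c(intercept = 0.352, reg_lead_days = 0.875,
        mode_physical = 13.890, is_nigeria = 0.441)

# Percent change in odds per one-unit increase in each predictor: (OR - 1) * 100
pct_change <- (or[-1] - 1) * 100
print(round(pct_change, 1))

# Predicted probability: baseline odds times each OR raised to the predictor value
predict_prob <- function(lead_days, physical, nigeria) {
  odds <- or[["intercept"]] *
    or[["reg_lead_days"]]^lead_days *
    or[["mode_physical"]]^physical *
    or[["is_nigeria"]]^nigeria
  odds / (1 + odds)
}

p_physical <- predict_prob(lead_days = 6, physical = 1, nigeria = 1)
p_virtual  <- predict_prob(lead_days = 6, physical = 0, nigeria = 1)

cat("Physical, 6 days lead:", round(p_physical, 3), "\n")
cat("Virtual, 6 days lead: ", round(p_virtual, 3), "\n")
```

An odds ratio below 1 for reg_lead_days means each extra day of lead time lowers the odds of attending. The predicted probabilities for the example registrant sit close to the observed attendance rates by mode.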
cat("\nAIC:", round(AIC(model), 1))

AIC: 1718.3
cat("\nNull deviance:", round(model$null.deviance, 1))

Null deviance: 2023.8
cat("\nResidual deviance:", round(model$deviance, 1))

Residual deviance: 1710.3
pred_probs <- predict(model, type = "response")
roc_obj    <- roc(df_model$admitted_bin, pred_probs)
cat("\nAUC:", round(auc(roc_obj), 3))

AUC: 0.779
plot(roc_obj,
     main = paste("ROC Curve — AUC =", round(auc(roc_obj), 3)),
     col  = "#1F6B75", lwd = 2)

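Model performance: AUC = 0.779, meaning the model ranks a randomly chosen attendee above a randomly chosen no-show in 77.9% of pairs. The AUC is computed on the same data used to fit the model, so it is likely to be optimistic. Key findings: physical registrants have roughly 14× the odds of attending compared with virtual registrants (OR = 13.89, 95% CI 9.00 to 22.55). Each additional day of lead time is associated with about 12.5% lower odds of attending (OR = 0.875), holding mode and nationality constant. Nationality is not a significant predictor (p = 0.272).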
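
The pairwise-ranking reading of AUC can be demonstrated directly. The sketch below uses simulated scores (not the summit data or the fitted model) and computes AUC two ways: by counting pairs, and from the Mann-Whitney U statistic. All names and distribution parameters are illustrative.

```r
set.seed(1)  # toy data only

# Hypothetical predicted scores: attendees tend to score higher than no-shows
score_attended <- rnorm(50, mean = 0.6, sd = 0.2)
score_no_show  <- rnorm(70, mean = 0.4, sd = 0.2)

# AUC = share of (attendee, no-show) pairs in which the attendee scores higher;
# tied pairs count as half
pairs_gt  <- outer(score_attended, score_no_show, ">")
pairs_tie <- outer(score_attended, score_no_show, "==")
auc_pairs <- mean(pairs_gt + 0.5 * pairs_tie)

# Same quantity from the Mann-Whitney U statistic: AUC = U / (n1 * n2)
u     <- unname(wilcox.test(score_attended, score_no_show)$statistic)
auc_u <- u / (length(score_attended) * length(score_no_show))

cat("AUC by pair counting:", round(auc_pairs, 3), "\n")
cat("AUC from U statistic:", round(auc_u, 3), "\n")
```

The two routes agree because the U statistic is itself a count of correctly ordered pairs. An AUC of 0.5 corresponds to random ordering and 1.0 to perfect separation.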

9. Integrated Findings

Based on the five analyses, the evidence consistently points to one conclusion: the no-show problem at the Lagos Climate Summit 2024 is driven primarily by mode of attendance and registration timing.

EDA revealed that 55.8% of registrants did not attend, with the problem concentrated entirely among pre-registered Visitors; virtual registrants had a 92.5% no-show rate. Visualisation showed that registrations were concentrated in the final two weeks before the event, where the largest attendance gaps appear. Hypothesis testing found both mode of attendance (χ² p < 0.001) and registration lead time (Mann-Whitney p < 0.001) to be statistically significant. Correlation analysis identified mode_physical as the strongest correlate of attendance (ρ = 0.342), followed by registration lead time (ρ = −0.319). The logistic regression model (AUC = 0.779) quantified the combined effect: an odds ratio of 13.89 for physical mode and 0.875 per additional day of lead time.

Recommendation: Implement a tiered automated reminder system — nudges at 7 days, 3 days, and 1 day before the event — prioritising virtual registrants and those who registered in the final week.
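
One way the tiering could be expressed as a rule is sketched below. The 7, 3, and 1-day nudges and the two priority groups come from the recommendation. The function name `reminder_days()`, the choice to give everyone else only the final nudge, and the 7-day cut-off for "final week" are hypothetical design choices, not part of the analysis.

```r
# Returns the days-before-event on which a registrant would receive a reminder.
# Hypothetical tiers: high priority gets the full schedule, others the last nudge only.
reminder_days <- function(mode, reg_lead_days, schedule = c(7, 3, 1)) {
  high_priority <- identical(mode, "Virtual") ||
    (!is.na(reg_lead_days) && reg_lead_days <= 7)

  planned <- if (high_priority) schedule else min(schedule)

  # A reminder can only be sent after the person has registered
  if (is.na(reg_lead_days)) planned else planned[planned < reg_lead_days]
}

print(reminder_days("Virtual",  20))  # virtual, registered early
print(reminder_days("Physical", 20))  # physical, registered early
print(reminder_days("Physical", 5))   # physical, registered in the final week
print(reminder_days("Virtual",  1))   # registered too late for any nudge
```

Filtering the schedule by lead time keeps the rule from scheduling a 7-day nudge for someone who registered three days before the event.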

10. Limitations & Further Work

  • No demographic data (age, sector seniority) to test deeper segmentation
  • Organisation sector not classified — limits correlation analysis
  • Single-event data — findings may not generalise to other summits
  • Further work: A/B test reminder message formats; collect post-event survey data on reasons for non-attendance

References

Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making. Lagos Business School / markanalytics.online. https://markanalytics.online

R Core Team. (2024). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/

Wickham, H., et al. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Wickham, H., & Bryan, J. (2025). readxl: Read Excel files (R package version 1.4.5). https://CRAN.R-project.org/package=readxl

Kassambara, A. (2023). rstatix: Pipe-friendly framework for basic statistical tests (R package version 0.7.2). https://CRAN.R-project.org/package=rstatix

Robin, X., et al. (2011). pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics, 12, 77. https://doi.org/10.1186/1471-2105-12-77

Galili, T., et al. (2018). heatmaply: An R package for creating interactive cluster heatmaps. Bioinformatics, 34(9), 1600–1602. https://doi.org/10.1093/bioinformatics/btx657

Appendix: AI Usage Statement

Claude (Anthropic) was used to assist with code generation and debugging during this analysis. All analytical decisions — technique selection, business interpretation, and recommendations — were made independently. The professional disclosure and data provenance sections were written entirely without AI assistance.