---
title: "Analysis of KANOS Attendance Data"
author: "Simon Kazooba"
date: "2025-12-18"
output: html_document
---
<style type="text/css">
h1, h2, h3, h4 {
color: navy;
font-family: Arial, sans-serif;
font-weight: bold;
}
</style>
## Introduction
This report analyzes the attendance data from the Rotaract Club of Kampala North (and associated clubs). The dataset includes information on attendees' names, emails, membership status, club names, buddy groups, visit counts, geographical coordinates, and attendance dates. The analysis covers summary statistics, trends over time, distributions by groups, and geospatial insights.
The data spans from July 2025 to December 2025, capturing multiple attendance events.
## Data Loading and Cleaning
We load the CSV file and perform basic cleaning, such as parsing dates, handling missing values, and standardizing columns.
```r
# Load the packages used throughout the report
library(dplyr)      # data manipulation and glimpse()
library(lubridate)  # ymd_hms(), floor_date(), wday()
library(ggplot2)    # static plots
library(leaflet)    # interactive map

# Load the data
attendance <- read.csv("KANOS ATTENDANCE.csv", stringsAsFactors = FALSE)
# Parse the DATE column as datetime
attendance$DATE <- ymd_hms(attendance$DATE)
# Convert IS.CLUB.MEMBER to logical
attendance$IS.CLUB.MEMBER <- as.logical(toupper(attendance$IS.CLUB.MEMBER))
# Treat zero coordinates as missing (set to NA)
attendance$LATTITUDE[attendance$LATTITUDE == 0] <- NA
attendance$LONGITUDE[attendance$LONGITUDE == 0] <- NA
# Glimpse of the data
glimpse(attendance)
```
```
## Rows: 910
## Columns: 9
## $ NAME           <chr> "walter omukonzo", "Waneloba Rogers", "Dr. Sylver K. Wa…
## $ EMAIL          <chr> "mumberewalter62@gmail.com", "wanelobarogers@gmail.com"…
## $ IS.CLUB.MEMBER <lgl> TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE,…
## $ CLUB.NAME      <chr> "ROTARACT CLUB OF KAMPALA NORTH", "ROTARACT CLUB OF KAM…
## $ BUDDY.GROUP    <chr> "Jemba", "Mapeera", "", "Mapeera", "", "Galiraaya", "Na…
## $ VISITS         <int> 10, 2, 0, 2, 0, 2, 2, 1, 2, 0, 1, 5, 0, 2, 2, 1, 0, 1, …
## $ LATTITUDE      <dbl> 0.3119549, 0.3172778, 0.3171318, 0.3171229, 0.3171143, …
## $ LONGITUDE      <dbl> 32.58194, 32.57650, 32.57672, 32.57668, 32.57668, 32.57…
## $ DATE           <dttm> 2025-07-04 09:50:36, 2025-07-04 17:45:30, 2025-07-04 1…
```
## Summary Statistics
A quick overview of the dataset structure and initial statistics.
```r
summary(attendance)
```
```
## NAME EMAIL IS.CLUB.MEMBER CLUB.NAME
## Length:910 Length:910 Mode :logical Length:910
## Class :character Class :character FALSE:332 Class :character
## Mode :character Mode :character TRUE :578 Mode :character
##
##
##
##
## BUDDY.GROUP VISITS LATTITUDE LONGITUDE
## Length:910 Min. : 0.000 Min. :0.09319 Min. :32.50
## Class :character 1st Qu.: 0.000 1st Qu.:0.31721 1st Qu.:32.58
## Mode :character Median : 0.000 Median :0.31727 Median :32.58
## Mean : 1.177 Mean :0.31713 Mean :32.58
## 3rd Qu.: 2.000 3rd Qu.:0.31733 3rd Qu.:32.58
## Max. :20.000 Max. :0.37477 Max. :32.65
## NA's :231 NA's :231
## DATE
## Min. :2025-07-04 09:50:36.18
## 1st Qu.:2025-08-01 18:55:22.97
## Median :2025-09-12 18:33:36.93
## Mean :2025-09-18 03:29:24.02
## 3rd Qu.:2025-11-07 18:25:12.29
## Max. :2025-12-12 20:02:03.38
##
```
## Attendance Trends Over Time
This section explores how attendance varies over time, including total attendances per month and day of the week.
We aggregate attendances by month to identify peak periods.
```r
attendance %>%
  mutate(Month = floor_date(DATE, "month")) %>%
  group_by(Month) %>%
  summarise(Total_Attendances = n()) %>%
  ggplot(aes(x = Month, y = Total_Attendances)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  labs(title = "Total Attendances by Month", x = "Month", y = "Count") +
  theme_minimal()
```
Next, we analyze attendance patterns by weekday.
```r
attendance %>%
  mutate(Day_of_Week = wday(DATE, label = TRUE)) %>%
  group_by(Day_of_Week) %>%
  summarise(Count = n()) %>%
  ggplot(aes(x = Day_of_Week, y = Count)) +
  geom_bar(stat = "identity", fill = "darkgreen") +
  labs(title = "Attendance by Day of the Week", x = "Day", y = "Count") +
  theme_minimal()
```
Insight: All 910 recorded attendances fall on Fridays, which suggests that the club meets weekly on that day and points to a consistent pattern in club activities.
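The Friday-only pattern can be checked directly rather than read off the chart. The sketch below is a minimal base R illustration, assuming a small hypothetical vector `sample_dates` built from the five dates shown in the `summary(attendance)` output; it is not the full dataset. `format(x, "%u")` returns the ISO weekday number (1 = Monday, 5 = Friday), which avoids locale-dependent day names.

```r
# Hypothetical sample: minimum, quartile, and maximum dates from summary(attendance)
sample_dates <- as.POSIXct(
  c("2025-07-04 09:50:36", "2025-08-01 18:55:22", "2025-09-12 18:33:36",
    "2025-11-07 18:25:12", "2025-12-12 20:02:03"),
  tz = "UTC"
)

# ISO weekday number as a string: "1" = Monday ... "7" = Sunday
iso_weekday <- format(sample_dates, "%u")

# TRUE only if every timestamp falls on a Friday
all_fridays <- all(iso_weekday == "5")
all_fridays
```

On the real data, the same check would be `all(format(attendance$DATE, "%u") == "5")`.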
## Buddy Group Analysis
Buddy groups are key organizers of attendees, and this analysis highlights their role in engagement. We focus on distributions, unique participants, and average visits to underscore group dynamics. Note: Higher average visits in some groups (e.g., Jemba, at about 2.59) may partly reflect inflated visit counts for a few individuals, possibly from data entry errors, so the averages should be read with caution.
The table below counts unique attendees per buddy group, along with total attendances and average visits.
```r
buddy_summary <- attendance %>%
  filter(BUDDY.GROUP != "") %>% # Exclude empty groups
  group_by(BUDDY.GROUP) %>%
  summarise(Unique_Attendees = n_distinct(NAME),
            Total_Attendances = n(),
            Average_Visits = mean(VISITS, na.rm = TRUE)) %>%
  arrange(desc(Unique_Attendees))
knitr::kable(buddy_summary, caption = "Summary by Buddy Group")
```
| BUDDY.GROUP | Unique_Attendees | Total_Attendances | Average_Visits |
|---|---|---|---|
| Galiraaya | 48 | 124 | 1.508064 |
| Namaganda | 42 | 115 | 1.191304 |
| Mabirizi | 36 | 118 | 2.449152 |
| Mapeera | 32 | 109 | 1.825688 |
| Jemba | 18 | 73 | 2.589041 |
| Gaza Land | 11 | 39 | 1.410256 |
```r
buddy_summary %>%
  ggplot(aes(x = reorder(BUDDY.GROUP, -Unique_Attendees), y = Unique_Attendees)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  labs(title = "Unique Attendees by Buddy Group", x = "Buddy Group", y = "Unique Attendees") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```
Insight: Galiraaya leads with 48 unique attendees and 124 total attendances (avg. visits ~1.51), followed by Namaganda (42 unique, 115 total). Jemba and Mabirizi show the highest average visits (~2.59 and ~2.45), indicating stronger repeat participation despite fewer unique attendees. This underlines the importance of buddy groups in fostering attendance and retention.
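Because a few heavy attendees can pull a group's mean upward, comparing the mean with the median per group is a quick robustness check on the averages above. The sketch below uses a hypothetical toy data frame `toy` (the group labels and visit counts are invented for illustration, not taken from the dataset) and base R's `tapply()`.

```r
# Hypothetical toy data: group "A" contains one unusually heavy attendee
toy <- data.frame(
  BUDDY.GROUP = c("A", "A", "A", "A", "B", "B", "B", "B"),
  VISITS      = c(1, 2, 2, 15, 2, 2, 3, 3)
)

# Mean and median visits per group
mean_visits   <- tapply(toy$VISITS, toy$BUDDY.GROUP, mean)
median_visits <- tapply(toy$VISITS, toy$BUDDY.GROUP, median)

# Combine into one table; a large gap between the two columns
# suggests the mean is driven by a few high values
group_stats <- data.frame(
  BUDDY.GROUP   = names(mean_visits),
  Mean_Visits   = as.numeric(mean_visits),
  Median_Visits = as.numeric(median_visits)
)
group_stats
```

In this toy example group "A" has a mean of 5 but a median of 2, while group "B" has 2.5 for both, so only "A" is affected by an extreme value.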
## Membership Analysis
This section compares attendance between club members and non-members.
The pie chart shows the split.
```r
attendance %>%
  group_by(IS.CLUB.MEMBER) %>%
  summarise(Count = n()) %>%
  ggplot(aes(x = "", y = Count, fill = IS.CLUB.MEMBER)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y", start = 0) +
  labs(title = "Proportion of Club Members vs Non-Members") +
  theme_void()
```
Insight: Members account for about 64% of attendances (578 out of 910), demonstrating strong member participation, while non-members (about 36%, 332) suggest opportunities for recruitment.
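The membership shares follow from the counts reported in the `summary(attendance)` output (TRUE: 578, FALSE: 332). A minimal worked example, with `counts` and `shares` as illustrative names:

```r
# Counts taken from the IS.CLUB.MEMBER column of summary(attendance)
counts <- c(member = 578, non_member = 332)

# Share of each group as a percentage of all attendances
shares <- round(100 * prop.table(counts), 1)
shares
```

This gives 63.5% for members and 36.5% for non-members, out of 910 records in total.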
```r
attendance %>%
  group_by(IS.CLUB.MEMBER) %>%
  summarise(Average_Visits = mean(VISITS, na.rm = TRUE),
            Total_Attendances = n()) %>%
  knitr::kable(caption = "Visits by Membership Status")
```
| IS.CLUB.MEMBER | Average_Visits | Total_Attendances |
|---|---|---|
| FALSE | 0.0451807 | 332 |
| TRUE | 1.8269896 | 578 |
Insight: Members average about 1.83 visits per record, compared with roughly 0.05 for non-members. The gap indicates members’ greater commitment, although the member average is pulled up by a few high visit counts and may also reflect data issues.
## Visit Counts
This section covers the overall distribution of visit counts and the top attendees.
```r
ggplot(attendance, aes(x = VISITS)) +
  geom_histogram(binwidth = 1, fill = "purple", color = "black") +
  labs(title = "Distribution of Visit Counts", x = "Visits", y = "Frequency") +
  theme_minimal()
```
Insight: The histogram shows that most records have low visit counts: the median is 0 and the third quartile is 2. A long right tail extends to a maximum of 20, suggesting that typical engagement is infrequent and that the highest counts are worth checking for recording errors.
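One common way to make "outlier" concrete is Tukey's rule, which flags values above the third quartile plus 1.5 times the interquartile range. This is a swapped-in illustration, not a method used in the report. The sketch below applies it to a hypothetical vector `visits`, invented here so that its quartiles match those in the summary output (Q1 = 0, Q3 = 2).

```r
# Hypothetical visit counts with one unusually large value
visits <- c(0, 0, 0, 0, 1, 1, 2, 2, 2, 20)

# First and third quartiles, and the interquartile range
q   <- quantile(visits, probs = c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]

# Tukey's rule: anything above Q3 + 1.5 * IQR is flagged
upper_fence <- q[2] + 1.5 * iqr
flagged <- visits[visits > upper_fence]
flagged
```

With these toy values the upper fence is 5, so only the value 20 is flagged. Applied to `attendance$VISITS`, flagged records would be candidates for manual review rather than automatic removal.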
The table and chart below list the top 10 attendees by total visit count.
```r
top_attendees <- attendance %>%
  group_by(NAME, EMAIL) %>%
  # .groups = "drop" returns an ungrouped table and avoids the regrouping message
  summarise(Total_Visits = sum(VISITS, na.rm = TRUE),
            Attendance_Count = n(),
            .groups = "drop") %>%
  arrange(desc(Total_Visits)) %>%
  head(10)
knitr::kable(top_attendees, caption = "Top 10 Attendees by Total Visits")
```
| NAME | EMAIL | Total_Visits | Attendance_Count |
|---|---|---|---|
| walter omukonzo | walteromukonzo@gmail.com | 112 | 14 |
| Victor Nahurira | victornahurira1234@gmail.com | 70 | 7 |
| Amos Ssemuwemba | ssemuwembaamos@gmail.com | 50 | 5 |
| Richard Ndozireho | richardndozireho@gmail.com | 42 | 14 |
| Christopher Kiggundu | christopherkiggundu2@gmail.com | 26 | 4 |
| ongom pascal | hornerpascal7@gmail.com | 26 | 12 |
| Ikras Muzoora | muzooraikras5@gmail.com | 23 | 13 |
| Karen Claire Nassanga | karenclarissah2017@gmail.com | 23 | 11 |
| Kisakye Maria Terry | kisakyemaria0498@gmail.com | 21 | 6 |
| Sserunjogi Francis Xavier | xsserunjogi@gmail.com | 21 | 15 |
```r
top_attendees %>%
  ggplot(aes(x = reorder(NAME, -Total_Visits), y = Total_Visits)) +
  geom_bar(stat = "identity", fill = "tomato") +
  labs(title = "Top 10 Attendees by Total Visits", x = "Name", y = "Total Visits") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```
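Insight: Walter Omukonzo tops the list with 112 total visits across 14 attendance records, followed by Victor Nahurira (70) and Amos Ssemuwemba (50). This highlights dedicated individuals but also flags potential inaccuracies in visit recording: Total_Visits sums the VISITS field over every record for a person, so the totals may overstate actual visits if VISITS is already a running count.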
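Whether a summed visit total is meaningful depends on what `VISITS` stores. The sketch below assumes, purely for illustration, that `VISITS` is a running per-person count copied onto each attendance record; the source does not confirm this. Under that assumption summing double counts, and the per-person maximum is the better estimate. The data frame `toy` and its names are hypothetical.

```r
# Hypothetical records: VISITS increases by one at each attendance
toy <- data.frame(
  NAME   = c("Person A", "Person A", "Person A", "Person B", "Person B"),
  VISITS = c(1, 2, 3, 1, 2)
)

# Sum and maximum of VISITS per person
sum_visits <- tapply(toy$VISITS, toy$NAME, sum)
max_visits <- tapply(toy$VISITS, toy$NAME, max)

# Side-by-side comparison of the two aggregates
comparison <- data.frame(
  NAME          = names(sum_visits),
  Sum_Of_Visits = as.numeric(sum_visits),
  Max_Visits    = as.numeric(max_visits)
)
comparison
```

Here "Person A" attended three times, which the maximum (3) reports correctly while the sum (6) overstates it.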
## Geospatial Analysis
We use latitude and longitude to map attendance locations, after filtering out missing coordinates.
The interactive map shows clustered attendance points.
```r
attendance_geo <- attendance %>%
  filter(!is.na(LATTITUDE) & !is.na(LONGITUDE))
leaflet(data = attendance_geo) %>%
  addTiles() %>%
  addMarkers(lng = ~LONGITUDE, lat = ~LATTITUDE,
             popup = ~paste("Name:", NAME, "<br>Date:", DATE),
             clusterOptions = markerClusterOptions())
```
Insight: The 679 records with valid coordinates (910 minus the 231 with missing values) cluster around latitude 0.317 and longitude 32.576, corresponding to central Kampala, Uganda, confirming localized events.
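To quantify how far points lie from the cluster, one option is the great-circle distance from each point to the cluster centre. The sketch below is a minimal base R implementation of the haversine formula, assuming a mean Earth radius of 6371 km. The helper `haversine_km` is a hypothetical name, and the test points pair the minimum and maximum coordinates from the summary output only for illustration; they need not belong to the same records.

```r
# Great-circle distance in kilometres between points (haversine formula)
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(sqrt(a))
}

# Cluster centre quoted in the text
centre_lat <- 0.317
centre_lon <- 32.576

# Illustrative points: the centre itself, then the extreme coordinates
lat <- c(0.317, 0.09319, 0.37477)
lon <- c(32.576, 32.50, 32.65)

distance_km <- haversine_km(centre_lat, centre_lon, lat, lon)
round(distance_km, 1)
```

Adding such a distance column to `attendance_geo` would let check-ins recorded far from the venue be listed and reviewed.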
## Club Analysis
This section gives an overview of attendances by club name.
```r
club_summary <- attendance %>%
  group_by(CLUB.NAME) %>%
  summarise(Count = n()) %>%
  arrange(desc(Count))
knitr::kable(head(club_summary, 10), caption = "Top 10 Clubs by Attendance Count")
```
| CLUB.NAME | Count |
|---|---|
| ROTARACT CLUB OF KAMPALA NORTH | 589 |
| NO CLUB | 128 |
| ROTARY CLUB OF KAMPALA-NORTH | 13 |
| ROTARACT CLUB OF MAKERERE UNIVERSITY | 10 |
| ROTARACT CLUB OF BWEYOGERERE - NAMBOOLE | 8 |
| ROTARACT CLUB OF UGANDA INSTITUTE OF INFORMATION AND COMMUNICATIONS TECHNOLOGY | 8 |
| ROTARACT CLUB OF KAMPALA MAHABA | 7 |
| ROTARACT CLUB OF KAMPALA THE CORE | 7 |
| ROTARACT CLUB OF ACACIA SUNSET KAMPALA | 6 |
| ROTARACT CLUB OF ENTEBBE AIRPORT | 6 |
```r
club_summary %>%
  head(10) %>% # Limit to top 10 for visualization
  ggplot(aes(x = reorder(CLUB.NAME, -Count), y = Count)) +
  geom_bar(stat = "identity", fill = "orange") +
  labs(title = "Top 10 Attendances by Club Name", x = "Club Name", y = "Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```
Insight: The Rotaract Club of Kampala North dominates with 589 attendances, followed by “No Club” (128). This indicates the primary club’s strong presence, with smaller contributions from associated clubs.