---
title: "Analysis of KANOS Attendance Data"
author: "Simon Kazooba"
date: "2025-12-18"
output: html_document
---

<style type="text/css">
h1, h2, h3, h4 {
  color: navy;
  font-family: Arial, sans-serif;
  font-weight: bold;
}
</style>



## Introduction

This report analyzes the attendance data from the Rotaract Club of Kampala North (and associated clubs). The dataset includes information on attendees' names, emails, membership status, club names, buddy groups, visit counts, geographical coordinates, and attendance dates. The analysis covers summary statistics, trends over time, distributions by groups, and geospatial insights.

The data span 4 July 2025 to 12 December 2025 and cover multiple attendance events.

## Data Loading and Cleaning

We load the CSV file and perform basic cleaning, such as parsing dates, handling missing values, and standardizing columns.


```r
# Load the packages used throughout the report
library(dplyr)      # glimpse(), %>%, group_by(), summarise()
library(lubridate)  # ymd_hms()
library(ggplot2)    # plots
library(leaflet)    # interactive map

# Load the data
attendance <- read.csv("KANOS ATTENDANCE.csv", stringsAsFactors = FALSE)

# Parse the DATE column as date-time
attendance$DATE <- ymd_hms(attendance$DATE)

# Convert IS.CLUB.MEMBER to logical
attendance$IS.CLUB.MEMBER <- as.logical(toupper(attendance$IS.CLUB.MEMBER))

# Treat zero coordinates as missing (set to NA if 0)
attendance$LATTITUDE[attendance$LATTITUDE == 0] <- NA
attendance$LONGITUDE[attendance$LONGITUDE == 0] <- NA

# Glimpse of the data
glimpse(attendance)
```

```
## Rows: 910
## Columns: 9
## $ NAME           <chr> "walter omukonzo", "Waneloba Rogers", "Dr. Sylver K. Wa…
## $ EMAIL          <chr> "mumberewalter62@gmail.com", "wanelobarogers@gmail.com"…
## $ IS.CLUB.MEMBER <lgl> TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE,…
## $ CLUB.NAME      <chr> "ROTARACT CLUB OF KAMPALA NORTH", "ROTARACT CLUB OF KAM…
## $ BUDDY.GROUP    <chr> "Jemba", "Mapeera", "", "Mapeera", "", "Galiraaya", "Na…
## $ VISITS         <int> 10, 2, 0, 2, 0, 2, 2, 1, 2, 0, 1, 5, 0, 2, 2, 1, 0, 1, …
## $ LATTITUDE      <dbl> 0.3119549, 0.3172778, 0.3171318, 0.3171229, 0.3171143, …
## $ LONGITUDE      <dbl> 32.58194, 32.57650, 32.57672, 32.57668, 32.57668, 32.57…
## $ DATE           <dttm> 2025-07-04 09:50:36, 2025-07-04 17:45:30, 2025-07-04 1…
```
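
The cleaning steps can silently produce `NA` values (unparseable dates, unexpected membership flags, zero coordinates), so it may help to count them after cleaning. The following is a minimal base R sketch on a hypothetical three-row data frame (`toy` and its values are invented for illustration, and `as.POSIXct` stands in for `ymd_hms` to keep the example dependency-free):

```r
# Hypothetical rows for illustration only; not taken from the real file
toy <- data.frame(
  DATE = c("2025-07-04 09:50:36", "not a date", "2025-07-04 17:45:30"),
  IS.CLUB.MEMBER = c("true", "FALSE", "yes"),
  LATTITUDE = c(0.3119549, 0, 0.3172778),
  stringsAsFactors = FALSE
)

# Unparseable dates become NA when an explicit format is given
parsed_date <- as.POSIXct(toy$DATE, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# Anything other than a recognised TRUE/FALSE spelling becomes NA
parsed_member <- as.logical(toupper(toy$IS.CLUB.MEMBER))

# Zero coordinates are treated as missing, as in the report
lat <- replace(toy$LATTITUDE, toy$LATTITUDE == 0, NA)

# Count how many values each cleaning step turned into NA
cleaning_report <- c(
  bad_dates = sum(is.na(parsed_date)),
  bad_member_flags = sum(is.na(parsed_member)),
  missing_latitude = sum(is.na(lat))
)
print(cleaning_report)
```

Applied to the real `attendance` data frame, the same counts would show whether any rows were lost to parsing rather than being genuinely missing.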

## Data Summary

A quick overview of the dataset structure and initial statistics.

```r
summary(attendance)
```

```
##      NAME              EMAIL           IS.CLUB.MEMBER   CLUB.NAME        
##  Length:910         Length:910         Mode :logical   Length:910        
##  Class :character   Class :character   FALSE:332       Class :character  
##  Mode  :character   Mode  :character   TRUE :578       Mode  :character  
##                                                                          
##                                                                          
##                                                                          
##                                                                          
##  BUDDY.GROUP            VISITS         LATTITUDE         LONGITUDE    
##  Length:910         Min.   : 0.000   Min.   :0.09319   Min.   :32.50  
##  Class :character   1st Qu.: 0.000   1st Qu.:0.31721   1st Qu.:32.58  
##  Mode  :character   Median : 0.000   Median :0.31727   Median :32.58  
##                     Mean   : 1.177   Mean   :0.31713   Mean   :32.58  
##                     3rd Qu.: 2.000   3rd Qu.:0.31733   3rd Qu.:32.58  
##                     Max.   :20.000   Max.   :0.37477   Max.   :32.65  
##                                      NA's   :231       NA's   :231    
##       DATE                       
##  Min.   :2025-07-04 09:50:36.18  
##  1st Qu.:2025-08-01 18:55:22.97  
##  Median :2025-09-12 18:33:36.93  
##  Mean   :2025-09-18 03:29:24.02  
##  3rd Qu.:2025-11-07 18:25:12.29  
##  Max.   :2025-12-12 20:02:03.38  
## 
```

## Analysis by Buddy Group

Buddy groups organize attendees into smaller units, and this analysis looks at their role in engagement. We focus on distributions, unique participants, and average visits per group. Note: Jemba has the highest average visits (about 2.6) from only 18 unique attendees, so a few individuals with high visit counts may be inflating it; such values are worth checking for data entry errors.
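
One possible way to screen for inflated visit counts is the common 1.5 × IQR rule, which flags values far above the upper quartile. The sketch below uses a hypothetical data frame (`toy`, with invented names and counts); the rule itself is an assumption chosen for illustration, not a method used in this report.

```r
# Hypothetical visit counts for illustration only
toy <- data.frame(
  NAME = paste0("person_", letters[1:10]),
  VISITS = c(0, 0, 1, 2, 2, 1, 0, 2, 10, 20),
  stringsAsFactors = FALSE
)

# Quartiles of the visit counts
q <- quantile(toy$VISITS, c(0.25, 0.75), na.rm = TRUE)

# Upper fence: third quartile plus 1.5 times the interquartile range
upper_fence <- q[[2]] + 1.5 * (q[[2]] - q[[1]])

# Rows whose visit count exceeds the fence are candidates for review
flagged <- toy[toy$VISITS > upper_fence, ]
print(flagged)
```

A flagged row is only a prompt to check the source record; a high count can be genuine.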

### Distribution of Attendees by Buddy Group

Count of unique attendees per buddy group, along with total attendances and average visits.

```r
buddy_summary <- attendance %>%
  filter(BUDDY.GROUP != "") %>%  # Exclude empty groups
  group_by(BUDDY.GROUP) %>%
  summarise(Unique_Attendees = n_distinct(NAME),
            Total_Attendances = n(),
            Average_Visits = mean(VISITS, na.rm = TRUE)) %>%
  arrange(desc(Unique_Attendees))

knitr::kable(buddy_summary, caption = "Summary by Buddy Group")
```

Table: Summary by Buddy Group

| BUDDY.GROUP | Unique_Attendees | Total_Attendances | Average_Visits |
|:------------|-----------------:|------------------:|---------------:|
| Galiraaya   |               48 |               124 |       1.508064 |
| Namaganda   |               42 |               115 |       1.191304 |
| Mabirizi    |               36 |               118 |       2.449152 |
| Mapeera     |               32 |               109 |       1.825688 |
| Jemba       |               18 |                73 |       2.589041 |
| Gaza Land   |               11 |                39 |       1.410256 |

```r
buddy_summary %>%
  ggplot(aes(x = reorder(BUDDY.GROUP, -Unique_Attendees), y = Unique_Attendees)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  labs(title = "Unique Attendees by Buddy Group", x = "Buddy Group", y = "Unique Attendees") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```

Insight: Galiraaya leads with 48 unique attendees and 124 total attendances (average visits about 1.51), closely followed by Namaganda (42 unique, 115 total). Jemba (about 2.59) and Mabirizi (about 2.45) show the highest average visits, suggesting stronger repeat participation in those groups. Buddy groups appear to play a useful role in attendance and retention.

## Member vs Non-Member Analysis

Comparing attendance between club members and non-members.

### Proportion of Members vs Non-Members

Pie chart showing the split.

```r
attendance %>%
  group_by(IS.CLUB.MEMBER) %>%
  summarise(Count = n()) %>%
  ggplot(aes(x = "", y = Count, fill = IS.CLUB.MEMBER)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y", start = 0) +
  labs(title = "Proportion of Club Members vs Non-Members") +
  theme_void()
```

Insight: Members account for about 63.5% of attendances (578 out of 910), demonstrating strong member participation, while non-members (36.5%, 332) suggest opportunities for recruitment.
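
As a quick arithmetic check, the shares can be recomputed from the `TRUE`/`FALSE` counts reported by `summary(attendance)` (578 and 332). The vector names below are illustrative labels, not columns in the data.

```r
# Counts taken from the IS.CLUB.MEMBER column of summary(attendance)
counts <- c(member = 578, non_member = 332)

# Percentage share of each group, rounded to one decimal place
shares <- round(100 * prop.table(counts), 1)

print(sum(counts))  # total attendance records
print(shares)
```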

### Average Visits by Membership Status

```r
attendance %>%
  group_by(IS.CLUB.MEMBER) %>%
  summarise(Average_Visits = mean(VISITS, na.rm = TRUE),
            Total_Attendances = n()) %>%
  knitr::kable(caption = "Visits by Membership Status")
```

Table: Visits by Membership Status

| IS.CLUB.MEMBER | Average_Visits | Total_Attendances |
|:---------------|---------------:|------------------:|
| FALSE          |      0.0451807 |               332 |
| TRUE           |      1.8269896 |               578 |

Insight: Members average about 1.83 recorded visits per attendance record, against about 0.05 for non-members. The member mean is pulled up by a small number of high counts (the maximum is 20), so the gap suggests greater member commitment but may partly reflect data recording issues.

## Visit Frequency Analysis

Overall distribution of visit counts and top attendees.

### Histogram of Visit Counts

```r
ggplot(attendance, aes(x = VISITS)) +
  geom_histogram(binwidth = 1, fill = "purple", color = "black") +
  labs(title = "Distribution of Visit Counts", x = "Visits", y = "Frequency") +
  theme_minimal()
```

Insight: Most attendance records carry low visit counts: the median is 0 and three quarters of records show 2 or fewer. A thin tail extends to the maximum of 20, so typical engagement is infrequent.

## Attendance by Person

### Top Attendees by Visits (Table and Graph)

List and visualization of the top 10 attendees with the highest visit counts.

```r
top_attendees <- attendance %>%
  group_by(NAME, EMAIL) %>%
  summarise(Total_Visits = sum(VISITS, na.rm = TRUE),
            Attendance_Count = n(),
            .groups = "drop") %>%  # return an ungrouped table
  arrange(desc(Total_Visits)) %>%
  head(10)

knitr::kable(top_attendees, caption = "Top 10 Attendees by Total Visits")
```

Table: Top 10 Attendees by Total Visits

| NAME                      | EMAIL | Total_Visits | Attendance_Count |
|:--------------------------|:------|-------------:|-----------------:|
| walter omukonzo           |       |          112 |               14 |
| Victor Nahurira           |       |           70 |                7 |
| Amos Ssemuwemba           |       |           50 |                5 |
| Richard Ndozireho         |       |           42 |               14 |
| Christopher Kiggundu      |       |           26 |                4 |
| ongom pascal              |       |           26 |               12 |
| Ikras Muzoora             |       |           23 |               13 |
| Karen Claire Nassanga     |       |           23 |               11 |
| Kisakye Maria Terry       |       |           21 |                6 |
| Sserunjogi Francis Xavier |       |           21 |               15 |

```r
top_attendees %>%
  ggplot(aes(x = reorder(NAME, -Total_Visits), y = Total_Visits)) +
  geom_bar(stat = "identity", fill = "tomato") +
  labs(title = "Top 10 Attendees by Total Visits", x = "Name", y = "Total Visits") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```

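Insight: walter omukonzo leads with 112 total visits across 14 attendance records, followed by Victor Nahurira (70 across 7) and Amos Ssemuwemba (50 across 5). Because Total_Visits sums the VISITS field over every attendance row, a person whose recorded count repeats on each row is counted several times (70 over 7 rows is 10 per row), so these totals highlight frequent attendees but may overstate their actual visits.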
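
If `VISITS` is a per-person count repeated on each attendance row (an assumption; the source does not say how the field is recorded), summing it multiplies the count by the number of rows. The base R sketch below contrasts the sum with the per-person maximum on a hypothetical data frame (`toy` and its values are invented for illustration).

```r
# Hypothetical attendance rows for illustration only
toy <- data.frame(
  NAME = c("person_a", "person_a", "person_a", "person_b", "person_b"),
  VISITS = c(10, 10, 10, 2, 3),
  stringsAsFactors = FALSE
)

# Per-person sum, maximum, and number of attendance rows
sum_visits <- tapply(toy$VISITS, toy$NAME, sum)
max_visits <- tapply(toy$VISITS, toy$NAME, max)
n_rows     <- tapply(toy$VISITS, toy$NAME, length)

comparison <- data.frame(
  NAME = names(sum_visits),
  Sum  = as.vector(sum_visits),
  Max  = as.vector(max_visits),
  Rows = as.vector(n_rows),
  stringsAsFactors = FALSE
)
print(comparison)
```

Here `person_a` has a sum of 30 but a maximum of 10; which figure is meaningful depends on how `VISITS` is actually recorded.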

## Geospatial Analysis

Using latitude and longitude to map attendance locations (filtering out NAs).

### Map of Attendance Locations

Interactive map showing clustered attendance points.

```r
attendance_geo <- attendance %>%
  filter(!is.na(LATTITUDE) & !is.na(LONGITUDE))

leaflet(data = attendance_geo) %>%
  addTiles() %>%
  addMarkers(lng = ~LONGITUDE, lat = ~LATTITUDE,
             popup = ~paste("Name:", NAME, "<br>Date:", DATE),
             clusterOptions = markerClusterOptions())
```

Insight: The 679 records with valid coordinates (910 minus 231 missing) cluster around latitude 0.317 and longitude 32.58, corresponding to central Kampala, Uganda, which is consistent with localized events.
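
To see how tightly points cluster, one option is to compute each point's great-circle distance from the approximate cluster centre with the haversine formula. This is a base R sketch under stated assumptions: `haversine_km` is a helper defined here, the centre uses the rounded medians from the summary, the 5 km threshold is arbitrary, and the three points in `toy` are hypothetical.

```r
# Great-circle distance in kilometres between two points given in degrees
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Approximate cluster centre (rounded medians from the summary output)
centre_lat <- 0.317
centre_lon <- 32.58

# Hypothetical points for illustration only
toy <- data.frame(
  LATTITUDE = c(0.3172, 0.0932, 0.3748),
  LONGITUDE = c(32.58, 32.50, 32.65)
)

toy$km_from_centre <- haversine_km(centre_lat, centre_lon,
                                   toy$LATTITUDE, toy$LONGITUDE)

# Arbitrary 5 km threshold to mark check-ins far from the main cluster
toy$far <- toy$km_from_centre > 5
print(toy)
```

Points marked `far` could be remote check-ins or inaccurate device locations; the distance alone does not distinguish the two.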

## Attendance by Club

### Club Name Distribution (Table and Graph)

Overview of attendances by club name.

```r
club_summary <- attendance %>%
  group_by(CLUB.NAME) %>%
  summarise(Count = n()) %>%
  arrange(desc(Count))

knitr::kable(head(club_summary, 10), caption = "Top 10 Clubs by Attendance Count")
```

Table: Top 10 Clubs by Attendance Count

| CLUB.NAME | Count |
|:----------|------:|
| ROTARACT CLUB OF KAMPALA NORTH | 589 |
| NO CLUB | 128 |
| ROTARY CLUB OF KAMPALA-NORTH | 13 |
| ROTARACT CLUB OF MAKERERE UNIVERSITY | 10 |
| ROTARACT CLUB OF BWEYOGERERE - NAMBOOLE | 8 |
| ROTARACT CLUB OF UGANDA INSTITUTE OF INFORMATION AND COMMUNICATIONS TECHNOLOGY | 8 |
| ROTARACT CLUB OF KAMPALA MAHABA | 7 |
| ROTARACT CLUB OF KAMPALA THE CORE | 7 |
| ROTARACT CLUB OF ACACIA SUNSET KAMPALA | 6 |
| ROTARACT CLUB OF ENTEBBE AIRPORT | 6 |

```r
club_summary %>%
  head(10) %>%  # Limit to top 10 for visualization
  ggplot(aes(x = reorder(CLUB.NAME, -Count), y = Count)) +
  geom_bar(stat = "identity", fill = "orange") +
  labs(title = "Top 10 Attendances by Club Name", x = "Club Name", y = "Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```

Insight: The Rotaract Club of Kampala North dominates with 589 attendances (about 65% of the 910 records), followed by “No Club” (128). This indicates the primary club’s strong presence, with smaller contributions from associated clubs.