Importing the Libraries and Dataset

## I first load the libraries I need to import my data, organize it, and create graphs.
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.5.2
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.5.2
library(readr)
## In this chunk I load my dataset into R and take a quick look at it to understand what the data looks like.

aquatics <- read_csv("C:/Users/arnav/OneDrive/Data 101/Recreation_Spring_Aquatics_Programs_2016_20260316.csv")
## Rows: 601 Columns: 20
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (15): Season, Primary Category, Secondary Category, ActivityName, Descr...
## dbl   (3): Zip, Sessions, Fee
## time  (2): Start Time, End Time
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(aquatics)  # I use this function to look at the first few rows to understand the data 
## # A tibble: 6 × 20
##   Season `Primary Category` `Secondary Category` ActivityName        Description
##   <chr>  <chr>              <chr>                <chr>               <chr>      
## 1 SPRING Aquatics           Water Fitness        Deep Water Running  Running in…
## 2 Spring Aquatics           <NA>                 Kids Facility Take… <NA>       
## 3 Spring Aquatics           Lifeguard Training   Lifeguarding Class… This cours…
## 4 Spring Aquatics           Water Fitness        Aqua Cardio Challe… This class…
## 5 Spring Aquatics           Water Fitness        Aqua Lite           This class…
## 6 Spring Aquatics           Diving               Masters Diving      This progr…
## # ℹ 15 more variables: ActivityNumber <chr>, Ages <chr>, Location <chr>,
## #   Address <chr>, City <chr>, State <chr>, Zip <dbl>, `Start Date` <chr>,
## #   `End Date` <chr>, `Start Time` <time>, `End Time` <time>, Sessions <dbl>,
## #   `Days of the Week` <chr>, Fee <dbl>, `Address/Location` <chr>
dim(aquatics)   # I use this function to check how many rows and columns are in the dataset
## [1] 601  20

How do program fees vary by activity type among aquatics programs offered at Montgomery County recreation centers?

Introduction

In this project I will analyze how program fees vary across the different aquatics activities offered by Montgomery County recreation centers. After cleaning the dataset by removing rows with missing or zero fees, I will use summary statistics to understand the overall distribution of the fees. I will then use a bar chart to compare the average fee for each activity type. Overall, the type of activity plays a key role in determining program pricing.

Data Analysis

To answer my research question, I will clean the dataset by removing missing values, then use summary statistics to explore the Fee column. I will use group_by() and summarise() to find the average fee for each activity type as well as create a bar chart and a boxplot to visualize how fees differ across categories. The dataset I used is the Montgomery County Recreation Spring Aquatics Programs (2016), from the Montgomery County Open Data Portal (https://data.montgomerycountymd.gov/Community-Recreation/Recreation-Spring-Aquatics-Programs-2016/ymfp-d5nv/about_data). This dataset consists of 20 variables and 601 observations. It contains information about aquatics programs offered at recreation centers across Montgomery County, Maryland.

Some of the variables in this dataset I am using are:

Secondary Category - the type of aquatics activity
Fee - the cost in dollars to enroll in the program
Location - the recreation center where the program takes place

Cleaning the Dataset

# In this chunk I check how many missing values are in each column
colSums(is.na(aquatics))
##             Season   Primary Category Secondary Category       ActivityName 
##                  0                  0                  7                  0 
##        Description     ActivityNumber               Ages           Location 
##                 10                  0                  0                  0 
##            Address               City              State                Zip 
##                  0                  0                  0                  0 
##         Start Date           End Date         Start Time           End Time 
##                  0                  0                  0                  0 
##           Sessions   Days of the Week                Fee   Address/Location 
##                  0                  0                  0                  0
aquatics_clean <- aquatics %>% # I create a cleaned version of the dataset
  filter(!is.na(Fee), Fee > 0) # I keep only the rows where Fee is not missing and is greater than zero
dim(aquatics_clean) # I use this function to check how many rows and columns are left after cleaning
## [1] 592  20
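
The filter above keeps rows whose Secondary Category is missing (the missing-value check reports 7 such rows in the raw data), so a missing activity type can still appear as its own group in a grouped summary. A minimal sketch of also dropping those rows is shown here. The tibble `toy` and its values are made up for illustration and stand in for the real `aquatics` data.

```r
library(dplyr)

# Hypothetical toy data standing in for the aquatics dataset; values are made up
toy <- tibble(
  `Secondary Category` = c("Diving", NA, "Water Fitness", "Diving", "Water Fitness"),
  Fee = c(400, 130, 0, 450, NA)
)

# Keep rows with a known activity type and a positive, non-missing fee
toy_clean <- toy %>%
  filter(!is.na(`Secondary Category`), !is.na(Fee), Fee > 0)

dim(toy_clean) # number of rows and columns left after cleaning
```

Whether to drop these rows is a judgment call: keeping them preserves more fee observations, while dropping them keeps every group in the comparison tied to a named activity type.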

Exploratory Data Analysis

# This chunk shows the summary statistics for Fee
summary(aquatics_clean$Fee) # I look at the overall summary, including the min, median, mean, and max
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    10.0    65.0    65.0   101.9    77.0  2439.0
# Mean and max fee
mean(aquatics_clean$Fee) # I find the average fee
## [1] 101.8902
max(aquatics_clean$Fee)  # I find the maximum fee
## [1] 2439
# This chunk shows the average fee by activity type 
fee_by_category <- aquatics_clean %>%
  group_by(`Secondary Category`) %>% # I group the data by activity type using group_by()
  summarise(average_fee = mean(Fee)) # I calculate the average fee for each activity type using summarise()

fee_by_category  # I display the average fee for each activity type
## # A tibble: 10 × 2
##    `Secondary Category`         average_fee
##    <chr>                              <dbl>
##  1 Adult Swim Lessons                  65  
##  2 Beginner Swim Lessons               65  
##  3 Developmental Swim                 144. 
##  4 Diving                             425. 
##  5 Lifeguard Training                  50  
##  6 Parent-Assisted Swim Lessons        60  
##  7 Scuba Classes                      265  
##  8 Water Fitness                       98.0
##  9 Youth Swim Lessons                  67.7
## 10 <NA>                               130
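
The overall summary shows a mean fee (101.9) well above the median (65) and a maximum of 2439, which suggests that a few very high fees pull the averages up. One way to check this per activity type is to report the median and the number of programs next to the mean. The sketch below uses a small made-up tibble `toy` (hypothetical values, not the real data); the column names `n_programs` and `median_fee` are also just illustrative.

```r
library(dplyr)

# Hypothetical toy data; one Diving fee is deliberately very large
toy <- tibble(
  `Secondary Category` = c("Diving", "Diving", "Diving", "Water Fitness", "Water Fitness"),
  Fee = c(100, 120, 2000, 60, 70)
)

fee_summary <- toy %>%
  group_by(`Secondary Category`) %>%
  summarise(
    n_programs  = n(),         # how many programs are in each activity type
    average_fee = mean(Fee),   # sensitive to a few very high fees
    median_fee  = median(Fee)  # less affected by extreme values
  ) %>%
  arrange(desc(median_fee))    # highest median fee first

fee_summary
```

In the toy data the single large Diving fee moves the mean far from the median, which is the pattern to look for when the same summary is run on `aquatics_clean`.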

Visualization

# In this chunk I create a bar chart to clearly compare average fees across different activity types
# I put the Secondary Category (activity type) on the x-axis and the average fee on the y-axis
# I draw the bars with geom_col(), which uses the average_fee values as the bar heights
# I use coord_flip() to make the bars horizontal so the activity names are easier to read
# I use labs() to label each part of the graph
# I apply a clean, minimal theme using the theme_minimal() function
ggplot(fee_by_category, aes(x = `Secondary Category`, y = average_fee, fill = `Secondary Category`)) +
  geom_col() +
  coord_flip() +
  labs(title = "Average Fee by Activity Type",
       x = "Activity Type",
       y = "Average Fee ($)") +
  theme_minimal()
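
The Data Analysis section also mentions a boxplot. A bar chart of averages hides how fees are spread within each activity type, so a boxplot of the individual fees is one way to show that. Here is a minimal sketch, assuming a data frame with one row per program; `toy`, its `category` column, and its values are made up for illustration.

```r
library(ggplot2)

# Hypothetical toy data: one row per program, values are made up
toy <- data.frame(
  category = c("Diving", "Diving", "Diving", "Water Fitness", "Water Fitness", "Water Fitness"),
  Fee = c(100, 120, 2000, 60, 70, 80)
)

# One box per activity type, built from the individual fees rather than the averages
p <- ggplot(toy, aes(x = category, y = Fee)) +
  geom_boxplot() +
  coord_flip() +
  labs(title = "Fee Distribution by Activity Type",
       x = "Activity Type",
       y = "Fee ($)") +
  theme_minimal()

p
```

With the real data, the same plot would use `aquatics_clean` with `Secondary Category` in backticks on the x-axis. If a few very high fees squash the boxes, adding `scale_y_log10()` is one option to consider.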

Analysis of Visualization

Based on the graph, Diving has the highest average fee, at just above $400, and Scuba Classes has the second highest, at just above $250. The other activity types all average under $150. For example, the average fee for Water Fitness is just under $100, the average fee for Youth Swim Lessons is about $68, and Lifeguard Training has the lowest average fee at $50.

Conclusion

Based on my analysis, program fees do vary by activity type in Montgomery County. Diving and Scuba Classes have the highest average fees, possibly because they require more sessions and specialized instruction, although this analysis does not test that explanation. Lifeguard Training and Parent-Assisted Swim Lessons are generally more affordable, which may make them accessible to a wider range of residents. In the future, it would be interesting to look at how fees have changed over multiple years, or to compare fees across different recreation center locations. It would also be useful to see whether higher fees affect the number of people who sign up for certain programs.


References

Montgomery County, Maryland. (2016). Recreation Spring Aquatics Programs 2016. Montgomery County Open Data Portal. https://data.montgomerycountymd.gov/Community-Recreation/Recreation-Spring-Aquatics-Programs-2016/ymfp-d5nv/about_data