Importing the Libraries and Dataset

## I am loading the libraries I need to import my data, organize it, and create graphs that make the data easier to understand.
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.5.2
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.5.2
library(readr)
## In this chunk I load my dataset into R and take a quick look at it to understand what the data looks like.

aquatics <- read_csv("C:/Users/arnav/OneDrive/Data 101/Recreation_Spring_Aquatics_Programs_2016_20260316.csv")
## Rows: 601 Columns: 20
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (15): Season, Primary Category, Secondary Category, ActivityName, Descr...
## dbl   (3): Zip, Sessions, Fee
## time  (2): Start Time, End Time
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(aquatics)  # I use this function to look at the first few rows to understand the data 
## # A tibble: 6 × 20
##   Season `Primary Category` `Secondary Category` ActivityName        Description
##   <chr>  <chr>              <chr>                <chr>               <chr>      
## 1 SPRING Aquatics           Water Fitness        Deep Water Running  Running in…
## 2 Spring Aquatics           <NA>                 Kids Facility Take… <NA>       
## 3 Spring Aquatics           Lifeguard Training   Lifeguarding Class… This cours…
## 4 Spring Aquatics           Water Fitness        Aqua Cardio Challe… This class…
## 5 Spring Aquatics           Water Fitness        Aqua Lite           This class…
## 6 Spring Aquatics           Diving               Masters Diving      This progr…
## # ℹ 15 more variables: ActivityNumber <chr>, Ages <chr>, Location <chr>,
## #   Address <chr>, City <chr>, State <chr>, Zip <dbl>, `Start Date` <chr>,
## #   `End Date` <chr>, `Start Time` <time>, `End Time` <time>, Sessions <dbl>,
## #   `Days of the Week` <chr>, Fee <dbl>, `Address/Location` <chr>
dim(aquatics)   # I use this function to check how many rows and columns are in the dataset
## [1] 601  20

Question

How do program fees vary by activity type among aquatics programs offered at Montgomery County recreation centers?

Introduction

The dataset I chose comes from the Montgomery County Open Data Portal (https://data.montgomerycountymd.gov/Community-Recreation/Recreation-Spring-Aquatics-Programs-2016/ymfp-d5nv/about_data). It is provided by the Montgomery County Department of Recreation and includes information about the aquatics programs offered in Montgomery County, Maryland. The dataset was originally created on March 10, 2016. It contains 601 observations and 20 variables, including program type, location, age requirements, and program fees.

The dataset covers only programs within Montgomery County, so it represents local recreation programs and does not include data from other counties or regions. Even though the dataset was created in 2016, it is still useful for analyzing how aquatics activities are priced. I chose this dataset because it provides real-world information about community recreation programs, which allows me to analyze how program fees vary by activity type in a public dataset.

Data Analysis

To answer my research question, I will clean the dataset by removing rows with missing values, then use summary statistics to explore the Fee column. I will use group_by() and summarise() to find the average fee for each activity type, and I will create a bar chart to visualize how fees differ across categories.

Some of the variables in this dataset I am using are:

Secondary Category (Categorical) - the type of aquatics activity
Fee (Numerical) - the cost in dollars to enroll in the program
Location (Categorical) - the recreation center where the program takes place

Cleaning the Dataset

# In this chunk I check how many missing values are in each column
colSums(is.na(aquatics))
##             Season   Primary Category Secondary Category       ActivityName 
##                  0                  0                  7                  0 
##        Description     ActivityNumber               Ages           Location 
##                 10                  0                  0                  0 
##            Address               City              State                Zip 
##                  0                  0                  0                  0 
##         Start Date           End Date         Start Time           End Time 
##                  0                  0                  0                  0 
##           Sessions   Days of the Week                Fee   Address/Location 
##                  0                  0                  0                  0
aquatics_clean <- aquatics %>% # I remove the rows where Fee or Secondary Category is missing
  filter(!is.na(Fee) & !is.na(`Secondary Category`)) # I kept the programs that have a fee of $0 since they represent free programs, which are valid for analysis
dim(aquatics_clean) # I use this function to check how many rows and columns are left after cleaning
## [1] 594  20
head(aquatics_clean) # I use this function to display the first few rows of the cleaned dataset
## # A tibble: 6 × 20
##   Season `Primary Category` `Secondary Category` ActivityName        Description
##   <chr>  <chr>              <chr>                <chr>               <chr>      
## 1 SPRING Aquatics           Water Fitness        Deep Water Running  Running in…
## 2 Spring Aquatics           Lifeguard Training   Lifeguarding Class… This cours…
## 3 Spring Aquatics           Water Fitness        Aqua Cardio Challe… This class…
## 4 Spring Aquatics           Water Fitness        Aqua Lite           This class…
## 5 Spring Aquatics           Diving               Masters Diving      This progr…
## 6 Spring Aquatics           Diving               Level 1:  Learn to… This is a …
## # ℹ 15 more variables: ActivityNumber <chr>, Ages <chr>, Location <chr>,
## #   Address <chr>, City <chr>, State <chr>, Zip <dbl>, `Start Date` <chr>,
## #   `End Date` <chr>, `Start Time` <time>, `End Time` <time>, Sessions <dbl>,
## #   `Days of the Week` <chr>, Fee <dbl>, `Address/Location` <chr>
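
The effect of the filter condition is easier to see on a tiny example. This is a minimal sketch, assuming a made-up table called `toy` (the name and all values are hypothetical, not taken from the dataset). It shows that rows with a missing Fee or a missing Secondary Category are dropped while a $0 fee is kept.

```r
library(dplyr)

# Hypothetical toy data: values are invented for illustration only
toy <- tibble(
  `Secondary Category` = c("Diving", NA, "Staff Training", "Water Fitness"),
  Fee = c(120, 65, 0, NA)
)

# Same condition as the cleaning step: keep rows where neither column is missing
toy_clean <- toy %>%
  filter(!is.na(Fee) & !is.na(`Secondary Category`))

toy_clean        # the $0 row is kept; the two rows containing NA are dropped
nrow(toy_clean)  # number of rows left after filtering
```

Because `is.na()` only flags missing values, a fee of 0 passes the filter, which matches the decision to treat free programs as valid.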

Exploratory Data Analysis

# This chunk shows the summary statistics for the Fee variable
summary(aquatics_clean$Fee) # I looked at the overall summary, including the min, median, mean, and max
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0      65      65     100      77    2439
# Mean and max fee
mean(aquatics_clean$Fee, na.rm = TRUE) # I calculated the mean fee across the programs
## [1] 100.0152
max(aquatics_clean$Fee, na.rm = TRUE)  # I calculated the maximum fee across the programs
## [1] 2439
# This chunk shows the average fee by activity type 
fee_by_category <- aquatics_clean %>%
  group_by(`Secondary Category`) %>% # I am grouping the data by activity type using the group_by function
  summarise(average_fee = mean(Fee, na.rm = TRUE)) # I am calculating the average fee for each activity type using the summarise function

fee_by_category  # This displays the average fee for each activity type
## # A tibble: 10 × 2
##    `Secondary Category`         average_fee
##    <chr>                              <dbl>
##  1 Adult Swim Lessons                 65   
##  2 Beginner Swim Lessons              65   
##  3 Developmental Swim                144.  
##  4 Diving                            425.  
##  5 Lifeguard Training                  8.33
##  6 Parent-Assisted Swim Lessons       60   
##  7 Scuba Classes                     265   
##  8 Staff Training                      0   
##  9 Water Fitness                      98.0 
## 10 Youth Swim Lessons                 67.7
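
Since the overall summary shows a median fee of 65 but a maximum of 2439, a few expensive programs can pull a category's average up. One possible check is to report the number of programs and the median next to the mean. This is a minimal sketch on a made-up table called `toy` (hypothetical name and values, not results from the dataset).

```r
library(dplyr)

# Hypothetical toy data: one very expensive Diving program skews the mean
toy <- tibble(
  `Secondary Category` = c("Diving", "Diving", "Diving",
                           "Water Fitness", "Water Fitness"),
  Fee = c(50, 60, 1000, 70, 80)
)

# Count, mean, and median per activity type
toy_summary <- toy %>%
  group_by(`Secondary Category`) %>%
  summarise(
    n_programs  = n(),          # how many programs are in the group
    average_fee = mean(Fee),    # sensitive to very large fees
    median_fee  = median(Fee)   # the middle fee, less affected by outliers
  )

toy_summary
```

In this toy example the Diving mean is far above its median, which is the kind of gap that would suggest looking at individual programs before interpreting the average.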

Visualization

# In this chunk I am creating a bar chart to clearly compare average fees across different activity types
# I put the activity type (Secondary Category) on the x-axis and the average fee on the y-axis
# I am creating the bars with the geom_bar() function, using the average_fee values as the bar heights
# I am using labs() to add a title and axis labels to the graph
# I am applying a minimal theme using the theme_minimal() function
ggplot(fee_by_category, aes(x = `Secondary Category`, y = average_fee, fill = `Secondary Category`)) +
  geom_bar(stat = "identity", na.rm = FALSE) +
  coord_flip() +
  labs(title = "Average Fee by Activity Type",
       x = "Activity Type",
       y = "Average Fee ($)") +
  theme_minimal()

Analysis of Visualization

Based on the graph, the average fee for diving is just above $400, and the average fee for scuba classes is just above $250, while the average fees for other activity types are much lower. For example, water fitness and youth swim lessons have average fees just under $100 and just under $70, respectively.
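
To read the ranking directly instead of estimating it from the bars, the category averages can be sorted from highest to lowest. This is a minimal sketch that retypes the rounded values printed in the fee_by_category table, so the numbers are approximate. The name `fee_table` is hypothetical and is used so the sketch does not overwrite the real object.

```r
library(dplyr)

# Rounded averages copied by hand from the printed fee_by_category output
fee_table <- tibble(
  `Secondary Category` = c(
    "Adult Swim Lessons", "Beginner Swim Lessons", "Developmental Swim",
    "Diving", "Lifeguard Training", "Parent-Assisted Swim Lessons",
    "Scuba Classes", "Staff Training", "Water Fitness", "Youth Swim Lessons"
  ),
  average_fee = c(65, 65, 144, 425, 8.33, 60, 265, 0, 98.0, 67.7)
)

# Sort so the most expensive activity types come first
fee_ranked <- fee_table %>%
  arrange(desc(average_fee))

fee_ranked
```

In the actual analysis the same `arrange(desc(average_fee))` step could be applied to `fee_by_category` itself rather than to retyped values.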

Conclusion

In conclusion, program fees do vary by activity type in Montgomery County. Scuba classes and diving tend to have the highest average fees. This may be because they require more sessions and specialized instruction, although this analysis does not test that explanation. In contrast, lifeguard training and parent-assisted swim lessons are generally more affordable, which makes them more accessible to a wider range of residents. In the future, it would be interesting to look at how fees have changed over multiple years, or to compare fees across different recreation center locations. It would also be useful to see whether higher fees affect the number of people who enroll in certain programs.
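
One way to explore whether the number of sessions helps explain the higher fees is to compare the fee per session across activity types. This is a minimal sketch on a made-up table called `toy` (hypothetical name and values). It assumes the Sessions column counts the meetings in a program, which the dataset itself would need to confirm.

```r
library(dplyr)

# Hypothetical toy data: values are invented for illustration only
toy <- tibble(
  `Secondary Category` = c("Diving", "Diving", "Water Fitness", "Water Fitness"),
  Fee      = c(400, 200, 60, 90),
  Sessions = c(20, 10, 6, 9)
)

# Average fee per session for each activity type
fee_per_session <- toy %>%
  filter(Sessions > 0) %>%                     # avoid dividing by zero
  mutate(fee_per_session = Fee / Sessions) %>% # cost of one session in dollars
  group_by(`Secondary Category`) %>%
  summarise(avg_fee_per_session = mean(fee_per_session))

fee_per_session
```

If the per-session fees turned out similar across activity types, that would point to session count as the main driver. A large gap would suggest that something else, such as specialized instruction, plays a role.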


References

Montgomery County, Maryland. (2016). Recreation Spring Aquatics Programs 2016. Montgomery County Open Data Portal. https://data.montgomerycountymd.gov/Community-Recreation/Recreation-Spring-Aquatics-Programs-2016/ymfp-d5nv/about_data