Importing the Libraries and Dataset
## I first load the libraries I need to import my data, organize it, and create graphs that make it easier to understand.
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.5.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.5.2
library(readr)
## In this chunk I load my dataset into R and take a quick look at it to understand what the data looks like.
aquatics <- read_csv("C:/Users/arnav/OneDrive/Data 101/Recreation_Spring_Aquatics_Programs_2016_20260316.csv")
## Rows: 601 Columns: 20
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (15): Season, Primary Category, Secondary Category, ActivityName, Descr...
## dbl (3): Zip, Sessions, Fee
## time (2): Start Time, End Time
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(aquatics) # I use this function to look at the first few rows to understand the data
## # A tibble: 6 × 20
## Season `Primary Category` `Secondary Category` ActivityName Description
## <chr> <chr> <chr> <chr> <chr>
## 1 SPRING Aquatics Water Fitness Deep Water Running Running in…
## 2 Spring Aquatics <NA> Kids Facility Take… <NA>
## 3 Spring Aquatics Lifeguard Training Lifeguarding Class… This cours…
## 4 Spring Aquatics Water Fitness Aqua Cardio Challe… This class…
## 5 Spring Aquatics Water Fitness Aqua Lite This class…
## 6 Spring Aquatics Diving Masters Diving This progr…
## # ℹ 15 more variables: ActivityNumber <chr>, Ages <chr>, Location <chr>,
## # Address <chr>, City <chr>, State <chr>, Zip <dbl>, `Start Date` <chr>,
## # `End Date` <chr>, `Start Time` <time>, `End Time` <time>, Sessions <dbl>,
## # `Days of the Week` <chr>, Fee <dbl>, `Address/Location` <chr>
dim(aquatics) # I use this function to check how many rows and columns are in the dataset
## [1] 601 20
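The column specification message above appears because read_csv() has to guess the column types. A minimal sketch of declaring the types up front is shown here. It reads a small temporary file with three made-up rows (the file, the pool names, and the fee values are assumptions for illustration, not the real dataset), so it runs without the original CSV.

```r
library(readr)

# Hypothetical stand-in file: three made-up rows written to a temporary CSV
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Secondary Category,Location,Fee",
  "Water Fitness,Example Pool A,65",
  "Diving,Example Pool B,425",
  "Scuba Classes,Example Pool A,265"
), csv_path)

# Declaring the column types up front avoids type guessing and
# the column specification message
demo <- read_csv(
  csv_path,
  col_types = cols(
    `Secondary Category` = col_character(),
    Location = col_character(),
    Fee = col_double()
  )
)

dim(demo) # rows and columns, as with dim(aquatics)
```

The same col_types argument could be passed when reading the real file. A path relative to the project folder would also make the chunk easier to run on another computer than an absolute path.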
How do program fees vary by activity type among aquatics programs offered at Montgomery County recreation centers?
In my project I am going to analyze how program fees vary across the different aquatics activities offered by Montgomery County recreation centers. After cleaning the dataset by removing rows with missing or zero fees, I will use summary statistics to better understand the overall distribution of the fees. I will use visualizations like a bar chart to compare the average fee for each activity type. Overall, the type of activity plays a key role in determining program pricing.
To answer my research question, I will clean the dataset by removing rows with missing or zero fees, then use summary statistics to explore the Fee column. I will use group_by() and summarise() to find the average fee for each activity type, and I will create a bar chart and a boxplot to visualize how fees differ across categories.
The dataset I use is the Montgomery County Recreation Spring Aquatics Programs (2016), from the Montgomery County Open Data Portal (https://data.montgomerycountymd.gov/Community-Recreation/Recreation-Spring-Aquatics-Programs-2016/ymfp-d5nv/about_data). This dataset consists of 20 variables and 601 observations. It contains information about aquatics programs offered at recreation centers across Montgomery County, Maryland.
Some of the variables in this dataset that I am using are:
Secondary Category - the type of aquatics activity
Fee - the cost in dollars to enroll in the program
Location - the recreation center where the program takes place
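As a small sketch of how these variables could be pulled out on their own, the example below uses select() on a made-up two-row table. The rows, the pool names, and the object names demo and demo_small are assumptions for illustration only.

```r
library(dplyr)

# Made-up rows that only mimic the structure of the dataset
demo <- tibble(
  `Secondary Category` = c("Water Fitness", "Diving"),
  Fee = c(65, 425),
  Location = c("Example Pool A", "Example Pool B"),
  Sessions = c(8, 10)
)

# Column names that contain spaces need backticks
demo_small <- demo %>%
  select(`Secondary Category`, Fee, Location)

names(demo_small)
```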
Cleaning the Dataset
# In this chunk I check how many missing values are in each column
colSums(is.na(aquatics))
## Season Primary Category Secondary Category ActivityName
## 0 0 7 0
## Description ActivityNumber Ages Location
## 10 0 0 0
## Address City State Zip
## 0 0 0 0
## Start Date End Date Start Time End Time
## 0 0 0 0
## Sessions Days of the Week Fee Address/Location
## 0 0 0 0
aquatics_clean <- aquatics %>% # I create a cleaned version of the dataset
  filter(!is.na(Fee), Fee > 0) # I remove the rows where Fee is missing or zero
dim(aquatics_clean) # I use this function to check how many rows and columns are left after cleaning
## [1] 592 20
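Since colSums() reports 0 missing values for Fee, the 9 rows that were dropped (601 - 592) must all be rows with a fee of zero. A minimal sketch of how each filter condition could be tallied separately is shown below, using made-up fee values rather than the real column.

```r
library(dplyr)

# Made-up fees: one missing value and two zero fees
demo <- tibble(Fee = c(65, 0, NA, 77, 0))

removed_counts <- demo %>%
  summarise(
    missing_fee = sum(is.na(Fee)),            # rows dropped by !is.na(Fee)
    zero_fee = sum(Fee == 0, na.rm = TRUE),   # rows dropped by Fee > 0
    kept = sum(!is.na(Fee) & Fee > 0)         # rows that survive both checks
  )

removed_counts
```

Running the same summarise() on aquatics would confirm how many of the removed rows were zero-fee programs.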
Exploratory Data Analysis
# This chunk shows the summary statistics for Fee
summary(aquatics_clean$Fee) # I look at the overall summary: minimum, quartiles, median, mean, and maximum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.0 65.0 65.0 101.9 77.0 2439.0
# Mean and max fee
mean(aquatics_clean$Fee) # I find the average value for the fee variable
## [1] 101.8902
max(aquatics_clean$Fee) # I find the maximum value of the Fee variable
## [1] 2439
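In the summary above, the mean (101.9) is larger than the third quartile (77.0), which suggests that a few very large fees such as the $2,439 maximum pull the average upward. The sketch below shows the same effect on five illustrative numbers borrowed from the summary output. They are not the full Fee column.

```r
# Illustrative values only (min, quartiles, and max from the summary),
# not the full Fee column
fees <- c(10, 65, 65, 77, 2439)

mean(fees)                    # pulled upward by the single large fee
median(fees)                  # does not depend on how large the largest fee is
mean(fees[fees < max(fees)])  # mean once the largest fee is left out
```

This is one reason the median may describe a typical fee better than the mean for this dataset.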
# This chunk shows the average fee by activity type
fee_by_category <- aquatics_clean %>%
  group_by(`Secondary Category`) %>% # I group the data by activity type using the group_by() function
  summarise(average_fee = mean(Fee)) # I calculate the average fee for each activity type using summarise()
fee_by_category # I display the average fee for each activity type
## # A tibble: 10 × 2
## `Secondary Category` average_fee
## <chr> <dbl>
## 1 Adult Swim Lessons 65
## 2 Beginner Swim Lessons 65
## 3 Developmental Swim 144.
## 4 Diving 425.
## 5 Lifeguard Training 50
## 6 Parent-Assisted Swim Lessons 60
## 7 Scuba Classes 265
## 8 Water Fitness 98.0
## 9 Youth Swim Lessons 67.7
## 10 <NA> 130
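The table includes an <NA> group because some programs have no Secondary Category, and an average alone does not show how many programs it is based on. A possible extension is sketched below on made-up rows: it labels the missing category and adds a count and a median next to the mean. The label "Uncategorized" and the column names n_programs and median_fee are assumptions, not part of the original analysis.

```r
library(dplyr)

# Made-up rows, including one program with no category
demo <- tibble(
  `Secondary Category` = c("Diving", "Diving", "Water Fitness", NA),
  Fee = c(400, 450, 65, 130)
)

demo_summary <- demo %>%
  # Give programs without a category an explicit (hypothetical) label
  mutate(`Secondary Category` = coalesce(`Secondary Category`, "Uncategorized")) %>%
  group_by(`Secondary Category`) %>%
  summarise(
    n_programs = n(),         # how many programs the average is based on
    average_fee = mean(Fee),
    median_fee = median(Fee)
  )

demo_summary
```

A category with only a handful of programs can have an average driven by a single expensive class, so the count helps when reading the comparison.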
Visualization
# In this chunk I am creating a bar chart to clearly compare average fees across different activity types
# I use the Secondary Category activity type on the x-axis and the average fee on the y-axis
# I create the bars with the geom_bar() function, using the average_fee values as the bar heights
# I use labs() to label each part of the graph
# I apply a clean, minimal theme using the theme_minimal() function
ggplot(fee_by_category, aes(x = `Secondary Category`, y = average_fee, fill = `Secondary Category`)) +
  geom_bar(stat = "identity", na.rm = FALSE) +
  coord_flip() +
  labs(title = "Average Fee by Activity Type",
       x = "Activity Type",
       y = "Average Fee ($)") +
  theme_minimal()
Based on the graph, the average fee for Diving is just above $400 and the average fee for Scuba Classes is just above $250, while the average fees for the other activity types are much lower. For example, the average fee for Water Fitness is just under $100 and the average fee for Youth Swim Lessons is about $68.
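A bar chart of averages hides how much fees vary within each activity type, and a boxplot is one way to show that spread. The sketch below uses a made-up data frame (the column names category and fee and all values are assumptions) and a log scale on the fee axis, which is optional but can help when a few fees are far larger than the rest.

```r
library(ggplot2)

# Made-up fees for two activity types
demo <- data.frame(
  category = rep(c("Diving", "Water Fitness"), each = 4),
  fee = c(300, 400, 450, 550, 60, 65, 65, 200)
)

fee_boxplot <- ggplot(demo, aes(x = category, y = fee)) +
  geom_boxplot() +      # one box per activity type
  scale_y_log10() +     # log scale; requires fees above zero
  coord_flip() +
  labs(title = "Fee Distribution by Activity Type",
       x = "Activity Type",
       y = "Fee ($, log scale)") +
  theme_minimal()

fee_boxplot
```

With the real data, the same layers could be applied to aquatics_clean with `Secondary Category` on the x-axis and Fee on the y-axis.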
Based on my analysis, program fees do vary by activity type in Montgomery County. Diving and Scuba Classes have the highest average fees, possibly because they require more sessions and specialized instruction, although I did not test this. Lifeguard Training and Parent-Assisted Swim Lessons have the lowest average fees, which makes them more accessible to a wider range of residents. In the future, it would be interesting to look at how fees have changed over multiple years, or to compare fees across different recreation center locations. It would also be useful to see whether higher fees affect the number of people who sign up for certain programs.
Montgomery County, Maryland. (2016). Recreation Spring Aquatics Programs 2016. Montgomery County Open Data Portal. https://data.montgomerycountymd.gov/Community-Recreation/Recreation-Spring-Aquatics-Programs-2016/ymfp-d5nv/about_data