Importing the Libraries and Dataset
## I am loading the libraries I need to import my data, organize it, and create graphs that make the data easier to understand.
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.5.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.5.2
library(readr)
## In this chunk I am loading my dataset into R and taking a quick look at it to understand what the data looks like.
aquatics <- read_csv("C:/Users/arnav/OneDrive/Data 101/Recreation_Spring_Aquatics_Programs_2016_20260316.csv")
## Rows: 601 Columns: 20
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (15): Season, Primary Category, Secondary Category, ActivityName, Descr...
## dbl (3): Zip, Sessions, Fee
## time (2): Start Time, End Time
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(aquatics) # I use this function to look at the first few rows to understand the data
## # A tibble: 6 × 20
## Season `Primary Category` `Secondary Category` ActivityName Description
## <chr> <chr> <chr> <chr> <chr>
## 1 SPRING Aquatics Water Fitness Deep Water Running Running in…
## 2 Spring Aquatics <NA> Kids Facility Take… <NA>
## 3 Spring Aquatics Lifeguard Training Lifeguarding Class… This cours…
## 4 Spring Aquatics Water Fitness Aqua Cardio Challe… This class…
## 5 Spring Aquatics Water Fitness Aqua Lite This class…
## 6 Spring Aquatics Diving Masters Diving This progr…
## # ℹ 15 more variables: ActivityNumber <chr>, Ages <chr>, Location <chr>,
## # Address <chr>, City <chr>, State <chr>, Zip <dbl>, `Start Date` <chr>,
## # `End Date` <chr>, `Start Time` <time>, `End Time` <time>, Sessions <dbl>,
## # `Days of the Week` <chr>, Fee <dbl>, `Address/Location` <chr>
dim(aquatics) # I use this function to check how many rows and columns are in the dataset
## [1] 601 20
How do program fees vary by activity type among aquatics programs offered at Montgomery County recreation centers?
The dataset I chose comes from the Montgomery County Open Data Portal (https://data.montgomerycountymd.gov/Community-Recreation/Recreation-Spring-Aquatics-Programs-2016/ymfp-d5nv/about_data). It is provided by the Montgomery County Department of Recreation and describes the aquatics programs offered in Montgomery County, Maryland. The dataset was originally created on March 10, 2016. It contains 601 observations and 20 variables, including program type, location, age requirements, and program fees.
The dataset is limited to one county, so it represents local recreation programs only and does not include data from other counties or regions. Even though the dataset was created in 2016, it is still useful for analyzing how aquatic activities are priced. I chose this dataset because it provides real-world information about community recreation programs, which allows me to analyze how program fees vary by activity type in a public dataset.
To answer my research question, I will clean the dataset by removing rows with missing values in the columns I use, then use summary statistics to explore the Fee column. I will use group_by() and summarise() to find the average fee for each activity type, and I will create a bar chart to visualize how fees differ across categories.
The main variables I am using from this dataset are:
Secondary Category (Categorical) - the type of aquatics activity
Fee (Numerical) - the cost in dollars to enroll in the program
Location (Qualitative) - the recreation center where the program takes place
Cleaning the Dataset
# In this chunk I check how many missing values are in each column
colSums(is.na(aquatics))
## Season Primary Category Secondary Category ActivityName
## 0 0 7 0
## Description ActivityNumber Ages Location
## 10 0 0 0
## Address City State Zip
## 0 0 0 0
## Start Date End Date Start Time End Time
## 0 0 0 0
## Sessions Days of the Week Fee Address/Location
## 0 0 0 0
aquatics_clean <- aquatics %>% # I remove the rows where Fee or Secondary Category is missing
filter(!is.na(Fee) & !is.na(`Secondary Category`)) # I kept the programs that have a fee of $0 since they represent free programs, which are valid for analysis
dim(aquatics_clean) # I use this function to check how many rows and columns are left after cleaning
## [1] 594 20
head(aquatics_clean) # I use this function to display the first few rows of the cleaned dataset
## # A tibble: 6 × 20
## Season `Primary Category` `Secondary Category` ActivityName Description
## <chr> <chr> <chr> <chr> <chr>
## 1 SPRING Aquatics Water Fitness Deep Water Running Running in…
## 2 Spring Aquatics Lifeguard Training Lifeguarding Class… This cours…
## 3 Spring Aquatics Water Fitness Aqua Cardio Challe… This class…
## 4 Spring Aquatics Water Fitness Aqua Lite This class…
## 5 Spring Aquatics Diving Masters Diving This progr…
## 6 Spring Aquatics Diving Level 1: Learn to… This is a …
## # ℹ 15 more variables: ActivityNumber <chr>, Ages <chr>, Location <chr>,
## # Address <chr>, City <chr>, State <chr>, Zip <dbl>, `Start Date` <chr>,
## # `End Date` <chr>, `Start Time` <time>, `End Time` <time>, Sessions <dbl>,
## # `Days of the Week` <chr>, Fee <dbl>, `Address/Location` <chr>
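One way I could double-check the cleaning step is to confirm that the number of rows dropped matches the number of rows that actually had a missing value. In my output, 601 - 594 = 7, which matches the 7 missing Secondary Category values. The sketch below shows the same check on a small made-up data frame; the object `toy` and its values are assumptions for illustration only and are not taken from the real dataset.

```r
library(dplyr)

# Toy data with made-up values; it only mimics the two columns used in the filter.
toy <- data.frame(
  `Secondary Category` = c("Diving", NA, "Water Fitness", "Staff Training"),
  Fee = c(425, 65, NA, 0),
  check.names = FALSE # keep the space in the column name
)

# Same filter as in the report: drop rows missing Fee or Secondary Category
toy_clean <- toy %>%
  filter(!is.na(Fee) & !is.na(`Secondary Category`))

# Number of rows the filter dropped
rows_removed <- nrow(toy) - nrow(toy_clean)

# Number of rows with a missing value in either column; this should match rows_removed
rows_with_na <- sum(is.na(toy$Fee) | is.na(toy$`Secondary Category`))

rows_removed == rows_with_na
```

The row with a fee of 0 stays in `toy_clean`, because `is.na()` only flags missing values and a free program is not a missing value.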
Exploratory Data Analysis
# This chunk shows the summary statistics for the Fee variable
summary(aquatics_clean$Fee) # I looked at the overall summary, including the min, median, mean, and max
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 65 65 100 77 2439
# Mean and max fee
mean(aquatics_clean$Fee, na.rm = TRUE) # I calculated the mean fee across the programs
## [1] 100.0152
max(aquatics_clean$Fee, na.rm = TRUE) # I calculated the maximum fee for the fee variable
## [1] 2439
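The summary above shows a median of 65 but a mean of about 100, with a maximum of 2439. A gap like this usually means a few very expensive programs are pulling the mean upward. The sketch below illustrates the effect with a short made-up vector; `toy_fees` is a hypothetical example and not the real Fee column.

```r
# Made-up fees: most programs cost about the same, and one costs far more.
toy_fees <- c(65, 65, 65, 77, 2439)

mean_fee <- mean(toy_fees)     # pulled upward by the single large value
median_fee <- median(toy_fees) # stays at the typical fee

mean_fee
median_fee
```

Because of this, the median may describe a "typical" program fee better than the mean when a few fees are much larger than the rest.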
# This chunk shows the average fee by activity type
fee_by_category <- aquatics_clean %>%
group_by(`Secondary Category`) %>% # I am grouping the data by activity type using the group_by() function
summarise(average_fee = mean(Fee, na.rm = TRUE)) # I am calculating the average fee for each activity type using the summarise() function
fee_by_category # This displays the average fee for each activity type
## # A tibble: 10 × 2
## `Secondary Category` average_fee
## <chr> <dbl>
## 1 Adult Swim Lessons 65
## 2 Beginner Swim Lessons 65
## 3 Developmental Swim 144.
## 4 Diving 425.
## 5 Lifeguard Training 8.33
## 6 Parent-Assisted Swim Lessons 60
## 7 Scuba Classes 265
## 8 Staff Training 0
## 9 Water Fitness 98.0
## 10 Youth Swim Lessons 67.7
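An average by itself does not show how many programs it is based on, and a category with only a few programs can have an average driven by one fee. A possible extension is to add a count and a median to the same summarise() call. This is only a sketch on made-up data; `toy`, `category`, and `fee` are hypothetical names and values, not the real dataset.

```r
library(dplyr)

# Toy data with made-up fees for three activity types
toy <- data.frame(
  category = c("Diving", "Diving", "Scuba Classes",
               "Water Fitness", "Water Fitness", "Water Fitness"),
  fee = c(100, 750, 265, 65, 65, 164)
)

toy_summary <- toy %>%
  group_by(category) %>%
  summarise(
    n_programs = n(),          # how many programs are in each group
    average_fee = mean(fee),   # same statistic as in the report
    median_fee = median(fee)   # less sensitive to one very high fee
  ) %>%
  arrange(desc(average_fee))   # most expensive activity type first

toy_summary
```

Sorting with arrange(desc(average_fee)) also makes the table easier to read, since the most expensive activity types appear at the top.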
Visualization
# In this chunk I am creating a bar chart to clearly compare average fees across different activity types
# I map the Secondary Category activity type to the x-axis and the average fee to the y-axis
# I am creating bars with the geom_bar() function, using the average_fee values as the bar heights
# I am using labs() to add a title and axis labels to the graph
# I am applying a clean theme using the theme_minimal() function
ggplot(fee_by_category, aes(x = `Secondary Category`, y = average_fee, fill = `Secondary Category`)) +
geom_bar(stat = "identity", na.rm = FALSE) +
coord_flip() +
labs(title = "Average Fee by Activity Type",
x = "Activity Type",
y = "Average Fee ($)") +
theme_minimal()
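The bars above are drawn in alphabetical order. One option that might make the comparison easier to read is to sort the bars by average fee with reorder(). The sketch below uses three rounded averages copied from my table (Diving, Scuba Classes, Water Fitness) in a small data frame called `toy_avg`, which is a hypothetical name; it also uses geom_col(), which draws bars from the values in the data in the same way as geom_bar(stat = "identity").

```r
library(ggplot2)

# Three rounded averages copied from the fee_by_category table, for illustration
toy_avg <- data.frame(
  category = c("Diving", "Scuba Classes", "Water Fitness"),
  average_fee = c(425, 265, 98)
)

# reorder() sorts the categories by average_fee instead of alphabetically
p <- ggplot(toy_avg, aes(x = reorder(category, average_fee), y = average_fee)) +
  geom_col() +
  coord_flip() +
  labs(title = "Average Fee by Activity Type",
       x = "Activity Type",
       y = "Average Fee ($)") +
  theme_minimal()

# print(p) would draw the chart
```

With coord_flip(), the category with the highest average fee ends up at the top of the chart.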
Based on the graph, the average fee for diving is just above $400, and the average fee for scuba classes is just above $250, while the average fees for other activity types are much lower. For example, developmental swim averages about $144, and water fitness and youth swim lessons both have average fees under $100.
In conclusion, program fees do vary by activity type in Montgomery County. Diving and scuba classes have the highest average fees. This may be because they require more sessions and specialized instruction, although I did not test that in this analysis. In contrast, lifeguard training and parent-assisted swim lessons are generally more affordable, which makes them accessible to a wider range of residents. In the future, it would be interesting to look at how fees have changed over multiple years, or to compare fees across different recreation center locations. It would also be useful to see whether higher fees affect the number of people who enroll in certain programs.
Montgomery County, Maryland. (2016). Recreation Spring Aquatics Programs 2016. Montgomery County Open Data Portal. https://data.montgomerycountymd.gov/Community-Recreation/Recreation-Spring-Aquatics-Programs-2016/ymfp-d5nv/about_data