The US National Park Service publishes how many people visit national parks per year. With this data, citizens can ask and answer questions such as:
The following report creates a series of charts that visualize the annual number of visitors to national parks as well as the relationship between visitor numbers and park size in an attempt to answer these questions.
The original National Parks Visitation dataset contains visitor data at all 63 national parks from 1979 to 2024. However, this report focuses on the 19 national parks located in the intermountain region from 2015 to 2024. The entire dataset was obtained from the Responsible Datasets in Context website (Walsh and Keyes, 2024).
Prepare the workspace by loading numerous packages.
library(tidyverse)
library(here)
library(dplyr)
library(scales)
The National Parks Visitation data (NP_RecreationVisits.csv) is loaded here and needs minor cleaning. The column name RecreationVisits is shortened to Visitors for ease of reference. The entire dataset is filtered to only retain the national parks in the intermountain region as well as the most recent ten years of visitor data (i.e., 2015 to 2024). The Year column is changed from a double to a factor data type; this process allows the plots to treat the years as a category rather than as a continuous variable. This cleaned, subset of data is exported as a csv, has had the parks’ areas added to it in Microsoft Excel (with Esri’s area data), and then is loaded back into this report for additional analysis (intermountain_area.csv).
# Load CSV data
parks <- read_csv(here("Data", "NP_RecreationVisits.csv"))
# Shorten column name, filter to intermountain region, filter years 2015 to 2024
# Save changes to a new variable
intermountain <- rename(parks, Visits = RecreationVisits) %>%
filter(Region == "Intermountain", Year>=2015)
# Change the Year to a factor data type and overwrite the original file
intermountain$Year <- as.factor(intermountain$Year)
# Export clean data subset by removing the comment below (#)
# write_csv(parks, "intermountain_area.csv")
# Add area data in software of choice, then...
# Load cleaned, subsetted data with area data
parkarea <- read_csv(here("Data", "intermountain_area.csv"))
These faceted point charts show the yearly number of visitors to each national park as a function of time (i.e., yearly from 2015 to 2024) in an attempt to answer the first question “How have visitor numbers changed over the past ten years at national parks?”.
# Create point plots showing visit numbers by year
intermountain %>%
ggplot(aes(x = Year, y = Visits)) +
geom_point() +
# Create individual charts for each park based on name (wrap park name)
facet_wrap(~ParkName, labeller = label_wrap_gen()) +
# Include comma separators for the visitor count (y-axis)
# Provides the benefit of standard notation
scale_y_continuous(labels = scales::label_comma()) +
# Use a simple pre-programmed theme
theme_bw() +
# Change the x-axis text direction to improve legibility
theme(axis.text.x = element_text(angle = 90)) +
# Label the chart
labs(title = "Number of Visitors to National Parks",
subtitle = "Intermountain Region, 2015 - 2024",
caption = "Figure 1: Number of visitors to national parks. The annual
number of visitors to national parks in the intermountain
region between 2015 to 2024. Data from Walsh and Keyes, 2024.",
x = "Years",
y = "Number of Visitors")
Analyzing the charts reveal that visitor numbers vary throughout the region over time. This data reveals other interesting insights at individual parks as well. For instance:
This chart attempts to answer the second question “How do visitor numbers compare between national parks?” Presented here is a series of box-and-whisker plots that relay the relative statistics for each national park. A point plot of the yearly number of visitors is overlaid on the respective box-and-whisker plot.
# Create box-and-whisker and point plots of visit levels by park and by year
intermountain %>%
ggplot(aes(x = ParkName, y = Visits)) +
geom_boxplot() +
# Overlay the points on the boxplot, color by year for additional variable
geom_point(aes(x = ParkName, y = Visits, color = Year)) +
# Flip axes to improve legibility
coord_flip()+
# Use a simple pre-programmed theme
theme_bw() +
# Include comma separators for the visitor count (y-axis)
scale_y_continuous(labels = scales::label_comma()) +
# Change color scheme to improve legibility
scale_color_brewer(palette = "Paired") +
# Add labels
labs(title = "Statistics of National Park Visits",
subtitle = "Intermountain Region, 2015 - 2024",
caption = "Figure 2: Statistics of national park visits. The statistics
of the annual number of visitors to intermountain national parks
between 2015 to 2024. Data from Walsh and Keyes, 2024.",
x = "National Park",
y = "Number of Annual Visitors")
# Save this chart by removing the comment below (#)
# ggsave(here("output","NP Visitor Statistics.png"))
Although this chart shows similar data to the prior chart, this one better distinguishes the minimum and maximum number of visitors by year and also facilitates comparison between the parks. For instance:
The scatterplot attempts to answer the question “Is there a correlation between the park’s area and its average number of visitors?” The parks’ areas are measured in square miles. In the dataset, each park’s area is repeated multiple times (one instance per year), hence the mean calculation in the code below. The average number of visitors is calculated by averaging each park’s visitors per year during the ten-year study. The correlation is modeled with a linear regression line (blue) and its confidence interval (gray). The number of park visitors does not appear correlated to the park’s area because of the lack of data points alongside the linear model or within the confidence interval.
# Create scatter plot of mean visitors and park area
parkarea %>%
# Prepare data by grouping park name and summarizing each park's data
group_by(ParkName) %>%
summarise(meanvis = mean(Visits),
meanarea = mean(Area_mi2)) %>%
# Create scatter plot of mean values
ggplot(aes(x= meanarea, y = meanvis)) +
geom_point(aes(color=ParkName))+
# Add linear regression line
geom_smooth(method = "lm") +
# Include comma separators for legibility and standard notation
scale_y_continuous(labels = scales::label_comma()) +
scale_x_continuous(labels = scales::label_comma()) +
# Use a simple pre-programmed theme
theme_bw() +
# Add labels
labs(title = "Comparing the Park's Area to its Average \nNumber of Visitors",
subtitle = "Intermountain Region, 2015 - 2024",
caption = "Figure 3: Linear correlation between the park's area and the
average number of annual visitors. The chart only captures intermountain
national parks and their visitors between 2015 to 2024. Area measured is
in square miles. Data from Walsh and Keyes, 2024.",
x = "Area in square miles",
y = "Average number of visitors")
Esri. (2026). USA Parks. [Data]. Retrieved March 5, 2026 from https://www.arcgis.com/home/item.html?id=f092c20803a047cba81fbf1e30eff0b5.
Walsh, M., and Keyes, Os. 2024. U.S. National Park Visit Data (1979-2024) from Responsible Datasets in Context. [Data]. Retrieved March 16, 2026 from https://www.responsible-datasets-in-context.com/posts/np-data/.