For my final project, I will be analyzing how the distribution of singletrack trails correlates with population density across Utah Census Tracts. To do this, I will be using two datasets. The first is a 2010 Census Tract shapefile that also includes the area and population of each tract, which I will use to calculate population density. The second is a shapefile of identified trails across the state. Both datasets are discussed further below:
This Utah Census Tract dataset has been used in several of our assignments throughout the semester. While we have used tract populations in some of our analyses, we have not yet combined them with the corresponding tract area to compute population density. I feel that population density is a much sounder measure than raw population, as it allows tracts of very different sizes to be compared with each other.
Data Citation: UGRC, 2010, 2010 U.S. Census Bureau Data, Utah Division of Geographic Information, https://gis.utah.gov/data/demographic/census/
This map layer represents off-street features making up Utah's recreation trail and transportation network. This data was originally collected and purchased in 2014 and has been revised by several entities over the years, most recently in November 2023. Each trail and pathway is classified by surface and user type, which will allow me to focus solely on soft-surface singletrack trails designated for hiking and biking. There is no particular reason for this focus other than personal interest. That said, the results may show a lack of singletrack trails in a particular tract even where other trail types are abundant, and this bias must be taken into account. Furthermore, while great efforts have been made to include as many trails and pathways as possible, the dataset is not a complete inventory, so its gaps will introduce error into the results.
Data Citation: UGRC, 2020, Utah Trails and Pathways, Utah Division of Geographic Information, https://gis.utah.gov/data/recreation/trails/
RQ1: Are there areas of high population density with a lack of singletrack trail access?
To address this question, I will first create a new variable representing the population density of each observation (tract). This is computed by dividing the population (POP100) by the tract land area (AREALAND). Two considerations need to be made for this. First, I will strictly use land area rather than the combination of land and water area, on the assumption that no one lives on the water and no trails are present on the water. Second, the tract area is given in square meters; to make the population density values more readable, I will divide the area by 1,000,000 to express it in square kilometers. Next, I will use the tract polygons to add a new variable to the trail dataset recording which tract each trail lies in. I can then do a simple analysis of trail accessibility (the number of trails in each tract) in relation to population density.
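The steps above could be sketched with sf and dplyr as follows. This is a minimal illustration on made-up geometry, not the project's actual code: the two square tracts, their populations, the `GEOID` identifier column, the `square_polygon()` helper, and the use of EPSG:26912 (NAD83 / UTM zone 12N, a metric CRS covering Utah) are all assumptions. Density uses the same POP100 / (AREALAND / 1,000,000) formula described above, and `st_join()` tags each trail with the tract it intersects before trails are counted per tract.

```r
library("sf")
library("dplyr")

# Hypothetical helper: a 1 km x 1 km square with its lower-left corner at (x0, 0), in meters
square_polygon <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1000, 0), c(x0 + 1000, 1000),
                        c(x0, 1000), c(x0, 0))))
}

# Toy tracts (not real data); AREALAND is in square meters, as in the census layer
tract_toy <- st_sf(
  GEOID = c("A", "B"),
  POP100 = c(3000, 500),
  AREALAND = c(1e6, 1e6),
  geometry = st_sfc(square_polygon(0), square_polygon(1000), crs = 26912)
)

# Toy trails (not real data)
trails_toy <- st_sf(
  trail_id = 1:3,
  geometry = st_sfc(
    st_linestring(rbind(c(100, 500), c(400, 500))),    # inside tract A
    st_linestring(rbind(c(200, 200), c(300, 800))),    # inside tract A
    st_linestring(rbind(c(1200, 500), c(1800, 500))),  # inside tract B
    crs = 26912
  )
)

# Population density in people per square kilometer
tract_toy$popdens <- tract_toy$POP100 / (tract_toy$AREALAND / 1e6)

# Tag each trail with the tract(s) it intersects, then count trails per tract
trail_counts <- trails_toy %>%
  st_join(tract_toy["GEOID"], join = st_intersects) %>%
  st_drop_geometry() %>%
  count(GEOID, name = "n_trails")

# Attach the counts back to the tracts; tracts with no trails get 0
tract_summary <- tract_toy %>%
  st_drop_geometry() %>%
  left_join(trail_counts, by = "GEOID") %>%
  mutate(n_trails = coalesce(n_trails, 0L))

tract_summary
```

With the real layers, `hbst` and `tract` would take the place of `trails_toy` and `tract_toy`. A trail crossing a tract boundary is counted once in every tract it touches, which is the limitation discussed next.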
One potential limitation of this analysis is that trails may intersect multiple tracts. To address this, I will assign each trail to the tract containing the largest share of its length. I don't know how I will do this quite yet, but I will try my best! Ha!
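One possible way to implement the largest-share rule is to cut every trail at tract boundaries with `st_intersection()`, measure each piece with `st_length()`, and keep the tract holding the longest piece. The sketch below assumes a metric CRS (EPSG:26912 here) so that lengths come out in meters; the geometry, the `trail_id` and `GEOID` column names, and the `square_polygon()` helper are hypothetical.

```r
library("sf")
library("dplyr")

# Hypothetical helper: a 1 km x 1 km square with its lower-left corner at (x0, 0), in meters
square_polygon <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1000, 0), c(x0 + 1000, 1000),
                        c(x0, 1000), c(x0, 0))))
}

tract_toy <- st_sf(
  GEOID = c("A", "B"),
  geometry = st_sfc(square_polygon(0), square_polygon(1000), crs = 26912)
)

# One toy trail running 700 m through tract A and 300 m through tract B
trail_toy <- st_sf(
  trail_id = 1L,
  geometry = st_sfc(st_linestring(rbind(c(300, 500), c(1300, 500))), crs = 26912)
)

# Declare attributes constant so st_intersection() does not warn when it splits geometries
st_agr(tract_toy) <- "constant"
st_agr(trail_toy) <- "constant"

# Cut each trail at tract boundaries and measure every piece in meters
pieces <- st_intersection(trail_toy, tract_toy)
pieces$len_m <- as.numeric(st_length(pieces))

# For each trail, keep only the tract that holds its longest piece
assigned <- pieces %>%
  st_drop_geometry() %>%
  group_by(trail_id) %>%
  slice_max(len_m, n = 1, with_ties = FALSE) %>%
  ungroup()

assigned
```

sf's `st_join()` also has a `largest = TRUE` argument intended for largest-overlap joins; it may be worth checking its documentation to confirm how it measures overlap for line features before relying on it instead.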
RQ2: Are there areas of high population density with a lack of singletrack access in terms of trail length?
This analysis would be very similar to RQ1, but instead of simply counting the directly available trails, I would calculate the number of kilometers of singletrack trail within each census tract. This may be a more reliable and comparable measure, as two tracts may have the same number of trails while one has much longer trails and therefore offers more access.
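The kilometers-of-trail measure can reuse the same cut-and-measure idea: split trails at tract boundaries, sum the piece lengths per tract, and divide by land area to get kilometers of singletrack per square kilometer. This is a sketch on toy geometry; every column name other than AREALAND, the `square_polygon()` helper, and the EPSG:26912 projection are assumptions.

```r
library("sf")
library("dplyr")

# Hypothetical helper: a 1 km x 1 km square with its lower-left corner at (x0, 0), in meters
square_polygon <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1000, 0), c(x0 + 1000, 1000),
                        c(x0, 1000), c(x0, 0))))
}

tract_toy <- st_sf(
  GEOID = c("A", "B"),
  AREALAND = c(1e6, 1e6),  # square meters
  geometry = st_sfc(square_polygon(0), square_polygon(1000), crs = 26912)
)

trails_toy <- st_sf(
  trail_id = 1:2,
  geometry = st_sfc(
    st_linestring(rbind(c(300, 500), c(1300, 500))),   # 0.7 km in A, 0.3 km in B
    st_linestring(rbind(c(1500, 100), c(1500, 900))),  # 0.8 km in B
    crs = 26912
  )
)

st_agr(tract_toy) <- "constant"
st_agr(trails_toy) <- "constant"

# Split trails at tract boundaries and convert each piece's length to kilometers
pieces <- st_intersection(trails_toy, tract_toy)
pieces$len_km <- as.numeric(st_length(pieces)) / 1000

# Total kilometers of trail per tract
trail_km <- pieces %>%
  st_drop_geometry() %>%
  group_by(GEOID) %>%
  summarise(trail_km = sum(len_km))

# Join back to every tract and normalize by land area in square kilometers
tract_trails <- tract_toy %>%
  st_drop_geometry() %>%
  left_join(trail_km, by = "GEOID") %>%
  mutate(trail_km = coalesce(trail_km, 0),
         trail_km_per_km2 = trail_km / (AREALAND / 1e6))

tract_trails
```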
RQ3: Are census tracts with more singletrack trails generally more densely populated? Is this a linear relationship?
This question builds on the previous questions by taking the trail-density value of each tract and testing whether it has a linear relationship with population density.
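A minimal way to examine RQ3 in base R is to compare Pearson's correlation (linear association) with Spearman's rank correlation (any monotonic association) and to fit a simple linear regression. The five tracts below are invented numbers chosen to be exactly linear, purely to show the calls; `toy` and `trail_density` are hypothetical names.

```r
# Hypothetical values for five tracts (illustration only, not real data)
toy <- data.frame(
  trail_density = c(0.0, 0.5, 1.0, 1.5, 2.0),  # e.g. km of singletrack per km^2
  popdens       = c(100, 300, 500, 700, 900)   # people per km^2
)

# Pearson measures linear association; Spearman only requires a monotonic one
r_pearson  <- cor(toy$trail_density, toy$popdens, method = "pearson")
r_spearman <- cor(toy$trail_density, toy$popdens, method = "spearman")

# Simple linear regression of population density on trail density
fit <- lm(popdens ~ trail_density, data = toy)
coef(fit)
```

With real tracts, a Pearson coefficient noticeably weaker than the Spearman coefficient, or a curved pattern in `plot(fit, which = 1)` (residuals versus fitted values), would suggest that the relationship is monotonic but not linear.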
library("tidyverse")
library("ggExtra")
library("sf")
library("spdep")
library("spgwr")
library("spatialreg")
library("scales")
## Reading layer `CensusTracts2010' from data source
## `/Users/glendonvansandt/Desktop/NR6950 - R/Final/Utah_Census_Tracts_2010/CensusTracts2010.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 588 features and 32 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -114.053 ymin: 36.99795 xmax: -109.0411 ymax: 42.00162
## Geodetic CRS: WGS 84
## Reading layer `TrailsAndPathways' from data source
## `/Users/glendonvansandt/Desktop/NR6950 - R/Final/Utah_Trails_and_Pathways/TrailsAndPathways.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 41727 features and 27 fields
## Geometry type: MULTILINESTRING
## Dimension: XY
## Bounding box: xmin: -114.0435 ymin: 36.93035 xmax: -108.9807 ymax: 42.09105
## Geodetic CRS: WGS 84
To be classified as tidy, data must follow these three rules:
Every column is a variable.
Every row is an observation.
Every cell is a single value.
view(trails)
view(tract)
When inspecting the data, it is clear that both datasets follow each of these rules, making them tidy.
Fortunately, both datasets are relatively clean, making them quite easy for me to use. While it's not technically 'cleaning' the data, I will need to keep only existing, unpaved features classified as trails that permit hiking and/or biking, and omit everything else.
hbst <- trails[trails$SurfaceTyp == "Unpaved" & trails$Status == "EXISTING" & trails$Class == "Trail" & trails$CartoCode %in% c("1 - Hiking Only", "2 - Hiking and Biking Allowed", "5 - Biking Only"), ]
Additionally, the land area variable (AREALAND) was stored as categorical even though it is numerical, so I'll have to convert it. While I'm at it, I might as well create two new variables: the land area converted to square kilometers, and the computed population density.
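# as.character() first so a factor is converted by its labels rather than its level codes
tract$AREALAND <- as.numeric(as.character(tract$AREALAND))
tract <- tract %>%
  mutate(AREALAND_km2 = AREALAND / 1e6,    # square meters -> square kilometers
         popdens = POP100 / AREALAND_km2)  # people per square kilometer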
I started by simply looking at the summary statistics of each of my variables.
summary(tract$POP100)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 3438 4442 4700 5786 21591
# The minimum tract population is 0 residents, while the maximum is 21,591 residents. The median tract population is 4,442 residents and the mean is 4,700.
summary(tract$AREALAND_km2)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.467 2.067 4.059 361.936 22.716 14822.635
# The smallest tract has a land area of 0.467 square kilometers, while the largest is 14,822.6 square kilometers. The median land area of Utah Census Tracts is 4.059 square kilometers; the mean (361.9 square kilometers) is pulled far higher by a few very large tracts.
summary(tract$popdens)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 230.9 1185.9 1330.2 2004.5 11352.3
# Tract population density ranges from 0 to 11,352.3 people per square kilometer. The median population density is 1,185.9 people per square kilometer and the mean is 1,330.2.
Because the tract variables are numerical, I could visualize them more thoroughly than the trail data.
# There are several outliers, but one stands out from the rest: a tract in SLC with a population of more than 20,000 people. Because the data seems to be accurate, I will not remove it from the dataset.
boxplot(tract$POP100, main = "Distribution of Tract Population")
# Much like the population data, there are no land areas that I deemed necessary for removal.
boxplot(tract$AREALAND_km2, main = "Distribution of Land Area")
# There are 6 outliers identified for population density. These are all located in the heart of SLC, so it makes sense that they are so densely populated. That said, I will keep them in the dataset.
boxplot(tract$popdens, main = "Distribution of Population Density")
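To list which observations the boxplots flag rather than counting them by eye, `boxplot.stats()` returns them in its `out` element: the points lying more than 1.5 times the hinge spread beyond the hinges, which are the same points `boxplot()` draws individually. The vector below is a toy example, not project data.

```r
# Toy vector (not real data): five ordinary values and one extreme one
x <- c(1, 2, 3, 4, 5, 100)

# Points beyond 1.5 x the hinge spread from the hinges (the default coef = 1.5)
out <- boxplot.stats(x)$out
out
```

Applied to the project data, `tract[tract$popdens %in% boxplot.stats(tract$popdens)$out, ]` would return the flagged tracts so they can be mapped or inspected.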
I also wanted to plot the population density to gain a better understanding of how the data was spatially related.
ggplot() +
  geom_sf(data = tract, aes(fill = popdens)) +
  scale_fill_distiller(palette = "Spectral", breaks = pretty_breaks(n = 5)) +
  labs(title = "Population Density by Census Tract", fill = "Population Density") +
  theme_minimal()
Looking at the population of the tract data, I noticed an issue. Three tracts represent public lands with no permanent residents and therefore have a population of 0. One in particular represents the Wasatch–Cache National Forest, which contains the main source of trails for residents of Cache Valley. This means that while Cache Valley has lots of singletrack available, its tracts will have very low trail-density values. To address this, I will create a variable that includes trails within a 10 km radius of the tract boundary. This would allow Logan, UT tracts to include some of the trails in Logan Canyon and Green Canyon.
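The 10 km idea could be sketched with `st_buffer()`: grow each tract outward by 10,000 m and count the trails that intersect the grown polygon instead of the tract itself. Because both layers are in WGS 84 (degrees), they would first need to be projected to a metric CRS, for example with `st_transform(tract, 26912)`; EPSG:26912 (NAD83 / UTM zone 12N) is an assumption here. The geometry and the `square_polygon()` helper are hypothetical.

```r
library("sf")

# Hypothetical helper: a 1 km x 1 km square with its lower-left corner at (x0, 0), in meters
square_polygon <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1000, 0), c(x0 + 1000, 1000),
                        c(x0, 1000), c(x0, 0))))
}

# Toy tract and trails (not real data), already in a metric CRS
tract_toy <- st_sf(GEOID = "A", geometry = st_sfc(square_polygon(0), crs = 26912))
trails_toy <- st_sf(
  trail_id = 1:3,
  geometry = st_sfc(
    st_linestring(rbind(c(200, 500), c(800, 500))),      # inside the tract
    st_linestring(rbind(c(5000, 500), c(6000, 500))),    # about 4 km outside
    st_linestring(rbind(c(21000, 500), c(22000, 500))),  # about 20 km outside
    crs = 26912
  )
)

# Grow the tract outward by 10 km (distance units follow the CRS, here meters)
tract_buf <- st_buffer(tract_toy, dist = 10000)

# Trails touching the tract itself vs. trails within 10 km of its boundary
n_in_tract    <- lengths(st_intersects(tract_toy, trails_toy))
n_within_10km <- lengths(st_intersects(tract_buf, trails_toy))
```

`st_is_within_distance(tract_toy, trails_toy, dist = 10000)` should give essentially the same matches without building buffer polygons. Either way, buffered tracts overlap their neighbors, so one trail can count toward several nearby tracts.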
First, I wanted to visualize the trail user type distribution.
# This shows that the majority of the filtered trails are multi-use, with hiking-only trails being more common than biking-only trails. These results are expected, as most trails are managed as multi-use. Furthermore, wilderness areas do not allow bicycles, which may partly explain the difference between hiking-only and biking-only trails.
barplot(table(hbst$CartoCode), main = "Distribution of User Types", xlab = "User Types", ylab = "Number of Trails")
I then wanted to visualize the trails a couple of ways.
# Simply plotting the trail lines
ggplot() +
  geom_sf(data = hbst) +
  theme_minimal()