Context: The San Antonio Public Works Department (PWD) will be providing (1) Pavement Condition Index (PCI) scores for all streets in City Council District 2 (D2) for both 2019 and 2023, and (2) a list of all pavement projects completed in the last 5 years. For the time being, I have a list of pavement projects planned for the upcoming 5 years, along with the PCI scores for that limited selection of streets.
For now, I will perform the basic exercises with the limited data I have. Once I have the full data on hand, I will use that instead. The Director of 311 will also be sending me the last 5 years of pavement maintenance calls. If they arrive soon enough, I will integrate them into my work along with Census ACS demographic data.
If I only use the PWD data, I will analyze the relationship between PCI measurements and PWD’s selection of IMP projects. With the data on hand at the moment, I am only analyzing the relationship between the PCI score and the dollars invested in the projects that were selected.
If I obtain the other data in time, I may analyze the relationship between those factors, the effect of 311 calls, and possible ways the PWD response may differ along the lines of race and/or income.
library(readxl)
IMP <- read_excel("IMP.xlsx")
summary(IMP$PCI)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 51.97 78.89 68.81 91.30 98.44
summary(IMP$Cost)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 133.8 5375.0 21826.6 96525.3 104759.8 1748162.5
From my understanding, PCI and dollars invested are negatively correlated: the worse the condition of a street, the more money its project is given.
cor(IMP$PCI,IMP$Cost)
## [1] -0.4269036
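Because the cost distribution is strongly right-skewed (the mean is more than four times the median), a Pearson coefficient can be pulled around by a handful of very expensive projects. A rank-based check is one way to see whether the negative relationship holds regardless. The sketch below uses a small made-up data frame, `toy`, as a stand-in for `IMP`; the values are hypothetical and are not PWD figures.

```r
# Hypothetical toy data standing in for the PCI and Cost columns of IMP.
toy <- data.frame(
  PCI  = c(12, 35, 48, 60, 72, 80, 88, 95),
  Cost = c(900000, 150000, 120000, 40000, 30000, 9000, 5000, 800)
)

# Pearson measures linear association and is sensitive to extreme costs.
pearson <- cor(toy$PCI, toy$Cost, use = "complete.obs")

# Spearman works on ranks, so one very expensive project has less pull.
spearman <- cor(toy$PCI, toy$Cost, method = "spearman", use = "complete.obs")

# cor.test adds a p-value and, for Pearson, a confidence interval.
pearson_test <- cor.test(toy$PCI, toy$Cost, method = "pearson")

print(c(pearson = pearson, spearman = spearman))
print(pearson_test$conf.int)
```

On the real data, the same calls would take `IMP$PCI` and `IMP$Cost` in place of the toy columns. If the two coefficients disagree noticeably, that would suggest the outliers are driving the Pearson value.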
hist(IMP$PCI)
hist(IMP$Cost)
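Since the costs run from about 134 to about 1.75 million, a histogram on the raw scale will likely pile almost every project into the first bar. Plotting the base-10 logarithm of cost is one common way to see the shape of the distribution. This is a sketch on hypothetical values (`toy_cost`) chosen only to span roughly the same range as the summary above; the log is safe here because the minimum cost is above zero.

```r
# Hypothetical costs spanning roughly the range reported by summary(IMP$Cost).
toy_cost <- c(150, 2000, 5400, 9000, 22000, 60000, 105000, 400000, 1750000)

# log10 turns each order of magnitude (1e3, 1e4, 1e5, ...) into one unit.
log_cost <- log10(toy_cost)

# hist() draws the plot and also returns the bin counts invisibly.
h <- hist(log_cost,
          breaks = 6,
          main = "Project cost (log10 scale)",
          xlab = "log10(Cost in dollars)")
```

With the real data this would be `hist(log10(IMP$Cost))`.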
library(ggplot2)
ggplot(IMP, aes(PCI, Cost)) + geom_point()
There are some extreme outliers, so I will exclude the most expensive
projects. I started with a cutoff of 1,000,000 and ended up lowering it
to 750,000 for a better view.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
IMPClean<-IMP %>% filter(Cost<750000)
ggplot(IMPClean,aes(PCI,Cost)) +geom_point()
Neither the plot nor the correlation changed drastically: the coefficient moved from about -0.43 to about -0.48.
cor(IMPClean$PCI,IMPClean$Cost)
## [1] -0.4822869
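Since the cutoff was chosen by eye, it may be worth checking how much the correlation depends on it. The sketch below recomputes the coefficient for several cutoffs in one pass. The data frame `toy` and the third cutoff (250,000) are hypothetical and only illustrate the pattern.

```r
# Hypothetical toy data standing in for the PCI and Cost columns of IMP.
toy <- data.frame(
  PCI  = c(12, 35, 48, 60, 72, 80, 88, 95),
  Cost = c(900000, 150000, 120000, 40000, 30000, 9000, 5000, 800)
)

# Candidate cutoffs; the first two are the ones tried above.
cutoffs <- c(1000000, 750000, 250000)

# For each cutoff, keep projects below it and recompute the correlation.
cor_by_cutoff <- sapply(cutoffs, function(cutoff) {
  kept <- toy[toy$Cost < cutoff, ]
  cor(kept$PCI, kept$Cost)
})
names(cor_by_cutoff) <- format(cutoffs, big.mark = ",",
                               scientific = FALSE, trim = TRUE)

print(cor_by_cutoff)
```

If the coefficient stays in the same general range across cutoffs, the conclusion does not hinge on where the line is drawn.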
In short, the pattern remains: there is a moderate negative correlation between PCI score and dollars invested in IMP projects. Future projects will also incorporate PCI data from streets that are not included on the IMP.