Context: The San Antonio Public Works Department (PWD) will be providing (1) Pavement Condition Index (PCI) scores for all streets in City Council District 2 (D2) for both 2023 and 2019, and (2) a list of all pavement projects completed in the last 5 years. For the time being, I have a list of pavement projects planned for the next 5 years, along with PCI scores for that limited selection of streets.

For now, I will perform the basic exercises with the limited data I have. Once I have the full data on hand, I will use that instead. The Director of 311 will also be sending me the last 5 years of pavement maintenance calls. If those arrive soon enough, I will integrate them into my work along with Census ACS demographic data.

If I only use the PWD data, I will analyze the relationship between PCI measurements and PWD’s selection of IMP projects. With the data currently on hand, I can only analyze the relationship between PCI score and the dollars invested in the projects that were selected.
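Once PCI scores for every D2 street are available, one way to frame "selection" is a logistic regression of whether a street received an IMP project on its PCI. The sketch below is a minimal illustration under assumptions: it expects a hypothetical data frame `streets` with one row per street, a numeric `PCI` column and a logical `selected` column. Those names, and the step of joining the PCI list to the project list, are illustrative and not part of PWD's data.

```r
# Minimal sketch: does a lower PCI make IMP selection more likely?
# Assumes a hypothetical data frame with one row per street, a numeric
# `PCI` column, and a logical `selected` column (TRUE if the street has
# an IMP project). These column names are illustrative, not PWD's.
fit_selection_model <- function(streets) {
  stopifnot(all(c("PCI", "selected") %in% names(streets)))
  # Logistic regression: log-odds of selection as a linear function of PCI
  glm(selected ~ PCI, data = streets, family = binomial)
}

# Usage (once the full street list is available):
# model <- fit_selection_model(streets)
# summary(model)            # a negative PCI coefficient means worse streets
#                           # are more likely to be selected
# exp(coef(model)["PCI"])   # odds ratio per one-point increase in PCI
```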

If I obtain the other data in time, I may also analyze the relationships among those factors, including the effect of 311 calls and possible differences in PWD’s response along lines of race and/or income.

```r
library(readxl)  # provides read_excel()

IMP <- read_excel("IMP.xlsx")
summary(IMP$PCI)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    0.00   51.97   78.89   68.81   91.30   98.44
summary(IMP$Cost)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
##     133.8    5375.0   21826.6   96525.3  104759.8 1748162.5
```

From my understanding, PCI and dollars invested are negatively correlated: the worse the street (the lower its PCI), the more money its project receives.

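```r
cor(IMP$PCI, IMP$Cost)  # Pearson correlation between PCI and project cost
## [1] -0.4269036
hist(IMP$PCI)   # distribution of PCI scores
hist(IMP$Cost)  # distribution of project costs
```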
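Because Cost is strongly right-skewed (the mean, about 96,500, is more than four times the median, about 21,800), the Pearson coefficient can be pulled around by a few expensive projects. A minimal sketch of two extra checks using only base R: `cor.test()` for a p-value, and a Spearman rank correlation, which depends only on the ordering of values. The helper name `correlation_checks` is illustrative, not from the original analysis.

```r
# Minimal sketch: test the PCI-cost association and compare Pearson with
# a rank-based (Spearman) correlation, which is less sensitive to the
# long right tail in Cost.
correlation_checks <- function(pci, cost) {
  ok <- complete.cases(pci, cost)  # drop rows missing either value
  pearson  <- cor.test(pci[ok], cost[ok], method = "pearson")
  # exact = FALSE uses the asymptotic p-value, which also handles ties
  spearman <- cor.test(pci[ok], cost[ok], method = "spearman", exact = FALSE)
  data.frame(
    method   = c("pearson", "spearman"),
    estimate = c(unname(pearson$estimate), unname(spearman$estimate)),
    p_value  = c(pearson$p.value, spearman$p.value)
  )
}

# Usage with the IMP data loaded above:
# correlation_checks(IMP$PCI, IMP$Cost)
```

If the Spearman estimate is close to the Pearson one, the negative association is not just an artifact of the outliers.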

```r
library(ggplot2)
# Refer to columns by name inside aes(); writing IMP$PCI here triggers warnings
ggplot(IMP, aes(PCI, Cost)) + geom_point()
```

There are some extreme outliers, so to start I will cut out any project costing more than $1,000,000 (I ended up lowering that cutoff to $750,000 for a better view).

```r
library(dplyr)
# Keep only projects under the $750,000 cutoff before re-plotting
IMPClean <- IMP %>% filter(Cost < 750000)

ggplot(IMPClean, aes(PCI, Cost)) + geom_point()
```

Neither the plot nor the correlation coefficient changed drastically; r moved only from -0.43 to -0.48.

```r
cor(IMPClean$PCI, IMPClean$Cost)
## [1] -0.4822869
```
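Rather than settling on a single cutoff, one can rerun the correlation across several and see how much it moves. A minimal sketch in base R: the helper `cutoff_sensitivity` is hypothetical and assumes a data frame with `PCI` and `Cost` columns, as IMP has. It keeps rows with `Cost` strictly below each cutoff, matching the `filter()` call above.

```r
# Minimal sketch: how much do the results depend on the outlier cutoff?
# For each cutoff, keep projects with Cost strictly below it and report
# how many remain, how many were dropped (including missing Cost), and r.
cutoff_sensitivity <- function(df, cutoffs) {
  rows <- lapply(cutoffs, function(cut) {
    kept <- df[!is.na(df$Cost) & df$Cost < cut, ]
    data.frame(
      cutoff    = cut,
      n_kept    = nrow(kept),
      n_dropped = nrow(df) - nrow(kept),
      # correlation needs at least a few points to be meaningful
      r = if (nrow(kept) > 2) cor(kept$PCI, kept$Cost, use = "complete.obs") else NA_real_
    )
  })
  do.call(rbind, rows)
}

# Usage: compare the cutoffs mentioned above, plus no cutoff at all.
# cutoff_sensitivity(IMP, c(750000, 1000000, Inf))
```

Reporting `n_dropped` alongside `r` makes it clear how much of the data each cutoff discards.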

In short, the pattern held: there is a moderate negative correlation between PCI score and dollars invested in IMP projects. Future work will also incorporate PCI data for streets that are not included in the IMP.
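A correlation coefficient does not say how much more is spent per point of PCI. One hedged way to put a number on it is a linear regression of log(Cost) on PCI. This handles the long cost tail without an arbitrary cutoff, and the slope converts to an approximate percentage change in cost per one-point increase in PCI. A minimal sketch in base R: `fit_log_cost` is a hypothetical helper, and projects with missing or non-positive Cost are dropped because the log is undefined there. Like the correlation, it describes association among selected projects only.

```r
# Minimal sketch: quantify the PCI-cost relationship on a log scale.
# log(Cost) tames the long right tail, so no outlier cutoff is needed.
fit_log_cost <- function(df) {
  # log() is undefined for missing or non-positive costs, so drop them
  df <- df[!is.na(df$PCI) & !is.na(df$Cost) & df$Cost > 0, ]
  model <- lm(log(Cost) ~ PCI, data = df)
  slope <- unname(coef(model)["PCI"])
  list(
    model = model,
    # approximate % change in cost for a one-point increase in PCI
    pct_change_per_point = 100 * (exp(slope) - 1)
  )
}

# Usage:
# res <- fit_log_cost(IMP)
# summary(res$model)
# res$pct_change_per_point
```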