We aim to investigate the relationship between zip code and local demographics. What characteristics, trends, and relationships do we see?
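#Load dplyr (for %>% and filter) and ggplot2 (for qplot and theme_minimal).
library(dplyr)
library(ggplot2)
#Note: we throw out the "junk" results with 10 or fewer participants.
zipDemos <- read.csv("Demographic_Statistics_By_Zip_Code.csv")
zipDem <- zipDemos %>% filter(COUNT.PARTICIPANTS > 10)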
#summary(zipDem$COUNT.PARTICIPANTS)
Let’s investigate the percentage of residents who receive public assistance. What summary statistics would be appropriate? What visual summaries?
summary(zipDem$PERCENT.RECEIVES.PUBLIC.ASSISTANCE)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0300 0.2325 0.3550 0.3522 0.4525 0.8000
qplot(PERCENT.RECEIVES.PUBLIC.ASSISTANCE,
geom="histogram",
data=zipDem,
bins=15,
xlab="Percentage",
ylab="Frequency",
main="Percentage of Participants on Public Assistance") +
theme_minimal()
Ok, we’ve got the basic picture. The typical proportion is around 35%, with an IQR (spread) of about 0.22 (in other words, the middle half of the data lies between 0.2325 and 0.4525). There’s also a bit of right-skew: a small number of zip codes have a very high percentage of participants on assistance, up to 80%.
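To make the spread and skew claims concrete, here is a small sketch that redoes the arithmetic from the `summary()` output above. The variable names are ours, and the 1.5 × IQR fence is a common rule of thumb that we bring in as an assumption; it is not part of the original analysis.

```r
# Five-number summary copied from the summary() output above.
lo  <- 0.0300   # minimum
q1  <- 0.2325   # first quartile
med <- 0.3550   # median
q3  <- 0.4525   # third quartile
hi  <- 0.8000   # maximum

# Interquartile range: width of the middle half of the data.
iqr <- q3 - q1

# Distance from the median to each extreme; a longer upper tail
# is one rough sign of right-skew.
upper_tail <- hi - med
lower_tail <- med - lo

# Upper fence under the 1.5 * IQR rule of thumb (our assumption).
upper_fence <- q3 + 1.5 * iqr

print(c(iqr = iqr, upper_tail = upper_tail,
        lower_tail = lower_tail, upper_fence = upper_fence))

# With the data loaded, the IQR can also be computed directly:
# IQR(zipDem$PERCENT.RECEIVES.PUBLIC.ASSISTANCE)
```

The upper tail is longer than the lower one, and the maximum sits just above the upper fence, which is consistent with the mild right-skew seen in the histogram.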
Let’s dig deeper: what factors are correlated with the percentage of participants on public assistance (PPPA)?
Let’s look for relationships between our variables. Is PPPA correlated with some other factor? First, let’s see whether it varies with gender.
qplot(PERCENT.MALE, PERCENT.RECEIVES.PUBLIC.ASSISTANCE,
geom=c("point","smooth"),
data=zipDem,
xlab="Percent Men",
ylab="Percent of Residents Receiving Assistance",
main="Percent Men vs Percent Receiving Assistance")+theme_minimal()
Oooh! How interesting! Next, let’s try the same plot with the percentage of U.S. citizens.
qplot(PERCENT.US.CITIZEN, PERCENT.RECEIVES.PUBLIC.ASSISTANCE,
geom=c("point","smooth"),
data=zipDem,
xlab="Percent Citizens",
ylab="Percent of Residents Receiving Assistance",
main="Percent Citizens vs Percent Receiving Assistance")+theme_minimal()
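The scatterplots show the shape of each relationship, but a correlation coefficient puts a number on it. Below is a minimal sketch of one way to do that. The helper `cor_pair` is hypothetical (it is not part of the original analysis), and the `toy` data frame contains made-up values that only demonstrate the call; they are not taken from the zip code data.

```r
# Hypothetical helper: Pearson and Spearman correlation between two
# columns of a data frame, using only rows where both are present.
cor_pair <- function(df, x, y) {
  ok <- complete.cases(df[, c(x, y)])
  c(pearson  = cor(df[[x]][ok], df[[y]][ok], method = "pearson"),
    spearman = cor(df[[x]][ok], df[[y]][ok], method = "spearman"),
    n        = sum(ok))
}

# Made-up toy values, for illustration only (NOT the real data).
toy <- data.frame(
  PERCENT.MALE = c(0.40, 0.45, 0.50, 0.55, 0.60),
  PERCENT.RECEIVES.PUBLIC.ASSISTANCE = c(0.50, 0.42, 0.35, 0.30, 0.20)
)

res <- cor_pair(toy, "PERCENT.MALE", "PERCENT.RECEIVES.PUBLIC.ASSISTANCE")
print(res)

# On the real data the same call would be, for example:
# cor_pair(zipDem, "PERCENT.US.CITIZEN", "PERCENT.RECEIVES.PUBLIC.ASSISTANCE")
```

Pearson measures linear association, while Spearman only asks whether the relationship is monotone, so reporting both is a cheap check on whether a curved smoother line is hiding a relationship that a straight line would miss. Either way, a correlation here describes zip codes, not individuals, and does not by itself show that one variable causes the other.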