1 - Intro

We aim to investigate the relationship between zip code and local demographics. What characteristics, trends, and relationships do we see?

#Note: we throw out the "junk" results with 10 or fewer participants (which includes the zip codes with zero participants).

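library(dplyr)    # provides %>% and filter()
library(ggplot2)  # provides qplot() and theme_minimal()
zipDemos <- read.csv("Demographic_Statistics_By_Zip_Code.csv")
zipDem <- zipDemos %>% filter(COUNT.PARTICIPANTS > 10)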


#summary(zipDem$COUNT.PARTICIPANTS)

Let’s investigate the percentage of residents who receive public assistance. What summary statistics would be appropriate? What visual summaries?

summary(zipDem$PERCENT.RECEIVES.PUBLIC.ASSISTANCE) 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0300  0.2325  0.3550  0.3522  0.4525  0.8000
qplot(PERCENT.RECEIVES.PUBLIC.ASSISTANCE, 
      geom="histogram", 
      data=zipDem, 
      bins=15,
      xlab="Percentage",
      ylab="Frequency",
      main="Percentage of Participants on Public Assistance") +
  theme_minimal()
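
A side note on the plotting call: qplot() has been deprecated since ggplot2 3.4.0, so the same histogram may be easier to maintain as an explicit ggplot() call. The sketch below is one possible equivalent, not the original code. The name pppa_histogram is a hypothetical helper, and it assumes the data frame has the PERCENT.RECEIVES.PUBLIC.ASSISTANCE column used above.

```r
library(ggplot2)

# Hypothetical helper: histogram of the public-assistance column,
# mirroring the qplot() call (15 bins, same labels and theme).
pppa_histogram <- function(df) {
  ggplot(df, aes(x = PERCENT.RECEIVES.PUBLIC.ASSISTANCE)) +
    geom_histogram(bins = 15) +
    labs(x = "Percentage",
         y = "Frequency",
         title = "Percentage of Participants on Public Assistance") +
    theme_minimal()
}

# Draw it only if the filtered data from above is loaded.
if (exists("zipDem")) print(pppa_histogram(zipDem))
```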

Ok, we’ve got the basic picture. The typical proportion is around 0.35 (35%), with an IQR (spread) of about 0.22 (in other words, the middle half of the data lies between 0.2325 and 0.4525). There’s also a hint of a right tail: a small number of zip codes have a very high proportion on assistance, up to 0.80. The mean (0.3522) and median (0.3550) are nearly equal, though, so any skew is mild.
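
The IQR arithmetic can be checked directly from the summary() output. This is a small sketch that hard-codes the quartiles printed above (summary() rounds them, so the result is approximate). The 1.5 * IQR fence is a common rule of thumb added here for illustration; it is not part of the original analysis.

```r
# Quartiles copied from the summary() output above (rounded by summary()).
q1 <- 0.2325
q3 <- 0.4525
max_value <- 0.8000

iqr <- q3 - q1                  # width of the middle half of the data
upper_fence <- q3 + 1.5 * iqr   # 1.5 * IQR rule of thumb for high outliers

print(round(iqr, 4))            # 0.22
print(round(upper_fence, 4))    # 0.7825
print(max_value > upper_fence)  # TRUE: the maximum sits beyond the fence

# With the data loaded, the unrounded IQR comes from the built-in IQR().
if (exists("zipDem")) {
  print(IQR(zipDem$PERCENT.RECEIVES.PUBLIC.ASSISTANCE, na.rm = TRUE))
}
```

Under this rule of thumb, the largest value (0.80) would count as unusually high, which fits the picture of a few zip codes with very heavy use of assistance.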

Let’s dig deeper: what factors are correlated with the percentage of participants on public assistance (PPPA)?

2 - Correlates for PPPA

Let’s look for relationships between our variables. Is PPPA correlated with some other factor? First, let’s see whether the gender mix of a zip code is associated with PPPA.

qplot(PERCENT.MALE, PERCENT.RECEIVES.PUBLIC.ASSISTANCE,
      geom=c("point","smooth"),
      data=zipDem,
      xlab="Percent Men",
      ylab="Percent of Residents Receiving Assistance",
      main="Percent Men vs Percent Receiving Assistance") +
  theme_minimal()

Oooh! How interesting!

qplot(PERCENT.US.CITIZEN, PERCENT.RECEIVES.PUBLIC.ASSISTANCE,
      geom=c("point","smooth"),
      data=zipDem,
      xlab="Percent Citizens",
      ylab="Percent of Residents Receiving Assistance",
      main="Percent Citizens vs Percent Receiving Assistance") +
  theme_minimal()
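
To put a number on the relationships plotted in this section, a correlation coefficient is a natural companion to the scatterplots. The following is a minimal sketch, assuming zipDem is loaded as above. The name pppa_cor is a hypothetical helper introduced here, not part of the original analysis.

```r
# Hypothetical helper: Pearson correlation between one column and the
# public-assistance column, ignoring rows where either value is missing.
pppa_cor <- function(df, predictor) {
  cor(df[[predictor]],
      df[["PERCENT.RECEIVES.PUBLIC.ASSISTANCE"]],
      use = "complete.obs")
}

# Report the correlations only if the filtered data is loaded.
if (exists("zipDem")) {
  print(pppa_cor(zipDem, "PERCENT.MALE"))
  print(pppa_cor(zipDem, "PERCENT.US.CITIZEN"))
}
```

Keep in mind that Pearson correlation only measures linear association, so a curved trend in the smoothed line can coexist with a coefficient near zero.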