Notes

Before answering any questions, we load the openintro and mosaic packages and the county dataset again here. Knitting runs the document in a fresh R session, so packages and objects that exist in your console are not available unless the R Markdown file itself loads or creates them.

library(openintro)
library(mosaic)
data(county)

Please consult the “R Markdown/Knit Debugging” document on Moodle if you’re stuck.

Total out of 4 pts.

Question 1

Question

(2 points) Re-using code from the in-lab exercise, look at some summary measures of federal spending in each of the counties. Describe the distribution of spending across these counties.

Answer

summary(county$fed_spend)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   6.964   8.669   9.991  10.860 204.600       4
histogram(~fed_spend, data=county, type="percent")

The histogram uses percentages on the y-axis. It shows that most counties (at least 90%) get between 0 and around $25 per capita (assuming those are the units), and very few get more. The distribution is strongly right-skewed (positively skewed): the long tail is on the right. The few very large values pull the mean (9.991) above the median (8.669).

A couple of counties get a very large amount. I didn't expect you to be able to do this, but let's look at the top 5. This code:

  • selects the name, state, and fed_spend variables
  • sorts the rows in descending order of fed_spend
  • keeps only the first 5 rows
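library(dplyr)  # provides select(), arrange(), slice() and the %>% pipe
county %>% 
  select(name, state, fed_spend) %>%  # keep only these three columns
  arrange(desc(fed_spend)) %>%        # largest fed_spend first
  slice(1:5)                          # first 5 rows = top 5 counties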
##                   name                state fed_spend
## 1 Chattahoochee County              Georgia 204.61569
## 2    Los Alamos County           New Mexico 146.75994
## 3 District of Columbia District of Columbia 100.90907
## 4        Austin County                Texas  97.90551
## 5    Falls Church city             Virginia  93.78949

I don’t know what’s special about Chattahoochee County GA, but Los Alamos has a national research lab while DC is DC!
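To see why a long right tail pulls the mean above the median, here is a minimal toy example. The numbers are made up for illustration and do not come from the county data.

```r
# Hypothetical toy data (not from county): four typical values and one extreme outlier
spend <- c(6, 8, 9, 10, 200)

median(spend)  # middle value of the sorted data: 9
mean(spend)    # (6 + 8 + 9 + 10 + 200) / 5 = 46.6

# Dropping the outlier barely moves the median but collapses the mean
median(spend[-5])  # (8 + 9) / 2 = 8.5
mean(spend[-5])    # 33 / 4 = 8.25
```

The median depends only on the middle of the sorted values, so one huge value hardly changes it. The mean adds up every value, so a single extreme county can drag it upward.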

Question 2

Question

(2 points) One would hope that the federal government spends more per capita in counties with higher rates of poverty. Is this true?

Answer

Here is the scatterplot with poverty on the x-axis. Putting poverty on the y-axis would not be technically wrong, but I would argue the x-axis makes more sense. As phrased, the question asks whether the federal government responds to poverty by spending accordingly. That makes poverty the explanatory variable and federal spending the response.

xyplot(fed_spend ~ poverty, data=county)

It's a bit hard to tell what's going on because

  • the outlying points at the top squish the rest of the data into the bottom of the plot
  • many points overlap one another

So let's narrow the y-axis display range by setting the argument ylim=c(0, 50). This only changes what is shown. The counties above 50 are still in the data.

xyplot(fed_spend ~ poverty, data=county, ylim=c(0, 50))

Now there does seem to be a slight positive relationship between the two variables. That is, as poverty increases, federal spending tends to increase too. Later in this class we will study regression, i.e., finding the best-fitting line. Plotting the regression line makes the positive relationship apparent!

regression <- lm(fed_spend ~ poverty, data=county)
plotModel(regression, ylim=c(0,50))
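To read the direction of the relationship off the fitted model rather than the plot, look at the sign of the slope. Here is a minimal sketch on made-up toy data. The data frame `toy` is hypothetical and not from county. The example also shows that `lm()` drops rows with missing values under the default `na.action`, just as it would for the counties with missing fed_spend.

```r
# Hypothetical toy data (not from county); the last row has a missing poverty value
toy <- data.frame(
  poverty   = c(5, 10, 15, 20, 25, NA),
  fed_spend = c(7, 8, 10, 9, 12, 11)
)

# Fit the regression line; with the default na.action, the NA row is dropped
fit <- lm(fed_spend ~ poverty, data = toy)

coef(fit)  # intercept 5.9, slope 0.22: spending rises 0.22 per 1-point rise in poverty
nobs(fit)  # 5 rows used, not 6

# The correlation (computed on complete rows) has the same sign as the slope
cor(toy$poverty, toy$fed_spend, use = "complete.obs")
```

On the real data, coef(regression) gives the fitted intercept and slope for the model plotted above. A positive slope matches the upward trend in the plot.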