In this exercise, you will explore a dataset containing annual high school graduation rates in all fifty states (plus the District of Columbia) from the three most recent years (data from the 2013-14 school year are not yet available). Read more about these data on the NCES website. Ensure that the .csv file containing the data is saved in the same folder as this .Rmd file. Read in the data by running the two lines below.

ACGR <- read.csv("ACGR 2010-11 to 2012-13.csv", as.is = TRUE)
head(ACGR)
##        State Abbr SY2010_11 SY2011_12 SY2012_13
## 1    Alabama   AL        72        75        80
## 2     Alaska   AK        68        70        72
## 3    Arizona   AZ        78        76        75
## 4   Arkansas   AR        81        84        85
## 5 California   CA        76        79        80
## 6   Colorado   CO        74        75        77
  1. How many rows and how many columns does the dataset contain?
dim(ACGR)
## [1] 51  5
  1. Show the last 3 rows of the dataset using the tail function.
tail(ACGR, n=3L)
##            State Abbr SY2010_11 SY2011_12 SY2012_13
## 49 West Virginia   WV        78        79        81
## 50     Wisconsin   WI        87        88        88
## 51       Wyoming   WY        80        79        77
  1. What were the highest and lowest graduation rates during the 2012-13 school year? Store these values in the variables grad_min and grad_max.
grad_min <- min(ACGR$SY2012_13,na.rm=TRUE)
grad_min
## [1] 62
grad_max <- max(ACGR$SY2012_13,na.rm=TRUE)
grad_max
## [1] 90
  1. Use the subset function to find the states with the highest and lowest graduation rates during the 2012-13 school year.
subset(ACGR, SY2012_13 == grad_min)
##                  State Abbr SY2010_11 SY2011_12 SY2012_13
## 9 District of Columbia   DC        59        59        62
subset(ACGR, SY2012_13 == grad_max)
##    State Abbr SY2010_11 SY2011_12 SY2012_13
## 16  Iowa   IA        88        89        90
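
A side note on ties: `subset` with `==` returns every row that matches the extreme value, while `which.max` and `which.min` return only the first match. The sketch below uses a made-up data frame (`toy` and its values are hypothetical, not taken from the ACGR file) to show the difference.

```r
# Hypothetical data: two "states" tie for the highest rate.
toy <- data.frame(
  Abbr = c("AA", "BB", "CC"),
  rate = c(88, 90, 90),
  stringsAsFactors = FALSE
)

# subset() keeps every row equal to the maximum, so both tied rows appear.
top_subset <- subset(toy, rate == max(rate, na.rm = TRUE))

# which.max() returns the index of the first maximum only.
top_first <- toy[which.max(toy$rate), ]

top_subset
top_first
```

With real data it is probably safer to use the `subset` form when ties are possible.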
  1. Find the median of the 2011-12 graduation rates and save it as a variable. Use subset to create a list of states whose 2011-12 graduation rates were below that year’s median.
medSY_2011_12 <- median(ACGR$SY2011_12,na.rm = TRUE)
medSY_2011_12
## [1] 80.5
subset(ACGR,SY2011_12 < medSY_2011_12)
##                   State Abbr SY2010_11 SY2011_12 SY2012_13
## 1               Alabama   AL        72        75        80
## 2                Alaska   AK        68        70        72
## 3               Arizona   AZ        78        76        75
## 5            California   CA        76        79        80
## 6              Colorado   CO        74        75        77
## 8              Delaware   DE        78        80        80
## 9  District of Columbia   DC        59        59        62
## 10              Florida   FL        71        75        76
## 11              Georgia   GA        67        70        72
## 19            Louisiana   LA        71        72        74
## 23             Michigan   MI        74        76        77
## 24            Minnesota   MN        77        78        80
## 25          Mississippi   MS        75        75        76
## 29               Nevada   NV        62        63        71
## 32           New Mexico   NM        63        70        70
## 33             New York   NY        77        77        77
## 34       North Carolina   NC        78        80        83
## 38               Oregon   OR        68        68        69
## 40         Rhode Island   RI        77        77        80
## 41       South Carolina   SC        74        75        78
## 45                 Utah   UT        76        80        83
## 48           Washington   WA        76        77        76
## 49        West Virginia   WV        78        79        81
## 51              Wyoming   WY        80        79        77
  1. Calculate the mean of the 2012-13 graduation rates. How does this number compare to the overall national average reported on the NCES website? Bonus (+1): Give at least two reasons that the two numbers might not agree exactly.
mean(ACGR$SY2012_13,na.rm = TRUE)
## [1] 81.14

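The mean I found (81.14) is close to, but not exactly the same as, the national average reported on the NCES website for 2012-13 (81). The two numbers might differ because mean() weights every state equally regardless of how many students it has, because na.rm = TRUE drops any state with a missing rate, and because the state rates in this file are rounded to whole percentages.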
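
To see why averaging state rates need not reproduce a national figure, here is a minimal sketch with made-up numbers. The `toy` data frame and its `cohort` sizes are hypothetical and are not NCES figures; the point is only that an equal-weight mean and a student-weighted mean can disagree.

```r
# Hypothetical data: three "states" with very different cohort sizes.
toy <- data.frame(
  state  = c("A", "B", "C"),
  rate   = c(90, 80, 70),        # graduation rate, in percent
  cohort = c(1000, 1000, 8000),  # made-up number of students
  stringsAsFactors = FALSE
)

# Unweighted mean: every state counts the same.
unweighted <- mean(toy$rate)

# Weighted mean: states with more students count for more.
weighted <- weighted.mean(toy$rate, w = toy$cohort)

unweighted  # 80
weighted    # 73
```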

  1. Use the cor function to calculate the year-to-year correlation of the graduation rates from 2010-11 to 2011-12 and from 2011-12 to 2012-13. Use complete cases only, ignoring states that are missing one or both graduation rates. Store the results as variables. Calculate the average year-to-year correlation.
cor1011_1112 <- cor(ACGR$SY2010_11, ACGR$SY2011_12, use = "complete.obs")
cor1011_1112
## [1] 0.9757277
cor1112_1213 <- cor(ACGR$SY2011_12, ACGR$SY2012_13, use = "complete.obs")
cor1112_1213
## [1] 0.9727099
avg_cor <- (cor1011_1112 + cor1112_1213)/2
avg_cor
## [1] 0.9742188
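
For reference, the effect of `use = "complete.obs"` can be checked on a small example. The vectors `y1` and `y2` below are invented for illustration; the sketch shows that the default call returns `NA` when a value is missing, and that `"complete.obs"` matches dropping incomplete pairs by hand.

```r
# Hypothetical rates for five "states"; one value is missing.
y1 <- c(70, 75, 80, NA, 90)
y2 <- c(72, 76, 83, 85, 91)

# Default behaviour (use = "everything"): any NA makes the result NA.
r_default <- cor(y1, y2)

# Complete cases only: the fourth pair is dropped before computing.
r_complete <- cor(y1, y2, use = "complete.obs")

# The same calculation done by hand with complete.cases().
keep <- complete.cases(y1, y2)
r_manual <- cor(y1[keep], y2[keep])

r_default
r_complete
r_manual
```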
  1. Use the plot function to create a scatterplot with the 2011-12 graduation rates on the x axis and the 2012-13 graduation rates on the y axis. Be sure to label the axes appropriately. Bonus (+0.5): Use the text function to label each point using the two-letter abbreviation for the state.
plot(ACGR$SY2011_12, ACGR$SY2012_13,
     xlab = "2011-12 graduation rate (%)",
     ylab = "2012-13 graduation rate (%)")
text(ACGR$SY2011_12, ACGR$SY2012_13, labels = ACGR$Abbr, adj = 0)

  1. Which state had the largest change in graduation rates between 2010-11 and 2012-13? Was it an increase or a decrease?
diff1011_1213 <- ACGR$SY2012_13 - ACGR$SY2010_11
max(diff1011_1213, na.rm = TRUE)
## [1] 9
min(diff1011_1213, na.rm = TRUE)
## [1] -3
ACGR$Abbr[which.max(diff1011_1213)]
## [1] "NV"

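Nevada had the largest change in graduation rates between 2010-11 and 2012-13: an increase of 9 percentage points (from 62 to 71). The largest decrease was only 3 points, so the largest change in either direction is Nevada's increase.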
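
Here `which.max` on the raw differences works because the largest increase (9) is bigger in size than the largest decrease (3). A more general sketch compares absolute changes, so a large decrease would also be caught. The `toy` data frame and its column names are hypothetical.

```r
# Hypothetical data: the biggest change is a decrease.
toy <- data.frame(
  Abbr  = c("AA", "BB", "CC"),
  first = c(70, 80, 90),
  last  = c(74, 73, 91),
  stringsAsFactors = FALSE
)

change <- toy$last - toy$first     # 4, -7, 1

# Index of the largest change in either direction.
biggest <- which.max(abs(change))

toy$Abbr[biggest]   # "BB"
change[biggest]     # -7, so the change was a decrease
```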

Bonus (+1)

What is the correlation between a state’s graduation rate and its median household income in 2012? To answer this question, you’ll need to do the following:

Median_Income <- read.csv("Median Household Income.csv", as.is = TRUE)
cor(ACGR$SY2011_12,Median_Income$X2012Median_Household_Income, use = "pairwise.complete.obs")
## [1] 0.1233957
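
The call above pairs rows by position, which gives the intended answer only if both files list the states in the same order. The source does not show the layout of the income file, so that is an assumption. A possible safeguard is to join the two tables on a shared key before correlating. The sketch below uses invented data frames, and the `State` key column in the income table is hypothetical.

```r
# Hypothetical graduation rates, in one row order.
grad <- data.frame(
  State = c("Alpha", "Beta", "Gamma", "Delta"),
  rate  = c(70, 80, 85, 90),
  stringsAsFactors = FALSE
)

# Hypothetical incomes, listed in a different row order.
income <- data.frame(
  State  = c("Delta", "Gamma", "Beta", "Alpha"),
  income = c(60000, 55000, 50000, 45000),
  stringsAsFactors = FALSE
)

# Join on the state name so each rate is paired with the right income.
both <- merge(grad, income, by = "State")
r_merged <- cor(both$rate, both$income, use = "complete.obs")

# Pairing by row position mixes up the states and flips the sign here.
r_positional <- cor(grad$rate, income$income)

r_merged
r_positional
```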

Bonus (+1)

Complete the first six sections of the TryR Code School. Take a screenshot showing the completed “badges” and send it to Maura.