Scraping the MA school expenditures data, and a bit of data cleaning:
library(XML, quietly=TRUE)
library(ineq, quietly=TRUE)
suppressPackageStartupMessages(library(reldist, quietly=TRUE))
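url <- "http://profiles.doe.mass.edu/state_report/ppx.aspx"
# readHTMLTable() returns a list with one data frame per HTML table on the page
tables <- readHTMLTable(url)
schools <- tables[[2]]
# Drop the statewide summary row
schools <- schools[schools$V1 != "MASSACHUSETTS TOTAL", ]
# Strip "$" and "," so the dollar column (V7) and the column used below as weights (V3) parse as numbers
schools$V7 <- as.numeric(gsub("[$,]", "", as.character(schools$V7)))
schools$V3 <- as.numeric(gsub("[,]", "", as.character(schools$V3)))
# Keep only rows with no missing values
schools <- schools[complete.cases(schools), ]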
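The cleaning steps above apply the same strip-and-convert pattern to more than one column. One way to factor that out is a small helper. The name `parse_number` is hypothetical (it is not part of XML, ineq, or reldist), and the example values are made up rather than taken from the DOE table. A minimal sketch in base R:

```r
# Hypothetical helper: remove dollar signs and thousands separators,
# then convert to numeric. Strings that still fail to parse become NA.
parse_number <- function(x) {
  as.numeric(gsub("[$,]", "", as.character(x)))
}

# Made-up values for illustration; factors are handled via as.character()
example <- factor(c("$13,636", "1,234", "950"))
parsed <- parse_number(example)
print(parsed)
```

With this in place, each cleaning line would reduce to something like `schools$V7 <- parse_number(schools$V7)`.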
We can measure inequality in a number of ways. Graphically, we can look at the Lorenz curve:
Lc.educ <- Lc(schools$V7, schools$V3)
plot(Lc.educ, main="Lorenz Curve for Educational Expenditures, 2011-12")
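To see what the Lorenz curve is plotting, it may help to compute the points by hand. The sketch below assumes the second argument acts as a frequency weight, which is how `Lc` describes it. The function name `lorenz_points` and the toy data are made up for illustration.

```r
# Weighted Lorenz curve by hand (base R only).
# x: values (e.g. spending); n: frequency weights (e.g. pupil counts).
lorenz_points <- function(x, n = rep(1, length(x))) {
  o <- order(x)                          # sort units from lowest to highest x
  x <- x[o]
  n <- n[o]
  p <- c(0, cumsum(n) / sum(n))          # cumulative share of the population
  L <- c(0, cumsum(x * n) / sum(x * n))  # cumulative share of the total
  data.frame(p = p, L = L)
}

# Toy data (made up): three units with weights 1, 1 and 2
toy <- lorenz_points(x = c(10, 20, 30), n = c(1, 1, 2))
print(toy)
```

Under perfect equality every point would sit on the 45-degree line, so the gap between the curve and that line is what the indices below summarize.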
Or just look at the distribution directly with a histogram:
hist(schools$V7)
We can also use scalar inequality indices, like the Gini index:
gini(schools$V7, schools$V3)
## [1] 0.09265
For context, the Gini index of income inequality in Massachusetts in 2011 was about 0.48.
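The Gini index can be read as one minus twice the area under the Lorenz curve. The sketch below computes it that way with the trapezoid rule. The function name `gini_trapezoid` is hypothetical, and the result need not match `gini()` from reldist to the last digit, since packages can differ in their weighting conventions.

```r
# Gini index as 1 - 2 * (area under the weighted Lorenz curve), base R only.
gini_trapezoid <- function(x, w = rep(1, length(x))) {
  o <- order(x)
  x <- x[o]
  w <- w[o]
  p <- c(0, cumsum(w) / sum(w))          # cumulative population share
  L <- c(0, cumsum(x * w) / sum(x * w))  # cumulative share of the total
  # Trapezoid rule: width of each step times the average of its two heights
  area <- sum(diff(p) * (head(L, -1) + tail(L, -1)) / 2)
  1 - 2 * area
}

g_unequal <- gini_trapezoid(c(1, 3))     # two units, one with three times the other
g_equal   <- gini_trapezoid(c(5, 5, 5))  # perfect equality
print(c(g_unequal, g_equal))
```

For the two-unit toy case the result is 0.25, and for identical values it is zero up to floating-point error.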
How has this changed over time? Let’s examine the first year of data (grabbed manually, since the JavaScript on the page doesn’t play well with the XML library).
schools2005 <- read.csv("~/Documents/ma_schools2005.csv", header=FALSE)
schools2005$V7 <- gsub("[$]", "", as.character(schools2005$V7))
schools2005$V7 <- as.numeric(gsub("[,]", "", as.character(schools2005$V7)))
schools2005$V3 <- as.numeric(gsub("[,]", "", as.character(schools2005$V3)))
schools2005 <- schools2005[complete.cases(schools2005), ]
schools2005<-schools2005[schools2005$V1 %in% schools$V1,]
gini(schools2005$V7, schools2005$V3)
## [1] 0.1068
What about Lorenz dominance?
schools<-schools[schools$V1 %in% schools2005$V1,]
Lc.educ <- Lc(schools$V7, schools$V3)
Lc2005.educ <- Lc(schools2005$V7, schools2005$V3)
plot(Lc.educ, main="Lorenz Curves for Educational Expenditures, 2011-12 (black) vs. 2005 (red)")
lines(Lc2005.educ, col="red")
min(Lc.educ$L - Lc2005.educ$L)
## [1] -0.01161
plot(Lc.educ$L - Lc2005.educ$L, type="l", main="Difference in Lorenz Curve ordinates, 2011 vs. 2005")
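The minimum difference is negative, so the 2011-12 curve dips below the 2005 curve at some point, even though the lower Gini index for 2011-12 suggests it lies above the 2005 curve over most of the range. If the curves cross, neither year Lorenz-dominates the other, so we can’t say anything for sure about Lorenz dominance.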
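One caveat on the comparison above: subtracting the `$L` vectors element by element lines up the i-th sorted district in each year, and those districts may sit at different cumulative population shares. A possible refinement is to interpolate both curves onto a common grid of shares before differencing. The sketch below uses base R's `approx()`. The helper name `lorenz_on_grid` and the two toy curves are made up for illustration.

```r
# Evaluate a Lorenz curve (vectors p and L, as stored in an Lc object)
# on a common grid of population shares by linear interpolation.
lorenz_on_grid <- function(p, L, grid = seq(0, 1, by = 0.01)) {
  approx(x = p, y = L, xout = grid)$y
}

# Toy curves (made up): A starts above B but ends up below it
pA <- c(0, 0.50, 1); LA <- c(0, 0.30, 1)
pB <- c(0, 0.25, 1); LB <- c(0, 0.10, 1)

d <- lorenz_on_grid(pA, LA) - lorenz_on_grid(pB, LB)
print(range(d))

# A Lorenz-dominates B only if its curve is never below B's, and vice versa
A_dominates <- all(d >= 0)
B_dominates <- all(d <= 0)
print(c(A = A_dominates, B = B_dominates))
```

With the real data the call would look like `lorenz_on_grid(Lc.educ$p, Lc.educ$L)`, and likewise for `Lc2005.educ`. In the toy case the difference changes sign, so neither curve dominates.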