Effects of Public Education Spending on the Economy

Research Question:

It seems that every other day there is a debate, from local governments all the way up to the federal government, over the funding of public education. Some want more funding than others, and some want to gut the public education system entirely. The question I wanted to ask was: what are the real-world economic effects of public education? Just how much is the economy influenced by public education financing? To measure the economic effects of public education, I will be looking at GDP growth, median income, and unemployment. The population of interest is U.S. state public school systems.

Background:

In the United States, the federal government contributes about 10% of the money needed to finance public education throughout the country, regardless of the state. This is the lowest number in the developed world. The rest of the funding for schools is the responsibility of state and local governments. Most states source education funding through sales and income taxes. On the local level, property taxes are used, with rates set by the local school board and local officials.

The main argument for cutting public school financing is that it saves the states money. These days, nearly every state in the U.S. is in serious debt. The state of California alone owes more than $617 billion. In total, America's state governments owe more than $4 trillion. The potential savings from cuts to public education may be alluring to many states suffering from debt problems.

Public education financing is often the target of conservative politicians eager to reduce taxes for the populace and, even more so, for corporations. It is an easy target: public education financing can take up a massive 40% of state budgets, and the ramifications of poor public education funding would not be felt for years to come. By that time, the politicians who pushed for such cuts may not even be in office anymore.

Data Explanation:

Financing for public education varies from state to state, as does the economy. To answer the research question, I will be comparing the financing of public education against the economy on a state-by-state basis. To capture the effects of public education, I will look at average public education funding per student in 2005 for each state. This data includes revenue from local property tax, transportation payments, school lunch charges, direct state aid, federal aid, and investments. This data comes from a survey by the U.S. Census Bureau that covers all public school systems that provide elementary or secondary education. The data is structured by listing the per-student spending on public education for each state.

I will be comparing that data to the economy of each state in 2016. I look at economic data 11 years after the public education financing report because the effects of public education take time to be seen: students who benefit from public education are still too young to join the workforce, so they are not yet contributing to the economy. By looking at economic data 11 years after the 2005 report, we can analyze the effects on the economy of 11 graduating classes of students entering the workforce. This is a total of 36.3 million students over the course of 11 years.

One of the variables that I will be using to assess the situation of the economy is the unemployment rate. The Bureau of Labor Statistics defines the unemployment rate as the percentage of unemployed workers in the total labor force. It is widely recognized as a key indicator of labor market performance and the economy at large. This is because when workers are unemployed, their families lose wages, and the state and nation as a whole lose their contribution to the economy in terms of the goods and services that could have been produced. Unemployed workers also lose their purchasing power, which decreases consumer spending, a direct effect on the economy. I am looking at the unemployment rate because, theoretically speaking, a more educated populace should be a more employable populace. Workers who are well educated should have a competitive advantage over those who are not. So, theoretically speaking, the states with the most public education funding should have a more employable workforce and a lower unemployment rate. The data for unemployment rates comes from the February 2018 News Release on State Unemployment from the Bureau of Labor Statistics. The data lists the unemployment rate for 2016 and 2017 for each state.
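The definition above can be written as a one-line calculation. The sketch below is only an illustration: the function name `unemployment_rate` and the worker counts are hypothetical and are not taken from the BLS release.

```r
# Hypothetical counts for a single state (illustrative values, not BLS data)
unemployed  <- 120000
employed    <- 2380000
labor_force <- unemployed + employed

# Unemployment rate: unemployed workers as a percentage of the labor force
unemployment_rate <- function(unemployed, labor_force) {
  100 * unemployed / labor_force
}

rate <- unemployment_rate(unemployed, labor_force)
print(rate)
```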

Another variable being used to analyze the economic situation is median income. Median income is defined as the amount that divides the income distribution into two equal groups, half having income above the amount and half having income below that amount. Median income is seen as a more accurate picture of the typical income of the middle class than mean income, since it is not skewed by gains and abnormalities at the extreme ends. I am using median income as a variable potentially influenced by public education financing because, theoretically speaking, a more educated workforce should be more productive and make more money. If public education financing is indeed influencing the economy, then states with higher public education spending should have higher median incomes. The source for this data is the U.S. Census Bureau 2017 Current Population Survey, Annual Social and Economic Supplements. It is structured by listing median annual household income for every state.
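To illustrate why the median is less sensitive to extreme values than the mean, here is a minimal sketch using made-up household incomes (the numbers are hypothetical and chosen only to make the arithmetic easy to follow).

```r
# Five hypothetical household incomes
incomes <- c(30000, 40000, 50000, 60000, 70000)

# The same households plus one very high earner
incomes_with_outlier <- c(incomes, 5000000)

# Mean and median agree when there is no extreme value
mean_before   <- mean(incomes)
median_before <- median(incomes)

# The mean jumps sharply with the outlier; the median barely moves
mean_after   <- mean(incomes_with_outlier)
median_after <- median(incomes_with_outlier)

print(c(mean_before = mean_before, median_before = median_before,
        mean_after = mean_after, median_after = median_after))
```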

The final variable being used to analyze the economic impact of public education spending is GDP growth. The GDP growth rate is the rate at which the Gross Domestic Product (GDP) changes from one year to another. GDP is the market value of all the goods and services produced in a particular time period. It is one of the primary indicators used to gauge the health of the economy. GDP growth is being used for this analysis because, theoretically speaking, a state with a well-educated populace will grow its economy more than a state with a less educated populace. If more public school financing results in a better educated populace, then there should be an increase in GDP growth. The GDP growth data used for this analysis comes from the Bureau of Economic Analysis, U.S. Department of Commerce. It is structured by listing real GDP growth rates for each quarter of 2016 and 2017, as well as for the full year 2016, by state.


Descriptive Statistics:

##            Range Frequency
## 1    5000 - 6000         1
## 2    6000 - 7000         6
## 3    7000 - 8000        14
## 4    8000 - 9000        11
## 5   9000 - 10000         6
## 6  10000 - 11000         6
## 7  11000 - 12000         3
## 8  12000 - 13000         1
## 9  13000 - 14000         1
## 10 14000 - 15000         1

Figure 1 is a frequency distribution of education spending per student. The red line indicates the mean at 8750.29. There are 22 states that spend more than the mean. We can see that the distribution is skewed right. The variance of the distribution is 3793619. Raw Data Source: U.S. Census Bureau
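As a rough cross-check of the figures quoted for Figure 1, the mean can be approximated from the frequency table alone by treating every state as if it sat at the midpoint of its bin. This is only a sketch: the midpoint assumption is mine, and the variable names are illustrative.

```r
# Bin lower bounds and frequencies copied from the Figure 1 table
lower <- seq(5000, 14000, by = 1000)
freq  <- c(1, 6, 14, 11, 6, 6, 3, 1, 1, 1)

# Assumption: each state sits at the midpoint of its 1000-wide bin
midpoint <- lower + 500

# Grouped-data approximation of the mean
grouped_mean <- sum(midpoint * freq) / sum(freq)

# Midpoint of the most common bin; a mode below the mean is
# consistent with a right-skewed distribution
modal_midpoint <- midpoint[which.max(freq)]

print(c(states = sum(freq), grouped_mean = grouped_mean,
        modal_midpoint = modal_midpoint))
```

The approximation lands close to the reported mean of 8750.29, and the most common bin sits below it, which agrees with the right skew described above.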

##        Range Frequency
## 1  -6 <-> -5         2
## 2  -5 <-> -4         1
## 3  -4 <-> -3         0
## 4  -3 <-> -2         0
## 5  -2 <-> -1         2
## 6   -1 <-> 0         3
## 7    0 <-> 1        14
## 8    1 <-> 2        14
## 9    2 <-> 3        10
## 10   3 <-> 4         3
## 11   4 <-> 5         1

Figure 2 is a frequency distribution of real GDP growth in 2016. The red line marks the mean GDP growth at 0.988. We can see that, with the exception of a few outliers on the low end, the data is fairly normally distributed. The variance of the distribution is 3.74. Raw Data Source: Bureau of Economic Analysis, U.S. Department of Commerce

##     Range Frequency
## 1 2 <-> 3         3
## 2 3 <-> 4        13
## 3 4 <-> 5        16
## 4 5 <-> 6        14
## 5 6 <-> 7         4

Figure 3 is a frequency distribution of the unemployment rate by state. The red line marks the average unemployment rate at 4.65. The variance of the distribution is 0.99. Raw Data Source: Bureau of Labor Statistics

##           Range Frequency
## 1 40000 - 50000         6
## 2 50000 - 60000        24
## 3 60000 - 70000        10
## 4 70000 - 80000        10

Figure 4 is a frequency distribution of median incomes of states. The red line marks the average median income across all 50 states at $59418.16. The variance of the distribution is 80835327. Raw Data Source: US Census Bureau

Regressions:

## 
## Call:
## lm(formula = mainData$`2016` ~ mainData$`Education Spending per Student`)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.1004 -0.4587  0.2538  1.1126  2.9857 
## 
## Coefficients:
##                                             Estimate Std. Error t value
## (Intercept)                                2.6519432  1.2608269   2.103
## mainData$`Education Spending per Student` -0.0001902  0.0001407  -1.351
##                                           Pr(>|t|)  
## (Intercept)                                 0.0407 *
## mainData$`Education Spending per Student`   0.1829  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.919 on 48 degrees of freedom
## Multiple R-squared:  0.03665,    Adjusted R-squared:  0.01658 
## F-statistic: 1.826 on 1 and 48 DF,  p-value: 0.1829

In Figure 5, the two variables are plotted against each other to see how GDP growth changes as public education spending increases. Public education spending is the independent variable, which can in theory be controlled, and GDP growth is the dependent variable. The linear regression line is plotted in red. The fitted line slopes downward: states with higher public education spending tend to have lower GDP growth rates. This contradicts our earlier prediction that higher public education spending would be associated with higher GDP growth rates.

The R-squared for the regression is only 3.7%, which suggests that the relationship is not strong. Moreover, the correlation between the variables is -0.19, indicating a weak negative correlation. The fact that states that spend more on public education have lower GDP growth rates does not imply that public education financing influences GDP growth rates negatively. There are many possible reasons why this relationship appears. Most likely, the states that spend more on education are already large economies, so their GDP growth rates will naturally be smaller than those of poorer states. It is easy to post a huge GDP growth rate if GDP changes from 1 to 2 (a growth rate of 100%) as opposed to 100 to 110 (a growth rate of 10%). Because of all this, it is most probable that public education financing is not a good predictor of future GDP growth.
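The base effect described above can be checked directly. In this short sketch the helper name `growth_rate` is illustrative, and the inputs are the two examples from the text.

```r
# Percentage growth from an old value to a new value
growth_rate <- function(old, new) {
  100 * (new - old) / old
}

# A small economy going from 1 to 2 versus a large one going from 100 to 110
small_economy <- growth_rate(1, 2)
large_economy <- growth_rate(100, 110)

print(c(small_economy = small_economy, large_economy = large_economy))
```

The large economy adds ten times as much output in absolute terms, yet its growth rate is one tenth of the small economy's.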

The 95% confidence interval for the slope is -4.741308e-04 <-> 9.381373e-05, and the t-statistic is -1.351. Because the magnitude of the t-statistic is below the critical value of roughly 2.01 (48 degrees of freedom), the p-value is 0.1829, and the confidence interval includes 0, the slope is not statistically significant at the 5% level.
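A confidence interval of this kind can be rebuilt from the coefficient table as estimate ± critical value × standard error. The sketch below uses the rounded slope and standard error printed in the regression output and assumes the residual degrees of freedom shown there (48). Because the inputs are rounded, the endpoints will differ slightly from the interval quoted in the text.

```r
# Slope estimate and standard error as printed in the GDP regression output
est <- -0.0001902
se  <- 0.0001407

# Residual degrees of freedom reported by summary(): 50 states - 2 coefficients
df <- 48

# t-statistic and two-sided 5% critical value
t_stat <- est / se
t_crit <- qt(0.975, df)

# 95% confidence interval for the slope
ci <- est + c(-1, 1) * t_crit * se

print(t_stat)
print(ci)
```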

## 
## Call:
## lm(formula = mainData$`unemployment 2016` ~ mainData$`Education Spending per Student`)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7777 -0.7232  0.1352  0.6505  2.1595 
## 
## Coefficients:
##                                            Estimate Std. Error t value
## (Intercept)                               4.248e+00  6.598e-01   6.439
## mainData$`Education Spending per Student` 4.544e-05  7.364e-05   0.617
##                                           Pr(>|t|)    
## (Intercept)                               5.33e-08 ***
## mainData$`Education Spending per Student`     0.54    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.004 on 48 degrees of freedom
## Multiple R-squared:  0.00787,    Adjusted R-squared:  -0.0128 
## F-statistic: 0.3807 on 1 and 48 DF,  p-value: 0.5401

Figure 6 plots Public Education Spending per Student against the unemployment rate in each state. The red line is the linear regression of the plot. The blue line marks the mean unemployment rate across all 50 states.

In Figure 6, the two variables are plotted against each other to see how unemployment changes as public education spending increases. Public education spending is the independent variable, which can in theory be controlled, and unemployment is the dependent variable. The linear regression is plotted in red. The fitted line slopes slightly upward: states with higher public education spending tend to have slightly higher unemployment rates. This contradicts our earlier prediction that higher public education spending would be associated with lower unemployment.

The R-squared for the regression is only about 0.8%, which suggests that the relationship is extremely weak. Moreover, the correlation between the variables is 0.09, indicating a very weak positive correlation. There is also very little deviation of the unemployment rate from the mean. Recall from before that the variance of unemployment across states in 2016 was 0.99, which gives a standard deviation of about 1.0. The plot also shows how little the linear regression line deviates from the mean, plotted in blue. From these findings, we can see that unemployment does not change much across the country, and that public education financing may not be a good predictor of future unemployment.
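In a simple regression with a single predictor, R-squared equals the square of the correlation coefficient. The sketch below checks this against the correlations listed in the appendix and the R-squared values printed in the regression outputs; the vector names are illustrative.

```r
# Correlations with education spending, as listed in the appendix
r <- c(gdp = -0.1914475, unemployment = 0.08871202, income = 0.4553822)

# Multiple R-squared values printed in the corresponding regression outputs
r2_reported <- c(gdp = 0.03665, unemployment = 0.00787, income = 0.2074)

# Squared correlations should match the reported R-squared up to rounding
r2_from_cor <- r^2

print(round(r2_from_cor, 5))
print(r2_reported)
```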

The 95% confidence interval for the slope is -0.0001031663 <-> 0.0001940400, and the t-statistic is 0.617. Because the t-statistic is below the critical value of roughly 2.01 (48 degrees of freedom), the p-value is 0.54, and the confidence interval contains 0, we can say that the slope is not statistically significant at the 5% level.

## 
## Call:
## lm(formula = mainData$MedianIncome ~ mainData$`Education Spending per Student`)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -15600  -4657  -1092   2854  15406 
## 
## Coefficients:
##                                            Estimate Std. Error t value
## (Intercept)                               4.102e+04  5.315e+03   7.719
## mainData$`Education Spending per Student` 2.102e+00  5.932e-01   3.544
##                                           Pr(>|t|)    
## (Intercept)                               5.88e-10 ***
## mainData$`Education Spending per Student` 0.000891 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8087 on 48 degrees of freedom
## Multiple R-squared:  0.2074, Adjusted R-squared:  0.1909 
## F-statistic: 12.56 on 1 and 48 DF,  p-value: 0.0008907

In Figure 7, the two variables are plotted against each other to see how median income changes as public education spending increases. Public education spending is the independent variable, which can in theory be controlled, and median income is the dependent variable. The linear regression is plotted in red. The fitted line slopes clearly upward: states with higher public education spending tend to have markedly higher median incomes. This finding is in line with our earlier prediction that higher public education spending would be associated with higher median income.
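To make the fitted line concrete, the sketch below plugs two spending levels into the rounded intercept and slope from the regression output. The function name `predict_income` and the two spending levels are illustrative, and the result describes an association across states, not a proven causal effect.

```r
# Rounded coefficients from the median income regression output
intercept <- 41020   # 4.102e+04
slope     <- 2.102   # dollars of median income per dollar of spending per student

# Fitted median income at a given level of spending per student
predict_income <- function(spending) {
  intercept + slope * spending
}

# Two hypothetical spending levels, 3000 dollars per student apart
fitted <- predict_income(c(7000, 10000))
gap    <- diff(fitted)

print(fitted)
print(gap)
```

Under this fit, each additional 1000 dollars of spending per student is associated with roughly 2100 dollars of additional median income.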

The R-squared for the regression is about 21%, our highest R-squared yet within this analysis. This suggests that there is a much stronger relationship between public education spending and median income. Moreover, the correlation between the variables is .455, indicating a much stronger positive correlation compared to the relationships with the other variables used within this analysis. From these findings, we can see that median income varies widely across the country, and that public education financing may be a good predictor of future median income. There are other issues at play here, though. One way to look at these results is to say that more public education financing contributes to greater median income later on. Another possibility is that the states that already had a higher median income are able to fund public schools more, and so they do. The question that remains unanswered here is: what influences what?

The t-statistic is 3.544, well above the critical value of roughly 2.01, with a p-value of 0.00089. The 95% confidence interval for the slope is 0.9100399 <-> 3.2941254. Because of the t-statistic and because the confidence interval does not contain 0, we can say that this regression is statistically significant at the 5% level.

One way we could try to answer the question of which variable influences which is to look at how median income changes over time on a state-by-state basis. If we were to plot the states whose median income changed the most in a different color, we could see whether more public school funding had an effect on those states.

## 
## Call:
## lm(formula = incChngData$MedianIncome[which(incChngData$top == 
##     1)] ~ incChngData$`Education Spending per Student`[which(incChngData$top == 
##     1)])
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -7090  -3873  -1682    109  10365 
## 
## Coefficients:
##                                                                            Estimate
## (Intercept)                                                               44697.128
## incChngData$`Education Spending per Student`[which(incChngData$top == 1)]     2.017
##                                                                           Std. Error
## (Intercept)                                                                 8509.146
## incChngData$`Education Spending per Student`[which(incChngData$top == 1)]      1.007
##                                                                           t value
## (Intercept)                                                                 5.253
## incChngData$`Education Spending per Student`[which(incChngData$top == 1)]   2.002
##                                                                           Pr(>|t|)
## (Intercept)                                                               0.000271
## incChngData$`Education Spending per Student`[which(incChngData$top == 1)] 0.070567
##                                                                              
## (Intercept)                                                               ***
## incChngData$`Education Spending per Student`[which(incChngData$top == 1)] .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6114 on 11 degrees of freedom
## Multiple R-squared:  0.267,  Adjusted R-squared:  0.2004 
## F-statistic: 4.008 on 1 and 11 DF,  p-value: 0.07057

Figure 8 is nearly identical to Figure 7. The only difference is that now states that have experienced the most change in median income (upper quartile) from 2009 to 2016 are highlighted in red. The black line represents the linear regression for all state data. The red line represents a linear regression for the states whose median income growth is in the top quartile. The blue line marks the mean for Public Education Spending. The green line marks the median Public Education Spending. The yellow line marks the Mean of the Median Incomes for all 50 states.
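The highlighting step can be sketched on a toy example. The four states and their incomes below are hypothetical; the sketch measures percent change against the earlier year and flags states strictly above the third quartile, which is one reasonable reading of "upper quartile".

```r
# Hypothetical median incomes for four states (illustrative values only)
income_2009 <- c(A = 100, B = 100, C = 100, D = 100)
income_2016 <- c(A = 110, B = 120, C = 130, D = 140)

# Percent change relative to the earlier year
pct_change <- 100 * (income_2016 - income_2009) / income_2009

# Third quartile of the changes (R's default quantile method)
cutoff <- quantile(pct_change, 0.75)

# Flag states whose change is strictly above the third quartile
top <- pct_change > cutoff
top_states <- names(top)[top]

print(pct_change)
print(top_states)
```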

Looking at Figure 8, we see that many of the states where median income has risen the most actually spend much less than the average on public education per student, although they do spend around the median. Even though these states have experienced such growth in median income, their median income in 2016 is around the average median income for the country.

When regressing Median Income on Public Education Spending per Student for the upper quartile of changes in median income (red plot and line), we see that the positive relation between Public Education Spending per Student and Median Income still holds. The R-squared for this particular regression is 26.7%, even higher than for the regression using all states. The correlation coefficient is .517, our strongest positive correlation yet. This suggests a moderately strong positive relationship between median income and public school funding among the states in the top quartile of changes in median income from 2009. Our initial assumption, that a more educated workforce will be more competitive and earn a higher median income, seems to hold, and more strongly for the states in the top 25% of changes in median income than for the data at large.

The t-value is 2.002, which is below the critical value of roughly 2.20 for 11 degrees of freedom, and the confidence interval is -0.1595877 <-> 4.1931867. Because the p-value (0.0706) exceeds 0.05 and the confidence interval contains 0, this regression is not statistically significant at the 5% level, although it would be at the 10% level. This is likely because the number of states in this particular analysis is small: only the top 25% of states, that is to say n = 13, which widens the confidence interval.
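The effect of the small sample can be seen by comparing critical values and recomputing the p-value from the t-statistic. This sketch uses the rounded t-value and the residual degrees of freedom printed in the regression output; the variable names are illustrative.

```r
# t-statistic and residual degrees of freedom from the top-quartile regression
t_stat   <- 2.002
df_small <- 11   # 13 states - 2 coefficients
df_full  <- 48   # 50 states - 2 coefficients

# Two-sided 5% critical values: the small sample needs a larger t to be significant
crit_small <- qt(0.975, df_small)
crit_full  <- qt(0.975, df_full)

# Two-sided p-value for the observed t-statistic
p_value <- 2 * pt(-abs(t_stat), df_small)

print(c(crit_small = crit_small, crit_full = crit_full, p_value = p_value))
```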

Estimation:

The estimation method used to estimate the dependent variables of GDP Growth, Unemployment, and Median Income from Public Education Spending per Student is ordinary least squares (OLS). In order to use OLS, some key assumptions must be made regarding how Y (GDP Growth, Unemployment, and Median Income) relates to X (Public Education Spending per Student) and how the data is collected.
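For a regression with a single predictor, the OLS slope equals cov(X, Y) / var(X), and the intercept equals mean(Y) minus the slope times mean(X). The sketch below applies these formulas to the summary statistics listed in the appendix; the results should agree with the coefficients in the regression outputs up to rounding. The vector names are illustrative.

```r
# Summary statistics for X (education spending per student), from the appendix
var_x  <- 3793619
mean_x <- 8750.294

# Covariances of X with each dependent variable, and the means of each Y
cov_xy <- c(gdp = -721.3891, unemployment = 172.3701, income = 7974501)
mean_y <- c(gdp = 0.988,     unemployment = 4.646,    income = 59418.16)

# OLS slope and intercept for each simple regression
slope     <- cov_xy / var_x
intercept <- mean_y - slope * mean_x

print(signif(slope, 4))
print(signif(intercept, 4))
```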

Assumption 1) Given the general linear equation Yi = β0 + β1Xi + ui, i = 1,…, n, the conditional distribution of u given X has mean zero, so that the OLS estimator of β1 (written β^1) can be considered unbiased. This is best accomplished if X is randomly assigned, so that the components that make up the error term u are distributed independently of X.

The source of the independent X data is the U.S. Census Bureau. On their website, the Census Bureau states…

*"The survey covers **all** public school systems that provide elementary or secondary education. The data include revenue by source (local property tax, monies from other school systems, private tuition and transportation payments, school lunch charges, direct state aid, and federal aid passed through the state government), expenditure by function and object (instruction, support service functions, salaries, and capital outlay), indebtedness, and cash and investments."*

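Since, according to the Census Bureau, "all" public school systems across the country were surveyed, the X data is not subject to sampling error. Full coverage alone does not guarantee that the conditional distribution of u given X has mean zero, because factors left out of the regression (for example, a state's existing wealth) may be correlated with spending. For this analysis, the first assumption is therefore taken as a working assumption, and β^1 is interpreted as an association rather than a proven causal effect.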

Assumption 2) (Xi,Yi), i = 1,…,n are i.i.d.

The main way data fails to be i.i.d. is when it is recorded over time as time-series data. Time-series data is not considered i.i.d.

For this analysis, all Independent X data was collected in 2005 and is recorded as such. The time periods for the X data do not fluctuate, so it is not time-series data and does not violate assumption 2.

For this analysis, all dependent Y data was collected in 2016 and is recorded as such. The time periods for the Y data, be it GDP growth rate, unemployment, or median income, do not fluctuate. As such, it is not time-series data and does not violate assumption 2. The data used within this analysis should be considered i.i.d.

Assumption 3) Large outliers are rare

This assumption is best checked by viewing Figures 1 through 4, the frequency distributions of each variable used within the analysis. Apart from the few low GDP growth values noted under Figure 2, there are no extreme outliers. As such, assumption 3 is treated as fulfilled.

Conclusions:

Overall, we can conclude that past spending on public education is not a good way to predict future GDP growth, nor is it a good way to predict future unemployment. The linear regressions for those two variables are not good fits and are not statistically significant. The regression with median income, in comparison, was statistically significant. There seems to be a positive relation between public education spending and median income. However, even with our results, the correlation is still too low for public education spending to serve on its own as a reliable predictor of future median income. To improve the model, other predictors would have to be added alongside public education spending in order to predict median income accurately.

Appendix:

##education spending data
library(readxl)

##source: https://www.census.gov/data/tables/2005/econ/school-finances/secondary-education-finance.html

eduSpndPrStd <- read_excel("C:/Users/Arafat/Desktop/ECO4804/Econometrics4/elsec05_sttables.xls", 
    sheet = "8", skip = 10)
##adding in titles
eduTitles <- c("States"
," "               
,"Total"
,"Salaries and wages"
,"Employee benefits"
,"Total"
,"Salaries and wages"
,"Employee benefits"
,"Total"
,"Pupil support"
,"Staff support"
,"General administration"
,"School administration"
)
colnames(eduSpndPrStd) <- eduTitles
##cleaning data
##drop rows with no state name (logical index also works when there are no NA rows)
eduSpndPrStd <- eduSpndPrStd[!is.na(eduSpndPrStd$States),]
eduSpndPrStd <- eduSpndPrStd[-c(51:53),]
eduSpndPrStd$States <- gsub("\\..*"," ",eduSpndPrStd$States)
eduSpndPrStd$States <- gsub("\\s","",eduSpndPrStd$States)

##economic data
##source: bea.gov
##bureau of economic analysis; US department of commerce
gdpGrwSt <- read_excel("C:/Users/Arafat/Desktop/ECO4804/Econometrics4/qgdpstate0118.xlsx", sheet = "Table 1", skip = 3)
colnames(gdpGrwSt) <- c("States", "2016"
,"2016Q1"
,"2016Q2"
,"2016Q3"
,"2016Q4"
,"2017Q1"
,"2017Q2"
,"2017Q3"
)
##cleaning
gdpGrwSt <- gdpGrwSt[,-10]
gdpGrwSt <- gdpGrwSt[-1,]
gdpGrwSt$States <- gsub("\\W"," ",gdpGrwSt$States)
gdpGrwSt$States <- gsub("\\s\\W","",gdpGrwSt$States)
gdpGrwSt <- gdpGrwSt[-c(60,61),]
gdpGrwSt$States <- gsub("\\s","",gdpGrwSt$States)
##unemployment data
##data extracted from pdf from bls, then converted to csv, formatted in excel, then reloaded into r using following code
##in excel, used text to column to separate 2016 unemployment rate from 2017 unemployment rate and deleted all other data (not needed for this analysis)
##unemployment rates were manually checked against original pdf to ensure accuracy
##library(tabulizer)
##unempTbl <- extract_tables("./unemploymentTable2.pdf")
##unempTbl2 <- data.frame(unempTbl)
##write.csv(unempTbl2, "unempTbl2.csv")
unempdata <- read.csv("./unempTbl2.csv")
unempdata <- unempdata[-c(1,5,6)]
colnames(unempdata) <- c("States","unemployment 2016", "unemployment 2017")
unempdata <- unempdata[-c(1,2),]
##use the full column name rather than relying on partial matching of $State
unempdata$States <- gsub("\\W"," ",unempdata$States)
unempdata$States <- gsub("\\s\\W","",unempdata$States)
unempdata$States <- gsub("\\s","",unempdata$States)


##median income
##source: https://www.kff.org/other/state-indicator/median-annual-income/?currentTimeframe=0&sortModel=%7B%22colId%22:%22Location%22,%22sort%22:%22asc%22%7D

medianIncome <- read.csv("./raw_data.csv")
medianIncome$States <- rownames(medianIncome)
colnames(medianIncome)[1] <- "MedianIncome"
medianIncome$States <- gsub("\\s","",medianIncome$States)

##main data for analysis
mainData <- eduSpndPrStd[,c(1,3)]
mainData <- merge(mainData,gdpGrwSt, by = "States")
mainData <- merge(mainData,unempdata, by = "States")
mainData <- merge(mainData,medianIncome, by = "States")
colnames(mainData)[2] <- "Education Spending per Student"
mainData$MedianIncome <- as.numeric(substring(mainData$MedianIncome,2))

##median income 2009
##source: Census Bureau
##chart comparing median inc from 2009 vs 2016
medianIncome2009 <- read.csv("C:/Users/Arafat/Desktop/ECO4804/Econometrics4/ACS_09_5YR_S1903_with_ann.csv", skip = 1)
medianInc09 <- data.frame(medianIncome2009$Geography, medianIncome2009$Median.income..dollars...Estimate..HOUSEHOLD.INCOME.BY.RACE.AND.HISPANIC.OR.LATINO.ORIGIN.OF.HOUSEHOLDER...Households )
medianChart <- medianIncome
colnames(medianInc09) <- c("States", "medianIncome2009")
medianInc09$States <- gsub("\\s","",medianInc09$States)
medianChart <- merge(medianChart, medianInc09, by = "States")
medianChart$MedianIncome <- as.numeric(substring(medianChart$MedianIncome,2))
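##percent change is measured against the 2009 base value; the ordering of states is the same as when dividing by the 2016 value
medianChart$percentChange <- ((medianChart$MedianIncome - medianChart$medianIncome2009) / medianChart$medianIncome2009) *100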

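##PLOT 1
##reconstructed: this block originally repeated the Figure 2 (GDP) code; the color, axis label, and title here are assumptions
hist(mainData$`Education Spending per Student`, col = "darkgreen", xlab = "Public Education Spending per Student (2005)", ylab = "Number of States", main = "Figure 1: Distribution of Education Spending per Student")
abline(v = mean(mainData$`Education Spending per Student`), col = "red", lwd = 2)
##freq tab (bins match the Figure 1 frequency table)
eduBins <- seq(5000,15000, by = 1000)
eduRanges <- paste(head(eduBins, -1) , eduBins[-1], sep = " - ")
eduFreq <- hist(mainData$`Education Spending per Student`, breaks = eduBins, include.lowest = TRUE, plot = FALSE)
data.frame(Range = eduRanges, Frequency = eduFreq$counts)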


##plot2
hist(mainData$`2016`, col = "darkblue", xlab = "Real GDP Growth (2016)", ylab = "Number of States", main = "Figure 2: Frequency Distribution of Real GDP Growth in 2016")
abline(v = mean(mainData$`2016`), col = "red", lwd = 2)
##freq tab
gdpBins <- seq(-6,5, by = 1)
gdpRanges <- paste(head(gdpBins, -1) , gdpBins[-1], sep = " <-> ")
gdpFreq <- hist(mainData$`2016`, breaks = gdpBins, include.lowest = TRUE, plot = FALSE)
data.frame(Range = gdpRanges, Frequency = gdpFreq$counts)

##PLOT3
hist(mainData$`unemployment 2016`, col = "gold2", main = "Figure 3: Distribution of Unemployment Rates by State (2016)" , xlab = "Unemployment Rate(%)", ylab = "Number of States")

abline(v = mean(mainData$`unemployment 2016`), col = "red", lwd = 2)

##freq table

unempBins <- seq(2,7, by = 1)
unempRanges <- paste(head(unempBins, -1) , unempBins[-1], sep = " <-> ")
unempFreq <- hist(mainData$`unemployment 2016`, breaks = unempBins, include.lowest = TRUE, plot = FALSE)
data.frame(Range = unempRanges, Frequency = unempFreq$counts)

##plot4
hist(mainData$MedianIncome, col = "darkorchid4", main = "Figure 4: Distribution of Median Incomes", xlab = "Median Incomes (in 2016 Dollars)", ylab = "Number of States")
abline(v = mean(mainData$MedianIncome), col = "red", lwd = 2)

medBins <- seq(40000,80000, by = 10000)
medRanges <- paste(head(medBins, -1) , medBins[-1], sep = " - ")
medFreq <- hist(mainData$MedianIncome, breaks = medBins, include.lowest = TRUE, plot = FALSE)
data.frame(Range = medRanges, Frequency = medFreq$counts)

##plot 5

plot(mainData$`Education Spending per Student`,mainData$`2016`, pch = 19, col = "darkblue", ylab = "GDP Growth Rate (%)", xlab = "Public Education Spending per Student", main = "Figure 5: GDP Growth is Going Down as Spending on Education Increases")
abline(lm(mainData$`2016`~ mainData$`Education Spending per Student`), col = "darkred", lwd = 2)
summary(lm(mainData$`2016`~ mainData$`Education Spending per Student`))

##plot 6

plot(mainData$`Education Spending per Student`,mainData$`unemployment 2016`, pch = 19, col = "gold2", ylab = "Unemployment Rate (%)", xlab = "Public Education Spending per Student", main = "Figure 6: Unemployment is Going up as Public Education Spending Increases")
abline(lm(mainData$`unemployment 2016`~ mainData$`Education Spending per Student`), col = "darkred", lwd = 2)
abline(h = mean(mainData$`unemployment 2016`), col = "darkblue", lwd = 2)
summary(lm(mainData$`unemployment 2016` ~ mainData$`Education Spending per Student`))

##plot 7


plot(mainData$`Education Spending per Student`,mainData$MedianIncome , pch = 19, col = "darkorchid4", ylab = "Median Income", xlab = "Public Education Spending per Student", main = "Figure 7: Median Income Dramatically Rises with Public Education Funding")
abline(lm(mainData$MedianIncome ~ mainData$`Education Spending per Student`), col = "darkred", lwd = 2)
summary(lm(mainData$MedianIncome ~ mainData$`Education Spending per Student`))


##plot 8

incChngData <- data.frame(mainData$States,mainData$`Education Spending per Student`,mainData$MedianIncome)
colnames(incChngData) <- c("States", "Education Spending per Student", "Median Income 2016")
incChngData <- merge(incChngData, medianChart, by = "States")
##third quartile of percent change in median income
a <- as.numeric(summary(incChngData$percentChange)[5])
##flag states above the third quartile (1 = top quartile, 0 = otherwise)
incChngData$top <- factor(as.integer(incChngData$percentChange > a), levels = c(0, 1))

plot(incChngData$`Education Spending per Student`,incChngData$MedianIncome , pch = 19, col = incChngData$top, ylab = "Median Income (2016)", xlab = "Public Education Spending per Student (2005)", main = "Figure 8: States with Highest Percent Change in Income in Red")
abline(lm(incChngData$MedianIncome ~ incChngData$`Education Spending per Student`), col = "black", lwd = 2)
abline(lm(incChngData$MedianIncome[which(incChngData$top == 1)] ~  incChngData$`Education Spending per Student`[which(incChngData$top == 1)]), col = "red", lwd = 2)

abline(v = mean(incChngData$`Education Spending per Student`), col = "blue", lwd = 2)
abline(v = median(incChngData$`Education Spending per Student`), col = "darkgreen", lwd = 2)
abline(h = mean(incChngData$`Median Income 2016`), col = "gold", lwd = 2)

summary(lm(incChngData$MedianIncome[which(incChngData$top == 1)] ~  incChngData$`Education Spending per Student`[which(incChngData$top == 1)]))

Additional statistics calculated:

sum(mainData$`Education Spending per Student`> mean(mainData$`Education Spending per Student`))

## [1] 22

##var
var(mainData$`Education Spending per Student`)

## [1] 3793619

var(mainData$`2016`)

## [1] 3.74271

var(mainData$`unemployment 2016`)

## [1] 0.9951878

var(mainData$MedianIncome)

## [1] 80835327

##mean
mean(mainData$`Education Spending per Student`)

## [1] 8750.294

mean(mainData$`2016`)

## [1] 0.988

mean(mainData$`unemployment 2016`)

## [1] 4.646

mean(mainData$MedianIncome)

## [1] 59418.16

##cov, cor
cov(mainData$`Education Spending per Student`,mainData$`2016`)

## [1] -721.3891

cor(mainData$`Education Spending per Student`,mainData$`2016`)

## [1] -0.1914475

cov(mainData$`Education Spending per Student`,mainData$`unemployment 2016` )

## [1] 172.3701

cor(mainData$`Education Spending per Student`,mainData$`unemployment 2016` )

## [1] 0.08871202

cov(mainData$`Education Spending per Student`,mainData$MedianIncome )

## [1] 7974501

cor(mainData$`Education Spending per Student`,mainData$MedianIncome )

## [1] 0.4553822

cor(y = incChngData$MedianIncome[which(incChngData$top == 1)] , x = incChngData$`Education Spending per Student`[which(incChngData$top == 1)])

## [1] 0.5167679

##Confidence intervals

coef1=summary(lm(mainData$`unemployment 2016` ~ mainData$`Education Spending per Student`))$coefficients[2,1] 
err1=summary(lm(mainData$`unemployment 2016` ~ mainData$`Education Spending per Student`))$coefficients[2,2] 

coef1 + c(-1,1)*err1*qt(0.975, 49)

## [1] -0.0001025400  0.0001934136

coef2=summary(lm(mainData$`2016` ~ mainData$`Education Spending per Student` ))$coefficients[2,1] 
err2=summary(lm(mainData$`2016` ~ mainData$`Education Spending per Student` ))$coefficients[2,2] 

coef2 + c(-1,1)*err2*qt(0.975, 49)

## [1] -4.729338e-04  9.261675e-05

coef3=summary(lm(mainData$MedianIncome ~ mainData$`Education Spending per Student` ))$coefficients[2,1] 
err3=summary(lm(mainData$MedianIncome ~ mainData$`Education Spending per Student` ))$coefficients[2,2] 

coef3 + c(-1,1)*err3*qt(0.975, 49)

## [1] 0.9100399 3.2941254

coef4=summary(lm(incChngData$MedianIncome[which(incChngData$top == 1)] ~  incChngData$`Education Spending per Student`[which(incChngData$top == 1)]))$coefficients[2,1] 
err4=summary(lm(incChngData$MedianIncome[which(incChngData$top == 1)] ~  incChngData$`Education Spending per Student`[which(incChngData$top == 1)]))$coefficients[2,2] 

coef4 + c(-1,1)*err4*qt(0.975, 13)

## [1] -0.1595877  4.1931867
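As a cross-check on hand-built intervals like the ones above, base R's `confint()` computes confidence intervals for a fitted model using the model's residual degrees of freedom, which for a simple regression on n observations is n - 2. The sketch below uses simulated data, since the project files are not reproduced here; the variable names and simulated values are hypothetical.

```r
# Simulated stand-in data for 50 states (hypothetical, for illustration only)
set.seed(1)
x <- runif(50, 5000, 15000)
y <- 40000 + 2 * x + rnorm(50, sd = 8000)

fit <- lm(y ~ x)

# Built-in 95% confidence interval for the slope
ci_builtin <- confint(fit, "x", level = 0.95)

# The same interval by hand, using the residual degrees of freedom of the fit
est <- coef(summary(fit))["x", "Estimate"]
se  <- coef(summary(fit))["x", "Std. Error"]
ci_manual <- est + c(-1, 1) * qt(0.975, df.residual(fit)) * se

print(df.residual(fit))
print(ci_builtin)
print(ci_manual)
```

Passing `df.residual(fit)` to `qt()` keeps the hand calculation in line with the degrees of freedom reported by `summary()`.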