Double-checking that the data loaded correctly, as before
load("~/Desktop/Fall 19/POGO 611/Assignments/R stuff/pogo_611.RData")
print(states[, c("state", "csat", "NEast")])
## state csat NEast
## 1 Alabama 991 0
## 2 Alaska 920 0
## 3 Arizona 932 0
## 4 Arkansas 1005 0
## 5 California 897 0
## 6 Colorado 959 0
## 7 Connecticut 897 1
## 8 Delaware 892 0
## 9 District of Columbia 840 NA
## 10 Florida 882 0
## 11 Georgia 844 0
## 12 Hawaii 883 0
## 13 Idaho 968 0
## 14 Illinois 1006 0
## 15 Indiana 865 0
## 16 Iowa 1093 0
## 17 Kansas 1039 0
## 18 Kentucky 993 0
## 19 Louisiana 994 0
## 20 Maine 879 1
## 21 Maryland 904 0
## 22 Massachusetts 896 1
## 23 Michigan 980 0
## 24 Minnesota 1023 0
## 25 Mississippi 997 0
## 26 Missouri 1002 0
## 27 Montana 982 0
## 28 Nebraska 1024 0
## 29 Nevada 919 0
## 30 New Hampshire 921 1
## 31 New Jersey 886 1
## 32 New Mexico 996 0
## 33 New York 881 1
## 34 North Carolina 844 0
## 35 North Dakota 1073 0
## 36 Ohio 946 0
## 37 Oklahoma 997 0
## 38 Oregon 922 0
## 39 Pennsylvania 876 1
## 40 Rhode Island 880 1
## 41 South Carolina 832 0
## 42 South Dakota 1047 0
## 43 Tennessee 1015 0
## 44 Texas 874 0
## 45 Utah 1031 0
## 46 Vermont 890 1
## 47 Virginia 890 0
## 48 Washington 913 0
## 49 West Virginia 926 0
## 50 Wisconsin 1023 0
## 51 Wyoming 980 0
Fitting an ANCOVA model and storing it in an object called “x”
x <- lm(csat ~ NEast + percent, data = states)
anova(x)
## Analysis of Variance Table
##
## Response: csat
## Df Sum Sq Mean Sq F value Pr(>F)
## NEast 1 35191 35191 43.189 3.651e-08 ***
## percent 1 139474 139474 171.172 < 2.2e-16 ***
## Residuals 47 38296 815
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(x)
##
## Call:
## lm(formula = csat ~ NEast + percent, data = states)
##
## Residuals:
## Min 1Q Median 3Q Max
## -60.267 -19.724 -2.515 19.526 73.217
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1033.7485 7.2703 142.188 < 2e-16 ***
## NEast 57.5244 14.2833 4.027 0.000204 ***
## percent -2.7930 0.2135 -13.083 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 28.54 on 47 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.8202, Adjusted R-squared: 0.8125
## F-statistic: 107.2 on 2 and 47 DF, p-value: < 2.2e-16
The ANCOVA model shows a statistically significant difference in intercepts between states in the Northeast (NE) and states outside it (p = 0.0002). The intercept estimate of 1033.75 is the predicted average SAT score (csat) for a state outside the NE when percent equals 0; it is not an overall average. Holding percent constant, NE states score on average 57.52 points higher than states outside the NE, so the intercept for NE states is 1033.75 + 57.52 = 1091.27.
Further, holding region constant, each one-percentage-point increase in the share of high schoolers taking the SAT is associated with a 2.79-point decline in csat. The model is fit on 50 of the 51 observations because the District of Columbia has a missing NEast value.
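The interpretation above can be checked by hand from the reported coefficients. The sketch below hard-codes the estimates from the summary(x) output rather than reloading the data; the helper name predict_csat and the example value percent = 50 are illustrative assumptions, not part of the original analysis.

```r
# Coefficient estimates copied from the summary(x) output above
b0    <- 1033.7485  # intercept: state outside the NE with percent = 0
b_ne  <- 57.5244    # shift in the intercept for NE states
b_pct <- -2.7930    # change in csat per one-point increase in percent

# Hypothetical helper: fitted csat for a given region indicator and percent
predict_csat <- function(neast, percent) {
  b0 + b_ne * neast + b_pct * percent
}

# Intercepts of the two parallel lines (percent = 0)
predict_csat(neast = 0, percent = 0)
predict_csat(neast = 1, percent = 0)

# The NE gap is the same at any value of percent, here 50 as an example
predict_csat(neast = 1, percent = 50) - predict_csat(neast = 0, percent = 50)

# A one-point increase in percent changes the fitted csat by b_pct
predict_csat(neast = 0, percent = 51) - predict_csat(neast = 0, percent = 50)
```

If the fitted object x is still in the workspace, predict(x, newdata = data.frame(NEast = 1, percent = 50)) should give the same fitted value up to rounding of the printed coefficients.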