Augment the data object in the ‘SAT’ lecture note with state.division{datasets}. For each of the 9 divisions, find the slope estimate for regressing average SAT scores onto average teacher’s salary. How many of them are of negative signs?
dta <- read.table('http://www.amstat.org/publications/jse/datasets/sat.dat.txt', row.names=1)
head(dta)
#> V2 V3 V4 V5 V6 V7 V8
#> Alabama 4.405 17.2 31.144 8 491 538 1029
#> Alaska 8.963 17.6 47.951 47 445 489 934
#> Arizona 4.778 19.3 32.175 27 448 496 944
#> Arkansas 4.459 17.1 28.934 6 482 523 1005
#> California 4.992 24.0 41.078 45 417 485 902
#> Colorado 5.443 18.4 34.571 29 462 518 980names(dta) <- c("Spending", "PTR", "Salary", "PE", "Verbal", "Math", "SAT")
str(dta)
#> 'data.frame': 50 obs. of 7 variables:
#> $ Spending: num 4.41 8.96 4.78 4.46 4.99 ...
#> $ PTR : num 17.2 17.6 19.3 17.1 24 18.4 14.4 16.6 19.1 16.3 ...
#> $ Salary : num 31.1 48 32.2 28.9 41.1 ...
#> $ PE : int 8 47 27 6 45 29 81 68 48 65 ...
#> $ Verbal : int 491 445 448 482 417 462 431 429 420 406 ...
#> $ Math : int 538 489 496 523 485 518 477 468 469 448 ...
#> $ SAT : int 1029 934 944 1005 902 980 908 897 889 854 ...dta$Division <- state.division
str(dta)
#> 'data.frame': 50 obs. of 8 variables:
#> $ Spending: num 4.41 8.96 4.78 4.46 4.99 ...
#> $ PTR : num 17.2 17.6 19.3 17.1 24 18.4 14.4 16.6 19.1 16.3 ...
#> $ Salary : num 31.1 48 32.2 28.9 41.1 ...
#> $ PE : int 8 47 27 6 45 29 81 68 48 65 ...
#> $ Verbal : int 491 445 448 482 417 462 431 429 420 406 ...
#> $ Math : int 538 489 496 523 485 518 477 468 469 448 ...
#> $ SAT : int 1029 934 944 1005 902 980 908 897 889 854 ...
#> $ Division: Factor w/ 9 levels "New England",..: 4 9 8 5 9 8 1 3 3 3 ...library(lattice)
xyplot(SAT ~ Salary, groups=Division, data=dta, type=c("g","p","r"), auto.key=list(columns=3))slope <- sapply(split(dta, dta$Division), function(x) coef(lm(x$SAT ~ x$Salary))[2])
as.data.frame(slope)
#> slope
#> New England.x$Salary -0.2067041
#> Middle Atlantic.x$Salary 3.9113380
#> South Atlantic.x$Salary 3.7559011
#> East South Central.x$Salary -2.6215161
#> West South Central.x$Salary -28.1929942
#> East North Central.x$Salary 18.4312440
#> West North Central.x$Salary -1.3174294
#> Mountain.x$Salary -13.7999482
#> Pacific.x$Salary 0.3558940根據回歸線斜率計算結果,我們可以發現,9 個 division 皆不同,以 West South Central、Mountain、West North Central 地區的教師薪水與學生 SAT 成績呈現負相關,程度由大到小排列;而 East North Cental 為主要正相關之地區,其次為 Middle and South Atlantic; New England 和 Pacific 二個地區幾乎沒有相關性。