Data wrangling: Homework 5

Augment the data object in the ‘SAT’ lecture note with state.division{datasets}. For each of the 9 divisions, find the slope estimate for regressing average SAT scores onto average teacher’s salary. How many of them are of negative signs?

dta <- read.table('http://www.amstat.org/publications/jse/datasets/sat.dat.txt', row.names=1) head(dta) #> V2 V3 V4 V5 V6 V7 V8 #> Alabama 4.405 17.2 31.144 8 491 538 1029 #> Alaska 8.963 17.6 47.951 47 445 489 934 #> Arizona 4.778 19.3 32.175 27 448 496 944 #> Arkansas 4.459 17.1 28.934 6 482 523 1005 #> California 4.992 24.0 41.078 45 417 485 902 #> Colorado 5.443 18.4 34.571 29 462 518 980

names(dta) <- c("Spending", "PTR", "Salary", "PE", "Verbal", "Math", "SAT") str(dta) #> 'data.frame': 50 obs. of 7 variables: #> $ Spending: num 4.41 8.96 4.78 4.46 4.99 ... #> $ PTR : num 17.2 17.6 19.3 17.1 24 18.4 14.4 16.6 19.1 16.3 ... #> $ Salary : num 31.1 48 32.2 28.9 41.1 ... #> $ PE : int 8 47 27 6 45 29 81 68 48 65 ... #> $ Verbal : int 491 445 448 482 417 462 431 429 420 406 ... #> $ Math : int 538 489 496 523 485 518 477 468 469 448 ... #> $ SAT : int 1029 934 944 1005 902 980 908 897 889 854 ...

dta$Division <- state.division str(dta) #> 'data.frame': 50 obs. of 8 variables: #> $ Spending: num 4.41 8.96 4.78 4.46 4.99 ... #> $ PTR : num 17.2 17.6 19.3 17.1 24 18.4 14.4 16.6 19.1 16.3 ... #> $ Salary : num 31.1 48 32.2 28.9 41.1 ... #> $ PE : int 8 47 27 6 45 29 81 68 48 65 ... #> $ Verbal : int 491 445 448 482 417 462 431 429 420 406 ... #> $ Math : int 538 489 496 523 485 518 477 468 469 448 ... #> $ SAT : int 1029 934 944 1005 902 980 908 897 889 854 ... #> $ Division: Factor w/ 9 levels "New England",..: 4 9 8 5 9 8 1 3 3 3 ...

尋找 slope

slope <- sapply(split(dta, dta$Division),  function(x) coef(lm(x$SAT ~ x$Salary))[2])
as.data.frame(slope)
#>                                   slope
#> New England.x$Salary         -0.2067041
#> Middle Atlantic.x$Salary      3.9113380
#> South Atlantic.x$Salary       3.7559011
#> East South Central.x$Salary  -2.6215161
#> West South Central.x$Salary -28.1929942
#> East North Central.x$Salary  18.4312440
#> West North Central.x$Salary  -1.3174294
#> Mountain.x$Salary           -13.7999482
#> Pacific.x$Salary              0.3558940

根據回歸線斜率計算結果，我們可以發現，9 個 division 皆不同，以 West South Central、Mountain、West North Central 地區的教師薪水與學生 SAT 成績呈現負相關，程度由大到小排列；而 East North Cental 為主要正相關之地區，其次為 Middle and South Atlantic； New England 和 Pacific 二個地區幾乎沒有相關性。

Data wrangling: Homework 5

2020-Spring [Data Management] Instructor: SHEU, Ching-Fan

CHIU, Ming-Tzu

2020-04-12

讀取資料

命名變量

增加欄位

畫圖看看資料長什麼樣子

尋找 slope