Data wrangling: Homework 1

2020-Spring [Data Management] Instructor: SHEU, Ching-Fan

CHIU, Ming-Tzu

2020-04-11

Select at random one school per county in the data set Caschool{Ecdat} and draw a scatter diagram of average math score mathscr against average reading score readscr for the sampled data set. Make sure your results are reproducible (e.g., the same random sample will be drawn each time).

Load the data

# install.packages("Ecdat")
library(Ecdat)
#> Loading required package: Ecfun
#> 
#> Attaching package: 'Ecfun'
#> The following object is masked from 'package:base':
#> 
#>     sign
#> 
#> Attaching package: 'Ecdat'
#> The following object is masked from 'package:datasets':
#> 
#>     Orange
str(Caschool)
#> 'data.frame':    420 obs. of  17 variables:
#>  $ distcod : int  75119 61499 61549 61457 61523 62042 68536 63834 62331 67306 ...
#>  $ county  : Factor w/ 45 levels "Alameda","Butte",..: 1 2 2 2 2 6 29 11 6 25 ...
#>  $ district: Factor w/ 409 levels "Ackerman Elementary",..: 362 214 367 132 270 53 152 383 263 94 ...
#>  $ grspan  : Factor w/ 2 levels "KK-06","KK-08": 2 2 2 2 2 2 2 2 2 1 ...
#>  $ enrltot : int  195 240 1550 243 1335 137 195 888 379 2247 ...
#>  $ teachers: num  10.9 11.1 82.9 14 71.5 ...
#>  $ calwpct : num  0.51 15.42 55.03 36.48 33.11 ...
#>  $ mealpct : num  2.04 47.92 76.32 77.05 78.43 ...
#>  $ computer: int  67 101 169 85 171 25 28 66 35 0 ...
#>  $ testscr : num  691 661 644 648 641 ...
#>  $ compstu : num  0.344 0.421 0.109 0.35 0.128 ...
#>  $ expnstu : num  6385 5099 5502 7102 5236 ...
#>  $ str     : num  17.9 21.5 18.7 17.4 18.7 ...
#>  $ avginc  : num  22.69 9.82 8.98 8.98 9.08 ...
#>  $ elpct   : num  0 4.58 30 0 13.86 ...
#>  $ readscr : num  692 660 636 652 642 ...
#>  $ mathscr : num  690 662 651 644 640 ...

Randomly select one school per county

dta <- Caschool

Use stratified random sampling

# install.packages("sampling")
library(sampling)
set.seed(3301)
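# Stratify by county and draw one school per stratum by simple random
# sampling without replacement; the name `smp` avoids masking base::sample()
smp <- strata(dta,
              stratanames = "county",
              size = rep(1, nlevels(dta$county)),
              method = "srswor")
smp$ID_unit  # row indices of the sampled schools
#>  [1]   1   2 139 223  44  62 162 273  67 203  59  31 244 298  60 200  47 254 397
#> [20] 170 311 277 177 374 153 147 215 104 213 196 121 133 260 345 269 186 188 394
#> [39] 368 233 398 354 252 367 419
# Caution: the row indices below were typed in by hand and differ from the
# ID_unit values printed above, so this subset is not the seeded sample.
# Indexing with dta[smp$ID_unit, ...] would tie the subset to set.seed(3301).
dtar <- dta[c(1, 94, 216, 204, 211, 58, 162, 15, 55, 203, 59, 362, 247, 404, 41,
              87, 399, 113, 303, 243,  73, 358, 411, 374, 175, 147, 215, 104,
              213, 372, 150, 133, 174, 250, 280, 308, 355, 395, 257, 233, 387,
              248, 252, 367, 420),
            c("county", "district", "readscr", "mathscr")]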
head(dtar)
#>          county                   district readscr mathscr
#> 1       Alameda         Sunol Glen Unified   691.6   690.0
#> 94        Butte    Bangor Union Elementary   646.1   629.8
#> 216      Fresno          Alvina Elementary   650.0   660.1
#> 204 San Joaquin    Lammersville Elementary   654.1   653.3
#> 211        Kern Kernville Union Elementary   655.1   653.5
#> 58   Sacramento           Robla Elementary   637.5   627.0
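
The stratified draw can also be sketched in base R alone, which keeps the seed and the selected rows in a single step. This is only a minimal sketch: `sample_one_per_group()` is a hypothetical helper (it is not part of the sampling package), and the `toy` data frame is made up purely to show the behaviour.

```r
# Hypothetical helper (not from the sampling package): draw one row per
# level of a grouping column, using only base R.
sample_one_per_group <- function(df, group, seed) {
  set.seed(seed)  # fix the RNG state so the same rows are drawn every time
  # Row indices of df, split by the value of the grouping column
  rows_by_group <- split(seq_len(nrow(df)), df[[group]], drop = TRUE)
  # sample.int() on the group size is safe even when a group has one row
  picked <- vapply(rows_by_group,
                   function(idx) idx[sample.int(length(idx), 1)],
                   integer(1))
  df[picked, , drop = FALSE]
}

# Made-up toy data, only for illustration
toy <- data.frame(county  = c("A", "A", "B", "C", "C", "C"),
                  readscr = c(650, 660, 640, 670, 655, 645),
                  mathscr = c(648, 662, 639, 668, 650, 647))

draw1 <- sample_one_per_group(toy, "county", seed = 3301)
draw2 <- sample_one_per_group(toy, "county", seed = 3301)
identical(draw1, draw2)  # the same seed gives the same rows
```

Under the same assumptions, calling `sample_one_per_group(dta, "county", seed = 3301)` would return one school per county from the full data set. The rows chosen would generally differ from those picked by `strata()`, because the two functions consume random numbers differently.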

Plot the data

library(lattice)
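# Math score on the vertical axis against reading score on the horizontal axis
xyplot(mathscr ~ readscr,
       groups = county,
       data = dtar,
       type = c("p", "g"),
       xlab = "Average reading score",
       ylab = "Average math score",
       auto.key = list(columns = 4))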