Data wrangling: Homework 2

2020-Spring [Data Management] Instructor: SHEU, Ching-Fan

CHIU, Ming-Tzu

2020-04-12

Using the tidy approach, compute a 95% confidence interval for the mean language test score in each of the 133 classes of the nlschools{MASS} data set.

library(tidyverse)
#> -- Attaching packages ------------------------------------------- tidyverse 1.3.0 --
#> √ ggplot2 3.3.0     √ purrr   0.3.3
#> √ tibble  2.1.3     √ dplyr   0.8.5
#> √ tidyr   1.0.2     √ stringr 1.4.0
#> √ readr   1.3.1     √ forcats 0.5.0
#> -- Conflicts ---------------------------------------------- tidyverse_conflicts() --
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag()    masks stats::lag()
library(tibble)
library(dplyr)

Read the data

library(MASS)
#> 
#> Attaching package: 'MASS'
#> The following object is masked from 'package:dplyr':
#> 
#>     select
dta <- nlschools
str(dta)
#> 'data.frame':    2287 obs. of  6 variables:
#>  $ lang : int  46 45 33 46 20 30 30 57 36 36 ...
#>  $ IQ   : num  15 14.5 9.5 11 8 9.5 9.5 13 9.5 11 ...
#>  $ class: Factor w/ 133 levels "180","280","1082",..: 1 1 1 1 1 1 1 1 1 1 ...
#>  $ GS   : int  29 29 29 29 29 29 29 29 29 29 ...
#>  $ SES  : int  23 10 15 23 10 10 23 10 13 15 ...
#>  $ COMB : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...

Find the 95% confidence interval of the language score mean for each class

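# For each class: mean, standard error, and a normal-approximation
# 95% interval (mean +/- 1.96 * SE)
result <- dta %>%
  group_by(class) %>%
  dplyr::summarize(language_mean = mean(lang, na.rm = TRUE),
                   language_se = sd(lang, na.rm = TRUE) / sqrt(n()),
                   language_lb = language_mean - 1.96 * language_se,
                   language_ub = language_mean + 1.96 * language_se) %>%
  # Sequential class index (1 to 133), in factor-level order
  mutate(classID = row_number())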
head(result)
#> # A tibble: 6 x 6
#>   class language_mean language_se language_lb language_ub classID
#>   <fct>         <dbl>       <dbl>       <dbl>       <dbl>   <int>
#> 1 180            36.4        1.75        33.0        39.8       1
#> 2 280            23.7        2.39        19.0        28.4       2
#> 3 1082           30.4        4.55        21.5        39.3       3
#> 4 1280           30.9        1.80        27.3        34.4       4
#> 5 1580           30.9        4.70        21.7        40.1       5
#> 6 1680           41.5        2.23        37.1        45.9       6
result %>%
  dplyr::select(classID, language_mean, language_lb, language_ub) %>%
  tail(3)
#> # A tibble: 3 x 4
#>   classID language_mean language_lb language_ub
#>     <int>         <dbl>       <dbl>       <dbl>
#> 1     131          38.1        34.7        41.4
#> 2     132          29.3        21.1        37.5
#> 3     133          28.4        23.3        33.6
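
The intervals above use 1.96, the normal critical value. Because each class contributes only a modest number of pupils, a t-based interval may be a more cautious choice. The following is a minimal sketch of that variant, not part of the original solution; the names result_t, n_pupils, and t_crit are illustrative assumptions, and the data are read as MASS::nlschools so that MASS does not need to be attached and mask dplyr::select().

```r
library(dplyr)

# nlschools ships with MASS; access it without attaching the package
dta <- MASS::nlschools

result_t <- dta %>%
  group_by(class) %>%
  summarize(
    n_pupils      = n(),                              # pupils observed in the class
    language_mean = mean(lang, na.rm = TRUE),
    language_se   = sd(lang, na.rm = TRUE) / sqrt(n_pupils),
    # t critical value with n - 1 degrees of freedom, used instead of 1.96
    t_crit        = qt(0.975, df = n_pupils - 1),
    language_lb   = language_mean - t_crit * language_se,
    language_ub   = language_mean + t_crit * language_se
  )

# One row per class
nrow(result_t)
```

Since the t critical value exceeds 1.96 for any finite degrees of freedom, these intervals are somewhat wider than the normal-approximation ones, with the difference shrinking as the number of pupils in a class grows.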