Load packages

pacman::p_load(mlmRev,HSAUR3,knitr,kableExtra,readr,dplyr,ggplot2,tidyr,car,magrittr,tibble,purrr,stringr)

Q1

A subset of data from the National Longitudinal Survey of Youth is presented here. Each student has two scores: math and reading. Create a variable "test_var" to store the labels 'math' and 'read', and a variable "test_score" to store their corresponding values, and expand the data set to a long format.

Read the CSV file

q1dta1 <- read.csv("C:/for_English_path/wrengling0326/nlsy86long.csv")
head(q1dta1)
##     id    sex     race time grade year month      math      read
## 1 2390 Female Majority    1     0    6    67 14.285714 19.047619
## 2 2560 Female Majority    1     0    6    66 20.238095 21.428571
## 3 3740 Female Majority    1     0    6    67 17.857143 21.428571
## 4 4020   Male Majority    1     0    5    60  7.142857  7.142857
## 5 6350   Male Majority    1     1    7    78 29.761905 30.952381
## 6 7030   Male Majority    1     0    5    62 14.285714 17.857143

Create the new variables, reshape the data to long format, and sort by id

q1dta2 <- gather(q1dta1, key = test_var, value = test_score, math, read) %>% arrange(id)
head(q1dta2)
##     id  sex     race time grade year month test_var test_score
## 1 1003 Male Minority    1     0    5    60     math   11.90476
## 2 1003 Male Minority    2     2    8    91     math   33.33333
## 3 1003 Male Minority    3     3   10   116     math   27.38095
## 4 1003 Male Minority    4     5   12   138     math   39.28571
## 5 1003 Male Minority    1     0    5    60     read   10.71429
## 6 1003 Male Minority    2     2    8    91     read   36.90476
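
In current tidyr, gather() is superseded by pivot_longer(). The following is a minimal sketch of the same reshape, not the approach used above. It assumes a toy data frame (the hypothetical names toy and toy_long) with a few columns copied from the head() output of q1dta1.

```r
library(tidyr)
library(dplyr)

# Toy rows shaped like q1dta1 (values taken from the head() output above)
toy <- data.frame(
  id   = c(2390, 2560),
  time = c(1, 1),
  math = c(14.285714, 20.238095),
  read = c(19.047619, 21.428571)
)

# Stack math and read into a label column and a value column, then sort by id
toy_long <- toy %>%
  pivot_longer(cols = c(math, read),
               names_to = "test_var",
               values_to = "test_score") %>%
  arrange(id)

toy_long
```

Note that pivot_longer() keeps the math and read rows of each original row next to each other, so within an id the row order can differ from the gather() result shown above.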

Q2

The data set Vocab{car} gives observations on gender, education and vocabulary, from respondents to U.S. General Social Surveys, 1972-2004. Summarize the relationship between education and vocabulary over the years by gender.

Build a list of regression coefficients (intercept and education slope), one lm fit per year

q2p1<- Vocab %>% select(year,sex,education,vocabulary)%>%
  split(list(.$year))%>% 
  purrr::map(~coef(lm(vocabulary~education,data = .)))

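Extract the education slope from each fit

try2 <- unlist(q2p1)
# Each fit contributes (intercept, slope), so the slopes are the even-numbered elements
try3 <- as.numeric(try2[seq(2, length(try2), by = 2)])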

Extract the years

# The list elements are named by year
q2y <- names(q2p1)

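Combine the slopes and the years

# A data frame keeps both columns numeric; cbind() would coerce them to a character matrix
q2m2 <- data.frame(year = as.numeric(q2y), slope = try3)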

Plot

qplot(q2m2[,1], q2m2[,2], xlab = "Year", ylab = "Slope of vocabulary on education")
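
The question asks for the relationship by gender, while the split above is by year only. The following is a minimal sketch of one way to add sex as a second grouping variable. It assumes the Vocab data from the carData package (the copy that car loads), with columns year, sex, education and vocabulary. The object name slopes_by_sex is hypothetical.

```r
library(dplyr)
library(ggplot2)

# Assumption: carData is installed and provides Vocab
data(Vocab, package = "carData")

# One regression of vocabulary on education per year-by-sex cell; keep the slope
slopes_by_sex <- Vocab %>%
  group_by(year, sex) %>%
  summarise(slope = coef(lm(vocabulary ~ education))[["education"]],
            .groups = "drop")

# One line per sex, showing how the slope changes over the survey years
ggplot(slopes_by_sex, aes(x = year, y = slope, colour = sex)) +
  geom_line() +
  geom_point() +
  labs(x = "Year", y = "Slope of vocabulary on education")
```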

Q3

Convert the data set probe words from long to wide format as described.

Download the file using stored credentials

source("C:/for_English_path/passwd.txt")
q3fl <- paste0("http://",IDPW,"140.116.183.121/~sheu/dataM/Data/probeL.txt")
q3dta1 <- read.table(q3fl,header = T)
head(q3dta1)
##    ID Response_Time Position
## 1 S01            51        1
## 2 S01            36        2
## 3 S01            50        3
## 4 S01            35        4
## 5 S01            42        5
## 6 S02            27        1

Reshape from long format to wide format

q3dta2 <-spread(q3dta1,Position,Response_Time)
head(q3dta2)
##    ID  1  2  3  4  5
## 1 S01 51 36 50 35 42
## 2 S02 27 20 26 17 27
## 3 S03 37 22 41 37 30
## 4 S04 42 36 32 34 27
## 5 S05 27 18 33 14 29
## 6 S06 43 32 43 35 40
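
The wide columns above are named 1 to 5, which are not syntactic R names and have to be quoted with backticks. The following is a minimal sketch of the same reshape with pivot_wider(), which supersedes spread() and can add a prefix to the new column names. The toy data copy the first two positions of S01 and S02 from the output above. The prefix "Pos_" and the names toy and toy_wide are hypothetical choices, not part of the assignment.

```r
library(tidyr)

# Toy rows shaped like q3dta1 (values taken from the output above)
toy <- data.frame(
  ID            = rep(c("S01", "S02"), each = 2),
  Response_Time = c(51, 36, 27, 20),
  Position      = c(1, 2, 1, 2)
)

# One row per ID, one column per position, with a readable column prefix
toy_wide <- pivot_wider(toy,
                        names_from   = Position,
                        values_from  = Response_Time,
                        names_prefix = "Pos_")

toy_wide
```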

Q4

Reverse the order of input to the series of dplyr::*_join examples using data from the Nobel laureates in literature and explain the resulting output. Data: list by countries, list by winners.

Download the files using stored credentials

q4fl1 <- paste0("http://",IDPW,"140.116.183.121/~sheu/dataM/Rdw/data/nobel_countries.txt")
q4fl2 <- paste0("http://",IDPW,"140.116.183.121/~sheu/dataM/Rdw/data/nobel_winners.txt")
dta_c <- read.table(q4fl1,header = T)
dta_w <- read.table(q4fl2,header = T)

My data object names are the same as in the example.

Swap the positions of x and y relative to the example: inner-join

inner_join(dta_c,dta_w)
## Joining, by = "Year"
##   Country Year              Name Gender
## 1  France 2014   Patrick Modiano   Male
## 2      UK 1950 Bertrand  Russell   Male
## 3      UK 2017    Kazuo Ishiguro   Male
## 4      US 2016        Bob  Dylan   Male
## 5  Canada 2013      Alice  Munro Female
## 6   China 2012            Mo Yan   Male

The column order has changed: the variables of the data frame given first in the call appear first in the output.

semi-join

semi_join(dta_c,dta_w)
## Joining, by = "Year"
##   Country Year
## 1  France 2014
## 2      UK 1950
## 3      UK 2017
## 4      US 2016
## 5  Canada 2013
## 6   China 2012

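Only the dta_c variables remain, because semi_join() returns only the columns of the data frame in the x position, keeping the x rows that have a match in y.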

left-join

left_join(dta_c,dta_w)
## Joining, by = "Year"
##   Country Year              Name Gender
## 1  France 2014   Patrick Modiano   Male
## 2      UK 1950 Bertrand  Russell   Male
## 3      UK 2017    Kazuo Ishiguro   Male
## 4      US 2016        Bob  Dylan   Male
## 5  Canada 2013      Alice  Munro Female
## 6   China 2012            Mo Yan   Male
## 7  Russia 2015              <NA>   <NA>
## 8  Sweden 2011              <NA>   <NA>

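There is one more row than in the original example, because the number of rows in a left join is determined by the data frame in the x position. Here every row of dta_c is kept, and Name and Gender are NA for the rows with no match in dta_w.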

anti-join

anti_join(dta_c,dta_w)
## Joining, by = "Year"
##   Country Year
## 1  Russia 2015
## 2  Sweden 2011

anti_join() returns the rows of x that have no match in y, that is, the two rows containing NA in the left-join output above.

full-join

full_join(dta_c,dta_w)
## Joining, by = "Year"
##   Country Year              Name Gender
## 1  France 2014   Patrick Modiano   Male
## 2      UK 1950 Bertrand  Russell   Male
## 3      UK 2017    Kazuo Ishiguro   Male
## 4      US 2016        Bob  Dylan   Male
## 5  Canada 2013      Alice  Munro Female
## 6   China 2012            Mo Yan   Male
## 7  Russia 2015              <NA>   <NA>
## 8  Sweden 2011              <NA>   <NA>
## 9    <NA> 1938        Pearl Buck Female

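The order of the NA-containing rows at the bottom has changed. full_join() returns all rows of both x and y, but the row order follows x first: the unmatched x rows come after the matched rows, and the unmatched y rows are appended last.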

The End