Load packages
pacman::p_load(mlmRev,HSAUR3,knitr,kableExtra,readr,dplyr,ggplot2,tidyr,car,magrittr,tibble,purrr,stringr)
A subset of data from the National Longitudinal Survey of Youth is presented here. Each student has two scores: math and reading. Create a variable "test_var" to store the labels 'math' and 'read', and a variable "test_score" to store their corresponding values, expanding the data set to a long format.
Read the CSV file
q1dta1 <- read.csv("C:/for_English_path/wrengling0326/nlsy86long.csv")
head(q1dta1)
## id sex race time grade year month math read
## 1 2390 Female Majority 1 0 6 67 14.285714 19.047619
## 2 2560 Female Majority 1 0 6 66 20.238095 21.428571
## 3 3740 Female Majority 1 0 6 67 17.857143 21.428571
## 4 4020 Male Majority 1 0 5 60 7.142857 7.142857
## 5 6350 Male Majority 1 1 7 78 29.761905 30.952381
## 6 7030 Male Majority 1 0 5 62 14.285714 17.857143
Create the new variables, convert the data to long format, and sort by id
q1dta2 <- gather(q1dta1, key = test_var, value = test_score, math, read) %>% arrange(id)
head(q1dta2)
## id sex race time grade year month test_var test_score
## 1 1003 Male Minority 1 0 5 60 math 11.90476
## 2 1003 Male Minority 2 2 8 91 math 33.33333
## 3 1003 Male Minority 3 3 10 116 math 27.38095
## 4 1003 Male Minority 4 5 12 138 math 39.28571
## 5 1003 Male Minority 1 0 5 60 read 10.71429
## 6 1003 Male Minority 2 2 8 91 read 36.90476
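In current tidyr releases `gather()` is marked as superseded in favour of `pivot_longer()`. The following is a minimal sketch of the same reshaping, not a replacement for the step above. It assumes a toy data frame (`toy`, a hypothetical name) built from the first two rows printed earlier, with only a few columns kept.

```r
library(dplyr)
library(tidyr)

# Toy data: two students copied from the printed head(), a few columns only
toy <- data.frame(
  id   = c(2390, 2560),
  sex  = c("Female", "Female"),
  math = c(14.285714, 20.238095),
  read = c(19.047619, 21.428571)
)

# Stack the math and read columns into label/value pairs, then sort by id
toy_long <- toy %>%
  pivot_longer(cols = c(math, read),
               names_to = "test_var",
               values_to = "test_score") %>%
  arrange(id)

toy_long
```

With several time points per id, `pivot_longer()` interleaves math and read within each original row, whereas `gather()` stacks all math rows before all read rows. Sorting with `arrange(id, test_var)` makes the order explicit if it matters.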
The data set Vocab{car} gives observations on gender, education and vocabulary, from respondents to U.S. General Social Surveys, 1972-2004. Summarize the relationship between education and vocabulary over the years by gender.
Build a list of regression coefficients (intercept and slope of vocabulary on education), one fit per year
q2p1<- Vocab %>% select(year,sex,education,vocabulary)%>%
split(list(.$year))%>%
purrr::map(~coef(lm(vocabulary~education,data = .)))
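Extract the slope coefficients (the second element of each year's intercept/slope pair)
try2 <- unlist(q2p1)
try3 <- as.numeric(try2[seq(2, length(try2), by = 2)])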
Extract the years
q2y <- as.numeric(names(q2p1))
Combine the slopes and the years
q2m2 <- cbind(q2y,try3)
Plot
qplot(q2m2[,1],q2m2[,2])
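The exercise asks for the summary by gender as well as by year, whereas the code above splits by year only. One possible extension is sketched here: group by both variables before fitting. It runs on a made-up data frame (`toy`, a hypothetical name, with exact linear relations so the slopes are easy to check) rather than on Vocab itself.

```r
library(dplyr)

# Made-up data for illustration only: vocabulary is an exact linear
# function of education within each sex
toy <- expand.grid(year = c(1974, 1976),
                   sex = c("Female", "Male"),
                   education = c(8, 12, 16),
                   stringsAsFactors = FALSE)
toy$vocabulary <- ifelse(toy$sex == "Female",
                         1 + 0.5 * toy$education,
                         2 + 0.25 * toy$education)

# One regression per year-by-sex group; keep the slope on education
slopes <- toy %>%
  group_by(year, sex) %>%
  summarise(slope = coef(lm(vocabulary ~ education))[["education"]],
            .groups = "drop")

slopes
```

Replacing `toy` with `Vocab` should give one slope per year and sex, which could then be plotted against year with a separate line or colour for each sex.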
Convert the data set probe words from long to wide format as described.
Download the file using the stored credentials
source("C:/for_English_path/passwd.txt")
q3fl <- paste0("http://",IDPW,"140.116.183.121/~sheu/dataM/Data/probeL.txt")
q3dta1 <- read.table(q3fl,header = T)
head(q3dta1)
## ID Response_Time Position
## 1 S01 51 1
## 2 S01 36 2
## 3 S01 50 3
## 4 S01 35 4
## 5 S01 42 5
## 6 S02 27 1
Convert from long format to wide format
q3dta2 <- spread(q3dta1, Position, Response_Time)
head(q3dta2)
## ID 1 2 3 4 5
## 1 S01 51 36 50 35 42
## 2 S02 27 20 26 17 27
## 3 S03 37 22 41 37 30
## 4 S04 42 36 32 34 27
## 5 S05 27 18 33 14 29
## 6 S06 43 32 43 35 40
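The wide columns above are named 1 to 5, which are not syntactic R names and need backticks when referenced. As a hedged alternative, `pivot_wider()` (the current tidyr counterpart of `spread()`) can add a prefix while reshaping. The sketch below uses the first two participants copied from the output above. The object names `probe` and `probe_wide` and the prefix `Pos_` are illustrative choices, not part of the original exercise.

```r
library(tidyr)

# First two participants, values copied from the output shown above
probe <- data.frame(
  ID = rep(c("S01", "S02"), each = 5),
  Response_Time = c(51, 36, 50, 35, 42, 27, 20, 26, 17, 27),
  Position = rep(1:5, times = 2)
)

# One row per ID, one column per position; the prefix gives syntactic names
probe_wide <- pivot_wider(probe,
                          names_from = Position,
                          values_from = Response_Time,
                          names_prefix = "Pos_")

probe_wide
```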
Reverse the order of input to the series of dplyr::*_join examples using data from the Nobel laureates in literature and explain the resulting output. Data files: list by countries, list by winners.
Download the files using the stored credentials
q4fl1 <- paste0("http://",IDPW,"140.116.183.121/~sheu/dataM/Rdw/data/nobel_countries.txt")
q4fl2 <- paste0("http://",IDPW,"140.116.183.121/~sheu/dataM/Rdw/data/nobel_winners.txt")
dta_c <- read.table(q4fl1,header = T)
dta_w <- read.table(q4fl2,header = T)
My data object names are the same as in the examples.
Swap the positions of x and y used in the examples.
inner-join
inner_join(dta_c,dta_w)
## Joining, by = "Year"
## Country Year Name Gender
## 1 France 2014 Patrick Modiano Male
## 2 UK 1950 Bertrand Russell Male
## 3 UK 2017 Kazuo Ishiguro Male
## 4 US 2016 Bob Dylan Male
## 5 Canada 2013 Alice Munro Female
## 6 China 2012 Mo Yan Male
The column order has changed. The columns of the data frame placed first in the call appear first in the output.
semi-join
semi_join(dta_c,dta_w)
## Joining, by = "Year"
## Country Year
## 1 France 2014
## 2 UK 1950
## 3 UK 2017
## 4 US 2016
## 5 Canada 2013
## 6 China 2012
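Only the columns of dta_c remain, because semi_join returns only the columns of the data frame in the x position (restricted to the rows that have a match in y).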
left-join
left_join(dta_c,dta_w)
## Joining, by = "Year"
## Country Year Name Gender
## 1 France 2014 Patrick Modiano Male
## 2 UK 1950 Bertrand Russell Male
## 3 UK 2017 Kazuo Ishiguro Male
## 4 US 2016 Bob Dylan Male
## 5 Canada 2013 Alice Munro Female
## 6 China 2012 Mo Yan Male
## 7 Russia 2015 <NA> <NA>
## 8 Sweden 2011 <NA> <NA>
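The output has one more row than in the original example, because a left join keeps every row of the data frame in the x position, and dta_c has more rows than dta_w.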
anti-join
anti_join(dta_c,dta_w)
## Joining, by = "Year"
## Country Year
## 1 Russia 2015
## 2 Sweden 2011
anti_join returns the rows of x that have no match in y, that is, the two rows with NA entries in the left-join output above.
full-join
full_join(dta_c,dta_w)
## Joining, by = "Year"
## Country Year Name Gender
## 1 France 2014 Patrick Modiano Male
## 2 UK 1950 Bertrand Russell Male
## 3 UK 2017 Kazuo Ishiguro Male
## 4 US 2016 Bob Dylan Male
## 5 Canada 2013 Alice Munro Female
## 6 China 2012 Mo Yan Male
## 7 Russia 2015 <NA> <NA>
## 8 Sweden 2011 <NA> <NA>
## 9 <NA> 1938 Pearl Buck Female
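The order of the rows with NA entries at the bottom has changed. full_join returns all rows of x and y, but the rows of x come first (its unmatched rows, with NA in the y columns, sit at the end of the x block), and only then are the unmatched rows of y appended.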
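The row counts discussed above can be checked without the remote files. The sketch below rebuilds both tables from the printed output, so the original row and column order of the files is an assumption. It names the join key explicitly with `by = "Year"`, which also silences the "Joining, by" message.

```r
library(dplyr)

# Reconstructed from the printed output; the row and column order of the
# original files is an assumption
dta_c <- data.frame(
  Country = c("France", "UK", "UK", "US", "Canada", "China", "Russia", "Sweden"),
  Year = c(2014, 1950, 2017, 2016, 2013, 2012, 2015, 2011)
)
dta_w <- data.frame(
  Name = c("Patrick Modiano", "Bertrand Russell", "Kazuo Ishiguro",
           "Bob Dylan", "Alice Munro", "Mo Yan", "Pearl Buck"),
  Year = c(2014, 1950, 2017, 2016, 2013, 2012, 1938),
  Gender = c("Male", "Male", "Male", "Male", "Female", "Male", "Female")
)

# Row counts of each join with dta_c in the x position
join_rows <- c(
  inner = nrow(inner_join(dta_c, dta_w, by = "Year")),
  semi  = nrow(semi_join(dta_c, dta_w, by = "Year")),
  left  = nrow(left_join(dta_c, dta_w, by = "Year")),
  anti  = nrow(anti_join(dta_c, dta_w, by = "Year")),
  full  = nrow(full_join(dta_c, dta_w, by = "Year"))
)

join_rows
```

The counts agree with the outputs above: 6 rows for the inner and semi joins, 8 for the left join, 2 for the anti join, and 9 for the full join.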