Potthoff and Roy (1964) reported data from a study of 16 boys and 11 girls in whom the distance (mm) from the center of the pituitary gland to the pterygomaxillary fissure was measured at ages 8, 10, 12, and 14. Changes in the pituitary-pterygomaxillary distance during growth are important in orthodontic therapy. We consider only the girls' data here.
id sex d8 d10 d12 d14
1 1 F 21.0 20.0 21.5 23.0
2 2 F 21.0 21.5 24.0 25.5
3 3 F 20.5 24.0 24.5 26.0
4 4 F 23.5 24.5 25.0 26.5
5 5 F 21.5 23.0 22.5 23.5
6 6 F 20.0 21.0 21.0 22.5
7 7 F 21.5 22.5 23.0 25.0
8 8 F 23.0 23.0 23.5 24.0
9 9 F 20.0 21.0 22.0 21.5
10 10 F 16.5 19.0 19.0 19.5
11 11 F 24.5 25.0 28.0 28.0
dir.create creates a folder named tmp_data in the current working directory (the one returned by getwd()).
lapply(list, func) applies the function to columns 3 to 6 of potthoffroy, splitting the data into four CSV files.
The function subsets potthoffroy to the female data only and writes four CSV files, each pairing the id column with one measurement column, for example (column 1, column 3), (column 1, column 4), …
paste0 builds the output path and gives each CSV file a name beginning with f_ ("./tmp_data/f_"); to number the files from 1 to 4, use i-2 (because i runs from 3 to 6).
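The splitting step described above can be sketched roughly as follows. The girls' data are typed in from the table printed earlier so that the block runs on its own; the object name dta and the choice of row.names = FALSE are assumptions, not necessarily what the original code used.

```r
# Girls' data typed in from the table above; the column order mirrors
# potthoffroy (id, sex, d8, d10, d12, d14). The name `dta` is illustrative.
dta <- data.frame(
  id  = 1:11,
  sex = "F",
  d8  = c(21, 21, 20.5, 23.5, 21.5, 20, 21.5, 23, 20, 16.5, 24.5),
  d10 = c(20, 21.5, 24, 24.5, 23, 21, 22.5, 23, 21, 19, 25),
  d12 = c(21.5, 24, 24.5, 25, 22.5, 21, 23, 23.5, 22, 19, 28),
  d14 = c(23, 25.5, 26, 26.5, 23.5, 22.5, 25, 24, 21.5, 19.5, 28)
)

# Create the output folder under the current working directory
dir.create("tmp_data", showWarnings = FALSE)

# For each measurement column (3 to 6), keep id plus that column for the
# girls and write it to f_1.csv ... f_4.csv (i - 2 maps 3:6 onto 1:4)
invisible(lapply(3:6, function(i) {
  write.csv(dta[dta$sex == "F", c(1, i)],
            file = paste0("./tmp_data/f_", i - 2, ".csv"),
            row.names = FALSE)  # assumed, so that no row-name column is written
}))
```

Writing without row names keeps each file to exactly two columns, which is what the printed content of the first file suggests.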
With the previous code, we have created four separate files in a local folder called tmp_data.
[1] "f_1.csv" "f_2.csv" "f_3.csv" "f_4.csv"
list.files lists all files in tmp_data whose names start with "f_".
The contents of the first file look like this.
id d8
1 1 21.0
2 2 21.0
3 3 20.5
4 4 23.5
5 5 21.5
6 6 20.0
7 7 21.5
8 8 23.0
9 9 20.0
10 10 16.5
11 11 24.5
Now collect the file names.
[1] "f_1.csv" "f_2.csv" "f_3.csv" "f_4.csv"
Remember to give files the full path to their location.
[1] "./tmp_data/f_1.csv" "./tmp_data/f_2.csv" "./tmp_data/f_3.csv"
[4] "./tmp_data/f_4.csv"
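Collecting the file names, adding the folder path, and reading the files can be sketched as below. To stay self-contained, the block first writes four small files (the first three girls only) to a scratch folder under tempdir(); the names scratch, fls, fL, and ff are illustrative, and the text itself works in ./tmp_data.

```r
# Set-up so the block runs alone: first three girls, written to a scratch folder
scratch <- file.path(tempdir(), "tmp_data")
dir.create(scratch, showWarnings = FALSE)
dta <- data.frame(id = 1:3,
                  d8  = c(21, 21, 20.5),
                  d10 = c(20, 21.5, 24),
                  d12 = c(21.5, 24, 24.5),
                  d14 = c(23, 25.5, 26))
for (k in 1:4) {
  write.csv(dta[, c(1, k + 1)],
            file.path(scratch, paste0("f_", k, ".csv")),
            row.names = FALSE)
}

# Collect the names of all files whose names start with "f_"
fls <- list.files(path = scratch, pattern = "^f_")

# Prepend the folder so that read.csv() can locate each file
fL <- paste0(scratch, "/", fls)

# Read every file into a list of data frames
ff <- lapply(fL, read.csv)
```

As a shortcut, list.files() also accepts full.names = TRUE, which returns the paths with the folder already attached.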
We can merge two files by id.
id d8 d10
1 1 21.0 20.0
2 2 21.0 21.5
3 3 20.5 24.0
4 4 23.5 24.5
5 5 21.5 23.0
6 6 20.0 21.0
7 7 21.5 22.5
8 8 23.0 23.0
9 9 20.0 21.0
10 10 16.5 19.0
11 11 24.5 25.0
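A minimal sketch of merging two such files by id, using the first three girls only. The names f1, f2, and m12 are illustrative; the second data frame is deliberately shuffled to show that rows are matched on the key rather than on position.

```r
# Two small tables sharing the key column `id` (values from the first three girls)
f1 <- data.frame(id = 1:3, d8 = c(21, 21, 20.5))
f2 <- data.frame(id = c(3, 1, 2), d10 = c(24, 20, 21.5))  # rows in a different order

# merge() matches rows on `id`, so the shuffled order of f2 does not matter
m12 <- merge(f1, f2, by = "id")
m12
```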
The function Reduce allows us to ‘loop’ through the list of files with our own version of merge called mrg2.
id d8 d10 d12 d14
1 1 21.0 20.0 21.5 23.0
2 2 21.0 21.5 24.0 25.5
3 3 20.5 24.0 24.5 26.0
4 4 23.5 24.5 25.0 26.5
5 5 21.5 23.0 22.5 23.5
6 6 20.0 21.0 21.0 22.5
7 7 21.5 22.5 23.0 25.0
8 8 23.0 23.0 23.5 24.0
9 9 20.0 21.0 22.0 21.5
10 10 16.5 19.0 19.0 19.5
11 11 24.5 25.0 28.0 28.0
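The definition of mrg2 is not shown in the text; one plausible version is a two-argument wrapper around merge that fixes the key to id. The sketch below assumes that definition and uses an in-memory list ff (first three girls) as a stand-in for the files read from disk.

```r
# Stand-in for the list of files: one (id, measurement) data frame per age
dta <- data.frame(id = 1:3,
                  d8  = c(21, 21, 20.5),
                  d10 = c(20, 21.5, 24),
                  d12 = c(21.5, 24, 24.5),
                  d14 = c(23, 25.5, 26))
ff <- lapply(2:5, function(j) dta[, c(1, j)])

# Assumed definition: merge two data frames on the shared `id` column
mrg2 <- function(x, y) merge(x, y, by = "id")

# Reduce() folds the list from the left:
# mrg2(mrg2(mrg2(ff[[1]], ff[[2]]), ff[[3]]), ff[[4]])
dtaW <- Reduce(mrg2, ff)
dtaW
```

Because Reduce carries the running result forward, the same call works unchanged for any number of files.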
The same result can be obtained with the tidyverse: inner_join (dplyr) is used instead of merge, and reduce (purrr) instead of Reduce.
id d8 d10 d12 d14
1 1 21.0 20.0 21.5 23.0
2 2 21.0 21.5 24.0 25.5
3 3 20.5 24.0 24.5 26.0
4 4 23.5 24.5 25.0 26.5
5 5 21.5 23.0 22.5 23.5
6 6 20.0 21.0 21.0 22.5
7 7 21.5 22.5 23.0 25.0
8 8 23.0 23.0 23.5 24.0
9 9 20.0 21.0 22.0 21.5
10 10 16.5 19.0 19.0 19.5
11 11 24.5 25.0 28.0 28.0
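A sketch of the tidyverse variant, assuming the dplyr and purrr packages are installed. As before, ff is an in-memory stand-in (first three girls) for the files read from disk, and the object names are illustrative.

```r
library(dplyr)  # inner_join()
library(purrr)  # reduce()

# Stand-in for the list of files: one (id, measurement) data frame per age
dta <- data.frame(id = 1:3,
                  d8  = c(21, 21, 20.5),
                  d10 = c(20, 21.5, 24),
                  d12 = c(21.5, 24, 24.5),
                  d14 = c(23, 25.5, 26))
ff <- lapply(2:5, function(j) dta[, c(1, j)])

# Fold the list with an inner join on `id`
dtaW <- reduce(ff, function(x, y) inner_join(x, y, by = "id"))
dtaW
```

An inner join keeps only ids present in every file; here all files share the same ids, so no rows are dropped.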
We can ‘bind’ the input in the vertical direction to construct an output in the long format in contrast to the wide format of the original data.
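sapply works much like lapply. Applied to the list of files ff, dim(x) gives the number of rows and columns of each element: dim(x)[1] is the number of rows and dim(x)[2] is the number of columns.
Again for ff, names(x)[2] extracts the name of the second column of each element, and parse_number pulls the number out of that column name (for example d8→8, d10→10).
Augment the data with a new column variable ‘year’.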
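The two helper vectors described above might be computed as in the following sketch. It assumes parse_number comes from the readr package and rebuilds ff in memory from the girls' table so that the block runs alone; the assignments to n and p follow the names used later in the text.

```r
# Girls' data typed in from the table above (name `dta` is illustrative)
dta <- data.frame(
  id  = 1:11,
  d8  = c(21, 21, 20.5, 23.5, 21.5, 20, 21.5, 23, 20, 16.5, 24.5),
  d10 = c(20, 21.5, 24, 24.5, 23, 21, 22.5, 23, 21, 19, 25),
  d12 = c(21.5, 24, 24.5, 25, 22.5, 21, 23, 23.5, 22, 19, 28),
  d14 = c(23, 25.5, 26, 26.5, 23.5, 22.5, 25, 24, 21.5, 19.5, 28)
)
# Stand-in for the list of files read from disk
ff <- lapply(2:5, function(j) dta[, c(1, j)])

# Number of rows in each element of the list
n <- sapply(ff, function(x) dim(x)[1])

# Age encoded in the name of the second column, e.g. "d10" gives 10
p <- sapply(ff, function(x) readr::parse_number(names(x)[2]))
```

If readr is not available, a base R alternative such as as.numeric(gsub("\\D", "", names(x)[2])) extracts the same digits.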
# approach 1: we know the dimensions of the initial data
dtaL <- cbind(Reduce(rbind, ff),
              year = rep(c(8, 10, 12, 14), c(11, 11, 11, 11))) %>% as.data.frame()
# approach 2: we do not know the dimensions of the initial data
dtaL2 <- cbind(Reduce(rbind, ff), year = rep(p, n)) %>% as.data.frame()
# rename the second column
names(dtaL)[2] <- "pp_distance"
names(dtaL2)[2] <- "pp_distance"
Use Reduce to ‘loop’ through the list of files (ff) with the rbind function.
Approach 1: construct the year variable by hand, repeating each of 8, 10, 12, and 14 eleven times.
Approach 2: use p and n to replace the numbers hard-coded in approach 1; here p = c(8, 10, 12, 14) and n = c(11, 11, 11, 11).
id pp_distance year
1 1 21.0 8
2 2 21.0 8
3 3 20.5 8
4 4 23.5 8
5 5 21.5 8
6 6 20.0 8
7 7 21.5 8
8 8 23.0 8
9 9 20.0 8
10 10 16.5 8
11 11 24.5 8
12 1 20.0 10
13 2 21.5 10
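How rep(p, n) lines up with the stacked rows can be checked on a small example. The sketch below uses the first three girls and plain data frames; because rbind on plain data frames requires identical column names, it renames the measurement column before stacking rather than after, which is a deviation from the code above and is only one way to do it.

```r
# Stand-in for the list of files: one (id, measurement) data frame per age
dta <- data.frame(id = 1:3,
                  d8  = c(21, 21, 20.5),
                  d10 = c(20, 21.5, 24),
                  d12 = c(21.5, 24, 24.5),
                  d14 = c(23, 25.5, 26))
ff <- lapply(2:5, function(j) dta[, c(1, j)])

p <- c(8, 10, 12, 14)   # age belonging to each file
n <- sapply(ff, nrow)   # number of rows in each file

# rep(p, n) repeats p[k] exactly n[k] times: one age per stacked row
year <- rep(p, n)

# Give the measurement column a common name so that rbind() accepts
# plain data frames, then stack the pieces and attach the ages
ff2  <- lapply(ff, setNames, c("id", "pp_distance"))
dtaL <- cbind(Reduce(rbind, ff2), year = year)
dtaL
```

Deriving n from the data means the construction still works when the files have different numbers of rows.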
Repeat the above using the male data only.
Repeat the above using data from both males and females.