class: center, middle, inverse, title-slide

# Data management: Exploring data
## Gender differences in social isolation and cognitive function
### Jenny Hsu
### 2022-01-03

---

## Overview

#### Introduction to the dataset

#### Data management
- Categorical variables
  - Ordinal (Likert)
  - Nominal
- Continuous variables

#### Conclusion

---

### Retiree Health Survey database

- Survey waves and sample sizes
  - Wave 1: 2015-2016 (n = 3,141; 923 columns)
  - Wave 2: 2018-2019 (n = 2,448; 740 columns)
- Participants
  - Retirees aged 50-74 at Wave 1, insured under Public Servants Insurance or Labor Insurance
- Variables
  - social isolation
  - demographic variables
  - cognitive function (SLUMS scale)

---

### Social isolation - Loneliness

- Three questions, each introduced by "In the past week, have you had any of the following experiences or feelings?"
  - Felt very lonely (alone, without company)
  - Felt that the people around you were unfriendly
  - Felt that the people around you disliked you
- Likert scale
  - 0 = never; 1 = seldom (1 day or less); 2 = sometimes (2-3 days); 3 = often (4-7 days); 999 = not analyzable (blank, don't know)
- Data management issues
  - Many respondents answered 0 to all three items
  - Use **{likert}** to organize the data (and plot)
  - Use **{ggplot2}** to plot

---

### Loneliness

```r
knitr::kable(head(presub, 4))
```

|ID | gender| pre_lonely| pre_unfriendly| pre_hate|
|:--------------|------:|----------:|--------------:|--------:|
|10002030003201 | 1| 0| 0| 0|
|10002030003202 | 2| 3| 0| 0|
|10002030003203 | 2| 0| 0| 0|
|10002030003204 | 1| 0| 0| 0|

```r
knitr::kable(head(postsub, 4)) # the posttest data have no gender column
```

|ID | post_lonely| post_unfriendly| post_hate|
|:--------------|-----------:|---------------:|---------:|
|10007020005001 | 1| 1| 1|
|10007020005002 | 0| 0| 1|
|10007020005003 | 0| 0| 0|
|10007020005004 | 0| 0| 0|

---

### Alluvial plot: Original

.pull-left[
```r
library(ggplot2)
library(ggalluvial) # geom_alluvium(), geom_stratum()

# original plot: every combination of the three item responses, by gender
genlon <- data.frame(with(presub,
  xtabs(~ gender + pre_lonely + pre_unfriendly + pre_hate)))
Ori <- ggplot(genlon, aes(axis1 = pre_hate, axis2 = pre_unfriendly,
                          axis3 = pre_lonely, y = Freq)) +
  scale_x_discrete(limits = c("pre_hate", "pre_unfriendly", "pre_lonely"),
                   expand = c(.1, .05)) +
  labs(x = '', y = 'Count') +
  geom_alluvium(aes(fill = gender)) +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_fill_manual(values = c('skyblue', 'hotpink')) +
  theme_minimal() +
  ggtitle("Pretest subjective isolation, stratified by gender")
```
]
.pull-right[
<!-- -->
]

---

### Alluvial plot: Modified

.pull-left[
```r
# exclude respondents who answered 0 to all three items
presub$inanalysis <- rowSums(presub[, 3:5])
subex0 <- presub |> dplyr::filter(inanalysis != 0)
```

```r
genlonex0 <- data.frame(with(subex0,
  xtabs(~ gender + pre_lonely + pre_unfriendly + pre_hate)))
mod <- ggplot(genlonex0, aes(axis1 = pre_hate, axis2 = pre_unfriendly,
                             axis3 = pre_lonely, y = Freq)) +
  scale_x_discrete(limits = c("pre_hate", "pre_unfriendly", "pre_lonely"),
                   expand = c(.1, .05)) +
  labs(x = '', y = 'Count') +
  geom_alluvium(aes(fill = gender)) +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_fill_manual(values = c('skyblue', 'hotpink')) +
  theme_minimal() +
  ggtitle("Pretest subjective isolation (all-0 respondents excluded), stratified by gender")
```
]
.pull-right[
<!-- -->
]

---

### {likert} approach

```r
library(likert)
library(dplyr)
library(forcats)

# recode the three posttest items (0-3) as labelled factors
for (i in 2:4) {
  postsub[, i] <- factor(postsub[, i], levels = 0:3,
                         labels = c("never", "seldom", "sometimes", "usually"))
}
# add the gender variable from the pretest data
presex <- presub |> select(ID, gender)
postsub <- merge(presex, postsub, by = "ID")
# convert to the likert data type
post.lkrt <- likert(postsub[, 3:5], grouping = postsub[, 2])
postdta <- as.data.frame(post.lkrt$results)
postdta$Group <- fct_collapse(postdta$Group, "Male" = "1", "Female" = "2")
knitr::kable(postdta)
```

|Group |Item | never| seldom| sometimes| usually|
|:------|:---------------|--------:|--------:|--------:|---------:|
|Male |post_lonely | 80.38777| 14.98881| 3.579418| 1.0439970|
|Male |post_unfriendly | 86.42804| 12.08054| 1.118568| 0.3728561|
|Male |post_hate | 89.63460| 8.57569| 1.565996| 0.2237136|
|Female |post_lonely | 73.41772| 20.43400| 4.882459| 1.2658228|
|Female |post_unfriendly | 82.82098| 14.46655| 1.989150| 0.7233273|
|Female |post_hate | 86.43761| 11.48282| 1.717902| 0.3616637|

---

### bar plot

.pull-left[
```r
plot(pre.lkrt, ordered = FALSE)
```
<!-- -->
]
.pull-right[
```r
plot(post.lkrt, ordered = FALSE)
```
<!-- -->
]

---

### density plot

.pull-left[
```r
# density plot
plot(pre.lkrt, type = "density")
```
<!-- -->
]
.pull-right[
```r
# density plot
plot(post.lkrt, type = "density")
```
<!-- -->
]

---

### Redoing it with {ggplot2}

```r
knitr::kable(head(predta, 4))
```

|Group |Item | never| seldom| sometimes| usually|
|:------|:--------------|--------:|--------:|--------:|---------:|
|Male |pre_lonely | 79.95444| 14.92027| 4.328018| 0.7972665|
|Male |pre_unfriendly | 87.98405| 10.02278| 1.822323| 0.1708428|
|Male |pre_hate | 87.12251| 9.97151| 2.051282| 0.8547009|
|Female |pre_lonely | 78.82524| 13.99565| 6.236403| 0.9427121|

```r
knitr::kable(head(predtaL, 4))
```

|Group |Item |FreQ | Percent| Differ|
|:-----|:------|:--------|----------:|----------:|
|Male |lonely |never | 79.9544419| -1.1292062|
|Male |lonely |seldom | 14.9202733| -0.9246243|
|Male |lonely |sometimes | 4.3280182| 1.9083850|
|Male |lonely |usually | 0.7972665| 0.1454456|

---

### Coding issue

.pull-left[
```r
library(tidyr)

# split by gender; Group holds the "Male"/"Female" labels
subgender <- split(predtaL, predtaL$Group)
male <- as.data.frame(subgender[["Male"]])
male <- unite(male, fulabel, c("Item", "FreQ")) |> select(fulabel, Percent)
female <- as.data.frame(subgender[["Female"]])
female <- unite(female, fulabel, c("Item", "FreQ")) |> select(fulabel, Percent)
predtaL2 <- merge(female, male, by = "fulabel")
names(predtaL2) <- c("label", "F.per", "M.per")
predtaL2 <- predtaL2 |> mutate(Differ = M.per - F.per)
# recover Item and FreQ from the label rather than hard-coding them with rep():
# merge() sorts the labels, so a hand-written order can mislabel rows
predtaL2 <- predtaL2 |> separate(label, into = c("Item", "FreQ"), sep = "_")
predtaL2v <- predtaL2 |>
  pivot_longer(cols = -c(Item, FreQ, Differ),
               names_to = "Gender", values_to = "Percent")
predtaL2v$Gender <- substr(predtaL2v$Gender, start = 1, stop = 1)
```
]
.pull-right[
```r
# reshape to long form
predtaL <- predta |>
  pivot_longer(cols = -c(Group, Item), names_to = "FreQ", values_to = "Percent") |>
  # keep the response levels in scale order (used for the legends below)
  mutate(FreQ = factor(FreQ, levels = c("never", "seldom", "sometimes", "usually"))) |>
  group_by(Item, FreQ) |>
  mutate(Differ = diff(Percent)) # Female minus Male (rows are ordered Male, Female)
predtaL$Item <- gsub("pre_", "", predtaL$Item)
```
]

---

### {ggplot2} approach

.pull-left[
```r
library(RColorBrewer) # brewer.pal()

a <- ggplot(predtaL, aes(y = Percent, x = Group,
                         group = Item)) +
  geom_bar(stat = 'identity', aes(fill = FreQ)) +
  ylim(c(-5, 105)) +
  coord_flip() +
  scale_fill_manual(values = brewer.pal(4, "BrBG"),
                    breaks = levels(predtaL$FreQ), labels = levels(predtaL$FreQ)) +
  geom_text(aes(label = paste(round(Percent), '%', sep = '')), size = 4,
            position = position_stack(vjust = .5)) +
  facet_wrap(~Item, ncol = 1) +
  ylab('') + xlab('')
b <- ggplot(postdtaL, aes(y = Percent, x = Group, group = Item)) +
  geom_bar(stat = 'identity', aes(fill = FreQ)) +
  ylim(c(-5, 105)) +
  coord_flip() +
  scale_fill_manual(values = brewer.pal(4, "BrBG"),
                    breaks = levels(postdtaL$FreQ), labels = levels(postdtaL$FreQ)) +
  geom_text(aes(label = paste(round(Percent), '%', sep = '')), size = 4,
            position = position_stack(vjust = .5)) +
  facet_wrap(~Item, ncol = 1) +
  ylab('') + xlab('')
# gridExtra::grid.arrange(a, b)
```
]
.pull-right[
<!-- -->
]

---

### dot plot difference

.pull-left[
```r
c <- ggplot(predtaL, aes(x = Percent, y = reorder(Item, abs(Differ)), shape = Group)) +
  geom_point(aes(color = Group), size = 3) +
  geom_line(aes(group = Item), color = "black", lty = 1, size = 1) +
  scale_shape_manual(name = "Gender", values = c(16, 16)) +
  scale_color_manual(name = "Gender", values = c("red", "blue")) +
  scale_x_continuous(limits = c(0, 100), breaks = seq(0, 100, 25)) +
  facet_grid(FreQ ~ .
, space = 'free_x', scales = 'free_x', switch = 'x') +
  theme_minimal() +
  # theme_minimal() is a complete theme, so tweaks must come after it
  theme(panel.spacing.y = unit(2, "line")) +
  labs(x = "Percent", y = "Item", title = "") +
  ylab('') + xlab('')
d <- ggplot(postdtaL, aes(x = Percent, y = reorder(Item, abs(Differ)), shape = Group)) +
  geom_point(aes(color = Group), size = 3) +
  geom_line(aes(group = Item), color = "black", lty = 1, size = 1) +
  scale_shape_manual(name = "Gender", values = c(16, 16)) +
  scale_color_manual(name = "Gender", values = c("red", "blue")) +
  facet_grid(FreQ ~ ., space = 'free_x', scales = 'free_x', switch = 'x') +
  theme_minimal() +
  theme(panel.spacing.y = unit(2, "line")) +
  labs(x = "Percent", y = "Item", title = "") +
  ylab('')
# gridExtra::grid.arrange(c, d)
```
]
.pull-right[
<!-- -->
]

---

### Social isolation - network size

- Two questions
  - After retirement, how many people do you regularly spend time with (excluding family members you live with)?
  - How many relatives and friends who do not live with you are you in contact with each month (chatting or interacting, including by phone or online)?
- Scale
  - 0 = none; 1 = about 1-5 people; 2 = about 6-10 people; 3 = more than 11 people; 999 = not analyzable (blank, don't know)
- Graphics
  - bar plot
  - dot plot
  - dot plot (segment)

---

### network size

.pull-left[
```r
network <- presub |> select(ID, a1, w2, w4) |> as.data.frame()
network <- lapply(network, as.factor) |> as.data.frame()
names(network) <- c("ID", "Gender", "Accompany", "Contact")
knitr::kable(head(network))
```

|ID |Gender |Accompany |Contact |
|:--------------|:------|:---------|:-------|
|10002030003201 |1 |2 |1 |
|10002030003202 |2 |2 |0 |
|10002030003203 |2 |3 |0 |
|10002030003204 |1 |3 |0 |
|10002030003205 |2 |1 |0 |
|10002030003206 |2 |2 |0 |
]
.pull-right[
```r
networkL <- network |>
  pivot_longer(cols = -c("ID", "Gender"), names_to = "Net", values_to = "Size")
knitr::kable(head(networkL))
```

|ID |Gender |Net |Size |
|:--------------|:------|:---------|:----|
|10002030003201 |1 |Accompany |2 |
|10002030003201 |1 |Contact |1 |
|10002030003202 |2 |Accompany |2 |
|10002030003202 |2 |Contact |0 |
|10002030003203 |2 |Accompany |3 |
|10002030003203 |2 |Contact |0 |
]

---

### bar plot

.pull-left[
```r
net.bar <- ggplot(plotnetL, aes(x = factor(Gender), y = pct, fill = factor(Size))) +
  geom_bar(stat = "identity", position = "fill") +
  scale_y_continuous(breaks = seq(0, 1, .2)) +
  geom_text(aes(label = lbl), size = 4, position = position_stack(vjust = 0.5)) +
  scale_fill_brewer(palette = "Set2") +
  facet_wrap(~Net) +
  labs(y = "Percentage", fill = "Social Network Size", x = "Gender",
       title = "Social Network Size by Gender") +
  theme_minimal()
```
- Size differences are shown as colored areas, which are hard to compare
- Wastes ink (low data-ink ratio)
]
.pull-right[
<!-- -->
]

---

### dot plot

.pull-left[
```r
net.dot2 <- ggplot(plotnetL, aes(pct, NetSize)) +
  geom_line(aes(group =
NetSize), size = 2, color = "grey") +
  geom_point(aes(color = Gender), size = 5) +
  labs(y = "Network size", x = "Percentage") +
  theme_minimal()
```
- A line segment shows the size of the gender difference
- Each network-size level is shown as a separate pair of points
- The result looks somewhat scattered
- Because the differences are small, the x axis spans only 0-0.4; does this distort the picture?
]
.pull-right[
<!-- -->
]

---

### dot plot (segment by gender)

.pull-left[
```r
net.dot <- ggplot(plotnetL, aes(x = pct, y = GNetSize, color = Gender)) +
  geom_segment(aes(xend = 0, yend = GNetSize), color = "grey", size = 2) +
  geom_point(size = 5) +
  geom_text(aes(label = pct), size = 5, color = "black", nudge_x = 0.1) +
  scale_x_continuous(limits = c(0, 1), breaks = seq(0, 1, .2)) +
  scale_color_hue(direction = -1) +
  ggtitle("") +
  labs(x = "Percent", y = "") +
  theme_minimal() +
  theme(text = element_text(family = "Songti SC")) +
  facet_grid(Net ~ ., space = 'free_y', scales = 'free_y', switch = 'y') +
  theme(panel.spacing.y = unit(2, "line")) +
  theme(strip.placement = 'outside', strip.background.y = element_blank()) +
  guides(color = guide_legend(title = "Gender", reverse = FALSE))
```
- facet_grid by network type, one segment per gender
- The y axis uses two labels (network type and size)
]
.pull-right[
<!-- -->
]

---

# Demographic (Pretest)

- Four questions

|Question |Response |
|:--------|:--------|
|What is your gender? |1 = male; 2 = female |
|In which ROC (Minguo) year were you born? |ROC year |
|What is your highest level of education? |0 = illiterate; 1 = literate without formal schooling; 2 = elementary school; 3 = junior high school; 4 = senior high or vocational school; 5 = junior college; 6 = university; 7 = graduate school or above; 8 = other |
|What type of retiree are you? |1 = Public Servants Insurance (all military, civil service, and teachers); 2 = Labor Insurance (workplace or union); 3 = state-owned enterprise retirement; 4 = private school insurance; 5 = other |

- Data management
  - Convert the ROC birth year to age (continuous)
  - Group categories with case_when
  - Summarize with tbl_summary

---

### Demographic (Pretest)

.pull-left[
```r
library(gtsummary)

# variables
info <- presub |> dplyr::select(c(ID, a1, a2a, a6, a8)) |> as.data.frame()
names(info) <- c("ID", "gender", "ageY", "edu", "RType")
# age: 0 codes a missing birth year; recode only that column
# (assigning NA to whole rows would also erase ID and gender)
info$ageY[info$ageY %in% 0] <- NA
info$age <- 104 - info$ageY # ROC year 104 = 2015
# education and insurance type
info <- info |>
  mutate(
    cEdu = case_when(edu %in% 0:3 ~ "Junior", edu %in% 4:7 ~ "Senior"),
    cRType = case_when(RType == 1 ~ "Public", RType == 2 ~ "Labor"),
    Gender = case_when(gender == 1 ~ "Male", gender == 2 ~ "Female"))
cinfo <- info |> dplyr::select(-c(RType, ageY, gender))
tbl_info <- cinfo[, -c(1, 2)] |>
  tbl_summary(by = Gender, missing = "no",
              label = list(age ~ "Age", cEdu ~ "Education", cRType ~ "Insurance type"),
              statistic = list(all_continuous() ~ "{mean} ({sd})"))
```
]
.pull-right[

|Characteristic |Female, N = 1,380¹ |Male, N = 1,760¹ |
|:--------------|:------------------|:----------------|
|Age |63.9 (5.4) |65.0 (5.6) |
|Education | | |
|Junior |772 (56%) |707 (40%) |
|Senior |608 (44%) |1,050 (60%) |
|Insurance type | | |
|Labor |1,091 (80%) |1,220 (70%) |
|Public |275 (20%) |511 (30%) |

¹ Mean (SD); n (%)
]

---

### cognitive function (SLUMS)

.pull-left[
<img src="SLUMS.png" width="90%" style="display: block; margin: auto auto auto 0;" />
]
.pull-right[
- **Data management issue**
  - replace values and sum the cognitive score
- **Graphics**
  - Demographic variables
    - Between groups
    - Within individuals (pretest vs. posttest)
]

---

### cognitive function (SLUMS)

.pull-left[
```r
cog <- presub |> dplyr::select(c(1, 11:28))
cogpost <- postsub |> dplyr::select(c(1, 5:22))
str(cog)
```
- z7a is not scored
- z4b, z8a, z8b, z10a, z10b, z10c, z10d: a correct answer scores 2 points
- All other items: a correct answer scores 1 point
]
.pull-right[
```
tibble [3,141 x 19] (S3: tbl_df/tbl/data.frame)
 $ ID  : chr [1:3141] "10002030003201" "10002030003202" "10002030003203" "10002030003204" ...
 $ z1  : num [1:3141] 1 0 1 1 1 1 1 1 1 1 ...
 $ z2  : num [1:3141] 1 0 1 1 1 1 0 1 0 1 ...
 $ z3  : num [1:3141] 1 1 1 1 1 1 1 1 1 1 ...
 $ z4a : num [1:3141] 1 0 1 1 1 1 0 1 1 1 ...
 $ z4b : num [1:3141] 1 0 1 1 1 1 0 1 1 1 ...
 $ z5  : num [1:3141] 1 2 1 2 1 1 NA 3 3 2 ...
 $ z6  : num [1:3141] 2 2 2 4 4 4 NA 4 5 5 ...
 $ z7a : num [1:3141] 0 NA 0 0 0 0 0 0 0 0 ...
 $ z7b : num [1:3141] 0 NA 0 1 0 1 1 1 1 1 ...
 $ z7c : num [1:3141] 0 NA 0 0 0 0 0 0 0 1 ...
 $ z8a : num [1:3141] 1 0 1 1 1 0 0 0 1 1 ...
 $ z8b : num [1:3141] 0 1 0 0 1 0 0 0 1 0 ...
 $ z9a : num [1:3141] 1 0 1 1 0 1 0 1 1 1 ...
 $ z9b : num [1:3141] 0 1 1 1 1 1 1 1 1 1 ...
 $ z10a: num [1:3141] 1 0 1 1 1 1 1 1 1 1 ...
 $ z10b: num [1:3141] 1 0 0 1 1 0 0 1 0 1 ...
 $ z10c: num [1:3141] 0 0 1 0 1 0 0 1 0 1 ...
 $ z10d: num [1:3141] 0 0 1 1 0 0 0 1 0 0 ...
```
]

---

### management (wide form)

.pull-left[
```r
# z7a is not scored
cog <- cog[, names(cog) != "z7a"]
# recode a correct answer (1) to 2 points on the two-point items
cogsub2 <- sapply(cog[, c("z4b", "z8a", "z8b", "z10a", "z10b", "z10c", "z10d")],
                  function(x) replace(x, x %in% 1, 2)) |> as.data.frame()
# write the recoded columns back into the original data
dtacog <- cog
dtacog[names(cogsub2)] <- cogsub2
dtacog$Pre_Score <- rowSums(dtacog[, 2:18])
# keep only ID and the total score
mercog <- dtacog[, c("ID", "Pre_Score")]
```

```r
# merge posttest score, pretest score, and demographics
Tdtacog <- left_join(mercogpost, mercog, by = 'ID') |> left_join(cinfo, by = 'ID')
Tdtacog[, c(1, 8, 5, 4, 6, 7, 2, 3)] |> head() |> knitr::kable()
```
]
.pull-right[

|Gender | age| edu|cEdu |cRType | Post_Score| Pre_Score|
|:------|---:|---:|:------|:------|----------:|---------:|
|Male | 66| 3|Junior |Labor | 22| 17|
|Male | 65| 6|Senior |Public | 21| 25|
|Male | 59| 2|Junior |Labor | 30| 24|
|Male | 66| 4|Senior |Labor | 25| 29|
|Male | 63| 2|Junior |Labor | 21| 20|
|Male | 73| 6|Senior |Public | 28| NA|
]

---

### cognitive score and insurance type

.pull-left[
```r
# mean, SE, and 95% CI of the pretest score by insurance type and gender
plottype <- PREdtacog |>
  group_by(cRType, Gender) |>
  filter(!is.na(cRType)) |>
  summarize(n = sum(!is.na(Pre_Score)), # count only non-missing scores
            mean = mean(Pre_Score, na.rm = TRUE),
            sd = sd(Pre_Score, na.rm = TRUE),
            se = sd / sqrt(n),
            ci = qt(0.975, df = n - 1) * sd / sqrt(n))
```

```r
pd <- position_dodge(0.2)
MStype <- ggplot(plottype, aes(x = cRType, y = mean, color = Gender)) +
  geom_point(size = 4, position = pd) +
  geom_line(aes(group = Gender), position = pd, size = 1) + # one line per gender
  theme_minimal() +
  scale_color_brewer(palette = "Set1") +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = .1, position = pd) +
  labs(y = "Cognitive Score (Mean ± SE)", x = "Insurance type")
```
]
.pull-right[
<!-- -->
]

---

### cognitive score and education

.pull-left[
```r
pd <- position_dodge(0.2)
MSedu <- ggplot(plotedu, aes(x = edu, y = mean, color = Gender)) +
  geom_point(size = 4, position = pd) +
  geom_line(aes(group = Gender), position = pd, linetype = "dashed") +
  theme_minimal() +
  scale_color_brewer(palette = "Set1") +
  geom_errorbar(aes(ymin = mean - se,
                    ymax = mean + se), width = .1, position = pd) +
  labs(y = "Cognitive Score (Mean)", x = "Education level") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```
]
.pull-right[
<!-- -->
]

---

### management (long form)

.pull-left[
```r
# posttest age = pretest age + 3 years
Tdtacog2 <- Tdtacog |> mutate(Pre_age = age, Post_age = Pre_age + 3)
Tdtacog2 <- Tdtacog2[, -5] # drop age (now Pre_age)
TscoreL1 <- Tdtacog2 |>
  pivot_longer(cols = c("Pre_Score", "Post_Score"), names_to = "Time", values_to = "Score")
TscoreL1 <- TscoreL1[c(1, 5, 8, 9)]
TscoreL1$Time <- fct_collapse(TscoreL1$Time, "Posttest" = "Post_Score", "Pretest" = "Pre_Score")
TscoreL2 <- Tdtacog2 |>
  pivot_longer(cols = c("Pre_age", "Post_age"), names_to = "Time", values_to = "Age")
TscoreL2 <- TscoreL2[c(1, 5, 8, 9)]
TscoreL2$Time <- fct_collapse(TscoreL2$Time, "Posttest" = "Post_age", "Pretest" = "Pre_age")
TscoreL <- merge(TscoreL1, TscoreL2, by = c("ID", "Time"))
TscoreL |> head() |> knitr::kable()
```
]
.pull-right[

|ID |Time |Gender | Score|cEdu | Age|
|:--------------|:--------|:------|-----:|:------|---:|
|10002030003202 |Posttest |Female | NA|Junior | 77|
|10002030003202 |Pretest |Female | NA|Junior | 74|
|10002030003203 |Posttest |Female | 21|Junior | 58|
|10002030003203 |Pretest |Female | 19|Junior | 55|
|10002030003204 |Posttest |Male | 26|Junior | 63|
|10002030003204 |Pretest |Male | 23|Junior | 60|
]

---

### cognitive score and gender

.pull-left[
- jitter plot
<!-- -->
]
.pull-right[
- box plot
<!-- -->
]

---

### Score change stratified by gender

.pull-left[
```r
set.seed(123)
# 200 respondents per gender (400 in total) before dropping missing changes
chosen <- Tdtacog2 |> group_by(Gender) |> sample_n(200)
chosen$diff <- chosen$Post_Score - chosen$Pre_Score
chosen$Change <- ifelse(chosen$diff > 0, "better",
                 ifelse(chosen$diff == 0, "same",
                 ifelse(chosen$diff < 0, "worse", NA)))
chosen <- chosen |> filter(!is.na(Change))
chosenlist <- split(chosen, chosen$Change) # one data frame per change type
with(chosen, plot(Pre_age, Pre_Score, bty = "n", xlim = c(50, 80), ylim = c(0, 30),
                  xlab = "Age (year)", ylab = "Score", xaxs = "r", yaxs = "r",
                  col = "dark grey"))
grid(col = "thistle")
with(chosenlist[["worse"]], arrows(Pre_age, Pre_Score, Post_age,
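                                   Post_Score, length = 0.1, col = "pink", lwd = 1))
with(chosenlist[["better"]], arrows(Pre_age, Pre_Score, Post_age, Post_Score,
                                    length = 0.1, col = "skyblue", lwd = 1))
with(chosenlist[["same"]], arrows(Pre_age, Pre_Score, Post_age, Post_Score,
                                  length = 0.1, col = "orange", lwd = 1))
```
]
.pull-right[
<!-- -->
]

---

### Base R approach

.pull-left[
<!-- -->
]
.pull-right[
<!-- -->
]

---

### using {ggplot2}: facet_grid(Change ~ Gender)

.pull-left[
```r
library(reshape2) # melt()
library(ggthemes) # theme_tufte()

# keep ID, both scores, Gender, and Change, then stack the two scores
chosen.s <- chosen[, -c(4:6, 8:10)] |> melt(id.var = c("ID", "Gender", "Change"))
chosen.s <- chosen.s |> mutate(Time = as.factor(gsub("_Score", "", variable)))
chosen.s$Time <- relevel(chosen.s$Time, "Pre")
pathplot <- ggplot(chosen.s, aes(x = Time, y = value, color = Change, group = ID)) +
  geom_jitter(alpha = 0.2, size = 1.5) +
  geom_path(arrow = arrow(ends = "first", length = unit(0.1, "in")), show.legend = FALSE) +
  facet_grid(Change ~ Gender) +
  theme_tufte() +
  theme(strip.text.x = element_text(size = 15)) +
  guides(color = guide_legend(reverse = FALSE)) +
  scale_y_continuous(limits = c(0, 30)) +
  labs(x = "Time", y = "Cognitive Score")
```
]
.pull-right[
<!-- -->
]

---

### conclusion

--

- Data-management strategy?
  - With many X variables (x1, x2, x3), how should the data be organized: one big data frame (58 columns) or two small data frames (28 columns each)?
  - With one big data frame, objects were sometimes overwritten during processing (apply, reshape, ...)
  - With several small data frames, variable names often became confusing (I have not yet found a good naming convention)

--

- Packages vs. writing it yourself
  - Packages are convenient and produce the desired result quickly, but are less flexible for follow-up work or adjustments. (I felt I kept reshaping the data into whatever format the plot required.)

--

- Choosing plots
  - For categorical × categorical data, what avoids the data-ink ratio problem besides dot plots and density plots? (Line plots also seem unsuited to categorical data?)

--

**Reflections**
- Picture the plot first → map it to a data structure → choose a management strategy (reshape first, or add variables first?) → make good use of functions (to avoid going around in circles and wasting effort)
- Stay true to the raw data distribution to avoid wrong conclusions (cognitive score = 0? change?)
- After taking this data management course, I think I will start thinking about plotting from the outset...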
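
---

### Appendix: SLUMS scoring in one function

The wide-form slide drops z7a, gives 2 points for a correct answer on z4b, z8a, z8b and z10a-z10d, and sums the rest. Those steps can also be wrapped in a single base R function. This is a sketch for illustration, not the code behind the slides. `score_slums()` and the toy data frame are hypothetical. The sketch assumes a correct answer on a two-point item is coded 1, as the `replace(x, x %in% 1, 2)` step implies.

```r
# Hypothetical helper mirroring the wide-form steps on the slides.
# Assumption: a correct answer on a two-point item is coded 1.
score_slums <- function(dat, id = "ID", unscored = "z7a",
                        two_point = c("z4b", "z8a", "z8b",
                                      "z10a", "z10b", "z10c", "z10d")) {
  items <- setdiff(names(dat), c(id, unscored))   # scored item columns
  scored <- dat[items]
  doubled <- intersect(items, two_point)
  # a correct answer (1) on a two-point item is worth 2 points; NA stays NA
  scored[doubled] <- lapply(scored[doubled], function(x) replace(x, x %in% 1, 2))
  # rowSums() without na.rm: any missing scored item gives a missing total
  data.frame(ID = dat[[id]], Pre_Score = unname(rowSums(scored)))
}

# toy data (hypothetical) with a few of the item columns
toy <- data.frame(
  ID   = c("A", "B", "C"),
  z1   = c(1, 0, 1),
  z4b  = c(1, 0, 1),
  z7a  = c(0, NA, 0),
  z8a  = c(1, 1, 0),
  z10a = c(1, 0, NA)
)
res <- score_slums(toy)
res
```

Row B keeps a total because the missing z7a is never scored. Row C gets `NA` because a scored item is missing, which is how the `NA` pretest scores in the merged table arise.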
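
---

### Appendix: gender difference without split/merge

On the "Coding issue" slide, the data are split by gender and merged, and Item and FreQ then have to be recovered from the merged labels. A base R alternative keeps the labels in the data throughout. `reshape()` turns the long table into one row per Item × FreQ, with one Percent column per gender. This is a sketch under stated assumptions: `toy_long` is hypothetical and copies the pretest `lonely` percentages from the `predta` table, rounded to two decimals. `Differ` is Male minus Female, as in `predtaL2`.

```r
# Toy long-form data shaped like predtaL (pretest "lonely" rows, rounded)
toy_long <- data.frame(
  Group   = rep(c("Male", "Female"), each = 4),
  Item    = "lonely",
  FreQ    = rep(c("never", "seldom", "sometimes", "usually"), times = 2),
  Percent = c(79.95, 14.92, 4.33, 0.80, 78.83, 14.00, 6.24, 0.94)
)

# one row per Item x FreQ, one Percent column per gender
wide <- reshape(toy_long, idvar = c("Item", "FreQ"), timevar = "Group",
                direction = "wide")
# Male minus Female, as in predtaL2
wide$Differ <- wide$Percent.Male - wide$Percent.Female
wide
```

Because Item and FreQ stay attached to each row, no `rep()` labels are needed, and sorting cannot shift them.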
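
---

### Appendix: 999 codes and the all-0 filter

The all-0 filter on the "Alluvial plot: Modified" slide sums the raw codes. As a result, a 999 (not analyzable) answer makes a row look non-zero. A minimal sketch, assuming 999 should be treated as missing and that a respondent is kept only if at least one valid answer is above 0. `toy` is hypothetical data in the layout of `presub`.

```r
toy <- data.frame(
  ID             = c("A", "B", "C", "D"),
  gender         = c(1, 2, 2, 1),
  pre_lonely     = c(0, 3, 0, 999),
  pre_unfriendly = c(0, 0, 1, 0),
  pre_hate       = c(0, 0, 0, 0)
)
items <- c("pre_lonely", "pre_unfriendly", "pre_hate")

# treat the 999 code as missing before filtering
toy[items] <- lapply(toy[items], function(x) replace(x, x %in% 999, NA))

# keep respondents with at least one valid answer above 0
keep <- rowSums(toy[items] > 0, na.rm = TRUE) > 0
subex0 <- toy[keep, ]
subex0$ID
```

Under the raw-sum rule, row D (999 plus zeros) would have been kept. Here it is dropped because none of its valid answers is above 0.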
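
---

### Appendix: age and score-change recodes

This base R sketch shows two recodes from the demographic and score-change slides, run on hypothetical rows:

- It sets the 0 birth-year code to `NA` in that column only, then converts ROC years to age using the slides' reference year 104 (2015).
- It classifies the pretest-to-posttest change with `sign()`, a swapped-in alternative to the nested `ifelse()` on the slides.

Column names follow the slides; the data are toy values.

```r
toy <- data.frame(
  ID         = c("A", "B", "C", "D"),
  ageY       = c(50, 0, 45, 60),      # ROC birth year; 0 = missing
  Pre_Score  = c(20, 25, 18, NA),
  Post_Score = c(22, 25, 15, 21)
)

# recode only the birth-year column, so ID and scores are untouched
toy$ageY[toy$ageY %in% 0] <- NA
toy$age <- 104 - toy$ageY             # ROC year 104 = 2015

# sign() gives -1 / 0 / 1, mapped to worse / same / better; NA stays NA
delta <- toy$Post_Score - toy$Pre_Score
toy$Change <- factor(sign(delta), levels = c(-1, 0, 1),
                     labels = c("worse", "same", "better"))
toy
```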
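
---

### Appendix: mean, SE, and 95% CI

The summary behind the insurance-type plot can be checked on a small example. This base R sketch uses toy scores and a hypothetical helper, `mean_se_ci()`. It counts only non-missing scores in n, so the SE and the t-based 95% CI half-width use the number of observed values.

```r
# Hypothetical helper: n, mean, SE and 95% CI half-width of non-missing values
mean_se_ci <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  se <- sd(x) / sqrt(n)
  c(n = n, mean = mean(x), se = se, ci = qt(0.975, df = n - 1) * se)
}

toy <- data.frame(
  Gender    = c("Male", "Male", "Male", "Female", "Female", "Female"),
  Pre_Score = c(20, 24, NA, 22, 26, 24)
)
# one row per gender (split() orders the groups alphabetically)
stats <- t(sapply(split(toy$Pre_Score, toy$Gender), mean_se_ci))
stats
```

With `n()` the Male group would count 3 rows instead of 2, which understates its SE.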
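
---

### Appendix: network-size percentages

The network-size plots use `plotnetL` with `pct` and `lbl` columns, but the slides do not show how it was built. The sketch below is one plausible reconstruction in base R. It assumes `pct` is the share of each size level within gender and network type. The toy data and the reconstruction itself are hypothetical.

```r
toy <- data.frame(
  Gender = c(1, 1, 1, 2, 2, 2, 2),
  Net    = "Accompany",
  Size   = c(0, 1, 1, 1, 2, 2, 3)
)

# counts per Gender x Net x Size, then proportions within each Gender x Net
tab <- xtabs(~ Gender + Net + Size, data = toy)
props <- prop.table(tab, margin = c(1, 2))
plotnetL <- as.data.frame(as.table(props), responseName = "pct")
plotnetL$lbl <- sprintf("%.0f%%", 100 * plotnetL$pct)  # e.g. "50%"
plotnetL
```

Because the proportions sum to 1 within each gender and network type, `position = "fill"` and `position_stack()` place the bars and labels consistently in the bar plot.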