Data management: Exploring data

class: center, middle, inverse, title-slide

# Data management: Exploring data
## Gender differences in social isolation and cognitive function
### Jenny Hsu
### 2022-01-03

---

## Overview

#### Introduction to the dataset

#### Data management

- Categorical variables

  - ordinal (Likert)

  - nominal

- Continuous variables

#### Conclusion

---

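### Retiree Health Survey Database (退休者健康調查資料庫)

- Survey period and sample size

  - Wave 1: 2015-2016 (n = 3,141; 923 columns)

  - Wave 2: 2018-2019 (n = 2,448; 740 columns)

- Participants

  - Retirees covered by public-sector or labor insurance, aged 50-74 at Wave 1

- Variables

  - social isolation

  - demographic variables

  - cognitive function (SLUMS scale)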

---

### Social isolation - Loneliness

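- Three questions

  - In the past week, did you have any of the following experiences or feelings? Felt very lonely (alone, without company)

  - In the past week, did you have any of the following experiences or feelings? Felt that the people around you were unfriendly

  - In the past week, did you have any of the following experiences or feelings? Felt that the people around you disliked you

- Likert scale

  - 0 = never; 1 = seldom (less than 1 day); 2 = sometimes (2-3 days); 3 = often (4-7 days); 999 = cannot be analyzed (blank, don't know)

- Data management issue

  - Many respondents answered 0 to all three questions

    - Use **{likert}** to arrange the data (and plot)

    - Use **{ggplot2}** to plot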
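
The coding above implies two preparation steps before plotting: treat 999 as missing, and check how many respondents answered 0 to all three items. A minimal sketch in base R follows. The data frame `toy` and its values are made up for illustration; only the column names follow the loneliness items in `presub`.

```r
# Toy data: hypothetical values, not taken from the survey
toy <- data.frame(
  pre_lonely     = c(0, 3, 0, 999, 1),
  pre_unfriendly = c(0, 0, 0, 0, 2),
  pre_hate       = c(0, 0, 0, 1, 0)
)

items <- c("pre_lonely", "pre_unfriendly", "pre_hate")

# Recode 999 (blank / don't know) to NA so it is not treated as a score
toy[items] <- lapply(toy[items], function(x) replace(x, x %in% 999, NA))

# Flag respondents who answered 0 to all three items
# (rows with any missing item are flagged NA, not TRUE)
all_zero <- rowSums(toy[items]) == 0

# Share of complete respondents who answered 0 everywhere
mean(all_zero, na.rm = TRUE)
```

Recoding before summing matters: a 999 left in place would be added to the row sum as if it were a score.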

---
### Loneliness

```r
knitr::kable(head(presub,4))
```

|ID             | gender| pre_lonely| pre_unfriendly| pre_hate|
|:--------------|------:|----------:|--------------:|--------:|
|10002030003201 |      1|          0|              0|        0|
|10002030003202 |      2|          3|              0|        0|
|10002030003203 |      2|          0|              0|        0|
|10002030003204 |      1|          0|              0|        0|

```r
knitr::kable(head(postsub,4)) # no gender column
```

|ID             | post_lonely| post_unfriendly| post_hate|
|:--------------|-----------:|---------------:|---------:|
|10007020005001 |           1|               1|         1|
|10007020005002 |           0|               0|         1|
|10007020005003 |           0|               0|         0|
|10007020005004 |           0|               0|         0|

---

### Alluvial plot: Original 
.pull-left[

```r
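# Original plot (needs ggplot2 and ggalluvial)
library(ggplot2)
library(ggalluvial)
# Frequency table of gender by the three loneliness items
genlon <- data.frame(with(presub, xtabs(~ gender + pre_lonely + pre_unfriendly + pre_hate)))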

Ori<-ggplot(genlon, 
       aes(axis1=pre_hate, axis2=pre_unfriendly, axis3=pre_lonely, y=Freq)) +
  scale_x_discrete(limits=c("pre_hate", "pre_unfriendly", "pre_lonely"), expand=c(.1, .05)) +
  labs(x='',  y='Count') +
  geom_alluvium(aes(fill=gender)) +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_fill_manual(values=c('skyblue','hotpink'))+
  theme_minimal() +
  ggtitle("pre_subjective isolation, stratify by gender")
```
]

.pull-right[
![](Project0103_files/figure-html/unnamed-chunk-4-1.png)
]

---

### Alluvial plot: Modified
.pull-left[

```r
# Exclude respondents who answered 0 to all three items
presub$inanalysis<-apply(presub[,3:5], 1, sum)
subex0<-presub |> dplyr::filter(inanalysis!=0)
```

```r
genlonex0 <- data.frame(with(subex0, xtabs(~ gender + pre_lonely + pre_unfriendly + pre_hate)))
mod<-ggplot(genlonex0, 
       aes(axis1=pre_hate, axis2=pre_unfriendly, axis3=pre_lonely, y=Freq)) +
  scale_x_discrete(limits=c("pre_hate", "pre_unfriendly", "pre_lonely"), expand=c(.1, .05)) +
  labs(x='', y='Count') +
  geom_alluvium(aes(fill=gender)) +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_fill_manual(values=c('skyblue','hotpink'))+
  theme_minimal() +
  ggtitle("pre_subjective isolation(exclude all 0), stratify by gender")
```
]

.pull-right[
![](Project0103_files/figure-html/unnamed-chunk-7-1.png)
]

---

### {likert} approach

```r
# Turn the three post-test items (columns 2-4) into labelled factors
for (i in 2:4) {
  postsub[[i]] <- factor(postsub[[i]], levels = 0:3,
                         labels = c("never", "seldom", "sometimes", "usually"))
}
# Add the gender variable from the pre-test data
presex<-presub |> select(ID ,gender)
postsub<-merge(presex,postsub, by="ID")
# Convert to a likert object, grouped by gender
post.lkrt <- likert(postsub[,3:5], grouping=postsub[,2])
postdta<-as.data.frame(post.lkrt$results)
postdta$Group <-fct_collapse(postdta$Group , "Male" = "1","Female" = "2") 
knitr::kable(postdta)
```

|Group  |Item            |    never|   seldom| sometimes|   usually|
|:------|:---------------|--------:|--------:|---------:|---------:|
|Male   |post_lonely     | 80.38777| 14.98881|  3.579418| 1.0439970|
|Male   |post_unfriendly | 86.42804| 12.08054|  1.118568| 0.3728561|
|Male   |post_hate       | 89.63460|  8.57569|  1.565996| 0.2237136|
|Female |post_lonely     | 73.41772| 20.43400|  4.882459| 1.2658228|
|Female |post_unfriendly | 82.82098| 14.46655|  1.989150| 0.7233273|
|Female |post_hate       | 86.43761| 11.48282|  1.717902| 0.3616637|
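
The percentages in this table are row percentages: within each gender, the share of respondents choosing each response. As a cross-check that does not depend on {likert}, the same kind of table can be sketched with base R. The responses below are made up for illustration.

```r
# Hypothetical responses for one item, coded 0-3 as in the survey
resp   <- c(0, 0, 1, 0, 2, 0, 1, 3, 0, 0)
gender <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)

resp_f <- factor(resp, levels = 0:3,
                 labels = c("never", "seldom", "sometimes", "usually"))
gender_f <- factor(gender, levels = 1:2, labels = c("Male", "Female"))

# Row percentages: each gender's responses sum to 100
pct <- 100 * prop.table(table(gender_f, resp_f), margin = 1)
round(pct, 1)
```

Setting the factor levels explicitly keeps a response that nobody chose as a 0% column instead of dropping it.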

---
###  bar plot
.pull-left[

```r
plot(pre.lkrt, ordered=FALSE)
```

![](Project0103_files/figure-html/unnamed-chunk-8-1.png)

]

.pull-right[

```r
plot(post.lkrt, ordered=FALSE)
```

![](Project0103_files/figure-html/unnamed-chunk-9-1.png)
]

---

### density plot
.pull-left[

```r
# density plot
plot(pre.lkrt, type="density")
```

![](Project0103_files/figure-html/unnamed-chunk-10-1.png)

]

.pull-right[

```r
# density plot
plot(post.lkrt, type="density")
```

![](Project0103_files/figure-html/unnamed-chunk-11-1.png)

]

---

### Doing it again with {ggplot2}

```r
knitr::kable(head(predta,4))
```

|Group  |Item           |    never|   seldom| sometimes|   usually|
|:------|:--------------|--------:|--------:|---------:|---------:|
|Male   |pre_lonely     | 79.95444| 14.92027| 4.328018| 0.7972665|
|Male   |pre_unfriendly | 87.98405| 10.02278| 1.822323| 0.1708428|
|Male   |pre_hate       | 87.12251|  9.97151| 2.051282| 0.8547009|
|Female |pre_lonely     | 78.82524| 13.99565| 6.236403| 0.9427121|

```r
knitr::kable(head(predtaL,4))
```

|Group |Item   |FreQ      |    Percent|     Differ|
|:-----|:------|:---------|----------:|----------:|
|Male  |lonely |never     | 79.9544419| -1.1292062|
|Male  |lonely |seldom    | 14.9202733| -0.9246243|
|Male  |lonely |sometimes |  4.3280182|  1.9083850|
|Male  |lonely |usually   |  0.7972665|  0.1454456|

---

### Coding issue

.pull-left[

```r
subgender <- split(predtaL, predtaL$Group)

male <-as.data.frame(subgender[["1"]]) 
male <- unite(male, fulabel,c("Item","FreQ")) |> select(fulabel, Percent)
female <-as.data.frame(subgender[["2"]]) 
female <- unite(female, fulabel,c("Item","FreQ")) |> select(fulabel, Percent)

predtaL2<-merge(female, male, by="fulabel")
names(predtaL2)<-c("label", "F.per", "M.per")
predtaL2 <-predtaL2|> mutate(Differ=M.per-F.per)

# merge() sorts rows by "fulabel", so labels within each item are in alphabetical order
predtaL2$Item <- rep(c("hate", "lonely", "unfriendly"), each=4)
predtaL2$FreQ <- rep(c("never", "seldom", "sometimes", "usually"), times=3)
predtaL2<-predtaL2[,-1]

predtaL2v<-predtaL2 |> pivot_longer(cols=-c(Item,FreQ,Differ),
                     names_to= "Gender",
                     values_to= "Percent")
predtaL2v$Gender = substr(predtaL2v$Gender, start = 1, stop = 1)
```
]

.pull-right[

```r
# Reshape to long form and compute the gender difference within each Item x FreQ cell
predtaL <-predta|> pivot_longer(cols=-c(Group, Item),names_to= "FreQ", values_to= "Percent") |> 
  group_by(Item, FreQ) |> mutate(Differ=diff(Percent))
predtaL$Item <-gsub("pre_","",predtaL$Item)
```

]
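
One detail in the short version is easy to miss: `diff(Percent)` subtracts the first row of each group from the second, so the sign of `Differ` depends on row order. With Male rows before Female rows, as in `predta`, it is Female minus Male, the opposite sign of `M.per - F.per` in the longer version. A base R sketch using the `lonely` / `never` percentages from `predta`:

```r
# Two rows of one Item x FreQ cell, in the order they appear in predta
cell <- data.frame(Group   = c("Male", "Female"),
                   Percent = c(79.95444, 78.82524))

# diff() returns second minus first: Female - Male here
diff(cell$Percent)

# Being explicit about the direction avoids depending on row order
male   <- cell$Percent[cell$Group == "Male"]
female <- cell$Percent[cell$Group == "Female"]
female - male   # same value as diff() above
male - female   # the sign used by M.per - F.per
```

Whichever direction is chosen, using it in both versions keeps the dot plots comparable.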
---

### {ggplot2} approach
.pull-left[

```r
a<-ggplot(predtaL, aes(y=Percent, x=Group, group=Item)) + 
  geom_bar(stat='identity', aes(fill=FreQ)) + 
  ylim(c(-5,105)) + 
  coord_flip() +
  scale_fill_manual(values = brewer.pal(4, "BrBG"), 
              breaks=levels(predtaL$FreQ), 
              labels=levels(predtaL$FreQ))+ 
  geom_text(aes(label =paste(round(Percent), '%', sep='')), size=4, position = position_stack(vjust = .5))+
  facet_wrap(~Item, ncol=1)+
  ylab('') + xlab('')
b<-ggplot(postdtaL, aes(y=Percent, x=Group, group=Item)) + 
  geom_bar(stat='identity', aes(fill=FreQ)) + 
  ylim(c(-5,105)) + 
  coord_flip() +
  scale_fill_manual(values = brewer.pal(4, "BrBG"), 
              breaks=levels(postdtaL$FreQ), 
              labels=levels(postdtaL$FreQ))+ 
  geom_text(aes(label =paste(round(Percent), '%', sep='')), size=4, position = position_stack(vjust = .5))+
  facet_wrap(~Item, ncol=1)+
  ylab('') + xlab('')
# grid.arrange(a,b)
```
]

.pull-right[
![](Project0103_files/figure-html/unnamed-chunk-17-1.png)
]
---

### dot plot difference

.pull-left[

```r
c<-ggplot(predtaL, aes(x=Percent , y=reorder(Item,abs(Differ)), shape=Group)) + 
  geom_point(aes(color=Group), size=3) +
  geom_line(aes(group=Item), color="black", lty=1, size=1)+
  scale_shape_manual(name="Gender", values = c(16,16))+
  scale_color_manual(name="Gender", values =c("red","blue"))+
  scale_x_continuous(limits = c(0, 100), breaks =seq(0, 100, 25))+
  facet_grid(FreQ~. , space = 'free_x', scales = 'free_x', switch = 'x') +
  theme_minimal()+
  # theme() tweaks go after theme_minimal(), which would otherwise reset them
  theme(panel.spacing.y = unit(2,"line")) +
  labs(x="", y="", title = "")
d<-ggplot(postdtaL, aes(x=Percent , y=reorder(Item,abs(Differ)), shape=Group)) + 
  geom_point(aes(color=Group), size=3) +
  geom_line(aes(group=Item),color="black", lty=1, size=1)+
  scale_shape_manual(name="Gender", values = c(16,16))+
  scale_color_manual(name="Gender", values =c("red","blue"))+
  facet_grid(FreQ~. , space = 'free_x', scales = 'free_x', switch = 'x') +
  theme_minimal()+
  # theme() tweaks go after theme_minimal(), which would otherwise reset them
  theme(panel.spacing.y = unit(2,"line")) +
  labs(x="Percent", y="", title = "")
#grid.arrange(c,d)
```
]

.pull-right[
![](Project0103_files/figure-html/unnamed-chunk-19-1.png)
]

---
### Social isolation - network size

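- Two questions

  - Since retiring, how many people do you regularly spend time with? (excluding family members you live with)

  - How many relatives and friends who do not live with you are you in contact with each month (chatting or interacting, including by phone or online)?

- Scale

  - 0 = none; 1 = about 1-5 people; 2 = about 6-10 people; 3 = more than 11 people; 999 = cannot be analyzed (blank, don't know)

- Graphics

  - bar plot

  - dot plot

  - dot plot (segment)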
---
### network size

.pull-left[

```r
network <- presub |> select(ID,a1,w2,w4) |> as.data.frame()
network<-lapply(network, as.factor) |> as.data.frame()
names(network)<-c("ID", "Gender", "Accompany", "Contact")
knitr::kable(head(network))
```

|ID             |Gender |Accompany |Contact |
|:--------------|:------|:---------|:-------|
|10002030003201 |1      |2         |1       |
|10002030003202 |2      |2         |0       |
|10002030003203 |2      |3         |0       |
|10002030003204 |1      |3         |0       |
|10002030003205 |2      |1         |0       |
|10002030003206 |2      |2         |0       |
]

.pull-right[

```r
networkL <-network |> pivot_longer(cols = -c("ID","Gender"),
                            names_to = "Net",
                            values_to = "Size")
knitr::kable(head(networkL))
```

|ID             |Gender |Net       |Size |
|:--------------|:------|:---------|:----|
|10002030003201 |1      |Accompany |2    |
|10002030003201 |1      |Contact   |1    |
|10002030003202 |2      |Accompany |2    |
|10002030003202 |2      |Contact   |0    |
|10002030003203 |2      |Accompany |3    |
|10002030003203 |2      |Contact   |0    |
]
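
The plots on the following slides use a summary table `plotnetL` with a proportion `pct` and a text label `lbl`, whose construction is not shown. One way such a table could be built from long data like `networkL` is sketched below in base R. The toy data and the exact definitions of `pct` and `lbl` are assumptions, not the original code.

```r
# Toy long-format data shaped like networkL (values are made up)
toyL <- data.frame(
  Gender = c("1", "1", "1", "1", "2", "2", "2", "2"),
  Net    = c("Accompany", "Accompany", "Contact", "Contact",
             "Accompany", "Accompany", "Contact", "Contact"),
  Size   = c("1", "2", "0", "0", "2", "2", "0", "1")
)

# Count each Gender x Net x Size combination
counts <- as.data.frame(table(toyL), responseName = "n")

# Proportion of each Size within a Gender x Net cell
counts$pct <- with(counts, n / ave(n, Gender, Net, FUN = sum))

# Percent label for geom_text()
counts$lbl <- paste0(round(100 * counts$pct), "%")

counts[counts$n > 0, ]
```

Computing the proportion within each Gender by Net cell makes the bars for men and women comparable even though the two groups differ in size.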

---

### bar plot

.pull-left[

```r
net.bar<-ggplot(plotnetL, 
       aes(x = factor(Gender),
           y = pct,
           fill = factor(Size))) + 
  geom_bar(stat = "identity", 
           position = "fill") +
  scale_y_continuous(breaks = seq(0, 1, .2)) +
  geom_text(aes(label = lbl),
            size = 4, 
            position = position_stack(vjust = 0.5)) +
  scale_fill_brewer(palette = "Set2") +
  facet_wrap(~Net)+
  labs(y = "Percentage", fill = "Social Network Size", x = "Gender", title = "Social Network Size by Gender") +
  theme_minimal()
```

- Differences in size are shown as colored areas, which are hard to compare
- Uses a lot of ink (data-ink ratio problem)
]

.pull-right[
![](Project0103_files/figure-html/unnamed-chunk-25-1.png)

]
---

### dot plot

.pull-left[

```r
net.dot2<-ggplot(plotnetL, aes(pct, NetSize)) +
        geom_line(aes(group = NetSize),size = 2,color="grey") +
        geom_point(aes(color = Gender),size = 5)+
  labs(y = "Network size", x="Percentage")+ theme_minimal()
```

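- Line segments show the size of the difference
- Each size level is shown separately with points
- Looks somewhat scattered
- Because the differences are small, the x axis only spans 0-0.4, which may be slightly misleading?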
]

.pull-right[
![](Project0103_files/figure-html/unnamed-chunk-27-1.png)
]

---

### dot plot (segment by gender)
.pull-left[

```r
net.dot<-ggplot(plotnetL,aes(x=pct,
                    y=GNetSize, color = Gender))+
  geom_segment(aes(xend=0, yend=GNetSize),color="grey", size=2)+ 
  geom_point(size = 5) +
  geom_text(aes(label = pct),size = 5, color="black", nudge_x = 0.1) +
  scale_x_continuous(limits = c(0, 1), breaks =seq(0, 1, .2))+
  scale_color_hue(direction = -1)+
  ggtitle("") +
  labs(x = "Percent", y = "")+
  theme_minimal()+
  theme(text = element_text(family = "Songti SC"))+
  facet_grid(Net~. , space = 'free_y', scales = 'free_y', switch = 'y') +
  theme(panel.spacing.y = unit(2,"line")) +
  theme(strip.placement = 'outside',
        strip.background.y = element_blank())+
  guides(color= guide_legend(title="Gender",reverse = F))
```

- facet_grid by network type, with one segment per gender
- the y axis carries two levels of labels
]

.pull-right[
![](Project0103_files/figure-html/unnamed-chunk-29-1.png)
]

---
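### Demographic (Pretest)

- Four questions

Variable | Response
----------------------|-------------
  Your gender | 1 = male; 2 = female
  In which Minguo (ROC) year were you born? | Minguo year
  Your highest level of education | 0 = illiterate; 1 = no formal education but literate; 2 = elementary school; 3 = junior high school; 4 = senior high or vocational school; 5 = junior college; 6 = university; 7 = graduate school or above; 8 = other
  Which type of retirement do you have? | 1 = public insurance (all military, civil service and teaching staff); 2 = labor insurance (workplace or union); 3 = state-owned enterprise retirement; 4 = private school insurance; 5 = other

- Data management

  - Convert Minguo birth year to age (continuous)

  - Group with case_when

  - tbl_summary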
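
The age conversion relies on the Minguo calendar, where Minguo year = Gregorian year - 1911, so survey year 2015 is Minguo 104. A short sketch of the conversion and of a two-level grouping in base R follows. The birth years and education codes are made-up examples, and `ifelse()` stands in for `case_when()` so that the snippet needs no packages.

```r
# Hypothetical Minguo birth years and education codes (0-8)
birth_minguo <- c(40, 50, 0)   # 0 is treated as missing below
edu          <- c(2, 6, 8)

# A birth year of 0 is not valid, so set it to NA before converting
birth_minguo[birth_minguo == 0] <- NA

# Age in survey year 2015 (Minguo 104)
survey_minguo <- 2015 - 1911
age <- survey_minguo - birth_minguo

# Two education groups; code 8 ("other") falls in neither and stays NA
cEdu <- ifelse(edu %in% 0:3, "Junior",
        ifelse(edu %in% 4:7, "Senior", NA))

data.frame(birth_minguo, age, edu, cEdu)
```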

---
### Demographic (Pretest)

.pull-left[

```r
library(dplyr)
library(gtsummary)
# Select and rename the demographic variables
info<-presub |> dplyr::select(c(ID,a1,a2a,a6,a8)) |> as.data.frame()
names(info) <-c("ID","gender","ageY","edu","RType")
# A birth year of 0 is invalid; note that this sets the whole row to NA
info[info$ageY=="0",]<-NA
# Age in Minguo year 104 (2015), computed from the Minguo birth year
info$age <- 104-info$ageY
# Recode education, insurance type and gender
info<-info |> mutate(
  cEdu= case_when(edu %in% c(0:3)~ "Junior", 
                  edu %in% c(4:7)~ "Senior"),
  cRType= case_when(RType == 1 ~ "Public",
                    RType == 2 ~ "Labor"),
  Gender= case_when(gender == 1 ~ "Male",
                    gender == 2 ~ "Female"))
cinfo <-info |> dplyr::select(-c(RType,ageY,gender))
# Summary table by gender (ID and edu are dropped first)
tbl_info <- cinfo[,-c(1,2)] |>
  tbl_summary(by = Gender, missing = "no",
              label = list(age ~ "Age", cEdu ~ "Education", cRType ~ "Insurance type"),
              statistic = list(all_continuous() ~ "{mean} ({sd})"))
```
]

.pull-right[
|Characteristic | **Female**, N = 1,380<sup>1</sup> | **Male**, N = 1,760<sup>1</sup> |
|:--------------|:---------------------------------:|:-------------------------------:|
|Age            | 63.9 (5.4)  | 65.0 (5.6)  |
|Education      |             |             |
|&nbsp;&nbsp;Junior | 772 (56%)   | 707 (40%)   |
|&nbsp;&nbsp;Senior | 608 (44%)   | 1,050 (60%) |
|Insurance type |             |             |
|&nbsp;&nbsp;Labor  | 1,091 (80%) | 1,220 (70%) |
|&nbsp;&nbsp;Public | 275 (20%)   | 511 (30%)   |

<sup>1</sup> Mean (SD); n (%)
]

---

### cognitive function (SLUMS)

.pull-left[
<img src="SLUMS.png" width="90%" style="display: block; margin: auto auto auto 0;" />

]

.pull-right[

- **Data management issue**

  - replace values and sum the cognitive score

- **Graphics**

  - Demographic variables

  - Between

  - Within
  
]

---

### cognitive function (SLUMS)

.pull-left[

```r
cog <-presub |> dplyr::select(c(1,11:28))
cogpost <-postsub|> dplyr::select(c(1,5:22))
summary(cog)
```

- z7a is not scored

- z4b, z8a, z8b, z10a, z10b, z10c, z10d: a correct answer scores 2 points

- All other items: a correct answer scores 1 point
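
The scoring rule above can be written as a weight per item. The sketch below assumes the items are coded 0/1 (wrong/correct) before weighting, and uses a made-up three-item data frame rather than the real SLUMS columns.

```r
# Toy responses: 1 = correct, 0 = wrong (hypothetical values)
toy <- data.frame(z1 = c(1, 0, 1), z4b = c(1, 1, 0), z8a = c(1, 0, NA))

# Points awarded for a correct answer on each item
weights <- c(z1 = 1, z4b = 2, z8a = 2)

# Multiply each column by its weight, then sum across items
scored <- sweep(as.matrix(toy), 2, weights[names(toy)], `*`)
toy$score <- rowSums(scored)   # NA if any item is missing

toy
```

Keeping the weights in one named vector makes the rule easy to check against the scale and avoids overwriting the raw item columns.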

]

.pull-right[

```
tibble [3,141 x 19] (S3: tbl_df/tbl/data.frame)
 $ ID  : chr [1:3141] "10002030003201" "10002030003202" "10002030003203" "10002030003204" ...
 $ z1  : num [1:3141] 1 0 1 1 1 1 1 1 1 1 ...
 $ z2  : num [1:3141] 1 0 1 1 1 1 0 1 0 1 ...
 $ z3  : num [1:3141] 1 1 1 1 1 1 1 1 1 1 ...
 $ z4a : num [1:3141] 1 0 1 1 1 1 0 1 1 1 ...
 $ z4b : num [1:3141] 1 0 1 1 1 1 0 1 1 1 ...
 $ z5  : num [1:3141] 1 2 1 2 1 1 NA 3 3 2 ...
 $ z6  : num [1:3141] 2 2 2 4 4 4 NA 4 5 5 ...
 $ z7a : num [1:3141] 0 NA 0 0 0 0 0 0 0 0 ...
 $ z7b : num [1:3141] 0 NA 0 1 0 1 1 1 1 1 ...
 $ z7c : num [1:3141] 0 NA 0 0 0 0 0 0 0 1 ...
 $ z8a : num [1:3141] 1 0 1 1 1 0 0 0 1 1 ...
 $ z8b : num [1:3141] 0 1 0 0 1 0 0 0 1 0 ...
 $ z9a : num [1:3141] 1 0 1 1 0 1 0 1 1 1 ...
 $ z9b : num [1:3141] 0 1 1 1 1 1 1 1 1 1 ...
 $ z10a: num [1:3141] 1 0 1 1 1 1 1 1 1 1 ...
 $ z10b: num [1:3141] 1 0 0 1 1 0 0 1 0 1 ...
 $ z10c: num [1:3141] 0 0 1 0 1 0 0 1 0 1 ...
 $ z10d: num [1:3141] 0 0 1 1 0 0 0 1 0 0 ...
```
]

---

### management (wide form)

.pull-left[

```r
# z7a is not scored, so drop it (column 9)
cog<-cog[,-9]
# Two-point items: recode a correct answer (1) to a score of 2
cogsub2<-sapply(cog[,c("z4b","z8a","z8b","z10a","z10b","z10c","z10d")], function(x){
  replace(x , x %in% 1, 2)}) |> as.data.frame()
# Write the recoded columns back into the full data
dtacog<-cog
dtacog[names(cogsub2)] <- cogsub2
# Total score over the 17 scored items (NA if any item is missing)
dtacog$Pre_Score <- rowSums(dtacog[,2:18])
# Keep only ID and total score
mercog<-dtacog[,c(1,19)]
```

```r
# Merge post-test scores, pre-test scores and demographics by ID
Tdtacog <- left_join(mercogpost, mercog, by='ID') %>%
                left_join(., cinfo, by='ID')
# Show selected columns (ID omitted)
Tdtacog[,c(8,5,4,6,7,2,3)] |>head() |>knitr::kable()
```
]
.pull-right[

|Gender | age| edu|cEdu   |cRType | Post_Score| Pre_Score|
|:------|---:|---:|:------|:------|----------:|---------:|
|Male   |  66|   3|Junior |Labor  |         22|        17|
|Male   |  65|   6|Senior |Public |         21|        25|
|Male   |  59|   2|Junior |Labor  |         30|        24|
|Male   |  66|   4|Senior |Labor  |         25|        29|
|Male   |  63|   2|Junior |Labor  |         21|        20|
|Male   |  73|   6|Senior |Public |         28|        NA|
]
---

### cognitive score and insurance type

.pull-left[

```r
# Mean, SD, SE and 95% CI half-width of the pre-test score by insurance type and gender
plottype <- PREdtacog |>
  filter(!is.na(cRType)) |>
  group_by(cRType, Gender) |>
  summarize(n = sum(!is.na(Pre_Score)),   # count only non-missing scores
         mean = mean(Pre_Score, na.rm = TRUE),
         sd = sd(Pre_Score, na.rm = TRUE),
         se = sd / sqrt(n),
         ci = qt(0.975, df = n - 1) * se)
```

```r
pd <- position_dodge(0.2)
MStype<-ggplot(plottype, 
       aes(x = cRType, y = mean, color = Gender)) +
  geom_point(size = 4, position = pd) +
  geom_line(aes(group = Gender), position = pd, size = 1) +  # group by gender so points are connected across a discrete x axis
  theme_minimal() +
  scale_color_brewer(palette="Set1") +
  geom_errorbar(aes(ymin = mean - se, 
                    ymax = mean + se), 
                width = .1, position = pd)+
  labs(y = "Cognitive Score (Mean)", x="Insurance type")
```
]
.pull-right[
![](Project0103_files/figure-html/unnamed-chunk-42-1.png)
]
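
The summary behind this plot is a mean with a standard error and a t-based 95% interval. As a sketch of the arithmetic only (the helper name `mean_se_ci` and the input values are hypothetical), the same quantities can be computed in base R, counting only non-missing scores in `n`:

```r
# Mean, SD, SE and 95% CI half-width for one group of scores
mean_se_ci <- function(x, level = 0.95) {
  x  <- x[!is.na(x)]            # drop missing scores first
  n  <- length(x)
  m  <- mean(x)
  s  <- sd(x)
  se <- s / sqrt(n)
  ci <- qt(1 - (1 - level) / 2, df = n - 1) * se
  c(n = n, mean = m, sd = s, se = se, ci = ci)
}

# Made-up scores for one insurance-type-by-gender cell
res <- mean_se_ci(c(22, 25, 19, NA, 28, 24))
res
```

The error bars in the plot use mean ± SE; replacing `se` with `ci` in `geom_errorbar()` would show the 95% interval instead.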

---

### cognitive score and education

.pull-left[

```r
pd <- position_dodge(0.2)
MSedu<-ggplot(plotedu, 
       aes(x = edu, 
           y = mean, 
           color = Gender)) +
  geom_point(size = 4, position = pd) +
  geom_line(aes(group=Gender),position = pd,linetype = "dashed") +
  theme_minimal() +
  scale_color_brewer(palette="Set1") +
  geom_errorbar(aes(ymin = mean - se, 
                    ymax = mean + se), 
                width = .1, position = pd)+
  labs(y = "Cognitive Score (Mean)", x="Education level")+
  theme(axis.text.x = element_text(angle = 45, 
                                   hjust = 1))
```
]
.pull-right[
![](Project0103_files/figure-html/unnamed-chunk-45-1.png)
]

---

### management (long form)
.pull-left[

```r
# Pre-test age is the recorded age; post-test age is 3 years later
Tdtacog2<-Tdtacog |>mutate(Pre_age=age, Post_age=Pre_age+3) 
Tdtacog2<-Tdtacog2[,-5]

# Long form for scores: one row per ID and time point
TscoreL1 <-Tdtacog2|> pivot_longer(cols = c("Pre_Score","Post_Score"),
                            names_to = "Time",
                            values_to = "Score")  
TscoreL1 <-TscoreL1[c(1,5,8,9)]
TscoreL1$Time <-fct_collapse(TscoreL1$Time, "Posttest" = "Post_Score","Pretest" = "Pre_Score") 
# Long form for age
TscoreL2 <-Tdtacog2|> pivot_longer(cols = c("Pre_age","Post_age"),
                            names_to = "Time",
                            values_to = "Age") 
TscoreL2 <-TscoreL2[c(1,5,8,9)]

TscoreL2$Time <-fct_collapse(TscoreL2$Time, "Posttest" = "Post_age","Pretest" = "Pre_age")
# Combine scores and ages by ID and time point
TscoreL<-merge(TscoreL1,TscoreL2,by=c("ID","Time"))
TscoreL|>head() |>knitr::kable()
```
]

.pull-right[

|ID             |Time     |Gender | Score|cEdu   | Age|
|:--------------|:--------|:------|-----:|:------|---:|
|10002030003202 |Posttest |Female |    NA|Junior |  77|
|10002030003202 |Pretest  |Female |    NA|Junior |  74|
|10002030003203 |Posttest |Female |    21|Junior |  58|
|10002030003203 |Pretest  |Female |    19|Junior |  55|
|10002030003204 |Posttest |Male   |    26|Junior |  63|
|10002030003204 |Pretest  |Male   |    23|Junior |  60|
]
---

### cognitive score and gender

.pull-left[

- jitter plot

![](Project0103_files/figure-html/unnamed-chunk-48-1.png)
]

.pull-right[
- box plot
![](Project0103_files/figure-html/unnamed-chunk-49-1.png)
]

---

### Score change stratified by gender
.pull-left[

```r
set.seed(123)
chosen <- Tdtacog2 |> group_by(Gender) %>%  sample_n(200)

chosen$diff <- chosen$Post_Score-chosen$Pre_Score
chosen$Change <- ifelse(chosen$diff>0, "better",
              ifelse(chosen$diff==0, "same",
              ifelse(chosen$diff<0, "worse", NA)))  
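chosen <- chosen |>filter(!is.na(Change))
# chosenlist is not defined on the slide; assumed here to be chosen split by Change
chosenlist <- split(chosen, chosen$Change)

# 200 sampled per gender (400 in total) before dropping rows with a missing change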
with(chosen, plot(Pre_age,Pre_Score, bty="n", xlim = c(50,80), ylim=c(0,30),xlab = "Age (year)",ylab = "Score",xaxs = "r", yaxs ="r",
     col = "dark grey"))
grid(col="thistle")
with(chosenlist[["worse"]], arrows(Pre_age, Pre_Score, 
                  Post_age, Post_Score,length=0.1, col = "pink", lwd=1))
with(chosenlist[["better"]], arrows(Pre_age, Pre_Score, 
                  Post_age, Post_Score,length=0.1, col = "skyblue", lwd=1))
with(chosenlist[["same"]], arrows(Pre_age, Pre_Score, 
                  Post_age, Post_Score,length=0.1, col = "orange", lwd=1))
```
]

.pull-right[
![](Project0103_files/figure-html/unnamed-chunk-51-1.png)

]
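
The nested `ifelse()` that labels each change can also be expressed through the sign of the difference. This is an alternative sketch, not the code used for the figure, with made-up scores:

```r
# Hypothetical pre- and post-test scores
pre  <- c(20, 25, 18, NA, 30)
post <- c(24, 25, 15, 22, 28)

# sign() returns -1, 0 or 1 (NA stays NA), which maps onto the three labels
change <- factor(sign(post - pre),
                 levels = c(-1, 0, 1),
                 labels = c("worse", "same", "better"))

table(change, useNA = "ifany")
```

A factor with fixed levels also keeps the three categories in a stable order for `split()` and for facets.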
---

### Base R approach
.pull-left[

![](Project0103_files/figure-html/unnamed-chunk-53-1.png)
]

.pull-right[
![](Project0103_files/figure-html/unnamed-chunk-54-1.png)
]

---

### Using {ggplot2}: facet_grid(Change ~ Gender)

.pull-left[

```r
chosen.s<-chosen[,-c(4:6,8:10)] %>% melt(id.var= c("ID","Gender","Change")) 
chosen.s<-chosen.s |>mutate(Time=as.factor(gsub("_Score","",variable)))
chosen.s$Time<-relevel(chosen.s$Time, "Pre")
#
pathplot<-ggplot(chosen.s,
       aes(x=Time, y=value, color=Change, group=ID)) + 
  geom_jitter(alpha = 0.2,
              size = 1.5)+
  geom_path(arrow=arrow(ends = "first", length=unit(0.1,"in")), show.legend=FALSE) +
  facet_grid(Change ~ Gender) +
  theme_tufte() +
  theme(strip.text.x=element_text(size=15)) +
  guides(color=guide_legend(reverse=F)) +
  scale_y_continuous(limits=c(0,30)) +
  labs(x="Time", y="Cognitive Score")
```

]

.pull-right[
![](Project0103_files/figure-html/unnamed-chunk-56-1.png)
]

---

### conclusion

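- Data management strategy?

  - With many X variables (x1, x2, x3), there is a choice between one big data frame (58 columns) and two small data frames (28 columns each)
    - Big data frame: objects sometimes get overwritten during processing (apply, reshape, ...)
    - Small data frames: variable naming often gets confusing (I have not yet found a good naming convention)
  
--

- Package vs. doing it yourself

  - Packages are convenient and quickly produce the desired result, but they are less flexible for follow-up work or adjustments. (I felt I was constantly reshaping the data into the format a plot requires.)

- Choice of graphics
  
  - For categorical-by-categorical data, what avoids the data-ink ratio problem besides dot plots and density plots? (Line plots do not seem suitable for categories either?)
  
--

**Reflections**

- Picture the plot first → work out the matching data structure → decide the wrangling strategy (reshape first or add variables first?) → make good use of functions (to avoid going in circles and wasting effort)
- Stay faithful to the distribution of the raw data to avoid drawing wrong conclusions (cognitive score = 0? change?)
- After taking the data management course, I think I will start giving real thought to plotting...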