2023 Thick Data and Meaning Mining: Hands-on Exercise

About the Data Source

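The data for this exercise come from two waves of an online survey provided by smilepoll.tw. The two waves were merged by respondent ID into panel data (定群追蹤資料), so the merged file (dataBD) contains 579 respondents. The first wave (code B) is the "Unification/Independence Opinion Survey" (統獨意見大調查), fielded from 2018-10-22 to 2018-11-13, N = 886 (completion rate 86.7%). The second wave (code D) is "Mood Notes after the 2018 Local Elections" (2018地方選舉選後心情札記), fielded from 2019-01-21 to 2019-02-19, N = 1,297 (completion rate 78%). Variable names carry the wave code as a prefix: B39r comes from wave B, D25r from wave D.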

Project Setup

Create a new project (for example, an RStudio project) and put both your script file and the data file (dataBD.rda) in the project folder.
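A minimal sketch of loading the data once the project is open. It assumes the working directory is the project folder and that dataBD.rda holds an object named dataBD (the name the later code uses). `load()` returns the names of the objects it restored, which is a quick way to confirm this.

```r
# Minimal sketch: load the merged panel data into the session.
# Assumes the working directory is the project folder and that
# dataBD.rda contains an object named dataBD (used by later code).
data_file <- "dataBD.rda"

if (file.exists(data_file)) {
  loaded <- load(data_file)      # load() returns the names of the restored objects
  print(loaded)
  if (exists("dataBD")) {
    print(dim(dataBD))           # rows = respondents, columns = variables
  }
} else {
  message("Cannot find ", data_file, " in ", getwd(),
          "; check that it sits in the project folder.")
}
```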

Reading In the Data and Inspecting Variables

## Gender (x) <categorical>
## # total N=579 valid N=576 mean=0.43 sd=0.50
## 
## Value |  Label |   N | Raw % | Valid % | Cum. %
## -----------------------------------------------
##     0 | Female | 329 | 56.82 |   57.12 |  57.12
##     1 |   Male | 247 | 42.66 |   42.88 | 100.00
##  <NA> |   <NA> |   3 |  0.52 |    <NA> |   <NA>

## x <numeric> 
## # total N=579 valid N=579 mean=34.95 sd=11.40
## 
## Value |  N | Raw % | Valid % | Cum. %
## -------------------------------------
##    15 |  2 |  0.35 |    0.35 |   0.35
##    17 |  4 |  0.69 |    0.69 |   1.04
##    18 |  4 |  0.69 |    0.69 |   1.73
##    19 | 12 |  2.07 |    2.07 |   3.80
##    20 | 11 |  1.90 |    1.90 |   5.70
##    21 | 13 |  2.25 |    2.25 |   7.94
##    22 | 13 |  2.25 |    2.25 |  10.19
##    23 | 21 |  3.63 |    3.63 |  13.82
##    24 | 20 |  3.45 |    3.45 |  17.27
##    25 | 23 |  3.97 |    3.97 |  21.24
##    26 | 29 |  5.01 |    5.01 |  26.25
##    27 | 17 |  2.94 |    2.94 |  29.19
##    28 | 21 |  3.63 |    3.63 |  32.82
##    29 | 13 |  2.25 |    2.25 |  35.06
##    30 | 26 |  4.49 |    4.49 |  39.55
##    31 | 25 |  4.32 |    4.32 |  43.87
##    32 | 17 |  2.94 |    2.94 |  46.80
##    33 | 31 |  5.35 |    5.35 |  52.16
##    34 | 22 |  3.80 |    3.80 |  55.96
##    35 | 25 |  4.32 |    4.32 |  60.28
##    36 | 24 |  4.15 |    4.15 |  64.42
##    37 | 15 |  2.59 |    2.59 |  67.01
##    38 | 14 |  2.42 |    2.42 |  69.43
##    39 | 13 |  2.25 |    2.25 |  71.68
##    40 | 11 |  1.90 |    1.90 |  73.58
##    41 | 12 |  2.07 |    2.07 |  75.65
##    42 | 11 |  1.90 |    1.90 |  77.55
##    43 |  9 |  1.55 |    1.55 |  79.10
##    44 |  6 |  1.04 |    1.04 |  80.14
##    45 | 16 |  2.76 |    2.76 |  82.90
##    46 |  8 |  1.38 |    1.38 |  84.28
##    47 | 12 |  2.07 |    2.07 |  86.36
##    48 |  9 |  1.55 |    1.55 |  87.91
##    49 |  6 |  1.04 |    1.04 |  88.95
##    50 |  6 |  1.04 |    1.04 |  89.98
##    51 |  4 |  0.69 |    0.69 |  90.67
##    52 |  7 |  1.21 |    1.21 |  91.88
##    53 |  5 |  0.86 |    0.86 |  92.75
##    54 |  5 |  0.86 |    0.86 |  93.61
##    55 |  4 |  0.69 |    0.69 |  94.30
##    56 |  2 |  0.35 |    0.35 |  94.65
##    57 |  4 |  0.69 |    0.69 |  95.34
##    58 |  4 |  0.69 |    0.69 |  96.03
##    60 |  2 |  0.35 |    0.35 |  96.37
##    61 |  1 |  0.17 |    0.17 |  96.55
##    62 |  3 |  0.52 |    0.52 |  97.06
##    63 |  8 |  1.38 |    1.38 |  98.45
##    64 |  1 |  0.17 |    0.17 |  98.62
##    66 |  3 |  0.52 |    0.52 |  99.14
##    68 |  1 |  0.17 |    0.17 |  99.31
##    74 |  1 |  0.17 |    0.17 |  99.48
##    78 |  1 |  0.17 |    0.17 |  99.65
##    79 |  1 |  0.17 |    0.17 |  99.83
##    90 |  1 |  0.17 |    0.17 | 100.00
##  <NA> |  0 |  0.00 |    <NA> |   <NA>

## Education level (x) <categorical> 
## # total N=579 valid N=572 mean=2.03 sd=0.53
## 
## Value |              Label |   N | Raw % | Valid % | Cum. %
## -----------------------------------------------------------
##     1 |      Below college |  72 | 12.44 |   12.59 |  12.59
##     2 | College/university | 408 | 70.47 |   71.33 |  83.92
##     3 |    Graduate school |  92 | 15.89 |   16.08 | 100.00
##  <NA> |               <NA> |   7 |  1.21 |    <NA> |   <NA>

## Place of residence (x) <categorical> 
## # total N=579 valid N=558 mean=7.98 sd=5.58
## 
## Value |                            Label |   N | Raw % | Valid % | Cum. %
## -------------------------------------------------------------------------
##     1 |                      Taipei City |  56 |  9.67 |   10.04 |  10.04
##     2 |                  New Taipei City | 120 | 20.73 |   21.51 |  31.54
##     3 |                     Keelung City |   5 |  0.86 |    0.90 |  32.44
##     4 |                     Taoyuan City |  42 |  7.25 |    7.53 |  39.96
##     5 |                     Hsinchu City |  13 |  2.25 |    2.33 |  42.29
##     6 |                   Hsinchu County |  12 |  2.07 |    2.15 |  44.44
##     7 |                    Miaoli County |   8 |  1.38 |    1.43 |  45.88
##     8 |                    Taichung City |  73 | 12.61 |   13.08 |  58.96
##     9 |                  Changhua County |  19 |  3.28 |    3.41 |  62.37
##    10 |                    Nantou County |   5 |  0.86 |    0.90 |  63.26
##    11 |                    Yunlin County |  18 |  3.11 |    3.23 |  66.49
##    12 |                      Chiayi City |   5 |  0.86 |    0.90 |  67.38
##    13 |                    Chiayi County |  13 |  2.25 |    2.33 |  69.71
##    14 |                      Tainan City |  45 |  7.77 |    8.06 |  77.78
##    15 |                   Kaohsiung City |  99 | 17.10 |   17.74 |  95.52
##    16 |                  Pingtung County |  14 |  2.42 |    2.51 |  98.03
##    17 |                   Taitung County |   0 |  0.00 |    0.00 |  98.03
##    18 |                   Hualien County |   5 |  0.86 |    0.90 |  98.92
##    19 |                     Yilan County |   3 |  0.52 |    0.54 |  99.46
##    20 |                    Penghu County |   2 |  0.35 |    0.36 |  99.82
##    21 |                    Kinmen County |   1 |  0.17 |    0.18 | 100.00
##    22 |                Lienchiang County |   0 |  0.00 |    0.00 | 100.00
##    23 | Mainland China (incl. HK, Macau) |   0 |  0.00 |    0.00 | 100.00
##    24 |                            Other |   0 |  0.00 |    0.00 | 100.00
##  <NA> |                             <NA> |  21 |  3.63 |    <NA> |   <NA>
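The tables above appear to come from `sjmisc::frq()`; the chunk that produced them is not shown. The base-R sketch below is an illustration, not the source's code: it rebuilds the same columns (N, Raw %, Valid %, Cum. %) so you can see how each is computed, using the education counts from the table above. It also guards against one pitfall: when category codes are stored as text, sorting puts "10" before "2", so the running total must be taken in numeric order.

```r
# Base-R sketch of a frequency table with the same columns as the output above.
# freq_table() is a hypothetical helper written for this example.
freq_table <- function(x) {
  n_total <- length(x)                   # includes missing values
  counts  <- table(x)                    # table() drops NA by default
  vals    <- as.numeric(names(counts))
  ord     <- order(vals)                 # numeric order, even if codes are text
  n       <- as.vector(counts)[ord]
  data.frame(
    value     = vals[ord],
    n         = n,
    raw_pct   = round(100 * n / n_total, 2),        # share of all respondents
    valid_pct = round(100 * n / sum(n), 2),         # share of non-missing answers
    cum_pct   = round(100 * cumsum(n) / sum(n), 2)  # running total of valid_pct
  )
}

# Education codes rebuilt from the counts in the table above (7 missing)
education <- c(rep(1, 72), rep(2, 408), rep(3, 92), rep(NA, 7))
print(freq_table(education))

# Why numeric ordering matters when codes are stored as text:
print(sort(c("1", "2", "10")))   # "1" "10" "2"
```

If your own variables are stored as character codes, converting them (or ordering as above) before accumulating keeps the Cum. % column in the same order as the rows.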

Based on the information in your charts, write down your first impressions of and thoughts on this data here:

About this data, I found that...

Variable Selection and MCA (Multiple Correspondence Analysis)

## [1] 564

##  [1] "Gender"  "college" "B23r"    "B25r"    "B29r"    "B33r"    "B39r"   
##  [8] "B42r"    "B46r"    "B47r"    "B51r"    "B53r"    "B54r"    "B56r"   
## [15] "B57r"    "D25r"    "D52r"    "D58r"    "D61r"    "D81r"    "D99r"   
## [22] "D146r"   "D147r"
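The code that built `resBD` is not shown in this document. Judging by the arguments passed to `plot()` below (`invisible`, `selectMod`, `col.quali.sup`, `autoLab`), `resBD` appears to be an MCA result from the FactoMineR package. The sketch below is one hypothetical way to build it from the 23 variables listed above; dropping incomplete cases and converting every column to a factor are assumptions, not the source's confirmed steps.

```r
# Hypothetical sketch of how resBD might have been built (not the source's code).
# Variable names are copied from the list printed above.
mca_vars <- c("Gender", "college",
              "B23r", "B25r", "B29r", "B33r", "B39r", "B42r", "B46r",
              "B47r", "B51r", "B53r", "B54r", "B56r", "B57r",
              "D25r", "D52r", "D58r", "D61r", "D81r", "D99r",
              "D146r", "D147r")

if (exists("dataBD") && requireNamespace("FactoMineR", quietly = TRUE)) {
  mca_data <- as.data.frame(dataBD)[, mca_vars]
  mca_data <- mca_data[complete.cases(mca_data), ]   # assumption: drop incomplete cases
  mca_data[] <- lapply(mca_data, factor)              # MCA expects categorical columns
  print(nrow(mca_data))                               # respondents kept for the MCA
  resBD <- FactoMineR::MCA(mca_data, graph = FALSE)
  print(head(resBD$eig))   # eigenvalue and % of inertia per dimension
}
```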

Exploration

# The 10 response categories best represented on dimensions 1 and 2 (highest cos2)
plot(resBD, axes=c(1, 2), new.plot=TRUE, 
     col.var="black", col.ind="black", col.ind.sup="black",
     col.quali.sup="darkgreen", col.quanti.sup="blue",
     label=c("var"), cex=0.7, 
     selectMod = "cos2 10",   # Try raising 10 to a larger number to show more categories
     invisible=c("ind", "quali.sup"), 
     autoLab = "yes",
     xlim=c(-1, 1.5), ylim=c(-1.3, 1.5),
     title="")

## Warning: Removed 1 rows containing missing values (`geom_point()`).

## Warning: Removed 1 rows containing missing values (`geom_text_repel()`).
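In FactoMineR's plot method, `selectMod = "cos2 10"` keeps only the 10 categories best represented on the plotted plane, judged by their squared cosines (cos2) summed over the two axes. The hypothetical helper below lists those categories as text, which helps when labels overlap on the plot. It assumes `resBD` is a FactoMineR MCA result, whose category cos2 values are stored in `resBD$var$cos2`.

```r
# Hypothetical helper: rank categories by cos2 summed over the chosen axes.
top_cos2 <- function(cos2_mat, axes = c(1, 2), n = 10) {
  score <- rowSums(cos2_mat[, axes, drop = FALSE])
  head(sort(score, decreasing = TRUE), n)
}

# Assumes resBD is a FactoMineR MCA object (see the plot calls in this section)
if (exists("resBD")) {
  print(round(top_cos2(resBD$var$cos2, axes = c(1, 2), n = 10), 3))
}
```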

# Respondent map: where individual respondents fall on dimensions 1 and 2
plot(resBD, axes=c(1, 2), new.plot=TRUE,
     col.var="red", col.ind="brown", col.ind.sup="black",
     col.quali.sup="darkgreen", col.quanti.sup="blue",
     label=c("var"), cex=0.8,
     selectMod = "cos2",
     invisible=c("var", "quali.sup"),
     xlim=c(-1, 1.5),
     title="")

Bold Hypotheses

Now, use the plots to find survey items that look interesting and appear to be related, and write down your hypotheses.

## Use chi-square tests to check whether variables that may be related are actually associated
library(sjPlot)
library(sjmisc)

sjt.xtab(dataBD$B39r, dataBD$B47r,    ## Replace these two variables with the ones you want to examine
         show.row.prc = TRUE, # show row percentages
         show.col.prc = TRUE  # show column percentages
)

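Cross-tabulation of B39r (rows) by B47r (columns). Each cell shows N (row %, column %).

B39r \ B47r | 1                  | 2                  | 3                  | Total
----------------------------------------------------------------------------------------------
1           | 137 (67.8%, 60.6%) | 47 (23.3%, 19.2%)  | 18 (8.9%, 16.7%)   | 202 (100%, 34.9%)
2           | 55 (25.0%, 24.3%)  | 142 (64.5%, 58.0%) | 23 (10.5%, 21.3%)  | 220 (100%, 38.0%)
3           | 34 (21.7%, 15.0%)  | 56 (35.7%, 22.9%)  | 67 (42.7%, 62.0%)  | 157 (100%, 27.1%)
Total       | 226 (39.0%, 100%)  | 245 (42.3%, 100%)  | 108 (18.7%, 100%)  | 579 (100%, 100%)

χ² = 177.566 · df = 4 · Cramér's V = 0.392 · p < 0.001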
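The statistics under the table can be checked by hand. With r = 3 rows and c = 3 columns the degrees of freedom follow directly, and Cramér's V rescales χ² by the sample size N = 579 and the smaller table dimension k = min(r, c) = 3, so that it runs from 0 (no association) to 1 (perfect association):

$$
\mathrm{df} = (r-1)(c-1) = (3-1)(3-1) = 4
$$

$$
V = \sqrt{\frac{\chi^2}{N\,(k-1)}} = \sqrt{\frac{177.566}{579 \times 2}} \approx \sqrt{0.1533} \approx 0.392
$$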

# To examine more pairs, copy and paste the block above, change the variables, and run it as many times as you like!
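Copy-pasting works, but with many candidate pairs a small loop is less error-prone. The sketch below swaps in base R's `chisq.test()` for `sjt.xtab()`, so it prints numbers rather than a formatted table, and computes Cramér's V as sqrt(χ² / (N (k − 1))), where k is the smaller of the number of rows and columns. `xtab_stats()` and `screen_pairs()` are hypothetical helper names, and the second pair in the usage line is only a placeholder.

```r
# Hypothetical helpers for screening many variable pairs at once.
xtab_stats <- function(x, y) {
  tab <- table(x, y)          # respondents missing either item are dropped
  chi <- chisq.test(tab)      # Pearson chi-square test of independence
  k   <- min(dim(tab))
  data.frame(
    n         = sum(tab),
    chi_sq    = unname(chi$statistic),
    df        = unname(chi$parameter),
    p_value   = chi$p.value,
    cramers_v = sqrt(unname(chi$statistic) / (sum(tab) * (k - 1)))
  )
}

screen_pairs <- function(df, pairs) {
  rows <- lapply(pairs, function(p) {
    cbind(var1 = p[1], var2 = p[2], xtab_stats(df[[p[1]]], df[[p[2]]]))
  })
  do.call(rbind, rows)
}

# Usage on the course data; the second pair is only a placeholder.
if (exists("dataBD")) {
  print(screen_pairs(dataBD, list(c("B39r", "B47r"), c("B23r", "D25r"))))
}
```

Note that `chisq.test()` applies a continuity correction to 2×2 tables by default and warns when expected counts are small; keep those warnings visible while screening.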

Next, write down the pairs of variables that, after your checks, are very likely related (up to three interesting pairs is enough).

Compile Your Own Report

Finally, use Knit to save your analysis as an HTML file and submit it to your instructor.
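Knitting is usually done with the Knit button in RStudio; the same can be done from the console with `rmarkdown::render()`. The file name below is a placeholder for your own .Rmd file. The `opts_chunk$set()` line is an optional suggestion, not part of the assignment: placed in the first (setup) chunk, it keeps package-loading messages and warnings, such as package-version and font warnings, out of the HTML.

```r
# Optional: put this in the first (setup) chunk of the .Rmd to keep
# messages and warnings out of the knitted HTML.
knitr::opts_chunk$set(message = FALSE, warning = FALSE)

# Knit from the console; "my_analysis.Rmd" is a placeholder file name.
rmd_file <- "my_analysis.Rmd"
if (file.exists(rmd_file)) {
  rmarkdown::render(rmd_file, output_format = "html_document")
}
```

Hiding warnings also hides useful ones (for example, small expected counts in a chi-square test), so consider switching them off only once your analysis is final.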