Titanic Data Analysis Project

Author

Miao Chien

This is a Quarto demo document.

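  • We continue with our old friend, the titanic dataset.
library(titanic)   # provides the titanic_train and titanic_test data frames
library(dplyr)     # data-manipulation verbs such as mutate()
library(magrittr)  # the %>% pipe

titanic::titanic_train %>% head()  # preview the first 6 rows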
  PassengerId Survived Pclass
1           1        0      3
2           2        1      1
3           3        1      3
4           4        1      1
5           5        0      3
6           6        0      3
                                                 Name    Sex Age SibSp Parch
1                             Braund, Mr. Owen Harris   male  22     1     0
2 Cumings, Mrs. John Bradley (Florence Briggs Thayer) female  38     1     0
3                              Heikkinen, Miss. Laina female  26     0     0
4        Futrelle, Mrs. Jacques Heath (Lily May Peel) female  35     1     0
5                            Allen, Mr. William Henry   male  35     0     0
6                                    Moran, Mr. James   male  NA     0     0
            Ticket    Fare Cabin Embarked
1        A/5 21171  7.2500              S
2         PC 17599 71.2833   C85        C
3 STON/O2. 3101282  7.9250              S
4           113803 53.1000  C123        S
5           373450  8.0500              S
6           330877  8.4583              Q
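The preview already shows two kinds of gaps: row 6 has Age = NA, while several Cabin cells print as blanks rather than NA. The blank Cabin cells are therefore empty strings, not NA. Below is a minimal base-R sketch that counts both kinds of gap. It re-enters a subset of the columns from the six preview rows by hand, so it runs without the titanic package. The names `preview`, `na_count` and `blank_count` are hypothetical; with the package installed you would pass head(titanic_train) or the full data frame instead.

```r
# Hypothetical `preview`: six rows re-entered from the head() output above
# (subset of columns), so this runs without the titanic package.
preview <- data.frame(
  PassengerId = 1:6,
  Survived    = c(0, 1, 1, 1, 0, 0),
  Pclass      = c(3, 1, 3, 1, 3, 3),
  Sex         = c("male", "female", "female", "female", "male", "male"),
  Age         = c(22, 38, 26, 35, 35, NA),
  Cabin       = c("", "C85", "", "C123", "", ""),
  Embarked    = c("S", "C", "S", "S", "S", "Q"),
  stringsAsFactors = FALSE
)

# Numeric gaps (e.g. Age) are stored as NA ...
na_count <- colSums(is.na(preview))

# ... while missing character values (e.g. Cabin) are empty strings.
blank_count <- sapply(preview, function(col) sum(col == "", na.rm = TRUE))

na_count
blank_count
```

Counting both forms matters because is.na() alone would report Cabin as complete.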

Generating Tables

knitr::kable(): display the data as a table

  • The titanic training set (titanic_train) has 891 rows, which is too long to display, so only the first 20 rows are shown.
library(knitr)
titanic_train %>% head(20) %>% kable()
| PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---:|---:|---:|:---|:---|---:|---:|---:|:---|---:|:---|:---|
| 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22 | 1 | 0 | A/5 21171 | 7.2500 | | S |
| 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | female | 38 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26 | 0 | 0 | STON/O2. 3101282 | 7.9250 | | S |
| 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 5 | 0 | 3 | Allen, Mr. William Henry | male | 35 | 0 | 0 | 373450 | 8.0500 | | S |
| 6 | 0 | 3 | Moran, Mr. James | male | NA | 0 | 0 | 330877 | 8.4583 | | Q |
| 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2 | 3 | 1 | 349909 | 21.0750 | | S |
| 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27 | 0 | 2 | 347742 | 11.1333 | | S |
| 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14 | 1 | 0 | 237736 | 30.0708 | | C |
| 11 | 1 | 3 | Sandstrom, Miss. Marguerite Rut | female | 4 | 1 | 1 | PP 9549 | 16.7000 | G6 | S |
| 12 | 1 | 1 | Bonnell, Miss. Elizabeth | female | 58 | 0 | 0 | 113783 | 26.5500 | C103 | S |
| 13 | 0 | 3 | Saundercock, Mr. William Henry | male | 20 | 0 | 0 | A/5. 2151 | 8.0500 | | S |
| 14 | 0 | 3 | Andersson, Mr. Anders Johan | male | 39 | 1 | 5 | 347082 | 31.2750 | | S |
| 15 | 0 | 3 | Vestrom, Miss. Hulda Amanda Adolfina | female | 14 | 0 | 0 | 350406 | 7.8542 | | S |
| 16 | 1 | 2 | Hewlett, Mrs. (Mary D Kingcome) | female | 55 | 0 | 0 | 248706 | 16.0000 | | S |
| 17 | 0 | 3 | Rice, Master. Eugene | male | 2 | 4 | 1 | 382652 | 29.1250 | | Q |
| 18 | 1 | 2 | Williams, Mr. Charles Eugene | male | NA | 0 | 0 | 244373 | 13.0000 | | S |
| 19 | 0 | 3 | Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele) | female | 31 | 1 | 0 | 345763 | 18.0000 | | S |
| 20 | 1 | 3 | Masselmani, Mrs. Fatima | female | NA | 0 | 0 | 2649 | 7.2250 | | C |

DT::datatable(): an interactive table (searchable, sortable, scrollable) for exploring the data more closely

# install.packages("DT")  # run once if DT is not installed
library(DT)
# More DT::datatable options are documented at: https://rstudio.github.io/DT/
datatable(titanic_train, extensions = 'Scroller', 
          options = list(scrollY = 400, scroller = TRUE))  # 400px-high scrolling view of all rows

Generating Charts

Creating a ggplot chart

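library(ggplot2)

# '黑體-繁 中黑' is a macOS font. On Windows it is missing and triggers
# "font family not found in Windows font database" warnings, so there we
# register Microsoft JhengHei (a Traditional Chinese font) under an alias.
if (.Platform$OS.type == "windows") {
  windowsFonts(JhengHei = windowsFont("Microsoft JhengHei"))
  cjk_family <- "JhengHei"
} else {
  cjk_family <- "黑體-繁 中黑"  # macOS font name
}

p <- titanic_train %>% 
  mutate(
    # Labels: 3rd class, 2nd class, 1st class (in Chinese)
    Pclass   = factor(Pclass, levels = c(3, 2, 1), labels = c("三等艙", "二等艙", "頭等艙")),
    Survived = factor(Survived)
  ) %>% 
  ggplot(aes(x = Pclass, fill = Survived)) +
  geom_bar(position = "fill") +  # 100% stacked bars: each bar sums to 1
  facet_wrap(~ Sex) +            # one panel per sex
  labs(
    title = "Survival Rate by Pclass and Sex",
    x = "Pclass (Passenger Class)",
    y = "Proportion of Passengers",
    fill = "Survived"
  ) +
  scale_fill_manual(values = c("#888", "red"), labels = c("Did not survive", "Survived")) +
  theme_minimal() +
  theme(text = element_text(family = cjk_family))  # font that can render the Chinese labels

p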
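With position = "fill", geom_bar() rescales every bar to a height of 1. Each red segment is therefore the share of survivors within one passenger class and sex. The base-R sketch below performs that calculation with aggregate(). It uses only the 20 rows listed in the kable() table above, re-entered by hand so it runs without titanic or ggplot2; `first20` and `survival_rate` are hypothetical names. With just 20 rows, the rates only illustrate the calculation and are not the full-dataset rates the chart shows.

```r
# Hypothetical `first20`: Survived, Pclass and Sex of the 20 rows in the
# kable() table above, re-entered by hand.
first20 <- data.frame(
  Survived = c(0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1),
  Pclass   = c(3, 1, 3, 1, 3, 3, 1, 3, 3, 2, 3, 1, 3, 3, 3, 2, 3, 2, 3, 3),
  Sex      = c("male", "female", "female", "female", "male", "male", "male",
               "male", "female", "female", "female", "female", "male", "male",
               "female", "female", "male", "male", "female", "female")
)

# The mean of a 0/1 variable is the share of survivors in each Pclass x Sex
# cell, i.e. the height of the "Survived" segment in each 100%-stacked bar.
survival_rate <- aggregate(Survived ~ Pclass + Sex, data = first20, FUN = mean)
survival_rate
```

Running the same aggregate() call on the full titanic_train gives the segment heights drawn in the plot.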

Generating Interactive Charts

  • Loading the plotly package prints startup messages, so you can set message=FALSE in the code chunk to hide them.
library(plotly)
ggplotly(p)
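In Quarto, chunk options such as message go in `#|` comment lines at the top of an `{r}` chunk. The following is a minimal sketch of the plotly chunk written that way. The `warning: false` line goes beyond the original suggestion: it would also hide warnings such as "package 'plotly' was built under R version ...".

```r
#| message: false
#| warning: false
library(plotly)
ggplotly(p)
```

The older knitr header form `{r, message=FALSE}` has the same effect.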