Team members: 賴加耕, 黃丞祥, 姚冠豪, 趙致忠
Data source: https://www.kaggle.com/datasets/uom190346a/sleep-health-and-lifestyle-dataset
The dataset contains the following variables:
| Variable | Description |
|---|---|
| Person ID | Person identifier |
| Gender | Gender |
| Age | Age |
| Occupation | Occupation |
| Sleep Duration | Sleep duration (hours) |
| Quality of Sleep | Sleep quality |
| Physical Activity Level | Physical activity level |
| Stress Level | Stress level |
| BMI Category | Body mass index (BMI) category |
| Blood Pressure | Blood pressure |
| Heart Rate | Heart rate |
| Daily Steps | Daily step count |
| Sleep Disorder | Sleep disorder |
# Read the data
library(readr)
data<-read_csv("C:/Users/Howard/Sleep_health_and_lifestyle_dataset.csv")
## Rows: 374 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Gender, Occupation, BMI Category, Blood Pressure, Sleep Disorder
## dbl (8): Person ID, Age, Sleep Duration, Quality of Sleep, Physical Activity...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(data)
## # A tibble: 6 × 13
## `Person ID` Gender Age Occupation `Sleep Duration` `Quality of Sleep`
## <dbl> <chr> <dbl> <chr> <dbl> <dbl>
## 1 1 Male 27 Software Engineer 6.1 6
## 2 2 Male 28 Doctor 6.2 6
## 3 3 Male 28 Doctor 6.2 6
## 4 4 Male 28 Sales Representa… 5.9 4
## 5 5 Male 28 Sales Representa… 5.9 4
## 6 6 Male 28 Software Engineer 5.9 4
## # ℹ 7 more variables: `Physical Activity Level` <dbl>, `Stress Level` <dbl>,
## # `BMI Category` <chr>, `Blood Pressure` <chr>, `Heart Rate` <dbl>,
## # `Daily Steps` <dbl>, `Sleep Disorder` <chr>
str(data)
## spc_tbl_ [374 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Person ID : num [1:374] 1 2 3 4 5 6 7 8 9 10 ...
## $ Gender : chr [1:374] "Male" "Male" "Male" "Male" ...
## $ Age : num [1:374] 27 28 28 28 28 28 29 29 29 29 ...
## $ Occupation : chr [1:374] "Software Engineer" "Doctor" "Doctor" "Sales Representative" ...
## $ Sleep Duration : num [1:374] 6.1 6.2 6.2 5.9 5.9 5.9 6.3 7.8 7.8 7.8 ...
## $ Quality of Sleep : num [1:374] 6 6 6 4 4 4 6 7 7 7 ...
## $ Physical Activity Level: num [1:374] 42 60 60 30 30 30 40 75 75 75 ...
## $ Stress Level : num [1:374] 6 8 8 8 8 8 7 6 6 6 ...
## $ BMI Category : chr [1:374] "Overweight" "Normal" "Normal" "Obese" ...
## $ Blood Pressure : chr [1:374] "126/83" "125/80" "125/80" "140/90" ...
## $ Heart Rate : num [1:374] 77 75 75 85 85 85 82 70 70 70 ...
## $ Daily Steps : num [1:374] 4200 10000 10000 3000 3000 3000 3500 8000 8000 8000 ...
## $ Sleep Disorder : chr [1:374] "None" "None" "None" "Sleep Apnea" ...
## - attr(*, "spec")=
## .. cols(
## .. `Person ID` = col_double(),
## .. Gender = col_character(),
## .. Age = col_double(),
## .. Occupation = col_character(),
## .. `Sleep Duration` = col_double(),
## .. `Quality of Sleep` = col_double(),
## .. `Physical Activity Level` = col_double(),
## .. `Stress Level` = col_double(),
## .. `BMI Category` = col_character(),
## .. `Blood Pressure` = col_character(),
## .. `Heart Rate` = col_double(),
## .. `Daily Steps` = col_double(),
## .. `Sleep Disorder` = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
# Load the required packages
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(stats)
# Rename the outcome column so it can be used without backticks
data <- data %>%
  rename(Sleep_Disorder = `Sleep Disorder`)
data <- data %>%
  mutate(Sleep_Disorder = ifelse(Sleep_Disorder == "None", 0, 1), # recode sleep disorder as 0 (none) or 1 (any disorder)
         Gender = factor(Gender), # convert gender to a factor
         `BMI Category` = factor(`BMI Category`)) # convert BMI category to a factor
# Keep only the outcome and the candidate predictors
model_data <- data %>%
  select(Sleep_Disorder, Age, Gender, `Stress Level`, `Heart Rate`, `Daily Steps`, `Sleep Duration`, `Quality of Sleep`, `BMI Category`)
# m1: logistic regression with main effects only
m1 <- glm(Sleep_Disorder ~ Age + Gender + `Stress Level` + `Heart Rate` + `Daily Steps` + `Sleep Duration` + `Quality of Sleep` + `BMI Category`,
          data = model_data,
          family = binomial)
summary(m1)
##
## Call:
## glm(formula = Sleep_Disorder ~ Age + Gender + `Stress Level` +
## `Heart Rate` + `Daily Steps` + `Sleep Duration` + `Quality of Sleep` +
## `BMI Category`, family = binomial, data = model_data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.622e+00 6.165e+00 0.425 0.67065
## Age 1.766e-01 5.601e-02 3.154 0.00161 **
## GenderMale -1.699e-01 6.493e-01 -0.262 0.79351
## `Stress Level` -4.657e-01 4.430e-01 -1.051 0.29320
## `Heart Rate` 3.613e-02 9.812e-02 0.368 0.71271
## `Daily Steps` 1.721e-04 1.565e-04 1.100 0.27137
## `Sleep Duration` 6.030e-02 7.542e-01 0.080 0.93627
## `Quality of Sleep` -1.811e+00 6.719e-01 -2.695 0.00703 **
## `BMI Category`Normal Weight 1.287e-01 9.332e-01 0.138 0.89026
## `BMI Category`Obese 1.906e+01 1.137e+03 0.017 0.98662
## `BMI Category`Overweight 2.206e+00 8.115e-01 2.719 0.00655 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 507.47 on 373 degrees of freedom
## Residual deviance: 202.38 on 363 degrees of freedom
## AIC: 224.38
##
## Number of Fisher Scoring iterations: 16
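The AIC of m1 is 224.38.
Age, Quality of Sleep, and the Overweight BMI category are the only terms significant at the 0.01 level,
so they appear to be the main variables associated with sleep disorder.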
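The coefficient table above lists `BMI Category`Normal Weight as a separate level next to the reference level (Normal), which suggests the column holds two spellings of one category. If they are meant to be the same group, the levels could be merged before fitting. The following is a minimal sketch on toy data using base R only; whether "Normal" and "Normal Weight" really denote the same category is an assumption to check against the data source, and the vector `bmi` is made up for illustration.

```r
# Toy factor standing in for the `BMI Category` column (illustrative values only)
bmi <- factor(c("Normal", "Normal Weight", "Overweight", "Obese", "Normal"))

# Assigning an existing level name to another level merges the two levels
levels(bmi)[levels(bmi) == "Normal Weight"] <- "Normal"

print(levels(bmi))
print(table(bmi))
```

On the real data the same assignment could be applied to `data$"BMI Category"` after the `factor()` conversion, which would leave one fewer BMI coefficient in the model.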
m2 <- glm(Sleep_Disorder ~ (Age + Gender + `Stress Level` + `Heart Rate` + `Daily Steps` + `Sleep Duration` + `Quality of Sleep` + `BMI Category`)^2,
          data = model_data,
          family = binomial)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(m2)
##
## Call:
## glm(formula = Sleep_Disorder ~ (Age + Gender + `Stress Level` +
## `Heart Rate` + `Daily Steps` + `Sleep Duration` + `Quality of Sleep` +
## `BMI Category`)^2, family = binomial, data = model_data)
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error z value
## (Intercept) 6.107e+17 1.052e+10 58067770
## Age -4.470e+15 1.359e+08 -32895498
## GenderMale -3.104e+16 1.193e+09 -26018208
## `Stress Level` 1.622e+16 5.343e+08 30366615
## `Heart Rate` -1.024e+16 1.161e+08 -88192939
## `Daily Steps` -2.331e+13 5.336e+05 -43675824
## `Sleep Duration` -1.041e+17 1.305e+09 -79763414
## `Quality of Sleep` 7.012e+16 1.095e+09 64060574
## `BMI Category`Normal Weight -8.992e+16 3.062e+09 -29363604
## `BMI Category`Obese -1.000e+15 7.851e+09 -127388
## `BMI Category`Overweight -2.493e+16 2.814e+09 -8859376
## Age:GenderMale 6.578e+13 6.127e+06 10735123
## Age:`Stress Level` -8.043e+13 4.212e+06 -19096756
## Age:`Heart Rate` 6.797e+13 1.548e+06 43908654
## Age:`Daily Steps` -6.105e+09 2.495e+03 -2447261
## Age:`Sleep Duration` -2.554e+14 6.995e+06 -36513742
## Age:`Quality of Sleep` 2.613e+14 6.049e+06 43202540
## Age:`BMI Category`Normal Weight -8.330e+13 8.215e+06 -10138999
## Age:`BMI Category`Obese -7.944e+14 2.760e+07 -28780614
## Age:`BMI Category`Overweight -1.547e+14 1.103e+07 -14019958
## GenderMale:`Stress Level` -5.843e+13 5.599e+07 -1043589
## GenderMale:`Heart Rate` 5.457e+14 1.369e+07 39864744
## GenderMale:`Daily Steps` -2.090e+11 5.389e+04 -3878167
## GenderMale:`Sleep Duration` 3.398e+15 1.002e+08 33898487
## GenderMale:`Quality of Sleep` -4.232e+15 8.720e+07 -48528256
## GenderMale:`BMI Category`Normal Weight -3.964e+14 8.254e+07 -4802403
## GenderMale:`BMI Category`Obese -1.175e+16 1.881e+08 -62492391
## GenderMale:`BMI Category`Overweight -6.878e+15 1.019e+08 -67515029
## `Stress Level`:`Heart Rate` 7.687e+13 4.181e+06 18387900
## `Stress Level`:`Daily Steps` -6.100e+11 2.279e+04 -26766619
## `Stress Level`:`Sleep Duration` -4.850e+15 5.369e+07 -90343297
## `Stress Level`:`Quality of Sleep` 2.756e+15 3.299e+07 83559672
## `Stress Level`:`BMI Category`Normal Weight 3.109e+13 7.551e+07 411796
## `Stress Level`:`BMI Category`Obese -1.836e+15 1.930e+08 -9510505
## `Stress Level`:`BMI Category`Overweight 1.537e+15 8.597e+07 17872773
## `Heart Rate`:`Daily Steps` 2.846e+11 6.578e+03 43263793
## `Heart Rate`:`Sleep Duration` 1.690e+15 1.870e+07 90396015
## `Heart Rate`:`Quality of Sleep` -1.048e+15 1.110e+07 -94452286
## `Heart Rate`:`BMI Category`Normal Weight 1.647e+15 2.713e+07 60712245
## `Heart Rate`:`BMI Category`Obese 9.370e+14 7.030e+07 13327559
## `Heart Rate`:`BMI Category`Overweight 6.565e+14 2.946e+07 22287388
## `Daily Steps`:`Sleep Duration` 2.926e+12 2.672e+04 109510445
## `Daily Steps`:`Quality of Sleep` -1.632e+12 3.649e+04 -44726322
## `Daily Steps`:`BMI Category`Normal Weight -5.421e+11 4.347e+04 -12471461
## `Daily Steps`:`BMI Category`Obese 1.008e+12 6.897e+05 1461519
## `Daily Steps`:`BMI Category`Overweight -2.006e+12 5.659e+04 -35445375
## `Sleep Duration`:`Quality of Sleep` -5.328e+14 6.829e+07 -7800958
## `Sleep Duration`:`BMI Category`Normal Weight 6.518e+15 2.526e+08 25802640
## `Sleep Duration`:`BMI Category`Obese NA NA NA
## `Sleep Duration`:`BMI Category`Overweight 6.110e+15 1.088e+08 56144728
## `Quality of Sleep`:`BMI Category`Normal Weight -8.357e+15 1.408e+08 -59336916
## `Quality of Sleep`:`BMI Category`Obese NA NA NA
## `Quality of Sleep`:`BMI Category`Overweight -6.107e+15 1.713e+08 -35642973
## Pr(>|z|)
## (Intercept) <2e-16 ***
## Age <2e-16 ***
## GenderMale <2e-16 ***
## `Stress Level` <2e-16 ***
## `Heart Rate` <2e-16 ***
## `Daily Steps` <2e-16 ***
## `Sleep Duration` <2e-16 ***
## `Quality of Sleep` <2e-16 ***
## `BMI Category`Normal Weight <2e-16 ***
## `BMI Category`Obese <2e-16 ***
## `BMI Category`Overweight <2e-16 ***
## Age:GenderMale <2e-16 ***
## Age:`Stress Level` <2e-16 ***
## Age:`Heart Rate` <2e-16 ***
## Age:`Daily Steps` <2e-16 ***
## Age:`Sleep Duration` <2e-16 ***
## Age:`Quality of Sleep` <2e-16 ***
## Age:`BMI Category`Normal Weight <2e-16 ***
## Age:`BMI Category`Obese <2e-16 ***
## Age:`BMI Category`Overweight <2e-16 ***
## GenderMale:`Stress Level` <2e-16 ***
## GenderMale:`Heart Rate` <2e-16 ***
## GenderMale:`Daily Steps` <2e-16 ***
## GenderMale:`Sleep Duration` <2e-16 ***
## GenderMale:`Quality of Sleep` <2e-16 ***
## GenderMale:`BMI Category`Normal Weight <2e-16 ***
## GenderMale:`BMI Category`Obese <2e-16 ***
## GenderMale:`BMI Category`Overweight <2e-16 ***
## `Stress Level`:`Heart Rate` <2e-16 ***
## `Stress Level`:`Daily Steps` <2e-16 ***
## `Stress Level`:`Sleep Duration` <2e-16 ***
## `Stress Level`:`Quality of Sleep` <2e-16 ***
## `Stress Level`:`BMI Category`Normal Weight <2e-16 ***
## `Stress Level`:`BMI Category`Obese <2e-16 ***
## `Stress Level`:`BMI Category`Overweight <2e-16 ***
## `Heart Rate`:`Daily Steps` <2e-16 ***
## `Heart Rate`:`Sleep Duration` <2e-16 ***
## `Heart Rate`:`Quality of Sleep` <2e-16 ***
## `Heart Rate`:`BMI Category`Normal Weight <2e-16 ***
## `Heart Rate`:`BMI Category`Obese <2e-16 ***
## `Heart Rate`:`BMI Category`Overweight <2e-16 ***
## `Daily Steps`:`Sleep Duration` <2e-16 ***
## `Daily Steps`:`Quality of Sleep` <2e-16 ***
## `Daily Steps`:`BMI Category`Normal Weight <2e-16 ***
## `Daily Steps`:`BMI Category`Obese <2e-16 ***
## `Daily Steps`:`BMI Category`Overweight <2e-16 ***
## `Sleep Duration`:`Quality of Sleep` <2e-16 ***
## `Sleep Duration`:`BMI Category`Normal Weight <2e-16 ***
## `Sleep Duration`:`BMI Category`Obese NA
## `Sleep Duration`:`BMI Category`Overweight <2e-16 ***
## `Quality of Sleep`:`BMI Category`Normal Weight <2e-16 ***
## `Quality of Sleep`:`BMI Category`Obese NA
## `Quality of Sleep`:`BMI Category`Overweight <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 507.47 on 373 degrees of freedom
## Residual deviance: 1730.10 on 323 degrees of freedom
## AIC: 1832.1
##
## Number of Fisher Scoring iterations: 12
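The m2 output above shows symptoms of a failed fit: the coefficients are of order 1e13 to 1e17, two terms are not estimable, and the residual deviance (1730.10) is larger than the null deviance (507.47), which should not happen for a properly maximized likelihood. The sketch below shows one way such symptoms could be checked programmatically. `fit_looks_degenerate` is a hypothetical helper, the coefficient limit of 1e6 is an arbitrary assumption, and the first example uses simulated data rather than the sleep dataset.

```r
# Hypothetical helper: TRUE when a glm-like object shows symptoms of a failed fit
fit_looks_degenerate <- function(fit, coef_limit = 1e6) {
  !isTRUE(fit$converged) ||
    fit$deviance > fit$null.deviance ||
    any(abs(fit$coefficients) > coef_limit, na.rm = TRUE)
}

# Simulated, well-behaved logistic regression for comparison
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(0.5 * x))
ok_fit <- glm(y ~ x, family = binomial)
print(fit_looks_degenerate(ok_fit))

# Values reported for m2 above, entered by hand. The convergence flag is not
# shown in the output, so it is set to TRUE here (an assumption) to let the
# reported numbers alone drive the result.
m2_like <- list(converged = TRUE, deviance = 1730.10, null.deviance = 507.47,
                coefficients = c(6.107e17, -4.470e15))
print(fit_looks_degenerate(m2_like))
```

With the fitted object available, `fit_looks_degenerate(m2)` would apply the same checks directly.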
m3<-step(m1)
## Start: AIC=224.38
## Sleep_Disorder ~ Age + Gender + `Stress Level` + `Heart Rate` +
## `Daily Steps` + `Sleep Duration` + `Quality of Sleep` + `BMI Category`
##
## Df Deviance AIC
## - `Sleep Duration` 1 202.38 222.38
## - Gender 1 202.44 222.44
## - `Heart Rate` 1 202.51 222.51
## - `Stress Level` 1 203.45 223.45
## - `Daily Steps` 1 203.64 223.64
## <none> 202.38 224.38
## - `Quality of Sleep` 1 210.77 230.77
## - `BMI Category` 3 215.48 231.48
## - Age 1 213.68 233.68
##
## Step: AIC=222.38
## Sleep_Disorder ~ Age + Gender + `Stress Level` + `Heart Rate` +
## `Daily Steps` + `Quality of Sleep` + `BMI Category`
##
## Df Deviance AIC
## - Gender 1 202.44 220.44
## - `Heart Rate` 1 202.58 220.58
## - `Daily Steps` 1 203.64 221.64
## - `Stress Level` 1 203.80 221.80
## <none> 202.38 222.38
## - `Quality of Sleep` 1 211.47 229.47
## - Age 1 215.97 233.97
## - `BMI Category` 3 221.98 235.98
##
## Step: AIC=220.44
## Sleep_Disorder ~ Age + `Stress Level` + `Heart Rate` + `Daily Steps` +
## `Quality of Sleep` + `BMI Category`
##
## Df Deviance AIC
## - `Heart Rate` 1 202.69 218.69
## - `Daily Steps` 1 203.68 219.68
## <none> 202.44 220.44
## - `Stress Level` 1 204.74 220.74
## - `Quality of Sleep` 1 214.02 230.02
## - `BMI Category` 3 222.97 234.97
## - Age 1 220.63 236.63
##
## Step: AIC=218.69
## Sleep_Disorder ~ Age + `Stress Level` + `Daily Steps` + `Quality of Sleep` +
## `BMI Category`
##
## Df Deviance AIC
## - `Daily Steps` 1 203.76 217.76
## <none> 202.69 218.69
## - `Stress Level` 1 204.98 218.98
## - `Quality of Sleep` 1 214.17 228.17
## - Age 1 220.65 234.65
## - `BMI Category` 3 241.66 251.66
##
## Step: AIC=217.76
## Sleep_Disorder ~ Age + `Stress Level` + `Quality of Sleep` +
## `BMI Category`
##
## Df Deviance AIC
## - `Stress Level` 1 205.24 217.24
## <none> 203.76 217.76
## - `Quality of Sleep` 1 214.23 226.23
## - Age 1 221.43 233.43
## - `BMI Category` 3 241.93 249.93
##
## Step: AIC=217.24
## Sleep_Disorder ~ Age + `Quality of Sleep` + `BMI Category`
##
## Df Deviance AIC
## <none> 205.24 217.24
## - Age 1 221.50 231.50
## - `Quality of Sleep` 1 222.20 232.20
## - `BMI Category` 3 257.00 263.00
summary(m3)
##
## Call:
## glm(formula = Sleep_Disorder ~ Age + `Quality of Sleep` + `BMI Category`,
## family = binomial, data = model_data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.88731 1.19552 -0.742 0.457965
## Age 0.15800 0.04118 3.837 0.000124 ***
## `Quality of Sleep` -1.05602 0.28015 -3.769 0.000164 ***
## `BMI Category`Normal Weight 0.75233 0.72263 1.041 0.297829
## `BMI Category`Obese 19.49590 1178.11022 0.017 0.986797
## `BMI Category`Overweight 2.77389 0.54659 5.075 3.88e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 507.47 on 373 degrees of freedom
## Residual deviance: 205.24 on 368 degrees of freedom
## AIC: 217.24
##
## Number of Fisher Scoring iterations: 16
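As a quick arithmetic check on the reported values: for a binary (0/1) response the residual deviance equals minus twice the log-likelihood, so the AIC is the residual deviance plus twice the number of estimated coefficients. The sketch below reproduces the AIC of m1 and m3 from numbers copied by hand from the summaries above, and compares the two nested models with a likelihood-ratio test. `aic_from_deviance` is a hypothetical helper name.

```r
# Hypothetical helper: AIC for a binary-response glm from its residual deviance
aic_from_deviance <- function(deviance, n_coef) deviance + 2 * n_coef

aic_m1 <- aic_from_deviance(202.38, 11)  # m1 estimates 11 coefficients
aic_m3 <- aic_from_deviance(205.24, 6)   # m3 estimates 6 coefficients
print(c(m1 = aic_m1, m3 = aic_m3))

# Likelihood-ratio comparison: m3 drops 5 coefficients from m1
lr_stat <- 205.24 - 202.38
lr_p <- pchisq(lr_stat, df = 11 - 6, lower.tail = FALSE)
print(lr_p)
```

A large p-value here would indicate that the terms removed by `step()` do not improve the fit significantly. With the fitted objects, `anova(m3, m1, test = "Chisq")` performs the same comparison.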
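The final model is:
\[
\log\frac{p}{1-p} = \beta_0 + \beta_1 \cdot \text{Age} + \beta_2
\cdot \text{Quality of Sleep} + \beta_3 \cdot \text{BMI Category}
\] where \(p = P(\text{Sleep Disorder} = 1)\), since the model is a logistic regression, and:
- \(\beta_0\)
is the intercept;
- \(\beta_1\)
is the coefficient of Age;
- \(\beta_2\)
is the coefficient of Quality of Sleep;
- \(\beta_3\)
is the coefficient of BMI Category (one coefficient for each non-reference level, because BMI Category is a factor).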
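Because m3 is fitted with `family = binomial`, its coefficients are on the log-odds scale, and exponentiating a coefficient gives an odds ratio. The sketch below copies the m3 estimates by hand from the summary above. The example person (age 40, sleep quality 6, Overweight) is a made-up illustration, not a row from the data.

```r
# Estimates copied from summary(m3) above
b <- c(intercept = -0.88731, age = 0.15800, quality = -1.05602, overweight = 2.77389)

# Odds ratios: multiplicative change in the odds of a sleep disorder
# per one-unit increase (or versus the reference BMI level)
odds_ratios <- exp(b[c("age", "quality", "overweight")])
print(round(odds_ratios, 2))

# Predicted probability for the hypothetical person
eta <- b[["intercept"]] + b[["age"]] * 40 + b[["quality"]] * 6 + b[["overweight"]] * 1
p_hat <- plogis(eta)  # inverse logit
print(round(p_hat, 3))
```

Read this way, each additional year of age multiplies the odds by roughly 1.17, each additional point of sleep quality multiplies them by roughly 0.35, and the Overweight category has roughly 16 times the odds of the reference BMI level. With the fitted object, `exp(coef(m3))` and `predict(m3, newdata = ..., type = "response")` give the same quantities.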
# Load the required packages
library(ggplot2)
library(dplyr)
# Plot residuals against fitted values
# (for a glm, fitted() returns probabilities and residuals() returns deviance residuals)
plot(fitted(m3), residuals(m3),
     xlab = "Fitted Values",
     ylab = "Residuals",
     main = "Residuals vs Fitted Values")
abline(h = 0, col = "red", lty = 2)
# Extract the standardized residuals
std_residuals <- rstandard(m3)
# Draw the Q-Q plot
qqnorm(std_residuals,
       main = "Normal Q-Q Plot of Residuals")
qqline(std_residuals, col = "blue", lty = 2)
library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
bptest(m3)
##
## studentized Breusch-Pagan test
##
## data: m3
## BP = 8.4072, df = 5, p-value = 0.1352
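BP statistic: 8.4072; degrees of freedom (df): 5;
p-value: 0.1352.
Because the p-value is greater than 0.05,
we cannot reject the null hypothesis of homoscedasticity.
The test therefore gives no evidence that the residual variance of the model is unstable.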
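The reported p-value can be reproduced from the test statistic, since under the null hypothesis the studentized Breusch-Pagan statistic is referred to a chi-squared distribution with `df` degrees of freedom. A minimal sketch, with the statistic and degrees of freedom copied by hand from the output above:

```r
# Values copied from the bptest(m3) output
bp_stat <- 8.4072
bp_df <- 5

# Upper-tail chi-squared probability
p_value <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)
print(round(p_value, 4))

# Decision at the 5% significance level
reject_null <- p_value < 0.05
print(reject_null)
```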