1. Perform an appropriate hypothesis test to determine whether there is a statistical difference in mean sleep duration between those who are MetS+ and those who are MetS-. What can you conclude about the association between sleep duration and MetS? (5 points)
mets_sleep %>%
  group_by(MetS) %>%
  summarise(mean = mean(sleep, na.rm = T), 
            sd = sd(sleep, na.rm = T))
MetSmeansd
MetS+7.421.53
MetS-7.441.46
density <- mets_sleep %>%
  filter(!is.na(sleep)) %>%
  group_by(MetS) %>%
  mutate(m = mean(sleep, na.rm=T)) %>%
  #ungroup %>% 
  ggplot(aes(x=sleep)) +
  geom_density(aes(fill=MetS, linetype=MetS),
               color="darkgray", 
               linewidth=.5,
               alpha=.2) +
  geom_vline(aes(xintercept = m, color=MetS), linetype=2, size=.5) +
  xlab("Hours of Sleep") +
  ylab(NULL)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
density

The distribution is not normal (left-skewed).

density +
  scale_x_continuous(trans = "log")

Log transformation makes the distribution even more skewed.

The SD and mean are pretty close. I would guess that the variances are equal, but I will run the Levene Test to verify.

# Perform levene test to determine whether variance of the distributions are the same.
variance = leveneTest(sleep ~ MetS, mets_sleep)

print(variance)
## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value  Pr(>F)  
## group    1  3.7178 0.05395 .
##       2417                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

p-value of the levene test of homogeneity = 0.054, which is >0.05. Hence, the variance of the two groups is equal.

#perform t-test with equal variances

mets_sleep %>%
  mutate(MetS = relevel(MetS, ref="MetS-")) %>%
  ungroup() %>%
  t_test(sleep ~ MetS, detailed=TRUE, var.equal=T)
estimateestimate1estimate2.y.group1group2n1n2statisticpdfconf.lowconf.highmethodalternative
0.02137.447.42sleepMetS-MetS+15149050.3410.7332.42e+03-0.1010.144T-testtwo.sided

***We are unable to reject the null hypothesis, and it appears that the mean of sleep duration is the same in both the MetS+ and MetS- groups.

Since the assumption of normality of the distribution is violated, I will also run the Mann-Whitney test.**

wilcox.test(sleep ~ MetS, data = mets_sleep)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  sleep by MetS
## W = 684585, p-value = 0.9758
## alternative hypothesis: true location shift is not equal to 0

Again, we cannot reject the null and we are fairly confident that the difference of the means is 0, and that the average sleep duration of those presenting with evidence of Metabolic Syndrome is no different from the average sleep duration of those who do not present with it.

  1. Estimate the odds ratio of the association between MetS and sex, and compute the 95% confidence interval. Briefly state your conclusions. (5 points)
mets_sleep %>%
  mutate(MetS = relevel(MetS, ref="MetS-"),
         sex = relevel(sex, ref="Male"),
         race = relevel(race, ref="White")) %>%
  contingency(MetS ~ sex)
##          Outcome
## Predictor MetS+ MetS-
##    Female   501   739
##    Male     404   775
## 
##              Outcome +    Outcome -      Total                 Inc risk *
## Exposed +          501          739       1240     40.40 (37.66 to 43.19)
## Exposed -          404          775       1179     34.27 (31.56 to 37.05)
## Total              905         1514       2419     37.41 (35.48 to 39.38)
## 
## Point estimates and 95% CIs:
## -------------------------------------------------------------------
## Inc risk ratio                                 1.18 (1.06, 1.31)
## Inc odds ratio                                 1.30 (1.10, 1.53)
## Attrib risk in the exposed *                   6.14 (2.29, 9.98)
## Attrib fraction in the exposed (%)            15.19 (5.89, 23.57)
## Attrib risk in the population *                3.15 (-0.18, 6.47)
## Attrib fraction in the population (%)         8.41 (2.97, 13.54)
## -------------------------------------------------------------------
## Uncorrected chi2 test that OR = 1: chi2(1) = 9.721 Pr>chi2 = 0.002
## Fisher exact test that OR = 1: Pr>chi2 = 0.002
##  Wald confidence limits
##  CI: confidence interval
##  * Outcomes per 100 population units 
## 
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  dat
## X-squared = 9.4603, df = 1, p-value = 0.0021

Females have a 30% greater odds of being MetS+ than males (OR=1.30, 95% CI = 1.10, 1.53).

  1. Test whether there is an association between MetS and sex using a chi-squared test of independence. State your conclusions and describe whether they match the conclusions in the previous question. (5 points)

As also revealed in the outcome of the contingency procedure run in the previous question…. The results remain the same.

chisq.test(mets_sleep$MetS, mets_sleep$sex)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  mets_sleep$MetS and mets_sleep$sex
## X-squared = 9.4603, df = 1, p-value = 0.0021
  1. Perform logistic regression to model the relationship between MetS and sleep duration where (i) sleep duration is treated as categorical and (ii) sleep duration is treated as numeric. For categorical sleep duration, round the sleep variable to the nearest hour and convert to factor.

**First the logistic model with sleep as a continuous variable

logistic_model1 <- glm(MetS ~ sleep, family=binomial (link='logit'), data = regression_relevel)

anova(logistic_model1, test="Chisq")
DfDevianceResid. DfResid. DevPr(>Chi)
    24183.2e+03    
10.11624173.2e+030.733
summary(logistic_model1)
## 
## Call:
## glm(formula = MetS ~ sleep, family = binomial(link = "logit"), 
##     data = regression_relevel)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)  
## (Intercept) -0.443134   0.213692  -2.074   0.0381 *
## sleep       -0.009612   0.028201  -0.341   0.7332  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3198.5  on 2418  degrees of freedom
## Residual deviance: 3198.3  on 2417  degrees of freedom
## AIC: 3202.3
## 
## Number of Fisher Scoring iterations: 4

** Then with sleep as a Factor**

logistic_model2 <- glm(MetS ~ sleep_factor, family=binomial (link='logit'), data = regression_relevel)

anova(logistic_model2, test="Chisq")
DfDevianceResid. DfResid. DevPr(>Chi)
   24183.2e+03     
89.4224103.19e+030.308
summary(logistic_model2)
## 
## Call:
## glm(formula = MetS ~ sleep_factor, family = binomial(link = "logit"), 
##     data = regression_relevel)
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    -0.62357    0.07097  -8.787   <2e-16 ***
## sleep_factor2   0.06395    0.63079   0.101   0.9192    
## sleep_factor3   0.40043    0.47962   0.835   0.4038    
## sleep_factor4   0.41226    0.24137   1.708   0.0876 .  
## sleep_factor5   0.23410    0.21207   1.104   0.2696    
## sleep_factor6   0.11754    0.12097   0.972   0.3312    
## sleep_factor7   0.04908    0.12636   0.388   0.6977    
## sleep_factor9   0.35391    0.13796   2.565   0.0103 *  
## sleep_factor10  0.10227    0.16127   0.634   0.5260    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3198.5  on 2418  degrees of freedom
## Residual deviance: 3189.0  on 2410  degrees of freedom
## AIC: 3207
## 
## Number of Fisher Scoring iterations: 4
  1. Tabulate the estimated ORs, p-values, and 95% Cis, and present your results in a single table. (5 points)
tab_or <- tab_model(logistic_model1, logistic_model2)
tab_or
  Met S Met S
Predictors Odds Ratios CI p Odds Ratios CI p
(Intercept) 0.64 0.42 – 0.97 0.038 0.54 0.47 – 0.62 <0.001
Sleep hours-weekdays or
workdays
0.99 0.94 – 1.05 0.733
sleep factor: Sleep
hours-weekdays or
workdays: sleep factor 2
1.07 0.28 – 3.56 0.919
sleep factor: Sleep
hours-weekdays or
workdays: sleep factor 3
1.49 0.56 – 3.82 0.404
sleep factor: Sleep
hours-weekdays or
workdays: sleep factor 4
1.51 0.94 – 2.42 0.088
sleep factor: Sleep
hours-weekdays or
workdays: sleep factor 5
1.26 0.83 – 1.91 0.270
sleep factor: Sleep
hours-weekdays or
workdays: sleep factor 6
1.12 0.89 – 1.42 0.331
sleep factor: Sleep
hours-weekdays or
workdays: sleep factor 7
1.05 0.82 – 1.34 0.698
sleep factor: Sleep
hours-weekdays or
workdays: sleep factor 9
1.42 1.09 – 1.87 0.010
sleep factor: Sleep
hours-weekdays or
workdays: sleep factor 10
1.11 0.81 – 1.52 0.526
Observations 2419 2419
R2 Tjur 0.000 0.004
  1. Compute the predicted proportion of MetS+ for the different hours of sleep duration based on each model above. Plot these estimated values as points and add error bars to denote the 95% CI. Plot each model values separately. Improve the appearance of the plots. (5 points)
glm_plot1 <- plot_model(logistic_model1, type="pred")
glm_plot1

glm_plot2 <- plot_model(logistic_model2, type="pred")

glm_plot2

  1. Bonus: Plot both model-predicted values and error bars in a single figure, using different colors for each model. (5 points)
  1. Repeat the above logistic models, adjusting for the variables in Table 2 of Smiley et al. (not including Sitting). Tabulate the estimated odds ratios, p-values, and 95% CI. Display the results of all four models in a single table. Comment on whether the model covariates are potential confounders. (5 points)
logistic_model3 <- glm(MetS ~ sleep_factor + age_scale, family=binomial (link='logit'), data = regression_relevel)

logistic_model4 <- glm(MetS ~ sleep + age_scale, family=binomial (link='logit'), data = regression_relevel)

logistic_model5 <- glm(MetS ~ sleep_factor + sex, family=binomial (link='logit'), data = regression_relevel)

logistic_model6 <- glm(MetS ~ sleep + sex, family=binomial (link='logit'), data = regression_relevel)
tab_or2 <- tab_model(logistic_model3,
                     logistic_model4,
                     logistic_model5,
                     logistic_model6)

tab_or2
  Met S Met S Met S Met S
Predictors Odds Ratios CI p Odds Ratios CI p Odds Ratios CI p Odds Ratios CI p
(Intercept) 0.23 0.19 – 0.29 <0.001 0.32 0.21 – 0.50 <0.001 0.46 0.39 – 0.54 <0.001 0.58 0.38 – 0.89 0.012
Sleep hours-weekdays or
workdays: sleep factor:
sleep factor 2
1.06 0.27 – 3.64 0.922 1.02 0.26 – 3.40 0.980
Sleep hours-weekdays or
workdays: sleep factor:
sleep factor 3
1.48 0.55 – 3.88 0.424 1.45 0.55 – 3.72 0.441
Sleep hours-weekdays or
workdays: sleep factor:
sleep factor 4
1.52 0.93 – 2.46 0.091 1.53 0.95 – 2.46 0.078
Sleep hours-weekdays or
workdays: sleep factor:
sleep factor 5
1.24 0.81 – 1.90 0.314 1.32 0.86 – 2.00 0.194
Sleep hours-weekdays or
workdays: sleep factor:
sleep factor 6
1.13 0.89 – 1.44 0.306 1.15 0.91 – 1.46 0.242
Sleep hours-weekdays or
workdays: sleep factor:
sleep factor 7
1.04 0.81 – 1.34 0.735 1.07 0.83 – 1.37 0.587
Sleep hours-weekdays or
workdays: sleep factor:
sleep factor 9
1.34 1.01 – 1.77 0.040 1.43 1.09 – 1.88 0.009
Sleep hours-weekdays or
workdays: sleep factor:
sleep factor 10
0.94 0.68 – 1.30 0.707 1.10 0.80 – 1.51 0.537
Age in years at screening 1.03 1.02 – 1.03 <0.001 1.03 1.02 – 1.03 <0.001
Sleep hours-weekdays or
workdays
0.97 0.92 – 1.03 0.271 0.99 0.93 – 1.04 0.598
sex: Female 1.31 1.11 – 1.55 0.001 1.30 1.11 – 1.54 0.002
Observations 2419 2419 2419 2419
R2 Tjur 0.048 0.045 0.008 0.004

None of the co-variates were greater than 10% different from the covariates in the crude model. It is unlikely that any of the covariates are confounders.

  1. Use linear regression to assess the relationship between Systolic Blood Pressure (SBP) and sleep duration (numeric), adjusting for the demographic variables in Table 1 (sex, ethnicity, marital status, education) and fasting glucose.
summary(lm(sleep~sys_scale, data=regression_relevel))
## 
## Call:
## lm(formula = sleep ~ sys_scale, data = regression_relevel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.4470 -0.9315  0.0666  1.0573  3.0900 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  7.43489    0.03028 245.564   <2e-16 ***
## sys_scale   -0.01291    0.03028  -0.426     0.67    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.489 on 2417 degrees of freedom
## Multiple R-squared:  7.518e-05,  Adjusted R-squared:  -0.0003385 
## F-statistic: 0.1817 on 1 and 2417 DF,  p-value: 0.6699
model0 <- lm(sleep ~ sys_scale, data = regression_relevel)
anova(model0)
DfSum SqMean SqF valuePr(>F)
10.403   0.4030.1820.67
24175.36e+032.22        
summary(model0)
## 
## Call:
## lm(formula = sleep ~ sys_scale, data = regression_relevel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.4470 -0.9315  0.0666  1.0573  3.0900 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  7.43489    0.03028 245.564   <2e-16 ***
## sys_scale   -0.01291    0.03028  -0.426     0.67    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.489 on 2417 degrees of freedom
## Multiple R-squared:  7.518e-05,  Adjusted R-squared:  -0.0003385 
## F-statistic: 0.1817 on 1 and 2417 DF,  p-value: 0.6699
model1 <- lm(sleep ~ sys_scale + sex + race + marital + education + glucose_scale, data = regression_relevel)
anova(model1)
DfSum SqMean SqF valuePr(>F)
10.403   0.4030.1850.667   
118.8     18.8  8.62 0.00335 
477.3     19.3  8.89 4.04e-07
323.5     7.83 3.6  0.013   
49.4     2.35 1.08 0.365   
11.98    1.98 0.9110.34    
24035.23e+032.18            
summary(model1)
## 
## Call:
## lm(formula = sleep ~ sys_scale + sex + race + marital + education + 
##     glucose_scale, data = regression_relevel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.5920 -0.8512  0.0532  0.9757  3.4725 
## 
## Coefficients:
##                                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                            7.633859   0.126799  60.205  < 2e-16 ***
## sys_scale                             -0.001734   0.031629  -0.055  0.95628    
## sexFemale                              0.148365   0.061324   2.419  0.01562 *  
## raceHispanic                          -0.162037   0.088011  -1.841  0.06573 .  
## raceBlack                             -0.463822   0.083033  -5.586 2.59e-08 ***
## raceAsian                             -0.017238   0.099171  -0.174  0.86202    
## raceMultiracial (Other)               -0.161546   0.139862  -1.155  0.24819    
## maritalMarried or Living with partner -0.022851   0.084023  -0.272  0.78567    
## maritalWidowed                         0.386489   0.137287   2.815  0.00491 ** 
## maritalDivorced or Separated          -0.002916   0.105602  -0.028  0.97798    
## education< 9th grade                  -0.226687   0.140144  -1.618  0.10589    
## educationHigh School                  -0.186037   0.107109  -1.737  0.08254 .  
## educationSome College                 -0.141982   0.102905  -1.380  0.16780    
## education>= College Graduate          -0.094366   0.110710  -0.852  0.39409    
## glucose_scale                         -0.029337   0.030743  -0.954  0.34005    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.475 on 2403 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.02451,    Adjusted R-squared:  0.01883 
## F-statistic: 4.313 on 14 and 2403 DF,  p-value: 1.282e-07

Sex, Race and Marital Status are all significant and could have an effect on the relationship between sBP and sleep duration. I will run a model testing effect modification of those particular covariates.

model2 <- lm(sleep ~ sys_scale + sex + race + marital + race*sys_scale + sex*sys_scale + marital*sys_scale, data = regression_relevel)
anova(model2)
DfSum SqMean SqF valuePr(>F)
10.403   0.4030.1860.667   
118.7     18.7  8.59 0.0034  
476.9     19.2  8.85 4.29e-07
323.5     7.83 3.61 0.0129  
45.32    1.33 0.6130.654   
10.329   0.3290.1520.697   
323.4     7.8  3.59 0.0131  
24015.21e+032.17            
summary(model2)
## 
## Call:
## lm(formula = sleep ~ sys_scale + sex + race + marital + race * 
##     sys_scale + sex * sys_scale + marital * sys_scale, data = regression_relevel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.6689 -0.8079  0.0553  0.9905  3.4221 
## 
## Coefficients:
##                                                 Estimate Std. Error t value
## (Intercept)                                      7.46898    0.09412  79.355
## sys_scale                                       -0.16465    0.10205  -1.613
## sexFemale                                        0.14890    0.06114   2.435
## raceHispanic                                    -0.17711    0.08196  -2.161
## raceBlack                                       -0.46967    0.08345  -5.628
## raceAsian                                       -0.02056    0.09612  -0.214
## raceMultiracial (Other)                         -0.18053    0.14040  -1.286
## maritalMarried or Living with partner            0.01413    0.08515   0.166
## maritalWidowed                                   0.27108    0.15007   1.806
## maritalDivorced or Separated                     0.02356    0.10615   0.222
## sys_scale:raceHispanic                          -0.02799    0.08808  -0.318
## sys_scale:raceBlack                              0.04121    0.08138   0.506
## sys_scale:raceAsian                             -0.08797    0.10026  -0.877
## sys_scale:raceMultiracial (Other)               -0.14408    0.14266  -1.010
## sys_scale:sexFemale                             -0.02630    0.06277  -0.419
## sys_scale:maritalMarried or Living with partner  0.20708    0.08631   2.399
## sys_scale:maritalWidowed                         0.38889    0.12584   3.090
## sys_scale:maritalDivorced or Separated           0.21374    0.10513   2.033
##                                                 Pr(>|t|)    
## (Intercept)                                      < 2e-16 ***
## sys_scale                                        0.10677    
## sexFemale                                        0.01495 *  
## raceHispanic                                     0.03079 *  
## raceBlack                                       2.03e-08 ***
## raceAsian                                        0.83063    
## raceMultiracial (Other)                          0.19862    
## maritalMarried or Living with partner            0.86825    
## maritalWidowed                                   0.07099 .  
## maritalDivorced or Separated                     0.82440    
## sys_scale:raceHispanic                           0.75071    
## sys_scale:raceBlack                              0.61266    
## sys_scale:raceAsian                              0.38038    
## sys_scale:raceMultiracial (Other)                0.31260    
## sys_scale:sexFemale                              0.67528    
## sys_scale:maritalMarried or Living with partner  0.01650 *  
## sys_scale:maritalWidowed                         0.00202 ** 
## sys_scale:maritalDivorced or Separated           0.04216 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.473 on 2401 degrees of freedom
## Multiple R-squared:  0.02769,    Adjusted R-squared:  0.02081 
## F-statistic: 4.023 on 17 and 2401 DF,  p-value: 5.467e-08

Marital status appears to have significant effects modification on the relationship between systolic blood pressure and sleep duration. Age and race do not.

model3 <- lm(sleep ~ sys_scale + sex + race + marital + education + glucose_scale + marital*sys_scale, data = regression_relevel)
anova(model3)
DfSum SqMean SqF valuePr(>F)
10.403   0.4030.1860.667   
118.8     18.8  8.65 0.0033  
477.3     19.3  8.91 3.85e-07
323.5     7.83 3.61 0.0128  
49.4     2.35 1.08 0.363   
11.98    1.98 0.9130.339   
321.9     7.3  3.37 0.0179  
24005.21e+032.17            
summary(model3)
## 
## Call:
## lm(formula = sleep ~ sys_scale + sex + race + marital + education + 
##     glucose_scale + marital * sys_scale, data = regression_relevel)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.6175 -0.8189  0.0717  0.9720  3.4571 
## 
## Coefficients:
##                                                 Estimate Std. Error t value
## (Intercept)                                      7.59503    0.12781  59.426
## sys_scale                                       -0.17611    0.07211  -2.442
## sexFemale                                        0.15192    0.06131   2.478
## raceHispanic                                    -0.15996    0.08801  -1.818
## raceBlack                                       -0.46366    0.08310  -5.579
## raceAsian                                       -0.03034    0.09918  -0.306
## raceMultiracial (Other)                         -0.16614    0.13969  -1.189
## maritalMarried or Living with partner            0.01893    0.08547   0.222
## maritalWidowed                                   0.28872    0.15169   1.903
## maritalDivorced or Separated                     0.03087    0.10668   0.289
## education< 9th grade                            -0.22784    0.13997  -1.628
## educationHigh School                            -0.18765    0.10702  -1.753
## educationSome College                           -0.14254    0.10279  -1.387
## education>= College Graduate                    -0.09883    0.11063  -0.893
## glucose_scale                                   -0.02587    0.03077  -0.841
## sys_scale:maritalMarried or Living with partner  0.18730    0.08282   2.262
## sys_scale:maritalWidowed                         0.37493    0.12444   3.013
## sys_scale:maritalDivorced or Separated           0.20909    0.10454   2.000
##                                                 Pr(>|t|)    
## (Intercept)                                      < 2e-16 ***
## sys_scale                                        0.01467 *  
## sexFemale                                        0.01329 *  
## raceHispanic                                     0.06925 .  
## raceBlack                                       2.69e-08 ***
## raceAsian                                        0.75969    
## raceMultiracial (Other)                          0.23442    
## maritalMarried or Living with partner            0.82472    
## maritalWidowed                                   0.05712 .  
## maritalDivorced or Separated                     0.77230    
## education< 9th grade                             0.10371    
## educationHigh School                             0.07966 .  
## educationSome College                            0.16567    
## education>= College Graduate                     0.37174    
## glucose_scale                                    0.40049    
## sys_scale:maritalMarried or Living with partner  0.02381 *  
## sys_scale:maritalWidowed                         0.00261 ** 
## sys_scale:maritalDivorced or Separated           0.04561 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.473 on 2400 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.0286, Adjusted R-squared:  0.02172 
## F-statistic: 4.156 on 17 and 2400 DF,  p-value: 2.29e-08
  1. Assess the model assumption of normality of residuals using appropriate plots. What can you conclude? (5 points)
ggplot() +
  geom_qq(aes(sample = rstandard(model3))) +
  geom_abline(color = "red") +
  coord_fixed()+
  ggtitle("Q-Q Residuals")+
  labs(x="Standardized Residuals",
       y="Theoretical Quantiles")

The assumption of normality is violated.

  1. Tabulate model estimates, confidence intervals, and p-values. Briefly describe the adjusted relationship between SBP and sleep duration and comment on its statistical significance. (5 points)
tab_model(model0, model1, model2, model3)
  Sleep hours-weekdays or
workdays
Sleep hours-weekdays or
workdays
Sleep hours-weekdays or
workdays
Sleep hours-weekdays or
workdays
Predictors Estimates CI p Estimates CI p Estimates CI p Estimates CI p
(Intercept) 7.43 7.38 – 7.49 <0.001 7.63 7.39 – 7.88 <0.001 7.47 7.28 – 7.65 <0.001 7.60 7.34 – 7.85 <0.001
sys scale -0.01 -0.07 – 0.05 0.670 -0.00 -0.06 – 0.06 0.956 -0.16 -0.36 – 0.04 0.107 -0.18 -0.32 – -0.03 0.015
sex [Female] 0.15 0.03 – 0.27 0.016 0.15 0.03 – 0.27 0.015 0.15 0.03 – 0.27 0.013
race [Hispanic] -0.16 -0.33 – 0.01 0.066 -0.18 -0.34 – -0.02 0.031 -0.16 -0.33 – 0.01 0.069
race [Black] -0.46 -0.63 – -0.30 <0.001 -0.47 -0.63 – -0.31 <0.001 -0.46 -0.63 – -0.30 <0.001
race [Asian] -0.02 -0.21 – 0.18 0.862 -0.02 -0.21 – 0.17 0.831 -0.03 -0.22 – 0.16 0.760
race [Multiracial
(Other)]
-0.16 -0.44 – 0.11 0.248 -0.18 -0.46 – 0.09 0.199 -0.17 -0.44 – 0.11 0.234
marital [Married or
Living with partner]
-0.02 -0.19 – 0.14 0.786 0.01 -0.15 – 0.18 0.868 0.02 -0.15 – 0.19 0.825
marital [Widowed] 0.39 0.12 – 0.66 0.005 0.27 -0.02 – 0.57 0.071 0.29 -0.01 – 0.59 0.057
marital [Divorced or
Separated]
-0.00 -0.21 – 0.20 0.978 0.02 -0.18 – 0.23 0.824 0.03 -0.18 – 0.24 0.772
education [< 9th grade] -0.23 -0.50 – 0.05 0.106 -0.23 -0.50 – 0.05 0.104
education [High School] -0.19 -0.40 – 0.02 0.083 -0.19 -0.40 – 0.02 0.080
education [Some College] -0.14 -0.34 – 0.06 0.168 -0.14 -0.34 – 0.06 0.166
education [>= College
Graduate]
-0.09 -0.31 – 0.12 0.394 -0.10 -0.32 – 0.12 0.372
glucose scale -0.03 -0.09 – 0.03 0.340 -0.03 -0.09 – 0.03 0.400
sys scale × race
[Hispanic]
-0.03 -0.20 – 0.14 0.751
sys scale × race [Black] 0.04 -0.12 – 0.20 0.613
sys scale × race [Asian] -0.09 -0.28 – 0.11 0.380
sys scale × race
[Multiracial (Other)]
-0.14 -0.42 – 0.14 0.313
sys scale × sex [Female] -0.03 -0.15 – 0.10 0.675
sys scale × marital
[Married or Living with
partner]
0.21 0.04 – 0.38 0.016 0.19 0.02 – 0.35 0.024
sys scale × marital
[Widowed]
0.39 0.14 – 0.64 0.002 0.37 0.13 – 0.62 0.003
sys scale × marital
[Divorced or Separated]
0.21 0.01 – 0.42 0.042 0.21 0.00 – 0.41 0.046
Observations 2419 2418 2419 2418
R2 / R2 adjusted 0.000 / -0.000 0.025 / 0.019 0.028 / 0.021 0.029 / 0.022

When the model is adjusted for the effects modification of marital status on systolic blood pressure, systolic blood pressure becomes a significant predictor of sleep duration.