In this manuscript, we’re going to conduct an analysis into the ELSI Dataset. This is a modified version for usage in the Principles and Practice of Clinical Research program of Harvard T.H. Chan School of Public Health.
The aim here is to replicate the analysis done in a previous paper and see if the results also apply to the sample that was included in the ELSI Brazil.
First order of business is to download the dataset we’re going to be analyzing, which was kindly provided by PPCR staff.
## Loading the necessary libraries
library(tidyverse)
library(haven)
library(broom)
library(ggplot2)
library(gtsummary)
## Url provided by PPCR staff
url <- 'https://ryver-prd-files.s3.us-west-2.amazonaws.com/tnt45405/T9gKrWnaRr5K0yT/FINAL%20-%20ELSI_students.dta'
## Using the read_dta function from haven to read the .dta file created in STATA
PPCRdata <- read_dta(url)
## Removing the url from memory as it is no longer useful
rm(url)
## Transforming to tibble in order to increase readability
PPCRdata <- as_tibble(PPCRdata)
PPCRdata
## # A tibble: 6,974 × 121
## id2 psu calibrated_weight stratum region area hometype staircase
## <dbl> <dbl> <dbl> <dbl> <dbl+lbl> <dbl+l> <dbl+lb> <dbl+lbl>
## 1 20200001 21 3111. 30 1 [North] 1 [Urb… 1 [Hous… 0 [No]
## 2 20200002 21 1555. 30 1 [North] 1 [Urb… 1 [Hous… 0 [No]
## 3 20200003 21 1147. 30 1 [North] 1 [Urb… 1 [Hous… 0 [No]
## 4 20200004 21 1523. 30 1 [North] 1 [Urb… 1 [Hous… 0 [No]
## 5 20200007 21 2563. 30 1 [North] 1 [Urb… 1 [Hous… 0 [No]
## 6 20200009 21 5127. 30 1 [North] 1 [Urb… 1 [Hous… 0 [No]
## 7 20200012 21 918. 30 1 [North] 1 [Urb… 1 [Hous… 1 [Yes]
## 8 20200013 21 757. 30 1 [North] 1 [Urb… 1 [Hous… 1 [Yes]
## 9 20200014 21 1027. 30 1 [North] 1 [Urb… 1 [Hous… 0 [No]
## 10 20200015 21 1005. 30 1 [North] 1 [Urb… 1 [Hous… 0 [No]
## # ℹ 6,964 more rows
## # ℹ 113 more variables: sex <dbl+lbl>, age <dbl+lbl>, marital <dbl+lbl>,
## # race <dbl+lbl>, children <dbl+lbl>, grandchildren <dbl+lbl>,
## # siblings <dbl+lbl>, education <dbl+lbl>, dis_med <dbl+lbl>,
## # dis_social <dbl+lbl>, dis_work <dbl+lbl>, dis_family <dbl+lbl>,
## # dis_place <dbl+lbl>, work_12 <dbl+lbl>, workability <dbl+lbl>,
## # retired <dbl+lbl>, reasonretire <dbl+lbl>, incomescore <dbl+lbl>, …
PPCR Staff has described the variables included in this dataset with
the following:
Let’s now look at the some of the individual variables to understand what they mean
unique(PPCRdata$pain)
## <labelled<double>[2]>: Do you feel pain that bothers you frequently?
## [1] 1 0
##
## Labels:
## value label
## 0 No
## 1 Yes
## 8 Does not apply
## 9 Didn’t know/didn’t answer
unique(PPCRdata$p_intensity)
## <labelled<double>[4]>: How intense is this pain most of the time?
## [1] 2 NA 3 1
##
## Labels:
## value label
## 1 Soft/weak
## 2 Moderate
## 3 Intense/strong
## 8 Does not apply
## 9 Didn’t know/didn’t answer
unique(PPCRdata$sleepquality)
## <labelled<double>[6]>: How would you evaluate the quality of your sleep?
## [1] 4 2 3 1 5 NA
##
## Labels:
## value label
## 1 Very good
## 2 Good
## 3 Fair
## 4 Bad
## 5 Very bad
## 9 Didn’t know/didn’t answer
unique(PPCRdata$sleepproblems)
## <labelled<double>[3]>: Has a doctor ever told you that you have sleep problems?
## [1] 0 1 NA
##
## Labels:
## value label
## 0 No
## 1 Yes
## 8 Does not apply
## 9 Didn’t know/didn’t answer
We can then see that pain and sleepproblems are binary variables, while p_intensity and sleepquality are ordinal variables, although they aren’t coded in this manner yet.
Now let’s transform p_intensity and sleepquality into categorical variables and pain and sleepproblems into dichotomous variables, as they were read as numerical variables with labels.
## Using the as_factor function from haven to transform the variables
## with the labels as names for the levels in the factors
PPCRdata$sleepquality <- as_factor(PPCRdata$sleepquality)
PPCRdata$p_intensity <- as_factor(PPCRdata$p_intensity)
PPCRdata$pain <- as_factor(PPCRdata$pain)
PPCRdata$sleepproblems <- as_factor(PPCRdata$sleepproblems)
levels(PPCRdata$sleepquality)
## [1] "Very good" "Good"
## [3] "Fair" "Bad"
## [5] "Very bad" "Didn’t know/didn’t answer"
## Reversing the order of factors, as initially the order was wrong in sleepquality
PPCRdata$sleepquality <- fct_rev(PPCRdata$sleepquality)
levels(PPCRdata$sleepquality)
## [1] "Didn’t know/didn’t answer" "Very bad"
## [3] "Bad" "Fair"
## [5] "Good" "Very good"
## doing the same with genhealth as it will be used for a sensitivity analysis
# adjusting the genhealth variable
PPCRdata %>% mutate(genhealth = fct_rev(as_factor(genhealth))) -> PPCRdata
Other variables that we’re going to use in our model that haven’t been coded properly as well include: depression, alcohol, fracture, cancer, diabetes, dementia, arthritis, heartdisease and lungdisease.
## Showing that the variables are coded as numerical variables
unique(PPCRdata$depression)
## <labelled<double>[3]>: Has a doctor ever told you that you have depression?
## [1] 1 0 NA
##
## Labels:
## value label
## 0 No
## 1 Yes
## 9 Didn’t know/didn’t answer
## Commented the other variables for brevity, but they were all numerical
## unique(PPCRdata$sex)
## unique(PPCRdata$alcohol)
## unique(PPCRdata$fracture)
## unique(PPCRdata$cancer)
## unique(PPCRdata$diabetes)
## unique(PPCRdata$dementia)
## unique(PPCRdata$arthritis)
## unique(PPCRdata$heartdisease)
## unique(PPCRdata$lungdisease)
## Using the as_factor function from haven to transform the variables
## with the labels as names for the levels in the factors
PPCRdata$depression <- as_factor(PPCRdata$depression)
PPCRdata$sex <- as_factor(PPCRdata$sex)
PPCRdata$alcohol <- as_factor(PPCRdata$alcohol)
PPCRdata$fracture <- as_factor(PPCRdata$fracture)
PPCRdata$cancer <- as_factor(PPCRdata$cancer)
PPCRdata$diabetes <- as_factor(PPCRdata$diabetes)
PPCRdata$dementia <- as_factor(PPCRdata$dementia)
PPCRdata$arthritis <- as_factor(PPCRdata$arthritis)
PPCRdata$heartdisease <- as_factor(PPCRdata$heartdisease)
PPCRdata$lungdisease <- as_factor(PPCRdata$lungdisease)
## Converting age to numeric
PPCRdata$age <- as.numeric(PPCRdata$age)
Let’s now create a table summarizing all the information we have collected.
## Table 1
## divided by pain
## selecting all the variables that will be used
PPCRdata %>% dplyr::select(age,bmi,sex,pain,p_intensity,sleepquality,
sleepproblems,depression,alcohol,fracture,cancer,
diabetes,dementia,arthritis,heartdisease,
lungdisease) %>%
## recoding the variables to not have levels that aren't used, so
## that they don't clutter the table
mutate(pain = recode(droplevels(pain), No = "No Pain", Yes = "Pain"),
p_intensity = droplevels(p_intensity),
sleepquality = droplevels(sleepquality),
depression = droplevels(depression),
alcohol = droplevels(alcohol),
diabetes = droplevels(diabetes),
dementia = droplevels(dementia),
arthritis = droplevels(arthritis),
cancer = droplevels(cancer),
sleepproblems = droplevels(sleepproblems)) %>%
## determining how each type of variable will be presented
## continous as mean and sd and categorical as absolute relative
## frequencies
tbl_summary(statistic = list(all_continuous() ~ "{mean} ({sd})",
all_categorical() ~ "{n} ({p}%)"),
## not showing NA's in each variable
missing = "no",
## dividing the observations by pain
by = pain,
## presenting the variables better
label = list(p_intensity ~ "Pain Intensity",
sleepquality ~ "Sleep Quality",
depression ~ "Depression",
alcohol ~ "Alcohol Consumption",
diabetes ~ "Diabetes Mellitus",
dementia ~ "Dementia",
arthritis ~ "Arthritis",
cancer ~ "Cancer",
sleepproblems ~ "Sleep Problems",
age ~ "Age",
bmi ~ "Body Mass Index",
fracture ~ "Fracture of Hip or Wrist",
heartdisease ~ "Heart Disease",
lungdisease ~ "Lung Disease")) %>%
## title of the variable column
modify_header(label = "**Variable**") %>%
## title of the table
modify_caption("Subjects baseline characteristics") %>%
bold_labels() %>%
## since we're not showing NA's in each variable, showing the
## N for each specific variable
add_n() %>%
## adding a column for overall observations, not divided by pain
add_overall(last = TRUE) -> table1
## checking if table has been saved already
if (!file.exists("C:/Users/Pedro Henrique/Documents/PPCR/Data Analysis Project/DataProjectPPCR/tables/table1.docx")){
## if it hasn't, saving the table for publication
table1 %>%
as_gt() %>%
gt::gtsave(filename = "table1.docx", path = "C:/Users/Pedro Henrique/Documents/PPCR/Data Analysis Project/DataProjectPPCR/tables")
}
## printing the table
table1
| Variable | N | No Pain, N = 4,3911 | Pain, N = 2,5831 | Overall, N = 6,9741 |
|---|---|---|---|---|
| Age | 6,974 | 66 (9) | 65 (9) | 66 (9) |
| Body Mass Index | 6,850 | 27.5 (4.9) | 28.6 (5.7) | 27.9 (5.2) |
| Sex | 6,974 | |||
| Female | 2,462 (56%) | 1,766 (68%) | 4,228 (61%) | |
| Male | 1,929 (44%) | 817 (32%) | 2,746 (39%) | |
| Pain Intensity | 2,583 | |||
| Soft/weak | 0 (NA%) | 318 (12%) | 318 (12%) | |
| Moderate | 0 (NA%) | 1,348 (52%) | 1,348 (52%) | |
| Intense/strong | 0 (NA%) | 917 (36%) | 917 (36%) | |
| Sleep Quality | 6,968 | |||
| Very bad | 70 (1.6%) | 162 (6.3%) | 232 (3.3%) | |
| Bad | 352 (8.0%) | 536 (21%) | 888 (13%) | |
| Fair | 907 (21%) | 724 (28%) | 1,631 (23%) | |
| Good | 2,387 (54%) | 932 (36%) | 3,319 (48%) | |
| Very good | 673 (15%) | 225 (8.7%) | 898 (13%) | |
| Sleep Problems | 6,941 | 293 (6.7%) | 495 (19%) | 788 (11%) |
| Depression | 6,963 | 422 (9.6%) | 477 (19%) | 899 (13%) |
| Alcohol Consumption | 6,961 | |||
| Never | 3,268 (75%) | 2,059 (80%) | 5,327 (77%) | |
| Less than once a month | 484 (11%) | 197 (7.6%) | 681 (9.8%) | |
| Once a month or more | 631 (14%) | 322 (12%) | 953 (14%) | |
| Fracture of Hip or Wrist | 6,974 | 42 (1.0%) | 47 (1.8%) | 89 (1.3%) |
| Cancer | 6,970 | 175 (4.0%) | 117 (4.5%) | 292 (4.2%) |
| Diabetes Mellitus | 6,943 | 695 (16%) | 563 (22%) | 1,258 (18%) |
| Dementia | 6,969 | 47 (1.1%) | 69 (2.7%) | 116 (1.7%) |
| Arthritis | 6,937 | 546 (12%) | 922 (36%) | 1,468 (21%) |
| Heart Disease | 6,968 | 260 (5.9%) | 292 (11%) | 552 (7.9%) |
| Lung Disease | 6,968 | 159 (3.6%) | 233 (9.0%) | 392 (5.6%) |
| 1 Mean (SD); n (%) | ||||
It’s important to note that the pain intensity variable doesn’t have any observations for those that do not have pain in the first place, as should be expected.
At this point we have coded all the variables correctly, so we can proceed with the analysis.
This analysis checks for the association between the pain variable and the sleepquality variable, without adjustments. As pain is a binary variable, we’re going to run a logistic regression.
## using the generalized linear model function
## with the family = "binomial" to create a logistical
## regression
model1 <- glm (data = PPCRdata, formula = pain ~ sleepquality,
family = "binomial")
summary(model1)
##
## Call:
## glm(formula = pain ~ sleepquality, family = "binomial", data = PPCRdata)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.8391 0.1430 5.866 4.45e-09 ***
## sleepqualityBad -0.4186 0.1586 -2.639 0.00832 **
## sleepqualityFair -1.0645 0.1515 -7.028 2.10e-12 ***
## sleepqualityGood -1.7796 0.1482 -12.011 < 2e-16 ***
## sleepqualityVery good -1.9347 0.1624 -11.910 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 9184.1 on 6967 degrees of freedom
## Residual deviance: 8669.3 on 6963 degrees of freedom
## (6 observations deleted due to missingness)
## AIC: 8679.3
##
## Number of Fisher Scoring iterations: 4
## coef(model1) extracts the coefficients from the model
## confint(model1) calculates confidence intervals (95%)
## cbind then creates a table by joining the two by column
## finally, we exponentiate all the results using the base e
## in order to transform logit units into regular units
exp(cbind(OR = coef(model1),confint(model1)))
## Waiting for profiling to be done...
## OR 2.5 % 97.5 %
## (Intercept) 2.3142857 1.7567890 3.0806419
## sleepqualityBad 0.6579686 0.4799679 0.8945713
## sleepqualityFair 0.3449168 0.2550187 0.4621682
## sleepqualityGood 0.1687122 0.1255215 0.2245582
## sleepqualityVery good 0.1444610 0.1045639 0.1978048
## another way, which is simpler and does the same, is to use
## the broom::tidy function
model1Tidy <- tidy(model1, conf.int = TRUE, exponentiate = TRUE)
model1Tidy %>% rename(Odds.Ratio = estimate) -> model1Tidy
model1Tidy
## # A tibble: 5 × 7
## term Odds.Ratio std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 2.31 0.143 5.87 4.45e- 9 1.76 3.08
## 2 sleepqualityBad 0.658 0.159 -2.64 8.32e- 3 0.480 0.895
## 3 sleepqualityFair 0.345 0.151 -7.03 2.10e-12 0.255 0.462
## 4 sleepqualityGood 0.169 0.148 -12.0 3.10e-33 0.126 0.225
## 5 sleepqualityVery g… 0.144 0.162 -11.9 1.05e-32 0.105 0.198
## as we will be doing this extensively for each model, creating a function
## in order to not repeat the code every time
tbl_pub <- function(model, path, file, labels){
model %>%
tbl_regression(exponentiate = TRUE, conf.int = TRUE,
label = labels) %>%
bold_p() %>%
bold_labels() %>%
italicize_levels() %>%
modify_header(label = "**Variable**") -> tblx
## checking if the model has been saved already
if (!file.exists(paste(path,file,sep = ""))){
## if it hasn't, saving the model for publication
tblx %>%
as_gt() %>%
gt::gtsave(filename = file, path = path)
}
tblx
}
## creating a label list
labelList <- list(sleepquality ~ "Sleep Quality",
pain ~ "Pain",
age ~ "Age",
bmi ~ "Body Mass Index",
depression ~ "Depression",
alcohol ~ "Alcohol",
fracture ~ "Fracture",
cancer ~ "Cancer",
diabetes ~ "Diabetes",
dementia ~ "Dementia",
arthritis ~ "Arthritis",
heartdisease ~ "Heart Disease",
lungdisease ~ "Lung Disease",
sex ~ "Sex")
## creating a path variable
tablePath <- "C:/Users/Pedro Henrique/Documents/PPCR/Data Analysis Project/DataProjectPPCR/tables/"
## creating and saving the publication table
tbl_pub (model1, tablePath, "model1.docx", labelList[1]) -> tbl1
## let's plot these coefficients and their CI's to increase
## readability
## setting up two label arrays for reuse later
labels1 <- c("sleepqualityVery good" = "Very Good", "sleepqualityGood" = "Good",
"sleepqualityFair" = "Fair", "sleepqualityBad" = "Bad")
labels2 <-c("sleepqualityVery good" = "Sleep Quality: Very Good",
"sleepqualityGood" = "Sleep Quality: Good",
"sleepqualityFair" = "Sleep Quality: Fair",
"sleepqualityBad" = "Sleep Quality: Bad",
"sleepqualityVery Bad" = "Reference - Sleep Quality: Very Bad",
"alcoholOnce a month or more" = "Alcohol Use: Once a month of more",
"alcoholLess than once a month" = "Alcohol Use: Less than once a month",
"alcoholNever" = "Reference - Alcohol Use: Never",
"depressionYes" = "Depression",
"alcoholYes" = "Alcohol Consumption",
"diabetesYes" = "Diabetes Mellitus",
"dementiaYes" = "Dementia",
"arthritisYes" = "Arthritis",
"cancerYes" = "Cancer",
"age" = "Age",
"bmi" = "Body Mass Index",
"fractureYes" = "Fracture of Hip or Wrist",
"heartdiseaseYes" = "Heart Disease",
"lungdiseaseYes" = "Lung Disease",
"painYes" = "Pain",
"sexMale" = "Male")
## creating a plot
model1Tidy %>% ggplot (aes(Odds.Ratio, term, xmin = conf.low,
xmax = conf.high, height = 0)) +
## inserting a point for the odds ratio
geom_point() +
## drawing a line at the 1 to improve readability
geom_vline(xintercept = 1, lty = 4) +
## drawing an horizontal line to represent the
## confidence interval
geom_errorbarh() +
## removing the intercept term and changing labels
scale_y_discrete(
limits = setdiff(model1Tidy$term, "(Intercept)"),
labels = labels1) +
labs(x = "Odds Ratio", y = "Sleep Quality",
title = "Presence of Pain",
subtitle = "Unadjusted odds ratio, in comparison to \"Very Bad\" Sleep Quality",
caption = paste("N =", nobs(model1))) +
## adjusting the ticks on the x axis and its range
scale_x_continuous (breaks = seq(0,1.2,0.2),
limits = c(0,1.2)) -> plot1
## checking if plot has been saved already
if (!file.exists("C:/Users/Pedro Henrique/Documents/PPCR/Data Analysis Project/DataProjectPPCR/plots/plot1.png")){
## if it hasn't, saving the plot for publication
ggsave("plot1.png", plot1,
path = "C:/Users/Pedro Henrique/Documents/PPCR/Data Analysis Project/DataProjectPPCR/plots", height = 3)
}
plot1
This analysis allows us to determine that, when using a reference level of sleepquality that is very bad, the odds of experiencing pain when sleepquality increases are significantly lower (the odds ratio of the comparison of each other level with the very bad level of sleep quality can be seen in the table and plot above).
In this next model, we’re going to consider sleepquality as a continuous variable.
## doing the transformation and saving on another variable to not lose the
## categorical information (names for each level)
PPCRdata %>% mutate(sleepqualitynum =
case_when(PPCRdata$sleepquality == 'Very bad' ~ 1,
PPCRdata$sleepquality == 'Bad' ~ 2,
PPCRdata$sleepquality == 'Fair' ~ 3,
PPCRdata$sleepquality == 'Good' ~ 4,
PPCRdata$sleepquality == 'Very good' ~ 5)) -> PPCRdata
Now that the variable is transformed to numeric, let’s run the model again.
glm (data = PPCRdata, formula = pain ~ sleepqualitynum,
family = "binomial") -> model2
model2 %>% tidy(conf.int = TRUE, exponentiate = TRUE) %>%
rename(Odds.Ratio = estimate) -> model2Tidy
model2Tidy
## # A tibble: 2 × 7
## term Odds.Ratio std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 4.33 0.0963 15.2 3.29e- 52 3.59 5.23
## 2 sleepqualitynum 0.564 0.0269 -21.3 2.07e-100 0.535 0.594
model2Tidy %>% ggplot (aes(Odds.Ratio, term, xmin = conf.low,
xmax = conf.high, height = 0)) +
## inserting a point for the odds ratio
geom_point() +
## drawing a line at the 1 to improve readability
geom_vline(xintercept = 1, lty = 4) +
## drawing an horizontal line to represent the
## confidence interval
geom_errorbarh() +
scale_y_discrete(
limits = setdiff(model2Tidy$term, "(Intercept)"))
In order to determine which model is best, we’re going to look at the AIC values. It looks like changing the variable to continues increases the AIC, which tells us that the model is behaving worse than the categorical. Let’s keep it categorical, then.
This analysis checks for the association between the pain variable and the sleepquality variable, adjusting the model for bmi, comorbidities and depression, as was done in the original article. As pain is a binary variable, we’re going to run a logistic regression.
model3 <- glm(data = PPCRdata, formula = pain ~ sleepquality + age + bmi +
depression + alcohol + fracture + cancer + diabetes +
dementia + arthritis + heartdisease + lungdisease,
family = "binomial")
summary(model3)
##
## Call:
## glm(formula = pain ~ sleepquality + age + bmi + depression +
## alcohol + fracture + cancer + diabetes + dementia + arthritis +
## heartdisease + lungdisease, family = "binomial", data = PPCRdata)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.178116 0.306158 0.582 0.560716
## sleepqualityBad -0.358261 0.172220 -2.080 0.037502 *
## sleepqualityFair -0.884865 0.165214 -5.356 8.51e-08 ***
## sleepqualityGood -1.502021 0.161661 -9.291 < 2e-16 ***
## sleepqualityVery good -1.671866 0.176887 -9.452 < 2e-16 ***
## age -0.011630 0.003067 -3.792 0.000150 ***
## bmi 0.028117 0.005353 5.253 1.50e-07 ***
## depressionYes 0.309084 0.082688 3.738 0.000186 ***
## alcoholLess than once a month -0.349505 0.097373 -3.589 0.000332 ***
## alcoholOnce a month or more -0.062550 0.082267 -0.760 0.447056
## fractureYes 0.445831 0.243776 1.829 0.067421 .
## cancerYes -0.019709 0.136656 -0.144 0.885322
## diabetesYes 0.194192 0.071420 2.719 0.006548 **
## dementiaYes 0.264841 0.216723 1.222 0.221699
## arthritisYes 1.228566 0.066383 18.507 < 2e-16 ***
## heartdiseaseYes 0.455709 0.101336 4.497 6.89e-06 ***
## lungdiseaseYes 0.818769 0.116491 7.029 2.09e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 8872.2 on 6752 degrees of freedom
## Residual deviance: 7803.9 on 6736 degrees of freedom
## (221 observations deleted due to missingness)
## AIC: 7837.9
##
## Number of Fisher Scoring iterations: 4
## creating and saving publication-ready table
tbl_pub (model3, tablePath, "model3.docx", labelList[-c(2,14)]) -> tbl3
tbl_merge(list(tbl1,tbl3),
tab_spanner = c("**Unadjusted**", "**Adjusted**")) -> merged1
## The merging columns (variable name, variable label, row type, and label column)
## are not unique and the merge may fail or result in a malformed table. If you
## previously 'tbl_stack'ed your tables, then 'tbl_merge'ing before you 'tbl_stack'
## may resolve the issue.
if(!file.exists(paste(tablePath,"merged1.docx", sep=""))){
merged1 %>% as_gt() %>% gt::gtsave(filename = "merged1.docx", path = tablePath)
}
Let’s adjust the results and plot the odds ratios.
model3Tidy <- tidy(model3, conf.int = TRUE, exponentiate = TRUE)
model3Tidy %>% rename(Odds.Ratio = estimate) -> model3Tidy
model3Tidy
## # A tibble: 17 × 7
## term Odds.Ratio std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 1.19 0.306 0.582 5.61e- 1 0.657 2.18
## 2 sleepqualityBad 0.699 0.172 -2.08 3.75e- 2 0.497 0.976
## 3 sleepqualityFair 0.413 0.165 -5.36 8.51e- 8 0.297 0.568
## 4 sleepqualityGood 0.223 0.162 -9.29 1.53e-20 0.161 0.305
## 5 sleepqualityVery … 0.188 0.177 -9.45 3.34e-21 0.132 0.265
## 6 age 0.988 0.00307 -3.79 1.50e- 4 0.982 0.994
## 7 bmi 1.03 0.00535 5.25 1.50e- 7 1.02 1.04
## 8 depressionYes 1.36 0.0827 3.74 1.86e- 4 1.16 1.60
## 9 alcoholLess than … 0.705 0.0974 -3.59 3.32e- 4 0.582 0.852
## 10 alcoholOnce a mon… 0.939 0.0823 -0.760 4.47e- 1 0.799 1.10
## 11 fractureYes 1.56 0.244 1.83 6.74e- 2 0.969 2.53
## 12 cancerYes 0.980 0.137 -0.144 8.85e- 1 0.748 1.28
## 13 diabetesYes 1.21 0.0714 2.72 6.55e- 3 1.06 1.40
## 14 dementiaYes 1.30 0.217 1.22 2.22e- 1 0.853 2.00
## 15 arthritisYes 3.42 0.0664 18.5 1.81e-76 3.00 3.89
## 16 heartdiseaseYes 1.58 0.101 4.50 6.89e- 6 1.29 1.92
## 17 lungdiseaseYes 2.27 0.116 7.03 2.09e-12 1.81 2.85
## adding the reference terms to the data frame to allow for a better plot
model3Tidy %>% add_row(term = "sleepqualityVery Bad", Odds.Ratio = -99, std.error = 0, statistic = 0, p.value = 0, conf.low = 0, conf.high = 0, .before = 2) %>%
add_row(term = "alcoholNever", Odds.Ratio = -99, std.error = 0, statistic = 0, p.value = 0, conf.low = 0, conf.high = 0, .before = 10) %>%
## filtering the intercept
filter (term !="(Intercept)") %>%
## saving the result so that we can refer to it in the next call
{. ->> model3Tidy}%>%
## adding a variable to determine whether the observation
## is a reference or not. Initially adding everything as FALSE
add_column(.after = "conf.high",
reference = rep(FALSE, length.out = nrow(model3Tidy))) %>%
## now the appropriate ones to TRUE
dplyr::mutate(reference = case_when
(term %in% c("Very Bad", "Never") ~ TRUE,
TRUE ~ FALSE)) %>%
## create a Variable column
dplyr::mutate(.after= "term",
Variable = case_when(grepl("sleep", term) ~ "Sleep Quality",
grepl("age", term) ~ "Age",
grepl("bmi", term) ~ "Body Mass Index",
grepl("depression",term) ~ "Depression",
grepl("alcohol",term) ~ "Alcohol",
grepl("fracture", term) ~ "Fracture",
grepl("cancer",term) ~ "Cancer",
grepl("diabetes",term) ~ "Diabetes",
grepl("dementia", term) ~ "Dementia",
grepl("arthritis",term) ~ "Arthritis",
grepl("heart",term) ~ "Heart Disease",
grepl("lung",term) ~ "Lung Disease")) %>%
## transform the variables to factor
dplyr::mutate(Variable = as_factor(Variable)) %>%
dplyr::mutate(term = as_factor(term)) -> model3Tidy
model3Tidy %>% ggplot (aes(x = Odds.Ratio, y = term, xmin = conf.low,
xmax = conf.high, height = 0, alpha = !reference)) +
## inserting a point for the odds ratio
geom_point(aes(alpha = !reference)) +
## drawing a line at the 1 to improve readability
geom_vline(xintercept = 1, lty = 4) +
## drawing an horizontal line to represent the
## confidence interval
geom_errorbarh() +
scale_y_discrete(labels = labels2) +
labs(x = "Odds Ratio", y = "",
title = "Presence of Pain",
subtitle = "Adjusted Odds Ratios",
caption = paste("N =", nobs(model3))) +
facet_grid(Variable ~ ., scales = "free_y", space = "free_y") +
scale_alpha_identity() +
# hide facet labels
theme(strip.background = element_blank(),
strip.text = element_blank()) +
# remove spacing between facets
theme(panel.spacing = unit(0, "pt")) +
scale_x_continuous(breaks = seq(0,3, by = 0.5),
limits = (c(0,2.9))) -> plot3
## checking if plot has been saved already
if (!file.exists("C:/Users/Pedro Henrique/Documents/PPCR/Data Analysis Project/DataProjectPPCR/plots/plot3.png")){
## if it hasn't, saving the plot for publication
ggsave("plot3.png", plot3,
path = "C:/Users/Pedro Henrique/Documents/PPCR/Data Analysis Project/DataProjectPPCR/plots")
}
plot3
If we refer to the analysis of the unadjusted model, the results are the same for the sleepquality variable. In summary, increasing levels of sleep quality are decrease the odds of pain being present.
We can also test how the adjusted model behaves with sleep quality as a continuous variable.
model4 <- glm(data = PPCRdata, formula = pain ~ sleepqualitynum + age + bmi +
depression + alcohol + fracture + cancer + diabetes +
dementia + arthritis + heartdisease + lungdisease,
family = "binomial")
summary(model4)
##
## Call:
## glm(formula = pain ~ sleepqualitynum + age + bmi + depression +
## alcohol + fracture + cancer + diabetes + dementia + arthritis +
## heartdisease + lungdisease, family = "binomial", data = PPCRdata)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.734425 0.282044 2.604 0.009216 **
## sleepqualitynum -0.489627 0.029134 -16.806 < 2e-16 ***
## age -0.011787 0.003062 -3.849 0.000119 ***
## bmi 0.028120 0.005343 5.263 1.42e-07 ***
## depressionYes 0.304587 0.082189 3.706 0.000211 ***
## alcoholLess than once a month -0.357506 0.097191 -3.678 0.000235 ***
## alcoholOnce a month or more -0.063880 0.082100 -0.778 0.436524
## fractureYes 0.448646 0.242928 1.847 0.064772 .
## cancerYes -0.015393 0.136500 -0.113 0.910215
## diabetesYes 0.198541 0.071297 2.785 0.005358 **
## dementiaYes 0.276703 0.216056 1.281 0.200299
## arthritisYes 1.234864 0.066287 18.629 < 2e-16 ***
## heartdiseaseYes 0.461404 0.101158 4.561 5.09e-06 ***
## lungdiseaseYes 0.825247 0.116301 7.096 1.29e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 8872.2 on 6752 degrees of freedom
## Residual deviance: 7818.1 on 6739 degrees of freedom
## (221 observations deleted due to missingness)
## AIC: 7846.1
##
## Number of Fisher Scoring iterations: 4
model4Tidy <- tidy(model4, conf.int = TRUE, exponentiate = TRUE)
model4Tidy %>% rename(Odds.Ratio = estimate) -> model4Tidy
model4Tidy
## # A tibble: 14 × 7
## term Odds.Ratio std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 2.08 0.282 2.60 9.22e- 3 1.20 3.63
## 2 sleepqualitynum 0.613 0.0291 -16.8 2.21e-63 0.579 0.649
## 3 age 0.988 0.00306 -3.85 1.19e- 4 0.982 0.994
## 4 bmi 1.03 0.00534 5.26 1.42e- 7 1.02 1.04
## 5 depressionYes 1.36 0.0822 3.71 2.11e- 4 1.15 1.59
## 6 alcoholLess than … 0.699 0.0972 -3.68 2.35e- 4 0.577 0.845
## 7 alcoholOnce a mon… 0.938 0.0821 -0.778 4.37e- 1 0.798 1.10
## 8 fractureYes 1.57 0.243 1.85 6.48e- 2 0.973 2.53
## 9 cancerYes 0.985 0.136 -0.113 9.10e- 1 0.752 1.28
## 10 diabetesYes 1.22 0.0713 2.78 5.36e- 3 1.06 1.40
## 11 dementiaYes 1.32 0.216 1.28 2.00e- 1 0.864 2.02
## 12 arthritisYes 3.44 0.0663 18.6 1.87e-77 3.02 3.92
## 13 heartdiseaseYes 1.59 0.101 4.56 5.09e- 6 1.30 1.93
## 14 lungdiseaseYes 2.28 0.116 7.10 1.29e-12 1.82 2.87
model4Tidy %>% ggplot (aes(Odds.Ratio, term, xmin = conf.low,
xmax = conf.high, height = 0)) +
## inserting a point for the odds ratio
geom_point() +
## drawing a line at the 1 to improve readability
geom_vline(xintercept = 1, lty = 4) +
## drawing an horizontal line to represent the
## confidence interval
geom_errorbarh() +
scale_y_discrete(
limits = setdiff(model4Tidy$term, "(Intercept)"))
The results point towards increases in the continuous variable also being significantly associated with a decrease in the odds of pain being present. Looking at the AIC, it still seems as if the model with sleepquality coded as categorical behaves better.
At this point, I haven’t yet learned how to properly test for the assumptions of the logistic regression model. On a later opportunity, I’ll try to remedy this situation. For now, I’ll include some commands that were listed as useful in the R you ready for R? e-book.
library(blorr)
blr_model_fit_stats(model3)
## Model Fit Statistics
## ---------------------------------------------------------------------------------
## Log-Lik Intercept Only: -4596.949 Log-Lik Full Model: -3901.953
## Deviance(6736): 7803.906 LR(16): 1389.993
## Prob > LR: 0.000
## MCFadden's R2 0.151 McFadden's Adj R2: 0.147
## ML (Cox-Snell) R2: 0.181 Cragg-Uhler(Nagelkerke) R2: 0.247
## McKelvey & Zavoina's R2: 0.194 Efron's R2: 0.154
## Count R2: 0.701 Adj Count R2: 0.182
## BIC: 7953.807 AIC: 7837.906
## ---------------------------------------------------------------------------------
blr_test_hosmer_lemeshow(model3)
## Partition for the Hosmer & Lemeshow Test
## --------------------------------------------------------------
## def = 1 def = 0
## Group Total Observed Expected Observed Expected
## --------------------------------------------------------------
## 1 676 107 107.90 569 568.10
## 2 675 107 128.45 568 546.55
## 3 675 134 142.86 541 532.14
## 4 675 162 160.29 513 514.71
## 5 676 210 188.13 466 487.87
## 6 675 215 226.35 460 448.65
## 7 675 313 272.65 362 402.35
## 8 675 323 327.53 352 347.47
## 9 675 389 398.34 286 276.66
## 10 676 513 520.51 163 155.49
## --------------------------------------------------------------
##
## Goodness of Fit Test
## ------------------------------
## Chi-Square DF Pr > ChiSq
## ------------------------------
## 20.6763 8 0.0081
## ------------------------------
blr_roc_curve(blr_gains_table(model3))
car::vif(model3)
## GVIF Df GVIF^(1/(2*Df))
## sleepquality 1.044858 4 1.005500
## age 1.080334 1 1.039391
## bmi 1.037185 1 1.018423
## depression 1.047143 1 1.023300
## alcohol 1.044415 2 1.010923
## fracture 1.006378 1 1.003184
## cancer 1.008435 1 1.004208
## diabetes 1.033202 1 1.016465
## dementia 1.012373 1 1.006168
## arthritis 1.026991 1 1.013406
## heartdisease 1.021015 1 1.010453
## lungdisease 1.004299 1 1.002147
In order to do this analysis I’ll try to follow the instructions from the UCLA OARC - Ordinal Logistic Regression | R Data Analysis Example.
As the original article by Montelhão et al tested for bidirectional associations, let’s try to replicate that as well. In this case, the dependent variable is ordinal, therefore we will run an ordinal logistical regression.
## creating an ordinal version of sleepquality
PPCRdata %>% mutate (sleepqualityord = factor(PPCRdata$sleepquality,
ordered = TRUE)) -> PPCRdata
## loading the library required to run the model
library(MASS)
## running the model
model5 <- polr (data = PPCRdata, formula = sleepqualityord ~ pain + depression +
alcohol + fracture + cancer + diabetes + dementia +
arthritis + heartdisease + lungdisease, Hess = TRUE)
summary(model5)
## Call:
## polr(formula = sleepqualityord ~ pain + depression + alcohol +
## fracture + cancer + diabetes + dementia + arthritis + heartdisease +
## lungdisease, data = PPCRdata, Hess = TRUE)
##
## Coefficients:
## Value Std. Error t value
## painYes -0.85302 0.05024 -16.9801
## depressionYes -0.82700 0.06790 -12.1801
## alcoholLess than once a month 0.07362 0.07643 0.9633
## alcoholOnce a month or more 0.06820 0.06710 1.0165
## fractureYes -0.36430 0.20135 -1.8093
## cancerYes -0.01869 0.11499 -0.1625
## diabetesYes -0.11152 0.05947 -1.8752
## dementiaYes -0.90439 0.17339 -5.2160
## arthritisYes -0.30678 0.05771 -5.3158
## heartdiseaseYes -0.28727 0.08436 -3.4051
## lungdiseaseYes -0.14181 0.09885 -1.4347
##
## Intercepts:
## Value Std. Error t value
## Very bad|Bad -4.1364 0.0780 -53.0304
## Bad|Fair -2.3132 0.0473 -48.9539
## Fair|Good -0.9812 0.0383 -25.6396
## Good|Very good 1.5042 0.0430 34.9433
##
## Residual Deviance: 17569.05
## AIC: 17599.05
## (98 observations deleted due to missingness)
## tidy(model5, exponentiate = TRUE, p.values = TRUE, conf.int = TRUE)
The tidy function is not working for the polr model, even though it should. I tried to debug it, but couldn’t. I will then describe a more manual approach, as was described in the UCLA Office of Advanced Research Computing (OARC) page for Ordinal Regression.
## getting the coefficients
ctable <- coef(summary(model5))
## getting the respective p-values from the normal distribution
p <- pnorm(abs(ctable[, "t value"]), lower.tail = FALSE) * 2
## creating a table with the respective p-values added
ctable <- cbind(ctable, "p value" = p)
ctable
## Value Std. Error t value p value
## painYes -0.85301696 0.05023641 -16.9800548 1.153784e-64
## depressionYes -0.82699798 0.06789738 -12.1801156 3.967194e-34
## alcoholLess than once a month 0.07362053 0.07642528 0.9633007 3.353966e-01
## alcoholOnce a month or more 0.06820388 0.06709866 1.0164715 3.094049e-01
## fractureYes -0.36430180 0.20135407 -1.8092596 7.041068e-02
## cancerYes -0.01869110 0.11498945 -0.1625462 8.708757e-01
## diabetesYes -0.11151556 0.05946946 -1.8751736 6.076884e-02
## dementiaYes -0.90439261 0.17338723 -5.2160278 1.828008e-07
## arthritisYes -0.30678023 0.05771145 -5.3157605 1.062128e-07
## heartdiseaseYes -0.28726601 0.08436256 -3.4051363 6.613103e-04
## lungdiseaseYes -0.14181420 0.09884818 -1.4346668 1.513821e-01
## Very bad|Bad -4.13641304 0.07800080 -53.0303894 0.000000e+00
## Bad|Fair -2.31320370 0.04725268 -48.9539177 0.000000e+00
## Fair|Good -0.98124293 0.03827055 -25.6396324 5.518018e-145
## Good|Very good 1.50417355 0.04304618 34.9432561 1.639373e-267
## creating confidence intervals
ci<-confint.default(model5)
colnames(ci) <- c("conf.low", "conf.high")
## exponentiating the log-odds and the confidence intervals
## in order to create the odds-ratios and the respective CI's
model5OddsRatio <- exp(cbind(OR = coef(model5), ci))
## converting rownames to columns, as this will be necessary to build the plot
model5OddsRatio <- as_tibble(rownames_to_column(as.data.frame(model5OddsRatio)))
model5OddsRatio %>% rename(term = rowname) -> model5OddsRatio
Let’s build a plot using these values.
model5OddsRatio %>% ggplot (aes(OR, term, xmin = conf.low,
xmax = conf.high, height = 0)) +
geom_point() +
geom_vline(xintercept = 1, lty = 4) +
geom_errorbarh()
The plot allows us to visually determine that the presence of pain decreases the odds of sleepquality being higher. Unfortunately, the model has a high AIC.
In order to be able to compare the adjusted analysis, let’s proceed with one that is unadjusted as well.
## this code is equivalent to the one used for the adjusted analysis, just
## with no covariates added to the model
model6 <- polr (data = PPCRdata, formula = sleepqualityord ~ pain, Hess = TRUE)
## Warning in polr(data = PPCRdata, formula = sleepqualityord ~ pain, Hess =
## TRUE): design appears to be rank-deficient, so dropping some coefs
summary(model6)
## Call:
## polr(formula = sleepqualityord ~ pain, data = PPCRdata, Hess = TRUE)
##
## Coefficients:
## Value Std. Error t value
## painYes -1.023 0.04753 -21.53
##
## Intercepts:
## Value Std. Error t value
## Very bad|Bad -3.8629 0.0717 -53.8797
## Bad|Fair -2.1040 0.0402 -52.2757
## Fair|Good -0.8107 0.0313 -25.9252
## Good|Very good 1.6254 0.0379 42.8842
##
## Residual Deviance: 18102.95
## AIC: 18112.95
## (6 observations deleted due to missingness)
## getting the coefficients
ctable2 <- coef(summary(model6))
## getting the respective p-values from the normal distribution
p <- pnorm(abs(ctable2[, "t value"]), lower.tail = FALSE) * 2
## creating a table with the respective p-values added
ctable2 <- cbind(ctable2, "p value" = p)
## creating confidence intervals
ci<-confint.default(model6)
colnames(ci) <- c("conf.low", "conf.high")
## exponentiating the log-odds and the confidence intervals
## in order to create the odds-ratios and the respective CI's
model6OddsRatio <- exp(cbind(OR = coef(model6), ci))
## converting rownames to columns, as this will be necessary to build the plot
model6OddsRatio <- as_tibble(rownames_to_column(as.data.frame(model6OddsRatio)))
model6OddsRatio %>% rename(term = rowname) -> model6OddsRatio
model6OddsRatio
## # A tibble: 1 × 4
## term OR conf.low conf.high
## <chr> <dbl> <dbl> <dbl>
## 1 painYes 0.359 0.327 0.395
## building a plot
model6OddsRatio %>% ggplot (aes(OR, term, xmin = conf.low,
xmax = conf.high, height = 0)) +
geom_point() +
geom_vline(xintercept = 1, lty = 4) +
geom_errorbarh()
We need to check what the warning means, but the presence of pain reduces the odds of increased sleep quality in this model. The AIC is huge, as should be expected of a model that only has one predictor and that predictor is a dichotomous variable.
This is something to consider further on. At this point, I haven’t yet figured out exactly how to do it, but I’ll try to follow the instructions from the Office of Advanced Research Computing on another day.
## Loading the library that is needed to conduct the visual analysis of the
## proportional odds assumption
library (Hmisc)
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:dplyr':
##
## src, summarize
## The following objects are masked from 'package:base':
##
## format.pval, units
sf <- function(y) {
c('Y>=1' = qlogis(mean(y >= 1)),
'Y>=2' = qlogis(mean(y >= 2)),
'Y>=3' = qlogis(mean(y >= 3)),
'Y>=4' = qlogis(mean(y >= 4)),
'Y>=5' = qlogis(mean(y >= 5)),
'Y>=6' = qlogis(mean(y >= 6)))
}
(s <- with(PPCRdata, summary(as.numeric(sleepqualityord) ~ pain + depression +
alcohol + fracture + cancer + diabetes + dementia +
arthritis + heartdisease + lungdisease, fun=sf)))
## as.numeric(sleepqualityord) N= 6968 , 6 Missing
##
## +------------+-------------------------+----+----+--------+---------+-----------+---------+----+
## | | | N|Y>=1| Y>=2| Y>=3| Y>=4| Y>=5|Y>=6|
## +------------+-------------------------+----+----+--------+---------+-----------+---------+----+
## | pain| No|4389| Inf|4.122284|2.2407601| 0.83398814|-1.708658|-Inf|
## | | Yes|2579| Inf|2.702686|0.9913397|-0.20623388|-2.347771|-Inf|
## | | Does not apply| 0| | | | | | |
## | |Didn’t know/didn’t answer| 0| | | | | | |
## +------------+-------------------------+----+----+--------+---------+-----------+---------+----+
## | depression| No|6058| Inf|3.626631|1.8714213| 0.54034089|-1.803168|-Inf|
## | | Yes| 899| Inf|2.411318|0.6418539|-0.29807178|-3.066889|-Inf|
## | | Missing| 11| Inf|2.302585|1.5040774| 0.18232156|-2.302585|-Inf|
## +------------+-------------------------+----+----+--------+---------+-----------+---------+----+
## | alcohol| Never|5323| Inf|3.363997|1.5908421| 0.37907588|-1.937990|-Inf|
## | | Less than once a month| 680| Inf|4.019382|1.9459101| 0.61903921|-1.973077|-Inf|
## | | Once a month or more| 952| Inf|3.075775|1.8265024| 0.57408782|-1.741207|-Inf|
## | | Missing| 13| Inf| Inf|1.7047481|-0.15415068|-1.203973|-Inf|
## +------------+-------------------------+----+----+--------+---------+-----------+---------+----+
## | fracture| No|6879| Inf|3.382288|1.6654277| 0.43438220|-1.905168|-Inf|
## | | Yes| 89| Inf|2.627081|0.8850382|-0.11247798|-2.460809|-Inf|
## +------------+-------------------------+----+----+--------+---------+-----------+---------+----+
## | cancer| No|6672| Inf|3.411938|1.6677603| 0.43352306|-1.914758|-Inf|
## | | Yes| 292| Inf|2.665033|1.3523928| 0.28968008|-1.811881|-Inf|
## | | Missing| 4| Inf| Inf|1.0986123| 0.00000000| -Inf|-Inf|
## +------------+-------------------------+----+----+--------+---------+-----------+---------+----+
## | diabetes| No|5682| Inf|3.397204|1.7253825| 0.48294991|-1.888043|-Inf|
## | | Yes|1256| Inf|3.225520|1.3576787| 0.20131409|-2.020781|-Inf|
## | | Missing| 30| Inf| Inf|1.8718022|-0.40546511|-1.871802|-Inf|
## +------------+-------------------------+----+----+--------+---------+-----------+---------+----+
## | dementia| No|6848| Inf|3.414865|1.6819751| 0.45021038|-1.896113|-Inf|
## | | Yes| 115| Inf|2.060023|0.4784902|-0.91021169|-3.323236|-Inf|
## | | Missing| 5| Inf|1.386294|0.4054651|-0.40546511| -Inf|-Inf|
## +------------+-------------------------+----+----+--------+---------+-----------+---------+----+
## | arthritis| No|5465| Inf|3.609603|1.8539727| 0.58442952|-1.835344|-Inf|
## | | Yes|1466| Inf|2.813170|1.0931627|-0.10651259|-2.232586|-Inf|
## | | Missing| 37| Inf|1.856298|0.7339692|-0.49643689|-2.110213|-Inf|
## +------------+-------------------------+----+----+--------+---------+-----------+---------+----+
## |heartdisease| No|6412| Inf|3.451517|1.7161281| 0.47010500|-1.894850|-Inf|
## | | Yes| 551| Inf|2.690759|1.0484225|-0.05445992|-2.102100|-Inf|
## | | Missing| 5| Inf| Inf| Inf| 0.40546511| -Inf|-Inf|
## +------------+-------------------------+----+----+--------+---------+-----------+---------+----+
## | lungdisease| No|6571| Inf|3.420717|1.6849207| 0.45543727|-1.901268|-Inf|
## | | Yes| 392| Inf|2.730029|1.1962508|-0.03061464|-2.068013|-Inf|
## | | Missing| 5| Inf| Inf|1.3862944| 0.40546511| -Inf|-Inf|
## +------------+-------------------------+----+----+--------+---------+-----------+---------+----+
## | Overall| |6968| Inf|3.368484|1.6527710| 0.42715949|-1.910944|-Inf|
## +------------+-------------------------+----+----+--------+---------+-----------+---------+----+
s[, 6] <- s[, 6] - s[, 5]
s[, 5] <- s[, 5] - s[, 4]
s[, 4] <- s[, 4] - s[, 3]
s[, 3] <- s[, 3] - s[, 3]
s
## as.numeric(sleepqualityord) N= 6968 , 6 Missing
##
## +------------+-------------------------+----+----+----+----------+----------+---------+----+
## | | | N|Y>=1|Y>=2| Y>=3| Y>=4| Y>=5|Y>=6|
## +------------+-------------------------+----+----+----+----------+----------+---------+----+
## | pain| No|4389| Inf| 0|-1.8815238|-1.4067720|-2.542646|-Inf|
## | | Yes|2579| Inf| 0|-1.7113463|-1.1975736|-2.141537|-Inf|
## | | Does not apply| 0| | | | | | |
## | |Didn’t know/didn’t answer| 0| | | | | | |
## +------------+-------------------------+----+----+----+----------+----------+---------+----+
## | depression| No|6058| Inf| 0|-1.7552100|-1.3310804|-2.343509|-Inf|
## | | Yes| 899| Inf| 0|-1.7694644|-0.9399257|-2.768818|-Inf|
## | | Missing| 11| Inf| 0|-0.7985077|-1.3217558|-2.484907|-Inf|
## +------------+-------------------------+----+----+----+----------+----------+---------+----+
## | alcohol| Never|5323| Inf| 0|-1.7731550|-1.2117662|-2.317066|-Inf|
## | | Less than once a month| 680| Inf| 0|-2.0734714|-1.3268709|-2.592116|-Inf|
## | | Once a month or more| 952| Inf| 0|-1.2492726|-1.2524146|-2.315295|-Inf|
## | | Missing| 13| Inf| | -Inf|-1.8588988|-1.049822|-Inf|
## +------------+-------------------------+----+----+----+----------+----------+---------+----+
## | fracture| No|6879| Inf| 0|-1.7168605|-1.2310455|-2.339551|-Inf|
## | | Yes| 89| Inf| 0|-1.7420430|-0.9975162|-2.348331|-Inf|
## +------------+-------------------------+----+----+----+----------+----------+---------+----+
## | cancer| No|6672| Inf| 0|-1.7441774|-1.2342372|-2.348281|-Inf|
## | | Yes| 292| Inf| 0|-1.3126400|-1.0627127|-2.101561|-Inf|
## | | Missing| 4| Inf| | -Inf|-1.0986123| -Inf|-Inf|
## +------------+-------------------------+----+----+----+----------+----------+---------+----+
## | diabetes| No|5682| Inf| 0|-1.6718214|-1.2424326|-2.370993|-Inf|
## | | Yes|1256| Inf| 0|-1.8678417|-1.1563646|-2.222095|-Inf|
## | | Missing| 30| Inf| | -Inf|-2.2772673|-1.466337|-Inf|
## +------------+-------------------------+----+----+----+----------+----------+---------+----+
## | dementia| No|6848| Inf| 0|-1.7328899|-1.2317647|-2.346323|-Inf|
## | | Yes| 115| Inf| 0|-1.5815332|-1.3887019|-2.413024|-Inf|
## | | Missing| 5| Inf| 0|-0.9808293|-0.8109302| -Inf|-Inf|
## +------------+-------------------------+----+----+----+----------+----------+---------+----+
## | arthritis| No|5465| Inf| 0|-1.7556306|-1.2695431|-2.419773|-Inf|
## | | Yes|1466| Inf| 0|-1.7200070|-1.1996753|-2.126073|-Inf|
## | | Missing| 37| Inf| 0|-1.1223288|-1.2304061|-1.613776|-Inf|
## +------------+-------------------------+----+----+----+----------+----------+---------+----+
## |heartdisease| No|6412| Inf| 0|-1.7353891|-1.2460231|-2.364955|-Inf|
## | | Yes| 551| Inf| 0|-1.6423362|-1.1028825|-2.047640|-Inf|
## | | Missing| 5| Inf| | | -Inf| -Inf|-Inf|
## +------------+-------------------------+----+----+----+----------+----------+---------+----+
## | lungdisease| No|6571| Inf| 0|-1.7357965|-1.2294834|-2.356706|-Inf|
## | | Yes| 392| Inf| 0|-1.5337783|-1.2268654|-2.037398|-Inf|
## | | Missing| 5| Inf| | -Inf|-0.9808293| -Inf|-Inf|
## +------------+-------------------------+----+----+----+----------+----------+---------+----+
## | Overall| |6968| Inf| 0|-1.7157131|-1.2256115|-2.338103|-Inf|
## +------------+-------------------------+----+----+----+----------+----------+---------+----+
plot(s, which=1:6, xlab='logit', main=' ', xlim=c(-2.8,0))
It doesn’t look like the proportional odds assumption holds.
In order to conduct a logistic regression using sleep quality as a dependent variable, we need to first convert it to a dichotomous variable. Let’s consider that “Very Bad” and “Bad” are failures and “Fair”, “Good” and “Very Good” are successes.
PPCRdata %>% mutate (sleepqualitydich = case_when(
sleepquality %in% c("Very bad", "Bad") ~ 0,
sleepquality %in% c("Fair", "Good", "Very good") ~ 1
)
) -> PPCRdata
head(PPCRdata$sleepqualitydich)
## [1] 0 1 1 1 1 1
summary(PPCRdata$sleepqualitydich)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0000 1.0000 1.0000 0.8393 1.0000 1.0000 6
Let’s proceed with the unadjusted analysis with sleepquality as a dichotomous variable.
model7 <- glm(sleepqualitydich ~ pain, "binomial", data = PPCRdata)
summary(model7)
##
## Call:
## glm(formula = sleepqualitydich ~ pain, family = "binomial", data = PPCRdata)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.24076 0.05120 43.77 <2e-16 ***
## painYes -1.24942 0.06772 -18.45 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 6144.2 on 6967 degrees of freedom
## Residual deviance: 5790.3 on 6966 degrees of freedom
## (6 observations deleted due to missingness)
## AIC: 5794.3
##
## Number of Fisher Scoring iterations: 4
## creating and saving table for publication
tbl_pub (model7, tablePath, "model7.docx", labelList[2]) -> tbl7
model7Tidy <- tidy(model7, conf.int = TRUE, exponentiate = TRUE)
model7Tidy %>% rename(Odds.Ratio = estimate) -> model7Tidy
model7Tidy
## # A tibble: 2 × 7
## term Odds.Ratio std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 9.40 0.0512 43.8 0 8.51 10.4
## 2 painYes 0.287 0.0677 -18.5 5.12e-76 0.251 0.327
model7Tidy %>% ggplot (aes(Odds.Ratio, term, xmin = conf.low,
xmax = conf.high, height = 0)) +
## inserting a point for the odds ratio
geom_point() +
## drawing a line at the 1 to improve readability
geom_vline(xintercept = 1, lty = 4) +
## drawing an horizontal line to represent the
## confidence interval
geom_errorbarh() +
scale_y_discrete(
limits = setdiff(model7Tidy$term, "(Intercept)"))
The presence of pain is associated with decreased odds of a success in sleep quality that is “Fair” or better.
model8 <- glm(sleepqualitydich ~ pain + age + bmi + depression + alcohol +
fracture + cancer + diabetes + dementia + arthritis +
heartdisease + lungdisease, "binomial", data = PPCRdata)
summary(model8)
##
## Call:
## glm(formula = sleepqualitydich ~ pain + age + bmi + depression +
## alcohol + fracture + cancer + diabetes + dementia + arthritis +
## heartdisease + lungdisease, family = "binomial", data = PPCRdata)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.757899 0.336922 5.218 1.81e-07 ***
## painYes -1.044460 0.073726 -14.167 < 2e-16 ***
## age 0.002957 0.003879 0.762 0.445873
## bmi 0.019638 0.006735 2.916 0.003546 **
## depressionYes -1.016393 0.086296 -11.778 < 2e-16 ***
## alcoholLess than once a month 0.166445 0.128151 1.299 0.194006
## alcoholOnce a month or more 0.102168 0.108679 0.940 0.347173
## fractureYes -0.521306 0.255911 -2.037 0.041644 *
## cancerYes -0.095977 0.163413 -0.587 0.556984
## diabetesYes -0.221428 0.086834 -2.550 0.010772 *
## dementiaYes -0.721659 0.216245 -3.337 0.000846 ***
## arthritisYes -0.332851 0.080613 -4.129 3.64e-05 ***
## heartdiseaseYes -0.384079 0.113976 -3.370 0.000752 ***
## lungdiseaseYes -0.137382 0.133866 -1.026 0.304769
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 5929.9 on 6752 degrees of freedom
## Residual deviance: 5372.7 on 6739 degrees of freedom
## (221 observations deleted due to missingness)
## AIC: 5400.7
##
## Number of Fisher Scoring iterations: 5
## creating and saving table for publication
tbl_pub (model8, tablePath, "model8.docx", labelList[-c(1,14)]) -> tbl8
## creating merged table for adjusted and unadjusted analysis together
tbl_merge(list(tbl7,tbl8),
tab_spanner = c("**Unadjusted**", "**Adjusted**")) -> merged2
if(!file.exists(paste(tablePath,"merged2.docx", sep=""))){
merged2 %>% as_gt() %>% gt::gtsave(filename = "merged2.docx", path = tablePath)
}
## creating a table that has the four models together
tbl_merge(list(merged1,merged2)) %>%
as_gt() %>% gt::tab_spanner(label = gt::md("**Unadjusted**"),
columns = matches("_1_1"),
replace = TRUE,
level = 1,
id = "unadjusted1") %>%
gt::tab_spanner(label = gt::md("**Adjusted**"),
columns = matches("_2_1"),
replace = TRUE,
level = 1,
id = "adjusted1") %>%
gt::tab_spanner(label = gt::md("**Unadjusted**"),
columns = matches("_1_2"),
replace = TRUE,
level = 1,
id = "unadjusted2") %>%
gt::tab_spanner(label = gt::md("**Adjusted**"),
columns = matches("_2_2"),
replace = TRUE,
level = 1,
id = "adjusted2") %>%
gt::tab_spanner(label = gt::md("**Outcome: Pain**"),
spanners = c("unadjusted1","adjusted1"),
level = 2) %>%
gt::tab_spanner(label = gt::md("**Outcome: Sleep Quality**"),
spanners = c("unadjusted2", "adjusted2"),
level = 2) -> merged3
## The merging columns (variable name, variable label, row type, and label column)
## are not unique and the merge may fail or result in a malformed table. If you
## previously 'tbl_stack'ed your tables, then 'tbl_merge'ing before you 'tbl_stack'
## may resolve the issue.
if(!file.exists(paste(tablePath,"merged3.docx", sep=""))){
merged3 %>% gt::gtsave(filename = "merged3.docx", path = tablePath)
}
merged3
| Outcome: Pain | Outcome: Sleep Quality | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Variable | Unadjusted | Adjusted | Unadjusted | Adjusted | ||||||||
| OR1 | 95% CI1 | p-value | OR1 | 95% CI1 | p-value | OR1 | 95% CI1 | p-value | OR1 | 95% CI1 | p-value | |
| Sleep Quality | ||||||||||||
| Very bad | — | — | — | — | ||||||||
| Very bad | — | — | — | — | ||||||||
| Very bad | — | — | — | — | ||||||||
| Very bad | — | — | — | — | ||||||||
| Bad | 0.66 | 0.48, 0.89 | 0.008 | 0.70 | 0.50, 0.98 | 0.038 | ||||||
| Fair | 0.34 | 0.26, 0.46 | <0.001 | 0.41 | 0.30, 0.57 | <0.001 | ||||||
| Good | 0.17 | 0.13, 0.22 | <0.001 | 0.22 | 0.16, 0.30 | <0.001 | ||||||
| Very good | 0.14 | 0.10, 0.20 | <0.001 | 0.19 | 0.13, 0.26 | <0.001 | ||||||
| Age | 0.99 | 0.98, 0.99 | <0.001 | 1.00 | 1.00, 1.01 | 0.4 | ||||||
| Body Mass Index | 1.03 | 1.02, 1.04 | <0.001 | 1.02 | 1.01, 1.03 | 0.004 | ||||||
| Depression | ||||||||||||
| No | — | — | — | — | ||||||||
| Yes | 1.36 | 1.16, 1.60 | <0.001 | 0.36 | 0.31, 0.43 | <0.001 | ||||||
| Alcohol | ||||||||||||
| Never | — | — | — | — | ||||||||
| Less than once a month | 0.71 | 0.58, 0.85 | <0.001 | 1.18 | 0.92, 1.53 | 0.2 | ||||||
| Once a month or more | 0.94 | 0.80, 1.10 | 0.4 | 1.11 | 0.90, 1.38 | 0.3 | ||||||
| Fracture | ||||||||||||
| No | — | — | — | — | ||||||||
| Yes | 1.56 | 0.97, 2.53 | 0.067 | 0.59 | 0.36, 1.0 | 0.042 | ||||||
| Cancer | ||||||||||||
| No | — | — | — | — | ||||||||
| Yes | 0.98 | 0.75, 1.28 | 0.9 | 0.91 | 0.66, 1.26 | 0.6 | ||||||
| Diabetes | ||||||||||||
| No | — | — | — | — | ||||||||
| Yes | 1.21 | 1.06, 1.40 | 0.007 | 0.80 | 0.68, 0.95 | 0.011 | ||||||
| Dementia | ||||||||||||
| No | — | — | — | — | ||||||||
| Yes | 1.30 | 0.85, 2.00 | 0.2 | 0.49 | 0.32, 0.75 | <0.001 | ||||||
| Arthritis | ||||||||||||
| No | — | — | — | — | ||||||||
| Yes | 3.42 | 3.00, 3.89 | <0.001 | 0.72 | 0.61, 0.84 | <0.001 | ||||||
| Heart Disease | ||||||||||||
| No | — | — | — | — | ||||||||
| Yes | 1.58 | 1.29, 1.92 | <0.001 | 0.68 | 0.55, 0.85 | <0.001 | ||||||
| Lung Disease | ||||||||||||
| No | — | — | — | — | ||||||||
| Yes | 2.27 | 1.81, 2.85 | <0.001 | 0.87 | 0.67, 1.14 | 0.3 | ||||||
| Pain | ||||||||||||
| No | — | — | — | — | ||||||||
| Yes | 0.29 | 0.25, 0.33 | <0.001 | 0.35 | 0.30, 0.41 | <0.001 | ||||||
| 1 OR = Odds Ratio, CI = Confidence Interval | ||||||||||||
model8Tidy <- tidy(model8, conf.int = TRUE, exponentiate = TRUE)
model8Tidy %>% rename(Odds.Ratio = estimate) -> model8Tidy
model8Tidy
## # A tibble: 14 × 7
## term Odds.Ratio std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 5.80 0.337 5.22 1.81e- 7 3.00 11.2
## 2 painYes 0.352 0.0737 -14.2 1.47e-45 0.304 0.406
## 3 age 1.00 0.00388 0.762 4.46e- 1 0.995 1.01
## 4 bmi 1.02 0.00673 2.92 3.55e- 3 1.01 1.03
## 5 depressionYes 0.362 0.0863 -11.8 5.06e-32 0.306 0.429
## 6 alcoholLess than … 1.18 0.128 1.30 1.94e- 1 0.923 1.53
## 7 alcoholOnce a mon… 1.11 0.109 0.940 3.47e- 1 0.898 1.38
## 8 fractureYes 0.594 0.256 -2.04 4.16e- 2 0.364 0.995
## 9 cancerYes 0.908 0.163 -0.587 5.57e- 1 0.664 1.26
## 10 diabetesYes 0.801 0.0868 -2.55 1.08e- 2 0.677 0.951
## 11 dementiaYes 0.486 0.216 -3.34 8.46e- 4 0.319 0.747
## 12 arthritisYes 0.717 0.0806 -4.13 3.64e- 5 0.613 0.840
## 13 heartdiseaseYes 0.681 0.114 -3.37 7.52e- 4 0.546 0.854
## 14 lungdiseaseYes 0.872 0.134 -1.03 3.05e- 1 0.673 1.14
## adding the reference terms to the data frame to allow for a better plot
model8Tidy %>%
add_row(term = "alcoholNever", Odds.Ratio = -99, std.error = 0, statistic = 0, p.value = 0, conf.low = 0, conf.high = 0, .before = 10) %>%
## filtering the intercept
filter (term !="(Intercept)") %>%
## saving the result so that we can refer to it in the next call
{. ->> model8Tidy}%>%
## adding a variable to determine whether the observation
## is a reference or not. Initially adding everything as FALSE
add_column(.after = "conf.high",
reference = rep(FALSE, length.out = nrow(model8Tidy))) %>%
## now the appropriate ones to TRUE
dplyr::mutate(reference = case_when
(term %in% c("Never") ~ TRUE,
TRUE ~ FALSE)) %>%
## create a Variable column
dplyr::mutate(.after= "term",
Variable = case_when(grepl("pain", term) ~ "Pain",
grepl("age", term) ~ "Age",
grepl("bmi", term) ~ "Body Mass Index",
grepl("depression",term) ~ "Depression",
grepl("alcohol",term) ~ "Alcohol",
grepl("fracture", term) ~ "Fracture",
grepl("cancer",term) ~ "Cancer",
grepl("diabetes",term) ~ "Diabetes",
grepl("dementia", term) ~ "Dementia",
grepl("arthritis",term) ~ "Arthritis",
grepl("heart",term) ~ "Heart Disease",
grepl("lung",term) ~ "Lung Disease")) %>%
## transform the variables to factor
dplyr::mutate(Variable = as_factor(Variable)) %>%
dplyr::mutate(term = as_factor(term)) -> model8Tidy
model8Tidy %>% ggplot (aes(x = Odds.Ratio, y = term, xmin = conf.low,
xmax = conf.high, height = 0, alpha = !reference)) +
## inserting a point for the odds ratio
geom_point(aes(alpha = !reference)) +
## drawing a line at the 1 to improve readability
geom_vline(xintercept = 1, lty = 4) +
## drawing an horizontal line to represent the
## confidence interval
geom_errorbarh() +
scale_y_discrete(labels = labels2) +
labs(x = "Odds Ratio", y = "",
title = "Sleep Quality being Fair or Better",
subtitle = "Adjusted Odds Ratios",
caption = paste("N =", nobs(model8))) +
facet_grid(Variable ~ ., scales = "free_y", space = "free_y") +
scale_alpha_identity() +
# hide facet labels
theme(strip.background = element_blank(),
strip.text = element_blank()) +
# remove spacing between facets
theme(panel.spacing = unit(0, "pt")) +
scale_x_continuous(breaks = seq(0,1.8, by = 0.2),
limits = (c(0.3,1.53))) -> plot8
## checking if plot has been saved already
if (!file.exists("C:/Users/Pedro Henrique/Documents/PPCR/Data Analysis Project/DataProjectPPCR/plots/plot8.png")){
## if it hasn't, saving the plot for publication
ggsave("plot8.png", plot8,
path = "C:/Users/Pedro Henrique/Documents/PPCR/Data Analysis Project/DataProjectPPCR/plots")
}
plot8
The analysis done by Morelhão et al. didn’t include sex as a covariate. We can analyze with it included to see its effect.
model9 <- glm(sleepqualitydich ~ pain + age + bmi + depression + alcohol +
fracture + cancer + diabetes + dementia + arthritis +
heartdisease + lungdisease + sex, "binomial", data = PPCRdata)
model9 %>% tidy(conf.int = TRUE, exponentiate = TRUE) %>%
rename(Odds.Ratio = estimate) -> model9Tidy
model9Tidy
## # A tibble: 15 × 7
## term Odds.Ratio std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 4.76 0.340 4.60 4.29e- 6 2.45 9.27
## 2 painYes 0.358 0.0739 -13.9 7.31e-44 0.310 0.414
## 3 age 1.00 0.00388 0.700 4.84e- 1 0.995 1.01
## 4 bmi 1.02 0.00677 3.35 7.94e- 4 1.01 1.04
## 5 depressionYes 0.382 0.0871 -11.1 2.08e-28 0.322 0.453
## 6 alcoholLess than … 1.11 0.129 0.809 4.18e- 1 0.866 1.44
## 7 alcoholOnce a mon… 0.995 0.112 -0.0417 9.67e- 1 0.802 1.24
## 8 fractureYes 0.632 0.256 -1.79 7.29e- 2 0.387 1.06
## 9 cancerYes 0.905 0.164 -0.608 5.43e- 1 0.662 1.26
## 10 diabetesYes 0.804 0.0870 -2.50 1.23e- 2 0.679 0.955
## 11 dementiaYes 0.466 0.217 -3.52 4.38e- 4 0.306 0.718
## 12 arthritisYes 0.746 0.0812 -3.61 3.11e- 4 0.637 0.876
## 13 heartdiseaseYes 0.657 0.115 -3.67 2.46e- 4 0.526 0.824
## 14 lungdiseaseYes 0.879 0.134 -0.963 3.36e- 1 0.678 1.15
## 15 sexMale 1.42 0.0805 4.34 1.42e- 5 1.21 1.66
## creating and saving table for publication
tbl_pub (model9, tablePath, "model9.docx", labelList[-1])
| Variable | OR1 | 95% CI1 | p-value |
|---|---|---|---|
| Pain | |||
| No | — | — | |
| Yes | 0.36 | 0.31, 0.41 | <0.001 |
| Age | 1.00 | 1.00, 1.01 | 0.5 |
| Body Mass Index | 1.02 | 1.01, 1.04 | <0.001 |
| Depression | |||
| No | — | — | |
| Yes | 0.38 | 0.32, 0.45 | <0.001 |
| Alcohol | |||
| Never | — | — | |
| Less than once a month | 1.11 | 0.87, 1.44 | 0.4 |
| Once a month or more | 1.00 | 0.80, 1.24 | >0.9 |
| Fracture | |||
| No | — | — | |
| Yes | 0.63 | 0.39, 1.06 | 0.073 |
| Cancer | |||
| No | — | — | |
| Yes | 0.91 | 0.66, 1.26 | 0.5 |
| Diabetes | |||
| No | — | — | |
| Yes | 0.80 | 0.68, 0.96 | 0.012 |
| Dementia | |||
| No | — | — | |
| Yes | 0.47 | 0.31, 0.72 | <0.001 |
| Arthritis | |||
| No | — | — | |
| Yes | 0.75 | 0.64, 0.88 | <0.001 |
| Heart Disease | |||
| No | — | — | |
| Yes | 0.66 | 0.53, 0.82 | <0.001 |
| Lung Disease | |||
| No | — | — | |
| Yes | 0.88 | 0.68, 1.15 | 0.3 |
| Sex | |||
| Female | — | — | |
| Male | 1.42 | 1.21, 1.66 | <0.001 |
| 1 OR = Odds Ratio, CI = Confidence Interval | |||
## adding the reference terms to the data frame to allow for a better plot
model9Tidy %>%
add_row(term = "alcoholNever", Odds.Ratio = 0, std.error = 0, statistic = 0, p.value = 0, conf.low = 0, conf.high = 0, .before = 10) %>%
## filtering the intercept
filter (term !="(Intercept)") %>%
## saving the result so that we can refer to it in the next call
{. ->> model9Tidy}%>%
## adding a variable to determine whether the observation
## is a reference or not. Initially adding everything as FALSE
add_column(.after = "conf.high",
reference = rep(FALSE, length.out = nrow(model9Tidy))) %>%
## now the appropriate ones to TRUE
dplyr::mutate(reference = case_when
(term %in% c("alcoholNever") ~ TRUE,
TRUE ~ FALSE)) %>%
## create a Variable column
dplyr::mutate(.after= "term",
Variable = case_when(grepl("pain", term) ~ "Pain",
grepl("age", term) ~ "Age",
grepl("bmi", term) ~ "Body Mass Index",
grepl("depression",term) ~ "Depression",
grepl("alcohol",term) ~ "Alcohol",
grepl("fracture", term) ~ "Fracture",
grepl("cancer",term) ~ "Cancer",
grepl("diabetes",term) ~ "Diabetes",
grepl("dementia", term) ~ "Dementia",
grepl("arthritis",term) ~ "Arthritis",
grepl("heart",term) ~ "Heart Disease",
grepl("lung",term) ~ "Lung Disease",
grepl("sex",term) ~ "Sex")) %>%
## transform the variables to factor
dplyr::mutate(Variable = as_factor(Variable)) %>%
dplyr::mutate(term = as_factor(term)) -> model9Tidy
model9Tidy %>% ggplot (aes(x = Odds.Ratio, y = term, xmin = conf.low,
xmax = conf.high, height = 0, alpha = !reference)) +
## inserting a point for the odds ratio
geom_point(aes(alpha = !reference)) +
## drawing a line at the 1 to improve readability
geom_vline(xintercept = 1, lty = 4) +
## drawing an horizontal line to represent the
## confidence interval
geom_errorbarh() +
scale_y_discrete(labels = labels2) +
labs(x = "Odds Ratio", y = "",
title = "Sleep Quality being Fair or Better",
subtitle = "Adjusted Odds Ratios",
caption = paste("N =", nobs(model3))) +
facet_grid(Variable ~ ., scales = "free_y", space = "free_y") +
scale_alpha_identity() +
# hide facet labels
theme(strip.background = element_blank(),
strip.text = element_blank()) +
# remove spacing between facets
theme(panel.spacing = unit(0, "pt")) +
scale_x_continuous(breaks = seq(0,1.8, by = 0.2),
limits = (c(0.3,1.67))) -> plot9
## checking if plot has been saved already
if (!file.exists("C:/Users/Pedro Henrique/Documents/PPCR/Data Analysis Project/DataProjectPPCR/plots/plot9.png")){
## if it hasn't, saving the plot for publication
ggsave("plot9.png", plot9,
path = "C:/Users/Pedro Henrique/Documents/PPCR/Data Analysis Project/DataProjectPPCR/plots")
}
plot9
The results show us that being male is associated with increased odds of having fair or better sleep quality.
We can also substitute the usage of variables to determine presence of comorbidities for a general health assessment reported by the patient himself, to see if it changes the analysis.
# modeling
model10 <- glm(sleepqualitydich ~ pain + genhealth + age + bmi + sex, "binomial", data = PPCRdata)
model10 %>% tidy(conf.int = TRUE, exponentiate = TRUE) %>%
rename(Odds.Ratio = estimate) -> model10Tidy
tbl_pub (model10, tablePath, "model10.docx",
append(labelList[-c(1,5,6,7,8,9,10,11,12,13)], genhealth ~ "General Health Assessment"))
| Variable | OR1 | 95% CI1 | p-value |
|---|---|---|---|
| Pain | |||
| No | — | — | |
| Yes | 0.45 | 0.39, 0.52 | <0.001 |
| General Health Assessment | |||
| Very bad | — | — | |
| Very bad | — | — | |
| Bad | 1.48 | 1.07, 2.03 | 0.016 |
| Fair | 2.85 | 2.11, 3.84 | <0.001 |
| Good | 6.33 | 4.58, 8.74 | <0.001 |
| Very good | 8.42 | 5.21, 14.0 | <0.001 |
| Excellent | 9.44 | 4.87, 20.2 | <0.001 |
| Age | 1.01 | 1.00, 1.02 | 0.017 |
| Body Mass Index | 1.02 | 1.01, 1.03 | 0.004 |
| Sex | |||
| Female | — | — | |
| Male | 1.67 | 1.44, 1.95 | <0.001 |
| 1 OR = Odds Ratio, CI = Confidence Interval | |||
While the direction of the effect was the same, when considering the subjects perception of their own general health status instead of comorbidity level, the presence of pain has a lower effect size in the association of decreased odds of having sleep quality that is fair or better.
We can also take a look at the subset of patients that have pain to see if sleep quality is worse with increasing levels of pain.
## unadjusted
model11 <- glm(sleepqualitydich ~ p_intensity,
data = filter(PPCRdata, pain == "Yes"), family = "binomial")
model11 %>% tidy(conf.int = TRUE, exponentiate = TRUE) %>%
rename(Odds.Ratio = estimate) -> model11Tidy
tbl_pub(model11, tablePath, "model11.docx",
list(p_intensity ~ "Pain Intensity")) -> tbl11
## adjusted
model12 <- glm(sleepqualitydich ~ p_intensity + age + bmi + sex + depression + alcohol +
fracture + cancer + diabetes + dementia + arthritis +
heartdisease + lungdisease,
data = filter(PPCRdata, pain == "Yes"),
family = "binomial")
model12 %>% tidy(conf.int = TRUE, exponentiate = TRUE) %>%
rename(Odds.Ratio = estimate) -> model12Tidy
tbl_pub (model12, tablePath, "model12.docx",
append(labelList[-c(1,2)], p_intensity ~ "Pain Intensity")) -> tbl12
tbl_merge(list(tbl11,tbl12),
tab_spanner = c("**Unadjusted**", "**Adjusted**")) -> merged4
if(!file.exists(paste(tablePath,"merged4.docx", sep=""))){
merged4 %>% as_gt() %>% gt::gtsave(filename = "merged4.docx", path = tablePath)
}
merged4
| Variable | Unadjusted | Adjusted | ||||
|---|---|---|---|---|---|---|
| OR1 | 95% CI1 | p-value | OR1 | 95% CI1 | p-value | |
| Pain Intensity | ||||||
| Soft/weak | — | — | — | — | ||
| Moderate | 0.60 | 0.43, 0.83 | 0.003 | 0.67 | 0.47, 0.94 | 0.022 |
| Intense/strong | 0.28 | 0.20, 0.39 | <0.001 | 0.35 | 0.24, 0.49 | <0.001 |
| Age | 1.00 | 0.99, 1.01 | >0.9 | |||
| Body Mass Index | 1.02 | 1.01, 1.04 | 0.005 | |||
| Sex | ||||||
| Female | — | — | ||||
| Male | 1.27 | 1.02, 1.58 | 0.036 | |||
| Depression | ||||||
| No | — | — | ||||
| Yes | 0.40 | 0.32, 0.51 | <0.001 | |||
| Alcohol | ||||||
| Never | — | — | ||||
| Less than once a month | 1.23 | 0.85, 1.82 | 0.3 | |||
| Once a month or more | 1.06 | 0.78, 1.43 | 0.7 | |||
| Fracture | ||||||
| No | — | — | ||||
| Yes | 0.50 | 0.27, 0.93 | 0.028 | |||
| Cancer | ||||||
| No | — | — | ||||
| Yes | 0.88 | 0.58, 1.38 | 0.6 | |||
| Diabetes | ||||||
| No | — | — | ||||
| Yes | 0.95 | 0.75, 1.19 | 0.6 | |||
| Dementia | ||||||
| No | — | — | ||||
| Yes | 0.47 | 0.27, 0.82 | 0.007 | |||
| Arthritis | ||||||
| No | — | — | ||||
| Yes | 0.84 | 0.69, 1.02 | 0.079 | |||
| Heart Disease | ||||||
| No | — | — | ||||
| Yes | 0.67 | 0.51, 0.89 | 0.005 | |||
| Lung Disease | ||||||
| No | — | — | ||||
| Yes | 0.81 | 0.60, 1.11 | 0.2 | |||
| 1 OR = Odds Ratio, CI = Confidence Interval | ||||||
Increasing levels of pain intensity are also associated with decreased odds of having fair or better sleep quality.