Introduction

In this manuscript, we’re going to conduct an analysis into the ELSI Dataset. This is a modified version for usage in the Principles and Practice of Clinical Research program of Harvard T.H. Chan School of Public Health.

The aim here is to replicate the analysis done in a previous paper and see if the results also apply to the sample that was included in the ELSI Brazil.

First order of business is to download the dataset we’re going to be analyzing, which was kindly provided by PPCR staff.

## Loading the necessary libraries
library(tidyverse)
library(haven)
library(broom)
library(ggplot2)
library(gtsummary)
## Url provided by PPCR staff
url <- 'https://ryver-prd-files.s3.us-west-2.amazonaws.com/tnt45405/T9gKrWnaRr5K0yT/FINAL%20-%20ELSI_students.dta'

## Using the read_dta function from haven to read the .dta file created in STATA
PPCRdata <- read_dta(url)

## Removing the url from memory as it is no longer useful
rm(url)

## Transforming to tibble in order to increase readability
PPCRdata <- as_tibble(PPCRdata)
PPCRdata
## # A tibble: 6,974 × 121
##         id2   psu calibrated_weight stratum region    area    hometype staircase
##       <dbl> <dbl>             <dbl>   <dbl> <dbl+lbl> <dbl+l> <dbl+lb> <dbl+lbl>
##  1 20200001    21             3111.      30 1 [North] 1 [Urb… 1 [Hous… 0 [No]   
##  2 20200002    21             1555.      30 1 [North] 1 [Urb… 1 [Hous… 0 [No]   
##  3 20200003    21             1147.      30 1 [North] 1 [Urb… 1 [Hous… 0 [No]   
##  4 20200004    21             1523.      30 1 [North] 1 [Urb… 1 [Hous… 0 [No]   
##  5 20200007    21             2563.      30 1 [North] 1 [Urb… 1 [Hous… 0 [No]   
##  6 20200009    21             5127.      30 1 [North] 1 [Urb… 1 [Hous… 0 [No]   
##  7 20200012    21              918.      30 1 [North] 1 [Urb… 1 [Hous… 1 [Yes]  
##  8 20200013    21              757.      30 1 [North] 1 [Urb… 1 [Hous… 1 [Yes]  
##  9 20200014    21             1027.      30 1 [North] 1 [Urb… 1 [Hous… 0 [No]   
## 10 20200015    21             1005.      30 1 [North] 1 [Urb… 1 [Hous… 0 [No]   
## # ℹ 6,964 more rows
## # ℹ 113 more variables: sex <dbl+lbl>, age <dbl+lbl>, marital <dbl+lbl>,
## #   race <dbl+lbl>, children <dbl+lbl>, grandchildren <dbl+lbl>,
## #   siblings <dbl+lbl>, education <dbl+lbl>, dis_med <dbl+lbl>,
## #   dis_social <dbl+lbl>, dis_work <dbl+lbl>, dis_family <dbl+lbl>,
## #   dis_place <dbl+lbl>, work_12 <dbl+lbl>, workability <dbl+lbl>,
## #   retired <dbl+lbl>, reasonretire <dbl+lbl>, incomescore <dbl+lbl>, …

PPCR Staff has described the variables included in this dataset with the following: Variables in modified dataset

Adjusting Variables for the Analysis

Let’s now look at the some of the individual variables to understand what they mean

unique(PPCRdata$pain)
## <labelled<double>[2]>: Do you feel pain that bothers you frequently?
## [1] 1 0
## 
## Labels:
##  value                     label
##      0                        No
##      1                       Yes
##      8            Does not apply
##      9 Didn’t know/didn’t answer
unique(PPCRdata$p_intensity)
## <labelled<double>[4]>: How intense is this pain most of the time?
## [1]  2 NA  3  1
## 
## Labels:
##  value                     label
##      1                 Soft/weak
##      2                  Moderate
##      3            Intense/strong
##      8            Does not apply
##      9 Didn’t know/didn’t answer
unique(PPCRdata$sleepquality)
## <labelled<double>[6]>: How would you evaluate the quality of your sleep?
## [1]  4  2  3  1  5 NA
## 
## Labels:
##  value                     label
##      1                 Very good
##      2                      Good
##      3                      Fair
##      4                       Bad
##      5                  Very bad
##      9 Didn’t know/didn’t answer
unique(PPCRdata$sleepproblems)
## <labelled<double>[3]>: Has a doctor ever told you that you have sleep problems?
## [1]  0  1 NA
## 
## Labels:
##  value                     label
##      0                        No
##      1                       Yes
##      8            Does not apply
##      9 Didn’t know/didn’t answer

We can then see that pain and sleepproblems are binary variables, while p_intensity and sleepquality are ordinal variables, although they aren’t coded in this manner yet.

Now let’s transform p_intensity and sleepquality into categorical variables and pain and sleepproblems into dichotomous variables, as they were read as numerical variables with labels.

## Using the as_factor function from haven to transform the variables 
## with the labels as names for the levels in the factors
PPCRdata$sleepquality <- as_factor(PPCRdata$sleepquality)
PPCRdata$p_intensity <- as_factor(PPCRdata$p_intensity)
PPCRdata$pain <- as_factor(PPCRdata$pain)
PPCRdata$sleepproblems <- as_factor(PPCRdata$sleepproblems)


levels(PPCRdata$sleepquality)
## [1] "Very good"                 "Good"                     
## [3] "Fair"                      "Bad"                      
## [5] "Very bad"                  "Didn’t know/didn’t answer"
## Reversing the order of factors, as initially the order was wrong in sleepquality
PPCRdata$sleepquality <- fct_rev(PPCRdata$sleepquality)
levels(PPCRdata$sleepquality)
## [1] "Didn’t know/didn’t answer" "Very bad"                 
## [3] "Bad"                       "Fair"                     
## [5] "Good"                      "Very good"
## doing the same with genhealth as it will be used for a sensitivity analysis
# adjusting the genhealth variable
PPCRdata %>% mutate(genhealth = fct_rev(as_factor(genhealth))) -> PPCRdata

Other variables that we’re going to use in our model that haven’t been coded properly as well include: depression, alcohol, fracture, cancer, diabetes, dementia, arthritis, heartdisease and lungdisease.

## Showing that the variables are coded as numerical variables
unique(PPCRdata$depression)
## <labelled<double>[3]>: Has a doctor ever told you that you have depression?
## [1]  1  0 NA
## 
## Labels:
##  value                     label
##      0                        No
##      1                       Yes
##      9 Didn’t know/didn’t answer
## Commented the other variables for brevity, but they were all numerical
## unique(PPCRdata$sex)
## unique(PPCRdata$alcohol)
## unique(PPCRdata$fracture)
## unique(PPCRdata$cancer)
## unique(PPCRdata$diabetes)
## unique(PPCRdata$dementia)
## unique(PPCRdata$arthritis)
## unique(PPCRdata$heartdisease)
## unique(PPCRdata$lungdisease)

## Using the as_factor function from haven to transform the variables 
## with the labels as names for the levels in the factors
PPCRdata$depression <- as_factor(PPCRdata$depression)
PPCRdata$sex <- as_factor(PPCRdata$sex)
PPCRdata$alcohol <- as_factor(PPCRdata$alcohol)
PPCRdata$fracture <- as_factor(PPCRdata$fracture)
PPCRdata$cancer <- as_factor(PPCRdata$cancer)
PPCRdata$diabetes <- as_factor(PPCRdata$diabetes)
PPCRdata$dementia <- as_factor(PPCRdata$dementia)
PPCRdata$arthritis <- as_factor(PPCRdata$arthritis)
PPCRdata$heartdisease <- as_factor(PPCRdata$heartdisease)
PPCRdata$lungdisease <- as_factor(PPCRdata$lungdisease)

## Converting age to numeric

PPCRdata$age <- as.numeric(PPCRdata$age)

Let’s now create a table summarizing all the information we have collected.

## Table 1
## divided by pain

## selecting all the variables that will be used
PPCRdata %>% dplyr::select(age,bmi,sex,pain,p_intensity,sleepquality,
                           sleepproblems,depression,alcohol,fracture,cancer,
                           diabetes,dementia,arthritis,heartdisease,
                           lungdisease) %>%
        
        ## recoding the variables to not have levels that aren't used, so
        ## that they don't clutter the table
        mutate(pain = recode(droplevels(pain), No = "No Pain", Yes = "Pain"),
               p_intensity = droplevels(p_intensity),
               sleepquality = droplevels(sleepquality),
               depression = droplevels(depression),
               alcohol = droplevels(alcohol),
               diabetes = droplevels(diabetes),
               dementia = droplevels(dementia),
               arthritis = droplevels(arthritis),
               cancer = droplevels(cancer),
               sleepproblems = droplevels(sleepproblems)) %>%
        
        ## determining how each type of variable will be presented
        ## continous as mean and sd and categorical as absolute relative
        ## frequencies
        tbl_summary(statistic = list(all_continuous() ~ "{mean} ({sd})",
                                    all_categorical() ~ "{n} ({p}%)"),
                    ## not showing NA's in each variable
                    missing = "no",
                    ## dividing the observations by pain
                    by = pain,
                    ## presenting the variables better
                    label = list(p_intensity ~ "Pain Intensity",
                                 sleepquality ~ "Sleep Quality",
                                 depression ~ "Depression",
                                 alcohol ~ "Alcohol Consumption",
                                 diabetes ~ "Diabetes Mellitus",
                                 dementia ~ "Dementia",
                                 arthritis ~ "Arthritis",
                                 cancer ~ "Cancer",
                                 sleepproblems ~ "Sleep Problems",
                                 age ~ "Age",
                                 bmi ~ "Body Mass Index",
                                 fracture ~ "Fracture of Hip or Wrist",
                                 heartdisease ~ "Heart Disease",
                                 lungdisease ~ "Lung Disease")) %>%
                ## title of the variable column
                modify_header(label = "**Variable**") %>%
                ## title of the table
                modify_caption("Subjects baseline characteristics") %>%
                bold_labels() %>%
                ## since we're not showing NA's in each variable, showing the
                ## N for each specific variable
                add_n() %>% 
                ## adding a column for overall observations, not divided by pain
                add_overall(last = TRUE) -> table1

## checking if table has been saved already
if (!file.exists("C:/Users/Pedro Henrique/Documents/PPCR/Data Analysis Project/DataProjectPPCR/tables/table1.docx")){
        ## if it hasn't, saving the table for publication
        table1 %>% 
                as_gt() %>% 
                gt::gtsave(filename = "table1.docx", path = "C:/Users/Pedro Henrique/Documents/PPCR/Data Analysis Project/DataProjectPPCR/tables")
}
## printing the table
table1
Subjects baseline characteristics
Variable N No Pain, N = 4,3911 Pain, N = 2,5831 Overall, N = 6,9741
Age 6,974 66 (9) 65 (9) 66 (9)
Body Mass Index 6,850 27.5 (4.9) 28.6 (5.7) 27.9 (5.2)
Sex 6,974


    Female
2,462 (56%) 1,766 (68%) 4,228 (61%)
    Male
1,929 (44%) 817 (32%) 2,746 (39%)
Pain Intensity 2,583


    Soft/weak
0 (NA%) 318 (12%) 318 (12%)
    Moderate
0 (NA%) 1,348 (52%) 1,348 (52%)
    Intense/strong
0 (NA%) 917 (36%) 917 (36%)
Sleep Quality 6,968


    Very bad
70 (1.6%) 162 (6.3%) 232 (3.3%)
    Bad
352 (8.0%) 536 (21%) 888 (13%)
    Fair
907 (21%) 724 (28%) 1,631 (23%)
    Good
2,387 (54%) 932 (36%) 3,319 (48%)
    Very good
673 (15%) 225 (8.7%) 898 (13%)
Sleep Problems 6,941 293 (6.7%) 495 (19%) 788 (11%)
Depression 6,963 422 (9.6%) 477 (19%) 899 (13%)
Alcohol Consumption 6,961


    Never
3,268 (75%) 2,059 (80%) 5,327 (77%)
    Less than once a month
484 (11%) 197 (7.6%) 681 (9.8%)
    Once a month or more
631 (14%) 322 (12%) 953 (14%)
Fracture of Hip or Wrist 6,974 42 (1.0%) 47 (1.8%) 89 (1.3%)
Cancer 6,970 175 (4.0%) 117 (4.5%) 292 (4.2%)
Diabetes Mellitus 6,943 695 (16%) 563 (22%) 1,258 (18%)
Dementia 6,969 47 (1.1%) 69 (2.7%) 116 (1.7%)
Arthritis 6,937 546 (12%) 922 (36%) 1,468 (21%)
Heart Disease 6,968 260 (5.9%) 292 (11%) 552 (7.9%)
Lung Disease 6,968 159 (3.6%) 233 (9.0%) 392 (5.6%)
1 Mean (SD); n (%)

It’s important to note that the pain intensity variable doesn’t have any observations for those that do not have pain in the first place, as should be expected.

At this point we have coded all the variables correctly, so we can proceed with the analysis.

Analysis

Association between pain and sleepquality

Unadjusted Analysis

This analysis checks for the association between the pain variable and the sleepquality variable, without adjustments. As pain is a binary variable, we’re going to run a logistic regression.

## using the generalized linear model function 
## with the family = "binomial" to create a logistical
## regression

model1 <- glm (data = PPCRdata, formula = pain ~ sleepquality,
               family = "binomial")
summary(model1)
## 
## Call:
## glm(formula = pain ~ sleepquality, family = "binomial", data = PPCRdata)
## 
## Coefficients:
##                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)             0.8391     0.1430   5.866 4.45e-09 ***
## sleepqualityBad        -0.4186     0.1586  -2.639  0.00832 ** 
## sleepqualityFair       -1.0645     0.1515  -7.028 2.10e-12 ***
## sleepqualityGood       -1.7796     0.1482 -12.011  < 2e-16 ***
## sleepqualityVery good  -1.9347     0.1624 -11.910  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 9184.1  on 6967  degrees of freedom
## Residual deviance: 8669.3  on 6963  degrees of freedom
##   (6 observations deleted due to missingness)
## AIC: 8679.3
## 
## Number of Fisher Scoring iterations: 4
## coef(model1) extracts the coefficients from the model
## confint(model1) calculates confidence intervals (95%)
## cbind then creates a table by joining the two by column
## finally, we exponentiate all the results using the base e
## in order to transform logit units into regular units

exp(cbind(OR = coef(model1),confint(model1)))
## Waiting for profiling to be done...
##                              OR     2.5 %    97.5 %
## (Intercept)           2.3142857 1.7567890 3.0806419
## sleepqualityBad       0.6579686 0.4799679 0.8945713
## sleepqualityFair      0.3449168 0.2550187 0.4621682
## sleepqualityGood      0.1687122 0.1255215 0.2245582
## sleepqualityVery good 0.1444610 0.1045639 0.1978048
## another way, which is simpler and does the same, is to use
## the broom::tidy function

model1Tidy <- tidy(model1, conf.int = TRUE, exponentiate = TRUE)
model1Tidy %>% rename(Odds.Ratio = estimate) -> model1Tidy
model1Tidy
## # A tibble: 5 × 7
##   term                Odds.Ratio std.error statistic  p.value conf.low conf.high
##   <chr>                    <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)              2.31      0.143      5.87 4.45e- 9    1.76      3.08 
## 2 sleepqualityBad          0.658     0.159     -2.64 8.32e- 3    0.480     0.895
## 3 sleepqualityFair         0.345     0.151     -7.03 2.10e-12    0.255     0.462
## 4 sleepqualityGood         0.169     0.148    -12.0  3.10e-33    0.126     0.225
## 5 sleepqualityVery g…      0.144     0.162    -11.9  1.05e-32    0.105     0.198
## as we will be doing this extensively for each model, creating a function
## in order to not repeat the code every time
tbl_pub <- function(model, path, file, labels){
        model %>% 
                        tbl_regression(exponentiate = TRUE, conf.int = TRUE, 
                                       label = labels) %>% 
                        bold_p() %>%
                        bold_labels() %>%
                        italicize_levels() %>%
                        modify_header(label = "**Variable**") -> tblx
        ## checking if the model has been saved already
        if (!file.exists(paste(path,file,sep = ""))){
                ## if it hasn't, saving the model for publication
                tblx %>%
                        as_gt() %>% 
                        gt::gtsave(filename = file, path = path)   
        }
        tblx
}

## creating a label list
labelList <- list(sleepquality ~ "Sleep Quality",
                  pain ~ "Pain",
                  age ~ "Age",
                  bmi ~ "Body Mass Index",
                  depression ~ "Depression",
                  alcohol ~ "Alcohol",
                  fracture ~ "Fracture",
                  cancer ~ "Cancer",
                  diabetes ~ "Diabetes",
                  dementia ~ "Dementia",
                  arthritis ~ "Arthritis",
                  heartdisease ~ "Heart Disease",
                  lungdisease ~ "Lung Disease",
                  sex ~ "Sex")

## creating a path variable

tablePath <- "C:/Users/Pedro Henrique/Documents/PPCR/Data Analysis Project/DataProjectPPCR/tables/"

## creating and saving the publication table
tbl_pub (model1, tablePath, "model1.docx", labelList[1]) -> tbl1


## let's plot these coefficients and their CI's to increase
## readability

## setting up two label arrays for reuse later

labels1 <- c("sleepqualityVery good" = "Very Good", "sleepqualityGood" = "Good",
             "sleepqualityFair" = "Fair", "sleepqualityBad" = "Bad")

labels2 <-c("sleepqualityVery good" = "Sleep Quality: Very Good",
            "sleepqualityGood" = "Sleep Quality: Good",
            "sleepqualityFair" = "Sleep Quality: Fair",
            "sleepqualityBad" = "Sleep Quality: Bad",
            "sleepqualityVery Bad" = "Reference - Sleep Quality: Very Bad",
            "alcoholOnce a month or more" = "Alcohol Use: Once a month of more",
            "alcoholLess than once a month" = "Alcohol Use: Less than once a month",
            "alcoholNever" = "Reference - Alcohol Use: Never",
            "depressionYes" = "Depression",
            "alcoholYes" = "Alcohol Consumption",
            "diabetesYes" = "Diabetes Mellitus",
            "dementiaYes" = "Dementia",
            "arthritisYes" = "Arthritis",
            "cancerYes" = "Cancer",
            "age" = "Age",
            "bmi" = "Body Mass Index",
            "fractureYes" = "Fracture of Hip or Wrist",
            "heartdiseaseYes" = "Heart Disease",
            "lungdiseaseYes" = "Lung Disease",
            "painYes" = "Pain",
            "sexMale" = "Male")

## creating a plot

model1Tidy %>% ggplot (aes(Odds.Ratio, term, xmin = conf.low,
                           xmax = conf.high, height = 0)) +
                ## inserting a point for the odds ratio
                geom_point() +
                ## drawing a line at the 1 to improve readability
                geom_vline(xintercept = 1, lty = 4) +
                ## drawing an horizontal line to represent the 
                ## confidence interval
                geom_errorbarh() +
                ## removing the intercept term and changing labels
                scale_y_discrete(
                        limits = setdiff(model1Tidy$term, "(Intercept)"),
                        labels = labels1) +
                labs(x = "Odds Ratio", y = "Sleep Quality",
                     title = "Presence of Pain", 
                subtitle = "Unadjusted odds ratio, in comparison to \"Very Bad\" Sleep Quality",
                caption = paste("N =", nobs(model1))) +
                ## adjusting the ticks on the x axis and its range
                scale_x_continuous (breaks = seq(0,1.2,0.2),
                                    limits = c(0,1.2)) -> plot1
## checking if plot has been saved already
if (!file.exists("C:/Users/Pedro Henrique/Documents/PPCR/Data Analysis Project/DataProjectPPCR/plots/plot1.png")){
        ## if it hasn't, saving the plot for publication
        ggsave("plot1.png", plot1, 
               path = "C:/Users/Pedro Henrique/Documents/PPCR/Data Analysis Project/DataProjectPPCR/plots", height = 3) 
}

plot1

This analysis allows us to determine that, when using a reference level of sleepquality that is very bad, the odds of experiencing pain when sleepquality increases are significantly lower (the odds ratio of the comparison of each other level with the very bad level of sleep quality can be seen in the table and plot above).


Sensitivity Analysis

In this next model, we’re going to consider sleepquality as a continuous variable.

## doing the transformation and saving on another variable to not lose the
## categorical information (names for each level)
PPCRdata %>% mutate(sleepqualitynum = 
        case_when(PPCRdata$sleepquality == 'Very bad' ~ 1,
                  PPCRdata$sleepquality == 'Bad' ~ 2,
                  PPCRdata$sleepquality == 'Fair' ~ 3,
                  PPCRdata$sleepquality == 'Good' ~ 4,
                  PPCRdata$sleepquality == 'Very good' ~ 5)) -> PPCRdata

Now that the variable is transformed to numeric, let’s run the model again.

glm (data = PPCRdata, formula = pain ~ sleepqualitynum, 
     family = "binomial") -> model2
model2 %>% tidy(conf.int = TRUE, exponentiate = TRUE) %>% 
        rename(Odds.Ratio = estimate) -> model2Tidy
model2Tidy
## # A tibble: 2 × 7
##   term            Odds.Ratio std.error statistic   p.value conf.low conf.high
##   <chr>                <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
## 1 (Intercept)          4.33     0.0963      15.2 3.29e- 52    3.59      5.23 
## 2 sleepqualitynum      0.564    0.0269     -21.3 2.07e-100    0.535     0.594
model2Tidy %>% ggplot (aes(Odds.Ratio, term, xmin = conf.low,
                           xmax = conf.high, height = 0)) +
                ## inserting a point for the odds ratio
                geom_point() +
                ## drawing a line at the 1 to improve readability
                geom_vline(xintercept = 1, lty = 4) +
                ## drawing an horizontal line to represent the 
                ## confidence interval
                geom_errorbarh() +
                scale_y_discrete(
                        limits = setdiff(model2Tidy$term, "(Intercept)"))

In order to determine which model is best, we’re going to look at the AIC values. It looks like changing the variable to continues increases the AIC, which tells us that the model is behaving worse than the categorical. Let’s keep it categorical, then.


Adjusted Analysis

This analysis checks for the association between the pain variable and the sleepquality variable, adjusting the model for bmi, comorbidities and depression, as was done in the original article. As pain is a binary variable, we’re going to run a logistic regression.

model3 <- glm(data = PPCRdata, formula = pain ~ sleepquality + age + bmi + 
                      depression + alcohol + fracture + cancer + diabetes + 
                      dementia + arthritis + heartdisease + lungdisease,
                family = "binomial")
summary(model3)
## 
## Call:
## glm(formula = pain ~ sleepquality + age + bmi + depression + 
##     alcohol + fracture + cancer + diabetes + dementia + arthritis + 
##     heartdisease + lungdisease, family = "binomial", data = PPCRdata)
## 
## Coefficients:
##                                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                    0.178116   0.306158   0.582 0.560716    
## sleepqualityBad               -0.358261   0.172220  -2.080 0.037502 *  
## sleepqualityFair              -0.884865   0.165214  -5.356 8.51e-08 ***
## sleepqualityGood              -1.502021   0.161661  -9.291  < 2e-16 ***
## sleepqualityVery good         -1.671866   0.176887  -9.452  < 2e-16 ***
## age                           -0.011630   0.003067  -3.792 0.000150 ***
## bmi                            0.028117   0.005353   5.253 1.50e-07 ***
## depressionYes                  0.309084   0.082688   3.738 0.000186 ***
## alcoholLess than once a month -0.349505   0.097373  -3.589 0.000332 ***
## alcoholOnce a month or more   -0.062550   0.082267  -0.760 0.447056    
## fractureYes                    0.445831   0.243776   1.829 0.067421 .  
## cancerYes                     -0.019709   0.136656  -0.144 0.885322    
## diabetesYes                    0.194192   0.071420   2.719 0.006548 ** 
## dementiaYes                    0.264841   0.216723   1.222 0.221699    
## arthritisYes                   1.228566   0.066383  18.507  < 2e-16 ***
## heartdiseaseYes                0.455709   0.101336   4.497 6.89e-06 ***
## lungdiseaseYes                 0.818769   0.116491   7.029 2.09e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 8872.2  on 6752  degrees of freedom
## Residual deviance: 7803.9  on 6736  degrees of freedom
##   (221 observations deleted due to missingness)
## AIC: 7837.9
## 
## Number of Fisher Scoring iterations: 4
## creating and saving publication-ready table
tbl_pub (model3, tablePath, "model3.docx", labelList[-c(2,14)]) -> tbl3

tbl_merge(list(tbl1,tbl3),
          tab_spanner = c("**Unadjusted**", "**Adjusted**")) -> merged1
## The merging columns (variable name, variable label, row type, and label column)
## are not unique and the merge may fail or result in a malformed table. If you
## previously 'tbl_stack'ed your tables, then 'tbl_merge'ing before you 'tbl_stack'
## may resolve the issue.
if(!file.exists(paste(tablePath,"merged1.docx", sep=""))){
        merged1 %>% as_gt() %>% gt::gtsave(filename = "merged1.docx", path = tablePath)
}

Let’s adjust the results and plot the odds ratios.

model3Tidy <- tidy(model3, conf.int = TRUE, exponentiate = TRUE)
model3Tidy %>% rename(Odds.Ratio = estimate) -> model3Tidy
model3Tidy
## # A tibble: 17 × 7
##    term               Odds.Ratio std.error statistic  p.value conf.low conf.high
##    <chr>                   <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
##  1 (Intercept)             1.19    0.306       0.582 5.61e- 1    0.657     2.18 
##  2 sleepqualityBad         0.699   0.172      -2.08  3.75e- 2    0.497     0.976
##  3 sleepqualityFair        0.413   0.165      -5.36  8.51e- 8    0.297     0.568
##  4 sleepqualityGood        0.223   0.162      -9.29  1.53e-20    0.161     0.305
##  5 sleepqualityVery …      0.188   0.177      -9.45  3.34e-21    0.132     0.265
##  6 age                     0.988   0.00307    -3.79  1.50e- 4    0.982     0.994
##  7 bmi                     1.03    0.00535     5.25  1.50e- 7    1.02      1.04 
##  8 depressionYes           1.36    0.0827      3.74  1.86e- 4    1.16      1.60 
##  9 alcoholLess than …      0.705   0.0974     -3.59  3.32e- 4    0.582     0.852
## 10 alcoholOnce a mon…      0.939   0.0823     -0.760 4.47e- 1    0.799     1.10 
## 11 fractureYes             1.56    0.244       1.83  6.74e- 2    0.969     2.53 
## 12 cancerYes               0.980   0.137      -0.144 8.85e- 1    0.748     1.28 
## 13 diabetesYes             1.21    0.0714      2.72  6.55e- 3    1.06      1.40 
## 14 dementiaYes             1.30    0.217       1.22  2.22e- 1    0.853     2.00 
## 15 arthritisYes            3.42    0.0664     18.5   1.81e-76    3.00      3.89 
## 16 heartdiseaseYes         1.58    0.101       4.50  6.89e- 6    1.29      1.92 
## 17 lungdiseaseYes          2.27    0.116       7.03  2.09e-12    1.81      2.85
## adding the reference terms to the data frame to allow for a better plot
model3Tidy %>% add_row(term = "sleepqualityVery Bad", Odds.Ratio = -99, std.error = 0, statistic = 0, p.value = 0, conf.low = 0, conf.high = 0, .before = 2) %>%
        add_row(term = "alcoholNever", Odds.Ratio = -99, std.error = 0, statistic = 0, p.value = 0, conf.low = 0, conf.high = 0, .before = 10) %>%
        ## filtering the intercept
        filter (term !="(Intercept)") %>%
        ## saving the result so that we can refer to it in the next call
        {. ->> model3Tidy}%>%
        ## adding a variable to determine whether the observation
        ## is a reference or not. Initially adding everything as FALSE
        add_column(.after = "conf.high",
                   reference = rep(FALSE, length.out = nrow(model3Tidy))) %>%
        ## now the appropriate ones to TRUE
        dplyr::mutate(reference = case_when
               (term %in% c("Very Bad", "Never") ~ TRUE,
                                     TRUE ~ FALSE)) %>%
        ## create a Variable column
        dplyr::mutate(.after= "term",
                      Variable = case_when(grepl("sleep", term) ~ "Sleep Quality",
                                           grepl("age", term) ~ "Age",
                                           grepl("bmi", term) ~ "Body Mass Index",
                                           grepl("depression",term) ~ "Depression",
                                           grepl("alcohol",term) ~ "Alcohol",
                                           grepl("fracture", term) ~ "Fracture",
                                           grepl("cancer",term) ~ "Cancer",
                                           grepl("diabetes",term) ~ "Diabetes",
                                           grepl("dementia", term) ~ "Dementia",
                                           grepl("arthritis",term) ~ "Arthritis",
                                           grepl("heart",term) ~ "Heart Disease",
                                           grepl("lung",term) ~ "Lung Disease")) %>%
        ## transform the variables to factor
        dplyr::mutate(Variable = as_factor(Variable)) %>%
        dplyr::mutate(term = as_factor(term)) -> model3Tidy


model3Tidy %>% ggplot (aes(x = Odds.Ratio, y = term, xmin = conf.low,
                           xmax = conf.high, height = 0, alpha = !reference)) +
                ## inserting a point for the odds ratio
                geom_point(aes(alpha = !reference)) +
                ## drawing a line at the 1 to improve readability
                geom_vline(xintercept = 1, lty = 4) +
                ## drawing an horizontal line to represent the 
                ## confidence interval
                geom_errorbarh() +
                scale_y_discrete(labels = labels2) +
                labs(x = "Odds Ratio", y = "",
                     title = "Presence of Pain", 
                subtitle = "Adjusted Odds Ratios",
                caption = paste("N =", nobs(model3))) +
                facet_grid(Variable ~ ., scales = "free_y", space = "free_y") +
                scale_alpha_identity() +
                # hide facet labels
                theme(strip.background = element_blank(),
                strip.text = element_blank()) +
                # remove spacing between facets
                theme(panel.spacing = unit(0, "pt")) +
                scale_x_continuous(breaks = seq(0,3, by = 0.5),
                                   limits = (c(0,2.9))) -> plot3

## checking if plot has been saved already
if (!file.exists("C:/Users/Pedro Henrique/Documents/PPCR/Data Analysis Project/DataProjectPPCR/plots/plot3.png")){
        ## if it hasn't, saving the plot for publication
        ggsave("plot3.png", plot3, 
               path = "C:/Users/Pedro Henrique/Documents/PPCR/Data Analysis Project/DataProjectPPCR/plots") 
}

plot3

If we refer to the analysis of the unadjusted model, the results are the same for the sleepquality variable. In summary, increasing levels of sleep quality are decrease the odds of pain being present.

Sensitivity Analysis

We can also test how the adjusted model behaves with sleep quality as a continuous variable.

model4 <- glm(data = PPCRdata, formula = pain ~ sleepqualitynum + age + bmi + 
                      depression + alcohol + fracture + cancer + diabetes + 
                      dementia + arthritis + heartdisease + lungdisease,
                family = "binomial")
summary(model4)
## 
## Call:
## glm(formula = pain ~ sleepqualitynum + age + bmi + depression + 
##     alcohol + fracture + cancer + diabetes + dementia + arthritis + 
##     heartdisease + lungdisease, family = "binomial", data = PPCRdata)
## 
## Coefficients:
##                                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                    0.734425   0.282044   2.604 0.009216 ** 
## sleepqualitynum               -0.489627   0.029134 -16.806  < 2e-16 ***
## age                           -0.011787   0.003062  -3.849 0.000119 ***
## bmi                            0.028120   0.005343   5.263 1.42e-07 ***
## depressionYes                  0.304587   0.082189   3.706 0.000211 ***
## alcoholLess than once a month -0.357506   0.097191  -3.678 0.000235 ***
## alcoholOnce a month or more   -0.063880   0.082100  -0.778 0.436524    
## fractureYes                    0.448646   0.242928   1.847 0.064772 .  
## cancerYes                     -0.015393   0.136500  -0.113 0.910215    
## diabetesYes                    0.198541   0.071297   2.785 0.005358 ** 
## dementiaYes                    0.276703   0.216056   1.281 0.200299    
## arthritisYes                   1.234864   0.066287  18.629  < 2e-16 ***
## heartdiseaseYes                0.461404   0.101158   4.561 5.09e-06 ***
## lungdiseaseYes                 0.825247   0.116301   7.096 1.29e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 8872.2  on 6752  degrees of freedom
## Residual deviance: 7818.1  on 6739  degrees of freedom
##   (221 observations deleted due to missingness)
## AIC: 7846.1
## 
## Number of Fisher Scoring iterations: 4
model4Tidy <- tidy(model4, conf.int = TRUE, exponentiate = TRUE)
model4Tidy %>% rename(Odds.Ratio = estimate) -> model4Tidy
model4Tidy
## # A tibble: 14 × 7
##    term               Odds.Ratio std.error statistic  p.value conf.low conf.high
##    <chr>                   <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
##  1 (Intercept)             2.08    0.282       2.60  9.22e- 3    1.20      3.63 
##  2 sleepqualitynum         0.613   0.0291    -16.8   2.21e-63    0.579     0.649
##  3 age                     0.988   0.00306    -3.85  1.19e- 4    0.982     0.994
##  4 bmi                     1.03    0.00534     5.26  1.42e- 7    1.02      1.04 
##  5 depressionYes           1.36    0.0822      3.71  2.11e- 4    1.15      1.59 
##  6 alcoholLess than …      0.699   0.0972     -3.68  2.35e- 4    0.577     0.845
##  7 alcoholOnce a mon…      0.938   0.0821     -0.778 4.37e- 1    0.798     1.10 
##  8 fractureYes             1.57    0.243       1.85  6.48e- 2    0.973     2.53 
##  9 cancerYes               0.985   0.136      -0.113 9.10e- 1    0.752     1.28 
## 10 diabetesYes             1.22    0.0713      2.78  5.36e- 3    1.06      1.40 
## 11 dementiaYes             1.32    0.216       1.28  2.00e- 1    0.864     2.02 
## 12 arthritisYes            3.44    0.0663     18.6   1.87e-77    3.02      3.92 
## 13 heartdiseaseYes         1.59    0.101       4.56  5.09e- 6    1.30      1.93 
## 14 lungdiseaseYes          2.28    0.116       7.10  1.29e-12    1.82      2.87
model4Tidy %>% ggplot (aes(Odds.Ratio, term, xmin = conf.low,
                           xmax = conf.high, height = 0)) +
                ## inserting a point for the odds ratio
                geom_point() +
                ## drawing a line at the 1 to improve readability
                geom_vline(xintercept = 1, lty = 4) +
                ## drawing an horizontal line to represent the 
                ## confidence interval
                geom_errorbarh() +
                scale_y_discrete(
                        limits = setdiff(model4Tidy$term, "(Intercept)"))

The results point towards increases in the continuous variable also being significantly associated with a decrease in the odds of pain being present. Looking at the AIC, it still seems as if the model with sleepquality coded as categorical behaves better.

Testing the assumptions of the model

At this point, I haven’t yet learned how to properly test for the assumptions of the logistic regression model. On a later opportunity, I’ll try to remedy this situation. For now, I’ll include some commands that were listed as useful in the R you ready for R? e-book.

library(blorr)
blr_model_fit_stats(model3)
##                               Model Fit Statistics                                
## ---------------------------------------------------------------------------------
## Log-Lik Intercept Only:     -4596.949    Log-Lik Full Model:            -3901.953 
## Deviance(6736):              7803.906    LR(16):                         1389.993 
##                                          Prob > LR:                         0.000 
## MCFadden's R2                   0.151    McFadden's Adj R2:                 0.147 
## ML (Cox-Snell) R2:              0.181    Cragg-Uhler(Nagelkerke) R2:        0.247 
## McKelvey & Zavoina's R2:        0.194    Efron's R2:                        0.154 
## Count R2:                       0.701    Adj Count R2:                      0.182 
## BIC:                         7953.807    AIC:                            7837.906 
## ---------------------------------------------------------------------------------
blr_test_hosmer_lemeshow(model3)
##            Partition for the Hosmer & Lemeshow Test            
## --------------------------------------------------------------
##                         def = 1                 def = 0        
## Group    Total    Observed    Expected    Observed    Expected 
## --------------------------------------------------------------
##   1       676       107        107.90       569        568.10  
##   2       675       107        128.45       568        546.55  
##   3       675       134        142.86       541        532.14  
##   4       675       162        160.29       513        514.71  
##   5       676       210        188.13       466        487.87  
##   6       675       215        226.35       460        448.65  
##   7       675       313        272.65       362        402.35  
##   8       675       323        327.53       352        347.47  
##   9       675       389        398.34       286        276.66  
##  10       676       513        520.51       163        155.49  
## --------------------------------------------------------------
## 
##      Goodness of Fit Test      
## ------------------------------
## Chi-Square    DF    Pr > ChiSq 
## ------------------------------
##  20.6763      8       0.0081   
## ------------------------------
blr_roc_curve(blr_gains_table(model3))

car::vif(model3)
##                  GVIF Df GVIF^(1/(2*Df))
## sleepquality 1.044858  4        1.005500
## age          1.080334  1        1.039391
## bmi          1.037185  1        1.018423
## depression   1.047143  1        1.023300
## alcohol      1.044415  2        1.010923
## fracture     1.006378  1        1.003184
## cancer       1.008435  1        1.004208
## diabetes     1.033202  1        1.016465
## dementia     1.012373  1        1.006168
## arthritis    1.026991  1        1.013406
## heartdisease 1.021015  1        1.010453
## lungdisease  1.004299  1        1.002147

Association between sleepquality and pain

Ordinal Regression

Adjusted Analysis

In order to do this analysis I’ll try to follow the instructions from the UCLA OARC - Ordinal Logistic Regression | R Data Analysis Example.

As the original article by Montelhão et al tested for bidirectional associations, let’s try to replicate that as well. In this case, the dependent variable is ordinal, therefore we will run an ordinal logistical regression.

## creating an ordinal version of sleepquality

PPCRdata %>% mutate (sleepqualityord = factor(PPCRdata$sleepquality,
                                              ordered = TRUE)) -> PPCRdata

## loading the library required to run the model
library(MASS)

## running the model
model5 <- polr (data = PPCRdata, formula = sleepqualityord ~ pain + depression + 
                       alcohol + fracture + cancer + diabetes + dementia +
                       arthritis + heartdisease + lungdisease, Hess = TRUE)
summary(model5)
## Call:
## polr(formula = sleepqualityord ~ pain + depression + alcohol + 
##     fracture + cancer + diabetes + dementia + arthritis + heartdisease + 
##     lungdisease, data = PPCRdata, Hess = TRUE)
## 
## Coefficients:
##                                  Value Std. Error  t value
## painYes                       -0.85302    0.05024 -16.9801
## depressionYes                 -0.82700    0.06790 -12.1801
## alcoholLess than once a month  0.07362    0.07643   0.9633
## alcoholOnce a month or more    0.06820    0.06710   1.0165
## fractureYes                   -0.36430    0.20135  -1.8093
## cancerYes                     -0.01869    0.11499  -0.1625
## diabetesYes                   -0.11152    0.05947  -1.8752
## dementiaYes                   -0.90439    0.17339  -5.2160
## arthritisYes                  -0.30678    0.05771  -5.3158
## heartdiseaseYes               -0.28727    0.08436  -3.4051
## lungdiseaseYes                -0.14181    0.09885  -1.4347
## 
## Intercepts:
##                Value    Std. Error t value 
## Very bad|Bad    -4.1364   0.0780   -53.0304
## Bad|Fair        -2.3132   0.0473   -48.9539
## Fair|Good       -0.9812   0.0383   -25.6396
## Good|Very good   1.5042   0.0430    34.9433
## 
## Residual Deviance: 17569.05 
## AIC: 17599.05 
## (98 observations deleted due to missingness)
## tidy(model5, exponentiate = TRUE, p.values = TRUE, conf.int = TRUE)

The tidy function is not working for the polr model, even though it should. I tried to debug it, but couldn’t. I will then describe a more manual approach, as was described in the UCLA Office of Advanced Research Computing (OARC) page for Ordinal Regression.

## getting the coefficients
ctable <- coef(summary(model5))

## getting the respective p-values from the normal distribution
p <- pnorm(abs(ctable[, "t value"]), lower.tail = FALSE) * 2

## creating a table with the respective p-values added
ctable <- cbind(ctable, "p value" = p)
ctable
##                                     Value Std. Error     t value       p value
## painYes                       -0.85301696 0.05023641 -16.9800548  1.153784e-64
## depressionYes                 -0.82699798 0.06789738 -12.1801156  3.967194e-34
## alcoholLess than once a month  0.07362053 0.07642528   0.9633007  3.353966e-01
## alcoholOnce a month or more    0.06820388 0.06709866   1.0164715  3.094049e-01
## fractureYes                   -0.36430180 0.20135407  -1.8092596  7.041068e-02
## cancerYes                     -0.01869110 0.11498945  -0.1625462  8.708757e-01
## diabetesYes                   -0.11151556 0.05946946  -1.8751736  6.076884e-02
## dementiaYes                   -0.90439261 0.17338723  -5.2160278  1.828008e-07
## arthritisYes                  -0.30678023 0.05771145  -5.3157605  1.062128e-07
## heartdiseaseYes               -0.28726601 0.08436256  -3.4051363  6.613103e-04
## lungdiseaseYes                -0.14181420 0.09884818  -1.4346668  1.513821e-01
## Very bad|Bad                  -4.13641304 0.07800080 -53.0303894  0.000000e+00
## Bad|Fair                      -2.31320370 0.04725268 -48.9539177  0.000000e+00
## Fair|Good                     -0.98124293 0.03827055 -25.6396324 5.518018e-145
## Good|Very good                 1.50417355 0.04304618  34.9432561 1.639373e-267
## creating confidence intervals
ci<-confint.default(model5)
colnames(ci) <- c("conf.low", "conf.high")

## exponentiating the log-odds and the confidence intervals
## in order to create the odds-ratios and the respective CI's
model5OddsRatio <- exp(cbind(OR = coef(model5), ci))
## converting rownames to columns, as this will be necessary to build the plot
model5OddsRatio <- as_tibble(rownames_to_column(as.data.frame(model5OddsRatio)))
model5OddsRatio %>% rename(term = rowname) -> model5OddsRatio

Let’s build a plot using these values.

model5OddsRatio %>% ggplot (aes(OR, term, xmin = conf.low,
                           xmax = conf.high, height = 0)) +
                geom_point() +
                geom_vline(xintercept = 1, lty = 4) +
                geom_errorbarh()

The plot allows us to visually determine that the presence of pain decreases the odds of sleepquality being higher. Unfortunately, the model has a high AIC.

Unadjusted Analysis

In order to be able to compare the adjusted analysis, let’s proceed with one that is unadjusted as well.

## this code is equivalent to the one used for the adjusted analysis, just 
## with no covariates added to the model
model6 <- polr (data = PPCRdata, formula = sleepqualityord ~ pain, Hess = TRUE)
## Warning in polr(data = PPCRdata, formula = sleepqualityord ~ pain, Hess =
## TRUE): design appears to be rank-deficient, so dropping some coefs
summary(model6)
## Call:
## polr(formula = sleepqualityord ~ pain, data = PPCRdata, Hess = TRUE)
## 
## Coefficients:
##          Value Std. Error t value
## painYes -1.023    0.04753  -21.53
## 
## Intercepts:
##                Value    Std. Error t value 
## Very bad|Bad    -3.8629   0.0717   -53.8797
## Bad|Fair        -2.1040   0.0402   -52.2757
## Fair|Good       -0.8107   0.0313   -25.9252
## Good|Very good   1.6254   0.0379    42.8842
## 
## Residual Deviance: 18102.95 
## AIC: 18112.95 
## (6 observations deleted due to missingness)
## getting the coefficients
ctable2 <- coef(summary(model6))

## getting the respective p-values from the normal distribution
p <- pnorm(abs(ctable2[, "t value"]), lower.tail = FALSE) * 2

## creating a table with the respective p-values added
ctable2 <- cbind(ctable2, "p value" = p)

## creating confidence intervals
ci<-confint.default(model6)
colnames(ci) <- c("conf.low", "conf.high")

## exponentiating the log-odds and the confidence intervals
## in order to create the odds-ratios and the respective CI's
model6OddsRatio <- exp(cbind(OR = coef(model6), ci))
## converting rownames to columns, as this will be necessary to build the plot
model6OddsRatio <- as_tibble(rownames_to_column(as.data.frame(model6OddsRatio)))
model6OddsRatio %>% rename(term = rowname) -> model6OddsRatio
model6OddsRatio
## # A tibble: 1 × 4
##   term       OR conf.low conf.high
##   <chr>   <dbl>    <dbl>     <dbl>
## 1 painYes 0.359    0.327     0.395
## building a plot
model6OddsRatio %>% ggplot (aes(OR, term, xmin = conf.low,
                           xmax = conf.high, height = 0)) +
                geom_point() +
                geom_vline(xintercept = 1, lty = 4) +
                geom_errorbarh()

We need to check what the warning means, but the presence of pain reduces the odds of increased sleep quality in this model. The AIC is huge, as should be expected of a model that only has one predictor and that predictor is a dichotomous variable.

Testing for the Proportional Odds Assumption

This is something to consider further on. At this point, I haven’t yet figured out exactly how to do it, but I’ll try to follow the instructions from the Office of Advanced Research Computing on another day.

## Loading the library that is needed to conduct the visual analysis of the
## proportional odds assumption

library (Hmisc)
## 
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:dplyr':
## 
##     src, summarize
## The following objects are masked from 'package:base':
## 
##     format.pval, units
sf <- function(y) {
  c('Y>=1' = qlogis(mean(y >= 1)),
    'Y>=2' = qlogis(mean(y >= 2)),
    'Y>=3' = qlogis(mean(y >= 3)),
    'Y>=4' = qlogis(mean(y >= 4)),
    'Y>=5' = qlogis(mean(y >= 5)),
    'Y>=6' = qlogis(mean(y >= 6)))
}

(s <- with(PPCRdata, summary(as.numeric(sleepqualityord) ~ pain + depression + 
                       alcohol + fracture + cancer + diabetes + dementia +
                       arthritis + heartdisease + lungdisease, fun=sf)))
## as.numeric(sleepqualityord)      N= 6968 , 6 Missing 
## 
## +------------+-------------------------+----+----+--------+---------+-----------+---------+----+
## |            |                         |   N|Y>=1|    Y>=2|     Y>=3|       Y>=4|     Y>=5|Y>=6|
## +------------+-------------------------+----+----+--------+---------+-----------+---------+----+
## |        pain|                       No|4389| Inf|4.122284|2.2407601| 0.83398814|-1.708658|-Inf|
## |            |                      Yes|2579| Inf|2.702686|0.9913397|-0.20623388|-2.347771|-Inf|
## |            |           Does not apply|   0|    |        |         |           |         |    |
## |            |Didn’t know/didn’t answer|   0|    |        |         |           |         |    |
## +------------+-------------------------+----+----+--------+---------+-----------+---------+----+
## |  depression|                       No|6058| Inf|3.626631|1.8714213| 0.54034089|-1.803168|-Inf|
## |            |                      Yes| 899| Inf|2.411318|0.6418539|-0.29807178|-3.066889|-Inf|
## |            |                  Missing|  11| Inf|2.302585|1.5040774| 0.18232156|-2.302585|-Inf|
## +------------+-------------------------+----+----+--------+---------+-----------+---------+----+
## |     alcohol|                    Never|5323| Inf|3.363997|1.5908421| 0.37907588|-1.937990|-Inf|
## |            |   Less than once a month| 680| Inf|4.019382|1.9459101| 0.61903921|-1.973077|-Inf|
## |            |     Once a month or more| 952| Inf|3.075775|1.8265024| 0.57408782|-1.741207|-Inf|
## |            |                  Missing|  13| Inf|     Inf|1.7047481|-0.15415068|-1.203973|-Inf|
## +------------+-------------------------+----+----+--------+---------+-----------+---------+----+
## |    fracture|                       No|6879| Inf|3.382288|1.6654277| 0.43438220|-1.905168|-Inf|
## |            |                      Yes|  89| Inf|2.627081|0.8850382|-0.11247798|-2.460809|-Inf|
## +------------+-------------------------+----+----+--------+---------+-----------+---------+----+
## |      cancer|                       No|6672| Inf|3.411938|1.6677603| 0.43352306|-1.914758|-Inf|
## |            |                      Yes| 292| Inf|2.665033|1.3523928| 0.28968008|-1.811881|-Inf|
## |            |                  Missing|   4| Inf|     Inf|1.0986123| 0.00000000|     -Inf|-Inf|
## +------------+-------------------------+----+----+--------+---------+-----------+---------+----+
## |    diabetes|                       No|5682| Inf|3.397204|1.7253825| 0.48294991|-1.888043|-Inf|
## |            |                      Yes|1256| Inf|3.225520|1.3576787| 0.20131409|-2.020781|-Inf|
## |            |                  Missing|  30| Inf|     Inf|1.8718022|-0.40546511|-1.871802|-Inf|
## +------------+-------------------------+----+----+--------+---------+-----------+---------+----+
## |    dementia|                       No|6848| Inf|3.414865|1.6819751| 0.45021038|-1.896113|-Inf|
## |            |                      Yes| 115| Inf|2.060023|0.4784902|-0.91021169|-3.323236|-Inf|
## |            |                  Missing|   5| Inf|1.386294|0.4054651|-0.40546511|     -Inf|-Inf|
## +------------+-------------------------+----+----+--------+---------+-----------+---------+----+
## |   arthritis|                       No|5465| Inf|3.609603|1.8539727| 0.58442952|-1.835344|-Inf|
## |            |                      Yes|1466| Inf|2.813170|1.0931627|-0.10651259|-2.232586|-Inf|
## |            |                  Missing|  37| Inf|1.856298|0.7339692|-0.49643689|-2.110213|-Inf|
## +------------+-------------------------+----+----+--------+---------+-----------+---------+----+
## |heartdisease|                       No|6412| Inf|3.451517|1.7161281| 0.47010500|-1.894850|-Inf|
## |            |                      Yes| 551| Inf|2.690759|1.0484225|-0.05445992|-2.102100|-Inf|
## |            |                  Missing|   5| Inf|     Inf|      Inf| 0.40546511|     -Inf|-Inf|
## +------------+-------------------------+----+----+--------+---------+-----------+---------+----+
## | lungdisease|                       No|6571| Inf|3.420717|1.6849207| 0.45543727|-1.901268|-Inf|
## |            |                      Yes| 392| Inf|2.730029|1.1962508|-0.03061464|-2.068013|-Inf|
## |            |                  Missing|   5| Inf|     Inf|1.3862944| 0.40546511|     -Inf|-Inf|
## +------------+-------------------------+----+----+--------+---------+-----------+---------+----+
## |     Overall|                         |6968| Inf|3.368484|1.6527710| 0.42715949|-1.910944|-Inf|
## +------------+-------------------------+----+----+--------+---------+-----------+---------+----+
s[, 6] <- s[, 6] - s[, 5]
s[, 5] <- s[, 5] - s[, 4]
s[, 4] <- s[, 4] - s[, 3]
s[, 3] <- s[, 3] - s[, 3]

s
## as.numeric(sleepqualityord)      N= 6968 , 6 Missing 
## 
## +------------+-------------------------+----+----+----+----------+----------+---------+----+
## |            |                         |   N|Y>=1|Y>=2|      Y>=3|      Y>=4|     Y>=5|Y>=6|
## +------------+-------------------------+----+----+----+----------+----------+---------+----+
## |        pain|                       No|4389| Inf|   0|-1.8815238|-1.4067720|-2.542646|-Inf|
## |            |                      Yes|2579| Inf|   0|-1.7113463|-1.1975736|-2.141537|-Inf|
## |            |           Does not apply|   0|    |    |          |          |         |    |
## |            |Didn’t know/didn’t answer|   0|    |    |          |          |         |    |
## +------------+-------------------------+----+----+----+----------+----------+---------+----+
## |  depression|                       No|6058| Inf|   0|-1.7552100|-1.3310804|-2.343509|-Inf|
## |            |                      Yes| 899| Inf|   0|-1.7694644|-0.9399257|-2.768818|-Inf|
## |            |                  Missing|  11| Inf|   0|-0.7985077|-1.3217558|-2.484907|-Inf|
## +------------+-------------------------+----+----+----+----------+----------+---------+----+
## |     alcohol|                    Never|5323| Inf|   0|-1.7731550|-1.2117662|-2.317066|-Inf|
## |            |   Less than once a month| 680| Inf|   0|-2.0734714|-1.3268709|-2.592116|-Inf|
## |            |     Once a month or more| 952| Inf|   0|-1.2492726|-1.2524146|-2.315295|-Inf|
## |            |                  Missing|  13| Inf|    |      -Inf|-1.8588988|-1.049822|-Inf|
## +------------+-------------------------+----+----+----+----------+----------+---------+----+
## |    fracture|                       No|6879| Inf|   0|-1.7168605|-1.2310455|-2.339551|-Inf|
## |            |                      Yes|  89| Inf|   0|-1.7420430|-0.9975162|-2.348331|-Inf|
## +------------+-------------------------+----+----+----+----------+----------+---------+----+
## |      cancer|                       No|6672| Inf|   0|-1.7441774|-1.2342372|-2.348281|-Inf|
## |            |                      Yes| 292| Inf|   0|-1.3126400|-1.0627127|-2.101561|-Inf|
## |            |                  Missing|   4| Inf|    |      -Inf|-1.0986123|     -Inf|-Inf|
## +------------+-------------------------+----+----+----+----------+----------+---------+----+
## |    diabetes|                       No|5682| Inf|   0|-1.6718214|-1.2424326|-2.370993|-Inf|
## |            |                      Yes|1256| Inf|   0|-1.8678417|-1.1563646|-2.222095|-Inf|
## |            |                  Missing|  30| Inf|    |      -Inf|-2.2772673|-1.466337|-Inf|
## +------------+-------------------------+----+----+----+----------+----------+---------+----+
## |    dementia|                       No|6848| Inf|   0|-1.7328899|-1.2317647|-2.346323|-Inf|
## |            |                      Yes| 115| Inf|   0|-1.5815332|-1.3887019|-2.413024|-Inf|
## |            |                  Missing|   5| Inf|   0|-0.9808293|-0.8109302|     -Inf|-Inf|
## +------------+-------------------------+----+----+----+----------+----------+---------+----+
## |   arthritis|                       No|5465| Inf|   0|-1.7556306|-1.2695431|-2.419773|-Inf|
## |            |                      Yes|1466| Inf|   0|-1.7200070|-1.1996753|-2.126073|-Inf|
## |            |                  Missing|  37| Inf|   0|-1.1223288|-1.2304061|-1.613776|-Inf|
## +------------+-------------------------+----+----+----+----------+----------+---------+----+
## |heartdisease|                       No|6412| Inf|   0|-1.7353891|-1.2460231|-2.364955|-Inf|
## |            |                      Yes| 551| Inf|   0|-1.6423362|-1.1028825|-2.047640|-Inf|
## |            |                  Missing|   5| Inf|    |          |      -Inf|     -Inf|-Inf|
## +------------+-------------------------+----+----+----+----------+----------+---------+----+
## | lungdisease|                       No|6571| Inf|   0|-1.7357965|-1.2294834|-2.356706|-Inf|
## |            |                      Yes| 392| Inf|   0|-1.5337783|-1.2268654|-2.037398|-Inf|
## |            |                  Missing|   5| Inf|    |      -Inf|-0.9808293|     -Inf|-Inf|
## +------------+-------------------------+----+----+----+----------+----------+---------+----+
## |     Overall|                         |6968| Inf|   0|-1.7157131|-1.2256115|-2.338103|-Inf|
## +------------+-------------------------+----+----+----+----------+----------+---------+----+
plot(s, which=1:6, xlab='logit', main=' ', xlim=c(-2.8,0))

It doesn’t look like the proportional odds assumption holds.

Logistic Regression

In order to conduct a logistic regression using sleep quality as a dependent variable, we need to first convert it to a dichotomous variable. Let’s consider that “Very Bad” and “Bad” are failures and “Fair”, “Good” and “Very Good” are successes.

PPCRdata %>% mutate (sleepqualitydich = case_when(
                        sleepquality %in% c("Very bad", "Bad") ~ 0,
                        sleepquality %in% c("Fair", "Good", "Very good") ~ 1
                                                )
                     ) -> PPCRdata

head(PPCRdata$sleepqualitydich)
## [1] 0 1 1 1 1 1
summary(PPCRdata$sleepqualitydich)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.0000  1.0000  1.0000  0.8393  1.0000  1.0000       6
Unadjusted Analysis

Let’s proceed with the unadjusted analysis with sleepquality as a dichotomous variable.

model7 <- glm(sleepqualitydich ~ pain, "binomial", data = PPCRdata)
summary(model7)
## 
## Call:
## glm(formula = sleepqualitydich ~ pain, family = "binomial", data = PPCRdata)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  2.24076    0.05120   43.77   <2e-16 ***
## painYes     -1.24942    0.06772  -18.45   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 6144.2  on 6967  degrees of freedom
## Residual deviance: 5790.3  on 6966  degrees of freedom
##   (6 observations deleted due to missingness)
## AIC: 5794.3
## 
## Number of Fisher Scoring iterations: 4
## creating and saving table for publication
tbl_pub (model7, tablePath, "model7.docx", labelList[2]) -> tbl7

model7Tidy <- tidy(model7, conf.int = TRUE, exponentiate = TRUE)
model7Tidy %>% rename(Odds.Ratio = estimate) -> model7Tidy
model7Tidy
## # A tibble: 2 × 7
##   term        Odds.Ratio std.error statistic  p.value conf.low conf.high
##   <chr>            <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)      9.40     0.0512      43.8 0           8.51     10.4  
## 2 painYes          0.287    0.0677     -18.5 5.12e-76    0.251     0.327
model7Tidy %>% ggplot (aes(Odds.Ratio, term, xmin = conf.low,
                           xmax = conf.high, height = 0)) +
                ## inserting a point for the odds ratio
                geom_point() +
                ## drawing a line at the 1 to improve readability
                geom_vline(xintercept = 1, lty = 4) +
                ## drawing an horizontal line to represent the 
                ## confidence interval
                geom_errorbarh() +
                scale_y_discrete(
                        limits = setdiff(model7Tidy$term, "(Intercept)"))

The presence of pain is associated with decreased odds of a success in sleep quality that is “Fair” or better.

Adjusted Analysis
model8 <- glm(sleepqualitydich ~ pain + age + bmi + depression + alcohol +
                      fracture + cancer + diabetes + dementia + arthritis +
                      heartdisease + lungdisease, "binomial", data = PPCRdata)
summary(model8)
## 
## Call:
## glm(formula = sleepqualitydich ~ pain + age + bmi + depression + 
##     alcohol + fracture + cancer + diabetes + dementia + arthritis + 
##     heartdisease + lungdisease, family = "binomial", data = PPCRdata)
## 
## Coefficients:
##                                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                    1.757899   0.336922   5.218 1.81e-07 ***
## painYes                       -1.044460   0.073726 -14.167  < 2e-16 ***
## age                            0.002957   0.003879   0.762 0.445873    
## bmi                            0.019638   0.006735   2.916 0.003546 ** 
## depressionYes                 -1.016393   0.086296 -11.778  < 2e-16 ***
## alcoholLess than once a month  0.166445   0.128151   1.299 0.194006    
## alcoholOnce a month or more    0.102168   0.108679   0.940 0.347173    
## fractureYes                   -0.521306   0.255911  -2.037 0.041644 *  
## cancerYes                     -0.095977   0.163413  -0.587 0.556984    
## diabetesYes                   -0.221428   0.086834  -2.550 0.010772 *  
## dementiaYes                   -0.721659   0.216245  -3.337 0.000846 ***
## arthritisYes                  -0.332851   0.080613  -4.129 3.64e-05 ***
## heartdiseaseYes               -0.384079   0.113976  -3.370 0.000752 ***
## lungdiseaseYes                -0.137382   0.133866  -1.026 0.304769    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 5929.9  on 6752  degrees of freedom
## Residual deviance: 5372.7  on 6739  degrees of freedom
##   (221 observations deleted due to missingness)
## AIC: 5400.7
## 
## Number of Fisher Scoring iterations: 5
## creating and saving table for publication
tbl_pub (model8, tablePath, "model8.docx", labelList[-c(1,14)]) -> tbl8

## creating merged table for adjusted and unadjusted analysis together
tbl_merge(list(tbl7,tbl8),
          tab_spanner = c("**Unadjusted**", "**Adjusted**")) -> merged2

if(!file.exists(paste(tablePath,"merged2.docx", sep=""))){
        merged2 %>% as_gt() %>% gt::gtsave(filename = "merged2.docx", path = tablePath)
}

## creating a table that has the four models together
tbl_merge(list(merged1,merged2)) %>%
        as_gt() %>% gt::tab_spanner(label = gt::md("**Unadjusted**"),
                                    columns = matches("_1_1"),
                                    replace = TRUE,
                                    level = 1,
                                    id = "unadjusted1") %>%
        gt::tab_spanner(label = gt::md("**Adjusted**"),
                        columns = matches("_2_1"),
                        replace = TRUE,
                        level = 1,
                        id = "adjusted1") %>%
        gt::tab_spanner(label = gt::md("**Unadjusted**"),
                                    columns = matches("_1_2"),
                                    replace = TRUE,
                                    level = 1,
                                    id = "unadjusted2") %>%
        gt::tab_spanner(label = gt::md("**Adjusted**"),
                        columns = matches("_2_2"),
                        replace = TRUE,
                        level = 1,
                        id = "adjusted2") %>%
        gt::tab_spanner(label = gt::md("**Outcome: Pain**"),
                        spanners = c("unadjusted1","adjusted1"),
                        level = 2) %>%
        gt::tab_spanner(label = gt::md("**Outcome: Sleep Quality**"),
                        spanners = c("unadjusted2", "adjusted2"),
                        level = 2) -> merged3 
## The merging columns (variable name, variable label, row type, and label column)
## are not unique and the merge may fail or result in a malformed table. If you
## previously 'tbl_stack'ed your tables, then 'tbl_merge'ing before you 'tbl_stack'
## may resolve the issue.
if(!file.exists(paste(tablePath,"merged3.docx", sep=""))){
        merged3 %>% gt::gtsave(filename = "merged3.docx", path = tablePath)
}

merged3
Outcome: Pain Outcome: Sleep Quality
Variable Unadjusted Adjusted Unadjusted Adjusted
OR1 95% CI1 p-value OR1 95% CI1 p-value OR1 95% CI1 p-value OR1 95% CI1 p-value
Sleep Quality











    Very bad







    Very bad







    Very bad







    Very bad







    Bad 0.66 0.48, 0.89 0.008 0.70 0.50, 0.98 0.038





    Fair 0.34 0.26, 0.46 <0.001 0.41 0.30, 0.57 <0.001





    Good 0.17 0.13, 0.22 <0.001 0.22 0.16, 0.30 <0.001





    Very good 0.14 0.10, 0.20 <0.001 0.19 0.13, 0.26 <0.001





Age


0.99 0.98, 0.99 <0.001


1.00 1.00, 1.01 0.4
Body Mass Index


1.03 1.02, 1.04 <0.001


1.02 1.01, 1.03 0.004
Depression











    No







    Yes


1.36 1.16, 1.60 <0.001


0.36 0.31, 0.43 <0.001
Alcohol











    Never







    Less than once a month


0.71 0.58, 0.85 <0.001


1.18 0.92, 1.53 0.2
    Once a month or more


0.94 0.80, 1.10 0.4


1.11 0.90, 1.38 0.3
Fracture











    No







    Yes


1.56 0.97, 2.53 0.067


0.59 0.36, 1.0 0.042
Cancer











    No







    Yes


0.98 0.75, 1.28 0.9


0.91 0.66, 1.26 0.6
Diabetes











    No







    Yes


1.21 1.06, 1.40 0.007


0.80 0.68, 0.95 0.011
Dementia











    No







    Yes


1.30 0.85, 2.00 0.2


0.49 0.32, 0.75 <0.001
Arthritis











    No







    Yes


3.42 3.00, 3.89 <0.001


0.72 0.61, 0.84 <0.001
Heart Disease











    No







    Yes


1.58 1.29, 1.92 <0.001


0.68 0.55, 0.85 <0.001
Lung Disease











    No







    Yes


2.27 1.81, 2.85 <0.001


0.87 0.67, 1.14 0.3
Pain











    No







    Yes





0.29 0.25, 0.33 <0.001 0.35 0.30, 0.41 <0.001
1 OR = Odds Ratio, CI = Confidence Interval
model8Tidy <- tidy(model8, conf.int = TRUE, exponentiate = TRUE)
model8Tidy %>% rename(Odds.Ratio = estimate) -> model8Tidy
model8Tidy
## # A tibble: 14 × 7
##    term               Odds.Ratio std.error statistic  p.value conf.low conf.high
##    <chr>                   <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
##  1 (Intercept)             5.80    0.337       5.22  1.81e- 7    3.00     11.2  
##  2 painYes                 0.352   0.0737    -14.2   1.47e-45    0.304     0.406
##  3 age                     1.00    0.00388     0.762 4.46e- 1    0.995     1.01 
##  4 bmi                     1.02    0.00673     2.92  3.55e- 3    1.01      1.03 
##  5 depressionYes           0.362   0.0863    -11.8   5.06e-32    0.306     0.429
##  6 alcoholLess than …      1.18    0.128       1.30  1.94e- 1    0.923     1.53 
##  7 alcoholOnce a mon…      1.11    0.109       0.940 3.47e- 1    0.898     1.38 
##  8 fractureYes             0.594   0.256      -2.04  4.16e- 2    0.364     0.995
##  9 cancerYes               0.908   0.163      -0.587 5.57e- 1    0.664     1.26 
## 10 diabetesYes             0.801   0.0868     -2.55  1.08e- 2    0.677     0.951
## 11 dementiaYes             0.486   0.216      -3.34  8.46e- 4    0.319     0.747
## 12 arthritisYes            0.717   0.0806     -4.13  3.64e- 5    0.613     0.840
## 13 heartdiseaseYes         0.681   0.114      -3.37  7.52e- 4    0.546     0.854
## 14 lungdiseaseYes          0.872   0.134      -1.03  3.05e- 1    0.673     1.14
## adding the reference terms to the data frame to allow for a better plot
model8Tidy %>% 
        add_row(term = "alcoholNever", Odds.Ratio = -99, std.error = 0, statistic = 0, p.value = 0, conf.low = 0, conf.high = 0, .before = 10) %>%
        ## filtering the intercept
        filter (term !="(Intercept)") %>%
        ## saving the result so that we can refer to it in the next call
        {. ->> model8Tidy}%>%
        ## adding a variable to determine whether the observation
        ## is a reference or not. Initially adding everything as FALSE
        add_column(.after = "conf.high",
                   reference = rep(FALSE, length.out = nrow(model8Tidy))) %>%
        ## now the appropriate ones to TRUE
        dplyr::mutate(reference = case_when
               (term %in% c("Never") ~ TRUE,
                                     TRUE ~ FALSE)) %>%
        ## create a Variable column
        dplyr::mutate(.after= "term",
                      Variable = case_when(grepl("pain", term) ~ "Pain",
                                           grepl("age", term) ~ "Age",
                                           grepl("bmi", term) ~ "Body Mass Index",
                                           grepl("depression",term) ~ "Depression",
                                           grepl("alcohol",term) ~ "Alcohol",
                                           grepl("fracture", term) ~ "Fracture",
                                           grepl("cancer",term) ~ "Cancer",
                                           grepl("diabetes",term) ~ "Diabetes",
                                           grepl("dementia", term) ~ "Dementia",
                                           grepl("arthritis",term) ~ "Arthritis",
                                           grepl("heart",term) ~ "Heart Disease",
                                           grepl("lung",term) ~ "Lung Disease")) %>%
        ## transform the variables to factor
        dplyr::mutate(Variable = as_factor(Variable)) %>%
        dplyr::mutate(term = as_factor(term)) -> model8Tidy


model8Tidy %>% ggplot (aes(x = Odds.Ratio, y = term, xmin = conf.low,
                           xmax = conf.high, height = 0, alpha = !reference)) +
                ## inserting a point for the odds ratio
                geom_point(aes(alpha = !reference)) +
                ## drawing a line at the 1 to improve readability
                geom_vline(xintercept = 1, lty = 4) +
                ## drawing an horizontal line to represent the 
                ## confidence interval
                geom_errorbarh() +
                scale_y_discrete(labels = labels2) +
                labs(x = "Odds Ratio", y = "",
                     title = "Sleep Quality being Fair or Better", 
                subtitle = "Adjusted Odds Ratios",
                caption = paste("N =", nobs(model8))) +
                facet_grid(Variable ~ ., scales = "free_y", space = "free_y") +
                scale_alpha_identity() +
                # hide facet labels
                theme(strip.background = element_blank(),
                strip.text = element_blank()) +
                # remove spacing between facets
                theme(panel.spacing = unit(0, "pt")) +
                scale_x_continuous(breaks = seq(0,1.8, by = 0.2),
                                   limits = (c(0.3,1.53))) -> plot8

## checking if plot has been saved already
if (!file.exists("C:/Users/Pedro Henrique/Documents/PPCR/Data Analysis Project/DataProjectPPCR/plots/plot8.png")){
        ## if it hasn't, saving the plot for publication
        ggsave("plot8.png", plot8, 
               path = "C:/Users/Pedro Henrique/Documents/PPCR/Data Analysis Project/DataProjectPPCR/plots") 
}

plot8

Testing the assumptions
Sensitivity Analysis
Inclusion of Sex as a Covariate

The analysis done by Morelhão et al. didn’t include sex as a covariate. We can analyze with it included to see its effect.

model9 <- glm(sleepqualitydich ~ pain + age + bmi + depression + alcohol +
         fracture + cancer + diabetes + dementia + arthritis +
         heartdisease + lungdisease + sex, "binomial", data = PPCRdata)

model9 %>% tidy(conf.int = TRUE, exponentiate = TRUE) %>% 
        rename(Odds.Ratio = estimate) -> model9Tidy

model9Tidy
## # A tibble: 15 × 7
##    term               Odds.Ratio std.error statistic  p.value conf.low conf.high
##    <chr>                   <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
##  1 (Intercept)             4.76    0.340      4.60   4.29e- 6    2.45      9.27 
##  2 painYes                 0.358   0.0739   -13.9    7.31e-44    0.310     0.414
##  3 age                     1.00    0.00388    0.700  4.84e- 1    0.995     1.01 
##  4 bmi                     1.02    0.00677    3.35   7.94e- 4    1.01      1.04 
##  5 depressionYes           0.382   0.0871   -11.1    2.08e-28    0.322     0.453
##  6 alcoholLess than …      1.11    0.129      0.809  4.18e- 1    0.866     1.44 
##  7 alcoholOnce a mon…      0.995   0.112     -0.0417 9.67e- 1    0.802     1.24 
##  8 fractureYes             0.632   0.256     -1.79   7.29e- 2    0.387     1.06 
##  9 cancerYes               0.905   0.164     -0.608  5.43e- 1    0.662     1.26 
## 10 diabetesYes             0.804   0.0870    -2.50   1.23e- 2    0.679     0.955
## 11 dementiaYes             0.466   0.217     -3.52   4.38e- 4    0.306     0.718
## 12 arthritisYes            0.746   0.0812    -3.61   3.11e- 4    0.637     0.876
## 13 heartdiseaseYes         0.657   0.115     -3.67   2.46e- 4    0.526     0.824
## 14 lungdiseaseYes          0.879   0.134     -0.963  3.36e- 1    0.678     1.15 
## 15 sexMale                 1.42    0.0805     4.34   1.42e- 5    1.21      1.66
## creating and saving table for publication
tbl_pub (model9, tablePath, "model9.docx", labelList[-1])
Variable OR1 95% CI1 p-value
Pain


    No
    Yes 0.36 0.31, 0.41 <0.001
Age 1.00 1.00, 1.01 0.5
Body Mass Index 1.02 1.01, 1.04 <0.001
Depression


    No
    Yes 0.38 0.32, 0.45 <0.001
Alcohol


    Never
    Less than once a month 1.11 0.87, 1.44 0.4
    Once a month or more 1.00 0.80, 1.24 >0.9
Fracture


    No
    Yes 0.63 0.39, 1.06 0.073
Cancer


    No
    Yes 0.91 0.66, 1.26 0.5
Diabetes


    No
    Yes 0.80 0.68, 0.96 0.012
Dementia


    No
    Yes 0.47 0.31, 0.72 <0.001
Arthritis


    No
    Yes 0.75 0.64, 0.88 <0.001
Heart Disease


    No
    Yes 0.66 0.53, 0.82 <0.001
Lung Disease


    No
    Yes 0.88 0.68, 1.15 0.3
Sex


    Female
    Male 1.42 1.21, 1.66 <0.001
1 OR = Odds Ratio, CI = Confidence Interval
## adding the reference terms to the data frame to allow for a better plot
model9Tidy %>% 
        add_row(term = "alcoholNever", Odds.Ratio = 0, std.error = 0, statistic = 0, p.value = 0, conf.low = 0, conf.high = 0, .before = 10) %>%
        ## filtering the intercept
        filter (term !="(Intercept)") %>%
        ## saving the result so that we can refer to it in the next call
        {. ->> model9Tidy}%>%
        ## adding a variable to determine whether the observation
        ## is a reference or not. Initially adding everything as FALSE
        add_column(.after = "conf.high",
                   reference = rep(FALSE, length.out = nrow(model9Tidy))) %>%
        ## now the appropriate ones to TRUE
        dplyr::mutate(reference = case_when
               (term %in% c("alcoholNever") ~ TRUE,
                                     TRUE ~ FALSE)) %>%
        ## create a Variable column
        dplyr::mutate(.after= "term",
                      Variable = case_when(grepl("pain", term) ~ "Pain",
                                           grepl("age", term) ~ "Age",
                                           grepl("bmi", term) ~ "Body Mass Index",
                                           grepl("depression",term) ~ "Depression",
                                           grepl("alcohol",term) ~ "Alcohol",
                                           grepl("fracture", term) ~ "Fracture",
                                           grepl("cancer",term) ~ "Cancer",
                                           grepl("diabetes",term) ~ "Diabetes",
                                           grepl("dementia", term) ~ "Dementia",
                                           grepl("arthritis",term) ~ "Arthritis",
                                           grepl("heart",term) ~ "Heart Disease",
                                           grepl("lung",term) ~ "Lung Disease",
                                           grepl("sex",term) ~ "Sex")) %>%
        ## transform the variables to factor
        dplyr::mutate(Variable = as_factor(Variable)) %>%
        dplyr::mutate(term = as_factor(term)) -> model9Tidy

model9Tidy %>% ggplot (aes(x = Odds.Ratio, y = term, xmin = conf.low,
                           xmax = conf.high, height = 0, alpha = !reference)) +
                ## inserting a point for the odds ratio
                geom_point(aes(alpha = !reference)) +
                ## drawing a line at the 1 to improve readability
                geom_vline(xintercept = 1, lty = 4) +
                ## drawing an horizontal line to represent the 
                ## confidence interval
                geom_errorbarh() +
                scale_y_discrete(labels = labels2) +
                labs(x = "Odds Ratio", y = "",
                     title = "Sleep Quality being Fair or Better", 
                subtitle = "Adjusted Odds Ratios",
                caption = paste("N =", nobs(model3))) +
                facet_grid(Variable ~ ., scales = "free_y", space = "free_y") +
                scale_alpha_identity() +
                # hide facet labels
                theme(strip.background = element_blank(),
                strip.text = element_blank()) +
                # remove spacing between facets
                theme(panel.spacing = unit(0, "pt")) +
                scale_x_continuous(breaks = seq(0,1.8, by = 0.2),
                                   limits = (c(0.3,1.67))) -> plot9

## checking if plot has been saved already
if (!file.exists("C:/Users/Pedro Henrique/Documents/PPCR/Data Analysis Project/DataProjectPPCR/plots/plot9.png")){
        ## if it hasn't, saving the plot for publication
        ggsave("plot9.png", plot9, 
               path = "C:/Users/Pedro Henrique/Documents/PPCR/Data Analysis Project/DataProjectPPCR/plots") 
}

plot9

The results show us that being male is associated with increased odds of having fair or better sleep quality.

Replacing Individual Comorbidities with a Self-Reported General Health Assessment

We can also substitute the usage of variables to determine presence of comorbidities for a general health assessment reported by the patient himself, to see if it changes the analysis.

# modeling
model10 <- glm(sleepqualitydich ~ pain + genhealth + age + bmi + sex, "binomial", data = PPCRdata)

model10 %>% tidy(conf.int = TRUE, exponentiate = TRUE) %>% 
        rename(Odds.Ratio = estimate) -> model10Tidy

tbl_pub (model10, tablePath, "model10.docx",
         append(labelList[-c(1,5,6,7,8,9,10,11,12,13)], genhealth ~ "General Health Assessment"))
Variable OR1 95% CI1 p-value
Pain


    No
    Yes 0.45 0.39, 0.52 <0.001
General Health Assessment


    Very bad
    Very bad
    Bad 1.48 1.07, 2.03 0.016
    Fair 2.85 2.11, 3.84 <0.001
    Good 6.33 4.58, 8.74 <0.001
    Very good 8.42 5.21, 14.0 <0.001
    Excellent 9.44 4.87, 20.2 <0.001
Age 1.01 1.00, 1.02 0.017
Body Mass Index 1.02 1.01, 1.03 0.004
Sex


    Female
    Male 1.67 1.44, 1.95 <0.001
1 OR = Odds Ratio, CI = Confidence Interval

While the direction of the effect was the same, when considering the subjects perception of their own general health status instead of comorbidity level, the presence of pain has a lower effect size in the association of decreased odds of having sleep quality that is fair or better.

Evaluation of dose-effect relationship between pain and sleep quality

We can also take a look at the subset of patients that have pain to see if sleep quality is worse with increasing levels of pain.

## unadjusted
model11 <- glm(sleepqualitydich ~ p_intensity,
               data = filter(PPCRdata, pain == "Yes"), family = "binomial")

model11 %>% tidy(conf.int = TRUE, exponentiate = TRUE) %>% 
        rename(Odds.Ratio = estimate) -> model11Tidy

tbl_pub(model11, tablePath, "model11.docx",
        list(p_intensity ~ "Pain Intensity")) -> tbl11

## adjusted
model12 <- glm(sleepqualitydich ~ p_intensity + age + bmi + sex + depression + alcohol +
         fracture + cancer + diabetes + dementia + arthritis +
         heartdisease + lungdisease,
    data = filter(PPCRdata, pain == "Yes"),
    family = "binomial")

model12 %>% tidy(conf.int = TRUE, exponentiate = TRUE) %>% 
        rename(Odds.Ratio = estimate) -> model12Tidy

tbl_pub (model12, tablePath, "model12.docx",
         append(labelList[-c(1,2)], p_intensity ~ "Pain Intensity")) -> tbl12

tbl_merge(list(tbl11,tbl12),
          tab_spanner = c("**Unadjusted**", "**Adjusted**")) -> merged4

if(!file.exists(paste(tablePath,"merged4.docx", sep=""))){
        merged4 %>% as_gt() %>% gt::gtsave(filename = "merged4.docx", path = tablePath)
}

merged4
Variable Unadjusted Adjusted
OR1 95% CI1 p-value OR1 95% CI1 p-value
Pain Intensity





    Soft/weak

    Moderate 0.60 0.43, 0.83 0.003 0.67 0.47, 0.94 0.022
    Intense/strong 0.28 0.20, 0.39 <0.001 0.35 0.24, 0.49 <0.001
Age


1.00 0.99, 1.01 >0.9
Body Mass Index


1.02 1.01, 1.04 0.005
Sex





    Female



    Male


1.27 1.02, 1.58 0.036
Depression





    No



    Yes


0.40 0.32, 0.51 <0.001
Alcohol





    Never



    Less than once a month


1.23 0.85, 1.82 0.3
    Once a month or more


1.06 0.78, 1.43 0.7
Fracture





    No



    Yes


0.50 0.27, 0.93 0.028
Cancer





    No



    Yes


0.88 0.58, 1.38 0.6
Diabetes





    No



    Yes


0.95 0.75, 1.19 0.6
Dementia





    No



    Yes


0.47 0.27, 0.82 0.007
Arthritis





    No



    Yes


0.84 0.69, 1.02 0.079
Heart Disease





    No



    Yes


0.67 0.51, 0.89 0.005
Lung Disease





    No



    Yes


0.81 0.60, 1.11 0.2
1 OR = Odds Ratio, CI = Confidence Interval

Increasing levels of pain intensity are also associated with decreased odds of having fair or better sleep quality.