In the last Rpubs post we tried to establish a model, using a modified version of ELSI dataset, to show association between pain intensity and sleep quality. Unfortunately, probably due to the large amount of missing observations in the pain intensity variable, the model was invalid, as the assumptions weren’t met. In this new analysis, we will try to associate another variable in that same dataset that also deals with pain (the pain variable). Let’s start by reading in the data and looking at these two variables.

library(tidyverse)
require(haven)
url <- 'https://contattafiles.s3.us-west-1.amazonaws.com/tnt45405/T9gKrWnaRr5K0yT/FINAL%20-%20ELSI_students.dta'
PPCRdata <- read_dta(url)
rm(url)
PPCRdata <- as_tibble(PPCRdata)
unique(PPCRdata$pain)
## <labelled<double>[2]>: Do you feel pain that bothers you frequently?
## [1] 1 0
## 
## Labels:
##  value                     label
##      0                        No
##      1                       Yes
##      8            Does not apply
##      9 Didn’t know/didn’t answer
unique(PPCRdata$sleepquality)
## <labelled<double>[6]>: How would you evaluate the quality of your sleep?
## [1]  4  2  3  1  5 NA
## 
## Labels:
##  value                     label
##      1                 Very good
##      2                      Good
##      3                      Fair
##      4                       Bad
##      5                  Very bad
##      9 Didn’t know/didn’t answer

Looking at these variables we can see that pain is dichotomous and that sleep quality is ordinal. We’re going to treat it as continuous for our purposes.

Let’s code sleep quality as ordinal so that our model recognizes it as such. Let’s also code it so that sleep quality is ordered in a crescent manner.

Based on the previous work by (Morelhão et al. 2021) we’re going to use the covariates: age, BMI, comorbidities, and depression.

model <- lm(data = PPCRdata, formula = sleepquality ~ pain + age + bmi + depression + heartdisease + lungdisease + cholesterol + dementia + ckd + cancer + arthritis + stroke + diabetes + hypertension + numberfalls + hearingdevice + cataractsurg + glasses + eyedisease + jointsurg + viral)
summary (model)
## 
## Call:
## lm(formula = sleepquality ~ pain + age + bmi + depression + heartdisease + 
##     lungdisease + cholesterol + dementia + ckd + cancer + arthritis + 
##     stroke + diabetes + hypertension + numberfalls + hearingdevice + 
##     cataractsurg + glasses + eyedisease + jointsurg + viral, 
##     data = PPCRdata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.5545 -0.5681 -0.1441  0.6218  3.0826 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    2.590686   0.112326  23.064  < 2e-16 ***
## pain           0.377531   0.025200  14.981  < 2e-16 ***
## age           -0.003377   0.001340  -2.519 0.011791 *  
## bmi           -0.009958   0.002246  -4.433 9.44e-06 ***
## depression     0.431919   0.034339  12.578  < 2e-16 ***
## heartdisease   0.127942   0.042846   2.986 0.002836 ** 
## lungdisease    0.038018   0.049075   0.775 0.438544    
## cholesterol    0.106617   0.027669   3.853 0.000118 ***
## dementia       0.453108   0.089828   5.044 4.68e-07 ***
## ckd            0.181188   0.076044   2.383 0.017216 *  
## cancer        -0.012257   0.056270  -0.218 0.827573    
## arthritis      0.137329   0.029210   4.701 2.64e-06 ***
## stroke         0.032515   0.062515   0.520 0.603002    
## diabetes       0.025069   0.030660   0.818 0.413590    
## hypertension   0.081622   0.024582   3.320 0.000904 ***
## numberfalls    0.116106   0.023091   5.028 5.08e-07 ***
## hearingdevice -0.169452   0.080381  -2.108 0.035059 *  
## cataractsurg  -0.033891   0.041327  -0.820 0.412214    
## glasses       -0.056618   0.024908  -2.273 0.023052 *  
## eyedisease     0.128216   0.032775   3.912 9.25e-05 ***
## jointsurg      0.024021   0.086340   0.278 0.780861    
## viral          0.093138   0.028230   3.299 0.000975 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9098 on 6550 degrees of freedom
##   (402 observations deleted due to missingness)
## Multiple R-squared:  0.1239, Adjusted R-squared:  0.1211 
## F-statistic: 44.13 on 21 and 6550 DF,  p-value: < 2.2e-16

We can see that pain has a statistically significant association with sleep quality in this analysis.

Checking for Assumptions

Strength of the model

This model has an adjusted R-squared of 0.1211 which means it describes 12% of the variation of the dependent variable with the data provided by the independent variables.

Linearity

plot(model, 1)

It doesn’t look like this analysis conforms to the linearity assumption.

Homoscedasticity

plot(model, 3)

There is heteroscedasticity in the model

Normality of Residuals

plot (model,2)

It looks like the residuals conform to the QQ plot, therefore this assumption is satisfied.

High Leverage Points

plot(model, 5)

There don’t seem to be points with excessive leverage.

Conclusion

Unfortunately, not all assumptions of linear models have been satisfied.