In the last Rpubs post we tried to establish a model, using a modified version of ELSI dataset, to show association between pain intensity and sleep quality. Unfortunately, probably due to the large amount of missing observations in the pain intensity variable, the model was invalid, as the assumptions weren’t met. In this new analysis, we will try to associate another variable in that same dataset that also deals with pain (the pain variable). Let’s start by reading in the data and looking at these two variables.
library(tidyverse)
require(haven)
url <- 'https://contattafiles.s3.us-west-1.amazonaws.com/tnt45405/T9gKrWnaRr5K0yT/FINAL%20-%20ELSI_students.dta'
PPCRdata <- read_dta(url)
rm(url)
PPCRdata <- as_tibble(PPCRdata)
unique(PPCRdata$pain)
## <labelled<double>[2]>: Do you feel pain that bothers you frequently?
## [1] 1 0
##
## Labels:
## value label
## 0 No
## 1 Yes
## 8 Does not apply
## 9 Didn’t know/didn’t answer
unique(PPCRdata$sleepquality)
## <labelled<double>[6]>: How would you evaluate the quality of your sleep?
## [1] 4 2 3 1 5 NA
##
## Labels:
## value label
## 1 Very good
## 2 Good
## 3 Fair
## 4 Bad
## 5 Very bad
## 9 Didn’t know/didn’t answer
Looking at these variables we can see that pain is dichotomous and that sleep quality is ordinal. We’re going to treat it as continuous for our purposes.
Let’s code sleep quality as ordinal so that our model recognizes it as such. Let’s also code it so that sleep quality is ordered in a crescent manner.
Based on the previous work by (Morelhão et al. 2021) we’re going to use the covariates: age, BMI, comorbidities, and depression.
model <- lm(data = PPCRdata, formula = sleepquality ~ pain + age + bmi + depression + heartdisease + lungdisease + cholesterol + dementia + ckd + cancer + arthritis + stroke + diabetes + hypertension + numberfalls + hearingdevice + cataractsurg + glasses + eyedisease + jointsurg + viral)
summary (model)
##
## Call:
## lm(formula = sleepquality ~ pain + age + bmi + depression + heartdisease +
## lungdisease + cholesterol + dementia + ckd + cancer + arthritis +
## stroke + diabetes + hypertension + numberfalls + hearingdevice +
## cataractsurg + glasses + eyedisease + jointsurg + viral,
## data = PPCRdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.5545 -0.5681 -0.1441 0.6218 3.0826
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.590686 0.112326 23.064 < 2e-16 ***
## pain 0.377531 0.025200 14.981 < 2e-16 ***
## age -0.003377 0.001340 -2.519 0.011791 *
## bmi -0.009958 0.002246 -4.433 9.44e-06 ***
## depression 0.431919 0.034339 12.578 < 2e-16 ***
## heartdisease 0.127942 0.042846 2.986 0.002836 **
## lungdisease 0.038018 0.049075 0.775 0.438544
## cholesterol 0.106617 0.027669 3.853 0.000118 ***
## dementia 0.453108 0.089828 5.044 4.68e-07 ***
## ckd 0.181188 0.076044 2.383 0.017216 *
## cancer -0.012257 0.056270 -0.218 0.827573
## arthritis 0.137329 0.029210 4.701 2.64e-06 ***
## stroke 0.032515 0.062515 0.520 0.603002
## diabetes 0.025069 0.030660 0.818 0.413590
## hypertension 0.081622 0.024582 3.320 0.000904 ***
## numberfalls 0.116106 0.023091 5.028 5.08e-07 ***
## hearingdevice -0.169452 0.080381 -2.108 0.035059 *
## cataractsurg -0.033891 0.041327 -0.820 0.412214
## glasses -0.056618 0.024908 -2.273 0.023052 *
## eyedisease 0.128216 0.032775 3.912 9.25e-05 ***
## jointsurg 0.024021 0.086340 0.278 0.780861
## viral 0.093138 0.028230 3.299 0.000975 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9098 on 6550 degrees of freedom
## (402 observations deleted due to missingness)
## Multiple R-squared: 0.1239, Adjusted R-squared: 0.1211
## F-statistic: 44.13 on 21 and 6550 DF, p-value: < 2.2e-16
We can see that pain has a statistically significant association with sleep quality in this analysis.
This model has an adjusted R-squared of 0.1211 which means it describes 12% of the variation of the dependent variable with the data provided by the independent variables.
plot(model, 1)
It doesn’t look like this analysis conforms to the linearity assumption.
plot(model, 3)
There is heteroscedasticity in the model
plot (model,2)
It looks like the residuals conform to the QQ plot, therefore this assumption is satisfied.
plot(model, 5)
There don’t seem to be points with excessive leverage.
Unfortunately, not all assumptions of linear models have been satisfied.