Load packages and read the COPD dataset
Summary of the COPD dataset
## $Summary
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
##   120.0   303.8   420.0   399.1   465.2   699.0       1
##
## $Mean
## [1] 399.11
##
## $`Standard Deviation`
## [1] 106.5501
##
## $Range
## [1] 120 699
##
## $`Inter-Quartile Range`
## [1] 161.5
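The list structure of the output above (`$Summary`, `$Mean`, and so on) can be reproduced with a small helper. This is a minimal sketch, not the report's own code: the function name `describe` and the vector `walk` are hypothetical, and the values are simulated rather than taken from the COPD dataset.

```r
# Hypothetical stand-in for a walking-score column with one missing value;
# the numbers are simulated, not the real COPD data.
set.seed(1)
walk <- c(round(rnorm(100, mean = 400, sd = 100)), NA)

# Collect descriptive statistics in a named list, ignoring missing values.
describe <- function(x) {
  list(
    Summary = summary(x),
    Mean = mean(x, na.rm = TRUE),
    "Standard Deviation" = sd(x, na.rm = TRUE),
    Range = range(x, na.rm = TRUE),
    "Inter-Quartile Range" = IQR(x, na.rm = TRUE)
  )
}

stats <- describe(walk)
print(stats)
```

Passing `na.rm = TRUE` matters here: without it, a single `NA` makes `mean()`, `sd()` and `range()` return `NA`, and `IQR()` stops with an error.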
Visualize Best Walking Score vs FEV_1 and calculate correlation
## `geom_smooth()` using formula 'y ~ x'
##
## Pearson's product-moment correlation
##
## data: data$FEV1 and data$MWT1Best
## t = 5.26, df = 98, p-value = 8.469e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3004898 0.6094629
## sample estimates:
## cor
## 0.4692142
##
## Spearman's rank correlation rho
##
## data: data$FEV1 and data$MWT1Best
## S = 90853, p-value = 1.995e-06
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.4548251
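Output of this kind comes from `cor.test()`, called once per method. The sketch below assumes a data frame with columns named `FEV1` and `MWT1Best`, as in the output above. The data frame `sim` and its values are simulated stand-ins, and base graphics are swapped in for the `ggplot2` scatter plot (implied by the `geom_smooth()` message) to keep the example dependency-free.

```r
# Simulated stand-in data; the column names follow the output above,
# the values are hypothetical.
set.seed(2)
n <- 100
sim <- data.frame(FEV1 = runif(n, 0.5, 3.5))
sim$MWT1Best <- 280 + 74 * sim$FEV1 + rnorm(n, sd = 95)

# Scatter plot with a least-squares line (base-graphics substitute
# for a ggplot2 scatter plot with geom_smooth(method = "lm")).
plot(sim$FEV1, sim$MWT1Best, xlab = "FEV1", ylab = "MWT1Best")
abline(lm(MWT1Best ~ FEV1, data = sim))

# Pearson measures linear association; Spearman works on ranks and
# so captures any monotonic association.
pearson  <- cor.test(sim$FEV1, sim$MWT1Best, method = "pearson")
spearman <- cor.test(sim$FEV1, sim$MWT1Best, method = "spearman")
print(pearson)
print(spearman)
```

Reporting both coefficients is a cheap robustness check: when they agree closely, as they do in the output above (0.47 and 0.45), the association is unlikely to be driven by a few extreme points.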
Visualize Best Walking Score vs Age and calculate correlation
## `geom_smooth()` using formula 'y ~ x'
##
## Pearson's product-moment correlation
##
## data: data$AGE and data$MWT1Best
## t = -2.3406, df = 98, p-value = 0.02128
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.4080687 -0.0352688
## sample estimates:
## cor
## -0.230093
##
## Spearman's rank correlation rho
##
## data: data$AGE and data$MWT1Best
## S = 211497, p-value = 0.006781
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.2691106
Walking Score vs FEV_1
##
## Call:
## lm(formula = MWT1Best ~ FEV1, data = data)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -249.592  -58.227    7.881   63.551  291.612
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   279.92      24.55   11.40  < 2e-16 ***
## FEV1           74.11      14.09    5.26 8.47e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 94.57 on 98 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.2202, Adjusted R-squared: 0.2122
## F-statistic: 27.67 on 1 and 98 DF, p-value: 8.469e-07
##                 2.5 %   97.5 %
## (Intercept) 231.19004 328.6456
## FEV1         46.15031 102.0710
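A model summary followed by a table of 2.5 % and 97.5 % limits is what `summary()` and `confint()` return for an `lm` fit. The following is a minimal sketch under stated assumptions: the formula is copied from the `Call:` line above, while the data frame `sim` is simulated (101 rows, one missing outcome) and the object name `fit_fev1` is hypothetical.

```r
# Simulated stand-in data with one missing outcome, so that lm() drops
# one row as in the output above. Values are hypothetical.
set.seed(3)
n <- 101
sim <- data.frame(FEV1 = runif(n, 0.5, 3.5))
sim$MWT1Best <- 280 + 74 * sim$FEV1 + rnorm(n, sd = 95)
sim$MWT1Best[5] <- NA

# Simple linear regression of walking score on FEV1.
fit_fev1 <- lm(MWT1Best ~ FEV1, data = sim)
print(summary(fit_fev1))

# 95% confidence intervals for the intercept and slope.
ci <- confint(fit_fev1)
print(ci)

# With a single predictor, R-squared equals the squared Pearson correlation.
r <- cor(sim$FEV1, sim$MWT1Best, use = "complete.obs")
print(c(r_squared = summary(fit_fev1)$r.squared, r_squared_from_cor = r^2))
```

The same identity is visible in the real output: the slope's t value (5.26) matches the Pearson test statistic, and the multiple R-squared (0.2202) is the square of the correlation (0.4692).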
Walking Score vs Age
##
## Call:
## lm(formula = MWT1Best ~ AGE, data = data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -257.44  -84.40   20.30   67.87  250.16
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  616.453     93.440   6.597 2.14e-09 ***
## AGE           -3.104      1.326  -2.341   0.0213 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 104.2 on 98 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.05294, Adjusted R-squared: 0.04328
## F-statistic: 5.478 on 1 and 98 DF, p-value: 0.02128
##                  2.5 %      97.5 %
## (Intercept) 431.023080 801.8819906
## AGE          -5.735718  -0.4722946
Three assumptions of a linear regression model are that: (1) the relationship between the outcome and the predictor variable is linear; (2) the outcome variable is normally distributed across values of the predictor (equivalently, the residuals are normally distributed); and (3) the variance of the outcome variable is constant across values of the predictor variable.
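Assumptions (1) and (3) are commonly checked with a plot of residuals against fitted values. The sketch below is illustrative only: the data frame `sim` is simulated and the object name `fit` is hypothetical.

```r
# Simulated stand-in data; values are hypothetical.
set.seed(4)
sim <- data.frame(AGE = round(runif(100, 45, 90)))
sim$MWT1Best <- 616 - 3.1 * sim$AGE + rnorm(100, sd = 100)

fit <- lm(MWT1Best ~ AGE, data = sim)

# Residuals vs fitted values: curvature suggests non-linearity,
# a funnel shape suggests non-constant variance.
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

A patternless band of points scattered evenly around the dashed zero line is consistent with both assumptions.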
A QQ plot is used to assess the assumption of normality. It is produced in R using the following commands:
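A minimal sketch of such commands, assuming a fitted `lm` object. The name `fit` and the simulated data are hypothetical, since the source does not show which model was plotted.

```r
# Simulated stand-in data and model; values are hypothetical.
set.seed(5)
sim <- data.frame(FEV1 = runif(100, 0.5, 3.5))
sim$MWT1Best <- 280 + 74 * sim$FEV1 + rnorm(100, sd = 95)
fit <- lm(MWT1Best ~ FEV1, data = sim)

# Normal QQ plot of the residuals, with a reference line drawn
# through the first and third quartiles.
res <- resid(fit)
qq <- qqnorm(res, main = "Normal QQ plot of residuals")
qqline(res)
```

`qqnorm()` returns the plotted coordinates invisibly, so `qq$x` holds the theoretical quantiles and `qq$y` the sample quantiles if they are needed later.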
The QQ plot suggests some violation of the assumption of normality: values lie off the straight line, including in the middle section of the plot. Recall that the QQ plot is a plot of the quantiles of the residuals against the quantiles of a theoretical normal distribution; if the residuals are normal, the observations will lie on a straight line.
This is backed up by examining the histogram of the residuals, which is produced using the following command:
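A minimal sketch of such a command, again on a hypothetical fitted model `fit` built from simulated data rather than the COPD dataset:

```r
# Simulated stand-in data and model; values are hypothetical.
set.seed(6)
sim <- data.frame(FEV1 = runif(100, 0.5, 3.5))
sim$MWT1Best <- 280 + 74 * sim$FEV1 + rnorm(100, sd = 95)
fit <- lm(MWT1Best ~ FEV1, data = sim)

# Histogram of the residuals; a roughly symmetric, bell-shaped
# histogram is consistent with normally distributed residuals.
h <- hist(resid(fit), main = "Histogram of residuals", xlab = "Residuals")
```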
Multiple Regression Model of walking vs Age + FEV_1
##
## Call:
## lm(formula = MWT1Best ~ FEV1 + AGE, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -233.13 -62.55 16.81 67.41 251.57
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  460.887     88.675   5.197 1.12e-06 ***
## FEV1          71.278     13.909   5.125 1.52e-06 ***
## AGE           -2.519      1.188  -2.121   0.0365 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 92.93 on 97 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.2547, Adjusted R-squared: 0.2394
## F-statistic: 16.58 on 2 and 97 DF, p-value: 6.419e-07
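A model with both predictors is fitted by adding terms to the right-hand side of the formula. The sketch below takes its formula from the `Call:` line above. The data frame `sim`, the object names `fit_multi` and `new_patient`, and all values are hypothetical stand-ins.

```r
# Simulated stand-in data; values are hypothetical.
set.seed(7)
n <- 100
sim <- data.frame(
  FEV1 = runif(n, 0.5, 3.5),
  AGE  = round(runif(n, 45, 90))
)
sim$MWT1Best <- 460 + 71 * sim$FEV1 - 2.5 * sim$AGE + rnorm(n, sd = 93)

# Multiple regression: each coefficient is the expected change in the
# outcome per unit change in that predictor, holding the other fixed.
fit_multi <- lm(MWT1Best ~ FEV1 + AGE, data = sim)
print(summary(fit_multi))

# Predicted walking score, with a 95% confidence interval for the mean,
# for a hypothetical patient.
new_patient <- data.frame(FEV1 = 1.5, AGE = 70)
pred <- predict(fit_multi, newdata = new_patient, interval = "confidence")
print(pred)
```

Applied by hand to the coefficients reported above, the same hypothetical patient (FEV1 = 1.5, AGE = 70) has a predicted walking score of 460.887 + 71.278 × 1.5 − 2.519 × 70 ≈ 391.5.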