Load packages and read the COPD dataset

Summary of the COPD dataset

## $Summary
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   120.0   303.8   420.0   399.1   465.2   699.0       1 
## 
## $Mean
## [1] 399.11
## 
## $`Standard Deviation`
## [1] 106.5501
## 
## $Range
## [1] 120 699
## 
## $`Inter-Quartile Range`
## [1] 161.5
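
A list like the one above could be produced by a small helper such as the following. This is only a sketch: the helper name `describe`, the vector `walk`, and the simulated values are assumptions for illustration, not the original code or data.

```r
# Synthetic stand-in for a numeric column with one missing value
# (NOT the real COPD data).
set.seed(42)
walk <- c(rnorm(100, mean = 400, sd = 105), NA)

# Hypothetical helper: collect common descriptive statistics in a named list.
describe <- function(x) {
  list(
    Summary                = summary(x),
    Mean                   = mean(x, na.rm = TRUE),
    "Standard Deviation"   = sd(x, na.rm = TRUE),
    Range                  = range(x, na.rm = TRUE),
    "Inter-Quartile Range" = IQR(x, na.rm = TRUE)
  )
}

stats <- describe(walk)
print(stats)
```

Every statistic needs `na.rm = TRUE` here; otherwise the single missing value would turn the mean, standard deviation, range and inter-quartile range into `NA`.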

Visualize Best Walking Score vs FEV_1 and calculate the correlation

## `geom_smooth()` using formula 'y ~ x'

## 
##  Pearson's product-moment correlation
## 
## data:  data$FEV1 and data$MWT1Best
## t = 5.26, df = 98, p-value = 8.469e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3004898 0.6094629
## sample estimates:
##       cor 
## 0.4692142
## 
##  Spearman's rank correlation rho
## 
## data:  data$FEV1 and data$MWT1Best
## S = 90853, p-value = 1.995e-06
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.4548251
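
The plot and the two tests above could be reproduced along the following lines. This is a sketch under assumptions: `copd_sim` is a synthetic data frame that only borrows the column names `FEV1` and `MWT1Best`, and the plotting step is skipped if `ggplot2` is not installed.

```r
# Synthetic stand-in for the COPD data (NOT the real dataset):
# 101 rows with one missing walking score, so that df = n - 2 = 98.
set.seed(1)
copd_sim <- data.frame(FEV1 = runif(101, 0.5, 3))
copd_sim$MWT1Best <- 280 + 74 * copd_sim$FEV1 + rnorm(101, sd = 95)
copd_sim$MWT1Best[10] <- NA

# Scatter plot with a fitted straight line (only if ggplot2 is available).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(copd_sim, aes(x = FEV1, y = MWT1Best)) +
    geom_point(na.rm = TRUE) +
    geom_smooth(method = "lm", na.rm = TRUE)
  print(p)
}

# cor.test() drops incomplete pairs before computing the statistic.
pearson  <- cor.test(copd_sim$FEV1, copd_sim$MWT1Best, method = "pearson")
spearman <- cor.test(copd_sim$FEV1, copd_sim$MWT1Best, method = "spearman")
print(pearson)
print(spearman)

# The Pearson t statistic follows from r and the degrees of freedom:
# t = r * sqrt(df) / sqrt(1 - r^2)
r        <- unname(pearson$estimate)
df       <- unname(pearson$parameter)
t_from_r <- r * sqrt(df) / sqrt(1 - r^2)
```

Applying the same formula to the values reported above gives 0.4692 × √98 / √(1 − 0.4692²) ≈ 5.26, which agrees with the printed t statistic.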

Visualize Best Walking Score vs Age and calculate the correlation

## `geom_smooth()` using formula 'y ~ x'

## 
##  Pearson's product-moment correlation
## 
## data:  data$AGE and data$MWT1Best
## t = -2.3406, df = 98, p-value = 0.02128
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.4080687 -0.0352688
## sample estimates:
##       cor 
## -0.230093
## 
##  Spearman's rank correlation rho
## 
## data:  data$AGE and data$MWT1Best
## S = 211497, p-value = 0.006781
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.2691106

Walking Score vs FEV_1

## 
## Call:
## lm(formula = MWT1Best ~ FEV1, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -249.592  -58.227    7.881   63.551  291.612 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   279.92      24.55   11.40  < 2e-16 ***
## FEV1           74.11      14.09    5.26 8.47e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 94.57 on 98 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.2202, Adjusted R-squared:  0.2122 
## F-statistic: 27.67 on 1 and 98 DF,  p-value: 8.469e-07
##                 2.5 %   97.5 %
## (Intercept) 231.19004 328.6456
## FEV1         46.15031 102.0710
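
A simple linear regression with confidence intervals, as in the output above, might be fitted as follows. The data frame `copd_sim` and the object name `fit_fev1` are assumptions; the data are simulated, so the numbers will not match the report.

```r
# Synthetic stand-in for the COPD data (NOT the real dataset).
set.seed(2)
copd_sim <- data.frame(FEV1 = runif(101, 0.5, 3))
copd_sim$MWT1Best <- 280 + 74 * copd_sim$FEV1 + rnorm(101, sd = 95)
copd_sim$MWT1Best[10] <- NA  # lm() drops this row by default (na.omit)

fit_fev1 <- lm(MWT1Best ~ FEV1, data = copd_sim)
print(summary(fit_fev1))
print(confint(fit_fev1))

# In simple linear regression, R-squared is the squared Pearson correlation.
r         <- cor(copd_sim$FEV1, copd_sim$MWT1Best, use = "complete.obs")
r_squared <- summary(fit_fev1)$r.squared

# The 95% interval for the slope is estimate +/- t quantile * standard error.
coefs     <- coef(summary(fit_fev1))
manual_ci <- coefs["FEV1", "Estimate"] +
  c(-1, 1) * qt(0.975, df.residual(fit_fev1)) * coefs["FEV1", "Std. Error"]
```

Both identities can be checked against the reported figures: 0.4692² ≈ 0.2202 (the Multiple R-squared), and 74.11 ± 1.984 × 14.09 gives approximately (46.15, 102.07), the interval printed for `FEV1`.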

Walking Score vs Age

## 
## Call:
## lm(formula = MWT1Best ~ AGE, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -257.44  -84.40   20.30   67.87  250.16 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  616.453     93.440   6.597 2.14e-09 ***
## AGE           -3.104      1.326  -2.341   0.0213 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 104.2 on 98 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.05294,    Adjusted R-squared:  0.04328 
## F-statistic: 5.478 on 1 and 98 DF,  p-value: 0.02128
##                  2.5 %      97.5 %
## (Intercept) 431.023080 801.8819906
## AGE          -5.735718  -0.4722946

Three assumptions of a linear regression model are that: (1) the relationship between the outcome and the predictor variable is linear; (2) the outcome variable is normally distributed across values of the predictor (equivalently, the residuals are normally distributed); and (3) the variance of the outcome variable is constant across values of the predictor variable.
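
Assumptions (1) and (3) are commonly inspected with a plot of residuals against fitted values. The sketch below uses simulated data; `copd_sim` and `fit_age` are hypothetical names, and this plot is an illustration rather than part of the original analysis.

```r
# Synthetic stand-in for the COPD data (NOT the real dataset).
set.seed(3)
copd_sim <- data.frame(AGE = round(runif(101, 45, 88)))
copd_sim$MWT1Best <- 616 - 3 * copd_sim$AGE + rnorm(101, sd = 100)
copd_sim$MWT1Best[10] <- NA

fit_age <- lm(MWT1Best ~ AGE, data = copd_sim)

# A curved pattern suggests non-linearity; a funnel shape suggests
# non-constant variance.
plot(fitted(fit_age), resid(fit_age),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted values")
abline(h = 0, lty = 2)
```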

A QQ plot is used to assess the assumption of normality. In R it is drawn from the residuals of the fitted model.
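
The original commands are not shown in this report. One common way to draw such a plot in base R is sketched below, assuming a model fitted to simulated data (`copd_sim` and `fit` are hypothetical names, and the choice of predictor is arbitrary).

```r
# Synthetic stand-in for the COPD data (NOT the real dataset).
set.seed(4)
copd_sim <- data.frame(FEV1 = runif(101, 0.5, 3))
copd_sim$MWT1Best <- 280 + 74 * copd_sim$FEV1 + rnorm(101, sd = 95)
copd_sim$MWT1Best[10] <- NA

fit <- lm(MWT1Best ~ FEV1, data = copd_sim)
res <- resid(fit)

# Sample quantiles of the residuals against theoretical normal quantiles,
# with a reference line through the first and third quartiles.
qq <- qqnorm(res, main = "Normal Q-Q plot of residuals")
qqline(res)
```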

The QQ plot suggests some violation of the assumption of normality: points lie off the straight line, including in the middle section of the plot. Recall that the QQ plot shows the quantiles of the residuals against the quantiles of a theoretical normal distribution; if the residuals are normal, the points lie on a straight line.

This is backed up by the histogram of the residuals.
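
The original command is not shown in this report. A histogram of residuals can be drawn in base R roughly as follows; the data are simulated, and the names `copd_sim` and `fit` and the number of breaks are assumptions.

```r
# Synthetic stand-in for the COPD data (NOT the real dataset).
set.seed(5)
copd_sim <- data.frame(FEV1 = runif(101, 0.5, 3))
copd_sim$MWT1Best <- 280 + 74 * copd_sim$FEV1 + rnorm(101, sd = 95)
copd_sim$MWT1Best[10] <- NA

fit <- lm(MWT1Best ~ FEV1, data = copd_sim)
res <- resid(fit)

# Roughly symmetric, bell-shaped bars are consistent with normal residuals.
h <- hist(res, breaks = 15,
          main = "Histogram of residuals", xlab = "Residual")
```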

Multiple regression model of Walking Score vs Age + FEV_1

## 
## Call:
## lm(formula = MWT1Best ~ FEV1 + AGE, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -233.13  -62.55   16.81   67.41  251.57 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  460.887     88.675   5.197 1.12e-06 ***
## FEV1          71.278     13.909   5.125 1.52e-06 ***
## AGE           -2.519      1.188  -2.121   0.0365 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 92.93 on 97 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.2547, Adjusted R-squared:  0.2394 
## F-statistic: 16.58 on 2 and 97 DF,  p-value: 6.419e-07
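
A model with both predictors could be fitted as sketched below, again on simulated data (`copd_sim` and `fit_multi` are hypothetical names). The last lines recompute the adjusted R-squared from its definition, 1 − (1 − R²)(n − 1)/(n − p − 1), where p is the number of predictors.

```r
# Synthetic stand-in for the COPD data (NOT the real dataset).
set.seed(6)
copd_sim <- data.frame(
  FEV1 = runif(101, 0.5, 3),
  AGE  = round(runif(101, 45, 88))
)
copd_sim$MWT1Best <- 460 + 71 * copd_sim$FEV1 - 2.5 * copd_sim$AGE +
  rnorm(101, sd = 93)
copd_sim$MWT1Best[10] <- NA

fit_multi <- lm(MWT1Best ~ FEV1 + AGE, data = copd_sim)
s <- summary(fit_multi)
print(s)

# Adjusted R-squared penalises R-squared for the number of predictors.
n <- nobs(fit_multi)  # observations actually used in the fit
p <- 2                # predictors: FEV1 and AGE
adj_manual <- 1 - (1 - s$r.squared) * (n - 1) / (n - p - 1)
```

With the reported values, 1 − (1 − 0.2547) × 99 / 97 ≈ 0.239, in line with the Adjusted R-squared of 0.2394 shown above (the small difference comes from rounding of the printed R-squared).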