knitr::opts_chunk$set(
    echo = TRUE,
    message = FALSE,
    warning = FALSE
)

Previous sections:

Clustering practice: https://rpubs.com/minhtri/923711
Table Descriptive Analysis: https://rpubs.com/minhtri/929114
Imputation - Dealing with missing data: https://rpubs.com/minhtri/968586
The R environment - Data Wrangling: https://rpubs.com/minhtri/968607
Correlation analysis: https://rpubs.com/minhtri/968611
Parametric or Nonparametric data: https://rpubs.com/minhtri/968616
Dealing with outliers or non-normal distribution: https://rpubs.com/minhtri/968622
Linear regression - Frequentist approach: https://rpubs.com/minhtri/968628
Linear regression - Bayesian framework: https://rpubs.com/minhtri/968636

Note: This analysis is used for personal study purpose. In this section, I will summerize some basic infomation about linear regression - model selection based on several sources.

R Packages:

# More packages will be shown during the analysis process
# Load the required library
library(tidyverse)    # Data Wrangling
library(conflicted)   # Dealing with conflict package
library(readxl)       # Read csv file

Dealing with Conflicts
There is a lot of packages here, and sometimes individual functions are in conflict. R’s default conflict resolution system gives precedence to the most recently loaded package. This can make it hard to detect conflicts, particularly when introduced by an update to an existing package.

Using the code below helps the entire section run properly. You may or may not need to look into the conflicted package for your work.

conflict_prefer("filter", "dplyr")
conflict_prefer("select", "dplyr")
conflict_prefer("Predict", "rms")
conflict_prefer("impute_median", "simputation")
conflict_prefer("summarize", "dplyr")

1 Simulated fakestroke data

Data used in this notes
fakestroke.csv
(Source: https://github.com/THOMASELOVE/432-data/blob/master/data/fakestroke.csv)

The fakestroke.csv file contains the following 18 variables for 500 patients:

# To make a table in R markdown: 1st row: header, 2nd row: Alignment; the remaining row: for content
## |Variable |  Description |
## |:------- | :----------  |
## |studyid  | Study ID  # (z001 through z500) |

Variable	Description
studyid	Study ID # (z001 through z500)
trt	Treatment group (Intervention or Control)
age	Age in years
sex	Male or Female
nihss	NIH Stroke Scale Score (can range from 0-42; higher scores indicate more severe neurological deficits)
location	Stroke Location - Left or Right Hemisphere
hx.isch	History of Ischemic Stroke (Yes/No)
afib	Atrial Fibrillation (1 = Yes, 0 = No)
dm	Diabetes Mellitus (1 = Yes, 0 = No)
mrankin	Pre-stroke modified Rankin scale score (0, 1, 2 or > 2) indicating functional disability - complete range is 0 (no symptoms) to 6 (death)
sbp	Systolic blood pressure, in mm Hg
iv.altep	Treatment with IV alteplase (Yes/No)
time.iv	Time from stroke onset to start of IV alteplase (minutes) if iv.altep=Yes
aspects	Alberta Stroke Program Early Computed Tomography score, which measures extent of stroke from 0 - 10; higher scores indicate fewer early ischemic changes
ia.occlus	Intracranial arterial occlusion, based on vessel imaging - five categories3
extra.ica	Extracranial ICA occlusion (1 = Yes, 0 = No)
time.rand	Time from stroke onset to study randomization, in minutes
time.punc	Time from stroke onset to groin puncture, in minutes (only if Intervention)

2 Data preprocessing step

2.1 Step 1: Data wrangling

Please refer to this section: The R environment - Data Wrangling: https://rpubs.com/minhtri/933332

Load the data to R and examine some basic infomation about variables. When loading the data, I also define the type of variables (numeric or text):

df <- read_excel("D:/Statistics/R/R data/fakestroke.xlsx", 
                 col_types = c("text", "text", "numeric",
                               "text", "numeric", "text", "text", 
                               "numeric", "numeric", "text", "numeric", 
                               "text", "numeric", "numeric", "text", 
                               "numeric", "numeric", "numeric")) 
names(df)

##  [1] "studyid"   "trt"       "age"       "sex"       "nihss"     "location" 
##  [7] "hx.isch"   "afib"      "dm"        "mrankin"   "sbp"       "iv.altep" 
## [13] "time.iv"   "aspects"   "ia.occlus" "extra.ica" "time.rand" "time.punc"

str(df)

## tibble [500 x 18] (S3: tbl_df/tbl/data.frame)
##  $ studyid  : chr [1:500] "z001" "z002" "z003" "z004" ...
##  $ trt      : chr [1:500] "Control" "Intervention" "Control" "Control" ...
##  $ age      : num [1:500] 53 51 68 28 91 34 75 89 75 26 ...
##  $ sex      : chr [1:500] "Male" "Male" "Female" "Male" ...
##  $ nihss    : num [1:500] 21 23 11 22 24 18 25 18 25 27 ...
##  $ location : chr [1:500] "Right" "Left" "Right" "Left" ...
##  $ hx.isch  : chr [1:500] "No" "No" "No" "No" ...
##  $ afib     : num [1:500] 0 1 0 0 0 0 0 0 1 0 ...
##  $ dm       : num [1:500] 0 0 0 0 0 0 0 0 0 0 ...
##  $ mrankin  : chr [1:500] "2" "0" "0" "0" ...
##  $ sbp      : num [1:500] 127 137 138 122 162 166 140 157 129 143 ...
##  $ iv.altep : chr [1:500] "Yes" "Yes" "No" "Yes" ...
##  $ time.iv  : num [1:500] 63 68 NA 78 121 78 97 NA 49 99 ...
##  $ aspects  : num [1:500] 10 10 10 10 8 5 10 9 6 10 ...
##  $ ia.occlus: chr [1:500] "M1" "M1" "ICA with M1" "ICA with M1" ...
##  $ extra.ica: num [1:500] 0 1 1 1 0 0 0 0 0 0 ...
##  $ time.rand: num [1:500] 139 118 178 160 214 154 122 147 271 141 ...
##  $ time.punc: num [1:500] NA 281 NA NA NA NA 268 NA NA 259 ...

Define factor for all character variables:

df <- df %>% mutate_if(is.character, as.factor)
str(df)

## tibble [500 x 18] (S3: tbl_df/tbl/data.frame)
##  $ studyid  : Factor w/ 500 levels "z001","z002",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ trt      : Factor w/ 2 levels "Control","Intervention": 1 2 1 1 1 1 2 1 1 2 ...
##  $ age      : num [1:500] 53 51 68 28 91 34 75 89 75 26 ...
##  $ sex      : Factor w/ 2 levels "Female","Male": 2 2 1 2 2 1 2 1 2 1 ...
##  $ nihss    : num [1:500] 21 23 11 22 24 18 25 18 25 27 ...
##  $ location : Factor w/ 2 levels "Left","Right": 2 1 2 1 2 1 2 2 1 2 ...
##  $ hx.isch  : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ afib     : num [1:500] 0 1 0 0 0 0 0 0 1 0 ...
##  $ dm       : num [1:500] 0 0 0 0 0 0 0 0 0 0 ...
##  $ mrankin  : Factor w/ 4 levels "> 2","0","1",..: 4 2 2 2 2 4 2 2 4 2 ...
##  $ sbp      : num [1:500] 127 137 138 122 162 166 140 157 129 143 ...
##  $ iv.altep : Factor w/ 2 levels "No","Yes": 2 2 1 2 2 2 2 1 2 2 ...
##  $ time.iv  : num [1:500] 63 68 NA 78 121 78 97 NA 49 99 ...
##  $ aspects  : num [1:500] 10 10 10 10 8 5 10 9 6 10 ...
##  $ ia.occlus: Factor w/ 6 levels "A1 or A2","ICA with M1",..: 4 4 2 2 4 1 4 4 4 4 ...
##  $ extra.ica: num [1:500] 0 1 1 1 0 0 0 0 0 0 ...
##  $ time.rand: num [1:500] 139 118 178 160 214 154 122 147 271 141 ...
##  $ time.punc: num [1:500] NA 281 NA NA NA NA 268 NA NA 259 ...

afib, dm, extra.ica are variables but in the excel file they are noted as “1” and “0”. Therefore, I must convert them back to “Yes-No” for factor variables:

# Changing coding variable to categorical:
df$afib       <- factor(df$afib , levels=0:1, labels=c("No", "Yes"))
df$dm         <- factor(df$dm , levels=0:1, labels=c("No", "Yes"))
df$extra.ica  <- factor(df$extra.ica, levels=0:1, labels=c("No", "Yes"))
# Checking after converting
str(df)

## tibble [500 x 18] (S3: tbl_df/tbl/data.frame)
##  $ studyid  : Factor w/ 500 levels "z001","z002",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ trt      : Factor w/ 2 levels "Control","Intervention": 1 2 1 1 1 1 2 1 1 2 ...
##  $ age      : num [1:500] 53 51 68 28 91 34 75 89 75 26 ...
##  $ sex      : Factor w/ 2 levels "Female","Male": 2 2 1 2 2 1 2 1 2 1 ...
##  $ nihss    : num [1:500] 21 23 11 22 24 18 25 18 25 27 ...
##  $ location : Factor w/ 2 levels "Left","Right": 2 1 2 1 2 1 2 2 1 2 ...
##  $ hx.isch  : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ afib     : Factor w/ 2 levels "No","Yes": 1 2 1 1 1 1 1 1 2 1 ...
##  $ dm       : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ mrankin  : Factor w/ 4 levels "> 2","0","1",..: 4 2 2 2 2 4 2 2 4 2 ...
##  $ sbp      : num [1:500] 127 137 138 122 162 166 140 157 129 143 ...
##  $ iv.altep : Factor w/ 2 levels "No","Yes": 2 2 1 2 2 2 2 1 2 2 ...
##  $ time.iv  : num [1:500] 63 68 NA 78 121 78 97 NA 49 99 ...
##  $ aspects  : num [1:500] 10 10 10 10 8 5 10 9 6 10 ...
##  $ ia.occlus: Factor w/ 6 levels "A1 or A2","ICA with M1",..: 4 4 2 2 4 1 4 4 4 4 ...
##  $ extra.ica: Factor w/ 2 levels "No","Yes": 1 2 2 2 1 1 1 1 1 1 ...
##  $ time.rand: num [1:500] 139 118 178 160 214 154 122 147 271 141 ...
##  $ time.punc: num [1:500] NA 281 NA NA NA NA 268 NA NA 259 ...

Reorder the level of some character variables. After this, basically we have tidy data that are nearly ready for Missingness checking:

trt: Intervention (baseline) , Control
mrankin: 0, 1, 2, > 2.
ia.occlus: rearrange the ia.occlus variable to the order presented in Berkhemer et al. (2015).

# Relevel factors:
df$trt = factor(df$trt, levels=c("Control", "Intervention"))
df$mrankin = factor(df$mrankin, levels=c("0", "1", "2", "> 2"))
df$ia.occlus = factor(df$ia.occlus, levels=c("Intracranial ICA", "ICA with M1", 
                                             "M1", "M2", "A1 or A2"))

# Check order of factor after relevel: 
str(df)

## tibble [500 x 18] (S3: tbl_df/tbl/data.frame)
##  $ studyid  : Factor w/ 500 levels "z001","z002",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ trt      : Factor w/ 2 levels "Control","Intervention": 1 2 1 1 1 1 2 1 1 2 ...
##  $ age      : num [1:500] 53 51 68 28 91 34 75 89 75 26 ...
##  $ sex      : Factor w/ 2 levels "Female","Male": 2 2 1 2 2 1 2 1 2 1 ...
##  $ nihss    : num [1:500] 21 23 11 22 24 18 25 18 25 27 ...
##  $ location : Factor w/ 2 levels "Left","Right": 2 1 2 1 2 1 2 2 1 2 ...
##  $ hx.isch  : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ afib     : Factor w/ 2 levels "No","Yes": 1 2 1 1 1 1 1 1 2 1 ...
##  $ dm       : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ mrankin  : Factor w/ 4 levels "0","1","2","> 2": 3 1 1 1 1 3 1 1 3 1 ...
##  $ sbp      : num [1:500] 127 137 138 122 162 166 140 157 129 143 ...
##  $ iv.altep : Factor w/ 2 levels "No","Yes": 2 2 1 2 2 2 2 1 2 2 ...
##  $ time.iv  : num [1:500] 63 68 NA 78 121 78 97 NA 49 99 ...
##  $ aspects  : num [1:500] 10 10 10 10 8 5 10 9 6 10 ...
##  $ ia.occlus: Factor w/ 5 levels "Intracranial ICA",..: 3 3 2 2 3 5 3 3 3 3 ...
##  $ extra.ica: Factor w/ 2 levels "No","Yes": 1 2 2 2 1 1 1 1 1 1 ...
##  $ time.rand: num [1:500] 139 118 178 160 214 154 122 147 271 141 ...
##  $ time.punc: num [1:500] NA 281 NA NA NA NA 268 NA NA 259 ...

2.2 Step 2: Missingness checking and imputation when necessary

Package: VIM

Command: aggr (x, plot = T, prop = T, numbers = F, combined = F, sortVars = T, cex.axis=.7, gap=3)

plot: whether the results should be plotted.
numbers: whether the proportion or frequencies of the different combinations should be represented by numbers.
prop: whether the proportion of missing/imputed values should be used rather than the total amount.
combined: a logical indicating whether the two plots should be combined.
sortVars: whether the variables should be sorted by the number of missing/imputed values.

For more infomation, please refer to the following link:
Dealing with missingness - Imputation: https://rpubs.com/minhtri/932007

Checking the percentage of missingness:

library(VIM)
aggr(df, plot = F, numbers = T, prop = T)

## 
##  Missings in variables:
##   Variable Count
##        sbp     1
##    time.iv    55
##    aspects     4
##  ia.occlus     1
##  extra.ica     1
##  time.rand     2
##  time.punc   267

aggr(df, plot = T, numbers = T, prop = T, combined= F, sortVars = T, cex.axis=.7, gap=3)

## 
##  Variables sorted by number of missings: 
##   Variable Count
##  time.punc 0.534
##    time.iv 0.110
##    aspects 0.008
##  time.rand 0.004
##        sbp 0.002
##  ia.occlus 0.002
##  extra.ica 0.002
##    studyid 0.000
##        trt 0.000
##        age 0.000
##        sex 0.000
##      nihss 0.000
##   location 0.000
##    hx.isch 0.000
##       afib 0.000
##         dm 0.000
##    mrankin 0.000
##   iv.altep 0.000

aggr(df, plot = T, numbers = T, prop = F, combined= F, sortVars = T, cex.axis=.7, gap=3)

## 
##  Variables sorted by number of missings: 
##   Variable Count
##  time.punc   267
##    time.iv    55
##    aspects     4
##  time.rand     2
##        sbp     1
##  ia.occlus     1
##  extra.ica     1
##    studyid     0
##        trt     0
##        age     0
##        sex     0
##      nihss     0
##   location     0
##    hx.isch     0
##       afib     0
##         dm     0
##    mrankin     0
##   iv.altep     0

Discuss

There are 7 variables with missingess.
However, the missingness in time.punc and time.iv is due to the experimental design (time.punc is only applied for Intervention group (trt), time.iv only applied for patients if iv.altep =Yes). Therefore, I will exclude them from the missingness analysis.
Only 5 variables left: the proportion of each < 5%.
=> Complete case analysis can be applied.
=> We can ignore the missingness and progress to next step - Multiple regression.

df1 = df %>% select (-time.punc, - time.iv, -studyid) %>% na.omit()

3 Model Selection in Statistics

In statistics, model selection is a critical step. Your findings may be invalidated if you specify incorrectly and choose the incorrect model. When the independent variables and their functional form (i.e., curvature and interactions) incorrectly reflect the true connection existing in the data, this is referred to as specification error. Bias can be caused by specification errors, which can overstate, understate, or completely conceal the presence of underlying connections. In summary, you cannot rely on your outcomes! As a result, in order to pick the appropriate regression model, you must first grasp model selection in statistics.

There are several common techniques to model selection:

Bayesian Model Average (recommend).
Stepwise method approach.
Criterion-Based method ( slightly recommend).

4 Bayesian Model Average: BMA package (recommend)

4.1 Step 3: Finding the optimal model

To fit a Bayesian regression, we use the function bicreg from the BMA package. Bayesian Model Averaging accounts for the model uncertainty inherent in the variable selection problem by averaging over the best models in the model class according to approximate posterior model probability.

Command:

seach = bicreg(x, y, wt = rep(1, length(y)), strict = FALSE, OR = 20, maxCol = 31, drop.factor.levels = TRUE, nbest = 150)
imageplot.bma(search, color = c(“red”, “blue”, “#FFFFD5”))

with: color: A vector of colors of length 3. The first color is the color to use when the variable estimate is positive, the second color is the color to use when the variable estimate is negative, and the third color is the color to use when the variable is not included in the model.

library(BMA)
X = df1 %>% select(-nihss)
Y = df1$nihss   
search = bicreg(X, Y, strict = FALSE, OR = 20)
summary(search)

## 
## Call:
## bicreg(x = X, y = Y, strict = FALSE, OR = 20)
## 
## 
##   32  models were selected
##  Best  5  models (cumulative posterior probability =  0.503 ): 
## 
##                       p!=0    EV         SD        model 1  model 2  model 3
## Intercept             100.0  18.2925118  0.775666  18.5722  18.2413  17.4026
## trtIntervention         0.0   0.0000000  0.000000      .        .        .  
## age                     1.2  -0.0001306  0.001798      .        .        .  
## sexMale                 1.0   0.0027078  0.050565      .        .        .  
## locationRight          11.4   0.0882728  0.283505      .        .        .  
## hx.ischYes              5.0  -0.0500489  0.264613      .        .        .  
## afibYes                 0.0   0.0000000  0.000000      .        .        .  
## dmYes                  95.0  -1.9546750  0.764513  -2.1037  -2.0282  -2.0002
## mrankin1                1.2  -0.0067112  0.097064      .        .        .  
## mrankin2                0.0   0.0000000  0.000000      .        .        .  
## mrankin..2              4.3   0.0628149  0.365601      .        .        .  
## sbp                     2.3  -0.0001566  0.001620      .        .        .  
## iv.altepYes            13.7   0.1769946  0.508889      .        .     1.3084
## aspects                 2.4  -0.0028924  0.027854      .        .        .  
## ia.occlusICA.with.M1   52.2  -0.6299169  0.699413  -1.1983      .    -1.2284
## ia.occlusM1             5.2   0.0243034  0.221142      .        .        .  
## ia.occlusM2             5.6   0.0738953  0.371888      .        .        .  
## ia.occlusA1.or.A2       4.8   0.1891645  1.028578      .        .        .  
## extra.icaYes            1.2   0.0043076  0.064920      .        .        .  
## time.rand               5.3  -0.0002669  0.001354      .        .        .  
##                                                                             
## nVar                                                  2        1        3   
## r2                                                  0.033    0.020    0.041 
## BIC                                                -4.2249  -3.9371  -1.8940
## post prob                                           0.190    0.165    0.059 
##                       model 4  model 5
## Intercept             17.8790  18.2187
## trtIntervention           .        .  
## age                       .        .  
## sexMale                   .        .  
## locationRight          0.7926   0.7544
## hx.ischYes                .        .  
## afibYes                   .        .  
## dmYes                 -2.0687  -2.1403
## mrankin1                  .        .  
## mrankin2                  .        .  
## mrankin..2                .        .  
## sbp                       .        .  
## iv.altepYes               .        .  
## aspects                   .        .  
## ia.occlusICA.with.M1      .    -1.1670
## ia.occlusM1               .        .  
## ia.occlusM2               .        .  
## ia.occlusA1.or.A2         .        .  
## extra.icaYes              .        .  
## time.rand                 .        .  
##                                       
## nVar                     2        3   
## r2                     0.028    0.040 
## BIC                   -1.3226  -1.3148
## post prob              0.045    0.044

imageplot.bma(search)

Discuss: We have a lot of suggested models but I usually choose the 1st one: dm and ia.occlus variables.

4.2 Step 4: Building the suggested model

Please refer to this link for detail discussion:
7. Linear regression - Frequentist approach: https://rpubs.com/minhtri/936322

Command: newModel<-lm(outcome ~ predictor(s), data = dataFrame, na.action = an action))

with na.action:

na.action = na.fail: This is the default and it simply means that if there are any missing values the model will fail to compute.
na.action = na.omit or na.exclude: This estimates the model but excludes any case that has any missing data on any variable in the model.

# Model:
model_B <- lm(nihss ~ dm + ia.occlus, data = df1)
summary(model_B)

## 
## Call:
## lm(formula = nihss ~ dm + ia.occlus, data = df1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.4949 -3.4949 -0.4949  3.6222 10.6222 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           14.7500     2.3016   6.409 3.48e-10 ***
## dmYes                 -2.1440     0.6317  -3.394 0.000746 ***
## ia.occlusICA with M1   2.6278     2.3370   1.124 0.261387    
## ia.occlusM1            3.7449     2.3176   1.616 0.106774    
## ia.occlusM2            4.6186     2.4225   1.907 0.057167 .  
## ia.occlusA1 or A2      7.5833     3.5157   2.157 0.031498 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.603 on 486 degrees of freedom
## Multiple R-squared:  0.04498,    Adjusted R-squared:  0.03515 
## F-statistic: 4.577 on 5 and 486 DF,  p-value: 0.000433

# Importance of each variable:
library(relaimpo)
calc.relimp(model_B, type = "lmg")

## Response variable: nihss 
## Total response variance: 21.9612 
## Analysis based on 492 observations 
## 
## 5 Regressors: 
## Some regressors combined in groups: 
##         Group  ia.occlus : ia.occlusICA with M1 ia.occlusM1 ia.occlusM2 ia.occlusA1 or A2 
## 
##  Relative importance of 2 (groups of) regressors assessed: 
##  ia.occlus dm 
##  
## Proportion of variance explained by model: 4.5%
## Metrics are not normalized (rela=FALSE). 
## 
## Relative importance metrics: 
## 
##                  lmg
## ia.occlus 0.02346548
## dm        0.02150961
## 
## Average coefficients for different model sizes: 
## 
##                         1group   2groups
## dm                   -2.028185 -2.144006
## ia.occlusICA with M1  2.416667  2.627819
## ia.occlusM1           3.465873  3.744934
## ia.occlusM2           4.223684  4.618633
## ia.occlusA1 or A2     7.583333  7.583333

# CI of model:
confint(model_B, level = 0.95)

##                           2.5 %     97.5 %
## (Intercept)          10.2277040 19.2722960
## dmYes                -3.3852794 -0.9027328
## ia.occlusICA with M1 -1.9641130  7.2197506
## ia.occlusM1          -0.8088512  8.2987195
## ia.occlusM2          -0.1412201  9.3784855
## ia.occlusA1 or A2     0.6754121 14.4912546

Discuss

F = 4.577, p-value < 0.05: the regression model overall predicts nihss significantly well.
R2 = 0.045 => 2 variables dm and ia.occlus account for 4.5% of the variation in nihss. Overall, dm explained 2.2% and ia.occlus explainted 2.3%.

=> Good model or there is sufficient evidence to say that there is a significant relationship between nihss and dm + ia.occlus.

CI of dm and ia.occlusA1 or A2 do not cross 0 => The predictors dm and ia.cclusA1 A2 have relationship with the outcome nihss.

CI of dm < 0: negative relationship.
CI of ia.occlusA1 A2 > 0: positive relationship.

t value: dm and ia.occlus have similar effect on nihss scores.
Equation: nihss = 14.75 - 2.14dmYes + 2.62 ICAM1 + 3.74M1 + 4.62M2 + 7.6 A1A2

If the value is positive we can tell that there is a positive relationship between the predictor and the outcome, whereas a negative coefficient represents a negative relationship - dm predictors have negative b-values while ia.occlus have positive value.
Intercept (b0) = 14.75: When dm=0 and ia.occlus=0, the model predict that NIH Stroke Scale Score will be 14.75 => Intercept is meaningless, just for adjust the height of the line.

4.3 Step 5: Testing the accuracy of the Model

For this section, refer to the link for more detail:
7. Linear regression - Basic commands: https://rpubs.com/minhtri/936322

5 Stepwise method approach (not recommend)

A few notes:

Two common strategies for adding or removing variables in a multiple regression model are called backward elimination and forward selection. These techniques are often referred to as stepwise model selection strategies, because they add or delete one variable at a time as they step through the candidate predictors.
Variables can be included (or excluded) based on the expert opinion. If you are studying certain variables, you might choose to leave it in the model regardless of whether it’s significant or yield a higher adjust R2.
p value vs R2:
** If the sole goal is to improve prediction accuracy - use Radj2.The common case is machine learning applications.
** When we care about understanding which variables are statistically significant predictors of the response, or if produce a simple model at the potential cost of little prediction accuracy, then the p-value approach is preferred.

5.1 Forward selection

Choosing based on highest F (R value) in each variable: Model starts with no independent variable, add each variable with highest F value – significant p value until p value is not significance.

null = lm( nihss ~ 1, data = df1)
fw = step(null,nihss ~ trt + age + sex + location + hx.isch + afib + dm + 
            mrankin + sbp + iv.altep + aspects + ia.occlus + extra.ica + time.rand, 
          direction = "forward", 
          test = "F", 
          )

## Start:  AIC=1520.92
## nihss ~ 1
## 
##             Df Sum of Sq   RSS    AIC F value   Pr(>F)   
## + dm         1   219.815 10563 1512.8 10.1967 0.001497 **
## + ia.occlus  4   240.905 10542 1517.8  2.7822 0.026272 * 
## + iv.altep   1    97.467 10686 1518.5  4.4695 0.035009 * 
## + location   1    68.160 10715 1519.8  3.1170 0.078100 . 
## + time.rand  1    58.532 10724 1520.2  2.6743 0.102619   
## + hx.isch    1    57.228 10726 1520.3  2.6144 0.106539   
## <none>                   10783 1520.9                    
## + aspects    1    18.130 10765 1522.1  0.8253 0.364090   
## + sbp        1    16.954 10766 1522.2  0.7717 0.380137   
## + age        1    14.026 10769 1522.3  0.6382 0.424752   
## + sex        1     4.794 10778 1522.7  0.2180 0.640811   
## + afib       1     2.115 10781 1522.8  0.0961 0.756635   
## + trt        1     0.261 10783 1522.9  0.0118 0.913366   
## + extra.ica  1     0.196 10783 1522.9  0.0089 0.924944   
## + mrankin    3    63.224 10720 1524.0  0.9594 0.411675   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=1512.79
## nihss ~ dm
## 
##             Df Sum of Sq   RSS    AIC F value  Pr(>F)  
## + ia.occlus  4   265.149 10298 1508.3  3.1283 0.01474 *
## + location   1    76.769 10486 1511.2  3.5799 0.05907 .
## + iv.altep   1    74.728 10488 1511.3  3.4840 0.06256 .
## + time.rand  1    56.015 10507 1512.2  2.6069 0.10704  
## + hx.isch    1    51.633 10512 1512.4  2.4020 0.12183  
## <none>                   10563 1512.8                  
## + aspects    1    18.698 10544 1513.9  0.8671 0.35222  
## + sbp        1    16.495 10547 1514.0  0.7648 0.38227  
## + age        1     9.128 10554 1514.4  0.4229 0.51578  
## + sex        1     5.904 10557 1514.5  0.2735 0.60125  
## + afib       1     3.785 10559 1514.6  0.1753 0.67562  
## + trt        1     0.198 10563 1514.8  0.0092 0.92381  
## + extra.ica  1     0.087 10563 1514.8  0.0040 0.94938  
## + mrankin    3    48.984 10514 1516.5  0.7563 0.51908  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=1508.28
## nihss ~ dm + ia.occlus
## 
##             Df Sum of Sq   RSS    AIC F value  Pr(>F)  
## + iv.altep   1    81.392 10217 1506.4  3.8638 0.04991 *
## + location   1    67.720 10230 1507.0  3.2105 0.07379 .
## + time.rand  1    57.464 10240 1507.5  2.7215 0.09965 .
## + hx.isch    1    44.597 10253 1508.2  2.1095 0.14703  
## <none>                   10298 1508.3                  
## + extra.ica  1    22.020 10276 1509.2  1.0393 0.30849  
## + aspects    1    11.911 10286 1509.7  0.5616 0.45398  
## + sex        1    11.714 10286 1509.7  0.5523 0.45773  
## + age        1    11.419 10287 1509.7  0.5384 0.46346  
## + sbp        1    10.543 10287 1509.8  0.4970 0.48115  
## + afib       1     2.741 10295 1510.2  0.1291 0.71951  
## + trt        1     0.829 10297 1510.2  0.0390 0.84344  
## + mrankin    3    44.597 10253 1512.2  0.7003 0.55223  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=1506.38
## nihss ~ dm + ia.occlus + iv.altep
## 
##             Df Sum of Sq   RSS    AIC F value  Pr(>F)  
## + location   1    67.153 10149 1505.1  3.2023 0.07416 .
## + time.rand  1    55.089 10162 1505.7  2.6239 0.10592  
## + hx.isch    1    47.358 10169 1506.1  2.2540 0.13392  
## <none>                   10217 1506.4                  
## + extra.ica  1    24.586 10192 1507.2  1.1676 0.28044  
## + sbp        1    11.939 10205 1507.8  0.5663 0.45212  
## + aspects    1    11.695 10205 1507.8  0.5547 0.45677  
## + sex        1    11.192 10205 1507.8  0.5308 0.46662  
## + age        1     8.873 10208 1508.0  0.4207 0.51690  
## + afib       1     1.273 10215 1508.3  0.0603 0.80611  
## + trt        1     0.259 10216 1508.4  0.0123 0.91185  
## + mrankin    3    38.532 10178 1510.5  0.6082 0.60990  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=1505.13
## nihss ~ dm + ia.occlus + iv.altep + location
## 
##             Df Sum of Sq   RSS    AIC F value Pr(>F)
## + hx.isch    1    53.768 10096 1504.5  2.5724 0.1094
## + time.rand  1    46.695 10103 1504.9  2.2324 0.1358
## <none>                   10149 1505.1               
## + extra.ica  1    22.918 10126 1506.0  1.0931 0.2963
## + sbp        1    13.479 10136 1506.5  0.6423 0.4233
## + aspects    1    13.442 10136 1506.5  0.6405 0.4239
## + sex        1    11.838 10138 1506.6  0.5640 0.4530
## + age        1     8.877 10141 1506.7  0.4228 0.5158
## + afib       1     1.474 10148 1507.1  0.0702 0.7912
## + trt        1     1.078 10148 1507.1  0.0513 0.8209
## + mrankin    3    48.245 10101 1508.8  0.7658 0.5136
## 
## Step:  AIC=1504.52
## nihss ~ dm + ia.occlus + iv.altep + location + hx.isch
## 
##             Df Sum of Sq   RSS    AIC F value Pr(>F)
## + time.rand  1    55.501 10040 1503.8  2.6644 0.1033
## <none>                   10096 1504.5               
## + extra.ica  1    21.372 10074 1505.5  1.0225 0.3124
## + sbp        1    13.375 10082 1505.9  0.6394 0.4243
## + aspects    1    12.356 10083 1505.9  0.5906 0.4426
## + sex        1    12.014 10084 1505.9  0.5743 0.4489
## + age        1     6.901 10089 1506.2  0.3297 0.5661
## + afib       1     1.531 10094 1506.5  0.0731 0.7870
## + trt        1     0.509 10095 1506.5  0.0243 0.8762
## + mrankin    3    43.868 10052 1508.4  0.6983 0.5534
## 
## Step:  AIC=1503.81
## nihss ~ dm + ia.occlus + iv.altep + location + hx.isch + time.rand
## 
##             Df Sum of Sq   RSS    AIC F value Pr(>F)
## <none>                   10040 1503.8               
## + extra.ica  1    22.227 10018 1504.7  1.0672 0.3021
## + sex        1    15.715 10024 1505.0  0.7541 0.3856
## + aspects    1    11.709 10028 1505.2  0.5616 0.4540
## + sbp        1    11.381 10029 1505.2  0.5459 0.4604
## + age        1     7.778 10032 1505.4  0.3729 0.5417
## + afib       1     2.211 10038 1505.7  0.1059 0.7450
## + trt        1     1.822 10038 1505.7  0.0873 0.7678
## + mrankin    3    48.203  9992 1507.4  0.7703 0.5111

summary(fw)

## 
## Call:
## lm(formula = nihss ~ dm + ia.occlus + iv.altep + location + hx.isch + 
##     time.rand, data = df1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.9289 -3.7957 -0.6397  3.6614 10.9729 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          13.960066   2.451532   5.694 2.15e-08 ***
## dmYes                -2.035565   0.629020  -3.236   0.0013 ** 
## ia.occlusICA with M1  3.184185   2.324867   1.370   0.1714    
## ia.occlusM1           4.238781   2.303610   1.840   0.0664 .  
## ia.occlusM2           5.053979   2.409919   2.097   0.0365 *  
## ia.occlusA1 or A2     7.903601   3.488460   2.266   0.0239 *  
## iv.altepYes           1.309981   0.661897   1.979   0.0484 *  
## locationRight         0.731638   0.415601   1.760   0.0790 .  
## hx.ischYes           -1.149006   0.662940  -1.733   0.0837 .  
## time.rand            -0.005289   0.003240  -1.632   0.1033    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.564 on 482 degrees of freedom
## Multiple R-squared:  0.06888,    Adjusted R-squared:  0.0515 
## F-statistic: 3.962 on 9 and 482 DF,  p-value: 6.87e-05

fw = step(null,nihss ~ trt + age + sex + location + hx.isch + afib + dm + 
            mrankin + sbp + iv.altep + aspects + ia.occlus + extra.ica + time.rand, 
     direction = "forward", test = "F", k = log(nrow(df1)))

## Start:  AIC=1525.12
## nihss ~ 1
## 
##             Df Sum of Sq   RSS    AIC F value   Pr(>F)   
## + dm         1   219.815 10563 1521.2 10.1967 0.001497 **
## <none>                   10783 1525.1                    
## + iv.altep   1    97.467 10686 1526.8  4.4695 0.035009 * 
## + location   1    68.160 10715 1528.2  3.1170 0.078100 . 
## + time.rand  1    58.532 10724 1528.6  2.6743 0.102619   
## + hx.isch    1    57.228 10726 1528.7  2.6144 0.106539   
## + aspects    1    18.130 10765 1530.5  0.8253 0.364090   
## + sbp        1    16.954 10766 1530.5  0.7717 0.380137   
## + age        1    14.026 10769 1530.7  0.6382 0.424752   
## + sex        1     4.794 10778 1531.1  0.2180 0.640811   
## + afib       1     2.115 10781 1531.2  0.0961 0.756635   
## + trt        1     0.261 10783 1531.3  0.0118 0.913366   
## + extra.ica  1     0.196 10783 1531.3  0.0089 0.924944   
## + ia.occlus  4   240.905 10542 1538.8  2.7822 0.026272 * 
## + mrankin    3    63.224 10720 1540.8  0.9594 0.411675   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=1521.19
## nihss ~ dm
## 
##             Df Sum of Sq   RSS    AIC F value  Pr(>F)  
## <none>                   10563 1521.2                  
## + location   1    76.769 10486 1523.8  3.5799 0.05907 .
## + iv.altep   1    74.728 10488 1523.9  3.4840 0.06256 .
## + time.rand  1    56.015 10507 1524.8  2.6069 0.10704  
## + hx.isch    1    51.633 10512 1525.0  2.4020 0.12183  
## + aspects    1    18.698 10544 1526.5  0.8671 0.35222  
## + sbp        1    16.495 10547 1526.6  0.7648 0.38227  
## + age        1     9.128 10554 1527.0  0.4229 0.51578  
## + sex        1     5.904 10557 1527.1  0.2735 0.60125  
## + afib       1     3.785 10559 1527.2  0.1753 0.67562  
## + trt        1     0.198 10563 1527.4  0.0092 0.92381  
## + extra.ica  1     0.087 10563 1527.4  0.0040 0.94938  
## + ia.occlus  4   265.149 10298 1533.5  3.1283 0.01474 *
## + mrankin    3    48.984 10514 1537.5  0.7563 0.51908  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

summary(fw)

## 
## Call:
## lm(formula = nihss ~ dm, data = df1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.2413 -4.2413 -0.2413  3.7587  9.7869 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  18.2413     0.2236  81.564   <2e-16 ***
## dmYes        -2.0282     0.6352  -3.193   0.0015 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.643 on 490 degrees of freedom
## Multiple R-squared:  0.02039,    Adjusted R-squared:  0.01839 
## F-statistic:  10.2 on 1 and 490 DF,  p-value: 0.001497

Build regression model from a set of candidate predictor variables by entering predictors based on p values, in a stepwise manner until there is no variable left to enter any more.
ols_step_forward_p (model = full, details = FALSE, progress = FALSE)
details = FALSE (default) : Logical; if TRUE, will print the regression result at each step.
progress = FALSE (default): Logical; if TRUE, will display variable selection progress.

library(olsrr)
full = lm(nihss ~ ., data = df1)
ols_step_forward_p(model = full, details = FALSE, progress = FALSE)

## 
##                              Selection Summary                              
## ---------------------------------------------------------------------------
##         Variable                   Adj.                                        
## Step     Entered     R-Square    R-Square     C(p)         AIC        RMSE     
## ---------------------------------------------------------------------------
##    1    dm             0.0204      0.0184    14.2244    2911.0257    4.6430    
##    2    ia.occlus      0.0450      0.0351     3.6179    2906.5181    4.6032    
##    3    iv.altep       0.0525      0.0408     1.7481    2904.6141    4.5897    
##    4    location       0.0588      0.0451     0.5553    2903.3695    4.5793    
##    5    hx.isch        0.0637      0.0482    -0.0011    2902.7562    4.5719    
##    6    time.rand      0.0689      0.0515    -0.6399    2902.0439    4.5640    
## ---------------------------------------------------------------------------

Discuss:
* There are 1 or 6 variables in the final suggesting equation for each approach.
* When define: k = log(nrow(album2)), only dm is chosen.

5.2 Backward selection

full = lm( nihss ~ trt + age + sex + location + hx.isch + afib + dm + 
            mrankin + sbp + iv.altep + aspects + ia.occlus + extra.ica + time.rand, 
           data = df1)
bw = step (full,  direction = "backward", test = "F")

## Start:  AIC=1518.25
## nihss ~ trt + age + sex + location + hx.isch + afib + dm + mrankin + 
##     sbp + iv.altep + aspects + ia.occlus + extra.ica + time.rand
## 
##             Df Sum of Sq     RSS    AIC F value   Pr(>F)   
## - mrankin    3    41.585  9969.0 1514.3  0.6590 0.577592   
## - afib       1     0.277  9927.7 1516.3  0.0132 0.908730   
## - trt        1     5.168  9932.6 1516.5  0.2457 0.620341   
## - age        1     6.363  9933.8 1516.6  0.3025 0.582568   
## - sbp        1    10.256  9937.7 1516.8  0.4876 0.485341   
## - sex        1    10.945  9938.4 1516.8  0.5204 0.471039   
## - aspects    1    13.158  9940.6 1516.9  0.6256 0.429371   
## - extra.ica  1    17.627  9945.1 1517.1  0.8381 0.360417   
## <none>                    9927.4 1518.2                    
## - hx.isch    1    53.058  9980.5 1518.9  2.5226 0.112892   
## - time.rand  1    64.485  9991.9 1519.4  3.0659 0.080599 . 
## - iv.altep   1    71.467  9998.9 1519.8  3.3979 0.065907 . 
## - location   1    77.022 10004.5 1520.1  3.6620 0.056270 . 
## - ia.occlus  4   264.650 10192.1 1523.2  3.1457 0.014336 * 
## - dm         1   202.440 10129.9 1526.2  9.6250 0.002035 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=1514.31
## nihss ~ trt + age + sex + location + hx.isch + afib + dm + sbp + 
##     iv.altep + aspects + ia.occlus + extra.ica + time.rand
## 
##             Df Sum of Sq     RSS    AIC F value   Pr(>F)   
## - afib       1     0.274  9969.3 1512.3  0.0131 0.909017   
## - trt        1     4.709  9973.7 1512.5  0.2244 0.635957   
## - age        1     5.790  9974.8 1512.6  0.2759 0.599664   
## - sbp        1    10.377  9979.4 1512.8  0.4944 0.482308   
## - aspects    1    11.874  9980.9 1512.9  0.5658 0.452311   
## - sex        1    15.759  9984.8 1513.1  0.7509 0.386638   
## - extra.ica  1    21.640  9990.7 1513.4  1.0311 0.310419   
## <none>                    9969.0 1514.3                    
## - hx.isch    1    56.859 10025.9 1515.1  2.7092 0.100432   
## - time.rand  1    60.799 10029.8 1515.3  2.8969 0.089404 . 
## - location   1    67.993 10037.0 1515.7  3.2397 0.072509 . 
## - iv.altep   1    79.208 10048.2 1516.2  3.7741 0.052643 . 
## - ia.occlus  4   274.299 10243.3 1519.7  3.2674 0.011673 * 
## - dm         1   217.646 10186.7 1522.9 10.3703 0.001368 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=1512.32
## nihss ~ trt + age + sex + location + hx.isch + dm + sbp + iv.altep + 
##     aspects + ia.occlus + extra.ica + time.rand
## 
##             Df Sum of Sq     RSS    AIC F value   Pr(>F)   
## - trt        1     4.677  9974.0 1510.5  0.2233 0.636754   
## - age        1     5.966  9975.3 1510.6  0.2849 0.593774   
## - sbp        1    10.544  9979.8 1510.8  0.5034 0.478341   
## - aspects    1    12.390  9981.7 1510.9  0.5916 0.442181   
## - sex        1    15.765  9985.1 1511.1  0.7527 0.386053   
## - extra.ica  1    22.060  9991.4 1511.4  1.0533 0.305269   
## <none>                    9969.3 1512.3                    
## - hx.isch    1    56.772 10026.1 1513.1  2.7107 0.100339   
## - time.rand  1    60.585 10029.9 1513.3  2.8927 0.089633 . 
## - location   1    67.939 10037.2 1513.7  3.2438 0.072325 . 
## - iv.altep   1    80.057 10049.3 1514.3  3.8225 0.051155 . 
## - ia.occlus  4   275.072 10244.4 1517.7  3.2834 0.011359 * 
## - dm         1   217.376 10186.7 1520.9 10.3790 0.001362 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=1510.55
## nihss ~ age + sex + location + hx.isch + dm + sbp + iv.altep + 
##     aspects + ia.occlus + extra.ica + time.rand
## 
##             Df Sum of Sq     RSS    AIC F value   Pr(>F)   
## - age        1     5.378  9979.3 1508.8  0.2572 0.612278   
## - aspects    1    10.938  9984.9 1509.1  0.5231 0.469879   
## - sbp        1    11.088  9985.1 1509.1  0.5303 0.466853   
## - sex        1    15.706  9989.7 1509.3  0.7511 0.386550   
## - extra.ica  1    20.611  9994.6 1509.6  0.9857 0.321291   
## <none>                    9974.0 1510.5                    
## - time.rand  1    58.108 10032.1 1511.4  2.7790 0.096166 . 
## - hx.isch    1    58.350 10032.3 1511.4  2.7905 0.095478 . 
## - location   1    66.143 10040.1 1511.8  3.1633 0.075949 . 
## - iv.altep   1    82.193 10056.2 1512.6  3.9308 0.047981 * 
## - ia.occlus  4   272.278 10246.2 1515.8  3.2554 0.011909 * 
## - dm         1   217.126 10191.1 1519.2 10.3840 0.001358 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=1508.82
## nihss ~ sex + location + hx.isch + dm + sbp + iv.altep + aspects + 
##     ia.occlus + extra.ica + time.rand
## 
##             Df Sum of Sq     RSS    AIC F value   Pr(>F)   
## - aspects    1    11.006  9990.4 1507.4  0.5272 0.468148   
## - sbp        1    11.496  9990.8 1507.4  0.5507 0.458409   
## - sex        1    16.361  9995.7 1507.6  0.7837 0.376467   
## - extra.ica  1    21.741 10001.1 1507.9  1.0414 0.308014   
## <none>                    9979.3 1508.8                    
## - time.rand  1    57.432 10036.8 1509.6  2.7509 0.097853 . 
## - hx.isch    1    60.104 10039.5 1509.8  2.8789 0.090397 . 
## - location   1    66.295 10045.6 1510.1  3.1755 0.075387 . 
## - iv.altep   1    84.415 10063.8 1511.0  4.0434 0.044905 * 
## - ia.occlus  4   271.871 10251.2 1514.0  3.2556 0.011904 * 
## - dm         1   220.320 10199.7 1517.6 10.5531 0.001242 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=1507.36
## nihss ~ sex + location + hx.isch + dm + sbp + iv.altep + ia.occlus + 
##     extra.ica + time.rand
## 
##             Df Sum of Sq     RSS    AIC F value   Pr(>F)   
## - sbp        1    11.430 10001.8 1505.9  0.5480 0.459491   
## - sex        1    16.628 10007.0 1506.2  0.7973 0.372363   
## - extra.ica  1    22.248 10012.6 1506.5  1.0667 0.302216   
## <none>                    9990.4 1507.4                    
## - time.rand  1    58.122 10048.5 1508.2  2.7867 0.095700 . 
## - hx.isch    1    61.240 10051.6 1508.4  2.9362 0.087261 . 
## - location   1    64.747 10055.1 1508.5  3.1044 0.078719 . 
## - iv.altep   1    84.670 10075.0 1509.5  4.0596 0.044479 * 
## - ia.occlus  4   278.710 10269.1 1512.9  3.3408 0.010305 * 
## - dm         1   219.625 10210.0 1516.1 10.5302 0.001257 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=1505.92
## nihss ~ sex + location + hx.isch + dm + iv.altep + ia.occlus + 
##     extra.ica + time.rand
## 
##             Df Sum of Sq   RSS    AIC F value  Pr(>F)   
## - sex        1    16.161 10018 1504.7  0.7756 0.37894   
## - extra.ica  1    22.673 10024 1505.0  1.0881 0.29742   
## <none>                   10002 1505.9                   
## - time.rand  1    60.148 10062 1506.9  2.8866 0.08997 . 
## - hx.isch    1    61.484 10063 1506.9  2.9507 0.08649 . 
## - location   1    63.226 10065 1507.0  3.0343 0.08216 . 
## - iv.altep   1    83.294 10085 1508.0  3.9974 0.04613 * 
## - ia.occlus  4   285.643 10287 1511.8  3.4271 0.00890 **
## - dm         1   220.891 10223 1514.7 10.6009 0.00121 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=1504.72
## nihss ~ location + hx.isch + dm + iv.altep + ia.occlus + extra.ica + 
##     time.rand
## 
##             Df Sum of Sq   RSS    AIC F value   Pr(>F)   
## - extra.ica  1    22.227 10040 1503.8  1.0672 0.302093   
## <none>                   10018 1504.7                    
## - time.rand  1    56.356 10074 1505.5  2.7059 0.100633   
## - hx.isch    1    60.952 10079 1505.7  2.9265 0.087780 . 
## - location   1    62.790 10081 1505.8  3.0148 0.083148 . 
## - iv.altep   1    83.980 10102 1506.8  4.0322 0.045198 * 
## - ia.occlus  4   278.375 10296 1510.2  3.3415 0.010290 * 
## - dm         1   218.748 10237 1513.3 10.5029 0.001274 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=1503.81
## nihss ~ location + hx.isch + dm + iv.altep + ia.occlus + time.rand
## 
##             Df Sum of Sq   RSS    AIC F value   Pr(>F)   
## <none>                   10040 1503.8                    
## - time.rand  1    55.501 10096 1504.5  2.6644 0.103267   
## - hx.isch    1    62.573 10103 1504.9  3.0040 0.083700 . 
## - location   1    64.555 10105 1505.0  3.0991 0.078969 . 
## - iv.altep   1    81.591 10122 1505.8  3.9170 0.048370 * 
## - ia.occlus  4   256.372 10296 1508.2  3.0769 0.016073 * 
## - dm         1   218.140 10258 1512.4 10.4723 0.001295 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

summary(bw)

## 
## Call:
## lm(formula = nihss ~ location + hx.isch + dm + iv.altep + ia.occlus + 
##     time.rand, data = df1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.9289 -3.7957 -0.6397  3.6614 10.9729 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          13.960066   2.451532   5.694 2.15e-08 ***
## locationRight         0.731638   0.415601   1.760   0.0790 .  
## hx.ischYes           -1.149006   0.662940  -1.733   0.0837 .  
## dmYes                -2.035565   0.629020  -3.236   0.0013 ** 
## iv.altepYes           1.309981   0.661897   1.979   0.0484 *  
## ia.occlusICA with M1  3.184185   2.324867   1.370   0.1714    
## ia.occlusM1           4.238781   2.303610   1.840   0.0664 .  
## ia.occlusM2           5.053979   2.409919   2.097   0.0365 *  
## ia.occlusA1 or A2     7.903601   3.488460   2.266   0.0239 *  
## time.rand            -0.005289   0.003240  -1.632   0.1033    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.564 on 482 degrees of freedom
## Multiple R-squared:  0.06888,    Adjusted R-squared:  0.0515 
## F-statistic: 3.962 on 9 and 482 DF,  p-value: 6.87e-05

full = lm( nihss ~ trt + age + sex + location + hx.isch + afib + dm + 
            mrankin + sbp + iv.altep + aspects + ia.occlus + extra.ica + time.rand, 
           data = df1)
bw = step (full,  direction = "backward", test = "F", k = log(nrow(df1)))

## Start:  AIC=1602.22
## nihss ~ trt + age + sex + location + hx.isch + afib + dm + mrankin + 
##     sbp + iv.altep + aspects + ia.occlus + extra.ica + time.rand
## 
##             Df Sum of Sq     RSS    AIC F value   Pr(>F)   
## - mrankin    3    41.585  9969.0 1585.7  0.6590 0.577592   
## - ia.occlus  4   264.650 10192.1 1590.4  3.1457 0.014336 * 
## - afib       1     0.277  9927.7 1596.0  0.0132 0.908730   
## - trt        1     5.168  9932.6 1596.3  0.2457 0.620341   
## - age        1     6.363  9933.8 1596.3  0.3025 0.582568   
## - sbp        1    10.256  9937.7 1596.5  0.4876 0.485341   
## - sex        1    10.945  9938.4 1596.6  0.5204 0.471039   
## - aspects    1    13.158  9940.6 1596.7  0.6256 0.429371   
## - extra.ica  1    17.627  9945.1 1596.9  0.8381 0.360417   
## - hx.isch    1    53.058  9980.5 1598.7  2.5226 0.112892   
## - time.rand  1    64.485  9991.9 1599.2  3.0659 0.080599 . 
## - iv.altep   1    71.467  9998.9 1599.5  3.3979 0.065907 . 
## - location   1    77.022 10004.5 1599.8  3.6620 0.056270 . 
## <none>                    9927.4 1602.2                    
## - dm         1   202.440 10129.9 1606.0  9.6250 0.002035 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=1585.68
## nihss ~ trt + age + sex + location + hx.isch + afib + dm + sbp + 
##     iv.altep + aspects + ia.occlus + extra.ica + time.rand
## 
##             Df Sum of Sq     RSS    AIC F value   Pr(>F)   
## - ia.occlus  4   274.299 10243.3 1574.2  3.2674 0.011673 * 
## - afib       1     0.274  9969.3 1579.5  0.0131 0.909017   
## - trt        1     4.709  9973.7 1579.7  0.2244 0.635957   
## - age        1     5.790  9974.8 1579.8  0.2759 0.599664   
## - sbp        1    10.377  9979.4 1580.0  0.4944 0.482308   
## - aspects    1    11.874  9980.9 1580.1  0.5658 0.452311   
## - sex        1    15.759  9984.8 1580.3  0.7509 0.386638   
## - extra.ica  1    21.640  9990.7 1580.5  1.0311 0.310419   
## - hx.isch    1    56.859 10025.9 1582.3  2.7092 0.100432   
## - time.rand  1    60.799 10029.8 1582.5  2.8969 0.089404 . 
## - location   1    67.993 10037.0 1582.8  3.2397 0.072509 . 
## - iv.altep   1    79.208 10048.2 1583.4  3.7741 0.052643 . 
## <none>                    9969.0 1585.7                    
## - dm         1   217.646 10186.7 1590.1 10.3703 0.001368 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=1574.24
## nihss ~ trt + age + sex + location + hx.isch + afib + dm + sbp + 
##     iv.altep + aspects + extra.ica + time.rand
## 
##             Df Sum of Sq   RSS    AIC F value   Pr(>F)   
## - extra.ica  1     0.125 10243 1568.0  0.0058 0.939100   
## - afib       1     1.048 10244 1568.1  0.0490 0.824908   
## - trt        1     1.930 10245 1568.1  0.0902 0.763993   
## - age        1     5.020 10248 1568.3  0.2347 0.628265   
## - sex        1     8.816 10252 1568.5  0.4123 0.521128   
## - sbp        1    17.359 10261 1568.9  0.8117 0.368064   
## - aspects    1    17.609 10261 1568.9  0.8234 0.364636   
## - time.rand  1    56.710 10300 1570.8  2.6519 0.104084   
## - hx.isch    1    65.754 10309 1571.2  3.0748 0.080153 . 
## - iv.altep   1    71.385 10315 1571.5  3.3381 0.068314 . 
## - location   1    79.166 10322 1571.8  3.7020 0.054941 . 
## <none>                   10243 1574.2                    
## - dm         1   195.434 10439 1577.3  9.1389 0.002636 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=1568.05
## nihss ~ trt + age + sex + location + hx.isch + afib + dm + sbp + 
##     iv.altep + aspects + time.rand
## 
##             Df Sum of Sq   RSS    AIC F value   Pr(>F)   
## - afib       1     1.093 10244 1561.9  0.0512 0.821041   
## - trt        1     1.873 10245 1561.9  0.0878 0.767164   
## - age        1     5.127 10249 1562.1  0.2403 0.624239   
## - sex        1     8.815 10252 1562.3  0.4131 0.520717   
## - sbp        1    17.347 10261 1562.7  0.8128 0.367731   
## - aspects    1    17.580 10261 1562.7  0.8238 0.364537   
## - time.rand  1    56.621 10300 1564.6  2.6532 0.103997   
## - hx.isch    1    65.810 10309 1565.0  3.0838 0.079713 . 
## - iv.altep   1    71.288 10315 1565.3  3.3405 0.068214 . 
## - location   1    79.199 10323 1565.6  3.7112 0.054638 . 
## <none>                   10243 1568.0                    
## - dm         1   195.571 10439 1571.2  9.1643 0.002601 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=1561.91
## nihss ~ trt + age + sex + location + hx.isch + dm + sbp + iv.altep + 
##     aspects + time.rand
## 
##             Df Sum of Sq   RSS    AIC F value   Pr(>F)   
## - trt        1     1.814 10246 1555.8  0.0852 0.770504   
## - age        1     5.439 10250 1556.0  0.2554 0.613547   
## - sex        1     8.817 10253 1556.1  0.4140 0.520265   
## - sbp        1    17.763 10262 1556.6  0.8340 0.361577   
## - aspects    1    18.712 10263 1556.6  0.8785 0.349071   
## - time.rand  1    56.170 10301 1558.4  2.6373 0.105036   
## - hx.isch    1    65.647 10310 1558.8  3.0822 0.079788 . 
## - iv.altep   1    72.519 10317 1559.2  3.4049 0.065618 . 
## - location   1    79.075 10324 1559.5  3.7127 0.054588 . 
## <none>                   10244 1561.9                    
## - dm         1   194.712 10439 1565.0  9.1421 0.002631 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=1555.79
## nihss ~ age + sex + location + hx.isch + dm + sbp + iv.altep + 
##     aspects + time.rand
## 
##             Df Sum of Sq   RSS    AIC F value   Pr(>F)   
## - age        1     5.086 10251 1549.8  0.2392 0.624987   
## - sex        1     8.805 10255 1550.0  0.4142 0.520147   
## - aspects    1    17.706 10264 1550.5  0.8329 0.361891   
## - sbp        1    18.129 10264 1550.5  0.8528 0.356222   
## - time.rand  1    54.857 10301 1552.2  2.5805 0.108840   
## - hx.isch    1    66.750 10313 1552.8  3.1400 0.077025 . 
## - iv.altep   1    74.005 10320 1553.1  3.4813 0.062674 . 
## - location   1    77.930 10324 1553.3  3.6659 0.056128 . 
## <none>                   10246 1555.8                    
## - dm         1   194.711 10441 1558.9  9.1594 0.002607 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=1549.84
## nihss ~ sex + location + hx.isch + dm + sbp + iv.altep + aspects + 
##     time.rand
## 
##             Df Sum of Sq   RSS    AIC F value   Pr(>F)   
## - sex        1     9.260 10261 1544.1  0.4363 0.509224   
## - aspects    1    17.857 10269 1544.5  0.8414 0.359467   
## - sbp        1    18.586 10270 1544.5  0.8757 0.349853   
## - time.rand  1    54.002 10305 1546.2  2.5443 0.111345   
## - hx.isch    1    68.518 10320 1546.9  3.2283 0.073002 . 
## - iv.altep   1    76.143 10328 1547.3  3.5875 0.058813 . 
## - location   1    78.021 10330 1547.4  3.6760 0.055790 . 
## <none>                   10251 1549.8                    
## - dm         1   197.913 10449 1553.0  9.3248 0.002386 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=1544.09
## nihss ~ location + hx.isch + dm + sbp + iv.altep + aspects + 
##     time.rand
## 
##             Df Sum of Sq   RSS    AIC F value   Pr(>F)   
## - aspects    1    18.001 10279 1538.8  0.8491 0.357266   
## - sbp        1    18.081 10279 1538.8  0.8529 0.356196   
## - time.rand  1    51.384 10312 1540.3  2.4238 0.120160   
## - hx.isch    1    68.007 10329 1541.1  3.2079 0.073909 . 
## - iv.altep   1    76.670 10337 1541.5  3.6166 0.057801 . 
## - location   1    77.555 10338 1541.6  3.6583 0.056381 . 
## <none>                   10261 1544.1                    
## - dm         1   196.582 10457 1547.2  9.2728 0.002453 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=1538.75
## nihss ~ location + hx.isch + dm + sbp + iv.altep + time.rand
## 
##             Df Sum of Sq   RSS    AIC F value   Pr(>F)   
## - sbp        1    17.848 10296 1533.4  0.8421 0.359242   
## - time.rand  1    52.239 10331 1535.0  2.4649 0.117066   
## - hx.isch    1    69.956 10349 1535.9  3.3009 0.069860 . 
## - location   1    75.541 10354 1536.2  3.5644 0.059628 . 
## - iv.altep   1    77.177 10356 1536.2  3.6416 0.056943 . 
## <none>                   10279 1538.8                    
## - dm         1   195.785 10474 1541.8  9.2381 0.002498 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=1533.4
## nihss ~ location + hx.isch + dm + iv.altep + time.rand
## 
##             Df Sum of Sq   RSS    AIC F value   Pr(>F)   
## - time.rand  1    54.285 10351 1529.8  2.5623 0.110091   
## - hx.isch    1    70.254 10367 1530.5  3.3160 0.069222 . 
## - location   1    73.922 10370 1530.7  3.4891 0.062375 . 
## - iv.altep   1    75.248 10372 1530.8  3.5517 0.060080 . 
## <none>                   10296 1533.4                    
## - dm         1   196.360 10493 1536.5  9.2682 0.002458 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=1529.79
## nihss ~ location + hx.isch + dm + iv.altep
## 
##            Df Sum of Sq   RSS    AIC F value   Pr(>F)   
## - hx.isch   1    61.291 10412 1526.5  2.8837 0.090118 . 
## - iv.altep  1    77.127 10428 1527.2  3.6288 0.057377 . 
## - location  1    83.207 10434 1527.5  3.9148 0.048424 * 
## <none>                  10351 1529.8                    
## - dm        1   199.377 10550 1533.0  9.3806 0.002314 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=1526.5
## nihss ~ location + dm + iv.altep
## 
##            Df Sum of Sq   RSS    AIC F value   Pr(>F)   
## - iv.altep  1    74.245 10486 1523.8  3.4797 0.062725 . 
## - location  1    76.286 10488 1523.9  3.5754 0.059233 . 
## <none>                  10412 1526.5                    
## - dm        1   205.253 10617 1529.9  9.6199 0.002036 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=1523.8
## nihss ~ location + dm
## 
##            Df Sum of Sq   RSS    AIC F value   Pr(>F)   
## - location  1    76.769 10563 1521.2  3.5799 0.059073 . 
## <none>                  10486 1523.8                    
## - dm        1   228.424 10715 1528.2 10.6519 0.001177 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=1521.19
## nihss ~ dm
## 
##        Df Sum of Sq   RSS    AIC F value   Pr(>F)   
## <none>              10563 1521.2                    
## - dm    1    219.81 10783 1525.1  10.197 0.001497 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

summary(bw)

## 
## Call:
## lm(formula = nihss ~ dm, data = df1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.2413 -4.2413 -0.2413  3.7587  9.7869 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  18.2413     0.2236  81.564   <2e-16 ***
## dmYes        -2.0282     0.6352  -3.193   0.0015 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.643 on 490 degrees of freedom
## Multiple R-squared:  0.02039,    Adjusted R-squared:  0.01839 
## F-statistic:  10.2 on 1 and 490 DF,  p-value: 0.001497

ols_step_backward_p (model = full, details = FALSE)

## 
## 
##                             Elimination Summary                             
## ---------------------------------------------------------------------------
##         Variable                   Adj.                                        
## Step     Removed     R-Square    R-Square     C(p)         AIC        RMSE     
## ---------------------------------------------------------------------------
##    1    afib           0.0793      0.0443     8.0132    2914.5019    4.5814    
##    2    trt            0.0788      0.0458     6.2572    2912.7562    4.5777    
##    3    age            0.0783      0.0472     4.5385    2911.0491    4.5742    
##    4    mrankin        0.0745      0.0494     4.4682    2907.0543    4.5692    
##    5    aspects        0.0735      0.0503     2.9915    2905.5966    4.5669    
##    6    sbp            0.0724      0.0512     1.5350    2904.1592    4.5648    
##    7    sex            0.0709      0.0516     0.3033    2902.9535    4.5637    
##    8    extra.ica      0.0689      0.0515    -0.6399    2902.0439    4.5640    
## ---------------------------------------------------------------------------

Discuss

Different variables are chosen when using backward selection.

5.3 Both selections

Build regression model from a set of candidate predictor variables by entering and removing predictors based on p values, in a stepwise manner until there is no variable left to enter or remove any more. ols_step_both_p (model = full, details = FALSE)

ols_step_both_p (model = full, details = FALSE)

## 
##                               Stepwise Selection Summary                                
## ---------------------------------------------------------------------------------------
##                       Added/                   Adj.                                        
## Step    Variable     Removed     R-Square    R-Square     C(p)         AIC        RMSE     
## ---------------------------------------------------------------------------------------
##    1       dm        addition       0.020       0.018    14.2240    2911.0257    4.6430    
##    2    ia.occlus    addition       0.045       0.035     3.6180    2906.5181    4.6032    
##    3    iv.altep     addition       0.053       0.041     1.7480    2904.6141    4.5897    
##    4    location     addition       0.059       0.045     0.5550    2903.3695    4.5793    
## ---------------------------------------------------------------------------------------

Discuss

For both selection approach, only 4 variables are chosen.

6 Criterion-Based method (recommend for checking)

Fits all regressions involving one regressor, two regressors, three regressors, and so on. It tests all possible subsets of the set of potential independent variables.

6.1 Model contains all variables:

library(olsrr)
full = lm( nihss ~ dm + mrankin + sbp + ia.occlus + extra.ica, 
           data = df1)
full.mod = ols_step_all_possible(model = full)
full.mod

##    Index N                         Predictors     R-Square Adj. R-Square
## 4      1 1                          ia.occlus 2.234128e-02  0.0143112271
## 1      2 1                                 dm 2.038541e-02  0.0183861934
## 2      3 1                            mrankin 5.863365e-03 -0.0002481311
## 3      4 1                                sbp 1.572324e-03 -0.0004652839
## 5      5 1                          extra.ica 1.813095e-05 -0.0020226484
## 8      6 2                       dm ia.occlus 4.497508e-02  0.0351497252
## 11     7 2                  mrankin ia.occlus 2.797134e-02  0.0139130773
## 6      8 2                         dm mrankin 2.492816e-02  0.0169193516
## 15     9 2                ia.occlus extra.ica 2.428961e-02  0.0142514334
## 13    10 2                      sbp ia.occlus 2.343476e-02  0.0133877907
## 7     11 2                             dm sbp 2.191510e-02  0.0179147544
## 9     12 2                       dm extra.ica 2.039349e-02  0.0163869197
## 10    13 2                        mrankin sbp 7.341047e-03 -0.0008122097
## 12    14 2                  mrankin extra.ica 5.863624e-03 -0.0023017671
## 14    15 2                      sbp extra.ica 1.591304e-03 -0.0024921675
## 17    16 3               dm mrankin ia.occlus 4.911097e-02  0.0333612509
## 21    17 3             dm ia.occlus extra.ica 4.701721e-02  0.0352277308
## 19    18 3                   dm sbp ia.occlus 4.595280e-02  0.0341501550
## 24    19 3        mrankin ia.occlus extra.ica 2.955505e-02  0.0134814320
## 22    20 3              mrankin sbp ia.occlus 2.906237e-02  0.0129805867
## 16    21 3                     dm mrankin sbp 2.636686e-02  0.0163500626
## 25    22 3            sbp ia.occlus extra.ica 2.534585e-02  0.0132882715
## 18    23 3               dm mrankin extra.ica 2.492816e-02  0.0148965521
## 20    24 3                   dm sbp extra.ica 2.192376e-02  0.0159109923
## 23    25 3              mrankin sbp extra.ica 7.341365e-03 -0.0028711721
## 28    26 4     dm mrankin ia.occlus extra.ica 5.086244e-02  0.0331399500
## 26    27 4           dm mrankin sbp ia.occlus 5.008766e-02  0.0323507074
## 29    28 4         dm sbp ia.occlus extra.ica 4.795866e-02  0.0341894626
## 30    29 4    mrankin sbp ia.occlus extra.ica 3.060974e-02  0.0125090910
## 27    30 4           dm mrankin sbp extra.ica 2.636687e-02  0.0143219201
## 31    31 5 dm mrankin sbp ia.occlus extra.ica 5.180254e-02  0.0320894926
##    Mallow's Cp
## 4   7.94505763
## 1   8.93722874
## 2  16.30394569
## 3  18.48069771
## 5  19.26910590
## 8  -1.53658119
## 11  7.08904795
## 6   8.63279106
## 15  8.95671314
## 13  9.39035895
## 7  10.16124777
## 9  10.93312817
## 10 17.55434970
## 12 18.30381405
## 14 20.47106950
## 17 -1.63462402
## 21 -0.57250650
## 19 -0.03255545
## 24  8.28566554
## 22  8.53559389
## 16  9.90296509
## 25 10.42090427
## 18 10.63279096
## 20 12.15685756
## 23 19.55418800
## 28 -0.52310673
## 26 -0.13007972
## 29  0.94991788
## 30  9.75064687
## 27 11.90296424
## 31  1.00000000

plot(full.mod)

Discuss

In the plot, the x-axis is the number of predictors. The figure plots all possible predictor model and it highlights the best fit model (model ID).
We can see that model 6 is the best for 2 predictors / model 31 is the best for 5 predictors.
nihss = dm + ia.occlus

6.2 Creating a best subset model:

best.mod = ols_step_best_subset(full)
best.mod

##              Best Subsets Regression             
## -------------------------------------------------
## Model Index    Predictors
## -------------------------------------------------
##      1         ia.occlus                          
##      2         dm ia.occlus                       
##      3         dm mrankin ia.occlus               
##      4         dm mrankin ia.occlus extra.ica     
##      5         dm mrankin sbp ia.occlus extra.ica 
## -------------------------------------------------
## 
##                                                        Subsets Regression Summary                                                       
## ----------------------------------------------------------------------------------------------------------------------------------------
##                        Adj.        Pred                                                                                                  
## Model    R-Square    R-Square    R-Square     C(p)         AIC         SBIC          SBC          MSEP         FPE       HSP       APC  
## ----------------------------------------------------------------------------------------------------------------------------------------
##   1        0.0223      0.0143      0.0021     7.9451    2916.0424    1513.7749    2941.2332    10585.0734    21.7349    0.0443    0.9856 
##   2        0.0450      0.0351      0.0218    -1.5366    2906.5181    1504.3755    2935.9075    10361.2066    21.3185    0.0434    0.9667 
##   3        0.0491      0.0334      0.0139    -1.6346    2910.3828    1504.3063    2952.3676    10337.5193    21.4011    0.0436    0.9665 
##   4        0.0509      0.0331      0.0108    -0.5231    2911.4758    1505.4578    2957.6590    10339.7098    21.4492    0.0437    0.9686 
##   5        0.0518      0.0321       0.008     1.0000    2912.9882    1507.0261    2963.3699    10350.7664    21.5157    0.0438    0.9716 
## ----------------------------------------------------------------------------------------------------------------------------------------
## AIC: Akaike Information Criteria 
##  SBIC: Sawa's Bayesian Information Criteria 
##  SBC: Schwarz Bayesian Criteria 
##  MSEP: Estimated error of prediction, assuming multivariate normality 
##  FPE: Final Prediction Error 
##  HSP: Hocking's Sp 
##  APC: Amemiya Prediction Criteria

plot(best.mod)

Discuss

According to the plot and the result, model number 2 is the best – nihss ~ dm + ia.occlus

Conclusion

Both Bayesian Model Average: BMA package and Criterion-Based method suggest similar best model with 2 variables dm + ia.occlus.
Therefore, these 2 approaches should be applied to select the best variables.
Then, apply Frequentist or Bayesian approach for further analysis:
Linear regression - Frequentist approach: https://rpubs.com/minhtri/968628
Linear regression - Bayesian approach: https://rpubs.com/minhtri/968636

7 References

https://statisticsbyjim.com/regression/model-specification-variable-selection/

Linear regression - Choosing optimal Model

Minh Tri

2022-11-10

1 Simulated fakestroke data

2 Data preprocessing step

2.1 Step 1: Data wrangling

2.2 Step 2: Missingness checking and imputation when necessary

3 Model Selection in Statistics

7 References