knitr::opts_chunk$set(
echo = TRUE,
message = FALSE,
warning = FALSE
)
Previous sections:
Note: This analysis is used for personal study purpose. In this section, I will summerize some basic infomation about linear regression - model selection based on several sources.
R Packages:
# More packages will be shown during the analysis process
# Load the required library
library(tidyverse) # Data Wrangling
library(conflicted) # Dealing with conflict package
library(readxl) # Read csv file
Dealing with Conflicts
There is a lot of packages here, and sometimes individual functions are
in conflict. R’s default conflict resolution system gives precedence to
the most recently loaded package. This can make it hard to detect
conflicts, particularly when introduced by an update to an existing
package.
conflict_prefer("filter", "dplyr")
conflict_prefer("select", "dplyr")
conflict_prefer("Predict", "rms")
conflict_prefer("impute_median", "simputation")
conflict_prefer("summarize", "dplyr")
Data used in this notes
fakestroke.csv
(Source: https://github.com/THOMASELOVE/432-data/blob/master/data/fakestroke.csv)
The fakestroke.csv file contains the following 18 variables for 500 patients:
# To make a table in R markdown: 1st row: header, 2nd row: Alignment; the remaining row: for content
## |Variable | Description |
## |:------- | :---------- |
## |studyid | Study ID # (z001 through z500) |
| Variable | Description |
|---|---|
| studyid | Study ID # (z001 through z500) |
| trt | Treatment group (Intervention or Control) |
| age | Age in years |
| sex | Male or Female |
| nihss | NIH Stroke Scale Score (can range from 0-42; higher scores indicate more severe neurological deficits) |
| location | Stroke Location - Left or Right Hemisphere |
| hx.isch | History of Ischemic Stroke (Yes/No) |
| afib | Atrial Fibrillation (1 = Yes, 0 = No) |
| dm | Diabetes Mellitus (1 = Yes, 0 = No) |
| mrankin | Pre-stroke modified Rankin scale score (0, 1, 2 or > 2) indicating functional disability - complete range is 0 (no symptoms) to 6 (death) |
| sbp | Systolic blood pressure, in mm Hg |
| iv.altep | Treatment with IV alteplase (Yes/No) |
| time.iv | Time from stroke onset to start of IV alteplase (minutes) if iv.altep=Yes |
| aspects | Alberta Stroke Program Early Computed Tomography score, which measures extent of stroke from 0 - 10; higher scores indicate fewer early ischemic changes |
| ia.occlus | Intracranial arterial occlusion, based on vessel imaging - five categories3 |
| extra.ica | Extracranial ICA occlusion (1 = Yes, 0 = No) |
| time.rand | Time from stroke onset to study randomization, in minutes |
| time.punc | Time from stroke onset to groin puncture, in minutes (only if Intervention) |
Please refer to this section: The R environment - Data Wrangling: https://rpubs.com/minhtri/933332
df <- read_excel("D:/Statistics/R/R data/fakestroke.xlsx",
col_types = c("text", "text", "numeric",
"text", "numeric", "text", "text",
"numeric", "numeric", "text", "numeric",
"text", "numeric", "numeric", "text",
"numeric", "numeric", "numeric"))
names(df)
## [1] "studyid" "trt" "age" "sex" "nihss" "location"
## [7] "hx.isch" "afib" "dm" "mrankin" "sbp" "iv.altep"
## [13] "time.iv" "aspects" "ia.occlus" "extra.ica" "time.rand" "time.punc"
str(df)
## tibble [500 x 18] (S3: tbl_df/tbl/data.frame)
## $ studyid : chr [1:500] "z001" "z002" "z003" "z004" ...
## $ trt : chr [1:500] "Control" "Intervention" "Control" "Control" ...
## $ age : num [1:500] 53 51 68 28 91 34 75 89 75 26 ...
## $ sex : chr [1:500] "Male" "Male" "Female" "Male" ...
## $ nihss : num [1:500] 21 23 11 22 24 18 25 18 25 27 ...
## $ location : chr [1:500] "Right" "Left" "Right" "Left" ...
## $ hx.isch : chr [1:500] "No" "No" "No" "No" ...
## $ afib : num [1:500] 0 1 0 0 0 0 0 0 1 0 ...
## $ dm : num [1:500] 0 0 0 0 0 0 0 0 0 0 ...
## $ mrankin : chr [1:500] "2" "0" "0" "0" ...
## $ sbp : num [1:500] 127 137 138 122 162 166 140 157 129 143 ...
## $ iv.altep : chr [1:500] "Yes" "Yes" "No" "Yes" ...
## $ time.iv : num [1:500] 63 68 NA 78 121 78 97 NA 49 99 ...
## $ aspects : num [1:500] 10 10 10 10 8 5 10 9 6 10 ...
## $ ia.occlus: chr [1:500] "M1" "M1" "ICA with M1" "ICA with M1" ...
## $ extra.ica: num [1:500] 0 1 1 1 0 0 0 0 0 0 ...
## $ time.rand: num [1:500] 139 118 178 160 214 154 122 147 271 141 ...
## $ time.punc: num [1:500] NA 281 NA NA NA NA 268 NA NA 259 ...
df <- df %>% mutate_if(is.character, as.factor)
str(df)
## tibble [500 x 18] (S3: tbl_df/tbl/data.frame)
## $ studyid : Factor w/ 500 levels "z001","z002",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ trt : Factor w/ 2 levels "Control","Intervention": 1 2 1 1 1 1 2 1 1 2 ...
## $ age : num [1:500] 53 51 68 28 91 34 75 89 75 26 ...
## $ sex : Factor w/ 2 levels "Female","Male": 2 2 1 2 2 1 2 1 2 1 ...
## $ nihss : num [1:500] 21 23 11 22 24 18 25 18 25 27 ...
## $ location : Factor w/ 2 levels "Left","Right": 2 1 2 1 2 1 2 2 1 2 ...
## $ hx.isch : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ afib : num [1:500] 0 1 0 0 0 0 0 0 1 0 ...
## $ dm : num [1:500] 0 0 0 0 0 0 0 0 0 0 ...
## $ mrankin : Factor w/ 4 levels "> 2","0","1",..: 4 2 2 2 2 4 2 2 4 2 ...
## $ sbp : num [1:500] 127 137 138 122 162 166 140 157 129 143 ...
## $ iv.altep : Factor w/ 2 levels "No","Yes": 2 2 1 2 2 2 2 1 2 2 ...
## $ time.iv : num [1:500] 63 68 NA 78 121 78 97 NA 49 99 ...
## $ aspects : num [1:500] 10 10 10 10 8 5 10 9 6 10 ...
## $ ia.occlus: Factor w/ 6 levels "A1 or A2","ICA with M1",..: 4 4 2 2 4 1 4 4 4 4 ...
## $ extra.ica: num [1:500] 0 1 1 1 0 0 0 0 0 0 ...
## $ time.rand: num [1:500] 139 118 178 160 214 154 122 147 271 141 ...
## $ time.punc: num [1:500] NA 281 NA NA NA NA 268 NA NA 259 ...
# Changing coding variable to categorical:
df$afib <- factor(df$afib , levels=0:1, labels=c("No", "Yes"))
df$dm <- factor(df$dm , levels=0:1, labels=c("No", "Yes"))
df$extra.ica <- factor(df$extra.ica, levels=0:1, labels=c("No", "Yes"))
# Checking after converting
str(df)
## tibble [500 x 18] (S3: tbl_df/tbl/data.frame)
## $ studyid : Factor w/ 500 levels "z001","z002",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ trt : Factor w/ 2 levels "Control","Intervention": 1 2 1 1 1 1 2 1 1 2 ...
## $ age : num [1:500] 53 51 68 28 91 34 75 89 75 26 ...
## $ sex : Factor w/ 2 levels "Female","Male": 2 2 1 2 2 1 2 1 2 1 ...
## $ nihss : num [1:500] 21 23 11 22 24 18 25 18 25 27 ...
## $ location : Factor w/ 2 levels "Left","Right": 2 1 2 1 2 1 2 2 1 2 ...
## $ hx.isch : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ afib : Factor w/ 2 levels "No","Yes": 1 2 1 1 1 1 1 1 2 1 ...
## $ dm : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ mrankin : Factor w/ 4 levels "> 2","0","1",..: 4 2 2 2 2 4 2 2 4 2 ...
## $ sbp : num [1:500] 127 137 138 122 162 166 140 157 129 143 ...
## $ iv.altep : Factor w/ 2 levels "No","Yes": 2 2 1 2 2 2 2 1 2 2 ...
## $ time.iv : num [1:500] 63 68 NA 78 121 78 97 NA 49 99 ...
## $ aspects : num [1:500] 10 10 10 10 8 5 10 9 6 10 ...
## $ ia.occlus: Factor w/ 6 levels "A1 or A2","ICA with M1",..: 4 4 2 2 4 1 4 4 4 4 ...
## $ extra.ica: Factor w/ 2 levels "No","Yes": 1 2 2 2 1 1 1 1 1 1 ...
## $ time.rand: num [1:500] 139 118 178 160 214 154 122 147 271 141 ...
## $ time.punc: num [1:500] NA 281 NA NA NA NA 268 NA NA 259 ...
# Relevel factors:
df$trt = factor(df$trt, levels=c("Control", "Intervention"))
df$mrankin = factor(df$mrankin, levels=c("0", "1", "2", "> 2"))
df$ia.occlus = factor(df$ia.occlus, levels=c("Intracranial ICA", "ICA with M1",
"M1", "M2", "A1 or A2"))
# Check order of factor after relevel:
str(df)
## tibble [500 x 18] (S3: tbl_df/tbl/data.frame)
## $ studyid : Factor w/ 500 levels "z001","z002",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ trt : Factor w/ 2 levels "Control","Intervention": 1 2 1 1 1 1 2 1 1 2 ...
## $ age : num [1:500] 53 51 68 28 91 34 75 89 75 26 ...
## $ sex : Factor w/ 2 levels "Female","Male": 2 2 1 2 2 1 2 1 2 1 ...
## $ nihss : num [1:500] 21 23 11 22 24 18 25 18 25 27 ...
## $ location : Factor w/ 2 levels "Left","Right": 2 1 2 1 2 1 2 2 1 2 ...
## $ hx.isch : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ afib : Factor w/ 2 levels "No","Yes": 1 2 1 1 1 1 1 1 2 1 ...
## $ dm : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ mrankin : Factor w/ 4 levels "0","1","2","> 2": 3 1 1 1 1 3 1 1 3 1 ...
## $ sbp : num [1:500] 127 137 138 122 162 166 140 157 129 143 ...
## $ iv.altep : Factor w/ 2 levels "No","Yes": 2 2 1 2 2 2 2 1 2 2 ...
## $ time.iv : num [1:500] 63 68 NA 78 121 78 97 NA 49 99 ...
## $ aspects : num [1:500] 10 10 10 10 8 5 10 9 6 10 ...
## $ ia.occlus: Factor w/ 5 levels "Intracranial ICA",..: 3 3 2 2 3 5 3 3 3 3 ...
## $ extra.ica: Factor w/ 2 levels "No","Yes": 1 2 2 2 1 1 1 1 1 1 ...
## $ time.rand: num [1:500] 139 118 178 160 214 154 122 147 271 141 ...
## $ time.punc: num [1:500] NA 281 NA NA NA NA 268 NA NA 259 ...
Package: VIM
Command: aggr (x, plot = T, prop = T, numbers = F, combined = F, sortVars = T, cex.axis=.7, gap=3)
For more infomation, please refer to the following link:
Dealing with missingness - Imputation: https://rpubs.com/minhtri/932007
Checking the percentage of missingness:
library(VIM)
aggr(df, plot = F, numbers = T, prop = T)
##
## Missings in variables:
## Variable Count
## sbp 1
## time.iv 55
## aspects 4
## ia.occlus 1
## extra.ica 1
## time.rand 2
## time.punc 267
aggr(df, plot = T, numbers = T, prop = T, combined= F, sortVars = T, cex.axis=.7, gap=3)
##
## Variables sorted by number of missings:
## Variable Count
## time.punc 0.534
## time.iv 0.110
## aspects 0.008
## time.rand 0.004
## sbp 0.002
## ia.occlus 0.002
## extra.ica 0.002
## studyid 0.000
## trt 0.000
## age 0.000
## sex 0.000
## nihss 0.000
## location 0.000
## hx.isch 0.000
## afib 0.000
## dm 0.000
## mrankin 0.000
## iv.altep 0.000
aggr(df, plot = T, numbers = T, prop = F, combined= F, sortVars = T, cex.axis=.7, gap=3)
##
## Variables sorted by number of missings:
## Variable Count
## time.punc 267
## time.iv 55
## aspects 4
## time.rand 2
## sbp 1
## ia.occlus 1
## extra.ica 1
## studyid 0
## trt 0
## age 0
## sex 0
## nihss 0
## location 0
## hx.isch 0
## afib 0
## dm 0
## mrankin 0
## iv.altep 0
Discuss
df1 = df %>% select (-time.punc, - time.iv, -studyid) %>% na.omit()
In statistics, model selection is a critical step. Your findings may be invalidated if you specify incorrectly and choose the incorrect model. When the independent variables and their functional form (i.e., curvature and interactions) incorrectly reflect the true connection existing in the data, this is referred to as specification error. Bias can be caused by specification errors, which can overstate, understate, or completely conceal the presence of underlying connections. In summary, you cannot rely on your outcomes! As a result, in order to pick the appropriate regression model, you must first grasp model selection in statistics.
There are several common techniques to model selection:
Bayesian Model Average (recommend).
Stepwise method approach.
Criterion-Based method ( slightly recommend).
To fit a Bayesian regression, we use the function bicreg from the BMA package. Bayesian Model Averaging accounts for the model uncertainty inherent in the variable selection problem by averaging over the best models in the model class according to approximate posterior model probability.
Command:
seach = bicreg(x, y, wt = rep(1, length(y)), strict = FALSE, OR = 20, maxCol = 31, drop.factor.levels = TRUE, nbest = 150)
imageplot.bma(search, color = c(“red”, “blue”, “#FFFFD5”))
with: color: A vector of colors of length 3. The first color is the color to use when the variable estimate is positive, the second color is the color to use when the variable estimate is negative, and the third color is the color to use when the variable is not included in the model.
library(BMA)
X = df1 %>% select(-nihss)
Y = df1$nihss
search = bicreg(X, Y, strict = FALSE, OR = 20)
summary(search)
##
## Call:
## bicreg(x = X, y = Y, strict = FALSE, OR = 20)
##
##
## 32 models were selected
## Best 5 models (cumulative posterior probability = 0.503 ):
##
## p!=0 EV SD model 1 model 2 model 3
## Intercept 100.0 18.2925118 0.775666 18.5722 18.2413 17.4026
## trtIntervention 0.0 0.0000000 0.000000 . . .
## age 1.2 -0.0001306 0.001798 . . .
## sexMale 1.0 0.0027078 0.050565 . . .
## locationRight 11.4 0.0882728 0.283505 . . .
## hx.ischYes 5.0 -0.0500489 0.264613 . . .
## afibYes 0.0 0.0000000 0.000000 . . .
## dmYes 95.0 -1.9546750 0.764513 -2.1037 -2.0282 -2.0002
## mrankin1 1.2 -0.0067112 0.097064 . . .
## mrankin2 0.0 0.0000000 0.000000 . . .
## mrankin..2 4.3 0.0628149 0.365601 . . .
## sbp 2.3 -0.0001566 0.001620 . . .
## iv.altepYes 13.7 0.1769946 0.508889 . . 1.3084
## aspects 2.4 -0.0028924 0.027854 . . .
## ia.occlusICA.with.M1 52.2 -0.6299169 0.699413 -1.1983 . -1.2284
## ia.occlusM1 5.2 0.0243034 0.221142 . . .
## ia.occlusM2 5.6 0.0738953 0.371888 . . .
## ia.occlusA1.or.A2 4.8 0.1891645 1.028578 . . .
## extra.icaYes 1.2 0.0043076 0.064920 . . .
## time.rand 5.3 -0.0002669 0.001354 . . .
##
## nVar 2 1 3
## r2 0.033 0.020 0.041
## BIC -4.2249 -3.9371 -1.8940
## post prob 0.190 0.165 0.059
## model 4 model 5
## Intercept 17.8790 18.2187
## trtIntervention . .
## age . .
## sexMale . .
## locationRight 0.7926 0.7544
## hx.ischYes . .
## afibYes . .
## dmYes -2.0687 -2.1403
## mrankin1 . .
## mrankin2 . .
## mrankin..2 . .
## sbp . .
## iv.altepYes . .
## aspects . .
## ia.occlusICA.with.M1 . -1.1670
## ia.occlusM1 . .
## ia.occlusM2 . .
## ia.occlusA1.or.A2 . .
## extra.icaYes . .
## time.rand . .
##
## nVar 2 3
## r2 0.028 0.040
## BIC -1.3226 -1.3148
## post prob 0.045 0.044
imageplot.bma(search)
Discuss: We have a lot of suggested models but I usually choose the 1st one: dm and ia.occlus variables.
Please refer to this link for detail discussion:
7. Linear regression - Frequentist approach: https://rpubs.com/minhtri/936322
Command: newModel<-lm(outcome ~ predictor(s), data = dataFrame, na.action = an action))
with na.action:
# Model:
model_B <- lm(nihss ~ dm + ia.occlus, data = df1)
summary(model_B)
##
## Call:
## lm(formula = nihss ~ dm + ia.occlus, data = df1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.4949 -3.4949 -0.4949 3.6222 10.6222
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14.7500 2.3016 6.409 3.48e-10 ***
## dmYes -2.1440 0.6317 -3.394 0.000746 ***
## ia.occlusICA with M1 2.6278 2.3370 1.124 0.261387
## ia.occlusM1 3.7449 2.3176 1.616 0.106774
## ia.occlusM2 4.6186 2.4225 1.907 0.057167 .
## ia.occlusA1 or A2 7.5833 3.5157 2.157 0.031498 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.603 on 486 degrees of freedom
## Multiple R-squared: 0.04498, Adjusted R-squared: 0.03515
## F-statistic: 4.577 on 5 and 486 DF, p-value: 0.000433
# Importance of each variable:
library(relaimpo)
calc.relimp(model_B, type = "lmg")
## Response variable: nihss
## Total response variance: 21.9612
## Analysis based on 492 observations
##
## 5 Regressors:
## Some regressors combined in groups:
## Group ia.occlus : ia.occlusICA with M1 ia.occlusM1 ia.occlusM2 ia.occlusA1 or A2
##
## Relative importance of 2 (groups of) regressors assessed:
## ia.occlus dm
##
## Proportion of variance explained by model: 4.5%
## Metrics are not normalized (rela=FALSE).
##
## Relative importance metrics:
##
## lmg
## ia.occlus 0.02346548
## dm 0.02150961
##
## Average coefficients for different model sizes:
##
## 1group 2groups
## dm -2.028185 -2.144006
## ia.occlusICA with M1 2.416667 2.627819
## ia.occlusM1 3.465873 3.744934
## ia.occlusM2 4.223684 4.618633
## ia.occlusA1 or A2 7.583333 7.583333
# CI of model:
confint(model_B, level = 0.95)
## 2.5 % 97.5 %
## (Intercept) 10.2277040 19.2722960
## dmYes -3.3852794 -0.9027328
## ia.occlusICA with M1 -1.9641130 7.2197506
## ia.occlusM1 -0.8088512 8.2987195
## ia.occlusM2 -0.1412201 9.3784855
## ia.occlusA1 or A2 0.6754121 14.4912546
Discuss
F = 4.577, p-value < 0.05: the regression model overall predicts nihss significantly well.
R2 = 0.045 => 2 variables dm and ia.occlus account for 4.5% of the variation in nihss. Overall, dm explained 2.2% and ia.occlus explainted 2.3%.
=> Good model or there is sufficient evidence to say that there is a significant relationship between nihss and dm + ia.occlus.
t value: dm and ia.occlus have similar effect on nihss scores.
Equation: nihss = 14.75 - 2.14dmYes + 2.62 ICAM1 + 3.74M1 + 4.62M2 + 7.6 A1A2
If the value is positive we can tell that there is a positive
relationship between the predictor and the outcome, whereas a negative
coefficient represents a negative relationship - dm
predictors have negative b-values while ia.occlus have
positive value.
Intercept (b0) = 14.75: When dm=0 and ia.occlus=0, the model predict that NIH Stroke Scale Score will be 14.75 => Intercept is meaningless, just for adjust the height of the line.
For this section, refer to the link for more detail:
7. Linear regression - Basic commands: https://rpubs.com/minhtri/936322
A few notes:
null = lm( nihss ~ 1, data = df1)
fw = step(null,nihss ~ trt + age + sex + location + hx.isch + afib + dm +
mrankin + sbp + iv.altep + aspects + ia.occlus + extra.ica + time.rand,
direction = "forward",
test = "F",
)
## Start: AIC=1520.92
## nihss ~ 1
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## + dm 1 219.815 10563 1512.8 10.1967 0.001497 **
## + ia.occlus 4 240.905 10542 1517.8 2.7822 0.026272 *
## + iv.altep 1 97.467 10686 1518.5 4.4695 0.035009 *
## + location 1 68.160 10715 1519.8 3.1170 0.078100 .
## + time.rand 1 58.532 10724 1520.2 2.6743 0.102619
## + hx.isch 1 57.228 10726 1520.3 2.6144 0.106539
## <none> 10783 1520.9
## + aspects 1 18.130 10765 1522.1 0.8253 0.364090
## + sbp 1 16.954 10766 1522.2 0.7717 0.380137
## + age 1 14.026 10769 1522.3 0.6382 0.424752
## + sex 1 4.794 10778 1522.7 0.2180 0.640811
## + afib 1 2.115 10781 1522.8 0.0961 0.756635
## + trt 1 0.261 10783 1522.9 0.0118 0.913366
## + extra.ica 1 0.196 10783 1522.9 0.0089 0.924944
## + mrankin 3 63.224 10720 1524.0 0.9594 0.411675
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=1512.79
## nihss ~ dm
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## + ia.occlus 4 265.149 10298 1508.3 3.1283 0.01474 *
## + location 1 76.769 10486 1511.2 3.5799 0.05907 .
## + iv.altep 1 74.728 10488 1511.3 3.4840 0.06256 .
## + time.rand 1 56.015 10507 1512.2 2.6069 0.10704
## + hx.isch 1 51.633 10512 1512.4 2.4020 0.12183
## <none> 10563 1512.8
## + aspects 1 18.698 10544 1513.9 0.8671 0.35222
## + sbp 1 16.495 10547 1514.0 0.7648 0.38227
## + age 1 9.128 10554 1514.4 0.4229 0.51578
## + sex 1 5.904 10557 1514.5 0.2735 0.60125
## + afib 1 3.785 10559 1514.6 0.1753 0.67562
## + trt 1 0.198 10563 1514.8 0.0092 0.92381
## + extra.ica 1 0.087 10563 1514.8 0.0040 0.94938
## + mrankin 3 48.984 10514 1516.5 0.7563 0.51908
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=1508.28
## nihss ~ dm + ia.occlus
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## + iv.altep 1 81.392 10217 1506.4 3.8638 0.04991 *
## + location 1 67.720 10230 1507.0 3.2105 0.07379 .
## + time.rand 1 57.464 10240 1507.5 2.7215 0.09965 .
## + hx.isch 1 44.597 10253 1508.2 2.1095 0.14703
## <none> 10298 1508.3
## + extra.ica 1 22.020 10276 1509.2 1.0393 0.30849
## + aspects 1 11.911 10286 1509.7 0.5616 0.45398
## + sex 1 11.714 10286 1509.7 0.5523 0.45773
## + age 1 11.419 10287 1509.7 0.5384 0.46346
## + sbp 1 10.543 10287 1509.8 0.4970 0.48115
## + afib 1 2.741 10295 1510.2 0.1291 0.71951
## + trt 1 0.829 10297 1510.2 0.0390 0.84344
## + mrankin 3 44.597 10253 1512.2 0.7003 0.55223
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=1506.38
## nihss ~ dm + ia.occlus + iv.altep
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## + location 1 67.153 10149 1505.1 3.2023 0.07416 .
## + time.rand 1 55.089 10162 1505.7 2.6239 0.10592
## + hx.isch 1 47.358 10169 1506.1 2.2540 0.13392
## <none> 10217 1506.4
## + extra.ica 1 24.586 10192 1507.2 1.1676 0.28044
## + sbp 1 11.939 10205 1507.8 0.5663 0.45212
## + aspects 1 11.695 10205 1507.8 0.5547 0.45677
## + sex 1 11.192 10205 1507.8 0.5308 0.46662
## + age 1 8.873 10208 1508.0 0.4207 0.51690
## + afib 1 1.273 10215 1508.3 0.0603 0.80611
## + trt 1 0.259 10216 1508.4 0.0123 0.91185
## + mrankin 3 38.532 10178 1510.5 0.6082 0.60990
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=1505.13
## nihss ~ dm + ia.occlus + iv.altep + location
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## + hx.isch 1 53.768 10096 1504.5 2.5724 0.1094
## + time.rand 1 46.695 10103 1504.9 2.2324 0.1358
## <none> 10149 1505.1
## + extra.ica 1 22.918 10126 1506.0 1.0931 0.2963
## + sbp 1 13.479 10136 1506.5 0.6423 0.4233
## + aspects 1 13.442 10136 1506.5 0.6405 0.4239
## + sex 1 11.838 10138 1506.6 0.5640 0.4530
## + age 1 8.877 10141 1506.7 0.4228 0.5158
## + afib 1 1.474 10148 1507.1 0.0702 0.7912
## + trt 1 1.078 10148 1507.1 0.0513 0.8209
## + mrankin 3 48.245 10101 1508.8 0.7658 0.5136
##
## Step: AIC=1504.52
## nihss ~ dm + ia.occlus + iv.altep + location + hx.isch
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## + time.rand 1 55.501 10040 1503.8 2.6644 0.1033
## <none> 10096 1504.5
## + extra.ica 1 21.372 10074 1505.5 1.0225 0.3124
## + sbp 1 13.375 10082 1505.9 0.6394 0.4243
## + aspects 1 12.356 10083 1505.9 0.5906 0.4426
## + sex 1 12.014 10084 1505.9 0.5743 0.4489
## + age 1 6.901 10089 1506.2 0.3297 0.5661
## + afib 1 1.531 10094 1506.5 0.0731 0.7870
## + trt 1 0.509 10095 1506.5 0.0243 0.8762
## + mrankin 3 43.868 10052 1508.4 0.6983 0.5534
##
## Step: AIC=1503.81
## nihss ~ dm + ia.occlus + iv.altep + location + hx.isch + time.rand
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 10040 1503.8
## + extra.ica 1 22.227 10018 1504.7 1.0672 0.3021
## + sex 1 15.715 10024 1505.0 0.7541 0.3856
## + aspects 1 11.709 10028 1505.2 0.5616 0.4540
## + sbp 1 11.381 10029 1505.2 0.5459 0.4604
## + age 1 7.778 10032 1505.4 0.3729 0.5417
## + afib 1 2.211 10038 1505.7 0.1059 0.7450
## + trt 1 1.822 10038 1505.7 0.0873 0.7678
## + mrankin 3 48.203 9992 1507.4 0.7703 0.5111
summary(fw)
##
## Call:
## lm(formula = nihss ~ dm + ia.occlus + iv.altep + location + hx.isch +
## time.rand, data = df1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.9289 -3.7957 -0.6397 3.6614 10.9729
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.960066 2.451532 5.694 2.15e-08 ***
## dmYes -2.035565 0.629020 -3.236 0.0013 **
## ia.occlusICA with M1 3.184185 2.324867 1.370 0.1714
## ia.occlusM1 4.238781 2.303610 1.840 0.0664 .
## ia.occlusM2 5.053979 2.409919 2.097 0.0365 *
## ia.occlusA1 or A2 7.903601 3.488460 2.266 0.0239 *
## iv.altepYes 1.309981 0.661897 1.979 0.0484 *
## locationRight 0.731638 0.415601 1.760 0.0790 .
## hx.ischYes -1.149006 0.662940 -1.733 0.0837 .
## time.rand -0.005289 0.003240 -1.632 0.1033
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.564 on 482 degrees of freedom
## Multiple R-squared: 0.06888, Adjusted R-squared: 0.0515
## F-statistic: 3.962 on 9 and 482 DF, p-value: 6.87e-05
fw = step(null,nihss ~ trt + age + sex + location + hx.isch + afib + dm +
mrankin + sbp + iv.altep + aspects + ia.occlus + extra.ica + time.rand,
direction = "forward", test = "F", k = log(nrow(df1)))
## Start: AIC=1525.12
## nihss ~ 1
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## + dm 1 219.815 10563 1521.2 10.1967 0.001497 **
## <none> 10783 1525.1
## + iv.altep 1 97.467 10686 1526.8 4.4695 0.035009 *
## + location 1 68.160 10715 1528.2 3.1170 0.078100 .
## + time.rand 1 58.532 10724 1528.6 2.6743 0.102619
## + hx.isch 1 57.228 10726 1528.7 2.6144 0.106539
## + aspects 1 18.130 10765 1530.5 0.8253 0.364090
## + sbp 1 16.954 10766 1530.5 0.7717 0.380137
## + age 1 14.026 10769 1530.7 0.6382 0.424752
## + sex 1 4.794 10778 1531.1 0.2180 0.640811
## + afib 1 2.115 10781 1531.2 0.0961 0.756635
## + trt 1 0.261 10783 1531.3 0.0118 0.913366
## + extra.ica 1 0.196 10783 1531.3 0.0089 0.924944
## + ia.occlus 4 240.905 10542 1538.8 2.7822 0.026272 *
## + mrankin 3 63.224 10720 1540.8 0.9594 0.411675
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=1521.19
## nihss ~ dm
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 10563 1521.2
## + location 1 76.769 10486 1523.8 3.5799 0.05907 .
## + iv.altep 1 74.728 10488 1523.9 3.4840 0.06256 .
## + time.rand 1 56.015 10507 1524.8 2.6069 0.10704
## + hx.isch 1 51.633 10512 1525.0 2.4020 0.12183
## + aspects 1 18.698 10544 1526.5 0.8671 0.35222
## + sbp 1 16.495 10547 1526.6 0.7648 0.38227
## + age 1 9.128 10554 1527.0 0.4229 0.51578
## + sex 1 5.904 10557 1527.1 0.2735 0.60125
## + afib 1 3.785 10559 1527.2 0.1753 0.67562
## + trt 1 0.198 10563 1527.4 0.0092 0.92381
## + extra.ica 1 0.087 10563 1527.4 0.0040 0.94938
## + ia.occlus 4 265.149 10298 1533.5 3.1283 0.01474 *
## + mrankin 3 48.984 10514 1537.5 0.7563 0.51908
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(fw)
##
## Call:
## lm(formula = nihss ~ dm, data = df1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.2413 -4.2413 -0.2413 3.7587 9.7869
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 18.2413 0.2236 81.564 <2e-16 ***
## dmYes -2.0282 0.6352 -3.193 0.0015 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.643 on 490 degrees of freedom
## Multiple R-squared: 0.02039, Adjusted R-squared: 0.01839
## F-statistic: 10.2 on 1 and 490 DF, p-value: 0.001497
Build regression model from a set of candidate predictor
variables by entering predictors based on p values, in
a stepwise manner until there is no variable left to enter any
more.
ols_step_forward_p (model = full, details = FALSE, progress =
FALSE)
details = FALSE (default) : Logical; if TRUE, will print the
regression result at each step.
progress = FALSE (default): Logical; if TRUE, will display variable selection progress.
library(olsrr)
full = lm(nihss ~ ., data = df1)
ols_step_forward_p(model = full, details = FALSE, progress = FALSE)
##
## Selection Summary
## ---------------------------------------------------------------------------
## Variable Adj.
## Step Entered R-Square R-Square C(p) AIC RMSE
## ---------------------------------------------------------------------------
## 1 dm 0.0204 0.0184 14.2244 2911.0257 4.6430
## 2 ia.occlus 0.0450 0.0351 3.6179 2906.5181 4.6032
## 3 iv.altep 0.0525 0.0408 1.7481 2904.6141 4.5897
## 4 location 0.0588 0.0451 0.5553 2903.3695 4.5793
## 5 hx.isch 0.0637 0.0482 -0.0011 2902.7562 4.5719
## 6 time.rand 0.0689 0.0515 -0.6399 2902.0439 4.5640
## ---------------------------------------------------------------------------
Discuss:
* There are 1 or 6 variables in the final suggesting equation for each
approach.
* When define: k = log(nrow(album2)), only
dm is chosen.
full = lm( nihss ~ trt + age + sex + location + hx.isch + afib + dm +
mrankin + sbp + iv.altep + aspects + ia.occlus + extra.ica + time.rand,
data = df1)
bw = step (full, direction = "backward", test = "F")
## Start: AIC=1518.25
## nihss ~ trt + age + sex + location + hx.isch + afib + dm + mrankin +
## sbp + iv.altep + aspects + ia.occlus + extra.ica + time.rand
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## - mrankin 3 41.585 9969.0 1514.3 0.6590 0.577592
## - afib 1 0.277 9927.7 1516.3 0.0132 0.908730
## - trt 1 5.168 9932.6 1516.5 0.2457 0.620341
## - age 1 6.363 9933.8 1516.6 0.3025 0.582568
## - sbp 1 10.256 9937.7 1516.8 0.4876 0.485341
## - sex 1 10.945 9938.4 1516.8 0.5204 0.471039
## - aspects 1 13.158 9940.6 1516.9 0.6256 0.429371
## - extra.ica 1 17.627 9945.1 1517.1 0.8381 0.360417
## <none> 9927.4 1518.2
## - hx.isch 1 53.058 9980.5 1518.9 2.5226 0.112892
## - time.rand 1 64.485 9991.9 1519.4 3.0659 0.080599 .
## - iv.altep 1 71.467 9998.9 1519.8 3.3979 0.065907 .
## - location 1 77.022 10004.5 1520.1 3.6620 0.056270 .
## - ia.occlus 4 264.650 10192.1 1523.2 3.1457 0.014336 *
## - dm 1 202.440 10129.9 1526.2 9.6250 0.002035 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=1514.31
## nihss ~ trt + age + sex + location + hx.isch + afib + dm + sbp +
## iv.altep + aspects + ia.occlus + extra.ica + time.rand
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## - afib 1 0.274 9969.3 1512.3 0.0131 0.909017
## - trt 1 4.709 9973.7 1512.5 0.2244 0.635957
## - age 1 5.790 9974.8 1512.6 0.2759 0.599664
## - sbp 1 10.377 9979.4 1512.8 0.4944 0.482308
## - aspects 1 11.874 9980.9 1512.9 0.5658 0.452311
## - sex 1 15.759 9984.8 1513.1 0.7509 0.386638
## - extra.ica 1 21.640 9990.7 1513.4 1.0311 0.310419
## <none> 9969.0 1514.3
## - hx.isch 1 56.859 10025.9 1515.1 2.7092 0.100432
## - time.rand 1 60.799 10029.8 1515.3 2.8969 0.089404 .
## - location 1 67.993 10037.0 1515.7 3.2397 0.072509 .
## - iv.altep 1 79.208 10048.2 1516.2 3.7741 0.052643 .
## - ia.occlus 4 274.299 10243.3 1519.7 3.2674 0.011673 *
## - dm 1 217.646 10186.7 1522.9 10.3703 0.001368 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=1512.32
## nihss ~ trt + age + sex + location + hx.isch + dm + sbp + iv.altep +
## aspects + ia.occlus + extra.ica + time.rand
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## - trt 1 4.677 9974.0 1510.5 0.2233 0.636754
## - age 1 5.966 9975.3 1510.6 0.2849 0.593774
## - sbp 1 10.544 9979.8 1510.8 0.5034 0.478341
## - aspects 1 12.390 9981.7 1510.9 0.5916 0.442181
## - sex 1 15.765 9985.1 1511.1 0.7527 0.386053
## - extra.ica 1 22.060 9991.4 1511.4 1.0533 0.305269
## <none> 9969.3 1512.3
## - hx.isch 1 56.772 10026.1 1513.1 2.7107 0.100339
## - time.rand 1 60.585 10029.9 1513.3 2.8927 0.089633 .
## - location 1 67.939 10037.2 1513.7 3.2438 0.072325 .
## - iv.altep 1 80.057 10049.3 1514.3 3.8225 0.051155 .
## - ia.occlus 4 275.072 10244.4 1517.7 3.2834 0.011359 *
## - dm 1 217.376 10186.7 1520.9 10.3790 0.001362 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=1510.55
## nihss ~ age + sex + location + hx.isch + dm + sbp + iv.altep +
## aspects + ia.occlus + extra.ica + time.rand
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## - age 1 5.378 9979.3 1508.8 0.2572 0.612278
## - aspects 1 10.938 9984.9 1509.1 0.5231 0.469879
## - sbp 1 11.088 9985.1 1509.1 0.5303 0.466853
## - sex 1 15.706 9989.7 1509.3 0.7511 0.386550
## - extra.ica 1 20.611 9994.6 1509.6 0.9857 0.321291
## <none> 9974.0 1510.5
## - time.rand 1 58.108 10032.1 1511.4 2.7790 0.096166 .
## - hx.isch 1 58.350 10032.3 1511.4 2.7905 0.095478 .
## - location 1 66.143 10040.1 1511.8 3.1633 0.075949 .
## - iv.altep 1 82.193 10056.2 1512.6 3.9308 0.047981 *
## - ia.occlus 4 272.278 10246.2 1515.8 3.2554 0.011909 *
## - dm 1 217.126 10191.1 1519.2 10.3840 0.001358 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=1508.82
## nihss ~ sex + location + hx.isch + dm + sbp + iv.altep + aspects +
## ia.occlus + extra.ica + time.rand
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## - aspects 1 11.006 9990.4 1507.4 0.5272 0.468148
## - sbp 1 11.496 9990.8 1507.4 0.5507 0.458409
## - sex 1 16.361 9995.7 1507.6 0.7837 0.376467
## - extra.ica 1 21.741 10001.1 1507.9 1.0414 0.308014
## <none> 9979.3 1508.8
## - time.rand 1 57.432 10036.8 1509.6 2.7509 0.097853 .
## - hx.isch 1 60.104 10039.5 1509.8 2.8789 0.090397 .
## - location 1 66.295 10045.6 1510.1 3.1755 0.075387 .
## - iv.altep 1 84.415 10063.8 1511.0 4.0434 0.044905 *
## - ia.occlus 4 271.871 10251.2 1514.0 3.2556 0.011904 *
## - dm 1 220.320 10199.7 1517.6 10.5531 0.001242 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=1507.36
## nihss ~ sex + location + hx.isch + dm + sbp + iv.altep + ia.occlus +
## extra.ica + time.rand
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## - sbp 1 11.430 10001.8 1505.9 0.5480 0.459491
## - sex 1 16.628 10007.0 1506.2 0.7973 0.372363
## - extra.ica 1 22.248 10012.6 1506.5 1.0667 0.302216
## <none> 9990.4 1507.4
## - time.rand 1 58.122 10048.5 1508.2 2.7867 0.095700 .
## - hx.isch 1 61.240 10051.6 1508.4 2.9362 0.087261 .
## - location 1 64.747 10055.1 1508.5 3.1044 0.078719 .
## - iv.altep 1 84.670 10075.0 1509.5 4.0596 0.044479 *
## - ia.occlus 4 278.710 10269.1 1512.9 3.3408 0.010305 *
## - dm 1 219.625 10210.0 1516.1 10.5302 0.001257 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=1505.92
## nihss ~ sex + location + hx.isch + dm + iv.altep + ia.occlus +
## extra.ica + time.rand
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## - sex 1 16.161 10018 1504.7 0.7756 0.37894
## - extra.ica 1 22.673 10024 1505.0 1.0881 0.29742
## <none> 10002 1505.9
## - time.rand 1 60.148 10062 1506.9 2.8866 0.08997 .
## - hx.isch 1 61.484 10063 1506.9 2.9507 0.08649 .
## - location 1 63.226 10065 1507.0 3.0343 0.08216 .
## - iv.altep 1 83.294 10085 1508.0 3.9974 0.04613 *
## - ia.occlus 4 285.643 10287 1511.8 3.4271 0.00890 **
## - dm 1 220.891 10223 1514.7 10.6009 0.00121 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=1504.72
## nihss ~ location + hx.isch + dm + iv.altep + ia.occlus + extra.ica +
## time.rand
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## - extra.ica 1 22.227 10040 1503.8 1.0672 0.302093
## <none> 10018 1504.7
## - time.rand 1 56.356 10074 1505.5 2.7059 0.100633
## - hx.isch 1 60.952 10079 1505.7 2.9265 0.087780 .
## - location 1 62.790 10081 1505.8 3.0148 0.083148 .
## - iv.altep 1 83.980 10102 1506.8 4.0322 0.045198 *
## - ia.occlus 4 278.375 10296 1510.2 3.3415 0.010290 *
## - dm 1 218.748 10237 1513.3 10.5029 0.001274 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=1503.81
## nihss ~ location + hx.isch + dm + iv.altep + ia.occlus + time.rand
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 10040 1503.8
## - time.rand 1 55.501 10096 1504.5 2.6644 0.103267
## - hx.isch 1 62.573 10103 1504.9 3.0040 0.083700 .
## - location 1 64.555 10105 1505.0 3.0991 0.078969 .
## - iv.altep 1 81.591 10122 1505.8 3.9170 0.048370 *
## - ia.occlus 4 256.372 10296 1508.2 3.0769 0.016073 *
## - dm 1 218.140 10258 1512.4 10.4723 0.001295 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(bw)
##
## Call:
## lm(formula = nihss ~ location + hx.isch + dm + iv.altep + ia.occlus +
## time.rand, data = df1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.9289 -3.7957 -0.6397 3.6614 10.9729
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.960066 2.451532 5.694 2.15e-08 ***
## locationRight 0.731638 0.415601 1.760 0.0790 .
## hx.ischYes -1.149006 0.662940 -1.733 0.0837 .
## dmYes -2.035565 0.629020 -3.236 0.0013 **
## iv.altepYes 1.309981 0.661897 1.979 0.0484 *
## ia.occlusICA with M1 3.184185 2.324867 1.370 0.1714
## ia.occlusM1 4.238781 2.303610 1.840 0.0664 .
## ia.occlusM2 5.053979 2.409919 2.097 0.0365 *
## ia.occlusA1 or A2 7.903601 3.488460 2.266 0.0239 *
## time.rand -0.005289 0.003240 -1.632 0.1033
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.564 on 482 degrees of freedom
## Multiple R-squared: 0.06888, Adjusted R-squared: 0.0515
## F-statistic: 3.962 on 9 and 482 DF, p-value: 6.87e-05
full = lm( nihss ~ trt + age + sex + location + hx.isch + afib + dm +
mrankin + sbp + iv.altep + aspects + ia.occlus + extra.ica + time.rand,
data = df1)
bw = step (full, direction = "backward", test = "F", k = log(nrow(df1)))
## Start: AIC=1602.22
## nihss ~ trt + age + sex + location + hx.isch + afib + dm + mrankin +
## sbp + iv.altep + aspects + ia.occlus + extra.ica + time.rand
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## - mrankin 3 41.585 9969.0 1585.7 0.6590 0.577592
## - ia.occlus 4 264.650 10192.1 1590.4 3.1457 0.014336 *
## - afib 1 0.277 9927.7 1596.0 0.0132 0.908730
## - trt 1 5.168 9932.6 1596.3 0.2457 0.620341
## - age 1 6.363 9933.8 1596.3 0.3025 0.582568
## - sbp 1 10.256 9937.7 1596.5 0.4876 0.485341
## - sex 1 10.945 9938.4 1596.6 0.5204 0.471039
## - aspects 1 13.158 9940.6 1596.7 0.6256 0.429371
## - extra.ica 1 17.627 9945.1 1596.9 0.8381 0.360417
## - hx.isch 1 53.058 9980.5 1598.7 2.5226 0.112892
## - time.rand 1 64.485 9991.9 1599.2 3.0659 0.080599 .
## - iv.altep 1 71.467 9998.9 1599.5 3.3979 0.065907 .
## - location 1 77.022 10004.5 1599.8 3.6620 0.056270 .
## <none> 9927.4 1602.2
## - dm 1 202.440 10129.9 1606.0 9.6250 0.002035 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=1585.68
## nihss ~ trt + age + sex + location + hx.isch + afib + dm + sbp +
## iv.altep + aspects + ia.occlus + extra.ica + time.rand
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## - ia.occlus 4 274.299 10243.3 1574.2 3.2674 0.011673 *
## - afib 1 0.274 9969.3 1579.5 0.0131 0.909017
## - trt 1 4.709 9973.7 1579.7 0.2244 0.635957
## - age 1 5.790 9974.8 1579.8 0.2759 0.599664
## - sbp 1 10.377 9979.4 1580.0 0.4944 0.482308
## - aspects 1 11.874 9980.9 1580.1 0.5658 0.452311
## - sex 1 15.759 9984.8 1580.3 0.7509 0.386638
## - extra.ica 1 21.640 9990.7 1580.5 1.0311 0.310419
## - hx.isch 1 56.859 10025.9 1582.3 2.7092 0.100432
## - time.rand 1 60.799 10029.8 1582.5 2.8969 0.089404 .
## - location 1 67.993 10037.0 1582.8 3.2397 0.072509 .
## - iv.altep 1 79.208 10048.2 1583.4 3.7741 0.052643 .
## <none> 9969.0 1585.7
## - dm 1 217.646 10186.7 1590.1 10.3703 0.001368 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=1574.24
## nihss ~ trt + age + sex + location + hx.isch + afib + dm + sbp +
## iv.altep + aspects + extra.ica + time.rand
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## - extra.ica 1 0.125 10243 1568.0 0.0058 0.939100
## - afib 1 1.048 10244 1568.1 0.0490 0.824908
## - trt 1 1.930 10245 1568.1 0.0902 0.763993
## - age 1 5.020 10248 1568.3 0.2347 0.628265
## - sex 1 8.816 10252 1568.5 0.4123 0.521128
## - sbp 1 17.359 10261 1568.9 0.8117 0.368064
## - aspects 1 17.609 10261 1568.9 0.8234 0.364636
## - time.rand 1 56.710 10300 1570.8 2.6519 0.104084
## - hx.isch 1 65.754 10309 1571.2 3.0748 0.080153 .
## - iv.altep 1 71.385 10315 1571.5 3.3381 0.068314 .
## - location 1 79.166 10322 1571.8 3.7020 0.054941 .
## <none> 10243 1574.2
## - dm 1 195.434 10439 1577.3 9.1389 0.002636 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=1568.05
## nihss ~ trt + age + sex + location + hx.isch + afib + dm + sbp +
## iv.altep + aspects + time.rand
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## - afib 1 1.093 10244 1561.9 0.0512 0.821041
## - trt 1 1.873 10245 1561.9 0.0878 0.767164
## - age 1 5.127 10249 1562.1 0.2403 0.624239
## - sex 1 8.815 10252 1562.3 0.4131 0.520717
## - sbp 1 17.347 10261 1562.7 0.8128 0.367731
## - aspects 1 17.580 10261 1562.7 0.8238 0.364537
## - time.rand 1 56.621 10300 1564.6 2.6532 0.103997
## - hx.isch 1 65.810 10309 1565.0 3.0838 0.079713 .
## - iv.altep 1 71.288 10315 1565.3 3.3405 0.068214 .
## - location 1 79.199 10323 1565.6 3.7112 0.054638 .
## <none> 10243 1568.0
## - dm 1 195.571 10439 1571.2 9.1643 0.002601 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=1561.91
## nihss ~ trt + age + sex + location + hx.isch + dm + sbp + iv.altep +
## aspects + time.rand
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## - trt 1 1.814 10246 1555.8 0.0852 0.770504
## - age 1 5.439 10250 1556.0 0.2554 0.613547
## - sex 1 8.817 10253 1556.1 0.4140 0.520265
## - sbp 1 17.763 10262 1556.6 0.8340 0.361577
## - aspects 1 18.712 10263 1556.6 0.8785 0.349071
## - time.rand 1 56.170 10301 1558.4 2.6373 0.105036
## - hx.isch 1 65.647 10310 1558.8 3.0822 0.079788 .
## - iv.altep 1 72.519 10317 1559.2 3.4049 0.065618 .
## - location 1 79.075 10324 1559.5 3.7127 0.054588 .
## <none> 10244 1561.9
## - dm 1 194.712 10439 1565.0 9.1421 0.002631 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=1555.79
## nihss ~ age + sex + location + hx.isch + dm + sbp + iv.altep +
## aspects + time.rand
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## - age 1 5.086 10251 1549.8 0.2392 0.624987
## - sex 1 8.805 10255 1550.0 0.4142 0.520147
## - aspects 1 17.706 10264 1550.5 0.8329 0.361891
## - sbp 1 18.129 10264 1550.5 0.8528 0.356222
## - time.rand 1 54.857 10301 1552.2 2.5805 0.108840
## - hx.isch 1 66.750 10313 1552.8 3.1400 0.077025 .
## - iv.altep 1 74.005 10320 1553.1 3.4813 0.062674 .
## - location 1 77.930 10324 1553.3 3.6659 0.056128 .
## <none> 10246 1555.8
## - dm 1 194.711 10441 1558.9 9.1594 0.002607 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=1549.84
## nihss ~ sex + location + hx.isch + dm + sbp + iv.altep + aspects +
## time.rand
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## - sex 1 9.260 10261 1544.1 0.4363 0.509224
## - aspects 1 17.857 10269 1544.5 0.8414 0.359467
## - sbp 1 18.586 10270 1544.5 0.8757 0.349853
## - time.rand 1 54.002 10305 1546.2 2.5443 0.111345
## - hx.isch 1 68.518 10320 1546.9 3.2283 0.073002 .
## - iv.altep 1 76.143 10328 1547.3 3.5875 0.058813 .
## - location 1 78.021 10330 1547.4 3.6760 0.055790 .
## <none> 10251 1549.8
## - dm 1 197.913 10449 1553.0 9.3248 0.002386 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=1544.09
## nihss ~ location + hx.isch + dm + sbp + iv.altep + aspects +
## time.rand
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## - aspects 1 18.001 10279 1538.8 0.8491 0.357266
## - sbp 1 18.081 10279 1538.8 0.8529 0.356196
## - time.rand 1 51.384 10312 1540.3 2.4238 0.120160
## - hx.isch 1 68.007 10329 1541.1 3.2079 0.073909 .
## - iv.altep 1 76.670 10337 1541.5 3.6166 0.057801 .
## - location 1 77.555 10338 1541.6 3.6583 0.056381 .
## <none> 10261 1544.1
## - dm 1 196.582 10457 1547.2 9.2728 0.002453 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=1538.75
## nihss ~ location + hx.isch + dm + sbp + iv.altep + time.rand
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## - sbp 1 17.848 10296 1533.4 0.8421 0.359242
## - time.rand 1 52.239 10331 1535.0 2.4649 0.117066
## - hx.isch 1 69.956 10349 1535.9 3.3009 0.069860 .
## - location 1 75.541 10354 1536.2 3.5644 0.059628 .
## - iv.altep 1 77.177 10356 1536.2 3.6416 0.056943 .
## <none> 10279 1538.8
## - dm 1 195.785 10474 1541.8 9.2381 0.002498 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=1533.4
## nihss ~ location + hx.isch + dm + iv.altep + time.rand
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## - time.rand 1 54.285 10351 1529.8 2.5623 0.110091
## - hx.isch 1 70.254 10367 1530.5 3.3160 0.069222 .
## - location 1 73.922 10370 1530.7 3.4891 0.062375 .
## - iv.altep 1 75.248 10372 1530.8 3.5517 0.060080 .
## <none> 10296 1533.4
## - dm 1 196.360 10493 1536.5 9.2682 0.002458 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=1529.79
## nihss ~ location + hx.isch + dm + iv.altep
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## - hx.isch 1 61.291 10412 1526.5 2.8837 0.090118 .
## - iv.altep 1 77.127 10428 1527.2 3.6288 0.057377 .
## - location 1 83.207 10434 1527.5 3.9148 0.048424 *
## <none> 10351 1529.8
## - dm 1 199.377 10550 1533.0 9.3806 0.002314 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=1526.5
## nihss ~ location + dm + iv.altep
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## - iv.altep 1 74.245 10486 1523.8 3.4797 0.062725 .
## - location 1 76.286 10488 1523.9 3.5754 0.059233 .
## <none> 10412 1526.5
## - dm 1 205.253 10617 1529.9 9.6199 0.002036 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=1523.8
## nihss ~ location + dm
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## - location 1 76.769 10563 1521.2 3.5799 0.059073 .
## <none> 10486 1523.8
## - dm 1 228.424 10715 1528.2 10.6519 0.001177 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=1521.19
## nihss ~ dm
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 10563 1521.2
## - dm 1 219.81 10783 1525.1 10.197 0.001497 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(bw)
##
## Call:
## lm(formula = nihss ~ dm, data = df1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.2413 -4.2413 -0.2413 3.7587 9.7869
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 18.2413 0.2236 81.564 <2e-16 ***
## dmYes -2.0282 0.6352 -3.193 0.0015 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.643 on 490 degrees of freedom
## Multiple R-squared: 0.02039, Adjusted R-squared: 0.01839
## F-statistic: 10.2 on 1 and 490 DF, p-value: 0.001497
ols_step_backward_p (model = full, details = FALSE)
##
##
## Elimination Summary
## ---------------------------------------------------------------------------
## Variable Adj.
## Step Removed R-Square R-Square C(p) AIC RMSE
## ---------------------------------------------------------------------------
## 1 afib 0.0793 0.0443 8.0132 2914.5019 4.5814
## 2 trt 0.0788 0.0458 6.2572 2912.7562 4.5777
## 3 age 0.0783 0.0472 4.5385 2911.0491 4.5742
## 4 mrankin 0.0745 0.0494 4.4682 2907.0543 4.5692
## 5 aspects 0.0735 0.0503 2.9915 2905.5966 4.5669
## 6 sbp 0.0724 0.0512 1.5350 2904.1592 4.5648
## 7 sex 0.0709 0.0516 0.3033 2902.9535 4.5637
## 8 extra.ica 0.0689 0.0515 -0.6399 2902.0439 4.5640
## ---------------------------------------------------------------------------
Discuss
Different variables are chosen when using backward selection.
Build regression model from a set of candidate predictor variables by entering and removing predictors based on p values, in a stepwise manner until there is no variable left to enter or remove any more. ols_step_both_p (model = full, details = FALSE)
ols_step_both_p (model = full, details = FALSE)
##
## Stepwise Selection Summary
## ---------------------------------------------------------------------------------------
## Added/ Adj.
## Step Variable Removed R-Square R-Square C(p) AIC RMSE
## ---------------------------------------------------------------------------------------
## 1 dm addition 0.020 0.018 14.2240 2911.0257 4.6430
## 2 ia.occlus addition 0.045 0.035 3.6180 2906.5181 4.6032
## 3 iv.altep addition 0.053 0.041 1.7480 2904.6141 4.5897
## 4 location addition 0.059 0.045 0.5550 2903.3695 4.5793
## ---------------------------------------------------------------------------------------
Discuss
For both selection approach, only 4 variables are chosen.
Fits all regressions involving one regressor, two regressors, three regressors, and so on. It tests all possible subsets of the set of potential independent variables.
library(olsrr)
full = lm( nihss ~ dm + mrankin + sbp + ia.occlus + extra.ica,
data = df1)
full.mod = ols_step_all_possible(model = full)
full.mod
## Index N Predictors R-Square Adj. R-Square
## 4 1 1 ia.occlus 2.234128e-02 0.0143112271
## 1 2 1 dm 2.038541e-02 0.0183861934
## 2 3 1 mrankin 5.863365e-03 -0.0002481311
## 3 4 1 sbp 1.572324e-03 -0.0004652839
## 5 5 1 extra.ica 1.813095e-05 -0.0020226484
## 8 6 2 dm ia.occlus 4.497508e-02 0.0351497252
## 11 7 2 mrankin ia.occlus 2.797134e-02 0.0139130773
## 6 8 2 dm mrankin 2.492816e-02 0.0169193516
## 15 9 2 ia.occlus extra.ica 2.428961e-02 0.0142514334
## 13 10 2 sbp ia.occlus 2.343476e-02 0.0133877907
## 7 11 2 dm sbp 2.191510e-02 0.0179147544
## 9 12 2 dm extra.ica 2.039349e-02 0.0163869197
## 10 13 2 mrankin sbp 7.341047e-03 -0.0008122097
## 12 14 2 mrankin extra.ica 5.863624e-03 -0.0023017671
## 14 15 2 sbp extra.ica 1.591304e-03 -0.0024921675
## 17 16 3 dm mrankin ia.occlus 4.911097e-02 0.0333612509
## 21 17 3 dm ia.occlus extra.ica 4.701721e-02 0.0352277308
## 19 18 3 dm sbp ia.occlus 4.595280e-02 0.0341501550
## 24 19 3 mrankin ia.occlus extra.ica 2.955505e-02 0.0134814320
## 22 20 3 mrankin sbp ia.occlus 2.906237e-02 0.0129805867
## 16 21 3 dm mrankin sbp 2.636686e-02 0.0163500626
## 25 22 3 sbp ia.occlus extra.ica 2.534585e-02 0.0132882715
## 18 23 3 dm mrankin extra.ica 2.492816e-02 0.0148965521
## 20 24 3 dm sbp extra.ica 2.192376e-02 0.0159109923
## 23 25 3 mrankin sbp extra.ica 7.341365e-03 -0.0028711721
## 28 26 4 dm mrankin ia.occlus extra.ica 5.086244e-02 0.0331399500
## 26 27 4 dm mrankin sbp ia.occlus 5.008766e-02 0.0323507074
## 29 28 4 dm sbp ia.occlus extra.ica 4.795866e-02 0.0341894626
## 30 29 4 mrankin sbp ia.occlus extra.ica 3.060974e-02 0.0125090910
## 27 30 4 dm mrankin sbp extra.ica 2.636687e-02 0.0143219201
## 31 31 5 dm mrankin sbp ia.occlus extra.ica 5.180254e-02 0.0320894926
## Mallow's Cp
## 4 7.94505763
## 1 8.93722874
## 2 16.30394569
## 3 18.48069771
## 5 19.26910590
## 8 -1.53658119
## 11 7.08904795
## 6 8.63279106
## 15 8.95671314
## 13 9.39035895
## 7 10.16124777
## 9 10.93312817
## 10 17.55434970
## 12 18.30381405
## 14 20.47106950
## 17 -1.63462402
## 21 -0.57250650
## 19 -0.03255545
## 24 8.28566554
## 22 8.53559389
## 16 9.90296509
## 25 10.42090427
## 18 10.63279096
## 20 12.15685756
## 23 19.55418800
## 28 -0.52310673
## 26 -0.13007972
## 29 0.94991788
## 30 9.75064687
## 27 11.90296424
## 31 1.00000000
plot(full.mod)
Discuss
best.mod = ols_step_best_subset(full)
best.mod
## Best Subsets Regression
## -------------------------------------------------
## Model Index Predictors
## -------------------------------------------------
## 1 ia.occlus
## 2 dm ia.occlus
## 3 dm mrankin ia.occlus
## 4 dm mrankin ia.occlus extra.ica
## 5 dm mrankin sbp ia.occlus extra.ica
## -------------------------------------------------
##
## Subsets Regression Summary
## ----------------------------------------------------------------------------------------------------------------------------------------
## Adj. Pred
## Model R-Square R-Square R-Square C(p) AIC SBIC SBC MSEP FPE HSP APC
## ----------------------------------------------------------------------------------------------------------------------------------------
## 1 0.0223 0.0143 0.0021 7.9451 2916.0424 1513.7749 2941.2332 10585.0734 21.7349 0.0443 0.9856
## 2 0.0450 0.0351 0.0218 -1.5366 2906.5181 1504.3755 2935.9075 10361.2066 21.3185 0.0434 0.9667
## 3 0.0491 0.0334 0.0139 -1.6346 2910.3828 1504.3063 2952.3676 10337.5193 21.4011 0.0436 0.9665
## 4 0.0509 0.0331 0.0108 -0.5231 2911.4758 1505.4578 2957.6590 10339.7098 21.4492 0.0437 0.9686
## 5 0.0518 0.0321 0.008 1.0000 2912.9882 1507.0261 2963.3699 10350.7664 21.5157 0.0438 0.9716
## ----------------------------------------------------------------------------------------------------------------------------------------
## AIC: Akaike Information Criteria
## SBIC: Sawa's Bayesian Information Criteria
## SBC: Schwarz Bayesian Criteria
## MSEP: Estimated error of prediction, assuming multivariate normality
## FPE: Final Prediction Error
## HSP: Hocking's Sp
## APC: Amemiya Prediction Criteria
plot(best.mod)
Discuss
Conclusion
Both Bayesian Model Average: BMA package and
Criterion-Based method suggest similar best model with
2 variables dm + ia.occlus.
Therefore, these 2 approaches should be applied to select the
best variables.
Then, apply Frequentist or Bayesian approach for
further analysis:
Linear regression - Frequentist approach: https://rpubs.com/minhtri/968628
Linear regression - Bayesian approach: https://rpubs.com/minhtri/968636