Previously on STAT 412:
In any experiment, there are two main variables:
The independent variable: the variable that an experimenter changes or controls so that they can observe the effects on the dependent variable.
The dependent variable: the variable being measured in an experiment that is “dependent” on the independent variable.
Researchers are often interested in understanding how changes in the independent variable affect the dependent variable. However, sometimes there is a third variable that is not accounted for that can affect the relationship between the two variables under study.
Such a variable is known as a confounding variable: it can distort the results of a study and make it appear that a cause-and-effect relationship exists between two variables when it actually does not.
For example, suppose a researcher collects data on ice cream sales and shark attacks and finds that the two variables are highly correlated. Does this mean that increased ice cream sales cause more shark attacks? That’s unlikely. The more likely cause is the confounding variable temperature. When it is warmer outside, more people buy ice cream and more people go in the ocean.
Additional confounding examples:
Smoking and Lung Cancer: In a study investigating the link between smoking and lung cancer, age can be a confounding variable, since age is associated both with smoking history and with the risk of developing lung cancer.
Education and Income: In a study examining the relationship between education level and income, occupation and years of experience could be confounding variables, because certain jobs pay more regardless of education.
Requirements for Confounding:
To be a confounder, a variable must (1) be associated with the independent variable, (2) affect or be associated with the dependent variable, and (3) not lie on the causal pathway between the independent and dependent variables.
Possible Problems of Confounding:
Confounding can bias the estimated association in either direction: it can create a spurious association where none exists, exaggerate a real association, or mask a true effect.
Remedies for Confounding:
Strategies to reduce confounding are:
randomization (aim is random distribution of confounders between study groups)
restriction (restrict entry to study of individuals with confounding factors - risks bias in itself)
matching (of individuals or groups, aim for equal distribution of confounders)
stratification (confounders are distributed evenly within each stratum)
adjustment (usually distorted by choice of standard)
multivariate analysis (only works if you can identify and measure the confounders; see the sketch below)
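To make the last strategy concrete, here is a minimal simulated sketch (all numbers are invented) of the ice cream/shark attack example above: once the confounder is included in a multivariate model, the spurious association disappears.
# Simulated illustration (hypothetical data): temperature drives both
# ice cream sales and shark attacks, creating a spurious correlation.
set.seed(412)
n = 200
temperature = rnorm(n, mean = 25, sd = 5)          # the confounder
ice_cream = 10 + 2*temperature + rnorm(n, sd = 5)  # depends on temperature only
attacks = -5 + 0.5*temperature + rnorm(n, sd = 2)  # depends on temperature only
summary(lm(attacks ~ ice_cream))                # naive model: ice_cream looks significant
summary(lm(attacks ~ ice_cream + temperature))  # adjusted model: the ice_cream effect vanishes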
Let’s continue with the application.
Apply pre-processing steps:
library(readr)
books = read_delim("amazon-books.txt", delim = "\t",
                   escape_double = FALSE, trim_ws = TRUE)
## Rows: 325 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (5): Title, Author, Hard/ Paper, Publisher, ISBN-10
## dbl (8): List Price, Amazon Price, NumPages, Pub year, Height, Width, Thick,...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
books = books[, c(3,5,6,10,11,12,13)] # keep List Price, Hard/ Paper, and the physical dimensions; drop Title, Author, Publisher, ISBN-10, Amazon Price, and Pub year
head(books)
| List Price | Hard/ Paper | NumPages | Height | Width | Thick | Weight (oz) |
|---|---|---|---|---|---|---|
| 12.95 | P | 304 | 7.8 | 5.5 | 0.8 | 11.2 |
| 15.00 | P | 273 | 8.4 | 5.5 | 0.7 | 7.2 |
| 1.50 | P | 96 | 8.3 | 5.2 | 0.3 | 4.0 |
| 15.99 | P | 672 | 8.8 | 6.0 | 1.6 | 28.8 |
| 30.50 | P | 720 | 8.0 | 5.2 | 1.4 | 22.4 |
| 28.95 | H | 460 | 8.9 | 6.3 | 1.7 | 32.0 |
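The chunk that produced the next few outputs is not echoed. Presumably it checked the dimensions and class of the data and converted the tibble to a plain data frame; the as.data.frame() step is inferred from the 'data.frame' shown by str() and the column-wise assignments below.
dim(books)   # 325 rows, 7 columns remain
class(books) # read_delim() returns a tibble
books = as.data.frame(books) # convert to a plain data frame (inferred step)
str(books)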
## [1] 325 7
## [1] "tbl_df" "tbl" "data.frame"
## 'data.frame': 325 obs. of 7 variables:
## $ List Price : num 12.9 15 1.5 16 30.5 ...
## $ Hard/ Paper: chr "P" "P" "P" "P" ...
## $ NumPages : num 304 273 96 672 720 460 336 405 NA 304 ...
## $ Height : num 7.8 8.4 8.3 8.8 8 8.9 7.8 8.2 9.6 9.6 ...
## $ Width : num 5.5 5.5 5.2 6 5.2 6.3 5.3 5.3 6.5 6.4 ...
## $ Thick : num 0.8 0.7 0.3 1.6 1.4 1.7 1.2 0.8 2.1 1.1 ...
## $ Weight (oz): num 11.2 7.2 4 28.8 22.4 32 15.5 11.2 NA 19.2 ...
colnames(books)[c(1,2,3,7)] = c("Price","Cover","Pages","Weight") # shorter, syntactic column names
books[,2] = as.factor(books[,2]) # Cover ("H"/"P") as a factor
str(books)
## 'data.frame': 325 obs. of 7 variables:
## $ Price : num 12.9 15 1.5 16 30.5 ...
## $ Cover : Factor w/ 2 levels "H","P": 2 2 2 2 2 1 1 2 1 1 ...
## $ Pages : num 304 273 96 672 720 460 336 405 NA 304 ...
## $ Height: num 7.8 8.4 8.3 8.8 8 8.9 7.8 8.2 9.6 9.6 ...
## $ Width : num 5.5 5.5 5.2 6 5.2 6.3 5.3 5.3 6.5 6.4 ...
## $ Thick : num 0.8 0.7 0.3 1.6 1.4 1.7 1.2 0.8 2.1 1.1 ...
## $ Weight: num 11.2 7.2 4 28.8 22.4 32 15.5 11.2 NA 19.2 ...
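The data contain missing values (note the NA entries in Pages and Weight above). The two outputs below presumably come from counting and then removing them, along these lines; the 310 residual degrees of freedom in the regressions below imply 312 complete rows remain.
sum(is.na(books))      # 22 missing cells in total
books = na.omit(books) # drop the incomplete rows
sum(is.na(books))      # 0 after removal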
## [1] 22
## [1] 0
Implementation of Simple Linear Regression
Now, I would like to build a simple linear regression model with certain covariates, where the dependent variable is the price of a book.
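The five summaries below come from simple lm() fits whose code is not echoed. Judging from the later references to fit4 and fit4.4, the calls were along these lines; the names fit1 through fit3 are my guesses.
fit1 = lm(Price ~ Height, data = books)           # name assumed
fit2 = lm(Price ~ Weight, data = books)           # name assumed
fit3 = lm(Price ~ Pages, data = books)            # name assumed
fit4 = lm(Price ~ Width, data = books)            # name used later in the text
fit4.4 = lm(Price ~ Width + Height, data = books) # name used later in the text
summary(fit1) # ...and likewise for the other fits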
##
## Call:
## lm(formula = Price ~ Height, data = books)
##
## Residuals:
## Min 1Q Median 3Q Max
## -28.009 -3.918 -1.582 1.424 104.567
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -39.6362 6.2959 -6.296 1.05e-09 ***
## Height 7.0773 0.7698 9.193 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.76 on 310 degrees of freedom
## Multiple R-squared: 0.2142, Adjusted R-squared: 0.2117
## F-statistic: 84.52 on 1 and 310 DF, p-value: < 2.2e-16
Model significance: OK.
Covariate significance: OK.
Adj R-squared: 0.21
##
## Call:
## lm(formula = Price ~ Weight, data = books)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.604 -4.765 -1.310 1.189 114.129
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.0661 1.4730 5.476 9.01e-08 ***
## Weight 0.7926 0.1047 7.571 4.29e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.19 on 310 degrees of freedom
## Multiple R-squared: 0.156, Adjusted R-squared: 0.1533
## F-statistic: 57.32 on 1 and 310 DF, p-value: 4.287e-13
Model significance: OK.
Covariate significance: OK.
Adj R-squared: 0.15
##
## Call:
## lm(formula = Price ~ Pages, data = books)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.788 -4.894 -2.621 0.179 124.341
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.44876 1.71755 7.830 7.79e-14 ***
## Pages 0.01350 0.00468 2.885 0.00419 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.09 on 310 degrees of freedom
## Multiple R-squared: 0.02615, Adjusted R-squared: 0.02301
## F-statistic: 8.323 on 1 and 310 DF, p-value: 0.004188
Model significance: OK.
Covariate significance: OK.
Adj R-squared: 0.023
##
## Call:
## lm(formula = Price ~ Width, data = books)
##
## Residuals:
## Min 1Q Median 3Q Max
## -38.616 -2.894 -0.364 1.525 102.471
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -23.1055 4.4245 -5.222 3.25e-07 ***
## Width 7.3883 0.7878 9.378 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.71 on 310 degrees of freedom
## Multiple R-squared: 0.221, Adjusted R-squared: 0.2185
## F-statistic: 87.95 on 1 and 310 DF, p-value: < 2.2e-16
Model significance: OK.
Covariate significance: OK.
Adj R-squared: 0.22
##
## Call:
## lm(formula = Price ~ Width + Height, data = books)
##
## Residuals:
## Min 1Q Median 3Q Max
## -36.836 -3.059 -0.579 1.905 97.170
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -48.0528 6.1573 -7.804 9.33e-14 ***
## Width 5.0312 0.8631 5.829 1.40e-08 ***
## Height 4.6771 0.8398 5.569 5.56e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.18 on 309 degrees of freedom
## Multiple R-squared: 0.2921, Adjusted R-squared: 0.2875
## F-statistic: 63.74 on 2 and 309 DF, p-value: < 2.2e-16
Model significance: OK.
Covariate significance: OK.
Adj R-squared: 0.29
Let’s compare fit4 and fit4.4. The difference between these two models is the additional variable Height. The coefficient of Width is 7.39 in the first model, while it is 5.03 in the second. If the percentage change in the coefficient exceeds 10%, we can say that the additional variable Height is a confounding variable. In regression analysis, you can include confounding variables as covariates in the model. This helps to isolate the effect of the independent variable of interest while holding the confounders constant.
Let’s calculate the percentage change:
Percentage_Change = (fit4$coefficients[2] - fit4.4$coefficients[2])/fit4$coefficients[2]*100
Percentage_Change
## Width
## 31.90374
Since the percentage change is 31.90%, which is greater than 10%, the association between book price and width is confounded by height. Also, adding Height to the model increases the R-squared from 0.22 to 0.29, so the new variable explains approximately 7 additional percentage points of the variance in book price.
Such a change could also arise from a collinearity problem, as we saw in the last recitation. Let us check whether collinearity is the reason here by calculating the variance inflation factor (VIF).
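The output below presumably comes from car::vif(); the package startup message confirms that library(car) was called.
library(car) # prints the startup message shown below
vif(fit4.4)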
## Loading required package: carData
## Width Height
## 1.316541 1.316541
Both VIF values are small (about 1.3), so there is no collinearity problem; the change in the Width coefficient is due to confounding rather than collinearity.
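The next two summaries presumably come from fits named fit5 and fit5.5, the names used further below:
fit5 = lm(Price ~ Thick, data = books)            # name used below
fit5.5 = lm(Price ~ Thick + Weight, data = books) # name used below
summary(fit5) # the summary of fit5.5 follows further below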
##
## Call:
## lm(formula = Price ~ Thick, data = books)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.398 -4.184 -2.562 0.213 123.566
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14.420 2.023 7.129 7.14e-12 ***
## Thick 3.928 2.110 1.861 0.0637 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.19 on 310 degrees of freedom
## Multiple R-squared: 0.01105, Adjusted R-squared: 0.007858
## F-statistic: 3.463 on 1 and 310 DF, p-value: 0.06369
Model significance: Not OK.
Covariate significance: Not OK at the 5% significance level.
Adj R-squared: 0.008
##
## Call:
## lm(formula = Price ~ Thick + Weight, data = books)
##
## Residuals:
## Min 1Q Median 3Q Max
## -23.195 -4.468 -1.118 1.588 105.500
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.0477 1.8220 7.161 5.88e-12 ***
## Thick -11.4235 2.5865 -4.417 1.39e-05 ***
## Weight 1.2104 0.1389 8.714 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.84 on 309 degrees of freedom
## Multiple R-squared: 0.2062, Adjusted R-squared: 0.201
## F-statistic: 40.12 on 2 and 309 DF, p-value: 3.232e-16
Model significance: OK.
Covariate significance: OK.
Adj R-squared: 0.20
Let’s examine the confounding variable effect:
Percentage_Change = (fit5$coefficients[2] - fit5.5$coefficients[2])/fit5$coefficients[2]*100
Percentage_Change
## Thick
## 390.8552
The change is far greater than 10%; the coefficient of Thick even flips sign, from 3.93 to -11.42. Therefore, we can say that Weight is a confounding variable in the relationship between price and thickness.
Let’s check whether collinearity exists or not.
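Assuming the same object names as above, the VIF values below come from:
vif(fit5.5)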
## Thick Weight
## 1.865117 1.865117
No collinearity problem.
In regression, when the influence of an independent variable on the dependent variable varies with the values of other independent variables, we say that there is an interaction effect.
For instance, the probability that a student drops out of school can depend on the number of years of education completed so far. At the same time, the effect of the number of years of education on the probability of dropout can change significantly based on the gender of the student. In such cases, we say that there is an interaction effect of these two variables (years of education completed and gender) on the chances of dropout.
Data Dictionary (inferred from the variable names and the analysis below; exact units are not stated in the source):
Beds: number of beds in the nursing facility
AllPatientDays: total patient-days
PatientRevenue: revenue from patients
NurseSalaries: nurse salaries (the response variable)
FacilitiesExpend: facilities expenditure
Rural: location indicator (1 = rural, 0 = non-rural)
library(readr)
Nursing = read_delim("Nursing2.csv", delim = ";",
                     escape_double = FALSE, trim_ws = TRUE)
## Rows: 52 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ";"
## dbl (6): Beds, AllPatientDays, PatientRevenue, NurseSalaries, FacilitiesExpe...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Preprocessing steps:
## spc_tbl_ [52 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Beds : num [1:52] 244 59 120 120 120 65 120 90 96 120 ...
## $ AllPatientDays : num [1:52] 385 203 392 419 363 234 372 305 169 188 ...
## $ PatientRevenue : num [1:52] 23521 9160 21900 22354 17421 ...
## $ NurseSalaries : num [1:52] 5230 2459 6304 6590 5362 ...
## $ FacilitiesExpend: num [1:52] 5334 493 6115 6346 6225 ...
## $ Rural : num [1:52] 0 1 0 0 0 1 1 1 0 1 ...
## - attr(*, "spec")=
## .. cols(
## .. Beds = col_double(),
## .. AllPatientDays = col_double(),
## .. PatientRevenue = col_double(),
## .. NurseSalaries = col_double(),
## .. FacilitiesExpend = col_double(),
## .. Rural = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
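The model summaries below report a Rural1 coefficient, which means Rural was treated as a factor. The conversion chunk is not echoed; presumably it was simply:
Nursing$Rural = as.factor(Nursing$Rural) # 0/1 indicator as a factor (level "1" = rural)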
Build the multiple linear regression:
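The fitting chunk is not echoed; here is a minimal sketch consistent with the output below, where the name m1 is my assumption (only m3.3.3 is named explicitly later):
m1 = lm(NurseSalaries ~ ., data = Nursing) # name assumed; regress salaries on all other variables
summary(m1)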
##
## Call:
## lm(formula = NurseSalaries ~ ., data = Nursing)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4029.5 -615.4 -77.7 853.7 2253.3
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.749e+03 6.052e+02 4.542 4.01e-05 ***
## Beds -7.046e+00 8.580e+00 -0.821 0.41576
## AllPatientDays 1.068e+00 3.586e+00 0.298 0.76710
## PatientRevenue 1.079e-01 6.618e-02 1.630 0.10984
## FacilitiesExpend 2.318e-01 1.048e-01 2.212 0.03199 *
## Rural1 -1.180e+03 3.810e+02 -3.098 0.00332 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1234 on 46 degrees of freedom
## Multiple R-squared: 0.5012, Adjusted R-squared: 0.447
## F-statistic: 9.244 on 5 and 46 DF, p-value: 3.824e-06
Model significance: OK.
Covariate significance: Not OK for all covariates.
Adj R-squared: 0.45
Let’s check the multicollinearity.
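Assuming the model object is named m1 as above, the VIF values below come from:
vif(m1) # car::vif(), loaded earlier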
## Beds AllPatientDays PatientRevenue FacilitiesExpend
## 4.114498 6.290267 7.131940 1.397288
## Rural
## 1.121779
These values seem reasonable: the model does not suffer from a multicollinearity problem, since all VIFs are smaller than 10.
What should be the next step?
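Beds has the largest p-value (0.416), so it is the natural first variable to drop. A sketch of the non-echoed call, with the name m2 assumed:
m2 = lm(NurseSalaries ~ . - Beds, data = Nursing) # name assumed
summary(m2)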
##
## Call:
## lm(formula = NurseSalaries ~ . - Beds, data = Nursing)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4144.1 -702.7 -67.1 830.2 2304.1
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.610e+03 5.791e+02 4.507 4.35e-05 ***
## AllPatientDays 1.746e-01 3.405e+00 0.051 0.95933
## PatientRevenue 9.095e-02 6.266e-02 1.452 0.15327
## FacilitiesExpend 2.088e-01 1.006e-01 2.075 0.04352 *
## Rural1 -1.122e+03 3.729e+02 -3.007 0.00422 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1230 on 47 degrees of freedom
## Multiple R-squared: 0.4939, Adjusted R-squared: 0.4508
## F-statistic: 11.47 on 4 and 47 DF, p-value: 1.414e-06
We can exclude the variable Beds. However, some variables are still statistically insignificant, so let's continue dropping them.
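AllPatientDays now has the largest p-value (0.959), so drop it next; a sketch with an assumed name:
m2.2 = lm(NurseSalaries ~ . - Beds - AllPatientDays, data = Nursing) # name assumed
summary(m2.2)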
##
## Call:
## lm(formula = NurseSalaries ~ . - Beds - AllPatientDays, data = Nursing)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4120.3 -708.2 -71.7 833.6 2306.6
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.622e+03 5.266e+02 4.978 8.68e-06 ***
## PatientRevenue 9.382e-02 2.799e-02 3.352 0.00157 **
## FacilitiesExpend 2.076e-01 9.704e-02 2.140 0.03749 *
## Rural1 -1.122e+03 3.690e+02 -3.041 0.00382 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1217 on 48 degrees of freedom
## Multiple R-squared: 0.4939, Adjusted R-squared: 0.4622
## F-statistic: 15.61 on 3 and 48 DF, p-value: 3.216e-07
Model significance: OK.
Covariate significance: OK.
Adj R-squared: 0.46
Residual standard error: 1217
After excluding Beds and AllPatientDays, all remaining variables are statistically significant.
What if we include interactions in the model instead of excluding the insignificant variables?
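A sketch of the interaction model whose summary follows, with an assumed name:
m3 = lm(NurseSalaries ~ . + Rural*AllPatientDays, data = Nursing) # name assumed
summary(m3)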
##
## Call:
## lm(formula = NurseSalaries ~ . + Rural * AllPatientDays, data = Nursing)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2758.90 -668.30 -30.97 761.48 2428.20
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -589.55183 937.17138 -0.629 0.53248
## Beds -6.63375 7.31950 -0.906 0.36960
## AllPatientDays 14.38588 4.36943 3.292 0.00194 **
## PatientRevenue 0.05610 0.05774 0.972 0.33643
## FacilitiesExpend 0.19223 0.08988 2.139 0.03791 *
## Rural1 2959.22115 1022.75883 2.893 0.00586 **
## AllPatientDays:Rural1 -13.56981 3.17896 -4.269 0.00010 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1053 on 45 degrees of freedom
## Multiple R-squared: 0.645, Adjusted R-squared: 0.5976
## F-statistic: 13.62 on 6 and 45 DF, p-value: 9.548e-09
Model significance: OK.
Covariate significance: Not OK for all covariates (Beds and PatientRevenue are insignificant).
Adj R-squared: 0.60
Add one more interaction:
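A sketch of the corresponding call, again with an assumed name:
m3.3 = lm(NurseSalaries ~ . + Rural*AllPatientDays + Beds*AllPatientDays, data = Nursing) # name assumed
summary(m3.3)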
##
## Call:
## lm(formula = NurseSalaries ~ . + Rural * AllPatientDays + Beds *
## AllPatientDays, data = Nursing)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2308.2 -583.3 -106.2 565.1 1778.3
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.073e+03 9.398e+02 -3.269 0.00210 **
## Beds 2.210e+01 8.631e+00 2.561 0.01395 *
## AllPatientDays 2.316e+01 4.073e+00 5.687 9.70e-07 ***
## PatientRevenue 4.524e-02 4.780e-02 0.946 0.34912
## FacilitiesExpend 1.054e-01 7.661e-02 1.376 0.17582
## Rural1 2.734e+03 8.471e+02 3.227 0.00236 **
## AllPatientDays:Rural1 -1.178e+01 2.657e+00 -4.432 6.12e-05 ***
## Beds:AllPatientDays -8.010e-02 1.715e-02 -4.670 2.84e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 870.5 on 44 degrees of freedom
## Multiple R-squared: 0.7626, Adjusted R-squared: 0.7248
## F-statistic: 20.19 on 7 and 44 DF, p-value: 7.9e-12
Model significance: OK.
Covariate significance: Not OK for all covariates (PatientRevenue and FacilitiesExpend are insignificant).
Adj R-squared: 0.73
Now, exclude the insignificant variables.
m3.3.3 = lm(NurseSalaries ~ . + Rural*AllPatientDays + Beds*AllPatientDays
            - PatientRevenue - FacilitiesExpend, data = Nursing)
summary(m3.3.3)
##
## Call:
## lm(formula = NurseSalaries ~ . + Rural * AllPatientDays + Beds *
## AllPatientDays - PatientRevenue - FacilitiesExpend, data = Nursing)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2317.4 -475.2 -112.8 415.6 1944.1
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.572e+03 9.095e+02 -3.927 0.000286 ***
## Beds 3.004e+01 7.653e+00 3.925 0.000288 ***
## AllPatientDays 2.633e+01 3.301e+00 7.977 3.20e-10 ***
## Rural1 3.073e+03 8.377e+02 3.669 0.000631 ***
## AllPatientDays:Rural1 -1.276e+01 2.629e+00 -4.854 1.43e-05 ***
## Beds:AllPatientDays -8.813e-02 1.682e-02 -5.241 3.90e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 884.9 on 46 degrees of freedom
## Multiple R-squared: 0.7435, Adjusted R-squared: 0.7157
## F-statistic: 26.67 on 5 and 46 DF, p-value: 1.504e-12
Model significance: OK.
Covariate significance: OK.
Adj R-squared: 0.72
Residual standard error: ~885
Hence, the interaction terms improve the performance of the model: relative to the model without interactions, the adjusted R-squared rises from 0.46 to 0.72 and the residual standard error drops from 1217 to about 885.