1 + 1[1] 2
Quarto enables you to weave together content and executable code into a finished document. To learn more about Quarto see https://quarto.org.
When you click the Render button a document will be generated that includes both content and the output of embedded code. You can embed code like this:
1 + 1[1] 2
You can add options to executable code like this
[1] 4
The echo: false option disables the printing of code (only output is displayed).
# Load the dataset
data(mtcars)
# View the first few rows of the dataset
head(mtcars) mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
# Description of the mtcars dataset
# Dependent variable: mpg Miles per gallon (fuel efficiency)
# Independent variables: cyl Number of cylinders
# disp Displacement (in cubic inches)
# hp Horsepower
# drat Rear axle ratio
# wt Weight (in 1000 lbs)
# qsec 1/4 mile time (seconds)
# vs Engine type (0 = V-shaped, 1 = straight)
# am Transmission type (0 = automatic, 1 = manual)
# gear Number of forward gears
# carb Number of carburetors
# Correct model
# Run the full multivariate linear regression model
full_model <- lm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear +
carb, data = mtcars)
# View the summary of the model
summary(full_model)
Call:
lm(formula = mpg ~ cyl + disp + hp + drat + wt + qsec + vs +
am + gear + carb, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.4506 -1.6044 -0.1196 1.2193 4.6271
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.30337 18.71788 0.657 0.5181
cyl -0.11144 1.04502 -0.107 0.9161
disp 0.01334 0.01786 0.747 0.4635
hp -0.02148 0.02177 -0.987 0.3350
drat 0.78711 1.63537 0.481 0.6353
wt -3.71530 1.89441 -1.961 0.0633 .
qsec 0.82104 0.73084 1.123 0.2739
vs 0.31776 2.10451 0.151 0.8814
am 2.52023 2.05665 1.225 0.2340
gear 0.65541 1.49326 0.439 0.6652
carb -0.19942 0.82875 -0.241 0.8122
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.65 on 21 degrees of freedom
Multiple R-squared: 0.869, Adjusted R-squared: 0.8066
F-statistic: 13.93 on 10 and 21 DF, p-value: 3.793e-07
Estimating Equation \[ mpg_i = \beta_0 + \beta_1 cyl_i +\beta_2 disp_i + \beta_3 hp_i + \beta_4 drat_i + \beta_5 wt_i + \beta_6 qsec_i + \beta_7 vs_i + \beta_8 am_i + \beta_9 gear_i + \\beta_{10} carb_i + \epsilon_i \]
# Short / Incorrect model: omit the predictor variable "gear"
# Calculate the correlation between "mpg" and "gear"
correlation_result <- cor(mtcars$mpg, mtcars$gear)
# Print the correlation
print(correlation_result)[1] 0.4802848
# Test the statistical significance of the correlation
cor_test_result <- cor.test(mtcars$mpg, mtcars$gear)
# Print the result of the correlation test
print(cor_test_result)
Pearson's product-moment correlation
data: mtcars$mpg and mtcars$gear
t = 2.9992, df = 30, p-value = 0.005401
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.1580618 0.7100628
sample estimates:
cor
0.4802848
# The outcome variable "mpg" and the omitted predictor variable
# "gear" are correlated because the p-value = 0.005401 < alpha = 1%
# Calculate the correlation between "cyl" and "gear"
correlation_result_2 <- cor(mtcars$cyl, mtcars$gear)
# Print the correlation
print(correlation_result_2)[1] -0.4926866
# Test the statistical significance of the correlation
cor_test_result_2 <- cor.test(mtcars$cyl, mtcars$gear)
# Print the result of the correlation test
print(cor_test_result_2)
Pearson's product-moment correlation
data: mtcars$cyl and mtcars$gear
t = -3.1011, df = 30, p-value = 0.004173
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.7180260 -0.1738615
sample estimates:
cor
-0.4926866
# The included independent variable "cyl" and the omitted predictor variable
# "gear" are correlated because the p-value = 0.004173 < alpha = 1%
# The omitted variable will cause bias (i.e. the two conditions for OVB are met)
# because 1) The omitted variable "gear" is a determinant of the dependent
# variable "mpg" and 2) The omitted variable is correlated with at least one of
# the included independent variables.Estimating Equation \[ mpg_i = \beta_0 + \beta_1 cyl_i +\beta_2 disp_i + \beta_3 hp_i + \beta_4 drat_i + \beta_5 wt_i + \beta_6 qsec_i + \beta_7 vs_i + \beta_8 am_i + + \beta_9 carb_i + \epsilon_i \]
# Calculate correlations
correlations <- cor(mtcars)
# Print correlations involving "gear", "mpg", and other predictor variables
cor_gear_mpg <- correlations["gear", "mpg"]
cor_gear_other <- correlations["gear", ]
# Print results
print(paste("Correlation between gear and mpg:", round(cor_gear_mpg, 2)))[1] "Correlation between gear and mpg: 0.48"
print("Correlation of gear with other variables:")[1] "Correlation of gear with other variables:"
print(round(cor_gear_other, 2)) mpg cyl disp hp drat wt qsec vs am gear carb
0.48 -0.49 -0.56 -0.13 0.70 -0.58 -0.21 0.21 0.79 1.00 0.27
# Check OVB conditions
if (cor_gear_mpg > 0 && any(cor_gear_other[c("cyl", "disp", "hp", "drat", "wt",
"qsec", "vs", "am", "carb")] > 0))
{print("Omitting gear may cause positive OVB.")}[1] "Omitting gear may cause positive OVB."
if (cor_gear_mpg > 0 && any(cor_gear_other[c("cyl", "disp", "hp", "drat", "wt",
"qsec", "vs", "am", "carb")] < 0))
{print("Omitting gear may cause negative OVB.")}[1] "Omitting gear may cause negative OVB."
if (cor_gear_mpg < 0 && any(cor_gear_other[c("cyl", "disp", "hp", "drat", "wt",
"qsec", "vs", "am", "carb")] < 0))
{print("Omitting gear may cause positive OVB.")}
if (cor_gear_mpg < 0 && any(cor_gear_other[c("cyl", "disp", "hp", "drat", "wt",
"qsec", "vs", "am", "carb")] > 0))
{print("Omitting gear may cause negative OVB.")}# Calculate correlations
correlations <- cor(mtcars)
# Print correlations involving gear and mpg
cor_gear_mpg <- correlations["gear", "mpg"]
cor_gear_other <- correlations["gear", ]
# Print results
print(paste("Correlation between gear and mpg:", round(cor_gear_mpg, 2)))[1] "Correlation between gear and mpg: 0.48"
print("Correlation of gear with other variables:")[1] "Correlation of gear with other variables:"
print(round(cor_gear_other, 2)) mpg cyl disp hp drat wt qsec vs am gear carb
0.48 -0.49 -0.56 -0.13 0.70 -0.58 -0.21 0.21 0.79 1.00 0.27
# Identify potential OVB direction
if (cor_gear_mpg > 0 && any(cor_gear_other[c("cyl", "disp", "hp", "drat", "wt",
"qsec", "vs", "am", "carb")] < 0))
{
print("Positive OVB expected.")
} else if (cor_gear_mpg < 0 && any(cor_gear_other[c("cyl", "disp", "hp", "drat",
"wt", "qsec", "vs", "am",
"carb")] > 0))
{
print("Negative OVB expected.")
} else {
print("OVB direction is unclear based on correlations.")
}[1] "Positive OVB expected."
# Direction of the omitted variable bias
# Thus, the OVB of "gear" will be in positive direction.
# Compare the 2 regressions: full, correct model vs. omitted variable "gear"
# Load necessary packages
library(stargazer)
Please cite as:
Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
# Fit the full model (including "gear"), with all the independent variables
full_model <- lm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am
+ gear + carb, data = mtcars)
# Fit the model without the "gear" variable, one omitted variable
omitted_model <- lm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am
+ carb, data = mtcars)
# Check the 2 models
summary(full_model)
Call:
lm(formula = mpg ~ cyl + disp + hp + drat + wt + qsec + vs +
am + gear + carb, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.4506 -1.6044 -0.1196 1.2193 4.6271
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.30337 18.71788 0.657 0.5181
cyl -0.11144 1.04502 -0.107 0.9161
disp 0.01334 0.01786 0.747 0.4635
hp -0.02148 0.02177 -0.987 0.3350
drat 0.78711 1.63537 0.481 0.6353
wt -3.71530 1.89441 -1.961 0.0633 .
qsec 0.82104 0.73084 1.123 0.2739
vs 0.31776 2.10451 0.151 0.8814
am 2.52023 2.05665 1.225 0.2340
gear 0.65541 1.49326 0.439 0.6652
carb -0.19942 0.82875 -0.241 0.8122
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.65 on 21 degrees of freedom
Multiple R-squared: 0.869, Adjusted R-squared: 0.8066
F-statistic: 13.93 on 10 and 21 DF, p-value: 3.793e-07
summary(omitted_model)
Call:
lm(formula = mpg ~ cyl + disp + hp + drat + wt + qsec + vs +
am + carb, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.5183 -1.4339 -0.3922 1.2020 4.5924
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 15.64181 16.78606 0.932 0.3615
cyl -0.27315 0.95981 -0.285 0.7786
disp 0.01395 0.01747 0.798 0.4332
hp -0.02063 0.02128 -0.969 0.3429
drat 0.84089 1.60057 0.525 0.6046
wt -3.86609 1.82850 -2.114 0.0461 *
qsec 0.79507 0.71495 1.112 0.2781
vs 0.35800 2.06357 0.173 0.8639
am 2.80345 1.91663 1.463 0.1577
carb -0.04506 0.73653 -0.061 0.9518
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.601 on 22 degrees of freedom
Multiple R-squared: 0.8678, Adjusted R-squared: 0.8137
F-statistic: 16.05 on 9 and 22 DF, p-value: 9.885e-08
dim(mtcars)[1] 32 11
colSums(is.na(mtcars)) mpg cyl disp hp drat wt qsec vs am gear carb
0 0 0 0 0 0 0 0 0 0 0
# Use stargazer to show the regression results side by side
stargazer(full_model, omitted_model,
title = "Regression Results: Full Model vs. Omitted Gear Model",
type = "text",
column.labels = c("Full Model", "Omitted Gear Model"),
dep.var.labels = "Miles per Gallon (mpg)",
model.numbers = FALSE,
single.row = TRUE,
digits = 3)
Regression Results: Full Model vs. Omitted Gear Model
==================================================================
Dependent variable:
----------------------------------------------
Miles per Gallon (mpg)
Full Model Omitted Gear Model
------------------------------------------------------------------
cyl -0.111 (1.045) -0.273 (0.960)
disp 0.013 (0.018) 0.014 (0.017)
hp -0.021 (0.022) -0.021 (0.021)
drat 0.787 (1.635) 0.841 (1.601)
wt -3.715* (1.894) -3.866** (1.828)
qsec 0.821 (0.731) 0.795 (0.715)
vs 0.318 (2.105) 0.358 (2.064)
am 2.520 (2.057) 2.803 (1.917)
gear 0.655 (1.493)
carb -0.199 (0.829) -0.045 (0.737)
Constant 12.303 (18.718) 15.642 (16.786)
------------------------------------------------------------------
Observations 32 32
R2 0.869 0.868
Adjusted R2 0.807 0.814
Residual Std. Error 2.650 (df = 21) 2.601 (df = 22)
F Statistic 13.932*** (df = 10; 21) 16.048*** (df = 9; 22)
==================================================================
Note: *p<0.1; **p<0.05; ***p<0.01
# The direction of the OVB is confirmed as it is predicted in the formula.