# packages
library(stargazer)
##
## Please cite as:
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
# input data
data <- mtcars
str(data)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
\[ \text{mpg}_i = \beta_0 + \beta_1 \times \text{hp}_i + \beta_2 \times \text{wt}_i + \epsilon_i \]
where \( \text{mpg}_i \) is the fuel efficiency of car \( i \) in miles per gallon, \( \text{hp}_i \) is its gross horsepower, \( \text{wt}_i \) is its weight in 1,000 lbs, \( \beta_0 \) is the intercept, \( \beta_1 \) and \( \beta_2 \) are the slope coefficients, and \( \epsilon_i \) is the error term.
summary(mtcars[, c("mpg", "hp", "wt")])
## mpg hp wt
## Min. :10.40 Min. : 52.0 Min. :1.513
## 1st Qu.:15.43 1st Qu.: 96.5 1st Qu.:2.581
## Median :19.20 Median :123.0 Median :3.325
## Mean :20.09 Mean :146.7 Mean :3.217
## 3rd Qu.:22.80 3rd Qu.:180.0 3rd Qu.:3.610
## Max. :33.90 Max. :335.0 Max. :5.424
# Fit the model
model <- lm(mpg ~ hp + wt, data = mtcars)
stargazer(model, type = "text")
##
## ===============================================
## Dependent variable:
## ---------------------------
## mpg
## -----------------------------------------------
## hp -0.032***
## (0.009)
##
## wt -3.878***
## (0.633)
##
## Constant 37.227***
## (1.599)
##
## -----------------------------------------------
## Observations 32
## R2 0.827
## Adjusted R2 0.815
## Residual Std. Error 2.593 (df = 29)
## F Statistic 69.211*** (df = 2; 29)
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
hp: The coefficient for hp is -0.032, indicating that for each additional unit of horsepower, the miles per gallon (mpg) decreases by 0.032, holding the weight constant. This negative relationship suggests that more powerful engines (higher horsepower) tend to consume more fuel.
wt: The coefficient for wt is -3.878, meaning that for each additional 1,000 lbs of weight (the unit of wt in mtcars), mpg decreases by 3.878, holding horsepower constant. This negative relationship shows that heavier cars are less fuel-efficient.
Both coefficients have negative signs, which is consistent with expectations: more horsepower generally leads to higher fuel consumption, and heavier cars are typically less efficient. The coefficient on weight (-3.878) is much larger in magnitude than the coefficient on horsepower, but the two predictors are measured on different scales (1,000 lbs versus single units of horsepower), so the raw coefficients are not directly comparable; standardized coefficients would be needed to judge their relative importance.
Both hp and wt have three asterisks (***), indicating that they are statistically significant at the 0.01 level. This high level of significance suggests strong evidence against the null hypothesis that these coefficients are zero, affirming the impact of both horsepower and weight on fuel efficiency.
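One way to see this evidence more concretely is to inspect confidence intervals for the coefficients; a minimal sketch using base R's confint():
# 95% confidence intervals for the fitted coefficients;
# intervals that exclude zero are consistent with the
# significance stars reported by stargazer above
confint(model, level = 0.95)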
plot(model$fitted.values, resid(model), xlab = "Fitted Values", ylab = "Residuals",
main = "Residuals vs Fitted")
abline(h = 0, col = "red")
The residuals appear to be randomly distributed, indicating that the model’s assumptions of linearity and equal variance are reasonably satisfied.
qqnorm(resid(model))
qqline(resid(model), col = "red")
The points in the Q-Q plot largely follow the 45-degree line, indicating that the residuals are approximately normally distributed. There are some slight deviations at the tails, but these are not extreme.
plot(model$fitted.values, sqrt(abs(resid(model))), xlab = "Fitted Values", ylab = "Sqrt|Residuals|",
main = "Scale-Location")
abline(h = 0, col = "red")
The spread of residuals is fairly uniform across the range of fitted values, without any obvious pattern like a fan shape. This suggests that the variance of the residuals is constant across different values of the predictors, supporting the assumption of homoscedasticity.
plot(model, which = 5)
Most data points are within the Cook's distance lines, indicating that they are not unduly influential. However, a few points fall outside these bounds, notably the Maserati Bora, which has high leverage and a comparatively large Cook's distance.
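Influence can also be examined numerically rather than visually; a small sketch using base R's cooks.distance(), where the 4/n cutoff is one common rule of thumb rather than a fixed standard:
# Cook's distance for each observation
cd <- cooks.distance(model)
# flag observations exceeding the common 4/n rule of thumb
cd[cd > 4 / nrow(mtcars)]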
Linearity of the relationship: Each predictor variable should have a linear relationship with the outcome variable.
Independence of residuals: Residuals should not be correlated with each other; observations must be independent.
Homoscedasticity: The variability of residuals is nearly constant across different levels of the fitted values.
Normality of residuals: For small datasets in particular, residuals should be approximately normally distributed. (Formal tests for several of these assumptions are sketched below.)
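Beyond the visual diagnostics above, the last three assumptions can be probed with formal tests. A minimal sketch, assuming the lmtest package is installed (shapiro.test() is in base R):
# Normality of residuals (Shapiro-Wilk test)
shapiro.test(resid(model))
# Homoscedasticity (Breusch-Pagan test from the lmtest package)
lmtest::bptest(model)
# Independence of residuals (Durbin-Watson test, most relevant
# for data with a natural ordering, e.g., time series)
lmtest::dwtest(model)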
For OLS to be considered BLUE, it must satisfy the Gauss-Markov assumptions. BLUE stands for Best Linear Unbiased Estimator, and each term implies: Best, that the estimator has the smallest variance among all linear unbiased estimators; Linear, that it is a linear function of the observed values of the outcome; Unbiased, that its expected value equals the true parameter value; and Estimator, that it is a rule for computing the coefficients from the sample.
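In equation form, the key Gauss-Markov conditions on the error term of the model above can be written as:
\[ E[\epsilon_i] = 0, \qquad \operatorname{Var}(\epsilon_i) = \sigma^2, \qquad \operatorname{Cov}(\epsilon_i, \epsilon_j) = 0 \ \text{ for } i \neq j \]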
Non-linear Relationships: Many relationships between variables are not linear but can often be approximated by linear models after transformations. Taking the logarithm of one or more variables can help linearize relationships, making it possible to fit a linear model more effectively. For example, exponential growth processes (common in economic and biological contexts) can be modeled linearly if a logarithmic transformation is applied.
Variance Stabilization: Heteroscedasticity (non-constant variance of the error terms) is a common violation of OLS assumptions that can affect the efficiency and accuracy of the estimates. A log transformation often stabilizes variance, especially when the data span a wide range of values or when larger values tend to have larger variances. This stabilization enhances the model's validity by aligning it with the assumption of homoscedasticity.
Improved Interpretation: Transformations can also simplify the interpretation of the regression coefficients. In a log-log model, where both the dependent variable and a predictor are log-transformed, a coefficient is interpreted directly as an elasticity: the percentage change in the outcome associated with a one percent change in that predictor. This is particularly useful in financial and economic modeling, where elasticities are often the quantities of interest. A log-log version of the model above is sketched below.
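A minimal sketch of a log-log specification of the earlier model (the choice of predictors simply follows the example above; this is an illustration, not a recommended specification):
# Log-log specification: each coefficient is an approximate
# elasticity, i.e., the percent change in mpg associated with
# a one percent change in the corresponding predictor
log_model <- lm(log(mpg) ~ log(hp) + log(wt), data = mtcars)
stargazer(log_model, type = "text")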