Haiding Luo
2023-12-11
library(stargazer)
##
## Please cite as:
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
library(ggplot2)
data <- mtcars
stargazer(data, type = "text")
##
## ============================================
## Statistic N Mean St. Dev. Min Max
## --------------------------------------------
## mpg 32 20.091 6.027 10.400 33.900
## cyl 32 6.188 1.786 4 8
## disp 32 230.722 123.939 71.100 472.000
## hp 32 146.688 68.563 52 335
## drat 32 3.597 0.535 2.760 4.930
## wt 32 3.217 0.978 1.513 5.424
## qsec 32 17.849 1.787 14.500 22.900
## vs 32 0.438 0.504 0 1
## am 32 0.406 0.499 0 1
## gear 32 3.688 0.738 3 5
## carb 32 2.812 1.615 1 8
## --------------------------------------------
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
data <- mtcars[, c("mpg", "qsec", "wt")]
data
## mpg qsec wt
## Mazda RX4 21.0 16.46 2.620
## Mazda RX4 Wag 21.0 17.02 2.875
## Datsun 710 22.8 18.61 2.320
## Hornet 4 Drive 21.4 19.44 3.215
## Hornet Sportabout 18.7 17.02 3.440
## Valiant 18.1 20.22 3.460
## Duster 360 14.3 15.84 3.570
## Merc 240D 24.4 20.00 3.190
## Merc 230 22.8 22.90 3.150
## Merc 280 19.2 18.30 3.440
## Merc 280C 17.8 18.90 3.440
## Merc 450SE 16.4 17.40 4.070
## Merc 450SL 17.3 17.60 3.730
## Merc 450SLC 15.2 18.00 3.780
## Cadillac Fleetwood 10.4 17.98 5.250
## Lincoln Continental 10.4 17.82 5.424
## Chrysler Imperial 14.7 17.42 5.345
## Fiat 128 32.4 19.47 2.200
## Honda Civic 30.4 18.52 1.615
## Toyota Corolla 33.9 19.90 1.835
## Toyota Corona 21.5 20.01 2.465
## Dodge Challenger 15.5 16.87 3.520
## AMC Javelin 15.2 17.30 3.435
## Camaro Z28 13.3 15.41 3.840
## Pontiac Firebird 19.2 17.05 3.845
## Fiat X1-9 27.3 18.90 1.935
## Porsche 914-2 26.0 16.70 2.140
## Lotus Europa 30.4 16.90 1.513
## Ford Pantera L 15.8 14.50 3.170
## Ferrari Dino 19.7 15.50 2.770
## Maserati Bora 15.0 14.60 3.570
## Volvo 142E 21.4 18.60 2.780
pairs(data, pch = 18, col = "steelblue")
model <- lm(mpg ~ qsec + wt, data = mtcars)
summary(model)
##
## Call:
## lm(formula = mpg ~ qsec + wt, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.3962 -2.1431 -0.2129 1.4915 5.7486
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 19.7462 5.2521 3.760 0.000765 ***
## qsec 0.9292 0.2650 3.506 0.001500 **
## wt -5.0480 0.4840 -10.430 2.52e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.596 on 29 degrees of freedom
## Multiple R-squared: 0.8264, Adjusted R-squared: 0.8144
## F-statistic: 69.03 on 2 and 29 DF, p-value: 9.395e-12
The coefficient for qsec is 0.9292: holding wt constant, a 1-second increase in quarter-mile time is associated with an increase of about 0.93 miles per gallon in predicted mpg. Vehicles with slower acceleration (those that need more time to cover a quarter mile) therefore tend to have better fuel efficiency.
The coefficient for wt is -5.0480: holding qsec constant, each additional 1,000 lb of weight (the unit of wt in mtcars) is associated with a decrease of about 5.05 in predicted mpg. Heavier vehicles tend to have lower fuel efficiency.
The p-values for qsec and wt (0.001500 and 2.52e-11, respectively) are far below the usual significance level of 0.05, so both predictors are statistically significant in this model.
Overall, with the other predictor held fixed, mpg is positively associated with qsec and negatively associated with wt. Together the two predictors explain about 83% of the variance in mpg (multiple R-squared of 0.8264).
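The "holding the other variable constant" reading of a coefficient can be checked directly. The sketch below is only illustrative: the two cars in `new_cars` are hypothetical values chosen for the example, not rows of mtcars. It prints 95% confidence intervals for the coefficients and shows that the gap in predicted mpg between two cars of equal weight whose quarter-mile times differ by 1 second equals the qsec coefficient.

```r
# Refit the model from the text so this block runs on its own.
model <- lm(mpg ~ qsec + wt, data = mtcars)

# Point estimates with 95% confidence intervals for each coefficient.
cbind(estimate = coef(model), confint(model, level = 0.95))

# Two hypothetical cars (assumed values) with the same weight
# whose quarter-mile times differ by exactly 1 second.
new_cars <- data.frame(qsec = c(17, 18), wt = c(3, 3))
pred <- predict(model, newdata = new_cars)

# The difference in predicted mpg equals the qsec coefficient.
diff(pred)
coef(model)["qsec"]
```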
plot(model)
Residuals vs Fitted
The red smoothing (loess) curve is not perfectly horizontal, which suggests some non-linearity that the model does not capture.
As the fitted values increase, the spread of the residuals appears to widen slightly, which may indicate that the error variance is not constant.
Normal Q-Q
Most of the points lie close to the diagonal reference line, which suggests that the residuals are approximately normally distributed over that range.
The points labeled “Chrysler Imperial,” “Fiat 128,” and “Toyota Corolla” lie notably far from the line and could be considered outliers.
Scale-Location
The spread of the standardized residuals increases slightly as the fitted values increase, further suggesting heteroscedasticity.
Residuals vs Leverage
The Chrysler Imperial lies well away from the bulk of the points, combining relatively high leverage with a large residual, which suggests that it may have a disproportionate influence on the fitted model.
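The visual impressions from the diagnostic plots can be supplemented with numbers. The following is a minimal sketch using only base R. The auxiliary regression of squared residuals on fitted values is an informal check chosen for this example, not a full Breusch-Pagan test, and `influence_tbl` is a name introduced here for illustration.

```r
# Refit the model from the text so this block runs on its own.
model <- lm(mpg ~ qsec + wt, data = mtcars)

# Shapiro-Wilk test of the residuals: a numeric companion to the Q-Q plot.
shapiro.test(residuals(model))

# Informal check for changing spread: regress squared residuals on fitted values.
# A clearly non-zero slope would hint at non-constant variance.
aux <- lm(residuals(model)^2 ~ fitted(model))
summary(aux)$coefficients

# Leverage, standardized residuals, and Cook's distance:
# the quantities behind the Residuals vs Leverage plot.
influence_tbl <- data.frame(
  leverage  = hatvalues(model),
  std_resid = rstandard(model),
  cooks_d   = cooks.distance(model)
)

# Show the five observations with the largest Cook's distance.
head(influence_tbl[order(-influence_tbl$cooks_d), ], 5)
```

Observations near the top of this table are the ones whose removal would change the fitted coefficients the most, so they are natural candidates for a closer look.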
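Under the classical assumptions (a model that is linear in its parameters, errors with zero conditional mean and constant variance that are uncorrelated with one another, and no perfect collinearity among the regressors), the OLS estimator is the best linear unbiased estimator (BLUE) of the parameters. This result is the Gauss-Markov theorem.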
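For reference, the OLS estimator discussed here has the closed form b = (X'X)^(-1) X'y, where X is the design matrix and y the response. The sketch below, which assumes the same mpg ~ qsec + wt specification used earlier, solves the normal equations by hand and compares the result with lm(). It illustrates what the estimator computes; it does not demonstrate the Gauss-Markov theorem itself.

```r
# Refit the model from the text so this block runs on its own.
model <- lm(mpg ~ qsec + wt, data = mtcars)

# Design matrix (including the intercept column) and response vector.
X <- model.matrix(~ qsec + wt, data = mtcars)
y <- mtcars$mpg

# Solve the normal equations (X'X) b = X'y for b.
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Compare with lm(); the two columns should agree up to rounding error.
cbind(normal_equations = drop(beta_hat), lm = coef(model))
```

lm() itself uses a QR decomposition rather than the normal equations, which is numerically more stable, but on a small, well-conditioned problem like this one the two approaches give the same estimates.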
In some cases a variable contains extreme values that can distort the regression fit. Taking the logarithm of such a variable (which must be strictly positive) reduces their impact, because the logarithmic transformation compresses large values and pulls them closer to the center of the distribution.
Additionally, the linear regression model assumes that the relationship between the variables is linear. In practice many relationships are nonlinear, and taking logarithms can linearize some of them, for example multiplicative or power-law relationships.
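One way to try this on the data above is to refit the model with log-transformed variables and compare the residual plots. The specification below (logging mpg and wt while leaving qsec untouched) is an assumption made for this sketch, not a model taken from the analysis above, and `log_model` is a name introduced here.

```r
# Original linear specification from the text.
model <- lm(mpg ~ qsec + wt, data = mtcars)

# Hypothetical alternative: log-transform mpg and wt (both strictly positive).
log_model <- lm(log(mpg) ~ qsec + log(wt), data = mtcars)
summary(log_model)

# Compare the Residuals vs Fitted plots of the two fits side by side.
op <- par(mfrow = c(1, 2))
plot(model, which = 1)
plot(log_model, which = 1)
par(op)
```

Because the two models have different response variables (mpg versus log(mpg)), their R-squared values and residual standard errors are not directly comparable. The residual plots are the more useful comparison here: a flatter smoothing curve and a more even spread in the second panel would suggest that the transformation helps.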