Discussion 14

Haiding Luo

2023-12-11

Part 1

library(stargazer)
## 
## Please cite as:
##  Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
library(ggplot2) 
data <- mtcars
stargazer(data, type = "text")
## 
## ============================================
## Statistic N   Mean   St. Dev.  Min     Max  
## --------------------------------------------
## mpg       32 20.091   6.027   10.400 33.900 
## cyl       32  6.188   1.786     4       8   
## disp      32 230.722 123.939  71.100 472.000
## hp        32 146.688  68.563    52     335  
## drat      32  3.597   0.535   2.760   4.930 
## wt        32  3.217   0.978   1.513   5.424 
## qsec      32 17.849   1.787   14.500 22.900 
## vs        32  0.438   0.504     0       1   
## am        32  0.406   0.499     0       1   
## gear      32  3.688   0.738     3       5   
## carb      32  2.812   1.615     1       8   
## --------------------------------------------
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
data <- mtcars[, c("mpg", "qsec", "wt")]
data
##                      mpg  qsec    wt
## Mazda RX4           21.0 16.46 2.620
## Mazda RX4 Wag       21.0 17.02 2.875
## Datsun 710          22.8 18.61 2.320
## Hornet 4 Drive      21.4 19.44 3.215
## Hornet Sportabout   18.7 17.02 3.440
## Valiant             18.1 20.22 3.460
## Duster 360          14.3 15.84 3.570
## Merc 240D           24.4 20.00 3.190
## Merc 230            22.8 22.90 3.150
## Merc 280            19.2 18.30 3.440
## Merc 280C           17.8 18.90 3.440
## Merc 450SE          16.4 17.40 4.070
## Merc 450SL          17.3 17.60 3.730
## Merc 450SLC         15.2 18.00 3.780
## Cadillac Fleetwood  10.4 17.98 5.250
## Lincoln Continental 10.4 17.82 5.424
## Chrysler Imperial   14.7 17.42 5.345
## Fiat 128            32.4 19.47 2.200
## Honda Civic         30.4 18.52 1.615
## Toyota Corolla      33.9 19.90 1.835
## Toyota Corona       21.5 20.01 2.465
## Dodge Challenger    15.5 16.87 3.520
## AMC Javelin         15.2 17.30 3.435
## Camaro Z28          13.3 15.41 3.840
## Pontiac Firebird    19.2 17.05 3.845
## Fiat X1-9           27.3 18.90 1.935
## Porsche 914-2       26.0 16.70 2.140
## Lotus Europa        30.4 16.90 1.513
## Ford Pantera L      15.8 14.50 3.170
## Ferrari Dino        19.7 15.50 2.770
## Maserati Bora       15.0 14.60 3.570
## Volvo 142E          21.4 18.60 2.780
pairs(data, pch = 18, col = "steelblue")

model <- lm(mpg ~ qsec + wt, data = mtcars)
summary(model)
## 
## Call:
## lm(formula = mpg ~ qsec + wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.3962 -2.1431 -0.2129  1.4915  5.7486 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  19.7462     5.2521   3.760 0.000765 ***
## qsec          0.9292     0.2650   3.506 0.001500 ** 
## wt           -5.0480     0.4840 -10.430 2.52e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.596 on 29 degrees of freedom
## Multiple R-squared:  0.8264, Adjusted R-squared:  0.8144 
## F-statistic: 69.03 on 2 and 29 DF,  p-value: 9.395e-12

The coefficient for qsec is 0.9292, indicating that, holding weight constant, a 1-second increase in qsec (the quarter-mile time) raises the predicted mpg by approximately 0.93 miles per gallon. This suggests that vehicles with poorer acceleration (those needing more time to cover a quarter mile) tend to have better fuel efficiency.

The coefficient for wt is -5.0480, indicating that, holding qsec constant, each one-unit increase in wt (1,000 lbs in mtcars) lowers the predicted mpg by approximately 5.05 miles per gallon. This suggests that heavier vehicles tend to have lower fuel efficiency.
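To make the two coefficients concrete, the following sketch predicts mpg for a hypothetical car and then changes one predictor at a time. The values 18 seconds and 3 (thousand lbs) are illustrative assumptions, not figures from the analysis above.

```r
# Refit the model so this block runs on its own
model <- lm(mpg ~ qsec + wt, data = mtcars)

# Hypothetical cars (illustrative values only)
base_car    <- data.frame(qsec = 18, wt = 3)  # 18 s quarter mile, 3,000 lbs
slower_car  <- data.frame(qsec = 19, wt = 3)  # one second slower, same weight
heavier_car <- data.frame(qsec = 18, wt = 4)  # same time, 1,000 lbs heavier

pred_base    <- unname(predict(model, newdata = base_car))
pred_slower  <- unname(predict(model, newdata = slower_car))
pred_heavier <- unname(predict(model, newdata = heavier_car))

# The differences reproduce the coefficients on qsec and wt
print(pred_base)                 # about 21.3 mpg
print(pred_slower - pred_base)   # equals the qsec coefficient
print(pred_heavier - pred_base)  # equals the wt coefficient
```

Because the model is linear, the size of each change does not depend on the starting values chosen for the hypothetical car.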

The p-values for qsec and wt are very small (0.0015 and 2.52e-11, respectively), far below the usual significance level of 0.05. Both independent variables are therefore statistically significant predictors of mpg in this model.
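Confidence intervals give a complementary view of the same tests: a coefficient that is significant at the 5% level has a 95% interval that excludes zero. A minimal sketch using base R's confint():

```r
# Refit the model so this block runs on its own
model <- lm(mpg ~ qsec + wt, data = mtcars)

# 95% confidence intervals for the intercept and both slopes
ci <- confint(model, level = 0.95)
print(ci)

# TRUE for a coefficient whose interval does not contain zero
excludes_zero <- ci[, 1] > 0 | ci[, 2] < 0
print(excludes_zero)
```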

Overall, the regression model shows that mpg is positively associated with qsec and negatively associated with wt, each with the other held constant. The model explains about 83% of the variation in mpg (Multiple R-squared = 0.8264).

plot(model)

Residuals vs Fitted

The red smoothed curve (a lowess fit) is not perfectly horizontal, which suggests possible non-linearity in the relationship.

As the fitted values increase, there appears to be a slight increase in the spread of the residuals, which may indicate that the error variance is not constant (heteroscedasticity).
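One informal way to follow up on the curvature seen in this plot is to add a squared term and compare the two nested models with an F-test. The choice of wt as the variable to square is an assumption made for illustration; the plot itself does not say which predictor drives the curvature.

```r
# Original model and an extended model with a quadratic term in wt
model      <- lm(mpg ~ qsec + wt, data = mtcars)
model_quad <- lm(mpg ~ qsec + wt + I(wt^2), data = mtcars)

# F-test for nested models: a small p-value favours keeping the squared term
cmp <- anova(model, model_quad)
print(cmp)
```

If the squared term is significant, the linear specification in wt is probably too restrictive.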

Q-Q Residuals

The majority of the points lie close to the diagonal line, which suggests that the residuals are approximately normally distributed over most of their range.

The points labeled “Chrysler Imperial,” “Fiat 128,” and “Toyota Corolla” are notably far from the line and could be considered outliers.
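The visual impression from the Q-Q plot can be supplemented with numbers. The sketch below lists the three observations with the largest absolute standardized residuals and runs a Shapiro-Wilk test on the residuals. Using this particular test is a choice made here, not part of the original analysis.

```r
# Refit the model so this block runs on its own
model <- lm(mpg ~ qsec + wt, data = mtcars)

# Standardized residuals, sorted by absolute size
std_res <- rstandard(model)
top3 <- head(sort(abs(std_res), decreasing = TRUE), 3)
print(top3)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal
sw <- shapiro.test(residuals(model))
print(sw)
```

With only 32 observations the test has limited power, so it is best read together with the plot rather than on its own.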

Scale-Location

This plot indicates that as the fitted values increase, there is a slight increase in the spread of the standardized residuals, which again suggests heteroscedasticity (non-constant error variance).
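A formal counterpart to this plot is a Breusch-Pagan style test. The sketch below computes the studentized (Koenker) version by hand in base R: regress the squared residuals on the predictors and compare n times the auxiliary R-squared with a chi-squared distribution. The variable names are illustrative.

```r
# Refit the model so this block runs on its own
model <- lm(mpg ~ qsec + wt, data = mtcars)

# Auxiliary regression of squared residuals on the same predictors
aux_data <- mtcars
aux_data$sq_res <- residuals(model)^2
aux <- lm(sq_res ~ qsec + wt, data = aux_data)

# LM statistic: n * R-squared, chi-squared with 2 df (two predictors)
n <- nobs(model)
lm_stat <- n * summary(aux)$r.squared
p_value <- pchisq(lm_stat, df = 2, lower.tail = FALSE)

print(c(statistic = lm_stat, p.value = p_value))
```

A small p-value would indicate that the residual variance depends on the predictors.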

Residuals vs Leverage

The Chrysler Imperial lies well away from the bulk of the points, combining relatively high leverage with a large residual, which suggests that it may have a disproportionate influence on the fitted model.
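Leverage and influence can also be inspected numerically. The sketch below uses hatvalues() and cooks.distance() from base R. The cut-off of 2p/n for high leverage is a common rule of thumb, assumed here for illustration rather than taken from the analysis above.

```r
# Refit the model so this block runs on its own
model <- lm(mpg ~ qsec + wt, data = mtcars)

lev  <- hatvalues(model)       # leverage of each observation
cook <- cooks.distance(model)  # influence of each observation on the fit

p <- length(coef(model))  # number of estimated coefficients (3)
n <- nobs(model)          # number of observations (32)

# Observations above the rule-of-thumb leverage threshold
high_lev <- names(lev)[lev > 2 * p / n]
print(high_lev)

# Three most influential observations by Cook's distance
top_cook <- head(sort(cook, decreasing = TRUE), 3)
print(top_cook)
```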

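Under the classical assumptions, the OLS estimator is the best linear unbiased estimator (BLUE) of the parameters; this result is the Gauss-Markov theorem. The diagnostics above matter because signs of non-linearity and heteroscedasticity mean these assumptions may not hold exactly for this model.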
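The OLS estimator that the theorem refers to has the closed form (X'X)^(-1) X'y. As a quick check, the sketch below computes it directly from the design matrix and compares it with the coefficients returned by lm(). The variable names are illustrative.

```r
# Design matrix (with intercept column) and response vector
X <- model.matrix(mpg ~ qsec + wt, data = mtcars)
y <- mtcars$mpg

# Solve the normal equations (X'X) b = X'y
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Compare with lm(), which uses a QR decomposition internally
model <- lm(mpg ~ qsec + wt, data = mtcars)
same <- isTRUE(all.equal(as.numeric(beta_hat), as.numeric(coef(model))))
print(same)
```

The two agree up to numerical precision; lm() avoids forming X'X explicitly because the QR approach is more stable numerically.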

Part 2

In some cases a variable contains outliers or is strongly right-skewed, which can distort the regression model. Taking the logarithm of such a variable (which must be strictly positive) reduces the impact of these extreme values, because the logarithmic transformation compresses large values toward the center of the distribution.
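The compressing effect of the logarithm can be seen by measuring how far the largest value sits from the mean, in standard deviations, before and after the transformation. The sketch uses hp from mtcars purely as an illustrative right-skewed variable.

```r
# Horsepower: a positive, right-skewed variable in mtcars
x <- mtcars$hp

# Distance of the maximum from the mean, in standard deviations
z_max_raw <- (max(x) - mean(x)) / sd(x)
z_max_log <- (max(log(x)) - mean(log(x))) / sd(log(x))

print(c(raw = z_max_raw, log = z_max_log))
```

On the log scale the largest value lies closer to the mean, so it pulls less strongly on a fitted regression line.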

Additionally, the linear regression model assumes that the relationship between the variables is linear in the parameters. In practice many relationships are nonlinear, and taking logarithms can turn some of them, such as multiplicative or exponential relationships, into linear ones.
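As a sketch of this idea, a log-log specification can be fitted with the same lm() call. Modelling mpg against wt alone is a simplification chosen for illustration; in a log-log model the slope is read as an elasticity, the approximate percentage change in mpg for a 1% change in weight.

```r
# Linear specification and log-log specification of the same relationship
linear_fit <- lm(mpg ~ wt, data = mtcars)
loglog_fit <- lm(log(mpg) ~ log(wt), data = mtcars)

# Slope of the log-log model: elasticity of mpg with respect to wt
elasticity <- unname(coef(loglog_fit)["log(wt)"])
print(elasticity)

# Residual plots of both fits can be compared side by side
# plot(linear_fit, which = 1); plot(loglog_fit, which = 1)
```

The R-squared values of the two models are not directly comparable because the response variables differ (mpg versus log(mpg)), so the residual plots are the better guide.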