This post walks through a few R packages for comparing models side by side, using a worked example.
blog1 <- read.csv("https://raw.githubusercontent.com/irene908/DATA621/main/Blog1.csv")
Create a copy of the data that stores the log of PrizeMoney in place of the original column.
Logblog1 <- blog1
Logblog1$logPrizeMoney <- log(blog1$PrizeMoney)
Logblog1$PrizeMoney <- NULL
The stargazer package displays summary statistics and compares model outputs side by side.
https://cran.r-project.org/web/packages/stargazer/vignettes/stargazer.pdf
Its summary table does not report skew or kurtosis.
With type = "html", the output renders as a table only after the markdown is knitted.
library(stargazer)
stargazer(Logblog1, type = "html", nobs = TRUE, mean.sd = TRUE, median = TRUE, iqr = TRUE)
| Statistic | N | Mean | St. Dev. | Min | Pctl(25) | Median | Pctl(75) | Max |
| DrivingAccuracy | 196 | 63.380 | 5.413 | 49.750 | 59.758 | 63.240 | 66.965 | 78.430 |
| Scrambling | 196 | 57.494 | 3.162 | 49.020 | 55.260 | 57.650 | 59.457 | 66.450 |
| PuttsPerRound | 196 | 29.201 | 0.442 | 27.960 | 28.910 | 29.190 | 29.477 | 30.190 |
| logPrizeMoney | 196 | 10.378 | 0.980 | 7.714 | 9.762 | 10.509 | 10.967 | 13.404 |
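For a quick look without knitting, stargazer also accepts type = "text", which prints a plain-text table to the console. A minimal sketch, assuming the stargazer package is installed; the built-in mtcars data and the chosen columns are stand-ins, not the golf data used in this post.

```r
library(stargazer)

# Stand-in data: three numeric columns from the built-in mtcars data set
cars_subset <- mtcars[, c("mpg", "hp", "wt")]

# type = "text" prints an ASCII table to the console instead of HTML
stargazer(cars_subset, type = "text", median = TRUE, iqr = TRUE)
```

Switching type back to "html" before knitting keeps the rendered document unchanged.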
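describe() from the psych package also reports skew and kurtosis.
library(psych)
# Keep n, mean, sd, min, max, skew and kurtosis from the describe() output
s <- describe(blog1[, 1:4])[, c(2:4, 8, 9, 11, 12)]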
s
##                   n     mean       sd     min       max skew kurtosis
## PrizeMoney      196 50891.17 63902.95 2240.00 662771.00 5.29    42.57
## DrivingAccuracy 196    63.38     5.41   49.75     78.43 0.09     0.03
## Scrambling      196    57.49     3.16   49.02     66.45 0.00     0.09
## PuttsPerRound   196    29.20     0.44   27.96     30.19 0.13    -0.10
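To make the two extra columns concrete, here is a minimal base-R sketch of moment-based skewness and excess kurtosis. The helper name shape_stats and the toy vector are made up for illustration, and psych may apply a slightly different sample-size correction, so the values need not match describe() exactly. A large positive skew such as the 5.29 reported for PrizeMoney signals a long right tail, which is the usual motivation for a log transform.

```r
# Hypothetical helper: skewness and excess kurtosis from central moments
shape_stats <- function(x) {
  x <- x[!is.na(x)]
  centered <- x - mean(x)
  m2 <- mean(centered^2)  # second central moment (population variance)
  m3 <- mean(centered^3)  # third central moment
  m4 <- mean(centered^4)  # fourth central moment
  c(skew = m3 / m2^1.5, kurtosis = m4 / m2^2 - 3)
}

# Toy right-skewed values (not the golf data)
toy <- c(1, 2, 3, 5, 8, 13, 21, 34, 55, 600)

shape_stats(toy)       # strongly right-skewed
shape_stats(log(toy))  # noticeably less skewed after the log
```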
Below are three models. Two of them are fitted on the log-transformed dataset.
lm1 <- lm(PrizeMoney ~., blog1)
lm2 <- lm(logPrizeMoney ~., Logblog1)
lm3 <- lm(logPrizeMoney ~ DrivingAccuracy + Scrambling + PuttsPerRound, Logblog1)
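In a formula, `.` stands for every column other than the response, so a `.` formula and one that lists the same predictors explicitly describe the same model. A small sketch on the built-in mtcars data (a stand-in, not the golf data) checks this:

```r
# Stand-in data with one response (mpg) and three predictors
d <- mtcars[, c("mpg", "wt", "hp", "qsec")]

fit_dot      <- lm(mpg ~ ., data = d)
fit_explicit <- lm(mpg ~ wt + hp + qsec, data = d)

# TRUE when both fits have the same coefficient estimates
all.equal(coef(fit_dot), coef(fit_explicit))
```

By the same reasoning, lm2 and lm3 coincide if Logblog1 holds only the three listed predictors besides logPrizeMoney, which would explain why their columns in the tables that follow are identical.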
Stargazer result
stargazer(lm1, lm2, lm3, type="html")
| | Dependent variable: | | |
| | PrizeMoney | logPrizeMoney | logPrizeMoney |
| | (1) | (2) | (3) |
| DrivingAccuracy | -1,353.794 | 0.010 | 0.010 |
| | (918.371) | (0.014) | (0.014) |
| Scrambling | 6,992.504*** | 0.100*** | 0.100*** |
| | (1,725.205) | (0.026) | (0.026) |
| PuttsPerRound | 5,530.681 | -0.117 | -0.117 |
| | (11,361.840) | (0.170) | (0.170) |
| Constant | -426,837.100 | 7.381 | 7.381 |
| | (372,993.200) | (5.571) | (5.571) |
| Observations | 196 | 196 | 196 |
| R² | 0.091 | 0.139 | 0.139 |
| Adjusted R² | 0.077 | 0.125 | 0.125 |
| Residual Std. Error (df = 192) | 61,386.980 | 0.917 | 0.917 |
| F Statistic (df = 3; 192) | 6.437*** | 10.290*** | 10.290*** |
| Note: | *p<0.1; **p<0.05; ***p<0.01 | | |
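The coefficients in columns (2) and (3) are on the log scale, so they cannot be compared directly with the dollar-scale values in column (1). One common reading, assuming the usual log-linear interpretation, converts a coefficient into an approximate percentage change in PrizeMoney per one-unit increase in the predictor. The value 0.100 is the Scrambling estimate from the table; the variable names are illustrative.

```r
# Scrambling coefficient from the log model, read off the table above
b_scrambling <- 0.100

# Percentage change in PrizeMoney per one-unit increase in Scrambling
pct_change <- 100 * (exp(b_scrambling) - 1)
round(pct_change, 1)  # about 10.5
```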
memisc is another package that displays summary statistics and compares multiple models side by side.
https://cran.r-project.org/web/packages/memisc/memisc.pdf
Note the summary.stats argument below, which selects the summary statistics to display.
library(memisc)
lm_memisc <- mtable("Model 1" = lm1, "Model 2" = lm2, "Model 3" = lm3, summary.stats = c("R-squared", "F", "p", "N"))
lm_memisc
##
## Calls:
## Model 1: lm(formula = PrizeMoney ~ ., data = blog1)
## Model 2: lm(formula = logPrizeMoney ~ ., data = Logblog1)
## Model 3: lm(formula = logPrizeMoney ~ DrivingAccuracy + Scrambling + PuttsPerRound,
## data = Logblog1)
##
## =================================================================
##                       Model 1       Model 2        Model 3
##                   -------------- -------------  -------------
##                     PrizeMoney   logPrizeMoney  logPrizeMoney
## -----------------------------------------------------------------
##   (Intercept)     -426837.131         7.381          7.381
##                   (372993.195)       (5.571)        (5.571)
##   DrivingAccuracy   -1353.794         0.010          0.010
##                      (918.371)       (0.014)        (0.014)
##   Scrambling         6992.504***      0.100***       0.100***
##                     (1725.205)       (0.026)        (0.026)
##   PuttsPerRound      5530.681        -0.117         -0.117
##                    (11361.840)       (0.170)        (0.170)
## -----------------------------------------------------------------
##   R-squared             0.091         0.139          0.139
##   F                     6.437        10.290         10.290
##   p                     0.000         0.000          0.000
##   N                   196           196            196
## =================================================================
##   Significance: *** = p < 0.001; ** = p < 0.01; * = p < 0.05
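Note that the star legends differ between the two packages: the stargazer note uses cutoffs of 0.1, 0.05 and 0.01, while the mtable legend above uses 0.05, 0.01 and 0.001. mtable takes a signif.symbols argument for changing its cutoffs. A sketch, assuming memisc is installed and using the built-in mtcars data with made-up model labels:

```r
library(memisc)

# Stand-in models on built-in data (not the golf data)
fit_a <- lm(mpg ~ wt, data = mtcars)
fit_b <- lm(mpg ~ wt + hp, data = mtcars)

# Map the stars to 0.01 / 0.05 / 0.1 cutoffs instead of the defaults
tab <- mtable("Model A" = fit_a, "Model B" = fit_b,
              summary.stats = c("R-squared", "N"),
              signif.symbols = c("***" = 0.01, "**" = 0.05, "*" = 0.1))
tab
```

Using one set of cutoffs in both tables makes the stars easier to compare across packages.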
Part of the table can be selected by indexing it by coefficient (rows) and by model (columns).
lm_memisc[c("DrivingAccuracy","Scrambling"), c("Model 2","Model 3")]
##
## Calls:
## Model 2: lm(formula = logPrizeMoney ~ ., data = Logblog1)
## Model 3: lm(formula = logPrizeMoney ~ DrivingAccuracy + Scrambling + PuttsPerRound,
## data = Logblog1)
##
## ===========================================
##                     Model 2      Model 3
## -------------------------------------------
##   DrivingAccuracy    0.010        0.010
##                     (0.014)      (0.014)
##   Scrambling         0.100***     0.100***
##                     (0.026)      (0.026)
## -------------------------------------------
##   R-squared          0.139        0.139
##   F                 10.290       10.290
##   p                  0.000        0.000
##   N                196          196
## ===========================================
##   Significance: *** = p < 0.001;
##                 ** = p < 0.01;
##                 * = p < 0.05