Blog 1

Model Comparisons

This post walks through a few R packages for summarizing data and comparing regression models side by side, using one worked example.

Data

blog1 <- read.csv("https://raw.githubusercontent.com/irene908/DATA621/main/Blog1.csv")

Create a copy of the data in which PrizeMoney is replaced by its natural log, logPrizeMoney.

Logblog1 <- blog1
Logblog1$logPrizeMoney <- log(blog1$PrizeMoney)
Logblog1$PrizeMoney <- NULL
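As a quick check of the copy-and-transform steps above, here is a minimal sketch on a small made-up data frame. The name toy and its values are hypothetical stand-ins, not the blog1 data.

```r
# Hypothetical toy data standing in for blog1 (values are made up)
toy <- data.frame(PrizeMoney = c(2000, 10000, 50000, 600000),
                  Scrambling = c(52, 56, 58, 64))

# Same steps as above: copy, add the logged column, drop the original
log_toy <- toy
log_toy$logPrizeMoney <- log(toy$PrizeMoney)
log_toy$PrizeMoney <- NULL

# Only the predictor and the logged response remain
names(log_toy)

# The transformation is reversible: exp() recovers the original scale
all.equal(exp(log_toy$logPrizeMoney), toy$PrizeMoney)
```

Assigning NULL to a column removes it, so any later formula of the form response ~ . on the copy cannot pick up the untransformed PrizeMoney by accident.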

Stargazer package

The stargazer package helps to compare the model outputs and display the summary statistics.

https://cran.r-project.org/web/packages/stargazer/vignettes/stargazer.pdf.

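The stargazer summary table does not report skew or kurtosis.

Because type = "html" emits raw HTML, the table renders properly only after the R Markdown document is knitted (the chunk typically needs the option results = 'asis').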

library(stargazer)
stargazer(Logblog1, type = "html", nobs = TRUE, mean.sd = TRUE, median = TRUE, iqr = TRUE)


Statistic        N    Mean    St. Dev.  Min     Pctl(25)  Median  Pctl(75)  Max
DrivingAccuracy  196  63.380  5.413     49.750  59.758    63.240  66.965    78.430
Scrambling       196  57.494  3.162     49.020  55.260    57.650  59.457    66.450
PuttsPerRound    196  29.201  0.442     27.960  28.910    29.190  29.477    30.190
logPrizeMoney    196  10.378  0.980     7.714   9.762     10.509  10.967    13.404

describe() from the psych package reports skew and kurtosis as well.

library(psych)
# Keep only the n, mean, sd, min, max, skew and kurtosis columns
s <- describe(blog1[,c(1:4)])[,c(2:4,8,9,11,12)]
s

##                   n     mean       sd     min       max skew kurtosis
## PrizeMoney      196 50891.17 63902.95 2240.00 662771.00 5.29    42.57
## DrivingAccuracy 196    63.38     5.41   49.75     78.43 0.09     0.03
## Scrambling      196    57.49     3.16   49.02     66.45 0.00     0.09
## PuttsPerRound   196    29.20     0.44   27.96     30.19 0.13    -0.10
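To make the skew and kurtosis columns concrete, the sketch below computes the plain moment-based versions in base R. The helper names moment_skew and excess_kurtosis are made up for illustration. describe() may apply a small-sample adjustment, so its values can differ slightly from these.

```r
# Moment-based skewness: third central moment over variance^(3/2)
moment_skew <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Excess kurtosis: fourth central moment over variance^2, minus 3
# (so a normal distribution scores about 0)
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

sym   <- c(1, 2, 3, 4, 5)    # symmetric toy vector
right <- c(1, 1, 1, 2, 10)   # toy vector with a long right tail

moment_skew(sym)      # zero for symmetric data
moment_skew(right)    # positive for a right tail
excess_kurtosis(right)
```

Read this way, the skew of 5.29 and kurtosis of 42.57 reported for PrizeMoney point to a very long right tail, which is the usual motivation for modelling the log instead.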

Models

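Below are three models. The first uses the raw PrizeMoney as the response; the other two use the log-transformed dataset. Because Logblog1 contains only the three predictors besides logPrizeMoney, the . shorthand in lm2 expands to exactly the predictors listed in lm3, so models 2 and 3 are the same fit and produce identical results.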

lm1 <- lm(PrizeMoney ~., blog1)
lm2 <- lm(logPrizeMoney ~., Logblog1)
lm3 <- lm(logPrizeMoney ~ DrivingAccuracy + Scrambling + PuttsPerRound, Logblog1)
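The behaviour of the . shorthand in a formula can be checked on the built-in mtcars data. This is a sketch of the general mechanism, not of the golf data itself; the object names are made up.

```r
# Keep a response and two predictors only
d <- mtcars[, c("mpg", "wt", "hp")]

# "." means "every other column in data"
fit_dot      <- lm(mpg ~ ., data = d)
fit_explicit <- lm(mpg ~ wt + hp, data = d)

# The two fits have identical coefficients
all.equal(coef(fit_dot), coef(fit_explicit))
```

When every remaining column is named explicitly, the two formulas describe the same model, so a comparison table will show two identical columns.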

Stargazer result

stargazer(lm1, lm2, lm3, type="html")


                                  Dependent variable:
                                  PrizeMoney      logPrizeMoney
                                  (1)             (2)          (3)

DrivingAccuracy                   -1,353.794      0.010        0.010
                                  (918.371)       (0.014)      (0.014)

Scrambling                        6,992.504***    0.100***     0.100***
                                  (1,725.205)     (0.026)      (0.026)

PuttsPerRound                     5,530.681       -0.117       -0.117
                                  (11,361.840)    (0.170)      (0.170)

Constant                          -426,837.100    7.381        7.381
                                  (372,993.200)   (5.571)      (5.571)

Observations                      196             196          196
R²                                0.091           0.139        0.139
Adjusted R²                       0.077           0.125        0.125
Residual Std. Error (df = 192)    61,386.980      0.917        0.917
F Statistic (df = 3; 192)         6.437***        10.290***    10.290***

Note: *p<0.1; **p<0.05; ***p<0.01
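The coefficients in the PrizeMoney and logPrizeMoney columns are on different scales, so they cannot be compared directly. One common way to read a coefficient from a log-response model is as an approximate percentage change; the short sketch below does this for the Scrambling estimate of 0.100. The variable names are made up.

```r
# Scrambling coefficient from the logPrizeMoney models in the table
b_scrambling <- 0.100

# A one-unit increase in Scrambling multiplies expected PrizeMoney by exp(b),
# holding the other predictors fixed
pct_change <- (exp(b_scrambling) - 1) * 100
round(pct_change, 1)
```

Under that reading, a one-point increase in Scrambling is associated with roughly a 10.5% increase in prize money.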

memisc package

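memisc is another package that can display summary statistics and compare multiple models side by side. Its default significance stars use stricter cutoffs than stargazer's (*** means p < 0.001 here, versus p < 0.01 in stargazer), so the stars are not directly comparable between the two tables.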

https://cran.r-project.org/web/packages/memisc/memisc.pdf

The summary.stats argument selects which model-level statistics appear at the bottom of the table; here it is R-squared, the F statistic, its p-value, and the number of observations.

library(memisc)
lm_memisc <- mtable("Model 1"=lm1,"Model 2"=lm2,"Model 3"=lm3, summary.stats = c('R-squared','F','p','N'))
lm_memisc

## 
## Calls:
## Model 1: lm(formula = PrizeMoney ~ ., data = blog1)
## Model 2: lm(formula = logPrizeMoney ~ ., data = Logblog1)
## Model 3: lm(formula = logPrizeMoney ~ DrivingAccuracy + Scrambling + PuttsPerRound, 
##     data = Logblog1)
## 
## =================================================================
##                       Model 1         Model 2        Model 3     
##                    --------------  -------------  -------------  
##                      PrizeMoney    logPrizeMoney  logPrizeMoney  
## -----------------------------------------------------------------
##   (Intercept)      -426837.131        7.381          7.381       
##                    (372993.195)      (5.571)        (5.571)      
##   DrivingAccuracy    -1353.794        0.010          0.010       
##                       (918.371)      (0.014)        (0.014)      
##   Scrambling          6992.504***     0.100***       0.100***    
##                      (1725.205)      (0.026)        (0.026)      
##   PuttsPerRound       5530.681       -0.117         -0.117       
##                     (11361.840)      (0.170)        (0.170)      
## -----------------------------------------------------------------
##   R-squared              0.091        0.139          0.139       
##   F                      6.437       10.290         10.290       
##   p                      0.000        0.000          0.000       
##   N                    196          196            196           
## =================================================================
##   Significance: *** = p < 0.001; ** = p < 0.01; * = p < 0.05
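For reference, the four summary rows can be reproduced by hand from a fitted lm object in base R. The sketch below uses the built-in mtcars data and assumes the p row is the p-value of the overall F test; the object names are made up.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
s   <- summary(fit)
f   <- s$fstatistic   # named vector: value, numdf, dendf

stats <- c(
  "R-squared" = s$r.squared,
  "F"         = unname(f["value"]),
  # Upper-tail probability of the F distribution gives the overall p-value
  "p"         = unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)),
  "N"         = nobs(fit)
)
round(stats, 3)
```

A p of 0.000 in the table therefore means the value rounds to zero at three decimals, not that it is exactly zero.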

The mtable object can be subset by coefficient name (rows) and model name (columns) to display only part of the table.

lm_memisc[c("DrivingAccuracy","Scrambling"), c("Model 2","Model 3")]

## 
## Calls:
## Model 2: lm(formula = logPrizeMoney ~ ., data = Logblog1)
## Model 3: lm(formula = logPrizeMoney ~ DrivingAccuracy + Scrambling + PuttsPerRound, 
##     data = Logblog1)
## 
## ===========================================
##                     Model 2     Model 3    
## -------------------------------------------
##   DrivingAccuracy    0.010       0.010     
##                     (0.014)     (0.014)    
##   Scrambling         0.100***    0.100***  
##                     (0.026)     (0.026)    
## -------------------------------------------
##   R-squared          0.139       0.139     
##   F                 10.290      10.290     
##   p                  0.000       0.000     
##   N                196         196         
## ===========================================
##   Significance: *** = p < 0.001;   
##                 ** = p < 0.01;   
##                 * = p < 0.05

DATA 621 Blog 1

Irene Jacob