DATA 621 - Business Analytics and Data Mining

Initialize

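# Install each package if it is missing, then load it
if (!require('dplyr')) { install.packages('dplyr'); library(dplyr) }
if (!require('stargazer')) { install.packages('stargazer'); library(stargazer) }
if (!require('psych')) { install.packages('psych'); library(psych) }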

pga <- read.csv('pgatour2006.csv', header = TRUE)

pga <- pga %>%
    select(PrizeMoney, DrivingAccuracy, GIR, PuttingAverage,
           BirdieConversion, SandSaves, Scrambling, PuttsPerRound)

# Copy the data and replace PrizeMoney with its natural log
pgaLog <- pga
pgaLog$logPrizeMoney <- log(pga$PrizeMoney)
pgaLog$PrizeMoney <- NULL

Summary Statistics

{r results='asis'}

stargazer(pgaLog, type = "html", nobs = TRUE, mean.sd = TRUE, median = TRUE, iqr = TRUE)


Statistic	N	Mean	St. Dev.	Min	Pctl(25)	Median	Pctl(75)	Max

DrivingAccuracy	196	63.380	5.413	49.750	59.758	63.240	66.965	78.430
GIR	196	65.186	2.722	56.870	63.523	65.355	66.770	74.150
PuttingAverage	196	1.780	0.025	1.712	1.763	1.778	1.796	1.851
BirdieConversion	196	28.982	2.207	23.170	27.508	29.010	30.553	35.660
SandSaves	196	48.972	5.828	33.910	45.130	48.655	52.870	63.640
Scrambling	196	57.494	3.162	49.020	55.260	57.650	59.457	66.450
PuttsPerRound	196	29.201	0.442	27.960	28.910	29.190	29.477	30.190
logPrizeMoney	196	10.378	0.980	7.714	9.762	10.509	10.967	13.404

Use the describe Function from the psych Library

One drawback of using stargazer for summary statistics is that it does not report skew or kurtosis. Both are available from the describe function in the psych library.

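# Keep only the columns of interest from describe(), selected by name
pgaSummary <- describe(pga[, 1:8])[, c("n", "mean", "sd", "median", "min", "max", "skew", "kurtosis")]
knitr::kable(pgaSummary)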

	n	mean	sd	median	min	max	skew	kurtosis
PrizeMoney	196	50891.168367	6.390295e+04	36644.500	2240.000	662771.000	5.2943317	42.5710106
DrivingAccuracy	196	63.380102	5.413023e+00	63.240	49.750	78.430	0.0942275	0.0281998
GIR	196	65.186071	2.722364e+00	65.355	56.870	74.150	-0.2459686	0.6762018
PuttingAverage	196	1.779852	2.472810e-02	1.778	1.712	1.851	0.1562248	-0.2411670
BirdieConversion	196	28.982296	2.206556e+00	29.010	23.170	35.660	-0.0215926	0.2586721
SandSaves	196	48.971735	5.828313e+00	48.655	33.910	63.640	0.0028410	-0.2420466
Scrambling	196	57.494439	3.162257e+00	57.650	49.020	66.450	0.0037607	0.0927853
PuttsPerRound	196	29.201071	4.417023e-01	29.190	27.960	30.190	0.1282510	-0.1031871
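
To make the skew and kurtosis columns above more concrete, here is a minimal base R sketch of one common moment-based definition. It runs on simulated right-skewed data, not the PGA file, and the helper names `skewness` and `excess_kurtosis` are made up for illustration. I believe this matches the default used by psych's describe, but treat that as an assumption, since other small-sample adjustments exist.

```r
# Simulated right-skewed data standing in for PrizeMoney (not the PGA data)
set.seed(621)
prize <- rlnorm(196, meanlog = 10.4, sdlog = 1)

# Sample skewness: third central moment divided by the cubed standard deviation
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / sd(x)^3
}

# Excess kurtosis: fourth central moment over sd^4, minus 3 (the normal value)
excess_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / sd(x)^4 - 3
}

# Compare the raw values with their logs
round(c(raw = skewness(prize), logged = skewness(log(prize))), 3)
round(c(raw = excess_kurtosis(prize), logged = excess_kurtosis(log(prize))), 3)
```

On data like this the raw values are strongly right-skewed while the logged values are close to symmetric. The same pattern is behind the large skew reported for PrizeMoney and the decision to model logPrizeMoney.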

Model Comparison

Where the stargazer library really shines is in its ability to compare different models in a single table.

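# Full model on the raw response
mod1 <- lm(PrizeMoney ~ ., data = pga)
# Full model on the log-transformed response
mod2 <- lm(logPrizeMoney ~ ., data = pgaLog)
# Reduced model: drops DrivingAccuracy and PuttingAverage
mod3 <- lm(logPrizeMoney ~ GIR + BirdieConversion + SandSaves + Scrambling + PuttsPerRound, data = pgaLog)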

This example uses three models: two use a log transformation of Y and the third uses the original values of Y. The stargazer package makes it easy to display these models side by side and shows the differences between them clearly.
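
Before rendering HTML, it can help to preview the same kind of comparison in the console. The sketch below uses stargazer's type = "text" option on two stand-in models fitted to the built-in mtcars data. The models and the names `fit_raw`, `fit_log`, and `preview` are illustrative only and are not part of the PGA analysis. The call is skipped if stargazer is not installed.

```r
# Stand-in models on the built-in mtcars data (not the PGA data)
fit_raw <- lm(mpg ~ wt + hp, data = mtcars)
fit_log <- lm(log(mpg) ~ wt + hp, data = mtcars)

preview <- NULL
if (requireNamespace("stargazer", quietly = TRUE)) {
  # type = "text" writes an ASCII table to the console instead of HTML
  preview <- capture.output(
    stargazer::stargazer(fit_raw, fit_log, type = "text",
                         column.labels = c("Raw", "Log"))
  )
  cat(preview, sep = "\n")
}
```

The text output has the same layout as the HTML table, so it is a quick way to check column labels and variable order before knitting.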

stargazer(mod1, mod2, mod3, type="html", column.labels = c("Good", "Better", "Best"))


	Dependent variable:

	PrizeMoney	logPrizeMoney
	Good	Better	Best
	(1)	(2)	(3)

DrivingAccuracy	-1,835.830^**	-0.004
	(889.161)	(0.012)

GIR	9,671.334^***	0.199^***	0.197^***
	(3,309.355)	(0.044)	(0.029)

PuttingAverage	-47,435.300	-0.466
	(521,566.400)	(6.906)

BirdieConversion	10,426.030^***	0.157^***	0.163^***
	(3,049.642)	(0.040)	(0.033)

SandSaves	1,182.058	0.015	0.016
	(744.818)	(0.010)	(0.010)

Scrambling	4,741.258^**	0.052	0.050^**
	(2,400.818)	(0.032)	(0.025)

PuttsPerRound	5,267.517	-0.343	-0.350
	(35,765.740)	(0.474)	(0.231)

Constant	-1,165,233.000^**	0.194	-0.583
	(587,382.900)	(7.777)	(7.159)


Observations	196	196	196
R²	0.406	0.558	0.557
Adjusted R²	0.384	0.541	0.546
Residual Std. Error	50,142.970 (df = 188)	0.664 (df = 188)	0.661 (df = 190)
F Statistic	18.387^*** (df = 7; 188)	33.866^*** (df = 7; 188)	47.875^*** (df = 5; 190)

Note:	*p<0.1; **p<0.05; ***p<0.01
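
The coefficients in columns (2) and (3) are on the log scale, so they read differently from those in column (1). As a rough sketch, a coefficient b in a model for log(Y) means a one-unit increase in that predictor multiplies Y by exp(b), with the other predictors held fixed. The values below are copied from column (3) of the table, and the names `coefs` and `pct_change` are illustrative.

```r
# Coefficients copied from column (3) of the table above
coefs <- c(GIR = 0.197, BirdieConversion = 0.163, Scrambling = 0.050)

# With log(Y) as the response, a one-unit increase in a predictor
# multiplies Y by exp(coefficient), holding the other predictors fixed
pct_change <- 100 * (exp(coefs) - 1)
round(pct_change, 1)
```

Under this reading, each additional percentage point of GIR is associated with roughly 22 percent more prize money. This is an association in the fitted model, not a causal effect.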

Conclusion

The stargazer package makes it easier to display summary statistics and model output, which helps in decision making.

Blog 2 - STARGAZER : For Model Comparison

Samriti Malhotra
