The stargazer package is a convenient R package for comparing the output of several models side by side and for displaying basic summary statistics.
This example uses the pgatour2006 data from MARR 6.5.
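# Install any missing packages, then load them
if (!require('dplyr')) { install.packages('dplyr'); library(dplyr) }
if (!require('stargazer')) { install.packages('stargazer'); library(stargazer) }
if (!require('psych')) { install.packages('psych'); library(psych) }
# Read the data and keep the response plus the seven predictors
pga <- read.csv('pgatour2006.csv', header = TRUE)
pga <- pga %>%
  select(PrizeMoney, DrivingAccuracy, GIR, PuttingAverage,
         BirdieConversion, SandSaves, Scrambling, PuttsPerRound)
# Copy of the data with the response replaced by its natural log
pgaLog <- pga
pgaLog$logPrizeMoney <- log(pga$PrizeMoney)
pgaLog$PrizeMoney <- NULL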
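The summary statistics table below comes from a stargazer call in an R Markdown chunk with the results='asis' option, which lets the generated table markup render as a table: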
Statistic | N | Mean | St. Dev. | Min | Pctl(25) | Median | Pctl(75) | Max |
---|---|---|---|---|---|---|---|---|
DrivingAccuracy | 196 | 63.380 | 5.413 | 49.750 | 59.758 | 63.240 | 66.965 | 78.430 |
GIR | 196 | 65.186 | 2.722 | 56.870 | 63.523 | 65.355 | 66.770 | 74.150 |
PuttingAverage | 196 | 1.780 | 0.025 | 1.712 | 1.763 | 1.778 | 1.796 | 1.851 |
BirdieConversion | 196 | 28.982 | 2.207 | 23.170 | 27.508 | 29.010 | 30.553 | 35.660 |
SandSaves | 196 | 48.972 | 5.828 | 33.910 | 45.130 | 48.655 | 52.870 | 63.640 |
Scrambling | 196 | 57.494 | 3.162 | 49.020 | 55.260 | 57.650 | 59.457 | 66.450 |
PuttsPerRound | 196 | 29.201 | 0.442 | 27.960 | 28.910 | 29.190 | 29.477 | 30.190 |
logPrizeMoney | 196 | 10.378 | 0.980 | 7.714 | 9.762 | 10.509 | 10.967 | 13.404 |
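
A table like the one above can be produced by passing the data frame straight to stargazer. The sketch below is an assumption about the call, since the original chunk body is not shown. The summary.stat vector is chosen to match the columns in the table, type = "text" is used so the output is readable in the console, and the built-in mtcars data is only a stand-in that lets the snippet run when pgaLog is not in the workspace.

```r
library(stargazer)

# Use the article's pgaLog if it exists; otherwise fall back to the built-in
# mtcars data (a stand-in, not the PGA data) so the sketch runs on its own.
dat <- if (exists("pgaLog")) pgaLog else mtcars

# Passing a data frame makes stargazer print a summary statistics table.
# summary.stat lists the statistics to report, matching the columns shown above.
summary_out <- stargazer(
  dat,
  type = "text",   # use "html" (or "latex") inside a results='asis' chunk
  summary.stat = c("n", "mean", "sd", "min", "p25", "median", "p75", "max"),
  digits = 3
)
```

The value returned by stargazer is the table as a character vector, so summary_out can also be written to a file if the table is needed outside the report.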
One drawback of using stargazer for summary statistics is that it does not report skewness and kurtosis. We can get both from the describe function in the psych package.
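A minimal sketch of the describe call follows. The column selection is an assumption made to match the columns in the table, and the built-in mtcars data is only a stand-in that lets the snippet run when pga is not in the workspace.

```r
library(psych)

# Use the article's pga if it exists; otherwise fall back to the built-in
# mtcars data (a stand-in, not the PGA data) so the sketch runs on its own.
dat <- if (exists("pga")) pga else mtcars

# describe() returns one row per variable, including skew and kurtosis.
desc <- describe(dat)

# Keep only the columns of interest; as.data.frame() drops the psych-specific
# classes so ordinary column indexing and printing apply.
desc_subset <- as.data.frame(desc)[, c("n", "mean", "sd", "median",
                                       "min", "max", "skew", "kurtosis")]
print(desc_subset)
```

Running it on pga rather than pgaLog keeps the untransformed PrizeMoney in the output, which is where the strong right skew shows up.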
Variable | n | mean | sd | median | min | max | skew | kurtosis |
---|---|---|---|---|---|---|---|---|
PrizeMoney | 196 | 50891.168367 | 6.390295e+04 | 36644.500 | 2240.000 | 662771.000 | 5.2943317 | 42.5710106 |
DrivingAccuracy | 196 | 63.380102 | 5.413023e+00 | 63.240 | 49.750 | 78.430 | 0.0942275 | 0.0281998 |
GIR | 196 | 65.186071 | 2.722364e+00 | 65.355 | 56.870 | 74.150 | -0.2459686 | 0.6762018 |
PuttingAverage | 196 | 1.779852 | 2.472810e-02 | 1.778 | 1.712 | 1.851 | 0.1562248 | -0.2411670 |
BirdieConversion | 196 | 28.982296 | 2.206556e+00 | 29.010 | 23.170 | 35.660 | -0.0215926 | 0.2586721 |
SandSaves | 196 | 48.971735 | 5.828313e+00 | 48.655 | 33.910 | 63.640 | 0.0028410 | -0.2420466 |
Scrambling | 196 | 57.494439 | 3.162257e+00 | 57.650 | 49.020 | 66.450 | 0.0037607 | 0.0927853 |
PuttsPerRound | 196 | 29.201071 | 4.417023e-01 | 29.190 | 27.960 | 30.190 | 0.1282510 | -0.1031871 |
Where the stargazer package really shines is in its ability to compare different models in a single table.
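# Full model on the original response
mod1 <- lm(PrizeMoney ~ ., data = pga)
# Full model on the log-transformed response
mod2 <- lm(logPrizeMoney ~ ., data = pgaLog)
# Reduced model on the log-transformed response
mod3 <- lm(logPrizeMoney ~ GIR + BirdieConversion + SandSaves + Scrambling + PuttsPerRound, data = pgaLog)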
This example uses three models: two are fit to the log-transformed response (logPrizeMoney) and one to the original response (PrizeMoney). The stargazer package makes it easy to tabulate these models side by side so that the differences between them are clear.
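A comparison table of this kind comes from passing all the fitted models to a single stargazer call. The sketch below is an assumption about that call: the column labels Good, Better and Best are taken from the table, type = "text" is used for console output, and the stand-in models on the built-in mtcars data are only there so the snippet runs when mod1, mod2 and mod3 are not in the workspace.

```r
library(stargazer)

# If the article's models are not in the workspace, fit stand-in models on the
# built-in mtcars data (not the PGA data) so the sketch runs on its own.
if (!exists("mod1") || !exists("mod2") || !exists("mod3")) {
  mod1 <- lm(mpg ~ wt + hp + qsec, data = mtcars)       # original response
  mod2 <- lm(log(mpg) ~ wt + hp + qsec, data = mtcars)  # log response, full
  mod3 <- lm(log(mpg) ~ wt + hp, data = mtcars)         # log response, reduced
}

# One column per model; column.labels sets the names printed above (1), (2), (3).
comparison_out <- stargazer(
  mod1, mod2, mod3,
  type = "text",   # use "html" (or "latex") inside a results='asis' chunk
  column.labels = c("Good", "Better", "Best")
)
```

Each model gets its own column, and a predictor that a model does not include is left blank in that column.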
Dependent variable: | PrizeMoney | logPrizeMoney | logPrizeMoney |
---|---|---|---|
Model | Good (1) | Better (2) | Best (3) |
DrivingAccuracy | -1,835.830** (889.161) | -0.004 (0.012) | |
GIR | 9,671.334*** (3,309.355) | 0.199*** (0.044) | 0.197*** (0.029) |
PuttingAverage | -47,435.300 (521,566.400) | -0.466 (6.906) | |
BirdieConversion | 10,426.030*** (3,049.642) | 0.157*** (0.040) | 0.163*** (0.033) |
SandSaves | 1,182.058 (744.818) | 0.015 (0.010) | 0.016 (0.010) |
Scrambling | 4,741.258** (2,400.818) | 0.052 (0.032) | 0.050** (0.025) |
PuttsPerRound | 5,267.517 (35,765.740) | -0.343 (0.474) | -0.350 (0.231) |
Constant | -1,165,233.000** (587,382.900) | 0.194 (7.777) | -0.583 (7.159) |
Observations | 196 | 196 | 196 |
R² | 0.406 | 0.558 | 0.557 |
Adjusted R² | 0.384 | 0.541 | 0.546 |
Residual Std. Error | 50,142.970 (df = 188) | 0.664 (df = 188) | 0.661 (df = 190) |
F Statistic | 18.387*** (df = 7; 188) | 33.866*** (df = 7; 188) | 47.875*** (df = 5; 190) |
Note: standard errors in parentheses; *p<0.1; **p<0.05; ***p<0.01
The stargazer package makes it easier to display data summaries and model outputs side by side, which helps in decision making.