{gvlma}
- The {gvlma} package is a comprehensive, automatic testing suite for many of the assumptions of general linear models.
- It provides both statistical tests and diagnostic plots through a very simple interface: a single function call on a fitted model.
The package implements the procedure from the paper "Global Validation of Linear Model Assumptions" by Peña & Slate and lets you quickly check for:
- Linearity - the Link Function test checks the null hypothesis that the response is adequately modelled as a linear combination of the predictors. The Global Stat combines this test with the ones below into a single overall check.
- Homoscedasticity - the Heteroscedasticity test checks the null that the residual variance is roughly constant over the range of values.
- Normality - the Skewness and Kurtosis tests help you judge whether the residuals fit a normal distribution.
If one of these nulls is rejected, you probably need to transform your data in some way (for example, with a log transform). Normality can also be assessed by looking at the normal probability plot the package generates.
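As a rough illustration of the transformation remedy, the sketch below simulates a response with multiplicative, right-skewed errors, checks the raw fit, and then refits on the log scale. The data frame `sim`, the seed, and the object names `raw_fit` and `log_fit` are made up for this example, and the test values will depend on the simulated sample.

```r
library(gvlma)

# Simulated data (an assumption for illustration): the errors act
# multiplicatively, so y is right-skewed and its spread grows with x.
set.seed(42)
sim <- data.frame(x = runif(200, min = 0, max = 4))
sim$y <- exp(1 + 0.5 * sim$x + rnorm(200, sd = 0.4))

# Fit on the raw scale and run the global validation.
raw_fit <- lm(y ~ x, data = sim)
summary(gvlma(raw_fit))

# Refit with a log-transformed response and check the assumptions again.
log_fit <- lm(log(y) ~ x, data = sim)
summary(gvlma(log_fit))
```

Comparing the two assessment tables side by side is usually the quickest way to see whether the transformation helped.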
gvlma( )
The gvlma( ) function in the {gvlma} package performs a global validation of linear model assumptions, as well as separate evaluations of skewness, kurtosis, the link function, and heteroscedasticity.
# Global test of model assumptions
gvmodel <- gvlma(fit)
summary(gvmodel)
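The snippet above assumes an existing lm object named `fit`. A minimal self-contained sketch, using the built-in `mtcars` data and an arbitrary choice of predictors purely as a stand-in, might look like this. The `alphalevel` argument sets the significance level used for the decisions; the name `gvmodel_strict` is only illustrative.

```r
library(gvlma)

# Stand-in model (an assumption for illustration): any lm fit would do.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Global test of model assumptions at the default 5% level.
gvmodel <- gvlma(fit)
summary(gvmodel)

# The same check with a stricter significance level.
gvmodel_strict <- gvlma(fit, alphalevel = 0.01)
summary(gvmodel_strict)
```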
Example
This example uses the cheddar data set, available in the {faraway} R package.
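library("gvlma")
# Load the cheddar data and fit the model whose output is shown below
data(cheddar, package = "faraway")
model <- lm(taste ~ Acetic + H2S + Lactic, data = cheddar)
summary(gvlma(model))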
##
## Call:
## lm(formula = taste ~ Acetic + H2S + Lactic, data = cheddar)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -17.390  -6.612  -1.009   4.908  25.449
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -28.8768    19.7354  -1.463  0.15540
## Acetic        0.3277     4.4598   0.073  0.94198
## H2S           3.9118     1.2484   3.133  0.00425 **
## Lactic       19.6705     8.6291   2.280  0.03108 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.13 on 26 degrees of freedom
## Multiple R-squared: 0.6518, Adjusted R-squared: 0.6116
## F-statistic: 16.22 on 3 and 26 DF, p-value: 3.81e-06
##
##
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance = 0.05
##
## Call:
## gvlma(x = model)
##
##                      Value p-value                Decision
## Global Stat        1.33099  0.8561 Assumptions acceptable.
## Skewness           1.12180  0.2895 Assumptions acceptable.
## Kurtosis           0.02119  0.8843 Assumptions acceptable.
## Link Function      0.02906  0.8646 Assumptions acceptable.
## Heteroscedasticity 0.15894  0.6901 Assumptions acceptable.
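If the decisions need to be used programmatically rather than read off the printed table, they can probably be pulled out of the fitted object. The sketch below assumes the results are stored in a `GlobalTest` list component with elements named `GlobalStat4` and `DirectionalStat1` to `DirectionalStat4`, each holding a `pvalue`. These names are an assumption about the package internals, so inspect the `str()` output first. The `mtcars` model is only a stand-in.

```r
library(gvlma)

# Stand-in model (an assumption for illustration): any lm fit would do.
fit <- lm(mpg ~ wt + hp, data = mtcars)
gvmodel <- gvlma(fit)

# Inspect how the test results are stored before relying on any names.
str(gvmodel$GlobalTest, max.level = 1)

# ASSUMED component names: GlobalStat4 holds the global test and
# DirectionalStat1..4 hold the four individual tests.
test_names <- c("GlobalStat4", paste0("DirectionalStat", 1:4))
tests <- gvmodel$GlobalTest[test_names]
p_values <- sapply(tests, function(res) as.numeric(res$pvalue))

# TRUE where the corresponding assumption is rejected at the 5% level.
p_values < 0.05
```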
Diagnostic Plots for {gvlma}
- The diagnostic plots let you see visually how well your data meet these assumptions.
- Another useful capability is the link function test, which checks whether the assumed link is misspecified, for example because the response is really categorical rather than continuous.
plot(gvlma(model),which=1)