Regression models are an exercise in variance accounting: how much of the variation in the outcome is explained by the inputs, individually (the slope divided by its standard error is t) and collectively (the average explained variation divided by the average unexplained variation, each averaged over its degrees of freedom, is F)? Both tests, of course, assume normally distributed errors. This document provides a function for drawing the black box. Just as in common parlance, the black box is the unexplained. Let’s take an example.
OregonSalaries <- structure(list(Obs = 1:32, Salary = c(41514.38701, 40964.06985,
39170.19178, 37936.57206, 33981.77752, 36077.27107, 39174.05733,
39037.372, 29131.74865, 36200.44592, 38561.3987, 33247.92306,
33609.4874, 33669.22275, 37805.83017, 35846.13454, 47342.65909,
46382.3851, 45812.91029, 46409.65664, 43796.05285, 43124.02135,
49443.81792, 44805.79217, 44440.32001, 46679.59218, 47337.09786,
47298.72531, 41461.0474, 43598.293, 43431.18499, 49266.41189),
Gender = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Female", "Male"
), class = "factor")), .Names = c("Obs", "Salary", "Gender"
), class = "data.frame", row.names = c(NA, -32L))
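black.box.maker <- function(mod1) {
  # Number of observations actually used in the fit
  d1 <- dim(mod1$model)[[1]]
  # Total sum of squares of the outcome (first column of the model frame)
  sumsq1 <- var(mod1$model[,1], na.rm=TRUE)*(d1-1)
  rt1 <- sqrt(sumsq1)
  # Explained sum of squares, computed from the fitted values
  sumsq2 <- var(mod1$fitted.values, na.rm=TRUE)*(d1-1)
  rsquare <- round(sumsq2/sumsq1, digits=4)
  rt2 <- sqrt(sumsq2)
  # Empty canvas whose side is the square root of the total sum of squares
  plot(x=NA, y=NA, xlim=c(0,rt1), ylim=c(0,rt1), main=paste("R-squared:",rsquare), xlab="", ylab="", bty="n", cex=0.5)
  # Black square: total variation; white square drawn on top: explained variation
  polygon(x=c(0,0,rt1,rt1), y=c(0,rt1,rt1,0), col="black")
  polygon(x=c(0,0,rt2,rt2), y=c(0,rt2,rt2,0), col="white")
}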
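The sums of squares that black.box.maker relies on can also be computed directly. The following is a minimal sketch, not part of the original function: `variance.accounts` is a hypothetical helper name, it assumes an unweighted `lm` fit that includes an intercept, and the built-in `mtcars` data is only a stand-in so that the block runs on its own.

```r
# Hypothetical helper (an illustration, not part of the original post):
# rebuild the variance accounts behind R-squared and F from a fitted model.
variance.accounts <- function(mod) {
  y <- mod$model[, 1]                     # outcome as used in the fit
  n <- length(y)                          # number of observations
  k <- length(coef(mod)) - 1              # number of slopes (assumes an intercept)
  sst <- sum((y - mean(y))^2)             # total sum of squares
  ssr <- sum((fitted(mod) - mean(y))^2)   # explained sum of squares
  sse <- sum(residuals(mod)^2)            # unexplained sum of squares
  c(SST = sst, SSR = ssr, SSE = sse,
    R2 = ssr / sst,                       # share of variation explained
    F = (ssr / k) / (sse / (n - k - 1)))  # average explained / average unexplained
}

# Stand-in example on built-in data (an assumption made for self-containment).
fit <- lm(mpg ~ wt, data = mtcars)
acc <- variance.accounts(fit)
print(acc)
```

Passing any such `lm` fit to `variance.accounts()` should return an `R2` that agrees, up to rounding, with the value black.box.maker prints in the plot title.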
First, a regression model. I will estimate the following regression:
\[ \text{Salary} = \alpha + \beta_{1} \cdot \text{Gender} + \epsilon \]
GenderReg <- lm(Salary ~ Gender, data=OregonSalaries)
summary(GenderReg)
##
## Call:
## lm(formula = Salary ~ Gender, data = OregonSalaries)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -7488.7 -2107.9   433.3  1743.9  4893.9
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  36620.5      705.1   51.94  < 2e-16 ***
## GenderMale    9043.9      997.1    9.07 4.22e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2820 on 30 degrees of freedom
## Multiple R-squared: 0.7328, Adjusted R-squared: 0.7239
## F-statistic: 82.26 on 1 and 30 DF, p-value: 4.223e-10
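The output above can be read back through the variance accounting described at the start. As a rough check, using only the rounded values printed by summary() (so the last digit may differ), and writing \(n = 32\) for the observations and \(k = 1\) for the number of slopes (symbols introduced here for illustration):

\[ t = \frac{\hat{\beta}_{1}}{SE(\hat{\beta}_{1})} = \frac{9043.9}{997.1} \approx 9.07 \]

\[ F = \frac{R^{2}/k}{(1 - R^{2})/(n - k - 1)} = \frac{0.7328/1}{0.2672/30} \approx 82.3 \]

With a single slope, F is the square of t: \(9.07^{2} \approx 82.3\), which agrees with the reported F-statistic of 82.26 up to rounding.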
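Now to the plot. The full square has an area equal to the total sum of squares of Salary, the white square is the part the model explains, and the black region that remains is the unexplained part: the black box.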
black.box.maker(GenderReg)
Voilà. To adapt this to your own data, three things are required in a copy and paste from the R Commander. First, you will need to import data of some form. Second, you will need to estimate a regression model. Finally, you will need to run the function on that model with black.box.maker(model.name), just as the code chunk above illustrates.
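The three steps might look like the following sketch. The names `my.data` and `my.model` are hypothetical, the built-in `mtcars` data stands in for whatever you import, and the formula is only an illustration. The call to black.box.maker is guarded because it assumes the function has already been defined in the session.

```r
# Step 1: data. The built-in mtcars data frame stands in for imported data.
my.data <- mtcars

# Step 2: a regression model (this formula is only an illustration).
my.model <- lm(mpg ~ wt + hp, data = my.data)

# Step 3: draw the black box. This relies on black.box.maker() having been
# defined earlier in the session, so guard the call.
if (exists("black.box.maker")) {
  black.box.maker(my.model)
} else {
  message("Define black.box.maker() first, then rerun this step.")
}
```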