Variance in the Outcome: The Black Box

Regression models engage in an exercise in variance accounting: how much of the variation in the outcome is explained by the inputs, individually (the slope divided by its standard error is t) and collectively (the average explained variation divided by the average unexplained variation, each averaged over its degrees of freedom, is F)? Inference with t and F, of course, assumes normal errors. This document provides a function for drawing the black box. Just as in common parlance, the black box is the unexplained. Let’s take an example.
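Before the example, it may help to write the accounting down. The notation here (SST, SSR, SSE, n observations, k inputs) is introduced for illustration and assumes a model with an intercept:

\[ SST = \sum_{i}(y_{i} - \bar{y})^{2} = \underbrace{\sum_{i}(\hat{y}_{i} - \bar{y})^{2}}_{SSR} + \underbrace{\sum_{i}(y_{i} - \hat{y}_{i})^{2}}_{SSE} \]

\[ R^{2} = \frac{SSR}{SST}, \qquad F = \frac{SSR/k}{SSE/(n-k-1)}, \qquad t_{j} = \frac{\hat{\beta}_{j}}{SE(\hat{\beta}_{j})} \]

SSR is the explained part, SSE is the unexplained part, and the black box is a picture of how the two divide SST.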

OregonSalaries <- structure(list(Obs = 1:32, Salary = c(41514.38701, 40964.06985, 
39170.19178, 37936.57206, 33981.77752, 36077.27107, 39174.05733, 
39037.372, 29131.74865, 36200.44592, 38561.3987, 33247.92306, 
33609.4874, 33669.22275, 37805.83017, 35846.13454, 47342.65909, 
46382.3851, 45812.91029, 46409.65664, 43796.05285, 43124.02135, 
49443.81792, 44805.79217, 44440.32001, 46679.59218, 47337.09786, 
47298.72531, 41461.0474, 43598.293, 43431.18499, 49266.41189), 
    Gender = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Female", "Male"
    ), class = "factor")), .Names = c("Obs", "Salary", "Gender"
), class = "data.frame", row.names = c(NA, -32L))
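black.box.maker <- function(mod1) {
  # Number of observations used in the fit
  d1 <- dim(mod1$model)[[1]]
  # Total sum of squares of the outcome (first column of the model frame)
  sumsq1 <- var(mod1$model[,1], na.rm=TRUE)*(d1-1)
  rt1 <- sqrt(sumsq1)
  # Explained sum of squares from the fitted values (assumes an intercept)
  sumsq2 <- var(mod1$fitted.values, na.rm=TRUE)*(d1-1)
  rsquare <- round(sumsq2/sumsq1, digits=4)
  rt2 <- sqrt(sumsq2)
  # Empty canvas with sides equal to the square root of the total sum of squares
  plot(x=NA, y=NA, xlim=c(0,rt1), ylim=c(0,rt1), main=paste("R-squared:",rsquare), xlab="", ylab="", bty="n", cex=0.5)
  # Black square: total variation in the outcome
  polygon(x=c(0,0,rt1,rt1), y=c(0,rt1,rt1,0), col="black")
  # White square: explained variation; the black that remains is unexplained
  polygon(x=c(0,0,rt2,rt2), y=c(0,rt2,rt2,0), col="white")
}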
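
Why square roots? The function draws a black square with side rt1 and a white square with side rt2 on top of it, so the areas, not the sides, carry the sums of squares. Writing SST for sumsq1 and SSR for sumsq2 (labels introduced here, and assuming the model has an intercept so that sumsq2 is the explained sum of squares):

\[ \frac{\text{white area}}{\text{total area}} = \frac{(\sqrt{SSR})^{2}}{(\sqrt{SST})^{2}} = \frac{SSR}{SST} = R^{2} \]

The black region left visible therefore covers a share \( 1 - R^{2} \) of the square: the unexplained.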

Invoking the Function

First, a regression model. I will estimate the following regression:

\[ \text{Salary} = \alpha + \beta_{1} \cdot \text{Gender} + \epsilon \]

GenderReg <- lm(Salary ~ Gender, data=OregonSalaries)
summary(GenderReg)
## 
## Call:
## lm(formula = Salary ~ Gender, data = OregonSalaries)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7488.7 -2107.9   433.3  1743.9  4893.9 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  36620.5      705.1   51.94  < 2e-16 ***
## GenderMale    9043.9      997.1    9.07 4.22e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2820 on 30 degrees of freedom
## Multiple R-squared:  0.7328, Adjusted R-squared:  0.7239 
## F-statistic: 82.26 on 1 and 30 DF,  p-value: 4.223e-10
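
The R-squared and F-statistic in this summary can be rebuilt from sums of squares. The following is a minimal sketch, not part of the original code: `variance.accounting` is a hypothetical helper name, it assumes a model with an intercept, and the built-in `mtcars` data are used only so the snippet runs on its own.

```r
# Hypothetical helper: rebuild R-squared and F from the sums of squares.
# Assumes the model was fit by lm() and includes an intercept.
variance.accounting <- function(mod) {
  y   <- model.response(model.frame(mod))   # outcome as used in the fit
  n   <- length(y)
  sst <- sum((y - mean(y))^2)               # total variation
  ssr <- sum((fitted(mod) - mean(y))^2)     # explained variation
  sse <- sum(residuals(mod)^2)              # unexplained variation
  df.model <- n - 1 - df.residual(mod)      # number of slopes
  f <- (ssr / df.model) / (sse / df.residual(mod))
  c(SST = sst, SSR = ssr, SSE = sse, R2 = ssr / sst, F = f)
}

# Stand-in model on a built-in data set (an assumption for illustration)
demo <- lm(mpg ~ wt, data = mtcars)
acct <- variance.accounting(demo)

# The hand-built values agree with what summary() reports
stopifnot(all.equal(unname(acct["R2"]), summary(demo)$r.squared))
stopifnot(all.equal(unname(acct["F"]), unname(summary(demo)$fstatistic["value"])))
```

Calling `variance.accounting(GenderReg)` should return the same R-squared and F as the summary above, up to rounding. Because there is a single input here, F is also the square of the slope's t value: 9.07 squared is about 82.26.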

Now to the plot.

black.box.maker(GenderReg)

Voila. How to change it up? Once the function definition above has been run, three things are required in a copy and paste from the R Commander. First, import data of some form. Second, estimate a regression model. Finally, execute the function black.box.maker on the model with black.box.maker(model.name), just as the code chunk above illustrates.
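
The three steps might look like the sketch below for a different data set. Everything in it is an assumption for illustration: the built-in `mtcars` data stand in for an imported file, `CarReg` is a hypothetical model name, and the `exists()` guard is there only so the snippet runs even if the function has not yet been defined.

```r
# Step 1: import data. mtcars ships with R and stands in for an imported file;
# with your own data this could be read.csv("your-file.csv") (hypothetical name).
data(mtcars)

# Step 2: estimate a regression model, here with two inputs.
CarReg <- lm(mpg ~ wt + hp, data = mtcars)

# Step 3: draw the black box. black.box.maker() is the function defined
# earlier in this document; it must be run before this call.
if (exists("black.box.maker")) {
  black.box.maker(CarReg)
}
```

The function reads the outcome from the first column of the model frame, so it should work with any number of inputs as long as the model includes an intercept.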