Tobit Models

In this dataset, “prog” is the type of program the student is in. It is a categorical (nominal) variable that takes on three values: academic (prog = 1), general (prog = 2), and vocational (prog = 3). Note that the lowest value of “apt” (aptitude) is 352. That is, no student received a score of 200 (the lowest score possible), so although censoring from below was possible, it does not occur in the dataset. The histogram does show censoring from above in the values of apt: there are far more cases with scores of 750 to 800 than one would expect from the rest of the distribution.

Tobit regression coefficients are interpreted in a similar manner to OLS regression coefficients; however, the linear effect is on the uncensored latent variable, not the observed outcome.
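To make the latent-variable reading concrete, the model can be written out as below. This is a sketch in standard Tobit notation; the symbols (y*, x, beta, sigma) are not used in the original text and are introduced here only for illustration, with 800 as the upper censoring limit from the model call.

```latex
% Latent (uncensored) aptitude for student i
y_i^{*} = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^{2})

% Observed aptitude, censored from above at 800
y_i = \min(y_i^{*},\, 800)
```

Each coefficient in beta is therefore the expected change in the latent score y* for a one unit change in the corresponding predictor, holding the other predictors fixed.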

. For a one unit increase in read, there is a 2.6980 point increase in the predicted value of apt.

. A one unit increase in math is associated with a 5.9146 unit increase in the predicted value of apt.

. The terms for prog have a slightly different interpretation (reference program: academic):

..The predicted value of apt is 12.7146 points lower for students in a general program than for students in an academic program.

..The predicted value of apt is 46.1433 points lower for students in a vocational program than for students in an academic program.

. The coefficient labeled “(Intercept):1” is the intercept or constant for the model.

. The coefficient labeled “(Intercept):2” is an ancillary statistic: it is the log of the estimated standard deviation of the latent errors (the output names this linear predictor loglink(sd)). If we exponentiate this value, we get a statistic that is analogous to the square root of the residual variance in OLS regression. The resulting value of 65.6773 can be compared to the standard deviation of academic aptitude, which was 99.21; this is a substantial reduction.
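The back-transformation of the ancillary parameter can be checked by hand. The short sketch below uses only base R and the two numbers quoted above; the variable names are illustrative and do not come from the original analysis.

```r
# "(Intercept):2" from the model summary: log of the estimated error SD
log_sd <- 4.18476

# Back-transform to the SD scale (analogous to the residual SD in OLS)
sigma_hat <- exp(log_sd)

# Unconditional standard deviation of apt, as quoted in the text
sd_apt <- 99.21

# Ratio of the model's error SD to the raw SD of the outcome
ratio <- sigma_hat / sd_apt

print(round(sigma_hat, 4))
print(round(ratio, 2))
```

A ratio well below 1 is what the text describes as a substantial reduction.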

summary(m <- vglm(apt ~ read + math + prog, tobit(Upper = 800), data = dat))


Call:
vglm(formula = apt ~ read + math + prog, family = tobit(Upper = 800), 
    data = dat)

Coefficients: 
                Estimate Std. Error z value Pr(>|z|)    
(Intercept):1  209.55956   32.54590   6.439 1.20e-10 ***
(Intercept):2    4.18476    0.05235  79.944  < 2e-16 ***
read             2.69796    0.61928   4.357 1.32e-05 ***
math             5.91460    0.70539   8.385  < 2e-16 ***
proggeneral    -12.71458   12.40857  -1.025 0.305523    
progvocational -46.14327   13.70667  -3.366 0.000761 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Names of linear predictors: mu, loglink(sd)

Log-likelihood: -1041.063 on 394 degrees of freedom

Number of Fisher scoring iterations: 5 

No Hauck-Donner effect found in any of the estimates

We can test the significance of program type overall by fitting a model without program in it and using a likelihood ratio test (LRT). The LRT with two degrees of freedom is associated with a p-value of 0.0032, indicating that the overall effect of prog is statistically significant.

[1] 0.003155176
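The likelihood ratio test described above can be sketched as follows. Because the original data are not reproduced here, the example simulates a stand-in data frame: the object name sim, the sample size, and all simulated values are assumptions made for illustration, so the resulting p-value will not match the one reported above. The vglm() and tobit() calls mirror the model call shown earlier.

```r
library(VGAM)

# Hypothetical stand-in data; none of these values come from the real dataset
set.seed(42)
n <- 200
sim <- data.frame(
  read = rnorm(n, mean = 52, sd = 10),
  math = rnorm(n, mean = 52, sd = 9),
  prog = factor(sample(c("academic", "general", "vocational"), n, replace = TRUE))
)
latent <- 210 + 2.7 * sim$read + 5.9 * sim$math -
  13 * (sim$prog == "general") - 46 * (sim$prog == "vocational") +
  rnorm(n, sd = 65)
sim$apt <- pmin(latent, 800)  # censor from above at 800

# Full model and the reduced model without prog
full    <- vglm(apt ~ read + math + prog, tobit(Upper = 800), data = sim)
reduced <- vglm(apt ~ read + math, tobit(Upper = 800), data = sim)

# LRT statistic: twice the difference in log-likelihoods.
# prog contributes two coefficients, hence two degrees of freedom.
lr_stat <- 2 * (as.numeric(logLik(full)) - as.numeric(logLik(reduced)))
p_value <- pchisq(lr_stat, df = 2, lower.tail = FALSE)
print(p_value)
```

With the real data, replacing sim with dat gives the same kind of comparison as the one reported above.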

Model Fit Plots

We can assess model fit with residual plots, examining the raw (response) as well as the relative (Pearson) residuals, and checking assumptions such as normality and homogeneity of variance. Model fit looks good.
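One way such plots might be produced is sketched below. It again uses simulated stand-in data (the object sim and all of its values are hypothetical), and the choice of four panels is an assumption rather than the layout used in the original analysis.

```r
library(VGAM)

# Hypothetical stand-in data; not the real dataset
set.seed(7)
n <- 200
sim <- data.frame(
  read = rnorm(n, mean = 52, sd = 10),
  math = rnorm(n, mean = 52, sd = 9)
)
sim$apt <- pmin(210 + 2.7 * sim$read + 5.9 * sim$math + rnorm(n, sd = 65), 800)

fit <- vglm(apt ~ read + math, tobit(Upper = 800), data = sim)

# Fitted values plus response and Pearson residuals
yhat   <- fitted(fit)[, 1]
r_resp <- as.numeric(resid(fit, type = "response"))
r_pear <- resid(fit, type = "pearson")[, 1]

par(mfrow = c(2, 2))
plot(yhat, r_resp, main = "Fitted vs response residuals")   # homogeneity of variance
plot(yhat, r_pear, main = "Fitted vs Pearson residuals")
qqnorm(r_resp); qqline(r_resp)                              # normality check
hist(r_pear, main = "Pearson residuals")
```

Residuals for censored cases pile up along a boundary in the scatter plots, which is expected and not in itself a sign of poor fit.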