Question 6

In this exercise, you will further analyze the Wage data set considered throughout this chapter.

  1. Perform polynomial regression to predict wage using age. Use cross-validation to select the optimal degree d for the polynomial. What degree was chosen, and how does this compare to the results of hypothesis testing using ANOVA? Make a plot of the resulting polynomial fit to the data.
library(ISLR)
attach(Wage)
summary(Wage)
##       year           age                     maritl           race     
##  Min.   :2003   Min.   :18.00   1. Never Married: 648   1. White:2480  
##  1st Qu.:2004   1st Qu.:33.75   2. Married      :2074   2. Black: 293  
##  Median :2006   Median :42.00   3. Widowed      :  19   3. Asian: 190  
##  Mean   :2006   Mean   :42.41   4. Divorced     : 204   4. Other:  37  
##  3rd Qu.:2008   3rd Qu.:51.00   5. Separated    :  55                  
##  Max.   :2009   Max.   :80.00                                          
##                                                                        
##               education                     region    
##  1. < HS Grad      :268   2. Middle Atlantic   :3000  
##  2. HS Grad        :971   1. New England       :   0  
##  3. Some College   :650   3. East North Central:   0  
##  4. College Grad   :685   4. West North Central:   0  
##  5. Advanced Degree:426   5. South Atlantic    :   0  
##                           6. East South Central:   0  
##                           (Other)              :   0  
##            jobclass               health      health_ins      logwage     
##  1. Industrial :1544   1. <=Good     : 858   1. Yes:2083   Min.   :3.000  
##  2. Information:1456   2. >=Very Good:2142   2. No : 917   1st Qu.:4.447  
##                                                            Median :4.653  
##                                                            Mean   :4.654  
##                                                            3rd Qu.:4.857  
##                                                            Max.   :5.763  
##                                                                           
##       wage       
##  Min.   : 20.09  
##  1st Qu.: 85.38  
##  Median :104.92  
##  Mean   :111.70  
##  3rd Qu.:128.68  
##  Max.   :318.34  
## 

For this problem, I will examine the relationship between wage and age.

To begin, I will use cross-validation to find the polynomial degree that yields the lowest test MSE.

library(boot)
set.seed(1)
cv.err<-rep(NA, 10)
# 10-fold CV error for polynomial degrees 1 through 10
for (i in 1:10) {
  fit<-glm(wage~poly(age, i), data=Wage)
  cv.err[i]<-cv.glm(Wage, fit, K=10)$delta[1]
}
plot(1:10, cv.err, xlab="Degree", ylab = "Test MSE", type = "l")
d.min<-which.min(cv.err)
points(d.min, cv.err[d.min], col="red", cex=2, pch=20)

Through cross-validation, the 4th degree is the optimal degree for the polynomial since it has the lowest test MSE. I will now use ANOVA to test the null hypothesis that a simpler model (\(\mathcal{M}_1\)) is sufficient to explain the data against the alternative hypothesis that a more complex model (\(\mathcal{M}_2\)) is required.

fit.1=lm(wage~age, data = Wage)
fit.2=lm(wage~poly(age,2), data = Wage)
fit.3=lm(wage~poly(age,3), data = Wage)
fit.4=lm(wage~poly(age,4), data = Wage)
fit.5=lm(wage~poly(age,5), data = Wage)
anova(fit.1,fit.2,fit.3,fit.4,fit.5)
## Analysis of Variance Table
## 
## Model 1: wage ~ age
## Model 2: wage ~ poly(age, 2)
## Model 3: wage ~ poly(age, 3)
## Model 4: wage ~ poly(age, 4)
## Model 5: wage ~ poly(age, 5)
##   Res.Df     RSS Df Sum of Sq        F    Pr(>F)    
## 1   2998 5022216                                    
## 2   2997 4793430  1    228786 143.5931 < 2.2e-16 ***
## 3   2996 4777674  1     15756   9.8888  0.001679 ** 
## 4   2995 4771604  1      6070   3.8098  0.051046 .  
## 5   2994 4770322  1      1283   0.8050  0.369682    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value comparing the linear fit 1 to the quadratic fit 2 is essentially 0, indicating that a linear fit is not sufficient to explain the relationship between wage and age. Similarly, the p-value comparing the quadratic fit 2 to the cubic fit 3 is very low (0.0017), so the quadratic fit is also insufficient. The p-value comparing the cubic fit 3 to the degree-4 polynomial fit 4 is approximately 5%, while the degree-5 polynomial seems unnecessary given its p-value of 0.37. From this ANOVA, the cubic or quartic polynomial appears to provide a reasonable fit to the data.
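
Because poly() uses orthogonal polynomials, the ANOVA can be cross-checked against a single degree-5 fit: the square of each coefficient's t-statistic equals the corresponding F-statistic above (for instance, the quadratic term's t-statistic should be roughly \(\sqrt{143.59} \approx 11.98\)). A quick check:

coef(summary(fit.5))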

Now, I will plot the predictions of these two polynomials.

plot(wage~age, data=Wage, col="grey80", main="Polynomial 3 & 4 on Age Against Wage")
agelim=range(Wage$age)
age.grid=seq(from=agelim[1], to=agelim[2])
fit=lm(wage~poly(age,3), data=Wage)
preds=predict(fit, newdata = list(age=age.grid))
lines(age.grid, preds, col="royalblue1", lwd=2)
fit2=lm(wage~poly(age,4), data=Wage)
preds2=predict(fit2, newdata=list(age=age.grid))
lines(age.grid, preds2, col="indianred1", lwd=2)

As seen in the graph above, the two polynomials give very similar predictions except from roughly age 75 onward. Either would be a sufficient fit to this data; if forced to pick one on the grounds of simplicity and interpretability, I would go with the degree-3 polynomial.
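
To gauge how much the two fits genuinely diverge at the right edge, standard-error bands can be overlaid. As a sketch (appended to the plotting chunk above, where fit holds the cubic model), the following draws approximate 95% bands around the cubic fit:

se.preds=predict(fit, newdata=list(age=age.grid), se=TRUE)
lines(age.grid, se.preds$fit+2*se.preds$se.fit, col="royalblue1", lty=3)
lines(age.grid, se.preds$fit-2*se.preds$se.fit, col="royalblue1", lty=3)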

  2. Fit a step function to predict wage using age, and perform cross-validation to choose the optimal number of cuts. Make a plot of the fit obtained.
cuts=rep(NA, 10)
# 10-fold CV error for step functions with 2 to 10 cuts
for (i in 2:10) {
  Wage$age.cut=cut(Wage$age, i)
  fit.cut=glm(wage~age.cut, data=Wage)
  cuts[i]=cv.glm(Wage, fit.cut, K=10)$delta[1]
}
plot(2:10, cuts[-1], main="Number of Cuts", xlab="Cuts", ylab="Test MSE", type="l")
d.min=which.min(cuts)
points(d.min, cuts[d.min], col="red", cex=2, pch=20)

For this problem, we see that the test error is minimized at 8 cuts. I will now fit a step function with 8 cuts on the entire data set.

plot(wage~age, data=Wage, col="grey80", main="Step Function with 8 Cuts")
step.fit=glm(wage~cut(age,8), data=Wage)
step.preds=predict(step.fit, list(age=age.grid))
lines(age.grid, step.preds, col="orchid4", lwd=2)

As described in the book, we use a step function to avoid imposing a global structure on the data. In the graph above, the data are separated into 8 bins, each with its own constant wage estimate. This approach is typical for data like this, where the goal is to understand how wage varies across distinct age groups.
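
Since cut() splits the range into equal-width intervals by default, the bin boundaries and the number of observations per bin behind this fit can be inspected directly:

table(cut(Wage$age, 8))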

Question 10

This question relates to the College data set.

library(ISLR)
data(College)
summary(College)
##  Private        Apps           Accept          Enroll       Top10perc    
##  No :212   Min.   :   81   Min.   :   72   Min.   :  35   Min.   : 1.00  
##  Yes:565   1st Qu.:  776   1st Qu.:  604   1st Qu.: 242   1st Qu.:15.00  
##            Median : 1558   Median : 1110   Median : 434   Median :23.00  
##            Mean   : 3002   Mean   : 2019   Mean   : 780   Mean   :27.56  
##            3rd Qu.: 3624   3rd Qu.: 2424   3rd Qu.: 902   3rd Qu.:35.00  
##            Max.   :48094   Max.   :26330   Max.   :6392   Max.   :96.00  
##    Top25perc      F.Undergrad     P.Undergrad         Outstate    
##  Min.   :  9.0   Min.   :  139   Min.   :    1.0   Min.   : 2340  
##  1st Qu.: 41.0   1st Qu.:  992   1st Qu.:   95.0   1st Qu.: 7320  
##  Median : 54.0   Median : 1707   Median :  353.0   Median : 9990  
##  Mean   : 55.8   Mean   : 3700   Mean   :  855.3   Mean   :10441  
##  3rd Qu.: 69.0   3rd Qu.: 4005   3rd Qu.:  967.0   3rd Qu.:12925  
##  Max.   :100.0   Max.   :31643   Max.   :21836.0   Max.   :21700  
##    Room.Board       Books           Personal         PhD        
##  Min.   :1780   Min.   :  96.0   Min.   : 250   Min.   :  8.00  
##  1st Qu.:3597   1st Qu.: 470.0   1st Qu.: 850   1st Qu.: 62.00  
##  Median :4200   Median : 500.0   Median :1200   Median : 75.00  
##  Mean   :4358   Mean   : 549.4   Mean   :1341   Mean   : 72.66  
##  3rd Qu.:5050   3rd Qu.: 600.0   3rd Qu.:1700   3rd Qu.: 85.00  
##  Max.   :8124   Max.   :2340.0   Max.   :6800   Max.   :103.00  
##     Terminal       S.F.Ratio      perc.alumni        Expend     
##  Min.   : 24.0   Min.   : 2.50   Min.   : 0.00   Min.   : 3186  
##  1st Qu.: 71.0   1st Qu.:11.50   1st Qu.:13.00   1st Qu.: 6751  
##  Median : 82.0   Median :13.60   Median :21.00   Median : 8377  
##  Mean   : 79.7   Mean   :14.09   Mean   :22.74   Mean   : 9660  
##  3rd Qu.: 92.0   3rd Qu.:16.50   3rd Qu.:31.00   3rd Qu.:10830  
##  Max.   :100.0   Max.   :39.80   Max.   :64.00   Max.   :56233  
##    Grad.Rate     
##  Min.   : 10.00  
##  1st Qu.: 53.00  
##  Median : 65.00  
##  Mean   : 65.46  
##  3rd Qu.: 78.00  
##  Max.   :118.00
  1. Split the data into a training set and a test set. Using out-of-state tuition as the response and the other variables as the predictors, perform forward stepwise selection on the training set in order to identify a satisfactory model that uses just a subset of the predictors.
library(caret)
## Loading required package: lattice
## 
## Attaching package: 'lattice'
## The following object is masked from 'package:boot':
## 
##     melanoma
## Loading required package: ggplot2
library(leaps)
set.seed(1)
train=createDataPartition(College$Outstate, p=0.75, list=FALSE)
training=College[train,]
testing=College[-train,]
dummy.matrix=model.matrix(Outstate~., data=training)
forward.model=regsubsets(x=dummy.matrix, y=training$Outstate, method="forward", nvmax=17)
## Warning in leaps.setup(x, y, wt = weights, nbest = nbest, nvmax = nvmax, :
## 1 linear dependencies found
## Reordering variables and trying again:
## Warning in rval$lopt[] <- rval$vorder[rval$lopt]: number of items to
## replace is not a multiple of replacement length
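
The warnings above arise because model.matrix() already includes an intercept column, which regsubsets() then adds again. A minimal sketch of the fix (the results below come from the original call): drop that column before passing the matrix in.

dummy.matrix=model.matrix(Outstate~., data=training)[,-1]
forward.model=regsubsets(x=dummy.matrix, y=training$Outstate, method="forward", nvmax=17)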

Now that the data have been split and forward stepwise selection has identified the best subset at each size, I will analyze the fitted models.

forward.summary=summary(forward.model)
forward.summary
## Subset selection object
## 18 Variables  (and intercept)
##             Forced in Forced out
## PrivateYes      FALSE      FALSE
## Apps            FALSE      FALSE
## Accept          FALSE      FALSE
## Enroll          FALSE      FALSE
## Top10perc       FALSE      FALSE
## Top25perc       FALSE      FALSE
## F.Undergrad     FALSE      FALSE
## P.Undergrad     FALSE      FALSE
## Room.Board      FALSE      FALSE
## Books           FALSE      FALSE
## Personal        FALSE      FALSE
## PhD             FALSE      FALSE
## Terminal        FALSE      FALSE
## S.F.Ratio       FALSE      FALSE
## perc.alumni     FALSE      FALSE
## Expend          FALSE      FALSE
## Grad.Rate       FALSE      FALSE
## (Intercept)     FALSE      FALSE
## 1 subsets of each size up to 17
## Selection Algorithm: forward
##           (Intercept) PrivateYes Apps Accept Enroll Top10perc Top25perc
## 1  ( 1 )  " "         " "        " "  " "    " "    " "       " "      
## 2  ( 1 )  " "         "*"        " "  " "    " "    " "       " "      
## 3  ( 1 )  " "         "*"        " "  " "    " "    " "       " "      
## 4  ( 1 )  " "         "*"        " "  " "    " "    " "       " "      
## 5  ( 1 )  " "         "*"        " "  " "    " "    " "       " "      
## 6  ( 1 )  " "         "*"        " "  " "    " "    " "       " "      
## 7  ( 1 )  " "         "*"        " "  "*"    " "    " "       " "      
## 8  ( 1 )  " "         "*"        " "  "*"    "*"    " "       " "      
## 9  ( 1 )  " "         "*"        " "  "*"    "*"    " "       "*"      
## 10  ( 1 ) " "         "*"        "*"  "*"    "*"    " "       "*"      
## 11  ( 1 ) " "         "*"        "*"  "*"    "*"    " "       "*"      
## 12  ( 1 ) " "         "*"        "*"  "*"    "*"    " "       "*"      
## 13  ( 1 ) " "         "*"        "*"  "*"    "*"    " "       "*"      
## 14  ( 1 ) " "         "*"        "*"  "*"    "*"    " "       "*"      
## 15  ( 1 ) " "         "*"        "*"  "*"    "*"    "*"       "*"      
## 16  ( 1 ) " "         "*"        "*"  "*"    "*"    "*"       "*"      
## 17  ( 1 ) " "         "*"        "*"  "*"    "*"    "*"       "*"      
##           F.Undergrad P.Undergrad Room.Board Books Personal PhD Terminal
## 1  ( 1 )  " "         " "         " "        " "   " "      " " " "     
## 2  ( 1 )  " "         " "         " "        " "   " "      " " " "     
## 3  ( 1 )  " "         " "         "*"        " "   " "      " " " "     
## 4  ( 1 )  " "         " "         "*"        " "   " "      " " " "     
## 5  ( 1 )  " "         " "         "*"        " "   " "      "*" " "     
## 6  ( 1 )  " "         " "         "*"        " "   " "      "*" " "     
## 7  ( 1 )  " "         " "         "*"        " "   " "      "*" " "     
## 8  ( 1 )  " "         " "         "*"        " "   " "      "*" " "     
## 9  ( 1 )  " "         " "         "*"        " "   " "      "*" " "     
## 10  ( 1 ) " "         " "         "*"        " "   " "      "*" " "     
## 11  ( 1 ) "*"         " "         "*"        " "   " "      "*" " "     
## 12  ( 1 ) "*"         " "         "*"        " "   " "      "*" "*"     
## 13  ( 1 ) "*"         " "         "*"        " "   " "      "*" "*"     
## 14  ( 1 ) "*"         " "         "*"        "*"   " "      "*" "*"     
## 15  ( 1 ) "*"         " "         "*"        "*"   " "      "*" "*"     
## 16  ( 1 ) "*"         " "         "*"        "*"   "*"      "*" "*"     
## 17  ( 1 ) "*"         "*"         "*"        "*"   "*"      "*" "*"     
##           S.F.Ratio perc.alumni Expend Grad.Rate
## 1  ( 1 )  " "       " "         "*"    " "      
## 2  ( 1 )  " "       " "         "*"    " "      
## 3  ( 1 )  " "       " "         "*"    " "      
## 4  ( 1 )  " "       "*"         "*"    " "      
## 5  ( 1 )  " "       "*"         "*"    " "      
## 6  ( 1 )  " "       "*"         "*"    "*"      
## 7  ( 1 )  " "       "*"         "*"    "*"      
## 8  ( 1 )  " "       "*"         "*"    "*"      
## 9  ( 1 )  " "       "*"         "*"    "*"      
## 10  ( 1 ) " "       "*"         "*"    "*"      
## 11  ( 1 ) " "       "*"         "*"    "*"      
## 12  ( 1 ) " "       "*"         "*"    "*"      
## 13  ( 1 ) "*"       "*"         "*"    "*"      
## 14  ( 1 ) "*"       "*"         "*"    "*"      
## 15  ( 1 ) "*"       "*"         "*"    "*"      
## 16  ( 1 ) "*"       "*"         "*"    "*"      
## 17  ( 1 ) "*"       "*"         "*"    "*"
par(mfrow=c(2,2))
plot(forward.summary$adjr2, pch=19, col="steelblue4", main="Adjusted R2", xlab="Number of Variables Used", ylab="Adjusted R2 Values")
plot(forward.summary$bic, pch=19, col="steelblue4", main="BIC", xlab="Number of Variables Used", ylab="BIC")
RMSE=sqrt(forward.summary$rss/nrow(training))
plot(RMSE, pch=19, col="steelblue4", main="RMSE", xlab="Number of Variables Used", ylab ="RMSE")
plot(forward.summary$cp, pch=19, col="steelblue4", main="CP", xlab="Number of Variables Used", ylab="CP")

print(forward.summary$bic)
##  [1] -339.6571 -535.4045 -643.4238 -704.1590 -738.4836 -754.3565 -753.1089
##  [8] -781.5852 -779.5157 -777.0209 -773.3869 -768.8531 -764.1952 -759.3010
## [15] -753.9628 -748.2359 -741.8663
print(RMSE)
##  [1] 2975.638 2502.807 2269.312 2142.608 2069.243 2030.210 2021.326
##  [8] 1961.912 1954.702 1948.228 1943.669 1940.616 1937.774 1935.327
## [15] 1933.618 1932.554 1932.554
print(forward.summary$adjr2)
##  [1] 0.4521220 0.6117377 0.6802524 0.7144688 0.7332271 0.7427516 0.7445553
##  [8] 0.7589329 0.7602846 0.7614544 0.7621542 0.7624856 0.7627654 0.7629482
## [15] 0.7629500 0.7627932 0.7623743
print(forward.summary$cp)
##  [1] 759.50831 369.63243 203.06526 120.49761  75.75103  53.54405  50.09881
##  [8]  16.29674  14.02468  12.20193  11.51819  11.72415  12.05642  12.62281
## [15]  13.62272  15.00030  17.00000

The first thing to notice is that not every model uses all of the variables. Looking at the RMSE and adjusted \(R^2\), the error and the share of explained variance both stabilize at around 8 variables. At 8 variables the training RMSE is 1961.9, noticeably smaller than the errors for 1-6 variables, and almost 76% of the variation in out-of-state tuition is explained.

The 8 variables selected by the forward stepwise model are: PrivateYes, Accept, Enroll, Room.Board, PhD, perc.alumni, Expend, and Grad.Rate.
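
The same membership can be read directly from the selection summary rather than the table printout; for the 8-variable model:

# columns included in the 8-variable model (intercept included)
which(forward.summary$which[8,])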

  2. Fit a GAM on the training data, using out-of-state tuition as the response and the features selected in the previous step as the predictors. Plot the results, and explain your findings.
# names of the predictors in the 8-variable model, minus the intercept
coeff=names(coef(forward.model, id=8))
coeff=coeff[-grep("Intercept", coeff)]

gam.model=train(x=training[,coeff], y=training$Outstate, method="gamSpline", trControl=trainControl(method = "cv", number=10), tuneGrid=expand.grid(df=1:10))
## Loading required package: gam
## Loading required package: splines
## Loading required package: foreach
## Loaded gam 1.16
plot(gam.model, xlim=c(0,10))

The GAM results show that the optimal degrees of freedom, as chosen by 10-fold cross-validation, is 3.
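
The chosen value can also be read straight off the train object: bestTune stores the winning tuning parameter, and results holds the cross-validated RMSE for each candidate df.

gam.model$bestTune
gam.model$results[, c("df", "RMSE")]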

postResample(predict(gam.model, training), training$Outstate)
##         RMSE     Rsquared          MAE 
## 2067.1961713    0.7362112 1593.9508202

The GAM chosen by cross-validation achieved a training \(R^2\) of 73.62%, meaning that it explains roughly that share of the variation in out-of-state tuition, with an RMSE of approximately 2067. On the training data, this model did not outperform the forward stepwise model above.
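
For reference, the postResample() metrics are easy to reproduce by hand; caret's default \(R^2\) is the squared correlation between predictions and observations. A quick sketch:

pred.train=predict(gam.model, training)
sqrt(mean((pred.train-training$Outstate)^2))  # RMSE
cor(pred.train, training$Outstate)^2          # R-squared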

  3. Evaluate the model obtained on the test set, and explain the results obtained.
postResample(predict(gam.model, testing), testing$Outstate)
##         RMSE     Rsquared          MAE 
## 2324.0904162    0.6702745 1619.7849436

On the test set, the GAM produced an \(R^2\) of 67.03% and a noticeably higher RMSE. This drop is expected: the model was fit and tuned on the training data, so performance on data it has never seen is typically somewhat worse.

  4. For which variables, if any, is there evidence of a non-linear relationship with the response?
summary(gam.model)
## (output abridged: calling summary() on a caret train object echoes the
## entire fitted call, including the embedded training data. The only
## recoverable information in the dump is the fitted formula:
##   .outcome ~ s(Terminal, df = 4) + s(Top10perc, df = 4) +
##     s(Grad.Rate, df = 4) + s(Books, df = 4) + s(Enroll, df = 4) +
##     s(Apps, df = 4) + s(Expend, df = 4))
## "Syracuse.University", "Tabor.College", "Talladega.College", 
## "Taylor.University", "Texas.A.M.University.at.Galveston", "Texas.Christian.University", 
## "Texas.Lutheran.College", "Texas.Southern.University", "Texas.Wesleyan.University", 
## "The.Citadel", "Thiel.College", "Tiffin.University", "Transylvania.University", 
## "Tri.State.University", "Trinity.College.CT", "Trinity.College.DC", 
## "Trinity.College.VT", "Trinity.University", "Tusculum.College", 
## "Tuskegee.University", "Union.College.KY", "Union.College.NY", 
## "Univ..of.Wisconsin.at.OshKosh", "University.of.Alabama.at.Birmingham", 
## "University.of.Arkansas.at.Fayetteville", "University.of.California.at.Irvine", 
## "University.of.Charleston", "University.of.Chicago", "University.of.Cincinnati", 
## "University.of.Connecticut.at.Storrs", "University.of.Dallas", 
## "University.of.Delaware", "University.of.Detroit.Mercy", "University.of.Dubuque", 
## "University.of.Evansville", "University.of.Florida", "University.of.Georgia", 
## "University.of.Hawaii.at.Manoa", "University.of.Illinois...Urbana", 
## "University.of.Indianapolis", "University.of.Kansas", "University.of.La.Verne", 
## "University.of.Louisville", "University.of.Maryland.at.Baltimore.County", 
## "University.of.Maryland.at.College.Park", "University.of.Massachusetts.at.Amherst", 
## "University.of.Massachusetts.at.Dartmouth", "University.of.Miami", 
## "University.of.Minnesota.at.Morris", "University.of.Mississippi", 
## "University.of.Missouri.at.Rolla", "University.of.Missouri.at.Saint.Louis", 
## "University.of.Nebraska.at.Lincoln", "University.of.New.England", 
## "University.of.New.Hampshire", "University.of.North.Carolina.at.Asheville", 
## "University.of.North.Carolina.at.Chapel.Hill", "University.of.North.Carolina.at.Greensboro", 
## "University.of.North.Carolina.at.Wilmington", "University.of.North.Texas", 
## "University.of.Northern.Iowa", "University.of.Oklahoma", "University.of.Pennsylvania", 
## "University.of.Pittsburgh.Main.Campus", "University.of.Portland", 
## "University.of.Puget.Sound", "University.of.Rhode.Island", "University.of.Richmond", 
## "University.of.Rochester", "University.of.San.Francisco", "University.of.Sci..and.Arts.of.Oklahoma", 
## "University.of.Scranton", "University.of.South.Carolina.at.Aiken", 
## "University.of.South.Carolina.at.Columbia", "University.of.South.Florida", 
## "University.of.Southern.California", "University.of.Southern.Indiana", 
## "University.of.Southern.Mississippi", "University.of.St..Thomas.MN", 
## "University.of.St..Thomas.TX", "University.of.Tennessee.at.Knoxville", 
## "University.of.Texas.at.Arlington", "University.of.Texas.at.San.Antonio", 
## "University.of.the.South", "University.of.Tulsa", "University.of.Utah", 
## "University.of.Vermont", "University.of.Virginia", "University.of.Washington", 
## "University.of.West.Florida", "University.of.Wisconsin.Stout", 
## "University.of.Wisconsin.Superior", "University.of.Wisconsin.at.Green.Bay", 
## "University.of.Wyoming", "Upper.Iowa.University", "Ursinus.College", 
## "Ursuline.College", "Valley.City.State.University", "Valparaiso.University", 
## "Vanderbilt.University", "Vassar.College", "Villanova.University", 
## "Virginia.Commonwealth.University", "Virginia.State.University", 
## "Virginia.Tech", "Virginia.Wesleyan.College", "Viterbo.College", 
## "Wabash.College", "Wagner.College", "Wake.Forest.University", 
## "Wartburg.College", "Washington.and.Lee.University", "Washington.State.University", 
## "Wayne.State.College", "Waynesburg.College", "Webster.University", 
## "Wellesley.College", "Wells.College", "Wentworth.Institute.of.Technology", 
## "West.Liberty.State.College", "West.Virginia.Wesleyan.College", 
## "Western.Maryland.College", "Western.Michigan.University", "Western.State.College.of.Colorado", 
## "Western.Washington.University", "Westfield.State.College", "Westminster.College.MO", 
## "Westminster.College", "Wheaton.College.IL", "Westminster.College.PA", 
## "Whittier.College", "Widener.University", "Wilkes.University", 
## "William.Jewell.College", "Williams.College", "Wingate.College", 
## "Winona.State.University", "Winthrop.University", "Wisconsin.Lutheran.College", 
## "Wittenberg.University", "Worcester.Polytechnic.Institute", "Worcester.State.College", 
## "Xavier.University.of.Louisiana", "Yale.University", "York.College.of.Pennsylvania"
## ), class = "data.frame"))
## Deviance Residuals:
##     Min      1Q  Median      3Q     Max 
## -7781.2 -1165.4   135.2  1323.6  7582.1 
## 
## (Dispersion Parameter for gaussian family taken to be 4496568)
## 
##     Null Deviance: 9454415304 on 583 degrees of freedom
## Residual Deviance: 2495596522 on 555.0003 degrees of freedom
## AIC: 10633.77 
## 
## Number of Local Scoring Iterations: 2 
## 
## Anova for Parametric Effects
##                       Df     Sum Sq    Mean Sq  F value Pr(>F)    
## s(Terminal, df = 4)    1 1036026604 1036026604 230.4038 <2e-16 ***
## s(Top10perc, df = 4)   1 1703782767 1703782767 378.9073 <2e-16 ***
## s(Grad.Rate, df = 4)   1  863561464  863561464 192.0490 <2e-16 ***
## s(Books, df = 4)       1    4913517    4913517   1.0927 0.2963    
## s(Enroll, df = 4)      1  525253363  525253363 116.8121 <2e-16 ***
## s(Apps, df = 4)        1  556944262  556944262 123.8598 <2e-16 ***
## s(Expend, df = 4)      1 1094524665 1094524665 243.4133 <2e-16 ***
## Residuals            555 2495596522    4496568                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Anova for Nonparametric Effects
##                      Npar Df Npar F     Pr(F)    
## (Intercept)                                      
## s(Terminal, df = 4)        3  1.570  0.195493    
## s(Top10perc, df = 4)       3  3.712  0.011543 *  
## s(Grad.Rate, df = 4)       3  5.901  0.000573 ***
## s(Books, df = 4)           3  4.692  0.003024 ** 
## s(Enroll, df = 4)          3 16.460 2.934e-10 ***
## s(Apps, df = 4)            3 11.015 4.921e-07 ***
## s(Expend, df = 4)          3 42.436 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Judging from the ANOVA for nonparametric effects, every smoothed variable except Terminal shows strong evidence of a non-linear relationship with the response; for Terminal (p-value of roughly 0.20), a linear term would likely suffice.
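Since Terminal is the only term without evidence of non-linearity, a formal follow-up is to refit the GAM with Terminal entered linearly and compare the two fits with an F-test. A minimal sketch, assuming the fitted model and the training data frame above are named gam.fit and training:

library(gam)
# same model as above, but with a strictly linear term for Terminal
gam.lin <- gam(Outstate ~ Terminal + s(Top10perc, df = 4) + s(Grad.Rate, df = 4) +
  s(Books, df = 4) + s(Enroll, df = 4) + s(Apps, df = 4) + s(Expend, df = 4),
  data = training)
anova(gam.lin, gam.fit, test = "F")  # a large p-value would favor keeping Terminal linear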

plot(testing$Outstate~testing$Top10perc)

plot(testing$Outstate~testing$Grad.Rate)

plot(testing$Outstate~testing$Books)

plot(testing$Outstate~testing$Enroll)

plot(testing$Outstate~testing$Apps)

plot(testing$Outstate~testing$Expend)

As these scatterplots of the test data suggest, none of the chosen variables appears to be related to the outcome variable in a purely linear way.
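The fitted smooths make this easier to see than raw scatterplots. As a sketch, again assuming the fitted model is named gam.fit, plotting the Gam object draws each term with standard-error bands:

par(mfrow = c(2, 4))                    # one panel per smooth term
plot(gam.fit, se = TRUE, col = "blue")  # each smooth with +/- 2 SE bands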

Question 11

In Section 7.7, it was mentioned that GAMs are generally fit using a backfitting approach. The idea behind backfitting is actually quite simple. We will now explore backfitting in the context of multiple linear regression.

Suppose that we would like to perform multiple linear regression, but we do not have software to do so. Instead, we only have software to perform simple linear regression. Therefore, we take the following iterative approach: we repeatedly hold all but one coefficient estimate fixed at its current value, and update only that coefficient estimate using a simple linear regression. The process is continued until convergence; that is, until the coefficient estimates stop changing.

We now try this out on a toy example.

  1. Generate a response Y and two predictors \(X_1\) and \(X_2\), with \(n\) = 100.

  2. Initialize \(\hat\beta_1\) to take on a value of your choice. It does not matter what value you choose.

set.seed(1)
y = rnorm(100)
x1 = rnorm(100)
x2 = rnorm(100)
beta1 = 2.34  # arbitrary starting value for part (b)
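Note that y above is simulated independently of x1 and x2, so the true coefficients are essentially zero and the estimates below will be small. To give backfitting non-trivial targets to recover, one could instead build the response from the predictors; a minimal sketch (y.alt is a hypothetical name and the true values 1, 2, and -3 are arbitrary; the output below continues to use y):

y.alt = 1 + 2 * x1 - 3 * x2 + rnorm(100)  # known truth: beta0 = 1, beta1 = 2, beta2 = -3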
  1. Keeping \(\hat\beta_1\) fixed, fit the model

\(Y - \hat\beta_1X_1 = \beta_0 + \beta_2X_2 + \epsilon\)

You can do this as follows:

a = y - beta1 * x1          # partial residual with the current beta1 effect removed
beta2 = lm(a ~ x2)$coef[2]  # update beta2 by simple regression of a on x2
a
##   [1]  0.82520421  0.08509218  1.29592805  1.22549347  1.86123584
##   [6] -4.95592059 -1.18966644 -1.39148299 -0.32321239 -4.24168042
##  [11]  2.99940447  1.47009191 -3.97278102 -0.69207042  1.61020186
##  [16]  0.87423695  0.73259305  1.59696134 -0.33517950  1.00885465
##  [21]  2.10291783 -2.36057455  0.57668080 -1.56918942  0.85427208
##  [26] -1.72376790  0.01634520 -1.38268842  1.11693547  1.17673400
##  [31]  1.21790412  1.27522537 -0.85602948  3.49923711 -2.09440495
##  [36]  3.18029802  0.30999418  1.17686158  2.62592716  0.89631421
##  [41]  4.31507746 -3.00656663  4.59299888  1.64132434  1.92249735
##  [46]  1.04942131 -4.51938775  0.72782717  2.89759703  4.72012468
##  [51] -0.65533194 -0.56859638  1.08539969  1.04534433  4.91368083
##  [56]  4.49634987 -2.70728888  0.40962944  3.80927845 -4.50919466
##  [61]  1.40688288  0.51919421 -1.78711097 -2.04622685  0.70575552
##  [66] -4.97348747 -1.20819538  4.79887234  0.49114841  1.68697196
##  [71] -4.92515992 -0.95752397 -0.45865085 -0.75355976 -0.47207143
##  [76]  0.37270514 -2.28636855 -4.85496797 -2.32975698 -3.41602660
##  [81]  2.31262807 -2.43749425  0.66346296  1.90979827 -0.62524703
##  [86]  0.70443615 -2.36403447  1.48844796  1.37671431  2.43419501
##  [91] -0.12809676  0.26716024  2.87269334 -1.24285956  4.41374717
##  [96]  3.01076995 -4.64890124  1.80381765 -2.18863344  0.41831732
beta2
##        x2 
## 0.1233176
  1. Keeping \(\hat\beta_2\) fixed, fit the model

\(Y - \hat\beta_2X_2 = \beta_0 + \beta_1X_1 + \epsilon\)

You can do this as follows:

a = y - beta2 * x2          # partial residual with the current beta2 effect removed
beta1 = lm(a ~ x1)$coef[2]  # update beta1 by simple regression of a on x1
a
##   [1] -0.676940276 -0.024624528 -1.031282939  1.636087568  0.611317604
##   [6] -1.128474094  0.405168033  0.671569500  0.577433749 -0.368293749
##  [11]  1.532051607  0.337964170 -0.571883100 -2.045729098  1.003113043
##  [16] -0.232344965  0.021882892  1.098388934  0.742021519  0.599414746
##  [21]  1.132713759  0.781873405  0.152292127 -1.947304258  0.762451512
##  [26] -0.278487928 -0.114961088 -1.272764273 -0.502467483  0.385487363
##  [31]  1.480249365  0.253467126  0.466654298 -0.124158691 -1.369694624
##  [36] -0.402887393 -0.463449037  0.086997872  0.964773425  0.763834761
##  [41] -0.251747472 -0.380885396  0.669404300  0.665023340 -0.832169728
##  [46] -0.460839554  0.431764266  0.800061631 -0.091860559  0.755266535
##  [51]  0.381307319 -0.662237338  0.349709358 -1.098821716  1.347250024
##  [56]  1.839049733 -0.070877344 -1.114763511  0.523509501 -0.082611596
##  [61]  2.284341114  0.008759804  0.724802346 -0.077731584 -0.955333574
##  [66]  0.155489769 -1.752895897  1.612193496  0.194075541  2.288509196
##  [71]  0.507440482 -0.758580336  0.715775352 -1.260786615 -1.272872390
##  [76]  0.152071753 -0.161002527 -0.090273156  0.236657559 -0.702948957
##  [81] -0.617765200 -0.084923156  1.014782560 -1.437092572  0.665546168
##  [86]  0.456400221  1.145498040 -0.420741893  0.316535688  0.143144937
##  [91] -0.494411522  1.161454713  1.130292776  0.876096326  1.367521768
##  [96]  0.541906659 -1.371004063 -0.691050606 -1.218376972 -0.435688203
beta1
##          x1 
## 0.005663458
  1. Write a for loop to repeat (c) and (d) 1,000 times. Report the estimates of \(\hat\beta_0\), \(\hat\beta_1\), and \(\hat\beta_2\) at each iteration of the for loop. Create a plot in which each of these values is displayed, with \(\hat\beta_0\), \(\hat\beta_1\), and \(\hat\beta_2\) each shown in a different color.
iterations = 100  # the exercise says 1,000, but convergence occurs within a few iterations
df = data.frame(beta0 = 0, beta1 = 0.27, beta2 = 0)  # row 1 holds arbitrary starting values
for (i in 1:iterations) {
  beta1 = df[nrow(df), "beta1"]  # most recent estimate of beta1
  a = y - beta1 * x1
  beta2 = lm(a ~ x2)$coef[2]     # step (c)
  a = y - beta2 * x2
  fit = lm(a ~ x1)               # step (d); the same fit also yields the intercept
  beta0 = fit$coef[1]
  beta1 = fit$coef[2]
  df[nrow(df) + 1, ] = list(beta0, beta1, beta2)
}
plot(df$beta0, col="purple2", type="l", lwd=2, main="Backfitting", xlab="Iterations", ylab="Betas")
lines(df$beta1, col="skyblue2", lwd=2)
lines(df$beta2, col="palegreen3", lwd=2)

The coefficients quickly attain their least-squares values: after only a few iterations, all three curves are flat.

  1. Compare your answer in (e) to the results of simply performing multiple linear regression to predict \(Y\) using \(X_1\) and \(X_2\). Use the abline() function to overlay those multiple linear regression coefficient estimates on the plot obtained in (e).
lm.fit<-coef(lm(y~x1+x2))
plot(df$beta0, col="purple2", type="l", lwd=2, main="Multiple Linear Regression VS Backfitting", xlab="Iterations", ylab="Betas")
lines(df$beta1, col="skyblue2", lwd=2)
lines(df$beta2, col="palegreen3", lwd=2)
abline(h=lm.fit[1], col="grey60", lty=2, lwd=2)
abline(h=lm.fit[2], col="grey60", lty=2, lwd=2)
abline(h=lm.fit[3], col="grey60", lty=2, lwd=2)

The dashed grey lines on this graph show that the backfitting estimates coincide with the multiple linear regression coefficient estimates.
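To quantify the agreement rather than judging it from the plot, one can compare the final backfitting row against the lm() coefficients directly (df and lm.fit as defined above; both store the coefficients in the same order):

max(abs(unlist(df[nrow(df), ]) - lm.fit))  # essentially zero, up to numerical error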

  1. On this data set, how many backfitting iterations were required in order to obtain a “good” approximation to the multiple regression coefficient estimates?

As seen in the graphs above, the betas reach their least-squares values within about 5 iterations. I would say that roughly 5 backfitting iterations are enough to obtain a good approximation to the multiple regression coefficient estimates.
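Rather than reading the iteration count off the plot, the loop can also stop itself once the updates stall. A minimal sketch of a tolerance-based version (the function name backfit and the 1e-8 tolerance are arbitrary choices, not part of the exercise):

backfit = function(y, x1, x2, tol = 1e-8, max.iter = 1000) {
  beta1 = 0
  beta2 = 0
  for (i in 1:max.iter) {
    old = c(beta1, beta2)
    beta2 = lm(y - beta1 * x1 ~ x2)$coef[2]  # step (c)
    fit = lm(y - beta2 * x2 ~ x1)            # step (d)
    beta0 = fit$coef[1]
    beta1 = fit$coef[2]
    if (max(abs(c(beta1, beta2) - old)) < tol) break  # stop once the changes are negligible
  }
  list(iterations = i, beta0 = beta0, beta1 = beta1, beta2 = beta2)
}
backfit(y, x1, x2)$iterations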