This example will cover the use of R functions for fitting Ordinal and Multinomial logit models to complex survey data.
For this example I am using the 2014 CDC Behavioral Risk Factor Surveillance System (BRFSS) SMART county data.
In the basic logistic regression model, we assumed a dichotomous outcome, y = 0 or 1. This is a very flexible type of model, since many kinds of outcomes can be written as a dichotomous variable.
Although the logistic regression model is very flexible, we often run into other kinds of outcomes to which it is not applicable.
These types of outcomes require other kinds of regression models to analyze them.
Suppose we have an outcome with J ordered categories, and each person, i, has a response Y_i. We can write the probability of a person having response level j as Pr(Y_i = j), for j = 1, \ldots, J.
For an ordered response it is easier to work with the cumulative probability function, Pr(Y_i \leqslant j).
So the probabilities increase across adjacent (ordered) categories, i.e. Pr(Y_i \leqslant 3) > Pr(Y_i \leqslant 2) > Pr(Y_i \leqslant 1). Since we are using cumulative probabilities, we do not actually have to estimate all J probabilities; we need only estimate J-1 of them, due to the complementary rule of probability: Pr(y=0) = 1-Pr(y>0).
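For example, with a three-category outcome (J = 3), the cumulative probabilities are just
Pr(Y_{i} \leqslant 1) = Pr(Y_i = 1), \quad Pr(Y_{i} \leqslant 2) = Pr(Y_i = 1)+Pr(Y_i = 2), \quad Pr(Y_{i} \leqslant 3) = 1,
so only the first two need to be estimated.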
In order to construct a regression model, we have to link a linear function of covariates to these probabilities. A general way to depict this model is as a latent trait model: a latent variable, Z, is continuous but not observable, and we observe its discrete representation as the J categories. We can then write this as:
\tau_{j-1} < Z_{i} \leqslant \tau_{j}, where the \tau_j are the cut points that divide the latent variable into the J observed categories.
The most common model for an ordinal outcome such as this is the Proportional Odds Model. In this model, for each level j of the outcome there is an intercept term, \beta_{0j}, which depends on the category j. The other explanatory variables, x, do not depend on the category, however. This generates a model of the form:
\text {log} \frac{\pi_1+\cdots +\pi_j}{\pi_{j+1}+\cdots +\pi_J} = \beta_{0j}+x'\beta, where \pi_j = Pr(Y_i = j). The basic assumption of this model is that each x variable affects the odds of the outcome moving up a level in the same way, for all transitions between categories.
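Written out for a three-category outcome (like the recoded health variable used below), this is just two equations that share the same \beta but have different intercepts:

\text{log} \frac{\pi_1}{\pi_2+\pi_3} = \beta_{01}+x'\beta, \qquad \text{log} \frac{\pi_1+\pi_2}{\pi_3} = \beta_{02}+x'\beta

(Software such as polr and svyolr parameterizes this as \zeta_j - x'\beta, so the signs it reports follow that convention, but it is the same model.)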
This model is attractive because we have a direct and consistent interpretation of our \beta’s in the model. While this is nice, the assumption is often questionable in practice, with the \beta’s not being equal between categories. In other words, some covariates may affect the probability of certain transitions more than others.
We can evaluate the assumption generally by fitting a series of binomial logistic regressions and examining the \beta’s between them. This can be done by re-coding the outcome from a single ordered outcome to a series of binomial outcomes. For example:
#General health, originally coded 1 (best) to 5 (worst), collapsed into three ordered levels (1 = best, 3 = worst)
brfss_14$generalhealth<-recode(brfss_14$genhlth, recodes="1:2=1;3=2;4:5=3; else=NA", as.factor.result = T)
brfss_14$generalhealth<-relevel(brfss_14$generalhealth, ref = "1")
#First we tell R our survey design
options(survey.lonely.psu = "adjust")
des<-svydesign(ids=~1, strata=~ststr, weights=~strwt,
               data = brfss_14[is.na(brfss_14$strwt)==F &
                               is.na(brfss_14$race_eth)==F &
                               is.na(brfss_14$generalhealth)==F &
                               is.na(brfss_14$educ)==F &
                               is.na(brfss_14$agec)==F, ])
To fit an ordinal logit to survey data in R, we use the svyolr function in the survey library.
#Here I fit the proportional odds model for the health outcome
fit.solr1<-svyolr(generalhealth~race_eth+educ+agec,des)
summary(fit.solr1)
## Call:
## svyolr(generalhealth ~ race_eth + educ + agec, des)
##
## Coefficients:
## Value Std. Error t value
## race_ethhispanic 0.5830581 0.01882817 30.967322
## race_ethnh black 0.4995386 0.01991045 25.089270
## race_ethnh multirace 0.4786802 0.04219548 11.344348
## race_ethnh other 0.3525681 0.02840802 12.410866
## educ1somehs -0.3331931 0.04104785 -8.117189
## educ2hsgrad -1.0056403 0.03639295 -27.632832
## educ3somecol -1.3316164 0.03663009 -36.353076
## educ4colgrad -2.0225064 0.03653930 -55.351536
## agec(24,39] 0.3750152 0.02476654 15.142012
## agec(39,59] 0.8657256 0.02332231 37.120059
## agec(59,79] 1.1173438 0.02353356 47.478749
## agec(79,99] 1.3880860 0.03036296 45.716433
##
## Intercepts:
## Value Std. Error t value
## 1|2 -0.3962 0.0423 -9.3723
## 2|3 1.2345 0.0422 29.2382
#Calculate the AIC ourselves
fit.solr1$deviance+2*length(fit.solr1$coefficients)
## [1] 846454.4
Which gives us the results from the proportional odds model. In this case, we see that all minority groups have higher odds of reporting poorer health, compared to non-Hispanic whites. We also see that as education increases, the odds of reporting poorer health decrease. For age, as age increases, the odds of reporting poorer health increase, compared to those <25 years of age.
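Because the model is on the log-odds scale, exponentiating the coefficients gives odds ratios directly. A quick sketch using the coefficient vector stored in the fit (output not shown here):

#odds ratios from the proportional odds fit (slopes only, not the cut points)
round(exp(fit.solr1$coefficients), 3)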
Now we can “examine” the proportional odds assumption by fitting separate binomial logits for each transition
des$variables$healthnum<-recode(des$variables$genhlth, recodes="1:2=1;3=2;4:5=3; else=NA", as.factor.result = F)
ex1<-svyglm(I(healthnum>1)~race_eth+educ+agec,des, family="binomial")
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
ex2<-svyglm(I(healthnum>2)~race_eth+educ+agec,des, family="binomial")
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
This is NOT a statistical test, just a rough guide. Here, I plot the coefficients from each model. If the proportional odds assumption is reasonable, then these should all be approximately the same values.
plot(coef(ex1)[-1], ylim=c(-3, 3), type="l",xaxt="n", ylab="Beta", main=c("Comparison of betas for", " proportional odds assumption"))
lines(coef(ex2)[-1], col=1, lty=2)
axis(side=1, at=1:12, labels=F)
text(x=1:12, y=-4, srt = 90, pos = 1, xpd = TRUE,cex=.8,
labels = c( "Hispanic", "Black","MultRace" ,"Other",
"SomeHS","HS Graduate", "SomeColl", "CollGrad", "Age 24-39","Age 39-59" ,"Age 59-79", "Age 80+" ))
legend("bottomright",col=c(1,1),lty=c(1,2), legend=c(">1", ">2"))
lines(coef(fit.solr1)[c(-13:-16)], col=4, lwd=3)
Based on this, the race/ethnicity effects appear to be fairly consistent across transitions, but the education and age effects do not. For example, the dashed line for the age variables (the Pr(Y>2) model) is much higher than the solid line (the Pr(Y>1) model). A similar set of differences is seen for the education levels, but in the opposite direction, i.e. higher education is more protective against reporting the poorest health (y>2) than against y>1.
#Just print the odds ratios,
round(exp(rbind(coef(ex1)[-1], coef(ex2)[-1])),3)
## race_ethhispanic race_ethnh black race_ethnh multirace
## [1,] 1.883 1.672 1.544
## [2,] 1.698 1.591 1.756
## race_ethnh other educ1somehs educ2hsgrad educ3somecol educ4colgrad
## [1,] 1.485 0.731 0.370 0.270 0.140
## [2,] 1.178 0.720 0.367 0.264 0.112
## agec(24,39] agec(39,59] agec(59,79] agec(79,99]
## [1,] 1.435 2.175 2.753 3.738
## [2,] 1.720 3.690 4.845 5.818
Which again shows us that the effects are not consistent across transitions, just as the plot does; if anything, the plot is easier to digest. As a comparison, we can also fit the proportional odds (cumulative logit) model with the VGAM package, using the survey weights normalized to have mean 1.
library(VGAM)
## Loading required package: stats4
## Loading required package: splines
##
## Attaching package: 'VGAM'
## The following object is masked from 'package:survey':
##
## calibrate
## The following object is masked from 'package:car':
##
## logit
fit.vgam<-vglm(as.ordered(generalhealth)~race_eth+educ+agec, brfss_14,
               weights = strwt/mean(strwt, na.rm=T),
               family=cumulative(parallel = T, reverse = T))
summary(fit.vgam)
##
## Call:
## vglm(formula = as.ordered(generalhealth) ~ race_eth + educ +
## agec, family = cumulative(parallel = T, reverse = T), data = brfss_14,
## weights = strwt/mean(strwt, na.rm = T))
##
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## logit(P[Y>=2]) -10.565 -0.5522 -0.1051 0.48578 9.329
## logit(P[Y>=3]) -5.531 -0.3628 -0.1702 -0.06838 15.647
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept):1 0.396163 0.020015 19.79 <2e-16 ***
## (Intercept):2 -1.234466 0.020048 -61.58 <2e-16 ***
## race_ethhispanic 0.583059 0.008967 65.02 <2e-16 ***
## race_ethnh black 0.499539 0.009852 50.71 <2e-16 ***
## race_ethnh multirace 0.478678 0.021612 22.15 <2e-16 ***
## race_ethnh other 0.352569 0.013866 25.43 <2e-16 ***
## educ1somehs -0.333188 0.019529 -17.06 <2e-16 ***
## educ2hsgrad -1.005634 0.017078 -58.89 <2e-16 ***
## educ3somecol -1.331611 0.017194 -77.45 <2e-16 ***
## educ4colgrad -2.022500 0.017229 -117.39 <2e-16 ***
## agec(24,39] 0.375015 0.012007 31.23 <2e-16 ***
## agec(39,59] 0.865725 0.011332 76.40 <2e-16 ***
## agec(59,79] 1.117343 0.011723 95.31 <2e-16 ***
## agec(79,99] 1.388084 0.017304 80.22 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of linear predictors: 2
##
## Names of linear predictors: logit(P[Y>=2]), logit(P[Y>=3])
##
## Residual deviance: 843109 on 903576 degrees of freedom
##
## Log-likelihood: -421554.5 on 903576 degrees of freedom
##
## Number of iterations: 5
##
## No Hauck-Donner effect found in any of the estimates
##
## Exponentiated coefficients:
## race_ethhispanic race_ethnh black race_ethnh multirace
## 1.7915095 1.6479612 1.6139400
## race_ethnh other educ1somehs educ2hsgrad
## 1.4227171 0.7166352 0.3658127
## educ3somecol educ4colgrad agec(24,39]
## 0.2640516 0.1323242 1.4550133
## agec(39,59] agec(59,79] agec(79,99]
## 2.3767294 3.0567226 4.0071668
Next we fit an alternative model, the continuation ratio model, where we do not assume proportionality of the odds across the various transitions.
#Continuation ratio
fit.crp<-vglm(as.ordered(generalhealth)~race_eth+educ+agec, brfss_14,
              weights = strwt/mean(strwt, na.rm=T),
              family=cratio(parallel = F, reverse = T))
summary(fit.crp)
##
## Call:
## vglm(formula = as.ordered(generalhealth) ~ race_eth + educ +
## agec, family = cratio(parallel = F, reverse = T), data = brfss_14,
## weights = strwt/mean(strwt, na.rm = T))
##
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## logit(P[Y<2|Y<=2]) -8.065 -0.46272 0.1007 0.5103 10.883
## logit(P[Y<3|Y<=3]) -20.175 0.08913 0.2106 0.3779 4.816
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept):1 0.21815 0.02815 7.748 9.30e-15 ***
## (Intercept):2 1.55679 0.02676 58.178 < 2e-16 ***
## race_ethhispanic:1 -0.56181 0.01092 -51.459 < 2e-16 ***
## race_ethhispanic:2 -0.52917 0.01207 -43.835 < 2e-16 ***
## race_ethnh black:1 -0.43475 0.01200 -36.220 < 2e-16 ***
## race_ethnh black:2 -0.46408 0.01322 -35.098 < 2e-16 ***
## race_ethnh multirace:1 -0.30517 0.02639 -11.564 < 2e-16 ***
## race_ethnh multirace:2 -0.56276 0.02922 -19.262 < 2e-16 ***
## race_ethnh other:1 -0.40188 0.01579 -25.454 < 2e-16 ***
## race_ethnh other:2 -0.16368 0.02155 -7.594 3.09e-14 ***
## educ1somehs:1 0.11547 0.02949 3.916 9.01e-05 ***
## educ1somehs:2 0.32846 0.02153 15.259 < 2e-16 ***
## educ2hsgrad:1 0.59267 0.02594 22.847 < 2e-16 ***
## educ2hsgrad:2 1.00224 0.01897 52.830 < 2e-16 ***
## educ3somecol:1 0.84025 0.02594 32.393 < 2e-16 ***
## educ3somecol:2 1.33294 0.01933 68.943 < 2e-16 ***
## educ4colgrad:1 1.33983 0.02582 51.888 < 2e-16 ***
## educ4colgrad:2 2.18497 0.02000 109.227 < 2e-16 ***
## agec(24,39]:1 -0.25338 0.01323 -19.157 < 2e-16 ***
## agec(24,39]:2 -0.54250 0.02112 -25.680 < 2e-16 ***
## agec(39,59]:1 -0.49122 0.01266 -38.800 < 2e-16 ***
## agec(39,59]:2 -1.30570 0.01959 -66.660 < 2e-16 ***
## agec(59,79]:1 -0.66564 0.01323 -50.318 < 2e-16 ***
## agec(59,79]:2 -1.57796 0.01992 -79.197 < 2e-16 ***
## agec(79,99]:1 -0.92800 0.02098 -44.226 < 2e-16 ***
## agec(79,99]:2 -1.76103 0.02540 -69.341 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of linear predictors: 2
##
## Names of linear predictors: logit(P[Y<2|Y<=2]), logit(P[Y<3|Y<=3])
##
## Residual deviance: 840373.8 on 903564 degrees of freedom
##
## Log-likelihood: -420186.9 on 903564 degrees of freedom
##
## Number of iterations: 5
We can also fit an adjacent categories model, which models the log odds of being in each category versus the next one, again allowing the covariate effects to differ across comparisons.
#Adjacent categories
fit.acat<-vglm(as.ordered(generalhealth)~race_eth+educ+agec, brfss_14,
               weights = strwt/mean(strwt, na.rm=T),
               family=acat(parallel = F, reverse = T))
summary(fit.acat)
##
## Call:
## vglm(formula = as.ordered(generalhealth) ~ race_eth + educ +
## agec, family = acat(parallel = F, reverse = T), data = brfss_14,
## weights = strwt/mean(strwt, na.rm = T))
##
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## loge(P[Y=1]/P[Y=2]) -7.737 -0.52749 0.1101 0.5585 11.047
## loge(P[Y=2]/P[Y=3]) -20.959 0.04074 0.1252 0.3604 5.191
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept):1 0.20873 0.02815 7.414 1.23e-13 ***
## (Intercept):2 0.63307 0.02922 21.669 < 2e-16 ***
## race_ethhispanic:1 -0.56317 0.01088 -51.773 < 2e-16 ***
## race_ethhispanic:2 -0.22630 0.01319 -17.162 < 2e-16 ***
## race_ethnh black:1 -0.43973 0.01195 -36.796 < 2e-16 ***
## race_ethnh black:2 -0.22023 0.01457 -15.117 < 2e-16 ***
## race_ethnh multirace:1 -0.30537 0.02628 -11.621 < 2e-16 ***
## race_ethnh multirace:2 -0.38663 0.03267 -11.836 < 2e-16 ***
## race_ethnh other:1 -0.40277 0.01573 -25.603 < 2e-16 ***
## race_ethnh other:2 0.06758 0.02320 2.913 0.003581 **
## educ1somehs:1 0.11261 0.02945 3.824 0.000131 ***
## educ1somehs:2 0.31732 0.02383 13.315 < 2e-16 ***
## educ2hsgrad:1 0.60119 0.02593 23.189 < 2e-16 ***
## educ2hsgrad:2 0.77119 0.02105 36.644 < 2e-16 ***
## educ3somecol:1 0.84659 0.02593 32.649 < 2e-16 ***
## educ3somecol:2 0.96311 0.02145 44.892 < 2e-16 ***
## educ4colgrad:1 1.34908 0.02582 52.252 < 2e-16 ***
## educ4colgrad:2 1.48585 0.02212 67.161 < 2e-16 ***
## agec(24,39]:1 -0.25526 0.01319 -19.348 < 2e-16 ***
## agec(24,39]:2 -0.39316 0.02267 -17.341 < 2e-16 ***
## agec(39,59]:1 -0.48515 0.01263 -38.425 < 2e-16 ***
## agec(39,59]:2 -1.02644 0.02109 -48.672 < 2e-16 ***
## agec(59,79]:1 -0.65866 0.01318 -49.955 < 2e-16 ***
## agec(59,79]:2 -1.20000 0.02149 -55.839 < 2e-16 ***
## agec(79,99]:1 -0.95021 0.02087 -45.522 < 2e-16 ***
## agec(79,99]:2 -1.23165 0.02764 -44.561 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of linear predictors: 2
##
## Names of linear predictors: loge(P[Y=1]/P[Y=2]), loge(P[Y=2]/P[Y=3])
##
## Residual deviance: 840232.5 on 903564 degrees of freedom
##
## Log-likelihood: -420116.2 on 903564 degrees of freedom
##
## Number of iterations: 5
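Because the three VGAM fits use the same outcome, covariates and normalized weights, we can compare them informally with the same hand-rolled AIC used earlier. A sketch (values not shown here):

#manual AIC for each VGAM fit: deviance + 2 * number of estimated coefficients
deviance(fit.vgam)+2*length(coef(fit.vgam))
deviance(fit.crp)+2*length(coef(fit.crp))
deviance(fit.acat)+2*length(coef(fit.acat))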
We also commonly see a response variable with unordered categories, for example when a respondent chooses among a set of alternatives. Outcomes like these are examples of what is generally called an “alternative decision making” process.
A distribution commonly used for this type of outcome is the multivariate extension of the binomial distribution, known as the multinomial distribution.
If you have an outcome with J unordered classes, then \pi_1, \pi_2, \ldots, \pi_J are the probabilities of observing the J classes. Remember, \sum_{j=1}^J \pi_j = 1.
If we observe y_1 outcomes in the first category, y_2 outcomes in the second, and so on, then let:
\mathbf{y}=\begin{bmatrix} y_1\\ y_2\\ \vdots \\ y_J \end{bmatrix}, with \sum_{j=1}^J y_j = n
In this case y follows the multinomial distribution:
f(y|n)=\frac{n!}{y_1!y_2!\cdots y_J!} \pi_1^{y_1}\pi_2^{y_2}\cdots\pi_J^{y_J}
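As a small numeric illustration (the counts and class probabilities here are made up), the built-in dmultinom() function evaluates this density directly:

#probability of observing counts (2, 1, 1) across J=3 classes with class probabilities (.5, .3, .2)
dmultinom(c(2, 1, 1), prob = c(0.5, 0.3, 0.2))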
When constructing a regression model for such a distribution, we choose one level of the outcome as the reference level and compare the probability of choosing each of the other options to it. For example, if we model the probability of choosing option j relative to option 1 (the reference level), then we have the logistic regression model:
logit(\pi_{j}) =log \left ( \frac{\pi_j}{\pi_1} \right)=x'_j\beta_j
This gives J-1 equations, with the reference category again obtained by the complementary rule. We therefore have J-1 regression equations to interpret, each comparing the odds of being in one of the other categories to the reference category. If we want to estimate the probability of being in a particular class of the outcome for a given set of x values, we can solve for \pi_j as:
\pi_j = \frac{exp(x'_j\beta_j)}{1+\sum_{k=2}^J exp(x'_k\beta_k)}
This is a much more complicated model than the proportional odds model, and we will have many more effects to interpret, owing to the J-1 separate equations we are estimating.
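To make the inverse link above concrete, here is a minimal sketch (the function name and inputs are hypothetical) of turning the J-1 linear predictors for the non-reference categories into the full set of J class probabilities:

#xb: vector of linear predictors x'b_j for the J-1 non-reference categories
mlogit_probs<-function(xb){
  denom<-1+sum(exp(xb))
  c(1/denom, exp(xb)/denom) #first element is the reference category's probability
}
#example: J=3, with linear predictors 0.5 and -1 for categories 2 and 3
mlogit_probs(c(0.5, -1))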
Unfortunately, R has no multinomial model for complex survey designs, so we have to trick R into fitting one for us. I follow the advice of Thomas Lumley, the author of the survey package: in Appendix E of his book Complex Surveys, he describes how models that are not included in the package, but that accept a weights argument, can be fit by creating a set of replicate weights from the survey design. This works here, and in other cases where the type of model can't be fit directly.
First, create a bootstrap replicate weight set
des2<-as.svrepdesign(des, type="bootstrap" , replicates=10)
Now I fit the model. This looks a little awkward because I have to use the multinom() function from the nnet library to fit the model, and since I want to extract the coefficients from each replicate's fit, I wrap the call in the coef() function.
library(nnet)
mfit<-withReplicates(des2, quote(coef(multinom(generalhealth~race_eth+educ+agec,weights=.weights,trace=F ))))
Now, I get the betas out. I have 4 equations, and 12 betas for each equation, so I'm going to put the results into a matrix that is 4 rows by 12 columns. I throw away the intercepts because I'm not going to be interpreting them anyway.
mfitcoef<-data.frame(matrix(attr(attr(mfit, "var"), "means")[-1:-4], nrow=4, ncol=12, byrow=F))
## Warning in matrix(attr(attr(mfit, "var"), "means")[-1:-4], nrow = 4, ncol
## = 12, : data length [22] is not a sub-multiple or multiple of the number of
## rows [4]
names(mfitcoef)<-names(coef(ex1)[-1])
Calculate the odds ratios and compute test statistics for each beta (removing the intercepts)
round(exp(mfitcoef), 3)
## race_ethhispanic race_ethnh black race_ethnh multirace race_ethnh other
## 1 1.540 1.492 0.548 0.257
## 2 1.927 1.380 0.254 0.059
## 3 1.333 0.891 0.428 1.289
## 4 2.005 0.647 0.164 1.928
## educ1somehs educ2hsgrad educ3somecol educ4colgrad agec(24,39]
## 1 1.628 2.571 1.333 0.891 0.428
## 2 4.595 8.679 2.005 0.647 0.164
## 3 1.916 1.540 1.492 0.548 0.257
## 4 6.440 1.927 1.380 0.254 0.059
## agec(39,59] agec(59,79] agec(79,99]
## 1 1.289 1.916 1.540
## 2 1.928 6.440 1.927
## 3 1.628 2.571 1.333
## 4 4.595 8.679 2.005
#get the covariance matrix for the betas
vcov<-matrix(attr(mfit, "var"), nrow=52, ncol=52) #52 = 13*4 = #betas * #equations
#get the z tests by using the coefficients and the standard errors (diag(vcov))
z<-as.vector(mfitcoef/sqrt(diag(vcov)[-1:-4]))
## Warning in sqrt(diag(vcov)[-1:-4]): NaNs produced
z<-round(z, 4)
#p values
pvals<-round((1-pnorm(abs(as.matrix(z)),0,1))*2,4)
round(pvals, 3)
## race_ethhispanic race_ethnh black race_ethnh multirace
## [1,] 0 NaN NaN
## [2,] 0 NaN 0
## [3,] NaN NaN 0
## [4,] NaN NaN NaN
## race_ethnh other educ1somehs educ2hsgrad educ3somecol educ4colgrad
## [1,] NaN NaN 0 NaN NaN
## [2,] 0 NaN 0 0 NaN
## [3,] NaN 0 NaN 0 0
## [4,] NaN NaN NaN NaN NaN
## agec(24,39] agec(39,59] agec(59,79] agec(79,99]
## [1,] NaN NaN NaN 0
## [2,] NaN NaN 0 0
## [3,] NaN NaN NaN 0
## [4,] NaN NaN NaN 0
So in these results, each row is the equation for a particular comparison of outcomes. For example, let's interpret the first row, which corresponds to the model Pr(y='very good')/Pr(y='excellent').
In this case, Hispanics are less likely to report very good rather than excellent health, compared to NH Whites; this is also the case for NH Blacks. Other and multirace respondents, on the other hand, are more likely to report very good rather than excellent health, compared to NH Whites. In terms of education, those with more education are more likely to report very good rather than excellent health, as are people who are older.
Let's consider the fourth row. This corresponds to the model Pr(y='poor')/Pr(y='excellent'). In this case, Hispanics are more likely to report poor rather than excellent health, compared to NH Whites; this is also the case for NH Blacks, Other, and multirace respondents. In terms of education, those with more education are less likely to report poor rather than excellent health, compared to those with little education, and older people are much more likely to report poor rather than excellent health.
#get a series of predicted probabilities for different "types" of people for each model
dat<-expand.grid(race_eth=levels(brfss_14$race_eth), educ=levels(brfss_14$educ), agec=levels(brfss_14$agec))
#generate the fitted values
#Unfortunately, the survey proportional odds model won't generate fitted values
#but here I use a weighted multinomial model, which fits the data better anyway
fit.mult<-multinom(generalhealth~race_eth+educ+agec, brfss_14,weights=strwt/mean(strwt, na.rm=T))
## # weights: 42 (26 variable)
## initial value 494399.888336
## iter 10 value 428665.699653
## iter 20 value 428336.761906
## iter 30 value 420117.976511
## final value 420116.245418
## converged
fitm<-predict(fit.mult, newdata=dat, type="probs")
#add the values to the fake data
dat$fitted.prob.mlrm<-round(fitm, 3)
#Print the fitted probabilities
head(dat, n=20)
## race_eth educ agec fitted.prob.mlrm.1 fitted.prob.mlrm.2
## 1 nhwhite 0Prim [0,24] 0.446 0.362
## 2 hispanic 0Prim [0,24] 0.296 0.422
## 3 nh black 0Prim [0,24] 0.323 0.407
## 4 nh multirace 0Prim [0,24] 0.338 0.372
## 5 nh other 0Prim [0,24] 0.355 0.431
## 6 nhwhite 1somehs [0,24] 0.499 0.362
## 7 hispanic 1somehs [0,24] 0.346 0.441
## 8 nh black 1somehs [0,24] 0.375 0.422
## 9 nh multirace 1somehs [0,24] 0.393 0.387
## 10 nh other 1somehs [0,24] 0.404 0.438
## 11 nhwhite 2hsgrad [0,24] 0.643 0.286
## 12 hispanic 2hsgrad [0,24] 0.495 0.386
## 13 nh black 2hsgrad [0,24] 0.526 0.363
## 14 nh multirace 2hsgrad [0,24] 0.549 0.331
## 15 nh other 2hsgrad [0,24] 0.550 0.366
## 16 nhwhite 3somecol [0,24] 0.705 0.245
## 17 hispanic 3somecol [0,24] 0.566 0.346
## 18 nh black 3somecol [0,24] 0.596 0.322
## 19 nh multirace 3somecol [0,24] 0.620 0.293
## 20 nh other 3somecol [0,24] 0.618 0.322
## fitted.prob.mlrm.3
## 1 0.192
## 2 0.281
## 3 0.270
## 4 0.291
## 5 0.214
## 6 0.140
## 7 0.214
## 8 0.203
## 9 0.220
## 10 0.158
## 11 0.070
## 12 0.119
## 13 0.111
## 14 0.120
## 15 0.084
## 16 0.050
## 17 0.088
## 18 0.081
## 19 0.087
## 20 0.061
This will use polr with survey weights to get fitted probabilities from the proportional odds model. The fitted values are right, but the standard errors won't be.
library(MASS)
dat2<-dat[,c(-4:-8)]
ordfit<-polr(generalhealth~race_eth+educ+agec, brfss_14,weights=strwt/mean(strwt, na.rm=T))
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
fitted.ord<-predict(ordfit, newdata=dat2, type="probs")
dat2$fitted.ord<-round(fitted.ord, 3)
head(dat2, n=20)
## race_eth educ agec fitted.ord.1 fitted.ord.2 fitted.ord.3
## 1 nhwhite 0Prim [0,24] 0.402 0.372 0.225
## 2 hispanic 0Prim [0,24] 0.273 0.384 0.343
## 3 nh black 0Prim [0,24] 0.290 0.386 0.324
## 4 nh multirace 0Prim [0,24] 0.294 0.386 0.320
## 5 nh other 0Prim [0,24] 0.321 0.386 0.293
## 6 nhwhite 1somehs [0,24] 0.484 0.343 0.173
## 7 hispanic 1somehs [0,24] 0.344 0.384 0.272
## 8 nh black 1somehs [0,24] 0.363 0.381 0.256
## 9 nh multirace 1somehs [0,24] 0.368 0.380 0.252
## 10 nh other 1somehs [0,24] 0.398 0.374 0.229
## 11 nhwhite 2hsgrad [0,24] 0.648 0.256 0.096
## 12 hispanic 2hsgrad [0,24] 0.507 0.333 0.160
## 13 nh black 2hsgrad [0,24] 0.527 0.323 0.149
## 14 nh multirace 2hsgrad [0,24] 0.533 0.321 0.147
## 15 nh other 2hsgrad [0,24] 0.564 0.305 0.132
## 16 nhwhite 3somecol [0,24] 0.718 0.210 0.071
## 17 hispanic 3somecol [0,24] 0.587 0.292 0.121
## 18 nh black 3somecol [0,24] 0.607 0.280 0.112
## 19 nh multirace 3somecol [0,24] 0.612 0.277 0.110
## 20 nh other 3somecol [0,24] 0.642 0.260 0.099
Compare the estimates from the two models:
head(dat, n=10) #Multinomial model
## race_eth educ agec fitted.prob.mlrm.1 fitted.prob.mlrm.2
## 1 nhwhite 0Prim [0,24] 0.446 0.362
## 2 hispanic 0Prim [0,24] 0.296 0.422
## 3 nh black 0Prim [0,24] 0.323 0.407
## 4 nh multirace 0Prim [0,24] 0.338 0.372
## 5 nh other 0Prim [0,24] 0.355 0.431
## 6 nhwhite 1somehs [0,24] 0.499 0.362
## 7 hispanic 1somehs [0,24] 0.346 0.441
## 8 nh black 1somehs [0,24] 0.375 0.422
## 9 nh multirace 1somehs [0,24] 0.393 0.387
## 10 nh other 1somehs [0,24] 0.404 0.438
## fitted.prob.mlrm.3
## 1 0.192
## 2 0.281
## 3 0.270
## 4 0.291
## 5 0.214
## 6 0.140
## 7 0.214
## 8 0.203
## 9 0.220
## 10 0.158
head(dat2, n=10) #proportional odds model
## race_eth educ agec fitted.ord.1 fitted.ord.2 fitted.ord.3
## 1 nhwhite 0Prim [0,24] 0.402 0.372 0.225
## 2 hispanic 0Prim [0,24] 0.273 0.384 0.343
## 3 nh black 0Prim [0,24] 0.290 0.386 0.324
## 4 nh multirace 0Prim [0,24] 0.294 0.386 0.320
## 5 nh other 0Prim [0,24] 0.321 0.386 0.293
## 6 nhwhite 1somehs [0,24] 0.484 0.343 0.173
## 7 hispanic 1somehs [0,24] 0.344 0.384 0.272
## 8 nh black 1somehs [0,24] 0.363 0.381 0.256
## 9 nh multirace 1somehs [0,24] 0.368 0.380 0.252
## 10 nh other 1somehs [0,24] 0.398 0.374 0.229
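As a quick numeric check on how far apart the two sets of fitted probabilities are, we could summarize their element-wise differences. This sketch assumes both fitted-value columns are matrices of the same dimensions, as constructed above:

#distribution of differences between the multinomial and proportional odds fitted probabilities
summary(as.vector(dat$fitted.prob.mlrm - dat2$fitted.ord))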