Earlier this week I published estimates of support for same-sex marriage (SSM) in each of Australia's 150 Commonwealth Electoral Divisions (CEDs), based on the ABC's 2016 Vote Compass (VC) survey. My analysis, drawing heavily on joint work with Dr Shaun Ratcliff and Luke Mansillo, used a variety of post-stratification methods.

With the release of the results of the Australian Marriage Law Postal Survey (AMLPS), we can compare the performance of these post-stratification techniques. My prior expectation was that the post-stratified VC estimates would perform well with respect to the rank-ordering of CEDs by their stated levels of support for SSM in the AMLPS.

Comparisons of estimates

We examine the performance of the BART + Bayes estimates (dark grey), the raked estimates (red) and the results from the AMLPS released today by the ABS (blue).

library(scales)
## order divisions (CEDs) by their AMLPS "Yes" share
plotData$division <- reorder(plotData$division,plotData$yes_per/100,mean)

library(ggplot2)
## dotplot: BART + Bayes point estimates and intervals, raked estimates, and AMLPS results
ggplot(data=plotData,
       aes(y=division,x=p.bayes,xmin=p.bayes.lo,xmax=p.bayes.up,color="BART + Bayes")) + 
  geom_vline(xintercept = .5,col=gray(.45)) +
  geom_errorbarh(height=0,color=gray(.35)) +
  geom_point() +
  geom_point(inherit.aes = FALSE,aes(x=yes_per/100,y=division,color="AMLPS")) +
  geom_point(inherit.aes = FALSE,aes(x=p.raked,y=division,color="Raked")) +
  scale_y_discrete("") +
  scale_x_continuous("Percentage supporting SSM",
                     labels = percent,
                     breaks=seq(.3,.9,by=.1)) +
  scale_color_manual(name=" ",
                     values=c("BART + Bayes"=gray(.45),
                              "AMLPS"="blue",
                              "Raked"="red")) +
  theme(panel.border = element_blank(),
        legend.position = 'top')
Comparison of AMLPS results (blue), BART + Bayes estimates (gray) and raked VC estimates (red).

The following is an array of scatterplots, showing the relationships between the various post-stratified estimates and the actual AMLPS results. The correlations reported are Spearman rank-order correlations.

## panel function for pairs(): print the absolute Spearman correlation
panel.cor <- function(x, y, digits = 2, prefix = "", ...){
    usr <- par("usr"); on.exit(par(usr))
    par(usr = c(0, 1, 0, 1))
    r <- abs(cor(x, y, use = "pairwise", method = "spearman"))
    txt <- format(c(r, 0.123456789), digits = digits)[1]
    txt <- paste0(prefix, txt)
    text(0.5, 0.5, txt, cex = 2)
}

## scatterplot matrix: each estimator against the others and the AMLPS result;
## lower panels show the data with a 45 degree line (black) and OLS fit (red)
pairs(cbind(plotData$p.bayes,
            plotData$p.glmer,
            plotData$p.bart,
            plotData$p.nnet,
            plotData$p.glm,
            plotData$p.raked,
            plotData$naive,
            plotData$yes_per/100),
      labels=c("BART\n+ Bayes",
               "GLMER","BART","nnet","logistic\nregression",
               "raking","naive","AMLPS\n(actual)"),
      lower.panel=function(x,y){
        abline(0,1)
        abline(coef=coef(lm(y~x)),col="red")
        points(x,y,pch=1)
      },
      upper.panel = panel.cor)

The raked estimates perform quite well relative to other, more complex methods of statistical adjustment. Indeed, if a rank-ordering of CEDs is all we seek, then even the naive estimates, based on the raw, unadjusted VC responses, have tolerable performance.
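For reference, the rank-order correlations shown in the upper panels can be recovered directly; a quick check, using the same estimate columns of plotData (the ests vector below is just a convenience, not from the original code):

## Spearman rank-order correlation of each estimator with the AMLPS result
ests <- c("p.bayes","p.glmer","p.bart","p.nnet","p.glm","p.raked","naive")
sort(sapply(plotData[,ests], cor, y=plotData$yes_per/100,
            use="pairwise", method="spearman"),
     decreasing=TRUE)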

## regress AMLPS result on the BART + Bayes prediction, both on the 0-100 scale
ols <- lm(yes_per~I(p.bayes*100),data=plotData)
b <- coef(ols)

In the next scatterplot I focus on the relationship between the BART + Bayes estimates and the AMLPS results. The dark line is a 45 degree line; if the BART + Bayes estimates perfectly predicted the AMLPS results, the data would all fall on this line. Points below the 45 degree line are overestimates, where the BART + Bayes prediction overstates the AMLPS result; conversely, points above the 45 degree line are underestimates.

The regression (red line) has an intercept of 0.65 and a slope of 1.08, indicating that the BART + Bayes estimates generally underestimated SSM support (as expected, given that respondents reporting "don't know" on VC were neither dropped nor reallocated), though only by less than a percentage point on average, and that this underestimation is larger among CEDs with higher levels of support for SSM. This general pattern holds up despite the large overestimates of SSM support in a number of CEDs in Western Sydney. Indeed, the regression result is a little misleading: those large overestimates mask a general pattern of roughly a five percentage point underestimate elsewhere.
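Those two readings of the fit can be checked against the raw error distribution; a minimal look, using the same plotData:

## AMLPS result minus BART + Bayes prediction, in percentage points
e <- plotData$yes_per - plotData$p.bayes*100
mean(e)     ## pulled toward zero by the Western Sydney overestimates
median(e)   ## more robust to those outliers
summary(e)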

## points defining the 45 degree reference line and the fitted regression line
refData <- data.frame(x=c(0,100),y=c(0,100))
olsData <- data.frame(x=seq(20,90,length=101))
olsData$y <- b[1] + b[2]*olsData$x

library(highcharter)
hc <- highchart(width=800,height=800) %>%
  hc_add_theme(hc_theme_ggplot2()) %>%
  hc_add_series(data = refData,
                id="45 degrees",
                name="45 degrees",
                marker=list(enabled=FALSE,radius=0),
                enableMouseTracking=FALSE,  ## logical FALSE; the string "false" is truthy in JS
                type="line") %>%
  hc_add_series(data = olsData,
                ##id="Regression",
                name="Regression",
                marker=list(enabled=FALSE,radius=0),
                enableMouseTracking=FALSE,
                type="line") %>%
  hc_add_series_scatter(x=plotData$p.bayes*100,
                        y=plotData$yes_per,
                        marker=list(symbol="square",
                                    fillColor="#ccc",
                                    lineWidth=1,
                                    lineColor="#555"),
                        division=as.character(plotData$division),
                        st=plotData$State,
                        error=round(plotData$yes_per - plotData$p.bayes*100,1)) %>%
  hc_tooltip(crosshairs=TRUE,
             useHTML = TRUE,
             shared = FALSE,
             backgroundColor ="rgba(0,0,0,.55)",
             style=list("z-index"="9998",
                        color="white",
                        fontWeight="normal",
                        fontFamily="Avenir Next, Roboto, Helvetica, sans-serif"),
             valueDecimals = 1,
             headerFormat = "<table>",
             pointFormat=paste("<tr><th>{point.division} ({point.st})</th></tr>",
                               "<tr><th>Error: {point.error}</th></tr>"),
             footerFormat = "</table>") %>%
  hc_chart(zoomType = "xy") %>%
  hc_title(text="AMLPS results by electoral division, vs survey-based predictions",
           align="left") %>%
  hc_xAxis(title=list(text="BART + Bayes prediction",
                      style=list(color="#111")),
           tickPositions=seq(30,80,by=10),
           startOnTick=FALSE,
           endOnTick=FALSE,
           min=25,max=85) %>%
  hc_yAxis(title=list(text="AMLPS result",
                      style=list(color="#111")),
           crosshair=TRUE,
           tickPositions=seq(30,80,by=10),
           startOnTick=FALSE,
           endOnTick=FALSE,
           min=25,max=85)


## render the interactive chart
tlist <- list(hc)
htmltools::tagList(tlist)

Root mean squared error (RMSE) and median absolute deviation (MAD) for each estimator:

library(reshape2)
## long format: one row per division x estimator
estimates <- melt(plotData,
                  id.vars=c('division','yes_per'),
                  measure.vars=c("p.bayes",
                                 "p.glmer",
                                 "p.bart",
                                 "p.nnet",
                                 "p.glm",
                                 "p.raked",
                                 "naive"))
library(plyr)
library(dplyr)
## per-division errors, then accuracy summaries for each estimator
mse <- estimates %>%
  mutate(e = yes_per/100 - value) %>%
  group_by(variable) %>%
  summarise(mse = mean(e^2,na.rm=TRUE),
            rmse = sqrt(mse),
            mad = median(abs(e),na.rm=TRUE)) %>%
  arrange(rmse)

library(knitr)
library(kableExtra)
kable(mse[,c("variable","rmse","mad")],
      digits=4,
      format="html",
      col.names = c("Estimator","RMSE","MAD")) %>%
  kable_styling(full_width=FALSE)
Estimator   RMSE     MAD
p.raked     0.0702   0.0627
p.bayes     0.0840   0.0771
p.bart      0.0843   0.0773
p.glmer     0.0847   0.0775
p.glm       0.0864   0.0745
p.nnet      0.0900   0.0761
naive       0.1150   0.0763

On both RMSE and MAD, again we see the raked estimates outperforming the various MRP-like approaches. The naive estimates, based on no statistical adjustment at all but simply taking the VC results at face value, have poor RMSE, but perform at least as well as most of the adjustment methods on the MAD criterion.

Inspection of errors

The following dotplot compares errors for the various estimators, with divisions ordered by the magnitude of the error for the BART + Bayes estimator. Generally, we see slightly smaller errors for the raked estimator than for the BART + Bayes estimator.

estimates$e <- estimates$yes_per/100 - estimates$value
## order divisions by the magnitude of the BART + Bayes error
indx <- order(abs(estimates$e[estimates$variable=="p.bayes"]))
levs <- estimates$division[estimates$variable=='p.bayes'][indx]
estimates$division <- factor(estimates$division,levels=levs,ordered = TRUE)

library(ggplot2)
ggplot(data=subset(estimates,variable %in% c("p.bayes","p.raked")),
       aes(y=division,x=e,color=variable)) +
  geom_point() +
  scale_y_discrete("") +
  scale_color_manual(values = c("p.bayes"="blue","p.raked"="red"),
                     labels=c("BART + Bayes",
                              "Raking (survey)")) +
  scale_x_continuous("Error",labels=percent,breaks=seq(-.2,.2,by=.05)) +
  guides(color=guide_legend(title="",ncol = 2)) +
  theme(panel.border = element_blank(),
        legend.position = "top")

## wide format: AMLPS result plus selected estimates (rescaled to percentages) and errors
tmp <- dcast(subset(estimates,variable %in% c("p.bayes","p.raked","naive")),
             division + yes_per ~ variable)
tmp$p.raked <- 100*tmp$p.raked
tmp$p.bayes <- 100*tmp$p.bayes
tmp$p.naive <- 100*tmp$naive
tmp$e.bayes <- tmp$yes_per - tmp$p.bayes
tmp$e.raked <- tmp$yes_per - tmp$p.raked
tmp$e.naive <- tmp$yes_per - tmp$p.naive
indx <- order(abs(tmp$e.bayes),decreasing = TRUE)
tmp$state <- ddata$State[match(tmp$division,ddata$division)]

Only in 29 seats does the BART + Bayes estimate have a smaller error than the raked estimate.
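That tally comes from comparing absolute errors seat by seat; with tmp as constructed above:

## seats where BART + Bayes beats raking on absolute error
sum(abs(tmp$e.bayes) < abs(tmp$e.raked), na.rm = TRUE)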

The misses in some seats are quite large, especially in Western Sydney seats where the "Yes" vote is low and far below the levels predicted by the adjusted VC estimates.

The general pattern is one of underestimation of support for SSM, reflecting the fact that "don't know" responses on VC were neither dropped nor reallocated in my analysis.
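An obvious alternative would have been to reallocate "don't know" responses proportionally, i.e., to renormalise over the substantive responses. A hypothetical sketch with made-up response shares for a single CED (p_yes, p_no and p_dk are illustrative names, not columns of plotData):

## hypothetical VC response shares for one CED
p_yes <- 0.55; p_no <- 0.30; p_dk <- 0.15
## dropping DK and renormalising reallocates it proportionally across yes/no
p_adj <- p_yes/(p_yes + p_no)   ## equivalently p_yes/(1 - p_dk); here about 0.647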

Divisions with the 20 largest errors for the BART + Bayes estimates are reported below:

library(knitr)
library(kableExtra)
kable(tmp[indx[1:20],
          c("division","state",
            "yes_per",
            "p.bayes","e.bayes",
            "p.raked","e.raked",
            "p.naive","e.naive")],
      format = "html",
      col.names = c("Division","State",
                    "AMLPS Yes",
                    "Estimate","Error",
                    "Estimate","Error",
                    "Estimate","Error"),
      digits=1,row.names = FALSE) %>%
  kable_styling(full_width=FALSE) %>%
  add_header_above(header=c(" "=3,
                            "BART + Bayes"=2,
                            "Raked"=2,
                            "Naive"=2))
                                 BART + Bayes       Raked              Naive
Division     State  AMLPS Yes   Estimate  Error   Estimate  Error   Estimate  Error
Watson       NSW    30.4        53.6      -23.2   44.1      -13.7   66.7      -36.3
Blaxland     NSW    26.1        49.0      -22.9   45.9      -19.8   59.2      -33.1
Corangamite  VIC    71.6        56.2       15.4   59.6       12.0   75.7       -4.1
McMahon      NSW    35.1        50.1      -15.0   46.9      -11.8   62.5      -27.4
Werriwa      NSW    36.3        50.7      -14.4   45.5       -9.2   64.1      -27.8
Fowler       NSW    36.3        50.0      -13.7   48.8      -12.5   65.5      -29.2
Fairfax      QLD    64.3        50.7       13.6   52.1       12.2   64.7       -0.4
Cowper       NSW    60.0        46.5       13.5   48.5       11.5   68.4       -8.4
Flinders     VIC    70.0        56.6       13.4   57.8       12.2   74.6       -4.6
Parramatta   NSW    38.4        51.8      -13.4   50.4      -12.0   64.5      -26.1
Warringah    NSW    75.0        61.8       13.2   65.4        9.6   79.3       -4.3
Boothby      SA     68.5        56.3       12.2   58.0       10.5   74.4       -5.9
Chifley      NSW    41.3        53.3      -12.0   50.6       -9.3   59.7      -18.4
Mayo         SA     64.7        52.9       11.8   54.1       10.6   70.0       -5.3
Barton       NSW    43.6        55.2      -11.6   53.4       -9.8   76.0      -32.4
Fisher       QLD    62.8        51.6       11.2   53.4        9.4   64.4       -1.6
Denison      TAS    73.8        62.7       11.1   64.8        9.0   79.8       -6.0
Macquarie    NSW    63.9        52.8       11.1   54.4        9.5   71.4       -7.5
Brisbane     QLD    79.5        68.5       11.0   69.8        9.7   81.1       -1.6
Richmond     NSW    67.9        56.9       11.0   58.2        9.7   74.7       -6.8

Error analysis

With a large set of Census variables aggregated to the CED level, we can begin to examine the model mis-specification that, if corrected, might have improved the performance of the adjustment procedures.

We examine the relationship between the prediction errors and four Census aggregates: two indicators of birthplace, (1) born in East Asia and (2) born outside Australia, East Asia or an English-speaking country (a good measure of the proportion of a CED born in the Middle East or continental Europe); (3) a religious affiliation other than a Christian denomination or no religion; and (4) the proportion of the electorate not speaking English.

Birthplace figured prominently in earlier versions of our modeling and does play a role in the raked estimates, which generally have the better performance here. Religious affiliation also figured in our modeling.
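Before plotting, one way to quantify the scope for improvement is to ask how much of the BART + Bayes error these aggregates can explain; a sketch, assuming the four aggregates are columns of plotData on the proportion scale (as the melt below implies):

## how much of the BART + Bayes error do the four Census aggregates explain?
e.fit <- lm(I(yes_per/100 - p.bayes) ~ bp.East_Asia + bp.Other +
              Religion_Other_Religion + SpeakEnglish,
            data=plotData)
summary(e.fit)$r.squared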

## errors on the proportion scale: AMLPS result minus prediction
plotData$e <- plotData$yes_per/100 - plotData$p.bayes
plotData$e.raked <- plotData$yes_per/100 - plotData$p.raked

library(reshape2)
library(reshape2)
## long format: BART + Bayes and raked errors alongside each Census aggregate
edata <- melt(plotData,id.vars = c("division","e","e.raked"),
              measure.vars = c("bp.East_Asia",
                               "Religion_Other_Religion",
                               "bp.Other",
                               "SpeakEnglish"))

## human-readable facet labels, in the same order as measure.vars
edata$variable <- factor(edata$variable,
                         levels=levels(edata$variable),
                         labels=c("Born East Asia",
                                  "Religion Other",
                                  "Born Other",
                                  "Speak English"))

ggplot(edata,aes(x=value,y=e)) +
  geom_smooth(method="gam",se=FALSE,color="red") +
  geom_point() +
  facet_wrap(~variable,ncol=2,scales="free_x") +
  scale_x_continuous("Prevalence of given attribute, CED",labels=percent) +
  scale_y_continuous("Error of BART + Bayes",labels=percent)

In each case it is clear that these variables contain information that, had our modeling utilised it (or utilised it more effectively), would presumably have improved the performance of our CED estimates based on adjusting the VC responses.
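In an MRP-style workflow, the most direct way to use this information would be to include the aggregates as CED-level predictors in the individual-level response model. A hedged sketch, assuming a hypothetical respondent-level data frame vc with a binary ssm indicator and the CED aggregates merged on (none of these objects appear in the code above):

## hypothetical sketch: Census aggregates as group-level predictors in a multilevel model
library(lme4)
m <- glmer(ssm ~ bp.East_Asia + bp.Other + Religion_Other_Religion + SpeakEnglish +
             (1 | division),
           data=vc, family=binomial)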

Recommendations

I close with some recommendations for myself (!) and the broader research community: