Election Forecasting

Forecasting election outcomes is an essential component of contemporary political analysis, offering a systematic way to interpret the dynamics of campaign periods. As elections approach, polling data accumulate at an increasingly rapid pace, providing a valuable but noisy signal of shifting voter intentions. Daily polls, in particular, capture short-term movements in public opinion that may reflect campaign events, economic shocks, or evolving partisan narratives. Yet these polls also contain substantial sampling error, house effects (機構效應, see this post), and methodological inconsistencies. Consequently, raw polling numbers often obscure rather than clarify the underlying electoral landscape. Effective forecasting therefore requires statistical procedures that separate meaningful shifts in voter sentiment from transient fluctuations.

One widely used strategy is to model national trends in public opinion and then adjust each state’s polling trajectory relative to this national baseline. This approach rests on the observation that much of the movement in polls during a campaign is national in character: major events, candidate performance, and macroeconomic conditions tend to shape preferences across states simultaneously. By estimating a smooth national trend, researchers can anchor the overall direction of opinion and reduce noise inherent in individual surveys.

However, states are not mere microcosms of national politics. Historical voting patterns, demographic compositions, and regional political cultures create state-specific deviations from the national trend. Modeling these deviations allows forecasters to capture persistent state-level biases as well as localized shocks that diverge from national dynamics. Incorporating both components—national shifts and state-specific variation—produces forecasts that are both more robust and more responsive to real changes in voter sentiment.

Ultimately, an election forecasting model that combines daily polling data with a hierarchical structure—linking national movement to state-level adjustments—offers a powerful tool for understanding electoral trajectories. It enables analysts to filter noise, quantify uncertainty, and generate probabilistic predictions that reflect both the shared national political environment and the distinctive features of individual states (or sub-national units). Such models are particularly valuable in closely contested elections, where small state-level shifts can determine the overall outcome.

Since this week’s topic centers on “panel data,” for pedagogical purpose, we illustrate election forecasting with four simple panel models-fully-pooled models, fixed effects model, random intercept model, and random intercept model w/ house effects-using 2020 U.S. Presidential election (Biden vs. Trump) data from ten swing states (搖擺州) between mid-June to Nov. 3rd, 2020 (a 20-week campaign period). We intend to achieve the following objectives from this analysis:

Estimate these four models
Compare deviance statistics (rMSE)
Dichotomize prediction outcomes: the U.S.’s unique two-party system allows us to insulate election outcomes from the “noise” exerted by fringe parties
Assess prediction accuracy: accuracy score and ROC-AUC plot

Let’s rumble!

Data Preparation

### Load the 2020 US election data
# I compiled this data set with data from various polling agencies (Gallup, Monmouth, Rasmussen, Siena, Quinnipiac). Email me for code if you would like
# House effects were calculated using Nate Silver's formula 
# Each cell value under the "adjusted_biden_pct" column was calculated by subtracting house_bias from the average of Biden's polls released by the 5 pollsters, that is: biden_pct - house_bias

panel_data <- readRDS(url("https://www.dropbox.com/scl/fi/j35trj1cdriznsr3c4gy2/2020US_data.rds?rlkey=ufyeb8zw1lr5vtv9alfaeq8y3&st=exwdabs9&dl=1"))

# Check out the data
dim(panel_data)  # 10 swing states x 20 weeks = 200 observations

## [1] 200   7

names(panel_data)

## [1] "state"              "week"               "time_trend"        
## [4] "adjusted_biden_pct" "house_bias"         "biden_final_pct"   
## [7] "state_f"

str(panel_data)

## 'data.frame':    200 obs. of  7 variables:
##  $ state             : chr  "AZ" "AZ" "AZ" "AZ" ...
##  $ week              : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ time_trend        : num  0 0.0526 0.1053 0.1579 0.2105 ...
##  $ adjusted_biden_pct: num  -0.751 -0.371 -2.353 -0.636 -0.654 ...
##  $ house_bias        : num  1.2 0.8 2.8 1.1 1.1 0.5 2.8 1.1 2.8 2.8 ...
##  $ biden_final_pct   : num  0.494 0.494 0.494 0.494 0.494 0.494 0.494 0.494 0.494 0.494 ...
##  $ state_f           : Factor w/ 10 levels "AZ","FL","GA",..: 1 1 1 1 1 1 1 1 1 1 ...

Feeding data into prediction models

Four models are fitted.

Fully-pooled model \[\begin{equation*} y_{it} = \beta_0 + \beta_1 t_i + \varepsilon_{it}, \qquad \varepsilon_{it} \overset{iid}{\sim} \mathcal{N}(0,\sigma^2) \end{equation*}\]
Fixed effects model \[\begin{equation*} y_{it} = \beta_0 + \beta_1 t_i + \underbrace{\sum_{s=2}^{S}\gamma_s \mathbf{1}\{s_i=s\}}_{\mbox{States}} + \varepsilon_{it}, \qquad \varepsilon_{it} \overset{iid}{\sim} \mathcal{N}(0,\sigma^2) \end{equation*}\]
Random-intercept model \[\begin{equation*} y_{it} = \beta_0 + \beta_1 t_i + u_s + \varepsilon_{it}, \qquad \begin{cases} \mbox{State intercept }u_s \overset{iid}{\sim} \mathcal{N}(0,\tau^2) \\ \mbox{Nation }\varepsilon_{it} \overset{iid}{\sim} \mathcal{N}(0,\sigma^2) \end{cases}, \;\; u_s \perp \varepsilon_{it} \end{equation*}\]
Random-intercept + house-effect model \[\begin{equation*} y_{it} = \beta_0 + \beta_1 t_i + \underbrace{\beta_2 h_{it}}_{\substack{\text{House} \\ \text{effects}}} + u_s + \varepsilon_{it}, \qquad \begin{cases} \mbox{State intercept }u_s \overset{iid}{\sim} \mathcal{N}(0,\tau^2) \\ \mbox{Nation }\varepsilon_{it} \overset{iid}{\sim} \mathcal{N}(0,\sigma^2) \end{cases}, \;\; u_s \perp \varepsilon_{it} \end{equation*}\]

#install.packages(lme4")  # Uncomment if you haven't installed this package
library(lme4)   # For running hierarchical or mixed effects models

model_pooled <- lm(adjusted_biden_pct ~ time_trend, data = panel_data)
model_fe     <- lm(adjusted_biden_pct ~ time_trend + state_f, data = panel_data)
model_ri     <- lmer(adjusted_biden_pct ~ time_trend + (1 | state_f), data = panel_data)

## boundary (singular) fit: see help('isSingular')

model_ri_house <- lmer(adjusted_biden_pct ~ time_trend + house_bias + (1 | state_f), data = panel_data)

Prediction

# Create a grid as a space filler to store prediction outcomes 
pred_grid <- data.frame(
  time_trend = 1,
  state_f = factor(unique(panel_data$state)),
  house_bias = mean(panel_data$house_bias)
)

# Predict
# How to use predict(): predict(your model, new data)

pred_pooled <- predict(model_pooled, pred_grid)
pred_fe     <- predict(model_fe, pred_grid)
pred_ri     <- predict(model_ri, pred_grid, allow.new.levels = TRUE)  # allow.new.levels set to TRUE to handle newly-generated state-level predictions
pred_ri_house <- predict(model_ri_house, pred_grid, allow.new.levels = TRUE)

# Biden's final voting results in the 10 swing states (source: https://www.fec.gov/resources/cms-content/documents/federalelections2020.pdf)
biden_end <- c(0.494, 0.494, 0.508, 0.503, 0.462, 0.500, 0.494, 0.475, 0.454, 0.460)


predictions <- data.frame(
  state = unique(panel_data$state),
  actual = biden_end,
  pooled = pred_pooled,
  fe = pred_fe,
  ri = pred_ri,
  ri_house = pred_ri_house
)

head(predictions)

##   state actual     pooled         fe         ri   ri_house
## 1    AZ  0.494 -0.8903929 -1.0438896 -0.8903929 -0.8635277
## 2    GA  0.494 -0.8903929 -0.7447264 -0.8903929 -0.8596526
## 3    MI  0.508 -0.8903929 -0.8540940 -0.8903929 -0.8396328
## 4    NV  0.503 -0.8903929 -0.7205049 -0.8903929 -0.8459044
## 5    NC  0.462 -0.8903929 -0.7902733 -0.8903929 -0.8843482
## 6    PA  0.500 -0.8903929 -1.0973917 -0.8903929 -0.8378611

Deviance Metric: rMSE

A model that produces smaller error around the actual value (final voting result) indicates the model is more accurate. We use root mean quare error (rMSE) as our yardstick.

# rMSE
rmse_vals <- sapply(predictions[, -1], function(x) sqrt(mean((x - predictions$actual)^2)))

rmse_df <- data.frame(Model = names(rmse_vals), RMSE = round(rmse_vals, 4))

print(rmse_df)

##             Model   RMSE
## actual     actual 0.0000
## pooled     pooled 1.3749
## fe             fe 1.3839
## ri             ri 1.3749
## ri_house ri_house 1.3485

# Random effects model w/ house effects produces the smallest rMSE and is therefore the most accurate among the four models, followed by fully-pooled and random effects model (the two having identical rMSE). The fixed effects model fares the worst, but also not too far off.

Dichotomized Measure and Accuracy Score

Under the winner-take-all system of vote allocation, a candidate who receives > 50% of a state’s electoral college votes canvasses all Electoral College votes of that state and thereby “wins” that state. This rather eccentric (though I would say quite practical) rule allows us to dichotomize our prediction outcome for each state, and calculate the overall “accuracy” score of our prediction. The accuracy score is defined as:

\[\begin{equation*} \frac{\mbox{True Positives }+\mbox{ True Negatives}}{\mbox{True Positives }+\mbox{ True Negatives }+\mbox{ False Positives }+\mbox{ False Negatives}}, \end{equation*}\]

corresponding to the four cells on the confusion matrix.

A model that receives higher accuracy score means it’s more accurate.

# Dichotomize prediction outcome
pred_bin <- predictions[, -c(1:2)] > 0.5
actual_bin <- predictions$actual > 0.5
accuracy_vals <- colMeans(pred_bin == actual_bin)
acc_df <- data.frame(Model = names(accuracy_vals), Accuracy = accuracy_vals)

print(acc_df)

##             Model Accuracy
## pooled     pooled      0.8
## fe             fe      0.8
## ri             ri      0.8
## ri_house ri_house      0.8

# It seems that the four models have almost indistinguishable accuracy scores. That's bad, we turn to graphic method as an alternative evaluation metric.

ROC-AUC plot

Receiver Operating Characteristic (ROC) is a graphical tool that illustrates the diagnostic ability of a \(\textit{binary classifier}\) as its discrimination threshold varies. The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR, which is calculated as \(\frac{FP}{FP + TN}\) \(=\) 1 \(-\) specificity) across various thresholds. Because the ROC curve is generated by plotting the cumulative distribution function (CDF) of the detection probability on the \(y\)-axis versus the CDF of the false positive rate (which is just the complementary probability of \(specificity\)) on the \(x\)-axis, curves having higher TPR (\(sensitivity\)) and lower FPR (1 \(-\) \(specificity\)) have larger Area Under the Curve (AUC) and are said to have better diagnostic ability. Let’s plot it.

#install.packages("pROC", "ggplot2")  # Uncomment if you haven't installed these packages
library(pROC)
library(ggplot2)

# Calculate the AUC for each model
auc_vals <- sapply(predictions[, -c(1:2)], function(x) {
  roc_obj <- roc(actual_bin, x, quiet = TRUE)
  as.numeric(auc(roc_obj))
})
auc_df <- data.frame(Model = names(auc_vals), AUC = round(auc_vals, 4))

print(auc_df)

##             Model   AUC
## pooled     pooled 0.500
## fe             fe 0.750
## ri             ri 0.500
## ri_house ri_house 0.875

# ROC-AUC Plot
roc_list <- lapply(predictions[, -c(1:2)], function(x) roc(actual_bin, x, quiet = TRUE))
names(roc_list) <- c("Pooled", "Fixed Effects", "Random Intercept", "RI + House")
ggroc(roc_list) + 
  theme_minimal() + 
  labs(title = "ROC Curves: 2020 Swing States", color = "Model", x = "1 - Specificity", y = "Sensitivity")

Aligned with our previous rMSE estimates, the RI + house effects model has the largest AUC and can be said to be the most accurate model among the four.

Gauging the State of Online Public Opinion: the case of COVID vaccine preference

In June 2021, Taiwan was in a period of a significant domestic COVID-19 outbreak, and its vaccination rollout was still in its early stages. Access to vaccines was limited, and a large majority of the population was unvaccinated at that time. What makes this matter worse is that it touched on every nerve of Taiwan’s domestic politics. First, there’s a imploding crisis centered on a shortage of international vaccines (Moderna, BioNTech (hereafter, BNT)) amid another debate over the Emergency Use Authorization (EUA) for a domestically developed vaccine, Medigen (高端) (rather than a “no vaccine” controversy in the sense of a complete absence of shots). The primary issue was the availability and brand preference.

\(\hspace{50mm} \textbf{A Deja Vu?}\)

Public opinion was divided. The DPP government was accusing Beijing of interfering with its BNT deal, suspecting the BNT’s Shanghai-based distributor as carrying some unscrupulous political agendas, whereas the U.S.-supplied Moderna was seen as the testiment of U.S. support for Taiwan’s counter-pandemic efforts, making Moderna the preferred vaccine of choice. Meanwhile, the opposition parties (KMT and TPP) blamed the government for procrastinating the acquisition of BNT vaccines due to the DPP’s ideological stance.

Rhetorics were politically charged, same were partisans’ emotions. This makes us wonder if partisans’ emotional attachments to \(\textbf{their}\) parties were also reflected in their preferred vaccine brands despite the majority of the population was unvaccinated at that time. Let’s find out!

The NTU WebSurvey (NTUWS) research team administered a brief survey on its panelists between 10th and 14th of June, 2021 concerning the following matters:

Public support for counter-pandemic efforts carried out by different levels of government, and
Vaccine brand preferences.

The collected responses were then combined with surveyed panelists’ registered personal attributes to make a usable data set of 1,040 observations. We then applied support vector machine (SVM), a machine learning algorithm for non-linear classification that projects a hyperplane (or a decision boundary) to classify data points belonging to different classes as a function of relevant attributes (which are used as predictors) using kernel trick. Analytical procedures are outlined at below.

Data Preparation

Sys.setlocale("LC_ALL","Chinese")  # Chinese-encoding

## Warning in Sys.setlocale("LC_ALL", "Chinese"): using locale code page other
## than 65001 ("UTF-8") may cause problems

## [1] "LC_COLLATE=Chinese_Taiwan.936;LC_CTYPE=Chinese_Taiwan.936;LC_MONETARY=Chinese_Taiwan.936;LC_NUMERIC=C;LC_TIME=Chinese_Taiwan.936"

vac <- read.csv("C://Users/hktse/Dropbox/眾包平台/問卷資料備份/Surveycake/vaccine_pilot.csv", encoding="UTF-8", stringsAsFactors=FALSE)

# Check out the data
dim(vac)

## [1] 1040   21

str(vac)

## 'data.frame':    1040 obs. of  21 variables:
##  $ X.U.FEFF.Vaccination.helps.counter.COVID.19                                                        : chr  "Agree" "Strongly agree" "Strongly agree" "Agree" ...
##  $ If.COVID.vaccines.beome.available..I.m.willing.to.get.vaccinated                                   : chr  "Strongly agree" "Strongly agree" "Strongly agree" "Agree" ...
##  $ Which.of.the.following.best.describle.your.reason.for.not.getting.vaccinated                       : chr  "" "" "" "" ...
##  $ When.COVID.vaccines.become.available..how.soon.would.you.be.willing.to.get.vaccinated.             : chr  "Immediately" "1-3 months" "1-3 months" "3-6 months" ...
##  $ You.are.most.willing.to.receive.vaccinations.made.by.which.of.the.following.countries.             : chr  "US" "US" "US" "Germany" ...
##  $ Which.vaccines.are.you.most.willing.to.receive.vaccination.                                        : chr  "BNT" "Sinovac" "Moderna" "Moderna" ...
##  $ Source.country..Moderna                                                                            : chr  "US" "US" "US" "US" ...
##  $ Source.country..BNT                                                                                : chr  "US" "US" "Germany" "Germany" ...
##  $ Source.country..AZ                                                                                 : chr  "UK" "UK" "UK" "UK" ...
##  $ From.1.10..How.would.you.rate.Minister.Chen.Shih.chung.s.performance.                              : int  7 7 7 2 5 3 10 7 10 3 ...
##  $ From.1.10..How.would.you.rate.President.Tsai.Ing.Wen.s.performance.                                : int  5 5 5 4 5 4 5 7 10 4 ...
##  $ From.1.10..How.would.you.rate.the.performance.of.the.mayor.of.the.city.county.you.currently.reside.: int  8 9 9 7 5 8 5 6 10 5 ...
##  $ Gender                                                                                             : int  1 1 1 0 0 0 1 0 1 0 ...
##  $ Age                                                                                                : chr  "30~39" "30~39" "30~39" "30~39" ...
##  $ City                                                                                               : chr  "台北市" "新竹縣" "新竹縣" "新北市" ...
##  $ Party                                                                                              : chr  "國民黨" "泛綠，但沒有偏向特定政黨" "泛綠，但沒有偏向特定政黨" "泛藍，但沒有偏向特定政黨" ...
##  $ Education                                                                                          : int  5 5 5 4 4 4 4 3 4 3 ...
##  $ Marriage                                                                                           : chr  "從未結婚" "已婚" "已婚" "已婚" ...
##  $ Children                                                                                           : int  1 0 0 0 1 0 1 0 1 1 ...
##  $ Son                                                                                                : int  0 1 1 1 0 1 0 1 0 0 ...
##  $ Daughter                                                                                           : int  0 1 0 0 0 0 0 1 0 0 ...

names(vac)

##  [1] "X.U.FEFF.Vaccination.helps.counter.COVID.19"                                                        
##  [2] "If.COVID.vaccines.beome.available..I.m.willing.to.get.vaccinated"                                   
##  [3] "Which.of.the.following.best.describle.your.reason.for.not.getting.vaccinated"                       
##  [4] "When.COVID.vaccines.become.available..how.soon.would.you.be.willing.to.get.vaccinated."             
##  [5] "You.are.most.willing.to.receive.vaccinations.made.by.which.of.the.following.countries."             
##  [6] "Which.vaccines.are.you.most.willing.to.receive.vaccination."                                        
##  [7] "Source.country..Moderna"                                                                            
##  [8] "Source.country..BNT"                                                                                
##  [9] "Source.country..AZ"                                                                                 
## [10] "From.1.10..How.would.you.rate.Minister.Chen.Shih.chung.s.performance."                              
## [11] "From.1.10..How.would.you.rate.President.Tsai.Ing.Wen.s.performance."                                
## [12] "From.1.10..How.would.you.rate.the.performance.of.the.mayor.of.the.city.county.you.currently.reside."
## [13] "Gender"                                                                                             
## [14] "Age"                                                                                                
## [15] "City"                                                                                               
## [16] "Party"                                                                                              
## [17] "Education"                                                                                          
## [18] "Marriage"                                                                                           
## [19] "Children"                                                                                           
## [20] "Son"                                                                                                
## [21] "Daughter"

# Recode "Do you think vaccine helps counter COVID19?"
vac[1] <- as.integer(ifelse(vac[1] == "Strongly agree", 4,
       ifelse(vac[1] == "Agree", 3, ifelse(vac[1] == "Disagree", 2, 1
                                           ))))

# Recode: I'm willing to get vaccinated if COVID vaccines beome available
vac[2] <- as.integer(ifelse(vac[2] == "Strongly agree", 4,
                            ifelse(vac[2] == "Agree", 3, ifelse(vac[2] == "Disagree", 2, 1
                            ))))

# Recode: When COVID vaccines become available, how soon would you be willing to get vaccinated?

vac[4] <- as.integer(ifelse(vac[4] == "Immediately", 6,
                            ifelse(vac[4] == "less than 1 month", 5, ifelse(vac[4] == "1-3 months", 4, 
                                    ifelse(vac[4] == "3-6 months", 3, ifelse(vac[4] == "6-12 months", 2, ifelse(vac[4] == "1-2 years", 1, NA)))                                        
                            ))))

# Recode PID: higher values mean more identified with the DPP (or pan-green)

vac$Party <- as.numeric(ifelse(vac$Party == "新黨", -4,
                            ifelse(vac$Party == "親民黨", -3, ifelse(vac$Party == "國民黨", -2, 
                                                                            ifelse(vac$Party == "泛藍，但沒有偏向特定政黨", -1, 
                                                                                   ifelse(vac$Party == "台灣民眾黨", 0, ifelse(vac$Party == "泛綠，但沒有偏向特定政黨", 1, 
                                                                            ifelse(vac$Party == "民進黨", 2,
                                                                                   ifelse(vac$Party == "時代力量", 3,
                                                                                          ifelse(vac$Party == "台灣基進", 4, vac$Party))))))                                        
                            ))))


# Dichotomize partisan 
vac$partisan <- as.factor(ifelse(vac$Party < 0, 1, 0))

# Recode age

vac$Age <- as.integer(ifelse(vac$Age == "1歲", 1, 
                             ifelse(vac$Age == "20~29", 2, 
                                    ifelse(vac$Age == "30~39", 3, 
                                           ifelse(vac$Age == "40~49", 4, 
                                             ifelse(vac$Age == "50~59", 5,     
                                             ifelse(vac$Age == "60~69", 6, 
                                                    ifelse(vac$Age == "70歲以上", 7, vac$Age))))))
                      ))

# Rename columns

names(vac)[1] <- "effectiveness"
names(vac)[2] <- "vaccinated"
names(vac)[10] <- "Chen Shih-Chung's rating"
names(vac)[11] <- "Tsai Ing-Wen's rating"
names(vac)[12] <- "Local Mayor's rating"

names(vac)[4] <- "How soon"
names(vac)[5] <- "Vaccine source country"
names(vac)[6] <- "Brand"

names(vac)[7] <- "Source country: Moderna"
names(vac)[8] <- "Source country: BNT"
names(vac)[9] <- "Source country: AZ"
names(vac)[10] <- "CSC"  # 陳時中's rating
names(vac)[11] <- "TIW"　# 蔡英文's rating

# Recode vaccine source countries
vac$Moderna <- ifelse(vac$"Source country: Moderna" == "US", 1, 0)
vac$BNT <- ifelse(vac$"Source country: BNT" == "Germany", 1, 0)
vac$AZ <- ifelse(vac$"Source country: AZ" == "AZ", 1, 0)

# Recode marriage status
vac$Marriage <- ifelse(vac$Marriage == "從未結婚" | vac$Marriage == "同居" | vac$Marriage == "離婚", 0, 1)

# Trichotomize partisan vaccine orientation: 1 for Moderna, 2 for BNT, 3 for other brands
vac$vac_partisan <- as.factor(ifelse(vac$Brand == "Moderna", 1, ifelse(vac$Brand == "BNT", 2, 3)))


# Subset required columns 
sub_vac <- vac[, c(1, 12, 13, 14, 16, 17, 18, 19, 23, 26, 11)]
sub_vac$Party <- sub_vac$Party + 5  # Convert to positive values

# Temporarily move out BNT_partisan for re-scaling
y <- sub_vac$vac_partisan

Feeding data into SVM

library(caret)
# preprocess the data, "range" scales the data to the interval between zero and one
vac_normalize <- preProcess(sub_vac, method = c("center", "scale"))
sub_vac <- predict(vac_normalize, newdata = sub_vac)

# Assign the rescaled values back in
sub_vac$vac_partisan <- y    


### Plotting SVM
# install.packages(c("plotly", "e1071", "ggplot2")) # Uncomment if you haven't installed these packages
library(e1071)
library(ggplot2)

# First run a simple SVM classifier using Party and approval rating for President 蔡英文 as predictors
svm_2D.fit <- svm(vac_partisan ~ Party + TIW, 
               data = sub_vac, 
               type = "C-classification",  # Linear classifier
               kernal = "radial", 
               gamma = 0.5, cost = 1)

# Color the graph
colors = c("#99FF99","#3399FF","#81D8D0")  # DPP -> KMT -> TPP
plot(svm_2D.fit, data = sub_vac, Party ~ TIW, col = colors)

The plot does show that DPP supporters tend to favor Moderna, compared to their KMT counterparts who slightly prefer BNT, while TPP supporters’ vaccine preferences are a bit hard to classify.

Projecting classification result into a 3-dimensional space

library(plotly)  # For 3D plot

# Keep only Moderna vs BNT (binary) for a clean-cut binary surface for demonstration purpose
sub_vac <- sub_vac[sub_vac$vac_partisan %in% c("Moderna", "BNT"), ]
sub_vac$vac_partisan <- factor(sub_vac$vac_partisan,
                               levels = c("BNT", "Moderna"))   # BNT = negative, Moderna = positive

# Perform a SVM classifier using Partisan orientation, Tsai's rating, and local mayor's rating as predictors
svm_3D.fit <- svm(vac_partisan ~ Party + TIW + `Local Mayor's rating`, 
               data = sub_vac, 
               type = "C-classification",  # Linear classifier
               kernal = "radial", 
               gamma = 0.5, cost = 1)

# Create a 2D grid as x1 and x2 (to save on typing), and then fix x3 at median value
x1_grid <- seq(min(sub_vac$Party), max(sub_vac$Party), length.out = 50)
x2_grid <- seq(min(sub_vac$TIW), max(sub_vac$TIW), length.out = 50)
grid_2d <- expand.grid(x1 = x1_grid, x2 = x2_grid)
grid_2d$x3 <- median(sub_vac$`Local Mayor's rating`)  # Fix x3
# Rename the first three columns corresponding to the 3 predictors used in SVM
names(grid_2d)[1:3] <- c("Party", "TIW", "Local Mayor's rating")

# Predict decision values (signed distance)
grid_2d$decision <- attr(predict(svm_3D.fit, grid_2d, decision.values = TRUE),
                         "decision.values")[,1]

# Reshape decision values into a 50 x 50 matrix for surface
decision_mat <- matrix(grid_2d$decision, nrow = length(x1_grid), ncol = length(x2_grid))


# Customize colorscale (light-blue <- white -> light-green)
custom_colorscale <- list(
  c(0,   "#3399FF"),  # BNT side (negative)
  c(0.5, "#81D8D0"),  # boundary (Tiffany blue)
  c(1,   "#99FF99")   # Moderna side (positive)
)

# Plot: 3D scatter + decision surface (fixed x3 at its median value)

plot_ly() %>%
  # Moderna (light green)
  add_markers(data = sub_vac[sub_vac$vac_partisan == "Moderna", ], x = ~x1, y = ~x2, z = ~x3,
              marker = list(color = "#99FF99", size = 3),
              name = "Moderna", showlegend = TRUE) %>%
  # BNT (light blue)
  add_markers(data = sub_vac[sub_vac$vac_partisan == "BNT", ], x = ~x1, y = ~x2, z = ~x3,
              marker = list(color = "#3399FF", size = 3),
              name = "BNT", showlegend = TRUE) %>%
  # Decision surface: custom colors, NO colorbar
  add_surface(
    x = x1_grid, y = x2_grid,
    z = matrix(median(sub_vac$`Local Mayor's rating`), 50, 50),
    surfacecolor = decision_mat,
    colorscale = custom_colorscale,
    cmin = -3, cmax = 2,  # cmin should be smaller than min(decision_mat), cmax should be larger than max(decision_mat). This means range [-3, 0] will be colored in light blue, and range [1, 2] will be colored in light green.
    opacity = 0.75,
    showscale = FALSE,  # THIS REMOVES THE COLORBAR
    contours = list(z = list(show = TRUE, color = "black", width = 2))
  ) %>%
  layout(
    scene = list(
      xaxis = list(title = "DPP ←  → KMT"),
      yaxis = list(title = "Tsai Ing-wen's rating"),
      zaxis = list(title = "Local Mayor's rating")
    ),
    title = "3D SVM: Partisan Vaccine Preference (Moderna vs BNT)",
    legend = list(title = list(text = "Vaccine preference"))
  )

The output should trigger an interactive pop-up on the “Viewer” of your RStudio output panel (lower right corner). Alas, I don’t have this privilege on my Rmarkdown, so I put the screencap here for your reference.

Clearly, respondents who identified themselves with the DPP AND gave President Tsai Ing-wen higher rating tended to favor Moderna, compared to those who identified themselves with the KMT AND gave President Tsai Ing-wen lower rating. Partisan politics are a pervasive element of modern life, even in matters like COVID!

So, yes, partisanship does affect vaccine brand preference!

PS 5708 Week 12: Forecasting using Panel data

HK Tseng

2025-11-14

Election Forecasting

Data Preparation

Feeding data into prediction models

Prediction

Deviance Metric: rMSE

Dichotomized Measure and Accuracy Score

ROC-AUC plot

Gauging the State of Online Public Opinion: the case of COVID vaccine preference

Data Preparation

Feeding data into SVM

Projecting classification result into a 3-dimensional space