Overview

The semester project requires you to model a problem using any of the methods covered in this class. You are welcome to work on a problem that you are already tackling in your own job.

Data Exploration

anes_data <- read.csv(paste0("https://raw.githubusercontent.com/jzuniga123/",
                        "SPS/master/DATA%20621/anes_timeseries_2016.csv"), header = T)
VARIABLE CODE NAME DESCRIPTION
330 V161245 relig_attendance PRE: Attend religious services how often
668 V162002 television_usage POST: How many programs about 2016 campaign did Respondent watch on TV
670 V162004 internet_usage POST: How many times Respondent got info about 2016 campaign on the Internet
692 V162018e internet_activity POST: DHS: sent a message on Facebook/Twitter about polit iss
719 V162031 voted_presidential POST: Did Respondent vote for President
726 V162034a presidential_vote POST: For whom did Respondent vote for President
727 V162035 presidential_intensity POST: Preference strong for Pres cand for whom Respondent voted
773 V162076b justice_recall POST: Office recall: US Supreme Ct Chief Justice Roberts
792 V162096 therm_feminists POST: Feeling thermometer: FEMINISTS
794 V162098 therm_unions POST: Feeling thermometer: LABOR UNIONS
796 V162100 therm_business POST: Feeling thermometer: BIG BUSINESS
799 V162103 therm_glbt POST: Feeling thermometer: GAY MEN AND LESBIANS
801 V162105 therm_rich POST: Feeling thermometer: RICH PEOPLE
802 V162106 therm_muslims POST: Feeling thermometer: MUSLIMS
803 V162107 therm_christians POST: Feeling thermometer: CHRISTIANS
804 V162108 therm_jews POST: Feeling thermometer: JEWS
806 V162110 therm_police POST: Feeling thermometer: POLICE
808 V162112 therm_scientists POST: Feeling thermometer: SCIENTISTS
822 V162123 nationalism POST: Better if rest of world more like America
825 V162125x flag_pride POST: SUMMARY- How good/bad does Respondent feel to see American flag
837 V162136x economic_mobility POST: SUMMARY-Economic mobility easier/harder compared to 20 yrs ago
840 V162139 reducing_debt POST: Importance of reducing deficit
841 V162140 millionaires_tax POST: Does Respondent favor or oppose tax on millionaires
845 V162144 obamacare_costs POST: Health Care Law effect on cost of Respondent’s health care
865 V162158 immig_hurts_jobs POST: How likely immigration will take away jobs
867 V162160 terror_anxiety POST: How worried about terrorist attack next 12 months
873 V162165 fin_situation POST: Worry about financial situation
899 V162185 govt_size POST: Less govt better OR more that govt should be doing
931 V162212 slavery_hurts POST: Agree/disagree: past slavery make more diff for blacks
954 V162230x gender_roles POST: SUMMARY- Better if man works and woman takes care of home
968 V162239 indep_respect POST: Child trait more important: independence or respect
985 V162255x obama_muslim POST: SUMMARY- Barack Obama is/isn’t Muslim
1197 V999999 target POST: Vote intensity (2-candidate) created by combining V162034a and V162035

Clean Data

Select columns and keep only respondents who voted. Refused and don't-know responses are coded as negative numbers, so all negative values are converted to nulls. ANES feeling thermometers are evenly spaced interval variables ranging from 0 to 100, so values outside that range are also converted to nulls.

variables <- c(330, 668, 670, 692, 719, 726, 727, 773, 792, 794, 796, 799, 801, 802, 803,
  804, 806, 808, 822, 825, 837, 840, 841, 845, 865, 867, 873, 899, 931, 954, 968, 985)
anes_data <- anes_data[ , variables]
anes_data <- subset(anes_data, V162031 == 4) # only look at subset that voted
anes_data <- anes_data[ , -which(names(anes_data) %in% "V162031")]
anes_data[anes_data < 0 | anes_data > 100] <- NA # make invalid values NA
names(anes_data)
##  [1] "V161245"  "V162002"  "V162004"  "V162018e" "V162034a" "V162035" 
##  [7] "V162076b" "V162096"  "V162098"  "V162100"  "V162103"  "V162105" 
## [13] "V162106"  "V162107"  "V162108"  "V162110"  "V162112"  "V162123" 
## [19] "V162125x" "V162136x" "V162139"  "V162140"  "V162144"  "V162158" 
## [25] "V162160"  "V162165"  "V162185"  "V162212"  "V162230x" "V162239" 
## [31] "V162255x"

Normalize Data

Create the target variable, remove the underlying variables used to construct it, specify its factor levels, remove rows with null target values, and rename the columns.

V999999 <- rep(NA, nrow(anes_data))
V999999[anes_data[, "V162034a"] == 1 & anes_data[, "V162035"] == 1] <- 1 # Voter w/ strong pref for Hillary
V999999[anes_data[, "V162034a"] == 1 & anes_data[, "V162035"] == 2] <- 2 # Voter w/o strong pref for Hillary
V999999[anes_data[, "V162034a"] == 2 & anes_data[, "V162035"] == 2] <- 3 # Voter w/o strong pref for Trump
V999999[anes_data[, "V162034a"] == 2 & anes_data[, "V162035"] == 1] <- 4 # Voter w/ strong pref for Trump
anes_data <- anes_data[ , -which(names(anes_data) %in% "V162034a")]
anes_data <- anes_data[ , -which(names(anes_data) %in% "V162035")]
anes_data[, "target"] <- as.factor(V999999)
levels(anes_data[, "target"]) <- c("HRC_Strong", "HRC_Weak", "DJT_Weak", "DJT_Strong")
anes_data <- subset(anes_data, !is.na(target)) # remove rows with null target values

colnames(anes_data) <- c("relig_attendance", "television_usage", "internet_usage", 
  "internet_activity", "justice_recall", "therm_feminists", "therm_unions", 
  "therm_business", "therm_glbt", "therm_rich", "therm_muslims", "therm_christians", 
  "therm_jews", "therm_police", "therm_scientists", "nationalism", "flag_pride", 
  "economic_mobility", "reducing_debt", "millionaires_tax", "obamacare_costs", 
  "immig_hurts_jobs", "terror_anxiety", "fin_situation", "govt_size", 
  "slavery_hurts", "gender_roles", "indep_respect", "obama_muslim", "target")
names(anes_data)
##  [1] "relig_attendance"  "television_usage"  "internet_usage"   
##  [4] "internet_activity" "justice_recall"    "therm_feminists"  
##  [7] "therm_unions"      "therm_business"    "therm_glbt"       
## [10] "therm_rich"        "therm_muslims"     "therm_christians" 
## [13] "therm_jews"        "therm_police"      "therm_scientists" 
## [16] "nationalism"       "flag_pride"        "economic_mobility"
## [19] "reducing_debt"     "millionaires_tax"  "obamacare_costs"  
## [22] "immig_hurts_jobs"  "terror_anxiety"    "fin_situation"    
## [25] "govt_size"         "slavery_hurts"     "gender_roles"     
## [28] "indep_respect"     "obama_muslim"      "target"

Convert unordered categorical variables into ordinal variables. For instance, “yes, no, maybe” is not in an intuitive order, whereas “yes, maybe, no” runs intuitively from positive to neutral to negative.

swap <- function(vector, value1, value2){
  vector <- replace(vector, vector == value1, value2)
  vector <- replace(vector, vector == value2, value1)
  return(vector)
}
anes_data[, "millionaires_tax"] <- swap(anes_data[, "millionaires_tax"], 2, 3)
anes_data[, "obamacare_costs"] <- swap(anes_data[, "obamacare_costs"], 2, 3)
anes_data[, "indep_respect"] <- swap(anes_data[, "indep_respect"], 2, 3)

Create random training and evaluation datasets, specify the \(X\) variables, specify the \(Y\) variable, and specify the variable types that will be examined later in the analysis.

set.seed(2017)
N <- nrow(anes_data)
n <- sample(1:N, 2 * N / 3, replace = F)

The training data set is anes_data[n, ], with anes_data[n, -ncol(anes_data)] representing the \(X\) variables and anes_data[n, ncol(anes_data)] representing the \(Y\) variable. The evaluation data set is anes_data[-n, ], with anes_data[-n, -ncol(anes_data)] as its \(X\) variables.
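For readability in later steps, the same split could also be stored in named objects. This is a minimal sketch; the object names below are illustrative and are not reused elsewhere in this report.

train_data <- anes_data[n, ]                   # training rows (X and Y)
train_X    <- anes_data[n, -ncol(anes_data)]   # training predictors
train_Y    <- anes_data[n,  ncol(anes_data)]   # training target
eval_data  <- anes_data[-n, ]                  # hold-out rows used for later predictions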

Binary Variables

binary <- c(4, 5, 20, 21, 25, 28)
names(anes_data[binary])
## [1] "internet_activity" "justice_recall"    "millionaires_tax" 
## [4] "obamacare_costs"   "govt_size"         "indep_respect"

Ordinal Variables

ordinal <- c(1:3, 16:19, 22:24, 26, 27, 29)
names(anes_data[ordinal])
##  [1] "relig_attendance"  "television_usage"  "internet_usage"   
##  [4] "nationalism"       "flag_pride"        "economic_mobility"
##  [7] "reducing_debt"     "immig_hurts_jobs"  "terror_anxiety"   
## [10] "fin_situation"     "slavery_hurts"     "gender_roles"     
## [13] "obama_muslim"

Interval Variables

interval <- grep("^therm", colnames(anes_data))
names(anes_data[interval])
##  [1] "therm_feminists"  "therm_unions"     "therm_business"  
##  [4] "therm_glbt"       "therm_rich"       "therm_muslims"   
##  [7] "therm_christians" "therm_jews"       "therm_police"    
## [10] "therm_scientists"

Numerical Summaries

summary(anes_data[n, ], digits=3)
##  relig_attendance television_usage internet_usage internet_activity
##  Min.   :1.00     Min.   :1.00     Min.   :1.00   Min.   :1.00     
##  1st Qu.:1.00     1st Qu.:2.00     1st Qu.:2.00   1st Qu.:1.00     
##  Median :2.00     Median :3.00     Median :3.00   Median :2.00     
##  Mean   :2.42     Mean   :3.04     Mean   :3.02   Mean   :1.64     
##  3rd Qu.:4.00     3rd Qu.:4.00     3rd Qu.:4.00   3rd Qu.:2.00     
##  Max.   :5.00     Max.   :4.00     Max.   :4.00   Max.   :2.00     
##  NA's   :616                       NA's   :2      NA's   :2        
##  justice_recall  therm_feminists  therm_unions   therm_business 
##  Min.   :0.000   Min.   :  0.0   Min.   :  0.0   Min.   :  0.0  
##  1st Qu.:0.000   1st Qu.: 40.0   1st Qu.: 40.0   1st Qu.: 40.0  
##  Median :0.000   Median : 51.0   Median : 60.0   Median : 50.0  
##  Mean   :0.284   Mean   : 57.3   Mean   : 56.8   Mean   : 49.8  
##  3rd Qu.:1.000   3rd Qu.: 79.0   3rd Qu.: 71.0   3rd Qu.: 62.0  
##  Max.   :1.000   Max.   :100.0   Max.   :100.0   Max.   :100.0  
##                  NA's   :24      NA's   :13      NA's   :13     
##    therm_glbt      therm_rich    therm_muslims   therm_christians
##  Min.   :  0.0   Min.   :  0.0   Min.   :  0.0   Min.   :  0.0   
##  1st Qu.: 50.0   1st Qu.: 50.0   1st Qu.: 40.0   1st Qu.: 60.0   
##  Median : 60.0   Median : 50.0   Median : 50.0   Median : 85.0   
##  Mean   : 61.4   Mean   : 55.2   Mean   : 55.2   Mean   : 76.5   
##  3rd Qu.: 85.0   3rd Qu.: 70.0   3rd Qu.: 70.0   3rd Qu.:100.0   
##  Max.   :100.0   Max.   :100.0   Max.   :100.0   Max.   :100.0   
##  NA's   :12      NA's   :17      NA's   :28      NA's   :16      
##    therm_jews     therm_police   therm_scientists  nationalism 
##  Min.   :  0.0   Min.   :  0.0   Min.   :  0.0    Min.   :1.0  
##  1st Qu.: 51.0   1st Qu.: 70.0   1st Qu.: 60.0    1st Qu.:2.0  
##  Median : 71.0   Median : 85.0   Median : 84.0    Median :3.0  
##  Mean   : 73.2   Mean   : 76.9   Mean   : 77.0    Mean   :3.1  
##  3rd Qu.: 86.0   3rd Qu.: 93.0   3rd Qu.: 93.8    3rd Qu.:4.0  
##  Max.   :100.0   Max.   :100.0   Max.   :100.0    Max.   :5.0  
##  NA's   :22      NA's   :7       NA's   :13       NA's   :1    
##    flag_pride   economic_mobility reducing_debt  millionaires_tax
##  Min.   :1.00   Min.   :1.00      Min.   :1.00   Min.   :1.00    
##  1st Qu.:1.00   1st Qu.:4.00      1st Qu.:1.00   1st Qu.:1.00    
##  Median :1.00   Median :6.00      Median :2.00   Median :1.00    
##  Mean   :1.89   Mean   :5.48      Mean   :1.87   Mean   :1.33    
##  3rd Qu.:2.00   3rd Qu.:7.00      3rd Qu.:2.00   3rd Qu.:2.00    
##  Max.   :7.00   Max.   :7.00      Max.   :5.00   Max.   :2.00    
##  NA's   :1      NA's   :7         NA's   :5      NA's   :5       
##  obamacare_costs immig_hurts_jobs terror_anxiety fin_situation 
##  Min.   :1.00    Min.   :1.00     Min.   :1.00   Min.   :1.00  
##  1st Qu.:1.00    1st Qu.:2.00     1st Qu.:2.00   1st Qu.:3.00  
##  Median :1.00    Median :3.00     Median :3.00   Median :3.00  
##  Mean   :1.24    Mean   :2.75     Mean   :2.71   Mean   :3.36  
##  3rd Qu.:1.00    3rd Qu.:4.00     3rd Qu.:3.00   3rd Qu.:4.00  
##  Max.   :2.00    Max.   :4.00     Max.   :5.00   Max.   :5.00  
##  NA's   :35      NA's   :5        NA's   :2      NA's   :3     
##    govt_size    slavery_hurts   gender_roles  indep_respect 
##  Min.   :1.00   Min.   :1.00   Min.   :1.00   Min.   :1.00  
##  1st Qu.:1.00   1st Qu.:2.00   1st Qu.:2.00   1st Qu.:1.00  
##  Median :1.00   Median :3.00   Median :4.00   Median :2.00  
##  Mean   :1.49   Mean   :2.96   Mean   :3.22   Mean   :1.74  
##  3rd Qu.:2.00   3rd Qu.:4.00   3rd Qu.:4.00   3rd Qu.:2.00  
##  Max.   :2.00   Max.   :5.00   Max.   :7.00   Max.   :2.00  
##  NA's   :11     NA's   :4      NA's   :23     NA's   :7     
##   obama_muslim          target   
##  Min.   : 1.00   HRC_Strong:609  
##  1st Qu.: 4.00   HRC_Weak  :243  
##  Median : 8.00   DJT_Weak  :205  
##  Mean   : 7.01   DJT_Strong:586  
##  3rd Qu.:10.00                   
##  Max.   :10.00                   
##  NA's   :84

Looking at the data summaries for the training dataset, we can see that only the variable relig_attendance has a notable number of missing values. However, other questions that may have been uncomfortable for respondents, such as obama_muslim and obamacare_costs, also show higher-than-average non-response.

Visualizations

Pie Chart

pie(table(anes_data[n, "target"]), main="Major Party Vote Breakdown", 
    labels = paste0(levels(anes_data[n, "target"]), 
    " (", table(anes_data[n, "target"]), ")"),
    col=c("blue", "lightblue", "pink", "red"))

Histograms

par(mfrow = c(3,5))
for (i in 1:(ncol(anes_data)-1)) {
  hist(anes_data[n,i], xlab = names(anes_data[i]), 
    main = names(anes_data[i]), col="grey", ylab="", cex = .5)
}

Scatterplots

par(mfrow = c(2,5), cex = .5)
for (i in interval) {
  plot(anes_data[n, i], main = names(anes_data[i]), ylab = "", xlab = "") 
}

As expected, many respondents chose answers that are multiples of ten on these thermometer questions. There are largely warm feelings toward Christians, Jews, police, and scientists, while there is a broader range of opinion toward the other groups.

Density Curves

par(mfrow = c(2,5))
for (i in interval) {
  d <- density(anes_data[n,i], na.rm=T)
  plot(d, main = names(anes_data[i]), xlab = "", ylab = "")
  polygon(d, col="red")
}

The thermometer questions focusing on more popular groups do show some skewing.

Scatterplots with Densities

par(mfrow = c(3,5), cex = .5)
for (i in ordinal) {
  smoothScatter(anes_data[n,i], main = names(anes_data[i]), ylab = "", 
    xlab = "", colramp = colorRampPalette(c("white", "red")))
}

Correlation Heatmap

library(ggplot2)
library(reshape2)
ggplot(data = melt(abs(cor(na.omit(sapply(anes_data[n,], as.numeric))))), 
       aes(x=Var1, y=Var2, fill=value)) +
  scale_fill_gradient(low = 'black', high = 'red', name = "Absolute Value") +
  geom_tile() + labs(title = "Correlation Heatmap") +
  theme(axis.title.y = element_blank(),
        axis.title.x = element_blank(),
        axis.text.x = element_text(angle = 90, hjust = 1),
        plot.title = element_text(hjust = 0.5))

The correlations between the predictor variables and the response variable range from low to moderate, with several interesting relationships surfacing. Some of the strongest correlations in the dataset underscore differences in ideological views on polarizing social issues, for example between candidate preference and views toward slavery, Muslims, LGBT people, and feminists. The variables reflecting financial situation, economic mobility, views toward Jews, and justice recall show weak correlations with the response variable in this dataset.

Principal Component Analysis

PCA <- function(X) {
  Xpca <- prcomp(na.omit(X), center = T, scale. = T) 
  M <- as.matrix(na.omit(X)); R <- as.matrix(Xpca$rotation); score <- M %*% R
  print(list("Importance of Components" = summary(Xpca)$importance[ ,1:4], 
             "Correlation between X and PC" = cor(na.omit(X), score)[ ,1:4]))
  par(mfrow=c(2,3))
  barplot(Xpca$sdev^2, ylab = "Component Variance")
  barplot(cor(cbind(X)), ylab = "Correlations")
  barplot(Xpca$rotation, ylab = "Loadings")  
  biplot(Xpca); barplot(M); barplot(score)
}
PCA(anes_data[n, -ncol(anes_data)])
## $`Importance of Components`
##                             PC1      PC2      PC3      PC4
## Standard deviation     2.289431 1.577647 1.281814 1.272464
## Proportion of Variance 0.180740 0.085830 0.056660 0.055830
## Cumulative Proportion  0.180740 0.266570 0.323220 0.379060
## 
## $`Correlation between X and PC`
##                            PC1         PC2         PC3          PC4
## relig_attendance   0.265338848 -0.02280027  0.34318542 -0.143482872
## television_usage   0.033359822  0.10364935  0.06844106  0.018800255
## internet_usage     0.105157424 -0.01657748  0.21907764  0.064051446
## internet_activity -0.088667801 -0.07507725 -0.14642145  0.008227587
## justice_recall     0.078548773  0.06431838  0.10633371  0.015604102
## therm_feminists    0.809791418  0.48473921  0.28867202 -0.671289226
## therm_unions       0.594001703  0.38144789 -0.16816468 -0.810234883
## therm_business    -0.226148341  0.38069274 -0.36364689  0.327125887
## therm_glbt         0.773044137  0.53249547  0.66795435 -0.308026889
## therm_rich        -0.096355140  0.58089544 -0.11216237  0.360144314
## therm_muslims      0.741941368  0.57618468  0.22548280 -0.259986999
## therm_christians  -0.325055425  0.37969196 -0.35136870  0.043635989
## therm_jews         0.261883605  0.71326233  0.39997866 -0.017413341
## therm_police      -0.343015481  0.38333869  0.15572940  0.182520215
## therm_scientists   0.489974257  0.51822977  0.39827632 -0.434782083
## nationalism        0.390678945  0.06181492  0.16733105 -0.148464933
## flag_pride         0.374163774 -0.06795340  0.02797379 -0.200772844
## economic_mobility -0.036974799 -0.02210561  0.16957486 -0.017452311
## reducing_debt      0.361220906  0.01810991  0.10008120 -0.237668711
## millionaires_tax  -0.332858748 -0.05252630 -0.11312952  0.388522837
## obamacare_costs    0.297638254  0.05494702  0.05207091 -0.224465940
## immig_hurts_jobs   0.447030145  0.12678283  0.16472569 -0.174125255
## terror_anxiety     0.176923537  0.02024504 -0.08249539 -0.068155614
## fin_situation     -0.000116906  0.06990652 -0.14283385  0.125610520
## govt_size          0.370406028  0.10350955 -0.02558070 -0.409500128
## slavery_hurts     -0.478707123 -0.14433811 -0.05284779  0.304707633
## gender_roles       0.352030378  0.01584006  0.22952265 -0.226253090
## indep_respect     -0.289208353 -0.02279324 -0.18811455  0.110489743
## obama_muslim       0.543650372  0.13055775  0.18700451 -0.244974494

We see again that the main underlying component explaining the variance of the data appears to focus on inflammatory social issues, highlighted by the importance of therm_feminists, therm_glbt, therm_muslims, and slavery_hurts, which also correlated with the target variable. The second, less important, pattern revolves around sentiment possibly involving anti-Semitic conspiracy theories, highlighted by the importance of therm_rich and therm_jews.
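If these components were carried forward for dimension reduction rather than exploration, the number of components to retain could be chosen from the cumulative proportion of variance explained. This is a minimal sketch that reuses prcomp as in the PCA function above; the 80% threshold is an arbitrary illustration.

X <- na.omit(anes_data[n, -ncol(anes_data)])       # complete cases only, as in PCA() above
Xpca <- prcomp(X, center = TRUE, scale. = TRUE)
cum_var <- cumsum(Xpca$sdev^2) / sum(Xpca$sdev^2)  # cumulative proportion of variance
which(cum_var >= 0.80)[1]                          # fewest components explaining at least 80%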

Data Preparation

Impute Using Mice Package

Impute missing values with column means using the mice package.

library(mice)
library(VIM)
aggr(anes_data[n, -ncol(anes_data)], bars=F, sortVars=T)

## 
##  Variables sorted by number of missings: 
##           Variable        Count
##   relig_attendance 0.3749239197
##       obama_muslim 0.0511259890
##    obamacare_costs 0.0213024954
##      therm_muslims 0.0170419963
##    therm_feminists 0.0146074254
##       gender_roles 0.0139987827
##         therm_jews 0.0133901400
##         therm_rich 0.0103469264
##   therm_christians 0.0097382836
##       therm_unions 0.0079123554
##     therm_business 0.0079123554
##   therm_scientists 0.0079123554
##         therm_glbt 0.0073037127
##          govt_size 0.0066950700
##       therm_police 0.0042604991
##  economic_mobility 0.0042604991
##      indep_respect 0.0042604991
##      reducing_debt 0.0030432136
##   millionaires_tax 0.0030432136
##   immig_hurts_jobs 0.0030432136
##      slavery_hurts 0.0024345709
##      fin_situation 0.0018259282
##     internet_usage 0.0012172855
##  internet_activity 0.0012172855
##     terror_anxiety 0.0012172855
##        nationalism 0.0006086427
##         flag_pride 0.0006086427
##   television_usage 0.0000000000
##     justice_recall 0.0000000000
data <- anes_data[n, -ncol(anes_data)]
MICE <- mice(data, predictorMatrix = quickpred(data), method = "mean", printFlag = F)
anes_data[n, -ncol(anes_data)] <- complete(MICE, action = 1)
aggr(anes_data[n, -ncol(anes_data)], bars=F)

data <- anes_data[-n, -ncol(anes_data)]
MICE <- mice(data, predictorMatrix = quickpred(data), method = "mean", printFlag = F)
anes_data[-n, -ncol(anes_data)] <- complete(MICE, action = 1)

Correlation Tests

library("DT")
display <- function(data) {
  datatable(data, options = list(
    searching = TRUE,
    pageLength = 5,
    lengthMenu = c(5, nrow(data))
    ), rownames = FALSE)
}
library(reshape2)
Corr_XY <- function(X, Y) {
  corr <- data.frame(array(NA, dim = c(ncol(X), 5)))
  colnames(corr) <- c("Y", "X", "r","p","<0.05")
  for (i in 1:ncol(X)) {
    r <- cor.test(as.numeric(Y), X[, i])
    corr[i, 1] <- "target"
    corr[i, 2] <- names(X[i])
    corr[i, 3] <- round(r$estimate, 6)
    corr[i, 4] <- round(r$p.value, 6)
    corr[i, 5] <- corr[i, 4] < 0.05
  }
  return(corr)
}
Corr_XX <- function(X, threshold) {
  corr <- data.frame(array(NA, dim = c(choose(ncol(X), 2), 5)))
  colnames(corr) <- c("X1", "X2", "r","p","<0.05"); k = 1
  for (i in 1:(ncol(X) - 1)) {
    for (j in (i+1):ncol(X)) {
      r <- cor.test(X[,i], X[,j])
      corr[k, 1] <- names(X[i])
      corr[k, 2] <- names(X[j])
      corr[k, 3] <- round(r$estimate, 6)
      corr[k, 4] <- round(r$p.value, 6)
      corr[k, 5] <- corr[k, 4] < 0.05
      k = k + 1
    }
  }
  least <- corr[corr[,"<0.05"] == F, ]
  most <- corr[abs(corr[,"r"]) >= threshold, ]
  result <- list("Correlations" = corr, "Least_Correlated" = least, "Most_Correlated" = most)
  return(result)
}

Between \(Y\) and \(X\) Variables

correlations <- Corr_XY(anes_data[n, -ncol(anes_data)], anes_data[n, ncol(anes_data)])
display(correlations)

The predictor variables justice_recall and fin_situation do not have statistically significant correlations with the response variable and are therefore not being considered for the model.

remove <- c("justice_recall", "fin_situation")
anes_data <- anes_data[ , -which(names(anes_data) %in% remove)]

Between \(X\) Variables

correlations <- Corr_XX(anes_data[n, -ncol(anes_data)], 0.40)

Least Correlated

display(correlations$Least_Correlated)

Here we see that the behavior-related variables relig_attendance, television_usage, internet_usage, and economic_mobility have weak correlations with many of the issue-focused variables.

Most Correlated

display(correlations$Most_Correlated)

When predictor variables are correlated, the fitted model can vary wildly depending on which combination of the correlated variables is included. Different combinations can lead to markedly different and less precise regression coefficients, varying error sums of squares, and different hypothesis test results. The most correlated variables in this data set are moderately, not highly, correlated. On closer examination, it is plausible that some of these moderately correlated pairs are measuring the same underlying attitude, for example (therm_feminists & therm_glbt), (therm_business & therm_rich), (therm_muslims & obama_muslim), and (therm_police & flag_pride). Therefore, the \(X\) variable in each pair that is least correlated with \(Y\) will not be considered for the model.

remove <- c("therm_glbt", "therm_rich", "therm_muslims", "therm_police")
anes_data <- anes_data[ , -which(names(anes_data) %in% remove)]
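As an additional check on the remaining predictors, variance inflation factors can be computed from a linear model fit to a numeric recoding of the ordinal target. This is a rough sketch only; the car package is loaded formally in the next section, and the numeric recoding of target is purely for diagnostic purposes.

library(car)
vif_fit <- lm(as.numeric(target) ~ ., data = anes_data[n, ])  # crude numeric recoding of the target
sort(vif(vif_fit), decreasing = TRUE)  # rule of thumb: VIF above 5-10 signals problematic collinearity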

Box-Cox Transformation

Box-Cox Power Test

The optimal power transformation is found via the Box-Cox Test where:

\[-2\Rightarrow \frac { 1 }{ Y^{ 2 } }, \quad \quad -1\Rightarrow \frac { 1 }{ Y }, \quad \quad -0.5\Rightarrow \frac { 1 }{ \sqrt { Y } }, \quad \quad 0\Rightarrow \log { \left( Y \right) }, \quad \quad 0.5\Rightarrow \sqrt { Y }, \quad \quad 1\Rightarrow Y, \quad \quad 2\Rightarrow { Y }^{ 2 }\]
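For reference, the Box-Cox family estimated by powerTransform (bcPower) for a strictly positive variable \(Y\) is:

\[Y^{(\lambda)} = \begin{cases} \dfrac{Y^{\lambda}-1}{\lambda}, & \lambda \neq 0 \\ \log(Y), & \lambda = 0 \end{cases}\]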

library(car)
interval <- grep("^therm", colnames(anes_data))
box.cox.powers <- powerTransform(anes_data[n, interval] + 1e-32, family="bcPower")
summary(box.cox.powers)$result
##                  Est.Power    Std.Err. Wald Lower Bound Wald Upper Bound
## therm_feminists  0.2563347 0.006073274        0.2444311        0.2682383
## therm_unions     0.2938270 0.007078411        0.2799533        0.3077007
## therm_business   0.2694247 0.006428981        0.2568239        0.2820255
## therm_christians 0.5218412 0.016743361        0.4890243        0.5546582
## therm_jews       0.5694112 0.021088975        0.5280768        0.6107456
## therm_scientists 0.7311165 0.035138996        0.6622441        0.7999890

The Box-Cox power test results support raising all of the interval variables to powers between roughly 0.25 and 0.75. Values were shifted ever so slightly to the right by a negligible amount (\(+10^{-32}\)) so that the calculations could run on variables containing zeros, since the function can only handle strictly positive (\(\mathbb{R}^{+}\)) values.

Box-Cox Power Transformation

interval <- grep("^therm", colnames(anes_data)); k = 1
for (i in interval) {
  var_name <- paste0("bc_", names(anes_data[i]))
  anes_data[, var_name] <- anes_data[, i]^(box.cox.powers$lambda[k])
  k = k + 1
}
anes_data <- anes_data[, -interval]

Build Models

\(AIC\) and \(BIC\) will be used to determine potential models. Adjusted \(R^2\) and Mallows \(C_p\) are not appropriate for logistic regressions.
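For reference, both criteria penalize the maximized likelihood \(\hat{L}\) by the number of estimated parameters \(k\), with \(BIC\) applying a heavier penalty based on the number of observations \(n\):

\[AIC = 2k - 2\log(\hat{L}), \qquad BIC = k\log(n) - 2\log(\hat{L})\]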

library(MASS)
null <- polr(target ~ 1, anes_data[n, ], Hess=T)
full <- polr(target ~ ., anes_data[n, ], Hess=T)

AIC Stepwise Selection

Stepwise subset selection based on \(AIC\). Using \(k = 2\) degrees of freedom for the penalty gives the genuine \(AIC\).

Forward Selection

AIC_steps <- step(null, scope=list(lower=null, upper=full), direction="forward", k = 2, trace=F)
AIC_steps$anova
##                     Step Df   Deviance Resid. Df Resid. Dev      AIC
## 1                        NA         NA      1640   4199.274 4205.274
## 2         + obama_muslim -1 643.801993      1639   3555.472 3563.472
## 3            + govt_size -1 200.687939      1638   3354.784 3364.784
## 4   + bc_therm_feminists -1 119.631397      1637   3235.152 3247.152
## 5     + immig_hurts_jobs -1  90.466570      1636   3144.686 3158.686
## 6        + slavery_hurts -1  66.445149      1635   3078.241 3094.241
## 7     + millionaires_tax -1  42.032839      1634   3036.208 3054.208
## 8        + reducing_debt -1  24.996449      1633   3011.211 3031.211
## 9      + obamacare_costs -1  19.778353      1632   2991.433 3013.433
## 10   + economic_mobility -1  16.315463      1631   2975.118 2999.118
## 11          + flag_pride -1  14.990560      1630   2960.127 2986.127
## 12 + bc_therm_scientists -1  11.633122      1629   2948.494 2976.494
## 13    + relig_attendance -1   6.452458      1628   2942.041 2972.041
## 14        + gender_roles -1   3.461286      1627   2938.580 2970.580
## 15      + terror_anxiety -1   2.285315      1626   2936.295 2970.295
## 16         + nationalism -1   2.639941      1625   2933.655 2969.655
AIC_steps$call$formula
## target ~ obama_muslim + govt_size + bc_therm_feminists + immig_hurts_jobs + 
##     slavery_hurts + millionaires_tax + reducing_debt + obamacare_costs + 
##     economic_mobility + flag_pride + bc_therm_scientists + relig_attendance + 
##     gender_roles + terror_anxiety + nationalism

The above model has the lowest \(AIC\).

Backward Elimination

AIC_steps <- step(full, scope=list(lower=null, upper=full), direction="backward", k = 2, trace=F)
AIC_steps$anova
##                    Step Df  Deviance Resid. Df Resid. Dev      AIC
## 1                       NA        NA      1617   2925.654 2977.654
## 2       - bc_therm_jews  1 0.1385388      1618   2925.792 2975.792
## 3   - bc_therm_business  1 0.3030515      1619   2926.095 2974.095
## 4     - bc_therm_unions  1 0.6898720      1620   2926.785 2972.785
## 5      - internet_usage  1 0.8280503      1621   2927.613 2971.613
## 6       - indep_respect  1 0.9317921      1622   2928.545 2970.545
## 7   - internet_activity  1 1.5267602      1623   2930.072 2970.072
## 8 - bc_therm_christians  1 1.7219694      1624   2931.794 2969.794
## 9    - television_usage  1 1.8610353      1625   2933.655 2969.655
AIC_steps$call$formula
## target ~ relig_attendance + nationalism + flag_pride + economic_mobility + 
##     reducing_debt + millionaires_tax + obamacare_costs + immig_hurts_jobs + 
##     terror_anxiety + govt_size + slavery_hurts + gender_roles + 
##     obama_muslim + bc_therm_feminists + bc_therm_scientists

The above model has the lowest \(AIC\). These are the same variables obtained through forward selection.

AIC Selected Model

The forward selection and backward elimination stepwise selection methods using \(AIC\) both yield models with the same exact variables.

AIC <- polr(AIC_steps$call$formula, anes_data[n, ], Hess=T)
coef_sum <- head(coef(summary(AIC)), -3)
p <- pt(abs(coef_sum[, "t value"]), lower.tail = F, df = AIC_steps$df.residual) * 2
cbind(coef_sum, "p value" = round(p, 6))
##                           Value Std. Error    t value  p value
## relig_attendance    -0.12252747 0.05684933  -2.155302 0.031285
## nationalism         -0.08374519 0.05152076  -1.625465 0.104257
## flag_pride          -0.16529907 0.05105603  -3.237601 0.001230
## economic_mobility    0.14686348 0.03206333   4.580419 0.000005
## reducing_debt       -0.30240516 0.06821625  -4.433037 0.000010
## millionaires_tax     0.69324163 0.12167283   5.697588 0.000000
## obamacare_costs     -0.53259762 0.13727545  -3.879773 0.000109
## immig_hurts_jobs    -0.37180933 0.06270353  -5.929640 0.000000
## terror_anxiety       0.08422079 0.05138492   1.639018 0.101403
## govt_size           -0.84867211 0.11476028  -7.395173 0.000000
## slavery_hurts        0.26019290 0.04350841   5.980290 0.000000
## gender_roles        -0.07736955 0.04237509  -1.825826 0.068060
## obama_muslim        -0.27473342 0.02294422 -11.973972 0.000000
## bc_therm_feminists  -0.69344932 0.12118874  -5.722060 0.000000
## bc_therm_scientists -0.04257146 0.01328927  -3.203446 0.001384

At a significance level of \(\alpha=0.05\), three of the variables are insignificant: gender_roles, terror_anxiety, and nationalism. Removing those three insignificant variables yields a model with all significant variables.

AIC <- polr(target ~ relig_attendance + flag_pride + economic_mobility + reducing_debt + 
    millionaires_tax + obamacare_costs + immig_hurts_jobs + govt_size + slavery_hurts + 
    obama_muslim + bc_therm_feminists + bc_therm_scientists, anes_data[n, ], Hess=T)
coef_sum <- head(coef(summary(AIC)), -3)
p <- pt(abs(coef_sum[, "t value"]), lower.tail = F, df = AIC_steps$df.residual + 3) * 2
cbind(coef_sum, "p value" = round(p, 6))
##                           Value Std. Error    t value  p value
## relig_attendance    -0.14256807 0.05620478  -2.536583 0.011287
## flag_pride          -0.18134007 0.04878500  -3.717128 0.000208
## economic_mobility    0.14397162 0.03192266   4.510014 0.000007
## reducing_debt       -0.29402930 0.06711412  -4.381035 0.000013
## millionaires_tax     0.70651459 0.12141525   5.818994 0.000000
## obamacare_costs     -0.51566927 0.13697205  -3.764777 0.000173
## immig_hurts_jobs    -0.37159820 0.06162243  -6.030243 0.000000
## govt_size           -0.85599235 0.11453735  -7.473478 0.000000
## slavery_hurts        0.26580310 0.04317729   6.156086 0.000000
## obama_muslim        -0.27952883 0.02281841 -12.250146 0.000000
## bc_therm_feminists  -0.73048679 0.12109927  -6.032132 0.000000
## bc_therm_scientists -0.04308651 0.01328436  -3.243401 0.001205

AIC Odds Ratio

Logistic regression coefficients are in terms of the log odds. The odds ratio is therefore obtained through exponentiation.
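For the proportional odds model fit by polr, the cumulative log odds are modeled as below (MASS parameterization, with cutpoints \(\zeta_k\)), so exponentiating a coefficient gives the multiplicative change in the odds of being in a higher category for a one-unit increase in that predictor:

\[\log{\frac{P(Y\le k)}{P(Y>k)}} = \zeta_k - \left(\beta_1 x_1 + \dots + \beta_p x_p\right), \qquad OR_j = e^{\beta_j}\]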

t(t(exp(coef_sum[,1])))
##                          [,1]
## relig_attendance    0.8671285
## flag_pride          0.8341516
## economic_mobility   1.1548513
## reducing_debt       0.7452547
## millionaires_tax    2.0269143
## obamacare_costs     0.5971008
## immig_hurts_jobs    0.6896313
## govt_size           0.4248614
## slavery_hurts       1.3044782
## obama_muslim        0.7561399
## bc_therm_feminists  0.4816745
## bc_therm_scientists 0.9578285

Given an ordered target variable that ranges from 1 to 4 (HRC_Strong, HRC_Weak, DJT_Weak, DJT_Strong), each variable's odds ratio indicates how, holding the other variables in the model constant, a one-unit change in that variable changes a respondent's odds of being in a higher target category. Several of the larger odds ratios above, especially the belief that economic_mobility has become harder and disagreement that slavery_hurts, increase the odds of being a voter with a strong preference for Trump. Interestingly, opposition to a millionaires_tax has the largest impact on increasing those odds.
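Approximate 95% confidence intervals for these odds ratios could also be reported. A sketch is below; confint on a polr fit computes profile-likelihood intervals and can take a moment to run.

ci <- confint(AIC)              # profile-likelihood intervals for the AIC model coefficients
exp(cbind(OR = coef(AIC), ci))  # exponentiate onto the odds-ratio scale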

BIC Stepwise Selection

Stepwise subset selection based on \(BIC\). Using \(k = \log(n)\) degrees of freedom for the penalty is sometimes referred to as \(BIC\).

Forward Selection

BIC_steps <- step(null, scope=list(lower=null, upper=full), direction="forward", k = log(length(n)), trace=F)
BIC_steps$anova
##                     Step Df  Deviance Resid. Df Resid. Dev      AIC
## 1                        NA        NA      1640   4199.274 4221.487
## 2         + obama_muslim -1 643.80199      1639   3555.472 3585.089
## 3            + govt_size -1 200.68794      1638   3354.784 3391.805
## 4   + bc_therm_feminists -1 119.63140      1637   3235.152 3279.578
## 5     + immig_hurts_jobs -1  90.46657      1636   3144.686 3196.516
## 6        + slavery_hurts -1  66.44515      1635   3078.241 3137.475
## 7     + millionaires_tax -1  42.03284      1634   3036.208 3102.846
## 8        + reducing_debt -1  24.99645      1633   3011.211 3085.254
## 9      + obamacare_costs -1  19.77835      1632   2991.433 3072.880
## 10   + economic_mobility -1  16.31546      1631   2975.118 3063.969
## 11          + flag_pride -1  14.99056      1630   2960.127 3056.383
## 12 + bc_therm_scientists -1  11.63312      1629   2948.494 3052.154
BIC_steps$call$formula
## target ~ obama_muslim + govt_size + bc_therm_feminists + immig_hurts_jobs + 
##     slavery_hurts + millionaires_tax + reducing_debt + obamacare_costs + 
##     economic_mobility + flag_pride + bc_therm_scientists

The above model has the lowest \(BIC\).

Backward Elimination

BIC_steps <- step(full, scope=list(lower=null, upper=full), direction="backward", k = log(length(n)), trace=F)
BIC_steps$anova
##                     Step Df  Deviance Resid. Df Resid. Dev      AIC
## 1                        NA        NA      1617   2925.654 3118.165
## 2        - bc_therm_jews  1 0.1385388      1618   2925.792 3110.899
## 3    - bc_therm_business  1 0.3030515      1619   2926.095 3103.798
## 4      - bc_therm_unions  1 0.6898720      1620   2926.785 3097.084
## 5       - internet_usage  1 0.8280503      1621   2927.613 3090.508
## 6        - indep_respect  1 0.9317921      1622   2928.545 3084.035
## 7    - internet_activity  1 1.5267602      1623   2930.072 3078.158
## 8  - bc_therm_christians  1 1.7219694      1624   2931.794 3072.475
## 9     - television_usage  1 1.8610353      1625   2933.655 3066.932
## 10         - nationalism  1 2.6399410      1626   2936.295 3062.168
## 11      - terror_anxiety  1 2.2853148      1627   2938.580 3057.049
## 12        - gender_roles  1 3.4612857      1628   2942.041 3053.106
## 13    - relig_attendance  1 6.4524580      1629   2948.494 3052.154
BIC_steps$call$formula
## target ~ flag_pride + economic_mobility + reducing_debt + millionaires_tax + 
##     obamacare_costs + immig_hurts_jobs + govt_size + slavery_hurts + 
##     obama_muslim + bc_therm_feminists + bc_therm_scientists

The above model has the lowest \(BIC\). These are the same variables obtained through forward selection.

BIC Selected Model

The forward selection and backward elimination stepwise selection methods using \(BIC\) both yield models with the same exact variables.

BIC <- polr(BIC_steps$call$formula, anes_data[n, ], Hess=T)
coef_sum <- head(coef(summary(BIC)), -3)
p <- pt(abs(coef_sum[, "t value"]), lower.tail = F, df = BIC_steps$df.residual) * 2
cbind(coef_sum, "p value" = round(p, 6))
##                           Value Std. Error    t value  p value
## flag_pride          -0.18461879 0.04878601  -3.784257 0.000160
## economic_mobility    0.13716700 0.03179330   4.314336 0.000017
## reducing_debt       -0.30092538 0.06714031  -4.482037 0.000008
## millionaires_tax     0.71386167 0.12121665   5.889139 0.000000
## obamacare_costs     -0.52786842 0.13701421  -3.852654 0.000121
## immig_hurts_jobs    -0.37050047 0.06155459  -6.019055 0.000000
## govt_size           -0.84488150 0.11439299  -7.385780 0.000000
## slavery_hurts        0.25922013 0.04306699   6.018997 0.000000
## obama_muslim        -0.28192779 0.02278314 -12.374403 0.000000
## bc_therm_feminists  -0.74243295 0.12124417  -6.123453 0.000000
## bc_therm_scientists -0.04513501 0.01325575  -3.404938 0.000678

At a significance level of \(\alpha=0.05\), none of the variables are insignificant.

BIC Odds Ratio

Logistic regression coefficients are in terms of the log odds. The odds ratio is therefore obtained through exponentiation.

t(t(exp(coef_sum[,1])))
##                          [,1]
## flag_pride          0.8314212
## economic_mobility   1.1470197
## reducing_debt       0.7401330
## millionaires_tax    2.0418611
## obamacare_costs     0.5898610
## immig_hurts_jobs    0.6903887
## govt_size           0.4296083
## slavery_hurts       1.2959190
## obama_muslim        0.7543282
## bc_therm_feminists  0.4759545
## bc_therm_scientists 0.9558684

Given an ordered target variable that ranges from 1 (Strong Hillary) to 4 (Strong Trump), each variable's odds ratio indicates how, holding the other variables in the model constant, a one-unit change in that variable changes a respondent's odds of being in a higher target category. The \(BIC\) odds ratios are very similar to those of the \(AIC\) model, with millionaires_tax showing a slightly larger odds ratio. The main difference from the \(AIC\) model, however, is that the \(BIC\) model does not retain relig_attendance as a significant predictor.

Select Models

Evaluate Models

Confusion Matrix

library(caret)
AIC_Scores <- predict(AIC, anes_data[n, ], type="prob")
AIC_Class <- predict(AIC, anes_data[n, ], type="class")
(cm1 <- confusionMatrix(AIC_Class, anes_data[n, "target"]))
## Confusion Matrix and Statistics
## 
##             Reference
## Prediction   HRC_Strong HRC_Weak DJT_Weak DJT_Strong
##   HRC_Strong        521      182       27         51
##   HRC_Weak           25       13       22         25
##   DJT_Weak            0        0        0          0
##   DJT_Strong         63       48      156        510
## 
## Overall Statistics
##                                           
##                Accuracy : 0.6354          
##                  95% CI : (0.6116, 0.6587)
##     No Information Rate : 0.3707          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.4369          
##  Mcnemar's Test P-Value : < 2.2e-16       
## 
## Statistics by Class:
## 
##                      Class: HRC_Strong Class: HRC_Weak Class: DJT_Weak
## Sensitivity                     0.8555        0.053498          0.0000
## Specificity                     0.7485        0.948571          1.0000
## Pos Pred Value                  0.6671        0.152941             NaN
## Neg Pred Value                  0.8979        0.852375          0.8752
## Prevalence                      0.3707        0.147900          0.1248
## Detection Rate                  0.3171        0.007912          0.0000
## Detection Prevalence            0.4753        0.051735          0.0000
## Balanced Accuracy               0.8020        0.501035          0.5000
##                      Class: DJT_Strong
## Sensitivity                     0.8703
## Specificity                     0.7474
## Pos Pred Value                  0.6564
## Neg Pred Value                  0.9122
## Prevalence                      0.3567
## Detection Rate                  0.3104
## Detection Prevalence            0.4729
## Balanced Accuracy               0.8089

The \(AIC\) model has an Accuracy of 0.635423 and Error Rate of 0.364577.

BIC_Scores <- predict(BIC, anes_data[n, ], type="prob")
BIC_Class <- predict(BIC, anes_data[n, ], type="class")
(cm2 <- confusionMatrix(BIC_Class, anes_data[n, "target"]))
## Confusion Matrix and Statistics
## 
##             Reference
## Prediction   HRC_Strong HRC_Weak DJT_Weak DJT_Strong
##   HRC_Strong        524      179       30         51
##   HRC_Weak           26        9       19         26
##   DJT_Weak            0        0        0          0
##   DJT_Strong         59       55      156        509
## 
## Overall Statistics
##                                           
##                Accuracy : 0.6342          
##                  95% CI : (0.6104, 0.6575)
##     No Information Rate : 0.3707          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.4345          
##  Mcnemar's Test P-Value : < 2.2e-16       
## 
## Statistics by Class:
## 
##                      Class: HRC_Strong Class: HRC_Weak Class: DJT_Weak
## Sensitivity                     0.8604        0.037037          0.0000
## Specificity                     0.7485        0.949286          1.0000
## Pos Pred Value                  0.6684        0.112500             NaN
## Neg Pred Value                  0.9010        0.850288          0.8752
## Prevalence                      0.3707        0.147900          0.1248
## Detection Rate                  0.3189        0.005478          0.0000
## Detection Prevalence            0.4772        0.048691          0.0000
## Balanced Accuracy               0.8045        0.493161          0.5000
##                      Class: DJT_Strong
## Sensitivity                     0.8686
## Specificity                     0.7446
## Pos Pred Value                  0.6534
## Neg Pred Value                  0.9109
## Prevalence                      0.3567
## Detection Rate                  0.3098
## Detection Prevalence            0.4741
## Balanced Accuracy               0.8066

The \(BIC\) model has an Accuracy of 0.6342057 and Error Rate of 0.3657943.

Area under the Surface

The two-dimensional ROC curve for a binary outcome generalizes to a ROC surface (and higher-dimensional analogues) when there are more than two classes. The multiclass.roc function summarizes performance with a multi-class AUC computed by averaging the AUCs of all pairwise class comparisons.

library(pROC)
AIC_mcr <- multiclass.roc(anes_data[n, "target"], as.numeric(AIC_Class))
BIC_mcr <- multiclass.roc(anes_data[n, "target"], as.numeric(BIC_Class))
AIC_mcr$auc
## Multi-class area under the curve: 0.7579
BIC_mcr$auc
## Multi-class area under the curve: 0.7541

The AIC model has a greater area under the surface but not by much.

Performance Visualizations

par(mfrow=c(1,2), yaxt="n", cex=0.55)
x <- factor(AIC_mcr$predictor, levels=c(1:4), labels=levels(anes_data$target))
y <- AIC_mcr$response
z <- data.frame(x, y)
plot(z, main = "AIC Predictions", xlab = "", ylab="", col=c("blue", "lightblue", "pink", "red"))
x <- factor(BIC_mcr$predictor, levels=c(1:4), labels=levels(anes_data$target))
y <- BIC_mcr$response
z <- data.frame(x, y)
plot(z, main = "BIC Predictions", xlab = "", ylab="", col=c("blue", "lightblue", "pink", "red"))

The bars represent model predictions and the colors represent the actual data (HRC_Strong, HRC_Weak, DJT_Weak, DJT_Strong). Overall, the AIC model has a higher accuracy and larger area under the surface. It is worth noting that the difference in both metrics is minimal, but for purposes of this analysis, the slightly better model will be the one chosen.

Predictions

(predicted <- table(predict(AIC, anes_data[-n, ], type="class")))
## 
## HRC_Strong   HRC_Weak   DJT_Weak DJT_Strong 
##        398         39          0        385
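Out-of-sample performance can also be checked directly against the held-out target values. This is a sketch using the confusionMatrix function from caret, loaded above.

confusionMatrix(predict(AIC, anes_data[-n, ], type = "class"),
                anes_data[-n, "target"])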

Prevalence

actual <- table(anes_data[-n, "target"])
rbind(actual, predicted) / (N - length(n))
##           HRC_Strong   HRC_Weak  DJT_Weak DJT_Strong
## actual     0.4014599 0.12895377 0.1386861  0.3309002
## predicted  0.4841849 0.04744526 0.0000000  0.4683698
chisq.test(actual, predicted)
## 
##  Pearson's Chi-squared test
## 
## data:  actual and predicted
## X-squared = 12, df = 9, p-value = 0.2133

The prevalence of each class in the actual and predicted data is shown above. Although there are some differences in these figures, the difference is not significant at \(\alpha=0.05\), as shown by the \(\chi^2\) goodness-of-fit test. The null hypothesis of the \(\chi^2\) goodness-of-fit test for a multinomial distribution is that the observed (predicted) frequency equals the expected (actual) count in each category; in other words, the data are consistent with the specified distribution. With a \(p\)-value greater than \(\alpha=0.05\), we fail to reject the null hypothesis. The data support the conclusion that the model is a reasonable fit.
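The same comparison can also be framed explicitly as a multinomial goodness-of-fit test, treating the actual prevalences as the expected proportions. This is an illustrative sketch using the actual and predicted tables computed above.

chisq.test(x = predicted, p = as.numeric(actual) / sum(actual))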

Margin of Error

p_e <- 0.5 # a priori proportion
N_e <- 235248000 # voting age population
n_e <- length(n) # sample size
a_e <- 1 - 0.95 # confidence level
z_e <- qnorm(1 - a_e / 2) # z-score
(e <- z_e * sqrt((p_e * (1 - p_e) / n_e) * ((N_e - n_e) / (N_e - 1)))) # Margin of Error
## [1] 0.02417674

Using a normal approximation, the randomly selected data are representative of the voting-age population within a margin of error of approximately 0.0242, or about ±2.4 percentage points.
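For reference, the finite-population-corrected margin of error computed above is:

\[e = z_{\alpha/2}\sqrt{\frac{p\,(1-p)}{n}\cdot\frac{N-n}{N-1}}\]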
