Who voted for Obama in 2008? Just the liberals, or more?
I will use this R Markdown document to demonstrate both data analysis in R and statistical thinking. This is an assignment from Andrew Gelman’s Survey Statistics class at Columbia University.
By going through this exercise, you should learn how to
I will create a plot of the estimated proportion of liberals in each state vs. Obama’s vote share in 2008 (data available at http://www.stat.columbia.edu/~gelman/surveys.course/2008ElectionResult.csv).
With the Pew 2008 survey, I will compute the percentage of respondents in each state (excluding Alaska and Hawaii) who are liberal. To do that, I first need to derive a liberal indicator from the ideology variable in the Pew 2008 survey.
library(foreign)
library(arm)
library(ggplot2)
library(dplyr)
library(readr)
library(openintro)
library(choroplethr)
library(choroplethrMaps)
data = read.dta("pew_research_center_june_elect_wknd_data.dta")
names(data)
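# recode ideology to a numeric scale from -2 (very liberal) to 2 (very conservative)
data$ideo = as.character(data$ideo)
# missing and "dk/refused" answers are treated as moderate (0)
data$ideo[is.na(data$ideo)] = "0"
data$ideo[data$ideo=="very conservative"] = "2"
data$ideo[data$ideo=="conservative"] = "1"
data$ideo[data$ideo=="moderate"] = "0"
data$ideo[data$ideo=="dk/refused"] = "0"
data$ideo[data$ideo=="liberal"] = "-1"
data$ideo[data$ideo=="very liberal"] = "-2"
# convert to numeric so the comparison below is numeric rather than string-based
data$ideo = as.numeric(data$ideo)
# create a new liberal variable
data$liberal = data$ideo<0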
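The join and plots that follow use a state-level summary named liberalshare_state, with columns state, liberal.pct and samplesize, whose construction is not shown here. Below is a minimal sketch of one way to build such a table with dplyr, run on a small made-up data frame (toy) so that it is self-contained. The helper name liberal_share_by_state, the assumption that the survey stores the state name in a column called state, and the use of an unweighted percentage are mine, and may differ from what the original analysis did.

```r
library(dplyr)

# Hypothetical helper: summarise a respondent-level data frame that has a
# `state` column and a logical `liberal` column into one row per state.
liberal_share_by_state = function(df) {
  df %>%
    mutate(state = tolower(as.character(state))) %>%   # match the election data
    filter(!state %in% c("alaska", "hawaii")) %>%      # excluded in the text
    group_by(state) %>%
    summarise(liberal.pct = 100 * mean(liberal),       # unweighted share, in percent
              samplesize = n()) %>%
    ungroup()
}

# Toy respondents (made up for illustration, not Pew data)
toy = data.frame(
  state = c("Ohio", "Ohio", "Ohio", "Ohio", "Texas", "Texas", "Alaska"),
  liberal = c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE)
)
toy_share = liberal_share_by_state(toy)
print(toy_share)
```

With the real survey, the equivalent call would be something like liberalshare_state = liberal_share_by_state(data), provided the state column really has that name.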
I want to plot Obama’s vote share against the proportion of liberals at the state level. To do that, I need to join the survey results and the election results into one dataset.
# read election result
election08 = read_csv("http://www.stat.columbia.edu/~gelman/surveys.course/2008ElectionResult.csv")
# merge the two datasets
# lowercase the state names so they match the survey data
obamashare_state = election08 %>% select(state, vote_Obama_pct) %>% mutate(state = tolower(state))
merge = obamashare_state %>% inner_join(liberalshare_state)
## Joining by: "state"
merge$vote_Obama_pct = as.numeric(merge$vote_Obama_pct)
# drop Hawaii, which is excluded from this analysis
merge = merge[merge$state != "hawaii",]
# create a plot
p = ggplot(merge, aes(x=liberal.pct, y=vote_Obama_pct))
p + geom_point()
# add abbreviation to make it more interpretable
merge$abbr = state2abbr(merge$state)
# rebuild the plot so that it picks up the new abbr column
ggplot(merge, aes(x=liberal.pct, y=vote_Obama_pct, label=abbr)) +
geom_text() +
labs(title="Who voted for Obama in 2008?",
x = "Percentage of people identified as liberals",
y = "Percentage of voters for Obama")
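An inner join silently drops any state whose name is spelled differently in the two tables, so it is worth checking which keys fail to match before trusting the plot. Here is a small base R sketch of such a check. The two name vectors are made up for illustration and do not claim to reproduce the spellings in either dataset.

```r
# Hypothetical state-name vectors standing in for the join keys of the two tables
survey_states   = c("ohio", "texas", "washington dc")
election_states = c("ohio", "texas", "district of columbia", "hawaii")

# Names present in one table but not the other; these rows vanish in an inner join
only_in_survey   = setdiff(survey_states, election_states)
only_in_election = setdiff(election_states, survey_states)

print(only_in_survey)
print(only_in_election)
```

On the real data, the same check would compare liberalshare_state$state with obamashare_state$state in both directions.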
Next, I will create a plot of the estimated proportion of liberals in each state vs. the sample size in each state (again as a scatterplot using the two-letter state abbreviations).
liberalshare_state$abbr = state2abbr(liberalshare_state$state)
ggplot(liberalshare_state, aes(x=samplesize, y=liberal.pct, label=abbr))+
geom_text()+
scale_y_continuous(expand = c(0,0), limits = c(0,40)) +
scale_x_continuous(expand = c(0,0), limits = c(0,3000)) +
labs(title = "How certain are we about the estimated liberal share?",
x = "Sample Size",
y = "Percentage of liberals in 2008 Pew Survey")
There seems to be a funnel shape. This makes sense: in states with fewer respondents, the uncertainty in the estimated proportion of liberals is larger, so the estimates scatter more widely. We should be less confident in the estimates for the states that fall on the left side of the plot.
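The funnel can be made concrete with the textbook standard error of a sample proportion, sqrt(p(1-p)/n). The sketch below assumes simple random sampling within each state and an illustrative true liberal share of 20%. Both are assumptions for illustration, and the actual Pew design (with survey weights) would give somewhat different numbers.

```r
# Standard error of a sample proportion under simple random sampling
se_prop = function(p, n) sqrt(p * (1 - p) / n)

p = 0.20                      # illustrative liberal share (assumed)
n = c(25, 100, 400, 1600)     # hypothetical state sample sizes

# Approximate 95% interval half-width, in percentage points
half_width = 100 * 1.96 * se_prop(p, n)
print(data.frame(n = n, half_width = round(half_width, 1)))
```

Under these assumptions the interval is roughly ±16 percentage points with 25 respondents but only about ±2 points with 1,600. Quadrupling the sample size halves the width, which is the narrowing seen from left to right in the plot.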
Since it’s always cool to visualize things on a map, I will create a U.S. map in which each state is colored by its estimated proportion of liberals.
# create a data frame that has region and value
liberalshare_cho = liberalshare_state[,c(1,2)]
names(liberalshare_cho) = c("region","value")
# recode dc for choropleth function
liberalshare_cho[liberalshare_cho$region == "washington dc",1] = "district of columbia"
state_choropleth(liberalshare_cho, title = "Proportion of people identified as liberals in 2008")
## Warning in self$bind(): The following regions were missing and are being
## set to NA: alaska