Preparation

  1. Save the data file (“BG 74 data.xlsx”) into a subfolder called Data in the folder where you put this .R syntax file.
  2. If you have not already done so, create a .Rproj file in the folder where you put this .R syntax file. To do so, go to File/New Project in RStudio and browse to this folder.
  3. Refer to the file “BG 74 toplines.pdf” for details on questions; the toplines contain the variable names for the data set. The toplines report weighted results, so values/distributions will differ slightly from the unweighted results we create in this lab. (You may also wish to see “BG 74 questionnaire with percentages.pdf” for more details on questions. The question numbers do not always correspond to the variable names; see the topline document for variable names.)

1) Load Battleground data into a data object called “bg” and display the first few rows and 6 columns of bg to make sure the data loaded properly. Please do not display more than 6 columns (to avoid messiness/length) (Hint: use head(bg[, 1:6]))

bg = readxl::read_excel("Data/BG 74 data.xlsx")
head(bg[, 1:6])
## # A tibble: 6 × 6
##     INT  LIST   CP1   CP2 CP3      QA
##   <dbl> <dbl> <dbl> <dbl> <lgl> <dbl>
## 1     1     1     1     2 NA        1
## 2     2     2    NA    NA NA        1
## 3     3     2    NA    NA NA        1
## 4     4     1     1     2 NA        1
## 5     5     2    NA    NA NA        1
## 6     6     2    NA    NA NA        1

2) How many rows and columns are there in the dataset?

dim(bg)
## [1] 800 189

ANSWER: 800 rows and 189 columns

3) What is the average age [using variable called “ACTAGE”] of the respondents? (This question may be trickier than you might think.)

mean(bg$ACTAGE[bg$ACTAGE<999])
## [1] 54.09778

ANSWER: The average age is about 54.1. The calculation must be restricted to values below 999, because respondents who did not give their age were assigned 999, which would otherwise inflate the average.
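
A minimal sketch of why the 999 code matters, using a small made-up vector (ages_toy is hypothetical, not the survey data). Leaving the sentinel in inflates the mean, while filtering it out or recoding it to NA gives the same corrected result:

```r
# Hypothetical toy ages; 999 stands in for "did not give an age"
ages_toy <- c(25, 40, 67, 999, 52, 999)

mean(ages_toy)                   # inflated by the 999 codes
mean(ages_toy[ages_toy < 999])   # filter approach: drop the sentinel values

# Alternative: recode the sentinel to NA once, then ignore NAs
ages_na <- ages_toy
ages_na[ages_na == 999] <- NA
mean(ages_na, na.rm = TRUE)      # same value as the filter approach
```

Recoding to NA has the advantage that later functions (hist(), table(), and so on) no longer need the filter repeated.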

4) Re-code all the missing data in ACTAGE (“actual age”) to NA and use the “hist()” function to create a histogram of actual age.

bg$ACTAGE[bg$ACTAGE >= 999] <- NA   # recode the missing-age code (999) to NA
hist(bg$ACTAGE)

5) Create table of the “RAGE” variable (stands for Respondent Age – not anger!). Compare what you see to the information on page 6 of the “toplines” pdf summary document. Why are they different?

table(bg$RAGE)
## 
##   1   2   3   4 
## 133 126 277 264

ANSWER: The results differ because the data in the summary document are weighted, whereas nothing in the “bg” dataset has been weighted.
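
To illustrate how weighting can shift a distribution, here is a small sketch with made-up data. The groups and weights are hypothetical and are not taken from the Battleground file, which may store and apply its weights differently:

```r
# Hypothetical toy sample: a group label and a survey weight per respondent
toy <- data.frame(
  group  = c("A", "A", "A", "B"),
  weight = c(0.5, 0.5, 0.5, 2.5)
)

# Unweighted proportions: each respondent counts once
unweighted <- prop.table(table(toy$group))

# Weighted proportions: each respondent counts by their weight
weighted <- prop.table(tapply(toy$weight, toy$group, sum))

unweighted   # A = 0.75,  B = 0.25
weighted     # A = 0.375, B = 0.625
```

Group B is underrepresented in the toy sample, so its single respondent carries a large weight and its weighted share rises.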

6) Provide a table showing the proportions of respondents in each of the four categories of their summary race variable (RRACE). Compare to the percentages in the “Toplines” document and explain.

table(bg$RRACE)/sum(table(bg$RRACE))
## 
##       1       2       3       4 
## 0.75875 0.10250 0.05625 0.08250

ANSWER: As with RAGE, these proportions are unweighted, so they differ from the weighted percentages in the toplines. They also show that white respondents make up most of the poll's sample (about 76%).

7) Create a table that shows proportion in each category of the EDUC education variable.

table(bg$EDUC)/sum(table(bg$EDUC))
## 
##       1       2       3       4       5       6       7       8 
## 0.00375 0.02625 0.16375 0.02875 0.22750 0.31500 0.23250 0.00250

ANSWER: A majority of respondents (about 78%, categories 5 through 7) have at least some college experience or have completed a college or graduate degree.
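
The division by sum(table(...)) can also be written with base R's prop.table(). A short sketch on a made-up vector (educ_toy is hypothetical) comparing the two forms:

```r
# Hypothetical education codes for eight respondents
educ_toy <- c(3, 5, 5, 6, 6, 6, 7, 7)

# Manual version: counts divided by the total count
manual <- table(educ_toy) / sum(table(educ_toy))

# Equivalent shorthand
short <- prop.table(table(educ_toy))

all.equal(manual, short)   # checks that both give the same proportions
round(100 * short, 1)      # percentages, easier to compare with a topline
```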

8) Use the barplot function to create a histogram of the EDUC education variable. Try to make it visually appealing using tools discussed in lecture slides. Discuss what you see - does it reflect American society?

EDUCLabels = c("Some grade school", "Some high school", "High school graduate", "Technical/Vocation", "Some college", "Graduated college", "Grad/prof school", "Unsure/Refused")
# Wide left margin so the horizontal category labels fit
par(mar = c(8, 9, 2, 0))
barplot(table(bg$EDUC)/sum(table(bg$EDUC)),
        main = "Respondent Education Proportions",
        names.arg = EDUCLabels,
        horiz = TRUE,
        las = 1,
        col = "orange",
        xlab = "Proportion")

ANSWER: It is hard to say whether a poll reflects American society, because poll respondents, as the chart shows, tend to be more educated. People with some college or a college degree tend to be more civically engaged and are therefore more likely to respond to a poll, which makes most polls more reflective of the higher-educated part of the electorate.

9) Create a data object called bg.Rep that only has Republicans. What are the dimensions of this object? Provide a table of the summary race variable (RRACE) for Republicans only. Briefly discuss.

bg.Rep = bg[bg$PARTYID<4,]
dim(bg.Rep)
## [1] 316 189
table(bg.Rep$RRACE)
## 
##   1   2   3   4 
## 269  11  15  21

ANSWER: Republicans in this sample are predominantly white (269 of 316, about 85%), with relatively few respondents in the other race categories.
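
One thing to watch with logical subsetting such as bg[bg$PARTYID<4,]: if the filter column contains NA, a base R data frame returns an all-NA row for each missing value, which inflates the row count. A sketch with a made-up data frame (toy and its party codes are hypothetical; the real PARTYID coding should be checked against the toplines):

```r
# Hypothetical toy data with one missing party code
toy <- data.frame(id = 1:5, party = c(1, 5, NA, 3, 7))

# Plain logical subsetting keeps an all-NA row for the missing value
with_na <- toy[toy$party < 4, ]

# which() drops the NA comparison, keeping only true matches
no_na <- toy[which(toy$party < 4), ]

nrow(with_na)   # 3 rows, one of them entirely NA
nrow(no_na)     # 2 rows
```

If PARTYID has no missing values, both forms give the same result, so this is a precaution rather than a correction.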

10) RFK favorability. Show a table with the distribution of answers for favorability toward Robert F. Kennedy Jr. (Note that this survey is from March 2024). Use the toplines document to figure out what each number corresponds to.

table(bg$Q19)
## 
##   1   2   3   4   5   6 
## 108 222 115 156 176  23
bg$Q19_table <- factor(bg$Q19,
        levels = c(1, 2, 3, 4, 5, 6),
        labels = c("Favorable/Strongly", "Favorable/Somewhat", "Unfavorable/Somewhat", "Unfavorable/Strongly", "No Opinion", "Never heard of"))

bg_Q19_table <- table(bg$Q19_table)
bg_Q19_df <- as.data.frame(bg_Q19_table)

ANSWER: Opinions of RFK Jr. are fairly moderate: most respondents (513 of 800) chose one of the “somewhat” categories or had no opinion.
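
The labels only appear in the output once the labelled factor itself is tabulated. A sketch of that step on a made-up vector (q_toy and the shortened labels are hypothetical stand-ins for bg$Q19 and its labels):

```r
# Hypothetical numeric codes for six respondents
q_toy <- c(1, 2, 2, 3, 2, 1)

# Attach readable labels to the numeric codes
q_fac <- factor(q_toy,
                levels = c(1, 2, 3),
                labels = c("Favorable", "Unfavorable", "No opinion"))

counts <- table(q_fac)
counts                          # counts shown under the labels
round(prop.table(counts), 2)    # the same table as proportions
```

With the real data, the equivalent would be printing the table built from the labelled Q19 factor rather than only storing it.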

11) Ideology. Show a table with the distribution of answers to the IDEOL question. Use the toplines document to figure out what each number corresponds to.

table(bg$IDEOL)
## 
##   1   2   3   4   5   6 
## 167 238  91 171 112  21
bg$IDEOL_table <- factor(bg$IDEOL,
        levels = c(1, 2, 3, 4, 5, 6),
        labels = c("Very Conservative", "Somewhat Conservative", "Moderate", "Somewhat Liberal", "Liberal", "Unsure"))

bg_IDEOL_table <- table(bg$IDEOL_table)
bg_IDEOL_df <- as.data.frame(bg_IDEOL_table)

ANSWER: The distribution of ideologies among respondents is fairly even, with a lean toward conservatives: 405 respondents are very or somewhat conservative, compared with 283 who are somewhat liberal or liberal. Apart from the 21 respondents who were unsure, the smallest group was moderates (91).

12) Recode ideology to missing for people who did not answer that question; do the same for RFK approval. Then display a cross-tabulation of the RFK approval and ideology variables. Briefly discuss.

bg$IDEOL_missing <- factor(bg$IDEOL,
        levels = c(1, 2, 3, 4, 5),
        labels = c("Very Conservative", "Somewhat Conservative", "Moderate", "Somewhat Liberal", "Liberal"))

bg$Q19_missing <- factor(bg$Q19,
        levels = c(1, 2, 3, 4, 5),
        labels = c("Favorable/Strongly", "Favorable/Somewhat", "Unfavorable/Somewhat", "Unfavorable/Strongly", "No Opinion"))
# Explicitly set code 6 to NA (already implied by omitting 6 from levels; kept for clarity)
bg$IDEOL_missing[bg$IDEOL == 6] <- NA
bg$Q19_missing[bg$Q19 == 6] <- NA
cross_tab <- table(bg$IDEOL_missing, bg$Q19_missing)
print(cross_tab)
##                        
##                         Favorable/Strongly Favorable/Somewhat
##   Very Conservative                     19                 78
##   Somewhat Conservative                 39                 89
##   Moderate                              13                 15
##   Somewhat Liberal                      22                 25
##   Liberal                                6                 13
##                        
##                         Unfavorable/Somewhat Unfavorable/Strongly No Opinion
##   Very Conservative                       20                   11         36
##   Somewhat Conservative                   29                   18         58
##   Moderate                                10                   16         34
##   Somewhat Liberal                        27                   60         32
##   Liberal                                 27                   48         12
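
A note on the recode: factor() already turns any code not listed in levels into NA, so an explicit NA assignment for code 6 acts as a safeguard rather than a requirement. A sketch on a made-up vector (ideol_toy is hypothetical):

```r
# Hypothetical ideology codes; 6 stands in for "unsure"
ideol_toy <- c(1, 6, 3, 5, 6)

# Only codes 1-5 are declared as levels, so 6 becomes NA
ideol_fac <- factor(ideol_toy, levels = 1:5)

is.na(ideol_fac)                     # TRUE wherever the code was 6
table(ideol_fac, useNA = "ifany")    # shows the NA count alongside the levels
```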

ANSWER: Across all categories, the “leaners” (somewhat conservative and somewhat liberal) tend to be the most vocal about their support for or opposition to RFK Jr. Conservatives across the board were most often somewhat favorable toward RFK Jr. (78 very conservative and 89 somewhat conservative respondents), whereas liberals across the board were most often strongly unfavorable (60 somewhat liberal and 48 liberal respondents). Liberals appear to be more united in their views of RFK Jr. than conservatives.
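
Because the ideology groups differ in size, raw counts can be hard to compare. One option is to convert each row to percentages with prop.table(..., margin = 1). The sketch below re-enters the counts from the cross-tabulation above by hand, with column names shortened for display:

```r
# Counts copied from the cross-tabulation above
# (rows: ideology, columns: RFK Jr. favorability)
counts <- matrix(
  c(19, 78, 20, 11, 36,
    39, 89, 29, 18, 58,
    13, 15, 10, 16, 34,
    22, 25, 27, 60, 32,
     6, 13, 27, 48, 12),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("Very Conservative", "Somewhat Conservative", "Moderate",
      "Somewhat Liberal", "Liberal"),
    c("Fav/Strongly", "Fav/Somewhat", "Unfav/Somewhat",
      "Unfav/Strongly", "No Opinion")
  )
)

# Row percentages: each ideology group sums to 100
row_pct <- round(100 * prop.table(counts, margin = 1), 1)
row_pct
```

With the data loaded, the same result should come from prop.table(cross_tab, margin = 1) without retyping the counts.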

The following items are purely for fun. They are not required and you will not get class credit.

Extra.1 [Not required/for credit] Data coding: Create a table with cross tabs of the REDUC and EDUC variables and explain what you see. How did people who refused the EDUC question (in columns) get coded in the REDUC variable (in rows)?

Extra.2 [Not required/for credit] How many people approved both Biden and Trump?