You will work with poll results from the U.S. 2016 presidential election aggregated from HuffPost Pollster, RealClearPolitics, polling firms, and news reports. The earliest poll has a start date of 2015-11-06, and the latest poll has a start date of 2016-11-06. The general election was held on November 8, 2016.
Below is a preview of the data set. Consult the data dictionary for further details on the variables.
To get started, load `polls.Rdata` (available on Google Classroom). There is an object `polls` in `polls.Rdata`. Use `polls` to answer each question, unless it states otherwise.
Answer 6 of the first 9 questions. In your Rmd file clearly indicate which questions you are answering. Question 10 is extra credit.
Who were the eight most common pollsters? Make a data frame that lists these pollsters in descending order in terms of the number of polls they conducted.
For those pollsters that were given a grade, how many polls were conducted for each grade? Make a data frame that lists the grades and the number of polls for each grade. Arrange the grades from A+ to D.
Recreate the plot below. The plot is based on data for the state of Michigan, pollster “Ipsos”, and the population of likely voters. Use the ending date of the poll as your x variable. Use the raw polling data. As a hint, you will need to reshape your data frame before you make the plot.
Plot details:
From `polls`, produce a data frame that contains the variables `rawpoll_trump`, `rawpoll_clinton`, `trump_edge`, and `party_color`. `trump_edge` is defined as `rawpoll_trump - rawpoll_clinton`. `party_color` should take the value “red” if `trump_edge >= 0` and “blue” otherwise. Randomly display 8 rows of this data frame. An example in tabular form is given below.
rawpoll_trump | rawpoll_clinton | trump_edge | party_color |
---|---|---|---|
41.00 | 46.00 | -5.00 | blue |
39.00 | 44.00 | -5.00 | blue |
25.00 | 53.00 | -28.00 | blue |
42.00 | 46.00 | -4.00 | blue |
42.00 | 41.00 | 1.00 | red |
32.00 | 42.00 | -10.00 | blue |
35.94 | 25.17 | 10.77 | red |
58.00 | 27.00 | 31.00 | red |
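The definitions of `trump_edge` and `party_color` can be illustrated on a tiny hand-made data frame. This is only a sketch: `toy` and `toy_edge` are hypothetical names, the two rows are copied from the example table, and the use of `if_else()` and `slice_sample()` (which requires dplyr 1.0.0 or later) is an assumption about tooling rather than a required approach.

```r
library(dplyr)

# Toy data: two rows copied from the example table (not the real polls object)
toy <- tibble(
  rawpoll_trump   = c(41, 42),
  rawpoll_clinton = c(46, 41)
)

# Apply the two definitions from the question text
toy_edge <- toy %>%
  mutate(
    trump_edge  = rawpoll_trump - rawpoll_clinton,
    party_color = if_else(trump_edge >= 0, "red", "blue")
  )

# Draw rows at random; n = 2 only because the toy data has two rows
slice_sample(toy_edge, n = 2)
```

Because the rows are drawn at random, the displayed order can change from run to run unless a seed is set with `set.seed()`.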
Create a function named `avg_polled()`. This function should have one argument, `state_of_interest`. The function should return a data frame with the average number of people polled for each state passed to `state_of_interest`. Sort the average number of people polled in descending order. Below are two example calls to the function.
avg_polled(state_of_interest = "Michigan")
avg_polled(state_of_interest = c("Michigan", "Oklahoma", "New York"))
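The two calls show that a single argument may receive either one value or a character vector. One common way to support both is to filter with `%in%`, sketched here on made-up data. Everything in this block (`toy_scores`, `avg_by_group()`, and the column names `group` and `size`) is hypothetical and unrelated to `polls`.

```r
library(dplyr)

# Hypothetical toy data; names are chosen only for illustration
toy_scores <- tibble(
  group = c("a", "a", "b", "c", "c"),
  size  = c(10, 30, 50, 5, 15)
)

# Hypothetical helper: mean of `size` for each requested group,
# sorted from largest to smallest mean
avg_by_group <- function(groups_of_interest) {
  toy_scores %>%
    filter(group %in% groups_of_interest) %>%  # works for 1 or many values
    group_by(group) %>%
    summarise(avg_size = mean(size)) %>%
    arrange(desc(avg_size))
}

avg_by_group(groups_of_interest = "a")
avg_by_group(groups_of_interest = c("a", "b", "c"))
```

Using `%in%` rather than `==` avoids recycling problems when the argument holds more than one value.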
Create any plot of your choice that involves at least two variables. You may create new variables based on the data available. You may also include additional data; electoral votes available per state would be interesting. In 1-2 sentences, comment on any relationships or trends between variables.
Reshape and subset `polls` to produce the data frame you see below. Click the right arrow to see all columns. Variable `support` is based on the raw polling numbers.
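As a generic illustration of reshaping followed by subsetting (not the target data frame for this question), the sketch below turns two wide columns into key-value pairs with `tidyr::pivot_longer()` and then keeps one subset of rows. The data and all names (`toy_wide`, `toy_long`, `id`, `score_x`, `score_y`, `measure`, `value`) are hypothetical, and `pivot_longer()` assumes tidyr 1.0.0 or later; `gather()` is the older equivalent.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one row per id, one column per measure
toy_wide <- tibble(
  id      = c(1, 2),
  score_x = c(41, 39),
  score_y = c(46, 44)
)

# Reshape to long form (one row per id-measure pair), then subset rows
toy_long <- toy_wide %>%
  pivot_longer(
    cols      = c(score_x, score_y),
    names_to  = "measure",
    values_to = "value"
  ) %>%
  filter(id == 1)

toy_long
```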
For the key battleground state Pennsylvania, did any poll with a start date in November 2016 have Trump with more support than Hillary? Hint: You can filter dates, just put the date in quotes.
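The hint about quoted dates can be seen on a small made-up example. This sketch assumes the column is stored as class `Date`; in that case R converts a quoted string such as `"2016-11-01"` to a date before comparing. The names `toy_dates`, `start`, `value`, and `november` are hypothetical.

```r
library(dplyr)

# Hypothetical toy data with a Date column
toy_dates <- tibble(
  start = as.Date(c("2016-10-28", "2016-11-01", "2016-11-05")),
  value = c(1, 2, 3)
)

# The quoted date is converted to a Date for the comparison
november <- toy_dates %>%
  filter(start >= "2016-11-01")

november
```

If the column were stored as plain character instead, the comparison would be alphabetical, which happens to agree with chronological order only for dates written in the ISO `YYYY-MM-DD` format.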
What were Hillary’s top 4 raw polling numbers at any time during the polling cycle regardless of state? Create a data frame with the poll’s start date, state, and raw polling numbers. Do the same for Trump and then Johnson.
Recreate the plot below. The plot displays the candidate’s edge, based on the raw polling numbers of likely voters, for each of the 50 states and the District of Columbia. For each state, the edge is taken from the poll with the most recent start date. Grade was not considered.
Plot details:
The deadline to submit Exam 1 is 11:59pm on Tuesday, February 26. Submit your work by uploading only your Rmd file through Google Classroom. Late work will not be accepted except under certain extraordinary circumstances.
Post your questions in the #exam1 channel on Slack. These should only be general questions, where you feel the directions are not clear. Do not post any code.
This is an individual assignment. This document, its questions, and your answers should only be viewed by you, the instructor, and the teaching assistant. If you fail to abide by these rules, you will earn a 0 and an Academic Dishonesty Report will be filed.
You may use any course material or other resources you find helpful online.
You must always cite any code you copy or use as inspiration. Copied code without citation is plagiarism and will result in a 0 for the assignment.
You must use R Markdown. Formatting is at your discretion but is graded. Use the in-class assignments and resources available online for inspiration. Another useful resource for R Markdown formatting is available at: https://holtzy.github.io/Pimp-my-rmd/
Topic | Points |
---|---|
Answer 6 of 9 questions | 66 |
R Markdown formatting | 7 |
Communication of results | 6 |
Knit | 6 |
Code style | 6 |
- 80 characters per line | |
- Format of tidyverse code | |
- Comments used appropriately | |
- Spaces around operations and commas | |
Efficiency | 6 |
- Using tidyverse code when possible | |
- Avoiding loops | |
Named code chunks | 3 |
Total | 100 |
A bonus of up to 5 points can be earned for correctly answering question 10. There is no partial credit for this question.