---
title: "hw10"
author: "Riley Kearney"
date: "2024-10-24"
output: html_document
---

### Part 1: Data Source

- What datasets are you using? I’m using current and historical presidential polls. I’m also looking for another dataset that includes information like age, income, and region.
- Summarize each column you plan on using:
    - Survey length (calculated as the difference between the first and last date of the survey)
    - Average survey length for each pollster
    - Number of sponsors
    - Winning candidate
    - Poll score
    - Party
    - Methodology
    - State
    - Age
    - Income

### Part 2: Data Transformation

- I calculated the survey length by subtracting the first date from the last.
- There are two rows for each survey, so I created a new column to indicate the winning candidate for that survey.
- I also made a new column to count the number of sponsors for each pollster.

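
The three transformations above can be sketched in plain Python. This is an illustration only: the rows are made up, the field names (`poll_id`, `pollster`, `sponsor`, `start`, `end`, `candidate`, `pct`) are hypothetical, and it assumes the two rows per survey correspond to the two candidates.

```python
from collections import defaultdict
from datetime import date

# Made-up rows for illustration; the field names are assumptions.
# Each survey appears twice, once per candidate.
rows = [
    {"poll_id": 1, "pollster": "P1", "sponsor": "S1",
     "start": date(2024, 10, 1), "end": date(2024, 10, 4),
     "candidate": "A", "pct": 48.0},
    {"poll_id": 1, "pollster": "P1", "sponsor": "S1",
     "start": date(2024, 10, 1), "end": date(2024, 10, 4),
     "candidate": "B", "pct": 46.0},
    {"poll_id": 2, "pollster": "P1", "sponsor": "S2",
     "start": date(2024, 10, 5), "end": date(2024, 10, 6),
     "candidate": "A", "pct": 44.0},
    {"poll_id": 2, "pollster": "P1", "sponsor": "S2",
     "start": date(2024, 10, 5), "end": date(2024, 10, 6),
     "candidate": "B", "pct": 47.0},
]


def summarize_surveys(rows):
    """Collapse the candidate rows of each survey into one record."""
    by_poll = defaultdict(list)
    for row in rows:
        by_poll[row["poll_id"]].append(row)

    surveys = []
    for poll_id, group in by_poll.items():
        leader = max(group, key=lambda r: r["pct"])  # higher-polling candidate
        surveys.append({
            "poll_id": poll_id,
            "pollster": leader["pollster"],
            # Survey length: last date minus first date, in days.
            "length_days": (leader["end"] - leader["start"]).days,
            "winner": leader["candidate"],
            "winner_pct": leader["pct"],
        })
    return surveys


def sponsors_per_pollster(rows):
    """Count the distinct sponsors seen for each pollster."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["pollster"]].add(row["sponsor"])
    return {pollster: len(names) for pollster, names in seen.items()}


surveys = summarize_surveys(rows)
sponsor_counts = sponsors_per_pollster(rows)
```

Collapsing to one record per survey also keeps each survey from being counted twice in later steps.
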
### Part 3: Correlations

- So far, I have been analyzing the correlation between survey length and the percentage for the candidate projected to win.
- There were some outliers in the survey length data, so I applied a filter to remove them.

### Part 4: Modeling

### Model 1: Simple Linear Regression

- I used survey length and the percentage for the candidate projected to win in my linear regression.
- The p-value is 0.01465, and the R-squared is 0.0003. The p-value is below 0.05, but an R-squared of 0.0003 means the model explains only about 0.03% of the variance. This isn’t a strong relationship, so I plan to explore other variables.

### Model 2: Multiple Linear Regression (Continuous Variables)

- This is where I plan to incorporate additional data like age and income, while still including survey length.

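
One way to write the planned model down, as a sketch. It assumes the projected winner's percentage is the response, and the age and income terms depend on a second dataset that has not been chosen yet, so the variable names are placeholders:

$$
\text{pct}_i = \beta_0 + \beta_1\,\text{length}_i + \beta_2\,\text{age}_i + \beta_3\,\text{income}_i + \varepsilon_i
$$

Here $i$ indexes surveys, $\beta_0$ is the intercept, each remaining $\beta$ is the expected change in the percentage per unit of its variable with the others held fixed, and $\varepsilon_i$ is the error term.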
### Model 3: Regression with a Categorical Variable

- I intend to add region to this model to increase its detail and identify stronger correlations.

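
A categorical variable such as region enters a regression as a set of 0/1 indicator columns, with one level left out as the baseline. The sketch below shows that encoding. The region labels and the choice of baseline are made up for illustration.

```python
def dummy_code(values, baseline):
    """One-hot encode a categorical column, dropping the baseline level.

    Returns the non-baseline levels (sorted) and one 0/1 row per value.
    """
    levels = sorted(set(values) - {baseline})
    encoded = [[1 if value == level else 0 for level in levels]
               for value in values]
    return levels, encoded


# Hypothetical region labels, one per survey.
regions = ["Midwest", "South", "West", "South", "Northeast"]
levels, encoded = dummy_code(regions, baseline="Midwest")
```

Each indicator's coefficient is then read as the difference from the baseline region, holding the other predictors fixed.
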
### Part 5: Analysis of Results

- I haven’t yet compared the three models, but I hope that when I do, the results will support my thesis and help predict the outcome of future elections.
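
One common way to compare models with different numbers of predictors is adjusted R-squared, which penalizes each added predictor. The sketch below is only an illustration of that comparison: the R-squared values, predictor counts, and sample size are hypothetical, not results from this project.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical fits: (label, R-squared, number of predictors), 200 surveys each.
fits = [("model 1", 0.010, 1), ("model 2", 0.050, 3), ("model 3", 0.055, 6)]
scores = {label: adjusted_r_squared(r2, 200, p) for label, r2, p in fits}
best = max(scores, key=scores.get)
```

In this made-up case the third model has the highest raw R-squared but not the highest adjusted value, because its extra predictors add little.
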