The objective of this report is to analyse a polling dataset. A political poll is a survey that measures public opinion about the candidates in an election. This project examines four tasks, all based on a survey dataset about an election between two candidates (A and B). The tasks are to compare the two candidates' images, to break down candidate A's image by ballot intention, to map ballot intention against voter age and income, and to test whether ballot intention is related to voter age.
The dataset contains 645 observations of 8 variables. It has no missing values, so no data cleaning was needed.
To complete the four tasks, the dataset was reshaped to contain the variables listed in the following table.
| Variable | Description |
|---|---|
| Candidate | Identity of the candidate |
| Image | The image of the candidate according to the voter |
| Ballot | Probable vote of the person |
| Age | Approximate age of the voter |
| Income | Approximate value of voter income |
As mentioned, two candidates are standing in this election. To compare the candidates' images, that is, how each candidate is seen by the electors, the following bar plot was created.
The bar plot shows how many respondents chose each of the six image categories for each candidate.
The comparison plot shows that both candidates have similar image results. Candidate B, however, has more room to change his image, because more respondents have no opinion of him or have never heard of him.
To examine the overall result for candidate A, the following table breaks down candidate A's image by the voter's ballot result. For each ballot group it gives the count for each image category, that count as a fraction of the group, and the group total.
| Ballot | Candidate A image | Count | Proportion | Total |
|---|---|---|---|---|
| Definitely Candidate A | No opinion | 4 | 0.0263158 | 152 |
| Definitely Candidate A | Somewhat favorable | 32 | 0.2105263 | 152 |
| Definitely Candidate A | Somewhat unfavorable | 3 | 0.0197368 | 152 |
| Definitely Candidate A | Very favorable | 104 | 0.6842105 | 152 |
| Definitely Candidate A | Very unfavorable | 9 | 0.0592105 | 152 |
| Definitely Candidate B | No opinion | 9 | 0.0412844 | 218 |
| Definitely Candidate B | Somewhat favorable | 53 | 0.2431193 | 218 |
| Definitely Candidate B | Somewhat unfavorable | 69 | 0.3165138 | 218 |
| Definitely Candidate B | Very favorable | 10 | 0.0458716 | 218 |
| Definitely Candidate B | Very unfavorable | 77 | 0.3532110 | 218 |
| Probably Candidate A | No opinion | 4 | 0.0404040 | 99 |
| Probably Candidate A | Somewhat favorable | 52 | 0.5252525 | 99 |
| Probably Candidate A | Somewhat unfavorable | 9 | 0.0909091 | 99 |
| Probably Candidate A | Very favorable | 32 | 0.3232323 | 99 |
| Probably Candidate A | Very unfavorable | 2 | 0.0202020 | 99 |
| Probably Candidate B | No opinion | 11 | 0.0948276 | 116 |
| Probably Candidate B | Somewhat favorable | 35 | 0.3017241 | 116 |
| Probably Candidate B | Somewhat unfavorable | 50 | 0.4310345 | 116 |
| Probably Candidate B | Very favorable | 9 | 0.0775862 | 116 |
| Probably Candidate B | Very unfavorable | 11 | 0.0948276 | 116 |
| Undecided | Never heard of | 2 | 0.0333333 | 60 |
| Undecided | No opinion | 8 | 0.1333333 | 60 |
| Undecided | Somewhat favorable | 28 | 0.4666667 | 60 |
| Undecided | Somewhat unfavorable | 12 | 0.2000000 | 60 |
| Undecided | Very favorable | 2 | 0.0333333 | 60 |
| Undecided | Very unfavorable | 8 | 0.1333333 | 60 |
Half of the undecided voters (30 of 60) have a very or somewhat favourable image of candidate A. This is a sizeable group that could be persuaded to vote for this candidate.
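The 50% figure can be checked directly from the Undecided rows of the table above. The following is a minimal Python sketch; the variable names are illustrative, and only the counts come from the table.

```python
# Candidate A image counts among Undecided voters, copied from the table.
undecided = {
    "Never heard of": 2,
    "No opinion": 8,
    "Somewhat favorable": 28,
    "Somewhat unfavorable": 12,
    "Very favorable": 2,
    "Very unfavorable": 8,
}

total = sum(undecided.values())  # all undecided respondents
favourable = undecided["Very favorable"] + undecided["Somewhat favorable"]
share = favourable / total

print(f"{favourable} of {total} undecided voters ({share:.0%}) view candidate A favourably")
```

The same division of Count by Total reproduces every value in the fourth column of the table.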
A heatmap is a data visualization technique that represents the magnitude of a value as colour across a two-dimensional grid.
Two heatmaps were created to show the ballot results according to the voters' demographic data, namely the voters' income and age characteristics.
The heatmaps show that candidate B has a higher percentage of older voters with a low income. Candidate A's voters, on the other hand, are younger and have a higher income.
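The numbers behind such a heatmap are a two-way table of shares. The sketch below shows one way to compute them in Python, under two assumptions that the report does not confirm: that percentages are taken within each ballot category, and that age is already grouped into bands. The function name `share_by_column` and the rows in `toy` are made up for illustration and are not the survey data.

```python
from collections import Counter

def share_by_column(records):
    """Return {row: {col: percent}} where each column sums to 100."""
    counts = Counter(records)                        # (row, col) -> count
    col_totals = Counter(col for _, col in records)  # col -> count
    rows = sorted({row for row, _ in records})
    cols = sorted(col_totals)
    return {
        row: {col: 100 * counts[(row, col)] / col_totals[col] for col in cols}
        for row in rows
    }

# Hypothetical (age band, ballot) pairs, NOT taken from the survey.
toy = [
    ("18-39", "A"), ("18-39", "A"), ("18-39", "B"),
    ("40+", "A"), ("40+", "B"), ("40+", "B"), ("40+", "B"),
]

shares = share_by_column(toy)
for age, row in shares.items():
    print(age, {ballot: round(pct, 1) for ballot, pct in row.items()})
```

Each cell of the resulting grid would then be mapped to a colour, so that darker cells mark the age or income bands where a candidate's support is concentrated.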
To conclude the project, we investigate whether the ballot results have a significant relationship with the voter's age. The Chi-Square test is used to evaluate this.
A Chi-Square test is a statistical hypothesis test used for categorical variables. It compares the observed distribution of observations across the categories with the distribution expected under the null hypothesis, in order to determine whether the variables have a significant relationship. In other words, the test evaluates how likely the observed frequencies would be if the null hypothesis were true.
There are two types of Chi-Square test: one-sample and two-sample. The one-sample test evaluates whether the observed frequencies of a single categorical variable differ significantly from the frequencies expected under the null hypothesis. The two-sample test compares the distribution of a categorical variable across groups, and is the type that applies to the ballot result and voter age.
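For a two-way table of counts, the Pearson statistic sums (observed - expected)² / expected over all cells. The expected count for a cell is its row total times its column total divided by the grand total, and the degrees of freedom are (rows - 1) × (columns - 1). The following Python sketch illustrates the calculation. The function name `pearson_chi_square` and the toy table are hypothetical, and the code assumes every row and column total is positive. It is not the code used to produce the report's results.

```python
def pearson_chi_square(table):
    """Return (statistic, degrees_of_freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Hypothetical 2 x 3 table, used only to exercise the function.
toy_table = [[10, 20, 30],
             [30, 20, 10]]
stat, df = pearson_chi_square(toy_table)
print(f"X-squared = {stat:.3f}, df = {df}")
```

A larger statistic means the observed counts lie further from what independence would predict, which in turn gives a smaller p-value.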
##
## Pearson's Chi-squared test
##
## data: chisq_data
## X-squared = 26.7, df = 12, p-value = 0.008534
The Chi-Square test returned a chi-squared value of 26.7 on 12 degrees of freedom and a p-value of 0.0085, which is below the 0.05 significance level. The null hypothesis can therefore be rejected, and we can conclude that the ballot result and the voter’s age have a significant relationship.
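The reported p-value can be cross-checked from the statistic and the degrees of freedom alone. When the degrees of freedom are even, the upper tail of the chi-square distribution has a closed form, so no statistics library is needed. The following Python sketch uses that closed form; the function name `chi_square_sf_even_df` is illustrative, and the input 26.7 is the rounded value from the output above, so the result may differ slightly in the last digit.

```python
import math

def chi_square_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half = x / 2
    # For df = 2k: P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    tail_sum = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * tail_sum

p_value = chi_square_sf_even_df(26.7, 12)  # statistic and df as reported
print(f"p-value = {p_value:.4f}")
```

The result agrees with the reported p-value to about four decimal places, which confirms that the statistic, the degrees of freedom and the p-value in the output are mutually consistent.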
To conclude, the data analysis in this report shows that the two candidates have similar image results, although some differences can be noticed. For example, candidate B has a higher number of voters with no opinion about him.
The ballot result analysis showed that half of the undecided voters are somewhat or very favourable to candidate A. This points to a pool of electors who could be persuaded to vote for this candidate.
Finally, the last analysis of this project showed that the ballot result and the voter’s age have a significant relationship. The heatmaps also indicate that candidate B has a higher share of older voters with a low income.