Introduction

The objective of this report is to analyse a polling dataset. A political poll is a survey that measures public opinion about the candidates in an election. This project covers four tasks, all based on a survey dataset that contains information about an election between two candidates (A and B). The tasks are:

  • Candidates image comparison;
  • Ballot results according to candidate A image;
  • Voters demographics data analysis;
  • Significance test between ballot results and voters age.

The dataset contains 645 observations of 8 variables. It has no missing values, so no data cleaning is needed.

To complete the four tasks, the dataset was reshaped to contain the variables described in the following table.

Variable    Description
Candidate   Identity of the candidate
Image       The image of the candidate according to the voter
Ballot      Probable vote of the person
Age         Approximate age of the voter
Income      Approximate value of voter income

Candidates image comparison

As mentioned, two candidates are participating in this election. To compare the candidates' images, in other words how each candidate is seen by the electors, the following graph was created.

The bar plot shows how many respondents chose each of the six images for each candidate.

The plot shows that both candidates have similar image results. It also shows that Candidate B has more room to change his image, because more respondents have no opinion about him or have never heard of him.

Ballot results according to candidate A image

To observe the overall result for candidate A, the following table shows candidate A's image broken down by the voter's ballot result.

Ballot                  Candidate A image     Count  Proportion  Total
Definitely Candidate A  No opinion                4  0.0263158     152
Definitely Candidate A  Somewhat favorable       32  0.2105263     152
Definitely Candidate A  Somewhat unfavorable      3  0.0197368     152
Definitely Candidate A  Very favorable          104  0.6842105     152
Definitely Candidate A  Very unfavorable          9  0.0592105     152
Definitely Candidate B  No opinion                9  0.0412844     218
Definitely Candidate B  Somewhat favorable       53  0.2431193     218
Definitely Candidate B  Somewhat unfavorable     69  0.3165138     218
Definitely Candidate B  Very favorable           10  0.0458716     218
Definitely Candidate B  Very unfavorable         77  0.3532110     218
Probably Candidate A    No opinion                4  0.0404040      99
Probably Candidate A    Somewhat favorable       52  0.5252525      99
Probably Candidate A    Somewhat unfavorable      9  0.0909091      99
Probably Candidate A    Very favorable           32  0.3232323      99
Probably Candidate A    Very unfavorable          2  0.0202020      99
Probably Candidate B    No opinion               11  0.0948276     116
Probably Candidate B    Somewhat favorable       35  0.3017241     116
Probably Candidate B    Somewhat unfavorable     50  0.4310345     116
Probably Candidate B    Very favorable            9  0.0775862     116
Probably Candidate B    Very unfavorable         11  0.0948276     116
Undecided               Never heard of            2  0.0333333      60
Undecided               No opinion                8  0.1333333      60
Undecided               Somewhat favorable       28  0.4666667      60
Undecided               Somewhat unfavorable     12  0.2000000      60
Undecided               Very favorable            2  0.0333333      60
Undecided               Very unfavorable          8  0.1333333      60

Half of the undecided voters (30 of 60) have a very or somewhat favourable image of candidate A. This is a sizeable group that could be persuaded to vote for this candidate.
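The 50% figure can be checked directly from the table. The following Python sketch re-enters the counts by hand and recomputes the group totals and the favourable share among undecided voters. The language, the dictionary layout and the function name are assumptions made for illustration; the report's own analysis code is not shown.

```python
# Counts copied by hand from the ballot / candidate A image table.
# The dictionary layout and all names here are illustrative only.
counts = {
    "Definitely Candidate A": {
        "No opinion": 4, "Somewhat favorable": 32, "Somewhat unfavorable": 3,
        "Very favorable": 104, "Very unfavorable": 9,
    },
    "Definitely Candidate B": {
        "No opinion": 9, "Somewhat favorable": 53, "Somewhat unfavorable": 69,
        "Very favorable": 10, "Very unfavorable": 77,
    },
    "Probably Candidate A": {
        "No opinion": 4, "Somewhat favorable": 52, "Somewhat unfavorable": 9,
        "Very favorable": 32, "Very unfavorable": 2,
    },
    "Probably Candidate B": {
        "No opinion": 11, "Somewhat favorable": 35, "Somewhat unfavorable": 50,
        "Very favorable": 9, "Very unfavorable": 11,
    },
    "Undecided": {
        "Never heard of": 2, "No opinion": 8, "Somewhat favorable": 28,
        "Somewhat unfavorable": 12, "Very favorable": 2, "Very unfavorable": 8,
    },
}


def favourable_share(images):
    """Return (favourable count, group total, share) for one ballot group."""
    total = sum(images.values())
    favourable = images.get("Very favorable", 0) + images.get("Somewhat favorable", 0)
    return favourable, total, favourable / total


# Total number of respondents in each ballot group.
totals = {ballot: sum(images.values()) for ballot, images in counts.items()}

fav, total, share = favourable_share(counts["Undecided"])
print(totals)
print(f"Undecided voters favourable to candidate A: {fav} of {total} ({share:.0%})")
```

The group totals recomputed this way add up to the 645 observations in the dataset.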

Voters demographics data analysis

A heatmap is a data visualization technique that shows the magnitude of a phenomenon as colour in two dimensions.

Two heatmaps were created to show the ballot results against the voters' demographic data, namely income and age.

Heatmap of ballot results according to voter income

Heatmap of ballot results according to voter age

The heatmaps show that candidate B has a higher percentage of older voters with a low income. Candidate A's voters, on the other hand, are younger and have a higher income.
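The cell values behind heatmaps of this kind can be computed as a cross-tabulation. The Python sketch below assumes that each cell holds the share of a ballot group that falls in a given age band. The records, the group labels and the age band labels are hypothetical, since the survey rows and the binning are not shown in this report.

```python
from collections import Counter

# Hypothetical (ballot, age band) records, invented for illustration.
# They are NOT taken from the survey dataset.
records = [
    ("Candidate A", "18-34"), ("Candidate A", "18-34"),
    ("Candidate A", "18-34"), ("Candidate A", "55+"),
    ("Candidate B", "18-34"), ("Candidate B", "55+"),
    ("Candidate B", "55+"), ("Candidate B", "55+"),
]


def row_percentages(pairs):
    """Share of each ballot group (row) that falls in each age band (column)."""
    cell_counts = Counter(pairs)
    row_totals = Counter(ballot for ballot, _ in pairs)
    return {
        (ballot, band): n / row_totals[ballot]
        for (ballot, band), n in cell_counts.items()
    }


shares = row_percentages(records)
for (ballot, band), value in sorted(shares.items()):
    print(f"{ballot:<12} {band:<6} {value:.0%}")
```

Normalising within each ballot group keeps groups of different sizes comparable, which matters here because the ballot groups range from 60 to 218 respondents.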

Significance test between ballot results and voters age

To conclude the project, we investigate whether the ballot results have a significant relationship with the voter's age. The Chi-Square test is used to evaluate this.

A Chi-Square test is a statistical hypothesis test for categorical variables. It compares the observed counts in each combination of categories with the counts that would be expected if the variables were unrelated. Its purpose is to evaluate how likely the observed frequencies would be if the null hypothesis were true.

There are two types of Chi-Square test: one-sample and two-sample. The one-sample test (goodness of fit) evaluates whether a sample is consistent with the distribution stated in the null hypothesis. The two-sample test (test of independence) checks whether two categorical variables are related, which is the question asked here.
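As an illustration of how the statistic for a test of independence is calculated, the following Python sketch computes Pearson's chi-squared value and its degrees of freedom for a small contingency table. The table values are hypothetical and the function name is an assumption; the contingency table used in this report is not shown.

```python
def chi_square_statistic(table):
    """Pearson's chi-squared statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical 2 x 3 table of counts, invented for illustration only.
hypothetical_table = [
    [10, 20, 30],
    [20, 20, 20],
]
stat, df = chi_square_statistic(hypothetical_table)
print(f"X-squared = {stat:.4f}, df = {df}")
```

The larger the gap between observed and expected counts, the larger the statistic and the smaller the resulting p-value.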

## 
##  Pearson's Chi-squared test
## 
## data:  chisq_data
## X-squared = 26.7, df = 12, p-value = 0.008534

The test returned a chi-squared value of 26.7 with 12 degrees of freedom and a p-value of 0.0085, which is below the 0.05 significance level. The null hypothesis of independence can therefore be rejected, and we conclude that the ballot result and the voter's age have a significant relationship.
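The reported p-value can be cross-checked from the statistic and the degrees of freedom alone. The Python sketch below uses the closed form of the chi-squared upper tail probability, which holds only when the degrees of freedom are even; the function name is an assumption made for illustration. With 12 degrees of freedom, (5 - 1) x (4 - 1) would be consistent with five ballot categories and four age groups, although the report does not show the table itself.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper tail probability P(X > x) of a chi-squared variable.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!,
    which is valid only for a positive, even number of degrees of freedom.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i   # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Statistic and degrees of freedom as printed in the test output.
p_value = chi2_sf_even_df(26.7, 12)
print(f"p-value = {p_value:.6f}")
print("reject independence at 0.05:", p_value < 0.05)
```

Because the printed statistic is rounded to 26.7, the result agrees with the reported 0.008534 only to about four decimal places.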

Conclusion

To conclude, the data analysis in this report shows that the two candidates obtain similar image results, with some differences. For example, more respondents have no opinion about candidate B.

The ballot result analysis showed that half of the undecided voters have a somewhat or very favourable image of candidate A. These are electors who could potentially be persuaded to vote for this candidate.

Finally, the last analysis of this project showed that the ballot result and the voter's age have a significant relationship. Candidate B draws a higher percentage of older voters with a low income.