My research question after going through the data is:
Is the political view of an individual US citizen related to their opinion on government spending on improving education?
If opinions differ across political views, it may suggest that the government is not doing well enough to ensure high-quality education for ‘everyone’ in the country, which would explain why citizens with different political views give different opinions. Supporters of the government may defend its record regardless of the facts, while others may say what they believe regardless of which political party is in power.
To carry out the inference I need two variables: one for the political views of US citizens and another for their opinion on government spending on education.
polviews records the respondent's political views and has 7 levels: Extremely Conservative, Conservative, Slightly Conservative, Moderate, Slightly Liberal, Liberal, and Extremely Liberal.
nateduc records what a US citizen thinks of the government's spending on improving the nation's education system.
It has 3 levels: too much, too little, and about the right amount.
Before performing inference, I carry out some exploratory data analysis (EDA) using summary statistics, tables, and plots.
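The summaries that follow can be reproduced without the original survey file. The sketch below is only an illustration: it assumes a data frame named `dt` (the name used in the plotting code) and rebuilds it from the cell counts printed in this report, rather than loading the raw data as the original analysis presumably did. The helper names `counts` and `cells` are introduced here for illustration.

```r
# Cell counts copied from the cross-tabulation printed in this report
counts <- matrix(
  c( 599,  147,  28,
    2309,  749, 126,
    2542,  962, 160,
    6852, 3184, 584,
    2708, 1392, 355,
    2212, 1211, 486,
     435,  219, 177),
  nrow = 7, byrow = TRUE,
  dimnames = list(
    polviews = c("Extremely Liberal", "Liberal", "Slightly Liberal", "Moderate",
                 "Slightly Conservative", "Conservative", "Extrmly Conservative"),
    nateduc  = c("Too Little", "About Right", "Too Much")
  )
)

# Expand the table to one row per respondent (assumption: only these two
# variables matter, so the order of the rows is irrelevant)
cells <- as.data.frame(as.table(counts))   # columns: polviews, nateduc, Freq
dt <- cells[rep(seq_len(nrow(cells)), cells$Freq), c("polviews", "nateduc")]
rownames(dt) <- NULL

# One-way summaries of both factors
print(summary(dt))

# Two-way table with a row of column totals appended
print(addmargins(table(dt$polviews, dt$nateduc), margin = 1))
```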
##                   polviews            nateduc
##  Extremely Liberal    :  774   Too Little :17657
##  Liberal              : 3184   About Right: 7864
##  Slightly Liberal     : 3664   Too Much   : 1916
##  Moderate             :10620
##  Slightly Conservative: 4455
##  Conservative         : 3909
##  Extrmly Conservative :  831
## polviews               Too Little  About Right  Too Much
## Extremely Liberal             599          147        28
## Liberal                      2309          749       126
## Slightly Liberal             2542          962       160
## Moderate                     6852         3184       584
## Slightly Conservative        2708         1392       355
## Conservative                 2212         1211       486
## Extrmly Conservative          435          219       177
## Total                       17657         7864      1916
The table shows some difference in opinions among the citizens with different political views.
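Because the groups differ greatly in size, the raw counts are hard to compare directly. As a rough sketch, the counts can be converted to within-group proportions; the matrix `counts` below is typed in from the table above, and the names `counts` and `row_prop` are introduced here for illustration.

```r
# Observed counts copied from the cross-tabulation above
counts <- matrix(
  c( 599,  147,  28,
    2309,  749, 126,
    2542,  962, 160,
    6852, 3184, 584,
    2708, 1392, 355,
    2212, 1211, 486,
     435,  219, 177),
  nrow = 7, byrow = TRUE,
  dimnames = list(
    polviews = c("Extremely Liberal", "Liberal", "Slightly Liberal", "Moderate",
                 "Slightly Conservative", "Conservative", "Extrmly Conservative"),
    nateduc  = c("Too Little", "About Right", "Too Much")
  )
)

# Share of each opinion within every political-view group (each row sums to 1)
row_prop <- prop.table(counts, margin = 1)
print(round(row_prop, 3))
```

Read row by row, the share answering "Too Little" falls from about 77% among the extremely liberal to about 52% among the extremely conservative, while the share answering "Too Much" rises from about 4% to about 21%.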
It becomes easier to see this difference visually:
library(ggplot2)

# Stacked bar chart of opinion proportions within each political view
ggplot(dt, aes(x = polviews, fill = nateduc)) +
  theme(panel.border = element_rect(colour = 'black', fill = NA)) +
  theme(text = element_text(size = 13)) +
  labs(x = 'Political Views', y = 'Proportion') +
  geom_bar(position = 'fill', color = 'black') +
  scale_fill_discrete(name = "Opinion") +
  coord_flip()

The graph shows that liberal US citizens are more likely than conservative US citizens to say that the government spends too little on improving education.
Although the plot suggests that there may be differences, we need to perform formal inference to confirm this.
The null hypothesis (H0) is that there is no association between the political views of US citizens and their opinions on government spending on improving education.
The alternative hypothesis (HA) is that the political views of US citizens and their opinions on government spending on improving education are associated.
Since the dataset consists of two categorical variables (polviews and nateduc), the appropriate test is the chi-square test of independence.
This test is used to compare two categorical variables when at least one of them has more than 2 levels. This is the case here, as can be seen below:
## 'data.frame': 27437 obs. of 2 variables:
## $ polviews: Factor w/ 7 levels "Extremely Liberal",..: 4 5 6 6 6 5 5 5 6 2 ...
## $ nateduc : Factor w/ 3 levels "Too Little","About Right",..: 1 1 2 2 1 1 1 1 1 2 ...
## - attr(*, "na.action")= 'omit' Named int [1:29624] 1 2 3 4 5 6 7 8 9 10 ...
## ..- attr(*, "names")= chr [1:29624] "1" "2" "3" "4" ...
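For reference, the test statistic and its degrees of freedom can be written out as follows. The notation is introduced here and is not from the original analysis: O_ij and E_ij are the observed and expected counts in row i and column j, r = 7 is the number of polviews levels, c = 3 is the number of nateduc levels, and n = 27437 is the number of observations.

```latex
\[
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{n}
\]
\[
df = (r - 1)(c - 1) = (7 - 1)(3 - 1) = 12
\]
```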
The chi-square test does not define a confidence interval, so none is included in this analysis.
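The key conditions for the chi-square test of independence are independence (the observations are sampled independently, and each respondent contributes to exactly one cell of the table) and sample size (each cell has at least 5 expected cases). The observed counts are: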
##                        nateduc
## polviews                Too Little About Right Too Much
##   Extremely Liberal            599         147       28
##   Liberal                     2309         749      126
##   Slightly Liberal            2542         962      160
##   Moderate                    6852        3184      584
##   Slightly Conservative       2708        1392      355
##   Conservative                2212        1211      486
##   Extrmly Conservative         435         219      177
Since the conditions are met, we can proceed to the next step.
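The sample-size condition refers to expected counts, not observed ones, so it can be checked explicitly. The following is a minimal sketch under the assumption that the table above is typed in by hand as `counts`; `expected` is a name introduced here, and each expected count is the row total times the column total divided by the grand total.

```r
# Observed counts copied from the table above
counts <- matrix(
  c( 599,  147,  28,
    2309,  749, 126,
    2542,  962, 160,
    6852, 3184, 584,
    2708, 1392, 355,
    2212, 1211, 486,
     435,  219, 177),
  nrow = 7, byrow = TRUE,
  dimnames = list(
    polviews = c("Extremely Liberal", "Liberal", "Slightly Liberal", "Moderate",
                 "Slightly Conservative", "Conservative", "Extrmly Conservative"),
    nateduc  = c("Too Little", "About Right", "Too Much")
  )
)

# Expected count under independence: (row total * column total) / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
print(round(expected, 1))

# Sample-size condition: every expected count should be at least 5
print(min(expected))
print(all(expected >= 5))
```

The smallest expected count belongs to the Extremely Liberal / Too Much cell and is about 54, well above the threshold of 5.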
Finally, the inference is carried out using the chi-square test:
##
## Pearson's Chi-squared test
##
## data: dt$polviews and dt$nateduc
## X-squared = 759.64, df = 12, p-value < 0.00000000000000022
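As seen above, the chi-squared statistic is very large, resulting in a very small p-value. At any conventional significance level we therefore reject H0 and conclude that the data provide strong evidence of an association between political views and opinions on government spending on improving education.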
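The test result can also be reproduced from the contingency table alone. This is a sketch, not the original code: the output above was computed from the two factor columns of `dt`, whereas here the counts are typed in as a matrix called `counts` and passed to `chisq.test()`, which performs the same Pearson test on a table of counts. The name `result` is introduced here for illustration.

```r
# Observed counts copied from the contingency table in this report
counts <- matrix(
  c( 599,  147,  28,
    2309,  749, 126,
    2542,  962, 160,
    6852, 3184, 584,
    2708, 1392, 355,
    2212, 1211, 486,
     435,  219, 177),
  nrow = 7, byrow = TRUE,
  dimnames = list(
    polviews = c("Extremely Liberal", "Liberal", "Slightly Liberal", "Moderate",
                 "Slightly Conservative", "Conservative", "Extrmly Conservative"),
    nateduc  = c("Too Little", "About Right", "Too Much")
  )
)

# Pearson chi-squared test of independence on the 7 x 3 table
result <- chisq.test(counts)
print(result)

# Individual pieces of the result
print(unname(result$statistic))  # chi-squared statistic
print(unname(result$parameter))  # degrees of freedom
print(result$p.value)            # p-value
```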