This data set contains one variable of interest, trust.govt, which holds the responses to the question:
“How much of the time do you think you can trust the government in Washington to do what is right? Just about always, most of the time, or only some of the time?”
This data set also contains the demographic variables marital.status, education, race, income, ideology, party, and age.
Each variable exists in a labelled version (a factor, or a character vector in the case of age and vote.choice2020) and in a numeric version. Numeric variable names are prefixed with n. (for example, n.trust.govt).
load("ANES 2020 Govt Trust.Rdata")
dim(anes2020.govttrust)
## [1] 8280 18
sapply(anes2020.govttrust, class)
## trust.govt n.trust.govt marital.status education
## "factor" "numeric" "factor" "factor"
## race income ideology party
## "factor" "factor" "factor" "factor"
## age n.marital.status n.education n.race
## "character" "numeric" "numeric" "numeric"
## n.income n.ideology n.party n.age
## "numeric" "numeric" "numeric" "numeric"
## vote.choice2020 n.vote.choice2020
## "character" "numeric"
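The naming convention can be checked programmatically. The following is a minimal sketch that works only from the column names printed above; the vector cols is typed in by hand for illustration, and with the data loaded one would use names(anes2020.govttrust) instead.

```r
# Column names copied by hand from the sapply() output above.
cols <- c("trust.govt", "n.trust.govt", "marital.status", "education",
          "race", "income", "ideology", "party", "age",
          "n.marital.status", "n.education", "n.race", "n.income",
          "n.ideology", "n.party", "n.age",
          "vote.choice2020", "n.vote.choice2020")

# Numeric versions start with the prefix "n."
numeric.cols <- cols[startsWith(cols, "n.")]

# Drop the prefix to recover the name of the matching labelled version
label.cols <- sub("^n\\.", "", numeric.cols)

# Every numeric column should have a labelled counterpart
all(label.cols %in% cols)
```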
To define our research question, we need to investigate the relationship between our variable of interest (in this case trust.govt) and some other variable. For example:
Is there a relationship between the age of voters and the amount of trust they have in the government to do what is right?
To begin our examination of this question, we would look at each of our variables.
table(anes2020.govttrust$trust.govt)
##
## -9. Refused -8. Don't know 1. Always
## 0 0 88
## 2. Most of the time 3. About half the time 4. Some of the time
## 1133 2569 3674
## 5. Never
## 779
summary(anes2020.govttrust$n.trust.govt)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.000 3.000 4.000 3.476 4.000 5.000 37
table(anes2020.govttrust$age)
##
## 18-30 31-40 41-50 51-60 60+
## 1143 1377 1219 1347 2840
summary(anes2020.govttrust$n.age)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 18.00 37.00 52.00 51.57 66.00 80.00 354
Note that a higher value of n.trust.govt corresponds to trusting the government less often.
An analysis can proceed in many ways, but in this case we begin by computing the correlation between the two numeric variables.
# correlation between trust and age
cor(anes2020.govttrust$n.trust.govt,anes2020.govttrust$n.age, use = "complete.obs")
## [1] -0.1347595
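The correlation of about -0.135 indicates a weak negative linear relationship between n.age and n.trust.govt. Because higher values of n.trust.govt mean less trust, this corresponds to a weak positive relationship between age and trust in government.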
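The sign of a correlation depends on how the scale is coded. The sketch below uses made-up data (toy.age and toy.trust are hypothetical and are not drawn from the ANES file) to show that reversing a 1-5 scale flips the sign of the correlation while leaving its magnitude unchanged, and that use = "complete.obs" drops observations with missing values.

```r
set.seed(1)  # toy data, not the ANES sample

# Hypothetical respondents: age in years, trust coded 1 (always) to 5 (never)
toy.age   <- sample(18:80, 200, replace = TRUE)
toy.trust <- sample(1:5, 200, replace = TRUE)
toy.trust[c(3, 50)] <- NA   # a few missing responses

# Correlation over complete pairs only, as in the call above
r.original <- cor(toy.trust, toy.age, use = "complete.obs")

# Reverse the coding so that 5 = always and 1 = never
toy.trust.rev <- 6 - toy.trust
r.reversed <- cor(toy.trust.rev, toy.age, use = "complete.obs")

# The magnitude is unchanged; only the sign flips
c(original = r.original, reversed = r.reversed)
```

For this reason, any statement about the direction of a relationship should say which coding it refers to.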
Another way to approach the analysis is to create a two-way table of our two variables and observe the frequency of each combination of values. Because the age groups differ in size, raw counts are not directly comparable; to draw meaningful comparisons between groups, we use the prop.table() function to display the counts as proportions.
# two-way table & prop.table
table1 <- table(anes2020.govttrust$trust.govt, anes2020.govttrust$age)
table2 <- prop.table(table1, margin = 2) # proportions within each column (age group)
table2[3:7,]
##
## 18-30 31-40 41-50 51-60
## 1. Always 0.016681299 0.011636364 0.013968776 0.011177347
## 2. Most of the time 0.102721686 0.098181818 0.124897288 0.146795827
## 3. About half the time 0.266022827 0.297454545 0.300739523 0.329359165
## 4. Some of the time 0.443371378 0.454545455 0.464256368 0.438152012
## 5. Never 0.171202809 0.138181818 0.096138044 0.074515648
##
## 60+
## 1. Always 0.006711409
## 2. Most of the time 0.175203108
## 3. About half the time 0.329212292
## 4. Some of the time 0.434122218
## 5. Never 0.054750971
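The margin argument controls which totals the proportions are taken over. As a small illustration with made-up counts (toy.counts is hypothetical), margin = 2 divides each cell by its column total, so groups of different sizes become comparable:

```r
# Toy counts (made up for illustration): rows = response, columns = group
toy.counts <- matrix(c(10, 30, 60,
                       20, 20, 160),
                     nrow = 3,
                     dimnames = list(response = c("Often", "Sometimes", "Never"),
                                     group    = c("A", "B")))

col.props <- prop.table(toy.counts, margin = 2)  # each column sums to 1
row.props <- prop.table(toy.counts, margin = 1)  # each row sums to 1

colSums(col.props)
rowSums(row.props)

# Group B has fewer "Sometimes" answers in raw counts (20 vs 30),
# and an even smaller share once group size is accounted for.
col.props["Sometimes", ]
```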
To better visualize and compare the group proportions, we can use a bar plot. By default, barplot() stacks the bars on top of one another.
The RColorBrewer package contains preset color palettes for visualizations. See all color palette options here: https://www.r-graph-gallery.com/38-rcolorbrewers-palettes.html
# barplot
library(RColorBrewer)
age.group.colors <- brewer.pal(5, "Set2") # 5 colors from palette "Set2"
barplot(table2[3:7,],
col = age.group.colors)
legend(4,1,legend = rownames(table2)[3:7], # trust labels, in the same row order as the bar segments
fill = age.group.colors,
cex = .5)
A stacked bar plot is not very informative when more than two groups are included in the analysis. To improve our comparisons, we can create a grouped bar plot instead.
barplot(table2[3:7,],
beside = T,
col = age.group.colors,
ylim = c(0,.65),
ylab = "Proportion of Age Group")
legend(20,.65,legend = rownames(table2)[3:7], # trust labels, in the same row order as the bars
fill = age.group.colors,
cex = .5)
This grouped bar plot does not show any obvious differences between the trust level breakdown of each age group. In fact, the distribution of trust responses appears nearly identical between age groups.
To plot the values grouped by trust level, we use the matrix transpose function t(), which swaps rows and columns.
table2.graph <- t(table2[3:7,]) #transpose table
barplot(table2.graph,
beside = T,
col = age.group.colors,
ylim = c(0,.6),
ylab = "Proportion of Age Group",
cex.names = .5)
legend(25,.6,legend = colnames(table2), # age-group labels, in the same order as the bars
fill = age.group.colors,
cex = .5)
Grouping the values by trust in government response allows us to better compare the differing proportion of each age group that selected each response.
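To see why transposing changes the grouping, it helps to look at the row names of the matrix being plotted: with beside = TRUE, barplot() draws one cluster per column and one bar per row, so the legend labels are always the row names. The sketch below rebuilds the proportions from the table2 output above (rounded to three places and typed in by hand, so props is an approximation and not the original object).

```r
# Column proportions copied from the table2 output above, rounded to 3 places
props <- matrix(c(0.017, 0.103, 0.266, 0.443, 0.171,
                  0.012, 0.098, 0.297, 0.455, 0.138,
                  0.014, 0.125, 0.301, 0.464, 0.096,
                  0.011, 0.147, 0.329, 0.438, 0.075,
                  0.007, 0.175, 0.329, 0.434, 0.055),
                nrow = 5,
                dimnames = list(c("1. Always", "2. Most of the time",
                                  "3. About half the time",
                                  "4. Some of the time", "5. Never"),
                                c("18-30", "31-40", "41-50", "51-60", "60+")))

by.age   <- props      # clusters = age groups, bars = responses
by.trust <- t(props)   # clusters = responses,  bars = age groups

rownames(by.age)    # legend labels for the age-clustered plot
rownames(by.trust)  # legend labels for the trust-clustered plot
```

Taking the legend labels from the plotted matrix keeps them in the same order as the bars, which is not guaranteed when the labels are taken from the order in which values happen to appear in the data.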
We can summarize the overall differences in trust between age groups by calculating the mean level of trust within each age group. The mean is computed from the numeric version of the trust variable (a mean requires numbers), and the groups are defined by the categorical version of age.
# mean trust in each age group
mean(anes2020.govttrust$n.trust.govt[anes2020.govttrust$age == "18-30"], na.rm = T)
mean(anes2020.govttrust$n.trust.govt[anes2020.govttrust$age == "31-40"], na.rm = T)
mean(anes2020.govttrust$n.trust.govt[anes2020.govttrust$age == "41-50"], na.rm = T)
mean(anes2020.govttrust$n.trust.govt[anes2020.govttrust$age == "51-60"], na.rm = T)
mean(anes2020.govttrust$n.trust.govt[anes2020.govttrust$age == "60+"], na.rm = T)
## [1] 3.649693
## [1] 3.609455
## [1] 3.503698
## [1] 3.418033
## [1] 3.354998
We could do this more easily using the aggregate() function:
# mean trust in each age group
age.trust.means <- aggregate(anes2020.govttrust$n.trust.govt ~ anes2020.govttrust$age, FUN = mean)
age.trust.means
## anes2020.govttrust$age anes2020.govttrust$n.trust.govt
## 1 18-30 3.649693
## 2 31-40 3.609455
## 3 41-50 3.503698
## 4 51-60 3.418033
## 5 60+ 3.354998
Older age groups have lower mean values of n.trust.govt, which indicates a higher average level of trust in government among older age groups (a higher value of n.trust.govt corresponds to trusting the government less often).
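As a side note on the grouped-mean calculation, the formula can also be written with bare column names and a data argument, and tapply() offers an equivalent shortcut. The sketch below uses a small made-up data frame (toy and its columns are hypothetical) to show that both approaches give the same means; note that the formula interface drops rows with missing values by default, while tapply() needs na.rm = TRUE.

```r
# Toy data frame (hypothetical values, not the ANES sample)
toy <- data.frame(
  age     = c("18-30", "18-30", "31-40", "31-40", "31-40", "60+", "60+"),
  n.trust = c(4, 5, 3, NA, 4, 2, 3)
)

# Formula interface with data = ; rows containing NA are dropped by default
means.agg <- aggregate(n.trust ~ age, data = toy, FUN = mean)

# tapply() returns the same means as a named vector; NAs are removed explicitly
means.tap <- tapply(toy$n.trust, toy$age, mean, na.rm = TRUE)

means.agg
means.tap
```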
# plot of mean trust
barplot(age.trust.means[,2],
names.arg = age.trust.means[,1],
ylim = c(0,5),
col = "lightblue",
main = "Average Trust in Government by Age Group",
ylab = "Mean Government Trust Value",
xlab = "Age Group")
text(c(1:5), age.trust.means[,2]+.1, labels = round(age.trust.means[,2], 3))
On a y-axis that spans 0 to 5 (responses themselves range from 1 to 5), the differences between group means are difficult to see. We could consider visualizing this another way.
# plot of mean trust
plot(age.trust.means[,2],
type = "b",
xaxt="n",
ylim = c(3.3,3.7),
main = "Average Trust in Government by Age Group",
ylab = "Mean Government Trust Value",
xlab = "Age Group")
axis(1,c(1:5), age.trust.means[,1])
text(c(1:5), age.trust.means[,2]+.02, labels = round(age.trust.means[,2], 3))
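By narrowing the y-axis to the range of the group means and relabeling the x-axis with the age groups, we are able to display the decrease in mean value across age groups, which we can interpret as an increase in average trust in the government. Keep in mind that the truncated y-axis visually exaggerates differences that are small on the full 1-5 scale.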
# mean trust in government among all voters
t.test(anes2020.govttrust$n.trust.govt, mu = 3)
##
## One Sample t-test
##
## data: anes2020.govttrust$n.trust.govt
## t = 49.004, df = 8242, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 3
## 95 percent confidence interval:
## 3.456881 3.494957
## sample estimates:
## mean of x
## 3.475919
#95 percent confidence interval:
# 3.456881 3.494957
# p-value < 2.2e-16 indicates that we reject the null hypothesis and conclude that the true mean value of trust in government among all voters is not equal to 3. We are 95% confident that the true mean value lies between 3.456881 and 3.494957.
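The test statistic reported above can be reproduced by hand from the printed summary values. This is a worked check, not part of the original analysis: the standard error is not printed by t.test(), so it is recovered here from the width of the confidence interval, and all inputs are copied (rounded) from the output.

```r
# Values copied from the t.test() output above
x.bar <- 3.475919              # sample mean
mu0   <- 3                     # hypothesized mean
dof   <- 8242                  # degrees of freedom (n - 1)
ci    <- c(3.456881, 3.494957) # 95 percent confidence interval

# The interval is x.bar +/- qt(0.975, dof) * SE, so SE can be recovered from its width
se <- diff(ci) / (2 * qt(0.975, dof))

# Test statistic: distance of the sample mean from mu0 in standard-error units
t.stat <- (x.bar - mu0) / se

# Two-sided p-value
p.value <- 2 * pt(-abs(t.stat), dof)

c(se = se, t = t.stat, p = p.value)
```

The resulting statistic agrees with the reported t of 49.004 up to rounding of the inputs.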
# difference in mean trust between age groups (over 50 vs under 50; respondents aged exactly 50 fall in neither group)
t.test(anes2020.govttrust$n.trust.govt[anes2020.govttrust$n.age > 50], anes2020.govttrust$n.trust.govt[anes2020.govttrust$n.age < 50])
##
## Welch Two Sample t-test
##
## data: anes2020.govttrust$n.trust.govt[anes2020.govttrust$n.age > 50] and anes2020.govttrust$n.trust.govt[anes2020.govttrust$n.age < 50]
## t = -10.594, df = 7447.6, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.2522760 -0.1734911
## sample estimates:
## mean of x mean of y
## 3.375270 3.588153
# mean of x: 3.375270
# mean of y: 3.588153
# p-value < 2.2e-16 indicates that we can reject the null hypothesis and conclude that the difference in means between the two age groups is statistically significant, i.e. very unlikely to be due to random chance alone.
There is a significant difference in the average trust in government between voters over and under 50, with the former displaying more trust in the government than the latter.
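One detail of the comparison above is that the strict conditions n.age > 50 and n.age < 50 leave respondents aged exactly 50 out of both groups. The sketch below uses made-up ages (toy.age is hypothetical) to show the effect and one possible alternative split; whether age 50 belongs with the older group is an analyst's choice, not something the data dictates.

```r
# Toy ages (hypothetical), including one NA and two respondents aged exactly 50
toy.age <- c(23, 35, 50, 50, 51, 67, NA, 49)

over  <- toy.age > 50
under <- toy.age < 50

sum(over, na.rm = TRUE)           # respondents strictly over 50
sum(under, na.rm = TRUE)          # respondents strictly under 50
sum(toy.age == 50, na.rm = TRUE)  # respondents left out of both groups

# One way to keep everyone with a known age: 50 and over vs under 50
older <- toy.age >= 50
table(older, useNA = "ifany")
```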
There is a weak positive relationship between age and trust in government. The average trust in government is higher among older age groups than among younger age groups, but the magnitude of these differences is very small. The range of the group means is 0.295, which is only 7.4% of the width of the 1-5 response scale.
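The arithmetic behind these two figures can be checked directly from the group means reported earlier. The values below are copied from the aggregate() output; the scale width of 4 assumes responses coded 1 through 5.

```r
# Group means copied from the aggregate() output above
group.means <- c("18-30" = 3.649693, "31-40" = 3.609455, "41-50" = 3.503698,
                 "51-60" = 3.418033, "60+" = 3.354998)

spread      <- diff(range(group.means))  # largest mean minus smallest mean
scale.width <- 5 - 1                     # responses are coded 1 through 5

round(spread, 3)                      # 0.295
round(100 * spread / scale.width, 1)  # 7.4

# The means fall at every step from the youngest to the oldest group
all(diff(group.means) < 0)
```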
The observed relationship appears to be largely driven by the differing proportions of each age group that selected the responses “Most of the time” (lowest 10%, highest 18%) and “Never” (lowest 5%, highest 17%). Similar proportions (43-46%) of each age group selected the most common response, “Some of the time”.
These results suggest that, while trust in government is similar across all age groups, younger voters have slightly less trust in the government on average than older voters. The pattern is consistent: mean trust rises at every step from the youngest age group to the oldest.