Analysis Writing Example

Data Overview
Research Question
Correlation
Two-way Table
Stacked Barplot
Grouped Barplot
Grouped Barplot - switch groups
Identifying group means
Aggregating to identify group means
Plot group means - bar graph
Plot group means - line graph
Difference in Means
Discussion & Thesis

Data Overview

This data set contains one variable of interest, trust.govt, which contains responses to the question:
“How much of the time do you think you can trust the government in Washington to do what is right? Just about always, most of the time, or only some of the time?”

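This data set also contains the demographic variables marital.status, education, race, income, ideology, party, and age, as well as vote.choice2020.
Each variable exists in a labelled version (a factor, or a character vector for age and vote.choice2020) and as a numeric vector. Numeric variable names are prefixed with n. (for example, n.age).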

load("ANES 2020 Govt Trust.Rdata")
dim(anes2020.govttrust)

## [1] 8280   18

sapply(anes2020.govttrust, class)

##        trust.govt      n.trust.govt    marital.status         education 
##          "factor"         "numeric"          "factor"          "factor" 
##              race            income          ideology             party 
##          "factor"          "factor"          "factor"          "factor" 
##               age  n.marital.status       n.education            n.race 
##       "character"         "numeric"         "numeric"         "numeric" 
##          n.income        n.ideology           n.party             n.age 
##         "numeric"         "numeric"         "numeric"         "numeric" 
##   vote.choice2020 n.vote.choice2020 
##       "character"         "numeric"

Research Question

To define our research question, we need to investigate the relationship between our variable of interest (in this case trust.govt) and some other variable. For example:

Is there a relationship between the age of voters and the amount of trust they have in the government to do what is right?

To begin our examination of this question, we would look at each of our variables.

table(anes2020.govttrust$trust.govt)

## 
##            -9. Refused         -8. Don't know              1. Always 
##                      0                      0                     88 
##    2. Most of the time 3. About half the time    4. Some of the time 
##                   1133                   2569                   3674 
##               5. Never 
##                    779

summary(anes2020.govttrust$n.trust.govt)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   1.000   3.000   4.000   3.476   4.000   5.000      37

table(anes2020.govttrust$age)

## 
## 18-30 31-40 41-50 51-60   60+ 
##  1143  1377  1219  1347  2840

summary(anes2020.govttrust$n.age)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   18.00   37.00   52.00   51.57   66.00   80.00     354

Note that a higher value of n.trust.govt corresponds to trusting the government less often (1 = Always, 5 = Never).
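Because the coding runs from 1 (Always) to 5 (Never), it can be easier to read results on a reversed scale where higher means more trust. The following is a minimal sketch of that idea, using a hypothetical toy vector `n.trust` rather than the ANES data; the rest of this analysis keeps the original coding.

```r
# Hypothetical toy responses on the 1-5 scale (1 = Always, 5 = Never)
n.trust <- c(1, 2, 3, 4, 5, NA)

# Reverse-code so that a higher value means more trust; NA stays NA
trust.reversed <- 6 - n.trust
trust.reversed
```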

Correlation

An analysis can proceed in many ways, but in this case we are going to begin by taking a correlation of the two numeric variables.

# correlation between trust and age
cor(anes2020.govttrust$n.trust.govt,anes2020.govttrust$n.age, use = "complete.obs")

## [1] -0.1347595

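The correlation of about -0.135 indicates a weak negative relationship between n.trust.govt and n.age. Because higher values of n.trust.govt mean less trust, this suggests that older respondents tend to report slightly more trust in government.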
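The use = "complete.obs" argument matters here because both numeric variables contain missing values. As a small illustration, assuming hypothetical toy vectors `x` and `y` (not the ANES data), the sketch below compares the default behaviour with complete-case handling:

```r
# Hypothetical toy vectors with missing values (not the ANES data)
x <- c(1, 2, 3, 4, NA)
y <- c(2, 4, 6, NA, 10)

# Default behaviour: any NA in the inputs makes the result NA
r.default <- cor(x, y)

# use = "complete.obs" keeps only the rows where both values are present.
# Here that leaves rows 1-3, which lie exactly on a line, so r is 1.
r.complete <- cor(x, y, use = "complete.obs")
```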

Two-way Table

Another way to approach an analysis would be to create a two-way table of our two variables and observe the frequency of each combination of values. In order to draw meaningful comparisons between groups, we would need to use the prop.table() function to display these values as proportions.

# two-way table & prop.table
table1 <- table(anes2020.govttrust$trust.govt, anes2020.govttrust$age)
table2 <- prop.table(table1, margin = 2) # proportions within each column (age group)
table2[3:7,]

##                         
##                                18-30       31-40       41-50       51-60
##   1. Always              0.016681299 0.011636364 0.013968776 0.011177347
##   2. Most of the time    0.102721686 0.098181818 0.124897288 0.146795827
##   3. About half the time 0.266022827 0.297454545 0.300739523 0.329359165
##   4. Some of the time    0.443371378 0.454545455 0.464256368 0.438152012
##   5. Never               0.171202809 0.138181818 0.096138044 0.074515648
##                         
##                                  60+
##   1. Always              0.006711409
##   2. Most of the time    0.175203108
##   3. About half the time 0.329212292
##   4. Some of the time    0.434122218
##   5. Never               0.054750971
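With margin = 2, each column of the table is divided by its own total, so the proportions within an age group sum to 1. A minimal sketch of that property, assuming hypothetical toy vectors `response` and `group` in place of the survey variables:

```r
# Hypothetical toy data: 3 response levels, 2 groups
response <- c("A", "A", "B", "C", "A", "B", "B", "C")
group    <- c("young", "young", "young", "young", "old", "old", "old", "old")

counts <- table(response, group)
props  <- prop.table(counts, margin = 2)  # proportions within each column

colSums(props)  # each column sums to 1
```

Using margin = 1 instead would divide by row totals, which answers a different question: what share of each response came from each age group.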

Stacked Barplot

In order to better visualize and draw comparisons between group percentages, we can use a barplot. The default organization is to stack bars on top of one another.
The RColorBrewer package contains preset color palettes for visualizations. See all color palette options here: https://www.r-graph-gallery.com/38-rcolorbrewers-palettes.html

# barplot
library(RColorBrewer) 
age.group.colors <- brewer.pal(5, "Set2") # 5 colors from palette "Set2"

barplot(table2[3:7,], 
        col = age.group.colors)
legend(4,1,legend = rownames(table2)[3:7], # labels in the same row order that barplot() draws
       fill = age.group.colors,
       cex = .5)

A stacked barplot is not very informative when more than two groups are included in the analysis. To improve our comparisons, we can create a grouped bar plot instead.
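When adding a legend to either kind of barplot, the labels need to be in the same order as the rows of the plotted table, which follows the factor's level order. unique() returns values in order of first appearance in the data, so its output may not match. A small sketch of the difference, assuming a hypothetical toy factor `f`:

```r
# Hypothetical toy factor; the levels are ordered, the data are not
f <- factor(c("3. Half", "1. Always", "2. Most", "1. Always"),
            levels = c("1. Always", "2. Most", "3. Half"))

tab <- table(f)

as.character(unique(f))  # order of first appearance in the data
names(tab)               # level order, matching the rows a barplot would draw
```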

Grouped Barplot

barplot(table2[3:7,], 
        beside = T,
        col = age.group.colors,
        ylim = c(0,.65),
        ylab = "Proportion of Age Group")
legend(20,.65,legend = rownames(table2)[3:7], # labels in the same row order that barplot() draws
       fill = age.group.colors,
       cex = .5)

This grouped bar plot does not show any obvious differences between the trust level breakdown of each age group. In fact, the distribution of trust responses appears nearly identical between age groups.

Grouped Barplot - switch groups

In order to plot the values grouped by trust level, we need to use the matrix transpose function: t() - this function swaps rows and columns.

table2.graph <- t(table2[3:7,]) #transpose table

barplot(table2.graph, 
        beside = T,
        col = age.group.colors,
        ylim = c(0,.6),
        ylab = "Proportion of Age Group",
        cex.names = .5)
legend(25,.6,legend = colnames(table2), # age groups, in the order of the table columns
       fill = age.group.colors,
       cex = .5)

Grouping the values by trust in government response allows us to better compare the differing proportion of each age group that selected each response.

Identifying group means

We can summarize the overall differences between age groups’ trust in government by calculating the mean level of trust within each age group. The mean is computed on the numeric version of the trust variable (n.trust.govt), while the groups are defined by the categorical version of age.

# mean trust in each age group
mean(anes2020.govttrust$n.trust.govt[anes2020.govttrust$age == "18-30"], na.rm = T)
mean(anes2020.govttrust$n.trust.govt[anes2020.govttrust$age == "31-40"], na.rm = T)
mean(anes2020.govttrust$n.trust.govt[anes2020.govttrust$age == "41-50"], na.rm = T)
mean(anes2020.govttrust$n.trust.govt[anes2020.govttrust$age == "51-60"], na.rm = T)
mean(anes2020.govttrust$n.trust.govt[anes2020.govttrust$age == "60+"], na.rm = T)

## [1] 3.649693
## [1] 3.609455
## [1] 3.503698
## [1] 3.418033
## [1] 3.354998

Aggregating to identify group means

We could do this more easily using the aggregate() function:

# mean trust in each age group
age.trust.means <- aggregate(anes2020.govttrust$n.trust.govt ~ anes2020.govttrust$age, FUN = mean)
age.trust.means

##   anes2020.govttrust$age anes2020.govttrust$n.trust.govt
## 1                  18-30                        3.649693
## 2                  31-40                        3.609455
## 3                  41-50                        3.503698
## 4                  51-60                        3.418033
## 5                    60+                        3.354998

Older age groups have lower mean values of n.trust.govt, which indicates a higher average level of trust in government among older age groups (a higher value of n.trust.govt corresponds to trusting the government less often).
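The column names in the aggregate() output can be shortened by passing the data frame through the data argument. The sketch below assumes a hypothetical toy data frame `toy` with columns `age` and `n.trust`, and also shows tapply() as an alternative that returns a named vector:

```r
# Hypothetical toy data frame (not the ANES data)
toy <- data.frame(
  age     = c("18-30", "18-30", "60+", "60+", "60+"),
  n.trust = c(4, 5, 3, NA, 2)
)

# The formula interface with data = gives short column names.
# Rows with NA in the formula variables are dropped by default.
means.agg <- aggregate(n.trust ~ age, data = toy, FUN = mean)

# tapply() gives the same group means as a named vector
means.tap <- tapply(toy$n.trust, toy$age, mean, na.rm = TRUE)
```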

Plot group means - bar graph

# plot of mean trust
bp <- barplot(age.trust.means[,2], # bp stores the x-coordinates of the bar midpoints
        names.arg = age.trust.means[,1],
        ylim = c(0,5),
        col = "lightblue",
        main = "Average Trust in Government by Age Group",
        ylab = "Mean Government Trust Value",
        xlab = "Age Group")
text(bp, age.trust.means[,2]+.1, labels = round(age.trust.means[,2], 3)) # center labels over bars

Compared to the full 0-5 range of the y-axis, the differences between group means are difficult to see. We could consider visualizing this another way.
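One detail when labelling bars with text(): barplot() returns the x-coordinates of the bar midpoints, and with the default width and spacing these are not the integers 1, 2, 3. A minimal sketch, assuming hypothetical values in `heights` and drawing to a null PDF device so that it runs without opening a window:

```r
pdf(NULL)  # null device: nothing is written to disk

heights <- c(3.6, 3.5, 3.4)  # hypothetical group means
mids <- barplot(heights, ylim = c(0, 5))

# With default width and spacing the midpoints are 0.7, 1.9, 3.1,
# so labels are centered by passing mids rather than 1:3
text(mids, heights + 0.1, labels = heights)

invisible(dev.off())
```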

Plot group means - line graph

# plot of mean trust
plot(age.trust.means[,2],
        type = "b",
        xaxt="n",
        ylim = c(3.3,3.7),
        main = "Average Trust in Government by Age Group",
        ylab = "Mean Government Trust Value",
        xlab = "Age Group")
axis(1,c(1:5), age.trust.means[,1])
text(c(1:5), age.trust.means[,2]+.02, labels = round(age.trust.means[,2], 3))

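By narrowing the y-axis to the range 3.3-3.7 and relabeling the x-axis with the age groups, we are able to display the decrease in mean value between age groups, which we can interpret as an increase in average trust in the government. Keep in mind that the narrowed axis makes the differences look larger than they are relative to the full 1-5 scale.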

Difference in Means

Null hypothesis 1: On average, Americans trust the government “about half the time” (mean value = 3).
Null hypothesis 2: There is no difference in average trust in government between voters over 50 and voters under 50 (mean value age>50 = mean value age<50).

# mean trust in government among all voters
t.test(anes2020.govttrust$n.trust.govt, mu = 3)

## 
##  One Sample t-test
## 
## data:  anes2020.govttrust$n.trust.govt
## t = 49.004, df = 8242, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 3
## 95 percent confidence interval:
##  3.456881 3.494957
## sample estimates:
## mean of x 
##  3.475919

#95 percent confidence interval:
 # 3.456881 3.494957

# p-value < 2.2e-16 indicates that we can reject the null and conclude that the true mean value of trust in government among all voters is not equal to 3. We are 95% confident that the true mean value lies between 3.456881 and 3.494957.

# difference in mean trust between age groups (over 50 vs under 50)
t.test(anes2020.govttrust$n.trust.govt[anes2020.govttrust$n.age > 50], anes2020.govttrust$n.trust.govt[anes2020.govttrust$n.age < 50])

## 
##  Welch Two Sample t-test
## 
## data:  anes2020.govttrust$n.trust.govt[anes2020.govttrust$n.age > 50] and anes2020.govttrust$n.trust.govt[anes2020.govttrust$n.age < 50]
## t = -10.594, df = 7447.6, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.2522760 -0.1734911
## sample estimates:
## mean of x mean of y 
##  3.375270  3.588153

# mean of x: 3.375270
# mean of y: 3.588153 

# p-value < 2.2e-16 indicates that we can reject the null: the difference in means between the two age groups is statistically significant, meaning a difference this large would be very unlikely to arise by chance alone if the true means were equal.

There is a significant difference in the average trust in government between voters over and under 50, with the former displaying more trust in the government than the latter.
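The pieces of a two-sample test can also be pulled out of the object that t.test() returns. The sketch below assumes a hypothetical toy data frame `toy` with columns `n.age` and `n.trust` (not the ANES data). It also illustrates that, with the conditions > 50 and < 50, a respondent aged exactly 50 falls into neither group.

```r
# Hypothetical toy data (not the ANES data)
toy <- data.frame(
  n.age   = c(25, 32, 41, 48, 50, 55, 63, 70, 78),
  n.trust = c(4, 5, 4, 3, 4, 3, 4, 3, 2)
)

older   <- toy$n.trust[toy$n.age > 50]
younger <- toy$n.trust[toy$n.age < 50]

# The row with n.age == 50 satisfies neither condition and is left out
n.used <- length(older) + length(younger)  # 8 of the 9 rows

res <- t.test(older, younger)  # Welch test is the default (var.equal = FALSE)
res$estimate  # the two group means
res$conf.int  # 95% confidence interval for the difference in means
res$p.value
```

Changing one of the conditions to >= 50 or <= 50 would assign those respondents to a group, if that better matches the research question.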

Discussion & Thesis

There is a weak positive relationship between age and trust in government (seen as a weak negative correlation with n.trust.govt, where higher values mean less trust). The average trust in government is higher among older age groups compared to younger age groups, but the magnitude of these differences is very small. The range of all group mean values is 0.295, which is only 7.4% of the entire range of possible answers to the trust in government question.
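The 0.295 and 7.4% figures follow directly from the group means reported in the aggregate() output. A short sketch of the arithmetic, with the means typed in by hand and the names `group.means`, `spread`, and `scale.width` chosen only for illustration:

```r
# Group means copied from the aggregate() output
group.means <- c(3.649693, 3.609455, 3.503698, 3.418033, 3.354998)

spread      <- diff(range(group.means))  # largest minus smallest group mean
scale.width <- 5 - 1                     # responses run from 1 to 5

round(spread, 3)                         # 0.295
round(100 * spread / scale.width, 1)     # 7.4 (percent of the scale)
```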

The observed relationship appears to be largely driven by the differing proportions of each age group that selected the responses “Most of the time” (lowest: 10%, highest: 18%) and “Never” (lowest: 5%, highest: 17%). Similar proportions (43-46%) of each age group selected the most common response, “Some of the time”.

These results suggest that, on average, while trust in government is similar between all age groups, younger voters have slightly less trust in the government than older voters. The pattern is consistent across the analysis: the mean value of n.trust.govt falls with each successive age group, from 18-30 through 60+.