This week’s coding goals

This week my main focus was to complete the exploratory analyses coding.

Achieving the goals

Second exploratory research question

So for my next research question, I am looking at whether political orientation differs between age groups and whether this is also reflected in a correlation with belief superiority.

As with gender, levels of conservatism and liberalism could also differ between people of different age groups. This could be due to generational differences between age groups, or it could be an indication that individuals become more conservative as they grow older. It would also be interesting to test whether these differences in conservatism and liberalism translate into higher/lower belief superiority, since Harris and van Bavel did find that individuals with higher conservatism also had higher dogmatism.

So to start off, I will make a new dataframe following Jennifer’s advice from the week 9 Q and A. However, all the code for finding the mean dogmatism scores will stay the same. This time I also included the mean-centered political orientation variable.

library(tidyverse)
library(dplyr)
library(ggplot2)
library(car)
library(ggeasy)

Ex_data <- read.csv("beliefsuperiority_all.csv")
Ex_data <- filter(Ex_data,Q62 == 1)

Ex_dog= filter(Ex_data,AC_a==3) %>% 
  filter(AC_b==5)

Ex_dog=dplyr::select(Ex_dog,-starts_with('AC'))

Ex_dog$Q37_2 = recode(Ex_dog$Q37_2, '1=9; 2=8; 3=7; 4=6; 6=4; 7=3; 8=2; 9=1')
Ex_dog$Q37_4 = recode(Ex_dog$Q37_4, '1=9; 2=8; 3=7; 4=6; 6=4; 7=3; 8=2; 9=1')
Ex_dog$Q37_5 = recode(Ex_dog$Q37_5, '1=9; 2=8; 3=7; 4=6; 6=4; 7=3; 8=2; 9=1')
Ex_dog$Q37_7 = recode(Ex_dog$Q37_7, '1=9; 2=8; 3=7; 4=6; 6=4; 7=3; 8=2; 9=1')
Ex_dog$Q37_10 = recode(Ex_dog$Q37_10, '1=9; 2=8; 3=7; 4=6; 6=4; 7=3; 8=2; 9=1')
Ex_dog$Q37_11 = recode(Ex_dog$Q37_11, '1=9; 2=8; 3=7; 4=6; 6=4; 7=3; 8=2; 9=1')
Ex_dog$Q37_13 = recode(Ex_dog$Q37_13, '1=9; 2=8; 3=7; 4=6; 6=4; 7=3; 8=2; 9=1')
Ex_dog$Q37_16 = recode(Ex_dog$Q37_16, '1=9; 2=8; 3=7; 4=6; 6=4; 7=3; 8=2; 9=1')
Ex_dog$Q37_18 = recode(Ex_dog$Q37_18, '1=9; 2=8; 3=7; 4=6; 6=4; 7=3; 8=2; 9=1')
Ex_dog$Q37_19 = recode(Ex_dog$Q37_19, '1=9; 2=8; 3=7; 4=6; 6=4; 7=3; 8=2; 9=1')

dogscale=dplyr::select(Ex_dog,starts_with('Q37'))
Ex_dog$meanDog=rowMeans(dogscale,na.rm = TRUE)


#political orientation
Ex_dog$PO_c= Ex_dog$Q12-mean(Ex_dog$Q12,na.rm=TRUE)
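The ten recode() lines above all apply the same mapping, and on a 1 to 9 scale that mapping is just 10 minus the original score. Here is a minimal sketch of a more compact version, using a small made-up data frame (toy_dog and its values are hypothetical; only the Q37 item names come from my data):

```r
# Toy data standing in for Ex_dog (values are made up for illustration)
toy_dog <- data.frame(
  Q37_1 = c(1, 5, 9),
  Q37_2 = c(1, 5, 9),
  Q37_4 = c(2, NA, 8)
)

# Items that need reverse scoring (a subset of the real list, for brevity)
reverse_items <- c("Q37_2", "Q37_4")

# On a 1-9 scale, reversing is 10 - x (1 -> 9, 2 -> 8, ..., 5 -> 5, ..., 9 -> 1)
toy_dog[reverse_items] <- lapply(toy_dog[reverse_items], function(x) 10 - x)

# Row means across all dogmatism items, ignoring missing answers
toy_dog$meanDog <- rowMeans(toy_dog[grep("^Q37", names(toy_dog))], na.rm = TRUE)

toy_dog
```

With the real data, reverse_items would list all ten reverse-scored items and toy_dog would be replaced by Ex_dog.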

I then made separate age groups so that it was easier to compare the average mean-centered political orientation score between the groups. Since I want a relatively equal number of participants in each age group, I am using the cut_number() function from the ggplot2 package.

Ex_dog$age_groups <- cut_number(Ex_dog$age, 8)

This creates the following levels: 18-25, 25-32, 32-37, 37-44, 44-50, 50-58, 58-66 and 66-86.
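To check how these labels should be read, here is a small sketch with made-up ages. It uses base R's cut() with include.lowest = TRUE, which, as far as I can tell, is what cut_number() calls internally once it has worked out the breaks. The ages and breaks below are hypothetical.

```r
# Made-up ages around the bin edges
ages <- c(18, 24, 25, 26, 32, 33)

# Same style of bins as above: closed on the right, lowest value included
bins <- cut(ages, breaks = c(18, 25, 32, 37), include.lowest = TRUE)

# Each age falls into exactly one bin; 25 lands in [18,25], not (25,32]
data.frame(ages, bins)
table(bins)
```

If I read the notation correctly, a round bracket means the endpoint is excluded and a square bracket means it is included, so a 25-year-old would only be counted in the first group.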

For the descriptives, I’ll be finding the average political orientation of the age groups I have established above. So using the same method I used for the descriptives of the gender question, I used gt() from the gt package to create a table showing the average mean-centered political orientation, the SD and SE that each of the age groups have.

library(gt)

Age_PO <- Ex_dog %>% group_by(age_groups) %>% 
  summarise(mean = mean(PO_c, na.rm = TRUE),
            sd = sd(PO_c, na.rm = TRUE),
            n = n(),
            se = sd/sqrt(n))
Age_PO %>% gt() %>% fmt_number(columns = c(mean, sd, se),
      decimals = 2)
age_groups mean sd n se
[18,25] 0.32 1.44 96 0.15
(25,32] 0.25 1.53 90 0.16
(32,37] 0.31 1.48 82 0.16
(37,44] 0.04 1.68 94 0.17
(44,50] −0.09 1.64 84 0.18
(50,58] −0.39 1.69 93 0.18
(58,66] −0.23 1.85 80 0.21
(66,86] −0.21 1.81 88 0.19

Since I am using mean-centered variables, I looked at the actual political orientation question used in the original study and found that higher values reflect more extreme liberalism and lower values reflect more extreme conservatism. This means that for mean-centered political orientation, values further below zero indicate more conservatism than the sample average and values further above zero indicate more liberalism than the sample average. Looking at the table, older age groups tend to be slightly more conservative and younger age groups tend to be slightly more liberal. The 18-25 age group is the most liberal (0.32) and the 50-58 age group is the most conservative (−0.39). Overall, however, none of the age groups is far from the sample average: every group mean lies within about 0.4 scale points of zero.

Unfortunately, I had a lot of trouble this week understanding what type of variable political orientation was. So for this research question I wasn’t able to create a suitable graph (see challenges) nor was I able to decide on an appropriate statistical test.
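One option I could look into, as a sketch rather than a decision: if the 7-point political orientation item is treated as ordinal, a rank-based (Spearman) correlation with age avoids assuming equal spacing between the response options. The data below are simulated and the toy_ names are hypothetical.

```r
set.seed(1)

# Simulated stand-ins: age in years and a 1-7 political orientation rating
toy_age <- sample(18:86, 200, replace = TRUE)
toy_po  <- sample(1:7, 200, replace = TRUE)

# Spearman's rank correlation; exact = FALSE because ties are unavoidable
# with only seven response options
po_age_test <- cor.test(toy_age, toy_po, method = "spearman", exact = FALSE)

po_age_test$estimate  # Spearman's rho
po_age_test$p.value   # p-value for the null hypothesis rho = 0
```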

For the second part of my question, I used the same gt() method with the age groups I made prior to see the mean level of dogmatism between each group.

Age_Dog <- Ex_dog %>% group_by(age_groups) %>% 
  summarise(mean = mean(meanDog, na.rm = TRUE),
            sd = sd(meanDog, na.rm = TRUE),
            n = n(),
            se = sd/sqrt(n))

Age_Dog %>% gt() %>% fmt_number(columns = c(mean, sd, se),
      decimals = 2)
age_groups mean sd n se
[18,25] 4.39 0.97 96 0.10
(25,32] 4.51 0.96 90 0.10
(32,37] 4.60 1.12 82 0.12
(37,44] 4.65 0.92 94 0.10
(44,50] 4.47 1.22 84 0.13
(50,58] 4.35 1.04 93 0.11
(58,66] 4.28 1.38 80 0.15
(66,86] 4.09 1.22 88 0.13

The table shows that the youngest age group (18-25) and the older age groups (50-86) tend to show less dogmatism while participants ranging from 25 to 50 tend to be more dogmatic.

To visualize this data, I will use a scatterplot since both variables (age and dogmatism) are continuous. Following the week 8 Q and A, I used ggplot() and geom_point() to make a basic scatterplot. I then used geom_smooth() with the method “lm” to plot a regression line. Finally, I used position_jitter() to add a little random noise to the plotted points so that overlapping points, and any trends, are easier to see.

ggplot(Ex_dog, aes(age, meanDog)) +
  geom_point(position = position_jitter(width = 3), size = 2.5, alpha = 0.6) +
  geom_smooth(method = "lm") +
  labs(x = "Age", y = "Mean Dogmatism")
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).

Based on the graph, it seems that there isn’t really much of a correlation between age and dogmatism. If anything at all, older participants seem to be only slightly less dogmatic.

Now to check the strength of the correlation, I followed the correlation matrix example from the week 9 Q and A and used the cor() function. (cor() itself comes from base R's stats package; corrplot is only needed for plotting a correlation matrix.)

library(corrplot)
## corrplot 0.90 loaded
corr_age_dog <- cor(Ex_dog$age, Ex_dog$meanDog, use = "complete.obs")

The cor() function showed me that r = -0.11, a weak negative correlation. This fits the scatterplot: there is very little relationship between age and dogmatism, with older participants being only slightly less dogmatic.
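cor() only returns the coefficient. Here is a sketch of how cor.test() from base R could add a confidence interval and a p-value, using simulated data in place of Ex_dog (the toy_ names and all numbers are made up):

```r
set.seed(42)

# Simulated stand-ins for age and mean dogmatism, with one missing score
toy_age     <- runif(300, min = 18, max = 86)
toy_meanDog <- 4.8 - 0.007 * toy_age + rnorm(300, sd = 1.1)
toy_meanDog[5] <- NA

# Pearson correlation test; incomplete pairs are dropped automatically
age_dog_test <- cor.test(toy_age, toy_meanDog)

age_dog_test$estimate  # r
age_dog_test$conf.int  # 95% confidence interval for r
age_dog_test$p.value   # p-value for the null hypothesis r = 0
```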

Third exploratory research question

For my third exploratory analysis question, I was interested in whether there was any significant difference in the controversiality scores on different topics between Republicans and Democrats. I was particularly interested in these two groups since they are politically polarized, so it would be interesting to see whether both groups rated certain political topics similarly or differently.

First I had to read the pilot study data with the read.csv() function since it was saved as a .csv file. I then created new dataframes for Republicans and Democrats using the filter() function from dplyr.

Ex_Pdata <- read.csv("pilotdata_all.csv") 

Republicans = filter(Ex_Pdata, PA==2)
Democrats = filter(Ex_Pdata, PA==1)

At first, I used the subset() function and the ! sign to remove rows that had 3, 4 or 5 in the column PA. However, I found that this made the analyses more complex, so I decided to create new dataframes for Republicans and Democrats instead of removing subsets for Independents, Other and participants with no political affiliation.

I then calculated the column means for columns 8 to 29 since these corresponded to the controversiality topic items used in the study. To do this I used colMeans() to calculate the column means, together with the subset() function to select the specific columns that I wanted means for (8 to 29). I also checked the means I got against the means in the pilot study data to make sure that I was doing the right thing (which I was!!).

R_means = colMeans(subset(Republicans, select = c(8:29)), na.rm=TRUE)
D_means = colMeans(subset(Democrats, select = c(8:29)), na.rm=TRUE)

Now I want to present the means in a table for the descriptives. So using the gt() function, I tried to create two tables of the mean controversiality ratings, one for Republicans and one for Democrats.

Controversiality <- Republicans %>% separate(Republicans, Q6, into=c(8:29)) %>% group_by(Q6) %>% summarise(mean = colMeans(subset(select = c(8:29)), na.rm=TRUE),
            sd = sd(R_means, na.rm = TRUE),
            n = n(),
            se = sd/sqrt(n))

Controversiality <- gt() %>% fmt_number(columns = vars(mean,sd,se),
      decimals = 2)

Controversiality <- Democrats %>% separate(Democrats, Q6, into=c(8:29)) %>% group_by(Q6) %>% summarise(mean = colMeans(subset(select = c(8:29)), na.rm=TRUE),
            sd = sd(D_means, na.rm = TRUE),
            n = n(),
            se = sd/sqrt(n))

Unfortunately, I couldn’t figure out how to create a table that displays the means I already found in R_means. I also don’t know whether using the separate() function to create a grouping variable was correct.
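A possible way to get from the column means to a table, sketched in base R with a made-up stand-in for the topic columns of the Republicans data frame (the topic names and ratings are hypothetical). The idea is to compute each statistic per column and then bind them into one data frame with one row per topic, which could then be passed to gt().

```r
# Made-up stand-in for the topic columns (columns 8 to 29 in the real data)
toy_topics <- data.frame(
  abortion    = c(6, 7, 5, NA),
  gun_control = c(4, 6, 5, 7),
  taxes       = c(3, 4, 2, 3)
)

# One row per topic: mean, SD and number of non-missing ratings
topic_table <- data.frame(
  topic = names(toy_topics),
  mean  = colMeans(toy_topics, na.rm = TRUE),
  sd    = sapply(toy_topics, sd, na.rm = TRUE),
  n     = colSums(!is.na(toy_topics))
)
rownames(topic_table) <- NULL

# Standard error based on the non-missing count for each topic
topic_table$se <- topic_table$sd / sqrt(topic_table$n)

topic_table
```

With the real data, toy_topics would be replaced by subset(Republicans, select = 8:29), and the same steps repeated for Democrats.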

I also became stumped on whether a t-test was appropriate for comparing the mean controversiality ratings between topics and political affiliations.
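One approach I have seen suggested for this kind of comparison, sketched here with simulated ratings: run a Welch t-test for each topic and then adjust the p-values for multiple comparisons with p.adjust(). The group data, the topic names and the choice of the Holm method are all assumptions for illustration, not something taken from the original study.

```r
set.seed(7)

# Simulated controversiality ratings (1-7) for two groups on three made-up topics
topics  <- c("topic_a", "topic_b", "topic_c")
toy_rep <- as.data.frame(replicate(3, sample(1:7, 40, replace = TRUE)))
toy_dem <- as.data.frame(replicate(3, sample(1:7, 45, replace = TRUE)))
names(toy_rep) <- topics
names(toy_dem) <- topics

# Welch two-sample t-test for each topic (t.test() default: unequal variances)
raw_p <- sapply(topics, function(tp) t.test(toy_rep[[tp]], toy_dem[[tp]])$p.value)

# Holm correction controls the family-wise error rate across topics
adj_p <- p.adjust(raw_p, method = "holm")

data.frame(topic = topics, raw_p = raw_p, adj_p = adj_p)
```

Whether separate t-tests are the right tool at all is still an open question for me; this only shows how the correction step would work if they are.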

Challenges and successes

Challenges

I had quite a lot of issues regarding my second research question for this week…

  1. By using the cut_number() function, each group has a relatively equal number of participants, but the last group spans 20 years, which is drastically different from the spans of the other groups (typically 5-8 years). This method also looks like it creates an overlap between the age groups (e.g. 25 appears in the labels of two groups), although the round bracket in (25,32] should mean that 25 itself is only counted in [18,25].

  2. During this task, I had a lot of trouble figuring out which graphs were appropriate for the two variables I have. At first I did a scatterplot following the original study, which plotted political orientation on the x-axis and dogmatism (which was a continuous variable, like age) on the y-axis. I ended up with this graph, in which it was hard to see any trends.

ggplot(Ex_dog, aes(PO_c, age)) + geom_point(position = position_jitter(width = .15), size = 2.5, alpha = 0.6) + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).

However, upon further inspection, the items in the scale seem to be either discrete numerical variables or categorical variables. After Googling, it seems that either type of variable calls for a bar graph. So I tried it out, which gave me this wacky graph.

ggplot(Ex_dog) + geom_col(aes(age, PO_c)) + coord_flip() 
## Warning: Removed 1 rows containing missing values (position_stack).

So then I tried to plot PO_c on the x-axis, which made it worse.

ggplot(Ex_dog) + geom_col(aes(PO_c, age)) + coord_flip()
## Warning: Removed 1 rows containing missing values (position_stack).

Then I tried to do a horizontal boxplot with a scatter, which gave me this…

ggplot(Ex_dog, aes(age_groups, PO_c)) + geom_boxplot() + coord_flip() + geom_jitter(alpha=.4)
## Warning: Removed 1 rows containing non-finite values (stat_boxplot).
## Warning: Removed 1 rows containing missing values (geom_point).

This graph is a bit clearer than the previous graphs if I only look at the boxes, but it’s still really messy to look at. I’m not really sure where I’m making errors.

  3. The first time I ran the scatterplot chunk for age and dogmatism, the line of regression plotted correctly near the middle of the graph. However, when I ran the chunk again, the line of regression shifted to the top of the graph.

  4. As seen above, I also had issues creating a table for my third research question and deciding whether t-tests were appropriate.

Successes

One of the triumphs I had this week was that I was able to fix an error I had with the cor() function. Initially, the correlation stored in corr_age_dog showed NA_real_, but using Google, I found that I just had to include the use argument to tell R to only use complete observations.
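A tiny illustration of what was going on, with made-up numbers (the toy_ vectors are hypothetical):

```r
# Made-up values with one missing dogmatism score
toy_age     <- c(20, 35, 50, 65, 80)
toy_meanDog <- c(4.5, 4.8, NA, 4.2, 4.0)

# By default cor() returns NA as soon as either vector contains a missing value
r_default <- cor(toy_age, toy_meanDog)

# use = "complete.obs" drops incomplete pairs before computing the correlation
r_complete <- cor(toy_age, toy_meanDog, use = "complete.obs")

r_default
r_complete
```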

I also finally found an explanation for why the graphs in the published paper were flipped. Looking at the political orientation questions in the original study, higher values reflected stronger liberalism and lower values reflected stronger conservatism. However, the description of the published figure stated the opposite: lower values depicted stronger liberalism and higher values depicted stronger conservatism. This is why the graph in the published paper is a flipped version of the graph produced from the codebook (though it would still have been useful for the researchers to have recorded this change in their codebook).

The next stage

For next week, I hope to fix the issues that I have encountered this week and perhaps also dive into changing the appearance of the graphs for aesthetic purposes. I also hope to finish my verification report by the end of next week.

Questions for Q&A

  1. For the descriptive statistics on age and political orientation, is it ok to separate the age groups like I did? Since the labels of neighbouring age groups share a boundary value (e.g. 25), would this change the mean political orientation scores?

  2. Since the political orientation scale has 7 response options, with 1 being extreme conservatism and 7 being extreme liberalism, would this mean that this scale produces a categorical variable?

  3. Would a boxplot be an appropriate graph for graphing age and political orientation?

  4. Is my reason for my 3rd research question a good justification?

  5. For my 3rd research question, is the separate() function appropriate to create a new characteristic for the group_by() function?

  6. Is there a way to create a table from the means I already have in R_means, rather than re-using the entire line of code for finding R_means?

  7. Since there are 20 topics of controversiality to compare between Republicans and Democrats, what would be a more appropriate statistical analysis? Or would a t-test for each topic be enough?