Week 8

Goals

My goals for this week were:

  1. Finish and present our group presentation! (Yay!!)

  2. Think of some questions to use for exploratory analysis

  3. Start coding one of my questions for the exploratory analysis

Thinking of questions

To start thinking of some potential questions to ask the data, I first used the glimpse() function to look at what other variables are in this data set.

glimpse(data)
## Rows: 1,319
## Columns: 65
## $ Q62             <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ immigration_a   <dbl> 2, 2, 1, 1, 1, 1, 4, 4, 3, 5, 2, 1, 1, 2, 3, 1, 4, 3, …
## $ immigration_b   <dbl> 3, 2, 5, 5, 3, 5, 2, 2, 1, 5, 3, 2, 3, 2, 1, 3, 1, 2, …
## $ abortion_a      <dbl> 3, 5, 1, 1, 2, 2, 3, 3, 1, 4, 3, 2, 1, 4, 5, 3, 2, 5, …
## $ abortion_b      <dbl> 3, 2, 5, 5, 4, 2, 4, 3, 1, 3, 3, 3, 2, 3, 1, 3, 5, 5, …
## $ vote_a          <dbl> 3, 3, 2, 2, 2, 2, 3, 4, 3, 5, 3, 2, 2, 3, 2, 2, 2, 2, …
## $ vote_b          <dbl> 3, 2, 4, 5, 5, 3, 4, 2, 1, 5, 1, 2, 5, 3, 1, 3, 5, 3, …
## $ tax_a           <dbl> 2, 2, 1, 1, 3, 3, 3, 3, 2, 2, 2, 3, 3, 2, 1, 1, 3, 1, …
## $ tax_b           <dbl> 3, 2, 4, 1, 1, 3, 1, 2, 1, 2, 1, 5, 3, 3, 2, 3, 1, 5, …
## $ torture_a       <dbl> 3, 3, 2, 3, 3, 5, 4, 4, 2, 3, 3, 3, 4, 3, 3, 3, 2, 5, …
## $ torture_b       <dbl> 4, 1, 1, 1, 1, 5, 1, 4, 1, 2, 3, 1, 1, 2, 3, 3, 4, 4, …
## $ affirmaction_a  <dbl> 4, 4, 2, 3, 3, 2, 4, 3, 4, 5, 4, 2, 2, 2, 4, 3, 2, 2, …
## $ affirmaction_b  <dbl> 3, 2, 5, 1, 1, 5, 3, 3, 1, 5, 3, 2, 2, 1, 3, 3, 1, 2, …
## $ military_a      <dbl> 3, 2, 1, 1, 5, 3, 4, 2, 2, 4, 3, 2, 2, 3, 5, 2, 3, 3, …
## $ military_b      <dbl> 3, 1, 5, 1, 3, 4, 3, 3, 1, 2, 2, 3, 4, 3, 4, 3, 1, 5, …
## $ covidgov_a      <dbl> 2, 4, 1, 3, 3, 3, 4, 3, 3, 3, 3, 3, 5, 4, 3, 4, 2, 2, …
## $ covidgov_b      <dbl> 2, 3, 5, 1, 3, 3, 2, 3, 1, 2, 3, 4, 3, 3, 2, 3, 1, 4, …
## $ AC_a            <dbl> 3, 3, 3, 3, 3, 3, 4, 4, 3, 3, 3, 3, 2, 3, 3, 3, 2, 2, …
## $ AC_b            <dbl> 2, 5, 5, 5, 5, 5, 5, 1, 1, 5, 5, 5, 4, 5, 5, 5, 5, 1, …
## $ Q37_1           <dbl> 7, 4, 9, 6, 5, 5, 7, 7, 5, 3, 5, 2, 4, 6, 1, 7, 2, 2, …
## $ Q37_2           <dbl> 6, 9, 9, 9, 9, 6, 4, 4, 5, 4, 5, 2, 7, 5, 9, 8, 4, 6, …
## $ Q37_3           <dbl> 6, 3, 9, 3, 4, 5, 8, 6, 1, 4, 5, 4, 7, 8, 5, 7, 6, 4, …
## $ Q37_4           <dbl> 8, 9, 9, 6, 8, 5, 6, 8, 6, 3, 5, 4, 3, 5, 8, 7, 8, 4, …
## $ Q37_5           <dbl> 5, 7, 9, 5, 8, 5, 3, 8, 4, 9, 5, 8, 6, 6, 9, 7, 6, 7, …
## $ Q37_6           <dbl> 6, 5, 9, 6, 4, 5, 8, 6, 7, 8, 1, 9, 5, 5, 5, 7, 5, 7, …
## $ Q37_7           <dbl> 2, 7, 9, 5, 8, 6, 7, 4, 1, 3, 5, 1, 3, 2, 7, 7, 6, 2, …
## $ Q37_8           <dbl> 5, 4, 9, 5, 6, 6, 9, 4, 3, 6, 5, 8, 2, 7, 6, 7, 3, 3, …
## $ Q37_9           <dbl> 6, 2, 8, 2, 4, 5, 2, 6, 4, 4, 5, 4, 6, 5, 7, 7, 8, 5, …
## $ Q37_10          <dbl> 8, 9, 9, 5, 8, 6, 6, 5, 5, 4, 5, 9, 3, 5, 7, 7, 5, 5, …
## $ Q37_11          <dbl> 7, 6, 8, 7, 9, 7, 6, 7, 7, 4, 5, 4, 2, 5, 9, 7, 4, 2, …
## $ Q37_12          <dbl> 7, 5, 9, 4, 6, 4, 9, 6, 5, 7, 5, 9, 6, 6, 6, 7, 6, 9, …
## $ Q37_13          <dbl> 8, 9, 7, 8, 8, 5, 6, 2, 5, 3, 5, 2, 5, 5, 5, 7, 7, 7, …
## $ Q37_14          <dbl> 1, 2, 8, 2, 5, 4, 5, 5, 1, 3, 3, 9, 6, 5, 5, 7, 6, 2, …
## $ Q37_15          <dbl> 5, 6, 9, 4, 3, 6, 9, 5, 5, 1, 5, 6, 4, 6, 3, 8, 5, 3, …
## $ Q37_16          <dbl> 8, 8, 9, 5, 7, 4, 8, 7, 3, 4, 5, 4, 3, 4, 7, 7, 3, 3, …
## $ Q37_17          <dbl> 7, 7, 9, 7, 2, 5, 1, 7, 4, 1, 5, 3, 5, 5, 2, 7, 6, 1, …
## $ Q37_18          <dbl> 9, 9, 9, 5, 9, 5, 7, 3, 5, 3, 4, 1, 5, 5, 7, 8, 5, 7, …
## $ Q37_19          <dbl> 6, 8, 9, 5, 6, 5, 8, 8, 5, 3, 5, 3, 7, 7, 5, 8, 5, 1, …
## $ Q37_20          <dbl> 7, 5, 9, 2, 3, 3, 5, 7, 5, 1, 1, 4, 8, 5, 1, 7, 2, 2, …
## $ Q8              <dbl> 1, 4, 1, 4, 4, 5, 4, 3, 1, 4, 4, 2, 4, 1, 1, 4, 2, 4, …
## $ Q10             <dbl> 5, 6, 4, 7, 2, 4, 6, 3, 4, 3, 1, 5, 5, 4, 4, 6, 4, 1, …
## $ Q12             <dbl> 5, 1, 4, 1, 4, 4, 2, 3, 3, 5, 2, 2, 5, 4, 4, 6, 3, 4, …
## $ Q39             <dbl> 5, 4, 7, 1, 4, 4, 6, 1, 2, 5, 2, 1, 4, 4, 4, 6, 3, 4, …
## $ Q40             <dbl> 5, 2, 7, 1, 4, 4, 4, 4, 3, 5, 2, 1, 3, 4, 5, 6, 3, 4, …
## $ Q14             <dbl> 6, 5, 7, 4, 6, 6, 6, 5, 6, 6, 5, 6, 4, 6, 5, 6, 5, 4, …
## $ Q16             <dbl> 1, 1, 2, 2, 3, 2, 3, 2, 1, 1, 3, 1, 1, 2, 1, 3, 3, 3, …
## $ Q18             <dbl> 1, 2, 1, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, …
## $ Q20             <dbl> 26, 83, 41, 55, 35, 35, 24, 28, 49, 24, 53, 32, 49, 25…
## $ Q62_1           <dbl> 4, 5, 5, 5, 5, 5, 5, 7, 5, 4, 4, 3, 3, 2, 5, 2, 5, 5, …
## $ Q44_1           <dbl> 5, 5, 1, 6, 5, 7, 6, 6, 4, 7, 4, 3, 2, 4, 4, 6, 4, 1, …
## $ Q44_2           <dbl> 4, 5, 7, 7, 6, 6, 6, 2, 7, 6, 6, 7, 5, 4, 7, 6, 6, 2, …
## $ Q44_3           <dbl> 5, 7, 7, 7, 4, 6, 5, 7, 7, 7, 7, 2, 7, 4, 7, 6, 6, 3, …
## $ Q44_4           <dbl> 6, 7, 1, 7, 4, 6, 4, 2, 7, 7, 7, 5, 3, 4, 4, 7, 6, 4, …
## $ Q44_5           <dbl> 5, 2, 1, 1, 2, 5, 3, 7, 1, 1, 1, 5, 1, 4, 4, 3, 4, 5, …
## $ Q58             <chr> "it was so cool.", "IT SEEMED WELL DESIGNED", "i like …
## $ rid             <chr> "5eb32f06-1a93-fad6-f05d-1a2c5bd2c3b3", "5eb3308c-7aa0…
## $ age             <dbl> 21, 83, 41, 55, 35, 35, 24, 28, 49, 24, 53, 33, 49, 25…
## $ gender          <dbl> 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, …
## $ hhi             <dbl> 11, 2, 20, 3, 1, 11, 2, 9, 13, 2, 12, 6, 3, 19, 12, 17…
## $ ethnicity       <dbl> 11, 1, 1, 1, 1, 1, 11, 3, 1, 15, 1, 2, 2, 4, 1, 4, 1, …
## $ hispanic        <dbl> 8, 1, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ education       <dbl> 1, 2, 7, 2, 6, 6, 1, 6, 7, 6, 4, 6, 2, 6, 4, 6, 6, 2, …
## $ political_party <dbl> 9, 1, 10, 10, 4, 9, 7, 5, 2, 2, 3, 1, 2, 9, 2, 5, 4, 7…
## $ region          <dbl> 3, 4, 1, 3, 4, 3, 3, 3, 2, 4, 1, 3, 3, 4, 1, 4, 1, 1, …
## $ zip             <dbl> 33904, 92346, 13207, 34761, 83617, 30022, 76661, 74820…

I immediately noticed that variables not reported on in the article, such as ethnicity and education, were in the data set. This could lead to some interesting questions!

My next step was to brainstorm some questions:

  1. Does participants’ level of education affect their dogmatism scores?

  2. Does how closely a participant follows politics affect their belief superiority?

  3. Does participants’ ethnicity affect their political affiliations? (which may in turn relate to their dogmatism and belief superiority?)

  4. Does participants’ gender influence their belief superiority?

  5. Is there a relationship between age and dogmatism/belief superiority?

Question selection and coding steps

This week I decided to attempt answering question 4. I chose this question because I think it would be interesting to find out whether there is a significant gender difference in belief superiority scores. Before submitting my VR, I need to find more literature supporting this research question.

Step 1: Summary

First, I used the summary() function to obtain further statistics about gender and belief superiority scores using the data_attn and beliefs data frames.

summary(data_attn$gender)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.000   2.000   1.549   2.000   2.000

I chose to look at the beliefs data frame because, while coding for our group assignment, we had already created a data frame containing only the items relating to participants’ belief superiority (i.e. beliefs).

summary(beliefs)
##  immigration_b     abortion_b        vote_b          tax_b      
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.000  
##  Median :3.000   Median :3.000   Median :3.000   Median :2.000  
##  Mean   :2.643   Mean   :2.793   Mean   :2.704   Mean   :2.352  
##  3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:3.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
##  NA's   :3       NA's   :2                       NA's   :2      
##    torture_b     affirmaction_b    military_b      covidgov_b   
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.000  
##  Median :2.000   Median :2.000   Median :2.000   Median :3.000  
##  Mean   :2.523   Mean   :2.465   Mean   :2.419   Mean   :2.758  
##  3rd Qu.:4.000   3rd Qu.:3.000   3rd Qu.:3.000   3rd Qu.:4.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
##  NA's   :1                       NA's   :1

Step 2: Renaming variables

When I attempted to use the rename() function to change 1 to “male” and 2 to “female”, it came up with an error that rename() could not be used for the numeric variable. I tried converting the variable to character using as.character(), and it again gave an error, this time that rename() could not be found for that variable type. (rename() changes column names rather than the values inside a column, so it is not the right tool for recoding.) So, I did a bit of googling and found these lines of code, which worked!

Here, the values in the gender column of data_attn that equal 1 (tested using ==) are replaced with “male”, and the values that equal 2 are replaced with “female”.

I then placed the column into a tibble to check that its type had changed to character and that the values had been recoded correctly.

data_attn$gender[data_attn$gender == 1] <- "male"
data_attn$gender[data_attn$gender == 2] <- "female"
tibble(data_attn$gender)
## # A tibble: 707 x 1
##    `data_attn$gender`
##    <chr>             
##  1 female            
##  2 male              
##  3 female            
##  4 female            
##  5 female            
##  6 female            
##  7 female            
##  8 female            
##  9 female            
## 10 female            
## # … with 697 more rows
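
As a possible alternative to the two assignment lines above, both codes can be recoded in one step with base R's factor(). This is only a sketch on made-up values: toy_gender and toy_gender_labelled are hypothetical names standing in for data_attn$gender, not objects from the actual analysis.

```r
# Made-up example codes; a hypothetical stand-in for data_attn$gender
toy_gender <- c(2, 1, 2, 2, NA, 1)

# Map 1 -> "male" and 2 -> "female" in a single step.
# Any code not listed in `levels` (and any NA) stays NA.
toy_gender_labelled <- factor(toy_gender,
                              levels = c(1, 2),
                              labels = c("male", "female"))

# Count how many values fall into each category, including missing ones
table(toy_gender_labelled, useNA = "ifany")
```

A factor also keeps a fixed set of categories, which can be handy later for controlling the order of groups in tables and graphs.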

Step 3: Descriptives in a table

Next, I wanted to show my new R skills by attempting a function I hadn’t used before: gt(). As my article only had graphs, I wanted to try to understand gt() so that I could place descriptive statistics already gathered by the researchers into a table.

First, I created a new data frame called gender_summary by taking data_attn, grouping it by gender with group_by(), and using summarise() to calculate the number, proportion and percentage of participants who were male and female. I then piped (%>%) this into gt() to create a table.

gender_summary <- data_attn %>%
  group_by(gender) %>%
  summarise(n = n(), proportion = n/707, percentage = proportion*100)
gender_summary %>% gt()
gender        n   proportion   percentage
female      388    0.5487977     54.87977
male        319    0.4512023     45.12023
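
The proportion above divides by a hard-coded 707. A small sketch of how the same table could be built without typing the sample size by hand is shown below, using base R's table() and prop.table(). The data and the names toy_gender and gender_summary_toy are made up for illustration.

```r
# Made-up labels; a hypothetical stand-in for data_attn$gender
toy_gender <- c("female", "male", "female", "female", "male")

counts <- table(toy_gender)        # n per group
proportion <- prop.table(counts)   # each n divided by the total number of rows

gender_summary_toy <- data.frame(
  gender     = names(counts),
  n          = as.vector(counts),
  proportion = as.vector(proportion),
  percentage = as.vector(proportion) * 100
)
gender_summary_toy
```

In the dplyr pipeline, the same idea would probably be a mutate(proportion = n / sum(n)) step after summarise(), so the total updates automatically if the number of participants changes.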

Step 4: The beliefs data frame

Because my question refers to belief superiority, I knew that I had to use the beliefs data frame as it relates to all belief superiority items.

As gender wasn’t initially a variable found in this data frame, I used the select() function from dplyr to rebuild beliefs from data_attn, keeping the gender column along with all items ending in _b (b for beliefs).

Next, I used mutate() to create a new variable within beliefs called B containing each participant’s average belief score. This was done by dividing each _b column by 8 (the total number of belief items) and adding the results together. I then glimpsed the new variable to make sure it was created.

beliefs <- dplyr::select(data_attn, ends_with('_b'), gender)

beliefs <- mutate(beliefs,
                  B = affirmaction_b/8 + military_b/8 + covidgov_b/8 + immigration_b/8 +
                      abortion_b/8 + vote_b/8 + tax_b/8 + torture_b/8)

glimpse(beliefs$B)
##  num [1:707] 1.88 4.25 2.5 2.62 3.75 ...
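
One thing to keep in mind with this approach is that adding the columns by hand returns NA for any participant who skipped even one item (the summary() output earlier shows a few NA's). A possible alternative is rowMeans(), sketched below on made-up data with three hypothetical items instead of eight. Whether averaging over only the answered items is appropriate is a judgement call, so this is just one option.

```r
# Made-up responses on three hypothetical belief items
toy_beliefs <- data.frame(
  immigration_b = c(3, 5, NA),
  abortion_b    = c(1, 4, 2),
  tax_b         = c(2, 3, 4)
)

# Names of all columns ending in _b
item_cols <- grep("_b$", names(toy_beliefs), value = TRUE)

# Adding the columns by hand gives NA whenever any single item is missing
B_sum <- with(toy_beliefs, immigration_b / 3 + abortion_b / 3 + tax_b / 3)

# rowMeans() with na.rm = TRUE averages over the items that were answered
B_mean <- rowMeans(toy_beliefs[item_cols], na.rm = TRUE)

B_sum    # the third participant is NA
B_mean   # the third participant gets the mean of the two answered items
```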

Step 5: Summary table

I then wanted to create a summary table called beliefsup_summary to summarise the mean, standard deviation, standard error and n of belief superiority scores for male and female participants. To achieve this, I used group_by() to group the beliefs data frame by gender, then used mean() and sd() inside summarise(), ignoring NA values with na.rm=TRUE, to find the means and standard deviations. I counted participants with n() and calculated the standard error as sd/sqrt(n), since there is no built-in se() function. I then used gt() to place this into a table.

beliefsup_summary <- beliefs %>%
  group_by(gender) %>%
  summarise(mean = mean(B, na.rm=TRUE), sd = sd(B, na.rm=TRUE), n = n(), se = sd/sqrt(n))

beliefsup_summary %>% gt()
gender       mean         sd     n           se
female   2.417755   1.004412   388   0.05099128
male     2.779762   1.090025   319   0.06102967
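
A small caveat with the code above: n() counts every row in a group, including participants whose B score is NA, while mean() and sd() drop those rows. The sketch below, on made-up scores, shows how the standard error could instead be based on the number of non-missing scores. All names here (toy_B, n_rows, n_valid and so on) are hypothetical.

```r
# Made-up scores; a hypothetical stand-in for B within one gender group
toy_B <- c(2.5, 3.0, NA, 1.5, 4.0)

n_rows  <- length(toy_B)        # what n() would count, missing score included
n_valid <- sum(!is.na(toy_B))   # scores that actually enter the mean and sd

sd_B <- sd(toy_B, na.rm = TRUE)

se_rows  <- sd_B / sqrt(n_rows)    # slightly too small when scores are missing
se_valid <- sd_B / sqrt(n_valid)   # based on the number of non-missing scores

c(se_rows = se_rows, se_valid = se_valid)
```

Inside summarise(), the equivalent would likely be n = sum(!is.na(B)). With only a handful of missing values the difference is tiny, but it keeps n consistent with the mean and sd.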

Step 6: Graph

Next, I wanted to make a graph of this summary table! First, I piped my new beliefsup_summary data frame into ggplot(), mapping gender onto the x-axis, the mean of B scores onto the y-axis, and gender again onto fill. I then used geom_col() to make a column graph and added error bars using geom_errorbar(), where the maximum value is mean+se and the minimum value is mean-se. I then made the background of the graph white (theme_minimal()) and added theme() components to delete grid lines (theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())) and include axis lines with ticks on the y-axis (theme(axis.line= element_line(color="black")) + theme(axis.ticks.y = element_line(color="black"))). Finally, I used easy_all_text_size(12) + easy_remove_legend() to set all text on the graph to size 12 and remove the legend, and changed the x- and y-axis labels using labs().

beliefsup_summary %>%
  ggplot(aes(gender, mean, fill=gender)) +
  geom_col() +
  geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.2) +
  theme_minimal() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  theme(axis.line = element_line(color="black")) +
  theme(axis.ticks.y = element_line(color="black")) +
  easy_all_text_size(12) +
  easy_remove_legend() +
  labs(x='Gender', y='Mean Belief Superiority Score')

Step 7: Inferential statistics

After the introduction to inferential statistics in R during the Q&A session this week, I wanted to give the code a go!

First, from the beliefs data frame I created two new data frames, one for female participants and one for male participants, using the filter() function and the == operator. This is so that the B scores of these two groups can be compared in a t-test using the t.test() function.

female <- beliefs %>% filter(gender=="female")
male <- beliefs %>% filter(gender=="male")
t.test(female$B, male$B)
## 
##  Welch Two Sample t-test
## 
## data:  female$B and male$B
## t = -4.523, df = 646.52, p-value = 7.255e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.5191718 -0.2048429
## sample estimates:
## mean of x mean of y 
##  2.417755  2.779762

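Looks like a significant result! Male participants had a higher mean belief superiority score (2.78) than female participants (2.42), t(646.52) = -4.52, p < .001.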
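
The same comparison can probably be run without creating separate data frames, by using the formula interface of t.test(). The sketch below uses a made-up data frame called toy (hypothetical scores, not the study data) with a gender column and a B column.

```r
# Made-up scores for illustration only
toy <- data.frame(
  gender = rep(c("female", "male"), each = 5),
  B      = c(2.1, 2.5, 1.9, 3.0, 2.4, 2.8, 3.1, 2.6, 3.5, 2.9)
)

# Formula interface: compare B across the two levels of gender.
# Like t.test(x, y), this runs a Welch test by default.
result <- t.test(B ~ gender, data = toy)

result$estimate   # the two group means
result$p.value    # two-sided p-value
```

Saving the result to an object also makes it easy to pull out individual numbers (for example result$statistic or result$conf.int) when writing up.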

Questions for Jenny :)

  • Regarding my t-test above, I was a little surprised to see that it was significant! I then realised, however, that I had combined multiple columns into a new variable (B) using mutate(). Because multiple items are averaged into one variable, is it still ok to use a t-test to compare this new variable across gender? If not, what should I use instead?

  • Also, I attempted to put the t-test information into my graph by following the ‘my_comparisons’ steps from the Q&A, but I am unsure what the warning means and why it doesn’t work.

my_comparisons <- list(c("male", "female"))

beliefsup_summary %>%
  ggplot(aes(gender, mean, fill=gender)) +
  geom_col() +
  geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.2) +
  theme_minimal() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  theme(axis.line = element_line(color="black")) +
  theme(axis.ticks.y = element_line(color="black")) +
  easy_all_text_size(12) +
  easy_remove_legend() +
  stat_compare_means(comparisons = my_comparisons, method = "t.test") +
  labs(x='Gender', y='Mean Belief Superiority Score')
## Warning: Computation failed in `stat_signif()`:
## not enough 'x' observations

I also attempted this after deleting the comparisons = my_comparisons code:

beliefsup_summary %>%
  ggplot(aes(gender, mean, fill=gender)) +
  geom_col() +
  geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.2) +
  theme_minimal() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  theme(axis.line = element_line(color="black")) +
  theme(axis.ticks.y = element_line(color="black")) +
  easy_all_text_size(12) +
  easy_remove_legend() +
  stat_compare_means(method = "t.test") +
  labs(x='Gender', y='Mean Belief Superiority Score')
## Warning: Computation failed in `stat_compare_means()`:
## not enough 'x' observations

Thank you!
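
A possible explanation for the warning, offered as a hedged note rather than a definitive answer: the plot is built from beliefsup_summary, which has only one row (one mean) per gender, so the test behind stat_compare_means() would have just a single value in each group to work with. The base R sketch below tries a t-test on one value per group to show what happens in that situation. The names female_mean, male_mean and msg are made up for the example.

```r
# One summary value per group, like the `mean` column of beliefsup_summary
female_mean <- 2.417755
male_mean   <- 2.779762

# t.test() needs at least two observations per group to estimate a variance,
# so a single value per group produces an error rather than a result
msg <- tryCatch(
  {
    t.test(female_mean, male_mean)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg
```

If that is the cause, one option may be to give ggplot() the participant-level beliefs data (one row per participant, with gender and B), so that stat_compare_means() has the raw scores to test, and to draw the bars from summaries of those scores.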

Challenges

  • My main challenge this week was trying to figure out the correct data frame to use for my question. At first, I tried to use data_attn because it had already been filtered to participants who passed the attention checks, but then I realised that beliefs was the most appropriate data frame! This is because it incorporates all belief superiority variables from data_attn and allowed me to use mutate() to create new variables for a table/graph.

  • My other challenge this week was working out how to use statistics code. I am still unsure if what I’ve done is correct, as mentioned in the questions above.

Successes

  • The main success of this week was finishing and presenting the group presentation! I am so happy and proud of all that we accomplished as a team and how far we have come together.

  • My other success for this week was being able to figure out the code for the exploratory analysis! As I was very apprehensive and nervous about starting this section, I feel quite accomplished and it’s made me realise how far I have come in my coding journey!

  • I am also particularly happy with my table output using gt() as this was my first time using it!

Next Steps

  • My next steps for coding are to continue exploring the data and to answer two more questions for my exploratory analysis.

  • Specifically, I also want to keep figuring out how to incorporate statistics into my graphs.