My goals for this week were:
- Finish and present our group presentation! (Yay!!)
- Think of some questions to use for exploratory analysis
- Start coding one of my questions for the exploratory analysis
To start thinking of some potential questions to ask the data, I first used the glimpse() function to look at what other variables are in this data set.
glimpse(data)
## Rows: 1,319
## Columns: 65
## $ Q62 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ immigration_a <dbl> 2, 2, 1, 1, 1, 1, 4, 4, 3, 5, 2, 1, 1, 2, 3, 1, 4, 3, …
## $ immigration_b <dbl> 3, 2, 5, 5, 3, 5, 2, 2, 1, 5, 3, 2, 3, 2, 1, 3, 1, 2, …
## $ abortion_a <dbl> 3, 5, 1, 1, 2, 2, 3, 3, 1, 4, 3, 2, 1, 4, 5, 3, 2, 5, …
## $ abortion_b <dbl> 3, 2, 5, 5, 4, 2, 4, 3, 1, 3, 3, 3, 2, 3, 1, 3, 5, 5, …
## $ vote_a <dbl> 3, 3, 2, 2, 2, 2, 3, 4, 3, 5, 3, 2, 2, 3, 2, 2, 2, 2, …
## $ vote_b <dbl> 3, 2, 4, 5, 5, 3, 4, 2, 1, 5, 1, 2, 5, 3, 1, 3, 5, 3, …
## $ tax_a <dbl> 2, 2, 1, 1, 3, 3, 3, 3, 2, 2, 2, 3, 3, 2, 1, 1, 3, 1, …
## $ tax_b <dbl> 3, 2, 4, 1, 1, 3, 1, 2, 1, 2, 1, 5, 3, 3, 2, 3, 1, 5, …
## $ torture_a <dbl> 3, 3, 2, 3, 3, 5, 4, 4, 2, 3, 3, 3, 4, 3, 3, 3, 2, 5, …
## $ torture_b <dbl> 4, 1, 1, 1, 1, 5, 1, 4, 1, 2, 3, 1, 1, 2, 3, 3, 4, 4, …
## $ affirmaction_a <dbl> 4, 4, 2, 3, 3, 2, 4, 3, 4, 5, 4, 2, 2, 2, 4, 3, 2, 2, …
## $ affirmaction_b <dbl> 3, 2, 5, 1, 1, 5, 3, 3, 1, 5, 3, 2, 2, 1, 3, 3, 1, 2, …
## $ military_a <dbl> 3, 2, 1, 1, 5, 3, 4, 2, 2, 4, 3, 2, 2, 3, 5, 2, 3, 3, …
## $ military_b <dbl> 3, 1, 5, 1, 3, 4, 3, 3, 1, 2, 2, 3, 4, 3, 4, 3, 1, 5, …
## $ covidgov_a <dbl> 2, 4, 1, 3, 3, 3, 4, 3, 3, 3, 3, 3, 5, 4, 3, 4, 2, 2, …
## $ covidgov_b <dbl> 2, 3, 5, 1, 3, 3, 2, 3, 1, 2, 3, 4, 3, 3, 2, 3, 1, 4, …
## $ AC_a <dbl> 3, 3, 3, 3, 3, 3, 4, 4, 3, 3, 3, 3, 2, 3, 3, 3, 2, 2, …
## $ AC_b <dbl> 2, 5, 5, 5, 5, 5, 5, 1, 1, 5, 5, 5, 4, 5, 5, 5, 5, 1, …
## $ Q37_1 <dbl> 7, 4, 9, 6, 5, 5, 7, 7, 5, 3, 5, 2, 4, 6, 1, 7, 2, 2, …
## $ Q37_2 <dbl> 6, 9, 9, 9, 9, 6, 4, 4, 5, 4, 5, 2, 7, 5, 9, 8, 4, 6, …
## $ Q37_3 <dbl> 6, 3, 9, 3, 4, 5, 8, 6, 1, 4, 5, 4, 7, 8, 5, 7, 6, 4, …
## $ Q37_4 <dbl> 8, 9, 9, 6, 8, 5, 6, 8, 6, 3, 5, 4, 3, 5, 8, 7, 8, 4, …
## $ Q37_5 <dbl> 5, 7, 9, 5, 8, 5, 3, 8, 4, 9, 5, 8, 6, 6, 9, 7, 6, 7, …
## $ Q37_6 <dbl> 6, 5, 9, 6, 4, 5, 8, 6, 7, 8, 1, 9, 5, 5, 5, 7, 5, 7, …
## $ Q37_7 <dbl> 2, 7, 9, 5, 8, 6, 7, 4, 1, 3, 5, 1, 3, 2, 7, 7, 6, 2, …
## $ Q37_8 <dbl> 5, 4, 9, 5, 6, 6, 9, 4, 3, 6, 5, 8, 2, 7, 6, 7, 3, 3, …
## $ Q37_9 <dbl> 6, 2, 8, 2, 4, 5, 2, 6, 4, 4, 5, 4, 6, 5, 7, 7, 8, 5, …
## $ Q37_10 <dbl> 8, 9, 9, 5, 8, 6, 6, 5, 5, 4, 5, 9, 3, 5, 7, 7, 5, 5, …
## $ Q37_11 <dbl> 7, 6, 8, 7, 9, 7, 6, 7, 7, 4, 5, 4, 2, 5, 9, 7, 4, 2, …
## $ Q37_12 <dbl> 7, 5, 9, 4, 6, 4, 9, 6, 5, 7, 5, 9, 6, 6, 6, 7, 6, 9, …
## $ Q37_13 <dbl> 8, 9, 7, 8, 8, 5, 6, 2, 5, 3, 5, 2, 5, 5, 5, 7, 7, 7, …
## $ Q37_14 <dbl> 1, 2, 8, 2, 5, 4, 5, 5, 1, 3, 3, 9, 6, 5, 5, 7, 6, 2, …
## $ Q37_15 <dbl> 5, 6, 9, 4, 3, 6, 9, 5, 5, 1, 5, 6, 4, 6, 3, 8, 5, 3, …
## $ Q37_16 <dbl> 8, 8, 9, 5, 7, 4, 8, 7, 3, 4, 5, 4, 3, 4, 7, 7, 3, 3, …
## $ Q37_17 <dbl> 7, 7, 9, 7, 2, 5, 1, 7, 4, 1, 5, 3, 5, 5, 2, 7, 6, 1, …
## $ Q37_18 <dbl> 9, 9, 9, 5, 9, 5, 7, 3, 5, 3, 4, 1, 5, 5, 7, 8, 5, 7, …
## $ Q37_19 <dbl> 6, 8, 9, 5, 6, 5, 8, 8, 5, 3, 5, 3, 7, 7, 5, 8, 5, 1, …
## $ Q37_20 <dbl> 7, 5, 9, 2, 3, 3, 5, 7, 5, 1, 1, 4, 8, 5, 1, 7, 2, 2, …
## $ Q8 <dbl> 1, 4, 1, 4, 4, 5, 4, 3, 1, 4, 4, 2, 4, 1, 1, 4, 2, 4, …
## $ Q10 <dbl> 5, 6, 4, 7, 2, 4, 6, 3, 4, 3, 1, 5, 5, 4, 4, 6, 4, 1, …
## $ Q12 <dbl> 5, 1, 4, 1, 4, 4, 2, 3, 3, 5, 2, 2, 5, 4, 4, 6, 3, 4, …
## $ Q39 <dbl> 5, 4, 7, 1, 4, 4, 6, 1, 2, 5, 2, 1, 4, 4, 4, 6, 3, 4, …
## $ Q40 <dbl> 5, 2, 7, 1, 4, 4, 4, 4, 3, 5, 2, 1, 3, 4, 5, 6, 3, 4, …
## $ Q14 <dbl> 6, 5, 7, 4, 6, 6, 6, 5, 6, 6, 5, 6, 4, 6, 5, 6, 5, 4, …
## $ Q16 <dbl> 1, 1, 2, 2, 3, 2, 3, 2, 1, 1, 3, 1, 1, 2, 1, 3, 3, 3, …
## $ Q18 <dbl> 1, 2, 1, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, …
## $ Q20 <dbl> 26, 83, 41, 55, 35, 35, 24, 28, 49, 24, 53, 32, 49, 25…
## $ Q62_1 <dbl> 4, 5, 5, 5, 5, 5, 5, 7, 5, 4, 4, 3, 3, 2, 5, 2, 5, 5, …
## $ Q44_1 <dbl> 5, 5, 1, 6, 5, 7, 6, 6, 4, 7, 4, 3, 2, 4, 4, 6, 4, 1, …
## $ Q44_2 <dbl> 4, 5, 7, 7, 6, 6, 6, 2, 7, 6, 6, 7, 5, 4, 7, 6, 6, 2, …
## $ Q44_3 <dbl> 5, 7, 7, 7, 4, 6, 5, 7, 7, 7, 7, 2, 7, 4, 7, 6, 6, 3, …
## $ Q44_4 <dbl> 6, 7, 1, 7, 4, 6, 4, 2, 7, 7, 7, 5, 3, 4, 4, 7, 6, 4, …
## $ Q44_5 <dbl> 5, 2, 1, 1, 2, 5, 3, 7, 1, 1, 1, 5, 1, 4, 4, 3, 4, 5, …
## $ Q58 <chr> "it was so cool.", "IT SEEMED WELL DESIGNED", "i like …
## $ rid <chr> "5eb32f06-1a93-fad6-f05d-1a2c5bd2c3b3", "5eb3308c-7aa0…
## $ age <dbl> 21, 83, 41, 55, 35, 35, 24, 28, 49, 24, 53, 33, 49, 25…
## $ gender <dbl> 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, …
## $ hhi <dbl> 11, 2, 20, 3, 1, 11, 2, 9, 13, 2, 12, 6, 3, 19, 12, 17…
## $ ethnicity <dbl> 11, 1, 1, 1, 1, 1, 11, 3, 1, 15, 1, 2, 2, 4, 1, 4, 1, …
## $ hispanic <dbl> 8, 1, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ education <dbl> 1, 2, 7, 2, 6, 6, 1, 6, 7, 6, 4, 6, 2, 6, 4, 6, 6, 2, …
## $ political_party <dbl> 9, 1, 10, 10, 4, 9, 7, 5, 2, 2, 3, 1, 2, 9, 2, 5, 4, 7…
## $ region <dbl> 3, 4, 1, 3, 4, 3, 3, 3, 2, 4, 1, 3, 3, 4, 1, 4, 1, 1, …
## $ zip <dbl> 33904, 92346, 13207, 34761, 83617, 30022, 76661, 74820…
I immediately noticed that variables not reported on in the article, such as ethnicity and education, were in the data set. This could lead to some interesting questions!
My next step was to brainstorm some questions:
1. Does participants’ level of education affect their dogmatism scores?
2. Does how closely participants follow politics affect their belief superiority?
3. Does participants’ ethnicity affect their political affiliation (and, in turn, their dogmatism and belief superiority)?
4. Does participants’ gender influence their belief superiority?
5. Is there a relationship between age and dogmatism/belief superiority?
This week I decided to attempt question 4. I chose this question because I think it would be interesting to find out whether there is a significant gender difference in belief superiority scores. Before submitting my VR, I need to find more literature supporting this research question.
First, I used the summary() function to obtain descriptive statistics for gender (from the data_attn data frame) and for the belief superiority items (from the beliefs data frame).
summary(data_attn$gender)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.000 2.000 1.549 2.000 2.000
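Because gender is coded 1 and 2 here (1 = male and 2 = female, as recoded later in this post), the mean in this summary has a direct interpretation: the mean minus 1 is the proportion of participants coded 2. The arithmetic check below uses the counts from the gender table further down (319 male, 388 female).

```r
# Counts taken from the gender_summary table in this post
n_male <- 319    # coded 1
n_female <- 388  # coded 2

# Mean of a variable coded 1/2
mean_gender <- (1 * n_male + 2 * n_female) / (n_male + n_female)
round(mean_gender, 3)   # matches the Mean of 1.549 from summary()

# Subtracting 1 leaves the proportion of participants coded 2 (female)
round(mean_gender - 1, 4)
```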
I chose to look at the beliefs data frame because, while coding for our group assignment, we had already created a data frame that included only the items relating to participants’ belief superiority (i.e. beliefs).
summary(beliefs)
## immigration_b abortion_b vote_b tax_b
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000
## Median :3.000 Median :3.000 Median :3.000 Median :2.000
## Mean :2.643 Mean :2.793 Mean :2.704 Mean :2.352
## 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:3.000
## Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.000
## NA's :3 NA's :2 NA's :2
## torture_b affirmaction_b military_b covidgov_b
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000
## Median :2.000 Median :2.000 Median :2.000 Median :3.000
## Mean :2.523 Mean :2.465 Mean :2.419 Mean :2.758
## 3rd Qu.:4.000 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:4.000
## Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.000
## NA's :1 NA's :1
When I tried to use the rename() function to change 1 to “male” and 2 to “female”, I got an error saying that rename() could not be used with the numeric variable. I then converted the variable to character using as.character(), but rename() still failed with that variable type. Looking back, this happens because rename() changes column names, not the values stored inside a column. So, I did a bit of googling and found these lines of code, which worked!
Here, the values in the gender column of data_attn that equal 1 (tested with ==) are replaced (with <-) by “male”, and the values that equal 2 are replaced by “female”.
I then placed the column into a tibble to check that it had been converted to character and that the values themselves had been changed correctly.
data_attn$gender[data_attn$gender == 1] <- "male"
data_attn$gender[data_attn$gender == 2] <- "female"
tibble(data_attn$gender)
## # A tibble: 707 x 1
## `data_attn$gender`
## <chr>
## 1 female
## 2 male
## 3 female
## 4 female
## 5 female
## 6 female
## 7 female
## 8 female
## 9 female
## 10 female
## # … with 697 more rows
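Base R’s factor() offers another way to recode the values (rather than the column name). It maps each numeric code to a label in one step and also fixes the order of the groups. The minimal sketch below uses a few made-up codes and assumes, as in this post, that 1 = male and 2 = female.

```r
# Made-up gender codes for illustration (1 = male, 2 = female, as in this post)
gender_codes <- c(2, 1, 2, 2, 1)

# factor() replaces each code with its label; codes not listed in levels become NA
gender <- factor(gender_codes, levels = c(1, 2), labels = c("male", "female"))

gender
table(gender)
```

On the real data this would be data_attn$gender <- factor(data_attn$gender, levels = c(1, 2), labels = c("male", "female")). One design difference is ordering. A factor keeps its level order, so male would come before female in group_by() summaries and on a ggplot x-axis. A character column, by contrast, is sorted alphabetically.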
Next, I wanted to show off my new R skills by trying a function I hadn’t used before: gt() from the gt package. As my article only had graphs, I wanted to learn how to use gt() to place the descriptive statistics already gathered by the researchers into a table.
First, I created a new data frame called gender_summary. Starting from data_attn, I grouped participants by gender with group_by() and then used summarise() to calculate the number, proportion and percentage of male and female participants. I then piped (%>%) the result into gt() to create a table.
gender_summary <- data_attn %>% group_by(gender) %>% summarise(n=n(), proportion=n/707, percentage=proportion*100)
gender_summary %>% gt()
gender | n | proportion | percentage |
---|---|---|---|
female | 388 | 0.5487977 | 54.87977 |
male | 319 | 0.4512023 | 45.12023 |
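The proportion column above divides by a hard-coded 707, which would silently give wrong proportions if the number of participants changed. The sketch below derives the total from the data instead. It is written in base R with made-up values. In the dplyr pipeline above, the equivalent would be to compute n with summarise() and then add mutate(proportion = n / sum(n)).

```r
# Made-up gender labels for illustration
gender <- c("female", "male", "female", "female", "male")

counts <- table(gender)  # number of participants per gender
toy_summary <- data.frame(
  gender = names(counts),
  n = as.vector(counts),
  proportion = as.vector(prop.table(counts))  # each n divided by the total number of rows
)
toy_summary$percentage <- toy_summary$proportion * 100
toy_summary
```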
Because my question is about belief superiority, I knew that I had to use the beliefs data frame, as it contains all the belief superiority items.
As gender wasn’t originally a variable in this data frame, I used the select() function from dplyr to rebuild beliefs from data_attn, keeping the gender column along with all items ending in _b (b for beliefs).
Next, I used mutate() to create a new variable within beliefs called B, which contains each participant’s average belief superiority score. I calculated it by dividing each of the eight _b items by 8 (the total number of belief items) and adding the results together, which is the same as taking their mean. I then used glimpse() to check that the variable had been created.
beliefs=dplyr::select(data_attn,ends_with('_b'), gender)
beliefs<-mutate(beliefs, B = affirmaction_b/8+military_b/8+covidgov_b/8+immigration_b/8+abortion_b/8+vote_b/8+tax_b/8+torture_b/8)
glimpse(beliefs$B)
## num [1:707] 1.88 4.25 2.5 2.62 3.75 ...
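Two details of this approach may be worth checking.

- ends_with('_b') matches every column whose name ends in _b. If data_attn still contains the attention-check item AC_b (it appears in the glimpse() output of the full data), it will be selected too. B itself is unaffected, because the mutate() call names the eight items explicitly.
- If any of the eight items is missing, B is NA for that participant.

The sketch below uses rowMeans() over an explicit list of item names, with two made-up participants.

```r
# The eight belief superiority items used to build B
belief_items <- c("immigration_b", "abortion_b", "vote_b", "tax_b",
                  "torture_b", "affirmaction_b", "military_b", "covidgov_b")

# Two made-up participants; the second has a missing covidgov_b answer
toy <- data.frame(
  immigration_b = c(2, 3), abortion_b = c(1, 4), vote_b = c(2, 5), tax_b = c(1, 2),
  torture_b = c(3, 5), affirmaction_b = c(2, 4), military_b = c(1, 5), covidgov_b = c(3, NA),
  AC_b = c(5, 5)  # attention-check item: its name also ends in "_b"
)

# A name pattern picks up AC_b as well as the eight belief items
grep("_b$", names(toy), value = TRUE)

# rowMeans() over the named items equals summing each item divided by 8;
# with the default na.rm = FALSE, any missing item makes B missing
toy$B <- rowMeans(toy[, belief_items])
toy$B
```

In dplyr the same idea could be written as mutate(B = rowMeans(across(all_of(belief_items)))). Setting na.rm = TRUE would instead average whichever items a participant did answer. That changes what B measures, so it is a choice to make deliberately.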
I then wanted to create a summary table called beliefsup_summary showing the mean, standard deviation, standard error and n of belief superiority scores for male and female participants. To do this, I used group_by() to group the B scores in beliefs by gender. I then used summarise() with mean() and sd() (ignoring missing values with na.rm=TRUE) and n() to find the mean, standard deviation and n for each group, and calculated the standard error as sd/sqrt(n). Finally, I used gt() to place this into a table.
beliefsup_summary <- beliefs %>% group_by(gender) %>% summarise(mean = mean(B, na.rm=TRUE), sd = sd(B, na.rm=TRUE), n=n(), se=sd/sqrt(n))
beliefsup_summary %>% gt()
gender | mean | sd | n | se |
---|---|---|---|---|
female | 2.417755 | 1.004412 | 388 | 0.05099128 |
male | 2.779762 | 1.090025 | 319 | 0.06102967 |
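This summary mixes two kinds of count. n() counts every row in the group, including any participant whose B is NA, whereas mean() and sd() drop those rows because of na.rm=TRUE. If B has missing values, se = sd/sqrt(n) is therefore slightly too small. Inside summarise(), writing n = sum(!is.na(B)) counts only the scores actually used. The base R sketch below runs the same per-group calculation on made-up scores with one missing value.

```r
# Made-up B scores; one female score is missing
toy <- data.frame(
  gender = c("female", "female", "female", "male", "male", "male"),
  B = c(2.0, 2.5, NA, 3.0, 2.5, 3.5)
)

# Mean, SD, n and SE for one group, using only the non-missing scores
group_stats <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  c(mean = mean(x), sd = sd(x), n = n, se = sd(x) / sqrt(n))
}

# Apply to each gender and put groups in rows, like beliefsup_summary
toy_summary <- t(sapply(split(toy$B, toy$gender), group_stats))
toy_summary
```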
Next, I wanted to make a graph of this summary table! Using my new beliefsup_summary data frame and the ggplot() function, I mapped gender onto the x-axis, the mean B score onto the y-axis, and gender again onto the fill colour. I then used geom_col() to make a column graph and added error bars with geom_errorbar(), where the upper limit is mean+se and the lower limit is mean-se.

I then styled the graph in several steps:
- theme_minimal() gave the graph a white background.
- theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) removed the grid lines.
- theme(axis.line = element_line(color="black")) + theme(axis.ticks.y = element_line(color="black")) added axis lines with ticks on the y-axis.
- The easy_all_text_size(12) and easy_remove_legend() functions (from the ggeasy package) set all text on the graph to size 12 and removed the legend.
- labs() changed the x- and y-axis labels.
beliefsup_summary %>% ggplot(aes(gender, mean, fill=gender)) + geom_col() + geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.2)+ theme_minimal() +theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) + theme(axis.line= element_line(color="black")) + theme(axis.ticks.y = element_line(color="black")) + easy_all_text_size(12) + easy_remove_legend() + labs(x='Gender', y='Mean Belief Superiority Score')
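The graph above is built from the summary table, so ggplot() only ever sees one row per gender. An alternative is to let ggplot2 compute the bar heights and error bars from participant-level scores, using stat_summary() and ggplot2’s mean_se() helper (mean ± 1 standard error). The sketch below uses made-up data, and plain theme() settings stand in for the ggeasy helpers. With the real data, toy would be replaced by beliefs.

```r
library(ggplot2)

# Made-up participant-level scores (one row per participant)
toy <- data.frame(
  gender = c("female", "female", "female", "male", "male", "male"),
  B = c(2.0, 2.5, 3.0, 3.0, 2.5, 3.5)
)

p <- ggplot(toy, aes(gender, B, fill = gender)) +
  stat_summary(fun = mean, geom = "col") +                            # bar height = group mean
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +  # mean +/- 1 SE
  theme_minimal() +
  theme(panel.grid = element_blank(),
        axis.line = element_line(color = "black"),
        axis.ticks.y = element_line(color = "black"),
        legend.position = "none",              # roughly what easy_remove_legend() does
        text = element_text(size = 12)) +      # roughly what easy_all_text_size(12) does
  labs(x = "Gender", y = "Mean Belief Superiority Score")
p
```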
After the introduction to inferential statistics in R during the Q&A session this week, I wanted to give the code a go!
First, I created two new data frames from beliefs, one containing only female participants and one containing only male participants, using filter() with the == operator. This meant the B scores of the two groups could be compared in a t-test using the t.test() function.
female <- beliefs %>% filter(gender=="female")
male <- beliefs %>% filter(gender=="male")
t.test(female$B, male$B)
##
## Welch Two Sample t-test
##
## data: female$B and male$B
## t = -4.523, df = 646.52, p-value = 7.255e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.5191718 -0.2048429
## sample estimates:
## mean of x mean of y
## 2.417755 2.779762
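t.test() also has a formula interface, which makes the two filtered data frames optional. t.test(B ~ gender, data = beliefs) should give the same Welch test, because a character grouping variable is taken in alphabetical order (female, then male). That is the same order as in t.test(female$B, male$B). The sketch below checks this on made-up scores.

```r
# Made-up scores for illustration
toy <- data.frame(
  gender = c("female", "female", "female", "male", "male", "male"),
  B = c(2.0, 2.5, 3.0, 3.0, 2.5, 3.5)
)

# Formula interface: outcome ~ grouping variable (Welch test by default)
res_formula <- t.test(B ~ gender, data = toy)

# Two-vector interface, as used in this post
res_vectors <- t.test(toy$B[toy$gender == "female"], toy$B[toy$gender == "male"])

res_formula$statistic
res_vectors$statistic
```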
Looks like a significant result!
Regarding my t-test above, I was a little surprised to see that it was significant! However, I then realised that I had used mutate() to combine multiple columns into a single new variable (B). Because several items are averaged into this one variable, is it still OK to use a t-test to compare it between genders? If not, what should I use instead?
Also, I tried to add the t-test result to my graph by following the ‘my_comparisons’ steps from the Q&A, but I got the warning below and I am unsure what it means or why it doesn’t work.
my_comparisons <- list(c("male", "female"))
beliefsup_summary %>% ggplot(aes(gender, mean, fill=gender)) + geom_col() + geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.2)+ theme_minimal() +theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) + theme(axis.line= element_line(color="black")) + theme(axis.ticks.y = element_line(color="black")) + easy_all_text_size(12) + easy_remove_legend() + stat_compare_means(comparisons = my_comparisons, method = "t.test") + labs(x='Gender', y='Mean Belief Superiority Score')
## Warning: Computation failed in `stat_signif()`:
## not enough 'x' observations
I also tried this after deleting the comparisons = my_comparisons argument:
beliefsup_summary %>% ggplot(aes(gender, mean, fill=gender)) + geom_col() + geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.2)+ theme_minimal() +theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) + theme(axis.line= element_line(color="black")) + theme(axis.ticks.y = element_line(color="black")) + easy_all_text_size(12) + easy_remove_legend() + stat_compare_means(method = "t.test") + labs(x='Gender', y='Mean Belief Superiority Score')
## Warning: Computation failed in `stat_compare_means()`:
## not enough 'x' observations
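A likely explanation covers both warnings. stat_compare_means() runs its test on the data given to ggplot(), and beliefsup_summary has only one row per gender (the mean). Each group therefore passes a single value to t.test(). For a two-sample Welch test, base R’s t.test() stops with exactly this “not enough 'x' observations” error when a group has fewer than two values. The sketch below reproduces the message with made-up numbers, then shows the test running once each group has several scores.

```r
# One value per group, like the rows of beliefsup_summary (made-up means)
msg <- tryCatch(t.test(2.4, 2.8), error = function(e) conditionMessage(e))
msg  # the same message as in the warnings above

# Several participant-level scores per group (made-up), as in beliefs
toy_female <- c(2.0, 2.5, 3.0, 1.5)
toy_male <- c(3.0, 2.5, 3.5, 4.0)
t.test(toy_female, toy_male)$p.value
```

If this is the cause, one option is to build the graph from beliefs itself. For example, stat_summary() can draw the bars and error bars from the raw scores, which gives stat_compare_means(method = "t.test") every B score to test.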
Thank you!
My main challenge this week was figuring out the correct data frame to use for my question. At first, I tried to use data_attn because it already contained only the participants who passed the attention checks. Then I realised that beliefs was the most appropriate data frame! This is because it includes all the belief superiority variables from data_attn and let me use mutate() to create new variables for a table and graph.
My other challenge this week was working out how to write code for statistical tests. I am still unsure whether what I’ve done is correct, as mentioned in my questions above.
The main success of this week was finishing and presenting the group presentation! I am so happy and proud of all that we accomplished as a team and how far we have come together.
My other success for this week was being able to figure out the code for the exploratory analysis! As I was very apprehensive and nervous about starting this section, I feel quite accomplished and it’s made me realise how far I have come in my coding journey!
I am also particularly happy with my table output using gt(), as this was my first time using it!
My next steps for coding are to continue exploring the data and to answer two more questions for my exploratory analysis.
I also want to keep working out how to incorporate statistical test results into my graphs.