Honestly, I didn’t run into too many challenges this week; Google has helped me fix a lot of the issues I’ve had. I think the main challenge was simply time restrictions and stress due to other assignments. Once I’m finished with PSYC3311, I’m sure I’ll have more time to work on the verification report.
Loading libraries
library(ggpubr)
## Loading required package: ggplot2
library(jmv)
library(rstatix)
##
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
##
## filter
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v tibble 3.1.2 v dplyr 1.0.6
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## v purrr 0.3.4
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks rstatix::filter(), stats::filter()
## x dplyr::lag() masks stats::lag()
library(here)
## here() starts at D:/Sync/UNSW - Year 3 THIS IS THE YEAR BABY/Term 2/PSYC3361 - Research Internship/R Markdown Projects/Group_1_projects
library(janitor)
##
## Attaching package: 'janitor'
## The following object is masked from 'package:rstatix':
##
## make_clean_names
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(ggeasy)
library(readxl)
library(gt)
library(gtsummary)
library(ggplot2)
The first question, which a lot of other people across the class (not just in my own group) have come up with, is whether some variable x correlates with age. Specific to the Haigh journal article, the question is “is age correlated with mistrust in science?”. To be more specific, I would hypothesize that higher age correlates with higher mistrust in science.
As mentioned in my previous learning log, I had quite a bit of trouble working out the purpose of the descriptives. What my code did was create a group for every single age from 18 all the way to 73, which honestly serves no use as it doesn’t summarise anything. Therefore, I’m taking Jenny’s advice to create a new variable for age groups, such as “young” and “old”, which should make the descriptives much clearer.
I want two distinct age ranges, so young will be 18-45 and old will be 46-73. The values are simply split at the midpoint of the 18-73 age range.
Reading in the data using read_csv(). The select(Age, mistrust) specifies the two variables I want to use for the exploratory analysis. The result is saved as Age_Mistrust so it’s clearer that it’s filtered and separate from the original data set. mutate() is used to add a new variable. age_by_group is the name of the new variable, since I want to group the participants by age into two categories. Age <= 45 ~ "Young" categorizes all participants aged 45 and under as Young. Age >= 46 ~ "Old" categorizes all participants aged 46 and over as Old.
ExploratoryData1 <- read_csv("MyDataFinalSubset2.csv") # loading in the data
##
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_double(),
## block = col_character(),
## Format = col_character(),
## Conflict = col_character()
## )
## i Use `spec()` for the full column specifications.
Age_Mistrust <- ExploratoryData1 %>%
select(Age, mistrust) %>%
#Adding a new variable "age_by_group"
mutate(
age_by_group = case_when(
Age <= 45 ~ "Young",
Age >= 46 ~ "Old"
))
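As a quick sanity check on where the boundary falls, here is a minimal base R sketch of the same two-way split using cut(). The ages in toy_ages are made up for illustration and are not the study data, and cut() is a swapped-in alternative to the case_when() call above, not what I actually ran.

```r
# Toy ages, made up for illustration (not the study data)
toy_ages <- c(18, 30, 45, 46, 60, 73)

# cut() uses right-closed intervals by default,
# so the bins are (17, 45] and (45, 73]
toy_groups <- cut(
  toy_ages,
  breaks = c(17, 45, 73),
  labels = c("Young", "Old")
)

# Count how many toy participants land in each group
table(toy_groups)
```

The point of the check is just that 45 lands in Young and 46 lands in Old, matching the case_when() rules.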
This is the table code from last week; gt() is used to put the results in a table. Essentially, what I’ve done is change age from a continuous variable to a categorical one. Now I’m a lot happier with the table summary, as it’s much cleaner and gives a better picture of what I want represented: the disparity between age groups in mistrust in science! So shoutout not only to Jenny R. for recommending this but also to Ayesha, who had a similar process for one of her exploratory analyses. She helped give me some ideas on how to go about changing age from a continuous variable to a categorical one.
Age_Mistrust %>%
group_by(age_by_group) %>%
summarise(mean = mean(mistrust),
sd = sd(mistrust),
n = n(),
se = sd/sqrt(n)) %>%
gt()
| age_by_group | mean | sd | n | se |
|---|---|---|---|---|
| Old | 2.069444 | 0.7141264 | 72 | 0.08416060 |
| Young | 1.946138 | 0.6333242 | 328 | 0.03496948 |
I still want to use a scatter plot but want to spice it up a bit and, like Dani says, add glamour graphics! One point from my learning log last week was that Jenny S recommended using geom_jitter() instead of geom_point(), as the latter has points that stack up on each other, whereas jitter adds a bit of random noise to each point so that you can see each of them.
#Mistrust and Age Plot
mistrust_age_plot <- ggplot(
data = Age_Mistrust,
aes(
x = Age,
y = mistrust,
color = age_by_group
)
) +
geom_point() +
theme_minimal()+
geom_smooth(method = "lm")
print(mistrust_age_plot)
## `geom_smooth()` using formula 'y ~ x'
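The plot code above still uses geom_point(), so here is a minimal sketch of what swapping in geom_jitter() could look like. The data frame toy_data is made up for illustration, and the width, height, and alpha values are guesses that would need tuning on the real Age and mistrust columns.

```r
library(ggplot2)

# Toy data, made up for illustration (not the study data)
set.seed(1)
toy_data <- data.frame(
  Age = sample(18:73, 200, replace = TRUE),
  mistrust = sample(1:5, 200, replace = TRUE)
)

toy_jitter_plot <- ggplot(toy_data, aes(x = Age, y = mistrust)) +
  # width/height set how far each point is nudged; these values are assumptions
  geom_jitter(width = 0.3, height = 0.1, alpha = 0.6) +
  # same linear trend line as in the original plot
  geom_smooth(method = "lm", formula = y ~ x) +
  theme_minimal()

print(toy_jitter_plot)
```

Keeping the jitter small matters here, because too much noise would misrepresent where the scores actually sit.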
# Correlation test
cor.test() reports the size of the correlation and lets you know whether it’s statistically significant.
cor.test(ExploratoryData1$Age, ExploratoryData1$mistrust)
##
## Pearson's product-moment correlation
##
## data: ExploratoryData1$Age and ExploratoryData1$mistrust
## t = 3.0808, df = 398, p-value = 0.002208
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.05539569 0.24697431
## sample estimates:
## cor
## 0.1526184
# So the result is as follows: t = 3.0808, df = 398, p-value = 0.002208
Since the p-value is < 0.05, there is a statistically significant correlation between age and mistrust in science. The correlation is r = 0.153, which by Cohen’s conventions for Pearson’s r (roughly 0.1 small, 0.3 medium, 0.5 large) equates to a small effect size.
I wanted my next question to stray away from characteristics of the participants like age, gender and so on. I wanted more of a focus on the experimental variables. However, I have no idea how to make an interesting plot without it being another correlation one. I may save this for later and see if there’s anything else I can do.
One other question that has piqued my interest is people’s recall score. I’d like to compare people with a higher recall score (>= 7) against people with a lower recall score (<= 6) on their beliefs about the development of scientific knowledge. My hypothesis is that people in the experiment with a higher recall score have a slightly better edge in attention to detail and thus may pay more attention to the nuance within the science community.
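To see how the numbers in the output hang together, here is a small worked sketch that recovers the t statistic from r and the degrees of freedom, and converts r to the proportion of shared variance. The inputs are copied from the cor.test() output above; the variable names are just illustrative.

```r
# Values copied from the cor.test() output above
r  <- 0.1526184
df <- 398

# t statistic implied by a Pearson correlation: t = r * sqrt(df) / sqrt(1 - r^2)
t_from_r <- r * sqrt(df) / sqrt(1 - r^2)

# Proportion of variance in mistrust shared with age
r_squared <- r^2

round(t_from_r, 4)
round(r_squared, 3)
```

The recovered t should match the reported 3.0808, and r squared works out to about 2% of the variance, which is another way of seeing that the effect is small even though it is significant.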
Recall_Development <- ExploratoryData1 %>%
select(Recall_score, development) %>%
#Adding a new variable "recall_by_group"
mutate(
recall_by_group = case_when(
Recall_score <= 6 ~ "Less_Recall",
Recall_score >= 7 ~ "More_Recall"
))
Development refers to “epistemic beliefs about the development of scientific knowledge”. Measured on a five-point scale, it captures the belief that science is an evolving and changing subject. The higher the score, the more sophisticated the participant’s beliefs and the greater their awareness that science is uncertain and constantly evolving.
Recall_Development %>%
group_by(recall_by_group) %>%
summarise(mean = mean(development),
sd = sd(development),
n = n(),
se = sd/sqrt(n)) %>%
gt()
| recall_by_group | mean | sd | n | se |
|---|---|---|---|---|
| Less_Recall | 4.357724 | 0.5972743 | 82 | 0.06595791 |
| More_Recall | 4.592243 | 0.4184862 | 318 | 0.02346754 |
According to the table, those who had more recall (>= 7) had a slightly higher mean (4.59) than those who had less recall (4.36), indicating that those with more recall had slightly higher development beliefs overall. Whether that difference is reliable is up to the inferential statistics to show.
Using t.test() to compare the means and see whether the difference between the groups is statistically significant. Creating two data frames, Less_Recall and More_Recall. This separates more and less recall into two separate data sets, allowing t.test() to look at the development scores separately and compare the means.
Less_Recall <- Recall_Development %>%
filter(recall_by_group == "Less_Recall")
More_Recall <- Recall_Development %>%
filter(recall_by_group == "More_Recall")
t.test(Less_Recall$development, More_Recall$development)
##
## Welch Two Sample t-test
##
## data: Less_Recall$development and More_Recall$development
## t = -3.3499, df = 102.39, p-value = 0.001133
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.37337458 -0.09566463
## sample estimates:
## mean of x mean of y
## 4.357724 4.592243
The null hypothesis is rejected, as the difference is statistically significant (p < 0.05). But I need to revise my statistics to know whether it is of practical relevance.
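One common way to gauge practical relevance is a standardised effect size such as Cohen’s d. This is a rough sketch computed by hand from the summary values in the table above. It uses a pooled standard deviation, which assumes the two groups have similar spread, so treat it as an approximation given that the Welch test above does not make that assumption.

```r
# Summary values copied from the descriptives table above
mean_less <- 4.357724; sd_less <- 0.5972743; n_less <- 82
mean_more <- 4.592243; sd_more <- 0.4184862; n_more <- 318

# Pooled standard deviation across the two groups
pooled_sd <- sqrt(
  ((n_less - 1) * sd_less^2 + (n_more - 1) * sd_more^2) /
    (n_less + n_more - 2)
)

# Cohen's d: mean difference in pooled-SD units
cohens_d <- (mean_more - mean_less) / pooled_sd

round(cohens_d, 2)
```

This comes out at roughly 0.5, which by Cohen’s conventions (0.2 small, 0.5 medium, 0.8 large) would be around a medium effect.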
#Recall and Development Plot
recall_development_plot <- ggplot(
data = Recall_Development,
aes(
x = recall_by_group,
y = development
)) +
# plot the mean of each group rather than stacking every raw score
stat_summary(fun = mean, geom = "bar") +
# development is continuous (1-5), so zoom the axis instead of using a discrete scale
coord_cartesian(ylim = c(1, 5))
print(recall_development_plot)
Honestly, this is far from what I’d like to achieve for the second plot, and it needs a lot of work, but I have an assignment due tomorrow, so that unfortunately takes precedence!!