My Goals From Last Week:

As embarrassing as it is, I still have not even started the verification report. I’ve been referring to it all this time without even a physical document, so an immediate step 1 is to create the document and include all the subheaders that are mentioned on the internship website!!
Come up with 3 finalized exploratory analysis questions
Refine all the components for question 1, and if possible do so for all three questions

Successes

I managed to start the verification report which is a small win
I cleaned up my exploratory analysis for question 1, although I’ve yet to fully expand on the discussion components
I’ve started the second exploratory analysis but still have to finish it

Challenges

Honestly I didn’t run into too many challenges this week; Google has helped me fix a lot of the issues that I’ve had. I think the main challenge was simply time restrictions and stress due to other assignments. Once I’m finished with PSYC3311, I’m sure I’ll be able to spend more time on the verification report.

Goals for next week

Finish all exploratory analysis
Rubberduck the master coding document
Put it all together and try to finalise the verification report

Refining Question 1

Loading libraries

library(ggpubr)

## Loading required package: ggplot2

library(jmv)
library(rstatix)

## 
## Attaching package: 'rstatix'

## The following object is masked from 'package:stats':
## 
##     filter

library(tidyverse)

## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --

## v tibble  3.1.2     v dplyr   1.0.6
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## v purrr   0.3.4

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks rstatix::filter(), stats::filter()
## x dplyr::lag()    masks stats::lag()

library(here)

## here() starts at D:/Sync/UNSW - Year 3 THIS IS THE YEAR BABY/Term 2/PSYC3361 - Research Internship/R Markdown Projects/Group_1_projects

library(janitor)

## 
## Attaching package: 'janitor'

## The following object is masked from 'package:rstatix':
## 
##     make_clean_names

## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

library(ggeasy)
library(readxl)
library(gt)
library(gtsummary)
library(ggplot2)

The first question, which a lot of other people across the class (not just in my own group) have come up with, is whether some variable correlates with age. Specific to the Haigh journal article, it is “whether age is correlated with a mistrust in science?”. To be more specific, my hypothesis is that a higher age correlates with a higher rate of mistrust in science.

Descriptives

As mentioned in my previous learning log, I had quite a bit of trouble with the purpose of the descriptives. What my code did was create a group for every single age from 18 all the way to 73, which honestly serves no use as it doesn’t summarise anything. Therefore, I’m taking Jenny’s advice to create a new variable for age groups, such as “Young” and “Old”, which should make the descriptives much clearer.

I want two distinct age ranges, so Young will be 18-45 and Old will be 46-73. The split sits at the midpoint of the 18-73 age range (45.5).

I read in the data using read_csv(). select(Age, mistrust) picks out the two variables I want for the exploratory analysis, and I named the result Age_Mistrust so it’s clear that it’s a subset separate from the original data set. mutate() adds a new variable, age_by_group, since I want to group the participants by age into two categories. Age <= 45 ~ "Young" categorizes all participants aged 45 and under as Young, and Age >= 46 ~ "Old" categorizes all participants aged 46 and over as Old.

ExploratoryData1 <- read_csv("MyDataFinalSubset2.csv")  # loading in the data

## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_double(),
##   block = col_character(),
##   Format = col_character(),
##   Conflict = col_character()
## )
## i Use `spec()` for the full column specifications.

Age_Mistrust <- ExploratoryData1 %>% 
    select(Age, mistrust) %>% 

#Adding a new variable "age_by_group"
  mutate(
    age_by_group = case_when(
      Age <= 45 ~ "Young", 
      Age >= 46 ~ "Old"
    ))
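As a quick sanity check on the cut-offs, the same grouping can be sketched in base R with cut(). This is only an illustration under my own assumptions: the ages are made up to sit on either side of the 45/46 boundary, and toy_ages and toy_groups are hypothetical names that are not part of the real data set.

```r
# Hypothetical ages chosen to sit on either side of the 45/46 boundary
toy_ages <- c(18, 45, 46, 73)

# With the default right-closed intervals, (17, 45] is "Young" and (45, 73] is "Old"
toy_groups <- cut(toy_ages, breaks = c(17, 45, 73), labels = c("Young", "Old"))

# Show each age next to the group it landed in
print(data.frame(Age = toy_ages, age_by_group = toy_groups))
```

One difference from case_when() is that cut() returns a factor, which can be handy for controlling the order of the groups in a table or plot.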

This is the table code from last week, where gt() is used to table the results. Essentially, what I’ve done is change age from a continuous variable to a categorical one. Now I’m a lot happier with the table summary, as it’s much cleaner and gives a better picture of the disparities between age groups in mistrust in science! So shoutout not only to Jenny R. for recommending this but also to Ayesha, who had a similar process for one of her exploratory analyses and helped give me some ideas on how to go about changing age from a continuous variable to a categorical one.

Age_Mistrust %>% 
  group_by(age_by_group) %>% 
  summarise(mean = mean(mistrust), 
            sd = sd(mistrust), 
            n = n(),
            se = sd/sqrt(n)) %>%  
gt()

age_by_group	mean	sd	n	se
Old	2.069444	0.7141264	72	0.08416060
Young	1.946138	0.6333242	328	0.03496948
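To double check what the se column means, the standard errors can be rebuilt by hand from the sd and n columns, since se = sd / sqrt(n). The numbers below are copied from the table; the variable names are just my own labels for this sketch.

```r
# Values copied from the summary table
sd_old <- 0.7141264
n_old <- 72
sd_young <- 0.6333242
n_young <- 328

# Standard error of the mean: sd divided by the square root of the group size
se_old <- sd_old / sqrt(n_old)
se_young <- sd_young / sqrt(n_young)

print(round(c(Old = se_old, Young = se_young), 4))
```

The Old group has a much larger standard error mostly because it only has 72 participants compared to 328 in the Young group, so its mean is estimated less precisely.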

Data visualization

I still want to use a scatter plot but want to spice it up a bit and, like Dani says, add glamour graphics! One point from my learning log last week was that Jenny S recommended using geom_jitter() instead of geom_point(), as the latter has points that stack up on each other, whereas jitter adds a bit of random noise to each point so that you can see each of them.

#Mistrust and Age Plot 
mistrust_age_plot <- ggplot(
  data = Age_Mistrust, 
  aes(
    x = Age, 
    y = mistrust,
    color = age_by_group
  )
) + 
  geom_point() + 
  theme_minimal()+
  geom_smooth(method = "lm")

print(mistrust_age_plot)

## `geom_smooth()` using formula 'y ~ x'
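Here is a minimal sketch of what the geom_jitter() suggestion could look like. It runs on made-up data (toy_data is hypothetical, not the real Age_Mistrust data), and the width, height and alpha values are guesses of mine that would need tuning on the real plot.

```r
library(ggplot2)

# Made-up data purely for illustration: whole-number ages and coarse scores overlap a lot
set.seed(1)
toy_data <- data.frame(
  Age = sample(18:73, 100, replace = TRUE),
  mistrust = sample(seq(1, 5, by = 0.5), 100, replace = TRUE)
)

toy_jitter_plot <- ggplot(toy_data, aes(x = Age, y = mistrust)) +
  # jitter nudges each point by a small random amount so stacked points become visible
  geom_jitter(width = 0.3, height = 0.1, alpha = 0.6) +
  # the fitted line still uses the original, un-jittered values
  geom_smooth(method = "lm", formula = y ~ x) +
  theme_minimal()

print(toy_jitter_plot)
```

Keeping width and height small matters here, because too much jitter would make the points look like they have values they don't actually have.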

Correlation test

cor.test() reports the correlation between two variables and lets you know whether it is a statistically significant correlation

cor.test(ExploratoryData1$Age, ExploratoryData1$mistrust)

## 
##  Pearson's product-moment correlation
## 
## data:  ExploratoryData1$Age and ExploratoryData1$mistrust
## t = 3.0808, df = 398, p-value = 0.002208
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.05539569 0.24697431
## sample estimates:
##       cor 
## 0.1526184

So the result is as follows: t = 3.0808, df = 398, p-value = 0.002208

Since the correlation has a p-value < 0.05, there is a statistically significant correlation between age and mistrust in science. The correlation is positive, which fits my hypothesis that older participants report more mistrust. At r = 0.153 it sits between Cohen’s benchmarks of 0.1 (small) and 0.3 (medium), so it equates to a small effect size.
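To see how the numbers in the cor.test() output hang together, the t statistic and p-value can be rebuilt from the correlation and the degrees of freedom using t = r * sqrt(df) / sqrt(1 - r^2). This is just a hand check with values copied from the output above; the variable names are my own.

```r
r <- 0.1526184  # sample correlation from the cor.test() output
df <- 398       # degrees of freedom, which is n - 2

# t statistic for testing whether the true correlation is zero
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

# proportion of variance in mistrust shared with age
r_squared <- r^2

print(round(c(t = t_stat, p = p_value, r_squared = r_squared), 4))
```

The r squared value is a useful reality check: age accounts for only roughly 2% of the variance in mistrust, so the result can be statistically significant and still be a small effect.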

Exploratory Analysis 2

I wanted my next question to stray away from characteristics of the participants like age, gender and so on. I wanted more of a focus on the experimental variables. However, I have no idea how to make an interesting plot without it being another correlation one. I may save this for later and see if there’s anything else I can do.

One other question that has piqued my interest is people’s recall score. I’d like to compare people with a higher recall score (>= 7) against people with a lower recall score (<= 6) and see how recall relates to beliefs about the development of scientific knowledge. My hypothesis is that people in the experiment with a higher recall score have a slightly better eye for detail and so pay more attention to the nuance within the science community.

Recall_Development <- ExploratoryData1 %>% 
    select(Recall_score, development) %>% 

#Adding a new variable "recall_by_group"
  mutate(
    recall_by_group = case_when(
      Recall_score <= 6 ~ "Less_Recall", 
      Recall_score >= 7 ~ "More_Recall"
    ))

Descriptive

Development refers to the “epistemic beliefs about the development of scientific knowledge”. Measured on a five-point scale, it captures the belief that science is an evolving and changing subject. The higher the score, the more sophisticated the participant’s beliefs and the greater their awareness that science is uncertain and constantly evolving.

Recall_Development %>% 
  group_by(recall_by_group) %>% 
  summarise(mean = mean(development), 
            sd = sd(development), 
            n = n(),
            se = sd/sqrt(n)) %>%  
gt()

recall_by_group	mean	sd	n	se
Less_Recall	4.357724	0.5972743	82	0.06595791
More_Recall	4.592243	0.4184862	318	0.02346754

According to the table, those who had more recall (>= 7) had a slightly higher mean (4.59) than those who had less recall (4.36), indicating that those with more recall held slightly more sophisticated development beliefs overall. Whether that difference is meaningful is up to the inferential statistics to show.

Inferential statistics

I’m using t.test() to compare the means and see whether the difference between the groups is statistically significant. First I create two data frames, Less_Recall and More_Recall. This separates the more and less recall groups into two separate data sets, allowing t.test() to look at the development scores separately and compare the means.

Less_Recall <- Recall_Development %>% 
  filter(recall_by_group == "Less_Recall")

More_Recall <- Recall_Development %>% 
  filter(recall_by_group == "More_Recall")

t.test(Less_Recall$development, More_Recall$development)

## 
##  Welch Two Sample t-test
## 
## data:  Less_Recall$development and More_Recall$development
## t = -3.3499, df = 102.39, p-value = 0.001133
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.37337458 -0.09566463
## sample estimates:
## mean of x mean of y 
##  4.357724  4.592243
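The Welch test output can be reproduced from the descriptives alone, which helped me understand where the odd-looking df = 102.39 comes from. This is a hand calculation using the means, sds and group sizes from the descriptive table; the variable names are my own, and it is only a check on the output rather than a replacement for t.test().

```r
# Means, sds and group sizes copied from the descriptive table
mean_less <- 4.357724
sd_less <- 0.5972743
n_less <- 82
mean_more <- 4.592243
sd_more <- 0.4184862
n_more <- 318

# Each group's contribution to the variance of the difference in means
var_term_less <- sd_less^2 / n_less
var_term_more <- sd_more^2 / n_more

# Welch t statistic: the groups are NOT assumed to have equal variances
welch_t <- (mean_less - mean_more) / sqrt(var_term_less + var_term_more)

# Welch-Satterthwaite approximation for the degrees of freedom
welch_df <- (var_term_less + var_term_more)^2 /
  (var_term_less^2 / (n_less - 1) + var_term_more^2 / (n_more - 1))

print(round(c(t = welch_t, df = welch_df), 2))
```

The degrees of freedom are not a whole number because Welch's version adjusts them for the unequal variances and group sizes, which is the default behaviour of t.test() in R.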

The null hypothesis is rejected, as the difference is statistically significant (p < 0.05). But I need to brush up on my statistics to know whether it is of practical relevance.
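One common way to judge practical relevance is an effect size such as Cohen's d, which expresses the difference in means in standard deviation units. Below is a rough sketch computed from the descriptive table using the pooled standard deviation. Choosing the pooled-sd version of d is my own assumption here; other variants exist and would give somewhat different numbers.

```r
# Means, sds and group sizes copied from the descriptive table
mean_less <- 4.357724
sd_less <- 0.5972743
n_less <- 82
mean_more <- 4.592243
sd_more <- 0.4184862
n_more <- 318

# Pooled standard deviation, weighting each group's variance by its degrees of freedom
pooled_sd <- sqrt(
  ((n_less - 1) * sd_less^2 + (n_more - 1) * sd_more^2) /
    (n_less + n_more - 2)
)

# Cohen's d: difference in means expressed in pooled standard deviation units
cohens_d_value <- (mean_more - mean_less) / pooled_sd

print(round(cohens_d_value, 2))
```

With these numbers d comes out at about 0.5, which Cohen's benchmarks (0.2 small, 0.5 medium, 0.8 large) would call a medium effect, although the two groups have quite different sds, so the pooled value should be read with some caution.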

Data Visualisation

#Recall and Development Plot 
recall_development_plot <- ggplot(
  data = Recall_Development, 
  aes(
    x = recall_by_group, 
    y = development
  )) +
  # plot the mean development score per group instead of stacking every row
  stat_summary(fun = mean, geom = "bar") +
  # development is a continuous 1-5 score, so zoom the y axis rather than use a discrete scale
  coord_cartesian(ylim = c(1, 5))

print(recall_development_plot)

Honestly this is far from what I’d like to achieve for the second plot, and it needs a lot of work, but I have an assignment due tomorrow and so that unfortunately takes precedence!!

Learning_Log_Week_9

Edward Kwag

01/08/2021