I believe I achieved my goals this week at a comfortable rate. We finished the recording with our group and celebrated when it was done; I could not have asked for a better group!! Attending the Q&A was really useful. Although it was a bit long, I believe that was necessary to give us all the tools to tackle the exploratory analysis section of the verification report, so a massive shout-out to Jenny for all the guidance! I think I was able to trial at least one piece of exploratory analysis on my own, but there’s a lot that I’m still unsure about, so I need to focus on a deep dive in Week 9.
Completed all my goals for the week!
I think my ability with the exploratory analysis is nowhere near as refined as it could be, and a lot of it was based on the help from the Q&A. I think one of my greatest issues is still finding relevant sources online through Google. I need to spend further time trying to understand how to table the descriptive statistics better, justify why I’m doing so, and clean up both the code and the graph, and then do the same for the statistical tests.
Currently, I’m not sure which packages I will have to use. The only one I did not include is the package that provides the penguins data set, since that is a ready-made data set and I already have a working data set from the Haigh Journal article.
library(ggpubr)
## Loading required package: ggplot2
library(jmv)
library(rstatix)
##
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
##
## filter
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v tibble 3.1.2 v dplyr 1.0.6
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## v purrr 0.3.4
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks rstatix::filter(), stats::filter()
## x dplyr::lag() masks stats::lag()
library(here)
## here() starts at D:/Sync/UNSW - Year 3 THIS IS THE YEAR BABY/Term 2/PSYC3361 - Research Internship/R Markdown Projects/Group_1_projects
library(janitor)
##
## Attaching package: 'janitor'
## The following object is masked from 'package:rstatix':
##
## make_clean_names
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(ggeasy)
library(readxl)
library(gt)
library(gtsummary)
The first question that came to my mind was “Is age correlated with mistrust in science?” To be more specific, I would hypothesize that a higher age correlates with a higher level of mistrust in science. This idea isn’t based on prior research; it is just a starting point to test whether this angle would work for exploratory analysis!
So first things first is to get the descriptive statistics, and to do that I have to identify the type of each variable. I believe age and mistrust in science are both continuous. I was also thinking about treating age as a categorical variable by splitting it into ‘high age’ and ‘low age’ along the average, but I will stick with the former for now.
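The ‘high age’ / ‘low age’ split mentioned above could look something like the sketch below. It uses a small made-up data set (toy_data and its values are hypothetical stand-ins, not the real study data), and it assumes "along the average" means splitting at the mean age.

```r
library(dplyr)

# Hypothetical stand-in data; the real data set has the columns Age and mistrust
toy_data <- tibble(
  Age = c(18, 22, 25, 31, 40, 58),
  mistrust = c(1.8, 1.5, 1.7, 2.2, 2.0, 2.9)
)

# Label each participant as above or below the mean age
toy_split <- toy_data %>%
  mutate(age_group = if_else(Age > mean(Age), "high age", "low age"))

# Descriptives for mistrust within each age group
age_group_summary <- toy_split %>%
  group_by(age_group) %>%
  summarise(mean = mean(mistrust),
            sd = sd(mistrust),
            n = n())

print(age_group_summary)
```

Splitting a continuous variable like this throws away information, which is one reason to stay with the continuous version for the correlation.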
‘ExploratoryData1’ is the name of the new data set I am working with; it is read in using the read_csv() function. select() is used to keep only the variables I want to work with, which are ‘Age’ and ‘mistrust’. The cor() function is used for the correlation between two quantitative variables.
- It’s noted in one of the articles I was reading that the correlation between variables (X, Y) is equal to the correlation between variables (Y, X).
- By default, this function computes the Pearson correlation.
- na.omit() is there to get rid of any NAs just in case; I’m not sure if it matters here, since the data is already tidied.
ExploratoryData1 <- read_csv("MyDataFinalSubset2.csv") %>% # loading in the data
select(Age, mistrust)
##
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_double(),
## block = col_character(),
## Format = col_character(),
## Conflict = col_character()
## )
## i Use `spec()` for the full column specifications.
ExploratoryData1 <- ExploratoryData1 %>%
na.omit()
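The notes about cor() and na.omit() above can be checked on a small made-up example. The values below are invented purely for illustration (toy is a hypothetical data frame, not the study data).

```r
# Made-up values purely for illustration (not the study data)
toy <- data.frame(
  Age = c(18, 22, 25, NA, 40, 58),
  mistrust = c(1.8, 1.5, 1.7, 2.2, NA, 2.9)
)

# na.omit() drops every row that contains at least one NA
toy_complete <- na.omit(toy)
nrow(toy_complete)

# The correlation of (X, Y) equals the correlation of (Y, X)
r_xy <- cor(toy_complete$Age, toy_complete$mistrust)
r_yx <- cor(toy_complete$mistrust, toy_complete$Age)
all.equal(r_xy, r_yx)

# Pearson is the default method, so naming it explicitly changes nothing
r_pearson <- cor(toy_complete$Age, toy_complete$mistrust, method = "pearson")
all.equal(r_xy, r_pearson)
```

If the data really has no NAs, na.omit() simply returns the same rows, so keeping it in is harmless.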
Descriptive statistics
In terms of getting the descriptives, I’m just using the template that was shown in the Week 8 Q&A, and I’m not sure this is even the right way to table it. I want to ask the right questions on Slack about how to fix this, because the continuous-variable example in the Q&A used the contains("length") function and so was able to include both bill and flipper length.
- What I think it’s done is taken all the participants of each age (for example, everyone aged 18) and grouped them together, but I’m not sure I want them grouped like that rather than having every data point individually.
- I do need to put greater thought into the descriptives and into WHY I’m doing this. The issue may stem from it not even being proper to table two continuous variables this way.
- gt() is used to table the results.
ExploratoryData1 %>%
group_by(Age) %>%
summarise(mean = mean(mistrust),
sd = sd(mistrust),
n = n(),
se = sd/sqrt(n)) %>%
gt()
| Age | mean | sd | n | se |
|---|---|---|---|---|
| 18 | 1.805556 | 0.4371241 | 12 | 0.12618685 |
| 19 | 2.030303 | 0.4334499 | 11 | 0.13069005 |
| 20 | 2.070175 | 0.9201379 | 19 | 0.21109410 |
| 21 | 1.833333 | 0.6992059 | 16 | 0.17480147 |
| 22 | 1.473684 | 0.4488090 | 19 | 0.10296384 |
| 23 | 1.541667 | 0.3959116 | 8 | 0.13997590 |
| 24 | 1.825397 | 0.4548388 | 21 | 0.09925396 |
| 25 | 1.750000 | 0.6382847 | 16 | 0.15957118 |
| 26 | 1.809524 | 0.6877615 | 21 | 0.15008186 |
| 27 | 1.846154 | 0.5021322 | 13 | 0.13926642 |
| 28 | 1.958333 | 0.6025738 | 8 | 0.21304203 |
| 29 | 2.228070 | 0.6763430 | 19 | 0.15516373 |
| 30 | 2.176471 | 0.5542468 | 17 | 0.13442460 |
| 31 | 2.176471 | 0.6467617 | 17 | 0.15686275 |
| 32 | 1.784314 | 0.5644709 | 17 | 0.13690431 |
| 33 | 1.952381 | 0.7559289 | 7 | 0.28571429 |
| 34 | 2.454545 | 0.8065389 | 11 | 0.24318064 |
| 35 | 2.333333 | 0.6356417 | 12 | 0.18349396 |
| 36 | 1.933333 | 0.5837300 | 10 | 0.18459164 |
| 37 | 1.948718 | 0.4683729 | 13 | 0.12990328 |
| 38 | 1.916667 | 0.6309898 | 4 | 0.31549491 |
| 39 | 1.888889 | 0.3849002 | 3 | 0.22222222 |
| 40 | 2.000000 | 0.8819171 | 3 | 0.50917508 |
| 41 | 1.800000 | 0.4472136 | 5 | 0.20000000 |
| 42 | 2.121212 | 0.5430925 | 11 | 0.16374856 |
| 43 | 2.142857 | 0.5727498 | 7 | 0.21647907 |
| 44 | 2.083333 | 0.6871843 | 4 | 0.34359214 |
| 45 | 1.583333 | 0.7876359 | 4 | 0.39381797 |
| 46 | 1.714286 | 0.4879500 | 7 | 0.18442778 |
| 47 | 1.916667 | 0.8766519 | 4 | 0.43832594 |
| 48 | 1.777778 | 0.5018484 | 6 | 0.20487877 |
| 49 | 1.600000 | 0.3651484 | 5 | 0.16329932 |
| 50 | 2.277778 | 0.7428673 | 6 | 0.30327431 |
| 51 | 1.916667 | 0.1666667 | 4 | 0.08333333 |
| 52 | 1.833333 | 1.0000000 | 4 | 0.50000000 |
| 53 | 2.133333 | 0.6912147 | 5 | 0.30912062 |
| 54 | 2.000000 | NA | 1 | NA |
| 55 | 2.666667 | NA | 1 | NA |
| 56 | 2.083333 | 0.1666667 | 4 | 0.08333333 |
| 57 | 2.166667 | 0.2357023 | 2 | 0.16666667 |
| 58 | 2.888889 | 1.8358568 | 3 | 1.05993245 |
| 59 | 1.833333 | 1.1785113 | 2 | 0.83333333 |
| 60 | 2.666667 | 0.9428090 | 2 | 0.66666667 |
| 61 | 2.000000 | 0.0000000 | 2 | 0.00000000 |
| 62 | 2.444444 | 0.8388705 | 3 | 0.48432210 |
| 63 | 2.666667 | 1.1547005 | 3 | 0.66666667 |
| 64 | 2.000000 | 0.4714045 | 4 | 0.23570226 |
| 65 | 2.333333 | 0.9428090 | 2 | 0.66666667 |
| 72 | 2.000000 | NA | 1 | NA |
| 73 | 2.333333 | NA | 1 | NA |
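One possible alternative to grouping by Age, sketched here on made-up data, is to give each continuous variable its own row of descriptives. This is only one way of doing it and not necessarily what the Q&A template intended; toy_data and its values are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in data; the real data set has the columns Age and mistrust
toy_data <- tibble(
  Age = c(18, 22, 25, 31, 40, 58),
  mistrust = c(1.8, 1.5, 1.7, 2.2, 2.0, 2.9)
)

# Stack the two variables into one column, then summarise each variable once
overall_descriptives <- toy_data %>%
  pivot_longer(cols = c(Age, mistrust),
               names_to = "variable",
               values_to = "value") %>%
  group_by(variable) %>%
  summarise(mean = mean(value),
            sd = sd(value),
            n = n(),
            se = sd / sqrt(n))

print(overall_descriptives)
```

The result has one row for Age and one for mistrust, and it could be piped into gt() in the same way as the grouped table.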
- mistrust_age_plot is the name of the new plot, to relate it to the variables ‘Age’ and ‘mistrust’
- ggplot(data = ExploratoryData1) means the plot uses the exploratory data set
- x = Age specifies that Age is along the x axis
- y = mistrust specifies that the rate of mistrust in the scientific community is along the y axis
- geom_point is used to include the scatterplot
- geom_smooth(method = "lm") adds the line of best fit, with "lm" standing for linear model; this was taught to us by Jenny in the Week 8 Q&A!
- theme_minimal() just changes the background of the plot from grey to white
#Mistrust and Age Plot
mistrust_age_plot <- ggplot(
data = ExploratoryData1,
aes(
x = Age,
y = mistrust,
)
) +
geom_point() +
geom_smooth(method = "lm") + #line of best fit
theme_minimal()
print(mistrust_age_plot)
## `geom_smooth()` using formula 'y ~ x'
cor() gives the correlation coefficient. cor.test() reports the same coefficient and also tells you whether the correlation is statistically significant, so I could just use cor.test(), as cor() is redundant. I’ll include both for now, just to show that they can both be utilised! As for the subsequent analysis and the IMPORTANCE of the statistics, I still have a lot I need to catch up on for the stats.
cor(ExploratoryData1$Age, ExploratoryData1$mistrust)
## [1] 0.1526184
cor.test(ExploratoryData1$Age, ExploratoryData1$mistrust)
##
## Pearson's product-moment correlation
##
## data: ExploratoryData1$Age and ExploratoryData1$mistrust
## t = 3.0808, df = 398, p-value = 0.002208
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.05539569 0.24697431
## sample estimates:
## cor
## 0.1526184
# So the result is as follows: r = 0.1526184, t = 3.0808, df = 398, p-value = 0.002208
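To get a feel for how the numbers in the cor.test() output relate to each other, here is a small sketch that rebuilds the t statistic and p-value from the correlation and the degrees of freedom reported above, using the standard formula t = r * sqrt(df) / sqrt(1 - r^2). The variable names are my own, and r squared is included as a rough guide to how much of the variation in mistrust is shared with age.

```r
r  <- 0.1526184  # correlation reported by cor() above
df <- 398        # degrees of freedom reported by cor.test() (n - 2)

# t statistic for testing whether the true correlation is zero
t_value <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(abs(t_value), df = df, lower.tail = FALSE)

# Proportion of variance shared between Age and mistrust
r_squared <- r^2

round(t_value, 4)
round(p_value, 6)
round(r_squared, 3)
```

So the correlation is statistically significant, but r squared is small, which suggests age explains only a little of the variation in mistrust.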