Once again, I want to flag up front that there are no plots for my group verification project this week. My aim this week was to catch up rather than just copy plotting code for another week, so the aim for next week is to do all the plots that I need to. Thanks for reading (if you bothered to read all of this, haha)!
Loading libraries
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.2 v dplyr 1.0.6
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Reading the data
HaighData2 <- readr::read_csv("Study 7 data.csv")
## Warning: Duplicated column names deduplicated: 'Timer_First Click' =>
## 'Timer_First Click_1' [26], 'Timer_Last Click' => 'Timer_Last Click_1' [27],
## 'Timer_Page Submit' => 'Timer_Page Submit_1' [28], 'Timer_Click Count' =>
## 'Timer_Click Count_1' [29], 'Timer_First Click' => 'Timer_First Click_2' [30],
## 'Timer_Last Click' => 'Timer_Last Click_2' [31], 'Timer_Page Submit' =>
## 'Timer_Page Submit_2' [32], 'Timer_Click Count' => 'Timer_Click Count_2' [33],
## 'Timer_First Click' => 'Timer_First Click_3' [34], 'Timer_Last Click' =>
## 'Timer_Last Click_3' [35], 'Timer_Page Submit' => 'Timer_Page Submit_3' [36],
## 'Timer_Click Count' => 'Timer_Click Count_3' [37], 'Timer_First Click' =>
## 'Timer_First Click_4' [38], 'Timer_Last Click' => 'Timer_Last Click_4' [39],
## 'Timer_Page Submit' => 'Timer_Page Submit_4' [40], 'Timer_Click Count' =>
## 'Timer_Click Count_4' [41], 'Timer_First Click' => 'Timer_First Click_5' [42],
## 'Timer_Last Click' => 'Timer_Last Click_5' [43], 'Timer_Page Submit' =>
## 'Timer_Page Submit_5' [44], 'Timer_Click Count' => 'Timer_Click Count_5' [45],
## 'Timer_First Click' => 'Timer_First Click_6' [46], 'Timer_Last Click' =>
## 'Timer_Last Click_6' [47], 'Timer_Page Submit' => 'Timer_Page Submit_6' [48],
## 'Timer_Click Count' => 'Timer_Click Count_6' [49], 'Timer_First Click' =>
## 'Timer_First Click_7' [50], 'Timer_Last Click' => 'Timer_Last Click_7' [51],
## 'Timer_Page Submit' => 'Timer_Page Submit_7' [52], 'Timer_Click Count' =>
## 'Timer_Click Count_7' [53], 'Timer_First Click' => 'Timer_First Click_8' [54],
## 'Timer_Last Click' => 'Timer_Last Click_8' [55], 'Timer_Page Submit' =>
## 'Timer_Page Submit_8' [56], 'Timer_Click Count' => 'Timer_Click Count_8' [57],
## 'Timer_First Click' => 'Timer_First Click_9' [58], 'Timer_Last Click' =>
## 'Timer_Last Click_9' [59], 'Timer_Page Submit' => 'Timer_Page Submit_9' [60],
## 'Timer_Click Count' => 'Timer_Click Count_9' [61], 'Timer_First Click' =>
## 'Timer_First Click_10' [62], 'Timer_Last Click' => 'Timer_Last Click_10' [63],
## 'Timer_Page Submit' => 'Timer_Page Submit_10' [64], 'Timer_Click Count' =>
## 'Timer_Click Count_10' [65], 'Timer_First Click' => 'Timer_First Click_11' [66],
## 'Timer_Last Click' => 'Timer_Last Click_11' [67], 'Timer_Page Submit' =>
## 'Timer_Page Submit_11' [68], 'Timer_Click Count' => 'Timer_Click Count_11' [69],
## 'Timer_First Click' => 'Timer_First Click_12' [70], 'Timer_Last Click' =>
## 'Timer_Last Click_12' [71], 'Timer_Page Submit' => 'Timer_Page Submit_12' [72],
## 'Timer_Click Count' => 'Timer_Click Count_12' [73], 'Timer_First Click' =>
## 'Timer_First Click_13' [74], 'Timer_Last Click' => 'Timer_Last Click_13' [75],
## 'Timer_Page Submit' => 'Timer_Page Submit_13' [76], 'Timer_Click Count' =>
## 'Timer_Click Count_13' [77], 'Timer_First Click' => 'Timer_First Click_14' [78],
## 'Timer_Last Click' => 'Timer_Last Click_14' [79], 'Timer_Page Submit' =>
## 'Timer_Page Submit_14' [80], 'Timer_Click Count' => 'Timer_Click Count_14' [81],
## 'Timer_First Click' => 'Timer_First Click_15' [82], 'Timer_Last Click' =>
## 'Timer_Last Click_15' [83], 'Timer_Page Submit' => 'Timer_Page Submit_15' [84],
## 'Timer_Click Count' => 'Timer_Click Count_15' [85], 'Timer_First Click' =>
## 'Timer_First Click_16' [86], 'Timer_Last Click' => 'Timer_Last Click_16' [87],
## 'Timer_Page Submit' => 'Timer_Page Submit_16' [88], 'Timer_Click Count' =>
## 'Timer_Click Count_16' [89], 'Timer_First Click' => 'Timer_First Click_17' [90],
## 'Timer_Last Click' => 'Timer_Last Click_17' [91], 'Timer_Page Submit' =>
## 'Timer_Page Submit_17' [92], 'Timer_Click Count' => 'Timer_Click Count_17' [93],
## 'Timer_First Click' => 'Timer_First Click_18' [94], 'Timer_Last Click' =>
## 'Timer_Last Click_18' [95], 'Timer_Page Submit' => 'Timer_Page Submit_18' [96],
## 'Timer_Click Count' => 'Timer_Click Count_18' [97], 'Timer_First Click' =>
## 'Timer_First Click_19' [98], 'Timer_Last Click' => 'Timer_Last Click_19' [99],
## 'Timer_Page Submit' => 'Timer_Page Submit_19' [100], 'Timer_Click Count'
## => 'Timer_Click Count_19' [101], 'Timer_First Click' => 'Timer_First
## Click_20' [102], 'Timer_Last Click' => 'Timer_Last Click_20' [103], 'Timer_Page
## Submit' => 'Timer_Page Submit_20' [104], 'Timer_Click Count' => 'Timer_Click
## Count_20' [105], 'Timer_First Click' => 'Timer_First Click_21' [106],
## 'Timer_Last Click' => 'Timer_Last Click_21' [107], 'Timer_Page Submit'
## => 'Timer_Page Submit_21' [108], 'Timer_Click Count' => 'Timer_Click
## Count_21' [109], 'Timer_First Click' => 'Timer_First Click_22' [110],
## 'Timer_Last Click' => 'Timer_Last Click_22' [111], 'Timer_Page Submit'
## => 'Timer_Page Submit_22' [112], 'Timer_Click Count' => 'Timer_Click
## Count_22' [113], 'Timer_First Click' => 'Timer_First Click_23' [114],
## 'Timer_Last Click' => 'Timer_Last Click_23' [115], 'Timer_Page Submit'
## => 'Timer_Page Submit_23' [116], 'Timer_Click Count' => 'Timer_Click
## Count_23' [117], 'Timer_First Click' => 'Timer_First Click_24' [118],
## 'Timer_Last Click' => 'Timer_Last Click_24' [119], 'Timer_Page Submit'
## => 'Timer_Page Submit_24' [120], 'Timer_Click Count' => 'Timer_Click
## Count_24' [121], 'Timer_First Click' => 'Timer_First Click_25' [122],
## 'Timer_Last Click' => 'Timer_Last Click_25' [123], 'Timer_Page Submit'
## => 'Timer_Page Submit_25' [124], 'Timer_Click Count' => 'Timer_Click
## Count_25' [125], 'Timer_First Click' => 'Timer_First Click_26' [126],
## 'Timer_Last Click' => 'Timer_Last Click_26' [127], 'Timer_Page Submit'
## => 'Timer_Page Submit_26' [128], 'Timer_Click Count' => 'Timer_Click
## Count_26' [129], 'Timer_First Click' => 'Timer_First Click_27' [130],
## 'Timer_Last Click' => 'Timer_Last Click_27' [131], 'Timer_Page Submit'
## => 'Timer_Page Submit_27' [132], 'Timer_Click Count' => 'Timer_Click
## Count_27' [133], 'Timer_First Click' => 'Timer_First Click_28' [134],
## 'Timer_Last Click' => 'Timer_Last Click_28' [135], 'Timer_Page Submit'
## => 'Timer_Page Submit_28' [136], 'Timer_Click Count' => 'Timer_Click
## Count_28' [137], 'Timer_First Click' => 'Timer_First Click_29' [138],
## 'Timer_Last Click' => 'Timer_Last Click_29' [139], 'Timer_Page Submit'
## => 'Timer_Page Submit_29' [140], 'Timer_Click Count' => 'Timer_Click
## Count_29' [141], 'Timer_First Click' => 'Timer_First Click_30' [142],
## 'Timer_Last Click' => 'Timer_Last Click_30' [143], 'Timer_Page Submit'
## => 'Timer_Page Submit_30' [144], 'Timer_Click Count' => 'Timer_Click
## Count_30' [145], 'Timer_First Click' => 'Timer_First Click_31' [146],
## 'Timer_Last Click' => 'Timer_Last Click_31' [147], 'Timer_Page Submit'
## => 'Timer_Page Submit_31' [148], 'Timer_Click Count' => 'Timer_Click
## Count_31' [149], 'Timer_First Click' => 'Timer_First Click_32' [150],
## 'Timer_Last Click' => 'Timer_Last Click_32' [151], 'Timer_Page Submit'
## => 'Timer_Page Submit_32' [152], 'Timer_Click Count' => 'Timer_Click
## Count_32' [153], 'Timer_First Click' => 'Timer_First Click_33' [154],
## 'Timer_Last Click' => 'Timer_Last Click_33' [155], 'Timer_Page Submit'
## => 'Timer_Page Submit_33' [156], 'Timer_Click Count' => 'Timer_Click
## Count_33' [157], 'Timer_First Click' => 'Timer_First Click_34' [158],
## 'Timer_Last Click' => 'Timer_Last Click_34' [159], 'Timer_Page Submit'
## => 'Timer_Page Submit_34' [160], 'Timer_Click Count' => 'Timer_Click
## Count_34' [161], 'Timer_First Click' => 'Timer_First Click_35' [162],
## 'Timer_Last Click' => 'Timer_Last Click_35' [163], 'Timer_Page Submit'
## => 'Timer_Page Submit_35' [164], 'Timer_Click Count' => 'Timer_Click
## Count_35' [165], 'Timer_First Click' => 'Timer_First Click_36' [166],
## 'Timer_Last Click' => 'Timer_Last Click_36' [167], 'Timer_Page Submit'
## => 'Timer_Page Submit_36' [168], 'Timer_Click Count' => 'Timer_Click
## Count_36' [169], 'Timer_First Click' => 'Timer_First Click_37' [170],
## 'Timer_Last Click' => 'Timer_Last Click_37' [171], 'Timer_Page Submit'
## => 'Timer_Page Submit_37' [172], 'Timer_Click Count' => 'Timer_Click
## Count_37' [173], 'Timer_First Click' => 'Timer_First Click_38' [174],
## 'Timer_Last Click' => 'Timer_Last Click_38' [175], 'Timer_Page Submit'
## => 'Timer_Page Submit_38' [176], 'Timer_Click Count' => 'Timer_Click
## Count_38' [177], 'Timer_First Click' => 'Timer_First Click_39' [178],
## 'Timer_Last Click' => 'Timer_Last Click_39' [179], 'Timer_Page Submit'
## => 'Timer_Page Submit_39' [180], 'Timer_Click Count' => 'Timer_Click
## Count_39' [181], 'Timer_First Click' => 'Timer_First Click_40' [182],
## 'Timer_Last Click'
##
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_character()
## )
## i Use `spec()` for the full column specifications.
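Since the column specification above reports `.default = col_character()`, every column arrived as text. Here is a minimal sketch of one way to recover numeric columns, assuming the values are clean numbers stored as strings: `readr::type_convert()` re-guesses the type of each character column. The toy data frame and its column names are made up for illustration; a real survey export may contain non-numeric rows that stop a column from converting.

```r
library(readr)

# Hypothetical toy data: everything stored as character, as in the import above
toy_raw <- data.frame(
  Age = c("18", "25", "73"),
  Gender = c("1", "2", "2"),
  Comment = c("ok", "fine", "good"),
  stringsAsFactors = FALSE
)

# Re-guess the type of every character column
toy_typed <- type_convert(toy_raw)

str(toy_typed)
```

After the conversion, columns holding only numbers should come back as numeric, while free-text columns stay as character.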
Cleaning up the Data
Consent
Consent_participants <- HaighData2 %>%
  filter(Consent == "1") #Filter for those who gave consent
count(Consent_participants)
## # A tibble: 1 x 1
## n
## <int>
## 1 412
HaighData2 <- as.data.frame(HaighData2) #convert the tibble to a plain data frame (filter() works on either)
HaighData2 <- HaighData2 %>% #renaming two variables with strange names
  rename(recall_score = SC0, condition = FL_12_DO)
For the descriptives in experiment 2, the original study requested a sample of 400 participants, and 412 consented to take part. After applying the pre-registered criteria, there were 400 participants. Participants were aged 18-73 (mean age = 33.5, SD = 12), with 150 male, 248 female and 2 identifying as neither.
Excluding Participants
Once again, participants were kept only if they met all of the following rules (everyone else was excluded):
- They finished the study
- They declared that they answered seriously
- They scored 4 or above on recall
HaighData2Tidied <- HaighData2 %>%
  filter(
    Finished == 1,
    Serious_check == 1,
    recall_score >= 4
  )
count(HaighData2Tidied)
## n
## 1 400
Demographics
#Age
HaighData2Tidied %>%
  summarise(
    Mean_Age = mean(Age), #returns NA because Age was read in as character; mean(as.numeric(Age)) should fix this
    SD = sd(Age),
    Max = max(Age),
    Min = min(Age)
  )
## Warning in mean.default(Age): argument is not numeric or logical: returning NA
## Mean_Age SD Max Min
## 1 NA 12.03415 73 18
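The warning above says the argument to `mean()` is not numeric, which fits with `Age` having been read in as character. Here is a minimal sketch of the fix on made-up data: convert the column once with `as.numeric()` before summarising. The toy values are hypothetical and are not taken from the study file. Converting first also protects `max()` and `min()`, since character values are compared alphabetically (for example, "9" sorts after "10").

```r
library(dplyr)

# Hypothetical toy data: Age stored as character, as in the imported file
toy <- tibble::tibble(Age = c("18", "25", "40", "73"))

age_summary <- toy %>%
  mutate(Age = as.numeric(Age)) %>% # convert once, before summarising
  summarise(
    Mean_Age = mean(Age),
    SD = sd(Age),
    Max = max(Age),
    Min = min(Age)
  )

print(age_summary)
```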
#Gender
Male_No.2 <- HaighData2Tidied %>%
  filter(Gender == 1) #keeps only the rows where Gender is 1 (male), which gives 150
count(Male_No.2)
## n
## 1 150
Female_No.2 <- HaighData2Tidied %>%
  filter(Gender == 2)
count(Female_No.2)
## n
## 1 248
Neither_No.2 <- HaighData2Tidied %>%
  filter(Gender == 3)
count(Neither_No.2)
## n
## 1 2
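The three filter-then-count steps above can probably be collapsed into one call. As a sketch on made-up data, `count(Gender)` returns one row per gender code with the number of rows in column `n`. The toy values are hypothetical and only reuse the 1/2/3 coding seen above.

```r
library(dplyr)

# Hypothetical toy data using the same 1 / 2 / 3 coding as the survey
toy <- tibble::tibble(Gender = c(1, 2, 2, 3, 1, 2))

gender_counts <- toy %>%
  count(Gender) # one row per Gender value, with the count in column n

print(gender_counts)
```

This avoids creating a separate data frame for each group just to count its rows.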
Here are just some of the highlights of what I learnt/relearned
This was just reconstructing the dino plot, but it took me a bit longer than I expected because I was missing some symbols. Once again, I know this may seem very simple to others, but redoing the process proved very useful, I guess to boost my confidence(?) in my coding fluency.
dino <- readr::read_csv("data_dino.csv")
##
## -- Column specification --------------------------------------------------------
## cols(
## horizontal = col_double(),
## vertical = col_double()
## )
print(dino)
## # A tibble: 142 x 2
## horizontal vertical
## <dbl> <dbl>
## 1 55.4 97.2
## 2 51.5 96.0
## 3 46.2 94.5
## 4 42.8 91.4
## 5 40.8 88.3
## 6 38.7 84.9
## 7 35.6 79.9
## 8 33.1 77.6
## 9 29.0 74.5
## 10 26.2 71.4
## # ... with 132 more rows
picture <- ggplot(data = dino) +
  geom_point(mapping = aes(x = horizontal, y = vertical))
plot(picture)
Exercise 1 - Hello Data
Loading libraries
library(tidyverse)
Reading Data
forensic <- read_csv("data_forensic.csv")
##
## -- Column specification --------------------------------------------------------
## cols(
## participant = col_double(),
## handwriting_expert = col_character(),
## us = col_character(),
## condition = col_character(),
## age = col_double(),
## forensic_scientist = col_character(),
## forensic_specialty = col_character(),
## handwriting_reports = col_double(),
## confidence = col_double(),
## familiarity = col_double(),
## feature = col_character(),
## est = col_double(),
## true = col_double(),
## band = col_character()
## )
glimpse(forensic)
## Rows: 5,700
## Columns: 14
## $ participant <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,~
## $ handwriting_expert <chr> "HW Expert", "HW Expert", "HW Expert", "HW Expert"~
## $ us <chr> "Non-US", "Non-US", "Non-US", "Non-US", "Non-US", ~
## $ condition <chr> "Non-US HW Expert", "Non-US HW Expert", "Non-US HW~
## $ age <dbl> 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52~
## $ forensic_scientist <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "~
## $ forensic_specialty <chr> "Handwriting", "Handwriting", "Handwriting", "Hand~
## $ handwriting_reports <dbl> 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20~
## $ confidence <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,~
## $ familiarity <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,~
## $ feature <chr> "PLCW.6.5", "PLCY.6.A", "PLCZ.3.2", "PUCX.4.b", "P~
## $ est <dbl> 1, 60, 1, 2, 5, 5, 1, 20, 3, 4, 10, 2, 90, 50, 20,~
## $ true <dbl> 1.571, 1.971, 2.100, 1.096, 1.104, 1.132, 1.376, 1~
## $ band <chr> "Band 01", "Band 01", "Band 01", "Band 01", "Band ~
Exercise 2 - Hello Data
Pipe Function and comparing participant coding layout
#participant 1
participant1 <- ungroup(
  summarise(
    group_by(
      filter(forensic, participant == 1),
      band
    ),
    mean = mean(est),
    sd = sd(est)
  )
)
#participant 2
x <- filter(forensic, participant == 2)
y <- group_by(x, band)
z <- summarise(y, mean = mean(est), sd = sd(est))
participant2 <- ungroup(z)
#participant 3
participant3 <- forensic %>%
  filter(participant == 3) %>%
  group_by(band) %>%
  summarise(mean = mean(est), sd = sd(est)) %>%
  ungroup()
#Printing participant summaries
print(participant1)
## # A tibble: 6 x 3
## band mean sd
## <chr> <dbl> <dbl>
## 1 Band 01 9.5 16.8
## 2 Band 25 46.7 31.2
## 3 Band 50 53.5 28.3
## 4 Band 75 50.4 24.1
## 5 Band 99 80.9 29.1
## 6 Band NA NA NA
print(participant2)
## # A tibble: 5 x 3
## band mean sd
## <chr> <dbl> <dbl>
## 1 Band 01 6.26 14.9
## 2 Band 25 10.9 9.62
## 3 Band 50 17.6 15.9
## 4 Band 75 21.9 16.5
## 5 Band 99 49.5 20.6
print(participant3)
## # A tibble: 5 x 3
## band mean sd
## <chr> <dbl> <dbl>
## 1 Band 01 31.2 44.1
## 2 Band 25 46.2 49.4
## 3 Band 50 48.3 47.8
## 4 Band 75 60.7 47.8
## 5 Band 99 85 33.7
To answer the question in the discussion: personally, participant summaries 1 and 3 are easier for me to understand. That said, number 2, which stores each step in an intermediate variable (it is number 3 that uses the pipe), does make the flow visually easier to follow: each function's result goes into a variable, which then feeds the next step.
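To check that the three layouts really are interchangeable, here is a small sketch on made-up data that runs the same summary in the nested style, with intermediate variables, and with the pipe, then compares the results. The toy tibble is hypothetical and just stands in for one participant's rows.

```r
library(dplyr)

# Hypothetical toy data standing in for one participant's rows
toy <- tibble::tibble(
  band = c("Band 01", "Band 01", "Band 25", "Band 25"),
  est = c(1, 3, 20, 40)
)

# Style 1: nested calls, read from the inside out
nested <- ungroup(
  summarise(group_by(toy, band), mean = mean(est), sd = sd(est))
)

# Style 2: intermediate variables, one step per line
grouped <- group_by(toy, band)
summarised <- summarise(grouped, mean = mean(est), sd = sd(est))
stepwise <- ungroup(summarised)

# Style 3: the pipe, read from top to bottom
piped <- toy %>%
  group_by(band) %>%
  summarise(mean = mean(est), sd = sd(est)) %>%
  ungroup()

# All three should hold the same summary
all.equal(nested, piped)
all.equal(stepwise, piped)
```

Since the results match, the choice between the styles comes down to which one is easiest to read.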
Exercise 3 - Calculating mean, sd and organizing data
forensic <- read_csv("data_forensic.csv")
##
## -- Column specification --------------------------------------------------------
## cols(
## participant = col_double(),
## handwriting_expert = col_character(),
## us = col_character(),
## condition = col_character(),
## age = col_double(),
## forensic_scientist = col_character(),
## forensic_specialty = col_character(),
## handwriting_reports = col_double(),
## confidence = col_double(),
## familiarity = col_double(),
## feature = col_character(),
## est = col_double(),
## true = col_double(),
## band = col_character()
## )
# Now that we have the data, what we'll do is calculate the mean and
# standard deviation of the responses (in the "est" variable), with groups
# defined by subject (in the "participant" variable) and the frequency band
# (i.e., the "band" variable)
forensic_banded <- forensic %>%
  group_by(participant, band) %>%
  summarise(mean_est = mean(est), sd_est = sd(est)) %>%
  ungroup()
## `summarise()` has grouped output by 'participant'. You can override using the `.groups` argument.
# Let's draw a plot...
ggplot(data = forensic_banded) +
  geom_violin(mapping = aes(x = band, y = mean_est)) +
  xlab("Stimulus Band") +
  ylab("Responses") +
  ggtitle("Distribution of responses") #ggtitle adds a title for the plot
## Warning: Removed 3 rows containing non-finite values (stat_ydensity).
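Two messages appear in the chunk above: the note about `.groups` and the warning about non-finite values. Here is a minimal sketch, on made-up data, of how each could be handled. Passing `.groups = "drop"` to `summarise()` returns an ungrouped result, so the message and the trailing `ungroup()` should no longer be needed. Adding `na.rm = TRUE` skips missing estimates when computing the mean. Whether dropping missing values is appropriate for the real analysis is an assumption on my part, and the toy values are hypothetical.

```r
library(dplyr)

# Hypothetical toy data with one missing estimate
toy <- tibble::tibble(
  participant = c(1, 1, 1, 2, 2, 2, 2),
  band = c("Band 01", "Band 01", "Band 25",
           "Band 01", "Band 01", "Band 25", "Band 25"),
  est = c(2, 4, 50, NA, 10, 30, 40)
)

toy_banded <- toy %>%
  group_by(participant, band) %>%
  summarise(
    mean_est = mean(est, na.rm = TRUE), # ignore missing estimates
    sd_est = sd(est, na.rm = TRUE),
    .groups = "drop" # return an ungrouped tibble
  )

print(toy_banded)
```

A group with a single observation still gets `NA` for its standard deviation, because a standard deviation needs at least two values.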
Exercise 4 - Cleaning up the graph and using the filter function
Adding a third grouping variable, handwriting_expert, to split the data between experts and novices
forensic_banded <- forensic %>%
  group_by(participant, band, handwriting_expert) %>% #added handwriting_expert
  summarise(mean_est = mean(est), sd_est = sd(est)) %>%
  ungroup() %>%
  filter(band != "Band NA") #instead of a separate line, just piped filter
## `summarise()` has grouped output by 'participant', 'band'. You can override using the `.groups` argument.
pic <- ggplot(data = forensic_banded) +
  geom_violin(mapping = aes(x = band, y = mean_est)) +
  xlab("Stimulus Band") +
  ylab("Responses") +
  ggtitle("Distribution of responses")
plot(pic)
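The summary now carries handwriting_expert, but the plot above does not use it yet. Here is one possible way to show the split, sketched on made-up data: map the variable to `fill` inside `aes()`, so each band gets one violin per group. The toy values and the "Novice" label are hypothetical, since only "HW Expert" is visible in the data preview.

```r
library(ggplot2)

# Hypothetical toy summary with the same column names as forensic_banded
toy_banded <- data.frame(
  band = rep(c("Band 01", "Band 99"), each = 6),
  handwriting_expert = rep(c("HW Expert", "Novice"), times = 6),
  mean_est = c(5, 12, 8, 20, 6, 15, 90, 70, 85, 75, 88, 60)
)

pic_split <- ggplot(data = toy_banded) +
  geom_violin(
    mapping = aes(x = band, y = mean_est, fill = handwriting_expert)
  ) +
  xlab("Stimulus Band") +
  ylab("Responses") +
  ggtitle("Distribution of responses by expertise")
```

Calling `plot(pic_split)` would then draw the figure. An alternative would be `facet_wrap(vars(handwriting_expert))`, which puts each group in its own panel.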
Exercise 5 - Saving it as an extractable data file
write.csv(df, file = "df.csv"), where df is the data frame (the set of data) you want to save
forensic_banded <- forensic %>%
  group_by(participant, band, handwriting_expert) %>% #added handwriting_expert
  summarise(mean_est = mean(est), sd_est = sd(est)) %>%
  ungroup() %>%
  filter(band != "Band NA")
## `summarise()` has grouped output by 'participant', 'band'. You can override using the `.groups` argument.
write_csv(forensic_banded, file = "summary_forensic_banded.csv")
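To confirm that a saved file holds what was intended, a quick round trip can help: write the data out, read it back in, and compare. Here is a minimal sketch on made-up data, using a temporary file so nothing in the project folder is overwritten. The toy data frame is hypothetical.

```r
library(readr)

# Hypothetical toy summary to save
toy_summary <- data.frame(
  band = c("Band 01", "Band 25"),
  mean_est = c(9.5, 46.7)
)

# Write to a temporary file, then read it back in
path <- tempfile(fileext = ".csv")
write_csv(toy_summary, file = path)
toy_back <- read_csv(
  path,
  col_types = cols(band = col_character(), mean_est = col_double())
)

print(toy_back)
```

Note that `write_csv()` is for saving and `read_csv()` is for loading, so the two are used in opposite directions.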
Changing the file extension between .png and .jpeg does not change much, I think (?), whereas changing it to .pdf produces an actual PDF file. As I thought, .png and .jpeg are both image formats. From what I found on Google, the difference is in the compression: PNG is lossless, so lines and text stay crisp, while JPEG is lossy, which gives smaller files at some cost in quality.
forensic_banded <- read_csv("summary_forensic_banded.csv") #read the saved summary back in (write.csv() here only printed the file name and set forensic_banded to NULL)
pic <- ggplot(data = forensic_banded) +
  geom_violin(mapping = aes(x = band, y = mean_est)) +
  xlab("Stimulus Band") +
  ylab("Responses") +
  ggtitle("Distribution of responses")
# Still unsure how to change width and length of the image file
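On the open question of image size: `ggsave()` in ggplot2 takes `width`, `height` and `units` arguments, and picks the file type from the extension. Here is a minimal sketch on made-up data. The toy plot, the file name and the 16 x 9 cm size are only illustrative choices, and the file goes to a temporary folder.

```r
library(ggplot2)

# Hypothetical toy plot to save
toy <- data.frame(x = c(1, 2, 3), y = c(2, 4, 8))
toy_pic <- ggplot(data = toy) +
  geom_point(mapping = aes(x = x, y = y))

# Save as PNG with an explicit size; the extension sets the format
out_file <- file.path(tempdir(), "toy_pic.png")
ggsave(
  filename = out_file,
  plot = toy_pic,
  width = 16,
  height = 9,
  units = "cm",
  dpi = 300
)

file.exists(out_file)
```

Swapping the extension to .pdf or .jpeg in `filename` should change the output format without any other edits.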