You will be using this dataset: https://osf.io/jfdtk/
The data set is from this publication: https://cdn.ymaws.com/www.psichi.org/resource/resmgr/journal_2019/24_2_crambletalvarez.pdf
A codebook can be found here: https://osf.io/bkd5m/
You can complete and submit an R script for any of the four levels, but higher levels require the completion of lower levels (in other words, if you want to turn in a Level 3 script, you'll also need to do Levels 1 and 2).
library(tidyverse)
library(here)
library(janitor)
library(papaja)
library(jmv)
march <- read_csv(here("data", "March.csv")) %>%
  clean_names()
## Parsed with column specification:
## cols(
## .default = col_double(),
## StartDate = col_character(),
## EndDate = col_character(),
## Specify_ethnicity = col_character(),
## University = col_character(),
## Class_specify = col_logical(),
## Major = col_character(),
## Nameofwomenclass = col_character()
## )
## See spec(...) for full column specifications.
Let's start by making the data smaller and easier to work with. The exercise only needs gpa, the recognition scores (male, female, poc, white), and the gender of the participant, so select just those variables plus age and ethnicity.
march_select <- march %>%
  select(age, ethnicity, gender, gpa, male_score, female_score, poc_score, white_score)
#Level 1 : sort
Sort the data by GPA from low to high.
Use the names() function to get a list of the variables. Which one has GPA in it?
names(march_select)
## [1] "age" "ethnicity" "gender" "gpa"
## [5] "male_score" "female_score" "poc_score" "white_score"
Use arrange() to sort by gpa. It sorts ascending (low to high) by default; put - in front of the column name to sort descending (high to low).
asc_gpa_sorted <- march_select %>%
  arrange(gpa)
des_gpa_sorted <- march_select %>%
  arrange(-gpa)
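As a small aside, dplyr also provides desc() for descending sorts. The sketch below uses a made-up toy tibble (not the March data) to show that it gives the same order as the minus sign for a numeric column; the names toy and toy_desc are hypothetical.

```r
library(dplyr)

# Toy data standing in for march_select (values are made up for illustration)
toy <- tibble(id = 1:4, gpa = c(3.2, 2.5, 3.9, 3.0))

# desc() sorts high to low; unlike the minus sign it also works on character columns
toy_desc <- toy %>%
  arrange(desc(gpa))

toy_desc$gpa
```

Either form is fine for gpa; desc() is mainly useful when the sorting column is not numeric.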
#Level 2 : summarise
Produce the means and standard deviations for the following variables:
Male_score (recognition for male psychologists like Sigmund Freud; higher scores = more recognition)
Female_score (recognition for psychologist women)
POC_score (recognition for psychologists of color)
White_score (recognition for white psychologists)
Use dplyr summarise variants (_at) to get mean/sd of multiple columns.
# mean
march_select %>%
  summarise_at(c("male_score", "female_score", "poc_score", "white_score"), mean, na.rm = TRUE)
## # A tibble: 1 x 4
## male_score female_score poc_score white_score
## <dbl> <dbl> <dbl> <dbl>
## 1 1.61 0.447 0.300 1.23
# standard deviation
march_select %>%
  summarise_at(c("male_score", "female_score", "poc_score", "white_score"), sd, na.rm = TRUE)
## # A tibble: 1 x 4
## male_score female_score poc_score white_score
## <dbl> <dbl> <dbl> <dbl>
## 1 0.479 0.344 0.377 0.416
Get the mean and sd at the same time by giving summarise_at a list of functions. With an unnamed list (option1), the new columns get the uninformative suffixes _fn1 and _fn2. To get meaningful names, use purrr-style lambdas (option2) or name the list elements explicitly (option3).
#option1
march_select %>%
  summarise_at(c("male_score", "female_score", "poc_score", "white_score"), list(mean, sd))
## # A tibble: 1 x 8
## male_score_fn1 female_score_fn1 poc_score_fn1 white_score_fn1
## <dbl> <dbl> <dbl> <dbl>
## 1 1.61 0.447 0.300 1.23
## # … with 4 more variables: male_score_fn2 <dbl>, female_score_fn2 <dbl>,
## # poc_score_fn2 <dbl>, white_score_fn2 <dbl>
#option2
march_select %>%
  summarise_at(c("male_score", "female_score", "poc_score", "white_score"), list(~mean(.), ~sd(.)))
## # A tibble: 1 x 8
## male_score_mean female_score_me… poc_score_mean white_score_mean
## <dbl> <dbl> <dbl> <dbl>
## 1 1.61 0.447 0.300 1.23
## # … with 4 more variables: male_score_sd <dbl>, female_score_sd <dbl>,
## # poc_score_sd <dbl>, white_score_sd <dbl>
#option3
march_select %>%
  summarise_at(c("male_score", "female_score", "poc_score", "white_score"), list(mean = mean, sd = sd))
## # A tibble: 1 x 8
## male_score_mean female_score_me… poc_score_mean white_score_mean
## <dbl> <dbl> <dbl> <dbl>
## 1 1.61 0.447 0.300 1.23
## # … with 4 more variables: male_score_sd <dbl>, female_score_sd <dbl>,
## # poc_score_sd <dbl>, white_score_sd <dbl>
More info about the summarise_at/all/if variants can be found in the dplyr documentation.
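In dplyr 1.0.0 and later, the scoped variants such as summarise_at() are superseded by across(). The following is a minimal sketch of the same idea as option3 written with across(), assuming a recent dplyr; the toy tibble and the name toy_summary are made up for illustration and are not part of the exercise data.

```r
library(dplyr)

# Toy data with one missing value (made up for illustration)
toy <- tibble(
  male_score   = c(1, 2, 3, NA),
  female_score = c(0, 1, 2, 1)
)

# A named list of lambdas gives columns named <column>_<function>
toy_summary <- toy %>%
  summarise(across(
    c(male_score, female_score),
    list(mean = ~ mean(.x, na.rm = TRUE), sd = ~ sd(.x, na.rm = TRUE))
  ))

names(toy_summary)
```

Writing the functions as lambdas makes it easy to pass na.rm = TRUE to each one.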
Level 3: Create a graph to compare the mean scores for recognition of women and for recognition of men.
We haven't made the data long yet. Let's do that before we plot.
Create a data frame with just participant age, gender, ethnicity, gpa, and the recognition scores for men and women, then pivot the two score columns into long format.
to_plot <- march_select %>%
  select(age, gender, ethnicity, gpa, male_score, female_score) %>%
  pivot_longer(cols = male_score:female_score, names_to = "gender_recog", values_to = "score")
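To make the reshaping step concrete, here is a small sketch on made-up data (the tibble toy_wide and its values are hypothetical). Each participant row becomes two rows, one per score column, with the old column name stored in gender_recog.

```r
library(dplyr)
library(tidyr)

# Two hypothetical participants in wide format
toy_wide <- tibble(
  id           = 1:2,
  male_score   = c(2, 1),
  female_score = c(0, 1)
)

# Stack the two score columns: names go to gender_recog, values go to score
toy_long <- toy_wide %>%
  pivot_longer(
    cols      = male_score:female_score,
    names_to  = "gender_recog",
    values_to = "score"
  )

toy_long
```

The long table has twice as many rows as the wide one, which is what group_by(gender_recog) relies on in the plotting step.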
Summarise and plot by gender of pioneer.
to_plot %>%
  group_by(gender_recog) %>%
  summarise(mean = mean(score)) %>%
  ggplot(aes(x = gender_recog, y = mean)) +
  geom_col() +
  theme_apa() +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 3))
Level 4: Run an independent-samples t-test comparing mean recognition of women (the Female_score variable) across male and female participants (Gender).
Make a new data frame with just the female scores.
female <- to_plot %>%
  filter(gender_recog == "female_score")
The jamovi software has an associated R package called 'jmv' that provides wrappers for simple stats functions.
The independent-samples t-test code looks like this:
# template code
ttestIS(formula = DV ~ group, data = df)
# our case
ttestIS(formula = score ~ gender, data = female)
This will throw an error the first time because there are 3 levels of gender in this dataset. According to the codebook, gender is coded 1 = male, 2 = female, 3 = other. Confirm this using the unique() function.
unique(female$gender)
## [1] 2 1 3
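unique() shows which codes occur but not how often. If the group sizes matter, dplyr's count() is one way to see them; the sketch below uses made-up gender codes, not the March data, and the names toy and gender_counts are hypothetical.

```r
library(dplyr)

# Made-up gender codes: 1 = male, 2 = female, 3 = other
toy <- tibble(gender = c(2, 1, 3, 2, 2, 1))

# One row per code, with the number of participants in column n
gender_counts <- toy %>%
  count(gender)

gender_counts
```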
An independent-samples t-test compares exactly two groups, so filter out the "other" level to get it to work.
female <- female %>%
  filter(gender != 3)
Try the t-test again.
Assign the output to an object called results and look at the t-test table for that object using results$ttest. Then convert the table to a data frame to make it easier to refer to individual values.
results <- ttestIS(formula = score ~ gender, data = female)
results$ttest
##
## Independent Samples T-Test
## ─────────────────────────────────────────────────────
## statistic df p
## ─────────────────────────────────────────────────────
## score Student's t 0.288 244 0.773
## ─────────────────────────────────────────────────────
ttest_df <- as.data.frame(results$ttest)
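As a cross-check on the workflow, the same kind of test can be run with base R's t.test(). This is a sketch on made-up data rather than the March data, and the names toy, two_groups, and check are hypothetical. The jmv output above reports Student's t, so var.equal = TRUE is set here; base R defaults to Welch's test otherwise.

```r
# Toy data: gender coded 1 = male, 2 = female, 3 = other, as in the codebook
toy <- data.frame(
  gender = c(1, 1, 1, 2, 2, 2, 3),
  score  = c(0, 1, 2, 1, 2, 3, 1)
)

# Keep only the two groups being compared and label them
two_groups <- subset(toy, gender != 3)
two_groups$gender <- factor(two_groups$gender, levels = c(1, 2), labels = c("male", "female"))

# var.equal = TRUE requests Student's t (pooled variance)
check <- t.test(score ~ gender, data = two_groups, var.equal = TRUE)

check$statistic  # t value
check$parameter  # degrees of freedom
check$p.value
```

Converting gender to a labelled factor is optional, but it makes the output easier to read than the numeric codes.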