In this markdown, we’re going to summarize the results of everyone’s literature search for their group project so far. We’ll do that by reading the data directly from each group’s relevant_studies Google spreadsheet.

Get data

First, we need the sheet id for each of your Google spreadsheets containing your literature searches. This id is the long string between /d/ and /edit in the URL of the spreadsheet.
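As a minimal sketch of where the id sits in the URL, the snippet below pulls it out with a regular expression. The URL and the helper name `extract_sheet_id` are made up for illustration; they are not part of the class spreadsheets or of googlesheets4.

```r
# Hypothetical URL, used only to show the shape of a Google Sheets link.
sheet_url <- "https://docs.google.com/spreadsheets/d/1AbCdEfGhIjKlMnOpQrStUvWxYz0123456789/edit#gid=0"

# Hypothetical helper: keep only the token that follows "/spreadsheets/d/".
extract_sheet_id <- function(url) {
  sub("^.*/spreadsheets/d/([A-Za-z0-9_-]+).*$", "\\1", url)
}

EXAMPLE_SHEET_ID <- extract_sheet_id(sheet_url)
print(EXAMPLE_SHEET_ID)
```

In practice you can also just copy the id out of the address bar by hand; the helper is only a convenience.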

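Next, let’s read the data from your Google spreadsheet directly into R. To do that, we can use the googlesheets4 R package and the read_sheet function. The code below assumes the sheet ids are already stored in SHEET_ID_G1, SHEET_ID_G2, and SHEET_ID_G4.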

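library(tidyverse)     # dplyr, tidyr, ggplot2, and the %>% pipe
library(googlesheets4) # read_sheet()
library(knitr)         # kable()

# The columns every group's sheet is expected to share
COLS_WE_CARE_ABOUT <- c("coder_name", "unique_id", "screening_decision", "exclusion_reason")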

# all_of() tells select() that COLS_WE_CARE_ABOUT is a vector of column names
g1_relevant_studies <- read_sheet(SHEET_ID_G1, "relevant_studies") %>%
  select(all_of(COLS_WE_CARE_ABOUT)) %>%
  mutate(group = "Minimal Group Paradigm Group")

g2_relevant_studies <- read_sheet(SHEET_ID_G2, "relevant_studies") %>%
  select(all_of(COLS_WE_CARE_ABOUT)) %>%
  mutate(group = "Linda Group")

# This group's entries are spread over several sheets, read here by position.
# NOTE: sheet 5 is read twice below, so its rows appear twice in the result.
g4_relevant_studies <- read_sheet(SHEET_ID_G4, 5) %>%
  select(all_of(COLS_WE_CARE_ABOUT)) %>%
  bind_rows(read_sheet(SHEET_ID_G4, 5) %>% select(all_of(COLS_WE_CARE_ABOUT))) %>%
  bind_rows(read_sheet(SHEET_ID_G4, 6) %>% select(all_of(COLS_WE_CARE_ABOUT))) %>%
  bind_rows(read_sheet(SHEET_ID_G4, 7) %>% select(all_of(COLS_WE_CARE_ABOUT))) %>%
  bind_rows(read_sheet(SHEET_ID_G4, 8) %>% select(all_of(COLS_WE_CARE_ABOUT))) %>%
  mutate(group = "Syntactic Bootstrapping Group")
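The repeated bind_rows calls for a multi-sheet group could also be written as a loop over sheet positions. The sketch below shows one way to do that. It uses a stand-in function, `fake_read_sheet`, that returns toy data so the example runs without Google access; that function, the toy values, and the choice to read each sheet exactly once are assumptions for illustration.

```r
library(dplyr)
library(purrr)

COLS <- c("coder_name", "unique_id", "screening_decision", "exclusion_reason")

# Stand-in for read_sheet(): returns a one-row toy tibble for a sheet position.
fake_read_sheet <- function(sheet) {
  tibble(
    coder_name = paste0("coder", sheet),
    unique_id = paste0("paper", sheet),
    screening_decision = "include",
    exclusion_reason = NA_character_,
    notes = "extra column that select() drops"
  )
}

# Assumed sheet positions; each one is read exactly once here.
sheets_to_read <- c(5, 6, 7, 8)

g4_sketch <- sheets_to_read %>%
  map(~ fake_read_sheet(.x) %>% select(all_of(COLS))) %>% # one tibble per sheet
  bind_rows() %>%                                         # stack them
  mutate(group = "Syntactic Bootstrapping Group")

print(g4_sketch)
```

With real data, `fake_read_sheet(.x)` would be swapped for a call to read_sheet with the group's sheet id and `.x` as the sheet position.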

Combine each group’s relevant studies into a single dataframe. Note that this data is tidy (each row is a single observation).

all_relevant_studies <- bind_rows(g1_relevant_studies,
                                  g2_relevant_studies,
                                  g4_relevant_studies)

Let’s only look at rows that have complete data for group, coder_name, unique_id, and screening_decision. The exclusion_reason column is allowed to be empty, since included papers have no exclusion reason.

clean_data <- all_relevant_studies %>%
  select(group, everything()) %>% # this moves `group` to be the first column
  drop_na(group:screening_decision) # drop rows that have a missing value in any column from group to screening_decision
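To make the effect of the column range in drop_na concrete, here is a small sketch on made-up data. The toy values are invented for illustration; only the column names match the real sheets.

```r
library(dplyr)
library(tidyr)

toy <- tibble(
  group = c("A", "A", "B"),
  coder_name = c("x", NA, "y"),
  unique_id = c("p1", "p2", "p3"),
  screening_decision = c("include", "exclude", NA),
  exclusion_reason = c(NA, "some reason", NA)
)

# Only the columns from group through screening_decision are checked for NA,
# so a missing exclusion_reason does not cause a row to be dropped.
toy_clean <- toy %>%
  drop_na(group:screening_decision)

print(toy_clean)
```

Only the first row survives: the second is missing coder_name and the third is missing screening_decision, while the first row's missing exclusion_reason is ignored.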

Let’s see what the data look like. Print the first 10 rows.

clean_data %>%
  slice(1:10) %>%
  kable()

| group | coder_name | unique_id | screening_decision | exclusion_reason |
|:---|:---|:---|:---|:---|
| Minimal Group Paradigm Group | jailyn | mlangeni2017 | include | NA |
| Minimal Group Paradigm Group | jailyn | thompson1990 | include | NA |
| Minimal Group Paradigm Group | jailyn | abrams2008 | include | NA |
| Minimal Group Paradigm Group | jailyn | otten2004 | include | NA |
| Minimal Group Paradigm Group | jailyn | zhong2008 | exclude | categorization is not arbitary |
| Minimal Group Paradigm Group | jailyn | wen2016 | include | NA |
| Minimal Group Paradigm Group | jailyn | bigler2001 | exclude | categorization is not anonymous |
| Minimal Group Paradigm Group | jailyn | decremer1999 | exclude | categorization is not arbitary |
| Minimal Group Paradigm Group | jailyn | foels2006 | include | NA |
| Minimal Group Paradigm Group | jailyn | peysakhovich2017 | exclude | no full paper access |

Summarize data

How many papers has our class entered so far?

count(clean_data) %>%
  kable()

| n |
|---:|
| 1215 |

How many papers has each group entered so far?

clean_data %>%
  count(group) %>%
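  kable()

| group | n |
|:---|---:|
| Linda Group | 422 |
| Minimal Group Paradigm Group | 191 |
| Syntactic Bootstrapping Group | 602 |

The same counts can be computed with group_by and summarize:

clean_data %>%
  group_by(group) %>%
  summarize(count = n()) %>%
  kable()

| group | count |
|:---|---:|
| Linda Group | 422 |
| Minimal Group Paradigm Group | 191 |
| Syntactic Bootstrapping Group | 602 |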
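As a quick check that count is shorthand for group_by followed by summarize with n(), here is a small sketch on made-up data. The toy group labels are invented for illustration.

```r
library(dplyr)

toy <- tibble(group = c("A", "B", "A", "C", "B", "A"))

# Shorthand: one row per group, with the row count in a column named n
via_count <- toy %>%
  count(group)

# Longhand: group, then count the rows in each group
via_summarize <- toy %>%
  group_by(group) %>%
  summarize(n = n())

print(via_count)
print(via_summarize)
```

Both tables list the groups in the same order with the same counts.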

How many have inclusion decisions?

clean_data %>%
  count(screening_decision) %>%
  kable()

| screening_decision | n |
|:---|---:|
| excldue | 2 |
| exclude | 839 |
| excluded | 1 |
| include | 373 |

Ah, it’s hard to tell, because people used different conventions and a few entries are misspelled. Let’s fix this so that every entry uses include, exclude, or ?.
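One way such a cleanup could be done in R is sketched below on made-up data. The mapping is an assumption: it treats the variants seen above (excldue, excluded) as exclude and anything unrecognized as ?. The tables that follow in this document still show the raw, uncorrected values.

```r
library(dplyr)

# Toy data containing the spellings seen in the table above, plus one unknown value.
toy_decisions <- tibble(
  screening_decision = c("include", "exclude", "excldue", "excluded", "maybe")
)

normalized <- toy_decisions %>%
  mutate(screening_decision = case_when(
    screening_decision %in% c("exclude", "excldue", "excluded") ~ "exclude",
    screening_decision == "include" ~ "include",
    TRUE ~ "?" # assumed fallback for anything unrecognized
  ))

decision_counts <- normalized %>%
  count(screening_decision)

print(decision_counts)
```

Fixing the entries in the spreadsheet itself is the more durable option, since it keeps every later read of the sheet consistent.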

How many have inclusion decisions by group?

clean_data %>%
  count(screening_decision, group) %>%
  kable()

| screening_decision | group | n |
|:---|:---|---:|
| excldue | Syntactic Bootstrapping Group | 2 |
| exclude | Linda Group | 293 |
| exclude | Minimal Group Paradigm Group | 51 |
| exclude | Syntactic Bootstrapping Group | 495 |
| excluded | Linda Group | 1 |
| include | Linda Group | 128 |
| include | Minimal Group Paradigm Group | 140 |
| include | Syntactic Bootstrapping Group | 105 |

Let’s plot this by group, with one stacked bar per group and one color per screening decision.

clean_data %>%
  count(screening_decision, group) %>%
  ggplot(aes(x = group, fill = screening_decision, y = n)) +
  geom_col() + # same as geom_bar(stat = "identity"): bar heights come from n
  ylab("Number of papers entered") +
  ggtitle("Literature search counts by group") +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))