In this markdown, we’re going to summarize the results of everyone’s literature search for their group project so far. We’re going to do that by reading the data directly from each group’s relevant_studies Google spreadsheet.

Get data

First, we need the sheet id for each of your Google spreadsheets containing your literature searches. This id is found in the URL of the spreadsheet.
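
As a rough illustration of where the id sits in the URL, here is a minimal sketch in base R. The URL, the helper `extract_sheet_id`, and the name `SHEET_ID_EXAMPLE` are made up for this example and are not one of the class spreadsheets. It assumes the usual Google Sheets URL shape, where the id is the path segment right after `/spreadsheets/d/`.

```r
# Hypothetical URL for illustration only; substitute your own spreadsheet's URL.
sheet_url <- "https://docs.google.com/spreadsheets/d/EXAMPLE_SHEET_ID_123/edit#gid=0"

# Hypothetical helper: capture everything between "/spreadsheets/d/" and the next "/".
extract_sheet_id <- function(url) {
  sub("^.*/spreadsheets/d/([^/]+).*$", "\\1", url)
}

SHEET_ID_EXAMPLE <- extract_sheet_id(sheet_url)
print(SHEET_ID_EXAMPLE)
```

In practice you may not need to extract the id by hand, since `read_sheet` also accepts a full spreadsheet URL; the sketch is only meant to show which part of the URL is the id.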

Next, let’s read the data from your Google spreadsheet directly into R. To do that, we can use the read_sheet function from the googlesheets4 R package.

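library(tidyverse)     # dplyr, tidyr, ggplot2, and the %>% pipe
library(googlesheets4) # read_sheet()
library(knitr)         # kable()

# the columns we want to keep from every sheet
COLS_WE_CARE_ABOUT <- c("coder_name", "unique_id", "screening_decision", "exclusion_reason")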

# all_of() tells select() that COLS_WE_CARE_ABOUT is a vector of column names
g1_relevant_studies <- read_sheet(SHEET_ID_G1, "relevant_studies") %>%
  select(all_of(COLS_WE_CARE_ABOUT)) %>%
  mutate(group = "Minimal Group Paradigm Group")

g2_relevant_studies <- read_sheet(SHEET_ID_G2, "relevant_studies") %>%
  select(all_of(COLS_WE_CARE_ABOUT)) %>%
  mutate(group = "Linda Group")

# this group's data is spread across worksheets 5 through 8, so read each one and stack them
g4_relevant_studies <- read_sheet(SHEET_ID_G4, 5) %>%
  select(all_of(COLS_WE_CARE_ABOUT)) %>%
  bind_rows(read_sheet(SHEET_ID_G4, 6) %>% select(all_of(COLS_WE_CARE_ABOUT))) %>%
  bind_rows(read_sheet(SHEET_ID_G4, 7) %>% select(all_of(COLS_WE_CARE_ABOUT))) %>%
  bind_rows(read_sheet(SHEET_ID_G4, 8) %>% select(all_of(COLS_WE_CARE_ABOUT))) %>%
  mutate(group = "Syntactic Bootstrapping Group")

Combine each group’s relevant studies into a single dataframe. Note that this data is tidy (each row is a single observation).

all_relevant_studies <- bind_rows(g1_relevant_studies,
                                  g2_relevant_studies,
                                  g4_relevant_studies)

Let’s only look at rows that have complete data for group, coder_name, unique_id, and screening_decision. The exclusion_reason column is allowed to be empty, for example when a paper is included.

clean_data <- all_relevant_studies %>%
  select(group, everything()) %>% # this moves `group` to be the first column
  drop_na(group:screening_decision) # drop rows that are missing a value in any column from group to screening_decision
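
To make the effect of the column range concrete, here is a small sketch on made-up data (the `toy` data frame and its values are invented for this example and are not from any group's sheet). It suggests that `drop_na(group:screening_decision)` removes a row only when the missing value falls inside that range, so a missing exclusion_reason does not cause a row to be dropped.

```r
library(dplyr)
library(tidyr)

# Invented example rows, in the same column order as clean_data.
toy <- data.frame(
  group = c("A", "A", "B"),
  coder_name = c("x", NA, "y"),           # row 2 is missing coder_name
  unique_id = c("p1", "p2", "p3"),
  screening_decision = c("include", "exclude", "exclude"),
  exclusion_reason = c(NA, "review", "not empirical"), # row 1 has no reason
  stringsAsFactors = FALSE
)

# Only NAs in the columns from group through screening_decision trigger a drop.
toy_clean <- toy %>%
  drop_na(group:screening_decision)

print(toy_clean)
```

Row 2 is dropped because coder_name is missing, while row 1 is kept even though its exclusion_reason is NA.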

Let’s see what the data look like. Print the first 10 rows.

clean_data %>%
  slice(1:10) %>%
  kable()

| group | coder_name | unique_id | screening_decision | exclusion_reason |
|---|---|---|---|---|
| Linda Group | zoe | morier1984 | exclude | not empirical (review paper) |
| Linda Group | zoe | charness2009 | include | Is this the same paper as the one above? |
| Linda Group | zoe | sides2002 | include | NA |
| Linda Group | zoe | hertwig1999 | include | NA |
| Linda Group | zoe | fiedler1988 | include | NA |
| Linda Group | zoe | bonini2004 | include | NA |
| Linda Group | zoe | wolford1990 | include | NA |
| Linda Group | zoe | agnoli1989 | include | NA |
| Linda Group | zoe | moro2008 | exclude | not empirical (review paper) |
| Linda Group | zoe | dulany1991 | include | NA |

Summarize data

How many papers has our class entered so far?

count(clean_data) %>%
  kable()

| n |
|---|
| 94 |

How many papers has each group entered so far?

clean_data %>%
  count(group) %>%
  kable()

| group | n |
|---|---|
| Linda Group | 32 |
| Syntactic Bootstrapping Group | 62 |

How many have inclusion decisions?

clean_data %>%
  count(screening_decision) %>%
  kable()

| screening_decision | n |
|---|---|
| ? | 2 |
| ?entire book | 1 |
| exclude | 34 |
| excluded | 24 |
| include | 19 |
| included | 13 |
| not sure- think exclude | 1 |

Ah, it’s hard to tell, because people used different conventions! These labels should be standardized to include, exclude, and ?.
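
One way the standardization could look is sketched below, using `case_when` from dplyr on a toy data frame that just lists the labels seen in the count above. The names `toy_decisions` and `screening_decision_std` are invented for this example. Treating "not sure- think exclude" and "?entire book" as ? is an assumption on our part; the coders may prefer a different mapping.

```r
library(dplyr)

# Toy data listing each label that appears in the count above (not the real sheet).
toy_decisions <- data.frame(
  screening_decision = c("include", "included", "exclude", "excluded",
                         "?", "?entire book", "not sure- think exclude"),
  stringsAsFactors = FALSE
)

standardized <- toy_decisions %>%
  mutate(
    screening_decision_std = case_when(
      screening_decision %in% c("include", "included") ~ "include",
      screening_decision %in% c("exclude", "excluded") ~ "exclude",
      TRUE ~ "?" # assumption: anything uncertain or unrecognized becomes "?"
    )
  )

# Count how many labels fall under each standardized value.
standardized %>%
  count(screening_decision_std) %>%
  print()
```

Writing the result to a new column rather than overwriting screening_decision keeps the original labels available, so the mapping can be checked by eye before it is used in later summaries.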

How many have inclusion decisions by group?

clean_data %>%
  count(screening_decision, group) %>%
  kable()

| screening_decision | group | n |
|---|---|---|
| ? | Linda Group | 2 |
| ?entire book | Syntactic Bootstrapping Group | 1 |
| exclude | Linda Group | 19 |
| exclude | Syntactic Bootstrapping Group | 15 |
| excluded | Syntactic Bootstrapping Group | 24 |
| include | Linda Group | 11 |
| include | Syntactic Bootstrapping Group | 8 |
| included | Syntactic Bootstrapping Group | 13 |
| not sure- think exclude | Syntactic Bootstrapping Group | 1 |

Let’s plot these counts by group.

clean_data %>%
  count(screening_decision, group) %>%
  ggplot(aes(x = group, fill = screening_decision, y = n)) +
  geom_col() + # same as geom_bar(stat = "identity"): bar heights come from n
  ylab("Number of papers entered") +
  ggtitle("Literature search counts by group") +
  theme_classic()