Running counts of MA literature searches

Get data
Summarize data

In this markdown, we’re going to summarize the results of everyone’s literature search for their group project so far. We’ll do that by reading the data directly from each group’s relevant_studies Google spreadsheet.

Get data

First, we need the sheet ID for each of your Google spreadsheets containing your literature searches. This ID is found in the URL of the spreadsheet: it is the long string between /d/ and the next /.
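
As a minimal sketch of pulling the ID out of a URL, the snippet below uses base R only. The URL and the variable names sheet_url and SHEET_ID_EXAMPLE are hypothetical placeholders, not one of the class spreadsheets.

```r
# Hypothetical URL; the ID in it is a placeholder, not a real spreadsheet.
sheet_url <- "https://docs.google.com/spreadsheets/d/1AbCdEfGhIjKlMnOpQrStUvWxYz0123456789/edit#gid=0"

# Keep only the path segment that follows "/spreadsheets/d/".
SHEET_ID_EXAMPLE <- sub("^.*/spreadsheets/d/([A-Za-z0-9_-]+).*$", "\\1", sheet_url)
SHEET_ID_EXAMPLE
```

The resulting string is what would be stored in a variable such as SHEET_ID_G1.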

Next, let’s read the data from your Google spreadsheet directly into R. To do that, we can use the read_sheet function from the googlesheets4 R package.

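library(tidyverse)
library(googlesheets4)
library(knitr) # for kable()

# SHEET_ID_G1, SHEET_ID_G2, and SHEET_ID_G4 hold each group's sheet ID (not shown here)
COLS_WE_CARE_ABOUT <- c("coder_name", "unique_id", "screening_decision", "exclusion_reason")

g1_relevant_studies <- read_sheet(SHEET_ID_G1, "relevant_studies") %>%
  select(all_of(COLS_WE_CARE_ABOUT)) %>% # all_of() selects columns named in a character vector
  mutate(group = "Minimal Group Paradigm Group")

g2_relevant_studies <- read_sheet(SHEET_ID_G2, "relevant_studies") %>%
  select(all_of(COLS_WE_CARE_ABOUT)) %>%
  mutate(group = "Linda Group")

# This group's studies are spread across several tabs, read here by position.
# NOTE: tab 5 is read both here and in the first bind_rows() call below, so its
# rows appear twice; check whether a different tab was intended.
g4_relevant_studies <- read_sheet(SHEET_ID_G4, 5) %>%
  select(all_of(COLS_WE_CARE_ABOUT)) %>%
  bind_rows(read_sheet(SHEET_ID_G4, 5) %>% select(all_of(COLS_WE_CARE_ABOUT))) %>%
  bind_rows(read_sheet(SHEET_ID_G4, 6) %>% select(all_of(COLS_WE_CARE_ABOUT))) %>%
  bind_rows(read_sheet(SHEET_ID_G4, 7) %>% select(all_of(COLS_WE_CARE_ABOUT))) %>%
  bind_rows(read_sheet(SHEET_ID_G4, 8) %>% select(all_of(COLS_WE_CARE_ABOUT))) %>%
  mutate(group = "Syntactic Bootstrapping Group")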

Combine each group’s relevant studies into a single dataframe. Note that this data is tidy (each row is a single observation).

all_relevant_studies <- bind_rows(g1_relevant_studies,
                                  g2_relevant_studies,
                                  g4_relevant_studies)

Let’s only look at rows that have complete data for coder_name, unique_id, and screening_decision. exclusion_reason is allowed to be missing, since included papers don’t have one.

clean_data <- all_relevant_studies %>%
  select(group, everything()) %>% # this moves `group` to be the first column
  drop_na(group:screening_decision) # drop rows that are missing a value in any column from group to screening_decision
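
To make the effect of the column range concrete, here is a small sketch on a made-up data frame (toy and its values are invented for illustration and are not from the class spreadsheets). Rows missing a value anywhere from group through screening_decision are dropped, while a missing exclusion_reason is tolerated.

```r
library(tidyr)

# Made-up data: p2 lacks a coder_name and p3 lacks a screening_decision.
toy <- data.frame(
  group = c("A", "A", "B", "B"),
  coder_name = c("x", NA, "y", "z"),
  unique_id = c("p1", "p2", "p3", "p4"),
  screening_decision = c("include", "exclude", NA, "exclude"),
  exclusion_reason = c(NA, "off topic", NA, "no access")
)

# Only columns from group to screening_decision are checked for NA.
kept <- drop_na(toy, group:screening_decision)
kept$unique_id # p1 and p4 remain; p1 is kept despite its NA exclusion_reason
```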

Let’s see what the data look like. Print the first 10 rows.

clean_data %>%
  slice(1:10) %>%
  kable()

group	coder_name	unique_id	screening_decision	exclusion_reason
Minimal Group Paradigm Group	jailyn	mlangeni2017	include	NA
Minimal Group Paradigm Group	jailyn	thompson1990	include	NA
Minimal Group Paradigm Group	jailyn	abrams2008	include	NA
Minimal Group Paradigm Group	jailyn	otten2004	include	NA
Minimal Group Paradigm Group	jailyn	zhong2008	exclude	categorization is not arbitary
Minimal Group Paradigm Group	jailyn	wen2016	include	NA
Minimal Group Paradigm Group	jailyn	bigler2001	exclude	categorization is not anonymous
Minimal Group Paradigm Group	jailyn	decremer1999	exclude	categorization is not arbitary
Minimal Group Paradigm Group	jailyn	foels2006	include	NA
Minimal Group Paradigm Group	jailyn	peysakhovich2017	exclude	no full paper access

Summarize data

How many papers has our class entered so far?

count(clean_data) %>%
  kable()

n
1215

How many papers has each group entered so far? The two code blocks below compute the same counts, first with count and then with group_by and summarize.

clean_data %>%
  count(group) %>%
  kable()

group	n
Linda Group	422
Minimal Group Paradigm Group	191
Syntactic Bootstrapping Group	602

clean_data %>%
  group_by(group) %>%
  summarize(count = n()) %>% 
  kable()

group	count
Linda Group	422
Minimal Group Paradigm Group	191
Syntactic Bootstrapping Group	602

How many papers have each screening decision?

clean_data %>%
  count(screening_decision) %>%
  kable()

screening_decision	n
excldue	2
exclude	839
excluded	1
include	373

Ah, it’s hard to tell, because people used different conventions and spellings (excldue, excluded). These should be standardized to include, exclude, and ?.
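
One way to do that standardization is sketched below. The helper normalize_decision is hypothetical, and its matching rule is an assumption: anything starting with "inc" counts as include, anything starting with "exc" counts as exclude, and everything else (including missing values) becomes ?.

```r
# Hypothetical helper: map free-text screening decisions onto
# "include", "exclude", or "?" (anything unrecognized or missing).
normalize_decision <- function(x) {
  x <- tolower(trimws(as.character(x)))
  out <- rep("?", length(x))
  out[!is.na(x) & startsWith(x, "inc")] <- "include"
  out[!is.na(x) & startsWith(x, "exc")] <- "exclude"
  out
}

# The spellings seen in the table above, plus an unrecognized and a missing value.
raw <- c("excldue", "exclude", "excluded", "include", "maybe", NA)
normalize_decision(raw)
# "exclude" "exclude" "exclude" "include" "?" "?"
```

In the pipeline, this could be applied with mutate(screening_decision = normalize_decision(screening_decision)) before counting. A prefix rule is deliberately loose, so it is worth checking the raw values first to make sure nothing is mapped incorrectly.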

How many papers have each screening decision, by group?

clean_data %>%
  count(screening_decision, group) %>%
  kable()

screening_decision	group	n
excldue	Syntactic Bootstrapping Group	2
exclude	Linda Group	293
exclude	Minimal Group Paradigm Group	51
exclude	Syntactic Bootstrapping Group	495
excluded	Linda Group	1
include	Linda Group	128
include	Minimal Group Paradigm Group	140
include	Syntactic Bootstrapping Group	105

Let’s plot these counts by group.

clean_data %>%
  count(screening_decision, group) %>%
  ggplot(aes(x = group, fill = screening_decision, y = n)) +
  geom_col() + # same as geom_bar(stat = "identity")
  ylab("Number of papers entered") +
  ggtitle("Literature search counts by group") +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

MRM Final Projects

Molly Lewis

2020-04-10
