Running counts of MA literature searches

Get data
Summarize Data

In this markdown, we’re going to summarize the results of everyone’s literature search for their group project so far. We’ll do that by reading the data directly from each group’s relevant_studies Google spreadsheet.

Get data

First, we need the sheet id for each of your Google spreadsheets containing your literature searches. This id is found in the URL of the spreadsheet.
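As a rough illustration of where the id sits in the URL, here is a minimal sketch in base R. The URL and the id `EXAMPLE_SHEET_ID_123` are made up for this example; the id is assumed to be the token between `/d/` and the next `/`.

```r
# Hypothetical spreadsheet URL (not a real sheet)
sheet_url <- "https://docs.google.com/spreadsheets/d/EXAMPLE_SHEET_ID_123/edit#gid=0"

# Keep only the token that follows "/d/" and stops at the next "/"
sheet_id <- sub("^.*/d/([^/]+).*$", "\\1", sheet_url)
sheet_id
```

In practice you can also copy the id by hand from the browser's address bar; googlesheets4 additionally provides `as_sheets_id()`, which accepts a full URL.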

Next, let’s read the data from your Google spreadsheet directly into R. To do that, we can use the read_sheet function from the googlesheets4 R package.

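library(tidyverse)      # dplyr, tidyr, ggplot2, and the %>% pipe
library(googlesheets4)  # read_sheet()
library(knitr)          # kable()

# SHEET_ID_G1, SHEET_ID_G2, and SHEET_ID_G4 hold each group's sheet id (their definitions are not shown here)
COLS_WE_CARE_ABOUT <- c("coder_name", "unique_id", "screening_decision", "exclusion_reason")

g1_relevant_studies <- read_sheet(SHEET_ID_G1, "relevant_studies") %>%
  select(all_of(COLS_WE_CARE_ABOUT)) %>%
  mutate(group = "Minimal Group Paradigm Group")

g2_relevant_studies <- read_sheet(SHEET_ID_G2, "relevant_studies") %>%
  select(all_of(COLS_WE_CARE_ABOUT)) %>%
  mutate(group = "Linda Group")

# Group 4 split its search across several tabs (sheets 5 through 8), so read each one and stack them
g4_relevant_studies <- read_sheet(SHEET_ID_G4, 5) %>%
  select(all_of(COLS_WE_CARE_ABOUT)) %>%
  bind_rows(read_sheet(SHEET_ID_G4, 6) %>% select(all_of(COLS_WE_CARE_ABOUT))) %>%
  bind_rows(read_sheet(SHEET_ID_G4, 7) %>% select(all_of(COLS_WE_CARE_ABOUT))) %>%
  bind_rows(read_sheet(SHEET_ID_G4, 8) %>% select(all_of(COLS_WE_CARE_ABOUT))) %>%
  mutate(group = "Syntactic Bootstrapping Group")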
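The repeated read-and-stack pattern for the multi-tab spreadsheet could also be written as a loop over tab numbers. The sketch below is only an illustration: `read_tab()` is a hypothetical stand-in that returns toy data so the example runs without Google authentication. In the real script it would be replaced by a call to `read_sheet()` with the group's sheet id.

```r
library(dplyr)
library(purrr)

cols <- c("coder_name", "unique_id", "screening_decision", "exclusion_reason")

# Hypothetical stand-in for reading one tab of the spreadsheet (toy data only)
read_tab <- function(tab) {
  tibble(
    coder_name = "coder",
    unique_id = paste0("paper_tab", tab),
    screening_decision = "include",
    exclusion_reason = NA_character_,
    notes = "extra column that gets dropped"
  )
}

# Read tabs 5 through 8, keep the shared columns, stack them, and label the group
g4_sketch <- map(5:8, ~ read_tab(.x) %>% select(all_of(cols))) %>%
  bind_rows() %>%
  mutate(group = "Syntactic Bootstrapping Group")

g4_sketch
```

This gives the same shape of result as chaining bind_rows by hand, and adding another tab only means changing `5:8`.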

Combine each group’s relevant studies into a single dataframe. Note that the data are tidy (each row is a single observation).

all_relevant_studies <- bind_rows(g1_relevant_studies,
                                  g2_relevant_studies,
                                  g4_relevant_studies)

Let’s only look at rows that have complete data for group, coder_name, unique_id, and screening_decision. (exclusion_reason is allowed to be missing, since included papers don’t have one.)

clean_data <- all_relevant_studies %>%
  select(group, everything()) %>% # this moves `group` to be the first column
  drop_na(group:screening_decision) # drop rows that don't have complete data for all columns from group to screening_decision
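To see what a column range in drop_na does, here is a small sketch on made-up data (the `toy` rows are invented for illustration and are not from any group's spreadsheet). Only missing values inside the `group:screening_decision` range cause a row to be dropped.

```r
library(dplyr)
library(tidyr)

# Toy data: row 2 is missing coder_name, row 1 is missing only exclusion_reason
toy <- tibble(
  group = c("A", "A", "B"),
  coder_name = c("x", NA, "y"),
  unique_id = c("p1", "p2", "p3"),
  screening_decision = c("include", "exclude", "exclude"),
  exclusion_reason = c(NA, "review", "review")
)

# Row 2 is dropped; row 1 stays because exclusion_reason is outside the range
toy_clean <- toy %>%
  drop_na(group:screening_decision)

toy_clean
```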

Let’s see what the data look like. Print the first 10 rows.

clean_data %>%
  slice(1:10) %>%
  kable()

group	coder_name	unique_id	screening_decision	exclusion_reason
Linda Group	zoe	morier1984	exclude	not empirical (review paper)
Linda Group	zoe	charness2009	include	Is this the same paper as the one above?
Linda Group	zoe	sides2002	include	NA
Linda Group	zoe	hertwig1999	include	NA
Linda Group	zoe	fiedler1988	include	NA
Linda Group	zoe	bonini2004	include	NA
Linda Group	zoe	wolford1990	include	NA
Linda Group	zoe	agnoli1989	include	NA
Linda Group	zoe	moro2008	exclude	not empirical (review paper)
Linda Group	zoe	dulany1991	include	NA

Summarize Data

How many papers has our class entered so far?

count(clean_data) %>%
  kable()

n
94

How many papers has each group entered so far?

clean_data %>%
  count(group) %>%
  kable()

group	n
Linda Group	32
Syntactic Bootstrapping Group	62

How many have inclusion decisions?

clean_data %>%
  count(screening_decision) %>%
  kable()

screening_decision	n
?	2
?entire book	1
exclude	34
excluded	24
include	19
included	13
not sure- think exclude	1

Ah, it’s hard to tell, because people used different conventions. Let’s fix this to use include, exclude, and ?.
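One possible way to do that recoding is sketched below. The mapping rules are an assumption for illustration: anything starting with "?" or containing "not sure" is treated as ?, and "excluded"/"included" are folded into exclude/include. The new column name `screening_decision_clean` is also made up for this sketch.

```r
library(dplyr)
library(stringr)

# The seven labels that appear in the count table above
toy_decisions <- tibble(
  screening_decision = c("?", "?entire book", "exclude", "excluded",
                         "include", "included", "not sure- think exclude")
)

# Collapse the variants into three categories (assumed mapping, checked in order)
recoded <- toy_decisions %>%
  mutate(screening_decision_clean = case_when(
    str_detect(screening_decision, "^\\?|not sure") ~ "?",
    str_detect(screening_decision, "^exclude") ~ "exclude",
    str_detect(screening_decision, "^include") ~ "include",
    TRUE ~ "?"  # anything unrecognized is left as undecided
  ))

recoded
```

Under this mapping, the counts above would collapse to 58 exclude (34 + 24), 32 include (19 + 13), and 4 ? (2 + 1 + 1). Whether "not sure- think exclude" should count as ? or as exclude is a judgment call for the group.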

How many have inclusion decisions by group?

clean_data %>%
  count(screening_decision, group) %>%
  kable()

screening_decision	group	n
?	Linda Group	2
?entire book	Syntactic Bootstrapping Group	1
exclude	Linda Group	19
exclude	Syntactic Bootstrapping Group	15
excluded	Syntactic Bootstrapping Group	24
include	Linda Group	11
include	Syntactic Bootstrapping Group	8
included	Syntactic Bootstrapping Group	13
not sure- think exclude	Syntactic Bootstrapping Group	1

Let’s plot this by group.

clean_data %>%
  count(screening_decision, group) %>%
  ggplot(aes(x = group, fill = screening_decision, y = n)) +
  geom_col() + # same as geom_bar(stat = "identity")
  ylab("Number of papers entered") +
  ggtitle("Literature search counts by group") +
  theme_classic()

MRM Final Projects

Molly Lewis

2020-04-02