Introduction

Data source is:

Their abstract reads:

In order to explore whether Wikipedia is a cutting edge source for designer drugs information, or whether it mainly features well known substances, we compared the date various designer drugs appeared on Wikipedia with the date they were added to the EMCDDA watch lists. The EMCDDA, or European Monitoring Centre for Drugs and Drug Addiction, is an organization that monitors drugs and drug use in Europe. Designer drugs are one of the phenomena that the EMCDDA focuses on, and tracking their appearance is one of its key tasks (EMCDDA 2016). In this interactive graph, you can see a timeline on the X-axis, along which the different designer drugs listed on the Y-axis appear. The red dot represents the point in time when the drug came on the EMCDDA watch list, the blue dot when it appeared on Wikipedia. The distance between the dots represents the time that has passed in between.

Initialize

Load packages and data, and recode some variables. Notice that we needed a trick to get the first date vector. lapply/map gives us a list of length-1 Date vectors, but sapply and unlist break the output: they drop the Date class and return the underlying numbers. Combining the list with do.call(c, ...) keeps the class. Oh great Hadley, can we haz map_date please? Or map_vector for other vector-type output?

options(digits = 2)
library(pacman)
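# magrittr provides %$% and ggplot2 the plotting functions used below;
# they are listed explicitly in case kirkegaard does not attach them
p_load(kirkegaard, readr, dplyr, lubridate, magrittr, ggplot2)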

d = read_tsv("data/watchlist.tsv") %>% 
  mutate(
    wikipedia_date = dmy(wikipedia_date),
    watchlist_date = dmy(watchlist_date),
    wiki_prior = watchlist_date - wikipedia_date
  )
## Parsed with column specification:
## cols(
##   substance = col_character(),
##   wikipedia_id = col_character(),
##   wikipedia_date = col_character(),
##   watchlist_date = col_character(),
##   category = col_character()
## )
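# Earliest date on which either resource covered each drug.
# lapply returns a list of length-1 Date vectors.
first_date = d %$% lapply(seq_along(wikipedia_date), function(i) {
  .x = d$wikipedia_date[i]
  .y = d$watchlist_date[i]

  min(c(.x, .y))
})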

d$first_date = do.call(c, first_date)
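
As an aside, the element-wise minimum of two Date vectors can probably be had without the list detour. Below is a minimal sketch using base R's pmin(), which keeps the Date class. The dates are made up for illustration and are not taken from the watch list data, and the variable names simply mirror the columns used above.

```r
# Toy data: made-up dates for illustration only, not from the dataset
wikipedia_date = as.Date(c("2008-03-01", "2012-07-15", "2010-01-10"))
watchlist_date = as.Date(c("2009-05-20", "2011-02-03", "2010-01-10"))

# A list of length-1 Date vectors, like the one lapply returns
date_list = list(wikipedia_date[1], watchlist_date[1])

# unlist() strips the Date class and returns the underlying day counts
class(unlist(date_list))

# do.call(c, ...) combines the list and keeps the Date class
class(do.call(c, date_list))

# pmin() takes the element-wise minimum and also keeps the Date class
first_date = pmin(wikipedia_date, watchlist_date)
class(first_date)
first_date
```

With the real data this would amount to `d$first_date = pmin(d$wikipedia_date, d$watchlist_date)`. Like min(), pmin() returns NA for a row if either date is missing, unless na.rm = TRUE is set.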

Analyze

ggplot(d, aes(first_date, wiki_prior)) +
  geom_point() +
  scale_y_continuous("Days Wikipedia prior to European Monitoring Centre\nfor Drugs and Drug Addiction") +
  scale_x_date("Date the drug was first covered by either resource", date_breaks = "1 year", date_labels = "%Y") +
  geom_smooth() +
  theme_bw()
## `geom_smooth()` using method = 'loess'

GG_save("figs/timeline.png")
## `geom_smooth()` using method = 'loess'

My interpretation is that Wikipedia’s new generation of conservative admins are so preoccupied with having high quality content™ that they keep out entries on cutting edge stuff, and annoy contributors so much that they don’t want to contribute in general, which can be seen in the stagnating growth statistics. What a shame!!! I recommend Gwern’s defense of inclusionism.