```r
library(tidyverse)
library(pageviews) # Retrieves Wikipedia page-view counts from the Wikimedia API
library(DT)        # DT stands for DataTables; it creates interactive tables
```
Using R and the pageviews package loaded above, which retrieves page-view counts from the Wikimedia API, we can explore a number of interesting trends. For example, we can find out which Wikipedia article was the most popular on any given date. We might expect articles related to a major event to be viewed more often in the days following it. Here, I am interested in the top Wikipedia articles (by views) in the days before and after mass shootings in the United States. These lists offer some insight into the zeitgeist of those days, at least in terms of what people chose to read about on Wikipedia, one of the most popular websites in the world.
- First, I would like to see a graph showing trends in views of Wikipedia’s Gun control article over several years. I ask R for the daily page views of that article from July 1, 2015 through July 1, 2018, and then plot the data as a line graph. As the graph shows, there are several conspicuous spikes in the article’s popularity at irregular points in time. These spikes could plausibly be explained by mass shootings that occurred around the same times.
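```r
# Daily views of the "Gun control" article; the API expects underscores in titles
guncontrol <- article_pageviews(article = "Gun_control",
                                start = as.Date("2015-07-01"),
                                end = as.Date("2018-07-01"))

# Line graph of daily views over the three-year span
guncontrol %>%
  ggplot(aes(x = date, y = views)) +
  geom_line()
```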
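To pick out the spikes numerically rather than by eye, one option is to flag days whose views far exceed the typical level. The sketch below assumes a data frame with `date` and `views` columns, as `guncontrol` has; the helper name `flag_spikes()` and the threshold of three times the median are hypothetical choices for illustration, not part of the pageviews package.

```r
library(dplyr)

# Hypothetical helper: keep only days whose views exceed `multiple` times
# the median daily views over the whole period (3x is an arbitrary choice).
flag_spikes <- function(df, multiple = 3) {
  baseline <- median(df$views)
  df %>%
    filter(views > multiple * baseline) %>%
    select(date, views)
}

# Small hypothetical example: one obvious spike on the third day
toy <- data.frame(date = as.Date("2018-01-01") + 0:4,
                  views = c(100, 110, 900, 105, 95))
flag_spikes(toy)  # returns the single row for 2018-01-03 (900 views)
```

Running `flag_spikes(guncontrol)` would list candidate spike dates, which could then be checked against the dates of known shootings. The median is used as the baseline because, unlike the mean, it is barely moved by the spikes themselves.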

- Next, I would like to know on which days during that three-year span the Gun control article was viewed most often. Sorting the data by views in descending order and printing it gives us the answer in tabular form.
```r
# Days with the most views appear first
guncontrol %>%
  arrange(desc(views))
```
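If only the handful of busiest days is of interest, dplyr’s `slice_max()` (available in dplyr 1.0.0 and later) can return them directly instead of printing the whole sorted table. The helper name `top_days()` below is hypothetical; it assumes `date` and `views` columns as in `guncontrol`.

```r
library(dplyr)

# Hypothetical helper: the n days with the most views.
# slice_max() keeps ties by default, so it may return more than n rows.
top_days <- function(df, n = 5) {
  df %>%
    slice_max(views, n = n) %>%
    select(date, views)
}

toy <- data.frame(date = as.Date("2018-01-01") + 0:4,
                  views = c(100, 110, 900, 105, 95))
top_days(toy, n = 2)  # the 900-view and 110-view days
```

For example, `top_days(guncontrol, n = 10)` would return the ten most-viewed days of the period.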
- Now I would like to look at the two mass shooting events this assignment focuses on, taking each in turn. My goal is to find out what the top articles were on the day after each event. The question is: were people reading Wikipedia articles about (or related to) these events the day after they occurred?
First, we will look at the Stoneman Douglas High School shooting, also known as the “Florida Shooting”, which occurred on February 14, 2018. It seems reasonable to expect that the most-viewed Wikipedia articles on the following day would include some related to mass shootings.
```r
# Top articles on February 15, 2018, the day after the shooting
top <- top_articles(start = as.Date("2018-02-15"))

top %>%
  select(article, views) %>%
  # Drop navigation pages that are not articles
  filter(!article %in% c("Main_Page", "Special:Search")) %>%
  datatable(class = "cell-border stripe") %>%
  formatStyle("article", backgroundColor = "lightgoldenrodyellow") %>%
  formatStyle("views", backgroundColor = "limegreen")
```
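The filter above removes exactly two non-article pages. Other pages in the `Special:` namespace may also appear near the top of these lists, so a slightly broader filter could be useful. The sketch below is one possible variant, not the author’s method: the helper `drop_non_articles()` is hypothetical, and treating every title that starts with `Special:` as noise is an assumption.

```r
library(dplyr)

# Hypothetical helper: keep article titles and views, dropping the Main Page
# and every page in the "Special:" namespace (assumed to be navigation noise
# rather than articles).
drop_non_articles <- function(df) {
  df %>%
    filter(article != "Main_Page",
           !startsWith(as.character(article), "Special:")) %>%
    select(article, views)
}

toy <- data.frame(article = c("Main_Page", "Special:Search",
                              "Colt_AR-15", "Special:Export"),
                  views = c(400, 300, 200, 100))
drop_non_articles(toy)  # only Colt_AR-15 remains
```

It could stand in for the `select()` and `filter()` steps, e.g. `drop_non_articles(top)` before passing the result to `datatable()`.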
As the table above shows, on the day after the Florida Shooting the articles titled “Columbine High School massacre” and “School shootings in the United States” were both among the 10 most popular, as was the article titled “Colt AR-15”, which may also be related.
To see whether this pattern holds for another event, let’s look at which Wikipedia articles were the most popular on the day after the Las Vegas shooting, which occurred on October 1, 2017.
```r
# Top articles on October 2, 2017, the day after the shooting
top2 <- top_articles(start = as.Date("2017-10-02"))

top2 %>%
  select(article, views) %>%
  # Drop navigation pages that are not articles
  filter(!article %in% c("Main_Page", "Special:Search")) %>%
  datatable(class = "cell-border stripe") %>%
  formatStyle("article", backgroundColor = "lightgoldenrodyellow") %>%
  formatStyle("views", backgroundColor = "limegreen")
```
As the table above shows, on the day after the Las Vegas shooting the 4th most popular article was titled “2017 Las Vegas Strip shooting”, although the other popular articles appear to be unrelated.
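A claim such as “the 4th most popular article” can also be checked in code rather than read off the table. The hypothetical helper `rank_in_top()` below applies the same `Main_Page` / `Special:Search` filter used above, sorts by views, and returns the position of a given title (or `NA` if it is absent). It assumes `article` and `views` columns and the underscore form of titles, as in `top2`.

```r
library(dplyr)

# Hypothetical helper: position of `title` in a top-articles table once
# Main_Page and Special:Search are removed and rows are sorted by views
# (highest first). Returns NA if the title is not in the table.
rank_in_top <- function(df, title) {
  ranked <- df %>%
    filter(!article %in% c("Main_Page", "Special:Search")) %>%
    arrange(desc(views))
  match(title, ranked$article)
}

toy <- data.frame(article = c("Main_Page", "A", "B", "C"),
                  views = c(1000, 50, 70, 30))
rank_in_top(toy, "A")  # 2: B (70 views) ranks first, then A (50)
rank_in_top(toy, "Z")  # NA
```

For example, `rank_in_top(top2, "2017_Las_Vegas_Strip_shooting")` would return that article’s position in the filtered table.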
These findings seem to confirm, with data, our common-sense expectations about what people will read about on Wikipedia the day after such catastrophic events, at least for these two shootings.
- Now, I want to know about the popularity of Wikipedia’s article titled “Gun control” over time, focusing on the week leading up to and the two weeks after each of the two shooting events. I want to visualize these trends on a graph to see how closely they resemble each other and to get a more intuitive visual understanding of each.
First, I create two data frames, “vegas” and “florida”, each holding the daily views of the “Gun control” article over the date range described above (one week before through two weeks after the day of the event). I also add a `day` column counting days relative to the shooting (day 0) and an `event` column labeling each data set.
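```r
# One week before through two weeks after the Las Vegas shooting (Oct 1, 2017)
vegas <- article_pageviews(article = "Gun_control",
                           start = as.Date("2017-09-24"),
                           end = as.Date("2017-10-15"))

# 22 daily rows: day 0 is the day of the shooting
vegas <- vegas %>%
  mutate(day = -7:14,
         event = "Vegas")

# One week before through two weeks after the Florida shooting (Feb 14, 2018)
florida <- article_pageviews(article = "Gun_control",
                             start = as.Date("2018-02-07"),
                             end = as.Date("2018-02-28"))

florida <- florida %>%
  mutate(day = -7:14,
         event = "Florida")
```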
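Hard-coding `day = -7:14` works only if the API returns exactly 22 rows; if a day were missing, the labels would silently shift. A more defensive sketch, assuming the `date` column can be converted with `as.Date()`, computes the window from the event date and each row’s offset from its date. The helper names `event_window()` and `align_to_event()` are hypothetical.

```r
library(dplyr)

# Hypothetical helper: start and end dates of a window running `before`
# days before to `after` days after the event date.
event_window <- function(event_date, before = 7, after = 14) {
  event_date <- as.Date(event_date)
  list(start = event_date - before, end = event_date + after)
}

# Hypothetical helper: label each row with its offset in days from the
# event (day 0), computed from the row's date, plus an event name.
align_to_event <- function(df, event_date, label) {
  df %>%
    mutate(day = as.integer(as.Date(date) - as.Date(event_date)),
           event = label)
}

w <- event_window("2017-10-01")
w$start  # 2017-09-24, matching the Las Vegas range above
w$end    # 2017-10-15

toy <- data.frame(date = as.Date("2017-09-29") + 0:4, views = 1:5)
align_to_event(toy, "2017-10-01", "Vegas")$day  # -2 -1 0 1 2
```

With real data, `align_to_event(vegas, "2017-10-01", "Vegas")` would produce the same `day` labels as the hard-coded version whenever all 22 days are present.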
Finally, I stack the “vegas” and “florida” data frames with bind_rows() and plot both on the same graph, with one colored line per event.
```r
shootings <- bind_rows(vegas, florida)

shootings %>%
  ggplot(aes(x = day, y = views, color = event)) +
  geom_line() +
  theme_minimal() +
  labs(x = "Days before/after Shooting",
       y = "Wikipedia Views",
       color = "Event",
       title = "Views of the Wikipedia Gun Control Article before and after Two Mass Shootings")
```

Now we can see that before both events, views of Wikipedia’s “Gun control” article were low and fairly flat. Immediately after each event, views of the article spiked dramatically and then gradually tapered off over the following two weeks.
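The visual impression can be backed with a few numbers per event. The sketch below assumes the combined `shootings` data frame with `event`, `day`, and `views` columns; the helper `summarise_spike()` is hypothetical, and using the mean of the pre-event days as the baseline is an assumption.

```r
library(dplyr)

# Hypothetical helper: per event, mean daily views before the event
# (day < 0), the peak on or after day 0, the day the peak fell on, and
# the peak-to-baseline ratio.
summarise_spike <- function(df) {
  df %>%
    group_by(event) %>%
    summarise(
      baseline = mean(views[day < 0]),
      peak = max(views[day >= 0]),
      peak_day = day[day >= 0][which.max(views[day >= 0])],
      ratio = peak / baseline,
      .groups = "drop"
    )
}

toy <- data.frame(event = c(rep("A", 5), rep("B", 3)),
                  day = c(-2:2, -1:1),
                  views = c(10, 10, 50, 30, 20, 20, 40, 80))
summarise_spike(toy)  # ratio 5 for A (peak on day 0), 4 for B (peak on day 1)
```

Calling `summarise_spike(shootings)` would return one row per event, making it easy to compare how large and how immediate each spike was.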
