In this article I will show how to get page view data for Wikipedia articles and plot basic trends. Let’s say we are interested in the Wikipedia page for Mary Kom, the five-time World Amateur Boxing champion. Her Wikipedia page is https://en.wikipedia.org/wiki/Mary_Kom. In 2014 a film based on her life was released; its Wikipedia page is https://en.wikipedia.org/wiki/Mary_Kom_(film). We can download the page views for these two pages in R with the wikipediatrend package.

Step 1 - Load the required packages. We will need the wikipediatrend and ggplot2 packages in the R workspace. (If a package is not installed, you can install it with install.packages("packagename").)

library("wikipediatrend")
library("ggplot2")

Step 2 - Download the page views for the Mary Kom article from January 2011 onwards with the function wp_trend

mk = wp_trend(              # wp_trend() builds the data object mk
  "Mary_Kom",               # page title, as it appears in the article URL
  from = "2011-01-01",
  to   = prev_month_end())  # up to the end of the previous month

#dim(mk);    
head(mk)
##   date       count lang page     rank month  title   
## 1 2011-01-14 52    en   Mary_Kom -1   201101 Mary_Kom
## 2 2011-01-15 57    en   Mary_Kom -1   201101 Mary_Kom
## 3 2011-01-16 67    en   Mary_Kom -1   201101 Mary_Kom
## 4 2011-01-17 57    en   Mary_Kom -1   201101 Mary_Kom
## 5 2011-01-10 64    en   Mary_Kom -1   201101 Mary_Kom
## 6 2011-01-11 47    en   Mary_Kom -1   201101 Mary_Kom
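
Note that the rows printed by head() are not in date order (14 to 17 January appear before 10 and 11 January). If you want to inspect the series chronologically, one option is to sort by the date column first. The following is a minimal sketch on a small stand-in data frame; toy and toy_sorted are hypothetical names, and the values are simply copied from the output above so that it runs without a download.

```r
# Toy stand-in for the first rows of mk (values copied from the output above)
toy <- data.frame(
  date  = as.Date(c("2011-01-14", "2011-01-15", "2011-01-10", "2011-01-11")),
  count = c(52, 57, 64, 47)
)

# order() returns the row indices that put date in ascending order
toy_sorted <- toy[order(toy$date), ]
print(toy_sorted)
```

Assuming mk behaves like an ordinary data frame, as its printed form suggests, the same idea would be mk[order(mk$date), ].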

Step 3 - Download the page views for the Mary Kom film article from January 2011 onwards with the function wp_trend

mkf = wp_trend(             # wp_trend() builds the data object mkf
  "Mary_Kom_(film)",        # page title, as it appears in the article URL
  from = "2011-01-01",
  to   = prev_month_end())  # up to the end of the previous month

# dim(mkf)
head(mkf)
##   date       count lang page             rank month  title           
## 1 2011-01-14 0     en   Mary_Kom_(fi ... -1   201101 Mary_Kom_(fi ...
## 2 2011-01-15 0     en   Mary_Kom_(fi ... -1   201101 Mary_Kom_(fi ...
## 3 2011-01-16 0     en   Mary_Kom_(fi ... -1   201101 Mary_Kom_(fi ...
## 4 2011-01-17 0     en   Mary_Kom_(fi ... -1   201101 Mary_Kom_(fi ...
## 5 2011-01-10 0     en   Mary_Kom_(fi ... -1   201101 Mary_Kom_(fi ...
## 6 2011-01-11 0     en   Mary_Kom_(fi ... -1   201101 Mary_Kom_(fi ...

Step 4 - Now we can append these data sets together and plot the trend

dcomp = rbind(mk, mkf)
ggplot(dcomp) + 
  geom_line(aes(x = date, y = count, 
                colour = title)) +
  scale_colour_manual(values = c("red", "blue"))

It’s clearly visible that page views for Mary Kom spiked in 2012, when she won a bronze medal at the 2012 Summer Olympics, and again in 2014, when the film Mary Kom was released.
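
To back up what the plot shows, it can help to look up the single busiest day for each article. The sketch below does this on a small stand-in data frame; toy and peaks are hypothetical names, and the dates and counts are invented for illustration, not taken from the real download.

```r
# Toy stand-in for dcomp; dates and counts are made up for illustration only
toy <- data.frame(
  date  = as.Date(c("2012-08-07", "2012-08-08", "2012-08-09",
                    "2014-09-04", "2014-09-05", "2014-09-06")),
  count = c(120, 900, 300, 50, 700, 200),
  title = rep(c("Mary_Kom", "Mary_Kom_(film)"), each = 3),
  stringsAsFactors = FALSE
)

# Split by title, then keep the row with the highest count in each group
peaks <- do.call(rbind, lapply(split(toy, toy$title), function(d) {
  d[which.max(d$count), ]
}))
print(peaks)
```

With the real data, replacing toy with dcomp should give one row per article showing the date of its highest page view count.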

Similarly, let’s look at the trend for Baji Rao I, Mastani and the film Bajirao Mastani. As described above, first download the page views for the last five years for all three Wikipedia articles.

baji = wp_trend(            # wp_trend() builds the data object baji
  "Bajirao_I",              # page title is "Bajirao_I"
  from = "2011-01-01",
  to   = "2016-02-14")

mast = wp_trend(            # wp_trend() builds the data object mast
  "Mastani",                # page title is "Mastani"
  from = "2011-01-01",
  to   = "2016-02-14")

baji_mast = wp_trend(       # wp_trend() builds the data object baji_mast
  "Bajirao_Mastani",        # page title is "Bajirao_Mastani"
  from = "2011-01-01",
  to   = "2016-02-14")
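
The three calls above differ only in the page title, so they could also be written as a loop. The following is only a sketch: fetch_pages() and fake_fetch() are hypothetical helpers, not part of wikipediatrend. The fetch argument defaults to wp_trend(), called the same way as above, but the demo swaps in an offline stand-in that returns made-up rows so the example runs without a download.

```r
# fetch_pages() is a hypothetical helper, not part of wikipediatrend.
# `fetch` defaults to wp_trend() but can be swapped for a stand-in function.
fetch_pages <- function(pages, from, to, fetch = wikipediatrend::wp_trend) {
  # Download each page in turn and collect the results in a list
  results <- lapply(pages, function(p) fetch(p, from = from, to = to))
  # Stack the per-page results into one data set with rbind()
  do.call(rbind, results)
}

# Offline stand-in that returns one fake row per page (made-up values)
fake_fetch <- function(page, from, to) {
  data.frame(date = as.Date(from), count = 0, title = page,
             stringsAsFactors = FALSE)
}

demo <- fetch_pages(c("Bajirao_I", "Mastani", "Bajirao_Mastani"),
                    from = "2011-01-01", to = "2016-02-14",
                    fetch = fake_fetch)
print(demo)
```

Dropping the fetch = fake_fetch argument would make the helper call wp_trend() for each page, which requires the package and a network connection.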

Now combine all three data sets and plot them with ggplot2.

dcomp = rbind(baji_mast, baji, mast)
ggplot(dcomp) +
  geom_line(aes(x = date, y = count,
                colour = title)) +
  scale_colour_manual(values = c("red", "blue", "green"))

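Again we can see a huge spike between December 2015 and January 2016, which coincides with the release of the film Bajirao Mastani in December 2015. Let’s plot the page views of these articles before June 2015, so that the spike does not dominate the chart.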

sub_dcomp = dcomp[dcomp$date < "2015-06-01", ]
ggplot(sub_dcomp) +
  geom_line(aes(x = date, y = count,
                colour = title)) +
  scale_colour_manual(values = c("red", "blue", "green"))

Interestingly, the Mastani article seems to be read more on Wikipedia than the Baji Rao I article. The reason might be the film Mastani (1955), directed by Dhirubhai Desai, and the TV series RAU and Shrimant Peshwa Bajirao Mastani.
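
A comparison like this is easier to check with a number than by eye. One option is to compute the average daily page views per article. The sketch below uses a small stand-in data frame; toy and avg_views are hypothetical names, and the counts are invented for illustration, not taken from the real download.

```r
# Toy stand-in for sub_dcomp; counts are made up for illustration only
toy <- data.frame(
  count = c(100, 140, 300, 340, 10, 30),
  title = rep(c("Bajirao_I", "Mastani", "Bajirao_Mastani"), each = 2),
  stringsAsFactors = FALSE
)

# Mean daily page views for each title
avg_views <- aggregate(count ~ title, data = toy, FUN = mean)
print(avg_views)
```

With the real data, replacing toy with sub_dcomp should give one average per article for the period before June 2015.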