In this article I will show how to download Wikipedia page-view data and plot basic trends. Let’s say we are interested in the Wikipedia page for Mary Kom, the five-time World Amateur Boxing champion. Her Wikipedia page is https://en.wikipedia.org/wiki/Mary_Kom. In 2014 a movie based on her life was released; the Wikipedia page for the film Mary Kom is https://en.wikipedia.org/wiki/Mary_Kom_(film). We can download the page views for these two pages in R with the wikipediatrend package.
Step 1 - Load the required packages. We will need the wikipediatrend and ggplot2 packages in the R workspace. (If a package is not installed, you can install it with install.packages("packagename").)
library("wikipediatrend")
library("ggplot2")
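Before loading the packages, it can help to check which of them are actually installed. The following is a minimal sketch; `missing_packages` is a hypothetical helper written for this article, not a function from wikipediatrend or ggplot2, and the install line is left commented out on purpose.

```r
# Return the names in `pkgs` that cannot be found in the current library paths.
# Note: requireNamespace() loads a package's namespace as a side effect when found.
# `missing_packages` is a hypothetical helper, not part of any package.
missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

needed <- c("wikipediatrend", "ggplot2")
to_install <- missing_packages(needed)

if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from your configured repository
} else {
  message("All required packages are available.")
}
```

If the message lists a package, install it and then run the library() calls from Step 1 again.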
Step 2 - Download the page views for Mary Kom from January 2011 onwards with the function wp_trend()
mk = wp_trend( # func wp_trend() builds a data object - mk
"Mary_Kom", # search term is "Mary_Kom"
from = "2011-01-01",
to = prev_month_end())
#dim(mk);
head(mk)
## date count lang page rank month title
## 1 2011-01-14 52 en Mary_Kom -1 201101 Mary_Kom
## 2 2011-01-15 57 en Mary_Kom -1 201101 Mary_Kom
## 3 2011-01-16 67 en Mary_Kom -1 201101 Mary_Kom
## 4 2011-01-17 57 en Mary_Kom -1 201101 Mary_Kom
## 5 2011-01-10 64 en Mary_Kom -1 201101 Mary_Kom
## 6 2011-01-11 47 en Mary_Kom -1 201101 Mary_Kom
Step 3 - Download the page views for the Mary Kom film from January 2011 onwards with the function wp_trend()
mkf = wp_trend( # func wp_trend() builds a data object - mkf
"Mary_Kom_(film)", # search term is "Mary_Kom_(film)"
from = "2011-01-01", to = prev_month_end())
#dim(mkf);
head(mkf)
## date count lang page rank month title
## 1 2011-01-14 0 en Mary_Kom_(fi ... -1 201101 Mary_Kom_(fi ...
## 2 2011-01-15 0 en Mary_Kom_(fi ... -1 201101 Mary_Kom_(fi ...
## 3 2011-01-16 0 en Mary_Kom_(fi ... -1 201101 Mary_Kom_(fi ...
## 4 2011-01-17 0 en Mary_Kom_(fi ... -1 201101 Mary_Kom_(fi ...
## 5 2011-01-10 0 en Mary_Kom_(fi ... -1 201101 Mary_Kom_(fi ...
## 6 2011-01-11 0 en Mary_Kom_(fi ... -1 201101 Mary_Kom_(fi ...
Step 4 - Now we can append these data sets together and plot the trend
dcomp = rbind(mk,mkf);
ggplot(dcomp) +
geom_line(aes(x = date, y = count,
colour = title)) +
scale_colour_manual(values = c("red", "blue"))
It’s clearly visible that page views for Mary Kom spiked in 2012, when she won a bronze medal at the 2012 Summer Olympics, and again in 2014, when the movie Mary Kom was released.
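Spikes like these can also be located numerically instead of being read off the plot. The sketch below uses a small invented data frame with the same column names as the downloaded data (date, count, title); `peak_days` is a hypothetical helper, and the page names and numbers are made up for illustration.

```r
# Toy data with the same column names as the downloaded page views.
# Titles and counts are invented for illustration only.
toy <- data.frame(
  date  = rep(seq(as.Date("2000-01-01"), by = "day", length.out = 3), times = 2),
  count = c(120, 950, 300, 40, 700, 210),
  title = rep(c("Page_A", "Page_B"), each = 3),
  stringsAsFactors = FALSE
)

# For each title, keep the row with the highest daily count.
# `peak_days` is a hypothetical helper, not part of wikipediatrend.
peak_days <- function(d) {
  rows <- lapply(split(d, d$title), function(g) g[which.max(g$count), ])
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out[, c("title", "date", "count")]
}

peak_days(toy)
```

On the real data, something like peak_days(as.data.frame(dcomp)) should give the busiest day for each article, assuming the combined object converts cleanly to a plain data frame.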
Similarly, let’s look at the trend for Baji Rao I, Mastani, and the film Bajirao Mastani. As described above, first download the page views for all three Wikipedia articles over roughly the last five years.
baji = wp_trend( # func wp_trend() builds a data object - baji
"Bajirao_I", # search term is "Bajirao_I"
from = "2011-01-01",
to = "2016-02-14")
mast = wp_trend( # func wp_trend() builds a data object - mast
"Mastani", # search term is "Mastani"
from = "2011-01-01",
to = "2016-02-14")
baji_mast = wp_trend( # func wp_trend() builds a data object - baji_mast
"Bajirao_Mastani", # search term is "Bajirao_Mastani"
from = "2011-01-01",
to = "2016-02-14")
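The three downloads above repeat the same call with a different page name, so they could be wrapped in a loop. This is only a sketch: `download_pages` and `fake_fetch` are hypothetical names, and the helper assumes the fetching function accepts a page name plus `from` and `to`, as the wp_trend() calls above do. The fake fetcher returns invented counts so that the sketch can be tried without a network connection.

```r
# `download_pages` is a hypothetical helper. `fetch` is any function that takes
# a page name plus `from` and `to`. The default is only evaluated when the
# helper is called without `fetch`, so wikipediatrend is not needed for the demo.
download_pages <- function(pages, from, to, fetch = wikipediatrend::wp_trend) {
  pieces <- lapply(pages, function(p) fetch(p, from = from, to = to))
  do.call(rbind, pieces)
}

# Offline stand-in for wp_trend() that returns invented counts.
fake_fetch <- function(page, from, to) {
  dates <- seq(as.Date(from), as.Date(to), by = "day")
  data.frame(date = dates, count = seq_along(dates), title = page,
             stringsAsFactors = FALSE)
}

demo <- download_pages(c("Bajirao_I", "Mastani", "Bajirao_Mastani"),
                       from = "2016-02-10", to = "2016-02-14",
                       fetch = fake_fetch)
table(demo$title)
```

Leaving out the `fetch` argument would make the helper call wp_trend() for each page, which needs the wikipediatrend package and an internet connection.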
Now combine all three data sets and plot them with ggplot2
dcomp = rbind(baji_mast,baji,mast);
ggplot(dcomp) +
geom_line(aes(x = date, y = count,
colour = title)) +
scale_colour_manual(values = c("red", "blue",'green'))
Again we can see a huge spike between Dec 2015 and Jan 2016, around the release of the film Bajirao Mastani in December 2015. Because that spike dwarfs everything else, let’s plot the page views of these articles before June 2015 only
sub_dcomp =dcomp[dcomp$date < "2015-06-01",]
ggplot(sub_dcomp) +
geom_line(aes(x = date, y = count,
colour = title)) +
scale_colour_manual(values = c("red", "blue",'green'))
Interestingly, the Mastani article seems to be read more on Wikipedia than the Baji Rao I article. The reason might be the movie Mastani (1955), directed by Dhirubhai Desai, and the TV series RAU and Shrimant Peshwa Bajirao Mastani.
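A comparison like this can be backed with a number, for example the mean daily views per article before a cutoff date. The sketch below uses invented data with the same column names; `mean_views_before` is a hypothetical helper and assumes the date column is of class Date.

```r
# Toy data with the same column names as the combined page-view data.
# Titles and counts are invented for illustration only.
toy <- data.frame(
  date  = rep(seq(as.Date("2000-01-01"), by = "day", length.out = 4), times = 2),
  count = c(10, 20, 30, 400, 50, 60, 70, 80),
  title = rep(c("Page_A", "Page_B"), each = 4),
  stringsAsFactors = FALSE
)

# Mean daily views per title, using only rows strictly before `cutoff`.
# `mean_views_before` is a hypothetical helper, not part of wikipediatrend.
mean_views_before <- function(d, cutoff) {
  early <- d[d$date < as.Date(cutoff), ]
  aggregate(count ~ title, data = early, FUN = mean)
}

mean_views_before(toy, "2000-01-04")
```

Applied to the real data with the cutoff "2015-06-01", the article with the larger mean would be the one that was read more before the film-related spike.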