Using data obtained from Google Trends, we consider the monthly time series of relative search volume for the strings Genstat and Minitab from 2004 to the present. I have chosen these two statistical packages because their names are fairly distinctive as search strings, and because I am a former user of both.
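# Read each Google Trends export, dropping the first column (presumably the month labels)
gf <- read.csv("Genstat.csv", header=TRUE)[,-1]
mf <- read.csv("Minitab.csv", header=TRUE)[,-1]
# Build monthly series starting in January 2004
Gseries <- ts(gf, frequency=12, start=c(2004,1))
Mseries <- ts(mf, frequency=12, start=c(2004,1))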
Gseries
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 2004 48 64 78 15 66 39 41 38 35 72 48 56
## 2005 39 42 35 88 42 85 52 17 61 100 71 42
## 2006 43 45 38 29 11 28 34 62 20 54 40 9
## 2007 20 46 29 15 36 14 13 35 34 28 41 23
## 2008 23 34 14 32 14 22 21 19 20 26 19 36
## 2009 19 28 30 18 24 23 17 15 18 26 23 15
## 2010 20 18 24 29 19 27 22 18 23 22 24 14
## 2011 15 21 21 28 22 11 15 13 18 16 25 18
## 2012 11 16 16 28 19 12 12 14 13 10 17 9
## 2013 16 15 18 21 11 13 15 12 16 11 14 9
## 2014 5 17 19 11 15 8 9 8 12 15 9 9
## 2015 10 13 17 14 15 13 10 11 12 9 9 10
## 2016 11 9 10 14 11 10 8 9 8 11 9 5
## 2017 10 11 9 8 10 8 5 8 8 7 7 8
## 2018 7 10
Mseries
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 2004 81 96 90 88 74 75 70 66 93 93 92 74
## 2005 79 100 78 81 74 71 52 62 91 81 84 60
## 2006 66 72 72 76 70 43 46 61 72 70 71 43
## 2007 56 58 63 61 57 50 39 45 64 58 58 45
## 2008 50 53 49 58 53 47 41 43 58 54 53 42
## 2009 39 50 57 49 45 38 36 34 50 47 52 41
## 2010 40 50 53 52 43 41 34 36 48 46 48 37
## 2011 40 42 50 46 43 40 32 35 43 43 45 34
## 2012 36 40 42 44 39 34 30 28 41 39 41 31
## 2013 33 37 37 41 38 30 27 27 40 39 40 29
## 2014 30 38 37 40 36 33 27 29 40 39 40 32
## 2015 28 36 38 40 34 30 28 26 37 37 37 29
## 2016 24 30 33 35 33 27 23 26 32 32 35 26
## 2017 25 32 32 35 32 25 21 24 30 32 33 26
## 2018 25 29
Here are time series plots for the two series.
plot(Gseries, xlab="Year", ylim=c(0,100),
main="Relative Volume of Genstat Queries")
abline(h = 0)
plot(Mseries, xlab="Year", ylim=c(0,100),
main="Relative Volume of Minitab Queries")
abline(h = 0)
The decline in Genstat queries appears much more rapid than that of Minitab. Indeed, the Minitab decline looks as if it may be flattening out.
It should be noted that Genstat queries are much lower in volume than those for Minitab. Indeed, when both are downloaded together from Google Trends, many Genstat values are left-censored. A rough comparison of order statistics suggests that Google receives around 30 times as many queries about Minitab as about Genstat. This may understate Genstat's relative usage, though, as Genstat users may be more likely than Minitab users to resolve queries within the user community.
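One way such an order-statistic comparison could be carried out is sketched below. This is only an illustration, not necessarily the calculation behind the figure of 30: the helper name `ratio_from_upper_quantiles`, the censoring limit of 1, and the made-up input vectors are all assumptions. The idea is to compare only upper quantiles, which lie above the censoring limit in both series.

```r
# Hypothetical helper: estimate how many times larger 'big' is than 'small'
# when both are on a common scale and 'small' is left-censored below 'limit'.
# Only upper quantiles are compared, since those are unaffected by censoring.
ratio_from_upper_quantiles <- function(big, small, limit = 1,
                                       probs = c(0.75, 0.9, 1)) {
  qb <- quantile(big, probs, names = FALSE)
  qs <- quantile(small, probs, names = FALSE)
  usable <- qs > limit            # keep quantiles above the censoring limit
  median(qb[usable] / qs[usable])
}

# Made-up illustration (NOT Google Trends data): 'big' is exactly 30 times
# 'small' by construction, and values of 'small' below 1 are censored to 0.
x <- seq(0.2, 3, length.out = 60)
minitab_joint <- 30 * x
genstat_joint <- ifelse(x < 1, 0, x)

est <- ratio_from_upper_quantiles(minitab_joint, genstat_joint)
est
```

With real data the ratio would vary from quantile to quantile, which is why the sketch takes a median over several of them.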
Before examining the seasonal effects in these series, we consider Box-Cox transformations.
library(fpp)
BoxCox.lambda(Gseries)
## [1] -0.3822953
BoxCox.lambda(Mseries)
## [1] -0.06255488
The faster decline of Genstat is reflected in the estimated Box-Cox parameters, with the Genstat series requiring a harsher transformation than the Minitab series.
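For reference, BoxCox.lambda in the forecast package (which fpp loads) offers two estimation methods: Guerrero's method, which is the default, and profile log-likelihood. The sketch below compares them. It assumes the forecast package is installed and uses only the first three years of the Minitab series, copied from the printed output, so its estimates will differ from the full-series values above.

```r
library(forecast)  # provides BoxCox.lambda

# Minitab values for 2004-2006, copied from the printed series
m <- c(81, 96, 90, 88, 74, 75, 70, 66, 93, 93, 92, 74,
       79, 100, 78, 81, 74, 71, 52, 62, 91, 81, 84, 60,
       66, 72, 72, 76, 70, 43, 46, 61, 72, 70, 71, 43)
Msub <- ts(m, frequency = 12, start = c(2004, 1))

# Estimate lambda under each method (default search range is -1 to 2)
lambda_guerrero <- BoxCox.lambda(Msub, method = "guerrero")
lambda_loglik   <- BoxCox.lambda(Msub, method = "loglik")
c(guerrero = lambda_guerrero, loglik = lambda_loglik)
```

If the two methods give noticeably different values, that is a hint the choice of transformation is not sharply determined by the data.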
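Choosing transformation parameters close to these estimates (BoxCox.lambda uses Guerrero's method by default rather than maximum likelihood), namely -0.5 for Genstat and 0 (the log) for Minitab, we re-plot the time series after transformation.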
tGseries <- -Gseries^(-0.5)
tMseries <- log(Mseries)
plot(tGseries, xlab="Year",
main="Transformed Relative Volume of Genstat Queries")
plot(tMseries, xlab="Year",
main="Transformed Relative Volume of Minitab Queries")
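The transformation -x^(-0.5) used for Genstat is not written in the usual Box-Cox form, but it differs from the Box-Cox transform with parameter -0.5 only by an increasing linear rescaling, so the shape of the plot is unaffected. A small sketch checking this in base R, using the 2004 Genstat values from the printed series (the helper name `box_cox` is my own):

```r
# Box-Cox transform: (x^lambda - 1) / lambda for lambda != 0, log(x) for lambda == 0
box_cox <- function(x, lambda) {
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

# Genstat values for 2004, copied from the printed series
g2004 <- c(48, 64, 78, 15, 66, 39, 41, 38, 35, 72, 48, 56)

bc     <- box_cox(g2004, -0.5)   # textbook Box-Cox form
simple <- -g2004^(-0.5)          # form used for tGseries

# The two agree up to an increasing linear map: bc = 2 + 2 * simple
all.equal(bc, 2 + 2 * simple)
```

The leading minus sign in -x^(-0.5) matters: without it the transformation would be decreasing and the plotted series would appear upside down.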
From the time series plots above, Minitab appears to show seasonal effects in Google query traffic, whereas the picture for Genstat is less clear, possibly because lower query volumes make the Genstat series noisier. We will explore this graphically using seasonplot, from the forecast package that fpp loads, and monthplot, from R’s stats package.
seasonplot(tGseries, xlab="Month",
main="Transformed Relative Volume of Genstat Queries",
year.labels=TRUE, col=1:8, pch=19)
seasonplot(tMseries, xlab="Month",
main="Transformed Relative Volume of Minitab Queries",
year.labels=TRUE, col=1:8, pch=19)
In the Minitab seasonplot, the Northern Hemisphere summer and midwinter emerge as periods of relatively low traffic, with spring and fall having relatively high traffic. This may relate to vacation periods of Minitab users.
No clear seasonal patterns are visible in the Genstat plot.
We now turn to monthplot output.
monthplot(tGseries,xlab="Month",xaxt="n",
main="Transformed Relative Volume of Genstat Queries")
axis(1,at=1:12,labels=month.abb,cex.axis=0.8)
monthplot(tMseries,xlab="Month",xaxt="n",
main="Transformed Relative Volume of Minitab Queries")
axis(1,at=1:12,labels=month.abb,cex.axis=0.8)
The strong declining trend and distinct seasonal effects are again visible in the Minitab monthplot. The Genstat trends are similar but noisier. Weak seasonal effects can also be seen in the Genstat monthplot. These generally resemble the Minitab seasonal effects but are weaker, suggesting that a proportion of Genstat users share the season-dependent usage patterns of Minitab users.
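The seasonal pattern read off the monthplot can also be summarised numerically. The following is a rough base R sketch rather than a formal decomposition: it removes each year's mean from the log series as a crude trend adjustment and then averages what remains by calendar month. It uses only the 2004-2006 Minitab values copied from the printed series, and the variable names are my own.

```r
# Minitab values for 2004-2006, copied from the printed series
m <- c(81, 96, 90, 88, 74, 75, 70, 66, 93, 93, 92, 74,
       79, 100, 78, 81, 74, 71, 52, 62, 91, 81, 84, 60,
       66, 72, 72, 76, 70, 43, 46, 61, 72, 70, 71, 43)
x     <- log(m)                       # same transformation as tMseries
year  <- rep(2004:2006, each = 12)
month <- rep(1:12, times = 3)

# Subtract each year's mean to remove most of the trend,
# then average the remainder by calendar month
detrended    <- x - ave(x, year)
month_effect <- setNames(as.numeric(tapply(detrended, month, mean)), month.abb)
round(month_effect, 2)
```

Positive values mark months with above-average traffic for their year. Because the series is on the log scale, the values can be read roughly as proportional deviations.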
It may seem dubious to compare the relative strength of the Genstat and Minitab seasonal effects when the two series have been transformed differently, so I repeat the plots under a common power transformation. The plotting code is identical to that above and has been suppressed.
tGseries <- -Gseries^(-0.2)
tMseries <- -Mseries^(-0.2)
The graphs appear much as they did under the separate transformations.
I have nothing further to add at this time. I do not wish to be negative about this software, which has been incredibly useful to me in the past and, I am sure, continues to be very useful to many statisticians. Comments are welcome, but please reply through the list by which you arrived at this article. I may revise this note later to incorporate further data and/or analyses.