Time series plots of Genstat and Minitab Google searches

Using data obtained from Google Trends, we consider the monthly time series of relative search volume for the strings Genstat and Minitab from 2004 until the present (February 2018 in the series shown below). I have chosen these two statistical packages because their names are fairly distinctive, and because I am a former user of both.

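# Read each downloaded CSV file, dropping its first column
gf <- read.csv("Genstat.csv", header=TRUE)[,-1]
mf <- read.csv("Minitab.csv", header=TRUE)[,-1]
# Convert to monthly time series starting in January 2004
Gseries <- ts(gf, frequency=12, start=c(2004,1))
Mseries <- ts(mf, frequency=12, start=c(2004,1))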
Gseries
##      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 2004  48  64  78  15  66  39  41  38  35  72  48  56
## 2005  39  42  35  88  42  85  52  17  61 100  71  42
## 2006  43  45  38  29  11  28  34  62  20  54  40   9
## 2007  20  46  29  15  36  14  13  35  34  28  41  23
## 2008  23  34  14  32  14  22  21  19  20  26  19  36
## 2009  19  28  30  18  24  23  17  15  18  26  23  15
## 2010  20  18  24  29  19  27  22  18  23  22  24  14
## 2011  15  21  21  28  22  11  15  13  18  16  25  18
## 2012  11  16  16  28  19  12  12  14  13  10  17   9
## 2013  16  15  18  21  11  13  15  12  16  11  14   9
## 2014   5  17  19  11  15   8   9   8  12  15   9   9
## 2015  10  13  17  14  15  13  10  11  12   9   9  10
## 2016  11   9  10  14  11  10   8   9   8  11   9   5
## 2017  10  11   9   8  10   8   5   8   8   7   7   8
## 2018   7  10
Mseries
##      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 2004  81  96  90  88  74  75  70  66  93  93  92  74
## 2005  79 100  78  81  74  71  52  62  91  81  84  60
## 2006  66  72  72  76  70  43  46  61  72  70  71  43
## 2007  56  58  63  61  57  50  39  45  64  58  58  45
## 2008  50  53  49  58  53  47  41  43  58  54  53  42
## 2009  39  50  57  49  45  38  36  34  50  47  52  41
## 2010  40  50  53  52  43  41  34  36  48  46  48  37
## 2011  40  42  50  46  43  40  32  35  43  43  45  34
## 2012  36  40  42  44  39  34  30  28  41  39  41  31
## 2013  33  37  37  41  38  30  27  27  40  39  40  29
## 2014  30  38  37  40  36  33  27  29  40  39  40  32
## 2015  28  36  38  40  34  30  28  26  37  37  37  29
## 2016  24  30  33  35  33  27  23  26  32  32  35  26
## 2017  25  32  32  35  32  25  21  24  30  32  33  26
## 2018  25  29
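
As a check on how ts() lines the values up with the calendar, here is a minimal sketch using stand-in values rather than the real data. It assumes only what the printout shows: monthly observations from January 2004, with 14 full years plus two months of 2018, giving 170 values. The names x and y2005 are illustrative.

```r
# Stand-in values: 170 monthly observations, the same length as the
# printed series (14 full years plus January and February 2018).
x <- ts(seq_len(170), frequency = 12, start = c(2004, 1))

start(x)   # (year, month) of the first observation
end(x)     # (year, month) of the last observation

# window() extracts a sub-period by (year, month); here calendar year 2005.
y2005 <- window(x, start = c(2005, 1), end = c(2005, 12))
length(y2005)
```

If end() does not report the month expected from the download, the start argument or the number of rows read from the CSV is the first thing to check.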

Here are time series plots for the two series.

plot(Gseries, xlab="Year", ylim=c(0,100),
     main="Relative Volume of Genstat Queries")
abline(h = 0)

plot(Mseries, xlab="Year",  ylim=c(0,100),
     main="Relative Volume of Minitab Queries")
abline(h = 0)

The decline in Genstat queries seems much more rapid than that in Minitab queries. Indeed, the Minitab decline looks as if it may be flattening out.

It should be noted that Genstat queries are much lower in volume than those for Minitab. Indeed, when both series are downloaded together from Google Trends, many Genstat values are left-censored. A rough comparison of order statistics suggests that Google receives around 30 times as many queries about Minitab as about Genstat. This may understate Genstat's relative usage, though, as Genstat users may be more likely than Minitab users to resolve queries within the user community.

Before examining the seasonal effects in these series, we will consider Box-Cox transformations.

library(fpp)  # also loads forecast, which provides BoxCox.lambda() and seasonplot()
BoxCox.lambda(Gseries)
## [1] -0.3822953
BoxCox.lambda(Mseries)
## [1] -0.06255488

The faster decline of Genstat is reflected in the estimated Box-Cox parameters, with the Genstat series requiring a stronger transformation than the Minitab series.

Choosing transformation parameters close to these estimates, we re-plot the time series after transformation.

tGseries <- -Gseries^(-0.5)
tMseries <- log(Mseries)
plot(tGseries, xlab="Year", 
     main="Transformed Relative Volume of Genstat Queries")

plot(tMseries, xlab="Year", 
     main="Transformed Relative Volume of Minitab Queries")
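
The two transformations used above are not written in the standard Box-Cox form, so it may help to see how they relate to it. The sketch below is illustrative only: box_cox is a hypothetical helper written here in base R (it is not a function from fpp), and the values in y are made-up positive numbers on the 0 to 100 scale.

```r
# Box-Cox transform: (y^lambda - 1) / lambda, with log(y) as the lambda = 0 case.
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

y <- c(5, 10, 20, 50, 100)    # illustrative values, not taken from either series

bc_half  <- box_cox(y, -0.5)  # standard form with lambda = -0.5
shortcut <- -y^(-0.5)         # the form applied to the Genstat series above

# Algebraically bc_half = 2 + 2 * shortcut: a shift and a positive rescaling.
all.equal(bc_half, 2 + 2 * shortcut)

# For lambda close to 0 the family approaches the log used for Minitab.
max(abs(box_cox(y, 1e-6) - log(y)))
```

Because the shortcut differs from the standard form only by a shift and a positive scale factor, the plots have the same shape either way, and the leading minus sign keeps larger query volumes at the top of the plot.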

Seasonal plots of transformed series

From the time series plots above, Minitab appears to show seasonal effects in Google query traffic, whereas the picture of seasonality for Genstat is less clear, possibly because lower query volumes make the Genstat series noisier. We will explore this graphically using seasonplot (from the forecast package, which fpp loads) and monthplot (from R's built-in stats package).

seasonplot(tGseries, xlab="Month", 
  main="Transformed Relative Volume of Genstat Queries",
  year.labels=TRUE, col=1:8, pch=19)

seasonplot(tMseries, xlab="Month", 
  main="Transformed Relative Volume of Minitab Queries",
  year.labels=TRUE, col=1:8, pch=19)

In the Minitab seasonplot, the Northern Hemisphere summer and midwinter emerge as periods of relatively low traffic, with spring and fall having relatively high traffic. This may relate to vacation periods of Minitab users.

No clear seasonal patterns are visible in the Genstat plot.
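
The Minitab pattern can also be checked numerically. The following is a rough sketch rather than part of the original analysis: it takes only the 2014 to 2017 Minitab values, transcribed from the printed series above, and averages the log series within each calendar month. It makes no attempt to remove the trend, so later months are pulled down slightly by the overall decline. The names minitab and monthly_mean are illustrative.

```r
# Minitab relative search volume, 2014-2017, transcribed from the printout above.
minitab <- ts(c(
  30, 38, 37, 40, 36, 33, 27, 29, 40, 39, 40, 32,   # 2014
  28, 36, 38, 40, 34, 30, 28, 26, 37, 37, 37, 29,   # 2015
  24, 30, 33, 35, 33, 27, 23, 26, 32, 32, 35, 26,   # 2016
  25, 32, 32, 35, 32, 25, 21, 24, 30, 32, 33, 26    # 2017
), frequency = 12, start = c(2014, 1))

# cycle() gives the month index (1-12) of each observation;
# tapply() then averages the log values within each month.
monthly_mean <- as.vector(
  tapply(as.numeric(log(minitab)), as.numeric(cycle(minitab)), mean)
)
names(monthly_mean) <- month.abb

round(monthly_mean - mean(monthly_mean), 2)   # deviation from the overall mean
names(which.min(monthly_mean))                # month with the lowest average
names(which.max(monthly_mean))                # month with the highest average
```

On this four-year window the lowest monthly average falls in July and the highest in April, which agrees with the low summer traffic seen in the seasonplot.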

We now turn to monthplot output.

monthplot(tGseries,xlab="Month",xaxt="n",
  main="Transformed Relative Volume of Genstat Queries")
axis(1,at=1:12,labels=month.abb,cex.axis=0.8)

monthplot(tMseries,xlab="Month",xaxt="n",
  main="Transformed Relative Volume of Minitab Queries")
axis(1,at=1:12,labels=month.abb,cex.axis=0.8)

The strong declining trend and distinct seasonal effects are again visible in the Minitab monthplot. The Genstat trends are similar but subject to more noise. Weak seasonal effects are also seen in the Genstat monthplot. These are generally similar to the Minitab seasonal effects but weaker, suggesting that a proportion of Genstat users have the same season-dependent usage patterns as Minitab users.

Seasonal plots under a common transformation

It may seem dubious to talk about the relative strength of the Genstat and Minitab seasonal effects when the series have been differently transformed. So I repeat the plots under a common power transformation. The plotting code is identical to that above and has been suppressed.

tGseries <- -Gseries^(-0.2)
tMseries <- -Mseries^(-0.2)

The graphs appear much as in the previous section.
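
A numerical summary can complement the visual comparison. The sketch below is one possible approach, not the analysis behind the plots: it applies the same power transformation to the 2014 to 2017 values of both series (transcribed from the printouts above), decomposes each with stl() from base R, and computes a seasonal strength index, 1 - Var(remainder) / Var(seasonal + remainder), floored at zero. The helper seasonal_strength and the choice of a four-year window are assumptions made for illustration.

```r
# 2014-2017 values transcribed from the printed series above.
genstat <- ts(c(
   5, 17, 19, 11, 15,  8,  9,  8, 12, 15,  9,  9,   # 2014
  10, 13, 17, 14, 15, 13, 10, 11, 12,  9,  9, 10,   # 2015
  11,  9, 10, 14, 11, 10,  8,  9,  8, 11,  9,  5,   # 2016
  10, 11,  9,  8, 10,  8,  5,  8,  8,  7,  7,  8    # 2017
), frequency = 12, start = c(2014, 1))
minitab <- ts(c(
  30, 38, 37, 40, 36, 33, 27, 29, 40, 39, 40, 32,   # 2014
  28, 36, 38, 40, 34, 30, 28, 26, 37, 37, 37, 29,   # 2015
  24, 30, 33, 35, 33, 27, 23, 26, 32, 32, 35, 26,   # 2016
  25, 32, 32, 35, 32, 25, 21, 24, 30, 32, 33, 26    # 2017
), frequency = 12, start = c(2014, 1))

# Hypothetical helper: seasonal strength after a common power transformation.
seasonal_strength <- function(x, power = -0.2) {
  tx <- -x^power                                    # same transform for both series
  parts <- stl(tx, s.window = "periodic")$time.series
  rem  <- as.numeric(parts[, "remainder"])
  seas <- as.numeric(parts[, "seasonal"])
  max(0, 1 - var(rem) / var(seas + rem))            # 0 = none, near 1 = strong
}

strength <- c(Genstat = seasonal_strength(genstat),
              Minitab = seasonal_strength(minitab))
round(strength, 2)
```

Values near 1 indicate that the seasonal component dominates the remainder, and values near 0 indicate little detectable seasonality. With only four years per series the index is noisy, so it is best read alongside the plots rather than in place of them.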

Discussion

I have nothing more to add at this time to what I have said above. I do not wish to be negative about this software, which has been incredibly useful to me in the past and, I am sure, continues to be very useful to many statisticians in the present. Comments would be welcome, but please reply through the list through which you arrived at the article. I may revise this note later to incorporate further data and/or analyses.