The zeta distribution is a discrete probability distribution with shape parameter \(s \in (1,\infty)\), and probability mass function
\[ f_{s}(x)= \dfrac{x^{{-s}}}{\zeta (s)},\, \] where \(\zeta(s)\) is the Riemann zeta function, and \(x \in \{1,2,3,\dots\}\).
The likelihood function of \(s\), for a sample \({\bf x} = (x_1,\dots,x_n)^{\top}\) is given by:
\[ \begin{align} L(s \mid {\bf x}) &= \prod_{i=1}^n \dfrac{x_i^{{-s}}}{\zeta (s)}\\ &= \zeta(s)^{-n} \left( \prod_{i=1}^n x_i \right)^{-s}. \end{align} \]
This expression implies that a sufficient statistic for \(s\) is \(\prod_{i=1}^n x_i\), and consequently \(\dfrac{1}{n}\sum_{i=1}^n \log(x_i)\) is also sufficient for \(s\).
The log-likelihood function is:
\[ \begin{align} \ell(s \mid {\bf x}) &= -n \log \zeta(s) - s \sum_{i=1}^n \log x_i. \end{align} \]
One of the challenges in estimating the parameter \(s\) is that the normalising constant in the
zeta distribution is not available in closed-form. However, it is
acknowledged that several efficient implementations of the zeta function
are available (for example in the R package
VGAM). Thus, it is possible to directly implement the log-likelihood
function and optimise it using general-purpose methods
(e.g. optim
or nlminb
).
The mean of the zeta distribution is given by:
\[ E[X] = \dfrac{\zeta(s-1)}{\zeta(s)}, \quad s > 2. \] Then, the method of moments estimator can be obtained by solving \(\dfrac{\zeta(s-1)}{\zeta(s)} = \bar{\bf x}\) using numerical methods. This is restricted to \(s>2\).
See also: Inference for the Zeta distribution.
The AMLE, introduced by Rubio and Johansen (2013), consists of using Approximate Bayesian Computation (ABC) methods to approximate the likelihood function in cases where the likelihood is intractable and/or too computationally expensive to evaluate, and maximising this approximated likelihood function. The basic ABC algorithm is shown below:
Simulate \({\bf\theta}^{\prime}\) from the prior distribution \(\pi(\cdot)\).
Generate \({\bf y}\) from the model \(f(\cdot\vert{\bf\theta}^{\prime})\).
Accept \({\bf\theta}^{\prime}\) with probability \(\propto K_{\varepsilon}({\bf x} \vert {\bf y})\) otherwise return to step 1.
where \(K_{\varepsilon}({\bf x}\vert {\bf y})\) is a normalised Markov kernel. The ABC algorithm produces a sample from the following approximation to the posterior distribution:
\[ \widehat{\pi}_{\varepsilon}({\bf\theta}\vert{\bf x}) = \dfrac{\widehat{f}_{\varepsilon}({\bf x}\vert{\bf\theta})\pi({\bf\theta})}{\int_{\Theta}\widehat{f}_{\varepsilon}({\bf x}\vert{\bf t})\pi({\bf t})d{\bf t}}, \]
where
\[ \widehat{f}_{\varepsilon}({\bf x}\vert{\bf\theta}) = \int_{{\mathbb R}^{q\times n}}K_{\varepsilon}({\bf x}\vert {\bf y}) f({\bf y}\vert{\bf\theta}) d{\bf{y}} , \]
is an approximation of the likelihood function. Therefore, if we use a uniform prior on a compact set that contains the Maximum Likelihood Estimator (MLE), and the Maximum a Posteriori (MAP, the maximum of the posterior distribution) coincide. Thus, it is possible to approximate the MLE by using ABC methods to generate an approximate sample from the posterior and then approximating the MAP using this sample. This can be done by using the ABC algorithm in the first step of the AMLE algorithm, as shown below:
The AMLE Algorithm
Obtain a sample \({\bf \theta}^*_{\varepsilon}=({\bf \theta}^*_{\varepsilon,1},...,{\bf \theta}^*_{\varepsilon,k})\) from \(\widehat{\pi}_{\varepsilon}({\bf \theta}\vert{\bf x})\).
Using the sample \({\bf \theta}^*_{\varepsilon}\) construct a nonparametric estimator \(\widehat{\pi}_{k,\varepsilon}({\bf \theta}\vert{\bf x})\) of the density \(\widehat{\pi}_{\varepsilon}({\bf \theta}\vert{\bf x})\).
Calculate the maximum of \(\widehat{\pi}_{k,\varepsilon}({\bf \theta}\vert{\bf x})\), \(\tilde{{\bf \theta}}_{\varepsilon}\). This is an approximation of the MLE \(\widehat{{\bf \theta}}\).
Asymptotic properties of the AMLE are studied in Rubio and Johansen (2013).
The following R code illustrates how to estimate the parameter \(s\) using MLE, AMLE, and the method of moments, using a simulated data set.
rm(list=ls())
# Required packages
library(VGAM)
## Loading required package: stats4
## Loading required package: splines
library(modeest)
library(tolerance)
######################################################################################################
######################################################################################################
# AMLE
# https://doi.org/10.1214/13-EJS819
######################################################################################################
######################################################################################################
######################################################################################################
# Input variables
######################################################################################################
#--------------------------------------------------------------------------------------------------------
# dat : Vector of observations.
# N : ABC sample size to be used for the kernel density estimation
# tol : Tolerance for the ABC algorithm
# a1 : Left-end point of the support of the uniform prior distribution on $s$
# b1 : Right-end point of the support of the uniform prior distribution on $s$
# Simulations are based on the parameterisation s = s.tilde - 1 in the VGAM R package
#--------------------------------------------------------------------------------------------------------
# Metric
dist = function(x,y) return( sqrt( ( mean(log(x)) - mean(log(y)) )^2 ) )
# AMLE function
AMLE = function(dat,N,tol,a1,b1){
pp = qq = rep(0,N)
j=1
# pb = txtProgressBar(min = 1, max = N, initial = 1) # progress bar
while(j<N+1){
p = runif(1,a1,b1)
datABC <- rzeta(n = ns, shape = p)
if(dist(datABC,dat)<=tol) {
pp[j] = p
j = j+1
}
# setTxtProgressBar(pb,j)
}
# close(pb)
pB = density(pp)
MLE <- pB$x[which(pB$y==max(pB$y))]
out = list( MLE = MLE, ABCSamp = pp)
return(out)
}
######################################################################################################
######################################################################################################
# Method of Moments
######################################################################################################
######################################################################################################
# Estimating equation: method of moments
esteq <- Vectorize(function(par) zeta(par - 1)/zeta(par) - mean(dat0) )
######################################################################################################
######################################################################################################
# Log-likelihood
######################################################################################################
######################################################################################################
# Minus log-likelihood function
loglik <- Vectorize(function(par) ns*log( zeta(par) ) + par*sum(log(dat0)) )
######################################################################################################
######################################################################################################
# Simulated data
######################################################################################################
######################################################################################################
# Data simulation
set.seed(1234)
ns <- 1000
st <- 3 # s.tilde = s - 1
dat0 <- rzeta(n = ns, shape = st)
table(dat0)
## dat0
## 1 2 3 4 6 8
## 903 71 20 2 3 1
#--------------------------------------------------------------------------------------------
# MLE
#--------------------------------------------------------------------------------------------
# Direct implementation
OPTMLE <- optimize(loglik, interval = c(2,6))
curve(loglik, 2,6, xlab = "s", ylab = "log-likelihood", lwd = 2,
cex.axis = 1.5, cex.lab = 1.5)
s_MLE <- OPTMLE$minimum
s_MLE # MLE in the original parameterisation
## [1] 3.725665
# Using the R package 'tolerance'
out.zeta <- zm.ll(dat0, N = Inf, dist = "Zeta")
s_MLE2 <- as.numeric(coef(out.zeta))
s_MLE2 # MLE in the original parameterisation
## [1] 3.736972
#--------------------------------------------------------------------------------------------
# MME
# Applicable only for s > 2
#--------------------------------------------------------------------------------------------
# Estimating equation
curve(esteq, 2,5, xlab = "s", ylab = "Estimating equation", lwd = 2,
cex.axis = 1.5, cex.lab = 1.5)
# Estimator
opt <- uniroot(esteq, c(2,6))
# Method of Moments estimator
s_MME <- opt$root
s_MME # MME in the original parameterisation
## [1] 3.781259
#--------------------------------------------------------------------------------------------
# AMLE
#--------------------------------------------------------------------------------------------
# AMLE of the simulated data
ABC = AMLE(dat = dat0, N = 1000, tol = 0.01, a1 = 2, b1 = 5)
# AMLE
s_AMLE <- ABC$MLE + 1
s_AMLE # AMLE in the original parameterisation
## [1] 3.751683
# ABC posterior sample
hist(ABC$ABCSamp+1, breaks = 25, probability = TRUE,
xlab = "s", ylab = "Density", main = "ABC posterior",
cex.axis = 1.5, cex.lab = 1.5)
points(density(ABC$ABCSamp+1), type = "l", lwd = 2, col = "blue")
box()
# Alternative methods for finding the mode (AMLE)
asselin(ABC$ABCSamp + 1)
## [1] 3.738178
parzen(ABC$ABCSamp + 1)
## [1] 3.752353
#------------------------------------------------------------------------------------
# Comparison
#------------------------------------------------------------------------------------
cbind(s_MLE, s_MLE2, s_MME, s_AMLE)
## s_MLE s_MLE2 s_MME s_AMLE
## [1,] 3.725665 3.736972 3.781259 3.751683