The zeta distribution

The zeta distribution is a discrete probability distribution with shape parameter \(s \in (1,\infty)\), and probability mass function

\[ f_{s}(x)= \dfrac{x^{-s}}{\zeta(s)}, \] where \(\zeta(s)\) is the Riemann zeta function, and \(x \in \{1,2,3,\dots\}\).
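As a quick illustration of the definition, the pmf can be evaluated directly with the zeta function from the VGAM R package (the same package used in the code further below); the helper fs() is defined here only for this sketch.

library(VGAM)

# Probability mass function of the zeta distribution: f_s(x) = x^(-s) / zeta(s)
fs <- function(x, s) x^(-s) / zeta(s)

fs(1:5, s = 3)            # first five probabilities for s = 3
sum(fs(1:1e6, s = 3))     # should be approximately 1
# Note: VGAM's own dzeta() uses the shape = s - 1 parameterisation (see below)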

Maximum likelihood estimator (MLE)

The likelihood function of \(s\), for a sample \({\bf x} = (x_1,\dots,x_n)^{\top}\) is given by:

\[ \begin{align} L(s \mid {\bf x}) &= \prod_{i=1}^n \dfrac{x_i^{-s}}{\zeta(s)}\\ &= \zeta(s)^{-n} \left( \prod_{i=1}^n x_i \right)^{-s}. \end{align} \]

By the Fisher–Neyman factorisation theorem, this expression implies that \(\prod_{i=1}^n x_i\) is a sufficient statistic for \(s\); since any one-to-one function of a sufficient statistic is also sufficient, \(\dfrac{1}{n}\sum_{i=1}^n \log(x_i)\) is sufficient for \(s\) as well.

The log-likelihood function is:

\[ \begin{align} \ell(s \mid {\bf x}) &= -n \log \zeta(s) - s \sum_{i=1}^n \log x_i. \end{align} \]

One of the challenges in estimating the parameter \(s\) is that the normalising constant of the zeta distribution is not available in closed form. However, several efficient implementations of the zeta function are available (for example, in the R package VGAM), so the log-likelihood function can be implemented directly and optimised using general-purpose methods (e.g. optim or nlminb).
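For illustration, a minimal sketch of this direct approach is shown below; the data vector x is a small made-up sample used only for this example, and nlminb is given the negative log-likelihood to minimise.

library(VGAM)   # provides zeta()

# Toy data (for illustration only)
x <- c(1, 1, 1, 2, 1, 3, 1, 1, 2, 1)

# Negative log-likelihood: n*log(zeta(s)) + s*sum(log(x))
nll <- function(s) length(x)*log(zeta(s)) + s*sum(log(x))

# General-purpose optimisation over s > 1
fit <- nlminb(start = 2, objective = nll, lower = 1.01, upper = 20)
fit$par   # approximate MLE of s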

Method of moments estimator (MME)

The mean of the zeta distribution is given by:

\[ E[X] = \dfrac{\zeta(s-1)}{\zeta(s)}, \quad s > 2. \] The method of moments estimator is then obtained by solving \(\dfrac{\zeta(s-1)}{\zeta(s)} = \bar{\bf x}\) numerically. Note that this approach is only applicable for \(s>2\), since the mean does not exist otherwise.

See also: Inference for the Zeta distribution.
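A minimal sketch of this root-finding step, using uniroot and the same toy data vector x as above, could look as follows; the full version applied to simulated data appears in the R code section below.

library(VGAM)

x <- c(1, 1, 1, 2, 1, 3, 1, 1, 2, 1)   # toy data, for illustration only

# Moment equation: zeta(s-1)/zeta(s) - sample mean
momeq <- function(s) zeta(s - 1)/zeta(s) - mean(x)

uniroot(momeq, interval = c(2.01, 20))$root   # MME of s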

Approximate maximum likelihood estimator (AMLE)

The AMLE, introduced by Rubio and Johansen (2013), consists of using Approximate Bayesian Computation (ABC) methods to approximate the likelihood function in cases where it is intractable or too computationally expensive to evaluate, and then maximising this approximation. The basic ABC algorithm is shown below:

  1. Simulate \({\bf\theta}^{\prime}\) from the prior distribution \(\pi(\cdot)\).

  2. Generate \({\bf y}\) from the model \(f(\cdot\vert{\bf\theta}^{\prime})\).

  3. Accept \({\bf\theta}^{\prime}\) with probability \(\propto K_{\varepsilon}({\bf x} \vert {\bf y})\); otherwise, return to step 1.

where \(K_{\varepsilon}({\bf x}\vert {\bf y})\) is a normalised Markov kernel. The ABC algorithm produces a sample from the following approximation to the posterior distribution:

\[ \widehat{\pi}_{\varepsilon}({\bf\theta}\vert{\bf x}) = \dfrac{\widehat{f}_{\varepsilon}({\bf x}\vert{\bf\theta})\pi({\bf\theta})}{\int_{\Theta}\widehat{f}_{\varepsilon}({\bf x}\vert{\bf t})\pi({\bf t})d{\bf t}}, \]

where

\[ \widehat{f}_{\varepsilon}({\bf x}\vert{\bf\theta}) = \int_{{\mathbb R}^{q\times n}}K_{\varepsilon}({\bf x}\vert {\bf y}) f({\bf y}\vert{\bf\theta}) d{\bf{y}} , \]

is an approximation of the likelihood function. Therefore, if we use a uniform prior on a compact set that contains the maximum likelihood estimator (MLE), the MLE and the maximum a posteriori estimator (MAP, the maximum of the posterior distribution) coincide. Thus, it is possible to approximate the MLE by using ABC methods to generate an approximate sample from the posterior, and then approximating the MAP using this sample. This is done by using the ABC algorithm in the first step of the AMLE algorithm, shown below:

The AMLE Algorithm

  1. Obtain a sample \({\bf \theta}^*_{\varepsilon}=({\bf \theta}^*_{\varepsilon,1},\dots,{\bf \theta}^*_{\varepsilon,k})\) from \(\widehat{\pi}_{\varepsilon}({\bf \theta}\vert{\bf x})\).

  2. Using the sample \({\bf \theta}^*_{\varepsilon}\) construct a nonparametric estimator \(\widehat{\pi}_{k,\varepsilon}({\bf \theta}\vert{\bf x})\) of the density \(\widehat{\pi}_{\varepsilon}({\bf \theta}\vert{\bf x})\).

  3. Calculate the maximum of \(\widehat{\pi}_{k,\varepsilon}({\bf \theta}\vert{\bf x})\), \(\tilde{{\bf \theta}}_{\varepsilon}\). This is an approximation of the MLE \(\widehat{{\bf \theta}}\).
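In practice, step 1 above requires choosing the kernel \(K_{\varepsilon}\) and a distance between data sets. One common choice, and the one corresponding to the acceptance rule dist(datABC, dat) <= tol in the R code below, is a uniform kernel based on a summary statistic, here the sufficient statistic \(\frac{1}{n}\sum_{i=1}^n \log x_i\) identified earlier:

\[ K_{\varepsilon}({\bf x}\vert {\bf y}) \propto {\bf 1}\left\{ \left\vert \dfrac{1}{n}\sum_{i=1}^n \log x_i - \dfrac{1}{n}\sum_{i=1}^n \log y_i \right\vert \leq \varepsilon \right\}. \]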

Asymptotic properties of the AMLE are studied in Rubio and Johansen (2013).

See also: Approximate Maximum Likelihood Estimation (AMLE)

R code

The following R code illustrates how to estimate the parameter \(s\) by MLE, MME, and AMLE, using a simulated data set.

Required functions and packages

rm(list=ls())

# Required packages
library(VGAM)
## Loading required package: stats4
## Loading required package: splines
library(modeest)
library(tolerance)


######################################################################################################
######################################################################################################
# AMLE
# https://doi.org/10.1214/13-EJS819
######################################################################################################
######################################################################################################



######################################################################################################
# Input variables
######################################################################################################
#--------------------------------------------------------------------------------------------------------
# dat    :   Vector of observations.
# N      :   ABC sample size to be used for the kernel density estimation
# tol    :   Tolerance for the ABC algorithm
# a1     :   Left-end point of the support of the uniform prior distribution on $s$
# b1     :   Right-end point of the support of the uniform prior distribution on $s$
# Simulations are based on the VGAM parameterisation s.tilde = s - 1 (i.e. s = s.tilde + 1)
#-------------------------------------------------------------------------------------------------------- 

# Metric: distance between the observed and simulated data, based on the statistic mean(log(x))

dist = function(x,y) return( sqrt( ( mean(log(x)) - mean(log(y)) )^2 )  )

# AMLE function

AMLE = function(dat,N,tol,a1,b1){
  pp = rep(0,N)
  j = 1
#  pb = txtProgressBar(min = 1, max = N, initial = 1) # progress bar
  while(j < N+1){
    # Step 1: simulate a proposal from the uniform prior on (a1, b1)
    p = runif(1,a1,b1)
    # Step 2: generate a pseudo data set of the same size as the observed data
    datABC <- rzeta(n = length(dat), shape = p)
    # Step 3: accept the proposal if the distance is within the tolerance
    if(dist(datABC,dat) <= tol) {
      pp[j] = p
      j = j+1
    }
#    setTxtProgressBar(pb,j)
  }
#  close(pb)
  # Kernel density estimate of the ABC posterior sample and its mode (approximate MLE)
  pB = density(pp)
  MLE <- pB$x[which.max(pB$y)]
  out = list( MLE = MLE, ABCSamp = pp)
  return(out)
}


######################################################################################################
######################################################################################################
# Method of Moments
######################################################################################################
######################################################################################################

# Estimating equation: method of moments (uses the data dat0 defined in the simulation section below)
esteq <- Vectorize(function(par) zeta(par - 1)/zeta(par) - mean(dat0)  )


######################################################################################################
######################################################################################################
# Log-likelihood
######################################################################################################
######################################################################################################


# Minus log-likelihood function (uses the data dat0 and sample size ns defined in the simulation section below)
loglik <- Vectorize(function(par) ns*log( zeta(par) ) + par*sum(log(dat0)) )

Data simulation

######################################################################################################
######################################################################################################
# Simulated data
######################################################################################################
######################################################################################################


# Data simulation
set.seed(1234)
ns <- 1000
st <- 3 # VGAM shape parameter s.tilde = s - 1, so the true value is s = 4
dat0 <- rzeta(n = ns, shape = st)
table(dat0)
## dat0
##   1   2   3   4   6   8 
## 903  71  20   2   3   1

MLE

#--------------------------------------------------------------------------------------------
# MLE
#--------------------------------------------------------------------------------------------

# Direct implementation
OPTMLE <- optimize(loglik, interval = c(2,6))

curve(loglik, 2,6, xlab = "s", ylab = "Negative log-likelihood", lwd = 2,
      cex.axis = 1.5, cex.lab = 1.5)

s_MLE <- OPTMLE$minimum 

s_MLE # MLE in the original parameterisation
## [1] 3.725665
# Using the R package 'tolerance'
out.zeta <- zm.ll(dat0, N = Inf, dist = "Zeta")

s_MLE2 <- as.numeric(coef(out.zeta))

s_MLE2 # MLE in the original parameterisation
## [1] 3.736972

MME

#--------------------------------------------------------------------------------------------
# MME
# Applicable only for s > 2
#--------------------------------------------------------------------------------------------

# Estimating equation
curve(esteq, 2,5, xlab = "s", ylab = "Estimating equation", lwd = 2,
      cex.axis = 1.5, cex.lab = 1.5)

# Estimator
opt <- uniroot(esteq, c(2,6))

# Method of Moments estimator
s_MME <- opt$root 

s_MME # MME in the original parameterisation
## [1] 3.781259

AMLE

#--------------------------------------------------------------------------------------------
# AMLE
#--------------------------------------------------------------------------------------------



# AMLE of the simulated data (a1 and b1 are on the VGAM shape scale, s.tilde = s - 1)

ABC = AMLE(dat = dat0, N = 1000, tol = 0.01, a1 = 2, b1 = 5)

# AMLE
s_AMLE <- ABC$MLE + 1

s_AMLE # AMLE in the original parameterisation
## [1] 3.751683
# ABC posterior sample

hist(ABC$ABCSamp+1, breaks = 25, probability = TRUE,
     xlab = "s", ylab = "Density", main = "ABC posterior", 
      cex.axis = 1.5, cex.lab = 1.5)
points(density(ABC$ABCSamp+1), type = "l", lwd = 2, col = "blue")
box()

# Alternative methods for finding the mode (AMLE)
asselin(ABC$ABCSamp + 1)
## [1] 3.738178
parzen(ABC$ABCSamp + 1)
## [1] 3.752353

Comparison

#------------------------------------------------------------------------------------
# Comparison
#------------------------------------------------------------------------------------
cbind(s_MLE, s_MLE2, s_MME, s_AMLE)
##         s_MLE   s_MLE2    s_MME   s_AMLE
## [1,] 3.725665 3.736972 3.781259 3.751683
Rubio, F. J., and A. M. Johansen. 2013. “A Simple Approach to Maximum Intractable Likelihood Estimation.” Electronic Journal of Statistics 7: 1632–54.