Pareto distribution

Introduction

R provides functions that return useful characteristics of many common probability distributions. The naming convention for these functions is a prefix, which identifies what the function does, followed by an abbreviation of the probability distribution’s name. These prefixes are:

  • p for “probability”, the cumulative distribution function (CDF)
  • q for “quantile”, the inverse CDF
  • d for “density”, the density function (PDF)
  • r for “random”, a random variable having the specified distribution.

For the normal distribution, these functions are pnorm, qnorm, dnorm, and rnorm, where the norm portion reminds us this is for the normal distribution. For the binomial distribution, these functions are pbinom, qbinom, dbinom, and rbinom. Click here for a list of probability distributions in base R. By “base R” it means not part of a package and immediately available once R is open.

The Pareto distribution is not available in base R, so we’re going to code it ourselves. For this in-class assignment, we’ll just code the quantile function, i.e., qpareto. Here’s a bit of background on deriving the Pareto’s quantile function (you don’t need to understand the subsequent PDF or CDF details, but pay close attention to the definition of quantile function \(Q(p)\)).

The Pareto family of distributions is parameterized by \(\alpha\) and \(x_0\) and has probability density function \[ f(x) = \begin{cases} \frac{(\alpha - 1)x_0^{\alpha - 1}}{x^{\alpha}}, &x > x_0,\\ 0, &x \leq x_0. \end{cases} \]

From the PDF it is relatively easy to compute the CDF, which is given by \[ F(x) = \begin{cases} 0 & x < x_0\\ 1 - \left(\frac{x_0}{x} \right)^{\alpha - 1} & x \geq x_0. \end{cases} \]

The quantile function is defined for \(0 \le p \le 1\), and it returns the value \(x_p\) such that \(F(x_p) = p\). For the Pareto distribution, the quantile function is given by \[ Q(p) = Q(p, \alpha, x_0) = {x_0}{(1-p)^{-\frac{1}{\alpha - 1}}}. \]

Using the definition of \(Q(p)\), we can compute the \(p\)th quantile for specific values of \(p\). For example, here are the medians (\(0.5\) quantiles) of Pareto distributions with \(x_0 = 1, \alpha = 3.5\); \(x_0 = 6\times 10^8, \alpha = 2.34\); and the \(0.92\) quantile of the Pareto distribution with \(x_0 = 1\times 10^6, \alpha = 2.5\).

1 * (1 - 0.5) ^ (-1 / (3.5 - 1))
6e8 * (1 - 0.5) ^ (-1 / (2.34 - 1))
1e6 * (1 - 0.92) ^ (-1 / (2.5 - 1))
[1] 1.319508
[1] 1006469227
[1] 5386087

It would be helpful to have a function that automated this process, both so we don’t have to remember the form of the quantile function for the Pareto distribution and so we avoid making mistakes.

We will build our function, qpareto, in a sequence of steps.

Step 1

qpareto.1()

Write a function called qpareto.1() that takes arguments p, alpha, and x0 and returns \(Q(p, \alpha, x_0)\) as defined above. Check to make sure your function returns the same answers as the three below.

qpareto.1(p = 0.5, alpha = 3.5, x0 = 1)
qpareto.1(p = 0.5, alpha = 2.34, x0 = 6e8)
qpareto.1(p = 0.92, alpha = 2.5, x0 = 1e6)
[1] 1.319508
[1] 1006469227
[1] 5386087

Hints

  • Recall R’s function syntax:
function_name <- function(arg_1, arg_2, arg_3) {
  # code that does the work goes here
  return(r.object)
}

Step 2

qpareto.2()

Most of the quantile functions in R have an argument lower.tail that is either TRUE or FALSE. If TRUE, the function returns the \(p\)th quantile. If FALSE, the function returns the \((1-p)\)th quantile, i.e., returns the value \(x_p\) such that \(F(x_p) = 1 - p\).

Create a function qpareto.2() that has an additional argument lower.tail which is by default set to TRUE. Your qpareto.2 function should test whether lower.tail is FALSE. If it is FALSE, the function should replace \(p\) by \(1-p\). Then pass either \(p\) or \(1-p\) to qpareto.1() to compute the appropriate quantile, i.e., qpareto.1() is called from inside of qpareto.2(). Check to make sure your function returns the same answers as the two below.

qpareto.2(p = 0.5, alpha = 3.5, x0 = 1)
qpareto.2(p = 0.08, alpha = 2.5, x0 = 1e6, lower.tail = FALSE)
[1] 1.319508
[1] 5386087

There is a downside to writing the function the way we have. We need qpareto.1() to be in the workspace when qpareto.2() is called, but there is a big advantage. If we discover a better way to calculate quantiles of the Pareto distribution, we can rewrite qpareto.1() and the new version will automatically be used in qpareto.2().

Hints

  • Recall the if statement structure:
if (condition) {
  # code to be run when condition is TRUE
}

Step 3

qpareto()

Next, let’s add some check with regards to the function’s arguments. In the case of the Pareto quantile function, we need \(0\leq p\leq 1\), \(\alpha > 1\), and \(x_0 > 0\). We can use several if statements and stop functions to check arguments and to display error messages. Another option is to use the stopifnot() function. This function evaluates each of the expressions given as arguments. If any are not TRUE, the stop function is called, and a message is printed about the first untrue statement.

Write a function named qpareto() that adds a stopifnot() statement to the qpareto.2() function. The stopifnot() statement should check the validity of the three arguments p, x0, and alpha.

R Markdown will not compile if your R function stops due to the stopifnot() function. You can, and should, tell the offending R code chunks to ignore the stop call, by including error=TRUE as a chunk option.

Check to make sure your function returns the same answers as the five below. Remember to set the chunk option error=TRUE so your document will knit.

qpareto(p = 0.5, alpha = 3.5, x0 = 1)
[1] 1.319508
qpareto(p = 0.08, alpha = 2.5, x0 = 1e6, lower.tail = FALSE)
[1] 5386087
qpareto(p = 1.08, alpha = 2.5, x0 = 1e6, lower.tail = FALSE)
Error in qpareto(p = 1.08, alpha = 2.5, x0 = 1e+06, lower.tail = FALSE): p <= 1 is not TRUE
qpareto(p = 0.5, alpha = 0.5, x0 = -4)
Error in qpareto(p = 0.5, alpha = 0.5, x0 = -4): alpha > 1 is not TRUE
qpareto(p = 0.5, alpha = 2, x0 = -4)
Error in qpareto(p = 0.5, alpha = 2, x0 = -4): x0 > 0 is not TRUE

Hints

Below is an example of a function that uses a stop-if-not command.

ff <- function(p, y, z){
  stopifnot(p > 0, p < 1, y < z)
  return(c(p, y, z))
}
ff(p = 0.5, y = 3, z = 5)
[1] 0.5 3.0 5.0
ff(p = -1, y = 3, z = 5)
Error in ff(p = -1, y = 3, z = 5): p > 0 is not TRUE
ff(p = -1, y = 3, z = 2)
Error in ff(p = -1, y = 3, z = 2): p > 0 is not TRUE
ff(p = 2, y = 3, z = 5)
Error in ff(p = 2, y = 3, z = 5): p < 1 is not TRUE
ff(p = 0.5, y = 3, z = 2)
Error in ff(p = 0.5, y = 3, z = 2): y < z is not TRUE

Function find_cos_integer()

Write a function called find_cos_integer() which takes a value \(x\) between \(-1\) and \(1\) (inclusive) as an input, and it returns the smallest nonnegative integer \(k\) satisfying the inequality \(|\cos(k) - x| < 0.001\). Here are a few examples:

Try to include code which stops and returns a message if the user specifies an argument \(x\) which is either less than \(-1\) or greater than \(1\). Failure to do this may result in an infinite loop.

Below are some examples.

find_cos_integer(0.1)
find_cos_integer(0)
find_cos_integer(1)
[1] 7202
[1] 40459
[1] 0