Order Statistics with Applications to Basketball Data

Nicholas Burke

26 March 2019

Abstract

This graduate project discusses order statistics. Order statistics can provide efficient linear unbiased estimates of parameters such as the mean and standard deviation, so they are used in many fields, including health care, finance and sports. The project includes a historical review of order statistics, some definitions, and some theoretical properties that help elaborate on the topic. It also includes an application to basketball data and a simulation.

Outline

  • What are Order Statistics
  • Density Functions
  • Probability Distributions
  • An Example in Basketball Analytics

How is it used in Basketball?

  • Coaching: rotations, play calling
  • Player ranking: leaderboards in points, rebounds and assists
  • Decision making: allocation of funds on a roster

What are Order Statistics?

Consider a set \(X_1,X_2,...,X_n\) of independent and identically distributed (continuous) random variables.

Let \(X_{(r)}\) denote the \(r^{th}\) smallest of \(X_1,X_2,...,X_n\). The random variables \(X_{(1)},X_{(2)},...,X_{(n)}\) are called the order statistics, and \(X_{(r)}\) is the \(r^{th}\) order statistic, \(r=1,2,...,n\).

For the order statistics we thus have that

\(X_{(1)}\leq X_{(2)}\leq\cdots\leq X_{(n)}\)

Example

Let us consider the set of numbers

\(6, 19, 1, 7, 15.\)

Labeling the observations \(x_i\) in the order they were recorded:

\(x_1=6, x_2=19, x_3=1, x_4=7, x_5=15\).

Then, the order statistics of this data would be:

\(x_{(1)}=1, x_{(2)}=6,x_{(3)}=7, x_{(4)}=15, x_{(5)}=19\).
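The same ordering can be reproduced in a few lines of Python. This is only a minimal sketch of the example above; the variable names `data` and `order_stats` are illustrative choices, not part of the original material.

```python
# The five observations from the example, in the order they were recorded.
data = [6, 19, 1, 7, 15]

# Sorting in ascending order yields the observed order statistics.
order_stats = sorted(data)

# x_(r) is the r-th smallest value; enumerate from 1 to match the notation.
for r, value in enumerate(order_stats, start=1):
    print(f"x_({r}) = {value}")

# The extremes are the first and last order statistics.
smallest, largest = order_stats[0], order_stats[-1]
```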

Order Statistics

It is of natural interest to find the joint probability distribution of these ordered random variables, and we will begin by finding the marginal probability distributions of the extremes, that is

  • \(X_{(1)}\) is the first order statistic:
\(X_{(1)}=min(X_1,...,X_n)\)

  • \(X_{(n)}\) is the \(n^{th}\) order statistic:
\(X_{(n)}=max(X_1,...,X_n)\)

The probability distribution of \(X_{(1)}\)

From the common distribution function \(F(x)\) it follows that

\(F_{(1)}(x)\)\(=Pr\{X_{(1)} \leq x\}\)

\(=Pr[min(X_1,X_2,...,X_n)\leq x]\)

\(=1-Pr[min(X_1,X_2,...,X_n)> x]\)

\(=1-(Pr(X_1> x,X_2> x,...,X_n> x))\)

\(=1-(Pr(X_1> x)Pr(X_2> x)\cdots Pr(X_n> x))\)

\(=1-(1-F(x))^n\)

The probability distribution of \(X_{(n)}\)

From the common distribution function \(F(x)\) it follows that

\(F_{(n)}(x)\)\(=Pr\{X_{(n)} \leq x\}\)

\(=Pr(max(X_1,X_2,...,X_n)\leq x)\)

\(=Pr(X_1\leq x,X_2\leq x,...,X_n\leq x)\)

\(=Pr(X_1\leq x)Pr(X_2\leq x)\cdots Pr(X_n\leq x)\)

\(=[Pr(X_1\leq x)]^n\)

\(=F^n(x)\)
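The two extreme-value formulas, \(1-(1-F(x))^n\) for the minimum and \(F^n(x)\) for the maximum, can be checked with a small Monte Carlo experiment. The sketch below assumes a Uniform(0, 1) parent distribution, so that \(F(x)=x\) on the unit interval; the sample size, evaluation point, trial count and seed are arbitrary illustrative choices.

```python
import random

def min_cdf(F_x: float, n: int) -> float:
    """CDF of the sample minimum at x, given F_x = F(x): 1 - (1 - F(x))^n."""
    return 1.0 - (1.0 - F_x) ** n

def max_cdf(F_x: float, n: int) -> float:
    """CDF of the sample maximum at x, given F_x = F(x): F(x)^n."""
    return F_x ** n

def simulate_extreme_cdfs(x: float, n: int, trials: int = 100_000, seed: int = 0):
    """Estimate Pr(min <= x) and Pr(max <= x) for Uniform(0, 1) samples of size n."""
    rng = random.Random(seed)
    min_hits = 0
    max_hits = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        min_hits += min(sample) <= x
        max_hits += max(sample) <= x
    return min_hits / trials, max_hits / trials

# Illustrative values (assumptions): n = 5 draws, evaluated at x = 0.5.
n, x = 5, 0.5
sim_min, sim_max = simulate_extreme_cdfs(x, n)

# For Uniform(0, 1), F(x) = x, so x can be passed directly as F(x).
print(f"min: simulated {sim_min:.4f}, formula {min_cdf(x, n):.4f}")
print(f"max: simulated {sim_max:.4f}, formula {max_cdf(x, n):.4f}")
```

The simulated frequencies should agree with the formulas up to Monte Carlo error, which shrinks as the number of trials grows.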

The probability distribution of \(X_{(r)}\)

We now generalize and look at the marginal probability distribution of \(X_{(r)}\), i.e. the \(r^{th}\) order statistic, \(r=1,2,...,n\).

Consider a set \(X_1,X_2,...,X_n\) of independent and identically distributed (continuous) random variables with density function \(f(x)\) and distribution function \(F(x)\). For \(r=1,2,...,n\) the density of \(X_{(r)}\) is given by

\(f_{(r)}(x)=\frac{n!}{(r-1)!(n-r)!}F^{r-1}(x)(1-F(x))^{n-r}f(x)\)

Thus,

\(F_{(r)}(x)=Pr(X_{(r)}\leq x)=\int_{-\infty}^{x}f_{(r)}(s)ds\)
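As a rough numerical check, the density of \(X_{(r)}\) can be coded directly and integrated up to \(x\) to recover the distribution function. The sketch below assumes a Uniform(0, 1) parent and uses a simple midpoint rule; the function names are hypothetical. It also compares the result against the standard binomial-sum form \(F_{(r)}(x)=\sum_{j=r}^{n}\binom{n}{j}F^j(x)(1-F(x))^{n-j}\), which is not stated in these slides but follows from counting how many of the \(n\) observations fall at or below \(x\).

```python
from math import comb, factorial

def order_stat_pdf(x, r, n, F, f):
    """Density of the r-th order statistic of n i.i.d. draws with CDF F and pdf f."""
    coeff = factorial(n) / (factorial(r - 1) * factorial(n - r))
    return coeff * F(x) ** (r - 1) * (1.0 - F(x)) ** (n - r) * f(x)

def order_stat_cdf_numeric(x, r, n, F, f, lower, steps=10_000):
    """Midpoint-rule integral of the density from `lower` (start of the support) to x."""
    h = (x - lower) / steps
    return h * sum(
        order_stat_pdf(lower + (i + 0.5) * h, r, n, F, f) for i in range(steps)
    )

def order_stat_cdf_binomial(x, r, n, F):
    """Pr(at least r of the n observations are <= x)."""
    p = F(x)
    return sum(comb(n, j) * p ** j * (1.0 - p) ** (n - j) for j in range(r, n + 1))

# Assumed parent distribution for the check: Uniform(0, 1).
def F(x):
    return x

def f(x):
    return 1.0

# Illustrative values (assumptions): second smallest of five draws, at x = 0.4.
r, n, x = 2, 5, 0.4
cdf_numeric = order_stat_cdf_numeric(x, r, n, F, f, lower=0.0)
cdf_binomial = order_stat_cdf_binomial(x, r, n, F)
total_mass = order_stat_cdf_numeric(1.0, r, n, F, f, lower=0.0)
print(cdf_numeric, cdf_binomial, total_mass)
```

The two distribution-function values should agree closely, and the density should integrate to approximately 1 over the whole support.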

The joint distribution of \(X_{(1)}\) and \(X_{(n)}\)

Consider a set \(X_1,X_2,...,X_n\) of independent and identically distributed (continuous) random variables with density function \(f(x)\) and distribution function \(F(x)\). The joint density of the extremes is given by

\(f_{1,n}(x,y)=n(n-1)(F(y)-F(x))^{n-2}f(x)f(y), x\leq y\)

The probability distribution of the Range

  • The sample range is the difference between the maximum and minimum values, i.e.
\(R_n=Range(X_1,...,X_n)=X_{(n)}-X_{(1)}\)

The density of \(R_n\) is then given by

\(f_{R_n}(r)=n(n-1)\int_{-\infty}^{\infty} (F(r+u)-F(u))^{n-2}f(u)f(r+u)du\)

for \(r\geq 0\)
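To see the range formula in action, the integral can be evaluated numerically for a concrete parent distribution. The sketch below assumes Uniform(0, 1), for which the integrand is constant on \(0<u<1-r\) and the density reduces to \(n(n-1)r^{n-2}(1-r)\) for \(0\leq r\leq 1\). The helper names and the midpoint-rule integration are illustrative choices.

```python
def unif_cdf(x: float) -> float:
    """CDF of Uniform(0, 1), clipped outside the unit interval."""
    return min(max(x, 0.0), 1.0)

def unif_pdf(x: float) -> float:
    """Density of Uniform(0, 1)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def range_pdf(r, n, F, f, lo, hi, steps=20_000):
    """Midpoint-rule evaluation of n(n-1) * integral of (F(r+u)-F(u))^(n-2) f(u) f(r+u) du.

    The integral is taken over [lo, hi], which should cover the support of f.
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        u = lo + (i + 0.5) * h
        total += (F(r + u) - F(u)) ** (n - 2) * f(u) * f(r + u)
    return n * (n - 1) * h * total

def range_pdf_uniform_closed_form(r: float, n: int) -> float:
    """Closed form for the Uniform(0, 1) case: n(n-1) r^(n-2) (1 - r)."""
    return n * (n - 1) * r ** (n - 2) * (1.0 - r)

# Illustrative values (assumptions): n = 5 draws, range value r = 0.5.
n, r = 5, 0.5
numeric = range_pdf(r, n, unif_cdf, unif_pdf, lo=0.0, hi=1.0)
closed = range_pdf_uniform_closed_form(r, n)
print(numeric, closed)
```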

The joint distribution of the order statistics

Consider a set \(X_1,X_2,...,X_n\) of independent and identically distributed (continuous) random variables with density function \(f(x)\) and distribution function \(F(x)\). The joint density of the order statistics is given by

\(f_{X_{(1)},...,X_{(n)}}(y_1,...,y_n) =n!f(y_1)f(y_2)\cdots f(y_n)\)

for \(-\infty<y_1< y_2<\cdots< y_n<\infty\)

The joint distribution of \(X_{(r)}\) and \(X_{(s)}\)

Based on the previous assumptions, the joint density of \(X_{(r)}\) and \(X_{(s)}\) \((1\leq r < s \leq n)\) is given by

\(f_{rs}(x,y)=C_{rs}F^{r-1}(x)f(x)[F(y)-F(x)]^{s-r-1}f(y)[1-F(y)]^{n-s}, x\leq y\)

where
\(C_{rs}=\frac{n!}{(r-1)!(s-r-1)!(n-s)!}\)
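One consistency check on this formula: setting \(r=1\) and \(s=n\) gives \(C_{1n}=n(n-1)\) and should recover the joint density of the extremes, \(n(n-1)(F(y)-F(x))^{n-2}f(x)f(y)\). The sketch below confirms this numerically, assuming a Uniform(0, 10) parent; the function names and the evaluation point are hypothetical.

```python
from math import factorial

def joint_pdf_rs(x, y, r, s, n, F, f):
    """Joint density of the r-th and s-th order statistics (r < s), zero when x > y."""
    if x > y:
        return 0.0
    c_rs = factorial(n) / (factorial(r - 1) * factorial(s - r - 1) * factorial(n - s))
    return (
        c_rs
        * F(x) ** (r - 1) * f(x)
        * (F(y) - F(x)) ** (s - r - 1) * f(y)
        * (1.0 - F(y)) ** (n - s)
    )

def joint_pdf_extremes(x, y, n, F, f):
    """Joint density of the minimum and the maximum, zero when x > y."""
    if x > y:
        return 0.0
    return n * (n - 1) * (F(y) - F(x)) ** (n - 2) * f(x) * f(y)

# Assumed parent distribution for the check: Uniform(0, 10).
def F(x):
    return x / 10.0

def f(x):
    return 0.1

# Illustrative values (assumptions): n = 5, evaluated at (x, y) = (2, 6).
n = 5
general = joint_pdf_rs(2.0, 6.0, r=1, s=n, n=n, F=F, f=f)
extremes = joint_pdf_extremes(2.0, 6.0, n, F, f)
print(general, extremes)
```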

Formula for \(E(X_{(r)})\)

  • For the continuous case, we have
\(E_r=E(X_{(r)})=\int_{-\infty}^{\infty}xf_{(r)}(x)dx\)

  • An expanded formula for \(E_r\):
\(E_r=\frac{n!}{(r-1)!(n-r)!}\int_{-\infty}^{\infty}xF^{r-1}(x)f(x)[1-F(x)]^{n-r}dx\)
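The expanded formula lends itself to direct numerical integration. The sketch below assumes a Uniform(0, 1) parent, for which \(X_{(r)}\) follows a Beta\((r, n-r+1)\) distribution with mean \(r/(n+1)\), so the numerical result can be compared against a known value. The function name and the midpoint rule are illustrative choices.

```python
from math import factorial

def expected_order_stat(r, n, F, f, lo, hi, steps=100_000):
    """Midpoint-rule evaluation of E(X_(r)) over the support [lo, hi]."""
    coeff = factorial(n) / (factorial(r - 1) * factorial(n - r))
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += x * F(x) ** (r - 1) * f(x) * (1.0 - F(x)) ** (n - r)
    return coeff * h * total

# Assumed parent distribution for the check: Uniform(0, 1).
def F(x):
    return x

def f(x):
    return 1.0

# Expected value of each order statistic in a sample of size 5.
n = 5
expectations = [expected_order_stat(r, n, F, f, 0.0, 1.0) for r in range(1, n + 1)]
for r, value in enumerate(expectations, start=1):
    print(f"E(X_({r})) = {value:.4f}, r/(n+1) = {r / (n + 1):.4f}")
```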

Example: Free Throws

A basketball league has tracked the free throw attempts of each player over their career. A player can shoot thousands of free throws over this time. Players are independent of one another, and each player's career attempts, measured in thousands, are modeled by a continuous uniform distribution on \((0,10)\).

Five players are selected at random.

What is the probability that the minimum number of free throw attempts is between 2000 and 6000?

What is the expected value of the maximum number of free throw attempts?

Example: Free Throws

Distribution of \(X_{(1)}\)

The common cdf is \(F(x)=\frac{x}{10}\) where \(0<x<10\), with \(x\) in thousands of attempts

\(F_{X_{(1)}}(x)\)\(=Pr(X_{(1)} \leq x)\)

\(=1-(1-F(x))^5\)

\(=1-(1-\frac{x}{10})^5\), \(0<x<10\)

Example: Free Throws

\(P(X_{(1)}\text{ is between 2 and 6})=P(2<X_{(1)}<6)\)

\(=P(X_{(1)} \leq 6)-P(X_{(1)} \leq 2)\)

\(=[1-(1-\frac{6}{10})^5]-[1-(1-\frac{2}{10})^5]\)

\(=(1-\frac{2}{10})^5-(1-\frac{6}{10})^5\)

\(=0.8^5-0.4^5\)

\(\approx 0.317\)

There is about a 32% probability that the minimum number of free throw attempts will be between 2000 and 6000.
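The result can be checked by simulation. The sketch below draws five independent Uniform(0, 10) values (thousands of attempts) many times and records how often the smallest lies between 2 and 6. The trial count and seed are arbitrary assumptions, and the function name is hypothetical.

```python
import random

def simulate_min_between(low, high, n_players=5, trials=200_000, seed=1):
    """Estimate Pr(low < min < high) for n_players i.i.d. Uniform(0, 10) draws."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        smallest = min(rng.uniform(0.0, 10.0) for _ in range(n_players))
        hits += low < smallest < high
    return hits / trials

# Value from the derivation: (1 - 2/10)^5 - (1 - 6/10)^5.
exact = 0.8 ** 5 - 0.4 ** 5
estimate = simulate_min_between(2.0, 6.0)
print(f"exact {exact:.5f}, simulated {estimate:.5f}")
```

The simulated frequency should land close to the exact value, with the gap reflecting Monte Carlo error only.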

Example: Free Throws

\(F_{X_{(5)}}(x)\)=\(P(X_{(5)} \leq x )\)

\(=F^5(x)\)

\(=(\frac{x}{10})^5\)

\(f_{X_{(5)}}(x)\)\(=5(\frac{x}{10})^4\frac{1}{10}\)

\(=\frac{x^4}{2\cdot 10^4}\), \(0<x<10\)

\(E(X_{(5)})=\)\(\int_0^{10}x\cdot\frac{x^4}{2\cdot 10^4}dx\)

\(=\frac{10^6}{12\cdot 10^4}\)

\(\approx 8.33\)

The expected maximum number of free throw attempts is about 8,333.
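A short simulation gives an independent check on this expectation. The sketch below averages the largest of five Uniform(0, 10) draws (thousands of attempts) over many trials and compares it with \(\frac{n}{n+1}\cdot 10=\frac{50}{6}\), the value the integral of \(x f_{X_{(5)}}(x)\) over \((0,10)\) evaluates to. The trial count, seed and function name are illustrative assumptions.

```python
import random

def simulate_expected_max(n_players=5, trials=200_000, seed=2):
    """Average of the maximum of n_players i.i.d. Uniform(0, 10) draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.uniform(0.0, 10.0) for _ in range(n_players))
    return total / trials

# Exact value of the integral of x * x^4 / (2 * 10^4) over (0, 10).
exact = 10.0 * 5 / 6
estimate = simulate_expected_max()
print(f"exact {exact:.4f}, simulated {estimate:.4f}")
```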

References

  • Applications of Order Statistics in Health Data by Bernard G. Greenberg and Ahmed E. Sarhan
  • Computational Order Statistics by Colin Rose and Murray D. Smith
  • Order Statistics by H. A. David