Example

To estimate the weight of women graduates (assumed normally distributed), we have taken a random sample (weights in pounds):

125 132 120 137 123 159 113 165 143 176

Q: What is the population mean weight of women graduates?

Thinking

Our intuition tells us to take the average of the sample to estimate the population mean. And this is correct.

But why does this work? Because the sample mean is the best guess for the population mean; in other words, the sample mean maximizes the probability (or “likelihood”) of observing the sample we actually drew. This is the basic idea of the “method of maximum likelihood”. To apply it, we need to prove this mathematically.

Why do we need the “method of moments” then? Because the maximum likelihood estimate is sometimes too difficult to compute.
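To make the “maximizes the likelihood” claim concrete, here is a minimal sketch in plain Python (standard library only; the data are the ten sample weights above) that evaluates the normal log-likelihood at the sample mean and at a few nearby candidate means. The sample mean should score highest.

```python
import math

weights = [125, 132, 120, 137, 123, 159, 113, 165, 143, 176]
n = len(weights)
xbar = sum(weights) / n
# Fix the variance at its sample value so we can compare candidate means alone.
var = sum((x - xbar) ** 2 for x in weights) / n

def log_likelihood(mu, sigma2, data):
    """Normal log-likelihood: sum of log f(x_i; mu, sigma2)."""
    return sum(-0.5 * math.log(2 * math.pi * sigma2)
               - (x - mu) ** 2 / (2 * sigma2) for x in data)

for mu in (xbar - 10, xbar - 5, xbar, xbar + 5, xbar + 10):
    print(f"mu = {mu:6.1f}   log L = {log_likelihood(mu, var, weights):.3f}")
# The largest value appears at mu = xbar (139.3 for this sample).
```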


1. Method of Maximum Likelihood

1.1 Steps

1. Write down the density function f(xi; θ) = …

2. Build the joint density (likelihood) function L(θ) = f(x1; θ)f(x2; θ)…f(xn; θ)

3. Set the derivative of L(θ) to 0 and solve for θ (a numerical sketch follows this list)
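When step 3 has no closed-form solution, the same recipe can be carried out numerically. A minimal sketch, assuming NumPy and SciPy are available (the weights and the normal model come from this example); in practice one minimizes the negative log-likelihood rather than solving the derivative equation by hand:

```python
import numpy as np
from scipy.optimize import minimize

weights = np.array([125, 132, 120, 137, 123, 159, 113, 165, 143, 176])

def neg_log_likelihood(params, data):
    """Negative log L(theta1, theta2) for the normal model."""
    mu, var = params
    if var <= 0:  # keep the optimizer inside the valid parameter region
        return np.inf
    return (0.5 * len(data) * np.log(2 * np.pi * var)
            + np.sum((data - mu) ** 2) / (2 * var))

# Step 3, done numerically: minimize -log L instead of setting the derivative to 0.
result = minimize(neg_log_likelihood, x0=[100.0, 100.0],
                  args=(weights,), method="Nelder-Mead")
mu_hat, var_hat = result.x
print(mu_hat, var_hat)  # should approach 139.3 and 400.21, the closed-form answers
```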

1.2 Applied to this example

1. Build the probability density function of Xi

\(\begin{equation*} f(x_i;\theta_1,\theta_2)=\dfrac{1}{\sqrt{\theta_2}\sqrt{2\pi}}\text{exp}\left[-\dfrac{(x_i-\theta_1)^2}{2\theta_2}\right] \end{equation*}\)

2. Build the joint probability density (likelihood) function of X1, X2, …, Xn

\(\begin{equation*} L(\theta_1,\theta_2)=\prod\limits_{i=1}^n f(x_i;\theta_1,\theta_2)=\theta^{-n/2}_2(2\pi)^{-n/2}\text{exp}\left[-\dfrac{1}{2\theta_2}\sum\limits_{i=1}^n(x_i-\theta_1)^2\right] \end{equation*}\)

3. Differentiate

First take the log:

\(\begin{equation*} \text{log} L(\theta_1,\theta_2)=-\dfrac{n}{2}\text{log}\theta_2-\dfrac{n}{2}\text{log}(2\pi)-\dfrac{\sum(x_i-\theta_1)^2}{2\theta_2} \end{equation*}\)

Then take the partial derivatives with respect to θ1 and θ2 separately.

Setting the partial derivative with respect to θ1 to zero gives \(\begin{equation*} \sum x_i-n\theta_1=0 \end{equation*}\)

So:

\[\begin{equation*} \hat{\theta}_1=\hat{\mu}=\dfrac{\sum x_i}{n}=\bar{x} \end{equation*}\]

Setting the partial derivative with respect to θ2 to zero gives \(\begin{equation*} -n\theta_2+\sum(x_i-\theta_1)^2=0 \end{equation*}\)

So:

\[\begin{equation*} \hat{\theta}_2=\hat{\sigma}^2=\dfrac{\sum(x_i-\bar{x})^2}{n} \end{equation*}\]
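Plugging the ten sample weights into these closed-form estimators takes a few lines of plain Python (standard library only):

```python
weights = [125, 132, 120, 137, 123, 159, 113, 165, 143, 176]
n = len(weights)

mu_hat = sum(weights) / n                              # theta1_hat = x-bar
var_hat = sum((x - mu_hat) ** 2 for x in weights) / n  # theta2_hat: divide by n

print(mu_hat)   # 139.3
print(var_hat)  # 400.21
```

Note the divisor is n, not n − 1: the maximum likelihood estimator of σ² is biased downward, which is why the usual sample variance divides by n − 1 instead.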

2. Method of Moments

2.1 Basic idea: solve a system of equations

Treat the parameters as unknowns and build equations to solve for them: one parameter needs one equation, two parameters need two equations, and so on.

2.1.1 Building the equations, method one:

The first equation sets \(\begin{equation*} E(X) \end{equation*}\) equal to the first sample moment, written \(\begin{equation*} M_1=E(X)=\mu=\dfrac{1}{n}\sum\limits_{i=1}^n X_i \end{equation*}\)

The second equation sets \(\begin{equation*} E(X^2) \end{equation*}\) equal to the second sample moment, written \(\begin{equation*} M_2=E(X^2)=\sigma^2+\mu^2=\dfrac{1}{n}\sum\limits_{i=1}^n X_i^2 \end{equation*}\)

And so on:

The k-th equation uses \(\begin{equation*} E[X^k] \end{equation*}\), written \(\begin{equation*} M_k \end{equation*}\)

2.1.2 Building the equations, method two:

The first equation uses \(\begin{equation*} E(X) \end{equation*}\), written \(\begin{equation*} M_1=\dfrac{1}{n}\sum\limits_{i=1}^n X_i=\bar{X} \end{equation*}\)

The second equation uses \(\begin{equation*} E[(X-\mu)^2] \end{equation*}\) (the second central moment), written \(\begin{equation*} M_2^\ast=\dfrac{1}{n}\sum\limits_{i=1}^n (X_i-\bar{X})^2 \end{equation*}\)

And so on:

The k-th equation uses \(\begin{equation*} E[(X-\mu)^k] \end{equation*}\), written \(\begin{equation*} M^\ast_k \end{equation*}\)
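Both kinds of sample moments are straightforward to compute; a minimal sketch in plain Python, using the sample weights from this example:

```python
weights = [125, 132, 120, 137, 123, 159, 113, 165, 143, 176]
n = len(weights)

# Method one: raw sample moments M_k = (1/n) * sum(x_i ** k)
M1 = sum(x for x in weights) / n
M2 = sum(x ** 2 for x in weights) / n

# Method two: central sample moments M*_k = (1/n) * sum((x_i - x_bar) ** k)
xbar = M1
M2_star = sum((x - xbar) ** 2 for x in weights) / n

print(M1, M2, M2_star)  # note that M2 - M1**2 equals M2_star (used in 2.2.1)
```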

2.2 Applied to this example

There are two parameters to estimate, so two equations suffice:

2.2.1 Method one:

The first equation is \(\begin{equation*} E(X)=\mu=\dfrac{1}{n}\sum\limits_{i=1}^n X_i \end{equation*}\)

Solving it gives the method of moments estimate of μ: \(\begin{equation*} \hat{\mu}_{MM}=\dfrac{1}{n}\sum\limits_{i=1}^n X_i=\bar{X} \end{equation*}\)

The second equation is \(\begin{equation*} E(X^2)=\sigma^2+\mu^2=\dfrac{1}{n}\sum\limits_{i=1}^n X_i^2 \end{equation*}\)

Use it to solve for the second parameter: \(\begin{equation*} \hat{\sigma}^2_{MM}=\dfrac{1}{n}\sum\limits_{i=1}^n X_i^2-\mu^2=\dfrac{1}{n}\sum\limits_{i=1}^n X_i^2-\bar{X}^2 \end{equation*}\)

That is, since \(\begin{equation*} \sum(X_i-\bar{X})^2=\sum X_i^2-n\bar{X}^2 \end{equation*}\):

\(\begin{equation*} \hat{\sigma}^2_{MM}=\dfrac{1}{n}\sum\limits_{i=1}^n( X_i-\bar{X})^2 \end{equation*}\)

2.2.2 Method two:

The estimate of μ is obtained exactly as in method one.

The first equation is \(\begin{equation*} E(X)=\mu=\dfrac{1}{n}\sum\limits_{i=1}^n X_i \end{equation*}\)

Solving it gives the method of moments estimate of μ: \(\begin{equation*} \hat{\mu}_{MM}=\dfrac{1}{n}\sum\limits_{i=1}^n X_i=\bar{X} \end{equation*}\)

The second equation is \(\begin{equation*} \text{Var}(X)=E[(X-\mu)^2]=\sigma^2=\dfrac{1}{n}\sum\limits_{i=1}^n (X_i-\bar{X})^2 \end{equation*}\)

Solving it gives the method of moments estimate of σ²: \(\begin{equation*} \hat{\sigma}^2_{MM}=\dfrac{1}{n}\sum\limits_{i=1}^n (X_i-\bar{X})^2 \end{equation*}\)
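For the normal model both methods give the sample mean and the divide-by-n variance, which Python's standard library computes directly (statistics.pvariance is the population variance, i.e., it divides by n):

```python
import statistics

weights = [125, 132, 120, 137, 123, 159, 113, 165, 143, 176]

mu_mm = statistics.mean(weights)        # both methods: the sample mean
var_mm = statistics.pvariance(weights)  # divides by n, exactly sigma^2_MM

print(mu_mm, var_mm)  # 139.3 400.21, identical to the MLE in section 1.2
```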

3. Sufficient Statistics

Meaning: knowing the value of A is equivalent to knowing the value of B, and hence A is also sufficient for B,

provided A is a single-valued function of B with a single-valued inverse (i.e., the map between A and B is one-to-one).

or

If the conditional distribution of the sample given the statistic Y does not depend on the parameter p, then Y is a sufficient statistic for p.

3.1 Factorization Theorem

Suppose the joint density f(x1, x2, …, xn; θ) depends on θ. If this function can be factored into two parts as below, where \(\begin{equation*} \phi \end{equation*}\) depends on the data only through \(\begin{equation*} u(x_1, x_2, ... , x_n) \end{equation*}\) and \(\begin{equation*} h \end{equation*}\) does not involve θ, then the statistic \(\begin{equation*} u(X_1, X_2, ... , X_n) \end{equation*}\) is sufficient for θ: \(\begin{equation*} f(x_1, x_2, ... , x_n;\theta) = \phi [ u(x_1, x_2, ... , x_n);\theta ]\, h(x_1, x_2, ... , x_n) \end{equation*}\)
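As a worked check (my own illustration, not from the original notes: take the normal model from this example and treat \(\begin{equation*} \theta_2=\sigma^2 \end{equation*}\) as known, so θ1 is the only parameter), the joint density factors in exactly this shape:

\(\begin{equation*} f(x_1, ... , x_n;\theta_1)=\underbrace{\text{exp}\left[\dfrac{\theta_1\sum x_i}{\sigma^2}-\dfrac{n\theta_1^2}{2\sigma^2}\right]}_{\phi[u;\,\theta_1],\ u=\sum x_i}\ \underbrace{(2\pi\sigma^2)^{-n/2}\,\text{exp}\left[-\dfrac{\sum x_i^2}{2\sigma^2}\right]}_{h(x_1, ... , x_n)} \end{equation*}\)

so \(\begin{equation*} u(X_1, ... , X_n)=\sum X_i \end{equation*}\) (equivalently \(\begin{equation*} \bar{X} \end{equation*}\)) is sufficient for θ1.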

3.2 Exponential Form: the first thing to try

If f(x; θ) can be written in the form \(\begin{equation*} f(x;\theta) =\text{exp}\left[K(x)p(\theta) + S(x) + q(\theta) \right] \end{equation*}\)

then \(\begin{equation*} \sum_{i=1}^{n} K(X_i) \end{equation*}\) is sufficient for θ.
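Applied to this example (again my own check, with \(\begin{equation*} \theta_2=\sigma^2 \end{equation*}\) treated as known so θ1 is the only parameter), the normal density already has this shape:

\(\begin{equation*} f(x;\theta_1)=\text{exp}\left[\underbrace{x}_{K(x)}\cdot\underbrace{\dfrac{\theta_1}{\sigma^2}}_{p(\theta_1)}\underbrace{-\dfrac{x^2}{2\sigma^2}}_{S(x)}\underbrace{-\dfrac{\theta_1^2}{2\sigma^2}-\dfrac{1}{2}\text{log}(2\pi\sigma^2)}_{q(\theta_1)}\right] \end{equation*}\)

Here K(x) = x, so \(\begin{equation*} \sum_{i=1}^{n} K(X_i)=\sum X_i \end{equation*}\) is sufficient for θ1, matching the factorization result above.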

3.3 A is sufficient for B

This is the property stated at the top of this section: if A is a one-to-one function of a sufficient statistic B, then A is sufficient as well. For example, \(\begin{equation*} \bar{X}=\dfrac{1}{n}\sum X_i \end{equation*}\) is sufficient whenever \(\begin{equation*} \sum X_i \end{equation*}\) is.