HW3: Point Estimates in Statistics

2024-01-30

What is a Point Estimate?

A point estimate is a single value of a statistic. Point estimates are:

Derived from a sample of a population
Used to estimate population parameters (mean, variance, etc.)
Found by utilizing a point estimator

Point estimates are useful for deriving “best-guess” insights when only part of a population’s data is available.

Point Estimate vs Point Estimator

A point estimator is the method or formula used to compute a point estimate; the point estimate is the resulting value.

Example: Given this sample:

\[ x_1 = 48.9, \quad x_2 = 47.0, \quad x_3 = 55.2, \quad x_4 = 44.9 \]

We use the sample mean formula to estimate the population mean.

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{1}{4} (x_1 + x_2 + x_3 + x_4) \\= \frac{1}{4} (48.9 + 47.0 + 55.2 + 44.9) = \frac{1}{4} \cdot 196.0 = 49.0 \]

Our point estimator, the formula for sample mean, gives us an answer of 49.0, our point estimate.
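The same calculation can be sketched in R. This is a minimal illustration, not necessarily how the original result was produced; the variable names `x`, `x_bar_manual`, and `x_bar` are assumptions chosen for this example.

```r
# Sample values from the example above
x <- c(48.9, 47.0, 55.2, 44.9)

# The point estimator written out: sum of the values divided by n
x_bar_manual <- sum(x) / length(x)

# Base R's mean() applies the same estimator
x_bar <- mean(x)

# The point estimate of the population mean
print(x_bar)
```

Here `mean()` plays the role of the point estimator, and the number it returns is the point estimate.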

Point Estimates on a Graph

Below is a plot of our sample with the calculated sample mean highlighted in red. This value is exactly the mean of the sample, though it is only an estimate of the population mean. Next, we’ll look at what makes a point estimate more or less accurate relative to the population parameter it estimates.

Changing the Sample Size

Point estimates, as one could imagine, tend to become more accurate when given a larger sample to work with. In the plots below, the left plot uses three sample values and the right plot uses 20.

Notice how the right plot’s sample mean is visually closer to the population mean, shown below as R output.

## [1] 11.08
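The effect of sample size can also be checked by simulation. The sketch below does not use the data behind the plots above; it assumes a hypothetical population of 10,000 normally distributed values (mean 10, standard deviation 3) and a hypothetical helper, `mean_abs_error()`, that averages the absolute error of the sample mean over many repeated samples.

```r
set.seed(42)  # arbitrary seed, only so the run is reproducible

# Hypothetical population (an assumption, not the data plotted above)
population <- rnorm(10000, mean = 10, sd = 3)
pop_mean <- mean(population)

# Average absolute error of the sample mean over many samples of size n
mean_abs_error <- function(n, reps = 2000) {
  estimates <- replicate(reps, mean(sample(population, n)))
  mean(abs(estimates - pop_mean))
}

err_small <- mean_abs_error(3)   # samples of 3 values
err_large <- mean_abs_error(20)  # samples of 20 values

cat("Average error, n = 3: ", err_small, "\n")
cat("Average error, n = 20:", err_large, "\n")
```

Any single sample of 20 can still land farther from the population mean than a lucky sample of 3, which is why the sketch averages over many repetitions rather than comparing one pair of samples.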

Accuracy and Sample Sizes

In general, a larger sample size gives more confidence in the point estimates we generate. Let’s compute two point estimates of the mean, each from a set containing the same extreme outlier: one set of 3 values and one set of 6 values.

Set 1: \[ \frac{1}{3} (24.6 + 22.5 + 98.7) = 48.6 \]

Set 2: \[ \frac{1}{6} (22.3 + 18.7 + 24.9 + 23.2 + 21.3 + 98.7) = 34.85 \]

Set 2, containing more values, is less impacted by the outlier (98.7) than set 1.
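One way to quantify this is to compare each set's mean with and without the outlier. The sketch below uses the two sets from above; `shift_from_outlier()` is a hypothetical helper written for this illustration.

```r
# The two sets from the example, both containing the outlier 98.7
set1 <- c(24.6, 22.5, 98.7)
set2 <- c(22.3, 18.7, 24.9, 23.2, 21.3, 98.7)
outlier <- 98.7

# How far the outlier moves the sample mean:
# mean with the outlier minus mean without it
shift_from_outlier <- function(x) {
  mean(x) - mean(x[x != outlier])
}

cat("Set 1 mean:", mean(set1), " shift:", shift_from_outlier(set1), "\n")
cat("Set 2 mean:", mean(set2), " shift:", shift_from_outlier(set2), "\n")
```

The shift is smaller for set 2 because the outlier's weight in the mean is 1/6 rather than 1/3.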

Other Point Estimates/Estimators

The sample mean point estimator is a great way to estimate the mean of a population, but not the only one.

We could also use:

Sample median \[ \text{Median}(x) = \begin{cases} x_{\left(\frac{n+1}{2}\right)}, & \text{if } n \text{ is odd} \\ \frac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2}, & \text{if } n \text{ is even} \end{cases} \]
Averaging the min and max of the sample (the midrange) \[ \text{Average of min and max} = \frac{\max(x_i) + \min(x_i)}{2} \]
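A minimal R sketch of these alternatives, reusing the four-value sample from the earlier sample mean example. `median()`, `min()`, and `max()` are base R functions; the variable names are assumptions, and the min/max average is computed as the minimum and maximum added together and divided by 2.

```r
# Sample from the earlier sample mean example
x <- c(48.9, 47.0, 55.2, 44.9)

# Sample median: n is even here, so this averages the two middle values
sample_median <- median(x)

# Average of the min and max of the sample
midrange <- (min(x) + max(x)) / 2

cat("Sample mean:   ", mean(x), "\n")
cat("Sample median: ", sample_median, "\n")
cat("Min/max average:", midrange, "\n")
```

The three estimators give different point estimates from the same sample, which is why the choice of estimator matters.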

Recap

Point estimates are a great tool when we don’t have access to a full population’s data. Using statistical formulas as point estimators, we can estimate population parameters and make inferences about how the rest of the population would behave.

Some things to remember:

Larger samples generally give more accurate point estimates
There can be multiple ways to estimate a population parameter; experiment with several
Analyze data for outliers first; these can throw off point estimates drastically