Point Estimation

2025-03-14


Point estimation uses sample data to estimate population parameters such as the mean, variance, and proportion when measuring the entire population would take too long.

By measuring a sample of the population we can quickly get a good estimate of these values.

Selecting a sample

It is important to select a sample that accurately describes the population as a whole.

For example, if we want to estimate average height and pick a sample of only the tallest people, the estimated mean will be much higher than the actual mean.
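The effect of a biased selection can be sketched in R with the built-in mtcars data set, using mpg in place of height. The variable names and the seed below are illustrative assumptions, not part of the original analysis.

```r
# Illustrative sketch: biased vs. random sampling of mtcars$mpg.
set.seed(42)  # arbitrary seed, used only to make the run repeatable

population_mean <- mean(mtcars$mpg)

# Biased sample: the 10 cars with the highest mpg
biased_sample <- sort(mtcars$mpg, decreasing = TRUE)[1:10]

# Simple random sample: 10 cars drawn without replacement
random_sample <- sample(mtcars$mpg, size = 10)

cat("Population mean:   ", population_mean, "\n")
cat("Biased sample mean:", mean(biased_sample), "\n")
cat("Random sample mean:", mean(random_sample), "\n")
```

The biased sample always overestimates the mean here, while the random sample lands above or below it depending on the draw.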

Estimating Population Mean

The mean represents the average of the values in a data set.

Mean is calculated by summing all values of the sample and dividing by the total number of values in the sample.

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]

The sample mean of mpg in the mtcars data set (21.03) is close to the mean of all 32 cars (20.090625), even though the sample contains only 10 cars.
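A minimal sketch of this calculation in R follows. It assumes the 10 cars are drawn as a simple random sample; the seed is arbitrary, so the sample mean it produces will generally differ from the 21.03 reported above.

```r
# Sketch: sample mean vs. mean of the full mtcars data set.
set.seed(1)  # arbitrary seed (assumption), only for repeatability

mpg_sample <- sample(mtcars$mpg, size = 10)  # 10 cars, without replacement

# Formula applied by hand: sum of the values divided by their count
manual_mean <- sum(mpg_sample) / length(mpg_sample)

# Built-in equivalent
mpg_sample_mean <- mean(mpg_sample)

# Mean of all 32 cars
mpg_mean <- mean(mtcars$mpg)

cat("Sample mean:", mpg_sample_mean, "\n")
cat("Actual mean:", mpg_mean, "\n")
```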

Variance

Variance measures the spread between values in a data set. A higher variance means the values are more spread out; a lower variance means they are closer together.

Variance is calculated by finding the mean, subtracting it from each value, squaring the result, summing the squared values, and dividing by the number of values minus 1.
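Written in the same notation as the sample mean, with \(s^2\) denoting the sample variance (a symbol introduced here for illustration), these steps are:

\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]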

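The sample variance of mpg in the mtcars data set (46.6067778) is roughly 28% higher than the variance of all 32 cars (36.3241028). The estimate is in the right range, but less precise than the mean estimate, which is not unusual for a sample of only 10 cars.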
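The calculation can be sketched in R as below, again assuming a simple random sample of 10 cars with an arbitrary seed, so the value will generally differ from the one reported above. Note that `var()` divides by n - 1.

```r
# Sketch: sample variance by hand and with var().
set.seed(1)  # arbitrary seed (assumption), only for repeatability

mpg_sample <- sample(mtcars$mpg, size = 10)
n <- length(mpg_sample)

# Subtract the mean, square, sum, and divide by n - 1
manual_var <- sum((mpg_sample - mean(mpg_sample))^2) / (n - 1)

# Built-in equivalent
mpg_sample_var <- var(mpg_sample)

# Variance of all 32 cars, computed the same way
mpg_var <- var(mtcars$mpg)

cat("Sample variance:", mpg_sample_var, "\n")
cat("Actual variance:", mpg_var, "\n")
```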

Proportion

The proportion is the fraction of values in a data set that have a certain characteristic.

Proportion is calculated by dividing the number of values that have the characteristic by the total number of values.

\[ \hat{p} = \frac{x}{n} \]

The sample proportion of species=setosa in the iris data set (0.3428571) is very close to the actual proportion (0.3333333), even though it was computed from only a sample of the flowers.
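A short R sketch of the proportion estimate is shown below. The sample size of 35 and the seed are assumptions made for illustration, so the result will not necessarily match the value reported above.

```r
# Sketch: sample proportion of setosa flowers in iris.
set.seed(1)  # arbitrary seed (assumption), only for repeatability

n <- 35  # assumed sample size, chosen for illustration
sample_rows <- iris[sample(nrow(iris), size = n), ]

# x: number of sampled flowers with the characteristic
x <- sum(sample_rows$Species == "setosa")

# Sample proportion: x divided by n
p_hat <- x / n

# Proportion in the full data set of 150 flowers
p_actual <- mean(iris$Species == "setosa")

cat("Sample proportion:", p_hat, "\n")
cat("Actual proportion:", p_actual, "\n")
```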

Mean Example

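library(ggplot2)

# Mean of all cars and of 10 sampled cars (assumed to be a simple random sample; varies per run)
mpg_mean <- mean(mtcars$mpg)
mpg_sample_mean <- mean(sample(mtcars$mpg, 10))

ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 3, fill = 'blue', color = 'black') +
  # Constant positions go outside aes() so each line is drawn once
  geom_vline(xintercept = mpg_mean, color = 'red', linewidth = 2) +
  geom_vline(xintercept = mpg_sample_mean, color = 'purple', linewidth = 2) +
  labs(title = 'Sample mean vs actual mean', x = 'mpg', y = 'count') +
  annotate("text", x = mpg_mean - 0.5, y = 7.2, label = "Actual Mean", color = 'red', angle = 90) +
  annotate("text", x = mpg_sample_mean - 0.5, y = 7.2, label = "Sample Mean", color = 'purple', angle = 90)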