2024-02-04

Introduction

  • Let's understand point estimation in statistics.

  • We’ll explore how to estimate unknown parameters from sample data.

  • Point estimators are functions of a random sample that return a single approximate value for a population parameter.

  • The sample size largely determines the precision of the estimate.

    • The larger the sample, the more precise the estimate tends to be: for a sample mean, the standard error shrinks in proportion to \(1/\sqrt{n}\).
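The effect of sample size can be checked with a small simulation. This is only a sketch: the normal population with mean 25 and standard deviation 5, the three sample sizes, and the 2000 repetitions are all assumptions chosen for illustration.

```r
# Hypothetical population: normal with mean 25 and sd 5 (assumed for illustration)
set.seed(42)
sample_sizes <- c(10, 100, 1000)

# For each n, draw 2000 samples and record the spread of their sample means
se_simulated <- sapply(sample_sizes, function(n) {
  sample_means <- replicate(2000, mean(rnorm(n, mean = 25, sd = 5)))
  sd(sample_means)
})

# Theoretical standard error of the sample mean: sigma / sqrt(n)
se_theoretical <- 5 / sqrt(sample_sizes)

print(data.frame(n = sample_sizes, se_simulated, se_theoretical))
```

The simulated spread of the sample means should fall as n grows and track the theoretical value closely.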

Population and Sample

  • Population Parameter: Denoted by \(\theta\) (unknown)
    • Represents a characteristic of the entire population.
    • Typically unknown and needs to be estimated.
  • Sample Statistic: Denoted by \(\hat{\theta}\)
    • An estimate of the population parameter.
    • Obtained from the sample data.

Estimating the Mean

  • The point estimator of the population mean \(\mu\) is the sample mean \(\bar{X}\).

  • \[ \hat{\mu} = \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \]
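As a quick check of the formula, here is a minimal sketch in R. The five values in `x` are made up for illustration.

```r
# Hypothetical sample of five measurements (made-up values)
x <- c(4, 8, 6, 5, 7)
n <- length(x)

# Point estimate of the population mean: sum of the observations divided by n
mu_hat <- sum(x) / n

print(mu_hat)   # same value as the built-in mean(x)
```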

Estimating Variance

  • The point estimator of the population variance \(\sigma^2\) is the sample variance \(S^2\); dividing by \(n-1\) rather than \(n\) makes it unbiased.

  • \[ \hat{\sigma}^2 = S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} \]
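The same formula can be sketched in R. The values in `x` are again made up for illustration; the divide-by-n version is included only for comparison.

```r
# Hypothetical sample of five measurements (made-up values)
x <- c(4, 8, 6, 5, 7)
n <- length(x)

# Sample variance with the n - 1 denominator, as in the formula above
s2 <- sum((x - mean(x))^2) / (n - 1)

# Dividing by n instead gives a smaller value that underestimates on average
s2_divide_by_n <- sum((x - mean(x))^2) / n

print(c(s2 = s2, s2_divide_by_n = s2_divide_by_n))
```

R's built-in `var(x)` uses the \(n-1\) denominator, so it agrees with `s2`.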

Distribution of Sample and its Mean

Confidence Intervals

  • A confidence interval gives a range of plausible values for the parameter, centered on the point estimate.

  • \[ \text{Confidence Interval: } \hat{\theta} \pm \text{Margin of Error} \]

    [Figure: Point Estimate vs. Interval Estimate]
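The slides do not say how the margin of error is computed. One common choice for a mean is a t-based margin, sketched below under the assumption of roughly normal data, a 95% confidence level, and made-up values in `x`.

```r
# Hypothetical sample (made-up values)
x <- c(4, 8, 6, 5, 7)
n <- length(x)

theta_hat <- mean(x)        # point estimate
se <- sd(x) / sqrt(n)       # estimated standard error of the mean

# Assumed margin of error: t critical value (95% level) times the standard error
margin <- qt(0.975, df = n - 1) * se

ci <- c(lower = theta_hat - margin, upper = theta_hat + margin)
print(ci)
```

This matches the interval reported by `t.test(x)$conf.int`.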

Point Estimate of a Population Proportion: Example

Let's estimate the proportion of computer owners in a certain city who use antivirus software. We survey a random sample of 20 citizens.

  • The calculated sample proportion is \(\hat{p} = 0.6\), which serves as the point estimate of the population proportion.
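The arithmetic behind this estimate can be sketched in R. The count of 12 users is an assumption: it is the count that a proportion of 0.6 implies for a sample of 20, but the slides do not state it. The standard error is added to show how rough an estimate from 20 people is.

```r
n <- 20        # sample size from the example
users <- 12    # assumed count of antivirus users (12 / 20 = 0.6)

# Point estimate of the population proportion
p_hat <- users / n

# Estimated standard error of the sample proportion
se <- sqrt(p_hat * (1 - p_hat) / n)

print(c(p_hat = p_hat, se = se))
```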

Problem: Estimating Pollution Level

Let's estimate the average concentration of a harmful pollutant, \(\mu_{\text{pollutant}}\), in the air in order to plan mitigation strategies.

The first step is to collect data from various monitoring stations.

Then we estimate the average concentration of the pollutant from the sample data.

Let's create a 3D scatter plot of the readings that also marks the estimated mean concentration.

Visualization of the Data and Point Estimate

R Code for the Preceding 3D Plot

library(plotly)

# Create a hypothetical dataset of 100 monitoring-station readings
set.seed(123)
monitoring_data <- data.frame(
  Longitude = rnorm(100, mean = 12, sd = 2),
  Latitude = rnorm(100, mean = 34, sd = 1),
  Pollutant_Concentration = rnorm(100, mean = 25, sd = 5)
)

# Point estimate of the population mean concentration (the sample mean)
point_estimate <- mean(monitoring_data$Pollutant_Concentration)

plot_ly(monitoring_data,
        x = ~Longitude,
        y = ~Latitude,
        z = ~Pollutant_Concentration,
        color = ~Pollutant_Concentration,
        type = "scatter3d",
        mode = "markers",
        # One fixed marker size; also mapping size to the data conflicted with
        # it and produced the `line.width` warnings
        marker = list(size = 5,
                      line = list(color = "red", width = 2)),
        text = ~paste("Concentration: ", round(Pollutant_Concentration, 2)),
        showlegend = FALSE) %>%
  # Mark the point estimate at the mean station location; inherit = FALSE keeps
  # the per-station color and text mappings off this single point
  add_trace(x = mean(monitoring_data$Longitude),
            y = mean(monitoring_data$Latitude),
            z = point_estimate,
            type = "scatter3d",
            mode = "markers",
            # scatter3d marker symbols are given by name, not by number
            marker = list(color = "red", size = 7, symbol = "diamond"),
            text = "Point Estimate",
            showlegend = FALSE,
            inherit = FALSE) %>%
  colorbar(title = "Concentration") %>%
  # The plot title belongs to the layout, not to the 3D scene
  layout(title = "3D Scatter Plot - Pollutant Concentration Across the City",
         scene = list(xaxis = list(title = "Longitude"),
                      yaxis = list(title = "Latitude"),
                      zaxis = list(title = "Pollutant Concentration")),
         margin = list(l = 0, r = 0, b = 0, t = 40))
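Because the dataset is simulated, the point estimate can be compared with the value it is meant to recover. A sketch in base R, reusing the seed and parameters from the code above; the true mean of 25 is known only because the data are made up, which would not be the case with real monitoring data.

```r
# Recreate the hypothetical monitoring dataset (same seed and parameters)
set.seed(123)
monitoring_data <- data.frame(
  Longitude = rnorm(100, mean = 12, sd = 2),
  Latitude = rnorm(100, mean = 34, sd = 1),
  Pollutant_Concentration = rnorm(100, mean = 25, sd = 5)
)

conc <- monitoring_data$Pollutant_Concentration

point_estimate <- mean(conc)             # sample mean
se <- sd(conc) / sqrt(length(conc))      # estimated standard error
estimation_error <- point_estimate - 25  # 25 is the simulated true mean

print(c(point_estimate = point_estimate, se = se, error = estimation_error))
```

The estimation error should be of about the same size as the standard error.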

Additional Related Formulas

Thank You!