2023-04-13

Point Estimate with Confidence Interval

In this presentation, we calculate the point estimate and confidence interval for the height of men and women in the “heights” data set from the modelr package. For this scenario, we assume the data set is a random sample of all American men and women. For reference, the first six entries of the data set are displayed below.

income height weight age marital sex education afqt
19000 60 155 53 married female 13 6.841
35000 70 156 51 married female 10 49.444
105000 65 195 52 married male 16 99.393
40000 63 197 54 married female 14 44.022
75000 66 190 49 married male 14 59.683
102000 68 200 49 divorced female 18 98.798

Calculating the Point Estimate of the Population Mean

The point estimate of the population mean is the sample mean, which can be calculated using the equation below. Here \(n\) is the sample size, while \(x_i\) is the \(i\)-th observation in the data set. We use the female entries of the data set to demonstrate this process. The final result shows that the point estimate for female height is approximately 64.3 inches. \[{\tiny \bar{x}=\frac{\sum_{i = 1}^{n} x_i}{n}}\]

\[ {\tiny = \frac{60}{3604} + \frac{70}{3604} + \frac{63}{3604} + \frac{68}{3604} + \text{...}} \]

\[ {\tiny = \frac{231634}{3604}} \]

\[ {\tiny \approx 64.2714} \]
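The same arithmetic can be checked in R on a small scale. The sketch below is only an illustration: it uses the four female heights visible in the preview table (60, 70, 63, 68), so its result differs from the full-sample estimate, and the variable names are assumptions made for this example.

```r
# Female heights visible in the six-row preview (not the full sample)
x <- c(60, 70, 63, 68)
n <- length(x)

# Sample mean computed directly from the formula: sum of observations / n
x_bar <- sum(x) / n
x_bar

# The built-in mean() gives the same value
all.equal(x_bar, mean(x))

# Full-sample point estimate from the totals shown above
full_estimate <- 231634 / 3604
round(full_estimate, 4)
```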

Point Estimate R-Code

However, performing this arithmetic by hand on a data set with thousands of entries is impractical. Instead, we can use R to calculate these values. The R code below first generates two tables filtered by “sex,” then builds a data frame containing the number of rows (sample size) and the mean height (point estimate) of each table. Finally, the gt library renders the results as a readable table.

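library(modelr)  # provides the heights data set
library(dplyr)   # filter() and %>%
library(gt)      # table rendering

# Split the sample by sex
sample_men <- heights %>%
  filter(sex == "male")
sample_women <- heights %>%
  filter(sex == "female")

# Point estimate (mean height) and sample size for each group
total_sample <- data.frame(
  Gender = c("Male", "Female"),
  "Point Estimate" = c(mean(sample_men$height), mean(sample_women$height)),
  "Sample Size" = c(nrow(sample_men), nrow(sample_women)),
  check.names = FALSE  # keep the column names with spaces as written
)
total_sample %>% gt()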
Gender Point Estimate Sample Size
Male 70.10553 3402
Female 64.27137 3604

Point Estimate of Female Height (Histogram)

One way to visualize this data is with a histogram. The point estimate for the height of the females in the sample, marked by the dashed line in the histogram, is roughly 64 inches.
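A histogram of this kind could be produced along the following lines. This is a minimal sketch, not the code behind the original figure: the bin width, colours, labels, and variable names are assumptions, and it requires the ggplot2, dplyr, and modelr packages.

```r
library(ggplot2)
library(dplyr)
library(modelr)  # provides the heights data set

# Female subset and its point estimate (sample mean)
sample_women <- heights %>% filter(sex == "female")
mean_height <- mean(sample_women$height)

# Histogram with a dashed vertical line at the point estimate
# (binwidth and colours are illustrative choices)
p <- ggplot(sample_women, aes(x = height)) +
  geom_histogram(binwidth = 1, fill = "grey70", colour = "white") +
  geom_vline(xintercept = mean_height, linetype = "dashed") +
  labs(x = "Height (inches)", y = "Count",
       title = "Female height with point estimate")
p
```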

Point Estimate of Male and Female Height (Box Plot)

The box plot below compares male and female height measurements side by side. The blue dot on each plot marks the mean height, which was added using ggplot2's stat_summary() function with the mean as the summary statistic.
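The box plot described above might be built roughly as follows. This is a sketch under assumptions: only the use of stat_summary() with the mean and a blue point comes from the text, while the point size and axis labels are illustrative. It requires the ggplot2 and modelr packages.

```r
library(ggplot2)
library(modelr)  # provides the heights data set

# Side-by-side box plots of height by sex, with the mean of each
# group overlaid as a blue point via stat_summary()
p <- ggplot(heights, aes(x = sex, y = height)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point", colour = "blue", size = 3) +
  labs(x = "Sex", y = "Height (inches)")
p
```

The box plot itself draws the median as the centre line, so overlaying the mean makes it possible to compare the two measures of centre for each group.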

Confidence Interval

We may now calculate a 95% confidence interval around the point estimates for both sample groups. A 95% confidence interval is a range constructed so that, if the sampling were repeated many times, about 95% of the intervals built this way would contain the true population mean. In the formula below, \(\bar{x}\) represents our point estimate, \(n\) represents the sample size, \(z\) represents the critical value for our level of confidence (1.96 for 95%), and \(s\) represents the sample standard deviation. The standard deviations were calculated using the base R sd() function, and the \(z\) value was taken from the standard z-table.

\[{\tiny \text{Confidence Interval} = \bar{x}\pm z\frac{s}{\sqrt{n}}}\]

\[{\tiny \text{Male CI} = 70.1\pm 1.96\frac{2.99}{\sqrt{3402}}}\]

\[{\tiny \text{Male CI} \approx (70.0,\ 70.2)}\]

\[{\tiny \text{Female CI} = 64.3\pm 1.96\frac{2.72}{\sqrt{3604}}}\]

\[{\tiny \text{Female CI} \approx (64.2,\ 64.4)}\]

## # A tibble: 2 × 2
##   sex    Standard
##   <fct>     <dbl>
## 1 male       2.99
## 2 female     2.72
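The interval calculation can be reproduced in base R from the summary values reported in this presentation. The helper name ci_mean and its arguments are hypothetical, introduced only for this sketch. It uses qnorm() to obtain the critical value instead of reading 1.96 from a z-table.

```r
# Hypothetical helper: z-based confidence interval for a mean,
# computed from summary statistics (mean, standard deviation, n)
ci_mean <- function(x_bar, s, n, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)  # about 1.96 when level = 0.95
  margin <- z * s / sqrt(n)        # margin of error
  c(lower = x_bar - margin, upper = x_bar + margin)
}

# Summary values taken from the tables in this presentation
male_ci   <- ci_mean(x_bar = 70.10553, s = 2.99, n = 3402)
female_ci <- ci_mean(x_bar = 64.27137, s = 2.72, n = 3604)

round(male_ci, 1)    # lower 70.0, upper 70.2
round(female_ci, 1)  # lower 64.2, upper 64.4
```

Because both samples contain several thousand observations, the margin of error is only about 0.1 inches in each group.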

Confidence Interval (Plotly)

This plot depicts the heights of both men and women along with the 95% confidence interval we calculated. The interval is very narrow, which makes it difficult to see. Zooming into the interactive graphic is recommended for a clearer view of the interval.
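An interactive view of the intervals could be sketched with plotly as shown below. This is not the code behind the original figure: it plots only the two point estimates with error bars built from the summary values reported earlier, and the data frame layout and axis titles are assumptions. It requires the plotly package.

```r
library(plotly)

# Summary values reported in this presentation
ci <- data.frame(
  sex = c("male", "female"),
  mean_height = c(70.10553, 64.27137),
  s = c(2.99, 2.72),
  n = c(3402, 3604)
)

# Margin of error for a 95% z-based interval
ci$margin <- qnorm(0.975) * ci$s / sqrt(ci$n)

# Point estimates with symmetric error bars; zoom in to see the bars
fig <- plot_ly(
  data = ci,
  x = ~sex,
  y = ~mean_height,
  type = "scatter",
  mode = "markers",
  error_y = list(type = "data", array = ci$margin)
) %>%
  layout(xaxis = list(title = "Sex"),
         yaxis = list(title = "Height (inches)"))
fig
```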

Citations

Engel, C. (2019, June 24). Data visualization with R. Data Visualization with R. Retrieved April 11, 2023, from https://cengel.github.io/R-data-viz/

Sievert, C. (2019, December 19). Interactive web-based data visualization with R, plotly, and shiny. Retrieved April 12, 2023, from https://plotly-r.com/

Sullivan, L. (n.d.). Confidence Intervals. Boston University School of Public Health. Retrieved April 12, 2023, from https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_confidence_intervals/bs704_confidence_intervals_print.html