Statistical Tools: Point Estimate and Confidence Interval

2023-04-13

Point Estimate with Confidence Interval

In this presentation, we will calculate the Point Estimate and Confidence Interval for the height of men and women in the sample given by the “heights” data set from the modelr package. For this scenario, we will assume this data set is a sample of all American men and women. For reference, the first six entries of the data set are displayed below.

income	height	weight	age	marital	sex	education	afqt
19000	60	155	53	married	female	13	6.841
35000	70	156	51	married	female	10	49.444
105000	65	195	52	married	male	16	99.393
40000	63	197	54	married	female	14	44.022
75000	66	190	49	married	male	14	59.683
102000	68	200	49	divorced	female	18	98.798

Calculating the Point Estimate of the Population Mean

The point estimate of the population mean is the sample mean, which can be calculated using the equation below. The \(n\) in this equation is the sample size, while \(x_i\) represents the \(i\)-th observation in your data set. We will use the female entries of the data set to demonstrate this process. Our final result shows that the point estimate for female height is approximately 64 inches.

\[{\tiny \bar{x}=\frac{\sum_{i = 1}^{n} x_i}{n}}\]

\[ {\tiny = \frac{60}{3604} + \frac{70}{3604} + \frac{63}{3604} + \frac{68}{3604} + \text{...}} \]

\[ {\tiny = \frac{231634}{3604}} \]

\[ {\tiny \approx 64.2713} \]
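The arithmetic above can be checked on a small scale in base R. The sketch below uses only the four female heights visible in the preview table (60, 70, 63, and 68 inches), so its result is the mean of those four rows, not the full-sample estimate of about 64.27. The variable names are illustrative.

```r
# Heights (in inches) of the four female rows shown in the preview table
preview_female_heights <- c(60, 70, 63, 68)

# Sample size n
n <- length(preview_female_heights)

# Point estimate: sum of the observations divided by n
manual_mean <- sum(preview_female_heights) / n

# The built-in mean() applies the same formula
builtin_mean <- mean(preview_female_heights)

manual_mean
builtin_mean
```

Both values agree, which shows that mean() is simply the formula above applied to every observation.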

Point Estimate R-Code

However, performing such arithmetic by hand on a data collection with thousands of entries is impractical. Instead, we can use R to calculate those values for us. The R-Code below first generates two tables filtered by “sex,” then generates a data frame including the total number of rows (sample size) as well as the mean height (point estimate) of each table. Then, using the gt library, we can generate a readable table of our results.

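library(modelr)  # provides the heights data set
library(dplyr)   # filter() and %>%
library(gt)      # gt() table rendering

# Split the sample by sex
sample_men <- heights %>%
  filter(sex == "male")
sample_women <- heights %>%
  filter(sex == "female")

# Point estimate (mean height) and sample size for each group
total_sample <- data.frame(
  Gender = c("Male", "Female"),
  "Point Estimate" = c(mean(sample_men$height), mean(sample_women$height)),
  "Sample Size" = c(nrow(sample_men), nrow(sample_women)),
  check.names = FALSE  # keep the spaces in the column names
)
total_sample %>% gt()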

Gender	Point Estimate	Sample Size
Male	70.10553	3402
Female	64.27137	3604
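The same summary can also be produced in a single pipeline. The following is a minimal sketch of that alternative, assuming the modelr and dplyr packages are installed; the names summary_by_sex, point_estimate, and sample_size are illustrative and do not appear in the original code.

```r
library(modelr)  # heights data set
library(dplyr)   # group_by(), summarise(), n()

# One row per sex, with the mean height and the number of observations
summary_by_sex <- heights %>%
  group_by(sex) %>%
  summarise(
    point_estimate = mean(height),
    sample_size    = n()
  )

summary_by_sex
```

Grouping avoids creating a separate filtered table for each sex, which helps when a variable has more than two levels.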

Point Estimate of Female Height (Histogram)

One way to visualize this data is with a histogram of our data set. The point estimate for the height of the females in the sample data set, marked by the dashed line in the histogram, is roughly 64 inches.
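A histogram like the one described here could be drawn along the following lines. This is a sketch and not necessarily the code behind the original figure: it assumes the ggplot2 and modelr packages are installed, and the bin width, colours, and labels are arbitrary choices.

```r
library(modelr)   # heights data set
library(ggplot2)

# Keep only the female rows of the sample
female_rows <- heights[heights$sex == "female", ]

# Histogram of height with a dashed vertical line at the sample mean
height_hist <- ggplot(female_rows, aes(x = height)) +
  geom_histogram(binwidth = 1, fill = "grey70", colour = "white") +
  geom_vline(xintercept = mean(female_rows$height), linetype = "dashed") +
  labs(x = "Height (inches)", y = "Count")

height_hist
```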

Point Estimate of Male and Female Height (Box Plot)

A side-by-side comparison of male and female height measurements is shown in the box plot below. The blue “dot” on both plots indicates the mean height, which was identified using the ggplot stat_summary mean function.
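A box plot with the mean overlaid can be sketched as follows, assuming ggplot2 (version 3.3.0 or later, for the fun argument) and modelr are installed. The point size and axis labels are illustrative choices, not taken from the original figure.

```r
library(modelr)   # heights data set
library(ggplot2)

# Box plot of height by sex, with the group mean drawn as a blue point
height_box <- ggplot(heights, aes(x = sex, y = height)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point", colour = "blue", size = 3) +
  labs(x = "Sex", y = "Height (inches)")

height_box
```

The box plot itself shows the median as a horizontal line, so adding the mean with stat_summary() makes it possible to compare the two measures of centre directly.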

Confidence Interval

We may now calculate a 95% Confidence Interval around the point estimate for each sample group. A Confidence Interval is a range of plausible values for the population mean: if the sampling were repeated many times, about 95% of the intervals constructed this way would be expected to contain the true mean. In the formula below, \(\bar{x}\) represents our point estimate, \(n\) represents the sample size, \(s\) represents our sample standard deviation, and \(z\) represents the critical value for our level of confidence (1.96 for 95%). Our standard deviations have been calculated using the base R sd() function, and our \(z\) value has been taken from the standard z-table.

\[\tiny{\text{Confidence Interval} = \bar{x}\pm z\frac{s}{\sqrt{n}}}\]

\[\tiny{\text{Male CI} = 70.1\pm 1.96\frac{2.99}{\sqrt{3402}}}\]

\[\tiny{\text{Male CI} \approx (70.0,\ 70.2)}\]

\[\tiny{\text{Female CI} = 64.3\pm 1.96\frac{2.72}{\sqrt{3604}}}\]

\[\tiny{\text{Female CI} \approx (64.2,\ 64.4)}\]

## # A tibble: 2 × 2
##   sex    Standard
##   <fct>     <dbl>
## 1 male       2.99
## 2 female     2.72
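The interval calculation can be reproduced in base R from the summary values reported in this presentation. In the sketch below, mean_ci is a hypothetical helper name (it is not part of any package), and qnorm() supplies the z value instead of a printed z-table.

```r
# Normal-approximation confidence interval for a mean, from summary statistics.
# mean_ci is an illustrative helper, not a function from any package.
mean_ci <- function(x_bar, s, n, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)  # about 1.96 when conf_level = 0.95
  margin <- z * s / sqrt(n)             # margin of error
  c(lower = x_bar - margin, upper = x_bar + margin)
}

# Point estimates, standard deviations, and sample sizes reported above
male_ci   <- mean_ci(x_bar = 70.10553, s = 2.99, n = 3402)
female_ci <- mean_ci(x_bar = 64.27137, s = 2.72, n = 3604)

round(male_ci, 2)
round(female_ci, 2)
```

Because the standard deviations here are the rounded values from the table, the results may differ slightly in the last decimal place from intervals computed on the raw data.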

Confidence Interval (Plotly)

This plot depicts the heights of both men and women along with the 95% Confidence Interval we calculated. The interval is very narrow, making it challenging to see. Zooming into the interactive graphic is recommended for a clearer view of the interval.
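One possible way to draw such an interactive plot is sketched below. This is an assumption about the approach, not the code behind the original figure: it plots only the two point estimates with error bars for the 95% margin, uses the summary values reported earlier, and requires the plotly package. The names ci_data and ci_plot are illustrative.

```r
library(plotly)

# Summary values copied from the tables and formulas in this presentation
ci_data <- data.frame(
  sex            = c("Male", "Female"),
  point_estimate = c(70.10553, 64.27137),
  s              = c(2.99, 2.72),
  n              = c(3402, 3604)
)

# Margin of error for a 95% confidence interval
ci_data$margin <- qnorm(0.975) * ci_data$s / sqrt(ci_data$n)

# Point estimates as markers, with the margin shown as vertical error bars
ci_plot <- plot_ly(
  ci_data,
  x = ~sex, y = ~point_estimate,
  type = "scatter", mode = "markers",
  error_y = ~list(array = margin)
) %>%
  layout(yaxis = list(title = "Height (inches)"))

ci_plot
```

Since each margin is only about 0.1 inch, the error bars are barely visible at the default zoom level, which matches the observation above.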

Citations

Engel, C. (2019, June 24). Data visualization with R. Data Visualization with R. Retrieved April 11, 2023, from https://cengel.github.io/R-data-viz/

Sievert, C. (2019, December 19). Interactive web-based data visualization with R, plotly, and shiny. Retrieved April 12, 2023, from https://plotly-r.com/

Sullivan, L. (n.d.). Confidence Intervals. Boston University School of Public Health. Retrieved April 12, 2023, from https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_confidence_intervals/bs704_confidence_intervals_print.html