Harold Nelson
2025-03-17
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Create a histogram of the variable carat in the dataframe diamonds.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
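One way to produce this histogram is sketched below with ggplot2. The `binwidth` of 0.1 carat is an assumption chosen for illustration; the plot that produced the message above used the default of 30 bins.

```r
library(ggplot2)  # provides the diamonds data and geom_histogram()

# binwidth = 0.1 is an illustrative choice, not the value used in the original plot
carat_hist = ggplot(diamonds, aes(x = carat)) +
  geom_histogram(binwidth = 0.1)

carat_hist
```

Setting `binwidth` explicitly also silences the `stat_bin()` message.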
Does this appear to have a normal distribution?
No, there is extreme right skewness.
Compute the mean and standard deviation of the variable. Call them mean_carat and sd_carat. Display the values.
## [1] 0.7979397
## [1] 0.4740112
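A minimal sketch of this step using base R's `mean()` and `sd()`. It should reproduce the two values shown above.

```r
library(ggplot2)  # only needed here for the diamonds data

mean_carat = mean(diamonds$carat)
sd_carat = sd(diamonds$carat)

# Display the values
mean_carat
sd_carat
```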
Write a code snippet which computes an estimate of the mean based on a sample of size sample_size. Run it repeatedly with sample_size = 2.
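A snippet along these lines would do the job. The name `xbar` is an assumption made for illustration. Each run draws a fresh random sample, so the value changes from run to run.

```r
library(ggplot2)  # only needed here for the diamonds data

sample_size = 2

# sample() draws sample_size values without replacement by default
xbar = mean(sample(diamonds$carat, sample_size))
xbar
```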
Repeat this exercise with sample_size values of 5, 10, 100, 500, 1000. What do you observe as you increase the sample size?
The estimates of xbar get closer to mean_carat, and vary less from one run to the next, as the sample size increases.
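To see this without rerunning the snippet by hand, the following sketch draws five samples at each sample size and tabulates the resulting means. The choice of five repetitions and the name `xbars` are assumptions made for illustration.

```r
library(ggplot2)  # only needed here for the diamonds data

sample_sizes = c(2, 5, 10, 100, 500, 1000)

# For each sample size, draw 5 independent samples and record each sample mean.
# sapply() returns a 5 x 6 matrix: one column per sample size.
xbars = sapply(sample_sizes, function(n) {
  replicate(5, mean(sample(diamonds$carat, n)))
})
colnames(xbars) = paste0("n=", sample_sizes)

round(xbars, 3)
```

The values within a column should spread out less as you move from the left columns to the right ones.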
Let’s look at the distribution of the values of xbar.
Begin by setting a vector of 1,000 estimates equal to zero using rep(). Then use a for loop to fill this vector with estimates of xbar based on random samples of size sample_size.
Compute the mean of the estimates as mean_estimate and the standard deviation of the estimates as sd_estimate. Display these values.
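sample_size = 4
estimates = rep(0, 1000)
for (i in 1:1000) {
  # each estimate is the mean of one random sample of size sample_size
  estimates[i] = mean(sample(diamonds$carat, sample_size))
}
mean_estimate = mean(estimates)
mean_estimate
sd_estimate = sd(estimates)
sd_estimate
# histogram of the 1,000 estimates (one possible way to draw it)
ggplot(data.frame(estimates), aes(x = estimates)) + geom_histogram()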
## [1] 0.790455
## [1] 0.2306748
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
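Run the code with sample_size values of 1, 4, and 16. What do you observe?
The mean of the estimates stays close to mean_carat, while the standard deviation of the estimates is roughly halved each time the sample size is quadrupled.
These facts are examples of two relationships that we need to remember.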
\[\mu_{\bar{x}}=\mu_{x}\] and \[\sigma_{\bar{x}} =\frac{\sigma_{x}}{\sqrt{n}}\]
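These two relationships can be checked numerically. The sketch below repeats the simulation for sample sizes 1, 4, and 16 and compares the simulated standard deviation with the value predicted by the formula. The helper `simulate_xbar()`, the `results` data frame, and the seed are assumptions made for illustration, not part of the original exercise.

```r
library(ggplot2)  # only needed here for the diamonds data

set.seed(1)  # arbitrary seed so the run is reproducible

mean_carat = mean(diamonds$carat)
sd_carat = sd(diamonds$carat)

# Hypothetical helper: n_estimates sample means, each from a sample of size sample_size
simulate_xbar = function(sample_size, n_estimates = 1000) {
  replicate(n_estimates, mean(sample(diamonds$carat, sample_size)))
}

sample_sizes = c(1, 4, 16)

results = data.frame(
  sample_size = sample_sizes,
  mean_estimate = NA_real_,
  sd_estimate = NA_real_,
  sd_predicted = sd_carat / sqrt(sample_sizes)  # sigma_x / sqrt(n)
)

for (i in seq_along(sample_sizes)) {
  estimates = simulate_xbar(sample_sizes[i])
  results$mean_estimate[i] = mean(estimates)
  results$sd_estimate[i] = sd(estimates)
}

results
```

The `mean_estimate` column should stay near `mean_carat` in every row, and `sd_estimate` should track `sd_predicted` closely.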
More importantly, no matter what kind of distribution the original variable x has (provided its variance is finite), the distribution of xbar approaches the normal distribution as the value of sample_size becomes large.
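One rough way to watch this happen is to track the skewness of the estimates as the sample size grows. A normal distribution has skewness zero, so the skewness should shrink toward zero. This is only a partial check, since zero skewness alone does not prove normality. The `skewness()` helper, the sample sizes, and the seed below are assumptions made for illustration.

```r
library(ggplot2)  # only needed here for the diamonds data

set.seed(1)  # arbitrary seed so the run is reproducible

# Sample skewness: the third standardized moment (0 for a symmetric distribution).
# Defined here so that no extra package is needed.
skewness = function(x) {
  centered = x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

sample_sizes = c(1, 4, 16, 64, 256)

# For each sample size, simulate 1,000 values of xbar and measure their skewness
skew_by_size = sapply(sample_sizes, function(n) {
  estimates = replicate(1000, mean(sample(diamonds$carat, n)))
  skewness(estimates)
})
names(skew_by_size) = paste0("n=", sample_sizes)

round(skew_by_size, 2)
```

Plotting a histogram of the estimates at each sample size would show the same thing visually: the long right tail seen in carat fades as the sample size increases.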