Week 5 Mid-Week Lab

Step 1. Load the data.

Step 2. Create the variable area.

EXERCISE 1: Creating a Sample; Observing Mean and Distribution

Calculate the summary statistics and obtain a histogram of the variable area of the population.
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    334    1126    1442    1500    1743    5642
Create a sample of the areas of 50 homes from the population and assign it to samp1.
Obtain the summary statistics of this sample of areas, create a histogram, and calculate the mean.
[1] 1510.4
How do the sample statistics and distribution compare to those of the population found above in part a? The sample statistics and distribution are usually close to those of the population, but they vary from sample to sample. Here the sample mean is 1510.4 and the population mean is 1500.
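The sampling step in this exercise can be sketched as follows. This is a minimal illustration in Python, not the lab's own code. The real data is not reproduced here, so `population` is a synthetic, right-skewed stand-in of 2,000 values, and the seed and the helper name `describe` are assumptions made only for this example.

```python
import random
import statistics

random.seed(1)  # assumed seed, only so the sketch is repeatable

# Synthetic stand-in for the population of home areas (NOT the real data):
# 2,000 right-skewed positive values.
population = [random.lognormvariate(7.25, 0.33) for _ in range(2000)]


def describe(values):
    """Return min, quartiles, mean and max, similar to a six-number summary."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(values),
        "q3": q3,
        "max": max(values),
    }


# Simple random sample of 50 values, drawn without replacement.
samp1 = random.sample(population, 50)

pop_summary = describe(population)
samp1_summary = describe(samp1)

print("population mean:", round(pop_summary["mean"], 1))
print("samp1 mean:     ", round(samp1_summary["mean"], 1))
```

Rerunning with a different seed gives a different samp1 mean each time, which is the sample-to-sample variability described in the answer.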

EXERCISE 2:

Take a second sample, also of size 50, and call it samp2. Calculate the mean and histogram of samp2.
[1] 1559.44
How does the mean of samp2 compare with the mean of samp1? How do the histograms compare? The means differ (1510.4 for samp1 versus 1559.44 for samp2) because of sampling variability, but as stated before, both should be close to the population mean.
Now obtain a sample entitled samp3 but this time with a sample size of 100. Find the mean and histogram of samp3.
[1] 1413.2
Now obtain a sample entitled samp4 but with a sample size of 1000. Find the mean and histogram of samp4.
[1] 1510.757
Based on the results and comparing the means to the population results, which sample seems to provide a more accurate estimate of the population mean? Which sample size would you consider ‘good enough’ and why? The sample size I consider ‘good enough’ is 1000 (samp4). A larger sample tends to give a mean closer to the population mean, because the sample mean varies less as the sample size grows.
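The comparison across sample sizes can be sketched as below. This is a hedged Python illustration rather than the lab's code: `population` is again a synthetic stand-in, and the seed and the `errors` dictionary are assumptions for the example. It draws one sample each of size 50, 100, and 1000 and reports how far each sample mean lands from the population mean.

```python
import random
import statistics

random.seed(2)  # assumed seed, only so the sketch is repeatable

# Synthetic stand-in population (NOT the real data).
population = [random.lognormvariate(7.25, 0.33) for _ in range(2000)]
pop_mean = statistics.fmean(population)

# One sample per size; any single draw can land near or far by chance.
errors = {}
for n in (50, 100, 1000):
    sample = random.sample(population, n)
    sample_mean = statistics.fmean(sample)
    errors[n] = abs(sample_mean - pop_mean)
    print(f"n={n:4d}  sample mean={sample_mean:8.1f}  abs. error={errors[n]:6.1f}")
```

A single draw is noisy. In the results recorded in this exercise, samp3 (n = 100, mean 1413.2) landed farther from the population mean of 1500 than samp1 (n = 50, mean 1510.4), so the advantage of a larger sample shows up on average across many draws rather than in every individual one.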

EXERCISE 3:

Run the code in the lab. How many elements are there in sample_means50? 5,000
What is the mean of sample_means50, and how does it compare to the mean of the population found at the beginning? Describe the sampling distribution. What does the range of estimates mean? Would you expect the distribution to change if we instead collected 50,000 sample means? The sampling distribution appears unimodal with an approximately normal shape. The range of estimates shows how much the sample mean varies from one sample of 50 homes to another, from the lowest to the highest sample mean area. With 50,000 sample means, the distribution would look smoother and closer to normal, and its mean would stay close to the true population mean.
Obtain the sample_means10 and sample_means100 vectors, and plot their histograms along with the sample_means50 histogram.
Based on the plots, as the sample size increases does the variability of the estimate increase or decrease? Would a sample size of 50 probably be satisfactory to use to estimate the mean area of all Ames homes? Why or why not? What about a sample size of 10? What sample size would you prefer to use and why? Based on the plots, as the sample size increases, the variability of the estimated mean area decreases. In my opinion, a sample size of 50 is satisfactory: the spread is reduced and the shape is approximately normal, which allows a reasonable estimate. With a sample size of 10, the variability is high, which results in a less accurate estimate. My preferred sample size would be 100, because it gives the closest estimates and the lowest variability.
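The effect of sample size on the spread of the sampling distribution can be sketched as below. This is a Python illustration under stated assumptions, not the code provided in the lab: `population` is a synthetic stand-in, and the helper `sample_means`, its `reps` parameter, and the seed are hypothetical. The vector names and the count of 5,000 sample means follow the exercise.

```python
import random
import statistics

random.seed(3)  # assumed seed, only so the sketch is repeatable

# Synthetic stand-in population (NOT the real data).
population = [random.lognormvariate(7.25, 0.33) for _ in range(2000)]
pop_mean = statistics.fmean(population)


def sample_means(n, reps=5000):
    """Means of `reps` independent random samples, each of size n."""
    return [statistics.fmean(random.sample(population, n)) for _ in range(reps)]


sample_means10 = sample_means(10)
sample_means50 = sample_means(50)
sample_means100 = sample_means(100)

# The standard deviation of each vector measures the spread of the estimates.
for name, means in [
    ("sample_means10", sample_means10),
    ("sample_means50", sample_means50),
    ("sample_means100", sample_means100),
]:
    print(f"{name:16s} count={len(means)}  "
          f"mean={statistics.fmean(means):8.1f}  "
          f"sd={statistics.stdev(means):6.1f}")
```

All three vectors should be centered near the population mean, while the standard deviation shrinks as the sample size goes from 10 to 50 to 100. That shrinking spread is the decrease in variability seen in the histograms.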

Brianne Balsinger

1/25/26