OpenIntro Data

  1. Take a random sample of size 50 from price. Using this sample, what is your best point estimate of the population mean?
price<- ames$SalePrice
samp1 <- sample(price, 50)
mean(samp1)
## [1] 173831.1
  1. Since you have access to the population, simulate the sampling distribution for x̄_price by taking 5000 samples from the population of size 50 and computing 5000 sample means. Store these means in a vector called sample_means50. Plot the data, then describe the shape of this sampling distribution. Based on this sampling distribution, what would you guess the mean home price of the population to be? Finally, calculate and report the population mean.

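I would guess the mean to be about 1490. The mean of the 5000 sample means is 1498.356, which closely approximates the population mean; the population mean itself would come from mean(area). Note that the code below samples Gr.Liv.Area rather than SalePrice, so these values are living areas in square feet, not prices.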

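# Note: this samples living area (Gr.Liv.Area), not SalePrice
area <- ames$Gr.Liv.Area
sample_means50 <- rep(NA, 5000)    # preallocate the vector of sample means
for (i in 1:5000) {
  samp <- sample(area, 50)         # one random sample of size 50
  sample_means50[i] <- mean(samp)  # store its mean
}
hist(sample_means50, breaks = 25)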

mean(sample_means50)
## [1] 1498.356
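
The value above is the mean of the simulated sample means, which is an estimate of the population mean rather than the population mean itself. A minimal sketch of the comparison follows. It assumes a hypothetical simulated population (`population`, drawn with `rgamma`) standing in for `ames$Gr.Liv.Area`, so its numbers will not match the lab output.

```r
set.seed(1)

# Hypothetical stand-in population (assumption); in the lab this would be
# ames$Gr.Liv.Area.
population <- rgamma(2930, shape = 9, scale = 166)

# 5000 sample means, each from a sample of size 50 drawn without replacement
sample_means <- replicate(5000, mean(sample(population, 50)))

pop_mean <- mean(population)    # the true population mean
sim_mean <- mean(sample_means)  # the centre of the sampling distribution

print(c(population = pop_mean, simulated = sim_mean))
```

With the real data, the same comparison would be `mean(area)` against `mean(sample_means50)`.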
  1. Change your sample size from 50 to 150, then compute the sampling distribution using the same method as above, and store these means in a new vector called sample_means150. Describe the shape of this sampling distribution, and compare it to the sampling distribution for a sample size of 50. Based on this sampling distribution, what would you guess to be the mean sale price of homes in Ames?

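Based on this sampling distribution, I would guess the mean to be about 1500. Because the code below samples area (Gr.Liv.Area) rather than price, this is a living area in square feet, not a sale price in dollars.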

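samp2 <- sample(price, 150)         # single sample of 150 prices; not used by the simulation below
sample_means150 <- rep(NA, 5000)    # preallocate the vector of sample means
for (i in 1:5000) {
  samp <- sample(area, 150)         # area is ames$Gr.Liv.Area, defined above
  sample_means150[i] <- mean(samp)
}
hist(sample_means150, breaks = 25)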

mean(sample_means150)
## [1] 1499.824
  1. Of the sampling distributions from 2 and 3, which has a smaller spread? If we’re concerned with making estimates that are more often close to the true value, would we prefer a distribution with a large or small spread?

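The sampling distribution for samples of size 150 has the smaller spread. If we want estimates that are more often close to the true value, we would prefer a distribution with a small spread, because a narrow sampling distribution means the sample means cluster tightly around the population mean.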
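
The difference in spread can be checked numerically rather than read off the histograms. The sketch below is an illustration under assumptions: `population` is a hypothetical right-skewed population generated with `rexp`, not the Ames data, and `sim_spread` is a helper name introduced here. It compares the simulated spread of the sample means with the usual approximation sd(population) / sqrt(n).

```r
set.seed(2)

# Hypothetical right-skewed population (assumption), not the Ames data
population <- rexp(3000, rate = 1 / 1500)

# Standard deviation of `reps` sample means for samples of size n
sim_spread <- function(n, reps = 5000) {
  sd(replicate(reps, mean(sample(population, n))))
}

spread50  <- sim_spread(50)
spread150 <- sim_spread(150)

# Approximate theoretical spread (ignores the finite population correction)
theory50  <- sd(population) / sqrt(50)
theory150 <- sd(population) / sqrt(150)

print(rbind(simulated = c(n50 = spread50, n150 = spread150),
            theory    = c(n50 = theory50, n150 = theory150)))
```

Tripling the sample size should shrink the spread by a factor of roughly sqrt(3), which is why the n = 150 distribution is narrower.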

MileStone Data

  1. Select a numeric variable of interest from your dataset. Using the process modeled in this lab, plot the sampling distribution of the sample mean of your selected variable when the sample size is n=5. When creating your vector of sample means, sample with replacement.
area <- PokemonStat$Attack
sample_means5 <- rep(NA, 5000)
for (i in 1:5000) {
  samp <- sample(area, 5, replace = TRUE)  # sample with replacement, as the question asks
  sample_means5[i] <- mean(samp)
}
hist(sample_means5, breaks=25)

  1. Repeat Question 1 for sample sizes of n=25 and n=100.
sample_means25 <- rep(NA, 5000)
sample_means100 <- rep(NA, 5000)

# n = 25, sampling with replacement
for (i in 1:5000) {
  samp <- sample(area, 25, replace = TRUE)
  sample_means25[i] <- mean(samp)
}
hist(sample_means25, breaks = 25)

# n = 100, sampling with replacement
for (i in 1:5000) {
  samp <- sample(area, 100, replace = TRUE)
  sample_means100[i] <- mean(samp)
}
hist(sample_means100, breaks = 25)

  1. How do the center, spread, and shape of the sampling distribution of the sample mean change for each of the three sample sizes?

All three sampling distributions are unimodal and close to normal, with a mean of about 80. The spread narrows as the sample size increases, and the distribution for n=100 appears closest to normal.
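
The comparison across sample sizes can also be put in a small table of centers and spreads. This is a minimal sketch under assumptions: `attack` is a hypothetical simulated vector standing in for `PokemonStat$Attack`, and `summarise_means` is a helper name introduced here for illustration.

```r
set.seed(3)

# Hypothetical stand-in for PokemonStat$Attack (assumption)
attack <- round(rgamma(800, shape = 6, scale = 13))

# Center and spread of the sampling distribution of the mean for one n,
# sampling with replacement
summarise_means <- function(n, reps = 5000) {
  means <- replicate(reps, mean(sample(attack, n, replace = TRUE)))
  c(n = n, center = mean(means), spread = sd(means))
}

sizes <- c(5, 25, 100)
summary_means <- as.data.frame(t(sapply(sizes, summarise_means)))
print(round(summary_means, 2))
```

The center column should stay near `mean(attack)` for every sample size, while the spread column shrinks as n grows.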

  1. Select a sample statistic other than the sample mean. Using the process modeled in the lab, plot the sampling distribution of the chosen sample statistic of your selected variable when the sample size is n=5. Sample with replacement.
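area <- PokemonStat$Attack
sample_std5 <- rep(NA, 5000)
for (i in 1:5000) {
  samp <- sample(area, 5, replace = TRUE)  # sample with replacement, as the question asks
  sample_std5[i] <- sd(samp)               # sample standard deviation
}
hist(sample_std5, breaks = 25)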

  1. Repeat Question 4 for sample sizes of n=25 and n=100.
sample_std25 <- rep(NA, 5000)
sample_std100 <- rep(NA, 5000)

# n = 25, sampling with replacement
for (i in 1:5000) {
  samp <- sample(area, 25, replace = TRUE)
  sample_std25[i] <- sd(samp)
}
hist(sample_std25, breaks = 25)

# n = 100, sampling with replacement
for (i in 1:5000) {
  samp <- sample(area, 100, replace = TRUE)
  sample_std100[i] <- sd(samp)
}
hist(sample_std100, breaks = 25)

  1. How do the center, spread, and shape of the sampling distribution of your chosen sample statistic change for each of the three sample sizes?

All three sampling distributions are unimodal and close to normal, with a mean of about 33 or 34. The spread narrows as the sample size increases, and the distribution for n=100 appears closest to normal.
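
Shape is harder to judge by eye for the standard deviation than for the mean, so a numeric check can help. The sketch below is illustrative only: `attack` is a hypothetical simulated vector standing in for `PokemonStat$Attack`, and `skewness` and `summarise_stat` are helper names introduced here. It reports center, spread, and a simple moment-based skewness for each sample size.

```r
set.seed(4)

# Hypothetical stand-in for PokemonStat$Attack (assumption)
attack <- round(rgamma(800, shape = 6, scale = 13))

# Simple moment-based skewness; positive values indicate a longer right tail
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Center, spread, and skewness of the sampling distribution of `stat`
# for samples of size n drawn with replacement
summarise_stat <- function(n, stat = sd, reps = 5000) {
  values <- replicate(reps, stat(sample(attack, n, replace = TRUE)))
  c(n = n, center = mean(values), spread = sd(values), skew = skewness(values))
}

summary_sd <- as.data.frame(t(sapply(c(5, 25, 100), summarise_stat)))
print(round(summary_sd, 2))
```

For small samples, the sample standard deviation tends to fall a little below the population value and its distribution tends to have a longer right tail, so the n=5 histogram is worth checking against that pattern.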

R Markdown