WineProject

1. Suppose the population mean of the variable “density” is μ. Make the following inferences:

a. Provide an estimate of μ based on the sample;

mu.density <- mean(red_wine$density)
mu.density

## [1] 0.9967467

Our estimate of μ is the sample mean of density, about 0.9967, which is very close to 1.

b. Use the Central Limit Theorem (CLT) to quantify the variability of your estimate;

sd.density <- sd(red_wine$density)
sd.density

## [1] 0.001887334

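# Standard error of the mean, s / sqrt(n). Despite its name, var.density holds a
# standard error, not a variance.
var.density <- sd(red_wine$density) / sqrt(length(red_wine$density))
var.density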

## [1] 4.71981e-05

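By the CLT, the sample mean is approximately Normal with standard error s/√n = 0.001887334/√1599 ≈ 4.72e-05. This small value means the estimate of μ is very precise.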
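The CLT formula SE = σ/√n can be checked by simulation. The sketch below draws many samples of size 1599 from an assumed Normal population (mean 0.9967, sd 0.0019, chosen to resemble the density column; these are not the wine data) and compares the spread of the sample means with σ/√n.

```r
# Sketch: check by simulation that the sd of sample means is close to sigma / sqrt(n).
# The Normal population below is an assumption, not red_wine$density.
set.seed(1)
n     <- 1599     # same sample size as the red wine data
sigma <- 0.0019   # assumed population sd, close to sd(red_wine$density)
reps  <- 5000     # number of simulated samples

# Draw reps samples of size n and keep the mean of each
sample_means <- replicate(reps, mean(rnorm(n, mean = 0.9967, sd = sigma)))

empirical_se   <- sd(sample_means)   # observed spread of the sample means
theoretical_se <- sigma / sqrt(n)    # CLT standard error
c(empirical = empirical_se, theoretical = theoretical_se)
```

The two numbers should agree closely; the remaining gap is Monte Carlo noise from using a finite number of simulated samples.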

c. Use the CLT to give a 95% confidence interval for μ.

hist(red_wine$density)

lc.density <- mu.density - 2*(var.density)
lc.density

## [1] 0.9966523

uc.density <- mu.density + 2*(var.density)
uc.density

## [1] 0.9968411

Using the rule estimate ± 2 × SE (2 approximates the Normal quantile 1.96), the 95% confidence interval for μ is (0.99665, 0.99684).
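The multiplier 2 used above is a rounded version of the Normal quantile qnorm(0.975) ≈ 1.96. A minimal sketch of a CLT interval with the exact quantile follows; `clt_ci()` is a hypothetical helper name, and the simulated `x` only stands in for `red_wine$density`.

```r
# Sketch: CLT (Normal-approximation) confidence interval for a mean.
# clt_ci() is a hypothetical helper, not part of base R.
clt_ci <- function(x, level = 0.95) {
  n  <- length(x)
  se <- sd(x) / sqrt(n)                 # CLT standard error of the mean
  z  <- qnorm(1 - (1 - level) / 2)      # about 1.96 for level = 0.95
  mean(x) + c(lower = -1, upper = 1) * z * se
}

# Usage on simulated data (an assumption; substitute red_wine$density)
set.seed(2)
x  <- rnorm(1599, mean = 0.9967, sd = 0.0019)
ci <- clt_ci(x)
ci
```

With 1.96 instead of 2 the interval is about 2% narrower, which makes no practical difference at this sample size.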

d. Use the bootstrap method to do parts b and c, and compare the results

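# Bootstrap: resample density with replacement and record each resample's mean
mu.density.set <- numeric(2000)
for (k in 1:2000) {
  density.bootstrap <- sample(red_wine$density, size = length(red_wine$density), replace = TRUE)
  mu.density.set[k] <- mean(density.bootstrap)   # leaves mu.density from part a unchanged
}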

sd(mu.density.set)

## [1] 4.827167e-05

var.density.set <- var(mu.density.set)
var.density.set

## [1] 2.330155e-09

conf.q.density <- quantile(mu.density.set, probs = c(0.025, 0.975))
conf.q.density

##      2.5%     97.5% 
## 0.9966526 0.9968431

The bootstrap standard error, 4.83e-05, is almost equal to the CLT standard error, 4.72e-05. The bootstrap percentile interval, (0.99665, 0.99684), also matches the CLT interval to five decimal places.
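The resampling loop above can be packaged as a function. This is a sketch, assuming a hypothetical helper `boot_mean()` and simulated stand-in data; it uses `replicate()` instead of an explicit loop, ties the resample size to `length(x)`, and calls `set.seed()` so the bootstrap results can be reproduced.

```r
# Sketch: nonparametric bootstrap of the sample mean.
# boot_mean() is a hypothetical helper, not part of base R.
boot_mean <- function(x, B = 2000, level = 0.95) {
  n <- length(x)
  # Each replicate resamples n observations with replacement and takes the mean
  means <- replicate(B, mean(sample(x, size = n, replace = TRUE)))
  alpha <- 1 - level
  list(
    se    = sd(means),                                          # bootstrap standard error
    ci    = quantile(means, probs = c(alpha / 2, 1 - alpha / 2)), # percentile interval
    means = means
  )
}

set.seed(3)
x   <- rnorm(1599, mean = 0.9967, sd = 0.0019)   # simulated stand-in for red_wine$density
res <- boot_mean(x)
res$se   # compare with sd(x) / sqrt(length(x))
res$ci
```

With the real data, `boot_mean(red_wine$density)` would produce the same kind of output as the loop above, up to Monte Carlo variation.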

2. Suppose the population mean of the variable “residual sugar” is μ. Do the following:

a. Provide an estimate of μ based on the sample;

mu.sugar <- mean(red_wine$`residual sugar`)
mu.sugar

## [1] 2.538806

Our estimate of μ is the sample mean of residual sugar, about 2.5388.

b. Noting that the sample distribution of “residual sugar” is highly skewed, can we use the CLT to quantify the variability of your estimate? Can we use the CLT to give a 95% confidence interval for μ? If yes, please give your solution. If no, explain why.

hist(red_wine$`residual sugar`)

mu.sugar <- mean(red_wine$`residual sugar`)
mu.sugar

## [1] 2.538806

sd.sugar <- sd(red_wine$`residual sugar`)
sd.sugar

## [1] 1.409928

# Standard error of the mean, s / sqrt(n); var.sugar holds a standard error, not a variance
var.sugar <- sd(red_wine$`residual sugar`) / sqrt(length(red_wine$`residual sugar`))
var.sugar

## [1] 0.03525922

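Yes. The CLT does not require the data themselves to be Normal, only independent observations with finite variance. Skewness slows the approach to Normality, so skewed data need a larger sample, but with n = 1599 the sampling distribution of the sample mean is approximately Normal. We can therefore use the CLT: the standard error is s/√n = 1.409928/√1599 ≈ 0.0353, and the 95% confidence interval is the estimate ± 2 × SE.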
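The claim that skewness fades in the sampling distribution of the mean can be illustrated numerically: for independent observations, the skewness of the sample mean equals the population skewness divided by √n. The sketch below assumes a lognormal population (meanlog 0.8, sdlog 0.5), chosen only because it is right-skewed like residual sugar, and compares the skewness of single observations with that of means of 1599 observations. `skewness()` here is a simple hand-written moment estimator, not a base R function.

```r
# Sketch: skewness of single observations vs. skewness of sample means.
# The lognormal population is an assumption; it is not the wine data.
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3   # simple moment estimator

set.seed(4)
n    <- 1599
reps <- 2000
obs   <- rlnorm(100000, meanlog = 0.8, sdlog = 0.5)                   # single observations
means <- replicate(reps, mean(rlnorm(n, meanlog = 0.8, sdlog = 0.5))) # means of n observations

c(skew_observations = skewness(obs), skew_means = skewness(means))
```

The observations are clearly skewed, while the sample means are close to symmetric, which is what justifies the Normal approximation for n = 1599.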

# 2b: 95% confidence interval using the CLT

lc.sugar <- mu.sugar - 2*(var.sugar)
lc.sugar

## [1] 2.468287

uc.sugar <- mu.sugar + 2*(var.sugar)
uc.sugar

## [1] 2.609324

ci.sugar.clt <- c(lc.sugar, uc.sugar)
ci.sugar.clt

## [1] 2.468287 2.609324

The lower and upper limits of the 95% confidence interval are (2.4683, 2.6093).

c. Use the bootstrap method to do part b. Is the bootstrap confidence interval symmetric? (hint: check the bootstrap distribution; see p. 25-26 in Lecture 4).

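# Bootstrap: resample residual sugar with replacement and record each resample's mean
mu.sugar.set <- numeric(2000)
for (k in 1:2000) {
  sugar.bootstrap <- sample(red_wine$`residual sugar`, size = length(red_wine$`residual sugar`), replace = TRUE)
  mu.sugar.set[k] <- mean(sugar.bootstrap)   # leaves mu.sugar from part a unchanged
}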

sd(mu.sugar.set)

## [1] 0.03464122

# 95% confidence interval using the bootstrap (percentile method)

conf.q.sugar <- quantile(mu.sugar.set, probs=c(0.025, 0.975))
conf.q.sugar

##     2.5%    97.5% 
## 2.472620 2.609636

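The bootstrap standard error, 0.0346, is almost equal to the CLT standard error, 0.0353. The bootstrap percentile interval (2.4726, 2.6096) is not exactly symmetric about the sample mean 2.5388: it extends 0.0662 below and 0.0708 above it. The longer upper side is consistent with the right skew of residual sugar, although part of the difference may be Monte Carlo noise from using 2000 resamples. The asymmetry is small, so the CLT interval (2.4683, 2.6093) remains a good approximation.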
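The symmetry of the percentile interval can be checked by measuring how far each endpoint lies from the sample mean; a CLT interval gives equal distances by construction. This sketch only does the arithmetic on the values printed above for residual sugar.

```r
# Sketch: distance from the sample mean to each bootstrap percentile endpoint.
# Values are copied from the printed R output for residual sugar.
mean_sugar <- 2.538806
boot_ci    <- c(lower = 2.472620, upper = 2.609636)

below <- mean_sugar - boot_ci[["lower"]]   # distance down to the 2.5% quantile
above <- boot_ci[["upper"]] - mean_sugar   # distance up to the 97.5% quantile
c(below = below, above = above, ratio = above / below)
```

A ratio of 1 would mean a symmetric interval; here the upper side is about 7% longer.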

3. We classify those wines as “excellent” if their rating is at least 7. Suppose the population proportion of excellent wines is p. Do the following:

a. Use the CLT to derive a 95% confidence interval for p;

red_wine$excellent <- as.numeric(red_wine$quality >= 7)   # 1 if quality is at least 7, else 0

p <- mean(red_wine$excellent)
p

## [1] 0.1357098

# CLT standard error of a sample proportion, sqrt(p(1 - p) / n); not a variance despite the name
var.excellent <- sqrt(p * (1 - p) / length(red_wine$excellent))
var.excellent

## [1] 0.008564681

lc.p <- p - 2*(var.excellent)
lc.p

## [1] 0.1185805

uc.p <- p + 2*(var.excellent)
uc.p

## [1] 0.1528392

The lower and upper limits of the 95% confidence interval for p, using CLT, are (0.1185805, 0.1528392)
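For a proportion, the CLT interval above is the Wald interval. As a hedged cross-check, the sketch below recomputes it with qnorm(0.975) and compares it with two intervals available in base R: `prop.test()` (Wilson score interval with continuity correction) and `binom.test()` (Clopper-Pearson exact interval). The count of 217 excellent wines is an assumption inferred from p × 1599 ≈ 217; it is not printed in the output above.

```r
# Sketch: Wald (CLT) interval for a proportion, alongside two base-R alternatives.
# x = 217 is inferred from p = 0.1357098 and n = 1599 (an assumption).
x <- 217
n <- 1599
p_hat <- x / n
se    <- sqrt(p_hat * (1 - p_hat) / n)           # CLT standard error of p_hat
wald  <- p_hat + c(-1, 1) * qnorm(0.975) * se    # Normal-approximation interval

wilson <- prop.test(x, n)$conf.int     # score interval with continuity correction
exact  <- binom.test(x, n)$conf.int    # Clopper-Pearson interval
rbind(wald = wald, wilson = as.numeric(wilson), exact = as.numeric(exact))
```

With roughly 217 successes and 1382 failures, all three methods should give very similar intervals.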

b. Use the bootstrap method to derive a 95% confidence interval for p;

# Bootstrap: resample the 0/1 "excellent" indicator and record each resample's proportion
bootstrap_p.set <- numeric(2500)
for (k in 1:2500) {
  p.bootstrap <- sample(red_wine$excellent, size = length(red_wine$excellent), replace = TRUE)
  bootstrap_p.set[k] <- mean(p.bootstrap)
}

sd(bootstrap_p.set)

## [1] 0.008628812

conf.q.p <- quantile(bootstrap_p.set, probs = c(0.025, 0.975))
conf.q.p

##      2.5%     97.5% 
## 0.1194497 0.1532208

hist(bootstrap_p.set, freq = FALSE)
lines(density(bootstrap_p.set), lwd = 5, col = 'blue')

The lower and upper limits of the 95% confidence interval for p, using the bootstrap, are (0.1194497, 0.1532208).
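Because `excellent` is a 0/1 variable, the number of 1s in a bootstrap resample of size n follows a Binomial(n, p̂) distribution, so the resampling loop can be replaced by a single `rbinom()` call. The sketch below compares the two approaches, assuming a stand-in 0/1 vector with 217 ones out of 1599 (inferred from p, not taken from the data).

```r
# Sketch: bootstrap of a proportion by resampling vs. by drawing binomial counts.
# The 0/1 vector is a stand-in built from an assumed count of 217 out of 1599.
set.seed(5)
n     <- 1599
p_hat <- 217 / n
B     <- 2500

excellent   <- c(rep(1, 217), rep(0, n - 217))   # 0/1 vector with the same proportion
loop_means  <- replicate(B, mean(sample(excellent, size = n, replace = TRUE)))
binom_means <- rbinom(B, size = n, prob = p_hat) / n   # same distribution, no resampling

rbind(loop     = quantile(loop_means,  c(0.025, 0.975)),
      binomial = quantile(binom_means, c(0.025, 0.975)))
```

Both rows estimate the same percentile interval; they differ only by Monte Carlo noise.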

WineProject_B

Keerthi Chereddy

2023-09-20

1. Suppose the population mean of the variable “density” is μ. Make the following inferences:

a. Provide an estimate of μ based on the sample;

Our estimate of μ is the sample mean of density, about 0.9967, which is very close to 1.

b. Use the Central Limit Theorem (CLT) to quantify the variability of your estimate;

By the CLT, the standard error of the sample mean is s/√n ≈ 4.72e-05, so the estimate of μ is very precise.

c. Use the CLT to give a 95% confidence interval for μ.

Using the rule estimate ± 2 × SE, the 95% confidence interval for μ is (0.99665, 0.99684).

d. Use the bootstrap method to do parts b and c, and compare the results

The bootstrap standard error, 4.83e-05, is almost equal to the CLT standard error, 4.72e-05. The bootstrap percentile interval, (0.99665, 0.99684), also matches the CLT interval to five decimal places.

2. Suppose the population mean of the variable “residual sugar” is μ. Do the following:

a. Provide an estimate of μ based on the sample;

Our estimate of μ is the sample mean of residual sugar, about 2.5388.

b. Noting that the sample distribution of “residual sugar” is highly skewed, can we use the CLT to quantify the variability of your estimate? Can we use the CLT to give a 95% confidence interval for μ? If yes, please give your solution. If no, explain why.

Yes. The CLT does not require the data themselves to be Normal, only independent observations with finite variance. Skewness slows the approach to Normality, so skewed data need a larger sample, but with n = 1599 the sampling distribution of the sample mean is approximately Normal, so the CLT applies. The standard error is s/√n ≈ 0.0353.

The lower and upper limits of the 95% confidence interval are (2.4683, 2.6093).

c. Use the bootstrap method to do part b. Is the bootstrap confidence interval symmetric? (hint: check the bootstrap distribution; see p. 25-26 in Lecture 4).

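The bootstrap standard error, 0.0346, is almost equal to the CLT standard error, 0.0353. The bootstrap percentile interval (2.4726, 2.6096) is not exactly symmetric about the sample mean 2.5388: it extends 0.0662 below and 0.0708 above it. The longer upper side is consistent with the right skew of residual sugar, although part of the difference may be Monte Carlo noise. The asymmetry is small, so the CLT interval remains a good approximation.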

3. We classify those wines as “excellent” if their rating is at least 7. Suppose the population proportion of excellent wines is p. Do the following:

a. Use the CLT to derive a 95% confidence interval for p;

The lower and upper limits of the 95% confidence interval for p, using CLT, are (0.1185805, 0.1528392)

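b. Use the bootstrap method to derive a 95% confidence interval for p;

The lower and upper limits of the 95% confidence interval for p, using the bootstrap, are (0.1194497, 0.1532208).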

c. Compare the two intervals. Is there any difference worth our attention?

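The two intervals are nearly identical: the CLT interval is (0.1186, 0.1528) and the bootstrap interval is (0.1194, 0.1532). Their endpoints differ by less than 0.001, far less than either interval's width of about 0.034, so there is no difference worth our attention.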
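The comparison can be made concrete by computing how far each endpoint moves and how wide each interval is. This sketch uses the interval endpoints from the printed R output in part A; it is arithmetic only.

```r
# Sketch: endpoint-by-endpoint comparison of the two 95% intervals for p.
clt_int  <- c(lower = 0.1185805, upper = 0.1528392)   # CLT interval
boot_int <- c(lower = 0.1194497, upper = 0.1532208)   # bootstrap percentile interval

shift  <- boot_int - clt_int   # how far each endpoint moves
widths <- c(clt  = clt_int[["upper"]]  - clt_int[["lower"]],
            boot = boot_int[["upper"]] - boot_int[["lower"]])
shift
widths
```

Both endpoint shifts are under 0.001 while both widths are about 0.034.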