Requirements for the Assignment

Pre-Requisites to answering the questions – Load in “Stats4” package, load in the Red Wine Dataset and Preview top 100 rows

library(stats4)
wine <- read.csv(file = "winequality-red.csv", sep=";", header = T) # Load in the dataset
knitr::kable(head(wine,100), caption = "Red Wine Dataset")

Red Wine Dataset
fixed.acidity	volatile.acidity	citric.acid	residual.sugar	chlorides	free.sulfur.dioxide	total.sulfur.dioxide	density	pH	sulphates	alcohol	quality
7.4	0.700	0.00	1.90	0.076	11	34	0.9978	3.51	0.56	9.4	5
7.8	0.880	0.00	2.60	0.098	25	67	0.9968	3.20	0.68	9.8	5
7.8	0.760	0.04	2.30	0.092	15	54	0.9970	3.26	0.65	9.8	5
11.2	0.280	0.56	1.90	0.075	17	60	0.9980	3.16	0.58	9.8	6
7.4	0.700	0.00	1.90	0.076	11	34	0.9978	3.51	0.56	9.4	5
7.4	0.660	0.00	1.80	0.075	13	40	0.9978	3.51	0.56	9.4	5
7.9	0.600	0.06	1.60	0.069	15	59	0.9964	3.30	0.46	9.4	5
7.3	0.650	0.00	1.20	0.065	15	21	0.9946	3.39	0.47	10.0	7
7.8	0.580	0.02	2.00	0.073	9	18	0.9968	3.36	0.57	9.5	7
7.5	0.500	0.36	6.10	0.071	17	102	0.9978	3.35	0.80	10.5	5
6.7	0.580	0.08	1.80	0.097	15	65	0.9959	3.28	0.54	9.2	5
7.5	0.500	0.36	6.10	0.071	17	102	0.9978	3.35	0.80	10.5	5
5.6	0.615	0.00	1.60	0.089	16	59	0.9943	3.58	0.52	9.9	5
7.8	0.610	0.29	1.60	0.114	9	29	0.9974	3.26	1.56	9.1	5
8.9	0.620	0.18	3.80	0.176	52	145	0.9986	3.16	0.88	9.2	5
8.9	0.620	0.19	3.90	0.170	51	148	0.9986	3.17	0.93	9.2	5
8.5	0.280	0.56	1.80	0.092	35	103	0.9969	3.30	0.75	10.5	7
8.1	0.560	0.28	1.70	0.368	16	56	0.9968	3.11	1.28	9.3	5
7.4	0.590	0.08	4.40	0.086	6	29	0.9974	3.38	0.50	9.0	4
7.9	0.320	0.51	1.80	0.341	17	56	0.9969	3.04	1.08	9.2	6
8.9	0.220	0.48	1.80	0.077	29	60	0.9968	3.39	0.53	9.4	6
7.6	0.390	0.31	2.30	0.082	23	71	0.9982	3.52	0.65	9.7	5
7.9	0.430	0.21	1.60	0.106	10	37	0.9966	3.17	0.91	9.5	5
8.5	0.490	0.11	2.30	0.084	9	67	0.9968	3.17	0.53	9.4	5
6.9	0.400	0.14	2.40	0.085	21	40	0.9968	3.43	0.63	9.7	6
6.3	0.390	0.16	1.40	0.080	11	23	0.9955	3.34	0.56	9.3	5
7.6	0.410	0.24	1.80	0.080	4	11	0.9962	3.28	0.59	9.5	5
7.9	0.430	0.21	1.60	0.106	10	37	0.9966	3.17	0.91	9.5	5
7.1	0.710	0.00	1.90	0.080	14	35	0.9972	3.47	0.55	9.4	5
7.8	0.645	0.00	2.00	0.082	8	16	0.9964	3.38	0.59	9.8	6
6.7	0.675	0.07	2.40	0.089	17	82	0.9958	3.35	0.54	10.1	5
6.9	0.685	0.00	2.50	0.105	22	37	0.9966	3.46	0.57	10.6	6
8.3	0.655	0.12	2.30	0.083	15	113	0.9966	3.17	0.66	9.8	5
6.9	0.605	0.12	10.70	0.073	40	83	0.9993	3.45	0.52	9.4	6
5.2	0.320	0.25	1.80	0.103	13	50	0.9957	3.38	0.55	9.2	5
7.8	0.645	0.00	5.50	0.086	5	18	0.9986	3.40	0.55	9.6	6
7.8	0.600	0.14	2.40	0.086	3	15	0.9975	3.42	0.60	10.8	6
8.1	0.380	0.28	2.10	0.066	13	30	0.9968	3.23	0.73	9.7	7
5.7	1.130	0.09	1.50	0.172	7	19	0.9940	3.50	0.48	9.8	4
7.3	0.450	0.36	5.90	0.074	12	87	0.9978	3.33	0.83	10.5	5
7.3	0.450	0.36	5.90	0.074	12	87	0.9978	3.33	0.83	10.5	5
8.8	0.610	0.30	2.80	0.088	17	46	0.9976	3.26	0.51	9.3	4
7.5	0.490	0.20	2.60	0.332	8	14	0.9968	3.21	0.90	10.5	6
8.1	0.660	0.22	2.20	0.069	9	23	0.9968	3.30	1.20	10.3	5
6.8	0.670	0.02	1.80	0.050	5	11	0.9962	3.48	0.52	9.5	5
4.6	0.520	0.15	2.10	0.054	8	65	0.9934	3.90	0.56	13.1	4
7.7	0.935	0.43	2.20	0.114	22	114	0.9970	3.25	0.73	9.2	5
8.7	0.290	0.52	1.60	0.113	12	37	0.9969	3.25	0.58	9.5	5
6.4	0.400	0.23	1.60	0.066	5	12	0.9958	3.34	0.56	9.2	5
5.6	0.310	0.37	1.40	0.074	12	96	0.9954	3.32	0.58	9.2	5
8.8	0.660	0.26	1.70	0.074	4	23	0.9971	3.15	0.74	9.2	5
6.6	0.520	0.04	2.20	0.069	8	15	0.9956	3.40	0.63	9.4	6
6.6	0.500	0.04	2.10	0.068	6	14	0.9955	3.39	0.64	9.4	6
8.6	0.380	0.36	3.00	0.081	30	119	0.9970	3.20	0.56	9.4	5
7.6	0.510	0.15	2.80	0.110	33	73	0.9955	3.17	0.63	10.2	6
7.7	0.620	0.04	3.80	0.084	25	45	0.9978	3.34	0.53	9.5	5
10.2	0.420	0.57	3.40	0.070	4	10	0.9971	3.04	0.63	9.6	5
7.5	0.630	0.12	5.10	0.111	50	110	0.9983	3.26	0.77	9.4	5
7.8	0.590	0.18	2.30	0.076	17	54	0.9975	3.43	0.59	10.0	5
7.3	0.390	0.31	2.40	0.074	9	46	0.9962	3.41	0.54	9.4	6
8.8	0.400	0.40	2.20	0.079	19	52	0.9980	3.44	0.64	9.2	5
7.7	0.690	0.49	1.80	0.115	20	112	0.9968	3.21	0.71	9.3	5
7.5	0.520	0.16	1.90	0.085	12	35	0.9968	3.38	0.62	9.5	7
7.0	0.735	0.05	2.00	0.081	13	54	0.9966	3.39	0.57	9.8	5
7.2	0.725	0.05	4.65	0.086	4	11	0.9962	3.41	0.39	10.9	5
7.2	0.725	0.05	4.65	0.086	4	11	0.9962	3.41	0.39	10.9	5
7.5	0.520	0.11	1.50	0.079	11	39	0.9968	3.42	0.58	9.6	5
6.6	0.705	0.07	1.60	0.076	6	15	0.9962	3.44	0.58	10.7	5
9.3	0.320	0.57	2.00	0.074	27	65	0.9969	3.28	0.79	10.7	5
8.0	0.705	0.05	1.90	0.074	8	19	0.9962	3.34	0.95	10.5	6
7.7	0.630	0.08	1.90	0.076	15	27	0.9967	3.32	0.54	9.5	6
7.7	0.670	0.23	2.10	0.088	17	96	0.9962	3.32	0.48	9.5	5
7.7	0.690	0.22	1.90	0.084	18	94	0.9961	3.31	0.48	9.5	5
8.3	0.675	0.26	2.10	0.084	11	43	0.9976	3.31	0.53	9.2	4
9.7	0.320	0.54	2.50	0.094	28	83	0.9984	3.28	0.82	9.6	5
8.8	0.410	0.64	2.20	0.093	9	42	0.9986	3.54	0.66	10.5	5
8.8	0.410	0.64	2.20	0.093	9	42	0.9986	3.54	0.66	10.5	5
6.8	0.785	0.00	2.40	0.104	14	30	0.9966	3.52	0.55	10.7	6
6.7	0.750	0.12	2.00	0.086	12	80	0.9958	3.38	0.52	10.1	5
8.3	0.625	0.20	1.50	0.080	27	119	0.9972	3.16	1.12	9.1	4
6.2	0.450	0.20	1.60	0.069	3	15	0.9958	3.41	0.56	9.2	5
7.8	0.430	0.70	1.90	0.464	22	67	0.9974	3.13	1.28	9.4	5
7.4	0.500	0.47	2.00	0.086	21	73	0.9970	3.36	0.57	9.1	5
7.3	0.670	0.26	1.80	0.401	16	51	0.9969	3.16	1.14	9.4	5
6.3	0.300	0.48	1.80	0.069	18	61	0.9959	3.44	0.78	10.3	6
6.9	0.550	0.15	2.20	0.076	19	40	0.9961	3.41	0.59	10.1	5
8.6	0.490	0.28	1.90	0.110	20	136	0.9972	2.93	1.95	9.9	6
7.7	0.490	0.26	1.90	0.062	9	31	0.9966	3.39	0.64	9.6	5
9.3	0.390	0.44	2.10	0.107	34	125	0.9978	3.14	1.22	9.5	5
7.0	0.620	0.08	1.80	0.076	8	24	0.9978	3.48	0.53	9.0	5
7.9	0.520	0.26	1.90	0.079	42	140	0.9964	3.23	0.54	9.5	5
8.6	0.490	0.28	1.90	0.110	20	136	0.9972	2.93	1.95	9.9	6
8.6	0.490	0.29	2.00	0.110	19	133	0.9972	2.93	1.98	9.8	5
7.7	0.490	0.26	1.90	0.062	9	31	0.9966	3.39	0.64	9.6	5
5.0	1.020	0.04	1.40	0.045	41	85	0.9938	3.75	0.48	10.5	4
4.7	0.600	0.17	2.30	0.058	17	106	0.9932	3.85	0.60	12.9	6
6.8	0.775	0.00	3.00	0.102	8	23	0.9965	3.45	0.56	10.7	5
7.0	0.500	0.25	2.00	0.070	3	22	0.9963	3.25	0.63	9.2	5
7.6	0.900	0.06	2.50	0.079	5	10	0.9967	3.39	0.56	9.8	5
8.1	0.545	0.18	1.90	0.080	13	35	0.9972	3.30	0.59	9.0	6

1. Suppose the population mean of the variable “density” is μ, do the following inferences:

a. Provide an estimate of μ based on the sample

Note: The population includes all the red wines produced in the north of Portugal. The sample is these 1599 bottles of red wines tested. We use “the sample” (1599 rows of data) to make inference about the population mean parameter mu.

mean(wine$density)

## [1] 0.9967467

As can be seen from the code snippet above, the estimate of the mean is 0.996746679174484

b. Use the Central Limit Theorem (CLT) to quantify the variability of your estimate

sd(wine$density)/sqrt(nrow(wine))

## [1] 4.71981e-05

The variability of the estimate – calculated via (standard deviation)/sqrt(n) – is 4.71981e-05

c. Use the CLT to give a 95% confidence interval for μ

sd.mu.hat <- sd(wine$density)/sqrt(nrow(wine))
mu.hat <- mean(wine$density)
c(mu.hat-2*sd.mu.hat, mu.hat + 2*sd.mu.hat)

## [1] 0.9966523 0.9968411

The 95% confidence interval for the mean is [0.9966523, 0.9968411]

d. Use the bootstrap method to do parts b and c, and compare the results with those obtained from the CLT. State your findings.

mu.hat.set <- NULL
sample <- wine$density

for (k in 1:2000) {
  sample.boostrap <- sample(sample, size = 800, replace = T)
  mu.hat <- mean(sample.boostrap)
  mu.hat.set[k] <- mu.hat
}

sd(mu.hat.set)/sqrt(nrow(wine))

## [1] 1.655979e-06

# 95% confidence interval
quantile(mu.hat.set, probs = c(0.025, 0.975))

##      2.5%     97.5% 
## 0.9966198 0.9968737

The variability calculated via the bootstrap method is 1.636967e-06 and the 95% confidence interval is [0.9966148, 0.9968716].

Compared to the CLT, the variability is smaller by a factor of 20-30 and the 95% confidence interval is wider. The width/range of the bootstrap confidence interval is .0002568 and that of the CLT is .0001888. In addition to this, the bootstrap confidence interval’s lower bound is lower than that of CLT and the upper bound is greater than that of CLT.

e. Can we use a normal distribution to model “density”? If yes, what are the maximum likelihood estimates of the mean and standard deviation? Please provide their standard errors as well.

Note: To find the MLEs, you may consider multiplying the data by 10. This will solve a numerical problem you may encounter. You must transform back in the end.

hist(wine$density)

Provided the histogram above, it is safe to conclude that we can model “density” using a normal distribution.

Maximum Likelihood Estimates (MLE) are calculated below.

wine$density <- 10*wine$density

minuslog.lik <- function(mu, sigma){
  log.lik <- 0
  for(i in 1:800){
    log.lik <- log.lik+log(dnorm(wine$density[i], mean=mu, sd=sigma))
  }
  return(-log.lik)
}

est <- mle(minuslogl=minuslog.lik, start=list(mu = mean(wine$density), sigma = sd(wine$density)))

summary(est)

## Maximum likelihood estimation
## 
## Call:
## mle(minuslogl = minuslog.lik, start = list(mu = mean(wine$density), 
##     sigma = sd(wine$density)))
## 
## Coefficients:
##         Estimate   Std. Error
## mu    9.97646408 0.0005953600
## sigma 0.01683932 0.0004150282
## 
## -2 log L: -4265.989

To get the MLE of the mean and the standard deviation – as well as the standard errors – divide the values from the above output by 10. As a pre-requisite to getting the above estimates, had to multiply the values in this column of the dataset by 10.

Estimate – Mean = 9.97646408/10 = 0.997646408

Std. Error – Mean = 0.0005953600/10 = 0.000059536

Estimate – Mean = 0.01683932/10 = 0.001683932

Std. Error – Mean = 0.0004150282/10 = 0.00004150282

2. Suppose the population mean of the variable “residual sugar” is μ, answer the following questions.

a. Provide an estimate of μ based on the sample

mean(wine$residual.sugar)

## [1] 2.538806

The estimate of μ based on the sample is 2.538806

b. Noting that the sample distribution of “residual sugar” is highly skewed, can we use the CLT to quantify the variability of your estimate? Can we use the CLT to give a 95% confidence interval for μ? If yes, please give your solution. If no, explain why.

Yes – despite the fact that “residual sugar” is highly skewed, we can use the CLT to quantify the variability of this estimate and give a 95% confidence interval for μ. CLT can be applied to a distribution as long as it has a finite variance and it is IID (Independent and Identically Distributed).

# Variability of the estimate
sd(wine$residual.sugar)/sqrt(nrow(wine))

## [1] 0.03525922

# 95% confidence interval
sd.mu.hat <- sd(wine$residual.sugar)/sqrt(nrow(wine))
mu.hat <- mean(wine$residual.sugar)
c(mu.hat-2*sd.mu.hat, mu.hat + 2*sd.mu.hat)

## [1] 2.468287 2.609324

As can be seen in the above code snippet, the variability of μ estimate is 0.03525922 and the 95% confidence interval is [2.468287, 2.609324]

c. Use the bootstrap method to do part b. Is the bootstrap confidence interval symmetric?

mu.hat.set <- NULL
sample <- wine$residual.sugar

# getting the variability
for (k in 1:2000) {
  sample.boostrap <- sample(sample, size = 800, replace = T)
  mu.hat <- mean(sample.boostrap)
  mu.hat.set[k] <- mu.hat
}
var.bootstrap <- sd(mu.hat.set)
print(var.bootstrap)

## [1] 0.04987445

# 95% confidence interval
CI.bootstrap <- quantile(mu.hat.set, probs = c(0.025, 0.975))
print(CI.bootstrap)

##     2.5%    97.5% 
## 2.441059 2.636350

As can be seen from the output above, the bootstrap variability is 0.04888155 and the 95% confidence interval for μ estimate is [2.449434, 2.639439].

hist(mu.hat.set, freq = FALSE)
lines(density(mu.hat.set), lwd = 5, col='blue')
abline(v = c(CI.bootstrap[1], CI.bootstrap[2]), col = c('red','red'))
text(x=CI.bootstrap[1]-.01, y=5, srt=90, '2.5%')
text(x=CI.bootstrap[2]+.01, y=5, srt=90, '97.5%')

Provided the graphic above, it can be reasonably concluded that the bootstrap confidence interval is symmetric.

d. Can we use a normal distribution to model “residual sugar”? If no, what distribution do you think can approximate its empirical distribution? What parameters are needed to characterize such a distribution? what are their maximum likelihood estimates? Please provide their standard errors as well.

hist(wine$residual.sugar)

No, it would not be appropriate to model “residual sugar” using a normal distribution. I think a log-normal distribution – specifically the blue curve in the below snippet (https://en.wikipedia.org/wiki/Log-normal_distribution) – could be used to approximate its empirical distribution. Two parameters – mean and standard deviation – are needed to characterize this distribution.

minuslog.lik<-function(mu, sigma){
  log.lik<-0
  for(i in 1:800){
    log.lik = log.lik + log(dlnorm(x=wine$residual.sugar[i], meanlog = mu, sdlog = sigma))
  }
  return(-log.lik)
}

est.lognorm<-mle(minuslog=minuslog.lik, start=list(mu=log(mean(wine$residual.sugar)), sigma=log(sd(wine$residual.sugar))))


summary(est.lognorm)

## Maximum likelihood estimation
## 
## Call:
## mle(minuslogl = minuslog.lik, start = list(mu = log(mean(wine$residual.sugar)), 
##     sigma = log(sd(wine$residual.sugar))))
## 
## Coefficients:
##        Estimate  Std. Error
## mu    0.8919587 0.012070414
## sigma 0.3414029 0.008534798
## 
## -2 log L: 1977.921

As can be seen in the above output:

Estimate – mu = 0.8919587
Std. Error – mu = 0.012070414
Estimate – sigma = 0.3414029
Std. Error – sigma = 0.008534798

3. We classify those wines as “excellent” if their rating is at least 7. Suppose the population proportion of excellent wines is p. Do the following:

Note: before answering any part of question 3, you need to create a new column of data for the new variable “excellent” which has binary values (0 and 1). Its value is 1 if the wine rating >=7, and otherwise its value is 0. Then, you can make inference about the underlying proportion parameter p using this column of data.

wine$excellent <- as.integer(wine$quality >= 7)

Preview the new column to ensure it looks fine.

wine$excellent

##    [1] 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##   [38] 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
##   [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
##  [223] 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [260] 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 0 0 0 0 1 0 1 0 0 0 0 0
##  [297] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0
##  [334] 0 1 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 1 0 0 1
##  [371] 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
##  [408] 1 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 1
##  [445] 1 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [482] 1 0 0 0 0 0 0 1 0 0 1 1 0 0 1 0 0 1 0 0 1 1 1 1 1 1 0 0 1 0 0 0 1 1 0 0 0
##  [519] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [556] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 1 0 0
##  [593] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [630] 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
##  [667] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [704] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [741] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [778] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 1 1 0 0 0 0 0 0
##  [815] 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0
##  [852] 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 1
##  [889] 0 0 0 0 0 0 0 0 1 0 1 0 0 1 1 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
##  [926] 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 1 0 0 0
##  [963] 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0
## [1000] 0 1 1 1 1 0 1 1 1 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 1 1 0 0 0 1
## [1037] 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 1 0 1 0 0 0 0 1 1 1 0 1 0 0
## [1074] 0 0 1 0 0 0 1 0 1 0 0 0 0 1 0 1 1 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0
## [1111] 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
## [1148] 1 0 0 1 0 0 0 0 0 1 1 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
## [1185] 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 1 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0
## [1222] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [1259] 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [1296] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
## [1333] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [1370] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1
## [1407] 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0
## [1444] 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 0
## [1481] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [1518] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0
## [1555] 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
## [1592] 0 0 0 0 0 0 0 0

a. Use the CLT to derive a 95% confidence interval for p

mu.hat <- mean(wine$excellent)
sd.mu.hat <- sd(wine$residual.sugar)/sqrt(nrow(wine))
c(mu.hat-2*sd.mu.hat, mu.hat + 2*sd.mu.hat)

## [1] 0.06519138 0.20622826

As can be seen from the output above, the CLT 95% confidence interval for p is [0.06519138, 0.20622826]

b. Use the bootstrap method to derive a 95% confidence interval for p

mu.hat.set <- NULL
sample <- wine$excellent

for (k in 1:2000) {
  sample.boostrap <- sample(sample, size = 800, replace = T)
  mu.hat <- mean(sample.boostrap)
  mu.hat.set[k] <- mu.hat
}
quantile(mu.hat.set, probs = c(0.025, 0.975))

##   2.5%  97.5% 
## 0.1125 0.1600

As can be seen from the output above, the bootstrap 95% confidence interval for p is [0.1125, 0.1600]

c. Compare the two intervals. Is there any difference worth our attention?

Yeah, I think it’s worth noting that the CLT 95% confidence interval for p is considerably wider (approximately 3 times wider) than the bootstrap 95% confidence interval for p. The range of the CLT 95% confidence interval is 0.14103688 and the range of the bootstrap 95% confidence interval is .0475.

d. What is the maximum likelihood estimate of p and its standard error?

minuslog.lik<-function(p){
  -log(dbinom(x=sum(wine$excellent), size=nrow(wine), prob=p))
}

est <- mle(minuslog=minuslog.lik, start=list(p=0.5))
summary(est)

## Maximum likelihood estimation
## 
## Call:
## mle(minuslogl = minuslog.lik, start = list(p = 0.5))
## 
## Coefficients:
##    Estimate  Std. Error
## p 0.1357098 0.008564279
## 
## -2 log L: 7.072712

As can be seen in the output above, the estimate of p is 0.1357098 and it’s standard error is 0.008564279

Assignment 2: Part B

Benjamin Harris

2024-02-10

Requirements for the Assignment

Pre-Requisites to answering the questions – Load in “Stats4” package, load in the Red Wine Dataset and Preview top 100 rows

1. Suppose the population mean of the variable “density” is μ, do the following inferences:

2. Suppose the population mean of the variable “residual sugar” is μ, answer the following questions.

3. We classify those wines as “excellent” if their rating is at least 7. Suppose the population proportion of excellent wines is p. Do the following: