Over the recent years, the poultry industry is growing rapidly and so is the demand. To get the full potential out of broiler chickens it is important that the target weight at 7 days of age should be reached.These days the interest in early nutrition research has increased due to the high correlation between the weight and the diet given during the initial stage.
In this project we aim to find out if the different protein diets have different effects on the weights of the chicks.
The data provides information on the body weights of the chicks measured at birth and every second day thereafter until day 20. The weight is also measured on day 21. There are four groups of chicks on different protein diets.
The data set includes the following variables:
Weight: A numeric vector giving the body weight of the chick (gm).
Time: A numeric vector giving the number of days since birth when the measurement was made.
Diet: A factor with levels 1, 2, 3, 4 indicating which experimental diet the chick received.
library(tidyr)
head(ChickWeight)
## weight Time Chick Diet
## 1 42 0 1 1
## 2 51 2 1 1
## 3 59 4 1 1
## 4 64 6 1 1
## 5 76 8 1 1
## 6 93 10 1 1
The empirical CDF is plotted below
The estimated sample mean is found to be 121.82 gm. After performing bootstrap to find the standard error, we get a result of 2.94 gm. At a confidence level of 0.95 , we get the following confidence intervals for the mean :
Normal Confidence Interval :115.93 to 127.69
Pivotal : 116.1569 to 127.6125
Quantile : 116.02 to 127.47
Pivotal gives the best approximate for the confidence interval hence we can say that the sample mean is 121.82 gms and 95% of the times the interval 116.15gms to 127.61gms will have the population mean.
library(bootstrap)
data1=ChickWeight$weight
theta <- function(x){
mean(x)
}
results <- bootstrap(x=data1,3200,theta)
th.hat=mean(data1)#theta hat
Tboot<-results$thetastar#vector of bootstrap results
se.boot <- sqrt(var(Tboot))
##Normal
CI.normal<-c(th.hat-2*se.boot, th.hat+2*se.boot)
##Pivotal
CI.pivotal<-2*th.hat-quantile(Tboot,probs = c(0.975, 0.025))
#Quantile
CI.quantile<-quantile(Tboot,probs = c(0.025, 0.975) )
hist(Tboot,probability = T)
The MLE of the mean is 121.82 gms The asymptotic distribution is shown below. It can be seen that asymptotically the MLE of mean follows a normal distribution.
n<-length(data1)
mean.mle<-mean(data1)
sigma_hat<-sqrt((1/n)*sum((data1-mean(data1))^2))
data.norm<-rnorm(10000,mean.mle,sigma_hat)
hist(data.norm)
#
Hypothesis test is conducted to see if there is any significant difference in the weights of the chicks that were on two different diets.
diet.1<-dplyr::filter(ChickWeight, Diet==1)
diet.2<-dplyr::filter(ChickWeight, Diet==2)
mean.diet.1<- mean(diet.1$weight)
mean.diet.2<- mean(diet.2$weight)
sd.diet.1<-sd(diet.1$weight)
sd.diet.2<-sd(diet.2$weight)
z<-(mean.diet.1-mean.diet.2)/sqrt(((sd.diet.1^2)/length(diet.1$weight))+((sd.diet.2^2)/length(diet.2$weight)))
c<-qnorm(0.975)
hyp<-abs(z)>c
#We reject the null hypotheses
diff<-mean.diet.1-mean.diet.2
se.hat<-sqrt(((sd.diet.1^2)/length(diet.1$weight))+((sd.diet.2^2)/length(diet.2$weight)))
CI<-c(diff-2*se.hat,diff+2*se.hat)
Wald Test is used to compare the means of the two samples when the parameter is known. The results show the test statistic as -2.63 and the critical value is 1.96 . As the absolute value of the observed test statistic is more extreme than the critical value, there is strong evidence against the null. Hence, we reject the null hypothesis at a significance level of 0.5% and conclude that both the means are different.
#wald_test=(mean(data1)-mu_hat)/sqrt(mean(data1/n))
In the Bayesian framework:
We choose a probability density (prior distribution)-that expresses our beliefs about a parameter.
We choose a statistical model that reflects our beliefs about the distribution given the parameter.
After observing the data, we update our beliefs and calculate the posterior distribution.
Here, we have considered the prior to be 𝑓(𝜇)=1. For a normal distribution, the posterior distribution follows N(𝑥̅,𝜎2𝑛) where 𝑥̅ is the mean of the data, 𝜎2𝑛 is the variance. On approximating the posterior distribution using simulation (1000 times), we get the mean to be 121.818 gms.
simu.mu=rnorm(1000,mean = mean(data1), sd = 1/sqrt(length(diet.1$weight)))
hist(simu.mu, probability = T)
mean.bayesian<-mean(simu.mu)