\[Cov(F_n(u), F_n(v)) = E[F_n(u)F_n(v)]-E[F_n(u)]E[F_n(v)]\] \[= E\left[\frac{1}{n}\sum_{i} \mathbb{I}_u(X_i)\,\frac{1}{n}\sum_{j} \mathbb{I}_v(X_j)\right]-E\left[\frac{1}{n}\sum_{i} \mathbb{I}_u(X_i)\right]E\left[\frac{1}{n}\sum_{j} \mathbb{I}_v(X_j)\right]\]
\[= E\left[\frac{1}{n^2}\sum_{i}\sum_{j} \mathbb{I}_u(X_i)\,\mathbb{I}_v(X_j)\right]-F(u)F(v)\]
\[= \frac{1}{n^2}\sum_{i=j}[P(X_i\le u, X_j \le v) - F(u)F(v)]+\frac{1}{n^2}\sum_{i\ne j}[P(X_i\le u, X_j \le v) - F(u)F(v)]\]
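Since the observations are i.i.d., \(P(X_i\le u, X_j \le v)=F(u)F(v)\) for \(i\ne j\), so the second sum is 0. For \(i=j\), \(P(X_i\le u, X_i \le v)=F(m)\) with \(m = \min(u,v)\), therefore
\[Cov(F_n(u), F_n(v)) = \frac{1}{n^2}\sum_{i=1}^{n}[F(m)-F(u)F(v)]=\frac{1}{n}[F(m)-F(u)F(v)]\]
Without loss of generality take \(u\le v\), so that \(F(m)=F(u)\). Because \[0\le F(u)\le F(v)\le1\] we have \[F(u)F(v)\le F(u)\]
The covariance is therefore nonnegative, and strictly positive whenever \(0<F(u)\) and \(F(v)<1\).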
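The covariance formula above can be checked numerically. The following is a minimal sketch, assuming Unif(0,1) data so that \(F(u)=u\); the values of `n`, `u`, `v`, `reps` and the seed are arbitrary illustrative choices, not part of the original problem.

```r
# Monte Carlo check of Cov(F_n(u), F_n(v)) = [F(min(u, v)) - F(u) F(v)] / n
# Assumption: X_i ~ Unif(0, 1), so F(x) = x on [0, 1].
set.seed(1)
n    <- 20      # sample size (illustrative)
u    <- 0.3
v    <- 0.6
reps <- 20000   # number of simulated samples (illustrative)

x  <- matrix(runif(n * reps), nrow = n)  # each column is one sample of size n
Fu <- colMeans(x <= u)                   # F_n(u) for every simulated sample
Fv <- colMeans(x <= v)                   # F_n(v) for every simulated sample

sim_cov    <- cov(Fu, Fv)                # simulated covariance
theory_cov <- (min(u, v) - u * v) / n    # closed form from the derivation
c(simulated = sim_cov, theory = theory_cov)
```

With these settings the closed form equals \((0.3-0.18)/20=0.006\), and the simulated value should agree up to Monte Carlo error.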
prob6_data <- c(14.27,15.15,13.98,15.40,14.04,14.10,13.75,14.23,14.80,
13.98,14.47,14.68,13.68,15.47,14.87,14.44,12.28,
14.90,14.65,13.33,15.31,13.73,15.28,14.57,17.09,15.91,
14.73,14.41,14.32,13.65,14.43,15.10,14.52,15.18,
14.19,13.64,15.02,13.96,12.92,15.63,14.49,15.21,14.77,
14.01,14.57,15.56,13.83,14.56,14.75,14.30,14.92,15.49,
15.38,13.66,15.03,14.41,14.62,15.47,15.13)
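Fn <- ecdf(prob6_data)                  # empirical CDF as a step function
Fn_values <- knots(Fn)                  # distinct observed values (jump points)
plot(Fn, main = "ecdf")
hist(prob6_data, breaks = 30)
qqnorm(prob6_data); qqline(prob6_data)  # normal Q-Q plot with reference line
quantile(prob6_data, c(.10, .25, .5, .75, .9))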
## 10% 25% 50% 75% 90%
## 13.676 14.070 14.570 15.115 15.470
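The quantiles are nearly symmetric about the median: the quartiles lie 0.500 below and 0.545 above it, and the 10% and 90% quantiles lie 0.894 below and 0.900 above it. Together with the histogram and the normal Q-Q plot, this suggests the data are approximately Normal.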
10 (?). a. Since \(X_1,\dots,X_n\) are i.i.d. \(Unif(0,1)\), \(F(x) = x\) and \(f(x)=1\) on \([0,1]\), so the density of the \(k\)th order statistic \(X_{(k)}\) simplifies to: \[f_k(x) = n\binom{n-1}{k-1} x^{k-1}(1-x)^{n-k}, \quad 0\le x\le 1\]
Therefore: \[E[X_{(k)}] = \int_{0}^{1}x\,n\binom{n-1}{k-1}x^{k-1}(1-x)^{n-k}dx\] \[=\int_{0}^{1}n\binom{n-1}{k-1}x^{k}(1-x)^{n-k}dx\] Let \(l = k+1\), \(s=n+1\): \[E[X_{(k)}]=\int_{0}^{1}(s-1)\binom{s-2}{l-2}x^{l-1}(1-x)^{s-l}dx\] Since \((s-1)\binom{s-2}{l-2}=\frac{(s-1)!}{(l-2)!\,(s-l)!}=(l-1)\binom{s-1}{l-1}\), \[E[X_{(k)}]=\int_{0}^{1}(l-1)\binom{s-1}{l-1}x^{l-1}(1-x)^{s-l}dx\] \[=\frac{l-1}{s}\int_{0}^{1}s\binom{s-1}{l-1}x^{l-1}(1-x)^{s-l}dx\] The remaining integrand is the density of the \(l\)th order statistic of \(s\) uniform observations, so it integrates to 1: \[E[X_{(k)}]=\frac{l-1}{s}=\frac{k+1-1}{n+1}=\frac{k}{n+1}\]
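The mean derived above can be verified by numerical integration of the order-statistic density. This is only a sketch: `n = 10` and `k = 3` are arbitrary illustrative values, and `f_k` is a helper name introduced here. The same integrals also give the variance, which is compared with the closed form \(\frac{1}{n+2}\frac{k}{n+1}\left(1-\frac{k}{n+1}\right)\) for the same order statistic.

```r
# Numerical check of the moments of the k-th uniform order statistic.
# n and k are illustrative assumptions; f_k is a hypothetical helper name.
n <- 10
k <- 3

# Density of X_(k) for Unif(0, 1) data, on 0 <= x <= 1
f_k <- function(x) n * choose(n - 1, k - 1) * x^(k - 1) * (1 - x)^(n - k)

total <- integrate(f_k, 0, 1)$value                        # should be 1
m1    <- integrate(function(x) x   * f_k(x), 0, 1)$value   # E[X_(k)]
m2    <- integrate(function(x) x^2 * f_k(x), 0, 1)$value   # E[X_(k)^2]

mean_formula <- k / (n + 1)
var_formula  <- (k / (n + 1)) * (1 - k / (n + 1)) / (n + 2)

rbind(numeric = c(mean = m1, var = m2 - m1^2),
      formula = c(mean = mean_formula, var = var_formula))
```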
Similarly, \[E[X_{(k)}^2]=\frac{k(k+1)}{(n+1)(n+2)}\] Thus, \[Var(X_{(k)})=E[X_{(k)}^2]-\left(E[X_{(k)}]\right)^2=\frac{1}{n+2}\left(\frac{k}{n+1}\right)\left(1-\frac{k}{n+1}\right)\]
b.
\[f(t) = \frac{d}{dt}F(t)=\alpha\beta e^{-\alpha t^\beta}t^{\beta-1}\] Since \(1-F(t)=e^{-\alpha t^\beta}\), the hazard function is: \[h(t) = \frac{f(t)}{1-F(t)}=\alpha\beta t^{\beta-1}\]
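As a sanity check on the hazard function above, the ratio \(f(t)/(1-F(t))\) can be compared numerically with the closed form \(\alpha\beta t^{\beta-1}\). This sketch assumes \(F(t)=1-e^{-\alpha t^\beta}\), the form implied by the density; the parameter values and the names `alpha0`, `beta0`, `F_t`, `f_t` are illustrative only.

```r
# Compare f(t) / (1 - F(t)) with the closed-form hazard alpha * beta * t^(beta - 1).
# Assumption: F(t) = 1 - exp(-alpha * t^beta); parameter values are illustrative.
alpha0 <- 2
beta0  <- 1.5
tt     <- seq(0.1, 3, by = 0.1)

F_t <- function(t) 1 - exp(-alpha0 * t^beta0)                             # cdf
f_t <- function(t) alpha0 * beta0 * t^(beta0 - 1) * exp(-alpha0 * t^beta0) # density

h_ratio  <- f_t(tt) / (1 - F_t(tt))          # hazard from its definition
h_closed <- alpha0 * beta0 * tt^(beta0 - 1)  # hazard from the derivation

all.equal(h_ratio, h_closed)
```

With \(\beta>1\), as in this illustration, the hazard increases in \(t\); with \(\beta<1\) it would decrease.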
\(T \sim Unif(0,24)\), therefore \(f(t)=1/24\) and \(F(t)=t/24\) for \(0\le t<24\). Hazard function: \(h(t)=\frac{f(t)}{1-F(t)}=\frac{1/24}{1-t/24}=1/(24-t)\)
# Hazard function of T ~ Unif(0, 24), valid for 0 <= t < 24
h <- function(t) {
  1 / (24 - t)
}
t <- seq(0, 23, by = .01)
plot(t, h(t), type = 'l')
The hazard function increases with time: \(h(1)=1/23\approx 0.043\) while \(h(5)=1/19\approx 0.053\). So if he has been waiting for 5 hours, he is more likely to be released in the next instant than if he had been waiting for only 1 hour.
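The same comparison can be made with conditional probabilities rather than the instantaneous hazard. The sketch below computes the probability of release within the next hour given a wait of `t` hours, under the \(Unif(0,24)\) model; `cond_release` and its arguments are hypothetical names introduced for illustration.

```r
# P(t < T <= t + window | T > t) for T ~ Unif(0, total).
# cond_release is a hypothetical helper, not part of the original solution.
cond_release <- function(t, window = 1, total = 24) {
  num <- punif(t + window, 0, total) - punif(t, 0, total)  # P(t < T <= t + window)
  den <- 1 - punif(t, 0, total)                            # P(T > t)
  num / den
}

p_after_1 <- cond_release(1)  # waited 1 hour
p_after_5 <- cond_release(5)  # waited 5 hours
c(after_1_hour = p_after_1, after_5_hours = p_after_5)
```

For the uniform model and a one-hour window this conditional probability is \(1/(24-t)\), which happens to coincide with \(h(t)\), so the values are \(1/23\) and \(1/19\).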
Under the assumption that the distribution is fixed, each observation can be treated as a Bernoulli trial whose outcome is either an outlier or not. The number of outliers therefore follows a Binomial distribution.
Here, suppose the probability that any single observation is an outlier is 5/26. Then the probability of getting 10 or more outliers among the 26 observations is:
1-pbinom(9, size=26, prob=5/26)
## [1] 0.01787622
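The tail probability above can be written in a few equivalent ways, which is a useful cross-check on the off-by-one in `pbinom(9, ...)`. This is a sketch; the variable names are illustrative.

```r
# Three equivalent ways to compute P(X >= 10) for X ~ Binomial(26, 5/26).
n_obs <- 26
p_out <- 5 / 26

tail_complement <- 1 - pbinom(9, size = n_obs, prob = p_out)              # 1 - P(X <= 9)
tail_direct     <- pbinom(9, size = n_obs, prob = p_out,
                          lower.tail = FALSE)                             # P(X > 9)
tail_sum        <- sum(dbinom(10:n_obs, size = n_obs, prob = p_out))      # sum of P(X = x)

c(tail_complement, tail_direct, tail_sum)
```

The `lower.tail = FALSE` form avoids subtracting from 1, which matters when the tail probability is very small.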
Here I run a simulation:
set.seed(122)
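# 26 observations: 1 = regular point (21 of them), 0 = outlier (5 of them)
ori_sample <- c(rep(1,21),rep(0,5))
# 100,000 bootstrap samples of size 26, one per column
boot_samples <- matrix(sample(ori_sample, 2600000, replace = TRUE),ncol=100000)
# Number of outliers (zeros) in one bootstrap sample
getCount <-function(x){
  return (26-sum(x))
}
# sapply returns a numeric vector of counts, one per bootstrap sample
countsOfOutliers <- sapply(data.frame(boot_samples), getCount)
sum(countsOfOutliers >= 10)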
## [1] 1875
That is 1875/100000 ≈ 1.88%, close to the exact binomial probability of about 1.79%.
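Resampling 26 values with replacement from 21 ones and 5 zeros is the same as drawing the outlier count from a Binomial(26, 5/26) distribution, so the simulation can also be sketched with `rbinom`. The seed and the names below are arbitrary choices, so the simulated count will differ from the run shown above.

```r
# Equivalent simulation: draw the outlier count of each sample directly.
# Seed and variable names are illustrative assumptions.
set.seed(2024)
reps <- 100000

outlier_counts <- rbinom(reps, size = 26, prob = 5 / 26)  # one count per sample
sim_prop   <- mean(outlier_counts >= 10)                  # simulated P(X >= 10)
exact_prob <- pbinom(9, size = 26, prob = 5 / 26,
                     lower.tail = FALSE)                  # exact P(X >= 10)

c(simulated = sim_prop, exact = exact_prob)
```

This avoids building a 26 x 100,000 matrix and a 100,000-column data frame, which makes it considerably faster.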
Now let’s simulate:
ori_sample <- c(rep(1,21),rep(0,5))
boot_samples_1000 <- matrix(sample(ori_sample, 26000, replace = TRUE),ncol=1000)
countsOfOutliers <- lapply(data.frame(boot_samples_1000), getCount)
sum(countsOfOutliers >= 10)
## [1] 16
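1-pbinom(25, size=26, prob=5/26)  # P(X >= 26) = 1 - P(X <= 25)
## [1] 0
The probability is almost 0: it equals \((5/26)^{26}\approx 2.4\times10^{-19}\), which is too small to survive the subtraction from 1 in double precision. Let’s simulate: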
set.seed(123)
ori_sample <- c(rep(1,21),rep(0,5))
boot_samples <- matrix(sample(ori_sample, 2600000, replace = TRUE),ncol=100000)
countsOfOutliers <- lapply(data.frame(boot_samples), getCount)
sum(countsOfOutliers >= 26)
## [1] 0
Even across 100,000 simulated samples, not a single one contains 26 outliers.
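To see why no simulated sample reaches 26 outliers, the exact probability can be computed without subtracting from 1. This is a sketch with illustrative variable names; it uses the point mass `dbinom(26, ...)`, since 26 is the largest possible count.

```r
# Exact probability that all 26 observations are outliers, and the
# expected number of such samples among 100,000 simulated ones.
p_all   <- dbinom(26, size = 26, prob = 5 / 26)  # P(X = 26) = P(X >= 26)
p_power <- (5 / 26)^26                           # same quantity, written directly

expected_hits <- 1e5 * p_all                     # expected count in 100,000 samples

c(p_all = p_all, p_power = p_power, expected_hits = expected_hits)
```

The expected number of hits is many orders of magnitude below 1, so observing zero such samples in the simulation is what the model predicts.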