STAT6502 HW1

6.

prob6_data <- c(14.27,15.15,13.98,15.40,14.04,14.10,13.75,14.23,14.80,
             13.98,14.47,14.68,13.68,15.47,14.87,14.44,12.28,
             14.90,14.65,13.33,15.31,13.73,15.28,14.57,17.09,15.91,
             14.73,14.41,14.32,13.65,14.43,15.10,14.52,15.18,
             14.19,13.64,15.02,13.96,12.92,15.63,14.49,15.21,14.77,
             14.01,14.57,15.56,13.83,14.56,14.75,14.30,14.92,15.49,
             15.38,13.66,15.03,14.41,14.62,15.47,15.13)
# Empirical CDF of the data
Fn <- ecdf(prob6_data)
Fn_values <- knots(Fn)  # distinct data values where the ECDF jumps
plot(Fn, main = "ecdf")

hist(prob6_data,breaks = 30)

qqnorm(prob6_data); qqline(prob6_data)

quantile(prob6_data, c(.10, .25, .5, .75, .9))

##    10%    25%    50%    75%    90% 
## 13.676 14.070 14.570 15.115 15.470

Based on the ECDF, the histogram, the normal Q-Q plot, and the quantiles above, the data appear to be approximately normal.
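As a rough numerical complement to the plots, the sample quantiles can be set beside the quantiles of a normal distribution with the same mean and standard deviation. This is only a sketch and not part of the original solution; `probs`, `sample_q`, and `normal_q` are names introduced here.

```r
# Data copied from the problem above
prob6_data <- c(14.27,15.15,13.98,15.40,14.04,14.10,13.75,14.23,14.80,
             13.98,14.47,14.68,13.68,15.47,14.87,14.44,12.28,
             14.90,14.65,13.33,15.31,13.73,15.28,14.57,17.09,15.91,
             14.73,14.41,14.32,13.65,14.43,15.10,14.52,15.18,
             14.19,13.64,15.02,13.96,12.92,15.63,14.49,15.21,14.77,
             14.01,14.57,15.56,13.83,14.56,14.75,14.30,14.92,15.49,
             15.38,13.66,15.03,14.41,14.62,15.47,15.13)

probs <- c(.10, .25, .5, .75, .9)

# Sample quantiles (same call as above)
sample_q <- quantile(prob6_data, probs)

# Quantiles of a normal distribution fitted by the sample mean and sd
normal_q <- qnorm(probs, mean = mean(prob6_data), sd = sd(prob6_data))

# Side-by-side comparison
round(rbind(sample = sample_q, normal = normal_q), 3)
```

If the two rows are close, that agrees with a roughly straight Q-Q plot.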

10 (?). a. Since the \(X_i\) are i.i.d. \(Unif(0,1)\), \(F(x) = x\) and \(f(x)=1\) on \([0,1]\), so the density of the \(k\)-th order statistic \(X_{(k)}\) simplifies to: \[f_k(x) = n\binom{n-1}{k-1} x^{k-1}(1-x)^{n-k}, \quad 0 \le x \le 1\]

Therefore: \[E[X_{(k)}] = \int_{0}^{1}x\,n\binom{n-1}{k-1}x^{k-1}(1-x)^{n-k}dx\] \[=\int_{0}^{1}n\binom{n-1}{k-1}x^{k}(1-x)^{n-k}dx\] Let l = k+1, s=n+1: \[E[X_{(k)}]=\int_{0}^{1}(s-1)\binom{s-2}{l-2}x^{l-1}(1-x)^{s-l}dx\] Since \((s-1)\binom{s-2}{l-2}=(l-1)\binom{s-1}{l-1}\): \[=\int_{0}^{1}(l-1)\binom{s-1}{l-1}x^{l-1}(1-x)^{s-l}dx\] \[=\frac{l-1}{s}\int_{0}^{1}s\binom{s-1}{l-1}x^{l-1}(1-x)^{s-l}dx\] The remaining integrand is the density of the \(l\)-th order statistic of \(s\) uniform draws, so it integrates to 1: \[=\frac{l-1}{s}=\frac{k+1-1}{n+1}=\frac{k}{n+1}\]
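A quick numerical check of this mean, as a sketch with arbitrarily chosen n = 10 and k = 3 (illustrative values, not from the problem): integrate \(x f_k(x)\) over \([0,1]\), then compare with a simulation. The names `f_k`, `n_sims`, and `kth` are introduced here.

```r
n <- 10; k <- 3   # illustrative values (assumption, not from the problem)

# Density of the k-th order statistic of n i.i.d. Unif(0,1) draws
f_k <- function(x) n * choose(n - 1, k - 1) * x^(k - 1) * (1 - x)^(n - k)

# Mean by numerical integration over the support [0, 1]
mean_numeric <- integrate(function(x) x * f_k(x), lower = 0, upper = 1)$value

# Mean by simulation: k-th smallest of n uniforms, repeated n_sims times
set.seed(1)
n_sims <- 20000
kth <- replicate(n_sims, sort(runif(n))[k])

c(formula = k / (n + 1), integral = mean_numeric, simulated = mean(kth))
```

The three numbers should agree up to integration and simulation error.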

Similarly, \[E[X_{(k)}^2]=\frac{k(k+1)}{(n+1)(n+2)}\] Thus, \[Var(X_{(k)})=E[X_{(k)}^2]-\left(E[X_{(k)}]\right)^2=\frac{1}{n+2}\left(\frac{k}{n+1}\right)\left(1-\frac{k}{n+1}\right)\]

b.

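With \(F(t) = 1-e^{-\alpha t^\beta}\), the density is \[f(t) = \frac{d}{dt}F(t)=\alpha\beta e^{-\alpha t^\beta}t^{\beta-1}\] Thus, the hazard function is: \[h(t) = \frac{f(t)}{1-F(t)}=\frac{\alpha\beta e^{-\alpha t^\beta}t^{\beta-1}}{e^{-\alpha t^\beta}}=\alpha\beta t^{\beta-1}\]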
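
The cancellation of the exponential term can be checked numerically. The sketch below assumes the CDF \(F(t) = 1-e^{-\alpha t^\beta}\) implied by the density above and uses arbitrary illustrative values \(\alpha = 0.5\), \(\beta = 2\); all names are introduced here.

```r
alpha_w <- 0.5; beta_w <- 2   # illustrative parameter values (assumption)

# CDF and density as in the derivation
cdf_w  <- function(t) 1 - exp(-alpha_w * t^beta_w)
dens_w <- function(t) alpha_w * beta_w * exp(-alpha_w * t^beta_w) * t^(beta_w - 1)

# Hazard from the definition f(t) / (1 - F(t)) and from the closed form
hazard_ratio  <- function(t) dens_w(t) / (1 - cdf_w(t))
hazard_closed <- function(t) alpha_w * beta_w * t^(beta_w - 1)

t_grid <- seq(0.1, 3, by = 0.1)
all.equal(hazard_ratio(t_grid), hazard_closed(t_grid))
```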

\(T \sim Unif(0,24)\), so \(f(t)=1/24\) and \(F(t)=t/24\) for \(0 \le t < 24\). The hazard function is: \[h(t)=\frac{f(t)}{1-F(t)}=\frac{1/24}{1-t/24}=\frac{1}{24-t}\]

# Hazard function of T ~ Unif(0, 24)
h <- function(t) {
  1 / (24 - t)
}

# Plot on [0, 23]; h(t) grows without bound as t approaches 24
t <- seq(0, 23, by = .01)
plot(t, h(t), type = 'l')

The hazard function increases with time. So if he has been waiting for 5 hours, he is more likely to be released in the next moment than if he had been waiting for only 1 hour: \(h(5)=1/19 > h(1)=1/23\).
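To make the comparison concrete, the probability of release within the next hour, given no release so far, can be computed from the uniform CDF. This is a sketch; `F_unif` and `release_next_hour` are names introduced here.

```r
# CDF of T ~ Unif(0, 24)
F_unif <- function(t) punif(t, min = 0, max = 24)

# P(t < T <= t + 1 | T > t) = (F(t + 1) - F(t)) / (1 - F(t))
release_next_hour <- function(t) (F_unif(t + 1) - F_unif(t)) / (1 - F_unif(t))

# After waiting 1 hour and after waiting 5 hours
release_next_hour(c(1, 5))
```

For the uniform distribution this conditional probability equals \(1/(24-t)\), the same expression as the hazard, so it is 1/23 after 1 hour and 1/19 after 5 hours.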

Under the assumption that the distribution is fixed, each observation can be treated as an independent Bernoulli trial whose outcome is either an outlier ("failure") or not ("success"). The number of outliers therefore follows a Binomial distribution.
Suppose the probability that a single observation is an outlier is 5/26. Then the probability of getting 10 or more outliers among 26 observations is:

1-pbinom(9, size=26, prob=5/26)

## [1] 0.01787622
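
The same tail probability can be cross-checked in two other ways using only base R. A sketch; the variable names are introduced here.

```r
p_outlier <- 5 / 26

# 1 - P(X <= 9), as computed above
p_tail_1 <- 1 - pbinom(9, size = 26, prob = p_outlier)

# Upper tail P(X > 9) requested directly
p_tail_2 <- pbinom(9, size = 26, prob = p_outlier, lower.tail = FALSE)

# Sum of the point probabilities P(X = 10), ..., P(X = 26)
p_tail_3 <- sum(dbinom(10:26, size = 26, prob = p_outlier))

c(p_tail_1, p_tail_2, p_tail_3)
```

All three values should agree up to floating-point error.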

Here I run a simulation:

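set.seed(122)
# 1 = regular observation, 0 = outlier (21 regular, 5 outliers)
ori_sample <- c(rep(1,21),rep(0,5))
# 100,000 resamples of size 26, one per column
boot_samples <- matrix(sample(ori_sample, 2600000, replace = TRUE),ncol=100000)
# Number of outliers (zeros) in one sample of size 26
getCount <-function(x){
  return (26-sum(x))
}
# Apply getCount to each column to get a numeric vector of counts
countsOfOutliers <- apply(boot_samples, 2, getCount)
sum(countsOfOutliers >= 10)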

## [1] 1875

The simulated proportion, 1875/100000 ≈ 1.88%, is close to the theoretical value of about 1.79%.
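
Since each resampled observation is an outlier with probability 5/26 independently, the outlier count per sample can also be drawn directly with `rbinom()`, and the Monte Carlo standard error shows how much a simulated proportion is expected to vary. This is an alternative sketch, not the resampling code above; the seed and variable names are arbitrary choices made here.

```r
set.seed(2017)
n_sims <- 100000

# Theoretical P(10 or more outliers among 26)
p_true <- 1 - pbinom(9, size = 26, prob = 5 / 26)

# Draw the number of outliers in each simulated sample directly
outlier_counts <- rbinom(n_sims, size = 26, prob = 5 / 26)
p_hat <- mean(outlier_counts >= 10)

# Standard error of a proportion estimated from n_sims independent samples
mc_se <- sqrt(p_true * (1 - p_true) / n_sims)

c(simulated = p_hat, theoretical = p_true, mc_se = mc_se)
```

With 100,000 simulations the standard error is about 0.0004, so simulated proportions within roughly 0.001 of the theoretical value are unsurprising.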

In 1000 samples, we expect 1000 * 0.01787622 = roughly 18 samples with 10 or more outliers.

Now let’s simulate:

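# 1 = regular observation, 0 = outlier
ori_sample <- c(rep(1,21),rep(0,5))
# 1,000 resamples of size 26, one per column
boot_samples_1000 <- matrix(sample(ori_sample, 26000, replace = TRUE),ncol=1000)
# getCount (defined above) returns the number of outliers in one sample
countsOfOutliers <- apply(boot_samples_1000, 2, getCount)
sum(countsOfOutliers >= 10)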

## [1] 16
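
The number of samples with 10 or more outliers among 1000 is itself binomial, so its expected value and standard deviation indicate whether a simulated count is in line with the theory. A sketch; the variable names are introduced here.

```r
# Probability that one sample of 26 has 10 or more outliers
p_tail <- 1 - pbinom(9, size = 26, prob = 5 / 26)
n_samples <- 1000

# Mean and standard deviation of a Binomial(n_samples, p_tail) count
expected_count <- n_samples * p_tail
sd_count <- sqrt(n_samples * p_tail * (1 - p_tail))
c(expected = expected_count, sd = sd_count)

# Central 95% range for the count
qbinom(c(0.025, 0.975), size = n_samples, prob = p_tail)
```

The standard deviation is a little over 4, so a simulated count of 16 lies well within one standard deviation of the expected 18.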

# P(X >= 26) = 1 - P(X <= 25)
1-pbinom(25, size=26, prob=5/26)

## [1] 0

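The probability is essentially 0: getting 26 outliers requires every one of the 26 observations to be an outlier, which has probability \((5/26)^{26}\), on the order of \(10^{-19}\). Let’s simulate: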

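set.seed(123)
# 1 = regular observation, 0 = outlier
ori_sample <- c(rep(1,21),rep(0,5))
# 100,000 resamples of size 26, one per column
boot_samples <- matrix(sample(ori_sample, 2600000, replace = TRUE),ncol=100000)
# getCount (defined above) returns the number of outliers in one sample
countsOfOutliers <- apply(boot_samples, 2, getCount)
sum(countsOfOutliers >= 26)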

## [1] 0

Even across 100,000 simulated samples, none contains 26 outliers.
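
The exact probability is too small to resolve by subtracting a CDF value from 1 in double precision, but it can be computed directly. A sketch; the variable names are introduced here.

```r
p_outlier <- 5 / 26

# All 26 observations must be outliers
p_all_1 <- p_outlier^26

# The same value as a binomial point probability P(X = 26)
p_all_2 <- dbinom(26, size = 26, prob = p_outlier)

# Upper tail P(X > 25), computed without subtracting from 1
p_all_3 <- pbinom(25, size = 26, prob = p_outlier, lower.tail = FALSE)

c(p_all_1, p_all_2, p_all_3)

# Expected number of such samples in 100,000 simulations
100000 * p_all_1
```

The expected number of such samples in 100,000 simulations is far below 1, which is consistent with observing none.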

Jie Hu

January 17, 2017
