Assignment 3-2021

Completed 16/16

1

Exponential distribution with \(\beta =95 \Rightarrow \lambda=1/95\).

a

#find the probability of at least 2 minutes = 120 seconds
1-pexp(120,1/95)

## [1] 0.2827597

b

phish <- read.csv("PHISHING.csv")
hist(phish$INTTIME, freq = FALSE)
curve(dexp(x,1/95), from = 0, to  = 500, add = TRUE)

mean(phish$INTTIME)

## [1] 95.52377

sd(phish$INTTIME)

## [1] 91.53912

Yes. The histogram appears to follow an exponential distribution with \(\beta=95\). For that distribution the mean and standard deviation both equal \(\beta=95\), and the sample mean (95.52) and sample standard deviation (91.54) are close to this value.

2

Maximum flood level (in millions of cubic feet per second) over a 4-year period for the Susquehanna River at Harrisburg, Pennsylvania, follows approximately a gamma distribution with \(\alpha=3\) and \(\beta=0.07\).

a

\[\mu= \alpha*\beta=3*0.07=0.21\] \[\sigma^2=\alpha*\beta^2=3*0.07*0.07=0.0147\]

b

#gamma density with alpha = 3 (shape) and beta = 0.07 (scale)
curve(dgamma(x, shape = 3, scale = 0.07), xlim = c(0, 1))

A value of 0.6 million cubic feet per second would be very unusual under this distribution. The standard deviation is \(\sqrt{0.0147}\approx 0.121\), so 0.6 lies more than three standard deviations above the mean of 0.21, where the density is close to zero.
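
As a rough check on this judgment, the upper-tail probability can be computed directly. This is a minimal sketch that assumes the textbook's \(\beta\) corresponds to R's `scale` argument; the name `p_exceed` is introduced here for illustration only.

```r
# Gamma with alpha = 3 (shape) and beta = 0.07 (scale; assumed to match the textbook's beta)
alpha <- 3
beta <- 0.07

# P(Y > 0.6): probability that the maximum flood level exceeds 0.6
p_exceed <- pgamma(0.6, shape = alpha, scale = beta, lower.tail = FALSE)
p_exceed
```

The probability comes out below 0.01, which supports treating 0.6 as a rare value for this model.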

3

Formula A ~ Gamma(\(\alpha_A=2\), \(\beta_A=2\))

Formula B ~ Gamma(\(\alpha_B=1\), \(\beta_B=4\))

a

For formula A the mean is \[\mu_A=\alpha_A*\beta_A=2*2=4\ minutes\]

For formula B the mean is \[\mu_B=\alpha_B*\beta_B=1*4=4\ minutes\]

b

For formula A the variance is \[\sigma_A^2=\alpha_A*\beta_A^2=2*4=8\ minutes^2\]

For formula B the variance is \[\sigma_B^2=\alpha_B*\beta_B^2=1*16=16\ minutes^2\]
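
The same arithmetic can be wrapped in a small helper. This is only a sketch: `gamma_moments` is a hypothetical name, and it assumes \(\beta\) is the scale parameter, as in the formulas above.

```r
# Mean and variance of a Gamma(alpha, beta) distribution, with beta as the scale
gamma_moments <- function(alpha, beta) {
  c(mean = alpha * beta, variance = alpha * beta^2)
}

gamma_moments(2, 2)  # Formula A
gamma_moments(1, 4)  # Formula B
```

Both formulas share the same mean, so the variance is what distinguishes them.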

c

pgamma(1,2,0.5)

## [1] 0.09020401

pgamma(1,1,0.25)

## [1] 0.2211992

According to R, Formula B has a higher probability (0.221) of producing a human reaction in less than one minute than Formula A (0.090).

4

Time until major repair required (years) ~ Weibull(\(\alpha=2\), \(\beta=4\))

a

pweibull(2,2,2)

## [1] 0.6321206

About 63.2% of new washers will need a major repair within the 2-year guarantee period.

b

alpha=2
beta=4

mean=((beta)^(1/alpha))*gamma((alpha+1)/alpha)
mean

## [1] 1.772454

var= ((beta)^(2/alpha))*(gamma((alpha+2)/alpha)-gamma((alpha+1)/alpha)*gamma((alpha+1)/alpha))
stdev=sqrt(var)
stdev

## [1] 0.9265028

c

pweibull(mean+2*stdev,2,2)-pweibull(mean-2*stdev,2,2)

## [1] 0.9625964

d

mean-2*stdev

## [1] -0.08055165

mean+2*stdev

## [1] 3.625459

About 96% of the washers will need a major repair within roughly 3.63 years (the mean plus two standard deviations), so it is very unlikely that a washer will last 6 years without one.
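
The tail probability at 6 years can also be computed directly. The sketch below assumes the same conversion used in the calls above (R's `scale` equals \(\beta^{1/\alpha}=2\)); `p_six` is an illustrative name.

```r
# Textbook Weibull(alpha = 2, beta = 4) maps to shape = 2, scale = 4^(1/2) = 2 in R
p_six <- pweibull(6, shape = 2, scale = 2, lower.tail = FALSE)

# P(Y > 6): probability that no major repair is needed before 6 years
p_six
```

This equals \(e^{-9}\), which is on the order of 0.0001.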

5

Y~Beta(\(\alpha=2\), \(\beta=9\))

a

\[\mu=\frac{\alpha}{\alpha+\beta}=\frac{2}{2+9}=\frac{2}{11}\approx 0.18182\]

\[\sigma^2=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}=\frac{18}{121*12}\approx0.01240\]

b

1-pbeta(0.4,2,9)

## [1] 0.0463574

c

pbeta(0.1,2,9)

## [1] 0.2639011

6

Y~Weibull

\[f=\left\{ \begin{array}{ll} \frac{1}{8}ye^\frac{-y^2}{16} & \quad 0 \leq y < \infty \\ 0 & \quad elsewhere \end{array} \right.\]

a

A Weibull distribution is given by \[f=\left\{ \begin{array}{ll} \frac{\alpha}{\beta}y^{\alpha-1}e^\frac{-y^\alpha}{\beta} & \quad 0 \leq y < \infty \\ 0 & \quad elsewhere \end{array} \right.\]

Thus, it follows that \(\alpha=2\) and \(\beta=16\).
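
R's Weibull functions take a shape and a scale rather than the textbook's \(\alpha\) and \(\beta\). The sketch below shows the conversion implied by the density above; `weibull_to_r` is a hypothetical helper, not part of base R.

```r
# Convert the textbook Weibull(alpha, beta), with density
# (alpha / beta) * y^(alpha - 1) * exp(-y^alpha / beta),
# to the shape and scale arguments used by dweibull/pweibull.
weibull_to_r <- function(alpha, beta) {
  list(shape = alpha, scale = beta^(1 / alpha))
}

pars <- weibull_to_r(alpha = 2, beta = 16)
pars$scale

# Sanity check: R's density agrees with the stated density at a test point
y <- 3
all.equal(dweibull(y, pars$shape, pars$scale), (1 / 8) * y * exp(-y^2 / 16))
```

For this problem the scale is \(16^{1/2}=4\), so any `pweibull` call should use shape 2 and scale 4.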

b

alpha=2
beta=16

mean=((beta)^(1/alpha))*gamma((alpha+1)/alpha)
mean

## [1] 3.544908

var= ((beta)^(2/alpha))*(gamma((alpha+2)/alpha)-gamma((alpha+1)/alpha)*gamma((alpha+1)/alpha))
var

## [1] 3.433629

c

#R's scale parameter is beta^(1/alpha) = 16^(1/2) = 4
1-pweibull(6,2,4)

## [1] 0.1053992

There is about a 10.5% probability that the chip won’t fail before 6 years.

7

X is the outcome (number of dots on the face) of the first die. Y is the outcome (number of dots on the face) of the second die.

a

Find \(p(x,y)\)

Each die is fair and the two rolls do not affect one another, so

\[p(x,y)=P(X=x)*P(Y=y)=\frac{1}{6}*\frac{1}{6}=\frac{1}{36}, \quad x,y \in \{1,2,...,6\}\]

b

Find \(p_1(x)\) and \(p_2(y)\)

\[p_1(x)=\sum_{y=1}^{6}p(x,y)=6*\frac{1}{36}=\frac{1}{6}, \quad x=1,2,...,6\]

\[p_2(y)=\sum_{x=1}^{6}p(x,y)=6*\frac{1}{36}=\frac{1}{6}, \quad y=1,2,...,6\]

c

find \(p_1(x|y)\) and \(p_2(y|x)\)

\[p_1(x|y)=\frac{p(x,y)}{p_2(y)}=\frac{\frac{1}{36}}{\frac{1}{6}}=\frac{6}{36}=\frac{1}{6}\]

\[p_2(y|x)=\frac{p(x,y)}{p_1(x)}=\frac{\frac{1}{36}}{\frac{1}{6}}=\frac{6}{36}=\frac{1}{6}\]

d

The conditional probabilities from part c equal the marginal probabilities from part b. This is because X and Y are independent: knowing the outcome of one die tells us nothing about the other.
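
The tables behind parts a through c can be built explicitly. This is a sketch that assumes two fair, independent dice; the object names are illustrative.

```r
faces <- 1:6
p1 <- rep(1 / 6, 6)  # marginal distribution of the first die
p2 <- rep(1 / 6, 6)  # marginal distribution of the second die

# Under independence the joint pmf is the outer product of the marginals
joint <- outer(p1, p2)
dimnames(joint) <- list(x = faces, y = faces)

# Conditional p1(x | y): divide each column by its column total
cond_x_given_y <- sweep(joint, 2, colSums(joint), "/")

# Every column of the conditional table matches the marginal p1
all.equal(cond_x_given_y[, 1], p1, check.attributes = FALSE)
```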

8

tab <- matrix(c(1,2,3,4,5,6,7,3,1,3,2,3,3,2,1,1,3,1,2,2,1), ncol=3, byrow=FALSE)
colnames(tab) <- c('Particle ID','Energy Level','Time Period')
tab <- as.table(tab)
tab

##   Particle ID Energy Level Time Period
## A           1            3           1
## B           2            1           1
## C           3            3           3
## D           4            2           1
## E           5            3           2
## F           6            3           2
## G           7            2           1

For a random particle X represents the Energy Level and Y represents the Time Period

a

The joint probability distribution for the above data is given by the following table, where the columns are the values of X and the rows are the values of Y.

tab <- matrix(c(1/7,2/7,1/7,0,0,2/7,0,0,1/7), ncol=3, byrow=TRUE)
colnames(tab) <- c('1','2','3')
rownames(tab) <- c('1','2','3')
tab <- as.table(tab)
addmargins(tab)

##             1         2         3       Sum
## 1   0.1428571 0.2857143 0.1428571 0.5714286
## 2   0.0000000 0.0000000 0.2857143 0.2857143
## 3   0.0000000 0.0000000 0.1428571 0.1428571
## Sum 0.1428571 0.2857143 0.5714286 1.0000000

b

The marginal distribution \(p_1(x)\) is

tab <- matrix(c(1/7,2/7,4/7), ncol=3, byrow=TRUE)
colnames(tab) <- c('1','2','3')
rownames(tab) <- c('p_1(x)')
tab <- as.table(tab)
tab

##                1         2         3
## p_1(x) 0.1428571 0.2857143 0.5714286

c

The marginal distribution \(p_2(y)\) is

tab <- matrix(c(4/7,2/7,1/7), ncol=3, byrow=TRUE)
colnames(tab) <- c('1','2','3')
rownames(tab) <- c('p_2(y)')
tab <- as.table(tab)
tab

##                1         2         3
## p_2(y) 0.5714286 0.2857143 0.1428571

d

Find \(p_2(y|x)\) for the data.

#when x=1, y only has one value
tab <- matrix(c(1,0,0), ncol=3, byrow=TRUE)
colnames(tab) <- c('1','2','3')
rownames(tab) <- c('p_2(y|1)')
tab <- as.table(tab)
tab

##          1 2 3
## p_2(y|1) 1 0 0

#when x=2, there are 2 observations and both have y=1
tab <- matrix(c(2/2,0,0), ncol=3, byrow=TRUE)
colnames(tab) <- c('1','2','3')
rownames(tab) <- c('p_2(y|2)')
tab <- as.table(tab)
tab

##          1 2 3
## p_2(y|2) 1 0 0

#when x=3, there are 4 observations: one with y=1, two with y=2, one with y=3
tab <- matrix(c(1/4,2/4,1/4), ncol=3, byrow=TRUE)
colnames(tab) <- c('1','2','3')
rownames(tab) <- c('p_2(y|3)')
tab <- as.table(tab)
tab

##             1    2    3
## p_2(y|3) 0.25 0.50 0.25
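
The same tables can be derived from the raw particle data instead of being typed in by hand. This is a sketch using base R's `table` and `prop.table`; the vector names are chosen here for illustration.

```r
# Energy level (X) and time period (Y) for particles 1 through 7
energy <- c(3, 1, 3, 2, 3, 3, 2)
time_period <- c(1, 1, 3, 1, 2, 2, 1)

# Cross-tabulate counts: rows are y, columns are x
counts <- table(y = time_period, x = energy)

# Joint distribution p(x, y), with marginals in the Sum row and column
joint <- prop.table(counts)
addmargins(joint)

# Conditional distribution p2(y | x): each column is normalized to sum to 1
cond_y_given_x <- prop.table(counts, margin = 2)
cond_y_given_x
```

Working from the raw vectors reduces the chance of a transcription error in the hand-entered matrices.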

9

Let \(X=\) low bid (thousands of dollars) and let \(Y=\) estimate of fair cost of building the road (thousands of dollars). The joint probability density of X and Y is \[f(x,y)=\frac{e^{\frac{-y}{10}}}{10y}\] \(0<y<x<2y\)

a

\[f_2(y)=\int_{-\infty}^\infty f(x,y) dx= \int_{y}^{2y} \frac{e^{\frac{-y}{10}}}{10y} dx= \frac{e^{\frac{-y}{10}}}{10y}[x]_{y}^{2y}=\frac{e^{\frac{-y}{10}}}{10}, \quad y>0\]

The resulting distribution is an exponential distribution with \(\beta=10\).

b

Y~Exp(\(\beta=10\))

\[\mu_y=E(Y)=\beta=10\]
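
A numerical check of this result, assuming the marginal density of an exponential distribution with \(\beta=10\), that is \(f_2(y)=e^{-y/10}/10\) for \(y>0\). The names `f2`, `total` and `mean_y` are illustrative.

```r
# Marginal density of Y, assumed exponential with beta = 10
f2 <- function(y) exp(-y / 10) / 10

# The density should integrate to 1 over (0, Inf)
total <- integrate(f2, lower = 0, upper = Inf)$value

# E(Y) is the integral of y * f2(y) and should be close to 10
mean_y <- integrate(function(y) y * f2(y), lower = 0, upper = Inf)$value

c(total = total, mean = mean_y)
```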

10

The joint density of X, the total time (in minutes) between an automobile’s arrival in the service queue and its leaving the system after servicing, and Y, the time (in minutes) the car waits in the queue before being serviced, is

\[f(x,y)=\left\{ \begin{array}{ll} ce^{-x^2} & \quad 0 \leq y \leq x; 0 \leq x <\infty \\ 0 & \quad elsewhere \end{array} \right.\]

a

\[\int_{-\infty}^\infty\int_{-\infty}^\infty f(x,y)dy dx=\int_{0}^\infty\int_{0}^x ce^{-x^2}dy dx=\int_{0}^\infty cxe^{-x^2}dx=-c\frac{e^{-x^2}}{2}\Big|_{0}^\infty=\frac{1}{2}c=1\]

Thus, \(c=2\).
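
The constant can be confirmed numerically. This sketch integrates the inner integral's result, \(xe^{-x^2}\), with base R's `integrate`; `inner` and `c_const` are illustrative names.

```r
# After integrating over y from 0 to x, the remaining integrand is c * x * exp(-x^2)
inner <- integrate(function(x) x * exp(-x^2), lower = 0, upper = Inf)$value

# The total probability must be 1, so c = 1 / inner
c_const <- 1 / inner
c_const
```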

b

\[f_1(x)=\int_{-\infty}^\infty f(x,y) dy= \int_{0}^x 2e^{-x^2}dy=2xe^{-x^2}\]

\[\int_{-\infty}^\infty f_1(x) dx= \int_{0}^\infty 2xe^{-x^2}dx=-\int_{0}^{-\infty} e^u du= \int_{-\infty}^0 e^udu=e^u|_{-\infty}^0=1-0=1\]

c

\[f_2(y|x)=\frac{f(x,y)}{f_1(x)}=\frac{2e^{-x^2}}{2xe^{-x^2}}=\frac{1}{x}, \quad 0 \leq y \leq x\]

11

As an illustration of why the converse of Theorem 6.6 is not true, consider the joint distribution of two discrete random variables, X and Y, shown in the accompanying table. Show that \(Cov(x,y)=0\), but that X and Y are dependent.

\[Cov(x,y)=E(XY)-E(X)E(Y)\]

\[E(XY)=\sum_x\sum_y xyp(x,y)=0\]

\[E(x)=\sum_x xp_1(x)=0\]

\[E(y)=\sum_y yp_2(y)=0\]

Thus \(Cov(X,Y)=0\), but \(p(x,y) \neq p_1(x)*p_2(y)\). Therefore, X and Y are dependent.
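
The same kind of argument can be checked in R. The joint distribution below is a hypothetical example (X uniform on \(\{-1,0,1\}\) with \(Y=X^2\)), not the textbook's table, which is not reproduced here.

```r
# Hypothetical joint distribution (NOT the textbook's table):
# X is uniform on {-1, 0, 1} and Y = X^2, so Y is completely determined by X
x_vals <- c(-1, 0, 1)
y_vals <- c(0, 1)
joint <- matrix(c(0, 1 / 3, 0,      # column y = 0
                  1 / 3, 0, 1 / 3), # column y = 1
                nrow = 3, dimnames = list(x = x_vals, y = y_vals))

p1 <- rowSums(joint)  # marginal of X
p2 <- colSums(joint)  # marginal of Y

# Cov(X, Y) = E(XY) - E(X)E(Y)
e_xy <- sum(outer(x_vals, y_vals) * joint)
cov_xy <- e_xy - sum(x_vals * p1) * sum(y_vals * p2)
cov_xy

# Independence would require p(x, y) = p1(x) * p2(y) in every cell
isTRUE(all.equal(joint, outer(p1, p2), check.attributes = FALSE))
```

The covariance is zero, yet the independence check fails, which is the pattern the exercise asks for.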

12

Y~Unif(a=1,b=3), n=60, \(\bar Y=\frac{\sum_{i=1}^n Y_i}{n}\)

a

\[E(\bar Y)=\frac{n\mu_Y}{n}=\mu_Y=2\]

b

\[V(\bar Y)=(\frac{1}{n})^2(n\sigma_Y^2)=\frac{\sigma_Y^2}{n}=\frac{(3-1)^2}{12n}\approx 0.005556\]

c

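By the Central Limit Theorem, since n=60 is large, the sampling distribution of \(\bar Y\) is approximately normal with mean 2 and variance 0.005556 (standard deviation \(\sqrt{0.005556}\approx 0.0745\)).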

d

#pnorm takes the standard deviation, so use sqrt(4/720) rather than the variance
pnorm(2.5,2,sqrt(4/720))-pnorm(1.5,2,sqrt(4/720))

## [1] 1

e

#pnorm takes the standard deviation, so use sqrt(4/720) rather than the variance
1-pnorm(2.2,2,sqrt(4/720))

## [1] 0.003645179
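
A simulation gives an independent check on the mean, variance and tail probability of \(\bar Y\). This is a sketch only: the seed and the number of replications are arbitrary choices made here, and the results vary slightly with them.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
n <- 60

# Draw 10,000 samples of size 60 from Unif(1, 3) and record each sample mean
ybar <- replicate(10000, mean(runif(n, min = 1, max = 3)))

mean(ybar)        # should be close to 2
var(ybar)         # should be close to 4/720
mean(ybar > 2.2)  # empirical estimate of P(Ybar > 2.2)
```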

13

Y~Bin(n=20,p=0.4)

Consider a random sample of 20 swordfish pieces from New York and Chicago supermarkets.

a

\[P(Y_d \leq 2)\approx P(Y_c \leq 2.5)\]

n=20
p=0.4
mean=n*p
stdev=sqrt(n*p*(1-p))
pnorm(2.5,mean,stdev)

## [1] 0.006029808

b

1-pnorm(10.5,mean,stdev)

## [1] 0.1269165

c

pbinom(2,20,0.4)

## [1] 0.003611472

1-pbinom(10,20,0.4)

## [1] 0.1275212

14

a

lc <- read.csv("LEADCOPP.csv")
lead <- lc$LEAD
t.test(lead,conf.level=0.99)

## 
##  One Sample t-test
## 
## data:  lead
## t = 2.325, df = 9, p-value = 0.04512
## alternative hypothesis: true mean is not equal to 0
## 99 percent confidence interval:
##  -1.147845  6.919045
## sample estimates:
## mean of x 
##    2.8856

b

lc <- read.csv("LEADCOPP.csv")
copp <- lc$COPPER
t.test(copp,conf.level=0.99)

## 
##  One Sample t-test
## 
## data:  copp
## t = 5.1746, df = 9, p-value = 0.0005836
## alternative hypothesis: true mean is not equal to 0
## 99 percent confidence interval:
##  0.1518726 0.6647274
## sample estimates:
## mean of x 
##    0.4083

c

We are 99% confident that the true mean lead level lies between -1.148 and 6.919. Since a lead level cannot be negative, the lower limit is effectively 0.

We are 99% confident that the true mean copper level lies between 0.152 and 0.665.

d

"99% confident" means that if we repeatedly took samples and built an interval from each one in this way, about 99% of those intervals would contain the true mean and about 1% would not.
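
For reference, the interval reported by `t.test` is \(\bar y \pm t_{\alpha/2}\,s/\sqrt{n}\) with \(n-1\) degrees of freedom. The sketch below reproduces it by hand on a hypothetical data vector (the values are made up for illustration and are not the LEADCOPP data).

```r
# Hypothetical measurements, NOT the LEADCOPP data
x <- c(1.2, 0.8, 5.1, 2.4, 0.3, 3.9, 1.7, 2.2, 0.9, 4.0)
n <- length(x)

# 99% interval: mean +/- t quantile * standard error, with n - 1 degrees of freedom
half_width <- qt(0.995, df = n - 1) * sd(x) / sqrt(n)
manual_ci <- mean(x) + c(-1, 1) * half_width
manual_ci

# The hand computation agrees with the interval reported by t.test
all.equal(manual_ci, as.numeric(t.test(x, conf.level = 0.99)$conf.int))
```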

15

sola <- read.csv("SOLARAD.csv")

mo <- sola$STJOS
t.test(mo,conf.level=0.95)

## 
##  One Sample t-test
## 
## data:  mo
## t = 8.4147, df = 6, p-value = 0.0001535
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##   889.0457 1618.0971
## sample estimates:
## mean of x 
##  1253.571

iowa <- sola$IOWA
t.test(iowa,conf.level=0.95)

## 
##  One Sample t-test
## 
## data:  iowa
## t = 6.6258, df = 6, p-value = 0.0005696
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##   665.7471 1445.3958
## sample estimates:
## mean of x 
##  1055.571

We are 95% confident that the true mean solar irradiation at the STJOS site lies between 889.0 and 1618.1, and that the true mean at the IOWA site lies between 665.7 and 1445.4.

16

dia <- read.csv("DIAZINON.csv")

a

day <- dia$DAY
night <- dia$NIGHT

t.test(day,conf.level=0.9)

## 
##  One Sample t-test
## 
## data:  day
## t = 4.1995, df = 10, p-value = 0.00183
## alternative hypothesis: true mean is not equal to 0
## 90 percent confidence interval:
##   7.637401 19.235326
## sample estimates:
## mean of x 
##  13.43636

t.test(night,conf.level = 0.9)

## 
##  One Sample t-test
## 
## data:  night
## t = 4.693, df = 10, p-value = 0.0008506
## alternative hypothesis: true mean is not equal to 0
## 90 percent confidence interval:
##  32.12916 72.56175
## sample estimates:
## mean of x 
##  52.34545

b

For these small-sample t intervals to be valid, we must assume that each sample is a random sample from a population that is approximately normally distributed.

c

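Yes. The 90% interval for the day mean (7.64, 19.24) and the 90% interval for the night mean (32.13, 72.56) do not overlap, which suggests that the mean diazinon levels differ between day and night.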

Caleb Gray

11/1/2021
