Exponential distribution with \(\beta =95 \Rightarrow \lambda=1/95\).
#find the probability of at least 2 minutes = 120 seconds
1-pexp(120,1/95)
## [1] 0.2827597
phish <- read.csv("PHISHING.csv")
hist(phish$INTTIME, freq = FALSE)
curve(dexp(x,1/95), from = 0, to = 500, add = TRUE)
mean(phish$INTTIME)
## [1] 95.52377
sd(phish$INTTIME)
## [1] 91.53912
Yes. The histogram closely follows the exponential density with \(\beta=95\). In addition, the sample mean (95.52) and standard deviation (91.54) are close to the theoretical values, since an exponential distribution has \(\mu=\sigma=\beta=95\).
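As a small check of the exponential facts used above, the sketch below compares the closed form \(P(Y\geq 120)=e^{-120/\beta}\) with `pexp()`, and simulates draws to show that the mean and standard deviation both sit near \(\beta\). It uses only base R and does not read `PHISHING.csv`; the seed and the number of draws are arbitrary choices.

```r
beta <- 95           # exponential mean (seconds)
rate <- 1 / beta     # pexp(), dexp() and rexp() are parameterised by the rate lambda = 1/beta

# Upper tail in closed form, P(Y >= y) = exp(-y / beta), versus pexp()
p_closed <- exp(-120 / beta)
p_pexp   <- pexp(120, rate = rate, lower.tail = FALSE)

# Simulated draws: sample mean and sd should both be close to beta
set.seed(1)                       # arbitrary seed for reproducibility
y <- rexp(1e5, rate = rate)

print(c(closed_form = p_closed, pexp = p_pexp))
print(c(mean = mean(y), sd = sd(y)))
```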
Maximum flood level (in millions of cubic feet per second) over a 4-year period for the Susquehanna River at Harrisburg, Pennsylvania, follows approximately a gamma distribution with \(\alpha=3\) and \(\beta=0.07\).
\[\mu= \alpha*\beta=3*0.07=0.21\] \[\sigma^2=\alpha*\beta^2=3*0.07*0.07=0.0147\]
curve(dgamma(x, shape = 3, scale = 0.07), from = 0, to = 1)  # gamma density with alpha = 3, beta = 0.07
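A value of 0.6 million cubic feet per second would be very unlikely under this distribution: with \(\sigma=\sqrt{0.0147}\approx 0.121\), it lies more than 3 standard deviations above the mean of 0.21. From the plotted density, maximum flood levels above about 0.4 million cubic feet per second also appear to be uncommon.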
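To put numbers on this comparison, the sketch below computes how many standard deviations 0.6 lies above the mean and the upper-tail probabilities at 0.4 and 0.6, using `pgamma()` with `scale = beta`. The helper `gamma_tail()` is a hypothetical name introduced here; it evaluates the closed-form gamma tail, which holds only for an integer shape such as \(\alpha=3\).

```r
alpha <- 3      # gamma shape
beta  <- 0.07   # gamma scale (millions of cubic feet per second)

mu    <- alpha * beta            # 0.21
sigma <- sqrt(alpha * beta^2)    # about 0.121

# How many standard deviations above the mean is 0.6?
z_06 <- (0.6 - mu) / sigma

# Upper-tail probabilities P(Y > y)
p_04 <- pgamma(0.4, shape = alpha, scale = beta, lower.tail = FALSE)
p_06 <- pgamma(0.6, shape = alpha, scale = beta, lower.tail = FALSE)

# Closed form for an integer shape:
# P(Y > y) = exp(-y/beta) * sum_{k=0}^{alpha-1} (y/beta)^k / k!
gamma_tail <- function(y, alpha, beta) {
  k <- 0:(alpha - 1)
  exp(-y / beta) * sum((y / beta)^k / factorial(k))
}

print(c(z_06 = z_06, p_04 = p_04, p_06 = p_06))
```

With these parameters the tail probabilities come out to roughly 0.076 beyond 0.4 and 0.009 beyond 0.6, and 0.6 sits about 3.2 standard deviations above the mean.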
Formula A ~ Gamma(\(\alpha_A=2\), \(\beta_A=2\))
Formula B ~ Gamma(\(\alpha_B=1\), \(\beta_B=4\))
For formula A the mean is \[\mu_A=\alpha_A*\beta_A=2*2=4\ minutes\]
For formula B the mean is \[\mu_B=\alpha_B*\beta_B=1*4=4\ minutes\]
For formula A the variance is \[\sigma_A^2=\alpha_A*\beta_A^2=2*4=8\ minutes^2\]
For formula B the variance is \[\sigma_B^2=\alpha_B*\beta_B^2=1*16=16\ minutes^2\]
pgamma(1, shape = 2, scale = 2)  # Formula A: alpha = 2, beta = 2
## [1] 0.09020401
pgamma(1, shape = 1, scale = 4)  # Formula B: alpha = 1, beta = 4
## [1] 0.2211992
Formula B has a higher probability of generating a human reaction in less than one minute than Formula A (0.221 versus 0.090).
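For reference, the two probabilities can be checked against closed-form gamma CDFs for integer shape, and the means and variances tabulated from \(\mu=\alpha\beta\) and \(\sigma^2=\alpha\beta^2\). This is a minimal sketch in base R using the `scale = beta` parametrisation of `pgamma()`.

```r
# Gamma CDFs at 1 minute, using R's scale = beta parametrisation
pA <- pgamma(1, shape = 2, scale = 2)   # Formula A: alpha = 2, beta = 2
pB <- pgamma(1, shape = 1, scale = 4)   # Formula B: alpha = 1, beta = 4 (an exponential)

# Closed forms: shape 2 -> 1 - exp(-y/beta) * (1 + y/beta); shape 1 -> 1 - exp(-y/beta)
pA_closed <- 1 - exp(-1 / 2) * (1 + 1 / 2)
pB_closed <- 1 - exp(-1 / 4)

# Means alpha*beta and variances alpha*beta^2
moments <- rbind(A = c(mean = 2 * 2, var = 2 * 2^2),
                 B = c(mean = 1 * 4, var = 1 * 4^2))

print(c(A = pA, B = pB))
print(moments)
```

Both formulas have the same mean of 4 minutes, so the difference in short reaction times comes from B's larger spread and its exponential shape, which puts more mass near zero.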
Time until a major repair is required (years) ~ Weibull(\(\alpha=2\), \(\beta=4\))
pweibull(2, shape = 2, scale = 2)  # R's scale = beta^(1/alpha) = 4^(1/2) = 2
## [1] 0.6321206
About 63.2% of new washers will require a major repair within the 2-year guarantee period.
alpha=2
beta=4
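# Weibull mean: beta^(1/alpha) * Gamma((alpha + 1) / alpha)
mean=((beta)^(1/alpha))*gamma((alpha+1)/alpha)
mean
## [1] 1.772454
# Weibull variance: beta^(2/alpha) * [Gamma((alpha + 2) / alpha) - Gamma((alpha + 1) / alpha)^2]
var= ((beta)^(2/alpha))*(gamma((alpha+2)/alpha)-gamma((alpha+1)/alpha)*gamma((alpha+1)/alpha))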
stdev=sqrt(var)
stdev
## [1] 0.9265028
pweibull(mean+2*stdev,2,2)-pweibull(mean-2*stdev,2,2)
## [1] 0.9625964
mean-2*stdev
## [1] -0.08055165
mean+2*stdev
## [1] 3.625459
About 96% of washers will need a major repair within \(\mu\pm2\sigma\), that is, before roughly 3.6 years. It is therefore very unlikely that a washer will go 6 years without a major repair.
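The 6-year claim can be made exact: under the textbook Weibull, \(P(Y>y)=e^{-y^\alpha/\beta}\), which R's `pweibull()` reproduces when its `scale` is set to \(\beta^{1/\alpha}\). A minimal sketch:

```r
alpha <- 2
beta  <- 4

# Textbook F(y) = 1 - exp(-y^alpha / beta) equals R's
# 1 - exp(-(y / s)^alpha) when s = beta^(1/alpha), here s = 2
r_scale <- beta^(1 / alpha)

# Probability that a washer goes more than 6 years without a major repair
p_gt_6   <- pweibull(6, shape = alpha, scale = r_scale, lower.tail = FALSE)
p_closed <- exp(-6^alpha / beta)   # exp(-36 / 4) = exp(-9)

print(c(pweibull = p_gt_6, closed_form = p_closed))
```

The probability is \(e^{-9}\approx 0.00012\), roughly one washer in eight thousand.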
Y~Beta(\(\alpha=2\), \(\beta=9\))
\[\mu=\frac{\alpha}{\alpha+\beta}=\frac{2}{2+9}=\frac{2}{11}\approx 0.18182\]
\[\sigma^2=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}=\frac{18}{121*12}\approx0.01239\]
1-pbeta(0.4,2,9)
## [1] 0.0463574
pbeta(0.1,2,9)
## [1] 0.2639011
Y~Weibull
\[f(y)=\left\{ \begin{array}{ll} \frac{1}{8}ye^{-y^2/16} & \quad 0 \leq y < \infty \\ 0 & \quad \text{elsewhere} \end{array} \right.\]
A Weibull density is given by \[f(y)=\left\{ \begin{array}{ll} \frac{\alpha}{\beta}y^{\alpha-1}e^{-y^\alpha/\beta} & \quad 0 \leq y < \infty \\ 0 & \quad \text{elsewhere} \end{array} \right.\]
Thus, it follows that \(\alpha=2\) and \(\beta=16\).
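Because R's `dweibull()` and `pweibull()` use \(F(y)=1-e^{-(y/s)^\alpha}\) with a scale \(s\), the textbook \(\beta\) has to be converted before calling them. The sketch below checks that \(s=\beta^{1/\alpha}=\sqrt{16}=4\) reproduces the density in this exercise on a grid of points, and then evaluates \(P(Y>6)\) with that scale.

```r
alpha <- 2
beta  <- 16

# Density as written in the exercise: f(y) = (1/8) y exp(-y^2 / 16)
f_exercise <- function(y) (1 / 8) * y * exp(-y^2 / 16)

# R's scale s satisfies (y / s)^alpha = y^alpha / beta, so s = beta^(1/alpha)
r_scale <- beta^(1 / alpha)   # 4

y <- seq(0, 12, by = 0.5)
max_diff <- max(abs(dweibull(y, shape = alpha, scale = r_scale) - f_exercise(y)))

# Survival past 6 under the matched scale: exp(-(6/4)^2) = exp(-2.25)
p_survive_6 <- pweibull(6, shape = alpha, scale = r_scale, lower.tail = FALSE)

print(c(r_scale = r_scale, max_diff = max_diff, p_survive_6 = p_survive_6))
```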
alpha=2
beta=16
mean=((beta)^(1/alpha))*gamma((alpha+1)/alpha)
mean
## [1] 3.544908
var= ((beta)^(2/alpha))*(gamma((alpha+2)/alpha)-gamma((alpha+1)/alpha)*gamma((alpha+1)/alpha))
var
## [1] 3.433629
1-pweibull(6,2,4)  # R's scale = beta^(1/alpha) = 16^(1/2) = 4
## [1] 0.1053992
There is about a 10.5% probability that the chip will not fail before 6 years.
X is the outcome (number of dots on the face) of the first die. Y is the outcome (number of dots on the face) of the second die.
Find \(p(x,y)\)
Each of the 36 ordered outcomes is equally likely, so \[p(x,y)=\frac{1}{36}, \quad x,y=1,2,\ldots,6\]
Find \(p_1(x)\) and \(p_2(y)\)
\[p_1(x)=\sum_{y=1}^{6}p(x,y)=6\cdot\frac{1}{36}=\frac{1}{6}, \quad x=1,2,\ldots,6\]
\[p_2(y)=\sum_{x=1}^{6}p(x,y)=6\cdot\frac{1}{36}=\frac{1}{6}, \quad y=1,2,\ldots,6\]
Find \(p_1(x|y)\) and \(p_2(y|x)\)
\[p_1(x|y)=\frac{p(x,y)}{p_2(y)}=\frac{\frac{1}{36}}{\frac{1}{6}}=\frac{6}{36}=\frac{1}{6}\]
\[p_2(y|x)=\frac{p(x,y)}{p_1(x)}=\frac{\frac{1}{36}}{\frac{1}{6}}=\frac{6}{36}=\frac{1}{6}\]
The conditional probabilities from part c equal the marginal probabilities from part b: \(p_1(x|y)=p_1(x)=\frac{1}{6}\) and \(p_2(y|x)=p_2(y)=\frac{1}{6}\). This is because X and Y are independent.
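The same results can be obtained by enumerating all 36 outcomes in base R: the joint table, both marginals, the conditional distribution of X given Y, and a cell-by-cell independence check \(p(x,y)=p_1(x)p_2(y)\). A minimal sketch:

```r
# All 36 equally likely ordered outcomes of two fair dice
outcomes <- expand.grid(x = 1:6, y = 1:6)
joint <- prop.table(table(outcomes$x, outcomes$y))   # p(x, y); rows = x, columns = y

p1 <- margin.table(joint, 1)             # p_1(x)
p2 <- margin.table(joint, 2)             # p_2(y)
cond_x_given_y <- prop.table(joint, 2)   # p_1(x | y): each column sums to 1

# Independence: p(x, y) = p_1(x) * p_2(y) in every cell
indep <- all(abs(joint - outer(p1, p2)) < 1e-12)
print(indep)
```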
tab <- matrix(c(1,2,3,4,5,6,7,3,1,3,2,3,3,2,1,1,3,1,2,2,1), ncol=3, byrow=FALSE)
colnames(tab) <- c('Particle ID','Energy Level','Time Period')
tab <- as.table(tab)
tab
## Particle ID Energy Level Time Period
## A 1 3 1
## B 2 1 1
## C 3 3 3
## D 4 2 1
## E 5 3 2
## F 6 3 2
## G 7 2 1
For a randomly selected particle, X represents the energy level and Y represents the time period.
The joint probability distribution of X and Y is given in the following table, where the columns are the values of X and the rows are the values of Y.
tab <- matrix(c(1/7,2/7,1/7,0,0,2/7,0,0,1/7), ncol=3, byrow=TRUE)
colnames(tab) <- c('1','2','3')
rownames(tab) <- c('1','2','3')
tab <- as.table(tab)
addmargins(tab)
## 1 2 3 Sum
## 1 0.1428571 0.2857143 0.1428571 0.5714286
## 2 0.0000000 0.0000000 0.2857143 0.2857143
## 3 0.0000000 0.0000000 0.1428571 0.1428571
## Sum 0.1428571 0.2857143 0.5714286 1.0000000
The marginal distribution \(p_1(x)\) is
tab <- matrix(c(1/7,2/7,4/7), ncol=3, byrow=TRUE)
colnames(tab) <- c('1','2','3')
rownames(tab) <- c('p_1(x)')
tab <- as.table(tab)
tab
## 1 2 3
## p_1(x) 0.1428571 0.2857143 0.5714286
The marginal distribution \(p_2(y)\) is
tab <- matrix(c(4/7,2/7,1/7), ncol=3, byrow=TRUE)
colnames(tab) <- c('1','2','3')
rownames(tab) <- c('p_2(y)')
tab <- as.table(tab)
tab
## 1 2 3
## p_2(y) 0.5714286 0.2857143 0.1428571
Find \(p_2(y|x)\) for the data.
# when x=1, there is one particle (B), with y=1
tab <- matrix(c(1,0,0), ncol=3, byrow=TRUE)
colnames(tab) <- c('1','2','3')
rownames(tab) <- c('p_2(y|1)')
tab <- as.table(tab)
tab
## 1 2 3
## p_2(y|1) 1 0 0
# when x=2, there are two particles (D and G), both with y=1
tab <- matrix(c(2/2,0,0), ncol=3, byrow=TRUE)
colnames(tab) <- c('1','2','3')
rownames(tab) <- c('p_2(y|2)')
tab <- as.table(tab)
tab
## 1 2 3
## p_2(y|2) 1 0 0
# when x=3, there are four particles (A, C, E and F)
tab <- matrix(c(1/4,2/4,1/4), ncol=3, byrow=TRUE)
colnames(tab) <- c('1','2','3')
rownames(tab) <- c('p_2(y|3)')
tab <- as.table(tab)
tab
## 1 2 3
## p_2(y|3) 0.25 0.50 0.25
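Rather than typing each probability by hand, all of these tables can be derived from the particle data with `table()`, `prop.table()` and `margin.table()`. The sketch below re-enters the energy levels and time periods of particles A to G from the data table, so the vector names `energy` and `period` are assumptions of this example.

```r
# Energy level (X) and time period (Y) for particles A-G, from the data table
energy <- c(A = 3, B = 1, C = 3, D = 2, E = 3, F = 3, G = 2)
period <- c(A = 1, B = 1, C = 3, D = 1, E = 2, F = 2, G = 1)

# Joint distribution with Y in the rows and X in the columns
counts <- table(Y = period, X = energy)
joint  <- prop.table(counts)

p1_x <- margin.table(joint, 2)             # marginal p_1(x): column sums
p2_y <- margin.table(joint, 1)             # marginal p_2(y): row sums
cond_y_given_x <- prop.table(counts, 2)    # p_2(y | x): each column sums to 1

print(addmargins(joint))
print(cond_y_given_x)
```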
Let \(X=\) low bid (thousands of dollars) and let \(Y=\) estimate of the fair cost of building the road (thousands of dollars). The joint probability density of X and Y is \[f(x,y)=\frac{e^{-y/10}}{10y}, \quad 0<y<x<2y\]
\[f_2(y)=\int_{-\infty}^\infty f(x,y)\, dx= \int_{y}^{2y} \frac{e^{-y/10}}{10y}\, dx= \frac{e^{-y/10}}{10y}\,[x]_{y}^{2y}=\frac{e^{-y/10}}{10}, \quad y>0\]
The resulting distribution is an Exponential Distribution.
Y~Exp(\(\beta=10\))
\[\mu_y=E(Y)=\beta=10\]
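A numerical check of this marginal, assuming the joint density has exponent \(-y/10\) (the form that makes \(f_2(y)\) the Exp(\(\beta=10\)) density): integrate \(x\) out over \((y,2y)\) with `integrate()` at a few values of \(y\), compare with `dexp()`, and recover \(E(Y)\).

```r
# Assumed joint density on 0 < y < x < 2y. It does not depend on x, so the
# value is replicated to the length of x, since integrate() passes vectors.
f_joint <- function(x, y) rep(exp(-y / 10) / (10 * y), length(x))

ys <- c(0.5, 3, 12)
# Integrate x out over (y, 2y) for each y
f2_numeric <- sapply(ys, function(y) integrate(function(x) f_joint(x, y), y, 2 * y)$value)
f2_closed  <- dexp(ys, rate = 1 / 10)    # exp(-y/10) / 10

# Mean of Y from the marginal density
EY <- integrate(function(y) y * dexp(y, rate = 1 / 10), 0, Inf)$value

print(rbind(numeric = f2_numeric, closed = f2_closed))
print(EY)
```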
The joint density of X, the total time (in minutes) between an automobile’s arrival in the service queue and its leaving the system after servicing, and Y, the time (in minutes) the car waits in the queue before being serviced, is
\[f(x,y)=\left\{ \begin{array}{ll} ce^{-x^2} & \quad 0 \leq y \leq x; 0 \leq x <\infty \\ 0 & \quad elsewhere \end{array} \right.\]
\[\int_{-\infty}^\infty\int_{-\infty}^\infty f(x,y)\,dy\,dx=\int_{0}^\infty\int_{0}^x ce^{-x^2}\,dy\,dx=\int_{0}^\infty cxe^{-x^2}\,dx=\left[-\frac{c}{2}e^{-x^2}\right]_{0}^\infty=\frac{1}{2}c=1\]
Thus, \(c=2\).
\[f_1(x)=\int_{-\infty}^\infty f(x,y) dy= \int_{0}^x 2e^{-x^2}dy=2xe^{-x^2}\]
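Substituting \(u=-x^2\), \(du=-2x\,dx\): \[\int_{-\infty}^\infty f_1(x)\, dx= \int_{0}^\infty 2xe^{-x^2}\,dx=-\int_{0}^{-\infty} e^u\, du= \int_{-\infty}^0 e^u\,du=\left[e^u\right]_{-\infty}^0=1-0=1\]
\[f_2(y|x)=\frac{f(x,y)}{f_1(x)}=\frac{2e^{-x^2}}{2xe^{-x^2}}=\frac{1}{x}, \quad 0\leq y\leq x\]
so, given \(X=x\), the waiting time Y is uniformly distributed on \([0,x]\).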
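The same constants can be confirmed numerically with base R's `integrate()`: the inner integral over \(y\) leaves \(c\,xe^{-x^2}\), whose integral is \(c/2\); the marginal \(2xe^{-x^2}\) integrates to 1; and the conditional density \(1/x\) integrates to 1 over \([0,x]\) (checked here at an arbitrary \(x=1.5\)).

```r
# Integrating c * exp(-x^2) over 0 <= y <= x leaves c * x * exp(-x^2)
inner <- function(x) x * exp(-x^2)

total_without_c <- integrate(inner, 0, Inf)$value   # should be 1/2
c_const <- 1 / total_without_c                     # normalising constant, about 2

# Marginal f_1(x) = 2 x exp(-x^2) should integrate to 1
f1 <- function(x) 2 * x * exp(-x^2)
f1_total <- integrate(f1, 0, Inf)$value

# Conditional f_2(y | x) = 1/x on [0, x] integrates to 1 (checked at x = 1.5)
x0 <- 1.5
cond_total <- integrate(function(y) rep(1 / x0, length(y)), 0, x0)$value

print(c(c = c_const, f1_total = f1_total, cond_total = cond_total))
```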
As an illustration of why the converse of Theorem 6.6 is not true, consider the joint distribution of two discrete random variables, X and Y, shown in the accompanying table. Show that \(Cov(X,Y)=0\), but that X and Y are dependent.
\[Cov(x,y)=E(XY)-E(X)E(Y)\]
\[E(XY)=\sum_x\sum_y xyp(x,y)=0\]
\[E(X)=\sum_x xp_1(x)=0\]
\[E(Y)=\sum_y yp_2(y)=0\]
Thus \(Cov(X,Y)=0\), but \(p(x,y) \neq p_1(x)p_2(y)\) for at least one pair \((x,y)\). Therefore, X and Y are dependent.
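The exercise's own table is not reproduced here, so the sketch below uses a different, hypothetical joint distribution to show the same phenomenon: X uniform on \(\{-1,0,1\}\) and \(Y=X^2\). The covariance is zero because \(E(XY)=E(X^3)=0\) and \(E(X)=0\), yet Y is completely determined by X.

```r
# Hypothetical example (not the exercise's table): X uniform on {-1, 0, 1}, Y = X^2
x_vals <- c(-1, 0, 1)
y_vals <- x_vals^2
p_xy   <- rep(1 / 3, 3)          # probability of each (x, y) pair

EX  <- sum(x_vals * p_xy)            # 0
EY  <- sum(y_vals * p_xy)            # 2/3
EXY <- sum(x_vals * y_vals * p_xy)   # E(X^3) = 0
cov_xy <- EXY - EX * EY              # 0

# Dependence check at (x, y) = (0, 0): p(0, 0) = 1/3, but p_1(0) * p_2(0) = 1/9
p_00 <- 1 / 3
p1_0 <- 1 / 3
p2_0 <- sum(p_xy[y_vals == 0])       # P(Y = 0) = 1/3

print(c(cov = cov_xy, joint = p_00, product = p1_0 * p2_0))
```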
Y~Unif(a=1, b=3), n=60, \(\bar Y=\frac{\sum_{i=1}^n Y_i}{n}\)
\[E(\bar Y)=\frac{n\mu_Y}{n}=\mu_Y=2\]
\[V(\bar Y)=(\frac{1}{n})^2(n\sigma_Y^2)=\frac{\sigma_Y^2}{n}=\frac{(3-1)^2}{12n}\approx 0.005556\]
By the Central Limit Theorem, the sampling distribution of \(\bar Y\) is approximately normal.
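# pnorm() takes the standard deviation, so use sqrt(V(Ybar)) = sqrt(4/720)
pnorm(2.5,2,sqrt(4/720))-pnorm(1.5,2,sqrt(4/720))
## [1] 1
1-pnorm(2.2,2,sqrt(4/720))
# result: approximately 0.0036 (z = 0.2/sqrt(4/720), about 2.68)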
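A short simulation can confirm the Central Limit Theorem result: draw many samples of size 60 from Uniform(1, 3), and compare the mean and variance of \(\bar Y\) with \(2\) and \(4/720\), and the simulated \(P(\bar Y>2.2)\) with the normal approximation. The seed and number of replications are arbitrary choices for this sketch.

```r
set.seed(2024)                    # arbitrary seed for reproducibility
n    <- 60
reps <- 20000

# Sample means of n = 60 draws from Uniform(1, 3)
ybar <- replicate(reps, mean(runif(n, min = 1, max = 3)))

theory_var <- (3 - 1)^2 / (12 * n)   # 4/720
sim <- c(mean = mean(ybar),
         var = var(ybar),
         p_gt_2.2 = mean(ybar > 2.2),
         normal_p_gt_2.2 = pnorm(2.2, 2, sqrt(theory_var), lower.tail = FALSE))
print(sim)
```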
Y~Bin(n=20,p=0.4)
Consider a random sample of 20 swordfish pieces from New York and Chicago supermarkets.
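With the continuity correction, the normal approximation is \[P(Y_d \leq 2)\approx P(Y_c \leq 2.5)\] where \(Y_d\) is the (discrete) binomial variable and \(Y_c\) is the approximating normal variable with \(\mu=np\) and \(\sigma=\sqrt{np(1-p)}\).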
n=20
p=0.4
mean=n*p
stdev=sqrt(n*p*(1-p))
pnorm(2.5,mean,stdev)
## [1] 0.006029808
1-pnorm(10.5,mean,stdev)
## [1] 0.1269165
pbinom(2,20,0.4)
## [1] 0.003611472
1-pbinom(10,20,0.4)
## [1] 0.1275212
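The exact and approximate results above can be placed side by side in one table. This sketch recomputes the same four quantities and their absolute differences; it introduces no new data.

```r
n  <- 20
p  <- 0.4
mu <- n * p
s  <- sqrt(n * p * (1 - p))

comparison <- rbind(
  le_2  = c(exact = pbinom(2, n, p),
            normal_cc = pnorm(2.5, mu, s)),
  ge_11 = c(exact = pbinom(10, n, p, lower.tail = FALSE),
            normal_cc = pnorm(10.5, mu, s, lower.tail = FALSE))
)
comparison <- cbind(comparison,
                    abs_error = abs(comparison[, "normal_cc"] - comparison[, "exact"]))
print(round(comparison, 4))
```

The continuity-corrected approximation is close in the upper tail (about 0.127 versus 0.128), but in the far lower tail it is roughly 1.7 times the exact value, which is where the normal approximation to a skewed binomial is weakest.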
lc <- read.csv("LEADCOPP.csv")
lead <- lc$LEAD
t.test(lead,conf.level=0.99)
##
## One Sample t-test
##
## data: lead
## t = 2.325, df = 9, p-value = 0.04512
## alternative hypothesis: true mean is not equal to 0
## 99 percent confidence interval:
## -1.147845 6.919045
## sample estimates:
## mean of x
## 2.8856
lc <- read.csv("LEADCOPP.csv")
copp <- lc$COPPER
t.test(copp,conf.level=0.99)
##
## One Sample t-test
##
## data: copp
## t = 5.1746, df = 9, p-value = 0.0005836
## alternative hypothesis: true mean is not equal to 0
## 99 percent confidence interval:
## 0.1518726 0.6647274
## sample estimates:
## mean of x
## 0.4083
We are 99% confident that the true mean lead level lies between -1.148 and 6.919.
We are 99% confident that the true mean copper level lies between 0.152 and 0.665.
The 99% describes the procedure: over a large number of samples, about 99% of the intervals constructed this way would contain the true mean, and about 1% would not.
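To show where these intervals come from, the sketch below rebuilds them from the printed `t.test()` output alone: because the test is against \(\mu=0\), the standard error is \(\bar y/t\), and the interval is \(\bar y \pm t_{0.005,9}\cdot SE\). The helper `ci_from_ttest()` is a hypothetical name for this example, and the inputs are the rounded printed values, so the result matches only to about three decimals.

```r
# Hypothetical helper: rebuild a one-sample t interval from printed output.
# SE = mean / t (test against mu = 0); interval = mean +/- t_{(1-conf)/2, df} * SE
ci_from_ttest <- function(xbar, t_stat, df, conf = 0.99) {
  se     <- xbar / t_stat
  margin <- qt(1 - (1 - conf) / 2, df) * se
  c(lower = xbar - margin, upper = xbar + margin)
}

lead_ci   <- ci_from_ttest(2.8856, 2.325, 9)
copper_ci <- ci_from_ttest(0.4083, 5.1746, 9)
print(rbind(lead = lead_ci, copper = copper_ci))
```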
sola <- read.csv("SOLARAD.csv")
mo <- sola$STJOS
t.test(mo,conf.level=0.95)
##
## One Sample t-test
##
## data: mo
## t = 8.4147, df = 6, p-value = 0.0001535
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 889.0457 1618.0971
## sample estimates:
## mean of x
## 1253.571
iowa <- sola$IOWA
t.test(iowa,conf.level=0.95)
##
## One Sample t-test
##
## data: iowa
## t = 6.6258, df = 6, p-value = 0.0005696
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 665.7471 1445.3958
## sample estimates:
## mean of x
## 1055.571
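We are 95% confident that the true mean solar irradiation lies between 889.0 and 1618.1 for the first city (STJOS) and between 665.7 and 1445.4 for the second (IOWA). In repeated sampling, about 95% of intervals constructed this way would contain the true mean.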
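The repeated-sampling interpretation can be demonstrated by simulation: draw many normal samples of size 7 (matching df = 6 above), build a 95% t interval from each, and count how often the interval covers the true mean. The population mean and standard deviation below are hypothetical values chosen only for illustration.

```r
set.seed(7)                # arbitrary seed
true_mean <- 1000          # hypothetical population mean
true_sd   <- 300           # hypothetical population standard deviation
n    <- 7                  # same sample size as the solar data (df = 6)
reps <- 5000

covered <- replicate(reps, {
  x  <- rnorm(n, true_mean, true_sd)
  ci <- t.test(x, conf.level = 0.95)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
})
coverage <- mean(covered)
print(coverage)
```

The coverage should come out close to 0.95. Any single interval either contains the true mean or does not; the 95% refers to the long-run proportion.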
dia <- read.csv("DIAZINON.csv")
day <- dia$DAY
night <- dia$NIGHT
t.test(day,conf.level=0.9)
##
## One Sample t-test
##
## data: day
## t = 4.1995, df = 10, p-value = 0.00183
## alternative hypothesis: true mean is not equal to 0
## 90 percent confidence interval:
## 7.637401 19.235326
## sample estimates:
## mean of x
## 13.43636
t.test(night,conf.level = 0.9)
##
## One Sample t-test
##
## data: night
## t = 4.693, df = 10, p-value = 0.0008506
## alternative hypothesis: true mean is not equal to 0
## 90 percent confidence interval:
## 32.12916 72.56175
## sample estimates:
## mean of x
## 52.34545
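We are 90% confident that the true mean diazinon level lies between 7.64 and 19.24 during the day and between 32.13 and 72.56 at night; in repeated sampling, about 90% of intervals constructed this way would contain the true mean.
Yes. The two 90% confidence intervals do not overlap, which suggests that the mean diazinon level is higher at night than during the day.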