HW 2 Solutions

We are given $P(C)=0.6$ and $P(C \cap J)=0.2$.

$P(J|C)=\frac{P(J\cap C)}{P(C)}=\frac{0.2}{0.6}=\frac{1}{3}$
Given $P(C|J)=0.4$, want $P(J)$. Since

\[0.4 = P(C|J)=\frac{P(C\cap J)}{P(J)}=\frac{0.2}{P(J)},\] we get \[P(J)=0.5\]

2. Denote “Left-handed” by “LH”, “Red-Green color-blinded” by “RG”, “Blue-Yellow color-blinded” by “BY”, “Completely color-blinded” by “Complete”, and “Any kind color-blinded” by “Color”. We are given $P(LH) = 0.11, P(RG)=0.0560, P(BY)=0.0224$, and $P(Complete)=0.0016$. We also know $P(Color)=0.0560+0.0224+0.0016=0.08$.

$P(No ~color-blindness)=1-0.0560-0.0224-0.0016=0.92$
Given $P(LH\cap Color)=0.0091$, want $P(LH \cup Color)$. By the addition rule, $P(LH\cup Color)=P(LH)+P(Color)-P(LH\cap Color)=0.11 + 0.08-0.0091=0.1809$
$P(RH\cap Color)=P(RH)\cdot P(Color)=(1-0.11)\cdot (0.08)=0.0712.$
Want $P(Color|LH)$. Since “Color” and “LH” are independent, $P(Color|LH)=P(Color)=0.08$.

HW 4

Suppose $f(x)=2e^{-2x}, x>0$ is a probability density function of a random variable $X$. Determine

$P(X< 1)$

$P(X>1)$

$P(1<X<2)$

the mean $\mu = E(X)$, and

the standard deviation $\sigma$

Solution.

$P(X<1)=\int_0^1 2e^{-2x}dx=1-e^{-2}=0.8647$
$P(X>1)=\int_1^{\infty}2e^{-2x}dx=e^{-2}=0.1353$
$P(1<X<2)=\int_1^{2}2e^{-2x}dx=e^{-2}-e^{-4}=0.1170$
Since the distribution is exponential with rate $\lambda=2$, the mean is $\mu = \frac{1}{\lambda}=\frac{1}{2}$, and
the standard deviation is the same as the mean for exponential distributions, so $\sigma=\frac{1}{2}$.
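
As a quick check of the answers above, here is a minimal R sketch. It assumes the density is recognized as an exponential distribution with rate 2, so that the built-in `pexp` can be used; the variable names are only illustrative.

```r
# Exponential distribution with rate 2, i.e. f(x) = 2 * exp(-2 * x) for x > 0
rate <- 2

p_less_1 <- pexp(1, rate = rate)                          # P(X < 1)
p_greater_1 <- pexp(1, rate = rate, lower.tail = FALSE)   # P(X > 1)
p_between <- pexp(2, rate = rate) - pexp(1, rate = rate)  # P(1 < X < 2)

# For an exponential distribution the mean and the standard deviation are both 1 / rate
mu <- 1 / rate
sigma <- 1 / rate

round(c(p_less_1, p_greater_1, p_between), 4)
```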

Suppose $f(x)=\frac{1}{3}x^2, -1<x<2$ is a probability density function of a random variable $X$. Determine

$P(X< 1)$

$P(X>1)$

$P(0<X<1)$

the mean $\mu = E(X)$,

the standard deviation $\sigma$, and

$x$ such that $P(x<X)=0.05$

Solution.

$P(X<1)=\int_{-1}^1 \frac{1}{3}x^2dx=\frac{2}{9}$
$P(X>1)=\int_1^{2}\frac{1}{3}x^2dx=\frac{7}{9}$
$P(0<X<1)=\int_0^{1}\frac{1}{3}x^2dx=\frac{1}{9}$
$\mu = \int_{-1}^2 x\cdot \frac{1}{3}x^2dx=\frac{5}{4}$, and
The variance is $\sigma^2=\int_{-1}^2 x^2\cdot \frac{1}{3}x^2dx-\mu^2=\frac{11}{5}-\frac{25}{16}=\frac{51}{80}$, so the standard deviation is $\sqrt{\frac{51}{80}}\approx 0.7984$.
Since $x$ must lie between $-1$ and 2, $P(x<X)=\int_x^{2}\frac{1}{3}t^2dt=\frac{8-x^3}{9}$. Setting $\frac{8-x^3}{9}=0.05$ yields $x\approx 1.9618$.
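
The integrals in this solution can also be checked numerically. The following is a sketch using base R's `integrate` and `uniroot`; the helper name `f` and the other variable names are illustrative choices, not part of the original solution.

```r
# Density f(x) = x^2 / 3 on (-1, 2), zero elsewhere
f <- function(x) x^2 / 3

total <- integrate(f, -1, 2)$value                       # total probability
p_less_1 <- integrate(f, -1, 1)$value                    # P(X < 1)
mu <- integrate(function(x) x * f(x), -1, 2)$value       # E(X)
ex2 <- integrate(function(x) x^2 * f(x), -1, 2)$value    # E(X^2)
sigma <- sqrt(ex2 - mu^2)                                # standard deviation

# Solve P(X > q) = (8 - q^3) / 9 = 0.05 for q on (-1, 2)
q <- uniroot(function(q) (8 - q^3) / 9 - 0.05, c(-1, 2), tol = 1e-9)$root

c(total = total, p_less_1 = p_less_1, mu = mu, sigma = sigma, q = q)
```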

Suppose the cumulative distribution function of $X$ is

\[\begin{equation} F(x)= \begin{cases} 0, & x\le -1 \\ \frac{x+1}{3}, & -1<x<2\\ 1, & x\ge 2\\ \end{cases} \end{equation}\]

find

$P(X>1)$

$P(X<1.4)$

$P(0<X<1)$

$P(X>3)$

Solution.

$P(X>1)=1-P(X\le 1)=1-F(1)=1-\frac{1+1}{3}=\frac{1}{3}$
$P(X<1.4)\stackrel{\text{since X is continuous random variable}}= P(X\le 1.4)=F(1.4)=0.8$
$P(0<X<1)\stackrel{\text{since X is continuous random variable}}=F(1)-F(0)=\frac{2}{3}-\frac{1}{3}=\frac{1}{3}$
$P(X>3)=1-P(X\le 3)=1-F(3)=1-1=0$
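
The same four probabilities can be read off a small R function for the CDF. This is only a sketch; the function name `cdf` is an illustrative choice.

```r
# Piecewise CDF: 0 for x <= -1, (x + 1) / 3 for -1 < x < 2, 1 for x >= 2
cdf <- function(x) ifelse(x <= -1, 0, ifelse(x < 2, (x + 1) / 3, 1))

p_a <- 1 - cdf(1)        # P(X > 1)
p_b <- cdf(1.4)          # P(X < 1.4), same as P(X <= 1.4) for a continuous X
p_c <- cdf(1) - cdf(0)   # P(0 < X < 1)
p_d <- 1 - cdf(3)        # P(X > 3)

c(p_a, p_b, p_c, p_d)
```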

Suppose $f(x)=2e^{-2x}, x>0$ is a probability density function of a random variable $X$. Determine its cumulative distribution function $F(x)$.

Solution.

For $x \le 0$, $F(x)=0$; for $x >0$, $F(x)=\int_0^x 2e^{-2t}dt=1-e^{-2x}$.

\[\begin{equation} F(x)= \begin{cases} 0, & x\le 0 \\ 1-e^{-2x}, & x>0\\ \end{cases} \end{equation}\]

Suppose $X$ is normally distributed with mean 100 and standard deviation 15. Find

$P(X<90)$.

$P(X>120)$

Solution.

$P(X<90)=P(X-\mu<90-\mu)=P(\frac{X-\mu}{\sigma}<\frac{90-\mu}{\sigma})=P(Z<-0.67)=0.2514$.
$P(X>120)=P(\frac{X-\mu}{\sigma}>\frac{120-\mu}{\sigma})=P(Z>1.33)=1-P(Z\le 1.33)=1-0.9082=0.0918$
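
These normal probabilities can be computed without a table using `pnorm`. The sketch below uses the unrounded $z$ values, so its results may differ from table-based answers in the third or fourth decimal place.

```r
mu <- 100
sigma <- 15

p_below_90 <- pnorm(90, mean = mu, sd = sigma)                        # P(X < 90)
p_above_120 <- pnorm(120, mean = mu, sd = sigma, lower.tail = FALSE)  # P(X > 120)

round(c(p_below_90, p_above_120), 4)
```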

The time between calls is exponentially distributed with mean 15 minutes. Find

the probability that there is no call within 25 minutes.

the probability that there is at least one call within 10 minutes.

the probability that the first call arrives within 15 to 20 minutes after opening.

the length of an interval of time such that the probability of at least one call is in the interval is 0.8. Round to two decimal places.

Solution.

Let $X$ denote the time between calls. The probability density function of $X$ is \[f(x) = \frac{1}{15}e^{-\frac{1}{15}x}, ~~ x>0\] The cumulative distribution function is

\[F(x) = 1-e^{-\frac{1}{15}x}, ~~ x>0\]

$P(X>25)=1-P(X\le 25)=1-F(25)=e^{-\frac{25}{15}}=0.1889$

$P(X\le 10)=F(10)=1-e^{-\frac{10}{15}}=0.4866$

$P(15\le X\le 20)=F(20)-F(15)=e^{-\frac{15}{15}}-e^{-\frac{20}{15}}=0.3679-0.2636=0.1043$

Suppose that the interval is (0, b). We need to solve the equation $P(X<b)=0.8$. Since $P(X<b)=F(b)=1-e^{-\frac{1}{15}b}$, solving the equation $1-e^{-\frac{1}{15}b}=0.8$ gives $b=24.14$
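
The four answers can be verified with `pexp` and `qexp`. This is a minimal sketch; the variable names are illustrative.

```r
rate <- 1 / 15  # exponential with mean 15 minutes

p_no_call_25 <- pexp(25, rate, lower.tail = FALSE)  # no call within 25 minutes
p_call_10 <- pexp(10, rate)                         # at least one call within 10 minutes
p_15_to_20 <- pexp(20, rate) - pexp(15, rate)       # first call between 15 and 20 minutes
b <- qexp(0.8, rate)                                # interval length with P(X < b) = 0.8

round(c(p_no_call_25, p_call_10, p_15_to_20, b), 4)
```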

HW 5

Given the joint probability mass function

$x$	$y$	$f(x,y)$
-1	0	1/6
2	1	1/3
3	-2	1/4
2	-2	1/4

Determine the marginal probability mass function of $X$.
Determine the marginal probability mass function of $Y$.
Determine the mean and variance of $X$.
Determine the mean and variance of $Y$.
Determine the covariance between $X$ and $Y$.
Determine the correlation between $X$ and $Y$.
Determine $P(X<2.5, Y>-1)$.

Solution.

To determine the marginal distribution of $X$, we first have

$P(X=-1)=1/6$, $P(X=2)=1/3+1/4=7/12$, and $P(X=3)=1/4$. So the marginal probability mass function of $X$ is

$x$	$f_X(x)$
-1	1/6
2	7/12
3	1/4

To determine the marginal distribution of $Y$, we first have

$P(Y=0)=1/6$, $P(Y=1)=1/3$, and $P(Y=-2)=1/4+1/4=1/2$. So the marginal probability mass function of $Y$ is

$y$	$f_Y(y)$
0	1/6
1	1/3
-2	1/2

The mean of $X$ is $E(X)=(-1)(1/6)+(2)(7/12)+(3)(1/4)=1.75$. The variance is $(-1)^2(1/6)+(2)^2(7/12)+(3)^2(1/4)-1.75^2=1.6875$.
The mean of $Y$ is $E(Y)=(0)(1/6)+(1)(1/3)+(-2)(1/2)=-2/3$. The variance is $(0)^2(1/6)+(1)^2(1/3)+(-2)^2(1/2)-(-2/3)^2=17/9$.
To calculate the covariance, we need to find $E(XY)$ first. \[E(XY)=(-1)(0)(1/6)+(2)(1)(1/3)+(3)(-2)(1/4)+(2)(-2)(1/4)=-11/6\]

The covariance is

\[Cov(X,Y)=E(XY)-E(X)E(Y)=-11/6-(1.75)(-2/3)=-2/3\]

The correlation is

\[\rho = \frac{Cov(X,Y)}{\sqrt{Var(X)}\sqrt{Var(Y)}}=\frac{-2/3}{\sqrt{1.6875}\sqrt{17/9}}=-0.373408\]

There are two pairs of (X, Y) satisfying the condition $X<2.5, Y>-1$. These pairs are $(-1,0)$ and $(2,1)$. So the desired probability is the sum of the corresponding probabilities, or $1/6+1/3=1/2$.
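
The whole solution can be reproduced in R from the joint probability mass function. The sketch below takes the fourth support point to be $(2,-2)$, which is the value used in the computations of the solution (this is an assumption about the intended table); the variable names are illustrative.

```r
# Support points and joint probabilities; last point assumed to be (2, -2)
x <- c(-1, 2, 3, 2)
y <- c(0, 1, -2, -2)
p <- c(1/6, 1/3, 1/4, 1/4)

f_X <- tapply(p, x, sum)  # marginal pmf of X
f_Y <- tapply(p, y, sum)  # marginal pmf of Y

EX <- sum(x * p); VX <- sum(x^2 * p) - EX^2
EY <- sum(y * p); VY <- sum(y^2 * p) - EY^2

cov_xy <- sum(x * y * p) - EX * EY   # Cov(X, Y) = E(XY) - E(X)E(Y)
rho <- cov_xy / sqrt(VX * VY)        # correlation

prob <- sum(p[x < 2.5 & y > -1])     # P(X < 2.5, Y > -1)
```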

HW 6

The following R code might be useful.

# Create a data vector or array and store the data in x
x = c(23, 45, 12, 9, 15, 42, 40, 22, 25, 60, 28, 52, 44)
y = c(45, 85, 30, 17, 34, 86, 85, 50, 48, 115, 64, 100, 90)

# Calculate sample mean
mean(x)
# Calculate sample variance
var(x)

# Calculate sample standard deviation
sd(x)

# Calculate median
median(x)

# Calculate sample correlation between x and y
cor(x,y)

# Create histogram
hist(x)

# Create boxplot
boxplot(x)

# Create stem-and-leaf plot
stem(x)

# Create a scatter plot
plot(y~x)

Exam #1 Review

Some useful formulas:

$P(A \cup B)=P(A)+P(B)-P(A\cap B)$.
For a discrete random variable, the mean $\mu=E(X)=\sum x_i p_i$ and variance $\sigma^2=V(X)=\sum x_i^2 p_i-\mu^2$.
For a continuous random variable, the mean $\mu=E(X)=\int_{-\infty}^{\infty}xf(x)dx$ and variance $\sigma^2=V(X)=\int_{-\infty}^{\infty}x^2f(x)dx-\mu^2$.
For two discrete random variables, the covariance $cov(X,Y)=E(XY)-E(X)E(Y)$ and correlation $\rho=\frac{cov(X,Y)}{\sigma_X\cdot \sigma_Y}$.
$E(aX)=aE(X)$, $E(X+c)=E(X)+c$
$V(aX)=a^2V(X)$, $V(X+c)=V(X)$
If $X$ and $Y$ are two independent random variables and $a~ \& ~b$ are constants, then $V(aX+bY)=a^2V(X)+b^2V(Y)$.
The sample variance of a sample is $s^2 = \frac{\sum_{i=1}^{n}(x_i -\bar{x})^2}{n-1}$.
The sample correlation between two quantitative variables is $r = \frac{\sum_{i=1}^{n}x_i y_i-n\bar{x}\bar{y} }{\sqrt{\sum x_i^2-n\bar{x}^2}\sqrt{\sum y_i^2-n\bar{y}^2}}$.
The binomial distribution has the probability mass function: $P(X=x)=\binom{n}{x}p^x (1-p)^{n-x}, ~~x = 0, 1, 2, \dots, n$
The Poisson distribution has the probability mass function: $P(X=x)=\frac{\lambda^x}{x!} e^{-\lambda}, ~~x = 0, 1, 2, \dots$
The geometric distribution has the probability mass function: $P(x)=(1-p)^{x-1}p, ~~x = 1, 2, \cdots$
The exponential distribution has the probability density function $f(x)=\lambda e^{-\lambda x}, ~ x>0$. The cumulative distribution function is $F(x)=1-e^{-\lambda x}, ~ x>0$. The mean and the standard deviation are both $\frac{1}{\lambda}$.
The uniform distribution has the probability density function $f(x)=\frac{1}{b-a}, ~ a<x<b$. The mean is $\frac{a+b}{2}$ and the standard deviation is $\frac{b-a}{\sqrt{12}}$.
The conditional probability is defined as $P(B|A)=\frac{P(A\cap B)}{P(A)}$, where $P(A)>0$.
If $X$ is continuous random variable with the probability density function $f(x)$, then $P(a<X<b)=\int_a^b f(x)dx$.
If $X\sim\text{N}(\mu_1, \sigma_1^2)$ and $Y\sim\text{N}(\mu_2, \sigma_2^2)$ are independent, then $X+Y\sim\text{N}(\mu_1+\mu_2, \sigma_1^2+\sigma_2^2)$.

Some typical problems:

If events $A$ and $B$ are independent with $P(A)=0.2$ and $P(B)=0.6$, then $P(A \cap B)=(0.2)(0.6)=0.12$.
If events $A$ and $B$ are disjoint (meaning that they can’t happen simultaneously) with $P(A)=0.2$ and $P(B)=0.6$, then $P(A \cup B)=0.2+0.6=0.8$.
If $P(X<10)=0.2$ and $P(X<20)= 0.6$, then $P(10\le X<20)=0.6-0.2=0.4$.
A series system consists of 5 components, each functioning independently with probability 0.9. What is the probability that the system functions? Answer: $0.9^5=0.59049$.
If $X\sim\text{N}(\mu_1=100, \sigma_1^2=200)$ and $Y\sim\text{N}(\mu_2=120, \sigma_2^2=25)$ are independent, then $P(X+Y>250)=?$. Answer: Since $X+Y\sim\text{N}(\mu=220, \sigma^2=225)$, $P(X+Y>250)=P(Z>\frac{250-220}{15})=P(Z>2)=1-P(Z\le 2)=1-0.9772=0.0228.$
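
The last problem can be checked in R. This is a small sketch that relies on the rule that a sum of independent normal variables is normal with the means and variances added; the variable names are illustrative.

```r
# X ~ N(100, 200) and Y ~ N(120, 25), independent
mu_sum <- 100 + 120        # mean of X + Y
sd_sum <- sqrt(200 + 25)   # standard deviation of X + Y

p <- pnorm(250, mean = mu_sum, sd = sd_sum, lower.tail = FALSE)  # P(X + Y > 250)
round(p, 4)
```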

Chapter 9

We will provide examples

for testing $\mu$ using the 1-sample z test approach when $\sigma$ is known
for testing $\mu$ using the 1-sample t test approach when $\sigma$ is unknown
for testing $p$ using the 1-sample z test approach
for testing goodness of fit using the chi-square approach
for testing independence between two categorical variables using the chi-square approach

Testing $\mu$ using the 1-sample z test approach when $\sigma$ is known

Example 1. A two-sided test for a population mean with $\sigma$ known.

https://www.youtube.com/watch?v=BWJRsY-G8u0

Example 2. A left-sided test for a population mean with $\sigma$ known.

https://www.youtube.com/watch?v=oEW8Hd_xy1k

Example 3. A right-sided test for a population mean with $\sigma$ known.

We want to test whether a population mean is greater than 20. A random sample of size 36 gives a sample mean of 22. If the population standard deviation is 5, test, at level 0.05, whether the population mean exceeds 20.

Solution.

The null and alternative hypotheses are:

\[H_0:\mu = 20 ~~~ vs ~~~ H_a: \mu > 20\] The test statistic value is

\[z_0=\frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}=\frac{22-20}{5/\sqrt{36}}=2.4\]

Since a larger sample mean or a larger $z_0$ suggests rejection of the null hypothesis, the rejection region looks like $(c, \infty)$ with the critical value $c=z_{\alpha}$. By the standard normal table or the R code $qnorm(1-\alpha)$, $c=1.645$.

Since the test statistic value falls in the rejection region, reject the null hypothesis.

Equivalently, we can use the $p$-value approach. The $p$-value is the area to the right of the statistic value under the standard normal curve. By the standard normal table or the R code $1-pnorm(2.4)$, the $p$-value is 0.0082. Since the $p$-value is less than the significance level, reject the null hypothesis.
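
The steps of Example 3 can be sketched in R as follows. The variable names are illustrative; only `qnorm` and `pnorm` from base R are used.

```r
xbar <- 22; mu0 <- 20; sigma <- 5; n <- 36; alpha <- 0.05

z0 <- (xbar - mu0) / (sigma / sqrt(n))     # test statistic
crit <- qnorm(1 - alpha)                   # right-tail critical value
p_value <- pnorm(z0, lower.tail = FALSE)   # area to the right of z0

reject <- z0 > crit                        # same decision as p_value < alpha
c(z0 = z0, crit = crit, p_value = p_value)
```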

Testing $\mu$ using the 1-sample t test approach when $\sigma$ is unknown

Example 1.

A very useful video: https://www.youtube.com/watch?v=VPd8DOL13Iw

Example 2.

Your company wants to improve sales. Past sales data indicate that the average sale was \$100 per transaction. After training your sales force, recent sales data (taken from a random sample of 25 salesmen) indicate an average of \$130, with a standard deviation of \$15. Did the training work? Test your hypothesis at a 0.05 significance level.

Solution.

The population mean $\mu$ is the parameter of interest. To test whether sales have improved, we set up the null and alternative hypotheses as follows:

\[H_0: \mu=100 ~~~ vs ~~~ H_a: \mu>100\] The value of the test statistic is

\[t_0 = \frac{\bar{x}-\mu_0}{s/\sqrt{n}}=\frac{130-100}{15/\sqrt{25}}=10\] with $n-1$ or 24 degrees of freedom.

Since larger $\bar{x}$’s or $t_0$’s suggest rejection of the null hypothesis, the rejection (or critical) region looks like $(c, \infty)$, where $c=t_{\alpha, n-1}$. We are given $\alpha=0.05$, so the critical value based on the $t_{24}$ distribution is 1.711, which is obtained by R code $qt(1-\alpha, n-1)$ or by a $t$-table.

Since the test statistic value 10 falls in the rejection region, we reject the null hypothesis.

Equivalently, we can calculate the $p$-value, which is the area under the $t_{24}$ distribution to the right of the test statistic value. Using the $t$ table or the R code $1-pt(10, 24)$, we know the $p$-value is smaller than 0.001 and thus smaller than the significance level 0.05. Again, we reject the null hypothesis.

In conclusion, the data provide sufficient evidence that sales have improved after training.
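
Because only summary statistics are given, `t.test` cannot be applied directly, but the calculation can be sketched with `qt` and `pt`. The variable names below are illustrative.

```r
xbar <- 130; mu0 <- 100; s <- 15; n <- 25; alpha <- 0.05

t0 <- (xbar - mu0) / (s / sqrt(n))                  # test statistic
crit <- qt(1 - alpha, df = n - 1)                   # right-tail critical value
p_value <- pt(t0, df = n - 1, lower.tail = FALSE)   # area to the right of t0

c(t0 = t0, crit = crit, p_value = p_value)
```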

The following is a video explaining the above procedure:

https://www.youtube.com/watch?v=7ty2bO6VrUI

Example 3.

A firm claims that its product weighs 19 pounds on average. A supervisory authority suspects that the average weight is below 19 pounds, so it collects a random sample of 51 products made by the company from the market. The sample mean is 18.5 pounds with a standard deviation of 3.2 pounds. Test appropriate hypotheses at the significance level 0.01. To avoid being sued by the company, should the authority use a larger or smaller significance level?

Solution.

The null and alternative hypotheses are:

\[H_0: \mu=19 ~~~ vs ~~~ H_a: \mu<19\] The value of the test statistic is

\[t_0 = \frac{\bar{x}-\mu_0}{s/\sqrt{n}}=\frac{18.5-19}{3.2/\sqrt{51}}=-1.1158\] with $n-1$ or 50 degrees of freedom.

Since smaller $\bar{x}$’s or $t_0$’s suggest rejection of the null hypothesis, the rejection (or critical) region looks like $(-\infty, c)$, where $c=-t_{\alpha, n-1}$. We are given $\alpha=0.01$, so the critical value based on the $t_{50}$ distribution is $-2.4033$, which is obtained by R code $qt(\alpha, n-1)$ with $\alpha = 0.01, n=51$ or by a $t$-table.

Since the test statistic value $-1.1158$ does not fall in the rejection region, we fail to reject the null hypothesis.

Equivalently, we can calculate the $p$-value, which is the area under the $t_{50}$ distribution to the left of the test statistic value. Using the $t$ table or the R code $pt(-1.1158, 50)$, we know the $p$-value is 0.1349 and thus NOT smaller than the significance level 0.01. Again, we fail to reject the null hypothesis.

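In conclusion, the data do not provide sufficient evidence that the average weight of the firm’s products is below 19 pounds. To reduce the risk of wrongly accusing the company (a Type I error), the authority should use a smaller significance level.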
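
The left-sided calculation can be sketched in R from the summary statistics. The variable names are illustrative.

```r
xbar <- 18.5; mu0 <- 19; s <- 3.2; n <- 51; alpha <- 0.01

t0 <- (xbar - mu0) / (s / sqrt(n))   # test statistic
crit <- qt(alpha, df = n - 1)        # left-tail critical value (negative)
p_value <- pt(t0, df = n - 1)        # area to the left of t0

reject <- t0 < crit                  # FALSE means we fail to reject H0
c(t0 = t0, crit = crit, p_value = p_value)
```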

The following is a video explaining the above procedure: https://www.youtube.com/watch?v=ZY5XxJ2aJNc

Testing $p$ using the 1-sample z test approach

Test each of the following at the 0.05 significance level using the data $n = 36, ~~\hat{p}=0.3$.

\[(a) ~~H_0:p=0.4 ~~ vs ~~ H_a: p<0.40\] \[(b) ~~H_0:p=0.4 ~~ vs ~~ H_a: p>0.40\]

\[(c) ~~H_0:p=0.4 ~~ vs ~~ H_a: p\ne 0.40\]

Solution.

The test statistic is the same for all 3 cases:

\[z_0=\frac{\hat{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}=\frac{0.3-0.40}{\sqrt{\frac{0.4(1-0.4)}{36}}}=-1.22\]

(a) The rejection region looks like $(-\infty, c)$ with the critical value $c=-1.645$ obtained by using the standard normal table or R code $qnorm(0.05)$. Since the test statistic value does not fall in the rejection region, we fail to reject the null hypothesis. Equivalently, we can use the $p$-value approach. The $p$-value is obtained by using the standard normal table or R code $pnorm(-1.22)$, which is 0.11. Since the p-value is not smaller than the significance level 0.05, we again fail to reject the null hypothesis.

(b) The rejection region looks like $(c, \infty)$ with the critical value $c=1.645$ obtained by using the standard normal table or R code $qnorm(0.95)$. Since the test statistic value does not fall in the rejection region, we fail to reject the null hypothesis. Equivalently, we can use the $p$-value approach. The $p$-value is obtained by using the standard normal table or R code $1-pnorm(-1.22)$, which is 0.89. Since the p-value is not smaller than the significance level 0.05, we again fail to reject the null hypothesis.

(c) The rejection region looks like $(-\infty, -c)\cup (c, \infty)$ with the critical values $-c=-1.96$ and $c=1.96$ obtained by using the standard normal table or R code $qnorm(0.025)$ and $qnorm(0.975)$. Since the test statistic value does not fall in the rejection region, we fail to reject the null hypothesis. Equivalently, we can use the $p$-value approach. The $p$-value is obtained by using the standard normal table or R code $pnorm(-1.22)*2$, which is 0.22. Since the p-value is not smaller than the significance level 0.05, we again fail to reject the null hypothesis.
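
The three cases can be sketched together in R. The code keeps the unrounded test statistic, so the $p$-values may differ slightly from those based on $-1.22$; the variable names are illustrative.

```r
n <- 36; p_hat <- 0.3; p0 <- 0.4

z0 <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # same statistic for all three cases

p_left <- pnorm(z0)                        # H_a: p < 0.40
p_right <- pnorm(z0, lower.tail = FALSE)   # H_a: p > 0.40
p_two <- 2 * pnorm(-abs(z0))               # H_a: p != 0.40

# Critical values at the 0.05 level: left-sided, right-sided, two-sided
crit <- c(left = qnorm(0.05), right = qnorm(0.95), two_sided = qnorm(0.975))

round(c(z0 = z0, p_left = p_left, p_right = p_right, p_two = p_two), 4)
```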

Testing goodness of fit using the chi-square approach

A quick video: https://www.youtube.com/watch?v=b3o_hjWKgQw

Example.

An IT specialist claims that

32% of IT hardware failures are mainly due to exposure to extreme temperatures
24% are mainly due to ineffective cleaning routines
20% are mainly due to human error caused by poor training
24% are due to other reasons

Based on the past 5 years of data, she has the following results on IT hardware failures:
30 are mainly due to exposure to extreme temperatures
25 are mainly due to ineffective cleaning routines
18 are mainly due to human error caused by poor training
27 are due to other reasons

Test, at the 0.05 significance level, whether the claim of the IT specialist is supported by the data.

Step 1: Specify the null and alternative hypotheses. \[H_0: p_1=0.32, ~p_2=0.24, ~p_3=0.20, ~p_4=0.24 ~~ vs ~~H_a: \text{at least one proportion is wrongly specified}\]

Step 2: Calculate the expected frequencies under the null hypothesis. Then, calculate the value of the test statistic and determine the number of degrees of freedom. The expected frequencies are: $100\cdot 0.32= 32, 100\cdot 0.24= 24, 100\cdot 0.20= 20, 100\cdot 0.24= 24$, respectively, so the test statistic value is

\[\chi^2=\frac{(30-32)^2}{32}+\frac{(25-24)^2}{24}+\frac{(18-20)^2}{20}+\frac{(27-24)^2}{24}=0.74\]

Step 3: Calculate the critical value and the $p$-value using the chi-square distribution table. The critical region for the chi-square test always lies in the right tail of the chi-square distribution and looks like $(c, \infty)$ with the critical value $c=\chi_{\alpha, k-1}^2$. Here $k=4$, so the number of degrees of freedom is 3. Using the chi-square table or the R code $qchisq(1-\alpha, df)$ with $\alpha=0.05$ and $df=3$, $c=7.81$. Since the test statistic value 0.74 does not fall in the rejection region, we fail to reject the null hypothesis.

Equivalently, the $p$-value is 0.86 obtained by the R code $1-pchisq(0.74, 3)$ or estimated to be larger than 0.05 by the chi-square table.

Step 4: Draw a conclusion. We conclude that the data match the claim well.

R code for all with the $p$-value approach:

x=c(30, 25, 18, 27)

chisq.test(x, p = c(0.32, 0.24, 0.20, 0.24))

Testing independence between two categorical variables using the chi-square approach

Here is a helpful video: https://www.youtube.com/watch?v=LE3AIyY_cn8

Example.

Refer to the data in the table below:

Test, at the 0.05 significance level, whether there is any gender gap in the choice of college majors.

Solution.

Step 1: Specify the null and alternative hypotheses. \[H_0: \text{there is no gender gap in the choice of college majors (gender and major are independent)}\] \[H_a: \text{there is a gender gap in the choice of college majors (gender and major are NOT independent)}\]

Step 2: Calculate the expected frequencies under the null hypothesis. Then, calculate the value of the test statistic and determine the number of degrees of freedom. The expected frequencies are 5.65, 8.35, 8.47, 12.53, 8.88, 13.12, respectively, which gives the test statistic value $\chi^2\approx 2.215$. The number of degrees of freedom is $(3-1)(2-1)=2$.

Step 3: Calculate the critical value and the $p$-value using the chi-square distribution table. The critical value is $c=\chi_{0.05}^2$ and the rejection region is $(c, \infty)$, where $c=5.99$ obtained by the chi-square distribution table or R code $qchisq(0.95,2)$. The $p$-value is 0.33.

Step 4: Make a decision & draw a conclusion. We fail to reject the null hypothesis. We conclude that the data do not provide sufficient evidence that there is a gender gap in the choice of college major.
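
Steps 2 and 3 can be carried out by hand in R as a sketch. The observed counts are taken from the matrix `M` used in the R code of this example; the remaining variable names are illustrative.

```r
# Observed counts: 3 rows by 2 columns, filled column by column
M <- matrix(c(4, 11, 8, 10, 10, 14), nrow = 3)

# Expected count = (row total * column total) / grand total
expected <- outer(rowSums(M), colSums(M)) / sum(M)

chi2 <- sum((M - expected)^2 / expected)          # test statistic
df <- (nrow(M) - 1) * (ncol(M) - 1)               # degrees of freedom
crit <- qchisq(0.95, df)                          # critical value at alpha = 0.05
p_value <- pchisq(chi2, df, lower.tail = FALSE)   # right-tail p-value

round(expected, 2)
```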

R code for all with the $p$-value approach:

M=matrix(c(4,11,8,10,10,14),3)

chisq.test(M)

Chapter 10

R code

# 1. Confidence interval and test of the difference between means

# 1a. Assuming that both population variances are known
# 2-sample z-test
# Excel

# 1b. Assuming unknown but equal population variances
# 2-sample t-test, df = n1+n2-2
# R function available
x = c(23, 43, 45, 33, 51, 28, 52, 39, 44)
y = c(58, 76, 46, 63, 75, 51)

t.test(x, y, var.equal = TRUE, alternative = "less", conf.level = 0.95)
t.test(x, y, var.equal = TRUE, alternative = "greater", conf.level = 0.95)
t.test(x, y, var.equal = TRUE, alternative = "two.sided", conf.level = 0.95)

# 1c. Assuming unknown population variances
# 2-sample t-test, df = formula
# R function available
x = c(23, 43, 45, 33, 51, 28, 52, 39, 44)
y = c(58, 76, 46, 63, 75, 51)

t.test(x, y, var.equal=FALSE, alternative = "less", conf.level = 0.95)
t.test(x, y, var.equal=FALSE, alternative = "greater", conf.level = 0.95)
t.test(x, y, var.equal=FALSE, alternative = "two.sided", conf.level = 0.95)

# 1d. Paired data
x = c(34, 56, 33, 28, 45, 63, 51) # measurements based on method 1
y = c(35, 57, 30, 26, 40, 59, 48) # measurements based on method 2
t.test(x, y, paired = TRUE, alternative = "less", conf.level = 0.95)
t.test(x, y, paired = TRUE, alternative = "greater", conf.level = 0.95)
t.test(x, y, paired = TRUE, alternative = "two.sided", conf.level = 0.95)


# 2. Confidence interval and test of the difference between proportions

# Sample 1: n = 40, x = 28; Sample 2: n = 50, x = 34
n = c(40, 50)
x = c(28, 34)
prop.test(x, n, alternative = "less", correct = FALSE, conf.level = 0.95)
prop.test(x, n, alternative = "greater", correct = FALSE, conf.level = 0.95)
prop.test(x, n, alternative = "two.sided", correct = FALSE, conf.level = 0.95)

Exercise:

Use a t test to see whether there is a difference in the mean number of attacks before and after installing firewalls.

Attempts before Firewall: 56, 47, 49, 37, 38, 60, 50, 43, 43, 59, 50, 56, 54, 58

Attempts after Firewall: 53, 21, 32, 49, 45, 38, 44, 33, 32, 43, 53, 46, 36, 48, 39, 35, 37, 36, 39, 45

Homework 10 Solution

The diameter of steel rods manufactured on two different extrusion machines is being investigated. Two random samples of sizes $n_1=15, n_2=17$ are selected, and the sample means and sample variances are $\bar{x}_1=8.73, s^2_1=0.35,\bar{x}_2=8.68, s^2_2=0.40$, respectively. Assume that the population variances are equal and that the data are drawn from a normal distribution.

Is there evidence to support the claim that the two machines produce rods with different mean diameters? Give bounds on the P-value used to make your conclusion. Use only Table V of Appendix A.
Construct a 95% confidence interval for the difference in mean rod diameter. Use only Table V of Appendix A. Round your answer to 3 decimal places.
Interpret this interval.

Two suppliers manufacture a plastic gear used in a laser printer. The impact strength of these gears, measured in foot-pounds, is an important characteristic. A random sample of 10 gears from supplier 1 results in $\bar{x}_1=289.30, s^2_1=22.5$, and another random sample of 16 gears from the second supplier results in $\bar{x}_2=322.10, s^2_2=21$.

Use only Table V of Appendix A.

Is there evidence to support the claim that supplier 2 provides gears with higher mean impact strength? Use 𝛼=0.05, and assume that both populations are normally distributed but the variances are not equal. Round your answer to 4 decimal places.
Do the data support the claim that the mean impact strength of gears from supplier 2 is at least 25 foot-pounds higher than that of supplier 1? Find bounds on the P-value making the same assumptions as in part (a). Round your answer to 2 decimal places.
Construct an appropriate 95% confidence interval on the difference in mean impact strength. Use only Table V of Appendix A. Round your answers to 3 decimal places.

Does the confidence interval support the claim that the mean impact strength of gears from supplier 2 is at least 25 foot-pounds higher than that of supplier 1?

The manager of a fleet of automobiles is testing two brands of radial tires and assigns one tire of each brand at random to the two rear wheels of eight cars and runs the cars until the tires wear out. The data (in kilometers) follow. Find a 99% confidence interval on the difference in the mean life.

Round your answer to 2 decimal places. Do not use commas.

Does the confidence interval constructed in the previous step indicate that one brand is better than the other?

Solution.

R code:

## 
##  Paired t-test
## 
## data:  x and y
## t = 1.8983, df = 7, p-value = 0.09945
## alternative hypothesis: true mean difference is not equal to 0
## 99 percent confidence interval:
##  -730.4546 2462.4546
## sample estimates:
## mean difference 
##             866

An article in the Journal of Aircraft (1986, Vol. 23, pp. 859-864) described a new equivalent plate analysis method formulation that is capable of modeling aircraft structures such as cranked wing boxes and that produces results similar to the more computationally intensive finite element analysis method. Natural vibration frequencies for the cranked wing box structure are calculated using both methods, and results for the first seven natural frequencies follow:

Do the data suggest that the two methods provide the same mean value for natural vibration frequency? Find an interval for P-value.
Find a 95% confidence interval on the mean difference between the two methods and use it to answer the question in part (a).

Round your answer to 3 decimal places.

Does the confidence interval indicate that the two methods provide different mean values for natural vibration frequency?

Solution.

R code:

## 
##  Paired t-test
## 
## data:  x and y
## t = -2.4481, df = 6, p-value = 0.04992
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -10.962958325  -0.002755961
## sample estimates:
## mean difference 
##       -5.482857

Two different types of injection-molding machines are used to form plastic parts. A part is considered defective if it has excessive shrinkage or is discolored. Two random samples, each of size 300, are selected, and 15 defective parts are found in the sample from machine 1, while 9 defective parts are found in the sample from machine 2. Is it reasonable to conclude that both machines produce the same proportion of defective parts? Use α = 0.05.

Test the hypothesis $H_0: p_1 = p_2$ versus $H_1: p_1 ≠ p_2$. What is $z_0$, the value of the test statistic? Round your answer to two decimal places (e.g. 98.76).

Is it reasonable to conclude that both machines produce the same proportion of defective parts?

Solution.

R code:

## 
##  2-sample test for equality of proportions without continuity correction
## 
## data:  x out of n
## X-squared = 1.5625, df = 1, p-value = 0.2113
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.01131856  0.05131856
## sample estimates:
## prop 1 prop 2 
##   0.05   0.03

The prop.test returns a test statistic that equals $z_0^2$. So, to get $z_0$, we take the square root of the output statistic value (which is 1.5625 here). $z_0=\sqrt{1.5625}=1.25$. Why positive? It is because the first sample proportion is larger than the second! Refer to the $z_0$ formula!
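
A call that should reproduce output of this form, together with the signed $z_0$, can be sketched as follows. The exact call used to generate the output is not shown in the source, so the arguments below are an assumption based on the problem statement.

```r
# Defective counts and sample sizes for machine 1 and machine 2
x <- c(15, 9)
n <- c(300, 300)

res <- prop.test(x, n, alternative = "two.sided", correct = FALSE)

# prop.test reports z0^2; recover z0 with the sign of p1_hat - p2_hat
z0 <- sign(x[1] / n[1] - x[2] / n[2]) * sqrt(unname(res$statistic))

c(z0 = z0, p_value = res$p.value)
```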

A random sample of 500 adult residents of Maricopa County found that 359 were in favor of increasing the highway speed limit to 75 mph, while another sample of 400 adult residents of Pima County found that 295 were in favor of the increased speed limit. Do these data indicate that there is a difference in the support for increasing the speed limit between the residents of the two counties? Use α = 0.05.

Test the hypothesis $H_0: p_1 = p_2$ versus $H_1: p_1 ≠ p_2$. What is $z_0$, the value of the test statistic? Round your answer to two decimal places (e.g. 98.76).
Is it reasonable to conclude that there is a difference in the support for increasing the speed limit between the residents of the two counties?

Solution.

R code:

## 
##  2-sample test for equality of proportions without continuity correction
## 
## data:  x out of n
## X-squared = 0.42543, df = 1, p-value = 0.5142
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.0779364  0.0389364
## sample estimates:
## prop 1 prop 2 
## 0.7180 0.7375

The prop.test returns a test statistic that equals $z_0^2$. So, to get $z_0$, we take the square root of the output statistic value (which is 0.42543 here) and attach the sign of $\hat{p}_1-\hat{p}_2$: $z_0=-\sqrt{0.42543}=-0.65$. Why negative? It is because the first sample proportion is smaller than the second! Refer to the $z_0$ formula!

$p$-value = 0.5142.
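
The $z_0$ formula with the pooled proportion can also be evaluated directly, which makes the sign of the statistic explicit. This is a sketch with illustrative variable names.

```r
# Counts in favor and sample sizes: Maricopa County, Pima County
x <- c(359, 295)
n <- c(500, 400)

p_hat <- x / n               # sample proportions
p_pool <- sum(x) / sum(n)    # pooled proportion under H0: p1 = p2

z0 <- (p_hat[1] - p_hat[2]) / sqrt(p_pool * (1 - p_pool) * sum(1 / n))
p_value <- 2 * pnorm(-abs(z0))   # two-sided p-value

round(c(z0 = z0, p_value = p_value), 4)
```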

A random sample of 500 adult residents of Maricopa County found that 384 were in favor of increasing the highway speed limit to 75 mph, while another sample of 400 adult residents of Pima County found that 281 were in favor of the increased speed limit. Construct a 95% confidence interval on the difference in the two proportions. Round your answer to four decimal places (e.g. 98.7654).

Solution.

R code:

## 
##  2-sample test for equality of proportions without continuity correction
## 
## data:  x out of n
## X-squared = 4.9416, df = 1, p-value = 0.02622
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.007396526 0.123603474
## sample estimates:
## prop 1 prop 2 
## 0.7680 0.7025

The CI is between 0.0074 and 0.1236.
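
The interval reported by prop.test (without continuity correction) matches the usual large-sample formula $\hat{p}_1-\hat{p}_2\pm z_{0.025}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$. A sketch of the direct calculation, with illustrative variable names:

```r
# Counts in favor and sample sizes: Maricopa County, Pima County
x <- c(384, 281)
n <- c(500, 400)

p_hat <- x / n
se <- sqrt(sum(p_hat * (1 - p_hat) / n))   # unpooled standard error

ci <- (p_hat[1] - p_hat[2]) + c(-1, 1) * qnorm(0.975) * se
round(ci, 4)
```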

Review of Exam 2

The best point estimate of a population parameter is the sample counterpart (called the statistic which is a function of observations). The precision of a point estimate is measured by the standard error. The accuracy is measured by the bias. If an estimator has 0 bias, it is said to be unbiased.
The sampling distribution of the sample mean: (a) its mean always equals the population mean and its variance equals the population variance divided by the sample size; (b) it is normally distributed when the population distribution is normal; (c) it is approximately normally distributed when the sample size is large and the population distribution is not normal.
The sampling distribution of the sample proportion: (a) its mean always equals the population proportion and its variance equals $p\cdot(1-p)$ divided by the sample size; (b) it is approximately normally distributed when the sample size is large.
The standard error of the sample mean is $\frac{\sigma}{\sqrt{n}}$ or $\frac{s}{\sqrt{n}}$ when $\sigma$ is unknown.
The standard error of the sample proportion is $\sqrt{\frac{p(1-p)}{n}}$ or $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ when $p$ is unknown.
Test and find confidence interval about $p$, the $z$ method
Test and find confidence interval about $\mu$ ($\sigma$ known), the $z$ method
Test and find confidence interval about $\mu$ ($\sigma$ unknown), the $t$ method
Test and find confidence interval about $p_1-p_2$, the $z$ method
Test and find confidence interval about $\mu_1-\mu_2$ ($\sigma_1$ and $\sigma_2$ known), the $z$ method
Test and find confidence interval about $\mu_1-\mu_2$ ($\sigma_1$ and $\sigma_2$ unknown), the $t$ method
Paired data $t$ test and $t$ confidence interval
Sample size determination when estimating $p$
Sample size determination when estimating $\mu$ ($\sigma$ known)
Chi-square test for goodness of fit or independence

Some Examples

Given $n = 23, \bar{x}=50, s = 11$.

Test $H_0: \mu = 45$ vs $H_a:\mu<45$, using both the critical value method and the $p$-value method.
Test $H_0:\mu = 45$ vs $H_a:\mu>45$, using both the critical value method and the $p$-value method.
Test $H_0:\mu = 45$ vs $H_a:\mu\ne 45$, using both the critical value method and the $p$-value method.
Construct a 95% 2-sided confidence interval for $\mu$.
Construct a 95% one-sided confidence interval with an upper bound.
Construct a 95% one-sided confidence interval with a lower bound.

Solution

The problems must be done by hand, since we are not given the original sample data.
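
Even without the raw data, R can serve as a calculator for the hand computation. The following is a minimal sketch of the one-sample $t$ procedure from the summary statistics. The level `alpha = 0.05` is an assumption, since the problem does not state one, and the variable names are illustrative.

```r
# Summary statistics from the problem
n <- 23; xbar <- 50; s <- 11; mu0 <- 45
alpha <- 0.05                       # assumed; not stated in the problem

se <- s / sqrt(n)                   # estimated standard error of the mean
t_stat <- (xbar - mu0) / se         # test statistic
df <- n - 1

# p-value method, one p-value per alternative
p_less    <- pt(t_stat, df)                               # H_a: mu < 45
p_greater <- pt(t_stat, df, lower.tail = FALSE)           # H_a: mu > 45
p_two     <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)  # H_a: mu != 45

# Critical value method
crit_less    <- qt(alpha, df)          # reject if t_stat < crit_less
crit_greater <- qt(1 - alpha, df)      # reject if t_stat > crit_greater
crit_two     <- qt(1 - alpha / 2, df)  # reject if |t_stat| > crit_two

# Confidence intervals
ci_two      <- xbar + c(-1, 1) * crit_two * se  # 2-sided
upper_bound <- xbar + crit_greater * se         # interval (-Inf, upper_bound)
lower_bound <- xbar - crit_greater * se         # interval (lower_bound, Inf)
```

The two methods always agree: the test statistic falls in the rejection region exactly when the corresponding $p$-value is below $\alpha$.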

Given $n_1 = 23, \bar{x}_1=20, s_1 = 5$ and $n_2 = 32, \bar{x}_2=30, s_2 = 8$.

Test $H_0:\mu_1 = \mu_2$ vs. $H_a:\mu_1 < \mu_2$, using both the critical value method and the $p$-value method.
Test $H_0:\mu_1 = \mu_2$ vs. $H_a:\mu_1 > \mu_2$, using both the critical value method and the $p$-value method.
Test $H_0:\mu_1 = \mu_2$ vs. $H_a:\mu_1 \ne \mu_2$, using both the critical value method and the $p$-value method.
Construct a 95% 2-sided confidence interval for $\mu_1-\mu_2$.

Solution

The problems must be done by hand, since we are not given the original sample data.
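
As a sketch of the hand computation, the two-sample $t$ statistic can be evaluated in R from the summary statistics. This version assumes unequal variances (the Welch approximation, which is also the default of `t.test`). If the course uses the pooled-variance version, the standard error and degrees of freedom would differ. The level `alpha = 0.05` is likewise an assumption.

```r
# Summary statistics from the problem
n1 <- 23; xbar1 <- 20; s1 <- 5
n2 <- 32; xbar2 <- 30; s2 <- 8
alpha <- 0.05                        # assumed; not stated in the problem

v1 <- s1^2 / n1
v2 <- s2^2 / n2
se <- sqrt(v1 + v2)                  # unpooled standard error
t_stat <- (xbar1 - xbar2) / se       # test statistic

# Welch-Satterthwaite degrees of freedom (assumption: unequal variances)
df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# p-values for the three alternatives
p_less    <- pt(t_stat, df)                               # H_a: mu_1 < mu_2
p_greater <- pt(t_stat, df, lower.tail = FALSE)           # H_a: mu_1 > mu_2
p_two     <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)  # H_a: mu_1 != mu_2

# Critical values and the 2-sided 95% confidence interval for mu_1 - mu_2
crit_one <- qt(1 - alpha, df)
crit_two <- qt(1 - alpha / 2, df)
ci_two <- (xbar1 - xbar2) + c(-1, 1) * crit_two * se
```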

Given paired data:

x: 34, 56, 78, 89, 77, 55

y: 44, 65, 70, 80, 75, 50

Test $H_0:\mu_d = 0$ vs. $H_a:\mu_d < 0$, using both the critical value method and the $p$-value method. $\alpha = 0.05$.
Test $H_0:\mu_d = 0$ vs. $H_a:\mu_d > 0$, using both the critical value method and the $p$-value method. $\alpha = 0.05$.
Test $H_0:\mu_d = 0$ vs. $H_a:\mu_d \ne 0$, using both the critical value method and the $p$-value method. $\alpha = 0.05$.
Construct a 95% 2-sided confidence interval for $\mu_d$.

Solution

The problems can be done by hand or with R code, since we are given the original sample data.

R code:

x = c(34, 56, 78, 89, 77, 55)

y = c(44, 65, 70, 80, 75, 50)

alpha = 0.05 # significance level

t.test(x, y, paired = TRUE, alternative = "less") # for (a)

qt(alpha, df = 6-1) # Critical value for (a)

t.test(x, y, paired = TRUE, alternative = "greater") # for (b)

qt(1-alpha, df = 6-1) # Critical value for (b)

t.test(x, y, paired = TRUE, alternative = "two.sided") # for (c); the output also reports the 95% confidence interval

qt(alpha/2, df = 6-1); -qt(alpha/2, df = 6-1) # Critical values for (c)
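
For the by-hand route, a paired $t$ test is a one-sample $t$ test on the differences $d = x - y$. The sketch below recomputes the statistic, $p$-values, and confidence interval directly, then checks them against `t.test`. Variable names other than `x`, `y`, and `alpha` are illustrative.

```r
x <- c(34, 56, 78, 89, 77, 55)
y <- c(44, 65, 70, 80, 75, 50)
alpha <- 0.05

d <- x - y                       # paired differences
n <- length(d)
se <- sd(d) / sqrt(n)            # standard error of the mean difference
t_stat <- mean(d) / se           # test statistic for H_0: mu_d = 0
df <- n - 1

# p-values for the three alternatives
p_less    <- pt(t_stat, df)
p_greater <- pt(t_stat, df, lower.tail = FALSE)
p_two     <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

# 2-sided 95% confidence interval for mu_d
ci_two <- mean(d) + c(-1, 1) * qt(1 - alpha / 2, df) * se

# Cross-check against the built-in test
fit <- t.test(x, y, paired = TRUE)
all.equal(unname(fit$statistic), t_stat)
all.equal(as.numeric(fit$conf.int), ci_two)
```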

Given data: n = 54 and x = 35,

Test, at significance level $\alpha=0.05$, whether the population proportion is less than 0.7.
Test, at significance level $\alpha=0.05$, whether the population proportion is greater than 0.7.
Test, at significance level $\alpha=0.05$, whether the population proportion is different from 0.7.
Construct a 95% confidence interval for the population proportion.

Solution

The problems can be done by hand or with R code, since we are given the original sample data.

R code:

n = 54

x = 35

alpha = 0.05 # significance level

prop.test(x, n, p = 0.7, alternative = "less", correct = FALSE) # for (a); p = 0.7 is the null value (the default is 0.5)

qnorm(alpha) # Critical value for (a)

prop.test(x, n, p = 0.7, alternative = "greater", correct = FALSE) # for (b)

qnorm(1-alpha) # Critical value for (b)

prop.test(x, n, p = 0.7, alternative = "two.sided", correct = FALSE) # for (c); the output also reports a 95% confidence interval

qnorm(alpha/2); -qnorm(alpha/2) # Critical values for (c)
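
The $z$ method can also be worked by hand, as in the sketch below. Note that `prop.test` reports `X-squared`, which without continuity correction equals $z^2$. Its confidence interval is the Wilson score interval, so it can differ slightly from the hand-computed interval $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ shown here. Names such as `p0` and `ci_wald` are illustrative.

```r
n <- 54; x <- 35
p0 <- 0.7                        # null value of the population proportion
alpha <- 0.05

p_hat <- x / n
z_stat <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # SE uses p0 under H_0

# p-values for the three alternatives
p_less    <- pnorm(z_stat)
p_greater <- pnorm(z_stat, lower.tail = FALSE)
p_two     <- 2 * pnorm(abs(z_stat), lower.tail = FALSE)

# Hand-computed 95% interval (SE uses p_hat)
ci_wald <- p_hat + c(-1, 1) * qnorm(1 - alpha / 2) * sqrt(p_hat * (1 - p_hat) / n)

# Cross-check: X-squared from prop.test equals z^2
fit <- prop.test(x, n, p = p0, correct = FALSE)
all.equal(unname(fit$statistic), z_stat^2)
```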

Given data: $n_1 = 54, x_1=35$, and $n_2 = 63, x_2=41$,

Test, at significance level $\alpha=0.05$, whether the first population proportion is smaller.
Test, at significance level $\alpha=0.05$, whether the first population proportion is larger.
Test, at significance level $\alpha=0.05$, whether the population proportions differ.
Construct a 95% confidence interval for the difference in population proportions ($p_1-p_2$).

Solution

The problems can be done by hand or with R code, since we are given the original sample data.

R code:

n = c(54, 63)

x = c(35, 41)

alpha = 0.05 # significance level

prop.test(x, n, alternative = "less", correct = FALSE) # for (a)

qnorm(alpha) # Critical value for (a)

prop.test(x, n, alternative = "greater", correct = FALSE) # for (b)

qnorm(1-alpha) # Critical value for (b)

prop.test(x, n, alternative = "two.sided", correct = FALSE) # for (c); the output also reports the 95% confidence interval for p1 - p2

qnorm(alpha/2); -qnorm(alpha/2) # Critical values for (c)
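
A by-hand version of the two-proportion $z$ method is sketched below. The test statistic uses the pooled proportion (under $H_0: p_1 = p_2$), while the confidence interval uses the unpooled standard error. Without continuity correction, `X-squared` from `prop.test` equals $z^2$. Names such as `p_pool` and `se_unpooled` are illustrative.

```r
x <- c(35, 41)
n <- c(54, 63)
alpha <- 0.05

p_hat <- x / n
diff_hat <- p_hat[1] - p_hat[2]

# Test statistic with the pooled proportion (valid under H_0: p1 = p2)
p_pool <- sum(x) / sum(n)
se_pool <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))
z_stat <- diff_hat / se_pool

# p-values for the three alternatives
p_less    <- pnorm(z_stat)
p_greater <- pnorm(z_stat, lower.tail = FALSE)
p_two     <- 2 * pnorm(abs(z_stat), lower.tail = FALSE)

# 95% confidence interval for p1 - p2 with the unpooled standard error
se_unpooled <- sqrt(sum(p_hat * (1 - p_hat) / n))
ci_two <- diff_hat + c(-1, 1) * qnorm(1 - alpha / 2) * se_unpooled

# Cross-check against prop.test
fit <- prop.test(x, n, correct = FALSE)
all.equal(unname(fit$statistic), z_stat^2)
all.equal(as.numeric(fit$conf.int), ci_two)
```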

Chi-square test for independence.

Test, at significance level 0.05, that the type of defect does not differ across shifts.

Solution

The problem can be done by hand or with R code, since we are given the original sample data.

R code:

M = matrix(c(15, 26, 33, 21, 31, 17, 45, 34, 49, 13, 5, 20), 3, 4)

chisq.test(M)
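
The hand computation behind `chisq.test` can be sketched as follows: expected counts are (row total)(column total)/(grand total), and the statistic is compared with a chi-square critical value on $(r-1)(c-1)$ degrees of freedom. The matrix is copied from the code above. Which dimension corresponds to shifts and which to defect types does not affect the statistic, so no labels are assumed here.

```r
M <- matrix(c(15, 26, 33, 21, 31, 17, 45, 34, 49, 13, 5, 20), 3, 4)
alpha <- 0.05

# Expected counts under independence
expected <- outer(rowSums(M), colSums(M)) / sum(M)

# Pearson chi-square statistic and degrees of freedom
chi_stat <- sum((M - expected)^2 / expected)
df <- (nrow(M) - 1) * (ncol(M) - 1)

# Critical value method and p-value method
crit <- qchisq(1 - alpha, df)                      # reject H_0 if chi_stat > crit
p_value <- pchisq(chi_stat, df, lower.tail = FALSE)

# Cross-check against the built-in test
fit <- chisq.test(M)
all.equal(unname(fit$statistic), chi_stat)
```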

Appendix

Some useful datasets: http://fs2.american.edu/baron/www/Book/

Statistical tables: https://read.wiley.com/books/9781119400363/page/152/section/top-of-page

Practice Questions for Stat 353 Homework Assignments

SZ

1/12/2022
