Q1) Ferry & Management

\[ \text{Let : }\\ \text{X = daily total number of the passengers carried by a ferry company} \\ X \sim N(\mu = 12*10^3, \sigma = 2*10^3) \\\bar{x}_{new}=12.5*10^3 \\ n = 50 \text{ (days)} \]

\[ H_o : \mu = 12*10^3 \text{ [no difference]} \\ H_a : \mu > 12*10^3 \text{ [new management helped]} \\ \]
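With \(\sigma\) known, the one-sample z statistic for these hypotheses can be worked out by hand as follows. The cutoff 1.645 is the usual one-sided critical value at \(\alpha = .05\), the level assumed throughout this question; \(\mu_o\) here simply names the null value \(12*10^3\).

\[ z_o = \frac{\bar{x}_{new} - \mu_o}{\sigma/\sqrt{n}} = \frac{12500 - 12000}{2000/\sqrt{50}} \approx 1.77 \\ z_{crit} = z_{.95} \approx 1.645 \\ z_o > z_{crit} \implies \text{Reject } H_o \]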

z_o <- (12500-12000)/(2000/sqrt(50)) # Calc Test Stat
z_crit <- qnorm(1-.05) # Find Crit Value : One Sided 
paste0("z_o > z_crit : ", z_o > z_crit) # Hypothesis Test

## [1] "z_o > z_crit : TRUE"

paste0("P-Value: ",round(1 - pnorm(z_o), 3))

## [1] "P-Value: 0.039"

Therefore, based on our sample mean, we reject the null hypothesis at \(\alpha = .05\).

We believe there is a statistically significant difference indicating that the new management has made a positive difference, hence the one-sided test. If \(H_o\) were actually true, there would be a 5% ( \(\alpha\) ) chance of rejecting it by accident (Type I error).

Reference :

Hypothesis Testing - One Sample Mean

Q2) A’s at UCLA

Q2a) Est. P-val

\[ P_{\text{Long-Run-A's}}=.2 \\ \hat{p}_{A's} = \frac{50}{n_s} = .25 \\ \text{for : } n_s =200 \text{ students} \]

\[ H_o : p=.2 \text{ [Same]} \\ H_a : p>.2 \text{ [Increase in A's]} \]

\[ z_o = \frac{\hat{p} - p_o}{SE(p_o)} \text{ For: } SE(p_o) = \sqrt{\frac{p_o(1-p_o)}{n}} \]

Z_o <- (.25 - .2)/(sqrt((.2*.8) / 200))
Z_crit <- qnorm(1-.05)
paste0("z_o > z_crit : ", Z_o > Z_crit) # Hypothesis Test

## [1] "z_o > z_crit : TRUE"

paste0("P-Value: ",round(1 - pnorm(Z_o), 3))

## [1] "P-Value: 0.039"

Therefore, based on our sample proportion, we reject the null hypothesis at \(\alpha = .05\).

We believe there is a statistically significant difference indicating that students are getting higher grades this quarter, hence the one-sided test. If \(H_o\) were actually true, there would be a 5% ( \(\alpha\) ) chance of rejecting it by accident (Type I error).

\[ \text{Equivalently Consider : } \\ z_o = \frac{\hat{p} - p_o}{SE} \implies z_o*SE = \hat{p} - p_o \implies p_o+z_o*SE = \hat{p} \]

Plugging in \(z_o = z_{crit}\) shows the largest \(\hat{p}\) for which we still fail to reject \(H_o\); any larger \(\hat{p}\) leads to rejection. We get :

\[ \hat{p}_{max}=0.247 \]

and \(\hat{p}_{A's} = .25 >\hat{p}_{max}\); therefore, we reject \(H_o\).

Where :

\[ \alpha = \mathbb{P}(\hat{p} \geq .247 | p_o = .2) \]
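The cutoff can be checked by hand. The arithmetic below assumes the one-sided critical value \(z_{crit} \approx 1.645\) (for \(\alpha = .05\)) and the null standard error used earlier in this question:

\[ \hat{p}_{max} = p_o + z_{crit}*SE \\ = .2 + 1.645\sqrt{\frac{(.2)(.8)}{200}} \\ \approx .2 + 1.645(.0283) \\ \approx .247 \]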

Q2b) Est. Type 2 Err Rate

We suppose \(P_{\text{TRUE-Long-Run-A's}}=.3\) ; how likely is it that we fail to reject \(H_o\) even though \(H_a\) is true with this \(P_{\text{TRUE-Long-Run-A's}}\) ( \(\beta\) )?

We wish to est. the following Prob : \(\beta = \mathbb{P}(\text{Fail to Rej. H}_o | p = .3)\).

Recall :

\[ p_o+z_o*SE = \hat{p} \]

se <- sqrt((.2*.8)/200)
prop_alph_cutoff <- se * Z_crit + .2
.25 > prop_alph_cutoff

## [1] TRUE

.3 + se

## [1] 0.3282843
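One way to finish this estimate, sketched under the normal approximation: \(\beta\) is the chance that \(\hat{p}\) lands at or below the rejection cutoff when the true proportion is .3. The names `se_null`, `se_alt`, `cutoff`, `beta`, and `power` are introduced here for illustration, and using the standard error evaluated at \(p = .3\) for the alternative is an assumption of this sketch rather than something stated above.

```r
# Cutoff for p-hat under Ho (same quantities as the code above)
se_null <- sqrt((.2 * .8) / 200)
cutoff  <- .2 + qnorm(1 - .05) * se_null  # reject Ho when p-hat > cutoff

# Assumption: under p = .3, p-hat is approximately normal with this SE
se_alt <- sqrt((.3 * .7) / 200)

# beta = P(fail to reject Ho | p = .3) = P(p-hat <= cutoff | p = .3)
beta  <- pnorm((cutoff - .3) / se_alt)
power <- 1 - beta
round(c(beta = beta, power = power), 3)
```

Under these assumptions \(\beta\) comes out near .05, so the power of the test against \(p = .3\) is roughly .95.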

Reference :

Calculating Power and the Probability of a Type II Error (A One-Tailed Example)

Q3) Neyman-Pearson

3a) Suppose we accept that, \(\alpha = .1\) of the time, we accidentally reject \(H_o\) (NULL HYP). In the long run, which rejection region most accurately detects \(H_a\) (ALT HYP)?

##   x_i  Ho  Ha     Ratio
## 1  x5 0.1 0.3 0.3333333
## 2  x2 0.3 0.4 0.7500000
## 3  x4 0.1 0.1 1.0000000
## 4  x1 0.2 0.1 2.0000000
## 5  x3 0.3 0.1 3.0000000

Column	Meaning
`x_i`	These are the possible values your random variable \(X\) can take. Each row represents one possible outcome (e.g., x₁, x₂…).
`Ho`	This is the probability of seeing each outcome assuming the null hypothesis \(H_0\) is true. It defines the null distribution.
`Ha`	This is the probability of seeing each outcome assuming the alternative hypothesis \(H_a\) is true. It defines the alternative distribution.
`Ratio = Ho / Ha`	This is the likelihood ratio for each outcome, which tells you: “How likely is this outcome under \(H_o\) compared to \(H_a\)?”

Let \(r_i \in \text{Ratio}\). A large \(r_i\) means the outcome is much more likely under \(H_o\), so it makes \(H_o\) more convincing; an \(r_i\) near 0 means the outcome is much more likely under \(H_a\), so it makes \(H_a\) more convincing. If we sort \(\text{Ratio}\) in ascending order, as in the table above, we are ordering the outcomes from most to least supportive of \(H_a\).

##   x_i Type1_Err Type2_Err Power Confidence
## 1  x5       0.1       0.7   0.3        0.9
## 2  x2       0.3       0.6   0.4        0.7
## 3  x4       0.1       0.9   0.1        0.9
## 4  x1       0.2       0.9   0.1        0.8
## 5  x3       0.3       0.9   0.1        0.7

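Assuming we take \(\alpha = .1\), the most powerful test (highest \(1-\beta\)) is the one with the lowest \(\beta\) among the rejection regions of size \(\alpha\). Two single-outcome regions have size .1, \(\{x_5\}\) and \(\{x_4\}\); \(\{x_5\}\) has the larger power (.3 versus .1), so the rejection region is the one associated with \(x_5\).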

3b) Suppose we accept that, \(\alpha = .4\) of the time, we accidentally reject \(H_o\) (NULL HYP). In the long run, which rejection region most accurately detects \(H_a\) (ALT HYP)?

##   x_i Type1_Err Type2_Err Power Confidence
## 1  x5       0.1       0.7   0.3        0.9
## 2  x2       0.3       0.6   0.4        0.7
## 3  x4       0.1       0.9   0.1        0.9
## 4  x1       0.2       0.9   0.1        0.8
## 5  x3       0.3       0.9   0.1        0.7

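Now we can accept a Type I error 40% of the time. Adding outcomes in order of increasing Ratio until the Type I error reaches .4, we have : \(\text{Rej. Region} = \{x_5, x_2\}\), with \(\alpha = .1 + .3 = .4\) and power \(= .3 + .4 = .7\).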
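The selection in 3a and 3b can be sketched in code: sort the outcomes by Ratio (smallest first) and keep adding them to the rejection region while the accumulated null probability stays within \(\alpha\). The helper name `np_region` is hypothetical, and this sketch ignores randomized tests, so it only applies when \(\alpha\) is hit exactly, as it is here.

```r
# Null and alternative probabilities for each outcome (values from the tables above)
lr_tab <- data.frame(
  x_i = c("x1", "x2", "x3", "x4", "x5"),
  Ho  = c(0.2, 0.3, 0.3, 0.1, 0.1),
  Ha  = c(0.1, 0.4, 0.1, 0.1, 0.3),
  stringsAsFactors = FALSE
)
lr_tab$Ratio <- lr_tab$Ho / lr_tab$Ha
lr_tab <- lr_tab[order(lr_tab$Ratio), ]  # most Ha-favoring outcomes first

# Hypothetical helper: take outcomes in Ratio order while total size <= alpha
np_region <- function(tab, alpha) {
  keep <- cumsum(tab$Ho) <= alpha + 1e-12  # small tolerance for floating point
  list(region = tab$x_i[keep],
       size   = sum(tab$Ho[keep]),   # Type I error of the region
       power  = sum(tab$Ha[keep]))   # 1 - beta
}

res_a <- np_region(lr_tab, alpha = 0.1)  # 3a
res_b <- np_region(lr_tab, alpha = 0.4)  # 3b
```

Here `res_a$region` contains only `x5`, and `res_b$region` contains `x5` and `x2`, matching the regions chosen above.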

4)

4a) Derive \(\Lambda(X)\)

\[ H_o : \lambda = \text{Some Value} \\ H_a : \lambda = \text{Some BIGGER Value} \]

Graphically, the alternative \(H_a\) argues that the dist is more tightly concentrated around 0, while \(H_o\) argues that it is more spread out.

Let :

\[ \text{pdf}(x) = \frac{\lambda}{2} e^{-\lambda |x|}, \quad -\infty < x < \infty,\ \lambda > 0 \\ \]

\[ \text{Let : } \lambda_s = \lambda_o; \lambda_B = \lambda_1 \\ \text{small and BIG, respectively} \]

Suppose we re-consider \(H_o, H_a\) in an alternative, equivalent form :

\[ H_o : \frac{\lambda_s}{2} e^{-\lambda_s |x|} \\ H_a : \frac{\lambda_B}{2} e^{-\lambda_B |x|} \]

Recall :

\[ \text{Likelihood Ratio Stat.} \\ = \Lambda(X) \\ = \frac{L(\lambda_s)}{L(\lambda_B)} \\ = \frac{\prod_{\forall X}\frac{\lambda_s}{2} e^{-\lambda_s |x_i|}}{\prod_{\forall X}\frac{\lambda_B}{2} e^{-\lambda_B |x_i|}} \]

Notice that \(\Lambda(X)\) is large when the data favor \(H_o\) and close to 0 when the data favor \(H_a\).

Simplify :

\[ \Lambda(X) \\ = \frac{\prod_{\forall X}\frac{\lambda_s}{2} e^{-\lambda_s |x_i|}}{\prod_{\forall X}\frac{\lambda_B}{2} e^{-\lambda_B |x_i|}} \\ = \frac{(\frac{\lambda_s}{2})^ne^{-\lambda_s|x_1|}...e^{-\lambda_s|x_n|}}{(\frac{\lambda_B}{2})^ne^{-\lambda_B|x_1|}...e^{-\lambda_B|x_n|}} \\ = \frac{(\frac{\lambda_s}{2})^ne^{-\lambda_s\sum_{\forall X}|x_i|}}{(\frac{\lambda_B}{2})^ne^{-\lambda_B\sum_{\forall X}|x_i|}} \]

Taking the natural log :

\[ Ln(\frac{(\frac{\lambda_s}{2})^ne^{-\lambda_s\sum_{\forall X}|x_i|}}{(\frac{\lambda_B}{2})^ne^{-\lambda_B\sum_{\forall X}|x_i|}}) \]

\[ = Ln((\frac{\lambda_s}{2})^ne^{-\lambda_s\sum_{\forall X}|x_i|})- Ln((\frac{\lambda_B}{2})^ne^{-\lambda_B\sum_{\forall X}|x_i|}) \]

\[ = [nLn(\frac{\lambda_s}{2})-\lambda_s\sum_{\forall X}|x_i|] - [nLn(\frac{\lambda_B}{2})-\lambda_B\sum_{\forall X}|x_i|] \]

\[ = nLn(\frac{\lambda_s}{2})-\lambda_s\sum_{\forall X}|x_i| - nLn(\frac{\lambda_B}{2})+\lambda_B\sum_{\forall X}|x_i| \]

\[ = n[ Ln(\frac{\lambda_s}{2})-Ln(\frac{\lambda_B}{2}) ] +(\lambda_B-\lambda_s)\sum_{\forall X}|x_i| \]

\[ = n[ Ln(\lambda_s)-Ln(\lambda_B) ] +(\lambda_B-\lambda_s)\sum_{\forall X}|x_i| \]

Finally,

\[ Ln(\Lambda)= nLn(\frac{\lambda_s}{\lambda_B})+(\lambda_B-\lambda_s)\sum_{\forall X}|x_i| \]
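As a quick numerical sanity check on the algebra, the log-likelihood ratio can be computed directly from the pdf and compared with the closed form \(nLn(\frac{\lambda_s}{\lambda_B})+(\lambda_B-\lambda_s)\sum_{\forall X}|x_i|\). The data vector and the two \(\lambda\) values below are hypothetical, and the helper names `dlaplace_log` and `log_lambda_closed` are introduced only for this sketch.

```r
# Log of the pdf (lambda / 2) * exp(-lambda * |x|) given in the text
dlaplace_log <- function(x, lambda) log(lambda / 2) - lambda * abs(x)

# Closed form: n * Ln(lam_s / lam_B) + (lam_B - lam_s) * sum(|x_i|)
log_lambda_closed <- function(x, lam_s, lam_B) {
  n <- length(x)
  n * log(lam_s / lam_B) + (lam_B - lam_s) * sum(abs(x))
}

x     <- c(-1.2, 0.3, 0.8, -0.1, 2.0)  # hypothetical sample
lam_s <- 1                             # hypothetical small lambda
lam_B <- 2                             # hypothetical BIG lambda

# Direct computation: log L(lam_s) - log L(lam_B)
direct <- sum(dlaplace_log(x, lam_s)) - sum(dlaplace_log(x, lam_B))
closed <- log_lambda_closed(x, lam_s, lam_B)
all.equal(direct, closed)
```

The two values agree, and the closed form makes it plain that the data enter only through \(\sum_{\forall X}|x_i|\).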

Consider that our test statistic is a function of our data \(X = \{x_1...x_n\}\), and recall that in our case \(\Lambda\) is large when \(H_o\) is more convincing than \(H_a\), and vice versa. Since \(\lambda_{B} > \lambda_s\), the coefficient \((\lambda_B-\lambda_s)\) is positive, so \(Ln(\Lambda)\) increases with \(\sum_{\forall X}|x_i|\).

In other words, the smaller \(\sum_{\forall X}|x_i|\) is, the more convincing we find \(H_a\), and the more inclined we are to reject \(H_o\).

We reject \(H_o\) if \(\sum_{\forall X}|x_i| < C\), where the critical value \(C\) is chosen so that \(\mathbb{P}(\sum_{\forall X}|x_i| < C | \lambda = \lambda_s) = \alpha\). This is what we expect graphically :

the more the data cluster around 0, the more convinced we are that \(\lambda\) is \(\lambda_B\) rather than \(\lambda_s\), since values tend to fall near 0 as \(\lambda \rightarrow \infty\).
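To make the critical value concrete: under the pdf above, \(|x_i|\) follows an exponential distribution with rate \(\lambda\), so \(\sum_{\forall X}|x_i|\) follows a Gamma distribution with shape \(n\) and rate \(\lambda\), and the cutoff is its \(\alpha\)-quantile under \(H_o\). The values of `alpha`, `n`, and `lam_s` below are hypothetical, chosen only to illustrate the calculation, and the simulation is just a rough check.

```r
alpha <- 0.05  # hypothetical significance level
n     <- 10    # hypothetical sample size
lam_s <- 1     # hypothetical null value of lambda

# sum(|x_i|) ~ Gamma(shape = n, rate = lambda); C is its alpha-quantile under Ho
C <- qgamma(alpha, shape = n, rate = lam_s)

# Rough check by simulation: a draw from the pdf is (random sign) * Exponential
set.seed(1)
sim_stat <- replicate(20000, {
  x <- sample(c(-1, 1), n, replace = TRUE) * rexp(n, rate = lam_s)
  sum(abs(x))
})
mean(sim_stat < C)  # share of null samples rejected; should be close to alpha
```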

4b) Is the test UNIFORMLY most powerful against :

\[ H_o : \lambda = \lambda_s \\ H_a : \lambda > \lambda_s \]

A test is uniformly most powerful when it is the best possible test of its size (most likely to detect the truth) against every value in the alternative hypothesis, not just one.

This is a one-sided composite alternative: we are not testing against one \(\lambda\) but against any \(\lambda\) larger than \(\lambda_s\). Notice that as \(\lambda \rightarrow \infty\), \(X\) becomes more concentrated near 0. So as :

\[ \sum_{\forall X}|x_i| \rightarrow 0\\ \]

If the sum is Small, the data is clustered near 0 and we have a larger \(\lambda\)

or,

\[ \sum_{\forall X}|x_i| \rightarrow \infty \]

If the sum is large, the data is spread out and we probably have a smaller \(\lambda\)

Therefore, the test responds consistently to changes in the parameter. As the parameter increases, the data become more concentrated around zero, and the statistic \(\sum_{\forall X}|x_i|\) becomes smaller. Since this pattern always runs in the same direction, the test can reliably use this information to distinguish between the null and the alternative.

More precisely, for every \(\lambda_B > \lambda_s\) the likelihood ratio depends on the data only through \(\sum_{\forall X}|x_i|\) and increases with it, so the most powerful test against that \(\lambda_B\) always rejects when \(\sum_{\forall X}|x_i| < C\). The cutoff \(C\) is fixed by \(\alpha\) and \(\lambda_s\) alone, so it is the same rejection region for every larger alternative value, and the test is the best possible choice against each of them.

That is why this test is uniformly most powerful for this one-sided alternative.
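The argument can be illustrated numerically. Using the fact that \(\sum_{\forall X}|x_i|\) follows a Gamma distribution with shape \(n\) and rate \(\lambda\) under the pdf in 4a, the sketch below fixes one cutoff from \(\alpha\) and \(\lambda_s\) and then evaluates the power of that same test over a grid of larger \(\lambda\) values. The settings `alpha`, `n`, `lam_s`, and `lam_grid` are hypothetical, and `power_at` is a helper introduced here.

```r
alpha <- 0.05  # hypothetical significance level
n     <- 10    # hypothetical sample size
lam_s <- 1     # hypothetical null value of lambda

# The cutoff depends only on alpha, n, and lam_s, never on the alternative
C <- qgamma(alpha, shape = n, rate = lam_s)

# Power of the fixed test "reject when sum(|x_i|) < C" at a given lambda
power_at <- function(lam) pgamma(C, shape = n, rate = lam)

lam_grid <- c(1, 1.5, 2, 3, 5)
data.frame(lambda = lam_grid, power = power_at(lam_grid))
```

At \(\lambda = \lambda_s\) the power equals \(\alpha\), and it rises steadily as \(\lambda\) moves further above \(\lambda_s\), all with a single rejection region.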

10) Analysis of Variance : Stop Watches

Samples of each of three types of stopwatches were tested.

The following table gives the number of cycles (on-off-restart), in thousands, survived until some part of the mechanism failed.

##   TypeA TypeB TypeC
## 1   1.7  13.6  13.4
## 2   6.1  25.2  29.7
## 3  12.5  46.2  46.9
## 4  25.1    NA    NA
## 5  42.1    NA    NA

Test whether there is a significant difference among the types.

Construct the ANOVA table. State the null and the alternative hypotheses and conduct an F-test.

Setup : In this problem we have one factor (Type) at 3 levels (A, B or C). We want to know whether the mean lifetimes of these types differ by a statistically significant amount. Therefore, we use a one-way analysis of variance :

\[ H_o : \mu_A = \mu_B = \mu_C \text{ [Type are all the same]} \\ H_a : \mu_i \neq \mu_j \text{ for some } i \neq j \text{ [At least One is different]} \]

Where we describe a population model as :

\[ Y_{ij} = \mu_i + \epsilon_{ij} \text{ For : } i\in\{A, B, C\},\ j = 1, \dots, n_i \]
Where \(\mu_i\) is the group mean at a specific level (A, B or C), \(n_i\) is the number of stopwatches tested at that level, and \(\epsilon_{ij}\) is the noise associated with the \(j\)-th observation in group \(i\).

Where we assume :

\[ \epsilon_{ij} \sim N(\mu_{\epsilon}=0, \sigma^2) \]

So we assume that on average the noise is 0 [i.e. not a biased est.] and that the variance of the noise is constant across groups [Homoscedasticity]

Graphically, the group means appear to differ. We will measure how convinced we are via the \(F\)-test from the one-way ANOVA.

Verify Assumptions : Diagnostics :

Diagnostics Analysis :

As we can see, the Residuals vs. Fitted plot indicates approximately random scatter with an average residual of 0. Therefore, we are led to believe the noise is random with constant variance.

The QQ-Plot (Normality Plot) indicates that our residuals are approximately normal, with the tail ends deviating. More specifically, the S-shape and heavy tails are typical of T-distributed data. As we recall from past lectures, the T-dist. approaches the Normal dist. as the \(df\) increase, so we treat this departure as mild and are reasonably satisfied with the normality condition as well.

Overall, according to the diagnostic plots, we can come to sound(ish) conclusions from our one-way ANOVA table. We should, however, be wary about the power: our sample size is quite small, so the results are more likely to be affected by random noise.

Conclusion :

##             Df Sum Sq Mean Sq F value Pr(>F)
## Type         2  375.4   187.7    0.69  0.529
## Residuals    8 2174.9   271.9

\[ F_o = \frac{MS_{Type}}{MS_{Err}}= \frac{MS_{Between}}{MS_{Within}}= \frac{187.7}{271.9} = 0.69 \\ \implies \\ \text{p-val} = 0.529 \]

Okay, so what's happening?

The summary ANOVA table shows that \(MS_{Between}\) is relatively small compared to \(MS_{Within}\), so we are led to believe that there isn't a very large difference between Stopwatch Types. Our F-value isn't sufficiently large for us to be statistically convinced of a considerable difference in means.

Therefore, we fail to reject \(H_o\): with a p-value of 0.529, the data do not give us enough evidence against \(\mu_A = \mu_B = \mu_C \text{ [Type are all the same]}\) in favor of \(H_a : \mu_i \neq \mu_j \text{ [At least One is different]}\)
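The ANOVA table above can be reproduced from the raw data. The sketch below is one way to do it, not necessarily the code used to produce the original output: it fits the model with `aov()` and also rebuilds the sums of squares by hand so that each entry of the table can be traced. The object names (`watches`, `fit`, `ss_between`, and so on) are chosen here for illustration.

```r
# Stopwatch data in long format (thousands of cycles until failure)
watches <- data.frame(
  cycles = c(1.7, 6.1, 12.5, 25.1, 42.1,  # Type A
             13.6, 25.2, 46.2,            # Type B
             13.4, 29.7, 46.9),           # Type C
  Type = factor(rep(c("A", "B", "C"), times = c(5, 3, 3)))
)

# One-way ANOVA via aov()
fit <- aov(cycles ~ Type, data = watches)
summary(fit)

# Same quantities by hand
grand_mean <- mean(watches$cycles)
grp_mean   <- tapply(watches$cycles, watches$Type, mean)
grp_n      <- tapply(watches$cycles, watches$Type, length)

ss_between <- sum(grp_n * (grp_mean - grand_mean)^2)                      # df = 2
ss_within  <- sum((watches$cycles - ave(watches$cycles, watches$Type))^2) # df = 8

F_o   <- (ss_between / 2) / (ss_within / 8)
p_val <- pf(F_o, df1 = 2, df2 = 8, lower.tail = FALSE)
round(c(SS_between = ss_between, SS_within = ss_within, F = F_o, p = p_val), 3)
```

The hand computation matches the table: the sums of squares are about 375.4 and 2174.9, giving \(F_o \approx 0.69\) and a p-value of about 0.529.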

HW 6: Hypothesis Testing and the Neyman-Pearson Framework

Isaiah C. Mireles

05/31/25