
class: title-slide

.row[
.col-7[
.title[
# Hypothesis Testing
]
.subtitle[
## Hypothesis Testing
]
.author[
### Laxmikant Soni [blog](https://laxmikants.github.io) [GitHub](https://github.com/laxmiaknts) [Twitter](https://twitter.com/laxmikantsoni09)
]

.affiliation[
]

]

.col-5[

.logo[

<img src="figures/rmarkdown.png" width="480" />
]

]
]

---

# Statistical Hypothesis Testing

.pull-left[

## Hypothesis Testing

* **Definition**: Hypothesis testing is a statistical method that uses sample data to evaluate a hypothesis about a population parameter.
* **Key Terms**:
  - **Null Hypothesis `$H_0$`**: Assumes no effect or no difference in the population.
  - **Alternative Hypothesis `$H_1$`**: Assumes an effect or difference exists.
  - **p-value**: Probability of obtaining a test result at least as extreme as the one observed, assuming `$H_0$` is true.
* **Example**: Testing if a new drug is more effective than a placebo. `$H_0$`: The drug has no effect. `$H_1$`: The drug has a positive effect.

]

.pull-right[

* **Hypothesis Testing Steps**:
  1. **State the Hypotheses**: Define `$H_0$` and `$H_1$`
  2. **Select a Significance Level `$\alpha$`**: Common choices are 0.05 and 0.01.
  3. **Compute the Test Statistic**: Based on the sample data (e.g., a t-statistic or a z-score).
  4. **Calculate the p-value**: Measures the strength of the evidence against `$H_0$`; a smaller p-value means stronger evidence.
  5. **Make a Decision**: Reject `$H_0$` if `$p \leq \alpha$`; otherwise, do not reject `$H_0$`.

* **Formula for a Test Statistic (e.g., t-test)**:
  `$t = \frac{\bar{x} - \mu}{\frac{s}{\sqrt{n}}}$`
  
]
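
The t formula above can be sketched in Python using only the standard library. The scores and the reference mean of 75 are made-up illustration values, and `one_sample_t` is a hypothetical helper name rather than a library function.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1


# Hypothetical exam scores, invented for illustration only.
scores = [72, 78, 81, 69, 75, 80, 77, 74]
t_stat, df = one_sample_t(scores, mu0=75)
print(f"t = {t_stat:.3f}, df = {df}")
```

Converting the statistic into a p-value requires the t distribution with `$n - 1$` degrees of freedom, which the Python standard library does not provide; a t table or a statistics package such as SciPy would cover that step.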

---

# Statistical Hypothesis Testing

.pull-top[

## Z-Test

| **Test Name**   | **Use**                                                                                                         | **Example**                                                                                                         |
|------------------|-----------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------|
| **Z-Test**       | For testing means with a known population variance and a large sample size (typically `$n > 30$`).        | Testing if the mean height of a group of students differs from the known population mean height of 160 cm.        |

]

.pull-bottom[

Example Statement: "A researcher wants to determine if the average weight of adult males in a specific city differs from the known national average weight of 180 pounds. A random sample of 50 adult males from the city is taken, and their average weight is found to be 185 pounds with a known population standard deviation of 15 pounds. The researcher will use a Z-Test to assess whether this difference is statistically significant."

`$$Z = \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}}$$`

]
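
The example statement can be worked through as a short Python sketch. The sample mean, national average, standard deviation, and sample size come from the statement; the significance level of 0.05 and the two-sided alternative are assumptions, since the statement does not fix them.

```python
import math
from statistics import NormalDist

# Values taken from the example statement above.
x_bar = 185.0   # sample mean (pounds)
mu0 = 180.0     # national average under H0
sigma = 15.0    # known population standard deviation
n = 50          # sample size
alpha = 0.05    # assumed significance level; the example does not state one

z = (x_bar - mu0) / (sigma / math.sqrt(n))
# Two-sided p-value: probability of a |Z| at least this large under H0.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = p_value <= alpha

print(f"z = {z:.3f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

With these numbers the statistic is about 2.36, so under the assumed 0.05 level the difference would be judged statistically significant.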

---

# Statistical Hypothesis Testing

.pull-top[

## t-Test

| **Test Name** | **Use** | **Example** |
|-------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------|
| **t-Test** | When testing means with an unknown population variance, especially with a small sample size (typically `$n < 30$`). | Testing if a new teaching method affects test scores by comparing scores before and after implementing the method. |
| - *One-sample t-test* | Compares sample mean to a known population mean. | Testing if the average score of a class of 20 students on a math exam is different from the known average score of 75. |
| - *Independent (two-sample) t-test* | Compares means of two independent groups. | Comparing the test scores of students from two different schools to see if one school performs better than the other. |
| - *Paired t-test* | Compares means of two related groups (e.g., before-and-after measurements). | Measuring the weight of participants before and after a diet program to see if there is a significant weight loss. |

]

.pull-bottom[

`$t = \frac{\bar{X} - \mu}{\frac{s}{\sqrt{n}}}$`

]
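
The paired variant in the table reduces to the same formula applied to per-participant differences, with a hypothesized mean difference of 0. A minimal sketch, assuming made-up before-and-after weights:

```python
import math
import statistics

# Hypothetical weights in kg, invented for illustration only.
before = [82, 90, 76, 88, 95]
after = [80, 87, 75, 85, 93]

# A paired t-test is a one-sample t-test on the per-participant differences,
# with H0: the mean difference is 0.
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)  # sample standard deviation of the differences
t_stat = d_bar / (s_d / math.sqrt(n))
df = n - 1

print(f"mean difference = {d_bar:.2f}, t = {t_stat:.3f}, df = {df}")
```

A negative statistic here means the weights after the program are lower on average; whether that is significant depends on the t distribution with `$n - 1$` degrees of freedom.
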
---

# Statistical Hypothesis Testing

.pull-top[

## F-Test

| **Test Name**           | **Use**                                                                                                                               | **Example**                                                                                                         |
|-------------------------|---------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------|
| **F-Test**              | To compare the variances of two populations, often as a preliminary check of the equal-variance assumption before a t-test or ANOVA. | Testing if two manufacturing processes have different levels of variability in production.                         |

]

.pull-bottom[

A company wants to compare the variability in assembly times between two different manufacturing processes to ensure consistent production. They collect sample assembly times from each process and use an F-Test to determine if there is a significant difference in the variances of the two processes.

`$$F = \frac{\text{Variance}_1}{\text{Variance}_2}$$`
]
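
The variance ratio can be sketched as follows. The assembly times are made-up values, and the degrees of freedom (sample size minus one for each process) are the standard ones for this test rather than something stated on the slide.

```python
import statistics

# Hypothetical assembly times in minutes, invented for illustration only.
process_1 = [10.0, 18.0, 9.0, 17.0, 16.0]
process_2 = [12.0, 15.0, 11.0, 14.0, 13.0]

var_1 = statistics.variance(process_1)  # sample variance (n - 1 denominator)
var_2 = statistics.variance(process_2)

f_stat = var_1 / var_2
df_1 = len(process_1) - 1  # numerator degrees of freedom
df_2 = len(process_2) - 1  # denominator degrees of freedom

print(f"F = {f_stat:.2f} with ({df_1}, {df_2}) degrees of freedom")
```

A ratio far from 1 suggests unequal variances; the decision itself requires comparing the statistic with the F distribution for those degrees of freedom.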

---

# Statistical Hypothesis Testing

.pull-top[

## ANOVA

**ANOVA (Analysis of Variance)**
   - **Types**:
     - **One-way ANOVA**: Tests for differences in means among three or more independent groups.
     - **Two-way ANOVA**: Tests for the influence of two categorical variables on a continuous outcome.
   - **Use**: When comparing means across multiple groups (more than two).
   - **Example**: Testing if three different fertilizers result in different crop yields.

]

.pull-bottom[

The formula for one-way ANOVA is given by:

`$$F = \frac{\text{MS}_{\text{between}}}{\text{MS}_{\text{within}}}$$`

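Where:
- `$F$` = F statistic
- `$\text{MS}_{\text{between}}$` = Mean Square Between Groups, i.e. `$SS_{\text{between}} / (k - 1)$` for `$k$` groups
- `$\text{MS}_{\text{within}}$` = Mean Square Within Groups, i.e. `$SS_{\text{within}} / (N - k)$` for `$N$` total observations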
]

---

# Statistical Hypothesis Testing

.pull-top[

## ANOVA Sum of Squares Calculations

The sums of squares between and within groups are calculated as follows:

1. **Sum of Squares Between Groups** `$SS_{\text{between}}$`:

`$$SS_{\text{between}} = \sum_{i=1}^{k} n_i (\bar{X}_i - \bar{X})^2$$`

Where:
   - `$k$` = number of groups
   - `$n_i$` = number of observations in group `$i$`
   - `$\bar{X}_i$` = mean of group `$i$`
   - `$\bar{X}$` = overall mean of all groups combined
   
]

---

# Statistical Hypothesis Testing

.pull-top[

## ANOVA Sum of Squares Calculations

2. **Sum of Squares Within Groups** `$SS_{\text{within}}$`:

`$$SS_{\text{within}} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2$$`

Where:
   - `$X_{ij}$` = individual observation `$j$` in group `$i$`
   - `$\bar{X}_i$` = mean of group `$i$`
   
]
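
Both sums of squares, and the F statistic built from them, can be computed directly from their definitions. This is a minimal sketch with made-up crop yields for three fertilizers; the mean squares divide each sum of squares by its usual degrees of freedom, `$k - 1$` between groups and `$N - k$` within groups, where `$N$` is the total number of observations.

```python
import statistics

# Hypothetical crop yields per fertilizer, invented for illustration only.
groups = {
    "A": [20, 22, 24],
    "B": [25, 27, 29],
    "C": [30, 32, 34],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = statistics.mean(all_values)
k = len(groups)            # number of groups
n_total = len(all_values)  # total number of observations

# SS_between: group size times squared distance of group mean from grand mean.
ss_between = sum(
    len(values) * (statistics.mean(values) - grand_mean) ** 2
    for values in groups.values()
)
# SS_within: squared distance of each observation from its own group mean.
ss_within = sum(
    (x - statistics.mean(values)) ** 2
    for values in groups.values()
    for x in values
)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_stat = ms_between / ms_within

print(f"SS_between = {ss_between}, SS_within = {ss_within}, F = {f_stat:.2f}")
```

A large F indicates that the group means differ by more than the spread within groups would explain.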

---

# Statistical Hypothesis Testing

.pull-top[

## Chi-Square Test

* **Definition**: The Chi-Square Test is a statistical test used to determine if there is a significant association between categorical variables. It is particularly useful for examining the independence of variables in a contingency table.

* **Types of Chi-Square Tests**:
  - **Chi-Square Test for Independence**: Tests if there is a relationship between two categorical variables.
  - **Chi-Square Goodness-of-Fit Test**: Tests if the observed distribution of a single categorical variable fits an expected distribution.

]
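
A goodness-of-fit test can be sketched with the statistic `$\chi^2 = \sum (O - E)^2 / E$`. The die-roll counts below are made up, and the expected counts assume a fair die.

```python
# Hypothetical counts from 60 rolls of a die, invented for illustration only.
observed = [8, 12, 10, 9, 11, 10]
# H0: the die is fair, so every face is expected equally often.
expected = [sum(observed) / len(observed)] * len(observed)

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # number of categories minus one

print(f"chi-square = {chi2:.2f}, df = {df}")
```

A small statistic relative to the degrees of freedom means the observed counts stay close to the expected ones.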

---

# Statistical Hypothesis Testing

.pull-top[

## Chi-Square Test

* **Formula**: The Chi-Square statistic (`$\chi^2$`) is calculated as follows:

`$$\chi^2 = \sum \frac{(O - E)^2}{E}$$`

Where:
- `$\chi^2$` = Chi-Square statistic
- `$O$` = observed frequency
- `$E$` = expected frequency

* **Example**:

A researcher wants to test if there is an association between smoking habits (smoker, non-smoker) and exercise frequency (low, medium, high) among a group of adults. They collect a sample and record the observed frequencies in each category combination. They then use the Chi-Square Test for Independence to determine if there is a statistically significant association between smoking status and exercise frequency.

]
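
The test for independence in this example can be sketched as below. The observed counts are made up; the expected count in each cell uses the standard rule of row total times column total divided by the grand total.

```python
import math

# Hypothetical observed counts, invented for illustration only.
# Rows: smoker, non-smoker. Columns: low, medium, high exercise frequency.
observed = [
    [30, 20, 10],
    [20, 30, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(col_totals) - 1)

# For df == 2 only, the chi-square upper-tail probability is exp(-x / 2).
p_value = math.exp(-chi2 / 2) if df == 2 else None

print(f"chi-square = {chi2:.2f}, df = {df}, p = {p_value}")
```

The closed-form p-value is a shortcut that holds only for 2 degrees of freedom; other table sizes need a chi-square table or a statistics package.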

---

class: inverse, center, middle
# Thanks