2025-02-09


DAT301 Intro to Hypothesis Testing

Hypothesis testing is a statistical method used to make decisions or inferences about a population based on a sample of data.

The goal of hypothesis testing is to determine whether there is enough statistical evidence in the sample data to reject the null hypothesis in favor of the alternative hypothesis.

For example, consider the question: does a new drug improve recovery rates compared with the standard drug?

DAT301 Null and Alternative Hypotheses

  • Null Hypothesis (H₀): A statement of no effect, no difference, or status quo.

    \(H_0: \mu =\mu_0\)

  • Alternative Hypothesis (H₁ or Ha): A statement that contradicts the null hypothesis, representing a new effect or difference.

    \(H_a: \mu \neq \mu_0\)

In our example, the null and alternative hypotheses are:

  • Null Hypothesis: The recovery rate of the new drug is the same as that of the standard drug.

  • Alternative Hypothesis: The new drug improves recovery rates compared to the standard drug.
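The same hypotheses can be written in terms of recovery proportions. Here \(p_{\text{new}}\) and \(p_{\text{std}}\) are our own notation (not from the original slides) for the true recovery rates of the new and standard drug:

```latex
H_0: p_{\text{new}} = p_{\text{std}}
\qquad\text{vs.}\qquad
H_a: p_{\text{new}} > p_{\text{std}}
```

Note that the chi-square test used later is non-directional: it tests \(H_0\) against \(p_{\text{new}} \neq p_{\text{std}}\), so the direction of any difference has to be read from the data.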

DAT301 Exploring the Data

DAT301 Recovery Rate

DAT301 Recovery Time

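DAT301 Types of Hypothesis Tests

Many different tests can be used for hypothesis testing, depending on the type of data and the question being asked.

A key distinction is between one-sided and two-sided tests: a one-sided test looks for a difference in a specific direction (for example \(H_a: \mu > \mu_0\)), while a two-sided test looks for a difference in either direction (\(H_a: \mu \neq \mu_0\)).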
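As a minimal sketch of the difference, base R's `prop.test()` can run both versions on the drug data. The counts are copied from the contingency table printed on the Chi-square test slide (47 of 50 recovered on the new drug, 36 of 50 on the standard); the variable names are our own.

```r
# Recovered counts and group sizes, taken from the printed contingency table
recovered <- c(New = 47, Standard = 36)
patients  <- c(New = 50, Standard = 50)

# Two-sided test: H_a is p_new != p_std
two_sided <- prop.test(recovered, patients, alternative = "two.sided")

# One-sided test: H_a is p_new > p_std (first group compared with the second)
one_sided <- prop.test(recovered, patients, alternative = "greater")

print(c(two_sided = two_sided$p.value, one_sided = one_sided$p.value))
```

Because the observed difference lies in the hypothesized direction, the one-sided p-value is half the two-sided one. With Yates' continuity correction (the default), the two-sided version is equivalent to the Pearson chi-square test on the 2×2 table.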

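A chi-square test compares the observed counts in each category with the counts we would expect if the null hypothesis were true. The larger the discrepancies relative to the expected counts, the stronger the evidence against the null hypothesis. The statistic is compared with a chi-square distribution whose degrees of freedom, for a contingency table, are \((\text{rows} - 1)(\text{columns} - 1)\), i.e. 1 for a 2×2 table.

For our example we will perform a chi-square test of independence between drug type (new vs standard) and outcome (recovered vs not recovered).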

\(\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}\)

  • \(O_i\) is the observed frequency in cell \(i\) of the contingency table.

  • \(E_i\) is the expected frequency in cell \(i\) if the null hypothesis is true, calculated as:

    \(E_i = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}\)
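Applying these formulas by hand to the drug data gives a useful check on what `chisq.test()` reports. This is a sketch that rebuilds the table from the counts printed on the Chi-square test slide; the variable names are illustrative.

```r
# Observed counts reconstructed from the printed contingency table
observed <- matrix(c(3, 14, 47, 36), nrow = 2,
                   dimnames = list(Drug = c("New", "Standard"),
                                   Outcome = c("Not Recovered", "Recovered")))

# E_i = (row total * column total) / grand total, for every cell at once
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-square statistic: sum of (O_i - E_i)^2 / E_i over all cells
chi_sq <- sum((observed - expected)^2 / expected)

# Yates' continuity correction, as chisq.test() applies to 2x2 tables
# (subtracting 0.5 assumes every |O_i - E_i| >= 0.5, which holds here)
chi_sq_yates <- sum((abs(observed - expected) - 0.5)^2 / expected)

print(expected)      # 8.5 and 41.5 in each row
print(chi_sq)        # about 8.575
print(chi_sq_yates)  # about 7.087, the X-squared reported by chisq.test()
```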

DAT301 Chi-square test

# Create a contingency table
recovery_table <- table(drug_recovery_data$Drug_Type, drug_recovery_data$Recover_Rate)
colnames(recovery_table) <- c("Not Recovered", "Recovered")
# View the table
print(recovery_table)
##           
##            Not Recovered Recovered
##   New                  3        47
##   Standard            14        36
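# Perform a chi-square test of independence
# (for a 2x2 table, chisq.test() applies Yates' continuity correction by default)
Chi_test <- chisq.test(recovery_table)
# Output the test results
print(Chi_test)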
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  recovery_table
## X-squared = 7.0872, df = 1, p-value = 0.007764
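The object returned by `chisq.test()` is a list of class `htest`, so individual results can be extracted by name. Because `drug_recovery_data` itself is not shown, this sketch rebuilds `recovery_table` from the printed counts. It also checks the common rule of thumb that every expected count should be at least 5 for the chi-square approximation to be reasonable.

```r
# Rebuild the contingency table from the printed counts
recovery_table <- matrix(c(3, 14, 47, 36), nrow = 2,
                         dimnames = list(c("New", "Standard"),
                                         c("Not Recovered", "Recovered")))

Chi_test <- chisq.test(recovery_table)

# Individual components of the htest object
print(Chi_test$statistic)  # X-squared
print(Chi_test$parameter)  # df
print(Chi_test$p.value)
print(Chi_test$expected)   # expected counts under H0

# Rule of thumb: the approximation is reasonable if all E_i >= 5
print(all(Chi_test$expected >= 5))

# Same test without Yates' continuity correction, for comparison
print(chisq.test(recovery_table, correct = FALSE)$p.value)
```

Dropping the correction gives a larger statistic (about 8.575) and a smaller p-value; for this data both versions fall below 0.05, so the decision is the same.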

DAT301 p-value

  • A p-value is the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed.

  • The lower the p-value, the stronger the evidence against the null hypothesis.

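If the p-value is below the chosen significance level (here α = 0.05), we reject the null hypothesis. The chi-square test itself only shows that recovery depends on drug type; the direction comes from the table, where 47 of 50 patients (94%) recovered on the new drug versus 36 of 50 (72%) on the standard drug.

Since the p-value of 0.007764 is well below 0.05, we reject the null hypothesis in favor of the alternative: the new drug has a significantly higher recovery rate than the standard drug.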
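The decision rule above can be sketched in a few lines of R. This assumes the X-squared value and degrees of freedom reported by `chisq.test()`; the variable names are illustrative, and the p-value is recomputed from the chi-square distribution with `pchisq()`.

```r
# Values reported by chisq.test() for the drug data
x_squared <- 7.0872
df <- 1
alpha <- 0.05   # significance level

# p-value = P(chi-square with df degrees of freedom >= observed statistic)
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)

# Recovery proportions from the contingency table, to read off the direction
recovery_rate <- c(New = 47 / 50, Standard = 36 / 50)

if (p_value < alpha) {
  cat("Reject H0: p =", signif(p_value, 4), "\n")
} else {
  cat("Fail to reject H0: p =", signif(p_value, 4), "\n")
}
print(recovery_rate)   # New 0.94, Standard 0.72
```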