2025-02-09
DAT301 Intro to Hypothesis Testing
Hypothesis testing is a statistical method used to make decisions or inferences about a population based on a sample of data.
The goal of hypothesis testing is to determine whether there is enough statistical evidence in the sample data to reject the null hypothesis in favor of the alternative hypothesis.
Consider the following scenario: does a new drug improve recovery rates compared with the standard drug?
DAT301 Null and Alternative Hypotheses
Null Hypothesis (H₀): A statement of no effect, no difference, or status quo.
\(H_0: \mu =\mu_0\)
Alternative Hypothesis (H₁ or Ha): A statement that contradicts the null hypothesis, representing a new effect or difference.
\(H_a: \mu \neq \mu_0\)
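In our example, the null and alternative hypotheses can be stated as follows, where \(p_{\text{new}}\) and \(p_{\text{std}}\) denote the recovery rates under the new and the standard drug:
\(H_0: p_{\text{new}} = p_{\text{std}}\) (the recovery rate does not depend on the drug)
\(H_a: p_{\text{new}} \neq p_{\text{std}}\) (the recovery rate differs between the two drugs)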
DAT301 Exploring the Data
DAT301 Recovery Rate

DAT301 Recovery Time

DAT301 Types of Hypothesis Tests
Many different tests can be used for hypothesis testing.
A test can be one-sided or two-sided, depending on whether the alternative hypothesis specifies a direction for the effect.
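As a rough illustration of the difference, the sketch below uses `prop.test()` from base R's stats package. This two-sample test of proportions is swapped in here only because it accepts an `alternative` argument; it is not the test used in the rest of this deck. The counts (47 of 50 recovered on the new drug, 36 of 50 on the standard) are taken from the contingency table printed later, and the variable names are illustrative.

```r
# Recovered patients and group sizes, in the order: New, Standard
recovered  <- c(New = 47, Standard = 36)
group_size <- c(New = 50, Standard = 50)

# Two-sided: H_a says the recovery proportions differ in either direction
two_sided <- prop.test(recovered, group_size, alternative = "two.sided")

# One-sided: H_a says the first group (New) has the higher proportion
one_sided <- prop.test(recovered, group_size, alternative = "greater")

print(two_sided$p.value)
print(one_sided$p.value)
```

Because the observed difference points in the direction named by the one-sided alternative, the one-sided p-value is half the two-sided one.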
A chi-square test compares the size of the discrepancies between the expected and the observed results, given the sample size and the number of variables in the relationship.
For our example we will perform a chi-square test.
\(\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}\)
The observed frequency is:
\(O_i\)
The expected frequency is:
\(E_i\)
Expected frequency is calculated as:
\(E_i = \frac{(\text{Row Total} \times \text{Column Total})}{\text{Grand Total}}\)
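The formulas above can be sketched in a few lines of R. This is a minimal illustration rather than the deck's own code: the counts are copied from the contingency table printed later in this deck, and the names `observed`, `expected`, and `chi_sq` are illustrative.

```r
# Observed counts, copied from the contingency table in this deck
observed <- matrix(
  c(3, 47,
    14, 36),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("New", "Standard"), c("Not Recovered", "Recovered"))
)

# E = (row total * column total) / grand total, for every cell at once
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic: sum of (O - E)^2 / E over all cells
chi_sq <- sum((observed - expected)^2 / expected)

print(expected)
print(chi_sq)
```

Both groups contain 50 patients, so the expected counts are the same in each row: 8.5 not recovered and 41.5 recovered. Note that this is the uncorrected statistic; R's `chisq.test()` applies Yates' continuity correction to 2×2 tables by default, so the value it reports is smaller.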
DAT301 Chi-square test
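The code on this slide relies on a data frame named `drug_recovery_data`, which is not shown in this deck. To try the steps, the sketch below builds a hypothetical stand-in with the same cell counts as the printed table. The column names `Drug_Type` and `Recover_Rate` come from the code on this slide; the 0/1 coding of `Recover_Rate` and the row order are assumptions.

```r
# Hypothetical stand-in for drug_recovery_data (the real data set is not shown).
# Counts match the printed table: New = 3 not recovered / 47 recovered,
# Standard = 14 not recovered / 36 recovered.
drug_recovery_data <- data.frame(
  Drug_Type    = rep(c("New", "Standard"), each = 50),
  Recover_Rate = c(rep(c(0, 1), times = c(3, 47)),   # New
                   rep(c(0, 1), times = c(14, 36)))  # Standard
)

# Quick check that the stand-in reproduces the cell counts
counts <- table(drug_recovery_data$Drug_Type, drug_recovery_data$Recover_Rate)
print(counts)
```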
# Create a contingency table
recovery_table <- table(drug_recovery_data$Drug_Type, drug_recovery_data$Recover_Rate)
colnames(recovery_table) <- c("Not Recovered", "Recovered")
# View the table
print(recovery_table)
##
##            Not Recovered Recovered
##   New                  3        47
##   Standard            14        36
# Perform a chi-square test
Chi_test <- chisq.test(recovery_table)
# Output the test results
print(Chi_test)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: recovery_table
## X-squared = 7.0872, df = 1, p-value = 0.007764
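The output header mentions Yates' continuity correction, which `chisq.test()` applies by default to 2×2 tables: each \(|O_i - E_i|\) is reduced by 0.5 (or by \(|O_i - E_i|\) itself when that is smaller) before squaring. The sketch below re-enters the printed counts as a matrix, since the original data frame is not shown, and reproduces the reported statistic by hand. The names `recovery_counts`, `corrected`, `yates_by_hand`, and `uncorrected` are illustrative.

```r
# Printed counts re-entered as a matrix
recovery_counts <- matrix(
  c(3, 47,
    14, 36),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("New", "Standard"), c("Not Recovered", "Recovered"))
)

# Default for a 2x2 table: Yates' continuity correction is applied
corrected <- chisq.test(recovery_counts)

# Manual version: shrink each |O - E| by 0.5 before squaring
expected <- corrected$expected
yates_by_hand <- sum((abs(recovery_counts - expected) - 0.5)^2 / expected)

# Same test without the correction, i.e. the plain Pearson formula
uncorrected <- chisq.test(recovery_counts, correct = FALSE)

print(unname(corrected$statistic))    # about 7.0872, as in the output above
print(yates_by_hand)                  # same value, computed by hand
print(unname(uncorrected$statistic))  # larger, since no correction is applied
```

The correction makes the test slightly more conservative, which is why the reported statistic is smaller than the one obtained from the plain \(\chi^2\) formula.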
DAT301 p-value
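If the p-value is below the 0.05 significance level, we reject the null hypothesis and conclude that the recovery rate differs between the two drugs. The chi-square test itself does not indicate a direction; that comes from the observed rates, which are 47/50 = 94% for the new drug and 36/50 = 72% for the standard drug.
Since the p-value is 0.007764, which is below 0.05, we reject the null hypothesis in favor of the alternative hypothesis.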
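The reported p-value is the upper-tail probability of a chi-square distribution with 1 degree of freedom beyond the test statistic. A minimal sketch of that calculation follows, using the rounded statistic from the printed output and the 0.05 level from the text; the variable names are illustrative.

```r
# Values copied from the printed test output
statistic <- 7.0872
df <- 1
alpha <- 0.05  # significance level used in the text

# P(chi-square with `df` degrees of freedom >= statistic)
p_value <- pchisq(statistic, df = df, lower.tail = FALSE)

# Critical value: the statistic must exceed this to reject at level alpha
critical_value <- qchisq(1 - alpha, df = df)

print(p_value)
print(critical_value)
print(p_value < alpha)
```

Comparing the p-value with `alpha` and comparing the statistic with `critical_value` are equivalent ways of reaching the same decision.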