2023-11-12

Hypothesis Testing

Hypothesis testing is a statistical method used to determine whether sample data provide enough evidence to draw conclusions about a population.

A statistical hypothesis is an assumption, statement, or claim that may or may not be true. Commonly we formulate two hypotheses: a so-called null hypothesis (H0) and an alternative hypothesis (H1).

The following six-step procedure can be used to conduct most common hypothesis tests:

  1. H0: State the null hypothesis
  2. H1: State the alternative hypothesis
  3. Select a level of significance for the test (commonly α = 0.10, 0.05, or 0.01)
  4. Select an appropriate test statistic and determine the critical region
  5. Evaluate the test statistic from a random sample
  6. State the conclusion: Reject the null hypothesis, H0, if the test statistic value falls in the critical region; otherwise fail to reject H0.
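
The six steps can be sketched in R as follows. This is only an illustration, not necessarily the test used for these slides: it assumes we test for a linear association between the UrbanPop and Assault columns of USArrests using the t statistic for a correlation coefficient, and the names alpha, t_crit, t_stat, and reject_h0 are chosen here for the example.

```r
# Steps 1-2: H0: rho = 0 (no linear association); H1: rho != 0 (two-sided)

# Step 3: choose the significance level (assumed value for this sketch)
alpha <- 0.05

# Step 4: test statistic t = r * sqrt(n - 2) / sqrt(1 - r^2);
# the two-sided critical region is |t| > t_crit with n - 2 degrees of freedom
n <- nrow(USArrests)
t_crit <- qt(1 - alpha / 2, df = n - 2)

# Step 5: evaluate the test statistic from the sample
r <- cor(USArrests$UrbanPop, USArrests$Assault)
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Step 6: reject H0 only if the statistic falls in the critical region
reject_h0 <- abs(t_stat) > t_crit
reject_h0
```

With these data the statistic is about 1.857, the same t value reported for the slope in the regression output later in this deck, so H0 is not rejected at α = 0.05.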

Summary of USArrests Data

summary(USArrests)
##      Murder          Assault         UrbanPop          Rape      
##  Min.   : 0.800   Min.   : 45.0   Min.   :32.00   Min.   : 7.30  
##  1st Qu.: 4.075   1st Qu.:109.0   1st Qu.:54.50   1st Qu.:15.07  
##  Median : 7.250   Median :159.0   Median :66.00   Median :20.10  
##  Mean   : 7.788   Mean   :170.8   Mean   :65.54   Mean   :21.23  
##  3rd Qu.:11.250   3rd Qu.:249.0   3rd Qu.:77.75   3rd Qu.:26.18  
##  Max.   :17.400   Max.   :337.0   Max.   :91.00   Max.   :46.00

P-Value

P-value ≤ α: The difference between the means is statistically significant (reject H0). If the p-value is less than or equal to the significance level, the decision is to reject the null hypothesis.

P-value > α: The difference between the means is not statistically significant (fail to reject H0). If the p-value is greater than the significance level, the decision is to fail to reject the null hypothesis.
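
The decision rule can be written as a small helper. The function name decide is hypothetical and introduced only for this sketch; the p-value 0.0695 is the one reported for the slope in the regression output in this deck.

```r
# Hypothetical helper: apply the p-value decision rule at level alpha
decide <- function(p_value, alpha = 0.05) {
  if (p_value <= alpha) "Reject H0" else "Fail to reject H0"
}

decide(0.0695, alpha = 0.05)  # "Fail to reject H0"
decide(0.0695, alpha = 0.10)  # "Reject H0"
```

The same p-value leads to different decisions at different significance levels, which is why α should be fixed before looking at the data.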

Statistics

## 
## Call:
## lm(formula = assault_rates ~ population, data = USArrests)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -150.78  -61.85  -18.68   58.05  196.85 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  73.0766    53.8508   1.357   0.1811  
## population    1.4904     0.8027   1.857   0.0695 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 81.33 on 48 degrees of freedom
## Multiple R-squared:  0.06701,    Adjusted R-squared:  0.04758 
## F-statistic: 3.448 on 1 and 48 DF,  p-value: 0.06948
## [1] 0.2588717
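
The code that produced this output is not shown. A sketch that should reproduce it is given here, under the assumption that population and assault_rates are copies of the UrbanPop and Assault columns of USArrests; that assumption is consistent with the reported slope and with the value 0.2588717, which matches the correlation between those two columns.

```r
# Assumed definitions: these two vectors are not defined anywhere in the slides
population <- USArrests$UrbanPop
assault_rates <- USArrests$Assault

# Simple linear regression of assault arrests on urban population
fit <- lm(assault_rates ~ population, data = USArrests)
summary(fit)

# Pearson correlation between the two variables
cor(population, assault_rates)
```

In a simple linear regression, the square of this correlation equals the Multiple R-squared (0.2588717^2 ≈ 0.067), and the p-value of the slope equals the p-value of the F-statistic.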

Correlation between Urban Population and Assault Arrests

Scatter Plot: Urban Population and Assault Arrests

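# Scatter plot; UrbanPop and Assault are presumably the columns behind `population` and `assault_rates`
library(ggplot2)
g <- ggplot(data = USArrests, aes(x = UrbanPop, y = Assault)) +
  geom_point() +
  labs(x = "Population", y = "Assault")
g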

Linear Regression Scatter Plot: Urban Population and Assault Arrests

# Add a fitted linear regression line (with confidence band) to the scatter plot g
g2 <- g + geom_smooth(method = "lm") + labs(x = "Population", y = "Assault")
g2
## `geom_smooth()` using formula = 'y ~ x'