Introduction: Definitions

Types of Hypotheses

Null Hypothesis: This is the default assumption that there is no real difference (or effect) between the populations being compared.

Alternative Hypothesis: This is the claim the test looks for evidence of. It states that there is a real difference (or effect) between the populations being compared.

P-Value

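Definition: The p-value is the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one actually observed. It measures the strength of evidence against the null hypothesis: the smaller the p-value, the stronger the evidence.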
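To make the definition concrete, here is a minimal permutation-test sketch in R. It assumes R's built-in `PlantGrowth` dataset as a stand-in (it is not the flowers data used later in these slides). If the null hypothesis is true, the group labels are interchangeable, so shuffling them shows how large a difference in means can arise by chance alone; the p-value is the share of shuffles at least as extreme as the observed difference. The permutation test is a swapped-in illustration of the idea, not the test used elsewhere in this deck.

```r
set.seed(1)  # make the random shuffles reproducible

# Stand-in data (assumption): two groups from R's built-in PlantGrowth dataset
pg <- subset(PlantGrowth, group %in% c("ctrl", "trt2"))

# Observed difference in group means
diff_means <- function(y, g) mean(y[g == "trt2"]) - mean(y[g == "ctrl"])
obs <- diff_means(pg$weight, pg$group)

# Under H0 the labels are arbitrary: shuffle them many times and record
# the difference in means that each shuffle produces by chance alone
n_perm <- 10000
perm <- replicate(n_perm, diff_means(pg$weight, sample(pg$group)))

# Two-sided p-value: share of shuffles at least as extreme as observed
# (the +1 terms count the observed labelling as one of the permutations)
p_perm <- (sum(abs(perm) >= abs(obs)) + 1) / (n_perm + 1)
p_perm
```

A small p-value here means that differences as large as the observed one rarely appear when the labels are shuffled at random.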

Introduction: Hypothesis Testing

Application:

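  1. A confidence level is chosen before the test is run, usually 95%. This sets how strong the evidence must be before the null hypothesis is rejected; a 95% confidence level corresponds to a significance level of 0.05.

  2. A statistical test is applied to the dataset to calculate a p-value.

  3. The p-value is compared with the significance level to decide whether the null hypothesis can be rejected.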
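The three steps above can be sketched in base R as follows. This is a minimal sketch, assuming R's built-in `PlantGrowth` dataset as stand-in data and a two-sample t-test as the statistical test; the variable names (`conf_level`, `alpha`, `p_value`, `decision`) are illustrative only.

```r
# Step 1: choose the confidence level (and hence alpha) in advance
conf_level <- 0.95
alpha <- 1 - conf_level  # significance level

# Step 2: run a statistical test to obtain a p-value.
# Stand-in data (assumption): two groups from the built-in PlantGrowth dataset
ctrl <- PlantGrowth$weight[PlantGrowth$group == "ctrl"]
trt2 <- PlantGrowth$weight[PlantGrowth$group == "trt2"]
p_value <- t.test(ctrl, trt2)$p.value  # Welch two-sample t-test (R's default)

# Step 3: compare the p-value with alpha
if (p_value < alpha) {
  decision <- "Reject H0"
} else {
  decision <- "Fail to reject H0"
}
decision
```

Fixing `alpha` before running the test keeps the decision rule from being tuned to the result.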

How?

If \(p < 0.05\), then…

the observed data would be unlikely if the null hypothesis were true, at the 95% confidence level. The null hypothesis is rejected in favor of the alternative hypothesis.

If \(p \geq 0.05\), then…

the null hypothesis cannot be rejected. There is insufficient evidence to suggest a significant difference between the populations being tested.

Example Data Set

Flowers Dataset

   treat nitrogen block height weight leafarea shootarea flowers
1    tip   medium     1    7.5   7.62     11.7      31.9       1
4    tip   medium     1   10.4   8.78     11.9      20.3       1
13   tip   medium     2    7.1   8.16     29.6       9.7       2
35   tip      low     1    6.4   5.97      8.7       7.3       2
46   tip      low     2    5.6   8.10     10.1       5.8       2
77 notip     high     2    6.0  13.68     16.2     133.7       2

Dataset contains 96 observations with 8 variables.

Variables of Interest:

  - Nitrogen level (low, medium, high)
  - Flowers (number of flowers)

Null Hypothesis: The mean number of flowers is the same at all three nitrogen levels.

Alternative Hypothesis: The mean number of flowers differs for at least one nitrogen level.

Data Visualization P.1

Box Plot using ggplot

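library(ggplot2)

# Assumes `data` holds the flowers dataset; order the levels low -> medium -> high
data$nitrogen <- factor(data$nitrogen, levels = c("low", "medium", "high"))

ggplot(data, aes(x = nitrogen, y = flowers, fill = nitrogen)) +
  geom_boxplot() +
  labs(title = "Distribution of Flowers by Nitrogen Level",
       x = "Nitrogen Level",
       y = "Number of Flowers") +
  theme(plot.title = element_text(hjust = 0.5))  # center the title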

Data Visualization P.2

Density Plot using ggplot

Data Visualization P.3

Violin Plot using plotly

Testing

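The hypotheses can be represented mathematically as follows, where \(\mu_1\), \(\mu_2\), and \(\mu_3\) are the mean number of flowers at the low, medium, and high nitrogen levels:

Null Hypothesis: \(H_0: \mu_1 = \mu_2 = \mu_3\)

Alternative Hypothesis: \(H_{\text{alt}}: \mu_i \neq \mu_j\) for at least one pair \(i \neq j\)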

Confidence Level: 95%

Significance Level:

\(\alpha = 1-0.95\)

\(\alpha = 0.05\)

Results

A t-test can be used to compare the mean number of flowers between two nitrogen levels. To test all three levels at once, as stated in \(H_0\), a one-way ANOVA can be used.

If \(p < \alpha\), the null hypothesis is rejected. There is evidence of a statistically significant difference in the mean number of flowers between nitrogen levels.

If \(p \geq \alpha\), the null hypothesis cannot be rejected. The data do not provide statistically significant evidence of a difference in the mean number of flowers between nitrogen levels.
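A minimal sketch of the three-group comparison in base R follows. Because the flowers data are not bundled with R, it assumes the built-in `PlantGrowth` dataset as a stand-in: one numeric response (`weight`) and one three-level factor (`group`), the same shape as `flowers ~ nitrogen`. The ANOVA tests \(H_0\) for all three means at once; the pairwise t-tests with Holm's adjustment (a swapped-in follow-up, not part of the original slides) show which pairs differ.

```r
alpha <- 0.05

# One-way ANOVA: H0 says all three group means are equal
fit <- aov(weight ~ group, data = PlantGrowth)
p_anova <- summary(fit)[[1]][["Pr(>F)"]][1]  # p-value for the group factor

# Pairwise t-tests between groups; Holm's method adjusts the p-values
# for making several comparisons on the same data
pairwise <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                            p.adjust.method = "holm")

cat("ANOVA p-value:", signif(p_anova, 3), "\n")
cat(if (p_anova < alpha) "Reject H0" else "Fail to reject H0", "\n")
print(pairwise$p.value)
```

With the flowers dataset loaded as `data`, the corresponding call would be `aov(flowers ~ nitrogen, data = data)`.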