2023-10-15

Introduction

  • A p-value is a statistical measure of the strength of evidence against a null hypothesis.
  • The null hypothesis is a statement that there is no effect or no relationship between two variables.
  • The p-value is calculated by assuming that the null hypothesis is true and then finding the probability of obtaining data at least as extreme as the data actually observed.
  • A low p-value indicates that the data are unlikely to have occurred under the null hypothesis, and thus provides evidence against it.
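
The definition above can be illustrated with a small simulation. This is only a sketch under assumed values: the per-group sample size, the common mean and standard deviation, the observed statistic, and the seed are all hypothetical choices, not taken from real data. It repeatedly draws two groups from the same distribution (so the null hypothesis is true by construction) and counts how often the t statistic is at least as extreme as the observed one.

```r
set.seed(1)            # arbitrary seed, for reproducibility only
n <- 50                # hypothetical per-group sample size
observed_t <- 2.5      # hypothetical observed test statistic
n_sims <- 5000         # number of simulated experiments

# Simulate the t statistic when the null hypothesis is true:
# both groups are drawn from the same normal distribution.
null_t <- replicate(n_sims, {
  a <- rnorm(n, mean = 75, sd = 10)
  b <- rnorm(n, mean = 75, sd = 10)
  unname(t.test(a, b, var.equal = TRUE)$statistic)
})

# Share of simulated statistics at least as extreme as the observed one
p_sim <- mean(abs(null_t) >= abs(observed_t))
p_sim
```

The resulting proportion is a Monte Carlo estimate of the two-sided p-value, so it varies slightly with the seed and the number of simulations.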

Example

To see how a p-value is used in practice, we will work through an example and create several plots to visualize it. In this example, we compare the test scores of two groups, Group A and Group B.

Calculating Test Statistic

Suppose we conducted a two-sample t-test with the following summary statistics:

  • Difference in sample means: 5
  • Sample standard deviation: 10 (in each group)
  • Sample size: 50 (per group)
  • Degrees of freedom: 98
  • Null hypothesis: \(\mu_A - \mu_B = 0\)
  • Alternative hypothesis: \(\mu_A - \mu_B \neq 0\)

We calculated the test statistic to be \(t = 2.5\).
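
The value of the test statistic can be reproduced from the summary statistics. The sketch below assumes a pooled two-sample t-test with 50 observations in each group and the same standard deviation of 10 in both groups; these assumptions are inferred from the 98 degrees of freedom rather than stated on the slide, and the variable names are illustrative.

```r
# Summary statistics from the slide
mean_diff <- 5    # difference in sample means
s <- 10           # sample standard deviation (assumed common to both groups)
n <- 50           # sample size (assumed to be per group)

# Standard error of the difference for two equal-sized groups
se <- s * sqrt(2 / n)

# Test statistic and degrees of freedom for a pooled two-sample t-test
t_stat <- mean_diff / se
df <- 2 * n - 2

t_stat  # 2.5
df      # 98
```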

Calculating P-Value

For a two-sided test, the p-value is: \[p = 2 \cdot (1 - F(|t|))\]

where \(F\) is the cumulative distribution function of the t-distribution with 98 degrees of freedom, as given on the previous slide.

Using this formula with \(t = 2.5\), we find: \[p = 2 \cdot (1 - F(2.5)) \approx 0.014\]
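
The p-value for this test statistic can be evaluated numerically. The sketch below uses base R's `pt()`, the cumulative distribution function of the t-distribution; the variable names are illustrative.

```r
t_stat <- 2.5   # test statistic from the previous slide
df <- 98        # degrees of freedom

# Two-sided p-value: twice the upper-tail probability beyond |t|
p_value <- 2 * (1 - pt(abs(t_stat), df))

# Equivalent form using the lower tail directly
p_value_alt <- 2 * pt(-abs(t_stat), df)

p_value  # roughly 0.014
```

Taking the absolute value of the statistic makes the calculation work for negative values of t as well.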

Showcasing Plotly

Analysis

  • Group A and Group B have test scores that are approximately normally distributed.
  • We performed a t-test to compare the means of the two groups and extracted the p-value.
  • The Plotly plot displays histograms for both groups and annotates the p-value on the plot.
  • What does this mean? The p-value measures how surprising the observed difference would be if the means of the two groups were equal. If the p-value is less than the chosen significance level (in our case, 0.05), we reject the null hypothesis; otherwise, we fail to reject it.
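
The test and the decision rule can be sketched in base R. The data behind the slides are not shown, so this example simulates scores instead: the group means of 75 and 80, the standard deviation of 10, the group size of 50, and the seed are assumptions made purely for illustration.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

# Simulated test scores (assumed parameters, not the original data)
group_a <- rnorm(50, mean = 75, sd = 10)
group_b <- rnorm(50, mean = 80, sd = 10)

# Two-sided, pooled two-sample t-test of H0: mu_A - mu_B = 0
result <- t.test(group_a, group_b, var.equal = TRUE)

# Compare the p-value with the chosen significance level
alpha <- 0.05
decision <- if (result$p.value < alpha) "reject H0" else "fail to reject H0"

result$p.value
decision
```

Setting `var.equal = TRUE` requests the pooled test with 98 degrees of freedom; R's default is the Welch test, which does not assume equal variances.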

Showcasing ggplot

Showcasing ggplot (continued)

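The boxplot is produced with code along these lines.

library(ggplot2)

# `data` is assumed to be a data frame with a `Group` column
# (Group A or Group B) and a numeric `Scores` column.
ggplot(data, aes(x = Group, y = Scores)) +
  geom_boxplot(fill = "lightblue", alpha = 0.7) +
  labs(title = "Boxplot of Test Scores by Group (ggplot)",
       x = "Group",
       y = "Test Scores") +
  theme_minimal()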

Analysis

  • Histogram of Test Scores (ggplot):
    • This histogram displays the distribution of test scores for Group A and Group B.
    • Group A’s scores appear to be centered around 75, while Group B’s scores are centered around 80.
    • Group B tends to have higher scores.
  • Boxplot of Test Scores by Group (ggplot):
    • The boxplot shows the median, quartiles, and potential outliers for each group.
    • It confirms the difference in central tendency between the two groups, with Group B having the higher median score.
    • Group A has a wider spread of scores than Group B.