p-value in hypothesis testing

2024-02-03

Introduction to p-value in hypothesis testing

-The P-value is a statistical measure that quantifies the strength of evidence against the null hypothesis.

-It is the probability of obtaining test results at least as extreme as the observed results, assuming that the null hypothesis is true.

-A small P-value (typically less than the chosen significance level, often denoted as alpha) suggests strong evidence against the null hypothesis.

-Conversely, a large P-value indicates weak evidence against the null hypothesis and suggests that the observed data is consistent with the null hypothesis.

-P-values do not directly provide the probability of the null hypothesis being true or false, but rather indicate the strength of evidence against it.
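The definition above can be checked with a small simulation. This is a minimal sketch under stated assumptions, not part of the original analysis: the group size (50), the observed statistic (`t_obs = 2.0`), the seed, and the number of simulations are all hypothetical. Both groups are drawn from the same distribution, so the null hypothesis is true by construction, and the p-value is estimated as the share of simulated t statistics at least as extreme as the observed one.

```r
# Hypothetical setup: both groups come from the SAME normal distribution,
# so the null hypothesis (no difference in means) holds by construction.
set.seed(1)            # arbitrary seed, for reproducibility only
n      <- 50           # assumed group size
t_obs  <- 2.0          # hypothetical observed t statistic
n_sims <- 5000

# t statistic for one simulated pair of samples (unpooled standard error)
sim_t <- function() {
  a <- rnorm(n)
  b <- rnorm(n)
  (mean(a) - mean(b)) / sqrt(var(a) / n + var(b) / n)
}

null_t <- replicate(n_sims, sim_t())

# Share of null-world statistics at least as extreme as the observed one
p_sim <- mean(abs(null_t) >= abs(t_obs))

# Analytic two-sided p-value from the t distribution with 2n - 2 degrees of freedom
p_theory <- 2 * pt(-abs(t_obs), df = 2 * n - 2)

print(c(simulated = p_sim, analytic = p_theory))
```

The two numbers should agree up to simulation noise, which is what "probability of results at least as extreme, assuming the null hypothesis is true" means in practice.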

Example for more understanding

-To understand the p-value and how it is applied, we work through an example and generate several charts to illustrate it. In this example we compare the test results of Groups A and B.

-We’ll calculate the P-value and visualize the data using plotly.

Process to calculate t value:

\(\text{Null hypothesis: } \mu_A - \mu_B = 0\)

\(\text{Alternative hypothesis: } \mu_A - \mu_B \neq 0\)

Formula for t value:

\(t = \frac{(M_A - M_B) - (\mu_A - \mu_B)}{S_{(M_A - M_B)}}\)

\(M_A\) = Sample mean of group A

\(M_B\) = Sample mean of group B

\(\mu_A\) = Population mean of group A

\(\mu_B\) = Population mean of group B

\(S_{(M_A - M_B)}\) = Estimated standard error for the difference between two means
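As a rough illustration, the t value can be computed by hand in R. The scores below are made up for the example and are not the data behind these slides. The sketch also assumes the unpooled (Welch) form of the standard error, \(\sqrt{s_A^2/n_A + s_B^2/n_B}\), which is what R's `t.test()` uses by default; the source does not say which form was used.

```r
# Hypothetical scores, made up for illustration (not the data behind the slides)
scores_a <- c(72, 68, 75, 70, 66, 74, 69, 71)
scores_b <- c(78, 74, 80, 73, 77, 79, 75, 76)

m_a <- mean(scores_a)   # M_A
m_b <- mean(scores_b)   # M_B

# Estimated standard error of the difference (Welch form, an assumption)
se_diff <- sqrt(var(scores_a) / length(scores_a) +
                var(scores_b) / length(scores_b))

# Under the null hypothesis, mu_A - mu_B = 0
t_value <- ((m_a - m_b) - 0) / se_diff

# Cross-check against R's built-in Welch t-test
t_builtin <- unname(t.test(scores_a, scores_b)$statistic)
print(c(by_hand = t_value, t.test = t_builtin))
```

The hand-computed value and the `t.test()` statistic should match, and the sign is negative here because the mean of the first group is lower than that of the second.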

Process to calculate p-value:

Formula

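\(p = 2 \cdot (1 - F(|t|))\), where \(F\) is the cumulative distribution function of the t distribution with the test's degrees of freedom.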

Understanding

-We take \(\alpha = 0.05\)

-If p-value < \(\alpha\), we reject the null hypothesis.

-If p-value \(\geq \alpha\), we fail to reject the null hypothesis.
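The two-sided formula and the decision rule can be evaluated in R with `pt()`, the cumulative distribution function of the t distribution. The sketch below plugs in the t-value reported in this document and assumes 98 degrees of freedom (50 + 50 - 2, inferred from the group size used in the plotting code; the source does not state the degrees of freedom). It takes the absolute value of t before applying the formula, because a negative t passed in directly would give a result above 1.

```r
t_value <- -3.34167488371412   # t-value reported in this document
df      <- 98                  # ASSUMPTION: 50 + 50 - 2; not stated in the source
alpha   <- 0.05

# Two-sided p-value, using |t| so that the sign of t does not matter
p_value <- 2 * (1 - pt(abs(t_value), df))

# What happens if the absolute value is skipped for a negative t
p_naive <- 2 * (1 - pt(t_value, df))

decision <- if (p_value < alpha) "reject H0" else "fail to reject H0"

print(p_value)
print(p_naive)    # greater than 1, so not a valid probability
print(decision)
```

Because the degrees of freedom are an assumption, the result is only expected to land near the reported p-value, not to reproduce it digit for digit.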

The resulting p-value and t-value are:

[1] "p-value: 0.0011799406198469"

[1] "t-value: -3.34167488371412"

Plotly

Analysis

-The test results for Groups A and B are approximately normally distributed.

-To compare the means of these two groups and get the p-value, we used a t-test.

-The Plotly graphic indicates the p-value on the plot and shows the histograms for the two groups.

-The p-value we obtained (about 0.001) is less than the significance level we set (\(\alpha\) = 0.05). We therefore reject the null hypothesis \(\mu_A = \mu_B\) in favor of the alternative \(\mu_A \neq \mu_B\).
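A plot of the kind described here could be put together roughly as follows. This is a sketch, not the code that produced the original figure: the simulated `group_a` and `group_b`, the seed, and the distribution parameters are all assumptions, so the p-value it prints will differ from the one reported above.

```r
library(plotly)

# ASSUMPTION: simulated stand-ins for the two groups; the real data,
# seed, and distribution parameters are not given in the source.
set.seed(123)
group_a <- rnorm(50, mean = 70, sd = 10)
group_b <- rnorm(50, mean = 76, sd = 10)

# Welch two-sample t-test (R's default for t.test)
result <- t.test(group_a, group_b)

# Overlaid histograms with the p-value shown as an annotation
fig <- plot_ly(alpha = 0.6) %>%
  add_histogram(x = group_a, name = "Group A") %>%
  add_histogram(x = group_b, name = "Group B") %>%
  layout(
    barmode = "overlay",
    title = "Test Scores for Group A and Group B",
    xaxis = list(title = "Test Scores"),
    yaxis = list(title = "Count"),
    annotations = list(
      x = 1, y = 1, xref = "paper", yref = "paper",
      text = paste("p-value:", signif(result$p.value, 3)),
      showarrow = FALSE
    )
  )

fig
```

Setting `barmode = "overlay"` with a partial `alpha` keeps both histograms visible where they overlap, which makes the shift between the two groups easier to see.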

Visualization of data (ggplot)

Histogram

The boxplot’s code (ggplot)

library(ggplot2)

# Combining data into a single data frame
data <- data.frame(
  Group = c(rep("Group A", length(group_a)), 
            rep("Group B", length(group_b))),
  Test_Score = c(group_a, group_b)
)

# Creating box plot using ggplot
boxplot <- ggplot(data, aes(x = Group, y = Test_Score, color = Group)) +
  geom_boxplot() +
  labs(title = "Boxplot of Test Scores between Group A and Group B",
       x = "Group",
       y = "Test Scores") +
  theme_minimal()

Visualization of data (ggplot)

Box Plot

boxplot

The density curve’s code (ggplot)

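# Combine data into a single data frame
# (group labels follow the actual group sizes instead of a hard-coded 50)
data <- data.frame(
  group = rep(c("Group A", "Group B"),
              times = c(length(group_a), length(group_b))),
  score = c(group_a, group_b)
)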

# Create ggplot density curve
densitycurve <- ggplot(data, aes(x = score, color = group)) +
  geom_density(alpha = 1) +
  labs(title = "Density Curve of Test Scores for Group A and Group B",
       x = "Test Scores",
       y = "Density") +
  theme_grey()

Visualization of data (ggplot)

Density curve

densitycurve

Analysis

Histogram

-This histogram shows the distribution of test results for Groups A and B. Group B has the highest score.

Box Plot

-The box plot shows the median and quartiles for each group. The median of Group B is higher than that of Group A.

Density Curve

-The spread of the density curve for Group B seems to be slightly wider than that of Group A, indicating that there might be greater variability in the test scores within Group B compared to Group A.

-Both groups exhibit approximately symmetric distributions, which is consistent with the test scores being approximately normally distributed within each group.