Data Dive 7

After exploring your dataset over the past few weeks, you should already have some questions. Devise at least two different null hypotheses based on two different aspects (e.g., columns) of your data. For each hypothesis: Come up with an alpha level, power level, and minimum effect size, and explain why you chose each value. Determine if you have enough data to perform a Neyman-Pearson hypothesis test. If you do, perform one and interpret the results. If not, explain why. Perform a Fisher's style test for significance, and interpret the p-value. So, technically, you have two hypothesis tests for each hypothesis, equating to four total tests. Build two visualizations that best illustrate the results from the two pairs of hypothesis tests, one for each null hypothesis. For each of the above tasks, you must explain to the reader what insight was gathered, its significance, and any further questions you have which might need to be investigated.

Most of the columns are continuous or have been converted to continuous values, so some error is to be expected in both the Neyman-Pearson and the Fisher-style tests.

Importing all the libraries

## Loading required package: ggplot2

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ stringr   1.5.0
## ✔ forcats   1.0.0     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## 
## Attaching package: 'kableExtra'
## 
## 
## The following object is masked from 'package:dplyr':
## 
##     group_rows

Hypothesis testing 1

Null hypothesis \(H_0\): There is no difference in the mean admission grade between male and female students.

Alternative hypothesis \(H_a\): There is a difference in the mean admission grade between male and female students.

Alpha level (α): 0.05. This is the commonly used significance level; we are willing to accept a 5% chance of making a Type I error (rejecting a null hypothesis that is true).

Power level (1-β): 0.80. This is the conventional target and gives an 80% chance of detecting a true effect of at least the minimum size.

Minimum effect size (δ): 0.2 in standardized units (Cohen's d), which is conventionally classed as a small effect rather than a moderate one.
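The three design values above (α = 0.05, power = 0.80, d = 0.2) determine how much data a Neyman-Pearson test needs. The sketch below is one way to check this with base R's `power.t.test`; it assumes a two-sided, two-sample t-test and treats the effect size as a standardized difference (so `sd = 1`). The variable names `plan` and `n_per_group` are illustrative and do not come from the original analysis.

```r
# Sketch: sample size per group needed to detect a standardized effect of
# d = 0.2 at alpha = 0.05 with power = 0.80 (two-sided, two-sample t-test).
# With sd = 1, `delta` is expressed in standard-deviation units (Cohen's d).
plan <- power.t.test(delta = 0.2, sd = 1, sig.level = 0.05, power = 0.80,
                     type = "two.sample", alternative = "two.sided")

# Round up, since a fraction of an observation cannot be collected.
n_per_group <- ceiling(plan$n)
n_per_group
```

Comparing `n_per_group` with the number of students actually available in each group answers the question of whether there is enough data for the Neyman-Pearson test.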

## 
##  Two Sample t-test
## 
## data:  data$Admission_Grade by data$Gender
## t = 3.8877, df = 98, p-value = 0.0001842
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  1.742816 5.377231
## sample estimates:
## mean in group 0 mean in group 1 
##        78.73204        75.17202

t-statistic: 3.8877. Degrees of freedom (df): 98. p-value: 0.0001842.

These results indicate a difference in the mean admission grade between male and female students. The p-value is far below α = 0.05, so we reject the null hypothesis; the data support the alternative hypothesis (Ha).

Confidence Interval:

95 percent confidence interval: (1.742816, 5.377231). We can be 95 percent confident that the true difference in means (group 0 minus group 1) falls within this interval. The interval does not include 0, which is consistent with rejecting the null hypothesis.

Sample Estimates:

Mean in group 0 (female): 78.73204. Mean in group 1 (male): 75.17202. These sample estimates indicate that, on average, female students have a higher admission grade (78.73) than male students (75.17).
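For readers who want to reproduce this kind of test, here is a minimal sketch of a pooled-variance two-sample t-test in R. The data frame `demo` is simulated and purely hypothetical: the group size of 50 per gender and the standard deviation of 4.5 are assumptions chosen only to resemble the output above, not values taken from the actual dataset.

```r
set.seed(42)  # reproducible illustration only

# Hypothetical stand-in for the report's data: 50 students per gender group.
# Column names mirror the t-test output above; the values are simulated.
demo <- data.frame(
  Gender = rep(c(0, 1), each = 50),
  Admission_Grade = c(rnorm(50, mean = 78.7, sd = 4.5),
                      rnorm(50, mean = 75.2, sd = 4.5))
)

# var.equal = TRUE requests the pooled-variance ("Two Sample") t-test;
# the default (FALSE) would run Welch's version instead.
res <- t.test(Admission_Grade ~ Gender, data = demo, var.equal = TRUE)

res$parameter   # degrees of freedom: n1 + n2 - 2
res$conf.int    # 95% CI for mean(group 0) - mean(group 1)

# Neyman-Pearson style decision at the pre-chosen alpha level.
reject_h0 <- res$p.value < 0.05
```

The decision rule in the last line is what distinguishes the Neyman-Pearson reading (a binary accept/reject at a fixed α) from simply reporting the p-value as a measure of evidence.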

Statistical Power:

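Power: 0.8065. A power of 0.8065 means there is roughly an 81% probability of correctly rejecting the null hypothesis when a true difference of the assumed size exists. Note that the t-test above has df = 98, i.e. 100 observations in total, so this figure only applies if the power calculation used the same sample size.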

Fisher's exact test

## [1] "Result:"

## 
##  Fisher's Exact Test for Count Data
## 
## data:  contingency_table
## p-value = 0.001593
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.08494136 0.60606379
## sample estimates:
## odds ratio 
##  0.2344063

Test Result:

p-value: The p-value obtained from the test is 0.001593, which is less than the chosen significance level (α = 0.05). This is evidence against the null hypothesis.

Confidence Interval: The 95 percent confidence interval for the odds ratio is (0.085, 0.606). This interval does not include 1, further supporting the rejection of the null hypothesis.

Sample Estimates: The odds ratio is approximately 0.2344.

In summary, Fisher's Exact Test provides no support for the null hypothesis.
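Fisher's exact test works on counts, so a continuous variable such as admission grade has to be turned into categories first. The report does not show how `contingency_table` was built, so the sketch below makes an explicit assumption: grades are split at the overall median into "above" and "not above". The data are simulated and hypothetical, and the cutoff is only one of several reasonable choices.

```r
set.seed(42)  # reproducible illustration only

# Hypothetical data: 50 students per gender group with simulated grades.
gender <- rep(c(0, 1), each = 50)
grade  <- c(rnorm(50, mean = 78.7, sd = 4.5),
            rnorm(50, mean = 75.2, sd = 4.5))

# Assumption: dichotomize the continuous grade at the overall median.
above_median <- grade > median(grade)

# 2 x 2 table of counts: gender (rows) by above/not above median (columns).
contingency_table <- table(Gender = gender, AboveMedian = above_median)

fisher_res <- fisher.test(contingency_table)
fisher_res$p.value    # Fisher-style evidence against independence
fisher_res$estimate   # conditional maximum-likelihood odds ratio
fisher_res$conf.int   # 95% CI for the odds ratio
```

Because dichotomizing discards information, the result depends on the chosen cutoff; this is one source of the error mentioned at the top of the report.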

## Warning: Removed 2 rows containing missing values (`geom_bar()`).

## Warning: Removed 2 rows containing missing values (`geom_text()`).

Hypothesis testing 2

Null hypothesis \(H_0\): There is no difference in the mean admission grade between scholarship holders and non-holders.

Alternative hypothesis \(H_a\): There is a difference in the mean admission grade between scholarship holders and non-holders.

Alpha level (α): 0.05. This is the commonly used significance level; we are willing to accept a 5% chance of making a Type I error (rejecting a null hypothesis that is true).

Power level (1-β): 0.80. This is the conventional target and gives an 80% chance of detecting a true effect of at least the minimum size.

Minimum effect size (δ): 0.2 in standardized units (Cohen's d), which is conventionally classed as a small effect rather than a moderate one.

## [1] "T-Test Result:"

## 
##  Welch Two Sample t-test
## 
## data:  data$Admission_Grade by data$Scholarship_Holder
## t = 3.8877, df = 97.951, p-value = 0.0001843
## alternative hypothesis: true difference in means between group No and group Yes is not equal to 0
## 95 percent confidence interval:
##  1.742805 5.377243
## sample estimates:
##  mean in group No mean in group Yes 
##          78.73204          75.17202

## [1] "Neyman-Pearson Test Power Calculation:"

## 
##      Two-sample t test power calculation 
## 
##               n = 100
##               d = 0.2
##       sig.level = 0.05
##           power = 0.2906459
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

T-Test Result:

Test type: Welch Two Sample t-test. Data: admission grade by scholarship holder. t-value: 3.8877. Degrees of freedom (df): approximately 97.951. p-value: 0.0001843. Alternative hypothesis: the true difference in means between the “No” (non-scholarship holders) and “Yes” (scholarship holders) groups is not equal to 0. 95 percent confidence interval: the difference in means is estimated to lie between 1.742805 and 5.377243. Sample estimates: the mean admission grade for the “No” group is 78.73204, while the mean for the “Yes” group is 75.17202.

Neyman-Pearson Test Power Calculation:

Sample size (n): 100 (number in each group). Effect size (d): 0.2. Significance level (α): 0.05. Power: approximately 0.29. Alternative hypothesis: two-sided.

The results indicate a statistically significant difference in the mean admission grade between scholarship holders (“Yes” group) and non-scholarship holders (“No” group). The t-value is 3.8877 and the p-value is very small (0.0001843), which is strong evidence against the null hypothesis.

The power calculation shows that, with 100 observations per group, the test has a power of only about 0.29 to detect an effect of d = 0.2, well below the 0.80 target. A power of roughly 0.80 for this effect size would need about 400 observations per group, so the sample is not large enough for a Neyman-Pearson test at the chosen minimum effect size.
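The power calculation output above (n = 100 per group, d = 0.2, α = 0.05) can be cross-checked with base R. The sketch below uses `power.t.test` from the `stats` package; this is an assumption about tooling, since the output above appears to come from a different function, so the result may differ slightly in the later decimals. The name `achieved` is illustrative.

```r
# Sketch: power achieved by a two-sided, two-sample t-test with 100
# observations per group when the true standardized effect is d = 0.2.
# With sd = 1, `delta` is in standard-deviation units (Cohen's d).
achieved <- power.t.test(n = 100, delta = 0.2, sd = 1, sig.level = 0.05,
                         type = "two.sample", alternative = "two.sided")

achieved$power          # compare with the 0.80 target
achieved$power >= 0.80  # TRUE only if the sample is large enough
```

If the last line is `FALSE`, the sample is too small to reach the target power for the chosen minimum effect size, and the Neyman-Pearson decision should be reported with that caveat.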

## [1] "Result:"

## 
##  Fisher's Exact Test for Count Data
## 
## data:  contingency_table
## p-value = 0.001593
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.08494136 0.60606379
## sample estimates:
## odds ratio 
##  0.2344063

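Fisher's exact test gives p = 0.001593, below α = 0.05, and the 95 percent confidence interval for the odds ratio (0.085, 0.606) does not include 1. The test therefore supports the alternative hypothesis.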

## Warning: Removed 1 rows containing missing values (`geom_bar()`).

## Warning: Removed 1 rows containing missing values (`geom_text()`).

## Warning: Removed 1 rows containing missing values (`geom_bar()`).

## Warning: Removed 1 rows containing missing values (`geom_text()`).

Data Dive 7/2

Pritesh Shah

2023-10-09
