DAT 301 Homework 3

2024-03-14

Introduction: Hypothesis Testing

Hypothesis testing is a type of statistical analysis in which you put your assumptions about a population parameter to the test.

-It is often used to assess whether there is a relationship between two statistical variables.

-Null Hypothesis (H0): The default claim, usually that there is no effect or no relationship. We look for evidence against it.

-Alternative Hypothesis (Ha): The claim we are looking for evidence to support.

Example: Exam Data Set

Question: Does the number of hours a student studies for an exam impact their scores?

-Null Hypothesis (H0): There is no significant relationship between the number of hours students study and their exam scores.

-Alternative Hypothesis (Ha): There is a significant relationship between the number of hours students study and their exam scores. The more hours students study, the higher their exam scores will be.
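Written in symbols, the two hypotheses could be expressed as follows. The symbol \(\rho\) (the population correlation between hours studied and exam score) and the one-sided form are assumptions, chosen because the alternative states a direction:

\[ H_0: \rho = 0 \qquad H_a: \rho > 0 \]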

Slide with R Output: Exam Data Set

I created a dataset of exam results for 50 students, with ID numbers from 1 to 50. It records each student's exam score, pass/fail status, and number of hours studied.

##    Student_ID Exam_Score Pass_Fail Hours_Studied
## 1           1      71.50      Pass             5
## 2           2      91.53      Pass             7
## 3           3      76.36      Pass             5
## 4           4      95.32      Pass             6
## 5           5      97.62      Pass             9
## 6           6      61.82      Fail             2
## 7           7      81.12      Pass             5
## 8           8      95.70      Pass             8
## 9           9      82.06      Pass             2
## 10         10      78.26      Pass             1
## 11         11      98.27      Pass             9
## 12         12      78.13      Pass             9
## 13         13      87.10      Pass             6
## 14         14      82.91      Pass             5
## 15         15      64.12      Fail             9
## 16         16      95.99      Pass            10
## 17         17      69.84      Fail             4
## 18         18      61.68      Fail             6
## 19         19      73.12      Pass             8
## 20         20      98.18      Pass             6
## 21         21      95.58      Pass             6
## 22         22      87.71      Pass             7
## 23         23      85.62      Pass             1
## 24         24      99.77      Pass             6
## 25         25      86.23      Pass             2
## 26         26      88.34      Pass             1
## 27         27      81.76      Pass             2
## 28         28      83.77      Pass             4
## 29         29      71.57      Pass             5
## 30         30      65.88      Fail             6
## 31         31      98.52      Pass             3
## 32         32      96.09      Pass             9
## 33         33      87.63      Pass             4
## 34         34      91.82      Pass             6
## 35         35      60.98      Fail             9
## 36         36      79.11      Pass             9
## 37         37      90.34      Pass             7
## 38         38      68.66      Fail             3
## 39         39      72.73      Pass             8
## 40         40      69.27      Fail             9
## 41         41      65.71      Fail             3
## 42         42      76.58      Pass             7
## 43         43      76.55      Pass             3
## 44         44      74.75      Pass             7
## 45         45      66.10      Fail             6
## 46         46      65.55      Fail            10
## 47         47      69.32      Fail             5
## 48         48      78.64      Pass             5
## 49         49      70.64      Pass             8
## 50         50      94.31      Pass             3
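The code that generated this data set is not shown. A minimal sketch of how a data set with the same columns could be simulated in base R follows. The seed, the uniform score range of 60 to 100, the 1 to 10 hour range, and the passing cutoff of 70 are all assumptions (the cutoff is consistent with the table, where every Fail is below 70 and every Pass is above it), so this will not reproduce the exact values above.

```r
# Hypothetical seed: chosen only so the sketch is repeatable
set.seed(301)
n_students <- 50

# Simulate scores and study hours independently of each other (assumption)
exam_data <- data.frame(
  Student_ID    = 1:n_students,
  Exam_Score    = round(runif(n_students, min = 60, max = 100), 2),
  Hours_Studied = sample(1:10, n_students, replace = TRUE)
)

# Assumed passing cutoff of 70
exam_data$Pass_Fail <- ifelse(exam_data$Exam_Score >= 70, "Pass", "Fail")

# Reorder columns to match the printed table
exam_data <- exam_data[, c("Student_ID", "Exam_Score", "Pass_Fail", "Hours_Studied")]
head(exam_data)
```

Because the scores and hours are drawn independently in this sketch, any correlation between them would be due to chance alone.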

A Histogram of Exam Scores Using GGPLOT2

This histogram shows how the exam scores are distributed. About a quarter of the scores (13 of 50) are between 70% and 80%.
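The histogram code is not included in the slides. A minimal sketch of how such a histogram could be drawn with ggplot2 follows. The 10-point bins starting at 60, the colors, and the labels are assumptions; the scores are copied from the printed data set.

```r
library(ggplot2)

# Exam scores copied from the printed data set (students 1-50)
exam_scores <- c(
  71.50, 91.53, 76.36, 95.32, 97.62, 61.82, 81.12, 95.70, 82.06, 78.26,
  98.27, 78.13, 87.10, 82.91, 64.12, 95.99, 69.84, 61.68, 73.12, 98.18,
  95.58, 87.71, 85.62, 99.77, 86.23, 88.34, 81.76, 83.77, 71.57, 65.88,
  98.52, 96.09, 87.63, 91.82, 60.98, 79.11, 90.34, 68.66, 72.73, 69.27,
  65.71, 76.58, 76.55, 74.75, 66.10, 65.55, 69.32, 78.64, 70.64, 94.31
)
exam_data <- data.frame(Exam_Score = exam_scores)

# Count scores per 10-point bin: [60,70), [70,80), [80,90), [90,100]
bin_counts <- table(cut(exam_scores, breaks = seq(60, 100, by = 10),
                        right = FALSE, include.lowest = TRUE))
print(bin_counts)

# Histogram using the same assumed bins
score_hist <- ggplot(exam_data, aes(x = Exam_Score)) +
  geom_histogram(binwidth = 10, boundary = 60, closed = "left",
                 fill = "steelblue", color = "black") +
  labs(title = "Histogram of Exam Scores",
       x = "Exam Score", y = "Number of Students")
print(score_hist)
```

A different bin width would change the shape of the histogram, so the bin choice is worth reporting alongside the plot.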

A boxplot of Exam Scores by Pass/Fail Using GGPLOT2:

R Code for the Boxplot

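library(ggplot2)  # provides ggplot(), geom_boxplot(), labs(), scale_fill_manual()

# Boxplot of exam scores grouped by pass/fail status
ggplot(data = exam_data, aes(x = Pass_Fail, y = Exam_Score, fill = Pass_Fail)) +
  geom_boxplot() +
  labs(title = "Boxplot of Exam Scores by Pass/Fail",
       x = "Pass/Fail",
       y = "Exam Score") +
  # Color the boxes by group: green for Pass, red for Fail
  scale_fill_manual(values = c("Pass" = "green", "Fail" = "red"))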

PLOTLY Scatterplot

This scatterplot shows the relationship between hours of studying and exam scores.
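The scatterplot code is not included in the slides. A minimal sketch of how it could be built with the plotly package follows. The title and axis labels are assumptions, and only the first ten rows of the printed data set are used to keep the example short.

```r
library(plotly)

# First ten rows of the printed data set, used as a small illustrative subset
exam_subset <- data.frame(
  Hours_Studied = c(5, 7, 5, 6, 9, 2, 5, 8, 2, 1),
  Exam_Score    = c(71.50, 91.53, 76.36, 95.32, 97.62,
                    61.82, 81.12, 95.70, 82.06, 78.26)
)

# One marker per student: hours studied on x, exam score on y
scatter_fig <- plot_ly(
  data = exam_subset,
  x = ~Hours_Studied,
  y = ~Exam_Score,
  type = "scatter",
  mode = "markers"
)

# Add a title and axis labels (assumed wording)
scatter_fig <- layout(
  scatter_fig,
  title = "Exam Score vs. Hours Studied",
  xaxis = list(title = "Hours Studied"),
  yaxis = list(title = "Exam Score")
)

# Show the interactive plot only when running in an interactive session
if (interactive()) print(scatter_fig)
```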

LaTeX

For independent samples t-test:

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \]
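In this formula, \(\bar{x}_1\) and \(\bar{x}_2\) are the two sample means, \(n_1\) and \(n_2\) are the two sample sizes, and \(s_p\) is the pooled standard deviation. Assuming the two groups have equal variances, and writing \(s_1^2\) and \(s_2^2\) for the two sample variances (symbols introduced here for illustration), the pooled standard deviation is:

\[ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \]

The resulting statistic is compared against a t distribution with \(n_1 + n_2 - 2\) degrees of freedom.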

Conclusions

Histogram: The exam scores are spread across the 60%-100% range. About a quarter of them (13 of 50) are between 70% and 80%.

Boxplot: The box (the middle 50% of scores) spans roughly 64%-69% for students who failed and roughly 79%-95% for students who passed. In conclusion, the scores of students who passed are more spread out than the scores of students who failed.

Scatterplot: The correlation coefficient between hours studied and exam score is 0.01330381, which is very close to zero.

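Based on the plots and the correlation coefficient, there is little to no linear relationship between the number of hours studied and exam scores in this data set. Therefore, we fail to reject the null hypothesis: the data do not provide evidence for the alternative hypothesis. This does not prove that the null hypothesis is true; it only means this sample gives no reason to reject it.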
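The conclusion about the hypotheses can be checked with a formal test of the correlation. The following is a sketch, not the analysis used for these slides. It assumes a Pearson correlation test (`cor.test`), a one-sided alternative matching the direction stated in Ha, and a significance level of 0.05, none of which the slides specify. The two vectors are copied from the printed data set.

```r
# Hours studied and exam scores copied from the printed data set (students 1-50)
hours <- c(
  5, 7, 5, 6, 9, 2, 5, 8, 2, 1,  9, 9, 6, 5, 9, 10, 4, 6, 8, 6,
  6, 7, 1, 6, 2, 1, 2, 4, 5, 6,  3, 9, 4, 6, 9, 9, 7, 3, 8, 9,
  3, 7, 3, 7, 6, 10, 5, 5, 8, 3
)
scores <- c(
  71.50, 91.53, 76.36, 95.32, 97.62, 61.82, 81.12, 95.70, 82.06, 78.26,
  98.27, 78.13, 87.10, 82.91, 64.12, 95.99, 69.84, 61.68, 73.12, 98.18,
  95.58, 87.71, 85.62, 99.77, 86.23, 88.34, 81.76, 83.77, 71.57, 65.88,
  98.52, 96.09, 87.63, 91.82, 60.98, 79.11, 90.34, 68.66, 72.73, 69.27,
  65.71, 76.58, 76.55, 74.75, 66.10, 65.55, 69.32, 78.64, 70.64, 94.31
)

# Sample Pearson correlation between hours studied and exam score
r <- cor(hours, scores)

# Test H0: rho = 0 against Ha: rho > 0 (one-sided choice is an assumption)
test_result <- cor.test(hours, scores, method = "pearson",
                        alternative = "greater")
print(test_result)

# Compare the p-value with an assumed significance level of 0.05
alpha <- 0.05
decision <- if (test_result$p.value < alpha) "Reject H0" else "Fail to reject H0"
print(decision)
```

A p-value at or above the chosen level leads to "Fail to reject H0", which is a weaker statement than saying that H0 is true.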