class: center, middle, inverse, title-slide

.title[
# STA551 Final Presentation
]
.subtitle[
## A Cumulative Assessment on the Effects of Different Factors on the Final Exam Grades of Students
]
.author[
### Alice Xiang
]
.date[
### 2024-12-11
]

---

<h2 align="center"> Table of Contents</h2>

.pull-left[
- Introduction

- Description of the Data

- Research Questions

- EDA
]

.pull-right[
- Multiple Linear Regression

- Logistic Regression

- Summary and Discussion

- References and Appendix
]
---
.pull-left[

## Introduction

- Student performance is a multifaceted measure of the success of both individual students and academic institutions
- A large variety of factors contribute to student success, both on an individual and societal level
- Academic achievement is an important factor in an individual's future success

## Description of the Data

We chose [this dataset](https://www.kaggle.com/datasets/lainguyn123/student-performance-factors) of various factors affecting student performance in exams. It includes information on study habits, attendance, parental involvement, and other aspects influencing academic success across 20 variables. 
]

.pull-right[
## Variables

The following are the variables included in the dataset (7 continuous, 13 categorical): 
.pull-left[
Numeric:
- Hours_Studied
- Attendance
- Sleep_Hours
- Previous_Scores
- Tutoring_Sessions
- Physical_Activity
- Exam_Score
]

.pull-right[
Categorical:
- Parental_Involvement
- Access_to_Resources
- Extracurricular_Activities
- Motivation_Level
- Internet_Access
- Family_Income
- Teacher_Quality
- School_Type
- Peer_Influence
- Learning_Disabilities
- Parental_Education_Level
- Distance_from_Home
- Gender
]

]
---
class: inverse center middle

## Research Question 1: How do different predictors relate to the final exam performance of students?

## Research Question 2: What factors best predict whether or not a student has a satisfactory (greater than or equal to 70%) final exam grade, and how accurately can their performance be predicted based on these factors?

---
class: inverse center middle

## EDA

---
## Continuous Response for MLR

.pull-left[
The distribution of the response variable, Exam_Score, shows evidence of right skew:
<img src="Final-Presentation_files/figure-html/unnamed-chunk-4-1.png" width="100%" />
]

.pull-right[
Distributions of the continuous explanatory variables and pairwise correlations:

Because the higher counts of Tutoring_Sessions are sparse, we discretize the variable into five levels: 0, 1, 2, 3, and 4+ tutoring sessions per month.

]
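
The binning of Tutoring_Sessions can be sketched as follows. This is a minimal Python illustration, not the code used for the analysis: the function name `discretize_tutoring` and the sample counts are hypothetical, while the level labels match the `f_Tutoring_Sessions` coefficients reported in the model tables.

```python
def discretize_tutoring(sessions: int) -> str:
    """Map a monthly tutoring-session count to one of five levels: 0, 1, 2, 3, 4+."""
    if sessions < 0:
        raise ValueError("session count cannot be negative")
    # Counts of 4 or more are sparse, so they share a single level.
    return str(sessions) if sessions < 4 else "4+"


counts = [0, 1, 2, 3, 4, 6, 8]  # made-up counts for illustration
levels = [discretize_tutoring(c) for c in counts]
print(levels)
```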
---
## Continuous Response (cont.)

Box plots of categorical predictors against Exam_Score:

.scroll-100[
<img src="Final-Presentation_files/figure-html/unnamed-chunk-7-1.png" width="100%" /><img src="Final-Presentation_files/figure-html/unnamed-chunk-7-2.png" width="100%" /><img src="Final-Presentation_files/figure-html/unnamed-chunk-7-3.png" width="100%" /><img src="Final-Presentation_files/figure-html/unnamed-chunk-7-4.png" width="100%" />
]

---

# Categorical Response

Mosaic plots comparing the binary response variable for the logistic model against each categorical predictor:

.scroll-100[
<img src="Final-Presentation_files/figure-html/unnamed-chunk-8-1.png" width="100%" /><img src="Final-Presentation_files/figure-html/unnamed-chunk-8-2.png" width="100%" /><img src="Final-Presentation_files/figure-html/unnamed-chunk-8-3.png" width="100%" /><img src="Final-Presentation_files/figure-html/unnamed-chunk-8-4.png" width="100%" /><img src="Final-Presentation_files/figure-html/unnamed-chunk-8-5.png" width="100%" /><img src="Final-Presentation_files/figure-html/unnamed-chunk-8-6.png" width="100%" /><img src="Final-Presentation_files/figure-html/unnamed-chunk-8-7.png" width="100%" /><img src="Final-Presentation_files/figure-html/unnamed-chunk-8-8.png" width="100%" /><img src="Final-Presentation_files/figure-html/unnamed-chunk-8-9.png" width="100%" /><img src="Final-Presentation_files/figure-html/unnamed-chunk-8-10.png" width="100%" /><img src="Final-Presentation_files/figure-html/unnamed-chunk-8-11.png" width="100%" /><img src="Final-Presentation_files/figure-html/unnamed-chunk-8-12.png" width="100%" /><img src="Final-Presentation_files/figure-html/unnamed-chunk-8-13.png" width="100%" />

]

---

# EDA Takeaways:

.pull-left[
Continuous Response:
- Right skew to response variable Exam_Score
- Attendance and Hours Studied are positively correlated with Exam Score
- Parental Involvement, Access to Resources, Internet Access, Tutoring Sessions possibly positively associated with Exam Score
- Learning Disability possibly negatively associated with Exam Score
]

.pull-right[
Categorical Response: 
- Positive association evident across levels of Tutoring Sessions
- Learning Disability is negatively associated
- School Type and Gender show no obvious association
- All other predictors show a slight positive association
]

Analytic dataset available [here](https://github.com/xiang-a/sta551/blob/main/analytic_student_performance.csv)

---
class: inverse center middle

# Multiple Linear Regression

---

# Create Candidate Models

Full Model Parameter Estimates and Residual Plots:

.scroll-100[

Table: Full Model examining Student Final Exam Scores

|                                     |   Estimate| Std. Error|     t value| Pr(>&#124;t&#124;)|
|:------------------------------------|----------:|----------:|-----------:|------------------:|
|(Intercept)                          | 34.0593039|  0.3506238|  97.1391544|          0.0000000|
|Hours_Studied                        |  0.2951818|  0.0043406|  68.0053170|          0.0000000|
|Attendance                           |  0.1988267|  0.0022522|  88.2819301|          0.0000000|
|Parental_InvolvementMedium           |  0.9200661|  0.0683166|  13.4676882|          0.0000000|
|Parental_InvolvementHigh             |  1.9873553|  0.0754642|  26.3350667|          0.0000000|
|Access_to_ResourcesMedium            |  1.0567475|  0.0688563|  15.3471352|          0.0000000|
|Access_to_ResourcesHigh              |  2.0638510|  0.0752078|  27.4419711|          0.0000000|
|Extracurricular_ActivitiesYes        |  0.5592436|  0.0530058|  10.5506047|          0.0000000|
|Sleep_Hours                          | -0.0031099|  0.0177048|  -0.1756549|          0.8605707|
|Previous_Scores                      |  0.0490476|  0.0018078|  27.1303692|          0.0000000|
|Motivation_LevelMedium               |  0.5228284|  0.0603979|   8.6564051|          0.0000000|
|Motivation_LevelHigh                 |  1.0642365|  0.0753928|  14.1158855|          0.0000000|
|Internet_AccessYes                   |  0.9194475|  0.0980986|   9.3726882|          0.0000000|
|Family_IncomeMedium                  |  0.4937187|  0.0578762|   8.5306021|          0.0000000|
|Family_IncomeHigh                    |  1.0853227|  0.0719036|  15.0941294|          0.0000000|
|Teacher_QualityMedium                |  0.5083142|  0.0883149|   5.7557021|          0.0000000|
|Teacher_QualityHigh                  |  1.0633314|  0.0944731|  11.2553881|          0.0000000|
|School_TypePublic                    |  0.0338177|  0.0564628|   0.5989383|          0.5492354|
|Peer_InfluenceNeutral                |  0.5194792|  0.0705107|   7.3673823|          0.0000000|
|Peer_InfluencePositive               |  1.0235361|  0.0701717|  14.5861734|          0.0000000|
|Physical_Activity                    |  0.1884882|  0.0253228|   7.4434270|          0.0000000|
|Learning_DisabilitiesYes             | -0.8523793|  0.0848911| -10.0408598|          0.0000000|
|Parental_Education_LevelCollege      |  0.4843870|  0.0599099|   8.0852543|          0.0000000|
|Parental_Education_LevelPostgraduate |  0.9867580|  0.0687640|  14.3499174|          0.0000000|
|Distance_from_HomeModerate           |  0.3852309|  0.0948087|   4.0632437|          0.0000490|
|Distance_from_HomeNear               |  0.9075950|  0.0888929|  10.2099863|          0.0000000|
|GenderMale                           | -0.0433741|  0.0526039|  -0.8245404|          0.4096636|
|f_Tutoring_Sessions1                 |  0.5270752|  0.0706850|   7.4566783|          0.0000000|
|f_Tutoring_Sessions2                 |  1.0232187|  0.0752685|  13.5942458|          0.0000000|
|f_Tutoring_Sessions3                 |  1.4801647|  0.0913167|  16.2091266|          0.0000000|
|f_Tutoring_Sessions4+                |  2.2071292|  0.1145570|  19.2666507|          0.0000000|

```
                               GVIF Df GVIF^(1/(2*Df))
Hours_Studied              1.003022  1        1.001510
Attendance                 1.005648  1        1.002820
Parental_Involvement       1.009370  2        1.002334
Access_to_Resources        1.011683  2        1.002908
Extracurricular_Activities 1.004739  1        1.002367
Sleep_Hours                1.003864  1        1.001930
Previous_Scores            1.007149  1        1.003568
Motivation_Level           1.009013  2        1.002246
Internet_Access            1.004904  1        1.002449
Family_Income              1.009531  2        1.002374
Teacher_Quality            1.008066  2        1.002010
School_Type                1.004009  1        1.002002
Peer_Influence             1.009543  2        1.002377
Physical_Activity          1.008817  1        1.004399
Learning_Disabilities      1.004286  1        1.002141
Parental_Education_Level   1.008506  2        1.002120
Distance_from_Home         1.006296  2        1.001570
Gender                     1.002999  1        1.001499
f_Tutoring_Sessions        1.016697  4        1.002072
```

]
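
The last column of the output above rescales each generalized VIF by its degrees of freedom, so that multi-level factors can be compared with single-coefficient predictors. A small Python check, using three GVIF values copied from the output (the helper name `adjusted_gvif` is hypothetical):

```python
def adjusted_gvif(gvif: float, df: int) -> float:
    """Return GVIF^(1/(2*Df)), the df-adjusted generalized VIF."""
    return gvif ** (1.0 / (2 * df))


# (GVIF, Df) pairs copied from the output above
rows = {
    "Hours_Studied": (1.003022, 1),
    "Parental_Involvement": (1.009370, 2),
    "f_Tutoring_Sessions": (1.016697, 4),
}

adjusted = {name: adjusted_gvif(g, d) for name, (g, d) in rows.items()}
for name, value in adjusted.items():
    print(f"{name}: {value:.6f}")
```

All adjusted values sit very close to 1, which suggests negligible multicollinearity among the predictors.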

---

## Issues We See

- Normality assumption violated

- Many extreme values appear on residual plots

- Likely due to right skew of response variable

---

## Transformed Multiple Linear Regression Models

We create two candidate models with transformed responses. Both still violate the assumptions of multiple linear regression.
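
The zeros shown for the Box-Cox model are most likely a rounding artifact rather than true zero effects. Assuming the transformed response is `Exam_Score^(-4.5)`, as the label in the goodness-of-fit table suggests, the response is on the order of 1e-9, so the coefficients and the SSE fall below the displayed precision. A short Python sketch with illustrative scores (not taken from the dataset):

```python
LAMBDA = -4.5  # power reported for the transformed model (assumed to be applied directly)


def power_transform(score: float, lam: float = LAMBDA) -> float:
    """Raise an exam score to the power lam."""
    return score ** lam


scores = [60.0, 70.0, 80.0]  # illustrative exam scores
transformed = [power_transform(s) for s in scores]
for s, t in zip(scores, transformed):
    print(f"{s:.0f} -> {t:.3e}")
```

Because a negative power reverses the ordering of the scores, predictors that raise Exam_Score receive negative coefficients on the transformed scale, which is consistent with the flipped signs of the t values.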

.pull-left[

Box-Cox transformed model:

.scroll-100[
<img src="Final-Presentation_files/figure-html/unnamed-chunk-11-1.png" width="100%" />

Table: Transformed model

|                                     | Estimate| Std. Error|     t value| Pr(>&#124;t&#124;)|
|:------------------------------------|--------:|----------:|-----------:|------------------:|
|(Intercept)                          |        0|          0|  249.030649|                  0|
|Hours_Studied                        |        0|          0| -113.653120|                  0|
|Attendance                           |        0|          0| -146.728957|                  0|
|Parental_InvolvementMedium           |        0|          0|  -24.030604|                  0|
|Parental_InvolvementHigh             |        0|          0|  -43.951740|                  0|
|Access_to_ResourcesMedium            |        0|          0|  -25.129848|                  0|
|Access_to_ResourcesHigh              |        0|          0|  -43.924726|                  0|
|Extracurricular_ActivitiesYes        |        0|          0|  -16.694296|                  0|
|Previous_Scores                      |        0|          0|  -44.258904|                  0|
|Motivation_LevelMedium               |        0|          0|  -15.686697|                  0|
|Motivation_LevelHigh                 |        0|          0|  -23.884951|                  0|
|Internet_AccessYes                   |        0|          0|  -16.548903|                  0|
|f_Tutoring_Sessions1                 |        0|          0|  -13.639796|                  0|
|f_Tutoring_Sessions2                 |        0|          0|  -23.857457|                  0|
|f_Tutoring_Sessions3                 |        0|          0|  -28.491143|                  0|
|f_Tutoring_Sessions4+                |        0|          0|  -31.359251|                  0|
|Family_IncomeMedium                  |        0|          0|  -14.377178|                  0|
|Family_IncomeHigh                    |        0|          0|  -24.047458|                  0|
|Teacher_QualityMedium                |        0|          0|   -9.800508|                  0|
|Teacher_QualityHigh                  |        0|          0|  -18.542659|                  0|
|Peer_InfluenceNeutral                |        0|          0|  -12.547745|                  0|
|Peer_InfluencePositive               |        0|          0|  -23.929231|                  0|
|Physical_Activity                    |        0|          0|  -14.421992|                  0|
|Learning_DisabilitiesYes             |        0|          0|   18.885545|                  0|
|Parental_Education_LevelCollege      |        0|          0|  -14.437283|                  0|
|Parental_Education_LevelPostgraduate |        0|          0|  -25.070817|                  0|
|Distance_from_HomeModerate           |        0|          0|   -8.704588|                  0|
|Distance_from_HomeNear               |        0|          0|  -18.682458|                  0|

<img src="Final-Presentation_files/figure-html/unnamed-chunk-11-2.png" width="100%" /><img src="Final-Presentation_files/figure-html/unnamed-chunk-11-3.png" width="100%" /><img src="Final-Presentation_files/figure-html/unnamed-chunk-11-4.png" width="100%" /><img src="Final-Presentation_files/figure-html/unnamed-chunk-11-5.png" width="100%" />
]
]

.pull-right[

Log-transformed Model:

.scroll-100[

Table: Log model

|                                     |   Estimate| Std. Error|    t value| Pr(>&#124;t&#124;)|
|:------------------------------------|----------:|----------:|----------:|------------------:|
|(Intercept)                          |  3.7109083|  0.0041466| 894.937063|              0e+00|
|Hours_Studied                        |  0.0044048|  0.0000559|  78.792758|              0e+00|
|Attendance                           |  0.0029654|  0.0000290| 102.264739|              0e+00|
|Parental_InvolvementMedium           |  0.0139423|  0.0008798|  15.847852|              0e+00|
|Parental_InvolvementHigh             |  0.0296470|  0.0009718|  30.507395|              0e+00|
|Access_to_ResourcesMedium            |  0.0156402|  0.0008867|  17.637931|              0e+00|
|Access_to_ResourcesHigh              |  0.0304995|  0.0009684|  31.496171|              0e+00|
|Extracurricular_ActivitiesYes        |  0.0082189|  0.0006827|  12.038477|              0e+00|
|Previous_Scores                      |  0.0007273|  0.0000233|  31.242888|              0e+00|
|Motivation_LevelMedium               |  0.0079308|  0.0007779|  10.195408|              0e+00|
|Motivation_LevelHigh                 |  0.0159342|  0.0009710|  16.410800|              0e+00|
|Internet_AccessYes                   |  0.0139336|  0.0012633|  11.029447|              0e+00|
|f_Tutoring_Sessions1                 |  0.0079655|  0.0009103|   8.750876|              0e+00|
|f_Tutoring_Sessions2                 |  0.0153688|  0.0009694|  15.853939|              0e+00|
|f_Tutoring_Sessions3                 |  0.0223291|  0.0011760|  18.986792|              0e+00|
|f_Tutoring_Sessions4+                |  0.0327000|  0.0014755|  22.161389|              0e+00|
|Family_IncomeMedium                  |  0.0073716|  0.0007455|   9.888698|              0e+00|
|Family_IncomeHigh                    |  0.0159534|  0.0009260|  17.229035|              0e+00|
|Teacher_QualityMedium                |  0.0076180|  0.0011375|   6.697017|              0e+00|
|Teacher_QualityHigh                  |  0.0157941|  0.0012168|  12.980300|              0e+00|
|Peer_InfluenceNeutral                |  0.0077662|  0.0009080|   8.552951|              0e+00|
|Peer_InfluencePositive               |  0.0152199|  0.0009037|  16.841631|              0e+00|
|Physical_Activity                    |  0.0029033|  0.0003262|   8.901676|              0e+00|
|Learning_DisabilitiesYes             | -0.0129879|  0.0010931| -11.881636|              0e+00|
|Parental_Education_LevelCollege      |  0.0073585|  0.0007716|   9.536494|              0e+00|
|Parental_Education_LevelPostgraduate |  0.0148786|  0.0008855|  16.802960|              0e+00|
|Distance_from_HomeModerate           |  0.0060853|  0.0012211|   4.983268|              6e-07|
|Distance_from_HomeNear               |  0.0138094|  0.0011450|  12.060789|              0e+00|

<img src="Final-Presentation_files/figure-html/unnamed-chunk-12-1.png" width="100%" /><img src="Final-Presentation_files/figure-html/unnamed-chunk-12-2.png" width="100%" /><img src="Final-Presentation_files/figure-html/unnamed-chunk-12-3.png" width="100%" /><img src="Final-Presentation_files/figure-html/unnamed-chunk-12-4.png" width="100%" />
]
]

---

## Multiple Linear Regression: Goodness of Fit

Table: Goodness-of-fit Measures of Candidate Models

|                             |          SSE|      R.sq|     R.adj|         AIC|         BIC|
|:----------------------------|------------:|---------:|---------:|-----------:|-----------:|
|Full Model                   | 27237.233431| 0.7212230| 0.7199054|    9321.136|    9530.715|
|Transformed Model (Y^(-4.5)) |     0.000000| 0.8774757| 0.8769547| -272600.366| -272411.069|
|Log Model                    |     4.521172| 0.7760399| 0.7750876|  -46196.226|  -46006.929|

---

## Cross-validation and Model Selection

We perform 5-fold cross-validation to assess the predictive performance of each model.

.scroll-100[

The full model has the lowest MSE and is recommended for practical prediction.

Test MSE:

```
[1] 8.301118
```

]
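
The 5-fold procedure can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the analysis code: the data are synthetic, the model is a one-predictor least-squares fit rather than the full model, and the helper names (`k_fold_indices`, `fit_simple_ols`, `cv_mse`) are hypothetical.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_simple_ols(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cv_mse(xs, ys, k=5, seed=0):
    """Average held-out mean squared error over k folds."""
    fold_mse = []
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(xs)) if i not in held_out]
        b0, b1 = fit_simple_ols([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - (b0 + b1 * xs[i])) ** 2 for i in test_idx]
        fold_mse.append(sum(errors) / len(errors))
    return sum(fold_mse) / k


# Synthetic data: score = 60 + 0.3 * hours + noise (standard deviation 2)
rng = random.Random(1)
hours = [rng.uniform(1, 40) for _ in range(200)]
score = [60 + 0.3 * h + rng.gauss(0, 2) for h in hours]

mse = cv_mse(hours, score)
print(round(mse, 3))
```

When models fitted to transformed responses are compared this way, their predictions need to be mapped back to the Exam_Score scale before the MSE is computed; the slides do not state whether this was done.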

---

## Multiple Linear Regression Conclusions

- All of the candidate models showed multiple violations of the assumptions of multiple linear regression

- The transformed model showed the best goodness of fit

- The full model is recommended for practical prediction

- All models should be used with caution due to the nonnormality of the response

---
class: inverse center middle

# Logistic Regression

---

## Creating Candidate Models

Because the response is not normally distributed, logistic regression may be the better-performing technique for making predictions on this data.

Full Model Parameter Estimates:

.scroll-100[

Table: Significance tests of logistic regression model

|                                     |    Estimate| Std. Error|     z value| Pr(>&#124;z&#124;)|
|:------------------------------------|-----------:|----------:|-----------:|------------------:|
|(Intercept)                          | -82.0749118|  2.9498382| -27.8235299|          0.0000000|
|Hours_Studied                        |   0.6880979|  0.0260858|  26.3783051|          0.0000000|
|Attendance                           |   0.4521552|  0.0164134|  27.5479637|          0.0000000|
|Parental_InvolvementMedium           |   2.2732254|  0.1945465|  11.6847392|          0.0000000|
|Parental_InvolvementHigh             |   4.4105088|  0.2437843|  18.0918517|          0.0000000|
|Access_to_ResourcesMedium            |   2.1525976|  0.1945273|  11.0657875|          0.0000000|
|Access_to_ResourcesHigh              |   4.5817352|  0.2429337|  18.8600209|          0.0000000|
|Extracurricular_ActivitiesYes        |   1.1554277|  0.1373981|   8.4093410|          0.0000000|
|Sleep_Hours                          |   0.0220834|  0.0435265|   0.5073548|          0.6119059|
|Previous_Scores                      |   0.1106114|  0.0058687|  18.8477807|          0.0000000|
|Motivation_LevelMedium               |   1.1026330|  0.1557666|   7.0787521|          0.0000000|
|Motivation_LevelHigh                 |   2.5179473|  0.2062185|  12.2100955|          0.0000000|
|Internet_AccessYes                   |   2.2717152|  0.2692970|   8.4357249|          0.0000000|
|Family_IncomeMedium                  |   1.2459385|  0.1508331|   8.2603788|          0.0000000|
|Family_IncomeHigh                    |   2.1629824|  0.1871887|  11.5550910|          0.0000000|
|Teacher_QualityMedium                |   1.5088299|  0.2398871|   6.2897490|          0.0000000|
|Teacher_QualityHigh                  |   2.6590168|  0.2612484|  10.1781158|          0.0000000|
|School_TypePublic                    |   0.1419680|  0.1404448|   1.0108456|          0.3120904|
|Peer_InfluenceNeutral                |   1.1422249|  0.1863663|   6.1289234|          0.0000000|
|Peer_InfluencePositive               |   2.3561428|  0.1947175|  12.1003121|          0.0000000|
|Physical_Activity                    |   0.4936347|  0.0656610|   7.5179256|          0.0000000|
|Learning_DisabilitiesYes             |  -1.9080873|  0.2433246|  -7.8417356|          0.0000000|
|Parental_Education_LevelCollege      |   1.2731081|  0.1538173|   8.2767550|          0.0000000|
|Parental_Education_LevelPostgraduate |   2.3466344|  0.1810533|  12.9610134|          0.0000000|
|Distance_from_HomeModerate           |   0.9585171|  0.2529030|   3.7900581|          0.0001506|
|Distance_from_HomeNear               |   2.2438366|  0.2471695|   9.0781275|          0.0000000|
|GenderMale                           |  -0.0608662|  0.1290146|  -0.4717779|          0.6370853|
|f_Tutoring_Sessions1                 |   1.1382940|  0.1866240|   6.0993992|          0.0000000|
|f_Tutoring_Sessions2                 |   2.3253771|  0.2012471|  11.5548357|          0.0000000|
|f_Tutoring_Sessions3                 |   3.1782923|  0.2430255|  13.0780179|          0.0000000|
|f_Tutoring_Sessions4+                |   4.9471930|  0.3104012|  15.9380622|          0.0000000|
]

---

## Reduced and Stepwise Models

.pull-left[
Reduced Model Parameter Estimates:

.scroll-100[
We make a reduced model starting with Tutoring Sessions and Learning Disabilities, which were the two categorical predictors we identified as having clear trends in the earlier mosaic plots.

Table: Summary table of Reduced Model

|                         |   Estimate| Std. Error|    z value| Pr(>&#124;z&#124;)|
|:------------------------|----------:|----------:|----------:|------------------:|
|(Intercept)              | -1.3949859|  0.0674027| -20.696290|          0.0000000|
|f_Tutoring_Sessions1     |  0.1809256|  0.0849366|   2.130124|          0.0331614|
|f_Tutoring_Sessions2     |  0.4529969|  0.0875031|   5.176925|          0.0000002|
|f_Tutoring_Sessions3     |  0.6143210|  0.1019523|   6.025572|          0.0000000|
|f_Tutoring_Sessions4+    |  0.9703233|  0.1206532|   8.042252|          0.0000000|
|Learning_DisabilitiesYes | -0.5014822|  0.1070342|  -4.685253|          0.0000028|

]
]

.pull-right[
.scroll-100[

Stepwise Model Parameter Estimates:

Table: Summary table of Stepwise Model

|                                     |    Estimate| Std. Error|    z value| Pr(>&#124;z&#124;)|
|:------------------------------------|-----------:|----------:|----------:|------------------:|
|(Intercept)                          | -81.8608385|  2.9340460| -27.900326|          0.0000000|
|Hours_Studied                        |   0.6884720|  0.0260886|  26.389720|          0.0000000|
|Attendance                           |   0.4518533|  0.0164115|  27.532678|          0.0000000|
|Parental_InvolvementMedium           |   2.2782241|  0.1946408|  11.704758|          0.0000000|
|Parental_InvolvementHigh             |   4.4153203|  0.2439017|  18.102867|          0.0000000|
|Access_to_ResourcesMedium            |   2.1525865|  0.1942602|  11.080947|          0.0000000|
|Access_to_ResourcesHigh              |   4.5812743|  0.2428184|  18.867079|          0.0000000|
|Extracurricular_ActivitiesYes        |   1.1643146|  0.1371816|   8.487396|          0.0000000|
|Previous_Scores                      |   0.1106960|  0.0058645|  18.875684|          0.0000000|
|Motivation_LevelMedium               |   1.1026390|  0.1556496|   7.084112|          0.0000000|
|Motivation_LevelHigh                 |   2.5159070|  0.2057967|  12.225208|          0.0000000|
|Internet_AccessYes                   |   2.2691498|  0.2692556|   8.427492|          0.0000000|
|Family_IncomeMedium                  |   1.2491640|  0.1507814|   8.284602|          0.0000000|
|Family_IncomeHigh                    |   2.1688944|  0.1870020|  11.598239|          0.0000000|
|Teacher_QualityMedium                |   1.5181398|  0.2394651|   6.339713|          0.0000000|
|Teacher_QualityHigh                  |   2.6640905|  0.2609433|  10.209461|          0.0000000|
|Peer_InfluenceNeutral                |   1.1463525|  0.1861646|   6.157735|          0.0000000|
|Peer_InfluencePositive               |   2.3583937|  0.1946989|  12.113030|          0.0000000|
|Physical_Activity                    |   0.4935082|  0.0655051|   7.533885|          0.0000000|
|Learning_DisabilitiesYes             |  -1.8861506|  0.2422159|  -7.787064|          0.0000000|
|Parental_Education_LevelCollege      |   1.2793946|  0.1535546|   8.331852|          0.0000000|
|Parental_Education_LevelPostgraduate |   2.3426961|  0.1805602|  12.974597|          0.0000000|
|Distance_from_HomeModerate           |   0.9538673|  0.2528677|   3.772200|          0.0001618|
|Distance_from_HomeNear               |   2.2399426|  0.2471267|   9.063945|          0.0000000|
|f_Tutoring_Sessions1                 |   1.1424932|  0.1863416|   6.131175|          0.0000000|
|f_Tutoring_Sessions2                 |   2.3250215|  0.2011508|  11.558601|          0.0000000|
|f_Tutoring_Sessions3                 |   3.1771363|  0.2427745|  13.086780|          0.0000000|
|f_Tutoring_Sessions4+                |   4.9440453|  0.3098494|  15.956287|          0.0000000|

The stepwise automatic variable selection process removed the predictors Sleep_Hours, School_Type, and Gender.

]
]

---

## Goodness of Fit

The goodness-of-fit measures for the models are shown below.

Table: Comparison of global goodness-of-fit statistics

|               | Residual Deviance| Null Deviance|      AIC|
|:--------------|-----------------:|-------------:|--------:|
|Full Model     |          1645.635|      7143.332| 1707.635|
|Reduced Model  |          7030.475|      7143.332| 7042.475|
|Stepwise Model |          1647.127|      7143.332| 1703.127|

- A chi-squared test on the drop in deviance shows that every model fits significantly better than the intercept-only model
- Given its lower AIC, we choose the stepwise model as the better-performing model based on goodness-of-fit statistics
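
The AIC column and the chi-squared comparison can be reproduced from the deviances alone. For a binary-response logistic model, AIC equals the residual deviance plus twice the number of coefficients. The Python sketch below takes the deviances from the table and the coefficient counts from the parameter tables (31 full, 6 reduced, 28 stepwise). The 5% critical values are hardcoded from a standard chi-squared table, which is an assumption of this sketch; the original analysis presumably computed p-values directly.

```python
NULL_DEVIANCE = 7143.332

# name -> (residual deviance, number of estimated coefficients incl. intercept)
models = {
    "Full": (1645.635, 31),
    "Reduced": (7030.475, 6),
    "Stepwise": (1647.127, 28),
}

# Upper 5% chi-squared critical values from a standard table, keyed by df
CHI2_CRIT_05 = {5: 11.070, 27: 40.113, 30: 43.773}


def aic(deviance, n_params):
    """AIC for a binary logistic model: deviance + 2 * number of coefficients."""
    return deviance + 2 * n_params


results = {}
for name, (deviance, k) in models.items():
    df = k - 1                            # coefficients beyond the intercept
    lr_stat = NULL_DEVIANCE - deviance    # drop in deviance vs. intercept-only
    significant = lr_stat > CHI2_CRIT_05[df]
    results[name] = (aic(deviance, k), lr_stat, df, significant)
    print(f"{name}: AIC={aic(deviance, k):.3f}, LR={lr_stat:.3f}, df={df}, "
          f"significant at 5%: {significant}")
```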

---

## Cross-Validation and Model Selection

We perform 5-fold Cross Validation and look at ROC curves and AUC for each model, as well as assess its performance using a randomly selected holdout test dataset with 20% of the data.

.scroll-100[
<img src="Final-Presentation_files/figure-html/unnamed-chunk-20-1.png" width="100%" /><img src="Final-Presentation_files/figure-html/unnamed-chunk-20-2.png" width="100%" /><img src="Final-Presentation_files/figure-html/unnamed-chunk-20-3.png" width="100%" />

Table: Summary statistics of AUC for candidate models in 5-fold CV

|               |   Min.| 1st Qu.| Median|    Mean| 3rd Qu.|   Max.|
|:--------------|------:|-------:|------:|-------:|-------:|------:|
|Full Model     | 0.9748|  0.9772| 0.9850| 0.98286|  0.9877| 0.9896|
|Reduced Model  | 0.5637|  0.5732| 0.5866| 0.58622|  0.5977| 0.6099|
|Stepwise Model | 0.9749|  0.9773| 0.9850| 0.98288|  0.9877| 0.9895|

Test Data AUC:

Table: Test AUC for candidate models

| Full Model| Reduced Model| Stepwise Model|
|----------:|-------------:|--------------:|
|     0.9919|        0.5753|         0.9919|

The stepwise and full models perform very similarly, so the stepwise model is recommended on the principle of parsimony.

]

---

# Optimal Cut-off Probability

We will find the optimal cut-off probability of the stepwise model using the ROC curve constructed earlier.
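
The slides do not state which criterion defines the optimal cut-off. One common choice is Youden's J statistic (sensitivity + specificity - 1), maximized over candidate thresholds along the ROC curve. A minimal Python sketch under that assumption, with made-up predicted probabilities and labels (the function name `youden_cutoff` is hypothetical):

```python
def youden_cutoff(probs, labels):
    """Return (cutoff, J) maximizing sensitivity + specificity - 1.

    An observation is classified as positive when its probability >= cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 1)
        tn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 0)
        j = tp / positives + tn / negatives - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Made-up predicted probabilities and true labels (1 = satisfactory grade)
probs = [0.05, 0.20, 0.35, 0.40, 0.60, 0.70, 0.85, 0.95]
labels = [0, 0, 1, 0, 0, 1, 1, 1]

cutoff, j = youden_cutoff(probs, labels)
print(cutoff, j)
```

Other criteria, such as the point closest to the top-left corner of the ROC plot or a cost-weighted rule, can give a different cut-off on the same curve.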

---

## Logistic Regression Conclusions:

- Stepwise model recommended based on both goodness of fit and predictive performance

- Fewer violations of regression assumptions due to the binary nature of the response

Summary of parameter estimates with odds ratios shown below:

- Learning Disability is the only predictor negatively associated with a satisfactory grade

- Notably high odds ratios for 4+ Tutoring Sessions, high Parental Involvement, and high Access to Resources

.scroll-100[

Table: Summary Stats with Odds Ratios

|                                     |    Estimate| Std. Error|    z value| Pr(>&#124;z&#124;)|  odds.ratio|
|:------------------------------------|-----------:|----------:|----------:|------------------:|-----------:|
|(Intercept)                          | -81.8608385|  2.9340460| -27.900326|          0.0000000|   0.0000000|
|Hours_Studied                        |   0.6884720|  0.0260886|  26.389720|          0.0000000|   1.9906714|
|Attendance                           |   0.4518533|  0.0164115|  27.532678|          0.0000000|   1.5712215|
|Parental_InvolvementMedium           |   2.2782241|  0.1946408|  11.704758|          0.0000000|   9.7593331|
|Parental_InvolvementHigh             |   4.4153203|  0.2439017|  18.102867|          0.0000000|  82.7083314|
|Access_to_ResourcesMedium            |   2.1525865|  0.1942602|  11.080947|          0.0000000|   8.6070915|
|Access_to_ResourcesHigh              |   4.5812743|  0.2428184|  18.867079|          0.0000000|  97.6387347|
|Extracurricular_ActivitiesYes        |   1.1643146|  0.1371816|   8.487396|          0.0000000|   3.2037264|
|Previous_Scores                      |   0.1106960|  0.0058645|  18.875684|          0.0000000|   1.1170553|
|Motivation_LevelMedium               |   1.1026390|  0.1556496|   7.084112|          0.0000000|   3.0121044|
|Motivation_LevelHigh                 |   2.5159070|  0.2057967|  12.225208|          0.0000000|  12.3778309|
|Internet_AccessYes                   |   2.2691498|  0.2692556|   8.427492|          0.0000000|   9.6711748|
|Family_IncomeMedium                  |   1.2491640|  0.1507814|   8.284602|          0.0000000|   3.4874264|
|Family_IncomeHigh                    |   2.1688944|  0.1870020|  11.598239|          0.0000000|   8.7486062|
|Teacher_QualityMedium                |   1.5181398|  0.2394651|   6.339713|          0.0000000|   4.5637279|
|Teacher_QualityHigh                  |   2.6640905|  0.2609433|  10.209461|          0.0000000|  14.3548885|
|Peer_InfluenceNeutral                |   1.1463525|  0.1861646|   6.157735|          0.0000000|   3.1466942|
|Peer_InfluencePositive               |   2.3583937|  0.1946989|  12.113030|          0.0000000|  10.5739530|
|Physical_Activity                    |   0.4935082|  0.0655051|   7.533885|          0.0000000|   1.6380527|
|Learning_DisabilitiesYes             |  -1.8861506|  0.2422159|  -7.787064|          0.0000000|   0.1516545|
|Parental_Education_LevelCollege      |   1.2793946|  0.1535546|   8.331852|          0.0000000|   3.5944628|
|Parental_Education_LevelPostgraduate |   2.3426961|  0.1805602|  12.974597|          0.0000000|  10.4092636|
|Distance_from_HomeModerate           |   0.9538673|  0.2528677|   3.772200|          0.0001618|   2.5957287|
|Distance_from_HomeNear               |   2.2399426|  0.2471267|   9.063945|          0.0000000|   9.3927924|
|f_Tutoring_Sessions1                 |   1.1424932|  0.1863416|   6.131175|          0.0000000|   3.1345738|
|f_Tutoring_Sessions2                 |   2.3250215|  0.2011508|  11.558601|          0.0000000|  10.2269004|
|f_Tutoring_Sessions3                 |   3.1771363|  0.2427745|  13.086780|          0.0000000|  23.9779886|
|f_Tutoring_Sessions4+                |   4.9440453|  0.3098494|  15.956287|          0.0000000| 140.3368014|

]
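
Each odds ratio in the table is the exponential of the corresponding coefficient estimate. A short Python check on three estimates copied from the table:

```python
import math

# Coefficient estimates copied from the stepwise model table
estimates = {
    "Hours_Studied": 0.6884720,
    "Learning_DisabilitiesYes": -1.8861506,
    "f_Tutoring_Sessions4+": 4.9440453,
}

# odds ratio = exp(coefficient)
odds_ratios = {name: math.exp(beta) for name, beta in estimates.items()}
for name, ratio in odds_ratios.items():
    print(f"{name}: {ratio:.4f}")
```

Read this way, a one-unit increase in Hours_Studied multiplies the odds of a satisfactory grade by about 1.99, holding the other predictors fixed, while a learning disability multiplies those odds by about 0.15.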

---
class: inverse center middle

# Summary and Discussion

---

# Comparison of Techniques

- Both linear and logistic regression found the same factors statistically insignificant (Gender, School Type, and Sleep Hours)

- Both linear and logistic regression found 4+ tutoring sessions per month, high access to resources, and high parental involvement highly significant

- In both models, the presence of a learning disability was the only significant predictor with a negative association

- Multiple linear regression was severely limited by the nonnormality of the response variable; we recommend bootstrapping and subgroup analysis

- Logistic regression was limited by its binary response but performed well

- Logistic regression recommended for analysis of this dataset

---
# References and Appendix

(1) https://www.kaggle.com/datasets/lainguyn123/student-performance-factors

(2) https://www.bestcolleges.com/blog/passing-grade-college/

(3) https://www.registrar.psu.edu/grades/grading-system.cfm

(4) https://www.statology.org/null-residual-deviance/