Introduction

This study investigates factors influencing academic performance among university students using statistical analysis in R.

Research Objective

To determine whether study hours, attendance rate, sleep duration, and gender significantly predict GPA among university students.

Import Data

##   Student_ID Gender       Department Study_Hours_Per_Day Sleep_Hours_Per_Day
## 1          1   Male Computer Science                 5.2                 7.7
## 2          2   Male         Business                 4.3                 5.7
## 3          3   Male         Business                 5.5                 8.5
## 4          4 Female    Public Health                 6.8                 5.1
## 5          5 Female    Public Health                 4.1                 7.5
## 6          6   Male    Public Health                 4.1                 9.4
##   Attendance_Rate  GPA
## 1              74 1.50
## 2              81 1.62
## 3              87 1.61
## 4              91 2.03
## 5              70 1.64
## 6              79 1.50
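
The import step that produced this preview is not shown. A minimal sketch of how such a preview could be generated is given below; it assumes a CSV source, and the file name `students.csv` is hypothetical because the report does not name its data file. Only the six rows printed above are reproduced.

```r
# Recreate the six rows shown above in a temporary CSV, then import them.
# "students.csv" is a hypothetical file name; the report does not name its source file.
csv_path <- file.path(tempdir(), "students.csv")
writeLines(c(
  "Student_ID,Gender,Department,Study_Hours_Per_Day,Sleep_Hours_Per_Day,Attendance_Rate,GPA",
  "1,Male,Computer Science,5.2,7.7,74,1.50",
  "2,Male,Business,4.3,5.7,81,1.62",
  "3,Male,Business,5.5,8.5,87,1.61",
  "4,Female,Public Health,6.8,5.1,91,2.03",
  "5,Female,Public Health,4.1,7.5,70,1.64",
  "6,Male,Public Health,4.1,9.4,79,1.50"
), csv_path)

# Read the file and preview the first rows, as in the output above.
df <- read.csv(csv_path, stringsAsFactors = FALSE)
head(df)
```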

Data Cleaning

##          Student_ID              Gender          Department Study_Hours_Per_Day 
##                   0                   0                   0                   0 
## Sleep_Hours_Per_Day     Attendance_Rate                 GPA 
##                   0                   0                   0
## [1] 0
## 'data.frame':    120 obs. of  7 variables:
##  $ Student_ID         : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Gender             : Factor w/ 2 levels "Female","Male": 2 2 2 1 1 2 2 1 2 2 ...
##  $ Department         : Factor w/ 4 levels "Business","Computer Science",..: 2 1 1 3 3 3 2 3 1 1 ...
##  $ Study_Hours_Per_Day: num  5.2 4.3 5.5 6.8 4.1 4.1 6.9 5.7 3.8 5.3 ...
##  $ Sleep_Hours_Per_Day: num  7.7 5.7 8.5 5.1 7.5 9.4 5.6 6.1 6.9 6.2 ...
##  $ Attendance_Rate    : int  74 81 87 91 70 79 77 75 100 86 ...
##  $ GPA                : num  1.5 1.62 1.61 2.03 1.64 1.5 1.78 1.57 1.69 1.92 ...
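
The three outputs above are consistent with a missing-value count per column, a single count (assumed here to be the number of duplicated rows), and a structure listing after the categorical columns were converted to factors. The sketch below shows one way to run those checks in base R; it uses only the six rows from the import preview, so the factor for Department has three levels here rather than four.

```r
# The six rows shown in the import preview (not the full 120-row dataset).
df <- data.frame(
  Student_ID = 1:6,
  Gender = c("Male", "Male", "Male", "Female", "Female", "Male"),
  Department = c("Computer Science", "Business", "Business",
                 "Public Health", "Public Health", "Public Health"),
  Study_Hours_Per_Day = c(5.2, 4.3, 5.5, 6.8, 4.1, 4.1),
  Sleep_Hours_Per_Day = c(7.7, 5.7, 8.5, 5.1, 7.5, 9.4),
  Attendance_Rate = c(74L, 81L, 87L, 91L, 70L, 79L),
  GPA = c(1.50, 1.62, 1.61, 2.03, 1.64, 1.50),
  stringsAsFactors = FALSE
)

# Count missing values in each column.
missing_per_column <- colSums(is.na(df))
print(missing_per_column)

# Count fully duplicated rows (assumption: this is what the single "0" reports).
n_duplicates <- sum(duplicated(df))
print(n_duplicates)

# Convert categorical columns to factors, then inspect the structure.
df$Gender <- factor(df$Gender)
df$Department <- factor(df$Department)
str(df)
```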

Descriptive Statistics

##                     vars   n  mean    sd median trimmed   mad  min    max
## Student_ID             1 120 60.50 34.79  60.50   60.50 44.48  1.0 120.00
## Gender*                2 120  1.51  0.50   2.00    1.51  0.00  1.0   2.00
## Department*            3 120  2.38  1.03   2.00    2.35  1.48  1.0   4.00
## Study_Hours_Per_Day    4 120  4.38  1.38   4.40    4.38  1.04  1.0   8.20
## Sleep_Hours_Per_Day    5 120  6.88  1.18   7.00    6.84  1.19  4.4  10.00
## Attendance_Rate        6 120 82.03  8.91  82.00   82.03  8.90 50.0 100.00
## GPA                    7 120  1.67  0.17   1.62    1.64  0.19  1.5   2.23
##                      range  skew kurtosis   se
## Student_ID          119.00  0.00    -1.23 3.18
## Gender*               1.00 -0.03    -2.02 0.05
## Department*           3.00  0.11    -1.16 0.09
## Study_Hours_Per_Day   7.20  0.02    -0.07 0.13
## Sleep_Hours_Per_Day   5.60  0.25    -0.22 0.11
## Attendance_Rate      50.00 -0.20     0.31 0.81
## GPA                   0.73  1.02     0.47 0.02

The mean GPA was 1.67 (SD = 0.17, median = 1.62), and students studied an average of 4.38 hours per day (SD = 1.38). Mean attendance was 82.03% and mean sleep duration was 6.88 hours per day.
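
The layout of the table above resembles the output of a descriptive-statistics helper such as `psych::describe`, although the report does not show the call. A base R sketch of the core columns follows. The helper name `describe_basic` is hypothetical, and the input is limited to the first ten values of each variable printed in the structure listing, so the numbers will not match the full-sample table.

```r
# First ten values of each numeric variable, as printed in the structure listing.
sample_df <- data.frame(
  Study_Hours_Per_Day = c(5.2, 4.3, 5.5, 6.8, 4.1, 4.1, 6.9, 5.7, 3.8, 5.3),
  Sleep_Hours_Per_Day = c(7.7, 5.7, 8.5, 5.1, 7.5, 9.4, 5.6, 6.1, 6.9, 6.2),
  Attendance_Rate = c(74, 81, 87, 91, 70, 79, 77, 75, 100, 86),
  GPA = c(1.5, 1.62, 1.61, 2.03, 1.64, 1.5, 1.78, 1.57, 1.69, 1.92)
)

# Hypothetical helper: one row per variable, one column per statistic.
describe_basic <- function(data) {
  t(sapply(data, function(x) c(
    n = length(x),
    mean = mean(x),
    sd = sd(x),
    median = median(x),
    min = min(x),
    max = max(x),
    se = sd(x) / sqrt(length(x))  # standard error of the mean
  )))
}

stats_table <- describe_basic(sample_df)
print(round(stats_table, 2))
```

The `se` column is the standard deviation divided by the square root of n, which is consistent with the full-sample table (for example, 1.38 / sqrt(120) is about 0.13 for study hours).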

GPA Distribution

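The GPA distribution was positively skewed (skewness = 1.02). Values clustered toward the lower end of the observed range (minimum 1.50, median 1.62), with a tail extending up to the maximum of 2.23.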
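
The histogram itself is not reproduced here. The sketch below shows how such a plot and a simple moment-based skewness could be produced in base R. It uses only the first ten GPA values printed in the structure listing, the bin width of 0.1 is an arbitrary choice, and the skewness estimator may differ slightly from the one used for the table above.

```r
# First ten GPA values from the structure listing (not the full sample).
gpa <- c(1.5, 1.62, 1.61, 2.03, 1.64, 1.5, 1.78, 1.57, 1.69, 1.92)

# Histogram with 0.1-wide bins; the bin width is an illustrative choice.
h <- hist(gpa,
          breaks = seq(1.5, 2.1, by = 0.1),
          main = "GPA Distribution",
          xlab = "GPA",
          col = "grey80")

# Moment-based skewness: third central moment over the 1.5 power of the second.
moment_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

skew <- moment_skewness(gpa)
print(skew)  # a positive value indicates a longer right tail
```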

Study Hours vs GPA
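
The scatter plot for this section is not reproduced here. A minimal base R sketch of such a plot, with a fitted least-squares line, is shown below. It uses only the first ten pairs of values printed in the structure listing, so it illustrates the technique rather than the full-sample pattern.

```r
# First ten (study hours, GPA) pairs from the structure listing.
study_hours <- c(5.2, 4.3, 5.5, 6.8, 4.1, 4.1, 6.9, 5.7, 3.8, 5.3)
gpa <- c(1.5, 1.62, 1.61, 2.03, 1.64, 1.5, 1.78, 1.57, 1.69, 1.92)

# Scatter plot of GPA against daily study hours.
plot(study_hours, gpa,
     main = "Study Hours vs GPA",
     xlab = "Study hours per day",
     ylab = "GPA",
     pch = 19)

# Overlay the simple least-squares line for these ten points.
simple_fit <- lm(gpa ~ study_hours)
abline(simple_fit, lwd = 2)
```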

Correlation Analysis

## 
##  Pearson's product-moment correlation
## 
## data:  df$Study_Hours_Per_Day and df$GPA
## t = 8.4886, df = 118, p-value = 7.192e-14
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4906475 0.7159618
## sample estimates:
##       cor 
## 0.6157381

A strong, positive, and statistically significant relationship was observed between study hours and GPA (r = 0.62, 95% CI [0.49, 0.72], t(118) = 8.49, p < .001).
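
As a check on the output above, the test statistic and confidence interval can be recomputed from the reported correlation and sample size alone. The sketch below uses the standard t transformation of r and the Fisher z transformation for the interval; the variable names are illustrative.

```r
# Values taken from the correlation output above.
r <- 0.6157381
n <- 120

# t statistic for H0: true correlation = 0, on n - 2 degrees of freedom.
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# 95% confidence interval via the Fisher z transformation.
z <- atanh(r)
se_z <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se_z)

print(round(t_stat, 4))
print(p_value)
print(round(ci, 4))
```

The recomputed values agree with the reported t statistic and interval to the printed precision.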

Multiple Regression Analysis

## 
## Call:
## lm(formula = GPA ~ Study_Hours_Per_Day + Sleep_Hours_Per_Day + 
##     Attendance_Rate + Gender, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.24030 -0.09445 -0.01424  0.07814  0.50834 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         0.930908   0.136063   6.842 4.02e-10 ***
## Study_Hours_Per_Day 0.080158   0.009087   8.821 1.43e-14 ***
## Sleep_Hours_Per_Day 0.006283   0.010752   0.584  0.56014    
## Attendance_Rate     0.004121   0.001404   2.934  0.00404 ** 
## GenderMale          0.003324   0.025048   0.133  0.89467    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1345 on 115 degrees of freedom
## Multiple R-squared:  0.4272, Adjusted R-squared:  0.4073 
## F-statistic: 21.44 on 4 and 115 DF,  p-value: 3.106e-13
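
The summary statistics above can be cross-checked from the reported values. The sketch below recomputes the t values, the adjusted R-squared, and the F statistic, and then evaluates the fitted equation for one illustrative student. The student profile (5 study hours, 7 sleep hours, 85% attendance, male) is hypothetical and chosen only to show how the coefficients combine.

```r
# Estimates and standard errors copied from the coefficient table above.
coefs <- c(intercept = 0.930908, study = 0.080158, sleep = 0.006283,
           attendance = 0.004121, male = 0.003324)
std_errors <- c(0.136063, 0.009087, 0.010752, 0.001404, 0.025048)

# Each t value is the estimate divided by its standard error.
t_values <- coefs / std_errors
print(round(t_values, 3))

# Adjusted R-squared and F statistic from R-squared, n, and k predictors.
r2 <- 0.4272
n <- 120
k <- 4
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)
f_stat <- (r2 / k) / ((1 - r2) / (n - k - 1))
print(round(c(adj_r2 = adj_r2, f_stat = f_stat), 4))

# Predicted GPA for a hypothetical student (illustrative values only):
# intercept term, 5 study hours, 7 sleep hours, 85% attendance, male = 1.
hypothetical_student <- c(1, 5, 7, 85, 1)
predicted_gpa <- sum(coefs * hypothetical_student)
print(round(predicted_gpa, 2))
```

Holding the other predictors constant, each additional hour of daily study is associated with a GPA about 0.08 points higher, and each additional percentage point of attendance with a GPA about 0.004 points higher.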

Regression Diagnostics
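
The diagnostic plots for this section are not reproduced here, and the report does not name the specific tests used. The sketch below shows a common base R workflow: the four standard `plot()` panels for a fitted `lm` object and a Shapiro-Wilk test on the residuals. The data are synthetic, simulated with values loosely based on the summary statistics reported earlier, and are an assumption made only so the example runs on its own.

```r
# SYNTHETIC data for illustration only; this is not the study's dataset.
set.seed(42)
n <- 120
sim <- data.frame(
  Study_Hours_Per_Day = rnorm(n, mean = 4.38, sd = 1.38),
  Attendance_Rate = rnorm(n, mean = 82, sd = 8.9)
)
sim$GPA <- 0.93 + 0.08 * sim$Study_Hours_Per_Day +
  0.004 * sim$Attendance_Rate + rnorm(n, sd = 0.13)

# Fit a linear model on the synthetic data.
fit <- lm(GPA ~ Study_Hours_Per_Day + Attendance_Rate, data = sim)

# Standard diagnostic panels: residuals vs fitted, normal Q-Q,
# scale-location, and residuals vs leverage.
old_par <- par(mfrow = c(2, 2))
plot(fit)
par(old_par)

# Shapiro-Wilk test of residual normality (one possible formal check).
shapiro_result <- shapiro.test(residuals(fit))
print(shapiro_result)
```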

Limitations of This Study

  1. Limited Number of Predictors

The regression model included only four predictors: study hours, sleep hours, attendance rate, and gender. Department was recorded in the dataset but was not included in the model. Academic performance is a complex phenomenon influenced by numerous additional factors such as socioeconomic status, prior academic achievement, motivation, learning styles, mental health, access to learning resources, family support, and teaching quality. The omission of these factors may limit the model’s explanatory power.

  2. Moderate Explanatory Power of the Model

The multiple regression model explained approximately 42.7% of the variation in GPA (adjusted R-squared = 0.407). While this indicates a meaningful relationship between the predictors and academic performance, more than half of the variation remains unexplained. This suggests that other influential factors not included in the model may contribute to students’ academic outcomes.

  3. Cross-Sectional Nature of the Data

The dataset represents observations collected at a single point in time. Consequently, the study can identify associations between variables but cannot establish causal relationships. For example, although higher study hours are associated with higher GPA, the analysis cannot conclusively determine whether increased studying directly causes improved academic performance.

  4. Potential Self-Reporting Bias

Variables such as study hours and sleep hours are often self-reported by participants. Self-reported data may be subject to recall errors, exaggeration, or underreporting, which could affect the accuracy of the findings.

  5. Limited Generalizability

The findings are based on a specific sample of MSc students. Therefore, the results may not be generalizable to undergraduate students, students from other institutions, or learners in different educational and cultural contexts.

  6. Linear Regression Assumptions

Although diagnostic tests indicated that the assumptions of linear regression were reasonably satisfied, minor patterns observed in the residual plots suggest that the relationships between variables may not be perfectly linear. More advanced modeling techniques could potentially capture additional variation in academic performance.

  7. Sample Size Constraints

The dataset contains 120 observations, which may limit the statistical power to detect small effects. A larger sample could improve the reliability and stability of the estimated relationships.

  8. Measurement of Academic Performance

Academic performance was measured solely using GPA. While GPA is a widely accepted indicator of academic success, it may not fully capture other dimensions of student achievement such as research productivity, practical skills, critical thinking abilities, or professional competencies.

Recommendations

Based on the findings of this study, several recommendations can be made to enhance academic performance among university students.

  1. Promote Structured Study Habits

Study hours emerged as the strongest predictor of GPA, indicating that students who dedicate more time to academic activities tend to achieve better academic outcomes. Universities should therefore encourage students to adopt structured study schedules through academic advising, study skills workshops, and time management training programs. Such interventions may help students develop consistent learning habits and improve academic performance.

  2. Strengthen Class Attendance Policies

Attendance rate was found to be a statistically significant predictor of GPA. This suggests that regular participation in lectures and academic activities is positively associated with student success. Institutions should consider implementing attendance monitoring systems, increasing engagement during lectures, and providing early interventions for students with poor attendance records. Faculty members can also adopt interactive teaching strategies that encourage consistent participation.

  3. Establish Academic Support Programs

Because the regression model explained only about 42.7% of the variation in GPA, a substantial proportion of the variation in academic performance is attributable to factors not included in this study. Universities should strengthen tutoring services, peer mentorship programs, academic counseling, and learning support centers to address diverse student needs and promote academic success.

  4. Develop Early Warning Systems

Academic institutions should consider using attendance records and study engagement indicators to identify students who may be at risk of poor academic performance. Early identification would allow targeted interventions before academic difficulties become severe.

  5. Encourage Evidence-Based Student Development Programs

While sleep duration was not a statistically significant predictor in the final model, student wellbeing remains an important component of academic success. Universities should continue promoting healthy lifestyles, stress management, and mental wellness initiatives as part of a holistic approach to student development.

  6. Recommendations for Future Research

Future studies should investigate additional factors that may influence academic performance, including those noted in the limitations above, such as socioeconomic status, prior academic achievement, motivation, mental health, and teaching quality.

Including these variables may improve the explanatory power of future predictive models and provide a more comprehensive understanding of academic performance.

Conclusion

This study examined the factors influencing academic performance among university students using descriptive statistics, data visualization, correlation analysis, and multiple linear regression techniques in R.

The findings revealed a strong positive and statistically significant relationship between study hours and GPA. Multiple regression analysis further demonstrated that study hours and attendance rate were significant predictors of academic performance, while sleep duration and gender did not significantly predict GPA after controlling for other variables. Study hours emerged as the most influential predictor, highlighting the critical role of academic engagement and independent learning in student success.

The regression model explained approximately 42.7% of the variation in GPA, indicating that the included variables provide meaningful insight into academic performance while also suggesting that other factors contribute to student outcomes. Diagnostic tests indicated that the model met key regression assumptions reasonably well, supporting the reliability of the findings.

Overall, the results suggest that interventions aimed at improving study habits and increasing class attendance may contribute to enhanced academic achievement. These findings provide valuable evidence for educators, university administrators, and policymakers seeking to design strategies that support student success and improve educational outcomes.

The study demonstrates how data analytics and statistical modeling can be used to transform raw educational data into actionable insights that inform decision-making and evidence-based academic interventions.