What’s the problem?

AI tools have become increasingly prevalent in many professional industries as well as in schools. This dataset tracks 50,000 students across a semester, recording their academic profiles and AI use behaviors.

My research question is whether increased generative AI use leads to higher levels of academic burnout and lower levels of skill retention among students.

The complete data set and information about its contents can be found here: https://www.kaggle.com/datasets/laveshjadon/ai-impact-on-students

Accessing the Data

We first need to read our data from the CSV file. The original dataset has 16 variables, and none of the data points are missing.

student_data <- read.csv("AI_Student_Impact.csv")
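The statement that no values are missing can be checked directly after loading. The sketch below uses a small made-up data frame (`toy_data` and its values are illustrative and not taken from the dataset); the same calls should work on `student_data`.

```r
# Hypothetical stand-in for student_data; values are illustrative only
toy_data <- data.frame(
  Pre_Semester_GPA   = c(2.4, 3.8, NA, 3.6),
  Weekly_GenAI_Hours = c(23.3, 1.1, 21.3, 9.3)
)

# Count missing values per column
na_per_column <- colSums(is.na(toy_data))
print(na_per_column)

# TRUE if any value anywhere in the data frame is missing
has_missing <- any(is.na(toy_data))
print(has_missing)
```

On the full dataset, a count of zero in every column would match the statement above.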

I will subset the original dataset to the 8 variables that are relevant to the research question.

relevant_student_data <- student_data[, c(
"Pre_Semester_GPA", "Weekly_GenAI_Hours", 
"Paid_Subscription", "Perceived_AI_Dependency",
"Anxiety_Level_During_Exams", "Post_Semester_GPA",
"Skill_Retention_Score", "Burnout_Risk_Level")]

Data Cleaning

str(relevant_student_data)
## 'data.frame':    50000 obs. of  8 variables:
##  $ Pre_Semester_GPA          : num  2.42 3.82 3.4 3.79 3.63 ...
##  $ Weekly_GenAI_Hours        : num  23.31 1.12 21.26 1.82 9.29 ...
##  $ Paid_Subscription         : chr  "True" "False" "False" "False" ...
##  $ Perceived_AI_Dependency   : int  5 3 5 2 4 4 8 2 1 3 ...
##  $ Anxiety_Level_During_Exams: int  6 9 9 2 4 5 7 1 5 8 ...
##  $ Post_Semester_GPA         : num  2.39 3.7 3.5 4 3.8 ...
##  $ Skill_Retention_Score     : num  86.4 69.4 73.9 63.6 100 ...
##  $ Burnout_Risk_Level        : chr  "High" "Low" "Medium" "Medium" ...

Both the Paid Subscription variable and Burnout Risk Level variable are character strings. In order to perform statistical analysis and visualization, I will convert them into factors.

relevant_student_data$Paid_Subscription <- factor(
  relevant_student_data$Paid_Subscription)
relevant_student_data$Burnout_Risk_Level <- factor(
  relevant_student_data$Burnout_Risk_Level, 
  levels = c("Low", "Medium", "High"), ordered = TRUE)
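To illustrate what the explicit level order buys us, here is a minimal sketch on a made-up vector (the values are illustrative only). It shows that counts come back in Low, Medium, High order rather than alphabetically, and that an ordered factor can be compared against a level.

```r
# Illustrative values only, mirroring the three burnout labels in the dataset
burnout <- factor(c("High", "Low", "Medium", "Medium"),
                  levels = c("Low", "Medium", "High"), ordered = TRUE)

# Counts follow the declared level order, not alphabetical order
burnout_counts <- table(burnout)
print(burnout_counts)

# Ordered factors support comparisons against a level
above_low <- burnout > "Low"
print(above_low)
```

Without `ordered = TRUE` the comparison would return `NA` with a warning, and without `levels` the plots and tables would sort the labels as High, Low, Medium.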

We can check whether the Paid Subscription and Burnout Risk Level variables were converted to factors by calling str() again.

## 'data.frame':    50000 obs. of  8 variables:
##  $ Pre_Semester_GPA          : num  2.42 3.82 3.4 3.79 3.63 ...
##  $ Weekly_GenAI_Hours        : num  23.31 1.12 21.26 1.82 9.29 ...
##  $ Paid_Subscription         : Factor w/ 2 levels "False","True": 2 1 1 1 1 1 2 1 2 2 ...
##  $ Perceived_AI_Dependency   : int  5 3 5 2 4 4 8 2 1 3 ...
##  $ Anxiety_Level_During_Exams: int  6 9 9 2 4 5 7 1 5 8 ...
##  $ Post_Semester_GPA         : num  2.39 3.7 3.5 4 3.8 ...
##  $ Skill_Retention_Score     : num  86.4 69.4 73.9 63.6 100 ...
##  $ Burnout_Risk_Level        : Ord.factor w/ 3 levels "Low"<"Medium"<..: 3 1 2 2 2 3 2 2 2 3 ...

Basic Information about Data

## [1] 50000     8
## [1] "Pre_Semester_GPA"           "Weekly_GenAI_Hours"        
## [3] "Paid_Subscription"          "Perceived_AI_Dependency"   
## [5] "Anxiety_Level_During_Exams" "Post_Semester_GPA"         
## [7] "Skill_Retention_Score"      "Burnout_Risk_Level"

We have 50,000 data points with 8 variables to analyze. The variables are listed above.
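The output above has the shape produced by `dim()` and `names()`. Assuming those were the calls used, a minimal sketch on a made-up data frame (`toy_data` is illustrative) looks like this:

```r
# Hypothetical stand-in for relevant_student_data
toy_data <- data.frame(
  Weekly_GenAI_Hours    = c(23.3, 1.1, 21.3),
  Skill_Retention_Score = c(86.4, 69.4, 73.9)
)

data_dims <- dim(toy_data)       # number of rows, then number of columns
column_names <- names(toy_data)  # variable names in column order

print(data_dims)
print(column_names)
```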

Paid Subscription and Burnout Visualization
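The figure for this slide is not reproduced in the text. As one possible way to compare burnout across subscription status, the sketch below cross-tabulates the two factors and, if ggplot2 is installed, draws a proportional bar chart. The toy data and the choice of chart are assumptions, not a reconstruction of the original plot.

```r
# Hypothetical stand-in for relevant_student_data; values are illustrative only
toy_data <- data.frame(
  Paid_Subscription  = factor(c("True", "False", "False", "True", "False", "True")),
  Burnout_Risk_Level = factor(c("High", "Low", "Medium", "Medium", "Low", "High"),
                              levels = c("Low", "Medium", "High"), ordered = TRUE)
)

# Share of each burnout level within each subscription group (rows sum to 1)
burnout_share <- prop.table(
  table(toy_data$Paid_Subscription, toy_data$Burnout_Risk_Level), margin = 1)
print(round(burnout_share, 2))

# Optional plot; skipped when ggplot2 is not available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  burnout_plot <- ggplot(toy_data,
                         aes(x = Paid_Subscription, fill = Burnout_Risk_Level)) +
    geom_bar(position = "fill") +
    labs(x = "Paid Subscription", y = "Proportion of Students",
         fill = "Burnout Risk") +
    theme_minimal()
}
```

Using proportions rather than raw counts keeps the two groups comparable even if far fewer students have a paid subscription.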

Correlation Test Between AI Hours and Skill Retention

I would like to know whether there is a relationship between weekly AI use and skill retention in students.

## 
##  Pearson's product-moment correlation
## 
## data:  relevant_student_data$Weekly_GenAI_Hours and relevant_student_data$Skill_Retention_Score
## t = -26.593, df = 49998, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1267331 -0.1094470
## sample estimates:
##       cor 
## -0.118099

Correlation Coefficient: -0.118

p-value < 2.2 x 10^-16

The data suggest a statistically significant but weak negative correlation between weekly AI hours and skill retention (95% confidence interval: -0.127 to -0.109).
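A tiny p-value and a weak correlation can coexist because the sample is so large. As a worked check, the sketch below recomputes the t statistic from the reported correlation and degrees of freedom using the standard formula t = r * sqrt(df) / sqrt(1 - r^2), and squares r to get the share of variance in skill retention that a linear relationship with AI hours would explain.

```r
r  <- -0.118099   # reported correlation coefficient
df <- 49998       # reported degrees of freedom (n - 2)

# t statistic for testing whether the true correlation is zero
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Coefficient of determination: share of variance explained
r_squared <- r^2

print(round(t_stat, 3))
print(round(r_squared, 4))
```

The recomputed t statistic should land very close to the reported -26.593, while r squared comes out at roughly 0.014, meaning only about 1.4% of the variance is accounted for.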

Visualizing AI Use vs Skill Retention

I want to take a look at AI use vs skill retention. To do that, I will make a ggplot2 scatter plot of all points with a fitted linear model line.

library(ggplot2)

ggplot(relevant_student_data,
       aes(x = Weekly_GenAI_Hours, y = Skill_Retention_Score)) +
  geom_point(alpha = 0.3, color = "steelblue") +
  geom_smooth(method = "lm", color = "red") +
  labs(title = "Weekly AI Use vs Skill Retention",
       x = "Weekly AI Hours", y = "Skill Retention Score") +
  theme_minimal()

There is a slightly negative trend between weekly AI hours and skill retention. Most data points fall between 0 and 10 hours of weekly AI use.
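Both observations can be put into numbers. The sketch below, on simulated data (the values and the exponential shape of the usage distribution are assumptions for illustration), fits the same kind of linear model that `geom_smooth(method = "lm")` draws and computes the share of students at or below 10 weekly hours.

```r
# Simulated data skewed toward low usage; not the real dataset
set.seed(2)
hours <- rexp(300, rate = 1 / 6)
retention <- 80 - 0.4 * hours + rnorm(300, sd = 8)
toy_data <- data.frame(Weekly_GenAI_Hours = hours,
                       Skill_Retention_Score = retention)

# Simple linear regression of skill retention on weekly AI hours
fit <- lm(Skill_Retention_Score ~ Weekly_GenAI_Hours, data = toy_data)
slope <- coef(fit)[["Weekly_GenAI_Hours"]]
print(slope)   # change in retention score per extra weekly hour

# Share of observations at or below 10 weekly hours
share_under_10 <- mean(toy_data$Weekly_GenAI_Hours <= 10)
print(share_under_10)
```

Run against `relevant_student_data`, the slope would give the expected change in skill retention score for each additional weekly hour of AI use.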

Relating Student Burnout

I would also like to look at how a student’s burnout level may relate to their AI usage and skill retention. I will make a colored plot with burnout level as the color of the data points.

library(plotly)

plot_ly(relevant_student_data,
        x = ~Weekly_GenAI_Hours,
        y = ~Skill_Retention_Score,
        color = ~Burnout_Risk_Level,
        type = "scatter",
        mode = "markers") %>%
  plotly::layout(
    title = "Interactive AI Usage vs Skill Retention with Colored Burnout Risk",
    xaxis = list(title = "Weekly AI Hours"),
    yaxis = list(title = "Skill Retention Score"))

There seems to be a trend of higher levels of burnout with more weekly AI use.
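A visual impression like this can be backed up with a simple summary. The sketch below, on made-up values (`toy_data` is illustrative), compares mean weekly AI hours across burnout levels and computes a Spearman rank correlation, which suits an ordinal variable such as burnout level.

```r
# Hypothetical stand-in for relevant_student_data; values are illustrative only
toy_data <- data.frame(
  Weekly_GenAI_Hours = c(2, 4, 9, 11, 20, 24),
  Burnout_Risk_Level = factor(c("Low", "Low", "Medium", "Medium", "High", "High"),
                              levels = c("Low", "Medium", "High"), ordered = TRUE)
)

# Mean weekly AI hours within each burnout level
mean_hours <- tapply(toy_data$Weekly_GenAI_Hours,
                     toy_data$Burnout_Risk_Level, mean)
print(mean_hours)

# Spearman rank correlation, treating burnout as an ordinal score (1, 2, 3)
rho <- cor(toy_data$Weekly_GenAI_Hours,
           as.integer(toy_data$Burnout_Risk_Level),
           method = "spearman")
print(rho)
```

Rising group means and a positive rank correlation on the real data would support the trend seen in the plot.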

Creating Change in GPA

I also want to take a look at the change in GPA as it relates to the amount of weekly AI use. I will do this by making an animation that maps the change in GPA as AI use per week increases. For this, we first need to add the change in GPA to the data frame.

relevant_student_data$GPA_Change <-
  relevant_student_data$Post_Semester_GPA -
  relevant_student_data$Pre_Semester_GPA
head(relevant_student_data$GPA_Change)
## [1] -0.025 -0.125  0.101  0.211  0.163  0.217

Visualizing Change in GPA
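The animation itself is not reproduced in the text. One possible way to build such an animation is plotly's `frame` argument, with weekly hours grouped into bins so that each bin becomes one frame. The toy data, the 5-hour bin width, the `Hours_Bin` column, and the choice of axes are all assumptions for illustration, not the original method.

```r
# Simulated stand-in for relevant_student_data; values are illustrative only
set.seed(3)
toy_data <- data.frame(
  Weekly_GenAI_Hours = runif(120, min = 0, max = 30),
  Pre_Semester_GPA   = runif(120, min = 2, max = 4)
)
toy_data$Post_Semester_GPA <- toy_data$Pre_Semester_GPA + rnorm(120, sd = 0.15)
toy_data$GPA_Change <- toy_data$Post_Semester_GPA - toy_data$Pre_Semester_GPA

# Group weekly hours into 5-hour bins; each bin becomes one animation frame
toy_data$Hours_Bin <- cut(toy_data$Weekly_GenAI_Hours,
                          breaks = seq(0, 30, by = 5), include.lowest = TRUE)

# Build the animated scatter plot only when plotly is available
if (requireNamespace("plotly", quietly = TRUE)) {
  gpa_animation <- plotly::plot_ly(
    toy_data,
    x = ~Pre_Semester_GPA, y = ~GPA_Change,
    frame = ~Hours_Bin,
    type = "scatter", mode = "markers")
}
```

Printing `gpa_animation` in an interactive session displays the plot with a play button and a slider that steps through the hour bins.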

Conclusion

Research Question: Does increased generative AI use lead to higher burnout and lower skill retention?

Findings

  • Statistically significant weak negative correlation between weekly AI use and skill retention.

  • Burnout occurs at all levels of AI use, but higher burnout levels appear more common among students with more weekly AI use.

  • GPA change seems more variable as AI use increases, but there is no clear trend.