Thesis: AI Use in Higher Education

Research Question

  Is the use of artificial intelligence affecting the learning tendencies of higher education students?

Hypotheses

  - Use of Artificial Intelligence in higher education is associated with declining active learning behaviors among students.
  - Use of Artificial Intelligence in higher education is associated with passive learning tendencies among students.
  - Use of Artificial Intelligence in higher education is associated with laziness among students.

Dependent Variables

 Active Learning: avrgActive (mean of AP1, AP2, AP3, AP4, AP5, AP6, AP7, AP8, AP10)
 Passive Learning: AP9
 Laziness: avrgLaziness (mean of L1, L2, L3, L4, L5, L6)

Independent Variables

AI_use, Degree, AI_tool

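Libraries

library(dplyr)     # %>% and filter()
library(ggplot2)   # ggplot()
library(plotly)    # plot_ly(), layout(), ggplotly()
library(corrplot)  # corrplot()
library(car)       # vif(), durbinWatsonTest()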

Importing Data

data <- read.csv("C:/Users/billy/OneDrive/Documents/ANLY 699/AIdata.csv")
str(data)
## 'data.frame':    64 obs. of  20 variables:
##  $ Timestamp                                                                                                            : chr  "2024/04/16 9:25:37 PM AST" "2024/04/16 10:53:10 PM AST" "2024/04/19 1:57:47 PM AST" "2024/04/20 2:59:21 PM AST" ...
##  $ I.am.a.student.pursuing.a.n..                                                                                        : chr  "Undergraduate Degree" "Undergraduate Degree" "Graduate Degree" "Graduate Degree" ...
##  $ I.frequently.use.AI.based.tools.or.systems.in.my.coursework.                                                         : int  3 3 3 4 5 2 1 1 5 3 ...
##  $ I.commonly.use.the.following.types.of.AI.tools.in.my.studies.                                                        : chr  "Grammarly" "Photoshop ai generator, ChatGPT " "ChatGPT" "Chatgpt, QuilBot" ...
##  $ Made.a.class.or.online.presentation...                                                                               : int  2 1 2 4 5 1 2 5 1 1 ...
##  $ Participated.in.a.community.based.project..e.g...volunteering..as.part.of.your.study.                                : int  1 1 3 1 1 1 5 5 2 2 ...
##  $ Discussed.ideas.from.your.readings.or.classes.with.others.outside.class..e.g...students..family.members..co.workers..: int  3 3 5 4 4 1 5 5 4 3 ...
##  $ Tutored.or.taught.other.university.students..paid.or.voluntary..                                                     : int  1 1 1 1 4 1 1 5 2 1 ...
##  $ I.communicated.or.worked.online.with.other.students.rather.than.use.AI.                                              : int  4 5 2 3 4 5 5 5 2 5 ...
##  $ Asked.questions.or.contributed.to.discussions.in.class.or.online.without.AI.assistance                               : int  5 5 4 4 5 5 5 5 1 5 ...
##  $ Instead.of.using.AI.tools..I.searched.online.for.resources.relevant.to.my.studies.                                   : int  5 4 5 3 4 3 5 5 1 4 ...
##  $ I.persisted.with.challenging.learning.activities.despite.initial.setbacks.without.using.AI.                          : int  3 5 3 2 3 3 5 5 4 3 ...
##  $ I.feel.less.engaged.with.the.course.material.when.I.use.AI.                                                          : int  4 2 4 5 5 1 1 5 1 2 ...
##  $ I.tend.to.come.up.with.ideas.independently.prior.to.using.AI.                                                        : int  2 5 4 3 2 5 3 1 5 4 ...
##  $ Do.you.feel.a.lack.of.ability.contributes.to.your.tendency.to.delay.tasks..                                          : int  3 4 2 2 2 1 5 5 4 5 ...
##  $ How.much.does.a.lack.of.interest.or.enthusiasm.impact.your.motivation.to.complete.tasks.promptly..                   : int  5 2 4 5 5 5 4 5 5 4 ...
##  $ To.what.degree.do.you.find.yourself.intentionally.delaying.tasks.without.any.external.pressure.or.influence..        : int  4 2 4 4 4 3 2 5 5 5 ...
##  $ How.often.does.AI.use.contribute.to.your.tendency.to.procrastinate.on.tasks..                                        : int  4 2 3 4 3 1 5 3 4 3 ...
##  $ I.feel.motivated.to.complete.my.school.work.without.using.AI.tools..                                                 : int  4 4 3 3 4 5 5 1 1 2 ...
##  $ The.availability.of.AI.increases.my.laziness.tendencies.                                                             : int  3 2 4 5 2 1 1 5 3 4 ...

Data Handling

data <- data[,-1]
colnames(data) <- c("Degree", "AI_use", "AI_tool", "AP1", "AP2", "AP3", "AP4", "AP5", "AP6", "AP7", "AP8", "AP9", "AP10", "L1", "L2", "L3", "L4", "L5", "L6")


exclusionCriteria <- data$AI_tool %in% c("almost never", "google scholar", "Google Scholar") 
data <- data[!exclusionCriteria, ]
summary(data)
##     Degree              AI_use        AI_tool               AP1       
##  Length:63          Min.   :1.000   Length:63          Min.   :1.000  
##  Class :character   1st Qu.:3.000   Class :character   1st Qu.:1.000  
##  Mode  :character   Median :4.000   Mode  :character   Median :2.000  
##                     Mean   :3.667                      Mean   :2.619  
##                     3rd Qu.:5.000                      3rd Qu.:4.000  
##                     Max.   :5.000                      Max.   :5.000  
##       AP2             AP3             AP4             AP5       
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :2.000  
##  1st Qu.:2.500   1st Qu.:4.000   1st Qu.:1.000   1st Qu.:3.000  
##  Median :4.000   Median :4.000   Median :1.000   Median :4.000  
##  Mean   :3.651   Mean   :4.127   Mean   :2.016   Mean   :4.063  
##  3rd Qu.:5.000   3rd Qu.:5.000   3rd Qu.:3.000   3rd Qu.:5.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
##       AP6             AP7             AP8             AP9       
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:4.000   1st Qu.:3.000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :5.000   Median :4.000   Median :4.000   Median :4.000  
##  Mean   :4.286   Mean   :3.778   Mean   :3.825   Mean   :3.286  
##  3rd Qu.:5.000   3rd Qu.:5.000   3rd Qu.:5.000   3rd Qu.:5.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
##       AP10             L1              L2              L3       
##  Min.   :1.000   Min.   :1.000   Min.   :2.000   Min.   :1.000  
##  1st Qu.:2.000   1st Qu.:3.000   1st Qu.:4.000   1st Qu.:2.000  
##  Median :4.000   Median :4.000   Median :5.000   Median :3.000  
##  Mean   :3.444   Mean   :3.984   Mean   :4.317   Mean   :3.317  
##  3rd Qu.:5.000   3rd Qu.:5.000   3rd Qu.:5.000   3rd Qu.:5.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
##        L4              L5              L6       
##  Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:3.000   1st Qu.:2.000   1st Qu.:2.000  
##  Median :4.000   Median :4.000   Median :4.000  
##  Mean   :3.857   Mean   :3.429   Mean   :3.571  
##  3rd Qu.:5.000   3rd Qu.:5.000   3rd Qu.:5.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000
# Convert responses to lowercase and trim leading/trailing whitespace
data$AI_tool <- tolower(trimws(data$AI_tool))

# Define custom levels based on categories
custom_levels <- c("chatgpt/openai", "grammarly", "photoshop ai generator", "quillbot", "tutorai", "google bard/gemini", "prowritingaid", "trinka", "consensus", "scite", "microsoft copilot", "cognii", "mathly", "unschooler", "duolingo", "claude.ai", "perplexity.ai", "pi.ai", "kiwi", "ivy.ai", "cramify.ai", "mindgrasp.ai", "teach anything", "soofy.io ai")

# Create a vector to store categorized responses
categorized_responses <- character(length(data$AI_tool))

# Loop through each processed response and categorize them
for (i in seq_along(data$AI_tool)) {
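  # %in% tests whole-string equality, not substring containment: a response is
  # categorized only when it exactly equals one of the listed spellings, and every
  # other response (including multi-tool answers) keeps the default "".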
  if ("chatgpt" %in% data$AI_tool[i] | " chatgpt " %in% data$AI_tool[i] | "openai" %in% data$AI_tool[i] | "chat got " %in% data$AI_tool[i] | "GPT-4" %in% data$AI_tool[i] | "llms - gpt4" %in% data$AI_tool[i]) {
    categorized_responses[i] <- "chatgpt/openai"
  } else if ("grammarly" %in% data$AI_tool[i] | "www.Grammarly.com" %in% data$AI_tool[i]){
    categorized_responses[i] <- "grammarly"
  } else if ("photoshop ai generator" %in% data$AI_tool[i]) {
    categorized_responses[i] <- "photoshop ai generator"
  } else if ("quillbot" %in% data$AI_tool[i] | " quilbot" %in% data$AI_tool[i]) {
    categorized_responses[i] <- "quillbot"
  } else if ("tutorai" %in% data$AI_tool[i] | "tutor ai" %in% data$AI_tool[i]) {
    categorized_responses[i] <- "tutorai"
  } else if ("google bard" %in% data$AI_tool[i] | "google gemini" %in% data$AI_tool[i] | "bard/gemini" %in% data$AI_tool[i]) {
    categorized_responses[i] <- "google bard/gemini"
  } else if ("prowritingaid" %in% data$AI_tool[i]) {
    categorized_responses[i] <- "prowritingaid"
  }else if ("trinka" %in% data$AI_tool[i]) {
    categorized_responses[i] <- "trinka"
  }else if ("consensus" %in% data$AI_tool[i]) {
    categorized_responses[i] <- "consensus"
  }else if ("scite" %in% data$AI_tool[i]) {
    categorized_responses[i] <- "scite"
  }else if ("microsoft copilot" %in% data$AI_tool[i] | "copilot" %in% data$AI_tool[i]) {
    categorized_responses[i] <- "microsoft copilot"
  }else if ("cognii" %in% data$AI_tool[i] | "Cognii.com" %in% data$AI_tool[i] ) {
    categorized_responses[i] <- "cognii"
  }else if ("mathly" %in% data$AI_tool[i]) {
    categorized_responses[i] <- "mathly"
  }else if ("unschooler" %in% data$AI_tool[i]) {
    categorized_responses[i] <- "unschooler"
  }else if ("duolingo" %in% data$AI_tool[i]) {
    categorized_responses[i] <- "duolingo"
  }else if ("claude.ai" %in% data$AI_tool[i]) {
    categorized_responses[i] <- "claude.ai"
  }else if ("perplexity.ai" %in% data$AI_tool[i]) {
    categorized_responses[i] <- "perplexity.ai"
  }else if ("pi.ai" %in% data$AI_tool[i] | "pi" %in% data$AI_tool[i] ) {
    categorized_responses[i] <- "pi.ai"
  }else if ("kiwi" %in% data$AI_tool[i]) {
    categorized_responses[i] <- "kiwi"
  }else if ("ivy.ai" %in% data$AI_tool[i]) {
    categorized_responses[i] <- "ivy.ai"
  }else if ("cramify.ai" %in% data$AI_tool[i]) {
    categorized_responses[i] <- "cramify.ai"
  }else if ("mindgrasp.ai" %in% data$AI_tool[i]) {
    categorized_responses[i] <- "mindgrasp.ai"
  }else if ("teach anything" %in% data$AI_tool[i]) {
    categorized_responses[i] <- "teach anything"
  }else if ("soofy.io ai" %in% data$AI_tool[i] | "soofy.io AI writing tools" %in% data$AI_tool[i] ) {
    categorized_responses[i] <- "soofy.io ai"
  }
}
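
The loop above compares each whole response against fixed spellings. If the intent is to categorize a response by any keyword it contains, substring matching with `grepl(..., fixed = TRUE)` is one option. The sketch below is only an illustration under that assumption: `keyword_map`, `categorize_tool()`, and the sample responses are hypothetical names and values, not part of the original analysis.

```r
# Hypothetical keyword map: category name -> substrings that identify it.
keyword_map <- list(
  "chatgpt/openai" = c("chatgpt", "openai", "chat got", "gpt"),
  "grammarly"      = c("grammarly"),
  "quillbot"       = c("quillbot", "quilbot")
)

# Return the first category whose keyword occurs anywhere in the response,
# or NA when no keyword matches.
categorize_tool <- function(response, map = keyword_map) {
  response <- tolower(trimws(response))
  for (category in names(map)) {
    hits <- vapply(map[[category]], grepl, logical(1), x = response, fixed = TRUE)
    if (any(hits)) return(category)
  }
  NA_character_
}

sample_responses <- c("Photoshop ai generator, ChatGPT ", "Chatgpt, QuilBot",
                      "www.Grammarly.com", "almost never")
categorized <- vapply(sample_responses, categorize_tool, character(1),
                      USE.NAMES = FALSE)
print(categorized)
```

This version keeps only the first matching category, so a respondent who lists several tools is still counted once; recording every match would need a different data layout.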
str(data)
## 'data.frame':    63 obs. of  19 variables:
##  $ Degree : chr  "Undergraduate Degree" "Undergraduate Degree" "Graduate Degree" "Graduate Degree" ...
##  $ AI_use : int  3 3 3 4 5 2 1 1 5 3 ...
##  $ AI_tool: chr  "grammarly" "photoshop ai generator, chatgpt" "chatgpt" "chatgpt, quilbot" ...
##  $ AP1    : int  2 1 2 4 5 1 2 5 1 1 ...
##  $ AP2    : int  1 1 3 1 1 1 5 5 2 2 ...
##  $ AP3    : int  3 3 5 4 4 1 5 5 4 3 ...
##  $ AP4    : int  1 1 1 1 4 1 1 5 2 1 ...
##  $ AP5    : int  4 5 2 3 4 5 5 5 2 5 ...
##  $ AP6    : int  5 5 4 4 5 5 5 5 1 5 ...
##  $ AP7    : int  5 4 5 3 4 3 5 5 1 4 ...
##  $ AP8    : int  3 5 3 2 3 3 5 5 4 3 ...
##  $ AP9    : int  4 2 4 5 5 1 1 5 1 2 ...
##  $ AP10   : int  2 5 4 3 2 5 3 1 5 4 ...
##  $ L1     : int  3 4 2 2 2 1 5 5 4 5 ...
##  $ L2     : int  5 2 4 5 5 5 4 5 5 4 ...
##  $ L3     : int  4 2 4 4 4 3 2 5 5 5 ...
##  $ L4     : int  4 2 3 4 3 1 5 3 4 3 ...
##  $ L5     : int  4 4 3 3 4 5 5 1 1 2 ...
##  $ L6     : int  3 2 4 5 2 1 1 5 3 4 ...
summary(data)
##     Degree              AI_use        AI_tool               AP1       
##  Length:63          Min.   :1.000   Length:63          Min.   :1.000  
##  Class :character   1st Qu.:3.000   Class :character   1st Qu.:1.000  
##  Mode  :character   Median :4.000   Mode  :character   Median :2.000  
##                     Mean   :3.667                      Mean   :2.619  
##                     3rd Qu.:5.000                      3rd Qu.:4.000  
##                     Max.   :5.000                      Max.   :5.000  
##       AP2             AP3             AP4             AP5       
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :2.000  
##  1st Qu.:2.500   1st Qu.:4.000   1st Qu.:1.000   1st Qu.:3.000  
##  Median :4.000   Median :4.000   Median :1.000   Median :4.000  
##  Mean   :3.651   Mean   :4.127   Mean   :2.016   Mean   :4.063  
##  3rd Qu.:5.000   3rd Qu.:5.000   3rd Qu.:3.000   3rd Qu.:5.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
##       AP6             AP7             AP8             AP9       
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:4.000   1st Qu.:3.000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :5.000   Median :4.000   Median :4.000   Median :4.000  
##  Mean   :4.286   Mean   :3.778   Mean   :3.825   Mean   :3.286  
##  3rd Qu.:5.000   3rd Qu.:5.000   3rd Qu.:5.000   3rd Qu.:5.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
##       AP10             L1              L2              L3       
##  Min.   :1.000   Min.   :1.000   Min.   :2.000   Min.   :1.000  
##  1st Qu.:2.000   1st Qu.:3.000   1st Qu.:4.000   1st Qu.:2.000  
##  Median :4.000   Median :4.000   Median :5.000   Median :3.000  
##  Mean   :3.444   Mean   :3.984   Mean   :4.317   Mean   :3.317  
##  3rd Qu.:5.000   3rd Qu.:5.000   3rd Qu.:5.000   3rd Qu.:5.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
##        L4              L5              L6       
##  Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:3.000   1st Qu.:2.000   1st Qu.:2.000  
##  Median :4.000   Median :4.000   Median :4.000  
##  Mean   :3.857   Mean   :3.429   Mean   :3.571  
##  3rd Qu.:5.000   3rd Qu.:5.000   3rd Qu.:5.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000

Data Preprocessing

Factoring Character Variables

data$Degree <- factor(data$Degree, 
                      labels = c("Undergraduate", "Graduate"))
data$AI_tool <- factor(categorized_responses, 
                       labels = custom_levels)
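
In `factor()`, `labels =` renames the alphabetically sorted unique values position by position, whereas `levels =` declares which values exist and in what order. In the output that follows, the first response ("grammarly") is stored as code 9, which is "consensus" in `custom_levels`, so the label assignment may be worth checking. A minimal sketch of the difference, using made-up values rather than the survey data:

```r
responses <- c("grammarly", "chatgpt", "grammarly", "quillbot")
shuffled  <- c("quillbot", "chatgpt", "grammarly")  # not in alphabetical order

# levels = : values keep their own names; only the level order changes.
by_levels <- factor(responses, levels = shuffled)

# labels = : sorted unique values (chatgpt, grammarly, quillbot) are renamed
# to quillbot, chatgpt, grammarly in that order, which changes the data.
by_labels <- factor(responses, labels = shuffled)

print(as.character(by_levels))
print(as.character(by_labels))
```

With `levels =`, any value missing from the vector becomes `NA`, which makes uncategorized responses easy to spot.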

str(data)
## 'data.frame':    63 obs. of  19 variables:
##  $ Degree : Factor w/ 2 levels "Undergraduate",..: 2 2 1 1 2 1 1 2 1 2 ...
##  $ AI_use : int  3 3 3 4 5 2 1 1 5 3 ...
##  $ AI_tool: Factor w/ 24 levels "chatgpt/openai",..: 9 1 2 1 1 2 2 2 1 18 ...
##  $ AP1    : int  2 1 2 4 5 1 2 5 1 1 ...
##  $ AP2    : int  1 1 3 1 1 1 5 5 2 2 ...
##  $ AP3    : int  3 3 5 4 4 1 5 5 4 3 ...
##  $ AP4    : int  1 1 1 1 4 1 1 5 2 1 ...
##  $ AP5    : int  4 5 2 3 4 5 5 5 2 5 ...
##  $ AP6    : int  5 5 4 4 5 5 5 5 1 5 ...
##  $ AP7    : int  5 4 5 3 4 3 5 5 1 4 ...
##  $ AP8    : int  3 5 3 2 3 3 5 5 4 3 ...
##  $ AP9    : int  4 2 4 5 5 1 1 5 1 2 ...
##  $ AP10   : int  2 5 4 3 2 5 3 1 5 4 ...
##  $ L1     : int  3 4 2 2 2 1 5 5 4 5 ...
##  $ L2     : int  5 2 4 5 5 5 4 5 5 4 ...
##  $ L3     : int  4 2 4 4 4 3 2 5 5 5 ...
##  $ L4     : int  4 2 3 4 3 1 5 3 4 3 ...
##  $ L5     : int  4 4 3 3 4 5 5 1 1 2 ...
##  $ L6     : int  3 2 4 5 2 1 1 5 3 4 ...

Data Cleaning

# Missing Data
sum(is.na(data)) > 0 
## [1] FALSE
## No missing data present

# Duplicate Data
nrow(data[duplicated(data), ]) > 0
## [1] FALSE
## No duplicate data present

# Errors
sum(data[, -c(1,3)] < 0, na.rm = TRUE) > 0
## [1] FALSE
## No negative values present

# Outliers
q1 <- quantile(data[, -c(1,3)], 0.25, na.rm = TRUE)
q3 <- quantile(data[, -c(1,3)], 0.75, na.rm = TRUE)
iqr <- q3 - q1
lower_bound <- q1 - 1.5 * iqr
upper_bound <- q3 + 1.5 * iqr
outliers <- data[, -c(1,3)] < lower_bound | data[, -c(1,3)] > upper_bound
sum(outliers, na.rm = TRUE) > 0 
## [1] FALSE
## No outliers present (bounds computed from all numeric items pooled together)
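
The check above computes one pair of IQR fences from all numeric columns pooled together. A per-column variant is sketched below as an alternative, assuming each item should be screened against its own distribution; `flag_outliers()` and the `toy` data frame are hypothetical and use made-up values.

```r
# Toy Likert-style data (hypothetical values, not the survey responses).
toy <- data.frame(Q1 = c(4, 4, 5, 4, 5, 4, 1),
                  Q2 = c(1, 2, 3, 4, 5, 3, 2))

# Flag values outside the 1.5 * IQR fences of their own column.
flag_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- 1.5 * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

# Number of flagged values in each column.
outlier_counts <- vapply(toy, function(col) sum(flag_outliers(col), na.rm = TRUE),
                         integer(1))
print(outlier_counts)
```

On 1 to 5 Likert items a flagged value is still a valid response, so a per-item flag is a prompt for inspection rather than a reason to drop the row.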

Structure of data + New composite variables

data$avrgActive <- as.integer(rowMeans(data[c("AP1", "AP2", "AP3", "AP4", "AP5", "AP6", "AP7", "AP8", "AP10")]))
data$avrgLaziness <- as.integer(rowMeans(data[c("L1", "L2", "L3", "L4", "L5", "L6")]))
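
Note that `as.integer()` discards the fractional part rather than rounding, so a row mean of 2.89 is stored as 2. A small sketch of the difference, using made-up row means rather than the survey data:

```r
item_means <- c(2.89, 3.11, 3.99)    # hypothetical row means

truncated <- as.integer(item_means)  # drops the fractional part
rounded   <- round(item_means)       # nearest whole number

print(truncated)
print(rounded)
```

Keeping the unrounded mean would preserve more of the variation between respondents, at the cost of no longer mapping directly onto the 1 to 5 response labels.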

str(data)
## 'data.frame':    63 obs. of  21 variables:
##  $ Degree      : Factor w/ 2 levels "Undergraduate",..: 2 2 1 1 2 1 1 2 1 2 ...
##  $ AI_use      : int  3 3 3 4 5 2 1 1 5 3 ...
##  $ AI_tool     : Factor w/ 24 levels "chatgpt/openai",..: 9 1 2 1 1 2 2 2 1 18 ...
##  $ AP1         : int  2 1 2 4 5 1 2 5 1 1 ...
##  $ AP2         : int  1 1 3 1 1 1 5 5 2 2 ...
##  $ AP3         : int  3 3 5 4 4 1 5 5 4 3 ...
##  $ AP4         : int  1 1 1 1 4 1 1 5 2 1 ...
##  $ AP5         : int  4 5 2 3 4 5 5 5 2 5 ...
##  $ AP6         : int  5 5 4 4 5 5 5 5 1 5 ...
##  $ AP7         : int  5 4 5 3 4 3 5 5 1 4 ...
##  $ AP8         : int  3 5 3 2 3 3 5 5 4 3 ...
##  $ AP9         : int  4 2 4 5 5 1 1 5 1 2 ...
##  $ AP10        : int  2 5 4 3 2 5 3 1 5 4 ...
##  $ L1          : int  3 4 2 2 2 1 5 5 4 5 ...
##  $ L2          : int  5 2 4 5 5 5 4 5 5 4 ...
##  $ L3          : int  4 2 4 4 4 3 2 5 5 5 ...
##  $ L4          : int  4 2 3 4 3 1 5 3 4 3 ...
##  $ L5          : int  4 4 3 3 4 5 5 1 1 2 ...
##  $ L6          : int  3 2 4 5 2 1 1 5 3 4 ...
##  $ avrgActive  : int  2 3 3 2 3 2 4 4 2 3 ...
##  $ avrgLaziness: int  3 2 3 3 3 2 3 4 3 3 ...
dim(data)
## [1] 63 21
names(data)
##  [1] "Degree"       "AI_use"       "AI_tool"      "AP1"          "AP2"         
##  [6] "AP3"          "AP4"          "AP5"          "AP6"          "AP7"         
## [11] "AP8"          "AP9"          "AP10"         "L1"           "L2"          
## [16] "L3"           "L4"           "L5"           "L6"           "avrgActive"  
## [21] "avrgLaziness"
summary(data)
##            Degree       AI_use                        AI_tool        AP1       
##  Undergraduate:34   Min.   :1.000   chatgpt/openai        :12   Min.   :1.000  
##  Graduate     :29   1st Qu.:3.000   grammarly             :10   1st Qu.:1.000  
##                     Median :4.000   teach anything        : 5   Median :2.000  
##                     Mean   :3.667   consensus             : 3   Mean   :2.619  
##                     3rd Qu.:5.000   photoshop ai generator: 2   3rd Qu.:4.000  
##                     Max.   :5.000   quillbot              : 2   Max.   :5.000  
##                                     (Other)               :29                  
##       AP2             AP3             AP4             AP5       
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :2.000  
##  1st Qu.:2.500   1st Qu.:4.000   1st Qu.:1.000   1st Qu.:3.000  
##  Median :4.000   Median :4.000   Median :1.000   Median :4.000  
##  Mean   :3.651   Mean   :4.127   Mean   :2.016   Mean   :4.063  
##  3rd Qu.:5.000   3rd Qu.:5.000   3rd Qu.:3.000   3rd Qu.:5.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
##                                                                 
##       AP6             AP7             AP8             AP9       
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:4.000   1st Qu.:3.000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :5.000   Median :4.000   Median :4.000   Median :4.000  
##  Mean   :4.286   Mean   :3.778   Mean   :3.825   Mean   :3.286  
##  3rd Qu.:5.000   3rd Qu.:5.000   3rd Qu.:5.000   3rd Qu.:5.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
##                                                                 
##       AP10             L1              L2              L3       
##  Min.   :1.000   Min.   :1.000   Min.   :2.000   Min.   :1.000  
##  1st Qu.:2.000   1st Qu.:3.000   1st Qu.:4.000   1st Qu.:2.000  
##  Median :4.000   Median :4.000   Median :5.000   Median :3.000  
##  Mean   :3.444   Mean   :3.984   Mean   :4.317   Mean   :3.317  
##  3rd Qu.:5.000   3rd Qu.:5.000   3rd Qu.:5.000   3rd Qu.:5.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
##                                                                 
##        L4              L5              L6          avrgActive   
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :2.000  
##  1st Qu.:3.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:3.000  
##  Median :4.000   Median :4.000   Median :4.000   Median :3.000  
##  Mean   :3.857   Mean   :3.429   Mean   :3.571   Mean   :3.111  
##  3rd Qu.:5.000   3rd Qu.:5.000   3rd Qu.:5.000   3rd Qu.:4.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
##                                                                 
##   avrgLaziness  
##  Min.   :2.000  
##  1st Qu.:3.000  
##  Median :3.000  
##  Mean   :3.365  
##  3rd Qu.:4.000  
##  Max.   :5.000  
## 

Exploratory Data Analysis + Data Visualization

# Frequency of each variable - find mean of each question
## Active learning
hist(data$avrgActive, main = "Average Active Learning Frequency Plot", xlab = "Active Learning", ylab = "Frequency")

## Passive learning
hist(data$AP9, main = "Passive Learning Frequency Plot", xlab = "Passive Learning", ylab = "Frequency")

## Laziness
hist(data$avrgLaziness, main = "Average Laziness Frequency Plot", xlab = "Laziness", ylab = "Frequency")

# Pie Chart of which AI tools were used the most
aitoolPlot <- plot_ly(data, labels = ~AI_tool, values = ~AI_use, type = 'pie',width = 800, height = 550)
aitoolPlot <- aitoolPlot %>% 
  layout(title = 'Percentage of AI Tools used',
         xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
         yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
aitoolPlot
## ChatGPT (26.2%), Mindgrasp (8.93%), and Grammarly (8.33%) have the three largest shares.
## Because values = ~AI_use, each share is a tool's summed AI_use score, not its respondent count.



# Bar plot of which AI tools were used by Degree
topAItools <- data %>%
  filter(AI_tool %in% c("chatgpt/openai", "grammarly", "teach anything", "consensus", "duolingo", "claude.ai", "scite", "trinka", "prowritingaid", "microsoft copilot"))
aiToolDegreePlot <- ggplot(topAItools, aes(x = AI_tool, y = AI_use, fill = Degree)) +
            stat_summary(fun = base::mean,
               geom = "bar",
               position = "dodge") +
            theme_classic() +
            coord_cartesian(ylim = c(0.5, 5.5)) +
            labs(x = "AI Tools", 
                 y = "AI Use",
                 title = "Top 10 AI tools used by degree", 
                 fill = "Degree") +
            theme(legend.position = "right", axis.text.x = element_text(angle = 45, vjust = 0.5, hjust=1))
ggplotly(aiToolDegreePlot)

Data Screening: Assumptions

str(data)
## 'data.frame':    63 obs. of  21 variables:
##  $ Degree      : Factor w/ 2 levels "Undergraduate",..: 2 2 1 1 2 1 1 2 1 2 ...
##  $ AI_use      : int  3 3 3 4 5 2 1 1 5 3 ...
##  $ AI_tool     : Factor w/ 24 levels "chatgpt/openai",..: 9 1 2 1 1 2 2 2 1 18 ...
##  $ AP1         : int  2 1 2 4 5 1 2 5 1 1 ...
##  $ AP2         : int  1 1 3 1 1 1 5 5 2 2 ...
##  $ AP3         : int  3 3 5 4 4 1 5 5 4 3 ...
##  $ AP4         : int  1 1 1 1 4 1 1 5 2 1 ...
##  $ AP5         : int  4 5 2 3 4 5 5 5 2 5 ...
##  $ AP6         : int  5 5 4 4 5 5 5 5 1 5 ...
##  $ AP7         : int  5 4 5 3 4 3 5 5 1 4 ...
##  $ AP8         : int  3 5 3 2 3 3 5 5 4 3 ...
##  $ AP9         : int  4 2 4 5 5 1 1 5 1 2 ...
##  $ AP10        : int  2 5 4 3 2 5 3 1 5 4 ...
##  $ L1          : int  3 4 2 2 2 1 5 5 4 5 ...
##  $ L2          : int  5 2 4 5 5 5 4 5 5 4 ...
##  $ L3          : int  4 2 4 4 4 3 2 5 5 5 ...
##  $ L4          : int  4 2 3 4 3 1 5 3 4 3 ...
##  $ L5          : int  4 4 3 3 4 5 5 1 1 2 ...
##  $ L6          : int  3 2 4 5 2 1 1 5 3 4 ...
##  $ avrgActive  : int  2 3 3 2 3 2 4 4 2 3 ...
##  $ avrgLaziness: int  3 2 3 3 3 2 3 4 3 3 ...
# Correlation
cor_matrix <- cor(data[,c(2, 12, 20:21)])
# Column order: 2 = AI_use, 12 = AP9 (passive), 20 = avrgActive, 21 = avrgLaziness
new_names <- c("AI Use", "Passive Learning", "Active Learning", "Laziness")
colnames(cor_matrix) <- new_names
rownames(cor_matrix) <- new_names
corrplot(cor_matrix, tl.col = "black", tl.srt = 15, tl.cex = 0.8, cl.cex = 0.8, number.cex = 0.8)
mtext("Correlation Matrix", side = 2, line = 1, cex = 1.2)
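
Selecting columns by position makes it easy for display labels to drift out of step with the data: in `data`, column 12 is `AP9` and column 20 is `avrgActive`. One way to keep them tied together is a named vector that maps each column name to its label. The sketch below assumes that approach; `toy` and `display_names` are hypothetical, and the values are made up.

```r
# Toy data frame standing in for the survey data (values are hypothetical).
toy <- data.frame(
  AI_use       = c(3, 4, 5, 2, 1, 4),
  AP9          = c(4, 4, 5, 2, 1, 3),
  avrgActive   = c(2, 3, 2, 4, 4, 3),
  avrgLaziness = c(3, 3, 4, 2, 2, 3)
)

# Names are column names; values are the labels shown on the plot.
display_names <- c(AI_use = "AI Use", AP9 = "Passive Learning",
                   avrgActive = "Active Learning", avrgLaziness = "Laziness")

# Select by name, then label rows and columns from the same vector.
cor_toy <- cor(toy[, names(display_names)])
dimnames(cor_toy) <- list(unname(display_names), unname(display_names))
print(round(cor_toy, 2))
```

The resulting matrix can be passed to `corrplot()` in the same way as `cor_matrix`.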

#corrplot(cor(data[,c(2, 12, 20:21)]))

# Normality
shapiro.test(data$AI_use)
## 
##  Shapiro-Wilk normality test
## 
## data:  data$AI_use
## W = 0.85063, p-value = 1.999e-06
shapiro.test(data$avrgActive)
## 
##  Shapiro-Wilk normality test
## 
## data:  data$avrgActive
## W = 0.84677, p-value = 1.531e-06
shapiro.test(data$AP9)
## 
##  Shapiro-Wilk normality test
## 
## data:  data$AP9
## W = 0.78694, p-value = 3.743e-08
shapiro.test(data$avrgLaziness) 
## 
##  Shapiro-Wilk normality test
## 
## data:  data$avrgLaziness
## W = 0.84312, p-value = 1.194e-06
# Homoscedasticity
bartlett.test(data$avrgActive, data$AI_use)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  data$avrgActive and data$AI_use
## Bartlett's K-squared = 8.0748, df = 4, p-value = 0.08887
bartlett.test(data$AP9, data$AI_use)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  data$AP9 and data$AI_use
## Bartlett's K-squared = 0.65198, df = 4, p-value = 0.9571
bartlett.test(data$avrgLaziness, data$AI_use)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  data$avrgLaziness and data$AI_use
## Bartlett's K-squared = 7.1794, df = 4, p-value = 0.1267
# Multicollinearity 
multicollinearModel1 <- lm(AI_use ~ avrgActive + AP9 + avrgLaziness, data = data)
vif(multicollinearModel1)
##   avrgActive          AP9 avrgLaziness 
##     1.578639     1.491100     2.174997
## Low multicollinearity: all VIF values are below 5
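
For a single numeric predictor, the variance inflation factor is 1 / (1 - R^2), where R^2 comes from regressing that predictor on the remaining predictors. The base-R sketch below reproduces that calculation on simulated data as an illustration; `vif_manual()` and the variables `x1`, `x2`, `x3` are hypothetical and unrelated to the survey.

```r
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)   # correlated with x1 by construction
x3 <- rnorm(n)

# VIF for one predictor: regress it on the others and use 1 / (1 - R^2).
vif_manual <- function(target, others) {
  r2 <- summary(lm(target ~ ., data = others))$r.squared
  1 / (1 - r2)
}

vifs <- c(
  x1 = vif_manual(x1, data.frame(x2, x3)),
  x2 = vif_manual(x2, data.frame(x1, x3)),
  x3 = vif_manual(x3, data.frame(x1, x2))
)
print(round(vifs, 3))
```

A VIF of 1 means a predictor is uncorrelated with the others; values grow as the predictor becomes more predictable from the rest.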


model1 <- lm(avrgActive ~ AI_use + Degree + AI_tool, data = data)
model2 <- lm(AP9 ~ AI_use + Degree + AI_tool, data = data)
model3 <- lm(avrgLaziness ~ AI_use + Degree + AI_tool, data = data)

# Independence of Errors:
durbinWatsonTest(model1)
##  lag Autocorrelation D-W Statistic p-value
##    1      -0.1608455      2.267462   0.718
##  Alternative hypothesis: rho != 0
durbinWatsonTest(model2)
##  lag Autocorrelation D-W Statistic p-value
##    1     -0.08433773       2.14565   0.476
##  Alternative hypothesis: rho != 0
durbinWatsonTest(model3)
##  lag Autocorrelation D-W Statistic p-value
##    1      -0.1316651      2.261465   0.772
##  Alternative hypothesis: rho != 0
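
The D-W statistic reported above is the sum of squared successive residual differences divided by the residual sum of squares, which is roughly 2 * (1 - r), where r is the lag-1 residual autocorrelation. Values near 2 indicate little autocorrelation. A base-R sketch on simulated data (the variables `x`, `y`, and `fit` are hypothetical):

```r
set.seed(42)
x <- 1:30
y <- 2 + 0.5 * x + rnorm(30)
fit <- lm(y ~ x)

e <- unname(residuals(fit))
n <- length(e)

dw <- sum(diff(e)^2) / sum(e^2)         # Durbin-Watson statistic
r1 <- sum(e[-1] * e[-n]) / sum(e^2)     # lag-1 residual autocorrelation

print(dw)
print(2 * (1 - r1))                     # approximation to dw
```

The two printed values differ only by an end-point term, (e[1]^2 + e[n]^2) / sum(e^2), which shrinks as the sample grows.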
# Linearity 
ggplot(data, aes(x = fitted(model1), y = residuals(model1))) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(x = "Fitted values", y = "Residuals") +
  ggtitle("Residuals vs. Fitted Plot for Active Learning")

ggplot(data, aes(x = fitted(model2), y = residuals(model2))) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(x = "Fitted values", y = "Residuals") +
  ggtitle("Residuals vs. Fitted Plot for Passive Learning")

ggplot(data, aes(x = fitted(model3), y = residuals(model3))) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(x = "Fitted values", y = "Residuals") +
  ggtitle("Residuals vs. Fitted Plot for Laziness")

# No outliers or influential observations
## Assumption treated as met based on the outlier check in the Data Cleaning step

Subsetting data

agreeActiveSubset <- subset(data[, -c(4:19, 21)], avrgActive > 3)
disagreeActiveSubset <- subset(data[, -c(4:19, 21)], avrgActive < 3)
neutralActiveSubset <- subset(data[, -c(4:19, 21)], avrgActive == 3)
str(neutralActiveSubset)
## 'data.frame':    31 obs. of  4 variables:
##  $ Degree    : Factor w/ 2 levels "Undergraduate",..: 2 1 2 2 1 1 2 1 2 2 ...
##  $ AI_use    : int  3 3 5 3 5 5 4 3 4 3 ...
##  $ AI_tool   : Factor w/ 24 levels "chatgpt/openai",..: 1 2 1 18 23 5 19 13 8 2 ...
##  $ avrgActive: int  3 3 3 3 3 3 3 3 3 3 ...
agreeLazinessSubset <- subset(data[, -c(4:20)], avrgLaziness > 3)
disagreeLazinessSubset <- subset(data[, -c(4:20)], avrgLaziness < 3)
neutralLazinessSubset <- subset(data[, -c(4:20)], avrgLaziness == 3)


agreePassiveSubset <- subset(data[, -c(4:11,13:21)], AP9 > 3)
disagreePassiveSubset <- subset(data[, -c(4:11,13:21)], AP9 < 3)
neutralPassiveSubset <- subset(data[, -c(4:11,13:21)], AP9 == 3)
str(neutralPassiveSubset)
## 'data.frame':    3 obs. of  4 variables:
##  $ Degree : Factor w/ 2 levels "Undergraduate",..: 1 2 2
##  $ AI_use : int  1 1 2
##  $ AI_tool: Factor w/ 24 levels "chatgpt/openai",..: 4 4 23
##  $ AP9    : int  3 3 3

Regression Analysis

#ACTIVE LEARNING
activeLearningModel1 <- lm(avrgActive ~ AI_use, data = agreeActiveSubset)
summary(activeLearningModel1)
## 
## Call:
## lm(formula = avrgActive ~ AI_use, data = agreeActiveSubset)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.3925 -0.3925 -0.1536  0.6075  0.6075 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.79522    0.33821  11.222 1.07e-08 ***
## AI_use       0.11945    0.07665   1.559     0.14    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.45 on 15 degrees of freedom
## Multiple R-squared:  0.1394, Adjusted R-squared:  0.08199 
## F-statistic: 2.429 on 1 and 15 DF,  p-value: 0.14
# AI use is not a significant predictor of active learning among students with avrgActive > 3
## Not significant: F-statistic: 2.429 on 1 and 15 DF,  p-value: 0.14

activeLearningModel3 <- lm(avrgActive ~ AI_use, data = disagreeActiveSubset)
summary(activeLearningModel3)
## Warning in summary.lm(activeLearningModel3): essentially perfect fit: summary
## may be unreliable
## 
## Call:
## lm(formula = avrgActive ~ AI_use, data = disagreeActiveSubset)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -1.593e-15  6.370e-17  9.555e-17  1.592e-16  1.592e-16 
## 
## Coefficients:
##              Estimate Std. Error   t value Pr(>|t|)    
## (Intercept) 2.000e+00  3.561e-16 5.616e+15   <2e-16 ***
## AI_use      3.185e-17  9.877e-17 3.220e-01    0.752    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.59e-16 on 13 degrees of freedom
## Multiple R-squared:  0.561,  Adjusted R-squared:  0.5273 
## F-statistic: 16.61 on 1 and 13 DF,  p-value: 0.001311
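# Every observation in this subset has avrgActive = 2, so the response is constant and the fit is degenerate (see the warning above)
## The AI_use slope is zero to machine precision (p = 0.752); the F-statistic (16.61, p-value: 0.001311) is a floating-point artifact and does not indicate a significant effect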

activeLearningModel2 <- lm(avrgActive ~ AI_use, data = neutralActiveSubset)
summary(activeLearningModel2)
## Warning in summary.lm(activeLearningModel2): essentially perfect fit: summary
## may be unreliable
## 
## Call:
## lm(formula = avrgActive ~ AI_use, data = neutralActiveSubset)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##      0      0      0      0      0 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        3          0     Inf   <2e-16 ***
## AI_use             0          0     NaN      NaN    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0 on 29 degrees of freedom
## Multiple R-squared:    NaN,  Adjusted R-squared:    NaN 
## F-statistic:   NaN on 1 and 29 DF,  p-value: NA
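# Not estimable: avrgActive equals 3 for all 31 observations in this subset, so there is no variance to explain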
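
The "essentially perfect fit" warnings and NaN statistics arise because the subsets are defined by thresholds on the outcome itself, and the outcome is a whole number, so a subset can contain a single outcome value. A minimal sketch of what `lm()` does with a constant response, using made-up values:

```r
x <- c(1, 2, 3, 4, 5)
y <- rep(3, 5)   # constant response, as in a subset where every score equals 3

fit   <- lm(y ~ x)
slope <- unname(coef(fit)["x"])

print(var(y))    # no variance in the response
print(slope)     # slope is zero up to floating-point error
```

With zero residual variance, standard errors, R-squared, and F-statistics are either undefined or dominated by rounding noise, so they should not be read as evidence for or against an effect.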

#PASSIVE LEARNING
passiveLearningModel1 <- lm(AP9 ~ AI_use, data = agreePassiveSubset)
summary(passiveLearningModel1)
## 
## Call:
## lm(formula = AP9 ~ AI_use, data = agreePassiveSubset)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.91484  0.08516  0.08516  0.20839  0.57806 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.29871    0.23804  18.059   <2e-16 ***
## AI_use       0.12323    0.05743   2.146   0.0398 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3936 on 31 degrees of freedom
## Multiple R-squared:  0.1293, Adjusted R-squared:  0.1012 
## F-statistic: 4.604 on 1 and 31 DF,  p-value: 0.03983
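# Among students who agree (AP9 > 3), AI use is a significant positive predictor of passive learning: F-statistic: 4.604 on 1 and 31 DF,  p-value: 0.03983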

passiveLearningModel3 <- lm(AP9 ~ AI_use, data = disagreePassiveSubset)
summary(passiveLearningModel3)
## 
## Call:
## lm(formula = AP9 ~ AI_use, data = disagreePassiveSubset)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.6226 -0.4760 -0.2318  0.4751  0.5728 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  1.13410    0.30458   3.724    0.001 **
## AI_use       0.09770    0.08119   1.203    0.240   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5048 on 25 degrees of freedom
## Multiple R-squared:  0.05476,    Adjusted R-squared:  0.01695 
## F-statistic: 1.448 on 1 and 25 DF,  p-value: 0.2401
#F-statistic: 1.448 on 1 and 25 DF,  p-value: 0.2401

passiveLearningModel2 <- lm(AP9 ~ AI_use, data = neutralPassiveSubset)
summary(passiveLearningModel2)
## Warning in summary.lm(passiveLearningModel2): essentially perfect fit: summary
## may be unreliable
## 
## Call:
## lm(formula = AP9 ~ AI_use, data = neutralPassiveSubset)
## 
## Residuals:
##         26         27         36 
##  3.846e-16 -3.846e-16 -4.930e-32 
## 
## Coefficients:
##               Estimate Std. Error    t value Pr(>|t|)    
## (Intercept)  3.000e+00  9.421e-16  3.185e+15   <2e-16 ***
## AI_use      -3.846e-16  6.661e-16 -5.770e-01    0.667    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.439e-16 on 1 degrees of freedom
## Multiple R-squared:  0.5714, Adjusted R-squared:  0.1429 
## F-statistic: 1.333 on 1 and 1 DF,  p-value: 0.4544
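#F-statistic: 1.333 on 1 and 1 DF,  p-value: 0.4544
# Not interpretable: neutralPassiveSubset has only 3 observations and AP9 is constant at 3, hence the "essentially perfect fit" warning.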

#LAZINESS
lazinessModel1 <- lm(avrgLaziness ~ AI_use, data = agreeLazinessSubset)
summary(lazinessModel1)
## 
## Call:
## lm(formula = avrgLaziness ~ AI_use, data = agreeLazinessSubset)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4308 -0.4308 -0.1077  0.5692  0.5692 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.6231     0.3283   11.04 3.35e-10 ***
## AI_use        0.1615     0.0748    2.16   0.0425 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4356 on 21 degrees of freedom
## Multiple R-squared:  0.1817, Adjusted R-squared:  0.1428 
## F-statistic: 4.664 on 1 and 21 DF,  p-value: 0.04252
#F-statistic: 4.664 on 1 and 21 DF,  p-value: 0.04252

lazinessModel3 <- lm(avrgLaziness ~ AI_use, data = disagreeLazinessSubset)
summary(lazinessModel3)
## Warning in summary.lm(lazinessModel3): essentially perfect fit: summary may be
## unreliable
## 
## Call:
## lm(formula = avrgLaziness ~ AI_use, data = disagreeLazinessSubset)
## 
## Residuals:
##  2  6 14 19 32 48 58 
##  0  0  0  0  0  0  0 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        2          0     Inf   <2e-16 ***
## AI_use             0          0     NaN      NaN    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0 on 5 degrees of freedom
## Multiple R-squared:    NaN,  Adjusted R-squared:    NaN 
## F-statistic:   NaN on 1 and 5 DF,  p-value: NA
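# NA: avrgLaziness is constant (2) in disagreeLazinessSubset, so the AI_use slope, R-squared, and F-statistic cannot be estimated.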

lazinessModel2 <- lm(avrgLaziness ~ AI_use, data = neutralLazinessSubset)
summary(lazinessModel2)
## Warning in summary.lm(lazinessModel2): essentially perfect fit: summary may be
## unreliable
## 
## Call:
## lm(formula = avrgLaziness ~ AI_use, data = neutralLazinessSubset)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -4.356e-16 -2.756e-16 -1.956e-16 -1.156e-16  7.378e-15 
## 
## Coefficients:
##               Estimate Std. Error    t value Pr(>|t|)    
## (Intercept)  3.000e+00  7.028e-16  4.268e+15   <2e-16 ***
## AI_use      -8.000e-17  1.868e-16 -4.280e-01    0.671    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.35e-15 on 31 degrees of freedom
## Multiple R-squared:  0.5133, Adjusted R-squared:  0.4976 
## F-statistic:  32.7 on 1 and 31 DF,  p-value: 2.741e-06
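#F-statistic:  32.7 on 1 and 31 DF,  p-value: 2.741e-06
# Not interpretable: avrgLaziness is constant at 3 in neutralLazinessSubset and the slope (-8e-17) is zero up to floating-point error, so this F-statistic is a numerical artifact (see the "essentially perfect fit" warning).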
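To compare the subset models side by side, the AI_use slope, its standard error, and its p-value can be pulled from each `summary()` into one data frame. The sketch below assumes a named list of fitted models. The helper `slope_table` and the toy data are hypothetical. In this analysis the list would hold only models whose response varies within the subset, such as passiveLearningModel1, passiveLearningModel3, and lazinessModel1.

```r
# Hypothetical helper: one row per model with the estimate for a single term.
slope_table <- function(models, term = "AI_use") {
  rows <- lapply(names(models), function(name) {
    coefs <- summary(models[[name]])$coefficients
    data.frame(
      model     = name,
      estimate  = coefs[term, "Estimate"],
      std_error = coefs[term, "Std. Error"],
      p_value   = coefs[term, "Pr(>|t|)"],
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Toy data (not the survey data), used only to show the shape of the output.
toy <- data.frame(
  AI_use       = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
  AP9          = c(4, 4, 5, 5, 5, 4, 5, 4, 5, 5),
  avrgLaziness = c(3.5, 4.0, 4.0, 4.5, 5.0, 4.0, 3.5, 4.5, 4.0, 4.5)
)

toy_models <- list(
  passive  = lm(AP9 ~ AI_use, data = toy),
  laziness = lm(avrgLaziness ~ AI_use, data = toy)
)

toy_results <- slope_table(toy_models)
toy_results
```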


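#I aim to study whether AI use influences active learning, passive learning tendencies, and laziness. A regression analysis was conducted in R. Among respondents in the agree subsets, AI use was positively associated with passive learning (AP9: b = 0.123, p = 0.040) and with laziness (avrgLaziness: b = 0.162, p = 0.043). The association with AP9 was not significant in the disagree subset (b = 0.098, p = 0.240).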
