Capstone Analysis

Author

Tia Kelly

#Introduction

An instructional designer created a personalized learning platform for K-12 students. The platform uses behavioral data, especially time on task, to adapt content in real time.

The instructional designer and teachers will benefit from understanding how time spent on the platform relates to student performance.

#Data Overview

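#| label: first-look
library(tidyverse)  # read_csv(), glimpse(), dplyr verbs, ggplot2
library(janitor)    # clean_names()
library(skimr)      # skim()
data <- read_csv("data/sci-online-classes.csv")

glimpse(data)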
Rows: 603
Columns: 30
$ student_id            <dbl> 43146, 44638, 47448, 47979, 48797, 51943, 52326,…
$ course_id             <chr> "FrScA-S216-02", "OcnA-S116-01", "FrScA-S216-01"…
$ total_points_possible <dbl> 3280, 3531, 2870, 4562, 2207, 4208, 4325, 2086, …
$ total_points_earned   <dbl> 2220, 2672, 1897, 3090, 1910, 3596, 2255, 1719, …
$ percentage_earned     <dbl> 0.6768293, 0.7567261, 0.6609756, 0.6773345, 0.86…
$ subject               <chr> "FrScA", "OcnA", "FrScA", "OcnA", "PhysA", "FrSc…
$ semester              <chr> "S216", "S116", "S216", "S216", "S116", "S216", …
$ section               <chr> "02", "01", "01", "01", "01", "03", "01", "01", …
$ Gradebook_Item        <chr> "POINTS EARNED & TOTAL COURSE POINTS", "ATTEMPTE…
$ Grade_Category        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ FinalGradeCEMS        <dbl> 93.45372, 81.70184, 88.48758, 81.85260, 84.00000…
$ Points_Possible       <dbl> 5, 10, 10, 5, 438, 5, 10, 10, 443, 5, 12, 10, 5,…
$ Points_Earned         <dbl> NA, 10.00, NA, 4.00, 399.00, NA, NA, 10.00, 425.…
$ Gender                <chr> "M", "F", "M", "M", "F", "F", "M", "F", "F", "M"…
$ q1                    <dbl> 5, 4, 5, 5, 4, NA, 5, 3, 4, NA, NA, 4, 3, 5, NA,…
$ q2                    <dbl> 4, 4, 4, 5, 3, NA, 5, 3, 3, NA, NA, 5, 3, 3, NA,…
$ q3                    <dbl> 4, 3, 4, 3, 3, NA, 3, 3, 3, NA, NA, 3, 3, 5, NA,…
$ q4                    <dbl> 5, 4, 5, 5, 4, NA, 5, 3, 4, NA, NA, 5, 3, 5, NA,…
$ q5                    <dbl> 5, 4, 5, 5, 4, NA, 5, 3, 4, NA, NA, 5, 4, 5, NA,…
$ q6                    <dbl> 5, 4, 4, 5, 4, NA, 5, 4, 3, NA, NA, 5, 3, 5, NA,…
$ q7                    <dbl> 5, 4, 4, 4, 4, NA, 4, 3, 3, NA, NA, 5, 3, 5, NA,…
$ q8                    <dbl> 5, 5, 5, 5, 4, NA, 5, 3, 4, NA, NA, 4, 3, 5, NA,…
$ q9                    <dbl> 4, 4, 3, 5, NA, NA, 5, 3, 2, NA, NA, 5, 2, 2, NA…
$ q10                   <dbl> 5, 4, 5, 5, 3, NA, 5, 3, 5, NA, NA, 4, 4, 5, NA,…
$ TimeSpent             <dbl> 1555.1667, 1382.7001, 860.4335, 1598.6166, 1481.…
$ TimeSpent_hours       <dbl> 25.91944500, 23.04500167, 14.34055833, 26.643610…
$ TimeSpent_std         <dbl> -0.18051496, -0.30780313, -0.69325954, -0.148446…
$ int                   <dbl> 5.0, 4.2, 5.0, 5.0, 3.8, 4.6, 5.0, 3.0, 4.2, NA,…
$ pc                    <dbl> 4.50, 3.50, 4.00, 3.50, 3.50, 4.00, 3.50, 3.00, …
$ uv                    <dbl> 4.333333, 4.000000, 3.666667, 5.000000, 3.500000…
str(data) 
spc_tbl_ [603 × 30] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ student_id           : num [1:603] 43146 44638 47448 47979 48797 ...
 $ course_id            : chr [1:603] "FrScA-S216-02" "OcnA-S116-01" "FrScA-S216-01" "OcnA-S216-01" ...
 $ total_points_possible: num [1:603] 3280 3531 2870 4562 2207 ...
 $ total_points_earned  : num [1:603] 2220 2672 1897 3090 1910 ...
 $ percentage_earned    : num [1:603] 0.677 0.757 0.661 0.677 0.865 ...
 $ subject              : chr [1:603] "FrScA" "OcnA" "FrScA" "OcnA" ...
 $ semester             : chr [1:603] "S216" "S116" "S216" "S216" ...
 $ section              : chr [1:603] "02" "01" "01" "01" ...
 $ Gradebook_Item       : chr [1:603] "POINTS EARNED & TOTAL COURSE POINTS" "ATTEMPTED" "POINTS EARNED & TOTAL COURSE POINTS" "POINTS EARNED & TOTAL COURSE POINTS" ...
 $ Grade_Category       : logi [1:603] NA NA NA NA NA NA ...
 $ FinalGradeCEMS       : num [1:603] 93.5 81.7 88.5 81.9 84 ...
 $ Points_Possible      : num [1:603] 5 10 10 5 438 5 10 10 443 5 ...
 $ Points_Earned        : num [1:603] NA 10 NA 4 399 NA NA 10 425 2.5 ...
 $ Gender               : chr [1:603] "M" "F" "M" "M" ...
 $ q1                   : num [1:603] 5 4 5 5 4 NA 5 3 4 NA ...
 $ q2                   : num [1:603] 4 4 4 5 3 NA 5 3 3 NA ...
 $ q3                   : num [1:603] 4 3 4 3 3 NA 3 3 3 NA ...
 $ q4                   : num [1:603] 5 4 5 5 4 NA 5 3 4 NA ...
 $ q5                   : num [1:603] 5 4 5 5 4 NA 5 3 4 NA ...
 $ q6                   : num [1:603] 5 4 4 5 4 NA 5 4 3 NA ...
 $ q7                   : num [1:603] 5 4 4 4 4 NA 4 3 3 NA ...
 $ q8                   : num [1:603] 5 5 5 5 4 NA 5 3 4 NA ...
 $ q9                   : num [1:603] 4 4 3 5 NA NA 5 3 2 NA ...
 $ q10                  : num [1:603] 5 4 5 5 3 NA 5 3 5 NA ...
 $ TimeSpent            : num [1:603] 1555 1383 860 1599 1482 ...
 $ TimeSpent_hours      : num [1:603] 25.9 23 14.3 26.6 24.7 ...
 $ TimeSpent_std        : num [1:603] -0.181 -0.308 -0.693 -0.148 -0.235 ...
 $ int                  : num [1:603] 5 4.2 5 5 3.8 4.6 5 3 4.2 NA ...
 $ pc                   : num [1:603] 4.5 3.5 4 3.5 3.5 4 3.5 3 3 NA ...
 $ uv                   : num [1:603] 4.33 4 3.67 5 3.5 ...
 - attr(*, "spec")=
  .. cols(
  ..   student_id = col_double(),
  ..   course_id = col_character(),
  ..   total_points_possible = col_double(),
  ..   total_points_earned = col_double(),
  ..   percentage_earned = col_double(),
  ..   subject = col_character(),
  ..   semester = col_character(),
  ..   section = col_character(),
  ..   Gradebook_Item = col_character(),
  ..   Grade_Category = col_logical(),
  ..   FinalGradeCEMS = col_double(),
  ..   Points_Possible = col_double(),
  ..   Points_Earned = col_double(),
  ..   Gender = col_character(),
  ..   q1 = col_double(),
  ..   q2 = col_double(),
  ..   q3 = col_double(),
  ..   q4 = col_double(),
  ..   q5 = col_double(),
  ..   q6 = col_double(),
  ..   q7 = col_double(),
  ..   q8 = col_double(),
  ..   q9 = col_double(),
  ..   q10 = col_double(),
  ..   TimeSpent = col_double(),
  ..   TimeSpent_hours = col_double(),
  ..   TimeSpent_std = col_double(),
  ..   int = col_double(),
  ..   pc = col_double(),
  ..   uv = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 

Variable Descriptions

- ‘student_id’: Unique student identifier
- ‘subject’: Course subject
- ‘semester’: Semester in which the course was taken
- ‘section’: Course section identifier
- ‘FinalGradeCEMS’: Final course grade (0-100)
- ‘TimeSpent_hours’: Total hours spent on the LMS
- ‘percentage_earned’: Proportion of total course points earned, stored from 0 to 1 (e.g., 0.68 means 68%)
- ‘Gender’: Student gender (“M” or “F”)
- ‘total_points_possible’: Total points available in the course
- ‘Grade_Category’: Pass/fail category (missing in all 603 rows of this file)
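Some of these variables appear to be derived from others. In the first rows printed by glimpse(), percentage_earned matches total_points_earned / total_points_possible, and TimeSpent_hours matches TimeSpent / 60, which suggests (but does not confirm) that TimeSpent is recorded in minutes. The base R sketch below checks both relationships on the first three rows copied from that output. check_derived() is a hypothetical helper, and it assumes the raw column names used before clean_names().

```r
# Hypothetical helper: checks two relationships observed in the glimpse() output.
# Assumes the raw CSV column names (before clean_names()).
check_derived <- function(df, tol = 1e-6) {
  # Compare two numeric vectors, ignoring positions where either is NA
  close_enough <- function(a, b) {
    keep <- !is.na(a) & !is.na(b)
    isTRUE(all.equal(a[keep], b[keep], tolerance = tol))
  }
  c(
    percentage = close_enough(df$percentage_earned,
                              df$total_points_earned / df$total_points_possible),
    # Assumption: TimeSpent is in minutes, so hours = minutes / 60
    hours      = close_enough(df$TimeSpent_hours, df$TimeSpent / 60)
  )
}

# First three rows as printed by glimpse(data)
first_rows <- data.frame(
  total_points_possible = c(3280, 3531, 2870),
  total_points_earned   = c(2220, 2672, 1897),
  percentage_earned     = c(0.6768293, 0.7567261, 0.6609756),
  TimeSpent             = c(1555.1667, 1382.7001, 860.4335),
  TimeSpent_hours       = c(25.91944500, 23.04500167, 14.34055833)
)

check_derived(first_rows)  # both TRUE
```

Whether the pattern holds for every row remains an assumption until check_derived(data) is run on the full file.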

#| label: naive-omit
# na.omit() drops any row containing at least one NA
data_naive <- na.omit(data)
nrow(data) 
[1] 603
nrow(data_naive)
[1] 0
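na.omit() keeps only rows that have no missing value in any column. Because Grade_Category is NA in all 603 rows (the summary() output below reports NA's: 603), every row is dropped. A minimal base R sketch of the diagnosis on a small made-up data frame (the column names echo the real ones, the values are invented): find the columns that are entirely NA, exclude them, and only then count complete rows.

```r
# Toy data frame: values are invented for illustration only
toy <- data.frame(
  student_id     = c(1, 2, 3, 4),
  final_grade    = c(93.5, NA, 88.5, 81.9),
  grade_category = c(NA, NA, NA, NA)   # entirely missing, like Grade_Category
)

# Names of columns in which every value is NA
all_na_cols <- names(toy)[colSums(is.na(toy)) == nrow(toy)]
all_na_cols                                               # "grade_category"

nrow(na.omit(toy))                                        # 0: every row has an NA
nrow(na.omit(toy[, setdiff(names(toy), all_na_cols)]))    # 3: only the missing-grade row is lost
```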
summary(data)
   student_id     course_id         total_points_possible total_points_earned
 Min.   :43146   Length:603         Min.   :  840         Min.   :  651      
 1st Qu.:85612   Class :character   1st Qu.: 2810         1st Qu.: 2050      
 Median :88340   Mode  :character   Median : 3583         Median : 2757      
 Mean   :86070                      Mean   : 4274         Mean   : 3245      
 3rd Qu.:92730                      3rd Qu.: 5069         3rd Qu.: 3875      
 Max.   :97441                      Max.   :15552         Max.   :12208      
                                                                             
 percentage_earned   subject            semester           section         
 Min.   :0.3384    Length:603         Length:603         Length:603        
 1st Qu.:0.7047    Class :character   Class :character   Class :character  
 Median :0.7770    Mode  :character   Mode  :character   Mode  :character  
 Mean   :0.7577                                                            
 3rd Qu.:0.8262                                                            
 Max.   :0.9106                                                            
                                                                           
 Gradebook_Item     Grade_Category FinalGradeCEMS   Points_Possible 
 Length:603         Mode:logical   Min.   :  0.00   Min.   :  5.00  
 Class :character   NA's:603       1st Qu.: 71.25   1st Qu.: 10.00  
 Mode  :character                  Median : 84.57   Median : 10.00  
                                   Mean   : 77.20   Mean   : 76.87  
                                   3rd Qu.: 92.10   3rd Qu.: 30.00  
                                   Max.   :100.00   Max.   :935.00  
                                   NA's   :30                       
 Points_Earned       Gender                q1              q2       
 Min.   :  0.00   Length:603         Min.   :1.000   Min.   :1.000  
 1st Qu.:  7.00   Class :character   1st Qu.:4.000   1st Qu.:3.000  
 Median : 10.00   Mode  :character   Median :4.000   Median :4.000  
 Mean   : 68.63                      Mean   :4.296   Mean   :3.629  
 3rd Qu.: 26.12                      3rd Qu.:5.000   3rd Qu.:4.000  
 Max.   :828.20                      Max.   :5.000   Max.   :5.000  
 NA's   :92                          NA's   :123     NA's   :126    
       q3              q4              q5              q6       
 Min.   :1.000   Min.   :1.000   Min.   :2.000   Min.   :1.000  
 1st Qu.:3.000   1st Qu.:4.000   1st Qu.:4.000   1st Qu.:4.000  
 Median :3.000   Median :4.000   Median :4.000   Median :4.000  
 Mean   :3.327   Mean   :4.268   Mean   :4.191   Mean   :4.008  
 3rd Qu.:4.000   3rd Qu.:5.000   3rd Qu.:5.000   3rd Qu.:5.000  
 Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
 NA's   :123     NA's   :125     NA's   :127     NA's   :127    
       q7              q8              q9             q10       
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
 1st Qu.:3.000   1st Qu.:4.000   1st Qu.:3.000   1st Qu.:4.000  
 Median :4.000   Median :4.000   Median :4.000   Median :4.000  
 Mean   :3.907   Mean   :4.289   Mean   :3.487   Mean   :4.101  
 3rd Qu.:4.750   3rd Qu.:5.000   3rd Qu.:4.000   3rd Qu.:5.000  
 Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
 NA's   :129     NA's   :129     NA's   :129     NA's   :129    
   TimeSpent       TimeSpent_hours    TimeSpent_std          int       
 Min.   :   0.45   Min.   :  0.0075   Min.   :-1.3280   Min.   :2.000  
 1st Qu.: 851.90   1st Qu.: 14.1983   1st Qu.:-0.6996   1st Qu.:3.900  
 Median :1550.91   Median : 25.8485   Median :-0.1837   Median :4.200  
 Mean   :1799.75   Mean   : 29.9959   Mean   : 0.0000   Mean   :4.219  
 3rd Qu.:2426.09   3rd Qu.: 40.4348   3rd Qu.: 0.4623   3rd Qu.:4.700  
 Max.   :8870.88   Max.   :147.8481   Max.   : 5.2188   Max.   :5.000  
 NA's   :5         NA's   :5          NA's   :5         NA's   :76     
       pc              uv       
 Min.   :1.500   Min.   :1.000  
 1st Qu.:3.000   1st Qu.:3.333  
 Median :3.500   Median :3.667  
 Mean   :3.608   Mean   :3.719  
 3rd Qu.:4.000   3rd Qu.:4.167  
 Max.   :5.000   Max.   :5.000  
 NA's   :75      NA's   :75     
skim(data)
| Data summary | |
|---|---|
| Name | data |
| Number of rows | 603 |
| Number of columns | 30 |
| Column type frequency: character | 6 |
| Column type frequency: logical | 1 |
| Column type frequency: numeric | 23 |
| Group variables | None |

Variable type: character

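| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| course_id | 0 | 1 | 12 | 13 | 0 | 26 | 0 |
| subject | 0 | 1 | 4 | 5 | 0 | 5 | 0 |
| semester | 0 | 1 | 4 | 4 | 0 | 3 | 0 |
| section | 0 | 1 | 2 | 2 | 0 | 4 | 0 |
| Gradebook_Item | 0 | 1 | 9 | 35 | 0 | 3 | 0 |
| Gender | 0 | 1 | 1 | 1 | 0 | 2 | 0 |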

Variable type: logical

| skim_variable | n_missing | complete_rate | mean | count |
|---|---|---|---|---|
| Grade_Category | 603 | 0 | NaN | : |

Variable type: numeric

| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| student_id | 0 | 1.00 | 86069.54 | 10548.60 | 43146.00 | 85612.50 | 88340.00 | 92730.50 | 97441.00 | ▁▁▁▃▇ |
| total_points_possible | 0 | 1.00 | 4274.41 | 2312.74 | 840.00 | 2809.50 | 3583.00 | 5069.00 | 15552.00 | ▇▅▂▁▁ |
| total_points_earned | 0 | 1.00 | 3244.69 | 1832.00 | 651.00 | 2050.50 | 2757.00 | 3875.00 | 12208.00 | ▇▅▁▁▁ |
| percentage_earned | 0 | 1.00 | 0.76 | 0.09 | 0.34 | 0.70 | 0.78 | 0.83 | 0.91 | ▁▁▃▇▇ |
| FinalGradeCEMS | 30 | 0.95 | 77.20 | 22.23 | 0.00 | 71.25 | 84.57 | 92.10 | 100.00 | ▁▁▁▃▇ |
| Points_Possible | 0 | 1.00 | 76.87 | 167.51 | 5.00 | 10.00 | 10.00 | 30.00 | 935.00 | ▇▁▁▁▁ |
| Points_Earned | 92 | 0.85 | 68.63 | 145.26 | 0.00 | 7.00 | 10.00 | 26.12 | 828.20 | ▇▁▁▁▁ |
| q1 | 123 | 0.80 | 4.30 | 0.68 | 1.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▁▂▇▇ |
| q2 | 126 | 0.79 | 3.63 | 0.93 | 1.00 | 3.00 | 4.00 | 4.00 | 5.00 | ▁▂▆▇▃ |
| q3 | 123 | 0.80 | 3.33 | 0.91 | 1.00 | 3.00 | 3.00 | 4.00 | 5.00 | ▁▃▇▅▂ |
| q4 | 125 | 0.79 | 4.27 | 0.85 | 1.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▁▂▇▇ |
| q5 | 127 | 0.79 | 4.19 | 0.68 | 2.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▂▁▇▅ |
| q6 | 127 | 0.79 | 4.01 | 0.80 | 1.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▁▃▇▅ |
| q7 | 129 | 0.79 | 3.91 | 0.82 | 1.00 | 3.00 | 4.00 | 4.75 | 5.00 | ▁▁▅▇▅ |
| q8 | 129 | 0.79 | 4.29 | 0.68 | 1.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▁▂▇▆ |
| q9 | 129 | 0.79 | 3.49 | 0.98 | 1.00 | 3.00 | 4.00 | 4.00 | 5.00 | ▁▃▇▇▃ |
| q10 | 129 | 0.79 | 4.10 | 0.93 | 1.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▂▃▇▇ |
| TimeSpent | 5 | 0.99 | 1799.75 | 1354.93 | 0.45 | 851.90 | 1550.91 | 2426.09 | 8870.88 | ▇▅▁▁▁ |
| TimeSpent_hours | 5 | 0.99 | 30.00 | 22.58 | 0.01 | 14.20 | 25.85 | 40.43 | 147.85 | ▇▅▁▁▁ |
| TimeSpent_std | 5 | 0.99 | 0.00 | 1.00 | -1.33 | -0.70 | -0.18 | 0.46 | 5.22 | ▇▅▁▁▁ |
| int | 76 | 0.87 | 4.22 | 0.59 | 2.00 | 3.90 | 4.20 | 4.70 | 5.00 | ▁▁▃▇▇ |
| pc | 75 | 0.88 | 3.61 | 0.64 | 1.50 | 3.00 | 3.50 | 4.00 | 5.00 | ▁▁▇▅▂ |
| uv | 75 | 0.88 | 3.72 | 0.70 | 1.00 | 3.33 | 3.67 | 4.17 | 5.00 | ▁▁▆▇▅ |
# Keep only students with a recorded final grade
data_clean <- data |>
  filter(!is.na(FinalGradeCEMS))

nrow(data_clean)
[1] 573
sum(is.na(data_clean$FinalGradeCEMS))
[1] 0
head(data_clean)
data_clean <- data_clean |> clean_names()

glimpse(data_clean)
Rows: 573
Columns: 30
$ student_id            <dbl> 43146, 44638, 47448, 47979, 48797, 52326, 52446,…
$ course_id             <chr> "FrScA-S216-02", "OcnA-S116-01", "FrScA-S216-01"…
$ total_points_possible <dbl> 3280, 3531, 2870, 4562, 2207, 4325, 2086, 4655, …
$ total_points_earned   <dbl> 2220, 2672, 1897, 3090, 1910, 2255, 1719, 3149, …
$ percentage_earned     <dbl> 0.6768293, 0.7567261, 0.6609756, 0.6773345, 0.86…
$ subject               <chr> "FrScA", "OcnA", "FrScA", "OcnA", "PhysA", "AnPh…
$ semester              <chr> "S216", "S116", "S216", "S216", "S116", "S216", …
$ section               <chr> "02", "01", "01", "01", "01", "01", "01", "01", …
$ gradebook_item        <chr> "POINTS EARNED & TOTAL COURSE POINTS", "ATTEMPTE…
$ grade_category        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ final_grade_cems      <dbl> 93.45372, 81.70184, 88.48758, 81.85260, 84.00000…
$ points_possible       <dbl> 5, 10, 10, 5, 438, 10, 10, 443, 12, 10, 5, 10, 2…
$ points_earned         <dbl> NA, 10.00, NA, 4.00, 399.00, NA, 10.00, 425.00, …
$ gender                <chr> "M", "F", "M", "M", "F", "M", "F", "F", "M", "M"…
$ q1                    <dbl> 5, 4, 5, 5, 4, 5, 3, 4, NA, 4, 3, 5, NA, 4, 4, N…
$ q2                    <dbl> 4, 4, 4, 5, 3, 5, 3, 3, NA, 5, 3, 3, NA, 2, 4, N…
$ q3                    <dbl> 4, 3, 4, 3, 3, 3, 3, 3, NA, 3, 3, 5, NA, 2, 3, N…
$ q4                    <dbl> 5, 4, 5, 5, 4, 5, 3, 4, NA, 5, 3, 5, NA, 4, 5, N…
$ q5                    <dbl> 5, 4, 5, 5, 4, 5, 3, 4, NA, 5, 4, 5, NA, 4, 4, N…
$ q6                    <dbl> 5, 4, 4, 5, 4, 5, 4, 3, NA, 5, 3, 5, NA, 4, 4, N…
$ q7                    <dbl> 5, 4, 4, 4, 4, 4, 3, 3, NA, 5, 3, 5, NA, 4, 5, N…
$ q8                    <dbl> 5, 5, 5, 5, 4, 5, 3, 4, NA, 4, 3, 5, NA, 4, 4, N…
$ q9                    <dbl> 4, 4, 3, 5, NA, 5, 3, 2, NA, 5, 2, 2, NA, 2, 4, …
$ q10                   <dbl> 5, 4, 5, 5, 3, 5, 3, 5, NA, 4, 4, 5, NA, 4, 4, N…
$ time_spent            <dbl> 1555.1667, 1382.7001, 860.4335, 1598.6166, 1481.…
$ time_spent_hours      <dbl> 25.919445, 23.045002, 14.340558, 26.643610, 24.6…
$ time_spent_std        <dbl> -0.18051496, -0.30780313, -0.69325954, -0.148446…
$ int                   <dbl> 5.0, 4.2, 5.0, 5.0, 3.8, 5.0, 3.0, 4.2, NA, 4.4,…
$ pc                    <dbl> 4.50, 3.50, 4.00, 3.50, 3.50, 3.50, 3.00, 3.00, …
$ uv                    <dbl> 4.333333, 4.000000, 3.666667, 5.000000, 3.500000…
names(data_clean)
 [1] "student_id"            "course_id"             "total_points_possible"
 [4] "total_points_earned"   "percentage_earned"     "subject"              
 [7] "semester"              "section"               "gradebook_item"       
[10] "grade_category"        "final_grade_cems"      "points_possible"      
[13] "points_earned"         "gender"                "q1"                   
[16] "q2"                    "q3"                    "q4"                   
[19] "q5"                    "q6"                    "q7"                   
[22] "q8"                    "q9"                    "q10"                  
[25] "time_spent"            "time_spent_hours"      "time_spent_std"       
[28] "int"                   "pc"                    "uv"                   
# grade_category is NA in every row, so it carries no information
data_clean <- data_clean |>
  select(-grade_category)

glimpse(data_clean)
Rows: 573
Columns: 29
$ student_id            <dbl> 43146, 44638, 47448, 47979, 48797, 52326, 52446,…
$ course_id             <chr> "FrScA-S216-02", "OcnA-S116-01", "FrScA-S216-01"…
$ total_points_possible <dbl> 3280, 3531, 2870, 4562, 2207, 4325, 2086, 4655, …
$ total_points_earned   <dbl> 2220, 2672, 1897, 3090, 1910, 2255, 1719, 3149, …
$ percentage_earned     <dbl> 0.6768293, 0.7567261, 0.6609756, 0.6773345, 0.86…
$ subject               <chr> "FrScA", "OcnA", "FrScA", "OcnA", "PhysA", "AnPh…
$ semester              <chr> "S216", "S116", "S216", "S216", "S116", "S216", …
$ section               <chr> "02", "01", "01", "01", "01", "01", "01", "01", …
$ gradebook_item        <chr> "POINTS EARNED & TOTAL COURSE POINTS", "ATTEMPTE…
$ final_grade_cems      <dbl> 93.45372, 81.70184, 88.48758, 81.85260, 84.00000…
$ points_possible       <dbl> 5, 10, 10, 5, 438, 10, 10, 443, 12, 10, 5, 10, 2…
$ points_earned         <dbl> NA, 10.00, NA, 4.00, 399.00, NA, 10.00, 425.00, …
$ gender                <chr> "M", "F", "M", "M", "F", "M", "F", "F", "M", "M"…
$ q1                    <dbl> 5, 4, 5, 5, 4, 5, 3, 4, NA, 4, 3, 5, NA, 4, 4, N…
$ q2                    <dbl> 4, 4, 4, 5, 3, 5, 3, 3, NA, 5, 3, 3, NA, 2, 4, N…
$ q3                    <dbl> 4, 3, 4, 3, 3, 3, 3, 3, NA, 3, 3, 5, NA, 2, 3, N…
$ q4                    <dbl> 5, 4, 5, 5, 4, 5, 3, 4, NA, 5, 3, 5, NA, 4, 5, N…
$ q5                    <dbl> 5, 4, 5, 5, 4, 5, 3, 4, NA, 5, 4, 5, NA, 4, 4, N…
$ q6                    <dbl> 5, 4, 4, 5, 4, 5, 4, 3, NA, 5, 3, 5, NA, 4, 4, N…
$ q7                    <dbl> 5, 4, 4, 4, 4, 4, 3, 3, NA, 5, 3, 5, NA, 4, 5, N…
$ q8                    <dbl> 5, 5, 5, 5, 4, 5, 3, 4, NA, 4, 3, 5, NA, 4, 4, N…
$ q9                    <dbl> 4, 4, 3, 5, NA, 5, 3, 2, NA, 5, 2, 2, NA, 2, 4, …
$ q10                   <dbl> 5, 4, 5, 5, 3, 5, 3, 5, NA, 4, 4, 5, NA, 4, 4, N…
$ time_spent            <dbl> 1555.1667, 1382.7001, 860.4335, 1598.6166, 1481.…
$ time_spent_hours      <dbl> 25.919445, 23.045002, 14.340558, 26.643610, 24.6…
$ time_spent_std        <dbl> -0.18051496, -0.30780313, -0.69325954, -0.148446…
$ int                   <dbl> 5.0, 4.2, 5.0, 5.0, 3.8, 5.0, 3.0, 4.2, NA, 4.4,…
$ pc                    <dbl> 4.50, 3.50, 4.00, 3.50, 3.50, 3.50, 3.00, 3.00, …
$ uv                    <dbl> 4.333333, 4.000000, 3.666667, 5.000000, 3.500000…
# Keep only the identifiers and time-on-task variables
selected_data <- data_clean |>
  select(student_id, subject, semester, time_spent_hours, time_spent_std)

head(selected_data)

Data Quality Issues

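- Missing values: Checked every column for NA entries. Removed the 30 rows with no final grade (603 to 573 rows) and the grade_category column, which is NA in every row. Other gaps remain: in the raw data, time spent is missing for 5 students and each of the q1-q10 survey items is missing for roughly 120-130 students.
- Outliers: Looked for unusually high or low values. For example, TimeSpent_hours ranges from about 0.01 to 147.85 hours, and at least one final grade is 0. No rows were removed as outliers.
- Irrelevant data: Dropped the all-NA grade_category column and created selected_data with only the variables needed for the time-on-task analysis (student_id, subject, semester, time_spent_hours, time_spent_std).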
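The outlier check above does not name a rule. One common convention, assumed here rather than taken from the original analysis, is Tukey's fences: flag values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR. With the TimeSpent_hours quartiles from summary(data) (14.20 and 40.43 hours, so IQR ≈ 26.24), the upper fence is about 79.8 hours, so the 147.85-hour maximum would be flagged, while the lower fence is negative, so no low value would be. flag_iqr_outliers() is a hypothetical helper; it uses quantile()'s default type 7, the same quartile definition summary() uses.

```r
# Hypothetical helper: flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].
# k = 1.5 is a common default, not a value taken from the original analysis.
flag_iqr_outliers <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr   <- q[2] - q[1]
  lower <- q[1] - k * iqr
  upper <- q[2] + k * iqr
  !is.na(x) & (x < lower | x > upper)   # NA values are never flagged
}

flag_iqr_outliers(c(1, 2, 3, 4, 5, 100, NA))   # only 100 is flagged

# Usage on the cleaned data (not run here):
# data_clean[flag_iqr_outliers(data_clean$time_spent_hours),
#            c("student_id", "time_spent_hours")]
```

Flagging rather than deleting leaves the decision to keep or drop each extreme value with the analyst.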

#Analysis

#First Visualization

#| label: scatter-plot

ggplot(data_clean, aes(x = time_spent_hours, y = final_grade_cems)) +
  geom_point(color = "#378ADD", size = 2, alpha = 0.6) +
  geom_smooth(method = "lm", color = "#0F6E56", se = TRUE) +
  labs(
    title = "Time Spent vs. Final Grade",
    x = "Time Spent on LMS (hours)",
    y = "Final Grade"
  ) +
  theme_minimal()

The scatter plot shows each student's final grade against total hours spent on the LMS, with a fitted linear regression line and its 95% confidence band.
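The scatter plot and its fitted line show the relationship visually; a correlation coefficient and a slope would put a number on it. A minimal base R sketch, assuming the column names in data_clean: cor.test() gives the Pearson correlation and its p-value, and lm() fits the same straight line that geom_smooth(method = "lm") draws. Rows missing either variable are dropped first, since time_spent_hours has a few NAs. summarize_association() is a hypothetical wrapper, and the usage line is commented out because the data file is not bundled with the example.

```r
# Hypothetical wrapper: Pearson correlation plus the OLS slope from lm()
summarize_association <- function(df, x, y) {
  # Drop rows where either variable is missing
  d <- df[!is.na(df[[x]]) & !is.na(df[[y]]), c(x, y)]
  ct  <- cor.test(d[[x]], d[[y]], method = "pearson")
  fit <- lm(reformulate(x, response = y), data = d)
  list(
    n         = nrow(d),
    r         = unname(ct$estimate),
    p_value   = ct$p.value,
    slope     = coef(fit)[[x]],          # change in y per one-unit change in x
    r_squared = summary(fit)$r.squared
  )
}

# Usage on the cleaned data (not run here):
# summarize_association(data_clean, "time_spent_hours", "final_grade_cems")
```

With a single predictor, r_squared equals the square of r, so the two numbers describe the same linear relationship from different angles.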

#Second Visualization

#| label: create-data
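# Simulated weekly LMS hours (random integers from 6 to 20) for 40 hypothetical
# students; these values are not drawn from sci-online-classes.csv
set.seed(42)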

data_lms <- data.frame(
  Student_ID = paste("Student", 1:40, sep = "_"),
  Week_1  = sample(6:20, 40, replace = TRUE),
  Week_2  = sample(6:20, 40, replace = TRUE),
  Week_3  = sample(6:20, 40, replace = TRUE),
  Week_4  = sample(6:20, 40, replace = TRUE),
  Week_5  = sample(6:20, 40, replace = TRUE),
  Week_6  = sample(6:20, 40, replace = TRUE),
  Week_7  = sample(6:20, 40, replace = TRUE),
  Week_8  = sample(6:20, 40, replace = TRUE),
  Week_9  = sample(6:20, 40, replace = TRUE),
  Week_10 = sample(6:20, 40, replace = TRUE),
  Week_11 = sample(6:20, 40, replace = TRUE),
  Week_12 = sample(6:20, 40, replace = TRUE),
  Week_13 = sample(6:20, 40, replace = TRUE),
  Week_14 = sample(6:20, 40, replace = TRUE),
  Week_15 = sample(6:20, 40, replace = TRUE),
  Week_16 = sample(6:20, 40, replace = TRUE)
)

head(data_lms)
week_cols    <- grep("^Week_", names(data_lms), value = TRUE)
average_time <- colMeans(data_lms[, week_cols])
average_time
 Week_1  Week_2  Week_3  Week_4  Week_5  Week_6  Week_7  Week_8  Week_9 Week_10 
 12.325  10.750  12.525  14.275  13.175  13.225  13.600  13.050  13.050  12.750 
Week_11 Week_12 Week_13 Week_14 Week_15 Week_16 
 13.475  12.125  12.900  13.700  13.750  13.650 
#| label: prep-avg-table
average_time_table <- data.frame(
  Week               = factor(names(average_time), levels = names(average_time)),
  Average_Time_Spent = average_time
)

nrow(average_time_table)
[1] 16
head(average_time_table)
#| label: lineplot-avg

ggplot(average_time_table, aes(x = Week, y = Average_Time_Spent, group = 1)) +
  geom_line(color = "#185FA5", linewidth = 1.2) +
  geom_point(color = "#185FA5", size = 3) +
  labs(
    title = "Trend of Average Time Spent per Week",
    x = "Week",
    y = "Average Hours"
  ) +
  theme_minimal() +
  theme(
    plot.title  = element_text(size = 16, face = "bold", hjust = 0.5),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

The line plot shows the average number of hours students spent on the LMS each week. Week 2 has the lowest average (10.75 hours) and week 4 the highest (about 14.3 hours). Because the weekly values in data_lms are simulated with sample(), this plot illustrates the method rather than observed student behavior.
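The extremes read off the plot can be confirmed in code. This sketch copies the 16 printed values of average_time so it runs on its own; in the analysis, the same calls would be applied to average_time directly. which.min() and which.max() return the position (with its name) of the smallest and largest value, and diff() gives the change from each week to the next.

```r
# Weekly averages as printed by average_time above
average_time <- c(
  Week_1  = 12.325, Week_2  = 10.750, Week_3  = 12.525, Week_4  = 14.275,
  Week_5  = 13.175, Week_6  = 13.225, Week_7  = 13.600, Week_8  = 13.050,
  Week_9  = 13.050, Week_10 = 12.750, Week_11 = 13.475, Week_12 = 12.125,
  Week_13 = 12.900, Week_14 = 13.700, Week_15 = 13.750, Week_16 = 13.650
)

names(which.min(average_time))   # "Week_2"
names(which.max(average_time))   # "Week_4"

# Change in average hours from the previous week (named by the later week)
week_change <- diff(average_time)
round(week_change, 3)
names(which.min(week_change))    # "Week_2": the largest single-week drop
```

Week 2 also shows the largest week-over-week drop (10.750 - 12.325 = -1.575 hours).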

#Findings Summary

The scatter plot suggests a positive association between time spent on the LMS platform and final grades: many students who spent roughly 20 to 50 hours on the LMS earned relatively high final grades, although the strength of the relationship was not quantified. In the weekly line plot, average time spent fell from about 12.3 hours in week 1 to 10.75 hours in week 2, the lowest of the 16 weeks, before peaking in week 4. This weekly pattern comes from simulated data, so it should be confirmed with actual weekly LMS logs; if it holds, further analysis could explore why participation dropped in week 2.
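Whether a week 2 drop is statistically significant, rather than ordinary week-to-week variation, would need a formal test. Because the same students appear in every week column, a paired t-test on two weeks is one reasonable option (an assumption; wilcox.test(..., paired = TRUE) would be an alternative if the differences are far from normal). compare_weeks() is a hypothetical wrapper around t.test() that expects a wide table shaped like data_lms.

```r
# Hypothetical wrapper: paired t-test between two week columns of a wide table.
# The estimate is the mean of (week_a - week_b) across students.
compare_weeks <- function(df, week_a, week_b) {
  t.test(df[[week_a]], df[[week_b]], paired = TRUE)
}

# Usage (not run here):
# compare_weeks(data_lms, "Week_1", "Week_2")
# data_lms is simulated with sample(), so any result would describe the
# simulation, not real students; real weekly LMS logs would be needed.
```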