5_Capstone_Analysis_LauraBaker

Learning Analytics — Week 5 (Required)

Author

Laura Baker

Published

June 24, 2026

Introduction:

I will be using some data I created to show the completion rates and final scores for employees' compliance training modules. This data is important not only for instructional designers and HR employees, but also for supervisors, because compliance training is required for all employees. Currently, our training compliance rates are falling short of target benchmarks. Out of 64 required participants, only two employees have met the dual criteria of an 80% completion rate and an 80% final score. In my week 4 post, I talked about making compliance training more interesting by building it around a mystery that employees would solve while also satisfying the compliance training requirements. I want to make this interesting without making it too childlike.

# Load packages: tidyverse provides %>%, mutate(), glimpse(), and ggplot()
library(tidyverse)

# Load dataset
compliance_data <- read.csv("Data for Compliance Trainings- L Baker - Sheet.csv")

# Quick preview
head(compliance_data)

glimpse(compliance_data)

Rows: 64
Columns: 6
$ Employee        <chr> "Employee1", "Employee2", "Employee3", "Employee4", "E…
$ Year            <int> 2026, 2026, 2026, 2026, 2026, 2026, 2026, 2026, 2026, …
$ Completion.Rate <dbl> 0.7, 0.5, 0.6, 0.4, 0.0, 0.0, 0.0, 0.3, 0.0, 0.5, 0.8,…
$ Final.Score     <dbl> 0.7, 0.4, 0.4, 0.2, 0.0, 0.0, 0.0, 0.3, 0.1, 0.4, 0.6,…
$ CR...FS         <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No", …
$ Notes           <chr> "Checkpoints=CR", "Final Score = ?s Correct", "", "", …

Describe each variable:

I chose to use the following variables: Employee Number (to keep it anonymous and deidentified), the year (to make sure I have the correct data and year), completion rate (to see which employees have completed at least 80% of the compliance training), final score (to see which employees have received a final score of at least 80%), CR + FS (to see which employees have completed the training to satisfaction and which ones have not), and notes (to help me remember different things about my data).

# Clean and prep the data
cleaned_data <- compliance_data %>%
  mutate(
    Completion.Rate = ifelse(Completion.Rate == 10, 1.0, Completion.Rate),
    Final.Score = as.numeric(Final.Score),
    Passed_Both = ifelse(Completion.Rate >= 0.8 & Final.Score >= 0.8, "Yes", "No")
  )

# Check results
glimpse(cleaned_data)

Rows: 64
Columns: 7
$ Employee        <chr> "Employee1", "Employee2", "Employee3", "Employee4", "E…
$ Year            <int> 2026, 2026, 2026, 2026, 2026, 2026, 2026, 2026, 2026, …
$ Completion.Rate <dbl> 0.7, 0.5, 0.6, 0.4, 0.0, 0.0, 0.0, 0.3, 0.0, 0.5, 0.8,…
$ Final.Score     <dbl> 0.7, 0.4, 0.4, 0.2, 0.0, 0.0, 0.0, 0.3, 0.1, 0.4, 0.6,…
$ CR...FS         <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No", …
$ Notes           <chr> "Checkpoints=CR", "Final Score = ?s Correct", "", "", …
$ Passed_Both     <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No", …

summary(cleaned_data)

   Employee              Year      Completion.Rate  Final.Score    
 Length:64          Min.   :2026   Min.   :0.000   Min.   :0.0000  
 Class :character   1st Qu.:2026   1st Qu.:0.000   1st Qu.:0.0000  
 Mode  :character   Median :2026   Median :0.300   Median :0.2000  
                    Mean   :2026   Mean   :0.325   Mean   :0.2578  
                    3rd Qu.:2026   3rd Qu.:0.525   3rd Qu.:0.4000  
                    Max.   :2026   Max.   :1.000   Max.   :0.8000  
   CR...FS             Notes           Passed_Both       
 Length:64          Length:64          Length:64         
 Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character
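
The cleaning code above recodes a completion rate of 10 to 1.0. A quick range check is one way to catch that kind of entry before it is fixed. The sketch below uses a small made-up data frame (`check_data` and its employee labels are hypothetical, not the real dataset) and flags any rate or score outside the 0 to 1 range.

```r
# Hypothetical mini-dataset for illustration only; not the real employee data
check_data <- data.frame(
  Employee        = c("EmployeeA", "EmployeeB", "EmployeeC"),
  Completion.Rate = c(0.7, 1.0, 10),  # the 10 mimics a mistyped 100%
  Final.Score     = c(0.7, 0.8, 0.4)
)

# Keep only rows where a rate or score falls outside the valid 0-1 range
out_of_range <- subset(
  check_data,
  Completion.Rate < 0 | Completion.Rate > 1 |
    Final.Score < 0 | Final.Score > 1
)

print(out_of_range)
```

Running the same `subset()` call on the real data after cleaning should return zero rows if every value is valid.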

Data overview:

To see why our compliance training numbers are so low, I loaded our dataset of 64 employees into R to check how everyone is performing. The data tracks basic details like the employee’s ID and the training year, alongside their module completion rates and final quiz scores. The quiz scores represent the fraction of questions answered correctly, such as 7 out of 10, so I made sure they were stored as decimals (0.7) that R can calculate with. I also spotted a typo where one employee had a completion rate of “10” instead of 100%, which I fixed to “1.0” so it wouldn’t warp the final analysis. Cleaning up these minor data hiccups gives us a more accurate look at where our learners are getting stuck.

Analytics type:

I think my analysis is descriptive analytics because it is focused entirely on the question, ‘What happened?’ My current dataset simply summarizes historical facts, specifically the completion rates and final quiz scores for our 64 employees, so I can see where the training program currently stands.

# Create summarized data for bar chart
chart1_data <- cleaned_data %>%
  count(Passed_Both)

# Simple bar plot
ggplot(chart1_data, aes(x = Passed_Both, y = n, fill = Passed_Both)) +
  geom_col(color = "black", width = 0.4) +
  scale_fill_manual(values = c("No" = "tomato", "Yes" = "springgreen3")) +
  labs(
    title = "How Many Employees Met the 80/80 Target?",
    x = "Met Completion and Score Benchmark",
    y = "Number of Employees"
  ) +
  theme_classic() +
  theme(legend.position = "none")

Interpretation of ggplot:

I think this plot makes the problem very clear. All but two of our 64 employees have not met the 80/80 target.
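
The share behind the bar chart can be worked out directly from the two counts reported in this post: 64 employees in the dataset and 2 who met both benchmarks. This is a minimal sketch of that arithmetic; the variable names are my own.

```r
total_employees <- 64  # rows in the dataset
met_target      <- 2   # employees who met both the 80% completion and 80% score benchmarks

# Percentages rounded to one decimal place
pct_met    <- round(100 * met_target / total_employees, 1)
pct_missed <- round(100 * (total_employees - met_target) / total_employees, 1)

cat("Met target:   ", pct_met, "%\n")
cat("Missed target:", pct_missed, "%\n")
```

The missed share comes out just under 97%, which matches the gap between the two bars.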

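# Build heatmap data
# right = FALSE makes each bucket include its lower edge, so exactly 0.8 lands in [0.8,1]
# Explicit breaks avoid floating-point mismatches that seq() can introduce at 0.6
heatmap_data <- cleaned_data %>%
  mutate(
    Comp_Bucket = cut(Completion.Rate, breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1), right = FALSE, include.lowest = TRUE),
    Score_Bucket = cut(Final.Score, breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1), right = FALSE, include.lowest = TRUE)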
  ) %>%
  count(Comp_Bucket, Score_Bucket)

# Plot heatmap
ggplot(heatmap_data, aes(x = Comp_Bucket, y = Score_Bucket, fill = n)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "#f7fbff", high = "#084594") +
  labs(
    title = "Employee Training Heatmap",
    x = "Completion Rate",
    y = "Final Score",
    fill = "Count"
  ) +
  theme_minimal()

Interpretation of heatmap:

I had to use AI to help me figure out how to make this heatmap, but once I did, I loved the map. It shows just how many people are struggling in that bottom left square, while barely anyone is where they should be for the 80/80 target.
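
Because the 80/80 target sits exactly on a bucket edge, it helps to check which bucket a value of 0.8 falls into. The sketch below is only an illustration of how `cut()` treats boundary values. It compares the default right-closed intervals with left-closed intervals (`right = FALSE`); the variable names are hypothetical.

```r
breaks <- c(0, 0.2, 0.4, 0.6, 0.8, 1)

# Default: intervals are closed on the right, so 0.8 belongs to (0.6,0.8]
right_closed <- cut(0.8, breaks = breaks, include.lowest = TRUE)

# right = FALSE: intervals are closed on the left, so 0.8 belongs to [0.8,1]
left_closed <- cut(0.8, breaks = breaks, right = FALSE, include.lowest = TRUE)

print(as.character(right_closed))
print(as.character(left_closed))
```

With left-closed buckets, the top-right tile of the heatmap lines up with the "at least 80%" rule used for the target.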

Final summary:

Looking at the final data, it is obvious that my current compliance training is hitting a massive roadblock, with 97% of our employees missing the mark. When you break down the numbers, the main issue is that people are losing momentum: over half of our workforce dropped out before even hitting the halfway point of the modules. We also have a handful of employees who diligently powered through every single slide but still couldn’t pass the final test, which tells me the content isn’t preparing them for the questions. Overall, the data shows that the training is either too boring to finish or too confusing to pass, so I definitely need to rethink how I’m delivering this.
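
The two patterns in this summary can each be checked with one line of R: the share who stopped before the halfway point, and the employees who finished everything but still scored under 80%. The sketch below uses a small made-up data frame (`example_data` is hypothetical and its values are illustrative, not the real employee records). The same expressions could be run on `cleaned_data`.

```r
# Hypothetical values for illustration only; not the real employee data
example_data <- data.frame(
  Completion.Rate = c(0.0, 0.3, 0.4, 1.0, 1.0, 0.8),
  Final.Score     = c(0.0, 0.2, 0.4, 0.6, 0.8, 0.8)
)

# Share of employees who stopped before the halfway point of the modules
share_below_half <- mean(example_data$Completion.Rate < 0.5)

# Number of employees who completed every module but scored under 80%
finished_but_failed <- sum(
  example_data$Completion.Rate == 1 & example_data$Final.Score < 0.8
)

cat("Share below halfway:", share_below_half, "\n")
cat("Finished but did not pass:", finished_but_failed, "\n")
```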