Posit Cloud Analysis

Learning Analytics - Capstone Project

Author

Meredith Fain

Published

June 22, 2026


Introduction

This analysis examines English language proficiency levels (listening, speaking, reading, and writing) across grade levels K (denoted as 0) through 4 in a K–4 ESL program. The educational problem being explored is uneven language development across domains, where students may demonstrate stronger receptive skills (listening/reading) than productive skills (speaking/writing), or vice versa. Understanding these patterns can help ESL teachers target instruction more effectively and provide appropriate scaffolding for different proficiency needs.



Data Overview & Text

Variables

Within the table below, the following data is found:

  • column 1 is the student's de-identified ID code
  • column 2 is the grade level of the student
  • column 3 is their listening score from the ELPA21 test
  • column 4 is their speaking score from the ELPA21 test
  • column 5 is their reading score from the ELPA21 test
  • column 6 is their writing score from the ELPA21 test
library(readr)  # provides read_csv()

CapstoneData <- read_csv("CapstoneSheet.csv")

CapstoneData
summary(CapstoneData)
   Student.ID     Grade.Level    Listening.Level Speaking.Level 
 Min.   :123.0   Min.   :0.000   Min.   :1.00    Min.   :1.000  
 1st Qu.:134.2   1st Qu.:1.000   1st Qu.:3.00    1st Qu.:2.000  
 Median :145.5   Median :2.000   Median :4.00    Median :3.000  
 Mean   :145.5   Mean   :1.826   Mean   :3.63    Mean   :3.261  
 3rd Qu.:156.8   3rd Qu.:3.000   3rd Qu.:4.00    3rd Qu.:5.000  
 Max.   :168.0   Max.   :4.000   Max.   :5.00    Max.   :5.000  
 Reading.Level   Writing.Level  
 Min.   :1.000   Min.   :1.000  
 1st Qu.:1.250   1st Qu.:2.000  
 Median :3.000   Median :3.000  
 Mean   :2.696   Mean   :2.565  
 3rd Qu.:4.000   3rd Qu.:3.000  
 Max.   :5.000   Max.   :5.000  
#| label: data-quality-assessment
colSums(is.na(CapstoneData))
     Student.ID     Grade.Level Listening.Level  Speaking.Level   Reading.Level 
              0               0               0               0               0 
  Writing.Level 
              0 
sum(duplicated(CapstoneData))
[1] 0
sapply(CapstoneData[, c("Listening.Level",
                    "Speaking.Level",
                    "Reading.Level",
                    "Writing.Level")],
       range)
     Listening.Level Speaking.Level Reading.Level Writing.Level
[1,]               1              1             1             1
[2,]               5              5             5             5
length(unique(CapstoneData$Student.ID)) == nrow(CapstoneData)
[1] TRUE

Data Quality Issues and Cleaning

After examining the dataset, no major data quality issues were identified:

  • No missing values were present.
  • No duplicate records were found.
  • All proficiency scores were within the expected range of 1–5.
  • Student IDs were unique for every observation.

Because the dataset was already clean, no data transformations or corrections were required before analysis.
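
The four checks above can also be bundled into one helper so that a re-run on updated data stops if any check fails. The following is a minimal sketch in base R, not part of the original analysis: `check_quality` is a hypothetical helper name, and `toy` is a small invented data frame that only mimics the column names of CapstoneData.

```r
# Toy stand-in for CapstoneData: invented values for illustration only,
# NOT the study data. Column names match the report.
toy <- data.frame(
  Student.ID      = c(123, 124, 125),
  Grade.Level     = c(0, 1, 2),
  Listening.Level = c(1, 4, 5),
  Speaking.Level  = c(2, 3, 5),
  Reading.Level   = c(1, 3, 4),
  Writing.Level   = c(1, 3, 3)
)

score_cols <- c("Listening.Level", "Speaking.Level",
                "Reading.Level", "Writing.Level")

# Hypothetical helper: returns one TRUE/FALSE flag per quality check
check_quality <- function(df) {
  c(
    no_missing    = !any(is.na(df)),
    no_duplicates = !any(duplicated(df)),
    scores_in_1_5 = all(sapply(df[score_cols],
                               function(x) all(x >= 1 & x <= 5))),
    unique_ids    = !any(duplicated(df$Student.ID))
  )
}

checks <- check_quality(toy)
checks

# Stop with an error if any check fails
stopifnot(all(checks))
```

Calling `check_quality(CapstoneData)` would run the same four checks on the real data, assuming its column names are the ones shown in the summary output.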



Analysis

Heat Map

Heat Map Code/Set Up

library(dplyr)

heat_data <- CapstoneData %>%
  group_by(Grade.Level) %>%
  summarize(
    Listening = mean(Listening.Level),
    Speaking = mean(Speaking.Level),
    Reading = mean(Reading.Level),
    Writing = mean(Writing.Level)
  )

heat_data
library(tidyr)

heat_long <- pivot_longer(
  heat_data,
  cols = c(Listening, Speaking, Reading, Writing),
  names_to = "Skill",
  values_to = "AverageScore"
)

heat_long
library(ggplot2)

ggplot(heat_long,
       aes(x = Skill,
           y = factor(Grade.Level),
           fill = AverageScore)) +
  geom_tile(color = "white") +
  geom_text(aes(label = round(AverageScore, 2)),
            color = "black",
            size = 4) +
  scale_fill_gradient(
    low = "#ED9858",
    high = "#9EED9D"
  ) +
  labs(
    title = "Average Language Proficiency by Grade Level",
    x = "Language Skill",
    y = "Grade Level",
    fill = "Average Score"
  ) +
  theme_minimal()

Heat Map Analysis

The heat map displays the average listening, speaking, reading, and writing proficiency scores for students across grade levels 0 through 4. The grade levels are coded as follows:

  • 0 = Kindergarten
  • 1 = First Grade
  • 2 = Second Grade
  • 3 = Third Grade
  • 4 = Fourth Grade

Green tiles represent higher average scores and orange tiles represent lower ones. Grade 1 stands out as the strongest overall performer in the dataset, especially in listening and reading. It has the highest average listening score (4.42), with Grade 3 the next highest at 4.00. In reading, Grade 1 also leads with an average of 3.75, while Grades 2, 3, and 4 are all much lower at about 2.38, showing a clear gap in performance.

A pattern also appears when comparing receptive skills (listening and reading) with productive skills (speaking and writing), although it is not uniform. It is clearest in Grade 1, where listening (4.42) and reading (3.75) are both higher than speaking (3.42) and writing (3.17). In Grades 3 and 4, however, speaking (3.88 and 3.75) is well above reading (about 2.38), so the receptive advantage there comes from listening alone. Across grades, listening is usually the highest-scoring skill and writing is often the lowest.

Speaking in Grade 1 (3.42) is moderate compared to Grade 3 (3.88) and Grade 4 (3.75), so it is not the strongest in that area, but still fairly strong. Writing in Grade 1 (3.17) is still higher than Grades 2–4, which all average around 2.50.

Kindergarten (Grade 0) consistently has the lowest scores across all skills, particularly in reading (2.20) and writing (2.00), showing a clear gap between it and the higher grades. Overall, instead of a steady increase across grade levels, the pattern is uneven. Grade 1 appears to be the peak-performing group in several skills, especially listening and reading, where it clearly outperforms the higher grades.
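
The receptive versus productive comparison can be checked numerically by averaging the two receptive scores and the two productive scores for each student and then comparing the grade-level means. The sketch below uses base R and an invented `toy` data frame; the `Receptive`, `Productive`, and `Gap` columns are hypothetical names introduced here, not variables from the original analysis.

```r
# Toy stand-in for CapstoneData: invented values for illustration only,
# NOT the study data. Column names match the report.
toy <- data.frame(
  Grade.Level     = c(0, 0, 1, 1, 3, 3),
  Listening.Level = c(2, 3, 5, 4, 4, 4),
  Speaking.Level  = c(2, 2, 3, 4, 4, 4),
  Reading.Level   = c(2, 2, 4, 4, 2, 3),
  Writing.Level   = c(2, 2, 3, 3, 2, 3)
)

# Per-student composites (hypothetical helper columns)
toy$Receptive  <- (toy$Listening.Level + toy$Reading.Level) / 2
toy$Productive <- (toy$Speaking.Level + toy$Writing.Level) / 2

# Mean of each composite within each grade
gap_by_grade <- aggregate(cbind(Receptive, Productive) ~ Grade.Level,
                          data = toy, FUN = mean)

# Positive gap: receptive skills are stronger; near zero: no clear difference
gap_by_grade$Gap <- gap_by_grade$Receptive - gap_by_grade$Productive
gap_by_grade
```

Replacing `toy` with CapstoneData would give one gap value per grade, which makes it easier to see in which grades the receptive advantage actually holds.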

Box Plot

Box Plot Code/Set Up

library(ggplot2)
library(tidyr)

Capstone_long <- CapstoneData %>%
  pivot_longer(
    cols = c(Listening.Level, Speaking.Level, Reading.Level, Writing.Level),
    names_to = "Skill",
    values_to = "Score"
  )

Capstone_long

Listening Box Plot

boxplot(Listening.Level ~ Grade.Level,
        data = CapstoneData,
        main = "Listening Scores by Grade Level",
        xlab = "Grade Level",
        ylab = "Listening Score",
        col = c("#db7d93"))

Speaking Box Plot

boxplot(Speaking.Level ~ Grade.Level,
        data = CapstoneData,
        main = "Speaking Scores by Grade Level",
        xlab = "Grade Level",
        ylab = "Speaking Score",
        col = c("#f0db90"))

Reading Box Plot

boxplot(Reading.Level ~ Grade.Level,
        data = CapstoneData,
        main = "Reading Scores by Grade Level",
        xlab = "Grade Level",
        ylab = "Reading Score",
        col = c("#8bd180"))

Writing Box Plot

boxplot(Writing.Level ~ Grade.Level,
        data = CapstoneData,
        main = "Writing Scores by Grade Level",
        xlab = "Grade Level",
        ylab = "Writing Score",
        col = c("#7ca2cf"))

Box Plot Analysis

The box plots show the distribution of writing, speaking, listening, and reading scores across grade levels 0 through 4. Overall, there are clear differences in both the central tendency and spread of scores across grades and skills.

For listening, Grade 1 shows a notably high median score compared to other grades, with generally higher and more consistent performance. Grades 3 and 4 also show relatively strong listening scores, but with slightly more variability. Grade 0 (Kindergarten) has the lowest median listening score, indicating weaker receptive language skills at the lowest grade level.

For speaking, scores are more spread out across all grade levels. Grade 3 shows a relatively high median speaking score, slightly above Grade 1 and Grade 4. However, the variability within grades suggests less consistency in speaking performance compared to listening and reading.

For reading, Grade 1 again stands out with a higher median and a tighter distribution, showing more consistent performance. In contrast, Grades 2, 3, and 4 show lower medians and very similar distributions, suggesting that reading proficiency does not steadily increase with grade level in this dataset.

For writing, all grade levels show relatively lower medians compared to the other skills, with Grade 1 performing slightly better than Grades 2–4. The spread of scores is fairly consistent across grades, indicating that writing is a weaker and more uniform skill area overall.

Across all four box plots, a consistent pattern emerges: receptive skills (listening and reading) tend to have higher medians and stronger performance in Grade 1, while productive skills (speaking and writing) show lower scores and more variability across grades. Additionally, the expected upward progression in proficiency with higher grade levels is not consistently observed, suggesting uneven skill development across grades in this sample.
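
Statements about medians and spread can be backed by numbers rather than read off the plots. The following is a minimal sketch, assuming the same column names as CapstoneData; the `toy` values are invented for illustration and `spread_by_grade` is a hypothetical name.

```r
# Toy stand-in for one skill column: invented values, NOT the study data
toy <- data.frame(
  Grade.Level   = c(0, 0, 0, 1, 1, 1),
  Reading.Level = c(1, 2, 3, 3, 4, 4)
)

# Median (the box plot's centre line) and IQR (the box height) per grade
medians <- tapply(toy$Reading.Level, toy$Grade.Level, median)
iqrs    <- tapply(toy$Reading.Level, toy$Grade.Level, IQR)

spread_by_grade <- data.frame(
  Grade.Level = names(medians),
  Median      = as.vector(medians),
  IQR         = as.vector(iqrs)
)
spread_by_grade
```

A smaller IQR corresponds to a tighter box, so the same table could be produced for each of the four skills to confirm which grades are more consistent.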

Bar Plot

Bar Plot Code/Set Up

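# Capstone_long (the long-format data) was created in the box plot set-up above

bar_data <- Capstone_long %>%
  # Fix the skill order before grouping so the bars and legend follow
  # Listening, Speaking, Reading, Writing rather than alphabetical order
  mutate(Skill = factor(Skill,
                        levels = c("Listening.Level", "Speaking.Level",
                                   "Reading.Level", "Writing.Level"))) %>%
  group_by(Grade.Level, Skill) %>%
  summarize(MeanScore = mean(Score), .groups = "drop")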

Bar Plot

ggplot(bar_data,
       aes(x = factor(Grade.Level),
           y = MeanScore,
           fill = Skill)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(values = c(
    "Listening.Level" = "#db7d93",
    "Speaking.Level" = "#f0db90",
    "Reading.Level" = "#8bd180",
    "Writing.Level" = "#7ca2cf"
  )) +
  labs(
    title = "Average Language Proficiency by Grade Level",
    x = "Grade Level",
    y = "Average Score"
  ) +
  theme_minimal()

Bar Plot Analysis

The bar plot shows the average listening, speaking, reading, and writing scores across grade levels 0 through 4, making it easier to compare performance across both grades and skills.

Overall, Grade 1 stands out again as one of the strongest groups, especially in listening and reading. Listening is highest for Grade 1 at 4.42, with Grade 3 next at 4.00. Reading shows a similar pattern, where Grade 1 (3.75) is clearly higher than Grades 2–4, which are all around 2.38.

A difference appears between receptive and productive skills, but it is driven mainly by listening, which is generally the highest-scoring skill. Reading is higher than speaking only in some grades: in Grade 1 reading (3.75) exceeds speaking (3.42), whereas in Grades 3 and 4 speaking (3.88 and 3.75) exceeds reading (about 2.38). Writing is the lowest overall, with most grades sitting around 2.50, although Grade 1 is higher at 3.17.

Kindergarten (Grade 0) consistently has the lowest scores across all skills, especially in writing and reading, showing a clear gap compared to the higher grades. Overall, the bar plot supports the earlier findings: Grade 1 performs particularly well in receptive skills, while writing remains the weakest area across the dataset.



Findings Summary

The analysis of the ELPA21 dataset shows differences in language proficiency across grade levels. Grade 1 stands out as the strongest overall group, particularly in listening and reading, where it has the highest averages of any grade. Across the dataset, listening is the strongest skill (mean 3.63) and writing the weakest (mean 2.57). Receptive skills exceed productive skills most clearly in Grade 1; overall, speaking (mean 3.26) averages higher than reading (mean 2.70), so the receptive advantage is not uniform. Kindergarten (Grade 0) consistently shows the lowest scores across all domains. Overall, the results suggest variation in performance by grade level rather than a steady developmental progression. These patterns may also be influenced by the distribution of newcomer students across grades and differences in familiarity with online testing environments, particularly for younger students who may have lower digital literacy.