Report on Sankey

Authors

Stuti and Tejaswini

Introduction

In modern academic curricula, electives play a pivotal role in personalizing a student’s education. Tracing the path from foundational subjects to final-year electives shows where student interests lie and how a cohort moves through the program. This report uses a synthetic dataset to trace students across three academic years with a Sankey diagram and a violin plot.

Objectives

  • Analyze student subject flow from Year 1 to Year 3.

  • Visualize transitions using a Sankey diagram.

  • Evaluate the distribution of Year 3 elective selections with a violin plot.

  • Interpret academic trends and derive insights for curriculum planning.

Dataset Description

Dataset Name: synthetic_grade_sheet.csv

Attributes:

  • Roll.Number: Unique student ID.

  • Subject_1 to Subject_10: Letter grades in ten core subjects over three years (the sample rows contain the values S, A, B, C, D, E and U).

  • Elective: Final year elective selected by the student.

Justification:

  • The dataset simulates academic records over three years.

  • It records each student's result in ten core subjects, which allows progression to be traced from year to year.

  • The Elective column provides meaningful endpoints for visualization.

Methodology

Step 1: Load Required Libraries

library(ggplot2)
library(dplyr)

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ lubridate 1.9.4     ✔ tibble    3.2.1
✔ purrr     1.0.4     ✔ tidyr     1.3.1
✔ readr     2.1.5     
library(networkD3)

Step 2: Load and Explore Data

data <- read.csv("synthetic_grade_sheet.csv")
head(data)
  Roll.Number Subject_1 Subject_2 Subject_3 Subject_4 Subject_5 Subject_6
1     21CS001         D         C         A         C         U         U
2     21CS002         D         B         A         A         S         C
3     21CS003         S         A         E         D         C         D
4     21CS004         A         D         E         C         S         A
5     21CS005         A         B         E         S         S         D
6     21CS006         U         S         D         B         U         D
  Subject_7 Subject_8 Subject_9 Subject_10       Elective
1         B         C         C          S Cyber Security
2         A         B         E          E   Data Science
3         U         A         E          U   Data Science
4         E         S         S          U   Data Science
5         A         S         E          C Cyber Security
6         S         U         B          A   Data Science
str(data)
'data.frame':   100 obs. of  12 variables:
 $ Roll.Number: chr  "21CS001" "21CS002" "21CS003" "21CS004" ...
 $ Subject_1  : chr  "D" "D" "S" "A" ...
 $ Subject_2  : chr  "C" "B" "A" "D" ...
 $ Subject_3  : chr  "A" "A" "E" "E" ...
 $ Subject_4  : chr  "C" "A" "D" "C" ...
 $ Subject_5  : chr  "U" "S" "C" "S" ...
 $ Subject_6  : chr  "U" "C" "D" "A" ...
 $ Subject_7  : chr  "B" "A" "U" "E" ...
 $ Subject_8  : chr  "C" "B" "A" "S" ...
 $ Subject_9  : chr  "C" "E" "E" "S" ...
 $ Subject_10 : chr  "S" "E" "U" "U" ...
 $ Elective   : chr  "Cyber Security" "Data Science" "Data Science" "Data Science" ...
summary(data)
 Roll.Number         Subject_1          Subject_2          Subject_3        
 Length:100         Length:100         Length:100         Length:100        
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
  Subject_4          Subject_5          Subject_6          Subject_7        
 Length:100         Length:100         Length:100         Length:100        
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
  Subject_8          Subject_9          Subject_10          Elective        
 Length:100         Length:100         Length:100         Length:100        
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
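
For character columns, summary() reports only length, class and mode, as the output above shows. A minimal sketch of one way to get value counts instead, assuming the columns are converted to factors first. The small data frame here is a stand-in built from the first three rows printed by head(data), with only two subject columns, and the name data_factor is hypothetical.

```r
# Stand-in for the real data: first three rows from head(data), fewer columns
data <- data.frame(
  Roll.Number = c("21CS001", "21CS002", "21CS003"),
  Subject_1   = c("D", "D", "S"),
  Subject_2   = c("C", "B", "A"),
  Elective    = c("Cyber Security", "Data Science", "Data Science")
)

# Convert every column except the ID (column 1) to a factor
data_factor <- data
data_factor[-1] <- lapply(data_factor[-1], factor)

# summary() now lists how often each value occurs in each factor column
summary(data_factor)

# Frequency table for a single column
table(data_factor$Subject_1)
```

With the full dataset, the same conversion would apply to Subject_1 through Subject_10 and Elective.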

Step 3: Create Year-wise View

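# Keep one column per stage: Subject_1 stands in for Year 1,
# Subject_5 for Year 2, and Elective for Year 3.
data_sankey <- data %>%
  select(Roll.Number, Year1 = Subject_1, Year2 = Subject_5, Year3 = Elective)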
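
The report does not show how the links and nodes objects passed to sankeyNetwork() are built. The following is a minimal sketch of one possible construction, not the authors' code. It assumes a small stand-in for data_sankey made from the first four rows of head(data), and it prefixes each label with its stage ("Y1:", "Y2:", "Y3:") so that the same value in two different years becomes two separate nodes. The prefixes and the helper names flows and links_named are assumptions.

```r
library(dplyr)

# Stand-in for data_sankey: first four rows of head(data)
data_sankey <- data.frame(
  Roll.Number = c("21CS001", "21CS002", "21CS003", "21CS004"),
  Year1 = c("D", "D", "S", "A"),
  Year2 = c("U", "S", "C", "S"),
  Year3 = c("Cyber Security", "Data Science", "Data Science", "Data Science")
)

# Prefix every label with its stage so "S" in Year 1 and "S" in Year 2
# are treated as different nodes (assumption, see lead-in)
flows <- data_sankey %>%
  mutate(
    Year1 = paste("Y1:", Year1),
    Year2 = paste("Y2:", Year2),
    Year3 = paste("Y3:", Year3)
  )

# Count students on each Year1 -> Year2 and Year2 -> Year3 transition
links_named <- bind_rows(
  count(flows, from = Year1, to = Year2, name = "value"),
  count(flows, from = Year2, to = Year3, name = "value")
)

# One row per distinct node label
nodes <- data.frame(name = unique(c(links_named$from, links_named$to)))

# sankeyNetwork() expects zero-based row indices into `nodes`
links <- links_named %>%
  transmute(
    source = match(from, nodes$name) - 1,
    target = match(to, nodes$name) - 1,
    value
  )
```

The column names source, target, value and name match the Source, Target, Value and NodeID arguments used in the sankeyNetwork() call in this report.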

Step 5: Create Sankey Diagram

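# links: data frame with zero-indexed source and target columns plus a value column
# nodes: data frame whose name column holds the node labels
sankeyNetwork(
  Links = links,
  Nodes = nodes,
  Source = "source",
  Target = "target",
  Value = "value",
  NodeID = "name",
  fontSize = 14,
  nodeWidth = 30,
  colourScale = JS("d3.scaleOrdinal(d3.schemeCategory10)"),
  units = "students",   # unit label for link values
  sinksRight = FALSE    # do not force final nodes to the right edge
)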

Violin Plot for Elective Choices

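# Reshape to long format: one row per student, with Year = "Year3"
# and Subject = the elective that student chose
data_violin <- data %>%
  select(Year3 = Elective) %>%
  pivot_longer(cols = everything(), names_to = "Year", values_to = "Subject")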

# Create Violin Plot

ggplot(data_violin, aes(x = Year, y = Subject, fill = Year)) +
  geom_violin(trim = FALSE, scale = "count", alpha = 0.8) +
  geom_jitter(width = 0.2, alpha = 0.3, size = 1) +
  theme_minimal(base_size = 14) +
  labs(
    title = "Distribution of Elective Subject Choices",
    subtitle = "Violin plot showing frequency of Year 3 Elective Selections",
    x = "Academic Year",
    y = "Elective Subject",
    fill = "Year"
  ) +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "none"
  )
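
Because Elective is a categorical variable, the violin's width is at best an approximate guide to how many students chose each option. A count table gives the exact figures. This is a minimal sketch, assuming a stand-in data frame that holds the six elective values printed by head(data); the names elective_counts, students and share are hypothetical.

```r
library(dplyr)

# Stand-in for the real data: the six Elective values shown by head(data)
data <- data.frame(
  Elective = c("Cyber Security", "Data Science", "Data Science",
               "Data Science", "Cyber Security", "Data Science")
)

# Number of students per elective, most popular first, plus each share
elective_counts <- data %>%
  count(Elective, name = "students", sort = TRUE) %>%
  mutate(share = students / sum(students))

elective_counts
```

Run on the full dataset, the same table would give the figures behind the shape of the violin.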

Interpretation and Insights

  • Subject Retention: Thick links between a Year 1 node (Subject_1) and a Year 2 node (Subject_5) mark paths shared by many students, suggesting consistent academic paths.

  • Elective Popularity: The violin plot shows which electives attract more students. The higher interest may stem from better career outcomes or teaching, though the dataset does not record students' reasons.

  • Decision Bottlenecks: Where the Sankey flows converge on a few Year 2 nodes, those nodes mark critical academic decision points before the elective choice.
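
The retention and bottleneck observations above can be checked numerically with a cross-tabulation of the Year 1 and Year 2 columns. This is a minimal sketch, assuming a stand-in for data_sankey built from the Subject_1 and Subject_5 values of the six rows printed by head(data); the names transitions, row_share and year2_totals are hypothetical.

```r
# Stand-in for data_sankey: Subject_1 and Subject_5 from head(data)
data_sankey <- data.frame(
  Year1 = c("D", "D", "S", "A", "A", "U"),
  Year2 = c("U", "S", "C", "S", "S", "U")
)

# Rows are Year 1 values, columns are Year 2 values, cells are student counts
transitions <- table(Year1 = data_sankey$Year1, Year2 = data_sankey$Year2)

# Share of each Year 1 group that moves to each Year 2 node (rows sum to 1)
row_share <- prop.table(transitions, margin = 1)

# Students arriving at each Year 2 node; large totals indicate where flows converge
year2_totals <- colSums(transitions)

transitions
round(row_share, 2)
year2_totals
```

A Year 2 node with a large total and several contributing Year 1 rows corresponds to a point where the Sankey flows narrow.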

Conclusion

This analysis traces student academic flows across three years with a Sankey diagram and summarizes final-year elective choices with a violin plot. The insights can help educators and administrators optimize subject offerings and improve the student experience.