Introduction

This project performs an exploratory data analysis of student performance data. The objective is to understand the marks obtained by different students in Math, English, and Science and to identify patterns using graphs.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.2.0
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Create dataset

student_data <- data.frame(
  Name = c("Aman", "Riya", "Karan", "Priya", "Rahul", "Sneha", "Vikash", "Puja"),
  Math = c(78, 92, 67, 88, 73, 95, 60, 84),
  English = c(80, 85, 72, 90, 75, 93, 68, 87),
  Science = c(76, 89, 70, 91, 78, 96, 65, 85)
)
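# Total marks across the three subjects
student_data$Total <- student_data$Math + student_data$English + student_data$Science
# Percentage assumes each subject is marked out of 100 (total out of 300)
student_data$Percentage <- round(student_data$Total / 3, 2)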
student_data
##     Name Math English Science Total Percentage
## 1   Aman   78      80      76   234      78.00
## 2   Riya   92      85      89   266      88.67
## 3  Karan   67      72      70   209      69.67
## 4  Priya   88      90      91   269      89.67
## 5  Rahul   73      75      78   226      75.33
## 6  Sneha   95      93      96   284      94.67
## 7 Vikash   60      68      65   193      64.33
## 8   Puja   84      87      85   256      85.33

Data structure

str(student_data)
## 'data.frame':    8 obs. of  6 variables:
##  $ Name      : chr  "Aman" "Riya" "Karan" "Priya" ...
##  $ Math      : num  78 92 67 88 73 95 60 84
##  $ English   : num  80 85 72 90 75 93 68 87
##  $ Science   : num  76 89 70 91 78 96 65 85
##  $ Total     : num  234 266 209 269 226 284 193 256
##  $ Percentage: num  78 88.7 69.7 89.7 75.3 ...
summary(student_data)
##      Name                Math          English         Science     
##  Length:8           Min.   :60.00   Min.   :68.00   Min.   :65.00  
##  Class :character   1st Qu.:71.50   1st Qu.:74.25   1st Qu.:74.50  
##  Mode  :character   Median :81.00   Median :82.50   Median :81.50  
##                     Mean   :79.62   Mean   :81.25   Mean   :81.25  
##                     3rd Qu.:89.00   3rd Qu.:87.75   3rd Qu.:89.50  
##                     Max.   :95.00   Max.   :93.00   Max.   :96.00  
##      Total         Percentage   
##  Min.   :193.0   Min.   :64.33  
##  1st Qu.:221.8   1st Qu.:73.92  
##  Median :245.0   Median :81.67  
##  Mean   :242.1   Mean   :80.71  
##  3rd Qu.:266.8   3rd Qu.:88.92  
##  Max.   :284.0   Max.   :94.67

Student Percentage

ggplot(student_data, aes(x = Name, y = Percentage, fill = Name)) + 
  geom_col() +
  ggtitle("Student Percentage") + 
  theme_minimal()

# Subject-wise comparison

subject_data <- student_data %>%
  pivot_longer(cols = c(Math, English, Science),
              names_to = "Subject",
              values_to = "Marks")
ggplot(subject_data, aes(x = Name, y = Marks, fill = Subject)) + 
  geom_col(position = "dodge") +
  ggtitle("Subject-Wise Marks Comparison") +
  theme_minimal()

# Average Marks In Each Subject

avg_marks <- data.frame(
  subject = c("Math", "English", "Science"),
  Average = c(
    mean(student_data$Math),
    mean(student_data$English),
    mean(student_data$Science)
  )
)
ggplot(avg_marks, aes(x = subject, y = Average, fill = subject)) + 
  geom_col() +
  ggtitle("Average Marks In Each Subject") + 
  theme_minimal()
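
The per-subject averages can also be derived from long-format data instead of being typed out subject by subject. The following is a minimal sketch, assuming the dplyr and tidyr packages are installed; the name avg_by_subject is illustrative and does not appear in the original analysis.

```r
library(dplyr)
library(tidyr)

# Rebuild the marks from the "Create dataset" section so the sketch runs on its own
student_data <- data.frame(
  Name = c("Aman", "Riya", "Karan", "Priya", "Rahul", "Sneha", "Vikash", "Puja"),
  Math = c(78, 92, 67, 88, 73, 95, 60, 84),
  English = c(80, 85, 72, 90, 75, 93, 68, 87),
  Science = c(76, 89, 70, 91, 78, 96, 65, 85)
)

# Reshape to one row per student and subject, then average within each subject
# (avg_by_subject is a hypothetical name used only for this illustration)
avg_by_subject <- student_data %>%
  pivot_longer(cols = c(Math, English, Science),
               names_to = "Subject",
               values_to = "Marks") %>%
  group_by(Subject) %>%
  summarise(Average = mean(Marks), .groups = "drop")

print(avg_by_subject)
```

With this approach, adding a fourth subject column would only require extending the cols argument; the averaging step stays the same.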

Conclusion

The analysis shows that Sneha has the highest percentage (94.67%) among all students. On average, English and Science scores (81.25 each) are slightly higher than Math scores (79.62). The graphs make it easy to compare student performance visually.
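
The statements in the conclusion can be checked directly against the data rather than read off the graphs. The following is a minimal sketch in base R; the names top_student, subject_means, n_english_above_math, and n_science_above_math are illustrative and are not part of the original analysis.

```r
# Rebuild the marks from the "Create dataset" section so the sketch runs on its own
student_data <- data.frame(
  Name = c("Aman", "Riya", "Karan", "Priya", "Rahul", "Sneha", "Vikash", "Puja"),
  Math = c(78, 92, 67, 88, 73, 95, 60, 84),
  English = c(80, 85, 72, 90, 75, 93, 68, 87),
  Science = c(76, 89, 70, 91, 78, 96, 65, 85)
)
subjects <- c("Math", "English", "Science")

# Percentage as the mean of the three subject marks (assumes marks out of 100)
student_data$Percentage <- round(rowMeans(student_data[, subjects]), 2)

# Student with the highest percentage
top_student <- student_data$Name[which.max(student_data$Percentage)]

# Mean mark per subject
subject_means <- colMeans(student_data[, subjects])

# Number of students who scored higher in English / Science than in Math
n_english_above_math <- sum(student_data$English > student_data$Math)
n_science_above_math <- sum(student_data$Science > student_data$Math)

print(top_student)
print(subject_means)
print(c(English = n_english_above_math, Science = n_science_above_math))
```

The per-student counts show whether the difference in averages holds for most students or is driven by a few large gaps.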