This project performs an exploratory data analysis of student performance data. The objective is to understand the marks obtained by different students in different subjects and to identify patterns using graphs.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.2.0 ✔ readr 2.2.0
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.2 ✔ tibble 3.3.1
## ✔ lubridate 1.9.5 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Marks obtained by each student in three subjects
student_data <- data.frame(
  Name = c("Aman", "Riya", "Karan", "Priya", "Rahul", "Sneha", "Vikash", "Puja"),
  Math = c(78, 92, 67, 88, 73, 95, 60, 84),
  English = c(80, 85, 72, 90, 75, 93, 68, 87),
  Science = c(76, 89, 70, 91, 78, 96, 65, 85)
)
# Total marks across the three subjects
student_data$Total <- student_data$Math + student_data$English + student_data$Science
# Percentage, assuming each subject is marked out of 100
student_data$Percentage <- round(student_data$Total / 3, 2)
student_data
##     Name Math English Science Total Percentage
## 1   Aman   78      80      76   234      78.00
## 2   Riya   92      85      89   266      88.67
## 3  Karan   67      72      70   209      69.67
## 4  Priya   88      90      91   269      89.67
## 5  Rahul   73      75      78   226      75.33
## 6  Sneha   95      93      96   284      94.67
## 7 Vikash   60      68      65   193      64.33
## 8   Puja   84      87      85   256      85.33
str(student_data)
## 'data.frame': 8 obs. of 6 variables:
## $ Name : chr "Aman" "Riya" "Karan" "Priya" ...
## $ Math : num 78 92 67 88 73 95 60 84
## $ English : num 80 85 72 90 75 93 68 87
## $ Science : num 76 89 70 91 78 96 65 85
## $ Total : num 234 266 209 269 226 284 193 256
## $ Percentage: num 78 88.7 69.7 89.7 75.3 ...
summary(student_data)
##      Name                Math          English         Science
##  Length:8           Min.   :60.00   Min.   :68.00   Min.   :65.00
##  Class :character   1st Qu.:71.50   1st Qu.:74.25   1st Qu.:74.50
##  Mode  :character   Median :81.00   Median :82.50   Median :81.50
##                     Mean   :79.62   Mean   :81.25   Mean   :81.25
##                     3rd Qu.:89.00   3rd Qu.:87.75   3rd Qu.:89.50
##                     Max.   :95.00   Max.   :93.00   Max.   :96.00
##      Total         Percentage
##  Min.   :193.0   Min.   :64.33
##  1st Qu.:221.8   1st Qu.:73.92
##  Median :245.0   Median :81.67
##  Mean   :242.1   Mean   :80.71
##  3rd Qu.:266.8   3rd Qu.:88.92
##  Max.   :284.0   Max.   :94.67
ggplot(student_data, aes(x = Name, y = Percentage, fill = Name)) +
  geom_col() +
  ggtitle("Student Percentage") +
  theme_minimal()
# Subject-wise comparison
# Reshape to long format: one row per student and subject
subject_data <- student_data %>%
  pivot_longer(cols = c(Math, English, Science),
               names_to = "Subject",
               values_to = "Marks")
ggplot(subject_data, aes(x = Name, y = Marks, fill = Subject)) +
  geom_col(position = "dodge") +
  ggtitle("Subject-wise Marks Comparison") +
  theme_minimal()
# Average marks in each subject
avg_marks <- data.frame(
  Subject = c("Math", "English", "Science"),
  Average = c(
    mean(student_data$Math),
    mean(student_data$English),
    mean(student_data$Science)
  )
)
ggplot(avg_marks, aes(x = Subject, y = Average, fill = Subject)) +
  geom_col() +
  ggtitle("Average Marks in Each Subject") +
  theme_minimal()
The analysis shows that Sneha has the highest percentage (94.67%) among all students. English and Science scores are generally higher than Math scores: both average 81.25, against 79.62 for Math. The graphs make it easy to compare student performance visually.
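
The conclusions above can also be checked numerically rather than read off the charts. The following is a minimal base R sketch, not part of the original analysis: it re-creates the marks table so it runs on its own, and the names `subjects`, `top_student`, `subject_means`, and `math_lowest` are introduced here only for illustration. It assumes each subject is marked out of 100.

```r
# Re-create the marks table from the report so this sketch is self-contained.
student_data <- data.frame(
  Name    = c("Aman", "Riya", "Karan", "Priya", "Rahul", "Sneha", "Vikash", "Puja"),
  Math    = c(78, 92, 67, 88, 73, 95, 60, 84),
  English = c(80, 85, 72, 90, 75, 93, 68, 87),
  Science = c(76, 89, 70, 91, 78, 96, 65, 85)
)
subjects <- c("Math", "English", "Science")  # illustrative helper vector

# Percentage per student, assuming each subject is out of 100 (assumption).
student_data$Percentage <- round(rowSums(student_data[subjects]) / length(subjects), 2)

# Student with the highest percentage.
top_student <- student_data$Name[which.max(student_data$Percentage)]

# Mean mark in each subject.
subject_means <- colMeans(student_data[subjects])

# Number of students whose Math mark is lower than both other marks.
math_lowest <- sum(student_data$Math < student_data$English &
                   student_data$Math < student_data$Science)

print(top_student)
print(subject_means)
print(math_lowest)
```

Counting the students for whom Math is the weakest subject gives a per-student view of "generally higher", which the subject averages alone cannot show.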