Student Name Analysis

Student Data Analysis

This report analyzes how the first names of the students in our Data Science class are distributed by their first letter.

Step 1: Creating the Student Data Frame

Create a data frame containing each student's first name, last name, and email ID:

students_data <- data.frame(
  full_name = c(
    "Gundapuneni, Srividya",
    "Leleina, Boaz",
    "Liu, Zehua",
    "Machiraju, Phaneendra sarath kumar",
    "Momanyi, Pauline nyaboke",
    "Mondi, Bhargav",
    "Muchiri, Dennis mwangi",
    "Muchue, Victoria Wanjiru",
    "Nalla, Lohith reddy",
    "Ndisya, Richard mulu",
    "Njoroge, John kiragu",
    "Oluko, Eddy kingstone",
    "Patel, Falgun yogeshkumar",
    "Putheti, Vamsi krishna",
    "Qiao, Feifan",
    "Sagwa, Susan",
    "Vemula, Harshith sai",
    "Vemula, Sai deepthi",
    "Wanjira, Janet gathoni"
  )
)

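# Split "Last, First Middle" into c(first_name, last_name); middle names are dropped
extract_names <- function(full_name) {
  parts <- strsplit(full_name, ", ", fixed = TRUE)[[1]]
  last_name <- parts[1]
  first_name <- strsplit(parts[2], " ", fixed = TRUE)[[1]][1]
  c(first_name, last_name)
}

# One row per student: column 1 = first name, column 2 = last name
names_list <- t(sapply(students_data$full_name, extract_names, USE.NAMES = FALSE))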
students_data$first_name <- names_list[, 1]
students_data$last_name <- names_list[, 2]
students_data$mail_id <- paste0(
  tolower(substr(students_data$first_name, 1, 1)),
  tolower(students_data$last_name),
  "@student.edu"
)
students_df <- students_data[, c("first_name", "last_name", "mail_id")]
knitr::kable(students_df, caption = "Student Information")

Student Information
first_name	last_name	mail_id
Srividya	Gundapuneni	sgundapuneni@student.edu
Boaz	Leleina	bleleina@student.edu
Zehua	Liu	zliu@student.edu
Phaneendra	Machiraju	pmachiraju@student.edu
Pauline	Momanyi	pmomanyi@student.edu
Bhargav	Mondi	bmondi@student.edu
Dennis	Muchiri	dmuchiri@student.edu
Victoria	Muchue	vmuchue@student.edu
Lohith	Nalla	lnalla@student.edu
Richard	Ndisya	rndisya@student.edu
John	Njoroge	jnjoroge@student.edu
Eddy	Oluko	eoluko@student.edu
Falgun	Patel	fpatel@student.edu
Vamsi	Putheti	vputheti@student.edu
Feifan	Qiao	fqiao@student.edu
Susan	Sagwa	ssagwa@student.edu
Harshith	Vemula	hvemula@student.edu
Sai	Vemula	svemula@student.edu
Janet	Wanjira	jwanjira@student.edu
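
Because each mail ID is built from only the first initial and the last name, two students who share a last name could in principle end up with the same ID. The following is a minimal sketch of a uniqueness check, not part of the original report; `make_mail_id` is a hypothetical helper that mirrors the `paste0()` rule from Step 1, and only the two students named Vemula are used as input.

```r
# Hypothetical helper mirroring the paste0() rule used in Step 1:
# first initial + last name, lower-cased, followed by the domain
make_mail_id <- function(first_name, last_name, domain = "@student.edu") {
  paste0(tolower(substr(first_name, 1, 1)), tolower(last_name), domain)
}

# The two students in the table above who share a last name
ids <- make_mail_id(c("Harshith", "Sai"), c("Vemula", "Vemula"))

# anyDuplicated() returns 0 when all elements are distinct
ids_are_unique <- anyDuplicated(ids) == 0

print(ids)
print(ids_are_unique)
```

On the full data frame, the same check would presumably be `anyDuplicated(students_data$mail_id) == 0`.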

Step 2: Counting First Letters

Count how many students' first names start with each letter:

first_letters <- toupper(substr(students_data$first_name, 1, 1))
# Tabulate against all 26 letters so that letters with no students show a count of 0
letter_counts <- table(factor(first_letters, levels = LETTERS))

all_letters <- data.frame(
  letter = LETTERS,
  count = as.integer(letter_counts)
)

knitr::kable(all_letters, caption = "Distribution of first letters in student names")

Distribution of first letters in student names
letter	count
A	0
B	2
C	0
D	1
E	1
F	2
G	0
H	1
I	0
J	2
K	0
L	1
M	0
N	0
O	0
P	2
Q	0
R	1
S	3
T	0
U	0
V	2
W	0
X	0
Y	0
Z	1

Step 3: Visualizing the Distribution

Draw a colorful bar plot of the number of students by the first letter of their first name:

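library(ggplot2)  # provides ggplot(), geom_bar(), ggsave(), and the other plotting functions used below

letter_plot <- ggplot(all_letters, aes(x = letter, y = count, fill = letter)) +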
  geom_bar(stat = "identity", width = 0.7) +
  geom_text(aes(label = ifelse(count > 0, count, "")), vjust = -0.5, size = 3.5) +
  scale_fill_viridis_d() +  # Colorful palette
  labs(
    title = "Count of Student Names by First Letter",
    x = "First Letter of Name",
    y = "Number of Students"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.text = element_text(face = "bold"),
    legend.position = "none"
  ) +
  scale_y_continuous(breaks = seq(0, max(all_letters$count), by = 1))
print(letter_plot)

ggsave("student_name_distribution.png", letter_plot, width = 10, height = 5)

Conclusion

The analysis shows how the first letters of student first names are distributed in our class. The distribution is uneven: only 12 of the 26 letters occur as a first initial, and the remaining 14 don’t appear at all. No letter is used by more than 3 of the 19 students.

In particular, S is the most common starting letter, with 3 students. The letters B, F, J, P, and V follow with 2 students each.
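
The most common starting letter can also be read off programmatically rather than by eye. The sketch below is an illustrative check, not part of the original analysis; it retypes the first initials from the Step 1 table so that it runs on its own, and the names `counts`, `top_letters`, and `letters_used` are assumptions introduced here.

```r
# First initials in the order of the first_name column in the Step 1 table
first_letters <- c("S", "B", "Z", "P", "P", "B", "D", "V", "L", "R",
                   "J", "E", "F", "V", "F", "S", "H", "S", "J")

# Tabulate against all 26 letters so that unused letters count as 0
counts <- table(factor(first_letters, levels = LETTERS))

# Letters tied for the highest count, and how many letters occur at all
top_letters  <- names(counts)[counts == max(counts)]
letters_used <- sum(counts > 0)

print(top_letters)   # "S"
print(max(counts))   # 3
print(letters_used)  # 12
```

Comparing against `max(counts)` rather than calling `which.max()` keeps ties visible, since `which.max()` returns only the first maximum.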