This report analyzes how the first names of students in our Data Science class are distributed by their initial letter.
First, create a data frame of the students' full names, then derive each student's first name, last name, and mail ID:
```r
# Full names in "Last, First [Middle ...]" format
students_data <- data.frame(
  full_name = c(
    "Gundapuneni, Srividya",
    "Leleina, Boaz",
    "Liu, Zehua",
    "Machiraju, Phaneendra sarath kumar",
    "Momanyi, Pauline nyaboke",
    "Mondi, Bhargav",
    "Muchiri, Dennis mwangi",
    "Muchue, Victoria Wanjiru",
    "Nalla, Lohith reddy",
    "Ndisya, Richard mulu",
    "Njoroge, John kiragu",
    "Oluko, Eddy kingstone",
    "Patel, Falgun yogeshkumar",
    "Putheti, Vamsi krishna",
    "Qiao, Feifan",
    "Sagwa, Susan",
    "Vemula, Harshith sai",
    "Vemula, Sai deepthi",
    "Wanjira, Janet gathoni"
  ),
  stringsAsFactors = FALSE  # keep names as character (factors are the default before R 4.0)
)

# Split "Last, First Middle" into the first given name and the last name
extract_names <- function(full_name) {
  parts <- strsplit(full_name, ", ")[[1]]
  last_name <- parts[1]
  first_name <- strsplit(parts[2], " ")[[1]][1]
  c(first_name, last_name)
}

# sapply() returns a 2 x n matrix; transpose it to get one row per student
names_list <- t(sapply(students_data$full_name, extract_names, USE.NAMES = FALSE))
students_data$first_name <- names_list[, 1]
students_data$last_name <- names_list[, 2]

# Mail ID: lower-cased first initial + last name
students_data$mail_id <- paste0(
  tolower(substr(students_data$first_name, 1, 1)),
  tolower(students_data$last_name),
  "@student.edu"
)

students_df <- students_data[, c("first_name", "last_name", "mail_id")]
knitr::kable(students_df, caption = "Student Information")
```
| first_name | last_name | mail_id |
|---|---|---|
| Srividya | Gundapuneni | sgundapuneni@student.edu |
| Boaz | Leleina | bleleina@student.edu |
| Zehua | Liu | zliu@student.edu |
| Phaneendra | Machiraju | pmachiraju@student.edu |
| Pauline | Momanyi | pmomanyi@student.edu |
| Bhargav | Mondi | bmondi@student.edu |
| Dennis | Muchiri | dmuchiri@student.edu |
| Victoria | Muchue | vmuchue@student.edu |
| Lohith | Nalla | lnalla@student.edu |
| Richard | Ndisya | rndisya@student.edu |
| John | Njoroge | jnjoroge@student.edu |
| Eddy | Oluko | eoluko@student.edu |
| Falgun | Patel | fpatel@student.edu |
| Vamsi | Putheti | vputheti@student.edu |
| Feifan | Qiao | fqiao@student.edu |
| Susan | Sagwa | ssagwa@student.edu |
| Harshith | Vemula | hvemula@student.edu |
| Sai | Vemula | svemula@student.edu |
| Janet | Wanjira | jwanjira@student.edu |
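The `extract_names()` helper above parses one name per call. As a sketch only, the same parsing can be done in one vectorized pass with base-R regular expressions. It assumes every entry follows the "Last, First [Middle ...]" pattern, and the four-name vector is a subset of the class list chosen for illustration. The `anyDuplicated()` check is an added safeguard, since two students sharing a first initial and last name would receive the same address under this scheme.

```r
# Sketch: vectorized name parsing, assuming "Last, First [Middle ...]" entries
full_name <- c("Gundapuneni, Srividya",
               "Machiraju, Phaneendra sarath kumar",
               "Vemula, Harshith sai",
               "Vemula, Sai deepthi")

# Last name: everything before the first comma
last_name <- sub(",.*$", "", full_name)
# First name: the first word after the comma and any following spaces
first_name <- sub("^[^,]*,\\s*(\\S+).*$", "\\1", full_name, perl = TRUE)

mail_id <- paste0(tolower(substr(first_name, 1, 1)),
                  tolower(last_name),
                  "@student.edu")

# Stop if two students would be given the same generated address
stopifnot(!anyDuplicated(mail_id))

data.frame(first_name, last_name, mail_id)
```

The two Vemula entries show why the first initial matters: without it, both students would map to the same address.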
Next, count how many students' first names start with each letter of the alphabet:
```r
# Upper-case first initial of each student's first name
first_letters <- toupper(substr(students_data$first_name, 1, 1))

# Start with a zero count for every letter A-Z
all_letters <- data.frame(
  letter = LETTERS,
  count = 0
)

# Fill in the observed counts; letters that never occur stay at 0
letter_counts <- table(first_letters)
for (letter in names(letter_counts)) {
  all_letters$count[all_letters$letter == letter] <- letter_counts[letter]
}

knitr::kable(all_letters, caption = "Distribution of first letters in student names")
```
| letter | count |
|---|---|
| A | 0 |
| B | 2 |
| C | 0 |
| D | 1 |
| E | 1 |
| F | 2 |
| G | 0 |
| H | 1 |
| I | 0 |
| J | 2 |
| K | 0 |
| L | 1 |
| M | 0 |
| N | 0 |
| O | 0 |
| P | 2 |
| Q | 0 |
| R | 1 |
| S | 3 |
| T | 0 |
| U | 0 |
| V | 2 |
| W | 0 |
| X | 0 |
| Y | 0 |
| Z | 1 |
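The loop above fills a pre-built 26-row table one letter at a time. A loop-free sketch of the same count declares all 26 letters as factor levels, so `table()` reports zeros for absent letters directly. The first names are hard-coded from the student table so the block runs on its own.

```r
# First names copied from the student table above
first_names <- c("Srividya", "Boaz", "Zehua", "Phaneendra", "Pauline",
                 "Bhargav", "Dennis", "Victoria", "Lohith", "Richard",
                 "John", "Eddy", "Falgun", "Vamsi", "Feifan",
                 "Susan", "Harshith", "Sai", "Janet")
first_letters <- toupper(substr(first_names, 1, 1))

# With all 26 letters as levels, table() also counts letters that never occur
letter_counts <- table(factor(first_letters, levels = LETTERS))
all_letters <- data.frame(letter = LETTERS,
                          count = as.integer(letter_counts))

# Sanity check: every student is counted exactly once
stopifnot(sum(all_letters$count) == length(first_names))

# Show only the letters that actually occur
all_letters[all_letters$count > 0, ]
```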
A bar plot of the counts, with one color per letter:
```r
library(ggplot2)

letter_plot <- ggplot(all_letters, aes(x = letter, y = count, fill = letter)) +
  geom_col(width = 0.7) +
  # Label only the non-empty bars
  geom_text(aes(label = ifelse(count > 0, count, "")), vjust = -0.5, size = 3.5) +
  scale_fill_viridis_d() +  # discrete viridis palette, one color per letter
  labs(
    title = "Count of Student Names by First Letter",
    x = "First Letter of Name",
    y = "Number of Students"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.text = element_text(face = "bold"),
    legend.position = "none"  # the x-axis already identifies each letter
  ) +
  # Integer ticks only; extra headroom keeps labels above the tallest bar visible
  scale_y_continuous(
    breaks = seq(0, max(all_letters$count), by = 1),
    expand = expansion(mult = c(0, 0.1))
  )

print(letter_plot)
ggsave("student_name_distribution.png", letter_plot, width = 10, height = 5)
```
The analysis shows how the first letters of student first names are distributed in our class. Only 12 of the 26 letters occur as a first initial; the other 14 (A, C, G, I, K, M, N, O, Q, T, U, W, X, Y) do not appear at all. With just 19 students, the counts are small: no letter accounts for more than 3 students.
In particular, S is the most common starting letter, with 3 students (Srividya, Susan, and Sai), followed by B, F, J, P, and V with 2 students each.
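These conclusions can be checked directly from the counts instead of being read off the plot. The sketch below, again using first names hard-coded from the student table, lists the letter or letters tied for the maximum and counts the letters that never occur.

```r
# First names copied from the student table above
first_names <- c("Srividya", "Boaz", "Zehua", "Phaneendra", "Pauline",
                 "Bhargav", "Dennis", "Victoria", "Lohith", "Richard",
                 "John", "Eddy", "Falgun", "Vamsi", "Feifan",
                 "Susan", "Harshith", "Sai", "Janet")
counts <- table(factor(toupper(substr(first_names, 1, 1)), levels = LETTERS))

# All letters tied for the highest count (handles ties automatically)
top <- names(counts)[counts == max(counts)]
# Letters that never occur as a first initial
unused <- names(counts)[counts == 0]

cat("Most common:", top, "with", max(counts), "students\n")
cat("Unused letters:", length(unused), "of", length(LETTERS), "\n")
```

Selecting every letter equal to the maximum, rather than calling `which.max()`, avoids silently reporting only one letter when several are tied.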