First import the CSV file. NOTE: It’s very important that the columns are named like this, with ‘.Absent.S5’, ‘.Level.S5’, and ‘.Mark.S5’ on the end of each column name, because that’s what the code will look for.
F5.Results <- read.csv('/Users/jhonan/Downloads/students.csv',sep = ',', header = TRUE)
head(F5.Results)
## StudentName Dummy Accounting.Absent.S5 Accounting.Level.S5 Accounting.Mark.S5
## 1 Student01 0 H 60
## 2 Student02 0 A H NA
## 3 Student03 0 O 99
## 4 Student04 0 O 27
## 5 Student05 0 O 78
## 6 Student06 0 H 45
## Business.Absent.S5 Business.Level.S5 Business.Mark.S5
## 1 H 80
## 2 A O NA
## 3 H 67
## 4 O 68
## 5 H 78
## 6 A O NA
Now we’re going to rearrange the data to make it easier to work with and also more future-proof. We already have subject names in the column headers so we’ll just use whatever values are in the column names and turn these into rows - This way if a new subject column gets added or a subject is removed or renamed in the future it’ll automatically get imported and included in the calculations.
We’re going to create three dataframes from the file we imported. StudentMark will contain the student’s marks, StudentLevel contains the level (O or H), and StudentAbsent contains the ‘A’ value. These are just temporary dataframes, we’re going to combine them back together in the next step.
# These functions are in the tidyverse package
# install.packages("tidyverse") if you don't already have it
library(stringr, warn.conflicts=F, quietly=T) # for the str_remove() function
library(dplyr, warn.conflicts=F, quietly=T) # for the select() and mutate() functions
library(tibble, warn.conflicts=F, quietly=T) # tibbles are like dataframes but easier
library(tidyr, warn.conflicts=F, quietly=T) # for the pivot_longer() function
# 'matches' picks out the columns with either 'StudentName' or 'Mark' in the name
# 'pivot_longer' turns the columns into rows for the columns selected above
# 'mutate' and 'str_remove' deletes the '.Mark.S5' part from the end of the subject name
StudentMark <- F5.Results %>%
select(matches('StudentName|Mark')) %>%
pivot_longer(!StudentName, names_to = "Subject", values_to = "Mark") %>%
mutate(Subject = str_remove(Subject, ".Mark.S5"))
# 'matches' picks out the columns with either 'StudentName' or 'Level' in the name
StudentLevel <- F5.Results %>%
select(matches('StudentName|Level')) %>%
pivot_longer(!StudentName, names_to = "Subject", values_to = "Level") %>%
mutate(Subject = str_remove(Subject, ".Level.S5"))
# 'matches' picks out the columns with either 'StudentName' or 'Absent' in the name
StudentAbsent <- F5.Results %>%
select(matches('StudentName|Absent')) %>%
pivot_longer(!StudentName, names_to = "Subject", values_to = "Absent") %>%
mutate(Subject = str_remove(Subject, ".Absent.S5"))
Combine the three dataframes back into one using a left join function. The StudentFinal table contains everything but instead of storing it as multiple columns, we have it as rows.
StudentFinal <-
left_join(StudentMark, StudentLevel,
by = c("StudentName" = "StudentName", "Subject" = "Subject")) %>%
left_join(., StudentAbsent,
by = c("StudentName" = "StudentName", "Subject" = "Subject"))
head(StudentFinal)
## # A tibble: 6 × 5
## StudentName Subject Mark Level Absent
## <chr> <chr> <int> <chr> <chr>
## 1 Student01 Accounting 60 H ""
## 2 Student01 Business 80 H ""
## 3 Student02 Accounting NA H "A"
## 4 Student02 Business NA O "A"
## 5 Student03 Accounting 99 O ""
## 6 Student03 Business 67 H ""
Now the final output. Note there are two max values for ‘Business’. group_by() and top_n() does the ‘find max Mark for each Subject’ part. select() outputs only the columns I want. arrange() orders the values by subject and level.
StudentFinal %>%
group_by(Subject) %>%
top_n(1, Mark) %>%
select(Subject, StudentName, Level, Mark) %>%
arrange(Subject, Level)
## # A tibble: 3 × 4
## # Groups: Subject [2]
## Subject StudentName Level Mark
## <chr> <chr> <chr> <int>
## 1 Accounting Student03 O 99
## 2 Business Student01 H 80
## 3 Business Student07 O 80