students.knit

First import the CSV file. NOTE: It’s very important that the columns are named like this, with ‘.Absent.S5’, ‘.Level.S5’, and ‘.Mark.S5’ on the end of each column name, because that’s what the code will look for.

F5.Results <- read.csv('/Users/jhonan/Downloads/students.csv',sep = ',', header = TRUE)
head(F5.Results)

##   StudentName Dummy Accounting.Absent.S5 Accounting.Level.S5 Accounting.Mark.S5
## 1   Student01     0                                        H                 60
## 2   Student02     0                    A                   H                 NA
## 3   Student03     0                                        O                 99
## 4   Student04     0                                        O                 27
## 5   Student05     0                                        O                 78
## 6   Student06     0                                        H                 45
##   Business.Absent.S5 Business.Level.S5 Business.Mark.S5
## 1                                    H               80
## 2                  A                 O               NA
## 3                                    H               67
## 4                                    O               68
## 5                                    H               78
## 6                  A                 O               NA

Now we’re going to rearrange the data to make it easier to work with and also more future-proof. We already have subject names in the column headers so we’ll just use whatever values are in the column names and turn these into rows - This way if a new subject column gets added or a subject is removed or renamed in the future it’ll automatically get imported and included in the calculations.

We’re going to create three dataframes from the file we imported. StudentMark will contain the student’s marks, StudentLevel contains the level (O or H), and StudentAbsent contains the ‘A’ value. These are just temporary dataframes, we’re going to combine them back together in the next step.

# These functions are in the tidyverse package
# install.packages("tidyverse") if you don't already have it
library(stringr, warn.conflicts=F, quietly=T) # for the str_remove() function
library(dplyr, warn.conflicts=F, quietly=T) # for the select() and mutate() functions
library(tibble, warn.conflicts=F, quietly=T) # tibbles are like dataframes but easier
library(tidyr, warn.conflicts=F, quietly=T) # for the pivot_longer() function

# 'matches' picks out the columns with either 'StudentName' or 'Mark' in the name
# 'pivot_longer' turns the columns into rows for the columns selected above
# 'mutate' and 'str_remove' deletes the '.Mark.S5' part from the end of the subject name
StudentMark <- F5.Results %>%
  select(matches('StudentName|Mark')) %>%
  pivot_longer(!StudentName, names_to = "Subject", values_to = "Mark") %>% 
  mutate(Subject = str_remove(Subject, ".Mark.S5"))

# 'matches' picks out the columns with either 'StudentName' or 'Level' in the name
StudentLevel <- F5.Results %>%
  select(matches('StudentName|Level')) %>%
  pivot_longer(!StudentName, names_to = "Subject", values_to = "Level") %>%
  mutate(Subject = str_remove(Subject, ".Level.S5"))

# 'matches' picks out the columns with either 'StudentName' or 'Absent' in the name
StudentAbsent <- F5.Results %>%
  select(matches('StudentName|Absent')) %>%
  pivot_longer(!StudentName, names_to = "Subject", values_to = "Absent") %>%
  mutate(Subject = str_remove(Subject, ".Absent.S5"))

Combine the three dataframes back into one using a left join function. The StudentFinal table contains everything but instead of storing it as multiple columns, we have it as rows.

StudentFinal <-
  left_join(StudentMark, StudentLevel,
            by = c("StudentName" = "StudentName", "Subject" = "Subject")) %>%
  left_join(., StudentAbsent,
            by = c("StudentName" = "StudentName", "Subject" = "Subject"))

head(StudentFinal)

## # A tibble: 6 × 5
##   StudentName Subject     Mark Level Absent
##   <chr>       <chr>      <int> <chr> <chr> 
## 1 Student01   Accounting    60 H     ""    
## 2 Student01   Business      80 H     ""    
## 3 Student02   Accounting    NA H     "A"   
## 4 Student02   Business      NA O     "A"   
## 5 Student03   Accounting    99 O     ""    
## 6 Student03   Business      67 H     ""

Now the final output. Note there are two max values for ‘Business’. group_by() and top_n() does the ‘find max Mark for each Subject’ part. select() outputs only the columns I want. arrange() orders the values by subject and level.

StudentFinal %>%
  group_by(Subject) %>%
  top_n(1, Mark) %>%
  select(Subject, StudentName, Level, Mark) %>%
  arrange(Subject, Level)

## # A tibble: 3 × 4
## # Groups:   Subject [2]
##   Subject    StudentName Level  Mark
##   <chr>      <chr>       <chr> <int>
## 1 Accounting Student03   O        99
## 2 Business   Student01   H        80
## 3 Business   Student07   O        80