One part of programming involves writing functions. We have already seen examples of functions. Those functions primarily rely on existing functions. However, most useful functions also use control structures.
Control structures are expressions that control the flow of execution of a program based on the conditions they are given. They let the program make a decision after evaluating a variable or expression.
Control structures allow you to put some “logic” into your R code. They work in much the same way in other languages such as Python and C.
If-else
This is called conditional execution, as the execution depends on some condition.
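A minimal sketch of a conditional with only an if branch. The names score and result, and the pass mark of 50, are made up for illustration:

```r
# Hypothetical exam score, chosen only for illustration
score <- 72
result <- "not assessed"

# The body runs only when the condition is TRUE
if (score >= 50) {
  result <- "pass"
}
print(result)
```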
The code above does nothing if the condition is FALSE. If you want something to happen in that case as well, add an else clause after the if block.
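A sketch of the same idea with an else branch added, again using the made-up names score and result:

```r
# Hypothetical exam score below the assumed pass mark of 50
score <- 42

if (score >= 50) {
  result <- "pass"
} else {
  # Runs only when the condition above is FALSE
  result <- "fail"
}
print(result)
```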
If you would like to add more conditions, you can chain them with else if.
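A short sketch of chained conditions. The function grade_score and its thresholds are hypothetical, chosen only to show how the branches are tested in order:

```r
# Hypothetical grading function: the first TRUE condition wins
grade_score <- function(score) {
  if (score >= 70) {
    "distinction"
  } else if (score >= 50) {
    "pass"
  } else {
    "fail"
  }
}

grades <- c(grade_score(82), grade_score(55), grade_score(30))
print(grades)
```

Because the branches are checked from top to bottom, the second condition does not need to repeat score < 70.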
Just as a brief recap:
- Comparison operators: <, >, <=, >=, ==, !=.
- Logical operators: !, &, |, &&, ||, xor().
- Membership operators: %in%, ! ... %in% (character operations).
calculate_uk_income_tax <- function(income) {
  tax <- 0
  # No tax up to 12,500
  if (income <= 12500) {
    tax <- 0
  }
  # 20% on income between 12,500 and 50,000
  if (income > 12500 && income <= 50000) {
    tax <- 0.2 * (income - 12500)
  }
  # 40% on income between 50,000 and 150,000
  if (income > 50000 && income <= 150000) {
    tax <- 0.2 * (50000 - 12500) + 0.4 * (income - 50000)
  }
  # 45% on income above 150,000
  if (income > 150000) {
    tax <- 0.2 * (50000 - 12500) + 0.4 * (150000 - 50000) + 0.45 * (income - 150000)
  }
  return(tax)
}
or
calculate_uk_income_tax2 <- function(income) {
  tax <- 0
  if (income <= 12500) {
    tax <- 0
  } else if (income <= 50000) {
    tax <- 0.2 * (income - 12500)
  } else if (income <= 150000) {
    tax <- 0.2 * (50000 - 12500) + 0.4 * (income - 50000)
  } else {
    tax <- 0.2 * (50000 - 12500) + 0.4 * (150000 - 50000) + 0.45 * (income - 150000)
  }
  return(tax)
}
If your code is nested or too long, you may want to put sanity checks in your function:
- stop(): signals an error and immediately stops execution.
- warning(): issues a warning, but execution continues.
ifelse() also provides control flow, applied element by element to a vector. Its general syntax is ifelse(test, yes, no): for each element it returns the yes value where test is TRUE and the no value otherwise.
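For example, the sketch below first uses ifelse() on a vector of scores and then shows stop() and warning() as sanity checks. The helper check_income, its messages, and the cut-off of 65 are assumptions made for illustration:

```r
# Vectorised choice with ifelse(test, yes, no)
english <- c(60, 66, 70, 73, 55)
outcome <- ifelse(english >= 65, "above 65", "below 65")
print(outcome)

# Hypothetical helper illustrating stop() and warning()
check_income <- function(income) {
  if (!is.numeric(income) || length(income) != 1) {
    # Execution ends here with an error
    stop("income must be a single number")
  }
  if (income < 0) {
    # Execution continues after the warning is issued
    warning("negative income supplied; treating it as zero")
    income <- 0
  }
  income
}

print(check_income(20000))
```

Calling check_income("a") would stop with an error, whereas check_income(-5) would return 0 and issue a warning.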
For loop
In addition to conditional execution, repetitive execution is very common. Such repetition is done with loops, and for is the simplest form of loop.
A typical use is to loop over an integer sequence \(i=\{1,2,3,...,n\}\).
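The code for this loop is not shown here. A sketch that would print lines of the form shown below, assuming cat() followed by print() was used inside the loop:

```r
# Loop over the integer sequence 1, 2, ..., 10
for (i in 1:10) {
  cat("This is iteration")  # cat() does not add a newline
  print(i)                  # print() adds the "[1]" prefix and the newline
}
```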
This is iteration[1] 1
This is iteration[1] 2
This is iteration[1] 3
This is iteration[1] 4
This is iteration[1] 5
This is iteration[1] 6
This is iteration[1] 7
This is iteration[1] 8
This is iteration[1] 9
This is iteration[1] 10
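Obviously this loop is pointless. However, loops become useful when each iteration does real work, such as computing the average score of each student: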
library(tidyverse)
exam_df <- data.frame(
  name = c("Amos", "Barnabas", "Chris", "Damien", "Ester", "Fairuz", "Gao"),
  year = c("Junior", "Senior", "Senior", "Senior", "Junior", "Senior", "Junior"),
  english = c(60, 66, 70, 73, 55, 60, 70),
  maths = c(90, 55, 63, 76, 52, 80, 64),
  science = c(70, 62, 57, 43, 75, 80, 82),
  history = c(55, 45, 62, 90, 41, 57, 60),
  economics = c(42, 45, 60, 44, 57, 65, 39),
  stringsAsFactors = FALSE
)
# get the student names
students <- exam_df$name
# loop through each student by filtering data for that student
for (student in students) {
  scores <- exam_df %>%
    filter(name == student) %>%
    # mean() averages only its first argument, so combine the scores with c()
    mutate(avgscore = mean(c(english, maths, science, history, economics)))
  # create student performance with their names
  sp <- c(student, scores$avgscore)
  cat(sp, "\n")
}
Amos 63.4
Barnabas 54.6
Chris 62.4
Damien 65.2
Ester 56
Fairuz 68.4
Gao 63
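Note that there is a group of functions, apply(), lapply(), sapply(), vapply(), tapply() and mapply(), that can be used instead of for loops. They are known as the apply functions. The general syntax is apply(X, MARGIN, FUN), where X is an array or matrix, MARGIN indicates whether the function is applied over rows (1) or columns (2), and FUN is the function to apply.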
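As a small illustration of the apply(X, MARGIN, FUN) syntax, the sketch below uses a made-up two-student score matrix. The object names scores, row_means and col_means are assumptions:

```r
# Hypothetical score matrix: rows are students, columns are subjects
scores <- matrix(
  c(60, 90, 70,
    66, 55, 62),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Amos", "Barnabas"), c("english", "maths", "science"))
)

row_means <- apply(scores, 1, mean)  # MARGIN = 1: one value per student
col_means <- apply(scores, 2, mean)  # MARGIN = 2: one value per subject
print(row_means)
print(col_means)
```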
calculate_avg_score <- function(student) {
  scores <- exam_df %>%
    filter(name == student) %>%
    # combine the scores with c() so that mean() averages all five subjects
    mutate(avgscore = mean(c(english, maths, science, history, economics)))
  sp <- c(student, scores$avgscore)
  return(sp)
}
# get the student names
students <- exam_df$name
# Apply the function to each student using lapply
result <- lapply(students, calculate_avg_score)
# Print the results
print(result)
[[1]]
[1] "Amos" "63.4"
[[2]]
[1] "Barnabas" "54.6"
[[3]]
[1] "Chris" "62.4"
[[4]]
[1] "Damien" "65.2"
[[5]]
[1] "Ester" "56"
[[6]]
[1] "Fairuz" "68.4"
[[7]]
[1] "Gao" "63"
Now if we go back to loops, we could get the average of each subject in each year.
# Unique years
unique_years <- unique(exam_df$year)
# Initialize an empty data frame to store results
average_scores <- data.frame(Year = character(0), Subject = character(0), AvgScore = numeric(0))
# Loop through each year
for (year in unique_years) {
  year_data <- exam_df[exam_df$year == year, ]
  print(year_data)
  # Loop through the five subject columns
  for (subject in colnames(year_data)[3:7]) {
    avg_score <- mean(year_data[[subject]])
    average_scores <- rbind(average_scores, data.frame(Year = year, Subject = subject, AvgScore = avg_score))
  }
}
# Print the results
print(average_scores)
Task: Write a for loop that finds the top performers in each subject in each year.
while loop
The while loop is another common form of loop: it keeps repeating its body for as long as its condition is TRUE.
Here’s an example of a while loop for our toy data set.
i <- 1
while (i <= nrow(exam_df)) {
  student <- exam_df$name[i]
  english_score <- exam_df$english[i]
  cat("Student:", student, "- English Score:", english_score, "\n")
  # move to the next row; without this the loop would never end
  i <- i + 1
}
Student: Amos - English Score: 60
Student: Barnabas - English Score: 66
Student: Chris - English Score: 70
Student: Damien - English Score: 73
Student: Ester - English Score: 55
Student: Fairuz - English Score: 60
Student: Gao - English Score: 70
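The shorter listing that follows appears to come from a second loop whose code is not shown. One loop that would print those four names is sketched here, assuming the condition was an English score above 60. That threshold is a guess; any cut-off from 60 up to just under 66 selects the same students. The data frame exam_small and the vector selected are hypothetical names:

```r
# Names and English scores copied from exam_df so the sketch runs on its own
exam_small <- data.frame(
  name = c("Amos", "Barnabas", "Chris", "Damien", "Ester", "Fairuz", "Gao"),
  english = c(60, 66, 70, 73, 55, 60, 70),
  stringsAsFactors = FALSE
)

selected <- character(0)
i <- 1
while (i <= nrow(exam_small)) {
  # Assumed condition: English score above 60
  if (exam_small$english[i] > 60) {
    cat("Student:", exam_small$name[i], "\n")
    selected <- c(selected, exam_small$name[i])
  }
  i <- i + 1
}
```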
Student: Barnabas
Student: Chris
Student: Damien
Student: Gao
repeat loop
repeat runs its body over and over and has no exit condition of its own, so it must be combined with a break statement to leave the loop. Note that this type of loop is less common.
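A minimal sketch of repeat with break. The variables total and n and the stopping rule are made up for illustration:

```r
total <- 0
n <- 0
repeat {
  n <- n + 1
  total <- total + n
  # Without this break the loop would never end
  if (total > 20) {
    break
  }
}
cat("Stopped after", n, "iterations with total", total, "\n")
```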
In some cases, when you want to skip the rest of an iteration and move on to the next one, you can use next.
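A short sketch of skipping iterations with next. The vector odd_numbers is a made-up name:

```r
odd_numbers <- integer(0)
for (i in 1:10) {
  if (i %% 2 == 0) {
    next  # skip even numbers and go straight to the next iteration
  }
  odd_numbers <- c(odd_numbers, i)
}
print(odd_numbers)
```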
Obviously, you can never get too much coding practice. One resource that I’d recommend is swirl.
In this part we will use the toolkit to produce a league table. I will use data from https://www.football-data.co.uk. The data cover the English Premier League (EPL) 2022/23 season.
Note that we will not use all variables in the data set. To understand the names of the variables, refer to https://www.football-data.co.uk/notes.txt. Let’s take the variables we will need.
| Column | Content |
|---|---|
| HomeTeam | Name of the team that played at home |
| AwayTeam | Name of the team that played away |
| FTHG | Full Time Home Goals |
| FTAG | Full Time Away Goals |
In order to create a league table:
- We need to know the total number of wins and draws for each team.
- We need to calculate the number of league points won by each team. The winner gets 3 points and the loser gets 0 (zero) points. Each team gets 1 point in case of a draw.
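The two steps above can be sketched with a for loop and if / else if / else. This is only one possible approach, run on a made-up set of four matches between hypothetical teams A, B and C rather than the real EPL file. The column names of matches follow the table above, and the output columns mirror the league table shown below:

```r
# Made-up results for illustration; the real data would be read from the site
matches <- data.frame(
  HomeTeam = c("A", "B", "C", "A"),
  AwayTeam = c("B", "C", "A", "C"),
  FTHG = c(2, 1, 0, 3),
  FTAG = c(1, 1, 2, 3),
  stringsAsFactors = FALSE
)

# One row per team, with counters starting at zero
teams <- sort(unique(c(matches$HomeTeam, matches$AwayTeam)))
league <- data.frame(Team = teams, TotalWins = 0, TotalLosses = 0,
                     TotalDraw = 0, stringsAsFactors = FALSE)

for (m in seq_len(nrow(matches))) {
  home <- match(matches$HomeTeam[m], league$Team)
  away <- match(matches$AwayTeam[m], league$Team)
  if (matches$FTHG[m] > matches$FTAG[m]) {
    # Home win
    league$TotalWins[home] <- league$TotalWins[home] + 1
    league$TotalLosses[away] <- league$TotalLosses[away] + 1
  } else if (matches$FTHG[m] < matches$FTAG[m]) {
    # Away win
    league$TotalWins[away] <- league$TotalWins[away] + 1
    league$TotalLosses[home] <- league$TotalLosses[home] + 1
  } else {
    # Draw: both teams are credited
    league$TotalDraw[home] <- league$TotalDraw[home] + 1
    league$TotalDraw[away] <- league$TotalDraw[away] + 1
  }
}

# 3 points per win, 1 per draw, 0 per loss
league$TotalPoints <- 3 * league$TotalWins + league$TotalDraw
league <- league[order(-league$TotalPoints), ]
rownames(league) <- NULL
print(league)
```

The same result could be obtained with grouped summaries in dplyr. The loop version is used here because it exercises the control structures from this section.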
# A tibble: 20 × 5
Team TotalWins TotalLosses TotalDraw TotalPoints
<chr> <int> <int> <int> <dbl>
1 Man City 28 5 5 89
2 Arsenal 26 6 6 84
3 Man United 23 9 6 75
4 Newcastle 19 5 14 71
5 Liverpool 19 9 10 67
6 Brighton 18 12 8 62
7 Aston Villa 18 13 7 61
8 Tottenham 18 14 6 60
9 Brentford 15 9 14 59
10 Fulham 15 16 7 52
11 Crystal Palace 11 15 12 45
12 Chelsea 11 16 11 44
13 Wolves 11 19 8 41
14 West Ham 11 20 7 40
15 Bournemouth 11 21 6 39
16 Nott'm Forest 9 18 11 38
17 Everton 8 18 12 36
18 Leicester 9 22 7 34
19 Leeds 7 21 10 31
20 Southampton 6 25 7 25
This part uses a very famous dataset called gapminder.
Create a variable incomeCategory by using gdpPercap. If the gdpPercap is less than 1000, then the country is called Low Income. If the gdpPercap is between 1000 and 12000, then the country is called Middle Income. If the gdpPercap is higher than 12000, then the country is called High Income.
Use a for loop to calculate the average life expectancy for each income category.
For the rest, use the original data set.
4. Calculate the average life expectancy for each continent.
lag() that might be helpful for this purpose.