An algorithm is a step by step method of solving a problem, expressed as a finite list of well-defined instructions.
input: an algorithm has zero or more inputs
output: has one or more outputs, which are specifically determined by the inputs (if any)
finiteness: must always terminate after a finite number of steps
definiteness: each step must be precisely defined; the actions to be carried out must be rigorously and unambiguously specified
effectiveness: all operations to be performed must be sufficiently basic that they can be done exactly, and in a finite length of time
an internet search engine:
input: key words in the search box
output: a sorted list of websites
a trip planner
input: starting position, destination, a map of roads
output: the fastest route
a recommendation system (e.g. YouTube, Netflix…)
input: history of user’s utilisation
output: a suggestion of what they may also like
parameter estimation (in statistics)
input: data, model assumptions
output: estimated parameters (for example maximising likelihood, more in ACTL2131)
‘Pseudocode is a detailed yet readable description of what a computer program or algorithm must do, expressed in a formally-styled natural language rather than in a programming language’.
To be read by humans, not computers
Must be clear and concise
Not typically concerned with issues of software engineering
The Collatz sequence goes as follows: start at any positive integer
It is conjectured but not proven that the sequence always reaches \(1\), eventually. For example, the Collatz sequence starting at \(5\) is: \[ 5 \to 16 \to 8 \to 4 \to 2 \to 1 \] A function which takes any positive integer and returns the next number in the sequence is an algorithm! Expressed in pseudocode this would look like:
Declare 'collatzNext' as function of x
If x is 1
result is 1
If x is even
result is x/2
Otherwise
result is 3*x + 1
Return result
Pseudocode:
Let x=1
Let y=1
Let z=x+y
Repeat (n-3) times the following: # we need (n-3) numbers (since we have the first 3)
re-assign variable x to equal current value of y
re-assign variable y to equal current value of z
re-assign variable z to equal current value of x+y
Return z
R code:
# We do this in a function with argument n
get_fibo <- function(n){
x <- 1
y <- 1
z <- 2
for (i in 1:(n-3)){
x <- y # update x
y <- z # update y
z <- x + y # find next number in sequence
}
return(z)
}
get_fibo(7)
## [1] 13
Consider the problem of sorting a list of numbers, from smallest to biggest. How would you use an algorithm to do that task?
Have in mind the characteristics of an algorithm we saw before
Write first your solution as a pseudocode
Then, you can write an actual R code based on your pseudocode
There are many possible sorting algorithms. The one presented here is often called insertion sort
Input: a sequence of \(n\) numbers, \(A=(a_1,a_2,\ldots,a_n)\)
Output: a rearrangement \((a_1',a_2',\ldots,a_n')\) of the input sequence such that \(a_1'\leq a_2'\leq \ldots \leq a_n'\)
General strategy:

Pseudocode:
for j=2 to length(A)
# store the value of A[j] in a variable called key
key <- A[j]
# initiate the comparison with the (j-1)th element
i <- j-1
# the comparison continues as long as A[i]>key (or one exhausts the list)
while i>0 and A[i]>key
A[i+1] = A[i] # shift A[i] to the right (since its bigger than key)
A[i] = key
i=i-1 # move to the previous
In R code:
# We put this in a function 'insert_sort'
insert_sort <- function(A){
# Length of inputted sequence
n <- length(A)
# Scan all numbers, from 2 to n
for (j in 2:n){
# Store the jth value
key <- A[j]
# Start comparison one backwards
i <- j-1
# Do this until we reach 1, or we find a number smaller or equal to 'key'
while (i > 0 && A[i] > key){
# Swap the bigger number with 'key'
A[i+1] <- A[i]
A[i] <- key
i = i - 1
} # End 'while' loop
} # End 'for' loop
return(A)
} # End function
insert_sort(c(2,5,6,7,3,4,5,1))
## [1] 1 2 3 4 5 5 6 7
Clear structure and steps
Informative explanations
Not every detail is there, but each step should be unambiguous. Ask yourself: would a competent programmer be able to implement this without thinking much?
Use indentation: improves readability of codes
Don’t be afraid to comment (#commentsRgreat)
!
The ‘Divide and Conquer’ method to design an algorithm breaks down a complex task into simpler blocks, or ‘sub-problems’. Each of those blocks is a programmable step. Those algorithms have the general structure:
You are given a sorted list of numbers: \(A = (n_1, n_2, \ldots)\) and another number, say \(N\), which might be in the list. You want to find if the number is in the list and if so, its position in the list.
How would you apply the divide and conquer method to solve this? Write this as pseudocode.
A ‘naive’ solution would be to scan all the numbers, and stop whenever you find \(N\). That would work, but would not be efficient.
Instead, we can take advantage of the fact the list is sorted. General strategy:
In pseudocode:
# Let the list be A, with A[i] its i^th element
# Set the Left (L) and Right (R) indices of your searching range
L = 1 and R = n
If N < A[L] OR N > A[R], return 'unsuccessful'
Let m = floor((L+R)/2) # Update the middle point
If (A[m] == N), return m (the search is done)
If R = L, return 'unsuccessful'
If A[m]<N, let L = m + 1 and go back to step "let m = floor(..)"
If A[m]>N, let R = m - 1 and go back to step "let m = floor(..)"
In R code:
# We put this in a function 'find_N'
find_N <- function(A, N){
L <- 1 # Set left bound at 1st position
R <- length(A) # Set right bound at last position
if (N < A[L] || N > A[R]){return('unsuccessful')}
found <- F # Thus far, N has not be found
while (found == F){ # As long as we have not found N, we will keep going
m <- floor((L+R)/2) # Update the middle point
if (A[m] == N){return(paste(N, "is in position",m))}
if (R == L){return('unsuccessful')}
if (A[m]<N){L <- m + 1} # if N is bigger than middle point, update left bound
if (A[m]>N){R <- m - 1} # if N is smaller than middle point, update right bound
}
} # End function
find_N(A=1:50, N = 42)
## [1] "42 is in position 42"
find_N(A=c(4,8,10,12,14), N = 12)
## [1] "12 is in position 4"
find_N(A=c(4,8,10,12,14), N = 13)
## [1] "unsuccessful"
Recursion is a popular approach in ‘dividing and conquering’ a problem.
To cite this Reference:
‘The process in which a function calls itself directly or indirectly is called recursion and the corresponding function is called as recursive function. Using recursive algorithm, certain problems can be solved quite easily.’
An old joke: To understand recursion you must first understand recursion.
A recursive algorithm needs to follow three important ‘Laws’:
This is best understood through a simple example. Suppose we want to compute \(n! = n \times (n-1) \times \dots \times 1\). A classical way to do this is to use a loop as follows:
n <- 10
result<-1
for (i in 1:n){
result<-result*i
}
result
## [1] 3628800
A recursive solution would require us to define a function
my.factorial(n) which is used inside
itself!
my.factorial<-function(n){
# This is the base case
if (n==1){
return(1)
}
# If we are not at the base case, the algorithm moves towards the base case by calling itself
else{
return(n*my.factorial(n-1))
}
}
my.factorial(10)
## [1] 3628800
The \(\verb|my.factorial(10)|\) function call does the following:
\(10 \times \verb|my.factorial(9)|\)
\(10 \times 9 \times \verb|my.factorial(8)|\)
\(10 \times 9 \times 8 \times \verb|my.factorial(7)|\)
\(\dots\)
\(10 \times 9 \times 8 \times \dots 2 \times \verb|my.factorial(1)|\)
\(\verb|my.factorial(1)|\) returns \(1\) because we defined this behaviour here: \[ \verb|if (n==1){return(1)}| \]
Write a recursive algorithm that returns the \(n^{\text{th}}\) number in the Fibonacci sequence \(1,1,2,3,5,8,13,\ldots\)
my.fibo <-function(n){
# This is the base case
if (n==1 || n==2){
return(1)
}
# If we are not at the base case, the algorithm moves towards the base case by calling itself
else{
return(my.fibo(n-1) + my.fibo(n-2))
}
}
my.fibo(7)
## [1] 13
Note: This is a simple example to illustrate what a recursive algorithm is. However, it is a rather inefficient may of solving the Fibonacci sequence problem (much slower than a previous solution we gave). Can you see why?
As another example of algorithm, consider what we call a ‘happy number’:
For example, \(19\) is happy:
\[\begin{align*} & 1^2 + 9^2 = 82 \\ & 8^2 + 2^2 = 68 \\ & 6^2 + 8^2 = 100 \\ & 1^2 + 0^2 + 0^2 = 1. \end{align*}\]
\(4\) is sad:
\[\begin{align*} & 4^2 = 16 \\ & 1^2 + 6^2 = 37 \\ & 3^2 + 7^2 = 58 \\ & 5^2 + 8^2 = 89 \\ & 8^2 + 9^2 = 145 \\ & 1^2 + 4^2 + 5^2= 42 \\ & 4^2 + 2^2 = 20 \\ & 2^2 + 0^2 = 4 \\ & 4^2 = 16 \\ & \ldots \end{align*}\]
The tricky part here is to realise that if we ever get the same number twice in the sequence, then we don’t have a happy number, and the sequence will repeat itself forever. We then want to exit the process and declare the number sad. So, we need to keep a history of all numbers obtained.
In pseudocode:
# We do the task in a function
Declare function 'is_happy' with argument 'x'
Inside the function:
# We store all numbers of the sequence in a vector, S
Declare a vector S
While x is not 1 AND S contains only unique elements
- Isolate all digits of x: d1, d2, etc.
- Update x as d1^2 + d2^2 + ...
- Add x to vector S
# While loop is now completed
If x = 1, return TRUE
Otherwise, return FALSE
In R code:
is_happy <- function(x){
S <- NULL # declare an empty vector, to contain all numbers of the sequence
# while we have not reached 1, and we have not gotten any duplicates
while (x != 1 && any(duplicated(S)) == F){
x <- as.integer(strsplit(as.character(x), '')[[1]]) # get all digits of x in a vector
x <- sum(x^2)
S <- c(S,x)
}
ifelse(x == 1, "TRUE", "FALSE") # return result
}
is_happy(19)
## [1] "TRUE"
is_happy(4)
## [1] "FALSE"
Algorithms are the building blocks of computer programs. Arguably, the most widely used algorithm today is PageRank, the ranking system invented by Google co-founders Larry Page and Sergey Brin.
PageRank gives a score to any webpage (in a given network). This score represents the probability that a person who is randomly clicking links on the Internet lands on that specific page.
So if a page \(A\) has a \(0.5\) PageRank within a network, and someone is randomly clicking on links, there is a \(50\%\) chance this person will be directed to page \(A\) after clicking randomly on a given link.
The overall idea behind PageRank is very simple: pages that receive more links from other pages should have a higher score, see the illustration below. Each smiley face represents a webpage, and each pointed finger represents an outbound link from one page to another. The size of each smiley face is proportional to their Page Rank.
Said otherwise: PageRank gives a page’s ‘popularity’; websites that are frequently linked to have higher PageRanks.
The license for the image above can be found here Click me.
More formally, call \(A\) some
webpage, then
\[PR(A) = \frac{1-d}{N} + d
\left(\frac{PR(T_1)}{C(T_1)} + ... +
\frac{PR(T_n)}{C(T_n)}\right)\] Where:
-\(PR(A)\) is the PageRank of page \(A\)
-\(N\) is the total number of pages in the network
-\(PR(T_i)\) is the PageRank of pages \(T_i\) which links to page \(A\)
-\(C(T_i)\) is the total number of outbound links from page \(T_i\) to other pages
-\(d\) is a damping factor which can be set between \(0\) and \(1\) (we take it as \(1\) for the remainder of this section, and leave it to you to investigate it further if you are interested)
Reference - PageRanks are calculated recursively because the PageRank of a webpage \(A\) depends on the PageRank of other webpages, which in turn are dependent on the PageRank of \(A\)…
Assume that \(d=1\) and that there are only \(N=4\) websites on the internet: \(A\), \(B\), \(C\) and \(D\). Links are given by the image below. That is, \(A\) has a link to \(C\) only. \(B\) has a link to \(A\) and \(D\). \(C\) has a link to all three other pages. \(D\) has a link to \(A\) only.
Writing the pseudocode of your algorithm before actually coding is good practice (especially for complex tasks)
Decomposing a problem into several ‘easier’ tasks is, in general, a good approach
Each building block is an doable task
You connect the building blocks to get a solution
Have efficiency in mind: can the task be done in fewer/faster steps?
The fact modern computers are powerful is not a valid excuse for ‘sloppy’, inefficient programming. In your actuarial careers you will encounter huge datasets, and the running time of your programs is likely to be relevant. Be efficient!
Additional Reference: Click me