What is an algorithm?

An algorithm is a step by step method of solving a problem, expressed as a finite list of well-defined instructions.

Reference

Algorithms - Applications

Pseudocode - Definition

Reference

Example of Algorithm & Pseudocode: The Collatz Sequence

The Collatz sequence goes as follows: start at any positive integer

It is conjectured but not proven that the sequence always reaches \(1\), eventually. For example, the Collatz sequence starting at \(5\) is: \[ 5 \to 16 \to 8 \to 4 \to 2 \to 1 \] A function which takes any positive integer and returns the next number in the sequence is an algorithm! Expressed in pseudocode this would look like:

Declare 'collatzNext' as function of x
If x is 1 
  result is 1
If x is even 
  result is x/2
Otherwise 
  result is 3*x + 1
Return result

Advantages of pseudocode

Exercise - Fibonacci sequence

Sample solution

Pseudocode:

Let x=1
Let y=1
Let z=x+y

Repeat (n-3) times the following: # we need (n-3) numbers (since we have the first 3)
    re-assign variable x to equal current value of y
    re-assign variable y to equal current value of z
    re-assign variable z to equal current value of x+y
Return z

R code:

# We do this in a function with argument n
get_fibo <- function(n){
x <- 1
y <- 1
z <- 2

for (i in 1:(n-3)){
      x <- y # update x
      y <- z # update y
      z <- x + y # find next number in sequence
}
  return(z)
}
get_fibo(7)
## [1] 13

Exercise: The sorting problem

Consider the problem of sorting a list of numbers, from smallest to biggest. How would you use an algorithm to do that task?

Possible Solution - Strategy

There are many possible sorting algorithms. The one presented here is often called insertion sort

General strategy:

  1. Consider the \(j\) \(th\) number (starting at \(2\), finishing at \(n\))
  2. Compare it to the \((j-1)th\) number: if the \((j-1)th\) is bigger, swap them
  3. Then move to compare the (new) \((j-1)th\) number with the \((j-2)th\) number, swap them if the \((j-2)th\) is bigger. Repeat until either:
    • you can’t swap numbers
    • you reach the first number in the list
  4. Let \(j = j+1\) and go to step 1.

Possible Solution - Code

Pseudocode:

for j=2 to length(A)
    # store the value of A[j] in a variable called key
    key <- A[j] 
    # initiate the comparison with the (j-1)th element
    i <- j-1 
    # the comparison continues as long as A[i]>key (or one exhausts the list)
    while i>0 and A[i]>key 
      A[i+1] = A[i] # shift A[i] to the right (since its bigger than key)
      A[i]   = key
      i=i-1  # move to the previous

In R code:

# We put this in a function 'insert_sort'
insert_sort <- function(A){
  # Length of inputted sequence
  n <- length(A)
  # Scan all numbers, from 2 to n
  for (j in 2:n){
    # Store the jth value
    key <- A[j]
    # Start comparison one backwards
    i <- j-1
    # Do this until we reach 1, or we find a number smaller or equal to 'key'
    while (i > 0 && A[i] > key){
      # Swap the bigger number with 'key'
      A[i+1] <- A[i]
      A[i] <- key
      i = i - 1
    } # End 'while' loop
  } # End 'for' loop
return(A)
} # End function

insert_sort(c(2,5,6,7,3,4,5,1))
## [1] 1 2 3 4 5 5 6 7

Take aways for good pseudocode

Divide and Conquer Method

The ‘Divide and Conquer’ method to design an algorithm breaks down a complex task into simpler blocks, or ‘sub-problems’. Each of those blocks is a programmable step. Those algorithms have the general structure:

  1. Divide: Break the given problem into (smaller/simpler) sub-problems of the same type.
  2. Conquer: Recursively solve these sub-problems.
  3. Combine: Appropriately combine the answers to get the solution of the original problem.

Reference

Divide and Conquer: An Exercise

You are given a sorted list of numbers: \(A = (n_1, n_2, \ldots)\) and another number, say \(N\), which might be in the list. You want to find if the number is in the list and if so, its position in the list.

How would you apply the divide and conquer method to solve this? Write this as pseudocode.

Possible Solution

A ‘naive’ solution would be to scan all the numbers, and stop whenever you find \(N\). That would work, but would not be efficient.

Instead, we can take advantage of the fact the list is sorted. General strategy:

In pseudocode:

# Let the list be A, with A[i] its i^th element
# Set the Left (L) and Right (R) indices of your searching range
L = 1 and R = n
If  N < A[L] OR N > A[R], return 'unsuccessful' 
Let m = floor((L+R)/2) # Update the middle point
    If (A[m] == N), return m (the search is done)
    If R = L, return 'unsuccessful'
    If A[m]<N, let L = m + 1  and go back to step "let m = floor(..)"
    If A[m]>N, let R = m - 1  and go back to step "let m = floor(..)"

In R code:

# We put this in a function 'find_N'
find_N <- function(A, N){
  L <- 1 # Set left bound at 1st position
  R <- length(A) # Set right bound at last position
  if (N < A[L] || N > A[R]){return('unsuccessful')} 
  found <- F # Thus far, N has not be found
  while (found == F){ # As long as we have not found N, we will keep going
    m <- floor((L+R)/2) # Update the middle point
    if (A[m] == N){return(paste(N, "is in position",m))}
    if (R == L){return('unsuccessful')}
    if (A[m]<N){L <- m + 1} # if N is bigger than middle point, update left bound
    if (A[m]>N){R <- m - 1} # if N is smaller than middle point, update right bound
  }
} # End function
find_N(A=1:50, N = 42)
## [1] "42 is in position 42"
find_N(A=c(4,8,10,12,14), N = 12)
## [1] "12 is in position 4"
find_N(A=c(4,8,10,12,14), N = 13) 
## [1] "unsuccessful"

More on Algorithms: Recursion

‘The process in which a function calls itself directly or indirectly is called recursion and the corresponding function is called as recursive function. Using recursive algorithm, certain problems can be solved quite easily.’

An old joke: To understand recursion you must first understand recursion.

Three Laws of Recursion

A recursive algorithm needs to follow three important ‘Laws’:

  1. A recursive algorithm must have a base case. When such a base case is reached, the algorithm terminates.
  2. At each new step, a recursive algorithm must change its state and progress towards the base case.
  3. A recursive algorithm must call itself, recursively.

Reference

Example

This is best understood through a simple example. Suppose we want to compute \(n! = n \times (n-1) \times \dots \times 1\). A classical way to do this is to use a loop as follows:

n <- 10
  result<-1
  for (i in 1:n){
      result<-result*i
  }
  result
## [1] 3628800

A recursive solution would require us to define a function my.factorial(n) which is used inside itself!

my.factorial<-function(n){
    # This is the base case
    if (n==1){
        return(1)
    }
    # If we are not at the base case, the algorithm moves towards the base case by calling itself
    else{
        return(n*my.factorial(n-1))
    }
}
my.factorial(10)
## [1] 3628800

Breakdown

The \(\verb|my.factorial(10)|\) function call does the following:

  1. \(10 \times \verb|my.factorial(9)|\)

  2. \(10 \times 9 \times \verb|my.factorial(8)|\)

  3. \(10 \times 9 \times 8 \times \verb|my.factorial(7)|\)

\(\dots\)

\(10 \times 9 \times 8 \times \dots 2 \times \verb|my.factorial(1)|\)

\(\verb|my.factorial(1)|\) returns \(1\) because we defined this behaviour here: \[ \verb|if (n==1){return(1)}| \]

Necessary properties of recursion

Exercise

Write a recursive algorithm that returns the \(n^{\text{th}}\) number in the Fibonacci sequence \(1,1,2,3,5,8,13,\ldots\)

Solution

my.fibo <-function(n){
    # This is the base case
    if (n==1 || n==2){
        return(1)
    }
    # If we are not at the base case, the algorithm moves towards the base case by calling itself
    else{
        return(my.fibo(n-1) + my.fibo(n-2))
    }
}
my.fibo(7)
## [1] 13

Note: This is a simple example to illustrate what a recursive algorithm is. However, it is a rather inefficient may of solving the Fibonacci sequence problem (much slower than a previous solution we gave). Can you see why?

Beware of infinite loops

As another example of algorithm, consider what we call a ‘happy number’:

For example, \(19\) is happy:

\[\begin{align*} & 1^2 + 9^2 = 82 \\ & 8^2 + 2^2 = 68 \\ & 6^2 + 8^2 = 100 \\ & 1^2 + 0^2 + 0^2 = 1. \end{align*}\]

\(4\) is sad:

\[\begin{align*} & 4^2 = 16 \\ & 1^2 + 6^2 = 37 \\ & 3^2 + 7^2 = 58 \\ & 5^2 + 8^2 = 89 \\ & 8^2 + 9^2 = 145 \\ & 1^2 + 4^2 + 5^2= 42 \\ & 4^2 + 2^2 = 20 \\ & 2^2 + 0^2 = 4 \\ & 4^2 = 16 \\ & \ldots \end{align*}\]

Happy number - Sample Solution

The tricky part here is to realise that if we ever get the same number twice in the sequence, then we don’t have a happy number, and the sequence will repeat itself forever. We then want to exit the process and declare the number sad. So, we need to keep a history of all numbers obtained.

In pseudocode:

# We do the task in a function
Declare function 'is_happy' with argument 'x'
Inside the function:
  # We store all numbers of the sequence in a vector, S
  Declare a vector S
  While x is not 1 AND S contains only unique elements
    - Isolate all digits of x: d1, d2, etc.
    - Update x as d1^2 + d2^2 + ...
    - Add x to vector S
  # While loop is now completed
  If x = 1, return TRUE  
  Otherwise, return FALSE

In R code:

is_happy <- function(x){
S <- NULL # declare an empty vector, to contain all numbers of the sequence

# while we have not reached 1, and we have not gotten any duplicates
while (x != 1 && any(duplicated(S)) == F){ 
  x <- as.integer(strsplit(as.character(x), '')[[1]]) # get all digits of x in a vector
  x <- sum(x^2)
  S <- c(S,x)
}
  ifelse(x == 1, "TRUE", "FALSE") # return result
}
is_happy(19)
## [1] "TRUE"
is_happy(4)
## [1] "FALSE"

Advanced Algorithm: PageRank

The license for the image above can be found here Click me.

More formally, call \(A\) some webpage, then
\[PR(A) = \frac{1-d}{N} + d \left(\frac{PR(T_1)}{C(T_1)} + ... + \frac{PR(T_n)}{C(T_n)}\right)\] Where:

-\(PR(A)\) is the PageRank of page \(A\)

-\(N\) is the total number of pages in the network

-\(PR(T_i)\) is the PageRank of pages \(T_i\) which links to page \(A\)

-\(C(T_i)\) is the total number of outbound links from page \(T_i\) to other pages

-\(d\) is a damping factor which can be set between \(0\) and \(1\) (we take it as \(1\) for the remainder of this section, and leave it to you to investigate it further if you are interested)

Reference - PageRanks are calculated recursively because the PageRank of a webpage \(A\) depends on the PageRank of other webpages, which in turn are dependent on the PageRank of \(A\)

PageRank example

Assume that \(d=1\) and that there are only \(N=4\) websites on the internet: \(A\), \(B\), \(C\) and \(D\). Links are given by the image below. That is, \(A\) has a link to \(C\) only. \(B\) has a link to \(A\) and \(D\). \(C\) has a link to all three other pages. \(D\) has a link to \(A\) only.

  • Initially, all Pages are assigned the same rank, \(1/4 = 0.25\).
  • At the first iteration, we have \[\begin{align*} PR(A) & = \frac{0.25}{2} + \frac{0.25}{3} + \frac{0.25}{1} = 11/24 \\ PR(B) & = \frac{0.25}{3} = 2/24 \\ PR(C) & = \frac{0.25}{1} = 6/24 \\ PR(D) & = \frac{0.25}{2} + \frac{0.25}{3} = 5/24 \\ \end{align*}\]
  • Then for the second iterations, we use the new Page Ranks obtained in the first one to get updated Pages Ranks, i.e. \[\begin{align*} PR(A) & = \frac{2/24}{2} + \frac{6/24}{3} + \frac{5/24}{1} = 8/24 \\ PR(B) & = \frac{6/24}{3} = 2/24 \\ PR(C) & = \frac{11/24}{1} = 11/24 \\ PR(D) & = \frac{2/24}{2} + \frac{6/24}{3} = 3/24 \\ \end{align*}\]
  • And so on… Can you write an \(\verb|R|\) program to do this automatically and find the converging \(PR\) values?

Final remarks on Algorithms

  • Writing the pseudocode of your algorithm before actually coding is good practice (especially for complex tasks)

  • Decomposing a problem into several ‘easier’ tasks is, in general, a good approach

    • Each building block is an doable task

    • You connect the building blocks to get a solution

  • Have efficiency in mind: can the task be done in fewer/faster steps?

  • The fact modern computers are powerful is not a valid excuse for ‘sloppy’, inefficient programming. In your actuarial careers you will encounter huge datasets, and the running time of your programs is likely to be relevant. Be efficient!

  • Additional Reference: Click me