In this article we will explore the “greedy” approach to the scheduling problem. The problem is stated as follows: we have 10000 jobs to do, each with some length \(l_i\) and weight (importance) \(w_i\). Our goal is to arrange the schedule of doing these jobs (in other words, the order of doing jobs) in such way that minimizes the weighted sum of completion times: \[\sum_{i=1}^{n}w_j C_j,\]

where \(C_j\) is the completion time of \(j\)-th job in the schedule: \(C_j=\sum_{i=1}^{j}l_i\).

We have text file with 10000 jobs on which we will perform our algorithms

jobs<-read.table("http://spark-public.s3.amazonaws.com/algo2/datasets/jobs.txt",
                 skip=1,header=FALSE,row.names=NULL,col.names=c("weight","length"))

The file has two columns, first for weight and second for length of the job

head(jobs)
##   weight length
## 1      8     50
## 2     74     59
## 3     31     73
## 4     45     79
## 5     24     10
## 6     41     66

“Difference” greedy algorithm

First greedy algorithm we will implement is the one that schedules job in decreasing order of the difference (weight - length). To break ties we will schedule the job with the highest weight first. First, let’s reorder the jobs in this way:

jobs=jobs[order(jobs$weight-jobs$length,jobs$weight,decreasing = TRUE),]
head(jobs)
##      weight length
## 449      99      1
## 685     100      3
## 4246    100      3
## 250      99      2
## 3758     99      2
## 704      98      1

To compute weighted sum of completion times we need to multiply weight of every job by the cumulative length of job up to the current and then sum it for every job. The following line of code does exactly that.

options(digits=20)#to have the answer not in the exponentional form
sum(as.numeric(jobs[,1])*cumsum(jobs[,2]))
## [1] 69119377652

‘Ratio’ greedy algorithm

The ‘difference’ greedy algorithm is not the best one. Here we will instead use ratio \(\frac{weight}{length}\) to order the jobs. We will still use job’s weight to break the ties.

jobs=jobs[order(jobs$weight/jobs$length,jobs$weight,decreasing = TRUE),]
head(jobs)
##      weight length
## 449      99      1
## 704      98      1
## 2260     95      1
## 9546     95      1
## 1025     93      1
## 2423     93      1

Nothing changes in computing weighted sum of completion times:

sum(as.numeric(jobs[,1])*cumsum(jobs[,2]))
## [1] 67311454237

As we can see, second approach gives better result. No wonder, since it really is the optimal algorithm.