In this article we will explore the “greedy” approach to the scheduling problem. The problem is stated as follows: we have 10000 jobs to do, each with some length \(l_i\) and weight (importance) \(w_i\). Our goal is to arrange the schedule of doing these jobs (in other words, the order of doing jobs) in such way that minimizes the weighted sum of completion times: \[\sum_{i=1}^{n}w_j C_j,\]
where \(C_j\) is the completion time of \(j\)-th job in the schedule: \(C_j=\sum_{i=1}^{j}l_i\).
We have text file with 10000 jobs on which we will perform our algorithms
jobs<-read.table("http://spark-public.s3.amazonaws.com/algo2/datasets/jobs.txt",
skip=1,header=FALSE,row.names=NULL,col.names=c("weight","length"))
The file has two columns, first for weight and second for length of the job
head(jobs)
## weight length
## 1 8 50
## 2 74 59
## 3 31 73
## 4 45 79
## 5 24 10
## 6 41 66
First greedy algorithm we will implement is the one that schedules job in decreasing order of the difference (weight - length). To break ties we will schedule the job with the highest weight first. First, let’s reorder the jobs in this way:
jobs=jobs[order(jobs$weight-jobs$length,jobs$weight,decreasing = TRUE),]
head(jobs)
## weight length
## 449 99 1
## 685 100 3
## 4246 100 3
## 250 99 2
## 3758 99 2
## 704 98 1
To compute weighted sum of completion times we need to multiply weight of every job by the cumulative length of job up to the current and then sum it for every job. The following line of code does exactly that.
options(digits=20)#to have the answer not in the exponentional form
sum(as.numeric(jobs[,1])*cumsum(jobs[,2]))
## [1] 69119377652
The ‘difference’ greedy algorithm is not the best one. Here we will instead use ratio \(\frac{weight}{length}\) to order the jobs. We will still use job’s weight to break the ties.
jobs=jobs[order(jobs$weight/jobs$length,jobs$weight,decreasing = TRUE),]
head(jobs)
## weight length
## 449 99 1
## 704 98 1
## 2260 95 1
## 9546 95 1
## 1025 93 1
## 2423 93 1
Nothing changes in computing weighted sum of completion times:
sum(as.numeric(jobs[,1])*cumsum(jobs[,2]))
## [1] 67311454237
As we can see, second approach gives better result. No wonder, since it really is the optimal algorithm.