Using linear programming tools in R to solve optimization problems.
Optimization is widely used in operations research to solve problems such as production planning, transportation network design, warehouse location allocation, and scheduling, where we try to maximize or minimize a linear function subject to a number of decision variables and constraints.
Here I use one of my consulting projects, in which I helped our portfolio companies select a wireless provider and a mix of data plans that meets all their requirements (total number of lines and pooled data quantity) at the lowest possible cost.
This kind of optimization can typically be solved with the Excel Solver. However, since I have 20 portfolio companies, 2 providers, and 2 scenarios to analyze, doing it in Excel would mean running the Solver 80 times. It is much easier to do in R.
Load packages
library(lpSolve)
Load data
usage <- read.csv("usage.csv")
plan <- read.csv("wireless_data_plan.csv")
Usage Data
head(usage)
## Company Num_Lines Data_Usage
## 1 A 134 397.5
## 2 B 350 1037.5
## 3 C 1510 3462.5
## 4 D 2260 4437.5
## 5 E 750 2875.0
## 6 F 410 612.5
str(usage)
## 'data.frame': 20 obs. of 3 variables:
## $ Company : Factor w/ 20 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ Num_Lines : int 134 350 1510 2260 750 410 2930 1091 3350 7760 ...
## $ Data_Usage: num 398 1038 3462 4438 2875 ...
We have a total of 20 portfolio companies in the dataset, each with its average number of lines and monthly data usage over the past 3 months.
Now, I look at the summary statistics and the histograms of the company data.
summary(usage$Num_Lines)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 134.0 779.2 1083.0 1774.0 1909.0 7760.0
summary(usage$Data_Usage/usage$Num_Lines)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.004 1.674 2.527 2.547 3.075 4.475
par(mfrow = c(1,2))
hist(usage$Num_Lines, main = "Number of Lines", xlab = "Number of Lines")
hist(usage$Data_Usage/usage$Num_Lines, main = "Data Usage", xlab = "Data Usage - GB")
Plan Data
head(plan)
## Wireless_Carrier Data_GB Plan_Rate
## 1 ATT 3 60
## 2 ATT 4 75
## 3 ATT 5 85
## 4 ATT 6 100
## 5 VZW 1 56
## 6 VZW 2 60
str(plan)
## 'data.frame': 10 obs. of 3 variables:
## $ Wireless_Carrier: Factor w/ 2 levels "ATT","VZW": 1 1 1 1 2 2 2 2 2 2
## $ Data_GB : int 3 4 5 6 1 2 4 6 8 10
## $ Plan_Rate : int 60 75 85 100 56 60 70 80 90 100
We also have different levels of data plans from AT&T and Verizon Wireless to choose from. The goal of this analysis is to choose, for each company, the carrier and mix of data plans with the lowest total cost while satisfying the number-of-lines and total-data requirements.
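For each company and each carrier, this is a small integer program. As a sketch, let x_j be the number of lines purchased on plan j, r_j its monthly rate, and d_j its data allowance in GB; then the problem is

$$\min \sum_j r_j x_j \quad \text{subject to} \quad \sum_j x_j = \text{Num\_Lines}, \qquad \sum_j d_j x_j \ge \text{Data\_Usage}, \qquad x_j \in \mathbb{Z}_{\ge 0}.$$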
Create objective function, constraint, and constraint direction objects
obj.fun.att <- plan[1:4, 3]
obj.fun.vzw <- plan[5:10, 3]
constr.att <- matrix(c(1, 1, 1, 1, as.vector(plan[1:4, 2])), ncol = 4, byrow = TRUE)
constr.vzw <- matrix(c(1, 1, 1, 1, 1, 1, as.vector(plan[5:10, 2])), ncol = 6, byrow = TRUE)
constr.dir <- c("=", ">=")
We have two objective functions because we want to find the lowest-cost mix of plans separately for AT&T and for Verizon. There are also two constraints: one for the total number of lines and one for the total (pooled) data quantity. For the total number of lines, I want the number of plans purchased to match exactly, so I use "=". For the total pooled data quantity, it is acceptable as long as more data is available than is consumed, so I use ">=" for the data quantity constraint.
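To make the constraint matrices concrete: each one has a row per constraint and a column per plan. Assuming the AT&T plan order shown in head(plan) (3, 4, 5, and 6 GB), constr.att holds the same values as

rbind(Lines = rep(1, 4),     # every plan purchased contributes one line
      Data  = plan[1:4, 2])  # and its data allowance (3, 4, 5, 6 GB) to the pool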
Create empty matrices to store the results
result.att <- matrix(0, nrow = 20, ncol = 5)
row.names(result.att) <- usage$Company
colnames(result.att) <- c("3GB", "4GB", "5GB", "6GB", "Cost")
result.vzw <- matrix(0, nrow = 20, ncol = 7)
row.names(result.vzw) <- usage$Company
colnames(result.vzw) <- c("1GB", "2GB", "4GB", "6GB", "8GB", "10GB", "Cost")
Create loops to run the solver for each portfolio company with each provider
for (i in 1:20) {
rhs <- usage[i,2:3]
prod.sol <- lp("min", obj.fun.att, constr.att, constr.dir, rhs, compute.sens = TRUE, all.int = TRUE)
result.att[i,5] <- prod.sol$objval
result.att[i, 1:4] <- prod.sol$solution
}
for (i in 1:20) {
rhs <- usage[i,2:3]
prod.sol <- lp("min", obj.fun.vzw, constr.vzw, constr.dir, rhs, compute.sens = TRUE, all.int = TRUE)
result.vzw[i,7] <- prod.sol$objval
result.vzw[i, 1:6] <- prod.sol$solution
}
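One thing worth adding in practice: lp() returns a status code in prod.sol$status (0 means an optimal solution was found), so a slightly more defensive version of the AT&T loop could look like this sketch:

for (i in 1:20) {
  rhs <- usage[i, 2:3]
  prod.sol <- lp("min", obj.fun.att, constr.att, constr.dir, rhs, all.int = TRUE)
  if (prod.sol$status != 0) {   # non-zero status: no feasible/optimal solution found
    warning("No optimal plan mix found for company ", as.character(usage$Company[i]))
    next
  }
  result.att[i, 5] <- prod.sol$objval
  result.att[i, 1:4] <- prod.sol$solution
}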
AT&T Optimization Result
result.att
## 3GB 4GB 5GB 6GB Cost
## A 134 0 0 0 8040
## B 350 0 0 0 21000
## C 1510 0 0 0 90600
## D 2260 0 0 0 135600
## E 438 0 311 1 52815
## F 410 0 0 0 24600
## G 2930 0 0 0 175800
## H 286 0 805 0 85585
## I 3350 0 0 0 201000
## J 7760 0 0 0 465600
## K 4920 0 0 0 295200
## L 594 0 335 1 64215
## M 960 0 0 0 57600
## N 1792 0 0 0 107520
## O 1730 0 0 0 103800
## P 1406 0 247 1 105455
## Q 316 0 472 1 59180
## R 297 0 0 0 17820
## S 1075 0 0 0 64500
## T 796 0 0 0 47760
As we can see here, most of the allocation goes to the 3GB plan, which makes sense because most companies average less than 3GB per line anyway. However, when a company needs more data, it is better to add some higher-tier plans because of their lower cost per GB.
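The per-GB rates bear this out and can be computed directly from the plan table; for AT&T, for example, the 3GB plan works out to $20/GB while the 6GB plan is about $16.67/GB:

cbind(plan[, 1:2], Cost_per_GB = round(plan$Plan_Rate / plan$Data_GB, 2))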
Verizon Optimization Result
result.vzw
## 1GB 2GB 4GB 6GB 8GB 10GB Cost
## A 0 69 65 0 0 0 8690
## B 0 258 66 0 1 25 22690
## C 1 1405 64 1 0 39 92816
## D 82 2178 0 0 0 0 135272
## E 1 528 65 0 1 155 51876
## F 207 203 0 0 0 0 23772
## G 785 2145 0 0 0 0 172660
## H 1 704 64 0 1 321 78966
## I 3337 13 0 0 0 0 187652
## J 1 7174 64 0 1 520 487066
## K 4215 705 0 0 0 0 278340
## L 1 680 64 1 0 184 63816
## M 645 315 0 0 0 0 55020
## N 0 1573 1 0 0 218 116250
## O 1 1571 66 0 1 91 108126
## P 1 1336 64 0 0 253 109996
## Q 0 523 65 0 1 200 56020
## R 148 149 0 0 0 0 17228
## S 1 890 66 0 0 118 69876
## T 0 796 0 0 0 0 47760
As for Verizon, most of the companies end up with a mix of 2GB and 10GB plans, leveraging the cheaper monthly rate of the 2GB plan and the lower per-GB rate of the 10GB plan.
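As a quick sanity check on any of these solutions, the two constraints can be verified against the usage table; for example, for company A (the same pattern works for any row):

sum(result.vzw["A", 1:6])                    # should equal usage$Num_Lines for company A
sum(result.vzw["A", 1:6] * plan[5:10, 2])    # should be at least usage$Data_Usage for company A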
Compare Overall Cost between AT&T and Verizon Wireless
comp <- as.data.frame(cbind(result.att[,5], result.vzw[,7]))
comp$lowest <- ifelse(comp[,1] > comp[,2], "vzw", "att")   # note: ties are labeled "att"
colnames(comp) <- c("ATT", "VZW", "Lowest")
comp
## ATT VZW Lowest
## A 8040 8690 att
## B 21000 22690 att
## C 90600 92816 att
## D 135600 135272 vzw
## E 52815 51876 vzw
## F 24600 23772 vzw
## G 175800 172660 vzw
## H 85585 78966 vzw
## I 201000 187652 vzw
## J 465600 487066 att
## K 295200 278340 vzw
## L 64215 63816 vzw
## M 57600 55020 vzw
## N 107520 116250 att
## O 103800 108126 att
## P 105455 109996 att
## Q 59180 56020 vzw
## R 17820 17228 vzw
## S 64500 69876 att
## T 47760 47760 att
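A quick way to summarize this table is to tally how often each carrier comes out cheapest and to compute the total monthly difference across all 20 companies:

table(comp$Lowest)          # number of companies for which each carrier is cheapest
sum(comp$ATT - comp$VZW)    # total monthly cost difference (positive means VZW is cheaper overall)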
Second Scenario
Now we know which provider and plans to select given the current number of lines and usage. However, companies might want to purchase more data than they are consuming now because, first, data usage has been growing and is expected to continue to do so, and second, they want to avoid potential overage charges.
So now I will build in a new variable: a buffer giving the percentage of extra data to purchase on top of what the companies have been using.
buffer <- 1.2   # purchase 20% more data than the historical usage
for (i in 1:20) {
rhs <- c(usage[i,2],usage[i,3] * buffer)
prod.sol <- lp("min", obj.fun.att, constr.att, constr.dir, rhs, compute.sens = TRUE, all.int = TRUE)
result.att[i,5] <- prod.sol$objval
result.att[i, 1:4] <- prod.sol$solution
}
for (i in 1:20) {
rhs <- c(usage[i,2],usage[i,3] * buffer)
prod.sol <- lp("min", obj.fun.vzw, constr.vzw, constr.dir, rhs, compute.sens = TRUE, all.int = TRUE)
result.vzw[i,7] <- prod.sol$objval
result.vzw[i, 1:6] <- prod.sol$solution
}
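The four loops above differ only in their inputs, so a small helper function (a sketch; solve_plan is a hypothetical name, not part of lpSolve) would remove the duplication and make it easy to try other buffer values:

# Solve one company/carrier combination and return the plan mix plus its cost
solve_plan <- function(lines, data_gb, obj, constr, dir = c("=", ">=")) {
  sol <- lp("min", obj, constr, dir, c(lines, data_gb), all.int = TRUE)
  c(sol$solution, Cost = sol$objval)   # plan mix followed by total monthly cost
}
# For example, company 1 on the AT&T plans with a 20% data buffer:
solve_plan(usage[1, 2], usage[1, 3] * 1.2, obj.fun.att, constr.att)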
result.att
## 3GB 4GB 5GB 6GB Cost
## A 97 0 36 1 8980
## B 253 0 96 1 23440
## C 1510 0 0 0 90600
## D 2260 0 0 0 135600
## E 150 0 600 0 60000
## F 410 0 0 0 24600
## G 2930 0 0 0 175800
## H 0 0 687 404 98795
## I 3350 0 0 0 201000
## J 7513 0 246 1 471790
## K 4920 0 0 0 295200
## L 248 0 681 1 72865
## M 960 0 0 0 57600
## N 1282 0 510 0 120270
## O 1730 0 0 0 103800
## P 860 0 794 0 119090
## Q 0 0 757 32 67545
## R 297 0 0 0 17820
## S 753 0 321 1 72565
## T 796 0 0 0 47760
result.vzw
## 1GB 2GB 4GB 6GB 8GB 10GB Cost
## A 1 57 66 0 1 9 9086
## B 1 231 66 0 1 51 23726
## C 1 1318 65 0 1 125 96276
## D 1 2109 65 1 0 84 139626
## E 0 504 3 0 0 243 54750
## F 85 325 0 0 0 0 24260
## G 0 2899 3 0 0 28 176950
## H 1 581 65 1 0 443 83846
## I 2665 685 0 0 0 0 190340
## J 1 6678 65 0 1 1015 506876
## K 3090 1830 0 0 0 0 282840
## L 1 593 65 0 1 270 67276
## M 390 570 0 0 0 0 56040
## N 0 1439 2 0 0 351 121580
## O 0 1513 1 0 0 216 112450
## P 0 1199 66 0 1 388 115450
## Q 1 440 64 0 0 284 59336
## R 59 238 0 0 0 0 17584
## S 0 860 0 0 0 215 73100
## T 1 707 64 0 0 24 49356
comp <- as.data.frame(cbind(result.att[,5], result.vzw[,7]))
comp$lowest <- ifelse(comp[,1] > comp[,2], "vzw", "att")
colnames(comp) <- c("ATT", "VZW", "Lowest")
comp
## ATT VZW Lowest
## A 8980 9086 att
## B 23440 23726 att
## C 90600 96276 att
## D 135600 139626 att
## E 60000 54750 vzw
## F 24600 24260 vzw
## G 175800 176950 att
## H 98795 83846 vzw
## I 201000 190340 vzw
## J 471790 506876 att
## K 295200 282840 vzw
## L 72865 67276 vzw
## M 57600 56040 vzw
## N 120270 121580 att
## O 103800 112450 att
## P 119090 115450 vzw
## Q 67545 59336 vzw
## R 17820 17584 vzw
## S 72565 73100 att
## T 47760 49356 att