Using linear programming tools in R to solve optimization problems.
Optimization is widely used in operations research to solve problems such as production planning, transportation network design, warehouse location allocation, and scheduling, where we try to maximize or minimize a linear function subject to a number of decision variables and constraints.
Here I use one of my consulting projects, in which I helped our portfolio companies select a wireless provider and a mix of data plans that meets all their requirements (total number of lines and pooled data quantity) at the lowest possible cost.
This kind of optimization can typically be solved with the Excel Solver. However, since I have 20 portfolio companies, 2 providers, and 2 scenarios to analyze, doing it in Excel would mean running the Solver 80 times. It is much easier to do in R.
Load packages
library(lpSolve)
Load data
usage <- read.csv("usage.csv")
plan <- read.csv("wireless_data_plan.csv")
Usage Data
head(usage)
## Company Num_Lines Data_Usage
## 1 A 134 397.5
## 2 B 350 1037.5
## 3 C 1510 3462.5
## 4 D 2260 4437.5
## 5 E 750 2875.0
## 6 F 410 612.5
str(usage)
## 'data.frame': 20 obs. of 3 variables:
## $ Company : Factor w/ 20 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ Num_Lines : int 134 350 1510 2260 750 410 2930 1091 3350 7760 ...
## $ Data_Usage: num 398 1038 3462 4438 2875 ...
We have a total of 20 portfolio companies in the dataset, each with its average number of lines and monthly data usage over the past 3 months.
Now, I look at the summary statistics and the histograms of the company data.
summary(usage$Num_Lines)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 134.0 779.2 1083.0 1774.0 1909.0 7760.0
summary(usage$Data_Usage/usage$Num_Lines)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.004 1.674 2.527 2.547 3.075 4.475
par(mfrow = c(1,2))
hist(usage$Num_Lines, main = "Number of Lines", xlab = "Number of Lines")
hist(usage$Data_Usage/usage$Num_Lines, main = "Data Usage", xlab = "Data Usage - GB")
Plan Data
head(plan)
## Wireless_Carrier Data_GB Plan_Rate
## 1 ATT 3 60
## 2 ATT 4 75
## 3 ATT 5 85
## 4 ATT 6 100
## 5 VZW 1 56
## 6 VZW 2 60
str(plan)
## 'data.frame': 10 obs. of 3 variables:
## $ Wireless_Carrier: Factor w/ 2 levels "ATT","VZW": 1 1 1 1 2 2 2 2 2 2
## $ Data_GB : int 3 4 5 6 1 2 4 6 8 10
## $ Plan_Rate : int 60 75 85 100 56 60 70 80 90 100
We also have different levels of data plans from AT&T and Verizon Wireless to choose from. The goal of this analysis is to choose, for each company, the carrier and mix of data plans with the lowest total cost while satisfying the number-of-lines and total-data requirements.
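For each company and each carrier, this is a small integer program. As a sketch, let x_j be the number of lines purchased on plan j, r_j its monthly rate, and d_j its data allowance in GB; then the problem is

$$\min \sum_j r_j x_j \quad \text{subject to} \quad \sum_j x_j = \text{Num\_Lines}, \qquad \sum_j d_j x_j \ge \text{Data\_Usage}, \qquad x_j \in \mathbb{Z}_{\ge 0}.$$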
Create objective function, constraint, and constraint direction objects
obj.fun.att <- plan[1:4, 3]
obj.fun.vzw <- plan[5:10, 3]
constr.att <- matrix(c(1, 1, 1, 1, as.vector(plan[1:4, 2])), ncol = 4, byrow = TRUE)
constr.vzw <- matrix(c(1, 1, 1, 1, 1, 1, as.vector(plan[5:10, 2])), ncol = 6, byrow = TRUE)
constr.dir <- c("=", ">=")
We have two objective functions because we want to find the lowest-cost mix of plans separately for AT&T and for Verizon. There are also two constraints: one for the total number of lines and one for the total (pooled) data quantity. For the total number of lines, I want the number of plans purchased to match exactly, so I use "=". For the total pooled data quantity, it is acceptable as long as more data is available than is consumed, so I use ">=" for the data quantity constraint.
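To make the constraint matrices concrete: each one has a row per constraint and a column per plan. Assuming the AT&T plan order shown in head(plan) (3, 4, 5, and 6 GB), constr.att holds the same values as

rbind(Lines = rep(1, 4),     # every plan purchased contributes one line
      Data  = plan[1:4, 2])  # and its data allowance (3, 4, 5, 6 GB) to the pool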
Create empty matrices to store the results
result.att <- matrix(0, nrow = 20, ncol = 5)
row.names(result.att) <- usage$Company
colnames(result.att) <- c("3GB", "4GB", "5GB", "6GB", "Cost")
result.vzw <- matrix(0, nrow = 20, ncol = 7)
row.names(result.vzw) <- usage$Company
colnames(result.vzw) <- c("1GB", "2GB", "4GB", "6GB", "8GB", "10GB", "Cost")
Create loops to run the solver for each portfolio company with each provider
for (i in 1:20) {
rhs <- usage[i,2:3]
prod.sol <- lp("min", obj.fun.att, constr.att, constr.dir, rhs, compute.sens = TRUE, all.int = TRUE)
result.att[i,5] <- prod.sol$objval
result.att[i, 1:4] <- prod.sol$solution
}
for (i in 1:20) {
rhs <- usage[i,2:3]
prod.sol <- lp("min", obj.fun.vzw, constr.vzw, constr.dir, rhs, compute.sens = TRUE, all.int = TRUE)
result.vzw[i,7] <- prod.sol$objval
result.vzw[i, 1:6] <- prod.sol$solution
}
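One thing worth adding in practice: lp() returns a status code in prod.sol$status (0 means an optimal solution was found), so a slightly more defensive version of the AT&T loop could look like this sketch:

for (i in 1:20) {
  rhs <- usage[i, 2:3]
  prod.sol <- lp("min", obj.fun.att, constr.att, constr.dir, rhs, all.int = TRUE)
  if (prod.sol$status != 0) {   # non-zero status: no feasible/optimal solution found
    warning("No optimal plan mix found for company ", as.character(usage$Company[i]))
    next
  }
  result.att[i, 5] <- prod.sol$objval
  result.att[i, 1:4] <- prod.sol$solution
}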
AT&T Optimization Result
result.att
## 3GB 4GB 5GB 6GB Cost
## A 134 0 0 0 8040
## B 350 0 0 0 21000
## C 1510 0 0 0 90600
## D 2260 0 0 0 135600
## E 438 0 311 1 52815
## F 410 0 0 0 24600
## G 2930 0 0 0 175800
## H 286 0 805 0 85585
## I 3350 0 0 0 201000
## J 7760 0 0 0 465600
## K 4920 0 0 0 295200
## L 594 0 335 1 64215
## M 960 0 0 0 57600
## N 1792 0 0 0 107520
## O 1730 0 0 0 103800
## P 1406 0 247 1 105455
## Q 316 0 472 1 59180
## R 297 0 0 0 17820
## S 1075 0 0 0 64500
## T 796 0 0 0 47760
As we can see here, most of the allocation goes to the 3GB plan, which makes sense because most companies average less than 3GB per line anyway. However, when a company needs more data, it is better to add some higher-tier plans because of their lower cost per GB.
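The per-GB rates bear this out and can be computed directly from the plan table; for AT&T, for example, the 3GB plan works out to $20/GB while the 6GB plan is about $16.67/GB:

cbind(plan[, 1:2], Cost_per_GB = round(plan$Plan_Rate / plan$Data_GB, 2))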
Verizon Optimization Result
result.vzw
## 1GB 2GB 4GB 6GB 8GB 10GB Cost
## A 0 69 65 0 0 0 8690
## B 0 258 66 0 1 25 22690
## C 1 1405 64 1 0 39 92816
## D 82 2178 0 0 0 0 135272
## E 1 528 65 0 1 155 51876
## F 207 203 0 0 0 0 23772
## G 785 2145 0 0 0 0 172660
## H 1 704 64 0 1 321 78966
## I 3337 13 0 0 0 0 187652
## J 1 7174 64 0 1 520 487066
## K 4215 705 0 0 0 0 278340
## L 1 680 64 1 0 184 63816
## M 645 315 0 0 0 0 55020
## N 0 1573 1 0 0 218 116250
## O 1 1571 66 0 1 91 108126
## P 1 1336 64 0 0 253 109996
## Q 0 523 65 0 1 200 56020
## R 148 149 0 0 0 0 17228
## S 1 890 66 0 0 118 69876
## T 0 796 0 0 0 0 47760
As for Verizon, most of the companies end up with a mix of 2GB and 10GB plans, leveraging the cheaper monthly rate of the 2GB plan and the lower per-GB rate of the 10GB plan.
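As a quick sanity check on any of these solutions, the two constraints can be verified against the usage table; for example, for company A (the same pattern works for any row):

sum(result.vzw["A", 1:6])                    # should equal usage$Num_Lines for company A
sum(result.vzw["A", 1:6] * plan[5:10, 2])    # should be at least usage$Data_Usage for company A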
Compare Overall Cost between AT&T and Verizon Wireless
comp <- as.data.frame(cbind(result.att[,5], result.vzw[,7]))
comp$lowest <- ifelse(comp[,1] > comp[,2], "vzw", "att")   # note: ties are labeled "att"
colnames(comp) <- c("ATT", "VZW", "Lowest")
comp
## ATT VZW Lowest
## A 8040 8690 att
## B 21000 22690 att
## C 90600 92816 att
## D 135600 135272 vzw
## E 52815 51876 vzw
## F 24600 23772 vzw
## G 175800 172660 vzw
## H 85585 78966 vzw
## I 201000 187652 vzw
## J 465600 487066 att
## K 295200 278340 vzw
## L 64215 63816 vzw
## M 57600 55020 vzw
## N 107520 116250 att
## O 103800 108126 att
## P 105455 109996 att
## Q 59180 56020 vzw
## R 17820 17228 vzw
## S 64500 69876 att
## T 47760 47760 att
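A quick way to summarize this table is to tally how often each carrier comes out cheapest and to compute the total monthly difference across all 20 companies:

table(comp$Lowest)          # number of companies for which each carrier is cheapest
sum(comp$ATT - comp$VZW)    # total monthly cost difference (positive means VZW is cheaper overall)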
Second Scenario
Now we know which provider and plans to select given the current number of lines and usage. However, companies might want to purchase more data than they are consuming now because, first, data usage has been growing and is expected to continue to do so, and second, they want to avoid potential overage charges.
So now I will build in a new variable: a buffer giving the percentage of extra data to purchase on top of what the companies have been using.
buffer <- 1.2   # purchase 20% more data than the historical usage
for (i in 1:20) {
rhs <- c(usage[i,2],usage[i,3] * buffer)
prod.sol <- lp("min", obj.fun.att, constr.att, constr.dir, rhs, compute.sens = TRUE, all.int = TRUE)
result.att[i,5] <- prod.sol$objval
result.att[i, 1:4] <- prod.sol$solution
}
for (i in 1:20) {
rhs <- c(usage[i,2],usage[i,3] * buffer)
prod.sol <- lp("min", obj.fun.vzw, constr.vzw, constr.dir, rhs, compute.sens = TRUE, all.int = TRUE)
result.vzw[i,7] <- prod.sol$objval
result.vzw[i, 1:6] <- prod.sol$solution
}
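The four loops above differ only in their inputs, so a small helper function (a sketch; solve_plan is a hypothetical name, not part of lpSolve) would remove the duplication and make it easy to try other buffer values:

# Solve one company/carrier combination and return the plan mix plus its cost
solve_plan <- function(lines, data_gb, obj, constr, dir = c("=", ">=")) {
  sol <- lp("min", obj, constr, dir, c(lines, data_gb), all.int = TRUE)
  c(sol$solution, Cost = sol$objval)   # plan mix followed by total monthly cost
}
# For example, company 1 on the AT&T plans with a 20% data buffer:
solve_plan(usage[1, 2], usage[1, 3] * 1.2, obj.fun.att, constr.att)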
result.att
## 3GB 4GB 5GB 6GB Cost
## A 97 0 36 1 8980
## B 253 0 96 1 23440
## C 1510 0 0 0 90600
## D 2260 0 0 0 135600
## E 150 0 600 0 60000
## F 410 0 0 0 24600
## G 2930 0 0 0 175800
## H 0 0 687 404 98795
## I 3350 0 0 0 201000
## J 7513 0 246 1 471790
## K 4920 0 0 0 295200
## L 248 0 681 1 72865
## M 960 0 0 0 57600
## N 1282 0 510 0 120270
## O 1730 0 0 0 103800
## P 860 0 794 0 119090
## Q 0 0 757 32 67545
## R 297 0 0 0 17820
## S 753 0 321 1 72565
## T 796 0 0 0 47760
result.vzw
## 1GB 2GB 4GB 6GB 8GB 10GB Cost
## A 1 57 66 0 1 9 9086
## B 1 231 66 0 1 51 23726
## C 1 1318 65 0 1 125 96276
## D 1 2109 65 1 0 84 139626
## E 0 504 3 0 0 243 54750
## F 85 325 0 0 0 0 24260
## G 0 2899 3 0 0 28 176950
## H 1 581 65 1 0 443 83846
## I 2665 685 0 0 0 0 190340
## J 1 6678 65 0 1 1015 506876
## K 3090 1830 0 0 0 0 282840
## L 1 593 65 0 1 270 67276
## M 390 570 0 0 0 0 56040
## N 0 1439 2 0 0 351 121580
## O 0 1513 1 0 0 216 112450
## P 0 1199 66 0 1 388 115450
## Q 1 440 64 0 0 284 59336
## R 59 238 0 0 0 0 17584
## S 0 860 0 0 0 215 73100
## T 1 707 64 0 0 24 49356
comp <- as.data.frame(cbind(result.att[,5], result.vzw[,7]))
comp$lowest <- ifelse(comp[,1] > comp[,2], "vzw", "att")
colnames(comp) <- c("ATT", "VZW", "Lowest")
comp
## ATT VZW Lowest
## A 8980 9086 att
## B 23440 23726 att
## C 90600 96276 att
## D 135600 139626 att
## E 60000 54750 vzw
## F 24600 24260 vzw
## G 175800 176950 att
## H 98795 83846 vzw
## I 201000 190340 vzw
## J 471790 506876 att
## K 295200 282840 vzw
## L 72865 67276 vzw
## M 57600 56040 vzw
## N 120270 121580 att
## O 103800 112450 att
## P 119090 115450 vzw
## Q 67545 59336 vzw
## R 17820 17584 vzw
## S 72565 73100 att
## T 47760 49356 att