Connect with me on Open Science Framework | Contact me via LinkedIn
Depending on your machine it might be necessary to use right-click -> open in new browser window.
R analysis script presenting the exemplary DEA (Charnes et al., 1979) with one input and one output variable discussed in chapter 4 of the Marketingmanagement lecture. Throughout the script we apply functions provided by the package Benchmarking (see, Bogetoft & Otto (2022), as well as the corresponding book Bogetoft & Otto (2011)).
If this exercise grabs your attention, please check-out our study programs at the Chemnitz University of Technology by clicking on the logo (Germany)
Loading packages
R is a context-sensitive language. Thus, ‘data’ will be interpreted not in the same way as ‘Data’ will.
In R most functionality is provided by additional packages.
Most of the packages are well-documented, see: https://cran.r-project.org/
The code chunk below first evaluates if the package pacman is already installed to your machine. If yes, the corresponding package will be loaded. If not, R will install the package.
Alternatively, you can do this manually first by executing install.packages(“pacman”) and then library(pacman).
The second line then loads the package pacman.
The third line uses the function p_load() from the pacman package to install (if necessary) and loads all packages that we provide as arguments (e.g., Benchmarking, which provides functions for Data Envelopment Analysis (DEA) and Statistical Frontier Analysis (SFA)).
if (!"pacman" %in% rownames(installed.packages())) install.packages("pacman")
library(pacman)
pacman::p_load(haven, kableExtra, knitr,
labelled, tidyverse, Benchmarking)
In all code chunks throughout this script you are able to receive additional help on each used function by clicking on its name (or via right-click and then open in new browser tab). Alternatively, when coding, we can see which arguments a function understands by pressing ‘F1’ while setting the cursor into the function’s name.
Here is the R session info which gives you information on my machine, all loaded packages and their version:
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)
Matrix products: default
locale:
[1] LC_COLLATE=German_Germany.utf8 LC_CTYPE=German_Germany.utf8
[3] LC_MONETARY=German_Germany.utf8 LC_NUMERIC=C
[5] LC_TIME=German_Germany.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] downloadthis_0.3.1 mime_0.12 htmltools_0.5.2
[4] Benchmarking_0.30 quadprog_1.5-8 ucminf_1.1-4
[7] lpSolveAPI_5.5.2.0-17.7 quarto_1.1 ggpubr_0.4.0
[10] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.9
[13] purrr_0.3.4 readr_2.1.2 tidyr_1.2.0
[16] tibble_3.1.7 ggplot2_3.3.6 tidyverse_1.3.1
[19] tidyselect_1.1.2 labelled_2.9.1 knitr_1.39
[22] kableExtra_1.3.4 haven_2.5.0 pacman_0.5.1
loaded via a namespace (and not attached):
[1] httr_1.4.3 jsonlite_1.8.0 viridisLite_0.4.0 carData_3.0-5
[5] modelr_0.1.8 assertthat_0.2.1 cellranger_1.1.0 yaml_2.3.5
[9] pillar_1.7.0 backports_1.4.1 glue_1.6.2 digest_0.6.29
[13] ggsignif_0.6.3 rvest_1.0.2 colorspace_2.0-3 pkgconfig_2.0.3
[17] broom_0.8.0 scales_1.2.0 webshot_0.5.3 processx_3.5.3
[21] svglite_2.1.0 later_1.3.0 tzdb_0.3.0 generics_0.1.2
[25] car_3.0-13 ellipsis_0.3.2 withr_2.5.0 cli_3.3.0
[29] magrittr_2.0.3 crayon_1.5.1 readxl_1.4.0 evaluate_0.15
[33] ps_1.7.0 fs_1.5.2 fansi_1.0.3 rstatix_0.7.0
[37] xml2_1.3.3 tools_4.2.0 hms_1.1.1 lifecycle_1.0.1
[41] munsell_0.5.0 reprex_2.0.1 compiler_4.2.0 systemfonts_1.0.4
[45] rlang_1.0.2 grid_4.2.0 rstudioapi_0.13 base64enc_0.1-3
[49] rmarkdown_2.14 gtable_0.3.0 abind_1.4-5 DBI_1.1.2
[53] R6_2.5.1 lubridate_1.8.0 fastmap_1.1.0 utf8_1.2.2
[57] bsplus_0.1.3 stringi_1.7.6 Rcpp_1.0.8.3 vctrs_0.4.1
[61] dbplyr_2.1.1 xfun_0.31
Import lecture data
Make sure that the file ‘lecture_data_dea.rds’ (see Download option above) is located at your RStudio project’s directory.
In the code chunk below, we hand-in the file name to the readRDS() function. To execute the call, just set the cursor somewhere in the corresponding code line and press ‘Ctr’+‘Enter’. If you encounter any problems with the read functions, please try to copy your file to the desktop and then use: ‘lecture_data_dea<-readRDS(file.choose())’. This will open a dialogue that allows for a manual selection of the file. The second line of code calls for the results.
lecture_data_dea <- readRDS("lecture_data_dea.rds")
lecture_data_dea
brands print_advertisement online_advertisement brand_recall
1 A 4 15 10
2 B 2 12 7
3 C 10 3 10
4 D 8 4 12
5 E 12 6 12
6 F 3 14 15
7 G 5 12 16
8 H 9 10 17
9 I 6 10 19
10 J 9 18 21
11 K 14 19 21
Now, the object lecture_data_dea (a data frame) became part of our working environment. We can access the data frame. For example, we can view the data frame d via the function View(). We can also preview the first part of the data set by head(). We see that the data set correspondents to the example we have discussed in chapter 4. Specifically, the example is adapted from Bauer et al. (2006, p. 278).
Create Frontier function
Following the example in the lecture, we define the variable print_advertisement as input and the variable brand_recall as output. Thus we are benchmarking each of the 11 brands in the data set with regard to its ability to efficiently transform print advertising inputs into brand recall output.
Creating a frontier function for this problem is straightforward. Below we use the dea.plot() function from the Benchmarking package. Here we hand in the print advertising variable as x argument, which defines the input variable presented at the x-axes. Furthermore, we define the brand recall variable as output argument y. Finally, we set the variable containing the brand names as argument for txt to show brand names (i.e., A to K) in the plot.
dea.plot(x = lecture_data_dea$print_advertisement,
y = lecture_data_dea$brand_recall,
txt = lecture_data_dea$brands)
The resulting plot mirrors those discussed during lecture. In an essence, we find 5 efficient brands located onto the frontier function (i.e., B, F, I, J, and K). Four of these brands are the best practice benchmarks for their respective input level and therewith receive an efficiency score of 100%. Brand K is not regarded as perfectly efficient because of the Inputslack (i.e., Brand J achieves the same level of output at a significantly lower input level). Other brands are inefficient because they “produce” lower outputs at a comparable input level (see J and H), or need more input to end at a comparable output level.
Data Envelopment Analysis (DEA)
Next, we proceed with the DEA analysis, which includes calculating input and output orientated (in)efficiency values, and the weights 𝛌 for the benchmarking brands.
Input-orientated DEA
In the code chunk below, we apply the dea() function.
Here, we set the print advertising variable as X and the brand recall variable as Y.
In addition, we set ORIENTATION = “in” in order to start with the input orientated (in)efficiency measures.
We assign the results of the DEA to a new object named dea_input.
In the second part of the code, we combine the brand name variable with the input-orientated in(efficiency) scores obtained via DEA, to present the results in a tabular layout. Within this part, we used the eff() function, which extracts the (in)efficiency scores from the DEA results object.
Within the dea() function two arguments X and Y are expected (both upper-case letters), which is different from the code above where both arguments are lower-case letters.
dea_input <- dea(
X = lecture_data_dea$print_advertisement,
Y = lecture_data_dea$brand_recall,
ORIENTATION = "in"
)
data.frame(brands = lecture_data_dea$brands, efficiency = eff(dea_input))
brands efficiency
1 A 0.5937500
2 B 1.0000000
3 C 0.2375000
4 D 0.3281250
5 E 0.2187500
6 F 1.0000000
7 G 0.7500000
8 H 0.5000000
9 I 1.0000000
10 J 1.0000000
11 K 0.6428571
As we have seen in the lecture, the brands B, F, I, and J are input-efficient, whereas brand H receives an efficiency score of only 50%. Thus, to become efficient, H must decrease it’s input level by 50% whilst maintaining the same level of brand recall (17). For other brand the deficiency is even higher. For example, brand C receives a score of 0.238. Thus, C must decrease it’s input for approximately 76.2% whilst maintaining the level of output (10). The question is from which best practice benchmarks can H learn? The answer to this questions leads to the weights 𝛌 for the benchmarking brands.
We can request these weights by applying the lambda() function.
lambda(dea_input)
L2 L6 L9 L10
[1,] 0.625 0.375 0.00 0
[2,] 1.000 0.000 0.00 0
[3,] 0.625 0.375 0.00 0
[4,] 0.375 0.625 0.00 0
[5,] 0.375 0.625 0.00 0
[6,] 0.000 1.000 0.00 0
[7,] 0.000 0.750 0.25 0
[8,] 0.000 0.500 0.50 0
[9,] 0.000 0.000 1.00 0
[10,] 0.000 0.000 0.00 1
[11,] 0.000 0.000 0.00 1
Lets explain the resultant output. The rows are numbered from 1 to 11 where, again, 8 corresponds to brand H. For this particular brand we can see that two other brands serve as best practice benchmark (i.e., L6 and L9). These labels correspond to the 6th and 7th brand in the data set. Put differently, the best practice benchmarks for H are F and I (as discussed during lecture). Both benchmarks received a weight 𝛌 of 0.5. Thus, brand managers of H could most likely learn from these to brands because they have a comparable input level and transform inputs very well to outputs.
In the case of brand C (line 3), we see that two brands serve as benchmarks with a weight of 0.625 (B) and 0.375 (F). The outputs of B and F are 7 and 15. The inputs are 2 and 3. Thus the target values for improvement are 0.625⋅2+0.375⋅3=2,375 (input) and 0.625⋅7+0.375⋅15=10 (output).
The summary() function below summarizes the distribution of (in)efficiency scores among the set of brands.
summary(dea_input)
Summary of efficiencies
VRS technology and input orientated efficiency
Number of firms with efficiency==1 are 4 out of 11
Mean efficiency: 0.661
---
Eff range # %
0.2<= E <0.3 2 18.2
0.3<= E <0.4 1 9.1
0.4<= E <0.5 0 0.0
0.5<= E <0.6 2 18.2
0.6<= E <0.7 1 9.1
0.7<= E <0.8 1 9.1
0.8<= E <0.9 0 0.0
0.9<= E <1 0 0.0
E ==1 4 36.4
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.2188 0.4141 0.6429 0.6610 1.0000 1.0000
Again, the output tells us that we find 4 efficient brands. In conjunction, the mean (in)efficiency is only 66.1%.
Output-orientated DEA
Next, we proceed with output-orientated DEA. Here we ask for each inefficient brand: how much must a brand increase it’s output by a given input level. The code below is very similar to the code that we have used for the input-orientated variant, except for two details. First, we name the results object differently (dea_output). Second, we set ORIENTATION = “out”.
dea_output <- dea(X = lecture_data_dea$print_advertisement,
Y = lecture_data_dea$brand_recall,
ORIENTATION = "out")
lambda(dea_output)
L2 L6 L9 L10
[1,] 0 0.6666667 0.3333333 0.0000000
[2,] 1 0.0000000 0.0000000 0.0000000
[3,] 0 0.0000000 0.0000000 1.0000000
[4,] 0 0.0000000 0.3333333 0.6666667
[5,] 0 0.0000000 0.0000000 1.0000000
[6,] 0 1.0000000 0.0000000 0.0000000
[7,] 0 0.3333333 0.6666667 0.0000000
[8,] 0 0.0000000 0.0000000 1.0000000
[9,] 0 0.0000000 1.0000000 0.0000000
[10,] 0 0.0000000 0.0000000 1.0000000
[11,] 0 0.0000000 0.0000000 1.0000000
data.frame(brands = lecture_data_dea$brands, efficiency = eff(dea_output)-1)
brands efficiency
1 A 0.6333333
2 B 0.0000000
3 C 1.1000000
4 D 0.6944444
5 E 0.7500000
6 F 0.0000000
7 G 0.1041667
8 H 0.2352941
9 I 0.0000000
10 J 0.0000000
11 K 0.0000000
summary(dea_output)
Summary of efficiencies
VRS technology and output orientated efficiency
Number of firms with efficiency==1 are 5 out of 11
Mean efficiency: 1.32
---
Eff range # %
F ==1 5 45.5
1< F =<1.1 0 0.0
1.1< F =<1.2 1 9.1
1.2< F =<1.3 1 9.1
1.3< F =<1.5 0 0.0
1.5< F =< 2 3 27.3
2< F =< 5 1 9.1
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000 1.000 1.104 1.320 1.664 2.100
In the code above, we subtracting 1 from the (in)efficiency scores to present the same format as we discussed during lecture. Thus, a value of 0.2353 for H means that this brand should increase it’s output by 23.53% whilst maintaining it’s input level (9).
Your Turn
Try to conduct a similar DEA with the other input variable provided by the data set (online_advertisment).
Using RStudio Cloud
If you are new to the R universe (rumors say there are still folks outside), it is also an option to run all of the code presented in the script via RStudio Cloud. This solution allows you to analyze your data using RStudio, directly from your web browser, which comes with several advantages:
- For you as student its for free (just sign up)
- No need to install R & RStudio
- No need to update R and packages
- All packages used are already installed and loaded in the corresponding RStudio Cloud project https://rstudio.cloud/content/4152691.
Just make sure to create a copy of the provided project to your own account! Otherwise, results are not saved (it’s a temporary project).
References
Citation
@misc{dr.marcellichters,
author = {Univ.-Prof. Dr. Marcel Lichters},
title = {Lecture {Marketingmanagement:} {Chapter} 4 - {Example}
{Brand} {Efficiency} via {Data} {Envelopment} {Analysis} {(DEA)}},
date = {},
url = {https://rpubs.com/M_Lichters/DEA},
langid = {en}
}