This work is inspired by R documentation and Source Code from https://rdrr.io/cran/frontier/man/frontier.html
in order to find firm’s technical and cost efficiency, one can employ stochastic frontier analysis by enacting frontier package in R. If you do not have the package already, you can ignore this following syntax by removing “#” symbol at the beginning of the code chunk.
#install.packages("frontier")
Activate the frontier library:
library(frontier)
## Warning: package 'frontier' was built under R version 4.1.2
## Loading required package: micEcon
## Warning: package 'micEcon' was built under R version 4.1.2
##
## If you have questions, suggestions, or comments regarding one of the 'micEcon' packages, please use a forum or 'tracker' at micEcon's R-Forge site:
## https://r-forge.r-project.org/projects/micecon/
## Loading required package: lmtest
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Please cite the 'frontier' package as:
## Tim Coelli and Arne Henningsen (2013). frontier: Stochastic Frontier Analysis. R package version 1.1. http://CRAN.R-Project.org/package=frontier.
##
## If you have questions, suggestions, or comments regarding the 'frontier' package, please use a forum or 'tracker' at frontier's R-Forge site:
## https://r-forge.r-project.org/projects/frontier/
We will utilize data example from the FRONTIER 4.1 developed by Coelli, T. (1996). The first trial using (cross-section data). explore the glimpse of the data to check the outline and variables inside the dataframe
data( front41Data )
head(front41Data)
## firm output capital labour
## 1 1 12.778 9.416 35.134
## 2 2 24.285 4.643 77.297
## 3 3 20.855 5.095 89.799
## 4 4 13.213 4.935 35.698
## 5 5 12.018 8.717 27.878
## 6 6 15.284 1.066 92.174
inspect variables inside the data:
names(front41Data)
## [1] "firm" "output" "capital" "labour"
str(front41Data)
## 'data.frame': 60 obs. of 4 variables:
## $ firm : int 1 2 3 4 5 6 7 8 9 10 ...
## $ output : num 12.8 24.3 20.9 13.2 12 ...
## $ capital: num 9.42 4.64 5.09 4.93 8.72 ...
## $ labour : num 35.1 77.3 89.8 35.7 27.9 ...
Execute Cobb-Douglas production frontier:
cobbDouglas <- sfa( log( output ) ~ log( capital ) + log( labour ),
data = front41Data )
display the output of stochastic frontier analysis:
summary(cobbDouglas)
## Error Components Frontier (see Battese & Coelli 1992)
## Inefficiency decreases the endogenous variable (as in a production function)
## The dependent variable is logged
## Iterative ML estimation terminated after 7 iterations:
## log likelihood values and parameters of two successive iterations
## are within the tolerance limit
##
## final maximum likelihood estimates
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.561619 0.202617 2.7718 0.0055742 **
## log(capital) 0.281102 0.047643 5.9001 3.632e-09 ***
## log(labour) 0.536480 0.045252 11.8555 < 2.2e-16 ***
## sigmaSq 0.217000 0.063909 3.3955 0.0006851 ***
## gamma 0.797207 0.136424 5.8436 5.109e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## log likelihood value: -17.02722
##
## cross-sectional data
## total number of observations = 60
##
## mean efficiency: 0.7405678
Using another data from about rice producers in the Philippines (panel data):
data(riceProdPhil)
head(riceProdPhil)
## YEARDUM FMERCODE PROD AREA LABOR NPK OTHER PRICE AREAP LABORP
## 1 1 1 7.87 2.5 161 207.5 52.863763 5 3821.768 66.23404
## 2 1 2 10.35 3.8 184 303.5 284.441761 5 3029.267 80.43000
## 3 1 3 9.98 3.4 170 252.0 77.616495 5 3413.692 70.40329
## 4 1 4 4.83 1.4 68 88.0 14.297674 5 4438.461 68.50588
## 5 1 5 8.74 3.6 130 149.8 82.675703 5 2527.747 81.89538
## 6 1 6 1.84 0.5 34 21.0 5.991926 5 4937.495 51.74824
## NPKP OTHERP AGE EDYRS HHSIZE NADULT BANRAT
## 1 14.072675 44.77946 37 10 7 4 1.00
## 2 15.239012 10.52803 35 6 5 2 0.93
## 3 13.809524 41.50158 48 10 5 5 0.58
## 4 9.886364 61.17833 37 6 3 2 0.36
## 5 13.067023 31.52229 48 6 4 4 0.72
## 6 13.817143 54.24925 49 10 3 3 1.00
inspect the data, check data type of each variables:
names(riceProdPhil)
## [1] "YEARDUM" "FMERCODE" "PROD" "AREA" "LABOR" "NPK"
## [7] "OTHER" "PRICE" "AREAP" "LABORP" "NPKP" "OTHERP"
## [13] "AGE" "EDYRS" "HHSIZE" "NADULT" "BANRAT"
str(riceProdPhil)
## 'data.frame': 344 obs. of 17 variables:
## $ YEARDUM : int 1 1 1 1 1 1 1 1 1 1 ...
## $ FMERCODE: int 1 2 3 4 5 6 7 8 9 10 ...
## $ PROD : num 7.87 10.35 9.98 4.83 8.74 ...
## $ AREA : num 2.5 3.8 3.4 1.4 3.6 0.5 1.75 1.7 3 3.4 ...
## $ LABOR : num 161 184 170 68 130 34 91 114 211 192 ...
## $ NPK : num 208 304 252 88 150 ...
## $ OTHER : num 52.9 284.4 77.6 14.3 82.7 ...
## $ PRICE : num 5 5 5 5 5 5 5 5 5 5 ...
## $ AREAP : num 3822 3029 3414 4438 2528 ...
## $ LABORP : num 66.2 80.4 70.4 68.5 81.9 ...
## $ NPKP : num 14.07 15.24 13.81 9.89 13.07 ...
## $ OTHERP : num 44.8 10.5 41.5 61.2 31.5 ...
## $ AGE : int 37 35 48 37 48 49 25 52 54 62 ...
## $ EDYRS : int 10 6 10 6 6 10 10 6 10 6 ...
## $ HHSIZE : int 7 5 5 3 4 3 3 8 6 4 ...
## $ NADULT : int 4 2 5 2 4 3 2 8 4 2 ...
## $ BANRAT : num 1 0.93 0.58 0.36 0.72 1 1 1 0.5 0.93 ...
Employing Error Components Frontier (Battese & Coelli 1992) with observation-specific efficiencies (ignoring the panel structure):
rice <- sfa( log( PROD ) ~ log( AREA ) + log( LABOR ) + log( NPK ),
data = riceProdPhil )
summary( rice )
## Error Components Frontier (see Battese & Coelli 1992)
## Inefficiency decreases the endogenous variable (as in a production function)
## The dependent variable is logged
## Iterative ML estimation terminated after 9 iterations:
## log likelihood values and parameters of two successive iterations
## are within the tolerance limit
##
## final maximum likelihood estimates
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.043183 0.252212 -4.1361 3.532e-05 ***
## log(AREA) 0.355520 0.060336 5.8923 3.808e-09 ***
## log(LABOR) 0.333288 0.062692 5.3163 1.059e-07 ***
## log(NPK) 0.271276 0.035238 7.6984 1.378e-14 ***
## sigmaSq 0.238634 0.026750 8.9208 < 2.2e-16 ***
## gamma 0.885391 0.035545 24.9093 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## log likelihood value: -86.20268
##
## cross-sectional data
## total number of observations = 344
##
## mean efficiency: 0.7229731
Trying another attempt still using Error Components Frontier (Battese & Coelli 1992) with “true” fixed individual effects and observation-specific efficiencies:
riceTrue <- sfa( log( PROD ) ~ log( AREA ) + log( LABOR ) + log( NPK ) +
factor( FMERCODE ), data = riceProdPhil )
## Warning in sfa(log(PROD) ~ log(AREA) + log(LABOR) + log(NPK) +
## factor(FMERCODE), : the parameter 'gamma' is close to the boundary of the
## parameter space [0,1]: this can cause convergence problems and can negatively
## affect the validity and reliability of statistical tests and might be caused by
## model misspecification
summary( riceTrue )
## Error Components Frontier (see Battese & Coelli 1992)
## Inefficiency decreases the endogenous variable (as in a production function)
## The dependent variable is logged
## Iterative ML estimation terminated after 36 iterations:
## cannot find a parameter vector that results in a log-likelihood value
## larger than the log-likelihood value obtained in the previous step
##
## final maximum likelihood estimates
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.14901871 0.50175482 0.2970 0.76647
## log(AREA) 0.53155134 0.08227644 6.4606 1.043e-10 ***
## log(LABOR) 0.19377818 0.11052201 1.7533 0.07955 .
## log(NPK) 0.12389081 0.07741476 1.6004 0.10952
## factor(FMERCODE)2 0.27513422 0.77815475 0.3536 0.72366
## factor(FMERCODE)3 -0.01343728 0.58862288 -0.0228 0.98179
## factor(FMERCODE)4 0.20057297 0.67502902 0.2971 0.76637
## factor(FMERCODE)5 -0.03117240 0.39710877 -0.0785 0.93743
## factor(FMERCODE)6 -0.19563666 0.80169651 -0.2440 0.80721
## factor(FMERCODE)7 0.15801987 0.71936787 0.2197 0.82613
## factor(FMERCODE)8 0.04052639 0.88469483 0.0458 0.96346
## factor(FMERCODE)9 0.04479220 0.63256363 0.0708 0.94355
## factor(FMERCODE)10 0.15591061 0.90545593 0.1722 0.86329
## factor(FMERCODE)11 -0.30092416 0.24749552 -1.2159 0.22403
## factor(FMERCODE)12 0.20205519 0.75543029 0.2675 0.78911
## factor(FMERCODE)13 -0.09055144 0.91246418 -0.0992 0.92095
## factor(FMERCODE)14 -0.00191789 0.62478311 -0.0031 0.99755
## factor(FMERCODE)15 -0.38096390 0.60345085 -0.6313 0.52784
## factor(FMERCODE)16 -0.12911985 0.83178627 -0.1552 0.87664
## factor(FMERCODE)17 0.37877389 0.18927034 2.0012 0.04537 *
## factor(FMERCODE)18 0.28977587 0.47352222 0.6120 0.54057
## factor(FMERCODE)19 0.39447872 0.19488167 2.0242 0.04295 *
## factor(FMERCODE)20 0.13030699 0.57221457 0.2277 0.81986
## factor(FMERCODE)21 0.14694707 0.44069363 0.3334 0.73880
## factor(FMERCODE)22 0.06868002 0.65553066 0.1048 0.91656
## factor(FMERCODE)23 -0.03685168 0.42198817 -0.0873 0.93041
## factor(FMERCODE)24 0.00946378 0.49612205 0.0191 0.98478
## factor(FMERCODE)25 0.15369959 0.68591584 0.2241 0.82270
## factor(FMERCODE)26 -0.06728899 0.60446614 -0.1113 0.91136
## factor(FMERCODE)27 0.04860041 0.85920711 0.0566 0.95489
## factor(FMERCODE)28 0.08921273 0.80328445 0.1111 0.91157
## factor(FMERCODE)29 -0.01676247 0.43853474 -0.0382 0.96951
## factor(FMERCODE)30 -0.48339665 0.24290957 -1.9900 0.04659 *
## factor(FMERCODE)31 0.22707161 0.20506536 1.1073 0.26816
## factor(FMERCODE)32 0.52550734 0.38819505 1.3537 0.17583
## factor(FMERCODE)33 -0.00309571 0.87978492 -0.0035 0.99719
## factor(FMERCODE)34 -0.71432603 0.33774672 -2.1150 0.03443 *
## factor(FMERCODE)35 0.18284349 0.67311351 0.2716 0.78590
## factor(FMERCODE)36 0.00068349 0.36595801 0.0019 0.99851
## factor(FMERCODE)37 0.45285689 0.52943159 0.8554 0.39235
## factor(FMERCODE)38 0.30217252 0.82054086 0.3683 0.71268
## factor(FMERCODE)39 -0.14721505 0.71132349 -0.2070 0.83604
## factor(FMERCODE)40 -0.15882656 0.75442030 -0.2105 0.83326
## factor(FMERCODE)41 0.15199062 0.47236365 0.3218 0.74763
## factor(FMERCODE)42 0.16804383 0.61336860 0.2740 0.78411
## factor(FMERCODE)43 0.12419902 0.16661692 0.7454 0.45602
## sigmaSq 0.21011941 0.01420696 14.7899 < 2.2e-16 ***
## gamma 0.99987606 0.00632933 157.9751 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## log likelihood value: 0.3139739
##
## cross-sectional data
## total number of observations = 344
##
## mean efficiency: 0.7258831
add data set with information about its panel structure:
library( "plm" )
## Warning: package 'plm' was built under R version 4.1.2
ricePanel <- pdata.frame( riceProdPhil, c( "FMERCODE", "YEARDUM" ) )
Still using Error Components Frontier from Battese & Coelli 1992 with time-invariant efficiencies:
riceTimeInv <- sfa( log( PROD ) ~ log( AREA ) + log( LABOR ) + log( NPK ),
data = ricePanel )
summary( riceTimeInv )
## Error Components Frontier (see Battese & Coelli 1992)
## Inefficiency decreases the endogenous variable (as in a production function)
## The dependent variable is logged
## Iterative ML estimation terminated after 11 iterations:
## cannot find a parameter vector that results in a log-likelihood value
## larger than the log-likelihood value obtained in the previous step
##
## final maximum likelihood estimates
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.832169 0.275249 -3.0233 0.0025 **
## log(AREA) 0.453897 0.063801 7.1143 1.125e-12 ***
## log(LABOR) 0.288924 0.063639 4.5400 5.625e-06 ***
## log(NPK) 0.227544 0.040859 5.5690 2.562e-08 ***
## sigmaSq 0.155377 0.024202 6.4201 1.362e-10 ***
## gamma 0.464312 0.088023 5.2749 1.328e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## log likelihood value: -86.43042
##
## panel data
## number of cross-sections = 43
## number of time periods = 8
## total number of observations = 344
## thus there are 0 observations not in the panel
##
## mean efficiency: 0.8187968
This time, Error Components Frontier (Battese & Coelli 1992) with time-variant efficiencies:
riceTimeVar <- sfa( log( PROD ) ~ log( AREA ) + log( LABOR ) + log( NPK ),
data = ricePanel, timeEffect = TRUE )
summary( riceTimeVar )
## Error Components Frontier (see Battese & Coelli 1992)
## Inefficiency decreases the endogenous variable (as in a production function)
## The dependent variable is logged
## Iterative ML estimation terminated after 13 iterations:
## log likelihood values and parameters of two successive iterations
## are within the tolerance limit
##
## final maximum likelihood estimates
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.753919 0.278409 -2.7080 0.0067700 **
## log(AREA) 0.474918 0.065422 7.2593 3.891e-13 ***
## log(LABOR) 0.300096 0.063940 4.6934 2.687e-06 ***
## log(NPK) 0.199462 0.042633 4.6786 2.888e-06 ***
## sigmaSq 0.129956 0.021767 5.9703 2.368e-09 ***
## gamma 0.369635 0.107268 3.4459 0.0005691 ***
## time 0.058910 0.031049 1.8973 0.0577859 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## log likelihood value: -84.55036
##
## panel data
## number of cross-sections = 43
## number of time periods = 8
## total number of observations = 344
## thus there are 0 observations not in the panel
##
## mean efficiency of each year
## 1 2 3 4 5 6 7 8
## 0.7848440 0.7950311 0.8048370 0.8142661 0.8233236 0.8320156 0.8403494 0.8483325
##
## mean efficiency: 0.8178749
Now, Technical Efficiency Effects Frontier (Battese & Coelli 1995) (efficiency effects model with intercept):
riceZ <- sfa( log( PROD ) ~ log( AREA ) + log( LABOR ) + log( NPK ) |
EDYRS + BANRAT, data = riceProdPhil )
summary( riceZ )
## Efficiency Effects Frontier (see Battese & Coelli 1995)
## Inefficiency decreases the endogenous variable (as in a production function)
## The dependent variable is logged
## Iterative ML estimation terminated after 46 iterations:
## log likelihood values and parameters of two successive iterations
## are within the tolerance limit
##
## final maximum likelihood estimates
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.051802 0.245714 -4.2806 1.864e-05 ***
## log(AREA) 0.379757 0.058010 6.5464 5.896e-11 ***
## log(LABOR) 0.321033 0.060681 5.2905 1.220e-07 ***
## log(NPK) 0.263802 0.033752 7.8160 5.454e-15 ***
## Z_(Intercept) -2.746018 8.520456 -0.3223 0.7472
## Z_EDYRS -0.028582 0.213517 -0.1339 0.8935
## Z_BANRAT -3.634977 8.119596 -0.4477 0.6544
## sigmaSq 1.666313 3.705721 0.4497 0.6530
## gamma 0.978795 0.045296 21.6088 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## log likelihood value: -77.31363
##
## cross-sectional data
## total number of observations = 344
##
## mean efficiency: 0.7849266
Technical Efficiency Effects Frontier (Battese & Coelli 1995) (efficiency effects model without intercept):
riceZ2 <- sfa( log( PROD ) ~ log( AREA ) + log( LABOR ) + log( NPK ) |
EDYRS + BANRAT - 1, data = riceProdPhil )
summary( riceZ2 )
## Efficiency Effects Frontier (see Battese & Coelli 1995)
## Inefficiency decreases the endogenous variable (as in a production function)
## The dependent variable is logged
## Iterative ML estimation terminated after 24 iterations:
## log likelihood values and parameters of two successive iterations
## are within the tolerance limit
##
## final maximum likelihood estimates
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.008580 0.246890 -4.0851 4.405e-05 ***
## log(AREA) 0.384099 0.059501 6.4553 1.080e-10 ***
## log(LABOR) 0.318983 0.061278 5.2055 1.935e-07 ***
## log(NPK) 0.260355 0.034371 7.5748 3.597e-14 ***
## Z_EDYRS -0.058069 0.087303 -0.6651 0.50596
## Z_BANRAT -1.405673 0.906085 -1.5514 0.12081
## sigmaSq 0.597476 0.308705 1.9354 0.05294 .
## gamma 0.944931 0.029038 32.5409 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## log likelihood value: -77.84954
##
## cross-sectional data
## total number of observations = 344
##
## mean efficiency: 0.7707605
Finally for Cost Frontier (with land as quasi-fixed input):
riceProdPhil$cost <- riceProdPhil$LABOR * riceProdPhil$LABORP +
riceProdPhil$NPK * riceProdPhil$NPKP
riceCost <- sfa( log( cost ) ~ log( PROD ) + log( AREA ) + log( LABORP )
+ log( NPKP ), data = riceProdPhil, ineffDecrease = FALSE )
summary( riceCost )
## Error Components Frontier (see Battese & Coelli 1992)
## Inefficiency increases the endogenous variable (as in a cost function)
## The dependent variable is logged
## Iterative ML estimation terminated after 11 iterations:
## log likelihood values and parameters of two successive iterations
## are within the tolerance limit
##
## final maximum likelihood estimates
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.870429 0.201403 29.1477 < 2.2e-16 ***
## log(PROD) 0.461678 0.034405 13.4188 < 2.2e-16 ***
## log(AREA) 0.506498 0.035118 14.4226 < 2.2e-16 ***
## log(LABORP) 0.413795 0.032272 12.8222 < 2.2e-16 ***
## log(NPKP) 0.070512 0.054245 1.2999 0.193642
## sigmaSq 0.070379 0.014346 4.9059 9.301e-07 ***
## gamma 0.530293 0.187373 2.8301 0.004653 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## log likelihood value: 39.93057
##
## cross-sectional data
## total number of observations = 344
##
## mean efficiency: 0.8631699
Disclaimer: I do not own this analysis, anyone can circulate and applying the method without citating this markdown.Please refer to the initial source code for further reference and citation.