Stochastic Frontier Analysis

This work is inspired by R documentation and Source Code from https://rdrr.io/cran/frontier/man/frontier.html

in order to find firm’s technical and cost efficiency, one can employ stochastic frontier analysis by enacting frontier package in R. If you do not have the package already, you can ignore this following syntax by removing “#” symbol at the beginning of the code chunk.

#install.packages("frontier")

Activate the frontier library:

library(frontier)

## Warning: package 'frontier' was built under R version 4.1.2

## Loading required package: micEcon

## Warning: package 'micEcon' was built under R version 4.1.2

## 
## If you have questions, suggestions, or comments regarding one of the 'micEcon' packages, please use a forum or 'tracker' at micEcon's R-Forge site:
## https://r-forge.r-project.org/projects/micecon/

## Loading required package: lmtest

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## 
## Please cite the 'frontier' package as:
## Tim Coelli and Arne Henningsen (2013). frontier: Stochastic Frontier Analysis. R package version 1.1. http://CRAN.R-Project.org/package=frontier.
## 
## If you have questions, suggestions, or comments regarding the 'frontier' package, please use a forum or 'tracker' at frontier's R-Forge site:
## https://r-forge.r-project.org/projects/frontier/

We will utilize data example from the FRONTIER 4.1 developed by Coelli, T. (1996). The first trial using (cross-section data). explore the glimpse of the data to check the outline and variables inside the dataframe

data( front41Data )
head(front41Data)

##   firm output capital labour
## 1    1 12.778   9.416 35.134
## 2    2 24.285   4.643 77.297
## 3    3 20.855   5.095 89.799
## 4    4 13.213   4.935 35.698
## 5    5 12.018   8.717 27.878
## 6    6 15.284   1.066 92.174

inspect variables inside the data:

names(front41Data)

## [1] "firm"    "output"  "capital" "labour"

str(front41Data)

## 'data.frame':    60 obs. of  4 variables:
##  $ firm   : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ output : num  12.8 24.3 20.9 13.2 12 ...
##  $ capital: num  9.42 4.64 5.09 4.93 8.72 ...
##  $ labour : num  35.1 77.3 89.8 35.7 27.9 ...

Execute Cobb-Douglas production frontier:

cobbDouglas <- sfa( log( output ) ~ log( capital ) + log( labour ),
                    data = front41Data )

display the output of stochastic frontier analysis:

summary(cobbDouglas)

## Error Components Frontier (see Battese & Coelli 1992)
## Inefficiency decreases the endogenous variable (as in a production function)
## The dependent variable is logged
## Iterative ML estimation terminated after 7 iterations:
## log likelihood values and parameters of two successive iterations
## are within the tolerance limit
## 
## final maximum likelihood estimates
##              Estimate Std. Error z value  Pr(>|z|)    
## (Intercept)  0.561619   0.202617  2.7718 0.0055742 ** 
## log(capital) 0.281102   0.047643  5.9001 3.632e-09 ***
## log(labour)  0.536480   0.045252 11.8555 < 2.2e-16 ***
## sigmaSq      0.217000   0.063909  3.3955 0.0006851 ***
## gamma        0.797207   0.136424  5.8436 5.109e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## log likelihood value: -17.02722 
## 
## cross-sectional data
## total number of observations = 60 
## 
## mean efficiency: 0.7405678

Using another data from about rice producers in the Philippines (panel data):

data(riceProdPhil)
head(riceProdPhil)

##   YEARDUM FMERCODE  PROD AREA LABOR   NPK      OTHER PRICE    AREAP   LABORP
## 1       1        1  7.87  2.5   161 207.5  52.863763     5 3821.768 66.23404
## 2       1        2 10.35  3.8   184 303.5 284.441761     5 3029.267 80.43000
## 3       1        3  9.98  3.4   170 252.0  77.616495     5 3413.692 70.40329
## 4       1        4  4.83  1.4    68  88.0  14.297674     5 4438.461 68.50588
## 5       1        5  8.74  3.6   130 149.8  82.675703     5 2527.747 81.89538
## 6       1        6  1.84  0.5    34  21.0   5.991926     5 4937.495 51.74824
##        NPKP   OTHERP AGE EDYRS HHSIZE NADULT BANRAT
## 1 14.072675 44.77946  37    10      7      4   1.00
## 2 15.239012 10.52803  35     6      5      2   0.93
## 3 13.809524 41.50158  48    10      5      5   0.58
## 4  9.886364 61.17833  37     6      3      2   0.36
## 5 13.067023 31.52229  48     6      4      4   0.72
## 6 13.817143 54.24925  49    10      3      3   1.00

inspect the data, check data type of each variables:

names(riceProdPhil)

##  [1] "YEARDUM"  "FMERCODE" "PROD"     "AREA"     "LABOR"    "NPK"     
##  [7] "OTHER"    "PRICE"    "AREAP"    "LABORP"   "NPKP"     "OTHERP"  
## [13] "AGE"      "EDYRS"    "HHSIZE"   "NADULT"   "BANRAT"

str(riceProdPhil)

## 'data.frame':    344 obs. of  17 variables:
##  $ YEARDUM : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ FMERCODE: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ PROD    : num  7.87 10.35 9.98 4.83 8.74 ...
##  $ AREA    : num  2.5 3.8 3.4 1.4 3.6 0.5 1.75 1.7 3 3.4 ...
##  $ LABOR   : num  161 184 170 68 130 34 91 114 211 192 ...
##  $ NPK     : num  208 304 252 88 150 ...
##  $ OTHER   : num  52.9 284.4 77.6 14.3 82.7 ...
##  $ PRICE   : num  5 5 5 5 5 5 5 5 5 5 ...
##  $ AREAP   : num  3822 3029 3414 4438 2528 ...
##  $ LABORP  : num  66.2 80.4 70.4 68.5 81.9 ...
##  $ NPKP    : num  14.07 15.24 13.81 9.89 13.07 ...
##  $ OTHERP  : num  44.8 10.5 41.5 61.2 31.5 ...
##  $ AGE     : int  37 35 48 37 48 49 25 52 54 62 ...
##  $ EDYRS   : int  10 6 10 6 6 10 10 6 10 6 ...
##  $ HHSIZE  : int  7 5 5 3 4 3 3 8 6 4 ...
##  $ NADULT  : int  4 2 5 2 4 3 2 8 4 2 ...
##  $ BANRAT  : num  1 0.93 0.58 0.36 0.72 1 1 1 0.5 0.93 ...

Employing Error Components Frontier (Battese & Coelli 1992) with observation-specific efficiencies (ignoring the panel structure):

rice <- sfa( log( PROD ) ~ log( AREA ) + log( LABOR ) + log( NPK ),
             data = riceProdPhil )
summary( rice )

## Error Components Frontier (see Battese & Coelli 1992)
## Inefficiency decreases the endogenous variable (as in a production function)
## The dependent variable is logged
## Iterative ML estimation terminated after 9 iterations:
## log likelihood values and parameters of two successive iterations
## are within the tolerance limit
## 
## final maximum likelihood estimates
##              Estimate Std. Error z value  Pr(>|z|)    
## (Intercept) -1.043183   0.252212 -4.1361 3.532e-05 ***
## log(AREA)    0.355520   0.060336  5.8923 3.808e-09 ***
## log(LABOR)   0.333288   0.062692  5.3163 1.059e-07 ***
## log(NPK)     0.271276   0.035238  7.6984 1.378e-14 ***
## sigmaSq      0.238634   0.026750  8.9208 < 2.2e-16 ***
## gamma        0.885391   0.035545 24.9093 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## log likelihood value: -86.20268 
## 
## cross-sectional data
## total number of observations = 344 
## 
## mean efficiency: 0.7229731

Trying another attempt still using Error Components Frontier (Battese & Coelli 1992) with “true” fixed individual effects and observation-specific efficiencies:

riceTrue <- sfa( log( PROD ) ~ log( AREA ) + log( LABOR ) + log( NPK ) + 
                   factor( FMERCODE ),  data = riceProdPhil )

## Warning in sfa(log(PROD) ~ log(AREA) + log(LABOR) + log(NPK) +
## factor(FMERCODE), : the parameter 'gamma' is close to the boundary of the
## parameter space [0,1]: this can cause convergence problems and can negatively
## affect the validity and reliability of statistical tests and might be caused by
## model misspecification

summary( riceTrue )

## Error Components Frontier (see Battese & Coelli 1992)
## Inefficiency decreases the endogenous variable (as in a production function)
## The dependent variable is logged
## Iterative ML estimation terminated after 36 iterations:
## cannot find a parameter vector that results in a log-likelihood value
## larger than the log-likelihood value obtained in the previous step
## 
## final maximum likelihood estimates
##                       Estimate  Std. Error  z value  Pr(>|z|)    
## (Intercept)         0.14901871  0.50175482   0.2970   0.76647    
## log(AREA)           0.53155134  0.08227644   6.4606 1.043e-10 ***
## log(LABOR)          0.19377818  0.11052201   1.7533   0.07955 .  
## log(NPK)            0.12389081  0.07741476   1.6004   0.10952    
## factor(FMERCODE)2   0.27513422  0.77815475   0.3536   0.72366    
## factor(FMERCODE)3  -0.01343728  0.58862288  -0.0228   0.98179    
## factor(FMERCODE)4   0.20057297  0.67502902   0.2971   0.76637    
## factor(FMERCODE)5  -0.03117240  0.39710877  -0.0785   0.93743    
## factor(FMERCODE)6  -0.19563666  0.80169651  -0.2440   0.80721    
## factor(FMERCODE)7   0.15801987  0.71936787   0.2197   0.82613    
## factor(FMERCODE)8   0.04052639  0.88469483   0.0458   0.96346    
## factor(FMERCODE)9   0.04479220  0.63256363   0.0708   0.94355    
## factor(FMERCODE)10  0.15591061  0.90545593   0.1722   0.86329    
## factor(FMERCODE)11 -0.30092416  0.24749552  -1.2159   0.22403    
## factor(FMERCODE)12  0.20205519  0.75543029   0.2675   0.78911    
## factor(FMERCODE)13 -0.09055144  0.91246418  -0.0992   0.92095    
## factor(FMERCODE)14 -0.00191789  0.62478311  -0.0031   0.99755    
## factor(FMERCODE)15 -0.38096390  0.60345085  -0.6313   0.52784    
## factor(FMERCODE)16 -0.12911985  0.83178627  -0.1552   0.87664    
## factor(FMERCODE)17  0.37877389  0.18927034   2.0012   0.04537 *  
## factor(FMERCODE)18  0.28977587  0.47352222   0.6120   0.54057    
## factor(FMERCODE)19  0.39447872  0.19488167   2.0242   0.04295 *  
## factor(FMERCODE)20  0.13030699  0.57221457   0.2277   0.81986    
## factor(FMERCODE)21  0.14694707  0.44069363   0.3334   0.73880    
## factor(FMERCODE)22  0.06868002  0.65553066   0.1048   0.91656    
## factor(FMERCODE)23 -0.03685168  0.42198817  -0.0873   0.93041    
## factor(FMERCODE)24  0.00946378  0.49612205   0.0191   0.98478    
## factor(FMERCODE)25  0.15369959  0.68591584   0.2241   0.82270    
## factor(FMERCODE)26 -0.06728899  0.60446614  -0.1113   0.91136    
## factor(FMERCODE)27  0.04860041  0.85920711   0.0566   0.95489    
## factor(FMERCODE)28  0.08921273  0.80328445   0.1111   0.91157    
## factor(FMERCODE)29 -0.01676247  0.43853474  -0.0382   0.96951    
## factor(FMERCODE)30 -0.48339665  0.24290957  -1.9900   0.04659 *  
## factor(FMERCODE)31  0.22707161  0.20506536   1.1073   0.26816    
## factor(FMERCODE)32  0.52550734  0.38819505   1.3537   0.17583    
## factor(FMERCODE)33 -0.00309571  0.87978492  -0.0035   0.99719    
## factor(FMERCODE)34 -0.71432603  0.33774672  -2.1150   0.03443 *  
## factor(FMERCODE)35  0.18284349  0.67311351   0.2716   0.78590    
## factor(FMERCODE)36  0.00068349  0.36595801   0.0019   0.99851    
## factor(FMERCODE)37  0.45285689  0.52943159   0.8554   0.39235    
## factor(FMERCODE)38  0.30217252  0.82054086   0.3683   0.71268    
## factor(FMERCODE)39 -0.14721505  0.71132349  -0.2070   0.83604    
## factor(FMERCODE)40 -0.15882656  0.75442030  -0.2105   0.83326    
## factor(FMERCODE)41  0.15199062  0.47236365   0.3218   0.74763    
## factor(FMERCODE)42  0.16804383  0.61336860   0.2740   0.78411    
## factor(FMERCODE)43  0.12419902  0.16661692   0.7454   0.45602    
## sigmaSq             0.21011941  0.01420696  14.7899 < 2.2e-16 ***
## gamma               0.99987606  0.00632933 157.9751 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## log likelihood value: 0.3139739 
## 
## cross-sectional data
## total number of observations = 344 
## 
## mean efficiency: 0.7258831

add data set with information about its panel structure:

library( "plm" )

## Warning: package 'plm' was built under R version 4.1.2

  ricePanel <- pdata.frame( riceProdPhil, c( "FMERCODE", "YEARDUM" ) )

Still using Error Components Frontier from Battese & Coelli 1992 with time-invariant efficiencies:

riceTimeInv <- sfa( log( PROD ) ~ log( AREA ) + log( LABOR ) + log( NPK ),
                    data = ricePanel )
summary( riceTimeInv )

## Error Components Frontier (see Battese & Coelli 1992)
## Inefficiency decreases the endogenous variable (as in a production function)
## The dependent variable is logged
## Iterative ML estimation terminated after 11 iterations:
## cannot find a parameter vector that results in a log-likelihood value
## larger than the log-likelihood value obtained in the previous step
## 
## final maximum likelihood estimates
##              Estimate Std. Error z value  Pr(>|z|)    
## (Intercept) -0.832169   0.275249 -3.0233    0.0025 ** 
## log(AREA)    0.453897   0.063801  7.1143 1.125e-12 ***
## log(LABOR)   0.288924   0.063639  4.5400 5.625e-06 ***
## log(NPK)     0.227544   0.040859  5.5690 2.562e-08 ***
## sigmaSq      0.155377   0.024202  6.4201 1.362e-10 ***
## gamma        0.464312   0.088023  5.2749 1.328e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## log likelihood value: -86.43042 
## 
## panel data
## number of cross-sections = 43 
## number of time periods = 8 
## total number of observations = 344 
## thus there are 0 observations not in the panel
## 
## mean efficiency: 0.8187968

This time, Error Components Frontier (Battese & Coelli 1992) with time-variant efficiencies:

riceTimeVar <- sfa( log( PROD ) ~ log( AREA ) + log( LABOR ) + log( NPK ),
                    data = ricePanel, timeEffect = TRUE )
summary( riceTimeVar )

## Error Components Frontier (see Battese & Coelli 1992)
## Inefficiency decreases the endogenous variable (as in a production function)
## The dependent variable is logged
## Iterative ML estimation terminated after 13 iterations:
## log likelihood values and parameters of two successive iterations
## are within the tolerance limit
## 
## final maximum likelihood estimates
##              Estimate Std. Error z value  Pr(>|z|)    
## (Intercept) -0.753919   0.278409 -2.7080 0.0067700 ** 
## log(AREA)    0.474918   0.065422  7.2593 3.891e-13 ***
## log(LABOR)   0.300096   0.063940  4.6934 2.687e-06 ***
## log(NPK)     0.199462   0.042633  4.6786 2.888e-06 ***
## sigmaSq      0.129956   0.021767  5.9703 2.368e-09 ***
## gamma        0.369635   0.107268  3.4459 0.0005691 ***
## time         0.058910   0.031049  1.8973 0.0577859 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## log likelihood value: -84.55036 
## 
## panel data
## number of cross-sections = 43 
## number of time periods = 8 
## total number of observations = 344 
## thus there are 0 observations not in the panel
## 
## mean efficiency of each year
##         1         2         3         4         5         6         7         8 
## 0.7848440 0.7950311 0.8048370 0.8142661 0.8233236 0.8320156 0.8403494 0.8483325 
## 
## mean efficiency: 0.8178749

Now, Technical Efficiency Effects Frontier (Battese & Coelli 1995) (efficiency effects model with intercept):

riceZ <- sfa( log( PROD ) ~ log( AREA ) + log( LABOR ) + log( NPK ) |
                EDYRS + BANRAT, data = riceProdPhil )
summary( riceZ )

## Efficiency Effects Frontier (see Battese & Coelli 1995)
## Inefficiency decreases the endogenous variable (as in a production function)
## The dependent variable is logged
## Iterative ML estimation terminated after 46 iterations:
## log likelihood values and parameters of two successive iterations
## are within the tolerance limit
## 
## final maximum likelihood estimates
##                Estimate Std. Error z value  Pr(>|z|)    
## (Intercept)   -1.051802   0.245714 -4.2806 1.864e-05 ***
## log(AREA)      0.379757   0.058010  6.5464 5.896e-11 ***
## log(LABOR)     0.321033   0.060681  5.2905 1.220e-07 ***
## log(NPK)       0.263802   0.033752  7.8160 5.454e-15 ***
## Z_(Intercept) -2.746018   8.520456 -0.3223    0.7472    
## Z_EDYRS       -0.028582   0.213517 -0.1339    0.8935    
## Z_BANRAT      -3.634977   8.119596 -0.4477    0.6544    
## sigmaSq        1.666313   3.705721  0.4497    0.6530    
## gamma          0.978795   0.045296 21.6088 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## log likelihood value: -77.31363 
## 
## cross-sectional data
## total number of observations = 344 
## 
## mean efficiency: 0.7849266

Technical Efficiency Effects Frontier (Battese & Coelli 1995) (efficiency effects model without intercept):

riceZ2 <- sfa( log( PROD ) ~ log( AREA ) + log( LABOR ) + log( NPK ) |
                 EDYRS + BANRAT - 1, data = riceProdPhil )
summary( riceZ2 )

## Efficiency Effects Frontier (see Battese & Coelli 1995)
## Inefficiency decreases the endogenous variable (as in a production function)
## The dependent variable is logged
## Iterative ML estimation terminated after 24 iterations:
## log likelihood values and parameters of two successive iterations
## are within the tolerance limit
## 
## final maximum likelihood estimates
##              Estimate Std. Error z value  Pr(>|z|)    
## (Intercept) -1.008580   0.246890 -4.0851 4.405e-05 ***
## log(AREA)    0.384099   0.059501  6.4553 1.080e-10 ***
## log(LABOR)   0.318983   0.061278  5.2055 1.935e-07 ***
## log(NPK)     0.260355   0.034371  7.5748 3.597e-14 ***
## Z_EDYRS     -0.058069   0.087303 -0.6651   0.50596    
## Z_BANRAT    -1.405673   0.906085 -1.5514   0.12081    
## sigmaSq      0.597476   0.308705  1.9354   0.05294 .  
## gamma        0.944931   0.029038 32.5409 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## log likelihood value: -77.84954 
## 
## cross-sectional data
## total number of observations = 344 
## 
## mean efficiency: 0.7707605

Finally for Cost Frontier (with land as quasi-fixed input):

riceProdPhil$cost <- riceProdPhil$LABOR * riceProdPhil$LABORP +
  riceProdPhil$NPK * riceProdPhil$NPKP
riceCost <- sfa( log( cost ) ~ log( PROD ) + log( AREA ) + log( LABORP )
                 + log( NPKP ), data = riceProdPhil, ineffDecrease = FALSE )
summary( riceCost )

## Error Components Frontier (see Battese & Coelli 1992)
## Inefficiency increases the endogenous variable (as in a cost function)
## The dependent variable is logged
## Iterative ML estimation terminated after 11 iterations:
## log likelihood values and parameters of two successive iterations
## are within the tolerance limit
## 
## final maximum likelihood estimates
##             Estimate Std. Error z value  Pr(>|z|)    
## (Intercept) 5.870429   0.201403 29.1477 < 2.2e-16 ***
## log(PROD)   0.461678   0.034405 13.4188 < 2.2e-16 ***
## log(AREA)   0.506498   0.035118 14.4226 < 2.2e-16 ***
## log(LABORP) 0.413795   0.032272 12.8222 < 2.2e-16 ***
## log(NPKP)   0.070512   0.054245  1.2999  0.193642    
## sigmaSq     0.070379   0.014346  4.9059 9.301e-07 ***
## gamma       0.530293   0.187373  2.8301  0.004653 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## log likelihood value: 39.93057 
## 
## cross-sectional data
## total number of observations = 344 
## 
## mean efficiency: 0.8631699

Disclaimer: I do not own this analysis, anyone can circulate and applying the method without citating this markdown.Please refer to the initial source code for further reference and citation.

Stochastic Frontier Analysis

Sachnaz Desta Oktarina

1/26/2022