Let’s extend the stochastic frontier analysis (SFA) by incorporating a Bayesian approach. Bayesian Stochastic Frontier Analysis (BSFA) allows us to incorporate prior information about the parameters and to fully quantify uncertainty in the estimates, not just through point estimates but also by providing credible intervals.

1. Bayesian Stochastic Frontier Analysis with brms Package

We’ll use the brms package in R, which provides a flexible framework for Bayesian generalized (non-)linear multivariate multilevel models using Stan. The brms package can fit stochastic frontier models with a Bayesian approach by specifying a log-normal likelihood and incorporating inefficiency via the model’s random effects.

First, let’s install and load the necessary packages:

# Install and load the required packages

library(brms)
## Warning: package 'brms' was built under R version 4.0.5
## Loading required package: Rcpp
## Warning: package 'Rcpp' was built under R version 4.0.5
## Loading 'brms' package (version 2.16.3). Useful instructions
## can be found by typing help('brms'). A more detailed introduction
## to the package is available through vignette('brms_overview').
## 
## Attaching package: 'brms'
## The following object is masked from 'package:stats':
## 
##     ar
library(rstan)
## Warning: package 'rstan' was built under R version 4.0.5
## Loading required package: StanHeaders
## Warning: package 'StanHeaders' was built under R version 4.0.5
## Loading required package: ggplot2
## rstan (Version 2.21.5, GitRev: 2e1f913d3ca3)
## For execution on a local, multicore CPU with excess RAM we recommend calling
## options(mc.cores = parallel::detectCores()).
## To avoid recompilation of unchanged Stan programs, we recommend calling
## rstan_options(auto_write = TRUE)
## Do not specify '-march=native' in 'LOCAL_CPPFLAGS' or a Makevars file

2. Define the Model

We will specify a Bayesian stochastic frontier model using a Cobb-Douglas production function, similar to the classical approach, but with Bayesian estimation.

# Simulate the agricultural data (if not already available)
set.seed(123)
n <- 100  # Number of observations
farm_size <- rnorm(n, mean = 50, sd = 10)
labor <- rnorm(n, mean = 30, sd = 5)
fertilizer <- rnorm(n, mean = 100, sd = 20)
seed <- rnorm(n, mean = 10, sd = 2)
year <- sample(2000:2020, n, replace = TRUE)
epsilon <- rnorm(n, 0, 0.2)  # Random noise

# Assuming a Cobb-Douglas production function for yield
beta_0 <- 1
beta_farm_size <- 0.3
beta_labor <- 0.4
beta_fertilizer <- 0.2
beta_seed <- 0.1
beta_year <- 0.01

# Generate yield (output)
yield <- exp(beta_0 + beta_farm_size * log(farm_size) + 
             beta_labor * log(labor) +
             beta_fertilizer * log(fertilizer) +
             beta_seed * log(seed) + 
             beta_year * (year - 2000) + epsilon)

# Combine data into a data frame
data <- data.frame(farm_size, labor, fertilizer, seed, year, yield)

3. Fit the Bayesian Stochastic Frontier Model

Now we will fit the Bayesian Stochastic Frontier Model using brms. We assume that inefficiency follows a half-normal distribution, which we can model by including a random effect in the model.

# Define the formula for the Cobb-Douglas production function
# We use log(yield) and include a random effect for inefficiency (farm-specific)
formula <- bf(log(yield) ~ log(farm_size) + log(labor) + log(fertilizer) + log(seed) + (1 | year))

# Fit the Bayesian Stochastic Frontier Model
bsfa_model <- brm(
  formula = formula,
  data = data,
  family = gaussian(),  # Normal distribution for the log-transformed outcome
  prior = c(
    set_prior("normal(0, 5)", class = "b"),          # Priors for coefficients
    set_prior("student_t(3, 0, 10)", class = "sigma") # Prior for the residual standard deviation
  ),
  chains = 4,  # Number of Markov chains
  iter = 2000, # Number of iterations per chain
  warmup = 1000,  # Warm-up (burn-in) iterations
  seed = 123   # For reproducibility
)
## Compiling Stan program...
## Start sampling
## 
## SAMPLING FOR MODEL '1406e9a66719def158ce763db90a346c' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.657 seconds (Warm-up)
## Chain 1:                0.25 seconds (Sampling)
## Chain 1:                0.907 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL '1406e9a66719def158ce763db90a346c' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 0 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 0.501 seconds (Warm-up)
## Chain 2:                0.217 seconds (Sampling)
## Chain 2:                0.718 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL '1406e9a66719def158ce763db90a346c' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 0 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 0.601 seconds (Warm-up)
## Chain 3:                0.267 seconds (Sampling)
## Chain 3:                0.868 seconds (Total)
## Chain 3: 
## 
## SAMPLING FOR MODEL '1406e9a66719def158ce763db90a346c' NOW (CHAIN 4).
## Chain 4: 
## Chain 4: Gradient evaluation took 0 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4: 
## Chain 4: 
## Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 4: 
## Chain 4:  Elapsed Time: 0.517 seconds (Warm-up)
## Chain 4:                0.267 seconds (Sampling)
## Chain 4:                0.784 seconds (Total)
## Chain 4:
# Print a summary of the model
summary(bsfa_model)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: log(yield) ~ log(farm_size) + log(labor) + log(fertilizer) + log(seed) + (1 | year) 
##    Data: data (Number of observations: 100) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Group-Level Effects: 
## ~year (Number of levels: 21) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.08      0.03     0.01     0.15 1.00      765      993
## 
## Population-Level Effects: 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept         0.18      0.83    -1.42     1.80 1.00     5640     2735
## logfarm_size      0.39      0.10     0.19     0.60 1.00     5927     2643
## loglabor          0.34      0.12     0.10     0.57 1.00     5627     3034
## logfertilizer     0.33      0.11     0.12     0.54 1.00     5397     3084
## logseed           0.16      0.09    -0.01     0.34 1.00     6819     2834
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.18      0.02     0.16     0.22 1.00     2575     2724
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

4. Diagnostics and Model Assessment

After fitting the model, it’s important to check the diagnostics to ensure that the Markov chains have converged and that the model fits the data well.

# Plot diagnostics
plot(bsfa_model)

# Posterior predictive checks
#pp_check(bsfa_model)

#Residuals Plot Issue

# Extract fitted values
fitted_values <- fitted(bsfa_model, summary = FALSE)[, 1]

# Compute residuals manually
residuals <- data$yield - fitted_values

# Create a data frame for plotting
plot_data <- data.frame(fitted_values = fitted_values, residuals = residuals)

# Plot the residuals vs. fitted values
residuals_plot <- ggplot(plot_data, aes(x = fitted_values, y = residuals)) +
  geom_point() +
  labs(x = "Fitted Values", y = "Residuals", title = "Residuals vs. Fitted Values") +
  theme_minimal()

print(residuals_plot)

#Posterior Predictive Checks Make sure to update the pp_check function:

# Posterior predictive checks
pp_check_plot <- pp_check(bsfa_model, ndraws = 100) +
  ggtitle("Posterior Predictive Checks")

print(pp_check_plot)

The Bayesian approach to stochastic frontier analysis allows us to:

  1. Model the production function using prior information and quantify uncertainty more comprehensively.
  2. Incorporate inefficiency as a random effect and obtain posterior distributions for efficiency scores.
  3. Visualize posterior distributions of parameters and efficiency scores with credible intervals.
  4. Perform diagnostics to ensure that the Bayesian model is well-specified and that the chains have converged.

You can run the code in your R environment to fit a Bayesian Stochastic Frontier Model using brms, and further customize the model according to your specific needs or research questions.

References: Najkar, N.2024. Bayesian Approaches in Stochastic Frontier Analysis with R: Models, Methodologies, and Applications. Self_Published. ISBN: 1230008201768