Let’s extend the stochastic frontier analysis (SFA) by incorporating a Bayesian approach. Bayesian Stochastic Frontier Analysis (BSFA) allows us to incorporate prior information about the parameters and to fully quantify uncertainty in the estimates, not just through point estimates but also by providing credible intervals.
brms
PackageWe’ll use the brms package in R, which provides a
flexible framework for Bayesian generalized (non-)linear multivariate
multilevel models using Stan. The brms package
can fit stochastic frontier models with a Bayesian approach by
specifying a log-normal likelihood and incorporating inefficiency via
the model’s random effects.
First, let’s install and load the necessary packages:
# Install and load the required packages
library(brms)
## Warning: package 'brms' was built under R version 4.0.5
## Loading required package: Rcpp
## Warning: package 'Rcpp' was built under R version 4.0.5
## Loading 'brms' package (version 2.16.3). Useful instructions
## can be found by typing help('brms'). A more detailed introduction
## to the package is available through vignette('brms_overview').
##
## Attaching package: 'brms'
## The following object is masked from 'package:stats':
##
## ar
library(rstan)
## Warning: package 'rstan' was built under R version 4.0.5
## Loading required package: StanHeaders
## Warning: package 'StanHeaders' was built under R version 4.0.5
## Loading required package: ggplot2
## rstan (Version 2.21.5, GitRev: 2e1f913d3ca3)
## For execution on a local, multicore CPU with excess RAM we recommend calling
## options(mc.cores = parallel::detectCores()).
## To avoid recompilation of unchanged Stan programs, we recommend calling
## rstan_options(auto_write = TRUE)
## Do not specify '-march=native' in 'LOCAL_CPPFLAGS' or a Makevars file
We will specify a Bayesian stochastic frontier model using a Cobb-Douglas production function, similar to the classical approach, but with Bayesian estimation.
# Simulate the agricultural data (if not already available)
set.seed(123)
n <- 100 # Number of observations
farm_size <- rnorm(n, mean = 50, sd = 10)
labor <- rnorm(n, mean = 30, sd = 5)
fertilizer <- rnorm(n, mean = 100, sd = 20)
seed <- rnorm(n, mean = 10, sd = 2)
year <- sample(2000:2020, n, replace = TRUE)
epsilon <- rnorm(n, 0, 0.2) # Random noise
# Assuming a Cobb-Douglas production function for yield
beta_0 <- 1
beta_farm_size <- 0.3
beta_labor <- 0.4
beta_fertilizer <- 0.2
beta_seed <- 0.1
beta_year <- 0.01
# Generate yield (output)
yield <- exp(beta_0 + beta_farm_size * log(farm_size) +
beta_labor * log(labor) +
beta_fertilizer * log(fertilizer) +
beta_seed * log(seed) +
beta_year * (year - 2000) + epsilon)
# Combine data into a data frame
data <- data.frame(farm_size, labor, fertilizer, seed, year, yield)
Now we will fit the Bayesian Stochastic Frontier Model using
brms. We assume that inefficiency follows a half-normal
distribution, which we can model by including a random effect in the
model.
# Define the formula for the Cobb-Douglas production function
# We use log(yield) and include a random effect for inefficiency (farm-specific)
formula <- bf(log(yield) ~ log(farm_size) + log(labor) + log(fertilizer) + log(seed) + (1 | year))
# Fit the Bayesian Stochastic Frontier Model
bsfa_model <- brm(
formula = formula,
data = data,
family = gaussian(), # Normal distribution for the log-transformed outcome
prior = c(
set_prior("normal(0, 5)", class = "b"), # Priors for coefficients
set_prior("student_t(3, 0, 10)", class = "sigma") # Prior for the residual standard deviation
),
chains = 4, # Number of Markov chains
iter = 2000, # Number of iterations per chain
warmup = 1000, # Warm-up (burn-in) iterations
seed = 123 # For reproducibility
)
## Compiling Stan program...
## Start sampling
##
## SAMPLING FOR MODEL '1406e9a66719def158ce763db90a346c' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 0.657 seconds (Warm-up)
## Chain 1: 0.25 seconds (Sampling)
## Chain 1: 0.907 seconds (Total)
## Chain 1:
##
## SAMPLING FOR MODEL '1406e9a66719def158ce763db90a346c' NOW (CHAIN 2).
## Chain 2:
## Chain 2: Gradient evaluation took 0 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2:
## Chain 2:
## Chain 2: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 2: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 2: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 2: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 2: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 2:
## Chain 2: Elapsed Time: 0.501 seconds (Warm-up)
## Chain 2: 0.217 seconds (Sampling)
## Chain 2: 0.718 seconds (Total)
## Chain 2:
##
## SAMPLING FOR MODEL '1406e9a66719def158ce763db90a346c' NOW (CHAIN 3).
## Chain 3:
## Chain 3: Gradient evaluation took 0 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3:
## Chain 3:
## Chain 3: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 3: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 3: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 3: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 3: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 3: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 3: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 3: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 3: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 3: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 3: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 3:
## Chain 3: Elapsed Time: 0.601 seconds (Warm-up)
## Chain 3: 0.267 seconds (Sampling)
## Chain 3: 0.868 seconds (Total)
## Chain 3:
##
## SAMPLING FOR MODEL '1406e9a66719def158ce763db90a346c' NOW (CHAIN 4).
## Chain 4:
## Chain 4: Gradient evaluation took 0 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4:
## Chain 4:
## Chain 4: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 4: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 4: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 4: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 4: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 4: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 4: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 4: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 4: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 4: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 4: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 4:
## Chain 4: Elapsed Time: 0.517 seconds (Warm-up)
## Chain 4: 0.267 seconds (Sampling)
## Chain 4: 0.784 seconds (Total)
## Chain 4:
# Print a summary of the model
summary(bsfa_model)
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: log(yield) ~ log(farm_size) + log(labor) + log(fertilizer) + log(seed) + (1 | year)
## Data: data (Number of observations: 100)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Group-Level Effects:
## ~year (Number of levels: 21)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.08 0.03 0.01 0.15 1.00 765 993
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.18 0.83 -1.42 1.80 1.00 5640 2735
## logfarm_size 0.39 0.10 0.19 0.60 1.00 5927 2643
## loglabor 0.34 0.12 0.10 0.57 1.00 5627 3034
## logfertilizer 0.33 0.11 0.12 0.54 1.00 5397 3084
## logseed 0.16 0.09 -0.01 0.34 1.00 6819 2834
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.18 0.02 0.16 0.22 1.00 2575 2724
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
After fitting the model, it’s important to check the diagnostics to ensure that the Markov chains have converged and that the model fits the data well.
# Plot diagnostics
plot(bsfa_model)
# Posterior predictive checks
#pp_check(bsfa_model)
#Residuals Plot Issue
# Extract fitted values
fitted_values <- fitted(bsfa_model, summary = FALSE)[, 1]
# Compute residuals manually
residuals <- data$yield - fitted_values
# Create a data frame for plotting
plot_data <- data.frame(fitted_values = fitted_values, residuals = residuals)
# Plot the residuals vs. fitted values
residuals_plot <- ggplot(plot_data, aes(x = fitted_values, y = residuals)) +
geom_point() +
labs(x = "Fitted Values", y = "Residuals", title = "Residuals vs. Fitted Values") +
theme_minimal()
print(residuals_plot)
#Posterior Predictive Checks Make sure to update the pp_check function:
# Posterior predictive checks
pp_check_plot <- pp_check(bsfa_model, ndraws = 100) +
ggtitle("Posterior Predictive Checks")
print(pp_check_plot)
The Bayesian approach to stochastic frontier analysis allows us to:
You can run the code in your R environment to fit a Bayesian
Stochastic Frontier Model using brms, and further customize
the model according to your specific needs or research questions.
References: Najkar, N.2024. Bayesian Approaches in Stochastic Frontier Analysis with R: Models, Methodologies, and Applications. Self_Published. ISBN: 1230008201768