Employee productivity is a key factor in an organization’s performance. To optimize processes and raise overall effectiveness, a business needs to understand what drives that productivity. Bayesian analysis offers a useful framework for this: it combines prior knowledge with observed data and quantifies the uncertainty in the resulting estimates.
In this study, we use Bayesian analysis to examine employee productivity. We fit a Bayesian linear regression that relates actual productivity to targeted productivity and SMV, so that uncertainty in the parameter estimates carries through to the conclusions. Prior beliefs about the parameters are stated explicitly and then revised in light of the data.
# Data Description
The dataset used for this study was obtained from Kaggle and consists of 1017 observations with 4 variables. The dataset aims to provide insights into employee productivity within an organization. The variables included in the dataset are described as follows:
“team”: The team to which an employee belongs. In this dataset it is coded as a number from 1 to 12.
“targeted_productivity”: The productivity target set for each team or person, that is, the predetermined level of output expected for a given activity or process.
“smv”: Standard Minute Value, the time in minutes allotted to complete a task. It is a standard measure in industrial engineering for how long an operation should take.
“actual_productivity”: The productivity actually achieved by the team or person, as observed or measured. It is the quantitative outcome that is compared against the target.
#Import the data
library(readr)
data <- read_csv("Employee productivity.csv")
#summary of the data
summary(data)
## team targeted_productivity smv actual_productivity
## Min. : 1.000 Min. :0.0700 Min. : 2.90 Min. :0.2337
## 1st Qu.: 3.000 1st Qu.:0.7000 1st Qu.: 3.94 1st Qu.:0.6515
## Median : 7.000 Median :0.7500 Median :15.26 Median :0.7733
## Mean : 6.443 Mean :0.7307 Mean :15.15 Mean :0.7365
## 3rd Qu.: 9.000 3rd Qu.:0.8000 3rd Qu.:24.26 3rd Qu.:0.8502
## Max. :12.000 Max. :0.8000 Max. :54.56 Max. :1.1081
#structure of the data
str(data)
## spc_tbl_ [1,017 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ team : num [1:1017] 9 7 3 1 4 3 3 6 9 12 ...
## $ targeted_productivity: num [1:1017] 0.75 0.65 0.8 0.65 0.7 0.75 0.8 0.7 0.8 0.8 ...
## $ smv : num [1:1017] 3.94 30.1 4.15 22.53 30.1 ...
## $ actual_productivity : num [1:1017] 0.755 0.536 0.821 0.581 0.79 ...
## - attr(*, "spec")=
## .. cols(
## .. team = col_double(),
## .. targeted_productivity = col_double(),
## .. smv = col_double(),
## .. actual_productivity = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
#check for missing observations
sum(is.na(data))
## [1] 0
library(rjags)
## Loading required package: coda
## Linked to JAGS 4.3.1
## Loaded modules: basemod,bugs
# Specify the Bayesian model using JAGS syntax
model_string <- "
model {
  # Hyperpriors. Note: JAGS dnorm(mean, precision) takes a precision, not a standard deviation
  mu_a ~ dnorm(0, 0.001)     # Prior mean of the intercept (precision 0.001 = variance 1000)
  sigma_a ~ dunif(0, 100)    # Second argument of the intercept prior; acts as a precision despite its name
  mu_b1 ~ dnorm(0, 0.001)    # Prior mean of the targeted_productivity coefficient
  sigma_b1 ~ dunif(0, 100)   # Second argument of the b1 prior; acts as a precision despite its name
  mu_b2 ~ dnorm(0, 0.001)    # Prior mean of the smv coefficient
  sigma_b2 ~ dunif(0, 100)   # Second argument of the b2 prior; acts as a precision despite its name
  # Likelihood
  for (i in 1:N) {
    y[i] ~ dnorm(mu[i], tau)
    mu[i] <- a + b1 * x1[i] + b2 * x2[i]
  }
  # Model parameters
  a ~ dnorm(mu_a, sigma_a)     # Intercept
  b1 ~ dnorm(mu_b1, sigma_b1)  # Coefficient for targeted_productivity
  b2 ~ dnorm(mu_b2, sigma_b2)  # Coefficient for smv
  tau <- pow(sigma, -2)        # Error precision derived from the error standard deviation
  sigma ~ dunif(0, 100)        # Error standard deviation
}
"
The priors chosen for a Bayesian model encode the beliefs and assumptions held about its parameters before the data are seen. The JAGS model above specifies the following prior distributions. Note that the second argument of dnorm in JAGS is a precision (the reciprocal of the variance), not a standard deviation.
Intercept (a): The intercept has a normal prior, a ~ dnorm(mu_a, sigma_a). Its mean mu_a has a normal hyperprior with mean 0 and precision 0.001, which corresponds to a variance of 1000. Because dnorm takes a precision, sigma_a enters the model as the precision of this prior rather than as a standard deviation, despite its name; it has a uniform hyperprior on (0, 100). The prior for the intercept is therefore centered around zero with considerable uncertainty.
Targeted productivity coefficient (b1): The coefficient has the same structure, b1 ~ dnorm(mu_b1, sigma_b1), with mu_b1 normal with mean 0 and precision 0.001 and sigma_b1 uniform on (0, 100). Centering at 0 expresses no strong prior expectation about the direction or size of the effect of targeted_productivity.
Coefficient for smv (b2): The coefficient has the prior b2 ~ dnorm(mu_b2, sigma_b2), with mu_b2 normal with mean 0 and precision 0.001 and sigma_b2 uniform on (0, 100). As with b1, this expresses no strong prior expectation about the effect of smv.
Error standard deviation (sigma): The error standard deviation has a uniform prior between 0 and 100, so all values in this range are regarded as equally probable a priori. The error precision used in the likelihood is tau = 1/sigma^2.
These priors are intended to be weakly informative: the hyperpriors on the means are centered at 0 with a small precision (a large variance), and the uniform priors cover a wide range. As a result, the data, rather than the priors, largely determine the posterior distributions.
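Because JAGS parameterizes the normal distribution by its precision, it can help to convert a precision into the standard deviation it implies. The following is a minimal sketch in base R; the helper name precision_to_sd is hypothetical and is not part of rjags.

```r
# Hypothetical helper (not part of rjags): convert a JAGS-style precision
# (1 / variance) into the standard deviation it implies.
precision_to_sd <- function(tau) {
  stopifnot(all(tau > 0))
  1 / sqrt(tau)
}

# Precision used for mu_a, mu_b1 and mu_b2 in the model above
hyper_sd <- precision_to_sd(0.001)
hyper_sd   # about 31.6

# Central 95% range of a normal distribution with mean 0 and precision 0.001
prior_range <- qnorm(c(0.025, 0.975), mean = 0, sd = hyper_sd)
prior_range   # roughly -62 to 62
```

A range of about -62 to 62 is very wide compared with productivity values that lie between roughly 0.2 and 1.1, which is what makes a precision of 0.001 only weakly informative here.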
# Create a list with the data
jags_data <- list(
y = data$actual_productivity,
x1 = data$targeted_productivity,
x2 = data$smv,
N = nrow(data)
)
# Specify the parameters to monitor
params <- c("a", "b1", "b2")
# Set the initial values for the MCMC chains
inits <- function() {
list(a = rnorm(1, 0, 10),
b1 = rnorm(1, 0, 10),
b2 = rnorm(1, 0, 10))
}
# Set the number of iterations and burn-in
n_iterations <- 10000
n_burnin <- 1000
# Run the MCMC simulation
model <- jags.model(textConnection(model_string), data = jags_data, inits = inits)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1017
## Unobserved stochastic nodes: 10
## Total graph size: 3336
##
## Initializing model
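# Draw the posterior samples. coda.samples() has no n.burnin argument, so n_burnin is not passed;
# the 1000 iterations that precede sampling are the default adaptation phase of jags.model().
samples <- coda.samples(model, variable.names = params, n.iter = n_iterations,
thin = 1)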
# Load the coda package for analyzing the MCMC samples
library(coda)
# Summarize the posterior distributions
summary(samples)
##
## Iterations = 1001:11000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 10000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## a 0.243538 0.0394931 3.949e-04 4.447e-03
## b1 0.707942 0.0517866 5.179e-04 5.633e-03
## b2 -0.001607 0.0004518 4.518e-06 9.805e-06
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## a 0.16407 0.216808 0.244681 0.273557 0.3135201
## b1 0.61670 0.669422 0.706002 0.743848 0.8108997
## b2 -0.00249 -0.001911 -0.001606 -0.001298 -0.0007407
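The Time-series SE column corrects the Naive SE for autocorrelation in the chain, so comparing it with the posterior SD indicates how many effectively independent draws the 10000 samples are worth. The sketch below does that arithmetic using only the numbers printed above. The relation ESS = (SD / Time-series SE)^2 is an assumption about how to read this summary, and the result is not output from the original analysis.

```r
# Values copied from the summary(samples) output above
post_sd <- c(a = 0.0394931, b1 = 0.0517866, b2 = 0.0004518)
ts_se   <- c(a = 4.447e-03, b1 = 5.633e-03, b2 = 9.805e-06)

# Approximate effective sample size: the number of independent draws that
# would give the same Monte Carlo standard error for the posterior mean
ess <- (post_sd / ts_se)^2
round(ess)
```

Under this reading, a and b1 are each worth on the order of 80 independent draws out of 10000, which suggests strong autocorrelation in those two parameters. Running several chains or centering the predictors are common remedies, though neither was part of the original analysis.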
# Plot the posterior distributions
plot(samples)
# Calculate posterior means
posterior_means <- colMeans(as.matrix(samples))
posterior_means
## a b1 b2
## 0.24353757 0.70794160 -0.00160676
Mean estimates: The posterior means summarize the typical effect of each term on productivity. For these data, the posterior means of the intercept (a), the targeted_productivity coefficient (b1), and the smv coefficient (b2) are approximately 0.2435, 0.7079, and -0.0016, respectively. The two coefficients give the expected change in actual productivity for a one-unit change in the corresponding predictor.
Standard deviations: The posterior standard deviations measure the spread of the MCMC samples for each parameter, and therefore the uncertainty in each estimate. They are about 0.0395 for a, 0.0518 for b1, and 0.0005 for b2.
Credible intervals: The 2.5% and 97.5% quantiles give a 95% credible interval for each parameter, that is, a range that contains the parameter with 95% posterior probability. The intervals run from 0.1641 to 0.3135 for a, from 0.6167 to 0.8109 for b1, and from -0.0025 to -0.0007 for b2. None of the three intervals includes zero.
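Credible intervals like these are simply quantiles of the posterior draws. The sketch below illustrates the computation on stand-in draws simulated from a normal distribution with the posterior mean and standard deviation reported for b1. The simulated vector is an assumption made to keep the example self-contained; with the real MCMC output, the same quantile call would be applied to the b1 column of as.matrix(samples).

```r
set.seed(123)  # arbitrary seed, only to make the illustration reproducible

# Stand-in for the MCMC draws of b1 (assumption: approximately normal posterior)
b1_draws <- rnorm(10000, mean = 0.707942, sd = 0.0517866)

# Equal-tailed 95% credible interval: the 2.5% and 97.5% quantiles
ci_95 <- quantile(b1_draws, probs = c(0.025, 0.975))
ci_95

# Posterior probability that the coefficient is positive
p_positive <- mean(b1_draws > 0)
p_positive
```

The same draws answer other questions directly, such as the probability that the coefficient exceeds some threshold, without any further modeling.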
• Actual productivity rises with targeted productivity (b1), with an average effect of roughly 0.708.
• Actual productivity has a negative association with smv (b2): higher smv values go with slightly lower productivity, with an average effect of about -0.0016.
From the posterior estimates we can conclude that:
Intercept (parameter “a”): The estimated intercept is approximately 0.2435. It is the expected actual productivity when both predictors are zero, a point outside the observed data, since the smallest smv in the dataset is 2.90.
Targeted Productivity (parameter “b1”): The estimated coefficient is roughly 0.708. Holding smv constant, a one-unit increase in targeted productivity is associated with an increase of about 0.708 units in actual productivity. Since targeted productivity only ranges from 0.07 to 0.80 in these data, a more practical reading is that raising the target by 0.10 is associated with an increase of about 0.07.
SMV (parameter “b2”): The estimated coefficient is roughly -0.0016. Holding targeted productivity constant, a one-minute increase in SMV is associated with a decrease of about 0.0016 units in actual productivity.
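To make the coefficients concrete, the fitted line can be evaluated at chosen predictor values. The sketch below plugs the posterior means reported above into mu = a + b1 * x1 + b2 * x2, first at the median targeted productivity (0.75) and median SMV (15.26) from the data summary. The function name predict_productivity is hypothetical, and using posterior means alone ignores the uncertainty in the parameters.

```r
# Posterior means copied from the summary(samples) output
post_mean <- c(a = 0.243538, b1 = 0.707942, b2 = -0.001607)

# Hypothetical helper: expected actual productivity under the fitted linear model
predict_productivity <- function(target, smv, coef = post_mean) {
  unname(coef["a"] + coef["b1"] * target + coef["b2"] * smv)
}

# Median targeted productivity and median SMV from summary(data)
pred_median <- predict_productivity(target = 0.75, smv = 15.26)
pred_median   # about 0.75

# Effect of raising the target from 0.70 to 0.80 at the same SMV
effect_010 <- predict_productivity(0.80, 15.26) - predict_productivity(0.70, 15.26)
effect_010    # about 0.07
```

At the median inputs the model predicts an actual productivity close to the target itself, and a 0.10 increase in the target moves the prediction by about 0.07, in line with the interpretation of b1.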