Verification and Validation of Simulation Models: R

Author
Affiliations

Dr. Md. Nasif Hossain

Department of Statistics, Pabna University of Science and Technology, Pabna, Bangladesh

Published

May 12, 2026

Independence Square | PUST Source: PUST Photographic Society Facebook page

Loading Package in R

#for easy data manipulation and visualization
library(tidyverse)

#for easily computing cross-validation methods
library(caret)

Background

Standardized fertility measure and socio-economic indicators for each of 47 French-speaking provinces of Switzerland at about 1888. Switzerland, in 1888, was entering a period known as the demographic transition; i.e., its fertility was beginning to fall from the high level typical of underdeveloped countries. The data collected are for 47 French-speaking “provinces” at about 1888.

Statistics

  1. Fertility: common standardized fertility measure

  2. Agriculture: % of males involved in agriculture as occupation

  3. Examination: % draftees receiving highest mark on army examination

  4. Education: % education beyond primary school for draftees.

  5. Catholic: % ‘catholic’ (as opposed to ‘protestant’).

  6. Infant.Mortality: live births who live less than 1 year.

Load dataset

data("swiss")
head(swiss)
             Fertility Agriculture Examination Education Catholic
Courtelary        80.2        17.0          15        12     9.96
Delemont          83.1        45.1           6         9    84.84
Franches-Mnt      92.5        39.7           5         5    93.40
Moutier           85.8        36.5          12         7    33.77
Neuveville        76.9        43.5          17        15     5.16
Porrentruy        76.1        35.3           9         7    90.57
             Infant.Mortality
Courtelary               22.2
Delemont                 22.2
Franches-Mnt             20.2
Moutier                  20.3
Neuveville               20.6
Porrentruy               26.6
#Inspectthedata
sample_n(swiss,3)
            Fertility Agriculture Examination Education Catholic
Orbe             57.4        54.1          20         6     4.20
Sierre           92.2        84.6           3         3    99.46
Rive Gauche      42.8        27.7          22        29    58.33
            Infant.Mortality
Orbe                    15.3
Sierre                  16.3
Rive Gauche             19.3

Model performance metrics to estimate prediction error

To assess how well a model predicts new data, the basic strategy is:

  1. Build (train) the model using a training dataset
    Fit the model on a subset of the data, called the training set, so that it can learn the underlying patterns and relationships.

  2. Apply the model to a separate test dataset to make predictions
    Use the fitted model to predict outcomes for a new, unseen dataset, called the test set.

  3. Compute the prediction error
    Compare the predicted values with the actual observed values in the test set using appropriate performance metrics.

Model Validation

The Validation Set Approach

# Split the data into training and test set

set.seed(123)

training.samples<-swiss$Fertility%>% createDataPartition(p = 0.8, list = FALSE)
train.data<-swiss[training.samples, ]
test.data<-swiss[-training.samples, ]

Build a model with train dataset

# Build the model
model <-lm(Fertility ~., data = train.data)
summary(model)

Call:
lm(formula = Fertility ~ ., data = train.data)

Residuals:
     Min       1Q   Median       3Q      Max 
-15.1389  -4.8632   0.6051   5.1009  14.0352 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)      74.19340   12.45329   5.958 1.09e-06 ***
Agriculture      -0.18635    0.07921  -2.353   0.0248 *  
Examination      -0.31997    0.27590  -1.160   0.2545    
Education        -0.87355    0.19479  -4.485 8.35e-05 ***
Catholic          0.10648    0.04179   2.548   0.0157 *  
Infant.Mortality  0.78075    0.45373   1.721   0.0947 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.417 on 33 degrees of freedom
Multiple R-squared:  0.7145,    Adjusted R-squared:  0.6713 
F-statistic: 16.52 on 5 and 33 DF,  p-value: 3.653e-08
# Make predictions and compute the R2, RMSE and MAE
predictions <-model %>% predict(test.data)

data.frame( R2 = R2(predictions, test.data$Fertility),
            RMSE = RMSE(predictions, test.data$Fertility),
            MAE = MAE(predictions, test.data$Fertility))
         R2     RMSE      MAE
1 0.5946201 6.410914 5.651552
# Prediction error rate
PER <-RMSE(predictions, test.data$Fertility)/mean(test.data$Fertility)
PER
[1] 0.08800157

Leave one out cross validation-LOOCV

# Define training control
train.control<-trainControl(method = "LOOCV")

# Train the model
model <-train(Fertility ~., data = swiss, method = "lm", trControl= train.control)

# Summarize the results
print(model)
Linear Regression 

47 samples
 5 predictor

No pre-processing
Resampling: Leave-One-Out Cross-Validation 
Summary of sample sizes: 46, 46, 46, 46, 46, 46, ... 
Resampling results:

  RMSE      Rsquared   MAE     
  7.738618  0.6128307  6.116021

Tuning parameter 'intercept' was held constant at a value of TRUE

K-fold cross-validation

#K-fold cross-validation(cv)
#Define training control
set.seed(123)

train.control<-trainControl(method="cv",number=10)

#Train the model
model<-train(Fertility~.,data=swiss,method="lm",
             trControl=train.control)

#Summarize the results
print(model)
Linear Regression 

47 samples
 5 predictor

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 42, 42, 42, 42, 42, 44, ... 
Resampling results:

  RMSE      Rsquared   MAE    
  7.424916  0.6922072  6.31218

Tuning parameter 'intercept' was held constant at a value of TRUE

Repeated K-fold cross-validation

# Repeated K-fold cross-validation
# Define training control
set.seed(123)

train.control<-trainControl(method = "repeatedcv", number = 10, repeats = 3)

# Train the model
model <-train(Fertility ~., data = swiss, method = "lm", trControl= train.control)

# Summarize the results
print(model)
Linear Regression 

47 samples
 5 predictor

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 3 times) 
Summary of sample sizes: 42, 42, 42, 42, 42, 44, ... 
Resampling results:

  RMSE      Rsquared   MAE    
  7.357186  0.6992415  6.15871

Tuning parameter 'intercept' was held constant at a value of TRUE

Model Comparison (Validation across models)

# Linear model
lm_model <- train(Fertility ~ ., data = swiss, method = "lm",
                  trControl = trainControl(method="cv", number=10))

# Ridge regression
ridge_model <- train(Fertility ~ ., data = swiss,
                     method = "ridge",
                     trControl = trainControl(method="cv", number=10),
                     tuneLength = 5)

lm_model
Linear Regression 

47 samples
 5 predictor

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 43, 42, 43, 42, 42, 42, ... 
Resampling results:

  RMSE      Rsquared   MAE     
  7.762255  0.6715954  6.376696

Tuning parameter 'intercept' was held constant at a value of TRUE
ridge_model
Ridge Regression 

47 samples
 5 predictor

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 43, 42, 42, 43, 43, 41, ... 
Resampling results across tuning parameters:

  lambda  RMSE      Rsquared   MAE     
  0e+00   7.207681  0.6976287  6.063376
  1e-04   7.207527  0.6976531  6.063195
  1e-03   7.206162  0.6978695  6.061578
  1e-02   7.194424  0.6997459  6.046308
  1e-01   7.179693  0.7020423  6.029846

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was lambda = 0.1.

The 10-fold cross-validation results indicate that the standard linear regression model provides superior predictive performance compared to ridge regression. The linear model explains approximately 77.5% of the variance in fertility, with lower RMSE and MAE values. The lack of improvement under ridge regularization suggests that multicollinearity is not a dominant issue and that the model is not overfitting. From a verification and validation perspective, this supports the adequacy, stability, and generalizability of the linear regression model for this dataset.

Acknowledgement

We sincerely thank Prof. Dr. Md. Rezaul KarimSir and the Department of Statistics, Rajshahi University, for their valuable educational resources. The content used here has been adapted from the LDAL website solely for learning and teaching purposes.

References

  1. Lazic, S. E., & Roche, H. L. (2012). Introducing Monte Carlo methods with R.

  2. Rubinstein, R. Y., & Kroese, D. P. (2011). Simulation and the Monte Carlo method (Vol. 707). John Wiley & Sons.

  3. Banks, J. (1998). Handbook of simulation. Wiley, New York.

  4. Law, A. M., & Kelton, W. D. (2000). Simulation modeling and analysis. McGraw-Hill, Boston, Burr Ridge, USA.

  5. James, G., Witten, D., Hastie, T., & Tibshirani, R. (2014). An introduction to statistical learning: With applications in R. Springer.

  6. Robert, C. P., & Casella, G. (2010). Introducing Monte Carlo methods with R (Vol. 18). Springer, New York.

  7. Ross, S. M. (2006). Simulation (4th ed.). Academic Press.

  8. Singh, V. P. (2009). System modeling and simulation. New Age International (P) Limited, New Delhi.

  9. Suess, E. A., & Trumbo, B. E. (2010). Introduction to probability simulation and Gibbs sampling with R. Springer.