Verification and Validation of Simulation Models: R

Author
Affiliations

Dr. Md. Nasif Hossain

Department of Statistics, Pabna University of Science and Technology, Pabna, Bangladesh

Published

May 12, 2026

Independence Square | PUST Source: PUST Photographic Society Facebook page

Loading Package in R

#for easy data manipulation and visualization
library(tidyverse)

#for easily computing cross-validation methods
library(caret)

Background

Standardized fertility measure and socio-economic indicators for each of 47 French-speaking provinces of Switzerland at about 1888. Switzerland, in 1888, was entering a period known as the demographic transition; i.e., its fertility was beginning to fall from the high level typical of underdeveloped countries. The data collected are for 47 French-speaking “provinces” at about 1888.

Statistics

  1. Fertility: common standardized fertility measure

  2. Agriculture: % of males involved in agriculture as occupation

  3. Examination: % draftees receiving highest mark on army examination

  4. Education: % education beyond primary school for draftees.

  5. Catholic: % ‘catholic’ (as opposed to ‘protestant’).

  6. Infant.Mortality: live births who live less than 1 year.

Load dataset

data("swiss")
head(swiss)
             Fertility Agriculture Examination Education Catholic
Courtelary        80.2        17.0          15        12     9.96
Delemont          83.1        45.1           6         9    84.84
Franches-Mnt      92.5        39.7           5         5    93.40
Moutier           85.8        36.5          12         7    33.77
Neuveville        76.9        43.5          17        15     5.16
Porrentruy        76.1        35.3           9         7    90.57
             Infant.Mortality
Courtelary               22.2
Delemont                 22.2
Franches-Mnt             20.2
Moutier                  20.3
Neuveville               20.6
Porrentruy               26.6
#Inspectthedata
sample_n(swiss,3)
             Fertility Agriculture Examination Education Catholic
Yverdon           65.4        49.5          15         8      6.1
Echallens         68.3        72.6          18         2     24.2
Franches-Mnt      92.5        39.7           5         5     93.4
             Infant.Mortality
Yverdon                  22.5
Echallens                21.2
Franches-Mnt             20.2

Model performance metrics to estimate prediction error

To assess how well a model predicts new data, the basic strategy is:

  1. Build (train) the model using a training dataset
    Fit the model on a subset of the data, called the training set, so that it can learn the underlying patterns and relationships.

  2. Apply the model to a separate test dataset to make predictions
    Use the fitted model to predict outcomes for a new, unseen dataset, called the test set.

  3. Compute the prediction error
    Compare the predicted values with the actual observed values in the test set using appropriate performance metrics.

Model Validation

The Validation Set Approach

# Split the data into training and test set

set.seed(123)

training.samples<-swiss$Fertility%>% createDataPartition(p = 0.8, list = FALSE)
train.data<-swiss[training.samples, ]
test.data<-swiss[-training.samples, ]

Build a model with train dataset

# Build the model
model <-lm(Fertility ~., data = train.data)
summary(model)

Call:
lm(formula = Fertility ~ ., data = train.data)

Residuals:
     Min       1Q   Median       3Q      Max 
-15.1389  -4.8632   0.6051   5.1009  14.0352 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)      74.19340   12.45329   5.958 1.09e-06 ***
Agriculture      -0.18635    0.07921  -2.353   0.0248 *  
Examination      -0.31997    0.27590  -1.160   0.2545    
Education        -0.87355    0.19479  -4.485 8.35e-05 ***
Catholic          0.10648    0.04179   2.548   0.0157 *  
Infant.Mortality  0.78075    0.45373   1.721   0.0947 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.417 on 33 degrees of freedom
Multiple R-squared:  0.7145,    Adjusted R-squared:  0.6713 
F-statistic: 16.52 on 5 and 33 DF,  p-value: 3.653e-08
# Make predictions and compute the R2, RMSE and MAE
predictions <-model %>% predict(test.data)

data.frame( R2 = R2(predictions, test.data$Fertility),
            RMSE = RMSE(predictions, test.data$Fertility),
            MAE = MAE(predictions, test.data$Fertility))
         R2     RMSE      MAE
1 0.5946201 6.410914 5.651552
# Prediction error rate
PER <-RMSE(predictions, test.data$Fertility)/mean(test.data$Fertility)
PER
[1] 0.08800157

Leave one out cross validation-LOOCV

# Define training control
train.control<-trainControl(method = "LOOCV")

# Train the model
model <-train(Fertility ~., data = swiss, method = "lm", trControl= train.control)

# Summarize the results
print(model)
Linear Regression 

47 samples
 5 predictor

No pre-processing
Resampling: Leave-One-Out Cross-Validation 
Summary of sample sizes: 46, 46, 46, 46, 46, 46, ... 
Resampling results:

  RMSE      Rsquared   MAE     
  7.738618  0.6128307  6.116021

Tuning parameter 'intercept' was held constant at a value of TRUE

K-fold cross-validation

#K-fold cross-validation(cv)
#Define training control
set.seed(123)

train.control<-trainControl(method="cv",number=10)

#Train the model
model<-train(Fertility~.,data=swiss,method="lm",
             trControl=train.control)

#Summarize the results
print(model)
Linear Regression 

47 samples
 5 predictor

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 42, 42, 42, 42, 42, 44, ... 
Resampling results:

  RMSE      Rsquared   MAE    
  7.424916  0.6922072  6.31218

Tuning parameter 'intercept' was held constant at a value of TRUE

Repeated K-fold cross-validation

# Repeated K-fold cross-validation
# Define training control
set.seed(123)

train.control<-trainControl(method = "repeatedcv", number = 10, repeats = 3)

# Train the model
model <-train(Fertility ~., data = swiss, method = "lm", trControl= train.control)

# Summarize the results
print(model)
Linear Regression 

47 samples
 5 predictor

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 3 times) 
Summary of sample sizes: 42, 42, 42, 42, 42, 44, ... 
Resampling results:

  RMSE      Rsquared   MAE    
  7.357186  0.6992415  6.15871

Tuning parameter 'intercept' was held constant at a value of TRUE

Model Comparison (Validation across models)

# Linear model
lm_model <- train(Fertility ~ ., data = swiss, method = "lm",
                  trControl = trainControl(method="cv", number=10))

# Ridge regression
ridge_model <- train(Fertility ~ ., data = swiss,
                     method = "ridge",
                     trControl = trainControl(method="cv", number=10),
                     tuneLength = 5)

lm_model
Linear Regression 

47 samples
 5 predictor

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 43, 42, 43, 42, 42, 42, ... 
Resampling results:

  RMSE      Rsquared   MAE     
  7.762255  0.6715954  6.376696

Tuning parameter 'intercept' was held constant at a value of TRUE
ridge_model
Ridge Regression 

47 samples
 5 predictor

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 43, 42, 42, 43, 43, 41, ... 
Resampling results across tuning parameters:

  lambda  RMSE      Rsquared   MAE     
  0e+00   7.207681  0.6976287  6.063376
  1e-04   7.207527  0.6976531  6.063195
  1e-03   7.206162  0.6978695  6.061578
  1e-02   7.194424  0.6997459  6.046308
  1e-01   7.179693  0.7020423  6.029846

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was lambda = 0.1.

The 10-fold cross-validation results indicate that the standard linear regression model provides superior predictive performance compared to ridge regression. The linear model explains approximately 77.5% of the variance in fertility, with lower RMSE and MAE values. The lack of improvement under ridge regularization suggests that multicollinearity is not a dominant issue and that the model is not overfitting. From a verification and validation perspective, this supports the adequacy, stability, and generalizability of the linear regression model for this dataset.

Exercise

Using the swiss dataset in R, complete the following tasks:

a) Load the swiss dataset and explore its structure using appropriate R functions (e.g., head(), str(), and summary()).

b) Randomly split the dataset into: 80% Training Set and 20% Testing Set

Set a random seed to ensure reproducibility.

c) Fit a multiple linear regression model using the training dataset, with Fertility as the response variable and all remaining variables as predictors.

Then:

  • Predict Fertility values for the test dataset.

  • Calculate model performance measures such as:

  • Mean Squared Error (MSE)

  • Root Mean Squared Error (RMSE)

  • Mean Absolute Error (MAE)

d) Using the training dataset, evaluate the regression model using the following validation techniques:

  1. Validation Set Approach (Train-Test Split)

  2. Leave-One-Out Cross-Validation (LOOCV)

  3. 10-Fold Cross-Validation

  4. Repeated 10-Fold Cross-Validation (3 Repeats)

For each method:

  • Compute the cross-validated prediction error.

  • Summarize the results in a table.

e) Compare the performance of the four validation methods and answer the following questions:

  1. Which validation method produced the lowest prediction error?

  2. Which method is expected to provide the most stable estimate of model performance?

  3. What are the advantages and disadvantages of LOOCV, K-Fold CV, and Repeated K-Fold CV?

  4. Based on the results, which validation method would you recommend for this dataset and why?

Bonus Task

Create a comparison table showing the prediction errors obtained from all validation methods and visualize the results using a bar chart.

Acknowledgement

We sincerely thank Prof. Dr. Md. Rezaul Karim Sir and the Department of Statistics, Rajshahi University, for their valuable educational resources. The content used here has been adapted from the Rajshahi University website solely for learning and teaching purposes.

References

  1. Lazic, S. E., & Roche, H. L. (2012). Introducing Monte Carlo methods with R.

  2. Rubinstein, R. Y., & Kroese, D. P. (2011). Simulation and the Monte Carlo method (Vol. 707). John Wiley & Sons.

  3. Banks, J. (1998). Handbook of simulation. Wiley, New York.

  4. Law, A. M., & Kelton, W. D. (2000). Simulation modeling and analysis. McGraw-Hill, Boston, Burr Ridge, USA.

  5. James, G., Witten, D., Hastie, T., & Tibshirani, R. (2014). An introduction to statistical learning: With applications in R. Springer.

  6. Robert, C. P., & Casella, G. (2010). Introducing Monte Carlo methods with R (Vol. 18). Springer, New York.

  7. Ross, S. M. (2006). Simulation (4th ed.). Academic Press.

  8. Singh, V. P. (2009). System modeling and simulation. New Age International (P) Limited, New Delhi.

  9. Suess, E. A., & Trumbo, B. E. (2010). Introduction to probability simulation and Gibbs sampling with R. Springer.