#for easy data manipulation and visualization
library(tidyverse)
#for easily computing cross-validation methods
library(caret)Verification and Validation of Simulation Models: R
Independence Square | PUST Source: PUST Photographic Society Facebook page
Loading Package in R
Background
Standardized fertility measure and socio-economic indicators for each of 47 French-speaking provinces of Switzerland at about 1888. Switzerland, in 1888, was entering a period known as the demographic transition; i.e., its fertility was beginning to fall from the high level typical of underdeveloped countries. The data collected are for 47 French-speaking “provinces” at about 1888.
Statistics
Fertility: common standardized fertility measure
Agriculture: % of males involved in agriculture as occupation
Examination: % draftees receiving highest mark on army examination
Education: % education beyond primary school for draftees.
Catholic: % ‘catholic’ (as opposed to ‘protestant’).
Infant.Mortality: live births who live less than 1 year.
Load dataset
data("swiss")
head(swiss) Fertility Agriculture Examination Education Catholic
Courtelary 80.2 17.0 15 12 9.96
Delemont 83.1 45.1 6 9 84.84
Franches-Mnt 92.5 39.7 5 5 93.40
Moutier 85.8 36.5 12 7 33.77
Neuveville 76.9 43.5 17 15 5.16
Porrentruy 76.1 35.3 9 7 90.57
Infant.Mortality
Courtelary 22.2
Delemont 22.2
Franches-Mnt 20.2
Moutier 20.3
Neuveville 20.6
Porrentruy 26.6
#Inspectthedata
sample_n(swiss,3) Fertility Agriculture Examination Education Catholic
Orbe 57.4 54.1 20 6 4.20
Sierre 92.2 84.6 3 3 99.46
Rive Gauche 42.8 27.7 22 29 58.33
Infant.Mortality
Orbe 15.3
Sierre 16.3
Rive Gauche 19.3
Model performance metrics to estimate prediction error
To assess how well a model predicts new data, the basic strategy is:
Build (train) the model using a training dataset
Fit the model on a subset of the data, called the training set, so that it can learn the underlying patterns and relationships.Apply the model to a separate test dataset to make predictions
Use the fitted model to predict outcomes for a new, unseen dataset, called the test set.Compute the prediction error
Compare the predicted values with the actual observed values in the test set using appropriate performance metrics.
Model Validation
The Validation Set Approach
# Split the data into training and test set
set.seed(123)
training.samples<-swiss$Fertility%>% createDataPartition(p = 0.8, list = FALSE)
train.data<-swiss[training.samples, ]
test.data<-swiss[-training.samples, ]Build a model with train dataset
# Build the model
model <-lm(Fertility ~., data = train.data)
summary(model)
Call:
lm(formula = Fertility ~ ., data = train.data)
Residuals:
Min 1Q Median 3Q Max
-15.1389 -4.8632 0.6051 5.1009 14.0352
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 74.19340 12.45329 5.958 1.09e-06 ***
Agriculture -0.18635 0.07921 -2.353 0.0248 *
Examination -0.31997 0.27590 -1.160 0.2545
Education -0.87355 0.19479 -4.485 8.35e-05 ***
Catholic 0.10648 0.04179 2.548 0.0157 *
Infant.Mortality 0.78075 0.45373 1.721 0.0947 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.417 on 33 degrees of freedom
Multiple R-squared: 0.7145, Adjusted R-squared: 0.6713
F-statistic: 16.52 on 5 and 33 DF, p-value: 3.653e-08
# Make predictions and compute the R2, RMSE and MAE
predictions <-model %>% predict(test.data)
data.frame( R2 = R2(predictions, test.data$Fertility),
RMSE = RMSE(predictions, test.data$Fertility),
MAE = MAE(predictions, test.data$Fertility)) R2 RMSE MAE
1 0.5946201 6.410914 5.651552
# Prediction error rate
PER <-RMSE(predictions, test.data$Fertility)/mean(test.data$Fertility)
PER[1] 0.08800157
Leave one out cross validation-LOOCV
# Define training control
train.control<-trainControl(method = "LOOCV")
# Train the model
model <-train(Fertility ~., data = swiss, method = "lm", trControl= train.control)
# Summarize the results
print(model)Linear Regression
47 samples
5 predictor
No pre-processing
Resampling: Leave-One-Out Cross-Validation
Summary of sample sizes: 46, 46, 46, 46, 46, 46, ...
Resampling results:
RMSE Rsquared MAE
7.738618 0.6128307 6.116021
Tuning parameter 'intercept' was held constant at a value of TRUE
K-fold cross-validation
#K-fold cross-validation(cv)
#Define training control
set.seed(123)
train.control<-trainControl(method="cv",number=10)
#Train the model
model<-train(Fertility~.,data=swiss,method="lm",
trControl=train.control)
#Summarize the results
print(model)Linear Regression
47 samples
5 predictor
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 42, 42, 42, 42, 42, 44, ...
Resampling results:
RMSE Rsquared MAE
7.424916 0.6922072 6.31218
Tuning parameter 'intercept' was held constant at a value of TRUE
Repeated K-fold cross-validation
# Repeated K-fold cross-validation
# Define training control
set.seed(123)
train.control<-trainControl(method = "repeatedcv", number = 10, repeats = 3)
# Train the model
model <-train(Fertility ~., data = swiss, method = "lm", trControl= train.control)
# Summarize the results
print(model)Linear Regression
47 samples
5 predictor
No pre-processing
Resampling: Cross-Validated (10 fold, repeated 3 times)
Summary of sample sizes: 42, 42, 42, 42, 42, 44, ...
Resampling results:
RMSE Rsquared MAE
7.357186 0.6992415 6.15871
Tuning parameter 'intercept' was held constant at a value of TRUE
Model Comparison (Validation across models)
# Linear model
lm_model <- train(Fertility ~ ., data = swiss, method = "lm",
trControl = trainControl(method="cv", number=10))
# Ridge regression
ridge_model <- train(Fertility ~ ., data = swiss,
method = "ridge",
trControl = trainControl(method="cv", number=10),
tuneLength = 5)
lm_modelLinear Regression
47 samples
5 predictor
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 43, 42, 43, 42, 42, 42, ...
Resampling results:
RMSE Rsquared MAE
7.762255 0.6715954 6.376696
Tuning parameter 'intercept' was held constant at a value of TRUE
ridge_modelRidge Regression
47 samples
5 predictor
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 43, 42, 42, 43, 43, 41, ...
Resampling results across tuning parameters:
lambda RMSE Rsquared MAE
0e+00 7.207681 0.6976287 6.063376
1e-04 7.207527 0.6976531 6.063195
1e-03 7.206162 0.6978695 6.061578
1e-02 7.194424 0.6997459 6.046308
1e-01 7.179693 0.7020423 6.029846
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was lambda = 0.1.
The 10-fold cross-validation results indicate that the standard linear regression model provides superior predictive performance compared to ridge regression. The linear model explains approximately 77.5% of the variance in fertility, with lower RMSE and MAE values. The lack of improvement under ridge regularization suggests that multicollinearity is not a dominant issue and that the model is not overfitting. From a verification and validation perspective, this supports the adequacy, stability, and generalizability of the linear regression model for this dataset.
Acknowledgement
We sincerely thank Prof. Dr. Md. Rezaul KarimSir and the Department of Statistics, Rajshahi University, for their valuable educational resources. The content used here has been adapted from the LDAL website solely for learning and teaching purposes.
References
Lazic, S. E., & Roche, H. L. (2012). Introducing Monte Carlo methods with R.
Rubinstein, R. Y., & Kroese, D. P. (2011). Simulation and the Monte Carlo method (Vol. 707). John Wiley & Sons.
Banks, J. (1998). Handbook of simulation. Wiley, New York.
Law, A. M., & Kelton, W. D. (2000). Simulation modeling and analysis. McGraw-Hill, Boston, Burr Ridge, USA.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2014). An introduction to statistical learning: With applications in R. Springer.
Robert, C. P., & Casella, G. (2010). Introducing Monte Carlo methods with R (Vol. 18). Springer, New York.
Ross, S. M. (2006). Simulation (4th ed.). Academic Press.
Singh, V. P. (2009). System modeling and simulation. New Age International (P) Limited, New Delhi.
Suess, E. A., & Trumbo, B. E. (2010). Introduction to probability simulation and Gibbs sampling with R. Springer.