#for easy data manipulation and visualization
library(tidyverse)
#for easily computing cross-validation methods
library(caret)Verification and Validation of Simulation Models: R
Independence Square | PUST Source: PUST Photographic Society Facebook page
Loading Package in R
Background
Standardized fertility measure and socio-economic indicators for each of 47 French-speaking provinces of Switzerland at about 1888. Switzerland, in 1888, was entering a period known as the demographic transition; i.e., its fertility was beginning to fall from the high level typical of underdeveloped countries. The data collected are for 47 French-speaking “provinces” at about 1888.
Statistics
Fertility: common standardized fertility measure
Agriculture: % of males involved in agriculture as occupation
Examination: % draftees receiving highest mark on army examination
Education: % education beyond primary school for draftees.
Catholic: % ‘catholic’ (as opposed to ‘protestant’).
Infant.Mortality: live births who live less than 1 year.
Load dataset
data("swiss")
head(swiss) Fertility Agriculture Examination Education Catholic
Courtelary 80.2 17.0 15 12 9.96
Delemont 83.1 45.1 6 9 84.84
Franches-Mnt 92.5 39.7 5 5 93.40
Moutier 85.8 36.5 12 7 33.77
Neuveville 76.9 43.5 17 15 5.16
Porrentruy 76.1 35.3 9 7 90.57
Infant.Mortality
Courtelary 22.2
Delemont 22.2
Franches-Mnt 20.2
Moutier 20.3
Neuveville 20.6
Porrentruy 26.6
#Inspectthedata
sample_n(swiss,3) Fertility Agriculture Examination Education Catholic
Yverdon 65.4 49.5 15 8 6.1
Echallens 68.3 72.6 18 2 24.2
Franches-Mnt 92.5 39.7 5 5 93.4
Infant.Mortality
Yverdon 22.5
Echallens 21.2
Franches-Mnt 20.2
Model performance metrics to estimate prediction error
To assess how well a model predicts new data, the basic strategy is:
Build (train) the model using a training dataset
Fit the model on a subset of the data, called the training set, so that it can learn the underlying patterns and relationships.Apply the model to a separate test dataset to make predictions
Use the fitted model to predict outcomes for a new, unseen dataset, called the test set.Compute the prediction error
Compare the predicted values with the actual observed values in the test set using appropriate performance metrics.
Model Validation
The Validation Set Approach
# Split the data into training and test set
set.seed(123)
training.samples<-swiss$Fertility%>% createDataPartition(p = 0.8, list = FALSE)
train.data<-swiss[training.samples, ]
test.data<-swiss[-training.samples, ]Build a model with train dataset
# Build the model
model <-lm(Fertility ~., data = train.data)
summary(model)
Call:
lm(formula = Fertility ~ ., data = train.data)
Residuals:
Min 1Q Median 3Q Max
-15.1389 -4.8632 0.6051 5.1009 14.0352
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 74.19340 12.45329 5.958 1.09e-06 ***
Agriculture -0.18635 0.07921 -2.353 0.0248 *
Examination -0.31997 0.27590 -1.160 0.2545
Education -0.87355 0.19479 -4.485 8.35e-05 ***
Catholic 0.10648 0.04179 2.548 0.0157 *
Infant.Mortality 0.78075 0.45373 1.721 0.0947 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.417 on 33 degrees of freedom
Multiple R-squared: 0.7145, Adjusted R-squared: 0.6713
F-statistic: 16.52 on 5 and 33 DF, p-value: 3.653e-08
# Make predictions and compute the R2, RMSE and MAE
predictions <-model %>% predict(test.data)
data.frame( R2 = R2(predictions, test.data$Fertility),
RMSE = RMSE(predictions, test.data$Fertility),
MAE = MAE(predictions, test.data$Fertility)) R2 RMSE MAE
1 0.5946201 6.410914 5.651552
# Prediction error rate
PER <-RMSE(predictions, test.data$Fertility)/mean(test.data$Fertility)
PER[1] 0.08800157
Leave one out cross validation-LOOCV
# Define training control
train.control<-trainControl(method = "LOOCV")
# Train the model
model <-train(Fertility ~., data = swiss, method = "lm", trControl= train.control)
# Summarize the results
print(model)Linear Regression
47 samples
5 predictor
No pre-processing
Resampling: Leave-One-Out Cross-Validation
Summary of sample sizes: 46, 46, 46, 46, 46, 46, ...
Resampling results:
RMSE Rsquared MAE
7.738618 0.6128307 6.116021
Tuning parameter 'intercept' was held constant at a value of TRUE
K-fold cross-validation
#K-fold cross-validation(cv)
#Define training control
set.seed(123)
train.control<-trainControl(method="cv",number=10)
#Train the model
model<-train(Fertility~.,data=swiss,method="lm",
trControl=train.control)
#Summarize the results
print(model)Linear Regression
47 samples
5 predictor
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 42, 42, 42, 42, 42, 44, ...
Resampling results:
RMSE Rsquared MAE
7.424916 0.6922072 6.31218
Tuning parameter 'intercept' was held constant at a value of TRUE
Repeated K-fold cross-validation
# Repeated K-fold cross-validation
# Define training control
set.seed(123)
train.control<-trainControl(method = "repeatedcv", number = 10, repeats = 3)
# Train the model
model <-train(Fertility ~., data = swiss, method = "lm", trControl= train.control)
# Summarize the results
print(model)Linear Regression
47 samples
5 predictor
No pre-processing
Resampling: Cross-Validated (10 fold, repeated 3 times)
Summary of sample sizes: 42, 42, 42, 42, 42, 44, ...
Resampling results:
RMSE Rsquared MAE
7.357186 0.6992415 6.15871
Tuning parameter 'intercept' was held constant at a value of TRUE
Model Comparison (Validation across models)
# Linear model
lm_model <- train(Fertility ~ ., data = swiss, method = "lm",
trControl = trainControl(method="cv", number=10))
# Ridge regression
ridge_model <- train(Fertility ~ ., data = swiss,
method = "ridge",
trControl = trainControl(method="cv", number=10),
tuneLength = 5)
lm_modelLinear Regression
47 samples
5 predictor
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 43, 42, 43, 42, 42, 42, ...
Resampling results:
RMSE Rsquared MAE
7.762255 0.6715954 6.376696
Tuning parameter 'intercept' was held constant at a value of TRUE
ridge_modelRidge Regression
47 samples
5 predictor
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 43, 42, 42, 43, 43, 41, ...
Resampling results across tuning parameters:
lambda RMSE Rsquared MAE
0e+00 7.207681 0.6976287 6.063376
1e-04 7.207527 0.6976531 6.063195
1e-03 7.206162 0.6978695 6.061578
1e-02 7.194424 0.6997459 6.046308
1e-01 7.179693 0.7020423 6.029846
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was lambda = 0.1.
The 10-fold cross-validation results indicate that the standard linear regression model provides superior predictive performance compared to ridge regression. The linear model explains approximately 77.5% of the variance in fertility, with lower RMSE and MAE values. The lack of improvement under ridge regularization suggests that multicollinearity is not a dominant issue and that the model is not overfitting. From a verification and validation perspective, this supports the adequacy, stability, and generalizability of the linear regression model for this dataset.
Exercise
Using the swiss dataset in R, complete the following tasks:
a) Load the swiss dataset and explore its structure using appropriate R functions (e.g., head(), str(), and summary()).
b) Randomly split the dataset into: 80% Training Set and 20% Testing Set
Set a random seed to ensure reproducibility.
c) Fit a multiple linear regression model using the training dataset, with Fertility as the response variable and all remaining variables as predictors.
Then:
Predict Fertility values for the test dataset.
Calculate model performance measures such as:
Mean Squared Error (MSE)
Root Mean Squared Error (RMSE)
Mean Absolute Error (MAE)
d) Using the training dataset, evaluate the regression model using the following validation techniques:
Validation Set Approach (Train-Test Split)
Leave-One-Out Cross-Validation (LOOCV)
10-Fold Cross-Validation
Repeated 10-Fold Cross-Validation (3 Repeats)
For each method:
Compute the cross-validated prediction error.
Summarize the results in a table.
e) Compare the performance of the four validation methods and answer the following questions:
Which validation method produced the lowest prediction error?
Which method is expected to provide the most stable estimate of model performance?
What are the advantages and disadvantages of LOOCV, K-Fold CV, and Repeated K-Fold CV?
Based on the results, which validation method would you recommend for this dataset and why?
Bonus Task
Create a comparison table showing the prediction errors obtained from all validation methods and visualize the results using a bar chart.
Acknowledgement
We sincerely thank Prof. Dr. Md. Rezaul Karim Sir and the Department of Statistics, Rajshahi University, for their valuable educational resources. The content used here has been adapted from the Rajshahi University website solely for learning and teaching purposes.
References
Lazic, S. E., & Roche, H. L. (2012). Introducing Monte Carlo methods with R.
Rubinstein, R. Y., & Kroese, D. P. (2011). Simulation and the Monte Carlo method (Vol. 707). John Wiley & Sons.
Banks, J. (1998). Handbook of simulation. Wiley, New York.
Law, A. M., & Kelton, W. D. (2000). Simulation modeling and analysis. McGraw-Hill, Boston, Burr Ridge, USA.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2014). An introduction to statistical learning: With applications in R. Springer.
Robert, C. P., & Casella, G. (2010). Introducing Monte Carlo methods with R (Vol. 18). Springer, New York.
Ross, S. M. (2006). Simulation (4th ed.). Academic Press.
Singh, V. P. (2009). System modeling and simulation. New Age International (P) Limited, New Delhi.
Suess, E. A., & Trumbo, B. E. (2010). Introduction to probability simulation and Gibbs sampling with R. Springer.