To teach participants how to analyze and optimize insurance pricing strategies using R, with a focus on balancing profitability, customer satisfaction, and competitive positioning.
This section emphasizes the significance of pricing in the insurance sector. Pricing directly affects an insurer’s profitability and competitive positioning while ensuring compliance with regulations. Key points include:
This concept forms the foundation for the workshop, providing a practical understanding of the critical role pricing plays in the insurance industry.
This workshop will use an R Markdown file to provide explanations, coding exercises, and sample datasets. Below is the outline of the R Markdown content:
---
title: "Pricing Optimization Workshop"
author: "Workshop Instructor"
date: "2025-02-23"
output:
html_document:
theme: flatly
highlight: tango
---
Participants will work with the following dataset, which can be loaded from a CSV file:
# Sample dataset creation
set.seed(123)
data <- data.frame(
Policy_ID = 1:1000,
Customer_Age = sample(20:70, 1000, replace = TRUE),
Annual_Premium = round(runif(1000, 500, 3000), 2),
Claim_Frequency = rpois(1000, lambda = 1),
Claim_Severity = round(rlnorm(1000, meanlog = 7, sdlog = 0.5), 2),
Competitor_Price = round(runif(1000, 400, 3200), 2),
Churn = sample(0:1, 1000, replace = TRUE, prob = c(0.8, 0.2))
)
write.csv(data, "insurance_pricing.csv", row.names = FALSE)
head(data,5)
## Policy_ID Customer_Age Annual_Premium Claim_Frequency Claim_Severity
## 1 1 50 1277.92 3 1289.50
## 2 2 34 696.23 0 1193.53
## 3 3 70 1304.36 1 2217.35
## 4 4 33 2062.26 0 1817.03
## 5 5 22 1600.60 3 1246.00
## Competitor_Price Churn
## 1 650.31 0
## 2 2853.44 0
## 3 807.67 0
## 4 2178.77 0
## 5 1649.47 0
Here’s a simple explanation of each field in the dataset:
Policy_ID: A unique identifier for each insurance policy. It helps distinguish individual policies in the dataset.
Customer_Age: The age of the customer holding the policy. It’s used to analyze customer demographics and assess risk (e.g., older customers might have different insurance needs).
Annual_Premium: The yearly amount paid by the customer for the insurance policy. This field is central to pricing analysis.
Claim_Frequency: The number of claims a customer makes in a year. It’s generated using a Poisson distribution and represents how often customers file claims, which is a critical risk factor.
Claim_Severity: The monetary impact of claims, expressed in currency units. It’s generated using a log-normal distribution and represents the cost of claims made by customers.
Competitor_Price: The price offered by competitors for similar policies. This field helps in competitive analysis and adjusting pricing strategies.
Churn: Indicates whether the customer has canceled their policy (1 = canceled, 0 = retained). It’s important for understanding customer retention and loyalty trends.
Each of these fields provides vital information for exploring, modeling, and optimizing insurance pricing strategies.
# Load dataset
library(tidyverse)
data <- read.csv("insurance_pricing.csv")
# Explore dataset
str(data)
## 'data.frame': 1000 obs. of 7 variables:
## $ Policy_ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Customer_Age : int 50 34 70 33 22 61 69 62 56 33 ...
## $ Annual_Premium : num 1278 696 1304 2062 1601 ...
## $ Claim_Frequency : int 3 0 1 0 3 2 2 1 2 0 ...
## $ Claim_Severity : num 1290 1194 2217 1817 1246 ...
## $ Competitor_Price: num 650 2853 808 2179 1649 ...
## $ Churn : int 0 0 0 0 0 0 0 0 0 0 ...
summary(data)
## Policy_ID Customer_Age Annual_Premium Claim_Frequency
## Min. : 1.0 Min. :20.00 Min. : 502.9 Min. :0.000
## 1st Qu.: 250.8 1st Qu.:32.00 1st Qu.:1110.9 1st Qu.:0.000
## Median : 500.5 Median :44.50 Median :1748.8 Median :1.000
## Mean : 500.5 Mean :44.77 Mean :1747.7 Mean :1.001
## 3rd Qu.: 750.2 3rd Qu.:57.00 3rd Qu.:2365.4 3rd Qu.:2.000
## Max. :1000.0 Max. :70.00 Max. :2998.8 Max. :6.000
## Claim_Severity Competitor_Price Churn
## Min. : 162.0 Min. : 404.6 Min. :0.000
## 1st Qu.: 778.5 1st Qu.:1154.8 1st Qu.:0.000
## Median :1098.6 Median :1782.4 Median :0.000
## Mean :1232.3 Mean :1811.6 Mean :0.191
## 3rd Qu.:1513.6 3rd Qu.:2501.5 3rd Qu.:0.000
## Max. :7524.2 Max. :3199.1 Max. :1.000
ggplot(data, aes(x = Annual_Premium)) +
geom_histogram(bins = 30, fill = "skyblue", color = "black") +
labs(title = "Distribution of Annual Premiums")
# Create risk score
data_original<-data
data <- data %>%
mutate(Risk_Score = Claim_Frequency * Claim_Severity / Annual_Premium)
# Categorize customers by age group
data <- data %>%
mutate(Age_Group = case_when(
Customer_Age < 30 ~ "Young",
Customer_Age >= 30 & Customer_Age < 60 ~ "Middle-aged",
TRUE ~ "Senior"
))
# Explore engineered features
summary(data$Risk_Score)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.5053 0.8479 1.1717 13.1114
# Fit a linear regression model
model <- lm(Annual_Premium ~ Claim_Frequency + Claim_Severity + Age_Group, data = data)
summary(model)
##
## Call:
## lm(formula = Annual_Premium ~ Claim_Frequency + Claim_Severity +
## Age_Group, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1311.76 -629.29 3.46 622.82 1324.94
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.720e+03 5.727e+01 30.029 <2e-16 ***
## Claim_Frequency 4.168e+01 2.361e+01 1.766 0.0778 .
## Claim_Severity 1.193e-03 3.432e-02 0.035 0.9723
## Age_GroupSenior -4.696e+01 5.907e+01 -0.795 0.4269
## Age_GroupYoung -2.871e+01 5.902e+01 -0.486 0.6268
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 726.3 on 995 degrees of freedom
## Multiple R-squared: 0.003776, Adjusted R-squared: -0.0002286
## F-statistic: 0.9429 on 4 and 995 DF, p-value: 0.4383
# Predict premiums
data$Predicted_Premium <- predict(model, data)
RMSE measures the average error between the actual and predicted values, but it gives more weight to larger errors. It’s a good indicator of how accurate the predictions are. Lower RMSE = Better predictions.
# Calculate RMSE
library(Metrics)
rmse(data$Annual_Premium, data$Predicted_Premium)
## [1] 724.5253
#mae(data$Annual_Premium, data$Predicted_Premium)
# Install optimization package
if (!require("nloptr")) install.packages("nloptr")
library(nloptr)
# Define objective function
objective <- function(price) {
revenue <- sum(price * (1 - data$Churn))
cost <- sum(data$Claim_Frequency * data$Claim_Severity)
return(-1 * (revenue - cost))
}
# Optimization
opt_result <- nloptr(
x0 = rep(mean(data$Annual_Premium), nrow(data)),
eval_f = objective,
lb = rep(500, nrow(data)),
ub = rep(3000, nrow(data)),
opts = list("algorithm" = "NLOPT_LN_COBYLA", "xtol_rel" = 1e-4)
)
opt_result
library(shiny)
library(ggplot2)
ui <- fluidPage(
titlePanel("Pricing Optimization Dashboard"),
sidebarLayout(
sidebarPanel(
sliderInput("age", "Customer Age:", min = 20, max = 70, value = 40),
numericInput("claims", "Claim Frequency:", value = 1, min = 0),
numericInput("severity", "Claim Severity:", value = 5000, min = 0)
),
mainPanel(
plotOutput("premiumPlot")
)
)
)
server <- function(input, output) {
output$premiumPlot <- renderPlot({
predicted <- predict(model, data.frame(
Claim_Frequency = input$claims,
Claim_Severity = input$severity,
Age_Group = ifelse(input$age < 30, "Young",
ifelse(input$age < 60, "Middle-aged", "Senior"))
))
ggplot(data, aes(x = Predicted_Premium)) +
geom_histogram(fill = "blue", color = "black", bins = 30) +
geom_vline(xintercept = predicted, color = "red", linetype = "dashed") +
labs(title = "Predicted Premium", x = "Premium", y = "Frequency")
})
}
shinyApp(ui, server)