Workshop: Pricing Optimization for insurance products

Objective:

To teach participants how to analyze and optimize insurance pricing strategies using R, with a focus on balancing profitability, customer satisfaction, and competitive positioning.


1. Workshop outline

Introduction to Pricing optimization

  • Topics:
    • Importance of pricing in the insurance industry.
    • Factors influencing pricing (e.g., risk, market competition, customer behavior).
    • Overview of optimization goals:
      • Maximizing profits.
      • Minimizing customer churn.
      • Maintaining regulatory compliance.
  • Use Cases:
    • Dynamic pricing for auto and health insurance.
    • Premium adjustments based on risk assessment.

Importance of pricing in the insurance industry

This section emphasizes the significance of pricing in the insurance sector. Pricing directly affects an insurer’s profitability and competitive positioning while ensuring compliance with regulations. Key points include:

  • Strategic Importance: Pricing is crucial to balance revenue generation and market competitiveness.
  • Risk and Behavior Considerations: Factors like risk level, market trends, and customer behavior significantly influence pricing decisions.
  • Goals: Effective pricing aims to maximize profits, retain customers, and maintain compliance with industry regulations.

This concept forms the foundation for the workshop, providing a practical understanding of the critical role pricing plays in the insurance industry.

2. Creating the R markdown document

This workshop will use an R Markdown file to provide explanations, coding exercises, and sample datasets. Below is the outline of the R Markdown content:

R Markdown header

---
title: "Pricing Optimization Workshop"
author: "Workshop Instructor"
date: "2025-02-23"
output:
  html_document:
    theme: flatly
    highlight: tango
---

3. Data exploration and preparation

Sample Dataset

Participants will work with the following dataset, which can be loaded from a CSV file:

# Sample dataset creation
set.seed(123)
data <- data.frame(
  Policy_ID = 1:1000,
  Customer_Age = sample(20:70, 1000, replace = TRUE),
  Annual_Premium = round(runif(1000, 500, 3000), 2),
  Claim_Frequency = rpois(1000, lambda = 1),
  Claim_Severity = round(rlnorm(1000, meanlog = 7, sdlog = 0.5), 2),
  Competitor_Price = round(runif(1000, 400, 3200), 2),
  Churn = sample(0:1, 1000, replace = TRUE, prob = c(0.8, 0.2))
)
write.csv(data, "insurance_pricing.csv", row.names = FALSE)
head(data,5)
##   Policy_ID Customer_Age Annual_Premium Claim_Frequency Claim_Severity
## 1         1           50        1277.92               3        1289.50
## 2         2           34         696.23               0        1193.53
## 3         3           70        1304.36               1        2217.35
## 4         4           33        2062.26               0        1817.03
## 5         5           22        1600.60               3        1246.00
##   Competitor_Price Churn
## 1           650.31     0
## 2          2853.44     0
## 3           807.67     0
## 4          2178.77     0
## 5          1649.47     0

Here’s a simple explanation of each field in the dataset:

  1. Policy_ID: A unique identifier for each insurance policy. It helps distinguish individual policies in the dataset.

  2. Customer_Age: The age of the customer holding the policy. It’s used to analyze customer demographics and assess risk (e.g., older customers might have different insurance needs).

  3. Annual_Premium: The yearly amount paid by the customer for the insurance policy. This field is central to pricing analysis.

  4. Claim_Frequency: The number of claims a customer makes in a year. It’s generated using a Poisson distribution and represents how often customers file claims, which is a critical risk factor.

  5. Claim_Severity: The monetary impact of claims, expressed in currency units. It’s generated using a log-normal distribution and represents the cost of claims made by customers.

  6. Competitor_Price: The price offered by competitors for similar policies. This field helps in competitive analysis and adjusting pricing strategies.

  7. Churn: Indicates whether the customer has canceled their policy (1 = canceled, 0 = retained). It’s important for understanding customer retention and loyalty trends.

Each of these fields provides vital information for exploring, modeling, and optimizing insurance pricing strategies.

Data Loading and Exploration

# Load dataset
library(tidyverse)
data <- read.csv("insurance_pricing.csv")

# Explore dataset
str(data)
## 'data.frame':    1000 obs. of  7 variables:
##  $ Policy_ID       : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Customer_Age    : int  50 34 70 33 22 61 69 62 56 33 ...
##  $ Annual_Premium  : num  1278 696 1304 2062 1601 ...
##  $ Claim_Frequency : int  3 0 1 0 3 2 2 1 2 0 ...
##  $ Claim_Severity  : num  1290 1194 2217 1817 1246 ...
##  $ Competitor_Price: num  650 2853 808 2179 1649 ...
##  $ Churn           : int  0 0 0 0 0 0 0 0 0 0 ...
summary(data)
##    Policy_ID       Customer_Age   Annual_Premium   Claim_Frequency
##  Min.   :   1.0   Min.   :20.00   Min.   : 502.9   Min.   :0.000  
##  1st Qu.: 250.8   1st Qu.:32.00   1st Qu.:1110.9   1st Qu.:0.000  
##  Median : 500.5   Median :44.50   Median :1748.8   Median :1.000  
##  Mean   : 500.5   Mean   :44.77   Mean   :1747.7   Mean   :1.001  
##  3rd Qu.: 750.2   3rd Qu.:57.00   3rd Qu.:2365.4   3rd Qu.:2.000  
##  Max.   :1000.0   Max.   :70.00   Max.   :2998.8   Max.   :6.000  
##  Claim_Severity   Competitor_Price     Churn      
##  Min.   : 162.0   Min.   : 404.6   Min.   :0.000  
##  1st Qu.: 778.5   1st Qu.:1154.8   1st Qu.:0.000  
##  Median :1098.6   Median :1782.4   Median :0.000  
##  Mean   :1232.3   Mean   :1811.6   Mean   :0.191  
##  3rd Qu.:1513.6   3rd Qu.:2501.5   3rd Qu.:0.000  
##  Max.   :7524.2   Max.   :3199.1   Max.   :1.000
ggplot(data, aes(x = Annual_Premium)) +
  geom_histogram(bins = 30, fill = "skyblue", color = "black") +
  labs(title = "Distribution of Annual Premiums")


4. Feature Engineering (30 minutes)

# Create risk score
data_original<-data
data <- data %>%
  mutate(Risk_Score = Claim_Frequency * Claim_Severity / Annual_Premium)

# Categorize customers by age group
data <- data %>%
  mutate(Age_Group = case_when(
    Customer_Age < 30 ~ "Young",
    Customer_Age >= 30 & Customer_Age < 60 ~ "Middle-aged",
    TRUE ~ "Senior"
  ))

# Explore engineered features
summary(data$Risk_Score)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.5053  0.8479  1.1717 13.1114

5. Modeling pricing strategies

Linear Regression model

# Fit a linear regression model
model <- lm(Annual_Premium ~ Claim_Frequency + Claim_Severity + Age_Group, data = data)
summary(model)
## 
## Call:
## lm(formula = Annual_Premium ~ Claim_Frequency + Claim_Severity + 
##     Age_Group, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1311.76  -629.29     3.46   622.82  1324.94 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      1.720e+03  5.727e+01  30.029   <2e-16 ***
## Claim_Frequency  4.168e+01  2.361e+01   1.766   0.0778 .  
## Claim_Severity   1.193e-03  3.432e-02   0.035   0.9723    
## Age_GroupSenior -4.696e+01  5.907e+01  -0.795   0.4269    
## Age_GroupYoung  -2.871e+01  5.902e+01  -0.486   0.6268    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 726.3 on 995 degrees of freedom
## Multiple R-squared:  0.003776,   Adjusted R-squared:  -0.0002286 
## F-statistic: 0.9429 on 4 and 995 DF,  p-value: 0.4383
# Predict premiums
data$Predicted_Premium <- predict(model, data)

Evaluation

RMSE measures the average error between the actual and predicted values, but it gives more weight to larger errors. It’s a good indicator of how accurate the predictions are. Lower RMSE = Better predictions.

# Calculate RMSE
library(Metrics)
rmse(data$Annual_Premium, data$Predicted_Premium)
## [1] 724.5253
#mae(data$Annual_Premium, data$Predicted_Premium)

6. Optimization techniques

Optimization example

# Install optimization package
if (!require("nloptr")) install.packages("nloptr")
library(nloptr)

# Define objective function
objective <- function(price) {
  revenue <- sum(price * (1 - data$Churn))
  cost <- sum(data$Claim_Frequency * data$Claim_Severity)
  return(-1 * (revenue - cost))
}

# Optimization
opt_result <- nloptr(
  x0 = rep(mean(data$Annual_Premium), nrow(data)),
  eval_f = objective,
  lb = rep(500, nrow(data)),
  ub = rep(3000, nrow(data)),
  opts = list("algorithm" = "NLOPT_LN_COBYLA", "xtol_rel" = 1e-4)
)
opt_result

7. Building a Pricing Dashboard

library(shiny)
library(ggplot2)

ui <- fluidPage(
  titlePanel("Pricing Optimization Dashboard"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("age", "Customer Age:", min = 20, max = 70, value = 40),
      numericInput("claims", "Claim Frequency:", value = 1, min = 0),
      numericInput("severity", "Claim Severity:", value = 5000, min = 0)
    ),
    mainPanel(
      plotOutput("premiumPlot")
    )
  )
)

server <- function(input, output) {
  output$premiumPlot <- renderPlot({
    predicted <- predict(model, data.frame(
      Claim_Frequency = input$claims,
      Claim_Severity = input$severity,
      Age_Group = ifelse(input$age < 30, "Young",
                         ifelse(input$age < 60, "Middle-aged", "Senior"))
    ))
    ggplot(data, aes(x = Predicted_Premium)) +
      geom_histogram(fill = "blue", color = "black", bins = 30) +
      geom_vline(xintercept = predicted, color = "red", linetype = "dashed") +
      labs(title = "Predicted Premium", x = "Premium", y = "Frequency")
  })
}

shinyApp(ui, server)
Shiny applications not supported in static R Markdown documents

8. Discussion and wrap-up