This report describes a model for predicting incremental paid losses on an individual claim basis (“The Model”). The model uses a mixture of predictive modeling and simulation techniques to mimic real world claim development.
Note that this report is a much simplified version of the full ensemble of models used in the complete analysis and is meant merely as a starting point for teaching and learning purposes.
Models, by definition, use simplified assumptions of reality to reveal information or predict events based on the underlying data. Models which closely mimic the fundamental forces driving the data have the best chance of providing valuable insights.
Due to data and computing limitations, actuaries have traditionally aggregated loss information by policy, accident, or calendar period to project future losses. By aggregating losses, the actuary loses valuable claim level information.
The model assumes that individual claims and their claim level characteristics are the fundamental drivers of future payments. Therefore, in accordance with the philosophy that the best models are those which mimic reality most closely, the model uses information at the individual claim level and applies statistically rigorous techniques to fit and simulate individual claim development.
The model is meant to be a starting point for anyone looking to discover new and advanced methods for performing micro-claims analysis and machine-learning modelling techniques that provide insights beyond the typical aggregated actuarial practices in P&C. Additionally, the model is a showcase for the statistical power that the R programming language can provide, especially for those with prior statistical and mathematical knowledge in applied predictive analytics and probability theory.
I decided to use only a few very common predictor variables so the model could easily be applied to other data sets. For transparency and to aid interested individuals, I provide this report with access to the R code used to fit and run predictions. The code can be viewed by clicking the code boxes on the right side of the report. The R savvy reader can run the R code to reproduce the output, apply the model to other data sets, and expand and improve upon the model.
The model is only applicable to reported claims and their corresponding incremental payments. IBNR claim predictions are beyond the scope of this model.
For consistency and clarity I use the following terms:
In the spirit of mimicking the real world, this report communicates the model through a working example using real auto-liability data supplied mostly from the insuranceData R Package (GitHub Repo) as well as publicly available data supplied by the CAS.
Note that this model has been tuned to form predictions for Bodily Injury claims only, as these claims drive much of the risk behind Auto Liability reserving and rate-making.
That being said, although the results of the model are specific to Auto Liability in this instance, the modeling techniques and machine-learned tuning procedures can easily be generalized to other lines of coverage, areas of business, and risk portfolios.
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
library(caret, warn.conflicts = FALSE)
library(lubridate)
library(ggplot2, warn.conflicts = FALSE)
library(knitr)
library(DiagrammeR)
library(scales)
require(e1071)
require(bindrcpp)
library(qs)
library(webshot)
# turn off scientific notation in printing
options(scipen = 999)
# load data
claims <- qs::qread("data/model-claims.qs")
# development age to project from
# e.g. if I set this to 18 I will use information available
# as of the 18 month evaluation to predict values at age 30
devt_period <- 30
# year to predict with model
# evaluations at or greater than this time will not be included in
# the model fit
predict_eval <- as.Date("2010-11-30")
# remove unneeded data
claims <- dplyr::filter(claims,
eval <= predict_eval,
eval >= predict_eval - years(5)) %>%
dplyr::select(eval, devt, claim_number,
status, tot_rx, tot_pd_incr,
status_act, tot_pd_incr_act)
devt_periods_needed <- seq(6, devt_period + 12, by = 12)
# for showing in triangle
claims_display <- dplyr::filter(claims, devt %in% devt_periods_needed)
# only need the claims at the selected `devt_period`
claims <- dplyr::filter(claims, devt %in% devt_period)

I am using data from fiscal years 2003 to 2011.
Fiscal years begin at 6/1 of the year prior to the fiscal year and end at 5/31 of the fiscal year (i.e. fiscal year 2003 includes claims which occurred between 6/1/2002 and 5/31/2003).
The claims are evaluated at 11/30 of each year from 2002 to 2010.
I created a large data set containing all the data used in fitting the model and making predictions from the model.
The original data and the script used to prepare the large data set used in this report are located in the data/ directory, and details can be viewed in this report’s Appendix.
The model uses several advanced statistical techniques. For compactness and because I lack the expertise to explain everything in detail, a comprehensive explanation of these techniques is beyond the scope of this analysis.
Wherever possible I have included links to additional resources for diving into the statistics behind the model. The statistics will be touched upon only briefly as each technique is used in the model fit.
The first step in fitting the model is to feed training data into the model.
I am using all data from fiscal years prior to 2011 from development time 30 months to 42 months to fit the model.
Later I will pass the test data (i.e. claims from fiscal year 2011 at 30 months) to predict the status of each of these claims at 42 months and the incremental payment per claim from 30 to 42 months.
The following diagram illustrates how the model is fit:
mermaid("
graph TD
A(Claim Train Data)-->B{Fit Closure Model}
A(Claim Train Data)-->C[Remove Closed-Closed]
A(Claim Train Data)-->D[Remove Zero Paid]
C-->E{Fit Zero Model}
D-->F{Fit Payment Model}
")

After fitting the models (pictured as rhombuses in the above diagram) with the claims training data, I can use the three models to predict a probability for status and for zero payment, and a dollar value for incremental payments, on the test data.
The claims test data flows through the following diagram to arrive at the final output:
At each model (pictured as a rhombus in the above diagram) the claim in the test data is given a predicted value based on the model. I then run a simulation based on this predicted value to model real world variability.
At each step the simulated claims are passed to the next model based on the simulated results from the previous model.
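The flow described above can be sketched for a single claim as follows. This is a minimal illustration only: the function `simulate_claim` and its arguments (`p_open`, `p_nonzero`, `fitted_payment`) are hypothetical stand-ins for the outputs of the three fitted models, and the final step returns the fitted payment as-is rather than drawing it at random.

```r
# Hypothetical helper: simulate one scenario for a single claim.
# status_30      - status at 30 months, "O" (open) or "C" (closed)
# p_open         - closure model probability of being open at 42 months
# p_nonzero      - zero payment model probability of a non-zero payment,
#                  as a named vector indexed by simulated status: c(O = , C = )
# fitted_payment - fitted incremental payment from the payment model
simulate_claim <- function(status_30, p_open, p_nonzero, fitted_payment) {
  # step 1: simulate the status at 42 months
  status_42 <- if (rbinom(1, size = 1, prob = p_open) == 1) "O" else "C"
  # closed-closed claims skip the remaining models and get a zero payment
  if (status_30 == "C" && status_42 == "C") {
    return(list(status = status_42, payment = 0))
  }
  # step 2: simulate whether the incremental payment is non-zero
  nonzero <- rbinom(1, size = 1, prob = p_nonzero[[status_42]]) == 1
  # step 3: only non-zero claims reach the payment model
  payment <- if (nonzero) fitted_payment else 0
  list(status = status_42, payment = payment)
}

set.seed(1234)
# a closed claim that cannot reopen is closed-closed, so its payment is zero
sim_closed <- simulate_claim("C", p_open = 0,
                             p_nonzero = c(O = 0.8, C = 0.6),
                             fitted_payment = 5000)
# an open claim that is certain to stay open and to pay gets the fitted payment
sim_open <- simulate_claim("O", p_open = 1,
                           p_nonzero = c(O = 1, C = 1),
                           fitted_payment = 5000)
```

Repeating a function like this many times per claim, and summing across claims within each repetition, gives one simulated portfolio outcome per repetition.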
To predict whether a claim will close within a given period of time, I use a logistic regression with center, scale, and Yeo-Johnson transformations applied to all continuous predictor variables.
I am modeling the following variables:
Response Variable
Predictor Variables
# remove data the same valuation or newer than the prediction eval
# Only claims from valuations before the valuation I am predicting will be used
# to fit the model
model_data <- dplyr::filter(claims, eval < predict_eval)

The model fit uses 10-fold cross-validation, repeated twice, to assess predictive performance and a stepwise Akaike Information Criterion (AIC) algorithm for feature selection:
cm_model <- caret::train(status_act ~ status + tot_rx + tot_pd_incr,
data = model_data,
method = "glmStepAIC",
trace = FALSE,
preProcess = c("center", "scale", "YeoJohnson"),
trControl = trainControl(method = "repeatedcv",
repeats = 2))
cm_summary <- cm_model$results[, -1]
kable(cm_summary,
digits = 5,
row.names = FALSE)

| Accuracy | Kappa | AccuracySD | KappaSD |
|---|---|---|---|
| 0.8824 | 0.45185 | 0.015 | 0.10189 |
For a more detailed statistical summary of the claim closure model fit, see the summary(cm_model) output in the Appendix.
cm_probs <- cbind(model_data, predict(cm_model,
newdata = model_data,
type = "prob"))
# find the logit value
cm_probs$logits <- log(cm_probs$O / cm_probs$C)

In the plots below the blue line indicates the fitted probability of the claim at age 30 months being open at age 42 months.
The red dots at the top and bottom are the actual statuses for the training data at 42 months (i.e. the model fits the blue line to the red dots).
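The x-axis of the plot is the log odds (logit) of the predicted open probability. As a small sketch with made-up probabilities (not values from the report's data), the logit calculation and its inverse look like this in base R:

```r
# illustrative open probabilities and the complementary closed probabilities
p_open <- c(0.0075, 0.25, 0.5, 0.9)
p_closed <- 1 - p_open

# log odds, computed the same way as `cm_probs$logits`
logits <- log(p_open / p_closed)

# base R's qlogis() gives the same values, and plogis() maps them back
same_as_qlogis <- isTRUE(all.equal(logits, qlogis(p_open)))
round_trip <- isTRUE(all.equal(plogis(logits), p_open))
```

A logit of zero corresponds to a probability of 0.5, negative logits to claims more likely closed than open, and positive logits to claims more likely open.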
cm_probs$status_act <- ifelse(cm_probs$status_act == "C", 0, 1)
ggplot(cm_probs, aes(x = logits, y = status_act)) +
geom_point(colour = "red",
position = position_jitter(height = 0.1, width = 0.1),
size = 0.5,
alpha = 0.2) +
geom_smooth(method = "glm", method.args = list(family = "binomial"),
size = 1) +
ylab("Probability Open") +
xlab("Logit Odds") +
ggtitle(paste0("Age ", devt_period, " to ", devt_period + 12, " Months Claim Open Probabilities"))

The zero payment model is similar to the claim closure model in that I am looking at a binomial response variable. I am modeling whether the claim has zero or nonzero incremental payments.
I remove all claims that have a status of closed at 30 months and a status of closed at 42 months (I refer to these claims as closed-closed claims).
Additionally, I assume that all of these claims will ultimately have zero incremental payments in the final payment model.
Response Variable
Predictor Variables
Note: I could use status_act as a predictor variable here because for the test data I will simulate the status at 42 first and then use that simulated status as a predictor variable in the zero payment model.
# remove all claims that have a closed closed status from the data
# these will be set to incremental payments of 0
zm_model_data <- filter(model_data, status == "O" | status_act == "O")
# Add in response variable for zero payment:
zm_model_data$zero <- factor(ifelse(zm_model_data$tot_pd_incr_act == 0,
"Zero", "NonZero"))

I use the same data prepared for the claim closure model to fit the zero payment model.
zm_model <- caret::train(zero ~ status + status_act + tot_rx + tot_pd_incr,
data = zm_model_data,
method = "glmStepAIC",
trace = FALSE,
preProcess = c("center", "scale", "YeoJohnson"),
trControl = trainControl(method = "repeatedcv",
repeats = 2))
zm_summary <- zm_model$results[, -1]
kable(zm_summary,
row.names = FALSE)

| Accuracy | Kappa | AccuracySD | KappaSD |
|---|---|---|---|
| 0.7735232 | 0.032139 | 0.0063228 | 0.0348476 |
zm_probs <- cbind(zm_model_data, predict(zm_model,
newdata = zm_model_data,
type = "prob"))
zm_probs$logits <- log(zm_probs$NonZero / zm_probs$Zero)

In the plots below, the blue line indicates the fitted probability of the claim having a payment between age 30 and 42 months. The red dots at the top are the actual claims with payments between age 30 and 42 months, and the dots at the bottom are the claims with zero payments during this time period (i.e. the zero payment model fits the blue line to the red dots).
zm_probs$zero <- ifelse(zm_probs$zero == "Zero", 0, 1)
ggplot(zm_probs, aes(x = logits, y = zero)) +
geom_point(colour = "red",
position = position_jitter(height = 0.1, width = 0.1),
size = 0.5,
alpha = 0.2) +
geom_smooth(method = "glm", method.args = list(family = "binomial"),
size = 1) +
ylab("Payment Probability") +
xlab("Logit Odds") +
ggtitle(paste0("Age ", devt_period, " to ", devt_period + 12,
" Non-Zero Incremental Payment"))

The incremental payment model models incremental payments between 30 and 42 months. It uses a generalized additive model (GAM) with integrated smoothness estimation, a quasi-Poisson family, and a log link function.
Response Variable
Predictor Variables
# take out zero payments
nzm_model_data <- zm_model_data[zm_model_data$tot_pd_incr_act > 0, ]
# fit incremental payment model
nzm_model <- mgcv::gam(tot_pd_incr_act ~ status_act + s(tot_rx) + s(tot_pd_incr),
                       data = nzm_model_data,
                       family = quasipoisson(link = "log"))
nzm_fit <- cbind(nzm_model_data,
                 tot_pd_incr_sim = exp(predict(nzm_model, newdata = nzm_model_data)))
# plots to be determined
set.seed(1234)
n_sims <- 2000
cm_pred_data <- dplyr::filter(claims, eval == predict_eval)
cm_probs <- cbind(cm_pred_data,
predict(cm_model, newdata = cm_pred_data, type = "prob"))
cm_pred <- lapply(cm_probs$O, rbinom, n = n_sims, size = 1)
cm_pred <- matrix(unlist(cm_pred), ncol = n_sims, byrow = TRUE)
cm_pred <- ifelse(cm_pred == 1, "O", "C")
cm_pred <- as.data.frame(cm_pred)

I use the probabilities returned from the closure model to simulate the status of all of the claims.
I simulate each claim 2000 times.
The table below shows selected age 30 claims after they had their closure probability predicted by the closure model and their status simulated using a simulated binomial random variable.
cm_out <- cm_probs
cm_out <- dplyr::select(cm_out, claim_number, status, tot_rx, tot_pd_incr, O)
cm_out$status_sim <- cm_pred[, 1]
cm_out <- cm_out[c(1, 6, 24, 2, 10), ]
names(cm_out) <- c("Claim Number", "Status", "Case", "Paid Incre",
"Prob Open", "Sim Status")
kable(cm_out,
row.names = FALSE)

| Claim Number | Status | Case | Paid Incre | Prob Open | Sim Status |
|---|---|---|---|---|---|
| 2008137571 | C | 0 | 0.00 | 0.0075471 | C |
| 2008137835 | C | 0 | 17754.26 | 0.0072112 | C |
| 2008138427 | C | 0 | 0.00 | 0.0075471 | C |
| 2008137654 | C | 0 | 0.00 | 0.0075471 | C |
| 2008138095 | C | 0 | 0.00 | 0.0075471 | C |
The Prob Open column is the probability that the age 30 claim will be open at age 42 as modeled in the closure model. The Sim Status column is the result of a Bernoulli simulation on each of those probabilities.
I am running this simulation 2000 times to simulate 2000 closure scenarios.
The simulations allow me to determine the corresponding distribution’s confidence intervals.
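As a rough sketch of how such intervals can be read off the simulations, the example below simulates open claim counts for a handful of made-up open probabilities and takes empirical percentiles of the result. The probabilities and the 95% level are assumptions for illustration, not values from the report.

```r
set.seed(1234)

# toy stand-ins for predicted open probabilities (not the report's data)
p_open <- c(0.9, 0.6, 0.3, 0.05, 0.01)
n_sims <- 2000

# one row per claim, one column per simulated scenario
status_sim <- sapply(seq_len(n_sims),
                     function(i) rbinom(length(p_open), size = 1, prob = p_open))

# number of open claims in each simulated scenario
open_counts <- colSums(status_sim)

# empirical 95% interval from the simulated distribution
interval <- quantile(open_counts, probs = c(0.025, 0.975))
```

The same percentile approach applies to any per-scenario total, such as simulated incremental payments summed across claims.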
Next, the zero payment model predicts, for each simulated claim and its simulated status, the probability of a non-zero incremental payment. The zero or non-zero outcome is then simulated from this probability using the same random binomial approach used for the closure status.
# put closure model predictions together
cm_pred <- cbind(cm_probs[, c("claim_number"), drop = FALSE], cm_pred)
# gather `cm_pred` into a long data frame
cm_pred <- tidyr::gather(cm_pred, key = "sim_num",
value = "status_sim",
-claim_number)
# join `zm_pred_data` to predictions from closure model
# remove status_act and rename the simulated states as status_act
zm_pred_data <- left_join(cm_pred, cm_probs, by = "claim_number") %>%
dplyr::select(-status_act) %>%
dplyr::rename(status_act = status_sim)
# remove all claims that have a closed closed status from the data
# these will be set to incremental payments of 0
closed_closed_data <- dplyr::filter(zm_pred_data, status == "C" & status_act == "C")
zm_pred_data <- filter(zm_pred_data, status == "O" | status_act == "O")
zm_pred <- cbind(zm_pred_data,
predict(zm_model, newdata = zm_pred_data, type = "prob"))
zm_pred$zero_sim <- sapply(zm_pred$NonZero, rbinom, n = 1, size = 1)
zm_pred$zero_sim <- ifelse(zm_pred$zero_sim == 1, "NonZero", "Zero")
zm_out <- zm_pred
zm_out <- dplyr::select(zm_out, claim_number, status, tot_rx, tot_pd_incr,
status_act, NonZero, zero_sim)
zm_out <- head(zm_out, 8)
names(zm_out) <- c("Claim Number", "Status", "Case", "Paid Incre",
"Sim Status", "Prob Non Zero", "Zero Sim")
kable(zm_out,
row.names = FALSE)

| Claim Number | Status | Case | Paid Incre | Sim Status | Prob Non Zero | Zero Sim |
|---|---|---|---|---|---|---|
| 2008137674 | O | 86018.83 | 3741.17 | O | 0.8621934 | NonZero |
| 2008137801 | O | 76990.10 | 4588.70 | O | 0.8541665 | Zero |
| 2008138088 | O | 123715.66 | 4861.24 | C | 0.8075995 | NonZero |
| 2008138093 | O | 20475.05 | 83510.25 | C | 0.6467032 | Zero |
| 2008138329 | O | 124989.76 | 3316019.94 | C | 0.1010250 | Zero |
| 2008138368 | O | 63180.31 | 16819.69 | C | 0.7281607 | NonZero |
| 2008138496 | O | 204459.93 | 15540.07 | O | 0.9354681 | NonZero |
| 2008138605 | O | 3541.86 | 322.00 | C | 0.6396368 | NonZero |
Since I am only interested in predicting incremental payments for claims that were simulated to have a non-zero incremental payment, all claims that were closed at age 30 and were simulated to be closed at 42 will be given an incremental payment of zero.
Additionally, all claims that were simulated by the Zero Payment Model to have a Zero payment will be given an incremental payment of zero.
# separate zeros from non zeros
zero_claims <- filter(zm_pred, zero_sim == "Zero")
nzm_pred <- filter(zm_pred, zero_sim == "NonZero")

Now, for the final simulation, I simulate payments for all the claims that were simulated to have a non-zero incremental payment.
### Quasi Poisson Simulation
nzm_pred$tot_pd_incr_fit <- exp(predict(nzm_model, newdata = nzm_pred))
# use negative binomial to randomly disperse claims from predicted fit
nzm_pred$tot_pd_incr_sim <- sapply(nzm_pred$tot_pd_incr_fit,
function(x) {
rnbinom(n = 1, size = x ^ (1/5), prob = 1 / (1 + x ^ (4/5)))
})
closed_closed_data$tot_pd_incr_sim <- 0
zero_claims$tot_pd_incr_sim <- 0
closed_closed_data$sim_type <- "Close_Close"
zero_claims$sim_type <- "Zero"
nzm_pred$sim_type <- "Non_Zero"
cols <- c("sim_num", "claim_number", "status_act", "tot_pd_incr_sim", "sim_type")
sim_1 <- closed_closed_data[, cols]
sim_2 <- zero_claims[, cols]
sim_3 <- nzm_pred[, cols]
full_sim <- rbind(sim_1, sim_2, sim_3)
kable(
full_sim[sample(1:nrow(full_sim), 20), ],
row.names = FALSE,
col.names = c("Sim Num", "Claim Num", "Sim Status", "Sim Payment", "Sim Type"))

| Sim Num | Claim Num | Sim Status | Sim Payment | Sim Type |
|---|---|---|---|---|
| V1436 | 2008143184 | C | 0 | Close_Close |
| V1907 | 2008141841 | C | 0 | Close_Close |
| V1753 | 2009155522 | O | 282069 | Non_Zero |
| V1109 | 2008146393 | C | 0 | Close_Close |
| V1587 | 2009152667 | C | 0 | Close_Close |
| V1530 | 2008138722 | C | 0 | Close_Close |
| V404 | 2009155055 | C | 0 | Close_Close |
| V226 | 2009156783 | O | 13473 | Non_Zero |
| V550 | 2008144828 | C | 0 | Close_Close |
| V306 | 2009149406 | O | 145973 | Non_Zero |
| V1921 | 2009156607 | C | 0 | Close_Close |
| V798 | 2008142796 | C | 0 | Close_Close |
| V1642 | 2008141299 | C | 0 | Close_Close |
| V298 | 2008145434 | O | 185065 | Non_Zero |
| V853 | 2008146293 | C | 0 | Close_Close |
| V1249 | 2009153881 | C | 0 | Close_Close |
| V1383 | 2008149110 | C | 0 | Close_Close |
| V1691 | 2008143511 | C | 0 | Close_Close |
| V683 | 2008142919 | C | 0 | Close_Close |
| V1186 | 2008140829 | C | 0 | Close_Close |
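The negative binomial parameters used in the payment simulation above (`size = x ^ (1/5)`, `prob = 1 / (1 + x ^ (4/5))`) are chosen so that each draw is centered on the fitted payment `x`. The sketch below checks this with the standard moment formulas for R's `(size, prob)` parameterization; the values of `x` are made up for illustration.

```r
# toy fitted incremental payments (not values from the report)
x <- c(100, 5000, 250000)

# parameters as used in the simulation step
size <- x^(1 / 5)
prob <- 1 / (1 + x^(4 / 5))

# negative binomial moments under R's (size, prob) parameterization
nb_mean <- size * (1 - prob) / prob
nb_var <- size * (1 - prob) / prob^2

# the mean equals the fitted value; the variance equals x * (1 + x^(4/5))
mean_matches <- isTRUE(all.equal(nb_mean, x))
var_matches <- isTRUE(all.equal(nb_var, x * (1 + x^(4 / 5))))
```

Under this parameterization the variance grows roughly like the mean raised to the power 9/5, so larger fitted payments are dispersed more widely in absolute terms than a variance proportional to the mean would imply.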
# find actual number of open claims and incremental payment dollars
pred_data_actuals <- mutate(cm_pred_data, status_act = ifelse(status_act == "C", 0, 1))
open_actual <- sum(pred_data_actuals$status_act)
payments_actual <- sum(pred_data_actuals$tot_pd_incr_act)

The blue dashed vertical line marks the actual number of open claims in the test data at 42 months of development. The white histogram shows the simulated distribution of open claims at 42 months as determined from the simulation based on the claim closure model.
full_sim_agg <- mutate(full_sim, open = ifelse(status_act == "C", 0, 1)) %>%
group_by(sim_num) %>%
summarise(n = n(),
open_claims = sum(open),
incremental_paid = sum(tot_pd_incr_sim))
ggplot(full_sim_agg, aes(x = open_claims)) +
geom_histogram(fill = "white", colour = "black") +
ggtitle("Histogram of Simulated Open Claim Counts") +
ylab("Number of Observations") +
xlab("Open Claim Counts") +
geom_vline(xintercept = open_actual, size = 1,
colour = "blue", linetype = "longdash")

The blue dashed vertical line marks the actual incremental payments in the test data between age 30 and 42. The white histogram shows the simulated distribution of incremental payments between age 30 and 42 months for all claims in the test data. The simulation is based on the incremental payment model.
ggplot(full_sim_agg, aes(x = incremental_paid)) +
geom_histogram(fill = "white", colour = "black") +
ggtitle("Histogram of Simulated Incremental Payments") +
ylab("Number of Observations") +
xlab("Incremental Payments") +
geom_vline(xintercept = payments_actual, size = 1,
colour = "blue", linetype = "longdash") +
scale_x_continuous(labels = dollar)

The blue dashed vertical line marks the actual incremental payments in the test data for the claim in the Select Claim Number input box between age 30 and 42.
# selectInput(
# "sel_claim",
# "Select Claim number",
# choices = unique(claims$claim_number)[1:50],
# selected = 2008146184
# )
input <- list()
input$sel_claim <- "2008146184"
indiv <- full_sim[full_sim$claim_number == input$sel_claim, ]
indiv_act <- claims[claims$claim_number == input$sel_claim, "tot_pd_incr_act"]
plot_data <- list(
indiv,
indiv_act
)
ggplot(plot_data[[1]], aes(x = tot_pd_incr_sim)) +
geom_histogram(fill = "white", colour = "black") +
ggtitle(paste0("Histogram of Simulated Incremental Payments for claim ", input$sel_claim)) +
ylab("Number of Observations") +
xlab("Incremental Payments") +
geom_vline(xintercept = plot_data[[2]], size = 1,
colour = "blue", linetype = "longdash") +
scale_x_continuous(labels = dollar)

out <- claims[claims$claim_number == input$sel_claim, 3:8]
names(out) <- c("Claim Num", "Status", "Case", "Incremental Payment", "Actual Status", "Actual Payment")
claim_stats <- out
kable(claim_stats)

| | Claim Num | Status | Case | Incremental Payment | Actual Status | Actual Payment |
|---|---|---|---|---|---|---|
| 416 | 2008146184 | O | 73000 | 479.7 | C | 61897.94 |
WIP
I used R, the free and open source statistical programming environment, for all the data analysis, model fitting, simulations, graphics, and data output.
The caret package was used extensively for the heavy lifting in predictive modeling.
Details of the R environment at the time this report was generated are available below:
sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18363)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] webshot_0.5.2 qs_0.25.1 bindrcpp_0.2.2 e1071_1.7-9
## [5] shiny_1.7.0 scales_1.1.1 DiagrammeR_1.0.6.1 knitr_1.36
## [9] lubridate_1.7.10 caret_6.0-89 lattice_0.20-44 ggplot2_3.3.5
## [13] tidyr_1.1.4 dplyr_1.0.7
##
## loaded via a namespace (and not attached):
## [1] sass_0.4.0 jsonlite_1.7.2 splines_4.1.1
## [4] foreach_1.5.1 prodlim_2019.11.13 RcppParallel_5.1.4
## [7] bslib_0.3.0 highr_0.9 stats4_4.1.1
## [10] yaml_2.2.1 globals_0.14.0 ipred_0.9-12
## [13] pillar_1.6.3 glue_1.4.2 pROC_1.18.0
## [16] digest_0.6.28 RColorBrewer_1.1-2 promises_1.2.0.1
## [19] stringfish_0.15.2 colorspace_2.0-2 recipes_0.1.17
## [22] htmltools_0.5.2 httpuv_1.6.3 Matrix_1.3-4
## [25] plyr_1.8.6 timeDate_3043.102 pkgconfig_2.0.3
## [28] listenv_0.8.0 purrr_0.3.4 xtable_1.8-4
## [31] later_1.3.0 gower_0.2.2 RApiSerialize_0.1.0
## [34] lava_1.6.10 tibble_3.1.4 proxy_0.4-26
## [37] mgcv_1.8-36 farver_2.1.0 generics_0.1.0
## [40] ellipsis_0.3.2 withr_2.4.2 nnet_7.3-16
## [43] survival_3.2-11 magrittr_2.0.1 crayon_1.4.1
## [46] mime_0.12 evaluate_0.14 future_1.22.1
## [49] fansi_0.5.0 parallelly_1.28.1 nlme_3.1-152
## [52] MASS_7.3-54 class_7.3-19 tools_4.1.1
## [55] data.table_1.14.2 lifecycle_1.0.1 stringr_1.4.0
## [58] munsell_0.5.0 compiler_4.1.1 jquerylib_0.1.4
## [61] rlang_0.4.11 grid_4.1.1 iterators_1.0.13
## [64] htmlwidgets_1.5.4 visNetwork_2.1.0 labeling_0.4.2
## [67] rmarkdown_2.11 gtable_0.3.0 ModelMetrics_1.2.2.2
## [70] codetools_0.2-18 reshape2_1.4.4 R6_2.5.1
## [73] fastmap_1.1.0 future.apply_1.8.1 utf8_1.2.2
## [76] bindr_0.1.1 stringi_1.7.4 parallel_4.1.1
## [79] Rcpp_1.0.7 vctrs_0.3.8 rpart_4.1-15
## [82] tidyselect_1.1.1 xfun_0.26
summary(cm_model)
##
## Call:
## NULL
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9227 -0.1231 -0.1231 -0.1222 3.1558
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.70922 0.15916 -23.304 < 0.0000000000000002 ***
## statusO 2.08476 0.09473 22.008 < 0.0000000000000002 ***
## tot_rx 0.12898 0.04326 2.982 0.00286 **
## tot_pd_incr -0.29871 0.11372 -2.627 0.00862 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3127.4 on 4089 degrees of freedom
## Residual deviance: 1628.8 on 4086 degrees of freedom
## AIC: 1636.8
##
## Number of Fisher Scoring iterations: 7
summary(zm_model)
##
## Call:
## NULL
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8772 -0.7730 -0.6295 -0.0603 5.5902
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.55421 0.11849 -13.116 < 0.0000000000000002 ***
## status_actO -0.33479 0.07890 -4.243 0.00002202 ***
## tot_rx -1.82081 0.40290 -4.519 0.00000621 ***
## tot_pd_incr 0.22879 0.08503 2.691 0.00713 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1098.7 on 1019 degrees of freedom
## Residual deviance: 1021.1 on 1016 degrees of freedom
## AIC: 1029.1
##
## Number of Fisher Scoring iterations: 7
summary(nzm_model)
##
## Family: quasipoisson
## Link function: log
##
## Formula:
## tot_pd_incr_act ~ status_act + s(tot_rx) + s(tot_pd_incr)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.7247 0.1018 105.34 < 0.0000000000000002 ***
## status_actO -0.3492 0.1145 -3.05 0.00237 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(tot_rx) 8.734 8.967 47.905 < 0.0000000000000002 ***
## s(tot_pd_incr) 7.327 8.112 3.323 0.000893 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.687 Deviance explained = 66.3%
## GCV = 96448 Scale est. = 1.6565e+05 n = 786