Model Validation and Error Analysis

The auditor package supports visual exploration, explanation, and debugging of predictive models.

rm(list = ls())
library(auditor)
library(randomForest)

## randomForest 4.7-1.1

## Type rfNews() to see new features/changes/bug fixes.

library(DALEX)

## Welcome to DALEX (version: 2.4.2).
## Find examples and detailed introduction at: http://ema.drwhy.ai/

## 
## Attaching package: 'DALEX'

## The following object is masked from 'package:auditor':
## 
##     model_performance

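# Note: despite the "glm" in the object name, this is a random forest.
# `survived` is numeric (0/1), so randomForest fits a regression forest,
# which is what triggers the warning below.
titanic_glm_model <- randomForest(survived ~ ., data = titanic_imputed)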

## Warning in randomForest.default(m, y, ...): The response has five or fewer
## unique values. Are you sure you want to do regression?

explainer_glm <- DALEX::explain(titanic_glm_model,
                         data = titanic_imputed,
                         y = titanic_imputed$survived)

## Preparation of a new explainer is initiated
##   -> model label       :  randomForest  (  default  )
##   -> data              :  2207  rows  8  cols 
##   -> target variable   :  2207  values 
##   -> predict function  :  yhat.randomForest  will be used (  default  )
##   -> predicted values  :  No value for predict function target column. (  default  )
##   -> model_info        :  package randomForest , ver. 4.7.1.1 , task regression (  default  ) 
##   -> predicted values  :  numerical, min =  0.01514764 , mean =  0.3211483 , max =  0.9939891  
##   -> residual function :  difference between y and yhat (  default  )
##   -> residuals         :  numerical, min =  -0.7658189 , mean =  0.001008435 , max =  0.901289  
##   A new explainer has been created!

Exploration of residuals

x <- model_residual(explainer_glm)
x

## Model label:  randomForest 
## Quantiles of Residuals:
##          0%         10%         20%         30%         40%         50% 
## -0.76581891 -0.25169370 -0.21023028 -0.16464552 -0.14276518 -0.10857934 
##         60%         70%         80%         90%        100% 
## -0.07373357  0.02490887  0.21206061  0.55671359  0.90128902
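
The table above is a decile summary of the residuals. As a minimal sketch of what is being summarised, assuming the default residual definition `y - yhat` reported in the explainer log, the same kind of table can be produced in base R. The vectors `y` and `y_hat` are made-up toy values, not the Titanic data.

```r
# Toy observed outcomes and predicted scores (hypothetical values).
y     <- c(0, 0, 1, 1, 0, 1, 0, 1, 0, 0)
y_hat <- c(0.10, 0.30, 0.80, 0.40, 0.20, 0.90, 0.60, 0.70, 0.05, 0.15)

# Residual as defined by the default residual function: y - yhat.
res <- y - y_hat

# Deciles of the residuals, from the minimum (0%) to the maximum (100%).
res_deciles <- quantile(res, probs = seq(0, 1, by = 0.1))
print(res_deciles)
```

A median well below zero combined with a long positive tail, as in the output above, would suggest that the model slightly over-predicts for most rows and badly under-predicts for a few.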

# autocorrelation of residual
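
No output survives for this step. As a hedged sketch of what a residual autocorrelation check examines, base R's `acf()` can be applied to residuals sorted by the fitted values. The data are simulated, and ordering by fitted value is an assumption made for illustration, not necessarily what auditor does.

```r
set.seed(1)
# Simulated data: a linear signal plus noise (hypothetical, for illustration).
z <- runif(200)
y <- 2 * z + rnorm(200, sd = 0.3)
fit <- lm(y ~ z)

# Order residuals by fitted value so that neighbours have similar predictions.
res_ordered <- residuals(fit)[order(fitted(fit))]

# Sample autocorrelation function; the lag-0 value is always 1.
res_acf <- acf(res_ordered, lag.max = 10, plot = FALSE)
print(round(res_acf$acf[1:4], 3))
```

Autocorrelations far from zero at small lags would indicate structure that the model has not captured.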

Evaluation of a classifier

x <- model_evaluation(explainer_glm)
x

## Model label:  randomForest 
## 
##  True Positive Rate for cutoff 0.5: 0 
## 
##  False Positive Rate for cutoff 0.5: 0
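
For reference, the true positive rate is the share of actual positives scored at or above the cutoff, and the false positive rate is the share of actual negatives scored at or above it. The sketch below computes both in base R on toy data. The helper `rates_at_cutoff()` is hypothetical, and the use of `>=` at the cutoff is an assumption.

```r
# Toy labels and predicted scores (hypothetical values).
y     <- c(1, 0, 1, 1, 0, 0, 1, 0)
score <- c(0.9, 0.4, 0.6, 0.3, 0.7, 0.2, 0.8, 0.1)

# Hypothetical helper: true and false positive rates at a given cutoff.
rates_at_cutoff <- function(y, score, cutoff = 0.5) {
  predicted_positive <- score >= cutoff
  c(tpr = sum(predicted_positive & y == 1) / sum(y == 1),
    fpr = sum(predicted_positive & y == 0) / sum(y == 0))
}

rates <- rates_at_cutoff(y, score, cutoff = 0.5)
print(rates)
```

Applying a function like this to the explainer's own predictions and labels is one way to cross-check the printed rates.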

plot(x)

plotD3_lift(x)

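Measuring model performance

# DALEX was attached after auditor, so this call resolves to
# DALEX::model_performance(); call auditor::model_performance() for auditor's version.
x <- model_performance(explainer_glm)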

x

## Measures for:  regression
## mse        : 0.1087034 
## rmse       : 0.3297021 
## r2         : 0.5022093 
## mad        : 0.169799
## 
## Residuals:
##          0%         10%         20%         30%         40%         50% 
## -0.76581891 -0.25169370 -0.21023028 -0.16464552 -0.14276518 -0.10857934 
##         60%         70%         80%         90%        100% 
## -0.07373357  0.02490887  0.21206061  0.55671359  0.90128902
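
The four regression measures can be reproduced by hand. The sketch below uses toy values and assumes that `mad` denotes the median absolute residual and that `r2` is one minus the ratio of the MSE to the variance of the observed values. These definitions are assumptions about the printed labels, not taken from the source.

```r
# Toy observed values and predictions (hypothetical values).
y     <- c(0, 1, 1, 0, 1, 0)
y_hat <- c(0.2, 0.7, 0.9, 0.4, 0.5, 0.1)

res <- y - y_hat

mse  <- mean(res^2)                      # mean squared error
rmse <- sqrt(mse)                        # root mean squared error
r2   <- 1 - mse / mean((y - mean(y))^2)  # assumed definition of R squared
mad  <- median(abs(res))                 # assumed: median absolute residual

print(c(mse = mse, rmse = rmse, r2 = r2, mad = mad))
```

In the printed output above, the reported rmse (0.3297021) is the square root of the reported mse (0.1087034), consistent with these definitions.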

Cook’s distance

This applies to regression models.

x <- model_cooksdistance(explainer_glm)
x
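
Cook's distance measures how much the fitted values change when a single observation is left out of the fit. As a minimal illustration of the concept, the sketch below uses base R's `cooks.distance()` on a linear model fitted to the built-in `mtcars` data. It is not the Titanic random forest, and the choice of predictors is arbitrary.

```r
# Linear model on a built-in dataset (illustration only).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Cook's distance for every observation in the training data.
cd <- cooks.distance(fit)

# The three most influential observations.
top3 <- head(sort(cd, decreasing = TRUE), 3)
print(top3)
```

Observations with a distance much larger than the rest are candidates for closer inspection.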

Verification of model fit (half-normal plot)

This applies to regression models.

model_halfnormal(explainer_glm)
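
A half-normal plot compares the sorted absolute residuals with the quantiles of a half-normal distribution. The base R sketch below shows the idea on simulated data. It draws only the points, with no simulated envelope, and the plotting positions from `ppoints()` are one common choice rather than a confirmed detail of auditor.

```r
set.seed(42)
# Simulated regression data (hypothetical, for illustration).
z <- rnorm(100)
y <- 1 + 2 * z + rnorm(100)
fit <- lm(y ~ z)

# Sorted absolute residuals against theoretical half-normal quantiles.
abs_res <- sort(abs(residuals(fit)))
n <- length(abs_res)
hn_quantiles <- qnorm(0.5 + ppoints(n) / 2)

plot(hn_quantiles, abs_res,
     xlab = "Half-normal quantiles", ylab = "Sorted |residuals|")
```

Points that fall roughly on a straight line suggest that the residuals are consistent with the assumed error distribution.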

Summary of visualizations

Residuals

plot_acf()
plotD3_acf()
plot_autocorrelation()
plot_correlation()
plot_pca()
plot_radar()
plot_prediction()
plot_rec()
plot_residual()
plot_residual_boxplot()
plot_residual_density()
plot_rroc()
plot_scalelocation()
plot_tsecdf()

Classifier

plot_lift()
plot_roc()

Influence of observations

plot_cooksdistance()

Verification of model fit

plot_halfnormal()

Liam

2022-11-25