Predictive Model

Theory

lm() is R's function for fitting linear models. Linear regression is one of the simplest and most interpretable statistical models. A common way to judge the fit is R-squared, the proportion of the variance in the response that the model explains. For a model with an intercept it ranges from 0 to 1: 1 means the model explains all of the variability, and 0 means it explains none beyond the mean of the response.
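To make the definition concrete, the sketch below computes R-squared by hand as 1 - SS_res / SS_tot and checks it against the value that summary() reports for an lm() fit. It uses R's built-in cars dataset (stopping distance versus speed) purely as a stand-in. That data is not part of the insurance data.

```r
# Fit a simple linear model on the built-in `cars` dataset
fit <- lm(dist ~ speed, data = cars)

# R-squared = 1 - (residual sum of squares / total sum of squares)
ss_res <- sum(residuals(fit)^2)
ss_tot <- sum((cars$dist - mean(cars$dist))^2)
r2_manual <- 1 - ss_res / ss_tot

# Value reported by summary(); both numbers should agree
r2_summary <- summary(fit)$r.squared
print(c(manual = r2_manual, summary = r2_summary))
```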

Import the dataset

base_de_datos <- read.csv("C:\\Users\\Derek\\Documents\\seguros.csv")
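The following is a minimal, self-contained sketch of the import step. It writes a small hypothetical CSV to a temporary folder instead of reading the real seguros.csv. The file name seguros_ejemplo.csv and its three columns are illustrative only. file.path() builds the path portably. On Windows, forward slashes also work with read.csv(), for example "C:/Users/Derek/Documents/seguros.csv". The na.strings setting is an assumption about the file: use it only if empty cells should be read as missing.

```r
# Hypothetical stand-in for seguros.csv, written to a temporary folder
path <- file.path(tempdir(), "seguros_ejemplo.csv")
write.csv(data.frame(ClaimID = c(1, 2, 3),
                     TotalIncurredCost = c(100, 250, NA),
                     Gender = c("Male", "Female", "Not Available")),
          path, row.names = FALSE)

# Read it back, treating both "" and "NA" as missing values
claims <- read.csv(path, na.strings = c("", "NA"), stringsAsFactors = FALSE)
str(claims)
```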

Understand the dataset

resumen <- summary(base_de_datos)
resumen
##     ClaimID           TotalPaid       TotalReserves     TotalRecovery      
##  Min.   :  777632   Min.   :      0   Min.   :      0   Min.   :     0.00  
##  1st Qu.:  800748   1st Qu.:     83   1st Qu.:      0   1st Qu.:     0.00  
##  Median :  812128   Median :    271   Median :      0   Median :     0.00  
##  Mean   : 1864676   Mean   :  10404   Mean   :   3368   Mean   :    66.05  
##  3rd Qu.:  824726   3rd Qu.:   1122   3rd Qu.:      0   3rd Qu.:     0.00  
##  Max.   :62203364   Max.   :4527291   Max.   :1529053   Max.   :100000.00  
##                                                                            
##  IndemnityPaid      OtherPaid       TotalIncurredCost ClaimStatus       
##  Min.   :     0   Min.   :      0   Min.   : -10400   Length:31619      
##  1st Qu.:     0   1st Qu.:     80   1st Qu.:     80   Class :character  
##  Median :     0   Median :    265   Median :    266   Mode  :character  
##  Mean   :  4977   Mean   :   5427   Mean   :  13706                     
##  3rd Qu.:     0   3rd Qu.:   1023   3rd Qu.:   1098                     
##  Max.   :640732   Max.   :4129915   Max.   :4734750                     
##                                                                         
##  IncidentDate       IncidentDescription ReturnToWorkDate   ClaimantOpenedDate
##  Length:31619       Length:31619        Length:31619       Length:31619      
##  Class :character   Class :character    Class :character   Class :character  
##  Mode  :character   Mode  :character    Mode  :character   Mode  :character  
##                                                                              
##                                                                              
##                                                                              
##                                                                              
##  ClaimantClosedDate EmployerNotificationDate ReceivedDate      
##  Length:31619       Length:31619             Length:31619      
##  Class :character   Class :character         Class :character  
##  Mode  :character   Mode  :character         Mode  :character  
##                                                                
##                                                                
##                                                                
##                                                                
##     IsDenied       Transaction_Time Procesing_Time     ClaimantAge_at_DOI
##  Min.   :0.00000   Min.   :    0    Min.   :    0.00   Min.   :14.0      
##  1st Qu.:0.00000   1st Qu.:  211    1st Qu.:    4.00   1st Qu.:33.0      
##  Median :0.00000   Median :  780    Median :   10.00   Median :42.0      
##  Mean   :0.04463   Mean   : 1004    Mean   :   62.99   Mean   :41.6      
##  3rd Qu.:0.00000   3rd Qu.: 1440    3rd Qu.:   24.00   3rd Qu.:50.0      
##  Max.   :1.00000   Max.   :16428    Max.   :11558.00   Max.   :94.0      
##                    NA's   :614                                           
##     Gender          ClaimantType       InjuryNature       BodyPartRegion    
##  Length:31619       Length:31619       Length:31619       Length:31619      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##    BodyPart         AverageWeeklyWage1    ClaimID1        BillReviewALE    
##  Length:31619       Min.   : 100.0     Min.   :  777632   Min.   : -448.0  
##  Class :character   1st Qu.: 492.0     1st Qu.:  800748   1st Qu.:   16.0  
##  Mode  :character   Median : 492.0     Median :  812128   Median :   24.0  
##                     Mean   : 536.5     Mean   : 1864676   Mean   :  188.7  
##                     3rd Qu.: 492.0     3rd Qu.:  824726   3rd Qu.:   64.1  
##                     Max.   :8613.5     Max.   :62203364   Max.   :46055.3  
##                                                           NA's   :14912    
##     Hospital         PhysicianOutpatient       Rx          
##  Min.   : -12570.4   Min.   :   -549.5   Min.   :  -160.7  
##  1st Qu.:    210.5   1st Qu.:    105.8   1st Qu.:    22.9  
##  Median :    613.9   Median :    218.0   Median :    61.5  
##  Mean   :   5113.2   Mean   :   1813.2   Mean   :  1695.2  
##  3rd Qu.:   2349.1   3rd Qu.:    680.6   3rd Qu.:   189.0  
##  Max.   :2759604.0   Max.   :1219766.6   Max.   :631635.5  
##  NA's   :19655       NA's   :2329        NA's   :20730
plot(base_de_datos$AverageWeeklyWage1, base_de_datos$TotalIncurredCost, main="Influence of Average Weekly Wage on Total Incurred Cost", xlab="Average Weekly Wage", ylab="Total Incurred Cost")
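The summary above reports NA's for several columns, for example 614 in Transaction_Time and 19,655 in Hospital. By default lm() drops every row that is incomplete in the variables of its formula (its na.action option, normally na.omit). The 614 missing Transaction_Time values match the 614 observations that the regression output below reports as deleted. On the insurance data the check would be colSums(is.na(base_de_datos)). The sketch below shows the same pattern on R's built-in airquality data as a stand-in.

```r
# Count missing values per column (airquality has NAs in Ozone and Solar.R)
na_counts <- colSums(is.na(airquality))
print(na_counts)

# lm() silently drops rows that are incomplete in the formula's variables
fit <- lm(Ozone ~ Solar.R + Wind, data = airquality)
n_complete <- sum(complete.cases(airquality[, c("Ozone", "Solar.R", "Wind")]))
print(c(rows_used = nobs(fit), complete_rows = n_complete))
```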

Fit a regression (linear model)

regresion <- lm(TotalIncurredCost ~ AverageWeeklyWage1 + ClaimantAge_at_DOI + Transaction_Time + Gender + IsDenied, data=base_de_datos)
summary(regresion)
## 
## Call:
## lm(formula = TotalIncurredCost ~ AverageWeeklyWage1 + ClaimantAge_at_DOI + 
##     Transaction_Time + Gender + IsDenied, data = base_de_datos)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
##  -90875   -6560   -3294    -351 1380909 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -1.117e+04  7.254e+02 -15.403  < 2e-16 ***
## AverageWeeklyWage1   1.080e+01  7.140e-01  15.127  < 2e-16 ***
## ClaimantAge_at_DOI   1.555e+02  1.386e+01  11.217  < 2e-16 ***
## Transaction_Time     4.505e+00  1.617e-01  27.868  < 2e-16 ***
## GenderMale          -2.335e+01  3.128e+02  -0.075  0.94049    
## GenderNot Available  1.639e+04  3.053e+03   5.370 7.95e-08 ***
## IsDenied            -2.347e+03  7.538e+02  -3.114  0.00185 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 27390 on 30998 degrees of freedom
##   (614 observations deleted due to missingness)
## Multiple R-squared:  0.03434,    Adjusted R-squared:  0.03415 
## F-statistic: 183.7 on 6 and 30998 DF,  p-value: < 2.2e-16
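Gender is a character column, so lm() converts it to a factor and encodes it with treatment contrasts. One level serves as the baseline, and every other level gets its own 0/1 dummy column. That is why the output shows GenderMale and GenderNot Available but no coefficient for the baseline level. The baseline is presumably Female, which sorts first alphabetically, but the output does not name it. Each dummy coefficient estimates the difference from the baseline with the other predictors held fixed:

- Male: about -23, with p = 0.94, so there is no evidence of a difference.
- Not Available: about +16,390, with p < 0.001.

The toy data frame below is hypothetical and only illustrates the encoding.

```r
# Hypothetical toy factor with the same level names as the Gender output
toy <- data.frame(Gender = factor(c("Female", "Male", "Not Available", "Male")))

# First level (alphabetical by default) is the baseline; the others get dummies
print(levels(toy$Gender))
X <- model.matrix(~ Gender, data = toy)
print(X)
```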

Evaluate and, if necessary, adjust the regression

regresion <- lm(TotalIncurredCost ~ AverageWeeklyWage1 + ClaimantAge_at_DOI + Transaction_Time + IsDenied, data=base_de_datos)
summary(regresion)
## 
## Call:
## lm(formula = TotalIncurredCost ~ AverageWeeklyWage1 + ClaimantAge_at_DOI + 
##     Transaction_Time + IsDenied, data = base_de_datos)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
##  -90886   -6578   -3312    -366 1380848 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        -1.116e+04  7.085e+02 -15.745  < 2e-16 ***
## AverageWeeklyWage1  1.080e+01  7.141e-01  15.128  < 2e-16 ***
## ClaimantAge_at_DOI  1.552e+02  1.385e+01  11.201  < 2e-16 ***
## Transaction_Time    4.531e+00  1.614e-01  28.074  < 2e-16 ***
## IsDenied           -2.367e+03  7.538e+02  -3.141  0.00169 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 27400 on 31000 degrees of freedom
##   (614 observations deleted due to missingness)
## Multiple R-squared:  0.03343,    Adjusted R-squared:  0.03331 
## F-statistic: 268.1 on 4 and 31000 DF,  p-value: < 2.2e-16
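The second model drops Gender because GenderMale was not significant. However, GenderNot Available was significant (p = 7.95e-08), and the adjusted R-squared fell slightly, from 0.03415 to 0.03331. Either way, both models explain only roughly 3.3% to 3.4% of the variance in TotalIncurredCost.

A cleaner way to decide whether to drop a whole factor is to test all of its dummies jointly, using an F-test on the two nested models with anova(). On the insurance data, that would require keeping both fits under separate names instead of overwriting regresion. Hypothetical names would be regresion_completa and regresion_reducida. Both fits must use the same rows, which holds here because both drop the same 614 incomplete observations. The sketch below shows the pattern on the built-in mtcars data as a stand-in.

```r
# Stand-in data: does transmission type add anything beyond weight and horsepower?
cars_df <- transform(mtcars, am_f = factor(am, labels = c("automatic", "manual")))
reduced <- lm(mpg ~ wt + hp, data = cars_df)
full    <- lm(mpg ~ wt + hp + am_f, data = cars_df)

# Joint F-test for every coefficient of the dropped term
comparison <- anova(reduced, full)
print(comparison)

# Complementary criteria: adjusted R-squared (higher is better), AIC (lower is better)
print(c(adj_r2_reduced = summary(reduced)$adj.r.squared,
        adj_r2_full    = summary(full)$adj.r.squared))
print(AIC(reduced, full))
```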

Use the model for prediction

datos_nuevos <- data.frame(AverageWeeklyWage1=500, ClaimantAge_at_DOI=40, Transaction_Time=1000, IsDenied=0)
predict(regresion, datos_nuevos)
##       1 
## 4984.57
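predict() applies the fitted coefficients to the new values. With the rounded coefficients printed above, the calculation is -11,160 + 10.80 × 500 + 155.2 × 40 + 4.531 × 1000 ≈ 4,979, which matches 4,984.57 up to rounding. Because the residual standard error is about 27,400, a single point prediction says little on its own. On the insurance data, calling predict(regresion, datos_nuevos, interval = "prediction") would make that uncertainty explicit, and the resulting interval would be very wide relative to the estimate. The sketch below uses the built-in mtcars data as a stand-in. It also checks that the point prediction equals the intercept plus the sum of each coefficient times its value.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
new_data <- data.frame(wt = 3, hp = 150)

# Point prediction plus a 95% prediction interval for one new case
pred <- predict(fit, new_data, interval = "prediction", level = 0.95)
print(pred)

# Same point prediction computed by hand: intercept + sum(coefficient * value)
manual <- sum(coef(fit) * c(1, new_data$wt, new_data$hp))
print(manual)
```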
