lm() is the R function for fitting linear models. Linear regression is one of the simplest statistical models and one of the easiest to interpret. A common way to judge the fit is R-squared, which indicates how close the data are to the fitted regression line: it is the proportion of the variability in the response that the model explains, and it ranges from 0 to 1, where 1 means the model explains all of the variability.
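As a minimal sketch of what R-squared measures, the block below recomputes it by hand and compares it with the value reported by summary(). It uses R's built-in mtcars data rather than the insurance file, and the names `ajuste_demo` and `r2_manual` are illustrative assumptions.

```r
# Illustrative fit on R's built-in mtcars data (not the insurance data).
ajuste_demo <- lm(mpg ~ wt + hp, data = mtcars)

# R-squared = 1 - (residual sum of squares / total sum of squares).
ss_res <- sum(residuals(ajuste_demo)^2)
ss_tot <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
r2_manual <- 1 - ss_res / ss_tot

# Both values should agree up to floating-point error.
r2_manual
summary(ajuste_demo)$r.squared
```

The adjusted R-squared, also printed by summary(), penalizes the number of predictors, so it is the safer figure when comparing models of different sizes.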
# file.choose()
base_de_datos <- read.csv("/Users/yessicaacosta/Downloads/seguros (2).csv")
resumen <- summary(base_de_datos)
resumen
## ClaimID TotalPaid TotalReserves TotalRecovery
## Min. : 777632 Min. : 0 Min. : 0 Min. : 0.00
## 1st Qu.: 800748 1st Qu.: 83 1st Qu.: 0 1st Qu.: 0.00
## Median : 812128 Median : 271 Median : 0 Median : 0.00
## Mean : 1864676 Mean : 10404 Mean : 3368 Mean : 66.05
## 3rd Qu.: 824726 3rd Qu.: 1122 3rd Qu.: 0 3rd Qu.: 0.00
## Max. :62203364 Max. :4527291 Max. :1529053 Max. :100000.00
##
## IndemnityPaid OtherPaid TotalIncurredCost ClaimStatus
## Min. : 0 Min. : 0 Min. : -10400 Length:31619
## 1st Qu.: 0 1st Qu.: 80 1st Qu.: 80 Class :character
## Median : 0 Median : 265 Median : 266 Mode :character
## Mean : 4977 Mean : 5427 Mean : 13706
## 3rd Qu.: 0 3rd Qu.: 1023 3rd Qu.: 1098
## Max. :640732 Max. :4129915 Max. :4734750
##
## IncidentDate IncidentDescription ReturnToWorkDate ClaimantOpenedDate
## Length:31619 Length:31619 Length:31619 Length:31619
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## ClaimantClosedDate EmployerNotificationDate ReceivedDate
## Length:31619 Length:31619 Length:31619
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## IsDenied Transaction_Time Procesing_Time ClaimantAge_at_DOI
## Min. :0.00000 Min. : 0 Min. : 0.00 Min. :14.0
## 1st Qu.:0.00000 1st Qu.: 211 1st Qu.: 4.00 1st Qu.:33.0
## Median :0.00000 Median : 780 Median : 10.00 Median :42.0
## Mean :0.04463 Mean : 1004 Mean : 62.99 Mean :41.6
## 3rd Qu.:0.00000 3rd Qu.: 1440 3rd Qu.: 24.00 3rd Qu.:50.0
## Max. :1.00000 Max. :16428 Max. :11558.00 Max. :94.0
## NA's :614
## Gender ClaimantType InjuryNature BodyPartRegion
## Length:31619 Length:31619 Length:31619 Length:31619
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## BodyPart AverageWeeklyWage1 ClaimID1 BillReviewALE
## Length:31619 Min. : 100.0 Min. : 777632 Min. : -448.0
## Class :character 1st Qu.: 492.0 1st Qu.: 800748 1st Qu.: 16.0
## Mode :character Median : 492.0 Median : 812128 Median : 24.0
## Mean : 536.5 Mean : 1864676 Mean : 188.7
## 3rd Qu.: 492.0 3rd Qu.: 824726 3rd Qu.: 64.1
## Max. :8613.5 Max. :62203364 Max. :46055.3
## NA's :14912
## Hospital PhysicianOutpatient Rx
## Min. : -12570.4 Min. : -549.5 Min. : -160.7
## 1st Qu.: 210.5 1st Qu.: 105.8 1st Qu.: 22.9
## Median : 613.9 Median : 218.0 Median : 61.5
## Mean : 5113.2 Mean : 1813.2 Mean : 1695.2
## 3rd Qu.: 2349.1 3rd Qu.: 680.6 3rd Qu.: 189.0
## Max. :2759604.0 Max. :1219766.6 Max. :631635.5
## NA's :19655 NA's :2329 NA's :20730
columns_tt <- c("ClaimStatus", "Gender", "ClaimantType", "InjuryNature", "BodyPartRegion", "BodyPart")
base_de_datos[columns_tt] <- lapply(base_de_datos[columns_tt], as.factor)
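Converting these columns to factors matters because lm() encodes each factor as a set of indicator variables measured against a baseline level. The sketch below shows that behavior on a small made-up data frame; `toy`, its values, and `ajuste_toy` are hypothetical and are not taken from the insurance file.

```r
# Hypothetical toy data, used only to show how factors are encoded.
toy <- data.frame(
  cost = c(100, 120, 300, 310, 50, 60),
  type = c("Indemnity", "Indemnity", "Medical Only",
           "Medical Only", "Report Only", "Report Only")
)
toy$type <- as.factor(toy$type)

# Levels are sorted alphabetically; the first one is the baseline.
levels(toy$type)

# The baseline gets no coefficient of its own: it is absorbed into the
# intercept, and the other coefficients are differences from it.
ajuste_toy <- lm(cost ~ type, data = toy)
coef(ajuste_toy)

# relevel() selects a different baseline when that reads better.
toy$type <- relevel(toy$type, ref = "Medical Only")
levels(toy$type)
```

This is why a level such as "Indemnity" can be used in a prediction even though it never appears as a row in a coefficient table: it is the reference that the other levels are compared with.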
regresion <- lm(TotalIncurredCost ~ IsDenied + Transaction_Time + Procesing_Time + ClaimantAge_at_DOI + Gender + ClaimantType + InjuryNature + BodyPartRegion + AverageWeeklyWage1, data = base_de_datos)
summary(regresion)
##
## Call:
## lm(formula = TotalIncurredCost ~ IsDenied + Transaction_Time +
## Procesing_Time + ClaimantAge_at_DOI + Gender + ClaimantType +
## InjuryNature + BodyPartRegion + AverageWeeklyWage1, data = base_de_datos)
##
## Residuals:
## Min 1Q Median 3Q Max
## -227348 -4663 -27 2823 1363878
##
## Coefficients:
## Estimate
## (Intercept) -4.535e+03
## IsDenied -7.661e+03
## Transaction_Time 3.810e+00
## Procesing_Time 2.697e+01
## ClaimantAge_at_DOI 8.266e+01
## GenderMale 3.766e+02
## GenderNot Available 1.979e+04
## ClaimantTypeMedical Only -2.188e+04
## ClaimantTypeReport Only -2.057e+04
## InjuryNatureAll Other Specific Injuries, Noc 1.400e+04
## InjuryNatureAmputation 2.081e+04
## InjuryNatureAngina Pectoris -5.037e+02
## InjuryNatureAsbestosis 1.841e+04
## InjuryNatureAsphyxiation 1.222e+04
## InjuryNatureBlack Lung 1.530e+04
## InjuryNatureBurn 2.046e+04
## InjuryNatureCancer 4.111e+04
## InjuryNatureCarpal Tunnel Syndrome 1.218e+04
## InjuryNatureConcussion 2.409e+04
## InjuryNatureContagious Disease 1.229e+04
## InjuryNatureContusion 1.654e+04
## InjuryNatureCrushing 2.028e+04
## InjuryNatureDermatitis 1.675e+04
## InjuryNatureDislocation 1.725e+04
## InjuryNatureDust Disease, NOC 1.739e+04
## InjuryNatureElectric Shock 1.353e+04
## InjuryNatureForeign Body 1.512e+04
## InjuryNatureFracture 1.836e+04
## InjuryNatureFreezing 1.203e+04
## InjuryNatureHearing Loss Or Impairment 6.345e+03
## InjuryNatureHeat Prostration 1.386e+04
## InjuryNatureHernia 8.514e+03
## InjuryNatureInfection 1.352e+04
## InjuryNatureInflammation 1.798e+04
## InjuryNatureLaceration 1.639e+04
## InjuryNatureLoss of Hearing 1.063e+04
## InjuryNatureMental Disorder 2.029e+04
## InjuryNatureMental Stress 1.072e+04
## InjuryNatureMultiple Injuries Including Both Physical and Psychological 2.613e+04
## InjuryNatureMultiple Physical Injuries Only 1.815e+04
## InjuryNatureMyocardial Infarction 2.289e+04
## InjuryNatureNo Physical Injury 1.487e+04
## InjuryNatureNon-Standard Code 1.487e+04
## InjuryNatureNot Available 2.001e+04
## InjuryNaturePoisoning?Chemical (Other Than Metals) 1.296e+04
## InjuryNaturePoisoning?General (NOT OD or Cumulative Injury) 1.624e+04
## InjuryNaturePuncture 1.671e+04
## InjuryNatureRadiation 1.767e+04
## InjuryNatureRespiratory Disorders 1.557e+04
## InjuryNatureRupture 2.184e+04
## InjuryNatureSeverance 4.474e+04
## InjuryNatureSprain 1.738e+04
## InjuryNatureStrain 1.685e+04
## InjuryNatureSyncope 1.974e+04
## InjuryNatureVascular -4.479e+03
## InjuryNatureVDT-Related Disease -5.763e+03
## InjuryNatureVision Loss 1.928e+04
## BodyPartRegionLower Extremities -1.660e+02
## BodyPartRegionMultiple Body Parts 7.151e+02
## BodyPartRegionNeck 3.031e+02
## BodyPartRegionNon-Standard Code -1.381e+04
## BodyPartRegionTrunk 1.010e+03
## BodyPartRegionUpper Extremities -1.018e+03
## AverageWeeklyWage1 4.815e+00
## Std. Error
## (Intercept) 2.522e+04
## IsDenied 7.203e+02
## Transaction_Time 1.621e-01
## Procesing_Time 7.808e-01
## ClaimantAge_at_DOI 1.285e+01
## GenderMale 2.933e+02
## GenderNot Available 2.820e+03
## ClaimantTypeMedical Only 3.746e+02
## ClaimantTypeReport Only 8.238e+02
## InjuryNatureAll Other Specific Injuries, Noc 2.522e+04
## InjuryNatureAmputation 2.695e+04
## InjuryNatureAngina Pectoris 3.088e+04
## InjuryNatureAsbestosis 2.576e+04
## InjuryNatureAsphyxiation 3.565e+04
## InjuryNatureBlack Lung 3.565e+04
## InjuryNatureBurn 2.525e+04
## InjuryNatureCancer 2.819e+04
## InjuryNatureCarpal Tunnel Syndrome 2.527e+04
## InjuryNatureConcussion 2.534e+04
## InjuryNatureContagious Disease 2.539e+04
## InjuryNatureContusion 2.521e+04
## InjuryNatureCrushing 2.530e+04
## InjuryNatureDermatitis 2.524e+04
## InjuryNatureDislocation 2.530e+04
## InjuryNatureDust Disease, NOC 2.911e+04
## InjuryNatureElectric Shock 2.556e+04
## InjuryNatureForeign Body 2.523e+04
## InjuryNatureFracture 2.523e+04
## InjuryNatureFreezing 2.911e+04
## InjuryNatureHearing Loss Or Impairment 2.587e+04
## InjuryNatureHeat Prostration 2.551e+04
## InjuryNatureHernia 2.541e+04
## InjuryNatureInfection 2.534e+04
## InjuryNatureInflammation 2.525e+04
## InjuryNatureLaceration 2.522e+04
## InjuryNatureLoss of Hearing 2.599e+04
## InjuryNatureMental Disorder 2.762e+04
## InjuryNatureMental Stress 2.550e+04
## InjuryNatureMultiple Injuries Including Both Physical and Psychological 2.604e+04
## InjuryNatureMultiple Physical Injuries Only 2.523e+04
## InjuryNatureMyocardial Infarction 2.617e+04
## InjuryNatureNo Physical Injury 2.525e+04
## InjuryNatureNon-Standard Code 2.542e+04
## InjuryNatureNot Available 3.566e+04
## InjuryNaturePoisoning?Chemical (Other Than Metals) 2.543e+04
## InjuryNaturePoisoning?General (NOT OD or Cumulative Injury) 2.599e+04
## InjuryNaturePuncture 2.522e+04
## InjuryNatureRadiation 2.610e+04
## InjuryNatureRespiratory Disorders 2.526e+04
## InjuryNatureRupture 2.557e+04
## InjuryNatureSeverance 2.624e+04
## InjuryNatureSprain 2.522e+04
## InjuryNatureStrain 2.521e+04
## InjuryNatureSyncope 2.547e+04
## InjuryNatureVascular 3.565e+04
## InjuryNatureVDT-Related Disease 3.565e+04
## InjuryNatureVision Loss 2.625e+04
## BodyPartRegionLower Extremities 6.133e+02
## BodyPartRegionMultiple Body Parts 6.570e+02
## BodyPartRegionNeck 9.783e+02
## BodyPartRegionNon-Standard Code 3.453e+03
## BodyPartRegionTrunk 6.624e+02
## BodyPartRegionUpper Extremities 5.780e+02
## AverageWeeklyWage1 6.662e-01
## t value
## (Intercept) -0.180
## IsDenied -10.637
## Transaction_Time 23.498
## Procesing_Time 34.543
## ClaimantAge_at_DOI 6.432
## GenderMale 1.284
## GenderNot Available 7.017
## ClaimantTypeMedical Only -58.402
## ClaimantTypeReport Only -24.973
## InjuryNatureAll Other Specific Injuries, Noc 0.555
## InjuryNatureAmputation 0.772
## InjuryNatureAngina Pectoris -0.016
## InjuryNatureAsbestosis 0.715
## InjuryNatureAsphyxiation 0.343
## InjuryNatureBlack Lung 0.429
## InjuryNatureBurn 0.810
## InjuryNatureCancer 1.458
## InjuryNatureCarpal Tunnel Syndrome 0.482
## InjuryNatureConcussion 0.950
## InjuryNatureContagious Disease 0.484
## InjuryNatureContusion 0.656
## InjuryNatureCrushing 0.802
## InjuryNatureDermatitis 0.663
## InjuryNatureDislocation 0.682
## InjuryNatureDust Disease, NOC 0.597
## InjuryNatureElectric Shock 0.529
## InjuryNatureForeign Body 0.599
## InjuryNatureFracture 0.728
## InjuryNatureFreezing 0.413
## InjuryNatureHearing Loss Or Impairment 0.245
## InjuryNatureHeat Prostration 0.543
## InjuryNatureHernia 0.335
## InjuryNatureInfection 0.533
## InjuryNatureInflammation 0.712
## InjuryNatureLaceration 0.650
## InjuryNatureLoss of Hearing 0.409
## InjuryNatureMental Disorder 0.735
## InjuryNatureMental Stress 0.421
## InjuryNatureMultiple Injuries Including Both Physical and Psychological 1.004
## InjuryNatureMultiple Physical Injuries Only 0.719
## InjuryNatureMyocardial Infarction 0.875
## InjuryNatureNo Physical Injury 0.589
## InjuryNatureNon-Standard Code 0.585
## InjuryNatureNot Available 0.561
## InjuryNaturePoisoning?Chemical (Other Than Metals) 0.510
## InjuryNaturePoisoning?General (NOT OD or Cumulative Injury) 0.625
## InjuryNaturePuncture 0.663
## InjuryNatureRadiation 0.677
## InjuryNatureRespiratory Disorders 0.617
## InjuryNatureRupture 0.854
## InjuryNatureSeverance 1.705
## InjuryNatureSprain 0.689
## InjuryNatureStrain 0.668
## InjuryNatureSyncope 0.775
## InjuryNatureVascular -0.126
## InjuryNatureVDT-Related Disease -0.162
## InjuryNatureVision Loss 0.735
## BodyPartRegionLower Extremities -0.271
## BodyPartRegionMultiple Body Parts 1.089
## BodyPartRegionNeck 0.310
## BodyPartRegionNon-Standard Code -4.000
## BodyPartRegionTrunk 1.525
## BodyPartRegionUpper Extremities -1.762
## AverageWeeklyWage1 7.227
## Pr(>|t|)
## (Intercept) 0.8573
## IsDenied < 2e-16
## Transaction_Time < 2e-16
## Procesing_Time < 2e-16
## ClaimantAge_at_DOI 1.28e-10
## GenderMale 0.1993
## GenderNot Available 2.31e-12
## ClaimantTypeMedical Only < 2e-16
## ClaimantTypeReport Only < 2e-16
## InjuryNatureAll Other Specific Injuries, Noc 0.5790
## InjuryNatureAmputation 0.4400
## InjuryNatureAngina Pectoris 0.9870
## InjuryNatureAsbestosis 0.4748
## InjuryNatureAsphyxiation 0.7318
## InjuryNatureBlack Lung 0.6678
## InjuryNatureBurn 0.4179
## InjuryNatureCancer 0.1448
## InjuryNatureCarpal Tunnel Syndrome 0.6298
## InjuryNatureConcussion 0.3419
## InjuryNatureContagious Disease 0.6285
## InjuryNatureContusion 0.5118
## InjuryNatureCrushing 0.4228
## InjuryNatureDermatitis 0.5070
## InjuryNatureDislocation 0.4954
## InjuryNatureDust Disease, NOC 0.5503
## InjuryNatureElectric Shock 0.5966
## InjuryNatureForeign Body 0.5490
## InjuryNatureFracture 0.4668
## InjuryNatureFreezing 0.6795
## InjuryNatureHearing Loss Or Impairment 0.8063
## InjuryNatureHeat Prostration 0.5869
## InjuryNatureHernia 0.7376
## InjuryNatureInfection 0.5937
## InjuryNatureInflammation 0.4765
## InjuryNatureLaceration 0.5158
## InjuryNatureLoss of Hearing 0.6827
## InjuryNatureMental Disorder 0.4626
## InjuryNatureMental Stress 0.6741
## InjuryNatureMultiple Injuries Including Both Physical and Psychological 0.3155
## InjuryNatureMultiple Physical Injuries Only 0.4719
## InjuryNatureMyocardial Infarction 0.3818
## InjuryNatureNo Physical Injury 0.5560
## InjuryNatureNon-Standard Code 0.5586
## InjuryNatureNot Available 0.5747
## InjuryNaturePoisoning?Chemical (Other Than Metals) 0.6103
## InjuryNaturePoisoning?General (NOT OD or Cumulative Injury) 0.5320
## InjuryNaturePuncture 0.5076
## InjuryNatureRadiation 0.4983
## InjuryNatureRespiratory Disorders 0.5376
## InjuryNatureRupture 0.3930
## InjuryNatureSeverance 0.0882
## InjuryNatureSprain 0.4907
## InjuryNatureStrain 0.5038
## InjuryNatureSyncope 0.4385
## InjuryNatureVascular 0.9000
## InjuryNatureVDT-Related Disease 0.8716
## InjuryNatureVision Loss 0.4626
## BodyPartRegionLower Extremities 0.7866
## BodyPartRegionMultiple Body Parts 0.2764
## BodyPartRegionNeck 0.7567
## BodyPartRegionNon-Standard Code 6.34e-05
## BodyPartRegionTrunk 0.1271
## BodyPartRegionUpper Extremities 0.0781
## AverageWeeklyWage1 5.04e-13
##
## (Intercept)
## IsDenied ***
## Transaction_Time ***
## Procesing_Time ***
## ClaimantAge_at_DOI ***
## GenderMale
## GenderNot Available ***
## ClaimantTypeMedical Only ***
## ClaimantTypeReport Only ***
## InjuryNatureAll Other Specific Injuries, Noc
## InjuryNatureAmputation
## InjuryNatureAngina Pectoris
## InjuryNatureAsbestosis
## InjuryNatureAsphyxiation
## InjuryNatureBlack Lung
## InjuryNatureBurn
## InjuryNatureCancer
## InjuryNatureCarpal Tunnel Syndrome
## InjuryNatureConcussion
## InjuryNatureContagious Disease
## InjuryNatureContusion
## InjuryNatureCrushing
## InjuryNatureDermatitis
## InjuryNatureDislocation
## InjuryNatureDust Disease, NOC
## InjuryNatureElectric Shock
## InjuryNatureForeign Body
## InjuryNatureFracture
## InjuryNatureFreezing
## InjuryNatureHearing Loss Or Impairment
## InjuryNatureHeat Prostration
## InjuryNatureHernia
## InjuryNatureInfection
## InjuryNatureInflammation
## InjuryNatureLaceration
## InjuryNatureLoss of Hearing
## InjuryNatureMental Disorder
## InjuryNatureMental Stress
## InjuryNatureMultiple Injuries Including Both Physical and Psychological
## InjuryNatureMultiple Physical Injuries Only
## InjuryNatureMyocardial Infarction
## InjuryNatureNo Physical Injury
## InjuryNatureNon-Standard Code
## InjuryNatureNot Available
## InjuryNaturePoisoning?Chemical (Other Than Metals)
## InjuryNaturePoisoning?General (NOT OD or Cumulative Injury)
## InjuryNaturePuncture
## InjuryNatureRadiation
## InjuryNatureRespiratory Disorders
## InjuryNatureRupture
## InjuryNatureSeverance .
## InjuryNatureSprain
## InjuryNatureStrain
## InjuryNatureSyncope
## InjuryNatureVascular
## InjuryNatureVDT-Related Disease
## InjuryNatureVision Loss
## BodyPartRegionLower Extremities
## BodyPartRegionMultiple Body Parts
## BodyPartRegionNeck
## BodyPartRegionNon-Standard Code ***
## BodyPartRegionTrunk
## BodyPartRegionUpper Extremities .
## AverageWeeklyWage1 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25210 on 30941 degrees of freedom
## (614 observations deleted due to missingness)
## Multiple R-squared: 0.1839, Adjusted R-squared: 0.1822
## F-statistic: 110.6 on 63 and 30941 DF, p-value: < 2.2e-16
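In the summary above, no individual InjuryNature level is significant at the 5% level. Each of those t-tests only compares one level with the baseline, so they do not say whether the factor as a whole contributes to the model. One way to check that before dropping a factor is a partial F-test on nested models with anova(). The sketch below uses R's built-in mtcars data as a stand-in, and the names `autos`, `completo` and `reducido` are illustrative assumptions.

```r
# Illustrative data: mtcars, with the number of cylinders treated as a factor.
autos <- transform(mtcars, cyl = factor(cyl))

completo <- lm(mpg ~ wt + hp + cyl, data = autos)  # model with the factor
reducido <- lm(mpg ~ wt + hp, data = autos)        # same model without it

# Partial F-test: tests all levels of cyl at once.
comparacion <- anova(reducido, completo)
comparacion
```

A small p-value in the second row would suggest the factor improves the fit even if no single level stands out. Both models must be fitted on the same rows for the comparison to be valid.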
regresion <- lm(TotalIncurredCost ~ IsDenied + Transaction_Time + Procesing_Time + ClaimantAge_at_DOI + Gender + ClaimantType + BodyPartRegion + AverageWeeklyWage1, data = base_de_datos)
summary(regresion)
##
## Call:
## lm(formula = TotalIncurredCost ~ IsDenied + Transaction_Time +
## Procesing_Time + ClaimantAge_at_DOI + Gender + ClaimantType +
## BodyPartRegion + AverageWeeklyWage1, data = base_de_datos)
##
## Residuals:
## Min 1Q Median 3Q Max
## -223788 -4421 27 2735 1367299
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.207e+04 8.993e+02 13.421 < 2e-16 ***
## IsDenied -8.124e+03 6.996e+02 -11.612 < 2e-16 ***
## Transaction_Time 3.664e+00 1.582e-01 23.159 < 2e-16 ***
## Procesing_Time 2.676e+01 7.757e-01 34.504 < 2e-16 ***
## ClaimantAge_at_DOI 8.259e+01 1.282e+01 6.443 1.19e-10 ***
## GenderMale 3.614e+02 2.904e+02 1.245 0.2133
## GenderNot Available 1.989e+04 2.819e+03 7.056 1.75e-12 ***
## ClaimantTypeMedical Only -2.210e+04 3.630e+02 -60.877 < 2e-16 ***
## ClaimantTypeReport Only -2.087e+04 8.157e+02 -25.585 < 2e-16 ***
## BodyPartRegionLower Extremities 4.122e+02 5.496e+02 0.750 0.4532
## BodyPartRegionMultiple Body Parts 7.791e+02 6.046e+02 1.289 0.1975
## BodyPartRegionNeck 7.423e+02 9.439e+02 0.786 0.4316
## BodyPartRegionNon-Standard Code -1.495e+04 1.224e+03 -12.212 < 2e-16 ***
## BodyPartRegionTrunk 1.367e+03 5.703e+02 2.397 0.0165 *
## BodyPartRegionUpper Extremities -6.385e+02 5.216e+02 -1.224 0.2210
## AverageWeeklyWage1 4.885e+00 6.645e-01 7.351 2.02e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25230 on 30989 degrees of freedom
## (614 observations deleted due to missingness)
## Multiple R-squared: 0.1812, Adjusted R-squared: 0.1808
## F-statistic: 457.3 on 15 and 30989 DF, p-value: < 2.2e-16
datos_nuevos <- data.frame(IsDenied = 1, ClaimantType = "Indemnity", Gender = "Female", Transaction_Time = 1500, Procesing_Time = 100, ClaimantAge_at_DOI = 54, AverageWeeklyWage1 = 450, BodyPartRegion = "Head")
predict(regresion, datos_nuevos)
## 1
## 18776.85
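The value above is a point estimate. predict() can also return a range around it through its `interval` argument, which is useful when the residual standard error is large. The sketch below shows the call on R's built-in mtcars data; `ajuste_autos`, `auto_nuevo` and the input values are hypothetical.

```r
# Illustrative fit on mtcars (not the insurance data).
ajuste_autos <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical new observation to predict.
auto_nuevo <- data.frame(wt = 3, hp = 120)

# Returns a matrix with the point estimate (fit) and the lower and upper
# bounds (lwr, upr) of a 95% prediction interval.
prediccion <- predict(ajuste_autos, auto_nuevo,
                      interval = "prediction", level = 0.95)
prediccion
```

The same arguments can be passed to a call such as predict(regresion, datos_nuevos) to see how wide the plausible range is for a single claim.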
The linear regression model explains about 18% of the variability in the total incurred cost of the insurance claims (R-squared = 0.1812), given the predictors included. The model could be improved by adding other variables that affect the business.