Build a regression model of the variable Valor Financiado as a function of all the other variables, exclusively for Estrato Cinco (stratum five).
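A minimal sketch of how such a model could be set up with the `randomForest` package. The data frame `datos`, its synthetic values, and the column names `ESTRATO` and `VALOR_FINANCIADO` are assumptions made for illustration. The 70/30 split and `ntree = 200` are likewise arbitrary choices, not the settings used in this report.

```r
library(randomForest)

set.seed(42)
n <- 500

# Synthetic stand-in for the real data; every column name here is an assumption
datos <- data.frame(
  ESTRATO     = sample(1:6, n, replace = TRUE),
  VALOR_VENTA = runif(n, 1e5, 2e6),
  CUOTAS      = sample(c(6, 12, 24, 36), n, replace = TRUE),
  MES         = factor(sample(1:12, n, replace = TRUE)),
  LINEA       = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
datos$VALOR_FINANCIADO <- 0.7 * datos$VALOR_VENTA +
  5000 * datos$CUOTAS + rnorm(n, sd = 5e4)

# Keep only stratum five and drop the now-constant ESTRATO column
estrato5 <- subset(datos, ESTRATO == 5, select = -ESTRATO)

# Hold out 30% of the rows for testing
idx_train <- sample(nrow(estrato5), size = floor(0.7 * nrow(estrato5)))
train <- estrato5[idx_train, ]
test  <- estrato5[-idx_train, ]

# Regress the financed value on all remaining columns
modelo <- randomForest(VALOR_FINANCIADO ~ ., data = train,
                       ntree = 200, importance = TRUE)
print(modelo)  # for regression forests this reports "% Var explained"
```

Dropping `ESTRATO` after filtering avoids passing a constant predictor to the forest.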
Packages loaded for the analysis: lattice, ggplot2, bit, bit64, data.table, plyr, dplyr, iterators, snow, randomForest (4.6-12), parallel.
## [1] 206146.5
##       variables IncNodePurity
## 1:  VALOR_VENTA  1.478679e+14
## 2:        LINEA  2.272819e+13
## 3:          MES  1.881624e+13
## 4:       CUOTAS  1.735168e+13
## 5:         freq  1.515928e+13
## 6:  reincidente  1.154540e+13
## 7:      weekday  1.116312e+13
## 8:     SEGMENTO  8.331836e+12
## 9: DEPARTAMENTO  2.969770e+12
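One way to read the importance table is to normalise `IncNodePurity` so that each variable's share of the total is visible. The sketch below re-enters the values printed above by hand. The columns `share` and `cum_share` are names introduced here. This normalisation is only one possible reading and need not match variance-based figures computed another way.

```r
# Importance values copied from the table above
imp <- data.frame(
  variables = c("VALOR_VENTA", "LINEA", "MES", "CUOTAS", "freq",
                "reincidente", "weekday", "SEGMENTO", "DEPARTAMENTO"),
  IncNodePurity = c(1.478679e+14, 2.272819e+13, 1.881624e+13,
                    1.735168e+13, 1.515928e+13, 1.154540e+13,
                    1.116312e+13, 8.331836e+12, 2.969770e+12)
)

# Share of total node-purity increase, and running total in table order
imp$share     <- imp$IncNodePurity / sum(imp$IncNodePurity)
imp$cum_share <- cumsum(imp$share)

print(transform(imp,
                share     = round(share, 3),
                cum_share = round(cum_share, 3)))
```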
We have a Random Forest model with a fit of 68%. We found that four variables contribute 95% of the variance in the predictive model, with VALOR_VENTA the most relevant. We now apply the model to the test data.
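Evaluating predictions on held-out data can be sketched with a small helper. The function `evaluar_prediccion` and the toy vectors are hypothetical. With a fitted forest, `predicho` would typically come from `predict(modelo, newdata = test)`, where `modelo` and `test` are assumed names.

```r
# Hypothetical helper: summary metrics for predicted vs. observed values
evaluar_prediccion <- function(actual, predicho) {
  stopifnot(length(actual) == length(predicho))
  errores <- actual - predicho
  c(
    correlacion = cor(actual, predicho),           # Pearson correlation
    rmse        = sqrt(mean(errores^2)),           # root mean squared error
    r2          = 1 - sum(errores^2) /
                      sum((actual - mean(actual))^2)
  )
}

# Toy values standing in for test-set observations and predictions
actual   <- c(100, 150, 200, 250, 300)
predicho <- c(110, 140, 210, 240, 310)

metricas <- evaluar_prediccion(actual, predicho)
print(round(metricas, 3))
```

Reporting RMSE next to the correlation helps because a high correlation alone does not rule out systematic over- or under-prediction.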
The correlation on the test data is 80%, which is good for a first approximation. We recommend rebuilding the model with more customer demographic data and carrying out an in-depth analysis of heteroscedasticity in the data.
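As a starting point for the heteroscedasticity analysis, a Breusch-Pagan-style check regresses the squared residuals on the fitted values and compares n·R² against a chi-squared distribution. The sketch below is a simplification that uses the fitted values as the only auxiliary regressor. It runs on synthetic data with a linear model as a stand-in, and all object names are assumptions. The same residual check could be applied to the residuals of any fitted model.

```r
set.seed(1)
n <- 400
x <- runif(n, 1, 10)

# Synthetic response whose noise grows with x (heteroscedastic by construction)
y <- 2 * x + rnorm(n, sd = 0.5 * x)

ajuste <- lm(y ~ x)

# Auxiliary regression: squared residuals on fitted values
res2 <- residuals(ajuste)^2
aux  <- lm(res2 ~ fitted(ajuste))

# Studentized (Koenker-type) statistic: n * R^2, one auxiliary regressor
estadistico <- n * summary(aux)$r.squared
p_valor     <- pchisq(estadistico, df = 1, lower.tail = FALSE)

cat("statistic:", round(estadistico, 2), " p-value:", signif(p_valor, 3), "\n")
```

A small p-value suggests the residual variance changes with the level of the prediction. A plot of residuals against fitted values is a useful visual companion to the test.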