Main Objective

Build a regression model of the variable Valor Financiado (financed amount) as a function of all the other variables, restricted to Estrato Cinco (socioeconomic stratum five).
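
The setup implied by this objective can be sketched as follows. This is a minimal illustration, not the code behind the report: the data frame `ventas` is simulated, the column names `ESTRATO` and `VALOR_FINANCIADO` are assumed spellings, and the 70/30 split is an assumed proportion.

```r
# Simulated stand-in for the real data (hypothetical values).
# ESTRATO and VALOR_FINANCIADO are assumed column names.
set.seed(42)
n <- 1000
ventas <- data.frame(
  ESTRATO     = sample(1:6, n, replace = TRUE),
  VALOR_VENTA = runif(n, 1e5, 2e6),
  CUOTAS      = sample(c(6, 12, 24, 36), n, replace = TRUE)
)
ventas$VALOR_FINANCIADO <- 0.8 * ventas$VALOR_VENTA + rnorm(n, sd = 5e4)

# Keep only stratum five and drop the now-constant ESTRATO column.
estrato5 <- subset(ventas, ESTRATO == 5, select = -ESTRATO)

# Train/test split; the 70/30 proportion is an assumption.
idx_train <- sample(nrow(estrato5), size = floor(0.7 * nrow(estrato5)))
train <- estrato5[idx_train, ]
test  <- estrato5[-idx_train, ]

# The dot means "all remaining columns as predictors".
modelo_formula <- VALOR_FINANCIADO ~ .
```

Dropping `ESTRATO` after filtering avoids passing a constant column to the model, where it would carry no information.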

Data Cleaning

## [1] 206146.5
##       variables IncNodePurity
## 1:  VALOR_VENTA  1.478679e+14
## 2:        LINEA  2.272819e+13
## 3:          MES  1.881624e+13
## 4:       CUOTAS  1.735168e+13
## 5:         freq  1.515928e+13
## 6:  reincidente  1.154540e+13
## 7:      weekday  1.116312e+13
## 8:     SEGMENTO  8.331836e+12
## 9: DEPARTAMENTO  2.969770e+12
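
To read the importance table in relative terms, each IncNodePurity value can be divided by the column total. The sketch below retypes the printed values by hand. The names `imp`, `share` and `cum_share` are illustrative and do not come from the report.

```r
# Importance values copied from the printed table above.
imp <- data.frame(
  variables = c("VALOR_VENTA", "LINEA", "MES", "CUOTAS", "freq",
                "reincidente", "weekday", "SEGMENTO", "DEPARTAMENTO"),
  IncNodePurity = c(1.478679e+14, 2.272819e+13, 1.881624e+13, 1.735168e+13,
                    1.515928e+13, 1.154540e+13, 1.116312e+13, 8.331836e+12,
                    2.969770e+12)
)

# Sort from most to least important, then compute shares of the total.
imp <- imp[order(-imp$IncNodePurity), ]
imp$share     <- imp$IncNodePurity / sum(imp$IncNodePurity)
imp$cum_share <- cumsum(imp$share)

print(transform(imp, share = round(share, 3), cum_share = round(cum_share, 3)))
```

With these values, VALOR_VENTA alone holds roughly 58% of the summed IncNodePurity and the top four variables together roughly 81%. A share of node purity is not the same as a share of variance explained, so these figures need not match other summaries of the model.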

We have a Random Forest model with a fit of 68%. We found that four variables account for 95% of the variance in the predictive model, with VALOR_VENTA being the most relevant. Next, we apply the model to the test data.
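
A fit-and-predict workflow of this kind might look like the sketch below, which uses the `randomForest` package loaded by the report. The data are simulated. `VALOR_FINANCIADO` is an assumed column name, and `ntree = 200` and the 70/30 split are assumed settings. Reading the reported fit as the out-of-bag pseudo R-squared (`rf$rsq`) is also an assumption.

```r
library(randomForest)

# Simulated stand-in data (hypothetical values, not the report's data).
set.seed(123)
n <- 400
datos <- data.frame(
  VALOR_VENTA = runif(n, 1e5, 1e6),
  CUOTAS      = sample(c(6, 12, 24, 36), n, replace = TRUE)
)
datos$VALOR_FINANCIADO <- 0.8 * datos$VALOR_VENTA + 1000 * datos$CUOTAS +
  rnorm(n, sd = 5e4)

# Train/test split; the 70/30 proportion is an assumption.
idx_train <- sample(n, size = round(0.7 * n))
train <- datos[idx_train, ]
test  <- datos[-idx_train, ]

# Regression forest on all predictors; ntree = 200 is an assumed setting.
rf <- randomForest(VALOR_FINANCIADO ~ ., data = train, ntree = 200)

# Out-of-bag pseudo R-squared after the last tree.
oob_rsq <- tail(rf$rsq, 1)

# Apply the model to the held-out rows and summarise the errors.
pred        <- predict(rf, newdata = test)
rmse        <- sqrt(mean((test$VALOR_FINANCIADO - pred)^2))
correlacion <- cor(pred, test$VALOR_FINANCIADO)
```

Reporting the RMSE next to the correlation is useful because correlation only measures how well predictions track the observed values, not how far off they are in currency units.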

The correlation on the test data is 80%, which is good for a first approximation. We recommend rebuilding the model with more customer demographic data and carrying out an in-depth analysis of heteroscedasticity in the data.
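
One informal starting point for the heteroscedasticity analysis is a Breusch-Pagan-style auxiliary regression: regress the squared residuals on the predictions and compare n times the resulting R-squared against a chi-squared distribution. The sketch below uses base R only and simulated values. The names `observado` and `predicho` are hypothetical stand-ins for the test-set observations and model predictions. The test is designed for linear-model residuals, so for a Random Forest it is a rough diagnostic rather than a formal result.

```r
# Simulated example in which the noise grows with x (hypothetical data).
set.seed(7)
n <- 500
x <- runif(n, 1, 10)
observado <- 2 * x + rnorm(n, sd = x)  # error spread increases with x
predicho  <- 2 * x                     # stand-in for model predictions

residuos <- observado - predicho

# Auxiliary regression: do squared residuals depend on the predictions?
aux <- lm(I(residuos^2) ~ predicho)

# Studentized (Koenker) form of the statistic: n * R^2, with 1 df here.
bp_stat <- n * summary(aux)$r.squared
bp_p    <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

cat("statistic:", round(bp_stat, 2), " p-value:", signif(bp_p, 3), "\n")
```

A small p-value suggests the residual variance changes with the predicted value. Plotting `residuos` against `predicho` is a natural companion check.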