In this work we try to find out whether speed-dating data can be used to predict if a date will be successful or will remain a simple meeting that leads to nothing more. The first step is to load the dataset, which I have uploaded to GitHub. This is the data we will use to attempt the prediction.

url= "https://raw.githubusercontent.com/Presssen/MLwork/master/Speed%20Dating%20Data.csv"
df=read.csv(url)
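The `str()` output further down shows text columns such as `mn_sat` and `income` with an empty-string level (`""`). This means blank cells are not read as missing. The sketch below assumes we want blanks treated as `NA`, which the `na.strings` argument of `read.csv` handles. It also strips the thousands separators so that values like `"69,487.00"` can become numbers. It runs on a small hypothetical inline CSV rather than on the real file:

```r
# Hypothetical inline CSV: row 2 has a blank "income" cell
csv_text <- "iid,income\n1,\"69,487.00\"\n2,\n3,\"65,929.00\""

# Default behaviour: the blank text cell is kept as an empty string
raw <- read.csv(text = csv_text)

# Treat both empty strings and the literal "NA" as missing values
clean <- read.csv(text = csv_text, na.strings = c("", "NA"))

# Numbers written with thousands separators arrive as text;
# removing the commas lets them be converted to numeric (NA stays NA)
clean$income <- as.numeric(gsub(",", "", clean$income))
print(clean)
```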

Once the file is loaded into R, let's inspect it and look at the variables we are dealing with.

str(df)

## 'data.frame':    8378 obs. of  195 variables:
##  $ iid     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ id      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ gender  : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ idg     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ condtn  : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ wave    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ round   : int  10 10 10 10 10 10 10 10 10 10 ...
##  $ position: int  7 7 7 7 7 7 7 7 7 7 ...
##  $ positin1: int  NA NA NA NA NA NA NA NA NA NA ...
##  $ order   : int  4 3 10 5 7 6 1 2 8 9 ...
##  $ partner : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ pid     : int  11 12 13 14 15 16 17 18 19 20 ...
##  $ match   : int  0 0 1 1 1 0 0 0 1 0 ...
##  $ int_corr: num  0.14 0.54 0.16 0.61 0.21 0.25 0.34 0.5 0.28 -0.36 ...
##  $ samerace: int  0 0 1 0 0 0 0 0 0 0 ...
##  $ age_o   : int  27 22 22 23 24 25 30 27 28 24 ...
##  $ race_o  : int  2 2 4 2 3 2 2 2 2 2 ...
##  $ pf_o_att: num  35 60 19 30 30 ...
##  $ pf_o_sin: num  20 0 18 5 10 ...
##  $ pf_o_int: num  20 0 19 15 20 ...
##  $ pf_o_fun: num  20 40 18 40 10 ...
##  $ pf_o_amb: num  0 0 14 5 10 ...
##  $ pf_o_sha: num  5 0 12 5 20 ...
##  $ dec_o   : int  0 0 1 1 1 1 0 0 1 0 ...
##  $ attr_o  : num  6 7 10 7 8 7 3 6 7 6 ...
##  $ sinc_o  : num  8 8 10 8 7 7 6 7 7 6 ...
##  $ intel_o : num  8 10 10 9 9 8 7 5 8 6 ...
##  $ fun_o   : num  8 7 10 8 6 8 5 6 8 6 ...
##  $ amb_o   : num  8 7 10 9 9 7 8 8 8 6 ...
##  $ shar_o  : num  6 5 10 8 7 7 7 6 9 6 ...
##  $ like_o  : num  7 8 10 7 8 7 2 7 6.5 6 ...
##  $ prob_o  : num  4 4 10 7 6 6 1 5 8 6 ...
##  $ met_o   : int  2 2 1 2 2 2 2 2 2 2 ...
##  $ age     : int  21 21 21 21 21 21 21 21 21 21 ...
##  $ field   : Factor w/ 260 levels "","Acting","African-American Studies/History",..: 152 152 152 152 152 152 152 152 152 152 ...
##  $ field_cd: num  1 1 1 1 1 1 1 1 1 1 ...
##  $ undergra: Factor w/ 242 levels "","American University",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ mn_sat  : Factor w/ 69 levels "","1,011.00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ tuition : Factor w/ 116 levels "","10,052.00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ race    : int  4 4 4 4 4 4 4 4 4 4 ...
##  $ imprace : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ imprelig: int  4 4 4 4 4 4 4 4 4 4 ...
##  $ from    : Factor w/ 270 levels "","94115","alabama",..: 56 56 56 56 56 56 56 56 56 56 ...
##  $ zipcode : Factor w/ 410 levels "","0","1,040",..: 262 262 262 262 262 262 262 262 262 262 ...
##  $ income  : Factor w/ 262 levels "","106,663.00",..: 239 239 239 239 239 239 239 239 239 239 ...
##  $ goal    : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ date    : int  7 7 7 7 7 7 7 7 7 7 ...
##  $ go_out  : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ career  : Factor w/ 368 levels "","?","??","a research position",..: 185 185 185 185 185 185 185 185 185 185 ...
##  $ career_c: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ sports  : int  9 9 9 9 9 9 9 9 9 9 ...
##  $ tvsports: int  2 2 2 2 2 2 2 2 2 2 ...
##  $ exercise: int  8 8 8 8 8 8 8 8 8 8 ...
##  $ dining  : int  9 9 9 9 9 9 9 9 9 9 ...
##  $ museums : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ art     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ hiking  : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ gaming  : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ clubbing: int  5 5 5 5 5 5 5 5 5 5 ...
##  $ reading : int  6 6 6 6 6 6 6 6 6 6 ...
##  $ tv      : int  9 9 9 9 9 9 9 9 9 9 ...
##  $ theater : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ movies  : int  10 10 10 10 10 10 10 10 10 10 ...
##  $ concerts: int  10 10 10 10 10 10 10 10 10 10 ...
##  $ music   : int  9 9 9 9 9 9 9 9 9 9 ...
##  $ shopping: int  8 8 8 8 8 8 8 8 8 8 ...
##  $ yoga    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ exphappy: int  3 3 3 3 3 3 3 3 3 3 ...
##  $ expnum  : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ attr1_1 : num  15 15 15 15 15 15 15 15 15 15 ...
##  $ sinc1_1 : num  20 20 20 20 20 20 20 20 20 20 ...
##  $ intel1_1: num  20 20 20 20 20 20 20 20 20 20 ...
##  $ fun1_1  : num  15 15 15 15 15 15 15 15 15 15 ...
##  $ amb1_1  : num  15 15 15 15 15 15 15 15 15 15 ...
##  $ shar1_1 : num  15 15 15 15 15 15 15 15 15 15 ...
##  $ attr4_1 : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ sinc4_1 : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ intel4_1: int  NA NA NA NA NA NA NA NA NA NA ...
##  $ fun4_1  : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ amb4_1  : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ shar4_1 : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ attr2_1 : num  35 35 35 35 35 35 35 35 35 35 ...
##  $ sinc2_1 : num  20 20 20 20 20 20 20 20 20 20 ...
##  $ intel2_1: num  15 15 15 15 15 15 15 15 15 15 ...
##  $ fun2_1  : num  20 20 20 20 20 20 20 20 20 20 ...
##  $ amb2_1  : num  5 5 5 5 5 5 5 5 5 5 ...
##  $ shar2_1 : num  5 5 5 5 5 5 5 5 5 5 ...
##  $ attr3_1 : int  6 6 6 6 6 6 6 6 6 6 ...
##  $ sinc3_1 : int  8 8 8 8 8 8 8 8 8 8 ...
##  $ fun3_1  : int  8 8 8 8 8 8 8 8 8 8 ...
##  $ intel3_1: int  8 8 8 8 8 8 8 8 8 8 ...
##  $ amb3_1  : int  7 7 7 7 7 7 7 7 7 7 ...
##  $ attr5_1 : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ sinc5_1 : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ intel5_1: int  NA NA NA NA NA NA NA NA NA NA ...
##  $ fun5_1  : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ amb5_1  : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ dec     : int  1 1 1 1 1 0 1 0 1 1 ...
##  $ attr    : num  6 7 5 7 5 4 7 4 7 5 ...
##   [list output truncated]

Looking at the structure of the dataset, we can see that it is complex. It has 8378 observations of 195 variables, and many cells are empty or missing. We therefore create a new dataset that keeps only the variables we consider relevant for the study.

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

df2<- df %>% select("gender", "age_o","samerace","attr_o","age","imprace","goal","career_c","go_out","date","exphappy","expnum")
str(df2)

## 'data.frame':    8378 obs. of  12 variables:
##  $ gender  : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ age_o   : int  27 22 22 23 24 25 30 27 28 24 ...
##  $ samerace: int  0 0 1 0 0 0 0 0 0 0 ...
##  $ attr_o  : num  6 7 10 7 8 7 3 6 7 6 ...
##  $ age     : int  21 21 21 21 21 21 21 21 21 21 ...
##  $ imprace : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ goal    : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ career_c: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ go_out  : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ date    : int  7 7 7 7 7 7 7 7 7 7 ...
##  $ exphappy: int  3 3 3 3 3 3 3 3 3 3 ...
##  $ expnum  : int  2 2 2 2 2 2 2 2 2 2 ...

We rename the variables to make them easier to understand.

names(df2) <- c("Genero", "Edad de la pareja","Misma raza","Nota cita","Edad","Importancia misma raza","Motivo cita","Carrera","Cuanto sales","Cantidad citas por semana","Opinion cita","Estima")
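Because the new names contain spaces, they must be wrapped in backticks when used in code, as in `` df2$`Misma raza` `` further down. Also, by default `data.frame()` passes names through `make.names()`. That is why `Edad de la pareja` later reappears as `Edad.de.la.pareja`. Here is a small sketch on a hypothetical toy data frame:

```r
# Toy data frame whose column name contains spaces (hypothetical values)
toy <- data.frame(x = c(27, 22))
names(toy) <- "Edad de la pareja"

# Names with spaces must be quoted with backticks when accessed with $
avg_age <- mean(toy$`Edad de la pareja`)
print(avg_age)

# data.frame() sanitises names by default: spaces become dots
copy_default <- data.frame(toy)
print(names(copy_default))

# check.names = FALSE keeps the original names
copy_kept <- data.frame(toy, check.names = FALSE)
print(names(copy_kept))
```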

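With the variables renamed, we replace the numeric codes with readable labels. (Caution: the 18 labels used for `Carrera` appear to follow the dataset key's coding of `field_cd`, the field of study. The selected column is `career_c`, the intended career, which the key codes differently, so the mapping should be checked against the key.)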

{df2$Genero[df2$Genero ==1] <- "Hombre"
df2$Genero[df2$Genero ==0] <- "Mujer"
df2$Carrera[df2$Carrera ==1] <- "Derecho"
df2$Carrera[df2$Carrera ==2] <- "Matematicas"
df2$Carrera[df2$Carrera ==3] <- "Psicologia"
df2$Carrera[df2$Carrera ==4] <- "Farmacia"
df2$Carrera[df2$Carrera ==5] <- "Ingenieria"
df2$Carrera[df2$Carrera ==6] <- "Periodismo"
df2$Carrera[df2$Carrera ==7] <- "Historia, religion, filosofia"
df2$Carrera[df2$Carrera ==8] <- "Economia"
df2$Carrera[df2$Carrera ==9] <- "Educacion"
df2$Carrera[df2$Carrera ==10] <- "Fisica y quimica"
df2$Carrera[df2$Carrera ==11] <- "Trabajo social"
df2$Carrera[df2$Carrera ==12] <- "Pregrado"
df2$Carrera[df2$Carrera ==13] <- "Ciencias politicas"
df2$Carrera[df2$Carrera ==14] <- "Pelicula"
df2$Carrera[df2$Carrera ==15] <- "Bellas artes"
df2$Carrera[df2$Carrera ==16] <- "Idiomas"
df2$Carrera[df2$Carrera ==17] <- "Arquitectura"
df2$Carrera[df2$Carrera ==18] <- "Otro"
df2$`Misma raza`[df2$`Misma raza` ==1] <- "Si"
df2$`Misma raza`[df2$`Misma raza` ==0] <- "No"}
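The chain of assignments above works, but the same recoding can be written more compactly. The idea is to index a vector of labels with the numeric code. The sketch assumes the codes are whole numbers from 1 to 18, as in the assignments above. `labels_carrera` and the example codes are hypothetical names for illustration:

```r
# Labels in the same order as the numeric codes 1 to 18 used above
labels_carrera <- c("Derecho", "Matematicas", "Psicologia", "Farmacia",
                    "Ingenieria", "Periodismo", "Historia, religion, filosofia",
                    "Economia", "Educacion", "Fisica y quimica",
                    "Trabajo social", "Pregrado", "Ciencias politicas",
                    "Pelicula", "Bellas artes", "Idiomas", "Arquitectura", "Otro")

# Hypothetical codes, including a missing value
codes <- c(1, 5, 17, NA)

# Indexing the label vector by the code performs the recoding; NA stays NA
recoded <- labels_carrera[codes]
print(recoded)

# For a binary variable, ifelse() is enough
genero <- ifelse(c(1, 0, 1) == 1, "Hombre", "Mujer")
print(genero)
```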

With the dataset prepared, we can now build the decision tree.

DECISION TREE

First we load the required libraries.

library(rpart)
library(rpart.plot)

We fit the decision tree, with `Opinion cita` as the class to predict and all other columns as predictors.

arbol <- rpart(`Opinion cita` ~ ., method = "class", data = df2 )
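The tree is fitted on all rows of `df2`, so measuring its accuracy on those same rows would give an optimistic figure. Below is a minimal sketch of a holdout check. It uses R's built-in `iris` data rather than `df2`, and the 70/30 split and the seed are arbitrary choices. With `df2`, the formula would be `` `Opinion cita` ~ . `` as above.

```r
library(rpart)

set.seed(42)  # arbitrary seed so that the split is reproducible

# Random 70/30 train/test split
n <- nrow(iris)
train_idx <- sample(n, size = round(0.7 * n))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Classification tree fitted on the training rows only
fit <- rpart(Species ~ ., data = train, method = "class")

# Predicted classes for the held-out rows
pred <- predict(fit, newdata = test, type = "class")

# Confusion matrix and overall accuracy on the test set
conf_mat <- table(observed = test$Species, predicted = pred)
print(conf_mat)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(accuracy)
```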

We display the tree, first as text and then as a plot.

print(arbol)

## n=8277 (101 observations deleted due to missingness)
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##   1) root 8277 6244 5 (0.014 0.036 0.085 0.096 0.25 0.24 0.18 0.062 0.026 0.015)  
##     2) Carrera=Arquitectura,Bellas artes,Ciencias politicas,Economia,Farmacia,Idiomas,Periodismo,Pregrado,Trabajo social 1622 1042 6 (0.02 0.044 0.062 0.15 0.19 0.36 0.051 0.08 0.011 0.039)  
##       4) Importancia misma raza< 1.5 627  465 4 (0.016 0.049 0.083 0.26 0.22 0.11 0.099 0.057 0 0.1)  
##         8) Motivo cita>=4.5 146   69 5 (0.068 0 0.22 0.041 0.53 0 0 0 0 0.14) *
##         9) Motivo cita< 4.5 481  325 4 (0 0.064 0.042 0.32 0.13 0.15 0.13 0.075 0 0.087) *
##       5) Importancia misma raza>=1.5 995  485 6 (0.022 0.041 0.048 0.078 0.16 0.51 0.02 0.094 0.018 0) *
##     3) Carrera=Derecho,Educacion,Fisica y quimica,Historia, religion, filosofia,Ingenieria,Matematicas,Pelicula,Psicologia 6655 4927 5 (0.013 0.034 0.091 0.083 0.26 0.21 0.21 0.058 0.03 0.0093)  
##       6) Carrera=Derecho,Educacion,Fisica y quimica,Ingenieria,Pelicula 1855 1157 5 (0.028 0.032 0.058 0.091 0.38 0.15 0.15 0.065 0.051 0) *
##       7) Carrera=Historia, religion, filosofia,Matematicas,Psicologia 4800 3664 6 (0.0067 0.035 0.1 0.08 0.21 0.24 0.23 0.055 0.022 0.013)  
##        14) Estima< 2.5 381  269 3 (0 0.081 0.29 0.076 0.16 0.18 0.17 0 0.047 0) *
##        15) Estima>=2.5 4419 3350 6 (0.0072 0.031 0.088 0.081 0.22 0.24 0.24 0.059 0.019 0.014)  
##          30) Edad>=37.5 57    0 5 (0 0 0 0 1 0 0 0 0 0) *
##          31) Edad< 37.5 4362 3293 6 (0.0073 0.031 0.089 0.082 0.21 0.25 0.24 0.06 0.02 0.014)  
##            62) Estima< 6.5 2973 2216 6 (0.011 0.023 0.093 0.1 0.21 0.25 0.21 0.063 0.024 0.0071)  
##             124) Carrera=Matematicas 1392 1001 5 (0.023 0.01 0.15 0.053 0.28 0.23 0.17 0.06 0.017 0.015)  
##               248) Cantidad citas por semana>=4.5 930  582 5 (0.015 0 0.14 0.034 0.37 0.24 0.11 0.04 0.015 0.023) *
##               249) Cantidad citas por semana< 4.5 462  336 7 (0.039 0.03 0.15 0.091 0.093 0.2 0.27 0.1 0.019 0) *
##             125) Carrera=Historia, religion, filosofia,Psicologia 1581 1142 6 (0 0.035 0.046 0.14 0.15 0.28 0.25 0.066 0.03 0) *
##            63) Estima>=6.5 1389  957 7 (0 0.048 0.08 0.039 0.2 0.22 0.31 0.053 0.011 0.03) *
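Each line of the printout follows the pattern `node), split, n, loss, yval, (yprob)`:

- `n` is the number of rows in the node.
- `yval` is the predicted class.
- `loss` is the number of rows in the node that do not belong to `yval`.

Working through the root node and the ten terminal nodes (marked `*`) gives the baseline and the training error of the tree:

```latex
\begin{aligned}
n_{\text{root}} - \text{loss}_{\text{root}} &= 8277 - 6244 = 2033 \ \text{rows of class } 5,
  \qquad \tfrac{2033}{8277} \approx 0.246 \\
\sum_{\text{leaves}} \text{loss} &= 69 + 325 + 485 + 1157 + 269 + 0 + 582 + 336 + 1142 + 957 = 5322 \\
\text{training error} &= \tfrac{5322}{8277} \approx 0.643,
  \qquad \text{relative error} = \tfrac{5322}{6244} \approx 0.852
\end{aligned}
```

Always predicting class 5 would therefore be right for about 24.6% of the rows, while the tree's training accuracy is about 1 - 0.643 = 35.7%. The relative error (leaf errors divided by root error) is the quantity `printcp(arbol)` reports as `rel error` for the full tree.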

rpart.plot(arbol, extra = 100)

In the `plotcp` chart, the curve goes down: the cross-validated relative error gets smaller as the tree grows.

plotcp(arbol)
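`plotcp` plots the cross-validated error (`xerror`) against the complexity parameter `cp`. A common follow-up is to read the same numbers with `printcp()` and prune the tree at the `cp` with the smallest `xerror`. The sketch below uses the `kyphosis` data bundled with `rpart` instead of `df2`; the same lines would apply to `arbol`.

```r
library(rpart)

set.seed(1)  # xerror comes from cross-validation, so fix the seed

# Classification tree on the kyphosis data shipped with rpart
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# One row per cp value: training (rel error) and cross-validated (xerror) error
printcp(fit)

# cp with the smallest cross-validated error
cp_table <- fit$cptable
best_cp <- cp_table[which.min(cp_table[, "xerror"]), "CP"]

# Prune the tree back to that complexity
pruned <- prune(fit, cp = best_cp)
print(pruned)
```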

We will comment on the results later.

K-MEANS

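K-means splits the data into k subsets (clusters) whose observations are “similar” to one another and “different” from the observations in the other subsets. Similarity is measured as the Euclidean distance from each observation to the centroid (mean) of its cluster.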
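To make the idea concrete, the algorithm alternates between two steps:

1. Assign every observation to its nearest centroid.
2. Recompute each centroid as the mean of its assigned observations.

Each step lowers the total within-cluster sum of squares. The sketch below is a simplified from-scratch version of this procedure (Lloyd's algorithm), run on two synthetic groups of points. `simple_kmeans` is a hypothetical helper written for illustration. The analysis below uses R's `kmeans()`, whose default algorithm is Hartigan-Wong.

```r
# Simplified k-means (Lloyd's algorithm): hypothetical helper for illustration
simple_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  # Initial centroids: k distinct observations chosen at random
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  for (iter in seq_len(max_iter)) {
    # Squared Euclidean distance from each point (rows) to each centroid (columns)
    d <- sapply(seq_len(k), function(j) colSums((t(x) - centers[j, ])^2))
    # Assignment step: each point goes to its nearest centroid
    cluster <- max.col(-d)
    # Update step: each centroid becomes the mean of its assigned points
    new_centers <- do.call(rbind, lapply(seq_len(k), function(j)
      colMeans(x[cluster == j, , drop = FALSE])))
    # Stop once the centroids no longer move
    if (all(abs(new_centers - centers) < 1e-9)) break
    centers <- new_centers
  }
  list(cluster = cluster, centers = new_centers)
}

set.seed(123)
# Two well-separated synthetic groups of 20 points each (hypothetical data)
pts <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 10), ncol = 2))
res <- simple_kmeans(pts, k = 2)
print(table(res$cluster))  # cluster sizes
```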

K-means cannot handle missing values, so the NAs must be removed from the variables we are going to use. To do so, we first rebuild the selection of variables and their names.

df3<- df %>% select("gender", "age_o","samerace","attr_o","age","imprace","goal","career_c","go_out","date","exphappy","expnum")
names(df3) <- c("Genero", "Edad de la pareja","Misma raza","Nota cita","Edad","Importancia misma raza","Motivo cita","Carrera","Cuanto sales","Cantidad citas por semana","Opinion cita","Estima")

To keep the study simple, we first try removing the columns that contain NAs.

How many NAs are there in each column?

colSums(is.na(df3))

##                    Genero         Edad de la pareja 
##                         0                       104 
##                Misma raza                 Nota cita 
##                         0                       212 
##                      Edad    Importancia misma raza 
##                        95                        79 
##               Motivo cita                   Carrera 
##                        79                       138 
##              Cuanto sales Cantidad citas por semana 
##                        79                        97 
##              Opinion cita                    Estima 
##                       101                      6578
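Expressed as proportions, the counts show where most of the data will be lost. `Estima` is missing in 6578 of the 8378 rows (about 78.5%), so any analysis that requires it keeps at most 1800 rows. The sketch below shows the proportion calculation with `colMeans(is.na(...))`, which works because `TRUE` counts as 1. It runs on a small hypothetical data frame (`toy_na`):

```r
# Hypothetical data frame with some missing values
toy_na <- data.frame(a = c(1, NA, 3, 4),
                     b = c(NA, NA, NA, 4),
                     c = c(1, 2, 3, 4))

# Share of missing values per column (TRUE counts as 1 in the mean)
na_share <- colMeans(is.na(toy_na))
print(round(100 * na_share, 1))  # percentage missing per column

# Number of rows left after dropping every row that has an NA
rows_left <- sum(complete.cases(toy_na))
print(rows_left)
```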

# Data
muting <- data.frame(df3)

# 1. Find the rows that contain at least one NA
rows_with_na <- which(!complete.cases(muting))

# 2. Count the NAs in each column
colSums(is.na(muting))

##                    Genero         Edad.de.la.pareja 
##                         0                       104 
##                Misma.raza                 Nota.cita 
##                         0                       212 
##                      Edad    Importancia.misma.raza 
##                        95                        79 
##               Motivo.cita                   Carrera 
##                        79                       138 
##              Cuanto.sales Cantidad.citas.por.semana 
##                        79                        97 
##              Opinion.cita                    Estima 
##                       101                      6578

# All columns except two contain NAs.
# 3. Keep only the columns without NAs.
muting <- muting[, colSums(is.na(muting)) == 0]

Keeping only those two columns is of no use, so we now check whether removing rows instead is helpful.

# Drop every row that has an NA in any of the selected variables
df3 <- df3[complete.cases(df3), ]

colSums(is.na(df3))

##                    Genero         Edad de la pareja 
##                         0                         0 
##                Misma raza                 Nota cita 
##                         0                         0 
##                      Edad    Importancia misma raza 
##                         0                         0 
##               Motivo cita                   Carrera 
##                         0                         0 
##              Cuanto sales Cantidad citas por semana 
##                         0                         0 
##              Opinion cita                    Estima 
##                         0                         0

With no NAs left in the variables of interest, we can move on to clustering.
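One caveat before clustering: `kmeans()` uses Euclidean distance. Variables with large ranges, such as `Estima` or `Edad`, therefore weigh more than 0/1 variables such as `Genero`. Coded categories such as `Carrera` are also treated as if the codes were quantities. A common remedy for the scale problem is to standardise the columns with `scale()` first. The sketch below runs on a hypothetical toy data frame; whether to standardise `df3` is a modelling decision left open here. It also fixes the seed and uses several random starts (`nstart`), so that the result is reproducible and less dependent on the initial centroids.

```r
# Hypothetical data: one variable in years, one binary variable
toy <- data.frame(edad   = c(21, 24, 35, 38, 22, 36),
                  genero = c(0, 1, 0, 1, 1, 0))

# Standardise each column to mean 0 and standard deviation 1
toy_scaled <- scale(toy)

# Fixed seed and 25 random starts; kmeans() keeps the best solution
set.seed(2024)
km <- kmeans(toy_scaled, centers = 2, nstart = 25)
print(km$size)
```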

K-MEANS 2

We will form 6 groups (k = 6).

(m1 <- kmeans(df3, 6))

## K-means clustering with 6 clusters of sizes 328, 310, 220, 384, 302, 146
## 
## Cluster means:
##      Genero Edad de la pareja Misma raza Nota cita     Edad
## 1 0.5213415          24.80488  0.3993902  5.839939 29.99390
## 2 0.3774194          24.71290  0.4774194  6.490323 23.92903
## 3 0.5318182          30.98636  0.4227273  6.459091 26.26364
## 4 0.6328125          23.93490  0.3281250  5.966146 23.22396
## 5 0.3543046          24.36424  0.3046358  6.084437 23.71192
## 6 0.7465753          23.78767  0.4452055  6.664384 24.49315
##   Importancia misma raza Motivo cita  Carrera Cuanto sales
## 1               2.503049    2.893293 4.070122     1.954268
## 2               5.838710    1.529032 5.525806     2.048387
## 3               3.927273    2.100000 4.363636     2.000000
## 4               3.309896    2.424479 2.023438     2.109375
## 5               4.334437    2.149007 8.672185     2.354305
## 6               2.006849    2.034247 4.773973     1.760274
##   Cantidad citas por semana Opinion cita    Estima
## 1                  5.442073     5.490854  4.512195
## 2                  4.880645     5.961290  9.393548
## 3                  5.263636     5.581818  3.459091
## 4                  4.986979     4.940104  2.661458
## 5                  5.427152     5.612583  2.715232
## 6                  3.013699     6.458904 17.397260
## 
## Clustering vector:
##   31   32   33   34   35   36   37   38   39   40   41   42   43   44   45 
##    4    4    4    4    4    4    3    4    4    4    2    2    2    2    2 
##   46   47   48   49   50   51   52   53   54   55   56   57   58   59   60 
##    2    2    2    2    2    4    4    4    4    4    4    3    4    4    4 
##   61   62   63   64   65   66   67   68   69   70   71   72   73   74   75 
##    4    4    4    4    4    4    3    4    4    4    2    2    2    2    2 
##   76   77   78   79   80   81   82   83   84   85   86   87   88   89   90 
##    2    2    2    2    2    6    6    6    6    6    6    6    6    6    6 
##   91   92   93   94   95   97   98   99  100  101  102  103  104  105  106 
##    2    2    2    2    2    2    2    2    2    4    4    4    4    4    4 
##  107  108  109  110  111  112  113  114  115  116  117  118  119  120  121 
##    4    4    4    4    6    6    6    6    6    6    6    6    6    6    4 
##  122  123  124  125  126  127  128  129  130  131  132  133  134  135  136 
##    4    4    4    4    4    4    4    4    4    6    6    6    6    6    6 
##  137  138  139  140  141  142  143  144  145  146  147  148  149  150  151 
##    6    6    6    6    2    2    2    2    2    2    2    2    2    2    2 
##  152  153  154  155  156  157  158  159  160  161  162  163  164  165  166 
##    2    2    2    2    2    2    2    2    2    1    1    1    1    1    1 
##  167  168  169  170  171  172  173  174  175  176  177  178  179  180  181 
##    1    1    1    1    5    5    5    5    5    5    5    5    5    5    6 
##  182  183  184  185  186  187  188  189  190  191  192  193  194  195  196 
##    6    6    6    6    6    6    6    6    6    6    6    6    6    6    6 
##  197  198  199  200  201  202  203  204  205  206  207  208  209  210  211 
##    6    6    6    6    5    5    5    5    5    5    5    5    5    5    5 
##  212  213  214  215  216  217  218  219  220  221  222  223  225  226  227 
##    5    5    5    5    5    1    3    1    1    1    3    1    1    1    1 
##  228  229  230  231  232  233  234  235  236  237  238  239  240  241  242 
##    1    1    3    3    1    4    3    4    4    4    4    4    3    4    4 
##  243  244  245  246  247  248  249  250  251  252  253  254  255  256  257 
##    4    4    4    4    4    4    5    3    5    5    5    5    5    3    5 
##  258  259  260  261  262  263  264  265  266  267  268  269  270  271  272 
##    5    5    5    5    5    5    5    5    3    5    5    5    5    5    3 
##  273  274  275  276  277  278  279  280  281  282  283  284  285  286  287 
##    5    5    5    5    5    5    5    5    2    2    2    2    2    2    2 
##  288  289  290  291  292  293  294  295  296  297  298  299  300  301  302 
##    2    2    2    2    2    2    2    2    2    5    5    5    5    5    5 
##  303  304  305  306  307  308  309  310  311  312  329  330  331  332  333 
##    5    5    5    5    5    5    5    5    5    5    5    5    5    5    5 
##  334  335  336  337  338  339  340  341  342  343  344  345  346  347  348 
##    5    5    5    5    5    5    5    5    5    5    5    4    3    4    4 
##  349  350  351  352  353  354  355  356  357  358  359  360  361  362  363 
##    4    4    4    3    4    4    4    4    4    4    4    4    4    3    4 
##  364  365  366  367  368  369  370  371  372  373  374  375  376  377  378 
##    4    4    4    4    3    4    4    4    4    4    4    4    4    4    3 
##  379  380  381  382  383  384  385  386  388  389  390  391  392  393  394 
##    4    4    4    3    4    3    4    4    4    4    3    3    4    1    3 
##  395  396  397  398  399  400  401  402  403  404  405  406  407  408  409 
##    1    1    1    1    1    3    1    1    1    1    1    1    1    1    5 
##  410  411  412  413  414  415  416  417  418  419  420  421  422  423  424 
##    3    5    5    5    5    5    3    5    5    5    5    5    5    5    5 
##  425  426  427  428  429  430  431  432  433  434  435  436  437  438  439 
##    2    2    2    2    2    2    2    2    2    2    2    2    2    2    2 
##  440  441  442  443  444  445  446  447  448  449  450  451  452  453  454 
##    2    2    3    2    2    2    2    2    3    2    2    2    2    2    2 
##  455  456  457  458  459  460  461  462  463  465  466  467  468  469  470 
##    2    2    5    3    5    5    5    3    5    5    5    5    5    5    3 
##  471  472  473  474  475  476  477  478  479  480  481  482  484  485  486 
##    3    5    2    2    2    2    2    2    2    2    2    2    2    2    2 
##  487  488  489  490  491  492  493  494  495  496  497  498  500  501  502 
##    2    2    4    3    4    4    4    4    4    3    4    4    4    4    4 
##  503  504  524  525  526  527  528  529  530  531  532  533  534  535  536 
##    4    4    1    1    1    1    1    1    1    1    1    1    1    1    3 
##  537  538  539  540  541  542  543  544  545  546  547  548  549  550  551 
##    1    1    1    1    1    1    4    4    4    4    4    4    4    4    4 
##  552  553  554  555  556  557  558  559  560  561  562  563  564  565  566 
##    4    4    4    3    4    4    4    4    4    4    4    4    4    4    4 
##  567  568  569  570  571  572  573  574  575  576  577  578  579  580  581 
##    4    4    4    4    4    4    4    3    4    4    4    4    4    4    4 
##  582  583  584  585  586  587  588  589  590  591  592  593  594  595  596 
##    4    4    4    4    4    4    4    4    4    4    4    3    4    4    4 
##  597  598  599  600  601  602  603  604  605  606  607  608  609  610  611 
##    4    4    4    2    2    2    2    2    2    2    2    2    2    2    2 
##  612  613  614  615  616  617  618  619  620  621  622  623  624  625  626 
##    3    2    2    2    2    2    2    2    2    2    2    2    2    2    2 
##  627  628  629  630  631  632  633  634  635  636  637  638  639  640  641 
##    2    2    2    2    3    2    2    2    2    2    2    1    1    1    1 
##  642  643  644  645  646  647  648  649  650  651  652  653  654  655  656 
##    1    1    1    1    1    1    1    1    3    1    1    1    1    1    1 
##  657  658  659  660  661  662  663  664  665  666  667  668  669  670  671 
##    2    2    2    2    2    2    2    2    2    2    2    2    3    2    2 
##  672  673  674  675  676  677  678  679  680  681  682  683  684  685  686 
##    2    2    2    2    5    5    5    5    5    5    5    5    5    5    5 
##  687  688  689  690  691  692  693  694  695  696  697  698  699  700  701 
##    5    3    5    5    5    5    5    5    1    1    1    1    1    1    1 
##  702  703  704  705  706  707  708  709  710  711  712  713  714  715  716 
##    1    1    1    1    1    3    1    1    1    1    1    1    4    4    4 
##  717  718  719  720  721  722  723  724  725  726  727  728  729  730  731 
##    4    4    4    4    4    4    4    4    4    3    4    4    4    4    4 
##  732  733  734  735  736  737  738  739  741  742  743  744  745  746  747 
##    4    4    4    4    4    4    4    4    4    4    4    4    3    4    4 
##  748  749  750  751  752  753  755  756  757  758  759  760  761  762  763 
##    4    4    4    4    1    1    1    1    1    1    1    1    1    1    1 
##  764  765  766  767  768  769  770  771  772  773  774  775  776  777  778 
##    3    1    1    1    1    1    1    1    1    1    1    1    1    1    1 
##  779  780  781  782  783  784  785  786  787  788  789  790  791  792  793 
##    1    1    1    1    3    1    1    1    1    1    1    4    4    4    4 
##  794  795  796  797  798  799  800  801  802  803  804  805  806  807  808 
##    4    4    4    4    4    4    4    4    3    4    4    4    4    4    4 
##  809  810  811  812  813  814  815  816  817  818  819  820  821  822  823 
##    2    2    2    2    2    2    2    2    2    2    2    2    2    2    2 
##  824  825  826  827  828  849  850  851  852  853  854  855  856  857  858 
##    2    2    2    2    2    1    1    1    1    1    1    1    1    1    1 
##  859  860  861  862  863  864  865  866  867  868  869  870  871  872  873 
##    3    1    1    1    1    1    1    1    1    1    3    4    4    3    4 
##  874  875  876  877  878  879  880  881  882  883  884  885  886  887  888 
##    4    4    4    4    4    5    5    5    5    5    5    5    5    5    5 
##  889  890  891  892  893  894  895  896  897  898  899  900  901  902  903 
##    5    5    5    5    5    5    5    5    5    5    2    2    2    2    2 
##  904  905  906  907  908  909  910  913  914  915  916  917  918  919  920 
##    2    2    2    2    2    1    1    3    1    1    1    1    1    4    4 
##  923  924  925  926  927  928  929  930  933  934  935  936  937  938  939 
##    4    4    4    4    4    4    1    1    3    3    1    1    1    1    1 
##  940  943  944  945  946  947  948  949  950  953  954  955  956  957  958 
##    1    3    3    1    1    1    1    5    5    3    3    5    5    5    5 
##  959  960  963  964  965  966  967  968  969  970  973  974  975  976  977 
##    5    5    5    5    5    5    5    5    2    2    2    2    2    2    2 
##  978  979  980  983  984  985  986  987  988  989  990  993  994  995  996 
##    2    5    5    5    5    5    5    5    5    4    4    3    3    5    4 
##  997  998  999 1000 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 
##    3    4    4    4    4    4    4    4    4    4    1    1    1    1    3 
## 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 
##    1    1    1    3    1    1    1    1    1    1    1    1    1    5    5 
## 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 
##    5    5    3    5    5    5    3    5    5    5    5    5    5    5    5 
## 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 
##    5    5    5    5    5    3    5    5    5    3    5    5    5    5    5 
## 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 
##    5    5    5    5    3    5    5    3    3    3    3    5    3    5    3 
## 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 
##    5    5    5    5    5    3    3    2    2    2    2    3    2    3    2 
## 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 
##    3    2    2    2    2    2    2    2    2    2    3    1    1    3    3 
## 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 
##    3    3    1    3    1    1    1    1    1    1    1    3    1    3    4 
## 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 
##    4    3    3    4    3    4    3    4    4    4    4    4    4    4    3 
## 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 
##    4    1    1    1    1    3    1    1    1    3    1    1    1    1    1 
## 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 
##    1    1    1    1    1    1    1    1    3    1    1    1    3    1    1 
## 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 
##    1    1    1    1    1    1    1    1    1    1    3    3    1    3    1 
## 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 
##    3    1    1    1    1    1    1    1    1    1    2    2    2    2    3 
## 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 
##    2    2    2    3    2    2    2    2    2    2    2    2    2    2    2 
## 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 
##    2    2    3    2    2    2    2    2    2    2    2    2    2    2    2 
## 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 
##    2    2    2    2    2    3    2    2    2    2    2    2    2    2    2 
## 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 
##    2    2    2    2    3    1    1    3    3    3    3    1    3    1    1 
## 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 
##    1    1    1    1    1    3    1    5    5    5    5    3    5    5    5 
## 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 
##    3    5    5    5    5    5    5    5    5    5    3    4    4    3    3 
## 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 
##    3    3    4    3    4    4    4    4    4    4    4    3    4    1    1 
## 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 
##    1    1    3    1    3    1    3    1    1    1    1    1    1    1    1 
## 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 
##    1    1    1    1    3    3    1    3    1    3    1    1    1    1    1 
## 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 
##    1    1    1    1    3    5    5    5    5    3    5    3    3    3    5 
## 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 
##    5    5    5    5    5    3    3    3    2    2    2    2    2    2    3 
## 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 
##    3    3    2    2    2    2    2    2    3    3    3    4    4    4    4 
## 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 
##    3    4    3    3    3    4    4    4    4    4    4    3    3    3    1 
## 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 
##    1    1    1    1    1    3    3    3    1    1    1    1    1    1    3 
## 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 
##    3    1    1    1    1    1    1    1    1    1    1    1    1    1    1 
## 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 
##    1    1    1    1    6    6    6    6    6    6    6    6    6    6    6 
## 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 
##    6    6    6    6    6    6    6    3    1    1    1    1    1    1    3 
## 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 
##    3    3    1    1    1    1    1    1    3    3    2    6    6    6    6 
## 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 
##    6    6    2    2    6    6    6    6    6    6    6    6    6    6    6 
## 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 
##    6    6    6    6    6    6    6    6    6    6    6    6    6    6    6 
## 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 
##    6    3    5    5    5    2    3    5    3    3    3    5    5    5    2 
## 1509 1510 1511 1512 1513 1514 1515 1517 1518 1519 1520 1521 1522 1523 1524 
##    5    5    3    3    3    4    4    3    3    4    3    3    3    4    4 
## 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 
##    4    3    4    4    3    3    3    4    4    4    4    4    4    3    3 
## 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 
##    3    4    4    4    4    4    4    3    3    3    4    4    4    4    4 
## 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 
##    4    3    3    3    4    4    4    4    4    4    4    4    3    4    4 
## 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 
##    4    4    4    4    3    3    3    4    4    4    4    4    4    4    4 
## 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 
##    3    4    4    4    4    3    4    3    3    3    4    4    4    4    4 
## 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 
##    4    3    3    3    5    5    5    3    3    5    3    3    3    5    5 
## 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 
##    5    3    5    5    3    3    3    1    1    1    1    3    1    3    3 
## 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 
##    3    1    1    1    1    1    1    3    3    3    5    5    5    5    3 
## 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 
##    5    3    3    3    5    5    5    5    5    5    3    3    5    5    5 
## 1660 1661 1662 1663 1665 1666 1667 1668 1669 1670 1671 1672 1673 1675 1676 
##    5    5    5    5    5    5    6    6    6    6    6    6    6    6    6 
## 1687 1688 1689 1690 1691 1692 1693 1695 1697 1698 1699 1700 1701 1702 1703 
##    2    2    2    2    2    2    2    2    6    6    6    6    6    6    6 
## 1705 1706 1707 1708 1709 1710 1711 1712 1713 1715 1717 1718 1719 1720 1721 
##    6    6    5    5    5    5    5    5    5    5    6    6    6    6    6 
## 1722 1723 1725 1726 1727 1728 1729 1730 1731 1732 1733 1735 1736 1737 1738 
##    6    6    6    6    4    4    4    4    4    4    4    4    4    4    4 
## 1739 1740 1741 1742 1743 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 
##    4    4    4    4    4    4    4    2    2    2    2    2    2    2    2 
## 1755 1757 1758 1759 1760 1761 1762 1764 1765 1767 1768 1769 1770 1771 1772 
##    2    4    4    4    4    4    4    4    4    2    2    2    2    2    2 
## 1773 1774 1775 1777 1778 1779 1780 1781 1782 1783 1784 1785 1787 1788 1789 
##    2    2    2    4    4    4    4    4    4    4    4    4    4    4    4 
## 1790 1791 1792 1793 1794 1795 1797 1798 1799 1800 1801 1802 1803 1804 1805 
##    4    4    4    4    4    4    6    6    6    6    6    6    6    6    6 
## 1807 1808 1809 1810 1811 1812 1813 1814 1815 1827 1828 1829 1830 1831 1832 
##    6    6    6    6    6    6    6    6    6    5    5    5    5    5    5 
## 1833 1834 1835 1837 1838 1839 1841 1843 1844 1845 
##    5    5    5    5    5    5    5    5    5    5 
## 
## Within cluster sum of squares by cluster:
## [1] 13189.664 13770.390 10507.177 13881.229 12304.262  8764.308
##  (between_SS / total_SS =  46.5 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"    
## [5] "tot.withinss" "betweenss"    "size"         "iter"        
## [9] "ifault"
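The `(between_SS / total_SS = 46.5 %)` line above is the share of the total sum of squares that the cluster assignment explains. The sketch below uses simulated data, not the speed-dating data, to show how that figure comes from the components listed above (`totss`, `betweenss`, `withinss`, `tot.withinss`):

```r
# Simulated data: two well-separated groups in two dimensions (illustration only)
set.seed(42)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 5), ncol = 2))

fit <- kmeans(x, centers = 2, nstart = 10)

# Fraction of the total sum of squares explained by the clustering
explained <- fit$betweenss / fit$totss
cat(sprintf("between_SS / total_SS = %.1f %%\n", 100 * explained))

# tot.withinss is the sum of the per-cluster withinss values,
# and totss splits into the between- and within-cluster parts
stopifnot(isTRUE(all.equal(fit$tot.withinss, sum(fit$withinss))))
stopifnot(isTRUE(all.equal(fit$betweenss + fit$tot.withinss, fit$totss)))
```

With the model fitted earlier, `m1$betweenss / m1$totss` should reproduce the 46.5 % reported above.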

table(df3$`Opinion cita`, m1$cluster)

##     
##        1   2   3   4   5   6
##   1    0   0   1   9   0   9
##   2    0  15   2  22   8   0
##   3    6  18  13  79  16   0
##   4   59  17  12   9   9   9
##   5  119  27  97 125 122   9
##   6   74  90  44  50  84  38
##   7   52 133  31  78  43  43
##   8   18   0  10   8   0  28
##   9    0  10  10   4  20   0
##   10   0   0   0   0   0  10
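The clusters have different sizes, so raw counts are hard to compare across columns. One option, which the original analysis does not use, is to convert the table to column proportions with `prop.table()`. Each column then shows how the ratings are distributed within a single cluster. The vectors below are made up for illustration:

```r
# Hypothetical ratings and cluster labels, for illustration only
opinion <- c(5, 5, 6, 7, 5, 6, 6, 7, 8, 5)
cluster <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3)

counts <- table(opinion, cluster)

# margin = 2 divides each column by its total, so every column sums to 1
by_cluster <- prop.table(counts, margin = 2)
round(by_cluster, 2)
```

Applied to the table above, the call would be `prop.table(table(df3$`Opinion cita`, m1$cluster), margin = 2)`.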

We plot the results:

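# Scatter plot of two of the variables, coloured by cluster
plot(df3[c("Cantidad citas por semana", "Opinion cita")], col = m1$cluster)
# Overlay the centroids, one colour per row of m1$centers so they match their clusters
points(m1$centers[, c("Cantidad citas por semana", "Opinion cita")],
       col = seq_len(nrow(m1$centers)), pch = 16, cex = 2)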

aggregate(df3,by=list(m1$cluster),FUN=mean)

##   Group.1    Genero Edad de la pareja Misma raza Nota cita     Edad
## 1       1 0.5213415          24.80488  0.3993902  5.839939 29.99390
## 2       2 0.3774194          24.71290  0.4774194  6.490323 23.92903
## 3       3 0.5318182          30.98636  0.4227273  6.459091 26.26364
## 4       4 0.6328125          23.93490  0.3281250  5.966146 23.22396
## 5       5 0.3543046          24.36424  0.3046358  6.084437 23.71192
## 6       6 0.7465753          23.78767  0.4452055  6.664384 24.49315
##   Importancia misma raza Motivo cita  Carrera Cuanto sales
## 1               2.503049    2.893293 4.070122     1.954268
## 2               5.838710    1.529032 5.525806     2.048387
## 3               3.927273    2.100000 4.363636     2.000000
## 4               3.309896    2.424479 2.023438     2.109375
## 5               4.334437    2.149007 8.672185     2.354305
## 6               2.006849    2.034247 4.773973     1.760274
##   Cantidad citas por semana Opinion cita    Estima
## 1                  5.442073     5.490854  4.512195
## 2                  4.880645     5.961290  9.393548
## 3                  5.263636     5.581818  3.459091
## 4                  4.986979     4.940104  2.661458
## 5                  5.427152     5.612583  2.715232
## 6                  3.013699     6.458904 17.397260
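In k-means, each centroid is the mean of the points assigned to its cluster. When the clustering is run on the same unscaled columns, the group means returned by `aggregate()` should therefore match the centroids stored in `m1$centers`. The sketch below checks this on the built-in `iris` measurements. That dataset is an illustrative assumption, not the speed-dating data:

```r
data(iris)
x <- iris[, 1:4]

set.seed(1)
fit <- kmeans(x, centers = 3, nstart = 5)

# Mean of every column per cluster, same call pattern as above
means <- aggregate(x, by = list(fit$cluster), FUN = mean)

# Drop the Group.1 column and compare with the stored centroids
same <- all.equal(unname(as.matrix(means[, -1])), unname(fit$centers),
                  tolerance = 1e-8)
print(same)
```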


As expected, and as the plot confirms, the rating does not change much as the number of dates grows: with many dates, the mean rating stays at around 6. By contrast, first dates receive higher ratings, because a first date is special and people make more of an effort for it to go well.

We will add a column to the dataset that tells us whether the relationship ends up succeeding or not.

Summary of df2

table (df2$Genero)

## 
## Hombre  Mujer 
##   4194   4184

table (df2$`Misma raza`)

## 
##   No   Si 
## 5062 3316
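`table()` returns absolute counts, and `prop.table()` turns them into shares. Applied to the counts printed above for `Misma raza`, it shows that roughly 40 % of the dates were between people of the same race:

```r
# Counts copied from the table() output above
counts <- c(No = 5062, Si = 3316)

shares <- prop.table(counts)
round(shares, 3)
```

The same result comes straight from the data with `prop.table(table(df2$`Misma raza`))`.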

summary(df2$`Nota cita`)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00    5.00    6.00    6.19    8.00   10.50     212

Importance given to the date being of the same race.

summary(df2$`Importancia misma raza`)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   1.000   3.000   3.785   6.000  10.000      79
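`summary()` reports missing values in a separate `NA's` column. Functions such as `mean()`, however, return `NA` unless you tell them to drop missing values. The toy vector below (not the real column) illustrates this:

```r
# Toy vector with missing values, for illustration only
importance <- c(1, 3, NA, 6, 10, NA, 0)

mean(importance)                 # NA: the missing values propagate
mean(importance, na.rm = TRUE)   # 4, the mean of the 5 observed values
sum(is.na(importance))           # 2, the count summary() shows as NA's
```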

Remove the “,” thousands separators so that the dot is read as the decimal point.

# Strip the thousands separators, then convert the text to numbers
df$income <- as.numeric(gsub(",", "", df$income))
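The sketch below shows what that line does to a few strings. It assumes that `income` stores values such as `"69,487.00"`, with a comma as the thousands separator and a dot as the decimal separator. That format is an assumption about the raw column:

```r
# Assumed format: comma as thousands separator, dot as decimal separator
income_raw <- c("69,487.00", "65,929.00", NA, "")

income <- as.numeric(gsub(",", "", income_raw))
income   # 69487 65929 NA NA: missing and empty values both become NA
```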

We now create a new data frame that includes three new variables to study and predict: “dec”, “dec_o” and “match”.

df4<- df %>% select("iid","match","dec","dec_o","attr_o","career_c","exphappy")
names(df4) <- c("iid","Partido","Personas que eligieron","Personas que les eligieron","Nota cita","Carrera","Opinion cita")

# With the new data frame ready, we use the data to predict how the
# relationship will evolve.
# The new data frame will be called "pareja".
pareja <- data.frame()

We fill the data frame with the variables we are going to study: for each degree, the number of matches, the number of non-matches and the ratio between them.

# For each degree code, count the dates that ended in a match and those that did not
for (cod in unique(na.omit(df4$Carrera))) {
  Partido    <- sum(df4$Carrera == cod & df4$Partido == 1, na.rm = TRUE)
  No.partido <- sum(df4$Carrera == cod & df4$Partido == 0, na.rm = TRUE)
  pareja <- rbind(pareja, c(cod, Partido, No.partido, Partido / No.partido))
}
names(pareja) <- c("Carrera", "Partidos", "Fail Partidos", "Ratio")
# A degree counts as unsuccessful when it accumulates more than 1000 failed dates
pareja$Exito <- as.factor(ifelse(pareja$`Fail Partidos` > 1000, "No", "Si"))
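The loop above counts the rows with and without a match for each value of the grouping variable. The sketch below does the same computation in vectorised form with `table()`. It uses a small hypothetical data frame whose column names mirror `df4`:

```r
# Hypothetical data with the same column names as df4 (illustration only)
df_toy <- data.frame(Carrera = c(1, 1, 1, 2, 2, 3, 3, 3, NA),
                     Partido = c(1, 0, 0, 1, 0, 0, 0, 0, 0))

# Cross-tabulate group against match; rows with a missing group are dropped
counts <- table(df_toy$Carrera, df_toy$Partido)

resumen <- data.frame(Carrera = as.numeric(rownames(counts)),
                      Partidos = as.vector(counts[, "1"]),
                      `Fail Partidos` = as.vector(counts[, "0"]),
                      check.names = FALSE)
resumen$Ratio <- resumen$Partidos / resumen$`Fail Partidos`
resumen
```

A single `table()` call replaces the repeated subsetting. Dropping missing groups also avoids the row of overall totals that NA comparisons can otherwise produce.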

We map the numeric degree codes to their names.

# Degree names, indexed by their numeric code (1 to 18)
nombres_carrera <- c("Derecho", "Matematicas", "Psicologia", "Farmacia", "Ingenieria",
                     "Periodismo", "Historia, religion, filosofia", "Economia", "Educacion",
                     "Fisica y quimica", "Trabajo social", "Pregrado", "Ciencias politicas",
                     "Pelicula", "Bellas artes", "Idiomas", "Arquitectura", "Otro")
# Replace each code with its name; missing codes stay NA
pareja$Carrera <- nombres_carrera[as.integer(pareja$Carrera)]

The 10 highest and the 10 lowest success ratios:

head(pareja[order(pareja[,4], decreasing=T),], 10)

##                          Carrera Partidos Fail Partidos     Ratio Exito
## 8               Fisica y quimica       42           184 0.2282609    Si
## 5                     Periodismo      366          1738 0.2105869    No
## 11                     Educacion       55           262 0.2099237    Si
## 6                       Economia      106           508 0.2086614    Si
## 4  Historia, religion, filosofia      261          1315 0.1984791    No
## 9                           <NA>     1380          6998 0.1971992    No
## 7                     Ingenieria      348          1786 0.1948488    No
## 10                   Matematicas       61           337 0.1810089    Si
## 2                       Farmacia      136           758 0.1794195    Si
## 1                     Psicologia      118           689 0.1712627    Si

tail(pareja[order(pareja[,4], decreasing=T),], 10)

##                          Carrera Partidos Fail Partidos     Ratio Exito
## 5                     Periodismo      366          1738 0.2105869    No
## 11                     Educacion       55           262 0.2099237    Si
## 6                       Economia      106           508 0.2086614    Si
## 4  Historia, religion, filosofia      261          1315 0.1984791    No
## 9                           <NA>     1380          6998 0.1971992    No
## 7                     Ingenieria      348          1786 0.1948488    No
## 10                   Matematicas       61           337 0.1810089    Si
## 2                       Farmacia      136           758 0.1794195    Si
## 1                     Psicologia      118           689 0.1712627    Si
## 3                        Derecho       31           186 0.1666667    Si

We draw a plot showing the amount of success and failure among the couples.

As expected, there is more failure than success overall.

# Bottom quartile of Ratio = "failure" groups, top quartile = "success" groups
bajo <- pareja$Ratio < quantile(pareja$Ratio, 0.25)
alto <- pareja$Ratio > quantile(pareja$Ratio, 0.75)
boxplot(pareja[bajo, "Partidos"], pareja[alto, "Partidos"],
        pareja[bajo, "Fail Partidos"], pareja[alto, "Fail Partidos"],
        col = c("slategray", "pink", "slategray", "pink"),
        names = c("Partidos", "Partidos", "Fail Partidos", "Fail Partidos"))
# Legend colours match the boxes: slategray = low ratio, pink = high ratio
legend("topleft", legend = c("Fracaso", "Exito"), fill = c("slategray", "pink"), cex = 1.5)
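The boxplot compares the groups in the bottom and top quartiles of `Ratio`. The sketch below shows how those two masks are built, using hypothetical ratio values:

```r
# Hypothetical success ratios for eight groups (illustration only)
ratio <- c(0.17, 0.18, 0.18, 0.19, 0.20, 0.21, 0.21, 0.23)

q <- quantile(ratio, c(0.25, 0.75))   # type 7 quantiles, R's default

low  <- ratio < q[[1]]   # strictly below the first quartile
high <- ratio > q[[2]]   # strictly above the third quartile
sum(low)    # 1
sum(high)   # 1
```

The comparisons are strict, so groups whose ratio equals a quartile fall into neither box. With few groups and tied values, each box can end up holding only one or two groups.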

###########################################

We now examine and illustrate the relationship between gender, match and race (whether both people are of the same race). Gender appears to make a difference both in the importance of being of the same race and in the matches obtained by each gender.

df5<- df %>% select("match","gender","samerace")
names(df5) <- c("Partido", "Genero", "Misma raza")
library(flexclust)


df5 <- as.matrix(df5)

set.seed(1)
model <- kcca(df5, k = 2, save.data = TRUE, family = kccaFamily("ejaccard"))
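`kccaFamily("ejaccard")` clusters with the extended Jaccard (Tanimoto) distance. The function below is a hand-written sketch of that distance for two vectors. It follows the textbook definition and is not taken from flexclust's source, so treat it as an approximation of what the package computes. On 0/1 data like `df5`, it reduces to the ordinary Jaccard distance:

```r
# Extended Jaccard (Tanimoto) distance between two numeric vectors:
# 1 - x.y / (|x|^2 + |y|^2 - x.y)
ejaccard_dist <- function(x, y) {
  xy <- sum(x * y)
  1 - xy / (sum(x^2) + sum(y^2) - xy)
}

# Two binary rows in the (Partido, Genero, Misma raza) layout of df5
a <- c(1, 1, 0)
b <- c(1, 0, 0)
ejaccard_dist(a, b)   # 0.5: one shared 1 out of two positions set in either row
ejaccard_dist(a, a)   # 0: identical rows
```

Two all-zero rows give `0/0`, which is `NaN`. Rows that are entirely zeros therefore deserve attention when this distance is used on binary data.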

Plotting the clusters, projected onto the principal components of the data:

df.pca <- prcomp(df5)
plot(model, data = df5, project = df.pca, main = "Cluster")
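The `project = df.pca` argument draws the clusters in principal-component coordinates. The sketch below shows what `prcomp()` provides. It runs on a random 0/1 matrix that stands in for `df5`, under the assumption that it has the same three binary columns:

```r
set.seed(1)
# Random 0/1 matrix with the same three columns as df5 (illustration only)
m <- matrix(rbinom(300, size = 1, prob = 0.4), ncol = 3,
            dimnames = list(NULL, c("Partido", "Genero", "Misma raza")))

pca <- prcomp(m)   # centred and unscaled, as in the call above

# Coordinates of each row on PC1 and PC2, the usual two-dimensional projection
scores <- pca$x[, 1:2]

# Share of the variance captured by each component
var_share <- pca$sdev^2 / sum(pca$sdev^2)
round(var_share, 3)
```

If the first two components capture only a modest share of the variance, the projected plot can make clusters look more or less separated than they really are.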

Segment profile:

barchart(model, strip.prefix = "#", shade = TRUE, layout = c(model@k, 1), main = "Clusters")
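A segment profile compares each cluster's centroid with the sample as a whole. The sketch below computes those two quantities by hand on hypothetical data. It assumes mean-based centroids, so that on 0/1 data each centroid entry is simply the share of 1s in the cluster. It does not reproduce how flexclust draws the chart:

```r
# Hypothetical 0/1 data and cluster labels (illustration only)
m <- cbind(Partido = c(1, 0, 0, 1, 0, 0),
           Genero  = c(1, 1, 1, 0, 0, 0),
           Raza    = c(0, 1, 0, 1, 1, 0))
cl <- c(1, 1, 1, 2, 2, 2)

# Share of 1s per variable inside each cluster (mean-based centroids)
per_cluster <- rowsum(m, cl) / as.vector(table(cl))

# Share of 1s per variable in the whole sample, a natural reference level
overall <- colMeans(m)

round(per_cluster, 2)
overall
```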

ML JOB2: Speed dating data

David Presencio Rodríguez

8/12/2018