Multiple linear regression exercise: the goal is to use the "Prestige" dataset, which reports the results of a study carried out in Canada on the prestige of occupations.
In this case study we will work with the Prestige dataset from the carData package, which is loaded automatically together with the car library.
The Prestige dataset consists of 102 observations and 6 columns. Each observation is an occupation. The dataset contains the following columns:
education: Average education of the occupational incumbents, in years.
income: Average income of the incumbents, in dollars.
women: Percentage of incumbents who are women.
prestige: Prestige score of the occupation, taken from a social survey conducted in the mid-1960s.
census: Canadian Census occupational code.
type: Type of occupation. A factor with levels: bc, Blue Collar; prof, Professional, Managerial, and Technical; wc, White Collar.
# We will use the Prestige data, made available by loading the car library
library(car)
## Loading required package: carData
# Load the data
data(Prestige)
# View the first 6 records (the default for head())
head(Prestige)
## education income women prestige census type
## gov.administrators 13.11 12351 11.16 68.8 1113 prof
## general.managers 12.26 25879 4.02 69.1 1130 prof
## accountants 12.77 9271 15.70 63.4 1171 prof
## purchasing.officers 11.42 8865 9.11 56.8 1175 prof
## chemists 14.62 8403 11.68 73.5 2111 prof
## physicists 15.64 11030 5.13 77.6 2113 prof
# Inspect the structure
str(Prestige)
## 'data.frame': 102 obs. of 6 variables:
## $ education: num 13.1 12.3 12.8 11.4 14.6 ...
## $ income : int 12351 25879 9271 8865 8403 11030 8258 14163 11377 11023 ...
## $ women : num 11.16 4.02 15.7 9.11 11.68 ...
## $ prestige : num 68.8 69.1 63.4 56.8 73.5 77.6 72.6 78.1 73.1 68.8 ...
## $ census : int 1113 1130 1171 1175 2111 2113 2133 2141 2143 2153 ...
## $ type : Factor w/ 3 levels "bc","prof","wc": 2 2 2 2 2 2 2 2 2 2 ...
# Summary statistics
summary(Prestige)
## education income women prestige
## Min. : 6.380 Min. : 611 Min. : 0.000 Min. :14.80
## 1st Qu.: 8.445 1st Qu.: 4106 1st Qu.: 3.592 1st Qu.:35.23
## Median :10.540 Median : 5930 Median :13.600 Median :43.60
## Mean :10.738 Mean : 6798 Mean :28.979 Mean :46.83
## 3rd Qu.:12.648 3rd Qu.: 8187 3rd Qu.:52.203 3rd Qu.:59.27
## Max. :15.970 Max. :25879 Max. :97.510 Max. :87.20
## census type
## Min. :1113 bc :44
## 1st Qu.:3120 prof:31
## Median :5135 wc :23
## Mean :5402 NA's: 4
## 3rd Qu.:8312
## Max. :9517
# Column names
colnames(Prestige)
## [1] "education" "income" "women" "prestige" "census" "type"
# Frequency table of the variable type
table(Prestige$type)
##
## bc prof wc
## 44 31 23
# Bar plot of the variable type
plot(Prestige$type)
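Note that table() excludes missing values by default, which is why the counts above add up to 98 rather than 102, while summary() reported 4 NA's. The following is a minimal sketch of how to make missing values visible in a frequency table. The vector `type_demo` is a hypothetical stand-in built by hand, not the real Prestige column.

```r
# Hypothetical stand-in for Prestige$type: a factor with some missing values
type_demo <- factor(c("prof", "bc", "wc", NA, "bc", NA),
                    levels = c("bc", "prof", "wc"))

# Default behaviour: NA values are silently excluded from the counts
counts_default <- table(type_demo)

# useNA = "ifany" adds an extra cell for NA when at least one is present
counts_with_na <- table(type_demo, useNA = "ifany")

# Direct count of the missing values
n_missing <- sum(is.na(type_demo))

print(counts_default)
print(counts_with_na)
print(n_missing)
```

On the real data, table(Prestige$type, useNA = "ifany") should list the 4 occupations without a type next to bc, prof and wc.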
Next we transform the categorical variable type into dummy variables.
#install.packages("dummies")
library(dummies)
## dummies-1.5.6 provided by Decision Patterns
# Create the dummy variables
df = dummy.data.frame(Prestige)
## Warning in model.matrix.default(~x - 1, model.frame(~x - 1), contrasts = FALSE):
## non-list contrasts argument ignored
# Check the dimensions
dim(df)
## [1] 102 9
# View the first 6 records
head(df)
## education income women prestige census typebc typeprof
## gov.administrators 13.11 12351 11.16 68.8 1113 0 1
## general.managers 12.26 25879 4.02 69.1 1130 0 1
## accountants 12.77 9271 15.70 63.4 1171 0 1
## purchasing.officers 11.42 8865 9.11 56.8 1175 0 1
## chemists 14.62 8403 11.68 73.5 2111 0 1
## physicists 15.64 11030 5.13 77.6 2113 0 1
## typewc typeNA
## gov.administrators 0 0
## general.managers 0 0
## accountants 0 0
## purchasing.officers 0 0
## chemists 0 0
## physicists 0 0
# View the last 6 records
tail(df)
## education income women prestige census typebc typeprof typewc
## train.engineers 8.49 8845 0.00 48.9 9131 1 0 0
## bus.drivers 7.58 5562 9.47 35.9 9171 1 0 0
## taxi.drivers 7.93 4224 3.59 25.1 9173 1 0 0
## longshoremen 8.37 4753 0.00 26.1 9313 1 0 0
## typesetters 10.00 6462 13.58 42.2 9511 1 0 0
## bookbinders 8.55 3617 70.87 35.2 9517 1 0 0
## typeNA
## train.engineers 0
## bus.drivers 0
## taxi.drivers 0
## longshoremen 0
## typesetters 0
## bookbinders 0
# View all records
df
## education income women prestige census typebc
## gov.administrators 13.11 12351 11.16 68.8 1113 0
## general.managers 12.26 25879 4.02 69.1 1130 0
## accountants 12.77 9271 15.70 63.4 1171 0
## purchasing.officers 11.42 8865 9.11 56.8 1175 0
## chemists 14.62 8403 11.68 73.5 2111 0
## physicists 15.64 11030 5.13 77.6 2113 0
## biologists 15.09 8258 25.65 72.6 2133 0
## architects 15.44 14163 2.69 78.1 2141 0
## civil.engineers 14.52 11377 1.03 73.1 2143 0
## mining.engineers 14.64 11023 0.94 68.8 2153 0
## surveyors 12.39 5902 1.91 62.0 2161 0
## draughtsmen 12.30 7059 7.83 60.0 2163 0
## computer.programers 13.83 8425 15.33 53.8 2183 0
## economists 14.44 8049 57.31 62.2 2311 0
## psychologists 14.36 7405 48.28 74.9 2315 0
## social.workers 14.21 6336 54.77 55.1 2331 0
## lawyers 15.77 19263 5.13 82.3 2343 0
## librarians 14.15 6112 77.10 58.1 2351 0
## vocational.counsellors 15.22 9593 34.89 58.3 2391 0
## ministers 14.50 4686 4.14 72.8 2511 0
## university.teachers 15.97 12480 19.59 84.6 2711 0
## primary.school.teachers 13.62 5648 83.78 59.6 2731 0
## secondary.school.teachers 15.08 8034 46.80 66.1 2733 0
## physicians 15.96 25308 10.56 87.2 3111 0
## veterinarians 15.94 14558 4.32 66.7 3115 0
## osteopaths.chiropractors 14.71 17498 6.91 68.4 3117 0
## nurses 12.46 4614 96.12 64.7 3131 0
## nursing.aides 9.45 3485 76.14 34.9 3135 1
## physio.therapsts 13.62 5092 82.66 72.1 3137 0
## pharmacists 15.21 10432 24.71 69.3 3151 0
## medical.technicians 12.79 5180 76.04 67.5 3156 0
## commercial.artists 11.09 6197 21.03 57.2 3314 0
## radio.tv.announcers 12.71 7562 11.15 57.6 3337 0
## athletes 11.44 8206 8.13 54.1 3373 0
## secretaries 11.59 4036 97.51 46.0 4111 0
## typists 11.49 3148 95.97 41.9 4113 0
## bookkeepers 11.32 4348 68.24 49.4 4131 0
## tellers.cashiers 10.64 2448 91.76 42.3 4133 0
## computer.operators 11.36 4330 75.92 47.7 4143 0
## shipping.clerks 9.17 4761 11.37 30.9 4153 0
## file.clerks 12.09 3016 83.19 32.7 4161 0
## receptionsts 11.04 2901 92.86 38.7 4171 0
## mail.carriers 9.22 5511 7.62 36.1 4172 0
## postal.clerks 10.07 3739 52.27 37.2 4173 0
## telephone.operators 10.51 3161 96.14 38.1 4175 0
## collectors 11.20 4741 47.06 29.4 4191 0
## claim.adjustors 11.13 5052 56.10 51.1 4192 0
## travel.clerks 11.43 6259 39.17 35.7 4193 0
## office.clerks 11.00 4075 63.23 35.6 4197 0
## sales.supervisors 9.84 7482 17.04 41.5 5130 0
## commercial.travellers 11.13 8780 3.16 40.2 5133 0
## sales.clerks 10.05 2594 67.82 26.5 5137 0
## newsboys 9.62 918 7.00 14.8 5143 0
## service.station.attendant 9.93 2370 3.69 23.3 5145 1
## insurance.agents 11.60 8131 13.09 47.3 5171 0
## real.estate.salesmen 11.09 6992 24.44 47.1 5172 0
## buyers 11.03 7956 23.88 51.1 5191 0
## firefighters 9.47 8895 0.00 43.5 6111 1
## policemen 10.93 8891 1.65 51.6 6112 1
## cooks 7.74 3116 52.00 29.7 6121 1
## bartenders 8.50 3930 15.51 20.2 6123 1
## funeral.directors 10.57 7869 6.01 54.9 6141 1
## babysitters 9.46 611 96.53 25.9 6147 0
## launderers 7.33 3000 69.31 20.8 6162 1
## janitors 7.11 3472 33.57 17.3 6191 1
## elevator.operators 7.58 3582 30.08 20.1 6193 1
## farmers 6.84 3643 3.60 44.1 7112 0
## farm.workers 8.60 1656 27.75 21.5 7182 1
## rotary.well.drillers 8.88 6860 0.00 35.3 7711 1
## bakers 7.54 4199 33.30 38.9 8213 1
## slaughterers.1 7.64 5134 17.26 25.2 8215 1
## slaughterers.2 7.64 5134 17.26 34.8 8215 1
## canners 7.42 1890 72.24 23.2 8221 1
## textile.weavers 6.69 4443 31.36 33.3 8267 1
## textile.labourers 6.74 3485 39.48 28.8 8278 1
## tool.die.makers 10.09 8043 1.50 42.5 8311 1
## machinists 8.81 6686 4.28 44.2 8313 1
## sheet.metal.workers 8.40 6565 2.30 35.9 8333 1
## welders 7.92 6477 5.17 41.8 8335 1
## auto.workers 8.43 5811 13.62 35.9 8513 1
## aircraft.workers 8.78 6573 5.78 43.7 8515 1
## electronic.workers 8.76 3942 74.54 50.8 8534 1
## radio.tv.repairmen 10.29 5449 2.92 37.2 8537 1
## sewing.mach.operators 6.38 2847 90.67 28.2 8563 1
## auto.repairmen 8.10 5795 0.81 38.1 8581 1
## aircraft.repairmen 10.10 7716 0.78 50.3 8582 1
## railway.sectionmen 6.67 4696 0.00 27.3 8715 1
## electrical.linemen 9.05 8316 1.34 40.9 8731 1
## electricians 9.93 7147 0.99 50.2 8733 1
## construction.foremen 8.24 8880 0.65 51.1 8780 1
## carpenters 6.92 5299 0.56 38.9 8781 1
## masons 6.60 5959 0.52 36.2 8782 1
## house.painters 7.81 4549 2.46 29.9 8785 1
## plumbers 8.33 6928 0.61 42.9 8791 1
## construction.labourers 7.52 3910 1.09 26.5 8798 1
## pilots 12.27 14032 0.58 66.1 9111 0
## train.engineers 8.49 8845 0.00 48.9 9131 1
## bus.drivers 7.58 5562 9.47 35.9 9171 1
## taxi.drivers 7.93 4224 3.59 25.1 9173 1
## longshoremen 8.37 4753 0.00 26.1 9313 1
## typesetters 10.00 6462 13.58 42.2 9511 1
## bookbinders 8.55 3617 70.87 35.2 9517 1
## typeprof typewc typeNA
## gov.administrators 1 0 0
## general.managers 1 0 0
## accountants 1 0 0
## purchasing.officers 1 0 0
## chemists 1 0 0
## physicists 1 0 0
## biologists 1 0 0
## architects 1 0 0
## civil.engineers 1 0 0
## mining.engineers 1 0 0
## surveyors 1 0 0
## draughtsmen 1 0 0
## computer.programers 1 0 0
## economists 1 0 0
## psychologists 1 0 0
## social.workers 1 0 0
## lawyers 1 0 0
## librarians 1 0 0
## vocational.counsellors 1 0 0
## ministers 1 0 0
## university.teachers 1 0 0
## primary.school.teachers 1 0 0
## secondary.school.teachers 1 0 0
## physicians 1 0 0
## veterinarians 1 0 0
## osteopaths.chiropractors 1 0 0
## nurses 1 0 0
## nursing.aides 0 0 0
## physio.therapsts 1 0 0
## pharmacists 1 0 0
## medical.technicians 0 1 0
## commercial.artists 1 0 0
## radio.tv.announcers 0 1 0
## athletes 0 0 1
## secretaries 0 1 0
## typists 0 1 0
## bookkeepers 0 1 0
## tellers.cashiers 0 1 0
## computer.operators 0 1 0
## shipping.clerks 0 1 0
## file.clerks 0 1 0
## receptionsts 0 1 0
## mail.carriers 0 1 0
## postal.clerks 0 1 0
## telephone.operators 0 1 0
## collectors 0 1 0
## claim.adjustors 0 1 0
## travel.clerks 0 1 0
## office.clerks 0 1 0
## sales.supervisors 0 1 0
## commercial.travellers 0 1 0
## sales.clerks 0 1 0
## newsboys 0 0 1
## service.station.attendant 0 0 0
## insurance.agents 0 1 0
## real.estate.salesmen 0 1 0
## buyers 0 1 0
## firefighters 0 0 0
## policemen 0 0 0
## cooks 0 0 0
## bartenders 0 0 0
## funeral.directors 0 0 0
## babysitters 0 0 1
## launderers 0 0 0
## janitors 0 0 0
## elevator.operators 0 0 0
## farmers 0 0 1
## farm.workers 0 0 0
## rotary.well.drillers 0 0 0
## bakers 0 0 0
## slaughterers.1 0 0 0
## slaughterers.2 0 0 0
## canners 0 0 0
## textile.weavers 0 0 0
## textile.labourers 0 0 0
## tool.die.makers 0 0 0
## machinists 0 0 0
## sheet.metal.workers 0 0 0
## welders 0 0 0
## auto.workers 0 0 0
## aircraft.workers 0 0 0
## electronic.workers 0 0 0
## radio.tv.repairmen 0 0 0
## sewing.mach.operators 0 0 0
## auto.repairmen 0 0 0
## aircraft.repairmen 0 0 0
## railway.sectionmen 0 0 0
## electrical.linemen 0 0 0
## electricians 0 0 0
## construction.foremen 0 0 0
## carpenters 0 0 0
## masons 0 0 0
## house.painters 0 0 0
## plumbers 0 0 0
## construction.labourers 0 0 0
## pilots 1 0 0
## train.engineers 0 0 0
## bus.drivers 0 0 0
## taxi.drivers 0 0 0
## longshoremen 0 0 0
## typesetters 0 0 0
## bookbinders 0 0 0
Final analysis
Reviewing the output above, we see that the data frame went from 6 to 9 variables: the single column type was replaced by four new columns (typebc, typeprof, typewc and typeNA), a net gain of 3. The last one, typeNA, has 4 observations equal to 1, which correspond to the 4 NA values in type. The dummy.data.frame function creates these new variables for every categorical column and fills them with zeros (0) and ones (1), where a 1 marks the category the observation belongs to. With this transformation in place, the analysis can continue.
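The same encoding can be sketched in base R, without the dummies package. This is only an illustration under stated assumptions: `demo`, `indicators` and `demo_dummies` are hypothetical names, and the data is a four-row subset copied by hand from the output above, not the full Prestige set.

```r
# Small hand-copied subset of the rows printed above (one per type, plus one NA)
demo <- data.frame(
  income = c(12351, 3485, 5180, 8206),
  type   = factor(c("prof", "bc", "wc", NA), levels = c("bc", "prof", "wc")),
  row.names = c("gov.administrators", "nursing.aides",
                "medical.technicians", "athletes")
)

# One 0/1 indicator column per level of the factor
indicators <- sapply(levels(demo$type), function(lv) {
  as.integer(!is.na(demo$type) & demo$type == lv)
})
colnames(indicators) <- paste0("type", colnames(indicators))

# Extra indicator flagging the rows where type is missing
typeNA <- as.integer(is.na(demo$type))

# Replace the original factor with its indicator columns
demo_dummies <- cbind(demo["income"], indicators, typeNA = typeNA)
print(demo_dummies)
```

Each row has exactly one 1 across the four indicator columns, and the column count follows the same arithmetic as in the full data: the original columns, minus the factor, plus four indicators.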
Note: in R, most regression and classification functions perform this dummy encoding automatically when a predictor is stored as a factor.
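As a small illustration of that automatic encoding, the sketch below fits lm() with type passed directly as a factor and then inspects the design matrix the model built. The data frame `demo` is a hypothetical nine-row subset copied by hand from the records printed above, so the coefficients it yields are not those of the full dataset.

```r
# Small hand-copied subset of the Prestige rows printed above
demo <- data.frame(
  prestige  = c(68.8, 69.1, 63.4, 34.9, 23.3, 43.5, 67.5, 57.6, 46.0),
  education = c(13.11, 12.26, 12.77, 9.45, 9.93, 9.47, 12.79, 12.71, 11.59),
  type      = factor(rep(c("prof", "bc", "wc"), each = 3),
                     levels = c("bc", "prof", "wc"))
)

# The factor is passed as-is; lm() builds the indicator columns internally
fit <- lm(prestige ~ education + type, data = demo)

# The design matrix shows the automatically created dummy columns.
# The first level ("bc") acts as the reference and gets no column of its own.
design <- model.matrix(fit)
print(colnames(design))
print(coef(fit))
```

Unlike dummy.data.frame, which created one column per level, the default treatment contrasts in lm() leave out the reference level so that the indicators are not collinear with the intercept. With its default settings, lm() also drops rows that have NA in any model variable, so on the full data the 4 occupations without a type would be excluded from the fit.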