Multiple linear regression exercise: the goal is to use the “Prestige” dataset, which contains the results of a study carried out in Canada on the prestige of occupations.

Prestige of Canadian occupations

In this case study we work with the “Prestige” dataset, which ships with the carData package and is loaded automatically by the car package.

Description

The Prestige dataset has 102 observations and 6 columns. Each observation is an occupation. The dataset contains the following columns:

  • education: Average education of occupational incumbents, in years.

  • income: Average income of incumbents, in dollars.

  • women: Percentage of incumbents who are women.

  • prestige: Prestige score of the occupation, from a social survey conducted in the mid-1960s.

  • census: Canadian Census occupational code.

  • type: Type of occupation. A factor with levels: bc, Blue Collar; prof, Professional, Managerial, and Technical; wc, White Collar.

2.1. Loading the data

# We use the Prestige data, available through the car package (stored in carData)
library(car)
## Loading required package: carData
# Load the data
data(Prestige)
# Show the first 6 rows (the default for head())
head(Prestige)
##                     education income women prestige census type
## gov.administrators      13.11  12351 11.16     68.8   1113 prof
## general.managers        12.26  25879  4.02     69.1   1130 prof
## accountants             12.77   9271 15.70     63.4   1171 prof
## purchasing.officers     11.42   8865  9.11     56.8   1175 prof
## chemists                14.62   8403 11.68     73.5   2111 prof
## physicists              15.64  11030  5.13     77.6   2113 prof

2.2. Exploring the main characteristics of the data

# Inspect the structure
str(Prestige)
## 'data.frame':    102 obs. of  6 variables:
##  $ education: num  13.1 12.3 12.8 11.4 14.6 ...
##  $ income   : int  12351 25879 9271 8865 8403 11030 8258 14163 11377 11023 ...
##  $ women    : num  11.16 4.02 15.7 9.11 11.68 ...
##  $ prestige : num  68.8 69.1 63.4 56.8 73.5 77.6 72.6 78.1 73.1 68.8 ...
##  $ census   : int  1113 1130 1171 1175 2111 2113 2133 2141 2143 2153 ...
##  $ type     : Factor w/ 3 levels "bc","prof","wc": 2 2 2 2 2 2 2 2 2 2 ...
# Summary statistics
summary(Prestige)
##    education          income          women           prestige    
##  Min.   : 6.380   Min.   :  611   Min.   : 0.000   Min.   :14.80  
##  1st Qu.: 8.445   1st Qu.: 4106   1st Qu.: 3.592   1st Qu.:35.23  
##  Median :10.540   Median : 5930   Median :13.600   Median :43.60  
##  Mean   :10.738   Mean   : 6798   Mean   :28.979   Mean   :46.83  
##  3rd Qu.:12.648   3rd Qu.: 8187   3rd Qu.:52.203   3rd Qu.:59.27  
##  Max.   :15.970   Max.   :25879   Max.   :97.510   Max.   :87.20  
##      census       type   
##  Min.   :1113   bc  :44  
##  1st Qu.:3120   prof:31  
##  Median :5135   wc  :23  
##  Mean   :5402   NA's: 4  
##  3rd Qu.:8312            
##  Max.   :9517
# Column names
colnames(Prestige)
## [1] "education" "income"    "women"     "prestige"  "census"    "type"
# Frequency table of the type variable
table(Prestige$type)
## 
##   bc prof   wc 
##   44   31   23
# Bar plot of the type variable
plot(Prestige$type)

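These counts add up to 98 (44 + 31 + 23), so 4 of the 102 occupations have a missing (NA) type, which agrees with the NA's: 4 line in the output of summary().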
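To find out which occupations have a missing type, is.na() can be combined with table(..., useNA = "ifany"), which counts NA as a category of its own. On the full data this would be Prestige[is.na(Prestige$type), ]. So that the sketch runs without carData, it uses a small hypothetical excerpt (toy) whose values are copied from rows printed in this document.

```r
# Small excerpt of Prestige; values copied from the printed output in this document.
# "toy" is a hypothetical name used only for this sketch.
toy <- data.frame(
  education = c(13.11, 11.44, 9.62, 9.46, 6.84, 9.45),
  income    = c(12351, 8206, 918, 611, 3643, 3485),
  prestige  = c(68.8, 54.1, 14.8, 25.9, 44.1, 34.9),
  type      = factor(c("prof", NA, NA, NA, NA, "bc"),
                     levels = c("bc", "prof", "wc")),
  row.names = c("gov.administrators", "athletes", "newsboys",
                "babysitters", "farmers", "nursing.aides")
)

# Count every level, including missing values, as a separate category
type_counts <- table(toy$type, useNA = "ifany")
print(type_counts)

# Names of the occupations whose type is missing
missing_type <- rownames(toy)[is.na(toy$type)]
print(missing_type)
```

In the full dataset, the typeNA column created in section 2.3 marks these same four occupations: athletes, newsboys, babysitters and farmers.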

2.3. Dummy variables

We transform the categorical variable type into dummy (indicator) variables.
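The idea behind this transformation can be sketched in base R without the dummies package: compare the factor against each of its levels and store each result as a 0/1 column. The helper make_dummies is hypothetical. The extra column for missing values imitates the typeNA column that dummy.data.frame() produces in the output below.

```r
# Hypothetical helper: one 0/1 column per factor level, plus one column for NA,
# imitating the typebc/typeprof/typewc/typeNA columns created by dummy.data.frame()
make_dummies <- function(x, prefix) {
  stopifnot(is.factor(x))
  # !is.na(x) & x == lv is FALSE (not NA) for missing values, so they get 0 here
  cols <- lapply(levels(x), function(lv) as.integer(!is.na(x) & x == lv))
  names(cols) <- paste0(prefix, levels(x))
  if (anyNA(x)) {
    cols[[paste0(prefix, "NA")]] <- as.integer(is.na(x))
  }
  as.data.frame(cols)
}

type <- factor(c("prof", "bc", NA, "wc", "bc"), levels = c("bc", "prof", "wc"))
d <- make_dummies(type, "type")
print(d)
```

Each row has exactly one 1 across the indicator columns, which is also the pattern visible in the dummy.data.frame() output.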

#install.packages("dummies")
library(dummies)
## dummies-1.5.6 provided by Decision Patterns
# Create the dummy variables
df = dummy.data.frame(Prestige)
## Warning in model.matrix.default(~x - 1, model.frame(~x - 1), contrasts = FALSE):
## non-list contrasts argument ignored
# Check the dimensions
dim(df)
## [1] 102   9
# Show the first 6 rows
head(df)
##                     education income women prestige census typebc typeprof
## gov.administrators      13.11  12351 11.16     68.8   1113      0        1
## general.managers        12.26  25879  4.02     69.1   1130      0        1
## accountants             12.77   9271 15.70     63.4   1171      0        1
## purchasing.officers     11.42   8865  9.11     56.8   1175      0        1
## chemists                14.62   8403 11.68     73.5   2111      0        1
## physicists              15.64  11030  5.13     77.6   2113      0        1
##                     typewc typeNA
## gov.administrators       0      0
## general.managers         0      0
## accountants              0      0
## purchasing.officers      0      0
## chemists                 0      0
## physicists               0      0
# Show the last 6 rows
tail(df)
##                 education income women prestige census typebc typeprof typewc
## train.engineers      8.49   8845  0.00     48.9   9131      1        0      0
## bus.drivers          7.58   5562  9.47     35.9   9171      1        0      0
## taxi.drivers         7.93   4224  3.59     25.1   9173      1        0      0
## longshoremen         8.37   4753  0.00     26.1   9313      1        0      0
## typesetters         10.00   6462 13.58     42.2   9511      1        0      0
## bookbinders          8.55   3617 70.87     35.2   9517      1        0      0
##                 typeNA
## train.engineers      0
## bus.drivers          0
## taxi.drivers         0
## longshoremen         0
## typesetters          0
## bookbinders          0
# Print all rows
df
##                           education income women prestige census typebc
## gov.administrators            13.11  12351 11.16     68.8   1113      0
## general.managers              12.26  25879  4.02     69.1   1130      0
## accountants                   12.77   9271 15.70     63.4   1171      0
## purchasing.officers           11.42   8865  9.11     56.8   1175      0
## chemists                      14.62   8403 11.68     73.5   2111      0
## physicists                    15.64  11030  5.13     77.6   2113      0
## biologists                    15.09   8258 25.65     72.6   2133      0
## architects                    15.44  14163  2.69     78.1   2141      0
## civil.engineers               14.52  11377  1.03     73.1   2143      0
## mining.engineers              14.64  11023  0.94     68.8   2153      0
## surveyors                     12.39   5902  1.91     62.0   2161      0
## draughtsmen                   12.30   7059  7.83     60.0   2163      0
## computer.programers           13.83   8425 15.33     53.8   2183      0
## economists                    14.44   8049 57.31     62.2   2311      0
## psychologists                 14.36   7405 48.28     74.9   2315      0
## social.workers                14.21   6336 54.77     55.1   2331      0
## lawyers                       15.77  19263  5.13     82.3   2343      0
## librarians                    14.15   6112 77.10     58.1   2351      0
## vocational.counsellors        15.22   9593 34.89     58.3   2391      0
## ministers                     14.50   4686  4.14     72.8   2511      0
## university.teachers           15.97  12480 19.59     84.6   2711      0
## primary.school.teachers       13.62   5648 83.78     59.6   2731      0
## secondary.school.teachers     15.08   8034 46.80     66.1   2733      0
## physicians                    15.96  25308 10.56     87.2   3111      0
## veterinarians                 15.94  14558  4.32     66.7   3115      0
## osteopaths.chiropractors      14.71  17498  6.91     68.4   3117      0
## nurses                        12.46   4614 96.12     64.7   3131      0
## nursing.aides                  9.45   3485 76.14     34.9   3135      1
## physio.therapsts              13.62   5092 82.66     72.1   3137      0
## pharmacists                   15.21  10432 24.71     69.3   3151      0
## medical.technicians           12.79   5180 76.04     67.5   3156      0
## commercial.artists            11.09   6197 21.03     57.2   3314      0
## radio.tv.announcers           12.71   7562 11.15     57.6   3337      0
## athletes                      11.44   8206  8.13     54.1   3373      0
## secretaries                   11.59   4036 97.51     46.0   4111      0
## typists                       11.49   3148 95.97     41.9   4113      0
## bookkeepers                   11.32   4348 68.24     49.4   4131      0
## tellers.cashiers              10.64   2448 91.76     42.3   4133      0
## computer.operators            11.36   4330 75.92     47.7   4143      0
## shipping.clerks                9.17   4761 11.37     30.9   4153      0
## file.clerks                   12.09   3016 83.19     32.7   4161      0
## receptionsts                  11.04   2901 92.86     38.7   4171      0
## mail.carriers                  9.22   5511  7.62     36.1   4172      0
## postal.clerks                 10.07   3739 52.27     37.2   4173      0
## telephone.operators           10.51   3161 96.14     38.1   4175      0
## collectors                    11.20   4741 47.06     29.4   4191      0
## claim.adjustors               11.13   5052 56.10     51.1   4192      0
## travel.clerks                 11.43   6259 39.17     35.7   4193      0
## office.clerks                 11.00   4075 63.23     35.6   4197      0
## sales.supervisors              9.84   7482 17.04     41.5   5130      0
## commercial.travellers         11.13   8780  3.16     40.2   5133      0
## sales.clerks                  10.05   2594 67.82     26.5   5137      0
## newsboys                       9.62    918  7.00     14.8   5143      0
## service.station.attendant      9.93   2370  3.69     23.3   5145      1
## insurance.agents              11.60   8131 13.09     47.3   5171      0
## real.estate.salesmen          11.09   6992 24.44     47.1   5172      0
## buyers                        11.03   7956 23.88     51.1   5191      0
## firefighters                   9.47   8895  0.00     43.5   6111      1
## policemen                     10.93   8891  1.65     51.6   6112      1
## cooks                          7.74   3116 52.00     29.7   6121      1
## bartenders                     8.50   3930 15.51     20.2   6123      1
## funeral.directors             10.57   7869  6.01     54.9   6141      1
## babysitters                    9.46    611 96.53     25.9   6147      0
## launderers                     7.33   3000 69.31     20.8   6162      1
## janitors                       7.11   3472 33.57     17.3   6191      1
## elevator.operators             7.58   3582 30.08     20.1   6193      1
## farmers                        6.84   3643  3.60     44.1   7112      0
## farm.workers                   8.60   1656 27.75     21.5   7182      1
## rotary.well.drillers           8.88   6860  0.00     35.3   7711      1
## bakers                         7.54   4199 33.30     38.9   8213      1
## slaughterers.1                 7.64   5134 17.26     25.2   8215      1
## slaughterers.2                 7.64   5134 17.26     34.8   8215      1
## canners                        7.42   1890 72.24     23.2   8221      1
## textile.weavers                6.69   4443 31.36     33.3   8267      1
## textile.labourers              6.74   3485 39.48     28.8   8278      1
## tool.die.makers               10.09   8043  1.50     42.5   8311      1
## machinists                     8.81   6686  4.28     44.2   8313      1
## sheet.metal.workers            8.40   6565  2.30     35.9   8333      1
## welders                        7.92   6477  5.17     41.8   8335      1
## auto.workers                   8.43   5811 13.62     35.9   8513      1
## aircraft.workers               8.78   6573  5.78     43.7   8515      1
## electronic.workers             8.76   3942 74.54     50.8   8534      1
## radio.tv.repairmen            10.29   5449  2.92     37.2   8537      1
## sewing.mach.operators          6.38   2847 90.67     28.2   8563      1
## auto.repairmen                 8.10   5795  0.81     38.1   8581      1
## aircraft.repairmen            10.10   7716  0.78     50.3   8582      1
## railway.sectionmen             6.67   4696  0.00     27.3   8715      1
## electrical.linemen             9.05   8316  1.34     40.9   8731      1
## electricians                   9.93   7147  0.99     50.2   8733      1
## construction.foremen           8.24   8880  0.65     51.1   8780      1
## carpenters                     6.92   5299  0.56     38.9   8781      1
## masons                         6.60   5959  0.52     36.2   8782      1
## house.painters                 7.81   4549  2.46     29.9   8785      1
## plumbers                       8.33   6928  0.61     42.9   8791      1
## construction.labourers         7.52   3910  1.09     26.5   8798      1
## pilots                        12.27  14032  0.58     66.1   9111      0
## train.engineers                8.49   8845  0.00     48.9   9131      1
## bus.drivers                    7.58   5562  9.47     35.9   9171      1
## taxi.drivers                   7.93   4224  3.59     25.1   9173      1
## longshoremen                   8.37   4753  0.00     26.1   9313      1
## typesetters                   10.00   6462 13.58     42.2   9511      1
## bookbinders                    8.55   3617 70.87     35.2   9517      1
##                           typeprof typewc typeNA
## gov.administrators               1      0      0
## general.managers                 1      0      0
## accountants                      1      0      0
## purchasing.officers              1      0      0
## chemists                         1      0      0
## physicists                       1      0      0
## biologists                       1      0      0
## architects                       1      0      0
## civil.engineers                  1      0      0
## mining.engineers                 1      0      0
## surveyors                        1      0      0
## draughtsmen                      1      0      0
## computer.programers              1      0      0
## economists                       1      0      0
## psychologists                    1      0      0
## social.workers                   1      0      0
## lawyers                          1      0      0
## librarians                       1      0      0
## vocational.counsellors           1      0      0
## ministers                        1      0      0
## university.teachers              1      0      0
## primary.school.teachers          1      0      0
## secondary.school.teachers        1      0      0
## physicians                       1      0      0
## veterinarians                    1      0      0
## osteopaths.chiropractors         1      0      0
## nurses                           1      0      0
## nursing.aides                    0      0      0
## physio.therapsts                 1      0      0
## pharmacists                      1      0      0
## medical.technicians              0      1      0
## commercial.artists               1      0      0
## radio.tv.announcers              0      1      0
## athletes                         0      0      1
## secretaries                      0      1      0
## typists                          0      1      0
## bookkeepers                      0      1      0
## tellers.cashiers                 0      1      0
## computer.operators               0      1      0
## shipping.clerks                  0      1      0
## file.clerks                      0      1      0
## receptionsts                     0      1      0
## mail.carriers                    0      1      0
## postal.clerks                    0      1      0
## telephone.operators              0      1      0
## collectors                       0      1      0
## claim.adjustors                  0      1      0
## travel.clerks                    0      1      0
## office.clerks                    0      1      0
## sales.supervisors                0      1      0
## commercial.travellers            0      1      0
## sales.clerks                     0      1      0
## newsboys                         0      0      1
## service.station.attendant        0      0      0
## insurance.agents                 0      1      0
## real.estate.salesmen             0      1      0
## buyers                           0      1      0
## firefighters                     0      0      0
## policemen                        0      0      0
## cooks                            0      0      0
## bartenders                       0      0      0
## funeral.directors                0      0      0
## babysitters                      0      0      1
## launderers                       0      0      0
## janitors                         0      0      0
## elevator.operators               0      0      0
## farmers                          0      0      1
## farm.workers                     0      0      0
## rotary.well.drillers             0      0      0
## bakers                           0      0      0
## slaughterers.1                   0      0      0
## slaughterers.2                   0      0      0
## canners                          0      0      0
## textile.weavers                  0      0      0
## textile.labourers                0      0      0
## tool.die.makers                  0      0      0
## machinists                       0      0      0
## sheet.metal.workers              0      0      0
## welders                          0      0      0
## auto.workers                     0      0      0
## aircraft.workers                 0      0      0
## electronic.workers               0      0      0
## radio.tv.repairmen               0      0      0
## sewing.mach.operators            0      0      0
## auto.repairmen                   0      0      0
## aircraft.repairmen               0      0      0
## railway.sectionmen               0      0      0
## electrical.linemen               0      0      0
## electricians                     0      0      0
## construction.foremen             0      0      0
## carpenters                       0      0      0
## masons                           0      0      0
## house.painters                   0      0      0
## plumbers                         0      0      0
## construction.labourers           0      0      0
## pilots                           1      0      0
## train.engineers                  0      0      0
## bus.drivers                      0      0      0
## taxi.drivers                     0      0      0
## longshoremen                     0      0      0
## typesetters                      0      0      0
## bookbinders                      0      0      0

Final analysis

The output above shows that the original type column has been replaced by four indicator columns: typebc, typeprof, typewc and typeNA. The data frame therefore goes from 6 to 9 columns (6 - 1 + 4 = 9), a net gain of 3. The last column, typeNA, has a 1 in exactly 4 observations, which correspond to the 4 missing values of type. For each categorical variable, dummy.data.frame() creates one column per level and fills it with ones (1) and zeros (0), where a 1 marks the level the observation belongs to. With this transformation done, the analysis can continue.
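Because every row has a 1 in exactly one of these columns, the indicators always add up to 1. If all of them were used as predictors together with an intercept, the design matrix would be perfectly collinear (the so-called dummy-variable trap); the usual remedy is to drop one level and treat it as the reference. The sketch below, on a small hypothetical factor, uses base R's model.matrix() to show both the full indicator set and the reduced one that R builds when an intercept is present.

```r
type <- factor(c("prof", "bc", "wc", "bc", "prof"), levels = c("bc", "prof", "wc"))

# Full set of indicators (no intercept): one column per level
full <- model.matrix(~ type - 1)
print(full)
# Each row's indicators add up to 1, i.e. they reproduce an intercept column
print(rowSums(full))

# With an intercept, model.matrix() drops the first level ("bc") as the reference
with_intercept <- model.matrix(~ type)
print(colnames(with_intercept))

# Intercept plus the full indicator set is rank-deficient: 4 columns, rank 3
trap <- cbind(1, full)
print(qr(trap)$rank)
```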

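Note: many regression and classification functions create the dummy variables automatically. In R, for example, lm() and glm() expand a factor into indicator columns internally, so this manual step is not strictly required when using them.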
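A minimal sketch of that automatic encoding, using a hypothetical 10-row excerpt whose values are copied from the table printed above (athletes keeps its missing type). The fit is purely illustrative and is not a model of the full dataset: lm() receives the factor type directly and returns coefficients for typeprof and typewc, with bc as the reference level.

```r
# Hypothetical excerpt of Prestige; values copied from the printed table above
toy <- data.frame(
  education = c(13.11, 14.62, 15.77, 11.59, 11.49, 11.32, 7.74, 7.54, 8.33, 11.44),
  prestige  = c(68.8, 73.5, 82.3, 46.0, 41.9, 49.4, 29.7, 38.9, 42.9, 54.1),
  type      = factor(c("prof", "prof", "prof", "wc", "wc", "wc", "bc", "bc", "bc", NA),
                     levels = c("bc", "prof", "wc"))
)

# lm() expands the factor 'type' into indicator columns on its own;
# bc (the first level) becomes the reference category
fit <- lm(prestige ~ education + type, data = toy)
print(coef(fit))

# The row with a missing type is dropped by the default na.action (usually na.omit)
print(nobs(fit))
```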