VARIABLES IN THE DATASET
age: continuous.
workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.
fnlwgt: continuous.
education: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.
education-num: continuous.
marital-status: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.
occupation: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.
relationship: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.
race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.
sex: Female, Male.
capital-gain: continuous.
capital-loss: continuous.
hours-per-week: continuous.
native-country: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands.
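# Start from a clean workspace
rm(list = ls())
library(data.table)
# UCI Adult (census income) data; the file has no header row
urlData <- "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
data <- fread(urlData, header = FALSE)
# Spanish column names, in the same order as the variables listed above
columnas <- c('edad','tipo trabajo','fnlwgt','educación','num-educación','Estatus Matrimonial','ocupación','realación',
'raza','sexo','ganancia-capital','perdida-capital',
'horas por semana','Nacimiento','ingresos')
data <- data.frame(data)
colnames(data) <- columnas
# Inspect the structure of the resulting data frame
str(data)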
## 'data.frame': 32561 obs. of 15 variables:
## $ edad : int 39 50 38 53 28 37 49 52 31 42 ...
## $ tipo trabajo : chr "State-gov" "Self-emp-not-inc" "Private" "Private" ...
## $ fnlwgt : int 77516 83311 215646 234721 338409 284582 160187 209642 45781 159449 ...
## $ educación : chr "Bachelors" "Bachelors" "HS-grad" "11th" ...
## $ num-educación : int 13 13 9 7 13 14 5 9 14 13 ...
## $ Estatus Matrimonial: chr "Never-married" "Married-civ-spouse" "Divorced" "Married-civ-spouse" ...
## $ ocupación : chr "Adm-clerical" "Exec-managerial" "Handlers-cleaners" "Handlers-cleaners" ...
## $ realación : chr "Not-in-family" "Husband" "Not-in-family" "Husband" ...
## $ raza : chr "White" "White" "White" "Black" ...
## $ sexo : chr "Male" "Male" "Male" "Male" ...
## $ ganancia-capital : int 2174 0 0 0 0 0 0 0 14084 5178 ...
## $ perdida-capital : int 0 0 0 0 0 0 0 0 0 0 ...
## $ horas por semana : int 40 13 40 40 40 40 16 45 50 40 ...
## $ Nacimiento : chr "United-States" "United-States" "United-States" "United-States" ...
## $ ingresos : chr "<=50K" "<=50K" "<=50K" "<=50K" ...
INCOME IN THE SAMPLE.
## [1] "Estimated population with an income below 50K per year 24720"
## [1] "Estimated population with an income above 50K per year 7841"
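The code that produced the two counts above is not shown. As a minimal sketch of how they might be obtained, assuming the income labels are exactly "<=50K" and ">50K" as in the str(data) output, one could count matching rows in the ingresos column. The helper contar_ingresos and the toy data frame ejemplo are hypothetical names introduced here for illustration only.

```r
# Hypothetical helper: count records in each income class.
# Assumes a character column named 'ingresos' with labels "<=50K" and ">50K".
contar_ingresos <- function(df) {
  c(menor_igual_50K = sum(df[["ingresos"]] == "<=50K"),
    mayor_50K       = sum(df[["ingresos"]] == ">50K"))
}

# Toy data frame (made-up rows, only to keep the sketch self-contained)
ejemplo <- data.frame(ingresos = c("<=50K", "<=50K", ">50K", "<=50K"),
                      stringsAsFactors = FALSE)

res <- contar_ingresos(ejemplo)
print(paste("Records with income <=50K:", res[["menor_igual_50K"]]))
print(paste("Records with income >50K:", res[["mayor_50K"]]))
```

Calling contar_ingresos(data) on the full data frame should then reproduce the 24720 and 7841 records reported above, provided the labels carry no stray whitespace.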
Further information can be derived from the dataset, for example income by occupation. Exercise: develop any information you find relevant.
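As one possible starting point for the income-by-occupation example, the sketch below computes, for each occupation, the share of records labelled ">50K". It is only an illustration under stated assumptions: the helper proporcion_mayor_50K and the toy data frame ejemplo are hypothetical, and the default column names are the ones assigned through columnas.

```r
# Hypothetical helper: share of records with income >50K within each group.
# 'grupo' and 'ingreso' default to the column names assigned via 'columnas'.
proporcion_mayor_50K <- function(df, grupo = "ocupación", ingreso = "ingresos") {
  # The mean of a logical vector is the proportion of TRUE values per group
  prop <- tapply(df[[ingreso]] == ">50K", df[[grupo]], mean)
  sort(prop, decreasing = TRUE)
}

# Toy rows (made up, not taken from the dataset)
ejemplo <- data.frame(
  c("Sales", "Sales", "Tech-support", "Tech-support", "Tech-support", "Sales"),
  c(">50K", "<=50K", ">50K", ">50K", "<=50K", "<=50K"),
  stringsAsFactors = FALSE
)
names(ejemplo) <- c("ocupación", "ingresos")

prop <- proporcion_mayor_50K(ejemplo)
print(round(prop, 2))
```

On the real data the call would be proporcion_mayor_50K(data). Passing a different column name as grupo, for instance "educación" or "sexo", gives the same breakdown for another variable.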