Data source: https://archive.ics.uci.edu/dataset/2/adult
Also known as Census Income.
It was published in the UCI Machine Learning Repository by Barry Becker in 1996, based on data from the 1994 US census.
It has 48,842 instances and 14 attributes (plus the income target, hence the 15 columns below).
Sys.setlocale("LC_ALL", "es_ES.UTF-8")
## [1] "LC_COLLATE=es_ES.UTF-8;LC_CTYPE=es_ES.UTF-8;LC_MONETARY=es_ES.UTF-8;LC_NUMERIC=C;LC_TIME=es_ES.UTF-8"
adult1 <- read.csv("~/rstudio/adult.data", header=FALSE,na.strings=" ?")
adult2 <- read.csv("~/rstudio/adult.test", header=FALSE,na.strings=" ?")
df <- rbind(adult1,adult2)
str(df)
## 'data.frame': 48842 obs. of 15 variables:
## $ V1 : int 39 50 38 53 28 37 49 52 31 42 ...
## $ V2 : chr " State-gov" " Self-emp-not-inc" " Private" " Private" ...
## $ V3 : int 77516 83311 215646 234721 338409 284582 160187 209642 45781 159449 ...
## $ V4 : chr " Bachelors" " Bachelors" " HS-grad" " 11th" ...
## $ V5 : int 13 13 9 7 13 14 5 9 14 13 ...
## $ V6 : chr " Never-married" " Married-civ-spouse" " Divorced" " Married-civ-spouse" ...
## $ V7 : chr " Adm-clerical" " Exec-managerial" " Handlers-cleaners" " Handlers-cleaners" ...
## $ V8 : chr " Not-in-family" " Husband" " Not-in-family" " Husband" ...
## $ V9 : chr " White" " White" " White" " Black" ...
## $ V10: chr " Male" " Male" " Male" " Male" ...
## $ V11: int 2174 0 0 0 0 0 0 0 14084 5178 ...
## $ V12: int 0 0 0 0 0 0 0 0 0 0 ...
## $ V13: int 40 13 40 40 40 40 16 45 50 40 ...
## $ V14: chr " United-States" " United-States" " United-States" " United-States" ...
## $ V15: chr " <=50K" " <=50K" " <=50K" " <=50K" ...
head(df)
## V1 V2 V3 V4 V5 V6
## 1 39 State-gov 77516 Bachelors 13 Never-married
## 2 50 Self-emp-not-inc 83311 Bachelors 13 Married-civ-spouse
## 3 38 Private 215646 HS-grad 9 Divorced
## 4 53 Private 234721 11th 7 Married-civ-spouse
## 5 28 Private 338409 Bachelors 13 Married-civ-spouse
## 6 37 Private 284582 Masters 14 Married-civ-spouse
## V7 V8 V9 V10 V11 V12 V13 V14
## 1 Adm-clerical Not-in-family White Male 2174 0 40 United-States
## 2 Exec-managerial Husband White Male 0 0 13 United-States
## 3 Handlers-cleaners Not-in-family White Male 0 0 40 United-States
## 4 Handlers-cleaners Husband Black Male 0 0 40 United-States
## 5 Prof-specialty Wife Black Female 0 0 40 Cuba
## 6 Exec-managerial Wife White Female 0 0 40 United-States
## V15
## 1 <=50K
## 2 <=50K
## 3 <=50K
## 4 <=50K
## 5 <=50K
## 6 <=50K
We load the dataset from its two tables. Since applying machine learning techniques is not our goal, we stack the train and test records into a single data frame with all the information. The columns also lack meaningful names, so we fix that next.
cols <- c("Age",
"Workclass",
"Fnlwgt",
"Education",
"Education-num",
"Marital-status",
"Occupation",
"Relationship",
"Race",
"Sex",
"Capital-gain",
"Capital-loss",
"Hours-per-week",
"Native-country",
"Income"
)
colnames(df) <- cols
head(df)
## Age Workclass Fnlwgt Education Education-num Marital-status
## 1 39 State-gov 77516 Bachelors 13 Never-married
## 2 50 Self-emp-not-inc 83311 Bachelors 13 Married-civ-spouse
## 3 38 Private 215646 HS-grad 9 Divorced
## 4 53 Private 234721 11th 7 Married-civ-spouse
## 5 28 Private 338409 Bachelors 13 Married-civ-spouse
## 6 37 Private 284582 Masters 14 Married-civ-spouse
## Occupation Relationship Race Sex Capital-gain Capital-loss
## 1 Adm-clerical Not-in-family White Male 2174 0
## 2 Exec-managerial Husband White Male 0 0
## 3 Handlers-cleaners Not-in-family White Male 0 0
## 4 Handlers-cleaners Husband Black Male 0 0
## 5 Prof-specialty Wife Black Female 0 0
## 6 Exec-managerial Wife White Female 0 0
## Hours-per-week Native-country Income
## 1 40 United-States <=50K
## 2 13 United-States <=50K
## 3 40 United-States <=50K
## 4 40 United-States <=50K
## 5 40 Cuba <=50K
## 6 40 United-States <=50K
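The hyphenated names assigned above are not syntactic R names, so they must be wrapped in backticks when used with `$`, and the lookup is case-sensitive. The following is a minimal sketch on a toy data frame (the `toy` object and the underscore renaming are illustrative assumptions, not part of the original analysis):

```r
# Toy data frame reusing three of the column names assigned above.
toy <- data.frame(a = 1:3, b = c(0L, 5L, 0L), c = c(40L, 13L, 40L))
colnames(toy) <- c("Age", "Capital-gain", "Hours-per-week")

# Non-syntactic names must be quoted with backticks when used with `$`.
gain_backtick <- toy$`Capital-gain`

# Lookups are case-sensitive: the wrong case silently returns NULL.
wrong_case <- toy$`capital-gain`

# Hypothetical alternative: replace hyphens so the names become syntactic.
syntactic_names <- gsub("-", "_", colnames(toy), fixed = TRUE)
print(syntactic_names)
print(is.null(wrong_case))
```

Keeping the hyphenated names is fine as long as every later reference uses backticks and the exact capitalization.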
We also note that R infers the correct type for the numeric variables, except for Education-num, which is categorical despite being stored as a number. We convert it, together with the character columns, to factors. Note that Income is not in the list below, so it remains a character column, as the later `summary` output shows.
factores <- c("Workclass", "Education", "Education-num", "Marital-status", "Occupation", "Relationship", "Race", "Sex", "Native-country")
df[factores] <- lapply(df[factores], as.factor)
head(df)
## Age Workclass Fnlwgt Education Education-num Marital-status
## 1 39 State-gov 77516 Bachelors 13 Never-married
## 2 50 Self-emp-not-inc 83311 Bachelors 13 Married-civ-spouse
## 3 38 Private 215646 HS-grad 9 Divorced
## 4 53 Private 234721 11th 7 Married-civ-spouse
## 5 28 Private 338409 Bachelors 13 Married-civ-spouse
## 6 37 Private 284582 Masters 14 Married-civ-spouse
## Occupation Relationship Race Sex Capital-gain Capital-loss
## 1 Adm-clerical Not-in-family White Male 2174 0
## 2 Exec-managerial Husband White Male 0 0
## 3 Handlers-cleaners Not-in-family White Male 0 0
## 4 Handlers-cleaners Husband Black Male 0 0
## 5 Prof-specialty Wife Black Female 0 0
## 6 Exec-managerial Wife White Female 0 0
## Hours-per-week Native-country Income
## 1 40 United-States <=50K
## 2 13 United-States <=50K
## 3 40 United-States <=50K
## 4 40 United-States <=50K
## 5 40 Cuba <=50K
## 6 40 United-States <=50K
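The `str` output earlier shows that every text value carries a leading space (for example `" State-gov"`). In addition, in the copy distributed by UCI the labels in `adult.test` end with a period (`" <=50K."`); whether that applies to the files loaded here cannot be seen from the output above. A minimal sketch of cleaning such labels before converting Income to a factor, using made-up toy values:

```r
# Toy values mimicking the raw strings: leading spaces, and a trailing
# period as used for the labels in the UCI `adult.test` file.
income_raw <- c(" <=50K", " >50K", " <=50K.", " >50K.")

# Trim surrounding whitespace, then drop a trailing period if present.
income_clean <- sub("\\.$", "", trimws(income_raw))

# Convert to a factor with an explicit level order.
income_factor <- factor(income_clean, levels = c("<=50K", ">50K"))
print(table(income_factor))
```

Without this step, a stacked train/test Income column could end up with four labels instead of two.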
Now that loading and type validation are done, we can proceed with an exploratory data analysis, including its missing-data analysis.
We start with a summary of the data frame.
summary(df)
## Age Workclass Fnlwgt
## Min. :17.00 Private :33906 Min. : 12285
## 1st Qu.:28.00 Self-emp-not-inc: 3862 1st Qu.: 117551
## Median :37.00 Local-gov : 3136 Median : 178145
## Mean :38.64 State-gov : 1981 Mean : 189664
## 3rd Qu.:48.00 Self-emp-inc : 1695 3rd Qu.: 237642
## Max. :90.00 (Other) : 1463 Max. :1490400
## NA's : 2799
## Education Education-num Marital-status
## HS-grad :15784 9 :15784 Divorced : 6633
## Some-college:10878 10 :10878 Married-AF-spouse : 37
## Bachelors : 8025 13 : 8025 Married-civ-spouse :22379
## Masters : 2657 14 : 2657 Married-spouse-absent: 628
## Assoc-voc : 2061 11 : 2061 Never-married :16117
## 11th : 1812 7 : 1812 Separated : 1530
## (Other) : 7625 (Other): 7625 Widowed : 1518
## Occupation Relationship Race
## Prof-specialty : 6172 Husband :19716 Amer-Indian-Eskimo: 470
## Craft-repair : 6112 Not-in-family :12583 Asian-Pac-Islander: 1519
## Exec-managerial: 6086 Other-relative: 1506 Black : 4685
## Adm-clerical : 5611 Own-child : 7581 Other : 406
## Sales : 5504 Unmarried : 5125 White :41762
## (Other) :16548 Wife : 2331
## NA's : 2809
## Sex Capital-gain Capital-loss Hours-per-week
## Female:16192 Min. : 0 Min. : 0.0 Min. : 1.00
## Male :32650 1st Qu.: 0 1st Qu.: 0.0 1st Qu.:40.00
## Median : 0 Median : 0.0 Median :40.00
## Mean : 1079 Mean : 87.5 Mean :40.42
## 3rd Qu.: 0 3rd Qu.: 0.0 3rd Qu.:45.00
## Max. :99999 Max. :4356.0 Max. :99.00
##
## Native-country Income
## United-States:43832 Length:48842
## Mexico : 951 Class :character
## Philippines : 295 Mode :character
## Germany : 206
## Puerto-Rico : 184
## (Other) : 2517
## NA's : 857
We can see that there are missing values (NAs), so we must analyze them before continuing with our analysis.

### Missing data analysis
library(Amelia)
## Cargando paquete requerido: Rcpp
## ##
## ## Amelia II: Multiple Imputation
## ## (Version 1.8.3, built: 2024-11-07)
## ## Copyright (C) 2005-2025 James Honaker, Gary King and Matthew Blackwell
## ## Refer to http://gking.harvard.edu/amelia/ for more information
## ##
missmap(df,col=c("red","blue"),legend=TRUE)
# Safeguard: " ?" was already read as NA through na.strings in read.csv
df[df == " ?"] <- NA
# Count missing values and compute their percentage per column
conteo_preguntas <- colSums(is.na(df))
porcentaje_preguntas <- round((conteo_preguntas / nrow(df)) * 100, 2)
# Build the summary table
resumen <- data.frame(
Columna = names(conteo_preguntas),
Cantidad = conteo_preguntas,
Porcentaje = porcentaje_preguntas
)
print(resumen)
## Columna Cantidad Porcentaje
## Age Age 0 0.00
## Workclass Workclass 2799 5.73
## Fnlwgt Fnlwgt 0 0.00
## Education Education 0 0.00
## Education-num Education-num 0 0.00
## Marital-status Marital-status 0 0.00
## Occupation Occupation 2809 5.75
## Relationship Relationship 0 0.00
## Race Race 0 0.00
## Sex Sex 0 0.00
## Capital-gain Capital-gain 0 0.00
## Capital-loss Capital-loss 0 0.00
## Hours-per-week Hours-per-week 0 0.00
## Native-country Native-country 857 1.75
## Income Income 0 0.00
Workclass, Occupation and Native-country contain NAs, though not at a problematic level. In the first two the share exceeds 5%, so a simple imputation will be applied; for Native-country the share is 1.75%, so the NAs have little impact on the variable. We check the normality of the numeric variables to decide which imputation would suit them, although the table above shows that none of them has missing values.
num_vars <- sapply(df, is.numeric)
for (col in names(df)[num_vars]) {
  # Extract the non-missing values
  x <- df[[col]][!is.na(df[[col]])]
  # KS test against a normal with the mean and sd of x
  ks_res <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))
  if (ks_res$p.value > 0.05) {
    # Treated as normal → impute with the mean
    df[[col]][is.na(df[[col]])] <- mean(x)
    cat(col, ": normal → imputada con media\n")
  } else {
    # Not normal → impute with the median
    df[[col]][is.na(df[[col]])] <- median(x)
    cat(col, ": NO normal → imputada con mediana\n")
  }
}
## Warning in ks.test.default(x, "pnorm", mean = mean(x), sd = sd(x)): ties should
## not be present for the one-sample Kolmogorov-Smirnov test
## Age : NO normal → imputada con mediana
## Warning in ks.test.default(x, "pnorm", mean = mean(x), sd = sd(x)): ties should
## not be present for the one-sample Kolmogorov-Smirnov test
## Fnlwgt : NO normal → imputada con mediana
## Warning in ks.test.default(x, "pnorm", mean = mean(x), sd = sd(x)): ties should
## not be present for the one-sample Kolmogorov-Smirnov test
## Capital-gain : NO normal → imputada con mediana
## Warning in ks.test.default(x, "pnorm", mean = mean(x), sd = sd(x)): ties should
## not be present for the one-sample Kolmogorov-Smirnov test
## Capital-loss : NO normal → imputada con mediana
## Warning in ks.test.default(x, "pnorm", mean = mean(x), sd = sd(x)): ties should
## not be present for the one-sample Kolmogorov-Smirnov test
## Hours-per-week : NO normal → imputada con mediana
Since none of the variables is normal, any NA in them is imputed with the median. These columns have no missing values, so the assignments below leave the data unchanged.
df$Fnlwgt[is.na(df$Fnlwgt)] <- median(df$Fnlwgt, na.rm = TRUE)
df$`Capital-gain`[is.na(df$`Capital-gain`)] <- median(df$`Capital-gain`, na.rm = TRUE)
df$`Capital-loss`[is.na(df$`Capital-loss`)] <- median(df$`Capital-loss`, na.rm = TRUE)
df$`Hours-per-week`[is.na(df$`Hours-per-week`)]<-median(df$`Hours-per-week`,na.rm=TRUE)
The remaining variables with NAs are categorical, so we impute them with the mode.
moda <- function(x) {
ux <- na.omit(unique(x))
ux[which.max(tabulate(match(x, ux)))]
}
df$Workclass[is.na(df$Workclass)] <- moda(df$Workclass)
df$`Education-num`[is.na(df$`Education-num`)]<-moda(df$`Education-num`)
df$Occupation[is.na(df$Occupation)]<-moda(df$Occupation)
df$`Native-country`[is.na(df$`Native-country`)]<-moda(df$`Native-country`)
colSums(is.na(df))
## Age Workclass Fnlwgt Education Education-num
## 0 0 0 0 0
## Marital-status Occupation Relationship Race Sex
## 0 0 0 0 0
## Capital-gain Capital-loss Hours-per-week Native-country Income
## 0 0 0 0 0
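Mode imputation assigns every missing value to the most frequent level, which inflates that level. The sketch below only redoes the arithmetic with counts copied from the `summary` and missing-value outputs of this document; the variable names (`before`, `n_missing`, and so on) are illustrative:

```r
# Counts taken from the outputs printed in this document.
n_total <- 48842
# Frequency of the modal level before imputation (Private, Prof-specialty, United-States).
before <- c(Workclass = 33906, Occupation = 6172, `Native-country` = 43832)
n_missing <- c(Workclass = 2799, Occupation = 2809, `Native-country` = 857)

# After imputation, all missing rows are added to the modal level.
after <- before + n_missing
share_before <- round(100 * before / (n_total - n_missing), 2)
share_after <- round(100 * after / n_total, 2)
print(data.frame(before, n_missing, after, share_before, share_after))
```

The resulting `after` counts are the ones that appear for Private, Prof-specialty and United-States in the frequency tables of the univariate analysis.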
With this, all missing values have been imputed, so we can finally carry out the exploratory analysis.

## Univariate analysis

### Analysis of categorical variables

We start the univariate analysis by examining the behavior of the categorical variables.
library(ggplot2)
for (var in factores) {
  cat("\n========================\n")
  cat("Variable:", var, "\n")
  cat("========================\n")
  # Absolute frequency
  freq_table <- table(df[[var]])
  print(freq_table)
  # Percentage
  prop_table <- prop.table(freq_table) * 100
  print(round(prop_table, 2))
}
##
## ========================
## Variable: Workclass
## ========================
##
## Federal-gov Local-gov Never-worked Private
## 1432 3136 10 36705
## Self-emp-inc Self-emp-not-inc State-gov Without-pay
## 1695 3862 1981 21
##
## Federal-gov Local-gov Never-worked Private
## 2.93 6.42 0.02 75.15
## Self-emp-inc Self-emp-not-inc State-gov Without-pay
## 3.47 7.91 4.06 0.04
##
## ========================
## Variable: Education
## ========================
##
## 10th 11th 12th 1st-4th 5th-6th
## 1389 1812 657 247 509
## 7th-8th 9th Assoc-acdm Assoc-voc Bachelors
## 955 756 1601 2061 8025
## Doctorate HS-grad Masters Preschool Prof-school
## 594 15784 2657 83 834
## Some-college
## 10878
##
## 10th 11th 12th 1st-4th 5th-6th
## 2.84 3.71 1.35 0.51 1.04
## 7th-8th 9th Assoc-acdm Assoc-voc Bachelors
## 1.96 1.55 3.28 4.22 16.43
## Doctorate HS-grad Masters Preschool Prof-school
## 1.22 32.32 5.44 0.17 1.71
## Some-college
## 22.27
##
## ========================
## Variable: Education-num
## ========================
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13
## 83 247 509 955 756 1389 1812 657 15784 10878 2061 1601 8025
## 14 15 16
## 2657 834 594
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0.17 0.51 1.04 1.96 1.55 2.84 3.71 1.35 32.32 22.27 4.22 3.28 16.43
## 14 15 16
## 5.44 1.71 1.22
##
## ========================
## Variable: Marital-status
## ========================
##
## Divorced Married-AF-spouse Married-civ-spouse
## 6633 37 22379
## Married-spouse-absent Never-married Separated
## 628 16117 1530
## Widowed
## 1518
##
## Divorced Married-AF-spouse Married-civ-spouse
## 13.58 0.08 45.82
## Married-spouse-absent Never-married Separated
## 1.29 33.00 3.13
## Widowed
## 3.11
##
## ========================
## Variable: Occupation
## ========================
##
## Adm-clerical Armed-Forces Craft-repair Exec-managerial
## 5611 15 6112 6086
## Farming-fishing Handlers-cleaners Machine-op-inspct Other-service
## 1490 2072 3022 4923
## Priv-house-serv Prof-specialty Protective-serv Sales
## 242 8981 983 5504
## Tech-support Transport-moving
## 1446 2355
##
## Adm-clerical Armed-Forces Craft-repair Exec-managerial
## 11.49 0.03 12.51 12.46
## Farming-fishing Handlers-cleaners Machine-op-inspct Other-service
## 3.05 4.24 6.19 10.08
## Priv-house-serv Prof-specialty Protective-serv Sales
## 0.50 18.39 2.01 11.27
## Tech-support Transport-moving
## 2.96 4.82
##
## ========================
## Variable: Relationship
## ========================
##
## Husband Not-in-family Other-relative Own-child Unmarried
## 19716 12583 1506 7581 5125
## Wife
## 2331
##
## Husband Not-in-family Other-relative Own-child Unmarried
## 40.37 25.76 3.08 15.52 10.49
## Wife
## 4.77
##
## ========================
## Variable: Race
## ========================
##
## Amer-Indian-Eskimo Asian-Pac-Islander Black Other
## 470 1519 4685 406
## White
## 41762
##
## Amer-Indian-Eskimo Asian-Pac-Islander Black Other
## 0.96 3.11 9.59 0.83
## White
## 85.50
##
## ========================
## Variable: Sex
## ========================
##
## Female Male
## 16192 32650
##
## Female Male
## 33.15 66.85
##
## ========================
## Variable: Native-country
## ========================
##
## Cambodia Canada
## 28 182
## China Columbia
## 122 85
## Cuba Dominican-Republic
## 138 103
## Ecuador El-Salvador
## 45 155
## England France
## 127 38
## Germany Greece
## 206 49
## Guatemala Haiti
## 88 75
## Holand-Netherlands Honduras
## 1 20
## Hong Hungary
## 30 19
## India Iran
## 151 59
## Ireland Italy
## 37 105
## Jamaica Japan
## 106 92
## Laos Mexico
## 23 951
## Nicaragua Outlying-US(Guam-USVI-etc)
## 49 23
## Peru Philippines
## 46 295
## Poland Portugal
## 87 67
## Puerto-Rico Scotland
## 184 21
## South Taiwan
## 115 65
## Thailand Trinadad&Tobago
## 30 27
## United-States Vietnam
## 44689 86
## Yugoslavia
## 23
##
## Cambodia Canada
## 0.06 0.37
## China Columbia
## 0.25 0.17
## Cuba Dominican-Republic
## 0.28 0.21
## Ecuador El-Salvador
## 0.09 0.32
## England France
## 0.26 0.08
## Germany Greece
## 0.42 0.10
## Guatemala Haiti
## 0.18 0.15
## Holand-Netherlands Honduras
## 0.00 0.04
## Hong Hungary
## 0.06 0.04
## India Iran
## 0.31 0.12
## Ireland Italy
## 0.08 0.21
## Jamaica Japan
## 0.22 0.19
## Laos Mexico
## 0.05 1.95
## Nicaragua Outlying-US(Guam-USVI-etc)
## 0.10 0.05
## Peru Philippines
## 0.09 0.60
## Poland Portugal
## 0.18 0.14
## Puerto-Rico Scotland
## 0.38 0.04
## South Taiwan
## 0.24 0.13
## Thailand Trinadad&Tobago
## 0.06 0.06
## United-States Vietnam
## 91.50 0.18
## Yugoslavia
## 0.05
for (var in factores) {
  cat("\nTop categorías en", var, "\n")
  # head() returns at most 5 entries, so variables with fewer levels are not padded with NA
  print(head(sort(table(df[[var]]), decreasing = TRUE), 5))
}
##
## Top categorías en Workclass
##
## Private Self-emp-not-inc Local-gov State-gov
## 36705 3862 3136 1981
## Self-emp-inc
## 1695
##
## Top categorías en Education
##
## HS-grad Some-college Bachelors Masters Assoc-voc
## 15784 10878 8025 2657 2061
##
## Top categorías en Education-num
##
## 9 10 13 14 11
## 15784 10878 8025 2657 2061
##
## Top categorías en Marital-status
##
## Married-civ-spouse Never-married Divorced Separated
## 22379 16117 6633 1530
## Widowed
## 1518
##
## Top categorías en Occupation
##
## Prof-specialty Craft-repair Exec-managerial Adm-clerical
## 8981 6112 6086 5611
## Sales
## 5504
##
## Top categorías en Relationship
##
## Husband Not-in-family Own-child Unmarried Wife
## 19716 12583 7581 5125 2331
##
## Top categorías en Race
##
## White Black Asian-Pac-Islander Amer-Indian-Eskimo
## 41762 4685 1519 470
## Other
## 406
##
## Top categorías en Sex
##
## Male Female
## 32650 16192
##
## Top categorías en Native-country
##
## United-States Mexico Philippines Germany Puerto-Rico
## 44689 951 295 206 184
for (var in factores) {
p <- ggplot(df, aes(x = .data[[var]])) +
geom_bar(fill = "steelblue") +
labs(
title = enc2utf8(paste("Distribución de", var)),
x = var,
y = "Frecuencia"
) +
theme_minimal()
print(p)
}
The ranking shows the mode of each variable. The plots and summaries show that every categorical variable is imbalanced. We now move on to the numeric variables.
We start with descriptive measures for the quantitative variables. First we create a vector with their names, which makes them easier to identify and loop over.
numericas <- names(df)[sapply(df, is.numeric)]
numericas
## [1] "Age" "Fnlwgt" "Capital-gain" "Capital-loss"
## [5] "Hours-per-week"
Next, a statistical summary:
for (var in numericas) {
  cat("\n========================\n")
  cat("Variable:", var, "\n")
  cat("========================\n")
  # Descriptive measures
  valores <- df[[var]]
  cat("Mínimo:", min(valores, na.rm = TRUE), "\n")
  cat("Máximo:", max(valores, na.rm = TRUE), "\n")
  cat("Media:", mean(valores, na.rm = TRUE), "\n")
  cat("Mediana:", median(valores, na.rm = TRUE), "\n")
  cat("Desviación estándar:", sd(valores, na.rm = TRUE), "\n")
  cat("Cuartiles:\n")
  print(quantile(valores, na.rm = TRUE))
}
##
## ========================
## Variable: Age
## ========================
## Mínimo: 17
## Máximo: 90
## Media: 38.64359
## Mediana: 37
## Desviación estándar: 13.71051
## Cuartiles:
## 0% 25% 50% 75% 100%
## 17 28 37 48 90
##
## ========================
## Variable: Fnlwgt
## ========================
## Mínimo: 12285
## Máximo: 1490400
## Media: 189664.1
## Mediana: 178144.5
## Desviación estándar: 105604
## Cuartiles:
## 0% 25% 50% 75% 100%
## 12285.0 117550.5 178144.5 237642.0 1490400.0
##
## ========================
## Variable: Capital-gain
## ========================
## Mínimo: 0
## Máximo: 99999
## Media: 1079.068
## Mediana: 0
## Desviación estándar: 7452.019
## Cuartiles:
## 0% 25% 50% 75% 100%
## 0 0 0 0 99999
##
## ========================
## Variable: Capital-loss
## ========================
## Mínimo: 0
## Máximo: 4356
## Media: 87.50231
## Mediana: 0
## Desviación estándar: 403.0046
## Cuartiles:
## 0% 25% 50% 75% 100%
## 0 0 0 0 4356
##
## ========================
## Variable: Hours-per-week
## ========================
## Mínimo: 1
## Máximo: 99
## Media: 40.42238
## Mediana: 40
## Desviación estándar: 12.39144
## Cuartiles:
## 0% 25% 50% 75% 100%
## 1 40 40 45 99
Next, a histogram of each variable gives a visual idea of its distribution, if it follows a recognizable one.
for (var in numericas) {
p1 <- ggplot(df, aes(x = .data[[var]])) +
geom_histogram(fill = "steelblue", color = "white") +
labs(title = paste("Histograma de", var), x = var, y = "Frecuencia") +
theme_minimal()
print(p1)
}
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
for (var in numericas) {
p1 <- ggplot(df, aes(y = .data[[var]])) +
geom_boxplot(fill = "orange") +
labs(title = paste("Boxplot de", var), y = var) +
theme_minimal()
print(p1)
}
The boxplots show outliers, and in the histograms only Age looks roughly like a normal distribution. It is therefore important to run analytical tests for normality, as well as analytical checks to understand the nature of these outliers.
# Apply the Lilliefors test to each numeric variable
library(moments)
library(nortest)
for (var in numericas) {
cat("\n========================\n")
cat("Variable:", var, "\n")
cat("========================\n")
resultado <- lillie.test(df[[var]])
print(resultado)
}
##
## ========================
## Variable: Age
## ========================
##
## Lilliefors (Kolmogorov-Smirnov) normality test
##
## data: df[[var]]
## D = 0.063157, p-value < 2.2e-16
##
##
## ========================
## Variable: Fnlwgt
## ========================
##
## Lilliefors (Kolmogorov-Smirnov) normality test
##
## data: df[[var]]
## D = 0.088684, p-value < 2.2e-16
##
##
## ========================
## Variable: Capital-gain
## ========================
##
## Lilliefors (Kolmogorov-Smirnov) normality test
##
## data: df[[var]]
## D = 0.47495, p-value < 2.2e-16
##
##
## ========================
## Variable: Capital-loss
## ========================
##
## Lilliefors (Kolmogorov-Smirnov) normality test
##
## data: df[[var]]
## D = 0.53922, p-value < 2.2e-16
##
##
## ========================
## Variable: Hours-per-week
## ========================
##
## Lilliefors (Kolmogorov-Smirnov) normality test
##
## data: df[[var]]
## D = 0.24712, p-value < 2.2e-16
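With 48,842 observations, a normality test such as Lilliefors tends to reject even for small departures, so the p-values above say little about how far each variable is from normal. A descriptive complement is to look at skewness and excess kurtosis. The sketch below uses base R only and synthetic data; the helper functions `skewness` and `excess_kurtosis` and the vector `zero_inflated` are illustrative assumptions, not part of the original analysis:

```r
# Sample skewness and excess kurtosis from central moments (base R only).
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2 - 3
}

# Synthetic zero-inflated variable, loosely resembling a capital-gain column.
set.seed(1)
zero_inflated <- c(rep(0, 900), rexp(100, rate = 1 / 5000))
# Small symmetric sample for comparison.
symmetric <- c(-2, -1, 0, 1, 2)

cat("Skewness (zero-inflated):", skewness(zero_inflated), "\n")
cat("Skewness (symmetric):", skewness(symmetric), "\n")
cat("Excess kurtosis (zero-inflated):", excess_kurtosis(zero_inflated), "\n")
```

On the real data, the same functions could be applied to each name in `numericas`; values near zero indicate a shape closer to normal.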
We conclude that none of the variables is normally distributed. We now analyze outliers using the interquartile range criterion.
for (var in numericas) {
  cat("\n========================\n")
  cat("Variable:", var, "\n")
  cat("========================\n")
  x <- df[[var]]
  n_total <- sum(!is.na(x)) # non-missing values
  outliers <- boxplot.stats(x)$out
  if (length(outliers) > 0) {
    porcentaje <- (length(outliers) / n_total) * 100
    cat("Cantidad de outliers:", length(outliers), "\n")
    cat("Porcentaje de outliers:", round(porcentaje, 2), "%\n")
  } else {
    cat("No se detectaron valores atípicos.\n")
  }
}
##
## ========================
## Variable: Age
## ========================
## Cantidad de outliers: 216
## Porcentaje de outliers: 0.44 %
##
## ========================
## Variable: Fnlwgt
## ========================
## Cantidad de outliers: 1453
## Porcentaje de outliers: 2.97 %
##
## ========================
## Variable: Capital-gain
## ========================
## Cantidad de outliers: 4035
## Porcentaje de outliers: 8.26 %
##
## ========================
## Variable: Capital-loss
## ========================
## Cantidad de outliers: 2282
## Porcentaje de outliers: 4.67 %
##
## ========================
## Variable: Hours-per-week
## ========================
## Cantidad de outliers: 13496
## Porcentaje de outliers: 27.63 %
for (var in numericas) {
cat("\n========================\n")
cat("Variable:", var, "\n")
cat("========================\n")
x <- df[[var]]
Q1 <- quantile(x, 0.25, na.rm = TRUE)
Q3 <- quantile(x, 0.75, na.rm = TRUE)
IQR_value <- Q3 - Q1
limite_inferior <- Q1 - 1.5 * IQR_value
limite_superior <- Q3 + 1.5 * IQR_value
outliers_bajos <- x[x < limite_inferior]
outliers_altos <- x[x > limite_superior]
cat("Outliers bajos:", length(outliers_bajos), "\n")
cat("Outliers altos:", length(outliers_altos), "\n")
}
##
## ========================
## Variable: Age
## ========================
## Outliers bajos: 0
## Outliers altos: 216
##
## ========================
## Variable: Fnlwgt
## ========================
## Outliers bajos: 0
## Outliers altos: 1453
##
## ========================
## Variable: Capital-gain
## ========================
## Outliers bajos: 0
## Outliers altos: 4035
##
## ========================
## Variable: Capital-loss
## ========================
## Outliers bajos: 0
## Outliers altos: 2282
##
## ========================
## Variable: Hours-per-week
## ========================
## Outliers bajos: 8286
## Outliers altos: 5210
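The counts above follow directly from the interquartile range fences. As a worked example, the sketch below recomputes the fences from the quartiles printed in the descriptive statistics; the `quartiles` data frame is only an illustrative container for those numbers:

```r
# Quartiles copied from the descriptive statistics printed earlier.
quartiles <- data.frame(
  variable = c("Age", "Capital-gain", "Hours-per-week"),
  q1 = c(28, 0, 40),
  q3 = c(48, 0, 45)
)

# Tukey fences: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR.
quartiles$iqr <- quartiles$q3 - quartiles$q1
quartiles$lower <- quartiles$q1 - 1.5 * quartiles$iqr
quartiles$upper <- quartiles$q3 + 1.5 * quartiles$iqr
print(quartiles)
```

For Hours-per-week the fences are 32.5 and 52.5, so anyone working fewer than 33 or more than 52 hours is flagged, which explains the large share of outliers on both sides. For Capital-gain both quartiles are 0, so the fences collapse to 0 and every non-zero value is flagged.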
We now treat the outliers:
- Age, Fnlwgt and Capital-loss each have fewer than 5% outliers, so no cleaning is needed.
- Capital-gain has a share of outliers (8.26%) that can be handled with a simple cleanup, so we impute with the median (since the data are not normal).
- Hours-per-week has a considerable share of outliers (27.63%), so advanced imputation techniques will be applied.
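# Column names are case-sensitive: the column is `Capital-gain`, not `capital-gain`.
# Q1 = Q3 = 0 here, so every non-zero value is flagged and replaced by the median (0).
mediana_capital_gain <- median(df$`Capital-gain`, na.rm = TRUE)
df$`Capital-gain`[df$`Capital-gain` %in% boxplot.stats(df$`Capital-gain`)$out] <- mediana_capital_gain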
We now carry out the bivariate analysis of the dataset, comparing variables to determine whether they are related.
We build a correlation matrix to measure the correlation between the numeric variables.
library(dplyr)
##
## Adjuntando el paquete: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Var_num <- df %>%
select(where(is.numeric))
cor_matrix <- cor(Var_num, use = "complete.obs")
library(reshape2)
cor_df <- melt(cor_matrix)
ggplot(cor_df, aes(x = Var1, y = Var2, fill = value)) +
geom_tile(color = "white") +
scale_fill_gradient2(low = "blue", high = "red", mid = "white",
midpoint = 0, limit = c(-1, 1), space = "Lab",
name = "Correlación") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, vjust = 1,
size = 12, hjust = 1)) +
coord_fixed() +
ggtitle("Matriz de correlación (Heatmap)")
The numeric variables are not highly correlated with one another, so there is no strong linear relationship between them. We will use scatterplots to check whether there is a non-linear relationship between the variables.
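Besides scatterplots, a rank-based coefficient is one numeric way to look for monotone but non-linear relationships that Pearson correlation understates. The following is a minimal sketch on synthetic data (the variables `x` and `y` are made up for illustration and do not come from the dataset):

```r
# Synthetic example: y is a strictly increasing but non-linear function of x.
set.seed(42)
x <- runif(200, min = 0, max = 5)
y <- exp(x)

# Pearson measures linear association; Spearman measures monotone association.
pearson <- cor(x, y, method = "pearson")
spearman <- cor(x, y, method = "spearman")
cat("Pearson:", pearson, "\n")
cat("Spearman:", spearman, "\n")
```

On the real data, the same comparison could be made by calling `cor` with `method = "spearman"` on the numeric columns.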
We build contingency tables for the categorical variables and use the chi-squared test of independence to determine whether each pair of variables shows a statistically significant association.
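Two points help when reading the results of such a test. First, with `simulate.p.value = TRUE` and the default `B = 2000` replicates, the smallest p-value that `chisq.test` can report is 1 / (B + 1), which is about 0.0004997501; a p-value at that floor only says that no simulated table was as extreme as the observed one. Second, with this many rows almost any association is significant, so an effect size such as Cramér's V is a useful complement. The sketch below uses the Workclass by Sex counts printed in this document; the names `tab`, `p_floor` and `cramers_v` are illustrative:

```r
# Workclass by Sex counts, copied from the contingency table printed in this document.
tab <- matrix(
  c(452, 980, 1258, 1878, 3, 7, 12869, 23836,
    211, 1484, 629, 3233, 763, 1218, 7, 14),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("Federal-gov", "Local-gov", "Never-worked", "Private",
      "Self-emp-inc", "Self-emp-not-inc", "State-gov", "Without-pay"),
    c("Female", "Male")
  )
)

# Smallest p-value a Monte Carlo test with B replicates can report.
B <- 2000  # default number of replicates in chisq.test
p_floor <- 1 / (B + 1)
cat("Monte Carlo p-value floor:", p_floor, "\n")

# Cramer's V as an effect size; the small-expected-count warning is suppressed.
chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE))$statistic
cramers_v <- sqrt(unname(chi2) / (sum(tab) * (min(dim(tab)) - 1)))
cat("Cramer's V:", cramers_v, "\n")
```

Cramér's V ranges from 0 (no association) to 1 (perfect association), so it can be compared across pairs of variables with different numbers of levels.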
if(length(factores) >= 2){
comb_cat <- combn(factores, 2, simplify = FALSE)
for (par in comb_cat) {
tabla <- table(df[[par[1]]], df[[par[2]]])
cat("\nTabla de contingencia entre", par[1], "y", par[2], ":\n")
print(tabla)
if(min(dim(tabla)) > 1){ # Avoid an error when a variable has a single level
chis <- chisq.test(tabla,simulate.p.value = TRUE)
cat("chi cuadrado p-valor:", chis$p.value, "\n")
}
}
}
##
## Tabla de contingencia entre Workclass y Education :
##
## 10th 11th 12th 1st-4th 5th-6th 7th-8th 9th
## Federal-gov 15 14 8 1 1 4 6
## Local-gov 52 61 25 5 13 40 31
## Never-worked 2 3 0 0 0 1 0
## Private 1170 1590 570 221 450 734 637
## Self-emp-inc 27 23 13 2 8 20 13
## Self-emp-not-inc 104 106 30 16 33 138 59
## State-gov 19 15 11 2 4 16 10
## Without-pay 0 0 0 0 0 2 0
##
## Assoc-acdm Assoc-voc Bachelors Doctorate HS-grad
## Federal-gov 81 61 313 22 395
## Local-gov 129 124 700 35 761
## Never-worked 0 0 0 0 2
## Private 1157 1571 5556 281 12492
## Self-emp-inc 58 59 418 55 426
## Self-emp-not-inc 111 171 607 76 1279
## State-gov 63 75 431 125 415
## Without-pay 2 0 0 0 14
##
## Masters Preschool Prof-school Some-college
## Federal-gov 110 0 38 363
## Local-gov 526 4 41 589
## Never-worked 0 0 0 2
## Private 1443 73 385 8375
## Self-emp-inc 112 0 129 332
## Self-emp-not-inc 207 5 196 724
## State-gov 259 1 45 490
## Without-pay 0 0 0 3
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Workclass y Education-num :
##
## 1 2 3 4 5 6 7 8 9 10
## Federal-gov 0 1 1 4 6 15 14 8 395 363
## Local-gov 4 5 13 40 31 52 61 25 761 589
## Never-worked 0 0 0 1 0 2 3 0 2 2
## Private 73 221 450 734 637 1170 1590 570 12492 8375
## Self-emp-inc 0 2 8 20 13 27 23 13 426 332
## Self-emp-not-inc 5 16 33 138 59 104 106 30 1279 724
## State-gov 1 2 4 16 10 19 15 11 415 490
## Without-pay 0 0 0 2 0 0 0 0 14 3
##
## 11 12 13 14 15 16
## Federal-gov 61 81 313 110 38 22
## Local-gov 124 129 700 526 41 35
## Never-worked 0 0 0 0 0 0
## Private 1571 1157 5556 1443 385 281
## Self-emp-inc 59 58 418 112 129 55
## Self-emp-not-inc 171 111 607 207 196 76
## State-gov 75 63 431 259 45 125
## Without-pay 0 2 0 0 0 0
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Workclass y Marital-status :
##
## Divorced Married-AF-spouse Married-civ-spouse
## Federal-gov 238 3 721
## Local-gov 529 0 1536
## Never-worked 1 0 1
## Private 4971 29 15400
## Self-emp-inc 146 0 1264
## Self-emp-not-inc 432 3 2554
## State-gov 316 2 890
## Without-pay 0 0 13
##
## Married-spouse-absent Never-married Separated Widowed
## Federal-gov 15 368 39 48
## Local-gov 33 798 100 140
## Never-worked 1 7 0 0
## Private 497 13478 1216 1114
## Self-emp-inc 8 211 25 41
## Self-emp-not-inc 48 613 85 127
## State-gov 25 636 65 47
## Without-pay 1 6 0 1
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Workclass y Occupation :
##
## Adm-clerical Armed-Forces Craft-repair Exec-managerial
## Federal-gov 487 15 93 268
## Local-gov 421 0 211 331
## Never-worked 0 0 0 0
## Private 4208 0 4748 3995
## Self-emp-inc 47 0 167 617
## Self-emp-not-inc 70 0 798 587
## State-gov 375 0 94 287
## Without-pay 3 0 1 1
##
## Farming-fishing Handlers-cleaners Machine-op-inspct
## Federal-gov 9 36 19
## Local-gov 43 65 24
## Never-worked 0 0 0
## Private 670 1923 2882
## Self-emp-inc 82 6 17
## Self-emp-not-inc 653 21 59
## State-gov 25 19 19
## Without-pay 8 2 2
##
## Other-service Priv-house-serv Prof-specialty
## Federal-gov 55 0 253
## Local-gov 300 0 1061
## Never-worked 0 0 10
## Private 4057 242 6208
## Self-emp-inc 42 0 245
## Self-emp-not-inc 276 0 575
## State-gov 191 0 629
## Without-pay 2 0 0
##
## Protective-serv Sales Tech-support Transport-moving
## Federal-gov 47 17 96 37
## Local-gov 450 16 58 156
## Never-worked 0 0 0 0
## Private 299 4439 1154 1880
## Self-emp-inc 5 420 9 38
## Self-emp-not-inc 7 591 42 183
## State-gov 175 20 87 60
## Without-pay 0 1 0 1
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Workclass y Relationship :
##
## Husband Not-in-family Other-relative Own-child
## Federal-gov 658 398 31 99
## Local-gov 1288 791 64 331
## Never-worked 0 1 1 7
## Private 13457 9781 1302 6551
## Self-emp-inc 1189 253 14 93
## Self-emp-not-inc 2343 773 61 244
## State-gov 773 586 33 248
## Without-pay 8 0 0 8
##
## Unmarried Wife
## Federal-gov 185 61
## Local-gov 436 226
## Never-worked 0 1
## Private 3938 1676
## Self-emp-inc 78 68
## Self-emp-not-inc 250 191
## State-gov 236 105
## Without-pay 2 3
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Workclass y Race :
##
## Amer-Indian-Eskimo Asian-Pac-Islander Black Other
## Federal-gov 33 64 255 11
## Local-gov 65 59 437 16
## Never-worked 0 0 3 0
## Private 313 1139 3575 343
## Self-emp-inc 2 64 38 6
## Self-emp-not-inc 34 100 136 16
## State-gov 23 92 240 14
## Without-pay 0 1 1 0
##
## White
## Federal-gov 1069
## Local-gov 2559
## Never-worked 7
## Private 31335
## Self-emp-inc 1585
## Self-emp-not-inc 3576
## State-gov 1612
## Without-pay 19
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Workclass y Sex :
##
## Female Male
## Federal-gov 452 980
## Local-gov 1258 1878
## Never-worked 3 7
## Private 12869 23836
## Self-emp-inc 211 1484
## Self-emp-not-inc 629 3233
## State-gov 763 1218
## Without-pay 7 14
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Workclass y Native-country :
##
## Cambodia Canada China Columbia Cuba
## Federal-gov 1 4 2 2 3
## Local-gov 0 10 3 2 8
## Never-worked 0 0 0 0 0
## Private 24 132 94 71 100
## Self-emp-inc 0 13 5 0 12
## Self-emp-not-inc 2 19 5 8 15
## State-gov 1 4 13 2 0
## Without-pay 0 0 0 0 0
##
## Dominican-Republic Ecuador El-Salvador England France
## Federal-gov 0 0 2 4 1
## Local-gov 4 1 5 6 4
## Never-worked 0 0 0 0 0
## Private 94 39 143 98 27
## Self-emp-inc 2 0 1 2 2
## Self-emp-not-inc 3 3 4 13 3
## State-gov 0 2 0 4 1
## Without-pay 0 0 0 0 0
##
## Germany Greece Guatemala Haiti Holand-Netherlands
## Federal-gov 8 0 1 2 0
## Local-gov 15 1 4 4 0
## Never-worked 0 0 0 0 0
## Private 150 29 80 62 1
## Self-emp-inc 8 10 0 2 0
## Self-emp-not-inc 15 9 3 3 0
## State-gov 10 0 0 2 0
## Without-pay 0 0 0 0 0
##
## Honduras Hong Hungary India Iran Ireland Italy
## Federal-gov 0 0 0 4 2 0 1
## Local-gov 1 1 0 3 2 0 5
## Never-worked 0 0 0 0 0 0 0
## Private 14 26 12 106 39 33 77
## Self-emp-inc 2 1 1 10 2 1 6
## Self-emp-not-inc 1 0 6 11 13 3 14
## State-gov 2 2 0 17 1 0 2
## Without-pay 0 0 0 0 0 0 0
##
## Jamaica Japan Laos Mexico Nicaragua
## Federal-gov 1 3 1 4 1
## Local-gov 7 2 1 21 3
## Never-worked 0 0 0 0 0
## Private 90 72 21 864 41
## Self-emp-inc 1 4 0 9 0
## Self-emp-not-inc 4 8 0 47 3
## State-gov 3 3 0 6 1
## Without-pay 0 0 0 0 0
##
## Outlying-US(Guam-USVI-etc) Peru Philippines Poland
## Federal-gov 0 1 19 3
## Local-gov 1 1 16 3
## Never-worked 0 0 0 0
## Private 20 42 240 72
## Self-emp-inc 0 0 2 1
## Self-emp-not-inc 1 2 8 5
## State-gov 1 0 9 3
## Without-pay 0 0 1 0
##
## Portugal Puerto-Rico Scotland South Taiwan Thailand
## Federal-gov 1 10 0 2 0 0
## Local-gov 2 13 0 0 2 1
## Never-worked 0 0 0 0 0 0
## Private 55 151 18 72 47 21
## Self-emp-inc 4 1 0 12 8 4
## Self-emp-not-inc 4 5 2 28 1 3
## State-gov 1 4 1 1 7 1
## Without-pay 0 0 0 0 0 0
##
## Trinadad&Tobago United-States Vietnam Yugoslavia
## Federal-gov 1 1346 2 0
## Local-gov 2 2977 4 1
## Never-worked 0 10 0 0
## Private 20 33320 71 17
## Self-emp-inc 2 1565 1 1
## Self-emp-not-inc 1 3576 7 4
## State-gov 1 1875 1 0
## Without-pay 0 20 0 0
## chi cuadrado p-valor: 0.005497251
##
## Tabla de contingencia entre Education y Education-num :
##
## 1 2 3 4 5 6 7 8 9 10
## 10th 0 0 0 0 0 1389 0 0 0 0
## 11th 0 0 0 0 0 0 1812 0 0 0
## 12th 0 0 0 0 0 0 0 657 0 0
## 1st-4th 0 247 0 0 0 0 0 0 0 0
## 5th-6th 0 0 509 0 0 0 0 0 0 0
## 7th-8th 0 0 0 955 0 0 0 0 0 0
## 9th 0 0 0 0 756 0 0 0 0 0
## Assoc-acdm 0 0 0 0 0 0 0 0 0 0
## Assoc-voc 0 0 0 0 0 0 0 0 0 0
## Bachelors 0 0 0 0 0 0 0 0 0 0
## Doctorate 0 0 0 0 0 0 0 0 0 0
## HS-grad 0 0 0 0 0 0 0 0 15784 0
## Masters 0 0 0 0 0 0 0 0 0 0
## Preschool 83 0 0 0 0 0 0 0 0 0
## Prof-school 0 0 0 0 0 0 0 0 0 0
## Some-college 0 0 0 0 0 0 0 0 0 10878
##
## 11 12 13 14 15 16
## 10th 0 0 0 0 0 0
## 11th 0 0 0 0 0 0
## 12th 0 0 0 0 0 0
## 1st-4th 0 0 0 0 0 0
## 5th-6th 0 0 0 0 0 0
## 7th-8th 0 0 0 0 0 0
## 9th 0 0 0 0 0 0
## Assoc-acdm 0 1601 0 0 0 0
## Assoc-voc 2061 0 0 0 0 0
## Bachelors 0 0 8025 0 0 0
## Doctorate 0 0 0 0 0 594
## HS-grad 0 0 0 0 0 0
## Masters 0 0 0 2657 0 0
## Preschool 0 0 0 0 0 0
## Prof-school 0 0 0 0 834 0
## Some-college 0 0 0 0 0 0
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Education y Marital-status :
##
## Divorced Married-AF-spouse Married-civ-spouse
## 10th 172 1 525
## 11th 192 0 545
## 12th 63 0 199
## 1st-4th 17 0 125
## 5th-6th 31 0 271
## 7th-8th 101 0 541
## 9th 98 0 349
## Assoc-acdm 280 2 697
## Assoc-voc 361 2 1013
## Bachelors 843 6 4136
## Doctorate 56 1 403
## HS-grad 2416 15 7243
## Masters 367 0 1527
## Preschool 2 0 30
## Prof-school 74 1 596
## Some-college 1560 9 4179
##
## Married-spouse-absent Never-married Separated Widowed
## 10th 22 525 75 69
## 11th 25 913 79 58
## 12th 10 348 19 18
## 1st-4th 18 50 12 25
## 5th-6th 27 128 31 21
## 7th-8th 20 158 39 96
## 9th 13 220 41 35
## Assoc-acdm 21 522 43 36
## Assoc-voc 19 539 64 63
## Bachelors 98 2681 136 125
## Doctorate 13 97 11 13
## HS-grad 199 4671 607 633
## Masters 24 635 44 60
## Preschool 5 38 3 5
## Prof-school 5 139 9 10
## Some-college 109 4453 317 251
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Education y Occupation :
##
## Adm-clerical Armed-Forces Craft-repair Exec-managerial
## 10th 59 0 239 42
## 11th 100 0 270 51
## 12th 52 1 92 18
## 1st-4th 6 0 28 6
## 5th-6th 8 0 71 6
## 7th-8th 20 0 172 28
## 9th 20 0 144 23
## Assoc-acdm 281 0 167 240
## Assoc-voc 269 1 375 234
## Bachelors 765 1 332 2025
## Doctorate 6 0 4 84
## HS-grad 2047 5 2911 1192
## Masters 105 2 34 779
## Preschool 3 0 6 1
## Prof-school 12 1 9 69
## Some-college 1858 4 1258 1288
##
## Farming-fishing Handlers-cleaners Machine-op-inspct
## 10th 71 108 152
## 11th 67 177 153
## 12th 29 55 61
## 1st-4th 33 26 36
## 5th-6th 52 59 95
## 7th-8th 106 66 129
## 9th 44 72 102
## Assoc-acdm 25 34 51
## Assoc-voc 85 43 95
## Bachelors 113 79 99
## Doctorate 1 0 1
## HS-grad 573 943 1531
## Masters 14 5 12
## Preschool 17 5 12
## Prof-school 7 0 1
## Some-college 253 400 492
##
## Other-service Priv-house-serv Prof-specialty
## 10th 280 8 164
## 11th 368 18 215
## 12th 129 8 71
## 1st-4th 55 14 22
## 5th-6th 98 20 43
## 7th-8th 149 17 124
## 9th 142 16 73
## Assoc-acdm 110 3 279
## Assoc-voc 160 5 328
## Bachelors 259 12 2486
## Doctorate 2 1 468
## HS-grad 1936 91 1156
## Masters 35 1 1369
## Preschool 22 2 11
## Prof-school 7 0 691
## Some-college 1171 26 1481
##
## Protective-serv Sales Tech-support Transport-moving
## 10th 12 120 5 129
## 11th 18 232 9 134
## 12th 11 70 4 56
## 1st-4th 1 8 0 12
## 5th-6th 1 17 1 38
## 7th-8th 11 40 6 87
## 9th 9 47 3 61
## Assoc-acdm 50 209 116 36
## Assoc-voc 67 163 181 55
## Bachelors 147 1268 346 93
## Doctorate 1 16 8 2
## HS-grad 326 1580 270 1223
## Masters 20 206 61 14
## Preschool 0 2 0 2
## Prof-school 1 23 10 3
## Some-college 308 1503 426 410
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Education y Relationship :
##
## Husband Not-in-family Other-relative Own-child Unmarried
## 10th 471 303 62 326 181
## 11th 479 332 90 651 216
## 12th 166 128 38 239 65
## 1st-4th 111 67 23 11 29
## 5th-6th 238 105 61 19 65
## 7th-8th 503 208 46 66 103
## 9th 305 160 44 108 104
## Assoc-acdm 581 483 30 198 201
## Assoc-voc 883 536 38 212 274
## Bachelors 3636 2503 132 774 516
## Doctorate 377 143 4 11 33
## HS-grad 6388 3841 607 2287 1937
## Masters 1339 825 19 82 212
## Preschool 23 38 6 7 6
## Prof-school 558 182 7 17 32
## Some-college 3658 2729 299 2573 1151
##
## Wife
## 10th 46
## 11th 44
## 12th 21
## 1st-4th 6
## 5th-6th 21
## 7th-8th 29
## 9th 35
## Assoc-acdm 108
## Assoc-voc 118
## Bachelors 464
## Doctorate 26
## HS-grad 724
## Masters 180
## Preschool 3
## Prof-school 38
## Some-college 468
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Education y Race :
##
## Amer-Indian-Eskimo Asian-Pac-Islander Black Other White
## 10th 22 16 182 11 1158
## 11th 26 27 252 22 1485
## 12th 5 15 105 17 515
## 1st-4th 4 10 24 13 196
## 5th-6th 2 28 41 23 415
## 7th-8th 10 14 90 23 818
## 9th 9 10 111 15 611
## Assoc-acdm 13 49 161 10 1368
## Assoc-voc 31 53 165 9 1803
## Bachelors 29 408 504 50 7034
## Doctorate 3 46 16 3 526
## HS-grad 176 336 1780 105 13387
## Masters 13 140 143 13 2348
## Preschool 1 7 12 2 61
## Prof-school 2 58 21 5 748
## Some-college 124 302 1078 85 9289
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Education y Sex :
##
## Female Male
## 10th 457 932
## 11th 650 1162
## 12th 211 446
## 1st-4th 61 186
## 5th-6th 127 382
## 7th-8th 239 716
## 9th 220 536
## Assoc-acdm 627 974
## Assoc-voc 734 1327
## Bachelors 2477 5548
## Doctorate 113 481
## HS-grad 5097 10687
## Masters 845 1812
## Preschool 24 59
## Prof-school 132 702
## Some-college 4178 6700
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Education y Native-country :
##
## Cambodia Canada China Columbia Cuba Dominican-Republic
## 10th 0 2 3 4 4 4
## 11th 0 5 0 2 2 4
## 12th 1 3 1 3 5 8
## 1st-4th 1 0 1 2 5 10
## 5th-6th 0 1 2 2 8 5
## 7th-8th 1 5 5 2 7 12
## 9th 0 3 2 4 6 7
## Assoc-acdm 0 4 0 4 4 1
## Assoc-voc 2 11 0 5 6 2
## Bachelors 5 38 29 6 19 6
## Doctorate 0 11 13 1 2 0
## HS-grad 9 51 28 34 33 26
## Masters 0 11 24 1 9 3
## Preschool 1 0 2 0 0 2
## Prof-school 0 3 3 4 6 0
## Some-college 8 34 9 11 22 13
##
## Ecuador El-Salvador England France Germany Greece
## 10th 2 6 4 0 2 3
## 11th 1 7 2 0 5 1
## 12th 2 2 1 0 4 0
## 1st-4th 1 13 1 0 0 0
## 5th-6th 2 29 1 0 0 1
## 7th-8th 2 6 0 1 2 3
## 9th 0 15 1 0 0 0
## Assoc-acdm 0 4 6 3 11 0
## Assoc-voc 2 1 3 1 16 5
## Bachelors 5 8 34 10 50 5
## Doctorate 0 1 7 3 6 0
## HS-grad 14 34 37 6 47 17
## Masters 3 3 12 7 8 5
## Preschool 0 7 0 0 0 0
## Prof-school 0 1 3 1 5 1
## Some-college 11 18 15 6 50 8
##
## Guatemala Haiti Holand-Netherlands Honduras Hong Hungary
## 10th 6 3 0 1 0 0
## 11th 7 3 0 2 1 0
## 12th 3 2 0 0 0 0
## 1st-4th 9 1 0 1 0 0
## 5th-6th 12 3 0 3 1 1
## 7th-8th 11 2 0 0 1 0
## 9th 6 4 0 0 2 0
## Assoc-acdm 1 4 0 0 2 1
## Assoc-voc 2 0 0 0 1 1
## Bachelors 3 5 0 2 6 5
## Doctorate 0 0 0 0 2 0
## HS-grad 17 24 0 5 7 6
## Masters 0 2 0 0 5 2
## Preschool 2 4 0 0 1 0
## Prof-school 0 1 0 1 0 1
## Some-college 9 17 1 5 1 2
##
## India Iran Ireland Italy Jamaica Japan Laos Mexico
## 10th 2 0 0 3 2 1 0 42
## 11th 8 0 2 1 5 0 1 40
## 12th 0 0 0 7 2 0 0 23
## 1st-4th 0 0 0 4 0 0 0 103
## 5th-6th 0 0 0 12 1 1 3 216
## 7th-8th 0 0 2 7 1 0 1 79
## 9th 1 0 1 0 2 0 0 76
## Assoc-acdm 5 5 1 4 8 2 1 4
## Assoc-voc 3 2 2 2 3 2 2 13
## Bachelors 37 18 8 14 10 26 3 43
## Doctorate 10 5 0 1 2 2 0 1
## HS-grad 9 7 17 32 38 23 6 178
## Masters 36 10 1 6 4 11 0 9
## Preschool 1 0 0 0 1 0 1 31
## Prof-school 22 0 0 2 0 5 1 2
## Some-college 17 12 3 10 27 19 4 91
##
## Nicaragua Outlying-US(Guam-USVI-etc) Peru Philippines
## 10th 1 0 2 1
## 11th 3 1 3 9
## 12th 1 0 0 1
## 1st-4th 1 0 0 5
## 5th-6th 3 0 0 10
## 7th-8th 1 0 2 3
## 9th 1 2 0 3
## Assoc-acdm 1 1 2 11
## Assoc-voc 1 1 1 9
## Bachelors 3 5 8 105
## Doctorate 1 0 0 0
## HS-grad 15 4 16 53
## Masters 3 0 2 11
## Preschool 1 0 0 2
## Prof-school 0 0 0 14
## Some-college 13 9 10 58
##
## Poland Portugal Puerto-Rico Scotland South Taiwan
## 10th 1 6 4 0 0 0
## 11th 3 2 18 1 3 0
## 12th 0 1 3 0 2 0
## 1st-4th 1 9 5 0 0 0
## 5th-6th 0 1 7 1 0 0
## 7th-8th 5 11 14 0 1 0
## 9th 1 6 9 0 0 0
## Assoc-acdm 3 1 3 1 1 1
## Assoc-voc 7 2 3 0 3 0
## Bachelors 13 2 17 4 30 22
## Doctorate 1 0 0 0 2 11
## HS-grad 28 24 68 8 37 5
## Masters 8 1 1 2 9 16
## Preschool 0 0 1 0 1 0
## Prof-school 0 0 0 0 2 4
## Some-college 16 1 31 4 24 6
##
## Thailand Trinadad&Tobago United-States Vietnam Yugoslavia
## 10th 0 0 1276 3 1
## 11th 0 1 1666 3 0
## 12th 1 1 579 1 0
## 1st-4th 1 0 70 3 0
## 5th-6th 0 1 178 3 1
## 7th-8th 0 1 765 1 1
## 9th 0 3 599 1 1
## Assoc-acdm 3 1 1492 3 2
## Assoc-voc 2 0 1941 3 1
## Bachelors 6 0 7394 17 4
## Doctorate 1 0 510 1 0
## HS-grad 10 9 14768 25 9
## Masters 2 3 2427 0 0
## Preschool 0 0 25 0 0
## Prof-school 1 0 750 1 0
## Some-college 3 7 10249 21 3
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Education-num y Marital-status :
##
## Divorced Married-AF-spouse Married-civ-spouse Married-spouse-absent
## 1 2 0 30 5
## 2 17 0 125 18
## 3 31 0 271 27
## 4 101 0 541 20
## 5 98 0 349 13
## 6 172 1 525 22
## 7 192 0 545 25
## 8 63 0 199 10
## 9 2416 15 7243 199
## 10 1560 9 4179 109
## 11 361 2 1013 19
## 12 280 2 697 21
## 13 843 6 4136 98
## 14 367 0 1527 24
## 15 74 1 596 5
## 16 56 1 403 13
##
## Never-married Separated Widowed
## 1 38 3 5
## 2 50 12 25
## 3 128 31 21
## 4 158 39 96
## 5 220 41 35
## 6 525 75 69
## 7 913 79 58
## 8 348 19 18
## 9 4671 607 633
## 10 4453 317 251
## 11 539 64 63
## 12 522 43 36
## 13 2681 136 125
## 14 635 44 60
## 15 139 9 10
## 16 97 11 13
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Education-num y Occupation :
##
## Adm-clerical Armed-Forces Craft-repair Exec-managerial
## 1 3 0 6 1
## 2 6 0 28 6
## 3 8 0 71 6
## 4 20 0 172 28
## 5 20 0 144 23
## 6 59 0 239 42
## 7 100 0 270 51
## 8 52 1 92 18
## 9 2047 5 2911 1192
## 10 1858 4 1258 1288
## 11 269 1 375 234
## 12 281 0 167 240
## 13 765 1 332 2025
## 14 105 2 34 779
## 15 12 1 9 69
## 16 6 0 4 84
##
## Farming-fishing Handlers-cleaners Machine-op-inspct Other-service
## 1 17 5 12 22
## 2 33 26 36 55
## 3 52 59 95 98
## 4 106 66 129 149
## 5 44 72 102 142
## 6 71 108 152 280
## 7 67 177 153 368
## 8 29 55 61 129
## 9 573 943 1531 1936
## 10 253 400 492 1171
## 11 85 43 95 160
## 12 25 34 51 110
## 13 113 79 99 259
## 14 14 5 12 35
## 15 7 0 1 7
## 16 1 0 1 2
##
## Priv-house-serv Prof-specialty Protective-serv Sales Tech-support
## 1 2 11 0 2 0
## 2 14 22 1 8 0
## 3 20 43 1 17 1
## 4 17 124 11 40 6
## 5 16 73 9 47 3
## 6 8 164 12 120 5
## 7 18 215 18 232 9
## 8 8 71 11 70 4
## 9 91 1156 326 1580 270
## 10 26 1481 308 1503 426
## 11 5 328 67 163 181
## 12 3 279 50 209 116
## 13 12 2486 147 1268 346
## 14 1 1369 20 206 61
## 15 0 691 1 23 10
## 16 1 468 1 16 8
##
## Transport-moving
## 1 2
## 2 12
## 3 38
## 4 87
## 5 61
## 6 129
## 7 134
## 8 56
## 9 1223
## 10 410
## 11 55
## 12 36
## 13 93
## 14 14
## 15 3
## 16 2
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Education-num y Relationship :
##
## Husband Not-in-family Other-relative Own-child Unmarried Wife
## 1 23 38 6 7 6 3
## 2 111 67 23 11 29 6
## 3 238 105 61 19 65 21
## 4 503 208 46 66 103 29
## 5 305 160 44 108 104 35
## 6 471 303 62 326 181 46
## 7 479 332 90 651 216 44
## 8 166 128 38 239 65 21
## 9 6388 3841 607 2287 1937 724
## 10 3658 2729 299 2573 1151 468
## 11 883 536 38 212 274 118
## 12 581 483 30 198 201 108
## 13 3636 2503 132 774 516 464
## 14 1339 825 19 82 212 180
## 15 558 182 7 17 32 38
## 16 377 143 4 11 33 26
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Education-num y Race :
##
## Amer-Indian-Eskimo Asian-Pac-Islander Black Other White
## 1 1 7 12 2 61
## 2 4 10 24 13 196
## 3 2 28 41 23 415
## 4 10 14 90 23 818
## 5 9 10 111 15 611
## 6 22 16 182 11 1158
## 7 26 27 252 22 1485
## 8 5 15 105 17 515
## 9 176 336 1780 105 13387
## 10 124 302 1078 85 9289
## 11 31 53 165 9 1803
## 12 13 49 161 10 1368
## 13 29 408 504 50 7034
## 14 13 140 143 13 2348
## 15 2 58 21 5 748
## 16 3 46 16 3 526
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Education-num y Sex :
##
## Female Male
## 1 24 59
## 2 61 186
## 3 127 382
## 4 239 716
## 5 220 536
## 6 457 932
## 7 650 1162
## 8 211 446
## 9 5097 10687
## 10 4178 6700
## 11 734 1327
## 12 627 974
## 13 2477 5548
## 14 845 1812
## 15 132 702
## 16 113 481
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Education-num y Native-country :
##
## Cambodia Canada China Columbia Cuba Dominican-Republic Ecuador
## 1 1 0 2 0 0 2 0
## 2 1 0 1 2 5 10 1
## 3 0 1 2 2 8 5 2
## 4 1 5 5 2 7 12 2
## 5 0 3 2 4 6 7 0
## 6 0 2 3 4 4 4 2
## 7 0 5 0 2 2 4 1
## 8 1 3 1 3 5 8 2
## 9 9 51 28 34 33 26 14
## 10 8 34 9 11 22 13 11
## 11 2 11 0 5 6 2 2
## 12 0 4 0 4 4 1 0
## 13 5 38 29 6 19 6 5
## 14 0 11 24 1 9 3 3
## 15 0 3 3 4 6 0 0
## 16 0 11 13 1 2 0 0
##
## El-Salvador England France Germany Greece Guatemala Haiti
## 1 7 0 0 0 0 2 4
## 2 13 1 0 0 0 9 1
## 3 29 1 0 0 1 12 3
## 4 6 0 1 2 3 11 2
## 5 15 1 0 0 0 6 4
## 6 6 4 0 2 3 6 3
## 7 7 2 0 5 1 7 3
## 8 2 1 0 4 0 3 2
## 9 34 37 6 47 17 17 24
## 10 18 15 6 50 8 9 17
## 11 1 3 1 16 5 2 0
## 12 4 6 3 11 0 1 4
## 13 8 34 10 50 5 3 5
## 14 3 12 7 8 5 0 2
## 15 1 3 1 5 1 0 1
## 16 1 7 3 6 0 0 0
##
## Holand-Netherlands Honduras Hong Hungary India Iran Ireland Italy
## 1 0 0 1 0 1 0 0 0
## 2 0 1 0 0 0 0 0 4
## 3 0 3 1 1 0 0 0 12
## 4 0 0 1 0 0 0 2 7
## 5 0 0 2 0 1 0 1 0
## 6 0 1 0 0 2 0 0 3
## 7 0 2 1 0 8 0 2 1
## 8 0 0 0 0 0 0 0 7
## 9 0 5 7 6 9 7 17 32
## 10 1 5 1 2 17 12 3 10
## 11 0 0 1 1 3 2 2 2
## 12 0 0 2 1 5 5 1 4
## 13 0 2 6 5 37 18 8 14
## 14 0 0 5 2 36 10 1 6
## 15 0 1 0 1 22 0 0 2
## 16 0 0 2 0 10 5 0 1
##
## Jamaica Japan Laos Mexico Nicaragua Outlying-US(Guam-USVI-etc) Peru
## 1 1 0 1 31 1 0 0
## 2 0 0 0 103 1 0 0
## 3 1 1 3 216 3 0 0
## 4 1 0 1 79 1 0 2
## 5 2 0 0 76 1 2 0
## 6 2 1 0 42 1 0 2
## 7 5 0 1 40 3 1 3
## 8 2 0 0 23 1 0 0
## 9 38 23 6 178 15 4 16
## 10 27 19 4 91 13 9 10
## 11 3 2 2 13 1 1 1
## 12 8 2 1 4 1 1 2
## 13 10 26 3 43 3 5 8
## 14 4 11 0 9 3 0 2
## 15 0 5 1 2 0 0 0
## 16 2 2 0 1 1 0 0
##
## Philippines Poland Portugal Puerto-Rico Scotland South Taiwan
## 1 2 0 0 1 0 1 0
## 2 5 1 9 5 0 0 0
## 3 10 0 1 7 1 0 0
## 4 3 5 11 14 0 1 0
## 5 3 1 6 9 0 0 0
## 6 1 1 6 4 0 0 0
## 7 9 3 2 18 1 3 0
## 8 1 0 1 3 0 2 0
## 9 53 28 24 68 8 37 5
## 10 58 16 1 31 4 24 6
## 11 9 7 2 3 0 3 0
## 12 11 3 1 3 1 1 1
## 13 105 13 2 17 4 30 22
## 14 11 8 1 1 2 9 16
## 15 14 0 0 0 0 2 4
## 16 0 1 0 0 0 2 11
##
## Thailand Trinadad&Tobago United-States Vietnam Yugoslavia
## 1 0 0 25 0 0
## 2 1 0 70 3 0
## 3 0 1 178 3 1
## 4 0 1 765 1 1
## 5 0 3 599 1 1
## 6 0 0 1276 3 1
## 7 0 1 1666 3 0
## 8 1 1 579 1 0
## 9 10 9 14768 25 9
## 10 3 7 10249 21 3
## 11 2 0 1941 3 1
## 12 3 1 1492 3 2
## 13 6 0 7394 17 4
## 14 2 3 2427 0 0
## 15 1 0 750 1 0
## 16 1 0 510 1 0
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Marital-status y Occupation :
##
## Adm-clerical Armed-Forces Craft-repair
## Divorced 1192 0 679
## Married-AF-spouse 6 0 4
## Married-civ-spouse 1495 7 3818
## Married-spouse-absent 84 0 77
## Never-married 2360 8 1301
## Separated 224 0 160
## Widowed 250 0 73
##
## Exec-managerial Farming-fishing Handlers-cleaners
## Divorced 890 90 197
## Married-AF-spouse 3 1 1
## Married-civ-spouse 3600 869 724
## Married-spouse-absent 52 35 32
## Never-married 1260 434 1029
## Separated 126 23 63
## Widowed 155 38 26
##
## Machine-op-inspct Other-service Priv-house-serv
## Divorced 434 762 46
## Married-AF-spouse 1 5 0
## Married-civ-spouse 1469 1088 27
## Married-spouse-absent 37 92 9
## Never-married 872 2442 99
## Separated 123 275 21
## Widowed 86 259 40
##
## Prof-specialty Protective-serv Sales Tech-support
## Divorced 1065 121 664 239
## Married-AF-spouse 9 1 5 0
## Married-civ-spouse 4110 583 2491 609
## Married-spouse-absent 109 7 55 9
## Never-married 3091 237 1992 506
## Separated 242 23 146 48
## Widowed 355 11 151 35
##
## Transport-moving
## Divorced 254
## Married-AF-spouse 1
## Married-civ-spouse 1489
## Married-spouse-absent 30
## Never-married 486
## Separated 56
## Widowed 39
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Marital-status y Relationship :
##
## Husband Not-in-family Other-relative Own-child
## Divorced 0 3628 181 455
## Married-AF-spouse 12 0 1 1
## Married-civ-spouse 19704 23 201 143
## Married-spouse-absent 0 330 54 61
## Never-married 0 7114 920 6750
## Separated 0 637 79 146
## Widowed 0 851 70 25
##
## Unmarried Wife
## Divorced 2369 0
## Married-AF-spouse 0 23
## Married-civ-spouse 0 2308
## Married-spouse-absent 183 0
## Never-married 1333 0
## Separated 668 0
## Widowed 572 0
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Marital-status y Race :
##
## Amer-Indian-Eskimo Asian-Pac-Islander Black Other
## Divorced 90 108 709 42
## Married-AF-spouse 0 1 3 0
## Married-civ-spouse 168 737 1263 157
## Married-spouse-absent 12 64 89 17
## Never-married 163 544 2032 160
## Separated 17 26 396 21
## Widowed 20 39 193 9
##
## White
## Divorced 5684
## Married-AF-spouse 33
## Married-civ-spouse 20054
## Married-spouse-absent 446
## Never-married 13218
## Separated 1070
## Widowed 1257
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Marital-status y Sex :
##
## Female Male
## Divorced 4001 2632
## Married-AF-spouse 25 12
## Married-civ-spouse 2480 19899
## Married-spouse-absent 304 324
## Never-married 7218 8899
## Separated 931 599
## Widowed 1233 285
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Marital-status y Native-country :
##
## Cambodia Canada China Columbia Cuba
## Divorced 1 27 6 9 22
## Married-AF-spouse 0 0 0 0 0
## Married-civ-spouse 16 93 81 33 73
## Married-spouse-absent 1 4 8 6 3
## Never-married 9 47 21 27 26
## Separated 0 5 3 7 9
## Widowed 1 6 3 3 5
##
## Dominican-Republic Ecuador El-Salvador England
## Divorced 13 5 6 22
## Married-AF-spouse 0 0 0 0
## Married-civ-spouse 36 22 52 54
## Married-spouse-absent 11 3 8 1
## Never-married 36 11 76 41
## Separated 6 3 9 2
## Widowed 1 1 4 7
##
## France Germany Greece Guatemala Haiti
## Divorced 8 35 3 4 7
## Married-AF-spouse 0 0 0 0 0
## Married-civ-spouse 16 94 35 22 30
## Married-spouse-absent 2 4 0 7 7
## Never-married 11 58 9 45 26
## Separated 0 7 0 6 3
## Widowed 1 8 2 4 2
##
## Holand-Netherlands Honduras Hong Hungary India
## Divorced 0 5 0 2 4
## Married-AF-spouse 0 0 0 0 0
## Married-civ-spouse 0 4 19 9 89
## Married-spouse-absent 0 2 0 1 15
## Never-married 1 6 10 5 38
## Separated 0 3 1 0 2
## Widowed 0 0 0 2 3
##
## Iran Ireland Italy Jamaica Japan Laos Mexico
## Divorced 7 3 5 9 13 0 42
## Married-AF-spouse 0 0 0 0 0 0 0
## Married-civ-spouse 33 14 73 32 48 13 461
## Married-spouse-absent 1 2 2 8 2 2 55
## Never-married 14 17 16 48 26 8 330
## Separated 2 1 3 8 2 0 45
## Widowed 2 0 6 1 1 0 18
##
## Nicaragua Outlying-US(Guam-USVI-etc) Peru
## Divorced 4 5 5
## Married-AF-spouse 0 0 0
## Married-civ-spouse 18 5 17
## Married-spouse-absent 0 0 3
## Never-married 20 11 17
## Separated 4 1 4
## Widowed 3 1 0
##
## Philippines Poland Portugal Puerto-Rico Scotland
## Divorced 21 4 7 20 3
## Married-AF-spouse 1 0 0 0 0
## Married-civ-spouse 146 44 41 68 10
## Married-spouse-absent 13 10 0 9 0
## Never-married 96 21 13 59 6
## Separated 10 2 3 18 2
## Widowed 8 6 3 10 0
##
## South Taiwan Thailand Trinadad&Tobago
## Divorced 11 1 3 2
## Married-AF-spouse 0 0 0 0
## Married-civ-spouse 54 36 13 14
## Married-spouse-absent 4 3 2 0
## Never-married 39 25 10 10
## Separated 2 0 0 1
## Widowed 5 0 2 0
##
## United-States Vietnam Yugoslavia
## Divorced 6280 7 2
## Married-AF-spouse 36 0 0
## Married-civ-spouse 20411 35 15
## Married-spouse-absent 426 3 0
## Never-married 14782 41 5
## Separated 1356 0 0
## Widowed 1398 0 1
## chi cuadrado p-valor: 0.0009995002
##
## Tabla de contingencia entre Occupation y Relationship :
##
## Husband Not-in-family Other-relative Own-child
## Adm-clerical 932 1730 211 1144
## Armed-Forces 6 5 2 2
## Craft-repair 3731 1217 148 587
## Exec-managerial 3231 1564 74 349
## Farming-fishing 841 277 52 210
## Handlers-cleaners 673 458 138 618
## Machine-op-inspct 1323 720 120 401
## Other-service 772 1465 279 1305
## Priv-house-serv 2 97 30 35
## Prof-specialty 3404 2575 185 1344
## Protective-serv 563 220 22 113
## Sales 2259 1344 166 1058
## Tech-support 529 447 36 181
## Transport-moving 1450 464 43 234
##
## Unmarried Wife
## Adm-clerical 1072 522
## Armed-Forces 0 0
## Craft-repair 381 48
## Exec-managerial 523 345
## Farming-fishing 92 18
## Handlers-cleaners 156 29
## Machine-op-inspct 340 118
## Other-service 832 270
## Priv-house-serv 61 17
## Prof-specialty 812 661
## Protective-serv 55 10
## Sales 481 196
## Tech-support 180 73
## Transport-moving 140 24
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Occupation y Race :
##
## Amer-Indian-Eskimo Asian-Pac-Islander Black Other
## Adm-clerical 54 198 738 43
## Armed-Forces 1 0 1 0
## Craft-repair 61 118 371 58
## Exec-managerial 48 193 351 20
## Farming-fishing 14 23 56 14
## Handlers-cleaners 34 39 262 18
## Machine-op-inspct 28 85 429 47
## Other-service 61 184 824 51
## Priv-house-serv 1 5 51 4
## Prof-specialty 83 385 700 76
## Protective-serv 13 21 149 8
## Sales 36 167 353 38
## Tech-support 8 62 132 9
## Transport-moving 28 39 268 20
##
## White
## Adm-clerical 4578
## Armed-Forces 13
## Craft-repair 5504
## Exec-managerial 5474
## Farming-fishing 1383
## Handlers-cleaners 1719
## Machine-op-inspct 2433
## Other-service 3803
## Priv-house-serv 181
## Prof-specialty 7737
## Protective-serv 792
## Sales 4910
## Tech-support 1235
## Transport-moving 2000
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Occupation y Sex :
##
## Female Male
## Adm-clerical 3769 1842
## Armed-Forces 0 15
## Craft-repair 323 5789
## Exec-managerial 1748 4338
## Farming-fishing 95 1395
## Handlers-cleaners 254 1818
## Machine-op-inspct 804 2218
## Other-service 2698 2225
## Priv-house-serv 228 14
## Prof-specialty 3515 5466
## Protective-serv 122 861
## Sales 1947 3557
## Tech-support 562 884
## Transport-moving 127 2228
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Occupation y Native-country :
##
## Cambodia Canada China Columbia Cuba
## Adm-clerical 0 14 4 9 19
## Armed-Forces 0 0 0 0 0
## Craft-repair 9 23 5 13 8
## Exec-managerial 1 20 16 6 20
## Farming-fishing 1 3 0 1 3
## Handlers-cleaners 0 2 0 5 5
## Machine-op-inspct 5 8 10 14 9
## Other-service 1 20 26 11 17
## Priv-house-serv 0 0 0 3 2
## Prof-specialty 6 60 49 11 22
## Protective-serv 0 2 0 0 7
## Sales 4 17 6 4 15
## Tech-support 0 4 4 3 1
## Transport-moving 1 9 2 5 10
##
## Dominican-Republic Ecuador El-Salvador England France
## Adm-clerical 9 4 4 9 3
## Armed-Forces 0 0 0 0 0
## Craft-repair 11 8 23 12 1
## Exec-managerial 4 4 2 34 9
## Farming-fishing 0 0 3 1 1
## Handlers-cleaners 7 2 11 2 0
## Machine-op-inspct 27 9 7 3 0
## Other-service 19 4 56 9 2
## Priv-house-serv 1 1 11 4 2
## Prof-specialty 9 6 15 35 13
## Protective-serv 1 0 1 3 1
## Sales 10 3 12 9 2
## Tech-support 0 1 2 5 4
## Transport-moving 5 3 8 1 0
##
## Germany Greece Guatemala Haiti Holand-Netherlands
## Adm-clerical 32 2 4 7 0
## Armed-Forces 0 0 0 0 0
## Craft-repair 23 8 15 6 0
## Exec-managerial 29 18 2 0 0
## Farming-fishing 3 0 5 2 0
## Handlers-cleaners 5 1 11 3 0
## Machine-op-inspct 8 2 12 7 1
## Other-service 12 6 10 26 0
## Priv-house-serv 1 0 14 1 0
## Prof-specialty 53 3 4 12 0
## Protective-serv 7 0 1 1 0
## Sales 20 6 6 3 0
## Tech-support 8 1 2 1 0
## Transport-moving 5 2 2 6 0
##
## Honduras Hong Hungary India Iran Ireland Italy
## Adm-clerical 2 4 1 18 3 2 13
## Armed-Forces 0 0 0 0 0 0 0
## Craft-repair 3 5 5 10 5 10 19
## Exec-managerial 1 4 4 16 12 4 8
## Farming-fishing 0 1 0 0 0 1 2
## Handlers-cleaners 2 0 0 3 0 2 7
## Machine-op-inspct 2 2 0 5 4 4 11
## Other-service 2 2 1 4 6 4 14
## Priv-house-serv 1 0 1 0 0 0 0
## Prof-specialty 3 8 5 61 18 8 20
## Protective-serv 1 1 0 4 0 0 0
## Sales 3 2 0 21 8 2 7
## Tech-support 0 1 1 7 1 0 1
## Transport-moving 0 0 1 2 2 0 3
##
## Jamaica Japan Laos Mexico Nicaragua
## Adm-clerical 21 13 4 53 10
## Armed-Forces 0 0 0 0 0
## Craft-repair 7 7 5 149 6
## Exec-managerial 12 23 1 26 0
## Farming-fishing 0 1 0 124 0
## Handlers-cleaners 4 2 0 108 5
## Machine-op-inspct 2 2 8 141 7
## Other-service 25 13 1 160 8
## Priv-house-serv 2 0 0 23 3
## Prof-specialty 14 19 2 72 3
## Protective-serv 1 1 0 5 0
## Sales 8 9 2 53 2
## Tech-support 5 2 0 7 3
## Transport-moving 5 0 0 30 2
##
## Outlying-US(Guam-USVI-etc) Peru Philippines Poland
## Adm-clerical 3 1 56 3
## Armed-Forces 0 0 0 0
## Craft-repair 2 3 22 22
## Exec-managerial 4 2 18 6
## Farming-fishing 0 1 6 1
## Handlers-cleaners 3 4 15 8
## Machine-op-inspct 1 6 20 10
## Other-service 4 11 52 8
## Priv-house-serv 0 0 2 2
## Prof-specialty 3 7 58 15
## Protective-serv 0 2 5 1
## Sales 2 6 22 6
## Tech-support 0 1 15 2
## Transport-moving 1 2 4 3
##
## Portugal Puerto-Rico Scotland South Taiwan Thailand
## Adm-clerical 6 29 3 5 3 3
## Armed-Forces 0 0 0 0 0 0
## Craft-repair 18 21 0 11 2 2
## Exec-managerial 3 14 4 19 13 6
## Farming-fishing 3 8 0 0 0 1
## Handlers-cleaners 6 7 0 3 0 0
## Machine-op-inspct 15 22 2 4 0 1
## Other-service 6 33 5 15 2 8
## Priv-house-serv 0 2 0 0 0 1
## Prof-specialty 5 21 4 28 36 5
## Protective-serv 1 4 2 0 0 1
## Sales 3 10 1 29 7 1
## Tech-support 0 2 0 0 2 0
## Transport-moving 1 11 0 1 0 1
##
## Trinadad&Tobago United-States Vietnam Yugoslavia
## Adm-clerical 6 5211 17 1
## Armed-Forces 0 15 0 0
## Craft-repair 2 5595 13 3
## Exec-managerial 1 5708 4 8
## Farming-fishing 0 1315 2 1
## Handlers-cleaners 0 1834 5 0
## Machine-op-inspct 4 2613 11 3
## Other-service 6 4298 11 5
## Priv-house-serv 0 164 0 1
## Prof-specialty 3 8256 9 0
## Protective-serv 1 929 0 0
## Sales 1 5174 8 0
## Tech-support 2 1354 4 0
## Transport-moving 1 2223 2 1
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Relationship y Race :
##
## Amer-Indian-Eskimo Asian-Pac-Islander Black Other White
## Husband 135 589 1011 123 17858
## Not-in-family 125 325 1223 105 10805
## Other-relative 22 123 242 41 1078
## Own-child 67 249 839 57 6369
## Unmarried 93 130 1144 57 3701
## Wife 28 103 226 23 1951
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Relationship y Sex :
##
## Female Male
## Husband 1 19715
## Not-in-family 5870 6713
## Other-relative 689 817
## Own-child 3376 4205
## Unmarried 3928 1197
## Wife 2328 3
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Relationship y Native-country :
##
## Cambodia Canada China Columbia Cuba Dominican-Republic
## Husband 11 77 63 27 64 28
## Not-in-family 7 55 24 23 28 24
## Other-relative 5 1 11 7 6 10
## Own-child 2 16 7 8 18 15
## Unmarried 2 18 6 17 15 20
## Wife 1 15 11 3 7 6
##
## Ecuador El-Salvador England France Germany Greece
## Husband 17 41 44 15 82 31
## Not-in-family 9 29 47 14 54 8
## Other-relative 10 29 1 0 3 2
## Own-child 2 26 16 3 26 4
## Unmarried 6 23 10 5 29 2
## Wife 1 7 9 1 12 2
##
## Guatemala Haiti Holand-Netherlands Honduras Hong
## Husband 21 24 0 1 12
## Not-in-family 29 12 0 4 7
## Other-relative 17 10 1 3 3
## Own-child 9 8 0 1 0
## Unmarried 11 16 0 8 1
## Wife 1 5 0 3 7
##
## Hungary India Iran Ireland Italy Jamaica Japan Laos
## Husband 8 82 29 13 63 27 41 9
## Not-in-family 9 34 12 19 18 30 29 4
## Other-relative 1 10 1 2 3 11 1 1
## Own-child 0 14 7 1 9 13 7 3
## Unmarried 0 8 6 2 4 21 8 2
## Wife 1 3 4 0 8 4 6 4
##
## Mexico Nicaragua Outlying-US(Guam-USVI-etc) Peru
## Husband 400 13 4 14
## Not-in-family 183 4 12 7
## Other-relative 133 10 1 3
## Own-child 84 9 2 10
## Unmarried 123 10 3 9
## Wife 28 3 1 3
##
## Philippines Poland Portugal Puerto-Rico Scotland South
## Husband 107 38 36 52 7 42
## Not-in-family 52 24 8 48 5 18
## Other-relative 28 5 1 12 1 12
## Own-child 53 8 10 16 1 21
## Unmarried 29 8 7 40 4 14
## Wife 26 4 5 16 3 8
##
## Taiwan Thailand Trinadad&Tobago United-States Vietnam
## Husband 30 10 11 18078 30
## Not-in-family 14 8 6 11646 15
## Other-relative 2 2 2 1133 12
## Own-child 10 3 1 7122 13
## Unmarried 4 5 4 4611 13
## Wife 5 2 3 2099 3
##
## Yugoslavia
## Husband 14
## Not-in-family 4
## Other-relative 0
## Own-child 3
## Unmarried 1
## Wife 1
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Race y Sex :
##
## Female Male
## Amer-Indian-Eskimo 185 285
## Asian-Pac-Islander 517 1002
## Black 2308 2377
## Other 155 251
## White 13027 28735
## chi cuadrado p-valor: 0.0004997501
##
## Tabla de contingencia entre Race y Native-country :
##
## Cambodia Canada China Columbia Cuba
## Amer-Indian-Eskimo 0 0 0 1 0
## Asian-Pac-Islander 24 1 119 0 0
## Black 2 0 0 0 5
## Other 0 2 0 9 3
## White 2 179 3 75 130
##
## Dominican-Republic Ecuador El-Salvador England
## Amer-Indian-Eskimo 0 0 0 0
## Asian-Pac-Islander 1 0 0 2
## Black 18 1 1 11
## Other 21 12 8 1
## White 63 32 146 113
##
## France Germany Greece Guatemala Haiti
## Amer-Indian-Eskimo 0 1 0 0 0
## Asian-Pac-Islander 1 4 1 1 1
## Black 1 11 0 0 72
## Other 1 1 0 8 0
## White 35 189 48 79 2
##
## Holand-Netherlands Honduras Hong Hungary India Iran
## Amer-Indian-Eskimo 0 0 1 0 0 0
## Asian-Pac-Islander 0 0 25 0 124 9
## Black 0 3 0 0 2 0
## Other 0 1 0 0 9 6
## White 1 16 4 19 16 44
##
## Ireland Italy Jamaica Japan Laos Mexico Nicaragua
## Amer-Indian-Eskimo 0 0 0 0 0 11 0
## Asian-Pac-Islander 1 0 0 53 23 1 1
## Black 0 0 98 4 0 5 3
## Other 0 0 3 3 0 63 4
## White 36 105 5 32 0 871 41
##
## Outlying-US(Guam-USVI-etc) Peru Philippines Poland
## Amer-Indian-Eskimo 0 0 1 0
## Asian-Pac-Islander 3 0 279 2
## Black 8 0 2 0
## Other 0 4 0 0
## White 12 42 13 85
##
## Portugal Puerto-Rico Scotland South Taiwan Thailand
## Amer-Indian-Eskimo 0 1 0 2 0 0
## Asian-Pac-Islander 1 1 0 110 61 25
## Black 0 13 1 1 0 1
## Other 1 30 0 1 1 0
## White 65 139 20 1 3 4
##
## Trinadad&Tobago United-States Vietnam Yugoslavia
## Amer-Indian-Eskimo 0 452 0 0
## Asian-Pac-Islander 4 557 84 0
## Black 21 4401 0 0
## Other 1 213 0 0
## White 1 39066 2 23
## chi-squared p-value: 0.0004997501
##
## Contingency table between Sex and Native-country :
##
## Cambodia Canada China Columbia Cuba Dominican-Republic Ecuador
## Female 6 63 33 32 50 48 16
## Male 22 119 89 53 88 55 29
##
## El-Salvador England France Germany Greece Guatemala Haiti
## Female 54 45 14 87 9 26 31
## Male 101 82 24 119 40 62 44
##
## Holand-Netherlands Honduras Hong Hungary India Iran Ireland
## Female 1 11 11 7 18 12 9
## Male 0 9 19 12 133 47 28
##
## Italy Jamaica Japan Laos Mexico Nicaragua
## Female 28 58 31 9 215 22
## Male 77 48 61 14 736 27
##
## Outlying-US(Guam-USVI-etc) Peru Philippines Poland Portugal
## Female 9 18 115 24 14
## Male 14 28 180 63 53
##
## Puerto-Rico Scotland South Taiwan Thailand Trinadad&Tobago
## Female 75 8 44 19 14 14
## Male 109 13 71 46 16 13
##
## United-States Vietnam Yugoslavia
## Female 14857 30 5
## Male 29832 56 18
## chi-squared p-value: 0.0004997501
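All of the reported p-values are below 0.05, so for every pair of categorical variables we reject the null hypothesis of independence: the variables in each pair are significantly associated with one another. Note that a significant chi-squared test only indicates that an association exists; it says nothing about how strong that association is.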
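As a hedged illustration of that last point, the sketch below re-runs the test on the Race and Sex counts shown above and then computes Cramér's V as an effect size. It is not necessarily the code that produced the output in this report. The repeated p-value 0.0004997501 equals 1/2001, which is the smallest value `chisq.test` can return with `simulate.p.value = TRUE` and the default `B = 2000`, so the sketch assumes that setting. The seed, the object names `tab`, `res` and `v`, and the helper `cramers_v` are illustrative choices, not part of the original analysis.

```r
# Race x Sex counts copied from the contingency table above.
tab <- matrix(
  c(  185,   285,
      517,  1002,
     2308,  2377,
      155,   251,
    13027, 28735),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    Race = c("Amer-Indian-Eskimo", "Asian-Pac-Islander",
             "Black", "Other", "White"),
    Sex  = c("Female", "Male")
  )
)

# Monte Carlo chi-squared test (assumed setting, see the text above).
# With B simulated tables the smallest attainable p-value is 1 / (B + 1).
set.seed(1)  # arbitrary seed, only to make the simulation reproducible
res <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
res$p.value

# Cramer's V (hypothetical helper): sqrt(chi2 / (n * (min(rows, cols) - 1))).
# It ranges from 0 (no association) to 1 (perfect association).
cramers_v <- function(tab) {
  chi2 <- unname(suppressWarnings(chisq.test(tab, correct = FALSE))$statistic)
  n <- sum(tab)
  k <- min(dim(tab)) - 1
  sqrt(chi2 / (n * k))
}

v <- cramers_v(tab)
v
```

With close to 49 000 observations even a weak association yields a very small p-value, so reporting an effect size such as Cramér's V alongside the test helps judge whether each relationship is large enough to matter in practice.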