#1. Load the dataset
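A minimal sketch of this step. It assumes the data come from the carData package, which the help-page footer in section 2 names, and that the object is called `bd`, the name that appears in the summaries in section 3. The original loading code is not shown, so both are assumptions.

```r
# Load the Chile survey data shipped with the carData package.
# Assumption: carData is installed; `bd` is the object name seen in later output.
library(carData)

bd <- Chile

# Quick sanity checks on size and column names
dim(bd)
names(bd)
```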
#2. Description of the dataset
Chilean Plebiscite
Description: The Chile data frame has 2700 rows and 8 columns. The data are from a national survey conducted in April and May of 1988 by FLACSO/Chile. There are some missing data.
Usage: Chile
Format: This data frame contains the following columns:
region: A factor with levels: C, Central; M, Metropolitan Santiago area; N, North; S, South; SA, city of Santiago.
population: Population size of respondent’s community.
sex: A factor with levels: F, female; M, male.
age: in years.
education: A factor with levels (note: out of order): P, Primary; PS, Post-secondary; S, Secondary.
income: Monthly income, in Pesos.
statusquo: Scale of support for the status-quo.
vote: A factor with levels: A, will abstain; N, will vote no (against Pinochet); U, undecided; Y, will vote yes (for Pinochet).
Source: Personal communication from FLACSO/Chile.
References:
Fox, J. (2016) Applied Regression Analysis and Generalized Linear Models, Third Edition. Sage.
Fox, J. and Weisberg, S. (2019) An R Companion to Applied Regression, Third Edition. Sage.
[Package carData version 3.0-5 Index]
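The help page notes that the education levels are out of order. One possible way to put them in their natural order is sketched here. It uses a small made-up vector with the same three codes, not the survey data, and the name `education_ord` is hypothetical.

```r
# Toy vector with the same three codes as the education column (not survey data)
education <- factor(c("P", "PS", "S", "S", "P"))
levels(education)   # default order is alphabetical

# Re-declare the levels in their natural order:
# Primary < Secondary < Post-secondary
education_ord <- factor(education, levels = c("P", "S", "PS"), ordered = TRUE)
levels(education_ord)
table(education_ord)
```

On the real data the same call could be applied to `bd$education`, assuming the data frame is named `bd`.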
#3. Dataset analysis
To obtain a summary of descriptive statistics and frequencies:
summary(bd): Provides a basic statistical summary of each column of the data frame.
Desc(bd): Provides a detailed summary and plots for each variable in the data frame, using the DescTools package.
dfSummary(bd): Provides a detailed summary with statistics and graphs, using the summarytools package.
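The three calls listed above can be sketched together as follows. It assumes carData is installed and treats DescTools and summarytools as optional, so the script still runs without them. `plotit = FALSE` is only a choice made here to keep the DescTools result text-only; the original chunks are not shown.

```r
# Assumption: carData is installed; DescTools and summarytools are optional here.
bd <- carData::Chile

# Base R: quartiles for numeric columns, counts for factors, NA counts for both
summary(bd)

# DescTools: per-variable description; plotit = FALSE keeps it text-only
if (requireNamespace("DescTools", quietly = TRUE)) {
  print(DescTools::Desc(bd, plotit = FALSE))
}

# summarytools: one-table overview with text histograms
if (requireNamespace("summarytools", quietly = TRUE)) {
  print(summarytools::dfSummary(bd))
}
```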
## region population sex age education
## C :600 Min. : 3750 F:1379 Min. :18.00 P :1107
## M :100 1st Qu.: 25000 M:1321 1st Qu.:26.00 PS : 462
## N :322 Median :175000 Median :36.00 S :1120
## S :718 Mean :152222 Mean :38.55 NA's: 11
## SA:960 3rd Qu.:250000 3rd Qu.:49.00
## Max. :250000 Max. :70.00
## NA's :1
## income statusquo vote
## Min. : 2500 Min. :-1.80301 A :187
## 1st Qu.: 7500 1st Qu.:-1.00223 N :889
## Median : 15000 Median :-0.04558 U :588
## Mean : 33876 Mean : 0.00000 Y :868
## 3rd Qu.: 35000 3rd Qu.: 0.96857 NA's:168
## Max. :200000 Max. : 2.04859
## NA's :98 NA's :17
## ──────────────────────────────────────────────────────────────────────────────
## Describe bd (data.frame):
##
## data frame: 2700 obs. of 8 variables
## 2431 complete cases (90.0%)
##
## Nr Class ColName NAs Levels
## 1 fac region . (5): 1-C, 2-M, 3-N, 4-S, 5-SA
## 2 int population .
## 3 fac sex . (2): 1-F, 2-M
## 4 int age 1 (0.0%)
## 5 fac education 11 (0.4%) (3): 1-P, 2-PS, 3-S
## 6 int income 98 (3.6%)
## 7 num statusquo 17 (0.6%)
## 8 fac vote 168 (6.2%) (4): 1-A, 2-N, 3-U, 4-Y
##
##
## ──────────────────────────────────────────────────────────────────────────────
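The missing-value percentages in the overview above are plain arithmetic on the NA counts. A small sketch that reproduces them from the counts printed in the table (the variable names are hypothetical):

```r
# Counts copied from the overview table above
n_rows <- 2700
na_counts <- c(age = 1, education = 11, income = 98, statusquo = 17, vote = 168)

# Share of missing values per column, in percent
na_pct <- round(100 * na_counts / n_rows, 1)
na_pct

# Share of complete cases reported in the overview (2431 of 2700)
complete_pct <- round(100 * 2431 / n_rows, 1)
complete_pct
```

With the data frame itself, `colSums(is.na(bd))` and `sum(complete.cases(bd))` would give the raw counts, assuming the object is named `bd`.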
## 1 - region (factor)
##
## length n NAs unique levels dupes
## 2'700 2'700 0 5 5 y
## 100.0% 0.0%
##
## level freq perc cumfreq cumperc
## 1 SA 960 35.6% 960 35.6%
## 2 S 718 26.6% 1'678 62.1%
## 3 C 600 22.2% 2'278 84.4%
## 4 N 322 11.9% 2'600 96.3%
## 5 M 100 3.7% 2'700 100.0%
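The frequency table above can be rebuilt by hand, which makes clear how `perc`, `cumfreq` and `cumperc` are derived. This sketch starts from the counts shown by `summary()` and sorts them by decreasing frequency, as the table does. The object names are hypothetical.

```r
# Counts per region, in the alphabetical level order shown by summary()
level <- c("C", "M", "N", "S", "SA")
freq  <- c(600, 100, 322, 718, 960)

# Sort by decreasing frequency, as in the table above
ord   <- order(freq, decreasing = TRUE)
level <- level[ord]
freq  <- freq[ord]

region_tab <- data.frame(
  level   = level,
  freq    = freq,
  perc    = round(100 * freq / sum(freq), 1),          # share of all rows
  cumfreq = cumsum(freq),                              # running total
  cumperc = round(100 * cumsum(freq) / sum(freq), 1)   # running share
)
region_tab
```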
## ──────────────────────────────────────────────────────────────────────────────
## 2 - population (integer)
##
## length n NAs unique 0s mean'
## 2'700 2'700 0 10 0 152'222.22
## 100.0% 0.0% 0.0%
##
## .05 .10 .25 median .75 .90
## 15'000.00 15'000.00 25'000.00 175'000.00 250'000.00 250'000.00
##
## range sd vcoef mad IQR skew
## 246'250.00 102'198.04 0.67 111'195.00 225'000.00 -0.27
##
## meanCI
## 148'365.63
## 156'078.81
##
## .95
## 250'000.00
##
## kurt
## -1.72
##
##
## value freq perc cumfreq cumperc
## 1 3750 20 0.7% 20 0.7%
## 2 8750 60 2.2% 80 3.0%
## 3 15000 300 11.1% 380 14.1%
## 4 25000 360 13.3% 740 27.4%
## 5 45000 120 4.4% 860 31.9%
## 6 62500 80 3.0% 940 34.8%
## 7 87500 80 3.0% 1'020 37.8%
## 8 125000 240 8.9% 1'260 46.7%
## 9 175000 140 5.2% 1'400 51.9%
## 10 250000 1'300 48.1% 2'700 100.0%
##
## ' 95%-CI (classic)
## ──────────────────────────────────────────────────────────────────────────────
## 3 - sex (factor - dichotomous)
##
## length n NAs unique
## 2'700 2'700 0 2
## 100.0% 0.0%
##
## freq perc lci.95 uci.95'
## F 1'379 51.1% 49.2% 53.0%
## M 1'321 48.9% 47.0% 50.8%
##
## ' 95%-CI (Wilson)
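The Wilson intervals above can be checked with base R. `prop.test()` without continuity correction returns the Wilson score interval, so this sketch should give the same bounds from the printed counts. The names `ci_f` and `ci_m` are hypothetical.

```r
# Counts taken from the output above: 1379 women among 2700 respondents
ci_f <- prop.test(x = 1379, n = 2700, correct = FALSE)$conf.int
round(100 * ci_f, 1)

# The interval for men uses the complementary count
ci_m <- prop.test(x = 1321, n = 2700, correct = FALSE)$conf.int
round(100 * ci_m, 1)
```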
## ──────────────────────────────────────────────────────────────────────────────
## 4 - age (integer)
##
## length n NAs unique 0s mean meanCI'
## 2'700 2'699 1 53 0 38.55 37.99
## 100.0% 0.0% 0.0% 39.11
##
## .05 .10 .25 median .75 .90 .95
## 19.00 21.00 26.00 36.00 49.00 61.00 66.00
##
## range sd vcoef mad IQR skew kurt
## 52.00 14.76 0.38 16.31 23.00 0.47 -0.86
##
## lowest : 18 (90), 19 (76), 20 (78), 21 (96), 22 (92)
## highest: 66 (24), 67 (24), 68 (25), 69 (16), 70 (56)
##
## ' 95%-CI (classic)
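The "classic" confidence interval for the mean is the usual t-based interval. A short sketch using the rounded summary statistics printed above (n, mean and sd are copied from the output, so the result is only as precise as those roundings):

```r
# Summary statistics for age, copied from the output above
n <- 2699      # non-missing observations
m <- 38.55     # mean
s <- 14.76     # standard deviation

se     <- s / sqrt(n)              # standard error of the mean
t_crit <- qt(0.975, df = n - 1)    # two-sided 95% critical value
ci     <- m + c(-1, 1) * t_crit * se
round(ci, 2)
```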
## ──────────────────────────────────────────────────────────────────────────────
## 5 - education (factor)
##
## length n NAs unique levels dupes
## 2'700 2'689 11 3 3 y
## 99.6% 0.4%
##
## level freq perc cumfreq cumperc
## 1 S 1'120 41.7% 1'120 41.7%
## 2 P 1'107 41.2% 2'227 82.8%
## 3 PS 462 17.2% 2'689 100.0%
## ──────────────────────────────────────────────────────────────────────────────
## 6 - income (integer)
##
## length n NAs unique 0s mean meanCI'
## 2'700 2'602 98 7 0 33'875.86 32'357.33
## 96.4% 3.6% 0.0% 35'394.40
##
## .05 .10 .25 median .75 .90 .95
## 2'500.00 7'500.00 7'500.00 15'000.00 35'000.00 75'000.00 125'000.00
##
## range sd vcoef mad IQR skew kurt
## 197'500.00 39'502.87 1.17 18'532.50 27'500.00 2.58 7.29
##
##
## value freq perc cumfreq cumperc
## 1 2500 160 6.1% 160 6.1%
## 2 7500 494 19.0% 654 25.1%
## 3 15000 768 29.5% 1'422 54.7%
## 4 35000 747 28.7% 2'169 83.4%
## 5 75000 269 10.3% 2'438 93.7%
## 6 125000 88 3.4% 2'526 97.1%
## 7 200000 76 2.9% 2'602 100.0%
##
## ' 95%-CI (classic)
## ──────────────────────────────────────────────────────────────────────────────
## 7 - statusquo (numeric)
##
## length n NAs unique 0s mean'
## 2'700 2'683 17 2'092 0 -1.118151e-08
## 99.4% 0.6% 0.0%
##
## .05 .10 .25 median .75 .90
## -1.296170 -1.257950 -1.002235 -0.045580 0.968575 1.403610
##
## range sd vcoef mad IQR skew
## 3.851600 1.000186 -8.945001e+07 1.453126 1.970810 0.161683
##
## meanCI
## -0.037863
## 0.037863
##
## .95
## 1.511120
##
## kurt
## -1.454072
##
## lowest : -1.803010, -1.744010, -1.725940, -1.481440, -1.343920
## highest: 1.68819, 1.69876, 1.71355, 2.02141, 2.04859
##
## heap(?): remarkable frequency (7.5%) for the mode(s) (= -1.29617)
##
## ' 95%-CI (classic)
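The very large `vcoef` value above is not an error in the data. The coefficient of variation is sd divided by mean, and the mean of this standardized scale is essentially zero, so the ratio explodes. A sketch with the printed values:

```r
# Mean and sd of statusquo, copied from the output above
m <- -1.118151e-08
s <- 1.000186

# Coefficient of variation: sd / mean
cv <- s / m
cv
```

For a variable centred at zero the coefficient of variation carries no useful information, so it is reasonable to ignore it here.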
## ──────────────────────────────────────────────────────────────────────────────
## 8 - vote (factor)
##
## length n NAs unique levels dupes
## 2'700 2'532 168 4 4 y
## 93.8% 6.2%
##
## level freq perc cumfreq cumperc
## 1 N 889 35.1% 889 35.1%
## 2 Y 868 34.3% 1'757 69.4%
## 3 U 588 23.2% 2'345 92.6%
## 4 A 187 7.4% 2'532 100.0%
Desc(bd): Provides a detailed summary and plots for each variable in the data frame; its output is shown above.
## Data Frame Summary
## bd
## Dimensions: 2700 x 8
## Duplicates: 9
##
## ----------------------------------------------------------------------------------------------------
## Variable Stats / Values Freqs (% of Valid) Graph Missing
## ------------ ------------------------------- ----------------------- --------------------- ---------
## region 1. C 600 (22.2%) IIII 0
## [factor] 2. M 100 ( 3.7%) (0.0%)
## 3. N 322 (11.9%) II
## 4. S 718 (26.6%) IIIII
## 5. SA 960 (35.6%) IIIIIII
##
## population Mean (sd) : 152222.2 (102198) 3750 : 20 ( 0.7%) 0
## [integer] min < med < max: 8750 : 60 ( 2.2%) (0.0%)
## 3750 < 175000 < 250000 15000 : 300 (11.1%) II
## IQR (CV) : 225000 (0.7) 25000 : 360 (13.3%) II
## 45000 : 120 ( 4.4%)
## 62500 : 80 ( 3.0%)
## 87500 : 80 ( 3.0%)
## 125000 : 240 ( 8.9%) I
## 175000 : 140 ( 5.2%) I
## 250000 : 1300 (48.1%) IIIIIIIII
##
## sex 1. F 1379 (51.1%) IIIIIIIIII 0
## [factor] 2. M 1321 (48.9%) IIIIIIIII (0.0%)
##
## age Mean (sd) : 38.5 (14.8) 53 distinct values : 1
## [integer] min < med < max: : . . (0.0%)
## 18 < 36 < 70 : : : : :
## IQR (CV) : 23 (0.4) : : : : : : . : . :
## : : : : : : : : : :
##
## education 1. P 1107 (41.2%) IIIIIIII 11
## [factor] 2. PS 462 (17.2%) III (0.4%)
## 3. S 1120 (41.7%) IIIIIIII
##
## income Mean (sd) : 33875.9 (39502.9) 2500 : 160 ( 6.1%) I 98
## [integer] min < med < max: 7500 : 494 (19.0%) III (3.6%)
## 2500 < 15000 < 2e+05 15000 : 768 (29.5%) IIIII
## IQR (CV) : 27500 (1.2) 35000 : 747 (28.7%) IIIII
## 75000 : 269 (10.3%) II
## 125000 : 88 ( 3.4%)
## 200000 : 76 ( 2.9%)
##
## statusquo Mean (sd) : 0 (1) 2092 distinct values : 17
## [numeric] min < med < max: : . (0.6%)
## -1.8 < 0 < 2 : : . . :
## IQR (CV) : 2 (-89450012) : : : : : :
## : : : : : : :
##
## vote 1. A 187 ( 7.4%) I 168
## [factor] 2. N 889 (35.1%) IIIIIII (6.2%)
## 3. U 588 (23.2%) IIII
## 4. Y 868 (34.3%) IIIIII
## ----------------------------------------------------------------------------------------------------
dfSummary(bd): Provides a detailed summary with statistics and graphs, using the summarytools package; its output is shown above.
To generate and view the preceding summarytools summary in HTML format:
htmltools::html_print(html_summary): Displays the rendered summary as HTML.
# Generate the summary as HTML
library(summarytools)
resumen <- dfSummary(bd,
                     varnumbers = FALSE,
                     valid.col = FALSE,
                     graph.magnif = 0.76)
# Render to an HTML object, then display it
html_summary <- print(resumen, method = "render")
htmltools::html_print(html_summary)
gt_plt_summary(bd, title = "Resumen de la base de datos"): Creates a summary table with a title, using the gtExtras package.
Resumen de la base de datos
2700 rows x 8 cols
Column | Plot Overview | Missing | Mean | Median | SD
---|---|---|---|---|---
region (SA, S, C, N and M) | | 0.0% | — | — | —
population | | 0.0% | 152,222.2 | 175,000.0 | 102,198.0
sex (F and M) | | 0.0% | — | — | —
age | | 0.0% | 38.5 | 36.0 | 14.8
education (S, P and PS) | | 0.4% | — | — | —
income | | 3.6% | 33,875.9 | 15,000.0 | 39,502.9
statusquo | | 0.6% | 0.0 | 0.0 | 1.0
vote (N, Y, U and A) | | 6.2% | — | — | —