The datos package

The datos package provides practice datasets in Spanish.

results <- help.search("datasets", package = "datos")
results$matches[c("Name", "Title")]

Exploring the data

  1. Load the data into a variable
library(datos)  # provides the vehiculos dataset
actuales <- vehiculos
  1. Use glimpse() to view the data
library(dplyr)  # provides glimpse(), select_if() and the %>% pipe
glimpse(actuales)
## Rows: 33,442
## Columns: 12
## $ id          <dbl> 13309, 13310, 13311, 14038, 14039, 14040, 14834, 14835, 1…
## $ fabricante  <chr> "Acura", "Acura", "Acura", "Acura", "Acura", "Acura", "Ac…
## $ modelo      <chr> "2.2CL/3.0CL", "2.2CL/3.0CL", "2.2CL/3.0CL", "2.3CL/3.0CL…
## $ anio        <dbl> 1997, 1997, 1997, 1998, 1998, 1998, 1999, 1999, 1999, 199…
## $ clase       <chr> "Automóviles subcompactos", "Automóviles subcompactos", "…
## $ transmision <chr> "Automática 4-velocidades", "Manual 5-velocidades", "Auto…
## $ traccion    <chr> "Delantera", "Delantera", "Delantera", "Delantera", "Dela…
## $ cilindros   <dbl> 4, 4, 6, 4, 4, 6, 4, 4, 6, 5, 5, 6, 5, 6, 5, 6, 6, 6, 6, …
## $ motor       <dbl> 2.2, 2.2, 3.0, 2.3, 2.3, 3.0, 2.3, 2.3, 3.0, 2.5, 2.5, 3.…
## $ combustible <chr> "Regular", "Regular", "Regular", "Regular", "Regular", "R…
## $ autopista   <dbl> 26, 28, 26, 27, 29, 26, 27, 29, 26, 23, 23, 22, 23, 22, 2…
## $ ciudad      <dbl> 20, 22, 18, 19, 21, 17, 20, 21, 17, 18, 18, 17, 18, 17, 1…
  1. skim(), from the skimr package, provides much more information
library(skimr)
skim(actuales)
Data summary
Name                    actuales
Number of rows          33442
Number of columns       12
_______________________
Column type frequency:
  character             6
  numeric               6
________________________
Group variables         None

Variable type: character

skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
fabricante             0              1    3   34      0       128           0
modelo                 0              1    1   39      0      3198           0
clase                  0              1    4   44      0        34           0
transmision            8              1   10   43      0        45           0
traccion               0              1    7   32      0         7           0
combustible            0              1    6   26      0        13           0

Variable type: numeric

skim_variable  n_missing  complete_rate      mean        sd    p0      p25      p50       p75     p100  hist
id                     0              1  17038.30  10087.01     1  8361.25  16723.5  25264.75  34932.0  ▇▇▇▇▇
anio                   0              1   1999.11      9.38  1984  1991.00   1999.0   2008.00   2015.0  ▇▆▅▆▇
cilindros             58              1      5.77      1.74     2     4.00      6.0      6.00     16.0  ▇▇▅▁▁
motor                 57              1      3.35      1.36     0     2.30      3.0      4.30      8.4  ▁▇▅▂▁
autopista              0              1     23.55      6.21     9    19.00     23.0     27.00    109.0  ▇▁▁▁▁
ciudad                 0              1     17.49      5.58     6    15.00     17.0     20.00    138.0  ▇▁▁▁▁
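The n_missing and complete_rate columns in the skim() summary can be cross-checked with base R. The following is a minimal sketch on a small made-up data frame; the name `ejemplo` and all of its values are illustrative assumptions, not rows taken from vehiculos.

```r
# Toy data frame; values are invented for illustration only
ejemplo <- data.frame(
  cilindros = c(4, 6, NA, 8),
  motor     = c(2.2, NA, NA, 5.0),
  ciudad    = c(20, 18, 17, 12)
)

# Number of missing values per column (what skim() reports as n_missing)
n_missing <- colSums(is.na(ejemplo))

# Share of non-missing values per column (what skim() reports as complete_rate)
complete_rate <- colMeans(!is.na(ejemplo))

print(n_missing)
print(complete_rate)
```

Applied to the real data, this kind of check would show why a column such as transmision can have a nonzero n_missing while its complete_rate still prints as 1: with 8 missing values out of 33,442 rows, the rate is presumably just rounded in the display.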
  1. correlate(), from the corrr package, works with the pipe (%>%) to produce a correlation table
library(corrr)
actuales %>%
  select_if(is.numeric) %>%
  correlate()
## 
## Correlation method: 'pearson'
## Missing treated using: 'pairwise.complete.obs'
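The message above says that missing values are handled with 'pairwise.complete.obs'. As a rough illustration of what that means, here is a base R sketch using cor() on invented data; the data frame `ejemplo` and the vectors x, y and z are assumptions made up for this example, and this is not a description of how corrr works internally.

```r
# Toy data with missing values; invented for illustration only
x <- c(1, 2, 3, 4, NA)
y <- c(2, 4, 6, NA, 10)
z <- c(5, 3, 4, 1, 2)
ejemplo <- data.frame(x, y, z)

# Each pair of columns uses only the rows where both values are present
r_pairwise <- cor(ejemplo, use = "pairwise.complete.obs", method = "pearson")

# The same thing by hand for one pair: keep rows where neither x nor y is NA
ok <- complete.cases(ejemplo$x, ejemplo$y)
r_xy <- cor(ejemplo$x[ok], ejemplo$y[ok])

print(r_pairwise)
```

Because each pair keeps its own set of complete rows, different cells of the table can be based on different numbers of observations.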
  1. The rplot() function makes it easy to plot the correlations
actuales %>%
  select_if(is.numeric) %>%
  correlate() %>%
  rplot()
## 
## Correlation method: 'pearson'
## Missing treated using: 'pairwise.complete.obs'
## Don't know how to automatically pick scale for object of type noquote. Defaulting to continuous.

  1. The shave() function removes the duplicated values: a correlation table is symmetric, so by default shave() sets the upper triangle to NA
actuales %>%
  select_if(is.numeric) %>%
  correlate() %>%
  shave() %>%
  rplot()
## 
## Correlation method: 'pearson'
## Missing treated using: 'pairwise.complete.obs'
## Don't know how to automatically pick scale for object of type noquote. Defaulting to continuous.
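To see why half of a correlation table is redundant, the idea behind shave() can be sketched in base R. This is only an analogue built on the built-in mtcars dataset (an assumption chosen so the example runs without extra packages), not the corrr implementation; unlike a correlate() result, the diagonal is left in place here.

```r
# Base R analogue of the idea behind shave(); not the corrr implementation
m <- cor(mtcars[, c("mpg", "cyl", "disp", "hp")])

# A correlation matrix is symmetric, so the upper triangle repeats the lower one
stopifnot(isSymmetric(m))

# Blank out the upper triangle, keeping one copy of each correlation
shaved <- m
shaved[upper.tri(shaved)] <- NA

print(round(shaved, 2))
```

With four variables, 6 of the 16 cells are blanked, and each pairwise correlation remains exactly once below the diagonal.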