library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.2.0 ✔ readr 2.1.6
✔ forcats 1.0.1 ✔ stringr 1.6.0
✔ ggplot2 4.0.2 ✔ tibble 3.3.1
✔ lubridate 1.9.5 ✔ tidyr 1.3.2
✔ purrr 1.2.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(summarytools)
Attaching package: 'summarytools'
The following object is masked from 'package:tibble':
    view
library(knitr)
library(ggplot2)
library(rlang)
Attaching package: 'rlang'
The following objects are masked from 'package:purrr':
    flatten, flatten_chr, flatten_dbl, flatten_int, flatten_lgl,
    flatten_raw, invoke, splice
library(patchwork)
library(corrplot)
corrplot 0.95 loaded
library(gtsummary)
adult_csv <- read_csv("adult.csv")
Rows: 32561 Columns: 15
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (9): workclass, education, marital.status, occupation, relationship, rac...
dbl (6): age, fnlwgt, education.num, capital.gain, capital.loss, hours.per.week
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(adult_csv)
Rows: 32,561
Columns: 15
$ age <dbl> 90, 82, 66, 54, 41, 34, 38, 74, 68, 41, 45, 38, 52, 32,…
$ workclass <chr> "?", "Private", "?", "Private", "Private", "Private", "…
$ fnlwgt <dbl> 77053, 132870, 186061, 140359, 264663, 216864, 150601, …
$ education <chr> "HS-grad", "HS-grad", "Some-college", "7th-8th", "Some-…
$ education.num <dbl> 9, 9, 10, 4, 10, 9, 6, 16, 9, 10, 16, 15, 13, 14, 16, 1…
$ marital.status <chr> "Widowed", "Widowed", "Widowed", "Divorced", "Separated…
$ occupation <chr> "?", "Exec-managerial", "?", "Machine-op-inspct", "Prof…
$ relationship <chr> "Not-in-family", "Not-in-family", "Unmarried", "Unmarri…
$ race <chr> "White", "White", "Black", "White", "White", "White", "…
$ sex <chr> "Female", "Female", "Female", "Female", "Female", "Fema…
$ capital.gain <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ capital.loss <dbl> 4356, 4356, 4356, 3900, 3900, 3770, 3770, 3683, 3683, 3…
$ hours.per.week <dbl> 40, 18, 40, 40, 40, 45, 40, 20, 40, 60, 35, 45, 20, 55,…
$ native.country <chr> "United-States", "United-States", "United-States", "Uni…
$ income <chr> "<=50K", "<=50K", "<=50K", "<=50K", "<=50K", "<=50K", "…
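The `glimpse()` output shows `"?"` entries in `workclass` and `occupation`. If those stand for missing values (an assumption, since the source does not say so), one option is to convert them to `NA` when reading the file. The sketch below uses a small hypothetical inline CSV in place of `adult.csv`, and the names `csv_text`, `toy`, and `na_counts` are illustrative only.

```r
library(readr)
library(dplyr)

# Hypothetical inline CSV standing in for adult.csv (not the real data)
csv_text <- I("age,workclass,occupation\n90,?,?\n82,Private,Exec-managerial\n")

# Assumption: "?" marks a missing value, so it is read as NA
toy <- read_csv(csv_text, na = c("", "NA", "?"), show_col_types = FALSE)

# Count the missing values in each column
na_counts <- toy |>
  summarise(across(everything(), ~ sum(is.na(.x))))
```

Applied to the real file, the same `na` argument would go into the `read_csv("adult.csv")` call, provided that `"?"` really is a missing-value placeholder in this dataset.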
# Select the names of all categorical (character) variables
categoricas <- adult_csv |>
  select(where(is.character)) |>
  names()
# Keep only people with income >50K
adult_high <- adult_csv |>
  filter(income == ">50K")
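As a quick sanity check, the two steps above can be tried on a small toy tibble. This is a minimal sketch, not part of the original analysis: `toy`, `toy_categorical`, and `toy_high` are hypothetical names, and the rows are invented for illustration.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data with the same kinds of columns as adult_csv
toy <- tibble(
  age       = c(39, 50, 38),
  workclass = c("Private", "?", "Private"),
  income    = c("<=50K", ">50K", ">50K")
)

# where() accepts a predicate function directly, so no lambda is needed
toy_categorical <- toy |>
  select(where(is.character)) |>
  names()

# Keep only the rows whose income label is exactly ">50K"
toy_high <- toy |>
  filter(income == ">50K")
```

Here `toy_categorical` holds the two character columns (`workclass` and `income`), and `toy_high` keeps the two rows labelled `">50K"`. The comparison is an exact string match, so any stray whitespace in the labels of the real data would need trimming first.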