Loading packages and dataset

pacotes <- c("readxl", "dplyr", "ggplot2")
lapply(pacotes, library, character.only = TRUE)
cars_train <- read_excel("cars_train.xlsx")

Analyses

Looking at the states with the highest numbers of cars with paid IPVA whose sellers are individuals

ipvas <- cars_train %>%
  group_by(estado_vendedor) %>%
  filter(ipva_pago == "IPVA pago" & tipo_vendedor == "PF") %>%
  summarize(total_carros = n()) %>%
  arrange(desc(total_carros))
ipvas
## # A tibble: 20 × 2
##    estado_vendedor          total_carros
##    <chr>                           <int>
##  1 São Paulo (SP)                   7977
##  2 Rio de Janeiro (RJ)              1296
##  3 Paraná (PR)                      1071
##  4 Rio Grande do Sul (RS)           1013
##  5 Santa Catarina (SC)               871
##  6 Minas Gerais (MG)                 842
##  7 Goiás (GO)                        431
##  8 Bahia (BA)                         94
##  9 Alagoas (AL)                       44
## 10 Pará (PA)                          37
## 11 Mato Grosso do Sul (MS)            26
## 12 Ceará (CE)                         18
## 13 Rio Grande do Norte (RN)           17
## 14 Pernambuco (PE)                    13
## 15 Espírito Santo (ES)                10
## 16 Mato Grosso (MT)                    7
## 17 Amazonas (AM)                       4
## 18 Paraíba (PB)                        2
## 19 Sergipe (SE)                        2
## 20 Piauí (PI)                          1

Filtering only the five states with the highest numbers of cars with paid IPVA

top_5_estados <- head(ipvas, 5)$estado_vendedor
dados_top_5 <- filter(cars_train, estado_vendedor %in% top_5_estados & ipva_pago == "IPVA pago")

Creating a boxplot of the prices of the five states with the highest numbers of cars with paid IPVA whose sellers are individuals

ggplot(dados_top_5, aes(x = estado_vendedor, y = preco)) +
  geom_boxplot() +
  labs(title = "Box Plot of prices per state",
       x = "State",
       y = "Price")

Calculating the average price of cars with paid IPVA whose sellers are individuals in the five states with the most observations

media_precos_estado <- dados_top_5 %>%
  group_by(estado_vendedor) %>%
  summarize(media_preco = mean(preco)) %>%
  arrange(media_preco)
media_precos_estado
## # A tibble: 5 × 2
##   estado_vendedor        media_preco
##   <chr>                        <dbl>
## 1 São Paulo (SP)             128483.
## 2 Rio de Janeiro (RJ)        136465.
## 3 Santa Catarina (SC)        138894.
## 4 Rio Grande do Sul (RS)     141671.
## 5 Paraná (PR)                146163.

Considering the analyses made above, we conclude that, excluding the state of São Paulo, the state of Rio de Janeiro would be the best state to buy a car with paid IPVA whose seller is an individual.

It is noted that the state of Rio de Janeiro is the second state with the highest number of cars with such characteristics. It is also noticed that the state of Rio de Janeiro has the smallest range in its outliers.

Finally, we can see that the state of Rio de Janeiro has the second lowest average price for cars with paid IPVA whose seller is an individual.

Therefore, we can affirm that in general, and disregarding the state of São Paulo, Rio de Janeiro is the best state to buy cars with paid IPVA whose seller is an individual.