Loading packages and dataset

pacotes <- c("readxl", "dplyr")
lapply(pacotes, library, character.only = TRUE)
cars_train <- read_excel("cars_train.xlsx")

Analyses

Using table to look at the available types

table(cars_train$tipo)
## 
##                 Cupê            Hatchback              Minivan 
##                   26                 4924                    7 
##             Perua/SW               Picape                 Sedã 
##                   27                 4849                16429 
## Utilitário esportivo 
##                 3322

. ## Creating a new dataset with only the “picape”(pickup truck) type

picapes_auto <- filter(cars_train, tipo %in% c("Picape"))

Filtering the “picapes_auto” dataset to contain only automatic pickups

picapes_auto <- filter(picapes_auto, cambio %in% c("Automática"))

Looking at the observations for each “estado_vendedor”(selling state)

vendas_por_estado <- picapes_auto %>%
  group_by(estado_vendedor) %>%
  summarize(volume_vendas = n()) %>%
  arrange(desc(volume_vendas))
vendas_por_estado
## # A tibble: 18 × 2
##    estado_vendedor          volume_vendas
##    <chr>                            <int>
##  1 São Paulo (SP)                    1712
##  2 Paraná (PR)                        348
##  3 Rio de Janeiro (RJ)                318
##  4 Santa Catarina (SC)                283
##  5 Minas Gerais (MG)                  211
##  6 Rio Grande do Sul (RS)             198
##  7 Goiás (GO)                         102
##  8 Bahia (BA)                          68
##  9 Pernambuco (PE)                     14
## 10 Alagoas (AL)                        12
## 11 Acre (AC)                            6
## 12 Mato Grosso (MT)                     6
## 13 Mato Grosso do Sul (MS)              5
## 14 Sergipe (SE)                         5
## 15 Paraíba (PB)                         4
## 16 Piauí (PI)                           4
## 17 Tocantins (TO)                       3
## 18 Rio Grande do Norte (RN)             1

Obtaining the lowest price for each “estado_vendedor”

menor_preco_por_estado <- picapes_auto %>%
  group_by(estado_vendedor) %>%
  summarize(menor_preco = min(preco)) %>%
arrange(menor_preco)
menor_preco_por_estado
## # A tibble: 18 × 2
##    estado_vendedor          menor_preco
##    <chr>                          <dbl>
##  1 São Paulo (SP)                15953.
##  2 Rio de Janeiro (RJ)           26730.
##  3 Paraná (PR)                   33475.
##  4 Rio Grande do Sul (RS)        40222.
##  5 Santa Catarina (SC)           41525.
##  6 Mato Grosso do Sul (MS)       51084.
##  7 Alagoas (AL)                  52207.
##  8 Acre (AC)                     56453.
##  9 Minas Gerais (MG)             58014.
## 10 Bahia (BA)                    61530.
## 11 Goiás (GO)                    72266.
## 12 Paraíba (PB)                  73224.
## 13 Pernambuco (PE)               75521.
## 14 Tocantins (TO)               121773.
## 15 Piauí (PI)                   147847.
## 16 Mato Grosso (MT)             167592.
## 17 Rio Grande do Norte (RN)     179962.
## 18 Sergipe (SE)                 233540.

Calculating the standard deviation for each “estado_vendedor”

desvio_padrao_por_estado <- picapes_auto %>%
  group_by(estado_vendedor) %>%
  summarize(desvio_padrao = sd(preco)) %>%
arrange(desc(desvio_padrao))
desvio_padrao_por_estado
## # A tibble: 18 × 2
##    estado_vendedor          desvio_padrao
##    <chr>                            <dbl>
##  1 Santa Catarina (SC)            114100.
##  2 São Paulo (SP)                 105203.
##  3 Goiás (GO)                     100476.
##  4 Minas Gerais (MG)               92135.
##  5 Rio Grande do Sul (RS)          91456.
##  6 Paraná (PR)                     90511.
##  7 Pernambuco (PE)                 89385.
##  8 Alagoas (AL)                    85425.
##  9 Rio de Janeiro (RJ)             84677.
## 10 Bahia (BA)                      84220.
## 11 Sergipe (SE)                    69626.
## 12 Acre (AC)                       67929.
## 13 Tocantins (TO)                  61314.
## 14 Mato Grosso do Sul (MS)         60465.
## 15 Piauí (PI)                      41337.
## 16 Mato Grosso (MT)                39038.
## 17 Paraíba (PB)                    15281.
## 18 Rio Grande do Norte (RN)           NA

Calculating the average selling price for each “estado_vendedor”

preco_medio_por_estado <- picapes_auto %>%
  group_by(estado_vendedor) %>%
  summarize(preco_medio = mean(preco)) %>%
  arrange(preco_medio)
preco_medio_por_estado
## # A tibble: 18 × 2
##    estado_vendedor          preco_medio
##    <chr>                          <dbl>
##  1 Paraíba (PB)                  93157.
##  2 Mato Grosso do Sul (MS)      144700.
##  3 Acre (AC)                    145257.
##  4 Rio Grande do Norte (RN)     179962.
##  5 Rio de Janeiro (RJ)          181660.
##  6 Santa Catarina (SC)          185849.
##  7 Tocantins (TO)               187717.
##  8 São Paulo (SP)               188427.
##  9 Pernambuco (PE)              192567.
## 10 Rio Grande do Sul (RS)       195252.
## 11 Minas Gerais (MG)            195703.
## 12 Paraná (PR)                  198385.
## 13 Bahia (BA)                   206365.
## 14 Goiás (GO)                   208154.
## 15 Piauí (PI)                   208181.
## 16 Mato Grosso (MT)             214102.
## 17 Alagoas (AL)                 218672.
## 18 Sergipe (SE)                 298195.

It is observed that the state of Paraíba has the lowest average price for automatic pickup trucks. However, it is important to note that the state of Paraíba has only four automatic pickup trucks in the dataset, which makes the sample size small and may indicate difficulty in finding automatic pickup trucks for purchase.

On the other hand, the state of São Paulo has the highest number of observations for automatic pickup trucks. Furthermore, it also has the lowest absolute value among all automatic pickup trucks available, one of the highest standard deviations in the prices of these vehicles, and only the eighth-highest average price out of 18 states.

Therefore, overall, the state of São Paulo is the best state to buy an automatic pickup truck. It is where a person would have the easiest time finding an automatic pickup truck to purchase, and due to the significant variability in prices indicated by the standard deviation, the person would also have the opportunity to find relatively lower prices than in other states.

It is important to note that this analysis was conducted only with observations that had the category “Picape”(pickup truck) in the qualitative variable “tipo”(type) and “Automática”(automatic) in the qualitative variable “cambio”(trasmission).