pacotes <- c("readxl", "dplyr", "ggplot2" )
lapply(pacotes, library, character.only = TRUE)
cars_train <- read_excel("cars_train.xlsx")
Using table to look at the available brands:
table(cars_train$marca)
##
## ALFA ROMEO AUDI BMW BRM CHERY
## 9 1698 1784 1 153
## CHEVROLET CHRYSLER CITROËN DODGE EFFA
## 3020 30 194 37 1
## FERRARI FIAT FORD HONDA HYUNDAI
## 1 1918 1060 1586 2043
## IVECO JAC JAGUAR JEEP KIA
## 2 3 148 2000 408
## LAND ROVER LEXUS LIFAN MASERATI MERCEDES-BENZ
## 760 75 8 7 1125
## MINI MITSUBISHI NISSAN PEUGEOT PORSCHE
## 137 862 438 1675 349
## RAM RENAULT SMART SSANGYONG SUBARU
## 168 538 12 14 41
## SUZUKI TOYOTA TROLLER VOLKSWAGEN VOLVO
## 41 2180 177 4594 287
I’ll consider the term “popular” as a reference to brands that have more affordable cars in Brazil.
populares <- filter(cars_train, marca %in% c("CHEVROLET", "FIAT", "FORD", "HONDA", "HYUNDAI", "JEEP", "KIA", "NISSAN", "PEUGEOT", "RENAULT", "TOYOTA", "VOLKSWAGEN"))
vendas_por_estado <- populares %>%
group_by(estado_vendedor) %>%
summarise(total_vendas = n()) %>%
arrange(desc(total_vendas))
top_10_estados <- head(vendas_por_estado, 10)
top_10_estados
## # A tibble: 10 × 2
## estado_vendedor total_vendas
## <chr> <int>
## 1 São Paulo (SP) 11827
## 2 Rio de Janeiro (RJ) 1829
## 3 Paraná (PR) 1759
## 4 Santa Catarina (SC) 1663
## 5 Minas Gerais (MG) 1298
## 6 Rio Grande do Sul (RS) 1107
## 7 Goiás (GO) 588
## 8 Bahia (BA) 490
## 9 Pernambuco (PE) 255
## 10 Alagoas (AL) 120
maior_preco_por_estado <- populares %>%
group_by(estado_vendedor) %>%
summarize(maior_preco = max(preco)) %>%
arrange(desc(maior_preco))
maior_preco_por_estado
## # A tibble: 25 × 2
## estado_vendedor maior_preco
## <chr> <dbl>
## 1 São Paulo (SP) 781750.
## 2 Minas Gerais (MG) 653173.
## 3 Goiás (GO) 586482.
## 4 Paraná (PR) 550147.
## 5 Santa Catarina (SC) 545973.
## 6 Rio Grande do Sul (RS) 540420.
## 7 Rio de Janeiro (RJ) 484355.
## 8 Sergipe (SE) 400378.
## 9 Bahia (BA) 372539.
## 10 Alagoas (AL) 345007.
## # … with 15 more rows
desvio_padrao_por_estado <- populares %>%
group_by(estado_vendedor) %>%
summarize(desvio_padrao = sd(preco)) %>%
arrange(desc(desvio_padrao))
desvio_padrao_por_estado
## # A tibble: 25 × 2
## estado_vendedor desvio_padrao
## <chr> <dbl>
## 1 Sergipe (SE) 110768.
## 2 Rio Grande do Sul (RS) 65032.
## 3 Minas Gerais (MG) 64713.
## 4 Paraná (PR) 63789.
## 5 Goiás (GO) 62744.
## 6 Rio de Janeiro (RJ) 62449.
## 7 Bahia (BA) 61629.
## 8 Santa Catarina (SC) 60929.
## 9 São Paulo (SP) 60881.
## 10 Alagoas (AL) 59437.
## # … with 15 more rows
populares_top_estados <- filter(populares, estado_vendedor %in% top_10_estados$estado_vendedor)
media_preco_por_estado <- populares_top_estados %>%
group_by(estado_vendedor) %>%
summarize(media_preco = mean(preco))
media_preco_por_estado
## # A tibble: 10 × 2
## estado_vendedor media_preco
## <chr> <dbl>
## 1 Alagoas (AL) 124097.
## 2 Bahia (BA) 114218.
## 3 Goiás (GO) 132822.
## 4 Minas Gerais (MG) 119491.
## 5 Paraná (PR) 123895.
## 6 Pernambuco (PE) 105228.
## 7 Rio Grande do Sul (RS) 122820.
## 8 Rio de Janeiro (RJ) 115898.
## 9 Santa Catarina (SC) 115379.
## 10 São Paulo (SP) 110091.
siglas_estados <- c("AL", "BA", "GO", "MG", "PR", "PE", "RJ", "RS", "SC", "SP" )
ggplot(populares_top_estados, aes(x = estado_vendedor, y = preco)) +
geom_boxplot() +
labs(title = "Boxplot dos Preços dos Carros Populares",
x = "Estado Vendedor",
y = "Preço") +
scale_x_discrete(labels = siglas_estados) +
scale_y_continuous(labels = scales::comma)
We can observe that the state of São Paulo had a significantly higher number of observations compared to other states. Furthermore, although the average selling price is similar among the top ten states with the most observations, São Paulo has the highest selling price for a popular brand car. The standard deviation also indicates reasonable price variability, suggesting that sellers can get good deals when selling their cars in the state of São Paulo.
Given that the difference in average prices for popular cars between states is relatively small, it is more advantageous to sell the car in the state with the highest number of observations. In this case, the seller would have an easier time closing the sale and would likely receive more offers to choose from, which may be more attractive.
Therefore, based on the analysis of the dataset, we can conclude that São Paulo is the most favorable state to sell a popular brand car, mainly due to the significant difference in the number of observations compared to other states.
It’s worth noting that this analysis considered only the “popular” brands in the “cars_train” dataset, which are: “CHEVROLET,” “FIAT,” “FORD,” “HONDA,” “HYUNDAI,” “JEEP,” “KIA,” “NISSAN,” “PEUGEOT,” “RENAULT,” “TOYOTA,” and “VOLKSWAGEN.”