I asked ChatGPT to generate a dataset with the following prompt:
"Create a dataset for time series analysis on 5 ice cream flavors
1. Chocolate
2. Strawberry
3. Cream
4. Lemon
5. Grape
The explanatory variables are: Date, Flavor, Place of Sale (supermarket, shopping mall, beach and city), Price, Ice cream Size (Ice cream Cone, 250 grams, 500 grams and 1 kilogram).
The response variable is Sales, with the sales volume over 3 years between 2021 and 2024.
Dates must be weekly.
Make the best selling flavors are based on brazilians preferences."
This resulted in:
## Data Sabor Local de venda Preço Tamanho Vendas
## <IDat> <char> <char> <num> <char> <int>
## 1: 2021-01-03 Chocolate Supermercado 3.41 Casquinha 30
## 2: 2021-01-03 Chocolate Supermercado 6.52 250g 33
## 3: 2021-01-03 Chocolate Supermercado 10.94 500g 26
## 4: 2021-01-03 Chocolate Supermercado 20.83 1kg 28
## 5: 2021-01-03 Chocolate Shopping 3.16 Casquinha 31
## 6: 2021-01-03 Chocolate Shopping 6.56 250g 21
The following analysis is based on this data: weekly ice cream sales between 2021 and 2023, with a focus on flavors, sizes, places of sale, and temporal patterns.
This analysis runs from exploratory analysis through to forecasting. I asked for only the clearest and most concise insights for this document.
When we get to the decomposition of the series, we will realize that something should have been requested in the prompt (do you know what it would be?). I will do a Part Two of this project with the same data, but asking the AI to add what is missing, and we will see whether the conclusions differ from this analysis (they certainly will!).
## # A tibble: 5 × 2
## sabor vendas_total
## <fct> <int>
## 1 Chocolate 72732
## 2 Morango 59315
## 3 Creme 47788
## 4 Limão 36986
## 5 Uva 26361
Insight: Chocolate is the best seller; Grape is the least sold.
Insight: Grape has the highest average ticket; Strawberry has the lowest.
Insight: Chocolate combines high volume with a balanced average ticket price. Grape has high profitability but low demand.
Insight: Ice cream cones sell more on the beach. The 1 kg size is more profitable. Average ticket price is consistent across locations.
## `geom_smooth()` using formula = 'y ~ x'
Insight: There is a clear linear relationship, but with different slopes by size. The 1 kg size has the best unit value.
## `geom_smooth()` using formula = 'y ~ x'
Insight: Chocolate is the best seller in all sizes. Grape sells the least.
• Are there any differences between the locations?
## Df Sum Sq Mean Sq F value Pr(>F)
## estabelecimento 3 0.09 0.0314 0.066 0.978
## Residuals 624 296.35 0.4749
• Are there any differences between the flavors?
## Df Sum Sq Mean Sq F value Pr(>F)
## sabor 4 13.8 3.451 4.701 0.00094 ***
## Residuals 780 572.6 0.734
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Insight: The locations do not differ significantly from each other in average ticket price (p = 0.978). The flavors, however, do differ (p = 0.00094).
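As a quick sanity check on the flavor ANOVA table above, the F value is the ratio of the two mean squares, and the p-value is the upper tail of the F distribution with the listed degrees of freedom. The sketch below uses base R and the rounded figures as printed in the table, so small rounding differences from the table are expected; the variable names are only illustrative.

```r
# Values copied from the flavor ANOVA table above (rounded as printed)
mean_sq_sabor <- 3.451   # Mean Sq for sabor
mean_sq_resid <- 0.734   # Mean Sq for Residuals
df_sabor <- 4
df_resid <- 780

# F statistic: between-group mean square over residual mean square
f_value <- mean_sq_sabor / mean_sq_resid

# p-value: probability of an F at least this large under the null hypothesis
p_value <- pf(f_value, df_sabor, df_resid, lower.tail = FALSE)

print(round(f_value, 3))
print(signif(p_value, 2))
```

The same two lines applied to the location table (Mean Sq 0.0314 and 0.4749, with 3 and 624 degrees of freedom) give an F value far below 1, which is why the locations show no significant difference.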
• Which flavors differ from each other?
library(FSA)  # dunnTest() comes from the FSA package
dunnTest(ticket_medio ~ as.factor(sabor), data = dados_teste2, method = "bonferroni")
## Dunn (1964) Kruskal-Wallis multiple comparison
## p-values adjusted with the Bonferroni method.
## Comparison Z P.unadj P.adj
## 1 Chocolate - Creme -0.007963971 0.993645738 1.00000000
## 2 Chocolate - Limão -0.967871364 0.333108617 1.00000000
## 3 Creme - Limão -0.959907393 0.337101825 1.00000000
## 4 Chocolate - Morango 0.433787551 0.664442722 1.00000000
## 5 Creme - Morango 0.441751523 0.658669022 1.00000000
## 6 Limão - Morango 1.401658915 0.161017125 1.00000000
## 7 Chocolate - Uva -2.835173715 0.004580077 0.04580077
## 8 Creme - Uva -2.827209744 0.004695556 0.04695556
## 9 Limão - Uva -1.867302351 0.061859377 0.61859377
## 10 Morango - Uva -3.268961267 0.001079431 0.01079431
Insight: Grape is the flavor that differs most from the others. It differs significantly from Chocolate, Cream, and Strawberry (adjusted p < 0.05), but not from Lemon.
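To make the Bonferroni adjustment in the table above concrete: with 10 pairwise comparisons, each unadjusted p-value is multiplied by 10 and capped at 1. The sketch below reproduces that step with base R's `p.adjust()` on the `P.unadj` column copied from the output above, then lists the pairs that stay below 0.05. The vector names are just labels added here for readability.

```r
# Unadjusted p-values copied from the Dunn test output above
p_unadj <- c(
  "Chocolate - Creme"   = 0.993645738,
  "Chocolate - Limão"   = 0.333108617,
  "Creme - Limão"       = 0.337101825,
  "Chocolate - Morango" = 0.664442722,
  "Creme - Morango"     = 0.658669022,
  "Limão - Morango"     = 0.161017125,
  "Chocolate - Uva"     = 0.004580077,
  "Creme - Uva"         = 0.004695556,
  "Limão - Uva"         = 0.061859377,
  "Morango - Uva"       = 0.001079431
)

# Bonferroni: multiply each p-value by the number of comparisons, cap at 1
p_adj <- p.adjust(p_unadj, method = "bonferroni")

# Pairs that remain significant at the 5% level after adjustment
significativos <- names(p_unadj)[p_adj < 0.05]

print(round(p_adj, 4))
print(significativos)
```

Only the three pairs involving Uva (Grape) against Chocolate, Creme, and Morango survive the adjustment; Limão versus Uva does not.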
# Chart: STL Decomposition
autoplot(decomposicao) + labs(title = "Decomposition STL of the Weekly Sales Series")
Insight: The series shows a clear trend but no defined seasonality.
gg_season(dados_ts, vendas_total)
Insight: There is no defined annual seasonality.
Insight: AR(1) and MA(1) were suggested as candidates, but ARIMA(2,0,0) was the best-fitting model.
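One common way to compare candidates like these is by information criterion. The sketch below is not the code used for this analysis: it uses base R's `arima()` instead of the modeling package behind the outputs in this document, and `vendas_sim` is a synthetic AR(2) series invented here as a stand-in for the real weekly totals. It only illustrates the mechanics of fitting the three candidates and ranking them by AIC.

```r
set.seed(123)

# Synthetic stand-in for the weekly series (assumption): 156 weeks, AR(2) around 1500
vendas_sim <- 1500 + arima.sim(model = list(ar = c(0.5, 0.2)), n = 156, sd = 40)

# Candidate (p, d, q) orders named after the models mentioned above
candidatos <- list(
  "AR(1)"        = c(1, 0, 0),
  "MA(1)"        = c(0, 0, 1),
  "ARIMA(2,0,0)" = c(2, 0, 0)
)

# Fit each candidate and collect its AIC (lower is better)
aic <- sapply(candidatos, function(ordem) arima(vendas_sim, order = ordem)$aic)

print(sort(aic))
```

On the real data the ranking depends on the series itself, so the result here says nothing about which model should win for the ice cream sales.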
# Chart: Residual Diagnostics
gg_tsresiduals(modelo_arima_2)
Insight: The residuals are approximately normally distributed, homoscedastic, and not autocorrelated.
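The visual diagnosis can be backed up with formal tests. The following is a minimal sketch in base R, again on a synthetic series (`vendas_sim` and `ajuste` are hypothetical names, not the objects used in this analysis): a Ljung-Box test for leftover autocorrelation and a Shapiro-Wilk test for normality. Homoscedasticity is not tested here and would still need the residual plot or a separate test.

```r
set.seed(123)

# Synthetic stand-in for the weekly series (assumption), fitted with an AR(2)
vendas_sim <- 1500 + arima.sim(model = list(ar = c(0.5, 0.2)), n = 156, sd = 40)
ajuste <- arima(vendas_sim, order = c(2, 0, 0))
res <- residuals(ajuste)

# Ljung-Box: H0 = no autocorrelation up to lag 10.
# fitdf = 2 removes the two estimated AR coefficients from the degrees of freedom.
lb <- Box.test(res, lag = 10, type = "Ljung-Box", fitdf = 2)

# Shapiro-Wilk: H0 = residuals come from a normal distribution
sw <- shapiro.test(as.numeric(res))

print(lb$p.value)
print(sw$p.value)
```

Large p-values in both tests are consistent with uncorrelated, roughly normal residuals; small ones would suggest the model is missing structure.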
# Overall Prediction
autoplot(previsao_25s, dados_ts)
Insight: The overall forecast is stable, consistent with the series pattern.
# Prediction by Flavor
autoplot(previsao_por_sabor, dados_sabor)
Insight: The Chocolate and Strawberry forecasts are more reliable. Grape has high uncertainty.
# Prediction by Size
autoplot(previsao_por_tamanho, dados_tamanho)
Insight: Ice cream cones are still the best seller, but the 500 g size is more predictable.
# Prediction by Place
autoplot(previsao_por_estabelecimento, dados_estabelecimento)
Insight: City shows the most stable behavior. Shopping mall is the most uncertain.
# Prediction Flavor + Size
autoplot(previsao_por_sabor_tam, dados_sabor_tam_ts) +
  labs(
    title = "Previsão de Vendas Totais por Sabor e Tamanho - 25 Semanas",
    x = "Data", y = "Vendas Totais"
  ) +
  facet_wrap(sabor ~ tamanho) +
  theme_minimal()
Insight: The chocolate cone is the best-selling product and is stable. Grape has forecasting problems at all sizes.
## # A tibble: 4 × 11
## tamanho .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1kg ARIMA… Trai… -9.29e- 2 37.8 29.9 -2.43 12.5 0.702 0.692 0.00887
## 2 250g ARIMA… Trai… 8.45e-14 41.0 33.3 -1.04 8.34 0.727 0.715 0.0180
## 3 500g ARIMA… Trai… -6.14e- 2 37.6 29.3 -1.45 9.35 0.636 0.657 -0.00765
## 4 Casquinha ARIMA… Trai… 3.48e- 2 38.7 31.8 -0.459 5.62 0.677 0.681 -0.0320
## # A tibble: 4 × 11
## estabelecimento .model .type ME RMSE MAE MPE MAPE MASE RMSSE
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Cidade ARIMA(ve… Trai… 5.54e-15 40.5 32.3 -1.23 8.84 0.700 0.685
## 2 Praia ARIMA(ve… Trai… 9.80e-14 39.2 30.8 -0.789 7.01 0.748 0.733
## 3 Shopping ARIMA(ve… Trai… -1.55e-13 43.8 34.9 -1.75 10.6 0.717 0.715
## 4 Supermercado ARIMA(ve… Trai… -1.27e- 1 43.9 34.7 -1.39 9.33 0.659 0.676
## # ℹ 1 more variable: ACF1 <dbl>
## # A tibble: 20 × 12
## sabor tamanho .model .type ME RMSE MAE MPE MAPE MASE RMSSE
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Chocola… 1kg ARIMA… Trai… -1.73e- 3 18.7 14.8 -8.24 23.6 0.700 0.715
## 2 Chocola… 250g ARIMA… Trai… -3.60e-12 20.1 16.0 -2.94 13.9 0.698 0.708
## 3 Chocola… 500g ARIMA… Trai… -1.97e- 2 19.2 15.4 -4.51 17.5 0.689 0.670
## 4 Chocola… Casqui… ARIMA… Trai… -8.14e-14 20.8 16.8 -1.51 10.0 0.697 0.713
## 5 Creme 1kg ARIMA… Trai… -1.79e-14 16.8 13.4 -16.5 36.0 0.743 0.721
## 6 Creme 250g ARIMA… Trai… -2.17e-14 18.6 14.2 -6.25 19.9 0.721 0.733
## 7 Creme 500g ARIMA… Trai… -3.17e-14 20.3 16.3 -17.1 35.5 0.676 0.671
## 8 Creme Casqui… ARIMA… Trai… -1.56e- 2 20.1 16.3 -3.62 15.6 0.684 0.685
## 9 Limão 1kg ARIMA… Trai… -2.49e-14 17.0 13.5 -32.4 54.3 0.659 0.674
## 10 Limão 250g ARIMA… Trai… -2.03e-15 18.0 14.3 -14.3 31.7 0.722 0.731
## 11 Limão 500g ARIMA… Trai… -3.00e-12 16.9 13.5 -20.6 40.5 0.699 0.696
## 12 Limão Casqui… ARIMA… Trai… 3.61e-14 16.5 12.4 -4.06 15.6 0.697 0.731
## 13 Morango 1kg ARIMA… Trai… 5.64e-13 18.6 15.6 -12.3 31.6 0.727 0.709
## 14 Morango 250g ARIMA… Trai… -3.44e-14 17.6 13.9 -3.69 15.4 0.722 0.709
## 15 Morango 500g ARIMA… Trai… 3.42e-12 19.9 15.7 -9.26 24.5 0.722 0.711
## 16 Morango Casqui… ARIMA… Trai… -1.02e-13 20.5 17.1 -2.27 12.7 0.713 0.690
## 17 Uva 1kg ARIMA… Trai… -1.30e-14 13.6 11.2 -Inf Inf 0.777 0.745
## 18 Uva 250g ARIMA… Trai… -1.23e- 2 17.3 13.7 -52.5 74.0 0.686 0.698
## 19 Uva 500g ARIMA… Trai… -1.53e- 2 15.7 13.0 -29.5 53.6 0.745 0.722
## 20 Uva Casqui… ARIMA… Trai… -3.07e-14 16.4 13.2 -9.25 25.7 0.722 0.722
## # ℹ 1 more variable: ACF1 <dbl>
Insight: MASE is below 0.75 in most cases. Combinations of Grape with large sizes are less predictable.
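The `-Inf` and `Inf` in the MPE and MAPE columns for Uva 1kg are what percentage errors produce when at least one actual value is zero, since each error is divided by the actual. The sketch below shows this on made-up numbers (`vendas` and `ajuste` are invented for illustration, not taken from the dataset). It also computes a MASE scaled by the one-step naive in-sample error; that scaling is an assumption of this sketch, and the accuracy tables above may have been scaled differently (for example, with a seasonal lag).

```r
# Made-up actuals with one zero-sales week, and made-up fitted values
vendas <- c(12, 0, 8, 15, 10, 9)
ajuste <- c(10, 3, 9, 13, 11, 10)

erro <- vendas - ajuste

# Percentage errors divide by the actuals, so a zero actual gives -Inf or Inf
mpe  <- mean(100 * erro / vendas)
mape <- mean(abs(100 * erro / vendas))

# MASE: model MAE divided by the MAE of a naive "last value" forecast (lag 1 assumed)
mase <- mean(abs(erro)) / mean(abs(diff(vendas)))

print(c(MPE = mpe, MAPE = mape, MASE = mase))
```

Because MASE divides by a scale built from differences rather than from the actual values, it stays finite here, which is one reason to prefer it over MAPE for low-volume series such as Grape.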
I believe you have realized by now that what was missing from the initial prompt was a request to guarantee seasonality in the volume of ice cream sales, also based on the preferences of Brazilians. That way, we would have a seasonal component linked to the seasons of the year, mainly the country's summer and winter, and the chosen model would probably be different.
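To illustrate what a seasonal component would look like in the decomposition, here is a minimal sketch in base R. Everything in it is an assumption made for illustration: the series is synthetic, the seasonal shape is a simple cosine peaking around the turn of the year (Brazilian summer) and bottoming out mid-year (winter), and the trend and noise levels are arbitrary. It is not the Part Two dataset.

```r
set.seed(42)

semanas <- 1:156  # three years of weekly observations

# Assumed components: mild upward trend, yearly cycle peaking near January, noise
tendencia <- 1500 + 2 * semanas
sazonal   <- 300 * cos(2 * pi * semanas / 52)
ruido     <- rnorm(length(semanas), sd = 60)

vendas_sim <- ts(tendencia + sazonal + ruido, start = c(2021, 1), frequency = 52)

# STL decomposition with a fixed (periodic) seasonal pattern
decomp <- stl(vendas_sim, s.window = "periodic")
componentes <- decomp$time.series

# Peak-to-trough size of the estimated seasonal component
amplitude <- diff(range(componentes[, "seasonal"]))
print(amplitude)

plot(decomp)
```

With a series like this, the seasonal panel of the decomposition shows a clear yearly wave instead of noise, and a model with a seasonal term would be a natural candidate.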
The chat for this project was very long, so I could not show you how enriching the interaction with ChatGPT was for this analysis, both in the tool's positive aspects (knowledge adapted to my profile, in-depth conversation on the subject, building insights together, extensive knowledge of the R language and its libraries, etc.) and in its negative ones. For example, I saw AI confirmation bias first-hand: I drew a conclusion and commented on it in the chat, later found that the conclusion was wrong, and realized that instead of correcting me, ChatGPT had taken my mistake as truth. That is why we must remember never to trust 100% of what the AI tells us: just as we make mistakes, it can make mistakes too (at least for now!).