1. Introduction

I asked ChatGPT to generate a dataset with the following prompt:

"Create a dataset for time series analysis on 5 ice cream flavors
1. Chocolate
2. Strawberry
3. Cream
4. Lemon
5. Grape
The explanatory variables are: Date, Flavor, Place of Sale (supermarket, shopping mall, beach and city), Price, Ice cream Size (Ice cream Cone, 250 grams, 500 grams and 1 kilogram).
The response variable is Sales, with the sales volume over 3 years between 2021 and 2024. 
Dates must be weekly.

Make the best selling flavors are based on brazilians preferences."

This resulted in the following data:

##          Data     Sabor Local de venda Preço   Tamanho Vendas
##        <IDat>    <char>         <char> <num>    <char>  <int>
## 1: 2021-01-03 Chocolate   Supermercado  3.41 Casquinha     30
## 2: 2021-01-03 Chocolate   Supermercado  6.52      250g     33
## 3: 2021-01-03 Chocolate   Supermercado 10.94      500g     26
## 4: 2021-01-03 Chocolate   Supermercado 20.83       1kg     28
## 5: 2021-01-03 Chocolate       Shopping  3.16 Casquinha     31
## 6: 2021-01-03 Chocolate       Shopping  6.56      250g     21

The analysis below is based on this data: weekly ice cream sales between 2021 and 2023, with a focus on flavor, size, place of sale, and temporal patterns.

The analysis runs from exploratory analysis through to forecasting. This document keeps only the clearest and most concise insights.

When we get to the decomposition of the series, we will realize that something should have been requested in the prompt (can you guess what?). I will do a Part Two of this project with the same data, but asking the AI to add what is missing, and we will see whether the conclusions differ from this analysis (they certainly will!).

2. Exploratory Analysis

## # A tibble: 5 × 2
##   sabor     vendas_total
##   <fct>            <int>
## 1 Chocolate        72732
## 2 Morango          59315
## 3 Creme            47788
## 4 Limão            36986
## 5 Uva              26361

Insight: Chocolate is the best seller; Grape sells the least.
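A ranking like the one in the table above comes from summing sales per flavor and sorting. The sketch below shows the idea in base R on a tiny made-up data frame; the values are illustrative only (not the project data), and the column names `sabor` and `vendas` are assumptions.

```r
# Toy data: illustrative values only, not the project's dataset.
vendas_df <- data.frame(
  sabor  = c("Chocolate", "Chocolate", "Morango", "Uva", "Morango", "Uva"),
  vendas = c(30, 33, 25, 10, 20, 8)
)

# Sum sales per flavor.
totais <- aggregate(vendas ~ sabor, data = vendas_df, FUN = sum)
names(totais)[2] <- "vendas_total"

# Sort from best seller to worst seller.
totais <- totais[order(-totais$vendas_total), ]
print(totais)
```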

Insight: Grape has the highest average ticket; Strawberry has the lowest.

Insight: Chocolate combines high volume with a balanced average ticket price. Grape has high profitability but low demand.

Insight: Ice cream cones sell more at the beach. The 1 kg size is the most profitable. Average ticket price is consistent across locations.

3. Correlation Analysis

## `geom_smooth()` using formula = 'y ~ x'

Insight: There is a clear linear relationship, but with different slopes by size. The 1 kg size has the best unit value.

## `geom_smooth()` using formula = 'y ~ x'

Insight: Chocolate is the best seller in all sizes. Grape sells the least.

4. Hypothesis Tests

• Are there any differences between the locations?

##                  Df Sum Sq Mean Sq F value Pr(>F)
## estabelecimento   3   0.09  0.0314   0.066  0.978
## Residuals       624 296.35  0.4749

• Are there any differences between the flavors?

##              Df Sum Sq Mean Sq F value  Pr(>F)    
## sabor         4   13.8   3.451   4.701 0.00094 ***
## Residuals   780  572.6   0.734                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Insight: The places of sale do not differ significantly from each other in average ticket price (p = 0.978). The flavors, however, do differ (p = 0.00094).
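To make the two ANOVA tables easier to read, the sketch below recomputes each F value as the ratio of the two mean squares and then gets the p-value from the F distribution. The numbers are copied from the printed tables, so the results agree with the output only up to rounding.

```r
# Values copied from the two ANOVA tables above (already rounded in the output).
f_local <- 0.0314 / 0.4749   # Mean Sq (estabelecimento) / Mean Sq (Residuals)
f_sabor <- 3.451 / 0.734     # Mean Sq (sabor) / Mean Sq (Residuals)

# p-value = upper tail of the F distribution with the tables' degrees of freedom.
p_local <- pf(f_local, df1 = 3, df2 = 624, lower.tail = FALSE)
p_sabor <- pf(f_sabor, df1 = 4, df2 = 780, lower.tail = FALSE)

print(round(c(F_local = f_local, p_local = p_local), 3))
print(signif(c(F_sabor = f_sabor, p_sabor = p_sabor), 3))
```

A small F (between-group variation far below within-group variation) gives a p-value near 1, which is the case for the places of sale. A large F gives a small p-value, which is the case for the flavors.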

• Which flavors differ from each other?

dunnTest(ticket_medio ~ as.factor(sabor), data = dados_teste2, method = "bonferroni")
## Dunn (1964) Kruskal-Wallis multiple comparison
##   p-values adjusted with the Bonferroni method.
##             Comparison            Z     P.unadj      P.adj
## 1    Chocolate - Creme -0.007963971 0.993645738 1.00000000
## 2    Chocolate - Limão -0.967871364 0.333108617 1.00000000
## 3        Creme - Limão -0.959907393 0.337101825 1.00000000
## 4  Chocolate - Morango  0.433787551 0.664442722 1.00000000
## 5      Creme - Morango  0.441751523 0.658669022 1.00000000
## 6      Limão - Morango  1.401658915 0.161017125 1.00000000
## 7      Chocolate - Uva -2.835173715 0.004580077 0.04580077
## 8          Creme - Uva -2.827209744 0.004695556 0.04695556
## 9          Limão - Uva -1.867302351 0.061859377 0.61859377
## 10       Morango - Uva -3.268961267 0.001079431 0.01079431

Insight: Grape is the flavor that differs most from the others. It is significantly different from Chocolate, Cream, and Strawberry (adjusted p < 0.05), but not from Lemon (adjusted p = 0.62).
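The `P.adj` column in the Dunn test output follows directly from the Bonferroni rule: each unadjusted p-value is multiplied by the number of comparisons (10 here) and capped at 1. The sketch below reproduces the four Grape rows from the values printed above, by hand and with `p.adjust` from base R.

```r
# Unadjusted p-values copied from rows 7 to 10 of the Dunn test output.
p_unadj <- c(
  "Chocolate - Uva" = 0.004580077,
  "Creme - Uva"     = 0.004695556,
  "Limão - Uva"     = 0.061859377,
  "Morango - Uva"   = 0.001079431
)

# Bonferroni by hand: multiply by the 10 comparisons in the family, cap at 1.
p_adj <- pmin(p_unadj * 10, 1)
print(p_adj)

# Same adjustment with stats::p.adjust; n = 10 states the size of the full family.
p_adj_builtin <- p.adjust(p_unadj, method = "bonferroni", n = 10)

# Which Grape comparisons stay significant at the 5% level?
print(names(p_adj)[p_adj < 0.05])
```

Only the Lemon comparison fails the 5% threshold after adjustment, which is why Lemon is the exception in the insight.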

5. Time Series Models

# Chart: STL Decomposition 
autoplot(decomposicao) + labs(title = "STL Decomposition of the Weekly Sales Series")

Insight: The series shows a clear trend but no defined seasonality.
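For readers who want to try this kind of decomposition, here is a minimal sketch using `stl()` from base R rather than the plotting pipeline above. The series is simulated (a linear trend plus noise, with no seasonal signal), so the numbers are hypothetical and only mimic the situation the insight describes.

```r
set.seed(1)
n <- 156  # three years of weekly observations

# Hypothetical series: linear trend plus noise, no seasonal signal.
vendas <- 1500 + 2 * seq_len(n) + rnorm(n, sd = 40)
serie <- ts(vendas, frequency = 52)

# STL splits the series into seasonal + trend + remainder.
decomp <- stl(serie, s.window = "periodic")
componentes <- decomp$time.series

# Variance of each component relative to the variance of the original series.
var_share <- apply(componentes, 2, var) / var(as.numeric(serie))
print(round(var_share, 3))
```

When the trend share dominates and the seasonal share is small, the decomposition supports a reading of "trend without defined seasonality".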

gg_season(dados_ts, vendas_total)

Insight: There is no defined annual seasonality.

Insight: AR(1) and MA(1) were suggested as candidates, but ARIMA(2,0,0) was the best-fitting model.
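Comparing candidate models usually means fitting each one and ranking them by an information criterion. The sketch below does this with `arima()` and `AIC()` from base R on a simulated series; both the data and the use of plain AIC are assumptions for illustration, not the selection procedure used in this report.

```r
set.seed(123)

# Hypothetical stationary series simulated from an AR(2) process around a mean of 100.
y <- arima.sim(model = list(ar = c(0.5, 0.25)), n = 300) + 100

# The three candidates mentioned above, written as (p, d, q) orders.
candidatos <- list(
  "AR(1)"        = c(1, 0, 0),
  "MA(1)"        = c(0, 0, 1),
  "ARIMA(2,0,0)" = c(2, 0, 0)
)

# Fit every candidate and collect its AIC (lower is better).
ajustes <- lapply(candidatos, function(ordem) arima(y, order = ordem))
aics <- sapply(ajustes, AIC)
print(round(aics, 1))

melhor <- names(which.min(aics))
print(melhor)
```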

# Chart: Residual Diagnostics
gg_tsresiduals(modelo_arima_2)

Insight: The residuals are approximately normally distributed, homoscedastic, and not autocorrelated.
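The visual diagnosis can be complemented with formal tests. The sketch below covers the autocorrelation and normality parts using `Box.test()` and `shapiro.test()` from base R on a simulated AR(2) fit; the data are hypothetical, and no homoscedasticity test is included.

```r
set.seed(7)

# Hypothetical series and model fit, standing in for the report's ARIMA(2,0,0).
y <- arima.sim(model = list(ar = c(0.5, 0.25)), n = 200)
ajuste <- arima(y, order = c(2, 0, 0))
res <- residuals(ajuste)

# Ljung-Box test: H0 = no autocorrelation up to lag 10.
# fitdf = 2 discounts the two AR coefficients that were estimated.
lb <- Box.test(res, lag = 10, type = "Ljung-Box", fitdf = 2)

# Shapiro-Wilk test: H0 = residuals come from a normal distribution.
sw <- shapiro.test(as.numeric(res))

print(c(ljung_box_p = lb$p.value, shapiro_p = sw$p.value))
```

In both tests a large p-value means the null hypothesis is not rejected, which is the outcome that supports the insight above.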

6. Predictions

# Overall Prediction
autoplot(previsao_25s, dados_ts)

Insight: The overall forecast is stable, consistent with the series pattern.

# Prediction by Flavor
autoplot(previsao_por_sabor, dados_sabor)

Insight: The Chocolate and Strawberry forecasts are more reliable. Grape has high uncertainty.

# Prediction by Size
autoplot(previsao_por_tamanho, dados_tamanho)

Insight: Ice cream cones are still the best seller, but the 500 g size is more predictable.

# Prediction by Place
autoplot(previsao_por_estabelecimento, dados_estabelecimento)

Insight: City shows the best behavior. Shopping mall is the most uncertain.

# Prediction Flavor + Size
autoplot(previsao_por_sabor_tam, dados_sabor_tam_ts) +
  labs(
    title = "Total Sales Forecast by Flavor and Size - 25 Weeks",
    x = "Date", y = "Total Sales"
  ) +
  facet_wrap(sabor ~ tamanho) +
  theme_minimal()

Insight: The Chocolate cone is the best-selling product and its forecast is stable. Grape is hard to forecast at every size.

7. Model Accuracy by Group

## # A tibble: 4 × 11
##   tamanho   .model .type        ME  RMSE   MAE    MPE  MAPE  MASE RMSSE     ACF1
##   <chr>     <chr>  <chr>     <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>    <dbl>
## 1 1kg       ARIMA… Trai… -9.29e- 2  37.8  29.9 -2.43  12.5  0.702 0.692  0.00887
## 2 250g      ARIMA… Trai…  8.45e-14  41.0  33.3 -1.04   8.34 0.727 0.715  0.0180 
## 3 500g      ARIMA… Trai… -6.14e- 2  37.6  29.3 -1.45   9.35 0.636 0.657 -0.00765
## 4 Casquinha ARIMA… Trai…  3.48e- 2  38.7  31.8 -0.459  5.62 0.677 0.681 -0.0320
## # A tibble: 4 × 11
##   estabelecimento .model    .type        ME  RMSE   MAE    MPE  MAPE  MASE RMSSE
##   <chr>           <chr>     <chr>     <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
## 1 Cidade          ARIMA(ve… Trai…  5.54e-15  40.5  32.3 -1.23   8.84 0.700 0.685
## 2 Praia           ARIMA(ve… Trai…  9.80e-14  39.2  30.8 -0.789  7.01 0.748 0.733
## 3 Shopping        ARIMA(ve… Trai… -1.55e-13  43.8  34.9 -1.75  10.6  0.717 0.715
## 4 Supermercado    ARIMA(ve… Trai… -1.27e- 1  43.9  34.7 -1.39   9.33 0.659 0.676
## # ℹ 1 more variable: ACF1 <dbl>
## # A tibble: 20 × 12
##    sabor    tamanho .model .type        ME  RMSE   MAE     MPE  MAPE  MASE RMSSE
##    <chr>    <chr>   <chr>  <chr>     <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl> <dbl>
##  1 Chocola… 1kg     ARIMA… Trai… -1.73e- 3  18.7  14.8   -8.24  23.6 0.700 0.715
##  2 Chocola… 250g    ARIMA… Trai… -3.60e-12  20.1  16.0   -2.94  13.9 0.698 0.708
##  3 Chocola… 500g    ARIMA… Trai… -1.97e- 2  19.2  15.4   -4.51  17.5 0.689 0.670
##  4 Chocola… Casqui… ARIMA… Trai… -8.14e-14  20.8  16.8   -1.51  10.0 0.697 0.713
##  5 Creme    1kg     ARIMA… Trai… -1.79e-14  16.8  13.4  -16.5   36.0 0.743 0.721
##  6 Creme    250g    ARIMA… Trai… -2.17e-14  18.6  14.2   -6.25  19.9 0.721 0.733
##  7 Creme    500g    ARIMA… Trai… -3.17e-14  20.3  16.3  -17.1   35.5 0.676 0.671
##  8 Creme    Casqui… ARIMA… Trai… -1.56e- 2  20.1  16.3   -3.62  15.6 0.684 0.685
##  9 Limão    1kg     ARIMA… Trai… -2.49e-14  17.0  13.5  -32.4   54.3 0.659 0.674
## 10 Limão    250g    ARIMA… Trai… -2.03e-15  18.0  14.3  -14.3   31.7 0.722 0.731
## 11 Limão    500g    ARIMA… Trai… -3.00e-12  16.9  13.5  -20.6   40.5 0.699 0.696
## 12 Limão    Casqui… ARIMA… Trai…  3.61e-14  16.5  12.4   -4.06  15.6 0.697 0.731
## 13 Morango  1kg     ARIMA… Trai…  5.64e-13  18.6  15.6  -12.3   31.6 0.727 0.709
## 14 Morango  250g    ARIMA… Trai… -3.44e-14  17.6  13.9   -3.69  15.4 0.722 0.709
## 15 Morango  500g    ARIMA… Trai…  3.42e-12  19.9  15.7   -9.26  24.5 0.722 0.711
## 16 Morango  Casqui… ARIMA… Trai… -1.02e-13  20.5  17.1   -2.27  12.7 0.713 0.690
## 17 Uva      1kg     ARIMA… Trai… -1.30e-14  13.6  11.2 -Inf    Inf   0.777 0.745
## 18 Uva      250g    ARIMA… Trai… -1.23e- 2  17.3  13.7  -52.5   74.0 0.686 0.698
## 19 Uva      500g    ARIMA… Trai… -1.53e- 2  15.7  13.0  -29.5   53.6 0.745 0.722
## 20 Uva      Casqui… ARIMA… Trai… -3.07e-14  16.4  13.2   -9.25  25.7 0.722 0.722
## # ℹ 1 more variable: ACF1 <dbl>

Insight: MASE is below 0.75 in most cases (the one exception in the tables above is Grape 1 kg, at 0.777). Combinations of Grape with the larger sizes are the least predictable.
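Two details of these tables are easier to see with a small worked example: what MASE measures, and why MPE and MAPE show `-Inf` and `Inf` for Grape 1 kg. The sketch below uses made-up numbers and the simple non-seasonal MASE, which scales by a one-step naive forecast. This is an assumption: the accuracy function behind the tables may scale by a seasonal naive forecast instead.

```r
# MASE: model MAE divided by the in-sample MAE of a one-step naive forecast.
mase <- function(actual, fitted) {
  mae_model <- mean(abs(actual - fitted))
  mae_naive <- mean(abs(diff(actual)))
  mae_model / mae_naive
}

# MPE and MAPE divide each error by the actual value.
mpe  <- function(actual, fitted) mean((actual - fitted) / actual) * 100
mape <- function(actual, fitted) mean(abs((actual - fitted) / actual)) * 100

# Made-up weekly sales with one zero-sales week, and made-up fitted values.
actual <- c(12, 0, 9, 15, 7, 11)
fitted <- c(10, 4, 10, 12, 9, 10)

resultado <- c(
  MASE = mase(actual, fitted),
  MPE  = mpe(actual, fitted),
  MAPE = mape(actual, fitted)
)
print(resultado)
```

A MASE below 1 means the model beats the naive benchmark. A single week with zero sales is enough to make the percentage errors infinite, which is one plausible reason for the `Inf` values in the Grape 1 kg row and a reason to prefer MASE when comparing low-volume products.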

8. Conclusion


9. Final Considerations

I believe you have figured out that what was missing from the initial prompt was a request to guarantee seasonality in the volume of ice cream sales, also based on Brazilian preferences. That way, the seasonal component would be linked to the seasons of the year, mainly the country's summer and winter, and the chosen model would probably be a different one.
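To illustrate what such a seasonal pattern would look like, here is a minimal sketch that simulates weekly sales peaking around the turn of the year (Southern Hemisphere summer) and dipping mid-year, then decomposes them with base R `stl()`. The base level, the 40% amplitude, and the noise are hypothetical choices, not values from the dataset.

```r
set.seed(2021)
n <- 156                  # three years of weekly data
semana <- seq_len(n)

# Hypothetical seasonal multiplier: highest near week 1, lowest near week 27.
sazonal <- 1 + 0.4 * cos(2 * pi * (semana - 1) / 52)
vendas <- 1500 * sazonal + rnorm(n, sd = 40)
serie <- ts(vendas, start = c(2021, 1), frequency = 52)

# Decompose and inspect the first year of the estimated seasonal component.
decomp <- stl(serie, s.window = "periodic")
comp <- decomp$time.series
pico <- which.max(comp[1:52, "seasonal"])  # week of the seasonal peak
vale <- which.min(comp[1:52, "seasonal"])  # week of the seasonal trough
print(c(semana_pico = pico, semana_vale = vale))
```

With data like this, the seasonal component carries most of the variation, so a seasonal model would be a natural candidate instead of a plain ARIMA(2,0,0).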

The chat for this project was very extensive, so I could not show you here how enriching the interaction with ChatGPT was for this analysis, both in the positive aspects of the tool (knowledge adapted to my profile, in-depth conversation on the subject, building insights together, extensive knowledge of the R language and its libraries, etc.) and in the negative ones. For example, I experienced AI confirmation bias: I reached a conclusion and commented on it in the chat, later saw that the conclusion was wrong, and found that instead of correcting me, ChatGPT had taken my mistake as true. That is why we must always remember never to trust 100% of what the AI tells us: just as we make mistakes, it is also subject to making them (at least for now!).