Read & inspect transactions data
Consolidate the data into daily records: the sum, minimum, maximum, mean
and median of sales, plus the number of transactions per day.
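The daily consolidation can be sketched in plain Python. This is a minimal sketch that assumes each transaction is a `(date, amount)` pair. The helper name `daily_summary` and the sample data are hypothetical, and the notebook itself may use a different tool for the grouping.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, median


def daily_summary(transactions):
    """Group (day, amount) pairs by day and compute per-day statistics."""
    by_day = defaultdict(list)
    for day, amount in transactions:
        by_day[day].append(amount)

    summary = {}
    for day in sorted(by_day):
        amounts = by_day[day]
        summary[day] = {
            "sum": sum(amounts),
            "min": min(amounts),
            "max": max(amounts),
            "mean": mean(amounts),
            "median": median(amounts),
            "count": len(amounts),  # number of transactions that day
        }
    return summary


# Made-up sample transactions, for illustration only
transactions = [
    (date(2023, 1, 1), 12.0),
    (date(2023, 1, 1), 30.0),
    (date(2023, 1, 1), 18.0),
    (date(2023, 1, 2), 50.0),
]
daily = daily_summary(transactions)
print(daily[date(2023, 1, 1)])
# {'sum': 60.0, 'min': 12.0, 'max': 30.0, 'mean': 20.0, 'median': 18.0, 'count': 3}
```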
Create a new column called After_1_day_sum_sales that holds the next
day's total sales (the target variable shifted one day forward), so that
each day's features are paired with the following day's target.
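Here is a minimal sketch of the one-day shift. It assumes the daily records are stored in a dict keyed by date, and `add_next_day_target` is a hypothetical helper. The sketch looks up the next calendar day rather than simply the next row. That way, a gap in the data never pairs a day with the wrong target. Days without a following day, including the last one, have no label and are dropped.

```python
from datetime import date, timedelta


def add_next_day_target(daily, target="sum"):
    """Copy the next calendar day's target into After_1_day_sum_sales.

    Days whose following day is absent (including the last day) have
    no label to learn from, so they are left out of the result.
    """
    paired = {}
    for day, features in sorted(daily.items()):
        tomorrow = day + timedelta(days=1)
        if tomorrow in daily:
            row = dict(features)  # copy so the input is not modified
            row["After_1_day_sum_sales"] = daily[tomorrow][target]
            paired[day] = row
    return paired


# Made-up daily aggregates; 2023-01-04 is deliberately missing
daily = {
    date(2023, 1, 1): {"sum": 60.0, "count": 3},
    date(2023, 1, 2): {"sum": 50.0, "count": 1},
    date(2023, 1, 3): {"sum": 75.0, "count": 2},
    date(2023, 1, 5): {"sum": 40.0, "count": 1},
}
paired = add_next_day_target(daily)
for day, row in paired.items():
    print(day, row)
```

Only 2023-01-01 and 2023-01-02 survive. The day before the gap and the final day are excluded because their "tomorrow" is not in the data.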
Train four models: linear regression (LR), support vector machine
(SVM), random forest and decision tree.

Findings
- SVM has the lowest MAE, whereas LR has the lowest RMSE.
- Because RMSE squares each error before averaging, it penalizes large
errors more heavily than MAE does. This suggests LR handles extreme
deviations better than SVM, while SVM is better at predicting typical
sales.
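A small made-up example shows how the two metrics can rank models differently, as they do for SVM and LR here. The numbers are illustrative only and do not come from the notebook. `spiky` makes three perfect predictions and one large miss, so it wins on MAE. `steady` is moderately wrong every time, so it wins on RMSE.

```python
from math import sqrt


def mae(actual, predicted):
    """Mean absolute error: every miss counts in proportion to its size."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: squaring makes large misses dominate."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


actual = [100, 100, 100, 100]
steady = [110, 90, 110, 90]   # four moderate misses of 10
spiky = [100, 100, 100, 136]  # three exact hits and one large miss of 36

print(mae(actual, steady), rmse(actual, steady))  # 10.0 10.0
print(mae(actual, spiky), rmse(actual, spiky))    # 9.0 18.0
```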

Findings
- None of the algorithms is able to handle the outliers (six data
points with values of roughly 10,000 to 20,000).
- Random Forest predicts some points unexpectedly high, up to 12,000,
where the actual value is about 5,000.
- Visually, the Decision Tree's predictions are rigid (step-like),
because each chain of decision rules ends in a leaf that outputs a
single constant value.
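The rigid shape of the Decision Tree's predictions comes from each leaf returning one constant. A hand-written depth-1 tree (a "stump") on a single made-up feature makes this visible. Every input maps to one of only two outputs, and an input far outside the training range (x = 100) gets the same prediction as x = 6. This is an illustrative sketch, not the library model used in the notebook. Deeper trees add more steps but remain piecewise constant.

```python
from statistics import mean


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree on one feature.

    Tries every threshold between adjacent distinct x values, keeps the
    one with the lowest total squared error, and returns a predictor that
    outputs the mean target of whichever side x falls on.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal x values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean, right_mean = mean(left), mean(right)
        sse = sum((y - left_mean) ** 2 for y in left) + sum(
            (y - right_mean) ** 2 for y in right
        )
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct x values")
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


xs = [1, 2, 3, 4, 5, 6]
ys = [10, 12, 11, 30, 29, 31]
predict = fit_stump(xs, ys)
print([predict(x) for x in [1, 2.5, 3.9, 4.1, 6, 100]])
```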
Next Step: Remove potential outliers and train again
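The notebook does not say how potential outliers are identified. One common rule, assumed here purely for illustration, is Tukey's fences: drop values more than 1.5 × IQR below the first quartile or above the third. The helper `remove_iqr_outliers` and the sales figures are hypothetical. On the real data, you would drop the whole day's row, not just its target value. `statistics.quantiles` requires Python 3.8 or later.

```python
from statistics import quantiles


def remove_iqr_outliers(values, k=1.5):
    """Keep values inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Made-up daily sales totals with one extreme day
sales = [4200, 5100, 3900, 4800, 5300, 4500, 4700, 18500, 5000, 4400]
kept = remove_iqr_outliers(sales)
print(kept)  # every value except 18500, in the original order
```

A larger `k` removes fewer points. Whatever rule is used, it should be fitted on the training period only, so the test data does not influence which days are discarded.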

The table below summarizes the After_1_day_sum_sales column.
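A summary like this, plus the standard deviation cited in the findings, can be computed with the standard library. This is a hedged sketch: the helper `describe` is hypothetical. The `method="inclusive"` quartiles use linear interpolation, which is assumed to match the default convention of R's `quantile()` (type 7). Other tools may report slightly different quartiles.

```python
from statistics import mean, median, quantiles, stdev


def describe(values):
    """Min, quartiles, median, mean, max and sample standard deviation."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")  # linear interpolation
    return {
        "min": min(values),
        "q1": q1,
        "median": median(values),
        "mean": mean(values),
        "q3": q3,
        "max": max(values),
        "sd": stdev(values),  # sample SD (n - 1 denominator)
    }


print(describe([1, 2, 3, 4, 5]))
```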
Findings
- SVM still has the lowest MAE.
- Random Forest now performs better, achieving the lowest RMSE.
- Overall, RMSE dropped by more than half, from about 4,000 to under
2,000.
- MAE also dropped, from about 2,000 to about 1,500.
- The standard deviation of the target column is about 1,700, so an MAE
of about 1,500 is still within an acceptable range.
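The size of each improvement can be checked directly from the rounded figures quoted above. Because those figures are rounded, the percentages are approximate. `relative_reduction` is a hypothetical helper.

```python
def relative_reduction(before, after):
    """Fraction of the original error removed."""
    return (before - after) / before


# Rounded figures quoted in the findings above
rmse_drop = relative_reduction(4000, 2000)  # RMSE fell to *under* 2,000
mae_drop = relative_reduction(2000, 1500)
mae_to_sd = 1500 / 1700  # MAE relative to the target's standard deviation

print(f"RMSE reduced by at least {rmse_drop:.0%}")  # at least 50%
print(f"MAE reduced by about {mae_drop:.0%}")       # about 25%
print(f"MAE is {mae_to_sd:.2f} of one standard deviation")  # 0.88
```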

Findings
- The outliers significantly degraded model performance; once they were
removed, the models predicted with smaller errors.
- SVM outperforms the other algorithms at predicting the next day's
sales, with the lowest MAE both before and after outlier removal.