Most of us would have heard about the new buzz in the market i.e. Cryptocurrency. Many of us would have invested in their coins too. But, is investing money in such a volatile currency safe? How can we make sure that investing in these coins now would surely generate a healthy profit in the future? We can’t be sure but we can surely generate an approximate value based on the previous prices. Time series models is one way to predict them.
In this article, I will talk about a new time series technique that was developed by Facebook named Prophet. Dataset is the credit card spending from January 2016 to April 2018 and I will try to forecast spending amount of credit card in May 2018
I use a decomposable time series model with three main model components: trend, seasonality, and holidays. They are combined in the following equation:
\(y(t) = g(t) + s(t) + h(t) + Error(t)\)
library(dplyr)
library(lubridate)
library(magrittr)
library(prophet)
library(ggplot2)
library(ggthemes)
rm(list = ls())
load(file = "D:/4. BA - R/1. TOPIC/49. CREDIT CARD SPENDING ANALYSIS/CREDIT CARD SPENDING ANALYSIS/Data/raw_data.Rda")
raw_data %>% head
## id yearmonth target_id trans_date trans_curr trans_amount_qd
## 1 239912899 201802 10656943 2018-02-10 VND 675000.0
## 2 239881890 201802 7312783 2018-02-10 VND 347694.6
## 3 239881897 201802 8291980 2018-02-10 VND 60943.2
## 4 239881914 201802 10073216 2018-02-10 VND 62080.2
## 5 239886848 201802 5348936 2018-02-09 VND 4590000.0
## 6 239886858 201802 6237498 2018-02-09 VND 2300000.0
## trans_details merchant_id trans_city trans_country
## 1 LUCAS 96 BUI THI XUAN 12401770 HA NOI Viet Nam
## 2 Uber BV 518898000200042 Vorden NETHERLANDS
## 3 Uber BV 518898000200042 Vorden NETHERLANDS
## 4 Uber BV 518898000200042 Vorden NETHERLANDS
## 5 MAT TROI HA LONG 44800014 QUANG NINH Viet Nam
## 6 NHA KHOA PHUONG 12001269 HA NOI Viet Nam
## bi_trans_type sic_code cif product_detail
## 1 POS 5137 1592062 02. TITANIUM LADY CREDIT
## 2 OnlinePayment 4121 1395827 02. TITANIUM LADY CREDIT
## 3 OnlinePayment 4121 1844092 04. PLATINUM CREDIT
## 4 OnlinePayment 4121 806327 03. TITANIUM STEPUP CREDIT
## 5 POS 7996 974699 04. PLATINUM CREDIT
## 6 POS 8021 765716 02. TITANIUM LADY CREDIT
## sic_id mcc_category
## 1 5137 Men/Women/Child Uniforms/Commercial Clothings FashionShopping
## 2 4121 Taxicabs/limousines Travelline
## 3 4121 Taxicabs/limousines Travelline
## 4 4121 Taxicabs/limousines Travelline
## 5 7996 Amusement Parks,circuses Entertainment
## 6 8021 Dentists,orthodontists HospitalMedical
## mcc_category_aia open_date yearmonth_new date month_transaction
## 1 <NA> 2017-11-07 201802 2018-02-10 4
## 2 <NA> 2016-09-28 201802 2018-02-10 17
## 3 <NA> 2017-04-10 201802 2018-02-10 11
## 4 <NA> 2017-09-19 201802 2018-02-10 5
## 5 <NA> 2015-08-13 201802 2018-02-09 31
## 6 Medical 2016-01-12 201802 2018-02-09 26
The dataset contains transactions of credit card users in many areas (called mcc categories), we need to forecast spending of every single mcc category before calculating the performance of the whole month
raw_data %>%
group_by(yearmonth_new) %>%
summarise(sum = round(sum(trans_amount_qd)/1e9,2)) %>%
ggplot(aes(x = as.factor(yearmonth_new), y = sum, group = 1)) +
geom_line(col = "darkgreen") +
geom_point(col = "darkred") +
geom_text(aes(label = sum), check_overlap = TRUE, vjust = -0.5) +
theme_economist() +
theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
xlab("Timeline") +
ylab("Credit card spending amount (bil)") +
ggtitle("Credit card spending")
Create credit card spending amount in each mcc category
load(file = "D:/4. BA - R/1. TOPIC/49. CREDIT CARD SPENDING ANALYSIS/CREDIT CARD SPENDING ANALYSIS/Data/Prophet_data.Rda")
for (i in unique(raw_data$mcc_category)) {
df <- raw_data %>%
filter(mcc_category == i) %>%
group_by(date) %>%
summarise(spending = round(sum(trans_amount_qd)/1e9,5)) %>%
as.data.frame()
names(df)[1] <- "ds"
names(df)[2] <- "y"
df$y <- log(df$y)
df$mcc_category <- paste(i)
name <- paste(i)
df <- df %>% select(3,1,2)
assign(name, df)
}
Calculate performce of credit card in May of each mcc category using regression for logarithm value of credit card spending
list_of_dataframe <- list(FashionShopping, Travelline, Entertainment, HospitalMedical,
Departmentstores, Utilities, Dailyitems, Others,
Cash, DinningDrinking, Education, Automotivemotorandbikesaleandservice,
Electronic, HouseholdFurniture, Jewellery, Advertisingservices, Insurance,
Fuel, TelephoneServiceEquipment, BeautySpa, Dutyfreetouristattraction,
Bookstoresnewsstationeryprinting, Realestate)
final_result <- data.frame()
for ( i in c(1:23)) {
df <- list_of_dataframe[[i]]
m <- prophet(df)
future <- make_future_dataframe(m, periods = 32)
forecast <- predict(m, future)
result <- forecast %>%
mutate(spending = exp(1)^yhat) %>%
tail(31) %>%
summarise(predict = sum(spending)) %>%
as.data.frame()
result$mcc_category <- unique(list_of_dataframe[[i]]$mcc_category)
final_result <- rbind(final_result, result)
}
## STAN OPTIMIZATION COMMAND (LBFGS)
## init = user
## save_iterations = 1
## init_alpha = 0.001
## tol_obj = 1e-012
## tol_grad = 1e-008
## tol_param = 1e-008
## tol_rel_obj = 10000
## tol_rel_grad = 1e+007
## history_size = 5
## seed = 928045447
## initial log joint probability = -50.544
## Optimization terminated normally:
## Convergence detected: relative gradient magnitude is below tolerance
## STAN OPTIMIZATION COMMAND (LBFGS)
## init = user
## save_iterations = 1
## init_alpha = 0.001
## tol_obj = 1e-012
## tol_grad = 1e-008
## tol_param = 1e-008
## tol_rel_obj = 10000
## tol_rel_grad = 1e+007
## history_size = 5
## seed = 2061027555
## initial log joint probability = -28.8111
## Optimization terminated normally:
## Convergence detected: relative gradient magnitude is below tolerance
## STAN OPTIMIZATION COMMAND (LBFGS)
## init = user
## save_iterations = 1
## init_alpha = 0.001
## tol_obj = 1e-012
## tol_grad = 1e-008
## tol_param = 1e-008
## tol_rel_obj = 10000
## tol_rel_grad = 1e+007
## history_size = 5
## seed = 995665898
## initial log joint probability = -159.451
## Optimization terminated normally:
## Convergence detected: absolute parameter change was below tolerance
## STAN OPTIMIZATION COMMAND (LBFGS)
## init = user
## save_iterations = 1
## init_alpha = 0.001
## tol_obj = 1e-012
## tol_grad = 1e-008
## tol_param = 1e-008
## tol_rel_obj = 10000
## tol_rel_grad = 1e+007
## history_size = 5
## seed = 1392713796
## initial log joint probability = -131.491
## Optimization terminated normally:
## Convergence detected: relative gradient magnitude is below tolerance
## STAN OPTIMIZATION COMMAND (LBFGS)
## init = user
## save_iterations = 1
## init_alpha = 0.001
## tol_obj = 1e-012
## tol_grad = 1e-008
## tol_param = 1e-008
## tol_rel_obj = 10000
## tol_rel_grad = 1e+007
## history_size = 5
## seed = 2017638617
## initial log joint probability = -44.4012
## Optimization terminated normally:
## Convergence detected: relative gradient magnitude is below tolerance
## STAN OPTIMIZATION COMMAND (LBFGS)
## init = user
## save_iterations = 1
## init_alpha = 0.001
## tol_obj = 1e-012
## tol_grad = 1e-008
## tol_param = 1e-008
## tol_rel_obj = 10000
## tol_rel_grad = 1e+007
## history_size = 5
## seed = 1276467127
## initial log joint probability = -60.8832
## Optimization terminated normally:
## Convergence detected: relative gradient magnitude is below tolerance
## STAN OPTIMIZATION COMMAND (LBFGS)
## init = user
## save_iterations = 1
## init_alpha = 0.001
## tol_obj = 1e-012
## tol_grad = 1e-008
## tol_param = 1e-008
## tol_rel_obj = 10000
## tol_rel_grad = 1e+007
## history_size = 5
## seed = 776550200
## initial log joint probability = -12.9036
## Optimization terminated normally:
## Convergence detected: relative gradient magnitude is below tolerance
## STAN OPTIMIZATION COMMAND (LBFGS)
## init = user
## save_iterations = 1
## init_alpha = 0.001
## tol_obj = 1e-012
## tol_grad = 1e-008
## tol_param = 1e-008
## tol_rel_obj = 10000
## tol_rel_grad = 1e+007
## history_size = 5
## seed = 570197813
## initial log joint probability = -206.558
## Optimization terminated normally:
## Convergence detected: relative gradient magnitude is below tolerance
## STAN OPTIMIZATION COMMAND (LBFGS)
## init = user
## save_iterations = 1
## init_alpha = 0.001
## tol_obj = 1e-012
## tol_grad = 1e-008
## tol_param = 1e-008
## tol_rel_obj = 10000
## tol_rel_grad = 1e+007
## history_size = 5
## seed = 814904973
## initial log joint probability = -45.6499
## Optimization terminated normally:
## Convergence detected: relative gradient magnitude is below tolerance
## STAN OPTIMIZATION COMMAND (LBFGS)
## init = user
## save_iterations = 1
## init_alpha = 0.001
## tol_obj = 1e-012
## tol_grad = 1e-008
## tol_param = 1e-008
## tol_rel_obj = 10000
## tol_rel_grad = 1e+007
## history_size = 5
## seed = 211907482
## initial log joint probability = -19.6751
## Optimization terminated normally:
## Convergence detected: absolute parameter change was below tolerance
## STAN OPTIMIZATION COMMAND (LBFGS)
## init = user
## save_iterations = 1
## init_alpha = 0.001
## tol_obj = 1e-012
## tol_grad = 1e-008
## tol_param = 1e-008
## tol_rel_obj = 10000
## tol_rel_grad = 1e+007
## history_size = 5
## seed = 1304807879
## initial log joint probability = -586.116
## Optimization terminated normally:
## Convergence detected: absolute parameter change was below tolerance
## STAN OPTIMIZATION COMMAND (LBFGS)
## init = user
## save_iterations = 1
## init_alpha = 0.001
## tol_obj = 1e-012
## tol_grad = 1e-008
## tol_param = 1e-008
## tol_rel_obj = 10000
## tol_rel_grad = 1e+007
## history_size = 5
## seed = 755878985
## initial log joint probability = -291.915
## Optimization terminated normally:
## Convergence detected: absolute parameter change was below tolerance
## STAN OPTIMIZATION COMMAND (LBFGS)
## init = user
## save_iterations = 1
## init_alpha = 0.001
## tol_obj = 1e-012
## tol_grad = 1e-008
## tol_param = 1e-008
## tol_rel_obj = 10000
## tol_rel_grad = 1e+007
## history_size = 5
## seed = 957690062
## initial log joint probability = -57.0713
## Optimization terminated normally:
## Convergence detected: relative gradient magnitude is below tolerance
## STAN OPTIMIZATION COMMAND (LBFGS)
## init = user
## save_iterations = 1
## init_alpha = 0.001
## tol_obj = 1e-012
## tol_grad = 1e-008
## tol_param = 1e-008
## tol_rel_obj = 10000
## tol_rel_grad = 1e+007
## history_size = 5
## seed = 1539297794
## initial log joint probability = -171.261
## Optimization terminated normally:
## Convergence detected: absolute parameter change was below tolerance
## STAN OPTIMIZATION COMMAND (LBFGS)
## init = user
## save_iterations = 1
## init_alpha = 0.001
## tol_obj = 1e-012
## tol_grad = 1e-008
## tol_param = 1e-008
## tol_rel_obj = 10000
## tol_rel_grad = 1e+007
## history_size = 5
## seed = 1115245831
## initial log joint probability = -46.3715
## Optimization terminated normally:
## Convergence detected: relative gradient magnitude is below tolerance
## STAN OPTIMIZATION COMMAND (LBFGS)
## init = user
## save_iterations = 1
## init_alpha = 0.001
## tol_obj = 1e-012
## tol_grad = 1e-008
## tol_param = 1e-008
## tol_rel_obj = 10000
## tol_rel_grad = 1e+007
## history_size = 5
## seed = 1742770748
## initial log joint probability = -134.498
## Optimization terminated normally:
## Convergence detected: relative gradient magnitude is below tolerance
## STAN OPTIMIZATION COMMAND (LBFGS)
## init = user
## save_iterations = 1
## init_alpha = 0.001
## tol_obj = 1e-012
## tol_grad = 1e-008
## tol_param = 1e-008
## tol_rel_obj = 10000
## tol_rel_grad = 1e+007
## history_size = 5
## seed = 1453274203
## initial log joint probability = -2466.52
## Optimization terminated normally:
## Convergence detected: absolute parameter change was below tolerance
## STAN OPTIMIZATION COMMAND (LBFGS)
## init = user
## save_iterations = 1
## init_alpha = 0.001
## tol_obj = 1e-012
## tol_grad = 1e-008
## tol_param = 1e-008
## tol_rel_obj = 10000
## tol_rel_grad = 1e+007
## history_size = 5
## seed = 35550647
## initial log joint probability = -404.799
## Optimization terminated normally:
## Convergence detected: absolute parameter change was below tolerance
## STAN OPTIMIZATION COMMAND (LBFGS)
## init = user
## save_iterations = 1
## init_alpha = 0.001
## tol_obj = 1e-012
## tol_grad = 1e-008
## tol_param = 1e-008
## tol_rel_obj = 10000
## tol_rel_grad = 1e+007
## history_size = 5
## seed = 1163350206
## initial log joint probability = -141.945
## Optimization terminated normally:
## Convergence detected: relative gradient magnitude is below tolerance
## STAN OPTIMIZATION COMMAND (LBFGS)
## init = user
## save_iterations = 1
## init_alpha = 0.001
## tol_obj = 1e-012
## tol_grad = 1e-008
## tol_param = 1e-008
## tol_rel_obj = 10000
## tol_rel_grad = 1e+007
## history_size = 5
## seed = 1190466966
## initial log joint probability = -198.14
## Optimization terminated normally:
## Convergence detected: absolute parameter change was below tolerance
## STAN OPTIMIZATION COMMAND (LBFGS)
## init = user
## save_iterations = 1
## init_alpha = 0.001
## tol_obj = 1e-012
## tol_grad = 1e-008
## tol_param = 1e-008
## tol_rel_obj = 10000
## tol_rel_grad = 1e+007
## history_size = 5
## seed = 783860143
## initial log joint probability = -1691.26
## Optimization terminated normally:
## Convergence detected: relative gradient magnitude is below tolerance
## STAN OPTIMIZATION COMMAND (LBFGS)
## init = user
## save_iterations = 1
## init_alpha = 0.001
## tol_obj = 1e-012
## tol_grad = 1e-008
## tol_param = 1e-008
## tol_rel_obj = 10000
## tol_rel_grad = 1e+007
## history_size = 5
## seed = 1941332230
## initial log joint probability = -63945.3
## Optimization terminated normally:
## Convergence detected: relative gradient magnitude is below tolerance
## STAN OPTIMIZATION COMMAND (LBFGS)
## init = user
## save_iterations = 1
## init_alpha = 0.001
## tol_obj = 1e-012
## tol_grad = 1e-008
## tol_param = 1e-008
## tol_rel_obj = 10000
## tol_rel_grad = 1e+007
## history_size = 5
## seed = 1254228559
## initial log joint probability = -4255.61
## Optimization terminated normally:
## Convergence detected: absolute parameter change was below tolerance
final_result %<>% select(2,1)
predict <- final_result %>%
select(1,2) %>%
mutate(year = 2018,
month = "05" %>% as.factor()) %>%
select(1,3,4,2) %>%
rename(mcc_category = mcc_category,
year = year,
month = month,
spending = predict) %>%
as.data.frame()
mcc_spending <- raw_data %>%
mutate(year = year(date),
month = month(date)) %>%
mutate(month = case_when(
month == 1 ~ "01",
month == 2 ~ "02",
month == 3 ~ "03",
month == 4 ~ "04",
month == 5 ~ "05",
month == 6 ~ "06",
month == 7 ~ "07",
month == 8 ~ "08",
month == 9 ~ "09",
month == 10 ~ "10",
month == 11 ~ "11",
month == 12 ~ "12"
) %>% as.factor) %>%
group_by(mcc_category, year, month) %>%
summarise(spending = round(sum(trans_amount_qd)/1e9,5)) %>%
as.data.frame()
mcc_spending <- rbind(mcc_spending, predict)
mcc_spending %>%
mutate(yearmonth = paste(year,month, sep = "") %>% as.factor()) %>%
group_by(yearmonth) %>%
summarise(spending = sum(spending)) %>%
ggplot(aes(x = as.factor(yearmonth), y = spending, group = 1)) +
geom_line(col = "darkgreen") +
geom_point(col = "darkred") +
geom_text(aes(label = round(spending,2)), check_overlap = TRUE, vjust = -0.5) +
theme_economist() +
xlab("Timeline") +
ylab("Credit card spending amount (bil)") +
ggtitle("Spending per transaction since 2017")
After using prophet for time series forecast, we can see that credit card spending in May 2018 reach more than 3000 billion VND