Can macroeconomic factors predict stock market returns? A case of listed financial stocks on the Dhaka Stock Exchange
Both asset managers and academics have sought to identify the factors most highly correlated with stock market growth. The business value of such research is immense, informing decisions from asset allocation to tilting portfolios toward particular sectors. Although academic research is publicly available, most of the research done by asset managers is proprietary, part of the 'black box' that drives their investment management decisions. This makes sense: managers with high alpha have little incentive to share their insights, since secrecy helps them keep delivering a satisfactory alpha to their investors. Moreover, although papers using macroeconomic factors to predict overall stock market returns are common, very few academics have attempted to predict stock returns on a sectoral basis. Even less common are empirical studies of emerging markets, partly because of the limited availability of data and partly because these markets are too small for the larger asset managers to enter. This paper investigates whether macroeconomic variables can help predict the stock returns of listed financial stocks on the Dhaka Stock Exchange. The financial stocks are divided into two categories, banks and insurance companies. Several models, including linear regression, are used to test the hypothesis that bank and insurance stock returns can be predicted using changes in macroeconomic variables.
A myriad of research has been conducted on the factors affecting stock market returns around the world. The topic is of special interest to researchers because of the potential application of a valid model in improving the returns of funds under management. More research is available on the effect of macroeconomic factors on stock returns in developed markets, perhaps owing to greater data availability. However, since this research focuses on Bangladesh, an emerging market economy, we concentrate on the literature covering other emerging and frontier markets. The literature review for this paper can be divided into two segments: the first dealing with the effect of macroeconomic variables on overall stock market returns as measured by market indices, and the second dealing with the effect on banking or insurance stock returns specifically. As expected, there has been very limited research on sectoral returns, yet there is demand for such research to support portfolio allocations that improve the risk/return trade-off while keeping the portfolio well diversified. One of the most notable studies of the relationship between macroeconomic variables and stock returns was conducted by Garcia and Liu [1]. It uses data for 15 countries over the period 1980 to 1995: seven countries in Latin America, six in East Asia, and two advanced economies, the US and Japan. The paper explores the effects of real income, the saving rate, financial intermediary development, stock market liquidity, and macroeconomic stability on stock market capitalization, and concludes from regression analysis that real income, the savings rate, financial intermediary development, and stock market liquidity are important predictors of market capitalization. Several other tests have been carried out on markets in the subcontinent. Nijam, Ismail and Musthafa [2] investigated the relationship between the returns of the All Share Price Index of the Colombo Stock Exchange and five macroeconomic variables: GDP, inflation, the interest rate, the balance of payments, and the exchange rate. Ordinary Least Squares was used with linear, linear-log, log-log and log-linear transformations. The study finds GDP, the exchange rate, and the interest rate to be significantly positively related to returns, whereas inflation is negatively related; the balance of payments was found to be insignificant. Srivastava [3] used six factors to conduct a similar test on the Indian market: the industrial production index for all commodities, the wholesale price index, interest rates, and foreign exchange rates, using monthly time series for the Sensex from April 1996 to January 2009. Applying the Johansen multivariate cointegration test, the paper concludes that the industrial production index, the wholesale price index, and interest rates significantly affect pricing in the Indian stock market. In another study, Haque and Sarwar [4] explore the relationship between macroeconomic variables and individual stock returns for 394 listed equities on the Karachi Stock Exchange. Using panel data specification tests, they conclude that volatility and GDP have significant positive associations with returns, while inflation, the interest rate, the budget deficit, and the money supply have significant negative associations. One interesting finding of this study is that different sectors react differently to the same macroeconomic variable.
Regarding sectoral research on the financial sector specifically, we have come across three papers that test the relationship between macroeconomic factors and banking and insurance stock returns. Nurazi and Usman [5] examined whether several microeconomic factors (the CAMELS ratios) and several macroeconomic factors (the interest rate, exchange rate, and inflation rate) are related to the stock returns of 16 listed Indonesian banks. Using a Pooled Least Squares model, the study found that all three macroeconomic variables contributed negatively and significantly to the stock returns of the sixteen banks over the period 2002 to 2011. Another study, by Mugambi and Okech [6], uses quarterly time series data from 2000 to 2015 to explore the relationship between stock returns and macroeconomic variables for banks listed on the Nairobi Stock Exchange. The variables used are the exchange rate, inflation rate, interest rate, and GDP. Fitting a linear regression model by Ordinary Least Squares, the paper concludes that all variables except GDP have a significant effect at the 5% level of significance. Bhattarai [7] uses data for seven banks and six insurance companies from 2009 to 2014 and concludes from regression analysis that the money supply, GDP growth, the exchange rate, and interest rates all have significant associations with price movements in all 13 financial stocks. As for research done specifically on the Dhaka Stock Exchange, the focus of this study, most authors have worked with overall market returns. Ali [8] used a multivariate regression model and found that inflation and foreign remittances have a negative influence and the industrial production index and market P/Es a positive one; however, a lack of Granger causality may point to an informationally inefficient market. A similar paper by Mohiuddin, Alam and Shahid [9] used quarterly data and found no significant relationship between macroeconomic variables and market returns. Hasan [10] applies Ordinary Least Squares and Weighted Least Squares to monthly returns for 126 stocks to test the association between macroeconomic variables and sector returns, concluding that only the banking sector shows a significant association.
We intend to use monthly macroeconomic data for Bangladesh and monthly market cap data for banks and insurance companies listed on the Dhaka Stock Exchange.
The research questions are:
Are there macroeconomic variables that can help predict the stock returns of the banks listed on the Dhaka Stock Exchange?
Are there macroeconomic variables that can help predict the stock returns of the insurance companies listed on the Dhaka Stock Exchange?
This study takes a different approach from earlier work. First, we use monthly data for the variables and expand beyond the five macroeconomic factors used by most of the papers above. Second, a YoY (year-on-year) lag transformation is applied to remove yearly seasonality from the data. Most of the papers above limit themselves to Ordinary Least Squares and tests of causality; none has tested whether some of these variables can predict sectoral returns within acceptable error bounds. This study is a first in the sense that we use linear regression models including Partial Least Squares and Principal Component Regression, and, where possible, penalized models such as ridge, lasso, and elastic net, to explore which method best predicts sectoral returns on the Dhaka Stock Exchange. If the data permit, non-linear models such as kNN, SVM, MARS, and neural networks will also be tested. The same process is followed separately for banks and insurance companies. Models are evaluated using metrics such as adjusted R-squared, MAPE, and RMSE. It will be interesting to see whether the associations between macroeconomic variables and stock returns observed in other emerging markets hold for the banking and insurance stocks on the Dhaka Stock Exchange. If such associations exist, they will be of immense interest to fund managers deciding on sectoral tilts.
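Since RMSE and MAPE drive the model comparisons later, a minimal sketch of both is given below (our own helper definitions, not package functions; caret's postResample, used later, reports RMSE, R-squared, and MAE):
#Illustrative error-metric helpers (assumed definitions, not from any package)
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
mape <- function(obs, pred) 100 * mean(abs((obs - pred) / obs)) #undefined when any obs == 0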
The data has been specifically collated by us for the purpose of this study. Most of the variables are available on the CEIC website (https://www.ceicdata.com/en/bangladesh). There were many missing values in the interest rate data, and those were picked up from the Bangladesh Bank website (https://bb.org.bd/pub/archive.php?i=6). The process was fully manual, as we had to go through several archives to get the required data. Most variables were already reported on a YoY basis; the remaining raw variables were converted to YoY changes to keep the dataset consistent. A description of the variables follows.

Dependent variables (YoY percentage change):

- MCB: Market capitalization of banks
- MCI: Market capitalization of insurance companies

Independent variables (YoY percentage change):

- TB5: T-bill rate, 5 years, a proxy for long-term interest rates
- TB10: T-bill rate, 10 years, a proxy for long-term interest rates
- BLR: Average bank lending rate
- CPI: Inflation rate
- CAC: Change in the current account balance, a proxy for international trade
- DCR: Change in the domestic credit level
- ELC: Change in electricity availability (an important factor supporting industrialization in developing countries)
- EXC: Change in the Taka/dollar exchange rate
- FXR: Change in foreign exchange reserves
- IPR: Change in the industrial production rate
- M2: Change in broad money
- M1: Change in narrow money
- PE: Change in the overall price-to-earnings ratio
- STR: Short-term interest rate (prime rate)
- DPG: Change in the total deposit level, a proxy for the savings rate
- EXG: Change in the export level
- IMG: Change in the import level
- LOAG: Change in total loans taken, a proxy for the investment level

The dataset contains monthly data from December 2004 to October 2020. This is almost 16 years of data and covers one boom and one bust on the DSE.
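As a reference for the YoY conversion described above, the sketch below turns a monthly level series into year-on-year changes (illustrative; x stands in for any raw level series):
#Year-on-year change using a 12-month lag (illustrative)
yoy <- function(x, lag = 12) (x - dplyr::lag(x, lag)) / dplyr::lag(x, lag)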
First, we will read in the compiled data file (in CSV format). Due to a lack of sufficient data, remittance has been dropped from the dataset.
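The analysis that follows relies on several packages. For reproducibility, the loads below cover the functions used in this report (psych and gridExtra are also called with :: later, so loading them is optional):
#Packages used throughout the analysis
library(dplyr)      #pipes and data manipulation
library(knitr)      #kable tables
library(kableExtra) #table styling
library(psych)      #describe() summary statistics
library(visdat)     #vis_miss() missing-data plot
library(reshape2)   #melt() for long-format plotting
library(ggplot2)    #density plots and boxplots
library(caret)      #preprocessing, data splitting, and model training
library(ggcorrplot) #correlation heatmap
library(plotmo)     #model response plots
library(gridExtra)  #arranging multiple plots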
#Read in the data
data<-read.csv("https://raw.githubusercontent.com/zahirf/Data698/main/macrodata.csv", stringsAsFactors = F, sep=",")
data1<-data[,-1] #drop the month column
There are 193 observations of 21 variables. MCB and MCI, the YoY changes in the market capitalizations of the bank and insurance sectors, are the two dependent variables; apart from the month column, the rest are predictors. Let us take a look at the summary statistics.
temp<-psych::describe(data1)
temp%>%
kable(caption='Summary statistics Numerical') %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = T)
| variable | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MCB | 1 | 191 | 0.2427514 | 0.5201478 | 0.0567252 | 0.1597277 | 0.3598092 | -0.3971493 | 2.393861 | 2.791010 | 1.6132664 | 2.7176559 | 0.0376366 |
| MCI | 2 | 191 | 0.3105277 | 0.6447370 | 0.0993257 | 0.1960082 | 0.2982239 | -0.4219561 | 3.656711 | 4.078667 | 2.3683365 | 7.1059110 | 0.0466515 |
| TB5 | 3 | 191 | 0.0212408 | 0.2419886 | -0.0020000 | 0.0043072 | 0.1734642 | -0.5110000 | 1.178000 | 1.689000 | 1.2240442 | 3.5774127 | 0.0175097 |
| TB10 | 4 | 191 | 0.0011518 | 0.1631245 | 0.0080000 | 0.0055556 | 0.1200906 | -0.4520000 | 0.351000 | 0.803000 | -0.2249347 | 0.0402007 | 0.0118033 |
| BLR | 5 | 191 | -0.0092618 | 0.0800538 | -0.0030000 | -0.0119085 | 0.0904386 | -0.1990000 | 0.188000 | 0.387000 | 0.1389610 | -0.4396476 | 0.0057925 |
| CPI | 6 | 191 | 7.0263351 | 1.9083485 | 6.3700000 | 6.7456732 | 1.3254444 | 2.2460000 | 12.716000 | 10.470000 | 1.1628794 | 1.0036915 | 0.1380833 |
| CAC | 7 | 191 | 0.4267068 | 41.8461889 | -0.6440000 | -0.7068758 | 1.3788180 | -134.9290000 | 537.268000 | 672.197000 | 10.7158025 | 139.7954663 | 3.0278840 |
| DCR | 8 | 191 | 16.0074764 | 4.7844381 | 14.4100000 | 15.4851634 | 4.8140022 | 9.6300000 | 29.177000 | 19.547000 | 0.8010703 | -0.2467050 | 0.3461898 |
| ELC | 9 | 191 | 0.0849476 | 0.0876615 | 0.0770000 | 0.0818497 | 0.0726474 | -0.1850000 | 0.435000 | 0.620000 | 0.3460588 | 1.8424585 | 0.0063430 |
| EXC | 10 | 191 | 0.0235393 | 0.0414074 | 0.0090000 | 0.0197059 | 0.0207564 | -0.0500000 | 0.174000 | 0.224000 | 1.0195664 | 1.1137629 | 0.0029961 |
| FXR | 11 | 191 | 0.1874136 | 0.2154764 | 0.1740000 | 0.1768562 | 0.2431464 | -0.2400000 | 0.836000 | 1.076000 | 0.4333208 | -0.4507889 | 0.0155913 |
| IPR | 12 | 191 | 10.0785916 | 7.2165515 | 10.4760000 | 10.3104967 | 6.6761478 | -29.2280000 | 27.266000 | 56.494000 | -1.0435233 | 4.6784510 | 0.5221713 |
| M2 | 13 | 191 | 16.0569948 | 3.8287890 | 16.2300000 | 16.0663333 | 4.5515820 | 8.7600000 | 23.506000 | 14.746000 | -0.0201483 | -0.9847020 | 0.2770414 |
| M1 | 14 | 191 | 0.1346335 | 0.0782014 | 0.1330000 | 0.1354379 | 0.0667170 | -0.1040000 | 0.354000 | 0.458000 | -0.1460577 | 0.7250039 | 0.0056585 |
| PE | 15 | 191 | 0.0491204 | 0.3221201 | -0.0360000 | 0.0186536 | 0.2520420 | -0.5870000 | 1.180000 | 1.767000 | 0.9339893 | 0.7873051 | 0.0233078 |
| STR | 16 | 191 | 0.2489476 | 1.1109342 | 0.0170000 | 0.0601699 | 0.3721326 | -0.8200000 | 7.435000 | 8.255000 | 4.0542367 | 20.8011968 | 0.0803844 |
| DPG | 17 | 191 | 16.0667696 | 3.8424164 | 16.6020000 | 16.1379804 | 5.0393574 | 9.2270000 | 22.914000 | 13.687000 | -0.1567664 | -1.2352058 | 0.2780275 |
| EXG | 18 | 191 | 12.9780314 | 18.3841520 | 10.3940000 | 11.3361634 | 11.2173516 | -61.3670000 | 75.661000 | 137.028000 | 0.5242242 | 3.8298597 | 1.3302305 |
| IMG | 19 | 191 | 12.8969948 | 19.5402860 | 10.4540000 | 12.5714641 | 18.7875072 | -50.6960000 | 68.067000 | 118.763000 | 0.0451563 | 0.3604087 | 1.4138855 |
| LOAG | 20 | 191 | 16.6023927 | 4.3046512 | 15.7830000 | 16.2515098 | 4.4181480 | 9.4710000 | 28.129000 | 18.658000 | 0.6508671 | -0.4195888 | 0.3114736 |
Missing Data
# Missing Data
vis_miss(data1)
There is no missing data in the observations, as much care was taken to compile a complete dataset.
Skewness and Outliers
We will look at the distributions of the variables and identify outliers. We find that many of the variables are bimodal or trimodal and do not follow a normal distribution.
data_sub <- melt(data1)  #long format for faceted density plots
suppressWarnings(ggplot(data_sub, aes(x = value)) +
  geom_density(fill = 'orange') + facet_wrap(~variable, scales = 'free'))
ggplot(data = reshape2::melt(data1), aes(x = variable, y = log(abs(value)))) +
  geom_boxplot(fill = 'steelblue2', outlier.alpha = 0.75) +
  labs(title = 'Boxplot: raw variables',
       x = 'Variables',
       y = 'log of absolute values') +
  theme(panel.background = element_rect(fill = 'white'),
        axis.text.x = element_text(size = 10, angle = 90))
## Warning: Removed 16 rows containing non-finite values (stat_boxplot).
There are outliers, as expected with capital market returns, but none of the variables is overwhelmed by them. We will standardize the variables by centering and scaling.
#Center and scale all variables
preprocessing <- preProcess(as.data.frame(data1), method = c("center", "scale"))
data_imp <- predict(preprocessing, data1)
After centering and scaling, the variables are on a common scale, and their distributions are easier to compare.
data_sub <- melt(data_imp)  #long format after centering and scaling
suppressWarnings(ggplot(data_sub, aes(x = value)) +
  geom_density(fill = 'orange') + facet_wrap(~variable, scales = 'free'))
ggplot(data = reshape2::melt(data_imp), aes(x = variable, y = log(abs(value)))) +
  geom_boxplot(fill = 'steelblue2', outlier.alpha = 0.75) +
  labs(title = 'Boxplot: centered and scaled variables',
       x = 'Variables',
       y = 'log of absolute values') +
  theme(panel.background = element_rect(fill = 'white'),
        axis.text.x = element_text(size = 10, angle = 90))
Remove correlated predictors
We will look for predictor pairs with correlations above 0.85 and remove one variable from each pair.
#Separate x and y variables
predictors<-data_imp[, -c(1:2)]
mcb<-data.frame(data_imp[,1]) #Y variable for bank
mci<-data.frame(data_imp[,2]) #Y variable for insurance
colnames(mcb)[1] <- "mcb" #rename the columns
colnames(mci)[1] <- "mci"
Let us first explore the relationships of the predictors with the two dependent variables. Except for CAC (the change in the current account balance), all the predictors appear to show some relationship with banking sector returns.
#plot
featurePlot(predictors,mcb$mcb)
For insurance sector returns, the change in the current account balance and the change in short-term interest rates appear to be the variables uncorrelated with stock returns.
#plot
featurePlot(predictors,mci$mci)
We would like to take a detailed look at the relationships among the predictor variables.
corr <- round(cor(predictors, use="na.or.complete"), 1)
ggcorrplot(corr,
type="lower",
lab=TRUE,
lab_size=3,
method="circle",
colors=c("red2", "white", "blue3"),
title="Correlation of variables",
ggtheme=theme_minimal)
We see only two pairs with correlations above 0.85: the 5-year and 10-year treasury rates, both proxies for long-term interest rates, and the deposit growth rate paired with the broad money measure M2. We remove TB10 and the deposit growth rate (DPG), which should take care of the multicollinearity.
tooHigh <- findCorrelation(cor(predictors, use="na.or.complete"), cutoff = .85, names = TRUE)
tooHigh
## [1] "DPG" "TB10"
predictors[ ,c(tooHigh)] <- list(NULL)
colnames(predictors)
## [1] "TB5" "BLR" "CPI" "CAC" "DCR" "ELC" "EXC" "FXR" "IPR" "M2"
## [11] "M1" "PE" "STR" "EXG" "IMG" "LOAG"
Near Zero variance
There are no near-zero-variance predictors in the dataset.
caret::nearZeroVar(predictors, names = TRUE)
## character(0)
We will first split the data into training and test sets, using a 70/30 split.
set.seed(675)
#Splitting data into training and test datasets
bank<-cbind(mcb, predictors)
splitt <- createDataPartition(bank$mcb, p=0.7, list=FALSE)
x_train <- bank[splitt, ]
x_train<-x_train[,-1]
y_train <- as.data.frame(bank$mcb[splitt])
x_test <- bank[-splitt, ]
x_test<-x_test[,-1]
y_test <- as.data.frame(bank$mcb[-splitt])
#Correct column names
colnames(y_train) <- c("MCB")
colnames(y_test) <- c("MCB")
ctrl <- trainControl(method = "cv", number = 10)
lm <- train(x_train, y_train$MCB,
method = "lm",
trControl = ctrl)
summary(lm)
##
## Call:
## lm(formula = .outcome ~ ., data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.96497 -0.33397 0.03142 0.29204 0.95887
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.029511 0.040872 -0.722 0.4717
## TB5 -0.037517 0.073739 -0.509 0.6119
## BLR -0.059279 0.076782 -0.772 0.4416
## CPI 0.103418 0.064924 1.593 0.1139
## CAC -0.093574 0.098566 -0.949 0.3444
## DCR 1.120611 0.211030 5.310 5.22e-07 ***
## ELC 0.016439 0.046745 0.352 0.7257
## EXC 0.072083 0.097757 0.737 0.4624
## FXR 0.223101 0.125678 1.775 0.0784 .
## IPR 0.030919 0.057307 0.540 0.5905
## M2 -0.525381 0.126821 -4.143 6.48e-05 ***
## M1 0.085463 0.104784 0.816 0.4164
## PE 0.865999 0.071854 12.052 < 2e-16 ***
## STR -0.031095 0.078873 -0.394 0.6941
## EXG 0.009485 0.072210 0.131 0.8957
## IMG -0.110853 0.070618 -1.570 0.1192
## LOAG -0.221254 0.134126 -1.650 0.1017
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4565 on 118 degrees of freedom
## Multiple R-squared: 0.8224, Adjusted R-squared: 0.7984
## F-statistic: 34.16 on 16 and 118 DF, p-value: < 2.2e-16
lm$results
## intercept RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 1 TRUE 0.4959997 0.7458327 0.4109339 0.05713202 0.1184137 0.04192749
plot(varImp(lm), main='LM Variables by importance')
plot(residuals(lm), main='Residuals LM' )
set.seed(100)
pls <- train(x_train, y_train$MCB,
method = "pls",
tuneLength = 25,
trControl = ctrl)
summary(pls)
## Data: X dimension: 135 16
## Y dimension: 135 1
## Fit method: oscorespls
## Number of components considered: 14
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps
## X 23.92 36.28 57.01 64.44 72.45 77.18 80.04
## .outcome 49.15 69.91 74.80 77.96 79.59 80.10 80.85
## 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps
## X 83.12 86.16 88.59 92.55 94.61 95.04 96.58
## .outcome 81.50 81.87 82.12 82.18 82.21 82.24 82.24
plot(pls, main='PLS Components versus RMSE')
pls$bestTune
## ncomp
## 14 14
#Save lowest RMSE in results
pls_results <- pls$results %>%
filter(ncomp==14)
pls_results
## ncomp RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 1 14 0.4851892 0.7626439 0.4032369 0.08212703 0.1255137 0.08777629
plot(varImp(pls), main='PLS Variables by Importance')
plot(residuals(pls) )
ridgeGrid <- data.frame(.lambda = seq(0, .1, length = 15))
ridge <- train(x_train, y_train$MCB,
method = "ridge",
tuneGrid = ridgeGrid,
trControl = ctrl)
summary(ridge)
## Length Class Mode
## call 4 -none- call
## actions 25 -none- list
## allset 16 -none- numeric
## beta.pure 400 -none- numeric
## vn 16 -none- character
## mu 1 -none- numeric
## normx 16 -none- numeric
## meanx 16 -none- numeric
## lambda 1 -none- numeric
## L1norm 25 -none- numeric
## penalty 25 -none- numeric
## df 25 -none- numeric
## Cp 25 -none- numeric
## sigma2 1 -none- numeric
## xNames 16 -none- character
## problemType 1 -none- character
## tuneValue 1 data.frame list
## obsLevels 1 -none- logical
## param 0 -none- list
plot(ridge, main='RMSE vs Weight Decay')
ridge$bestTune
## lambda
## 2 0.007142857
ridge_results <- ridge$results
ridge_results<-ridge_results[row.names(ridge_results) == 2, ]
plot(varImp(ridge), main='Ridge Variables by Importance')
plot(residuals(ridge) )
### Summary of linear models
We see similar variables in the top-ten importance lists for all three linear models.
p1<-plot(varImp(lm), top=10, main='LM')
p2<-plot(varImp(pls), top=10, main='PLS')
p3<-plot(varImp(ridge),top=10, main='Ridge')
gridExtra::grid.arrange(p1, p2,p3, ncol = 3)
knn <- train(x = x_train, y = y_train$MCB,
method = "knn",
tuneLength = 25,
trControl = ctrl)
knn
## k-Nearest Neighbors
##
## 135 samples
## 16 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 123, 122, 120, 120, 122, 121, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 5 0.4297728 0.8288522 0.3206046
## 7 0.4796549 0.8171145 0.3508061
## 9 0.5503164 0.7696747 0.3956877
## 11 0.5615214 0.7850441 0.3991806
## 13 0.5908162 0.7443228 0.4174694
## 15 0.6069796 0.7311551 0.4267778
## 17 0.6281199 0.6912893 0.4458129
## 19 0.6413438 0.6910558 0.4534921
## 21 0.6511627 0.6744651 0.4625602
## 23 0.6661625 0.6608598 0.4707100
## 25 0.6841502 0.6495093 0.4878065
## 27 0.6907142 0.6397021 0.4956525
## 29 0.7053210 0.6202751 0.5044997
## 31 0.7166127 0.6030113 0.5134191
## 33 0.7247616 0.6016335 0.5200782
## 35 0.7377396 0.5758104 0.5295366
## 37 0.7446191 0.5641693 0.5324217
## 39 0.7552855 0.5433497 0.5360298
## 41 0.7524054 0.5507129 0.5348462
## 43 0.7611104 0.5344460 0.5431021
## 45 0.7690780 0.5245354 0.5486000
## 47 0.7786472 0.5047102 0.5557803
## 49 0.7847242 0.4910153 0.5605712
## 51 0.7909341 0.4759253 0.5651026
## 53 0.8007937 0.4577813 0.5734161
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 5.
knnPred <- predict(knn, newdata = x_test)
knn_results <- postResample(pred = knnPred, obs = y_test$MCB)
knn_results
## RMSE Rsquared MAE
## 0.4229597 0.8157786 0.3166943
plot(varImp(knn), main='kNN Variables by Importance')
plot(residuals(knn), main='Residuals kNN Model')
nnetGrid <- expand.grid(.decay = c(0, .01, 1),
                        .size = c(1:10),
                        .bag = FALSE)
set.seed(100)
#MaxNWts caps the weight count at 5*(ncol+1)+5+1 = 91, so grid sizes above 5
#exceed the cap and produce the NaN rows in the results below
nnet <- train(x = x_train,
              y = y_train$MCB,
              method = "avNNet",
              tuneGrid = nnetGrid,
              trControl = ctrl,
              linout = TRUE, trace = FALSE, #linear output units for regression
              MaxNWts = 5 * (ncol(x_train) + 1) + 5 + 1,
              maxit = 250)
nnet
## Model Averaged Neural Network
##
## 135 samples
## 16 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 122, 121, 121, 123, 121, 121, ...
## Resampling results across tuning parameters:
##
## decay size RMSE Rsquared MAE
## 0.00 1 0.9798420 NaN 0.7773056
## 0.00 2 0.9798416 0.6459013 0.7773053
## 0.00 3 0.9771588 0.4347334 0.7763439
## 0.00 4 0.9754947 0.3975196 0.7741696
## 0.00 5 0.9780130 0.1328233 0.7744443
## 0.00 6 NaN NaN NaN
## 0.00 7 NaN NaN NaN
## 0.00 8 NaN NaN NaN
## 0.00 9 NaN NaN NaN
## 0.00 10 NaN NaN NaN
## 0.01 1 0.7651605 0.6796539 0.6407079
## 0.01 2 0.7487507 0.7351500 0.6195027
## 0.01 3 0.7504572 0.7473636 0.6207154
## 0.01 4 0.7547597 0.7308998 0.6270593
## 0.01 5 0.7553922 0.7208053 0.6281627
## 0.01 6 NaN NaN NaN
## 0.01 7 NaN NaN NaN
## 0.01 8 NaN NaN NaN
## 0.01 9 NaN NaN NaN
## 0.01 10 NaN NaN NaN
## 1.00 1 0.8767042 0.6455948 0.7395006
## 1.00 2 0.8520395 0.6696216 0.7176598
## 1.00 3 0.8444341 0.6703542 0.7104081
## 1.00 4 0.8350163 0.6786887 0.7025693
## 1.00 5 0.8321711 0.6804939 0.6998788
## 1.00 6 NaN NaN NaN
## 1.00 7 NaN NaN NaN
## 1.00 8 NaN NaN NaN
## 1.00 9 NaN NaN NaN
## 1.00 10 NaN NaN NaN
##
## Tuning parameter 'bag' was held constant at a value of FALSE
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 2, decay = 0.01 and bag = FALSE.
summary(nnet)
## Length Class Mode
## model 5 -none- list
## repeats 1 -none- numeric
## bag 1 -none- logical
## seeds 5 -none- numeric
## names 16 -none- character
## terms 3 terms call
## coefnames 16 -none- character
## xlevels 0 -none- list
## xNames 16 -none- character
## problemType 1 -none- character
## tuneValue 3 data.frame list
## obsLevels 1 -none- logical
## param 4 -none- list
nnet$bestTune
## size decay bag
## 12 2 0.01 FALSE
nnetPred <- predict(nnet, newdata=x_test)
NNET <- postResample(pred = nnetPred, obs = y_test$MCB)
NNET
## RMSE Rsquared MAE
## 0.7089727 0.7360134 0.5978394
plotmo(nnet)
## plotmo grid: TB5 BLR CPI CAC DCR ELC
## -0.1084383 0.04074487 -0.2307414 -0.02716393 -0.3587624 -0.09066284
## EXC FXR IPR M2 M1 PE STR
## -0.3752772 0.04448927 0.05049619 0.04518537 0.1069864 -0.2425196 -0.2105864
## EXG IMG LOAG
## -0.14393 -0.1250235 -0.1866336
plot(varImp(nnet), main= 'Nnet Variables by Importance')
set.seed(200)
marsGrid <- expand.grid(.degree = 1:2, .nprune = 2:38)
mars<- train(x = x_train, y = y_train$MCB,
method = "earth",
tuneGrid = marsGrid,
trControl = ctrl)
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.
mars
## Multivariate Adaptive Regression Spline
##
## 135 samples
## 16 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 120, 122, 121, 122, 123, 122, ...
## Resampling results across tuning parameters:
##
## degree nprune RMSE Rsquared MAE
## 1 2 0.7828631 0.3961661 0.6214264
## 1 3 0.6558746 0.6139275 0.5160067
## 1 4 0.5669884 0.6987255 0.4632596
## 1 5 0.5431638 0.7175091 0.4434431
## 1 6 0.4782926 0.7698784 0.4093307
## 1 7 0.4324762 0.8051950 0.3574345
## 1 8 0.4131732 0.8307522 0.3423554
## 1 9 0.4106595 0.8332552 0.3433455
## 1 10 0.4204844 0.8188496 0.3545389
## 1 11 0.4091654 0.8323371 0.3460230
## 1 12 0.4101774 0.8328498 0.3465441
## 1 13 0.4448865 0.7988036 0.3608075
## 1 14 0.4548243 0.7910200 0.3730259
## 1 15 0.4545797 0.7854380 0.3660873
## 1 16 0.4648106 0.7759917 0.3709382
## 1 17 0.4564673 0.7811348 0.3646842
## 1 18 0.4555716 0.7807619 0.3699457
## 1 19 0.4600576 0.7784273 0.3733455
## 1 20 0.4600576 0.7784273 0.3733455
## 1 21 0.4582866 0.7796145 0.3709677
## 1 22 0.4591089 0.7792369 0.3723104
## 1 23 0.4591089 0.7792369 0.3723104
## 1 24 0.4591089 0.7792369 0.3723104
## 1 25 0.4591089 0.7792369 0.3723104
## 1 26 0.4591089 0.7792369 0.3723104
## 1 27 0.4591089 0.7792369 0.3723104
## 1 28 0.4591089 0.7792369 0.3723104
## 1 29 0.4591089 0.7792369 0.3723104
## 1 30 0.4591089 0.7792369 0.3723104
## 1 31 0.4591089 0.7792369 0.3723104
## 1 32 0.4591089 0.7792369 0.3723104
## 1 33 0.4591089 0.7792369 0.3723104
## 1 34 0.4591089 0.7792369 0.3723104
## 1 35 0.4591089 0.7792369 0.3723104
## 1 36 0.4591089 0.7792369 0.3723104
## 1 37 0.4591089 0.7792369 0.3723104
## 1 38 0.4591089 0.7792369 0.3723104
## 2 2 0.7828631 0.3961661 0.6214264
## 2 3 0.6684515 0.6057972 0.5332141
## 2 4 0.5784430 0.6851370 0.4675201
## 2 5 0.5440394 0.7223495 0.4414993
## 2 6 0.4791392 0.7849091 0.3705263
## 2 7 0.4019894 0.8550400 0.3183873
## 2 8 0.3627111 0.8808382 0.2860863
## 2 9 0.3397024 0.8926157 0.2755388
## 2 10 0.3427765 0.8901162 0.2748148
## 2 11 0.3476235 0.8832383 0.2735279
## 2 12 0.3539016 0.8789854 0.2799788
## 2 13 0.3549362 0.8858539 0.2765993
## 2 14 0.3510347 0.8864693 0.2731189
## 2 15 0.3486083 0.8855971 0.2751290
## 2 16 0.3472564 0.8913351 0.2743585
## 2 17 0.3475296 0.8913129 0.2770975
## 2 18 0.3412851 0.8938953 0.2732802
## 2 19 0.4329626 0.8435168 0.3086288
## 2 20 0.4329626 0.8435168 0.3086288
## 2 21 0.4329626 0.8435168 0.3086288
## 2 22 0.4329626 0.8435168 0.3086288
## 2 23 0.4329626 0.8435168 0.3086288
## 2 24 0.4265904 0.8439140 0.3035353
## 2 25 0.4265904 0.8439140 0.3035353
## 2 26 0.4265904 0.8439140 0.3035353
## 2 27 0.4265904 0.8439140 0.3035353
## 2 28 0.4265904 0.8439140 0.3035353
## 2 29 0.4265904 0.8439140 0.3035353
## 2 30 0.4265904 0.8439140 0.3035353
## 2 31 0.4265904 0.8439140 0.3035353
## 2 32 0.4265904 0.8439140 0.3035353
## 2 33 0.4265904 0.8439140 0.3035353
## 2 34 0.4265904 0.8439140 0.3035353
## 2 35 0.4265904 0.8439140 0.3035353
## 2 36 0.4265904 0.8439140 0.3035353
## 2 37 0.4265904 0.8439140 0.3035353
## 2 38 0.4265904 0.8439140 0.3035353
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 9 and degree = 2.
marsPred <- predict(mars, newdata=x_test)
MARS <- postResample(pred = marsPred, obs = y_test$MCB)
MARS
## RMSE Rsquared MAE
## 0.4478498 0.7982848 0.3175453
plotmo(mars)
## plotmo grid: TB5 BLR CPI CAC DCR ELC
## -0.1084383 0.04074487 -0.2307414 -0.02716393 -0.3587624 -0.09066284
## EXC FXR IPR M2 M1 PE STR
## -0.3752772 0.04448927 0.05049619 0.04518537 0.1069864 -0.2425196 -0.2105864
## EXG IMG LOAG
## -0.14393 -0.1250235 -0.1866336
plot(varImp(mars), main='MARS Variables by Importance')
plot(residuals(mars))
set.seed(100)
svm <- train(x = x_train, y = y_train$MCB,
method = "svmLinear",
tuneLength = 25,
trControl = trainControl(method = "cv"))
svm
## Support Vector Machines with Linear Kernel
##
## 135 samples
## 16 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 122, 121, 121, 123, 121, 121, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 0.5299002 0.7293758 0.4323107
##
## Tuning parameter 'C' was held constant at a value of 1
svmPred <- predict(svm, newdata=x_test)
SVM<- postResample(pred = svmPred, obs = y_test$MCB)
SVM
## RMSE Rsquared MAE
## 0.4470014 0.8183016 0.3579210
plotmo(svm)
## plotmo grid: TB5 BLR CPI CAC DCR ELC
## -0.1084383 0.04074487 -0.2307414 -0.02716393 -0.3587624 -0.09066284
## EXC FXR IPR M2 M1 PE STR
## -0.3752772 0.04448927 0.05049619 0.04518537 0.1069864 -0.2425196 -0.2105864
## EXG IMG LOAG
## -0.14393 -0.1250235 -0.1866336
plot(varImp(svm), main='SVM Variables by Importance')
p1<-plot(varImp(knn), top=10, main='kNN')
p2<-plot(varImp(nnet), top=10, main='Nnet')
p3<-plot(varImp(mars),top=10, main='MARS')
p4<-plot(varImp(svm), top=10, main='SVM')
gridExtra::grid.arrange(p1, p2,p3,p4, ncol = 4)
We see that the price-earnings ratio, import growth, and CPI show up as the most important variables in all the non-linear models.
The following table shows the RMSE, R-squared, and MAE for the different models. Note that the linear model, PLS, and ridge figures are cross-validation estimates from the training set, while the kNN, neural network, MARS, and SVM figures are computed on the held-out test set.
#read in results
lm_res<- c('Linear Model', lm$results$RMSE, lm$results$Rsquared, lm$results$MAE)
pls_res <- c('Partial Least Square', pls_results$RMSE, pls_results$Rsquared, pls_results$MAE)
ridge_res <- c('Ridge Regression', ridge_results$RMSE, ridge_results$Rsquared, ridge_results$MAE)
knn_res <- c('KNN', knn_results['RMSE'], knn_results['Rsquared'], knn_results['MAE'])
nn_res<-c('Neural Network', NNET[1], NNET[2], NNET[3])
mars_res <- c('Multivariate Adaptive Regression Spline', MARS[1], MARS[2], MARS[3])
svm_res <- c('Support Vector Machines - Linear', SVM[1], SVM[2], SVM[3])
#Combine and rename
results<-rbind(lm_res, pls_res, ridge_res, knn_res, nn_res, mars_res, svm_res)
colnames(results) <- c('Model', 'RMSE', 'Rsquared', 'MAE')
row.names(results) <- c(1:nrow(results))
results%>%kable(caption='Model performance: Bank sector') %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = T)
| Model | RMSE | Rsquared | MAE |
|---|---|---|---|
| Linear Model | 0.495999685837682 | 0.74583271563391 | 0.41093389688591 |
| Partial Least Square | 0.48518921044043 | 0.762643880226287 | 0.403236872362957 |
| Ridge Regression | 0.47385269413623 | 0.767248641116019 | 0.399191411013123 |
| KNN | 0.422959673355544 | 0.815778618831522 | 0.316694334051523 |
| Neural Network | 0.70897274410554 | 0.736013420545942 | 0.597839410610149 |
| Multivariate Adaptive Regression Spline | 0.447849756743571 | 0.798284808287012 | 0.317545329886213 |
| Support Vector Machines - Linear | 0.447001401474033 | 0.818301592804144 | 0.357921012972581 |
We can see that the Support Vector Machine did the best job of predicting bank stock returns, with the highest R-squared and the second-lowest RMSE. The rest of the models did not lag far behind and have acceptable R-squared and RMSE values. It may be concluded that macroeconomic predictors can be useful in predicting the stock returns of banking sector companies listed on the Dhaka Stock Exchange.
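As a usage illustration, a fitted model of this kind can be scored against a fresh set of macro observations; in the sketch below a row of the test set stands in for a hypothetical new month of YoY macro changes:
#Hypothetical scoring of one new observation (illustrative)
new_obs <- x_test[1, , drop = FALSE] #stand-in for a new month of YoY macro changes
predict(svm, newdata = new_obs)      #predicted YoY change in bank market cap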
We now repeat the same analysis for the stock returns of companies from the insurance sector. Again, we first split the data into training and test sets, using a 70/30 split.
set.seed(676)
#Splitting data into training and test datasets
ins<-cbind(mci, predictors)
splitt <- createDataPartition(ins$mci, p=0.7, list=FALSE)
x_train <- ins[splitt, ]
x_train<-x_train[,-1]
y_train <- as.data.frame(ins$mci[splitt])
x_test <- ins[-splitt, ]
x_test<-x_test[,-1]
y_test <- as.data.frame(ins$mci[-splitt])
#Correct column names
colnames(y_train) <- c("MCI")
colnames(y_test) <- c("MCI")
ctrl <- trainControl(method = "cv", number = 10)
lm1 <- train(x_train, y_train$MCI,
method = "lm",
trControl = ctrl)
summary(lm1)
lm1$results
## intercept RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 1 TRUE 0.4419947 0.7137254 0.3628797 0.1264806 0.1437704 0.1095867
plot(varImp(lm1), main='LM Variables by importance')
plot(residuals(lm1), main='Residuals LM' )
set.seed(100)
pls1 <- train(x_train, y_train$MCI,
method = "pls",
tuneLength = 25,
trControl = ctrl)
summary(pls1)
## Data: X dimension: 135 16
## Y dimension: 135 1
## Fit method: oscorespls
## Number of components considered: 5
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps
## X 21.83 41.83 52.16 61.66 68.30
## .outcome 47.80 56.88 69.32 74.62 75.85
plot(pls1, main='PLS Components versus RMSE')
pls1$bestTune
## ncomp
## 5 5
#Save lowest RMSE in results
pls1_results <- pls1$results %>%
filter(ncomp==5)
pls1_results
## ncomp RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 1 5 0.4571651 0.6828355 0.3640916 0.1350082 0.1558406 0.09594917
plot(varImp(pls1), main='PLS Variables by Importance')
plot(residuals(pls1) )
ridgeGrid <- data.frame(.lambda = seq(0, .1, length = 15))
ridge1 <- train(x_train, y_train$MCI,
method = "ridge",
tuneGrid = ridgeGrid,
trControl = ctrl)
summary(ridge1)
## Length Class Mode
## call 4 -none- call
## actions 17 -none- list
## allset 16 -none- numeric
## beta.pure 272 -none- numeric
## vn 16 -none- character
## mu 1 -none- numeric
## normx 16 -none- numeric
## meanx 16 -none- numeric
## lambda 1 -none- numeric
## L1norm 17 -none- numeric
## penalty 17 -none- numeric
## df 17 -none- numeric
## Cp 17 -none- numeric
## sigma2 1 -none- numeric
## xNames 16 -none- character
## problemType 1 -none- character
## tuneValue 1 data.frame list
## obsLevels 1 -none- logical
## param 0 -none- list
plot(ridge1, main='RMSE vs Weight Decay')
ridge1$bestTune
## lambda
## 6 0.03571429
ridge1_results <- ridge1$results
ridge1_results<-ridge1_results[row.names(ridge1_results) == 6, ]
ridge1_results
## lambda RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 6 0.03571429 0.455275 0.6685421 0.3564199 0.1057322 0.171994 0.07441358
plot(varImp(ridge1), main='Ridge Variables by Importance')
plot(residuals(ridge1) )
We see similar variables in the top-ten importance lists for all three linear models.
p1<-plot(varImp(lm1), top=10, main='LM')
p2<-plot(varImp(pls1), top=10, main='PLS')
p3<-plot(varImp(ridge1),top=10, main='Ridge')
gridExtra::grid.arrange(p1, p2,p3, ncol = 3)
We find that PE has been displaced by the change in the exchange rate and the loan growth rate as the top variables. The other important variables are the domestic credit growth rate, the money supply measure M2, the export growth rate, and the import growth rate.
knn1 <- train(x = x_train, y = y_train$MCI,
method = "knn",
tuneLength = 25,
trControl = ctrl)
knn1
## k-Nearest Neighbors
##
## 135 samples
## 16 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 123, 122, 120, 120, 122, 121, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 5 0.4089157 0.7898811 0.2922318
## 7 0.4219275 0.7861062 0.3015809
## 9 0.4458421 0.7410069 0.3206597
## 11 0.4539090 0.7391085 0.3246510
## 13 0.4722146 0.7113184 0.3357770
## 15 0.4788309 0.6997660 0.3450264
## 17 0.4864670 0.6834446 0.3513716
## 19 0.4932438 0.6607236 0.3611435
## 21 0.5042321 0.6513884 0.3672226
## 23 0.5252982 0.6240902 0.3813403
## 25 0.5401341 0.6017261 0.3924175
## 27 0.5509078 0.5839875 0.4013634
## 29 0.5578200 0.5825403 0.4032653
## 31 0.5703572 0.5611357 0.4147600
## 33 0.5792777 0.5468639 0.4222959
## 35 0.5883212 0.5286205 0.4313152
## 37 0.5945825 0.5244124 0.4350325
## 39 0.6003365 0.5175872 0.4371826
## 41 0.6059037 0.5141532 0.4427362
## 43 0.6112735 0.5020125 0.4474559
## 45 0.6154121 0.4955817 0.4503586
## 47 0.6229250 0.4857005 0.4554690
## 49 0.6268308 0.4823045 0.4582686
## 51 0.6330973 0.4725126 0.4641476
## 53 0.6387549 0.4628344 0.4685732
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 5.
knnPred1 <- predict(knn1, newdata = x_test)
knn_results1 <- postResample(pred = knnPred1, obs = y_test$MCI)
knn_results1
## RMSE Rsquared MAE
## 0.7188419 0.8105752 0.4376714
plot(varImp(knn1), main='kNN Variables by Importance')
plot(residuals(knn1), main='Residuals kNN Model')
nnetGrid <- expand.grid(.decay = c(0, .01, 1),
                        .size = c(1:10),
                        .bag = FALSE)
set.seed(110)
#As before, MaxNWts caps the weight count at 91, so sizes above 5 yield NaN rows
nnet1 <- train(x = x_train,
               y = y_train$MCI,
               method = "avNNet",
               tuneGrid = nnetGrid,
               trControl = ctrl,
               linout = TRUE, trace = FALSE, #linear output units for regression
               MaxNWts = 5 * (ncol(x_train) + 1) + 5 + 1,
               maxit = 250)
nnet1
## Model Averaged Neural Network
##
## 135 samples
## 16 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 121, 123, 120, 122, 122, 122, ...
## Resampling results across tuning parameters:
##
## decay size RMSE Rsquared MAE
## 0.00 1 0.8006616 NaN 0.6138065
## 0.00 2 0.8006616 0.5411679 0.6138065
## 0.00 3 0.8006616 0.7694807 0.6138065
## 0.00 4 0.8006616 0.3868924 0.6138065
## 0.00 5 0.8006616 NaN 0.6138065
## 0.00 6 NaN NaN NaN
## 0.00 7 NaN NaN NaN
## 0.00 8 NaN NaN NaN
## 0.00 9 NaN NaN NaN
## 0.00 10 NaN NaN NaN
## 0.01 1 0.6283319 0.6169347 0.5135357
## 0.01 2 0.6359674 0.6078016 0.5231397
## 0.01 3 0.6332928 0.6211338 0.5212375
## 0.01 4 0.6346771 0.6220899 0.5200557
## 0.01 5 0.6268023 0.6712441 0.5135239
## 0.01 6 NaN NaN NaN
## 0.01 7 NaN NaN NaN
## 0.01 8 NaN NaN NaN
## 0.01 9 NaN NaN NaN
## 0.01 10 NaN NaN NaN
## 1.00 1 0.7270736 0.6347139 0.6031713
## 1.00 2 0.7062747 0.6335868 0.5846712
## 1.00 3 0.6959433 0.6375719 0.5761733
## 1.00 4 0.6901206 0.6415519 0.5712686
## 1.00 5 0.6865532 0.6402069 0.5676773
## 1.00 6 NaN NaN NaN
## 1.00 7 NaN NaN NaN
## 1.00 8 NaN NaN NaN
## 1.00 9 NaN NaN NaN
## 1.00 10 NaN NaN NaN
##
## Tuning parameter 'bag' was held constant at a value of FALSE
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 5, decay = 0.01 and bag = FALSE.
summary(nnet1)
## Length Class Mode
## model 5 -none- list
## repeats 1 -none- numeric
## bag 1 -none- logical
## seeds 5 -none- numeric
## names 16 -none- character
## terms 3 terms call
## coefnames 16 -none- character
## xlevels 0 -none- list
## xNames 16 -none- character
## problemType 1 -none- character
## tuneValue 3 data.frame list
## obsLevels 1 -none- logical
## param 4 -none- list
nnet1$bestTune
## size decay bag
## 15 5 0.01 FALSE
nnetPred1 <- predict(nnet1, newdata=x_test)
NNET1 <- postResample(pred = nnetPred1, obs = y_test$MCI)
NNET1
## RMSE Rsquared MAE
## 1.0486480 0.6621918 0.6928478
plotmo(nnet1)
## plotmo grid: TB5 BLR CPI CAC DCR
## -0.1084383 -0.009221552 -0.4738836 -0.02544334 -0.3416235
## ELC EXC FXR IPR M2 M1
## 0.0005972519 -0.3511269 -0.09473711 -0.07587996 -0.1076045 -0.08482591
## PE STR EXG IMG LOAG
## -0.2797727 -0.2087861 -0.1724328 -0.1190359 -0.3120793
plot(varImp(nnet1), main= 'Nnet Variables by Importance')
set.seed(200)
marsGrid <- expand.grid(.degree = 1:2, .nprune = 2:38)
mars1<- train(x = x_train, y = y_train$MCI,
method = "earth",
tuneGrid = marsGrid,
trControl = ctrl)
mars1
## Multivariate Adaptive Regression Spline
##
## 135 samples
## 16 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 120, 122, 121, 122, 123, 122, ...
## Resampling results across tuning parameters:
##
## degree nprune RMSE Rsquared MAE
## 1 2 0.6253293 0.4634208 0.4688157
## 1 3 0.5184894 0.5975585 0.3909778
## 1 4 0.5040306 0.6290170 0.3800183
## 1 5 0.4457547 0.6934899 0.3489214
## 1 6 0.4184976 0.7187839 0.3279691
## 1 7 0.3978064 0.7287622 0.3122345
## 1 8 0.8475289 0.6794028 0.4456061
## 1 9 0.9474415 0.6759347 0.4696865
## 1 10 0.9363486 0.6901546 0.4656000
## 1 11 0.9651696 0.6921126 0.4738686
## 1 12 0.9826423 0.6830498 0.4857455
## 1 13 1.0025784 0.6858247 0.4852533
## 1 14 0.9961714 0.6834668 0.4844759
## 1 15 0.9482046 0.6712909 0.4749547
## 1 16 0.9916951 0.6744043 0.4906862
## 1 17 0.9951780 0.6623029 0.4890224
## 1 18 0.9897283 0.6680592 0.4853171
## 1 19 0.9865714 0.6717363 0.4858423
## 1 20 0.9933326 0.6669622 0.4857460
## 1 21 0.9930403 0.6679093 0.4855560
## 1 22 0.9930403 0.6679093 0.4855560
## 1 23 0.9930403 0.6679093 0.4855560
## 1 24 0.9930403 0.6679093 0.4855560
## 1 25 0.9930403 0.6679093 0.4855560
## 1 26 0.9930403 0.6679093 0.4855560
## 1 27 0.9930403 0.6679093 0.4855560
## 1 28 0.9930403 0.6679093 0.4855560
## 1 29 0.9930403 0.6679093 0.4855560
## 1 30 0.9930403 0.6679093 0.4855560
## 1 31 0.9930403 0.6679093 0.4855560
## 1 32 0.9930403 0.6679093 0.4855560
## 1 33 0.9930403 0.6679093 0.4855560
## 1 34 0.9930403 0.6679093 0.4855560
## 1 35 0.9930403 0.6679093 0.4855560
## 1 36 0.9930403 0.6679093 0.4855560
## 1 37 0.9930403 0.6679093 0.4855560
## 1 38 0.9930403 0.6679093 0.4855560
## 2 2 0.6372212 0.4362399 0.4754387
## 2 3 0.4658445 0.5765872 0.3612062
## 2 4 0.3894096 0.7007385 0.3047747
## 2 5 0.3592511 0.7382962 0.2929671
## 2 6 0.3496631 0.7542496 0.2843372
## 2 7 0.3372170 0.7718030 0.2736506
## 2 8 0.3409674 0.7767622 0.2731574
## 2 9 0.3519288 0.7797334 0.2786078
## 2 10 0.3612549 0.7745890 0.2798221
## 2 11 0.3562219 0.7805599 0.2743897
## 2 12 0.3562991 0.7856655 0.2712994
## 2 13 0.3541912 0.7952552 0.2689984
## 2 14 0.3611164 0.7911902 0.2764053
## 2 15 0.3582301 0.7902572 0.2707426
## 2 16 0.3840778 0.7483496 0.2911929
## 2 17 0.3823398 0.7532270 0.2926700
## 2 18 0.3894509 0.7460225 0.2959487
## 2 19 0.3970489 0.7381758 0.3013780
## 2 20 0.3987436 0.7415567 0.3038427
## 2 21 0.3987868 0.7417502 0.3042704
## 2 22 0.3987868 0.7417502 0.3042704
## 2 23 0.4028302 0.7365767 0.3069198
## 2 24 0.4028302 0.7365767 0.3069198
## 2 25 0.4028302 0.7365767 0.3069198
## 2 26 0.4028302 0.7365767 0.3069198
## 2 27 0.4028302 0.7365767 0.3069198
## 2 28 0.4028302 0.7365767 0.3069198
## 2 29 0.4028302 0.7365767 0.3069198
## 2 30 0.4028302 0.7365767 0.3069198
## 2 31 0.4028302 0.7365767 0.3069198
## 2 32 0.4028302 0.7365767 0.3069198
## 2 33 0.4028302 0.7365767 0.3069198
## 2 34 0.4028302 0.7365767 0.3069198
## 2 35 0.4028302 0.7365767 0.3069198
## 2 36 0.4028302 0.7365767 0.3069198
## 2 37 0.4028302 0.7365767 0.3069198
## 2 38 0.4028302 0.7365767 0.3069198
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 7 and degree = 2.
marsPred1 <- predict(mars1, newdata=x_test)
MARS1 <- postResample(pred = marsPred1, obs = y_test$MCI)
MARS1
## RMSE Rsquared MAE
## 1.1980882 0.2436580 0.7450327
plotmo(mars1)
## plotmo grid: TB5 BLR CPI CAC DCR
## -0.1084383 -0.009221552 -0.4738836 -0.02544334 -0.3416235
## ELC EXC FXR IPR M2 M1
## 0.0005972519 -0.3511269 -0.09473711 -0.07587996 -0.1076045 -0.08482591
## PE STR EXG IMG LOAG
## -0.2797727 -0.2087861 -0.1724328 -0.1190359 -0.3120793
plot(varImp(mars1), main='MARS Variables by Importance')
plot(residuals(mars1))
set.seed(120)
svm1 <- train(x = x_train, y = y_train$MCI,
method = "svmLinear",
tuneLength = 25,
trControl = trainControl(method = "cv"))
svm1
## Support Vector Machines with Linear Kernel
##
## 135 samples
## 16 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 122, 120, 122, 122, 122, 121, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 0.5216752 0.6033346 0.4004447
##
## Tuning parameter 'C' was held constant at a value of 1
svmPred1 <- predict(svm1, newdata=x_test)
SVM1<- postResample(pred = svmPred1, obs = y_test$MCI)
SVM1
## RMSE Rsquared MAE
## 1.1930641 0.2204846 0.7591112
plotmo(svm1)
## plotmo grid: TB5 BLR CPI CAC DCR
## -0.1084383 -0.009221552 -0.4738836 -0.02544334 -0.3416235
## ELC EXC FXR IPR M2 M1
## 0.0005972519 -0.3511269 -0.09473711 -0.07587996 -0.1076045 -0.08482591
## PE STR EXG IMG LOAG
## -0.2797727 -0.2087861 -0.1724328 -0.1190359 -0.3120793
plot(varImp(svm1), main='SVM Variables by Importance')
p1<-plot(varImp(knn1), top=10, main='kNN')
p2<-plot(varImp(nnet1), top=10, main='Nnet')
p3<-plot(varImp(mars1),top=10, main='MARS')
p4<-plot(varImp(svm1), top=10, main='SVM')
gridExtra::grid.arrange(p1, p2,p3,p4, ncol = 4)
We find similar variables in the top-ten lists for the non-linear models as well.
The following table shows the RMSE, R-squared, and MAE for the different models; as before, the linear-model figures are cross-validation estimates and the non-linear figures are test-set results.
#read in results
lm_res1<- c('Linear Model', lm1$results$RMSE, lm1$results$Rsquared, lm1$results$MAE)
pls_res1 <- c('Partial Least Square', pls1_results$RMSE, pls1_results$Rsquared, pls1_results$MAE)
ridge_res1 <- c('Ridge Regression', ridge1_results$RMSE, ridge1_results$Rsquared, ridge1_results$MAE)
knn_res1 <- c('KNN', knn_results1['RMSE'], knn_results1['Rsquared'], knn_results1['MAE'])
nn_res1<-c('Neural Network', NNET1[1], NNET1[2], NNET1[3])
mars_res1 <- c('Multivariate Adaptive Regression Spline', MARS1[1], MARS1[2], MARS1[3])
svm_res1 <- c('Support Vector Machines - Linear', SVM1[1], SVM1[2], SVM1[3])
#Combine and rename
results1<-rbind(lm_res1, pls_res1, ridge_res1, knn_res1, nn_res1, mars_res1, svm_res1)
colnames(results1) <- c('Model', 'RMSE', 'Rsquared', 'MAE')
row.names(results1) <- c(1:nrow(results1))
results1%>%kable(caption='Model performance: Insurance sector') %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = T)
| Model | RMSE | Rsquared | MAE |
|---|---|---|---|
| Linear Model | 0.441994714504095 | 0.713725381606877 | 0.362879692679264 |
| Partial Least Square | 0.457165079634718 | 0.68283545519077 | 0.364091568491284 |
| Ridge Regression | 0.455275044250047 | 0.668542109144494 | 0.356419860940034 |
| KNN | 0.718841873641116 | 0.810575215264818 | 0.437671374202208 |
| Neural Network | 1.04864800557038 | 0.66219183514197 | 0.692847769311332 |
| Multivariate Adaptive Regression Spline | 1.19808822090789 | 0.243657960011675 | 0.745032667431987 |
| Support Vector Machines - Linear | 1.1930640845381 | 0.220484580291216 | 0.759111175592163 |
kNN seems to do the best job, with the highest R-squared, although its RMSE is almost double that of the linear models.
We will now re-fit the best models using selected variables from the top ten, to see whether we can simplify by restricting the number of predictors.
Bank Model: Support Vector Machines
We will run the SVM model with the top seven variables; by trial and error, we found that this gives an R-squared very close to that obtained using all the variables.
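The variable names below were chosen by inspecting the importance plots; the same top-n names can also be pulled programmatically from varImp (a sketch, assuming the importance table exposes an Overall column, as it does for the caret models used here):
#Extract the top seven bank-model predictors by importance (illustrative)
imp <- varImp(svm)$importance
top7 <- rownames(imp)[order(-imp$Overall)][1:7]
top7
The same pattern with varImp(knn1) and 1:9 would serve the insurance model below.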
x_train_bank<-x_train[c('PE', 'IMG', 'CPI', 'LOAG', 'EXC', 'M2', 'DCR')]
x_test_bank<-x_test[c('PE', 'IMG', 'CPI', 'LOAG', 'EXC', 'M2', 'DCR')]
y_train <- as.data.frame(bank$mcb[splitt])
y_test <- as.data.frame(bank$mcb[-splitt])
#Correct column names
colnames(y_train) <- c("MCB")
colnames(y_test) <- c("MCB")
set.seed(200)
svm1 <- train(x = x_train_bank, y = y_train$MCB,
method = "svmLinear",
tuneLength = 25,
trControl = trainControl(method = "cv"))
svm1
## Support Vector Machines with Linear Kernel
##
## 135 samples
## 7 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 120, 122, 121, 122, 123, 122, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 0.4578548 0.8120911 0.3589988
##
## Tuning parameter 'C' was held constant at a value of 1
svmPred1 <- predict(svm1, newdata=x_test_bank)
SVM1<- postResample(pred = svmPred1, obs = y_test$MCB)
SVM1
## RMSE Rsquared MAE
## 0.4741013 0.7838049 0.3746547
We can see that using only the top seven variables gives an R-squared of 78%, very close to the 81% obtained using all the variables. The RMSE has increased slightly, from 0.45 to 0.47.
Insurance Model: kNN
We will choose the top nine variables from the kNN model for insurance; by trial and error, we find that this gives roughly the same R-squared as using all the variables.
x_train_ins<-x_train[c('LOAG', 'PE', 'M2', 'DCR', 'IMG', 'TB5', 'CPI', 'M1', 'EXC')]
x_test_ins<-x_test[c('LOAG','PE', 'M2', 'DCR', 'IMG', 'TB5', 'CPI', 'M1', 'EXC')]
y_train <- as.data.frame(ins$mci[splitt])
y_test <- as.data.frame(ins$mci[-splitt])
#Correct column names
colnames(y_train) <- c("MCI")
colnames(y_test) <- c("MCI")
knn2 <- train(x = x_train_ins, y = y_train$MCI,
method = "knn",
tuneLength = 25,
trControl = ctrl)
knn2
## k-Nearest Neighbors
##
## 135 samples
## 9 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 122, 122, 120, 122, 123, 121, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 5 0.3838326 0.8108170 0.2707159
## 7 0.4190181 0.7710194 0.2861551
## 9 0.4354676 0.7792612 0.3032874
## 11 0.4452021 0.7598363 0.3132272
## 13 0.4486929 0.7709435 0.3122924
## 15 0.4598753 0.7575727 0.3225840
## 17 0.4756409 0.7311253 0.3337932
## 19 0.4844690 0.7271731 0.3391267
## 21 0.5000145 0.7053357 0.3502120
## 23 0.5194768 0.6764689 0.3661537
## 25 0.5300205 0.6544241 0.3757139
## 27 0.5388365 0.6414956 0.3813517
## 29 0.5518070 0.6197262 0.3926299
## 31 0.5614596 0.6042875 0.3989024
## 33 0.5699084 0.5948181 0.4075900
## 35 0.5772494 0.5840137 0.4129995
## 37 0.5849488 0.5746314 0.4196079
## 39 0.5922449 0.5697349 0.4240797
## 41 0.6000519 0.5601955 0.4292254
## 43 0.6038245 0.5582317 0.4328874
## 45 0.6092438 0.5543898 0.4350027
## 47 0.6189312 0.5344113 0.4420499
## 49 0.6257952 0.5283148 0.4462889
## 51 0.6308411 0.5223434 0.4481774
## 53 0.6369616 0.5152562 0.4522889
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 5.
knnPred2 <- predict(knn2, newdata = x_test_ins)
knn_results2 <- postResample(pred = knnPred2, obs = y_test$MCI)
knn_results2
## RMSE Rsquared MAE
## 0.6963655 0.8143327 0.3977580
The RMSE has actually decreased in this case from 0.72 to 0.70.
We find that using seven of the 16 variables for the banking sector and nine of the 16 for the insurance sector gives approximately an 80% R-squared with low RMSE in predicting the returns of those sectors. We conclude that macroeconomic variables can help predict the stock returns of both the banking and insurance sectors on the Dhaka Stock Exchange.
Further research could use different lags (here a 12-month lag was used to calculate returns) to test whether results improve and whether similar prediction levels can be reached with fewer variables.
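For example, with the illustrative yoy helper sketched earlier, a different return horizon is just a different lag argument:
#Alternative horizons (illustrative; x stands in for a raw monthly level series)
yoy(x, lag = 3) #quarterly change
yoy(x, lag = 6) #semi-annual change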
[1] Garcia, Valeriano F. and Liu, Lin (1999) [Macroeconomic Determinants of Stock Market Development](https://www.tandfonline.com/doi/abs/10.1080/15140326.1999.12040532)
[2] Nijam, H. M., Ismail, S. M. M. and Musthafa, A. M. M. (2015) [The Impact of Macro-Economic Variables on Stock Performance; Evidence from Sri Lanka](https://journals.co.za/doi/abs/10.10520/EJC-100b084600)
[3] Srivastava, A. (2010) *Relevance of Macro Economic Factors for the Indian Stock Market*
[4] Haque, Abdul and Sarwar, Suleman (2012) *Macro-Determinants of Stock Returns in Pakistan*
[5] Nurazi, Ridwan and Usman, Berto (2016) [Bank Stock Returns in Responding the Contribution of Fundamental and Macroeconomic Effects](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2822943)
[6] Okech, Timothy Chrispinus and Mugambi, Mike (2016) [Effect of Macroeconomic Variables on Stock Returns of Listed Commercial Banks in Kenya](http://erepo.usiu.ac.ke/handle/11732/3014)
[7] Bhattarai, Bishnu Prasad (2018) [The Firm Specific and Macroeconomic Variables Effects on Share Prices of Nepalese Commercial Banks and Insurance Companies](http://buscompress.com/uploads/3/4/9/8/34980536/riber_7-s3_01k18-152_1-11.pdf)
[8] Ali, Md. Bayezid (2011) [Impact of Micro and Macroeconomic Variables on Emerging Stock Market Return: A Case on Dhaka Stock Exchange (DSE)](https://d1wqtxts1xzle7.cloudfront.net/30907270/idjrbjournal00025.pdf)
[9] Mohiuddin, Md., Alam, Md. Didarul and Shahid, Abdullah Ibneyy (2008) [An Empirical Study of the Relationship between Macroeconomic Variables and Stock Price: A Study on Dhaka Stock Exchange](https://core.ac.uk/download/pdf/6505614.pdf)
[10] Hasan, Mostofa Mahmud (2010) [Sector-wise Stock Return Analysis: An Evidence from Dhaka Stock Exchange in Bangladesh](https://d1wqtxts1xzle7.cloudfront.net/46085986/10850-32538-1-PB.pdf)