Can macroeconomic factors predict stock market returns? A case of listed financial stocks on the Dhaka Stock Exchange
Both asset managers and academics have sought to identify the factors most highly correlated with stock market growth. The business value of such research is immense, informing decisions from asset allocation to tilting portfolios toward particular sectors. Although academic research is publicly available, most of the research done by asset managers is proprietary, part of the 'black box' that drives their investment management decisions. This makes sense: managers with high alpha have little incentive to share their insights, since secrecy helps them keep delivering a satisfactory alpha to their investors. Moreover, although papers using macroeconomic factors to predict overall stock market returns are common, very few academics have attempted to predict stock returns on a sectoral basis. Even less common are empirical studies of emerging markets, partly because of the limited availability of data and partly because these markets are too small for the larger asset managers to enter. This paper investigates whether macroeconomic variables can help predict the stock returns of listed financial stocks on the Dhaka Stock Exchange. The financial stocks are divided into two categories, banks and insurance companies. Several models, including linear regression, are used to test the hypothesis that bank and insurance stock returns can be predicted using changes in macroeconomic variables.
A myriad of research has been conducted on the factors affecting stock market returns around the world. The topic is of special interest to researchers because of the potential application of a valid model in improving the returns of funds under management. More research is available on the effect of macroeconomic factors on stock returns in developed markets, perhaps owing to greater data availability. However, since this research focuses on Bangladesh, an emerging market economy, we concentrate on the literature covering other emerging and frontier markets. The literature review for this paper can be divided into two segments: the first dealing with the effect of macroeconomic variables on overall stock market returns as measured by market indices, and the second dealing with the effect on banking or insurance stock returns specifically. As expected, there has been very limited research on sectoral returns, yet there is demand for such research to support portfolio allocations that improve the risk/return trade-off while keeping the portfolio well diversified. One of the most notable studies of the relationship between macroeconomic variables and stock returns was conducted by Garcia and Liu [1]. It uses data for 15 countries over the period 1980 to 1995: seven countries in Latin America, six in East Asia, and two advanced economies, the US and Japan. The paper explores the effects of real income, the saving rate, financial intermediary development, stock market liquidity, and macroeconomic stability on stock market capitalization, and concludes from regression analysis that real income, the savings rate, financial intermediary development, and stock market liquidity are important predictors of market capitalization. Several other tests have been carried out on markets in the subcontinent. Nijam, Ismail and Musthafa [2] investigated the relationship between the returns of the All Share Price Index of the Colombo Stock Exchange and five macroeconomic variables: GDP, inflation, the interest rate, the balance of payments, and the exchange rate. Ordinary Least Squares was used with linear, linear-log, log-log and log-linear transformations. The study finds GDP, the exchange rate, and the interest rate to be significantly positively related to returns, whereas inflation is negatively related; the balance of payments was found to be insignificant. Srivastava [3] used six factors to conduct a similar test on the Indian market: the industrial production index for all commodities, the wholesale price index, interest rates, and foreign exchange rates, using monthly time series for the Sensex from April 1996 to January 2009. Applying the Johansen multivariate cointegration test, the paper concludes that the industrial production index, the wholesale price index, and interest rates significantly affect pricing in the Indian stock market. In another study, Haque and Sarwar [4] explore the relationship between macroeconomic variables and individual stock returns for 394 listed equities on the Karachi Stock Exchange. Using panel data specification tests, they conclude that volatility and GDP have significant positive associations with returns, while inflation, the interest rate, the budget deficit, and the money supply have significant negative associations. One interesting finding of this study is that different sectors react differently to the same macroeconomic variable.
Regarding sectoral research on the financial sector specifically, we have come across three papers that test the relationship between macroeconomic factors and banking and insurance stock returns. Nurazi and Usman [5] examined whether several microeconomic factors (the CAMELS ratios) and several macroeconomic factors (the interest rate, exchange rate, and inflation rate) are related to the stock returns of 16 listed Indonesian banks. Using a Pooled Least Squares model, the study found that all three macroeconomic variables contributed negatively and significantly to the stock returns of the sixteen banks over the period 2002 to 2011. Another study, by Mugambi and Okech [6], uses quarterly time series data from 2000 to 2015 to explore the relationship between stock returns and macroeconomic variables for banks listed on the Nairobi Stock Exchange. The variables used are the exchange rate, inflation rate, interest rate, and GDP. Fitting a linear regression model by Ordinary Least Squares, the paper concludes that all variables except GDP have a significant effect at the 5% level of significance. Bhattarai [7] uses data for seven banks and six insurance companies from 2009 to 2014 and concludes from regression analysis that the money supply, GDP growth, the exchange rate, and interest rates all have significant associations with price movements in all 13 financial stocks. As for research done specifically on the Dhaka Stock Exchange, the focus of this study, most authors have worked with overall market returns. Ali [8] used a multivariate regression model and found that inflation and foreign remittances have a negative influence and the industrial production index and market P/Es a positive one; however, a lack of Granger causality may point to an informationally inefficient market. A similar paper by Mohiuddin, Alam and Shahid [9] used quarterly data and found no significant relationship between macroeconomic variables and market returns. Hasan [10] applies Ordinary Least Squares and Weighted Least Squares to monthly returns for 126 stocks to test the association between macroeconomic variables and sector returns, concluding that only the banking sector shows a significant association.
We intend to use monthly macroeconomic data for Bangladesh and monthly market cap data for banks and insurance companies listed on the Dhaka Stock Exchange.
The research questions are:
Are there macroeconomic variables that can help predict the stock returns of the banks listed on the Dhaka Stock Exchange?
Are there macroeconomic variables that can help predict the stock returns of the insurance companies listed on the Dhaka Stock Exchange?
This study takes a different approach from earlier work. First, we use monthly data for the variables and expand beyond the five macroeconomic factors used by most of the papers above. Second, a YoY (year-on-year) lag transformation is applied to remove yearly seasonality from the data. Most of the papers above limit themselves to Ordinary Least Squares and tests of causality; none has tested whether some of these variables can predict sectoral returns within acceptable error bounds. This study is a first in the sense that we use linear regression models including Partial Least Squares and Principal Component Regression, and, where possible, penalized models such as ridge, lasso, and elastic net, to explore which method best predicts sectoral returns on the Dhaka Stock Exchange. If the data permit, non-linear models such as kNN, SVM, MARS, and neural networks will also be tested. The same process is followed separately for banks and insurance companies. Models are evaluated using metrics such as adjusted R-squared, MAPE, and RMSE. It will be interesting to see whether the associations between macroeconomic variables and stock returns observed in other emerging markets hold for the banking and insurance stocks on the Dhaka Stock Exchange. If such associations exist, they will be of immense interest to fund managers deciding on sectoral tilts.
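Since RMSE and MAPE drive the model comparisons later, a minimal sketch of both is given below (our own helper definitions, not package functions; caret's postResample, used later, reports RMSE, R-squared, and MAE):
#Illustrative error-metric helpers (assumed definitions, not from any package)
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
mape <- function(obs, pred) 100 * mean(abs((obs - pred) / obs)) #undefined when any obs == 0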
The data has been specifically collated by us for the purpose of this study. Most of the variables are available on the CEIC website (https://www.ceicdata.com/en/bangladesh). There were many missing values in the interest rate data, and those were picked up from the Bangladesh Bank website (https://bb.org.bd/pub/archive.php?i=6). The process was fully manual, as we had to go through several archives to get the required data. Most variables were already reported on a YoY basis; the remaining raw variables were converted to YoY changes to keep the dataset consistent. A description of the variables follows.

Dependent variables (YoY percentage change):

- MCB: Market capitalization of banks
- MCI: Market capitalization of insurance companies

Independent variables (YoY percentage change):

- TB5: T-bill rate, 5 years, a proxy for long-term interest rates
- TB10: T-bill rate, 10 years, a proxy for long-term interest rates
- BLR: Average bank lending rate
- CPI: Inflation rate
- CAC: Change in the current account balance, a proxy for international trade
- DCR: Change in the domestic credit level
- ELC: Change in electricity availability (an important factor supporting industrialization in developing countries)
- EXC: Change in the Taka/dollar exchange rate
- FXR: Change in foreign exchange reserves
- IPR: Change in the industrial production rate
- M2: Change in broad money
- M1: Change in narrow money
- PE: Change in the overall price-to-earnings ratio
- STR: Short-term interest rate (prime rate)
- DPG: Change in the total deposit level, a proxy for the savings rate
- EXG: Change in the export level
- IMG: Change in the import level
- LOAG: Change in total loans taken, a proxy for the investment level

The dataset contains monthly data from December 2004 to October 2020. This is almost 16 years of data and covers one boom and one bust on the DSE.
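As a reference for the YoY conversion described above, the sketch below turns a monthly level series into year-on-year changes (illustrative; x stands in for any raw level series):
#Year-on-year change using a 12-month lag (illustrative)
yoy <- function(x, lag = 12) (x - dplyr::lag(x, lag)) / dplyr::lag(x, lag)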
First, we will read in the compiled data file (in CSV format). Due to a lack of sufficient data, remittance has been dropped from the dataset.
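The analysis that follows relies on several packages. For reproducibility, the loads below cover the functions used in this report (psych and gridExtra are also called with :: later, so loading them is optional):
#Packages used throughout the analysis
library(dplyr)      #pipes and data manipulation
library(knitr)      #kable tables
library(kableExtra) #table styling
library(psych)      #describe() summary statistics
library(visdat)     #vis_miss() missing-data plot
library(reshape2)   #melt() for long-format plotting
library(ggplot2)    #density plots and boxplots
library(caret)      #preprocessing, data splitting, and model training
library(ggcorrplot) #correlation heatmap
library(plotmo)     #model response plots
library(gridExtra)  #arranging multiple plots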
#Read in the data
data<-read.csv("https://raw.githubusercontent.com/zahirf/Data698/main/macrodata.csv", stringsAsFactors = F, sep=",")
data1<-data[,-1] #drop the month column
There are 193 observations of 21 variables. MCB and MCI, the YoY changes in the market capitalizations of the bank and insurance sectors, are the two dependent variables; apart from the month column, the rest are predictors. Let us take a look at the summary statistics.
temp<-psych::describe(data1)
temp%>%
kable(caption='Summary statistics Numerical') %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = T)
| variable | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MCB | 1 | 191 | 0.2427514 | 0.5201478 | 0.0567252 | 0.1597277 | 0.3598092 | -0.3971493 | 2.393861 | 2.791010 | 1.6132664 | 2.7176559 | 0.0376366 |
| MCI | 2 | 191 | 0.3105277 | 0.6447370 | 0.0993257 | 0.1960082 | 0.2982239 | -0.4219561 | 3.656711 | 4.078667 | 2.3683365 | 7.1059110 | 0.0466515 |
| TB5 | 3 | 191 | 0.0212408 | 0.2419886 | -0.0020000 | 0.0043072 | 0.1734642 | -0.5110000 | 1.178000 | 1.689000 | 1.2240442 | 3.5774127 | 0.0175097 |
| TB10 | 4 | 191 | 0.0011518 | 0.1631245 | 0.0080000 | 0.0055556 | 0.1200906 | -0.4520000 | 0.351000 | 0.803000 | -0.2249347 | 0.0402007 | 0.0118033 |
| BLR | 5 | 191 | -0.0092618 | 0.0800538 | -0.0030000 | -0.0119085 | 0.0904386 | -0.1990000 | 0.188000 | 0.387000 | 0.1389610 | -0.4396476 | 0.0057925 |
| CPI | 6 | 191 | 7.0263351 | 1.9083485 | 6.3700000 | 6.7456732 | 1.3254444 | 2.2460000 | 12.716000 | 10.470000 | 1.1628794 | 1.0036915 | 0.1380833 |
| CAC | 7 | 191 | 0.4267068 | 41.8461889 | -0.6440000 | -0.7068758 | 1.3788180 | -134.9290000 | 537.268000 | 672.197000 | 10.7158025 | 139.7954663 | 3.0278840 |
| DCR | 8 | 191 | 16.0074764 | 4.7844381 | 14.4100000 | 15.4851634 | 4.8140022 | 9.6300000 | 29.177000 | 19.547000 | 0.8010703 | -0.2467050 | 0.3461898 |
| ELC | 9 | 191 | 0.0849476 | 0.0876615 | 0.0770000 | 0.0818497 | 0.0726474 | -0.1850000 | 0.435000 | 0.620000 | 0.3460588 | 1.8424585 | 0.0063430 |
| EXC | 10 | 191 | 0.0235393 | 0.0414074 | 0.0090000 | 0.0197059 | 0.0207564 | -0.0500000 | 0.174000 | 0.224000 | 1.0195664 | 1.1137629 | 0.0029961 |
| FXR | 11 | 191 | 0.1874136 | 0.2154764 | 0.1740000 | 0.1768562 | 0.2431464 | -0.2400000 | 0.836000 | 1.076000 | 0.4333208 | -0.4507889 | 0.0155913 |
| IPR | 12 | 191 | 10.0785916 | 7.2165515 | 10.4760000 | 10.3104967 | 6.6761478 | -29.2280000 | 27.266000 | 56.494000 | -1.0435233 | 4.6784510 | 0.5221713 |
| M2 | 13 | 191 | 16.0569948 | 3.8287890 | 16.2300000 | 16.0663333 | 4.5515820 | 8.7600000 | 23.506000 | 14.746000 | -0.0201483 | -0.9847020 | 0.2770414 |
| M1 | 14 | 191 | 0.1346335 | 0.0782014 | 0.1330000 | 0.1354379 | 0.0667170 | -0.1040000 | 0.354000 | 0.458000 | -0.1460577 | 0.7250039 | 0.0056585 |
| PE | 15 | 191 | 0.0491204 | 0.3221201 | -0.0360000 | 0.0186536 | 0.2520420 | -0.5870000 | 1.180000 | 1.767000 | 0.9339893 | 0.7873051 | 0.0233078 |
| STR | 16 | 191 | 0.2489476 | 1.1109342 | 0.0170000 | 0.0601699 | 0.3721326 | -0.8200000 | 7.435000 | 8.255000 | 4.0542367 | 20.8011968 | 0.0803844 |
| DPG | 17 | 191 | 16.0667696 | 3.8424164 | 16.6020000 | 16.1379804 | 5.0393574 | 9.2270000 | 22.914000 | 13.687000 | -0.1567664 | -1.2352058 | 0.2780275 |
| EXG | 18 | 191 | 12.9780314 | 18.3841520 | 10.3940000 | 11.3361634 | 11.2173516 | -61.3670000 | 75.661000 | 137.028000 | 0.5242242 | 3.8298597 | 1.3302305 |
| IMG | 19 | 191 | 12.8969948 | 19.5402860 | 10.4540000 | 12.5714641 | 18.7875072 | -50.6960000 | 68.067000 | 118.763000 | 0.0451563 | 0.3604087 | 1.4138855 |
| LOAG | 20 | 191 | 16.6023927 | 4.3046512 | 15.7830000 | 16.2515098 | 4.4181480 | 9.4710000 | 28.129000 | 18.658000 | 0.6508671 | -0.4195888 | 0.3114736 |
Missing Data
# Missing Data
vis_miss(data1)
There is no missing data in the observations, as much care was taken to compile a complete dataset.
Skewness and Outliers
We will look at the distributions of the variables and identify outliers. We find that many of the variables are bimodal or trimodal and do not follow a normal distribution.
data_sub <- melt(data1)  #long format for faceted density plots
suppressWarnings(ggplot(data_sub, aes(x = value)) +
  geom_density(fill = 'orange') + facet_wrap(~variable, scales = 'free'))
ggplot(data = reshape2::melt(data1), aes(x = variable, y = log(abs(value)))) +
  geom_boxplot(fill = 'steelblue2', outlier.alpha = 0.75) +
  labs(title = 'Boxplot: raw variables',
       x = 'Variables',
       y = 'log of absolute values') +
  theme(panel.background = element_rect(fill = 'white'),
        axis.text.x = element_text(size = 10, angle = 90))
## Warning: Removed 16 rows containing non-finite values (stat_boxplot).
There are outliers, as expected with capital market returns, but none of the variables is overwhelmed by them. We will standardize the variables by centering and scaling.
#Center and scale all variables
preprocessing <- preProcess(as.data.frame(data1), method = c("center", "scale"))
data_imp <- predict(preprocessing, data1)
After centering and scaling, the variables are on a common scale, and their distributions are easier to compare.
data_sub <- melt(data_imp)  #long format after centering and scaling
suppressWarnings(ggplot(data_sub, aes(x = value)) +
  geom_density(fill = 'orange') + facet_wrap(~variable, scales = 'free'))
ggplot(data = reshape2::melt(data_imp), aes(x = variable, y = log(abs(value)))) +
  geom_boxplot(fill = 'steelblue2', outlier.alpha = 0.75) +
  labs(title = 'Boxplot: centered and scaled variables',
       x = 'Variables',
       y = 'log of absolute values') +
  theme(panel.background = element_rect(fill = 'white'),
        axis.text.x = element_text(size = 10, angle = 90))
Remove correlated predictors
We will look for predictor pairs with correlations above 0.85 and remove one variable from each pair.
#Separate x and y variables
predictors<-data_imp[, -c(1:2)]
mcb<-data.frame(data_imp[,1]) #Y variable for bank
mci<-data.frame(data_imp[,2]) #Y variable for insurance
colnames(mcb)[1] <- "mcb" #rename the columns
colnames(mci)[1] <- "mci"
Let us first explore the relationships of the predictors with the two dependent variables. Except for CAC (the change in the current account balance), all the predictors appear to show some relationship with banking sector returns.
#plot
featurePlot(predictors,mcb$mcb)
For insurance sector returns, the change in the current account balance and the change in short-term interest rates appear to be the variables uncorrelated with stock returns.
#plot
featurePlot(predictors,mci$mci)
We would like to take a detailed look at the relationships among the predictor variables.
corr <- round(cor(predictors, use="na.or.complete"), 1)
ggcorrplot(corr,
type="lower",
lab=TRUE,
lab_size=3,
method="circle",
colors=c("red2", "white", "blue3"),
title="Correlation of variables",
ggtheme=theme_minimal)
We see only two pairs with correlations above 0.85: the 5-year and 10-year treasury rates, both proxies for long-term interest rates, and the deposit growth rate paired with the broad money measure M2. We remove TB10 and the deposit growth rate (DPG), which should take care of the multicollinearity.
tooHigh <- findCorrelation(cor(predictors, use="na.or.complete"), cutoff = .85, names = TRUE)
tooHigh
## [1] "DPG" "TB10"
predictors[ ,c(tooHigh)] <- list(NULL)
colnames(predictors)
## [1] "TB5" "BLR" "CPI" "CAC" "DCR" "ELC" "EXC" "FXR" "IPR" "M2"
## [11] "M1" "PE" "STR" "EXG" "IMG" "LOAG"
Near Zero variance
There are no near-zero-variance predictors in the dataset.
caret::nearZeroVar(predictors, names = TRUE)
## character(0)
We will first split the data into training and test sets, using a 70/30 split.
set.seed(675)
#Splitting data into training and test datasets
bank<-cbind(mcb, predictors)
splitt <- createDataPartition(bank$mcb, p=0.7, list=FALSE)
x_train <- bank[splitt, ]
x_train<-x_train[,-1]
y_train <- as.data.frame(bank$mcb[splitt])
x_test <- bank[-splitt, ]
x_test<-x_test[,-1]
y_test <- as.data.frame(bank$mcb[-splitt])
#Correct column names
colnames(y_train) <- c("MCB")
colnames(y_test) <- c("MCB")
ctrl <- trainControl(method = "cv", number = 10)
lm <- train(x_train, y_train$MCB,
method = "lm",
trControl = ctrl)
summary(lm)
##
## Call:
## lm(formula = .outcome ~ ., data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.96497 -0.33397 0.03142 0.29204 0.95887
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.029511 0.040872 -0.722 0.4717
## TB5 -0.037517 0.073739 -0.509 0.6119
## BLR -0.059279 0.076782 -0.772 0.4416
## CPI 0.103418 0.064924 1.593 0.1139
## CAC -0.093574 0.098566 -0.949 0.3444
## DCR 1.120611 0.211030 5.310 5.22e-07 ***
## ELC 0.016439 0.046745 0.352 0.7257
## EXC 0.072083 0.097757 0.737 0.4624
## FXR 0.223101 0.125678 1.775 0.0784 .
## IPR 0.030919 0.057307 0.540 0.5905
## M2 -0.525381 0.126821 -4.143 6.48e-05 ***
## M1 0.085463 0.104784 0.816 0.4164
## PE 0.865999 0.071854 12.052 < 2e-16 ***
## STR -0.031095 0.078873 -0.394 0.6941
## EXG 0.009485 0.072210 0.131 0.8957
## IMG -0.110853 0.070618 -1.570 0.1192
## LOAG -0.221254 0.134126 -1.650 0.1017
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4565 on 118 degrees of freedom
## Multiple R-squared: 0.8224, Adjusted R-squared: 0.7984
## F-statistic: 34.16 on 16 and 118 DF, p-value: < 2.2e-16
lm$results
## intercept RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 1 TRUE 0.4959997 0.7458327 0.4109339 0.05713202 0.1184137 0.04192749
plot(varImp(lm), main='LM Variables by importance')
plot(residuals(lm), main='Residuals LM' )
set.seed(100)
pls <- train(x_train, y_train$MCB,
method = "pls",
tuneLength = 25,
trControl = ctrl)
summary(pls)
## Data: X dimension: 135 16
## Y dimension: 135 1
## Fit method: oscorespls
## Number of components considered: 14
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps
## X 23.92 36.28 57.01 64.44 72.45 77.18 80.04
## .outcome 49.15 69.91 74.80 77.96 79.59 80.10 80.85
## 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps
## X 83.12 86.16 88.59 92.55 94.61 95.04 96.58
## .outcome 81.50 81.87 82.12 82.18 82.21 82.24 82.24
plot(pls, main='PLS Components versus RMSE')
pls$bestTune
## ncomp
## 14 14
#Save lowest RMSE in results
pls_results <- pls$results %>%
filter(ncomp==14)
pls_results
## ncomp RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 1 14 0.4851892 0.7626439 0.4032369 0.08212703 0.1255137 0.08777629
plot(varImp(pls), main='PLS Variables by Importance')
plot(residuals(pls) )
ridgeGrid <- data.frame(.lambda = seq(0, .1, length = 15))
ridge <- train(x_train, y_train$MCB,
method = "ridge",
tuneGrid = ridgeGrid,
trControl = ctrl)
summary(ridge)
## Length Class Mode
## call 4 -none- call
## actions 25 -none- list
## allset 16 -none- numeric
## beta.pure 400 -none- numeric
## vn 16 -none- character
## mu 1 -none- numeric
## normx 16 -none- numeric
## meanx 16 -none- numeric
## lambda 1 -none- numeric
## L1norm 25 -none- numeric
## penalty 25 -none- numeric
## df 25 -none- numeric
## Cp 25 -none- numeric
## sigma2 1 -none- numeric
## xNames 16 -none- character
## problemType 1 -none- character
## tuneValue 1 data.frame list
## obsLevels 1 -none- logical
## param 0 -none- list
plot(ridge, main='RMSE vs Weight Decay')
ridge$bestTune
## lambda
## 2 0.007142857
ridge_results <- ridge$results
ridge_results<-ridge_results[row.names(ridge_results) == 2, ]
plot(varImp(ridge), main='Ridge Variables by Importance')
plot(residuals(ridge) )
### Summary of linear models
We see similar variables in the top-ten importance lists for all three linear models.
p1<-plot(varImp(lm), top=10, main='LM')
p2<-plot(varImp(pls), top=10, main='PLS')
p3<-plot(varImp(ridge),top=10, main='Ridge')
gridExtra::grid.arrange(p1, p2,p3, ncol = 3)
knn <- train(x = x_train, y = y_train$MCB,
method = "knn",
tuneLength = 25,
trControl = ctrl)
knn
## k-Nearest Neighbors
##
## 135 samples
## 16 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 123, 122, 120, 120, 122, 121, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 5 0.4297728 0.8288522 0.3206046
## 7 0.4796549 0.8171145 0.3508061
## 9 0.5503164 0.7696747 0.3956877
## 11 0.5615214 0.7850441 0.3991806
## 13 0.5908162 0.7443228 0.4174694
## 15 0.6069796 0.7311551 0.4267778
## 17 0.6281199 0.6912893 0.4458129
## 19 0.6413438 0.6910558 0.4534921
## 21 0.6511627 0.6744651 0.4625602
## 23 0.6661625 0.6608598 0.4707100
## 25 0.6841502 0.6495093 0.4878065
## 27 0.6907142 0.6397021 0.4956525
## 29 0.7053210 0.6202751 0.5044997
## 31 0.7166127 0.6030113 0.5134191
## 33 0.7247616 0.6016335 0.5200782
## 35 0.7377396 0.5758104 0.5295366
## 37 0.7446191 0.5641693 0.5324217
## 39 0.7552855 0.5433497 0.5360298
## 41 0.7524054 0.5507129 0.5348462
## 43 0.7611104 0.5344460 0.5431021
## 45 0.7690780 0.5245354 0.5486000
## 47 0.7786472 0.5047102 0.5557803
## 49 0.7847242 0.4910153 0.5605712
## 51 0.7909341 0.4759253 0.5651026
## 53 0.8007937 0.4577813 0.5734161
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 5.
knnPred <- predict(knn, newdata = x_test)
knn_results <- postResample(pred = knnPred, obs = y_test$MCB)
knn_results
## RMSE Rsquared MAE
## 0.4229597 0.8157786 0.3166943
plot(varImp(knn), main='kNN Variables by Importance')
plot(residuals(knn), main='Residuals kNN Model')
nnetGrid <- expand.grid(.decay = c(0, .01, 1),
                        .size = c(1:10),
                        .bag = FALSE)
set.seed(100)
#MaxNWts caps the weight count at 5*(ncol+1)+5+1 = 91, so grid sizes above 5
#exceed the cap and produce the NaN rows in the results below
nnet <- train(x = x_train,
              y = y_train$MCB,
              method = "avNNet",
              tuneGrid = nnetGrid,
              trControl = ctrl,
              linout = TRUE, trace = FALSE, #linear output units for regression
              MaxNWts = 5 * (ncol(x_train) + 1) + 5 + 1,
              maxit = 250)
nnet
## Model Averaged Neural Network
##
## 135 samples
## 16 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 122, 121, 121, 123, 121, 121, ...
## Resampling results across tuning parameters:
##
## decay size RMSE Rsquared MAE
## 0.00 1 0.9798420 NaN 0.7773056
## 0.00 2 0.9798416 0.6459013 0.7773053
## 0.00 3 0.9771588 0.4347334 0.7763439
## 0.00 4 0.9754947 0.3975196 0.7741696
## 0.00 5 0.9780130 0.1328233 0.7744443
## 0.00 6 NaN NaN NaN
## 0.00 7 NaN NaN NaN
## 0.00 8 NaN NaN NaN
## 0.00 9 NaN NaN NaN
## 0.00 10 NaN NaN NaN
## 0.01 1 0.7651605 0.6796539 0.6407079
## 0.01 2 0.7487507 0.7351500 0.6195027
## 0.01 3 0.7504572 0.7473636 0.6207154
## 0.01 4 0.7547597 0.7308998 0.6270593
## 0.01 5 0.7553922 0.7208053 0.6281627
## 0.01 6 NaN NaN NaN
## 0.01 7 NaN NaN NaN
## 0.01 8 NaN NaN NaN
## 0.01 9 NaN NaN NaN
## 0.01 10 NaN NaN NaN
## 1.00 1 0.8767042 0.6455948 0.7395006
## 1.00 2 0.8520395 0.6696216 0.7176598
## 1.00 3 0.8444341 0.6703542 0.7104081
## 1.00 4 0.8350163 0.6786887 0.7025693
## 1.00 5 0.8321711 0.6804939 0.6998788
## 1.00 6 NaN NaN NaN
## 1.00 7 NaN NaN NaN
## 1.00 8 NaN NaN NaN
## 1.00 9 NaN NaN NaN
## 1.00 10 NaN NaN NaN
##
## Tuning parameter 'bag' was held constant at a value of FALSE
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 2, decay = 0.01 and bag = FALSE.
summary(nnet)
## Length Class Mode
## model 5 -none- list
## repeats 1 -none- numeric
## bag 1 -none- logical
## seeds 5 -none- numeric
## names 16 -none- character
## terms 3 terms call
## coefnames 16 -none- character
## xlevels 0 -none- list
## xNames 16 -none- character
## problemType 1 -none- character
## tuneValue 3 data.frame list
## obsLevels 1 -none- logical
## param 4 -none- list
nnet$bestTune
## size decay bag
## 12 2 0.01 FALSE
nnetPred <- predict(nnet, newdata=x_test)
NNET <- postResample(pred = nnetPred, obs = y_test$MCB)
NNET
## RMSE Rsquared MAE
## 0.7089727 0.7360134 0.5978394
plotmo(nnet)
## plotmo grid: TB5 BLR CPI CAC DCR ELC
## -0.1084383 0.04074487 -0.2307414 -0.02716393 -0.3587624 -0.09066284
## EXC FXR IPR M2 M1 PE STR
## -0.3752772 0.04448927 0.05049619 0.04518537 0.1069864 -0.2425196 -0.2105864
## EXG IMG LOAG
## -0.14393 -0.1250235 -0.1866336
plot(varImp(nnet), main= 'Nnet Variables by Importance')
set.seed(200)
marsGrid <- expand.grid(.degree = 1:2, .nprune = 2:38)
mars<- train(x = x_train, y = y_train$MCB,
method = "earth",
tuneGrid = marsGrid,
trControl = ctrl)
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.
mars
## Multivariate Adaptive Regression Spline
##
## 135 samples
## 16 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 120, 122, 121, 122, 123, 122, ...
## Resampling results across tuning parameters:
##
## degree nprune RMSE Rsquared MAE
## 1 2 0.7828631 0.3961661 0.6214264
## 1 3 0.6558746 0.6139275 0.5160067
## 1 4 0.5669884 0.6987255 0.4632596
## 1 5 0.5431638 0.7175091 0.4434431
## 1 6 0.4782926 0.7698784 0.4093307
## 1 7 0.4324762 0.8051950 0.3574345
## 1 8 0.4131732 0.8307522 0.3423554
## 1 9 0.4106595 0.8332552 0.3433455
## 1 10 0.4204844 0.8188496 0.3545389
## 1 11 0.4091654 0.8323371 0.3460230
## 1 12 0.4101774 0.8328498 0.3465441
## 1 13 0.4448865 0.7988036 0.3608075
## 1 14 0.4548243 0.7910200 0.3730259
## 1 15 0.4545797 0.7854380 0.3660873
## 1 16 0.4648106 0.7759917 0.3709382
## 1 17 0.4564673 0.7811348 0.3646842
## 1 18 0.4555716 0.7807619 0.3699457
## 1 19 0.4600576 0.7784273 0.3733455
## 1 20 0.4600576 0.7784273 0.3733455
## 1 21 0.4582866 0.7796145 0.3709677
## 1 22 0.4591089 0.7792369 0.3723104
## 1 23 0.4591089 0.7792369 0.3723104
## 1 24 0.4591089 0.7792369 0.3723104
## 1 25 0.4591089 0.7792369 0.3723104
## 1 26 0.4591089 0.7792369 0.3723104
## 1 27 0.4591089 0.7792369 0.3723104
## 1 28 0.4591089 0.7792369 0.3723104
## 1 29 0.4591089 0.7792369 0.3723104
## 1 30 0.4591089 0.7792369 0.3723104
## 1 31 0.4591089 0.7792369 0.3723104
## 1 32 0.4591089 0.7792369 0.3723104
## 1 33 0.4591089 0.7792369 0.3723104
## 1 34 0.4591089 0.7792369 0.3723104
## 1 35 0.4591089 0.7792369 0.3723104
## 1 36 0.4591089 0.7792369 0.3723104
## 1 37 0.4591089 0.7792369 0.3723104
## 1 38 0.4591089 0.7792369 0.3723104
## 2 2 0.7828631 0.3961661 0.6214264
## 2 3 0.6684515 0.6057972 0.5332141
## 2 4 0.5784430 0.6851370 0.4675201
## 2 5 0.5440394 0.7223495 0.4414993
## 2 6 0.4791392 0.7849091 0.3705263
## 2 7 0.4019894 0.8550400 0.3183873
## 2 8 0.3627111 0.8808382 0.2860863
## 2 9 0.3397024 0.8926157 0.2755388
## 2 10 0.3427765 0.8901162 0.2748148
## 2 11 0.3476235 0.8832383 0.2735279
## 2 12 0.3539016 0.8789854 0.2799788
## 2 13 0.3549362 0.8858539 0.2765993
## 2 14 0.3510347 0.8864693 0.2731189
## 2 15 0.3486083 0.8855971 0.2751290
## 2 16 0.3472564 0.8913351 0.2743585
## 2 17 0.3475296 0.8913129 0.2770975
## 2 18 0.3412851 0.8938953 0.2732802
## 2 19 0.4329626 0.8435168 0.3086288
## 2 20 0.4329626 0.8435168 0.3086288
## 2 21 0.4329626 0.8435168 0.3086288
## 2 22 0.4329626 0.8435168 0.3086288
## 2 23 0.4329626 0.8435168 0.3086288
## 2 24 0.4265904 0.8439140 0.3035353
## 2 25 0.4265904 0.8439140 0.3035353
## 2 26 0.4265904 0.8439140 0.3035353
## 2 27 0.4265904 0.8439140 0.3035353
## 2 28 0.4265904 0.8439140 0.3035353
## 2 29 0.4265904 0.8439140 0.3035353
## 2 30 0.4265904 0.8439140 0.3035353
## 2 31 0.4265904 0.8439140 0.3035353
## 2 32 0.4265904 0.8439140 0.3035353
## 2 33 0.4265904 0.8439140 0.3035353
## 2 34 0.4265904 0.8439140 0.3035353
## 2 35 0.4265904 0.8439140 0.3035353
## 2 36 0.4265904 0.8439140 0.3035353
## 2 37 0.4265904 0.8439140 0.3035353
## 2 38 0.4265904 0.8439140 0.3035353
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 9 and degree = 2.
marsPred <- predict(mars, newdata=x_test)
MARS <- postResample(pred = marsPred, obs = y_test$MCB)
MARS
## RMSE Rsquared MAE
## 0.4478498 0.7982848 0.3175453
plotmo(mars)
## plotmo grid: TB5 BLR CPI CAC DCR ELC
## -0.1084383 0.04074487 -0.2307414 -0.02716393 -0.3587624 -0.09066284
## EXC FXR IPR M2 M1 PE STR
## -0.3752772 0.04448927 0.05049619 0.04518537 0.1069864 -0.2425196 -0.2105864
## EXG IMG LOAG
## -0.14393 -0.1250235 -0.1866336
plot(varImp(mars), main='MARS Variables by Importance')
plot(residuals(mars))
set.seed(100)
svm <- train(x = x_train, y = y_train$MCB,
method = "svmLinear",
tuneLength = 25,
trControl = trainControl(method = "cv"))
svm
## Support Vector Machines with Linear Kernel
##
## 135 samples
## 16 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 122, 121, 121, 123, 121, 121, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 0.5299002 0.7293758 0.4323107
##
## Tuning parameter 'C' was held constant at a value of 1
svmPred <- predict(svm, newdata=x_test)
SVM<- postResample(pred = svmPred, obs = y_test$MCB)
SVM
## RMSE Rsquared MAE
## 0.4470014 0.8183016 0.3579210
plotmo(svm)
## plotmo grid: TB5 BLR CPI CAC DCR ELC
## -0.1084383 0.04074487 -0.2307414 -0.02716393 -0.3587624 -0.09066284
## EXC FXR IPR M2 M1 PE STR
## -0.3752772 0.04448927 0.05049619 0.04518537 0.1069864 -0.2425196 -0.2105864
## EXG IMG LOAG
## -0.14393 -0.1250235 -0.1866336
plot(varImp(svm), main='SVM Variables by Importance')
p1<-plot(varImp(knn), top=10, main='kNN')
p2<-plot(varImp(nnet), top=10, main='Nnet')
p3<-plot(varImp(mars),top=10, main='MARS')
p4<-plot(varImp(svm), top=10, main='SVM')
gridExtra::grid.arrange(p1, p2,p3,p4, ncol = 4)
We see that the price-earnings ratio, import growth, and CPI show up as the most important variables in all the non-linear models.
The following table shows the RMSE, R-squared, and MAE for the different models. Note that the linear model, PLS, and ridge figures are cross-validation estimates from the training set, while the kNN, neural network, MARS, and SVM figures are computed on the held-out test set.
#read in results
lm_res<- c('Linear Model', lm$results$RMSE, lm$results$Rsquared, lm$results$MAE)
pls_res <- c('Partial Least Square', pls_results$RMSE, pls_results$Rsquared, pls_results$MAE)
ridge_res <- c('Ridge Regression', ridge_results$RMSE, ridge_results$Rsquared, ridge_results$MAE)
knn_res <- c('KNN', knn_results['RMSE'], knn_results['Rsquared'], knn_results['MAE'])
nn_res<-c('Neural Network', NNET[1], NNET[2], NNET[3])
mars_res <- c('Multivariate Adaptive Regression Spline', MARS[1], MARS[2], MARS[3])
svm_res <- c('Support Vector Machines - Linear', SVM[1], SVM[2], SVM[3])
#Combine and rename
results<-rbind(lm_res, pls_res, ridge_res, knn_res, nn_res, mars_res, svm_res)
colnames(results) <- c('Model', 'RMSE', 'Rsquared', 'MAE')
row.names(results) <- c(1:nrow(results))
results%>%kable(caption='Model performance: Bank sector') %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = T)
| Model | RMSE | Rsquared | MAE |
|---|---|---|---|
| Linear Model | 0.495999685837682 | 0.74583271563391 | 0.41093389688591 |
| Partial Least Square | 0.48518921044043 | 0.762643880226287 | 0.403236872362957 |
| Ridge Regression | 0.47385269413623 | 0.767248641116019 | 0.399191411013123 |
| KNN | 0.422959673355544 | 0.815778618831522 | 0.316694334051523 |
| Neural Network | 0.70897274410554 | 0.736013420545942 | 0.597839410610149 |
| Multivariate Adaptive Regression Spline | 0.447849756743571 | 0.798284808287012 | 0.317545329886213 |
| Support Vector Machines - Linear | 0.447001401474033 | 0.818301592804144 | 0.357921012972581 |
We can see that the Support Vector Machine did the best job of predicting bank stock returns, with the highest R-squared and the second-lowest RMSE. The rest of the models did not lag far behind and have acceptable R-squared and RMSE values. It may be concluded that macroeconomic predictors can be useful in predicting the stock returns of banking sector companies listed on the Dhaka Stock Exchange.
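As a usage illustration, a fitted model of this kind can be scored against a fresh set of macro observations; in the sketch below a row of the test set stands in for a hypothetical new month of YoY macro changes:
#Hypothetical scoring of one new observation (illustrative)
new_obs <- x_test[1, , drop = FALSE] #stand-in for a new month of YoY macro changes
predict(svm, newdata = new_obs)      #predicted YoY change in bank market cap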
We now repeat the same analysis for the stock returns of companies from the insurance sector. Again, we first split the data into training and test sets, using a 70/30 split.
set.seed(676)
#Splitting data into training and test datasets
ins<-cbind(mci, predictors)
splitt <- createDataPartition(ins$mci, p=0.7, list=FALSE)
x_train <- ins[splitt, ]
x_train<-x_train[,-1]
y_train <- as.data.frame(ins$mci[splitt])
x_test <- ins[-splitt, ]
x_test<-x_test[,-1]
y_test <- as.data.frame(ins$mci[-splitt])
#Correct column names
colnames(y_train) <- c("MCI")
colnames(y_test) <- c("MCI")
ctrl <- trainControl(method = "cv", number = 10)
lm1 <- train(x_train, y_train$MCI,
method = "lm",
trControl = ctrl)
summary(lm1)
lm1$results
## intercept RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 1 TRUE 0.4419947 0.7137254 0.3628797 0.1264806 0.1437704 0.1095867
plot(varImp(lm1), main='LM Variables by importance')
plot(residuals(lm1), main='Residuals LM' )
set.seed(100)
pls1 <- train(x_train, y_train$MCI,
method = "pls",
tuneLength = 25,
trControl = ctrl)
summary(pls1)
## Data: X dimension: 135 16
## Y dimension: 135 1
## Fit method: oscorespls
## Number of components considered: 5
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps
## X 21.83 41.83 52.16 61.66 68.30
## .outcome 47.80 56.88 69.32 74.62 75.85
plot(pls1, main='PLS Components versus RMSE')
pls1$bestTune
## ncomp
## 5 5
#Save lowest RMSE in results
pls1_results <- pls1$results %>%
filter(ncomp==5)
pls1_results
## ncomp RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 1 5 0.4571651 0.6828355 0.3640916 0.1350082 0.1558406 0.09594917
plot(varImp(pls1), main='PLS Variables by Importance')
plot(residuals(pls1) )
ridgeGrid <- data.frame(.lambda = seq(0, .1, length = 15))
ridge1 <- train(x_train, y_train$MCI,
method = "ridge",
tuneGrid = ridgeGrid,
trControl = ctrl)
summary(ridge1)
## Length Class Mode
## call 4 -none- call
## actions 17 -none- list
## allset 16 -none- numeric
## beta.pure 272 -none- numeric
## vn 16 -none- character
## mu 1 -none- numeric
## normx 16 -none- numeric
## meanx 16 -none- numeric
## lambda 1 -none- numeric
## L1norm 17 -none- numeric
## penalty 17 -none- numeric
## df 17 -none- numeric
## Cp 17 -none- numeric
## sigma2 1 -none- numeric
## xNames 16 -none- character
## problemType 1 -none- character
## tuneValue 1 data.frame list
## obsLevels 1 -none- logical
## param 0 -none- list
plot(ridge1, main='RMSE vs Weight Decay')
ridge1$bestTune
## lambda
## 6 0.03571429
ridge1_results <- ridge1$results
ridge1_results<-ridge1_results[row.names(ridge1_results) == 6, ]
ridge1_results
## lambda RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 6 0.03571429 0.455275 0.6685421 0.3564199 0.1057322 0.171994 0.07441358
plot(varImp(ridge1), main='Ridge Variables by Importance')
plot(residuals(ridge1) )
We see similar variables in the top-ten importance lists for all three linear models.
p1<-plot(varImp(lm1), top=10, main='LM')
p2<-plot(varImp(pls1), top=10, main='PLS')
p3<-plot(varImp(ridge1),top=10, main='Ridge')
gridExtra::grid.arrange(p1, p2,p3, ncol = 3)
We find that PE has been displaced by the change in the exchange rate and the loan growth rate as the top variables. The other important variables are the domestic credit growth rate, the money supply measure M2, the export growth rate, and the import growth rate.
knn1 <- train(x = x_train, y = y_train$MCI,
method = "knn",
tuneLength = 25,
trControl = ctrl)
knn1
## k-Nearest Neighbors
##
## 135 samples
## 16 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 123, 122, 120, 120, 122, 121, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 5 0.4089157 0.7898811 0.2922318
## 7 0.4219275 0.7861062 0.3015809
## 9 0.4458421 0.7410069 0.3206597
## 11 0.4539090 0.7391085 0.3246510
## 13 0.4722146 0.7113184 0.3357770
## 15 0.4788309 0.6997660 0.3450264
## 17 0.4864670 0.6834446 0.3513716
## 19 0.4932438 0.6607236 0.3611435
## 21 0.5042321 0.6513884 0.3672226
## 23 0.5252982 0.6240902 0.3813403
## 25 0.5401341 0.6017261 0.3924175
## 27 0.5509078 0.5839875 0.4013634
## 29 0.5578200 0.5825403 0.4032653
## 31 0.5703572 0.5611357 0.4147600
## 33 0.5792777 0.5468639 0.4222959
## 35 0.5883212 0.5286205 0.4313152
## 37 0.5945825 0.5244124 0.4350325
## 39 0.6003365 0.5175872 0.4371826
## 41 0.6059037 0.5141532 0.4427362
## 43 0.6112735 0.5020125 0.4474559
## 45 0.6154121 0.4955817 0.4503586
## 47 0.6229250 0.4857005 0.4554690
## 49 0.6268308 0.4823045 0.4582686
## 51 0.6330973 0.4725126 0.4641476
## 53 0.6387549 0.4628344 0.4685732
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 5.
knnPred1 <- predict(knn1, newdata = x_test)
knn_results1 <- postResample(pred = knnPred1, obs = y_test$MCI)
knn_results1
## RMSE Rsquared MAE
## 0.7188419 0.8105752 0.4376714
plot(varImp(knn1), main='kNN Variables by Importance')
plot(residuals(knn1), main='Residuals kNN Model')
nnetGrid <- expand.grid(.decay = c(0, .01, 1),
                        .size = c(1:10),
                        .bag = FALSE)
set.seed(110)
#As before, MaxNWts caps the weight count at 91, so sizes above 5 yield NaN rows
nnet1 <- train(x = x_train,
               y = y_train$MCI,
               method = "avNNet",
               tuneGrid = nnetGrid,
               trControl = ctrl,
               linout = TRUE, trace = FALSE, #linear output units for regression
               MaxNWts = 5 * (ncol(x_train) + 1) + 5 + 1,
               maxit = 250)
nnet1
## Model Averaged Neural Network
##
## 135 samples
## 16 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 121, 123, 120, 122, 122, 122, ...
## Resampling results across tuning parameters:
##
## decay size RMSE Rsquared MAE
## 0.00 1 0.8006616 NaN 0.6138065
## 0.00 2 0.8006616 0.5411679 0.6138065
## 0.00 3 0.8006616 0.7694807 0.6138065
## 0.00 4 0.8006616 0.3868924 0.6138065
## 0.00 5 0.8006616 NaN 0.6138065
## 0.00 6 NaN NaN NaN
## 0.00 7 NaN NaN NaN
## 0.00 8 NaN NaN NaN
## 0.00 9 NaN NaN NaN
## 0.00 10 NaN NaN NaN
## 0.01 1 0.6283319 0.6169347 0.5135357
## 0.01 2 0.6359674 0.6078016 0.5231397
## 0.01 3 0.6332928 0.6211338 0.5212375
## 0.01 4 0.6346771 0.6220899 0.5200557
## 0.01 5 0.6268023 0.6712441 0.5135239
## 0.01 6 NaN NaN NaN
## 0.01 7 NaN NaN NaN
## 0.01 8 NaN NaN NaN
## 0.01 9 NaN NaN NaN
## 0.01 10 NaN NaN NaN
## 1.00 1 0.7270736 0.6347139 0.6031713
## 1.00 2 0.7062747 0.6335868 0.5846712
## 1.00 3 0.6959433 0.6375719 0.5761733
## 1.00 4 0.6901206 0.6415519 0.5712686
## 1.00 5 0.6865532 0.6402069 0.5676773
## 1.00 6 NaN NaN NaN
## 1.00 7 NaN NaN NaN
## 1.00 8 NaN NaN NaN
## 1.00 9 NaN NaN NaN
## 1.00 10 NaN NaN NaN
##
## Tuning parameter 'bag' was held constant at a value of FALSE
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 5, decay = 0.01 and bag = FALSE.
summary(nnet1)
## Length Class Mode
## model 5 -none- list
## repeats 1 -none- numeric
## bag 1 -none- logical
## seeds 5 -none- numeric
## names 16 -none- character
## terms 3 terms call
## coefnames 16 -none- character
## xlevels 0 -none- list
## xNames 16 -none- character
## problemType 1 -none- character
## tuneValue 3 data.frame list
## obsLevels 1 -none- logical
## param 4 -none- list
nnet1$bestTune
## size decay bag
## 15 5 0.01 FALSE
nnetPred1 <- predict(nnet1, newdata=x_test)
NNET1 <- postResample(pred = nnetPred1, obs = y_test$MCI)
NNET1
## RMSE Rsquared MAE
## 1.0486480 0.6621918 0.6928478
plotmo(nnet1)
## plotmo grid: TB5 BLR CPI CAC DCR
## -0.1084383 -0.009221552 -0.4738836 -0.02544334 -0.3416235
## ELC EXC FXR IPR M2 M1
## 0.0005972519 -0.3511269 -0.09473711 -0.07587996 -0.1076045 -0.08482591
## PE STR EXG IMG LOAG
## -0.2797727 -0.2087861 -0.1724328 -0.1190359 -0.3120793
plot(varImp(nnet1), main= 'Nnet Variables by Importance')
set.seed(200)
marsGrid <- expand.grid(.degree = 1:2, .nprune = 2:38)
mars1<- train(x = x_train, y = y_train$MCI,
method = "earth",
tuneGrid = marsGrid,
trControl = ctrl)
mars1
## Multivariate Adaptive Regression Spline
##
## 135 samples
## 16 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 120, 122, 121, 122, 123, 122, ...
## Resampling results across tuning parameters:
##
## degree nprune RMSE Rsquared MAE
## 1 2 0.6253293 0.4634208 0.4688157
## 1 3 0.5184894 0.5975585 0.3909778
## 1 4 0.5040306 0.6290170 0.3800183
## 1 5 0.4457547 0.6934899 0.3489214
## 1 6 0.4184976 0.7187839 0.3279691
## 1 7 0.3978064 0.7287622 0.3122345
## 1 8 0.8475289 0.6794028 0.4456061
## 1 9 0.9474415 0.6759347 0.4696865
## 1 10 0.9363486 0.6901546 0.4656000
## 1 11 0.9651696 0.6921126 0.4738686
## 1 12 0.9826423 0.6830498 0.4857455
## 1 13 1.0025784 0.6858247 0.4852533
## 1 14 0.9961714 0.6834668 0.4844759
## 1 15 0.9482046 0.6712909 0.4749547
## 1 16 0.9916951 0.6744043 0.4906862
## 1 17 0.9951780 0.6623029 0.4890224
## 1 18 0.9897283 0.6680592 0.4853171
## 1 19 0.9865714 0.6717363 0.4858423
## 1 20 0.9933326 0.6669622 0.4857460
## 1 21 0.9930403 0.6679093 0.4855560
## 1 22 0.9930403 0.6679093 0.4855560
## 1 23 0.9930403 0.6679093 0.4855560
## 1 24 0.9930403 0.6679093 0.4855560
## 1 25 0.9930403 0.6679093 0.4855560
## 1 26 0.9930403 0.6679093 0.4855560
## 1 27 0.9930403 0.6679093 0.4855560
## 1 28 0.9930403 0.6679093 0.4855560
## 1 29 0.9930403 0.6679093 0.4855560
## 1 30 0.9930403 0.6679093 0.4855560
## 1 31 0.9930403 0.6679093 0.4855560
## 1 32 0.9930403 0.6679093 0.4855560
## 1 33 0.9930403 0.6679093 0.4855560
## 1 34 0.9930403 0.6679093 0.4855560
## 1 35 0.9930403 0.6679093 0.4855560
## 1 36 0.9930403 0.6679093 0.4855560
## 1 37 0.9930403 0.6679093 0.4855560
## 1 38 0.9930403 0.6679093 0.4855560
## 2 2 0.6372212 0.4362399 0.4754387
## 2 3 0.4658445 0.5765872 0.3612062
## 2 4 0.3894096 0.7007385 0.3047747
## 2 5 0.3592511 0.7382962 0.2929671
## 2 6 0.3496631 0.7542496 0.2843372
## 2 7 0.3372170 0.7718030 0.2736506
## 2 8 0.3409674 0.7767622 0.2731574
## 2 9 0.3519288 0.7797334 0.2786078
## 2 10 0.3612549 0.7745890 0.2798221
## 2 11 0.3562219 0.7805599 0.2743897
## 2 12 0.3562991 0.7856655 0.2712994
## 2 13 0.3541912 0.7952552 0.2689984
## 2 14 0.3611164 0.7911902 0.2764053
## 2 15 0.3582301 0.7902572 0.2707426
## 2 16 0.3840778 0.7483496 0.2911929
## 2 17 0.3823398 0.7532270 0.2926700
## 2 18 0.3894509 0.7460225 0.2959487
## 2 19 0.3970489 0.7381758 0.3013780
## 2 20 0.3987436 0.7415567 0.3038427
## 2 21 0.3987868 0.7417502 0.3042704
## 2 22 0.3987868 0.7417502 0.3042704
## 2 23 0.4028302 0.7365767 0.3069198
## 2 24 0.4028302 0.7365767 0.3069198
## 2 25 0.4028302 0.7365767 0.3069198
## 2 26 0.4028302 0.7365767 0.3069198
## 2 27 0.4028302 0.7365767 0.3069198
## 2 28 0.4028302 0.7365767 0.3069198
## 2 29 0.4028302 0.7365767 0.3069198
## 2 30 0.4028302 0.7365767 0.3069198
## 2 31 0.4028302 0.7365767 0.3069198
## 2 32 0.4028302 0.7365767 0.3069198
## 2 33 0.4028302 0.7365767 0.3069198
## 2 34 0.4028302 0.7365767 0.3069198
## 2 35 0.4028302 0.7365767 0.3069198
## 2 36 0.4028302 0.7365767 0.3069198
## 2 37 0.4028302 0.7365767 0.3069198
## 2 38 0.4028302 0.7365767 0.3069198
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 7 and degree = 2.
marsPred1 <- predict(mars1, newdata=x_test)
MARS1 <- postResample(pred = marsPred1, obs = y_test$MCI)
MARS1
## RMSE Rsquared MAE
## 1.1980882 0.2436580 0.7450327
plotmo(mars1)
## plotmo grid: TB5 BLR CPI CAC DCR
## -0.1084383 -0.009221552 -0.4738836 -0.02544334 -0.3416235
## ELC EXC FXR IPR M2 M1
## 0.0005972519 -0.3511269 -0.09473711 -0.07587996 -0.1076045 -0.08482591
## PE STR EXG IMG LOAG
## -0.2797727 -0.2087861 -0.1724328 -0.1190359 -0.3120793
plot(varImp(mars1), main='MARS Variables by Importance')
plot(residuals(mars1))
set.seed(120)
svm1 <- train(x = x_train, y = y_train$MCI,
method = "svmLinear",
tuneLength = 25,
trControl = trainControl(method = "cv"))
svm1
## Support Vector Machines with Linear Kernel
##
## 135 samples
## 16 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 122, 120, 122, 122, 122, 121, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 0.5216752 0.6033346 0.4004447
##
## Tuning parameter 'C' was held constant at a value of 1
svmPred1 <- predict(svm1, newdata=x_test)
SVM1<- postResample(pred = svmPred1, obs = y_test$MCI)
SVM1
## RMSE Rsquared MAE
## 1.1930641 0.2204846 0.7591112
plotmo(svm1)
## plotmo grid: TB5 BLR CPI CAC DCR
## -0.1084383 -0.009221552 -0.4738836 -0.02544334 -0.3416235
## ELC EXC FXR IPR M2 M1
## 0.0005972519 -0.3511269 -0.09473711 -0.07587996 -0.1076045 -0.08482591
## PE STR EXG IMG LOAG
## -0.2797727 -0.2087861 -0.1724328 -0.1190359 -0.3120793
plot(varImp(svm1), main='SVM Variables by Importance')
p1<-plot(varImp(knn1), top=10, main='kNN')
p2<-plot(varImp(nnet1), top=10, main='Nnet')
p3<-plot(varImp(mars1),top=10, main='MARS')
p4<-plot(varImp(svm1), top=10, main='SVM')
gridExtra::grid.arrange(p1, p2,p3,p4, ncol = 4)
We find similar variables in the top-ten lists for the non-linear models as well.
The following table shows the RMSE, R-squared, and MAE for the different models; as before, the linear-model figures are cross-validation estimates and the non-linear figures are test-set results.
#read in results
lm_res1<- c('Linear Model', lm1$results$RMSE, lm1$results$Rsquared, lm1$results$MAE)
pls_res1 <- c('Partial Least Square', pls1_results$RMSE, pls1_results$Rsquared, pls1_results$MAE)
ridge_res1 <- c('Ridge Regression', ridge1_results$RMSE, ridge1_results$Rsquared, ridge1_results$MAE)
knn_res1 <- c('KNN', knn_results1['RMSE'], knn_results1['Rsquared'], knn_results1['MAE'])
nn_res1<-c('Neural Network', NNET1[1], NNET1[2], NNET1[3])
mars_res1 <- c('Multivariate Adaptive Regression Spline', MARS1[1], MARS1[2], MARS1[3])
svm_res1 <- c('Support Vector Machines - Linear', SVM1[1], SVM1[2], SVM1[3])
#Combine and rename
results1<-rbind(lm_res1, pls_res1, ridge_res1, knn_res1, nn_res1, mars_res1, svm_res1)
colnames(results1) <- c('Model', 'RMSE', 'Rsquared', 'MAE')
row.names(results1) <- c(1:nrow(results1))
results1%>%kable(caption='Model performance: Insurance sector') %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = T)
| Model | RMSE | Rsquared | MAE |
|---|---|---|---|
| Linear Model | 0.441994714504095 | 0.713725381606877 | 0.362879692679264 |
| Partial Least Square | 0.457165079634718 | 0.68283545519077 | 0.364091568491284 |
| Ridge Regression | 0.455275044250047 | 0.668542109144494 | 0.356419860940034 |
| KNN | 0.718841873641116 | 0.810575215264818 | 0.437671374202208 |
| Neural Network | 1.04864800557038 | 0.66219183514197 | 0.692847769311332 |
| Multivariate Adaptive Regression Spline | 1.19808822090789 | 0.243657960011675 | 0.745032667431987 |
| Support Vector Machines - Linear | 1.1930640845381 | 0.220484580291216 | 0.759111175592163 |
kNN seems to do the best job, with the highest R-squared, although its RMSE is almost double that of the linear models.
We will now re-fit the best models using selected variables from the top ten, to see whether we can simplify by restricting the number of predictors.
Bank Model: Support Vector Machines
We will run the SVM model with the top seven variables; by trial and error, we found that this gives an R-squared very close to that obtained using all the variables.
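The variable names below were chosen by inspecting the importance plots; the same top-n names can also be pulled programmatically from varImp (a sketch, assuming the importance table exposes an Overall column, as it does for the caret models used here):
#Extract the top seven bank-model predictors by importance (illustrative)
imp <- varImp(svm)$importance
top7 <- rownames(imp)[order(-imp$Overall)][1:7]
top7
The same pattern with varImp(knn1) and 1:9 would serve the insurance model below.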
x_train_bank<-x_train[c('PE', 'IMG', 'CPI', 'LOAG', 'EXC', 'M2', 'DCR')]
x_test_bank<-x_test[c('PE', 'IMG', 'CPI', 'LOAG', 'EXC', 'M2', 'DCR')]
y_train <- as.data.frame(bank$mcb[splitt])
y_test <- as.data.frame(bank$mcb[-splitt])
#Correct column names
colnames(y_train) <- c("MCB")
colnames(y_test) <- c("MCB")
set.seed(200)
svm1 <- train(x = x_train_bank, y = y_train$MCB,
method = "svmLinear",
tuneLength = 25,
trControl = trainControl(method = "cv"))
svm1
## Support Vector Machines with Linear Kernel
##
## 135 samples
## 7 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 120, 122, 121, 122, 123, 122, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 0.4578548 0.8120911 0.3589988
##
## Tuning parameter 'C' was held constant at a value of 1
svmPred1 <- predict(svm1, newdata=x_test_bank)
SVM1<- postResample(pred = svmPred1, obs = y_test$MCB)
SVM1
## RMSE Rsquared MAE
## 0.4741013 0.7838049 0.3746547
We can see that using only the top seven variables gives an R-squared of 78%, very close to the 81% obtained using all the variables. The RMSE has increased slightly, from 0.45 to 0.47.
Insurance Model: kNN
We will choose the top nine variables from the kNN model for insurance; by trial and error, we find that this gives roughly the same R-squared as using all the variables.
x_train_ins<-x_train[c('LOAG', 'PE', 'M2', 'DCR', 'IMG', 'TB5', 'CPI', 'M1', 'EXC')]
x_test_ins<-x_test[c('LOAG','PE', 'M2', 'DCR', 'IMG', 'TB5', 'CPI', 'M1', 'EXC')]
y_train <- as.data.frame(ins$mci[splitt])
y_test <- as.data.frame(ins$mci[-splitt])
#Correct column names
colnames(y_train) <- c("MCI")
colnames(y_test) <- c("MCI")
knn2 <- train(x = x_train_ins, y = y_train$MCI,
method = "knn",
tuneLength = 25,
trControl = ctrl)
knn2
## k-Nearest Neighbors
##
## 135 samples
## 9 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 122, 122, 120, 122, 123, 121, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 5 0.3838326 0.8108170 0.2707159
## 7 0.4190181 0.7710194 0.2861551
## 9 0.4354676 0.7792612 0.3032874
## 11 0.4452021 0.7598363 0.3132272
## 13 0.4486929 0.7709435 0.3122924
## 15 0.4598753 0.7575727 0.3225840
## 17 0.4756409 0.7311253 0.3337932
## 19 0.4844690 0.7271731 0.3391267
## 21 0.5000145 0.7053357 0.3502120
## 23 0.5194768 0.6764689 0.3661537
## 25 0.5300205 0.6544241 0.3757139
## 27 0.5388365 0.6414956 0.3813517
## 29 0.5518070 0.6197262 0.3926299
## 31 0.5614596 0.6042875 0.3989024
## 33 0.5699084 0.5948181 0.4075900
## 35 0.5772494 0.5840137 0.4129995
## 37 0.5849488 0.5746314 0.4196079
## 39 0.5922449 0.5697349 0.4240797
## 41 0.6000519 0.5601955 0.4292254
## 43 0.6038245 0.5582317 0.4328874
## 45 0.6092438 0.5543898 0.4350027
## 47 0.6189312 0.5344113 0.4420499
## 49 0.6257952 0.5283148 0.4462889
## 51 0.6308411 0.5223434 0.4481774
## 53 0.6369616 0.5152562 0.4522889
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 5.
knnPred2 <- predict(knn2, newdata = x_test_ins)
knn_results2 <- postResample(pred = knnPred2, obs = y_test$MCI)
knn_results2
## RMSE Rsquared MAE
## 0.6963655 0.8143327 0.3977580
The RMSE has actually decreased in this case from 0.72 to 0.70.
We find that using seven of the 16 variables for the banking sector and nine of the 16 for the insurance sector gives approximately an 80% R-squared with low RMSE in predicting the returns of those sectors. We conclude that macroeconomic variables can help predict the stock returns of both the banking and insurance sectors on the Dhaka Stock Exchange.
Further research could use different lags (here a 12-month lag was used to calculate returns) to test whether results improve and whether similar prediction levels can be reached with fewer variables.
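For example, with the illustrative yoy helper sketched earlier, a different return horizon is just a different lag argument:
#Alternative horizons (illustrative; x stands in for a raw monthly level series)
yoy(x, lag = 3) #quarterly change
yoy(x, lag = 6) #semi-annual change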
[1] Garcia, Valeriano F. and Liu, Lin (1999) [Macroeconomic Determinants of Stock Market Development](https://www.tandfonline.com/doi/abs/10.1080/15140326.1999.12040532)
[2] Nijam, H. M., Ismail, S. M. M. and Musthafa, A. M. M. (2015) [The Impact of Macro-Economic Variables on Stock Performance; Evidence from Sri Lanka](https://journals.co.za/doi/abs/10.10520/EJC-100b084600)
[3] Srivastava, A. (2010) *Relevance of Macro Economic Factors for the Indian Stock Market*
[4] Haque, Abdul and Sarwar, Suleman (2012) *Macro-Determinants of Stock Returns in Pakistan*
[5] Nurazi, Ridwan and Usman, Berto (2016) [Bank Stock Returns in Responding the Contribution of Fundamental and Macroeconomic Effects](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2822943)
[6] Okech, Timothy Chrispinus and Mugambi, Mike (2016) [Effect of Macroeconomic Variables on Stock Returns of Listed Commercial Banks in Kenya](http://erepo.usiu.ac.ke/handle/11732/3014)
[7] Bhattarai, Bishnu Prasad (2018) [The Firm Specific and Macroeconomic Variables Effects on Share Prices of Nepalese Commercial Banks and Insurance Companies](http://buscompress.com/uploads/3/4/9/8/34980536/riber_7-s3_01k18-152_1-11.pdf)
[8] Ali, Md. Bayezid (2011) [Impact of Micro and Macroeconomic Variables on Emerging Stock Market Return: A Case on Dhaka Stock Exchange (DSE)](https://d1wqtxts1xzle7.cloudfront.net/30907270/idjrbjournal00025.pdf)
[9] Mohiuddin, Md., Alam, Md. Didarul and Shahid, Abdullah Ibneyy (2008) [An Empirical Study of the Relationship between Macroeconomic Variables and Stock Price: A Study on Dhaka Stock Exchange](https://core.ac.uk/download/pdf/6505614.pdf)
[10] Hasan, Mostofa Mahmud (2010) [Sector-wise Stock Return Analysis: An Evidence from Dhaka Stock Exchange in Bangladesh](https://d1wqtxts1xzle7.cloudfront.net/46085986/10850-32538-1-PB.pdf)