Using R, build a regression model for data that interests you. Conduct residual analysis. Was the linear model appropriate? Why or why not?

Data

Data source: Dr. Oguz Akbilgic (oguzakbilgic ‘@’ gmail.com), University of Tennessee, Knoxville

The data were collected from imkb.gov.tr and finance.yahoo.com and are organized by working days on the Istanbul Stock Exchange.

The variables are stock exchange returns: the Istanbul Stock Exchange National 100 index (ISE), the Standard & Poor’s 500 return index (SP), the stock market return indexes of Germany (DAX), the UK (FTSE), Japan (NIKKEI), and Brazil (BOVESPA), the MSCI European index (EU), and the MSCI emerging markets index (EM).

I will use this dataset to see how well the Istanbul Stock Exchange return can be predicted by a linear function of the other seven indexes: the Standard & Poor’s 500 return index, the stock market return indexes of Germany, the UK, Japan, and Brazil, the MSCI European index, and the MSCI emerging markets index.

stock <- read.csv("https://raw.githubusercontent.com/YunMai-SPS/DATA605_homework/master/data605_week11/data_akbilgic.csv?token=AX_Wu01wPi1P-ODNoTC5AFvE77jDTKZqks5aEdUfwA%3D%3D")

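# Treat every column as text so the real header row can be promoted
stock <- data.frame(lapply(stock, as.character), stringsAsFactors = FALSE)

# The first data row holds the column names: use it as the header, then drop it
colnames(stock) <- as.character(stock[1, ])
stock <- stock[-1, ]
# Convert every column except the first (the date) to numeric
stock[, -1] <- data.frame(lapply(stock[, -1], as.numeric))
# Give the two ISE columns distinct names
colnames(stock)[2] <- "ISE_TL"
colnames(stock)[3] <- "ISE_US"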

Model: use backward elimination to determine which predictors to keep in the model.

stock.lm <- lm(ISE_US ~ SP+DAX+FTSE+NIKKEI+BOVESPA+EU+EM, data=stock)
summary(stock.lm)
## 
## Call:
## lm(formula = ISE_US ~ SP + DAX + FTSE + NIKKEI + BOVESPA + EU + 
##     EM, data = stock)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.054156 -0.007969  0.000542  0.007853  0.051871 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.0005485  0.0005963   0.920 0.358106    
## SP           0.0548698  0.0705293   0.778 0.436934    
## DAX         -0.2126757  0.1204698  -1.765 0.078077 .  
## FTSE        -0.2059296  0.1520761  -1.354 0.176277    
## NIKKEI       0.0408475  0.0511000   0.799 0.424439    
## BOVESPA     -0.2440132  0.0665978  -3.664 0.000273 ***
## EU           1.0917300  0.2141996   5.097 4.82e-07 ***
## EM           0.9923386  0.1126628   8.808  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01371 on 528 degrees of freedom
## Multiple R-squared:  0.5843, Adjusted R-squared:  0.5788 
## F-statistic:   106 on 7 and 528 DF,  p-value: < 2.2e-16

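Remove the non-significant predictors (p-value above 0.05) one at a time, starting with the predictor that has the largest p-value, and refit the model after each removal until every remaining predictor is significant.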

stock.lm <- update(stock.lm, .~. - SP, data = stock)
summary(stock.lm)
## 
## Call:
## lm(formula = ISE_US ~ DAX + FTSE + NIKKEI + BOVESPA + EU + EM, 
##     data = stock)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.054798 -0.007957  0.000362  0.007859  0.052032 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.0005545  0.0005960   0.930 0.352660    
## DAX         -0.1949370  0.1182482  -1.649 0.099835 .  
## FTSE        -0.1993698  0.1517855  -1.313 0.189585    
## NIKKEI       0.0418392  0.0510651   0.819 0.412967    
## BOVESPA     -0.2163260  0.0562702  -3.844 0.000136 ***
## EU           1.0961353  0.2140448   5.121 4.26e-07 ***
## EM           0.9761992  0.1106950   8.819  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0137 on 529 degrees of freedom
## Multiple R-squared:  0.5838, Adjusted R-squared:  0.5791 
## F-statistic: 123.7 on 6 and 529 DF,  p-value: < 2.2e-16
stock.lm <- update(stock.lm, .~. - FTSE, data = stock)
summary(stock.lm)
## 
## Call:
## lm(formula = ISE_US ~ DAX + NIKKEI + BOVESPA + EU + EM, data = stock)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.055500 -0.007924  0.000430  0.007781  0.053103 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.0005374  0.0005963   0.901 0.367847    
## DAX         -0.1650660  0.1161202  -1.422 0.155757    
## NIKKEI       0.0469994  0.0509485   0.922 0.356694    
## BOVESPA     -0.2157796  0.0563072  -3.832 0.000142 ***
## EU           0.8834953  0.1401219   6.305 6.07e-10 ***
## EM           0.9673656  0.1105661   8.749  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01371 on 530 degrees of freedom
## Multiple R-squared:  0.5824, Adjusted R-squared:  0.5785 
## F-statistic: 147.9 on 5 and 530 DF,  p-value: < 2.2e-16
stock.lm <- update(stock.lm, .~. - NIKKEI, data = stock)
summary(stock.lm)
## 
## Call:
## lm(formula = ISE_US ~ DAX + BOVESPA + EU + EM, data = stock)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.055814 -0.007854  0.000290  0.008038  0.052443 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.0005149  0.0005957   0.864    0.388    
## DAX         -0.1653683  0.1161034  -1.424    0.155    
## BOVESPA     -0.2315351  0.0536468  -4.316 1.90e-05 ***
## EU           0.8769010  0.1399198   6.267 7.61e-10 ***
## EM           1.0261241  0.0903628  11.356  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01371 on 531 degrees of freedom
## Multiple R-squared:  0.5818, Adjusted R-squared:  0.5786 
## F-statistic: 184.7 on 4 and 531 DF,  p-value: < 2.2e-16
stock.lm <- update(stock.lm, .~. -DAX, data = stock)
summary(stock.lm)
## 
## Call:
## lm(formula = ISE_US ~ BOVESPA + EU + EM, data = stock)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.054666 -0.007918  0.000285  0.007882  0.051979 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.0004758  0.0005956   0.799    0.425    
## BOVESPA     -0.2335540  0.0536799  -4.351 1.63e-05 ***
## EU           0.7024443  0.0677052  10.375  < 2e-16 ***
## EM           1.0303495  0.0904013  11.398  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01372 on 532 degrees of freedom
## Multiple R-squared:  0.5802, Adjusted R-squared:  0.5778 
## F-statistic: 245.1 on 3 and 532 DF,  p-value: < 2.2e-16
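
The manual elimination steps above can also be sketched as a loop. The helper `backward_eliminate()` below is a hypothetical function written for illustration, not part of base R, and the data frame `sim` is simulated stand-in data so that the block runs on its own. The helper always drops the predictor with the largest p-value, so the order of removals may differ from the manual run (which drops FTSE before NIKKEI). It assumes numeric predictors whose coefficient names match their term names.

```r
# Backward elimination by p-value: repeatedly drop the least significant
# predictor until every remaining predictor has a p-value <= alpha.
# (Hypothetical helper for illustration; assumes numeric predictors only.)
backward_eliminate <- function(model, alpha = 0.05) {
  repeat {
    coefs <- summary(model)$coefficients
    # Leave the intercept out of the candidates for removal
    pvals <- coefs[rownames(coefs) != "(Intercept)", "Pr(>|t|)", drop = FALSE]
    if (nrow(pvals) == 0 || max(pvals) <= alpha) break
    worst <- rownames(pvals)[which.max(pvals)]
    # Refit the model without the least significant predictor
    model <- update(model, as.formula(paste(". ~ . -", worst)))
  }
  model
}

# Simulated stand-in data: y depends on x1 and x2, while x3 is pure noise
set.seed(1)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 2 * sim$x1 - 1.5 * sim$x2 + rnorm(n)

full <- lm(y ~ x1 + x2 + x3, data = sim)
reduced <- backward_eliminate(full)
formula(reduced)
```

Applied to the stock data, the equivalent call would be `backward_eliminate(lm(ISE_US ~ SP + DAX + FTSE + NIKKEI + BOVESPA + EU + EM, data = stock))`.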

The final model, where ISE is the response ISE_US:

\(ISE = 0.0004758 - 0.2335540 \times BOVESPA + 0.7024443 \times EU + 1.0303495 \times EM\)
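
As a quick illustration of how the fitted equation is used, the sketch below evaluates it by hand. The coefficients are copied from the final model summary. The three input returns are hypothetical values chosen for illustration and are not taken from the dataset.

```r
# Coefficients copied from the final model summary
b <- c(intercept = 0.0004758, BOVESPA = -0.2335540,
       EU = 0.7024443, EM = 1.0303495)

# Hypothetical same-day index returns (illustrative only, not from the data)
x <- c(BOVESPA = 0.01, EU = 0.01, EM = 0.01)

# Predicted ISE return: intercept plus the sum of coefficient * predictor
ise_hat <- b[["intercept"]] + sum(b[names(x)] * x)
ise_hat
```

With the fitted object in memory, `predict(stock.lm, newdata = data.frame(BOVESPA = 0.01, EU = 0.01, EM = 0.01))` should give the same value up to the rounding of the printed coefficients.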

Residual Analysis

plot(fitted(stock.lm),resid(stock.lm))
abline(0, 0) 

The plot above shows that the residuals are scattered fairly uniformly about zero, so it gives no indication that the linear model is invalid. Next, draw the Q-Q plot to check whether the residuals are normally distributed.

qqnorm(resid(stock.lm))
qqline(resid(stock.lm))

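The Q-Q plot shows that the residuals roughly follow the reference line, so they are approximately normally distributed. Taken together, the two plots suggest that the linear model is appropriate for these data.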
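
The visual checks can be complemented with formal tests. The sketch below is one possible approach, not part of the original analysis: a Shapiro-Wilk test for normality and, because the observations are ordered by trading day, a Ljung-Box test for autocorrelation in the residuals. The helper `residual_checks()` and the default of 10 lags are assumptions made for illustration, and the demo uses simulated stand-in data so that the block runs on its own.

```r
# Hypothetical helper: p-values of two residual diagnostics for a fitted lm
residual_checks <- function(model, lag = 10) {
  r <- resid(model)
  list(
    # H0: the residuals come from a normal distribution
    shapiro_p = shapiro.test(r)$p.value,
    # H0: no autocorrelation in the residuals up to `lag` lags
    ljung_box_p = Box.test(r, lag = lag, type = "Ljung-Box")$p.value
  )
}

# Simulated stand-in data with independent normal errors
set.seed(42)
sim <- data.frame(x = rnorm(300))
sim$y <- 0.5 * sim$x + rnorm(300, sd = 0.1)

checks <- residual_checks(lm(y ~ x, data = sim))
checks
```

For the stock model the call would be `residual_checks(stock.lm)`. A small p-value in either test would indicate a departure from the corresponding assumption that the plots alone may not reveal.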

References:

Paper: Akbilgic, O., Bozdogan, H., Balaban, M.E. (2013). A novel Hybrid RBF Neural Networks model as a forecaster. Statistics and Computing. DOI 10.1007/s11222-013-9375-7

PhD Thesis: Akbilgic, O. (2011). Hibrit Radyal Tabanlı Fonksiyon Ağları ile Değişken Seçimi ve Tahminleme: Menkul Kıymet Yatırım Kararlarına İlişkin Bir Uygulama. Istanbul University