LOGISTIC REGRESSION.

  • Fit a logistic regression model to predict Direction using Lag1 through Lag5 and Volume.
glm.fit = glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
              data = Smarket, family = binomial)
  • Print a summary of the fitted model.
summary(glm.fit)

Call:
glm(formula = Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + 
    Volume, family = binomial, data = Smarket)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.446  -1.203   1.065   1.145   1.326  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.126000   0.240736  -0.523    0.601
Lag1        -0.073074   0.050167  -1.457    0.145
Lag2        -0.042301   0.050086  -0.845    0.398
Lag3         0.011085   0.049939   0.222    0.824
Lag4         0.009359   0.049974   0.187    0.851
Lag5         0.010313   0.049511   0.208    0.835
Volume       0.135441   0.158360   0.855    0.392

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1731.2  on 1249  degrees of freedom
Residual deviance: 1727.6  on 1243  degrees of freedom
AIC: 1741.6

Number of Fisher Scoring iterations: 3
  • The smallest p-value here is associated with Lag1. The negative coefficient for this predictor suggests that if the market had a positive return yesterday, then it is less likely to go up today. However, at a value of 0.15, the p-value is still relatively large, and so there is no clear evidence of a real association between Lag1 and Direction.
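  • The numbers in the summary output are related to one another in simple ways. The sketch below recomputes a few of them from values copied out of the table above; the variable names are illustrative only, and the results agree with the printed output only up to rounding.

```r
# Values copied from the Lag1 row of the coefficient table above.
est <- -0.073074   # coefficient estimate (log-odds scale)
se  <-  0.050167   # standard error

z <- est / se                  # Wald z statistic
p <- 2 * pnorm(-abs(z))        # two-sided p-value under a standard normal
odds_ratio <- exp(est)         # multiplicative change in the odds of Up per unit of Lag1

# Deviances copied from the bottom of the summary.
null_dev  <- 1731.2            # on 1249 degrees of freedom
resid_dev <- 1727.6            # on 1243 degrees of freedom
n_coef    <- 7                 # intercept plus six predictors

aic  <- resid_dev + 2 * n_coef                       # AIC for this binomial fit
lr_p <- pchisq(null_dev - resid_dev, df = 1249 - 1243,
               lower.tail = FALSE)                   # likelihood-ratio test of all six predictors

round(c(z = z, p = p, odds_ratio = odds_ratio, aic = aic, lr_p = lr_p), 4)
```

  • The large likelihood-ratio p-value points the same way as the individual z tests: taken together, the six predictors add little beyond the intercept.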
  • Use coef() to extract the coefficients of the fitted model.
coef(glm.fit)
 (Intercept)         Lag1         Lag2         Lag3         Lag4         Lag5       Volume 
-0.126000257 -0.073073746 -0.042301344  0.011085108  0.009358938  0.010313068  0.135440659 
  • We can also use the summary() function to access particular aspects of the fitted model, such as the p-values for the coefficients.
summary(glm.fit)$coef
                Estimate Std. Error    z value  Pr(>|z|)
(Intercept) -0.126000257 0.24073574 -0.5233966 0.6006983
Lag1        -0.073073746 0.05016739 -1.4565986 0.1452272
Lag2        -0.042301344 0.05008605 -0.8445733 0.3983491
Lag3         0.011085108 0.04993854  0.2219750 0.8243333
Lag4         0.009358938 0.04997413  0.1872757 0.8514445
Lag5         0.010313068 0.04951146  0.2082966 0.8349974
Volume       0.135440659 0.15835970  0.8552723 0.3924004
summary(glm.fit)$coef[, 4]
(Intercept)        Lag1        Lag2        Lag3        Lag4        Lag5      Volume 
  0.6006983   0.1452272   0.3983491   0.8243333   0.8514445   0.8349974   0.3924004 
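  • The predict() function can be used to predict the probability that the market will go up, given values of the predictors. With type = "response" it returns probabilities rather than log-odds, and when no new data are supplied it computes them for the training data.
glm.probs = predict(glm.fit, type = "response")
glm.probs[1:10]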
        1         2         3         4         5         6         7         8         9 
0.5070841 0.4814679 0.4811388 0.5152224 0.5107812 0.5069565 0.4926509 0.5092292 0.5176135 
       10 
0.4888378 
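  • The contrasts() function shows that R has created a dummy variable with a 1 for Up, so the values above are probabilities of the market going up rather than down.
contrasts(Direction)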
     Up
Down  0
Up    1
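  • The relationship between the two prediction scales can be checked directly. The sketch below uses simulated stand-in data (x, y and fit are hypothetical names, not the Smarket variables) to show that the type = "response" predictions are the inverse logit of the default log-odds predictions, and that they refer to the level coded 1.

```r
set.seed(1)
# Simulated stand-in data; x and y are hypothetical, not the Smarket variables.
x <- rnorm(200)
y <- factor(ifelse(runif(200) < plogis(0.5 * x), "Up", "Down"))  # levels: Down, Up

fit <- glm(y ~ x, family = binomial)

log_odds <- predict(fit)                     # default type = "link": log-odds of the level coded 1
probs    <- predict(fit, type = "response")  # fitted probability of that level

# The response scale is the inverse logit (plogis) of the link scale.
all.equal(unname(probs), unname(plogis(log_odds)))

contrasts(y)   # shows which level is coded 1
```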
  • To predict whether the market will go up or down on a particular day, we must convert these predicted probabilities into class labels, Up or Down.
glm.pred = rep("Down", 1250)     # start with every day labelled Down
glm.pred[glm.probs > .5] = "Up"  # relabel days whose predicted probability exceeds 0.5
  • The two commands above create a vector of class predictions based on whether the predicted probability of a market increase is greater than or less than 0.5.
  • The table() function can be used to produce a confusion matrix in order to determine how many observations were correctly or incorrectly classified.
table(glm.pred, Direction)
        Direction
glm.pred Down  Up
    Down  145 141
    Up    457 507
(507 + 145) / 1250
[1] 0.5216
mean(glm.pred == Direction)
[1] 0.5216
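  • The training accuracy can be read straight off the confusion matrix: correct predictions lie on the diagonal. The sketch below rebuilds the matrix from the counts printed above (cm, accuracy and train_error are illustrative names) and computes the accuracy and the training error rate.

```r
# Counts copied from the confusion matrix above (rows: prediction, columns: truth).
cm <- matrix(c(145, 457, 141, 507), nrow = 2,
             dimnames = list(pred = c("Down", "Up"), truth = c("Down", "Up")))

accuracy    <- sum(diag(cm)) / sum(cm)   # (145 + 507) / 1250
train_error <- 1 - accuracy              # share of days classified incorrectly

c(accuracy = accuracy, train_error = train_error)
```

  • Because the same 1,250 observations were used both to fit and to evaluate the model, this training error rate tends to underestimate the error on new data.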
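  • To assess the model on data that were not used to fit it, we create a vector marking the observations before 2005 and use it to hold out the 2005 observations.
train = (Year < 2005)
Smarket.2005 = Smarket[!train, ]
dim(Smarket.2005)
[1] 252   9
Direction.2005 = Direction[!train]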
  • Boolean vectors can be used to obtain a subset of the rows or columns of a matrix.
Smarket[train, ]
  • Smarket[!train, ] yields a submatrix of the stock market data containing only the observations for which train is FALSE, i.e., the observations with dates in 2005.
Smarket[!train,]
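  • The same logical-indexing pattern can be tried on a small example. The sketch below uses a made-up data frame (toy and its values are hypothetical; only the column names mirror Smarket) to split rows into a training part and a held-out part.

```r
# Toy data frame; the column names mirror Smarket but the values are made up.
toy <- data.frame(Year = c(2003, 2004, 2004, 2005, 2005),
                  Lag1 = c(0.4, -0.2, 1.1, -0.6, 0.3))

train_toy <- toy$Year < 2005     # logical vector: TRUE for rows to train on
toy_train <- toy[train_toy, ]    # rows where train_toy is TRUE
toy_test  <- toy[!train_toy, ]   # rows where train_toy is FALSE

# Every row lands in exactly one of the two parts.
nrow(toy_train) + nrow(toy_test) == nrow(toy)
```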
  • We now fit a logistic regression model using only the observations with dates before 2005, via the subset argument. We then obtain predicted probabilities of the stock market going up for each of the days in our test set, that is, for the days in 2005.
glm.fit = glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
              data = Smarket, family = binomial, subset = train)
glm.probs = predict(glm.fit, Smarket.2005, type = "response")
  • We then compute the predictions for 2005 and compare them to the actual movements of the market over that time period.
glm.pred = rep("Down", 252)
glm.pred[glm.probs > .5] = "Up"
table(glm.pred, Direction.2005)
        Direction.2005
glm.pred Down Up
    Down   77 97
    Up     34 44
mean(glm.pred == Direction.2005)
[1] 0.4801587
mean(glm.pred != Direction.2005)
[1] 0.5198413
  • The test error rate is about 52%, which is worse than random guessing.
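  • Comparisons with == match labels as exact strings, so the predicted labels must agree with the levels of the response character for character. The sketch below uses made-up probabilities and outcomes (probs, actual and pred are hypothetical) to show one way of guarding against mismatched labels, by building the predictions with ifelse() and giving them the same level set as the truth.

```r
probs  <- c(0.52, 0.48, 0.51, 0.47)                 # made-up probabilities
actual <- factor(c("Up", "Down", "Down", "Down"))   # made-up outcomes

pred <- ifelse(probs > 0.5, "Up", "Down")           # threshold at 0.5
pred <- factor(pred, levels = levels(actual))       # same level set as the truth

mean(pred == actual)    # proportion of correct predictions
"Down " == "Down"       # FALSE: a stray space makes two labels differ
```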
  • Below we have refit the logistic regression using just Lag1 and Lag2, which seemed to have the highest predictive power in the original logistic regression model.
glm.fit = glm(Direction ~ Lag1 + Lag2, data = Smarket, family = binomial,
              subset = train)
glm.probs = predict(glm.fit, Smarket.2005, type = "response")
glm.pred = rep("Down", 252)
glm.pred[glm.probs > .5] = "Up"
table(glm.pred, Direction.2005)
        Direction.2005
glm.pred Down  Up
    Down   35  35
    Up     76 106
mean(glm.pred == Direction.2005)
[1] 0.5595238
106/(106+76)
[1] 0.5824176
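  • The results appear to be a little better: 56% of the daily movements are correctly predicted. However, the much simpler strategy of predicting that the market will increase every day is also correct 56% of the time, so the overall accuracy is no better than this naive baseline. The confusion matrix does show that on days when the model predicts an increase, it is right 58% of the time (106/182).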
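  • The comparison with the naive strategy can be reproduced from the counts in the confusion matrix. The sketch below rebuilds that matrix (cm and the three result names are illustrative) and computes the model's accuracy, the accuracy of always predicting Up, and the accuracy on days when the model predicts Up.

```r
# Counts copied from the Lag1 + Lag2 confusion matrix above
# (rows: prediction, columns: truth).
cm <- matrix(c(35, 76, 35, 106), nrow = 2,
             dimnames = list(pred = c("Down", "Up"), truth = c("Down", "Up")))

model_accuracy <- sum(diag(cm)) / sum(cm)            # (35 + 106) / 252
always_up      <- sum(cm[, "Up"]) / sum(cm)          # share of 2005 days on which the market rose
precision_up   <- cm["Up", "Up"] / sum(cm["Up", ])   # accuracy on days predicted Up

round(c(model = model_accuracy, always_up = always_up, precision_up = precision_up), 4)
```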
  • Suppose we want predictions for particular values of Lag1 and Lag2. In particular, we want to predict Direction on a day when Lag1 and Lag2 equal 1.2 and 1.1, respectively, and on a day when they equal 1.5 and −0.8. We do this using the predict() function.
predict(glm.fit, newdata = data.frame(Lag1 = c(1.2, 1.5),
                                      Lag2 = c(1.1, -0.8)), type = "response")
        1         2 
0.4791462 0.4960939 
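  • What predict() does with newdata can be verified by hand. The sketch below uses simulated stand-in data (d, x1, x2, y and fit are hypothetical names, not the Smarket model) to show that the predicted probability is the inverse logit of the intercept plus each coefficient times the corresponding new value, and that the columns of newdata must carry the same names as the predictors.

```r
set.seed(2)
# Simulated stand-in data; x1, x2 and y are hypothetical names.
d <- data.frame(x1 = rnorm(300), x2 = rnorm(300))
d$y <- rbinom(300, 1, plogis(-0.2 * d$x1 + 0.1 * d$x2))

fit <- glm(y ~ x1 + x2, data = d, family = binomial)

# Column names in newdata must match the predictor names in the formula.
new <- data.frame(x1 = c(1.2, 1.5), x2 = c(1.1, -0.8))
by_predict <- predict(fit, newdata = new, type = "response")

# The same probabilities computed by hand from the coefficients.
b <- coef(fit)
by_hand <- plogis(b["(Intercept)"] + b["x1"] * new$x1 + b["x2"] * new$x2)

all.equal(unname(by_predict), unname(by_hand))
```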