1 Recap the basics of regression modelling in R

2 Read relevant literature

3 Learn (value relevance) modelling by example

A. Download and import the datasheet

  • All files are available at this shared folder

  • For this Tutorial, download the file “FU902-dataset-02” to your local computer

  • Set the working directory with setwd("C:/…/…/…/…"), pointing to the folder where the downloaded file is stored and where outputs will be exported (e.g., setwd("C:/Users/MyPC/Documents/R/Paper01"))

  • Import the downloaded file from the working directory (for how to import data in other formats, such as txt or csv, please refer to the R-Basics)

library(readxl)
dataset <- as.data.frame(read_excel("FU902-dataset-02.xlsx"))
attach(dataset)

B. Descriptive statistics

  1. Calculate the descriptive statistics for Revenue, Earnings, Total assets, Equity, Share return, Cash flows from operations, Earnings per share, Cash flows from operations per share, and Equity per share
library(psych)
describe(subset(dataset, select=c("Revenue", "Earnings", "Total assets","Equity", "Return", "CFO", "EPS","CFOPS","EQPS")))
##              vars   n        mean          sd     median    trimmed        mad
## Revenue         1 396  2890594.80  4668241.79 1066914.00 1814930.11 1519771.75
## Earnings        2 396   247448.28   423529.88  108989.50  197991.79  163160.87
## Total assets    3 396 15742613.96 35685555.49 3695848.00 8273928.19 5436429.56
## Equity          4 395  2425160.12  2825823.79 1440325.00 1940964.89 2068808.18
## Return          5 396        0.27        1.22       0.10       0.12       0.38
## CFO             6 396   530785.74  1082452.79  166278.00  401520.00  312070.99
## EPS             7 396        3.13        8.87       0.64       1.08       0.96
## CFOPS           8 396        4.40       15.55       0.79       1.80       1.41
## EQPS            9 396       18.18       36.61       6.65       9.51       8.81
##                      min          max        range skew kurtosis         se
## Revenue         -5754.00  28769966.00  28775720.00 2.90     9.51  234587.98
## Earnings     -1389358.00   2711539.00   4100897.00 1.16     5.75   21283.18
## Total assets        0.00 220659433.00 220659433.00 4.44    20.54 1793266.64
## Equity         -15923.00  13871914.00  13887837.00 1.47     2.02  142182.63
## Return             -0.84        16.07        16.91 8.79    97.46       0.06
## CFO          -3346000.00   6722000.00  10068000.00 1.63     7.34   54395.30
## EPS                -6.00        62.62        68.61 4.12    17.92       0.45
## CFOPS             -78.81       117.56       196.37 3.20    21.12       0.78
## EQPS               -0.87       319.24       320.11 4.37    23.64       1.84

C. Define models

  1. For each explanatory variable (EPS, CFOPS, EQPS), define a basic model that regresses returns on the level of the variable, and an extended model that regresses returns on both the level and the change for the period
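  • Model 1A: \[ R_{i,t} = \alpha + \beta_1 \mathit{EPS}_{i,t} + \varepsilon_{i,t} \]
  • Model 1B: \[ R_{i,t} = \alpha + \beta_1 \mathit{EPS}_{i,t} + \beta_2 \Delta\mathit{EPS}_{i,t} + \varepsilon_{i,t} \]
  • Model 2A: \[ R_{i,t} = \alpha + \beta_1 \mathit{CFOPS}_{i,t} + \varepsilon_{i,t} \]
  • Model 2B: \[ R_{i,t} = \alpha + \beta_1 \mathit{CFOPS}_{i,t} + \beta_2 \Delta\mathit{CFOPS}_{i,t} + \varepsilon_{i,t} \]
  • Model 3A: \[ R_{i,t} = \alpha + \beta_1 \mathit{EQPS}_{i,t} + \varepsilon_{i,t} \]
  • Model 3B: \[ R_{i,t} = \alpha + \beta_1 \mathit{EQPS}_{i,t} + \beta_2 \Delta\mathit{EQPS}_{i,t} + \varepsilon_{i,t} \]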
Model1a <- (Return ~ EPSd)
Model1b <- (Return ~ EPSd + dEPSd)
Model2a <- (Return ~ CFOPSd)
Model2b <- (Return ~ CFOPSd + dCFOPSd)
Model3a <- (Return ~ EQPSd)
Model3b <- (Return ~ EQPSd + dEQPSd)
  2. Define a combined model that regresses returns on the levels of all three variables in the basic model, and additionally on their changes for the period in the extended model
  • Model 4A: \[ R_{i,t} = \alpha + \beta_1 \mathit{EPS}_{i,t} + \beta_2 \mathit{CFOPS}_{i,t} + \beta_3 \mathit{EQPS}_{i,t} + \varepsilon_{i,t} \]
  • Model 4B: \[ R_{i,t} = \alpha + \beta_1 \mathit{EPS}_{i,t} + \beta_2 \Delta\mathit{EPS}_{i,t} + \beta_3 \mathit{CFOPS}_{i,t} + \beta_4 \Delta\mathit{CFOPS}_{i,t} + \beta_5 \mathit{EQPS}_{i,t} + \beta_6 \Delta\mathit{EQPS}_{i,t} + \varepsilon_{i,t} \]
Model4a <- (Return ~ EPSd + CFOPSd + EQPSd)
Model4b <- (Return ~ EPSd + dEPSd + CFOPSd + dCFOPSd + EQPSd + dEQPSd)
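The extended models rely on the change for the period, i.e. the value of a variable minus its value in the previous period of the same firm. The dataset already contains these columns (dEPSd, dCFOPSd, dEQPSd), so the following is only a minimal sketch of how such a column could be built in base R. The toy data frame and the column names firm, year, x and dx are hypothetical, and the sketch assumes that the years of each firm are consecutive.

```r
# Toy panel: two firms, three years each (hypothetical values)
toy <- data.frame(
  firm = rep(c("A", "B"), each = 3),
  year = rep(2001:2003, times = 2),
  x    = c(1.0, 1.5, 1.2, 4.0, 3.0, 5.0)
)

# Sort by firm and year so that the difference refers to the previous period
toy <- toy[order(toy$firm, toy$year), ]

# Change for the period within each firm; the first year of a firm has no
# predecessor and is therefore set to NA
toy$dx <- ave(toy$x, toy$firm, FUN = function(v) c(NA, diff(v)))

print(toy)
```

Computing the difference inside ave() keeps the last observation of one firm from leaking into the first observation of the next.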

D. Regression model

Suggested reading for the plm package

  1. Run the pooled OLS regression and the panel data regressions with firm Fixed Effects (FE) and with Random Effects (RE)
library(plm)
OLS1a <- plm(Model1a, dataset, model= "pooling")
summary(OLS1a)
## Pooling Model
## 
## Call:
## plm(formula = Model1a, data = dataset, model = "pooling")
## 
## Balanced Panel: n = 33, T = 12, N = 396
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -1.866087 -0.443718 -0.199199  0.093122 11.665962 
## 
## Coefficients:
##             Estimate Std. Error t-value  Pr(>|t|)    
## (Intercept) 0.168126   0.056482  2.9767  0.003094 ** 
## EPSd        2.372685   0.249498  9.5098 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    590.58
## Residual Sum of Squares: 480.33
## R-Squared:      0.18668
## Adj. R-Squared: 0.18462
## F-statistic: 90.4367 on 1 and 394 DF, p-value: < 2.22e-16
FE1a <- plm(Model1a, dataset, effect="individual", model= "within")
summary(FE1a)
## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = Model1a, data = dataset, effect = "individual", 
##     model = "within")
## 
## Balanced Panel: n = 33, T = 12, N = 396
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -2.881429 -0.337051 -0.081025  0.218799  9.339446 
## 
## Coefficients:
##      Estimate Std. Error t-value  Pr(>|t|)    
## EPSd  3.32292    0.25393  13.086 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    536.65
## Residual Sum of Squares: 364.32
## R-Squared:      0.32113
## Adj. R-Squared: 0.25924
## F-statistic: 171.239 on 1 and 362 DF, p-value: < 2.22e-16
RE1a <- plm(Model1a, dataset, model= "random")
summary(RE1a)
## Oneway (individual) effect Random Effect Model 
##    (Swamy-Arora's transformation)
## 
## Call:
## plm(formula = Model1a, data = dataset, model = "random")
## 
## Balanced Panel: n = 33, T = 12, N = 396
## 
## Effects:
##                   var std.dev share
## idiosyncratic 1.00640 1.00319  0.96
## individual    0.04144 0.20357  0.04
## theta: 0.1819
## 
## Residuals:
##     Min.  1st Qu.   Median  3rd Qu.     Max. 
## -1.96277 -0.42207 -0.17517  0.11799 11.25359 
## 
## Coefficients:
##             Estimate Std. Error z-value Pr(>|z|)    
## (Intercept) 0.156759   0.066146  2.3699  0.01779 *  
## EPSd        2.641152   0.248727 10.6187  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    572.75
## Residual Sum of Squares: 445.31
## R-Squared:      0.22251
## Adj. R-Squared: 0.22053
## Chisq: 112.757 on 1 DF, p-value: < 2.22e-16

In all three models, earnings per share are significantly associated with the share return, which speaks for the robustness of the result. The most appropriate model still has to be identified.
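The FE slope differs from the pooled one because the within estimator uses only the variation inside each firm. As a rough illustration of that mechanism, the base R sketch below simulates a small panel, removes the firm means by hand and confirms that the resulting slope matches a regression with firm dummies. The simulated data and all names (firm, alpha, x, y) are assumptions for illustration and are unrelated to the tutorial dataset.

```r
set.seed(1)

# Simulated balanced panel: 5 firms, 8 years (all names are hypothetical)
n_firms <- 5
n_years <- 8
firm  <- factor(rep(seq_len(n_firms), each = n_years))
alpha <- rep(rnorm(n_firms, sd = 2), each = n_years)  # firm-specific effect
x     <- alpha + rnorm(n_firms * n_years)             # regressor correlated with the effect
y     <- 1 + 0.5 * x + alpha + rnorm(n_firms * n_years, sd = 0.3)

# Pooled OLS ignores the firm effect
b_pooled <- coef(lm(y ~ x))["x"]

# Within estimator: subtract firm means, then regress without an intercept
y_w <- y - ave(y, firm)
x_w <- x - ave(x, firm)
b_within <- coef(lm(y_w ~ x_w - 1))["x_w"]

# Equivalent least-squares dummy-variable regression
b_lsdv <- coef(lm(y ~ x + firm))["x"]

print(c(pooled = unname(b_pooled),
        within = unname(b_within),
        lsdv   = unname(b_lsdv)))
```

The within and dummy-variable slopes coincide, while the pooled slope also absorbs the correlation between the regressor and the firm effect.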

  2. Selection tests among OLS, FE and RE
  • Lagrange Multiplier Test - (Breusch-Pagan) to decide between OLS and RE
plmtest(OLS1a, effect = "individual", type = "bp")
## 
##  Lagrange Multiplier Test - (Breusch-Pagan)
## 
## data:  Model1a
## chisq = 43.039, df = 1, p-value = 5.366e-11
## alternative hypothesis: significant effects
  • F Test for individual effects to decide between OLS and FE
pFtest (FE1a, OLS1a)
## 
##  F test for individual effects
## 
## data:  Model1a
## F = 3.6024, df1 = 32, df2 = 362, p-value = 1.474e-09
## alternative hypothesis: significant effects
  • Hausman Test to decide between RE and FE
phtest(Model1a, dataset)
## 
##  Hausman Test
## 
## data:  Model1a
## chisq = 177.63, df = 1, p-value < 2.2e-16
## alternative hypothesis: one model is inconsistent

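Note: The Lagrange Multiplier test and the F test both reject the null hypothesis of no individual effects (p-value < 0.05), so both the RE and the FE model are preferred to pooled OLS. The Hausman Test is therefore needed to decide between the two panel data approaches. As its p-value is less than 0.05, the null hypothesis that RE is consistent is rejected and FE is chosen.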
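The decision logic of the three tests can be written down as a small helper. This is only a sketch of the rule of thumb used in this tutorial: the function name choose_panel_model and its arguments are hypothetical, and the 5% level is the threshold assumed in the note above.

```r
# Hypothetical helper: pick a specification from the three test p-values
choose_panel_model <- function(p_lm, p_f, p_hausman, level = 0.05) {
  re_beats_ols <- p_lm < level   # Breusch-Pagan LM test rejects "no random effects"
  fe_beats_ols <- p_f < level    # F test rejects "no fixed effects"
  if (!re_beats_ols && !fe_beats_ols) return("pooled OLS")
  if (fe_beats_ols && !re_beats_ols) return("FE")
  if (re_beats_ols && !fe_beats_ols) return("RE")
  # Both panel models beat pooled OLS: the Hausman test decides
  if (p_hausman < level) "FE" else "RE"
}

# p-values reported for Model1a; the Hausman p-value is only known to be
# below 2.2e-16, so that bound is used here
choice <- choose_panel_model(p_lm = 5.366e-11, p_f = 1.474e-09, p_hausman = 2.2e-16)
print(choice)
```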

  3. Run diagnostic tests
  • Check for heteroskedasticity (Breusch-Pagan Test)
library(lmtest)
bptest(Model1a, data = dataset)
## 
##  studentized Breusch-Pagan test
## 
## data:  Model1a
## BP = 31.627, df = 1, p-value = 1.868e-08
  • Check for autocorrelation (Durbin-Watson Test)
dwtest(Model1a, data = dataset)
## 
##  Durbin-Watson test
## 
## data:  Model1a
## DW = 1.8728, p-value = 0.1028
## alternative hypothesis: true autocorrelation is greater than 0
  • Check for cross-sectional dependence (Pesaran CD Test)
pcdtest(Model1a, dataset, index = c("Company", "Year"))
## 
##  Pesaran CD test for cross-sectional dependence in panels
## 
## data:  Return ~ EPSd
## z = 28.694, p-value < 2.2e-16
## alternative hypothesis: cross-sectional dependence
  • Check for serial correlation in FE panels (Wooldridge’s Test)
pwartest(Model1a, dataset, effect = "individual")
## 
##  Wooldridge's test for serial correlation in FE panels
## 
## data:  plm.model
## F = 1.6332, df1 = 1, df2 = 361, p-value = 0.2021
## alternative hypothesis: serial correlation

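Note: The tests indicate heteroskedasticity (Breusch-Pagan Test, p-value < 0.05) and cross-sectional dependence (Pesaran CD Test, p-value < 0.05), but no serial correlation (Durbin-Watson and Wooldridge's tests, p-value > 0.05). The standard errors of the FE model should therefore be corrected with a robust covariance estimator. In this case, the Driscoll and Kraay estimator (also built into the plm package as vcovSCC) is the most suitable, as it is robust to both heteroskedasticity and cross-sectional dependence.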

coeftest(FE1a, vcov = vcovSCC)
## 
## t test of coefficients:
## 
##      Estimate Std. Error t value Pr(>|t|)  
## EPSd   3.3229     1.3677  2.4296   0.0156 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note: After the correction, EPSd is still significant, but only at the 5% level: the coefficient is unchanged, while the standard error rises from 0.254 to 1.368 and the p-value increases to 0.0156.
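To see how the choice of covariance estimator changes the standard errors while leaving the coefficient untouched, the sketch below fits an FE model on simulated data and compares conventional, firm-clustered (Arellano) and Driscoll-Kraay standard errors. The data, the variable names x and y and the strength of the simulated common shock are assumptions made for illustration only; this is not the tutorial dataset.

```r
library(plm)
library(lmtest)

set.seed(42)

# Simulated balanced panel (hypothetical data, not the tutorial dataset)
n_firms <- 30
n_years <- 12
sim <- data.frame(
  Company = rep(seq_len(n_firms), each = n_years),
  Year    = rep(seq_len(n_years), times = n_firms)
)
shock  <- rnorm(n_years)[sim$Year]     # common yearly shock: cross-sectional dependence
effect <- rnorm(n_firms)[sim$Company]  # firm-specific effect
sim$x  <- rnorm(nrow(sim)) + 0.5 * effect
# Error variance grows with |x|: heteroskedasticity
sim$y  <- 1 + 2 * sim$x + effect + shock + rnorm(nrow(sim)) * (1 + abs(sim$x))

fe <- plm(y ~ x, data = sim, index = c("Company", "Year"), model = "within")

# Same coefficient, three different covariance estimators
se_conventional <- coeftest(fe)["x", "Std. Error"]
se_clustered    <- coeftest(fe, vcov. = vcovHC(fe, method = "arellano",
                                               cluster = "group"))["x", "Std. Error"]
se_driscoll     <- coeftest(fe, vcov. = vcovSCC(fe))["x", "Std. Error"]

print(c(conventional   = se_conventional,
        clustered      = se_clustered,
        driscoll_kraay = se_driscoll))
```

Only the standard errors, and with them the t-values and p-values, depend on the estimator; the point estimate comes from the same FE fit in all three cases.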

Highly recommended reading (robust standard error estimators for panel models):

  4. Run the remaining models
OLS1b <- plm(Model1b, dataset, model= "pooling")
FE1b <- plm(Model1b, dataset, effect="individual", model= "within")
RE1b <- plm(Model1b, dataset, model= "random", random.method="swar")
OLS2a <- plm(Model2a, dataset, model= "pooling")
FE2a <- plm(Model2a, dataset, effect="individual", model= "within")
RE2a <- plm(Model2a, dataset, model= "random", random.method="swar")
OLS2b <- plm(Model2b, dataset, model= "pooling")
FE2b <- plm(Model2b, dataset, effect="individual", model= "within")
RE2b <- plm(Model2b, dataset, model= "random", random.method="swar")
OLS3a <- plm(Model3a, dataset, model= "pooling")
FE3a <- plm(Model3a, dataset, effect="individual", model= "within")
RE3a <- plm(Model3a, dataset, model= "random", random.method="swar")
OLS3b <- plm(Model3b, dataset, model= "pooling")
FE3b <- plm(Model3b, dataset, effect="individual", model= "within")
RE3b <- plm(Model3b, dataset, model= "random", random.method="swar")
OLS4a <- plm(Model4a, dataset, model= "pooling")
FE4a <- plm(Model4a, dataset, effect="individual", model= "within")
RE4a <- plm(Model4a, dataset, model= "random", random.method="swar")
OLS4b <- plm(Model4b, dataset, model= "pooling")
FE4b <- plm(Model4b, dataset, effect="individual", model= "within")
RE4b <- plm(Model4b, dataset, model= "random", random.method="swar")
  • Presentation of all OLS models
library(stargazer)
stargazer(OLS1a, OLS1b, type = "text", object.names = TRUE)
## 
## ============================================================
##                            Dependent variable:              
##              -----------------------------------------------
##                                  Return                     
##                        (1)                     (2)          
##                       OLS1a                   OLS1b         
## ------------------------------------------------------------
## EPSd                2.373***                1.578***        
##                      (0.249)                 (0.284)        
##                                                             
## dEPSd                                       1.479***        
##                                              (0.277)        
##                                                             
## Constant            0.168***                0.178***        
##                      (0.056)                 (0.055)        
##                                                             
## ------------------------------------------------------------
## Observations           396                     396          
## R2                    0.187                   0.242         
## Adjusted R2           0.185                   0.238         
## F Statistic  90.437*** (df = 1; 394) 62.584*** (df = 2; 393)
## ============================================================
## Note:                            *p<0.1; **p<0.05; ***p<0.01
stargazer(OLS2a, OLS2b, type = "text", object.names = TRUE)
## 
## ===========================================================
##                           Dependent variable:              
##              ----------------------------------------------
##                                  Return                    
##                        (1)                    (2)          
##                       OLS2a                  OLS2b         
## -----------------------------------------------------------
## CFOPSd              0.406***                 0.262*        
##                      (0.108)                (0.149)        
##                                                            
## dCFOPSd                                      0.237         
##                                             (0.168)        
##                                                            
## Constant            0.209***                0.231***       
##                      (0.063)                (0.065)        
##                                                            
## -----------------------------------------------------------
## Observations           396                    396          
## R2                    0.034                  0.039         
## Adjusted R2           0.032                  0.034         
## F Statistic  13.998*** (df = 1; 394) 8.011*** (df = 2; 393)
## ===========================================================
## Note:                           *p<0.1; **p<0.05; ***p<0.01
stargazer(OLS3a, OLS3b, type = "text", object.names = TRUE)
## 
## ====================================================
##                        Dependent variable:          
##              ---------------------------------------
##                              Return                 
##                      (1)                 (2)        
##                     OLS3a               OLS3b       
## ----------------------------------------------------
## EQPSd               0.032               0.032       
##                    (0.083)             (0.083)      
##                                                     
## dEQPSd                                 0.0005       
##                                        (0.056)      
##                                                     
## Constant           0.240**             0.240**      
##                    (0.095)             (0.095)      
##                                                     
## ----------------------------------------------------
## Observations         396                 396        
## R2                 0.0004              0.0004       
## Adjusted R2        -0.002              -0.005       
## F Statistic  0.153 (df = 1; 394) 0.076 (df = 2; 393)
## ====================================================
## Note:                    *p<0.1; **p<0.05; ***p<0.01
stargazer(OLS4a, OLS4b, type = "text", object.names = TRUE)
## 
## ============================================================
##                            Dependent variable:              
##              -----------------------------------------------
##                                  Return                     
##                        (1)                     (2)          
##                       OLS4a                   OLS4b         
## ------------------------------------------------------------
## EPSd                2.296***                1.580***        
##                      (0.255)                 (0.292)        
##                                                             
## dEPSd                                       1.517***        
##                                              (0.281)        
##                                                             
## CFOPSd               0.217**                 0.276**        
##                      (0.103)                 (0.138)        
##                                                             
## dCFOPSd                                      -0.081         
##                                              (0.156)        
##                                                             
## EQPSd                 0.073                   0.014         
##                      (0.076)                 (0.075)        
##                                                             
## dEQPSd                                      -0.116**        
##                                              (0.051)        
##                                                             
## Constant              0.076                   0.117         
##                      (0.087)                 (0.084)        
##                                                             
## ------------------------------------------------------------
## Observations           396                     396          
## R2                    0.200                   0.262         
## Adjusted R2           0.194                   0.251         
## F Statistic  32.634*** (df = 3; 392) 23.067*** (df = 6; 389)
## ============================================================
## Note:                            *p<0.1; **p<0.05; ***p<0.01

Note: The selection and diagnostic tests (presented above only for Model1a) should be run for each model!
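Repeating the selection tests for every model by hand is error-prone, so one option is to wrap them in a function and apply it to a list of formulas. The following is a minimal sketch on simulated data: the helper selection_tests, the data frame sim and the variables x1, x2 and y are hypothetical, and with the tutorial data the list would instead hold Model1a to Model4b.

```r
library(plm)

set.seed(7)

# Simulated balanced panel (hypothetical data and variable names)
n_firms <- 30
n_years <- 10
sim <- data.frame(
  Company = rep(seq_len(n_firms), each = n_years),
  Year    = rep(seq_len(n_years), times = n_firms)
)
effect <- rnorm(n_firms)[sim$Company]
sim$x1 <- rnorm(nrow(sim)) + effect
sim$x2 <- rnorm(nrow(sim))
sim$y  <- 0.5 * sim$x1 - 0.2 * sim$x2 + 2 * effect + rnorm(nrow(sim))

models <- list(
  ModelA = y ~ x1,
  ModelB = y ~ x1 + x2
)

# Hypothetical helper: p-values of the three selection tests for one formula
selection_tests <- function(formula, data, index) {
  pooled <- plm(formula, data = data, index = index, model = "pooling")
  fe     <- plm(formula, data = data, index = index, model = "within")
  re     <- plm(formula, data = data, index = index, model = "random")
  c(
    lm_bp   = unname(plmtest(pooled, effect = "individual", type = "bp")$p.value),
    f_test  = unname(pFtest(fe, pooled)$p.value),
    hausman = unname(phtest(fe, re)$p.value)
  )
}

# One row per model, one column per test
results <- t(sapply(models, selection_tests,
                    data = sim, index = c("Company", "Year")))
print(round(results, 4))
```

The diagnostic tests could be collected in the same way by adding their p-values to the vector returned by the helper.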