1 Recap the basics of the regression modelling in R

FU902: Seminar 02

2 Read relevant literature

Ball, R., & Brown, P. (1968). An empirical evaluation of accounting income numbers. Journal of Accounting Research, 6(2), 159–178. https://doi.org/10.2307/2490232
Beaver, W. H. (1968). The information content of annual earnings announcements. Journal of Accounting Research, 6, 67–92. https://doi.org/10.2307/2490070
Lev, B. (1989). On the usefulness of earnings and earnings research: lessons and directions from two decades of empirical research. Journal of Accounting Research, 27, 153–192. https://doi.org/10.2307/2491070
Easton, P. D., & Harris, T. S. (1991). Earnings as an explanatory variable for returns. Journal of Accounting Research, 29(1), 19–36. https://doi.org/10.2307/2491026
Ohlson, J. A. (1995). Earnings, book values, and dividends in equity valuation. Contemporary Accounting Research, 11(2), 661–687. https://doi.org/10.1111/j.1911-3846.1995.tb00461.x
Feltham, G. A., & Ohlson, J. A. (1995). Valuation and clean surplus accounting for operating and financial activities. Contemporary Accounting Research, 11(2), 689–731. https://doi.org/10.1111/j.1911-3846.1995.tb00462.x
Ball, R., & Shivakumar, L. (2008). How much new information is there in earnings? Journal of Accounting Research, 46(5), 975–1016. https://doi.org/10.1111/j.1475-679X.2008.00299.x
Beisland, L. A. (2009). A review of the value relevance literature. The Open Business Journal, 2(1). https://doi.org/10.2174/1874915100902010007
Filip, A., & Raffournier, B. (2010). The value relevance of earnings in a transition economy: the case of Romania. The International Journal of Accounting, 45(1), 77–103. https://doi.org/10.1016/j.intacc.2010.01.004

3 Learn (value relevance) modelling by example

A. Download and import the datasheet

All files are available in the shared folder
For this tutorial, download the file "FU902-dataset-02" to your local computer
Set the working directory with setwd("C:/…/…/…/…"), pointing to the folder where the downloaded file is stored and where outputs will be exported, e.g., setwd("C:/Users/MyPC/Documents/R/Paper01")
Import the downloaded file from the working directory (for importing data in other formats, such as txt or csv, please refer to the R-Basics)

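library(readxl)  # provides read_excel()
# Read the workbook and convert the result to a plain data frame
dataset <- as.data.frame(read_excel("FU902-dataset-02.xlsx"))
# Make the columns accessible by name
attach(dataset)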

B. Descriptive statistics

Calculate the descriptive statistics for Revenue, Earnings, Total assets, Equity, Share return, Cash flows from operations (CFO), Earnings per share (EPS), Cash flows from operations per share (CFOPS), and Equity per share (EQPS)

library(psych)
# describe() is used because no grouping variable is supplied
describe(subset(dataset, select = c("Revenue", "Earnings", "Total assets", "Equity", "Return", "CFO", "EPS", "CFOPS", "EQPS")))

##              vars   n        mean          sd     median    trimmed        mad
## Revenue         1 396  2890594.80  4668241.79 1066914.00 1814930.11 1519771.75
## Earnings        2 396   247448.28   423529.88  108989.50  197991.79  163160.87
## Total assets    3 396 15742613.96 35685555.49 3695848.00 8273928.19 5436429.56
## Equity          4 395  2425160.12  2825823.79 1440325.00 1940964.89 2068808.18
## Return          5 396        0.27        1.22       0.10       0.12       0.38
## CFO             6 396   530785.74  1082452.79  166278.00  401520.00  312070.99
## EPS             7 396        3.13        8.87       0.64       1.08       0.96
## CFOPS           8 396        4.40       15.55       0.79       1.80       1.41
## EQPS            9 396       18.18       36.61       6.65       9.51       8.81
##                      min          max        range skew kurtosis         se
## Revenue         -5754.00  28769966.00  28775720.00 2.90     9.51  234587.98
## Earnings     -1389358.00   2711539.00   4100897.00 1.16     5.75   21283.18
## Total assets        0.00 220659433.00 220659433.00 4.44    20.54 1793266.64
## Equity         -15923.00  13871914.00  13887837.00 1.47     2.02  142182.63
## Return             -0.84        16.07        16.91 8.79    97.46       0.06
## CFO          -3346000.00   6722000.00  10068000.00 1.63     7.34   54395.30
## EPS                -6.00        62.62        68.61 4.12    17.92       0.45
## CFOPS             -78.81       117.56       196.37 3.20    21.12       0.78
## EQPS               -0.87       319.24       320.11 4.37    23.64       1.84

C. Define models

For each explanatory variable (EPS, CFOPS, EQPS), define two models: a basic model that regresses returns on the level of the variable, and an extended model that regresses returns on both the level and the change for the period

Model 1A: \[ R_{i,t} = \alpha + \beta_1 EPS_{i,t} + \varepsilon_{i,t} \]
Model 1B: \[ R_{i,t} = \alpha + \beta_1 EPS_{i,t} + \beta_2 \Delta EPS_{i,t} + \varepsilon_{i,t} \]
Model 2A: \[ R_{i,t} = \alpha + \beta_1 CFOPS_{i,t} + \varepsilon_{i,t} \]
Model 2B: \[ R_{i,t} = \alpha + \beta_1 CFOPS_{i,t} + \beta_2 \Delta CFOPS_{i,t} + \varepsilon_{i,t} \]
Model 3A: \[ R_{i,t} = \alpha + \beta_1 EQPS_{i,t} + \varepsilon_{i,t} \]
Model 3B: \[ R_{i,t} = \alpha + \beta_1 EQPS_{i,t} + \beta_2 \Delta EQPS_{i,t} + \varepsilon_{i,t} \]

Model1a <- (Return ~ EPSd)
Model1b <- (Return ~ EPSd + dEPSd)
Model2a <- (Return ~ CFOPSd)
Model2b <- (Return ~ CFOPSd + dCFOPSd)
Model3a <- (Return ~ EQPSd)
Model3b <- (Return ~ EQPSd + dEQPSd)
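The change-for-the-period variables (dEPSd, dCFOPSd, dEQPSd) are taken ready-made from the dataset. As a minimal sketch of how such a within-firm first difference could be computed, assuming a long-format panel with one row per firm and year; the data frame `toy` and its columns `firm`, `year`, `x` are hypothetical and not part of FU902-dataset-02:

```r
# Hypothetical toy panel: 2 firms, 3 years each (not the seminar data)
toy <- data.frame(
  firm = rep(c("A", "B"), each = 3),
  year = rep(2001:2003, times = 2),
  x    = c(1.0, 1.5, 1.2, 4.0, 3.0, 5.0)
)

# Sort by firm and year so that differences follow time order
toy <- toy[order(toy$firm, toy$year), ]

# First difference within each firm; the first year of a firm has no
# previous value, so its change is NA
toy$dx <- ave(toy$x, toy$firm, FUN = function(v) c(NA, diff(v)))

print(toy)
```

Differencing within each firm (rather than over the whole column) prevents the first year of one firm from being compared with the last year of the previous firm.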

Define combined models that regress returns on the levels of all three variables (basic model) and, in addition, on their changes for the period (extended model)

Model 4A: \[ R_{i,t} = \alpha + \beta_1 EPS_{i,t} + \beta_2 CFOPS_{i,t} + \beta_3 EQPS_{i,t} + \varepsilon_{i,t} \]
Model 4B: \[ R_{i,t} = \alpha + \beta_1 EPS_{i,t} + \beta_2 \Delta EPS_{i,t} + \beta_3 CFOPS_{i,t} + \beta_4 \Delta CFOPS_{i,t} + \beta_5 EQPS_{i,t} + \beta_6 \Delta EQPS_{i,t} + \varepsilon_{i,t} \]

Model4a <- (Return ~ EPSd + CFOPSd + EQPSd)
Model4b <- (Return ~ EPSd + dEPSd + CFOPSd + dCFOPSd + EQPSd + dEQPSd)
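As the single-variable formulas all follow the same pattern, they could also be generated from the variable names rather than typed one by one. A minimal sketch using base R's reformulate(); the list names `basic` and `extended` are illustrative choices, and the column names are those used in the formulas above:

```r
# Explanatory variables as named in the dataset
level_vars  <- c("EPSd", "CFOPSd", "EQPSd")
change_vars <- paste0("d", level_vars)   # "dEPSd", "dCFOPSd", "dEQPSd"

# Basic (A) models: return on the level of one variable
basic <- lapply(level_vars, function(v) reformulate(v, response = "Return"))

# Extended (B) models: level plus change-for-the-period
extended <- Map(function(v, d) reformulate(c(v, d), response = "Return"),
                level_vars, change_vars)

names(basic)    <- paste0("Model", 1:3, "a")
names(extended) <- paste0("Model", 1:3, "b")

print(extended$Model1b)
```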

D. Regression model

Suggested reading for the plm package

Croissant, Y., & Millo, G. (2008). Panel Data Econometrics in R: The plm Package. Journal of Statistical Software, 27(2), 1–43. https://doi.org/10.18637/jss.v027.i02

Run the pooled OLS regression and the panel data regressions with fixed effects (fixed for firm) and with random effects

library(plm)
OLS1a <- plm(Model1a, dataset, model= "pooling")
summary(OLS1a)

## Pooling Model
## 
## Call:
## plm(formula = Model1a, data = dataset, model = "pooling")
## 
## Balanced Panel: n = 33, T = 12, N = 396
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -1.866087 -0.443718 -0.199199  0.093122 11.665962 
## 
## Coefficients:
##             Estimate Std. Error t-value  Pr(>|t|)    
## (Intercept) 0.168126   0.056482  2.9767  0.003094 ** 
## EPSd        2.372685   0.249498  9.5098 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    590.58
## Residual Sum of Squares: 480.33
## R-Squared:      0.18668
## Adj. R-Squared: 0.18462
## F-statistic: 90.4367 on 1 and 394 DF, p-value: < 2.22e-16
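
The goodness-of-fit figures in the pooled OLS summary follow directly from the two sums of squares. A short worked check, using only the rounded numbers printed above (so the results agree with the output only up to rounding):

```r
# Figures copied from the pooled OLS summary above
tss <- 590.58   # total sum of squares
rss <- 480.33   # residual sum of squares
n   <- 396      # observations
k   <- 1        # regressors (excluding the intercept)

r2     <- 1 - rss / tss
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)
f_stat <- ((tss - rss) / k) / (rss / (n - k - 1))

round(c(R2 = r2, Adj_R2 = adj_r2, F = f_stat), 4)
```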

FE1a <- plm(Model1a, dataset, effect="individual", model= "within")
summary(FE1a)

## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = Model1a, data = dataset, effect = "individual", 
##     model = "within")
## 
## Balanced Panel: n = 33, T = 12, N = 396
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -2.881429 -0.337051 -0.081025  0.218799  9.339446 
## 
## Coefficients:
##      Estimate Std. Error t-value  Pr(>|t|)    
## EPSd  3.32292    0.25393  13.086 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    536.65
## Residual Sum of Squares: 364.32
## R-Squared:      0.32113
## Adj. R-Squared: 0.25924
## F-statistic: 171.239 on 1 and 362 DF, p-value: < 2.22e-16
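
The within model removes firm-specific intercepts by subtracting each firm's mean from every variable before running OLS. A minimal sketch on simulated data (all names and numbers below are hypothetical, not the seminar data), showing that this gives the same slope as OLS with one dummy per firm:

```r
set.seed(1)  # hypothetical simulated panel, not the seminar data
firm <- factor(rep(1:5, each = 4))
x    <- rnorm(20)
# Firm-specific intercepts (1 to 5) plus a common slope of 2
y    <- as.numeric(firm) + 2 * x + rnorm(20, sd = 0.5)

# Within transformation: subtract each firm's mean from y and x
y_w <- y - ave(y, firm)
x_w <- x - ave(x, firm)

# Slope from the demeaned regression (no intercept needed)
b_within <- coef(lm(y_w ~ x_w - 1))[["x_w"]]

# Slope from OLS with one dummy per firm (least-squares dummy variables)
b_lsdv <- coef(lm(y ~ x + firm))[["x"]]

c(within = b_within, lsdv = b_lsdv)
```

This is also why the within output reports no intercept and loses 33 degrees of freedom (one per firm) relative to the number of observations.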

RE1a <- plm(Model1a, dataset, model= "random")
summary(RE1a)

## Oneway (individual) effect Random Effect Model 
##    (Swamy-Arora's transformation)
## 
## Call:
## plm(formula = Model1a, data = dataset, model = "random")
## 
## Balanced Panel: n = 33, T = 12, N = 396
## 
## Effects:
##                   var std.dev share
## idiosyncratic 1.00640 1.00319  0.96
## individual    0.04144 0.20357  0.04
## theta: 0.1819
## 
## Residuals:
##     Min.  1st Qu.   Median  3rd Qu.     Max. 
## -1.96277 -0.42207 -0.17517  0.11799 11.25359 
## 
## Coefficients:
##             Estimate Std. Error z-value Pr(>|z|)    
## (Intercept) 0.156759   0.066146  2.3699  0.01779 *  
## EPSd        2.641152   0.248727 10.6187  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    572.75
## Residual Sum of Squares: 445.31
## R-Squared:      0.22251
## Adj. R-Squared: 0.22053
## Chisq: 112.757 on 1 DF, p-value: < 2.22e-16
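
The theta reported in the random effects summary is the quasi-demeaning parameter. For a balanced panel it can be recovered from the two variance components as \[ \theta = 1 - \sqrt{\sigma^2_{idios} / (\sigma^2_{idios} + T \sigma^2_{indiv})} \]. A short check using the rounded figures printed above (variable names are illustrative):

```r
# Variance components copied from the random effects summary above
sigma2_idios <- 1.00640  # idiosyncratic variance
sigma2_indiv <- 0.04144  # individual (firm) variance
T_periods    <- 12       # years per firm in the balanced panel

# Share of total variance due to the firm effect
share_indiv <- sigma2_indiv / (sigma2_idios + sigma2_indiv)

# Quasi-demeaning parameter of the random effects transformation
theta <- 1 - sqrt(sigma2_idios / (sigma2_idios + T_periods * sigma2_indiv))

round(c(share_individual = share_indiv, theta = theta), 4)
```

A theta close to 0 makes the RE estimator behave like pooled OLS, a theta close to 1 like the within estimator. With theta around 0.18, the RE coefficient on EPSd (2.64) lies between the pooled OLS (2.37) and FE (3.32) estimates, nearer the former.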

In all three models, earnings per share is significantly associated with the share return. This consistency speaks for the robustness of the result, but the most appropriate model still has to be identified.

Selection tests among OLS, FE, RE

Lagrange Multiplier Test - (Breusch-Pagan) to decide between OLS and RE

plmtest(OLS1a, effect = "individual", type = "bp")

## 
##  Lagrange Multiplier Test - (Breusch-Pagan)
## 
## data:  Model1a
## chisq = 43.039, df = 1, p-value = 5.366e-11
## alternative hypothesis: significant effects

F Test for individual effects to decide between OLS and FE

pFtest (FE1a, OLS1a)

## 
##  F test for individual effects
## 
## data:  Model1a
## F = 3.6024, df1 = 32, df2 = 362, p-value = 1.474e-09
## alternative hypothesis: significant effects

Hausman Test to decide between RE and FE

phtest(Model1a, dataset)

## 
##  Hausman Test
## 
## data:  Model1a
## chisq = 177.63, df = 1, p-value < 2.2e-16
## alternative hypothesis: one model is inconsistent

Note: Both the Lagrange Multiplier test and the F test reject the null hypothesis of no individual effects, so both the RE and the FE model are preferable to pooled OLS. The Hausman test is therefore needed to decide between the two panel data approaches. As its p-value is below 0.05, the FE model should be chosen.
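
The selection logic can be written down as a small decision rule. The helper below is a hypothetical illustration (it is not a plm function), and its handling of the cases where only one panel estimator beats pooled OLS is a simplifying assumption:

```r
# Hypothetical helper (not part of plm): pick an estimator from the
# p-values of the three selection tests at significance level alpha
choose_estimator <- function(p_lm, p_f, p_hausman, alpha = 0.05) {
  re_beats_ols <- p_lm < alpha   # Breusch-Pagan LM test: RE vs pooled OLS
  fe_beats_ols <- p_f < alpha    # F test: FE vs pooled OLS
  if (!re_beats_ols && !fe_beats_ols) return("pooled OLS")
  if (fe_beats_ols && !re_beats_ols) return("FE")
  if (re_beats_ols && !fe_beats_ols) return("RE")
  # Both panel estimators beat pooled OLS: the Hausman test decides
  if (p_hausman < alpha) "FE" else "RE"
}

# p-values reported above for Model1a (Hausman "< 2.2e-16" entered as 2.2e-16)
choose_estimator(p_lm = 5.366e-11, p_f = 1.474e-09, p_hausman = 2.2e-16)
```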

Run diagnostic tests

Check for heteroskedasticity (Breusch-Pagan Test)

library(lmtest)
bptest(Model1a, data = dataset)

## 
##  studentized Breusch-Pagan test
## 
## data:  Model1a
## BP = 31.627, df = 1, p-value = 1.868e-08

Check for autocorrelation (Durbin-Watson Test)

dwtest(Model1a, data = dataset)

## 
##  Durbin-Watson test
## 
## data:  Model1a
## DW = 1.8728, p-value = 0.1028
## alternative hypothesis: true autocorrelation is greater than 0

Check for cross-sectional dependence (Pesaran CD Test)

pcdtest(Model1a, dataset, index = c("Company", "Year"))

## 
##  Pesaran CD test for cross-sectional dependence in panels
## 
## data:  Return ~ EPSd
## z = 28.694, p-value < 2.2e-16
## alternative hypothesis: cross-sectional dependence

Check for serial correlation in FE panels (Wooldridge’s Test)

pwartest(Model1a, dataset, effect = "individual")

## 
##  Wooldridge's test for serial correlation in FE panels
## 
## data:  plm.model
## F = 1.6332, df1 = 1, df2 = 361, p-value = 0.2021
## alternative hypothesis: serial correlation

Note: As the tests indicate heteroskedasticity and cross-sectional dependence, the standard errors of the FE model should be corrected using robust (clustered) errors. In this case, the Driscoll and Kraay estimator (built into the plm package as vcovSCC) is the most suitable
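
To see at a glance which problems the diagnostics flag, the four p-values can be collected in one table. A minimal sketch; the data frame `diagnostics` is an illustrative construct, and the p-values are copied from the outputs above:

```r
# p-values copied from the diagnostic outputs above; "< 2.2e-16" is
# entered as 2.2e-16, which does not affect the 5% decision
diagnostics <- data.frame(
  test    = c("Breusch-Pagan", "Durbin-Watson", "Pesaran CD", "Wooldridge"),
  problem = c("heteroskedasticity", "autocorrelation",
              "cross-sectional dependence", "serial correlation (FE)"),
  p_value = c(1.868e-08, 0.1028, 2.2e-16, 0.2021)
)

# The null hypothesis (no problem) is rejected when p < 0.05
diagnostics$detected <- diagnostics$p_value < 0.05

print(diagnostics)
```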

coeftest(FE1a, vcov = vcovSCC)

## 
## t test of coefficients:
## 
##      Estimate Std. Error t value Pr(>|t|)  
## EPSd   3.3229     1.3677  2.4296   0.0156 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note: After the correction, the coefficient on EPSd remains significant, but only at the 5% level (p-value 0.0156), because the robust standard error is considerably larger than the conventional one
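
The effect of the correction can be traced by hand: the coefficient is unchanged, only the standard error in the denominator of the t statistic grows. A short worked check with the rounded figures printed above (variable names are illustrative):

```r
# Values copied from the outputs above
estimate  <- 3.3229    # FE coefficient on EPSd
se_plain  <- 0.25393   # conventional standard error, from summary(FE1a)
se_robust <- 1.3677    # Driscoll-Kraay standard error, from vcovSCC
df_resid  <- 362       # residual degrees of freedom of FE1a

t_plain  <- estimate / se_plain
t_robust <- estimate / se_robust

# Two-sided p-value from the t distribution
p_robust <- 2 * pt(-abs(t_robust), df = df_resid)

round(c(t_plain = t_plain, t_robust = t_robust, p_robust = p_robust), 4)
```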

Highly recommended reading (robust standard error estimators for panel models):

Millo, G. (2017). Robust Standard Error Estimators for Panel Models: A Unifying Approach. Journal of Statistical Software, 82(3), 1–27. https://doi.org/10.18637/jss.v082.i03

Run remaining models

OLS1b <- plm(Model1b, dataset, model= "pooling")
FE1b <- plm(Model1b, dataset, effect="individual", model= "within")
RE1b <- plm(Model1b, dataset, model= "random", random.method="swar")
OLS2a <- plm(Model2a, dataset, model= "pooling")
FE2a <- plm(Model2a, dataset, effect="individual", model= "within")
RE2a <- plm(Model2a, dataset, model= "random", random.method="swar")
OLS2b <- plm(Model2b, dataset, model= "pooling")
FE2b <- plm(Model2b, dataset, effect="individual", model= "within")
RE2b <- plm(Model2b, dataset, model= "random", random.method="swar")
OLS3a <- plm(Model3a, dataset, model= "pooling")
FE3a <- plm(Model3a, dataset, effect="individual", model= "within")
RE3a <- plm(Model3a, dataset, model= "random", random.method="swar")
OLS3b <- plm(Model3b, dataset, model= "pooling")
FE3b <- plm(Model3b, dataset, effect="individual", model= "within")
RE3b <- plm(Model3b, dataset, model= "random", random.method="swar")
OLS4a <- plm(Model4a, dataset, model= "pooling")
FE4a <- plm(Model4a, dataset, effect="individual", model= "within")
RE4a <- plm(Model4a, dataset, model= "random", random.method="swar")
OLS4b <- plm(Model4b, dataset, model= "pooling")
FE4b <- plm(Model4b, dataset, effect="individual", model= "within")
RE4b <- plm(Model4b, dataset, model= "random", random.method="swar")

Presentation of all OLS models

library(stargazer)
stargazer(OLS1a, OLS1b, type = "text", object.names = TRUE)

## 
## ============================================================
##                            Dependent variable:              
##              -----------------------------------------------
##                                  Return                     
##                        (1)                     (2)          
##                       OLS1a                   OLS1b         
## ------------------------------------------------------------
## EPSd                2.373***                1.578***        
##                      (0.249)                 (0.284)        
##                                                             
## dEPSd                                       1.479***        
##                                              (0.277)        
##                                                             
## Constant            0.168***                0.178***        
##                      (0.056)                 (0.055)        
##                                                             
## ------------------------------------------------------------
## Observations           396                     396          
## R2                    0.187                   0.242         
## Adjusted R2           0.185                   0.238         
## F Statistic  90.437*** (df = 1; 394) 62.584*** (df = 2; 393)
## ============================================================
## Note:                            *p<0.1; **p<0.05; ***p<0.01
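
When comparing these tables with the summary() outputs, note that the two use different star conventions, as their legends show. A small sketch that counts the stars under each legend; the function `stars` is a hypothetical helper, and the "." code of the summary() legend is ignored:

```r
# Significance stars for a p-value under a given set of cutoffs
stars <- function(p, cutoffs) {
  # Number of cutoffs that p falls below = number of stars
  strrep("*", sum(p < cutoffs))
}

summary_cutoffs   <- c(0.05, 0.01, 0.001)  # legend in the summary() output
stargazer_cutoffs <- c(0.10, 0.05, 0.01)   # legend in the stargazer note

p_intercept <- 0.003094  # intercept of OLS1a, from summary(OLS1a)

c(summary   = stars(p_intercept, summary_cutoffs),
  stargazer = stars(p_intercept, stargazer_cutoffs))
```

This explains why the intercept of OLS1a carries two stars in the summary() output but three in the stargazer table.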

stargazer(OLS2a, OLS2b, type = "text", object.names = TRUE)

## 
## ===========================================================
##                           Dependent variable:              
##              ----------------------------------------------
##                                  Return                    
##                        (1)                    (2)          
##                       OLS2a                  OLS2b         
## -----------------------------------------------------------
## CFOPSd              0.406***                 0.262*        
##                      (0.108)                (0.149)        
##                                                            
## dCFOPSd                                      0.237         
##                                             (0.168)        
##                                                            
## Constant            0.209***                0.231***       
##                      (0.063)                (0.065)        
##                                                            
## -----------------------------------------------------------
## Observations           396                    396          
## R2                    0.034                  0.039         
## Adjusted R2           0.032                  0.034         
## F Statistic  13.998*** (df = 1; 394) 8.011*** (df = 2; 393)
## ===========================================================
## Note:                           *p<0.1; **p<0.05; ***p<0.01

stargazer(OLS3a, OLS3b, type = "text", object.names = TRUE)

## 
## ====================================================
##                        Dependent variable:          
##              ---------------------------------------
##                              Return                 
##                      (1)                 (2)        
##                     OLS3a               OLS3b       
## ----------------------------------------------------
## EQPSd               0.032               0.032       
##                    (0.083)             (0.083)      
##                                                     
## dEQPSd                                 0.0005       
##                                        (0.056)      
##                                                     
## Constant           0.240**             0.240**      
##                    (0.095)             (0.095)      
##                                                     
## ----------------------------------------------------
## Observations         396                 396        
## R2                 0.0004              0.0004       
## Adjusted R2        -0.002              -0.005       
## F Statistic  0.153 (df = 1; 394) 0.076 (df = 2; 393)
## ====================================================
## Note:                    *p<0.1; **p<0.05; ***p<0.01

stargazer(OLS4a, OLS4b, type = "text", object.names = TRUE)

## 
## ============================================================
##                            Dependent variable:              
##              -----------------------------------------------
##                                  Return                     
##                        (1)                     (2)          
##                       OLS4a                   OLS4b         
## ------------------------------------------------------------
## EPSd                2.296***                1.580***        
##                      (0.255)                 (0.292)        
##                                                             
## dEPSd                                       1.517***        
##                                              (0.281)        
##                                                             
## CFOPSd               0.217**                 0.276**        
##                      (0.103)                 (0.138)        
##                                                             
## dCFOPSd                                      -0.081         
##                                              (0.156)        
##                                                             
## EQPSd                 0.073                   0.014         
##                      (0.076)                 (0.075)        
##                                                             
## dEQPSd                                      -0.116**        
##                                              (0.051)        
##                                                             
## Constant              0.076                   0.117         
##                      (0.087)                 (0.084)        
##                                                             
## ------------------------------------------------------------
## Observations           396                     396          
## R2                    0.200                   0.262         
## Adjusted R2           0.194                   0.251         
## F Statistic  32.634*** (df = 3; 392) 23.067*** (df = 6; 389)
## ============================================================
## Note:                            *p<0.1; **p<0.05; ***p<0.01

Note: The selection and diagnostic tests (presented above only for Model1a) should be run for each model.

FU902: Seminar 03

David Procházka

2023-11-14
