1 Data importing

library(readxl)
X7_Listed_Companies_ESE <- read_excel("C:/Users/Fakudze/downloads/7 Listed Companies - ESE.xlsx")
View(X7_Listed_Companies_ESE)

2 Sample data examination

2.1 Examining the number of observations and variables

library(tidyverse)
dim(X7_Listed_Companies_ESE)

## [1] 35  6

The sample has 35 rows (observations) and 6 columns (variables).

2.2 Examining the data types

glimpse(X7_Listed_Companies_ESE)

## Rows: 35
## Columns: 6
## $ Firm <chr> "NEDBANK", "NEDBANK", "NEDBANK", "NEDBANK", "NEDBANK", "RES", "RE…
## $ Year <dbl> 2019, 2020, 2021, 2022, 2023, 2019, 2020, 2021, 2022, 2023, 2019,…
## $ ROE  <dbl> 14.00000000, 9.70000000, 14.40000000, 16.00000000, 17.00000000, 1…
## $ SDTA <dbl> 81.7106734, 87.1510726, 80.2505882, 81.7817823, 78.3318297, 16.15…
## $ LDTA <dbl> 1.9053492, 0.3492974, 4.2922815, 2.1881426, 5.3314670, 18.2492308…
## $ DTE  <dbl> 510.3524038, 700.0236787, 546.9506171, 523.8273926, 512.1186051, …

The sample has five numeric (double) variables (Year, ROE, SDTA, LDTA and DTE) and one character variable (Firm).
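The type breakdown can also be checked by counting the class of each column. This is a minimal sketch on a small hypothetical data frame that reuses the sample's column names (the values are illustrative, not the study data). With the real data you would pass X7_Listed_Companies_ESE instead of `toy`.

```r
# Hypothetical toy data with the sample's column names (illustrative values, not the study data)
toy <- data.frame(
  Firm = c("NEDBANK", "NEDBANK", "RES"),
  Year = c(2019, 2020, 2019),
  ROE  = c(14.0, 9.7, 12.0),
  SDTA = c(81.7, 87.2, 16.2),
  LDTA = c(1.9, 0.3, 18.2),
  DTE  = c(510.4, 700.0, 60.0),
  stringsAsFactors = FALSE
)

# First class of each column, then a count per type
col_types <- vapply(toy, function(x) class(x)[1], character(1))
print(col_types)
print(table(col_types))
```

Year counts as numeric here because it is stored as a double, which matches the `<dbl>` type that glimpse() reports for it.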

2.3 Examining the availability of NA values

colSums(is.na(X7_Listed_Companies_ESE))

## Firm Year  ROE SDTA LDTA  DTE 
##    0    0    0    0    0    0

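# Count blank or whitespace-only strings in the only character column (Firm)
sum(trimws(X7_Listed_Companies_ESE$Firm) == "", na.rm = TRUE)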

## [1] 0

There are no missing (NA) or blank values in the sample.
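Both checks can be combined into one per-column count. The helper `count_missing_or_blank()` below is hypothetical (not part of any package); it counts NA cells in every column and, for character columns only, strings that are empty after trimming whitespace. The toy data is illustrative.

```r
# Hypothetical helper: count NA cells plus blank/whitespace-only strings in each column
count_missing_or_blank <- function(df) {
  vapply(df, function(x) {
    n_na <- sum(is.na(x))
    # Only character columns can hold blank strings; numeric columns contribute 0
    n_blank <- if (is.character(x)) sum(trimws(x) == "", na.rm = TRUE) else 0L
    as.integer(n_na + n_blank)
  }, integer(1))
}

# Illustrative data containing one whitespace-only string and one NA
toy <- data.frame(
  Firm = c("NEDBANK", "  ", "RES"),
  ROE  = c(14.0, NA, 12.0),
  stringsAsFactors = FALSE
)
print(count_missing_or_blank(toy))
```

On the sample data every count would be expected to be 0, consistent with the two checks above.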

3 NORMALITY TESTING (SHAPIRO-WILK TEST)

3.1 Normality test for ROE

library(stats)
shapiro_test_ROE <- shapiro.test(X7_Listed_Companies_ESE$ROE)
print(shapiro_test_ROE)

## 
##  Shapiro-Wilk normality test
## 
## data:  X7_Listed_Companies_ESE$ROE
## W = 0.96939, p-value = 0.4268

The p-value (0.4268) is greater than the 0.05 significance level, so we fail to reject the null hypothesis of normality: there is no evidence that ROE departs from a normal distribution.
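The decision rule can be written out explicitly. This is a minimal sketch with a hypothetical helper, `shapiro_decision()`, wrapping base R's `shapiro.test()`; it assumes a significance level of 0.05. A p-value above alpha only means normality is not rejected, which is why the wording is "fail to reject" rather than "is normal". The demonstration uses simulated right-skewed data, not the sample.

```r
# Hypothetical helper: run shapiro.test() and state the decision at a chosen alpha
shapiro_decision <- function(x, alpha = 0.05) {
  res <- shapiro.test(x)
  decision <- if (res$p.value < alpha) {
    "reject H0: evidence of non-normality"
  } else {
    "fail to reject H0: no evidence against normality"
  }
  list(W = unname(res$statistic), p_value = res$p.value, decision = decision)
}

set.seed(1)
# Strongly right-skewed simulated data, expected to be flagged as non-normal
skewed <- rexp(200)
str(shapiro_decision(skewed))
```

With the real data, `shapiro_decision(X7_Listed_Companies_ESE$ROE)` should reproduce the W and p-value printed above.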

3.2 Visual distribution of ROE

hist(X7_Listed_Companies_ESE$ROE, 
     main = "Histogram of ROE", xlab = "ROE")
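A normal Q-Q plot is a common companion to the histogram: if the variable is roughly normal, the points lie close to the reference line. This sketch uses base R's `qqnorm()` and `qqline()` on a simulated stand-in for ROE with 35 observations (hypothetical values); with the real data you would replace `roe_sim` with `X7_Listed_Companies_ESE$ROE`.

```r
# Simulated stand-in for the ROE column (hypothetical values, 35 observations like the sample)
set.seed(42)
roe_sim <- rnorm(35, mean = 12, sd = 4)

# Normal Q-Q plot: points close to the red reference line suggest approximate normality
qq <- qqnorm(roe_sim, main = "Normal Q-Q plot of ROE (simulated)")
qqline(roe_sim, col = "red")
```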

3.3 Normality test for SDTA

shapiro_test_SDTA <- shapiro.test(X7_Listed_Companies_ESE$SDTA)
print(shapiro_test_SDTA)

## 
##  Shapiro-Wilk normality test
## 
## data:  X7_Listed_Companies_ESE$SDTA
## W = 0.78096, p-value = 8.827e-06

The p-value (8.827e-06) is less than the 0.05 significance level, so the null hypothesis of normality is rejected: SDTA is not normally distributed.

3.4 Visual distribution of SDTA

hist(X7_Listed_Companies_ESE$SDTA, 
     main = "Histogram of SDTA", xlab = "SDTA")

3.5 Normality test for LDTA

shapiro_test_LDTA <- shapiro.test(X7_Listed_Companies_ESE$LDTA)
print(shapiro_test_LDTA)

## 
##  Shapiro-Wilk normality test
## 
## data:  X7_Listed_Companies_ESE$LDTA
## W = 0.77905, p-value = 8.133e-06

The p-value (8.133e-06) is less than the 0.05 significance level, so the null hypothesis of normality is rejected: LDTA is not normally distributed.

3.6 Visual distribution of LDTA

hist(X7_Listed_Companies_ESE$LDTA,
     main = "Histogram of LDTA", xlab = "LDTA")

3.7 Normality test for DTE

shapiro_test_DTE <- shapiro.test(X7_Listed_Companies_ESE$DTE)
print(shapiro_test_DTE)

## 
##  Shapiro-Wilk normality test
## 
## data:  X7_Listed_Companies_ESE$DTE
## W = 0.73732, p-value = 1.492e-06

The p-value (1.492e-06) is less than the 0.05 significance level, so the null hypothesis of normality is rejected: DTE is not normally distributed.

3.8 Visual distribution of DTE

hist(X7_Listed_Companies_ESE$DTE,
     main = "Histogram of DTE", xlab = "DTE")
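The four Shapiro-Wilk tests above can be collected into a single table. This is a minimal sketch with a hypothetical helper, `normality_table()`, assuming alpha = 0.05; the data frame is simulated (35 rows, as in the sample) and is not the study data. With the real data you would call `normality_table(X7_Listed_Companies_ESE, c("ROE", "SDTA", "LDTA", "DTE"))`.

```r
# Hypothetical helper: Shapiro-Wilk W and p-value for several numeric columns at once
normality_table <- function(df, vars, alpha = 0.05) {
  rows <- lapply(vars, function(v) {
    res <- shapiro.test(df[[v]])
    data.frame(variable = v,
               W = unname(res$statistic),
               p_value = res$p.value,
               fail_to_reject = res$p.value >= alpha,  # TRUE: no evidence against normality
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Simulated stand-in data (35 rows, as in the sample); not the study data
set.seed(7)
sim <- data.frame(ROE  = rnorm(35, 12, 4),
                  SDTA = rexp(35) * 50,
                  LDTA = rexp(35) * 5,
                  DTE  = rexp(35) * 300)
print(normality_table(sim, c("ROE", "SDTA", "LDTA", "DTE")))
```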

4 Multicollinearity test for the independent variables (fit a linear model)

library(car)

model <- lm(ROE ~ SDTA + LDTA + DTE, data = X7_Listed_Companies_ESE)

vif_results <- vif(model)
print(vif_results)

##      SDTA      LDTA       DTE 
##  9.496595  1.450525 10.573011

The VIF of 9.496595 for SDTA indicates moderate to high correlation with the other independent variables, close to the commonly used threshold of 10. The VIF of 1.450525 for LDTA indicates only weak correlation with the other independent variables. On the other hand, the VIF of 10.573011 for DTE exceeds the threshold of 10, indicating high multicollinearity with the other independent variables, so DTE might need to be removed from the model.
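Each VIF comes from an auxiliary regression: regress predictor j on the other predictors and compute VIF_j = 1 / (1 - R²_j). This is a minimal sketch of that formula with a hypothetical helper, `vif_by_hand()`, on simulated predictors (x3 is built mostly from x1). For models with only numeric predictors, as here, the values should match `car::vif()`.

```r
# Hypothetical helper: VIF of each predictor from its auxiliary regression on the others
vif_by_hand <- function(df, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    aux <- lm(reformulate(others, response = p), data = df)
    1 / (1 - summary(aux)$r.squared)  # VIF_j = 1 / (1 - R^2_j)
  })
}

# Simulated predictors: x3 is mostly a copy of x1, so x1 and x3 should show high VIFs
set.seed(123)
n  <- 35
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- 0.95 * x1 + rnorm(n, sd = 0.2)
sim <- data.frame(x1, x2, x3)
print(vif_by_hand(sim, c("x1", "x2", "x3")))
```

With the sample, `vif_by_hand(X7_Listed_Companies_ESE, c("SDTA", "LDTA", "DTE"))` would be expected to reproduce the VIFs printed above.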

5 AUTOCORRELATION TESTING

library(lmtest)
lm_auto <- lm(ROE ~ SDTA + LDTA + DTE, data = X7_Listed_Companies_ESE)

dw_test <- dwtest(lm_auto)
print(dw_test)

## 
##  Durbin-Watson test
## 
## data:  lm_auto
## DW = 1.1955, p-value = 0.001504
## alternative hypothesis: true autocorrelation is greater than 0
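The DW statistic reported above is the sum of squared differences between successive residuals divided by the residual sum of squares. It is close to 2 when there is no first-order autocorrelation and, approximately, DW ≈ 2(1 - ρ), so a value of 1.1955 corresponds roughly to a residual autocorrelation of ρ ≈ 0.4. This sketch computes the statistic directly with a hypothetical helper, `dw_stat()`, on toy and simulated residuals.

```r
# Durbin-Watson statistic: squared successive residual differences over the residual sum of squares
dw_stat <- function(e) sum(diff(e)^2) / sum(e^2)

print(dw_stat(c(1, -1, 1, -1)))  # alternating signs (negative autocorrelation): 3

set.seed(99)
# Simulated AR(1) errors with positive autocorrelation (phi = 0.7), 35 observations
e_ar <- as.numeric(arima.sim(model = list(ar = 0.7), n = 35))
print(dw_stat(e_ar))  # typically well below 2
```

Applied to the fitted model, `dw_stat(residuals(lm_auto))` should reproduce the DW value that `dwtest()` reports.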

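The DW statistic of 1.1955 is below 2 and its p-value (0.001504) is below 0.05, so the null hypothesis of no autocorrelation is rejected in favour of positive autocorrelation in the residuals. This implies that a positive error in one observation tends to be followed by positive errors in subsequent observations. This calls for the application of a Generalized Least Squares (GLS) procedure to correct for the autocorrelation.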
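One common way to apply the GLS correction in R, assumed here rather than taken from the original analysis, is `nlme::gls()` with an AR(1) error structure from `corAR1()`. Because the sample stacks seven firms observed over 2019-2023, the sketch lets the AR(1) process run within each firm; `form = ~ 1 | Firm` uses row order as time, so it assumes the rows are sorted by Year within each Firm, as in the glimpse() output. The panel below is simulated with hypothetical firm names and is not the study data; with the real data you would pass `X7_Listed_Companies_ESE`.

```r
library(nlme)  # recommended package shipped with standard R installations

# Simulated panel standing in for the sample: 7 hypothetical firms x 5 years (35 rows)
set.seed(2024)
panel <- expand.grid(Year = 2019:2023, Firm = paste0("FIRM", 1:7))  # Year varies fastest
n <- nrow(panel)
panel$SDTA <- runif(n, 10, 90)
panel$LDTA <- runif(n, 0, 20)
panel$DTE  <- runif(n, 50, 800)
panel$ROE  <- 5 + 0.05 * panel$SDTA - 0.1 * panel$LDTA + rnorm(n, sd = 3)

# GLS with AR(1) errors within each firm (row order within Firm is treated as time)
gls_ar1 <- gls(ROE ~ SDTA + LDTA + DTE,
               data = panel,
               correlation = corAR1(form = ~ 1 | Firm),
               method = "REML")
print(summary(gls_ar1))
```

The summary reports the estimated AR(1) parameter (Phi) alongside the coefficients; comparing those coefficients and standard errors with the OLS fit shows how much the autocorrelation correction changes the inference.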

6 HETEROSCEDASTICITY TESTING

6.1 Fit a linear model

library(lmtest)
library(car)

lm_model <- lm(ROE ~ SDTA + LDTA + DTE, data = X7_Listed_Companies_ESE)

6.2 Perform a Breusch-Pagan test

Bp_test <- bptest(lm_model)
print(Bp_test)

## 
##  studentized Breusch-Pagan test
## 
## data:  lm_model
## BP = 4.6498, df = 3, p-value = 0.1993

The p-value of 0.1993 is greater than 0.05, so we fail to reject the null hypothesis of homoscedasticity: there is no evidence of heteroscedasticity. The variance of the errors/residuals can therefore be treated as constant across the levels of the independent variables, which means the conventional OLS standard errors are not distorted by heteroscedasticity.
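The studentized Breusch-Pagan statistic can be reproduced by hand: regress the squared OLS residuals on the same regressors and compute BP = n × R², which is compared with a chi-square distribution whose degrees of freedom equal the number of regressors (3 here, matching `df = 3` above). This is a minimal sketch on simulated data with hypothetical values; if the lmtest package is installed, the statistic should agree with `lmtest::bptest()`.

```r
# Simulated data standing in for the sample (hypothetical values, 35 rows)
set.seed(11)
n <- 35
sim <- data.frame(SDTA = runif(n, 10, 90), LDTA = runif(n, 0, 20), DTE = runif(n, 50, 800))
sim$ROE <- 5 + 0.05 * sim$SDTA - 0.1 * sim$LDTA + rnorm(n, sd = 3)

fit <- lm(ROE ~ SDTA + LDTA + DTE, data = sim)

# Studentized (Koenker) Breusch-Pagan: n * R^2 from regressing squared residuals on the regressors
sim$e2 <- residuals(fit)^2
aux <- lm(e2 ~ SDTA + LDTA + DTE, data = sim)
bp_stat <- n * summary(aux)$r.squared
bp_df   <- 3  # regressors in the auxiliary model, excluding the intercept
bp_p    <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)
print(c(BP = bp_stat, df = bp_df, p_value = bp_p))
```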

Regression Analysis

Celumusa Fakudze

2026-05-26