Sustainability Analysis Based on CSMAR Database and China Securities ESG Rating Data
Project Background
In today’s society, more and more investors are beginning to pay attention to the environmental, social and governance (ESG) performance of enterprises, and the evaluation and analysis of ESG factors has become an indispensable part of investment decisions.
Sustainable finance generally refers to taking appropriate account of environmental (E), social (S) and governance (G) factors in financial-sector investment decisions, thereby increasing long-term investment in sustainable economic activities and projects. ESG emphasizes benefits to the environment and society as well as the improvement of governance systems. As a concept for balancing the needs of economic, environmental and social development, ESG theory and its applications offer enterprises a framework and reference path for achieving high-quality development, practicing responsible investment and promoting low-carbon transformation, and they also serve as a weather vane for the global capital market. This project explores the relationship between ESG ratings and stock market performance using R and the collected data. The report examines the correlation between a stock’s ESG score and its annual return and looks at how the environmental, social and governance scores relate to stock market performance. My plan for processing the data is shown in the table below.
Data Preparation
As mentioned in the project background, the analysis needs stock data and ESG rating data. After reviewing the literature, I decided to start with China’s A-share market and obtained data from two sources: the CSMAR database and the China Securities ESG rating data. The CSMAR database provides annual market data for each stock in China’s A-share market (excluding ST shares), including the opening price, closing price, turnover and other indicators. The China Securities ESG rating data provides each stock’s ESG score for each year, including the composite score and the separate E, S and G scores.
Because the two data sets have to be combined for analysis, I use the ticker symbol (stock) and the year (year) as the join keys. The merge function in R attaches the ESG score data to the annual market data, and the result is saved as merge.csv. Before merging, however, I noticed that the stock columns of the two original files (see the 00 folder) were not in the same format: in stock.xlsx the stocks are listed as 1, 2, and so on, while in ESG.xlsx they are listed as 000001.SZ, 000002.SZ, etc. I therefore converted the stock codes to a common six-digit form so that the two data sets could be merged.
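The conversion itself is not shown in the report. The following is a minimal sketch of one way it could be done, assuming the market file stores plain integers and the ESG file stores codes with an exchange suffix. The data frames `stock_data` and `esg_data` and all values for the second stock are hypothetical stand-ins, not the project's data.

```r
# Toy stand-ins for stock.xlsx and ESG.xlsx (values are illustrative only)
stock_data <- data.frame(
  stock = c(1L, 1L, 2L),
  year  = c(2010L, 2011L, 2010L),
  close = c(15.79, 15.59, 10.00)
)
esg_data <- data.frame(
  stock = c("000001.SZ", "000001.SZ", "000002.SZ"),
  year  = c(2010L, 2011L, 2010L),
  score = c(81.49, 80.71, 75.00)
)

# Pad the integer codes to six digits: 1 -> "000001"
stock_data$stock <- sprintf("%06d", stock_data$stock)

# Drop the exchange suffix: "000001.SZ" -> "000001"
esg_data$stock <- sub("\\.[A-Za-z]+$", "", esg_data$stock)

# Inner join on ticker and year; unmatched rows are dropped
merged_data <- merge(stock_data, esg_data, by = c("stock", "year"))
print(merged_data)
```

Because `merge` performs an inner join by default, stock-year pairs that appear in only one file disappear silently; passing `all.x = TRUE` would keep every market row and make missing ESG scores visible as NA.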
For the combined data set, we need to make sure that each variable has the correct data type. According to the structure of the data set output by the R code, the data type of each variable is as follows:
Integer (int): stock, year, industry;
Numeric (num): open (opening price), close (closing price), volume (trading volume), value (turnover), circulation_value (circulating market value), total_value (total market value), score (ESG composite score), E_score (environmental score), S_score (social score), G_score (governance score);
Character (chr): company, grade, E_grade, S_grade, G_grade.
  stock year  open close      volume        value circulation_value total_value
1     1 2010 24.52 15.79  7311625122 143000000000          49033611    55028367
2     1 2011 15.82 15.59  5714947128  95739891656          48412542    79873033
3     1 2012 15.59 16.02  4286576838  65565367397          49747836    82076074
4     1 2013 16.32 12.25 17715171071 265000000000          68304798   100417668
5     1 2014 12.12 15.84 21455238348 256000000000         155813520   180970333
6     1 2015 15.99 11.99 36620106027 531000000000         141530614   171561027
                 company industry grade score E_grade E_score S_grade S_score
1 Ping An Bank Co., Ltd.        1   BBB 81.49     CCC   69.87     BBB   75.07
2 Ping An Bank Co., Ltd.        1   BBB 80.71      CC   60.53     BBB   77.54
3 Ping An Bank Co., Ltd.        1   BBB 84.76       B   72.98     BBB   82.01
4 Ping An Bank Co., Ltd.        1     A 85.16      BB   76.20       A   82.01
5 Ping An Bank Co., Ltd.        1   BBB 84.45       B   72.98     BBB   83.75
6 Ping An Bank Co., Ltd.        1   BBB 83.63       B   72.98     BBB   80.15
  G_grade G_score
1      AA   91.51
2      AA   92.55
3      AA   92.29
4      AA   91.62
5      AA   90.41
6      AA   91.14
str(merged_data)
'data.frame': 21312 obs. of 18 variables:
$ stock : int 1 1 1 1 1 1 1 1 1 1 ...
$ year : int 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 ...
$ open : num 24.5 15.8 15.6 16.3 12.1 ...
$ close : num 15.8 15.6 16 12.2 15.8 ...
$ volume : num 7.31e+09 5.71e+09 4.29e+09 1.77e+10 2.15e+10 ...
$ value : num 1.43e+11 9.57e+10 6.56e+10 2.65e+11 2.56e+11 ...
$ circulation_value: num 4.90e+07 4.84e+07 4.97e+07 6.83e+07 1.56e+08 ...
$ total_value : num 5.50e+07 7.99e+07 8.21e+07 1.00e+08 1.81e+08 ...
$ company : chr "Ping An Bank Co., Ltd." "Ping An Bank Co., Ltd." "Ping An Bank Co., Ltd." "Ping An Bank Co., Ltd." ...
$ industry : int 1 1 1 1 1 1 1 1 1 1 ...
$ grade : chr "BBB" "BBB" "BBB" "A" ...
$ score : num 81.5 80.7 84.8 85.2 84.5 ...
$ E_grade : chr "CCC" "CC" "B" "BB" ...
$ E_score : num 69.9 60.5 73 76.2 73 ...
$ S_grade : chr "BBB" "BBB" "BBB" "A" ...
$ S_score : num 75.1 77.5 82 82 83.8 ...
$ G_grade : chr "AA" "AA" "AA" "AA" ...
$ G_score : num 91.5 92.5 92.3 91.6 90.4 ...
Exploratory Data Analysis (EDA)
# Calculate the annual return of each stock
library(dplyr)

calculate_annual_return <- function(data) {
  data <- data %>% arrange(year)  # Arrange data by year
  # Annual return: change in closing price relative to the previous year's close
  data$year_return <- c(NA, diff(data$close) / data$close[-nrow(data)])
  return(data)
}

merged_data <- merged_data %>%
  group_by(stock) %>%
  do(calculate_annual_return(.))

# The first year of each stock has no previous close; its return is set to 0
merged_data$year_return <- ifelse(is.na(merged_data$year_return), 0, merged_data$year_return)
write.csv(merged_data, file = '~/Final Project Data/01_data_cleaning/stock_esg.csv')
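The return defined above is the year-over-year change in the closing price, (close_t - close_{t-1}) / close_{t-1}. As a check on the formula, the sketch below recomputes it with `dplyr::lag` on the first three closing prices of stock 1 from the table above. This is an alternative formulation for illustration, not the code used in the project, and the names `prices` and `returns` are hypothetical.

```r
library(dplyr)

# Closing prices of stock 1 for 2010-2012, taken from the table above
prices <- data.frame(
  stock = c(1L, 1L, 1L),
  year  = c(2010L, 2011L, 2012L),
  close = c(15.79, 15.59, 16.02)
)

returns <- prices %>%
  group_by(stock) %>%
  arrange(year, .by_group = TRUE) %>%
  # close / previous close - 1; NA for each stock's first year
  mutate(year_return = close / lag(close) - 1) %>%
  ungroup()

print(returns)
```

For these rows the returns are about -1.27% for 2011 and 2.76% for 2012. Unlike the report's code, this version leaves the first year as NA rather than 0, so that year is not counted as a zero return in later averages or regressions.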
First, I count how the composite ESG rating and the E, S and G ratings are distributed across China’s A-share listed companies. The figure below shows that the composite ESG rating and the S (Social) rating are mostly B, BB and CCC, while the E (Environmental) rating is mainly C. The G (Governance) rating, by contrast, is mostly BBB or higher. This suggests that many companies perform poorly on environmental protection and need to strengthen environmental awareness and practice, and that shortfalls in social responsibility and social impact call for more investment and improvement. Most companies do relatively well on governance structure and management, although there is still room for improvement.
# Histograms of the rating distributions
library(ggplot2)
library(gridExtra)
plot_rating_histogram <- function(data, rating_col, title) {
  ggplot(data, aes(x = !!as.symbol(rating_col), fill = !!as.symbol(rating_col))) +
    geom_bar() +
    labs(title = title, x = rating_col, y = "Frequency") +
    theme_minimal() +
    theme(legend.position = "none")
}

esg_histogram <- plot_rating_histogram(merged_data, "grade", "ESG Composite Rating")
e_histogram <- plot_rating_histogram(merged_data, "E_grade", "Environmental (E) Rating")
s_histogram <- plot_rating_histogram(merged_data, "S_grade", "Social (S) Rating")
g_histogram <- plot_rating_histogram(merged_data, "G_grade", "Governance (G) Rating")

combined_plot <- grid.arrange(esg_histogram, e_histogram, s_histogram, g_histogram,
                              nrow = 2, ncol = 2, top = "Distribution of ESG Ratings")
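Because the grade columns are character vectors, `geom_bar` orders the bars alphabetically (A, AA, B, BB, ...) rather than from worst to best. One possible adjustment is to convert the grades to an ordered factor before plotting. The sketch below assumes a nine-level scale from C to AAA, which should be checked against the rating provider's documentation. The sample values are the E_grade entries from the table above, and `grade_levels` is a hypothetical name.

```r
# Assumed nine-level rating scale, ordered from worst to best
grade_levels <- c("C", "CC", "CCC", "B", "BB", "BBB", "A", "AA", "AAA")

# E_grade values of the six rows shown in the table above
grades <- c("CCC", "CC", "B", "BB", "B", "B")

# Ordered factor: comparisons and plot axes now follow the rating order
grade_factor <- factor(grades, levels = grade_levels, ordered = TRUE)

# Frequency of each grade, including grades that do not occur
rating_counts <- table(grade_factor)
print(rating_counts)
```

Applying the same conversion to `grade`, `E_grade`, `S_grade` and `G_grade` before calling `plot_rating_histogram` would put the bars in rating order.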
In the second step, I draw a heat map of the correlation coefficients between the numerical indicators to check whether the variables are correlated, so that multicollinearity can be avoided in the regression analysis that follows. In the later steps I use G_score, S_score, E_score and score as independent variables and year_return, volume and circulation_value as dependent variables, in order to see how the ESG rating scores relate to the stock return, trading volume and circulating market value. The heat map shows that, among the four independent variables, score is highly correlated with the other three. The reason is that score is a weighted combination of the three sub-scores. The correlations among the three sub-scores themselves are low, so they can be used together in the regression analysis.
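The heat map code is not included in the report. A minimal sketch of how such a map could be produced with `cor` and `geom_tile` follows. The data are simulated, and the weights used to build the composite score are an assumption chosen only to reproduce the pattern described above. The names `demo`, `corr_long` and `heatmap_plot` are hypothetical.

```r
library(ggplot2)

# Simulated scores standing in for merged_data (not the report's data)
set.seed(1)
n <- 200
demo <- data.frame(
  E_score = rnorm(n, 60, 10),
  S_score = rnorm(n, 75, 8),
  G_score = rnorm(n, 85, 6)
)
# Composite built as a weighted sum; the weights are an assumption
demo$score <- 0.3 * demo$E_score + 0.3 * demo$S_score + 0.4 * demo$G_score

# Pairwise Pearson correlations
corr <- cor(demo, use = "pairwise.complete.obs")

# Reshape the matrix to long form (one row per pair) for plotting
corr_long <- as.data.frame(as.table(corr))
names(corr_long) <- c("var1", "var2", "r")

heatmap_plot <- ggplot(corr_long, aes(x = var1, y = var2, fill = r)) +
  geom_tile() +
  geom_text(aes(label = round(r, 2))) +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", limits = c(-1, 1)) +
  labs(title = "Correlation of ESG scores", x = NULL, y = NULL) +
  theme_minimal()
print(heatmap_plot)
```

In this simulated setting the composite correlates with each sub-score by construction, while the independently drawn sub-scores are nearly uncorrelated with each other.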
Then I aggregate the data by industry, calculate each industry’s average rate of return and the average of each score in each year, and plot the annual curves. The results are shown in the following set of graphs. The five charts differ clearly, not only in trend but also in how the industries rank on each indicator.
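The aggregation step can be sketched with `group_by` and `summarise`. The toy panel below is illustrative only. The output names `industry_scores` and `mean_score` follow the plotting code later in this section, while `mean_return` is a hypothetical name.

```r
library(dplyr)

# Toy panel standing in for merged_data (values are illustrative only)
toy <- data.frame(
  industry    = c(1L, 1L, 1L, 1L, 2L, 2L),
  year        = c(2010L, 2010L, 2011L, 2011L, 2010L, 2011L),
  year_return = c(0.10, 0.20, -0.05, 0.05, 0.00, 0.30),
  G_score     = c(90, 80, 85, 75, 70, 72)
)

# One row per industry and year with the mean return and mean governance score
industry_scores <- toy %>%
  group_by(industry, year) %>%
  summarise(
    mean_return = mean(year_return, na.rm = TRUE),
    mean_score  = mean(G_score, na.rm = TRUE),
    .groups = "drop"  # return an ungrouped result and avoid the grouping message
  )
print(industry_scores)
```

Swapping `G_score` for `score`, `E_score` or `S_score` gives the series for the other charts.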
Before analyzing the results, I should explain the industry classification. In the table, each industry is represented by a numeric code for easy classification: 1 to 6 stand for the six major industries Finance, Utilities, Properties, Conglomerates, Industrials and Commerce, respectively.
The returns of the six industries stay almost identical over most of the period, and only by the end of 2021 do the industries diverge. On the composite score, Finance and Properties score higher, while Conglomerates and Utilities have the lowest overall scores, which is likely related to the nature of their business and operating environment. Looking at the individual scores, the E score rises year by year, which suggests that listed companies pay more and more attention to environmental governance, while the G score declines year by year until 2019, possibly because it started at a very high level and its maintenance and improvement were later neglected.
The EDA above reveals some relationships between the ESG scores and industry, time and market variables. The next part uses regression analysis on the scores and the market data to look for more meaningful results.
line_chart <- ggplot(industry_scores, aes(x = year, y = mean_score, color = factor(industry))) +
  geom_line(size = 1.5, alpha = 0.8) +
  labs(title = "Mean Governance Score by Industry", x = "Year", y = "Mean Governance Score", color = "Industry") +
  scale_color_manual(values = industry_colors, labels = industry_labels) +
  theme_minimal() +
  theme(
    legend.position = "right",
    plot.title = element_text(size = 16, hjust = 0.5, face = "bold"),
    axis.title.x = element_text(size = 14, face = "bold"),
    axis.title.y = element_text(size = 14, face = "bold"),
    axis.text = element_text(size = 12),
    panel.grid.major = element_line(color = "lightgray"),
    panel.grid.minor = element_blank()
  )
print(line_chart)
Regression Analysis
As mentioned above, I used the four scores as independent variables and ran a multiple linear regression on the annual return, the trading volume and the circulating market value in turn. The output for the circulating market value is shown below:
Call:
lm(formula = circulation_value ~ score + E_score + S_score +
    G_score, data = merged_data)

Residuals:
       Min         1Q     Median         3Q        Max
 -52715366  -15682548   -8388972     268318 2556818573

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -117717402    6032807 -19.513  < 2e-16 ***
score          2727645     530095   5.146 2.69e-07 ***
E_score         210922     140343   1.503  0.13288
S_score        -487195     163024  -2.988  0.00281 **
G_score        -524989     250484  -2.096  0.03610 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 69200000 on 21307 degrees of freedom
Multiple R-squared: 0.02481, Adjusted R-squared: 0.02463
F-statistic: 135.5 on 4 and 21307 DF, p-value: < 2.2e-16
For the annual return, the adjusted R-squared is only 0.0032, so the independent variables explain almost none of its variation. In the significance tests, the p-value of E_score is below 0.05 (p = 0.0221), indicating that E_score may have a significant effect on the annual return. For trading volume, the p-values of all four independent variables are below 0.05, indicating that score, E_score, S_score and G_score may all have significant effects. For the circulating market value, score (p = 2.69e-07) and S_score (p = 0.00281) are significant at the 1% level and G_score (p = 0.0361) at the 5% level, while E_score is not significant (p = 0.133); the adjusted R-squared of this model is 0.0246.
In general, ESG ratings show little relationship with returns but a statistically significant one with trading volume and circulating market value. Since returns are determined by a variety of factors, this is not surprising. The results suggest that traders take the ESG ratings into account and adjust how much they trade individual stocks accordingly.
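The regressions discussed in this section could be fitted in one pass by building each formula with `reformulate`. The sketch below uses simulated data, so its coefficients bear no relation to the report's results. The composite score is generated as a noisy weighted sum, which is an assumption made only so that the design matrix is not perfectly collinear. The names `demo`, `responses`, `fits` and `adj_r2` are hypothetical.

```r
# Simulated data standing in for merged_data (not the report's data)
set.seed(42)
n <- 300
demo <- data.frame(
  E_score = rnorm(n, 60, 10),
  S_score = rnorm(n, 75, 8),
  G_score = rnorm(n, 85, 6)
)
# Noisy weighted composite; weights and noise level are assumptions
demo$score <- 0.3 * demo$E_score + 0.3 * demo$S_score +
  0.4 * demo$G_score + rnorm(n, 0, 2)
demo$year_return       <- rnorm(n, 0.05, 0.3)
demo$volume            <- exp(rnorm(n, 20, 1))
demo$circulation_value <- 1e6 * demo$score + rnorm(n, 0, 1e7)

predictors <- c("score", "E_score", "S_score", "G_score")
responses  <- c("year_return", "volume", "circulation_value")

# Fit one linear model per dependent variable
fits <- lapply(responses, function(y) {
  lm(reformulate(predictors, response = y), data = demo)
})
names(fits) <- responses

# Adjusted R-squared of each model
adj_r2 <- sapply(fits, function(m) summary(m)$adj.r.squared)
print(adj_r2)

# Coefficient table (estimate, std. error, t value, p-value) for one model
print(coef(summary(fits$circulation_value)))
```

Collecting the adjusted R-squared values side by side makes it easy to compare how much variation the scores explain for each dependent variable before reading the individual p-values.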
Target Audience and Conclusion
The target audience for this report includes:
Individuals or institutions seeking to make informed investment decisions, taking into account ESG criteria and their potential impact on financial performance.
Financial analysts who analyze stock market trends, evaluate company performance, and advise investors on portfolio management strategies.
Business leaders interested in understanding how ESG performance affects company valuations, investor perceptions, and access to capital, as well as government officials and regulators concerned with promoting sustainable economic development, enhancing corporate transparency, and fostering responsible business practices.
Scholars who study the intersection of finance, sustainability, and corporate governance, seeking empirical relationships between ESG factors and market outcomes.
In summary, while the influence of ESG ratings on annual stock returns was limited, they significantly affected trading volume and market capitalization. This suggests that investors and traders consider ESG factors when making investment decisions, potentially leading to changes in trading behavior.
In conclusion, this project explored the relationship between ESG ratings and stock market performance through data collection and analysis. The topic matters to sustainable finance because understanding how ESG factors correlate with stock market performance can inform investment decisions and encourage long-term investment in sustainable economic activities and projects. Returns depend on many factors and show little relationship with ESG ratings, whereas trading volume and circulating market value are significantly related to them, which suggests that traders consult the ESG ratings and adjust their trading strategies accordingly.
Overall, the project provides insights into the relationship between ESG ratings and stock market performance, contributing to the understanding of sustainable finance and responsible investment practices.