final_Linyi

Exploratory Data Analysis

The question I want to answer is how green finance has affected green development in China. Green development here covers two things: the performance of polluting industries and energy consumption.

library(here)
here() starts at /Users/zhenglinyi/Desktop/24 spring/sustainable finance/final paper
library(ggplot2)
library(readr)
# Load necessary libraries
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ stringr   1.5.1
✔ forcats   1.0.0     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Load the datasets
energy_data <- here("03_data_processed", "China_energy.csv") |> 
  read_csv()
Rows: 200 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): indicator
dbl (2): year, value

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
environment_data <-  here("03_data_processed", "China_environment.csv") |> 
  read_csv()
Rows: 80 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): indicator
dbl (2): year, value

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
performance_data <- here("03_data_processed", "China_performance.csv") |> 
  read_csv()
Rows: 672 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): pfmc_name, indicator
dbl (2): year, value

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Inspect the structure of the datasets
str(energy_data)
spc_tbl_ [200 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ indicator: chr [1:200] "Growth Rate of GDP (%)" "Growth Rate of GDP (%)" "Growth Rate of GDP (%)" "Growth Rate of GDP (%)" ...
 $ year     : num [1:200] 2002 2003 2004 2005 2006 ...
 $ value    : num [1:200] 9.1 10 10.1 11.4 12.7 14.2 9.7 9.4 10.6 9.6 ...
 - attr(*, "spec")=
  .. cols(
  ..   indicator = col_character(),
  ..   year = col_double(),
  ..   value = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 
str(environment_data)
spc_tbl_ [80 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ indicator: chr [1:80] "Total Investment in Environmental Pollution Control (100 million yuan)" "Total Investment in Environmental Pollution Control (100 million yuan)" "Total Investment in Environmental Pollution Control (100 million yuan)" "Total Investment in Environmental Pollution Control (100 million yuan)" ...
 $ year     : num [1:80] 2002 2003 2004 2005 2006 ...
 $ value    : num [1:80] 1456 1750 2058 2565 2780 ...
 - attr(*, "spec")=
  .. cols(
  ..   indicator = col_character(),
  ..   year = col_double(),
  ..   value = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 
str(performance_data)
spc_tbl_ [672 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ pfmc_name: chr [1:672] "Rate of Return on Net Assets (%)" "Rate of Return on Net Assets (%)" "Rate of Return on Net Assets (%)" "Rate of Return on Net Assets (%)" ...
 $ indicator: chr [1:672] "Petroleum and Petrochemical Industry" "Petroleum and Petrochemical Industry" "Petroleum and Petrochemical Industry" "Petroleum and Petrochemical Industry" ...
 $ year     : num [1:672] 2008 2009 2010 2011 2012 ...
 $ value    : num [1:672] 11.9 6.5 1.2 6.77 6.84 4.24 5.08 2.38 -5.2 -5.3 ...
 - attr(*, "spec")=
  .. cols(
  ..   pfmc_name = col_character(),
  ..   indicator = col_character(),
  ..   year = col_double(),
  ..   value = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 
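# Linearly interpolate missing values within each indicator so the lines plot without gaps.
# approx() does not extrapolate by default, so leading or trailing NAs stay missing.
energy_data <- energy_data %>%
  group_by(indicator) %>%
  mutate(value = ifelse(is.na(value), approx(year, value, year)$y, value)) %>%
  ungroup()  # drop the grouping so later steps work on the whole table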
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `value = ifelse(is.na(value), approx(year, value, year)$y,
  value)`.
ℹ In group 7: `indicator = "Total Energy Consumption (tce/10,000 yuan)"`.
Caused by warning in `regularize.values()`:
! collapsing to unique 'x' values
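
The "collapsing to unique 'x' values" warning means approx() saw the same year more than once within an indicator. A minimal sketch of how one might locate such rows, using a toy table in place of energy_data (the names toy_energy and dup_years and all values are hypothetical):

```r
library(dplyr)

# Toy stand-in for energy_data (hypothetical values). Indicator "B" has
# two rows for 2003, the situation that makes approx() collapse x values.
toy_energy <- tibble(
  indicator = c(rep("A", 3), rep("B", 4)),
  year      = c(2002, 2003, 2004, 2002, 2003, 2003, 2004),
  value     = c(1, NA, 3, 10, 11, 13, NA)
)

# List every indicator-year pair that appears more than once
dup_years <- toy_energy |>
  count(indicator, year, name = "n_rows") |>
  filter(n_rows > 1)

print(dup_years)
```

If the same check on the real data returns rows, the duplicates likely need to be resolved (for example, two different series sharing one indicator label) before interpolating.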
# Plot the energy data, this time with the missing values interpolated
ggplot(energy_data, aes(x = year, y = value, color = indicator)) +
  geom_line() +
  labs(title = "Trends in Energy Production and Consumption", x = "Year", y = "Value")
Warning: Removed 4 rows containing missing values (`geom_line()`).
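
The plot above puts every indicator on one y-axis, although the indicators have different units (percentages, per capita quantities, tce per 10,000 yuan), so small-valued series can look flat. One possible alternative, sketched here on toy data, is to give each indicator its own panel and scale. The GDP growth values are the first four printed by str() above; the second series and its name are hypothetical.

```r
library(ggplot2)

# Toy stand-in for energy_data with two indicators on very different scales
toy_energy <- data.frame(
  indicator = rep(c("Growth Rate of GDP (%)",
                    "Hypothetical per capita series"), each = 4),
  year  = rep(2002:2005, times = 2),
  value = c(9.1, 10, 10.1, 11.4, 1200, 1400, 1650, 1900)
)

# One panel per indicator, each with its own y-axis range
p <- ggplot(toy_energy, aes(x = year, y = value)) +
  geom_line() +
  facet_wrap(~ indicator, scales = "free_y", ncol = 1) +
  labs(title = "Trends in Energy Production and Consumption",
       x = "Year", y = "Value (units differ by panel)")

print(p)
```

With free y scales, a trend within one indicator stays visible even when another indicator is orders of magnitude larger.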

# Impute missing values with median
environment_data_imputed <- environment_data %>%
  group_by(indicator) %>%
  mutate(value = ifelse(is.na(value), median(value, na.rm = TRUE), value))
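
Median imputation can distort a series that trends over time, because the median of a rising series sits well below its most recent values. A small sketch comparing the two fills on a hypothetical series (toy_env and all values are made up for illustration), while keeping a flag for the imputed rows:

```r
library(dplyr)

# Hypothetical rising series with one gap in 2005
toy_env <- tibble(
  indicator = "Hypothetical investment series",
  year      = 2002:2006,
  value     = c(100, 150, 200, NA, 800)
)

toy_env_imputed <- toy_env |>
  group_by(indicator) |>
  mutate(
    imputed     = is.na(value),                                   # remember which rows were filled
    median_fill = if_else(imputed, median(value, na.rm = TRUE), value),
    linear_fill = approx(year, value, xout = year)$y              # linear interpolation, for comparison
  ) |>
  ungroup()

print(toy_env_imputed)
# 2005: median_fill = 175, linear_fill = 500
```

Keeping the imputed flag makes it possible to mark filled points in a plot, so an imputed dip is not read as a real change.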

# Create the stacked area chart with the imputed data
environment_data_imputed %>%
  ggplot(aes(x = year, y = value, fill = indicator)) +
  geom_area(position = 'stack') +
  theme_minimal() +
  labs(title = "Investment in Environmental Pollution Control Over Time",
       x = "Year",
       y = "Investment (100 million yuan)",
       fill = "Indicator")

# Analyze performance metrics in a specific industry, e.g., Power Generation Industry
performance_data %>%
  filter(indicator == "Power Generation Industry") %>%
  ggplot(aes(x = year, y = value, color = pfmc_name)) +
  geom_line() +
  labs(title = "Performance Metrics in Power Generation Industry",
       x = "Year",
       y = "Performance Value")

  1. Trends in Energy Production and Consumption:

    Increasing Energy Production and Consumption per Capita: Both series rise over the period, which indicates that the energy sector is expanding. Green finance may be contributing by funding new energy resources, possibly including renewable energy, although the plot alone cannot separate its effect from general economic growth.

    Stable Growth Rates: The growth rates of energy consumption and production appear broadly stable on this scale. The fluctuations that remain might reflect changes in investment focus or economic conditions. Consistent or increasing investment in green energy could dampen extreme fluctuations if it leads to a stable supply of renewable energy.

    Flat Total Efficiency: The total efficiency series is nearly flat. This suggests that, despite potential investments in green technology, significant gains in energy efficiency have not yet been realized, or that the measure does not capture efficiency improvements from green finance initiatives.

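    Flat Total Energy Consumption: This indicator is reported in tce per 10,000 yuan, so it measures energy use per unit of output rather than a total or per capita figure. A flat line could indicate that increased efficiency or shifts to renewable sources are balancing out increases in per capita consumption. It may also be an artifact of the plot, which draws indicators with different units on a single y-axis.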

  2. Investment in Environmental Pollution Control Over Time:

    Increasing Investment: The growth in investment in environmental pollution control suggests that there is a focus on sustainability, potentially influenced by green finance. This could reflect investments in cleaner production technologies, pollution control measures, or environmental restoration projects.

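    Sharp Decline in the Last Year: A sharp decrease may indicate a change in policy, a reduction in available green finance, or external economic factors affecting investment. It may also be partly an artifact of the imputation: missing values were replaced with each indicator's median, which sits below recent values in a rising series.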

  3. Performance Metrics in Power Generation Industry:

    Variable Profitability Metrics: The volatility in profitability and return on investment metrics could reflect the challenges the power generation industry faces in transitioning to green technology. Initial investments in green development may not yield immediate financial returns, but returns can be expected to improve over time.

    Flat Technological Investment Ratio: If green finance is directed toward technology, a flat investment ratio might suggest that such investments are not keeping pace with the growth of the industry or that other areas are being prioritized.
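
Readings such as "increasing investment" or "sharp decline in the last year" can be checked numerically with a year-over-year percentage change. A minimal sketch, assuming a table shaped like environment_data; the five values are the ones printed by str() above (possibly rounded), and the names toy_investment, yoy and pct_change are illustrative:

```r
library(dplyr)

# First five values of the pollution-control investment series,
# as printed by str() above (possibly rounded)
toy_investment <- tibble(
  indicator = "Total Investment in Environmental Pollution Control (100 million yuan)",
  year      = 2002:2006,
  value     = c(1456, 1750, 2058, 2565, 2780)
)

# Percentage change from the previous year, computed within each indicator
yoy <- toy_investment |>
  group_by(indicator) |>
  arrange(year, .by_group = TRUE) |>
  mutate(pct_change = 100 * (value / lag(value) - 1)) |>
  ungroup()

print(yoy)
```

The first year has no predecessor, so its pct_change is NA. Running the same calculation on the full series, before imputation, would show whether the final-year drop is present in the reported data.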

The increase in energy consumption and production per capita, together with the increasing investment in pollution control, can be read as signs of development. However, the lack of significant efficiency gains may point to a need for further investment or more effective deployment of green finance. While green finance may support growth in clean energy and pollution control, it can take time for investments to translate into observable efficiency improvements and performance gains in industries like power generation.