The first step is to install packages and load data in file raw_data in this data_cleaning file.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.0 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.1 ✔ tibble 3.1.8
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the ]8;;http://conflicted.r-lib.org/conflicted package]8;; to force all conflicts to become errors
library(dplyr)
library(here)
## Warning: package 'here' was built under R version 4.2.3
## here() starts at C:/Users/Runnan/Documents/sustainable finance/Final Project
library(readxl)
## Warning: package 'readxl' was built under R version 4.2.3
library(qmd)
## Warning: package 'qmd' was built under R version 4.2.3
library(readxl)
#this step is to load data_list I saved in the first file to this file so that I can directly use data sets without importing them again#
print("Reading data_list.rds file")
## [1] "Reading data_list.rds file"
data_list <- readRDS("C:/Users/Runnan/Documents/sustainable finance/Final Project/data_list.rds")
# To check if the names are correct#
names(data_list)
## [1] "Historical_GHG_Emission.xlsx"
## [2] "Electricity.xlsx"
## [3] "Energy_use"
## [4] "Infrastructure_Economic_Accounts.xlsx"
## [5] "Population.xlsx"
## [6] "PPPGDP.xlsx"
## [7] "Supply_Demand_NG.xlsx"
data_list <- readRDS("C:/Users/Runnan/Documents/sustainable finance/Final Project/data_list.rds")
print(data_list)
## $Historical_GHG_Emission.xlsx
## # A tibble: 14 × 34
## Country Source Sector Gas `1990` `1991` `1992` `1993` `1994` `1995` `1996`
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CAN Climat… Total… All … 540. 536. 551. 547. 565. 580. 598.
## 2 CAN Climat… Total… All … 607. 603. 617. 613. 631. 646. 668.
## 3 CAN Climat… Energy All … 446. 442. 456. 454. 469. 482. 497.
## 4 CAN Climat… Indus… All … 29.1 28.5 26.8 24.9 27.2 27.1 27.6
## 5 CAN Climat… Agric… All … 47.7 48.1 49.3 49.8 49.9 52.6 55.0
## 6 CAN Climat… Waste All … 17.7 18.0 18.2 18.4 18.4 18.4 18.3
## 7 CAN Climat… Land-… All … 66.5 66.5 66.5 66.5 66.5 66.8 70.8
## 8 CAN Climat… Bunke… All … 5.68 5.53 5.82 5.27 5.60 5.86 6.11
## 9 CAN Climat… Elect… All … 139. 139. 146. 138. 143. 148. 149.
## 10 CAN Climat… Build… All … 64.6 64.1 65.9 69.1 70.3 69.3 74.8
## 11 CAN Climat… Manuf… All … 77.0 75.0 74.0 73.9 77.2 78.3 81.8
## 12 CAN Climat… Trans… All … 129. 125. 129. 131. 137. 141. 144.
## 13 CAN Climat… Other… All … 7.38 7.43 8.03 8.38 8.47 8.98 9.54
## 14 CAN Climat… Fugit… All … 29.7 31.2 33.0 33.1 33.2 35.9 36.8
## # … with 23 more variables: `1997` <dbl>, `1998` <dbl>, `1999` <dbl>,
## # `2000` <dbl>, `2001` <dbl>, `2002` <dbl>, `2003` <dbl>, `2004` <dbl>,
## # `2005` <dbl>, `2006` <dbl>, `2007` <dbl>, `2008` <dbl>, `2009` <dbl>,
## # `2010` <dbl>, `2011` <dbl>, `2012` <dbl>, `2013` <dbl>, `2014` <dbl>,
## # `2015` <dbl>, `2016` <dbl>, `2017` <dbl>, `2018` <dbl>, `2019` <dbl>
##
## $Electricity.xlsx
## # A tibble: 3,809 × 16
## REF_DATE GEO DGUID Source Fuel …¹ UOM UOM_ID SCALA…² SCALA…³ VECTOR
## <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <dbl> <chr>
## 1 1995 Canada 2016A0000… Total… Total … Giga… 146 units 0 v4171…
## 2 1995 Canada 2016A0000… Total… Coal Giga… 146 units 0 v4171…
## 3 1995 Canada 2016A0000… Total… Natura… Giga… 146 units 0 v4171…
## 4 1995 Canada 2016A0000… Total… Diesel… Giga… 146 units 0 v4171…
## 5 1995 Canada 2016A0000… Total… Heavy … Giga… 146 units 0 v4171…
## 6 1995 Canada 2016A0000… Total… Wood a… Giga… 146 units 0 v4171…
## 7 1995 Canada 2016A0000… Total… Other … Giga… 146 units 0 v4171…
## 8 1995 Canada 2016A0000… Utili… Total … Giga… 146 units 0 v4171…
## 9 1995 Canada 2016A0000… Utili… Coal Giga… 146 units 0 v4171…
## 10 1995 Canada 2016A0000… Utili… Natura… Giga… 146 units 0 v4171…
## # … with 3,799 more rows, 6 more variables: COORDINATE <chr>, VALUE <dbl>,
## # STATUS <chr>, SYMBOL <lgl>, TERMINATED <lgl>, DECIMALS <dbl>, and
## # abbreviated variable names ¹`Fuel type`, ²SCALAR_FACTOR, ³SCALAR_ID
##
## $Energy_use
## # A tibble: 240 × 16
## REF_DATE GEO DGUID Physi…¹ Goods…² UOM UOM_ID SCALA…³ SCALA…⁴ VECTOR
## <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <dbl> <chr>
## 1 2012 Canada 2021A000… Direct… Total,… Tera… 283 units 0 v1456…
## 2 2012 Canada 2021A000… Direct… Total,… Tera… 283 units 0 v1456…
## 3 2012 Canada 2021A000… Direct… Total,… Tera… 283 units 0 v1456…
## 4 2012 Canada 2021A000… Direct… Clean … Tera… 283 units 0 v1456…
## 5 2012 Canada 2021A000… Direct… Electr… Tera… 283 units 0 v1456…
## 6 2012 Canada 2021A000… Direct… Electr… Tera… 283 units 0 v1456…
## 7 2012 Canada 2021A000… Direct… Biofue… Tera… 283 units 0 v1456…
## 8 2012 Canada 2021A000… Direct… Waste … Tera… 283 units 0 v1456…
## 9 2012 Canada 2021A000… Direct… Waste … Tera… 283 units 0 v1456…
## 10 2012 Canada 2021A000… Direct… Total,… Tera… 283 units 0 v1456…
## # … with 230 more rows, 6 more variables: COORDINATE <chr>, VALUE <dbl>,
## # STATUS <lgl>, SYMBOL <lgl>, TERMINATED <lgl>, DECIMALS <dbl>, and
## # abbreviated variable names ¹`Physical flow`,
## # ²`Goods and services (products)`, ³SCALAR_FACTOR, ⁴SCALAR_ID
##
## $Infrastructure_Economic_Accounts.xlsx
## # A tibble: 100,800 × 17
## REF_DATE GEO DGUID Estim…¹ Impact Asset UOM UOM_ID SCALA…² SCALA…³ VECTOR
## <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <dbl> <chr>
## 1 2009 Cana… 2016… Greenh… Total… Tota… Tonn… 287 thousa… 3 v1273…
## 2 2009 Cana… 2016… Greenh… Total… Comm… Tonn… 287 thousa… 3 v1273…
## 3 2009 Cana… 2016… Greenh… Total… Spor… Tonn… 287 thousa… 3 v1273…
## 4 2009 Cana… 2016… Greenh… Total… Indo… Tonn… 287 thousa… 3 v1273…
## 5 2009 Cana… 2016… Greenh… Total… Stud… Tonn… 287 thousa… 3 v1273…
## 6 2009 Cana… 2016… Greenh… Total… Airp… Tonn… 287 thousa… 3 v1273…
## 7 2009 Cana… 2016… Greenh… Total… Comm… Tonn… 287 thousa… 3 v1273…
## 8 2009 Cana… 2016… Greenh… Total… Inst… Tonn… 287 thousa… 3 v1273…
## 9 2009 Cana… 2016… Greenh… Total… Scho… Tonn… 287 thousa… 3 v1273…
## 10 2009 Cana… 2016… Greenh… Total… Hosp… Tonn… 287 thousa… 3 v1273…
## # … with 100,790 more rows, 6 more variables: COORDINATE <chr>, VALUE <dbl>,
## # STATUS <lgl>, SYMBOL <lgl>, TERMINATED <lgl>, DECIMALS <dbl>, and
## # abbreviated variable names ¹Estimate, ²SCALAR_FACTOR, ³SCALAR_ID
##
## $Population.xlsx
## # A tibble: 266 × 36
## Country N…¹ Count…² Indic…³ Indic…⁴ `1990` `1991` `1992` `1993` `1994` `1995`
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Aruba ABW Popula… SP.POP… 6.57e4 6.79e4 7.02e4 7.24e4 7.47e4 7.70e4
## 2 Africa Eas… AFE Popula… SP.POP… 3.10e8 3.19e8 3.27e8 3.36e8 3.44e8 3.53e8
## 3 Afghanistan AFG Popula… SP.POP… 1.07e7 1.07e7 1.21e7 1.40e7 1.55e7 1.64e7
## 4 Africa Wes… AFW Popula… SP.POP… 2.07e8 2.12e8 2.18e8 2.24e8 2.30e8 2.36e8
## 5 Angola AGO Popula… SP.POP… 1.18e7 1.22e7 1.26e7 1.30e7 1.35e7 1.39e7
## 6 Albania ALB Popula… SP.POP… 3.29e6 3.27e6 3.25e6 3.23e6 3.21e6 3.19e6
## 7 Andorra AND Popula… SP.POP… 5.36e4 5.54e4 5.73e4 5.92e4 6.10e4 6.29e4
## 8 Arab World ARB Popula… SP.POP… 2.26e8 2.31e8 2.37e8 2.43e8 2.50e8 2.56e8
## 9 United Ara… ARE Popula… SP.POP… 1.90e6 2.01e6 2.12e6 2.22e6 2.33e6 2.43e6
## 10 Argentina ARG Popula… SP.POP… 3.26e7 3.31e7 3.36e7 3.40e7 3.45e7 3.49e7
## # … with 256 more rows, 26 more variables: `1996` <dbl>, `1997` <dbl>,
## # `1998` <dbl>, `1999` <dbl>, `2000` <dbl>, `2001` <dbl>, `2002` <dbl>,
## # `2003` <dbl>, `2004` <dbl>, `2005` <dbl>, `2006` <dbl>, `2007` <dbl>,
## # `2008` <dbl>, `2009` <dbl>, `2010` <dbl>, `2011` <dbl>, `2012` <dbl>,
## # `2013` <dbl>, `2014` <dbl>, `2015` <dbl>, `2016` <dbl>, `2017` <dbl>,
## # `2018` <dbl>, `2019` <dbl>, `2020` <dbl>, `2021` <dbl>, and abbreviated
## # variable names ¹`Country Name`, ²`Country Code`, ³`Indicator Name`, …
##
## $PPPGDP.xlsx
## # A tibble: 269 × 33
## Data Sour…¹ ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11 ...12
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Last Updat… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 Country Na… 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000
## 4 Aruba 1363… 1522… 1648… 1810… 2001… 2095… 2158… 2350… 2424… 2489… 2739…
## 5 Africa Eas… 5624… 5821… 5843… 5973… 6217… 6632… 7125… 7562… 7789… 8123… 8596…
## 6 Afghanistan <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 7 Africa Wes… 3614… 3784… 3969… 4022… 4105… 4281… 4564… 4845… 5076… 5230… 5541…
## 8 Angola 3866… 4036… 3887… 3025… 3131… 3676… 4251… 4638… 4911… 5089… 5363…
## 9 Albania 8379… 6237… 5920… 6640… 7345… 8498… 9442… 8556… 9451… 1079… 1192…
## 10 Andorra <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## # … with 259 more rows, 21 more variables: ...13 <chr>, ...14 <chr>,
## # ...15 <chr>, ...16 <chr>, ...17 <chr>, ...18 <chr>, ...19 <chr>,
## # ...20 <chr>, ...21 <chr>, ...22 <chr>, ...23 <chr>, ...24 <chr>,
## # ...25 <chr>, ...26 <chr>, ...27 <chr>, ...28 <chr>, ...29 <chr>,
## # ...30 <chr>, ...31 <chr>, ...32 <chr>, ...33 <chr>, and abbreviated
## # variable name ¹`Data Source`
##
## $Supply_Demand_NG.xlsx
## # A tibble: 52,927 × 17
## REF_DATE GEO DGUID Fuel …¹ Sector Measu…² UOM UOM_ID SCALA…³ SCALA…⁴
## <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <dbl>
## 1 1995 Canada 2016A000… Gas pl… Produ… Megali… Mega… 208 units 0
## 2 1995 Canada 2016A000… Gas pl… Produ… Terajo… Tera… 283 units 0
## 3 1995 Canada 2016A000… Gas pl… Expor… Megali… Mega… 208 units 0
## 4 1995 Canada 2016A000… Gas pl… Expor… Terajo… Tera… 283 units 0
## 5 1995 Canada 2016A000… Gas pl… Impor… Megali… Mega… 208 units 0
## 6 1995 Canada 2016A000… Gas pl… Impor… Terajo… Tera… 283 units 0
## 7 1995 Canada 2016A000… Gas pl… Inter… Megali… Mega… 208 units 0
## 8 1995 Canada 2016A000… Gas pl… Inter… Terajo… Tera… 283 units 0
## 9 1995 Canada 2016A000… Gas pl… Stock… Megali… Mega… 208 units 0
## 10 1995 Canada 2016A000… Gas pl… Stock… Terajo… Tera… 283 units 0
## # … with 52,917 more rows, 7 more variables: VECTOR <chr>, COORDINATE <chr>,
## # VALUE <dbl>, STATUS <chr>, SYMBOL <lgl>, TERMINATED <lgl>, DECIMALS <dbl>,
## # and abbreviated variable names ¹`Fuel type`, ²Measures, ³SCALAR_FACTOR,
## # ⁴SCALAR_ID
# An example to show that the data in the first file has been correctly load in this one#
print(data_list$Historical_GHG_Emission)
## # A tibble: 14 × 34
## Country Source Sector Gas `1990` `1991` `1992` `1993` `1994` `1995` `1996`
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CAN Climat… Total… All … 540. 536. 551. 547. 565. 580. 598.
## 2 CAN Climat… Total… All … 607. 603. 617. 613. 631. 646. 668.
## 3 CAN Climat… Energy All … 446. 442. 456. 454. 469. 482. 497.
## 4 CAN Climat… Indus… All … 29.1 28.5 26.8 24.9 27.2 27.1 27.6
## 5 CAN Climat… Agric… All … 47.7 48.1 49.3 49.8 49.9 52.6 55.0
## 6 CAN Climat… Waste All … 17.7 18.0 18.2 18.4 18.4 18.4 18.3
## 7 CAN Climat… Land-… All … 66.5 66.5 66.5 66.5 66.5 66.8 70.8
## 8 CAN Climat… Bunke… All … 5.68 5.53 5.82 5.27 5.60 5.86 6.11
## 9 CAN Climat… Elect… All … 139. 139. 146. 138. 143. 148. 149.
## 10 CAN Climat… Build… All … 64.6 64.1 65.9 69.1 70.3 69.3 74.8
## 11 CAN Climat… Manuf… All … 77.0 75.0 74.0 73.9 77.2 78.3 81.8
## 12 CAN Climat… Trans… All … 129. 125. 129. 131. 137. 141. 144.
## 13 CAN Climat… Other… All … 7.38 7.43 8.03 8.38 8.47 8.98 9.54
## 14 CAN Climat… Fugit… All … 29.7 31.2 33.0 33.1 33.2 35.9 36.8
## # … with 23 more variables: `1997` <dbl>, `1998` <dbl>, `1999` <dbl>,
## # `2000` <dbl>, `2001` <dbl>, `2002` <dbl>, `2003` <dbl>, `2004` <dbl>,
## # `2005` <dbl>, `2006` <dbl>, `2007` <dbl>, `2008` <dbl>, `2009` <dbl>,
## # `2010` <dbl>, `2011` <dbl>, `2012` <dbl>, `2013` <dbl>, `2014` <dbl>,
## # `2015` <dbl>, `2016` <dbl>, `2017` <dbl>, `2018` <dbl>, `2019` <dbl>
For this project, there are currently 7 data sets. Some of them are global data, others are country data only for Canada. The first step is to adjust different data sets into a similar format so that it is easier to compare or joint for future calculation.
Historical_GHG_Emission is a country level data, it contains the historical GHG emission in Canada from year 1990 to 2019. Right now, it is a width data, so it has to be converted to pivot long format. Also, the data set contains total GHG emission with and without LUCF, as the purpose for this project is to identify the energy use and emissions, I delete the inclusive portion of GHG emissions and other titles which are unnecessary. The table “Emissions_by_sector” shows the GHG emissions of different industries in Canada from 1990 to 2019. The table “Emissions_by_sector_long” is the data converted to pivot long format.
Emissions_by_sector<-data_list$Historical_GHG_Emission[-c(2),-c(1,2,4)]
Emissions_by_sector_long<-Emissions_by_sector%>%
pivot_longer(cols=-c(Sector),names_to="Year", values_to="Emissions")
print(Emissions_by_sector_long)
## # A tibble: 390 × 3
## Sector Year Emissions
## <chr> <chr> <dbl>
## 1 Total excluding LUCF 1990 540.
## 2 Total excluding LUCF 1991 536.
## 3 Total excluding LUCF 1992 551.
## 4 Total excluding LUCF 1993 547.
## 5 Total excluding LUCF 1994 565.
## 6 Total excluding LUCF 1995 580.
## 7 Total excluding LUCF 1996 598.
## 8 Total excluding LUCF 1997 612.
## 9 Total excluding LUCF 1998 620.
## 10 Total excluding LUCF 1999 625.
## # … with 380 more rows
The table “Electricity” shows how much electricity generated from fossil fuels. The data is divided into different provinces from 1995 to 2002. The data is not up-to-date, so I might change or add any other data that is more relevant to the present. The data is already pivot long format, so I only need to delete columns that are providng unrelated information, such as “vector, ID” and change the column titles.
#delete the unrelated columns and rename"
Electricity_generated <- data_list$Electricity [-c(3,7,8,9,10,11,13,14,15,16)]%>%
rename("Year"="REF_DATE")
print(Electricity_generated)
## # A tibble: 3,809 × 6
## Year GEO Source `Fuel type` UOM VALUE
## <dbl> <chr> <chr> <chr> <chr> <dbl>
## 1 1995 Canada Total industry and utilities Total all fuels Giga… 1.18e5
## 2 1995 Canada Total industry and utilities Coal Giga… 8.52e4
## 3 1995 Canada Total industry and utilities Natural gas Giga… 1.86e4
## 4 1995 Canada Total industry and utilities Diesel and light fuel… Giga… 2.41e3
## 5 1995 Canada Total industry and utilities Heavy fuel oil Giga… 3.45e3
## 6 1995 Canada Total industry and utilities Wood and spent pulpin… Giga… 2.69e3
## 7 1995 Canada Total industry and utilities Other fuels Giga… 5.41e3
## 8 1995 Canada Utilities Total all fuels Giga… 1.04e5
## 9 1995 Canada Utilities Coal Giga… 8.52e4
## 10 1995 Canada Utilities Natural gas Giga… 1.37e4
## # … with 3,799 more rows
Because there are too many categories, for the convenience of further calculation, I create some sub-data sets based on the regions and choose Ontario as an example to illustrate.
#use filter to modify this code to create subsets for every province#
Canada <- Electricity_generated %>% filter(GEO=="Canada")
Newfoundland_and_Labrador <- Electricity_generated %>% filter(GEO== "Newfoundland and Labrador")
Prince_Edward_Island <- Electricity_generated %>% filter (GEO == "Prince Edward Island")
Nova_Scotia <- Electricity_generated %>% filter (GEO == "Nova Scotia")
New_Brunswick <- Electricity_generated %>% filter (GEO == "New Brunswick")
Quebec <- Electricity_generated %>% filter (GEO=="Quebec")
Ontario <- Electricity_generated %>% filter (GEO == "Ontario")
Manitoba <- Electricity_generated %>% filter (GEO == "Manitoba")
Saskatchewan <- Electricity_generated %>% filter (GEO == "Saskatchewan")
Alberta <- Electricity_generated %>% filter (GEO == "Alberta")
British_Columbia <- Electricity_generated %>% filter (GEO == "British Columbia")
Yukon_Northwest_Territories_and_Nunavut <- Electricity_generated %>% filter (GEO=="Yukon, Northwest Territories and Nunavut")
print(Ontario)
## # A tibble: 368 × 6
## Year GEO Source `Fuel type` UOM VALUE
## <dbl> <chr> <chr> <chr> <chr> <dbl>
## 1 1995 Ontario Total industry and utilities Total all fuels Giga… 26143.
## 2 1995 Ontario Total industry and utilities Coal Giga… 16708.
## 3 1995 Ontario Total industry and utilities Natural gas Giga… 7836.
## 4 1995 Ontario Total industry and utilities Diesel and light fue… Giga… 177.
## 5 1995 Ontario Total industry and utilities Heavy fuel oil Giga… 128.
## 6 1995 Ontario Total industry and utilities Wood and spent pulpi… Giga… 488
## 7 1995 Ontario Total industry and utilities Other fuels Giga… 807.
## 8 1995 Ontario Utilities Total all fuels Giga… 22919.
## 9 1995 Ontario Utilities Coal Giga… 16692.
## 10 1995 Ontario Utilities Natural gas Giga… 6039
## # … with 358 more rows
In the future, if it is necessary to observe the contribution of different fuel types in each province to electricity generated, I create another type of sub data set . Taking Ontario as an example, I created sub-data set that shows Coal’s contribution to energy generated.
Ontario_fueltype <- Ontario %>% filter (`Fuel type`=="Coal")
print(Ontario_fueltype)
## # A tibble: 54 × 6
## Year GEO Source `Fuel type` UOM VALUE
## <dbl> <chr> <chr> <chr> <chr> <dbl>
## 1 1995 Ontario Total industry and utilities Coal Gigawatt hours 16708.
## 2 1995 Ontario Utilities Coal Gigawatt hours 16692.
## 3 1996 Ontario Total industry and utilities Coal Gigawatt hours 19001.
## 4 1996 Ontario Utilities Coal Gigawatt hours 18988.
## 5 1997 Ontario Total industry and utilities Coal Gigawatt hours 23654.
## 6 1997 Ontario Utilities Coal Gigawatt hours 23622.
## 7 1998 Ontario Total industry and utilities Coal Gigawatt hours 32845.
## 8 1998 Ontario Utilities Coal Gigawatt hours 32804
## 9 1999 Ontario Total industry and utilities Coal Gigawatt hours 33630.
## 10 1999 Ontario Utilities Coal Gigawatt hours 33618.
## # … with 44 more rows
Similar, for Table Energy_use, I delete the unrelated columns and change the REF_DATE to Year, and divide data into two separate data sets based on Physical flow. For future use, it would be easy to change the filter.
Besides, the table Energy_use shows the physical flows into different goods and services in Canada from 2012 to 2019, the physical flows are flows from direct energy use and direct GHG emissions.
Energy_use_CAN <- data_list$Energy_use[-c(2, 3, 6:11, 13:16)] %>%
rename(Year = REF_DATE, `Good and Services` = `Goods and services (products)`)
Energy_use_CAN_FLow <- Energy_use_CAN %>% filter(`Physical flow` == "Direct energy use")
print(Energy_use_CAN_FLow)
## # A tibble: 120 × 4
## Year `Physical flow` `Good and Services` VALUE
## <dbl> <chr> <chr> <dbl>
## 1 2012 Direct energy use Total, environmental and clean technology pro… 146990
## 2 2012 Direct energy use Total, environmental products 84670
## 3 2012 Direct energy use Total, environmental goods 49529
## 4 2012 Direct energy use Clean electricity 41635
## 5 2012 Direct energy use Electricity from nuclear 12029
## 6 2012 Direct energy use Electricity from renewable sources 29607
## 7 2012 Direct energy use Biofuels and primary goods 2131
## 8 2012 Direct energy use Waste and scrap goods 5763
## 9 2012 Direct energy use Waste management and remediation services 35141
## 10 2012 Direct energy use Total, clean technology products 62320
## # … with 110 more rows
In the table Infrastructure Economic Account, main indicators include GHG emissions as a result of production of infrastructure assets and greenhouses value per emis input proportion, focusing on the environmental aspect to aid in understanding the relationship between investment in infrastructure and the environment. Similar, delete the unrelated columns and rename.
Infrastructure_Economic_Accounts_CAN <- data_list$Infrastructure_Economic_Accounts[-c(3,7:12,14:17)]%>%
rename("Year"="REF_DATE")
print(Infrastructure_Economic_Accounts_CAN)
## # A tibble: 100,800 × 6
## Year GEO Estimate Impact Asset VALUE
## <dbl> <chr> <chr> <chr> <chr> <dbl>
## 1 2009 Canada Greenhouse gas emissions Total impact Total assets 21276
## 2 2009 Canada Greenhouse gas emissions Total impact Commercial buildings 752
## 3 2009 Canada Greenhouse gas emissions Total impact Sports facilities w… 181
## 4 2009 Canada Greenhouse gas emissions Total impact Indoor recreational… 506
## 5 2009 Canada Greenhouse gas emissions Total impact Student residences 1
## 6 2009 Canada Greenhouse gas emissions Total impact Airports and other … 55
## 7 2009 Canada Greenhouse gas emissions Total impact Communications buil… 9
## 8 2009 Canada Greenhouse gas emissions Total impact Institutional build… 3978
## 9 2009 Canada Greenhouse gas emissions Total impact Schools, colleges, … 1987
## 10 2009 Canada Greenhouse gas emissions Total impact Hospitals 1199
## # … with 100,790 more rows
Because the table Infrastructure Economic Accounts are divided by regions and assets as well as different ways of impact, I create sub data sets to display different regions and impact in different ways on different assets.
Ontario_Impact <- Infrastructure_Economic_Accounts_CAN %>% filter (GEO=="Ontario")
BC_Impact <- Infrastructure_Economic_Accounts_CAN %>% filter(GEO == "British Columbia")
Quebec_Impact <- Infrastructure_Economic_Accounts_CAN %>% filter(GEO == "Quebec")
Alberta_Impact <- Infrastructure_Economic_Accounts_CAN %>% filter(GEO == "Alberta")
Manitoba_Impact <- Infrastructure_Economic_Accounts_CAN %>% filter(GEO == "Manitoba")
NB_Impact <- Infrastructure_Economic_Accounts_CAN %>%filter(GEO == "New Brunswick")
NL_Impact <- Infrastructure_Economic_Accounts_CAN %>% filter(GEO == "Newfoundland and Labrador")
NWT_Impact <- Infrastructure_Economic_Accounts_CAN %>% filter(GEO == "Yukon, Northwest Territories and Nunavut")
NS_Impact <- Infrastructure_Economic_Accounts_CAN %>% filter(GEO == "Nova Scotia")
PEI_Impact <- Infrastructure_Economic_Accounts_CAN %>% filter(GEO == "Prince Edward Island")
Saskatchewan_Impact <- Infrastructure_Economic_Accounts_CAN %>% filter(GEO == "Saskatchewan")
For table Population, it is a global level, therefore, I delete other countries and only focus in Canada. Meanwhile, I convert this into pivot long format.
Population_CAN <- data_list$Population%>%
filter(`Country Name` == "Canada")%>%
select(-c(2:4))
Population_CAN_long <- Population_CAN %>%
pivot_longer(cols = -c('Country Name'),
names_to = "Year",
values_to = "Population") %>%
mutate(Year = as.numeric(str_replace(Year, "\\D", "")))
print(Population_CAN_long)
## # A tibble: 32 × 3
## `Country Name` Year Population
## <chr> <dbl> <dbl>
## 1 Canada 1990 27691138
## 2 Canada 1991 28037420
## 3 Canada 1992 28371264
## 4 Canada 1993 28684764
## 5 Canada 1994 29000663
## 6 Canada 1995 29302311
## 7 Canada 1996 29610218
## 8 Canada 1997 29905948
## 9 Canada 1998 30155173
## 10 Canada 1999 30401286
## # … with 22 more rows
PPPGDP table is also the global level. But this table does not have columns names, and the country name isn’t shown as a title. So I will need to delete the first two rows and then create names for each column and then convert them into pivot long format.
PPPGDP_CAN <- data_list$PPPGDP %>%
slice(-1:-2) %>%
setNames(.[1,]) %>%
slice(-1)
PPPGDP_CAN_long <- PPPGDP_CAN %>%
pivot_longer(cols = -c('Country Name'),
names_to = "Year",
values_to = "PPPGDP")
print(PPPGDP_CAN_long)
## # A tibble: 8,512 × 3
## `Country Name` Year PPPGDP
## <chr> <chr> <chr>
## 1 Aruba 1990 1363755808.7800636
## 2 Aruba 1991 1522141298.8258934
## 3 Aruba 1992 1648407757.6416767
## 4 Aruba 1993 1810796526.6005981
## 5 Aruba 1994 2001192963.822479
## 6 Aruba 1995 2095198112.5692353
## 7 Aruba 1996 2158860032.2951546
## 8 Aruba 1997 2350843673.1923804
## 9 Aruba 1998 2424658706.9520659
## 10 Aruba 1999 2489269108.5688787
## # … with 8,502 more rows
Finally, the Supply and Demand NG shows the supply and demand of natural gas liquids annually. There are various terms in the table, for example, production means the capture, extraction or manufacture of fuels or energy in forms that are ready for general use. Other detailed information can be found in here https://www150.statcan.gc.ca/n1/pub/57-003-x/57-003-x2023001-eng.htm Megalitres and Terajoules are two different units of measurement, and it is possible to convert between the two units. Therefore, it’s not meaningful to compare megalitres and only focus one unit. Here I choose megalitres.
Supply_Demand_NG_CAN <- data_list$Supply_Demand_NG[-c(3,7:12,14:17)] %>%
filter(Measures == "Megalitres") %>%
rename("Year"="REF_DATE")
print(Supply_Demand_NG_CAN)
## # A tibble: 26,485 × 6
## Year GEO `Fuel type` Sector Measu…¹ VALUE
## <dbl> <chr> <chr> <chr> <chr> <dbl>
## 1 1995 Canada Gas plant natural gas liquids (NGL's) Production Megali… 25121.
## 2 1995 Canada Gas plant natural gas liquids (NGL's) Exports Megali… 9958.
## 3 1995 Canada Gas plant natural gas liquids (NGL's) Imports Megali… 463
## 4 1995 Canada Gas plant natural gas liquids (NGL's) Inter-regi… Megali… 0
## 5 1995 Canada Gas plant natural gas liquids (NGL's) Stock vari… Megali… 665.
## 6 1995 Canada Gas plant natural gas liquids (NGL's) Inter-prod… Megali… NA
## 7 1995 Canada Gas plant natural gas liquids (NGL's) Other adju… Megali… 4343.
## 8 1995 Canada Gas plant natural gas liquids (NGL's) Availabili… Megali… 19304.
## 9 1995 Canada Gas plant natural gas liquids (NGL's) Transforme… Megali… 2962.
## 10 1995 Canada Gas plant natural gas liquids (NGL's) Net refine… Megali… 1701.
## # … with 26,475 more rows, and abbreviated variable name ¹Measures
Next, I would like to know how many variable or unique values there are in the Sector column and in the Fuel type column.
n_sectors <- length(unique(Supply_Demand_NG_CAN$Sector))
print(n_sectors)
## [1] 33
n_fueltype <- length(unique(Supply_Demand_NG_CAN$`Fuel type`))
print(n_fueltype)
## [1] 4