1. Description of Dataset

##  [1] "Year"                                             
##  [2] "Ontario GHG ID"                                   
##  [3] "Facility Owner"                                   
##  [4] "Facility Name"                                    
##  [5] "Facility City"                                    
##  [6] "Facility Primary NAICS Code"                      
##  [7] "Carbon dioxide (CO2) from non-biomass in CO2e (t)"
##  [8] "Carbon dioxide (CO2) from biomass in CO2e (t)"    
##  [9] "Methane (CH4) in CO2e (t)"                        
## [10] "Nitrous oxide (N2O) in CO2e (t)"                  
## [11] "Sulphur hexafluoride (SF6) in CO2e (t)"           
## [12] "Hydrofluorocarbons (HFCs) in CO2e (t)"            
## [13] "Perfluorocarbons (PFCs) in CO2e (t)"              
## [14] "Nitrogen Trifluoride (NF3) in CO2e (t)"           
## [15] "Total CO2e from all sources in CO2e (t)"          
## [16] "Reporting Amount in CO2e (t)"                     
## [17] "Verification Amount in CO2e (t)"                  
## [18] "Accredited Verification Body"

Note: CO2e (carbon dioxide equivalent) expresses each greenhouse gas in terms of its global warming potential relative to CO2, measured in metric tonnes (t).

2. Background of the Data

The Specified Greenhouse Gases Activities from 2010 to 2021 dataset was collected by the Government of Ontario, specifically by the Ministry of the Environment, Conservation and Parks. It was collected using the quantification methods in the incorporated Guideline for Quantification, Reporting and Verification of Greenhouse Gas Emissions. The dataset serves as a baseline to understand emissions profiles and to manage and reduce greenhouse gas emissions.

3. Overall Research Question

This report aims to explore the relationship between various factors in greenhouse gas emissions in Ontario, based on the Specified Greenhouse Gases Activities from 2010 to 2021 data. In particular, the key research questions are:

  1. What is the trend in greenhouse gas emissions data from 2010 to 2021?

  2. Which facility owner in Ontario produced the highest amount of total greenhouse gas emissions (in CO2e (t)) from 2010 to 2021?

  3. Which city in Ontario emitted the highest amount of total greenhouse gas emissions (in CO2e (t)) from 2010 to 2021?

  4. Which year recorded the highest average greenhouse gas emissions (in CO2e (t)) in Ontario?

  5. How do the reported emissions data compare to the actual verified emissions from facilities in Ontario?

  6. How can we forecast the future growth of average greenhouse gas emissions in Ontario?

4. Tables

4.1. Greenhouse Gas Emissions By Year in CO2e (t)

Greenhouse Gas Emissions By Year in CO2e (t)
Year       CO2        CH4       N2O        SF6      HFCs  PFCs  NF3  Mean CO2e
2010  58718651   240624.7  397213.0       0.00  18393.00     0    0   398489.6
2011  52638649   241876.8  356883.6       0.00    181.14     0    0   352567.3
2012  52624766   225215.5  302992.0     170.38    352.79     0    0   354357.1
2013  47726844   222279.3  292826.9       0.00    500.32     0    0   313263.1
2014  44982436   240587.5  254573.0       0.00   1646.91     0    0   293415.0
2015  45565600   238669.9  287930.7       0.00   1348.97     0    0   193670.9
2016  46048303   821491.5  276829.9   87723.37    287.93     0    0   171762.8
2017  42731567   847137.9  264599.3  105695.36    318.75     0    0   156869.6
2018  45501375   764380.4  268592.9  111119.27    940.75     0    0   165825.0
2019  13986535  3708263.4  255851.2  174880.51   2367.05     0    0   138253.3
2020  13158262  3627471.3  240969.1  166137.95   2653.43     0    0   126251.8
2021  13375289  4032305.9  289023.1  182290.90   5135.17     0    0   133722.9

The table shows the amount of greenhouse gases produced each year from 2010 to 2021 based on the Specified Greenhouse Gases Activities from 2010 to 2021 data. It includes the total amount of carbon dioxide (\(CO_2\)), methane (\(CH_4\)), nitrous oxide (\(N_2O\)), sulphur hexafluoride (\(SF_6\)), hydrofluorocarbons (HFCs), perfluorocarbons (PFCs), and nitrogen trifluoride (\(NF_3\)) produced, in CO2e (t). The values for sulphur hexafluoride (\(SF_6\)) and hydrofluorocarbons (HFCs) are rounded to 2 decimal places. The rightmost column shows the average total greenhouse gas emissions in CO2e (t) per facility with available data. This average decreases overall, from 398489.6 in 2010 to 133722.9 in 2021, although not in every year: it rises slightly in 2012, 2018, and 2021.
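As a rough sketch of how a per-year summary like this can be built in R, the example below sums one gas and averages the total per facility by year. The data frame `toy` and its short column names (`ch4`, `total`) are made-up stand-ins for the dataset's long column headers, and every value is illustrative only.

```r
# Toy stand-in for the facility-level data; all values are made up.
# Short column names replace the long dataset headers for readability.
toy <- data.frame(
  Year  = c(2010, 2010, 2011, 2011, 2011),
  ch4   = c(100, 50, 80, 20, 30),        # methane, in CO2e (t)
  total = c(1000, 3000, 2000, 2500, NA)  # total CO2e from all sources
)

# Total methane per year: sum over all facilities reporting in that year.
ch4_by_year <- tapply(toy$ch4, toy$Year, sum, na.rm = TRUE)

# Mean total CO2e per facility, skipping facilities with missing totals.
mean_by_year <- tapply(toy$total, toy$Year, mean, na.rm = TRUE)

print(ch4_by_year)
print(mean_by_year)
```

Dropping missing values with `na.rm = TRUE` is one way to restrict the mean to "facilities with available data"; whether the report handled missing values this way is an assumption.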

4.2. Top 10 CO2e Producer by Facility Owner (2010-2021)

Top 10 CO2e Producer by Facility Owner (2010-2021)
Facility Owner              Total CO2e
Imperial Oil                  37945995
Stelco                        37914837
Ontario Power Generation      27343473
ArcelorMittal Dofasco         25546697
Essar Steel Algoma            20781568
Domtar                        19739183
ArcelorMittal Dofasco G.P.    19279341
Algoma Steel                  16436029
St. Marys Cement              15322028
Northland Power               12562646

The table displays the top 10 facility owners in terms of total CO2e produced from all sources from 2010 to 2021 based on the dataset. The names of the facility owners are listed under the column “Facility Owner” and the corresponding total CO2e produced is listed under “Total CO2e”, in CO2e (t).

The table is sorted in descending order of total CO2e produced, with Imperial Oil as the biggest contributor, followed by Stelco and Ontario Power Generation.

This table can be useful for policymakers, officials, or analysts to identify major greenhouse gas contributors and to guide regulatory action. It can also set a benchmark for companies aiming to reduce their emissions.
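A ranking of this kind can be sketched in R by summing the total per group and keeping the largest entries. The data frame `toy`, its columns `owner` and `total`, and the owner names are hypothetical; the same pattern would apply to grouping by city.

```r
# Toy facility records; owner names and values are made up.
toy <- data.frame(
  owner = c("A Corp", "B Inc", "A Corp", "C Ltd", "B Inc"),
  total = c(500, 300, 700, 100, 200)  # total CO2e (t) per record
)

# Sum total CO2e per owner across all years and facilities.
totals <- aggregate(total ~ owner, data = toy, FUN = sum)

# Sort in descending order and keep the top n owners (n = 2 here, 10 in the report).
n <- 2
top <- head(totals[order(-totals$total), ], n)

print(top)
```

Grouping is by exact name, so owners whose names are spelled differently across years are counted as separate groups.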

4.3. Top 10 CO2e Producer by Facility City (2010-2021)

Top 10 CO2e Producer by Facility City (2010-2021)
Facility City     Total CO2e
Hamilton            69432908
Sault Ste. Marie    53440464
Sarnia              52581271
Haldimand County    38208158
Corunna             34334863
Courtright          28827147
Nanticoke           27926277
Mississauga         22567536
Bowmanville         16911617
Thunder Bay         16740678

The table shows the top 10 facility cities in terms of total CO2e produced from all sources from 2010 to 2021 based on the dataset. The names of facility cities are listed under the column “Facility City” and the corresponding total CO2e produced are listed under “Total CO2e” with CO2e (t) as the unit.

The table is sorted in descending order of total CO2e produced, with Hamilton as the biggest contributor, followed by Sault Ste. Marie and Sarnia.

This table can be useful for policymakers, city officials, or analysts to identify major greenhouse gas contributors and to guide regulatory action. It can also set a benchmark for cities aiming to reduce their emissions and improve their environmental policies.

4.4. Top 10 Difference between Reported CO2e and Verified CO2e by Facility Owner in CO2e (t)

Top 10 Difference between Reported CO2e and Verified CO2e by Facility Owner (2010-2021) in CO2e (t)
Facility Owner                   Difference  Reported CO2e  Verified CO2e
Domtar                             17087994        2651189       19739183
Produits forestiers Résolu          8175408        1488808        9664216
AV Terrace Bay                      7100097        1393026        8493123
Resolute FP Canada                  3898188         772200        4670388
AbiBow Canada                       3723549         446374        4169923
Northland Power                     3340330        9222316       12562646
Anthony Forest Products Company     2581197         537676        3118873
Tembec                              1568336         219235        1787571
Ontario Power Generation            1193820       26149653       27343473
Atlantic Power LP                   1126288        2834339        3960627

The table shows the top 10 facility owners in terms of difference between the reported CO2e produced and verified CO2e produced from all sources based on the dataset. It represents the amount of additional CO2e that is found to be actually produced compared to the total CO2e produced that is reported by the facility owners from 2010 to 2021. The names of facility owners are listed under the column “Facility Owner” and the corresponding total additional CO2e produced are listed under “Difference” with CO2e (t) as the unit.

This table is sorted in descending order based on the values in the column “Difference”, with Domtar having the largest discrepancy between reported and verified CO2e, followed by Produits forestiers Résolu and AV Terrace Bay.

The information presented by this table can be useful for regulators to ensure compliance with environmental laws, as it highlights discrepancies and helps to identify facility owners that may be underreporting greenhouse gas emissions. Additionally, it can highlight gaps in reports, driving future improvements.
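In the table, each "Difference" value equals the verified amount minus the reported amount (for Domtar, 19739183 - 2651189 = 17087994). A minimal R sketch of that calculation follows; the data frame `toy` and its columns `owner`, `reported`, and `verified` are hypothetical stand-ins, with made-up values.

```r
# Toy records; names and values are made up for illustration.
toy <- data.frame(
  owner    = c("A Corp", "A Corp", "B Inc", "C Ltd"),
  reported = c(100, 150, 400, 50),  # reporting amount, CO2e (t)
  verified = c(300, 250, 420, 50)   # verification amount, CO2e (t)
)

# Sum both amounts per owner, then take verified minus reported.
sums <- aggregate(cbind(reported, verified) ~ owner, data = toy, FUN = sum)
sums$difference <- sums$verified - sums$reported

# Largest discrepancy first.
ranked <- sums[order(-sums$difference), ]

print(ranked)
```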

5. Graphs

5.1. Top 10 Facility Cities by Total Greenhouse Gases

The bar graph illustrates the top 10 facility cities in terms of total greenhouse gases produced from 2010 to 2021, in CO2e (t), based on the dataset. Each bar represents the total greenhouse gases produced by one city.

The bars are sorted in descending order. Hamilton produced the most emissions, at approximately 70 million CO2e (t), followed by Sault Ste. Marie at around 53 million CO2e (t) and Sarnia at roughly 52 million CO2e (t).

5.2. Total Greenhouse Gas Emissions excluding Carbon Dioxide by Year

The line graph shows the total emission of each greenhouse gas, excluding carbon dioxide, from 2010 to 2021, in CO2e (t).

Based on the graph, methane (CH4) increased sharply from roughly 250 000 CO2e (t) in 2015 to approximately 800 000 CO2e (t) in 2016, stayed near that level for the next two years, and then grew drastically to roughly 3.7 million CO2e (t) in 2019. The other gases were relatively stable over this period.

5.3. Total Carbon Dioxide Emission from Non-biomass and Biomass per Year

The bar graph illustrates the total carbon dioxide emission from both non-biomass and biomass for each year from 2010 to 2021, measured in CO2e (t).

Based on the graph, carbon dioxide from non-biomass accounts for more emission than carbon dioxide from biomass. Overall, carbon dioxide emission trends downward over the years (from approximately 58 million CO2e (t) in 2010 to roughly 45 million CO2e (t) in 2021).

5.4. Distribution of Reported vs. Actual Emissions (2010-2021)

The density plot represents the distribution of reported and actual emissions, measured in CO2e (t).

Based on the graph, the actual emission is higher than the reported emission. The distribution of the reported emission peaks at approximately 41.5 million CO2e (t), whereas the distribution of the actual emission peaks at roughly 46 million CO2e (t).

6. Hypothesis Testing

We test the following hypotheses, assuming that the variances of the two groups are not equal.

\(H_0\): The average reported emission in CO₂e (r) is equal to the average verified emission in CO₂e (v); that is,

\(H_0\): \(\mu_r\) = \(\mu_v\)

\(H_a\): The average reported emission in CO₂e (r) is not equal to the average verified emission (v); that is,

\(H_a\): \(\mu_r \neq \mu_v\)

To evaluate this hypothesis, we can apply a two-sample t-test to compare the mean of reported emissions with the mean of verified emissions. This involves splitting the data into two groups: one for reported emissions and one for verified emissions.

We then calculate a t-statistic and the corresponding p-value, which represents the probability of observing a difference in sample means as extreme (or more extreme) than the one we calculate, under the assumption that the null hypothesis is true. If the p-value is less than our significance threshold (commonly 0.05), we can reject the null hypothesis and conclude that there is a statistically significant difference between the reported and verified emissions.

In addition, we can compute a confidence interval for the difference in means to estimate the likely range of values in which the true difference between the reported and verified emissions lies.

Overall, this approach allows us to assess whether discrepancies between reported and verified CO₂e emissions are statistically significant, and helps quantify the size of that difference if it exists.

## 
##  Welch Two Sample t-test
## 
## data:  Greenhouse$`Reporting Amount in CO2e (t)` and Greenhouse$`Verification Amount in CO2e (t)`
## t = 1.1044, df = 5743.9, p-value = 0.2695
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11348.04  40632.38
## sample estimates:
## mean of x mean of y 
##  179946.1  165303.9

The p-value is 0.2695 > 0.05, so we fail to reject the null hypothesis. The 95% confidence interval for the difference in means, (-11348.04, 40632.38), includes zero, which further suggests that there is no statistically significant difference between the mean reported amount of CO2e (t) and the mean verified amount of CO2e (t).
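The Welch statistic and its degrees of freedom can also be computed by hand and checked against `t.test()`. The sketch below does this on synthetic, log-normally distributed values (an assumption made only to have skewed, positive data), so its numbers will not match the output above.

```r
# Synthetic stand-ins for the reported and verified amounts (not the real data).
set.seed(1)
reported <- rlnorm(200, meanlog = 11, sdlog = 1.5)
verified <- rlnorm(180, meanlog = 11, sdlog = 1.4)

# Welch two-sample t-test: var.equal = FALSE does not pool the variances.
res <- t.test(reported, verified, var.equal = FALSE)

# Manual computation of the same statistic.
se2_r <- var(reported) / length(reported)  # squared standard error, group r
se2_v <- var(verified) / length(verified)  # squared standard error, group v
t_manual <- (mean(reported) - mean(verified)) / sqrt(se2_r + se2_v)

# Welch-Satterthwaite approximation to the degrees of freedom.
df_manual <- (se2_r + se2_v)^2 /
  (se2_r^2 / (length(reported) - 1) + se2_v^2 / (length(verified) - 1))

# Decision at the 5% significance level.
reject_h0 <- res$p.value < 0.05

print(c(t = t_manual, df = df_manual, p = res$p.value))
print(res$conf.int)
```

The non-integer degrees of freedom in the report's output (df = 5743.9) are characteristic of this approximation.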

7. Bootstrapping

Estimating the Mean of Total CO2e Produced by Each Facility

By using our Greenhouse data, the calculated mean of total CO2e produced by each facility is as follows:

## [1] 200631.6

We can use bootstrapping to estimate the sampling distribution of the mean of total CO2e produced by each facility and compute the confidence interval.

##     2.5%    97.5% 
## 114514.9 316146.1
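A percentile bootstrap interval of this kind can be sketched as follows. The vector `emissions` is synthetic (log-normal values chosen only for illustration) and the number of resamples `B` is an assumption, so the result will differ from the interval reported above.

```r
# Synthetic stand-in for the per-facility total CO2e values (not the real data).
set.seed(42)
emissions <- rlnorm(300, meanlog = 11, sdlog = 1.5)

# Number of bootstrap resamples (an assumed value).
B <- 2000

# Resample with replacement, same size as the original, and record each mean.
boot_means <- replicate(B, mean(sample(emissions, replace = TRUE)))

# Percentile 95% confidence interval for the mean.
ci <- quantile(boot_means, c(0.025, 0.975))

print(mean(emissions))
print(ci)

# hist(boot_means) would display the bootstrap sampling distribution.
```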

Based on the calculation, we get a 95% confidence interval of (114514.9, 316146.1). This means that we are 95% confident that the true mean total CO2e from all sources in CO2e (t) falls within this range. The bootstrap sampling distribution is illustrated as follows:

8. Regression Analysis

Analyzing the Relationship between Year and Average CO2e Emission

We want to check the relationship between the year and the average CO2e emission. We first fit a quadratic (second-degree polynomial) regression of the average emission on the year, examine the estimated coefficients, and then plot the fitted curve.

## 
## Call:
## lm(formula = emission_avg ~ Year + I(Year^2), data = emission_by_year)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -30316 -17086  -2607  14940  38880 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  7.908e+09  2.761e+09   2.865   0.0186 *
## Year        -7.821e+06  2.740e+06  -2.855   0.0189 *
## I(Year^2)    1.934e+03  6.796e+02   2.845   0.0192 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 24830 on 9 degrees of freedom
## Multiple R-squared:  0.9506, Adjusted R-squared:  0.9396 
## F-statistic: 86.61 on 2 and 9 DF,  p-value: 1.323e-06

Explanation

  • Multiple R-Squared: 95%

  • Adjusted R-Squared: 94%

Since both are greater than 90%, this would mean that the regression model captures more than 90% of the variance in the data, indicating that it is a good fit.

Note that the p-values for all terms (intercept, Year, Year^2) are below 0.05, meaning each is statistically significant in explaining the relationship between the year and the average emission. The regression parameters are interpreted as follows:

  • Intercept (7.908e+09): Because Year is not centered, the intercept is the predicted average emission at Year = 0, which lies far outside the 2010 to 2021 range and has no practical meaning on its own.

  • Year (-7.821e+06): This negative coefficient is the linear part of the year effect. It cannot be read on its own as the yearly change in emission_avg, because the quadratic term also depends on Year. The fitted change per year is the coefficient on Year plus 2 × the coefficient on Year^2 × Year, which is negative throughout 2010 to 2021: tens of thousands of tonnes per year at the start of the period and much smaller by the end.

  • I(Year^2) (1.934e+03): This positive coefficient is the quadratic effect of year. Since it is positive, the fitted curve is convex: average emissions fall over the period, but the rate of decrease slows over time.
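To make the year-specific slope concrete, the sketch below refits the same formula and evaluates the derivative of the fitted curve. The data frame reuses the Mean CO2e column of the table in Section 4.1; assuming that column is what the report regressed on, the coefficients should agree with the output above up to rounding. The helper `slope_at` is a hypothetical name introduced here.

```r
# Yearly averages copied from the Mean CO2e column of the table in Section 4.1.
emission_by_year <- data.frame(
  Year = 2010:2021,
  emission_avg = c(398489.6, 352567.3, 354357.1, 313263.1, 293415.0, 193670.9,
                   171762.8, 156869.6, 165825.0, 138253.3, 126251.8, 133722.9)
)

# Quadratic regression, same formula as in the report's output.
fit <- lm(emission_avg ~ Year + I(Year^2), data = emission_by_year)
b <- coef(fit)

# Derivative of the fitted curve: b1 + 2 * b2 * Year.
# This is the fitted change in emission_avg per year at a given year.
slope_at <- function(year) b[["Year"]] + 2 * b[["I(Year^2)"]] * year

print(b)
print(slope_at(2010))  # slope at the start of the period
print(slope_at(2021))  # slope at the end of the period
```

Centering the predictor, for example using `Year - 2010` in place of `Year`, would leave the fitted curve unchanged while making the intercept the predicted value for 2010.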

Plotting the Result

Interpreting the graph: the fitted quadratic regression curve shows that the “Average Emission” decreases over the years.

9. Cross Validation

We will use cross validation to check the previous regression model, analyzing the relationship between the year and the emission average in tons.

The idea is to split the data into k folds (parts). We use k-1 folds as the training set and the remaining fold as the testing set, and we repeat this process k times so that each fold serves as the testing set exactly once.

## [1] 0.1138045

We obtain an average mean squared error (MSE) of 0.1138045 across the folds. On the scale used for the validation, this indicates a small average prediction error, which suggests that the regression model fits well. Hence, there is likely a relationship between the year and the emission average.
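The k-fold procedure described above can be sketched as follows. The number of folds `k = 4` and the random fold assignment are assumptions, and the data frame reuses the Mean CO2e column of the table in Section 4.1. MSE depends on the units of the response: this sketch works in raw CO2e (t), so its value is not comparable to the figure above unless the same scaling is applied.

```r
# Yearly averages copied from the Mean CO2e column of the table in Section 4.1.
emission_by_year <- data.frame(
  Year = 2010:2021,
  emission_avg = c(398489.6, 352567.3, 354357.1, 313263.1, 293415.0, 193670.9,
                   171762.8, 156869.6, 165825.0, 138253.3, 126251.8, 133722.9)
)

set.seed(123)
k <- 4  # number of folds (assumed)

# Randomly assign each of the 12 years to one of k equally sized folds.
folds <- sample(rep(1:k, length.out = nrow(emission_by_year)))

mse <- numeric(k)
for (i in 1:k) {
  train <- emission_by_year[folds != i, ]  # k-1 folds for fitting
  test  <- emission_by_year[folds == i, ]  # held-out fold for evaluation
  fit   <- lm(emission_avg ~ Year + I(Year^2), data = train)
  pred  <- predict(fit, newdata = test)
  mse[i] <- mean((test$emission_avg - pred)^2)
}

# Average MSE over the folds, and its square root in the original units.
print(mean(mse))
print(sqrt(mean(mse)))
```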

10. Summary of Research

Based on this report, these are the key findings from the facility-level greenhouse gas emissions data:

  1. The mean of total CO2e produced by each facility in 2010-2021 is 200631.6 CO2e (t), with the average greenhouse gas emissions per facility generally decreasing from 2010 to 2021. Based on our regression analysis, we could expect the average greenhouse gas emission to slightly decrease in the future.

  2. The major greenhouse gas is CO2, which is produced in much larger quantities than other gases. While the emissions of other gases remain relatively stable each year, methane (CH4) increases rapidly in 2019.

  3. From 2010 to 2021, the facility owner in Ontario that produced the highest amount of total greenhouse gas emissions (in CO2e) is Imperial Oil.

  4. From 2010 to 2021, the city in Ontario which emits the highest amount of total greenhouse gas emissions (in CO2e) is Hamilton.

  5. 2010 marks the year that recorded the highest average greenhouse gas emissions (in CO2e) in Ontario.

  6. There are some discrepancies between the reported emissions data compared to the actual verified emissions from facilities in Ontario. However, the difference is not significant, as shown in our test of hypothesis.

These findings highlight the need to address emissions from major facilities and cities, while also improving the accuracy of emissions reporting.