1. Covariance vs. Correlation

  1. Covariance: Measures the direction of the relationship between two random variables.

    • The covariance can range from -∞ to +∞. A negative value indicates a negative (inverse) relationship and a positive value indicates a positive (direct) relationship. Its magnitude depends on the units of the two variables, so a larger number does not by itself indicate a stronger relationship.
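  2. Correlation: Measures both the direction and the strength of the linear relationship between two random variables. It is the covariance divided by the product of the two standard deviations, so it is unitless and always lies between -1 and +1.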
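
To illustrate the difference between the two measures, here is a minimal sketch on made-up numbers (the vectors `x` and `y` are hypothetical and not taken from the data sets used below). Rescaling one variable, for example converting its units, changes the covariance by the same factor but leaves the correlation untouched.

```r
# Hypothetical toy vectors, for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 6)

# Covariance and correlation on the original scale
cov_original <- cov(x, y)
cor_original <- cor(x, y)

# Rescale x by a factor of 100 (e.g. a change of units)
x_rescaled <- x * 100
cov_rescaled <- cov(x_rescaled, y)
cor_rescaled <- cor(x_rescaled, y)

# Covariance grows by the scaling factor; correlation stays the same
print(c(cov_original = cov_original, cov_rescaled = cov_rescaled))
print(c(cor_original = cor_original, cor_rescaled = cor_rescaled))
```

This is why the size of a covariance cannot be read as "strong" or "weak" without knowing the scale of both variables.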

3. Data Selection

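# Load packages
library(dplyr)      # provides inner_join()
library(stargazer)  # provides the summary table

# Load data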
greenhouse_gas <- read.csv("/Users/pin.lyu/Desktop/greenhouse_gas_emissions.csv")

carbon_footprint <- read.csv("/Users/pin.lyu/Desktop/carbon_footprint_by_product.csv")
colnames(carbon_footprint) <- c("Year", "Product", "Base_Storage", "Carbon_Footprint")

colnames(greenhouse_gas) <- c("Year", "Category", "Type", "Scope", "Description", "Emissions")

4. Merge Data

# Combine two data sets 
Emissions <- inner_join(greenhouse_gas, carbon_footprint,
                        by = "Year" )
## Warning in inner_join(greenhouse_gas, carbon_footprint, by = "Year"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 86 of `x` matches multiple rows in `y`.
## ℹ Row 2 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
# summary table
stargazer(Emissions, type = 'text', title = 'Emission Data Summary Statistics')
## 
## Emission Data Summary Statistics
## ====================================================================
## Statistic         N      Mean        St. Dev.      Min       Max    
## --------------------------------------------------------------------
## Year             153   2,018.333       2.218      2,015     2,022   
## Emissions        119 2,002,429.000 5,406,279.000 -500,000 29,600,000
## Base_Storage     153    71.111        33.081        32       128    
## Carbon_Footprint 153    64.778         8.037        54        79    
## --------------------------------------------------------------------
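
The many-to-many warning above means that every row of one table is paired with every row of the other table that shares the same Year, so rows are repeated in the joined result. Since Base_Storage and Carbon_Footprint both appear to come from carbon_footprint, the join mostly repeats those rows. The sketch below shows the row multiplication on made-up data; the data frames `gas` and `products` are hypothetical stand-ins, and base R `merge()` is used in place of `inner_join()` so the example runs without extra packages.

```r
# Hypothetical toy tables (values are made up for illustration)
gas <- data.frame(Year = c(2015, 2015, 2016),
                  Emissions = c(10, 20, 30))
products <- data.frame(Year = c(2015, 2015, 2016, 2016),
                       Carbon_Footprint = c(54, 60, 66, 79))

# Rows per Year in each table; the key is many-to-many when both counts exceed 1
rows_gas <- table(gas$Year)
rows_products <- table(products$Year)

# merge() with the default all = FALSE keeps only matching keys, like an inner join
joined <- merge(gas, products, by = "Year")

# Expected size of the join: sum over years of (rows in gas) * (rows in products)
expected_rows <- sum(as.vector(rows_gas) *
                     as.vector(rows_products[names(rows_gas)]))

print(nrow(joined))
print(expected_rows)
```

Counting rows per key before joining is one way to decide whether the many-to-many match is intended or whether one table should be aggregated by Year first.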

5. Correlation & Covariance

plot(Carbon_Footprint ~ Base_Storage,
      data = Emissions,
      main = "Carbon Footprint vs. Base Storage (2015 - 2022)",
      xlab = "Base Storage",
      ylab = "Carbon Footprint"
)

# Correlation value
cor(Emissions$Carbon_Footprint, Emissions$Base_Storage, use = "complete.obs")
## [1] 0.140592

Comments: As the graph suggests, the correlation between the two selected variables is 0.141, which indicates only a weak positive linear relationship between them.

# Covariance value
cov(Emissions$Carbon_Footprint, Emissions$Base_Storage, use = "complete.obs")
## [1] 37.38012

Comments: The covariance is positive (37.38), which indicates a positive relationship between the two selected variables. Covariance is expressed in the product of the two variables' units and has no upper bound, so its magnitude alone does not show how strong the relationship is; the correlation of 0.141 shows that the relationship is weak.
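
The two values reported in this section are linked by the identity cor(X, Y) = cov(X, Y) / (sd(X) * sd(Y)). As a rough check, the sketch below plugs in the covariance printed above and the standard deviations from the summary table; the variable names are only illustrative, and the small difference from 0.140592 comes from the rounded standard deviations.

```r
# Values copied from the output earlier in this document
cov_xy       <- 37.38012  # cov(Carbon_Footprint, Base_Storage)
sd_storage   <- 33.081    # St. Dev. of Base_Storage (summary table)
sd_footprint <- 8.037     # St. Dev. of Carbon_Footprint (summary table)

# Correlation is the covariance scaled by both standard deviations
r_from_cov <- cov_xy / (sd_storage * sd_footprint)

print(round(r_from_cov, 3))  # 0.141
```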