library(ggplot2)
library(markdown)
library(rmarkdown)
library(tidyr)
library(tidyselect)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ tibble  3.1.6     ✔ dplyr   1.0.7
## ✔ readr   2.1.2     ✔ stringr 1.4.0
## ✔ purrr   0.3.4     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(readxl)
library(stats)

Research Question

I am interested in examining the relationship between exports from China to the US and the increase in CO2 emissions over the years. China is the highest CO2-emitting country, followed by the US. However, China carries some of the carbon weight for the United States by manufacturing a wealth of goods consumed there. Research suggests that economic activity in the form of international trade accounts for a significant portion of Chinese CO2 emissions. Initial reports suggest that exports could account for up to 34% of total carbon emissions from China, which indicates that more research in this area is needed. Specifically, it is relevant to determine the impact of exports to specific countries, with the goal of guiding international policy on trade and environmental impact.

Hypothesis

I am comparing export data with CO2 emissions for three of China’s top export consumers: the USA, Hong Kong, and Japan. My goals are 1) to determine whether there is a relationship between export values and carbon emissions, and 2) to determine how much that relationship differs across these three consumers. This will be an initial probe into the question, as further analyses would be needed to consider differences in export items and their respective CO2 emissions. For example, it could be the case that while the US is the top export consumer, Japan consumes items that account for a greater sum of CO2 emissions.

Export Data

Four initial datasets have been pulled for this study. The export data covers exports from China to the US, Japan, and Hong Kong, with dates ranging from 1992 to 2019. The overall dataset has 3 columns and 30 rows. This data was chosen because it covers a reasonably long range of years. It was pulled from the United Nations COMTRADE database on commerce and trade.

export_USA <- read_excel("~/Documents/export data.xlsx")
export_japan <- read_excel("~/Downloads/comtrade_historical_CHNJPN00002.xls")
export_hk <- read_excel("~/Downloads/comtrade_historical_CHNHKG00002.xls")
# Note: summarise() with no summary expressions returns an empty 1 x 0 tibble,
# as the output below shows; summary() gives per-column statistics instead.
summarise(export_USA)
## # A tibble: 1 × 0
summarise(export_japan)
## # A tibble: 1 × 0
summarise(export_hk)
## # A tibble: 1 × 0
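
Since the calls above print empty tibbles, a quick way to inspect each table is base R's summary() and str(). The following is a minimal sketch on a toy data frame: the name export_toy and all of its values are invented for illustration, and only the column names Date and Value are taken from the plotting code in this report.

```r
# Toy stand-in for one export table; the values are invented for illustration.
export_toy <- data.frame(
  Date  = c(1992, 1993, 1994),
  Value = c(100, 150, 225)
)

# summary() reports the minimum, quartiles, mean, and maximum of each column.
toy_summary <- summary(export_toy)
print(toy_summary)

# str() shows the column types and the number of rows.
str(export_toy)
```

The same two calls could be applied to export_USA, export_japan, and export_hk once they are loaded.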

Carbon Data

Carbon data was pulled to show the change in carbon emissions from China between 2010 and 2020. It is not yet clear whether this dataset will be used in the final draft, as it excludes a number of years covered by the export data. These gaps in years may weaken the analysis. For now, this data will be used.

library(readxl)
carbon_data <- read_excel("~/Documents/Carbon Data.xlsx")
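# summarise() with no summary expressions returns an empty tibble;
# summary(carbon_data) would give per-column statistics instead.
summarise(carbon_data)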
## # A tibble: 1 × 0
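
To see which export years have no matching carbon observation, an anti-join on Date is one option. This is a sketch under stated assumptions: export_toy and carbon_toy are hypothetical stand-ins with invented values, and Date is assumed to hold a plain year in both tables.

```r
library(dplyr)

# Toy stand-ins; the years and values are invented for illustration only.
export_toy <- data.frame(Date = 2008:2012, Value = c(10, 12, 11, 15, 18))
carbon_toy <- data.frame(Date = 2010:2012, Emissions = c(5, 6, 7))

# Keep the export rows whose Date has no match in the carbon table.
missing_years <- anti_join(export_toy, carbon_toy, by = "Date")

# Years present in the export data but absent from the carbon data.
missing_years$Date
```

An empty result would mean the carbon data covers every year in the export data.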

Initial Displays

ggplot(export_USA, aes(x = Date, y = Value)) + geom_bar(stat="identity") + labs(title = "China Exports to USA")

ggplot(export_japan, aes(x = Date, y = Value)) + geom_bar(stat="identity") + labs(title = "China Exports to Japan")

ggplot(export_hk, aes(x = Date, y = Value)) + geom_bar(stat="identity") + labs(title = "China Exports to Hong Kong")

Creating a single dataset

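These datasets need to be merged into a single set for analysis. Here, all four datasets were joined by Date to create a single dataset containing export values and carbon emissions. Because each export table has a column named Value, the join renames them to keep them apart: Value.x holds exports to the USA, Value.y holds exports to Hong Kong, and Value holds exports to Japan.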

all_china<- list(export_USA, export_hk, export_japan, carbon_data)
all_china1<-all_china %>% reduce(full_join, by='Date')
view(all_china1)
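
One way to avoid the automatic .x and .y suffixes is to rename each Value column before joining. The sketch below uses toy tables with invented values; the names usa_value, hk_value, and jp_value are hypothetical and are not used elsewhere in this report.

```r
library(dplyr)
library(purrr)

# Toy stand-ins for the four tables; all values are invented for illustration.
usa_toy    <- data.frame(Date = 2010:2012, Value = c(1, 2, 3))
hk_toy     <- data.frame(Date = 2010:2012, Value = c(4, 5, 6))
jp_toy     <- data.frame(Date = 2010:2012, Value = c(7, 8, 9))
carbon_toy <- data.frame(Date = 2010:2012, Emissions = c(10, 11, 12))

# Give each export column a descriptive name, then join everything by Date.
joined_toy <- list(
  rename(usa_toy, usa_value = Value),
  rename(hk_toy, hk_value = Value),
  rename(jp_toy, jp_value = Value),
  carbon_toy
) %>%
  reduce(full_join, by = "Date")

names(joined_toy)
```

With names like these, a model formula such as Emissions ~ usa_value states which country it refers to without needing a lookup.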

Hypothesis Testing

usa_reg<-lm(Emissions ~ Value.x, data = all_china1)
hk_reg<-lm(Emissions ~ Value.y, data = all_china1)
jp_reg<-lm(Emissions ~ Value, data = all_china1)
all_reg<-lm(Emissions ~ Value + Value.x + Value.y, data = all_china1)
summary(usa_reg)
## 
## Call:
## lm(formula = Emissions ~ Value.x, data = all_china1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1090596  -212490   -25167   181380   978675 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2.577e+06  1.211e+05   21.29   <2e-16 ***
## Value.x     1.881e-05  4.789e-07   39.28   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 399500 on 26 degrees of freedom
## Multiple R-squared:  0.9834, Adjusted R-squared:  0.9828 
## F-statistic:  1543 on 1 and 26 DF,  p-value: < 2.2e-16
summary(hk_reg)
## 
## Call:
## lm(formula = Emissions ~ Value.y, data = all_china1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1761173  -327620   -31856   364807  1521943 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2.429e+06  2.268e+05   10.71 4.97e-11 ***
## Value.y     2.423e-05  1.136e-06   21.33  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 721600 on 26 degrees of freedom
## Multiple R-squared:  0.9459, Adjusted R-squared:  0.9439 
## F-statistic:   455 on 1 and 26 DF,  p-value: < 2.2e-16
summary(jp_reg)
## 
## Call:
## lm(formula = Emissions ~ Value, data = all_china1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -958479 -368672  -16869  465440  935227 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.245e+06  1.943e+05   6.408 8.68e-07 ***
## Value       5.953e-05  1.978e-06  30.094  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 518500 on 26 degrees of freedom
## Multiple R-squared:  0.9721, Adjusted R-squared:  0.971 
## F-statistic: 905.7 on 1 and 26 DF,  p-value: < 2.2e-16
summary(all_reg)
## 
## Call:
## lm(formula = Emissions ~ Value + Value.x + Value.y, data = all_china1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -486684 -136543  -44185   75062  853091 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2.001e+06  1.449e+05  13.806 6.51e-13 ***
## Value       2.305e-05  5.236e-06   4.402  0.00019 ***
## Value.x     1.115e-05  1.592e-06   7.001 3.07e-07 ***
## Value.y     8.537e-07  2.006e-06   0.426  0.67422    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 282000 on 24 degrees of freedom
## Multiple R-squared:  0.9924, Adjusted R-squared:  0.9914 
## F-statistic:  1042 on 3 and 24 DF,  p-value: < 2.2e-16
jp_plot<-ggplot(data = all_china1, aes(x = Value, y = Emissions)) +
     geom_point() +
     geom_smooth(method = lm)
jp_plot
## `geom_smooth()` using formula 'y ~ x'

hk_plot<-ggplot(data = all_china1, aes(x = Value.y, y = Emissions)) +
     geom_point() +
     geom_smooth(method = lm)
hk_plot
## `geom_smooth()` using formula 'y ~ x'

usa_plot<-ggplot(data = all_china1, aes(x = Value.x, y = Emissions)) +
     geom_point() +
     geom_smooth(method = lm)
usa_plot
## `geom_smooth()` using formula 'y ~ x'
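
To compare the four models side by side, the fit statistics can be collected into one small table. This is a sketch on simulated data, not the report's results: the data frame toy and every number in it are invented, and only the column names mirror those of the joined dataset.

```r
# Simulated data, invented for illustration; column names mirror the joined table.
set.seed(1)
toy <- data.frame(
  Value.x = 1:20,
  Value.y = (1:20)^1.2 + rnorm(20),
  Value   = sqrt(1:20) + rnorm(20)
)
toy$Emissions <- 2 + 0.5 * toy$Value.x + rnorm(20)

# Fit the same four model forms used in this report.
models <- list(
  usa = lm(Emissions ~ Value.x, data = toy),
  hk  = lm(Emissions ~ Value.y, data = toy),
  jp  = lm(Emissions ~ Value, data = toy),
  all = lm(Emissions ~ Value + Value.x + Value.y, data = toy)
)

# Pull R-squared and adjusted R-squared out of each model summary.
fit_table <- data.frame(
  model         = names(models),
  r_squared     = sapply(models, function(m) summary(m)$r.squared),
  adj_r_squared = sapply(models, function(m) summary(m)$adj.r.squared),
  row.names     = NULL
)
fit_table
```

Replacing toy with all_china1 would produce the same table for the models fitted above.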

Diagnostics

par(mfrow = c(2,3)); plot(usa_reg, which = 1:6)

par(mfrow = c(2,3)); plot(hk_reg, which = 1:6)

par(mfrow = c(2,3)); plot(jp_reg, which = 1:6)

par(mfrow = c(2,3)); plot(all_reg, which = 1:6)

Transformations/Modifications

Diagnostics Round 2

Conclusions and Discussion