2023-03-14

Reading in Data Set

Packages loaded: ggplot2, plotly, dplyr.

Greenhouse Gas Emissions

Greenhouse gas emissions are a major contributor to climate change, which poses significant threats to human societies and natural ecosystems. Understanding these emissions is essential for developing effective policies and initiatives to reduce them. By analyzing trends and patterns in emissions data, along with the factors that influence emissions, we can identify ways to cut emissions and transition to more sustainable forms of energy and resource use. Understanding the spatial and temporal dynamics of emissions also helps inform localized, targeted sustainability interventions while supporting global mitigation efforts.

Interval Estimation

Interval estimation is a statistical technique used to estimate the value of an unknown population parameter, such as a mean or proportion. A sample is drawn from the population, and an interval is constructed from the variability and uncertainty present in the sample data. This interval is a range of values that is likely to contain the true value of the parameter. By providing a range rather than a single point, interval estimation also conveys how precise the estimate is.

Equation used in Interval Estimation

\(CI = \bar{x} \pm z_{1-\frac{\alpha}{2}} \frac{s}{\sqrt{n}}\)

where:

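\(CI\) is the confidence interval

\(\bar{x}\) is the sample mean

\(z_{1-\frac{\alpha}{2}}\) is the critical value of the standard normal distribution at the \(1-\frac{\alpha}{2}\) quantile, where \(\alpha\) is the significance level (for small samples, the corresponding critical value of the t-distribution with \(n-1\) degrees of freedom is used instead)

\(s\) is the sample standard deviation

\(n\) is the sample size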
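
As a minimal sketch of the formula above, the following R code computes a 95% interval for a sample mean. The vector `emissions` holds made-up illustrative values, not values from the data set, and `ci_mean` is a hypothetical helper name. Because the sample is small, the t-based interval is shown alongside the z-based one for comparison.

```r
# Hypothetical sample values (illustrative only, not from the data set)
emissions <- c(5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 5.2, 4.9)

# Hypothetical helper: confidence interval for a mean
ci_mean <- function(x, conf_level = 0.95, use_t = FALSE) {
  n     <- length(x)
  x_bar <- mean(x)
  sem   <- sd(x) / sqrt(n)   # s / sqrt(n)
  alpha <- 1 - conf_level
  # Critical value: standard normal (z) or t with n - 1 degrees of freedom
  crit  <- if (use_t) qt(1 - alpha / 2, df = n - 1) else qnorm(1 - alpha / 2)
  c(lower = x_bar - crit * sem, upper = x_bar + crit * sem)
}

ci_z <- ci_mean(emissions)                # z-based interval
ci_t <- ci_mean(emissions, use_t = TRUE)  # t-based interval (wider for small n)
print(ci_z)
print(ci_t)
```

Both intervals are centered on the sample mean; only the critical value, and therefore the width, differs.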

GDP vs. Greenhouse Gas Emissions

The ggplot on the next slide is a scatter plot of GDP against total greenhouse gas (GHG) emissions for the United States, China, and India from 2000 to 2010. Marker size corresponds to each country’s population, and marker shape denotes the decade the data point falls in. The plot shows the relationship between a country’s GDP and its GHG emissions and highlights the differences between these three major economies.

GDP vs. Greenhouse Gas Emissions

GDP vs. Greenhouse Gas Emissions Using Interval Estimation

The plot on the following slide shows the same data with confidence intervals added. The shaded areas around the trend lines depict the interval estimates. Each confidence interval is computed from the standard error of the estimated regression line and represents the range in which the true regression line is likely to fall with 95% confidence.

The confidence intervals show us the uncertainty of the estimated relationship between GDP and total greenhouse gas emissions.
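
To make clear what the shaded band measures, it can be reproduced by hand. The sketch below uses simulated data (the variables `gdp` and `ghg` are randomly generated placeholders, not the real values): it fits a linear model and asks `predict()` for the confidence interval of the fitted line, which should correspond to the band a linear `geom_smooth()` draws.

```r
set.seed(1)
# Simulated placeholder data (not the real GDP / emissions values)
gdp <- seq(1, 10, length.out = 30)
ghg <- 2 + 0.5 * gdp + rnorm(30, sd = 0.4)
sim <- data.frame(gdp = gdp, ghg = ghg)

fit <- lm(ghg ~ gdp, data = sim)

# 95% confidence interval for the regression line at each observed GDP value
band <- predict(fit, newdata = sim, interval = "confidence", level = 0.95)
print(head(band))   # columns: fit (line), lwr and upr (band edges)

# Width of the band at each point
width <- band[, "upr"] - band[, "lwr"]
print(range(width))
```

The band is narrowest near the mean of the x values and widens toward the extremes, which is why the shaded areas flare out at the ends of each trend line.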

GDP vs. Greenhouse Gas Emissions - R Code

# Load the packages used below
library(dplyr)
library(ggplot2)

# Keep complete rows for the three countries, 2000-2010
# (df is the emissions data set read in earlier)
df_filtered <- df %>%
  filter(!is.na(gdp), !is.na(total_ghg),
         year >= 2000, year <= 2010,
         country %in% c("United States", "China", "India"))

# Scatter plot with a per-country linear fit and 95% confidence band
ggplot(df_filtered, aes(x = gdp, y = total_ghg, color = country)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "GDP vs. Total Greenhouse Gas Emissions",
       x = "GDP (USD)",
       y = "Total Greenhouse Gas Emissions (kt CO2eq)") +
  theme_bw()

GDP vs. Greenhouse Gas Emissions - Plot


GDP vs. Greenhouse Gas Emissions

China has the highest greenhouse gas emissions relative to its GDP, while the US reaches a higher GDP with lower greenhouse gas emissions. According to the BBC, China’s emissions exceed those of all developed nations combined because of its reliance on coal power (https://www.bbc.com/news/world-asia-57018837).

Total Greenhouse Gas Emissions Per Capita for US, China, and India

The following slide shows a line plot with confidence intervals using ggplot. The plot shows the trend of total greenhouse gas emissions per capita from 1990 to 2018 for the US, China, and India. The x-axis shows the year. The y-axis shows the mean total greenhouse gas emissions per capita in metric tons of CO2eq.

The line in the plot represents the mean greenhouse gas emissions per capita for each year, while the shaded area around the line represents the 95% confidence interval, which is calculated from the standard error of the mean. The plot shows the trend in per capita emissions for each country and allows the countries to be compared with each other.

Equation for Standard Error of the Mean used to calculate Confidence Interval

\(SEM = \frac{s}{\sqrt{n}}\)

where:

\(SEM\) is the standard error of the mean

\(s\) is the sample standard deviation

\(n\) is the sample size
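
A minimal sketch of how the standard error of the mean turns into interval bounds for each year. The data frame `obs` contains invented readings for illustration only, `sem` is a hypothetical helper, and 1.96 is the usual normal approximation for a 95% interval.

```r
# Hypothetical data: several readings per year (illustrative only)
obs <- data.frame(
  year  = rep(c(1990, 1991, 1992), each = 4),
  value = c(4.9, 5.2, 5.0, 5.3,  5.4, 5.1, 5.6, 5.5,  5.8, 6.0, 5.7, 6.1)
)

# Standard error of the mean: s / sqrt(n)
sem <- function(x) sd(x) / sqrt(length(x))

means <- tapply(obs$value, obs$year, mean)
sems  <- tapply(obs$value, obs$year, sem)

summary_df <- data.frame(
  year  = as.numeric(names(means)),
  mean  = as.numeric(means),
  lower = as.numeric(means - 1.96 * sems),  # approximate 95% lower bound
  upper = as.numeric(means + 1.96 * sems)   # approximate 95% upper bound
)
print(summary_df)
```

In ggplot2, columns like `lower` and `upper` could then be mapped to `ymin` and `ymax` in `geom_ribbon()` to draw the shaded area around the mean line.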

Interval estimation - Total GHG Emissions per capita

Describing Total Greenhouse Gas Emissions Per Capita for US, China, and India

The plot is a line plot with confidence intervals. It shows the trend of total greenhouse gas emissions per capita from 1990 to 2018 for the US, China, and India. The x-axis shows the year, and the y-axis shows the mean total greenhouse gas emissions per capita in metric tons of CO2eq.

The line in the plot represents the mean greenhouse gas emissions per capita for each year, and the shaded area around the line is meant to represent the 95% confidence interval. The plot lets us visualize the trend in per capita emissions for each country and compare the countries with each other. Note that no shaded area appears: the data contain a single value for each country and year, so there is no variation within a group from which to compute a standard error. This is something to watch for when using interval estimation.
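
One plausible way to check for this situation before plotting is to count the observations in each country-year group. The sketch below uses invented values and placeholder country labels; it is an illustration of the check, not the code behind the slide.

```r
# One observation per country and year (hypothetical values and labels)
single <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2000, 2001, 2000, 2001),
  value   = c(10.2, 10.5, 3.1, 3.4)
)

# Group size and standard deviation within each country-year cell
cell        <- interaction(single$country, single$year)
n_per_cell  <- tapply(single$value, cell, length)
sd_per_cell <- tapply(single$value, cell, sd)

print(n_per_cell)    # every cell has n = 1
print(sd_per_cell)   # sd() of a single value is NA, so no interval can be drawn
```

When every group has only one observation, the standard error of the mean cannot be estimated, and an interval around the group mean carries no information.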

Total Greenhouse Gas Emissions Over the Years for US, China, and India

Adding confidence intervals

Total Greenhouse Gas Emissions Over the Years for US, China, and India - cont

This plot shows total greenhouse gas emissions over the years for the United States, China, and India. The x-axis shows the year, and the y-axis shows total greenhouse gas emissions in metric tons of CO2 equivalent.

The shaded areas around each line show the 95% confidence interval of a linear regression line fit to the data with the stat_smooth() function. The width of each ribbon reflects the uncertainty of the estimated regression line, with wider ribbons indicating higher uncertainty. China has the widest ribbon and therefore the highest uncertainty, while India has the narrowest ribbon and the lowest uncertainty.
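
The link between scatter around the trend and ribbon width can be illustrated with simulated data. In this sketch the two series, `noisy` and `steady`, are generated placeholders rather than real country data, and `band_width` is a hypothetical helper that averages the width of the 95% confidence band of a per-group linear fit.

```r
set.seed(42)
years <- 2000:2018
# Simulated placeholder series: "noisy" has larger scatter than "steady"
sim <- rbind(
  data.frame(group = "noisy",  year = years,
             ghg = 100 + 5 * (years - 2000) + rnorm(19, sd = 15)),
  data.frame(group = "steady", year = years,
             ghg = 50 + 2 * (years - 2000) + rnorm(19, sd = 2))
)

# Average width of the 95% confidence band of a linear fit to one group
band_width <- function(d) {
  fit  <- lm(ghg ~ year, data = d)
  band <- predict(fit, newdata = d, interval = "confidence", level = 0.95)
  mean(band[, "upr"] - band[, "lwr"])
}

widths <- sapply(split(sim, sim$group), band_width)
print(widths)   # the noisier series gets the wider band
```

With the same years and the same number of points in each group, the band width is driven by how far the points scatter around the fitted line, so a wider ribbon signals a less tightly linear trend.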