Countries have long been stepping up their strategies to reduce CO2 emissions in an effort to combat climate change. It is well known that fossil fuel use is one of the main causes of the elevated emissions we’ve seen in recent decades. Today we will look into whether the production of crude oil has actually decreased in recent years.
We will be referencing the Energy Statistical Yearbook of 2021, the most recent data we have on the subject as of today. We will import this data and store it as a tibble.
library(tidyverse)  # provides as_tibble(), gather(), the dplyr verbs and ggplot2
crude_oil <- read.csv("crude_oil_production.csv")
crude_oil <- as_tibble(crude_oil)
crude_oil
## # A tibble: 58 × 35
## Region Country X1990 X1991 X1992 X1993 X1994 X1995 X1996
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 World World 3176. 3.18e+3 3.22e+3 3.24e+3 3273. 3.32e+3 3.42e+3
## 2 OECD OECD 895. 9.14e+2 9.29e+2 9.36e+2 971. 9.82e+2 1.01e+3
## 3 G7 G7 611. 6.10e+2 6.12e+2 6.11e+2 633. 6.37e+2 6.39e+2
## 4 BRICS BRICS 734. 6.80e+2 6.08e+2 5.68e+2 540. 5.37e+2 5.47e+2
## 5 Europe Europe 220. 2.31e+2 2.47e+2 2.60e+2 302. 3.14e+2 3.32e+2
## 6 Europe European Union 40.6 3.96e+1 3.94e+1 3.92e+1 40.7 3.99e+1 3.97e+1
## 7 Europe Belgium 0 0 0 0 0 0 0
## 8 Europe Czechia 0.22 2.21e-1 2.37e-1 2.57e-1 0.27 2.76e-1 2.51e-1
## 9 Europe France 3.47 3.80e+0 3.56e+0 3.46e+0 3.39 3.03e+0 2.65e+0
## 10 Europe Germany 5.52 4.62e+0 4.40e+0 4.17e+0 3.95 3.92e+0 3.82e+0
## # ℹ 48 more rows
## # ℹ 26 more variables: X1997 <dbl>, X1998 <dbl>, X1999 <dbl>, X2000 <dbl>,
## # X2001 <dbl>, X2002 <dbl>, X2003 <dbl>, X2004 <dbl>, X2005 <dbl>,
## # X2006 <dbl>, X2007 <dbl>, X2008 <dbl>, X2009 <dbl>, X2010 <dbl>,
## # X2011 <dbl>, X2012 <dbl>, X2013 <dbl>, X2014 <dbl>, X2015 <dbl>,
## # X2016 <dbl>, X2017 <dbl>, X2018 <dbl>, X2019 <dbl>, X2020 <dbl>,
## # X2019...2020.... <chr>, X2000...2020....year. <chr>
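The odd column names in the output above (X1990, X2019...2020....) come from read.csv(), which by default rewrites headers into syntactically valid R names. The sketch below illustrates this on a tiny in-memory CSV. The header text "2019 - 2020 (%)" is an assumption about what the original file contains, inferred from the mangled name, and the data row is only for illustration.

```r
# A tiny in-memory CSV; the header "2019 - 2020 (%)" is an assumed original name
csv_text <- "Region,Country,1990,2019 - 2020 (%)\nEurope,France,3.47,-5.45\n"

# Default behaviour: check.names = TRUE prefixes "X" to names that start
# with a digit and replaces invalid characters with "."
default_names <- names(read.csv(text = csv_text))

# With check.names = FALSE the headers are kept exactly as written
raw_names <- names(read.csv(text = csv_text, check.names = FALSE))

default_names
raw_names
```

Keeping the default names, as we do here, makes the columns easy to reference without backticks, at the cost of having to rename them later.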
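First, we need to transform this wide-format data into long format, with one row per country and year. We do this using the gather() function (which tidyr now lists as superseded by pivot_longer(), although it still works) and store the result in a new variable.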
tidy_crude_oil <-
crude_oil %>%
gather(X1990:X2020, key = "year", value = "production")
tidy_crude_oil
## # A tibble: 1,798 × 6
## Region Country X2019...2020.... X2000...2020....year. year production
## <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 World World -6.122764325 0.703700339 X1990 3176.
## 2 OECD OECD -1.702930893 1.125223918 X1990 895.
## 3 G7 G7 -3.734706629 2.635709966 X1990 611.
## 4 BRICS BRICS -3.914874381 2.151645326 X1990 734.
## 5 Europe Europe 8.262357189 -3.179398944 X1990 220.
## 6 Europe European Union 5.358650162 -2.492301259 X1990 40.6
## 7 Europe Belgium 0 - X1990 0
## 8 Europe Czechia -41.71779141 -6.344444088 X1990 0.22
## 9 Europe France -5.450733753 -3.620793991 X1990 3.47
## 10 Europe Germany -3.036871148 -1.512489462 X1990 5.52
## # ℹ 1,788 more rows
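As an alternative to gather(), the same reshaping can be sketched with pivot_longer(), which can also strip the "X" prefix and turn the year into an integer in one step. This is a minimal sketch on a made-up two-country table (toy_wide and its values are hypothetical), not the code used for the results in this post. Note that pivot_longer() orders the output row by row, whereas gather() stacks it column by column.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Toy stand-in for crude_oil; the values are made up for illustration
toy_wide <- tibble(
  Region  = c("Europe", "Europe"),
  Country = c("France", "Germany"),
  X1990   = c(3.5, 5.5),
  X1991   = c(3.8, 4.6)
)

toy_long <- toy_wide %>%
  pivot_longer(
    cols            = X1990:X1991,             # the year columns to stack
    names_to        = "year",
    names_prefix    = "X",                     # drop the "X" added by read.csv()
    names_transform = list(year = as.integer), # store the year as an integer
    values_to       = "production"
  )

toy_long
```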
We also need to rename and rearrange the columns so the table becomes easier to read and understand. We do this using the rename() and relocate() functions. We will also convert the last two columns to numeric values so we can run some analysis on them. Entries that are not numbers, such as the "-" in Belgium's row, become NA in this conversion.
tidy_crude_oil <- rename(tidy_crude_oil, "X2019_vs_2020" = "X2019...2020...." , "X2000_vs_2020" = "X2000...2020....year.")
tidy_crude_oil <-
tidy_crude_oil %>% relocate("X2019_vs_2020", .after = "production")
tidy_crude_oil <-
tidy_crude_oil %>% relocate("X2000_vs_2020", .after = "X2019_vs_2020")
tidy_crude_oil$X2019_vs_2020 <- as.numeric(tidy_crude_oil$X2019_vs_2020)
tidy_crude_oil$X2000_vs_2020 <- as.numeric(tidy_crude_oil$X2000_vs_2020)
tidy_crude_oil
## # A tibble: 1,798 × 6
## Region Country year production X2019_vs_2020 X2000_vs_2020
## <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 World World X1990 3176. -6.12 0.704
## 2 OECD OECD X1990 895. -1.70 1.13
## 3 G7 G7 X1990 611. -3.73 2.64
## 4 BRICS BRICS X1990 734. -3.91 2.15
## 5 Europe Europe X1990 220. 8.26 -3.18
## 6 Europe European Union X1990 40.6 5.36 -2.49
## 7 Europe Belgium X1990 0 0 NA
## 8 Europe Czechia X1990 0.22 -41.7 -6.34
## 9 Europe France X1990 3.47 -5.45 -3.62
## 10 Europe Germany X1990 5.52 -3.04 -1.51
## # ℹ 1,788 more rows
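Calling as.numeric() on text such as "-" produces NA together with a coercion warning. If we wanted to make that step explicit, one option is to replace the placeholder with NA first and convert both columns in a single mutate(). The sketch below uses a small made-up table (toy) and assumes "-" is the only non-numeric placeholder in these columns.

```r
library(tibble)
library(dplyr)

# Toy stand-in for the two text columns; "-" marks a missing value
toy <- tibble(
  Country       = c("France", "Belgium"),
  X2019_vs_2020 = c("-5.45", "0"),
  X2000_vs_2020 = c("-3.62", "-")
)

toy_numeric <- toy %>%
  mutate(across(
    c(X2019_vs_2020, X2000_vs_2020),
    ~ as.numeric(na_if(.x, "-"))  # turn "-" into NA, then convert to numbers
  ))

toy_numeric
```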
We now have a clean dataset to which we can apply different analysis functions to better answer our main question.
The first insight we will compute is the average change in crude oil production across all regions, both between 2019 and 2020 and between 2000 and 2020. To do this we will use the summarise() function.
tidy_crude_oil %>%
summarise(mean = mean(X2019_vs_2020, na.rm = TRUE))
## # A tibble: 1 × 1
## mean
## <dbl>
## 1 -5.87
tidy_crude_oil %>%
summarise(mean = mean(X2000_vs_2020, na.rm = TRUE))
## # A tibble: 1 × 1
## mean
## <dbl>
## 1 -0.962
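What we can conclude from this is that, averaged across all rows in the table, crude oil production decreased by 5.87% from 2019 to 2020, but by only 0.96% on average since 2000 (the original column name suggests this second figure is a yearly rate). The World row on its own even shows a small increase of about 0.70 over the same period. This does not represent the change that most nations are looking to see in crude oil production.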
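One caveat about these means: the long table repeats each country's change once per year, and it mixes aggregates such as World and OECD with individual countries, so the figure is an unweighted average over both. A possible refinement is sketched below on made-up data. The toy_long table, its values, and the aggregates vector are all assumptions for illustration; the real list of aggregate rows would need to be checked against the dataset.

```r
library(tibble)
library(dplyr)

# Toy long table: each country's change is repeated for every year
toy_long <- tibble(
  Region        = c("World", "World", "Europe", "Europe", "Europe", "Europe"),
  Country       = c("World", "World", "France", "France", "Germany", "Germany"),
  year          = c("X1990", "X1991", "X1990", "X1991", "X1990", "X1991"),
  X2019_vs_2020 = c(-6, -6, -5, -5, -3, -3)
)

# Assumed names of aggregate rows (hypothetical, verify against the data)
aggregates <- c("World", "OECD", "G7", "BRICS", "Europe", "European Union")

country_mean <- toy_long %>%
  distinct(Country, X2019_vs_2020) %>%   # keep one row per country
  filter(!Country %in% aggregates) %>%   # drop World, OECD and similar rows
  summarise(mean = mean(X2019_vs_2020, na.rm = TRUE))

country_mean
```

This is still an unweighted mean, so a small producer counts as much as a large one. The World row remains the better reference for the global total.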
We will now look deeper into this data by grouping it by region.
tidy_crude_oil %>%
group_by(Region) %>%
summarise(mean = mean(X2019_vs_2020, na.rm = TRUE))
## # A tibble: 12 × 2
## Region mean
## <chr> <dbl>
## 1 Africa -8.16
## 2 America -4.08
## 3 Asia -12.3
## 4 BRICS -3.91
## 5 Europe -2.61
## 6 G7 -3.73
## 7 Latin America -10.9
## 8 Middle-East -8.65
## 9 North America -3.88
## 10 OECD -1.70
## 11 Pacific 6.46
## 12 World -6.12
tidy_crude_oil %>%
group_by(Region) %>%
summarise(mean = mean(X2000_vs_2020, na.rm = TRUE))
## # A tibble: 12 × 2
## Region mean
## <chr> <dbl>
## 1 Africa -1.29
## 2 America 1.57
## 3 Asia -1.85
## 4 BRICS 2.15
## 5 Europe -1.80
## 6 G7 2.64
## 7 Latin America -1.43
## 8 Middle-East 0.407
## 9 North America 3.64
## 10 OECD 1.13
## 11 Pacific -2.86
## 12 World 0.704
What this shows is that, similar to the global view we took first, the decrease in crude oil production is more pronounced from 2019 to 2020 than from 2000 to 2020. The Pacific is the only region where production rose between 2019 and 2020. We can also conclude that the regions with the biggest decrease in crude oil production since 2000 are the Pacific, Asia, and Europe.
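Rankings like this are easier to check when the table is sorted rather than read by eye. The sketch below copies the regional means for 2000 to 2020 from the output above into a small tibble (regional_means is just an illustrative name) and sorts them in ascending order, so the largest decreases come first. In the real pipeline the same arrange() call could simply be added after summarise().

```r
library(tibble)
library(dplyr)

# Regional means for 2000 to 2020, copied from the output above
regional_means <- tibble(
  Region = c("Africa", "America", "Asia", "BRICS", "Europe", "G7",
             "Latin America", "Middle-East", "North America", "OECD",
             "Pacific", "World"),
  mean   = c(-1.29, 1.57, -1.85, 2.15, -1.80, 2.64,
             -1.43, 0.407, 3.64, 1.13, -2.86, 0.704)
)

# Sort ascending and keep the three most negative values
largest_declines <- regional_means %>%
  arrange(mean) %>%
  slice_head(n = 3)

largest_declines
```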
Finally, we will graph the data by Region to better understand the change in crude oil production throughout the globe.
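# Scatter plot (geom_point) of yearly production, coloured by region
production_plot <- ggplot(tidy_crude_oil, aes(year, production, colour = Region)) + geom_point()
# ylim(0, 2000) drops points above 2000, such as the World totals, with a warning
production_plot + labs(title = "Crude Oil Production by Region per Year", y = "Production (in thousands of barrels)", x = "Year") + ylim(0, 2000) + guides(x = guide_axis(angle = 90))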
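If we wanted connected lines instead of points, the year would first need to be numeric, and each country would need its own line. The following is a minimal sketch on made-up data (toy_long and its values are hypothetical), not the plot shown in this post. It strips the "X" prefix with sub() and uses the group aesthetic so geom_line() draws one line per country.

```r
library(tibble)
library(dplyr)
library(ggplot2)

# Toy long table; the values are made up for illustration
toy_long <- tibble(
  Region     = c("Europe", "Europe", "Europe", "Europe"),
  Country    = c("France", "France", "Germany", "Germany"),
  year       = c("X1990", "X1991", "X1990", "X1991"),
  production = c(3.5, 3.8, 5.5, 4.6)
)

# Strip the leading "X" so the year can be treated as a number
toy_plot_data <- toy_long %>%
  mutate(year = as.integer(sub("^X", "", year)))

# One line per country, coloured by region
line_plot <- ggplot(
  toy_plot_data,
  aes(x = year, y = production, colour = Region, group = Country)
) +
  geom_line() +
  labs(title = "Crude Oil Production by Region per Year",
       x = "Year", y = "Production")

line_plot
```

With a numeric year the x axis is continuous, so the rotated tick labels used above are no longer needed.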
In conclusion, we see that crude oil production has remained broadly stable over the last 20 years. There was, however, a promising decline between 2019 and 2020, although a single year of data is not enough to tell whether it will continue.