Is Crude Oil Production Increasing or Decreasing?

Over the years, many countries have stepped up their strategies to reduce CO2 emissions in an effort to combat climate change, and it is well known that fossil fuel use is one of the main causes of the high emissions seen in recent decades. Today we will look at whether crude oil production has actually decreased in recent years.

Let’s import the data

We will use the Energy Statistical Yearbook 2021, the most recent data on the subject available at the time of writing. We will import the data and store it as a tibble.

library(tidyverse)  # tibble, dplyr, tidyr, readr and ggplot2 are all used below

crude_oil <- read.csv("crude_oil_production.csv")
crude_oil <- as_tibble(crude_oil)
crude_oil
## # A tibble: 58 × 35
##    Region Country          X1990   X1991   X1992   X1993   X1994   X1995   X1996
##    <chr>  <chr>            <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1 World  World          3176.   3.18e+3 3.22e+3 3.24e+3 3273.   3.32e+3 3.42e+3
##  2 OECD   OECD            895.   9.14e+2 9.29e+2 9.36e+2  971.   9.82e+2 1.01e+3
##  3 G7     G7              611.   6.10e+2 6.12e+2 6.11e+2  633.   6.37e+2 6.39e+2
##  4 BRICS  BRICS           734.   6.80e+2 6.08e+2 5.68e+2  540.   5.37e+2 5.47e+2
##  5 Europe Europe          220.   2.31e+2 2.47e+2 2.60e+2  302.   3.14e+2 3.32e+2
##  6 Europe European Union   40.6  3.96e+1 3.94e+1 3.92e+1   40.7  3.99e+1 3.97e+1
##  7 Europe Belgium           0    0       0       0          0    0       0      
##  8 Europe Czechia           0.22 2.21e-1 2.37e-1 2.57e-1    0.27 2.76e-1 2.51e-1
##  9 Europe France            3.47 3.80e+0 3.56e+0 3.46e+0    3.39 3.03e+0 2.65e+0
## 10 Europe Germany           5.52 4.62e+0 4.40e+0 4.17e+0    3.95 3.92e+0 3.82e+0
## # ℹ 48 more rows
## # ℹ 26 more variables: X1997 <dbl>, X1998 <dbl>, X1999 <dbl>, X2000 <dbl>,
## #   X2001 <dbl>, X2002 <dbl>, X2003 <dbl>, X2004 <dbl>, X2005 <dbl>,
## #   X2006 <dbl>, X2007 <dbl>, X2008 <dbl>, X2009 <dbl>, X2010 <dbl>,
## #   X2011 <dbl>, X2012 <dbl>, X2013 <dbl>, X2014 <dbl>, X2015 <dbl>,
## #   X2016 <dbl>, X2017 <dbl>, X2018 <dbl>, X2019 <dbl>, X2020 <dbl>,
## #   X2019...2020.... <chr>, X2000...2020....year. <chr>
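The `X` in front of every year comes from `read.csv()`: with its default `check.names = TRUE`, it passes the headers through `make.names()`, which prepends `X` to names starting with a digit and turns other invalid characters (spaces, dashes, brackets, `%`, `/`) into dots. A minimal sketch on a made-up two-row CSV (the numbers are placeholders, not yearbook data); the raw headers passed to `make.names()` at the end are an assumption about what the original file contains:

```r
library(tibble)

# Hypothetical CSV with numeric year headers (placeholder values)
csv_text <- "Region,Country,1990,1991
World,World,100,110
Europe,France,3,4"

raw <- read.csv(text = csv_text)
names(raw)
# [1] "Region"  "Country" "X1990"   "X1991"

# check.names = FALSE keeps the headers exactly as written in the file
names(read.csv(text = csv_text, check.names = FALSE))
# [1] "Region"  "Country" "1990"    "1991"

as_tibble(raw)

# Assumed raw headers that map to the two odd column names seen above
make.names("2019 - 2020 (%)")
# [1] "X2019...2020...."
make.names("2000 - 2020 (%/year)")
# [1] "X2000...2020....year."
```

If those assumed headers are right, the second change column is an average annual rate in % per year rather than a total change.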

Let’s tidy up the data

First, we need to reshape this wide-format data into long format, with one row per country and year. We do this with the gather() function and store the result in a new variable.

tidy_crude_oil <- crude_oil %>%
  gather(X1990:X2020, key = "year", value = "production")
tidy_crude_oil
## # A tibble: 1,798 × 6
##    Region Country        X2019...2020.... X2000...2020....year. year  production
##    <chr>  <chr>          <chr>            <chr>                 <chr>      <dbl>
##  1 World  World          -6.122764325     0.703700339           X1990    3176.  
##  2 OECD   OECD           -1.702930893     1.125223918           X1990     895.  
##  3 G7     G7             -3.734706629     2.635709966           X1990     611.  
##  4 BRICS  BRICS          -3.914874381     2.151645326           X1990     734.  
##  5 Europe Europe         8.262357189      -3.179398944          X1990     220.  
##  6 Europe European Union 5.358650162      -2.492301259          X1990      40.6 
##  7 Europe Belgium        0                -                     X1990       0   
##  8 Europe Czechia        -41.71779141     -6.344444088          X1990       0.22
##  9 Europe France         -5.450733753     -3.620793991          X1990       3.47
## 10 Europe Germany        -3.036871148     -1.512489462          X1990       5.52
## # ℹ 1,788 more rows
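In current versions of tidyr, gather() is superseded and pivot_longer() is the recommended replacement. A minimal sketch on a toy table shaped like crude_oil (placeholder values, two years only); `names_prefix` strips the `X` and `names_transform` turns the year into an integer in the same step:

```r
library(tibble)
library(tidyr)

# Toy wide table in the same shape as crude_oil (placeholder values)
wide <- tibble(
  Region  = c("World", "Europe"),
  Country = c("World", "France"),
  X1990   = c(100, 3),
  X1991   = c(110, 4)
)

# One row per country and year; "X1990" becomes the integer 1990
long <- wide %>%
  pivot_longer(
    cols = X1990:X1991,
    names_to = "year",
    values_to = "production",
    names_prefix = "X",
    names_transform = list(year = as.integer)
  )
long
```

Note that pivot_longer() orders the rows country by country (World 1990, World 1991, France 1990, ...), whereas gather() stacks every country's 1990 value first, so the printed output above would look different. A numeric year also makes later plotting easier.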

We also need to rename and reorder the columns so the table is easier to read, which we do with the rename() and relocate() functions. Finally, we convert the two percentage-change columns to numbers so that we can analyse them; entries of "-" (such as Belgium's 2000-2020 value) become NA.

tidy_crude_oil <- tidy_crude_oil %>%
  # Give the two percentage-change columns readable names
  rename("X2019_vs_2020" = "X2019...2020....", "X2000_vs_2020" = "X2000...2020....year.") %>%
  # Move both columns, in this order, after production
  relocate(X2019_vs_2020, X2000_vs_2020, .after = production) %>%
  # Convert text to numbers; "-" entries become NA (with a coercion warning)
  mutate(across(c(X2019_vs_2020, X2000_vs_2020), as.numeric))
tidy_crude_oil
## # A tibble: 1,798 × 6
##    Region Country        year  production X2019_vs_2020 X2000_vs_2020
##    <chr>  <chr>          <chr>      <dbl>         <dbl>         <dbl>
##  1 World  World          X1990    3176.           -6.12         0.704
##  2 OECD   OECD           X1990     895.           -1.70         1.13 
##  3 G7     G7             X1990     611.           -3.73         2.64 
##  4 BRICS  BRICS          X1990     734.           -3.91         2.15 
##  5 Europe Europe         X1990     220.            8.26        -3.18 
##  6 Europe European Union X1990      40.6           5.36        -2.49 
##  7 Europe Belgium        X1990       0             0           NA    
##  8 Europe Czechia        X1990       0.22        -41.7         -6.34 
##  9 Europe France         X1990       3.47         -5.45        -3.62 
## 10 Europe Germany        X1990       5.52         -3.04        -1.51 
## # ℹ 1,788 more rows
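The bare as.numeric() conversion works but emits an "NAs introduced by coercion" warning for the "-" entries. A minimal sketch, assuming "-" is the only missing-value marker in these columns, that converts "-" to NA explicitly first with dplyr's na_if(); any other non-numeric text would still trigger the warning, which is a useful signal. The toy rows reuse values printed above:

```r
library(tibble)
library(dplyr)

# Toy version of the two text columns; "-" marks a missing value, as in the Belgium row
changes <- tibble(
  Country       = c("Belgium", "France"),
  X2019_vs_2020 = c("0", "-5.450733753"),
  X2000_vs_2020 = c("-", "-3.620793991")
)

# Replace "-" with NA, then convert, so no coercion warning is raised
changes <- changes %>%
  mutate(across(c(X2019_vs_2020, X2000_vs_2020), ~ as.numeric(na_if(.x, "-"))))
changes
```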

We now have a clean dataset that we can analyse to answer our main question.

Data Analysis

The first insight we will compute is the average change in crude oil production across all rows, both from 2019 to 2020 and from 2000 to 2020. To do this we use the summarise() function.

tidy_crude_oil %>%
  summarise(mean = mean(X2019_vs_2020, na.rm = TRUE))
## # A tibble: 1 × 1
##    mean
##   <dbl>
## 1 -5.87
tidy_crude_oil %>%
  summarise(mean = mean(X2000_vs_2020, na.rm = TRUE))
## # A tibble: 1 × 1
##     mean
##    <dbl>
## 1 -0.962
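Because gather() copies each country's two change values into all 31 year rows, it is fair to ask whether the means above are distorted. A minimal sketch on a made-up three-country table suggests they are not: every country is repeated the same number of times (1,798 = 58 × 31), so the mean over all rows equals the mean over one row per country. It also shows across() computing both averages in a single summarise() call. All numbers are placeholders:

```r
library(tibble)
library(dplyr)

# Toy long table: each country's change values repeat once per year,
# as they do in tidy_crude_oil (placeholder numbers)
long <- tibble(
  Country       = rep(c("A", "B", "C"), each = 3),
  year          = rep(1990:1992, times = 3),
  X2019_vs_2020 = rep(c(-6, -3, NA), each = 3),
  X2000_vs_2020 = rep(c(1, -2, -5), each = 3)
)

# Both averages over every row, in one summarise() call
all_rows <- long %>%
  summarise(across(c(X2019_vs_2020, X2000_vs_2020), ~ mean(.x, na.rm = TRUE)))

# Same averages from one row per country
per_country <- long %>%
  distinct(Country, X2019_vs_2020, X2000_vs_2020) %>%
  summarise(across(c(X2019_vs_2020, X2000_vs_2020), ~ mean(.x, na.rm = TRUE)))

all_rows     # X2019_vs_2020 = -4.5, X2000_vs_2020 = -2
per_country  # identical values
```

The equality only holds because every country has the same number of year rows; with an unbalanced table, the distinct() version would be the safer one.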

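From this we can conclude that, on average, crude oil production fell by 5.87% from 2019 to 2020, but by only 0.96% between 2000 and 2020 (judging by its original column name, which ends in "year.", this second figure is probably an average annual rate in % per year). This is not the kind of change most nations are hoping to see in crude oil production. Keep in mind that these are unweighted means over every row: a small producer counts as much as a large one, and aggregates such as World and OECD are averaged in alongside individual countries. The World row on its own shows -6.12% for 2019-2020 and +0.70% for 2000-2020.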
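Why can an unweighted mean differ from the World figure? Averaging percentage changes gives a tiny producer the same weight as a large one, while weighting each change by the producer's 2019 output gives exactly the percentage change of total output. A minimal sketch with three hypothetical producers (placeholder numbers, not yearbook data):

```r
# Hypothetical producers: 2019 and 2020 output (placeholder values)
p2019 <- c(big = 500, mid = 100, small = 1)
p2020 <- c(big = 450, mid = 98, small = 1.5)

pct_change <- 100 * (p2020 - p2019) / p2019  # -10, -2, +50

unweighted <- mean(pct_change)                         # ≈ 12.67: every producer counts equally
weighted   <- weighted.mean(pct_change, w = p2019)     # ≈ -8.57: weighted by 2019 output
total      <- 100 * (sum(p2020) - sum(p2019)) / sum(p2019)  # ≈ -8.57: change of the total

c(unweighted = unweighted, weighted = weighted, total = total)
```

Here the unweighted mean even has the wrong sign. Applying the same idea to tidy_crude_oil would require restricting it to non-overlapping countries, since aggregates such as World and OECD already contain them.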

Next, we look deeper into the data by grouping it by region.

tidy_crude_oil %>%
  group_by(Region) %>%
  summarise(mean = mean(X2019_vs_2020, na.rm = TRUE))
## # A tibble: 12 × 2
##    Region          mean
##    <chr>          <dbl>
##  1 Africa         -8.16
##  2 America        -4.08
##  3 Asia          -12.3 
##  4 BRICS          -3.91
##  5 Europe         -2.61
##  6 G7             -3.73
##  7 Latin America -10.9 
##  8 Middle-East    -8.65
##  9 North America  -3.88
## 10 OECD           -1.70
## 11 Pacific         6.46
## 12 World          -6.12
tidy_crude_oil %>%
  group_by(Region) %>%
  summarise(mean = mean(X2000_vs_2020, na.rm = TRUE))
## # A tibble: 12 × 2
##    Region          mean
##    <chr>          <dbl>
##  1 Africa        -1.29 
##  2 America        1.57 
##  3 Asia          -1.85 
##  4 BRICS          2.15 
##  5 Europe        -1.80 
##  6 G7             2.64 
##  7 Latin America -1.43 
##  8 Middle-East    0.407
##  9 North America  3.64 
## 10 OECD           1.13 
## 11 Pacific       -2.86 
## 12 World          0.704

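As with the overall averages, the decrease in crude oil production is much sharper from 2019 to 2020 than over 2000-2020. Between 2000 and 2020, the largest average decreases were in the Pacific (-2.86), Asia (-1.85), Europe (-1.80), and Latin America (-1.43), while groupings such as North America (+3.64), the G7 (+2.64), and BRICS (+2.15) increased their production. The Pacific is also the only region whose production grew from 2019 to 2020 (+6.46).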
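The regional table mixes geographic regions with overlapping groupings (World, OECD, G7, BRICS), and within a region it also averages aggregate rows, such as the Europe row, with individual countries. A minimal sketch that ranks regions from the largest decline down, after dropping rows whose Country equals their Region. That filter is a heuristic based only on the printed rows (World/World, OECD/OECD, Europe/Europe) and would need checking against the yearbook; it also misses aggregates like European Union. Country names and numbers are placeholders:

```r
library(tibble)
library(dplyr)

# Toy table with one 2000-2020 change per row (placeholder values)
countries <- tibble(
  Region        = c("World", "OECD", "Asia", "Asia", "Asia", "Pacific", "Europe", "Europe"),
  Country       = c("World", "OECD", "Asia", "A1", "A2", "P1", "Europe", "E1"),
  X2000_vs_2020 = c(0.5, 1, 0.2, 1, -2, -3, -1, -4)
)

region_ranking <- countries %>%
  filter(Country != Region) %>%   # heuristic: drop aggregate rows
  group_by(Region) %>%
  summarise(mean = mean(X2000_vs_2020, na.rm = TRUE)) %>%
  arrange(mean)                   # most negative (largest decline) first
region_ranking
```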

Visualizing the data

Finally, we plot production by region to better understand how crude oil production has changed around the globe.

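# parse_number() turns "X1990" into 1990 so the x-axis is a numeric scale
point_graph <- ggplot(mutate(tidy_crude_oil, year = parse_number(year)), aes(year, production, colour = Region)) + geom_point()
# Values match million tonnes (World ≈ 3,176 in 1990); coord_cartesian() zooms in without dropping points, unlike ylim()
point_graph + labs(title = "Crude Oil Production by Region per Year", y = "Production (Mt)", x = "Year") + coord_cartesian(ylim = c(0, 2000))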
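The conclusion is about the overall trend, which is easiest to read from the World total alone. The World values shown earlier (about 3,176 in 1990) lie above the 0-2000 range, so they are not visible in the scatter plot. A minimal sketch that draws only the World row as a line over time, using a toy table shaped like tidy_crude_oil with placeholder production values:

```r
library(tibble)
library(dplyr)
library(readr)
library(ggplot2)

# Toy long table shaped like tidy_crude_oil (placeholder production values)
toy <- tibble(
  Region     = rep(c("World", "Europe"), each = 3),
  Country    = rep(c("World", "France"), each = 3),
  year       = rep(c("X1990", "X2000", "X2020"), times = 2),
  production = c(100, 120, 115, 3, 2, 1)
)

# Keep the World total only and make year numeric ("X1990" -> 1990)
world_trend <- toy %>%
  filter(Country == "World") %>%
  mutate(year = parse_number(year))

world_plot <- ggplot(world_trend, aes(year, production)) +
  geom_line() +
  geom_point() +
  labs(title = "World crude oil production", x = "Year", y = "Production")
world_plot
```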

In conclusion, crude oil production has remained broadly stable over the last 20 years. We do see a promising decline from 2019 to 2020, but since the data stop in 2020, they cannot tell us whether this decline will continue.