Introduction

In spite of global efforts to transition towards sustainable energy solutions, crude oil retains its significance as a primary energy source in numerous nations and sectors.

Through refinement processes, crude oil yields a range of products including gasoline, diesel, kerosene, and heavy fuel oil, which are essential for powering transportation vehicles, aircraft, maritime vessels, machinery, and providing heating in select regions.

Notably, the United States occupies a prominent position as both the largest producer and consumer of crude oil on the global stage. This report will undertake an examination of crude oil imports specifically within the context of the United States.

Key Findings

Analysis

Import Data

oildf <- read.csv("~/Case Studies/US Oil Imports Case Study/US Oil Imports/data.csv")

Import Libraries

library('stringr')
library('dplyr')
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library('ggplot2')
library('patchwork')

Create New Data Frame

# create a new data frame containing only rows where origintype name is 'region'
regiondf <- subset(oildf, originTypeName == "Region")

See Structure of Data

str(regiondf)
## 'data.frame':    124589 obs. of  8 variables:
##  $ year               : int  2009 2009 2009 2009 2009 2009 2009 2009 2009 2009 ...
##  $ month              : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ originName         : chr  "Africa" "Africa" "Africa" "Africa" ...
##  $ originTypeName     : chr  "Region" "Region" "Region" "Region" ...
##  $ destinationName    : chr  "SHELL OIL PRODUCTS US / ST ROSE / LA" "SHELL OIL PRODUCTS US / ST ROSE / LA" "OIL TANKING PL INC / HOUSTON (GULF) / TX" "EXXONMOBIL REFINING & SPLY CO / BEAUMONT / TX" ...
##  $ destinationTypeName: chr  "Refinery" "Refinery" "Refinery" "Refinery" ...
##  $ gradeName          : chr  "Light Sweet" "Medium" "Light Sweet" "Light Sweet" ...
##  $ quantity           : int  665 2417 381 1501 474 1024 4249 1617 1076 274 ...

Data Cleaning

Clean Column Names

# insert '_' to column names 
newcolnames <- gsub("(?<=[a-z])(?=[A-Z])", "_", colnames(regiondf), perl = TRUE)
colnames(regiondf) <- newcolnames
# change all column names to lower case 
colnames(regiondf) <- tolower(colnames(regiondf))

Clean Table Entries

# capitalize only the first letter of each word in destination name 
regiondf$destination_name <- str_to_title(regiondf$destination_name)
# remove trailing spaces 
regiondf <- regiondf %>%  
    mutate_at(vars(origin_name, origin_type_name, destination_name, destination_type_name, grade_name), trimws)
# convert month column from numeric month values to character representations 
regiondf$month <- month.name[regiondf$month]

Check for Duplicates

sum(duplicated(regiondf))
## [1] 0

Check for NAs

sum(is.na(regiondf))
## [1] 0

Plots and Summary Tables

Total Yearly Oil Import Quantities across 2009 to 2024

yearly <- aggregate(quantity ~ year, regiondf, FUN = sum)
ggplot(yearly, aes(x = year, y = quantity)) +
    geom_line(color = "grey25") + 
    geom_point(aes(color = factor(year)), show.legend = FALSE) +
    labs(x = "Year", y = "Quantity ('000 barrels)") +
    theme(plot.background = element_rect(fill = "ivory"),
          panel.background = element_rect(fill = "ivory2"),
          text = element_text(family = "Helvetica", size = 10),
          axis.text = element_text(family = "Helvetica", size = 10),
          axis.title = element_text(family = "Helvetica", size = 10))

Subset of Dataframe Excluding 2024

# drop 2024 as data is incomplete
regiondf2 <- subset(regiondf, year != 2024)

Summary of Oil Imports into The US Each Year

yeardf <- split(regiondf2$quantity, regiondf2$year)
lapply(yeardf, summary)
## $`2009`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       2     423     878    2360    1994   92737 
## 
## $`2010`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       1     435     920    2496    2140   85796 
## 
## $`2011`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       1     400     873    2385    1983   81939 
## 
## $`2012`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       1     301     663    2254    1730   76845 
## 
## $`2013`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       1     276     639    2162    1691   71053 
## 
## $`2014`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       1     277     648    2231    1762   69005 
## 
## $`2015`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       1     309     709    2262    1835   69488 
## 
## $`2016`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       1     348     813    2427    2056   71114 
## 
## $`2017`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       1     347     866    2511    2184   76280 
## 
## $`2018`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0   346.5   796.0  2432.3  2049.0 89584.0 
## 
## $`2019`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       1     349     822    2255    2001   87337 
## 
## $`2020`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       1     348     723    2189    1768   82885 
## 
## $`2021`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       1     399     804    2259    1874   92380 
## 
## $`2022`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       2     397     862    2464    2064   92573 
## 
## $`2023`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       1     391     907    2584    2248   93508

Average Monthly Oil Import Quantities across 2009 to 2023

monthorder <- c("January", "February", "March", "April", "May", "June", "July",
                "August", "September", "October", "November", "December")
regiondf2$month <- factor(regiondf2$month, levels = monthorder)
monthly <- aggregate(quantity ~ month, regiondf2, function(x) round(mean(x), 2))
ggplot(monthly, aes(x = month, y = quantity)) +
    geom_line(group = 1, color = "grey25") +
    geom_point(aes(color = factor(month)), show.legend = FALSE) +
    labs(x = "Month", y = "Average Quantity ('000 barrels)") + 
    theme(plot.background = element_rect(fill = "ivory"),
          panel.background = element_rect(fill = "ivory2"),
          text = element_text(family = "Helvetica", size = 10),
          axis.text = element_text(family = "Helvetica", size = 10),
          axis.text.x = element_text(angle = 45, hjust = 1),
          axis.title = element_text(family = "Helvetica", size = 10))

Top 3 Overall and Yearly Oil Export Regions to The US

# top 3 oil export regions to the US of all time
t3rat <- regiondf2 %>% 
    group_by(origin_name) %>%  
    summarise(total_quantity = sum(quantity)) %>%  
    top_n(3, total_quantity)
origincolour <- c("thistle2", "cadetblue1", "hotpink1")
p1 <- ggplot(t3rat, aes(x = origin_name, y = total_quantity, fill = origincolour)) +
    geom_bar(stat = "identity", colour = "grey23", size = 0.2) + 
    scale_fill_manual(values = origincolour) +
    labs(x = "Origin", y = "Quantity ('000 barrels)", fill = "Origin Name") + 
    theme(plot.background = element_rect(fill = "ivory"),
          panel.background = element_rect(fill = "ivory2"),
          legend.background = element_rect(fill = "ivory"),
          text = element_text("Helvetica", size = 10), 
          axis.text = element_text("Helvetica", size = 10),
          axis.title = element_text(family = "Helvetica", size = 10))

# top 3 oil export regions to the US each year 
t3ry <- regiondf2 %>%  
    group_by(year, origin_name) %>%  
    summarise(total_quantity = sum(quantity)) %>%  
    top_n(3, total_quantity) %>%  
    arrange(year, desc(total_quantity))
origincolour2 <- c("lemonchiffon","hotpink1","thistle2", "cadetblue1")
p2 <- ggplot(t3ry, aes(x = year, y = total_quantity, fill = origin_name)) + 
    scale_fill_manual(values = origincolour2) +
    geom_bar(stat = "identity", color = "grey23", size = 0.2) +
    labs(x = "Year", y = "Quantity ('000 barrels)", fill = "Origin Name") +
    theme(plot.background = element_rect(fill = "ivory"),
          panel.background = element_rect(fill = "ivory2"),
          legend.background = element_rect(fill = "ivory"),
          text = element_text(family = "Helvetica", size = 10), 
          axis.text = element_text(family = "Helvetica", size = 10),
          axis.title = element_text(family = "Helvetica", size = 10))

combined1 <- p1 + p2 + plot_layout(ncol = 2)

combined1

Top 5 Overall and Yearly Oil Destinations in The US

# top 5 oil destination in the US of all time
t5dat <- regiondf2 %>%  
    group_by(destination_name) %>%  
    summarise(total_quantity = sum(quantity)) %>%  
    top_n(5, total_quantity) %>%  
    arrange(desc(total_quantity))
destinationcolours <- c("violetred2", "lavenderblush1", "darkseagreen1", "darkslategray2", "orchid")
p3 <- ggplot(t5dat, aes(x = destination_name, y = total_quantity, fill = destination_name)) +
    scale_fill_manual(values = destinationcolours) +
    geom_bar(stat = "identity", color = "grey23", size = 0.2) +
    labs(x = "Destination Name",y = "Quantity ('000 barrels)", fill = "Destination Name") +
    theme(plot.background = element_rect(fill = "ivory"),
          panel.background = element_rect(fill = "ivory2"),
          legend.background = element_rect(fill = "ivory"),
          text = element_text(family = "Helvetica", size = 10),
          axis.text = element_text(family = "Helvetica", size = 10), 
          axis.text.x = element_text(angle = 25, hjust = 1),
          axis.title = element_text(family = "Helvetica", size = 10))

# top 5 oil destination in the US each year
t5dy <- regiondf2 %>%  
    group_by(year, destination_name) %>%  
    summarise(total_quantity = sum(quantity)) %>%  
    top_n(5, total_quantity) %>%  
    arrange(year, desc(total_quantity))
destinationcolours2 <- c("lightgoldenrod","lightcoral","violetred2", "lavenderblush1", "darkseagreen1", "darkslategray2", "orchid")
p4 <- ggplot(t5dy, aes(x = year, y = total_quantity, fill = destination_name)) +
    geom_bar(stat = "identity", color = "grey23", size = 0.2) +
    scale_fill_manual(values = destinationcolours2) +
    labs(x = "Year", y = "Quantity ('000 barrels)", fill = "Destination Name") +
    theme(plot.background = element_rect(fill = "ivory"),
          panel.background = element_rect(fill = "ivory2"),
          legend.background = element_rect(fill = "ivory"),
          text = element_text(family = "Helvetica", size = 10),
          axis.text = element_text(family = "Helvetica", size = 10),
          axis.title = element_text(family = "Helvetica", size = 10))

combined2 <- p3 + p4 + plot_layout(ncol = 2)

combined2

Total Oil Import Quantities for Each Oil Grade

# distribution of import quantities by oil grade
graderank <- regiondf2 %>%
    group_by(grade_name) %>%  
    summarise(total_quantity = sum(quantity)) %>%  
    arrange(desc(total_quantity))
gradecolours <- c("cyan2", "aquamarine1", "slategray2","thistle1", "palevioletred")
ggplot(graderank, aes(x = grade_name, y = total_quantity, fill = grade_name)) +
    geom_bar(stat = "identity", color = "grey23", size = 0.2, show.legend =  FALSE) +
    scale_fill_manual(values = gradecolours) + 
    labs(x = "Grade Name", y = "Quantity ('000 barrels)") +
    theme(plot.background = element_rect(fill = "ivory"),
          panel.background = element_rect(fill = "ivory2"),
          text = element_text(family = "Helvetica", size = 10), 
          axis.title = element_text(family = "Helvetica", size = 10),
          axis.text = element_text(family = "Helvetica", size = 10))

Distribution of Oil Grade Imported by the Top 5 Desitinations in The US

t5dtgat <- subset(regiondf2, destination_name %in% c("United States", "Padd3 (Gulf Coast)",
                                                     "Padd2 (Midwest)", "Texas", "Padd5 (West Coast)"))
t5dtgat <- t5dtgat %>% 
    group_by(destination_name, grade_name) %>%  
    summarise(total_quantity = sum(quantity)) %>% 
    arrange(desc(total_quantity))
ggplot(t5dtgat, aes(x = destination_name, y = total_quantity, fill = grade_name)) +
    geom_bar(stat = "identity", color = "grey23", size  = 0.2) + 
    scale_fill_manual(values = gradecolours) + 
    labs(x = "Destination Name", y = "Quantity ('000 barrels)", fill = "Grade") + 
    theme(plot.background = element_rect(fill = "ivory"),
          panel.background = element_rect(fill = "ivory2"),
          legend.background = element_rect(fill = 'ivory'),
          text = element_text(family = "Helvetica", size = 10),
          axis.title = element_text(family = "Helvetica", size = 10),
          axis.text = element_text(family = "Helvetica", size = 10),
          axis.text.x = element_text(angle = 25, hjust = 1))

Conclusion

In conclusion, the analysis elucidates the dynamics of the United States’ oil trade and consumption patterns. While crude oil remains a fundamental energy source, there is a discernible decline in imports, reflecting global shifts towards greener energy alternatives such as wind, solar, bioenergy, and hydroelectric power. The consistent seasonal fluctuation in import volumes underscores the necessity of considering temporal factors in energy planning and market analysis. Understanding the preferences for specific oil grades and distribution patterns within the United States offers critical insights for stakeholders to adapt to evolving market conditions and make informed decisions regarding energy security and sustainability. These findings provide a solid foundation for policymakers, industry participants, and investors to effectively navigate the intricate landscape of the global energy transition.

Credit

Data Source