1.Synopsis

1.1 Problem Statement

This a study about the Imports of Crude oil by US over time. Sources Suggest that US imported 9.4 million barrels of petroleum from 88 countries in 2015. Imports and Exports play an important part in determining the economic growth of any country and crude oil forms one of the most important part of source for energy and hence it is an important aspect for any country. Through this study I have tried to find the trend of imports over time and effects it has prices the citizens have to pay.

1.2 Introduction to Dataset

This is a public dataset which contains the historical data about the U.S. Imports of Crude Oil from 1973-present. The dataset has Monthly report that provides national import data including quantity, price of unit for crude oil. It also measures the changes from the previous period. Statistics are also reported on a year-to-date basis. The data was downloaded from an Online Resource where we can find many national dataset regarding various economic and social aspects. Through this data the trend analysis of crude oil import is done over time using various data visualization techniques.

1.3 Insights

Some Interesting insights are that: The Quantity of Imports has been decreasing in the last five years after attaining peak in 2008.

The Change in price of a unit of crude oil decreases as the year ends that is from September to January and increases from April to July.

Looking at this pattern it can be said that for the month of November the Unit Price will be either the same as that of October or witness a very slight increase.

2. Packages Used

library(readr)       # To Read the Data table in R#
library(tidyverse)   # To display data as tibble, use ggplot and various over functions to clean the data # 
library(stringr)     # To use functions to split the strings and form separate columns from messy data #
library(DT)         #To view the data in a scrollable table#
library(purrr)      # To use map function#
library(varhandle)  #To unfactor a column to be numeric#
library(stats)      #To calculate Statistics#

3.Data Preparation

3.1 Data Source

The data source can be found here.The data is imported from this .txt file using read_table. The raw data has 286 observations and 13 variables. The data can be looked at in a neat form in this pdf. From this pdf and datasource we can get the entire description of each variable. The data was last updated in November as it has the data points till October 2016.

3.2 Variable Description

There are seven variables in the original dataset. They are as follows:

Month The month for which import of crude oil is provided

Quantity Quantity of crude oil imported (in thousands of barrel)

Change in Quantity The change in the import of crude oil from the previous period stated

Value Value of crude oil imported (in thousands of Dollars)

Change in Value The change in the amount spent on import of crude oil from the previous period stated

Unit Price Price of one unit barrel crude oil imported (in Dollars)

Change in Price The change in the price of one unit of crude oil compared to the previous period stated

3.3 Data Cleaning

The data was read from the text file and the imported dataset looked like the following.

url<-"http://www.census.gov/foreign-trade/statistics/historical/petr.txt"
dataset<-read_table(url, col_names = FALSE, skip = 12)
datatable(dataset)

This was a very messy data and the table was blocked with no “Year” variable and no uniformity in it. Also the table was spread so that that it had 13 variables instead of seven. Also the seventh column had the entries for two columns merged. In order to tidy the data I followed the following steps:

Deleting the unwanted rows

delrows<-function(df,rowdeleted){df[-seq(0,nrow(df), by=rowdeleted),]}
dataset1<-delrows(dataset,14)

Separating the merged column

temp <- data.frame(value= numeric(0), space = character(0), 
                month = character(0), stringsAsFactors = FALSE)
for (i in 1:nrow(dataset1)) 
{
  if (grepl("\\d", dataset1$X7[i]))
  {
    a<-unlist(strsplit(dataset1$X7[i], " "))
    
  }else{
    a<-c("NA","",dataset1$X7[i])
  }
  temp[i,]<-t(a)
}

Combining temp file to main dataset

dataset1<-dataset1[,-7]
colnames(dataset1)<-c("Month1", "Quantity1", "Change in Quantity1", "Value1", 
                      "Change in Value1", "Unit Price1", "Quantity2","Change in Quantity2",
                      "Value2", "Change in Value2", "Unit Price2","Change in Price2")

tillunitprice <- which(names(dataset1)=="Unit Price1")
dataset1<-data.frame(dataset1[1:tillunitprice],
                     "Change in Price1" = temp[,"value"],
                     "Month2" = temp[,"month"], 
                     dataset1[(tillunitprice+1):ncol(dataset1)], stringsAsFactors = FALSE)

dataset1<-delrows(dataset1,13)
datatable(dataset1)

Inserting the year column

y1<-seq(2016,1974,-2)
yearfunction<-function(x){
  rep(x,each = 12)
}

yeardataframe1<-as.data.frame(yearfunction(y1))
colnames(yeardataframe1)<-("Year")
dataset1<-data.frame("Year" = yeardataframe1[,"Year"],
                     dataset1[1:ncol(dataset1)], stringsAsFactors = FALSE)

y2<-seq(2015,1973,-2)
yeardataframe2<-as.data.frame(yearfunction(y2))
colnames(yeardataframe2)<-("Year")
tillchangeinprice <- which(names(dataset1)=="Change.in.Price1")
dataset1<-data.frame(dataset1[1:tillchangeinprice],
                     "Year2" = yeardataframe2[,"Year"],
                     dataset1[(tillchangeinprice+1):ncol(dataset1)], stringsAsFactors = FALSE)

datatable(dataset1)

Separating Files

c<-c(9:16)
part1<-dataset1[,-c]
c<-c(1:8)
part2<-dataset1[,-c]

colnames(part1)<-c("Year","Month", "Quantity", 
                   "Change in Quantity", "Value", 
                   "Change in Value", "Unit Price",
                   "Change in Price")
colnames(part2)<-c("Year","Month", "Quantity", 
                   "Change in Quantity", "Value", 
                   "Change in Value", "Unit Price",
                   "Change in Price")

Changing DataTypes, Merging and Sorting

part1$`Change in Price`<-as.numeric(part1$`Change in Price`)
finaldataset<-merge(part1,part2, all = TRUE)
finaldataset<-finaldataset[order(finaldataset$Year, 
                                 match(finaldataset$Month,
                                  month.name)), ]
c<-(527:528)
finaldataset<-finaldataset[-c,]

3.4 Final Dataset

After all the steps the data is now in usable format and this is what the final dataset looks like.

datatable(finaldataset)

3.5 Summary Of Dataset

Some of the important descriptions of the dataset can be found here.

str(finaldataset)
## 'data.frame':    526 obs. of  8 variables:
##  $ Year              : num  1973 1973 1973 1973 1973 ...
##  $ Month             : chr  "January" "February" "March" "April" ...
##  $ Quantity          : num  102750 95276 112053 98841 123102 ...
##  $ Change in Quantity: num  NA -7474 16776 -13212 24261 ...
##  $ Value             : num  282153 259675 316283 279958 357042 ...
##  $ Change in Value   : num  NA -22478 56608 -36325 77084 ...
##  $ Unit Price        : num  2.75 2.73 2.82 2.83 2.9 3.05 3.15 3.26 3.38 3.54 ...
##  $ Change in Price   : num  NA -0.02 0.09 0.01 0.07 0.15 0.1 0.11 0.12 0.16 ...
columns<-c(3:8)
map(finaldataset[,columns],mean, na.rm = TRUE)
## $Quantity
## [1] 215965.5
## 
## $`Change in Quantity`
## [1] 278.0571
## 
## $Value
## [1] 8106674
## 
## $`Change in Value`
## [1] 21570.43
## 
## $`Unit Price`
## [1] 33.89865
## 
## $`Change in Price`
## [1] 0.07097143
map(finaldataset[,columns],sd, na.rm = TRUE)
## $Quantity
## [1] 68378.2
## 
## $`Change in Quantity`
## [1] 23735.39
## 
## $Value
## [1] 8061893
## 
## $`Change in Value`
## [1] 1449285
## 
## $`Unit Price`
## [1] 27.68565
## 
## $`Change in Price`
## [1] 2.836669
map(finaldataset[,columns],quantile, na.rm = TRUE)
## $Quantity
##       0%      25%      50%      75%     100% 
##  61132.0 158898.8 217083.0 274258.2 344729.0 
## 
## $`Change in Quantity`
##     0%    25%    50%    75%   100% 
## -80824 -15837    805  15717  70909 
## 
## $Value
##       0%      25%      50%      75%     100% 
##   259675  2857979  4135165 10486362 42322157 
## 
## $`Change in Value`
##        0%       25%       50%       75%      100% 
## -12377435   -360055     56608    456380   7787417 
## 
## $`Unit Price`
##       0%      25%      50%      75%     100% 
##   2.7300  14.8475  23.5400  39.3800 124.5800 
## 
## $`Change in Price`
##     0%    25%    50%    75%   100% 
## -24.99  -0.52   0.09   0.96  11.19
sum(is.na(finaldataset))
## [1] 3
cor(finaldataset$Quantity, finaldataset$`Unit Price`, method = "pearson")
## [1] 0.4158794

As there are just three missing values in the very first observation we let them be as those values are not used anywhere in the exploratory analysis. It can be seen that there is very less correlation between Quantity Imported and its Price. This implies that the demand of oil is always high and not much can be commented about the price of a unit just by looking at the Quantity imported.

4. Exploratory Data Analysis

***Starting with the very basic analysis we can observe the trend that the value of the imports follows over the years in observations. For this an total for each year is plotted against the years. It is observed that the value of the imports is more or less same till 1999 after which it rises dramatically to reach peak in 2008(probably around the time of recession). In last five years it has continuously came down.

finaldataset %>% group_by(Year) %>% mutate(TotalValue = sum(Value, na.rm = TRUE)) %>% 
                 ggplot(aes(Year,TotalValue))+geom_point(color = "blue")+geom_line() + 
                 ggtitle("Total Value of Oil Imported By Year")

***There was an interesting pattern when the Price of a Unit was observed for individual months. To observe this pattern the Unit Prices for years 1990 onwards were plotted for each month and it was observed that the prices increase as the year progresses. The median for the months January to April is lower than that for months June to October. It is interesting to note that many of the years had low values for May as the median for that month is particularly low but the value again starts rising as the year goes to its end.

finaldataset$MonthNumber<-match(finaldataset$Month, month.name)
finaldataset$Month_Abb<-factor(month.abb[finaldataset$MonthNumber], levels = month.abb)

finaldataset %>% filter(Year %in% c(1990:2016)) %>% group_by(Month_Abb) %>% 
                ggplot(aes(x=Month_Abb,y = `Unit Price` ))+ 
                geom_boxplot(outlier.colour = "orange") +
                geom_jitter(alpha = .1, width = .25) + 
                ggtitle("Distribution of Unit Price by Month (1990 onwards)") + 
                labs(x = "Month", y = "Price of a Unit")

To further investigate this point a sample of observations consisting of only past ten years was considered. This time the change in price of a unit was chosen for analysis. And this chart provided useful results. It can be observed that the price change in January is always negative. This shows that the prices decrease from September of the previous year to February of the next year. But as the fiscal Year approaches its end the prices start increasing. This can be shown as the change in prices for March to May is mostly positive. June to August the prices are mostly increasing or constant. The Maximum change in Price over a month was spotted from October to November 2008.

finaldataset %>% filter(Year %in% c(2005:2016)) %>% 
                 ggplot(aes(x = Year, y = `Change in Price` )) +
                 geom_bar(stat = "identity" , color = "red") +
                 facet_wrap(~ Month_Abb, ncol = 4) + 
                 ggtitle("Change in Price/Unit Crude oil by Year")

Another Important Factor which indicated the economic stability of a country is the tax/surcharge collected by the goverment. A new variable is introduced here “Cost per Unit” which calculated the cost price of the crude oil as purchased by the government. The selling price is given in the data set. Using this cost price and selling price the tax collected by the government is calculated. The average tax per a single unit of crude oil per year is plotted against the year. It can be seen that the value lies between -2 cents to 2.7 cents.

finaldataset %>% mutate(CostPerUnit = Value/Quantity, 
                 Surcharge = `Unit Price` - CostPerUnit) %>% group_by(Year) %>%                                 mutate(AverageTax = mean(Surcharge, na.rm = TRUE)) %>%
                 ggplot(aes(Year, AverageTax)) + geom_bar(stat = "identity" , color="bisque")+                  ggtitle("Average Tax Collected over the Years on Crude Oil") + 
                 labs(y = "Average Tax", x = "Year" )

At Last, the observations with maximum value of Price for one unit of crude oil are mentioned below. Again we see that 2011/2012 registered the maximum price for one unit crude oil. Thus in a period of 43 years without considering the fluctuations the maximum price was observed in April 2012.

finaldataset %>% filter(Year %in% c(2011:2016)) %>% 
                 arrange(desc(`Unit Price`)) %>% 
                 top_n(10, `Unit Price`)

5.Summary Section

5.1 Addressing the Problem Statement

In this study about the US import of crude oil it can be seen that starting from the 1970s the import increased over time but from the last few years is falling which is good indication. This shows that the economy is not dependent on just the crude oil for energy requirements. Moreover it can be seen that the change in price per month is mostly negative from sep to January but rises towards the end of a Fiscal year and the beginning of another one. It can be predicted for the next two months that the Change in Unit price will be greater than zero and the price will increase as looking at the previous trend it can be seen that the change was slightly positive in October and once the change is positive in October it is not negative for November.

5.2 Conclusions

This study enhanced my understanding about a very interesting topic which is of global Importance apart from my expertise in using R. The important insight that are drawn above utilize the understanding of concepts taught in class and acquired through self learning.