Purpose, Process, Product

With this project we will practice reading, cleaning, and exploring data, building data frames, pivot tables, and plots. We will use functions to perform repeatable tasks. We will fit a distribution to data. We will visualize results with plots using ggplot2 in this project to explore the data and gain further insight into the results of our analysis. We will summarize our findings in a report documented with an R markdown file and word or pdf output.

Part 1

Data Preparation and Exploration

In this set we will build a data set using filters and if and diff statements. We will then answer some questions using plots and a pivot table report. We will then write a function to house our approach in case we would like to run the same analysis on other data sets.

Problem

Supply chain managers at our company continue to note we have a significant exposure to heating oil prices (Heating Oil No. 2, or HO2), specifically New York Harbor. The exposure hits the variable cost of producing several products. When HO2 is volatile, so is earnings. Our company has missed earnings forecasts for five straight quarters. To get a handle on Brent we download this data set and review some basic aspects of the prices.

Below is the top 5 rows from the dataframe of the Heating Oil

DATE DHOILNYH
6/2/1986 0.402
6/3/1986 0.393
6/4/1986 0.378
6/5/1986 0.390
6/6/1986 0.385
6/9/1986 0.373

The data structure for the dataframe is shown below:

## 'data.frame':    7697 obs. of  2 variables:
##  $ DATE    : chr  "6/2/1986" "6/3/1986" "6/4/1986" "6/5/1986" ...
##  $ DHOILNYH: num  0.402 0.393 0.378 0.39 0.385 0.373 0.365 0.389 0.394 0.398 ...

|| || || || The distribution of the HO2 price is shown below using Boxplot

The boxplot shows some un-usual pattern which can be considered as an oulier. This can also be as a process of un -usual events.

Questions

  1. What is the nature of HO2 returns? We want to reflect the ups and downs of price movements, something of prime interest to management. First, we calculate percentage changes as log returns. Our interest is in the ups and downs. To look at that we use if and else statements to define a new column called direction. We will build a data frame to house this analysis.

Below is a new structure of the new dataframe created with returns and direction and the price

## 'data.frame':    7696 obs. of  5 variables:
##  $ date     : Date, format: "1986-06-03" "1986-06-04" ...
##  $ price    : num  0.393 0.378 0.39 0.385 0.373 0.365 0.389 0.394 0.398 0.379 ...
##  $ return   : num  -2.26 -3.89 3.13 -1.29 -3.17 ...
##  $ size     : num  2.26 3.89 3.13 1.29 3.17 ...
##  $ direction: Factor w/ 3 levels "down","same",..: 1 1 3 1 1 1 3 3 3 1 ...

The datasets has a total observation of 7696.

Below is the plot of the returns using the ggplot2 packages

There seams to be be high flunctuation at every decades with a very high flunctuation around 1990 and 2000. The lowest returns happens around the year 2000.

Below is a bar graph of the absolute value of price rates using ggplot2

There are significant changes that occur in 1990 and round year 2000. The economic events that occur at trhis period need to further explore to understand what factors responsible for this huge changes.

Overlaying return on size

Summary statistics

mean std_dev median skewness kurtosis
0.0179 2.5236 0 -1.4353 38.2595

Creating a Pivot Table pivot size and return on direction to answer the following question: * What is the average and range of returns by direction? * How often might we view positive or negative movements in HO2?

Below is the pivot table generated from the code execution output:

## 
## FALSE  TRUE 
##  4039  3657
## 
## FALSE  TRUE 
##  3936  3760
## 
## down same   up 
## 3657  279 3760
## 
## FALSE  TRUE 
##  7417   279
direction return.avg return.sd quantile.5 quantile.95 percent
down -1.77 1.99 -4.78 -0.19 47.52
same 0.00 0.00 0.00 0.00 3.63
up 1.76 1.75 0.18 4.82 48.86

Part 2

We will use the data from Part 1 to investigate the distribution of returns we generated. This will entail fitting the data to some parametric distributions as well as

Problem

We want to further characterize the distribution of up and down movements visually. Also we would like to repeat the analysis periodically for inclusion in management reports.

Questions

  1. How can we show the differences in the shape of ups and downs in HO2, especially given our tolerance for risk? Let’s use the HO2.df data frame with ggplot2 and the cumulative relative frequency function stat_ecdf.

  1. How can we regularly, and reliably, analyze HO2 price movements? For this requirement, let’s write a function similar to data_moments. Name this new function HO2_movement().

Test HO2_movement() with data and display results in a table with 2 decimal places.

direction return.avg return.sd quantile.5 quantile.95 percent
down -1.77 1.99 -4.78 -0.19 47.52
same 0.00 0.00 0.00 0.00 3.63
up 1.76 1.75 0.18 4.82 48.86

Morale: more work today (build the function) means less work tomorrow (write yet another report).

  1. Suppose we wanted to simulate future movements in HO2 returns. What distribution might we use to run those scenarios? Here, let’s use the MASS package’s fitdistr() function to find the optimal fit of the HO2 data to a parametric distribution.
## 'data.frame':    7696 obs. of  5 variables:
##  $ date     : Date, format: "1986-06-03" "1986-06-04" ...
##  $ price    : num  0.393 0.378 0.39 0.385 0.373 0.365 0.389 0.394 0.398 0.379 ...
##  $ return   : num  -2.26 -3.89 3.13 -1.29 -3.17 ...
##  $ size     : num  2.26 3.89 3.13 1.29 3.17 ...
##  $ direction: Factor w/ 3 levels "down","same",..: 1 1 3 1 1 1 3 3 3 1 ...
##      shape         rate   
##   1.30753665   0.74299635 
##  (0.02716171) (0.01872184)
##         m             s            df     
##   -1.30565487    0.91307703    2.50894659 
##  ( 0.02170850) ( 0.02061868) ( 0.12442996)
##      shape         rate   
##   1.31056202   0.73969342 
##  (0.02761041) (0.01889467)

Summary

Skills used

  1. ggplot2: This package is used for data visualization
  2. Moment: Moment is used for statistical summary like standard deviation, kutorsis, skewness etc
  3. dplyr: For dataframe manipulation and computation in other to get the required format.
  4. MASS: For optimization function.

Data Insights

The time series plot shows lots of return clustering and spikes, especially negative ones. The lots of return volatility in the magnitude alone correlate with financial innovations from the ’80s and ’90s, as well as Gulf 1, Gulf 2, Great Recession, and 2011 events. Negative skew means there are more observations less than the median. High kurtosis means a very heavy tail, especially in negative returns.

There is a significant change in 1991 and year 2000. These can also be attributed The Gulf War is waged in the Middle East, by a U.N.-authorized coalition force from thirty-four nations, led by the U.S. and United Kingdom, against Iraq. 1991 — The World Wide Web publicly debuts as an Internet service.

Concluding Remarks

NY Harbor Heating Oil has shown historical volatility and, as fuel for burning in furnaces and boilers in buildings, accounts for 25% the yield of a barrel of crude oil, the second largest cut after gasoline. The hard-to-predict futures requires a management plan with financial analytics to evaluate NYHH02, Brent crude and other oil performances on a regular basis. Management should prepare for the strong possibility that bad tail events frequently happen and might happen again. Policy advise: establish a decision process for approvals when the input prices are higher than the tolerable rate. If we are buying, we would experience strong swings in cost and input product utilization at the procurement end of the chain.

From the historical data, we can conclude that event like the gulf war, war in iraq and the financial crisis in 2009 has significant impact in the HO2.

Its is highly violatile and sensitive to housing price based on the activities that happened in 2009.