With this project we will practice reading, cleaning, and exploring data, building data frames, pivot tables, and plots. We will use functions to perform repeatable tasks. We will fit a distribution to data. We will visualize results with plots using ggplot2 in this project to explore the data and gain further insight into the results of our analysis. We will summarize our findings in a report documented with an R markdown file and word or pdf output.
In this set we will build a data set using filters and if and diff statements. We will then answer some questions using plots and a pivot table report. We will then write a function to house our approach in case we would like to run the same analysis on other data sets.
Supply chain managers at our company continue to note we have a significant exposure to heating oil prices (Heating Oil No. 2, or HO2), specifically New York Harbor. The exposure hits the variable cost of producing several products. When HO2 is volatile, so is earnings. Our company has missed earnings forecasts for five straight quarters. To get a handle on Brent we download this data set and review some basic aspects of the prices.
Below is the top 5 rows from the dataframe of the Heating Oil
| DATE | DHOILNYH |
|---|---|
| 6/2/1986 | 0.402 |
| 6/3/1986 | 0.393 |
| 6/4/1986 | 0.378 |
| 6/5/1986 | 0.390 |
| 6/6/1986 | 0.385 |
| 6/9/1986 | 0.373 |
The data structure for the dataframe is shown below:
## 'data.frame': 7697 obs. of 2 variables:
## $ DATE : chr "6/2/1986" "6/3/1986" "6/4/1986" "6/5/1986" ...
## $ DHOILNYH: num 0.402 0.393 0.378 0.39 0.385 0.373 0.365 0.389 0.394 0.398 ...
|| || || || The distribution of the HO2 price is shown below using Boxplot 
The boxplot shows some un-usual pattern which can be considered as an oulier. This can also be as a process of un -usual events.
if and else statements to define a new column called direction. We will build a data frame to house this analysis.Below is a new structure of the new dataframe created with returns and direction and the price
## 'data.frame': 7696 obs. of 5 variables:
## $ date : Date, format: "1986-06-03" "1986-06-04" ...
## $ price : num 0.393 0.378 0.39 0.385 0.373 0.365 0.389 0.394 0.398 0.379 ...
## $ return : num -2.26 -3.89 3.13 -1.29 -3.17 ...
## $ size : num 2.26 3.89 3.13 1.29 3.17 ...
## $ direction: Factor w/ 3 levels "down","same",..: 1 1 3 1 1 1 3 3 3 1 ...
The datasets has a total observation of 7696.
Below is the plot of the returns using the ggplot2 packages

There seams to be be high flunctuation at every decades with a very high flunctuation around 1990 and 2000. The lowest returns happens around the year 2000.
Below is a bar graph of the absolute value of price rates using ggplot2

There are significant changes that occur in 1990 and round year 2000. The economic events that occur at trhis period need to further explore to understand what factors responsible for this huge changes.
Overlaying return on size

Summary statistics
| mean | std_dev | median | skewness | kurtosis |
|---|---|---|---|---|
| 0.0179 | 2.5236 | 0 | -1.4353 | 38.2595 |
Creating a Pivot Table pivot size and return on direction to answer the following question: * What is the average and range of returns by direction? * How often might we view positive or negative movements in HO2?
Below is the pivot table generated from the code execution output:
##
## FALSE TRUE
## 4039 3657
##
## FALSE TRUE
## 3936 3760
##
## down same up
## 3657 279 3760
##
## FALSE TRUE
## 7417 279
| direction | return.avg | return.sd | quantile.5 | quantile.95 | percent |
|---|---|---|---|---|---|
| down | -1.77 | 1.99 | -4.78 | -0.19 | 47.52 |
| same | 0.00 | 0.00 | 0.00 | 0.00 | 3.63 |
| up | 1.76 | 1.75 | 0.18 | 4.82 | 48.86 |
We will use the data from Part 1 to investigate the distribution of returns we generated. This will entail fitting the data to some parametric distributions as well as
We want to further characterize the distribution of up and down movements visually. Also we would like to repeat the analysis periodically for inclusion in management reports.
HO2.df data frame with ggplot2 and the cumulative relative frequency function stat_ecdf.
data_moments. Name this new function HO2_movement().Test HO2_movement() with data and display results in a table with 2 decimal places.
| direction | return.avg | return.sd | quantile.5 | quantile.95 | percent |
|---|---|---|---|---|---|
| down | -1.77 | 1.99 | -4.78 | -0.19 | 47.52 |
| same | 0.00 | 0.00 | 0.00 | 0.00 | 3.63 |
| up | 1.76 | 1.75 | 0.18 | 4.82 | 48.86 |
Morale: more work today (build the function) means less work tomorrow (write yet another report).
MASS package’s fitdistr() function to find the optimal fit of the HO2 data to a parametric distribution.## 'data.frame': 7696 obs. of 5 variables:
## $ date : Date, format: "1986-06-03" "1986-06-04" ...
## $ price : num 0.393 0.378 0.39 0.385 0.373 0.365 0.389 0.394 0.398 0.379 ...
## $ return : num -2.26 -3.89 3.13 -1.29 -3.17 ...
## $ size : num 2.26 3.89 3.13 1.29 3.17 ...
## $ direction: Factor w/ 3 levels "down","same",..: 1 1 3 1 1 1 3 3 3 1 ...
## shape rate
## 1.30753665 0.74299635
## (0.02716171) (0.01872184)
## m s df
## -1.30565487 0.91307703 2.50894659
## ( 0.02170850) ( 0.02061868) ( 0.12442996)
## shape rate
## 1.31056202 0.73969342
## (0.02761041) (0.01889467)
The time series plot shows lots of return clustering and spikes, especially negative ones. The lots of return volatility in the magnitude alone correlate with financial innovations from the ’80s and ’90s, as well as Gulf 1, Gulf 2, Great Recession, and 2011 events. Negative skew means there are more observations less than the median. High kurtosis means a very heavy tail, especially in negative returns.
There is a significant change in 1991 and year 2000. These can also be attributed The Gulf War is waged in the Middle East, by a U.N.-authorized coalition force from thirty-four nations, led by the U.S. and United Kingdom, against Iraq. 1991 — The World Wide Web publicly debuts as an Internet service.
NY Harbor Heating Oil has shown historical volatility and, as fuel for burning in furnaces and boilers in buildings, accounts for 25% the yield of a barrel of crude oil, the second largest cut after gasoline. The hard-to-predict futures requires a management plan with financial analytics to evaluate NYHH02, Brent crude and other oil performances on a regular basis. Management should prepare for the strong possibility that bad tail events frequently happen and might happen again. Policy advise: establish a decision process for approvals when the input prices are higher than the tolerable rate. If we are buying, we would experience strong swings in cost and input product utilization at the procurement end of the chain.
From the historical data, we can conclude that event like the gulf war, war in iraq and the financial crisis in 2009 has significant impact in the HO2.
Its is highly violatile and sensitive to housing price based on the activities that happened in 2009.