Introduction

Natural gas is known as one of the more volatile commodities in the

market. This is surprising, since the Natural gas market is local and only

a few parameters are supposed to influence its price. In this project I

selected a few parameters that might influence the Natural gas price and

analysed the relations between them.


‘Total storage’ is probably the variable with the most influence

on the Natural gas price. This variable summarizes the production and

consumption and is published weekly by the ‘Energy Information Administration’.


Weather, or more specifically the ‘Temperature’, is another important

variable that determines the consumption of Natural gas. Daily average

‘Temperatures’ are published by ‘NOAA’.


The energy market includes many components that compete with one another;

here I chose the price of ‘Oil’ to represent the energy market. The Oil

market differs in many aspects from the Natural gas market, and is therefore a

good candidate to be included in a model.


The general strength of the economy is another variable that might

influence the consumption of Natural gas. I chose the ‘S&P’ index to

represent the strength of the economy.


========

Read the Data: S&P……, Natural Gas……. and OIL…….. Daily Price,

NG storage……, and Temp at New York……., Texas……, California……

and Wyoming…….

========

Univariate Plots Section

========

Data Structure


## [1] 5797   22
##  [1] "Date"          "Temp_CA"       "Temp_CA_B"     "diff_Temp_CA" 
##  [5] "Temp_WY"       "Temp_WY_B"     "diff_Temp_WY"  "Temp_TX"      
##  [9] "Temp_TX_B"     "diff_Temp_TX"  "Temp_NY"       "Temp_NY_B"    
## [13] "diff_Temp_NY"  "Temp_MN"       "Temp_MN_B"     "diff_Temp_MN" 
## [17] "Tot_storg"     "Tot_storg_B"   "diff_storg"    "NG_Price"     
## [21] "WTI_Price"     "S_P_Close_adj"
## [1] "Temp_CA_B" ":"         "Avg"       "Over"      "Under"    
## [1] "Temp_WY_B" ":"         "Avg"       "Over"      "Under"    
## [1] "Temp_TX_B" ":"         "Avg"       "Over"      "Under"    
## [1] "Temp_NY_B" ":"         "Avg"       "Over"      "Under"    
## [1] "Temp_MN_B" ":"         "Avg"       "Over"      "Under"    
## [1] "Tot_storg_B" ":"           "Avg"         "Over"        "Under"
##     Temp_CA       diff_Temp_CA        Temp_WY        diff_Temp_WY     
##  Min.   :42.00   Min.   :-17.333   Min.   :-15.00   Min.   :-42.2308  
##  1st Qu.:60.00   1st Qu.: -3.724   1st Qu.: 32.00   1st Qu.: -5.1538  
##  Median :67.00   Median : -0.400   Median : 45.00   Median :  0.6923  
##  Mean   :66.99   Mean   :  0.000   Mean   : 45.78   Mean   :  0.0000  
##  3rd Qu.:74.00   3rd Qu.:  3.400   3rd Qu.: 61.00   3rd Qu.:  6.1333  
##  Max.   :95.00   Max.   : 19.067   Max.   : 88.00   Max.   : 24.5333  
##  NA's   :450     NA's   :450       NA's   :820      NA's   :820       
##     Temp_TX       diff_Temp_TX         Temp_NY      diff_Temp_NY    
##  Min.   :13.00   Min.   :-29.3333   Min.   : 8.0   Min.   :-27.267  
##  1st Qu.:51.00   1st Qu.: -4.5333   1st Qu.:42.0   1st Qu.: -4.800  
##  Median :66.00   Median :  0.2308   Median :57.0   Median : -0.200  
##  Mean   :63.76   Mean   :  0.0000   Mean   :55.9   Mean   :  0.000  
##  3rd Qu.:78.00   3rd Qu.:  4.7333   3rd Qu.:71.0   3rd Qu.:  4.467  
##  Max.   :95.00   Max.   : 26.5333   Max.   :94.0   Max.   : 27.133  
##  NA's   :402     NA's   :402        NA's   :318    NA's   :318      
##     Temp_MN       diff_Temp_MN      Tot_storg      diff_storg       
##  Min.   :21.25   Min.   :-19.64   Min.   : 642   Min.   :-1134.917  
##  1st Qu.:45.75   1st Qu.: -2.25   1st Qu.:1796   1st Qu.: -254.188  
##  Median :57.25   Median :  0.10   Median :2500   Median :   15.706  
##  Mean   :57.68   Mean   :  0.00   Mean   :2433   Mean   :    2.974  
##  3rd Qu.:70.00   3rd Qu.:  2.55   3rd Qu.:3071   3rd Qu.:  286.938  
##  Max.   :84.50   Max.   : 14.03   Max.   :3929   Max.   : 1085.938  
##  NA's   :1024    NA's   :1024                                       
##     NG_Price        WTI_Price      S_P_Close_adj   
##  Min.   : 1.630   Min.   : 11.38   Min.   : 676.5  
##  1st Qu.: 3.370   1st Qu.: 31.08   1st Qu.:1121.2  
##  Median : 4.390   Median : 61.63   Median :1272.4  
##  Mean   : 4.942   Mean   : 61.82   Mean   :1296.8  
##  3rd Qu.: 6.180   3rd Qu.: 89.30   3rd Qu.:1418.2  
##  Max.   :18.480   Max.   :145.31   Max.   :2117.4  
##  NA's   :1732     NA's   :1720     NA's   :1711

50% of the time, the Natural Gas price lies between ‘$3.37’ and ‘$6.18’.

This interquartile range represents a change of about 83% relative to the

lower quartile. The same measure for the total storage of Natural Gas

shows a change of only about 71%. ‘Oil’, ‘S&P’ and

‘Temperature’ show changes of about 187%, 26% and 53% respectively.
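
One way to quantify such a range is the interquartile spread relative to the first quartile, (Q3 - Q1) / Q1. The sketch below assumes that definition (the report does not state which formula was used) and takes the quartiles from the summary output above; the data frame `quartiles` and the column `pct_change` are illustrative names. The rounded results may differ slightly from approximate figures quoted in the text.

```r
# Relative interquartile spread: (Q3 - Q1) / Q1, expressed in percent.
# The quartiles are copied from the summary output of the dataset.
quartiles <- data.frame(
  variable = c("NG_Price", "Tot_storg", "WTI_Price", "S_P_Close_adj", "Temp_MN"),
  q1 = c(3.37, 1796, 31.08, 1121.2, 45.75),
  q3 = c(6.18, 3071, 89.30, 1418.2, 70.00)
)
quartiles$pct_change <- round(100 * (quartiles$q3 - quartiles$q1) / quartiles$q1)
print(quartiles)
```

With the real data frame the same quantity could be computed directly from `quantile(x, c(0.25, 0.75), na.rm = TRUE)`.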

========

First we will look at the histogram of each variable.


##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   1.630   3.370   4.390   4.942   6.180  18.480    1732

Figure 1: Distribution of Natural Gas prices, depicted using a linear

scale (left) and a logarithmic scale (right). Notice that the distribution

does not follow any specific standard form.


##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   11.38   31.08   61.63   61.82   89.30  145.30    1720

Figure 2: Distribution of Oil prices, depicted using a linear scale (left)

and a logarithmic scale (right). Again the distribution does not follow any

specific standard form.


##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   676.5  1121.0  1272.0  1297.0  1418.0  2117.0    1711

Figure 3: Same as figure 1 and figure 2, but for the S&P prices. As with

the previous variables, the distribution does not follow any specific

standard form.


##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     642    1796    2500    2433    3071    3929
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -1135.000  -254.200    15.710     2.975   286.900  1086.000

Figure 4: Distribution of Natural Gas ‘Total Storage’. The distribution

seems compact on the linear scale (left). On the right, a histogram of

the difference between the ‘Total Storage’ and the daily

average.


##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  -15.00   32.00   45.00   45.78   61.00   88.00     820
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
## -42.2300  -5.1540   0.6923   0.0000   6.1330  24.5300      820

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   13.00   51.00   66.00   63.75   78.00   95.00     402
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
## -29.3300  -4.5330   0.2308   0.0000   4.7330  26.5300      402

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     8.0    42.0    57.0    55.9    71.0    94.0     318
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## -27.270  -4.800  -0.200   0.000   4.467  27.130     318

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   42.00   60.00   67.00   66.99   74.00   95.00     450
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## -17.330  -3.724  -0.400   0.000   3.400  19.070     450

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   21.25   45.75   57.25   57.68   70.00   84.50    1024
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  -19.64   -2.25    0.10    0.00    2.55   14.03    1024

Figure 5: Distribution of ‘Temperature’ for four different sites and the

average. Here again the distribution seems compact on the linear scale (left).

On the right, histograms of the difference between the

‘Temperature’ and the daily average.


========

Univariate Analysis

What is the structure of your dataset?


There are 10 numeric variables:

‘Date’ ‘Temp_CA’ ‘Temp_WY’ ‘Temp_TX’

‘Temp_NY’ ‘Temp_MN’ ‘Tot_storg’

‘NG_Price’ ‘WTI_Price’ ‘S_P_Close_adj’.


There are 6 categorical variables:

‘Tot_storg_B’ ‘Temp_CA_B’ ‘Temp_WY_B’

‘Temp_TX_B’ ‘Temp_NY_B’ ‘Temp_MN_B’


Each variable includes a time course of ~5800 daily time steps.


Categorical variables have 3 possible values: ‘Avg’, ‘Over’ or ‘Under’.

When temperatures are 20% higher (lower) than the average in the summer

(winter), consumption is expected to rise and therefore storage might go

‘Under’ the average; in the opposite case storage might go ‘Over’ it.


Storage also has a categorical variable with the same 3 possible values:

‘Avg’, ‘Over’ or ‘Under’, for values that are 20% ‘Over’ or ‘Under’ the

average.


In addition, for each categorical variable I calculated a continuous variable

that represents the difference from the average.
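
The categorisation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the report: the helper `categorise`, its `threshold` argument and the example numbers are hypothetical, and the 20% band is applied as a relative deviation from the daily average.

```r
# Hypothetical helper: label a value relative to its daily average.
# 'threshold' is the relative deviation (0.20 = 20%) that separates
# 'Over' / 'Under' from 'Avg'. Dividing by the average is only sensible
# when the average is not close to zero (an issue for cold-site temperatures).
categorise <- function(value, daily_avg, threshold = 0.20) {
  rel_diff <- (value - daily_avg) / abs(daily_avg)
  out <- ifelse(rel_diff > threshold, "Over",
                ifelse(rel_diff < -threshold, "Under", "Avg"))
  factor(out, levels = c("Avg", "Over", "Under"))
}

storage   <- c(2500, 3200, 1700, NA)      # made-up storage values
daily_avg <- c(2400, 2500, 2400, 2400)    # made-up daily averages
data.frame(storage, daily_avg,
           diff_storg  = storage - daily_avg,            # continuous difference
           Tot_storg_B = categorise(storage, daily_avg)) # categorical label
```

Missing values stay missing in both the continuous and the categorical column, which matches the NA counts carried through the summary tables.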

========

What is/are the main feature(s) of interest in your dataset?


‘Natural gas’ Price and its relation to ‘Natural gas Storage’ are the main

features of interest.


I want to determine in what way a ‘Storage’ deficit or surplus

correlates with or predicts the Natural gas price. I believe that ‘Storage’,

or some derivative of ‘Storage’, encapsulates the supply and demand

in the market and therefore the price. The main interest of this project

is to build a model that will predict Natural gas prices.

========

What other features in the dataset do you think will help support your

investigation into your feature(s) of interest?


Temperature: The largest consumers of Natural gas are power plants.

When extreme (or mild) temperatures persist for a long time, the high

(or low) energy consumption might change the storage balance, and

Natural gas prices are consequently expected to follow.


Oil: Perhaps the most dominant player in the Energy market is Oil.

This commodity is influenced by many parameters, from politics to

advances in technology. Eventually, persistent changes in oil will

propagate to other energy commodities, in our case Natural gas prices.


S&P: Economic growth is associated with changes in energy consumption,

and therefore with changes in Natural gas prices.

========

Did you create any new variables from existing variables in the dataset?


Yes.

Daily average Temperature: Since the dataset includes more than 10 years,

it is possible to calculate an average for each day of the year.


Daily average Storage: Same as for temperature.


Difference between Temperature and the Daily average Temperature: This

variable is expected to be more correlated with Natural gas prices.


Difference between Storage and the Daily average Storage: Same as for

temperature.
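
A minimal sketch of how a day-of-year average and the difference from it could be computed in base R. The data are synthetic, and the column names `Temp`, `Temp_avg` and `diff_Temp` are illustrative rather than taken from the original code.

```r
# Toy data: three years of daily temperatures with a yearly cycle plus noise.
set.seed(1)
dates <- seq(as.Date("2001-01-01"), as.Date("2003-12-31"), by = "day")
doy   <- as.integer(format(dates, "%j"))          # day of year, 1..365
temp  <- 55 - 20 * cos(2 * pi * doy / 365) + rnorm(length(dates), sd = 5)
D <- data.frame(Date = dates, Temp = temp)

# Daily average: mean over all years for the same calendar day.
D$Temp_avg  <- ave(D$Temp, doy, FUN = function(x) mean(x, na.rm = TRUE))
# Difference from the daily average (the 'diff_' style of variable).
D$diff_Temp <- D$Temp - D$Temp_avg

mean(D$diff_Temp)   # numerically zero, because each day's average is removed
```

By construction the differences average to zero, which is consistent with the zero means reported for the `diff_Temp_*` columns in the summary output.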

========

Of the features you investigated, were there any unusual distributions?


All the price variables (Natural gas, Oil, S&P) have distributions with a

long right tail. In the case of Natural gas, the long tail is

associated with short-term increases in prices that are probably

associated with bad weather. Oil prices show the most irregular distribution, with

two high peaks around $20 and $100. The S&P price distribution is also skewed to the

right; this is caused by the uptrend.


The Temperature and the Storage data show a compact distribution that is

associated with the oscillatory tendency of the variables. When these

variables are transformed into differences from the average, the

distribution becomes more or less normal.

========

Did you perform any operations on the data to tidy, adjust, or change the

form of the data? If so, why did you do this?


Yes.

The Temperature data extend only to 2013, so any calculation or

visualisation that includes Temperature covers only the

years 1999-2013.


‘Storage’ data are updated weekly, therefore I had to interpolate to achieve

daily resolution. For that I simply repeated the last reported value for

the next 6 days.
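
Repeating the last reported value is often called "last observation carried forward". A minimal base-R sketch, assuming the weekly reports have already been placed on a daily grid with `NA` on the days without a report; `fill_forward` is a hypothetical helper and the storage values are made up.

```r
# Carry the last reported value forward over missing days (base R only).
fill_forward <- function(x) {
  idx <- cummax(seq_along(x) * !is.na(x))  # index of the latest non-NA value
  idx[idx == 0] <- NA                      # leading NAs have nothing to copy
  x[idx]
}

days   <- seq(as.Date("2015-01-02"), by = "day", length.out = 15)
weekly <- rep(NA_real_, length(days))
weekly[c(1, 8, 15)] <- c(3089, 2853, 2637)   # made-up weekly reports

daily <- fill_forward(weekly)
data.frame(Date = days, Tot_storg = daily)
```

This produces a step function rather than a smooth interpolation, so the daily series changes only on report days.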

========

Bivariate Plots Section

It is important to look at the time course of each variable and see if we

can characterize any common trend or oscillation.


Figure 6: Time course of Natural gas price for the years 1999-2015


Figure 7: Time course of S&P price for the years 1999-2015


Figure 8: Time course of Oil price for the years 1999-2015


Storage and Temperature are expected to oscillate regularly, so

it makes sense to stack yearly charts instead of presenting one long time

course. It is also interesting to look at the variation from the average.

Figure 9: Yearly time course for the total (left) and differential (right)

Natural Gas storage for the years 1999-2015.


Figure 10: Yearly time course for the total (left) or differential (right)

Temperature at 4 different sites and the average for the years 1999-2013.


Scatter plots can expose relations between different variables.

Next we will look at some of these relations.


##              NG_Price  Tot_storg diff_storg
## NG_Price    1.0000000 -0.1532673 -0.3140274
## Tot_storg  -0.1532673  1.0000000  0.4820455
## diff_storg -0.3140274  0.4820455  1.0000000

Figure 11 A: Natural Gas price versus total (left) or differential (right)

storage for the years 1999-2015.


##                 NG_Price     Temp_MN diff_Temp_MN
## NG_Price      1.00000000 -0.06701201   -0.1076766
## Temp_MN      -0.06701201  1.00000000    0.2912299
## diff_Temp_MN -0.10767659  0.29122988    1.0000000

Figure 11 B: Natural Gas price versus total (left) or differential (right)

temperature for the years 1999-2013.


##                 NG_Price S_P_Close_adj WTI_Price
## NG_Price      1.00000000    0.01964506 0.2227468
## S_P_Close_adj 0.01964506    1.00000000 0.4588397
## WTI_Price     0.22274677    0.45883974 1.0000000

Figure 12: Natural Gas price versus oil and S&P and Oil versus S&P for the

years 1999-2015


##               Tot_storg    Temp_MN diff_storg diff_Temp_MN
## Tot_storg    1.00000000 0.04128615 0.48204551    0.0285670
## Temp_MN      0.04128615 1.00000000 0.07903532    0.2912299
## diff_storg   0.48204551 0.07903532 1.00000000    0.0870010
## diff_Temp_MN 0.02856700 0.29122988 0.08700100    1.0000000

Figure 13: Temperature versus Storage for total values (left) and

differential values (right), years 1999-2013. Notice how the phase

shift between these two cyclic variables creates a circle on the left.


Another way to find relations between variables is to categorize one

variable and present a box plot for each category. Here are the category

box plots for the average storage and for the Temperature.


## Source: local data frame [3 x 2]
## 
##   Tot_storg_B To_strg_X_NG_Prce
## 1         Avg        -0.1441367
## 2        Over        -0.5523560
## 3       Under         0.1168235

Figure 14: Distribution of Natural Gas price versus categorical storage

(‘Over’, ‘Under’ or around the average (‘Avg’)), for the years 1999-2015.

The color bar represents the total storage for each value on the chart.

Notice that when the storage is at the low end of its range (blue colors)

more values tend to be categorized as ‘Under’ and Natural Gas prices are

higher.


## Source: local data frame [3 x 2]
## 
##   Temp_MN_B TMP_X_NG_Prce
## 1       Avg   -0.03380176
## 2      Over   -0.11191966
## 3     Under   -0.15481746

Figure 15: Distribution of Natural Gas price versus categorical average

Temperature (‘Over’, ‘Under’ or around the average (‘Avg’)), for the years

1999-2013. Notice that when the Temperature is at the low end of its range

(blue colors) more values tend to be categorized as ‘Under’. In this case

Natural Gas prices are not significantly higher.


## Source: local data frame [3 x 2]
## 
##   Temp_NY_B TMP_NY_X_NG_Prce
## 1       Avg     -0.007556627
## 2      Over     -0.009706085
## 3     Under     -0.123056873

Figure 16: Same as figure 15, but for the temperature over New York.

Notice that when the Temperature is at the low end of its range

(blue colors) more values tend to be categorized as ‘Under’. In this

case Natural Gas prices are not significantly higher.


## Source: local data frame [3 x 2]
## 
##   Temp_WY_B TMP_WY_X_NG_Prce
## 1       Avg       -0.0528076
## 2      Over       -0.0305201
## 3     Under       -0.1107332

Figure 17: Same as for figure 15 but the temperature over Wyoming.


## Source: local data frame [3 x 2]
## 
##   Temp_TX_B TMP_TX_X_NG_Prce
## 1       Avg     -0.004726036
## 2      Over     -0.034693772
## 3     Under     -0.131771058

Figure 18: Same as for figure 15 but the temperature over Texas.


## Source: local data frame [3 x 2]
## 
##   Temp_CA_B TMP_CA_X_NG_Prce
## 1       Avg     -0.065810989
## 2      Over      0.002918756
## 3     Under      0.003412581

Figure 19: Same as for figure 15 but the temperature over California.


Bivariate Analysis

Talk about some of the relationships you observed in this part of the

investigation. How did the feature(s) of interest vary with other features

in the dataset?


‘Natural Gas total Storage’ and ‘Temperatures’ reveal a regular yearly

oscillation pattern (Figures 9 and 10). As expected, the temperatures at

different sites show relatively high correlation values (0.6-0.95); this

is easy to explain, since the yearly pattern common to all

the sites is stronger than the local fluctuations.


I expected to find a high correlation between the total storage and the

Temperature, because extreme temperatures (cold or hot) cause high

consumption. Surprisingly, the correlation between the variables is low

(|r| < 0.085 at three sites and ~0.16 in California). This can be explained

by a phase shift between the two variables. The circular pattern in

figure 13 on the left supports this explanation.
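
The effect of a phase shift on a correlation can be illustrated with two idealised yearly cycles. The sketch below uses pure sinusoids as stand-ins for temperature and storage and assumes a quarter-period lag; the actual lag between the two variables is not given in this report, so the numbers are illustrative only.

```r
# Two yearly cycles with the same period but a quarter-period phase shift.
day     <- 1:(365 * 10)                           # ten years of daily steps
temp    <- sin(2 * pi * day / 365)                # stand-in for temperature
storage <- sin(2 * pi * day / 365 - pi / 2)       # stand-in for storage, lagged

cor(temp, storage)   # close to zero although the two series are tightly coupled

# Shifting one series by the lag (about 91 days) recovers the relation.
lag <- 91
cor(temp[1:(length(day) - lag)], storage[(lag + 1):length(day)])

# plot(temp, storage) would trace a circle, as in a phase-shifted scatter plot.
```

A near-zero Pearson correlation therefore does not rule out a strong lagged dependence between two cyclic variables.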


The correlations between Natural Gas prices and the temperature at the

different sites are weak (between -0.03 and -0.07).


The correlation between Natural Gas prices and the storage is around

-0.15. The low correlation values support our basic assumption that the

regular yearly pattern does not affect the Natural Gas price.


##               Temp_CA      Temp_WY     Temp_TX     Temp_NY     Temp_MN
## Temp_CA    1.00000000  0.735444569  0.61531646  0.65087735  0.79734993
## Temp_WY    0.73544457  1.000000000  0.81129197  0.74525655  0.93029901
## Temp_TX    0.61531646  0.811291970  1.00000000  0.82616075  0.92505581
## Temp_NY    0.65087735  0.745256551  0.82616075  1.00000000  0.91026300
## Temp_MN    0.79734993  0.930299007  0.92505581  0.91026300  1.00000000
## Tot_storg  0.15803156  0.000797175 -0.03539659  0.08237812  0.04128615
## NG_Price  -0.03065331 -0.067089300 -0.06326081 -0.06688572 -0.06701201
##              Tot_storg    NG_Price
## Temp_CA    0.158031562 -0.03065331
## Temp_WY    0.000797175 -0.06708930
## Temp_TX   -0.035396593 -0.06326081
## Temp_NY    0.082378118 -0.06688572
## Temp_MN    0.041286152 -0.06701201
## Tot_storg  1.000000000 -0.15326731
## NG_Price  -0.153267307  1.00000000

The regular yearly oscillations in the Temperatures and the storage

are gone after removing the average (Figures 9 and 10). Correlations

between sites are about 0.25 or lower.


The correlation between the temperature diff and the storage diff is

indeed higher than the correlation between the total values, but still

very low (0.09 compared to 0.04).


As expected, we find a relatively high negative correlation

between the storage diff and the Natural Gas price (-0.31).

Indeed, in figure 14 we see that Storage ‘Under’ the average is associated

with high Natural Gas prices and vice versa.


The correlation between the temperature diff and Natural Gas prices is low

(-0.11), just slightly stronger than for the total value (-0.07).


The correlations between Natural Gas prices and the diff temperature at the

different sites are at most slightly stronger than the values obtained for the

total temperature (|r| < 0.1 compared to |r| < 0.07).


##              diff_Temp_CA diff_Temp_WY diff_Temp_TX diff_Temp_NY
## diff_Temp_CA   1.00000000  0.220898543  -0.16147412  -0.15647710
## diff_Temp_WY   0.22089854  1.000000000   0.22207063  -0.13187436
## diff_Temp_TX  -0.16147412  0.222070631   1.00000000   0.25363262
## diff_Temp_NY  -0.15647710 -0.131874360   0.25363262   1.00000000
## diff_Temp_MN   0.30932571  0.682137116   0.65704584   0.44089789
## diff_storg     0.05159556  0.003577635   0.03872496   0.11032799
## NG_Price       0.02465302 -0.075679962  -0.06864114  -0.08951413
##              diff_Temp_MN   diff_storg    NG_Price
## diff_Temp_CA    0.3093257  0.051595557  0.02465302
## diff_Temp_WY    0.6821371  0.003577635 -0.07567996
## diff_Temp_TX    0.6570458  0.038724957 -0.06864114
## diff_Temp_NY    0.4408979  0.110327990 -0.08951413
## diff_Temp_MN    1.0000000  0.087001004 -0.10767659
## diff_storg      0.0870010  1.000000000 -0.31402736
## NG_Price       -0.1076766 -0.314027363  1.00000000

The correlation between Natural Gas and Oil is ~0.22, and

between Natural Gas and S&P it is practically 0 (0.02).


The correlation between Oil and S&P is relatively high (0.46). This can

be explained if economic growth is correlated with higher oil consumption.


The correlation between Oil and the Natural Gas storage diff is surprisingly

high (r ~ 0.63) compared to the negative correlation between the storage diff

and the Natural Gas price (r ~ -0.31). I’m not sure I understand this high

correlation, but it is a very interesting finding.


##               diff_Temp_MN diff_storg S_P_Close_adj WTI_Price    NG_Price
## diff_Temp_MN    1.00000000  0.0870010    0.05351308 0.0301859 -0.10767659
## diff_storg      0.08700100  1.0000000    0.25026325 0.6341025 -0.31402736
## S_P_Close_adj   0.05351308  0.2502632    1.00000000 0.4588397  0.01964506
## WTI_Price       0.03018590  0.6341025    0.45883974 1.0000000  0.22274677
## NG_Price       -0.10767659 -0.3140274    0.01964506 0.2227468  1.00000000

=========

Did you observe any interesting relationships between the other features

(not the main feature(s) of interest)?


See the discussion above.

=========

What was the strongest relationship you found?


Omitting the correlations between temperatures at different sites, the

strongest relationship is between diff_storg and WTI_Price (0.63). A few

interesting relations:

diff_Temp_MN - NG_Price -0.11

diff_Temp_NY - diff_storg 0.11

diff_storg - WTI_Price 0.63

Temp_MN - WTI_Price 0.14

S_P_Close_adj - WTI_Price 0.46

=========


Multivariate Plots Section


It is interesting to ask whether the correlations between the variables

depend on the year. It is possible to visualise this by adding another

feature to the chart.


Figure 20: Natural Gas price versus total and differential storage. The

colors in the upper 2 panels depict the different years, and points from

the same year are grouped together. The grouping is even stronger for the

diff storage on the right. The price versus years chart reveals that in

some years the diff storage is mapped to the price as a gradual drift from

red to blue (for example in 2012 and in 2001).


We can calculate the correlation between NG_Price and diff_storg (the

differential storage) separately for each year.


## Source: local data frame [15 x 2]
## 
##    year Storage_corr
## 1  1999   -0.7294453
## 2  2000   -0.8924068
## 3  2001   -0.9464521
## 4  2002   -0.7683198
## 5  2003   -0.3811539
## 6  2004    0.4280155
## 7  2005   -0.7678334
## 8  2006   -0.1546736
## 9  2007   -0.2814906
## 10 2008   -0.8101995
## 11 2009   -0.5555645
## 12 2010   -0.3132351
## 13 2011   -0.6827816
## 14 2012   -0.8531196
## 15 2013   -0.7563140
## Warning: Stacking not well defined when ymin != 0

Figure 21: Correlation between Natural Gas price and diff storage in

different years. Clearly the correlation changes between years,

but perhaps more importantly the changes are not chaotic and

show some trend.
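
A per-year correlation of this kind can be sketched in base R by splitting the data frame by year. The data below are synthetic and built so that the two years behave differently; the original tables appear to come from a dplyr summary, which is not reproduced here, and the choice of `use = "complete.obs"` for missing values is an assumption.

```r
# Toy data: two years in which price and storage surplus are related differently.
set.seed(42)
D <- data.frame(
  Date       = seq(as.Date("2001-01-01"), as.Date("2002-12-31"), by = "day"),
  diff_storg = rnorm(730, sd = 300)
)
D$year <- format(D$Date, "%Y")
# 2001: price falls as the surplus grows; 2002: price is unrelated to it.
D$NG_Price <- ifelse(D$year == "2001",
                     5 - 0.004 * D$diff_storg + rnorm(730, sd = 0.3),
                     4 + rnorm(730, sd = 0.3))

# Correlation of price with storage surplus, computed separately per year.
by_year <- sapply(split(D, D$year), function(d) {
  cor(d$NG_Price, d$diff_storg, use = "complete.obs")
})
round(by_year, 2)
```

The result is a named vector with one correlation per year, which can be plotted as a bar chart over years.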


Figure 22: Same as figure 20, but here the Natural Gas price is plotted

versus the diff Temp. Notice that in this case, in every year (color), the

same temp difference is associated with both low and high prices. This is

also apparent in the change in price over the years (bottom right): the

colors in this case do not drift gradually.


Again we can look at the actual correlation values in each year.

## Source: local data frame [15 x 2]
## 
##    year   Temp_corr
## 1  1999 -0.11049689
## 2  2000 -0.48550205
## 3  2001 -0.25247534
## 4  2002 -0.15724076
## 5  2003 -0.46831319
## 6  2004 -0.16472860
## 7  2005 -0.08559575
## 8  2006  0.30827462
## 9  2007 -0.14109156
## 10 2008  0.02304362
## 11 2009 -0.06444451
## 12 2010 -0.30705068
## 13 2011 -0.18292568
## 14 2012 -0.21111542
## 15 2013 -0.19409883
## Warning: Stacking not well defined when ymin != 0

Figure 23: In this case the correlation is lower, but again not chaotic

(in most years we obtain a negative correlation of around -0.2).


## Source: local data frame [15 x 2]
## 
##    year    S_P_corr
## 1  1999  0.44350051
## 2  2000 -0.55670275
## 3  2001  0.71317629
## 4  2002 -0.56097296
## 5  2003 -0.36810068
## 6  2004  0.24713270
## 7  2005  0.51071959
## 8  2006 -0.03382188
## 9  2007 -0.07560007
## 10 2008  0.71021792
## 11 2009 -0.01623393
## 12 2010 -0.42482277
## 13 2011  0.49349541
## 14 2012  0.37061700
## 15 2013  0.33753725
## Warning: Stacking not well defined when ymin != 0

Figure 24: Same as above, but here we look at the correlation between

Natural Gas and S&P. The correlation values in this case are erratic.


## Source: local data frame [15 x 2]
## 
##    year    WTI_corr
## 1  1999  0.73011326
## 2  2000  0.20715653
## 3  2001  0.64506176
## 4  2002  0.79127760
## 5  2003  0.47702588
## 6  2004  0.31833830
## 7  2005  0.72281280
## 8  2006 -0.03328527
## 9  2007 -0.23480866
## 10 2008  0.87247248
## 11 2009 -0.23223902
## 12 2010 -0.25482049
## 13 2011  0.11523410
## 14 2012 -0.60280745
## 15 2013 -0.29430215
## Warning: Stacking not well defined when ymin != 0

Figure 25: Same as above, but here we look at the correlation between

Natural Gas and Oil. The correlation values in this case are erratic.


## Source: local data frame [15 x 2]
## 
##    year corr_Strg_Tmp
## 1  1999   -0.18050204
## 2  2000    0.48650392
## 3  2001    0.25077913
## 4  2002   -0.01951085
## 5  2003    0.24072780
## 6  2004   -0.15606296
## 7  2005   -0.09999636
## 8  2006   -0.15177588
## 9  2007   -0.22323150
## 10 2008   -0.05999107
## 11 2009   -0.20665424
## 12 2010   -0.10107451
## 13 2011   -0.04251272
## 14 2012    0.17353399
## 15 2013    0.02118832
## Warning: Stacking not well defined when ymin != 0

Figure 26: Same as above, but here we look at the correlation between

diff storage and diff Temp. The correlation values in this case are

erratic.


===========

Linear model using the Diff values of Temp and Storage

## 
## Calls:
## m1: lm(formula = I(NG_Price) ~ I(diff_storg), data = D)
## m2: lm(formula = I(NG_Price) ~ I(diff_storg) + WTI_Price, data = D)
## m3: lm(formula = I(NG_Price) ~ I(diff_storg) + WTI_Price + diff_Temp_MN, 
##     data = D)
## m4: lm(formula = I(NG_Price) ~ I(diff_storg) + WTI_Price + diff_Temp_MN + 
##     S_P_Close_adj, data = D)
## 
## =======================================================
##                    m1        m2        m3        m4    
## -------------------------------------------------------
## (Intercept)      5.079***  1.854***  1.865***  3.432***
##                 (0.038)   (0.086)   (0.086)   (0.187)  
## I(diff_storg)   -0.002*** -0.005*** -0.005*** -0.005***
##                 (0.000)   (0.000)   (0.000)   (0.000)  
## WTI_Price                  0.054***  0.054***  0.059***
##                           (0.001)   (0.001)   (0.001)  
## diff_Temp_MN                        -0.036*** -0.033***
##                                     (0.008)   (0.008)  
## S_P_Close_adj                                 -0.002***
##                                               (0.000)  
## -------------------------------------------------------
## R-squared           0.099     0.396     0.400     0.416
## adj. R-squared      0.098     0.396     0.400     0.415
## sigma               2.182     1.786     1.781     1.757
## F                 361.135  1083.033   733.858   587.005
## p                   0.000     0.000     0.000     0.000
## Log-likelihood  -7263.107 -6601.160 -6590.266 -6546.663
## Deviance        15718.510 10527.840 10458.625 10186.109
## AIC             14532.214 13210.319 13190.532 13105.326
## BIC             14550.522 13234.730 13221.045 13141.942
## N                3303      3303      3303      3303    
## =======================================================

===========

Linear model using the Total values of Temp and Storage

## 
## Calls:
## m1: lm(formula = I(NG_Price) ~ I(Tot_storg), data = D)
## m2: lm(formula = I(NG_Price) ~ I(Tot_storg) + Temp_MN, data = D)
## m3: lm(formula = I(NG_Price) ~ I(Tot_storg) + Temp_MN + S_P_Close_adj, 
##     data = D)
## m4: lm(formula = I(NG_Price) ~ I(Tot_storg) + Temp_MN + S_P_Close_adj + 
##     WTI_Price, data = D)
## 
## =======================================================
##                    m1        m2        m3        m4    
## -------------------------------------------------------
## (Intercept)      6.095***  6.661***  6.157***  7.733***
##                 (0.129)   (0.205)   (0.298)   (0.294)  
## I(Tot_storg)    -0.000*** -0.000*** -0.000*** -0.001***
##                 (0.000)   (0.000)   (0.000)   (0.000)  
## Temp_MN                   -0.010*** -0.010*** -0.018***
##                           (0.003)   (0.003)   (0.003)  
## S_P_Close_adj                        0.000*   -0.001***
##                                     (0.000)   (0.000)  
## WTI_Price                                      0.029***
##                                               (0.001)  
## -------------------------------------------------------
## R-squared           0.023     0.027     0.029     0.130
## adj. R-squared      0.023     0.027     0.028     0.129
## sigma               2.271     2.267     2.266     2.145
## F                  79.409    46.100    32.578   123.096
## p                   0.000     0.000     0.000     0.000
## Log-likelihood  -7395.309 -7389.058 -7386.351 -7204.758
## Deviance        17028.504 16964.177 16936.389 15172.859
## AIC             14796.618 14786.117 14782.702 14421.516
## BIC             14814.926 14810.527 14813.215 14458.132
## N                3303      3303      3303      3303    
## =======================================================
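
The nested model comparison shown in the tables above can be sketched with `lm()` on synthetic data. The coefficients and noise level below are made up for illustration, so the resulting R-squared values are not those of the report; only the workflow (add one predictor at a time, then compare) is the point.

```r
# Synthetic stand-in for the analysis data frame; coefficients are made up.
set.seed(7)
n <- 500
D <- data.frame(diff_storg = rnorm(n, sd = 400),
                WTI_Price  = runif(n, 20, 120))
D$NG_Price <- 2 - 0.004 * D$diff_storg + 0.05 * D$WTI_Price + rnorm(n, sd = 1.5)

# Nested models: each adds one predictor to the previous one.
m1 <- lm(NG_Price ~ diff_storg, data = D)
m2 <- update(m1, . ~ . + WTI_Price)

# Share of variance explained by each model.
r2 <- c(m1 = summary(m1)$r.squared, m2 = summary(m2)$r.squared)
round(r2, 3)

# F-test: does adding WTI_Price improve the fit significantly?
anova(m1, m2)
```

Comparing R-squared between nested models shows how much each added predictor contributes, and `anova()` tests whether that contribution is larger than expected by chance.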

Multivariate Analysis

Talk about some of the relationships you observed in this part of the

investigation. Were there features that strengthened each other in terms

of looking at your feature(s) of interest?


The most interesting observation is that the correlations indeed change

between years. This indicates that some feature I did not consider

at first is influencing the correlations between the Natural Gas price and

the other variables. After all, in the last few years the development of

shale drilling has brought major changes to the energy market, and

obviously still does.


We can also see this in the correlation between the storage and the

temperature: from 2004 to 2011 the correlation between them is negative,

which may mean that even when the temperature is extreme the price still

goes down, as the oversupply dominates.


Another interesting relation is the positive correlation between Oil price

and gas storage. This correlation is higher in absolute value than the

correlation between the gas price and its storage, and it has the opposite

sign [...] companies apply using the gas price.


It is interesting that the correlations between the Natural Gas price and

the other parameters change over the years. This implies that there is

perhaps another parameter, not considered in the analysis, that causes

these changes. This point of view is strengthened by the fact that the

correlation between Gas and Oil is negative in most of the recent years.


==========

Were there any interesting or surprising interactions between features?

Yes, a few surprises; see the discussion above.


==========

OPTIONAL: Did you create any models with your dataset? Discuss the

strengths and limitations of your model.

Yes, I created a simple linear model.

The variables in the full linear model account for ~42% of the variance in

the price of Natural Gas. The diff_storg alone explains ~10% of the variance.

When adding WTI_Price the model explains ~40% of the variance.

Adding diff_Temp_MN leaves this at ~40%, and adding S_P_Close_adj raises

it to ~42%.

==========

Final Plots and Summary


Plot One

Description One

Changes in Natural Gas prices (X-axis) as a function of time

(Y-axis, years). Each bar represents the change within one year. In most

years we find a large change in price, which can sometimes even be dramatic.

In this project we wanted to know which parameters affect Natural Gas prices.


Plot Two

Description Two

Correlation values (Y-axis) as a function of time (X-axis, years). Here

we compare the correlation of Natural Gas prices with storage (pink),

Temperature (green) and Oil prices (blue). It is obvious that the

correlations change from year to year. The correlations with storage

and temperature are negative in almost all years. The correlation with

oil was positive until 2005 and has been mostly negative since 2006

(2008 and 2011 are the exceptions).

Plot Three

Description Three

Correlation values (Y-axis) as a function of time (X-axis, years). Here

we compare the correlation of Oil prices with storage (pink), S&P (green)

and Natural Gas prices (blue).

In this case the correlations, despite being high, change

sign even between consecutive years. Surprisingly, even the correlation

between S&P and Oil is not clear-cut.


Reflection

The analysis shows that Natural gas prices correlate with the temperature

and the storage; we also found a correlation with Oil prices. When looking

at the correlation data over the years we notice that on average the

correlation values do not change dramatically over time.

The same analysis done on Oil shows that the correlation values in this

case, even though high, are volatile. This comparison suggests that a model

for the NG price might remain valid for longer than a model for Oil.

To build a better model I will need to look at other variables that might

affect the balance between supply and demand. One option is the rig count.

Another option is to modify some of the parameters. For example, averaging

the temperature over one week instead of one day, and then calculating the

difference, might reflect longer periods of severe temperature that

affect the storage.
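
A weekly average of this kind could be sketched as a trailing 7-day mean. The helper `rolling_mean` and the temperature values are hypothetical; the smoothed series would then go through the same difference-from-daily-average step used for the daily values.

```r
# Trailing 7-day mean; the first six days have no full window and stay NA.
rolling_mean <- function(x, k = 7) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

temp <- c(30, 32, 31, 29, 28, 27, 26, 10, 9, 8)   # made-up daily temperatures
weekly_temp <- rolling_mean(temp)
weekly_temp
```

A trailing window uses only past values, so the smoothed temperature on a given day would not leak future information into a predictive model.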