Datetime Index

To analyze financial data the index typically needs to be a date and a time. Here’s how we can create one:

> import numpy as np
+ import pandas as pd
+ import matplotlib.pyplot as plt
+ from datetime import datetime

> # To illustrate the order of arguments
+ my_year = 2017
+ my_month = 1
+ my_day = 2
+ my_hour = 13
+ my_minute = 30
+ my_second = 15

> # January 2nd, 2017
+ my_date = datetime(my_year,my_month,my_day)
+ my_date # Time defaults to 0:00

datetime.datetime(2017, 1, 2, 0, 0)

> # January 2nd, 2017 at 13:30:15
+ my_date_time = datetime(my_year,my_month,my_day,
+                         my_hour,my_minute,my_second)
+ my_date_time

datetime.datetime(2017, 1, 2, 13, 30, 15)

You can retrieve any part of the datetime object:

> my_date.day

> my_date_time.hour

Pandas with Datetime Index

Pandas has several functions and methods to work with time series.

> # Create an example datetime list/array
+ first_two = [datetime(2016, 1, 1), datetime(2016, 1, 2)]
+ first_two

[datetime.datetime(2016, 1, 1, 0, 0), datetime.datetime(2016, 1, 2, 0, 0)]

> # Converted to an index
+ dt_ind = pd.DatetimeIndex(first_two)
+ dt_ind

DatetimeIndex(['2016-01-01', '2016-01-02'], dtype='datetime64[ns]', freq=None)

> # Attached to some random data
+ data = np.random.randn(2,2)
+ print(data)

[[-0.33535521  0.66132426]
 [-0.05602004 -0.18535596]]

> cols = ['A','B']

> df = pd.DataFrame(data,dt_ind,cols)
+ df

                   A         B
2016-01-01 -0.335355  0.661324
2016-01-02 -0.056020 -0.185356

> df.index

DatetimeIndex(['2016-01-01', '2016-01-02'], dtype='datetime64[ns]', freq=None)

> # Latest Date Location
+ df.index.argmax()

> df.index.max()

Timestamp('2016-01-02 00:00:00')

> # Earliest Date Index Location
+ df.index.argmin()

> df.index.min()

Timestamp('2016-01-01 00:00:00')

Time Resampling

Sometimes you need to convert a Date column into an index:

> # Import Data
> df = pd.read_csv('time_data/walmart_stock.csv')

Notice that Date is just a non-null object:

> df.head()

         Date       Open       High        Low      Close    Volume  Adj Close
0  2012-01-03  59.970001  61.060001  59.869999  60.330002  12668800  52.619235
1  2012-01-04  60.209999  60.349998  59.470001  59.709999   9593300  52.078475
2  2012-01-05  59.349998  59.619999  58.369999  59.419998  12768200  51.825539
3  2012-01-06  59.419998  59.450001  58.869999  59.000000   8069400  51.459220
4  2012-01-09  59.029999  59.549999  58.919998  59.180000   6679300  51.616215

> df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1258 entries, 0 to 1257
Data columns (total 7 columns):
Date         1258 non-null object
Open         1258 non-null float64
High         1258 non-null float64
Low          1258 non-null float64
Close        1258 non-null float64
Volume       1258 non-null int64
Adj Close    1258 non-null float64
dtypes: float64(5), int64(1), object(1)
memory usage: 68.9+ KB

Create a date index from the date column:

> # convert Date to a datetime and set as index
+ df['Date'] = pd.to_datetime(df['Date'])
+ df.head()

        Date       Open       High        Low      Close    Volume  Adj Close
0 2012-01-03  59.970001  61.060001  59.869999  60.330002  12668800  52.619235
1 2012-01-04  60.209999  60.349998  59.470001  59.709999   9593300  52.078475
2 2012-01-05  59.349998  59.619999  58.369999  59.419998  12768200  51.825539
3 2012-01-06  59.419998  59.450001  58.869999  59.000000   8069400  51.459220
4 2012-01-09  59.029999  59.549999  58.919998  59.180000   6679300  51.616215

> df.set_index('Date',inplace=True)
+ df.head()

                 Open       High        Low      Close    Volume  Adj Close
Date                                                                       
2012-01-03  59.970001  61.060001  59.869999  60.330002  12668800  52.619235
2012-01-04  60.209999  60.349998  59.470001  59.709999   9593300  52.078475
2012-01-05  59.349998  59.619999  58.369999  59.419998  12768200  51.825539
2012-01-06  59.419998  59.450001  58.869999  59.000000   8069400  51.459220
2012-01-09  59.029999  59.549999  58.919998  59.180000   6679300  51.616215

As a faster alternative you could just set the index when importing the file:

                 Open       High        Low      Close    Volume  Adj Close
Date                                                                       
2012-01-03  59.970001  61.060001  59.869999  60.330002  12668800  52.619235
2012-01-04  60.209999  60.349998  59.470001  59.709999   9593300  52.078475
2012-01-05  59.349998  59.619999  58.369999  59.419998  12768200  51.825539
2012-01-06  59.419998  59.450001  58.869999  59.000000   8069400  51.459220
2012-01-09  59.029999  59.549999  58.919998  59.180000   6679300  51.616215

> df = pd.read_csv('time_data/walmart_stock.csv', 
>                  index_col='Date',parse_dates=True)
> df.head()

Resample

You can resample (aggregate) the data using resample().

Here is a list of offset codes:

Alias	Description
B	business day frequency
C	custom business day frequency (experimental)
D	calendar day frequency
W	weekly frequency
M	month end frequency
SM	semi-month end frequency (15th and end of month)
BM	business month end frequency
CBM	custom business month end frequency
MS	month start frequency
SMS	semi-month start frequency (1st and 15th)
BMS	business month start frequency
CBMS	custom business month start frequency
Q	quarter end frequency
BQ	business quarter endfrequency
QS	quarter start frequency
BQS	business quarter start frequency
A	year end frequency
BA	business year end frequency
AS	year start frequency
BAS	business year start frequency
BH	business hour frequency
H	hourly frequency
T, min	minutely frequency
S	secondly frequency
L, ms	milliseconds
U, us	microseconds
N	nanoseconds

> # Our index
+ df.index

DatetimeIndex(['2012-01-03', '2012-01-04', '2012-01-05', '2012-01-06',
               '2012-01-09', '2012-01-10', '2012-01-11', '2012-01-12',
               '2012-01-13', '2012-01-17',
               ...
               '2016-12-16', '2016-12-19', '2016-12-20', '2016-12-21',
               '2016-12-22', '2016-12-23', '2016-12-27', '2016-12-28',
               '2016-12-29', '2016-12-30'],
              dtype='datetime64[ns]', name='Date', length=1258, freq=None)

When aggregating we need a mathematical rule to join the rows (mean,sum,count,etc…):

> # Yearly Means
+ df.resample(rule='A').mean()

                 Open       High        Low      Close        Volume  Adj Close
Date                                                                           
2012-12-31  67.158680  67.602120  66.786520  67.215120  9.239015e+06  59.389349
2013-12-31  75.264048  75.729405  74.843055  75.320516  6.951496e+06  68.147179
2014-12-31  77.274524  77.740040  76.864405  77.327381  6.515612e+06  71.709712
2015-12-31  72.569405  73.064167  72.034802  72.491111  9.040769e+06  68.831426
2016-12-31  69.481349  70.019643  69.023492  69.547063  9.371645e+06  68.054229

> df.resample(rule='A').max()

                 Open       High        Low      Close    Volume  Adj Close
Date                                                                       
2012-12-31  77.599998  77.599998  76.690002  77.150002  38007300  68.568371
2013-12-31  81.209999  81.370003  80.820000  81.209999  25683700  73.929868
2014-12-31  87.080002  88.089996  86.480003  87.540001  22812400  81.707680
2015-12-31  90.800003  90.970001  89.250000  90.470001  80898100  84.914216
2016-12-31  74.500000  75.190002  73.629997  74.300003  35076700  73.233524

> df['Close'].resample('A').mean().plot(kind='bar');
+ plt.title('Yearly Mean Close Price for Walmart');
+ plt.tight_layout()
+ plt.show()

> df['Open'].resample('M').max().plot(kind='bar',figsize=(16,6))
+ plt.title('Monthly Max Opening Price for Walmart');
+ plt.tight_layout()
+ plt.show()

Custom Resampling

You could also create your own custom resampling function:

> def first_day(entry):
+     return entry[0]

> # price on first day of the year
+ df.resample(rule='A').apply(first_day)

                 Open       High        Low      Close    Volume  Adj Close
Date                                                                       
2012-12-31  59.970001  61.060001  59.869999  60.330002  12668800  52.619235
2013-12-31  68.930000  69.239998  68.449997  69.239998  10390800  61.879708
2014-12-31  78.720001  79.470001  78.500000  78.910004   6878000  72.254228
2015-12-31  86.269997  86.720001  85.550003  85.900002   4501800  80.624861
2016-12-31  60.500000  61.490002  60.360001  61.459999  11989200  59.289713

Time Shifts

Sometimes you may need to shift all your data up or down along the time series index:

SHIFT FORWARD

> df.shift(1).head()

                 Open       High        Low      Close      Volume  Adj Close
Date                                                                         
2012-01-03        NaN        NaN        NaN        NaN         NaN        NaN
2012-01-04  59.970001  61.060001  59.869999  60.330002  12668800.0  52.619235
2012-01-05  60.209999  60.349998  59.470001  59.709999   9593300.0  52.078475
2012-01-06  59.349998  59.619999  58.369999  59.419998  12768200.0  51.825539
2012-01-09  59.419998  59.450001  58.869999  59.000000   8069400.0  51.459220

SHIFT BACKWARDS

> df.shift(-1).tail()

                 Open       High        Low      Close     Volume  Adj Close
Date                                                                        
2016-12-23  69.300003  69.820000  69.250000  69.699997  4435700.0  69.191240
2016-12-27  69.940002  70.000000  69.260002  69.309998  4875700.0  68.804087
2016-12-28  69.209999  69.519997  69.120003  69.260002  4298400.0  68.754456
2016-12-29  69.120003  69.430000  68.830002  69.120003  6889500.0  68.615479
2016-12-30        NaN        NaN        NaN        NaN        NaN        NaN

SHIFTING BASED OFF TIME STRING CODE

> # Shift everything forward to month end
+ df.tshift(periods=1,freq='M').head()

                 Open       High        Low      Close    Volume  Adj Close
Date                                                                       
2012-01-31  59.970001  61.060001  59.869999  60.330002  12668800  52.619235
2012-01-31  60.209999  60.349998  59.470001  59.709999   9593300  52.078475
2012-01-31  59.349998  59.619999  58.369999  59.419998  12768200  51.825539
2012-01-31  59.419998  59.450001  58.869999  59.000000   8069400  51.459220
2012-01-31  59.029999  59.549999  58.919998  59.180000   6679300  51.616215

Pandas Rolling and Expanding

We can also create data based off of a rolling mean:

PLOT OF OPEN PRICES

> df['Open'].plot(figsize=(16,6));
+ plt.show()

7 DAY ROLLING MEAN

> # 7 day rolling mean
+ df.rolling(7).mean().head(20)

                 Open       High        Low      Close        Volume  Adj Close
Date                                                                           
2012-01-03        NaN        NaN        NaN        NaN           NaN        NaN
2012-01-04        NaN        NaN        NaN        NaN           NaN        NaN
2012-01-05        NaN        NaN        NaN        NaN           NaN        NaN
2012-01-06        NaN        NaN        NaN        NaN           NaN        NaN
2012-01-09        NaN        NaN        NaN        NaN           NaN        NaN
2012-01-10        NaN        NaN        NaN        NaN           NaN        NaN
2012-01-11  59.495714  59.895714  59.074285  59.440000  9.007414e+06  51.842984
2012-01-12  59.469999  59.744285  59.007143  59.321429  8.231357e+06  51.739567
2012-01-13  59.322857  59.638571  58.941428  59.297143  7.965071e+06  51.718386
2012-01-17  59.397143  59.708571  59.105714  59.358572  7.355329e+06  51.771963
2012-01-18  59.450000  59.791428  59.217143  59.502857  7.047043e+06  51.897808
2012-01-19  59.578572  59.960000  59.335715  59.707143  7.412086e+06  52.075984
2012-01-20  59.767143  60.180000  59.577143  59.988571  7.908014e+06  52.321443
2012-01-23  60.017143  60.387143  59.787143  60.204285  8.017800e+06  52.509586
2012-01-24  60.154286  60.672857  59.979999  60.474285  8.035857e+06  52.745077
2012-01-25  60.440000  60.958572  60.270000  60.749999  7.776786e+06  52.985553
2012-01-26  60.715714  61.205714  60.448571  60.910000  7.624814e+06  53.125103
2012-01-27  60.868572  61.361429  60.575714  61.010000  7.678514e+06  53.212323
2012-01-30  60.945715  61.445714  60.661428  61.108571  7.450271e+06  53.298295
2012-01-31  61.057143  61.491429  60.648571  61.158571  7.362086e+06  53.341905

PLOT WITH 30 DAY ROLLING MEAN

> df['Open'].plot(figsize=(16,6));
+ df.rolling(window=30).mean()['Close'].plot();
+ plt.show()

You can add a legend by making the rolling value a new column.

> df['Close: 30 Day Mean'] = df['Close'].rolling(window=30).mean()
+ df[['Close','Close: 30 Day Mean']].plot(figsize=(16,6));
+ plt.show()

Expanding

Instead of setting a window you could also take into account everything from the selected Date and before. For example, the \(x\) value represents the Date and the \(y\) value represents the mean of all values up to that Date.

> df['Close'].expanding(min_periods=1
+                       ).mean().plot(figsize=(16,6));
+ plt.show()

Bollinger Bands

Bollinger Bands are volatility bands placed above and below a moving average. The bands automatically widen when volatility increases and narrow when volatility decreases. According to Bollinger, the bands should contain 88-89% of price action, which makes a move outside the bands significant. Technically, prices are relatively high when above the upper band and relatively low when below the lower band.

> # \ is a line break
+ df['Close: 30 Day Mean'] = df['Close']. \
+                 rolling(window=20).mean()
+ df['Upper'] = df['Close: 30 Day Mean'] + \
+                 2*df['Close'].rolling(window=20).std()
+ df['Lower'] = df['Close: 30 Day Mean'] - \
+                 2*df['Close'].rolling(window=20).std()
+ df[['Close','Close: 30 Day Mean','Upper','Lower']]. \
+                 plot(figsize=(16,6));
+ plt.show()

Pandas with Time Series Data

Produced in R Markdown

Paul Jozefek

2020-12-06