To analyze financial data the index typically needs to be a date and a time. Here’s how we can create one:
> import numpy as np
+ import pandas as pd
+ import matplotlib.pyplot as plt
+ from datetime import datetime
> # To illustrate the order of arguments
+ my_year = 2017
+ my_month = 1
+ my_day = 2
+ my_hour = 13
+ my_minute = 30
+ my_second = 15
> # January 2nd, 2017
+ my_date = datetime(my_year,my_month,my_day)
+ my_date # Time defaults to 0:00
datetime.datetime(2017, 1, 2, 0, 0)
> # January 2nd, 2017 at 13:30:15
+ my_date_time = datetime(my_year,my_month,my_day,
+ my_hour,my_minute,my_second)
+ my_date_time
datetime.datetime(2017, 1, 2, 13, 30, 15)
You can retrieve any part of the datetime object:
2
13
Pandas has several functions and methods to work with time series.
> # Create an example datetime list/array
+ first_two = [datetime(2016, 1, 1), datetime(2016, 1, 2)]
+ first_two
[datetime.datetime(2016, 1, 1, 0, 0), datetime.datetime(2016, 1, 2, 0, 0)]
DatetimeIndex(['2016-01-01', '2016-01-02'], dtype='datetime64[ns]', freq=None)
[[-0.33535521 0.66132426]
[-0.05602004 -0.18535596]]
A B
2016-01-01 -0.335355 0.661324
2016-01-02 -0.056020 -0.185356
DatetimeIndex(['2016-01-01', '2016-01-02'], dtype='datetime64[ns]', freq=None)
1
Timestamp('2016-01-02 00:00:00')
0
Timestamp('2016-01-01 00:00:00')
Sometimes you need to convert a Date
column into an index:
Notice that Date
is just a non-null object:
Date Open High Low Close Volume Adj Close
0 2012-01-03 59.970001 61.060001 59.869999 60.330002 12668800 52.619235
1 2012-01-04 60.209999 60.349998 59.470001 59.709999 9593300 52.078475
2 2012-01-05 59.349998 59.619999 58.369999 59.419998 12768200 51.825539
3 2012-01-06 59.419998 59.450001 58.869999 59.000000 8069400 51.459220
4 2012-01-09 59.029999 59.549999 58.919998 59.180000 6679300 51.616215
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1258 entries, 0 to 1257
Data columns (total 7 columns):
Date 1258 non-null object
Open 1258 non-null float64
High 1258 non-null float64
Low 1258 non-null float64
Close 1258 non-null float64
Volume 1258 non-null int64
Adj Close 1258 non-null float64
dtypes: float64(5), int64(1), object(1)
memory usage: 68.9+ KB
Create a date index from the date column:
> # convert Date to a datetime and set as index
+ df['Date'] = pd.to_datetime(df['Date'])
+ df.head()
Date Open High Low Close Volume Adj Close
0 2012-01-03 59.970001 61.060001 59.869999 60.330002 12668800 52.619235
1 2012-01-04 60.209999 60.349998 59.470001 59.709999 9593300 52.078475
2 2012-01-05 59.349998 59.619999 58.369999 59.419998 12768200 51.825539
3 2012-01-06 59.419998 59.450001 58.869999 59.000000 8069400 51.459220
4 2012-01-09 59.029999 59.549999 58.919998 59.180000 6679300 51.616215
Open High Low Close Volume Adj Close
Date
2012-01-03 59.970001 61.060001 59.869999 60.330002 12668800 52.619235
2012-01-04 60.209999 60.349998 59.470001 59.709999 9593300 52.078475
2012-01-05 59.349998 59.619999 58.369999 59.419998 12768200 51.825539
2012-01-06 59.419998 59.450001 58.869999 59.000000 8069400 51.459220
2012-01-09 59.029999 59.549999 58.919998 59.180000 6679300 51.616215
As a faster alternative you could just set the index when importing the file:
Open High Low Close Volume Adj Close
Date
2012-01-03 59.970001 61.060001 59.869999 60.330002 12668800 52.619235
2012-01-04 60.209999 60.349998 59.470001 59.709999 9593300 52.078475
2012-01-05 59.349998 59.619999 58.369999 59.419998 12768200 51.825539
2012-01-06 59.419998 59.450001 58.869999 59.000000 8069400 51.459220
2012-01-09 59.029999 59.549999 58.919998 59.180000 6679300 51.616215
You can resample (aggregate) the data using resample()
.
Here is a list of offset codes:
Alias | Description |
---|---|
B | business day frequency |
C | custom business day frequency (experimental) |
D | calendar day frequency |
W | weekly frequency |
M | month end frequency |
SM | semi-month end frequency (15th and end of month) |
BM | business month end frequency |
CBM | custom business month end frequency |
MS | month start frequency |
SMS | semi-month start frequency (1st and 15th) |
BMS | business month start frequency |
CBMS | custom business month start frequency |
Q | quarter end frequency |
BQ | business quarter endfrequency |
QS | quarter start frequency |
BQS | business quarter start frequency |
A | year end frequency |
BA | business year end frequency |
AS | year start frequency |
BAS | business year start frequency |
BH | business hour frequency |
H | hourly frequency |
T, min | minutely frequency |
S | secondly frequency |
L, ms | milliseconds |
U, us | microseconds |
N | nanoseconds |
DatetimeIndex(['2012-01-03', '2012-01-04', '2012-01-05', '2012-01-06',
'2012-01-09', '2012-01-10', '2012-01-11', '2012-01-12',
'2012-01-13', '2012-01-17',
...
'2016-12-16', '2016-12-19', '2016-12-20', '2016-12-21',
'2016-12-22', '2016-12-23', '2016-12-27', '2016-12-28',
'2016-12-29', '2016-12-30'],
dtype='datetime64[ns]', name='Date', length=1258, freq=None)
When aggregating we need a mathematical rule to join the rows (mean,sum,count,etc…):
Open High Low Close Volume Adj Close
Date
2012-12-31 67.158680 67.602120 66.786520 67.215120 9.239015e+06 59.389349
2013-12-31 75.264048 75.729405 74.843055 75.320516 6.951496e+06 68.147179
2014-12-31 77.274524 77.740040 76.864405 77.327381 6.515612e+06 71.709712
2015-12-31 72.569405 73.064167 72.034802 72.491111 9.040769e+06 68.831426
2016-12-31 69.481349 70.019643 69.023492 69.547063 9.371645e+06 68.054229
Open High Low Close Volume Adj Close
Date
2012-12-31 77.599998 77.599998 76.690002 77.150002 38007300 68.568371
2013-12-31 81.209999 81.370003 80.820000 81.209999 25683700 73.929868
2014-12-31 87.080002 88.089996 86.480003 87.540001 22812400 81.707680
2015-12-31 90.800003 90.970001 89.250000 90.470001 80898100 84.914216
2016-12-31 74.500000 75.190002 73.629997 74.300003 35076700 73.233524
> df['Close'].resample('A').mean().plot(kind='bar');
+ plt.title('Yearly Mean Close Price for Walmart');
+ plt.tight_layout()
+ plt.show()
> df['Open'].resample('M').max().plot(kind='bar',figsize=(16,6))
+ plt.title('Monthly Max Opening Price for Walmart');
+ plt.tight_layout()
+ plt.show()
You could also create your own custom resampling function:
Open High Low Close Volume Adj Close
Date
2012-12-31 59.970001 61.060001 59.869999 60.330002 12668800 52.619235
2013-12-31 68.930000 69.239998 68.449997 69.239998 10390800 61.879708
2014-12-31 78.720001 79.470001 78.500000 78.910004 6878000 72.254228
2015-12-31 86.269997 86.720001 85.550003 85.900002 4501800 80.624861
2016-12-31 60.500000 61.490002 60.360001 61.459999 11989200 59.289713
Sometimes you may need to shift all your data up or down along the time series index:
SHIFT FORWARD
Open High Low Close Volume Adj Close
Date
2012-01-03 NaN NaN NaN NaN NaN NaN
2012-01-04 59.970001 61.060001 59.869999 60.330002 12668800.0 52.619235
2012-01-05 60.209999 60.349998 59.470001 59.709999 9593300.0 52.078475
2012-01-06 59.349998 59.619999 58.369999 59.419998 12768200.0 51.825539
2012-01-09 59.419998 59.450001 58.869999 59.000000 8069400.0 51.459220
SHIFT BACKWARDS
Open High Low Close Volume Adj Close
Date
2016-12-23 69.300003 69.820000 69.250000 69.699997 4435700.0 69.191240
2016-12-27 69.940002 70.000000 69.260002 69.309998 4875700.0 68.804087
2016-12-28 69.209999 69.519997 69.120003 69.260002 4298400.0 68.754456
2016-12-29 69.120003 69.430000 68.830002 69.120003 6889500.0 68.615479
2016-12-30 NaN NaN NaN NaN NaN NaN
SHIFTING BASED OFF TIME STRING CODE
Open High Low Close Volume Adj Close
Date
2012-01-31 59.970001 61.060001 59.869999 60.330002 12668800 52.619235
2012-01-31 60.209999 60.349998 59.470001 59.709999 9593300 52.078475
2012-01-31 59.349998 59.619999 58.369999 59.419998 12768200 51.825539
2012-01-31 59.419998 59.450001 58.869999 59.000000 8069400 51.459220
2012-01-31 59.029999 59.549999 58.919998 59.180000 6679300 51.616215
We can also create data based off of a rolling mean:
PLOT OF OPEN PRICES
7 DAY ROLLING MEAN
Open High Low Close Volume Adj Close
Date
2012-01-03 NaN NaN NaN NaN NaN NaN
2012-01-04 NaN NaN NaN NaN NaN NaN
2012-01-05 NaN NaN NaN NaN NaN NaN
2012-01-06 NaN NaN NaN NaN NaN NaN
2012-01-09 NaN NaN NaN NaN NaN NaN
2012-01-10 NaN NaN NaN NaN NaN NaN
2012-01-11 59.495714 59.895714 59.074285 59.440000 9.007414e+06 51.842984
2012-01-12 59.469999 59.744285 59.007143 59.321429 8.231357e+06 51.739567
2012-01-13 59.322857 59.638571 58.941428 59.297143 7.965071e+06 51.718386
2012-01-17 59.397143 59.708571 59.105714 59.358572 7.355329e+06 51.771963
2012-01-18 59.450000 59.791428 59.217143 59.502857 7.047043e+06 51.897808
2012-01-19 59.578572 59.960000 59.335715 59.707143 7.412086e+06 52.075984
2012-01-20 59.767143 60.180000 59.577143 59.988571 7.908014e+06 52.321443
2012-01-23 60.017143 60.387143 59.787143 60.204285 8.017800e+06 52.509586
2012-01-24 60.154286 60.672857 59.979999 60.474285 8.035857e+06 52.745077
2012-01-25 60.440000 60.958572 60.270000 60.749999 7.776786e+06 52.985553
2012-01-26 60.715714 61.205714 60.448571 60.910000 7.624814e+06 53.125103
2012-01-27 60.868572 61.361429 60.575714 61.010000 7.678514e+06 53.212323
2012-01-30 60.945715 61.445714 60.661428 61.108571 7.450271e+06 53.298295
2012-01-31 61.057143 61.491429 60.648571 61.158571 7.362086e+06 53.341905
PLOT WITH 30 DAY ROLLING MEAN
You can add a legend by making the rolling value a new column.
> df['Close: 30 Day Mean'] = df['Close'].rolling(window=30).mean()
+ df[['Close','Close: 30 Day Mean']].plot(figsize=(16,6));
+ plt.show()
Instead of setting a window you could also take into account everything from the selected Date
and before. For example, the \(x\) value represents the Date
and the \(y\) value represents the mean of all values up to that Date
.
Bollinger Bands are volatility bands placed above and below a moving average. The bands automatically widen when volatility increases and narrow when volatility decreases. According to Bollinger, the bands should contain 88-89% of price action, which makes a move outside the bands significant. Technically, prices are relatively high when above the upper band and relatively low when below the lower band.
> # \ is a line break
+ df['Close: 30 Day Mean'] = df['Close']. \
+ rolling(window=20).mean()
+ df['Upper'] = df['Close: 30 Day Mean'] + \
+ 2*df['Close'].rolling(window=20).std()
+ df['Lower'] = df['Close: 30 Day Mean'] - \
+ 2*df['Close'].rolling(window=20).std()
+ df[['Close','Close: 30 Day Mean','Upper','Lower']]. \
+ plot(figsize=(16,6));
+ plt.show()