The pandas package has built-in capabilities for data visualization, built on top of matplotlib.
> import seaborn as sns
+ import numpy as np
+ import pandas as pd
> df1 = pd.read_csv('df1',index_col=0)
> df2 = pd.read_csv('df2')
> df1.head()
> df2.head()
A B C D
2000-01-01 1.339091 -0.163643 -0.646443 1.041233
2000-01-02 -0.774984 0.137034 -0.882716 -2.253382
2000-01-03 -0.921037 -0.482943 -0.417100 0.478638
2000-01-04 -1.738808 -0.072973 0.056517 0.015085
2000-01-05 -0.905980 1.778576 0.381918 0.291436
a b c d
0 0.039762 0.218517 0.103423 0.957904
1 0.937288 0.041567 0.899125 0.977680
2 0.780504 0.008948 0.557808 0.797510
3 0.672717 0.247870 0.264071 0.444358
4 0.053829 0.520124 0.552264 0.190008
There are several plot types built into pandas, most of them statistical in nature. Each is available as a method on df.plot: area, bar, barh, box, density, hexbin, hist, kde, line, pie, and scatter.
You can also call df.plot(kind='hist') and replace the kind argument with any of the names listed above (e.g. 'box', 'barh').
There are a few different ways to call the plots. The three histogram calls below draw the same kind of plot and differ only in styling:
> sns.set_style(style="darkgrid")
+ df1['A'].hist(bins=30, ec='yellow')
> df1['A'].plot(kind='hist', bins=30,
+ ec='white', color='green')
> df1['A'].plot.hist(bins=30, color='orange',
+ ec='black')
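As a quick check that the three call styles above are interchangeable, here is a minimal sketch. It assumes a synthetic series named demo in place of df1['A'], since the CSV file is not reproduced here, and it draws each variant on its own axes so the bars can be counted.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for df1['A'] (assumption: the CSV itself is not used here)
rng = np.random.default_rng(0)
demo = pd.Series(rng.standard_normal(500), name="A")

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
demo.hist(bins=30, ax=axes[0])               # Series.hist shortcut
demo.plot(kind="hist", bins=30, ax=axes[1])  # generic plot() with kind=
demo.plot.hist(bins=30, ax=axes[2])          # accessor method

# Each call draws one bar per bin; the bar heights add up to the sample size
bar_counts = [len(ax.patches) for ax in axes]
totals = [sum(p.get_height() for p in ax.patches) for ax in axes]
print(bar_counts, totals)
```

Passing ax= explicitly keeps the three histograms apart; without it, Series plots may be drawn onto whichever axes is currently active.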
> df2.plot.area(alpha=0.4)
> df2.plot.bar(color=['cyan', 'orange', 'pink', 'purple'])
> df2.plot.bar(stacked=True,
+ color=['cyan', 'orange', 'pink', 'purple'])
> df1.reset_index().plot.line(x='index',y='B',
+ figsize=(12,3),lw=1,color='purple')
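The line plot above relies on reset_index() turning an unnamed index into a column called 'index'. The sketch below illustrates that step with a tiny, hypothetical CSV held in memory; the file layout (an unnamed first column of dates) is an assumption about what the df1 file looks like.

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical CSV text with an unnamed first column, read the same way as df1
csv_text = ",A,B\n2000-01-01,1.0,2.0\n2000-01-02,3.0,4.0\n2000-01-03,5.0,6.0\n"
frame = pd.read_csv(io.StringIO(csv_text), index_col=0)

# The unnamed index becomes a regular column called 'index'
flat = frame.reset_index()
columns = list(flat.columns)

fig, ax = plt.subplots(figsize=(12, 3))
flat.plot.line(x="index", y="B", lw=1, color="purple", ax=ax)
y_values = list(ax.lines[0].get_ydata())
print(columns, y_values)
```

If the index had a name, reset_index() would use that name for the new column instead, and x= would need to match it.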
> df1.plot.scatter(x='A',y='B')
You can use c to color the points by another column's values, and cmap to choose the colormap. For all the colormaps, check out: http://matplotlib.org/users/colormaps.html
> df1.plot.scatter(x='A',y='B',c='C',
+ cmap='coolwarm')
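To see what c and cmap actually do, the following sketch inspects the scatter artist after plotting. It uses a synthetic frame named demo with columns A, B and C as a stand-in for df1 (an assumption, since the original data is not available here).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for df1 (assumption)
rng = np.random.default_rng(1)
demo = pd.DataFrame(rng.standard_normal((200, 3)), columns=["A", "B", "C"])

fig, ax = plt.subplots()
demo.plot.scatter(x="A", y="B", c="C", cmap="coolwarm", ax=ax)

# The scatter artist stores the values of column C and maps them through the colormap
points = ax.collections[0]
color_values = np.asarray(points.get_array())
cmap_name = points.get_cmap().name
print(cmap_name, color_values[:3])
```

The values in C are not colors themselves; they are numbers that matplotlib normalizes and looks up in the chosen colormap.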
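Or use s to set the marker size from another column. In older pandas versions s has to be an array rather than a column name (recent versions also accept the name). Sizes should be non-negative, so abs() is applied here because C contains negative values:
> df1.plot.scatter(x='A',y='B',
+ s=df1['C'].abs()*100)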
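Because a column such as C can contain negative values, one option is to rescale it to a positive range before passing it as s. The sketch below does this with a simple min-max rescaling; the 10 to 200 size range and the synthetic demo frame are illustrative assumptions, not values from the original data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for df1 (assumption)
rng = np.random.default_rng(2)
demo = pd.DataFrame(rng.standard_normal((200, 3)), columns=["A", "B", "C"])

# Min-max rescale column C into marker areas between 10 and 200 (arbitrary range)
c = demo["C"]
sizes = 10 + 190 * (c - c.min()) / (c.max() - c.min())

fig, ax = plt.subplots()
demo.plot.scatter(x="A", y="B", s=sizes, ax=ax)

# The sizes actually stored on the scatter artist
drawn = np.asarray(ax.collections[0].get_sizes())
print(drawn.min(), drawn.max())
```

Unlike taking the absolute value, this rescaling preserves the ordering of the column, so the smallest value of C gets the smallest marker.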
> df2.plot.box(patch_artist=True)
+ # Can also pass a by= argument for groupby
Hexagonal bin plots are useful for bivariate data, as an alternative to a scatter plot when there are too many points to tell apart:
> df = pd.DataFrame(np.random.randn(1000, 2),
+ columns=['a', 'b'])
+ df.plot.hexbin(x='a',y='b',
+ gridsize=25,cmap='Oranges')
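A hexbin plot colors each hexagon by the number of points that fall inside it. The following sketch repeats the call above on its own axes and reads those counts back from the plotted artist; the seed and the variable names are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same shape of data as in the example above, with a fixed seed for repeatability
rng = np.random.default_rng(3)
demo = pd.DataFrame(rng.standard_normal((1000, 2)), columns=["a", "b"])

fig, ax = plt.subplots()
demo.plot.hexbin(x="a", y="b", gridsize=25, cmap="Oranges", ax=ax)

# Per-hexagon point counts stored on the plotted collection
counts = np.asarray(ax.collections[0].get_array())
print(counts.max(), counts.sum())
```

A larger gridsize gives smaller hexagons and lower counts per hexagon, so it is worth trying a few values for a given dataset.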
> df2['a'].plot.kde()
> df2.plot.density()