Pandas Visualization


The pandas package has built-in data visualization capabilities, implemented on top of matplotlib.

Imports

> import seaborn as sns
+ import numpy as np
+ import pandas as pd

The Data

> df1 = pd.read_csv('df1',index_col=0)
> df2 = pd.read_csv('df2')
> df1.head()
> df2.head()

                   A         B         C         D
2000-01-01  1.339091 -0.163643 -0.646443  1.041233
2000-01-02 -0.774984  0.137034 -0.882716 -2.253382
2000-01-03 -0.921037 -0.482943 -0.417100  0.478638
2000-01-04 -1.738808 -0.072973  0.056517  0.015085
2000-01-05 -0.905980  1.778576  0.381918  0.291436

          a         b         c         d
0  0.039762  0.218517  0.103423  0.957904
1  0.937288  0.041567  0.899125  0.977680
2  0.780504  0.008948  0.557808  0.797510
3  0.672717  0.247870  0.264071  0.444358
4  0.053829  0.520124  0.552264  0.190008
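The files 'df1' and 'df2' are not included here. If they are not at hand, a sketch like the following builds stand-in frames with the same column names and index types as the head() output above. The row counts (1000 and 10) and the random distributions are assumptions, so the plots will not match the original ones exactly.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the 'df1' and 'df2' CSV files.
# Column names and index types mirror the head() output; sizes and
# distributions are assumptions made for this sketch.
rng = np.random.default_rng(42)

# df1: date index, four standard-normal columns
df1 = pd.DataFrame(
    rng.standard_normal((1000, 4)),
    index=pd.date_range("2000-01-01", periods=1000, freq="D"),
    columns=["A", "B", "C", "D"],
)

# df2: default integer index, four columns with values in [0, 1)
df2 = pd.DataFrame(rng.random((10, 4)), columns=["a", "b", "c", "d"])

print(df1.head())
print(df2.head())
```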

Plot Types

There are several plot types built into pandas, most of them statistical in nature:

df.plot.area
df.plot.barh
df.plot.density
df.plot.hist
df.plot.line
df.plot.scatter
df.plot.bar
df.plot.box
df.plot.hexbin
df.plot.kde
df.plot.pie

You can also call df.plot(kind='hist'), replacing the kind argument with any of the names in the list above (e.g. 'box', 'barh', etc.).
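To check that the two calling styles are interchangeable, the sketch below draws each kind both ways on a small random frame (an assumption, not the tutorial's data). It runs off-screen with matplotlib's Agg backend and skips the kinds that need extra arguments.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Small random frame used only for this illustration
df = pd.DataFrame(np.random.default_rng(0).random((10, 4)), columns=list("abcd"))

# scatter and hexbin also need x= and y=, and pie needs y= or subplots=True,
# so only kinds that work without extra arguments are looped over here
results = {}
for kind in ["line", "bar", "barh", "hist", "box", "area"]:
    ax_kind = df.plot(kind=kind)          # generic entry point with kind=
    ax_method = getattr(df.plot, kind)()  # accessor method df.plot.<kind>()
    results[kind] = (ax_kind, ax_method)
    plt.close("all")                      # free the figures between kinds
```

Both forms return a matplotlib Axes, so the result can be customized further in either case.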

Histograms

There are a few different ways to call the plots. Each of the calls below draws a 30-bin histogram of column A:

> sns.set_style(style="darkgrid")
+ df1['A'].hist(bins=30, ec='yellow')

> df1['A'].plot(kind='hist', bins=30, 
+ ec='white', color='green')

> df1['A'].plot.hist(bins=30, color='orange',
+ ec='black')
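With bins=30, the range from the minimum to the maximum of the column is split into 30 equal-width bins. A minimal sketch using numpy.histogram on a stand-in for df1['A'] (hypothetical standard-normal data) shows the counts behind the bars:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df1['A']: 1000 standard-normal values
s = pd.Series(np.random.default_rng(1).standard_normal(1000), name="A")

# Same binning rule as the plots: 30 equal-width bins over [min, max]
counts, edges = np.histogram(s, bins=30)

print(len(counts), len(edges))  # 30 bar heights, 31 bin edges
print(counts.sum())             # every value lands in exactly one bin
```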

Area

> df2.plot.area(alpha=0.4)
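Area plots are stacked by default, so the top edge shows the running total of the columns. The sketch below compares that default with stacked=False side by side, using a random stand-in for df2 (an assumption):

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for df2: 10 rows, four columns in [0, 1)
df2 = pd.DataFrame(np.random.default_rng(2).random((10, 4)), columns=list("abcd"))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 3))

# Default: each column is stacked on top of the previous ones
df2.plot.area(alpha=0.4, ax=ax1, title="stacked=True (default)")

# stacked=False overlays the columns, each filled down to zero
df2.plot.area(alpha=0.4, stacked=False, ax=ax2, title="stacked=False")
```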

Bar Plots

> df2.plot.bar(color=['cyan', 'orange', 'pink', 'purple'])

> df2.plot.bar(stacked=True, 
+ color=['cyan', 'orange', 'pink', 'purple'])
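barh, also in the list of plot types, takes the same arguments as bar but lays the bars out horizontally. A sketch on a random stand-in for df2 (an assumption):

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for df2: 10 rows, four columns in [0, 1)
df2 = pd.DataFrame(np.random.default_rng(3).random((10, 4)), columns=list("abcd"))

# One horizontal stacked bar per row, one colored segment per column
ax = df2.plot.barh(stacked=True, color=["cyan", "orange", "pink", "purple"])
```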

Line Plots

> df1.reset_index().plot.line(x='index',y='B',
+               figsize=(12,3),lw=1,color='purple')
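Plotting a single column as a Series uses the index as the x-axis, so reset_index() and x='index' are not needed. The sketch assumes df1 has a DatetimeIndex; when reading the CSV, pd.read_csv('df1', index_col=0, parse_dates=True) parses the index as dates. The random data here is a stand-in.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for df1 with a DatetimeIndex
df1 = pd.DataFrame(
    np.random.default_rng(4).standard_normal((1000, 4)),
    index=pd.date_range("2000-01-01", periods=1000, freq="D"),
    columns=list("ABCD"),
)

# The Series index (the dates) becomes the x-axis automatically
ax = df1["B"].plot.line(figsize=(12, 3), lw=1, color="purple")
```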

Scatter Plots

> df1.plot.scatter(x='A',y='B')

Use c to color the points by the values of another column, and cmap to choose the colormap. For all the colormaps, see: http://matplotlib.org/users/colormaps.html

> df1.plot.scatter(x='A',y='B',c='C',
+       cmap='coolwarm')

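Or use s to set the marker size from another column. Here s is given an array (a Series); older pandas versions do not accept a column name for s. Marker sizes must be non-negative, and C contains negative values, so take the absolute value first:

> df1.plot.scatter(x='A',y='B',
+       s=df1['C'].abs()*100)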
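Taking the absolute value makes large negative and large positive values look alike. Another option, sketched here, min-max scales the column onto a fixed positive size range (10 to 200 is an arbitrary choice) and combines it with c and cmap in one call. The data is a random stand-in for df1.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for df1: 100 rows of standard-normal values
df1 = pd.DataFrame(np.random.default_rng(5).standard_normal((100, 4)), columns=list("ABCD"))

# Min-max scale C onto 10..200 so every marker size is positive
c = df1["C"]
sizes = 10 + 190 * (c - c.min()) / (c.max() - c.min())

# Color and size both encode C
ax = df1.plot.scatter(x="A", y="B", c="C", cmap="coolwarm", s=sizes)
```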

Box Plots

> df2.plot.box(patch_artist=True)
+ # For one box per group use df.boxplot(by=...); recent pandas also accepts by= in plot.box
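Grouped box plots need a column to group on, which df2 does not have. The sketch below adds a hypothetical grouping column g and uses DataFrame.boxplot with column= and by= to draw one box of a per group:

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)
df = pd.DataFrame({
    "a": rng.random(30),
    # Hypothetical grouping column, not part of df2
    "g": np.repeat(["x", "y", "z"], 10),
})

# One box of column 'a' for each distinct value of 'g'
df.boxplot(column="a", by="g")
```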

Hexagonal Bin Plot

Useful for bivariate data, and an alternative to a scatter plot when many points overlap:

> df = pd.DataFrame(np.random.randn(1000, 2), 
+     columns=['a', 'b'])
+ df.plot.hexbin(x='a',y='b',
+       gridsize=25,cmap='Oranges')
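By default each hexagon is colored by the number of points it contains. Passing C= colors it instead by reduce_C_function applied to another column's values inside the hexagon (the default reduction is the mean). The column z and the choice of np.max are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# 'z' is a hypothetical third column used as the value to aggregate
df = pd.DataFrame(np.random.default_rng(7).standard_normal((1000, 3)), columns=["a", "b", "z"])

# Each hexagon shows the maximum 'z' of the points that fall inside it
ax = df.plot.hexbin(x="a", y="b", C="z", reduce_C_function=np.max,
                    gridsize=25, cmap="Oranges")
```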

KDE plot

> df2['a'].plot.kde()

> df2.plot.density()
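plot.density is another name for plot.kde, and both need SciPy installed. The bw_method parameter sets the kernel bandwidth; smaller values follow the data more closely, larger values smooth more. A sketch comparing two bandwidths on a random stand-in for df2['a'] (an assumption):

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for df2['a']
s = pd.Series(np.random.default_rng(8).random(100), name="a")

# Two KDE curves on the same axes: narrow vs. wide bandwidth
ax = s.plot.kde(bw_method=0.1, label="bw_method=0.1")
s.plot.kde(bw_method=0.5, ax=ax, label="bw_method=0.5")
ax.legend()
```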

Pandas Built-in Data Visualization

Python code in R Markdown

Paul Jozefek

2020-09-16