Dash

Dash is a productive Python framework for building web applications. Written on top of Flask, Plotly.js, and React.js, Dash is ideal for building data visualization apps with highly custom user interfaces in pure Python. It’s particularly suited for anyone who works with data in Python.

pip install dash==0.21.0  # The core dash backend
pip install dash-renderer==0.11.3  # The dash front-end
pip install dash-html-components==0.9.0  # HTML components
pip install dash-core-components==0.21.0  # Supercharged components
pip install plotly --upgrade  # Plotly graphing library used in examples

When a Dash application runs on a remote server (for example, on GCP or AWS), it cannot be reached at the default local address (http://127.0.0.1:8050/). Passing the arguments below to run_server() makes the application listen on all network interfaces, so it is served on port 8050 of the virtual machine’s public IP. Note: TCP ingress to port 8050 must be allowed by the firewall.

if __name__ == '__main__': 
    app.run_server(port=8050, host='0.0.0.0')

Overview

Enterococcus is a fecal-indicating bacterium that lives in the intestines of humans and other warm-blooded animals. Enterococcus (“Entero”) counts are useful as a water quality indicator because of their abundance in human sewage, their correlation with many human pathogens, and their low abundance in sewage-free environments. The United States Environmental Protection Agency (EPA) reports Entero counts as colonies (or cells) per 100 mL of water. The organization Riverkeeper has based its assessment of acceptable water quality on the 2012 Federal Recreational Water Quality Criteria from the US EPA. Unacceptable water is based on an illness rate of 32 per 1000 swimmers. The federal standard for unacceptable water quality is a single sample value greater than 110 Enterococcus/100 mL, or five or more samples with a geometric mean (a weighted average) greater than 30 Enterococcus/100 mL. Enterococcus levels in the Hudson River can be found here. The data have not been cleaned and need to be cleaned. Each question should be a separate Dash app. A single app.py for each will be sufficient.
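The two criteria quoted above can be written as a small check. This is a minimal sketch rather than EPA or Riverkeeper code: the helper name is_unacceptable is hypothetical, the thresholds are the ones stated in the paragraph above, and applying the single-sample rule to the most recent count is an assumption made for illustration.

```python
from statistics import geometric_mean  # standard library, Python 3.8+

SINGLE_SAMPLE_LIMIT = 110    # Entero per 100 mL, as quoted in the text
GEOMETRIC_MEAN_LIMIT = 30    # Entero per 100 mL, as quoted in the text
MIN_SAMPLES_FOR_MEAN = 5     # the geometric-mean rule needs five or more samples


def is_unacceptable(counts):
    """Hypothetical helper: True if a site's counts fail either criterion.

    counts: positive Entero counts for one site, oldest first.
    Assumption: the single-sample rule is applied to the most recent count.
    """
    if counts[-1] > SINGLE_SAMPLE_LIMIT:
        return True
    if len(counts) >= MIN_SAMPLES_FOR_MEAN:
        return geometric_mean(counts) > GEOMETRIC_MEAN_LIMIT
    return False


print(is_unacceptable([10, 10, 10, 10, 120]))  # latest sample exceeds 110
print(is_unacceptable([10, 20, 10, 20, 10]))   # both criteria satisfied
```

With fewer than five samples the sketch falls back to the single-sample rule alone, which is one possible reading of the standard.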

Download Data

wget https://raw.githubusercontent.com/jzuniga123/SPS/master/DATA%20608/riverkeeper_data_2013.csv

Import Data

import pandas as pd
df = pd.read_csv("riverkeeper_data_2013.csv", parse_dates=['Date'])
df.head()

Question 1

You’re a civic hacker and kayak enthusiast who just came across this dataset. You’d like to create an app that recommends launch sites to users. Ideally an app like this will use live data to give current recommendations, but you’re still in the testing phase. Create a prototype that allows a user to pick a date, and will give its recommendations for that particular date. Think about your recommendations. You’re given federal guidelines above, but you may still need to make some assumptions about which sites to recommend. Consider your audience. Users will appreciate some information explaining why a particular site is flagged as unsafe, but they’re not scientists.
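One way to serve users who are not scientists is to turn the pass/fail checks into a short sentence. The sketch below is illustrative only: explain is a hypothetical helper, the wording is invented for this example, and the limits are passed in as arguments so they can follow whichever guideline the app adopts.

```python
def explain(site, latest_count, geo_mean, single_limit, mean_limit):
    """Build a plain-language recommendation for one site (hypothetical helper)."""
    single_ok = latest_count <= single_limit
    mean_ok = geo_mean <= mean_limit
    if single_ok and mean_ok:
        return f"{site}: OK to launch. Recent bacteria levels are within guidelines."
    # Collect one reason per failed check so the user sees why the site is flagged
    reasons = []
    if not single_ok:
        reasons.append(f"the latest sample ({latest_count}) is above {single_limit}")
    if not mean_ok:
        reasons.append(f"the recent average ({geo_mean:.0f}) is above {mean_limit}")
    return f"{site}: not recommended, because " + " and ".join(reasons) + "."


print(explain("Example Pier", 200, 45.2, 110, 30))
```

Text like this could be shown next to a chart or in a hover label.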

Data Dictionary

Entero Count: Enterococcus (“Entero”) is a fecal indicating bacterium that lives in the intestines of humans and other warm-blooded animals.

Days Total Rain: The combined rainfall for the day of sampling, prior day, two days prior and three days prior. More than 1/4 inch is considered a “wet weather” sample.

Number of Samples: Total number of samples included in these calculations.

Geometric Mean: A measure of central tendency (a weighted average) used by NYS DEC and the US EPA to assess water quality. The geometric mean is defined as the nth root (where n is the number of samples) of the product of the Enterococcus measurements. A geometric mean over 30 fails the EPA criteria for safe primary contact.
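The definitions above translate directly into code. The following is a minimal sketch using only the standard library; the function names and the sample counts are made up for illustration.

```python
import math


def geometric_mean(counts):
    """nth root of the product of n positive measurements (the definition)."""
    return math.prod(counts) ** (1.0 / len(counts))


def geometric_mean_via_logs(counts):
    """Same quantity as exp(mean(log(x))); avoids huge intermediate products."""
    return math.exp(sum(math.log(c) for c in counts) / len(counts))


def is_wet_weather(four_day_rain_inches):
    """More than 1/4 inch over the four-day window is a wet-weather sample."""
    return four_day_rain_inches > 0.25


counts = [10, 100, 1000]                 # made-up Entero counts
print(geometric_mean(counts))            # close to 100, up to floating-point rounding
print(geometric_mean_via_logs(counts))   # agrees with the line above
print(sum(counts) / len(counts))         # arithmetic mean, 370.0
print(is_wet_weather(0.3))               # True
```

The geometric mean is far less sensitive to the single large count than the arithmetic mean, which is one reason it suits bacterial counts that span orders of magnitude.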

Geometric Mean

The geometric mean cannot be determined from the published figures. The Entero Count \(\textrm{EC}_{ i }\) from each of the \(n\) samples is needed to calculate the product \(\prod _{ i=1 }^{ n }{ \textrm{EC}_{ i } }\), which is then reduced to its \(n\)th root. The website gives the total Entero Count \(\sum _{ i=1 }^{ n }{ \textrm{EC}_{ i } }\), the sample size \(n\), and the geometric mean \({ \mu }_{ g }\), but nowhere are the individual \(\textrm{EC}_{ i }\) given. Therefore, for purposes of this visualization exercise, the geometric mean of the five most recent observations at each site (the current sample and up to four preceding ones) is used.
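A small numerical illustration of why the sum and the sample size are not enough: the two made-up sample sets below share the same \(n\) and the same total, yet their geometric means differ. The values are invented for this sketch and do not come from the dataset.

```python
from statistics import geometric_mean  # standard library, Python 3.8+

# Two made-up sets of Entero counts with the same n and the same total
site_a = [10, 10, 100]
site_b = [40, 40, 40]

assert len(site_a) == len(site_b) and sum(site_a) == sum(site_b)

gm_a = geometric_mean(site_a)   # cube root of 10,000, about 21.5
gm_b = geometric_mean(site_b)   # 40, up to floating-point rounding

print(round(gm_a, 1), round(gm_b, 1))
```

One set falls under the threshold of 30 and the other over it, so the individual counts really are required.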

Pre-process Data

import pandas as pd
df = pd.read_csv("riverkeeper_data_2013.csv", parse_dates=['Date'])
df.dtypes

Site                        object
Date                datetime64[ns]
EnteroCount                 object
FourDayRainTotal           float64
SampleCount                  int64
dtype: object

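from scipy.stats.mstats import gmean
# Index by site and date so each site's samples are in chronological order
df = df.set_index(['Site','Date'], drop = True).sort_index()
# Strip any non-digit characters before converting the counts to integers
df['EnteroCount'] = df['EnteroCount'].replace(r'[^\d]', '', regex=True).astype(int)
# Rolling geometric mean over each site's five most recent samples;
# group_keys=False keeps the original (Site, Date) index so the result aligns
df['GeometricMean'] = df.EnteroCount.groupby(level='Site', group_keys=False) \
    .apply(lambda x: x.rolling(5, min_periods=1).apply(gmean))
# Note: the single-sample check here uses 60, not the 110 quoted in the Overview
df['OK_EPA'] = df['EnteroCount'] <= 60
df['OK_RK'] = df['GeometricMean'] <= 30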
df.head()

import emoji
df['EPA'] = (df['EnteroCount'] <= 60).astype(int)
df['RK'] = (df['GeometricMean'] <= 30).astype(int)
df['Both'] = df['EPA'] + df['RK']
img1 = emoji.emojize(':poop:', use_aliases=True)
img2 = emoji.emojize(':droplet:', use_aliases=True)
img3 = emoji.emojize(':skull:', use_aliases=True)
img4 = emoji.emojize(':see_no_evil:', use_aliases=True)
img5 = emoji.emojize(':thumbsup:', use_aliases=True)
df['EPA'] = df['EPA'].replace([0,1], [img1, img2])
df['RK'] = df['RK'].replace([0,1], [img1, img2])
df['Both'] = df['Both'].replace([0,1,2], [img3, img4, img5])
df.head(12)

Plotly Setup

import plotly.offline as offline
import plotly.graph_objs as go
offline.init_notebook_mode(connected=True)
trace = go.Scatter(
    x = df.loc['125th St. Pier']['EnteroCount'].index,
    y = df.loc['125th St. Pier']['EnteroCount'],
    mode = 'lines+markers',
    text = df.loc['125th St. Pier']['Both']
    )
layout = go.Layout(
    title='Enterococcus Time Series',
    yaxis = dict(title = 'Enterococcus'),
    xaxis=dict(
        title = 'Date Sampled',
        rangeselector=dict(
            buttons=list([
                dict(count=6, label='6M', step='month', stepmode='backward'),
                dict(count=1, label='YTD', step='year', stepmode='todate'),
                dict(count=1, label='1Y', step='year', stepmode='backward'),
                dict(label='All', step='all')
            ])
        ),
        rangeslider=dict(),
        type='date'
    ),
    annotations=[dict(
            x = '2013-10-16',
            y = df.loc['125th St. Pier'].loc['2013-10-16']['EnteroCount'],
            text = 'EPA Rating: ' + df.loc['125th St. Pier'].loc['2013-10-16']['EPA'] + \
                '<br>Riverkeeper: ' + df.loc['125th St. Pier'].loc['2013-10-16']['RK'] + \
                '<br>Recommendation: ' + df.loc['125th St. Pier'].loc['2013-10-16']['Both'],
            textangle = 0,
            ax = 0,
            ay = -75,
            font = dict(color = "black", size = 12)
    )]
)
fig = go.Figure(data = [trace], layout = layout)
offline.iplot(fig)

Dash Syntax

# -*- coding: utf-8 -*-
# DASH LIBRARIES
import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
# NON-DASH LIBRARIES 
import emoji, pandas as pd
from scipy.stats.mstats import gmean
import plotly.offline as offline
import plotly.graph_objs as go
df = pd.read_csv("riverkeeper_data_2013.csv", parse_dates=['Date'])
df = df.set_index(['Site','Date'], drop = True).sort_index()
df['EnteroCount'] = df['EnteroCount'].replace(r'[^\d]', '', regex=True).astype(int)
# group_keys=False keeps the original (Site, Date) index so the result aligns
df['GeometricMean'] = df.EnteroCount.groupby(level='Site', group_keys=False) \
    .apply(lambda x: x.rolling(5, min_periods=1).apply(gmean))
df['EPA'] = (df['EnteroCount'] <= 60).astype(int)
df['RK'] = (df['GeometricMean'] <= 30).astype(int)
df['Both'] = df['EPA'] + df['RK']
img1 = emoji.emojize(':poop:', use_aliases=True)
img2 = emoji.emojize(':droplet:', use_aliases=True)
img3 = emoji.emojize(':skull:', use_aliases=True)
img4 = emoji.emojize(':see_no_evil:', use_aliases=True)
img5 = emoji.emojize(':thumbsup:', use_aliases=True)
df['EPA'] = df['EPA'].replace([0,1], [img1, img2])
df['RK'] = df['RK'].replace([0,1], [img1, img2])
df['Both'] = df['Both'].replace([0,1,2], [img3, img4, img5])
sites = df.index.get_level_values(0).unique()
app = dash.Dash()
app.layout = html.Div([
    html.H1('Hudson River'),
    html.H2('Site'),
    dcc.Dropdown(
        id='dropdown-site',
        options=[{'label': i, 'value': i} for i in sites],
        placeholder="Select Site",
        clearable=False,
        value='125th St. Pier'
    ),
    html.H2('Date'),
    dcc.Dropdown(id='dropdown-date', 
                 value='2013-10-16'),
    html.H2('Findings'),
    dcc.Graph(id='graph-with-slider')
])
@app.callback(Output('dropdown-date', 'options'),
              [Input('dropdown-site', 'value')])
def update_category_options(site):
    dates = df.loc[site].index.get_level_values(0).unique().sort_values(ascending=False)
    return [{'label': k, 'value': k} for k in dates]
@app.callback(Output('graph-with-slider', 'figure'),
              [Input('dropdown-site', 'value'),
               Input('dropdown-date', 'value')])
def update_output(input1, input2):
    trace = go.Scatter(
        x = df.loc[input1]['EnteroCount'].index,
        y = df.loc[input1]['EnteroCount'],
        mode = 'lines+markers',
        text = df.loc[input1]['Both']
        )
    layout = go.Layout(
        title='Enterococcus Levels',
        yaxis = dict(title = 'Enterococcus'),
        xaxis=dict(
            title = 'Date Sampled',
            rangeselector=dict(
                buttons=list([
                    dict(count=6, label='6M', step='month', stepmode='backward'),
                    dict(count=1, label='YTD', step='year', stepmode='todate'),
                    dict(count=1, label='1Y', step='year', stepmode='backward'),
                    dict(label='All', step='all')
                ])
            ),
            rangeslider=dict(),
            type='date'
        ),
        annotations=[dict(
                x = input2,
                y = df.loc[input1].loc[input2]['EnteroCount'],
                text = 'EPA Rating: ' + df.loc[input1].loc[input2]['EPA'] + \
                    '<br>Riverkeeper: ' + df.loc[input1].loc[input2]['RK'] + \
                    '<br>Recommendation: ' + df.loc[input1].loc[input2]['Both'],
                textangle = 0,
                ax = 0,
                ay = -75,
                font = dict(color = "black", size = 12)
        )]
    )
    return {'data': [trace], 'layout': layout}
if __name__ == '__main__':
    app.run_server(debug=True, port=8050, host='0.0.0.0')

Dash Output

[Figure: app output for three cases: Sources Agree (Good), Sources Disagree, Sources Agree (Bad)]

Question 2

This time you are building an app for scientists. You’re a public health researcher analyzing this data. You would like to know if there’s a relationship between the amount of rain and water quality. Create an exploratory app that allows other researchers to pick different sites and compare this relationship.

Kendall Correlation

The most commonly used measure of correlation is Pearson correlation (\(r\)). Pearson correlation measures the strength of a linear relationship and is most reliable when the variables are approximately normal and homoskedastic. Applying the Pearson correlation formula to variables that are not normal, linear, or homoskedastic (inherently or through transformation) produces misleading results. Spearman correlation (\(\rho\)) and Kendall correlation (\(\tau\)) are non-parametric measures of correlation based on ranks, so they capture monotonic relationships. Spearman correlation is more computationally efficient than Kendall correlation, but less robust.
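The contrast between Pearson and Kendall correlation can be seen on a synthetic relationship that is monotonic but not linear. The sketch below uses only the standard library; the data are invented, and kendall_tau_a implements the simple tau-a variant without tie correction, which differs from the tie-adjusted tau-b that SciPy computes by default.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def kendall_tau_a(x, y):
    """Tau-a: (concordant pairs - discordant pairs) / total pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


rain = [0.0, 0.1, 0.2, 0.5, 1.0, 2.0]          # synthetic rainfall (inches)
entero = [math.exp(3 * r) for r in rain]       # synthetic, monotonic but nonlinear

print(round(pearson_r(rain, entero), 2))       # noticeably below 1
print(kendall_tau_a(rain, entero))             # 1.0, every pair is concordant
```

Because every pair of observations is ordered the same way in both variables, the rank-based measure reports a perfect association while Pearson understates it.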

Pre-process Data

import pandas as pd
df = pd.read_csv("riverkeeper_data_2013.csv", parse_dates=['Date'])
df = df.set_index(['Site','Date'], drop = True).sort_index()
df['EnteroCount'] = df['EnteroCount'].replace(r'[^\d]', '', regex=True).astype(int)
df = df.drop('SampleCount', axis=1)
df.head()

EntroRain = df.groupby('Site')[['EnteroCount','FourDayRainTotal']] \
    .corr('kendall').iloc[::2] \
    .reset_index(1, drop=True) \
    .drop('EnteroCount', axis=1) \
    .rename(columns={'FourDayRainTotal': 'Correlation'})
EntroRain.head()

Plotly Setup

Scatter Plot

import plotly.offline as offline
import plotly.graph_objs as go
import numpy as np
offline.init_notebook_mode(connected=True)
trace0 = go.Scatter(
    x=df.drop(['125th St. Pier'])['FourDayRainTotal'],
    y=df.drop(['125th St. Pier'])['EnteroCount'],
    mode= 'markers',
    hoverinfo='none',
    marker=dict(
        size = 10,
        color = 'rgba(204,204,204,1)'
    )
)
trace1 = go.Scatter(
    x=df.loc['125th St. Pier']['FourDayRainTotal'],
    y=df.loc['125th St. Pier']['EnteroCount'],
    mode='markers',
    name='125th St. Pier',
    marker=dict(
        size = 10,
        color = 'rgba(222,45,38,0.8)'
    )
)
layout = go.Layout(
    title='Enterococcus-Rain Relationship',
    yaxis = dict(title = 'Enterococcus (log)', 
                 type = "log",
                 showticklabels=False),
    xaxis = dict(title = 'Four Day Rain Total (inches)'),
    showlegend=False
)
fig = go.Figure(data = [trace0, trace1], layout = layout)
offline.iplot(fig)

Correlation Bar

import plotly.offline as offline
import plotly.graph_objs as go
import numpy as np
offline.init_notebook_mode(connected=True)
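# 20 bar lengths from -1.0 to 0.9 in steps of 0.1, each with its own hue
bins = np.arange(-1.0, 1.0, 0.1)
c = ['hsl('+str(h)+',50%'+',50%)' for h in np.linspace(0, 360, len(bins))]
traceData = [] # one horizontal bar trace per bin
for i in range(0, len(bins)):
    # Order the traces from -1.0 up to 0, then from 0.9 down to 0.1,
    # so that longer bars are added before the shorter ones on each side
    k = i if (i <= np.floor(len(bins)/2).astype(int)) else (len(bins) - i + 10)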
    trace_iter = go.Bar(
        x = [bins[k]],
        y = ['Correlation'],
        base = 0,
        hoverinfo = 'none',
        orientation = 'h',
        marker=dict(
            color=c[k]
        )
    )
    traceData.append(trace_iter)     
trace = go.Scatter(
    x = EntroRain.loc['125th St. Pier'],
    y = ['Correlation'],
    mode = 'markers',
    name='125th St. Pier',
    marker=dict(
        symbol = 'star-diamond-dot',
        size = 100,
        color = 'yellow'
    ),
)
layout = go.Layout(
    yaxis = dict(showticklabels=False),
    xaxis = dict(zeroline=False),
    showlegend=False,
    barmode = 'stack'
)
traceData.append(trace)
fig = go.Figure(data=traceData, layout=layout)
offline.iplot(fig)

Plot with Inset

import plotly.offline as offline
import plotly.graph_objs as go
import numpy as np
offline.init_notebook_mode(connected=True)
######### MAIN PLOT
trace0 = go.Scatter(
    x=df.drop(['125th St. Pier'])['FourDayRainTotal'],
    y=df.drop(['125th St. Pier'])['EnteroCount'],
    mode= 'markers',
    hoverinfo='none',
    marker=dict(
        size = 10,
        color = 'rgba(204,204,204,1)'
    )
)
trace1 = go.Scatter(
    x=df.loc['125th St. Pier']['FourDayRainTotal'],
    y=df.loc['125th St. Pier']['EnteroCount'],
    mode='markers',
    name='125th St. Pier',
    marker=dict(
        size = 10,
        color = 'rgba(222,45,38,0.8)'
    )
)
############ INSET PLOT
bins = np.arange(-1.0, 1.0, 0.1)
c = ['hsl('+str(h)+',50%'+',50%)' for h in np.linspace(0, 360, len(bins))]
traceData = [] # info for traces
for i in range(0, len(bins)):
    k = i if (i <= np.floor(len(bins)/2).astype(int)) else (len(bins) - i + 10)
    trace_iter = go.Bar(
        x = [bins[k]],
        y = ['Correlation'],
        base = 0,
        hoverinfo = 'none',
        orientation = 'h',
        marker=dict(
            color=c[k]
        ),
        xaxis='x2',
        yaxis='y2'
    )
    traceData.append(trace_iter)     
trace2 = go.Scatter(
    x = EntroRain.loc['125th St. Pier'],
    y = ['Correlation'],
    mode = 'markers',
    name='125th St. Pier',
    marker=dict(
        symbol = 'star-diamond-dot',
        size = 15,
        color = 'yellow'
    ),
    xaxis='x2',
    yaxis='y2'
)
traceData.append(trace0)
traceData.append(trace1)
traceData.append(trace2)
layout = go.Layout(
    title='Enterococcus-Rain Relationship',
    yaxis = dict(title = 'Enterococcus (log)', 
                 type = "log",
                 showticklabels=False),
    xaxis = dict(title = 'Four Day Rain Total (inches)'),
    xaxis2=dict(
        zeroline=False,
        domain=[0.75, 0.95],
        anchor='y2'
    ),
    yaxis2=dict(
        showticklabels=False,
        domain=[0.85, 0.95],
        anchor='x2'
    ),
    showlegend=False,
    barmode = 'stack'
)
fig = go.Figure(data = traceData, layout = layout)
offline.iplot(fig)

Dash Syntax

# -*- coding: utf-8 -*-
# DASH LIBRARIES
import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
# NON-DASH LIBRARIES 
import numpy as np, pandas as pd
import plotly.offline as offline
import plotly.graph_objs as go
df = pd.read_csv("riverkeeper_data_2013.csv", parse_dates=['Date'])
df = df.set_index(['Site','Date'], drop = True).sort_index()
df['EnteroCount'] = df['EnteroCount'].replace(r'[^\d]', '', regex=True).astype(int)
df = df.drop('SampleCount', axis=1)
EntroRain = df.groupby('Site')[['EnteroCount','FourDayRainTotal']] \
    .corr('kendall').iloc[::2] \
    .reset_index(1, drop=True) \
    .drop('EnteroCount', axis=1) \
    .rename(columns={'FourDayRainTotal': 'Correlation'})
sites = df.index.get_level_values(0).unique()
app = dash.Dash()
app.layout = html.Div([
    html.H1('Hudson River'),
    html.H2('Site'),
    dcc.Dropdown(
        id='dropdown-site',
        options=[{'label': i, 'value': i} for i in sites],
        placeholder="Select Site",
        clearable=False,
        value='125th St. Pier'
    ),
    html.H2('Correlations'),
    dcc.Graph(id='graph-with-inlet'),
])
@app.callback(Output('graph-with-inlet', 'figure'),
              [Input('dropdown-site', 'value')])
def update_output(input1):
    ######### MAIN PLOT
    trace0 = go.Scatter(
        x = df.drop([input1])['FourDayRainTotal'],
        y = df.drop([input1])['EnteroCount'],
        mode = 'markers',
        hoverinfo = 'none',
        marker = dict(
            size = 10,
            color = 'rgba(204,204,204,1)'
        )
    )
    trace1 = go.Scatter(
        x = df.loc[input1]['FourDayRainTotal'],
        y = df.loc[input1]['EnteroCount'],
        mode = 'markers',
        name = input1,
        marker = dict(
            size = 10,
            color = 'rgba(222,45,38,0.8)'
        )
    )
    ############ SUB-PLOT
    bins = np.arange(-1.0, 1.0, 0.1)
    c = ['hsl('+str(h)+',50%'+',50%)' for h in np.linspace(0, 360, len(bins))]
    traceData = [] # info for traces
    for i in range(0, len(bins)):
        k = i if (i <= np.floor(len(bins)/2).astype(int)) else (len(bins) - i + 10)
        trace_iter = go.Bar(
            x = [bins[k]],
            y = ['Correlation'],
            base = 0,
            hoverinfo = 'none',
            orientation = 'h',
            marker=dict(
                color=c[k]
            ),
            xaxis='x2',
            yaxis='y2'
        )
        traceData.append(trace_iter)     
    trace2 = go.Scatter(
        x = EntroRain.loc[input1],
        y = ['Correlation'],
        mode = 'markers',
        name = input1,
        marker=dict(
            symbol = 'star-diamond-dot',
            size = 15,
            color = 'yellow'
        ),
        xaxis='x2',
        yaxis='y2'
    )
    traceData.append(trace0)
    traceData.append(trace1)
    traceData.append(trace2)
    layout = go.Layout(
        title='Enterococcus-Rain Relationship',
        yaxis = dict(title = 'Enterococcus (log)',
                     type = "log",
                     showticklabels=False),
        xaxis = dict(title = 'Four Day Rain Total (inches)'),
        xaxis2=dict(
            zeroline=False,
            domain=[0.75, 0.95],
            anchor='y2'
        ),
        yaxis2=dict(
            showticklabels=False,
            domain=[0.85, 0.95],
            anchor='x2'
        ),
        showlegend=False,
        barmode = 'stack'
    )
    return {'data': traceData, 'layout': layout}
if __name__ == '__main__':
    app.run_server(debug=True, port=8050, host='0.0.0.0')

Dash Output

Multi-Page Apps

Dash renders web applications as a “single-page app”. This means that the application does not completely reload when the user navigates the application, making browsing very fast. Two components aid page navigation: dash_core_components.Location, which represents the location bar in your web browser through its pathname property, and dash_core_components.Link, which creates links that update the pathname without reloading the page.

import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
app = dash.Dash()
app.config.suppress_callback_exceptions = True
###############################################################
# URL BAR
###############################################################
app.layout = html.Div([
    dcc.Location(id='url', refresh=False),
    html.Div(id='page-content'),
])
app.css.append_css({
    'external_url': 'https://codepen.io/chriddyp/pen/bWLwgP.css'
})
###############################################################
# HOME PAGE
###############################################################
markdown_Q1 = 'Question 1 text'
markdown_Q2 = 'Question 2 text'
index_page = html.Div([
    html.H1('Interactive Data Visualizations with Dash'),
    html.H2('Jose Zuniga'),
    dcc.Markdown(children=markdown_Q1),
    dcc.Markdown(children=markdown_Q2),
    dcc.Link('Go to Question 1 Solution', href='/app-1'),
    html.Br(),
    dcc.Link('Go to Question 2 Solution', href='/app-2'),
])
@app.callback(dash.dependencies.Output('page-content', 'children'),
              [dash.dependencies.Input('url', 'pathname')])
def display_page(pathname):
    if pathname == '/app-1':
        return page_1_layout
    elif pathname == '/app-2':
        return page_2_layout
    else:
        return index_page
###############################################################
# APPLICATION 1
###############################################################
# 
# insert code with unique variables and functions
# 
page_1_layout = html.Div([
    dcc.Link('Go to Question 2 Solution', href='/app-2'),
    html.Br(),
    dcc.Link('Go back to Home Page', href='/'),
    # 
    # more code with unique variables and functions
    # 
])
# 
# more code with unique variables and functions
# 
###############################################################
# APPLICATION 2
###############################################################
# 
# insert code with unique variables and functions
# 
page_2_layout = html.Div([
    dcc.Link('Go to Question 1 Solution', href='/app-1'),
    html.Br(),
    dcc.Link('Go back to Home Page', href='/'),
    # 
    # more code with unique variables and functions
    # 
])  
# 
# more code with unique variables and functions
# 
###############################################################
# RUN SERVER
###############################################################
if __name__ == '__main__':
    app.run_server(debug=False, port=8050, host='0.0.0.0')