A2 Dashboard Workflow

Workflow for building and deploying interactive visualizations

Let's say you want to make it easy to explore some dataset, i.e., to:

  • Make a visualization of the data
  • Maybe add some custom widgets to see the effects of some variables
  • Then deploy the result as a web app.

You can definitely do that in Python, but you would expect to:

  • Spend days of effort to get some initial prototype working in a Jupyter notebook, every time
  • Work hard to tame the resulting opaque mishmash of domain-specific, widget, and plotting code
  • Start over nearly from scratch whenever you need to:
    • Deploy in a standalone server
    • Visualize different aspects of your data
    • Scale up to larger datasets (more than about 100K points)

Step-by-step data-science workflow

Here we'll show a simple, flexible, powerful, step-by-step workflow, explaining which open-source tools solve each of the problems involved:

  • Step 1: Get some data
  • Step 2: Prototype a plot in a notebook
  • Step 3: Model your domain
  • Step 4: Get a widget-based UI for free
  • Step 5: Link your domain model to your visualization
  • Step 6: Widgets now control your interactive plots
  • Step 7: Deploy your dashboard
In [1]:
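# HoloViews/GeoViews for plotting, param for the domain model, Dask for parallel dataframes, Cartopy for projections
import holoviews as hv, geoviews as gv, param, dask.dataframe as dd, cartopy.crs as crs

from colorcet import cm, fire                         # colormaps from colorcet
from holoviews.operation import decimate              # randomly subsample points in view when there are too many
from holoviews.operation.datashader import datashade  # render large datasets as images via Datashader
from holoviews.streams import RangeXY                 # stream of the plot's current x/y ranges
from geoviews.tile_sources import EsriImagery         # satellite map tiles for a background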
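The `datashade` operation imported above is what lets this workflow cope with millions of points: Datashader aggregates the points into a fixed-size grid and then maps the per-cell counts to colors (for example with the `fire` colormap). The aggregation step alone can be sketched in plain Python as below. `bin_counts` is a hypothetical helper for illustration only, not part of Datashader, which does this job far faster. The `x_range`/`y_range` arguments stand in for the current viewport, the kind of information the `RangeXY` stream reports as you zoom and pan.

```python
# A minimal sketch of the aggregation idea behind datashade (NOT Datashader's
# actual implementation): count how many points fall into each cell of a grid.
def bin_counts(xs, ys, x_range, y_range, width, height):
    """Return a height-by-width grid of point counts (row 0 = lowest y)."""
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Skip points outside the requested viewport
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue
        # Map each coordinate to a column/row index, clamping the upper edge
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        grid[row][col] += 1
    return grid


# Made-up coordinates; the last point lies outside the viewport and is ignored
xs = [0.1, 0.2, 0.9, 0.95, 1.5]
ys = [0.1, 0.15, 0.9, 0.8, 0.5]
grid = bin_counts(xs, ys, x_range=(0, 1), y_range=(0, 1), width=2, height=2)
print(grid)  # [[2, 0], [0, 2]]
```

Whatever the number of input points, the result is a width-by-height grid, which is why the image sent to the browser stays the same size as the dataset grows.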

Step 1: Get some data

  • Here we'll use a subset of the often-studied NYC Taxi dataset
  • About 12 million GPS locations from taxis (the copy loaded in the run shown below has only 50,001 rows)
  • Stored in the efficient Parquet format for easy access
  • Loaded into a Dask dataframe for multi-core (and if needed, out-of-core or distributed) computation
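A Dask dataframe is made up of many smaller pandas dataframes (partitions). A reduction such as `df.trip_distance.mean().compute()` runs a small task on each partition in parallel and then combines the partial results. The sketch below shows only that partition-and-combine idea, using the standard library's thread pool; `partial_sum_count` and `partitioned_mean` are hypothetical helpers, not Dask's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum_count(partition):
    """Per-partition step: return (sum, count) for one chunk of values."""
    return sum(partition), len(partition)


def partitioned_mean(partitions, max_workers=4):
    """Run the per-partition step in parallel, then combine the partials."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(partial_sum_count, partitions))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


# Two partitions of different sizes
print(partitioned_mean([[1, 2], [3, 4, 5]]))  # 3.0
```

Each partition returns a (sum, count) pair rather than its own mean, because averaging per-partition means gives the wrong answer when partitions differ in size: for the example above it would yield (1.5 + 4.0) / 2 = 2.75 instead of 3.0.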
In [2]:
# Read the Parquet file and persist it in memory so later steps don't re-read it from disk
%time df = dd.read_parquet('../data/nyc_taxi_wide.parq').persist()
print(len(df))
df.head(2)
CPU times: user 1.51 s, sys: 24 ms, total: 1.53 s
Wall time: 1.53 s
50001
Out[2]:
   tpep_pickup_datetime  tpep_dropoff_datetime  passenger_count  trip_distance    pickup_x   pickup_y   dropoff_x  dropoff_y  fare_amount  tip_amount  dropoff_hour  pickup_hour
0   2015-01-15 19:05:39    2015-01-15 19:23:42                1           1.59  -8236963.0  4975552.5  -8234835.5  4975627.0         12.0        3.25            19           19
1   2015-01-10 20:33:38    2015-01-10 20:53:28                1           3.30  -8237826.0  4971752.5  -8237020.5  4976875.0         14.5        2.00            20           20
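The `pickup_x`/`pickup_y` and `dropoff_x`/`dropoff_y` columns are not longitudes and latitudes. Their magnitudes are consistent with Web Mercator (EPSG:3857) coordinates in meters, the same projection used by web map tiles such as `EsriImagery`. Assuming that projection, the hypothetical helper below converts them back to degrees as a quick sanity check on the first pickup location in the table above.

```python
import math

# Assumption: coordinates are Web Mercator (EPSG:3857), which uses a sphere
# with the WGS84 semi-major axis as its radius.
EARTH_RADIUS_M = 6378137.0


def web_mercator_to_lonlat(x, y):
    """Convert Web Mercator meters to (longitude, latitude) in degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon, lat


# First pickup location from the table above
lon, lat = web_mercator_to_lonlat(-8236963.0, 4975552.5)
print(f"{lon:.2f}, {lat:.2f}")  # roughly -73.99, 40.75 (Midtown Manhattan)
```

Landing in Manhattan supports the Web Mercator reading; in the notebook itself no conversion is needed, since the points can be overlaid directly on Web Mercator map tiles.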

Step 2: Prototype a plot in a notebook

  • A text-based representation isn't very useful for big datasets like this, so we need to build a plot
  • But we don't want to start a software project, so we use HoloViews:
    • Simple, declarative way to annotate your data for visualization
    • Large library of Elements with associated visual representation
    • Elements combine (lay out or overlay) easily
  • And we'll want live interactivity, so we'll load HoloViews' Bokeh plotting extension
  • Result:
In [3]:
hv.extension('bokeh')