A2 Dashboard Workflow

Workflow for building and deploying interactive visualizations

Let's say you want to make it easy to explore some dataset, i.e.:

  • Make a visualization of the data
  • Maybe add some custom widgets to see the effects of some variables
  • Then deploy the result as a web app.

You can definitely do that in Python, but you would expect to:

  • Spend days of effort to get some initial prototype working in a Jupyter notebook, every time
  • Work hard to tame the resulting opaque mishmash of domain-specific, widget, and plotting code
  • Start over nearly from scratch whenever you need to:
    • Deploy in a standalone server
    • Visualize different aspects of your data
    • Scale up to larger (>100K) datasets

Step-by-step data-science workflow

Here we'll show a simple, flexible, powerful, step-by-step workflow, explaining which open-source tools solve each of the problems involved:

  • Step 1: Get some data
  • Step 2: Prototype a plot in a notebook
  • Step 3: Model your domain
  • Step 4: Get a widget-based UI for free
  • Step 5: Link your domain model to your visualization
  • Step 6: Widgets now control your interactive plots
  • Step 7: Deploy your dashboard
In [1]:
import holoviews as hv, geoviews as gv, param, paramnb, parambokeh, dask.dataframe as dd, cartopy.crs as crs

from colorcet import cm_n, fire
from holoviews.operation import decimate
from holoviews.operation.datashader import datashade
from holoviews.streams import RangeXY
from geoviews.tile_sources import EsriImagery
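The `datashade` operation imported above is what lets this workflow scale past the >100K-point range mentioned earlier. Its core idea is to aggregate points into a fixed grid (roughly one bin per screen pixel) before anything is drawn, and only then color that grid with a colormap such as `fire`. Below is a simplified pure-Python sketch of just the counting step. `aggregate_counts` is a hypothetical helper written for illustration, not part of Datashader's API; Datashader's real implementation is compiled and far more general.

```python
from typing import List, Sequence, Tuple


def aggregate_counts(
    xs: Sequence[float],
    ys: Sequence[float],
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
    width: int,
    height: int,
) -> List[List[int]]:
    """Hypothetical helper: count points per cell of a height x width grid.

    Row 0 is the bottom of y_range; column 0 is the left edge of x_range.
    """
    x0, x1 = x_range
    y0, y1 = y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Points outside the visible ranges are dropped, as in a zoomed-in view.
        if not (x0 <= x < x1 and y0 <= y < y1):
            continue
        # Map each coordinate to a bin index; min() guards against float round-up.
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        grid[row][col] += 1
    return grid


# Toy example: five points, the last of which lies outside y_range.
counts = aggregate_counts(
    [0.1, 0.2, 0.9, 0.95, 0.5],
    [0.1, 0.15, 0.9, 0.92, 2.0],
    x_range=(0.0, 1.0),
    y_range=(0.0, 1.0),
    width=2,
    height=2,
)
print(counts)  # [[2, 0], [0, 2]]
```

Because the output always has `width * height` cells, the cost of drawing it no longer depends on how many points went in, which is what makes it practical to plot millions of points.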

Step 1: Get some data

  • Here we'll use a subset of the often-studied NYC Taxi dataset
  • About 12 million GPS locations recorded by taxis
  • Stored in the efficient Parquet format for easy access
  • Loaded into a Dask dataframe for multi-core
    (and if needed, out-of-core or distributed) computation
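A Dask dataframe can use several cores because it is split into partitions that are processed independently, with only small partial results combined at the end. The sketch below imitates that pattern in plain Python for a column mean. `split_into_partitions`, `partial_sum_count`, and `partitioned_mean` are hypothetical illustrative helpers, not Dask functions, and a thread pool stands in for Dask's schedulers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def split_into_partitions(values: Sequence[float], npartitions: int) -> List[List[float]]:
    """Split values into at most npartitions contiguous chunks of similar size."""
    if not values or npartitions < 1:
        raise ValueError("need at least one value and one partition")
    size = -(-len(values) // npartitions)  # ceiling division
    return [list(values[i:i + size]) for i in range(0, len(values), size)]


def partial_sum_count(chunk: Sequence[float]) -> Tuple[float, int]:
    """Per-partition step: only a small (sum, count) pair leaves each partition."""
    return sum(chunk), len(chunk)


def partitioned_mean(values: Sequence[float], npartitions: int = 4) -> float:
    """Mean computed partition by partition, then combined (map, then reduce)."""
    partitions = split_into_partitions(values, npartitions)
    # Partitions are independent, so they can be reduced concurrently.
    # Threads keep this sketch portable; they stand in for a real scheduler.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_sum_count, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


print(split_into_partitions([1.0, 2.0, 3.0, 4.0, 5.0], 2))  # [[1.0, 2.0, 3.0], [4.0, 5.0]]
print(partitioned_mean([1.0, 2.0, 3.0, 4.0, 5.0], npartitions=2))  # 3.0
```

In the cell below, `.persist()` asks Dask to load the partitions once and keep them in memory, so later plots do not re-read the Parquet file each time.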
In [2]:
%time df = dd.read_parquet('../data/nyc_taxi_wide.parq').persist()
df.head(2)
CPU times: user 4.09 s, sys: 1.62 s, total: 5.72 s
Wall time: 4.49 s
   tpep_pickup_datetime  tpep_dropoff_datetime  passenger_count  trip_distance    pickup_x   pickup_y   dropoff_x  dropoff_y  fare_amount  tip_amount  dropoff_hour  pickup_hour
0   2015-01-15 19:05:39    2015-01-15 19:23:42                1           1.59  -8236963.0  4975552.5  -8234835.5  4975627.0         12.0        3.25            19           19
1   2015-01-10 20:33:38    2015-01-10 20:53:28                1           3.30  -8237826.0  4971752.5  -8237020.5  4976875.0         14.5        2.00            20           20
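The `pickup_x`/`pickup_y` and `dropoff_x`/`dropoff_y` columns are clearly not longitude/latitude. Assuming they are Web Mercator (EPSG:3857) meters, which their magnitudes and the `EsriImagery` web-tile import suggest, they can be converted back with the standard spherical Mercator formulas. The function names below are hypothetical helpers for illustration.

```python
import math
from typing import Tuple

# Assumption: coordinates are Web Mercator (EPSG:3857) meters on a sphere
# whose radius is the WGS84 semi-major axis.
EARTH_RADIUS_M = 6378137.0


def web_mercator_to_lonlat(x: float, y: float) -> Tuple[float, float]:
    """Convert Web Mercator (x, y) meters to (longitude, latitude) degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2.0 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2.0)
    return lon, lat


def lonlat_to_web_mercator(lon: float, lat: float) -> Tuple[float, float]:
    """Inverse conversion: (longitude, latitude) degrees to Web Mercator meters."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4.0 + math.radians(lat) / 2.0))
    return x, y


# First pickup location from the table above.
lon, lat = web_mercator_to_lonlat(-8236963.0, 4975552.5)
print(round(lon, 2), round(lat, 2))  # -73.99 40.75
```

That pickup lands in Manhattan, which is consistent with the Web Mercator assumption.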

Step 2: Prototype a plot in a notebook

  • A text-based representation isn't very useful for big datasets like this, so we need to build a plot
  • But we don't want to start a software project, so we use HoloViews:
    • Simple, declarative way to annotate your data for visualization
    • Large library of Elements with associated visual representation
    • Elements combine (lay out or overlay) easily
  • And we'll want live interactivity, so we'll use a Bokeh plotting extension
  • Result:
In [3]: