00 Welcome

Welcome to the PyViz Tutorial! It walks you step by step through solving problems in web-based data exploration, visualization, and interactive-app development using open-source Python libraries, including the Anaconda-supported tools HoloViews, GeoViews, Bokeh, Datashader, and Param.

These libraries have been carefully designed to work together to address a very wide range of data-analysis and visualization tasks, making it simple to discover, understand, and communicate the important properties of your data.

This notebook serves as the homepage of the tutorial, including a table of contents listing each tutorial section, a general overview, links to demos illustrating the range of topics covered, and instructions to check that everything is downloaded and installed properly.

Index and Schedule

The tutorial outlined here has been given as a half-day or one-day course led by trained instructors. For self-paced usage, you should expect this material to take between 1 and 3 days if you do all of it. Sections 0, 1, 2, 3, and 4 contain the most crucial and basic introductory material, and should take a couple of hours of study. All later sections can be studied as needed or skipped if not relevant.

What is this all about?

Many of the activities of a data scientist or analyst require visualization, but it can be difficult to assemble a set of tools that cover all of the tasks involved. Initial exploration needs to be in a flexible, open-ended environment where it is simple to try out and test hypotheses. Once key aspects of the data have been identified, the analyst might prepare a specific image or figure to share with colleagues or a wider audience. Or, they might need to set up an interactive way to share a set of data that would be unwieldy as a fixed figure, using interactive controls to let others explore the effects of certain variables. Eventually, for particularly important data or use cases, the analyst might get involved in a long-term project to develop a full-featured web application or dashboard to deploy, allowing decision makers to interact directly with live data streams to make operational decisions.

With Python, initial exploration is typically done in a Jupyter notebook, using tools like Matplotlib and Bokeh to develop static or interactive plots. These tools support a simple syntax for making certain kinds of plots, but showing more complex relationships in data can quickly turn into a major software development exercise, making it difficult to achieve understanding during exploration. Simple apps can be built using ipywidgets to control these visualizations, but the resulting combinations end up tightly coupled to the notebook environment and cannot migrate into a standalone server context as an application that can be shared more widely. Bokeh includes widgets that work in both notebook and server environments, but these can be difficult to work with for initial exploration. Bokeh and Matplotlib both also have limits on how much data they can handle, in part because Bokeh requires the data to be put into the web browser's limited memory space.

In this tutorial we will be introducing a set of open-source Python libraries we have developed to streamline the process of working with small and large datasets (from a few points to billions) in a web browser, whether doing exploratory analysis, making simple widget-based tools, or building full-featured dashboards. The libraries in this ecosystem include:

  • Bokeh: Interactive plotting in web browsers, running JavaScript but controlled by Python
  • HoloViews: Declarative objects for instantly visualizable data, building Bokeh plots from convenient high-level specifications
  • GeoViews: Visualizable geographic data that can be mixed and matched with HoloViews objects
  • Datashader: Rasterizing huge datasets quickly as fixed-size images
  • Param: Declaring user-relevant parameters, making it simple to work with widgets inside and outside of a notebook context

These projects can be used separately or together in a wide variety of different configurations to address different needs. For instance, if we focus on the needs of a data scientist/analyst who wants to understand the properties of their data, we can compare that to the approach suggested for a software developer wanting to build a highly custom software application for data of different sizes:

Here Datashader is used to make large datasets practical by rendering images outside the browser, either directly for a programmer or via a convenient high-level interface using HoloViews. The results can be embedded in interactive Bokeh plots if desired, whether as a static HTML plot, in a Jupyter notebook, or as a standalone application.
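To make the idea of "rendering images outside the browser" concrete, here is a minimal sketch of rasterization written with only the Python standard library. It is not Datashader's API or implementation; the function name `rasterize` and its parameters are hypothetical, chosen for illustration. The point it shows is that the output grid has a fixed size no matter how many points go in, so only the small grid ever needs to be sent to the browser.

```python
import random


def rasterize(xs, ys, width, height, x_range, y_range):
    """Count points per pixel on a fixed width x height grid.

    Hypothetical helper for illustration only; Datashader does this
    far more efficiently and with many more aggregation options.
    """
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # skip points outside the viewport
        # Map data coordinates to pixel indices, clamping the upper edge
        # so that x == x1 or y == y1 lands in the last pixel.
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        grid[row][col] += 1
    return grid


# Synthetic data: 100,000 normally distributed points (made up for this sketch).
random.seed(0)
n = 100_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [random.gauss(0, 1) for _ in range(n)]

image = rasterize(xs, ys, width=40, height=30,
                  x_range=(-3, 3), y_range=(-3, 3))

# The grid is 30 rows by 40 columns regardless of n.
print(len(image), len(image[0]))
```

Increasing `n` changes the counts stored in each cell but not the size of `image`, which is why this style of rendering scales to datasets far larger than a browser could hold as individual points.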

Behind the scenes, these tools rely on a wide range of other open-source libraries for their implementation, including:

  • Pandas: Convenient computation on columnar datasets (used by HoloViews and Datashader)
  • Xarray: Convenient computation on multidimensional array datasets (used by HoloViews and Datashader)
  • Dask: Efficient out-of-core/distributed computation on massive datasets (used by Datashader)
  • Numba: Accelerated machine code for inner loops (used by Datashader)
  • Fastparquet: Efficient storage for columnar data
  • Cartopy: Support for geographical data (using a wide range of other libraries)

This tutorial will guide you through the process of using these tools together to build rich, high-performance, scalable, flexible, and deployable visualizations, apps, and dashboards, without having to use JavaScript or other web technologies explicitly, and without having to rewrite your code to move between each of the different tasks or phases from exploration to deployment. In each case, we'll try to draw your attention to libraries and approaches that help you get the job done, which in turn depend on many other unseen libraries in the scientific Python ecosystem to do the heavy lifting.

You will find extensive support material on the websites for each package. You may find these links particularly useful during the tutorial:

Getting set up

Please consult pyviz.org for the full instructions on installing the software used in these tutorials. Here is the condensed version of these instructions for UNIX-based systems (Linux or Mac OS X), assuming you have already downloaded and installed Anaconda or Miniconda:

! conda install -c pyviz pyviz
! pyviz --install-examples pyviz-tutorial
! pyviz --download-sample-data pyviz-tutorial/data
! cd pyviz-tutorial
! jupyter notebook

Once everything is installed, the following cell should print '1.9.5':

In [1]:
import holoviews as hv
hv.__version__
Out[1]:
1.9.5

And you should see the HoloViews, Bokeh, and Matplotlib logos after running the following cell:

In [2]:
hv.extension('bokeh', 'matplotlib')
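
If either check fails, it can help to see which of the core libraries are actually installed in the current environment. The following is a rough sketch using the standard-library `importlib.metadata` module (Python 3.8 or later, which is an assumption about your environment). The helper name `installed_versions` is hypothetical, and the list of distribution names is assumed to match the libraries named in this tutorial.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names assumed to match the libraries listed in this tutorial.
PACKAGES = ["holoviews", "geoviews", "bokeh", "datashader", "param"]


def installed_versions(packages):
    """Return {name: version string, or None if the package is missing}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print one line per package, flagging anything that is not installed.
for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:12s} {ver or 'NOT INSTALLED'}")
```

Any package reported as missing suggests that the installation steps did not complete in the environment the notebook is running in.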