
Interactive Visualization in Science:

Resources and advice for Python + JavaScript

by Adam Willats

Choose your own adventure!



Context / background

What do I want from visualization libraries?

I'm interested in using Python and JavaScript to interactively explore problems in computational neuroscience and neuroengineering. I'm also interested in using interactive visualizations to teach concepts to others. Here's some of my work:

[Image: high level + low level, from the HoloViz documentation]

my priorities are:

Visualization library options

Here I focus on a subset of solutions which I think are most promising for the interactive use-case. See also pyviz.org or "Dynamic science viz.."[^1] for a more comprehensive evaluation.

| library | language for computation | lots of data | custom js | easy to embed | interactivity | 3D plots |
| ------------- | ------------------------ | ----------------------------------- | ---------------------- | ------------------------------------------------ | ------------- | ------------------------- |
| Plotly / Dash | python[^plotly_multi] | yes, webgl + datashader | sort of, via Dash[^2] | yes; Dash is more flexible, but more complicated | high | yes |
| Bokeh | python (+ js) | yes, column-data & server solutions | yes | yes | high | yes-ish |
| Altair | python | no[^alt_dat] | difficult | yes, through vega-lite | medium | no |
| HoloViz | python | via datashader | yes | yes | high | yes, through plotly |
| Observable | javascript[^why_obs] | yes[^obs_data], stream from server | yes[^3] | yes, can also embed single cells | high | yes, through js libraries |

other libraries / ecosystems not investigated / compared here:

Lineage / taxonomy of libraries

breakdown of where libraries came from / were inspired by

Which plotting library should I use?

It depends on your goals and use case. Select the goals that line up most closely with what you want and I'll recommend a library to you.

Design philosophies & features

Plotly 🚧

One double-edged feature of Plotly is the gradient of multiple approaches to achieve the same plot:

While this flexibility allows you to pick the right tool for the job, it makes looking through the documentation much more confusing.

Altair 🚧

The biggest part of the Altair learning curve for me was getting my data into the correct form. It was also disheartening to fall in love with the grammar of graphics approach, only to run into a wall when trying to plot many timeseries[^alt_dat].
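For context, Altair generally works best with long-form ("tidy") data: one row per observation, with a column naming which series it belongs to. Below is a minimal, library-free sketch of reshaping wide records into that form. The channel names, values, and the `wide_to_long` helper are made up for illustration; in practice pandas' `DataFrame.melt` does the same job.

```python
# Hypothetical wide-form recording: one column per channel.
wide_rows = [
    {"time": 0.0, "ch1": 0.1, "ch2": 1.5},
    {"time": 0.1, "ch1": 0.3, "ch2": 1.2},
]


def wide_to_long(rows, id_key, var_name="channel", value_name="value"):
    """Reshape wide records into long-form records (one row per observation)."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_key:
                continue  # the id column is repeated on every output row
            long_rows.append({id_key: row[id_key], var_name: key, value_name: value})
    return long_rows


long_rows = wide_to_long(wide_rows, id_key="time")
for record in long_rows:
    print(record)
```

Note that the long form has one row per (time point, series) pair, so row counts grow quickly when there are many timeseries.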

Bokeh technical vision

HoloViz 🚧

Observable 🚧


Handling large datasets

large datasets in Bokeh

large datasets in Plotly

large datasets in Altair

large datasets in Observable

Misc. considerations

A good storage system

Should you be saving data from Python with pickle, numpy binary, writing to csv, or something else?
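One way to get a feel for the trade-offs is to round-trip a small dataset through a few formats and see what comes back. The sketch below uses only the standard library, so `json` stands in for "something else" and numpy's binary format is left out; the data and file names are made up for illustration.

```python
import csv
import json
import pickle
import tempfile
from pathlib import Path

# Made-up example data: a few time samples of a signal.
records = [{"time": 0.0, "voltage": -65.0}, {"time": 0.1, "voltage": -64.2}]

out_dir = Path(tempfile.mkdtemp())

# pickle: preserves Python types exactly, but is Python-only and
# should never be loaded from untrusted sources.
pickle_path = out_dir / "data.pkl"
pickle_path.write_bytes(pickle.dumps(records))
from_pickle = pickle.loads(pickle_path.read_bytes())

# csv: readable from almost any tool, but every value comes back as a string.
csv_path = out_dir / "data.csv"
with csv_path.open("w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["time", "voltage"])
    writer.writeheader()
    writer.writerows(records)
with csv_path.open(newline="") as f:
    from_csv = list(csv.DictReader(f))

# json: text-based and easy to load from JavaScript; numbers stay numbers.
json_path = out_dir / "data.json"
json_path.write_text(json.dumps(records))
from_json = json.loads(json_path.read_text())

print(from_pickle == records)  # types preserved?
print(from_csv[0])             # values are now strings
print(from_json == records)    # numbers survive the round trip?
```

For a Python + JavaScript workflow, a format that both sides can read (such as csv or json) may matter more than exact type preservation.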

Additional considerations

features which are good to have, but don’t strongly impact my user experience at the moment



Embedding results

Being able to share results and code with others, especially without requiring them to install a complex ecosystem of tools, is useful and good for open, reproducible science.



Interactivity

Shortlist - my favorite examples

Examples of useful interactivity

Custom callbacks:

To implement rich interactivity beyond preconstructed templates, it is useful to have control over callbacks: the functions which execute in response to an event.

for Altair-specific implementation notes see building blocks of interactivity
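To make the callback idea concrete, here is a library-agnostic sketch in plain Python. The `Slider` class and its `on_change` method are hypothetical stand-ins, not any particular library's API; each of the libraries above has its own mechanism for registering callbacks.

```python
class Slider:
    """Hypothetical widget: stores a value and notifies registered callbacks."""

    def __init__(self, value=0.0):
        self._value = value
        self._callbacks = []

    def on_change(self, callback):
        """Register a function to run whenever the value changes."""
        self._callbacks.append(callback)

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        old_value = self._value
        self._value = new_value
        if new_value != old_value:
            for callback in self._callbacks:
                callback(old_value, new_value)


log = []


def update_plot(old, new):
    # In a real app this is where the plot would be redrawn or filtered.
    log.append(f"gain changed from {old} to {new}")


gain = Slider(value=1.0)
gain.on_change(update_plot)
gain.value = 2.5  # value changed, so the callback runs
gain.value = 2.5  # same value again, so no callback
print(log)
```

The key design point is that the widget knows nothing about the plot; it only calls whatever functions were registered, which is what makes custom behavior possible.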



Useful plotting techniques

(see the New Python Data Visualization Tools repo by Stephanie Kirmer to compare plot-type implementations across Altair, Plotly, Bokeh)

🚧 to-do: embed examples for each of these 🚧

  1. think about explanatory versus exploratory data-viz

  2. faceting / small multiples:

    • scatter-plot matrix (aka SPLOM) - 💡this is always my starting point for visualizing complex data

      • observable article by Mike Bostock

        🌟 interactive demo 🌟 <iframe width="100%" height="600" frameborder="0" src="https://observablehq.com/embed/@d3/brushable-scatterplot-matrix?cells=viewof+selection"></iframe>

      • implementations
        • [comparisons](https://github.com/skirmer/new-py-dataviz/blob/main/facets.ipynb) by Stephanie Kirmer
        • [seaborn](https://seaborn.pydata.org/tutorial/axis_grids.html)
          • I think the added value of marginal distributions visualized with [kernel-density estimates](https://seaborn.pydata.org/examples/joint_kde.html) is great.
          • Seaborn's [PairGrid implementation](https://seaborn.pydata.org/tutorial/axis_grids.html) is the best one I've seen for this in Python
          • although [R's `pairs.panels`](http://www.sthda.com/english/wiki/scatter-plot-matrices-r-base-graphs) seems to do something similar
        • [altair implementation](https://altair-viz.github.io/gallery/scatter_matrix.html) with linked behavior between panels
        • [plotly](https://plotly.com/python/splom/) [w/ customization using figure factory](https://plotly.com/python/v3/legacy/scatterplot-matrix/)
      
    • case study: correlation over time, article by Mike Freeman

      🌟 live, embedded demo 🌟 <iframe width="100%" height="600" frameborder="0" src="https://observablehq.com/embed/@observablehq/correlation-over-time?cells=facet_wrap"></iframe>

  3. add tooltips on hover with useful detail

    • plotly implementation docs
    • customizing tooltips in altair github
  4. use interactive heatmaps

    • can nest / hierarchically organize a lot of dimensions
    • Clustergrammer demo, talk video
      • highly interactive heatmap for clustering genes associated with phenotypes
    • Plotly examples, docs
  5. parallel coordinates - ⚠️ primarily for exploratory data-viz ⚠️

    • 🌟 live observable demo by @sophiegri 🌟 <iframe width="100%" height="584" frameborder="0" src="https://observablehq.com/embed/@sophiegri/exercise-3-parallel-coordinates?cells=paracoords"></iframe>
    • 5 minute intro by Amit Kapoor

    • longer showcase of parallel coordinates by Kai Chang

    • practical, implementation tips:
      • plotly implementation
        • while there are parallel coordinates implementations in many python plotting packages, this is the only one I've found with the very useful feature of filtering each dimension into ranges as well as being able to reorder axes
      • order of dimensions matters a lot! use a tool where you can rearrange order
      • scaling / normalization matters a lot!
      • coloring by a key attribute can help dissect structure
      • interactivity is crucial to sort through the "hairball"
        • high bandwidth, but hard to parse
      • can be used to pick out clusters in high-dimensional parameter space
  6. replacing legends with direct text-annotation

  7. "banking to 45 degrees" i.e. choosing aspect ratios for plots that maximize discriminability

  8. meaningful color-scales
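As a rough illustration of item 7 above, one common heuristic (median-absolute-slope banking) picks the aspect ratio so that the median line segment is drawn at 45 degrees on screen. The sketch below is one assumed formulation with made-up data, not a reference implementation; the function name `banked_aspect_ratio` is hypothetical.

```python
import math
from statistics import median


def banked_aspect_ratio(xs, ys):
    """Return width/height such that the median absolute slope of the
    line segments is 1 (i.e. 45 degrees) when drawn on screen."""
    points = list(zip(xs, ys))
    slopes = [
        abs((y1 - y0) / (x1 - x0))
        for (x0, y0), (x1, y1) in zip(points, points[1:])
        if x1 != x0  # skip vertical segments to avoid dividing by zero
    ]
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)  # assumed non-zero (the data is not flat)
    # On-screen slope = data slope * (height / y_range) / (width / x_range).
    # Setting the median on-screen slope to 1 and solving for width / height:
    return median(slopes) * x_range / y_range


# Made-up example: a sine wave sampled over roughly two periods.
xs = [i * 0.05 for i in range(252)]
ys = [math.sin(x) for x in xs]
ratio = banked_aspect_ratio(xs, ys)
print(f"suggested width:height is about {ratio:.1f}:1")
```

As a sanity check, a straight line comes out square (ratio 1), while an oscillating signal like the sine wave gets a wide, short plot.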


Sources, inspiration, more resources


Appendix


Further musings

The following are my personal opinions and not necessarily general recommendations:


Tidying data

Structuring data for visualization tools

Footnotes:

  1. Dynamic scientific visualizations in the browser for Python users by Patrick Mineault

  2. Dash tries to provide a pure-Python interface to mimic the roles of HTML, JS, CSS in traditional websites.

     > "Dash abstracts away all of the technologies and protocols that are required to build a full-stack web app with interactive data visualization." (dash callbacks)

  3. Observable's not JavaScript