by Adam Willats
I'm interested in using Python and JavaScript to interactively explore problems in computational neuroscience and neuroengineering. I'm also interested in using interactive visualizations to teach concepts to others. Here's some of my work:
image from HoloViz documentation
my priorities are:
Here I focus on a subset of solutions which I think are most promising for the interactive use case. See also pyviz.org or "Dynamic science viz.."[^1] for a more comprehensive evaluation.

| library | language for computation | lots of data | custom JS | easy to embed | interactivity | 3D plots |
| --- | --- | --- | --- | --- | --- | --- |
| Plotly / Dash | Python[^plotly_multi] | yes, WebGL + Datashader | sort of, via Dash[^2] | yes; Dash is more flexible, but more complicated | high | yes |
| Bokeh | Python (+ JS) | yes, column-data & server solutions | yes | yes | high | yes-ish |
| Altair | Python | no[^alt_dat] | difficult | yes, through Vega-Lite | medium | no |
| HoloViz | Python | via Datashader | yes | yes | high | yes, through Plotly |
| Observable | JavaScript[^why_obs] | yes[^obs_data], stream from server | yes[^3] | yes, can also embed single cells | high | yes, through JS libraries |
matplotlib - the de facto standard for (non-interactive, 2D) plotting in Python
plotnine - like Altair, this is a Python library which implements a grammar of graphics approach to plotting
Streamlit - python dashboarding library becoming increasingly popular
weights and biases - visualization & logging for machine learning
see also:
Bokeh is inspired by, but not built on, D3.js
HoloViz is a very high-level tool
Grammar of Graphics (see sources for more info)
Pandas DataFrames
It depends on your goals and use case. Select the goals that line up most closely with what you want and I'll recommend a library to you.
matplotlib
Seaborn
Altair
Bokeh
Plotly
HoloViz
Plotly 🚧 One double-edged feature of Plotly is the gradient of approaches, from high-level to low-level, for achieving the same plot:
While this flexibility lets you pick the right tool for the job, it also makes the documentation much more confusing to navigate.
Altair 🚧 The biggest part of the Altair learning curve for me was getting my data into the correct form. It was also disheartening to fall in love with the grammar of graphics approach, only to run into a wall when trying to plot many timeseries[^alt_dat].
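Much of that friction is the wide-to-long reshape. A minimal sketch with pandas, assuming the timeseries start in a wide DataFrame with one column per series (the `time` and `neuron_*` names are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical wide data: one row per time point, one column per recorded neuron
n_time, n_series = 100, 3
wide = pd.DataFrame(
    np.random.default_rng(0).normal(size=(n_time, n_series)),
    columns=[f"neuron_{i}" for i in range(n_series)],
)
wide["time"] = np.arange(n_time) * 0.001  # seconds

# Long/tidy form: one row per (time, series) observation
long = wide.melt(id_vars="time", var_name="series", value_name="value")

# An Altair chart could then encode the series as color, e.g.
# alt.Chart(long).mark_line().encode(x="time", y="value", color="series")
```

Note that the long table has `n_time * n_series` rows and repeats the `time` column once per series, which is part of why plotting many timeseries this way gets heavy.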
Bokeh technical vision
HoloViz 🚧
Observable 🚧
ColumnDataSource
[docs] allows more flexible partial loading of data leading to better performance with large datasets
CDSView
[docs]
Interactive chart slow with large number of data points
scalable-vega for lots of data points
just adding data to a database seems to dramatically slow down altair, even if the data isn't actively being rendered
[++] Streaming Data
also discussion here: altair discussion
for local exploration, can use altair-data-server
This notebook shows an example of using the Altair data server, a lightweight plugin for Altair that lets you efficiently and transparently work with larger datasets.
Should you be saving data from Python with pickle, numpy binary, writing to csv, or something else?
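One way to weigh these options is to round-trip the same numeric array through each format. A minimal sketch using only NumPy and the standard library (`np.save` for binary `.npy`, `np.savetxt` for text `.csv`, and `pickle`); the file names are hypothetical:

```python
import os
import pickle
import tempfile

import numpy as np

data = np.random.default_rng(0).normal(size=(1000, 4))

with tempfile.TemporaryDirectory() as d:
    npy_path = os.path.join(d, "data.npy")
    csv_path = os.path.join(d, "data.csv")
    pkl_path = os.path.join(d, "data.pkl")

    np.save(npy_path, data)  # compact binary, NumPy-specific
    np.savetxt(csv_path, data, delimiter=",", fmt="%.6g")  # plain text, human-inspectable
    with open(pkl_path, "wb") as f:
        pickle.dump(data, f)  # Python-only; never unpickle files you don't trust

    from_npy = np.load(npy_path)
    from_csv = np.loadtxt(csv_path, delimiter=",")
    with open(pkl_path, "rb") as f:
        from_pkl = pickle.load(f)

    sizes = {
        "npy": os.path.getsize(npy_path),
        "csv": os.path.getsize(csv_path),
        "pkl": os.path.getsize(pkl_path),
    }
```

The binary formats round-trip exactly; the CSV is readable in any editor but larger here and only as precise as the `fmt` string chosen.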
.csv
gives the ability to visually inspect the integrity of the data, which is very useful
ColumnDataSource, as in Bokeh, although I haven't tried this yet
features which are good to have, but don’t strongly impact my user experience at the moment
Being able to share results and code with others, especially without them having to install a complex ecosystem of tools is useful, and good for open, reproducible science.
Observable
Plotly
get_embed()
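docs, an example
Altair

    import io

    def chart_to_html(chart):  # hypothetical helper name; chart is an altair.Chart
        shell_html = io.StringIO()
        chart.save(shell_html, format="html")  # write a standalone HTML page into the buffer
        return shell_html.getvalue()  # return the HTML text rather than the buffer object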
Bokeh
Embedding Bokeh content docs
Standalone documents
These documents don’t require a Bokeh server to work. They may have many tools and interactions such as custom JavaScript callbacks but are otherwise nothing but HTML, CSS, and JavaScript. These documents can be embedded into other HTML pages as one large document or as a set of sub-components with individual templating.
file_html()
or json_item()
to get standalone components
Bokeh applications
These applications require a Bokeh server to work. Having a Bokeh server lets you connect events and tools to real-time Python callbacks that execute on the server. For more information about creating and running Bokeh apps, see Running a Bokeh server.
Code for embedding using various servers - examples repo
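A minimal sketch of the two standalone routes via `file_html()` and `json_item()`, assuming Bokeh 2.x or 3.x: `file_html` returns a complete HTML page as a string, while `json_item` returns a JSON-serializable dict that a page already loading BokehJS can render into a target `<div>` with `Bokeh.embed.embed_item`. The figure contents and the `"myplot"` target id are hypothetical:

```python
import json

from bokeh.embed import file_html, json_item
from bokeh.plotting import figure
from bokeh.resources import CDN

p = figure(title="example")
p.line([0, 1, 2, 3], [0, 1, 4, 9])

# Option 1: one self-contained HTML page (BokehJS loaded from the CDN)
html = file_html(p, CDN, "my plot")

# Option 2: a JSON item for an existing page; on the JS side,
# Bokeh.embed.embed_item(item, "myplot") renders it into <div id="myplot">
item = json_item(p, "myplot")
item_json = json.dumps(item)
```

The first suits a quick standalone file (e.g. for GitHub Pages); the second suits a site that already has its own templates.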
Filtering (aka brushing) - dynamic queries
Linking / cross-filtering - connecting behavior across subplots
faceting / small multiples - prerequisite for cross-filtering
Linked filtering - Plotly R demo
Altair
Bokeh
Highlights / tooltips - responsive annotation ties different representations together
Plotly / Dash
Bokeh
Altair
One of the unique features of Altair, inherited from Vega-Lite, is a declarative grammar of not just visualization, but interaction.
To implement rich interactivity beyond preconstructed templates, it helps to have control over the callbacks: functions that execute in response to an event (a click, a hover, a slider change).
Dash has its own pseudo-JavaScript interface to callbacks:
Bokeh has very straightforward integration with custom JS callbacks!
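A minimal sketch of a custom JS callback in Bokeh, assuming Bokeh 2.x or 3.x: a slider rescales a line entirely in the browser, so the result still works as a standalone HTML file with no Python server behind it. The data and the `gain` slider are hypothetical:

```python
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, CustomJS, Slider
from bokeh.plotting import figure

x = [i / 10 for i in range(50)]
# y is what gets drawn; y0 keeps an unscaled copy for the JS code to read
source = ColumnDataSource(data=dict(x=x, y=list(x), y0=list(x)))

p = figure()
p.line("x", "y", source=source)

gain = Slider(start=0.1, end=5, value=1, step=0.1, title="gain")

# JavaScript run by BokehJS on every slider change; `source` and `gain`
# are exposed to the JS code through `args`
callback = CustomJS(
    args=dict(source=source, gain=gain),
    code="""
    const y0 = source.data.y0;
    const y = source.data.y;
    for (let i = 0; i < y.length; i++) {
        y[i] = y0[i] * gain.value;
    }
    source.change.emit();
""",
)
gain.js_on_change("value", callback)

layout = column(gain, p)
```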
Altair / vega-lite
Altair does not offer any way to register event handlers, beyond what's available in the Vega-Lite spec. That would have to be done in Javascript via the Vega view API
ObservableHQ:
HoloViz
Linking objects in Python is often very convenient, because it allows writing code entirely in Python. However, it also requires a live Python kernel. If instead we want a static example (e.g. on a simple website or in an email) to have custom interactivity, or we simply want to avoid the overhead of having to call back into Python, we can define links in JavaScript.
for Altair-specific implementation notes see building blocks of interactivity
(see the New Python Data Visualization Tools repo by Stephanie Kirmer to compare plot-type implementations across Altair, Plotly, and Bokeh) 🚧 to-do: embed examples for each of these 🚧
think about explanatory versus exploratory data-viz
faceting / small multiples:
scatter-plot matrix (aka SPLOM) - 💡this is always my starting point for visualizing complex data
observable article by Mike Bostock
- [comparisons](https://github.com/skirmer/new-py-dataviz/blob/main/facets.ipynb) by Stephanie Kirmer
- [seaborn](https://seaborn.pydata.org/tutorial/axis_grids.html)
- I think the added value of marginal distributions visualized with [kernel-density estimates](https://seaborn.pydata.org/examples/joint_kde.html) is great.
- Seaborn's [PairGrid implementation](https://seaborn.pydata.org/tutorial/axis_grids.html) is the best one I've seen for this in Python
- although [R's `pairs.panels`](http://www.sthda.com/english/wiki/scatter-plot-matrices-r-base-graphs) seems to do something similar
- [altair implementation](https://altair-viz.github.io/gallery/scatter_matrix.html) with linked behavior between panels
- [plotly](https://plotly.com/python/splom/) [w/ customization using figure factory](https://plotly.com/python/v3/legacy/scatterplot-matrix/)
case study: correlation over time, article by Mike Freeman
add tooltips on hover with useful detail
use interactive heatmaps
parallel coordinates - ⚠️ primarily for exploratory data-viz ⚠️
5 minute intro by Amit Kapoor
longer showcase of parallel coordinates by Kai Chang
replacing legends with direct text-annotation
"banking to 45 degrees" i.e. choosing aspect ratios for plots that maximize discriminability
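One common way to make this concrete is Cleveland's median-absolute-slope rule. If the axes span data ranges R_x and R_y, a segment with range-normalized slope s_i = (Δy_i / R_y) / (Δx_i / R_x) appears on screen with slope s_i × (height / width), so setting the median on-screen |slope| to 1 (i.e. 45°) gives height / width = 1 / median(|s_i|). A minimal NumPy sketch under that assumption; the function name `bank_to_45` is hypothetical:

```python
import numpy as np


def bank_to_45(x, y):
    """Return the height/width aspect ratio that banks the median segment to 45 degrees.

    Assumes the plotted axes span exactly [min(x), max(x)] and [min(y), max(y)].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = np.diff(x) / (x.max() - x.min())  # segment widths as fractions of the x range
    dy = np.diff(y) / (y.max() - y.min())  # segment heights as fractions of the y range
    keep = dx != 0  # vertical segments have an undefined slope
    median_slope = np.median(np.abs(dy[keep] / dx[keep]))
    if median_slope == 0:
        raise ValueError("median slope is zero; banking is undefined")
    # on-screen slope = normalized slope * (height / width); set the median to 1
    return 1.0 / median_slope
```

A result above 1 calls for a tall plot, below 1 a wide one. Other banking criteria exist, so treat this as one reasonable default rather than the only rule.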
meaningful color-scales
many of these ideas I've fumbled my way to by trying to plot my own work
but many of these ideas I picked up from the work of Edward Tufte
I've found learning about "the Grammar of Graphics" quite compelling
Broad data-viz advice
Other great overviews of python, science, data-viz
Learning JavaScript-based visualization tools
Monotonic cubic splines - often useful for drawing model-agnostic trends through points
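A minimal sketch using SciPy's `PchipInterpolator`, a piecewise cubic Hermite interpolant that preserves the monotonicity of the data, so it does not overshoot between points the way an ordinary cubic spline can. This is one specific monotone-cubic scheme among several; the data points are hypothetical:

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Hypothetical monotonically increasing data with a sharp rise between x=2 and x=3
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.1, 0.2, 2.0, 2.1, 2.2])

trend = PchipInterpolator(x, y)  # monotone piecewise cubic through every point
x_fine = np.linspace(x[0], x[-1], 200)
y_fine = trend(x_fine)  # smooth curve to draw as a line layer over the raw points
```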
The following are my personal opinions and not necessarily general recommendations:
successful faceting might be even more useful than interactivity
fully flattened tidy csv for everything means loading far too much information (especially in Altair/Vega-lite)
straying too far outside python limits iteration
numpy syntax is much nicer than the equivalent JavaScript for matrix operations
aesthetically, I don't like Jupyter notebooks
being able to host via github pages is a big advantage
Structuring data for visualization tools
short version: keep one instance to one row
Tidy Data by Wickham, python version
data wrangling in observable
Dynamic scientific visualizations in the browser for Python users by Patrick Mineault
Dash tries to provide a pure-Python interface to mimic the roles of HTML, JS, and CSS in traditional websites: "Dash abstracts away all of the technologies and protocols that are required to build a full-stack web app with interactive data visualization." (dash callbacks)