GraphKit: Technical Introduction

Last updated March 4, 2021

This document is intended for developers who want to learn how GraphKit can help them build compelling data visualizations that run within their existing web sites and apps.

We’re going to cover some of the background, core concepts, and benefits of GraphKit, and share a bit of the roadmap for 2021.

Background

I’ve spent most of my career helping companies build web and mobile applications, and one theme continues to stick out to me:

Businesses have lots of data, and they want to make it more accessible.

Thankfully, it's never been easier to build applications that communicate different kinds of data. The shift toward declarative programming has changed the way we write user interfaces, but there's still a gap when it comes to creating dynamic data visualizations.

Most graphing libraries have a limited number of chart types and options, or a verbose, imperative API for drawing elements directly. As a result, data visualization is still full of friction for designers, developers, and data scientists.

How do we describe our visualization in a declarative way? How do we handle interaction and animation? What’s the appropriate data format? What happens if data changes?

The answers differ from project to project, but I think our tools can do a better job of helping us figure them out. That's why I'm building GraphKit.

Overview

GraphKit is a set of tools for turning your data into interactive visualizations that run inside your existing web site or app.

Unlike most charting libraries, GraphKit lets you define a visualization as a series of direct transformations on the raw data you want to visualize, enabling a wide range of options for how you can render that data on screen.

GraphKit Core is a JavaScript library for configuring reactive dataflow graphs and performing common data transformation tasks.

Additional libraries can extend functionality for specific use cases (e.g. gk/geo) or provide bindings for certain layout contexts (e.g. gk/react). You can also write and publish your own libraries that integrate with and extend GraphKit.

Dataflow Graphs

I mentioned that GraphKit Core lets you compose "reactive dataflow graphs", but what does that mean?

A dataflow graph is a working model of your program, consisting of nodes, which are the data and functionality your program needs to evaluate, and links, which are references between those nodes. Reactive means the graph can be updated and re-evaluated to enable asynchronous behavior such as loading data or handling user input.

A flow diagram with three nodes: A=1, B=2, C=A+B; and connections A and B to C

In this graph, we see three nodes: a and b (A and B in the diagram) represent the inputs, and sum (C in the diagram) represents the result of adding the inputs together.
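To make the three-node graph concrete, here's a minimal sketch in TypeScript. It is not GraphKit's API: the GraphNode type, the Graph record, and the evaluate function are hypothetical names I'm using only to illustrate nodes, links, and evaluation.

```typescript
// Hypothetical sketch: these types and functions are NOT GraphKit's API.
type NodeId = string;

interface GraphNode {
  deps: NodeId[]; // links: ids of the nodes this node reads from
  compute: (...inputs: number[]) => number; // how the node's value is derived
}

type Graph = Record<NodeId, GraphNode>;

// Evaluate one node by first evaluating everything it links to.
// Assumes the graph has no cycles.
function evaluate(
  graph: Graph,
  id: NodeId,
  cache: Record<NodeId, number> = {}
): number {
  const cached = cache[id];
  if (cached !== undefined) return cached;
  const node = graph[id];
  if (!node) throw new Error(`Unknown node: ${id}`);
  const inputs = node.deps.map((dep) => evaluate(graph, dep, cache));
  const value = node.compute(...inputs);
  cache[id] = value;
  return value;
}

// The graph from the text: two inputs and one node that adds them.
const graph: Graph = {
  a: { deps: [], compute: () => 1 },
  b: { deps: [], compute: () => 2 },
  sum: { deps: ["a", "b"], compute: (a, b) => a + b },
};

const result = evaluate(graph, "sum"); // 3
```

Keeping the links as plain data (the deps arrays) is what lets a tool inspect the graph's structure without running it.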

You can also imagine a dataflow graph like a spreadsheet, with cells (nodes) that reference each other and recalculate when their references (links) change.

A fake spreadsheet with three cells: A=1, B=2, C=A+B

Dataflow graphs are useful for understanding the structure of a program, visualizing the flow of data during execution, and exposing relationships between different parts of the program.

GraphKit Core formalizes these dataflow graphs based on a directed graph structure, enabling a number of benefits. Graph configurations are small, focused modules that can be copied, extended, nested, and composed to build more complex programs.

A diagram showing a list of data feeding into x and y arrays feeding into a miniature bar chart

We're using graphs (data structures) to build graphs (visualizations)!
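Here's a rough sketch of that pipeline, with plain functions standing in for graph nodes. The sample data, the chart dimensions, and the function names (xPositions, barHeights, toBars) are all made up for illustration and are not part of GraphKit.

```typescript
// Hypothetical sketch: plain functions standing in for graph nodes.
interface Datum {
  label: string;
  value: number;
}

interface Bar {
  x: number;
  y: number;
  width: number;
  height: number;
}

const chartWidth = 300;
const chartHeight = 100;

// Node 1: the raw data (sample values made up for illustration).
const data: Datum[] = [
  { label: "Mon", value: 10 },
  { label: "Tue", value: 20 },
  { label: "Wed", value: 5 },
];

// Node 2: x positions, one evenly spaced slot per datum.
function xPositions(rows: Datum[]): number[] {
  const slot = chartWidth / rows.length;
  return rows.map((_, i) => i * slot);
}

// Node 3: bar heights, scaled so the largest value fills the chart.
// Assumes at least one row and a positive maximum.
function barHeights(rows: Datum[]): number[] {
  const max = Math.max(...rows.map((row) => row.value));
  return rows.map((row) => (row.value / max) * chartHeight);
}

// Node 4: combine both arrays into rectangles (SVG's y axis points down).
function toBars(xs: number[], heights: number[]): Bar[] {
  const width = chartWidth / xs.length;
  return xs.map((x, i) => ({
    x,
    y: chartHeight - heights[i],
    width,
    height: heights[i],
  }));
}

const bars = toBars(xPositions(data), barHeights(data));
// bars[1] is the tallest bar: { x: 100, y: 0, width: 100, height: 100 }
```

Each function depends only on its arguments, so each step could become a node and each argument a link.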

Benefits

Structuring our programs in terms of dataflow has a number of cognitive and technical benefits, such as modularity, abstraction, reactivity, and observability.

Dataflow graphs give us an intuitive way to conceptualize our program as a functional series of steps. They encourage us to break down our visualization into a set of discrete graphical elements, whose configurations are derived from contextual data.

Graphs are configured by extending smaller, focused modules and composing them together. This modularity enables powerful abstraction through composition, making it easy to write simple building blocks that are highly reusable.
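One way to picture extending and composing is to treat a graph configuration as a plain object. The sketch below makes that assumption; extend, nest, and evaluate are hypothetical helpers, not functions GraphKit provides.

```typescript
// Hypothetical sketch: graph configurations as plain objects.
type NodeId = string;

interface NodeConfig {
  deps: NodeId[];
  compute: (inputs: number[]) => number;
}

type GraphConfig = Record<NodeId, NodeConfig>;

// Extend: copy a config and override or add nodes, leaving the original untouched.
function extend(base: GraphConfig, overrides: GraphConfig): GraphConfig {
  return { ...base, ...overrides };
}

// Nest: prefix every id and every link so a module can be embedded
// in a larger graph without name clashes.
function nest(prefix: string, config: GraphConfig): GraphConfig {
  const out: GraphConfig = {};
  Object.keys(config).forEach((id) => {
    const node = config[id];
    out[`${prefix}.${id}`] = {
      deps: node.deps.map((dep) => `${prefix}.${dep}`),
      compute: node.compute,
    };
  });
  return out;
}

// Naive recursive evaluation; assumes a small graph with no cycles.
function evaluate(config: GraphConfig, id: NodeId): number {
  const node = config[id];
  if (!node) throw new Error(`Unknown node: ${id}`);
  return node.compute(node.deps.map((dep) => evaluate(config, dep)));
}

// A small, focused module.
const adder: GraphConfig = {
  a: { deps: [], compute: () => 1 },
  b: { deps: [], compute: () => 2 },
  sum: { deps: ["a", "b"], compute: ([a, b]) => a + b },
};

// The same module with one input replaced.
const bigAdder = extend(adder, { a: { deps: [], compute: () => 10 } });

// Compose both modules into a larger graph and link their results.
const combined: GraphConfig = {
  ...nest("left", adder),
  ...nest("right", bigAdder),
  total: { deps: ["left.sum", "right.sum"], compute: ([l, r]) => l + r },
};

const total = evaluate(combined, "total"); // 3 + 12 = 15
```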

Reactivity allows our programs to efficiently respond to user input and other asynchronous behavior. Re-evaluating the entire graph on every change can be expensive, so GraphKit uses its internal model of nodes and links to update only the nodes whose references have changed.
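The idea of updating only affected nodes can be sketched in a few lines. This is an illustration of the general technique, not GraphKit's implementation: the ReactiveGraph class and its methods are hypothetical, and the sketch assumes that derived nodes are always defined after the nodes they link to and that only input nodes are set directly.

```typescript
// Hypothetical sketch of incremental re-evaluation; not GraphKit's implementation.
type Id = string;

interface Derived {
  deps: Id[];
  compute: (inputs: number[]) => number;
}

class ReactiveGraph {
  private values: Record<Id, number> = {};
  private derived: Record<Id, Derived> = {};
  private order: Id[] = []; // derived nodes, dependencies always before dependents
  recomputed: Id[] = []; // log of every recomputation, kept for illustration

  // Add an input node holding a plain value.
  input(id: Id, value: number): void {
    this.values[id] = value;
  }

  // Add a derived node. Its dependencies must already exist, which keeps
  // `order` topologically sorted without a separate sorting step.
  derive(id: Id, deps: Id[], compute: (inputs: number[]) => number): void {
    this.derived[id] = { deps, compute };
    this.order.push(id);
    this.recompute(id);
  }

  get(id: Id): number {
    return this.values[id];
  }

  // Change an input, then recompute only nodes that link to something changed.
  set(id: Id, value: number): void {
    if (this.values[id] === value) return; // nothing changed, nothing to do
    this.values[id] = value;
    const changed: Record<Id, boolean> = { [id]: true };
    for (const nodeId of this.order) {
      const node = this.derived[nodeId];
      if (!node.deps.some((dep) => changed[dep] === true)) continue; // untouched branch
      const before = this.values[nodeId];
      this.recompute(nodeId);
      if (this.values[nodeId] !== before) changed[nodeId] = true;
    }
  }

  private recompute(id: Id): void {
    const node = this.derived[id];
    this.values[id] = node.compute(node.deps.map((dep) => this.values[dep]));
    this.recomputed.push(id);
  }
}

const g = new ReactiveGraph();
g.input("a", 1);
g.input("b", 2);
g.input("c", 10);
g.derive("sum", ["a", "b"], ([a, b]) => a + b);
g.derive("double", ["sum"], ([s]) => s * 2);
g.derive("unrelated", ["c"], ([c]) => c + 1);

const alreadyLogged = g.recomputed.length;
g.set("a", 5);
const recomputedAfterSet = g.recomputed.slice(alreadyLogged);
// recomputedAfterSet is ["sum", "double"]; "unrelated" was never touched.
```

A node whose recomputed value turns out to be unchanged is not marked as changed, so its dependents are skipped too.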

Observability helps us inspect and debug our programs by providing context about the program's structure and execution. Command-line and GUI tools can provide valuable feedback, and even help us generate graphs without writing code.
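As a rough illustration of what such tooling could build on, the sketch below reads a graph's structure as a list of links and records each step of an evaluation. The describeLinks and evaluateWithTrace helpers are hypothetical and are not GraphKit tools.

```typescript
// Hypothetical sketch: inspecting a graph's structure and execution.
type NodeId = string;

interface GraphNode {
  deps: NodeId[];
  compute: (inputs: number[]) => number;
}

type Graph = Record<NodeId, GraphNode>;

interface TraceEntry {
  id: NodeId;
  inputs: number[];
  value: number;
}

// Structure: list every link as "source -> target" without running anything.
function describeLinks(graph: Graph): string[] {
  const links: string[] = [];
  Object.keys(graph).forEach((id) => {
    graph[id].deps.forEach((dep) => links.push(`${dep} -> ${id}`));
  });
  return links;
}

// Execution: evaluate a node and record each step in the order it completes.
// Assumes the graph has no cycles.
function evaluateWithTrace(
  graph: Graph,
  target: NodeId
): { value: number; trace: TraceEntry[] } {
  const trace: TraceEntry[] = [];
  const cache: Record<NodeId, number> = {};
  const visit = (id: NodeId): number => {
    const cached = cache[id];
    if (cached !== undefined) return cached;
    const node = graph[id];
    if (!node) throw new Error(`Unknown node: ${id}`);
    const inputs = node.deps.map((dep) => visit(dep));
    const value = node.compute(inputs);
    cache[id] = value;
    trace.push({ id, inputs, value });
    return value;
  };
  return { value: visit(target), trace };
}

const graph: Graph = {
  a: { deps: [], compute: () => 1 },
  b: { deps: [], compute: () => 2 },
  sum: { deps: ["a", "b"], compute: ([a, b]) => a + b },
};

const links = describeLinks(graph); // ["a -> sum", "b -> sum"]
const run = evaluateWithTrace(graph, "sum"); // steps recorded for a, b, then sum
```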

Rendering

On its own, GraphKit Core is layout-agnostic, so you can output static HTML/SVG, or integrate with your preferred front-end library using bindings like gk/react.

Layout bindings are thin adapters that manage how your program is rendered and updated using a particular layout engine.
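To show the shape of the idea, here's a sketch of a layout-agnostic scene rendered by two small adapters. The LayoutBinding interface, the Scene type, and both bindings are assumptions for illustration; they don't describe how gk/react or any official binding works.

```typescript
// Hypothetical sketch: one scene description, two thin rendering adapters.
interface Bar {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface Scene {
  width: number;
  height: number;
  bars: Bar[];
}

// Assumed binding contract: turn a scene into output for one layout context.
// A real binding would call render again whenever the graph produces a new scene.
interface LayoutBinding<Output> {
  render(scene: Scene): Output;
}

// Binding 1: a static SVG string.
const svgBinding: LayoutBinding<string> = {
  render(scene) {
    const rects = scene.bars
      .map(
        (bar) =>
          `<rect x="${bar.x}" y="${bar.y}" width="${bar.width}" height="${bar.height}" />`
      )
      .join("");
    return `<svg xmlns="http://www.w3.org/2000/svg" width="${scene.width}" height="${scene.height}">${rects}</svg>`;
  },
};

// Binding 2: plain element descriptions that a UI library could consume.
interface ElementDescription {
  tag: string;
  attrs: Record<string, number>;
}

const treeBinding: LayoutBinding<ElementDescription[]> = {
  render(scene) {
    return scene.bars.map((bar) => ({
      tag: "rect",
      attrs: { x: bar.x, y: bar.y, width: bar.width, height: bar.height },
    }));
  },
};

const scene: Scene = {
  width: 300,
  height: 100,
  bars: [{ x: 0, y: 50, width: 100, height: 50 }],
};

const svg = svgBinding.render(scene);
const tree = treeBinding.render(scene);
```

The scene knows nothing about SVG or any UI library, which is what keeps the adapters thin.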

Over time, we'd like to extend official binding support to other layout contexts like WebGL, Vue.js, React Native, and more. If you have experience with any of these and want to help, please get in touch!

Roadmap

The top priority right now is getting GraphKit Core stable and ready for public testing as soon as possible. From there, the plan is to begin collecting feedback and expanding the library of available building blocks to support different kinds of visualizations.

I'm also working on a graphical IDE for GraphKit. It started as a way to help me build GraphKit itself, by tracking and visualizing program graphs like the examples above. The IDE provides real-time context about the application as you write and test it, creating a valuable feedback loop for development.

If you'd like to hear when GraphKit is ready, please drop your email below. If you have any questions or comments, you can click here to get in touch, or tweet at me @graphkitapp.

Thanks!