Using Mosaic & DuckDB-WASM

We need to set up Mosaic and DuckDB-WASM to "play nice" with Observable's reactive runtime. Unlike standard JavaScript, which runs top to bottom, the Observable runtime will happily run code "out of order". Observable uses the dependencies among code blocks, rather than their order within the file, to determine what to run and when to run it. This reactivity can cause problems for code that depends on "side effects" that are not tracked by Observable's runtime.
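To make the ordering issue concrete, here is a toy model of dependency-driven evaluation. It is only a sketch and not Observable's actual runtime: runCells, makeCells, and the cell objects are hypothetical names invented for this illustration. Each cell declares the names it reads, and a cell runs only once those names have values. The second half shows how a hidden side effect (a flag the scheduler cannot see) breaks a cell until the dependency is declared explicitly.

```javascript
// Toy scheduler: runs cells when their declared inputs are available,
// regardless of where they appear in the list. Not Observable's real runtime.
function runCells(cells) {
  const values = {};
  const order = [];
  const pending = [...cells];
  while (pending.length > 0) {
    // Find the first cell whose inputs have all been computed.
    const index = pending.findIndex((cell) =>
      cell.inputs.every((name) => name in values)
    );
    if (index < 0) throw new Error("unresolved or circular dependency");
    const [cell] = pending.splice(index, 1);
    values[cell.name] = cell.body(values);
    order.push(cell.name);
  }
  return { values, order };
}

// "total" is written first, but it runs last because it reads "a" and "b".
const result = runCells([
  { name: "total", inputs: ["a", "b"], body: (v) => v.a + v.b },
  { name: "a", inputs: [], body: () => 1 },
  { name: "b", inputs: ["a"], body: (v) => v.a + 1 },
]);
console.log(result.order, result.values.total);

// A side effect the scheduler cannot see: "load" flips a hidden flag.
function makeCells(chartInputs) {
  let tableLoaded = false;
  return [
    {
      name: "chart",
      inputs: chartInputs,
      body: () => (tableLoaded ? "ok" : "table missing"),
    },
    {
      name: "load",
      inputs: [],
      body: () => {
        tableLoaded = true;
        return "done";
      },
    },
  ];
}

// Without a declared dependency, nothing forces "load" to run before "chart".
const untracked = runCells(makeCells([]));
// Declaring the dependency makes the ordering explicit.
const tracked = runCells(makeCells(["load"]));
console.log(untracked.values.chart, tracked.values.chart);
```

In this toy, the untracked chart cell fails only because it happens to come first in the list; in a real reactive runtime the order of independent cells should simply be treated as unspecified, which is why the dependency needs to be visible to the runtime.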

Importing Mosaic and Loading Data

Here is how we initialize Mosaic's vgplot API in the Flight Delays example:

import { vgplot } from "./components/mosaic.js";
const flights = await FileAttachment("data/flights-200k.parquet").url();
const vg = vgplot(vg => [ vg.loadParquet("flights", flights) ]);

We first import a custom vgplot initialization method that configures Mosaic, loads data into DuckDB, and returns the vgplot API.

Next, we reference the data files we plan to load. As Observable Framework needs to track which files are used, we must use its FileAttachment mechanism. However, we don't actually want to load the file yet, so we instead retrieve a corresponding URL.

Finally, we invoke vgplot(...) to initialize Mosaic, which returns a Promise that resolves to an instance of the vgplot API. This function takes a single callback as input, which should return an array of SQL queries to execute for client-side data loading.
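To give a feel for what such a query array might contain, here is a small sketch. The helper parquetLoadQuery and the example URL are hypothetical and not part of Mosaic; the generated text only approximates the kind of statement a Parquet loading helper could produce, using DuckDB's read_parquet table function.

```javascript
// Hypothetical helper (not part of Mosaic): builds a DuckDB statement that
// materializes a Parquet file as a table, escaping quotes in both arguments.
function parquetLoadQuery(table, url) {
  const name = `"${table.replaceAll('"', '""')}"`;
  const file = `'${url.replaceAll("'", "''")}'`;
  return `CREATE TABLE IF NOT EXISTS ${name} AS SELECT * FROM read_parquet(${file})`;
}

// The callback passed to the initializer would return an array like this one.
// The URL is a placeholder, not a real data file.
const queries = [
  parquetLoadQuery("flights", "https://example.com/flights-200k.parquet"),
];
console.log(queries[0]);
```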

The vg argument passed to the data loader callback is exactly the same API instance that vgplot ultimately returns. This may feel a bit circular: vg is handed to a callback, and the final result is a reference to that same vg. Why the gymnastics?

We want to have access to the API to support data loading, using Mosaic's helper functions to install extensions and load data files. At the same time, we don't want to assign the outer vg variable until data loading is complete, ensuring downstream code that uses the API will not be evaluated by the Observable runtime until DuckDB is ready. Once vg is assigned, the data has been loaded and we can evaluate downstream API calls for creating visualizations, inputs, params, and selections.
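The pattern described above can be sketched in isolation with a stand-in API. This is not Mosaic's implementation: createApi, init, and load are hypothetical names, and the timer merely simulates asynchronous database work. The point is that the callback and the caller see the same object, and that the returned Promise does not resolve until loading has finished.

```javascript
// Stand-in API object; `load` simulates an asynchronous data load.
function createApi() {
  const loaded = [];
  return {
    loaded,
    load: (table) =>
      new Promise((resolve) =>
        setTimeout(() => {
          loaded.push(table);
          resolve();
        }, 10)
      ),
  };
}

// Hands the API to the setup callback, waits for all loading work,
// and only then returns the very same API instance.
async function init(setup) {
  const api = createApi();
  if (setup) {
    await Promise.all(setup(api));
  }
  return api;
}

let callbackApi; // captured only to show that both references are identical
const ready = init((api) => {
  callbackApi = api;
  return [api.load("flights")];
});

ready.then((api) => {
  // By the time this runs, "flights" has already been loaded.
  console.log(api === callbackApi, api.loaded);
});
```

Because downstream code can only obtain the API by waiting on the returned Promise, it cannot run before loading completes, which is the same guarantee the outer vg variable provides in the example.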

Mosaic Initialization

For reference, here's the vgplot() method implementation:

import * as vg from "npm:@uwdata/vgplot";

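export async function vgplot(queries) {
  // Get the central coordinator, which manages all queries.
  const mc = vg.coordinator();
  // Create the API context that we will eventually return.
  const api = vg.createAPIContext({ coordinator: mc });
  // Use DuckDB-WASM as the in-browser database.
  mc.databaseConnector(vg.wasmConnector());
  if (queries) {
    // Run the data loading queries and wait for them to finish.
    await mc.exec(queries(api));
  }
  return api;
}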

We first get a reference to the central coordinator, which manages all queries. We then create a new API context, which we will eventually return.

Next, we configure Mosaic to use DuckDB-WASM as an in-browser database. The wasmConnector() method creates a new database instance in a worker thread.

We then invoke the queries callback to get a list of data loading queries. We issue the queries to DuckDB using the coordinator's exec() method and await the result.

Once that completes, we're ready to go!