Flight Delays

Interactive exploration of large-scale transportation data

What contributes to delayed airline flights? Let's examine a sample of over 200,000 flight records provided by the U.S. DOT Bureau of Transportation Statistics.

We use Mosaic vgplot to create scalable, interactive visualizations. Mosaic loads data from a Parquet file into DuckDB-WASM, running in the browser. Mosaic queries the database to transform data as part of the visualization process.

Cross-Filtered Histograms

The histograms below visualize the arrival delay, departure time, and distance flown. Select a region in any histogram to cross-filter the charts. How well do departure time and distance predict whether a flight is late? What predicts a flight being early?

When a selection changes, we need to filter the data and recount the number of records in each bin. The Mosaic system analyzes these queries and automatically optimizes updates by building indexes of pre-aggregated data ("data cubes") in the database, binned at the level of input pixels for the currently active view.
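To make the pre-aggregation idea concrete, here is a minimal Python sketch. It is not Mosaic's implementation (Mosaic builds these indexes with queries inside the database); the function names, bin sizes, and toy records are all assumptions chosen for illustration. The cube stores one count per pair of (bin in the chart being brushed, bin in the chart being updated), so a new selection only needs to sum a few pre-computed cells instead of rescanning every record.

```python
from collections import Counter

def bin_index(value, lo, step):
    """Index of the fixed-width bin that contains value."""
    return int((value - lo) // step)

def build_cube(records, active, target):
    """Pre-aggregate counts keyed by (active bin, target bin).

    active and target are (field, lo, step) tuples: hypothetical
    descriptions of how each histogram bins its field.
    """
    cube = Counter()
    for rec in records:
        a = bin_index(rec[active[0]], active[1], active[2])
        t = bin_index(rec[target[0]], target[1], target[2])
        cube[(a, t)] += 1
    return cube

def filtered_counts(cube, a_lo, a_hi):
    """Recount target bins for a brush covering active bins a_lo..a_hi."""
    out = Counter()
    for (a, t), n in cube.items():
        if a_lo <= a <= a_hi:
            out[t] += n
    return out

# Toy records (illustrative values only, not taken from the dataset).
flights = [
    {"time": 6.5, "delay": -5},
    {"time": 9.0, "delay": 12},
    {"time": 17.25, "delay": 45},
    {"time": 18.0, "delay": 50},
    {"time": 21.5, "delay": 95},
]

# Brush on departure time (1-hour bins), update the delay histogram (20-minute bins).
cube = build_cube(flights, ("time", 0, 1), ("delay", -60, 20))
evening = filtered_counts(cube, 17, 21)  # selection covering 17:00 through 21:59
```

The cube's size depends on the number of bins, not the number of records, which is why this kind of update can stay fast as the data grows.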

While 200,000 points will stress many web-based visualization tools, Mosaic doesn't break a sweat. Now go ahead and try this with 10 million records!

Density Hexbins

The histograms above provide a useful first look at the data. However, to discover relationships among the variables, we had to explore interactively. Instead of "hiding" patterns behind interactions, let's visualize relationships directly.

Below we use hexagonal bins to visualize the density (number of flights) by both time of day and arrival delay. Interactive histograms along the edges show marginal distributions for both.
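For readers curious how a point ends up in a hexagon, the following Python sketch shows one common approach. It is an illustration under stated assumptions, not the code Mosaic runs: the function name, the pointy-top orientation, and the radius parameter `r` are choices made for this example. A hexagonal grid of centers is the union of two offset rectangular grids, so we find the nearest center in each by rounding and keep the closer of the two.

```python
import math
from collections import Counter

def hexbin_index(x, y, r):
    """Return (column, row) of the pointy-top hexagon of radius r containing (x, y).

    Row j has centers at x = dx * (i + 0.5 * (j % 2)), y = dy * j.
    """
    dx = math.sqrt(3) * r  # horizontal spacing between centers
    dy = 1.5 * r           # vertical spacing between rows

    # Nearest center among the even rows.
    ja = 2 * round(y / (2 * dy))
    ia = round(x / dx)
    ax, ay = ia * dx, ja * dy

    # Nearest center among the odd rows (shifted half a column).
    jb = 2 * round((y - dy) / (2 * dy)) + 1
    ib = round(x / dx - 0.5)
    bx, by = (ib + 0.5) * dx, jb * dy

    # The closer of the two candidates owns the point.
    if (x - ax) ** 2 + (y - ay) ** 2 <= (x - bx) ** 2 + (y - by) ** 2:
        return (ia, ja)
    return (ib, jb)

def hexbin_counts(points, r):
    """Count points per hexagon; points are (x, y) pairs in screen units."""
    return Counter(hexbin_index(x, y, r) for x, y in points)
```

In practice the inputs would be pixel coordinates obtained by scaling time of day and arrival delay, so that hexagons look regular on screen.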

We can see right away that flights are more likely to be delayed if they leave later in the day. Delays may accrue as a single plane flies from airport to airport.

The number of records in a hexbin varies from 0 to over 2,000, spanning multiple orders of magnitude. To see these orders more clearly, we default to a logarithmic color scale. Try adjusting the color scale menu to see the effects of different choices.
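A small Python sketch shows why the logarithmic scale helps. The two functions below are hypothetical helpers for this example (not part of any plotting library) that map a bin count to a position between 0 and 1 along a color ramp, assuming the largest bin holds 2,000 records.

```python
import math

def linear_position(count, max_count):
    """Position along the color ramp under a linear scale."""
    return count / max_count

def log_position(count, max_count):
    """Position along the color ramp under a log scale.

    Counts below 1 have no logarithm, so empty bins return None
    (here taken to mean "leave the bin undrawn").
    """
    if count < 1:
        return None
    return math.log(count) / math.log(max_count)

max_count = 2000  # assumed maximum, matching the "over 2,000" figure loosely
for count in (2, 20, 200, 2000):
    print(count, round(linear_position(count, max_count), 3),
          round(log_position(count, max_count), 3))
```

Under the linear scale, a bin with 20 flights sits at 1% of the ramp and is nearly indistinguishable from an empty one; under the log scale each tenfold increase in count moves the same distance along the ramp, so sparse and dense regions are both legible.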

Density Heatmaps

For finer-grained detail, we can bin all the way down to the level of individual pixels.
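Pixel-level binning can be sketched in a few lines of Python. This is a simplified illustration, assuming hypothetical names and a plain nested list as the raster; a real system would do this aggregation in the database. Each point is scaled from its data domain to a pixel column and row, and the matching cell's count is incremented.

```python
def rasterize(points, x_domain, y_domain, width, height):
    """Count points per pixel in a width x height grid.

    x_domain and y_domain are (min, max) pairs; points outside
    the half-open domains [min, max) are skipped.
    """
    x0, x1 = x_domain
    y0, y1 = y_domain
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        if not (x0 <= x < x1 and y0 <= y < y1):
            continue
        px = int((x - x0) / (x1 - x0) * width)   # pixel column
        py = int((y - y0) / (y1 - y0) * height)  # pixel row
        grid[py][px] += 1
    return grid

# Toy example: (hour of day, delay) pairs on a tiny 4 x 2 raster.
points = [(0.0, 0.0), (0.1, 0.1), (23.9, 99.0), (30.0, 10.0)]
grid = rasterize(points, (0, 24), (0, 100), width=4, height=2)
```

When the pixel grid is finer than the precision of the recorded values, some rows or columns can never receive a point, which is one way striping like that described below can arise.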

The result is a raster, or heatmap, view. We now see some striping, which reveals that data values are truncated to a limited precision. As before, we can use interactive selections to cross-filter the charts.