arquero

Arquero API Reference

Top-Level Table Verbs Op Functions Expressions Extensibility


Table Constructors

Methods for creating new table instances.


# aq.table(columns[, names]) · Source

Create a new table for a set of named columns, optionally including an array of ordered column names. The columns input can be an object or Map with names for keys and columns for values, or an entry array of [name, values] tuples.

JavaScript objects have specific key ordering rules: keys are enumerated in the order they are assigned, except for integer keys, which are enumerated first in sorted order. As a result, when using a standard object any columns entries with integer keys are listed first regardless of their order in the object definition. Use the names argument to ensure proper column ordering is respected. Map and entry arrays will preserve name ordering, in which case the names argument is only needed if you wish to specify an ordering different from the columns input.

To bind together columns from multiple tables with the same number of rows, use the table assign method. To transform the table, use the various verb methods.

Examples

// create a new table with 2 columns and 3 rows
aq.table({ colA: ['a', 'b', 'c'], colB: [3, 4, 5] })
// create a new table, preserving column order for integer names
aq.table({ key: ['a', 'b'], 1: [9, 8], 2: [7, 6] }, ['key', '1', '2'])
// create a new table from a Map instance
const map = new Map()
  .set('colA', ['a', 'b', 'c'])
  .set('colB', [3, 4, 5]);
aq.table(map)

# aq.from(values[, names]) · Source

Create a new table from an existing object, such as an array of objects or a set of key-value pairs.

Examples

// from an array, create a new table with two columns and two rows
// akin to table({ colA: [1, 3], colB: [2, 4] })
aq.from([ { colA: 1, colB: 2 }, { colA: 3, colB: 4 } ])
// from an object, create a new table with 'key' and 'value columns
// akin to table({ key: ['a', 'b', 'c'], value: [1, 2, 3] })
aq.from({ a: 1, b: 2, c: 3 })
// from a Map, create a new table with 'key' and 'value' columns
// akin to table({ key: ['d', 'e', 'f'], value: [4, 5, 6] })
aq.from(new Map([ ['d', 4], ['e', 5], ['f', 6] ])

# aq.fromArrow(arrowTable[, options]) · Source

Create a new table backed by an Apache Arrow table instance. The input arrowTable can either be an instantiated Arrow table instance or a byte array in the Arrow IPC format.

For most data types, Arquero uses binary-encoded Arrow columns as-is with zero data copying. For columns containing string, list (array), or struct values, Arquero additionally memoizes value lookups to amortize access costs. For dictionary columns, Arquero unpacks columns with null entries or containing multiple record batches to optimize query performance.

This method performs parsing only. To both load and parse an Arrow file, use loadArrow.

Examples

// encode input array-of-objects data as an Arrow table
const arrowTable = aq.toArrow([
  { x: 1, y: 3.4 },
  { x: 2, y: 1.6 },
  { x: 3, y: 5.4 },
  { x: 4, y: 7.1 },
  { x: 5, y: 2.9 }
]);

// now access the Arrow-encoded data as an Arquero table
const dt = aq.fromArrow(arrowTable);

# aq.fromCSV(text[, options]) · Source

Parse a comma-separated values (CSV) text string into a table. Delimiters other than commas, such as tabs or pipes (‘|’), can be specified using the options argument. By default, automatic type inference is performed for input values; string values that match the ISO standard date format are parsed into JavaScript Date objects. To disable this behavior set options.autoType to false, which will cause all columns to be loaded as strings. To perform custom parsing of input column values, use options.parse.

This method performs parsing only. To both load and parse a CSV file, use loadCSV.

Examples

// create table from an input CSV string
// akin to table({ a: [1, 3], b: [2, 4] })
aq.fromCSV('a,b\n1,2\n3,4')
// skip commented lines
aq.fromCSV('# a comment\na,b\n1,2\n3,4', { comment: '#' })
// skip the first line
aq.fromCSV('# a comment\na,b\n1,2\n3,4', { skip: 1 })
// override autoType with custom parser for column 'a'
// akin to table({ a: ['00152', '30219'], b: [2, 4] })
aq.fromCSV('a,b\n00152,2\n30219,4', { parse: { a: String } })
// parse semi-colon delimited text with comma as decimal separator
aq.fromCSV('a;b\nu;-1,23\nv;3,45e5', { delimiter: ';', decimal: ',' })
// create table from an input CSV loaded from 'url'
// alternatively, use the loadCSV method
aq.fromCSV(await fetch(url).then(res => res.text()))

# aq.fromFixed(text[, options]) · Source

Parse a fixed-width file text string into a table. By default, automatic type inference is performed for input values; string values that match the ISO standard date format are parsed into JavaScript Date objects. To disable this behavior set options.autoType to false, which will cause all columns to be loaded as strings. To perform custom parsing of input column values, use options.parse.

This method performs parsing only. To both load and parse a fixed-width file, use loadFixed.

Examples

// create table from an input fixed-width string
// akin to table({ u: ['a', 'b'], v: [1, 2] })
aq.fromFixed('a1\nb2', { widths: [1, 1], names: ['u', 'v'] })

# aq.fromJSON(data) · Source

Parse JavaScript Object Notation (JSON) data into a table. String values in JSON column arrays that match the ISO standard date format are parsed into JavaScript Date objects. To disable this behavior, set options.autoType to false. To perform custom parsing of input column values, use options.parse. Auto-type Date parsing is not performed for columns with custom parse options.

This method performs parsing only. To both load and parse a JSON file, use loadJSON.

The expected JSON data format is an object with column names for keys and column value arrays for values, like so:

{
  "colA": ["a", "b", "c"],
  "colB": [1, 2, 3]
}

The data payload can also be provided as the data property of an enclosing object, with an optional schema property containing table metadata such as a fields array of ordered column information:

{
  "schema": {
    "fields": [
      { "name": "colA" },
      { "name": "colB" }
    ]
  },
  "data": {
    "colA": ["a", "b", "c"],
    "colB": [1, 2, 3]
  }
}

Examples

// create table from an input JSON string
// akin to table({ a: [1, 3], b: [2, 4] })
aq.fromJSON('{"a":[1,3],"b":[2,4]}')
// create table from an input JSON string
// akin to table({ a: [1, 3], b: [2, 4] }, ['a', 'b'])
aq.fromJSON(`{
  "schema":{"fields":[{"name":"a"},{"name":"b"}]},
  "data":{"a":[1,3],"b":[2,4]}
}`)
// create table from an input JSON string loaded from 'url'
aq.fromJSON(await fetch(url).then(res => res.text()))
// create table from an input JSON object loaded from 'url'
// disable autoType Date parsing
aq.fromJSON(await fetch(url).then(res => res.json()), { autoType: false })


Table Input

Methods for loading files and creating new table instances.


# aq.load(url[, options]) · Source

Load data from a url or file and return a Promise for a table. A specific format parser can be provided with the using option, otherwise CSV format is assumed. This method’s options are also passed as the second argument to the format parser.

When invoked in the browser, the Fetch API is used to load the url. When invoked in node.js, the url argument can also be a local file path. If the input url string has a network protocol at the beginning (e.g., 'http://', 'https://', etc.) it is treated as a URL and the node-fetch library is used. If the 'file://' protocol is used, the rest of the string should be an absolute file path, from which a local file is loaded. Otherwise the input is treated as a path to a local file and loaded using the node.js fs module.

This method provides a generic base for file loading and parsing, and can be used to customize table parsing. To load CSV data, use the more specific loadCSV method. Similarly,to load JSON data use loadJSON and to load Apache Arrow data use loadArrow.

Examples

// load table from a CSV file
const dt = await aq.load('data/table.csv', { using: aq.fromCSV });
// load table from a JSON file with column format
const dt = await aq.load('data/table.json', { as: 'json', using: aq.fromJSON })
// load table from a JSON file with array-of-objects format
const dt = await aq.load('data/table.json', { as: 'json', using: aq.from })

# aq.loadArrow(url[, options]) · Source

Load a file in the Apache Arrow IPC format from a url and return a Promise for a table.

This method performs both loading and parsing, and is equivalent to aq.load(url, { as: 'arrayBuffer', using: aq.fromArrow }). To instead create an Arquero table for an Apache Arrow dataset that has already been loaded, use fromArrow.

When invoked in the browser, the Fetch API is used to load the url. When invoked in node.js, the url argument can also be a local file path. If the input url string has a network protocol at the beginning (e.g., 'http://', 'https://', etc.) it is treated as a URL and the node-fetch library is used. If the 'file://' protocol is used, the rest of the string should be an absolute file path, from which a local file is loaded. Otherwise the input is treated as a path to a local file and loaded using the node.js fs module.

Examples

// load table from an Apache Arrow file
const dt = await aq.loadArrow('data/table.arrow');

# aq.loadCSV(url[, options]) · Source

Load a comma-separated values (CSV) file from a url and return a Promise for a table. Delimiters other than commas, such as tabs or pipes (‘|’), can be specified using the options argument. By default, automatic type inference is performed for input values; string values that match the ISO standard date format are parsed into JavaScript Date objects. To disable this behavior set options.autoType to false, which will cause all columns to be loaded as strings. To perform custom parsing of input column values, use options.parse.

This method performs both loading and parsing, and is equivalent to aq.load(url, { using: aq.fromCSV }). To instead parse a CSV string that has already been loaded, use fromCSV.

When invoked in the browser, the Fetch API is used to load the url. When invoked in node.js, the url argument can also be a local file path. If the input url string has a network protocol at the beginning (e.g., 'http://', 'https://', etc.) it is treated as a URL and the node-fetch library is used. If the 'file://' protocol is used, the rest of the string should be an absolute file path, from which a local file is loaded. Otherwise the input is treated as a path to a local file and loaded using the node.js fs module.

Examples

// load table from a CSV file
const dt = await aq.loadCSV('data/table.csv');
// load table from a tab-delimited file
const dt = await aq.loadCSV('data/table.tsv', { delimiter: '\t' })

# aq.loadFixed(url[, options]) · Source

Load a fixed-width file from a url and return a Promise for a table. By default, automatic type inference is performed for input values; string values that match the ISO standard date format are parsed into JavaScript Date objects. To disable this behavior set options.autoType to false, which will cause all columns to be loaded as strings. To perform custom parsing of input column values, use options.parse.

This method performs both loading and parsing, and is equivalent to aq.load(url, { using: aq.fromFixed }). To instead parse a fixed width string that has already been loaded, use fromFixed.

When invoked in the browser, the Fetch API is used to load the url. When invoked in node.js, the url argument can also be a local file path. If the input url string has a network protocol at the beginning (e.g., 'http://', 'https://', etc.) it is treated as a URL and the node-fetch library is used. If the 'file://' protocol is used, the rest of the string should be an absolute file path, from which a local file is loaded. Otherwise the input is treated as a path to a local file and loaded using the node.js fs module.

Examples

// load table from a fixed-width file
const dt = await aq.loadFixed('a1\nb2', { widths: [1, 1], names: ['u', 'v'] });

# aq.loadJSON(url[, options]) · Source

Load a JavaScript Object Notation (JSON) file from a url and return a Promise for a table. If the loaded JSON is array-valued, an array-of-objects format is assumed and the from method is used to construct the table. Otherwise, a column object format (as produced by toJSON) is assumed and the fromJSON method is applied.

This method performs both loading and parsing, similar to aq.load(url, { as: 'json', using: aq.fromJSON }). To instead parse JSON data that has already been loaded, use either from or fromJSON.

When parsing using fromJSON, string values in JSON column arrays that match the ISO standard date format are parsed into JavaScript Date objects. To disable this behavior, set options.autoType to false. To perform custom parsing of input column values, use options.parse. Auto-type Date parsing is not performed for columns with custom parse options.

When invoked in the browser, the Fetch API is used to load the url. When invoked in node.js, the url argument can also be a local file path. If the input url string has a network protocol at the beginning (e.g., 'http://', 'https://', etc.) it is treated as a URL and the node-fetch library is used. If the 'file://' protocol is used, the rest of the string should be an absolute file path, from which a local file is loaded. Otherwise the input is treated as a path to a local file and loaded using the node.js fs module.

Examples

// load table from a JSON file
const dt = await aq.loadJSON('data/table.json');
// load table from a JSON file, disable Date autoType
const dt = await aq.loadJSON('data/table.json', { autoType: false })


Table Output

Methods for writing table data to an output format. Most output methods are defined as table methods, not in the top level namespace.


# aq.toArrow(data[, options]) · Source

Create an Apache Arrow table for the input data. The input data can be either an Arquero table or an array of standard JavaScript objects. This method will throw an error if type inference fails or if the generated columns have differing lengths. For Arquero tables, this method can instead be invoked as table.toArrow().

Examples

Encode Arrow data from an input Arquero table:

import { table, toArrow } from 'arquero';
import { Type } from 'apache-arrow';

// create Arquero table
const dt = table({
  x: [1, 2, 3, 4, 5],
  y: [3.4, 1.6, 5.4, 7.1, 2.9]
});

// encode as an Arrow table (infer data types)
// here, infers Uint8 for 'x' and Float64 for 'y'
// equivalent to dt.toArrow()
const at1 = toArrow(dt);

// encode into Arrow table (set explicit data types)
// equivalent to dt.toArrow({ types: { ... } })
const at2 = toArrow(dt, {
  types: {
    x: Type.Uint16,
    y: Type.Float32
  }
});

// serialize Arrow table to a transferable byte array
const bytes = at1.serialize();

Encode Arrow data from an input object array:

import { toArrow } from 'arquero';

// encode object array as an Arrow table (infer data types)
const at = toArrow([
  { x: 1, y: 3.4 },
  { x: 2, y: 1.6 },
  { x: 3, y: 5.4 },
  { x: 4, y: 7.1 },
  { x: 5, y: 2.9 }
]);


Expression Helpers

Methods for invoking or modifying table expressions.


# aq.op · Source

All table expression operations, including standard functions, aggregate functions, and window functions. See the Operations API Reference for documentation of all available functions.


# aq.agg(table, expression) · Source

Compute a single aggregate value for a table. This method is a convenient shortcut for ungrouping a table, applying a rollup verb for a single aggregate expression, and extracting the resulting aggregate value.

Examples

aq.agg(aq.table({ a: [1, 2, 3] }), op.max('a')) // 3
aq.agg(aq.table({ a: [1, 3, 5] }), d => [op.min(d.a), op.max('a')]) // [1, 5]

# aq.escape(value) · Source

Annotate a JavaScript function or value to bypass Arquero’s default table expression handling. Escaped values enable the direct use of JavaScript functions to process row data: no internal parsing or code generation is performed, and so closures and arbitrary function invocations are supported. Escaped values provide a lightweight alternative to table params and function registration to access variables in enclosing scopes.

An escaped value can be applied anywhere Arquero accepts single-table table expressions, including the derive, filter, and spread verbs. In addition, any of the standard op functions can be used within an escaped function. However, aggregate and window op functions are not supported. Also note that using escaped values will break serialization of Arquero queries to worker threads.

Examples

// filter based on a variable defined in the enclosing scope
const thresh = 5;
aq.table({ a: [1, 4, 9], b: [1, 2, 3] })
  .filter(aq.escape(d => d.a < thresh))
  // { a: [1, 4], b: [1, 2] }
// apply a parsing function defined in the enclosing scope
const parseMDY = d3.timeParse('%m/%d/%Y');
aq.table({ date: ['1/1/2000', '06/01/2010', '12/10/2020'] })
  .derive({ date: aq.escape(d => parseMDY(d.date)) })
  // { date: [new Date(2000,0,1), new Date(2010,5,1), new Date(2020,11,10)] }
// spread results from an escaped function that returns an array
const denom = 4;
aq.table({ a: [1, 4, 9] })
  .spread(
    { a: aq.escape(d => [Math.floor(d.a / denom), d.a % denom]) },
    { as: ['div', 'mod'] }
  )
  // { div: [0, 1, 2], mod: [1, 0, 1] }

# aq.bin(name[, options]) · Source

Generate a table expression that performs uniform binning of number values. The resulting string can be used as part of the input to table transformation verbs.

Examples

 aq.bin('colA', { maxbins: 20 })

# aq.desc(expr) · Source

Annotate a table expression (expr) to indicate descending sort order.

Examples

// sort colA in descending order
aq.desc('colA')
// sort colA in descending order of lower case values
aq.desc(d => op.lower(d.colA))

# aq.frac(fraction) · Source

Generate a table expression that computes the number of rows corresponding to a given fraction for each group. The resulting string can be used as part of the input to the sample verb.

Examples

 aq.frac(0.5)

# aq.rolling(expr[, frame, includePeers]) · Source

Annotate a table expression to compute rolling aggregate or window functions within a sliding window frame. For example, to specify a rolling 7-day average centered on the current day, call rolling with a frame value of [-3, 3].

Examples

// cumulative sum, with an implicit frame of [-Infinity, 0]
aq.rolling(d => op.sum(d.colA))
// centered 7-day moving average, assuming one value per day
aq.rolling(d => op.mean(d.colA), [-3, 3])
// retrieve last value in window frame, including peers (ties)
aq.rolling(d => op.last_value(d.colA), [-3, 3], true)

# aq.seed(value) · Source

Set a seed value for random number generation. If the seed is a valid number, a 32-bit linear congruential generator with the given seed will be used to generate random values. If the seed is null, undefined, or not a valid number, the random number generator will revert to Math.random.

Examples

// set random seed as an integer
aq.seed(12345)
// set random seed as a fraction, maps to floor(fraction * (2 ** 32))
aq.seed(0.5)
// revert to using Math.random
aq.seed(null)


Selection Helpers

Methods for selecting columns. The result of these methods can be passed as arguments to select, groupby, join and other transformation verbs.


# aq.all() · Source

Select all columns in a table. Returns a function-valued selection compatible with select.

Examples

aq.all()

# aq.not(selection) · Source

Negate a column selection, selecting all other columns in a table. Returns a function-valued selection compatible with select.

Examples

aq.not('colA', 'colB')
aq.not(aq.range(2, 5))

# aq.range(start, stop) · Source

Select a contiguous range of columns. Returns a function-valued selection compatible with select.

Examples

aq.range('colB', 'colE')
aq.range(2, 5)

# aq.matches(pattern) · Source

Select all columns whose names match a pattern. Returns a function-valued selection compatible with select.

Examples

// contains the string 'col'
aq.matches('col')
// has 'a', 'b', or 'c' as the first character (case-insensitve)
aq.matches(/^[abc]/i)

# aq.startswith(string) · Source

Select all columns whose names start with a string. Returns a function-valued selection compatible with select.

Examples

aq.startswith('prefix_')

# aq.endswith(string) · Source

Select all columns whose names end with a string. Returns a function-valued selection compatible with select.

Examples

aq.endswith('_suffix')

# aq.names(…names) · Source

Select columns by index and rename them to the provided names. Returns a selection helper function that takes a table as input and produces a rename map as output. If the number of provided names is less than the number of table columns, the rename map will include entries for the provided names only. If the number of table columns is less than then number of provided names, the rename map will include only entries that cover the existing columns.

Examples

// helper to rename the first three columns to 'a', 'b', 'c'
aq.names('a', 'b', 'c')
// names can also be passed as arrays
aq.names(['a', 'b', 'c'])
// rename the first three columns, all other columns remain as-is
table.rename(aq.names(['a', 'b', 'c']))
// select and rename the first three columns, all other columns are dropped
table.select(aq.names(['a', 'b', 'c']))


Queries

Queries allow deferred processing. Rather than process a sequence of verbs immediately, they can be stored as a query. The query can then be serialized to be stored or transferred, or later evaluated against an Arquero table.


# aq.query([tableName]) · Source

Create a new query builder instance. The optional tableName string argument indicates the default name of a table the query should process, and is used only when evaluating a query against a catalog of tables. The resulting query builder includes the same verb methods as a normal Arquero table. However, rather than evaluating verbs immediately, they are stored as a list of verbs to be evaluated later.

The method query.evaluate(table, catalog) will evaluate the query against an Arquero table. If provided, the optional catalog argument should be a function that takes a table name string as input and returns a corresponding Arquero table instance. The catalog will be used to lookup tables referenced by name for multi-table operations such as joins, or to lookup the primary table to process when the table argument to evaluate is null or undefined.

Use the query toObject() method to serialize a query to a JSON-compatible object. Use the top-level queryFrom method to parse a serialized query and return a new “live” query instance.

Examples

// create a query, then evaluate it on an input table
const q = aq.query()
  .derive({ add1: d => d.value + 1 })
  .filter(d => d.add1 > 5 );

const t = q.evaluate(table);
// serialize a query to a JSON-compatible object
// the query can be reconstructed using aq.queryFrom
aq.query()
  .derive({ add1: d => d.value + 1 })
  .filter(d => d.add1 > 5 )
  .toObject();

# aq.queryFrom(object) · Source

Parse a serialized query object and return a new query instance. The input object should be a serialized query representation, such as those generated by the query toObject() method.

Examples

// round-trip a query to a serialized form and back again
aq.queryFrom(
  aq.query()
    .derive({ add1: d => d.value + 1 })
    .filter(d => d.add1 > 5 )
    .toObject()
)