Command line tools for manipulating structured text data

Update: This post has been converted to a Git repository and greatly expanded at https://github.com/dbohdan/structured-text-tools.

What follows is a list of text-based file formats and the command line tools for Linux that manipulate each.

| Name | Programming language | Database engine | Features | Usage link | License |
| --- | --- | --- | --- | --- | --- |
| q | Python | SQLite 3 | Use header row for column names, custom input and output encoding, gzipped input, custom input field separator (string literal), custom output field separator, custom output formatting, table JOINs, Python module. | Usage | GNU GPL 3 |
| sqawk | C | SQLite 3 | Use header row for column names, column name aliases, can skip lines until a regexp matches, custom input field separator (string literal, per-file), keep SQLite file, show generated SQL, table JOINs. | Usage | ? |
| Sqawk | Tcl | SQLite 3 | Use header row for column names, custom input field separator (regexp, per-file), custom input record delimiter (regexp, per-file), custom table names, custom output field separator, custom output record separator, merge selected columns into one, CSV input and output, Tcl output, table JOINs. | Usage | MIT |
| Squawk | Python | Custom SQL interpreter | Access log and CSV input, JSON and CSV output, Python code generation. | | Three-clause BSD |
| termsql | Python | SQLite 3 | Use header rows for column names, custom field separator (regexp), custom record separator (string literal), lines as columns, skip a given number of lines at the beginning and at the end, merge selected columns into one, HTML, CSV, SQL and Tcl output. | Manual | MIT |
| textql | Go | SQLite 3 | Use header rows for column names, keep SQLite file, custom input field separator (string literal). | Usage | MIT |
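
Most of the tools listed above share one idea: load delimited text into an SQLite table, name the columns after the header row, and run SQL against it. The following is a minimal Python sketch of that idea, not the implementation of any listed tool. The function name `query_delimited`, the default table name `t`, and the sample data are all made up for illustration.

```python
import csv
import io
import sqlite3


def query_delimited(text, sql, table="t", delimiter=","):
    """Load delimited text into an in-memory SQLite table and run one query.

    The first row is used as the header that supplies the column names.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    # Quote identifiers so header names containing spaces still work.
    columns = ", ".join('"{}"'.format(name.replace('"', '""')) for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute('CREATE TABLE "{}" ({})'.format(table, columns))
        conn.executemany(
            'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), data
        )
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Hypothetical sample input.
SAMPLE = "name,age\nalice,30\nbob,25\n"

if __name__ == "__main__":
    query = "SELECT name FROM t WHERE CAST(age AS INTEGER) > 26"
    for row in query_delimited(SAMPLE, query):
        print(row)
```

Every value is stored as text here, which is why the query casts `age` explicitly. The real tools may handle type detection differently.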

  • XMLStarlet

  • xml2 — convert XML and HTML to and from flat, greppable lists of “path=value” statements.
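
To illustrate what a flat, greppable "path=value" representation of XML looks like, here is a rough Python sketch. It only approximates the idea; the exact output format of xml2 may differ. The function name `flatten_xml`, the `@` prefix for attributes, and the sample document are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def flatten_xml(xml_text):
    """Return a list of "path=value" lines for an XML document.

    Simplified: text that follows a child element (tail text) is ignored.
    """
    lines = []

    def walk(element, path):
        current = "{}/{}".format(path, element.tag)
        # Attributes become their own lines under an "@name" path component.
        for name, value in element.attrib.items():
            lines.append("{}/@{}={}".format(current, name, value))
        text = (element.text or "").strip()
        if text:
            lines.append("{}={}".format(current, text))
        elif len(element) == 0 and not element.attrib:
            # Empty leaf element: emit the bare path so it is still greppable.
            lines.append(current)
        for child in element:
            walk(child, current)

    walk(ET.fromstring(xml_text), "")
    return lines


# Hypothetical sample document.
SAMPLE = '<config><server port="8080">example.org</server><debug/></config>'

if __name__ == "__main__":
    print("\n".join(flatten_xml(SAMPLE)))
```

Each line of the result can then be filtered with ordinary line-oriented tools such as grep.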

See also: “Grep and Sed Equivalent for XML Command Line Processing” on Stack Overflow.

Using jq with a format converter (like mine) appears to be the best option.
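
As a sketch of the converter half of that approach, the following Python function turns a document into JSON that jq could then consume. It assumes, purely for illustration, that the source format is INI; it is not the converter mentioned above, and the function name `ini_to_json` and the sample data are hypothetical.

```python
import configparser
import json


def ini_to_json(ini_text):
    """Convert INI text to a JSON string: one object per section.

    Note: configparser lowercases option names and keeps all values as strings.
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    data = {section: dict(parser[section]) for section in parser.sections()}
    return json.dumps(data, indent=2, sort_keys=True)


# Hypothetical sample input.
SAMPLE = "[server]\nhost = example.org\nport = 8080\n"

if __name__ == "__main__":
    print(ini_to_json(SAMPLE))
```

Piping the printed JSON into jq with a filter such as `.server.port` would then extract a single value.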

  • Augeas — can extract data from and modify a number of file formats. However, not all formats are equally well supported by Augeas, and for some formats only a limited subset of all valid files can be parsed.

| Name | Description | File format |
| --- | --- | --- |
| GNU Recutils | “[A] set of tools and libraries to access human-editable, plain text databases called recfiles.” | Text-based, roughly “key: value” |
| SDB | “[A] simple string key/value database based on djb’s cdb disk storage and supports JSON and arrays introspection.” | Binary |
| sqlite3(1) | “[A] simple command-line utility […] that allows the user to manually enter and execute SQL statements against an SQLite database.” | Binary |
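
To make the “key: value” description of recfiles more concrete, here is a minimal Python sketch that parses a simplified subset of such a file: records separated by blank lines, one `Name: value` field per line, and `#` comment lines. It does not use GNU Recutils and ignores features of the real format such as continuation lines and record descriptors. The function name `parse_records` and the sample data are made up for illustration.

```python
def parse_records(text):
    """Parse simplified "key: value" records separated by blank lines.

    Each record is a list of (name, value) pairs rather than a dict,
    because a field name may appear more than once in a record.
    """
    records = []
    current = []
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # comment line
        if not line.strip():
            # A blank line ends the current record, if there is one.
            if current:
                records.append(current)
                current = []
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError("not a 'key: value' line: {!r}".format(line))
        current.append((name.strip(), value.strip()))
    if current:
        records.append(current)
    return records


# Hypothetical sample data.
SAMPLE = (
    "# Contacts\n"
    "Name: Ada\n"
    "Email: ada@example.org\n"
    "\n"
    "Name: Bob\n"
    "Email: bob@example.org\n"
    "Email: bob@example.com\n"
)

if __name__ == "__main__":
    for record in parse_records(SAMPLE):
        print(record)
```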