Command line tools for manipulating structured text data


Update: This post has been converted to a Git repository and greatly expanded at https://github.com/dbohdan/structured-text-tools.

What follows is a list of text-based file formats, each with command line tools for Linux that can manipulate it.

CSV, TSV

Name | Programming language | Database engine | Features | Usage link | License |
q | Python | SQLite 3 | Use header row for column names, custom input and output encoding, gzipped input, custom input field separator (string literal), custom output field separator, custom output formatting, table JOINs, Python module. | Usage | GNU GPL 3 |
sqawk | C | SQLite 3 | Use header row for column names, column name aliases, can skip lines until a regexp matches, custom input field separator (string literal, per-file), keep SQLite file, show generated SQL, table JOINs. | Usage | ? |
Sqawk | Tcl | SQLite 3 | Use header row for column names, custom input field separator (regexp, per-file), custom input record delimiter (regexp, per-file), custom table names, custom output field separator, custom output record separator, merge selected columns into one, CSV input and output, Tcl output, table JOINs. | Usage | MIT |
Squawk | Python | Custom SQL interpreter | Access log and CSV input, JSON and CSV output, Python code generation. | — | Three-clause BSD |
termsql | Python | SQLite 3 | Use header row for column names, custom field separator (regexp), custom record separator (string literal), lines as columns, skip a given number of lines at the beginning and at the end, merge selected columns into one, HTML, CSV, SQL and Tcl output. | Manual | MIT |
textql | Go | SQLite 3 | Use header row for column names, keep SQLite file, custom input field separator (string literal). | Usage | MIT |
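
Most of the tools in this table share one idea: load delimited text into an SQLite table, then run SQL against it. The following is a minimal Python sketch of that idea, not the implementation of any tool listed above. The function name `query_csv`, the default table name `t`, and the sample data are made up for illustration.

```python
import csv
import io
import sqlite3


def query_csv(csv_text, sql, table="t"):
    """Load CSV text into an in-memory SQLite table and run one SQL query.

    The first CSV row is used for the column names. `table` is a
    hypothetical default table name chosen for this sketch.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]

    conn = sqlite3.connect(":memory:")
    # Quote identifiers so that header names with spaces still work.
    columns = ", ".join('"{}"'.format(name.replace('"', '""')) for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute('CREATE TABLE "{}" ({})'.format(table, columns))
    conn.executemany(
        'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), data
    )
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


# Illustrative sample data only.
sample = "name,lang\nq,Python\ntextql,Go\ntermsql,Python\n"
print(query_csv(sample, "SELECT name FROM t WHERE lang = 'Python' ORDER BY name"))
```

All values are stored as text in this sketch. The real tools differ in how they detect types, separators and encodings, as the feature column shows.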

XML, HTML

  • XMLStarlet

  • xml2 — convert XML and HTML to and from flat, greppable lists of “path=value” statements.
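
To show what a flat, greppable “path=value” list looks like, here is a rough Python sketch that flattens a small XML document in a similar spirit. It is not xml2 itself and makes no attempt to reproduce its exact output. The function name `flatten` and the sample document are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def flatten(element, path=""):
    """Yield "path=value" lines for an element, its attributes and children.

    Attributes are written as path/@name, a notation assumed for this sketch.
    """
    path = "{}/{}".format(path, element.tag)
    for name, value in element.attrib.items():
        yield "{}/@{}={}".format(path, name, value)
    text = (element.text or "").strip()
    if text:
        yield "{}={}".format(path, text)
    for child in element:
        yield from flatten(child, path)


# Illustrative sample document only.
doc = '<feed><entry id="1"><title>Hello</title></entry></feed>'
for line in flatten(ET.fromstring(doc)):
    print(line)
```

Once the data is one statement per line, ordinary line-oriented tools such as grep can filter it.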

See also: Grep and Sed Equivalent for XML Command Line Processing on StackOverflow.

JSON

YAML, TOML

Using jq with a format converter (like mine) appears to be the best option.

Configuration files

  • Augeas — can extract data from and modify a number of file formats. However, not all formats are equally well supported, and for some of them Augeas can parse only a limited subset of all valid files.

Bonus round: CLIs for single-file databases

Name | Description | File format |
GNU Recutils | “[A] set of tools and libraries to access human-editable, plain text databases called recfiles.” | Text-based, roughly “key: value” |
SDB | “[A] simple string key/value database based on djb’s cdb disk storage and supports JSON and arrays introspection.” | Binary |
sqlite3(1) | “[A] simple command-line utility […] that allows the user to manually enter and execute SQL statements against an SQLite database.” | Binary |
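
To give a feel for a roughly “key: value” text database, here is a simplified Python sketch of a reader for such files. It assumes that records are separated by blank lines and that lines starting with `#` are comments. It ignores multi-line values and any record descriptors, and it keeps only the last value when a key repeats within a record, so it is not a substitute for GNU Recutils. The function name `parse_records` and the sample text are made up.

```python
def parse_records(text):
    """Split text into records of "key: value" fields.

    Assumptions of this sketch: blank lines separate records and
    lines starting with "#" are comments.
    """
    records, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current record, if there is one.
            if current:
                records.append(current)
                current = {}
        elif not line.startswith("#") and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


# Illustrative sample data only.
rec_text = "Name: q\nLanguage: Python\n\nName: textql\nLanguage: Go\n"
print(parse_records(rec_text))
```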


Tags: command line, CSV, XML, JSON, YAML, TOML, SQLite, SQL, old blog.