Command line tools for manipulating structured text data
Update: This post has been converted to a Git repository and greatly expanded at https://github.com/dbohdan/structured-text-tools.
What follows is a list of text-based file formats, each with command line tools for Linux that can manipulate it.
Name | Programming language | Database engine | Features | Usage link | License |
---|---|---|---|---|---|
q | Python | SQLite 3 | Use header row for column names, custom input and output encoding, gzipped input, custom input field separator (string literal), custom output field separator, custom output formatting, table JOINs, Python module. | Usage | GNU GPL 3 |
sqawk | C | SQLite 3 | Use header row for column names, column name aliases, can skip lines until a regexp matches, custom input field separator (string literal, per-file), keep SQLite file, show generated SQL, table JOINs. | Usage | ? |
Sqawk | Tcl | SQLite 3 | Use header row for column names, custom input field separator (regexp, per-file), custom input record delimiter (regexp, per-file), custom table names, custom output field separator, custom output record separator, merge selected columns into one, CSV input and output, Tcl output, table JOINs. | Usage | MIT |
Squawk | Python | Custom SQL interpreter | Access log and CSV input, JSON and CSV output, Python code generation. | — | Three-clause BSD |
termsql | Python | SQLite 3 | Use header rows for column names, custom field separator (regexp), custom record separator (string literal), lines as columns, skip a given number of lines at the beginning and at the end, merge selected columns into one, HTML, CSV, SQL and Tcl output. | Manual | MIT |
textql | Go | SQLite 3 | Use header rows for column names, keep SQLite file, custom input field separator (string literal). | Usage | MIT |
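
Most of the tools in the table above share one idea: load delimited text into SQLite, then query it with SQL. The following is a minimal Python sketch of that idea, not the implementation of any listed tool. The function name `query_delimited`, the default table name `t`, and the sample data are all made up for illustration.

```python
import csv
import io
import sqlite3


def query_delimited(text, sql, table="t", delimiter=","):
    """Load delimited text into an in-memory SQLite table and run one query."""
    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(rows)  # the first row supplies the column names

    # Quote identifiers so headers with spaces or SQL keywords still work.
    columns = ", ".join('"{}"'.format(name.replace('"', '""')) for name in header)
    placeholders = ", ".join("?" for _ in header)

    conn = sqlite3.connect(":memory:")
    try:
        conn.execute('CREATE TABLE "{}" ({})'.format(table, columns))
        conn.executemany(
            'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), rows
        )
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Hypothetical sample input; every value is loaded as text.
SAMPLE = "name,size\nalpha,10\nbeta,25\ngamma,7\n"

if __name__ == "__main__":
    query = "SELECT name FROM t WHERE CAST(size AS INTEGER) > 8 ORDER BY name"
    for row in query_delimited(SAMPLE, query):
        print(row[0])
```

Because every field arrives as text, numeric comparisons need an explicit `CAST` in this sketch. The listed tools may treat column types differently.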
- xml2: convert XML and HTML to and from flat, greppable lists of “path=value” statements.
See also: Grep and Sed Equivalent for XML Command Line Processing on StackOverflow.
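
To show why a flat “path=value” listing is convenient for grep, here is a rough Python sketch that flattens a small XML document in that spirit. It is an approximation for illustration, not a reimplementation of xml2: the `flatten` function and the sample document are hypothetical, and text that follows a child element (tail text) is ignored.

```python
import xml.etree.ElementTree as ET


def flatten(element, prefix=""):
    """Yield 'path=value' lines for an element, its attributes, and its children."""
    path = "{}/{}".format(prefix, element.tag)
    # Attributes are written as path/@name=value.
    for name, value in element.attrib.items():
        yield "{}/@{}={}".format(path, name, value)
    # Only non-blank text directly inside the element is reported.
    text = (element.text or "").strip()
    if text:
        yield "{}={}".format(path, text)
    for child in element:
        yield from flatten(child, path)


# Hypothetical sample document.
DOC = '<feed lang="en"><entry><title>Hello</title></entry></feed>'

if __name__ == "__main__":
    for line in flatten(ET.fromstring(DOC)):
        print(line)
```

Each output line carries its full path, so a plain `grep title` on the result finds the value without any XML-aware tooling.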
Using jq with a format converter (like mine) appears to be the best option.
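
As a sketch of the converter-plus-jq approach, the Python below turns a configuration document into JSON that jq can then query. The text above does not say which input format is meant, so INI is an assumption made here purely for illustration. The `ini_to_json` function and the sample settings are also hypothetical.

```python
import configparser
import json


def ini_to_json(text):
    """Convert INI text to a JSON string: one object per section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # All values stay strings; configparser does not infer types.
    data = {section: dict(parser[section]) for section in parser.sections()}
    return json.dumps(data, indent=2, sort_keys=True)


# Hypothetical sample input.
SAMPLE = "[server]\nhost = example.org\nport = 8080\n"

if __name__ == "__main__":
    print(ini_to_json(SAMPLE))
```

Saved under a name of your choice, the script's output can be piped into jq, for example with the filter `.server.port`.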
- Augeas: can extract data from and modify a number of file formats. However, not all formats are equally well supported by Augeas, and for some formats only a limited subset of all valid files can be parsed.
Name | Description | File format |
---|---|---|
GNU Recutils | “[A] set of tools and libraries to access human-editable, plain text databases called recfiles.” | Text-based, roughly “key: value” |
SDB | “[A] simple string key/value database based on djb’s cdb disk storage and supports JSON and arrays introspection.” | Binary |
sqlite3(1) | “[A] simple command-line utility […] that allows the user to manually enter and execute SQL statements against an SQLite database.” | Binary |
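
To make the “roughly key: value” description of the text-based format in the table above concrete, here is a small Python sketch that reads records of that general shape. It is a deliberate simplification and not a recfile parser: it assumes records are separated by blank lines and that lines starting with `#` are comments, and it ignores everything else the real format supports. The `parse_records` function and the sample data are hypothetical.

```python
def parse_records(text):
    """Split text into blank-line-separated records of 'key: value' fields."""
    records, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current record, if there is one.
            if current:
                records.append(current)
                current = {}
            continue
        if line.startswith("#"):
            continue  # assumed comment syntax
        key, _, value = line.partition(":")
        # A repeated key overwrites the earlier value in this sketch.
        current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


# Hypothetical sample input.
SAMPLE = "Name: Ada\nEmail: ada@example.org\n\nName: Bob\nEmail: bob@example.org\n"

if __name__ == "__main__":
    for record in parse_records(SAMPLE):
        print(record["Name"], record["Email"])
```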