2021/02/17: Formatting JSON

JSON is a very common exchange format. You still have to agree on how to encode your data in terms of (essentially) atomic values, lists, and dictionaries. But at least you can work with these structured entities, and there is a library for serialising and deserialising in basically any programming language. Some of these libraries also support pretty-printing. For example, the following Python one-liner can be used as a JSON formatter.

#!/usr/bin/env python3

import json
import sys

json.dump(json.load(sys.stdin), sys.stdout, indent=2)

This results in JSON formatted as in the following example: entries are separated by commas, and each comma sits at the end of its line. As commas only appear between entries, there is no comma after the last entry of an array or object.

{
  "some entry": {
    "type": "FOO",
    "short list": [],
    "longer list": [
      "first",
      "second",
      "third"
    ],
    "complicated dict": {
      "first key": "first value",
      "second key": "second value",
      "structued entry": {
        "type": "complex",
        "real": 0.0,
        "imag": 1.0
      }
    }
  },
  "another entry": [
    "this",
    "is",
    "just",
    "a",
    "long",
    "list"
  ],
  "yet antoher entry": {
    "type": "BAR",
    "short": {},
    "long": {
      "a": "A",
      "b": "B",
      "c": "C"
    }
  }
}

Now, a pretty-printed format is meant for humans, and the typical tools for human operations such as editing or comparing are line-based. As a lot of editing happens at the end of an array or object, we always have to think about adding or removing the comma on the entry before. That modified comma also drags the preceding line into a line-based diff, which in turn makes the annotate functionality of the version-control system less useful. Python solves this problem for its own syntax by allowing a trailing comma, but trailing commas are not standard-compliant JSON.
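To make the diff problem concrete, here is a small sketch using Python's standard difflib module. The sample data and variable names are made up for illustration. It appends one entry to a list, pretty-prints both versions with trailing-position commas, and shows which lines a line-based diff reports.

```python
import difflib
import json

# Hypothetical sample data: the same list before and after appending an entry.
before = {"longer list": ["first", "second", "third"]}
after = {"longer list": ["first", "second", "third", "fourth"]}

old_lines = json.dumps(before, indent=2).splitlines()
new_lines = json.dumps(after, indent=2).splitlines()

# n=0 drops context lines; the first two diff lines are the file headers.
diff = list(difflib.unified_diff(old_lines, new_lines, lineterm="", n=0))[2:]
changed = [line for line in diff if not line.startswith("@@")]
print("\n".join(changed))
# -    "third"
# +    "third",
# +    "fourth"

# Python's json module follows the standard and rejects a trailing comma.
try:
    json.loads('["third",]')
    trailing_comma_accepted = True
except json.JSONDecodeError:
    trailing_comma_accepted = False
print(trailing_comma_accepted)  # False
```

Although only one entry was added, the line holding "third" shows up as both removed and added, purely because of its comma.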

Now, if, in Haskell style, we think of the comma as an infix operator and move it to the beginning of the line, those problems with editing at the end of an array or object go away. The JSON example above, formatted that way, looks as follows.

{ "some entry":
  { "type": "FOO"
  , "short list": []
  , "longer list":
    [ "first"
    , "second"
    , "third"
    ]
  , "complicated dict":
    { "first key": "first value"
    , "second key": "second value"
    , "structued entry":
      { "type": "complex"
      , "real": 0.0
      , "imag": 1.0
      }
    }
  }
, "another entry":
  [ "this"
  , "is"
  , "just"
  , "a"
  , "long"
  , "list"
  ]
, "yet antoher entry":
  { "type": "BAR"
  , "short": {}
  , "long":
    { "a": "A"
    , "b": "B"
    , "c": "C"
    }
  }
}
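As a quick check of that claim, the following sketch diffs a hand-written leading-comma rendering of a list before and after appending an entry. The line lists are hypothetical and typed in by hand for illustration; they are not produced by any formatter.

```python
import difflib
import json

# Hypothetical leading-comma rendering of a three-element list.
old_lines = ['[ "first"', ', "second"', ', "third"', ']']
# Appending an entry inserts one new line before the closing bracket.
new_lines = old_lines[:-1] + [', "fourth"'] + old_lines[-1:]

# n=0 drops context lines; the first two diff lines are the file headers.
diff = list(difflib.unified_diff(old_lines, new_lines, lineterm="", n=0))[2:]
changed = [line for line in diff if not line.startswith("@@")]
print(changed)  # ['+, "fourth"']

# The result is still standard JSON.
parsed = json.loads("\n".join(new_lines))
print(parsed)  # ['first', 'second', 'third', 'fourth']
```

The diff now contains just the one added line. The special case has moved to the front: inserting a new first entry touches two lines, but that is a much rarer edit than appending.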

Of course, it is not hard to write a script that formats JSON in that way (see below), but I'm really surprised that, given the popularity of JSON, I couldn't find anything ready-made for that task. The leading comma certainly isn't an obscure way of formatting any more. So, if you know which library I failed to find, please send me an email.

#!/usr/bin/env python3

import json
import sys

def is_simple(entry):
  """Return True if entry fits on one line: a scalar or an empty list/dict."""
  if isinstance(entry, (list, dict)):
    return len(entry) == 0
  return True

def hdumps(entry, *, _current_indent=0):
  """Serialise entry as JSON with leading commas, indenting nested values by 2."""
  if isinstance(entry, list) and entry:
    # The first element follows the opening bracket on the same line.
    result = "[ " + hdumps(entry[0], _current_indent=_current_indent+2)
    for x in entry[1:]:
      # Every further element starts a new line with a leading comma.
      result += "\n" + " " * _current_indent + ", "
      result += hdumps(x, _current_indent=_current_indent+2)
    result += "\n" + " " * _current_indent + "]"
    return result
  if isinstance(entry, dict) and entry:
    result = "{ "
    is_first = True
    for k in entry.keys():
      if not is_first:
        result += "\n" + " " * _current_indent + ", "
      result += json.dumps(k) + ":"
      if is_simple(entry[k]):
        # Scalars and empty containers stay on the line of their key.
        result += " " + json.dumps(entry[k])
      else:
        # Non-empty containers go on their own line, indented one level deeper.
        result += "\n" + " " * _current_indent + "  "
        result += hdumps(entry[k], _current_indent=_current_indent+2)
      is_first = False
    result += "\n" + " " * _current_indent + "}"
    return result
  # Scalars and empty containers are left to the standard serialiser.
  return json.dumps(entry)

if __name__ == "__main__":
  data = json.load(sys.stdin)
  print(hdumps(data))

Cross-referenced by:

2021/10/10 More compact formatting (for JSON)