2015/04/10: textwrap.wrap and word wrapping

Whenever there is a library that can be used for a task you want to do, people tend to use it, arguing that the creators of the library certainly know best how to tackle that problem. Generally, this is not a bad idea—provided you use the library for the task it was designed for.

I recently had to deal with an abuse of textwrap.wrap to break a string into equal-sized chunks. This works because the string in question contained no white space and was therefore treated as a single word; textwrap.wrap happily breaks words longer than a line into chunks of line length (unless you tell it not to). However, over-long words are rare when wrapping actual text (show me a language that frequently uses words with more than a few hundred letters), so for the intended use it doesn't matter that the algorithm has quadratic run time in this case. When the function is misused as a chunker, it does matter, as the timings show. (In case you wonder about the print in the example below: yes, I've written some Haskell recently, so it is there to prove that the value is actually computed.)


$ cat wrap.py
#!/usr/bin/env python

import sys
import textwrap

def wrap(count):
  word = "x" * int(count)
  wrapped = textwrap.wrap(word, width=64)
  for line in wrapped:
    print line

if __name__ == "__main__":
  wrap(sys.argv[1])
$ for count in `seq 6800 6800 34000`; do time ./wrap.py $count > /dev/null; done

real    0m1.134s
user    0m1.120s
sys 0m0.008s

real    0m4.480s
user    0m4.472s
sys 0m0.008s

real    0m8.933s
user    0m8.917s
sys 0m0.016s

real    0m17.602s
user    0m17.602s
sys 0m0.000s

real    0m26.674s
user    0m26.658s
sys 0m0.016s
$
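
The word-breaking behaviour that this abuse relies on can be seen on a small input. The following is a minimal sketch, not part of the original scripts; it uses Python 3 print syntax and a deliberately tiny width so the result is easy to read. The break_long_words parameter is the documented switch for telling textwrap.wrap not to break long words.

```python
import textwrap

word = "x" * 10  # a single "word" containing no white space

# Default behaviour: a word longer than the width is cut into width-sized pieces.
broken = textwrap.wrap(word, width=4)

# With break_long_words=False the over-long word is left on a line of its own.
unbroken = textwrap.wrap(word, width=4, break_long_words=False)

print(broken)    # ['xxxx', 'xxxx', 'xx']
print(unbroken)  # ['xxxxxxxxxx']
```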

Splitting a string into fixed-size chunks, on the other hand, is a simple task. Even the most naive implementation runs in time linear in the length of the string.

$ cat wrap.py
#!/usr/bin/env python

import sys
import textwrap

def wrap(count):
  word = "x" * int(count)

  wrapped = []
  idx = 0
  lenword = len(word)
  while idx < lenword:
    wrapped.append(word[idx:idx+64])
    idx += 64

  for line in wrapped:
    print line

if __name__ == "__main__":
  wrap(sys.argv[1])
$ for count in `seq 6800 6800 34000`; do time ./wrap.py $count > /dev/null; done

real    0m0.055s
user    0m0.034s
sys 0m0.016s

real    0m0.051s
user    0m0.027s
sys 0m0.024s

real    0m0.047s
user    0m0.039s
sys 0m0.008s

real    0m0.049s
user    0m0.025s
sys 0m0.024s

real    0m0.048s
user    0m0.024s
sys 0m0.024s
$
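
The same chunking can also be written more compactly with a list comprehension. This is only a sketch in Python 3 syntax; the helper name chunks is made up for illustration. For a string without white space it should yield the same pieces as textwrap.wrap, which the last line checks on a small input.

```python
import textwrap


def chunks(text, size):
    """Split text into consecutive pieces of at most `size` characters."""
    # range() steps through the start index of every piece; slicing past
    # the end of the string is safe and simply yields a shorter last piece.
    return [text[i:i + size] for i in range(0, len(text), size)]


word = "x" * 200
pieces = chunks(word, 64)
print([len(p) for p in pieces])  # [64, 64, 64, 8]

# On a white-space-free string the result matches what textwrap.wrap returns.
assert pieces == textwrap.wrap(word, width=64)
```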

Note that for its intended task of wrapping sequences of words, textwrap.wrap does not perform badly, and its run time does not grow super-linearly, even on much longer strings.

$ cat ./wrap.py
#!/usr/bin/env python

import sys
import textwrap

def wrap(count):
  text = "foo bar " * int(count)
  wrapped = textwrap.wrap(text, width=64)
  for line in wrapped:
    print line

if __name__ == "__main__":
  wrap(sys.argv[1])
$ time ./wrap.py 34000 > /dev/null

real    0m0.350s
user    0m0.334s
sys 0m0.016s
$ time ./wrap.py 100000 > /dev/null

real    0m0.820s
user    0m0.788s
sys 0m0.032s
$ time ./wrap.py 1000000 > /dev/null

real    0m6.239s
user    0m6.050s
sys 0m0.189s
$
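
To compare the two kinds of input without starting a new interpreter for every run, something like the following could be used. It is a rough sketch in Python 3 syntax, with a made-up helper name time_wrap; the timings above were taken with the Python 2 scripts shown, and the numbers (and possibly the scaling) will differ with the Python version and the machine.

```python
import textwrap
import timeit


def time_wrap(text, repeat=3):
    """Return the best time in seconds of one textwrap.wrap call at width 64."""
    runs = timeit.repeat(lambda: textwrap.wrap(text, width=64),
                         number=1, repeat=repeat)
    return min(runs)


if __name__ == "__main__":
    for count in (6800, 13600, 27200):
        single_word = "x" * count       # one over-long word, as in the abuse
        many_words = "foo bar " * count  # ordinary text made of short words
        print(count, time_wrap(single_word), time_wrap(many_words))
```

If the run time for the single word roughly quadruples each time count doubles, the growth is quadratic; if it roughly doubles, it is linear.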