Joakim Nygård

On File Formats and Time

5 May 2009

John Nack from Adobe’s Photoshop team has written a post about the PSD file format in response to a particular developer having trouble implementing a parser through reverse engineering.

File formats typically grow and change as the application incorporates more features. For old formats such as .psd (pre-1990), the result is inevitably less structured than a format designed from scratch today:

Of its quirks, PSD expert Tim Wright says, “Most are the gradual result of discovering better ways to do things over 20 years, while staying compatible with older applications.”

Besides mentioning the open source graphics format FXG as an alternative, he links to yet another interesting article by Joel Spolsky on the complexity of Microsoft Office files:

A file format is just a concise summary of all the features an application supports. […] All of these subtle bits of behavior cannot be fully documented without writing a document that has the same amount of information as the source code.

Now that sounds almost like a definition of Kolmogorov complexity, but the point, of course, is that the apparent mess of these formats is a side effect of their longevity, added features and, particularly, backward compatibility.
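To make that accumulation concrete, here is a minimal sketch of a reader for a made-up format. Everything in it is an assumption for illustration: the `TOYF` magic, the three versions and the field layout are invented and have nothing to do with how PSD or Office files are actually laid out. The point is only that each new version adds a branch, and none of the old branches can ever be removed.

```python
import struct

# Hypothetical toy format, invented for this example (not PSD or any real format).
MAGIC = b"TOYF"


def read_header(data: bytes) -> dict:
    """Parse the header of a TOYF file, accepting every version ever shipped."""
    if data[:4] != MAGIC:
        raise ValueError("not a TOYF file")
    version = data[4]

    if version == 1:
        # v1: 16-bit width and height, nothing else.
        width, height = struct.unpack_from(">HH", data, 5)
        color_mode = 0  # v1 predates color modes, so assume the old default
    elif version == 2:
        # v2: a color mode byte was appended after the dimensions.
        width, height, color_mode = struct.unpack_from(">HHB", data, 5)
    elif version == 3:
        # v3: dimensions widened to 32 bits once 16 bits became too small.
        width, height, color_mode = struct.unpack_from(">IIB", data, 5)
    else:
        raise ValueError(f"unsupported version {version}")

    return {
        "version": version,
        "width": width,
        "height": height,
        "color_mode": color_mode,
    }


# A file written by the oldest release must still open today.
old_file = MAGIC + bytes([1]) + struct.pack(">HH", 640, 480)
new_file = MAGIC + bytes([3]) + struct.pack(">IIB", 70000, 50000, 2)
print(read_header(old_file))
print(read_header(new_file))
```

Three versions already mean three code paths and one implicit default. Multiply that by twenty years of features and the shape of a real format becomes easier to sympathise with.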

Backward compatibility takes time and effort. This is why, for instance, Apple did not release the full iPhone SDK to developers right away: they wanted things thought through well enough to support backward compatibility (for a while anyway). It is also (part of) the reason small development teams can create products that appear to trump the established big players: they have no old software to keep from breaking.

Ward Cunningham coined the metaphor of technical debt to describe the eventual consequences of poorly planned software. Sometimes the hundreds or thousands of man-hours put into old code bases and file formats to keep things backward compatible end up becoming a massive technical debt, as was the case with Classic Mac OS (and likely old Windows): starting over with a new foundation was the better way forward.

Websites do not really have the problem of backward compatibility (except for URLs), but the structure of their databases and code faces the same issue when implementing new features and scaling efficiently. There’s a tradeoff between patching new features onto existing design decisions and rebuilding the whole thing.

The bottom line is that there often are good, if historic, reasons behind seemingly poor decisions.