Mariana Oliveira | November 3, 2007 | 12:52 pm

Self-service, Prorated Super Computing Fun!

The New York Times
November 1, 2007
By Derek Gottfrid

As part of eliminating TimesSelect, The New York Times has decided to make all the public domain articles from 1851-1922 available free of charge. These articles are all in the form of images scanned from the original paper. In fact, from 1851 to 1980, all 11 million articles are available as images in PDF format. Generating a PDF version of an article takes quite a bit of work: each article is actually composed of numerous smaller TIFF images that need to be scaled and glued together in a coherent fashion.
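
To give a rough feel for the "scaled and glued together" step, here is a minimal sketch of the layout arithmetic only. It is not the Times' actual code. It assumes, purely for illustration, that the fragments are stacked top to bottom and scaled to one common width; the names Tile, Placement, and layout_tiles are hypothetical. Decoding the TIFFs and writing the PDF would need an imaging library and is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    """Pixel dimensions of one scanned fragment (hypothetical structure)."""
    width: int
    height: int


@dataclass(frozen=True)
class Placement:
    """Position and size of a scaled tile on the composed page."""
    x: int
    y: int
    width: int
    height: int


def layout_tiles(tiles, page_width):
    """Scale each tile to page_width and stack the results top to bottom.

    Assumption: a single-column vertical stack. Real articles may need a
    more elaborate arrangement (multiple columns, jumps across pages).
    Returns the list of placements and the total page height in pixels.
    """
    if page_width <= 0:
        raise ValueError("page_width must be positive")
    placements = []
    y = 0
    for tile in tiles:
        if tile.width <= 0 or tile.height <= 0:
            raise ValueError("tile dimensions must be positive")
        # Preserve the aspect ratio while forcing a common width.
        scaled_height = max(1, round(tile.height * page_width / tile.width))
        placements.append(Placement(0, y, page_width, scaled_height))
        y += scaled_height
    return placements, y
```

Calling layout_tiles([Tile(200, 100), Tile(400, 100)], 400) returns one placement per fragment plus the overall page height, which a rendering step could then use to paste each scaled image at its offset.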

Previously we had generated all the PDFs dynamically. This approach had worked reasonably well, but with the strong possibility of a significant traffic increase we started to rethink things. Clearly, pre-generating all the articles and statically serving them would be a great option. Pretty quickly I thought about how we could do this (and have some fun along the way, but beware — my idea of fun is probably radically different from that of most people).

Read the full article:
http://open.blogs.nytimes.com/2007/11/01/self-service-prorated-super-computing-fun
