Thursday, August 16, 2007
The CPU
I read a lot of computer language hermeneutics on various websites. It's rare to see people talking about syntax; instead they discuss how the language will be interpreted and understood. This makes sense, because according to two men who love wizards, Abelson and Sussman, “programs must be written for people to read, and only incidentally for machines to execute.” At the same time the CPU is an ideal reader, the ultimate consumer, the final arbiter and interpreter.
I always nodded when I heard that efficiency is overrated and programmer time is more important than processor time. It soothed my autodidact's ego. But then a year ago I sat down at my desk with a quarter-million 600DPI images and a content database containing two million triples that can be viewed five million different ways. And I was not prepared. At first a single HTML page took 30 seconds to build; then 10 seconds; then, once I learned to re-use my data and indexes, and once I knew where all the variables were—a fraction of a second, a 300-times speedup. I've been researching and I know there are whole sections of code I can cut and replace with simpler, faster routines. I still have a long way to go, and much to learn, before I can slow things down again.
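The reuse, boiled down to a Python sketch: the file name, the tab-separated triple format, and the helper names here are stand-ins for illustration, not my real build code, but the shape of the fix is the same. Parse once, index once, render everything against the structures you already have.

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def load_triples(path):
        # Parse the content database once; repeat calls return the cached result.
        triples = []
        with open(path) as f:
            for line in f:
                subject, predicate, obj = line.rstrip("\n").split("\t")
                triples.append((subject, predicate, obj))
        return triples

    def build_index(triples):
        # Group triples by subject so a page render is a dictionary hit, not a scan.
        index = {}
        for subject, predicate, obj in triples:
            index.setdefault(subject, []).append((predicate, obj))
        return index

    def build_page(subject, index):
        # Render one HTML page from the prebuilt index.
        rows = "".join(f"<li>{p}: {o}</li>" for p, o in index.get(subject, []))
        return f"<html><body><h1>{subject}</h1><ul>{rows}</ul></body></html>"

    if __name__ == "__main__":
        # Load and index once, then render every page against the same structures.
        index = build_index(load_triples("triples.tsv"))
        for subject in index:
            html = build_page(subject, index)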
The perfect website is exactly one page, the one the visitor wants. But nearly every page on the web is about changing your mind: “There's more over here!” The economic model for the content-driven web requires accumulating pageviews, and the way to make room for more pageviews is to speed things up: caching, gzipping, embracing the combinatoric explosion with relational databases and gigahertz multicores. But now I find my inner editor battling with my inner programmer, wondering whether, instead of letting people explore the data set on their own, which asks a lot of them, I should create a smaller core set of useful pages, working out problems of audience and utility ahead of time, measuring the connections between topics, assembling content automatically, and forcing the processor to do more work. I want to make the computer my partner in editing. And once I understand the CPU better I should perhaps reverse my goal: no page should take less than 30 seconds to cache. If that page finds 10,000 new readers over the next few years because it's more valuable (in my experience, with the way Google works, this does happen), then I will have achieved a far more relevant kind of processor efficiency: three extra milliseconds of upfront processing per new reader. This is very cheap.
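A quick check of that arithmetic, using only the numbers above:

    # 30 seconds of one-time page building, spread over 10,000 eventual readers.
    build_seconds = 30
    readers = 10000
    print(build_seconds / readers * 1000)  # 3.0 milliseconds of processor time per reader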
To better understand what efficiency actually means I have been flipping through books to learn about how a processor interprets programs, to figure out how wall-socket electricity is turned into a website. It turns out that a computer is simply billions of postal gnomes riding buses, picking up or dropping off bits. In the margin of my programming books I see which band names I can spell out in hexadecimal—0xdefcab4c00dee equals 3,922,828,592,156,142.
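If, like me, you don't entirely trust your own margin arithmetic, Python will read the hexadecimal for you:

    print(0xdefcab4c00dee)           # 3922828592156142
    print(int("defcab4c00dee", 16))  # the same number, read from the string form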
Assembler is the Latin of programming. It helps you understand all the languages that came later; everyone feels guilty for not learning it; but there's a lot of rote learning involved and it's easier to skip (as I have, with both assembler and Latin). But reading about registers, after my decade of programming high-level languages, is like going back to Shakespeare or the Bible and realizing how many times I've heard Hamlet or Julius Caesar, or a proverb about pigs, repeated back to me in books and speeches: it's always been there; I just didn't know enough to see it.