
The last time I made a major hardware purchase at work, I didn't have to pay much attention to what the power consumption would cost; I just had to make sure each cabinet was hooked to a circuit able to supply the power, and hooked up safely enough that the electricians' union wouldn't be irked. I really doubt the same will be true next time. Costs are high enough that larger companies are starting to pay attention: Microsoft is building a data center where there was once an aluminum smelter, and Google wants to set up shop near the Bonneville Dam.
The big catch is that while processor speeds have climbed through the years, the energy per floating point operation has stayed mostly flat (which means processor power consumption has gone up). My ideal data center would have the following desiderata:
1. Machines not meant for floating point work (e.g. file & database servers, user accounts, email, Web) should get processors designed for that kind of load, e.g. the UltraSPARC T1.
2. These should run code with every optimization appropriate for compile time done at compile time, not by a ridiculously complex pipelining structure. Run-time instruction reordering and branch prediction should only do what can't be done at compile time, and the transistor space devoted to them should be sized accordingly.
3. With a smaller pipeline structure, the clock signal needs less elaborate distribution.
4. ~5 GB flash drives for holding the OSen. /var would hold a filesystem image to mount over / using unionfs, to catch OS updates, with the cumulative updates reflashed at appropriate intervals (there's a sketch of the mount after this list). Hard drives only on machines that need them, on a backplane able to spin them down and up as needed.
5. Number crunching for financial applications should be done on machines designed for fast integer and large-integer calculations (see the money-as-integers sketch after this list).
6. Floating point crunching is so Matlab-ized these days that if you read papers in any field from the last 20 years you'll notice matrix notation creeping into equations that could just as easily be presented in scalar form. That heavy reliance on matrix operations is a good indication of how much of the work could be offloaded to FPGAs living on a bus (PCI-X? InfiniBand? HyperTransport? It's all good). A floating point co-processor card can do tens of thousands of these operations on pairs of vectors in the time it would take a von Neumann CPU to do just one. I'm thinking of something like a hardware implementation of the BLAS library's lower levels, for matrix additions and multiplications; the CPU would then arrange these operations at a higher level, as well as handle anything foolishly written with low-level intricacy (there's a sketch of that split after this list). Actually, with FPGAs you can offload anything that's repeated over large columns of floats: mappings, reductions, the works.
A computer with a good CPU and a floating point co-processor card would be worth 8 nodes of a Beowulf cluster while consuming less power than one such node. For financial applications you could do the same kind of delegation for integer operations, and for database searching too, so long as your DMA hardware is beefy enough.
7. Not too sure about this one, but I think a 48V DC power bus for everything is also good. UPSes would just be simple batteries on the DC side of the power supply.
8. IPMI or something similar for EVERYTHING. I would want to be able to match the operating level to the nearby power output from windmills if I had to (there's a sketch of that after the list)...
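
To make item 4 concrete, here's a minimal sketch of the union mount, in Python. The paths are made up, and the dirs= option syntax is an assumption that depends on which unionfs flavor (kernel unionfs vs. unionfs-fuse) you actually run; treat it as the shape of the thing, not a drop-in script.

```python
#!/usr/bin/env python3
"""Writable overlay mounted over a read-only flash OS image (item 4).

The paths and the 'dirs=' option syntax are assumptions; the exact
mount options depend on which unionfs implementation you run.
"""
import subprocess

FLASH_ROOT = "/mnt/flash-os"     # read-only OS image on the flash drive (hypothetical path)
WRITE_LAYER = "/var/os-overlay"  # writable layer accumulating updates (hypothetical path)
MERGED_ROOT = "/mnt/root"        # where the union of the two gets mounted (hypothetical path)

def mount_union():
    """Mount the writable layer over the read-only flash image."""
    subprocess.run(
        ["mount", "-t", "unionfs",
         "-o", f"dirs={WRITE_LAYER}=rw:{FLASH_ROOT}=ro",
         "unionfs", MERGED_ROOT],
        check=True,
    )

if __name__ == "__main__":
    mount_union()
```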
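For item 5, a small illustration (nothing more) of why I want integer hardware for money: keep the cents as integers and the totals stay exact; keep them as binary floats and they drift.

```python
"""Money as integer cents stays exact; money as binary floats drifts."""

price_cents = 1999                # $19.99 kept as an integer number of cents
as_float, as_cents = 0.0, 0

for _ in range(1_000_000):        # a million line items
    as_float += 19.99             # binary float: rounding error creeps in
    as_cents += price_cents       # integer arithmetic is exact

print(f"float:   {as_float!r}")          # not exactly 19990000.0
print(f"integer: {as_cents / 100!r}")    # exactly 19990000.0
print(f"drift:   {as_float - as_cents / 100!r}")
```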
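And for item 6, a rough sketch of the CPU/co-processor split. NumPy stands in for the hypothetical FPGA card here, and the Accelerator class and its method names are invented for illustration; the point is that the host issues one bulk BLAS-style request instead of looping over scalars.

```python
"""Host arranges the work; BLAS-style kernels go to the accelerator.

NumPy plays the part of the co-processor card; Accelerator and its
method names are made up for illustration.
"""
import numpy as np


class Accelerator:
    """Stand-in for a BLAS level-1/2/3 engine living on the bus."""

    def axpy(self, alpha, x, y):
        # y <- alpha*x + y, one bulk request instead of len(x) scalar ops
        return alpha * x + y

    def gemm(self, a, b):
        # dense matrix multiply, the bread and butter of level-3 BLAS
        return a @ b


def scalar_axpy(alpha, x, y):
    """What the CPU does on its own: one float op per loop trip."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


if __name__ == "__main__":
    n = 1_000_000
    x, y = np.random.rand(n), np.random.rand(n)

    card = Accelerator()
    offloaded = card.axpy(2.0, x, y)          # one bulk request over the bus
    looped = scalar_axpy(2.0, x[:5], y[:5])   # the painful element-by-element version
    print(np.allclose(offloaded[:5], looped)) # same math, very different cost

    a, b = np.random.rand(64, 64), np.random.rand(64, 64)
    print(card.gemm(a, b).shape)              # the level-3 case
```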
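Finally, item 8: a sketch of matching powered-up nodes to windmill output. The hostnames, credentials, per-node wattage, and get_wind_output_watts() are all placeholders I made up; the only real interface assumed is ipmitool's chassis power command.

```python
"""Power up only as many nodes as the windmills can currently feed.

Hostnames, credentials, the per-node draw, and get_wind_output_watts()
are placeholders; ipmitool's 'chassis power' command is the only real
interface assumed.
"""
import subprocess

NODES = [f"node{i:02d}-bmc.example.org" for i in range(1, 17)]  # hypothetical BMC hostnames
WATTS_PER_NODE = 250                                            # rough per-node draw, assumed
IPMI_USER, IPMI_PASS = "admin", "changeme"                      # placeholder credentials


def ipmi_power(host, state):
    """Ask the node's BMC to change chassis power state ('on' or 'soft')."""
    subprocess.run(
        ["ipmitool", "-I", "lanplus", "-H", host,
         "-U", IPMI_USER, "-P", IPMI_PASS,
         "chassis", "power", state],
        check=True,
    )


def get_wind_output_watts():
    """Placeholder: read the windmills' current output from wherever it lives."""
    return 2500


def match_load_to_wind():
    budget = get_wind_output_watts()
    affordable = min(len(NODES), budget // WATTS_PER_NODE)
    for i, host in enumerate(NODES):
        ipmi_power(host, "on" if i < affordable else "soft")


if __name__ == "__main__":
    match_load_to_wind()
```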
Running this kind of data center would take more clueful people than the boss's nephew, since you can't solve every problem by throwing in another machine and beefing up the AC. But once hiring a clueful engineer is cost-competitive with spending the money on more juice, this is what will happen.