ocschwar: (Default)
[personal profile] ocschwar
The last time I did a major hardware purchase at work, I didn't have to pay much attention to how much the power consumption would cost, just make sure each cabinet was hooked to a circuit able to supply the power, and hooked up safely enough that the electricians' union wouldn't be irked. I really doubt the same will be true next time. Costs are high enough that larger companies are starting to pay attention: Microsoft is building a data center where there was once an aluminum smelter, and Google wants to set up shop near the Bonneville Dam.

The big catch is that while processor speed has gone up through the years, energy per floating point operation has stayed mostly flat (which means processor power consumption has gone up). My ideal data center would have the following desiderata:

1. For machines not meant for floating point work (e.g. file & database servers, user accounts, email, Web) there should be processors designed for that load, e.g. Sun's UltraSPARC T1.

2. These should run code with every optimization appropriate for compile time done at compile time, not by a ridiculously complex pipelining structure. Run-time instruction reordering and branch prediction should only do what can't be done at compile time, and the transistor space devoted to them should be sized accordingly.

3. With a smaller pipelining structure, the clock signal distribution network can be smaller as well.

4. ~5 GB flash drives for holding the OSen. /var would have a filesystem image to mount over / using unionfs, for OS updates, with cumulative updates reflashed at appropriate intervals. Hard drives only on machines that need them, using a backplane with the ability to spin them down and up as needed.

5. Number crunching for financial applications should be done on machines designed for fast integer and large-integer calculations.

6. Floating point crunching is so Matlab-ized these days that if you read papers in any field over the last 20 years you may notice matrix notation creeping into equations that could just as easily be presented in scalar form. Heavy reliance on matrix operations is a signal for how much of the work can be offloaded to FPGAs living on a bus (PCI-X? InfiniBand? HyperTransport? It's all good). A floating point co-processor card can do tens of thousands of these operations on pairs of vectors in the time it would take a von Neumann CPU to do just one. I'm thinking of something like a hardware implementation of the BLAS library's lower levels, for matrix additions and multiplications. The CPU would then arrange these operations at a higher level, as well as do anything foolishly written with low-level intricacy. (Actually, with FPGAs you can do anything that's repeated over large columns of floats: mappings, reductions, the works.) A sketch of what this looks like from the application side appears after the closing paragraph below.

A computer with a good CPU and a floating point co-processor card would be worth 8 nodes on a Beowulf cluster and consume less power than one such node. Actually, for financial applications you could do the same kind of delegation for integer operations. And for database searching. So long as your DMA hardware is beefy enough.

7. Not too sure about this one, but I think a 48V DC power bus for everything is also good. UPSes would just be simple batteries on the DC side of the power supply.

8. IPMI or something similar for EVERYTHING. I would want to be able to match operation level to nearby power output from windmills if I had to...
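Something like the following is what I have in mind for that last point: a minimal sketch, assuming a fleet of IPMI-reachable nodes and some external feed telling you how many watts you're allowed to burn. The hostnames, the per-node wattage, and the read_power_budget_watts() stub are all made up for illustration; the ipmitool chassis power commands are the standard ones.

    #!/usr/bin/env python
    # Sketch: keep only as many nodes powered on as the current power budget allows.
    # Hypothetical fleet and wattage figures; ipmitool does the actual work over the LAN.
    import subprocess

    NODES = ["node01.example.com", "node02.example.com", "node03.example.com"]  # hypothetical
    NODE_WATTS = 250                           # assumed draw per powered-on node
    IPMI_USER, IPMI_PASS = "admin", "secret"   # placeholders

    def ipmi(host, *args):
        """Run an ipmitool chassis command against one node's BMC."""
        cmd = ["ipmitool", "-I", "lanplus", "-H", host,
               "-U", IPMI_USER, "-P", IPMI_PASS] + list(args)
        return subprocess.run(cmd, capture_output=True, text=True).stdout

    def read_power_budget_watts():
        # Stub: in reality this would come from the wind farm / utility feed.
        return 500

    def match_load_to_budget():
        budget = read_power_budget_watts()
        allowed = budget // NODE_WATTS         # how many nodes we can afford to run
        for i, host in enumerate(NODES):
            status = ipmi(host, "chassis", "power", "status")
            if i < allowed and "off" in status:
                ipmi(host, "chassis", "power", "on")
            elif i >= allowed and "on" in status:
                ipmi(host, "chassis", "power", "soft")   # graceful shutdown

    if __name__ == "__main__":
        match_load_to_budget()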

Running this kind of data center would take people more clueful than the boss's nephew, since you can't solve every problem with just another machine and beefing up the AC. But when hiring a clueful engineer is cost-competitive with spending money on more juice, this is what will happen.
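And going back to item 6, here is a minimal sketch of what the delegation looks like from the application side, assuming the code is written against the BLAS interface (shown here through numpy, whose dot() dispatches to whatever BLAS library it was built against: SSE2, a tuned CPU library, or in principle a co-processor card). The naive triple loop is what a general-purpose CPU grinds through otherwise; the matrix sizes in the example are arbitrary.

    import numpy as np

    def matmul_scalar(a, b):
        """Naive triple loop: every multiply-add runs through the general-purpose pipeline."""
        n, k = len(a), len(a[0])
        m = len(b[0])
        c = [[0.0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                s = 0.0
                for p in range(k):
                    s += a[i][p] * b[p][j]
                c[i][j] = s
        return c

    def matmul_blas(a, b):
        """One call into BLAS level 3 (dgemm); the backend does all the inner loops."""
        return np.dot(a, b)

    def axpy_blas(alpha, x, y):
        """BLAS level 1 (daxpy): y <- alpha*x + y, again one delegated call."""
        return alpha * x + y

    if __name__ == "__main__":
        a = np.random.rand(512, 512)
        b = np.random.rand(512, 512)
        c = matmul_blas(a, b)                      # delegated to the BLAS backend
        # matmul_scalar(a.tolist(), b.tolist())    # same result, orders of magnitude slower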

Date: 2006-06-24 12:00 pm (UTC)
From: [identity profile] rifmeister.livejournal.com
I'm pretty sure #1 has already happened at large business transaction processing centers, where they've been worrying for a while about the juice. #6 has been happening for years: SSE2 is the latest implementation of low-level vectorized operations. I don't think FPGAs can really help you win further for matrix multiplies, although we do still have specialized DSPs for convolutions, I think.

How much power is consumed by CPUs vs. hard drives?

Date: 2006-06-24 10:44 pm (UTC)
From: [identity profile] ocschwar.livejournal.com
Same order of magnitude.

Date: 2006-06-24 10:55 pm (UTC)
From: [identity profile] ocschwar.livejournal.com
Regarding #1, I'll point you to the industry converging on the x86_64 instruction set, even though all the chips implementing it are astoundingly mediocre in design and suffer heat problems. It's amazing that after 30 years of effort to make it easy to port software from one instruction set to another (Unix & C, POSIX, Java, scripting languages ...), this is the point we've gotten to. x86_64 is winning over Itanium, which is less mediocre, while MIPS and related RISC instruction sets languish and die. What gives?

Now yes, you can still get POWER5 from IBM and the T1 servers from Sun, but look at server offerings everywhere and you will see Xeons upon Xeons. Something is wrong here.

FPGAs have their problems. Until producers supply each of these with a full implementation of BLAS, they will not be affordable, because in-house developers for them are rare and expensive. But I bet producers will do exactly what I just asked for in the coming years, at which point every matrix-handling application can use them.
