Re: Microsoft versus Digital Equipment Corporation

Group: alt.sys.pdp10
Author: Morten Reistad
Date: May 10, 2008 13:36

In article rcn.net>,
jmfbah wrote:
>Morten Reistad wrote:
>> In article rcn.net>,
>> jmfbah wrote:
>>> Based on my ignorance of hardware, it sounds like somebody has to
>>> reinvent the mass bus in the cache piece of gear.
>>
>> See my drawing in another post. This is what the front side bus
>> is doing. It can even be half a foot long, or more.
>
>Very nice drawing. Pictures replace hundreds of thousands of words.
>
>The drawing looked like two tightly coupled systems in a cluster
>with all comm and peripherals essentially one "wire". However I see no
>cache, or something, for the clusters to talk to each other except that
>memory that is far, far away. I'd put a to-be-run queue in a piece of
>memory that is shared by all cpus. hmmm...it's too bad a job's context
>can't be included in that queue list; with today's app bloat trend each
>context would be too big.

The memory in the drawing is shared by all cpus, but accesses
go through several layers of cache. High-end servers typically
add a sideways transport between the caches.

And yes, caches these days are very tightly associated with
physical cpus; free-standing caches are rare. But the memory
system can have lots of accelerators: you can write through
to a small chip-local cache on the memory side, and it will
update the shadowed DRAM independently.
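A toy model of such a memory-side buffer might look like the Python sketch below. The class name, the capacity and the drain-on-overflow policy are hypothetical illustrations, not how any particular chipset works: the cpu's write lands in a small buffer and completes at once, reads check the buffer first, and the buffer copies its contents to the slower DRAM on its own schedule.

```python
from typing import Dict, Optional


class MemorySideBuffer:
    """Hypothetical toy model: a small buffer on the memory side that absorbs
    writes and later updates the shadowed DRAM independently."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.pending: Dict[int, int] = {}  # address -> value not yet in DRAM
        self.dram: Dict[int, int] = {}     # the slow backing store

    def write(self, addr: int, value: int) -> None:
        # The cpu's write completes as soon as it lands in the buffer.
        self.pending[addr] = value
        if len(self.pending) > self.capacity:
            self.drain(1)  # make room by retiring the earliest-buffered address

    def read(self, addr: int) -> int:
        # A read must see the newest value, so check the buffer first.
        if addr in self.pending:
            return self.pending[addr]
        return self.dram.get(addr, 0)

    def drain(self, n: Optional[int] = None) -> None:
        # Copy up to n pending writes (all of them if n is None) into DRAM,
        # in the order their addresses were first buffered.
        for addr in list(self.pending)[:n]:
            self.dram[addr] = self.pending.pop(addr)


buf = MemorySideBuffer(capacity=2)
for addr, value in [(0x10, 1), (0x20, 2), (0x30, 3)]:
    buf.write(addr, value)
print(buf.dram)        # {16: 1}: the third write pushed the earliest entry out
print(buf.read(0x30))  # 3, served from the buffer
```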
>>
>> Note also from the drawing; all cpu devices can address all the stuff
>> on the front side bus; but due to caches and routing of interrupts
>> it is wise to stay on the same cpu if you have no good reason to
>> leave.
>
>Sure. I'm not saying that a job has to go to another cpu just because
>it's there. An idle cpu should be able to pick up where another
>cpu left off or when a device issues a done interrupt.

Yes, in principle any cpu can take over at any point, but if a
job or interrupt handler bounces around too much, it will lead
to a lot of cache thrashing. So the scheduler has a
sticky bias: it will reschedule a job on the same cpu if
there aren't any good reasons for moving it. Ditto for
interrupts; they can be assigned to specific cpus. This
leads to much better load balancing on a system. These things
can of course move around to avoid bottlenecks, but over
timescales of several milliseconds the irq and cpu affinities
are pretty stable.
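The sticky bias can be sketched as a placement rule: keep a job on the cpu it last ran on unless that cpu's run queue is longer than the least-loaded one by more than some threshold. The Python below is a simplified illustration under that assumption; the `imbalance` threshold and the function names are hypothetical, and real schedulers such as Linux's weigh many more factors. (On Linux, a job can also be pinned explicitly with sched_setaffinity(2), which Python exposes as os.sched_setaffinity.)

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    name: str
    last_cpu: Optional[int] = None  # cpu the job last ran on, if any


def pick_cpu(job: Job, queue_len: List[int], imbalance: int = 2) -> int:
    """Choose a cpu for `job`, preferring its previous one (hypothetical rule)."""
    least = min(range(len(queue_len)), key=lambda c: queue_len[c])
    if job.last_cpu is not None:
        # Stay put (warm cache) unless the load difference is too large.
        if queue_len[job.last_cpu] - queue_len[least] <= imbalance:
            return job.last_cpu
    # Migrate to the least-loaded cpu and accept a cold cache there.
    return least


def enqueue(job: Job, queue_len: List[int]) -> int:
    # Place the job, count it in that cpu's run queue, remember the choice.
    cpu = pick_cpu(job, queue_len)
    queue_len[cpu] += 1
    job.last_cpu = cpu
    return cpu


queues = [3, 1, 2, 0]
print(pick_cpu(Job("editor", last_cpu=2), queues))  # 2: stays, since 2 - 0 <= 2
print(pick_cpu(Job("batch", last_cpu=0), queues))   # 3: moves, since 3 - 0 > 2
```

Raising `imbalance` makes jobs stickier (fewer cache refills, worse balance); lowering it does the opposite.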
>>>> It has improved; the original 66 MHz Pentium had
>>>> a 33 MHz front side bus - now, the FSB has recently been upgraded to
>>>> as high as 1,033 MHz. But one has perhaps 4 GHz chips - or chips with
>>>> *four* 2 GHz processors on them, so the FSB speed has improved only
>>>> 1/2 or 1/4 as much as the speed of the CPU. (Actually, it's even worse
>>>> than this for some complicated technical reasons - back then, the
>>>> memory didn't have to be interleaved, but now it is.)
>>>>
>>>> And the fact that today's operating systems and multimedia
>>>> applications are demanding these *enormous* quantities of RAM - so
>>>> that a 16 MHz 360/195, a supercomputer in its day, is now "good
>>>> enough" for a pocket organizer these days - isn't helping.
>>> At some point in time, hardware gurus have to bite the bullet and
>>> stop fulfilling those demands. Then software will be forced to
>>> learn how to code efficiently. Code is sloppy today because
>>> nobody's had to deal, long-term, with a minimum of resources.
>>
>> We have reached the wall of cpu speed. Standard cpus have been at
>> somewhat below 3GHz for a long while now, but you can squeeze out
>> 4 or even 5 GHz at the price of a baking-hot device.
>>
>> Now they are tending to the bits they still can deal with. They
>> can make bigger&better caches, link them with hypertransport,
>> increase the speed (and shorten the distance) of the front side
>> bus; and make memory faster. But memory _LATENCY_ seems to be
>> a place where we are hitting the wall too.
>
>For this systems have to be watched to see what kinds of things
>have to be fetched from memory. Is it data or is it code? Is
>it shared by more than one [I cannot think of the correct word]
>job or process? Is it repeated? If so, what's the rate? Many
>times a second or an hour or a day?

It helps a lot to just utilise the caches as best we can,
even at the cost of a little cpu time. This is what sticky
schedulers are all about.
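Taking the front side bus figures quoted above at face value, the "1/2 or 1/4" claim roughly checks out. The short Python calculation below uses clock rate as a crude stand-in for speed and counts four 2 GHz cores as 8 GHz in aggregate; both are simplifying assumptions, not measurements.

```python
def growth(old_mhz: float, new_mhz: float) -> float:
    """Factor by which a clock rate has grown."""
    return new_mhz / old_mhz


fsb = growth(33, 1033)                 # front side bus: 33 MHz -> 1,033 MHz
cpu_one_core = growth(66, 4000)        # one 4 GHz core vs. the 66 MHz Pentium
cpu_four_cores = growth(66, 4 * 2000)  # four 2 GHz cores, counted in aggregate

print(f"FSB grew {fsb:.1f}x")
print(f"FSB growth / CPU growth, one 4 GHz core:   {fsb / cpu_one_core:.2f}")
print(f"FSB growth / CPU growth, four 2 GHz cores: {fsb / cpu_four_cores:.2f}")
```

This prints a bus growth of 31.3x and ratios of 0.52 and 0.26, i.e. roughly 1/2 and 1/4.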
>>> So we have probably two, maybe three, generations of software
>>> types who have never had to think about core limits. [Don't you
>>> others start yapping Unix at me...look at the apps that are running
>>> on top of it before you start with the same old stuff.]
>>
>> People dealing with servers serving millions of daily users
>> know to keep their code in shape. They aggressively work
>> to keep all active code inside the cpu cache; e.g. 8 megabytes, and
>> they think of memory as we used to think of swap space.
>
>Of course, they have to think that way. I've already adjusted my
>thinking to slot memory as far away as a swapfile on disk.
>It's that cache that needs to be thought of as memory like we used
>to think of memory. Adding a new layer of cache moves the concept
>of memory to a lower level...like card decks (slight exaggeration).

One amusing consequence is that we could make a blazingly fast
PDP-10. At the chip density of a current Xeon or Opteron, one chip
would fit 2-4 PDP-10s and 4-8 MW of static memory, and you could
use the normal memory as disk drives.
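For scale: a PDP-10 word is 36 bits, so the 4-8 MW of static memory above works out to 18-36 MiB, several times the 8 megabytes of cache mentioned earlier. The Python arithmetic below assumes "MW" means 2**20 words and that each word is stored in exactly 36 bits, with no padding to a power-of-two width.

```python
WORD_BITS = 36      # a PDP-10 word is 36 bits wide
MEGAWORD = 2 ** 20  # assumption: "MW" means 2**20 words


def words_to_mib(words: int) -> float:
    """Storage needed for `words` 36-bit words, in MiB (2**20 bytes)."""
    return words * WORD_BITS / 8 / 2 ** 20


for mw in (4, 8):
    print(f"{mw} MW = {words_to_mib(mw * MEGAWORD):.0f} MiB")
```

This prints 18 MiB and 36 MiB.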
>>>> Today's
>>>> computer scene is the *proof* that Parkinson's Law (in the generalized
>>>> form - tasks and demands expand to fill the available resources)
>>>> applies to computers.
>>> But it's worse. Nobody has ever had to experience a limit, so all those
>>> things we learned as we went along over the years have to be learned
>>> by the current coders overnight. This is going to be a culture shock
>>> of humungous proportions....something on the order of soccer moms
>>> having to go out and shoot, butcher and cook dinner.
>>
>> We are meeting that limit real soon now.
>
>You have been saying that for 5-8 years. It sounds like there is going
>to be a big puff of smoke when the requirements all collapse at the same
>time. It will make the Y2K mess, looking back, seem like a child's
>game.

We have already met the limit on clock speed for a processor. We
still make headway on energy density and on the number of
units we can fit on a chip, but we are slipping off the curve of
Moore's law, at least on the latter. We are keeping up on
I/O and memory access bandwidth, but we have hit the wall on
memory latency. Caches are still improving, though.
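One standard way to see why better caches still help after DRAM latency has stopped improving is the average memory access time, AMAT = hit time + miss rate × miss penalty, applied level by level. In the Python sketch below, the latencies (in cpu cycles) and miss rates are made-up illustrative values, not measurements of any real chip: halving the L2 miss rate cuts the average from 8.0 to 6.5 cycles while the 300-cycle DRAM latency stays fixed.

```python
from typing import List, Tuple


def amat(levels: List[Tuple[float, float]], memory_latency: float) -> float:
    """Average memory access time in cycles.

    levels: (hit_time, local_miss_rate) per cache level, innermost (L1) first.
    memory_latency: cost of going all the way out to DRAM.
    """
    penalty = memory_latency
    # Start from the outermost cache: each level's miss penalty is the
    # AMAT of everything beyond it.
    for hit_time, miss_rate in reversed(levels):
        penalty = hit_time + miss_rate * penalty
    return penalty


DRAM = 300  # hypothetical DRAM latency in cycles, assumed not to improve
before = amat([(4, 0.05), (20, 0.20)], DRAM)  # hypothetical L1 and L2 figures
after = amat([(4, 0.05), (20, 0.10)], DRAM)   # same, but with a better L2
print(f"{before:.1f} -> {after:.1f} cycles")
```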

On a timeline, I would expect things to happen in roughly
this order:

Clock speed max               : reached in 2006
I/O and memory latency        : reached in 2008
Units per chip                : still 3-4 years left; 2011?
Energy density per MIPS       : approaching a limit; 2013?
# of full processors per chip : 2013?
RAM size                      : ?

If things go the way they look now, we will end up with a
standard processor of 3.5 GHz, addressing 16 GB of RAM,
with 64 MB of L2 cache, 4-way chips, unified cache, hypertransport,
and using 40-160 watts per chip.

Such a system is extremely powerful for all the normal uses,
except where we battle P vs. NP or O(exp(n)) problems.

-- mrr