Re: Concurency in the functional world
  Home FAQ Contact Sign in
comp.lang.functional only
 
Advanced search
POPULAR GROUPS

more...

 Up
Re: Concurency in the functional world         

Group: comp.lang.functional · Group Profile
Author: Neelakantan Krishnaswami
Date: Nov 19, 2007 15:05

In article <<7x4pfjqmzy.fsf@ruckus.brouhaha.com>>,
Paul Rubin <> wrote:
> Neelakantan Krishnaswami cs.cmu.edu> writes:
>> 1. Message-passing with ownership transfer. Here, a message is
>> constructed as a data structure, and the ownership of the memory
>> associated with it is transferred from one process to another
>> when the message is sent.
>>
>> #2 is what you do when you don't have the type structure to enforce
>> safe ownership tranfer, and if you're sending large messages back and
>> forth it can be painful to copy a message.
>
> Hmm, you're saying the message is some big data structure consed out
> of nothingness? If yes, is that better than copying? If not, what
> happens if it shares structure with something else in the local
> process?
>
> This is really interesting, are you saying #1 uses something like
> linear types to keep track of which data belongs to which process?

Yes, that's right. That's what maintains strong process isolation.

To understand why isolation is important, it's good to look at an
idealized model of a multicore machine:

CPU 1 . . . CPU n
| |
Cache 1 Cache n
| |
... global memory ...

So we have a global memory, which n CPUs act on. The CPUs each have a
cache to speed access to the global memory. The existence of a cache
means that whenever a CPU does a write to a memory location, all of
the other cores need to be notified, in case they also have that
location in their cache. Otherwise the CPUs might not see a consistent
view of the heap.

To make this communication fast, each CPU needs to be connected to all
the others, which is an O(n^2) dependency. This is an awful lot of
communication to require on every write, which is why hardware vendors
have been steadily weakening the memory consistency guarantees they
offer us programmers. Another way of describing this is that the
abstraction of global shared memory requires a lot of message passing
under the hood -- communication that a message passing architecture a)
makes explicit, and b) potentially lets you avoid doing when you don't
need to.

Now, note that if each CPU has exclusive access to part of the global
heap, then there are no cache consistency problems, since we know that
other CPUs will not make use of that part of the heap. In Singularity,
when processes communicate, they transfer ownership of the message
data when they send a message. This means that even though some data
may be in a process's cache, it won't look at it again in the future.

This is all important, because at the level of hardware, even
functional languages do a lot of mutation. Whenever you use a copying
garbage collector, the pointers representing values change, which
looks like an update from the point of view of a cache!

Erlang doesn't have linearly typed messages, so at a conceptual level
it copies the whole message from one process's heap into another
process's. This also maintains the same process isolation invariant,
at a slightly higher cost.

Now, in the short term, our hardware does have a lot of cache
consistency circuitry, and it's foolish to ignore it. For example,
Thomas Lindgren mentioned in another post that some implementations of
Erlang use a global heap under the covers, and don't actually copy
messages sent between processes on the same node.

This state of affairs won't last, though, and it's unwise to base a
language design on the idea it will always be there. Erlang is future
proof in this regard -- the more I learn about it, the more its design
impresses me. It's hard to make a language that can survive profound
architectural changes with its performance model intact!
>> 3. Message-passing with purely functional data and a shared heap.
>> Message passing is just as cheap as it is in 1, but you need
>> parallel garbage collection to maintain the fiction of a
>> global heap. GC then becomes the (only) concurrency bottleneck
>> in your program.
>
> I agree this is very clean conceptually. But, I think you either need
> a lot more messages than shared-memory approaches would need, or else
> you have to push what would be local computation into the remote
> process, sacrificing parallelism.

Message passing only makes explicit the communication that shared
memory is already doing.

Note that when I say "bottleneck", I mean that there's a global,
implementation-imposed limit on the parallel speedup a program can
achieve. So even if you have a ridiculously parallel problem, you
might not be able to get full utilization.

Any shared-memory design will face this as a fundamental problem.
Even languages that are purely functional at the user level may face
this problem if they need garbage collection.
>> 5. Software transactional memory. You have shared memory and threads,
>> and optimistic concurrency control to detect interference and
>> rollback and restart ruined computations. The expense is the
>> highest here, since the primitives have to do so much work, and
>> since their efficient behavior relies on very particular
>> assumptions about the workload.
>
> In the uncontended case the primitives don't do much, and measurements
> in the original papers show STM beats the traditional locking approach
> on some big SMP machines. Maybe there are specific workloads that can
> make STM lose to message passing.

Any time the assumptions underlying optimistic concurrency control
break, STM loses big time. The easiest way to lose is if you have any
long-lived transactions. This makes it quite dangerous to actually
*use* the compositionality of STM. :(

--
Neel R. Krishnaswami
neelk@cs.cmu.edu
no comments
diggit! del.icio.us! reddit!