This is a thread to discuss various systems for writing parallel programs, at any level of abstraction, from the bare machine to a high-level language. It seems reasonable to say that most of us find it easiest to think sequentially when programming. It's somewhat difficult to consider machines in which every instruction operates ``at once'', at least for complex tasks. Even cells have some manner of sequential behavior.
It's certainly easiest for me to consider parallel operation as sequences of sequential instructions happening in not necessarily meaningful orders; how do you consider parallel operation most easily?
Which method(s) do you prefer to write a parallel program?
Do you prefer dataflow, where you build pipelines of sorts? Is it instead the event-driven method, in which one provides an interface and actions and largely leaves it at that? Is simple multi-threading, needing no introduction, your favorite? Do you prefer to ignore these details and let the system parallelize your programs for you? Do you prefer a system or method I've not mentioned? If so, do explain it.
An example I like of a program in which every instruction executes ``at once'' is one that approximates a sum: each instruction calculates one individual term and adds it to the same location. This ignores the cost of populating memory with these instructions.
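A rough sketch of that idea in Python, assuming threads stand in for the individual instructions and a lock makes the add to the shared location atomic. The names `parallel_sum` and `leibniz_term` are made up for illustration, and the Leibniz series is just one convenient sum to approximate. The order in which terms get added is unspecified, which is the point. Note that CPython threads don't actually execute bytecode in parallel, so this shows the structure rather than a speedup.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(term, n_terms, workers=8):
    """Sum term(0) .. term(n_terms - 1), one independent task per term."""
    total = 0.0
    lock = threading.Lock()  # guards the single shared location

    def add_term(k):
        nonlocal total
        value = term(k)  # each task computes its own term independently
        with lock:       # the read-modify-write on the shared location must be atomic
            total += value

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consume the iterator so an exception raised in any task propagates.
        list(pool.map(add_term, range(n_terms)))
    return total


def leibniz_term(k):
    """k-th term of the Leibniz series for pi/4 (an arbitrary example series)."""
    return (-1.0) ** k / (2 * k + 1)


if __name__ == "__main__":
    print(4 * parallel_sum(leibniz_term, 2000))  # a rough approximation of pi
```

Because floating-point addition is not associative, the last few digits may differ from run to run depending on the order in which the tasks happen to finish.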
I am a fan of simple multithreading, but with green threads (like Haskell's threads, Erlang's processes, or Go's goroutines).
There are two types of threads a program may use: operating system threads, managed by the kernel; and green threads, managed by some userland library.
- Context-switching between OS threads is quite expensive, which is why the conventional wisdom is to do things like have one thread per core and distribute work between them, or have some other form of small thread pool. After a point, the more OS threads you have, the worse your program performs: it's spending so much time context-switching that it's getting nothing done!
- On the other hand, context switching between green threads can be incredibly cheap, because your program doesn't need to go all the way to the kernel and its scheduler. It's even possible to do more intelligent scheduling, based on knowledge of the program's semantics that the OS scheduler doesn't necessarily have. Green thread implementations generally use one OS thread per core and multiplex the green threads onto them. Because this multiplexing is transparent to the programmer, you can create thousands (or millions!) of green threads on the same machine, with good performance.
Multithreading is much simpler when you don't need to keep track of how many threads you have, which cores they are running on, and what work they are doing. For example, in any sort of server program, you can just fork a new thread per request. This is the simple and obvious behaviour which people start out with, but switch away from due to the pitfalls of OS threads.
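To give a feel for the "one thread per request" style, here is a minimal sketch using Python's asyncio tasks as a stand-in for green threads. This is only an analogy: asyncio tasks are scheduled cooperatively on a single OS thread, whereas Haskell threads, Erlang processes and goroutines are spread over several cores by their runtimes. `handle_request`, `serve` and `run` are hypothetical names, and the sleep simulates waiting on I/O.

```python
import asyncio


async def handle_request(request_id):
    """Hypothetical request handler: waits on simulated I/O, then replies."""
    await asyncio.sleep(0.01)  # stands in for a database or network call
    return f"response {request_id}"


async def serve(n_requests):
    # One task per request; the event loop multiplexes them onto one OS thread.
    tasks = [asyncio.create_task(handle_request(i)) for i in range(n_requests)]
    # gather returns the results in the same order as the tasks were passed in.
    return await asyncio.gather(*tasks)


def run(n_requests=10_000):
    """Spawn n_requests userland tasks and collect their responses."""
    return asyncio.run(serve(n_requests))


if __name__ == "__main__":
    print(len(run()), "requests handled")
```

Creating ten thousand OS threads for the same job would be far heavier, which is the pitfall described above.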
>>692
>There are two types of threads a program may use: operating system threads, managed by the kernel; and green threads, managed by some userland library.
You don't know what you're talking about. There are two kinds of threading models. One is thread locking, where only one thread can run at a time while the other threads have to wait. All operating systems use this concurrency model, and it obviously doesn't scale to multiple servers.
The other is message passing, where many threads can run at the same time, and this can scale across many server nodes.
There are Big Data systems like Hadoop that take message passing to huge levels. They break up search tasks on redundant data and perform them across multiple servers.
>>698
What are you talking about?
>locking
If one thread is running, another cannot run on the same core. This is obvious. I didn't contradict that.
>multiple servers
When did I say anything about multiple servers?
>message passing
Message passing is a way that threads may communicate, it has nothing to do with how the threads are actually scheduled.
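A small Python sketch of that last point, with made-up function names. Both versions start the same kind of OS threads, scheduled in exactly the same way. The only difference is how the threads communicate: through lock-protected shared memory, or by sending messages to a queue.

```python
import queue
import threading


def count_with_lock(n_workers, n_increments):
    """Workers communicate through shared memory guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(n_increments):
            with lock:  # only one worker may update the shared counter at a time
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def count_with_messages(n_workers, n_increments):
    """Workers communicate by sending messages; only the caller owns the total."""
    inbox = queue.Queue()

    def worker():
        for _ in range(n_increments):
            inbox.put(1)  # send a message instead of touching shared state

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    total = 0
    # All senders have finished, so the inbox can be drained without blocking.
    while not inbox.empty():
        total += inbox.get()
    return total
```

Both functions return `n_workers * n_increments`; swapping one communication style for the other changes nothing about which thread runs when.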
>>700
>Message passing is a way that threads may communicate, it has nothing to do with how the threads are actually scheduled.
That's true, but the point I am making is that thread locking always depends on a scheduler on one computer, and message passing does not.
I've had some experience writing parallel programs as part of my degree. I've used OpenMP, MPI, OpenCL and CUDA (and also had a brief look at pthreads and C++11 threads). I find MPI hard to deal with, but the performance benefits are huge if you can use multiple CPUs. OpenMP is usually the easiest to use, but I have been able to get better performance out of OpenCL (on the same CPU), provided the problem size is large enough to outweigh the higher overhead. The obvious problem with CUDA is that it's proprietary, and the problem with OpenCL is that most implementations are proprietary (I've never got Beignet to work).
Honestly, now that I'm used to the OpenCL API, I'd much rather use it than anything else, especially for one device. I guess for multiple CPUs MPI is king.