What Makes Parallel Programming Difficult?
Posted: Wed Jun 01, 2011 2:10 pm
Great post over at /.
Intel's Aater Suleman writes about why parallel programming is difficult. ... I was unaware ... that a major challenge in multi-threaded programming lies in optimizing parallel programs not just getting them to run. His analysis is insightful and the case study is very enlightening if you are unfamiliar with parallel code debugging.
Pretty good intro to this stuff.Some jobs are easy to parallelize, e.g., if it takes one guy 8 hours to paint a room, then two guys working in parallel can paint it in four hours. Similarly, two software threads can convert a picture from color to grayscale 2x faster by working on different halves of the picture concurrently.
There also other programs that are sequential in nature, e.g., two guys will not be able to cook 2x faster than one guy because the task isn’t fully parallelizable: there are inter-task dependencies and the cooks end up waiting for each other at times. Unfortunately, a lot of programs have artificial inter-task dependencies because the programmers wrote them with an ST mind set. For example, consider this code excerpt from the H.264 reference code (I have removed unnecessary details to highlight my point):Notice how the variable mb is written every iteration and no iteration uses the mb written by the previous iterations. However, mb was declared as a global variable probably to avoid its repeated allocation and deallocation. This is a reasonable ST optimization. However, from an MT standpoint, the loop iterations of the OUTER loop now have a dependency among each other and they cannot be run in parallel. To parallelize this code, the programmer has to first identify that the dependency is artificial. He/She then has to inspect 1000s of lines of code to ensure that this assumption isn’t mistaken. Lastly, he/she has to change the entire code to make mb a local per-iteration variable. All this is difficult to achieve (I parallelized H.264 for this paper).macroclock_output_data mb; //Global variable
for (...) // The OUTER loop
decode_slice(...);
if(mb->last_mb_in_slice)
break;
void decode_slice(...)
...
mb = ...