Perilous Parallel.For

I’ve been playing a lot with the Task Parallel Library because I’m really excited about the new Task-Based Asynchronous Pattern. Someone on the VB forum asked a question that was more or less about parallelizing a task, and I endeavored to make an example that compared the performance of some approaches.

When I got to the one that used Parallel.For, I started having some random incorrect results. Here’s some code that reproduces what I was seeing:

using System.Threading.Tasks;
using System.Threading;

static class Class1
{
    private const int NumberOfValues = 100;

    public static void Main()
    {
        int[] values = new int[NumberOfValues];
        int expectedSum = 0;
        for (int i = 0; i < values.Length; i++)
        {
            values[i] = i;
            expectedSum += i;
        }

        for (int i = 0; i < 10; i++)
        {
            int sum = 0;
            Parallel.For(0, values.Length, (int value) =>
                {
                    sum += value;
                    Thread.Sleep(10);
                });
            System.Console.WriteLine("{0}/{1}", expectedSum, sum);
        }
    }

}

The frequency of incorrect results seemed to depend on the presence and length of the Sleep() call. This told me it was most definitely a problem with concurrent access to the sum variable. My first attempt to fix it involved making a class with a lock:

class Sum
{
    private readonly object LockObject = new object();
    private int _sum;

    public int Sum
    {
        get
        {
            lock(LockObject) { return _sum; }
        }
        set
        {
            lock(LockObject) { _sum = value; }
        }
    }
}

It didn’t help. This is where I actually started converting the project to C# and planning out a StackOverflow post, but after a few more minutes of thought I figured out what was happening.

When sum += value executes, it looks like one atomic step, but it’s not. It has to execute in at least two steps: it must fetch the current value of sum, it must apply the + operator to that result, then it must assign the result to sum. There is a potential for some thread to assign a new value to sum after some other thread has retrieved it; this will cause some results to be lost. The Sum class above didn’t solve this because the thread synchronization only blocks concurrent access to the get and set individually. To solve the problem, I had to block access to the get and set simultaneously:

// ...snip

for (int i = 0; i < 10; i++)
{
    _sum = 0;
    Parallel.For(0, values.Length, (int value) =>
        {
            lock (LockObject)
            {
                _sum += value;
            }
            Thread.Sleep(10);
        });
    System.Console.WriteLine("{0}/{1}", expectedSum, _sum);
}

This works as expected (I later updated it to use Interlocked.Add()). But if you look closely, it seems to destroy the concurrency! The point of the code is to sum an array of values, but if each thread can only execute serially there’s no value to having threads. What gives?

I didn’t believe summation is impossible to parallelize because addition is commutative. However, since adding a value to the sum must be atomic, I feel like it might not be something that can be done in parallel. When I remove the Sleep() call, this approach is faster than a single-threaded approach but much slower than some other approaches I took using the thread pool; those let each thread tally its own sum then added each thread’s individual sum at the end. I wish I could think of a way to do that elegantly with Parallel.For().

With the Sleep() calls inserted, Parallel.For() is dramatically faster than any of the other approaches I tried; I assume this is because the Sleep() calls don’t block concurrent access and the TPL might be shuffling tasks out if they’re idle. While the Sleep() call may seem useless, it could represent 10ms worth of work that I have to do once I’ve updated _sum; if I cache any values needed in the lock statement I could still gain some benefit from working with the cached values outside of the lock statement.

So if you’re using Parallel.For(), be mindful of whether any variables you touch are safe for concurrent access. If they aren’t, consider different approaches.