Map/reduce and the value of a good range library

I happened across an article extolling the virtue of map/reduce. Unfortunately, it only used map and reduce, and the result was kind of janky. Worse on average than using a for loop. Let’s look at what we can do better.

Today, we’re going to be using the D programming language, since it’s probably the most expressive language in this area. The specific problems we have to solve have to do with summing some numbers that have been filtered by some external function — isPrime in the example.

Sum of the prefix of an array

The first task is simply to sum the first ten items of an array:

The recommended solution:

return array
    .filter((e, i) => i < 10)
    .reduce((a, e) => a + e, 0);

This is awkward. There’s a filter applied to every element in the array, there’s a reduce function…if you’re keen on basic functional programming, you can understand it quickly, but I’d still want a comment.

And performance-wise, I hope the array is short, because this is going to look at every element in the array. If it’s a million items, you’re iterating over all million items.

In imperative D:

ulong result;
foreach (item; array[0..10])
    result += item;
return result;

That’s not so bad, is it?

And with ranges:

return array.take(10).sum;

Well, it’s hard to beat that.

Sum the first ten primes

This is like the previous task, just with a filter. This is where the functional way of handling lists should start to shine, even in javascript, right?

Here’s the recommended solution again:

return array
    .filter(isPrime)
    .filter((e, i) => i < 10)
    .reduce((a, b) => a + b, 0);

Well, it’s unfortunate that we haven’t managed to improve on the “take the first ten items” thing. Let’s try it in imperative D:

ulong sum = 0;
ulong primes = 0;
foreach (v; array)
{
    if (isPrime(v))
    {
        primes++;
        sum += v;
        if (primes >= 10) break;
    }
}
return sum;

Not great. Not terrible, not great.

So let’s introduce ranges:

return array
    .filter!isPrime
    .take(10)
    .sum;

Again, it’s hard to imagine doing any better.

Add in a multiply

This time, instead of just adding up the primes, we also want to multiply the prime by the difference between it and the previous prime.

Again, let’s check the recommended solution:

return array
    .filter(isPrime)
    .map((e, i, a) => i > 0 ? e * (e - a[i - 1]) : e)
    .filter((e, i, a) => i >= a.length - 10)
    .reduce((a, e) => a + e, 0);

Holy hells that’s unreadable! My eyes are literally bleeding right now.

Let’s try to do it in imperative D:

ulong result = 0;
ulong lastPrime = 0;
uloung count = 0;
foreach (v; array)
{
    if (!isPrime(v)) continue;
    if (lastPrime == 0)
    {
        result += v;
    }
    else
    {
        result += v * (v - lastPrime);
    }
    lastPrime = v;
    count++;
    if (count >= 10) break;
}
return result;

This is a little verbose, but it’s clear.

Let’s try it in pure range operations:

auto primes = array.filter!isPrime.take(10);
auto previousPrime = primes.save.take(9);
return primes.front +
    lockstep(primes.skip(1), previousPrime)
        .map!(x => x[0] * (x[0] - x[1]))
        .sum;

Okay, that’s kind of complex. Let’s break it down.

The first prime is a special case. We’re going to handle that separately.

For all other primes, we’re handling them in the last three lines. We’re going to join each of them with its predecessor, and we’re going to take the product of the prime with the difference between it and its predecessor.

So that’s joining primes 1 through 9 with primes 0 through 8 and doing a bit of simple math. Then we sum them up.

Now we’ve got to bring in the first prime again, since it was a special case. The value we need to add is just the prime itself, because that’s how the problem was defined.

This is kind of awkward. Is there anything better we can do? Why, yes! We can do a hybrid!

auto primes = input.filter!isPrime.take(10).array;
auto result = primes[0];
foreach (i, p; primes[1..$])
    result += p * (p - primes[i - 1]);
return result;

By explicitly introducing and iterating through the primes array, we can quickly and clearly refer to the previous prime number. The result is both concise and obvious.

Summing the last ten primes instead

This version’s the same as the previous problem, but we want to run through the last ten values.

The recommendation is:

const result = array
    .filter(isPrime)
    .map((e, i, a) => i > 0 ? e * (e - a[i - 1]) : e)
    .filter((e, i, a) => i >= a.length - 10)
    .reduce((a, e) => a + e, 0);

So we take all the primes, figure out the value we’d want to add to the sum, filter out all but the last ten, and take the sum. That just sounds inefficient. And ugly.

Okay, so let’s do this up! In the imperative style:

ulong sum = 0;
ulong lastPrime = 0;
ulong primes = 0;
for (int i = array.length - 1; i >= 0; i--)
{
    if (array[i].isPrime)
    {
        if (lastPrime != 0)
        {
            sum += lastPrime * (lastPrime - array[i]);
        }
        lastPrime = array[i];
        primes++;
        if (primes >= 11) break;
    }
}
if (primes < 11) sum += lastPrime;
return sum;

So here we've got relatively direct logic: we'll look through the array, end to beginning, to look for up to eleven primes. This lets us calculate the appropriate values for the last ten. When we encounter a new prime, we have enough data to figure out the right value for the previous one.

If we reach the beginning of the array without having found an eleventh prime, we haven't dealt with the tenth prime. So we go back to our special case from the problem definition and just add the tenth prime itself.

How about a more range-oriented style?

auto primes = array.reverse.filter!isPrime.take(11).array;
ulong sum = 0;
foreach (i, p; primes[0..$-1])
{
    sum += p * (p - primes[i + 1]);
}
if (primes.length < 11) sum += primes[$-1];
return primes;

Again, the parts that need to deal with array indexing go into procedural, imperative code, while the parts that deal with mapping and selecting parts of the sequence use the range-oriented API. And base cases and exceptional parts of the algorithm are called out explicitly.

Can we improve ranges?

What's missing for making the range versions better?

The D range API contains a function chunks. We can use it to iterate through each successive pair of items in an array. (Or each successive triplet, or quadruplet, or so on.) However, this gives us distinct chunks. So taking pairs from the list of ten primes gives us five pairs -- first and second, then third and fourth.

What we need here is a sliding window. We could then write the last example as:

auto primes = array.reverse.filter!isPrime.take(11).array;
auto result = primes
    .save
    .slidingWindow(2)
    .map!(x => x[0] * (x[0] - x[1]))
    .sum;
if (primes.length < 10)
{
    result += primes[$-1];
}
return result;

Everything except for the exceptional case is handled succinctly.

Conclusion

Ranges and the map/reduce style can give you very succinct code that's still quite understandable. However, aside from the most simple cases, map/reduce/filter alone produces rather obtuse and inefficient code. This style of programming relies on having a rich library of range-oriented functions at hand.