Lisp like filtered container views in C++

January 3, 2017

Timo Geusch

6-Minute Read

Lisp dialects like Clojure have a very rich set of algorithms that can present altered views on containers without modifying data in the underlying container. This is very important in functional languages as data is immutable and returning copies of containers is costly despite the containers being optimised for copy-on-write. Having these algorithms available prevents unnecessary data copies. While I am not going into mutating algorithms in this post, the tradition of non-modifying alghorithms that work on containers leads to an expressiveness that I often miss in multi-paradigm languages like C++. As an example I will show you how to use a filtered container view in C++ like you would in Clojure.

Clojure example

Let’s assume that we have a vector of integers and want to sum up only the odd values. Yes, this is your typical contrived example but bear with me. In Clojure, you would write the code something like the code below with a filtered container view only showing the elements we are interested in:

(def numbers [ 1 2 3 4 6 8 9 11 14 15 19 ])

(reduce + (filter odd? numbers))

Let’s take this apart.

(def numbers …) creates a vector of integers. No surprises here, even if the syntax might look a little odd to C++ programmers.

The second line is a little more interesting. As with any Lisp dialect you have to read it inside out. If you do, the code decomposes into two stages:

(filter odd? numbers) gives us a lazy sequence containing only the odd numbers in the ’numbers’ vector. There is no copying of the data in underlying containers, the lazy sequences yields a new element until the filtered view of the underlying container is exhausted.
(reduce + …) applies the function ‘+’ to the sequence. Yes, that is reduce as in the second part of the map-reduce algorithm. I could have used apply instead of reduce and ended up with the same result, but I prefer reduce in this case as reduce is more descriptive for the operation we’re undertaking.

But what’s wrong with writing the following code in C++?

In C++, you typically would write the above in an imperative fashion, maybe like so:

int numbers[] = { 1, 2, 3, 4, 6, 8, 9, 11, 14, 15, 19 };

int sum = 0;

for (auto number : numbers) {
  if (is_odd(number))
    sum += number;
}

So what is wrong with the C++ code? On the face of it, nothing, at least if you have a C++ 11 compatible compiler. It’s legal, comprehensible C++ and produces the desired output. We’ve all written a large number of loops like this, over and over again.

What’s wrong with it is that it lacks the expressiveness of the Clojure version of the code.

Notice how much more compact the Clojure version is? And that is with me excluding the definition of is_odd() to make the comparison a little fairer to C++, as C++’s standard library doesn’t contain a built-in equivalent of Clojure’s odd? predicate. Yes, is_odd() is easy to write, but if we add it to the mix, the difference in code size and expressiveness is even more apparent:

template <typename T>
T is_odd(T const n) {
  return (n % 2) == 1;
}

int numbers[] = { 1, 2, 3, 4, 6, 8, 9, 11, 14, 15, 19 };

int sum = 0;

for (auto number : numbers) {
  if (is_odd(number))
    sum += number;
}

If you are counting, the example above contains 10 lines of code, not including the blank lines. It accomplishes the same as the two lines of Clojure above. In all likelihood the C++ code executes faster, but that really only matters in small parts of your code.

Also keep in mind that the 10 lines are counted without all the additional code that we would want around is_odd() to prevent it from being applied to types that aren’t numeric. Trying to ensure that we receive a compilation error when we try to apply is_odd() to, say, a string, will result in additional template voodoo involving SFINAE and type traits. I’ve written code like that and while it works and performs well if done correctly, it’s what I call “three Aspirin code” - one while you’re writing it and two more when you look at it six months later.

Boost to the rescue

Like most C++ developers, when I want to use idioms or functions that are not part of the standard library or the C++ core language, the first (and most likely, last) place I look are the Boost libraries. Unsurprisingly the Boost.Range library has adapters that do exactly what I wanted. Specifically, the boost::adaptors::filtered range adaptor. Using that, we can write the code above in a much more functional and concise manner:

struct is_odd2 {
  bool operator()(int n) { return (n % 2) == 1; }
};

int numbers[] = { 1, 2, 3, 4, 6, 8, 9, 11, 14, 15, 19 };

auto sum = boost::accumulate(numbers | boost::adaptors::filtered(is_odd2()), 0);

OK, so we have to create an is_odd2 functor instead of the simple is_odd() function because that’s what C++ algorithms expect. For simplicity I created a single int-specific version of is_odd2, but in production code you would most likely templatise it as I did in the original is_odd() example.

For readability, I much prefer the second version. This might be personal preference, but there are a couple of important points:

The second version is much more descriptive. It does require you to either be familiar with Boost.Range or be willing to read up on it, but it states intent much clearer than the hand-rolled loop.
Less code wins in the maintainability stakes. Unless you pick the wrong algorithm, in all likelihood you have fewer bugs in 20 lines of code than in 200. Also, writing less code allows the reader of the code - which might well be your future self - to see more code at the same time, which improves the overall understadability of the code. Admittedly, if you achieve your goal of writing less code by using libraries this only holds true for well tested libraries.

Summary

The pre-canned algorithms that are part of the C++ standard library and their equivalents in Boost are one of the most underused C++ features. For some reason, a lot of C++ developers I know aren’t very familiar with these libraries and as a result, tend to re-implement existing algorithms using hand rolled versions. Yes, the range for loop in combination with auto has at least simplified the original C++ for loop when containers are involved, but it’s still less descriptive than just writing boost::accumulate (or std::accumulate for that matter).

The Lone C++ Coder's Blog