Same old tools, new uses

In case you missed it: Part 1

This is part 2 of my series on functional programming concepts for OO / imperative developers. If you’re a not really clear what imperative means, I’d direct you to Part 1 where I attempt to illustrate some of the conceptual differences between imperative and functional programming. In this part we’re going to take the two tools we established in the previous post and start doing things with them.

Loops

How the *%^@ are you supposed to iterate without mutating a variable? This was my first reaction as well, I think it’s a fairly universal reaction. Let’s take a look at our arsenal of traditional tools for dealing with iteration.

For loops

for (int i = 0; i < 10; ++i) {
    // do something interesting here
}

We declare the variable that should be used to keep track of our progress iterating through the sequence. I’m sure you’ve written such a line thousands if not tens of thousands of times throughout your programming career, it’s canonical, it’s idiomatic, it’s simply the way things are done. It definitely involves mutating a variable many times.

If we wanted to do the same thing to a collection that supports direct indexing such as an array or List<T> then we could basically do the same thing.

int[] numbers = new [] { 43, 75, 6, 2, 91, -5 };

for (int i = 0; i < numbers.Length; ++i) {
    // do something with numbers[i]
}
List<int> numbers = new List<int> { 43, 75, 6, 2, 91, -5 };

for (int i = 0; i < numbers.Count; ++i) {
// do something with numbers[i]
}

This sort of fails though for any collection beyond your standard lists and arrays. Stuff like LinkedList<T> or any tree type structure won’t be indexable this way and although Dictionary is indexable, it doesn’t use a pre-defined sequence. There’s a pretty standard solution to this, the Iterator pattern. Sure enough, all the collections in C# implement IEnumerable which has 1 method, GetEnumerator which returns our iterator. We call MoveNext on the IEnumerator and it knows how to go to the next element in the collection and we call Current to get the current element out of the collection.

LinkedList<int> numbers = new LinkedList<int>(new [] { 43, 75, 6, 2, 91, -5 });

for (IEnumerator<int> iter = numbers.GetEnumerator(); iter.MoveNext();) {
    int num = iter.Current;
    // do something with num here
}

Not quite as nice as the for loop with all that getting of the IEnumerator, but it works for Arrays, Lists, LinkedLists, Dictionaries, and well, anything that implements IEnumerable (which is everything collection’ish). This version like the last has a counter type variable that we keep doing an operation on to move to the next element. Where with the for loop we were incrementing the i variable, with the Enumerator we’re calling MoveNext. You could argue that this is a lot nicer since the actual mutation bits happen inside the Enumerator but it’s still pretty clear you’re telling an object to mutate to the next value. In C++ you actually use the ++ operator to move the iterator to the next element.

Now perhaps your language has gone out of its way to wrap up this concept in some syntactic sugar, C# foreach is a good example of this. This allows us to use the IEnumerator to traverse the list without having to see any of that gunk.

LinkedList<int> numbers = new LinkedList<int>(new [] { 43, 75, 6, 2, 91, -5 });

foreach (int num in numbers) {
    // do something with num here
}

This is much nicer, and has a lot less boilerplate than the initial for loop. This version doesn’t have variables floating around getting updated so that part is cleaned up. Generally I would be fairly happy with this solution in a language, it is certainly “good enough” for every day iteration needs.

Loops without loops

We still haven’t answered the question of how you iterate without ticking some type of counter up and up and up until you hit your end condition and stop looping. The answer, and I’m sure you’ve seen it before, is recursion. Before you look at this code, know that this is not how I would advocate writing all your loops, as if you should ban for and foreach and copy paste this little snippet around everywhere. This activity is about spotting patterns so that we can leverage some proper reuse of concepts not about writing the most optimized version of this particular iteration.

void DoSomething(int[] numbers, int index) {
    if (index == numbers.Length) {
        return; // if we are at the end of the list, stop recursing
    }

    // do something with numbers[index] here
    // recursively call ourselves with the next index
    DoSomething(numbers, index + 1);
}

var nums = new [] { 1, 4, 5, 6, 7, 12, 1 };
DoSomething(nums, 0);

So what on earth is going on here? We have a function that checks to see if the value of its index variable is equal to the end of the list, if so it returns without doing anything. If not, then we can proceed to do whatever it is we want to do as part of this iteration. We then proceed to go to the next element in the collection by calling the same function again but with an index of index + 1. You’ve probably seen something like this before as a canonical definition of recursion, usually it’s factorial but it comes up a lot when iterating over tree type structures as well. It’s not the prettiest thing, but hey, we iterated over a collection and did not in fact mutate a variable! What we did was allow the local state of the function we called over and over hold the current state for us. If you think of the stack as we recursively call the DoSomething function over and over, each stack frame is effectively what the i variable was in the for (int i = 0; i < numbers.Length; ++i) example.

<DoSomething index = numbers.Length>
...
<DoSomething index = 2>
<DoSomething index = 1>
<DoSomething index = 0>
<Main>

Obviously having to create a method and pass in 0 as the start point is not very nice, although the 0 for the initial parameter is pretty analogous to the int i = 0 we use in the for loop. Now I’m not advocating that recursive iteration is the only way to do things or that it is inherently superior to say, the iterator pattern used by C#’s foreach, but it is a way to iterate without any of your mutable state “leaking out” of your iteration. A very common misconception about functional programming is that it contains no state and that it doesn’t allow any mutation of state. A more accurate way of describing the functional approach would be that state is contained in the local scope of functions. Once you’re in that world where your functions aren’t having side effects and you can get values in and out of those functions, the immutability of everything inside that function becomes rather academic. So, we could move forward in a functional style and still use foreach and everything is hunky dory as long as we stick to the pure function and only working with values parts.

And now it doesn’t even have a body

In all of the examples we’ve done so far, whether using for, foreach, or a recursive function, one thing has been constant. We’ve done a lot of ceremony and work just to execute our actual logic. Wouldn’t it be nice if we never had to write a loop again, if we could boil down the essence of iteration to its bare essentials and then when we need iteration we could just say “hey do this for each thing in the collection”. Well good sir or madam, you are indeed in luck this fine day. C# 3.0 introduced a language feature called lambda, it’s going to be your new best friend. A lambda simply put is a function that you can declare and use without needing to give it a name. You can give it a name, but you don’t have to. In this sense it is sometimes called an anonymous function. Really this sort of thing isn’t all that strange. In the above example, what name did we give to the int[] we passed into DoSomthing? That is an anonymous int[] and since C# has the concept of delegates, we can pass in an anonymous one of those as easily as you can pass around an anonymous int[].

Fortunately for us C# has developed some very good functional tendencies in the last few versions and as a result this whole process is much smoother than it would be in, say Java which does not currently have lambdas. Ruby has good support for lambdas and Python has a fairly limited version of lambda that is restricted to single line expressions. Still, better than nothing.

For those of you not up to speed on these new fangled “delegates” and “lambdas” things, let me give you a brief introduction. Delegates in C# are like a variable that points to a function instead of your regular variables that reference a piece of data. Like regular variables, delegates must have a type, specifically they must match a type signature of some sort. You may see C# code that looks like this

delegate int MyDelegateType(int a, string b);

This declares a new type MyDelegateType which is a function that takes an int and a string and returns an int. You may see this written as (int, string) -> int or (int * string * int) which both mean, “this is a function that takes an int and a string and returns an int”. C# has already defined several such delegates for us, and apart from creating your own for the sake of conciseness, you can go a long way with just the built in types. Action, Action<T>, Action<T1, T2> and so on represent a function that takes nothing, T, (T1, T2), etc. and returns void. So Console.WriteLine is an Action<string>. As we move forward, we’re not going to write many Actions as they tend to not return values and anytime you’re calling a function and it doesn’t return anything you can be sure there are side effects, otherwise what possible purpose could it serve? The other delegate type C# has for us is Func<T>, Func<T1, T2>, Func<T1, T2, T3> and so on. Unlike Action, Func always returns a value. If you specify a Func<T> then you are saying this is a function that takes zero parameters and returns a value of type T. If you declare a Func<T1, T2> then you are saying it is a function that takes a T1 and returns a T2.  The int version of Math.Min is a Func<int, int, int>, it takes two ints as parameters and returns an int. We could create a new variable of type Func<int, int, int> and assign Math.Min to it and then invoke the delegate and it would in turn invoke Math.Min for us. In this way, we can pass around functions as pieces of data which it turns out is really handy. We can also create a new function (a lambda) and assign it directly to our variable. Using the Func<int, int, int> example from before we could do the following.

// this is our variable of type (int, int) -> int
Func<int, int, int> intAndIntToInt;
// notice we're not calling Math.Min,
// we're assigning the function to our variable
intAndIntToInt = Math.Min;
// this calls Math.Min just a sure as if we wrote Math.Min(6, 3)
intAndIntToInt(6, 3);
// creates a new lambda of type (int, int) -> int that returns the
// sum of the params and assigns it to the variable
intAndIntToInt = (a, b) => a + b;
// this return 9
intAndIntToInt(5, 4);

As you can see, we made up a new function on the fly, one that takes in two parameters, named a and b and returns the sum. C# has special syntax for one line lambdas that allows you to omit the return statement. Written out the long way (and the way you must use for multi-line lambdas), we would get the following.

Func<int, int, int> intAndIntToInt;
intAndIntToInt = (a, b) => {
    return a + b;
});

If you squint and turn your head to the side a bit this really looks pretty close to your normal everyday function, except there’s no return type and it’s got the => between the params and the function body. If we can just make our own functions willy nilly and pass them around like any old variable, then perhaps we can make “hey do this for each thing in the collection” actually happen. Since we’re approximating foreach I’ll name the function after that keyword.

void ForEach(int[] numbers, Action<int> action) {
    foreach (var num in numbers) {
        action(num);
    }
}

var nums = new [] { 1, 4, 5, 6, 7, 12, 1 };
// parens around the parameter are optional if there is only 1 param
ForEach(nums, num => // do something with num here);

So we call ForEach, passing in the num and our Action<int> that will be invoked for each element in the array. This of course only works on int[], but if you’ve ever touched generics in C# or Java or templates in C++ it’s a short hop skip and a jump over to a nice generalized version.

void ForEach<T>(IEnumerable<T> values, Action<T> action) {
    foreach (var value in values) {
        action(value);
    }
}

The usage of ForEach doesn’t change one bit and now the majority of the code on our single line is the business end, not the “make the compiler happy” end. This is incredibly powerful. While we have not introduced new syntax into the language, with the power of C#’s extension methods, we can get fairly close.

static void ForEach<T>(this IEnumerable<T> values, Action<T> action) {
    foreach (var value in values) {
        action(value);
    }
}

// this allows us to do
var nums = new [] { 1, 4, 5, 6, 7, 12, 1 };
nums.ForEach(num => // do something with num here);

Even less non business cruft to get in the way*. As you may have suspected those smart people who work on C# have already implemented ForEach as part of System.Linq which also was introduced in C#3 (although only for the List class which seems like such an oversight). Thanks to Linq and a few other extension methods I wrote, I have not written a foreach loop in a very long time. This is not a coincidence.

*Note this example is slightly simplified. Extension methods must be declared as part of a static class.

These functions that take functions are called “higher order functions” and functional programs are absolutely chocked full of them. There are functions that take two or more functions and make a new function that runs all of them in sequence. There are functions that take in a function and wrap it in some functionality such as logging or caching or communicating over a network without the wrapped function having any idea it’s being used in such a context. Higher order functions are very cool, and they are the reason we can build reusable structures when we follow a functional style.

Different but the same

Iterating over collections sometimes feels like the only thing our programs do, so naturally we run into situations where just a foreach style loop by itself isn’t sufficient. There are only so many cases where you want to “do something” directly to each value in a collection. What if you wanted to do an operation that spanned multiple values, or all the values. The simplest of these is probably summing the numbers in a collection. In C# you could implement it this way.

int[] numbers = new [] { 43, 75, 6, 2, 91, -5 };

int sum = 0;
foreach (int num in numbers) {
    sum += num;
}

or using the magical new extension method

int[] numbers = new [] { 43, 75, 6, 2, 91, -5 };

int sum = 0;
numbers.ForEach(num => sum += num);

Now wait just a second here. The lambda that we’re declaring takes only a single parameter, num, but we’re adding it to sum, how the heck does the lambda have sum in scope? Even though we are declaring a new function, because it is a lambda the scope of that function includes all the variables in scope at its declaration. So since sum is visible where the lambda is declared, sum can be used inside the function, even if our lambda isn’t run until after the function that declared sum terminates. This is known as a closure and it’s quite important for some of the more fancy things that are done with functional programs (such as the aforementioned higher order functions). For our purposes it means we have to pass one less variable into our Action.

This works, it’s not overly complex, and it gets the job done with only 1 extra line that can be located right next to where the iteration occurs. So far so good, let’s try something slightly more complex. What about finding the largest number in the list?

int[] numbers = new [] { 43, 75, 6, 2, 91, -5 };

int largest = numbers[0];
numbers.ForEach(num =>
    if (num > largest) {
        largest = num;
    }
});

This looks pretty similar to what we did before, it’s just our action that has gotten more complex. If we wanted to find the smallest element we would write the same Action but just flip the > to a <. It seems like those are really the same logic with a teeny tiny difference, the kind we could parameter’ize away. What we want is some iteration type construct that at each element asks us, should this new element become the one we’re going to return at the end?

static T Pick<T>(this IEnumerable<T> values, Func<T, T, bool> selector) {
    // do error handling here, if values is null, if selector is null, etc.
    IEnumerator<T> iter = values.GetEnumerator();
    iter.MoveNext();
    T current = iter.Current;
    while (iter.MoveNext()) {
        // if the lambda returns true, the user wants to use the
        // iterator's current value so save it
        if (selector(iter.Current, current)) {
            current = iter.Current;
        }
    }
    return current;
}

int[] numbers = new [] { 43, 75, 6, 2, 91, -5 };
// returns true when the current value is smaller than the current smallest
int smallest = numbers.Pick(num, currentSmallest => num < currentSmallest);
// returns true when the current value is larger than the current largest
int largest = numbers.Pick(num, currentLargest => num > currentLargest);

As you have surely encountered in your own work, once you generalize a solution to being able to implement 2 solutions, it’s generally pretty easy to make n different solutions as long as your constraints don’t change very much. In this case the constraint is that the function will accept a lambda that it invokes at the right time during the iteration process and then either pays attention to the result of or not depending on what kind of operation it’s trying to do. Basically everything except a ForEach will care about the result and do something with it.

For an example of some of the awesome ways this manifests, you should become acquainted with Linq. Working with collections will never be the same for you in C#. Here are some examples of things that are vastly simpler than their foreach implementation would be.

Find all words that are less than 6 letters long and sort them in descending order.

string[] words = new []	{
	"Lorem",
	"ipsum",
	"dolor",
	"sit",
	"amet",
	"consectetur",
	"adipisicing",
	"elit",
	"sed",
	"do",
	"eiusmod",
	"tempor",
	"incididunt",
	"ut",
	"labore",
	"et",
	"dolore",
	"magna",
	"aliqua"
};

IEnumerable<string> descendingOrderedWords = words.Where(word => word.Length < 6).OrderByDescending(w => w);

Convert a list of strings that represent ints into a list of ints

string[] numbers = new [] {
	"1",
	"7",
	"45",
	"9",
	"81",
	"37"
};
IEnumerable<int> parsedNumbers = numbers.Select(numberString => int.Parse(numberString));

Count the number of occurrences of an element in a list and then order the elements of the list in order of number of occurrences descending.

int[] numbers = new [] { 9, 27, 8, 3, 12, 27, 6, 8, 9, 8, 27, 1 };
IEnumerable<int> numbersOrderedByFrequency =
    numbers.Aggregate(new Dictionary<int, int>(), (count, number) => {
        if (!count.ContainsKey(number)) {
	    count[number] = 0;
        }
        ++count[number];
        return count;
    }).OrderBy(keyValuePair => -keyValuePair.Value)
      .Select(keyValuePair => keyValuePair.Key);

As you can see, these sorts of things can get fairly complex, and certainly functions like Aggregate deserve a nice sit-down with the documentation page to really grok what’s going on, but hopefully you’re getting a feel for the shape of these things. We’ve got functions that take other functions as inputs and produce new IEnumerables as a result that we can then chain together to get a whole lot of functionality for very little noise. Whether these functions use recursion or an iterator internally doesn’t particularly matter as that detail is totally hidden away from us and unknowable. Since Linq functions don’t mutate any state themselves (although the lambda you pass in certainly could), they are quite functional and thus composable. Composability is a big buzzword you’ll be hearing about more, it’s one of the reasons for pursuing a functional approach (apart from all this cool Linq goodness).

Advertisements
Tagged , , , , ,

One thought on “Same old tools, new uses

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: