# Introduction

A while back, I blogged about “(Mis)using C# 4.0 Dynamic – Type-Free Lambda Calculus, Church Numerals, and more”, a fun post that got good feedback and solid reading numbers. So, let’s continue our journey of brain-exploding theoretical foundational posts with “The Revenge of the Type-Free Lambda Calculus, Now Compatible with C# 3.0”.

Last time around, we truly misused C# 4.0 dynamic to uni-type (not a spelling mistake) functions and values, and went bananas turning basic computation with numerals and Booleans into functions thanks to Church encodings. All in all a great unit test for the Dynamic Language Runtime that sweats under the pressure put on it by our crazy function calls:

```
//
// Church numerals.
// Basic idea: natural number n gets represented as \fx.f^n(x)
//
var N0 = new Func<dynamic, Func<dynamic, dynamic>>(f => x => x);
var succ = new Func<dynamic, Func<dynamic, Func<dynamic, dynamic>>>(n => f => x => f(n(f)(x)));
var plus = new Func<dynamic, Func<dynamic, Func<dynamic, Func<dynamic, dynamic>>>>(m => n => f => x => m(f)(n(f)(x)));

Func<dynamic, uint> toInt = x => x(new Func<dynamic, dynamic>(i => i + 1))(0u);
Func<uint, dynamic> fromInt = null;
fromInt = n =>
{
    return (n == 0 ? N0 : succ(fromInt(n - 1)));
};
```

The above allows us to write a simple iterative (ah, I’m cheating) Fibonacci function as follows:

```
var i = N0;
var prev = N0;
var curr = succ(N0);

while (true /* could have used a Church Boolean as well ;-) */)
{
    Console.WriteLine(toInt(curr));

    var t = curr;
    curr = plus(curr)(prev);
    prev = t;
}
```

If you don’t believe the above works, or you don’t get how it does its magic, revisit my previous post on the topic, and check back after you’ve plowed through it.

# Exceptions rule the world…

Today, we’ll take a step back and have a look at how we can do all of the above in the pre-.NET 4.0 era, where we don’t have the “convenient” dynamic type to escape from the typed reality. So, how can we make this work? A possible answer can be found in unchecked exceptions, which the .NET Framework has at our service. The underlying idea is to smuggle data out of a function without having to declare it anywhere in the function’s signature (checked exceptions would prevent this, as the “throws clause” would have to reveal what’s being thrown in a … statically typed manner, aye!).

This is one of the findings described in Mark Lillibridge’s paper entitled “Unchecked Exceptions can be Strictly More Powerful than Call/CC” (see page 301), though much more mind-blowing stuff can be found in there (as the title reveals). Once I get into blogging about continuations, most likely in the context of Reactive LINQ (otherwise known as the Reactive Framework or Reactive Fx), I may refer to this paper again to assist in blowing people’s minds (just kidding). Talking about Reactive LINQ, I got the initial reference to this paper from Erik Meijer.

So, what’s the big idea? Let me try to summarize it. You’ll remember we need to define two operations to make the essentials of lambda calculus work: abstraction and application. We also know that both functions and values need to be represented in a uniform manner, as functions are first-class: a function can take another function as its input or produce one as its output. So, it’s clear we need to uni-type things just like we did before. Let’s spill the beans without much further ado: the single type we’ll be using today is nothing less than System.Action, a delegate defined as follows:

`public delegate void Action();`

In fundamentalist functional programming the above is the most useless type for a function: it takes nothing and produces nothing, hence all those guys can be compiled away, right? But as they’ll be our bread and butter for today’s installment, you can guess we’ll joyfully dance the tango of side-effects. One such great side-effect is called an exception.

Let’s take a step back first and summarize the dense info above: values and functions will be represented as System.Action objects. Right, in our lambda calculus “emulator” System.Action is the new System.Object. The careful reader will notice that System.Action plays a dual role here: on the outside we use it as the single type we’re dealing with both for values and functions. But on the inside, we'll leverage its delegate (a “CLR function object” if you will) characteristic somehow. Let’s go there now…

How can we realize abstraction? As we know from last time, all functions of multiple arguments can be represented as functions of one argument through currying. So, to construct this simplest form of a function, all we need is a way for the user to define a function between two of our “first-class” objects (of type System.Action), which is something we can express as a Func<Action, Action> in C#, allowing users to use the lambda arrow to write such a function. However, since a function defined through abstraction needs to be capable of being passed elsewhere as an argument, a Func<Action, Action> won’t do as-is: we need to morph it into a System.Action again. This is what our lambda abstraction operator will have to do:

```
static Action λ(Func<Action, Action> x)
{
    // But how???
}
```

We’ll see in a moment how we can realize this function. But first, let’s have a look at the contract of an application operator we so badly need to get this off the ground. Application takes two objects at a time: the first one will be a function, the second one a function or a value. It applies the function to the given argument. Objects are of type System.Action, and the result of the application call is an object too, hence the following signature:

```
static Action ß(Action x, Action y)
{
    // But how???
}
```

Okay, let’s fill in the voids. How can we smuggle a function between two Action objects into a single Action, to realize abstraction? Well, what about carrying the function on the back of an exception object? We’ll define a useful generic Exception type for this purpose:

```
class Exception<T> : Exception
{
    public Exception(T data)
    {
        Data = data;
    }

    public new T Data { get; private set; }
}
```

This allows us to define the lambda function like this:

```
static Action λ(Func<Action, Action> x)
{
    return () => { throw new Exception<Func<Action, Action>>(x); };
}
```

In Lillibridge’s paper this is referred to as the “roll” operation. Think of this smuggling operation as rolling the function in a carpet of type Action: nobody will notice from the outside that it contains some dirty secrets. (Tip: don’t try this technique at airport security gates, they won’t allow exceptions.) But what does this mean? Well, from the outside, one can simply construct a proper lambda calculus function, surfaced as an Action object (making it deserve the status of “first-class”). How do we, the “lambda runtime”, get it out of there again? Simply invoke the resulting Action when needed, and catch the generic exception type. That’s precisely what the application function will have to do:

```
static Action ß(Action x, Action y)
{
    Func<Action, Action> xe;

    try
    {
        x();
        xe = _ => _;
    }
    catch (Exception<Func<Action, Action>> ex)
    {
        xe = ex.Data;
    }

    return xe(y);
}
```

This is referred to as the “unroll” operation. In order to apply the function “x” to argument “y”, we first need to evaluate x. We do so by calling the delegate, which (when properly constructed through the lambda operator) will throw an exception allowing us to unroll the underlying function type from our smuggling exception carpet. Then we can feed the argument “y” into the resulting Func<Action, Action> to get the result of the function call back. The remaining one-billion dollar question is how we’d ever end up executing the last statement in the try-block:

`xe = _ => _;`

This is simply an identity function (you could equally well have written z => z or so), which ultimately will cause the application to return its argument. We use this as a convenient mechanism to allow “regular” Action objects to participate in our application operation. This will become clear at a later point.
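To make that identity branch concrete, here’s a minimal, self-contained sketch (repeating the two operators and the generic exception type from above; the demo class name is mine): applying a *plain* Action via ß executes its side-effect and hands back the argument, while a λ-rolled function takes the catch branch instead.

```
using System;

static class IdentityPathDemo
{
    // Definitions from the post, repeated so this sketch compiles on its own.
    class Exception<T> : Exception
    {
        public Exception(T data) { Data = data; }
        public new T Data { get; private set; }
    }

    static Action λ(Func<Action, Action> x)
    {
        return () => { throw new Exception<Func<Action, Action>>(x); };
    }

    static Action ß(Action x, Action y)
    {
        Func<Action, Action> xe;
        try
        {
            x();
            xe = _ => _;   // x was a plain Action: run its effect, return y as-is
        }
        catch (Exception<Func<Action, Action>> ex)
        {
            xe = ex.Data;  // x was a rolled-up function: unroll and apply
        }
        return xe(y);
    }

    static void Main()
    {
        int n = 0;
        Action incr = () => { n++; };
        Action nop = () => { };

        // Plain Action: hits the identity branch, incrementing n along the way.
        Action r = ß(incr, nop);
        Console.WriteLine(n); // 1

        // Rolled function: hits the catch branch; the identity λ returns nop itself.
        Action id = λ(a => a);
        Console.WriteLine(ReferenceEquals(ß(id, nop), nop)); // True
    }
}
```

This is exactly the mechanism the toInt and toBool converters later in the post rely on when they pass side-effecting actions like `() => { n++; }` into an application.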

# Lambda Language Runtime for CLR 2.0 – summary

The “LLR” I’m illustrating here is a way to layer untyped computation on the statically typed foundation of the “CLR”. In contrast to the DLR’s complex infrastructure, we need nothing more than two methods:

```
static Action λ(Func<Action, Action> x)
{
    return () => { throw new Exception<Func<Action, Action>>(x); };
}

static Action ß(Action x, Action y)
{
    Func<Action, Action> xe;

    try
    {
        x();
        xe = _ => _;
    }
    catch (Exception<Func<Action, Action>> ex)
    {
        xe = ex.Data;
    }

    return xe(y);
}
```

# Reality check: Church Booleans

In the previous post, I explained what Church Booleans are, so I won’t do it again now. Instead, let’s try to redefine them in terms of our lambda and beta operations:

```
//
// Church Booleans
//
var T = λ(a => λ(b => a)); // \ab.a
var F = λ(a => λ(b => b)); // \ab.b

var Not = λ(m => λ(a => λ(b => ß(ß(m, b), a)))); // \m.\ab.mba
var And = λ(m => λ(n => ß(ß(m, n), m))); // \mn.mnm
var Or = λ(m => λ(n => ß(ß(m, m), n))); // \mn.mmn
var Xor = λ(m => λ(n => λ(a => λ(b => ß(ß(m, ß(ß(n, b), a)), ß(ß(n, a), b)))))); // \mn.\ab.m(nba)(nab)
```

This should do the trick, shouldn’t it? How to calculate, say, T && F? Here it is:

`ß(ß(And, T), F)`

Quite easy, but with a bit of beta function noise. To prove all of this works, we’ll have to provide pretty-printing functions. Similarly we may benefit from a function that promotes a CLR Boolean into the corresponding Church Boolean:

```
Func<Action, bool> toBool = b =>
{
    bool res = false;
    ß(ß(b, () => { res = true; }), () => { res = false; })();
    return res;
};
Func<bool, Action> fromBool = b => b ? T : F;
```

Those are very comparable to the ones we used in the previous post, using C# 4.0 dynamic. Notice how an impure lambda expression is used as an argument to the beta-application in order to realize toBool.

Based on all of this, we’d like to be able to write testing functions that can print the results of applying our Boolean operators to various arguments. Again, we base these on the functions used in the previous post:

```
static void Unary<T, PArg, PRes>(string name, Func<Action, T> f, Func<Action, PArg> printArg, Func<T, PRes> printRes, params Action[] values)
{
    foreach (var value in values)
        Console.WriteLine("{0}({1}) = {2}", name, printArg(value), printRes(f(value)));
}

static void Binary<T, PArg, PRes>(string name, Func<Action, Action, T> f, Func<Action, PArg> printArg, Func<T, PRes> printRes, params Action[] values)
{
    foreach (var valueL in values)
        foreach (var valueR in values)
            Console.WriteLine("{0}({1}, {2}) = {3}", name, printArg(valueL), printArg(valueR), printRes(f(valueL, valueR)));
}
```

For example, Unary could be used with the “Not” operator, using toBool for the print-functions which turn a Church Boolean into the corresponding System.Boolean that Console.WriteLine can deal with. Notice one thing though: the above is not pure either, in the sense that we’re not passing in plain Action objects for the function “f”; instead, we’re using Func<Action, …> types. None of our lambda functions like “Not” and “And” are compatible with this form. How do we fix this (purely a cosmetic thing, btw)?

```
//
// Conversion operations
//
Func<Action, Func<Action, Action>> toFunc1 = f => x => ß(f, x);
Func<Action, Func<Action, Action, Action>> toFunc2 = f => (x, y) => ß(ß(f, x), y);
Func<Action, Func<Action, Action, Action, Action>> toFunc3 = f => (x, y, z) => ß(ß(ß(f, x), y), z);
Func<Func<Action, Action>, Action> fromFunc1 = f => λ(x => f(x));
Func<Func<Action, Action, Action>, Action> fromFunc2 = f => λ(x => λ(y => f(x, y)));
Func<Func<Action, Action, Action, Action>, Action> fromFunc3 = f => λ(x => λ(y => λ(z => f(x, y, z))));
```

The conversion operators above provide the answer: they allow us to make plain lambda functions callable as if they were CLR functions (Func<…>) and also allow for abstraction based on CLR functions. In fact, the family of toFunc functions could be called “decurry” operators (e.g. toFunc3 produces a function of three arguments) while the fromFunc functions are “curry” operators (e.g. fromFunc3 takes a function of three arguments). Based on this, we can write “C#-user friendly” forms of our Church Boolean operators:

```
var not = toFunc1(Not);
var and = toFunc2(And);
var or = toFunc2(Or);
var xor = toFunc2(Xor);
```

Notice the different casing used. Exercise for the reader: infer the types of the eight objects that appear in the fragment above (i.e. excluding the to* functions).

Now we can write a test procedure as follows:

```
Console.WriteLine("Church Booleans");
Unary("toBool", toBool, toBool, x => x, F, T);
Unary("not", not, toBool, toBool, F, T);
Binary("and", and, toBool, toBool, F, T);
Binary("or", or, toBool, toBool, F, T);
Binary("xor", xor, toBool, toBool, F, T);
Console.WriteLine();
```

The result shows the expected truth tables for each operator. It worked, didn’t it?

# Lambda Language Runtime for CLR 2.0 – noise reducing intermezzo

In the sample above, we’ve encountered utterly complicated application calls, due to the fact that the beta function only takes two arguments at a time (just like classical lambda calculus prescribes):

`var Xor = λ(m => λ(n => λ(a => λ(b => ß(ß(m, ß(ß(n, b), a)), ß(ß(n, a), b)))))); // \mn.\ab.m(nba)(nab)`

It’s a convention that applications are left-associative, hence we’d like to be able to write the following instead:

`var Xor = λ(m => λ(n => λ(a => λ(b => ß(m, ß(n, b, a), ß(n, a, b)))))); // \mn.\ab.m(nba)(nab)`

Obviously this is easy to realize as follows, using LINQ’s Aggregate (“left fold”) operator:

```
static Action ß(Action x, Action y, params Action[] zs)
{
    return zs.Aggregate(ß(x, y), (l, z) => ß(l, z));
}
```

Can we make the excessive lambda arrows a bit less invasive too? Sure, inspired by our fromFunc objects:

```
static Action λ(Func<Action, Action, Action> f)
{
    return λ(x => λ(y => f(x, y)));
}

static Action λ(Func<Action, Action, Action, Action> f)
{
    return λ(x => λ(y => λ(z => f(x, y, z))));
}
```

That puts us in a position to write the Xor operator as:

`var Xor = λ((m, n) => λ((a, b) => ß(m, ß(n, b, a), ß(n, a, b)))); // \mn.\ab.m(nba)(nab)`

Much cleaner, isn’t it?

# Church numerals

If Booleans work, we should get numerals working too. Obviously this is the case:

```
//
// Church numerals
//
var N0 = λ(f => λ(x => x)); // \fx.x = F
var N1 = λ(f => λ(x => ß(f, x))); // \fx.fx
var N2 = λ(f => λ(x => ß(f, ß(f, x)))); // \fx.f(fx)

var Zero = λ(n => ß(ß(n, λ(x => F)), T)); // \n.n(\x.F)T
var Succ = λ(n => λ(f => λ(x => ß(f, ß(ß(n, f), x))))); // \n.\fx.f(nfx)
var Pred = λ(n => λ(f => λ(x => ß(ß(ß(n, λ(g => λ(h => ß(h, ß(g, f))))), λ(u => x)), λ(u => u))))); // \n.\fx.n(\gh.h(gf))(\u.x)(\u.u)
var Plus = λ(m => λ(n => λ(f => λ(x => ß(ß(m, f), ß(ß(n, f), x)))))); // \mn.\fx.mf(nfx)
var Sub = λ(m => λ(n => ß(ß(n, Pred), m))); // \mn.n Pred m
var Mul = λ(m => λ(n => λ(f => ß(n, ß(m, f))))); // \mn.\f.n(mf)
var Exp = λ(m => λ(n => ß(n, m))); // \mn.nm

var zero = toFunc1(Zero);
var succ = toFunc1(Succ);
var pred = toFunc1(Pred);
var plus = toFunc2(Plus);
var sub = toFunc2(Sub);
var mul = toFunc2(Mul);
var exp = toFunc2(Exp);

Func<Action, uint> toInt = x =>
{
    uint n = 0;
    ß(ß(x, () => { n++; }), nop)();
    return n;
};
Func<uint, Action> fromInt = null;
fromInt = n =>
{
    return (n == 0 ? N0 : succ(fromInt(n - 1)));
};
```

The nop helper function used in the toInt function is simply the no-op Action delegate () => {}. To get to know how toInt works (as well as all the other functions, by means of formal proofs), have a look at my previous post again. Either way, a simple test shows all of the above works as well:

```
Console.WriteLine("Church numerals");
Unary("toInt", toInt, toInt, x => x, Enumerable.Range(0, 3).Select(i => fromInt((uint)i)).ToArray());
Unary("succ", succ, toInt, toInt, Enumerable.Range(0, 3).Select(i => fromInt((uint)i)).ToArray());
Unary("pred", pred, toInt, toInt, Enumerable.Range(1, 3).Select(i => fromInt((uint)i)).ToArray());
Unary("zero", zero, toInt, toBool, Enumerable.Range(0, 2).Select(i => fromInt((uint)i)).ToArray());
Binary("plus", plus, toInt, toInt, Enumerable.Range(0, 3).Select(i => fromInt((uint)i)).ToArray());
Binary("mul", mul, toInt, toInt, Enumerable.Range(0, 3).Select(i => fromInt((uint)i)).ToArray());
Binary("exp", exp, toInt, toInt, Enumerable.Range(1, 3).Select(i => fromInt((uint)i)).ToArray());
Console.WriteLine();
```

Resulting in the expected output for each of the operations above.

# Church pairs and lists

As an outtake for today’s installment, let’s show how to realize pairs and lists using functions:

```
//
// Pairs
//
var Pair = λ(x => λ(y => λ(z => ß(ß(z, x), y)))); // \xy.\z.zxy
var Fst = λ(p => ß(p, T)); // \p.pT
var Snd = λ(p => ß(p, F)); // \p.pF

var pair = toFunc2(Pair);
var fst = toFunc1(Fst);
var snd = toFunc1(Snd);

Func<Action, Tuple<object, object>> toPair = a => new Tuple<object, object>(ext(fst(a)), ext(snd(a)));
Func<Tuple<object, object>, Action> fromPair = p => pair(inj(p.Item1), inj(p.Item2));

Assert(toInt(fst(pair(N1, N2))) == 1, "fst(pair(N1, N2))");
Assert(toInt(snd(pair(N1, N2))) == 2, "snd(pair(N1, N2))");

var me = new Tuple<object, object>("Bart", 26);
Assert(toPair(fromPair(me)).Equals(me), "Roundtrip object pair");

var nums = new Tuple<object, object>(N1, N2);
Assert(toPair(fromPair(nums)).Equals(nums), "Roundtrip nums pair");
```

A pair is constructed by giving it two arguments, resulting in a function that will return one of them by giving it a Boolean (“z” in the Pair function above). The first and second extraction functions are then defined in terms of applying the “pair as a function” to either T or F, which will select the right value out of the pair.

The toPair and fromPair functions are defined to go back and forth between a Church pair and a Tuple<object, object> in CLR-land. This is based on two functions I’m not showing here (left as an exercise for the reader): ext and inj, for extract and inject respectively. The signature of those functions is as follows:

```
//
// Values
//
Func<object, Action> inj = o => /* left as an exercise */
Func<Action, object> ext = a => /* left as an exercise */

Assert((string)ext(inj("Bart")) == "Bart", "ext(inj(\"Bart\"))");
Assert((int)ext(inj(42)) == 42, "ext(inj(42))");
DateTime now = DateTime.Now;
Assert((DateTime)ext(inj(now)) == now, "ext(inj(now))");
```

Warning: This is not a trivial exercise, as arbitrary .NET objects need to be able to be “lifted into” lambda land (e.g. the sample above allows roundtripping a string and a DateTime). Have fun <g>!

Church lists can be defined in terms of Nil and Cons operations, together with extraction functions like Head and Tail, and a test function IsNil:

```
//
// Lists
//
var Nil = ß(ß(Pair, T), T);
var IsNil = Fst;
var Cons = λ(h => λ(t => ß(ß(Pair, F), ß(ß(Pair, h), t))));
var Head = λ(z => ß(Fst, ß(Snd, z)));
var Tail = λ(z => ß(Snd, ß(Snd, z)));

var nil = Nil;
var isNil = fst;
var cons = toFunc2(Cons);
var head = toFunc1(Head);
var tail = toFunc1(Tail);

Func<Action, List<object>> toList = a =>
{
    List<object> res = new List<object>();

    while (!toBool(isNil(a)))
    {
        res.Add(ext(head(a)));
        a = tail(a);
    }

    return res;
};
Func<List<object>, Action> fromList = l => Enumerable.Reverse(l).Aggregate(nil, (t, o) => cons(inj(o), t));

Assert(toBool(isNil(nil)) == true, "isNil(nil)");
Assert(toBool(isNil(cons(N1, nil))) == false, "isNil(cons(N1, nil))");
Assert(toInt(tail(cons(N1, N2))) == 2, "tail(cons(N1, N2))");

var lst = new List<object> { 1, N2, 3 };
Assert(toList(fromList(lst)).SequenceEqual(lst), "Roundtrip list");
```

Again we provide ways to turn a Church list into a List<object> (again based on ext), and the other way around (using the magical inj). Yes, the above works just fine… As an additional exercise, write a ForEach extension method that can take a Church list (of type … Action) and an Action<Action> delegate as the body of the loop, so we can use it as follows (tip: mirror the toList implementation):

```
cons(N1, N2).ForEach(x =>
{
    Console.WriteLine(toInt(x));
});
```

# Conclusion

What can I say? This was fun for sure, if nothing more. I hope I’ve convinced the reader that functions are immensely powerful animals capable of representing data. In addition, we’ve shown how powerful exceptions can be (something the “Unchecked Exceptions can be Strictly More Powerful than Call/CC” paper goes much, much deeper into). Yeah, performance guys will have the hairs on their back standing upright, but hey: it’s Crazy Sundays after all. And, as an additional excuse, I now can claim to be referred to as a “stunt coder” on Twitter (thanks James, also for subtly using the word “fold”).

Enjoy!


I promise, it will be a (relatively) short post this time. You all know the foreach statement in C#, don’t you? Think twice before you answer and tell me exactly how the following works:

```
foreach (int x in src)
{
    // Do something with x.
}
```

Got an answer? Let me disappoint you: if you have the answer, you’re wrong. There’s no single answer to the question above as you need to know more about the type of src to make a final decision on how the above works…

You may say that, clearly, the object needs to implement IEnumerable or IEnumerable<T>, and maybe you’ll even mention that in the former case the compiler inserts a cast for you when it gets “x” back from the call to the IEnumerator’s Current property getter. In other words, the code gets translated like this:

```
var e = src.GetEnumerator();
while (e.MoveNext())
{
    var x = (int)e.Current; // without the cast if src was an IEnumerable<T>
    // Do something with x.
}
```

A worthy attempt at the translation, but not quite right. First of all, the variable x is declared in an outer scope (causing some grief when talking about closures, but that’s a whole different topic…). Secondly, the enumerator may implement IDisposable, in which case the foreach-statement will ensure proper disposal, à la “using”:

```
{
    int x;

    using (var e = src.GetEnumerator())
    {
        while (e.MoveNext())
        {
            x = (int)e.Current; // without the cast if src was an IEnumerable<T>
            // Do something with x.
        }
    }
}
```

That’s a bit more sane, but we’re missing out on another kind of source foreach can work with: any object, as long as it exposes the enumeration pattern of GetEnumerator in tandem with MoveNext and Current. Here’s a sample object that works just fine with the foreach-statement:

```
class Source
{
    public SourceEnumerator GetEnumerator()
    {
        return new SourceEnumerator();
    }
}

class SourceEnumerator
{
    private Random rand = new Random();

    public bool MoveNext()
    {
        return rand.Next(100) != 0;
    }

    public int Current
    {
        get
        {
            return rand.Next(100);
        }
    }
}
```

With its usage shown below:

```
foreach (int x in new Source())
    Console.WriteLine(x);
```

Okay, that’s flexible, isn’t it? In fact, the foreach-statement can be said to be duck typed: it’s not the nominal type that matters (i.e. Source isn’t explicitly declared to be an IEnumerable, nor SourceEnumerator an IEnumerator), but just the structure of the object that determines “compatibility” with the foreach-statement.

But who, upon seeing a foreach over a collection, doesn’t immediately start thinking about LINQ, no? Say the consumer of Source looked like this:

```
List<int> res = new List<int>();
foreach (int x in new Source())
    if (x % 2 == 0)
        res.Add(x);
```

A great candidate for LINQ it seems, especially as we start adding more and more logic to the “query”. Nothing surprising about this conclusion, but trying to realize it fails miserably with a compile-time error.

Why? Because LINQ is statically typed (update: to be taken with a grain of salt, see the comments below this post; agreed, it’d be more precise to use LINQ to Objects as the subject of this sentence), so it expects what I’ve referred to as a nominal enumerator implementation: something that explicitly declares itself to be an IEnumerable, not something that “accidentally” happens to look like one. Question of the day: how to morph an existing structural enumerator into a nominal one so it can be used with LINQ? Sure, we could write specialized code for the Source object above that essentially creates an iterator on top of Source:

```
static void Main()
{
    var res = from x in IterateOver(new Source())
              where x % 2 == 0
              select x;

    foreach (var x in res)
        Console.WriteLine(x);
}

static IEnumerable<int> IterateOver(Source s)
{
    foreach (int i in s)
        yield return i;
}
```

But maybe you’re in a scenario with plenty of those structural enumerator constructs around (e.g. some Office automation libraries expose GetEnumerator on types like Range, while the Range object itself doesn’t implement IEnumerable hence it’s not usable with LINQ), so you want to generalize the above. Essentially, given any object you’d like to provide a duck-typed iterator over it, a suitable task for another extension method and C# 4.0 dynamic:

```
static class DuckEnumerable
{
    public static IEnumerable<T> AsDuckEnumerable<T>(this object source)
    {
        dynamic src = source;

        var e = src.GetEnumerator();
        try
        {
            while (e.MoveNext())
                yield return e.Current;
        }
        finally
        {
            var d = e as IDisposable;
            if (d != null)
            {
                d.Dispose();
            }
        }
    }
}
```

Question to the reader: why can’t we simply write a foreach-loop over the “source casted as dynamic” object? Tip: how would you implement the translation of foreach when encountering a dynamic object as its source?

Yes, you’re cluttering the apparent member list on System.Object, so use with caution or just use plain old method calls to do the “translation”. What matters more is the inside of the operator, using the dynamic type quite a bit to realize the enumeration pattern. Notice how easy on the eye dynamically typed code looks in C# 4.0. With many more casts, it’d look like this:

```
static class DuckEnumerable
{
    public static IEnumerable<T> AsDuckEnumerable<T>(this object source)
    {
        dynamic src = (dynamic)source;

        dynamic e = src.GetEnumerator();
        try
        {
            while ((bool)e.MoveNext())
                yield return (T)e.Current;
        }
        finally
        {
            var d = e as IDisposable;
            if (d != null)
            {
                d.Dispose();
            }
        }
    }
}
```

And now we can write:

```
var res = from x in new Source().AsDuckEnumerable<int>()
          where x % 2 == 0
          select x;

foreach (var x in res)
    Console.WriteLine(x);
```

Dynamic glue – why not? In fact, even objects from other languages (like Ruby or Python) that follow the pattern will now work with LINQ, and for existing compatible objects the operator call is harmless (but wasteful). Oh, and notice you can also have an IEnumerable of “dynamic” objects if you’re dealing with objects originating from dynamic languages...

Can you implement the AsDuckEnumerable operator in C# 3.0? Absolutely, if you limit yourself to reflection-based discovery methods (left as an exercise for the reader).
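One possible hedged sketch of that reflection-based route (the `Fixed`/`FixedEnumerator` helper types below are mine, not from this post; a deterministic stand-in for `Source` so the output is predictable) looks like this, using only C# 3.0 features:

```
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

static class DuckEnumerable30
{
    // Discover the enumeration pattern via reflection instead of 'dynamic'.
    public static IEnumerable<T> AsDuckEnumerable<T>(this object source)
    {
        object e = source.GetType().GetMethod("GetEnumerator").Invoke(source, null);
        MethodInfo moveNext = e.GetType().GetMethod("MoveNext");
        PropertyInfo current = e.GetType().GetProperty("Current");

        try
        {
            while ((bool)moveNext.Invoke(e, null))
                yield return (T)current.GetValue(e, null);
        }
        finally
        {
            // Honor IDisposable, just like the foreach-statement does.
            var d = e as IDisposable;
            if (d != null)
                d.Dispose();
        }
    }
}

// A tiny structural enumerator yielding 1, 2, 3 to exercise the operator.
class Fixed
{
    public FixedEnumerator GetEnumerator() { return new FixedEnumerator(); }
}

class FixedEnumerator
{
    private int i = 0;
    public bool MoveNext() { return ++i <= 3; }
    public int Current { get { return i; } }
}

class Program
{
    static void Main()
    {
        var evens = new Fixed().AsDuckEnumerable<int>().Where(x => x % 2 == 0);
        Console.WriteLine(string.Join(", ", evens.Select(x => x.ToString()).ToArray())); // 2
    }
}
```

This trades the DLR’s call-site caching for raw reflection on every MoveNext/Current access, so it’s even more wasteful than the dynamic version, but it proves the point.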

Enjoy!


# Introduction

Sunday morning, time for another episode of the Crazy Sundays series. Again one in the category with risk of exploding brains, but that’s what we like, don’t we? This time around, we’re going to have a look at the type free lambda calculus in C#. But wait a minute, isn’t C# a typed language? True. Does that mean everything you do in it should be statically typed? Not necessarily: typing is there as a tool you can take or leave. In this post we’ll take a look at the new C# 4.0 dynamic feature (which provides a static type to do dynamic dispatch) from a somewhat bizarre angle…

Sometimes types do get in the way: maybe something is intrinsically untyped (like XML documents without a schema, web services without a WSDL contract, etc) or meant to be dynamically typed (like objects coming from a dynamic language like Ruby or Python). And then there are notorious APIs that “peter out” in their typing: one moment you’re statically typed, but all of a sudden you end up with System.Object everywhere (like with COM interop to the Office libraries). All such scenarios are what C# 4.0 dynamic is meant for. The classical sample to illustrate the feature is to talk to an IronPython object from C#. First the Python definition:

```
class Calc:
    def Add(self, a, b):
        return a + b

def GetCalc():
    return Calc()
```

The nice thing about this Python-based calculator is that it will work with anything that supports an addition operator. In other words, we could feed it two int objects, two string objects, or even a DateTime and a TimeSpan. To make such calls, we use the dynamic keyword in C#:

```
var py = Python.CreateEngine();
dynamic script = py.ImportModule("Calc");

dynamic calc = script.GetCalc();
int three = calc.Add(1, 2);
```

Ignore the few lines of plumbing to load the Python file; what matters here is the fact that the calc variable is typed to be “dynamic”, meaning all operations invoked on it will be resolved at runtime.

I won’t go into details on how all of this works, maybe another time (outside the scope of Crazy Sundays posts), but that’s the essence of the feature. Instead, we’re going to push this feature to the limits in a “don’t try this at homework” style: enter type free lambda calculus.

# Type free lambda calculus in a nutshell

The type free lambda calculus, due to Alonzo Church around 1930, is a theory about the computational aspects of functions. It views functions as rules and defines two complementary operations to work with them: abstraction and application. In essence, that’s all there is, but the theory nevertheless gives rise to a whole research area of its own. To convince yourself of that statement, check the page count of the standard reference work on the subject at your favorite online bookshop. So, allow me to summarize the essentials here, citing and paraphrasing that book (due to the non-trivial nature of mathematical notation, originally pasted as images from Word equations). First, we need to define the concept of a lambda term:
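In standard notation, lambda terms over a set of variables are built inductively from variables, abstractions, and applications:

```
M, N ::= x \mid (\lambda x.\, M) \mid (M\; N)
```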

As a typical exercise, expand the fourth lambda term in the examples above to its fully parenthesized and bloated form.

All there is to lambda terms is that they can denote functions, just like lambda expressions in C# do. For example, the second sample above is a function with argument “x”, returning “x”. In other words, it’s the identity function. However, it has no type: it can operate on anything (in particular, any other lambda term). In C# we could write this as follows:

`Func<T, T> I = x => x;`

where T stands for a generic parameter. It’s clear this function can operate on values but equally well it can operate on functions: given a function, it will return exactly that function:

```
I(5) // produces 5
I(I) // produces I
```

In fact, the middle three samples above are closed terms, meaning all symbols used in their “function body” (the part after the dot) are in “scope” by means of the abstraction(s) over them. We call those combinators.

The last sample term in the previous sample is not closed though: it returns “x” which is not being abstracted over. This reflects the concept of a closure, just as we have in C#:

```
R x = …;
Func<T, R> f = z => x;
```

Let’s not go there for now, but suffice it to say that fancy words like “closure” are deeply rooted in theoretical foundations. Count yourself lucky to work with a language layered on top of such solid theoretical foundations :-).

Next, we need the concept of free variables. In short, this allows us to identify those variables not introduced by an abstraction in a given term. The definition is fairly straightforward:
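In standard notation, the set of free variables FV of a term is defined by induction on its structure:

```
\begin{aligned}
FV(x) &= \{x\} \\
FV(\lambda x.\, M) &= FV(M) \setminus \{x\} \\
FV(M\, N) &= FV(M) \cup FV(N)
\end{aligned}
```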

For combinators, the free variables set will be empty. In contrast, for our last sample (the one resulting in a closure) the set would be a singleton containing “x”. Nothing too fancy, right?

Finally, we can define how application is carried out, based on the concept of substitution:
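In standard notation, this is the beta-conversion rule: applying an abstraction to an argument substitutes the argument for the bound variable in the body:

```
(\lambda x.\, M)\, N \;\to_\beta\; M[x := N]
```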

To avoid name clashes (something that can be handled formally through a variable convention, based on the definition of FV above), we note that careful renames are possible. We all know this from C#, and this is merely a theoretical foundation for scoping:

`Func<T, T> I1 = x => x;`

and

`Func<T, T> I2 = y => y;`

are in fact the “same”. Using such a big word in the context of a whole runtime (CLR) and language (C#) is a dangerous business. I’m not saying that I1 and I2 will refer to the same delegate: there’s no identification between delegates that says “x => x” and “y => y” are the same. Rather, what I’m pointing out here is that the behavior of I1 and I2 will be the same when applied to the same object. In the lambda calculus this is referred to as alpha-conversion.

Ignoring the important concern of avoiding name clashes for the time being, have a closer look at the application “beta-conversion” rule above. The idea is simple: “application of an abstraction with another term results in substitution”. This is pretty much like a delegate call in C#, but yet different enough: in the world of the lambda calculus, substitutions are nothing but mechanical rewrites on terms. In languages like C#, code is compiled as-is and delegate calls don’t magically rewrite the delegate’s body on the fly. But more importantly, in C# we get also immediately concerned with call semantics like call-by-value: before making a call, its arguments need to be reduced to a “value” that can be passed to the receiving end of the call (through the delegate invocation mechanism).

# Untyped or uni-typed?

Now the question of the day: can we type (in the CLR/C# type system sense) all lambda terms? It seems we were successful doing so already with the I combinator, concluding it’s a Func<T, T> with the use of generics: passing in a value of a certain type T, the result is the same value and hence of the same type. In fact, what about inferring such signatures? C# doesn’t do so, but F# can (through Hindley-Milner type inference, as is typical in ML-inspired languages):

Okay, “I” is inferred to be a type that goes from ‘a to ‘a, where ‘a is the notation for a generic parameter. What about K, the combinator that given two arguments always returns the first one (a “constant” value producer):

We’re still good to go it seems. F# has inferred the type ought to be “‘a to ‘b to ‘a”. Wow, what’s going on here? In C#, this would look like Func<T1, Func<T2, T1>>: a function that given an argument of type T1 returns a function that given an argument of type T2 returns a value of type T1. Get it? What’s happening here is “currying”, where a function of n arguments is turned into “nested” functions that consume one argument at a time. This means we can write, say, “K 5” to create a new function that will eat the remaining “y” argument (of any type) and will always return 5.

Note: There’s a little complication here in the context of F#, called the “value restriction”. I won’t go there as this is irrelevant for the discussion here. What we’re focusing on is solely whether or not we can type lambda expressions.
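In C#, the curried K and its partial application look like this (a small sketch, with concrete type arguments of my own choosing):

```// K : 'a -> 'b -> 'a, here instantiated as Func<int, Func<string, int>>.
Func<int, Func<string, int>> K = x => y => x;

// "K 5": partial application eats the first argument and ignores the second.
Func<string, int> five = K(5);

Console.WriteLine(five("ignored"));   // prints 5
Console.WriteLine(five("whatever"));  // prints 5```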

Even for the (relatively) complicated beast S, a type can be inferred:

Impressive? Not so hard in fact. Follow me as we infer the type ourselves by hand, from the function’s body. First of all, we see x applied to two terms: “z” and “(y z)”. That means “x” is a (curried) function of two arguments, and we assume it returns something. Assign type names to those things: the first argument “z” gets type 'a and the result of “(y z)” gets type 'b. For the result we write 'c. In other words, the type of “x” is already inferred to be 'a -> 'b -> 'c. Next, we need to infer the type for “y”, based on our prior identification of the type of “(y z)” as 'b. We see that “y” is a function with one argument, “z”. We already typed “z” as 'a, so the type of “y” becomes 'a -> 'b. And finally, our “S” function also takes in “z” as a third argument, which is typed 'a. All of this brought together with the return type of the call to “x” leads to the signature shown above.
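The hand-inferred signature can also be spelled out with C# generics. As a sketch (assuming we host it as a static generic method), 'a -> 'b -> 'c becomes nested Func types:

```// S : ('a -> 'b -> 'c) -> ('a -> 'b) -> 'a -> 'c, in curried form.
static Func<Func<A, B>, Func<A, C>> S<A, B, C>(Func<A, Func<B, C>> x)
{
    return y => z => x(z)(y(z));
}```

Feeding S a curried two-argument function and a one-argument function distributes the final “z” argument to both, exactly as the walk-through predicts.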

So, it looks like we can type all lambda terms, right? Unfortunately, no. Here’s the proof:

Here I’ve written a function W that given an argument x applies that argument to itself. Such a simple lambda term, and yet one can’t find a type for it. Follow me on this brain exercise: W takes one argument x, say of type 'a. Now, x is applied to x. From this it looks like x is a function type with one argument. The argument is x, which we’ve already typed as 'a. What about the return type? Let’s say it’s 'b, so now we have that x needs to be of type 'a -> 'b, which isn’t the same as the 'a we had before: unification fails.

That’s where differences between the type-free lambda calculus and typed variants (like the simply typed lambda calculus and, for generics and such, something called “System F”) crop up. C# being a typed language doesn’t allow us to get rid of types altogether, so there’s no way we can get “untyped”. But what about “uni-typed” (a term due to Robert Harper): replacing all types with one single type. Can we get there? The answer is: with “dynamic” we can!

`dynamic W = new Func<dynamic, dynamic>(x => x(x));`

The fact we need some ugly delegate constructor call is unfortunate, but notice how we’re assigning the “dynamic -> dynamic” type to a “dynamic” on the left. In other words, we’re treating everything (non-functional values and function “objects” themselves) as dynamic. The code above compiles just fine, but how does the x(x) in the lambda body work? Well, at runtime the system will figure out what the type of x is and ensure it can be used to be called as a unary function. For example:

```dynamic W = new Func<dynamic, dynamic>(x => x(x));
W(1);```

This will clearly fail. It corresponds to calling the integer 1 as a function, with the integer 1 as its argument. Clearly, an integer value cannot be used as a function (a good thing!), so a runtime error results:

Unhandled Exception: Microsoft.CSharp.RuntimeBinder.RuntimeBinderException: Cannot invoke a non-delegate type
at CallSite.Target(Closure , CallSite , Object , Object )
at System.Dynamic.UpdateDelegates.UpdateAndExecute2[T0,T1,TRet](CallSite site, T0 arg0, T1 arg1)
at UntypedLambda.Program.<Main>b__46(Object x)
at CallSite.Target(Closure , CallSite , Object , Int32 )
at System.Dynamic.UpdateDelegates.UpdateAndExecuteVoid2[T0,T1](CallSite site, T0 arg0, T1 arg1)
at UntypedLambda.Program.Main()

How this works internally is beyond the scope of this post (I’m sure I’ll blog about the DLR machinery and the role of the C# and VB compilers in that mix some time in the relatively near future) but the DLR was right to conclude the call above is nonsense.

What about the following?

```var I = new Func<dynamic, dynamic>(x => x);
dynamic W = new Func<dynamic, dynamic>(x => x(x));
Console.WriteLine(W(I)(42));```

Now, that works. Passing I to W results in I(I), which reduces (using application, or beta-conversion) into I. That function is then used with argument 42 subsequently, returning 42.

You guessed it … we’re going to uni-type our whole program using dynamic in this Crazy Sundays post. Did I ever mention Scheme? If not, I did now :-).

# SKI combinators

As we’ve already played with S, K and I in the samples above, let’s see how those look in uni-typed C#:

```//
// Where else to start than with ... SKI combinators.
//
var S = new Func<dynamic, Func<dynamic, Func<dynamic, dynamic>>>(x => y => z => x(z)(y(z)));
var K = new Func<dynamic, Func<dynamic, dynamic>>(x => y => x);
var I = new Func<dynamic, dynamic>(x => x);```

Notice we’re currying all functions so that we can partially apply functions. For example, we can do the following to create a “constant function” that always returns 42:

```var answer = K(42);

Console.WriteLine(answer(1));
Console.WriteLine(answer("Hello"));
Console.WriteLine(answer(S));```

All of the calls above will print 42, regardless of what’s passed in to answer. Follow the reduction in your head: applying K to 42 returns a function from a parameter called “y” to 42. In other words, “y” is discarded altogether. We’re generating the constant 42, no matter what (both in terms of value and type) we throw at the function as an argument. Not just plain values: in the last line we throw S, another function, at the answer function and yet it persists in telling us the answer is 42.

It can be shown, as an easy exercise, that SKK (and SKS) are the same as I, so the following is a complicated way of writing 5:

`int five = S(K)(K)(5);`

See how functions are first-class citizens: they can be passed and returned everywhere. Also notice how functions have to be called one argument at a time because of currying (due to Schönfinkel, not Curry).

While we’re at it, we’ll write a helper function to apply a function to different argument values and print the result to the screen. In subsequent paragraphs we’ll use this function (and a variant thereof) to aid us in printing test results:

```static void Unary<T, PArg, PRes>(string name, Func<dynamic, T> f, Func<dynamic, PArg> printArg, Func<dynamic, PRes> printRes, params dynamic[] values)
{
    foreach (var value in values)
        Console.WriteLine("{0}({1}) = {2}", name, printArg(value), printRes(f(value)));
}```

Quite a signature, but essentially we take a friendly function name, a function to be tested, some function to turn the “dynamic” argument into a print-friendly form (and a similar function for the result of the function call), and an array of values to be fed in. For example:

```Console.WriteLine("SKI");
Unary("SKK", S(K)(K), I, I, 5, DateTime.Now, "Bart");
Unary("I", I, I, I, 5, DateTime.Now, "Bart");
Console.WriteLine();```

This results in the following:

As an exercise, try to do something more meaningful with the S combinator.

# Church Booleans

Functions and data are very closely related. More closely than you may think. Most programmers think of functions as pieces of code that have a certain behavior and get applied to zero or more arguments, producing some value (or none). That’s a very code-centric view of the world, while functions in the mathematical sense are obviously not related to “code” at all. Functions are often defined as graphs: a special kind of relation between two sets. Sets are all about data, aren’t they? Based on that observation, one can realize functions in a computer program as table lookups = data.

But there’s more, even values themselves (and not mappings between them) can be encoded using functions. Let’s start easy, with Booleans. Easy because there are only two values to distinguish. Here’s the proposed mapping:

```//
// Church Booleans.
// Basic idea: true and false are dyadic functions returning respectively
//             the first or second argument, acting as a conditional (?:)
//
var F = new Func<dynamic, Func<dynamic, dynamic>>(a => b => b);
var T = new Func<dynamic, Func<dynamic, dynamic>>(a => b => a);```

Okay, the comment reveals it already: the idea is that true and false are encoded as functions with two arguments (again curried) of which one is returned: false returns the second one, true the first one. Sounds familiar, doesn’t it? Right, the conditional operator (sometimes awkwardly referred to as the ternary operator) in C# does exactly that. Notice that using alpha-conversion, T is exactly the same as K.
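To see the conditional-operator analogy at work, note that a Church Boolean is its own “?:”: applying it to a “then” and an “else” value selects one of the two (definitions repeated here to keep the snippet self-contained):

```var F = new Func<dynamic, Func<dynamic, dynamic>>(a => b => b);
var T = new Func<dynamic, Func<dynamic, dynamic>>(a => b => a);

Console.WriteLine(T("then")("else"));  // prints then
Console.WriteLine(F("then")("else"));  // prints else```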

Notice it’s easy to convert between the untyped world and the typed world using two back-and-forth conversion functions:

```Func<dynamic, bool> toBool = x => x(true)(false);
Func<bool, dynamic> fromBool = b => b ? T : F;```

We wouldn’t need toBool if we weren’t to print the results to the screen somehow :-). For completeness, I’ve added fromBool to the equation. Both are easy to understand, but let’s start with fromBool. Plain easy: give it true, and it returns T; give it false, and you get F back. The inverse function, toBool, is simple function application (beta conversion) of the defined functions to C# Booleans: if the function bound to x is T, the first argument will be returned (true). If it’s F, the second one (false) will. Clear as crystal. Notice toBool can be applied with any object as its argument: going from untyped to typed will blame the argument if something goes wrong (due to Philip Wadler).

Not very useful yet, if we don’t have a way to define operations between such Booleans. Again functions come to the rescue (we don’t have anything else after all), so let’s have a look at how we define a simple operator: not.

`var not = new Func<dynamic, Func<dynamic, Func<dynamic, dynamic>>>(m => a => b => m(b)(a));`

Notice all operators on Church Booleans will be higher-order functions in their very nature, since their arguments are functions already. That’s the nature of the beast we’re dealing with. So, how can we turn a Church Boolean “m” in its opposite? We already know that Church Booleans are dyadic functions, so the result of calling not with a Church Boolean should be a function with two arguments: that’s what “a” and “b” are for. The body of the function may look a bit weird: we’re calling “m” (a Church Boolean, but remember everything is a function hence executable) with arguments b and a. That has a flipping effect as proven below:

Impressive, isn’t it? A word on notation: before doing a beta-reduction we indicate the abstractions we’re going to apply substitutions for by putting a bar on top of them (e.g. in the second step m, in the fourth step x and y). When doing multiple reductions we write an arrow with a double arrowhead. When substituting terms for their definition (like T and F in the proof above), we carry out alpha-conversion to avoid the risk of name clashes (though strictly speaking for closed terms we could play a more risky game).
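Besides the formal proof, a quick empirical check confirms the flipping behavior (definitions repeated to keep the fragment self-contained):

```var F = new Func<dynamic, Func<dynamic, dynamic>>(a => b => b);
var T = new Func<dynamic, Func<dynamic, dynamic>>(a => b => a);
var not = new Func<dynamic, Func<dynamic, Func<dynamic, dynamic>>>(m => a => b => m(b)(a));
Func<dynamic, bool> toBool = x => x(true)(false);

Console.WriteLine(toBool(not(T)));  // False
Console.WriteLine(toBool(not(F)));  // True```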

How can we do binary operators, like and, or and xor? Turns out those are fairly simple to do as well:

```var and = new Func<dynamic, Func<dynamic, dynamic>>(m => n => m(n)(m));
var or = new Func<dynamic, Func<dynamic, dynamic>>(m => n => m(m)(n));
var xor = new Func<dynamic, Func<dynamic, Func<dynamic, Func<dynamic, dynamic>>>>(m => n => a => b => m(n(b)(a))(n(a)(b)));```

Agreed, one needs to see a proof of correctness before being convinced those functions have the desired behavior. By the way, outside C#, we could save us some parentheses which mainly come from (dynamic) delegate invocations above. Also notice that the functions for and and or don’t have abstractions in their “body”: the result of calling “mnm” or “mmn” already gives a function of the right arity, as can be seen in the proofs below:

The proof for xor is left as an exercise to the reader (feel free to use case analysis of all four combinations of arguments, although you can simplify matters a bit…).
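If you’d rather brute-force the xor exercise than do the case analysis on paper, the four combinations can be enumerated mechanically (again self-contained):

```var F = new Func<dynamic, Func<dynamic, dynamic>>(a => b => b);
var T = new Func<dynamic, Func<dynamic, dynamic>>(a => b => a);
var xor = new Func<dynamic, Func<dynamic, Func<dynamic, Func<dynamic, dynamic>>>>(m => n => a => b => m(n(b)(a))(n(a)(b)));
Func<dynamic, bool> toBool = x => x(true)(false);
Func<bool, dynamic> fromBool = b => b ? T : F;

foreach (var m in new[] { false, true })
    foreach (var n in new[] { false, true })
        Console.WriteLine("{0} xor {1} = {2}", m, n, toBool(xor(fromBool(m))(fromBool(n))));```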

Of course you want to see this in action. We already have our Unary function to test lambda functions with given arguments, let’s define a similar one for binary operators:

```static void Binary<T, PArg, PRes>(string name, Func<dynamic, T> f, Func<dynamic, PArg> printArg, Func<dynamic, PRes> printRes, params dynamic[] values)
{
    foreach (var valueL in values)
        foreach (var valueR in values)
            Console.WriteLine("{0}({1}, {2}) = {3}", name, printArg(valueL), printArg(valueR), printRes(f(valueL)(valueR)));
}```

It basically creates all combinations of input arguments and applies them to the function, putting some pretty printing to the mix. So we write:

```Console.WriteLine("Church Booleans");
Unary("toBool", toBool, toBool, I, F, T);
Unary("not", not, toBool, toBool, F, T);
Binary("and", and, toBool, toBool, F, T);
Binary("or", or, toBool, toBool, F, T);
Binary("xor", xor, toBool, toBool, F, T);
Console.WriteLine();```

The use of toBool allows us to go from a “dynamic” function to a concrete C# Boolean value which is printable by Console.WriteLine. The result looks like this:

Looks right, doesn’t it?

# Church numerals

If we can do Booleans, we can do numeric values too, right? Let’s restrict ourselves to the natural numbers (including 0) and we end up with the concept of Church numerals. Now that we have an infinite domain of values to represent, we need a way to control this complexity somehow (in order to be able to define nice operators over them). One possible solution is the use of repeated function application to encode a numeric value, as follows:

Can you see it? Calling f on argument x once corresponds to 1, twice corresponds to 2, etc. Quite nice. What kind of simple operations can we define over such value representation? Clearly we don’t want to define all different N-objects for the whole domain of natural numbers. In fact, we don’t even want to define N1 explicitly. Two basic ingredients should suffice to define every natural number: a representation of 0, and a way to “add one” to a Church numeral (i.e. returning a new function that represents the value of the argument, plus one). This is the much desired successor function:

```var N0 = new Func<dynamic, Func<dynamic, dynamic>>(f => x => x);
var succ = new Func<dynamic, Func<dynamic, Func<dynamic, dynamic>>>(n => f => x => f(n(f)(x)));```

The successor function is a function with one argument “n”, returning a function of two arguments “f” and “x” (like all Church numerals being double-abstractions over some term). The definition is “f(nfx)”, with some more parentheses in C#. But how does it work? See it with your own eyes and feed it N0 for starters: “n” is substituted (beta conversion) for “N0” which by itself is a lambda expression of the form shown in the figure above (goes from f and x to x). A few beta-conversions later you’ll end up with f(x), exactly the definition of N1. In other words, succ simply adds an additional “f” application to the existing term’s body. A more formal proof is given below:

Based on this, can we create a simple conversion function to promote a C# non-negative integer into the corresponding Church numeral? The answer is obviously yes, and the idea is fairly easy: we’ll call succ repeatedly, n times for an integer with value n, starting from N0 as the base case. To go back from a Church numeral to a C# unsigned integer we exploit the repeated function call behavior on argument “f” and use a devilish side-effect to increment a counter whenever it is called:

```Func<dynamic, uint> toInt = x =>
{
    uint n = 0;
    x(new Func<dynamic, dynamic>(_ => { n++; return _; }))(null);
    return n;
};

Func<uint, dynamic> fromInt = null;
fromInt = n =>
{
    return (n == 0 ? N0 : succ(fromInt(n - 1)));
};```

If you don’t get this immediately, don’t worry. Just feed it some Church numerals and see what happens. For example, N0 is a function that simply returns its second argument. When calling toInt on N0, x is bound to N0 and executed with two arguments, of which the second one – null – is returned. The counting function passed in the first argument was never called, so the result is 0. But if N1 is fed in, how many times will that function get called? Right, just once. In other words, the side-effecting n++ runs once, and the result is 1. This means we should be able to “roundtrip” C# integers through Church numerals:

```for (uint i = 0; i <= 10; i++)
    Console.WriteLine(toInt(fromInt(i)));```

Try it at home and you should see the numbers 0 through 10 being printed on the screen. Victory. Oh, and fromInt is simply a recursive function that makes successor calls till n == 0 is reached, at which point N0 is returned. So, fromInt(5) will be encoded as succ(succ(succ(succ(succ(N0))))).

All of this looks good, doesn’t it? Another useful function is a zero-test. In the typed world this would be a function from an integer to a Boolean, but of course in the world of the type-free lambda calculus we lose that typed safety net. Nevertheless, here’s what zero looks like:

`var zero = new Func<dynamic, dynamic>(n => n(K(F))(T));`

Recall that N0 was the same as F, and we know that F returns its second argument as its result. A zero-check on N0 clearly should return T, so the zero function should look like n (???) T. This satisfies the case where N0 is fed in. What about the other cases? Well, the first argument to n (the Church numeral passed in) is the function that gets called repeatedly on the second argument (which we already made T). For all those numerals we want the zero-check to return F, so the question is whether we know a function that will always return F no matter what argument it’s given. We do: it’s K with argument F, the constant-generator combinator K, which we kindly ask to stubbornly return F all the time. A formal proof is shown below:
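Alongside the proof, here’s an empirical check (self-contained, reusing the earlier definitions verbatim):

```var T = new Func<dynamic, Func<dynamic, dynamic>>(a => b => a);
var F = new Func<dynamic, Func<dynamic, dynamic>>(a => b => b);
var K = new Func<dynamic, Func<dynamic, dynamic>>(x => y => x);
var N0 = new Func<dynamic, Func<dynamic, dynamic>>(f => x => x);
var succ = new Func<dynamic, Func<dynamic, Func<dynamic, dynamic>>>(n => f => x => f(n(f)(x)));
var zero = new Func<dynamic, dynamic>(n => n(K(F))(T));
Func<dynamic, bool> toBool = x => x(true)(false);

Console.WriteLine(toBool(zero(N0)));        // True
Console.WriteLine(toBool(zero(succ(N0))));  // False```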

Finally, the real stuff. What about operations like add and multiply, or even exponential? The good thing is they exist indeed :-). And here they are:

```var plus = new Func<dynamic, Func<dynamic, Func<dynamic, Func<dynamic, dynamic>>>>(m => n => f => x => m(f)(n(f)(x)));
var mul = new Func<dynamic, Func<dynamic, Func<dynamic, dynamic>>>(m => n => f => n(m(f)));
var exp = new Func<dynamic, Func<dynamic, dynamic>>(m => n => n(m));```

How they work is another piece of cake. Here’s a proof for plus’s behavior:

That was fairly easy, wasn’t it? Feel free to feed it a couple of values to confirm the behavior, but we’ll run it through some tests in just a while. For mul and exp, proofs need some inductive reasoning as shown below:

Notice how beautiful the definition of exp is: apply n with argument m, and you’ve got an exponential. Wow. If I drank alcohol, this would be a reason to get drunk. I’ll stick with Diet Coke anyhow, but suffice it to say I absolutely love it. And for that matter, mul is nice too, don’t you agree?

What about a predecessor and subtract function? Unfortunately those are quite ugly in this Church numeral encoding system:

`var pred = new Func<dynamic, Func<dynamic, Func<dynamic, dynamic>>>(n => f => x => n(new Func<dynamic, Func<dynamic, dynamic>>(a => b => b(a(f))))(K(x))(I));`

Bweik. It’s so disgusting I’ll leave the definition of a “sub” function (tip: use pred) to the reader. Oh, and if you’re really fearless, prove the correctness of “pred” using induction.
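For the less patient, here’s one possible “sub” along the hinted lines (a sketch of mine, not from the original post): m - n is simply pred applied n times to m, using the numeral n itself as the iterator.

```var I = new Func<dynamic, dynamic>(x => x);
var K = new Func<dynamic, Func<dynamic, dynamic>>(x => y => x);
var pred = new Func<dynamic, Func<dynamic, Func<dynamic, dynamic>>>(n => f => x => n(new Func<dynamic, Func<dynamic, dynamic>>(a => b => b(a(f))))(K(x))(I));

// m - n = pred^n(m); subtraction truncates at zero, since pred(N0) = N0.
var sub = new Func<dynamic, Func<dynamic, dynamic>>(m => n => n(pred)(m));```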

Anyway, here’s a test-application for Church numerals:

```Console.WriteLine("Church numerals");
Unary("toInt", toInt, toInt, I, Enumerable.Range(0, 3).Select(i => fromInt((uint)i)).ToArray());
Unary("succ", succ, toInt, toInt, Enumerable.Range(0, 3).Select(i => fromInt((uint)i)).ToArray());
Unary("pred", pred, toInt, toInt, Enumerable.Range(1, 3).Select(i => fromInt((uint)i)).ToArray());
Unary("zero", zero, toInt, toBool, Enumerable.Range(0, 2).Select(i => fromInt((uint)i)).ToArray());
Binary("plus", plus, toInt, toInt, Enumerable.Range(0, 3).Select(i => fromInt((uint)i)).ToArray());
Binary("mul", mul, toInt, toInt, Enumerable.Range(0, 3).Select(i => fromInt((uint)i)).ToArray());
Binary("exp", exp, toInt, toInt, Enumerable.Range(1, 3).Select(i => fromInt((uint)i)).ToArray());
Console.WriteLine();```

We use our beloved fromInt function in combination with some LINQ operators to produce an array of test inputs, for which all combinations will be fed to the passed-in functions:

Wonderful. (Don’t bother to ask about performance…)

# Going nuts – recursive functions

We could go much further and define pairs, tuples and lists using Church encodings as well, but as I have another rabbit in my hat for another time, I won’t spill the beans at this point. A little sneak peek though (this is without C# 4.0 dynamic…), for pairs:

```var Pair = λ(x => λ(y => λ(z => ß(ß(z, x), y)))); // \xyz.zxy
var Fst = λ(p => ß(p, T)); // \p.pT
var Snd = λ(p => ß(p, F)); // \p.pF```

and for lists:

```var Nil = ß(ß(Pair, T), T);
var IsNil = Fst;
var Cons = λ(h => λ(t => ß(ß(Pair, F), ß(ß(Pair, h), t))));
var Head = λ(z => ß(Fst, ß(Snd, z)));
var Tail = λ(z => ß(Snd, ß(Snd, z)));```

Instead, let’s conclude this post with two recursive functions:

```//
// The icing on the cake.
//
Func<Func<Func<dynamic, dynamic>, Func<dynamic, dynamic>>, Func<dynamic, dynamic>> Fix = null;
Fix = f => x => f(Fix(f))(x);

var fact = Fix(new Func<dynamic, Func<dynamic, dynamic>>(f => x => zero(x)(new Func<dynamic, dynamic>(_ => succ(N0)))(new Func<dynamic, dynamic>(_ => mul(x)(f(pred(x)))))(I)));
var fib = Fix(new Func<dynamic, Func<dynamic, dynamic>>(f => x => zero(x)(new Func<dynamic, dynamic>(_ => N0))(new Func<dynamic, dynamic>(_ => zero(pred(x))(new Func<dynamic, dynamic>(__ => succ(N0)))(new Func<dynamic, dynamic>(__ => plus(f(pred(x)))(f(pred(pred(x))))))(I)))(I)));

Unary("fact", fact, toInt, toInt, Enumerable.Range(0, 10).Select(i => fromInt((uint)i)).ToArray());
Unary("fib", fib, toInt, toInt, Enumerable.Range(0, 10).Select(i => fromInt((uint)i)).ToArray());```

This gets quite ugly due to the use of a fixpoint combinator and the tricks needed to circumvent call-by-value semantics (exercise: where and how does that happen precisely in the fragments above?), which would otherwise cause endless evaluation (exhausting the stack). But rest assured, those functions work just fine:
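As an aside, to convince yourself the Fix combinator is sound independently of all the Church machinery, here it is instantiated at plain C# ints (a side sketch, not part of the original code):

```// Fix f = f (Fix f), eta-expanded: the extra "x =>" delays evaluation under call-by-value.
Func<Func<Func<int, int>, Func<int, int>>, Func<int, int>> Fix = null;
Fix = f => x => f(Fix(f))(x);

// An ordinary recursive factorial that never names itself.
var fact = Fix(f => x => x == 0 ? 1 : x * f(x - 1));
Console.WriteLine(fact(5));  // prints 120```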

Finally, an outtake :-). What does the following result in?

```var Omega = new Func<dynamic, dynamic>(x => x(x))(new Func<dynamic, dynamic>(x => x(x)));
Omega(I);```


# Introduction

In my last post, Expression Trees, Take Two – Introducing System.Linq.Expressions v4.0, I showed how the extensions to the LINQ expression trees API open the door to full-blown statement trees, including support for assignment, control flow, etc. One popular question that came up in the comments section concerns the lack of language-level support for statement trees, like this:

```Expression<Func<int, List<int>>> getPrimes = to =>
{
    var res = new List<int>();
    for (int n = 2; n <= to; n++)
    {
        bool found = false;

        for (int d = 2; d <= Math.Sqrt(n); d++)
        {
            if (n % d == 0)
            {
                found = true;
                break;
            }
        }

        if (!found)
            res.Add(n);
    }
    return res;
};```

At this point, the above won’t work and triggers the following compile error:

A lambda expression with a statement body cannot be converted to an expression tree

This hasn’t changed from the previous release. Though it would be a nice feature to have, there are various reasons not to integrate it with the language at this point. My replies in the comments section of my previous post elaborate on some of the conservative constraints employed during language design, applied to this particular case. So, sorry: not at this point.

The result is the above turns into quite an involved statement tree declaration if done by hand. Let’s repeat it here to set the scene:

```var to = Expression.Parameter(typeof(int), "to");
var res = Expression.Variable(typeof(List<int>), "res");
var n = Expression.Variable(typeof(int), "n");
var found = Expression.Variable(typeof(bool), "found");
var d = Expression.Variable(typeof(int), "d");
var breakOuter = Expression.Label();
var breakInner = Expression.Label();
var getPrimes =
// Func<int, List<int>> getPrimes =
Expression.Lambda<Func<int, List<int>>>(
// {
Expression.Block(
// List<int> res;
new [] { res },
// res = new List<int>();
Expression.Assign(
res,
Expression.New(typeof(List<int>))
),
// {
Expression.Block(
// int n;
new [] { n },
// n = 2;
Expression.Assign(
n,
Expression.Constant(2)
),
// while (true)
Expression.Loop(
// {
Expression.Block(
// if
Expression.IfThen(
// (!
Expression.Not(
// (n <= to)
Expression.LessThanOrEqual(
n,
to
)
// )
),
// break;
Expression.Break(breakOuter)
),
// {
Expression.Block(
// bool found;
new[] { found },
// found = false;
Expression.Assign(
found,
Expression.Constant(false)
),
// {
Expression.Block(
// int d;
new [] { d },
// d = 2;
Expression.Assign(
d,
Expression.Constant(2)
),
// while (true)
Expression.Loop(
// {
Expression.Block(
// if
Expression.IfThen(
// (!
Expression.Not(
// d <= Math.Sqrt(n)
Expression.LessThanOrEqual(
d,
Expression.Convert(
Expression.Call(
null,
typeof(Math).GetMethod("Sqrt"),
Expression.Convert(
n,
typeof(double)
)
),
typeof(int)
)
)
// )
),
// break;
Expression.Break(breakInner)
),
// {
Expression.Block(
// if (n % d == 0)
Expression.IfThen(
Expression.Equal(
Expression.Modulo(
n,
d
),
Expression.Constant(0)
),
// {
Expression.Block(
// found = true;
Expression.Assign(
found,
Expression.Constant(true)
),
// break;
Expression.Break(breakInner)
// }
)
)
// }
),
// d++;
Expression.PostIncrementAssign(d)
// }
),
breakInner
)
),
// if
Expression.IfThen(
// (!found)
Expression.Not(found),
// res.Add(n);
Expression.Call(
res,
typeof(List<int>).GetMethod("Add"),
n
)
)
),
// n++;
Expression.PostIncrementAssign(n)
// }
),
breakOuter
)
),
res
),
to
// }
).Compile();```

I’ve formatted a bunch of nodes in the expression tree above to indicate the “tiles” that are based on the v3.0 subset of the Expression Tree APIs, which do have language support. What if we could simplify the declaration above by leveraging the language support at places where it’s worthwhile? A tour through simplifying statement trees…

# Abstraction + application = ???

Without much further ado, let’s dive straight into our pool of expression tree tricks and reveal the goods for this round: abstraction and application. Guess what, we’re back to fundamental lambda calculus. An explanation is in order:

Given those fundamental laws in lambda calculus, we can create a very round-about way of expressing 1 + 2. Just reverse the arrows in the last diagram: 2 becomes y, 1 becomes x, both “abstracting away” the constants one at a time, resulting in a function that’s readily applied to the two constants.

This basic insight is what will help us to simplify the statement tree declaration, allowing us to “tile in” expression trees in the bigger soup of statement tree nodes. But before we go there, let’s show the technique above in practice. Say we want to over-complicate the act of adding 1 and 2 together and do so through a deviation along the road of abstraction and application:

```var x = Expression.Parameter(typeof(int), "x");
var y = Expression.Parameter(typeof(int), "y");
var f = Expression.Lambda<Func<int, int, int>>(
Expression.Add(x, y),
x, y
);
var three = Expression.Invoke(
f,
Expression.Constant(1),
Expression.Constant(2)
);
var res = Expression.Lambda<Func<int>>(three).Compile();

Console.WriteLine(res());```

What’s happening here is not as difficult as it may seem at first glance. The “f” lambda expression simply is a function that goes from x and y to x + y (in C# lingo, (x, y) => x + y). Expression.Invoke is used to invoke a delegate (lambdas ultimately are delegates), this time supplying the arguments. So, “three” is shorthand for ((x, y) => x + y)(1, 2), which is longhand for 1 + 2. Finally, the “res” lambda expression is a simple function returning an int, being the result of the invocation of the sum abstraction.

Looking at the generated IL code for the “res” delegate (using the tricks revealed in the previous post), we see the following:

IL_0000: ldc.i4.1
IL_0001: ldc.i4.2
IL_0002: stloc.1
IL_0003: stloc.0
IL_0004: ldloc.0
IL_0005: ldloc.1
IL_0006: add
IL_0007: ret

Did you see it? You did? Great! Exactly, no single trace of a delegate call in here. The combination of application (InvocationExpression) over an abstraction (LambdaExpression) got optimized away. Instead of pushing constants 1 and 2 on the stack in preparation of a delegate invocation call instruction, they get stored in locals, followed by a dump of the invoked function’s IL where the expected ldarg instructions are replaced by ldloc instructions. All in all, it’s still as if we’d written:

```var res = Expression.Lambda<Func<int>>(
Expression.Add(
Expression.Constant(1),
Expression.Constant(2)
)
).Compile();```

which translates into slightly simpler IL code:

IL_0000: ldc.i4.1
IL_0001: ldc.i4.2
IL_0002: add
IL_0003: ret

Eliminating the excessive stloc/ldloc pairs in the original fragment is something the JIT compiler could take care of if it feels like it, but the core message here is that this trick of “abstract & apply” is cheaper than it looks.

Why am I telling you all of this? In fact, the trick outlined above is what we’ll use to tile in language-generated expression trees in a bigger sea of hand-crafted statement trees. For the sample above, what if we made the function definition through abstraction happen using the language-level support, while keeping the invocation call in plain hand-crafted expression trees:

```var three = Expression.Invoke(
(Expression<Func<int, int, int>>)((x, y) => x + y),
Expression.Constant(1),
Expression.Constant(2)
);
var res = Expression.Lambda<Func<int>>(three).Compile();```

See it coming? Think about it for a while: we’re combining a compiler-generated expression tree with a manually created bigger one in which we embed the smaller one.

# Theory applied

To keep things a bit more controlled, focus on one part of the original statement tree:

```Expression.IfThen(
// (!
Expression.Not(
// d <= Math.Sqrt(n)
Expression.LessThanOrEqual(
d,
Expression.Convert(
Expression.Call(
null,
typeof(Math).GetMethod("Sqrt"),
Expression.Convert(
n,
typeof(double)
)
),
typeof(int)
)
)
// )
),
// break;
Expression.Break(breakInner)
),```

This corresponds to the terminating condition for the inner loop:

```for (int d = 2; d <= Math.Sqrt(n); d++)
{
if (n % d == 0)
{
found = true;
break;
}
}```

As we all know, that’s a plain old valued expression that could be turned into a function as follows:

```Func<int, int, bool> cond = (x, y) => x <= Math.Sqrt(y);
for (int d = 2; cond(d, n); d++)
{
if (n % d == 0)
{
found = true;
break;
}
}```

Now we’re getting somewhere. The code above is exactly the same as the original one, but we’ve abstracted over the condition by turning it into a function of its own, which we subsequently apply given d and n. For regular code that doesn’t make much sense, but it’s exactly the trick we’re talking about here, and it will make the expression tree case simpler. What if we translated the code above as follows?

```Expression<Func<int, int, bool>> cond = (x, y) => x <= Math.Sqrt(y);
for (int d = 2; cond.Compile()(d, n); d++)
{
if (n % d == 0)
{
found = true;
break;
}
}```

Heh, there’s our expression tree for the condition. As we’re still not generating the whole for-loop in a statement tree, we have to call Compile explicitly on the intermediate lambda expression to invoke it in the condition part of the for-loop. Now we can take it one step further and go back to the expression tree for the whole for-loop, patching it up with our intermediate “cond” expression tree … using application. At the same time, we can eat the “Not” node:

```Expression.IfThen(
// (!
Expression.Not(
// d <= Math.Sqrt(n)
Expression.LessThanOrEqual(
d,
Expression.Convert(
Expression.Call(
null,
typeof(Math).GetMethod("Sqrt"),
Expression.Convert(
n,
typeof(double)
)
),
typeof(int)
)
)
// )
),
// break;
Expression.Break(breakInner)
),```

In essence, the first argument to IfThen (i.e. the if-statement’s condition expression) contains two variables from the outer scope: d and n. We’ll abstract over those, introducing an intermediate lambda expression, and invoke (apply) that one using d and n. Code is worth a thousand words:

```Expression.IfThen(
// (!(d <= Math.Sqrt(n)))
Expression.Invoke(
(Expression<Func<int, int, bool>>)((x, y) => !(x <= Math.Sqrt(y))),
d,
n
),
// break;
Expression.Break(breakInner)
),```

Wow, that got a bit simpler, didn’t it? The cast lambda expression corresponds to the language-generated expression tree. All we have to do is wrap that thing in an InvocationExpression, passing in d and n as arguments (respectively bound to x and y).
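As a quick sanity check (my sketch, not part of the original fragment), the Invoke-based condition can be compiled on its own and compared against the handwritten predicate:

```csharp
using System;
using System.Linq.Expressions;

var d = Expression.Parameter(typeof(int), "d");
var n = Expression.Parameter(typeof(int), "n");

// The "abstract & apply" form of !(d <= Math.Sqrt(n)).
var body = Expression.Invoke(
    (Expression<Func<int, int, bool>>)((x, y) => !(x <= Math.Sqrt(y))),
    d,
    n
);
var cond = Expression.Lambda<Func<int, int, bool>>(body, d, n).Compile();

// Compare against the plain delegate for a range of inputs.
Func<int, int, bool> expected = (a, b) => !(a <= Math.Sqrt(b));
for (int i = 1; i < 20; i++)
    for (int j = 1; j < 400; j++)
        if (cond(i, j) != expected(i, j))
            throw new Exception("mismatch");

Console.WriteLine("equivalent");
```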

Similarly, we can simplify other parts of the statement tree. For example:

```// if (n % d == 0)
Expression.IfThen(
Expression.Equal(
Expression.Modulo(
n,
d
),
Expression.Constant(0)
),
// {
Expression.Block(
// found = true;
Expression.Assign(
found,
Expression.Constant(true)
),
// break;
Expression.Break(breakInner)
// }
)
)```

It doesn’t make sense to try to abstract Expression.Constant calls (it would only blow up the code size and make things more cumbersome), but the n % d == 0 part is something we could ask the compiler to generate for us:

```// if (n % d == 0)
Expression.IfThen(
Expression.Invoke(
(Expression<Func<int, int, bool>>)((x, y) => x % y == 0),
n,
d
),
// {
Expression.Block(
// found = true;
Expression.Assign(
found,
Expression.Constant(true)
),
// break;
Expression.Break(breakInner)
// }
)
)```

It’s a little unfortunate that lambda expressions need a type forced upon them to be emitted either as a delegate (Func<…> types) or as an expression tree (Expression<Func<…>> types), requiring us to use an explicit cast to ask the compiler to generate an expression tree. This blows up the code somewhat, but we’re still saving quite a bit overall. Also for method invocations, we can get rid of the ugly reflection code required to get the right overload, e.g.:

```// if
Expression.IfThen(
// (!found)
Expression.Not(found),
// res.Add(n);
Expression.Call(
res,
typeof(List<int>).GetMethod("Add"),
n
)
)```

Finding the Add method is simple in this case, but if you end up creating a MethodCallExpression with a bunch of arguments to a method with lots of overloads available, the code gets quite complicated. In the sample above, there’s little to gain, but just for the sake of illustration:

```// if
Expression.IfThen(
// (!found)
Expression.Not(found),
Expression.Invoke(
(Expression<Action<List<int>, int>>)((l, e) => l.Add(e)),
res,
n
)
)```

Moreover, when typing the lambda expression above you’ll get full-blown (statically typed) IntelliSense.

In fact, this trick is a runtime-aided trick to implement an “infoof” operator (info-of) which I use from time to time. For example, if you want to get the MethodInfo for Console.WriteLine(“{0} is {1}”, “Bart”, 26) that can get quite involved using reflection flags (public, static), a method name in a string (“WriteLine”), a type array for the arguments (beware of the “params” behavior above), etc. Instead, you could do this:

```static MethodInfo InfoOf(Expression<Action> ex)
{
var mce = ex.Body as MethodCallExpression;
if (mce != null)
{
return mce.Method;
}
else
{
throw new InvalidOperationException("InfoOf called on expression without any kind of member access.");
}
}

static MemberInfo InfoOf<T>(Expression<Func<T>> ex)
{
var me = ex.Body as MemberExpression;
if (me != null)
{
return me.Member;
}
else
{
throw new InvalidOperationException("InfoOf called on expression without any kind of member access.");
}
}```

which we can call as follows:

```var cw = InfoOf(() => Console.WriteLine("{0} is {1}", "Bart", 26));
var now = InfoOf(() => DateTime.Now);```
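The same trick extends to other member types; for example, a hypothetical constructor-targeting overload (my addition, not part of the original helper) could look like this:

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;
using System.Text;

// Hypothetical overload: extract a ConstructorInfo from a new-expression.
static ConstructorInfo CtorOf<T>(Expression<Func<T>> ex)
{
    if (ex.Body is NewExpression ne)
        return ne.Constructor;
    throw new InvalidOperationException("CtorOf called on expression without a constructor call.");
}

var ctor = CtorOf(() => new StringBuilder(16));
Console.WriteLine(ctor.DeclaringType.Name); // StringBuilder
```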

So, all in all, the lack of statement tree support in the language is a pity, but by leveraging existing expression tree support we can simplify the task at hand in some cases. The result for our sample is as follows:

```var to = Expression.Parameter(typeof(int), "to");
var res = Expression.Variable(typeof(List<int>), "res");
var n = Expression.Variable(typeof(int), "n");
var found = Expression.Variable(typeof(bool), "found");
var d = Expression.Variable(typeof(int), "d");
var breakOuter = Expression.Label();
var breakInner = Expression.Label();
var getPrimes =
// Func<int, List<int>> getPrimes =
Expression.Lambda<Func<int, List<int>>>(
// {
Expression.Block(
// List<int> res;
new [] { res },
// res = new List<int>();
Expression.Assign(
res,
Expression.New(typeof(List<int>))
),
// {
Expression.Block(
// int n;
new [] { n },
// n = 2;
Expression.Assign(
n,
Expression.Constant(2)
),
// while (true)
Expression.Loop(
// {
Expression.Block(
// if
Expression.IfThen(
// (!
Expression.Not(
// (n <= to)
Expression.LessThanOrEqual(
n,
to
)
// )
),
// break;
Expression.Break(breakOuter)
),
// {
Expression.Block(
// bool found;
new[] { found },
// found = false;
Expression.Assign(
found,
Expression.Constant(false)
),
// {
Expression.Block(
// int d;
new [] { d },
// d = 2;
Expression.Assign(
d,
Expression.Constant(2)
),
// while (true)
Expression.Loop(
// {
Expression.Block(
// if
Expression.IfThen(
// (!(d <= Math.Sqrt(n)))
Expression.Invoke(
(Expression<Func<int, int, bool>>)((x, y) => !(x <= Math.Sqrt(y))),
d,
n
),
// break;
Expression.Break(breakInner)
),
// {
Expression.Block(
// if (n % d == 0)
Expression.IfThen(
Expression.Invoke(
(Expression<Func<int, int, bool>>)((x, y) => x % y == 0),
n,
d
),
// {
Expression.Block(
// found = true;
Expression.Assign(
found,
Expression.Constant(true)
),
// break;
Expression.Break(breakInner)
// }
)
)
// }
),
// d++;
Expression.PostIncrementAssign(d)
// }
),
breakInner
)
),
// if
Expression.IfThen(
// (!found)
Expression.Not(found),
Expression.Invoke(
(Expression<Action<List<int>, int>>)((l, e) => l.Add(e)),
res,
n
)
)
),
// n++;
Expression.PostIncrementAssign(n)
// }
),
breakOuter
)
),
res
),
to
// }
).Compile();```

Still quite involved to write, but a bit simpler than our previous attempt. As an exercise, identify a few other potential rewrite sites in the fragment above. Have fun!


# Introduction

Avid blog readers know I have a weak spot for expression trees in particular and the broader picture of meta-programming facilities in general. With the introduction of LINQ in the .NET Framework 3.5 timeframe, we added expression trees to the framework as a way to represent code as data. This was a fundamental requirement to make remote LINQ scenarios such as LINQ to SQL work, as the query expression written by the developer needs to be cross-compiled into the corresponding query language, e.g. SQL. To enable this, two routes could be taken:

1. Decompile IL code for the query at runtime, and turn it into the target language – quite roundabout
2. Enable representing the query’s code as data at runtime by emitting an expression tree at compile-time

Turning the query into the required target language at compile time isn’t a viable option either, since the front-end compilers would have to know about all of the target query domains; clearly not a good idea.

# A quick recap of the 3.5 landscape

Before diving into the .NET 4.0 feature set around enhanced expression trees, let’s have a quick recap of the corresponding baseline features introduced in .NET 3.5. Since LINQ was the driver for this feature after all, it makes sense to take a top-down approach and turn a LINQ query into more primitive building blocks.

Consider the following query expression:

var res = from p in db.Products
where p.UnitPrice > 100
select p.ProductName;

Given this, we can’t say much about the query just yet. All we know is that from a language’s point of view, we can “desugar” the above into the below:

var res = db.Products
.Where(p => p.UnitPrice > 100)
.Select(p => p.ProductName);

From this point on, regular method resolution rules kick in to find a Where method on db.Products, and to find a Select method on the result of calling Where. The argument to both those methods is a lambda expression and here’s where the flexibility comes from. Lambda expressions by themselves don’t have a type:

var twice = (int x) => x * 2;

Though the right-hand side seems to have all the information to infer a type (going from int to an expression of type int, ought to be Func<int, int> right?) the above doesn’t compile. Lambda expressions can be assigned to variables of two distinct types: regular delegates or expression trees thereof. The below illustrates this:

Func<int, int>             twiceD = x => x * 2;
Expression<Func<int, int>> twiceE = x => x * 2;

The first one is shorthand for an anonymous method we already supported in C# 2.0:

Func<int, int>             twiceD = delegate(int x) { return x * 2; };

However, the second one gives rise to quite a bit more of generated code; in particular, code is emitted that constructs an expression tree at runtime to represent the code the user wrote as data. Given this code, the library function calls (like Where and Select in case of LINQ) can consume that expression tree to analyze the code the user wrote, extracting the intent and turning it into execution in whatever way is right for the task at hand (e.g. by cross-compiling the expression into a target query language like SQL). In particular, the twiceE lambda expression gets turned into:

ParameterExpression x = Expression.Parameter(typeof(int), "x");
Expression<Func<int, int>> twiceE = Expression.Lambda<Func<int, int>>(
Expression.Multiply(
x,
Expression.Constant(2)
),
x
);

That’s exactly what happens in the LINQ case when using an IQueryable<T> data source: the query operator methods like Where and Select don’t take in a plain delegate (like a Func<T, bool> for the filter fed in to the Where operator) but instead they take in an Expression<TDelegate> thereof (like an Expression<Func<T, bool>>). This causes the compiler to emit code generating an expression tree (by means of the Expression factory methods as shown above) for the argument. Ultimately the Queryable class puts together all those expression tree fragments into a big one representing the whole query expression, ready for the query provider to do whatever magic it sees fit. We won’t go there now.
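This is easy to see in action without a database: any in-memory sequence can be lifted to IQueryable&lt;T&gt; via AsQueryable, at which point the composed expression tree becomes observable (a small sketch of mine):

```csharp
using System;
using System.Linq;

var prices = new[] { 50m, 150m, 250m }.AsQueryable();

// Where on IQueryable<T> takes an Expression<Func<T, bool>>, so the
// filter is captured as an expression tree rather than a delegate.
var expensive = prices.Where(p => p > 100m);

// The composed query is observable as data.
Console.WriteLine(expensive.Expression);

// And it still executes, through runtime compilation of the tree.
Console.WriteLine(expensive.Count()); // 2
```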

To finish up this recap, a single picture that tells it all:

There’s one thing I haven’t mentioned yet: the Compile method on the LambdaExpression type as produced by the Expression.Lambda call. This plain and simple API surface gives access to a gigantic machinery behind the scenes that turns the expression tree into executable IL code at runtime: the tip of the Compiler-as-a-Service (CaaS) iceberg.

As a little extra, let’s dig a little deeper and show the generated IL code through the Son-of-Strike (SOS) debugger extension. I’m using a Visual Studio 2010 project targeting .NET 3.5 for this demo. If it weren’t for the fact I have a 64-bit machine, I could equally well have used Visual Studio 2008, but mixed mode debugging (native + managed, as required by SOS) isn’t available on 64-bit in the 2008 version. In Visual Studio 2010, this restriction is gone (hooray).

Inspired by the code above, we can write a simple expression tree and have it compile at runtime. The resulting delegate object is based on dynamically emitted code, which we’ll visualize now:

First, we load the SOS extension from the .NET 2.0 installation folder. To do this, you’ll have to enable unmanaged code debugging:

Next, we use !dso to dump stack objects, which will reveal our twice object which is of type Func<int, int>. Dumping the object using !do reveals the target object, the underlying method, etc. We’ll have a look at the _methodBase next, where we’ll be able to see the dynamic method’s code:

The !dumpil command is all we need to dump the contents of the dynamically generated IL body, revealing the “times two” logic. Feel free to experiment with more complex lambda expressions, revealing their contents as IL after runtime compilation. For example, what lambda expression did I write to arrive at the following?

!dumpil 026b89f0
This is dynamic IL. Exception info is not reported at this time.
If a token is unresolved, run "!do <addr>" on the addr given
in parenthesis. You can also look at the token table yourself, by
running "!DumpArray 026bde24".

IL_0000: ldarg.1
IL_0001: ldarg.2
IL_0002: callvirt 6000002 System.String.Substring(Int32)
IL_0007: ret
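For the record, here’s my reconstruction of that quiz: two arguments loaded, a virtual call to String.Substring(Int32), and a return are consistent with a lambda along these lines:

```csharp
using System;
using System.Linq.Expressions;

// Matches the ldarg.1 / ldarg.2 / callvirt Substring(Int32) / ret pattern.
Expression<Func<string, int, string>> f = (s, i) => s.Substring(i);

Console.WriteLine(f.Compile()("expression", 2)); // pression
```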

So, to wrap up, the .NET 3.5 story of expression trees consisted of two core parts:

• Library support for expression trees, in the System.Linq.Expressions namespace defined in the System.Core.dll assembly (meaning one can manually new up expression trees and obviously inspect their properties).
• Language support for expression trees, triggered by assigning a lambda expression to an Expression<TDelegate> type, available in C# and VB (as invisible glue to make LINQ work in certain scenarios).

# After expressions come … statements

Looking at expression trees in .NET 3.5, one observes they’re only able to represent expressions, i.e. those portions of languages that are valued in their very nature. Things like if and while don’t fall under this umbrella and are statements, and one more step higher up we land at full-blown declarations (like classes and interfaces). It’s clear that if we want to represent more than valued expressions and aim our arrows at full method bodies, we need more than pure expression trees. So .NET 4.0 introduces statement trees as an extension to the existing expression tree API (as a little unfortunate side-effect of this, the Expressions part in the containing namespace becomes a bit of a misnomer).

The following picture illustrates the gaps we have in expression trees that hinder us from representing full method bodies. The reason we’re looking for such support is to be able to use expression trees as one of the driving forces behind the Dynamic Language Runtime (DLR). To communicate code to the runtime, one uses expression trees. One could think of this in a rather extremist fashion as follows: where IL is used to represent code in the CLR, expression trees are used in the DLR. Ultimately the DLR is layered on top of the CLR, so at the end of the day expression trees are compiled – at runtime – into IL instructions as shown above with the Compile method of the v3.5 API. And from that point on, the IL can be turned into native code by the JIT compiler.

Everything in green was already exposed through the expression tree APIs in .NET 3.5. Things like method calls, various operators, constants, constructor calls, and variables (though you couldn’t assign to them but use them as parameters to lambdas) can be represented through factory method calls on the Expression type.

The first thing we need to add is all sorts of assignments so that mutation of variables can be achieved. Lots of languages live and breathe mutation, so assignment becomes an essential building block to realize the potential of those languages on top of the DLR. This is indicated in orange. Notice things like n++ fall under this category as well, due to their dual increment-and-assign nature. In fact, assignments in C# are categorized as expressions (why?), so the old API didn’t expose the whole expression surface of the language. Instead, the LINQ expressions API provided just enough expressiveness as was needed for commonly used operations in the context of writing queries. Writing things like the following doesn’t make sense in that context:

var res = from p in db.Products
where p.UnitPrice > 100
select p.ProductName = "Changed";

(Note: The above could represent some kind of update statement, but together with LINQ’s deeply rooted lazy evaluation the above would turn out very cumbersome from an operational point of view.)

Finally, we require a way to perform sequencing of operations (the semi-colon in C# if you will) and organize such operations in blocks where the concept of scoping is alive and kicking (the idea of curly braces in C#). In addition, every statement that influences control flow is useful to have too. Examples include if and loops (together with their break and continue mechanisms), returning from a method, throwing and catching exceptions, etc.
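As a tiny preview of what such building blocks look like when combined, here’s a minimal sketch of sequencing and scoping (assignments inside a block, with the block itself producing a value):

```csharp
using System;
using System.Linq.Expressions;

var x = Expression.Variable(typeof(int), "x");

// { int x; x = 1; x = x + 2; x } -- two sequenced assignments in one scope.
var body = Expression.Block(
    new[] { x },
    Expression.Assign(x, Expression.Constant(1)),
    Expression.Assign(x, Expression.Add(x, Expression.Constant(2))),
    x
);

var f = Expression.Lambda<Func<int>>(body).Compile();
Console.WriteLine(f()); // 3
```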

Adding all those ingredients to the mix results in the v4.0 version of the expression tree API, including the ability to do dynamic dispatch (which is what the DLR is all about, and is surfaced through the new C# 4.0 dynamic keyword). Note: the “table” mentioned in the rectangle for dynamic dispatch above refers to all the operations supported by the DLR, such as calling properties, invoking methods, etc.

The following query can be used on both .NET 3.5 and .NET 4.0 to illustrate the differences between the APIs:

```var ops = from m in typeof(Expression).GetMethods()
where m.IsStatic
group m by m.Name into g
orderby g.Key
select new { g.Key, Overloads = g.Count() };

foreach (var op in ops)
Console.WriteLine(op);```

In .NET 3.5, this is the result:

{ Key = Add, Overloads = 2 }
{ Key = AddChecked, Overloads = 2 }
{ Key = And, Overloads = 2 }
{ Key = AndAlso, Overloads = 2 }
{ Key = ArrayIndex, Overloads = 3 }
{ Key = ArrayLength, Overloads = 1 }
{ Key = Bind, Overloads = 2 }
{ Key = Call, Overloads = 6 }
{ Key = Coalesce, Overloads = 2 }
{ Key = Condition, Overloads = 1 }
{ Key = Constant, Overloads = 2 }
{ Key = Convert, Overloads = 2 }
{ Key = ConvertChecked, Overloads = 2 }
{ Key = Divide, Overloads = 2 }
{ Key = ElementInit, Overloads = 2 }
{ Key = Equal, Overloads = 2 }
{ Key = ExclusiveOr, Overloads = 2 }
{ Key = Field, Overloads = 2 }
{ Key = GetActionType, Overloads = 1 }
{ Key = GetFuncType, Overloads = 1 }
{ Key = GreaterThan, Overloads = 2 }
{ Key = GreaterThanOrEqual, Overloads = 2 }
{ Key = Invoke, Overloads = 2 }
{ Key = Lambda, Overloads = 5 }
{ Key = LeftShift, Overloads = 2 }
{ Key = LessThan, Overloads = 2 }
{ Key = LessThanOrEqual, Overloads = 2 }
{ Key = ListBind, Overloads = 4 }
{ Key = ListInit, Overloads = 6 }
{ Key = MakeBinary, Overloads = 3 }
{ Key = MakeMemberAccess, Overloads = 1 }
{ Key = MakeUnary, Overloads = 2 }
{ Key = MemberBind, Overloads = 4 }
{ Key = MemberInit, Overloads = 2 }
{ Key = Modulo, Overloads = 2 }
{ Key = Multiply, Overloads = 2 }
{ Key = MultiplyChecked, Overloads = 2 }
{ Key = Negate, Overloads = 2 }
{ Key = NegateChecked, Overloads = 2 }
{ Key = New, Overloads = 6 }
{ Key = NewArrayBounds, Overloads = 2 }
{ Key = NewArrayInit, Overloads = 2 }
{ Key = Not, Overloads = 2 }
{ Key = NotEqual, Overloads = 2 }
{ Key = Or, Overloads = 2 }
{ Key = OrElse, Overloads = 2 }
{ Key = Parameter, Overloads = 1 }
{ Key = Power, Overloads = 2 }
{ Key = Property, Overloads = 3 }
{ Key = PropertyOrField, Overloads = 1 }
{ Key = Quote, Overloads = 1 }
{ Key = RightShift, Overloads = 2 }
{ Key = Subtract, Overloads = 2 }
{ Key = SubtractChecked, Overloads = 2 }
{ Key = TypeAs, Overloads = 1 }
{ Key = TypeIs, Overloads = 1 }
{ Key = UnaryPlus, Overloads = 2 }

And in .NET 4.0, where I’ve used the same color scheme to outline differences from the previous version (notice some additions exist to the “expression” subset of available tree node types):

{ Key = Add, Overloads = 2 }
{ Key = AddAssign, Overloads = 3 }
{ Key = AddAssignChecked, Overloads = 3 }
{ Key = AddChecked, Overloads = 2 }
{ Key = And, Overloads = 2 }
{ Key = AndAlso, Overloads = 2 }
{ Key = AndAssign, Overloads = 3 }
{ Key = ArrayAccess, Overloads = 2 }
{ Key = ArrayIndex, Overloads = 3 }
{ Key = ArrayLength, Overloads = 1 }
{ Key = Assign, Overloads = 1 }
{ Key = Bind, Overloads = 2 }
{ Key = Block, Overloads = 12 }
{ Key = Break, Overloads = 4 }
{ Key = Call, Overloads = 14 }
{ Key = Catch, Overloads = 4 }
{ Key = ClearDebugInfo, Overloads = 1 }
{ Key = Coalesce, Overloads = 2 }
{ Key = Condition, Overloads = 2 }
{ Key = Constant, Overloads = 2 }
{ Key = Continue, Overloads = 2 }
{ Key = Convert, Overloads = 2 }
{ Key = ConvertChecked, Overloads = 2 }
{ Key = DebugInfo, Overloads = 1 }
{ Key = Decrement, Overloads = 2 }
{ Key = Default, Overloads = 1 }
{ Key = Divide, Overloads = 2 }
{ Key = DivideAssign, Overloads = 3 }
{ Key = Dynamic, Overloads = 6 }
{ Key = ElementInit, Overloads = 2 }
{ Key = Empty, Overloads = 1 }
{ Key = Equal, Overloads = 2 }
{ Key = ExclusiveOr, Overloads = 2 }
{ Key = ExclusiveOrAssign, Overloads = 3 }
{ Key = Field, Overloads = 3 }
{ Key = GetActionType, Overloads = 1 }
{ Key = GetDelegateType, Overloads = 1 }
{ Key = GetFuncType, Overloads = 1 }
{ Key = Goto, Overloads = 4 }
{ Key = GreaterThan, Overloads = 2 }
{ Key = GreaterThanOrEqual, Overloads = 2 }
{ Key = IfThen, Overloads = 1 }
{ Key = IfThenElse, Overloads = 1 }
{ Key = Increment, Overloads = 2 }
{ Key = Invoke, Overloads = 2 }
{ Key = IsFalse, Overloads = 2 }
{ Key = IsTrue, Overloads = 2 }
{ Key = Label, Overloads = 6 }
{ Key = Lambda, Overloads = 18 }
{ Key = LeftShift, Overloads = 2 }
{ Key = LeftShiftAssign, Overloads = 3 }
{ Key = LessThan, Overloads = 2 }
{ Key = LessThanOrEqual, Overloads = 2 }
{ Key = ListBind, Overloads = 4 }
{ Key = ListInit, Overloads = 6 }
{ Key = Loop, Overloads = 3 }
{ Key = MakeBinary, Overloads = 3 }
{ Key = MakeCatchBlock, Overloads = 1 }
{ Key = MakeDynamic, Overloads = 6 }
{ Key = MakeGoto, Overloads = 1 }
{ Key = MakeIndex, Overloads = 1 }
{ Key = MakeMemberAccess, Overloads = 1 }
{ Key = MakeTry, Overloads = 1 }
{ Key = MakeUnary, Overloads = 2 }
{ Key = MemberBind, Overloads = 4 }
{ Key = MemberInit, Overloads = 2 }
{ Key = Modulo, Overloads = 2 }
{ Key = ModuloAssign, Overloads = 3 }
{ Key = Multiply, Overloads = 2 }
{ Key = MultiplyAssign, Overloads = 3 }
{ Key = MultiplyAssignChecked, Overloads = 3 }
{ Key = MultiplyChecked, Overloads = 2 }
{ Key = Negate, Overloads = 2 }
{ Key = NegateChecked, Overloads = 2 }
{ Key = New, Overloads = 6 }
{ Key = NewArrayBounds, Overloads = 2 }
{ Key = NewArrayInit, Overloads = 2 }
{ Key = Not, Overloads = 2 }
{ Key = NotEqual, Overloads = 2 }
{ Key = OnesComplement, Overloads = 2 }
{ Key = Or, Overloads = 2 }
{ Key = OrAssign, Overloads = 3 }
{ Key = OrElse, Overloads = 2 }
{ Key = Parameter, Overloads = 2 }
{ Key = PostDecrementAssign, Overloads = 2 }
{ Key = PostIncrementAssign, Overloads = 2 }
{ Key = Power, Overloads = 2 }
{ Key = PowerAssign, Overloads = 3 }
{ Key = PreDecrementAssign, Overloads = 2 }
{ Key = PreIncrementAssign, Overloads = 2 }
{ Key = Property, Overloads = 7 }
{ Key = PropertyOrField, Overloads = 1 }
{ Key = Quote, Overloads = 1 }
{ Key = ReferenceEqual, Overloads = 1 }
{ Key = ReferenceNotEqual, Overloads = 1 }
{ Key = Rethrow, Overloads = 2 }
{ Key = Return, Overloads = 4 }
{ Key = RightShift, Overloads = 2 }
{ Key = RightShiftAssign, Overloads = 3 }
{ Key = RuntimeVariables, Overloads = 2 }
{ Key = Subtract, Overloads = 2 }
{ Key = SubtractAssign, Overloads = 3 }
{ Key = SubtractAssignChecked, Overloads = 3 }
{ Key = SubtractChecked, Overloads = 2 }
{ Key = Switch, Overloads = 6 }
{ Key = SwitchCase, Overloads = 2 }
{ Key = SymbolDocument, Overloads = 4 }
{ Key = Throw, Overloads = 2 }
{ Key = TryCatch, Overloads = 1 }
{ Key = TryCatchFinally, Overloads = 1 }
{ Key = TryFault, Overloads = 1 }
{ Key = TryFinally, Overloads = 1 }
{ Key = TryGetActionType, Overloads = 1 }
{ Key = TryGetFuncType, Overloads = 1 }
{ Key = TypeAs, Overloads = 1 }
{ Key = TypeEqual, Overloads = 1 }
{ Key = TypeIs, Overloads = 1 }
{ Key = UnaryPlus, Overloads = 2 }
{ Key = Unbox, Overloads = 1 }
{ Key = Variable, Overloads = 2 }

As you can see, there’s a wealth of new capabilities added to the expression trees API, and one can’t but love this from the bottom of one’s .NET developer’s heart. What hasn’t changed, though, at the moment, is the language support for expression tree generation. That is, when writing a statement-bodied lambda expression and assigning it to an Expression&lt;TDelegate&gt; variable, the compiler won’t turn it into an expression tree using the new nodes for you. It keeps on supporting the subset required for LINQ. Till we truly enter the domain of meta-programming, I expect the language to stay with the true expression tree subset.
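To make the limitation concrete, a statement-bodied lambda still can’t be converted to an expression tree by the compiler (error CS0834); only expression-bodied lambdas get the treatment:

```csharp
using System;
using System.Linq.Expressions;

// Fine: an expression-bodied lambda compiles into a tree.
Expression<Func<int, int>> ok = x => x * 2;

// Error CS0834: "A lambda expression with a statement body cannot be
// converted to an expression tree."
//Expression<Func<int, int>> nope = x => { return x * 2; };

Console.WriteLine(ok);
```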

# A first sample

To get readers excited about this API enrichment, let’s have a go at writing a bigger expression tree containing statement and control flow nodes. Our running primes sample is a good choice for the task at hand. But first, the regular C# code to set the scene:

```Func<int, List<int>> getPrimes = to =>
{
var res = new List<int>();
for (int n = 2; n <= to; n++)
{
bool found = false;

for (int d = 2; d <= Math.Sqrt(n); d++)
{
if (n % d == 0)
{
found = true;
break;
}
}

if (!found)
{
res.Add(n);
}
}
return res;
};

foreach (var num in getPrimes(100))
Console.WriteLine(num);```

This code should be fairly straightforward to grasp. Notice I’m using a lambda expression to define the “method” we’re talking about, so that we have a local variable called getPrimes that we can invoke through. Our goal is to generate this code at runtime using the expression trees APIs, so that the use site (the foreach loop at the bottom) can remain the same.

Next, we’re on for a stroll in expression tree land. The top level node is a plain old LambdaExpression, but then things start to change from what we’re used to. First of all, we’ll need sequencing, which can be achieved through a BlockExpression. Each block can have a number of variables that are in scope, as well as a number of expressions that make up its body. In our case, the body of the lambda has a res variable and two statements: one for the for-loop and one for the return statement:

```var to = Expression.Parameter(typeof(int), "to");
var res = Expression.Variable(typeof(List<int>), "res");
var getPrimes =
// Func<int, List<int>> getPrimes =
Expression.Lambda<Func<int, List<int>>>(
// {
Expression.Block(
// List<int> res;
new [] { res },
// res = new List<int>();
Expression.Assign(
res,
Expression.New(typeof(List<int>))
),
// loop goes here
res
),
to
// }
).Compile();

foreach (var num in getPrimes(100))
Console.WriteLine(num);```

Again, this piece of code is not too hard to understand. The Lambda factory method takes in the body and the lambda parameters. The difference from the past API landscape is that the body now can be a BlockExpression, and in our case it has a “res” variable in scope. Assignment is trivial using the Assign factory method, passing in the left-hand side and right-hand side. Returning from the lambda expression can simply be done by using the “res” expression, since the last valued expression of a block becomes its return value (an Expression.Return factory exists for more complicated cases where one returns from the middle of a block of code, which boils down to a “value-carrying” GotoExpression behind the scenes). Notice this is a more general property of expression trees, as languages like Ruby rely on it throughout (e.g. the value of an if-statement is the last expression’s value from either of the branches).
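The value-carrying goto behind Expression.Return can be seen in a small sketch of mine that exits from the middle of a block:

```csharp
using System;
using System.Linq.Expressions;

var x = Expression.Parameter(typeof(int), "x");
var ret = Expression.Label(typeof(int));

// x => { if (x < 0) return -1; return 1; } -- early exit via a goto carrying a value.
var body = Expression.Block(
    Expression.IfThen(
        Expression.LessThan(x, Expression.Constant(0)),
        Expression.Return(ret, Expression.Constant(-1))
    ),
    Expression.Label(ret, Expression.Constant(1))
);

var sign = Expression.Lambda<Func<int, int>>(body, x).Compile();
Console.WriteLine(sign(-5)); // -1
Console.WriteLine(sign(5));  // 1
```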

Now the loop. Although a whole bunch of loop constructs exist in front-end languages like C#, the expression tree APIs only provide one out of the box. Its goal is to centralize the plumbing involved with re-executing the loop’s body while having break and continue capabilities, as well as maintaining the last iteration’s value as the result of the loop (again for languages like Ruby that work on the DLR using expression trees). As we’ve written a for-loop in the original fragment, we need a way to cross-compile a for-loop into this basic building block for loops:

for (initializer; condition; update)
body

stands for

{
initializer;
while (condition)
{
body;
continueLabel:
update;
}
breakLabel:
}

where the body’s immediate inner statements get a substitution applied to rewrite break and continue statements in terms of the added labels. Notice the use of a separate block, keeping the for-loop’s initializer variables in a separate scope. Moreover, the expression tree API’s loop construct doesn’t have a terminating condition, so that too needs to be replaced by a custom jump to the break label:

{
initializer;
while (true)
{
if (!condition)
goto breakLabel;
body;
continueLabel:
update;
}
breakLabel:
}
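Applied to the simplest possible for-loop, the lowering above looks like this end-to-end (my example, summing 1 through 10 with a single Loop node and a break label):

```csharp
using System;
using System.Linq.Expressions;

var i = Expression.Variable(typeof(int), "i");
var sum = Expression.Variable(typeof(int), "sum");
var brk = Expression.Label();

// { int i = 1, sum = 0; while (true) { if (!(i <= 10)) break; sum += i; i++; } sum }
var body = Expression.Block(
    new[] { i, sum },
    Expression.Assign(i, Expression.Constant(1)),
    Expression.Assign(sum, Expression.Constant(0)),
    Expression.Loop(
        Expression.Block(
            Expression.IfThen(
                Expression.Not(Expression.LessThanOrEqual(i, Expression.Constant(10))),
                Expression.Break(brk)
            ),
            Expression.AddAssign(sum, i),
            Expression.PostIncrementAssign(i)
        ),
        brk
    ),
    sum
);

var total = Expression.Lambda<Func<int>>(body).Compile()();
Console.WriteLine(total); // 55
```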

As our loops don’t use continue statements, we can omit that complication. For the outer loop we end up with:

```// {
Expression.Block(
// int n;
new [] { n },
// n = 2;
Expression.Assign(
n,
Expression.Constant(2)
),
// while (true)
Expression.Loop(
// {
Expression.Block(
// if
Expression.IfThen(
// (!
Expression.Not(
// (n <= to)
Expression.LessThanOrEqual(
n,
to
)
// )
),
// break;
Expression.Break(breakOuter)
),
// loop body goes here
// n++;
Expression.PostIncrementAssign(n)
// }
),
breakOuter
))```

A more thorough explanation is in order here. Think of Expression.Loop as providing an infinite loop. So, right inside the loop’s body (the first argument to the Loop call) we provide a condition to jump out of the loop when the terminating condition holds. As part of the loop’s body we also need to have the inner loop (see further) as well as the update part of the for-statement, which is a post-increment-assign node. One more thing is required here though: the use of a break label to denote the statement right after the loop. We create such a beast using the following call:

`var breakOuter = Expression.Label();`

and specify it as the last argument to the Loop call, where the expression tree API can take care of emitting the label in the right spot. Inside the loop, we can simply use the Break factory method to emit a GotoExpression that essentially breaks out of the loop. Also notice the use of the outer loop’s block as the scope for the variable “n”, which was declared like this:

`var n = Expression.Variable(typeof(int), "n");`

No rocket science either. Finally, we need to write the loop’s body, which is a piece of cake again: use a block to group multiple statements. It also contains an inner for-loop, for which we use the same tricks, including the use of a break label, etc.:

```// {
Expression.Block(
// bool found;
new[] { found },
// found = false;
Expression.Assign(
found,
Expression.Constant(false)
),
// {
Expression.Block(
// int d;
new [] { d },
// d = 2;
Expression.Assign(
d,
Expression.Constant(2)
),
// while (true)
Expression.Loop(
// {
Expression.Block(
// if
Expression.IfThen(
// (!
Expression.Not(
// d <= Math.Sqrt(n)
Expression.LessThanOrEqual(
d,
Expression.Convert(
Expression.Call(
null,
typeof(Math).GetMethod("Sqrt"),
Expression.Convert(
n,
typeof(double)
)
),
typeof(int)
)
)
// )
),
// break;
Expression.Break(breakInner)
),
// {
Expression.Block(
// if (n % d == 0)
Expression.IfThen(
Expression.Equal(
Expression.Modulo(
n,
d
),
Expression.Constant(0)
),
// {
Expression.Block(
// found = true;
Expression.Assign(
found,
Expression.Constant(true)
),
// break;
Expression.Break(breakInner)
// }
)
)
// }
),
// d++;
Expression.PostIncrementAssign(d)
// }
),
breakInner
)
),
// if
Expression.IfThen(
// (!found)
Expression.Not(found),
// res.Add(n);
Expression.Call(
res,
typeof(List<int>).GetMethod("Add"),
n
)
)
),```

It should be straightforward to figure out how the variables d and found and the label breakInner were created up front. Around the d <= Math.Sqrt(n) code there’s quite a bit of plumbing needed because of the implicit conversions that take place, resulting in the use of a few Convert nodes. Other than that, the code above doesn’t use anything new.
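As an aside (our own variation, not part of the original fragment), the Convert plumbing can be sidestepped entirely by flipping the predicate to d * d <= n, which stays within int arithmetic:

```csharp
using System;
using System.Linq.Expressions;

// d <= Math.Sqrt(n) needs Convert nodes; d * d <= n does not.
var n = Expression.Parameter(typeof(int), "n");
var d = Expression.Parameter(typeof(int), "d");
var stillSearching = Expression.Lambda<Func<int, int, bool>>(
    Expression.LessThanOrEqual(Expression.Multiply(d, d), n),
    n, d
).Compile();
Console.WriteLine(stillSearching(10, 3)); // 9 <= 10, prints True
```

The trade-off: d * d can overflow for very large n, so the sketch only holds for the modest ranges we’re playing with here.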

For simplicity when trying this out for yourself in .NET 4.0 and Visual Studio 2010, here’s the whole fragment:

```var to = Expression.Parameter(typeof(int), "to");
var res = Expression.Variable(typeof(List<int>), "res");
var n = Expression.Variable(typeof(int), "n");
var found = Expression.Variable(typeof(bool), "found");
var d = Expression.Variable(typeof(int), "d");
var breakOuter = Expression.Label();
var breakInner = Expression.Label();
var getPrimes =
// Func<int, List<int>> getPrimes =
Expression.Lambda<Func<int, List<int>>>(
// {
Expression.Block(
// List<int> res;
new [] { res },
// res = new List<int>();
Expression.Assign(
res,
Expression.New(typeof(List<int>))
),
// {
Expression.Block(
// int n;
new [] { n },
// n = 2;
Expression.Assign(
n,
Expression.Constant(2)
),
// while (true)
Expression.Loop(
// {
Expression.Block(
// if
Expression.IfThen(
// (!
Expression.Not(
// (n <= to)
Expression.LessThanOrEqual(
n,
to
)
// )
),
// break;
Expression.Break(breakOuter)
),
// {
Expression.Block(
// bool found;
new[] { found },
// found = false;
Expression.Assign(
found,
Expression.Constant(false)
),
// {
Expression.Block(
// int d;
new [] { d },
// d = 2;
Expression.Assign(
d,
Expression.Constant(2)
),
// while (true)
Expression.Loop(
// {
Expression.Block(
// if
Expression.IfThen(
// (!
Expression.Not(
// d <= Math.Sqrt(n)
Expression.LessThanOrEqual(
d,
Expression.Convert(
Expression.Call(
null,
typeof(Math).GetMethod("Sqrt"),
Expression.Convert(
n,
typeof(double)
)
),
typeof(int)
)
)
// )
),
// break;
Expression.Break(breakInner)
),
// {
Expression.Block(
// if (n % d == 0)
Expression.IfThen(
Expression.Equal(
Expression.Modulo(
n,
d
),
Expression.Constant(0)
),
// {
Expression.Block(
// found = true;
Expression.Assign(
found,
Expression.Constant(true)
),
// break;
Expression.Break(breakInner)
// }
)
)
// }
),
// d++;
Expression.PostIncrementAssign(d)
// }
),
breakInner
)
),
// if
Expression.IfThen(
// (!found)
Expression.Not(found),
// res.Add(n);
Expression.Call(
res,
typeof(List<int>).GetMethod("Add"),
n
)
)
),
// n++;
Expression.PostIncrementAssign(n)
// }
),
breakOuter
)
),
res
),
to
// }
).Compile();

foreach (var num in getPrimes(100))
Console.WriteLine(num);```
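For reference, here’s the plain C# the tree encodes, reconstructed from the comments in the fragment; it’s handy to keep side by side while reading the factory calls:

```csharp
using System;
using System.Collections.Generic;

// The C# shape mirrored by the expression tree above: trial division
// up to (int)Math.Sqrt(n), collecting primes into a list.
List<int> GetPrimes(int to)
{
    var res = new List<int>();
    for (int n = 2; n <= to; n++)
    {
        bool found = false;
        for (int d = 2; d <= (int)Math.Sqrt(n); d++)
        {
            if (n % d == 0)
            {
                found = true;
                break;
            }
        }
        if (!found)
            res.Add(n);
    }
    return res;
}

Console.WriteLine(string.Join(", ", GetPrimes(20))); // 2, 3, 5, 7, 11, 13, 17, 19
```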

# Inspecting the IL code

Obviously you want to see the generated code after calling Compile. Near the top of this post you saw how to do this using SOS, so I won’t repeat the recipe over here (just make sure to load the sos.dll file from the v4.0 installation folder instead), but below is the result:

!dumpil 023f2bc0
This is dynamic IL. Exception info is not reported at this time.
If a token is unresolved, run "!do <addr>" on the addr given
in parenthesis. You can also look at the token table yourself, by
running "!DumpArray 023f985c".

// res = new List<int>();
IL_0000: newobj 023ef050 is not a MethodDesc
6000003
IL_0005: stloc.0

// n = 2
IL_0006: ldc.i4.2
IL_0007: stloc.1

// n <= to
IL_0008: ldloc.1
IL_0009: ldarg.1
IL_000a: ble.s IL_000f

// “mumble, mumble”
IL_000c: ldc.i4.0
IL_000d: br.s IL_0010
IL_000f: ldc.i4.1
IL_0010: ldc.i4.0
IL_0011: ceq
IL_0013: brfalse IL_001d
IL_0018: br IL_0079

// found = false;
IL_001d: ldc.i4.0
IL_001e: stloc.2

// d = 2
IL_001f: ldc.i4.2
IL_0020: stloc.3

// d <= Math.Sqrt(n)
IL_0021: ldloc.3
IL_0022: ldloc.1
IL_0023: conv.r8
IL_0024: call 023ef9a0 is not a MethodDesc
6000004
IL_0029: conv.i4
IL_002a: ble.s IL_002f

// “mumble, mumble”

IL_002c: ldc.i4.0
IL_002d: br.s IL_0030
IL_002f: ldc.i4.1
IL_0030: ldc.i4.0
IL_0031: ceq
IL_0033: brfalse IL_003d
IL_0038: br IL_005c

// n % d == 0
IL_003d: ldloc.1
IL_003e: ldloc.3
IL_003f: rem
IL_0040: ldc.i4.0
IL_0041: ceq

IL_0043: brfalse IL_004f

// found = true;
IL_0048: ldc.i4.1
IL_0049: stloc.2

IL_004a: br IL_005c

// d++;
IL_004f: ldloc.3
IL_0050: stloc.s VAR OR ARG 4
IL_0052: ldloc.s VAR OR ARG 4
IL_0054: ldc.i4.1
IL_0056: stloc.3

IL_0057: br IL_0021

// if (!found)
IL_005c: ldloc.2
IL_005d: ldc.i4.0
IL_005e: ceq
IL_0060: brfalse IL_006c

IL_0065: ldloc.0
IL_0066: ldloc.1
IL_0067: callvirt 023f0548 is not a MethodDesc
6000005

// n++;
IL_006c: ldloc.1
IL_006d: stloc.s VAR OR ARG 5
IL_006f: ldloc.s VAR OR ARG 5
IL_0071: ldc.i4.1
IL_0073: stloc.1

IL_0074: br IL_0008

// return res;
IL_0079: ldloc.0
IL_007a: ret

Amazing, isn’t it? That’s it for now. Expect to see more expression tree fun in the future, in my continuation to the Control Library series of posts.


From a hotel lobby in the sunny city of Durban, South Africa, waiting for my plane transfer after a great TechEd Africa event. Why not write a blog post on one of my talks: the Managed Extensibility Framework, or MEF? As we steam ahead to the release of .NET 4.0, it’s great to see the number of new APIs that will make it into this release. I had the opportunity to talk on three of those the past few days:

• The Dynamic Language Runtime (System.Dynamic) with dynamic support in C# and VB, and dynamic languages like IronPython and IronRuby.
• New additions for parallel programming (Task Parallel Library, PLINQ, Coordination Data Structures).
• The Managed Extensibility Framework (System.ComponentModel.Composition).

# Why does extensibility matter?

The world’s most successful APIs succeed in taking away developer frustration. Think of web services (“Add Service Reference”, anyone?), LINQ (thanks to its unified query experience across domains), and many more. With extensibility, that’s no different:

That extensibility is a big deal for customers is no secret. Look at the success of add-ins in popular software packages: browsers, office productivity suites, developer tools, and even the tool I’m writing this post in: Windows Live Writer. Having a brand new installation of Windows 7, I haven’t installed something I’ll need further on in this post: the ability to paste code from Visual Studio into my post. Why didn’t the Windows Live Writer team think about that capability?

Honestly, that’s not a silly question to ask at all. But tell me: if 99% of users with a blog (okay, I’m exaggerating a little bit) tend to write about how great the spaghetti was that they consumed last night in one of their favorite bars, pasting code from a tool as specific as Visual Studio doesn’t seem to meet the bar of top feature requests, right?

What if the Live Writer team were to say: let’s not care about such a feature ourselves, but provide the ability for others to extend our product?

So what about ALT-PrntScrn, CTRL-V to illustrate my point?

Okay, I visited Windows 7’s new Paint utility for a short while to draw a red rectangle, but the above shows how the ability to paste code from Visual Studio is painfully missing and how the “Add a plug-in…” capability is the magic bottle of snake oil waiting to be consumed. Clicking the link, I arrive on a web page with a bunch of extensions waiting to be added to my favorite blogging tool:

I find one plug-in that sounds promising, click Download, accept a few prompts, and the installation is on its way. Seconds later, even without restarting Windows Live Writer, I take another screenshot:

If you’re a true developer, you’ll have to admit being jealous of the Live Writer team. Don’t you want your application to be that pluggable as well? Sure you do! That’s why extensibility matters: customers love it, but it’s way too hard for developers to do and get right today. Guess what, that’s where MEF comes in…

# Our running sample

Boring as I am, my demos always go the minimalistic route: no UI candy, no contrived samples, plain basics using the purest black gold in the Windows and .NET universe: console applications. I bet my statistics of application development in Visual Studio by project type exceed a 100:1 ratio of console applications versus anything else. So live with it, or navigate away (please don’t).

Because I realize “Hello World” would be too boring of a sample, let’s go with an extensible console-driven calculator. Be prepared to be shocked as I show the stunning result:

Little do you know that Add is a built-in operation, while the Pythagorean option was loaded from an extension provided by someone other than the calculator’s “core” developer. And what about MEF? It’s the secret sauce that puts the pieces of the puzzle together.

# Introducing FunCalc’s project structure

But before going into MEF terrain, let’s have a look at how we can make extensibility work without using MEF. Our goal? To show how much plumbing is involved in order to get such a seemingly simple thing off the ground. In order to be a bit architecturally hygienic, we’ll divide our application into a few components:

1. Our superb million dollar business UI, a console application project.
2. Extensibility interface(s), as a class library project.
3. Third party extensions, again as a class library project.

So we’ll go ahead and create a solution with three projects and the necessary cross-references between them: both the UI and the extension projects need to refer to the extensibility interface project to access the contract that’s established between the “host” and the “extenders”. Here’s what the interface will look like:

```public interface ICalculatorOperation
{
char Symbol { get; }
string Name { get; }
int Calc(int a, int b);
}```

How did I insert the code snippet above? Right, using the plug-in I installed just a few minutes before. Life is great and I’ll be delighted to give this level of flexibility to the users of my superior quality calculator as well.

What does the interface look like? Honestly, it’s already overkill, as I seem to have cared a tiny little bit about presentation. Shame on me: the Symbol and Name get-only properties are all about visualization. The former represents the infix operator symbol (‘+’ for the “Add” operation, ‘#’ for the “Pythagorean” operation – don’t ask me why I chose that symbol, but I reserve the “symbolic mathematical right” to choose whatever symbol I see fit :-)), while the latter is used for the menu option.

Besides the UI candy properties (which more UI-savvy developers would bloat into whole control-exposing properties, acting as the labels on buttons in the calculator UI), there’s some real hardcore stuff in the interface as well: Calc is the binary operator that takes two integer numbers and produces another one. That’s where the operator’s implementation will live.

But first, the solution’s project structure. Don’t worry about the code files just yet but rather focus on the references between projects:

I’ve also cleaned up the references to assemblies I don’t need (like System.Data) to make the structure as clean as possible for sake of the demonstration. The startup project, FunCalc, is the Console Application project. It references the Interfaces class library assembly where the aforementioned interface ICalculatorOperation is defined. You can already go ahead and do that. Similarly, the Extensions assembly (which you should really think of as being created by a developer on the other side of the world in a separate solution altogether) refers to the Interfaces assembly. I’m keeping everything in one solution again for the sake of demonstration but you should envision extensions coming from tons of places just as raw DLL assembly files.

One final thing left to do in this setting is to provide a way for the extensions to become discoverable to the host application, FunCalc.exe. In other words, the Extensions.dll file needs to be placed in a well-known location where our application will look for extensions. Again, in a real-world setting this would be done by providing a well-known folder location (maybe registered somewhere in the registry) that plug-in installers would copy their extension DLL file(s) to. All of that would distract us from our core mission here, so let’s use the tools at hand. Using a post-build event command on the Extensions project, I’ll copy the produced assembly to an AddIns subfolder underneath the build location for the FunCalc executable:

Don’t worry too much about this little bit of magic but a few give-aways: TargetPath contains the full path to the compiled file (Extensions.dll), while SolutionDir and SolutionName contain the full path to the solution root and the name of the solution (FunCalc, which happens to be the same as the Console Application’s project name, the only bit of magic we’re relying on here) respectively. I’m also hardcoding the bin\Debug folder, which could be pimped up a little bit if you were to do this for real. But the crux of it is that we’re copying the extensions DLL file to the AddIns folder next to the FunCalc.exe binary. Give it a try and have a look in Explorer:
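The post-build event command I’m using looks roughly like this; the exact macro spelling is from memory, so double-check it against the macros listed in the project’s Build Events dialog:

```
xcopy /Y "$(TargetPath)" "$(SolutionDir)$(SolutionName)\bin\Debug\AddIns\"
```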

Sweet. We’re ready to go.

# Designing the “UI”

Time to design and build the host application’s UI. In Program.cs, add the following code to the Main method for starters:

```Calculator calc = new Calculator();

while (true)
{
ICalculatorOperation op = AskUserInput(calc);
if (op == null)
return;

int a = ReadInt("a", x => true);
int b = ReadInt("b", x => true);
int res = op.Calc(a, b);

Console.WriteLine();
Console.WriteLine("{0} {1} {2} = {3}", a, op.Symbol, b, res);
Console.WriteLine();
}```

We don’t have the Calculator class just yet, but be patient for a little while. What does the main loop do? It calls AskUserInput to show the menu with options and return the calculator operation the user chose. If the result is null, the user entered the exit option (0). To retrieve the operands to the selected (always binary/dyadic in our sample) operator, we call a helper function called ReadInt twice. The first argument is the prompt, the second one a validator function (Func<int, bool>) that makes ReadInt repeat its question until the validator passes (e.g. for range checks, see further). Finally, we invoke the calculator’s selected operation by calling Calc and print the result with some fancy formatting, resulting in infix notation using the operator’s Symbol property getter.

Let’s have a look at AskUserInput next:

```private static ICalculatorOperation AskUserInput(Calculator calc)
{
int i = 1;
foreach (var op in calc.Operations)
{
Console.WriteLine(i++ + ". " + op.Name);
}
Console.WriteLine();

int max = i - 1;

int choice = ReadInt("Your choice", x => x >= 0 && x <= max);

if (choice == 0)
{
return null;
}

return calc.Operations[choice - 1];
}```

As trivial as it can get. Some printing of a one-based menu based on a yet-to-be-defined Operations property on the Calculator object (see further), followed by another use of ReadInt to ask the user for a menu option, this time passing in a validator function using a concise lambda expression making only the range of menu options (including 0 for exit) valid. If the choice was 0, we return null to indicate we should exit, otherwise we select the operation of the user’s choice using an indexer operation.

Which brings us to ReadInt, the final helper function:

```private static int ReadInt(string prompt, Func<int, bool> isValid)
{
bool valid = false;
int choice = 0;
while (!valid)
{
Console.Write(prompt + ": ");
string input = Console.ReadLine();
valid = int.TryParse(input, out choice) && isValid(choice);
}

return choice;
}```

Simple once more. Keep on asking for user input given the specified prompt till the input is a valid integer. That’s all it takes to create the UI and you didn’t have to touch the mouse once. How good does life get?

# The Calculator type

Finally, our host needs its Calculator type, which will hold on to all the supported operations. You can envision it to be much more complex, potentially providing UI candy and additional functionality (like support for different operator arities, a graphing calculator mode, or unit conversion modes – have a look at the Windows 7 calculator for inspiration). But here is our minimalistic Calculator type:

```class Calculator
{
public List<ICalculatorOperation> Operations { get; set; }
}```

At this point, we can try to run the application. It will fail in grandeur:

Why? Because we never populated the Operations list with operations. It’s still null and has never been assigned. A capital mistake, but it turns out this is exactly where MEF will come in: gluing things together.

But before we embrace MEF, let’s try to wire up things manually “the naive way”. And even before we do so, let’s make our calculator sellable by having built-in operations for Add, Subtract, Multiply and Divide.

# Built-in operations

Creating built-in operations is no rocket science: simply implement the interface. The built-in nature reveals where we’re going to define those operations: right inside the FunCalc console application project. Here’s a sample implementation:

```class Add : ICalculatorOperation
{
public char Symbol
{
get { return '+'; }
}

public string Name
{
get { return "Add"; }
}

public int Calc(int a, int b)
{
return a + b;
}
}```

Consult your books on higher-order mathematics to implement the extremely difficult Subtract, Multiply and Divide operations yourself. While you do so, I’ll go ahead and wire those up in the Main method as follows:

```Calculator calc = new Calculator();
calc.Operations = LoadBuiltIns().ToList();

while (true)
{```

And the LoadBuiltIns method looks as follows:

```private static IEnumerable<ICalculatorOperation> LoadBuiltIns()
{
yield return new Add();
yield return new Sub();
yield return new Mul();
yield return new Div();
}```

A trivial iterator, which we turn into a List<ICalculatorOperation> higher up by calling LINQ’s ToList operator. Great teamwork, isn’t it? You gotta love contract-first, interface-driven implementation, don’t you? And now our not-yet-extensible calculator simply works:

Now on to making it extensible.

# Extensibility the naive way

While we already have a LoadBuiltIns method, we also want to be able to load extensions dynamically at runtime by looking in our AddIns folder and discovering types in assemblies that are suitable for use by our Calculator type. The way we do so is by scanning the whole directory for .dll files (a first mistake, since a .dll is not necessarily a .NET assembly), loading them (a mistake if we do it naively, as we shall see), looking for types that implement the interface (possibly a mistake too, because we may pick up types that accidentally implement the interface but weren’t meant – explicitly, that is – for composition), and creating instances of those types by invoking the default constructor (it gets tedious, and this isn’t ideal either). Here’s what I came up with as the most naive, very error-prone way of doing extensibility the .NET 1.0 way. In fact, let’s make it the .NET 3.5 way and be super-naive, writing everything as a LINQ query:

```private static IEnumerable<ICalculatorOperation> LoadExtensions()
{
return from dll in System.IO.Directory.EnumerateFiles("AddIns", "*.dll")
let asm = System.Reflection.Assembly.LoadFrom(dll)
from t in asm.GetTypes()
where t.GetInterfaces().Contains(typeof(ICalculatorOperation))
select (ICalculatorOperation)Activator.CreateInstance(t);
}```
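To illustrate just the “not every .dll is an assembly” mistake, here’s a hedged sketch of our own (the helper name LoadAddInAssemblies is hypothetical, not part of the post’s solution) that guards Assembly.LoadFrom against non-managed files; the other naiveties remain:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;

// Hypothetical helper: load whatever managed assemblies live in a folder,
// skipping files that merely have a .dll extension (e.g. native DLLs)
// instead of crashing on them.
List<Assembly> LoadAddInAssemblies(string path)
{
    var assemblies = new List<Assembly>();
    if (!Directory.Exists(path))
        return assemblies;
    foreach (var dll in Directory.EnumerateFiles(path, "*.dll"))
    {
        try
        {
            assemblies.Add(Assembly.LoadFrom(dll));
        }
        catch (BadImageFormatException)
        {
            // Not a .NET assembly; skip it.
        }
    }
    return assemblies;
}

// Demo against a temporary folder containing one bogus "DLL".
var addIns = Path.Combine(Path.GetTempPath(), "FunCalcAddInsDemo");
Directory.CreateDirectory(addIns);
File.WriteAllText(Path.Combine(addIns, "native.dll"), "not an assembly");
Console.WriteLine(LoadAddInAssemblies(addIns).Count); // prints 0
```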

Obviously we also need to patch up the assignment to the Calculator object’s Operations property to include the extension objects:

```Calculator calc = new Calculator();
calc.Operations = LoadBuiltIns().Concat(LoadExtensions()).ToList();

while (true)
{```

# Writing our first add-in

But does the magic above work? Let’s try. Add the following class definition to your Extensions project, which has been set up to copy the resulting assembly to the AddIns folder. That should be enough to pick it up, right?

```public class Pyth : ICalculatorOperation
{
public char Symbol
{
get { return '#'; }
}

public string Name
{
get { return "Pythagorean"; }
}

public int Calc(int a, int b)
{
return (int)Math.Sqrt(a * a + b * b);
}
}```

Obviously you’ll have to import the Interfaces namespace where the interface lives, as made available through the project reference to the Interfaces project. Build and run, and lo and behold:

If it didn’t work, check a few things:

• Did you include the LoadExtensions result in the assignment to Operations?
• Was the rebuilt Extensions.dll copied correctly to the AddIns folder?
• Is your Pythagorean operation class declared as public?

# Reflecting on our solution

How do we like our current solution? Positive thing: it works. Negative thing: the implementation sucks. Why?

• Discovering extensions was a pain.
• Instantiating extensions was tedious.
• Wiring up the list of extensions to the Calculator is tedious too.

So, if we boil down things to the bare essence we end up with a few things we want to make easier to do:

• Contracts – The interface between various parties is something we can’t eliminate. Notice how general this concept is: things like WCF are contract-based too (address, binding, contract).
• Catalogs – What about making discoverability generic by means of a catalog concept? Such a catalog could be based on a directory search, amongst other sources.
• Composition – Why can’t wiring up parts of the application be as easy as throwing everything together and letting the runtime figure out what fits where (e.g. operations should be added to the Operations collection)?

That’s the essence of MEF: its three C’s:

# MEF = playing with blocks

To make those concepts a bit easier to grasp, imagine yourself playing with Lego blocks again (or admit you still spend all your free time doing so):

Blocks fit onto one another in a very particular way: there’s a contract between different blocks that determines whether they fit. Next, there’s a catalog where we can find all the blocks we can play with. It’s about discovering what parts exist. Finally, in an ideal world, we’d be able to buy a magic box, throw the parts in there, and out comes a giant structure where everything is fit together automagically:

That magic box is MEF. It’s a composition engine. No wonder the MEF logo looks as follows:

# MEF essentials

MEF is about give and take. The equation is plain and simple:

Import is what our Calculator will do: it needs operations to play with. The operations on the other hand export themselves by saying: I have an operation available for you. MEF wires up the parts in order to reach a composed application. In a practical scenario the picture looks as follows:

The import/export relationship is expressed in a purely declarative way using custom attributes: ImportAttribute (with an ImportManyAttribute variant) and ExportAttribute.

# Putting MEF into action

Time to simplify our application using MEF. First of all, let’s get rid of all the code in LoadBuiltIns and LoadExtensions. More specifically, we want to get rid of this line of code:

`calc.Operations = LoadBuiltIns().Concat(LoadExtensions()).ToList();`

This is where we’ll put some MEF code in order to request composition to happen. But first, we’re going to use the MEF attributes to denote what’s importing functionality and what’s exporting functionality. Let’s start with export. In order to make this happen, we need to reference System.ComponentModel.Composition, the assembly where MEF lives. We do so in both the host application (FunCalc) and extensions (Extensions) projects:

Next, go to all of your operation types (Add, Sub, Mul, Div and Pyth) to put the Export attribute on their definitions as follows:

Notice you’ll have to import the System.ComponentModel.Composition namespace. The core essence of the ExportAttribute is to explicitly state to MEF that the type exports a certain contract (which can be based on an interface type):

```[Export(typeof(ICalculatorOperation))]
public class Pyth : ICalculatorOperation```

We do this for all of our ICalculatorOperation types, including the built-in ones. This allows us to have MEF take care of the composition for those built-in ones as well, taking away the burden of doing the wiring up ourselves. On the other side of the equation we find the Import attribute, where the exported parts will be accepted for composition:

```class Calculator
{
[ImportMany(typeof(ICalculatorOperation))]
public List<ICalculatorOperation> Operations { get; set; }
}```

We’re actually using ImportMany instead of Import in order to tell MEF that we’re doing a Many-to-One mapping here: our Operations property will contain a List containing all of the exported ICalculatorOperation parts it can find. This will ultimately include built-ins as well as extensions.

Now that we have attributed all of our implementers and the consumer, MEF is capable of wiring up everything for us as long as we instruct it to do so. Before getting into the domain of discovering parts, let’s do a manual composition first. The code to do this looks like this:

```Calculator calc = new Calculator();

var container = new CompositionContainer();
container.ComposeParts(
calc,
new Add(), new Sub(), new Mul(), new Div()
);```

Make sure to import the System.ComponentModel.Composition and System.ComponentModel.Composition.Hosting namespaces in Program.cs for the code above to compile:

```using System.ComponentModel.Composition;
using System.ComponentModel.Composition.Hosting;```

When we run the application, sure enough our calculator again exposes the included operations through its UI. MEF went ahead and wired up the objects for us, something you could see in action when setting a breakpoint on the Operations property setter on Calculator:

Notice I’ve also changed the property to have a private setter. This looks far cleaner from an encapsulation point of view and MEF is perfectly capable of initializing through private setters (and even directly on fields) because of its reflection-based nature (and its intentional support for this type of scenario).

Though we haven’t seen how to discover parts yet, the power of MEF’s composition engine has become apparent in the sample above. Time to move on to the discoverability aspect next: catalogs.

# Catalogs in MEF

The concept of a catalog is what drives discoverability of parts in MEF. Various types of catalogs are built into MEF, but you can easily create your own by deriving from the ComposablePartCatalog class:

To get started with catalogs, we’ll use the TypeCatalog to abstract away from our manual hook up of the built-in operation instances:

```using (var catalog = new TypeCatalog(typeof(Add), typeof(Sub), typeof(Mul), typeof(Div)))
using (var container = new CompositionContainer(catalog))
{
container.ComposeParts(calc);
}```

How’s that any better, I hear the reader wonder. If we still have to specify all the types, what are we saving here? In fact, using the TypeCatalog we’re leaving the burden of instantiating part objects to the MEF composition engine. More advanced features allow controlling the constructor invocation as well, so in essence we’re yielding more control to MEF here. But we can do better: what if we add a built-in operation to the calculator? Then we’d need to update the code above to manually include the implementing type in the catalog. Can’t we discover all types in the current assembly? It turns out we can, using the AssemblyCatalog:

```using (var catalog = new AssemblyCatalog(typeof(Program).Assembly))
using (var container = new CompositionContainer(catalog))
{
container.ComposeParts(calc);
}```

Here we’re instructing MEF to search the whole current assembly, i.e. the one where the calculator’s Program type is defined, for attributed types. For the types it finds, it will take over the plumbing associated with instantiating the objects, and obviously the rest of the composition as before.

Finally, we want to add the extensions to the mix. To do so, we need two more things. First of all, we need a DirectoryCatalog for MEF to search an entire directory for assemblies containing types that can be used for composition. This is the superior equivalent of the LoadExtensions iterator we played with before. Since we’re now faced with two catalogs (one AssemblyCatalog for the built-ins and one DirectoryCatalog for the extensions), we need a way to compose those catalogs. That’s what the AggregateCatalog is for:

```using (var builtIns = new AssemblyCatalog(typeof(Program).Assembly))
using (var extensions = new DirectoryCatalog("AddIns"))
using (var catalog = new AggregateCatalog(builtIns, extensions))
using (var container = new CompositionContainer(catalog))
{
container.ComposeParts(calc);
}```

And, as you can probably imagine, taking this one step further and mixing in other directories or other types of catalogs is trivial to do using the AggregateCatalog: it simply acts as the union of catalogs.

# The result

We replaced a significant amount of plumbing code with five lines of MEF code and a few attributes here and there. For people extending our application, things are as easy as implementing the contract interface and sticking an attribute on the resulting type to opt in explicitly to MEF composition. For us, the application’s creators, creating an extensible application got significantly easier. Obviously, the resulting application still runs flawlessly:

In its true essence, MEF is a way to play with blocks or to brew soup (in the latter interpretation, one could argue it’s the Maizena plus in the picture below). We haven’t explored the concept of a CompositionBatch shown in the figure below, so it acts as a little stimulus for further exploration on your own :-).

Enjoy this nice new library in .NET 4! In the meantime, you can play around with MEF from http://mef.codeplex.com.
