April 2008 - Posts

Introduction

Recently I delivered a session on Custom LINQ Providers - LINQ to Anything at the TechDays conferences in Ghent and Lisboa. The core of the session focuses on expression trees and on translating those at runtime into a query language like CAML or LDAP (the running samples). In this approach, IQueryable is just a minor implementation detail - or rather a vehicle to bring you closer to the query pattern established by the query operator methods. Last week, one of the attendees, who got inspired by the session and started to write a query provider, asked me for some input on the best approach, referring to the little bullet on my slide below:

image

 

IQueryable<T> or not?

So, do you really need IQueryable<T>? To say the least, IQueryable<T> is an interesting interface, and so is IQueryProvider. Why? Well, first of all, the IQueryable<T> interface itself doesn't declare any of the query operators; instead it's extended with those operators (and their implementation, wiring up expression trees) by the Queryable class's extension methods. You can somewhat think of it as an abstract base class; suffice it to say it strikes the right balance in the tension between interface providers and implementers. In addition to this, the infrastructure provided by IQueryable is basically just a means to glue together pieces of an expression tree, or to trigger execution of such an expression tree through IQueryProvider, e.g. in response to calling the First method.
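To make that concrete, here's roughly what the Queryable.Where extension method does (a simplified sketch, not the exact BCL source): it doesn't filter anything itself; it merely grows the expression tree and hands it back to the provider.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

static class MyQueryable
{
    // Simplified sketch of Queryable.Where: no filtering happens here.
    // The call is recorded into the expression tree and handed back to
    // the provider, which decides at execution time what to do with it.
    public static IQueryable<T> Where<T>(this IQueryable<T> source, Expression<Func<T, bool>> predicate)
    {
        return source.Provider.CreateQuery<T>(
            Expression.Call(
                typeof(Queryable), "Where",
                new Type[] { typeof(T) },
                source.Expression,
                Expression.Quote(predicate)));
    }
}
```

Notice the method's only job is composition; IQueryProvider.CreateQuery does the bookkeeping and nothing executes until someone enumerates the result.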

Although this approach provides much flexibility, the little caveat of IQueryable lies in the fact that most providers don't need all of those methods, so most of the time you have to disappoint the user by stating that something is not supported. When? At runtime, through NotSupportedException. And in lots of cases, there's no way to avoid this: think of a query predicate passed in to the Where method underneath the covers. If it contains a call to a certain method (e.g. String.EndsWith) which can't be translated to the target language (e.g. CAML), you're out of luck and you won't find out till the code runs (the compiler doesn't call into a query provider, asking whether or not the generated expression tree will be supported at runtime).
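To illustrate where that NotSupportedException surfaces, here's a hypothetical fragment of such a predicate translator (the method set and CAML output below are purely illustrative, not taken from a real provider):

```csharp
using System;
using System.Linq.Expressions;

class CamlPredicateTranslator
{
    // Hypothetical fragment of a CAML translator: calls we know how to
    // map become CAML elements; anything else can only be reported at
    // runtime, through NotSupportedException.
    public string TranslateCall(MethodCallExpression call)
    {
        if (call.Method.DeclaringType == typeof(string))
        {
            switch (call.Method.Name)
            {
                case "StartsWith":
                    return "<BeginsWith>...</BeginsWith>";
                case "Contains":
                    return "<Contains>...</Contains>";
            }
        }

        // E.g. String.EndsWith: the compiler happily built the
        // expression tree, but there's no CAML counterpart, so we
        // bail out here - at runtime.
        throw new NotSupportedException(
            "Can't translate method " + call.Method.Name + " to CAML.");
    }
}
```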

However, at the higher level of the query operators you can get some compile-time help by only partially implementing the query pattern. How does that work? Well, I did have the following slide, outlining that query expressions are just another pattern provided by the language (just like foreach, using, lock, etc.):

image

(The answer to the question above is click here.)

Right, the sample above is complete nonsense, but the compiler will be more than happy to translate it:

var res = new Homer().Where(x => x.HasSkateboard).Select(x => x.Sister).Father;

Take a look at the Where call, which resolves to the instance method Where on Homer, taking in a Func<Bart, bool>. In other words, the x in x => x.HasSkateboard is of type Bart, and HasSkateboard is a (not so far-fetched) property on Bart, returning a bool. Resolving the Where call thus succeeds, returning an instance of Marge. The Select call can be inferred similarly, which is left to the reader.
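For reference, here's a minimal set of type definitions that makes the query compile; the members not discussed above (Lisa, Maggie, Father) are my guesses at what was on the slide:

```csharp
using System;

class Bart
{
    public bool HasSkateboard { get { return true; } }
    public Lisa Sister { get { return new Lisa(); } }
}

class Lisa { }

class Homer
{
    // Resolves the Where call: x is a Bart, the result is a Marge.
    public Marge Where(Func<Bart, bool> predicate) { return new Marge(); }
}

class Marge
{
    // Resolves the Select call: x => x.Sister is a Func<Bart, Lisa>.
    public Maggie Select(Func<Bart, Lisa> projection) { return new Maggie(); }
}

class Maggie
{
    public Homer Father { get { return new Homer(); } }
}
```

With these in place, var res = new Homer().Where(x => x.HasSkateboard).Select(x => x.Sister).Father; compiles, and res is statically typed as Homer.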

 

Cooking your own query pattern

So, this is how you can create your own query pattern implementation by means of instance methods. A sample is shown below:

class Table<T> : IEnumerable<T>
{
   public Query<T> Where(Expression<Func<T, bool>> predicate) { ... }
   public ClosedQuery<R> Select<R>(Expression<Func<T, R>> projection) { ... }
   ...
}

class Query<T> : IEnumerable<T>
{
   public ClosedQuery<R> Select<R>(Expression<Func<T, R>> projection) { ... }
   ...
}

class ClosedQuery<T> : IEnumerable<T>
{
   ...
}

Basically, you provide an entry point like Table<T> together with a fluent interface pattern that produces Query<T> objects in a chained manner. How you expose the query operators on the source object and the query class purely depends on the composability of the query you want to provide; e.g. do you want a maximum of one Where call, or do you allow more flexibility? Don't forget a Select method though, since C# 3.0 requires the user to specify it; VB 9.0 doesn't, but still generates the call, which decouples the data source object from a resulting query object, no matter how trivial it is.

Notice in the sample above the use of ClosedQuery, which is a class that will keep all the expression tree information (which could be as easy as a few fields set by an internal constructor, containing the predicate and projection and whatever else you have captured using query operator methods) as well as a reference to the source the query is operating on. It doesn't contain further query operators though, to avoid multiple calls to e.g. Select (which IQueryable<T> allows and which can be a bit challenging to implement - why?).

The core thing in the code above, however, is the type of the parameters to the query operator methods: Expression<T>, since you want an expression tree for runtime parsing and translation. And for the parser itself, obviously you want to short-circuit things in the degenerate cases such as Table<T> and Query<T> to fall back on a common implementation - people can still write new Table<string>().Where(x => x.Length == 5) and iterate over it, which should just work.
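A sketch of that degenerate fallback for Query<T>, assuming an internal constructor captures the source and the predicate (Select handling omitted for brevity): with no target query language in play, we just compile the predicate back into a delegate and iterate.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq.Expressions;

class Query<T> : IEnumerable<T>
{
    private readonly IEnumerable<T> _source;
    private readonly Expression<Func<T, bool>> _predicate;

    // Assumed plumbing: Table<T>.Where hands us the source object and
    // the captured predicate expression tree.
    internal Query(IEnumerable<T> source, Expression<Func<T, bool>> predicate)
    {
        _source = source;
        _predicate = predicate;
    }

    public IEnumerator<T> GetEnumerator()
    {
        // Degenerate case: no translation target, so turn the expression
        // tree back into a delegate and fall back on LINQ to Objects-style
        // iteration.
        Func<T, bool> filter = _predicate.Compile();
        foreach (T item in _source)
            if (filter(item))
                yield return item;
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
```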

 

Compiler errors

Assuming you have the code above, which you can implement trivially by throwing NotImplementedException or returning null, let's try to write the following query:

var res = from x in new Table<string>()
          where x.Length > 5
          select x;

This should compile just fine. But if we write:

var res = from x in new Table<string>()
          where x.Length > 5
          orderby x.Length descending
          select x;

we'll see this compile-time error:

image

which clearly points at the problem by using the words "query pattern".

 

Conclusion

When do you switch to this approach instead of using IQueryable<T>? As usual, the answer is: it depends. I'd say, if you just implement Where and Select and maybe some others like Take and First, it might not be worth going all the way with IQueryable<T>. When ordering comes into play, things get a little more tricky with OrderBy and ThenBy calls (with their descending variants). Also, if you want to support multiple calls to query operators, IQueryable<T> might be beneficial; think of cases where users write "client-side views":

var expensive = from x in new Table<Product>()
                where x.UnitPrice > 123
                select x;

var res = from x in expensive
          where x.Name.StartsWith("C")
          select x;

In fact, for LINQ to AD, I still have the pending work item on my list to switch over to a custom implementation of the query pattern rather than using IQueryable<T>, but since it started as a blog series entitled "The IQueryable Tales", I have a good historical excuse for its current structure. LINQ to SharePoint on the other hand supports quite a few query operators (Where, Select, OrderBy*, ThenBy*, GroupBy, Take, First, TakeWhile), which makes it a better candidate for an IQueryable<T>-based implementation.

Happy querying!


It's been a while since I continued my series on a functional pattern matcher in C#. I finally found some time to extract the simplified pattern matching code from the bigger project I'm working on and cook up a downloadable documented sample. Without further delay: here it is.

image

It contains most of the matching techniques outlined in this blog series:

  • NewExpression (with metadata on constructor arguments)
  • MemberInitExpression
  • ConstantExpression (the degenerated case of a classic switch)
  • ParameterExpression (the degenerated case of a type switch)
  • ListInitExpression

Obviously there are a bunch of restrictions, namely:

  • No support for nested constructs, e.g. a match on a list of persons with parameters in the Person object expressions
  • No caching of results (cf. the Fibonacci sample)
  • Optimizations

The code comes with a set of samples showing the supported matches in practice. There's more fun stuff to come but timing is still undecided, so watch out :-). Enjoy!


I love my blog readers. Of course purely in a professional sense, but still. Before I continue, let me point out to my much beloved audience on the other side of the RSS channel that this post isn't particularly interesting. So, if you're looking for hardcore technical stuff, I'd strongly recommend ignoring my post for once.

So what brings me to write down this love declaration all of a sudden? A flashback: in late 2003 I started to blog on blogs.bartdesmet.net. I already had my domain name but no particularly interesting homepage (and the color scheme was a highly controversial topic of discussion amongst some of my friends). In fact, I've never been a big fan of homepages with photos of summer barbecues, tales about pets and other personal trivia. So I wanted to take a different approach and started to maintain a technical blog. Having been involved in some local projects that required a web server I was so kind to maintain (first Windows 2000 Server, later Windows Server 2003), it was a luxury for me to take some of the machine's web space and point my domain to it. At that time, we weren't really bandwidth constrained, being housed on a fiber network in a semi-professional housing environment.

But luxury is a volatile phenomenon. Time passed by, the project moved to other housing environments and I was no longer involved in the web server's maintenance (which was honestly a bit of a relief too). However, my blog was in the position of becoming homeless which wasn't a pretty thought given the number of links to it and of course all the time I invested to write up my sometimes interesting posts. So I was looking out for some affordable hosting company - which will remain unnamed - providing the latest and greatest platform (Windows Server 2003 and SQL Server 2005 at the time) with enough space to hold my contents. At the time I had about 250 MB of content + a 100 MB SQL Server database, so 1 GB of web space and 250 MB of SQL seemed to be more than enough. Today the size occupied has almost doubled as illustrated below:

Web space SQL Server space
image image

The thing I paid the least attention to though was the bandwidth provided by the hosting company, at that time I believe 60 GB per month. Who needs 60 GB right? I was a little surprised initially to find out there was - if I remember correctly - about 30 GB being transferred the first month at my new hoster. History only goes back to November 07, where it climbed to approximately 40 GB a month:

image

I was still fine and the bandwidth limits were even increased to a stunning 80 GB. However, ever since I started using Windows Live Writer I've been adding more and more screenshots to my posts (it's so easy to upload, dude), most of these at a large size to avoid the annoying "click to enlarge" catch-phrase every two lines (my policy is to fall back to the latter thumbnail approach whenever an image isn't part of the regular flow of the post but acts purely as an illustration which is "nice to see" but not strictly needed). So, after a few popular blogging series lately (C# 3.0, VB 9.0 and PowerShell 2.0 feature focuses and my recent ramblings on functional pattern matching in C# - to be continued) I was still a bit surprised to see my stats for this month so far:

image

This is why I love my readers: they make electrons travel the globe to deliver my writings to their brains. I never thought I'd become bandwidth constrained, but 63 GB in little over half a month is a bit too much given the limit of 80 GB a month. So this called for immediate action, and luckily I have some additional space at bartdesmet.info, which I bought recently to play around with IIS 7.0 (I said my hoster is on the cutting bleeding edge of technologies!) and to prepare moving my blog over to IIS 7.0 eventually.

The plan is simple: move over images to the second host, leave app stuff on the first host. One of the first things that brought up nice experiences was the re-encounter with the amazingly fast (not!) FTP protocol when dealing with small files: copying about 100 MB of images from one place to the other took several hours. One thing remaining is to update all pointers to images for which I considered different approaches:

  • Handle all .jpg and .png files using an HTTP Handler and redirect to the new location - would work in IIS 7.0 but my old host is still on IIS 6.0 so the metabase would require a change which involves the ISP's goodwill.
  • Tweak the SQL Server database to "replace" old links with new ones (although the REPLACE T-SQL function wouldn't work here - storage for post bodies in CommunityServer seems to be ntext fields).
  • Write a Community Server module.

I went for the last approach, at least temporarily, since it doesn't involve touching the database and should take effect immediately. It goes roughly like this:

using System;
using System.Text;
using System.Xml;
using CommunityServer.Blogs.Components;
using CommunityServer.Components;

namespace BartDeSmet.Net
{
    public class CSRewriteImageLinks : ICSModule
    {
        public void Init(CSApplication csa, XmlNode node)
        {
            csa.PreRenderPost += new CSPostEventHandler(csa_PreRenderPost);
        }

        void csa_PreRenderPost(IContent content, CSPostEventArgs e)
        {
            if (e.ApplicationType == ApplicationType.Weblog && e.Target == PostTarget.Web)
            {
                WeblogPost post = content as WeblogPost;
                if (post != null && post.PostLevel == 1)
                {
                    StringBuilder sb = new StringBuilder();
                    sb.Append(post.FormattedBody);
                    // Rewrite image references from the old host to the new
                    // one (old-host URL shown here for illustration).
                    sb.Replace("http://www.bartdesmet.net/images", "http://bartdesmet.info/images");
                    sb.Replace("http://www.bartdesmet.net/images_wlw", "http://bartdesmet.info/images_wlw");
                    post.FormattedBody = sb.ToString();
                }
            }
        }
    }
}

Quite brute force - though still a bit gentle using the StringBuilder (and web infrastructure caching I'd assume) - but functional. Fingers crossed... Oh and obviously I've changed my Live Writer settings to upload files to my second host (click to enlarge <g>):

image image

Alright, with this post I violated my own policy not to nag about personal trivia, but one sin should be acceptable, no? At the very least I did publish some lines of C# code...


After my previous little European tour visiting Ghent and Lisboa talking about LINQ, Parallel Extensions, Windows PowerShell 2.0 and WPF, I'm looking forward to meeting the European audience again at DevDays 2008 Amsterdam. I'm especially thrilled about this one since it's my first speaking opportunity at this conference, though I've been living within a radius of a few hundred kilometers for the first 24.5 years of my life. The speaker list looks impressive; just a few big names: Rafal Lukawiecki, Daniel Moth and Mike Taulty, Ingo Rammer, Dan Fernandez, David Platt, the U2U triplet Peter, Patrick and Jan. This time I'll be delivering purely on LINQ, especially (agenda subject to change):

  • LINQ Inside Out - Thursday 10:50-12:00 - Want to know what really happens when you execute LINQ queries? Join us as we enter the inner workings of LINQ and see how the compiler translates LINQ query expressions into standard query operators, while digging into iterators that make LINQ to Objects tick. Learn exactly when query evaluation is deferred and how lambda expressions and closures work together to enable LINQ’s elegant syntax. After investigating LINQ to Objects, we’ll switch gears to dive into LINQ to SQL, which results in radically different translations as we dig into the details of IQueryable and expression trees. Finally, we explore language-specific LINQ features, including XML literals in VB 9.0.
  • Creating Custom LINQ Providers - Friday 13:30-14:40 - LINQ is all about unifying data access in a natural language integrated way. But there’s more than just LINQ to Objects, LINQ to SQL and LINQ to XML. In this session, we put ourselves on the other side of the curtain and explore the wonderful world of LINQ providers. You’ll learn how to create a fully functional LINQ query provider, allowing users to target your favorite query language using familiar LINQ syntax in C# 3.0 and VB 9.0: LINQ to AD, LINQ to SharePoint, LINQ to Outlook, you name it! This is your chance to get to know the inner workings of LINQ.

Basically the first session acts as the entry-point to LINQ at the conference. We'll be approaching the technology from top to bottom, starting with the LINQ syntax in C# 3.0 and VB 9.0, moving on to the translation patterns carried out by the compiler, to end up in the inner workings of the beast: iterators in LINQ to Objects and expression trees for implementations like LINQ to SQL. We'll go fairly deep, so ideally attendees have had some prior exposure to the concepts of LINQ.

My second session acts as the ultimate deep dive into LINQ providers. More specifically, you'll see how expression trees are turned into execution through runtime translation into an underlying query language. As running samples, we'll use LINQ to AD and LINQ to SharePoint, covering the art of developing query providers. This includes not only coverage of expression tree parsing but also practical guidance on translation patterns, tools integration, etc. Even if you're not planning to write a custom LINQ provider yourself, this session is the ideal "under the covers" session for LINQ.

 

from session in DevDays.Sessions
where session.Subject == "LINQ"
select new { session.Title, session.Speaker }

  • LINQ Inside Out - Bart De Smet
  • Understanding the ADO.NET Entity Framework - Mike Taulty
  • A Tour of LINQ to XML - Mike Taulty
  • Is LINQ to SQL your data access layer? - Anko Duizer
  • Creating Custom LINQ Providers - Bart De Smet

For more information on the agenda, take a look at the session list over here. There's tons of great content on various topics: hardcore debugging, Silverlight, Visual Studio 2008, MOSS and WSS, WPF/WCF/WF, smart client and mobile development, DLR stuff, data mining and cryptography, parallel extensions, VSTS, SQL 2008, etc. Too much to pick from :-).

 

Hope to see you in Amsterdam!


In the last handful of posts in this series we've been looking at ways to match a whole set of patterns.

There's not that much left to apply (meaningful) matches to (feel free to think of others, of course), so from this post on we'll change the theme and start to focus a little more on the translation of pattern matches into various executable forms. So far we've been considering interpretation and a lightweight form of compilation by invoking the lambda expression compiler of .NET 3.5. We already saw in part 3 that the overall performance of a compiled pattern match is very reasonable: we're faced with more work at runtime, but once parsed and compiled, execution goes amazingly fast. So, we won't focus on performance tuning again for now, but rather analyze the translation mechanism employed.

 

Where we're at right now

Our current form of compilation basically consists of this:

  • The Pattern<T> object aggregates pairs of match clause expression trees and match body delegates into MatchEntry objects.
  • Compilation is carried out on individual MatchEntry objects.
  • Each MatchEntry, when compiled, is turned into an efficient matcher and invoker:

    internal class MatchEntry
    {
        public LambdaExpression Match { get; set; }
        public Delegate Action { get; set; }

        public Func<object, bool> CompiledMatch { get; set; }
        public Func<object, T> CompiledInvoker { get; set; }

        internal void Compile()
        {
            ...
        }
    }

Although this works fine, we still have some plumbing around that looks a bit ugly. In order to match an object, we cycle through the list of MatchEntry objects and evaluate the compiled matcher. If it evaluates to true, we hand off the object to the compiled invoker and return the resulting object of type T.

 

On to a more functional matcher

The interesting thing in the previous observation is a single letter: 'T'. Indeed, the pattern matcher returns an object of type T representing the result. In other words, it's a (fairly) complex function, or stated otherwise a valued expression. From this point on, it really matters what camp you're in:

  • Pattern matches as switch statements - or -
  • Pattern matches as switch expressions

The latter obviously doesn't exist in C# (and many other languages of course) since switches are statements. In other words, they are used for their interesting side-effects (which might or might not set one piece of state, yielding some kind of hidden function). Switching to an expression type of switch would conceptually look like this:

int? age = switch (name)
{
    case "Bart":
        25;
        break;
    case "John":
        52;
        break;
    default:
        null;
        break;
}

It looks weird (hence the qualifier "conceptually") but here's another hypothetical form that looks more functional:

int? age = switch (name)
{
    case "Bart" => 25;
    case "John" => 52;
    default => null;
}

A similar approach could be taken for an if statement turning it into an expression:

int? age = if (name == "Bart")
    25;
else if (name == "John")
    52;
else
    null;

but if you look carefully we already have this kind of thing with the ternary operator:

int? age = (name == "Bart" ? 25 : (name == "John" ? 52 : null));

which can be nested arbitrarily to form a nice expression. Taking it back to the switch, you could see it as:

int? age =
(
    name == "Bart" ? 25 : (
    name == "John" ? 52 :
    null)
)

By the way, to make this really work in C#, you'll have to give the type inference machinery a hand like this (cf. CS0173):

int? age =
(
    name == "Bart" ? 25 :
    name == "John" ? 52 :
    (int?) null
)

(nullables strike again). More important though is the fact that we can omit the parentheses in the above because of the ternary operator's right associativity. Now, let's generalize a bit:

T result =
(
    condition-expr-1 ? result-expr-1 :
    condition-expr-2 ? result-expr-2 :
    ...
    condition-expr-n ? result-expr-n :
    else-expr
)

Wait a minute, don't these conditions represent match clauses? Precisely. The only thing we need to change is that the match body needs to become an expression rather than a statement. So ultimately our match entry should comprise the following two properties:

    public LambdaExpression Clause { get; set; }
    public LambdaExpression Body { get; set; }

The remaining question is how we should chain those pieces together.

 

ConditionalExpression

We've already picked our target (ternary operators), but we need to know what it looks like under the covers. The simplest way to find out about it is the following C# one-liner:

Expression<Func<bool,int>> cond = b => b ? 1 : 0;

By now you should already be prepared to see (and filter out mentally) a ParameterExpression, two ConstantExpressions and one LambdaExpression for the whole thing. The body of the lambda is what interests us and geeks as we are, let's rely on ILDASM once more:

IL_0000:  nop
IL_0001:  ldtoken    [mscorlib]System.Boolean
IL_0006:  call       class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromHandle(valuetype [mscorlib]System.RuntimeTypeHandle)
IL_000b:  ldstr      "b"
IL_0010:  call       class [System.Core]System.Linq.Expressions.ParameterExpression [System.Core]System.Linq.Expressions.Expression::Parameter(class [mscorlib]System.Type,
                                                                                                                                               string)
IL_0015:  stloc.1
IL_0016:  ldloc.1
IL_0017:  ldc.i4.1
IL_0018:  box        [mscorlib]System.Int32
IL_001d:  ldtoken    [mscorlib]System.Int32
IL_0022:  call       class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromHandle(valuetype [mscorlib]System.RuntimeTypeHandle)
IL_0027:  call       class [System.Core]System.Linq.Expressions.ConstantExpression [System.Core]System.Linq.Expressions.Expression::Constant(object,
                                                                                                                                             class [mscorlib]System.Type)
IL_002c:  ldc.i4.0
IL_002d:  box        [mscorlib]System.Int32
IL_0032:  ldtoken    [mscorlib]System.Int32
IL_0037:  call       class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromHandle(valuetype [mscorlib]System.RuntimeTypeHandle)
IL_003c:  call       class [System.Core]System.Linq.Expressions.ConstantExpression [System.Core]System.Linq.Expressions.Expression::Constant(object,
                                                                                                                                             class [mscorlib]System.Type)
IL_0041:  call       class [System.Core]System.Linq.Expressions.ConditionalExpression [System.Core]System.Linq.Expressions.Expression::Condition(class [System.Core]System.Linq.Expressions.Expression,
                                                                                                                                                 class [System.Core]System.Linq.Expressions.Expression,
                                                                                                                                                 class [System.Core]System.Linq.Expressions.Expression)

IL_0046:  ldc.i4.1
IL_0047:  newarr     [System.Core]System.Linq.Expressions.ParameterExpression
IL_004c:  stloc.2
IL_004d:  ldloc.2
IL_004e:  ldc.i4.0
IL_004f:  ldloc.1
IL_0050:  stelem.ref
IL_0051:  ldloc.2
IL_0052:  call       class [System.Core]System.Linq.Expressions.Expression`1<!!0> [System.Core]System.Linq.Expressions.Expression::Lambda<class [System.Core]System.Func`2<bool,int32>>(class [System.Core]System.Linq.Expressions.Expression,
                                                                                                                                                                                        class [System.Core]System.Linq.Expressions.ParameterExpression[])
IL_0057:  stloc.0
IL_0058:  ret

The mechanism is surprisingly simple again thanks to the composable nature of the System.Linq.Expressions namespace types. Here's the C# equivalent of the above:

var b = Expression.Parameter(typeof(bool), "b");
var cond = Expression.Lambda(
    Expression.Conditional(
        b,
        Expression.Constant(1),
        Expression.Constant(0)
    ),
    b
);

The remaining thing for us to do is to map the entire matcher onto one big nested Expression.Conditional wrapped in a lambda. What does the lambda need to take in? Obviously the object being matched, so ultimately we'll be able to take the entire expression for the pattern matcher, compile it using the Compile method, and have a single delegate of type Func<object, T> that does the whole matching for us using efficient IL.
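Using the strongly typed Lambda factory, we can build, compile and invoke such a conditional in one go:

```csharp
using System;
using System.Linq.Expressions;

class Program
{
    static void Main()
    {
        ParameterExpression b = Expression.Parameter(typeof(bool), "b");

        // Same tree as before, but built through the generic Lambda
        // factory so that Compile yields a Func<bool, int> directly.
        Expression<Func<bool, int>> cond = Expression.Lambda<Func<bool, int>>(
            Expression.Condition(b, Expression.Constant(1), Expression.Constant(0)),
            b);

        Func<bool, int> f = cond.Compile();
        Console.WriteLine(f(true));   // 1
        Console.WriteLine(f(false));  // 0
    }
}
```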

 

Revamping Pattern<T>

Our previous incarnation of Pattern<T> was rather quick and dirty in the sense that it didn't really enforce the fluent pattern at compile time. What we really want is a chain of Match calls that renders a partial pattern - useless by itself - which gets completed by adding an Else clause. Once that's done, the pattern can be compiled (for which different strategies can be considered: either we do it straight away - as shown below - or we require the user to trigger the compilation explicitly, e.g. at the first application of the pattern match on an object or by a separate Compile method call). So we'll start by redefining our Pattern<T> class, implementing the fluent interface pattern:

class Pattern<T>
{
    private List<MatchEntry> _entries = new List<MatchEntry>();

    public Pattern<T> Match<TR>(Expression<Func<TR>> clause, Expression<Func<T>> body)
    {
        return MatchInternal(clause, body);
    }

    public Pattern<T> Match<T1, TR>(Expression<Func<T1, TR>> clause, Expression<Func<T1, T>> body)
    {
        return MatchInternal(clause, body);
    }

    public Pattern<T> Match<T1, T2, TR>(Expression<Func<T1, T2, TR>> clause, Expression<Func<T1, T2, T>> body)
    {
        return MatchInternal(clause, body);
    }

    public Pattern<T> Match<T1, T2, T3, TR>(Expression<Func<T1, T2, T3, TR>> clause, Expression<Func<T1, T2, T3, T>> body)
    {
        return MatchInternal(clause, body);
    }

    private Pattern<T> MatchInternal(LambdaExpression clause, LambdaExpression body)
    {
        _entries.Add(new MatchEntry() { Clause = clause, Body = body });
        return this;
    }

    public ClosedPattern<T> Else(Expression<Func<T>> @else)
    {
        return new ClosedPattern<T>(_entries, @else);
    }
}

The MatchEntry class now looks like this:

internal class MatchEntry
{
    public LambdaExpression Clause { get; set; }
    public LambdaExpression Body { get; set; }
}

Notice how the Else method returns a ClosedPattern which represents a pattern that has all the information required to start applying the pattern to objects:

class ClosedPattern<T>
{
    private List<MatchEntry> _entries;
    private LambdaExpression _else; 

    private Func<object, T> _pattern;

    internal ClosedPattern(List<MatchEntry> entries, LambdaExpression @else)
    {
        _entries = entries;
        _else = @else;

        Compile();
    }

    private void Compile()
    {
        //
        // Compilation goes here.
        //

    }

    public T Execute(object o)
    {
        return _pattern(o);
    }
}

The most notable difference with the previous implementation is the fact that we're now fully expression-based.

 

Compilation of ClosedPattern<T>

At the core of our new pattern matcher implementation is the compilation of the closed pattern. Basically we have information representing something like:

T result =
(
    condition-expr-1 ? result-expr-1 :
    condition-expr-2 ? result-expr-2 :
    ...
    condition-expr-n ? result-expr-n :
    else-expr
)

where all the condition and result variables contain a reference to the object passed in. The way we match and map results is irrelevant for now, let's just assume we have a Func<object, bool> for each match clause and a Func<object, T> for each match body, which all live in peace inside the list of MatchEntry objects. In addition we have the lambda expression for the else clause. The way to thread things up is by starting at the bottom (in pseudo-code):

Expression pattern = else;

=>

pattern <- Expression.Condition(condition-expr-n, result-expr-n, pattern)

=>

...

=>

pattern <- Expression.Condition(condition-expr-1, result-expr-1, (..., Expression.Condition(condition-expr-n, result-expr-n, pattern)...))

Assuming we have this rewritten MatchEntry class:

internal class MatchEntry
{
    public LambdaExpression Clause { get; set; }
    public LambdaExpression Body { get; set; }

    public Expression Match { get; private set; }
    public Expression Map { get; private set; }

    public void Compile(ParameterExpression input)
    {
        //
        // Compilation goes here.
        //

    }
}

we can now write our compilation for the pattern like this:

private void Compile()
{
    ParameterExpression input = Expression.Parameter(typeof(object), "o");

    Expression pattern = _else.Body;

    foreach (var entry in Enumerable.Reverse(_entries))
    {
        entry.Compile(input);

        pattern = Expression.Condition(
            entry.Match,
            entry.Map,
            pattern
        );
    }

    _pattern = Expression.Lambda<Func<object,T>>(pattern, input).Compile();
}

This precisely captures the algorithm outlined above, building up the nested conditional expressions bottom-up. Ultimately, one pattern gets translated into one single delegate.
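To see this bottom-up folding in isolation, here's a minimal standalone sketch (the ConditionChain class and its clauses are made up for illustration, not part of the pattern matcher) that wraps condition/result pairs around an else branch and compiles the whole thing into one delegate:

```csharp
using System;
using System.Linq.Expressions;

class ConditionChain
{
    public static Func<object, string> Build()
    {
        ParameterExpression o = Expression.Parameter(typeof(object), "o");

        // The else clause sits at the bottom of the conditional chain.
        Expression pattern = Expression.Constant("other");

        Expression[] tests   = { Expression.TypeIs(o, typeof(string)), Expression.TypeIs(o, typeof(int)) };
        Expression[] results = { Expression.Constant("a string"),      Expression.Constant("an int")     };

        // Wrap conditions around the else branch, last clause first (cf. Enumerable.Reverse).
        for (int i = tests.Length - 1; i >= 0; i--)
            pattern = Expression.Condition(tests[i], results[i], pattern);

        // One pattern match => one single delegate.
        return Expression.Lambda<Func<object, string>>(pattern, o).Compile();
    }

    static void Main()
    {
        Func<object, string> match = Build();
        Console.WriteLine(match(42));    // an int
        Console.WriteLine(match("x"));   // a string
        Console.WriteLine(match(3.14));  // other
    }
}
```

The loop runs over the clauses in reverse, just like the Enumerable.Reverse call in the Compile method above.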

 

Compilation of MatchEntry objects

The remaining pieces of the puzzle are the Match and Map properties on MatchEntry:

internal class MatchEntry
{
    public LambdaExpression Clause { get; set; }
    public LambdaExpression Body { get; set; }

    public Expression Match { get; private set; }
    public Expression Map { get; private set; }

    public void Compile(ParameterExpression input)
    {
        //
        // Compilation goes here.
        //

    }
}

The Compile method needs to take Clause and Body and translate those into Match and Map respectively. The types represented by the expression objects will make things clear: Match is an expression producing a Boolean determining whether or not the input object (represented by ParameterExpression input) matches the entry; Map on the other hand takes a (matched) object and turns it into the output type T.

What's even better is that we can reuse most of the original code of the unification engine. Below you can see the changes needed to our current Compile method in order to generate the new match expression (I've omitted a bunch of cases for specific expression types besides NewExpression):

internal void Compile(ParameterExpression input)
{
    //
    // Created target type.
    //
    Type target = Clause.Body.Type;

    //
    // List of bindings.
    //

    var bindings = new Dictionary<ParameterExpression, Expression>();

    NewExpression ne;
    ...

    Expression match;
    if ((ne = Clause.Body as NewExpression) != null)
        match = CompileNewExpression(input, target, ne, bindings);
    ...
    else
        throw new NotSupportedException("Unsupported expression type in match clause.");

    //
    // Store the match predicate expression.
    //
    Match = match;

The second part takes care of the mapping used to execute the body. Again, we eliminate compilation on a per-MatchEntry basis and simply keep the expression tree around. It's just a little trickier because originally we had a delegate called Action representing the match body. Now, however, this too is represented as an expression tree: the lambda expression that takes in the extracted values and produces the result of type T. Since the Body is simply a LambdaExpression now, we can feed it into Expression.Invoke straight away:

    //
    // Conversion from bound object to the target type.
    //

    Expression me = Expression.Convert(input, target);

    //
    // Create list of unbound parameters.
    //

    Expression[] args = new Expression[Clause.Parameters.Count];
    int j = 0;
    foreach (ParameterExpression param in Clause.Parameters)
    {
        //
        // Parameters need to be bound in the match expression.
        //

        Expression map;
        if (!bindings.TryGetValue(param, out map))
            throw new InvalidOperationException("Parameter " + param.Name + " was not bound in the pattern match.");

        //
        // Null for identity checks; grab from identified property otherwise.
        //

        args[j++] = map ?? me;
    }

    //
    // Store the map expression.
    //

    Map = Expression.Invoke(Body, args);
}

Essentially we've been collecting trees in the MatchEntry Compile invocations and lining them up inside the ClosedPattern's Compile method, creating one big pattern matching expression.
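Since Expression.Invoke does the heavy lifting in the map part, it deserves a quick isolated illustration. A minimal sketch (the InvokeSketch class and constant arguments are made up for illustration) showing a LambdaExpression applied to argument expressions:

```csharp
using System;
using System.Linq.Expressions;

class InvokeSketch
{
    // The match body kept as an expression tree (a LambdaExpression) instead of
    // a compiled delegate; Expression.Invoke applies it to argument expressions.
    public static int Run()
    {
        Expression<Func<int, int, int>> body = (x, y) => x + y;

        // The args array would normally come from the bindings; constants here.
        InvocationExpression call = Expression.Invoke(
            body,
            Expression.Constant(3),
            Expression.Constant(4));

        return Expression.Lambda<Func<int>>(call).Compile()(); // 7
    }

    static void Main()
    {
        Console.WriteLine(Run()); // 7
    }
}
```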

 

Testing it

Let's take this sample again:

var bisect = new Pattern<bool>()
    .Match((int x) => new Point() { X = x, Y = x }, x => true)
    .Else(() => false);

Random rand = new Random();
for (int i = 0; i < 100; i++)
{
    Point p = new Point(rand.Next(1, 5), rand.Next(1, 5));
    if (bisect.Execute(p))
        Console.WriteLine(p);
}

Exercise: what's the type of bisect in the code above?

With the help of one breakpoint and the Immediate Window we can figure out what this translates into:

image

VB'ers will recognize IIF as the equivalent of the ternary operator, but in prefix form (actually, this is a lie in terms of VB - IIF is not truly a ternary operator, it's a function - VB 9.0 has a new If operator that's a real ternary operator). We just see one single conditional here, but you can see the whole thing working. First the condition:

((o Is Point) && Equals(Convert(Convert(o).X), Convert(Convert(o).Y)))

is

o is Point && ((Point)o).X == ((Point)o).Y

which represents the condition imposed by

(int x) => new Point() { X = x, Y = x }

The true part of the ConditionalExpression is:

Invoke(x => True, Convert(o).X)

which stands for

(x => true)(((Point)o).X)

where the first part represents a lambda and the second part calls it with argument "point's X value".

And finally the else got translated into False, which clearly stands for false.

Exercise: Map the following ToString output of the internal pattern expression tree representation onto the corresponding high-level pattern match (trivial one): "IIF((Convert(o) = 1), Invoke(() => False), IIF((Convert(o) = 2), Invoke(() => True), IIF((Convert(o) = 3), Invoke(() => False), IIF((Convert(o) = 4), Invoke(() => True), False))))"

You can imagine what mixed pattern matches will start to look like, can't you?

 

1 pattern match => 1 delegate


In our last encounter on the pattern matching mission we covered techniques to match T[] and List<T>. Today we cover another type that's being used a lot: dictionaries (the generic brother of Hashtable, which you could match too - exercise). I've already shown an example of such a match:

.Match((int bartAge, int johnAge) => new Dictionary<string, int>() {
    { "Bart", bartAge },
    { "John", johnAge }
}, (bartAge, johnAge) => { ... })

In our discussion we noticed it would be best not to constrain the number of entries in the dictionary based on the number of entries in the match clause. This is fairly similar to matching objects through object initializers, where we just require the specified properties (comparable to the keys) to have certain values (comparable to the dictionary's values). If the reader feels the matching should be more restrictive, she should feel free to implement it that way. In addition to this, we do mandate the specified keys to be present since otherwise we don't know what value to extract; additionally, the matches can only be put on the keys while the extractions take place on the values (keys are unique, values aren't necessarily). Putting everything together, this is the skeleton of the sample's translation:

IDictionary<string, int> dictionary = o as IDictionary<string, int>;
int bartAge, johnAge;
if (dictionary != null && dictionary.TryGetValue("Bart", out bartAge) && dictionary.TryGetValue("John", out johnAge))
{
   ...
}

I'm using IDictionary over here to boost flexibility (I could have done the same in the previous post with IList<T> instead of List<T>) but that's just a minor detail. However, the above can't be translated one-to-one in our pattern matcher: the out keyword is problematic in the context of expression trees. Instead we'll stick with the more redundant equivalent:

IDictionary<string, int> dictionary = o as IDictionary<string, int>;
if (dictionary != null && dictionary.ContainsKey("Bart") && dictionary.ContainsKey("John"))
{
   int bartAge = dictionary["Bart"];
   int johnAge = dictionary["John"];
   ...
}

From a language point of view, initializing a Dictionary using collection initializers is not at all different from doing the same on, say, a List or an array; the core difference is the fact that an Add method with two operands is called. In other words, we're facing another of those ListInitExpressions, so we can fork the code introduced last time:

private Expression CompileListInitExpression(ParameterExpression o, Type target, ListInitExpression lie, Dictionary<ParameterExpression, Expression> bindings)
{
    if (lie.NewExpression.Arguments.Count != 0)
        throw new NotSupportedException("Collection initializers in match expressions should use the default constructor.");

    Type t = lie.Type.GetGenericTypeDefinition();
    if (t == typeof(List<>))
        return CompileListInitExpressionForListOfT(o, target, lie, bindings);
    else if (t == typeof(Dictionary<,>))
        return CompileListInitExpressionForDictionaryKonV(o, target, lie, bindings);
    else
        throw new NotSupportedException("Only List<T> and Dictionary<K,V> are supported.");
}

Don't bother about my rather verbose method naming, the inside of the method is much more appealing. The pattern is basically the same as before:

  1. Emit a type check
  2. Unify and extract all list entries
  3. Coalesce bindings to the same parameters and aggregate those with the check

Let's go there step by step. First the type check (trivial):

private Expression CompileListInitExpressionForDictionaryKonV(ParameterExpression o, Type target, ListInitExpression lie, Dictionary<ParameterExpression, Expression> bindings)
{
    Expression check = Expression.TypeIs(
        o,
        target
    );

Followed by the iteration over all initializers, which is where things get interesting. The Add method in Dictionary<K,V> takes two parameters: one of type K and one of type V. The compiler will have done all the type checking already to make sure types are compatible and such, and let's assume this overload of Add is the only one in reach. Here's the first part of the loop:

    int i = 0;
    foreach (var e in lie.Initializers)
    {
        //
        // Note: We know we're processing Dictionary<K,V>, which only has one Add overload,
        //       so we omit additional checks for now.
        //
        Expression keyArg = e.Arguments[0];
        Expression valArg = e.Arguments[1];

        ConstantExpression key = keyArg as ConstantExpression;
        if (key == null)
            throw new NotSupportedException("Keys for dictionaries used in pattern matches should be constant expressions.");

        check = Expression.AndAlso(
            check,
            Expression.Call(
                Expression.Convert(
                    o,
                    target
                ),
                target.GetMethod("ContainsKey"),
                key
            )
        );

First we make sure the expression used as the first parameter, which represents the key, is a constant. This restriction simplifies matters and makes much sense too since we don't want to get in the nasty business of reverse mapping (which key belongs to a given value?). Once we've done this sanity check, we extend the check with a call to ContainsKey on the dictionary (watch out for the cast, which is safe since we've already done a TypeIs before). The call is simply taking in the key expression as we got it.
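For those who want to play with the ContainsKey emission in isolation, here's a minimal sketch (the ContainsKeySketch class and the hardcoded "Bart" key are made up for illustration) of the TypeIs-guarded call:

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

class ContainsKeySketch
{
    // Builds the TypeIs-guarded ContainsKey check on its own; the target type
    // and key are fixed here for illustration.
    public static Func<object, bool> Build()
    {
        Type target = typeof(Dictionary<string, int>);
        ParameterExpression o = Expression.Parameter(typeof(object), "o");

        // o is Dictionary<string,int> && ((Dictionary<string,int>)o).ContainsKey("Bart")
        Expression check = Expression.AndAlso(
            Expression.TypeIs(o, target),
            Expression.Call(
                Expression.Convert(o, target),   // safe: guarded by the TypeIs check
                target.GetMethod("ContainsKey"),
                Expression.Constant("Bart")));

        return Expression.Lambda<Func<object, bool>>(check, o).Compile();
    }

    static void Main()
    {
        Func<object, bool> hasBart = Build();
        Console.WriteLine(hasBart(new Dictionary<string, int> { { "Bart", 25 } })); // True
        Console.WriteLine(hasBart(new Dictionary<string, int>()));                  // False
        Console.WriteLine(hasBart("not a dictionary"));                             // False
    }
}
```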

After the key comes the value. Here we allow more flexibility in our usual style. In fact, this is one of those places where recursive unification would come in handy, allowing you to write matches like:

.Match((string name, int age) => new Dictionary<int, Person>() {
    { 12345, new Person() { Name = name, Age = age} }
}, (name, age) => { ... })

Conservative as we are, let's stick with ParameterExpression and ConstantExpression for the scope of this post:

        ParameterExpression pe;
        ConstantExpression ce;

        if ((pe = valArg as ParameterExpression) != null)
        {
            if (bindings.ContainsKey(pe))
            {
                check = Expression.AndAlso(
                    check,
                    GetEqualityCheckForDictionaryEntry(o, target, key, bindings[pe])
                );
            }
            else
                bindings[pe] = Expression.Call(
                    Expression.Convert(
                        o,
                        target
                    ),
                    target.GetMethod("get_Item"),
                    key
                );
        }
        else if ((ce = valArg as ConstantExpression) != null)
        {
            check = Expression.AndAlso(
                check,
                GetEqualityCheckForDictionaryEntry(o, target, key, ce)
            );
        }
        else
            throw new NotSupportedException("Can only match constants.");

        i++;
    }

    return check;
}

The core differences with the previous post are outlined in bold. It's becoming a pattern... and yes, in a recursive descent-ish parser approach this kind of match would just occur once while the leaf nodes would be supplied based on carried type data (e.g. the extraction expression for an array is different from that for a property or a dictionary entry). Again, we can use get_Item to call the dictionary object's indexer, passing in the key and giving us a binding. The remaining cases need some equality checks to match constants and parameters respectively. Matching constants in this context doesn't make very much sense but that's a matter of taste:

.Match((int johnAge) => new Dictionary<string, int>() {
    { "Bart", 25 },
    { "John", johnAge }
}, (johnAge) => { ... })

would try to find an entry "Bart" with age set to 25 in order to allow a match for "John" to be found. If this looks awkward and too flexible, one can just comment out that part of the parser of course. Anyhow, here's the GetEqualityCheckForDictionaryEntry code:

private Expression GetEqualityCheckForDictionaryEntry(ParameterExpression o, Type target, ConstantExpression key, Expression e)
{
    return Expression.Call(
        typeof(object),
        "Equals",
        new Type[0],
        Expression.Convert(
            Expression.Call(
                Expression.Convert(
                    o,
                    target
                ),
                target.GetMethod("get_Item"),
                key
            ),
            typeof(object)
        ),
        Expression.Convert(
            e,
            typeof(object)
        )
    );
}

Again, straightforward: it just emits a call to Object.Equals, passing in the value retrieved through the indexer and the value specified as the expression. Need a sample?

var onBisectXY = new Pattern<bool>()
    .Match((int x) => new Dictionary<string, int>() { { "x", x }, { "y", x } }, x => true)
    .Else(() => false);

onBisectXY.Compile();

Random rand = new Random();
for (int i = 0; i < 100; i++)
{
    var p = new Dictionary<string, int> { { "x", rand.Next(0, 5) }, { "y", rand.Next(0, 5) } };
    if (onBisectXY.Execute(p))
        Console.WriteLine(p["x"] + "," + p["y"]);
}

producing:

image

A small matching exercise today. More stuff next time - stay tuned!


Monday morning: The Return of The Pattern Matcher. After an awesome weekend (well, a Saturday at least) with plenty of sun here in Seattle, we'll dive into even more pattern matching fun. This time around we'll investigate ways to match collections. Last time we wrapped our heads around ways to match object initialization patterns, but as many of you know, the latest language wave introduced richer collection initialization language features as well. You might want to check out my "C# 3.0 Feature Focus - Part 3 - Collection Initializers" post on the topic, but let's recap first.

Collection and object initializers - anything special?

In the pre C# 3.0 era, you could write stuff like this:

int[] primes = new int[] { 2, 3, 5, 7, 11 };

but something that ought to be richer, namely (generic) collections, had quite redundant noise as illustrated below:

List<int> primes = new List<int>();
primes.Add(2);
primes.Add(3);
primes.Add(5);
primes.Add(7);
primes.Add(11);

It's a pattern, dude, and you know what language designers do with those kinds of common patterns? Exactly:

List<int> primes = new List<int>() { 2, 3, 5, 7, 11 };

which internally translates into the old verbose code. Well, almost true... Take a look at the IL result (feel free to use much handier tools - but way too advanced for me - like Reflector) - click to enlarge:

image 

Two assignments, huh? This illustrates the core difference between the old C# 2.0 fragment and the new C# 3.0 fragment: the battle of statements versus expressions. The following is an expression:

new List<int>() { 2, 3, 5, 7, 11 }

just like

new Person() { Name = "Bart", Age = 25 }

So, either you have an object that matches all the criteria of the initialization, or you don't have anything. That is, every time you look at me, I should be called "Bart" and have an age of 25 (till February 11th that is). Similarly, the collection of primes should always have 2, 3, 5, 7 and 11. So, you shouldn't ever see an intermediate state, hence the atomic assignment. I explained this behavior for object initializers in the past (C# 3.0 Object Initializers Revisited) but the same holds true for collection initializers.

The key take-away from this is the fact those are expressions, not abbreviated statement sequences (although internally, we ultimately rely on statements to get the work done, but on the surface it's only the result that counts).
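A quick sketch of that conceptual translation (the __temp name is made up for illustration; the compiler uses an unspeakable name of its own), matching the two assignments visible in the IL:

```csharp
using System;
using System.Collections.Generic;

class InitializerTranslation
{
    public static List<int> BuildPrimes()
    {
        // Conceptually what the compiler emits for
        //     List<int> primes = new List<int>() { 2, 3, 5, 7, 11 };
        // a hidden temporary receives the Add calls, and only the fully
        // initialized list is assigned to the visible variable - hence the
        // two assignments showing up in the IL, and no observable
        // intermediate state.
        List<int> __temp = new List<int>();
        __temp.Add(2);
        __temp.Add(3);
        __temp.Add(5);
        __temp.Add(7);
        __temp.Add(11);
        List<int> primes = __temp;
        return primes;
    }

    static void Main()
    {
        Console.WriteLine(BuildPrimes().Count); // 5
    }
}
```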

Collection initializers in their expression tree costume

Expressions can be represented as expression trees (duh!), so what does that look like? A sample:

Expression<Func<List<int>>> primes = () => new List<int>() { 2, 3, 5, 7, 11 }

Notice the generic declaration on the left hand side is mandatory (read: no 'var') since lambda expressions don't have a pre-defined type: they are either delegates to anonymous methods or expression tree generators. The C# 3.0 compiler has a new error code for this: error CS0815: Cannot assign lambda expression to an implicitly-typed local variable. And no, you can't write:

Expression<var> primes = () => new List<int>() { 2, 3, 5, 7, 11 }

Anyway, we're more interested in what it gives us back. We could go the IL route, but let's make it less painful:

image

Hooray, a ListInitExpression. What does it consist of? Two things:

image

A NewExpression (where did we see this before?) and a set of Initializers which are of type ElementInit and represent the Add method calls. What about regular arrays? Somewhat similar:

image

but with its own dedicated type NewArrayExpression.
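A minimal sketch (the TreeShapes class is a made-up name) to poke at both node shapes from code rather than the debugger:

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

class TreeShapes
{
    // List<T> initializers: a ListInitExpression wrapping a NewExpression
    // plus a set of ElementInit nodes representing the Add calls.
    public static int CountListInits()
    {
        Expression<Func<List<int>>> list = () => new List<int> { 2, 3, 5 };
        ListInitExpression lie = (ListInitExpression)list.Body;
        Console.WriteLine(lie.Initializers[0].AddMethod.Name); // Add
        return lie.Initializers.Count;                         // 3
    }

    // Arrays get their own dedicated node type, NewArrayExpression.
    public static int CountArrayInits()
    {
        Expression<Func<int[]>> array = () => new[] { 2, 3, 5 };
        NewArrayExpression nae = (NewArrayExpression)array.Body;
        return nae.Expressions.Count;                          // 3
    }

    static void Main()
    {
        Console.WriteLine(CountListInits());
        Console.WriteLine(CountArrayInits());
    }
}
```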

What do we want to match?

Anything to match? You bet! But how? Obviously, we'll have some limitations to deal with since - as usual - we need to trace back construction to runtime matching expressions. Let's make that concrete: consider the following:

.Match((int x) => new[] { x, 1 }, x => { ... })

means we're looking for an array with two elements, the second one being equal to 1, asking for the first one. Notice the wording: second one, first one. We're considering ordering, which gives us the power to match like this:

int[] array = o as int[];
if (array != null && array.Length == 2 && array[1] == 1)
{
   int x = array[0];
   ...
}

Thus, the indexer becomes our vehicle to match and extract (i.e. the opposite of Add). The same principle can be applied to List<T> which also has the pair of Add and indexer members. But what about the following?

.Match((int x) => new HashSet<int>() { x, 1 }, x => { ... })

No indexer this time since sets are unordered. So, what do we mean by this? Maybe a set of two elements that contains 1, binding the remaining element to x? Yes, but it gets quite complicated if you extend upon this:

.Match((int x, int y) => new HashSet<int>() { x, 1, 2, y, 2, 1, 3 }, x => { ... })

So, a set with 7 (?) elements, containing 1 and 2 both twice (?), as well as 3, the remaining elements being bound to x and y. Not really: sets can contain an element only once, so our matcher would emit redundant checks and a check for the length would be more difficult to infer (exercise: why do we need a check for the length?). So far, this might still be expressible in some higher-order matching condition (using set operations), but the real crucial problem is the set's unordered characteristic: there's no guarantee which of the two remaining elements will be bound to x and which one to y. Sets don't give any ordering guarantees, so let's not try to rely on those non-existing things. If you really feel this is a severe restriction, you can cook your own matcher (exercise) for sets limiting the number of unbound variables to 1 (which is deterministic).

A more interesting one might be the following:

.Match((int bartAge, int johnAge) => new Dictionary<string, int>() {
    { "Bart", bartAge },
    { "John", johnAge }
}, (bartAge, johnAge) => { ... })

Notice how collection initializers can be layered on top of Add methods with more than one parameter (you can even mix different overloads of the Add method in one collection initializer). What would a match condition look like? This raises an interesting question: do we want to check the number of entries (meaning we only want to extract information from dictionaries with a fixed length inferred from the initializer's usage) or do we want to allow additional flexibility (e.g. extract two names from a telephone book using pattern matching)? I'm tempted to opt for the latter (exercise: why not for T[] or List<T>; consider pros and cons). So here's a sample match:

IDictionary<string, int> dictionary = o as IDictionary<string, int>;
int bartAge, johnAge;
if (dictionary != null && dictionary.TryGetValue("Bart", out bartAge) && dictionary.TryGetValue("John", out johnAge))
{
   ...
}

In this post, we'll focus on T[] and List<T> matching. Dictionary<TKey, TValue> will be treated separately next time. There are more challenging variants, but let's keep their discussion for another time.

Arrays

Match = unify + extract. Given the commutativity of the operation we can start either way, but let's stick with the Western habit of left-to-right processing (we're in compilation land anyhow). As usual, we start by extending MatchEntry.Compile:

NewExpression ne;
ConstantExpression ce;
ParameterExpression pe;
MemberInitExpression mie;
NewArrayExpression na;

Expression match;
if ((ne = Match.Body as NewExpression) != null)
    match = CompileNewExpression(o, target, ne, bindings);
else if ((ce = Match.Body as ConstantExpression) != null)
    match = CompileConstantExpression(o, target, ce, bindings);
else if ((pe = Match.Body as ParameterExpression) != null)
    match = CompileParameterExpression(o, target, pe, bindings);
else if ((mie = Match.Body as MemberInitExpression) != null)
    match = CompileMemberInitExpression(o, target, mie, bindings);
else if ((na = Match.Body as NewArrayExpression) != null)
    match = CompileNewArrayExpression(o, target, na, bindings);
else
    throw new NotSupportedException("Unsupported expression type in match clause.");

This illustrates how easy it is to extend our matcher. Notice though that for an optimal matcher, you'd need a recursive approach (I'll talk about this later) since expressions can be arbitrarily nested (composability is key), as in:

.Match((int x, int y) => new Bar() { Foo = x, FooBar = new[] { y, 1 } }, (x, y) => { ... })

Anyhow, there's something more important in the above: the type of bindings will need to change to:

//
// List of bindings.
//

var bindings = new Dictionary<ParameterExpression, Expression>();

What does this mean? Remember, a binding is used to extract information from the object being matched in order to feed it into the match body. So far, we've been extracting just properties and fields on objects, but now we need more extraction: elements of arrays. So, a MemberInfo value in the dictionary no longer suffices, hence the change to an Expression that will capture the way to get the data out of the original object. For the existing code, this is a rather simple fix: everywhere we stored a MemberInfo, it has to be replaced by:

bindings[pe] = Expression.PropertyOrField(
    Expression.Convert(
        o,
        target
    ),
    member.Name
);

which creates the mapping straight away. Other changes follow easily from this. Back to our array matching, time to take a look at CompileNewArrayExpression. Our check consists of a few things: checking the type, the length and all indices for which a constant is specified. We restrict ourselves to constants as we've done before. Here's how it looks:

private Expression CompileNewArrayExpression(ParameterExpression o, Type target, NewArrayExpression na, Dictionary<ParameterExpression, Expression> bindings)
{
    Expression check = Expression.AndAlso(
        Expression.TypeIs(
            o,
            target
        ),
        Expression.Equal(
            Expression.ArrayLength(
                Expression.Convert(
                    o,
                    target
                )
            ),
            Expression.Constant(na.Expressions.Count)
        )
    );

    int i = 0;
    foreach (var e in na.Expressions)
    {
        ParameterExpression pe;
        ConstantExpression ce;

        if ((pe = e as ParameterExpression) != null)
        {
            if (bindings.ContainsKey(pe))
            {
                check = Expression.AndAlso(
                    check,
                    GetEqualityCheck(o, target, i, bindings[pe])
                );
            }
            else
                bindings[pe] = Expression.ArrayIndex(
                    Expression.Convert(
                        o,
                        target
                    ),
                    Expression.Constant(i)
                );
        }
        else if ((ce = e as ConstantExpression) != null)
        {
            check = Expression.AndAlso(
                check,
                GetEqualityCheck(o, target, i, ce)
            );
        }
        else
            throw new NotSupportedException("Can only match constants.");

        i++;
    }

    return check;
}

The first part of the check, outside the foreach loop, checks the type using TypeIs - the expression equivalent of the is operator - and checks the array length, which has a nice expression counterpart listening to the lovely name of ArrayLength, matched against the number of expressions (which, by the way, stand for the individual entries in the comma-separated initialization list). Notice the compiler will make sure collection initializers have the same number of initialization expressions as the specified array length (if specified at all).
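The type-plus-length prefix of the check can be tried out on its own; a minimal sketch (the ArrayCheckSketch class is a made-up name) hardcoding int[] and length 3:

```csharp
using System;
using System.Linq.Expressions;

class ArrayCheckSketch
{
    public static Func<object, bool> Build()
    {
        ParameterExpression o = Expression.Parameter(typeof(object), "o");

        // o is int[] && ((int[])o).Length == 3
        Expression check = Expression.AndAlso(
            Expression.TypeIs(o, typeof(int[])),
            Expression.Equal(
                Expression.ArrayLength(Expression.Convert(o, typeof(int[]))),
                Expression.Constant(3)));

        return Expression.Lambda<Func<object, bool>>(check, o).Compile();
    }

    static void Main()
    {
        Func<object, bool> check = Build();
        Console.WriteLine(check(new[] { 1, 2, 3 })); // True
        Console.WriteLine(check(new[] { 1, 2 }));    // False
        Console.WriteLine(check("abc"));             // False (AndAlso short-circuits before the cast)
    }
}
```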

Next, inside the loop, we try to bind all of the expressions in the initializer expression. The idea is completely analogous to what we've done before: if it's a constant, we emit an equality check that will be added to the match clause expression; if it's a parameter, it becomes a binding for extraction of values that need to be fed into the match body. A breakdown analysis:

        if ((pe = e as ParameterExpression) != null)
        {
            if (bindings.ContainsKey(pe))
            {
                check = Expression.AndAlso(
                    check,
                    GetEqualityCheck(o, target, i, bindings[pe])
                );
            }
            else
                bindings[pe] = Expression.ArrayIndex(
                    Expression.Convert(
                        o,
                        target
                    ),
                    Expression.Constant(i)
                );
        }

For parameters, we need to check first whether or not we've already found a binding to the parameter since we need to emit equality checks in that case. A sample of such an occurrence is:

.Match((int x, int y) => new[] { x, y, x }, (x, y) => { ... })

which should emit a condition like a[0] == a[2]. We keep the original binding (since we enforce equality in the check for bindings to the same name, it doesn't matter which one we retrieve) but chain the extra equality check onto the current check. If there isn't a binding yet, we create one using the ArrayIndex factory method on Expression, which takes in an array and an index. The code for this speaks for itself.

Constants are even more trivial:

         else if ((ce = e as ConstantExpression) != null)
        {
            check = Expression.AndAlso(
                check,
                GetEqualityCheck(o, target, i, ce)
            );
        }

once we know what's hiding behind the GetEqualityCheck method. So let's reveal this little secret:

private Expression GetEqualityCheck(ParameterExpression o, Type target, int i, Expression e)
{
    return Expression.Call(
        typeof(object),
        "Equals",
        new Type[0],
        Expression.Convert(
            Expression.ArrayIndex(
                Expression.Convert(
                    o,
                    target
                ),
                Expression.Constant(i)
            ),
            typeof(object)
        ),
        Expression.Convert(
            e,
            typeof(object)
        )
    );
}

Again we rely on System.Object.Equals(a, b) to do the equality checking, but you could refine this if you want (exercise). The core part is ArrayIndex, which looks familiar by now, so it doesn't require much more attention. Just don't forget about the conversions to System.Object for both operands of the equality check since those are bound to the parameters of the System.Object.Equals method.
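To make the boxing requirement tangible, here's a minimal standalone sketch (the BoxedEqualsSketch class is a made-up name) that compares an array element against a constant through Object.Equals, converting both operands to System.Object:

```csharp
using System;
using System.Linq.Expressions;

class BoxedEqualsSketch
{
    public static Func<int[], bool> Build()
    {
        ParameterExpression a = Expression.Parameter(typeof(int[]), "a");

        // Static Object.Equals(object, object) needs both operands typed as
        // System.Object, hence the Convert nodes around the int-typed expressions.
        Expression check = Expression.Call(
            typeof(object),
            "Equals",
            new Type[0],
            Expression.Convert(
                Expression.ArrayIndex(a, Expression.Constant(0)),
                typeof(object)),
            Expression.Convert(Expression.Constant(1), typeof(object)));

        return Expression.Lambda<Func<int[], bool>>(check, a).Compile();
    }

    static void Main()
    {
        Func<int[], bool> firstIsOne = Build();
        Console.WriteLine(firstIsOne(new[] { 1, 2 })); // True
        Console.WriteLine(firstIsOne(new[] { 2, 1 })); // False
    }
}
```

Omitting the Convert nodes makes Expression.Call fail to bind the int-typed operands to the object-typed parameters.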

That's it! We've done the unification, we extracted information, all the rest happens automagically in our Compile method we discussed before. Time for a sample:

var onBisectXY = new Pattern<bool>()
    .Match((int x) => new[] { x, x, 0 }, x => true)
    .Else(() => false);

onBisectXY.Compile();

Random rand = new Random();
for (int i = 0; i < 100; i++)
{
    var p = new[] { rand.Next(0, 5), rand.Next(0, 5), rand.Next(0, 5) };
    if (onBisectXY.Execute(p))
        Console.WriteLine(p[0] + "," + p[1] + "," + p[2]);
}

which produces:

image

Three matches out of 100, not bad given the restrictive condition:

image

List<T>

Our next candidate is List<T>. On the surface of it, it looks pretty similar:

var onBisectXY = new Pattern<bool>()
    .Match((int x) => new List<int> { x, x, 0 }, x => true)
    .Else(() => false);

onBisectXY.Compile();

Random rand = new Random();
for (int i = 0; i < 100; i++)
{
    var p = new List<int> { rand.Next(0, 5), rand.Next(0, 5), rand.Next(0, 5) };
    if (onBisectXY.Execute(p))
        Console.WriteLine(p[0] + "," + p[1] + "," + p[2]);
}

but the way to retrieve data is slightly different since we're no longer dealing with a NewArrayExpression. Instead we're faced with a ListInitExpression. Once more, start by adding the right match handler:

NewExpression ne;
ConstantExpression ce;
ParameterExpression pe;
MemberInitExpression mie;
NewArrayExpression na; 
ListInitExpression lie;

Expression match;
if ((ne = Match.Body as NewExpression) != null)
    match = CompileNewExpression(o, target, ne, bindings);
else if ((ce = Match.Body as ConstantExpression) != null)
    match = CompileConstantExpression(o, target, ce, bindings);
else if ((pe = Match.Body as ParameterExpression) != null)
    match = CompileParameterExpression(o, target, pe, bindings);
else if ((mie = Match.Body as MemberInitExpression) != null)
    match = CompileMemberInitExpression(o, target, mie, bindings);
else if ((na = Match.Body as NewArrayExpression) != null)
    match = CompileNewArrayExpression(o, target, na, bindings);
else if ((lie = Match.Body as ListInitExpression) != null)
    match = CompileListInitExpression(o, target, lie, bindings);
else
    throw new NotSupportedException("Unsupported expression type in match clause.");

Let's introduce CompileListInitExpression piecemeal. First let's do some checking to allow just List<T>:

private Expression CompileListInitExpression(ParameterExpression o, Type target, ListInitExpression lie, Dictionary<ParameterExpression, Expression> bindings)
{
    if (lie.NewExpression.Arguments.Count != 0)
        throw new NotSupportedException("Collection initializers in match expressions should use the default constructor.");

    if (lie.Type.GetGenericTypeDefinition() != typeof(List<>))
        throw new NotSupportedException("Only List<T> is supported.");

    return CompileListInitExpressionForListOfT(o, target, lie, bindings);
}

We restrict ourselves to List<T> and also make sure no constructor arguments are used (since collection initializers can be combined with regular constructor calls), which simplifies unification. Notice we don't lose much: the constructors for List<T> take either a capacity (which is irrelevant since we're using the "object construction" only for matching purposes) or an IEnumerable<T> (which could only be one of our supported types, like T[] or List<T>, which would be redundant).

The method (with the rather long name) being called starts by checking the type and length, this time using the Count property of the list:

private Expression CompileListInitExpressionForListOfT(ParameterExpression o, Type target, ListInitExpression lie, Dictionary<ParameterExpression, Expression> bindings)
{
    Expression check = Expression.AndAlso(
        Expression.TypeIs(
            o,
            target
        ),
        Expression.Equal(
            Expression.Property(
                Expression.Convert(
                    o,
                    target
                ),
                "Count"
            ),
            Expression.Constant(lie.Initializers.Count)
        )
    );

The rest is fairly predictable but nevertheless interesting:

    int i = 0;
    foreach (var e in lie.Initializers)
    {
        //
        // Note: We know we're processing List<T>, which only has one Add overload,
        //       so we omit additional checks for now.
        //
        Expression arg = e.Arguments[0];

        ParameterExpression pe;
        ConstantExpression ce;

        if ((pe = arg as ParameterExpression) != null)
        {
            if (bindings.ContainsKey(pe))
            {
                check = Expression.AndAlso(
                    check,
                    GetEqualityCheckForListOfT(o, target, i, bindings[pe])
                );
            }
            else
                bindings[pe] = Expression.Call(
                    Expression.Convert(
                        o,
                        target
                    ),
                    target.GetMethod("get_Item"),
                    Expression.Constant(i)
                );
        }
        else if ((ce = arg as ConstantExpression) != null)
        {
            check = Expression.AndAlso(
                check,
                GetEqualityCheckForListOfT(o, target, i, ce)
            );
        }
        else
            throw new NotSupportedException("Can only match constants.");

        i++;
    }

    return check;
}

Notice the skeleton of the code is the same: iterate, type check, accumulate the check expression, etc. Being so similar, I've highlighted the main differences. The initializers are ElementInit objects, each exposing an AddMethod and an Arguments property. We don't do additional checking, but in the general case any well-suited overload of Add could be used. List<T> only has one, so we simply extract the first argument to the call and treat it as the expression being matched. Again, it needs to be either a constant or a parameter, resulting in a match condition or a binding respectively. The binding is where things become interesting since we can't use ArrayIndex but instead need to call the indexer on the List<T>, which is exposed as a method named get_Item. Other than that, there's no relevant difference.
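To make that difference concrete, here's a standalone sketch (independent of the matcher itself) contrasting the dedicated array element access node with the get_Item indexer call we emit for List<T>:

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

class IndexerDemo
{
    static void Main()
    {
        // For arrays, element access is a dedicated expression node.
        ParameterExpression arr = Expression.Parameter(typeof(int[]), "a");
        Expression arrayElem = Expression.ArrayIndex(arr, Expression.Constant(1));

        // For List<T> there's no such node; we call the indexer, which the
        // compiler exposes as a method named get_Item.
        ParameterExpression lst = Expression.Parameter(typeof(List<int>), "l");
        Expression listElem = Expression.Call(
            lst,
            typeof(List<int>).GetMethod("get_Item"),
            Expression.Constant(1)
        );

        var getFromArray = Expression.Lambda<Func<int[], int>>(arrayElem, arr).Compile();
        var getFromList = Expression.Lambda<Func<List<int>, int>>(listElem, lst).Compile();

        Console.WriteLine(getFromArray(new[] { 10, 20, 30 }));        // 20
        Console.WriteLine(getFromList(new List<int> { 10, 20, 30 })); // 20
    }
}
```

Both lambdas retrieve the element at position 1; only the shape of the expression tree differs.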

The helper method GetEqualityCheckForListOfT is plain simple once we know how to call the indexer:

private Expression GetEqualityCheckForListOfT(ParameterExpression o, Type target, int i, Expression e)
{
    return Expression.Call(
        typeof(object),
        "Equals",
        new Type[0],
        Expression.Convert(
            Expression.Call(
                Expression.Convert(
                    o,
                    target
                ),
                target.GetMethod("get_Item"),
                Expression.Constant(i)
            ),
            typeof(object)
        ),
        Expression.Convert(
            e,
            typeof(object)
        )
    );
}

The resulting match condition you ask? Here it is:

image

Can you still parse it yourself? :-) Anyhow, more important: it works:

image

Next time: matching dictionaries.


Remark: Some readers have asked me for the sources of all this magic. Since this series is based on an extraction from a bigger project and I'm composing the decoupled (and slightly simplified) pattern matcher as I'm writing these blog posts, the source isn't in a publishable condition yet. To put the crown on the work, I'll publish the sources with the last post in this series, so keep on reading to absorb the principles, capabilities and design rationale of it.

 

In this series' last post we did put some effort into matching constants which could be considered the simplest of patterns (although there were quite some caveats to work out as we saw). This time we move back to the world of objects and extend upon the notions developed in part 1. Recall we worked on supporting patterns like this:

string message = new Pattern<string>(x)
   .Match((string name) => new Person(name, 25), name => name + " is 25 years.")
   .Match((int age) => new Person("John", age), age => "John is " + age + " years.")
   .Else(() => "Unmatched object");

Although this approach worked out pretty well, it doesn't work on today's typical objects because we required an annotation on the constructor's parameters, like this:

public Person([Property("Name")]string name, [Property("Age")]int age)
{
   Name = name; 
   Age = age;
}

However, C# 3.0 and VB 9.0 have a new syntax called object initializers, which allows us to write things like:

new Person() { Name = "Bart", Age = 25 }

An expression-based initialization syntax became a real need with LINQ in order to express projections. Indeed, LINQ query providers like LINQ to SQL inspect the expression tree associated with a projection clause to figure out which fields to retrieve from the data store, possibly mixed with other operations, nested queries, etc. What I'm aiming at is simply the fact that those language constructs, since they are expressions, can be captured in expression trees:

Expression<Func<Person>> e = () => new Person() { Name = "Bart", Age = 25 };

Enter MemberInitExpression:

image

 

The anatomy of MemberInitExpression

So what makes up a MemberInitExpression? Two things:

image

A NewExpression, as we've already crawled through before, and a set of Bindings. This structure allows us to decompose the unification of such an expression into parsing the NewExpression based on our previous unification (which supports an opt-out, since one can call the default constructor, which obviously doesn't require any metadata for its parameters), followed by the unification of the bindings, which look like this:

image

Those bindings are of type MemberAssignment and contain information about the underlying member (field, property) that gets initialized in the object initializer. The composability of expressions (and hence expression trees) yields some interesting results though. Those assignments can be quite complicated:

new Person() { Name = "Bart", Age = 25, Address = { Zip = 98004 }, Hobbies = { "Programming", "Reading" } };

In here, we're recursively initializing members of members (a MemberMemberBinding, with binding type MemberBinding) and populating a list using collection initializer syntax (a MemberListBinding, with binding type ListBinding). To trim down complexity for the purpose of this post, we'll stick with the Assignment binding type (MemberAssignment) only, which allows us to set properties/fields as in the original sample.
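For completeness, here's a small standalone program (using hypothetical Person and Address types defined just for this illustration) that shows how the compiler classifies each binding; only the first is the Assignment kind we'll support:

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

class Address { public int Zip { get; set; } }
class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
    public Address Address { get; set; }
    public List<string> Hobbies { get; set; }
}

class BindingKinds
{
    static void Main()
    {
        Expression<Func<Person>> e = () => new Person
        {
            Name = "Bart",                         // plain assignment
            Address = { Zip = 98004 },             // nested member initializer
            Hobbies = { "Programming", "Reading" } // collection initializer
        };

        var mie = (MemberInitExpression)e.Body;
        foreach (MemberBinding b in mie.Bindings)
            Console.WriteLine(b.Member.Name + ": " + b.BindingType);

        // Prints:
        //   Name: Assignment
        //   Address: MemberBinding
        //   Hobbies: ListBinding
    }
}
```

Inspecting BindingType is exactly the check our parser uses to reject everything but MemberAssignment.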

 

Intermezzo - making symbolic unification work

Another case we didn't quite handle yet explicitly before is this:

.Match((int x) => new Point() { X = x, Y = x }, x => { ... })

This gets all of the points that are on the first/third bisector of the orthogonal 2D space. Assume for a second we have this instead, using our old mapping metadata for properties (the Point type should be very predictable):

.Match((int x) => new Point(x, x), x => { ... })

For the moment our pattern matcher doesn't handle this case correctly, the reason being that a ParameterExpression is bound to exactly one property. In the heart of our unification algorithm, we have something like this:

foreach (ParameterInfo param in ne.Constructor.GetParameters())
{
    ...

    ParameterExpression pe;

    if ((pe = arg as ParameterExpression) != null)
    {
        bindings[pe] = property;
    }
    else
        ...
}

The problematic line is the bindings[pe] = property assignment. Since we want unification to work across different types of expressions, we need to tackle this problem first. It wouldn't be unrealistic to see code like this:

.Match((int x) => new Point(x, x) { Z = x }, x => { ... })

where a NewExpression is embedded in a MemberInitExpression, so in this case we're looking for points that match the criterion p.X == p.Y == p.Z == x. What are the assignment targets? Likely properties, but public fields could sneak in as well (and would in this context (!) be even nicer since they are predictable: what goes in must come out identically). Therefore, we change the type of bindings from:

Dictionary<ParameterExpression, PropertyInfo> bindings

to

Dictionary<ParameterExpression, MemberInfo> bindings

Now unification needs to do a little more work to enforce the equality of all the members bound to the same parameter, e.g. in:

.Match((int x) => new Point(x, x), x => { ... })

parameter 'x' is bound to both properties X and Y on type Point. Currently, we don't have any matching condition: both properties are unbound, so any point will match. Instead we need to emit conditions like:

Point p;
if ((p = o as Point) != null && p.X == p.Y) { ... }

We're currently missing the second part of that condition, p.X == p.Y. Again, we're going to take a cumulative approach, adding to the check condition variable: if a parameter already has a binding, we emit an equality check for both the old and the new binding:

foreach (ParameterInfo param in ne.Constructor.GetParameters())
{
    ...

    ParameterExpression pe;

    if ((pe = arg as ParameterExpression) != null)
    {
        if (bindings.ContainsKey(pe))
        {
            check = Expression.AndAlso(
                check,
                GetEqualityCheck(o, target, bindings[pe], property)
            );
        }
        else
            bindings[pe] = property;
    }
    else
        ...
}

The helper method to emit the equality check looks like this:

private Expression GetEqualityCheck(ParameterExpression o, Type target, MemberInfo left, MemberInfo right)
{
    return Expression.Call(
        typeof(object),
        "Equals",
        new Type[0],
        Expression.Convert(
            Expression.PropertyOrField(
                Expression.Convert(
                    o,
                    target
                ),
                left.Name
            ),
            typeof(object)
        ),
        Expression.Convert(
            Expression.PropertyOrField(
                Expression.Convert(
                    o,
                    target
                ),
                right.Name
            ),
            typeof(object)
        )
    );
}

By now we all should know that matching has two phases: unification and extraction. The extraction currently looks like this:

//
// Create list of unbound parameters.
//

Expression[] args = new Expression[Match.Parameters.Count];
int j = 0;
foreach (ParameterExpression param in Match.Parameters)
{
    //
    // Parameters need to be bound in the match expression.
    //
    PropertyInfo property;
    if (!bindings.TryGetValue(param, out property))
        throw new InvalidOperationException("Parameter " + param.Name + " was not bound in the pattern match.");

    //
    // Null for identity checks; grab from identified property otherwise.
    //
    args[j++] = (property == null ? me : Expression.Property(me, property));
}

In here, Match stands for the lambda expression corresponding to the match clause. We look for all parameters, determine the property they're bound to, and stuff it into an array for extraction. This time we'll possibly have more than one binding, but all will point at the same value, so we can pick an arbitrary one, i.e. in:

.Match((int x) => new Point(x, x), x => { ... })

it doesn't matter whether we extract the X property from the object or the Y property, since we want them to be the same; indeed, in our unification algorithm we only stored one of the many. Therefore the code doesn't need much of a change, just patching up the type of the binding:

//
// Create list of unbound parameters.
//

Expression[] args = new Expression[Match.Parameters.Count];
int j = 0;
foreach (ParameterExpression param in Match.Parameters)
{
    //
    // Parameters need to be bound in the match expression.
    //
    MemberInfo member;
    if (!bindings.TryGetValue(param, out member))
        throw new InvalidOperationException("Parameter " + param.Name + " was not bound in the pattern match.");

    //
    // Null for identity checks; grab from identified property otherwise.
    //
    args[j++] = (member == null ? me : Expression.PropertyOrField(me, member.Name));
}

Time for a sample:

var bisect = new Pattern<bool>()
    .Match((int x) => new Point(x, x), x => true)
    .Else(() => false);

bisect.Compile();

Random rand = new Random();
for (int i = 0; i < 100; i++)
{
    Point p = new Point(rand.Next(1, 5), rand.Next(1, 5));
    if (bisect.Execute(p))
        Console.WriteLine(p);
}

It produces:

image

which is clearly fewer than the 100 generated points, and all of them match the condition. Congratulations - symbolic pattern matching at work! Of course things can get much more complicated, like this:

var bisect = new Pattern<bool>()
    .Match((int x) => new Point(x, 2 * x), x => true)
    .Else(() => false);

What do you think the 2 * x will look like? Exactly, yet another expression tree. How could you handle this (left to the reader as an exercise)? A few tips:

  • Make sure you understand the semantics correctly: in the sample above we're demanding that p.Y == 2 * x.
  • Think about the translation that we're doing:
    • Source: The match clause can be any function of type Func<..., object> taking in an arbitrary number of parameters. E.g. .Match((int x, int y) => new Point(x, y, x + y), (x, y) => ...)
    • Target: The compiled matcher becomes of type Func<object, bool> taking in an arbitrary object determining whether it matches or not.
  • Order of parameter use can be arbitrary, e.g.: .Match((int y) => new Point(2 * y, y), y => ...)
  • Distinguish between trivial bindings and complex bindings.
  • (Golden tip): trees can be patched...
  • (Only for true geeks): mathematical expressions can be rewritten... Think about this one: .Match((int a, int b) => new Point(a + 1, b * 2, a * b), (a, b) => ...)
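As a hint for the golden tip: once x is bound to, say, the X property, the tree for 2 * x can be patched by substituting the parameter with that member access, turning the whole argument into a check against the target member. A minimal standalone sketch (not the series' implementation, and with a hard-coded substitution rather than a general tree rewriter):

```csharp
using System;
using System.Linq.Expressions;

class Point
{
    public int X { get; set; }
    public int Y { get; set; }
}

class TreePatchSketch
{
    static void Main()
    {
        // Suppose x was already bound to X while processing "new Point(x, 2 * x)".
        // Patch the second argument: replace parameter x by ((Point)o).X, then
        // demand equality with ((Point)o).Y.
        ParameterExpression o = Expression.Parameter(typeof(object), "o");
        Expression px = Expression.Property(Expression.Convert(o, typeof(Point)), "X");
        Expression py = Expression.Property(Expression.Convert(o, typeof(Point)), "Y");

        // "2 * x" with x patched to ((Point)o).X:
        Expression patched = Expression.Multiply(Expression.Constant(2), px);

        var check = Expression.Lambda<Func<object, bool>>(
            Expression.AndAlso(
                Expression.TypeIs(o, typeof(Point)), // short-circuits before the casts
                Expression.Equal(py, patched)
            ),
            o
        ).Compile();

        Console.WriteLine(check(new Point { X = 3, Y = 6 })); // True
        Console.WriteLine(check(new Point { X = 3, Y = 7 })); // False
    }
}
```

A general solution would walk the argument tree with a visitor, substituting every bound ParameterExpression by its binding; the sketch above just shows the shape of the resulting check.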

 

MemberInitExpression - parse time!

So, what should our matcher support? A breakdown:

.Match((string name) => new Person() { Name = name, Age = 25 }, name => { ... })

means we want to get people with Age 25 and extract their name to feed into the match body. Obviously, our MatchEntry.Compile method needs to be enhanced:

NewExpression ne;
ConstantExpression ce;
ParameterExpression pe;
MemberInitExpression mie;

Expression match;
if ((ne = Match.Body as NewExpression) != null)
    match = CompileNewExpression(o, target, ne, bindings);
else if ((ce = Match.Body as ConstantExpression) != null)
    match = CompileConstantExpression(o, target, ce, bindings);
else if ((pe = Match.Body as ParameterExpression) != null)
    match = CompileParameterExpression(o, target, pe, bindings);
else if ((mie = Match.Body as MemberInitExpression) != null)
    match = CompileMemberInitExpression(o, target, mie, bindings);
else
    throw new NotSupportedException("Unsupported expression type in match clause.");

The code for the MemberInit parser looks like this:

private Expression CompileMemberInitExpression(ParameterExpression o, Type target, MemberInitExpression me, Dictionary<ParameterExpression, MemberInfo> bindings)
{
    //
    // Compile embedded new expression.
    //

    Expression check = CompileNewExpression(o, target, me.NewExpression, bindings);

    //
    // Parse bindings.
    //
    foreach (MemberBinding binding in me.Bindings)
    {
        MemberAssignment ma = binding as MemberAssignment;
        if (ma == null)
            throw new NotSupportedException("Only top-level regular assignment bindings are supported.");

        check = TryBind(check, o, target, ma.Member, ma.Expression, bindings);
    }

    return check;
}

Notice we're parsing the constructor first. MemberInitExpressions always have a call to a constructor embedded in them, so we extend upon that functionality. More importantly, notice the conciseness of the loop thanks to some refactoring into a TryBind method. Make the following observation. When parsing a NewExpression we do precisely the same: for each argument we try to establish a binding or emit a check. Therefore that code can be shared between CompileNewExpression and CompileMemberInitExpression. Let's take a look at the rewritten CompileNewExpression first:

private Expression CompileNewExpression(ParameterExpression o, Type target, NewExpression ne, Dictionary<ParameterExpression, MemberInfo> bindings)
{
    //
    // Check the type of the object against the expected type.
    //

    Expression check = GetTypeCheck(o, target);

    //
    // Resolve parameters.
    //

    int i = 0;
    foreach (ParameterInfo param in ne.Constructor.GetParameters())
    {
        //
        // Mapping information from constructor parameters to properties required.
        //
        PropertyAttribute pa = param.GetCustomAttributes(typeof(PropertyAttribute), false).Cast<PropertyAttribute>().SingleOrDefault();
        if (pa == null)
            throw new InvalidOperationException("Input object doesn't have required mapping information.");

        //
        // Find the property.
        //
        PropertyInfo property = target.GetProperty(pa.Name);
        if (property == null)
            throw new InvalidOperationException(String.Format("Property {0} on type {1} not found.", pa.Name, target.Name));

        check = TryBind(check, o, target, property, ne.Arguments[i++], bindings);
    }

    return check;
}

This too is fairly straightforward; we just need to infer the property through the metadata on the constructor arguments first. Time to move on to the TryBind method, which acts as an accumulator for both bindings and match conditions:

private Expression TryBind(Expression check, ParameterExpression o, Type target, MemberInfo member, Expression arg, Dictionary<ParameterExpression, MemberInfo> bindings)
{
    ConstantExpression ce;
    ParameterExpression pe;

    //
    // Expression should either be a constant or a parameter reference.
    //
    if ((ce = arg as ConstantExpression) != null)
    {
        //
        // Check the object's property matches the constant.
        //
        check = Expression.AndAlso(
            check,
            GetEqualityCheck(o, target, member, ce)
        );
    }
    else if ((pe = arg as ParameterExpression) != null)
    {
        //
        // Keep track of the loose parameter.
        //
        if (bindings.ContainsKey(pe))
        {
            check = Expression.AndAlso(
                check,
                GetEqualityCheck(o, target, bindings[pe], member)
            );
        }
        else
            bindings[pe] = member;
    }
    else
        throw new NotSupportedException("Can only match constants.");

    return check;
}

That's it! Here's our rewritten Point example:

var bisect = new Pattern<bool>()
    .Match((int x) => new Point() { X = x, Y = x }, x => true)
    .Else(() => false);

bisect.Compile();

Random rand = new Random();
for (int i = 0; i < 100; i++)
{
    Point p = new Point(rand.Next(1, 5), rand.Next(1, 5));
    if (bisect.Execute(p))
        Console.WriteLine(p);
}

and here's the result:

image

Randomly different of course, but still up and running!

 

Next time - a little trip through the art of matching collections.


Last time around in this series we won the battle of performance, going all the way from an interpreter-driven pattern matcher to a fully on-the-fly compiled one, with great results almost reaching the performance of a manual implementation. However, functionality-wise our pattern matcher is rather restricted, supporting only the NewExpression. Nonetheless, there are many other interesting opportunities to introduce matches for. In this post, we'll talk only about constants. Are constants that interesting, you might wonder? You bet they are :-).

 

ConstantExpression

This one was just too simple to start with; its simplicity reflects the borderline between a full-blown pattern matcher and the degenerate case of a switch expression. In other words, you can drop the switch I presented before and layer it on top of the pattern matcher. In order to do so, the pattern matcher obviously needs to provide support to match constants. Take a look back at our MatchEntry.Compile method:

internal void Compile()
{
    //
    // Only support for NewExpression.
    //

    NewExpression ne = Match.Body as NewExpression;
    if (ne == null)
        throw new NotSupportedException("Match clause can only contain NewExpression.");

Since we can expect any kind of expression as the Match lambda expression's body, we should add checks to the above to distinguish between cases. A non-atypical structure looks like this:

    NewExpression ne;
    ConstantExpression ce;

    if ((ne = Match.Body as NewExpression) != null)
    { ... }
    else if ((ce = Match.Body as ConstantExpression) != null)
    { ... }
    else
        throw new NotSupportedException("Unsupported expression detected.");

Not unsurprisingly (the last double negation for today, I promise) this pattern looks familiar - it's nothing more than a type switch, which - as we said before - can be expressed in terms of a degenerate pattern matcher. It should be apparent that, starting from this observation, the way towards a meta-circular interpreter isn't too long. Anyhow, back to the order of the day. Here's a typical example of what we'd like to achieve:

uint Fib(uint n)
{
   return new Pattern<uint>(n)
      .Match(() => 0, () => 1)
      .Match(() => 1, () => 1)
      .Else(() => Fib(n - 1) + Fib(n - 2)).Result;
}

Obviously, it's nothing more than a sample since this is likely a laureate in the "Who's written the stupidest implementation of Fibonacci?" competition. But well, it suits our goals. Notice by the way how the Else clause works by means of a closure that captures the local variable n. The implementation of a constant-based match clause unification is seemingly innocent:

private static bool TryMatchConstantExpression(object o, ConstantExpression ce, Dictionary<ParameterExpression, PropertyInfo> bindings)
{
    return Object.Equals(o, ce.Value);
}

And yes, this works but there's a little caveat with the sample above:

      .Match(() => 0, () => 1)

What's the type of the match clause? Func<int>. What's the type of the match body? Func<uint>. What do we feed into the pattern matcher? Right, a uint. So, what about comparability of System.Int32 against System.UInt32? Right, it won't work. So either you make your matcher as smart as e.g. the C# compiler:

uint i = 0;
int j = 0;
bool b = i == j;

becomes

.locals init (uint32 V_0,
         int32 V_1,
         bool V_2)
...
ldc.i4.0
stloc.0
ldc.i4.0
stloc.1
ldloc.0
conv.u8
ldloc.1
conv.i8
ceq
stloc.2

with implicit conversions in place. Or, you simply patch up your pattern as follows:

uint Fib(uint n)
{
   return new Pattern<uint>(n)
      .Match(() => (uint)0, () => 1)
      .Match(() => (uint)1, () => 1)
      .Else(() => Fib(n - 1) + Fib(n - 2)).Result;
}
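The root cause is easy to demonstrate in isolation: Object.Equals calls through to the left operand's Equals override, which checks the runtime type of the boxed right operand first, so numerically equal values of different types never match:

```csharp
using System;

class BoxedEquality
{
    static void Main()
    {
        // Int32.Equals(object) and UInt32.Equals(object) first test whether
        // the argument has the same runtime type as the receiver.
        Console.WriteLine(Object.Equals(0, 0));             // True:  int vs int
        Console.WriteLine(Object.Equals((uint)0, 0));       // False: uint vs int
        Console.WriteLine(Object.Equals((uint)0, (uint)0)); // True:  after the cast fix
    }
}
```

This is exactly what TryMatchConstantExpression relies on, hence the explicit (uint) casts in the patched-up pattern above.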

Question: Say with Fib(int i) you can calculate Fibonacci numbers up to i == x. Now switch to Fib(uint u), what do you expect to be the upper bound to the computability using this alternative function? Relate this to the discussion above.

The match implementation above obviously doesn't employ deferred execution using a compiled pattern. Making the switch to the compiled form as explained previously in this series should be trivial, or not? There's a little caveat (as usual) and the choice of Fibonacci as the running sample in this paragraph isn't just a coincidence. Recursion is tricky in lambda calculus, as explained bY others previouslY (notice this doesn't reflect the complexity of the platform, as some believe, but rather the complexity of the core mathematical theories - taken to the extreme - that provide the foundation of what we now call "functional programming"). It's maybe not completely apparent, but in reality we've defined the pattern in terms of the method where it itself is defined. It might be an interesting exercise to trace back a few calculations of, say, Fib(4) and correlate this to memory usage. A way to make this callable is simply this:

static Pattern<int> _pattern = new Pattern<int>()
       .Match(() => 0, () => 1)
       .Match(() => 1, () => 1)
       .Match((int n) => n, n => Fib(n - 1) + Fib(n - 2))
       .Else(() => { throw new Exception("Unmatched value detected."); });

static int Fib(int n)
{
    return _pattern.Execute(n);
}

Don't bother about the static context, it just so happened to be like that in my collection of unit tests. Notice I've gently switched to use ints instead of uints, and I had to make a couple of changes. First of all, why doesn't it simply work anymore with a trailing Else handling the recursive case? Recall my remark on the subtle use of a closure in the original code, capturing the local variable inside the Fib method and using it inside the lambda expression of the Else case's match body. Since we're delaying execution now through an open pattern match object, there's no such thing as "n" to be captured from the outer scope. Indeed, the new value is going to be passed in to the pattern match machinery through the Execute method instead. To fix this, we need to get our hands on the value that's flowing through the pattern match, and what's better than an identity checking clause (x => x)?

       .Match((int n) => n, n => Fib(n - 1) + Fib(n - 2))

Notice however the type of the lambda expression's body representing the match clause: ParameterExpression. Actually the identity check is nothing more than a wildcard operation, albeit with a type check upfront (this too can be lifted if you define a pattern over a closed domain, something we might cover in a later episode). Also notice that we now require an Else block to complete the match object, because a match is a valued expression that should produce values for the entire input domain (something that can be lifted when working with closed domains while inferring the input domain coverage, i.e. if we knew in the sample above the input can only be of type int, we could prove the pattern is complete with the first three matches - although it isn't meaningfully defined for negative numbers).

So, how should we compile a ParameterExpression? Fairly straightforward:

private static bool TryMatchParameterExpression(object o, ParameterExpression pe, Dictionary<ParameterExpression, PropertyInfo> bindings)
{
    bindings[pe] = null;
    return pe.Type.IsAssignableFrom(o.GetType());
}

The first line is a bit tricky. As it stands now, our bindings map parameters onto properties, but the parameter itself (in this case) maps to the entire object (indeed, you could have a (Person p) => p as well), which we represent internally as a null-mapping. In addition to this, the matcher performs a type check since the object being matched won't necessarily match the clause (e.g. in the (Person p) => p sample, the pattern match object might be fed with a Giraffe object which only in rare circumstances takes human characteristics). As a matter of fact, this type check can be seen as the engine for a type switch layered on top of the pattern matcher. The change to the evaluator (or back-end compiler) is straightforward and just involves one additional null-check:

object value = property == null ? o : property.GetValue(o, null);

and

Expression me = Expression.Convert(o, target);
args[j++] = (property == null ? me : Expression.Property(me, property));

And here are finally the compile functions:

private Expression CompileParameterExpression(ParameterExpression o, Type target, ParameterExpression pe, Dictionary<ParameterExpression, PropertyInfo> bindings)
{
    bindings[pe] = null;

    return Expression.TypeIs(o, target);
}

private Expression CompileConstantExpression(ParameterExpression o, Type target, ConstantExpression ce, Dictionary<ParameterExpression, PropertyInfo> bindings)
{
    return Expression.Equal(
        Expression.Convert(o, ce.Type),
        ce
    );
}

Time for a little geek exercise. In the screenshot below I'm showing off a tempting attempt to make our beloved Fibonacci count his rabbits in a local method scope:

image

However, our even more beloved C# compiler indicates something is wrong. Can you guess what and propose a fix? Does it work as you expect? Notice the following code doesn't compile either, but the fix for this issue is trivial (and similar - see Wes' blog for more info):

Func<int, int> fib = n => n > 1 ? fib(n - 1) + fib(n - 2) : 1;
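For reference, the usual trivial fix is to split declaration and assignment, so the lambda body captures a variable that is already definitely assigned when the compiler checks it:

```csharp
using System;

class RecursiveLambda
{
    static void Main()
    {
        // The one-liner fails with CS0165 ("use of unassigned local variable
        // 'fib'") because the lambda references fib before its own initializer
        // has completed. Assigning null first satisfies definite assignment:
        Func<int, int> fib = null;
        fib = n => n > 1 ? fib(n - 1) + fib(n - 2) : 1;

        Console.WriteLine(fib(10)); // 89
    }
}
```

Note the closure captures the variable fib, not its value, so by the time the lambda runs the second assignment has already taken effect - and reassigning fib later would change the recursion too.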

Anyway, here's a complete sample at work:

image

and here are the results:

image

 

Performance

Oh well, nice eye candy but how does it perform? Yesterday's post was definitely a victory but the nice thing about it was the constant cost for each pattern match. This time, we're creating a cascade of pattern executions by means of the recursive call to Fib inside the pattern match's third body. A quick trace down shows:

Fib(4) --> Fib(3) + Fib(2) --> Fib(2) + Fib(1) + Fib(1) + Fib(0) --> Fib(1) + Fib(0) + Fib(1) + Fib(1) + Fib(0)

The number of calls is exploding. Here's a detail of the call tree in CLR Profiler outlining two calculations:

image

Notice how the second line of the first box corresponds to the last line of the second box (both 7 calls). You'll see this pattern returning since those are the recursive calls to Fib(1). To make this cascade more concrete, here's an outline of 10 iterations:

image

Ignore the first one (some heavy string concatenation going on there) - all the others nicely reflect the trend for subsequent iterations. Math freaks like me will do a little division of two subsequent numbers and will - not coincidentally - see the following:

46
63     63 / 46   = 1.37
96     96 / 63   = 1.52
146    146 / 96  = 1.52
229    229 / 146 = 1.57
362    362 / 229 = 1.58
578    578 / 362 = 1.60
927    927 / 578 = 1.60
1492   1492 / 927 = 1.61

And if you'd profile further, e.g. runs 19 and 20 with values of 112434 and 181914 respectively, you'd come to the conclusion the ratio is nearly constant (run[20]/run[19] = 1.6179625...). This is the golden ratio, which has the precise value (SQRT(5) + 1) / 2 = 1.6180339... Whole sites have been devoted to the topic, but here's the ultimate source. Fibonacci is worth a blog on its own and has many applications (including memory allocation schemes, cf. Ferguson [1976]). Now, why do we find this number? In essence, it simply reflects the ratio of the number of calls required to calculate Fibonacci number n versus n - 1, whose sole difference is the recalculation of number n - 2. This simply shows the shortcoming of the algorithm, not really of our implementation of the pattern matcher.
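You can verify that convergence without any profiler: the call count of the naive recursion satisfies calls(n) = calls(n - 1) + calls(n - 2) + 1 (the current call plus both recursive subtrees), and consecutive ratios approach the golden ratio. A quick standalone check:

```csharp
using System;

class CallCounts
{
    static void Main()
    {
        // calls(0) = calls(1) = 1: a single invocation, no recursion.
        long prev = 1, curr = 1;
        for (int n = 2; n <= 20; n++)
        {
            long next = curr + prev + 1; // both subtrees plus the current call
            prev = curr;
            curr = next;
        }

        // The ratio of consecutive call counts approaches 1.6180339...
        Console.WriteLine(curr);
        Console.WriteLine((double)curr / prev);
    }
}
```

The same recurrence explains the near-constant ratio seen in the profiler runs above; the "+ 1" term becomes negligible as n grows.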

For example, if you do a really simplistic pattern match like this:

var even = new Pattern<bool>()
                .Match(() => 1, () => false)
                .Match(() => 2, () => true)
                .Match(() => 3, () => false)
                .Match(() => 4, () => true)
                .Else(() => false);

You'll see it performs nearly as well (or as badly) as the imperative variant, with just a factor of three or so difference, e.g.:

var res_1 = from i in lst select even.Execute(i);
var res_2 = from i in lst select (i == 1 ? false : i == 2 ? true : i == 3 ? false : i == 4 ? true : false);

Obviously, the even check can be turned into i % 2 == 0, but the code above keeps the two implementations almost identical, which makes for a fair comparison. Remember our matcher ultimately evaluates a Boolean predicate to decide on a match; if you expose that directly you can get much closer to a modulo-based implementation again (at the moment, though, we don't have Match method overloads that accept additional Boolean predicates, as in a "When" guard - exercise).
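As a sketch of what that exercise could lead to - hypothetical code, not the Pattern<T> class built in this series - here's a tiny predicate-based matcher with "When"-style guards:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical guard-based matcher, only to illustrate the exercise;
// this is not the series' Pattern<T> class.
class GuardPattern<T>
{
    private readonly List<KeyValuePair<Func<int, bool>, Func<int, T>>> _matches
        = new List<KeyValuePair<Func<int, bool>, Func<int, T>>>();
    private Func<T> _else;

    public GuardPattern<T> Match(Func<int, bool> when, Func<int, T> body)
    {
        _matches.Add(new KeyValuePair<Func<int, bool>, Func<int, T>>(when, body));
        return this;
    }

    public GuardPattern<T> Else(Func<T> body)
    {
        _else = body;
        return this;
    }

    public T Execute(int o)
    {
        // First guard that passes wins; fall through to the else clause.
        foreach (var entry in _matches)
            if (entry.Key(o))
                return entry.Value(o);
        return _else();
    }
}
```

With guards in place, the even check collapses to a single clause: new GuardPattern&lt;bool&gt;().Match(i =&gt; i % 2 == 0, i =&gt; true).Else(() =&gt; false).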

 

It's functional baby!

But why so much nagging about constants? Let's take a step back and look at where those constants appear: not only in the match but also in the calculation's result. Take a look at the left-most expansion of Fib(4):

Fib(4) --> Fib(3) + Fib(2) --> Fib(2) + Fib(1) + Fib(1) + Fib(0) --> Fib(1) + Fib(0) + Fib(1) + Fib(1) + Fib(0)

Assume the underlined expressions have been evaluated; what's next? The bold one. What's its value? Fib(1), which we've calculated before already. See the picture? Fib is a function and if there's one thing about functions, it's that they're damn consistent over time. Give Fibonacci 10 as its input today and it will say 89. Assuming the activity of rabbits (inside joke) remains consistent over time, Fibonacci should answer 89 again tomorrow. So, what's the difference between the following two fragments?

static int OldFib(int n)
{
    if (n == 0)
        return 1;
    else if (n == 1)
        return 1;
    else
        return OldFib(n - 1) + OldFib(n - 2);
}

and

static Pattern<int> _pattern = new Pattern<int>()
       .Match(() => 0, () => 1)
       .Match(() => 1, () => 1)
       .Match((int n) => n, n => Fib(n - 1) + Fib(n - 2))
       .Else(() => { throw new Exception("Unmatched value detected."); });

static int Fib(int n)
{
    return _pattern.Execute(n);
}

They do the same thing and they look (admittedly, almost, <g>) the same, but - although both are equally inefficient because of their implementation nature (a simple loop would be much simpler and better - but remember, this post simply illustrates a concept; Fibonacci is our rabbit to bring us there) - the latter one gives us much more power than you might imagine. Functions are invariant over time, and although the imperative implementation above is a function too, there's no intelligence in the runtime that can exploit that characteristic. If you ask for Fib(10) today and you do that again tomorrow, it will get calculated twice. And as soon as you ask for Fib(12), which consists of Fib(11) and Fib(10), which in turn consist of Fib(10), Fib(9), Fib(9) and Fib(8), we'll repeat those calculations all over again. Here's a loop that calculates Fibonacci numbers 1 to 35, through OldFib and Fib respectively:

image

The algorithm is killing us and our previously assumed minor costs to call through delegates (and because of recursion calling the Execute method many times during the snowball effect) accumulate quite a bit, resulting in those bad figures. But...

 

Tip of the day: Evaluate only once

Of course it all depends on your input distribution characteristics (i.e. how data is spread and what the likelihood is of calculating the same thing multiple times) and on how pure your match bodies are (which you could enforce a bit more by making them expression trees too - why?), but by now the reader should be thinking about buzzwords like caching.
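Stripped of the pattern matcher, the "evaluate only once" idea is just memoization of a pure function. A minimal sketch (a hypothetical helper, not the implementation we're about to add to Pattern<T>):

```csharp
using System;
using System.Collections.Generic;

static class Memoizer
{
    //
    // Wraps a pure unary function with a cache; only correct when f is
    // deterministic and side-effect free.
    //
    public static Func<A, R> Memoize<A, R>(Func<A, R> f)
    {
        var cache = new Dictionary<A, R>();
        return a =>
        {
            R res;
            if (!cache.TryGetValue(a, out res))
                cache[a] = res = f(a);
            return res;
        };
    }
}
```

Note that for a recursive function like Fib the recursive calls must route through the memoized delegate too, otherwise only the outermost call benefits - exactly why baking the cache into Pattern<T>'s Execute method, as we do below, pays off.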

For the sake of the demo, I've implemented what's probably close to the worst possible cache: a simple Dictionary. Add this to Pattern<T>:

private Dictionary<object, T> _cache = new Dictionary<object, T>();

There are lots of problems with this: we don't have an aging mechanism to clear the cache, the dictionary keys hold on to objects keeping the GC from doing its job, etc. If you want to see a much better implementation of a GC-friendly dictionary, take a look at WeakHash<TKey, TValue> in Microsoft.Scripting.Utils in the DLR (ships with IronPython). Nevertheless, the Execute code will be changed like this:

public T Execute(object o)
{
    ...

    T res;
    if (_cache.TryGetValue(o, out res))
        return res;

    ...

and everywhere a value is calculated - through the compiled match, the else clause, or even through interpretation - it gets stored in the dictionary before being returned to the caller. The result is provably correct if you don't rely on your match bodies to carry out side-effects (i.e. they're pure functions), and so, with caching turned on, the naive algorithm starts to perform incredibly well:

image

even a hundred times better than our reference. Given the number of iterations carried out above (35) and the golden ratio, we can predict it would take about 25 seconds (36 * 1.6^8) for 43 iterations, which is doable to show off the dramatic improvement once more:

image

Prediction: accurate (given the fact I used 1.6 instead of the 1.61...). Result of our pattern match variant: hasn't moved a bit. Indeed, when we try to bump up the number of iterations performed, we see something like this:

image

Important: The test above no longer tests for correctness. In fact, Fibonacci overflows from input 46 on. However, the sample run still gives some idea of the number of operations carried out, which is what we really care about. Also notice the test loop clears the cache every time it restarts with a new number of iterations. Don't even dream of 1000 iterations using the recursive algorithm (that would take roughly 5.4e+196 seconds on my computer) and obviously you'd need some LargeInteger or BigNumber implementation. Of course there's the much better linear algorithm I've been referring to quite a bit (with a few obvious flaws - add a domain check, for instance):

static int TheFib(int n)
{
    checked
    {
        if (n <= 1)
            return 1;

        int prev = 1;
        int curr = 1;

        for (int i = 2; i <= n; i++)
        {
            int t = curr;
            curr += prev;
            prev = t;
        }

        return curr;
    }
}

In the light of the overflow checking, notice the use of the checked keyword, which by the way you can use in our pattern match:

static Pattern<int> _pattern = new Pattern<int>()
       .Match(() => 0, () => 1)
       .Match(() => 1, () => 1)
       .Match((int n) => n, n => checked(Fib(n - 1) + Fib(n - 2)))
       .Else(() => { throw new Exception("Unmatched value detected."); });

I'll leave it to the reader to go and figure out whether or not the difference in performance between the imperative and recursive + cached implementation is still relevant at this point (and obviously there are pros and cons)... However, if I gave you the imperative fragment above without the method signature and asked you what it does, would you be able to state it's Fibonacci in an imperative carnival costume?

 

Enough S (K (S I I)) (S (S (K S) K) (K (S I I))) for tonight. Next time we'll match plain objects again with the MemberInitExpression.


Welcome back beloved pattern matchers! In our last encounter we moved from closed to open pattern match objects in order to allow for reuse, improving the performance of our pattern matcher quite a bit. But we're not done yet: today we'll build upon our open pattern match and provide a way to pre-compile it into highly efficient code. Such pre-compilation is often seen in similar scenarios such as regular expression matching (see System.Text.RegularExpressions.Regex) and can offer huge improvements.

 

Revisiting the MatchEntry

Previously, we accumulated information about match clauses and bodies in a list of MatchEntry objects which simply held the clause and the body for later use in TryMatch and Evaluate respectively. This very MatchEntry class will serve as our hook to optimize each individual match entry, turning it from interpreter-based evaluation into pre-compiled evaluation. By doing so, we'll get rid of the huge cost seen before to evaluate a match, largely dominated by reflection costs:

image

The remaining question is how. Here's the plan: add a Compile method to the MatchEntry class that turns the match clause into a delegate that takes in an object and returns a bool. At the same time, create a mapper delegate that takes in the object and returns an object array containing all extracted bindings, ready to be fed to the match body. When, during pattern match evaluation, we encounter such a pre-compiled MatchEntry, we can simply invoke the delegates to check the clause and produce the result. Here's the original Execute method:

public T Execute(object o)
{
    if (Object != null)
        throw new InvalidOperationException("Execution is only valid on unbound pattern match objects.");

    if (_else == null)
        throw new InvalidOperationException("Can't evaluate the match. Incomplete match object. Are you missing an Else clause?");

    foreach (MatchEntry entry in _matches)
    {
        Dictionary<ParameterExpression, PropertyInfo> bindings;
        if (TryMatch(o, entry.Match, out bindings))
        {
            return Evaluate(o, entry.Match, entry.Action, bindings);
        }
    }

    return _else();
}

All we'd like to do is extend it as follows:

public T Execute(object o)
{
    if (Object != null)
        throw new InvalidOperationException("Execution is only valid on unbound pattern match objects.");

    if (_else == null)
        throw new InvalidOperationException("Can't evaluate the match. Incomplete match object. Are you missing an Else clause?");

    foreach (MatchEntry entry in _matches)
    {
        if (entry.CompiledMatch != null)
        {
            if (entry.CompiledMatch(o))
            {
                return (T)entry.Action.DynamicInvoke(entry.CompiledMapper(o));
            }
        }
        else
        {

            Dictionary<ParameterExpression, PropertyInfo> bindings;
            if (TryMatch(o, entry.Match, out bindings))
            {
                return Evaluate(o, entry.Match, entry.Action, bindings);
            }
        }
    }

    return _else();
}

Let's take a look at the flow of operations. CompiledMatch and CompiledMapper aren't methods, they are simply delegates:

internal class MatchEntry
{
   public LambdaExpression Match { get; set; }
   public Delegate Action { get; set; }

   public Func<object, bool> CompiledMatch { get; set; }
   public Func<object, object[]> CompiledMapper { get; set; }


   ...
}

Basically we test whether or not the supplied object passes the match clause test, which incorporates the unification algorithm including the type check and matching of individual property values. If it passes, we extract the unbound parameters into an object array, produced by the compiled mapper and fed into the match body delegate through DynamicInvoke. Nothing mind-blowing.
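To make the flow concrete, here's what the two compiled delegates effectively amount to for a clause like (string name) => new Person(name, 25) - a hand-written sketch, assuming a simple Person type with Name and Age properties (the real one in this series carries constructor mapping attributes):

```csharp
using System;

class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

class CompiledDelegatesSketch
{
    static void Main()
    {
        // Hand-written equivalents of what Compile will produce:
        // the clause test (type check + constant checks)...
        Func<object, bool> compiledMatch =
            o => o is Person && object.Equals(((Person)o).Age, 25);

        // ...and the mapper extracting the unbound parameters.
        Func<object, object[]> compiledMapper =
            o => new object[] { ((Person)o).Name };

        object subject = new Person { Name = "Bart", Age = 25 };
        if (compiledMatch(subject))
            Console.WriteLine(compiledMapper(subject)[0]); // the bound "name"
    }
}
```

The match body delegate then receives the mapper's array through DynamicInvoke, exactly as in the extended Execute method above.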

 

Match clause compilation

The match compilation follows the same skeleton as the original unification algorithm but instead of doing the evaluation straight away, it simply builds the plan to do so later when executing the open pattern match on a given object. The code will therefore look familiar and ideally it gets unified with the eager evaluation loop. Indeed, in the original code that lies at the foundation of this blog series, the (much larger) unification algorithm is shared with different back-ends plugging in to it: an eager interpreter and a couple of code-emitters of which this is one.

What does a match clause look like in its compiled form? Somewhat like this:

Person p;
if ((p = o as Person) != null && p.Age == 25)
   // body

Of course the number of equality checks (or in a broader sense, unification checks) is unbounded but what's important here is:

  1. The type check
  2. Fail-early semantics due to the use of && short-circuiting

This gives us the pattern for a match clause compilation, so let's present this one first:

internal void Compile()
{
    //
    // Only support for NewExpression.
    //

    NewExpression ne = Match.Body as NewExpression;
    if (ne == null)
        throw new NotSupportedException("Match clause can only contain NewExpression.");

    //
    // Lambda input parameter. 
    //
    ParameterExpression o = Expression.Parameter(typeof(object), "o");


    //
    // Created target type.
    //
    Type target = ne.Constructor.DeclaringType;

    //
    // Check the type of the object against the expected type.
    //
    Expression check = GetTypeCheck(o, target);


    //
    // List of bindings.
    //
    var bindings = new Dictionary<ParameterExpression, PropertyInfo>();

    //
    // Resolve parameters.
    //
    int i = 0;
    foreach (ParameterInfo param in ne.Constructor.GetParameters())
    {
        //
        // Mapping information from constructor parameters to properties required.
        //
        PropertyAttribute pa = param.GetCustomAttributes(typeof(PropertyAttribute), false).Cast<PropertyAttribute>().SingleOrDefault();
        if (pa == null)
            throw new InvalidOperationException("Input object doesn't have required mapping information.");

        //
        // Find the property.
        //
        PropertyInfo property = target.GetProperty(pa.Name);
        if (property == null)
            throw new InvalidOperationException(String.Format("Property {0} on type {1} not found.", pa.Name, target.Name));

        ConstantExpression ce;
        ParameterExpression pe; 

        //
        // Expression should either be a constant or a parameter reference.
        //
        Expression arg = ne.Arguments[i++];
        if ((ce = arg as ConstantExpression) != null)
        {
            //
            // Check the object's property matches the constant.
            //
            check = Expression.AndAlso(
                check,
                GetEqualityCheck(o, target, property, ce)
            );
        }
        else if ((pe = arg as ParameterExpression) != null)
        {
            //
            // Keep track of the loose parameter.
            //
            bindings[pe] = property;
        }
        else
            throw new NotSupportedException("Can only match constants.");
    }

    //
    // Compile match predicate.
    //
    CompiledMatch = Expression.Lambda<Func<object, bool>>(check, o).Compile();

Only the bold portions have changed, so most is familiar. Let's analyze the new approach step-by-step, starting from the bottom:

    //
    // Compile match predicate.
    //
    CompiledMatch = Expression.Lambda<Func<object, bool>>(check, o).Compile();

Here we use maybe the single most powerful method in .NET Framework 3.5's System.Core.dll assembly: LambdaExpression.Compile. We simply have an expression tree representing a check that produces a Boolean, wrap it inside a lambda expression with one parameter called o, and ask the runtime to compile it into efficient IL code. Actually we take the lazy route and let the framework do all the heavy lifting to get our check compiled. The only remaining thing for us to do is to create the check expression tree corresponding to the match clause by means of (simplified) unification. It all starts here, by declaring the input object lambda parameter:

    //
    // Lambda input parameter. 
    //
    ParameterExpression o = Expression.Parameter(typeof(object), "o");

This needs to happen upfront since we're going to refer to it quite often, e.g. to emit the type check (along the lines of "o is Person") or to emit the matching conditions (e.g. ((Person)o).Age == 25). First is our type check:

    //
    // Check the type of the object against the expected type.
    //
    Expression check = GetTypeCheck(o, target);

which gets implemented as follows:

private static Expression GetTypeCheck(ParameterExpression o, Type t)
{
    return Expression.TypeIs(o, t);
}

TypeIs corresponds to the is keyword in C#, comparing the passed-in object's type against the declaring type of the match clause's constructor. For example, take this match clause:

(string name) => new Person(name, 25)

The corresponding type check becomes:

o is Person

Obviously, code this short shouldn't live in its own method, but in case you want stricter type checks (e.g. ones that ignore inheritance), this gives you room to play. If we were implementing just a type switch, we'd be done with the "unification" here. However, we need to emit more equality checks:

            //
            // Check the object's property matches the constant.
            //
            check = Expression.AndAlso(
                check,
                GetEqualityCheck(o, target, property, ce)
            );

As mentioned before, we only consider constants as comparable to keep the unification simple. What's interesting about the code above is the fact we accumulate all of the required conditions into one (giant) expression tree using AndAlso. This type of expression corresponds to the && short-circuiting Boolean and operator which is exactly what we want. Here's the GetEqualityCheck method:

private static MethodCallExpression GetEqualityCheck(ParameterExpression o, Type target, PropertyInfo property, ConstantExpression ce)
{
    return Expression.Call(
        typeof(object),
        "Equals",
        new Type[0],
        Expression.Convert(
            Expression.Property(
                Expression.Convert(
                    o,
                    target
                ),
                property
            ),
            typeof(object)
        ),
        Expression.Convert(
            ce,
            typeof(object)
        )
    );
}

Basically we call Object.Equals, passing in the value from the property on the passed-in object and the constant expression we found in the original expression tree. The Convert nodes are required to make the method binding work in the expression compiler, so we simply cast both inputs to System.Object. For our running sample, the resulting expression tree is:

image

 

Match clause mapper compilation

We're not done with the compilation yet. As the bridge between the match clause and the match body, we need a mapper that extracts the values bound to the match parameters:

(string name) => new Person(name, 25)

in order to feed those values into the match body. We had this piece of magic before in our unifier's Evaluate method. Here we just add it to the Compile method:

    //
    // Create list of unbound parameters.
    //
    Expression[] args = new Expression[Match.Parameters.Count];
    int j = 0;
    foreach (ParameterExpression param in Match.Parameters)
    {
        //
        // Parameters need to be bound in the match expression.
        //
        PropertyInfo property;
        if (!bindings.TryGetValue(param, out property))
            throw new InvalidOperationException("Parameter " + param.Name + " was not bound in the pattern match.");

        //
        // Expression to grab value from property.
        //
        args[j++] = Expression.Convert(
                        Expression.Property(
                            Expression.Convert(
                                o,
                                target
                            ),
                            property
                        ),
                        typeof(object)
                    );
    }

    //
    // Compile mapper function.
    //
    CompiledMapper = Expression.Lambda<Func<object, object[]>>(Expression.NewArrayInit(typeof(object), args), o).Compile();
}

This code is equally straightforward: it creates a lambda expression that maps the parameter o, representing the input object, onto an array of extracted property values, represented by a NewArrayInit node consisting of a series of Property nodes (which, again, need some conversion).

Again for our running sample, this is what happens:

image

 

Exposing compilation on the pattern object

Trivial plumbing in fact. Just add the following to Pattern<T>:

public void Compile()
{
    if (Object != null)
        throw new InvalidOperationException("Compilation is only valid on unbound pattern match objects.");

    if (_else == null)
        throw new InvalidOperationException("Can't compile the match. Incomplete match object. Are you missing an Else clause?");

    foreach (MatchEntry entry in _matches)
        if (entry.CompiledMatch == null)
            entry.Compile();
}

Alternatively, we could do the compilation as we add match entries to the list. That would have its merits, but we'll keep it as shown above: with multiple back-ends in the compilation mechanism, this Compile method has the advantage of a global view across the individual matches. For example, if multiple clauses test for a Person object, we could get away with a single Person type check, turning the code into a TypeAs that attempts the conversion to Person only once, followed by null checks.
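In C# terms, that cross-clause optimization would replace repeated is checks with a single as conversion - a sketch of the shape such generated code could take, assuming two clauses both matching the Person type from the running sample:

```csharp
static string Describe(object o)
{
    //
    // One conversion attempt shared by all Person clauses, followed by
    // cheap null and property checks per clause.
    //
    Person p = o as Person;
    if (p != null)
    {
        if (p.Age == 25)
            return "first clause";
        if (p.Age == 30)
            return "second clause";
    }
    return "else clause";
}
```

Our per-entry Compile method can't emit this, since it never sees more than one clause at a time; only a pattern-wide Compile can.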

That's it: we've succeeded in compiling the match clause and match body, hopefully giving us a mind-blowing performance boost.

 

Drums please ... Please welcome Mr. Performance

Fasten your seatbelts and add the following to the benchmark:

pattern.Compile();

var res3 = persons.Select(p => pattern.Execute(p));

watch.Reset();
watch.Start();

foreach (var p in res3)
    ;

watch.Stop();
Console.WriteLine(watch.Elapsed + " " + watch.ElapsedMilliseconds / baseMs);

Debug.Assert(baseline.SequenceEqual(res3));

One line of difference, but what to expect from the performance... Here are the results:

image

Can't believe it? You should! Only one order of magnitude slower than the imperative baseline, and given the absolute numbers, that's workable. Time for some measurements in CLR Profiler again. We should be at ease with the memory footprint (exercise: confirm), so let's skip that one and rather look at the allocation graph:

image

Important to realize here is that the Compile phase is a one-time cost, which should be O(n) in the number of match clauses. The remaining portion is clearly the iteration itself, and the allocations happening there should be purely related to the delegate invocations. Indeed, here it is:

image

But what about the call tree? Tada...

image

Look at that - 12 humble calls per iteration (UPDATE: I originally counted 14, missing by two lines), allocating three objects. This is obviously just one iteration, and more specifically one that didn't produce a match. In case of a match, we pay the price of going through the mapper, which involves a quite expensive DynamicInvoke call:

image

This is a steady-state DynamicInvoke call; the initial invocation through the delegate is more expensive, but information gets cached in the RuntimeTypeCache. However, compared to where we came from, we've reduced the number of calls per iteration from 485 to 12 (non-match). The probability of having a match (and obviously, each match clause has a cost) will largely influence the overall cost of the execution, since a match requires the mapper to kick in through the reflection call. In our run, we had an 11% chance of finding a match (random numbers between 1 and 9, of which only Age == 5 was considered a match), so if you boost that probability you'll see those costs influence the performance negatively. For example, change the benchmark to generate random numbers between 4 and 5 (remember the second parameter to Random.Next takes the exclusive upper bound, so specify 4 and 6) - which yields a match probability of 50% - and you'll see the overall execution time climb to 60 ms (about 4 times slower).

A detailed performance analysis would drift us off track but you get the idea of how to measure and interpret. We've found the target for our next optimization.

 

Optimizing the call

So, is there something we can do about this DynamicInvoke call? Sure: make it a plain delegate call. How? InvocationExpression comes to the rescue. We'll change the last part of MatchEntry.Compile as follows:

    //
    // Create list of unbound parameters.
    //
    Expression[] args = new Expression[Match.Parameters.Count];
    int j = 0;
    foreach (ParameterExpression param in Match.Parameters)
    {
        //
        // Parameters need to be bound in the match expression.
        //
        PropertyInfo property;
        if (!bindings.TryGetValue(param, out property))
            throw new InvalidOperationException("Parameter " + param.Name + " was not bound in the pattern match.");

        //
        // Expression to grab value from property.
        //
        args[j++] = Expression.Property(
                        Expression.Convert(
                            o,
                            target
                        ),
                        property
                    );
    }

    //
    // Compile invoker function.
    //
    Expression invoker = Expression.Invoke(Expression.Constant(Action), args);
    CompiledInvoker = Expression.Lambda<Func<object, T>>(invoker, o).Compile();
}

Make sure MatchEntry is a nested class in Pattern<T>, since the new CompiledInvoker property (a Func<object, T>) references the generic parameter T to denote the result type. But man, how simple this fix is once more :-). First of all, we no longer cast the parameters to System.Object. Second, we got rid of the array initialization. Third, we simply invoke the action delegate (which is kept in the MatchEntry as a property).

The only thing left to change is the Execute method on Pattern<T>:

public T Execute(object o)
{
    if (Object != null)
        throw new InvalidOperationException("Execution is only valid on unbound pattern match objects.");

    if (_else == null)
        throw new InvalidOperationException("Can't evaluate the match. Incomplete match object. Are you missing an Else clause?");

    foreach (MatchEntry entry in _matches)
    {
        if (entry.CompiledMatch != null)
        {
            if (entry.CompiledMatch(o))
            {
                return entry.CompiledInvoker(o);
            }
        }
        else
        {
            Dictionary<ParameterExpression, PropertyInfo> bindings;
            if (TryMatch(o, entry.Match, out bindings))
            {
                return Evaluate(o, entry.Match, entry.Action, bindings);
            }
        }
    }

    return _else();
}

That's it. You can guess by now: benchmark time!

image

Doesn't need further comments I guess... Below are the results of Call Tree analysis using CLR Profiler:

image

In red, you can see a match: <Benchmark>b__3(String) is the anonymous method for the Match clause, taking in the bound parameter and returning a bool. Indicated in blue is a mismatch: <Benchmark>b__2() is the anonymous method for the Else clause. The mismatch case is still the same compared to the previous run: 12 calls made. But the match case has improved dramatically, all the way down from 127 to 11. The reason the mismatch case takes one call more than the match case is the additional MoveNext call caused by the foreach in the Execute method.

Could we improve even more? Yes - for example, calls to Object::Equals can be eliminated in certain cases, replaced by efficient equality checks for built-in types like Int32 and Boolean. Is it worth it? No.
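For what it's worth, such a specialization would only touch the equality-check emission: when the property and the constant share a primitive type, emit a direct Equal node instead of the boxed Object.Equals call. A sketch against the names used in Compile above (GetEqualityCheckFast is a hypothetical name):

```csharp
//
// Sketch: prefer a primitive equality node (compiling down to ceq) over
// the boxed Object.Equals call when both sides share a primitive type
// such as Int32 or Boolean.
//
private static Expression GetEqualityCheckFast(ParameterExpression o, Type target, PropertyInfo property, ConstantExpression ce)
{
    Expression prop = Expression.Property(Expression.Convert(o, target), property);
    if (property.PropertyType.IsPrimitive && property.PropertyType == ce.Type)
        return Expression.Equal(prop, ce);

    // Fall back to the general boxed comparison shown earlier.
    return GetEqualityCheck(o, target, property, ce);
}
```

Given the measurements above, though, the extra complexity wouldn't buy us anything noticeable.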

 

Coming up in this series: extending the unification engine, a detour through Reflection.Emit and who knows what more :-).

