January 2010 - Posts

Slightly over two years after arriving here in Redmond to work on the WPF team, time has come for me to make a switch and pursue other opportunities within the company. Starting January 13th, I’ll be working on the SQL Cloud Data Programmability Team on various projects related to democratizing the cloud. While we have much more rabbits sitting in our magician hats, Rx is the first big deliverable we’re working on.

For my blog, there won’t be much change as I’ve always written on topics related to what I’ll be working on: language innovation, data access, LINQ, type systems, lambda fun, etc. I’m planning to stay committed to blogging and other evangelism activities, including speaking engagements from time to time, so feel free to ping me if I’m in your proximity (or if you’re visiting our campus). Next up and confirmed are TechDays “low lands” in Belgium and the Netherlands, end of March.

Needless to say, I’m thrilled to have this opportunity of working together with a relatively small group of smart and passionate people, on the things I’d spend all my free time on anyway. Having this one-to-one alignment between day-to-day professional activities at work and all sorts of free time hacking projects is like a dream coming true. Thanks Danny, Erik, Jeffrey, Mark and Wes for taking me on board.

Expect to see more Rx blogging love over here, and watch out for more goodness to come your way in the foreseeable future. In the meantime, check out the following resources on the matter:

Please keep the feedback on Rx coming: help us, help you!

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

With the recent release of the Reactive Extensions for .NET (Rx) on DevLabs, you’ll hear quite a bit about reactive programming, based on the IObservable<T> and IObserver<T> interfaces. A great amount of resources is available on Channel 9. In this series, I’ll focus on the dual of the System.Reactive assembly, which is System.Interactive, providing a bunch of extensions to the LINQ Standard Query Operators for IEnumerable<T>. In today’s installment we’ll talk about EnumerableEx’s facilities to tame side-effects in a functionally inspired manner:

image

 

To side effect or not to side effect?

Being rooted in query comprehensions as seen in various functional programming languages (including (the) pure one(s)), one would expect LINQ to have a very functional basis. Indeed it has, but being hosted in various not functionally pure languages like C# and Visual Basic, odds are off reasoning about side-effects in a meaningful and doable manner. As we’ve seen before, when talking about the Do and Run operators, it’s perfectly possible for a query to exhibit side-effects during iteration. You don’t even have to look that far, since every lambda passed to a query operator is an opportunity of introducing effects. The delayed execution nature of LINQ queries makes that those effects appear at the point of query execution. So far, nothing new.

So, the philosophical question ought to be whether or not we should embrace side-effects or go for absolute purity. While the latter would be preferable for various reasons, it’s not enforceable through the hosting languages for LINQ, so maybe we should exploit side-effects if we really want to do so. The flip side of this train of thought is that those side-effects could come and get us if we’re not careful, especially when queries get executed multiple times, potentially as part of a bigger query. In such a case, you’d likely not want effects to be duplicated. Below is a sample of such a problematic query expression:

var xrs = EnumerableEx.Generate(new Random(), rnd => EnumerableEx.Return(rnd.Next(100)), /* iterate */ _ => _);
xrs.Zip(xrs, (l, r) => l + r).Take(10).Run(Console.WriteLine);

Using Generate, we generate a sequence of random numbers. Recall the first argument is the state of the anamorphic Generate operator, which we get passed in the lambdas following it: once to produce an output sequence (just a single random number in our case) and once to iterate (just keeping the same random number generator here). What’s more important is we’re relying on the side-effect of reading the random number generator which, as the name implies, provides random answers to the Next inquiry every time it gets called. In essence, the side-effect can (not) be seen by looking at the signature of Random.Next, which says it returns an int. In .NET this means the method may return the same int every time it gets called, but there are no guarantees whatsoever (as there would be in pure functional programming languages).

This side-effect, innocent and intentional as it may seem, comes and gets us if we perform a Zip on the sequence with itself. Since Zip iterates both sides, we’re really triggering separate enumeration (“GetEnumerator”) over the same sequence two times. Though it’s the same sequence object, each of its iterations will produce different results. As a result, the expected invariant of the Zip’s output being only even numbers (based on the assumption l and r would be the same as they’re produced by the same sequence) doesn’t hold:

52
114
112
103
41
135

78
114
59
137

While random number generation is a pretty innocent side-effect, not having it under control properly can lead to unexpected results as shown above. We can visualize this nicely using another side-effect introduced by Do:

var xrs = EnumerableEx.Generate(new Random(), rnd => EnumerableEx.Return(rnd.Next(100)), /* iterate */ _ => _)
    .Do(xr => Console.WriteLine("! -> " + xr));
xrs.Zip(xrs, (l, r) => l + r).Take(10).Run(Console.WriteLine);

This will print a message for every number flowing out of the random number generating sequence, as shown below:

! -> 97
! -> 78
175
! -> 11
! -> 6
17
! -> 40
! -> 17
57
! -> 92
! -> 63
155
! -> 70
! -> 13
83
! -> 41
! -> 1
42
! -> 64
! -> 76
140
! -> 30
! -> 71
101
! -> 1
! -> 81
82
! -> 65
! -> 45
110

If we look a bit further to the original query, we come to the conclusion we can’t apply any form of equational reasoning anymore: it seems that the common subexpression “xrs” is not “equal” (as in exposing the same results) in both use sites. The immediate reason in the case of LINQ is the delayed execution, which is a good thing as our Generate call produces an infinite sequence. More broadly, it’s the side-effect that lies at the heart of the problem as equational reasoning breaks down in such a setting. For that very reason, side-effect permitting languages have a much harder time carrying out optimizations to code and need to be very strict about specifying the order in which operations are performed (e.g. in C#, arguments to a method call – which is always “call-by-value” – are evaluated in a left-to-right order).

Moving Take(10) up doesn’t change the delayed characteristic either:

var xrs = EnumerableEx.Generate(new Random(), rnd => EnumerableEx.Return(rnd.Next(100)), /* iterate */ _ => _)
    .Take(10)
    .Do(xr => Console.WriteLine("! -> " + xr));
xrs.Zip(xrs, (l, r) => l + r).Run(Console.WriteLine);

What would help is forcing the common subexpression’s query to execute, persisting (= caching) its results in memory, before feeding them in to the expression using it multiple times:

var xrs = EnumerableEx.Generate(new Random(), rnd => EnumerableEx.Return(rnd.Next(100)), /* iterate */ _ => _)
    .Take(10).ToArray()
    .Do(xr => Console.WriteLine("! -> " + xr));
xrs.Zip(xrs, (l, r) => l + r).Run(Console.WriteLine);

Don’t forget the Take(10) call though, as calling ToArray (or ToList) on an infinite sequence is not quite advised on today’s machines with finite amounts of memory. It’s clear such hacking is quite brittle and it breaks the delayed execution nature of the query expression. In other words, you can’t really hand out the resulting expression to a caller for it to call when it needs results (if it ever does). We’re too eager about evaluating (part of) the query, just to be able to tame the side-effect:

var xrs = EnumerableEx.Generate(new Random(), rnd => EnumerableEx.Return(rnd.Next(100)), /* iterate */ _ => _)
    .Take(10).ToArray();

var randomEvens = xrs.Zip(xrs, (l, r) => l + r);
// What if the consumer of randomEvens expects different results on each enumeration... Hard cheese! 
randomEvens.Run(Console.WriteLine);
randomEvens.Run(Console.WriteLine);

It’s clear that we need some more tools in our toolbox to tame desired side-effects when needed. That’s exactly what this post focuses on.

 

Option 1: Do nothing with Let

A first way to approach side-effects is to embrace them as-is. We just allow multiple enumerations of the same sequence to yield different results (or more generally, replicate side-effects). However, we can provide a bit more syntactical convenience in writing queries that reuse the same common subexpression in multiple places. In the above, we had to introduce an intermediate variable to store the common expression in, ready for reuse further on:

var xrs = EnumerableEx.Generate(new Random(), rnd => EnumerableEx.Return(rnd.Next(100)), /* iterate */ _ => _);
xrs.Zip(xrs, (l, r) => l + r).Take(10).Run(Console.WriteLine);

Can’t we somehow write this more fluently? The answer is yes, using the Let operator which passes its left-hand side to a lambda expression that can potentially use it multiple times:

EnumerableEx.Generate(new Random(), rnd => EnumerableEx.Return(rnd.Next(100)), /* iterate */ _ => _)
    .Let(xrs => xrs.Zip(xrs, (l, r) => l + r)).Take(10).Run(Console.WriteLine);

You can guess the signature of Let just by looking at the use above, but let’s include it for completeness:

public static IEnumerable<TResult> Let<TSource, TResult>(this IEnumerable<TSource> source, Func<IEnumerable<TSource>, IEnumerable<TResult>> function);

Because of the call-by-value nature of the languages we’re talking about, the expression used for the source parameter will be fully evaluated (not the same as enumerated!) before Let gets called, so we can feed it (again in a call-by-value manner) to the function which then can refer to it multiple times by means of its lambda expression parameter (in the sample above this is “xrs”). Let comes from the world of functional languages where it takes the following form:

let x = y in z

means (in C#-ish syntax)

(x => z)(y)

In other words, there’s a hidden function x => z sitting in a let-expression and the “value” for x (which is y in the sample) gets passed to it, providing the result for the entire let-expression. In EnumerableEx.Let, the function is clear as the second parameter, and the role of “y” is fulfilled by the source parameter. One could create a Let-form for any object as follows (not recommended because of the unrestricted extension method):

public static R Let<T, R>(this T t, Func<T, R> f)
{
    return f(t);
}

With this, you can write things like this:

Console.WriteLine(DateTime.Now.Let(x => x - x).Ticks);

This will print 0 ticks for sure, since the same DateTime.Now is used for x on both sides of the subtraction. If we were to expand this expression by substituting DateTime.Now for x, we’d get something different due to the duplicate evaluation of DateTime.Now, exposing the side-effect of reading from the system clock:

Console.WriteLine((DateTime.Now - DateTime.Now).Ticks);

(Pop quiz: What sign will the above Ticks result have? Is it possible for the above to return 0 sometimes?)

 

Option 2: Cache on demand a.k.a. MemoizeAll

As we’ve seen before, on way to get rid of the side-effect replication is by forcing eager evaluation of the sequence through operators like ToArray or ToList. However, those are a bit too eager in various ways:

  • They persist the whole sequence, which won’t work for infinite sequences.
  • They do so on the spot, i.e. the eagerness can’t be delayed till a later point (‘on demand”).

The last problem can be worked around using the Defer operator, but the first one is still a problem requiring another operator. Both those things are what MemoizeAll provides for, essentially persisting the sequence bit-by-bit upon consumption. This is achieved by exposing the enumerable while only maintaining a single enumerator to its source:

image

In the figure above, this is illustrated. Red indicates a fetch operation where the original source’s iterator makes progress as an element is requested that hasn’t been fetched before. Green indicates persisted (cached, memoized) objects. Gray indicates elements in the source that have been fetched and hence belong to the past from the (single) source-iterators point of view: MemoizeAll won’t ever request those again. Applying this operator to our running sample using Zip will produce results with the expected invariant:

var xrs = EnumerableEx.Generate(new Random(), rnd => EnumerableEx.Return(rnd.Next(100)), /* iterate */ _ => _)
    .Do(xr => Console.WriteLine("! -> " + xr))
    .MemoizeAll();
xrs.Do(xr => Console.WriteLine("L -> " + xr)).Zip(
    xrs.Do(xr => Console.WriteLine("R -> " + xr)),
    (l, r) => l + r).Take(10).Run(Console.WriteLine);

Now we’ll see the xrs-triggered Do messages being printed only 10 times since the same element will be consumed by the two uses of xrs within Zip. The result looks as follows, showing how the right consumer of Zip never causes a fetch back to the random number generating source due to the internal caching by MemoizeAll:

! -> 71
L -> 71
R -> 71
142
! -> 18
L -> 18
R -> 18
36
! -> 12
L -> 12
R -> 12
24
! -> 96
L -> 96
R -> 96
192
! -> 1
L -> 1
R -> 1
2
! -> 54
L -> 54
R -> 54
108
! -> 9
L -> 9
R -> 9
18
! -> 87
L -> 87
R -> 87
174
! -> 18
L -> 18
R -> 18
36
! -> 12
L -> 12
R -> 12
24

What about lifetime of the source’s single enumerator? As soon as one of the consumers reaches the end of the underlying sequence, we got all elements cached and are prepared to any possible inquiry for elements on the output side of the MemoizeAll operator, hence it’s possible to dispose of the original enumerator. It should also be noted that memoization operators use materialization internally to capture the behavior of the sequence to expose to all consumers. This means exceptions are captured as Notification<T> so they’re repeatable to all consumers:

var xes = EnumerableEx.Throw<int>(new Exception()).StartWith(1).MemoizeAll();
xes.Catch((Exception _) => EnumerableEx.Return(42)).Run(Console.WriteLine);
xes.Catch((Exception _) => EnumerableEx.Return(42)).Run(Console.WriteLine);

The above will therefore print 1, 42 twice. In other words, the source blowing up during fetching by MemoizeAll doesn’t terminate other consumers that haven’t reached the faulty state yet (but if they iterate long enough, they’ll eventually see it exactly as the original consumer did).

Finally, what’s All about MemoizeAll? In short: the cache used by the operator can grow infinitely large. The difference compared to ToArray and ToList has been explained before, but it’s worth repeating it: MemoizeAll doesn’t fetch its source’s results on the spot but only makes progress through the source’s enumerator when one of the consumers requests an element that hasn’t been retrieved yet. Call it a piecemeal ToList if you want.

 

Option 3: Memoize, but less “conservative”

While MemoizeAll does the trick to avoid repetition of side-effects, it’s quite conservative in its caching as it never throws away elements it has retrieved. You never know whether someone – like a slow enumerator or a whole new enumerator over the memoized result – will request the data again, so a general-purpose Memoize can’t throw away a thing. However, if you know the behavior of the consumers of the memoized source, you can be more efficient about it and use Memoize specifying a buffer size. In our running sample of Zip we know that both uses of the source for the left and right inputs to Zip will be enumerated at the same pace, so it suffices to keep the last element in the cache in order for the right enumerator to be able to see the element the left enumerator just saw. Memoize with buffer size 1 does exactly that:

var xrs = EnumerableEx.Generate(new Random(), rnd => EnumerableEx.Return(rnd.Next(100)), /* iterate */ _ => _)
    .Do(xr => Console.WriteLine("! -> " + xr))
    .Memoize(1);
xrs.Do(xr => Console.WriteLine("L -> " + xr)).Zip(
    xrs.Do(xr => Console.WriteLine("R -> " + xr)),
    (l, r) => l + r).Take(10).Run(Console.WriteLine);

In pictures, this looks as follows:

image

Another valid buffer size – also the default – is zero. It’s left to the reader, as an exercise, to come up with a plausible theory for what that one’s behavior should be and to depict this case graphically.

(Question: Would it be possible to provide a “smart” memoization operator that knows exactly when it can abandon items in the front of its cache? Why (not)?)

 

Derived forms

The difference between Let and the Memoize operators is that the former feeds in a view on an IEnumerable<T> source to a function, allowing that one to refer to the source multiple times in the act of producing a source in return. Let is, as we saw, nothing but fancy function application in a “fluent” left-to-right dataflowy way. Derived forms of Memoize exist that have the same form where a function is fed a memoized data source:

  • Replay is Memoize on steroids
  • Publish is MemoizeAll on steroids

The following snippets show just what those operators do (modulo parameter checks):

public static IEnumerable<TResult> Publish<TSource, TResult>(this IEnumerable<TSource> source,
Func<IEnumerable<TSource>, IEnumerable<TResult>> function) { return function(source.MemoizeAll()); } public static IEnumerable<TResult> Publish<TSource, TResult>(this IEnumerable<TSource> source,
Func<IEnumerable<TSource>, IEnumerable<TResult>> function,
TSource initialValue) { return function(source.MemoizeAll().StartWith(initialValue)); } public static IEnumerable<TResult> Replay<TSource, TResult>(this IEnumerable<TSource> source,
Func<IEnumerable<TSource>, IEnumerable<TResult>> function) { return function(source.Memoize()); } public static IEnumerable<TResult> Replay<TSource, TResult>(this IEnumerable<TSource> source,
Func<IEnumerable<TSource>, IEnumerable<TResult>> function,
int bufferSize) { return function(source.Memoize(bufferSize)); }

So we could rewrite our Zip sample in a variety of ways, the following being the cleanest one-sized buffer variant:

EnumerableEx.Generate(new Random(), rnd => EnumerableEx.Return(rnd.Next(100)), /* iterate */ _ => _)
    .Do(xr => Console.WriteLine("! -> " + xr))
    .Replay(xrs => xrs.Do(xr => Console.WriteLine("L -> " + xr)).Zip(
                   xrs.Do(xr => Console.WriteLine("R -> " + xr)),
                   (l, r) => l + r),
            1)
    .Take(10).Run(Console.WriteLine);

 

Option 4: Fair (?) sharing with Share and Prune

The Share operator shares an IEnumerator<T> for any number of consumers of an IEnumerable<T>, hence avoiding duplication of side-effects. In addition, it also guarantees that no two consumers can see the same element, so in effect the Share operator has the potential of distributing elements across different consumers. Looking at it from another angle, one consumer can steal elements from the source, preventing another consumer from seeing it. Prune is derived from Share as follows:

public static IEnumerable<TResult> Prune<TSource, TResult>(this IEnumerable<TSource> source,
Func<IEnumerable<TSource>, IEnumerable<TResult>> function) { return function(source.Share()); }

The naming for Prune follows from the effect consumers inside the function have on the sequence being shared: each one consuming data effectively prunes elements from the head of the sequence, so that others cannot see those anymore. An example is shown below, showing another way a Zip could go wrong (practical scenarios for this operator would involve sharing) since the left and right consumers both advance the cursor of the same shared enumerator under the hood:

EnumerableEx.Generate(new Random(), rnd => EnumerableEx.Return(rnd.Next(100)), /* iterate */ _ => _)
    .Do(xr => Console.WriteLine("! -> " + xr))
    .Prune(xrs => xrs.Do(xr => Console.WriteLine("L -> " + xr)).Zip(
                   xrs.Do(xr => Console.WriteLine("R -> " + xr)),
                   (l, r) => l + r))
    .Take(10).Run(Console.WriteLine);

The result of this is of interest since the logging will reveal the sharing characteristic. Looking at the first Do’s output we’ll see it gets triggered by any consumer on the inside of Prune:

! -> 37
L -> 37
! -> 51
R -> 51
88
! -> 98
L -> 98
! -> 89
R -> 89
187
! -> 4
L -> 4
! -> 71
R -> 71
75
! -> 43
L -> 43
! -> 30
R -> 30
73
! -> 18
L -> 18
! -> 24
R -> 24
42
! -> 17
L -> 17
! -> 41
R -> 41
58
! -> 45
L -> 45
! -> 68
R -> 68
113
! -> 83
L -> 83
! -> 53
R -> 53
136
! -> 64
L -> 64
! -> 69
R -> 69
133
! -> 0
L -> 0
! -> 22
R -> 22
22

In pictures, this looks as follows:

image

Exercise: Can you guess how Memoize(0) differs from Share?

Quiz: What should be the behavior of the following fragment? (Tip: you got to know what two from clauses result in and how they execute)

Enumerable.Range(0, 10)
.Prune(xs => from x in xs.Zip(xs, (l, r) => l + r) from y in xs select x + y) .Run(Console.WriteLine);

 

Next on More LINQ

A look at the Asynchronous and Remotable operators, dealing with some infrastructure-related concepts, wrapping up this series for now.

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

select top 9 [Subject] from dbo.cs_Posts
where postlevel = 1 and usertime < '01/01/2010' and usertime >= '01/01/2009'
order by TotalViews desc

Forgive me for the classic SQL, but here are the results with some short annotations inline:

  1. (Mis)using C# 4.0 Dynamic – Type-Free Lambda Calculus, Church Numerals, and more

    Uses the new C# 4.0 dynamic feature to implement the type-free lambda calculus consisting of an abstraction and application operator. Besides talking about the fundamentals of lambda calculus, this post shows how to implement the SKI combinators and Church Booleans, Church numerals and even recursive functions.
     
  2. LINQ to Ducks – Bringing Back The Duck-Typed foreach Statement To LINQ

    Since LINQ to Objects is layered on top of IEnumerable<T>, it doesn’t work against objects that just happen to implement the enumeration pattern consisting of GetEnumerator, MoveNext and Current. Since the foreach statement actually does work against such data sources, we bring back this duck typing to LINQ using AsDuckEnumerable<T>().
     
  3. Type-Free Lambda Calculus in C#, Pre-4.0 – Defining the Lambda Language Runtime (LLR)

    We repeat the exercise of the first blog post but now without C# 4.0 dynamic features., encoding application and abstraction operators using none less that exceptions. Those primitives define what I call the Lambda Language Runtime (LLR), which we use subsequently to implement a bunch of samples similar to the ones in the first post.
     
  4. Taming Your Sequence’s Side-Effects Through IEnumerable.Let

    Enumerable sequences can exhibit side-effects for various reasons ranging from side-effecting filter predicates to iterators with side-effecting imperative code interwoven in them. The Let operator introduced in this post helps you to keep those side-effects under control when multiple “stable” enumerations over the sequence are needed.
     
  5. Statement Trees With Less Pain – Follow-Up on System.Linq.Expressions v4.0

    The introduction of the DLR in the .NET 4 release brings us not only dynamic typing but also full-fledged statement trees as an upgrade to the existing LINQ expression trees. Here we realize a prime number generator using statement trees and runtime compilation, reusing expression trees emitted by the C# compiler where possible.
     
  6. LINQ to Z3 – Theorem Solving on Steroids – Part 1

    LINQifying Microsoft Research’s Z3 theorem solver has been one of my long-running side-projects. This most recent write-up on the matter illustrates the concept of a LINQ-enabled Theorem<T> and the required visitor implementation to interop with the Z3 libraries. Finally, we show a Sudoku and Kakuro solver expressed in LINQ.
     
  7. Expression Trees, Take Two – Introducing System.Linq.Expressions v4.0

    Just like post 5, we have a look at the .NET 4 expression tree support, now including statement trees. Besides pointing out the new tree node types, we show dynamic compilation and inspect the generated IL code using the SOS debugger’s dumpil command. In post 5, we follow up by showing how to reuse C# 3.0 expression tree support.
     
  8. Unlambda .NET – With a Big Dose of C# 3.0 Lambdas

    Esoteric programming languages are good topics for Crazy Sundays posts. In this one we had a look at how to implement the Unlambda language – based on SKI combinators and with “little” complications like Call/CC – using C# 3.0 with lots of lambda expressions. To satisfy our curiosity, we run a Fibonacci sample program.
     
  9. C# 4.0 Feature Focus – Part 4 – Co- and Contra-Variance for Generic Delegate and Interface Types

    Generic co- and contra-variance is most likely the most obscure C# 4.0 feature, so I decided to give it a bit more attention using samples of the everyday world (like apples and tomatoes). We explain why arrays are unsafe for covariance and how generic variance gets things right, also increasing your expressiveness.

In conclusion, it seems esoteric and foundational posts are quite popular, but then again that’s what I write about most. For 2010, I hope to please my readers’ interests even further with the occasional “stunt coding”, “brain pain” and “mind bending” (based on Twitter quotes in 2009). If there are particular topics you’d like to see covered, feel free to let me know. So, thanks again for reading in 2009 (good for slightly over 1TB – no that’s not a typo – of data transfer from my hoster) and hope to see you back in 2010!

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

Introduced in my previous blog post on The Essence of LINQ – MinLINQ, the first release of this project is now available for reference at the LINQSQO CodePlex website at http://linqsqo.codeplex.com. Compared to the write-up over here in my previous post, there are a few small differences and caveats:

  • Only FEnumerable functionality is available currently; the FObservable dual may follow later.
  • Option<T> has been renamed to Maybe<T>, to be CLS compliant and avoid clashes with the VB keyword.
  • Some operators are not provided, in particular GroupBy, GroupJoin and Join. They’re left as an exercise.
  • A few operator implementations are categorized as “cheaters” since they roundtrip through System.Linq.
  • Don’t nag about performance. The MinLINQ code base is by no means optimal and so be it.
  • Very few System.Interactive operators are included since those often require extra foundations (such as concurrency).

A few highlights:

  • FEnumerable.Essentials.cs is where the fun starts. Here the three primitives – Ana, Bind and Cata – form the ABC of LINQ.
  • There’s a Naturals() constructor function generating an infinite sequence of natural numbers, used in operators that use indexes.
  • OrderBy and ThenBy are supported through roundtripping to System.Linq with a handy trick to keep track of IOrderedEnumerable<T>.
  • As a sample, I’ve included Luke Hoban’s LINQified RayTracer with AsFEnumerable and AsEnumerable roundtripping. It works just fine.
  • Creating an architectural diagram in Visual Studio 2010 yields the following result (not meant to zoomed in), where I’ve used the following colors:
    • Green = Ana
    • Blue = Bind
    • Red = Cata

image

Obviously, all sorts of warnings apply. People familiar to my blog adventures will know this already, but just in case:

//
// This project is meant as an illustration of how an academically satifying layering
// of a LINQ to Objects implementation can be realized using monadic concepts and only
// three primitives: anamorphism, bind and catamorphism.
//
// The code in this project is not meant to be used in production and no guarantees are
// made about its functionality. Use it for academic stimulation purposes only. To use
// LINQ for real, use System.Linq in .NET 3.5 or higher.
//
// All of the source code may be used in presentations of LINQ or for other educational
// purposes, but references to http://www.codeplex.com/LINQSQO and the blog post referred
// to above - "The Essence of LINQ - MinLINQ" - are required.
//

Either way, if you find LINQ interesting and can stand some “brain pain of the highest quality” (a Twitter quote by dahlbyk), this will likely be something for you.

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

Introduction

Before reaching the catharsis in the “More LINQ with System.Interactive” series over here, I wanted to ensure a solid understanding of the essence of LINQ in my reader base. Often people forget the true essence of a technology due to the large number of auxiliary frameworks and extensions that are being provided. Or worse, sometimes a sense for the essence never materialized.

Searching for essence is nothing other than a “group by” operation, partitioning the world in fundamentals and derived portions. One succeeds in this mission if the former group is much smaller than the latter. In this post, we’ll try to reach that point for the IEnumerable<T> and IObservable<T> LINQ implementations, illustrating both are fundamentally similar (and dare I say, dual?). You can already guess much of the essence lies in the concept of monads. By the end of the post, we’ll have distilled the core of LINQ, which I refer to as MinLINQ since small is beautiful.

 

Interfaces are overrated?

While loved by object-oriented practitioners, interfaces are essentially nothing but records of functions. And functions, as we all know, are the fundamental pillars of functional programming languages. This trivial observation is illustrated below. I’ll leave it to the reader to think about various implications of the use of a (covariant) IRecord representation for objects:

class Program
{
    static void Main()
    {
        for (var c = new Counter(); c.Get() < 10; c.Inc(1))
            Console.WriteLine(c.Get());
    }
}

interface IRecord<out T1, out T2>
{
    T1 First { get; }
    T2 Second { get; }
}

class Counter : IRecord<Func<int>, Action<int>>
{
    // Data
    private int _value;

    // Code - explicit implementation to hide First, Second
    Func<int> IRecord<Func<int>, Action<int>>.First { get { return () => _value; } }
    Action<int> IRecord<Func<int>, Action<int>>.Second { get { return i => _value += i; } }

    // Code - friendly "interface"
    public Func<int> Get { get { return ((IRecord<Func<int>, Action<int>>)this).First; } }
    public Action<int> Inc { get { return ((IRecord<Func<int>, Action<int>>)this).Second; } }
}

Why do we care? Well, it turns out that IEnumerable<T> and IObservable<T> tend to obscure the true meaning of the objects a bit by having many different methods to facilitate the task of enumeration and observation, respectively. The source of this apparent bloating is irrelevant (and in fact follows design guidelines of an object-oriented inspired framework); what matters more is to see how the two mentioned interfaces can be boiled down to their essence.

Minimalistic as we are, we’re going to drop the notion of error cases that manifest themselves through MoveNext throwing an exception and OnError getting called, respectively on IEnumerator<T> and IObserver<T>. For similar reasons of simplification, we’ll also not concern ourselves with the disposal of enumerators or subscriptions. The resulting picture looks as follows:

image

To consolidate things a bit further, we’ll collapse MoveNext/Current on the left, and OnNext/OnCompleted on the right. How so? Well, either getting or receiving the next element can provide a value or a termination signal. This is nothing but a pair of an optional value and a Boolean. Turns out we have such a thing in the framework, called Nullable<T> but since one can’t nest those guys or use them on reference types, it doesn’t help much. Instead, we’ll represent the presence or absence of a value using an Option<T> type:

public abstract class Option<T>
{
    public abstract bool HasValue { get; }
    public abstract T Value { get; }

    public sealed class None : Option<T>
    {
        public override bool HasValue
        {
            get { return false; }
        }

        public override T Value
        {
            get { throw new InvalidOperationException(); }
        }

        public override string ToString()
        {
            return "None<" + typeof(T).Name + ">()";
        }
    }

    public sealed class Some : Option<T>
    {
        private T _value;

        public Some(T value)
        {
            _value = value;
        }

        public override bool HasValue
        {
            get { return true; }
        }

        public override T Value
        {
            get { return _value; }
        }

        public override string ToString()
        {
            return "Some<" + typeof(T).Name + ">(" + (_value == null ? "null" : _value.ToString()) + ")";
        }
    }
}

The subtypes None and Some are optional though convenient, hence I’ll leave them in. With this, IEnumerator<T> would boil down to an interface with a single method retrieving an Option<T>. When it returns a Some object, there was a next element and we got it; when it returns None, we’ve reached the end of the enumeration. Similar for IObserver<T>, OnNext and OnCompleted are merged into a single method receiving an Option<T>. Interfaces with a single method have a name: they’re delegates. So both those types can be abbreviated to:

IObserver<T>  –>  Action<Option<T>>
IEnumerator<T>  –>  Func<Option<T>>

A quick recap: an observer is something you can give a value or tell it has reached the end of the observable object, hence it takes in an Option<T>; an enumerator is something you can get a value from but it can also signal the end of the enumerable object, hence it produces an Option<T>. In a more functional notation, one could write:

Option<T> –> ()
() –> Option<T>

Here the arrow indicates “goes to”, just as in lambda expressions, with the argument on the left and the return type on the right. All that has happened is reverting the arrows to go from an observer to an enumerator and vice versa. That’s the essence of dualization.

But we’re not done yet. Look one level up at the IEnumerable<T> and IObserver<T> interfaces. Those are single-method ones too, hence we can play the same trick as we did before. The IEnumerable<T> interface’s single method returns an IEnumerator<T>, which we already collapsed into a simple function above. And in a dual manner, IObservable<T>’s single method takes in an IObserver<T>, which we also collapsed above. The yields the following result:

IObservable<T>  –>  Action<Action<Option<T>>>
IEnumerable<T>  –>  Func<Func<Option<T>>>

If that isn’t a simplification, I don’t know what would be. An observable is nothing other than an action taking in an action taking in an observed value, while an enumerable is nothing other than a function returning a function returning a yielded value. Or, in concise functional notation:

(Option<T> –> ()) –> ()
() –> (() –> Option<T>)

Again, to go from one world to the other, it suffices to reverse the arrows to reach the dual form. In summary, have a look at the following figure:

image

 

Flat functions – FEnumerable and FObservable

Since we’ve flattened imperative interfaces into flat functions we’re going to provide several operators over those, we need to have a name for the type to stick those items in. Though we’re not going to make things purely functional on the inside (as we’ll rely on side-effects to implement various operators), I still like to call it function-style enumerable and observable, hence the names FEnumerable and FObservable (not meant to be pronounceable), where F stands for Function as opposed to I for Interface. In addition, Ex additions will materialize to realize some layering as discussed below. The result, including FEnumerableEx2 that’s left as an exercise, is shown below:

image

 

Five essential operators, or maybe even less

To continue on our merry way towards the essence of LINQ, we’ll be providing five essential operators as the building blocks to construct most other operators out of. Needless to say so, those operators will use the above flat function “interfaces” to do their work on. Let’s start with a couple of easy ones: Empty and Return.

 

Empty

The Empty operator is very straightforward and never deals with Option<T>.Some values, just signaling an Option<T>.None immediately to signal completion. Hence the produced collection is empty. How do we realize this operator in the enumerable and observable case? Not surprisingly, the implementation is straightforward in both cases:

public static class FEnumerable
{
    public static Func<Func<Option<T>>> Empty<T>()
    {
        return () => () => new Option<T>.None();
    }

First, the FEnumerable one. All it does is simply returning a function that returns an end-of-sequence None signal in return to getting called. Notice the two levels of nesting needed to be conform with the signature. The outer function is the one retrieving the enumerator, while the inner is the equivalent to MoveNext and Current. For absolute clarity:

image

One the FObservable side of things, we find a similar implementation shuffled around a little bit, as shown below:

public static class FObservable
{
    public static Action<Action<Option<T>>> Empty<T>()
    {
        return o => o(new Option<T>.None());
    }

What used to be output now becomes input: the None constructor call no long appears in an output position but has moved to an input position. Similar for the observer, indicated with o, which has moved to an input position. Upon giving the observable object (the whole thing) an observer (o), the latter gets simply called with a None object indicating the end of the sequence. The inner call is equivalent to OnCompleted, while the whole lambda expression is equivalent to Subscribe.

image

The careful reader may spot an apparent difference in the multiplicity of the involved operations. Where one enumerable can be used to get multiple enumerators, it seems that one observable cannot be used with multiple observers. This is only how it looks, as duality comes to the rescue to explain this again. The statement for enumerables goes as follows: “multiple calls to GetEnumerator each return one IEnumerator”. The dual of that becomes “a single call to Subscribe can take in multiple IObservers”. While that’s not exactly the case in the real IObserver land, where you either wrap all of your observers in a single IObserver to achieve this effect, or make multiple calls to Subscribe (assuming – and that’s where the MinLinq approach differs slightly – a call to subscribe doesn’t block), it’s incredibly true in FObservable. How so? Well, one can combine delegates using the + operator to achieve the effect of subscribing multiple observers at the same time:

Action<Option<int>> observer1 = x => Console.WriteLine("1 <- " + x);
Action<Option<int>> observer2 = x => Console.WriteLine("2 <- " + x);

var xs = FObservable.Return(1);
xs(observer1 + observer2);

The above will print Some(1) and None() twice, since both observers are getting it (in invocation order, coinciding with lexical order).

 

Return

The previous sample brings us seamlessly to the next operator: Return, which realizes a singleton enumerable or observable collection. Though this one seems easy as well, it’s getting a bit more complex in the enumerable case as we need to maintain state across calls to “MoveNext”. Moreover, we need to do so on a per-enumerator basis as they all need to have their own view on the sequence. In our observable case, for the reasons mentioned above, things are slightly simpler as we can just “fire and forget” all data upon receiving a call to Subscribe. (Exercise: how would you make Subscribe asynchronous with respect to the sequence producing its values? When is this useful and when is it harmful?)

Let’s first look at the Return operator realization in FEnumerable:

public static Func<Func<Option<T>>> Return<T>(T value)
{
    return () =>
    {
        int i = 0;
        return () =>
            i++ == 0
            ? (Option<T>)new Option<T>.Some(value)
            : (Option<T>)new Option<T>.None();
    };
}

The state local to the “enumerator block” contains a counter that keeps track of the number of MoveNext calls that have been made. The first time, we return a Some(value) object, and the second (and subsequent) time(s) we answer with None. Notice this has the implicit contract of considering a None value as a terminal in the grammar. If you want to enforce this policy, an exception could be raised if i reaches 2.

In the FObservable world, things are quite easy. Upon a subscription call, we signal a Some and None message on the OnNext function, like this:

public static Action<Action<Option<T>>> Return<T>(T value)
{
    return o =>
    {
        o(new Option<T>.Some(value));
        o(new Option<T>.None());
    };
}

 

Bind

Why says Return and knows about monads immediately thinks about Bind (>>= in Haskell). The Bind operator, known as SelectMany in LINQ, provides an essential combinator allowing to compose objects in the monad. In our case, those monads are IEnumerable<T> and IObservable<T>. In a previous episode of my More LINQ series, I’ve explained the basic idea of monadic composition a bit further, as summarized in the figure below:

image

In the above, M<.> has to be substituted for either Func<Func<.>> or Action<Action<.>> to yield the signature for both FEnumerable’s and FObservable’s Bind operators. The implementation of the operator in the latter case is the more straightforward one of both:

public static Action<Action<Option<R>>> Bind<T, R>(this Action<Action<Option<T>>> source, Func<T, Action<Action<Option<R>>>> selector)
{
    return o => source(x =>
    {
        if (x is Option<T>.None)
        {
            o(new Option<R>.None());
        }
        else
        {
            selector(x.Value)(y =>
            {
                if (y is Option<R>.Some)
                    o(y);
            });
        }
    });
}

Here, upon subscribing to an observable using observer “o”, the operator itself subscribes to the source observable that was fed in to the function. It does so by providing an observer that takes in the received element as “x”. Inside the observer’s body, which gets called for every element raised by the source, “x” is analyzed to see whether or not the source has terminated. If not, Bind does its combining work by calling the selector function for the received element, getting back a new observable source “f(x.Value)”. The goal of Bind is to surface the values raised on this source to the surface of the operator call. Hence, we subscribe to this computed source “f(x.Value)” by providing an observer that takes in the received value as “y” and raises that to the surface by calling “o” (the external observer). Again we assume None is terminating the sequence, which could be enforced by keeping a bit of state (left as an exercise). We’ll see examples of operator usage later on.

(Exercise: What if we want the Subscribe method to return immediately, running the Bind in the background. How would you do so?)

In the FEnumerable case, things get more complex as we need to keep track of where we are in the source and projected sequences across different calls to “MoveNext”. While we could realize this using a state machine (just like iterators would do), I’ve taken on the challenge to write a state-keeping set of loops by hand. It may well be optimized or tweaked but it seems to do its job. Important situations to keep in mind include encountering empty inner sequences (signaled by None), requiring us to loop till we eventually find an object to yield. It’s also important to properly return a Option<R>.None object when we reach the end of the outer source. One of the most essential parts of the code below is the storage of state outside the inner lambda, hence keeping per-enumerator state. Besides cursors into the outer and current inner sequences, we also keep the inner enumerator (recall the signature corresponding to IEnumerator<T>) in “innerE”.

public static Func<Func<Option<R>>> Bind<T, R>(this Func<Func<Option<T>>> source, Func<T, Func<Func<Option<R>>>> f)
{
    return () =>
    {
        var e = source();

        Option<T> lastOuter = new Option<T>.None();
        Option<R> lastInner = new Option<R>.None();
        Func<Option<R>> innerE = null;

        return () =>
        {
            do
            {
                while (lastInner is Option<R>.None)
                {
                    lastOuter = e();

                    if (lastOuter is Option<T>.None)
                    {
                        return new Option<R>.None();
                    }
                    else
                    {
                        innerE = f(lastOuter.Value)();
                    }

                    lastInner = innerE();
                    if (lastInner is Option<R>.Some)
                    {
                        return lastInner;
                    }
                }

                lastInner = innerE();
            } while (lastInner is Option<R>.None);

            return lastInner;
        };
    };
}

The reader is invited to make sense of the above at his or her own pace, keeping in mind the regular LINQ to Objects implementation is the following much more comprehensible code:

public static IEnumerable<R> SelectMany<T, R>(this IEnumerable<T> source, Func<T, IEnumerable<R>> f)
{
    foreach (var item in source)
        foreach (var result in f(item))
            yield return result;
}

The interesting thing about the SelectMany implementation is that the types in the signature exactly tell you what to do: the main operation on an IEnumerable is to enumerate using foreach. The only parameter you can do that on is source, but you can’t yield those elements as the output expects elements of type R and we got elements of type T. However, the function “f” accepts a T and produces an IEnumerable<R>, so if we call that one an enumerate the results, we got what we can yield. Simple.

This operator is essential to LINQ (and monads) in that it allows many other operators to be written in terms of it. Where and Select and two that pop to mind immediately, and we’ll come to those when we talk about FEnumerableEx (and FObservable) later.

 

Ana

An anamorphism is the fancy word for an operator that produces an M<T> out of something outside M<.>, by use of unfolding. Given some seed value and an iterator function, one can produce a potentially infinite sequence of elements. Implementation of this operator is straightforward in both cases, again with the enumerable case requiring some state:

public static Func<Func<Option<T>>> Ana<T>(T seed, Func<T, bool> condition, Func<T, T> next)
{
    return () =>
    {
        Option<T> value = new Option<T>.None();
        return () =>
            condition((value = new Option<T>.Some(
                value is Option<T>.None
                ? seed
                : next(value.Value))).Value)
            ? (Option<T>)new Option<T>.Some(value.Value)
            : (Option<T>)new Option<T>.None();
    };
}

For fun and giggles I wrote this one using conditional operator expressions only, with an assignment side-effect nicely interwoven. It’s left to the reader to write it in a more imperative style. Again, we’re assuming the enumerator function is not called after a None object has been received. The basic principle of the operator is clear and implementation would look like this in regular C# with iterators:

public static IEnumerable<T> Ana<T>(T seed, Func<T, bool> condition, Func<T, T> next)
{
    for (T t = seed; condition(t); t = next(t))
        yield return t;
}

On the FObservable side, things are simpler again (the main reason being that FEnumerable is hard because of its lazy nature and because we can’t use iterators):

public static Action<Action<Option<T>>> Ana<T>(T seed, Func<T, bool> condition, Func<T, T> next)
{
    return o =>
    {
        for (T t = seed; condition(t); t = next(t))
            o(new Option<T>.Some(t));
        o(new Option<T>.None());
    };
}

Again, the reader is invited to think about what I’d take to have this sequence getting generated on the background, as opposed to blocking the caller.

As an additional exercise, can you rewrite Return and Empty in terms of Ana, therefore making those two operators no longer primitives? Doing so will bring down the total of essentials to three: Ana, Cata and Bind:

image

 

Cata

The opposite of an anamorphism is a catamorphism, also known as Aggregate in LINQ. Its goal is to fold a M<T> into something outside M<.>, e.g. computing the sum of a sequence of numbers. Since this is a greedy operation, we can do it on the spot for both the FEnumerable and FObservable cases as shown below:

public static R Cata<T, R>(this Func<Func<Option<T>>> source, R seed, Func<R, T, R> f)
{
    var e = source();

    Option<T>.Some value;
    R result = seed;
    while ((value = e() as Option<T>.Some) != null)
    {
        result = f(result, value.Value);
    }

    return result;
}

First for the enumerable case, we simply run till we get a None object, continuously calling the aggregation function, starting with the seed value. In the observable case, things are equally simple:

public static R Cata<T, R>(this Action<Action<Option<T>>> source, R seed, Func<R, T, R> f)
{
    R result = seed;

    bool end = false;
    source(x =>
    {
        if (x is Option<T>.Some && !end)
            result = f(result, x.Value);
        else
            end = true; // or break using exception
    });

    return result;
}

This time we have to hook up an observer with the source and analyze what we got back. Notice the code above shows one approach to break out of or immunize an observer after a None message has been received. Notice though that if all constructor functions can be trusted (which is not the case with an Action of Func), such protections wouldn’t be required as we’re defining a closed world of constructors and combinators. If the former group never emits sequences that don’t follow the described protocol and the latter never combines existing sequences into an invalid one (i.e. preserving the protocol properties), it shouldn’t be possible to fall off a cliff.

 

Bridging the brave new world with the old-school one

Before getting into more operators layered on top of the essential ones provided above, we should spend a few minutes looking at ways to convert back and forth between the new functionally inspired “flat” world and the familiar interface-centric “bombastic” world of LINQ. In particular, can we establish the following conversions?

  • IEnumerable<T> to Func<Func<Option<T>>>
  • Func<Func<Option<T>>> to IEnumerable<T>
  • IObservable<T> to Action<Action<Option<T>>>
  • Action<Action<Option<T>>> to IObservable<T>

Obviously the answer is we can. Let’s focus on the first two as a starter. It’s clear that in order to go from an IEnumerable<T> to our new world of FEnumerable we should iterate the specified sequence. We should do so in a lazy manner such that upon every call to FEnumerable’s inner function (playing the enumerator’s role) we fetch an element out of the source IEnumerator<T>, but no earlier. In other words, we have to keep the iteration state which is represented by an IEnumerator<T> as the local state to the enumerator function:

public static Func<Func<Option<T>>> AsFEnumerable<T>(this IEnumerable<T> source)
{
    return () =>
    {
        var e = source.GetEnumerator();
        return () => e.MoveNext()
            ? (Option<T>)new Option<T>.Some(e.Current)
            : (Option<T>)new Option<T>.None();
    };
}

This should be fairly straightforward code to grasp, ensuring we properly terminate a (finite) sequence with a None object to signal completion. The opposite operation is easy as well, now calling a FEnumerable’s enumerator function, providing results to the caller in a lazy fashion by means of a typical C# iterator:

public static IEnumerable<T> AsEnumerable<T>(this Func<Func<Option<T>>> source)
{
    var e = source();
    Option<T>.Some value;
    while ((value = e() as Option<T>.Some) != null)
    {
        yield return value.Value;
    }
}

As soon as we encounter a None object, we’ll break out of the loop causing the consuming enumerator to terminate. Using the operators above, we can readily verify the back and forth conversions easily:

// IEnumerable -> FEnumerable
var xs = Enumerable.Range(0, 10).AsFEnumerable();
{
    var xse = xs();
    Option<int> x;
    while ((x = xse() as Option<int>.Some) != null)
        Console.WriteLine(x.Value);
}

// FEnumerable -> IEnumerable
var ys = xs.AsEnumerable();
{
    foreach (var y in ys)
        Console.WriteLine(y);
}

This is very convenient as we’ll be able to treat arrays and other enumerable collections as FEnumerable functions in a blink of the eye. Now we can start to mix and match typical LINQ to Objects operators with our own academic playground.

On to the dual world, we can also provide conversions for IObservable<T> to FObservable<T> back and forth. Both are relatively easy to realize as well but lets starts with old to new:

public static IObservable<T> AsObservable<T>(this Action<Action<Option<T>>> source)
{
    return Observable.Create<T>(o =>
    {
        source(x =>
        {
            if (x is Option<T>.Some)
                o.OnNext(x.Value);
            else
                o.OnCompleted();
        });
        return () => { };
    });
}

Here I’m using Rx’s Observable.Create operator to simplify the creation of an IObservable<T>, passing in an observer’s code body. Lambda parameter “o” is an IObserver<T>, so all we got to do is subscribe to our source (by means of just calling it, passing in a FObserver function) and forward received objects “x” to the external observer. As we don’t have a notion to run asynchronous in our little world, we simply return the no-op action delegate from the observer function. Since all execution happens synchronously upon a Subscribe call to the produced IObservable<T>, there’s little for us to do in a reaction to an unsubscribe invocation.

In the other direction, things are even simpler. We simply use an Rx extension method for IObservable<T> to subscribe given an OnNext and OnCompleted delegate:

public static Action<Action<Option<T>>> AsFObservable<T>(this IObservable<T> source)
{
    return o =>
    {
        source.Subscribe(x => o(new Option<T>.Some(x)), () => o(new Option<T>.None()));
    };
}

Again we can test this easily, this time using Observable.Range. Since that one runs asynchronously, we have to do a bit of synchronization to see the results printed nicely:

// IObservable -> FObservable
var evt = new ManualResetEvent(false);
var xs = Observable.Range(0, 10).AsFObservable();
{
    xs(x =>
    {
        if (x is Option<int>.Some)
            Console.WriteLine(x.Value);
        else
            evt.Set();
    });
}
evt.WaitOne();

// FObservable -> IObservable
var ys = xs.AsObservable();
{
    // We got this one synchronous inside.
    ys.Subscribe(Console.WriteLine);
}

The result of all this plumbing is summarized in the following diagram. The direct conversion between a FEnumerable and FObservable (and vice versa) is left to the reader as an interesting exercise:

image 

 

Where and Select for monadic dummies

While we leave the implementation of operators like Snoc (Cons in reverse, to construct sequences out of a single element and a sequence) and Concat (concatenating arbitrary sequences to one another) to the reader, we should focus on a few operators that can be realized using the essential building blocks provided before. In particular, we’ll implement Where and Select in terms of Bind, Empty and Return.

Recall what Bind does: it combines a sequence with sequences generated from a function call, collecting the elements all sequences that result from those function calls. In a concrete sample: given a list of products and a way to get all the suppliers for each product we can return a sequence of all suppliers across all products. Or with function arrows: IE<Product> –> (Product –> IE<Supplier>) –> IE<Supplier>. This is exactly the signature of Bind or SelectMany.

How can we use this to create a filter like Where? The answer is pretty simple, by controlling the “selector” function passed to Bind and make it analyze each element that’s passed in, deciding whether or not to return it to Bind. The “whether or not” part can be realized using a conditional either returning Return(element) or Empty(). And there we got our filtering logic:

public static Func<Func<Option<T>>> Where<T>(this Func<Func<Option<T>>> source, Func<T, bool> filter)
{
    return source.Bind(t => filter(t) ? FEnumerable.Return(t) : FEnumerable.Empty<T>());
}

A picture is worth a thousand words, so let’s have a look at the Where operator realization in terms of Bind:

image

And guess what, the FObservable implementation can be derived by mechanical translation from the one for FEnumerable:

public static Action<Action<Option<T>>> Where<T>(this Action<Action<Option<T>>> source, Func<T, bool> filter)
{
    return source.Bind(t => filter(t) ? FObservable.Return(t) : FObservable.Empty<T>());
}

In fact, the code is exactly the same with FEnumerable replaced by FObservable. If we’d have typedefs for the function signatures or static extension methods on a delegate type, we’d actually see both pieces of code being the same of the following “template”:

public static M<T> Where<T>(this M<T> source, Func<T, bool> filter)
{
    return source.Bind(t => filter(t) ? M<T>.Return(t) : M<T>.Empty());
}

Such an M<T> abstraction would be realized as a type constructor in Haskell and the packaging of both Return and Bind on M<T> would be realized by means of a type class that looks as follows:

class Monad m where
    return :: a –> m a
    (>>=)  :: m a –> (a –> m b) –> m b

The second function is Haskell’s infix operator for bind. More information ca be found at the following locations:

How can we realize Select using Bind and Return as well? The answer is again very straightforward: this time we simply apply the projection function to the object passed to the bind selector function and wrap the result using Return. Here’s the code for both worlds, again ready to be abstracted to M<T>:

public static Func<Func<Option<R>>> Select<T, R>(this Func<Func<Option<T>>> source, Func<T, R> selector)
{
    return source.Bind(t => FEnumerable.Return(selector(t)));
}
public static Action<Action<Option<R>>> Select<T, R>(this Action<Action<Option<T>>> source, Func<T, R> selector)
{
    return source.Bind(t => FObservable.Return(selector(t)));
}

Again a picture will make the above more clear:

image

With those extension methods in place, we can actually start writing LINQ expressions against FEnumerable and FObservable (function!) objects. That’s right: now you got a delegate you can dot into, thanks to the magic of extension methods. But using convenient LINQ syntax, we don’t even have to see any of that:

var res = (from x in Enumerable.Range(0, 10).AsFEnumerable()
           where x % 2 == 0
           select x + 1).AsEnumerable();

foreach (var x in res)
    Console.WriteLine(x);

Notice how we go back and forth between classic IEnumerable<T> and our FEnumerable implementation? But the key to see here is that our Where and Select operators are getting called. The result obviously prints 1, 3, 5, 7, 9 and to convince ourselves or calls happening to our methods, we’ll have a look in the debugger:

image

I hope this suffices to convince the reader we got query expression syntax working around our MinLINQ implementation. It’s left to the reader to decipher the exact call stack we’re observing above. The same exercise can be repeated for the FObservable case, using the following equivalent code:

var res = (from x in Observable.Range(0, 10).AsFObservable()
           where x % 2 == 0
           select x + 1).AsObservable();

res.Subscribe(Console.WriteLine);
Console.ReadLine(); // Stuff happening on the background; don't exit yet

Since Bind is none other than SelectMany in disguise, we could rename it as such to enable it for use in LINQ as well, triggered by query expressions having multiple from clauses. In fact, to fully enable query expressions of that form, you’ll need a slight tweak to the SelectMany signature, as follows (same for the observable case of course):

public static Func<Func<Option<R>>> SelectMany<T, C, R>(this Func<Func<Option<T>>> source, Func<T, Func<Func<Option<C>>>> selector, Func<T, C, R> result)
{
    // Left as an exercise.
}

If you implement this one correctly, you will be able to run a query of the following shape:

var res = (from x in Enumerable.Range(1, 5).AsFEnumerable()
           from y in Enumerable.Range(1, x).AsFEnumerable()
           select new string((char)('a' + x - 1), y)).AsEnumerable();

foreach (var item in res)
    Console.WriteLine(item);

This will print the following output:

a
b
bb
c
cc
ccc
d
dd
ddd
dddd
e
ee
eee
eeee
eeeee

Finally, just to go nuts with some back-and-forth transitioning between all worlds (as shown in our diagram before), an all-inclusive sample mixing all sorts of execution:

var res = (from y in
               (from x in Enumerable.Range(0, 20).AsFEnumerable()
                where x % 2 == 0
                select x + 1).AsEnumerable()
               .ToObservable() // Rx
               .AsFObservable()
           where y % 3 == 0
           select y * 2)
          .AsObservable()
          .ToEnumerable(); // Rx

foreach (var item in res)
    Console.WriteLine(item);

The interested reader is invited to create short-circuiting operators to provide a direct path for .AsEnumerable().ToObservable().AsFObservable() and .AsObservable().ToEnumerable().AsFEnumerable(). Refer back to the diagram to see where those operators’ corresponding arrows occur.

 

Fueling Range and Sum with Ana and Cata

To conclude this post, let’s also have a look at how to derive constructor and aggregator operators from our Ana and Cata primitives. As a sequence constructor we’ll consider Range and for the aggregator we’ll consider Sum. Let’s start with Range in terms if Ana:

public static Func<Func<Option<int>>> Range(int from, int length)
{
    return FEnumerable.Ana<int>(from, x => x < from + length, x => x + 1);
}

and (again exactly the same code thanks to the shared primitives)

public static Action<Action<Option<int>>> Range(int from, int length)
{
    return FObservable.Ana<int>(from, x => x < from + length, x => x + 1);
}

Now we can get rid of the AsFEnumerable() use in our samples when creating a range and construct our range sequence immediately in our world (similar example for FObservable of course):

var res = (from x in FEnumerableEx.Range(0, 10)
           where x % 2 == 0
           select x + 1).AsEnumerable();

foreach (var x in res)
    Console.WriteLine(x);

As an exercise, also abstract the AsEnumerable call followed by foreach into a Run method, as seen in System.Interactive, so that you can write the code below. Implement this operator in terms of Cata (!):

(from x in FEnumerableEx.Range(0, 10)
 where x % 2 == 0
 select x + 1).Run(
     Console.WriteLine
);

(Question: could you benefit from such an operator in FObservable as well?)

For the Sum realization we can use Cata:

public static int Sum(this Func<Func<Option<int>>> source)
{
    return source.Cata(0, (sum, x) => sum + x);
}

and

public static int Sum(this Action<Action<Option<int>>> source)
{
    return source.Cata(0, (sum, x) => sum + x);
}

The following example illustrates how to sum 1 to 10 using Range and Sum:

Console.WriteLine(FEnumerableEx.Range(1, 10).Sum());
Console.WriteLine(FObservableEx.Range(1, 10).Sum());

Both print 55 just fine.

Implement more aggregation operators as found in the Standard Query Operators. Also think about how to implement those over nullable value types (e.g. Sum with int?). Could you reuse Option<T> as an alternative to nullables? Could you reuse monadic computation to carry out nullable arithmetic (tip: the Maybe monad)? A few aggregates that some people don’t see as aggregates include All, Any, First, Last, ElementAt, and more. Don’t forget to implement those either (most of them should be a one-liner making a single call to Cata). As an additional caveat, the following implementation of Average is inadequate (why?):

public static double Average(this Func<Func<Option<int>>> source)
{
    return (double)source.Sum() / source.Count();
}

 

Conclusion

Boiling down LINQ to its core essence can be fun and a great eye-opener to many users of the technology. While optimizations often mandate a lower degree of layering, it’s good to have an idea of the conceptual layering of various operators to see which ones are essential and which ones are not so much. If kids can build castles out of Lego blocks, sure every self-respecting developer should be able to exploit the expressive power a few primitive building blocks to create great libraries and applications. Choosing the right set of primitives can get you a long way in such a design, as illustrated in this post. Readers who can’t get enough of essential primitives and the composition thereof are cordially invited to have a go at another Crazy Sunday post titled Unlambda .NET – With a Big Dose of C# 3.0 Lambdas (and many others in that category).

In the continuation of my “More LINQ with System.Interactive” series we’ll get back to less academic stuff with System.Interactive. And before I forget: a happy 2010!

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

More Posts