July 2008 - Posts

I’m happy to announce the availability of the 1.0.1 RR (Refresh Release, not Dutch for “Rechtvaardige Rechters” – known as Knights of Christ – a stolen piece of art by Jan Van Eyck in my student-home city Ghent) of LINQ to AD on CodePlex. Here’s the Change Set 12012’s description:

Refresh Release 1.0.1 - Includes bug fixes and support for:
- byte[]- and GUID-valued directory attributes
- Bug fix for non-constant EndsWith, BeginsWith and Contains clauses
- Introduction of DirectoryContext with support for nested contexts and direct update support
- Updated samples to use the new available features

In practice this means that some common issues reported by users have been addressed (more to follow in subsequent refreshes) and a first level of (opt-in) support for “directory contexts” has been added. This means you can create an entity model like this:

    sealed class MyContext : DirectoryContext
    {
        public MyContext(DirectoryEntry searchRoot)
            : base(searchRoot)
        {
        }

        [DirectorySearchOptions(SearchScope.Subtree)]
        public DirectorySource<User>  Users  { get; set; }

        [DirectorySearchOptions(SearchScope.Subtree)]
        public DirectorySource<Group> Groups { get; set; }

        [DirectorySearchPath("OU=Demo")]
        public MyDemoContext          Demo   { get; set; }
    }

    sealed class MyDemoContext : DirectoryContext
    {
        public MyDemoContext(DirectoryEntry searchRoot)
            : base(searchRoot)
        {
        }

        public DirectorySource<MyUser> Users  { get; set; }
    }

where the classes User, Group and MyUser are defined entity types as in previous releases (now also support GUID-valued properties). What’s new though is the fact we have a DirectoryContext base class you can inherit from to create a custom context that contains a mapping of various query data sources for your directory structure. Moreover, this mechanism supports hierarchical contexts that reflects the structure of the directory (which, in combination with the use of DirectorySearchOptionsAttribute can be controlled with respect to the used search scope – Base, OneLevel or Subtree). For example, in the code above the MyContext class contains a property called Demo:

        [DirectorySearchPath("OU=Demo")]
        public MyDemoContext          Demo   { get; set; }

which is annotated with a DirectorySearchPathAttribute that specifies the path relative to the parent context (specified for MyContext using the constructor), so if you write:

var ctx = new MyContext(new DirectoryEntry(“LDAP://localhost/DC=bartdesmet,DC=local”));
var res = from u in ctx.Demo.Users …;

you’re effectively querying for user objects under LDAP://localhost/OU=Demo,DC=bartdesmet,DC=local. Notice I’m using another entity type for the nested context – there’s no need to do so, you can reuse the User entity type there but – as illustrated above – you’re not required to do so (for example, because under OU=Demo, you’d like to query on more directory attributes than elsewhere).

Another notable feature is the support for extension methods on DirectoryEntry and DirectorySearcher. In other words, you can write something like:

DirectoryEntry myDirectory = new DirectoryEntry(…);
var res = from u in myDirectory.AsQueryable<MyUser>(SearchScope.Subtree) …;

This allows users to perform direct querying starting from well-known System.DirectoryServices classes simply by importing the extension methods in the BdsSoft.DirectoryServices.Linq namespace and using AsQueryable to lift those entry-points to a LINQ to AD supported DirectorySource<T>. The fact DirectorySearcher is now supported as well makes it possible to specify much more options (such as time outs, client-side caching behavior, tombstone object inclusion, etc) than was possible before. If you want to make DirectorySearcher capabilities available through a strongly-typed DirectoryContext object, it’s possible to use a constructor overload that takes a DirectorySearcher object.

The new release can be downloaded here. Enjoy!

PS: Updates for LINQ to SharePoint are coming up as well.

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

Introduction

It’s a while ago I’ve been posting here but as usual there are very good reasons (read: excuses) for it. It’s been incredibly busy finishing our work on the .NET Framework 3.5 SP1 release and lately I’ve been preparing for my TechEd South Africa talks for next week – I’m thrilled to present at this conference. One of my talks covers “LINQ to Anything” about writing custom LINQ providers and in this context I’m showing off LINQ to AD because of its relative simplicity compared to other more extensive LINQ providers.

However, there was a slight problem: I used to drag around laptop domain controllers for a while when doing the Microsoft SchoolServer project for Microsoft Belux a long while ago (at that time Windows Server 2003) but right now it’s a little impractical to do that because my demo machine needs to be joined to our corporate domain and obviously our IT department wouldn’t like (to say the very least) me to carry a full corporate domain controller to some random conference :-). Of course there’s virtualization technology to the rescue but the demo machine I’m using has not quite enough RAM and I’d love to run Hyper-V but the machine isn’t 64-bit capable. Oh well, time to upgrade I guess :-).

So the solution I came up with is to rely on Windows Server 2008’s Lightweight Directory Services (LDS) role, also formerly known as Active Directory Application Mode (ADAM) but now shipping as a built-in role in our server OS product (after being a separate download and a Windows Server 2003 R2 “Disc 2” component). In this post I’ll outline how to set up LDS and how to use it.

 

Preparing the installation

First of all, this post assumes you have a Windows Server 2008 installation up and running already. On to the real matters now. It’s a common phenomenon to try something new at the office and have it working but find it not working on some random flight when preparing for the talk. Here’s a headache-medication saver: loopback adapters. More than once I’ve found network-driven applications now working correctly in a fully disconnected situation for various network stack related reasons, so I realized pretty quickly I didn’t have a loopback adapter installed. Installing one is easy: in Computer Management (now known as Server Manager), go to the Device Manager and add a piece of legacy hardware:

image

Follow the instructions to select the network adapter from the list:

image image image

and let the thing install. Ultimately you’ll have an always (though fake) connected computer, ideal to save presentation missions where networks are always influenced by Murphy’s law :-).

 

Installing LDS

To install LDS, select Active Directory Lightweight Directory Services Setup Wizard from Administrative Tools:

image

This will bring up the wizard that guides you through the installation and configuration of a new LDS instance. The concept of an instance is important to understand – you can regard an instance as a stand-alone (meaning not necessarily related to any domains) Active Directory database that can be reached by clients for typical directory operations over well-known protocols like LDAP. That’s precisely what I need: a light-weight (hence the name) instance of AD that isn’t linked to any AD domain whatsoever and reachable through LDAP (LINQ to AD used to be known as LINQ to LDAP).

Click through the first steps of the wizard to create a new instance – we won’t go into advanced topics such as replication:

image image

Next we’ll give our instance a name, e.g. Demo. This information will ultimately be used to create a Windows service (whose name is prefixed with ADAM_) as well as the database under %ProgramFiles%\Microsoft ADAM\<instance name>\data.

image

Next, we’ll select the port number to listen on. If you’re not running any other directory service yet, the wizard will propose to use ports 389 (LDAP) and 636 (SSL) which are both the typical ports for the respective protocols. Indeed, it will just make your computer look like a typical directory service to client computers. However, if you already have a directory service (be it AD or AD LDS) running or plan to do so, alternative ports in the 50000 range will be proposed. We’ll stick with default 389 but make sure to remember the port number as you’ll need it when writing client apps:

image

Next, the wizard asks to create an Application Directory Partition. Expensive word, simple concept. Essentially an application partition is a part of the hierarchical directory services database that will contain your application’s data, and as every node in such a directory hierarchy it needs to have a distinguished name, e.g. CN=Demo,DC=bartdesmet,DC=local. We’ll create such a partition during the LDS instance creation but you can also choose to do it later (although that will complicate matters and require more code).

image

In the few steps following, the wizard will ask where to store the database – it’s okay to stick with the defaults – and what account to run the service under (Network Service is recommended for a more secure configuration):

image image

Obviously services need some administrator that will be allowed to reconfigure the service (using the ADSI Editor for example, see further), so we’ll have to select an account that will be granted those permissions. You can use local accounts but also accounts from the domain the computer is joined to (as I’m doing below):

image

Last but not least, we’re presented a list of LDF files that can be imported in the instance. These get installed under the Schema partition of the instance and carry the schema data for various classes available in the instance to store data with:

image

I’m just selecting MS-User.LDF for now. It’s interesting to know where the wizard gets these files from: %systemroot%\ADAM\*.LDF. In fact, LDF files are just text files that look like this (for a full LDF file, see your installation):

#==================================================================
# @@UI-Description: AD LDS user class and related classes.
#
# This file contains user extensions for default ADAM schema.
# It should be imported with the following command:
#   ldifde -i -f MS-User.ldf -s server:port -b username domain password -k -j . -c "CN=Schema,CN=Configuration,DC=X" #schemaNamingContext
#
#==================================================================

# Attributes

dn: CN=Serial-Number,CN=Schema,CN=Configuration,DC=X
changetype: ntdsschemaadd
objectClass: top
objectClass: attributeSchema
cn: Serial-Number
attributeID: 2.5.4.5
attributeSyntax: 2.5.5.5
isSingleValued: FALSE
rangeLower: 1
rangeUpper: 64
showInAdvancedViewOnly: TRUE
adminDisplayName: Serial-Number
adminDescription: Serial-Number
oMSyntax: 19
searchFlags: 0
lDAPDisplayName: serialNumber
schemaIDGUID:: MnqWv+YN0BGihQCqADBJ4g==
systemOnly: FALSE

The idea is simple: for each of the class attributes you’ll find a definition. Notice the file also has a comment for the ldifde (a command-line tool) invocation if you want to import LDF files after running the wizard.

Finally, the wizard is ready to do the installation:

image image

 

Verifying the installation

Time to check what the wizard has installed. First, you’ll find a file containing the EDB instance database:

image

and in the Service Control Manager a new service will have been registered with the name ADAM_Demo which should have been started by the wizard already:

image

Using the LDP tool (Start, Run, ldp.exe) we can connect to the instance. Use Connection, Connect and specify the name of the machine (localhost):

image

Next, do a bind using the current credentials:

image

and go to View, Tree to display the contents of the database, for example connecting to our application partition:

image

The instance will obviously be pretty empty (still):

image

In addition you can use ADSI Edit from the Administrative Tools to connect to the instance to see the imported schema data:

image

Right-click the root node and choose Connect To. The connection settings should look like this:

image

In the Schema list you’ll find CN=User which we imported (amongst others) earlier during the execution of the wizard. Search for “dn: CN=User,CN=Schema,CN=Configuration,DC=X” in the LDF file to find the import:

image

 

Using it

Finally we’ll write a tiny little bit of System.DirectoryServices code in C# to use the instance. Make sure to import the System.DirectoryServices assembly in Visual Studio:

image

and write the following piece of code:

using System;
using System.DirectoryServices;

namespace PopulateLdap
{
    class Program
    {
        static void Main()
        {
            DirectoryEntry entry = new DirectoryEntry("LDAP://localhost/CN=Demo,DC=bartdesmet,DC=local");
            foreach (DirectoryEntry e in entry.Children)
                Console.WriteLine(e.Name);

            DirectoryEntry user1 = entry.Children.Add("CN=bartde", "user");
            user1.Properties["givenName"].Value = "Bart";
            user1.Properties["sn"].Value = "De Smet";
            user1.CommitChanges();
            entry.CommitChanges();

            foreach (DirectoryEntry e in entry.Children)
                Console.WriteLine(e.Name);
        }
    }
}

where the LDAP path specified should obviously point at the application partition created above. Execution looks like:

image

and LDP will show the new directory entry:

image

Ready to rock!

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

Introduction

Lately I've been playing quite a bit with DLR technologies, including IronRuby. During some experiments I came to the conclusion that the Kernel.` method isn't implemented yet in the current version. This `backtick` method allows executing OS commands from inside a Ruby program. It's a bit like Process.Start, redirecting the standard output as a string to the Ruby program for further use (actually the Kernel.exec method is precisely implemented like this).

image

One thing I like about the NotImplementedException is the fact it points unambiguously to the source that's missing :-). Indeed, if you browse the IronRuby source, you'll find this:

[RubyMethod("`", RubyMethodAttributes.PrivateInstance)]
[RubyMethod("`", RubyMethodAttributes.PublicSingleton)]
public static MutableString ExecuteCommand(CodeContext/*!*/ context, object self, [NotNull]MutableString/*!*/ command) {
    // TODO:
    throw new NotImplementedException();
}

Actually there are a couple of things here that are worth to discuss besides the current void of the method body:

  • Strings in Ruby are mutable as opposed to strings in the CLR/BCL. Therefore they are wrapped in a MutableString class.
  • The weird /*!*/ notation indicates not-nullness, mirrored after the equivalent uncommented form in Spec# (e.g. MutableString! means a null-nullable string). To work with regular C#, the NotNullAttribute is used.
  • Methods like this one are not invoked by Ruby directly, instead the RubyMethodAttribute declaration carries the metadata that provides the entry-point to the method (as well as other metadata).

Kernel.`

I won't cover the differences between Kernel.`, Kernel.system and Kernel.exec; more information can be found here. The backtick one is our target for the scope of this post:

`cmd` => string

Returns the standard output of running cmd in a subshell. The built-in syntax %x{…} uses this method. Sets $? to the process status.

   `date`                   #=> "Wed Apr  9 08:56:30 CDT 2003\n"
   `ls testdir`.split[1]    #=> "main.rb"
   `echo oops && exit 99`   #=> "oops\n"
   $?.exitstatus            #=> 99

Actually I want to focus on (yet another) powerful feature in PowerShell, namely the ability to create custom hosts. What we want to achieve here is that the backtick syntax (or the equivalent %x syntax) runs the specified command as a PowerShell command (or pipeline of multiple cmdlets), emitting the string output as the `'s methods return value. Notice though this actually downgrades ones of the core principles of PowerShell concerning the use of objects through the pipeline rather than falling back to strings. One could easily think of a more powerful way to expose the results of a PowerShell invocation as PSObjects in Ruby but we'll keep that for later.

Important: This post outlines no more than the capability to hook up PowerShell in IronRuby through Kernel.`. Obviously no promises are made about the way Kernel.` will eventually be implemented in IronRuby as we move forward.

Building a custom PS host

Creating custom PS hosts isn't that hard, depending on how much functionality you want to take over. We'll stick with the basics of console I/O, actually just the O in this. What we want to get done is this:

  1. Build up a runspace containing the command passed to Kernel.` (in addition to some more pipeline commands to produce the right output, see further).
  2. Invoke the built-up pipeline.
  3. Retrieve the string output from the host, concatenate it into one big string and return that one to the caller of Kernel.`.

 

Preparing for PowerShell programming

In order to extend PowerShell, you'll need to add a reference to System.Management.Automation.dll which can be found in the Reference Assemblies folder (click to enlarge):

image

Runspaces

Let's start at the very top by implementing a method called "InvokePS" that sets up the infrastructure to call PowerShell:

public static class RubyToPS
{
    public static string InvokePS(string command)
    {
        RubyPSHost host = new RubyPSHost();

        using (Runspace runspace = RunspaceFactory.CreateRunspace(host))
        {
            runspace.Open();

            using (Pipeline pipeline = runspace.CreatePipeline())
            {
                pipeline.Commands.AddScript(command);
                pipeline.Commands[0].MergeMyResults(PipelineResultTypes.Error, PipelineResultTypes.Output);                 pipeline.Commands.Add("out-default");

                pipeline.Invoke();
            }
        }

        return ((RubyPSHostUserInterface)host.UI).Output;
    }
}

The RubyPSHost class will be shown next, let's focus on the Runspace stuff for now. A runspace serves as the entry-point to the PowerShell engine and encapsulates all the state needed to execute pipelines. Once we've opened the runspace, a pipeline is created to which we add the passed-in command as a script. This allows more than just one cmdlet invocation to be executed (e.g. "gps | where { $_.WorkingSet64 -gt 50MB }"). To send output to the host we append Out-Default to the pipeline:

NAME
    Out-Default

SYNOPSIS
    Send the output to the default formatter and the default output cmdlet. Thi
    s cmdlet has no effect on the formatting or output. It is a placeholder tha
    t lets you write your own Out-Default function or cmdlet.

SYNTAX
    Out-Default [-inputObject <psobject>] [<CommonParameters>]

DETAILED DESCRIPTION
    The Out-Default cmdlet send the output to the default formatter and the def
    ault output cmdlet. This cmdlet has no effect on the formatting or output.
    It is a placeholder that lets you write your own Out-Default function or cm
    dlet.

The MergeMyResults call is used to ensure that error objects produced by the first command are merged into the output (otherwise you'll get an exception instead). Finally the output is retrieved from the host after invoking the pipeline. How this works will be covered in a minute.

To read more about PowerShell runspaces, check out my other posts on the topic.

Deriving from PSHost

Custom PowerShell hosts derive from the abstract PSHost base class. There's quite some stuff that can be done here but we'll stick with the absolute minimum functionality required to reach our goals:

internal class RubyPSHost : PSHost
{
    private Guid _hostId = Guid.NewGuid();
    private RubyPSHostUserInterface _ui = new RubyPSHostUserInterface();

    public override Guid InstanceId
    {
        get { return _hostId; }
    }

    public override string Name
    {
        get { return "RubyPSHost"; }
    }

    public override Version Version
    {
        get { return new Version(1, 0); }
    }

    public override PSHostUserInterface UI
    {
        get { return _ui; }
    }

    public override CultureInfo CurrentCulture
    {
        get { return Thread.CurrentThread.CurrentCulture; }
    }

    public override CultureInfo CurrentUICulture
    {
        get { return Thread.CurrentThread.CurrentUICulture; }
    }

    public override void EnterNestedPrompt()
    {
        throw new NotImplementedException();
    }

    public override void ExitNestedPrompt()
    {
        throw new NotImplementedException();
    }

    public override void NotifyBeginApplication()
    {
        return;
    }

    public override void NotifyEndApplication()
    {
        return;
    }

    public override void SetShouldExit(int exitCode)
    {
        return;
    }
}

More information about all of those methods and properties can be found on MSDN. The most important one to us the the UI property that points at our PSHostUserInterface implementation called RubyPSHostUserInterface.

Implementing PSHostUserInterface

Where the PSHost class provides basic information concerning the metadata of the host (name, version, id)m lifetime of the host (nested prompt, execit commands) and general settings (cultures), the PSHostUserInterface class deals with "dialog-oriented and line-oriented interaction between the cmdlet and the user, such as writing to, prompting for, and reading from user input" (from MSDN). The part we're interested in the writing to part. We won't deal with prompts or user interaction - if one wants to do this, the Kernel.` command is no longer non-interactive (a possible alternative way to implement this would be to spawn PowerShell.exe and just get the shell's output here - the default host would take care of all user interaction if required; the only problem is that prompts would appear in the Kernel.` output as well). Implementation of this class isn't that hard either:

internal class RubyPSHostUserInterface : PSHostUserInterface
{
    private StringBuilder _sb;

    public RubyPSHostUserInterface()
    {
        _sb = new StringBuilder();
    }

    public override void Write(ConsoleColor foregroundColor, ConsoleColor backgroundColor, string value)
    {
        _sb.Append(value);
    }

    public override void Write(string value)
    {
        _sb.Append(value);
    }

    public override void WriteDebugLine(string message)
    {
        _sb.AppendLine("DEBUG: " + message);
    }

    public override void WriteErrorLine(string value)
    {
        _sb.AppendLine("ERROR: " + value);
    }

    public override void WriteLine(string value)
    {
        _sb.AppendLine(value);
    }

    public override void WriteVerboseLine(string message)
    {
        _sb.AppendLine("VERBOSE: " + message);
    }

    public override void WriteWarningLine(string message)
    {
        _sb.AppendLine("WARNING: " + message);
    }

    public override void WriteProgress(long sourceId, ProgressRecord record)
    {
        return;
    }

    public string Output
    {
        get
        {
            return _sb.ToString();
        }
    }

    public override Dictionary<string, PSObject> Prompt(string caption, string message, System.Collections.ObjectModel.Collection<FieldDescription> descriptions)
    {
        throw new NotImplementedException();
    }

    public override int PromptForChoice(string caption, string message, System.Collections.ObjectModel.Collection<ChoiceDescription> choices, int defaultChoice)
    {
        throw new NotImplementedException();
    }

    public override PSCredential PromptForCredential(string caption, string message, string userName, string targetName, PSCredentialTypes allowedCredentialTypes, PSCredentialUIOptions options)
    {
        throw new NotImplementedException();
    }

    public override PSCredential PromptForCredential(string caption, string message, string userName, string targetName)
    {
        throw new NotImplementedException();
    }

    public override PSHostRawUserInterface RawUI
    {
        get { return null; }
    }

    public override string ReadLine()
    {
        throw new NotImplementedException();
    }

    public override System.Security.SecureString ReadLineAsSecureString()
    {
        throw new NotImplementedException();
    }
}

The core of our implementation lies in the fact that all Write* methods emit their data to a StringBuilder instance that aggregates all output sent to the host. This is the data that gets retrieved by our InvokePS method on the last line:

return ((RubyPSHostUserInterface)host.UI).Output;

Notice this isn't the absolute end of host-level extensibility in PowerShell. A PSHostUserInterface class can point at a PSHostRawUserInterface object that controls host window characteristics (such as the size, position and title of the window). Actually it would be interesting to implement this one as well in order to provide an accurate BufferSize that will be used by PowerShell to control the maximum length of individual lines before wrapping to the next line. The reason this would be a good idea is that screen-scraping Ruby programs should be able to ignore different wrapping behavior depending on the hosting command window (which would cause programs to behave differently depending where they run). Ideally there would be no wrapping at all (letting the DLR IronRuby command-line host dealing with wrapping when printing data to the screen). I'll leave this exercise to the reader.

Hooking it up

All of the above has been implemented in a separate strong-named Class Library which I'm just referencing in the IronRuby.Libraries project. This is actually very quick-and-dirty, making IronRuby directly dependent on our assembly and by extension Windows PowerShell. A way around this would be to load the assembly dynamically possibly based on an environment variable. There are lots of possibilities here which we consider just an implementation detail for now. The only thing left to do is to call our InvokePS method which requires some conversions between System.String and MutableString:

[RubyMethod("`", RubyMethodAttributes.PrivateInstance)]
[RubyMethod("`", RubyMethodAttributes.PublicSingleton)]
public static MutableString ExecuteCommand(CodeContext/*!*/ context, object self, [NotNull]MutableString/*!*/ command) { 
    return MutableString.Create(RubyToPS.InvokePS(command.ConvertToString()));
}

That's it! Here's the result:

image

Note: The \r\n insertions in the output for display by Ruby's console cause things to wrap a bit nasty given the default of 80 characters buffer width. I've adjusted to 83 characters to make this render correctly. With some smart "Raw UI host" one could eliminate some issues here - however the internal contents of the string is more important since the app will likely rely on that (otherwise you'd simply run a PowerShell interactive shell, wouldn't you?). Just as one sample, here's the output of the each_line iterator:

image

Does look an awful lot like PowerShell, doesn't it?

Cheers!

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

External or internal?

C# introduced the concept of iterators in C# 2.0 but it's a less-known fact that there are two sorts of iterators. The ones provided in C# are so-called external iterators. The distinction lies in the party that controls the enumeration of the iteration, e.g.

public IEnumerator<int> FromTo(int from, int to)
{
    for (int i = from; i <= to; i++)
        yield return i;
}

does by itself not perform any iteration; it's only when the consumer starts to suck data out of the enumerator object (typically using a foreach loop) that the data is fetched:

foreach (int i in FromTo(1, 3))
    Console.WriteLine(i);

In other words, the object returned by FromTo (which is a little state machine implementing IEnumerator<T> or IEnumerable<T>) just captures the capability of iterating over the produced results in an on-request basis. Notice using C# 3.0 syntax, we can make the use of it 'different', obtaining a more Ruby-like style:

public static IEnumerator<int> To(this int from, int to)
{
    for (int i = from; i <= to; i++)
        yield return i;
}

using an extension method so it can be called like this:

foreach (int i in 1.To(3))
    Console.WriteLine(i);

Still no different than what we started with but as you can guess: besides external iterators we have internal iterators. Internal iterators perform the iteration themselves (they push the generated values into the code body associated with them) and are therefore very similar to built-in loop control structures in the language. A possible implementation looks like this:

public static IEnumerator<int> To(this int from, int to, Action<int> a)
{
    for (int i = from; i <= to; i++)
        a(i);
}

which can be called like this:

1.To(3, i => { Console.WriteLine(i); });

using a C# 3.0 lambda expression. Notice there is no blue (syntax coloring) left, we have defined our own control structure. Nothing fancy but one thing we've actually given up here is our ability to define an object that captures the "iterator's potential" to iterate from 1 to 3 and that can be invoked multiple times. (Whether or not this is a cosmetic detail is left for judgement by the reader.) In other words, we'd like to be able to define a control structure like this:

var myLoop = 1.To(3);
myLoop(i => { Console.WriteLine(i); });
myLoop(i => { Console.WriteLine(i + " again!"); });

What we're doing here is making use of partial function application. It looks like we're calling the To method with just one parameter, allowing the other parameters to be filled in later, realizing higher-order functions. I deliberately obfuscated the type of 1.To(3) in the fragment above to elaborate on it a bit more now. What does the type of 1.To(3) need to look like? Well, we should be able to call it with one parameter of type Action<int>. The result of calling through the 1.To(3) object is purely side-effect based, i.e. the call doesn't return anything by itself, so the return type of the type we're looking for is void. To wrap up, a void-returning callable thing is nothing less than an Action<T> where T is our parameter, an Action<int>, so the resulting type is Action<Action<int>>. This level of "action-indirection" signifies the possible delayed characteristic of the iterator invocation. The reason the delayed execution is optional is obvious since one can call it like this:

1.To(3)(i => { Console.WriteLine(i); });

In Ruby one would write:

1.upto(3) { |i| print i }

 

Implementing internal iterators

So how to implement this? Not that difficult at all:

public static Action<Action<int>> To(this int from, int to)
{
    return a => { for (int i = from; i <= to; i++) a(i); };
}

Notice how the iteration moved inside the "iterator", how it's indirected through one level of lambda-abstraction and now yield keywords have been used (making it no longer an iterator in C# lingo). In fact, there's lots of magic going on in this fragment. First of all, the type inference carried out by the compiler. Since the signature of the To method indicates we're returning an Action<Action<int>> - which is fancy speak for delegate void _(delegate void_(int)) - the compiler can infer that the lambda's parameter a has type Action<int>:

return (Action<int> a) => { for (int i = from; i <= to; i++) a(i); };

Furthermore, lambda's are pure syntactical sugar for anonymous methods (at least in this case where no expression trees are involved):

return delegate (Action<int> a) { for (int i = from; i <= to; i++) a(i); };

The end result with sample looks like this:

image

We could stop here but if you're curious for the technical details that make this work, read on.

 

Behind the compiler curtains

The real question though is what gets returned. This is where closures kick in:

image 

The <>c__DisplayClass1 captures the from and to arguments to the iterator "constructor" call (since that's ultimately what our To call does) and has a method called <To>b__0 that contains our iteration logic:

image

Notice the call through the supplied Action<int> delegate on line IL_000c. An obvious question is whether or not a closure in this case is useful. What we're capturing here are the parameters from and to that are supplied to the To method (that lot's of to's too...). Since closures in C# capture lvals (mimicking copy by ref semantics) there's a code-window where the variables can change and where that change is propagated into the closure. Assume our To-implementation would look like this:

public static Action<Action<int>> To(this int from, int to)
{
    Action<Action<int>> res = a => { for (int i = from; i <= to; i++) a(i); };
    from++; to--;
    return res;
}

when calling To(1,3)(i => { Console.WriteLine(i); }); you'd now see only 2 getting printed on the screen. Why? The locals from and to are getting captured by the closure (also known as captured outer variables, see C# specification §14.5.15.3.1), so wherever one refers to from and to in the local scope the captured variables are getting referenced instead. So by the time a call is made through To(1,3) the captured values have both changed to 2 (from++, to--). However, this closure doesn't propagate outside the scope of the To method, in other words - and again assuming the original implementation of To - you won't see the effects of it when writing code like this:

int f = 1;
int t = 3;
var loop = f.To(t);
f++;
t--;
loop(i => { Console.WriteLine(i); });
loop(i => { Console.WriteLine(i + " again!"); });

This will still print all 6 expected lines. Let's eliminate our To method for a moment and try to write everything compacted in one place, like this:

Func<int, int, Action<Action<int>>> To = (from, to) => a => { for (int i = from; i <= to; i++) a(i); };
To(1, 3)(i => { Console.WriteLine(i); });
To(1, 3)(i => { Console.WriteLine(i + " again!"); });

Again 6 lines will be printed to the screen. Notice how the (inline) To "method" above has the same signature as our previous extension method To: we take in two ints and return an Action<Action<int>>. Also notice how lambda arrows are right associative, you should read it as:

(from, to) => (a => { for (int i = from; i <= to; i++) a(i); })

What now with closures? Actually, nothing changes. I did say closures capture the outer scope; however, from and to used in the body of the inner lambda refer to the locals from and to defined in the outer lambda, so when calling To(f, t) below:

int f = 1;
int t = 3;
var loop = To(f, t);
f++;
t--;
loop(i => { Console.WriteLine(i); });
loop(i => { Console.WriteLine(i + " again!"); });

the values f and t are copied into the anonymous method of the outer lambda; where they get captured in a closure used in the inner lambda's body. When you'd loosen the abstraction of the To method like this:

int from = 1;
int to = 3;

Action<Action<int>> To = a => { for (int i = from; i <= to; i++) a(i); };

from++;
to--;

To(i => { Console.WriteLine(i); });
To(i => { Console.WriteLine(i + " again!"); });

there's just one lambda left for which the outer scope is the directly surrounding method, so from and to are captured by a closure and changing them will affect the operation of To.

The closure inside the To method doesn't really gain us anything. Obviously the question is which behavior one wants to realize: one where the loop's boundaries can be changed afterwards (much like closures would allow us to do) or one where the loop object is immutable. To realize the former - if you'd really insist - some creativity is required. 1.To(3) could return an object with properties From and To while also exposing a method called Invoke to invoke the loop with the specified Action<int>. An object with an Invoke method looks pretty much like a delegate, so geeks could hack up IL to create a delegate with From and To properties. Not worth the effort IMO (apart from being a very interesting experiment), immutability is great and this definitely applies to this particular case. However (:-)) if you still insist, there's another escape valve: perform one level of indirection to the arguments to the iterator itself:

public static Action<Action<int>> To(this Func<int> from, Func<int> to)
{
    return a => { for (int i = from(); i <= to(); i++) a(i); };
}

Notice the orthogonal structure between functions/classes (code versus data) and lifted functions/pointers (=> versus * and - the reverse - () versus *). Calling it doesn't look pretty anymore though (partially because the left-hand side used in an extension method doesn't work with a lambda expression since a lambda has no type by itself, it could either be a Func<...> or an Expression<Func<...>> depending on its - in this case non-existent - assignment):

int b = 3;
var loop = (new Func<int>(() => 1)).To(() => b);
b = 2;
loop(i => { Console.WriteLine(i); });
loop(i => { Console.WriteLine(i + " again!"); });

but here b is caught by a closure, so changing it after obtaining the loop variable will change it's semantics. Brrrr...

 

A dynamic approach

Sticking with immutable internal iterators, we could eliminate the closure and produce the delegate at runtime using Reflection.Emit. This would look as follows:

public static Action<Action<int>> To(this int from, int to)
{
    DynamicMethod m = new DynamicMethod("", null, new Type[] { typeof(Action<int>) });
    ILGenerator ilgen = m.GetILGenerator();

    Label loop = ilgen.DefineLabel();
    Label ret  = ilgen.DefineLabel();

    //
    // for (int i = from; i <= to; i++) { a(i); } return;
    //      ^^^^^^^^^^^^
    //
    var var_i = ilgen.DeclareLocal(typeof(int));
    ilgen.Emit(OpCodes.Ldc_I4, from);
    ilgen.Emit(OpCodes.Stloc, var_i);

    //
    // for (int i = from; i <= to; i++) { a(i); } return;
    //                         ^^
    //
    var var_to = ilgen.DeclareLocal(typeof(int));
    ilgen.Emit(OpCodes.Ldc_I4, to);
    ilgen.Emit(OpCodes.Stloc, var_to);

    //
    // for (int i = from; i <= to; i++) { a(i); } return;
    // ^^^^
    //
    ilgen.MarkLabel(loop);

    //
    // for (int i = from; i <= to; i++) { a(i); } return;
    //                    ^^^^^^^
    //
    ilgen.Emit(OpCodes.Ldloc, var_i);
    ilgen.Emit(OpCodes.Ldloc, var_to);
    ilgen.Emit(OpCodes.Cgt);
    ilgen.Emit(OpCodes.Brtrue, ret);

    //
    // for (int i = from; i <= to; i++) { a(i); } return;
    //                                  ^ ^^^^^
    //
    ilgen.Emit(OpCodes.Ldarg_0);
    ilgen.Emit(OpCodes.Ldloc, var_i);
    ilgen.Emit(OpCodes.Callvirt, typeof(Action<int>).GetMethod("Invoke"));

    //
    // for (int i = from; i <= to; i++) { a(i); } return;
    //                             ^^^
    //
    ilgen.Emit(OpCodes.Ldloc, var_i);
    ilgen.Emit(OpCodes.Ldc_I4_1);
    ilgen.Emit(OpCodes.Add);
    ilgen.Emit(OpCodes.Stloc, var_i);

    //
    // for (int i = from; i <= to; i++) { a(i); } return;
    //                                          ^
    //
    ilgen.Emit(OpCodes.Br, loop);

    //
    // for (int i = from; i <= to; i++) { a(i); } return;
    //                                            ^^^^^^^
    //
    ilgen.MarkLabel(ret);
    ilgen.Emit(OpCodes.Ret);

    return (Action<Action<int>>)m.CreateDelegate(typeof(Action<Action<int>>));
}

Performance-wise you'll see that the latter obviously is slower (and actually on the functional level you don't gain anything since the redundant closure is harmless as we have proven higher up - also concerning memory allocation we're trading closure objects for DynamicMethod and other Reflection.Emit object instances). It shows however how a 3rd party compiler (or interpreter) could emit IL code that realizes an internal iterator with very little code involved (only 17 IL instructions).

 

Other iterators

Of course the "To" operator iterator is just one of the many possibilities. Here's just one of the many:

public static Action<Action<T>> ForEach<T>(this IEnumerable<T> source)
{
    return a => { foreach (T item in source) a(item); };
}

where the indirection of actions is a little artificial, agreed:

new int [] { 1, 2, 3 }.ForEach()(i => { Console.WriteLine(i); });

This is a place where extension properties would come in handy, or obviously we could just write the following instead:

public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
{
    foreach (T item in source) action(item);
}

Enjoy!

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

Introduction

Last month we released the June 08 CTP of the Parallel Extensions to the .NET Framework. For more information and to get the latest updates, see http://msdn.microsoft.com/concurrency and the team blog at http://blogs.msdn.com/pfxteam. In this post (of a longer series) I want to highlight one particular feature called futures. But first, to get started: download the CTP, install it, open VS 2008 and create a new Console Application (I'll be using C#) and add a reference to the System.Threading.dll assembly that gets installed by the Parallel Extensions:

image

There's a bunch of stuff in this CTP, as you can see from the Object Browser:

image

I'll be covering quite some of this in subsequent posts, but for now