Sunday, October 19, 2008 7:22 PM
“The C# Programming Language Third Edition” and thoughts on language evolution
With a hypothetical next release of the C# language around the corner (more about that after Anders, our language Caesar, has delivered his “Future of C#” talk at PDC 08), I’ve had the honor of receiving an early printed copy of The C# Programming Language Third Edition. As you can guess, this book is the successor to The C# Programming Language Second Edition, now extended with coverage of the C# 3.0 features and based on the unified C# Language Specification 3.0.
So why would you want to buy a language specification in book format? Personally, I like to have those things in print on my desk, just like I have The Common Language Infrastructure Annotated Standard on my desk (unfortunately it lags a bit behind the current edition of ECMA 335). But there’s more to this new edition of the book: it has been annotated by a number of people, who provide additional insights into the different language features, the design decisions and their consequences, best practices on when and how to use certain features, etc. In other words, if you’d like the ultimate coverage of the C# language (not its uses through certain tools with certain libraries) but don’t want to be bored to death by specification legalese, this book is for you. The annotations are numerous, enlightening and at times funny.
Some language history
Talking about language evolution, people often ask where the language is heading with all those fancy new features. Isn’t it becoming too bloated? Well, every new feature definitely ages the language (quoting Anders here), and some of the original features are ready for the graveyard because newer, more powerful features have superseded them. And yes, having more features means there’s more to learn when approaching the language. But at the same time, one shouldn’t forget the core value of a language: capturing common paradigms and patterns, making them easier to use (expressiveness) and less error-prone. A few examples illustrate this point.
Starting with C# 1.0, one of the recurring design themes of the language was what I’d call “honesty and pragmatism”. Programmers were talking all the time about properties, events and function pointers, so why shouldn’t these concepts become constructs directly available in the language? Such observations shaped not only the language but the underlying runtime as well, resulting in first-class support for metadata everywhere (well, almost everywhere). Pragmatic, yes, but honest? Sure enough. Strictly speaking, concepts like properties are foreign to minimalistic object-oriented programming (yes, you can mimic function pointers with single-method interfaces, and events can be simulated by similar means), but sprinkling a little syntactic sugar, and equally important metadata, on top provides great benefits with regard to code understandability, tooling and runtime metadata inspection (known as reflection). But this was only the beginning, as the “Cool” language (C-style Object Oriented Language) also embraced and captured common patterns such as iteration (foreach), proper handling of resources (using), resource locking (lock), etc. Nevertheless, there have always been features that are very powerful but shouldn’t be abused or misused, such as operator overloading, unsafe code, … And of course, the language doesn’t expose all of the underlying runtime’s features, such as tail calls, typed references and arglists (somehow that sounds familiar), to name just a few; there are many more.
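To make the “captured patterns” point concrete, here is a minimal sketch that puts three C# 1.0 statements next to roughly what they stand for. It assumes a modern .NET SDK compiling a single file with top-level statements (a much later C# feature, used only to keep the sample short); the expansions are simplified compared to the specification’s exact rules, and the variable names are arbitrary.

```csharp
using System;
using System.Collections;
using System.IO;
using System.Threading;

// A C# 1.0-style, non-generic collection of boxed ints.
ArrayList items = new ArrayList();
items.Add(1);
items.Add(2);
items.Add(3);

// foreach: the sugared form...
int sugared = 0;
foreach (int item in items)
    sugared += item;

// ...and roughly what it stands for: enumerator, MoveNext/Current,
// a cast on every element, and cleanup if the enumerator is disposable.
int expanded = 0;
IEnumerator e = items.GetEnumerator();
try
{
    while (e.MoveNext())
    {
        int current = (int)e.Current;
        expanded += current;
    }
}
finally
{
    IDisposable disposable = e as IDisposable;
    if (disposable != null) disposable.Dispose();
}

// using: guaranteed cleanup of a resource...
string sugaredLine = "";
using (StringReader reader = new StringReader("hello"))
    sugaredLine = reader.ReadLine();

// ...spelled out as try/finally.
string expandedLine = "";
StringReader reader2 = new StringReader("hello");
try { expandedLine = reader2.ReadLine(); }
finally { if (reader2 != null) ((IDisposable)reader2).Dispose(); }

// lock: mutual exclusion on an object...
object gate = new object();
int counter = 0;
lock (gate) { counter++; }

// ...and the C# 1.0-era expansion in terms of Monitor
// (C# 4.0 later switched to the Monitor.Enter(object, ref bool) overload).
Monitor.Enter(gate);
try { counter++; }
finally { Monitor.Exit(gate); }

Console.WriteLine("{0} {1} {2} {3} {4}", sugared, expanded, sugaredLine, expandedLine, counter);
```

Writing the expanded forms by hand is exactly the error-prone boilerplate (a forgotten Dispose, an unbalanced Monitor.Exit) that the dedicated statements take away.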
In C# 2.0, generics were introduced, making the “Cup of T” approach for collections (and more) a first-class citizen in the runtime and the language, resulting in fewer errors and various other benefits (performance, richer IntelliSense, etc.). At the same time, generics made some previous “features” (partially) redundant. To point out one of these cases, refer to section 8.8.4 in the book you’re about to buy, covering the foreach loop. Before we had IEnumerable<T>, foreach had to deal with its non-generic counterpart, which led to some interesting complications around boxing and the use of casts on the iteration variable (developers typically want to iterate over a collection using a type more specific than System.Object, but there were no generics to guarantee the collection’s element type). This is an example of a new feature solving known problems while also opening the gateway to lots of things that weren’t possible before. One such thing is a generalization of “nullability”, the lack of which has plagued many developers dealing with database records, where the distinction between value and reference types typically doesn’t exist. Looking at the nullable type feature in C# 2.0, the theme of making common tasks easier pops up again, while a subtle shortcoming of previous releases gets solved: the unified type system lacked nullability for value types (a more fundamental “issue” with considerable language impact, e.g. the use of the default keyword when dealing with generics, more cases to consider in the generic constraint mechanism, etc.).
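The sketch below illustrates both points under the same assumptions (a modern top-level C# file; the collection contents are made up). A foreach over an ArrayList compiles fine but fails at run time on a hidden cast, whereas List<int> would reject the bad element at compile time; the second half shows int? and default(T).

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Before generics: an ArrayList stores everything as object,
// so nothing stops a string from sneaking into a list of numbers.
ArrayList untyped = new ArrayList();
untyped.Add(1);
untyped.Add("two");

bool castFailed = false;
try
{
    // foreach silently inserts an (int) cast per element;
    // the bad element only surfaces at run time.
    foreach (int item in untyped) { }
}
catch (InvalidCastException)
{
    castFailed = true;
}

// With generics the element type is part of the collection type:
// typed.Add("two") would not compile, and no boxing or casting is needed.
List<int> typed = new List<int>();
typed.Add(1);
typed.Add(2);
int total = 0;
foreach (int value in typed)
    total += value;

// Nullable value types: int? is shorthand for Nullable<int>,
// handy for e.g. a NULL column in a database record.
int? age = null;
int shown = age ?? -1;        // ?? supplies a fallback when there is no value
age = 42;
bool hasAge = age.HasValue;

// default(T) is null for nullable types and zero for int.
int? noValue = default(int?);
int zero = default(int);

Console.WriteLine("{0} {1} {2} {3} {4} {5}", castFailed, total, shown, hasAge, noValue.HasValue, zero);
```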
Although C# 2.0 was mostly about catching up on original design plans, there was also room for some innovation. I don’t know precisely when the idea of iterators came up, but they were definitely one of the more exotic features in the second release of the language, opening up lots of new possibilities. Just as iterators stand on the shoulders of giants (in this case generics), newer language features in turn build upon iterators (as you can guess, I’m referring to LINQ to Objects, which relies heavily on them). One especially notable thing about the C# 2.0 release was its dependence on new runtime-level features to support generics (ECMA 335 and TR/89), so no “Lost in translation” approach that rewrites C# 2.0 into equivalent C# 1.0 is possible. And again, the language didn’t fully expose the underlying runtime mechanism, namely with regard to generic variance (ECMA 335 II.9.5).
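A small illustration of iterators and of how LINQ-to-Objects-style operators stack on top of them. Filter is a hypothetical, simplified stand-in for Enumerable.Where (no argument validation), and both iterators are written as local functions, a later C# feature, to keep everything in one top-level file. The log list only exists to show that nothing runs until the result is enumerated.

```csharp
using System;
using System.Collections.Generic;

List<string> log = new List<string>();

// An iterator: the compiler rewrites this into a state machine that
// runs the body only as far as the next yield return on each MoveNext.
IEnumerable<int> Numbers(int count)
{
    for (int i = 1; i <= count; i++)
    {
        log.Add("yield " + i);
        yield return i;
    }
}

// Hypothetical, simplified stand-in for LINQ to Objects' Where,
// itself an iterator layered over another iterator.
IEnumerable<T> Filter<T>(IEnumerable<T> source, Predicate<T> predicate)
{
    foreach (T item in source)
        if (predicate(item))
            yield return item;
}

// C# 2.0-style anonymous method as the predicate.
IEnumerable<int> evens = Filter(Numbers(6), delegate (int n) { return n % 2 == 0; });
int loggedBeforeEnumeration = log.Count;  // nothing has run yet: evaluation is lazy

List<int> result = new List<int>(evens);  // enumerating drives both iterators
Console.WriteLine(string.Join(",", result));
```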
Our latest release so far, C# 3.0, followed the same principles but was really the first one to innovate or even revolutionize the way we develop typical applications, whether in the business domain (first-class data querying with LINQ), in a scientific setting (lambda expressions) or when entering the New World of Meta Programming (expression trees). Once more, features from previous releases, such as generics and iterators, became increasingly important as foundations for newer features. In stark contrast to the step from C# 1.0 to 2.0, no runtime extensions were needed, allowing for a real “Lost in translation” translation from C# 3.0 to equivalent C# 2.0 constructs (not saying anything about the readability of such a translation’s result, though). Again in the realm of simplifying programming, constructs like auto-implemented properties and local variable type inference were introduced, giving the language a more dynamic feel while keeping strong typing all the way through. Needless complexity in newing up objects or collections was eliminated by the new initializer syntaxes, and specifying functions no longer requires noisy keywords like “delegate” (as used in anonymous methods) thanks to lambda syntax. And while I admit that some of the syntax looks foreign, most of it can be understood straight away (perhaps the only exception is the lambda arrow, which requires an aha-Erlebnis), so readability doesn’t suffer. As usual, features can be abused, just like a Swiss Army knife is a powerful tool that can be put to bad purposes too. Examples include overuse of local variable type inference at the cost of readability, or inappropriate use of extension methods.
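As a rough before-and-after, the following sketch (same assumptions: modern top-level C#, made-up sample data) shows var with anonymous types, an anonymous method next to the equivalent lambda, a query expression and a collection initializer. The method-call form in the comment is approximately what the compiler turns the query into; the specification spells out the exact translation rules.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// var + anonymous types + an implicitly typed array: the compiler infers every type.
var people = new[]
{
    new { Name = "Carol", Age = 45 },
    new { Name = "Bob",   Age = 17 },
    new { Name = "Alice", Age = 31 },
};

// C# 2.0: an anonymous method needs the delegate keyword and a full body...
Func<int, bool> isAdultOld = delegate (int age) { return age >= 18; };
// ...C# 3.0: the same function as a lambda.
Func<int, bool> isAdult = age => age >= 18;

// A query expression; the compiler translates it into method calls, roughly
// people.Where(p => isAdult(p.Age)).OrderBy(p => p.Name).Select(p => p.Name)
var adults = from p in people
             where isAdult(p.Age)
             orderby p.Name
             select p.Name;

// A collection initializer for the list we expect back.
var expected = new List<string> { "Alice", "Carol" };

List<string> adultNames = adults.ToList();
Console.WriteLine(string.Join(", ", adultNames));
```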
A changing landscape
Languages (especially general-purpose ones) need to adapt to a changing programming landscape if they want to stay relevant. And while specialized domains benefit hugely from non-general-purpose languages (not to use words like DSLs just yet), and a mixed-language approach to solving problems is entirely possible thanks to the CLS, some concepts that used to be exotic or irrelevant are becoming mainstream for the first time (or again, after having been declared dead a while ago). One such first-timer is the functional style of programming. I’m deliberately highlighting the word style, as I’m not referring to extremist functional programming (lazy evaluation, no side effects, use of monads) but rather to influences originating from functional languages, and the theory underlying them, making their way into mainstream languages. C# 2.0 introduced the glue with true closures; C# 3.0 provided easier syntax with lambdas. Those investments have already paid off hugely with the advent of libraries like the Task Parallel Library and various LINQ implementations. In some sense, LINQ’s mission brought more clarity (insider’s joke) to the concurrency domain, almost as a side effect :-).
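Closures are the glue mentioned above. This minimal sketch (modern top-level C#; MakeCounter is a hypothetical local helper) shows that a lambda captures the variable itself rather than a copy of its value, so state survives between calls and is shared between delegates that close over the same variable.

```csharp
using System;

// A counter factory: the lambda captures the local variable count itself,
// not a copy of its value, so count outlives the call to MakeCounter.
Func<int> MakeCounter()
{
    int count = 0;
    return () => ++count;   // C# 2.0 spelling: delegate { return ++count; }
}

Func<int> first = MakeCounter();
Func<int> second = MakeCounter();
first();
first();
int fromFirst = first();    // third call on the same captured variable
int fromSecond = second();  // a separate MakeCounter call, so a separate variable

// Two delegates sharing one captured variable.
int shared = 10;
Action bump = () => shared++;
Func<int> read = () => shared;
bump();
int afterBump = read();

Console.WriteLine("{0} {1} {2}", fromFirst, fromSecond, afterBump);
```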
Concurrency is huge; Anders called it the elephant in the room in his JAOO talk (actually, our hypothetical new version got a name in an interview with Anders at JAOO, and I promise you the name won’t be a big surprise…). It has many faces, ranging from one machine to many machines, and it impacts not only languages but also the way libraries are built and programs are written. Not to mention tool support, operating-system-level changes and the subtle interactions with hypervisor technologies (VM guests that want to benefit from multiple cores without colliding with one another, etc.). So there’s lots of work to be done in this domain to shape languages and tackle the beast. It doesn’t come for free, though, and some things will have to give: shifting from an imperative to a more declarative style, giving up on “mutable data structures everywhere” (notice the placement of the quotes), etc. Essentially, it comes down to sharing knowledge about the problem at hand with the runtime, rather than over-specifying a particular approach to solving it.
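To illustrate the shift from over-specifying how to stating what, here is the same computation (the sum of the squares of the multiples of 3 up to 1000, an arbitrary example) written imperatively and declaratively. The declarative version uses PLINQ’s AsParallel, which shipped in .NET Framework 4, after this post was written; treat it as an illustration of the direction rather than a recommendation for such a tiny workload.

```csharp
using System;
using System.Linq;

int[] numbers = Enumerable.Range(1, 1000).ToArray();

// Imperative: spell out how, step by step, mutating one accumulator.
// Parallelizing this loop by hand would mean dealing with races on 'imperative'.
long imperative = 0;
foreach (int n in numbers)
{
    if (n % 3 == 0)
        imperative += (long)n * n;
}

// Declarative: state what is wanted and let the runtime pick the how.
// AsParallel hands the query to PLINQ, which may partition it across cores.
long declarative = numbers.AsParallel()
                          .Where(v => v % 3 == 0)
                          .Select(v => (long)v * v)
                          .Sum();

Console.WriteLine("{0} {1}", imperative, declarative);
```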
Finally, I’d like to point out a third source of language potential: the meta-programming (MP) paradigm. It’s perhaps the least understood and most ill-defined of the three, but domain-specific islands within general-purpose environments (not just languages) have enormous potential, if not abused. Environments like PowerShell have first-class support for regular expressions (the -match operator), C# 3.0 and VB 9.0 have LINQ, but what about giving developers the means to cook up their own DSLs when appropriate? Here we enter the realm of expression, statement and declaration trees (as implemented in the DLR) and of running compilers as services. CodeDOM and whole-program source compiler invocations (as in ASP.NET code-behind) offered a first, albeit degenerate and brute-force, glimpse of this technique; LambdaExpression.Compile followed later. Fluent interfaces can be seen as the poor man’s DSLs, and going beyond that with internal DSLs might prove useful in the future. Some people have bet (and still do) on AOP as the next big paradigm after OO. Well, I’d put my bets on MP if I get to choose. While it’s not incompatible with ideas from OO, it has the potential to add another dimension to programming. We already have a spatial extent with data-driven (OO) and operation-driven (procedural and functional) decomposition axes; MP could add a dimension to this picture, in the form of specialized problem-space islands within a bigger language. Whether or not it’s really orthogonal is a question we’ll leave open for the time being…
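A glimpse of expression trees as data that can be compiled at run time: the sketch below builds the tree for x => x * x + 1 node by node with the System.Linq.Expressions factory methods, inspects it, and turns it into a delegate via Compile (here the typed Expression<TDelegate>.Compile). The function is an arbitrary example; a real internal DSL would generate such trees from its own syntax.

```csharp
using System;
using System.Linq.Expressions;

// Build the tree for x => x * x + 1 by hand, node by node.
ParameterExpression x = Expression.Parameter(typeof(int), "x");
Expression body = Expression.Add(
    Expression.Multiply(x, x),
    Expression.Constant(1));
Expression<Func<int, int>> tree = Expression.Lambda<Func<int, int>>(body, x);

// The tree is plain data: it can be inspected (or rewritten) before it ever runs.
BinaryExpression top = (BinaryExpression)tree.Body;
Console.WriteLine(top.NodeType);       // Add
Console.WriteLine(top.Left.NodeType);  // Multiply
Console.WriteLine(tree);               // textual rendering of the whole lambda

// Compile generates IL for the tree at run time and hands back a delegate.
Func<int, int> f = tree.Compile();
int result = f(5);
Console.WriteLine(result);             // 26
```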
The future will tell, and part of that future is coming at PDC 08. Enjoy!
Filed under: C# 3.0, Functional programming, Dynamic languages