Tuesday, January 24, 2012

Extending DynamicLINQ language: Specifying class name in "new" clause

Dynamic Linq (1) is a library that Microsoft ships as source code. It lets you build queries from strings instead of the type-safe language constructs used in regular Linq. The library adds extension methods for IQueryable, such as Where and Select, that accept query strings and parse them at run time into the corresponding lambda expressions of a Linq expression tree.

It allows you to write queries like the following:
var query = YourDB.Products.Where("CategoryID = 2 And Price < 10").OrderBy("StockCount");
The string must be written in a small, purpose-built DynamicLinq language, which is well documented in the documents attached to the DynamicLinq package. The heart of DynamicLinq is the parser for this microlanguage, which turns the given string into an Expression object.
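
To make the "string in, Expression out" idea concrete, here is a minimal sketch of the kind of expression tree such a parser has to produce for "CategoryID = 2 And Price < 10", built by hand with the standard System.Linq.Expressions API. The Product class and the PredicateSketch name are assumptions made for this illustration only; this is not the code from Dynamic.cs.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity, used only for this illustration.
public class Product
{
    public int CategoryID { get; set; }
    public double Price { get; set; }
}

public static class PredicateSketch
{
    // Builds p => p.CategoryID == 2 && p.Price < 10.0 node by node,
    // roughly what a parser would emit for "CategoryID = 2 And Price < 10".
    public static Expression<Func<Product, bool>> Build()
    {
        ParameterExpression p = Expression.Parameter(typeof(Product), "p");
        Expression category = Expression.Equal(
            Expression.Property(p, "CategoryID"), Expression.Constant(2));
        Expression price = Expression.LessThan(
            Expression.Property(p, "Price"), Expression.Constant(10.0));
        return Expression.Lambda<Func<Product, bool>>(
            Expression.AndAlso(category, price), p);
    }

    public static void Main()
    {
        var products = new[]
        {
            new Product { CategoryID = 2, Price = 5.0 },
            new Product { CategoryID = 2, Price = 50.0 },
            new Product { CategoryID = 3, Price = 5.0 },
        };
        // Only the first product satisfies both conditions.
        Console.WriteLine(products.AsQueryable().Where(Build()).Count());
    }
}
```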

Part of the language is the ability to instantiate new objects of anonymous classes, which is especially useful in Select and SelectMany calls. For example:
cars.Select("new (name, year, engine.type as engine_type)");
This constructs an object with the properties name, year and engine_type.
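
For comparison, the statically typed query below produces objects of the same shape. It is only a sketch: the Car and Engine classes are assumed here, with member names chosen to match the query string above.

```csharp
using System;
using System.Linq;

// Hypothetical model classes; member names mirror the query string.
public class Engine
{
    public string type { get; set; }
}

public class Car
{
    public string name { get; set; }
    public int year { get; set; }
    public Engine engine { get; set; }
}

public static class AnonymousProjectionSketch
{
    public static string Describe()
    {
        var cars = new[]
        {
            new Car { name = "Golf", year = 2010, engine = new Engine { type = "diesel" } },
        }.AsQueryable();

        // Compile-time counterpart of "new (name, year, engine.type as engine_type)".
        var row = cars
            .Select(c => new { c.name, c.year, engine_type = c.engine.type })
            .First();

        return row.name + " " + row.year + " " + row.engine_type;
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```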

In standard Linq, e.g. in C#, you are not limited to anonymous objects: you can write the query so that it returns objects of an already existing type. This is, however, not possible with DynamicLinq as provided by Microsoft.
Nevertheless, with some knowledge of how compilers (or, more precisely, parsers) work, some understanding of lambda expression trees in .NET, and a look through the source code of Dynamic.cs (the file containing the DynamicLinq implementation), it is relatively easy to extend the library so that a new clause can name an existing type to instantiate.

Currently, the grammar for the new expression in the DynamicLinq language is as follows:
new (expr1 as name1, expr2 as name2, ..., exprn as namen)
where name# is the name of the property in the resulting object that holds the value of expr#. If the expression is just a property access, the as name# part can be omitted and the property in the resulting object takes the name of the accessed property. We can extend the grammar in the following way without introducing ambiguity:
new Namespace.TypeName (expr1 as name1, expr2 as name2, ..., exprn as namen)
Omitting the type name still denotes the instantiation of an anonymous object. If we take a look at the ExpressionParser class, we quickly locate the method ParseNew, which is indeed responsible for parsing the new expression. Currently, the method works more or less as follows:
  1. Consume "new" keyword.
  2. Consume opening parenthesis
  3. Loop doing the following:
    1. Parse expression
    2. If next token is "as" consume it and the following token as an identifier
    3. Store the dynamic property definition using the obtained name and expression
    4. If the next token is comma, consume and continue; otherwise break loop.
  4. Consume closing parenthesis
  5. Synthesize and instantiate the anonymous type based on the accumulated dynamic property definitions.
  6. Return the expression tree node for type instantiation parametrized by the obtained type.
To support the new grammar, we add steps between 1 and 2: before consuming the opening parenthesis, the parser consumes as many dot-separated identifiers as possible, and together they form the name of the existing type. If at least one such identifier is present, we set a flag signalling that we are constructing an object of an existing type.
With the flag set, step 5 changes: instead of synthesizing an anonymous type, we look up the existing type with Type.GetType, and in step 6 we bind the expression values to its already existing properties.
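
The lookup-and-bind part can be tried in isolation. The sketch below resolves a type by name with Type.GetType and builds a member-initializer expression for it, using constant values where the parser would use the parsed sub-expressions. MyApp.CarInfo and TypedNewSketch are hypothetical names assumed for this example.

```csharp
using System;
using System.Linq.Expressions;

namespace MyApp
{
    // Hypothetical target type: public parameterless constructor, settable properties.
    public class CarInfo
    {
        public string name { get; set; }
        public int year { get; set; }
    }
}

public static class TypedNewSketch
{
    // Builds and runs the equivalent of: new <typeName> { name = "Golf", year = 2010 }
    public static object Create(string typeName)
    {
        // With throwOnError = false, an unknown name yields null instead of an exception.
        Type type = Type.GetType(typeName, false);
        if (type == null)
            throw new ArgumentException("Type " + typeName + " not found");

        MemberBinding[] bindings =
        {
            Expression.Bind(type.GetProperty("name"), Expression.Constant("Golf")),
            Expression.Bind(type.GetProperty("year"), Expression.Constant(2010)),
        };
        Expression body = Expression.MemberInit(Expression.New(type), bindings);

        // Convert to object so the lambda type does not depend on the resolved type.
        return Expression
            .Lambda<Func<object>>(Expression.Convert(body, typeof(object)))
            .Compile()();
    }

    public static void Main()
    {
        var info = (MyApp.CarInfo)Create("MyApp.CarInfo");
        Console.WriteLine(info.name + " " + info.year);
    }
}
```

Note that Type.GetType with a name that is not assembly-qualified only searches the calling assembly and the core library, so in this sketch the target type has to live in the same assembly as the code that calls Type.GetType.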

Finally, the ParseNew method will look like the following:
Expression ParseNew() {
    NextToken(); // consume the "new" keyword

    bool anonymous = true;
    Type class_type = null;

    // An identifier right after "new" starts the (possibly dotted) name of an existing type.
    if (token.id == TokenId.Identifier) {
        anonymous = false;
        StringBuilder full_type_name = new StringBuilder(GetIdentifier());
        NextToken();
        while (token.id == TokenId.Dot) {
            NextToken();
            ValidateToken(TokenId.Identifier, Res.IdentifierExpected);
            full_type_name.Append(".");
            full_type_name.Append(GetIdentifier());
            NextToken();
        }
        class_type = Type.GetType(full_type_name.ToString(), false);
        if (class_type == null)
            throw ParseError(Res.TypeNotFound, full_type_name.ToString());
    }

    ValidateToken(TokenId.OpenParen, Res.OpenParenExpected);
    NextToken();
    List<DynamicProperty> properties = new List<DynamicProperty>();
    List<Expression> expressions = new List<Expression>();
    while (true) {
        int exprPos = token.pos;
        Expression expr = ParseExpression();
        string propName;
        if (TokenIdentifierIs("as")) {
            NextToken();
            propName = GetIdentifier();
            NextToken();
        }
        else {
            // Without "as", the expression must be a member access that supplies the name.
            MemberExpression me = expr as MemberExpression;
            if (me == null) throw ParseError(exprPos, Res.MissingAsClause);
            propName = me.Member.Name;
        }
        expressions.Add(expr);
        properties.Add(new DynamicProperty(propName, expr.Type));
        if (token.id != TokenId.Comma) break;
        NextToken();
    }
    ValidateToken(TokenId.CloseParen, Res.CloseParenOrCommaExpected);
    NextToken();
    // Either synthesize an anonymous class or use the named existing type.
    Type type = anonymous ? DynamicExpression.CreateClass(properties) : class_type;
    MemberBinding[] bindings = new MemberBinding[properties.Count];
    for (int i = 0; i < bindings.Length; i++)
        bindings[i] = Expression.Bind(type.GetProperty(properties[i].Name), expressions[i]);
    return Expression.MemberInit(Expression.New(type), bindings);
}
We also have to add the message for the new error to the Res class:
public const string TypeNotFound = "Type {0} not found";
Now we are able to construct queries like the following:
cars.Select("new MyApp.CarInfo(name as name, year as year, engine.type as engine_type)");
You can also nest new expressions:
cars.Select("new MyApp.CarInfo(name, year, new EngineInfo(engine.type as type, engine.info as my_info) as engine_info)");
Of course, you can also nest a typed object inside an anonymous one, for example:
cars.Select("new (name, year, new EngineInfo(engine.type as type, engine.info as my_info) as engine_info)");
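
The queries above presuppose target types with a public parameterless constructor and a public settable property for every name bound in the new clause. The sketch below shows what such types might look like; the class shapes are assumptions inferred from the query strings, and MissingProperties is a hypothetical helper that reports names a given type cannot accept.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed to live in the global namespace, matching "new EngineInfo(...)".
public class EngineInfo
{
    public string type { get; set; }
    public string my_info { get; set; }
}

namespace MyApp
{
    // Assumed shape of the type named in "new MyApp.CarInfo(...)".
    public class CarInfo
    {
        public string name { get; set; }
        public int year { get; set; }
        public string engine_type { get; set; }
        public EngineInfo engine_info { get; set; }
    }
}

public static class TargetTypeCheck
{
    // Returns the names for which the type has no public writable property.
    public static List<string> MissingProperties(Type type, params string[] names)
    {
        return names
            .Where(n =>
            {
                var property = type.GetProperty(n);
                return property == null || !property.CanWrite;
            })
            .ToList();
    }

    public static void Main()
    {
        var missing = MissingProperties(
            typeof(MyApp.CarInfo), "name", "year", "engine_type", "horsepower");
        Console.WriteLine(string.Join(", ", missing));
    }
}
```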
Feel free to use the modified DynamicLinq; it is uploaded to Google Code. (3)

(1) Dynamic Linq is a part of a package available here.

(2) Dynamic Linq is described thoroughly in this blog post.

(3) The full modified Dynamic.cs is available here. Note: The version from HEAD has further updates not described here.

(4) Original StackOverflow answer where I presented the changes: Dynamic LINQ: Specifying class name in new clause

Monday, January 23, 2012

Closure over foreach variable in C#

I have decided to dedicate the very first post on my brand new blog to a topic that comes up remarkably often on StackOverflow (1), where I have been participating recently. I believe the same is true of other places where programmers exchange their knowledge.

In a typical post, the original poster is puzzled that the following piece of code does not work as they expect:

var query = original_query;
foreach (var s in my_cool_strings)
   query = query.Where(t => t.name == s);

Let us assume that:
my_cool_strings = new string[] { "foo", "bar" };

Most people will expect the constructed query to be equivalent to
original_query.Where(t => t.name == "foo").Where(t => t.name == "bar")

However, this is not the case. In fact, the foreach loop above produces the following query:

original_query.Where(t => t.name == "bar").Where(t => t.name == "bar")
(Observe the doubled "bar" instead of "foo".)

If you are not familiar with the facts that:
1) the lambda closes over the variable s, not over its value,
2) the variable s is declared outside the body of the foreach loop,
you may be surprised by this.

The lambdas are only constructed inside the block, not evaluated there (this is, after all, what lambdas are for). Thus both lambdas reference the same variable s, declared outside the block, which holds the value "bar" once the loop terminates.

This is the case because the foreach loop roughly translates to the following code:

    IEnumerator<string> e = ((IEnumerable<string>)my_cool_strings).GetEnumerator();
    try
    {
      string s; // declared once, outside the while loop
      while (e.MoveNext())
      {
        s = (string)e.Current;
        query = query.Where(t => t.name == s);
      }
    }
    finally
    {
      if (e != null) ((IDisposable)e).Dispose();
    }

The body of the while loop corresponds to the body of the foreach loop, and we can see that the iteration variable s is declared outside of it, before the while loop starts.
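
The effect can be reproduced without any query provider. The sketch below writes the expansion out by hand, so it behaves the same regardless of compiler version, and collects plain Func<string> delegates instead of Where clauses. SharedVariableSketch and readers are names made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class SharedVariableSketch
{
    public static List<string> Run()
    {
        string[] my_cool_strings = { "foo", "bar" };
        var readers = new List<Func<string>>();

        IEnumerator<string> e = ((IEnumerable<string>)my_cool_strings).GetEnumerator();
        try
        {
            string s;                   // a single variable for the whole loop
            while (e.MoveNext())
            {
                s = e.Current;
                readers.Add(() => s);   // every lambda captures that same variable
            }
        }
        finally
        {
            e.Dispose();
        }

        // The lambdas run only now, after the loop has left "bar" in s.
        var seen = new List<string>();
        foreach (var read in readers)
            seen.Add(read());
        return seen;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // prints "bar, bar"
    }
}
```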

The most common way to run into this behavior is to close over the iteration variable, and it has an easy workaround:

foreach (var s in my_cool_strings)
{
    var s_for_closure = s; // a fresh variable in every iteration
    query = query.Where(t => t.name == s_for_closure); // no longer an access to a modified closure
}
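
A minimal way to check the workaround, again using plain delegates rather than a real query (CopyWorkaroundSketch and readers are hypothetical names): because each iteration declares its own s_for_closure, each lambda keeps its own value.

```csharp
using System;
using System.Collections.Generic;

public static class CopyWorkaroundSketch
{
    public static List<string> Run()
    {
        string[] my_cool_strings = { "foo", "bar" };
        var readers = new List<Func<string>>();

        foreach (var s in my_cool_strings)
        {
            var s_for_closure = s;                // declared inside the loop body
            readers.Add(() => s_for_closure);     // captures this iteration's copy
        }

        var seen = new List<string>();
        foreach (var read in readers)
            seen.Add(read());
        return seen;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // prints "foo, bar"
    }
}
```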

Eric Lippert has posted on his tremendous blog a two-part series (2) describing why this design was chosen.

Personally, the argument I find most convincing is that having a new variable in each iteration would be inconsistent with the for(;;) style loop: you would not expect to get a new int i in each iteration of for (int i = 0; i < 10; i++).
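
The for(;;) case is easy to try out. In the sketch below (ForLoopCaptureSketch is a made-up name) there is one variable i for the whole loop, so all captured lambdas observe its final value.

```csharp
using System;
using System.Collections.Generic;

public static class ForLoopCaptureSketch
{
    public static List<int> Run()
    {
        var readers = new List<Func<int>>();

        // A single i is shared by all iterations, and therefore by all lambdas.
        for (int i = 0; i < 3; i++)
            readers.Add(() => i);

        // By now the loop has ended and i is 3.
        var seen = new List<int>();
        foreach (var read in readers)
            seen.Add(read());
        return seen;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // prints "3, 3, 3"
    }
}
```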

That said, it is hard to disagree with the comment Random832 left under my opinion on StackOverflow:
Ultimately, what people actually want when they write this isn't to have multiple variables, it's to close over the value. And it's hard to think of a usable syntax for that in the general case.

Ultimately, Eric Lippert announced that the C# team is going to make this breaking change: C# 5 will place the loop variable logically inside the loop, which makes what is discussed in this post obsolete and makes the original foreach loop closure work as most people expect. (2)

(1) An example of a StackOverflow question I participated in: Is there a reason for C#'s reuse of the variable in a foreach?

(2) Eric Lippert's posts on the subject: part one, part two