Monday, January 23, 2012

Closure over foreach variable in C#

I have decided that I will dedicate the very first post in my brand new blog to the topic which I have noticed has pretty high coverage in StackOverflow (1) which I have been participating in recently. I believe it is also a case in other places where programmers exchange their knowledge.

The typical example of the post is that the original poster is puzzled that the following piece of code does not work as they would expect it to:

var query = original_query;
foreach (var s in my_cool_strings)
{
   query = query.Where(t => t.name == s);
}

Let as assume that:
my_cool_string = new string [] { "foo", "bar" };

Most people will expect the constructed query to be equivalent to
original_query.Where(t => t.name == "foo").Where(t => t.name == "bar")

However, it is not a case, in fact the result of the presented foreach loop will be the following query:

original_query.Where(t => t.name == "bar").Where(t => t.name == "bar")
(Observe doubled bar instead of foo).

If you are not familiar with the facts that:
1) The lambda closes s by variable, not value.
2) The s variable is external to the foreach block
you may be surprised by it.

Lambdas are simply constructed, not evaluated inside the block (well, this is what the lambdas are meant for). Thus, both reference the same s variable - external to the block, which after loop terminates contains the value "bar".

This is a case because the foreach loop roughly translates to the following code:

{
    IEnumerator<string> e = ((IEnumerable<string>)my_cool_strings).GetEnumerator();
    try
    { 
      string s; 
      while(e.MoveNext())
      {
        s = (string)e.Current;
        query = query.Where(t => t.name == s);
      }
    }
    finally
    { 
      if (e != null) ((IDisposable)e).Dispose();
    }
} 

The body of while corresponds to the body of foreach loop and we can see that the iteration variable is outside the block (line 6).

The most common problem with this behavior is making a closure over iteration variable and it has an easy workaround:

foreach (var s in my_cool_strings)
{
    var s_for_closure = s;
    query = query.Where(t => t.name == s_for_closure); // access to modified closure

Eric Lippert in his tremendous blog has posted a two-episode series of posts (2) describing while this design was chosen.

Personally, the most convincing argument to me is that having new variable in each iteration would be inconsistent with for(;;) style loop as you would not expect to have a new int i in each iteration of for (int i = 0; i < 10; i++).

Although, it is hard not to agree with the comment of Random832 placed under my opionion on StackOverflow:
Ultimately, what people actually want when they write this isn't to have multiple variables, it's to close over the value. And it's hard to think of a usable syntax for that in the general case.

Ultimately, Eric Lippert has announced that the C# team is going to take this breaking change and C# 5 will place the loop variable logically inside the loop, rendering no longer valid what discussed in this post and making the original foreach loop closure work as most people expect. (2)

(1) The example of StackOverflow item I was participating in: Is there a reason for C#'s reuse of the variable in a foreach?.

(2) The Eric Lippert's posts on the subject: part one, part two

3 comments:

  1. This made me remind of my college days, we used to fight hard to write such codes. As such being an developer for 2 years this is nothing new for me, but beginners will surely find this useful for them .

    ReplyDelete
    Replies
    1. Believe me or not, but a lot of experienced developers get caught on this as well. It's nothing new for you, because you have already struggled with it. But it is not intuitive and obvious and C# team even decided to make a breaking change and alter this behavior.

      Delete
  2. Oh really useful publish actually i am struggling lots of your time and effort on online but now i got what i want.Thanks again keep sharing

    ReplyDelete