Friday, May 09, 2008

Why not come work on the Common Language Runtime team at Microsoft?  My team is a team of SDETs (Software Development Engineers in Test) that works in the really low levels of the CLR.  We cover areas like assembly loading/fusion, AppDomains, the shim (mscoree), and interop.

As an SDET, you are in charge of all aspects of quality for the areas you own. This means you get to be involved in the design process and be literally the first person developing code using new features coming out of the team.  I've found my last year and a half here to be extremely satisfying.

This is a pretty exciting time to join the team as we're working on some long-lead items for the next version of the CLR and we're also busy shipping Silverlight 2 and .NET Framework 3.5 SP1.

You'll need strong coding, problem-solving, and communication skills.  Some background on the CLR and managed code is a plus, but not required.  If you think you've got what it takes to join the team, check out the job details and submit your resume.  I'm happy to answer questions about the team and the jobs within reason, so feel free to ping me at marklio at [you-know-where].com.

There are several openings in other areas of the team as well.  Feel free to search the other CLR jobs and find the one that's right for you.

posted on Friday, May 09, 2008 10:40:09 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
 Wednesday, April 23, 2008

I've been doing some app-building with Silverlight lately and exploring the limitations of the platform in comparison to the full desktop CLR and what that means for the Silverlight "ecosystem".  Those limitations can be summed up with 3 items:

  1. Strict sandbox security model
  2. Reduced managed framework surface area
  3. No binary compatibility with libraries targeting the desktop CLR (reduces the number of 3rd party components you can leverage).

I think #3 will begin to become a non-issue as more component providers provide builds for Silverlight. #2 can be broken down into 2 areas:

  • Full technology areas that are unavailable on Silverlight (LinqToSql, WinForms, etc)
  • Reduced/pruned APIs

The second is where my interest lies, and it intersects with #1 (the new security model).  We've done extensive threat modeling against our APIs and run automated tooling over them.  As a result, certain APIs have been removed (or made internal), and others have been marked as SecurityCritical, meaning they cannot be called directly from "user" code.

In addition, the security model requires safe, verifiable user code.  Normally when developing on the desktop CLR, unless you are specifically targeting a low trust environment, you can do whatever you like.

So, for this exercise, I pulled out my trusty STDF parser (blogged here and here).  I've used this project as a test vehicle for both v2.0 and Orcas, and it's served well as a project that leverages a large cross-section of features in the Framework, from high-level stuff like Linq down to expression tree inspection and further down to LCG/RefEmit.

In short, I was able to get the parser working without too much trouble. I came away feeling that a single source base intended to target both Silverlight and the Desktop should be an attainable goal.  You just have to switch your mindset from binary compatibility to source compatibility.

I combated the reduced surface area with extension methods, which worked quite well to centralize the "overload shims" that needed to be "Silverlight-only", as well as for a general refactoring tool.  My goal is to have all the desktop vs. Silverlight differences centralized into files that are either included or excluded from the build depending on which platform I'm targeting.  I wish you could create extension properties.  That would let me close all the surface area discrepancies that aren't caused by missing/irrelevant technology areas.
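Here's a rough sketch of what one of those shims might look like.  Everything in it is illustrative rather than lifted from the parser: it assumes, purely for the sake of example, that the single-argument Encoding.GetString(byte[]) overload is missing on Silverlight, and it uses the SILVERLIGHT compilation symbol with #if to stand in for including or excluding a whole file from the build.

```csharp
using System;
using System.Text;

#if SILVERLIGHT
// Compiled only into the Silverlight build. ASSUMPTION for illustration:
// the one-argument GetString overload is unavailable there, so this
// extension method supplies it in terms of the three-argument overload.
public static class EncodingShims {
    public static string GetString(this Encoding encoding, byte[] bytes) {
        return encoding.GetString(bytes, 0, bytes.Length);
    }
}
#endif

// Shared source: the call below binds to the real instance method on the
// desktop and to the shim (when it is compiled in) on Silverlight.
public static class SharedCode {
    public static string Decode(byte[] bytes) {
        return Encoding.UTF8.GetString(bytes);
    }
}

public static class ShimDemo {
    public static void Main() {
        Console.WriteLine(SharedCode.Decode(new byte[] { 0x53, 0x54, 0x44, 0x46 })); // STDF
    }
}
```

The point is that the shared code never mentions the platform; only the shim does.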

I was pleased that 99% of my LCG codegen stuff "just worked".  I make heavy use of Reflection.Emit via DynamicMethod to generate my record parsers based on attributes on the record classes.  The 2 problems I ran into were:

  • Visibility restrictions - The new security model won't let my dynamic methods see internals.  I had used this ability to keep the API clean.  I'm still figuring out the best approach, but it was simple enough just to expose those methods.
  • Verifiability - I had a few places where I was generating unverifiable code.  Some of these were my own codegen bugs, but others were just bad assumptions on my part.

This brings me to constrained callvirt, an interesting little IL tidbit I discovered in the porting process.  Calling conventions are subtly different between reference types and value types, and for a value type the right sequence also depends on whether the given method is actually overridden in the value type (which can lead to confusing breaks when that state changes).  In the v1.x days, you always knew the type you were dealing with, so it didn't matter that much and you could generally always create the right sequence of IL to make a call.  V2.0 introduced generics, which meant that the code being emitted could no longer know whether a type parameter would turn out to be a reference type or a value type, so there needed to be a way to unify the IL for the 2 cases.  This is where the constrained prefix came in.  It allows you to write unified IL that works regardless of whether you're working with a value type or a reference type (think generics constrained by an interface, or calling a method defined on System.Object like ToString()).

Anyway, I was able to fix my unverifiable code by utilizing the constrained prefix.  It also simplified my codegen logic significantly in a number of places where I had different paths based on whether I was working with a value type or not.
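To make that concrete, here's a minimal sketch (not the parser's actual codegen; the class and method names are made up) that emits the same four IL instructions for any T and calls ToString() through the constrained prefix:

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class ConstrainedDemo {
    // Emits: ldarga.s 0 / constrained. T / callvirt Object::ToString / ret
    // The same sequence is used whether T is a value type or a reference type.
    public static Func<T, string> BuildToString<T>() {
        DynamicMethod dm = new DynamicMethod(
            "ToStringVia" + typeof(T).Name,
            typeof(string),
            new Type[] { typeof(T) },
            typeof(ConstrainedDemo).Module);
        ILGenerator gen = dm.GetILGenerator();
        // constrained. expects a managed pointer to the argument, not the value
        gen.Emit(OpCodes.Ldarga_S, (byte)0);
        gen.Emit(OpCodes.Constrained, typeof(T));
        gen.Emit(OpCodes.Callvirt, typeof(object).GetMethod("ToString", Type.EmptyTypes));
        gen.Emit(OpCodes.Ret);
        return (Func<T, string>)dm.CreateDelegate(typeof(Func<T, string>));
    }

    public static void Main() {
        Console.WriteLine(BuildToString<int>()(42));        // 42
        Console.WriteLine(BuildToString<string>()("abc"));  // abc
    }
}
```

Without the prefix, the value-type path would need either a box or a direct call to the override, which is exactly the kind of branching the prefix lets you delete.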

All in all, I was pleased with the results.  I'll be posting a sample Silverlight app using the library when I get some UI stuff figured out.

posted on Wednesday, April 23, 2008 12:43:46 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
 Monday, February 25, 2008

Some of my posts are really reactions to search queries that have previously landed on my blog.  If they did a search that got to my blog, but I know they didn't find what they were looking for, chances are they (or someone else) will do the same again.  And, if I HAVE the information they are looking for, it makes sense to just add the information, even if it's what I would consider well-known or common sense information. (common sense for software developers, that is)

One general search query I see again and again is something like "What is Action<T> for?" or "What is Func<T>?"

These are framework-provided, generic delegate types.  If you'll recall, delegates can be thought of as type-safe function pointers.  A delegate type really just captures a signature as a "callable" object.  Leveraging generics to define delegate types that can capture common signatures is goodness, since they are very flexible and can be used by anyone.  This also aids in interop between different components, since a general signature is far more interoperable than custom delegate types.

In v2.0, several functional-looking APIs were added that took delegates as arguments (think List<T>).  Instead of adding a special delegate type for each API, several "generic" delegates were added to capture the "essence" of a signature: Action<T>, which takes a T and does some action (returning void); Predicate<T>, which takes a T and returns bool (presumably doing some test against T); Comparison<T>, which compares 2 T's; etc.

In v3.5, even more generalized functional patterns were introduced (used heavily in Linq).  And we added a bunch more Action<> "overloads" for functions returning void, and added Func<> "overloads" for functions with a return value.  (I use overload loosely since these are classes and not methods) These patterns dropped the semantic "meaning" of the delegate, and just went straight to the idea of capturing a signature.

These framework-provided delegates are useful in your own code in place of custom delegate types.  Whether you leverage the Linq-centric, super-generic Action/Func pattern, or opt for the more meaningful v2.0 Predicate, Comparison, etc. is up to you.
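As a quick illustration (the class and method names here are just for the example), this sketch uses the v2.0 delegates with List<T> and then a v3.5 Func<>:

```csharp
using System;
using System.Collections.Generic;

public static class GenericDelegateDemo {
    // Predicate<T> and Comparison<T> plug straight into the List<T> APIs.
    public static List<int> EvensDescending(List<int> numbers) {
        Predicate<int> isEven = delegate(int n) { return n % 2 == 0; };
        Comparison<int> descending = delegate(int a, int b) { return b.CompareTo(a); };
        List<int> evens = numbers.FindAll(isEven);
        evens.Sort(descending);
        return evens;
    }

    public static void Main() {
        List<int> evens = EvensDescending(new List<int> { 3, 8, 1, 6, 4 });

        // Action<T>: takes a T, returns void.
        Action<int> print = delegate(int n) { Console.WriteLine(n); };
        evens.ForEach(print);            // 8, 6, 4 (one per line)

        // Func<T1, T2, TResult>: just a signature, no implied meaning.
        Func<int, int, int> add = (a, b) => a + b;
        Console.WriteLine(add(2, 3));    // 5
    }
}
```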

posted on Monday, February 25, 2008 11:12:15 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
 Tuesday, February 12, 2008

This is one of those entries that attempts to fill a void in online search for a particular topic.  I ramble on for a while to give enough context so that a search engine can match it up in a relevant manner.

I was debugging something the other day, and thought I had come across a heinous bug in the CLR.  Turns out, everything was working fine and the bug was in the app.

A program was crashing and it was a managed exception, so I attached to it with VS2008 and dug in.  The first thing I noted was that this was a retail build, so my stack was collapsed in a number of places... expected, keep moving.  Then, I noted an interesting frame on my stack (this is a contrived example, not the actual thing I saw):

>    CanonTest.exe!CanonTest.GenericType<System.__Canon,int>.SomeOperation() + 0x55 bytes   

That's weird, what is System.__Canon and what's it doing here? Surely this must be a horrible CLR bug where my method tables are being corrupted!  Internet searches seemed to confirm my thoughts: a few others with random __Canon's showing up on the stack did look like bugs.  After a bit more reasoning, I came to the conclusion that System.__Canon must be special in some way, since it follows the pattern for such things: marked internal and prefixed with a double underscore.

A few emails later, I had my answer.  System.__Canon is the special type that is used to identify "canonical" generic type instantiations. It typically only shows up in release builds, so you don't normally see it on the stack while debugging.  So, it's easy to assume something's wrong when you do see it.

If you'll recall, one of the really cool things about the Generics implementation is that it allows for code sharing.  Jitted methods can be shared between compatible type instantiations.  For instance, the code for SomeGenericType<string> can likely be shared with SomeGenericType<Foo>, where Foo is another reference type.  (We don't currently share code between value type instantiations, so you'll continue to see those on the stack, like the int in the example.)  That shared code needs to live somewhere, so we have the concept of a "canonical" generic type instantiation that is identified by __Canon as the type parameter.

Also, in a lot of debugger stack representations you'll get the GenericType`2 form rather than the expanded form.  In that case, you never see __Canon.

For some more info on Generics and code sharing, see Joel Pobar's excellent blog entry on the subject.

posted on Tuesday, February 12, 2008 11:56:35 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
 Wednesday, August 29, 2007

I wish I had time to come up with more concrete information (examples/code) in this post, but I don't have the time to work that stuff up.  I did think it would be useful for people searching for solutions to this problem, so here it is in all its ambiguity.

I was playing with expression tree inspection and dynamic interpretation the other day, when I hit something that I was sure was a bug.  I was inspecting an expression tree, identifying "branches" of interest, and generating lambda expressions from them on the fly.

You might do this to break up an expression into separate units of execution to spread across multiple processors (a la PLINQ), or to replace parts of a tree requiring local execution before passing off to another layer to be transformed into another domain like SQL, or whatever.  In any case, I was doing it.

I found that if the type of the expression was a value type, I could not create a lambda expression returning object from it, even though there is an inheritance relationship.  You get a fairly straightforward, but perhaps surprising exception.

After some back and forth with the Linq team, I discovered that this was by design.  In the case of value types, the boxing operation required to make an object must be represented by a unary convert expression. The solution is to wrap such expression trees with a call to Expression.Convert(expression, typeof(object)).
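Since I didn't have an example at the time, here's a small sketch of the problem and the fix.  The helper name CompileAsObject is made up for illustration.

```csharp
using System;
using System.Linq.Expressions;

public static class BoxingLambdaDemo {
    // Wraps value-typed bodies in an explicit Convert (the boxing step)
    // before building a lambda that returns object.
    public static Func<object> CompileAsObject(Expression body) {
        if (body.Type.IsValueType) {
            body = Expression.Convert(body, typeof(object));
        }
        return Expression.Lambda<Func<object>>(body).Compile();
    }

    public static void Main() {
        Expression sum = Expression.Add(Expression.Constant(40), Expression.Constant(2));

        try {
            // The body's type is int, the lambda's return type is object.
            Expression.Lambda<Func<object>>(sum);
        } catch (ArgumentException ex) {
            Console.WriteLine("Without Convert: " + ex.GetType().Name);
        }

        Console.WriteLine(CompileAsObject(sum)());  // 42
    }
}
```

Reference-typed bodies pass through untouched, since no boxing is involved there.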

posted on Wednesday, August 29, 2007 4:17:11 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
 Thursday, July 26, 2007

An interesting pattern I've seen emerge since the early releases of Orcas is what I might refer to as "delegate properties".  What I mean by that is a property (or field, I suppose) that returns a delegate.  This pattern has some interesting implications.

First, in a language that treats delegates as directly callable objects, this pattern looks just like a method (someInstance.TheProperty(args)).  You can't tell the difference (although VS gives you different intellisense) by looking at a callsite like this. Among other things, this leads to some interesting naming issues.  Do you name it like you would a method?

Second, it opens up opportunities to do some really powerful (and slightly insane) hybrid inheritance models.  Think about a virtual delegate property that has both a getter and a setter, now think about trying to predict what that delegate will do when you call it.  It doesn't sound like something to recommend as part of a public API, but I think there are some interesting scenarios there.
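A tiny sketch of the pattern, with made-up names, shows both points: the callsite reads like a method call, and a settable (and virtual) delegate property means the behavior behind that callsite can change underneath you.

```csharp
using System;

public class LabelFormatter {
    private Func<string, string> _format =
        delegate(string s) { return s.ToUpperInvariant(); };

    // A "delegate property": the getter hands back something callable.
    public virtual Func<string, string> Format {
        get { return _format; }
        set { _format = value; }
    }
}

public static class DelegatePropertyDemo {
    public static void Main() {
        LabelFormatter f = new LabelFormatter();
        Console.WriteLine(f.Format("hello"));   // HELLO

        // Same callsite, different behavior after the setter runs.
        f.Format = delegate(string s) { return s + "!"; };
        Console.WriteLine(f.Format("hello"));   // hello!
    }
}
```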

If I come up with something interesting and useful, I'll let you know.

posted on Thursday, July 26, 2007 12:56:27 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
 Monday, July 16, 2007

If you have been following my series on delegates, you may have experimented with open-instance delegates and perhaps found it difficult to create an open-instance delegate for a value type.

If you'll recall, an open-instance delegate has an extra first parameter, used to pass the instance used for the invocation.  What's not made explicitly clear is that this first parameter must be passed by reference.

For reference types, you've automatically got a reference, but for value types, this must be a "ref" parameter.  For instance, a delegate type used as an open-instance delegate for Int32.CompareTo would have to be defined something like:

delegate int IntCompareToDelegate(ref int instance, int other);

Otherwise, you'll get a System.ArgumentException when you try to bind the method to the delegate, giving you the ever-helpful error message: "Error binding to target method".
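For completeness, here's a small sketch of the working case.  It declares the delegate type described above and binds it with Delegate.CreateDelegate; the demo class name is just for illustration.

```csharp
using System;
using System.Reflection;

// First parameter is the instance, passed by reference because int is a value type.
public delegate int IntCompareToDelegate(ref int instance, int other);

public static class OpenInstanceValueTypeDemo {
    public static IntCompareToDelegate BindCompareTo() {
        MethodInfo compareTo = typeof(int).GetMethod("CompareTo", new Type[] { typeof(int) });
        // No target object is supplied, so the delegate is open over the instance.
        return (IntCompareToDelegate)Delegate.CreateDelegate(
            typeof(IntCompareToDelegate), compareTo);
    }

    public static void Main() {
        IntCompareToDelegate compare = BindCompareTo();
        int five = 5;
        Console.WriteLine(compare(ref five, 3) > 0);  // True
    }
}
```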

There are lots of underlying reasons for this, both from a calling convention perspective, as well as a side-effect perspective.  But, you can simplify it by thinking about modifications to the instance.  If you passed by value (creating a copy that the method acted on), any changes made to the instance by the method would be lost because they happened to a copy.

In most cases, value types are immutable in the framework, but you could run into issues with your own types.  And, again, this isn't the only reason for this restriction (take a look at the IL generated for a value-type method call to get some more ideas).  It's just the easiest to understand.

If you'll recall, Orcas extension methods, which are similar in concept to this, do not follow this pattern and are subject to the infamous value type copying problems.

posted on Monday, July 16, 2007 11:24:58 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
 Wednesday, June 13, 2007

I was helping a friend with a problem recently.  He was taking a set of serial web service calls and doing them in parallel to save time, and was not up-to-speed on the best approach for that.  Once he settled on an approach, he realized that since his web service calls were being wrapped in an abstraction layer, he didn't have the Begin/End asynchronous call methods that are provided by the proxy class.

"No problem, just wrap them in a delegate".  The compiler automatically gives you Begin/EndInvoke methods in addition to the synchronous Invoke method.  And, you're guaranteed not to mess up the implementation because it's all provided by the CLR!  Just one of those things you might forget if you find yourself in the same situation.

posted on Wednesday, June 13, 2007 3:37:55 PM (Pacific Standard Time, UTC-08:00)  #    Comments [2]
 Thursday, May 17, 2007

After my last few CLR posts, I've had a couple of private inquiries regarding the usefulness of closed static delegates.  To bring everyone up to speed, a delegate pointing to an instance method needs a "target" instance to operate on (we'll get to open instance delegates later).  A static method needs no such target, so we can leverage the "space" used for the instance case to carry around another object of interest.  We call a delegate with a provided value for this space "closed over the first argument".

For example, let's say we have a static method that does some operation on two strings.  For simplicity, let's just say it concatenates them.  (The bound argument has to be a reference type, since it rides in the slot normally used for the target object; the CLR won't close a static delegate over a value-type first argument, so a method taking doubles wouldn't work here.)  Our silly class and method might look like this:

public static class StringFunctions {
   public static string Concat(string first, string second) {
      return first + second;
   }
}

Normally, a delegate for this method would look like:

public delegate string BinaryOperation(string first, string second);

But, we're going to create a closed static delegate, which means we're going to "burn" the first argument into the delegate itself, so it's not needed in the delegate signature.  Instead, we'll use the following delegate signature (I didn't spend much time thinking up these names, I hope they make sense):

public delegate string ClosedCall(string other);

So, how do we create the delegate?  Since C# (pre-Orcas) doesn't have syntax for creating closed static delegates, you are forced to use one of the Delegate.CreateDelegate overloads:

ClosedCall greet = (ClosedCall)Delegate.CreateDelegate(
        typeof(ClosedCall),
        "Hello, ",
        typeof(StringFunctions).GetMethod("Concat", BindingFlags.Public | BindingFlags.Static));

Of course, we just spent 2 entries looking at a helper that can do this for us (I'm not claiming this is better, I just want you to be able to see what's happening):

ClosedCall greet = DelegateBinder.Bind<ClosedCall>("Hello, ",
        typeof(StringFunctions).GetMethod("Concat", BindingFlags.Public | BindingFlags.Static));

Now, a call to greet(someName) will yield "Hello, " followed by the supplied argument.  This is a contrived example, but you could imagine taking a method (perhaps generated on the fly via LCG), and "attaching" an instance to it via the first argument.  Then, you could call it many times with different subsequent arguments, or pass it to another component that would provide the rest of the arguments.  In this way, you get the benefits of not having to keep track of an instance, without having to own the API for the instance.  Additionally, you could "chain" delegates together so that many arguments are captured in a stack of delegate calls, allowing closure-type semantics at the cost of some stack space (although since C# has closure support, you'd never really need to do that).

What's really cool is that with C# 3.0's Extension Methods feature, we now have language support for creating early-bound closed static delegates.  If you bind a delegate to an extension method (using the regular syntax for an instance method), you will get the exact IL for creating an early-bound closed static delegate without our fancy helper class.  Let's see how that would look.  Let's use a different example to keep us on our toes.  Here's a helper function that creates email addresses:

public static class StringExtensions {
   public static string MakeEmailAddressWithAlias(this string domain, string alias) {
      return string.Format("{0}@{1}", alias, domain);
   }
}

Notice the "this" in front of the first parameter; it tells the compiler that the method should be considered when resolving method calls on string.  We'll use one of the delegate types provided in Orcas.  Now, here's how the bind looks:

string fooDotCom = "foo.com";
Func<string, string> makeFooDotComAddress = fooDotCom.MakeEmailAddressWithAlias;

string email = makeFooDotComAddress("bar");

So, the result is that email will be bar@foo.com.

Hopefully, through these contrived examples, you can see the scenarios that closed static delegates enable, as well as learn how you can create one the easy way with extension methods in Orcas.

posted on Thursday, May 17, 2007 9:46:22 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
 Monday, May 14, 2007

In my last post, I showed a nifty way of constructing "early-bound" delegates using LCG.  Here's the same helper class implemented without LCG:

public static class DelegateBinder {
	public static TDelegate Bind<TDelegate>(object firstArg, MethodInfo method) {
		return (TDelegate)Activator.CreateInstance(
			typeof(TDelegate),
			firstArg,
			method.MethodHandle.GetFunctionPointer());
	}
} 

This one is quite a bit simpler, and extrapolating from what we learned last time, it's easy to see what's happening.  Hopefully, you are already familiar with the Activator class.  Basically, this just shows the managed call chain that produces a function pointer to a method given a MethodInfo.

I really like the LCG-based implementation, but only because of my love of DynamicMethod.  It's pretty complex, and aside from opportunities for caching, doesn't really have anything over this implementation. This one is just plain simple, and would have a single-line implementation if I hadn't put in some line breaks to avoid formatting problems.  It does, however, highlight the annoyance of having to work around the compilers' "helpfulness" when it comes to delegate construction.  If only I could just call the constructor directly.

It is worth noting that this doesn't work in the Silverlight 1.1 alpha or the compact framework (or XNA for that matter), none of which expose RuntimeMethodHandle.GetFunctionPointer().

posted on Monday, May 14, 2007 3:40:01 PM (Pacific Standard Time, UTC-08:00)  #    Comments [2]
 Friday, May 11, 2007

In a previous post about delegates, I discussed the following interesting cases of delegates:

  • Closed static
  • Open instance

See the previous post for the full explanation, but these basically open up some interesting dynamic scenarios.  The problem is that C# and VB do not expose syntax for constructing these in an "early-bound" fashion, that is using the special constructor on the delegate type rather than Delegate.CreateDelegate (which more or less binds via reflection).

For most scenarios this is not a huge problem, but there are some performance considerations and other issues to consider that I don't really want to dig into at the moment.  One sufficiently important scenario is testing early-bound invocation.  If your language doesn't support something, how can you test it?  Well, you can write the whole test in IL, but that is not a terribly maintainable proposition.

Another option is to only write the part you need in IL.  Unfortunately, C# doesn't allow you to write inline IL, but you can use Reflection.Emit.  And, since v2.0, you can use LCG (Lightweight Code Generation) via DynamicMethod.

The trick here is to understand how delegates are instantiated.  Delegates are just classes like any other.  They inherit from MulticastDelegate (typically).  The special part is that the runtime provides all the implementation and they have a special constructor.  Here's (approximately) the constructor signature for System.Action<T>:

public Action(object o, IntPtr method)

Object? IntPtr?  What the heck? Well, it's not as bizarre as you might think.  The object is simply the first argument for the invocation.  This allows binding to a particular instance ("this" for instance methods, arg 0 for static methods). The IntPtr is a pointer to the method.  "Pointers?!!?!?! in managed code?!?!" you say?  That's right, a pointer.  An object is easy enough to come by, but where do I get the pointer?  Well, the pointer can be easily retrieved via the ldftn opcode.  It loads the address of a given method (described via a token in IL, and a MethodInfo in Reflection.Emit).

Let's cut to the chase.  Here's a little class that can bind a method to a delegate type and allow you to provide the first argument (you'll need System, System.Reflection, System.Reflection.Emit using statements):

public static class DelegateBinder {

    public delegate TDelegate Binder<TDelegate>(object firstArg);

    public static TDelegate Bind<TDelegate>(object firstArg, MethodInfo method) {
        DynamicMethod dynMethod = new DynamicMethod("PassthroughBinderImplementation", typeof(TDelegate), new Type[] { typeof(object) }, typeof(DelegateBinder));
        ILGenerator gen = dynMethod.GetILGenerator();
        //load the first argument
        gen.Emit(OpCodes.Ldarg_0);
        //load the address of the method
        gen.Emit(OpCodes.Ldftn, method);
        //create the delegate
        gen.Emit(OpCodes.Newobj, typeof(TDelegate).GetConstructor(new Type[] { typeof(object), typeof(IntPtr) }));
        gen.Emit(OpCodes.Ret);
        return ((Binder<TDelegate>)dynMethod.CreateDelegate(typeof(Binder<TDelegate>)))(firstArg);
    }
}

With this class, you can dynamically construct all the early-bound variants (ignoring variants for signature relaxation) like so:

using System;
using System.Reflection;
using System.Reflection.Emit;

public delegate string Passthrough(string str);
public delegate string BoundPassthrough();
public delegate string ProgramPassthrough(Program p);

public class Program {
    static void Main(string[] args) {
        Console.WriteLine("Open Static:");
        Passthrough ospt = DelegateBinder.Bind<Passthrough>(null, typeof(Program).GetMethod("StaticImplementation", new Type[] { typeof(string) }));
        Console.WriteLine(ospt("Hello World"));

        Console.WriteLine("Closed static:");
        BoundPassthrough cspt = DelegateBinder.Bind<BoundPassthrough>("Hello World", typeof(Program).GetMethod("StaticImplementation", new Type[] { typeof(string) }));
        Console.WriteLine(cspt());

        Console.WriteLine("Open Instance:");
        ProgramPassthrough oipt = DelegateBinder.Bind<ProgramPassthrough>(null, typeof(Program).GetMethod("InstanceImplementation", Type.EmptyTypes));
        Console.WriteLine(oipt(new Program("Hello World")));

        Console.WriteLine("Closed Instance:");
        BoundPassthrough cipt = DelegateBinder.Bind<BoundPassthrough>(new Program("Hello World"), typeof(Program).GetMethod("InstanceImplementation", Type.EmptyTypes));
        Console.WriteLine(cipt());
    }

    public static string StaticImplementation(string str) {
        return str;
    }

    public Program(string payload) {
        _Payload = payload;
    }

    string _Payload;

    public string InstanceImplementation() {
        return _Payload;
    }
}

So, there are certainly cases that will break this, most involving incompatible signature issues between the method, delegate, and the first argument.  But I didn't want to make things more complicated for an example. Besides, the point of this is not really to give you some neat tool (you'll probably never need to do this), but to give people a better idea what the compiler is doing for you when you create a delegate.

posted on Friday, May 11, 2007 2:54:23 PM (Pacific Standard Time, UTC-08:00)  #    Comments [1]
 Tuesday, March 20, 2007

I've been toying with Orcas extension methods recently, and I came across a situation that could be problematic.  The problem involves value types.  Generally, the advice to people is to always make value types immutable.  That is, once an instance is constructed, its state cannot be changed. There are a number of reasons behind this, and now there's one more.

Extension methods allow you to make a static method look like an instance method on another class via the "this" keyword on the first parameter.  The compiler will then use that method if it is in scope when resolving methods in code. So, the first parameter to the method behaves roughly like the "this" pointer.  However, there is a subtle difference when extending value types in this way.

In a regular instance method, the "this" parameter is passed by reference, even for a value type. (In IL, you load an address onto the stack rather than the instance)  This allows you to change the state of the object within the method.  However, for extension methods, the first parameter is passed by value.

So, if you attempt to make changes to the state of a value type in an extension method on that type, the changes won't be reflected after the method completes.

Like most value type problems, this is because you've made changes to a copy of your instance rather than the "original".  So, keep your value types immutable, or be aware of all the various gotchas of mutable value types.
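Here's a minimal sketch of the difference, using a deliberately mutable struct (all names are invented for the example):

```csharp
using System;

public struct Counter {
    public int Count;

    // Instance method: "this" is passed by reference, so the caller's copy changes.
    public void Increment() { Count++; }
}

public static class CounterExtensions {
    // Extension method: the first parameter is passed by value, so this
    // increments a copy that is thrown away when the method returns.
    public static void IncrementByExtension(this Counter counter) { counter.Count++; }
}

public static class ValueTypeCopyDemo {
    public static void Main() {
        Counter c = new Counter();
        c.Increment();
        Console.WriteLine(c.Count);   // 1
        c.IncrementByExtension();
        Console.WriteLine(c.Count);   // still 1
    }
}
```

The two calls look identical at the callsite, which is what makes this one easy to miss.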

posted on Tuesday, March 20, 2007 4:22:25 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
 Friday, February 16, 2007

So, my last post talked about delegates.  In it, I mentioned some compiler trickery involved in declaring events, but I didn't bother explaining it.  After reading it over again, and getting some feedback, I felt bad about glossing over what is pretty much the mainline scenario for delegates. So, what is an event?

An event is kind of like a broadcast. It enables an object to notify subscribers when some "event" occurs, give them relevant information about the event, and allow them to do something in response.  And, you guessed it, delegates are at the core of making this work.

Fundamentally, events are a callback mechanism, and could be implemented without delegates using anything from raw function pointers to interfaces, and the CLR doesn't keep you from doing either of those, but there's value in a consistent pattern.  In fact, the designers of the CLR felt so strongly about the value of this particular pattern, that it is part of the CLI spec (along with properties, another pattern that is implemented by other more fundamental constructs).

So, how do you make an event?  Well, in C#, you declare an event like you would declare a field whose type is some delegate and you add the "event" keyword.  So, somewhere in a type, you would have something like:

public event EventHandler Click;

Whether or not it's public depends on how you expect the event to be used.  EventHandler is a delegate with the following signature:

void EventHandler(object sender, EventArgs e);

This signature is another pattern that I'll talk about later.  For now, let's look at what the compiler does for our event declaration.  The compiler gives 3 things (if you don't count the things it already did for the delegate EventHandler):

  • A private field whose type is the delegate EventHandler
  • A (public in this case) method "accessor" for adding delegate callbacks: add_Click //Click comes from the event name
  • A (public in this case) method "accessor" for removing delegate callbacks: remove_Click

When other code wants to hook up to your event, they use the += operator on your event.  This is really syntax sugar for calling the add_Click method.  And, conversely the -= operator calls the remove accessor.
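If you want to see the generated pieces for yourself, reflection will show them.  This is only a sketch with an invented DemoButton class; the backing-field lookup assumes the C# compiler's convention of naming the private field after the event.

```csharp
using System;
using System.Reflection;

public class DemoButton {
    public event EventHandler Click;

    public void RaiseClick() {
        EventHandler handler = Click;
        if (handler != null) handler(this, EventArgs.Empty);
    }
}

public static class EventAccessorDemo {
    public static string AddAccessorName() {
        return typeof(DemoButton).GetEvent("Click").GetAddMethod().Name;
    }

    public static string RemoveAccessorName() {
        return typeof(DemoButton).GetEvent("Click").GetRemoveMethod().Name;
    }

    public static void Main() {
        Console.WriteLine(AddAccessorName());     // add_Click
        Console.WriteLine(RemoveAccessorName());  // remove_Click

        // ASSUMPTION: the compiler-generated private field shares the event's name.
        FieldInfo backing = typeof(DemoButton).GetField(
            "Click", BindingFlags.Instance | BindingFlags.NonPublic);
        Console.WriteLine(backing != null ? backing.FieldType.Name : "(no field found)");
    }
}
```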

Interestingly, you can write your own implementation for the event pattern.  You might want to do this to save size in a possibly large tree structure with lots of events at each node.  ASP.net does this with controls.  Rather than every Control having tons of fields for each event, it has a sparse dictionary of event delegates, that is only populated for events that have "subscribers".  With a tree that can easily have thousands of controls per page view, this results in a sizeable savings.  How do you do this?  Well, in C#, you use the little known syntax:

public event EventHandler Click {
   add { /* do something with value in here */ }
   remove { /* do something with value in here */ }
}

Looks like a property, eh?  This causes the compiler to skip generating the private field and the default accessor bodies.  Instead, your add and remove blocks become the accessors that do the adding and removing (via the value keyword, just like properties).  In them, you can do anything you want, although it's advisable to keep the same semantics as the default implementations.
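To make the sparse-dictionary idea concrete, here's a rough sketch.  The Node type, its ClickKey, and the helper methods are all hypothetical names of mine, and this is not how ASP.NET actually implements it; it only shows the shape of the technique.  It also isn't thread-safe, which the default accessors try to be.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical tree node that pays for event storage only when someone subscribes.
public class Node
{
    // One key object per event the type exposes.
    static readonly object ClickKey = new object();

    // Stays null until the first subscription on this particular node.
    Dictionary<object, Delegate> handlers;

    public event EventHandler Click
    {
        add { AddHandler(ClickKey, value); }
        remove { RemoveHandler(ClickKey, value); }
    }

    public int SubscribedEventCount
    {
        get { return handlers == null ? 0 : handlers.Count; }
    }

    void AddHandler(object key, Delegate handler)
    {
        if (handlers == null) handlers = new Dictionary<object, Delegate>();
        Delegate existing;
        handlers.TryGetValue(key, out existing);
        // Combine keeps the default semantics: handlers accumulate in order.
        handlers[key] = Delegate.Combine(existing, handler);
    }

    void RemoveHandler(object key, Delegate handler)
    {
        Delegate existing;
        if (handlers == null || !handlers.TryGetValue(key, out existing)) return;
        Delegate remaining = Delegate.Remove(existing, handler);
        if (remaining == null) handlers.Remove(key);  // keep the dictionary sparse
        else handlers[key] = remaining;
    }

    public void RaiseClick()
    {
        Delegate d;
        if (handlers != null && handlers.TryGetValue(ClickKey, out d))
            ((EventHandler)d)(this, EventArgs.Empty);
    }
}
```

Callers still write node.Click += handler exactly as before; only the storage behind the accessors changes.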

So, let's talk a little bit about what happens when an event fires.  Let's say that several other classes have registered for your event (via the += syntax or whatever the compiler supports).  Inside your class, you simply call the delegate (there's a recommended pattern for this as well).  But wait, there's more than 1 subscriber!  Remember, delegates aren't just function pointers, and they are more powerful than using interfaces alone.  If you'll recall from the last post, I said that when you create a delegate, you're really getting a MulticastDelegate, which tracks an invocation list of delegates to run.  (This is why the standard event pattern returns void.  Otherwise, you've got the weird situation of multiple return values from what appears to be a single call.)  Under normal circumstances, each delegate in the invocation list is called in turn, and then execution resumes in the caller.
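A small sketch of the invocation list at work, using made-up names (MulticastDemo, First, Second, and a Number delegate I declare just to show the return-value wrinkle):

```csharp
using System;
using System.Collections.Generic;

public static class MulticastDemo
{
    // Hypothetical non-void delegate, only here to show what happens to return values.
    delegate int Number();

    public static List<string> Log = new List<string>();

    static void First(object sender, EventArgs e) { Log.Add("first"); }
    static void Second(object sender, EventArgs e) { Log.Add("second"); }

    static int One() { return 1; }
    static int Two() { return 2; }

    // With a non-void multicast delegate, only the last target's result comes back.
    public static int LastResult()
    {
        Number n = One;
        n += Two;
        return n();
    }

    // Builds a two-entry invocation list and calls it once.
    public static int Run()
    {
        EventHandler handlers = First;
        handlers += Second;
        handlers(null, EventArgs.Empty);  // runs First, then Second
        return handlers.GetInvocationList().Length;
    }

    public static void Main()
    {
        Console.WriteLine(Run());                            // 2
        Console.WriteLine(string.Join(",", Log.ToArray()));  // first,second
        Console.WriteLine(LastResult());                     // 2
    }
}
```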

posted on Friday, February 16, 2007 10:36:02 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
 Thursday, February 15, 2007

Soon, my ownership area will extend to include delegates.  Since I'm fairly excited about this, I thought I'd celebrate by writing a little something about them.  So, what are delegates?  A casual observer might be tempted to write off delegates as a sort of managed function pointer.  While this comparison is certainly accurate, there's much more to explaining the power of delegates.

In general, delegates are a sort of universal method dispatch mechanism.  Initially, the scenario they supported was callbacks.  Delegates are one of the things that distinguish the CLR from other VMs like Java.  Java requires the use of interfaces to implement callbacks.  (I'm only calling that out as a distinction, not saying the Java way is bad, although personally I like what delegates bring to the table.)  So, delegates let you wrap up a method as an object to pass around, with the expectation that it will be called from another context.

It's sort of hard to talk about delegates because the discussion is often framed by the language that's exposing them.  Currently, no managed language exposes them in the way that they are represented in IL.  In C# and VB, you declare a delegate by simply defining a method signature.  From an IL perspective, the compilers generate a class that inherits from MulticastDelegate (another story I'll get to later), with an Invoke method that matches your signature, and various constructors to support different things.  (You also usually get the corresponding asynchronous calling pattern support methods, but I don't want to get into that.)  Some other delegate-related compiler trickery is involved in declaring events, which I'll cover later.

Under the covers, a delegate [conceptually] contains 2 things:

  • A target object
  • A target method

Now, generally speaking, the target method is the method to be run, and the target object is the object on which the target method will be run, but there are cases where this line is blurred a bit.  For instance, when a delegate is pointing to a static method, the target object is conceptually null (internally it's not, but that's an implementation detail).  I'll get into the other cases later.
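The two pieces are visible from managed code through the Target and Method properties on Delegate.  Here's a quick sketch; Greeter, Greeting, and TargetDemo are names I made up for the example.

```csharp
using System;

public delegate string Greeting();

// Hypothetical type with one instance method and one static method to point at.
public class Greeter
{
    public string Name;
    public string Hello() { return "Hello, " + Name; }
    public static string Anonymous() { return "Hello"; }
}

public static class TargetDemo
{
    public static void Main()
    {
        Greeter greeter = new Greeter();
        greeter.Name = "Mark";

        Greeting instance = greeter.Hello;        // target object: greeter
        Greeting statik = Greeter.Anonymous;      // no target object

        Console.WriteLine(ReferenceEquals(instance.Target, greeter)); // True
        Console.WriteLine(instance.Method.Name);                      // Hello
        Console.WriteLine(statik.Target == null);                     // True
        Console.WriteLine(statik.Method.IsStatic);                    // True
    }
}
```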

So now you're saying, "Yup, that's a delegate.  Big deal.  What's so cool about that?"  What's cool about that, my friend, is that delegates are the things that power virtually all of the coolest new language features that came out in v2.0 and will be coming out, including all the dynamic language goodness like IronPython.  It's the dynamic stuff that is really exciting, so let's talk about how delegates enable dynamic languages on top of a statically typed system.

(To be fair, Jim Hugunin did his initial IronPython work before these features were available, but they now play a big role.)  One of the pieces of work done in v2.0 was called delegate relaxation.  Previously, the target method had to match the delegate signature exactly.  Now, as you might expect intuitively, the signature can be relaxed such that the target method can have "more general" parameters and return something "more specific" than the delegate's signature.  This is typically defined in terms of covariance and contravariance, terms that confuse even people who understand them.  Here's the way I usually remember it: If I could wrap the target method with a method having the delegate's signature without casting, it will work.  Anyway, this feature makes delegates quite a bit more flexible.
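Here's a small sketch of relaxation along with the "wrapper" rule of thumb.  The ReaderToObject delegate and the RelaxationDemo class are hypothetical; the reader types are the real ones from System.IO.

```csharp
using System;
using System.IO;

public static class RelaxationDemo
{
    // The delegate promises a specific parameter and a general return type.
    delegate object ReaderToObject(StringReader input);

    // The target accepts something more general (TextReader)
    // and returns something more specific (string).
    static string ReadAll(TextReader reader) { return reader.ReadToEnd(); }

    // The rule of thumb: this wrapper has the delegate's exact signature
    // and compiles without a single cast, so binding ReadAll directly works too.
    static object Wrapper(StringReader input) { return ReadAll(input); }

    public static object Run(string text)
    {
        ReaderToObject relaxed = ReadAll;  // allowed since C# 2.0
        return relaxed(new StringReader(text));
    }

    public static void Main()
    {
        Console.WriteLine(Run("relaxed"));                     // relaxed
        Console.WriteLine(Wrapper(new StringReader("exact"))); // exact
    }
}
```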

Before I go into the other features, let's talk a little about implementation.  In a normal instance method call in the CLR, the first argument is the "this" object.  (Which is why you see ldarg.0 in IL to put it on the stack.)  So, conceptually, the target object represents the first argument for the method.  (There is an implementation detail that allows static methods to be called using the same convention, which is a really elegant solution.)  So, by extending this idea of the target object simply being the first argument, we get a couple of interesting variants.

The first is what is called "closed" static delegates.  This allows you to specify the first argument of a static method at delegate creation rather than at the callsite.  Notice this maps quite nicely to the dynamic language concept of adding a method to an existing instance of an object.  The language runtime just needs to be able to track these extra methods as part of its method dispatch logic.

The second feature is "open" instance delegates.  This allows you to create a delegate that points to an instance method, but doesn't define the target object.  Instead, the delegate signature can have an extra first argument that will specify the target object at the callsite.  When used with LCG (DynamicMethod), this can be used to implement things like adding a method to an existing type.  Again, the language runtime merely needs to add the logic to method dispatch.

These 2 features are intriguing to me because they are not directly exposed from VB or C#.  I believe VB9 exposes these, but they are not accessible in an early bound way in C#.  You can, however, create them via Delegate.CreateDelegate() using reflection, or use Reflection.Emit to generate the corresponding IL.
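As a rough sketch of the reflection route, here are both variants built with the Delegate.CreateDelegate(Type, object, MethodInfo) overload.  The delegate types and the Repeat helper are made up for the example; passing a non-null first argument with a static method gives a closed static delegate, and passing null with an instance method (plus a delegate signature that spells out the extra first parameter) gives an open instance delegate.

```csharp
using System;
using System.Reflection;

public static class OpenClosedDemo
{
    delegate string RepeatBound(int count);    // first argument supplied at creation
    delegate string UpperOpen(string target);  // extra first parameter is the "this" object

    // Hypothetical static method whose first argument we will bind up front.
    public static string Repeat(string text, int count)
    {
        string result = "";
        for (int i = 0; i < count; i++) result += text;
        return result;
    }

    public static string ClosedStatic()
    {
        MethodInfo repeat = typeof(OpenClosedDemo).GetMethod("Repeat");
        // "ab" becomes the first argument of the static method.
        RepeatBound bound = (RepeatBound)Delegate.CreateDelegate(typeof(RepeatBound), "ab", repeat);
        return bound(3);
    }

    public static string OpenInstance()
    {
        MethodInfo toUpper = typeof(string).GetMethod("ToUpper", Type.EmptyTypes);
        // No target object here; the callsite provides it as the first argument.
        UpperOpen open = (UpperOpen)Delegate.CreateDelegate(typeof(UpperOpen), null, toUpper);
        return open("clr");
    }

    public static void Main()
    {
        Console.WriteLine(ClosedStatic());  // ababab
        Console.WriteLine(OpenInstance());  // CLR
    }
}
```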

Hopefully, I'll have some time in the future to do some samples of these as well as discuss more about how these improve the dynamic language support in the CLR.

posted on Thursday, February 15, 2007 2:32:53 PM (Pacific Standard Time, UTC-08:00)  #    Comments [1]
 Monday, January 29, 2007

I dealt with several situations in the past months where the crux of the problem was confusion over assemblyname and filename.  Let's define what we're talking about:

  • Filename - The name of a file in the filesystem, such as System.dll
  • Assemblyname - The name given to an assembly to establish its identity.  In this case, we'll only concern ourselves with the "simple" name, such as System

Usually, any confusion that arises between the two can be resolved by reminding people that a filesystem is just one of the places you can get an assembly from.  For instance there are APIs for getting assemblies from byte arrays.

For those that still don't see it... In the managed world, the assemblyname gives identity to the code that resides in the assembly.  If you have 2 assemblies with the same assemblyname, you expect them to represent the same identity (perhaps different versions, build flavors, bitness, etc.).  If we relied on the filesystem name, the identity of the code could change just by changing the filename.  Those aren't the semantics we expect.

So, why does the filename matter?  Why do we recommend keeping them the same?  Some of the reasons are simple convenience.  It's nice to look at a file and know what it is without cracking it open.  If the names are different, it's like me going to a party and wearing a nametag that says, "Peter".  While there is nothing keeping me from doing it, it causes confusion.  However, another more important reason to keep them the same is that assemblies are rarely loaded by filename.  References and most dynamic loads are done by assemblyname.  You don't take a reference to System.dll, you take a reference to System.  At some point, the loader has to find an appropriate file to load to satisfy that reference.  If System's filename is Peter.dll, then it's going to have a difficult time finding it to load.  This is actually the very reason that gacutil will not let you install an assembly into the global assembly cache if the filename doesn't match the assemblyname.  However, I think it's silly that it doesn't just fix the name for you.
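If you want to see the distinction for yourself, here's a small sketch that copies the running assembly's file to Peter.dll and reads the identity back out of the copy with AssemblyName.GetAssemblyName.  IdentityDemo is a made-up name, and the sketch assumes the assembly was loaded from a file on disk (so that Location is not empty) and that the temp directory is writable.

```csharp
using System;
using System.IO;
using System.Reflection;

public static class IdentityDemo
{
    // Copies this assembly's file to Peter.dll in the temp directory and
    // returns the simple name recorded in the copy's manifest.
    public static string SimpleNameOfRenamedCopy()
    {
        string original = typeof(IdentityDemo).Assembly.Location;
        string renamed = Path.Combine(Path.GetTempPath(), "Peter.dll");
        File.Copy(original, renamed, true);
        try
        {
            // Reads the identity from the manifest; the filename plays no part.
            return AssemblyName.GetAssemblyName(renamed).Name;
        }
        finally
        {
            File.Delete(renamed);
        }
    }

    public static void Main()
    {
        Console.WriteLine(typeof(IdentityDemo).Assembly.GetName().Name);
        Console.WriteLine(SimpleNameOfRenamedCopy());  // same simple name as the line above
    }
}
```

The nametag says Peter, but the identity inside is unchanged, which is exactly why a loader looking for the assembly by name won't think to look in Peter.dll.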

What about multi-module assemblies?  Well, it's the module with the assembly manifest that matters.  It's the one that should match.  Then the rest of the files need to match the assembly manifest :). But, if you're using multi-module assemblies, let me know.  I'd like to know why.

posted on Monday, January 29, 2007 12:46:56 PM (Pacific Standard Time, UTC-08:00)  #    Comments [1]
 Monday, January 22, 2007

I've mentioned before that one of my ownership areas at MS is the CLR "shim".  Most people I've said that to ask me, "What's that?".  I usually reply that generally, it's mscoree.dll, to which they typically respond, "Oh yeah.  What does that do?"  In general terms, the shim is in charge of firing up the runtime in a process.  In addition, it exposes all the hosting APIs and other stuff you need to do stuff with the CLR from unmanaged code.  If you look at a managed app, you will see that it has a dependency on mscoree.dll, and nothing else CLR-related.  After the runtime is spun up, most of the things that mscoree exposes are simply forwarded calls into the mscorwks.dll of the runtime you have installed.

What's interesting about mscoree.dll, is that it is the only piece of the runtime that doesn't run side by side.  You can have v1.1 and v2.0 installed on the machine, but you will only have one mscoree.dll.  You always have the version of mscoree.dll that corresponds to the latest version of the runtime installed on your machine (unless you have installed a patch or something that services mscoree, in which case you may have a v2.0 shim even though only v1.1 is installed on your machine).

So, naturally, backwards compatibility is extremely important in the Shim.  When you start up managed code, the Shim decides which version of the CLR to fire up based on lots of different things.  These things are all fairly well documented and all have a specific scenario they enable, but by their nature they are very confusing.

Aside: When I was job hunting, I interviewed at several companies other than Microsoft.  During some of those other interviews, I was asked questions about what runtime would be started under certain conditions.  The rules are so confusing that some of the interviewers, although all extremely smart people, had formed incorrect models of what the rules were.  Some told me I had the wrong answer to their question, when in fact it was correct.  (That's not to say that I knew the correct answers to all of them.)

I'm not certain that I can clear up the confusion, but I do hope to have a series of posts in the coming months on why the shim does what it does under certain circumstances.  Then, at least you might understand what's going on when it happens.

posted on Monday, January 22, 2007 10:50:36 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
 Wednesday, January 03, 2007

Someone asked me the other day if you could reflect on other people's assemblies using the CLR.  My answer was, "ABSOLUTELY!!!!!"

As it turns out, they were having some problems achieving this, and they were wondering if there was some kind of security mechanism in place that was preventing reflection on 3rd party assemblies.  Here's the basic scenario.  They had a 3rd party library with a type that defined several "constants" using fields.  They needed to be able to specify a named constant via a string and return the value for the constant.  Security and performance arguments aside (they had already been considered), they simply wanted to look up the field by name via reflection and get its value.  This can be accomplished via just a few fairly reasonable lines of code using one of the GetField family of methods on System.Type and then getting the value.  I'll leave this as an exercise for the reader so they can have the fun of wrestling with the silly BindingFlags enum.
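For anyone who'd rather skip the exercise, here's one possible sketch.  Limits stands in for the 3rd party type and ConstantLookup is a helper name I made up; the sketch assumes the "constants" are public static fields (const or static readonly).

```csharp
using System;
using System.Reflection;

// Hypothetical stand-in for the 3rd party type that exposes "constants" as fields.
public static class Limits
{
    public const int MaxRetries = 3;
    public static readonly string DefaultHost = "localhost";
}

public static class ConstantLookup
{
    // Returns the value of a public static field by name, or null if there is no such field.
    public static object Get(Type type, string fieldName)
    {
        FieldInfo field = type.GetField(fieldName, BindingFlags.Public | BindingFlags.Static);
        // Static fields have no instance, so null is passed as the target object.
        return field == null ? null : field.GetValue(null);
    }

    public static void Main()
    {
        Console.WriteLine(Get(typeof(Limits), "MaxRetries"));   // 3
        Console.WriteLine(Get(typeof(Limits), "DefaultHost"));  // localhost
        Console.WriteLine(Get(typeof(Limits), "Missing") == null);  // True
    }
}
```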

After some discussion, I learned that the trouble actually revolved around getting the Type object in the first place.  They were using Type.GetType to load the type with a namespace-qualified name as the string argument.  It was returning null (Nothing in VB).  They were validating that the string was correct using Intellisense, and concluding that since Intellisense could see the type, that Type.GetType should also "see" it (which seems like a perfectly reasonable assumption).

Type.GetType() takes a string argument specifying the type to retrieve.  When specifying types in strings, you can usually use a namespace qualified name ("[namespace.]type"), or an assembly-qualified name ("[namespace.]type, assemblyName").  If you don't provide the assembly name, the API looks through the assemblies already loaded in the AppDomain for a type that matches the name.  If an assembly name is specified, a bind occurs to the assembly and it is loaded if necessary.

In this case, there was a reference to the assembly containing the type, so Intellisense was picking it up and providing completion. The trouble was that, at runtime, the assembly had not yet been loaded into the AppDomain, so the type was unavailable.

So, the options were (in order of appropriateness in my opinion):

  • Use an early-bound type - Since the type was known at compile-time, use typeof() (GetType() in VB).  This will create a compile-time, assembly-qualified type reference in the IL rather than a runtime parsing/bind/load of the type string.
  • Use an assembly-qualified type string - Adding the assembly name to the type string will let the CLR know what assembly to look in (and load if necessary).  There are some subtle versioning issues with this approach, especially for strongly-named assemblies.
  • Make sure the assembly is loaded prior to calling Type.GetType() - Making an early-bound call to something else in the assembly first will get it loaded into the AppDomain.  This seems like a fragile solution and I would not recommend it, although it will technically work.
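The first two options can be sketched roughly like this.  System.Uri is just a convenient example type, and TypeLookupDemo is a made-up name.  In real code the qualified string would come from configuration or user input rather than from typeof, which I only use here so the string is guaranteed to be right for whatever runtime you try it on.

```csharp
using System;

public static class TypeLookupDemo
{
    // Option 1: early-bound. The compiler emits an assembly-qualified type reference.
    public static Type EarlyBound()
    {
        return typeof(Uri);
    }

    // Option 2: an assembly-qualified string tells the CLR which assembly to bind to.
    public static Type FromQualifiedString()
    {
        string qualified = typeof(Uri).AssemblyQualifiedName;
        return Type.GetType(qualified);
    }

    public static void Main()
    {
        // Prints the full string, including the version and public key token
        // that are the source of the versioning subtleties mentioned above.
        Console.WriteLine(typeof(Uri).AssemblyQualifiedName);
        Console.WriteLine(EarlyBound() == FromQualifiedString());  // True
    }
}
```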

The real issue here is the number of samples (especially in VB) provided by MS that use Type.GetType() with a non-qualified name (ex. "System.String").

posted on Wednesday, January 03, 2007 9:09:25 AM (Pacific Standard Time, UTC-08:00)  #    Comments [1]
 Tuesday, November 14, 2006

Something that always bothered me at my previous job was having to install the framework SDK to get a copy of gacutil.exe.  I know the guys still there hate having to install the SDK on a server so they can manipulate the GAC.  Richard Lander gives some interesting information on the topic, but he doesn't go into why it's not included in the redist.

Since I've been at MS, I've learned a great deal more about the Global Assembly Cache (GAC) and the fusion APIs. Last night, I was taking a shower after cutting my hair, and the reason came to me.  That reason is... installers, or more importantly... uninstallers.

During a discussion recently, I heard an amazingly profound saying:

"It is better to fail to do, than to fail to undo"

I don't recall who said it, and they probably got it from someone else, but it is right on the money.

When programs uninstall, they have to correctly remove things they've placed in the GAC.  Let's make up an example.  Let's say I have some software company.  We've developed a magical managed library that makes it wicked easy to develop our software, so we use it in all our products.  Let's say that in our deployment model, it makes sense to deploy that library to the GAC.  So, uninstalling our software should remove it from the GAC, right?  Well, what if one of our other products is on the machine?  We don't want to uninstall one and break the remaining one.

When you install an assembly into the GAC via the Fusion APIs, you do so with a traced reference.  That reference tells Fusion "who" installed it.  If 2 installers install the same assembly, it's smart enough to know not to remove the assembly until both uninstall.

GACUtil, as a management tool, enables you to use traced references as well, but it also allows you to install without a traced reference.  It also allows you to force uninstalls and do lots of other screwy things.  In other words, the tool is too powerful.  Devs need to be able to do screwy things.  Administrators need to be able to do screwy things.  Regular users don't.  Give them an install package that handles everything.  Otherwise, you're bound to have a support nightmare.

If I have time, or enough requests, we'll go into how to use the fusion APIs directly to manage the GAC.

posted on Tuesday, November 14, 2006 9:10:30 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
 Monday, November 13, 2006

Several people have complained that I haven't made any entries in a while.  There are several reasons for this.  Aside from being generally busy, I'm signed up to do some posts about the CLR in general, and some from my ownership areas as part of my job.  This has resulted in several "half-baked" entries ranging from hosting the runtime to the Global Assembly Cache, to some other more general CLR-related entries.

In addition, MS has had several new products in the queue that were on the verge of releasing, and I've had entries waiting for them to release.  Of course, being on the safe side, I've waited for someone else to blog first, at which point an extra post from me doesn't make much difference.

And thirdly, Jenna is growing like a weed and it's hard to find time to sit down and formulate posts when you're chasing her around the house.

So, regarding technical content.  I'd like to get an idea of what people are interested in hearing about. So shoot me an email or leave a comment with suggestions for posts regarding the CLR in general or specifically within my areas of ownership:

  • The unmanaged hosting APIs (CorBindToRuntime, etc.). These are what you'd use to host the CLR in your own app in order to more tightly control things, or provide additional isolation or escalation policies.
  • The global assembly cache
  • "automatic" CLR activation - what I mean by this is what happens when the runtime is spun up by a managed app, or via COM interop.  Things like how to decide which runtime to use, etc.

In addition, I'll try to keep the personal updates coming for those who are not looking for just technical content.

posted on Monday, November 13, 2006 12:49:21 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]