# Friday, March 12, 2010

This post is intended to fill a gap in the current MSDN documentation for the useLegacyV2RuntimeActivationPolicy attribute (http://msdn.microsoft.com/en-us/library/bbx34a2h(VS.100).aspx).  That gap should be filled by the time .NET 4 ships.

There is a lot of confusion about what the useLegacyV2RuntimeActivationPolicy attribute does.  Most often, it is used to allow a pre-v4 mixed-mode assembly to load in v4, and in that context the name makes very little sense.  Below is an explanation I’ve given internally that describes the attribute in the context for which it was named.  It should give you a better idea of what the attribute does, as well as an understanding of some of the subtleties of in-proc SxS.

Ultimately, this attribute has to do with the behavior of the “legacy shim APIs”.  You can think of these as encompassing several categories of CLR activation:

  • CorBindToRuntimeEx and friends - This includes most of the flat exports of mscoree.dll defined in mscoree.h (GetCORSystemDirectory, GetCORVersion, LoadLibraryShim, etc.).  Note that this also includes the strong name APIs defined in strongname.h.
  • Pre-v4 COM activation – This includes CoCreateInstance of a CLSID (or type identifier) whose latest registration is against a pre-v4 runtime version.  Note that this includes both the “new” operator on such a co-class from managed code and Activator.CreateInstance against a type obtained from Type.GetTypeFromCLSID on such a CLSID.
  • Pre-v4 IJW (mixed mode) activation – For example, calling into a native export on such an assembly
  • Native activation of a native runtime-provided COM CLSID – Such as CoCreateInstance on ICLRRuntimeHost’s CLSID
  • Native activation of a managed framework CLSID – Such as CoCreateInstance on System.ArrayList’s CLSID (extremely rare)

All these have a “single runtime per process” view of the world, so we try to make those codepaths believe they still exist in that world by “unifying” the version that they see.  After a given version has been chosen by one of these codepaths, that’s the version that all of them see for the remainder of the process lifetime.  Additionally, all of these activation paths had some kind of roll-forward semantics associated with them.  We “cap” those semantics at v2, meaning by default none of these codepaths see v4 at all.  This allows us to claim that installing v4 is “non-impactful”: it should not change the behavior of existing components when installed.  (Note that this has the interesting side-effect of a v4-only machine appearing to have no runtimes installed at all via these codepaths.)

This is all well and good until someone WANTS those codepaths to see v4.  Rolling a v2 managed app forward to v4 using a config (without the attribute) works just fine, unless that app also expects interaction with these “legacy” codepaths to be associated with the current runtime (v4).  For instance, a p/Invoke to GetCORSystemDirectory in order to construct a path to Fusion.dll (please don’t do that, BTW) will give you v2’s fusion.dll.  COM activation of a managed COM object will prefer the runtime it was built against rather than load into the current runtime (meaning you may be dealing with interop rather than a concrete CLR type).  That may or may not work, depending on what you’re doing.

The useLegacyV2RuntimeActivationPolicy attribute basically lets you say, “I have some dependencies on the legacy shim APIs.  Please make them work the way they used to with respect to the chosen runtime.”  In that context, hopefully the name makes more sense to you.  It is *mostly* equivalent to calling CorBindToRuntimeEx using the full version string for v4.  We also have a method in our new shim APIs to do this programmatically; the difference is that a config file does it declaratively, which is useful for a host that uses config files to determine which runtime to load plugins into.  (The attribute’s value, or lack of a value, is conveyed back to a host via the pdwConfigFlags parameter of ICLRMetaHostPolicy::GetRequestedRuntime.)

One of the most common reasons people need this attribute is a dependency on a pre-v4 IJW (mixed-mode) assembly.  By default, we can’t allow those to load into v4*.  Putting this attribute in your config allows this to happen.
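The attribute goes on the `<startup>` element of the application's config file, next to a `<supportedRuntime>` entry for v4. As a minimal sketch (the `LegacyActivationConfig` helper and the `MyApp.exe.config` file name are hypothetical), here is that section built with System.Xml.Linq; the XML it produces is the part that matters.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper: builds an app config whose <startup> section runs the app on v4
// and opts the legacy shim APIs (and therefore pre-v4 mixed-mode assemblies) into v4 too.
public static class LegacyActivationConfig
{
    public static XDocument Create()
    {
        return new XDocument(
            new XElement("configuration",
                new XElement("startup",
                    // Without this attribute, the legacy activation paths stay capped at v2.
                    new XAttribute("useLegacyV2RuntimeActivationPolicy", "true"),
                    new XElement("supportedRuntime", new XAttribute("version", "v4.0")))));
    }
}
```

Calling `LegacyActivationConfig.Create().Save("MyApp.exe.config")` would write `<configuration><startup useLegacyV2RuntimeActivationPolicy="true"><supportedRuntime version="v4.0" /></startup></configuration>` for a hypothetical MyApp.exe.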

Why don’t we make this the default behavior?  You might argue that this behavior is more compatible, and makes porting code from previous versions much easier.  As explained above, though, it can’t be the default because it would make installing v4 impactful, which could break existing apps on your machine.

Then why don’t we make this the default behavior just for v4 managed apps?  That is precisely the behavior we had in beta 1.  As we started trying to explain it to people, we found it was very difficult to explain how these legacy codepaths worked.  We ultimately decided that consistent behavior was better.  The example that convinced me we had made the right choice was that a library’s behavior would change depending on whether it was hosted by a native process or a managed one.  That seemed really bad to me.

You might say, “Why shouldn’t I just set this for every app I have?”  The downside of this attribute is that it turns off in-proc SxS with pre-v4 runtimes: it locks them out of the process.  This may not matter in your scenario.  If you look at some of the runtime tools, they use this attribute; even Visual Studio uses it.  Don’t just blindly use it, though.  If you find yourself needing it for something other than a migration aid, or for loading pre-v4 mixed-mode assemblies (which we hope becomes rarer as people update the interesting mixed-mode binaries out there), I’d like to know about it.  Leave me a comment!

Hopefully, you’ve got a better handle on exactly what this attribute means, and can make a more informed decision about when it is appropriate to use.

*There are many engineering challenges around in-proc SxS and IJW assemblies.  Currently, pre-v4 IJW assemblies can only load into the runtime that is associated with the “legacy shim APIs”.  But any given IJW assembly (regardless of version) may only be loaded into a single runtime per process at this time.

posted on Friday, March 12, 2010 7:14:59 PM (Pacific Standard Time, UTC-08:00)
# Thursday, November 20, 2008

My brother-in-law posted a note on Facebook that was basically one of those silly things you do and perpetuate across the internet.  I usually don’t take part in such things, but this one seemed interesting, and I’ve been thinking about ways to jumpstart my blogging again now that the embargo on all the cool stuff is lifted.  So, I thought I would do it.  Here are the rules:

  • Grab the book nearest you. Right now.
  • Turn to page 56.
  • Find the fifth sentence.
  • Post that sentence along with these instructions in a note to your wall. (this was on Facebook, so it is referring to that wall)
  • Don't dig for your favorite book, the coolest, the most intellectual. Use the CLOSEST.

So, it took me a while to determine which book was the “closest”, as my position is roughly normal to the bookshelf in my office.  I finally decided to be honest and pick the one that was really closest, but I will also share another that was very close, as it is a good segue into future posts.

The first (and official) one:

Semiconductor materials at 0 K have basically the same structure as insulators – a filled valence band separated from an empty conduction band by a band gap containing no allowed energy states (Fig. 3-4).

Solid State Electronic Devices, Ben G. Streetman

The second, and more relevant/interesting one:

The shim’s algorithm for picking a version in the COM interoperability scenario is much more straightforward – the latest version installed on the machine is always used.

Customizing the Microsoft .NET Framework Common Language Runtime, Steven Pratschner

What is amazing about this second one is that it relates directly to one of the features I’ve been working on for CLR v4 (and yes, that is actually the 5th sentence on page 56).  This feature is known as “in-process side by side” (or in-proc SxS for short), and was announced publicly at PDC last month.  It allows you to have more than one version of the CLR loaded and running in a single process.

This feature is primarily a compatibility feature, targeted precisely at the behavior described in the quote above.  When we always use the latest version, we can break existing COM objects, not only because of breaking changes we make (of which there are relatively few), but also because of other, more subtle behavior dependencies.

Previously, loading a CLR version into the process locked the process to that version.  Any policy other than “pick the latest” creates a load-order dependency that can cause guaranteed breaks, because COM components targeting newer runtimes cannot run on older ones.  So, given that constraint, it was clearly the best choice of policy.

Now that we support multiple runtimes in the process (v2 and above), we can make a smarter, more compatible choice about runtime activation.  The precise policies are still being worked through, so I’ll avoid stating them explicitly, but you can imagine us being able to make a much better choice about what runtime to activate to run a given managed COM component.

I’ll be posting more about this feature and its implications soon.  Feel free to seed my future posts with questions in the comments.  Hopefully, this is the jumpstart I needed.  As for the “game” above, feel free to do it, or ignore it.  It won’t result in any difference to your luck, financial situation, or anything else.

posted on Thursday, November 20, 2008 5:25:12 PM (Pacific Standard Time, UTC-08:00)
# Wednesday, April 23, 2008

I've been doing some app-building with Silverlight lately and exploring the limitations of the platform in comparison to the full desktop CLR and what that means for the Silverlight "ecosystem".  Those limitations can be summed up with 3 items:

  1. Strict sandbox security model
  2. Reduced managed framework surface area
  3. No binary compatibility with libraries targeting the desktop CLR (which limits the 3rd party components you can leverage).

I think #3 will begin to become a non-issue as more component providers provide builds for Silverlight. #2 can be broken down into 2 areas:

  • Full technology areas that are unavailable on Silverlight (LinqToSql, WinForms, etc)
  • Reduced/pruned APIs

The second is where my interest lies, and it intersects with #1 (the new security model).  We've done extensive threat modeling against our APIs, and built automated tooling that has either removed certain APIs (or made them internal), or marked them as SecurityCritical, meaning they cannot be called directly from "user" code.

In addition, the security model requires safe, verifiable user code.  Normally when developing on the desktop CLR, unless you are specifically targeting a low trust environment, you can do whatever you like.

So, for this exercise, I pulled out my trusty STDF parser (which I've blogged about before).  I've used this project as a test vehicle for both v2.0 and Orcas, and it's served well as a project that exercises a large cross-section of Framework features, from high-level stuff like Linq down to expression tree inspection and further down to LCG/RefEmit.

In short, I was able to get the parser working without too much trouble, and I came away feeling that a single code base targeting both Silverlight and the desktop is an attainable goal.  You just have to switch your mindset from binary compatibility to source compatibility.

I combated the reduced surface area with extension methods, which worked quite well both to centralize the "overload shims" that needed to be Silverlight-only and as a general refactoring tool.  My goal is to have all the desktop vs. Silverlight differences centralized into files that are either included or excluded from the build depending on which platform I'm targeting.  I wish you could create extension properties.  That would let me close all the surface area discrepancies that aren't caused by missing/irrelevant technology areas.
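A minimal sketch of such an overload shim, assuming a target platform that lacks a `Stream.CopyTo(Stream)` instance method (the member is chosen purely for illustration, and `StreamShims` is a hypothetical name). In the file-per-platform approach described above, this file would only be included in the build for the platform missing the member.

```csharp
using System;
using System.IO;

// Hypothetical shim file, included only in the build for the platform that lacks the member.
public static class StreamShims
{
    // Same name and shape as the instance method it stands in for, so call sites such as
    // source.CopyTo(destination) compile unchanged on both platforms. Where a real instance
    // method exists, the C# compiler prefers it over this extension method.
    public static void CopyTo(this Stream source, Stream destination)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (destination == null) throw new ArgumentNullException("destination");

        var buffer = new byte[81920];
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, read);
        }
    }
}
```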

I was pleased that 99% of my LCG codegen stuff "just worked".  I make heavy use of Reflection.Emit via DynamicMethod to generate my record parsers based on attributes on the record classes.  The 2 problems I ran into were:

  • Visibility restrictions - The new security model won't let my dynamic methods see internals.  I had used this ability to keep the API clean.  I'm still figuring out the best approach, but it was simple enough just to expose those methods.
  • Verifiability - I had a few places where I was generating unverifiable code.  Some of these were my own codegen bugs, but others were just bad assumptions on my part.

This brings me to constrained callvirt, an interesting little IL tidbit I discovered in the porting process.  Calling conventions are subtly different between reference types and value types, and they also depend on whether the given method is actually overridden in the value type (which can lead to confusing breaks when that changes).  In the v1.x days, you always knew the type you were dealing with, so it didn't matter that much, and you could generally emit the right IL sequence to make a call.  V2.0 introduced generics, where the same IL has to work whether a type argument turns out to be a reference type or a value type, so there needed to be a way to unify the IL for the two cases.  This is where the constrained prefix came in.  It allows you to write unified IL that works regardless of whether you're working with a value type or a reference type (think generics constrained by an interface, or calling a method defined on System.Object like ToString()).

Anyway, I was able to fix my unverifiable code by utilizing the constrained prefix.  It also simplified my codegen logic significantly in a number of places where I had different paths based on whether I was working with a value type or not.
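A minimal LCG sketch of the constrained prefix (`ConstrainedCallDemo` and `BuildToString` are hypothetical names): the same instruction sequence calls `ToString()` on a `T` argument whether `T` is `int` or `string`. Note that `constrained.` expects a managed pointer to the value, hence `ldarga` rather than `ldarg`.

```csharp
using System;
using System.Reflection.Emit;

public static class ConstrainedCallDemo
{
    // Emits a Func<T, string> that calls ToString() on its argument. For a value type,
    // constrained. calls the type's own override directly (or boxes if it has none);
    // for a reference type, it dereferences the pointer and does a normal callvirt.
    public static Func<T, string> BuildToString<T>()
    {
        var method = new DynamicMethod("ToString_" + typeof(T).Name, typeof(string), new[] { typeof(T) });
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarga_S, (byte)0);            // address of argument 0
        il.Emit(OpCodes.Constrained, typeof(T));
        il.Emit(OpCodes.Callvirt, typeof(object).GetMethod("ToString", Type.EmptyTypes));
        il.Emit(OpCodes.Ret);
        return (Func<T, string>)method.CreateDelegate(typeof(Func<T, string>));
    }
}
```

Without the prefix, the emitter would need one path that boxes value types and another that doesn't, which is exactly the branching the constrained prefix removed from my codegen.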

All in all, I was pleased with the results.  I'll be posting a sample Silverlight app using the library when I get some UI stuff figured out.

posted on Wednesday, April 23, 2008 1:43:46 PM (Pacific Daylight Time, UTC-07:00)
# Tuesday, February 12, 2008

This is one of those entries that attempts to fill a void in online search for a particular topic.  I ramble on for a while to give enough context so that a search engine can match it up in a relevant manner.

I was debugging something the other day, and thought I had come across a heinous bug in the CLR.  Turns out, everything was working fine and the bug was in the app.

A program was crashing and it was a managed exception, so I attached to it with VS2008 and dug in.  The first thing I noted was that this was a retail build, so my stack was collapsed in a number of places... expected, keep moving.  Then, I noted an interesting frame on my stack (this is a contrived example, not the actual thing I saw):

>    CanonTest.exe!CanonTest.GenericType<System.__Canon,int>.SomeOperation() + 0x55 bytes   

That's weird; what is System.__Canon and what's it doing here?  Surely this must be a horrible CLR bug where my method tables are being corrupted!  Internet searches seem to confirm my thoughts: a few other reports of random __Canon frames showing up on the stack do look like bugs.  After a bit more reasoning, I come to the conclusion that System.__Canon must be special in some way, since it follows the pattern for such things: marked internal, prefixed with a double underscore.

A few emails later, I had my answer.  System.__Canon is the special type that is used to identify "canonical" generic type instantiations. It typically only shows up in release builds, so you don't normally see it on the stack while debugging.  So, it's easy to assume something's wrong when you do see it.

If you'll recall, one of the really cool things about the generics implementation is that it allows for code sharing: jitted methods can be shared between compatible type instantiations.  For instance, the code for SomeGenericType<string> can likely be shared with SomeGenericType<Foo>, where Foo is another reference type.  (We don't currently share code across value-type instantiations, so you'll continue to see those on the stack, as with the int in the example.)  That shared code needs to live somewhere, so we have the concept of a "canonical" generic type instantiation, identified by __Canon as the type parameter.

Also, many debugger stack representations show the GenericType`2 form rather than the expanded form.  In that case, you never see __Canon.

For some more info on Generics and code sharing, see Joel Pobar's excellent blog entry on the subject.

posted on Tuesday, February 12, 2008 11:56:35 AM (Pacific Standard Time, UTC-08:00)
# Thursday, January 03, 2008

The CTP for the MVC framework includes support for master-page, page, and user-control based views.  I thought it might be interesting to enable .ashx-based views for things like RSS generation via System.Xml.Linq, or other more "raw" view output.

As it turns out, this is fairly trivial.  The place we need to extend is the IViewFactory returned by Controller.ViewFactory.  This is the component that is responsible for creating the view when a call to RenderView is made.

The default view factory is the WebFormViewFactory, which knows how to generate views based on .master, .aspx, and .ascx views.  Since we want to add support for .ashx, we'll use WebFormViewFactory as a starting place.  We'll inherit from WebFormViewFactory and override CreateView to supply our extra .ashx lookup.

using System;
using System.Globalization;
using System.IO;
using System.Web;
using System.Web.Compilation;
using System.Web.Mvc;

public class SpecialViewFactory : WebFormViewFactory {

    // {0} = view name, {1} = controller name
    static readonly string[] ViewLocationFormats = new string[] { "~/Views/{1}/{0}.ashx", "~/Views/Shared/{0}.ashx" };

    protected override IView CreateView(ControllerContext controllerContext, string viewName, string masterName, object viewData) {
        // Check to see if there is an .ashx that matches this view.
        object value = null;
        controllerContext.RouteData.Values.TryGetValue("controller", out value);
        string controllerName = value as string;
        if (controllerName == null) {
            throw new InvalidOperationException("No route data value available for controller.");
        }

        Type viewType = null;
        foreach (var loc in ViewLocationFormats) {
            var path = string.Format(CultureInfo.InvariantCulture, loc, viewName, controllerName);
            viewType = GetCompiledType(controllerContext, path);
            if (viewType != null) break;
        }
        if (viewType == null) {
            // No matching .ashx; fall back to the default .aspx/.ascx/.master lookup.
            return base.CreateView(controllerContext, viewName, masterName, viewData);
        }

        if (!typeof(IView).IsAssignableFrom(viewType)) {
            throw new InvalidOperationException(string.Format(CultureInfo.InvariantCulture,
                "The type '{0}' compiled from the .ashx view does not implement IView.", viewType.FullName));
        }
        var view = (IView)Activator.CreateInstance(viewType);
        var viewHandler = view as ViewHandler;
        if (viewHandler != null) viewHandler.ViewData = viewData;

        return view;
    }

    // Replicated because the base class doesn't expose it. The context is passed in
    // rather than cached in a field, so no request state is held once CreateView returns.
    private static Type GetCompiledType(ControllerContext controllerContext, string path) {
        Type compiledType = null;
        try {
            // Skip the BuildManager (and its first-chance exceptions) when no file exists.
            if (File.Exists(controllerContext.HttpContext.Request.MapPath(path))) {
                compiledType = BuildManager.GetCompiledType(path);
            }
        }
        catch (HttpCompileException) {
            // The view exists but doesn't compile: let that error surface.
            throw;
        }
        catch (HttpParseException) {
            throw;
        }
        catch (HttpException) {
            // Any other lookup failure is treated as "no such view".
        }
        return compiledType;
    }
}

GetCompiledType had to be replicated, as it isn't exposed in the base class.  Note that I added a File.Exists check before asking the BuildManager for the compiled type.  This was mostly to avoid a bunch of first-chance exceptions in the debugger, although avoiding the exception seems like a good thing in general.  The downside is that the check won't find handlers that are mapped in the app dynamically or via web.config.

As you can see, the code also refers to a ViewHandler class that my handlers can inherit from, which gives them the same goodies (such as ViewData) that the other views get; I'll leave its implementation as an exercise for the reader.

So, now the only thing remaining is to inject our special view factory into the pipeline instead of the default.  A simple way to do this is to set the ViewFactory property in the constructor of any controller that needs .ashx support. Now, you can create .ashx files and use them as views!

Next time, I'll show you how to add support for routing controller actions based on data not in the URL.

posted on Thursday, January 03, 2008 3:46:58 PM (Pacific Standard Time, UTC-08:00)
# Wednesday, August 29, 2007

I wish I had time to come up with more concrete information (examples/code) in this post, but I don't have the time to work that stuff up.  I did think it would be useful for people searching for solutions to this problem, so here it is in all its ambiguity.

I was playing with expression tree inspection and dynamic interpretation the other day, when I hit something that I was sure was a bug.  I was inspecting an expression tree, identifying "branches" of interest, and generating lambda expressions from them on the fly.

You might do this to break up an expression into separate units of execution to spread across multiple processors (a la PLINQ), or to replace parts of a tree requiring local execution before passing off to another layer to be transformed into another domain like SQL, or whatever.  In any case, I was doing it.

I found that if the type of the expression was a value type, I could not create a lambda expression returning object from it, even though there is an inheritance relationship.  You get a fairly straightforward, but perhaps surprising, ArgumentException.

After some back and forth with the Linq team, I discovered that this was by design.  In the case of value types, the boxing operation required to make an object must be represented by a unary convert expression. The solution is to wrap such expression trees with a call to Expression.Convert(expression, typeof(object)).
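A small sketch of the workaround (`BoxingLambdaDemo` and its methods are hypothetical names): `Expression.Lambda<Func<object>>` rejects an `int`-typed body, while the same body wrapped in `Expression.Convert(..., typeof(object))` compiles and returns the boxed value. Reference-typed bodies need no wrapper.

```csharp
using System;
using System.Linq.Expressions;

public static class BoxingLambdaDemo
{
    // Wraps an expression of any type in a Func<object>. A value-typed body must be
    // wrapped in an explicit Convert node, which is how the tree represents boxing.
    public static Func<object> ToObjectThunk(Expression body)
    {
        if (body.Type.IsValueType)
        {
            body = Expression.Convert(body, typeof(object));
        }
        return Expression.Lambda<Func<object>>(body).Compile();
    }

    // Reports whether building the lambda *without* the Convert node is rejected.
    public static bool RejectedWithoutConvert(Expression body)
    {
        try
        {
            Expression.Lambda<Func<object>>(body);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```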

posted on Wednesday, August 29, 2007 5:17:11 PM (Pacific Daylight Time, UTC-07:00)
# Thursday, July 26, 2007

An interesting pattern I've seen emerge since the early releases of Orcas is what I might refer to as "delegate properties".  What I mean by that is a property (or field, I suppose) that returns a delegate.  This pattern has some interesting implications.

First, in a language that treats delegates as directly callable objects, this pattern looks just like a method (someInstance.TheProperty(args)).  You can't tell the difference (although VS gives you different intellisense) by looking at a callsite like this. Among other things, this leads to some interesting naming issues.  Do you name it like you would a method?

Second, it opens up opportunities to do some really powerful (and slightly insane) hybrid inheritance models.  Think about a virtual delegate property that has both a getter and a setter, now think about trying to predict what that delegate will do when you call it.  It doesn't sound like something to recommend as part of a public API, but I think there are some interesting scenarios there.
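To make both points concrete, here is a hypothetical sketch (`Formatter` and `HexFormatter` are illustrative names) of a virtual delegate property. `formatter.Format(255)` reads exactly like a method call, but the code that runs depends on the getter the runtime type supplies, and a derived class that overrides only the getter silently ignores anything assigned through the inherited setter.

```csharp
using System;
using System.Globalization;

public class Formatter
{
    private Func<int, string> _format = n => n.ToString(CultureInfo.InvariantCulture);

    // A "delegate property": formatter.Format(42) runs the getter first,
    // then invokes whatever delegate the getter returned.
    public virtual Func<int, string> Format
    {
        get { return _format; }
        set { _format = value; }
    }
}

public class HexFormatter : Formatter
{
    // Overrides only the getter. The inherited setter still stores into the base
    // field, but callers of Format on a HexFormatter never see that value.
    public override Func<int, string> Format
    {
        get { return n => "0x" + n.ToString("X", CultureInfo.InvariantCulture); }
    }
}
```

Assigning `hex.Format = n => "replaced";` compiles and runs without complaint, yet `hex.Format(255)` still returns "0xFF", which is the kind of unpredictability that makes this pattern questionable in a public API.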

If I come up with something interesting and useful, I'll let you know.

posted on Thursday, July 26, 2007 1:56:27 PM (Pacific Daylight Time, UTC-07:00)
# Monday, July 16, 2007

If you have been following my series on delegates, you may have experimented with open-instance delegates and perhaps found it difficult to create an open-instance delegate for a value type.

If you'll recall, an open-instance delegate has an extra first parameter, used to pass the instance used for the invocation.  What's not made explicitly clear is that this first parameter must be passed by reference.

For reference types, you've automatically got a reference, but for value types, this must be a "ref" parameter.  For instance, a delegate type used as an open-instance delegate for Int32.CompareTo would have to be defined something like:

delegate int IntCompareToDelegate(ref int instance, int other);

Otherwise, you'll get a System.ArgumentException when you try to bind the method to the delegate, giving you the ever-helpful error message: "Error binding to target method".

There are lots of underlying reasons for this, both from a calling convention perspective, as well as a side-effect perspective.  But, you can simplify it by thinking about modifications to the instance.  If you passed by value (creating a copy that the method acted on), any changes made to the instance by the method would be lost because they happened to a copy.

Most value types in the framework are immutable, but you could run into issues with your own types.  And, again, this isn't the only reason for this restriction (look at the IL generated for a value-type method call for more ideas).  It's just the easiest to understand.
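A sketch of both points (the `Counter` struct and the other type names are hypothetical): binding `Int32.CompareTo` succeeds with a `ref int` first parameter, binding it to a by-value signature fails (here via the `Delegate.CreateDelegate` overload with `throwOnBindFailure` set to false, which returns null instead of throwing), and an open-instance delegate over a mutable struct updates the caller's variable because the instance arrives by reference.

```csharp
using System;
using System.Reflection;

// Open-instance delegate type for Int32.CompareTo: the instance is a ref parameter.
public delegate int IntCompareToDelegate(ref int instance, int other);
// Same shape with the instance passed by value; this will not bind.
public delegate int ByValueCompareToDelegate(int instance, int other);

// Hypothetical mutable struct, to show why the instance must be passed by reference.
public struct Counter
{
    public int Count;
    public void Increment() { Count++; }
}

public delegate void CounterAction(ref Counter instance);

public static class OpenInstanceDemo
{
    static readonly MethodInfo IntCompareTo =
        typeof(int).GetMethod("CompareTo", new[] { typeof(int) });

    public static IntCompareToDelegate BindCompareTo()
    {
        return (IntCompareToDelegate)Delegate.CreateDelegate(typeof(IntCompareToDelegate), IntCompareTo);
    }

    // throwOnBindFailure: false makes a failed bind return null instead of throwing.
    public static ByValueCompareToDelegate TryBindByValue()
    {
        return (ByValueCompareToDelegate)Delegate.CreateDelegate(
            typeof(ByValueCompareToDelegate), IntCompareTo, false);
    }

    public static CounterAction BindIncrement()
    {
        return (CounterAction)Delegate.CreateDelegate(
            typeof(CounterAction), typeof(Counter).GetMethod("Increment"));
    }
}
```

With `var c = new Counter();`, invoking the `BindIncrement()` delegate twice as `inc(ref c)` leaves `c.Count` at 2; had the instance been copied, both increments would have been lost.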

If you'll recall, Orcas extension methods, which are similar in concept to this, do not follow this pattern and are subject to the infamous value type copying problems.

posted on Monday, July 16, 2007 12:24:58 PM (Pacific Daylight Time, UTC-07:00)
# Wednesday, June 13, 2007

I was helping a friend with a problem recently.  He was taking a set of serial web service calls and doing them in parallel to save time, and was not up-to-speed on the best approach for that.  Once he settled on an approach, he realized that since his web service calls were being wrapped in an abstraction layer, he didn't have the Begin/End asynchronous call methods that are provided by the proxy class.

"No problem, just wrap them in a delegate".  The compiler automatically gives you Begin/EndInvoke methods in addition to the synchronous Invoke method.  And, you're guaranteed not to mess up the implementation because it's all provided by the CLR!  Just one of those things you might forget if you find yourself in the same situation.

posted on Wednesday, June 13, 2007 4:37:55 PM (Pacific Daylight Time, UTC-07:00)
# Thursday, May 17, 2007

After my last few CLR posts, I've had a couple of private inquiries regarding the usefulness of closed static delegates.  To bring everyone up to speed, a delegate pointing to an instance method needs a "target" instance to operate on (we'll get to open instance delegates later).  A static method needs no such target, so we can leverage the "space" used for the instance case to carry around another object of interest.  We call a delegate with a provided value for this space "closed over the first argument".

For example, let's say we have a static method that does some operation on two numbers.  For simplicity, let's just say it adds them.  Our silly class and method might look like this:

public static class NumberFunctions {
   public static double Add(double first, double second) {
      return first + second;
   }
}

Normally, a delegate for this method would look like:

public delegate double BinaryOperation(double first, double second);

But, we're going to create a closed static delegate, which means we're going to "burn" the first argument into the delegate itself, so it's not needed in the delegate signature.  Instead, we'll use the following delegate signature (I didn't spend much time thinking up these names; I hope they make sense):

public delegate double ClosedCall(double other);

So, how do we create the delegate?  Normally, since C# (pre-Orcas) doesn't have syntax for creating closed static delegates, you are forced to use one of the Delegate.CreateDelegate overloads:

ClosedCall addToOne = (ClosedCall)Delegate.CreateDelegate(
        typeof(ClosedCall),
        1.0,
        typeof(NumberFunctions).GetMethod("Add", BindingFlags.Public | BindingFlags.Static));

Of course, we just spent 2 entries looking at a helper that can do this for us (I'm not claiming this is better, I just want you to be able to see what's happening):

ClosedCall addToOne = DelegateBinder.Bind<ClosedCall>(1.0,
        typeof(NumberFunctions).GetMethod("Add", BindingFlags.Public | BindingFlags.Static));

Now, a call to addToOne(someNumber) will yield the result of adding the supplied argument to one.  This is a contrived example, but you could imagine taking a method (perhaps generated on the fly via LCG), and "attaching" an instance to it via the first argument.  Then, being able to call it many times with different subsequent arguments, or passing it to another component that would provide the rest of the arguments.  In this way, you get the benefits of not having to keep track of an instance, without having to own the API for the instance.  Additionally, you could "chain" delegates together so that many arguments are captured in a stack of delegate calls, allowing closure-type semantics at the cost of some stack space (although since C# has closure support, you'd never really need to do that).
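One caveat worth checking before relying on the double-based example above: as I read the documentation for Delegate.CreateDelegate(Type, Object, MethodInfo), a value-type firstArgument is only accepted (and boxed) when the static method's first parameter is typed object or ValueType, which suggests binding Add(double, double) directly will fail. Here is a sketch that satisfies that rule; `BoxedNumberFunctions`, `ClosedStaticDemo`, and `MakeAdder` are hypothetical names.

```csharp
using System;
using System.Reflection;

public static class BoxedNumberFunctions
{
    // First parameter typed object: CreateDelegate boxes a value-type firstArgument
    // only when the parameter it closes over is object or ValueType.
    public static double Add(object first, double second)
    {
        return (double)first + second;
    }
}

public delegate double ClosedCall(double other);

public static class ClosedStaticDemo
{
    public static ClosedCall MakeAdder(double value)
    {
        MethodInfo add = typeof(BoxedNumberFunctions).GetMethod("Add", BindingFlags.Public | BindingFlags.Static);
        // 'value' is boxed and burned into the delegate as Add's first argument.
        return (ClosedCall)Delegate.CreateDelegate(typeof(ClosedCall), value, add);
    }
}
```

`ClosedStaticDemo.MakeAdder(1.0)` then behaves like addToOne: each call supplies only the second operand.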

What's really cool is that with C# 3.0's Extension Methods feature, we now have language support for creating early-bound closed static delegates.  If you bind a delegate to an extension method (using the regular syntax for an instance method), you will get the exact IL for creating an early-bound closed static delegate without our fancy helper class.  Let's see how that would look.  Let's use a different example to keep us on our toes.  Here's a helper function that creates email addresses:

public static class StringExtensions {
   public static string MakeEmailAddressWithAlias(this string domain, string alias) {
      return string.Format("{0}@{1}", alias, domain);
   }
}

Notice the "this" in front of the first parameter; it tells the compiler that the method should be considered when resolving method calls on string.  We'll use one of the delegate types provided in Orcas, Func<T, TResult>.  Now, here's how the bind looks:

string fooDotCom = "foo.com";
Func<string, string> makeFooDotComAddress = fooDotCom.MakeEmailAddressWithAlias;

string email = makeFooDotComAddress("bar");

So, the result is that email will be bar@foo.com.
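To see that the compiler really produced a closed static delegate, you can inspect the result: its Method is the static extension method, and its Target is the captured first argument rather than an instance. A self-contained sketch (`ExtensionDelegateDemo` is a hypothetical name):

```csharp
using System;

public static class StringExtensions
{
    public static string MakeEmailAddressWithAlias(this string domain, string alias)
    {
        return string.Format("{0}@{1}", alias, domain);
    }
}

public static class ExtensionDelegateDemo
{
    // Method-group conversion through instance syntax: the compiler closes the static
    // extension method over 'domain' directly, with no reflection involved.
    public static Func<string, string> MakeAddressFactory(string domain)
    {
        return domain.MakeEmailAddressWithAlias;
    }
}
```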

Hopefully, through these contrived examples, you can see the scenarios that closed static delegates enable, as well as how to create one the easy way with extension methods in Orcas.

posted on Thursday, May 17, 2007 10:46:22 AM (Pacific Daylight Time, UTC-07:00)
# Monday, May 14, 2007

In my last post, I showed a nifty way of constructing "early-bound" delegates using LCG.  Here's the same helper class implemented without LCG:

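using System;
using System.Reflection;

public static class DelegateBinder {
	// Invokes the delegate type's (object, IntPtr) constructor directly via reflection.
	public static TDelegate Bind<TDelegate>(object firstArg, MethodInfo method) {
		return (TDelegate)Activator.CreateInstance(
			typeof(TDelegate),
			firstArg,
			method.MethodHandle.GetFunctionPointer());
	}
}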

This one is quite a bit simpler, and extrapolating from what we learned last time, it's easy to see what's happening.  Hopefully, you are already familiar with the Activator class.  Basically, this just shows the managed call chain that produces a function pointer to a method given a MethodInfo.

I really like the LCG-based implementation, but only because of my love of DynamicMethod.  It's pretty complex, and aside from opportunities for caching, doesn't really have anything over this implementation.  This one is just plain simple, and would have a single-line implementation if I hadn't put in some line breaks to avoid formatting problems.  It does, however, highlight the annoyance of having to work around the compilers' "helpfulness" when it comes to delegate construction.  If only I could just call the constructor directly.

It is worth noting that this doesn't work in the Silverlight 1.1 alpha or the Compact Framework (or XNA, for that matter), none of which expose RuntimeMethodHandle.GetFunctionPointer().

posted on Monday, May 14, 2007 4:40:01 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [2]
# Friday, May 11, 2007

In a previous post about delegates, I discussed the following interesting cases of delegates:

  • Closed static
  • Open instance

See the previous post for the full explanation, but these basically open up some interesting dynamic scenarios. The problem is that C# and VB do not expose syntax for constructing these in an "early-bound" fashion, that is, using the special constructor on the delegate type rather than Delegate.CreateDelegate (which more or less binds via reflection).

For most scenarios this is not a huge problem, but there are some performance implications and other issues that I don't really want to dig into at the moment. One sufficiently important scenario is testing early-bound invocation. If your language doesn't support something, how can you test it? Well, you can write the whole test in IL, but that is not a terribly maintainable proposition.

Another option is to only write the part you need in IL.  Unfortunately, C# doesn't allow you to write inline IL, but you can use Reflection.Emit.  And, since v2.0, you can use LCG (Lightweight Code Generation) via DynamicMethod.

The trick here is to understand how delegates are instantiated.  Delegates are just classes like any other.  They inherit from MulticastDelegate (typically).  The special part is that the runtime provides all the implementation and they have a special constructor.  Here's (approximately) the constructor signature for System.Action<T>:

public Action(object o, IntPtr method)

Object? IntPtr?  What the heck? Well, it's not as bizarre as you might think.  The object is simply the first argument for the invocation.  This allows binding to a particular instance ("this" for instance methods, arg 0 for static methods). The IntPtr is a pointer to the method.  "Pointers?!!?!?! in managed code?!?!" you say?  That's right, a pointer.  An object is easy enough to come by, but where do I get the pointer?  Well, the pointer can be easily retrieved via the ldftn opcode.  It loads the address of a given method (described via a token in IL, and a MethodInfo in Reflection.Emit).

Let's cut to the chase. Here's a little class that can bind a method to a delegate type and allow you to provide the first argument (you'll need System, System.Reflection, System.Reflection.Emit using statements):

public static class DelegateBinder {

    public delegate TDelegate Binder<TDelegate>(object firstArg);

    public static TDelegate Bind<TDelegate>(object firstArg, MethodInfo method) {
        DynamicMethod dynMethod = new DynamicMethod("PassthroughBinderImplementation", typeof(TDelegate), new Type[] { typeof(object) }, typeof(DelegateBinder));
        ILGenerator gen = dynMethod.GetILGenerator();
        //load the first argument
        gen.Emit(OpCodes.Ldarg_0);
        //load the address of the method
        gen.Emit(OpCodes.Ldftn, method);
        //create the delegate
        gen.Emit(OpCodes.Newobj, typeof(TDelegate).GetConstructor(new Type[] { typeof(object), typeof(IntPtr) }));
        gen.Emit(OpCodes.Ret);
        return ((Binder<TDelegate>)dynMethod.CreateDelegate(typeof(Binder<TDelegate>)))(firstArg);
    }
}

With this class, you can dynamically construct all the early-bound variants (ignoring variants for signature relaxation) like so:

using System;
using System.Reflection;
using System.Reflection.Emit;

public delegate string Passthrough(string str);
public delegate string BoundPassthrough();
public delegate string ProgramPassthrough(Program p);

public class Program {
    static void Main(string[] args) {
        Console.WriteLine("Open Static:");
        Passthrough ospt = DelegateBinder.Bind<Passthrough>(null, typeof(Program).GetMethod("StaticImplementation", new Type[] { typeof(string) }));
        Console.WriteLine(ospt("Hello World"));

        Console.WriteLine("Closed static:");
        BoundPassthrough cspt = DelegateBinder.Bind<BoundPassthrough>("Hello World", typeof(Program).GetMethod("StaticImplementation", new Type[] { typeof(string) }));
        Console.WriteLine(cspt());

        Console.WriteLine("Open Instance:");
        ProgramPassthrough oipt = DelegateBinder.Bind<ProgramPassthrough>(null, typeof(Program).GetMethod("InstanceImplementation", Type.EmptyTypes));
        Console.WriteLine(oipt(new Program("Hello World")));

        Console.WriteLine("Closed Instance:");
        BoundPassthrough cipt = DelegateBinder.Bind<BoundPassthrough>(new Program("Hello World"), typeof(Program).GetMethod("InstanceImplementation", Type.EmptyTypes));
        Console.WriteLine(cipt());
    }

    public static string StaticImplementation(string str) {
        return str;
    }

    public Program(string payload) {
        _Payload = payload;
    }

    string _Payload;

    public string InstanceImplementation() {
        return _Payload;
    }
}

So, there are certainly cases that will break this, most involving incompatible signature issues between the method, delegate, and the first argument.  But I didn't want to make things more complicated for an example. Besides, the point of this is not really to give you some neat tool (you'll probably never need to do this), but to give people a better idea what the compiler is doing for you when you create a delegate.

posted on Friday, May 11, 2007 3:54:23 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [1]
# Wednesday, February 21, 2007

A quickie here. I had written some code in JavaScript to simplify testing of a fairly dense, multi-dimensional feature area. I could define the dimensions I wanted to test declaratively, in a more or less JSON fashion, and then bang them together via some enumerable-like extensions that I had written. Voila, full test matrix implemented.

Unfortunately, I learned that I would be unable to take a dependency on the scripting host at run-time. Too bad; I had already invested the time in the solution, and now I would have to code the matrix myself. But wait, this is a dynamic language I'm dealing with! So, in a short time, I did some object substitution, and now my JavaScript is a precompiler that emits C# and other support files that I then compile to do the same thing.

Nikhil has a more general tool that goes the other way: it compiles C# into JavaScript for use in AJAX apps. While this is a handy way for people unfamiliar with the power of dynamic languages to jump into the AJAX world, I'm wondering if it's really a good idea in the long term. On the one hand, reducing the number of language dependencies in a project is good for maintainability, but choosing the less-flexible one seems like the wrong choice. Of course, JavaScript has no formal set of class libraries, so that's a limitation. Hmmm. I'll have to think about this some more.

posted on Wednesday, February 21, 2007 10:05:09 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Friday, February 16, 2007

So, my last post talked about delegates.  In it, I mentioned some compiler trickery involved in declaring events, but I didn't bother explaining it.  After reading it over again, and getting some feedback, I felt bad about glossing over what is pretty much the mainline scenario for delegates. So, what is an event?

An event is kind of like a broadcast. It enables an object to notify subscribers when some "event" occurs, give them relevant information about the event, and allow them to do something in response. And, you guessed it, delegates are at the core of making this work.

Fundamentally, events are a callback mechanism, and could be implemented without delegates using anything from raw function pointers to interfaces, and the CLR doesn't keep you from doing either of those, but there's value in a consistent pattern.  In fact, the designers of the CLR felt so strongly about the value of this particular pattern, that it is part of the CLI spec (along with properties, another pattern that is implemented by other more fundamental constructs).

So, how do you make an event?  Well, in C#, you declare an event like you would declare a field whose type is some delegate and you add the "event" keyword.  So, somewhere in a type, you would have something like:

public event EventHandler Click;

Whether or not it's public depends on how you expect the event to be used.  EventHandler is a delegate with the following signature:

public delegate void EventHandler(object sender, EventArgs e);

This signature is another pattern that I'll talk about later. For now, let's look at what the compiler does for our event declaration. The compiler generates 3 things (if you don't count the things it already did for the delegate EventHandler):

  • A private field whose type is the delegate EventHandler
  • A (public in this case) method "accessor" for adding delegate callbacks: add_Click (the "Click" part comes from the event name)
  • A (public in this case) method "accessor" for removing delegate callbacks: remove_Click

When other code wants to hook up to your event, they use the += operator on your event.  This is really syntax sugar for calling the add_Click method.  And, conversely the -= operator calls the remove accessor.
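To make those three generated pieces visible, here's a minimal sketch using a hypothetical Button class (the name is illustrative, not from any framework). It asks reflection for the accessor names, checks for the private field, and then subscribes by calling add_Click directly, which has the same effect as +=. It assumes the C# compiler's usual behavior of naming the private backing field after the event.

```csharp
using System;
using System.Reflection;

// Hypothetical type used only for illustration.
public class Button
{
    // Field-like event: the compiler emits a private EventHandler field
    // plus add_Click and remove_Click accessor methods.
    public event EventHandler Click;

    public void RaiseClick()
    {
        // Inside the declaring class, "Click" refers to the backing field.
        EventHandler handler = Click;
        if (handler != null)
            handler(this, EventArgs.Empty);
    }
}

public static class EventAccessorDemo
{
    // Names of the accessor methods generated for Button.Click.
    public static string[] AccessorNames()
    {
        EventInfo click = typeof(Button).GetEvent("Click");
        return new string[] { click.GetAddMethod().Name, click.GetRemoveMethod().Name };
    }

    // True if the compiler-generated private delegate field is present.
    public static bool HasPrivateBackingField()
    {
        FieldInfo field = typeof(Button).GetField("Click", BindingFlags.NonPublic | BindingFlags.Instance);
        return field != null && field.FieldType == typeof(EventHandler);
    }

    // Subscribes by invoking add_Click directly (what "+=" compiles to),
    // raises the event, and reports whether the handler ran.
    public static bool SubscribeViaAddAccessor()
    {
        bool called = false;
        Button button = new Button();
        EventHandler handler = delegate(object sender, EventArgs e) { called = true; };
        typeof(Button).GetEvent("Click").GetAddMethod().Invoke(button, new object[] { handler });
        button.RaiseClick();
        return called;
    }
}
```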

Interestingly, you can write your own implementation of the event pattern. You might want to do this to save memory in a possibly large tree structure with lots of events at each node. ASP.NET does this with controls. Rather than every Control having a field for each of its many events, it keeps a sparse dictionary of event delegates that is populated only for events that have subscribers. With a tree that can easily have thousands of controls per page view, this results in sizeable savings. How do you do this? Well, in C#, you use the little-known syntax:

public event EventHandler Click {
    add { /* do something with value in here */ }
    remove { /* do something with value in here */ }
}

Looks like a property, eh? This causes the compiler not to create the 3 things I mentioned above. Instead, += and -= call your add and remove accessors to do the adding and removing (the handler arrives via the value keyword, just like properties). In them, you can do anything you want, although it's advisable to keep the same semantics as the default implementations.
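Here's a minimal sketch of the sparse-storage idea, assuming a hypothetical TreeNode class. It uses System.ComponentModel.EventHandlerList as the per-instance dictionary (ASP.NET's Control.Events property is also an EventHandlerList), but the key objects and the lazy allocation are my own illustrative choices, not ASP.NET's actual code.

```csharp
using System;
using System.ComponentModel;

// Hypothetical node type for illustration.
public class TreeNode
{
    // One lazily created list holds the handlers for every event on this node.
    private EventHandlerList _events;

    // Static keys identify each event; they are shared by all nodes.
    private static readonly object ClickKey = new object();
    private static readonly object ExpandKey = new object();

    private EventHandlerList Events
    {
        get
        {
            if (_events == null)
                _events = new EventHandlerList();
            return _events;
        }
    }

    public event EventHandler Click
    {
        add { Events.AddHandler(ClickKey, value); }
        remove { Events.RemoveHandler(ClickKey, value); }
    }

    public event EventHandler Expand
    {
        add { Events.AddHandler(ExpandKey, value); }
        remove { Events.RemoveHandler(ExpandKey, value); }
    }

    // Nodes that never had a subscriber never allocate the list.
    public bool HasEventStorage { get { return _events != null; } }

    public void OnClick()
    {
        if (_events == null) return;
        EventHandler handler = (EventHandler)_events[ClickKey];
        if (handler != null)
            handler(this, EventArgs.Empty);
    }
}
```

A node with no subscribers pays for a single null reference instead of one delegate field per event.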

So, let's talk a little bit about what happens when the event is raised. Let's say that several other classes have subscribed to your event (via the += syntax or whatever the compiler supports). Inside your class, you simply invoke the delegate (there's a recommended pattern for this as well). But wait, there's more than one subscriber! Remember, delegates aren't just function pointers, and they are more powerful than interfaces alone. If you'll recall from the last post, I said that when you create a delegate, you're really getting a MulticastDelegate, which tracks an invocation list of delegates to run. (This is why the standard event pattern returns void; otherwise, you've got the weird situation of multiple return values from what appears to be a single call.) Under normal circumstances, each delegate in the invocation list is called in order, and then execution resumes after the invocation.
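A sketch of both ideas with a hypothetical Publisher class: the raise method copies the delegate into a local before the null check (the commonly recommended pattern, which avoids a race with a concurrent unsubscribe between the check and the call), and a single invocation runs every handler in the invocation list in the order they were added.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical publisher used only for illustration.
public class Publisher
{
    public event EventHandler Changed;

    // Copy to a local first, then check for null, then invoke.
    protected virtual void OnChanged(EventArgs e)
    {
        EventHandler handler = Changed;
        if (handler != null)
            handler(this, e);
    }

    public void Change() { OnChanged(EventArgs.Empty); }

    // Number of delegates currently in the multicast invocation list.
    public int SubscriberCount
    {
        get { return Changed == null ? 0 : Changed.GetInvocationList().Length; }
    }
}

public static class MulticastDemo
{
    // Subscribes three handlers, raises once, and returns the order they ran in.
    public static List<string> Run()
    {
        List<string> calls = new List<string>();
        Publisher p = new Publisher();
        p.Changed += delegate(object s, EventArgs e) { calls.Add("first"); };
        p.Changed += delegate(object s, EventArgs e) { calls.Add("second"); };
        p.Changed += delegate(object s, EventArgs e) { calls.Add("third"); };
        p.Change(); // one call site, three handlers
        return calls;
    }
}
```

The "normal circumstances" caveat matters: if one handler throws, the exception propagates out of the invocation and the handlers after it in the list don't run.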

posted on Friday, February 16, 2007 10:36:02 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Thursday, February 15, 2007

Soon, my ownership area will extend to include delegates.  Since I'm fairly excited about this, I thought I'd celebrate by writing a little something about them.  So, what are delegates?  A casual observer might be tempted to write off delegates as a sort of managed function pointer.  While this comparison is certainly accurate, there's much more to explaining the power of delegates.

In general, delegates are a sort of universal method dispatch mechanism. Initially, the scenario they supported was callbacks. Delegates are one of the things that distinguish the CLR from other VMs like Java's. Java requires the use of interfaces to implement callbacks. (I'm only calling that out as a distinction, not saying the Java way is bad, although personally I like what delegates bring to the table.) So, delegates let you wrap up a method as an object to pass around, with the expectation that it will be called from another context.

It's sort of hard to talk about delegates because the discussion is often framed by the language that's exposing them. Currently, no managed language exposes them in the way that they are represented in IL. In C# and VB, you declare a delegate by simply defining a method signature. From an IL perspective, the compilers generate a class that inherits from MulticastDelegate (another story I'll get to later), with an Invoke method that matches your signature and a special constructor that takes the target object and a method pointer. (You also usually get the corresponding asynchronous calling pattern support methods, BeginInvoke and EndInvoke, but I don't want to get into that.) Some other delegate-related compiler trickery is involved in declaring events, which I'll cover later.

Under the covers, a delegate [conceptually] contains 2 things:

  • A target object
  • A target method

Now, generally speaking, the target method is the method to be run, and the target object is the object on which the target method will be run, but there are cases where this line is blurred a bit.  For instance, when a delegate is pointing to a static method, the target object is conceptually null (internally it's not, but that's an implementation detail).  I'll get into the other cases later.
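Both pieces are visible through the Delegate.Target and Delegate.Method properties. In this sketch (the Greeter type is hypothetical), Target is the bound instance for an instance method and reports null for a static method, which is how the documentation for Delegate.Target describes it.

```csharp
using System;

// Hypothetical type used only for illustration.
public class Greeter
{
    private readonly string _name;
    public Greeter(string name) { _name = name; }

    public string Greet() { return "Hello, " + _name; }
    public static string Shout(string text) { return text.ToUpperInvariant(); }
}

public static class DelegateAnatomy
{
    public static void Show()
    {
        Greeter greeter = new Greeter("World");

        // Instance method: the target object is the Greeter, the target method is Greet.
        Func<string> instanceDel = greeter.Greet;
        Console.WriteLine(ReferenceEquals(instanceDel.Target, greeter)); // True
        Console.WriteLine(instanceDel.Method.Name);                      // Greet

        // Static method: Target reports null.
        Func<string, string> staticDel = Greeter.Shout;
        Console.WriteLine(staticDel.Target == null);                     // True
        Console.WriteLine(staticDel.Method.Name);                        // Shout
    }
}
```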

So now you're saying, "Yup, that's a delegate. Big deal. What's so cool about that?" What's cool about that, my friend, is that delegates are the things that power virtually all of the coolest new language features that came out in v2.0 and will be coming out, including all the dynamic language goodness like IronPython. It's the dynamic stuff that is really exciting, so let's talk about how delegates enable dynamic languages on top of a statically typed system.

(To be fair, Jim Hugunin did his initial IronPython work before these features were available, but they now play a big role.) One of the pieces of work done in v2.0 was called delegate relaxation. Previously, the target method had to match the delegate signature exactly. Now, as you might expect intuitively, the signature can be relaxed so that the target method can have "more general" parameters and return something "more specific" than the delegate's signature. This is typically described in terms of covariance and contravariance, terms that confuse even people who understand them. Here's the way I usually remember it: if I could wrap the target method in a method having the delegate's signature without casting, it will work. Anyway, this feature makes delegates quite a bit more flexible.
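A small sketch of that rule of thumb, using a hypothetical StringToObject delegate: Describe takes object (more general than string) and returns string (more specific than object). A StringToObject-shaped wrapper could call it without any casts, so the binding is allowed. Note that the relaxation only applies to reference types; a method returning int, for example, wouldn't bind to a delegate returning object.

```csharp
using System;

// Hypothetical delegate type for illustration.
public delegate object StringToObject(string s);

public static class RelaxationDemo
{
    // Parameter: object is more general than the delegate's string.
    // Return: string is more specific than the delegate's object.
    public static string Describe(object o) { return "<" + o + ">"; }

    public static object Run()
    {
        // Accepted since v2.0; under the old exact-match rule this binding was rejected.
        StringToObject convert = Describe;
        return convert("hi");
    }
}
```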

Before I go into the other features, let's talk a little about implementation. When calling an instance method in the CLR, the "this" object is passed as the first argument (which is why you see ldarg.0 in IL to put it on the stack). So, conceptually, the target object represents the first argument for the method. (There is an implementation detail that allows static methods to be called using the same convention, which is a really elegant solution.) So, by extending this idea of the target object simply being the first argument, we get a couple of interesting variants.

The first is what is called "closed" static delegates.  This allows you to specify the first argument of a static method at delegate creation rather than at the callsite.  Notice this maps quite nicely to the dynamic language concept of adding a method to an existing instance of an object.  The language runtime just needs to be able to track these extra methods as part of its method dispatch logic.

The second feature is "open" instance delegates.  This allows you to create a delegate that points to an instance method, but doesn't define the target object.  Instead, the delegate signature can have an extra first argument that will specify the target object at the callsite.  When used with LCG (DynamicMethod), this can be used to implement things like adding a method to an existing type.  Again, the language runtime merely needs to add the logic to method dispatch.

These 2 features are intriguing to me because they are not directly exposed in VB or C#. I believe VB9 exposes these, but they are not accessible in an early-bound way in C#. You can, however, create them via Delegate.CreateDelegate() using reflection, or use Reflection.Emit to generate the corresponding IL.
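Here's a minimal sketch of the reflection route, with a hypothetical Person type. Delegate.CreateDelegate(Type, object, MethodInfo) binds the first argument of a static method (closed static), and Delegate.CreateDelegate(Type, MethodInfo) given an instance method yields an open instance delegate whose first parameter supplies the target at the call site.

```csharp
using System;
using System.Reflection;

// Hypothetical type used only for illustration.
public class Person
{
    private readonly string _name;
    public Person(string name) { _name = name; }

    public string GetName() { return _name; }
    public static string Exclaim(string text) { return text + "!"; }
}

public static class CreateDelegateDemo
{
    // Closed static: the static method's first argument is bound at creation,
    // so the delegate type has one fewer parameter than the method.
    public static Func<string> ClosedStatic(string firstArg)
    {
        MethodInfo exclaim = typeof(Person).GetMethod("Exclaim");
        return (Func<string>)Delegate.CreateDelegate(typeof(Func<string>), firstArg, exclaim);
    }

    // Open instance: no target is bound; the delegate type gains a leading
    // Person parameter that acts as "this" at each call.
    public static Func<Person, string> OpenInstance()
    {
        MethodInfo getName = typeof(Person).GetMethod("GetName");
        return (Func<Person, string>)Delegate.CreateDelegate(typeof(Func<Person, string>), getName);
    }
}
```

With these, CreateDelegateDemo.ClosedStatic("Hello")() returns "Hello!", and one OpenInstance() delegate can be reused across different Person instances.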

Hopefully, I'll have some time in the future to do some samples of these as well as discuss more about how these improve the dynamic language support in the CLR.

posted on Thursday, February 15, 2007 2:32:53 PM (Pacific Standard Time, UTC-08:00)  #    Comments [1]
# Monday, January 29, 2007

I've dealt with several situations in the past months where the crux of the problem was confusion between assemblyname and filename. Let's define what we're talking about:

  • Filename - The name of a file in the filesystem, such as System.dll
  • Assemblyname - The name given to an assembly to establish its identity. In this case, we'll only concern ourselves with the "simple" name, such as System

Usually, any confusion that arises between the two can be resolved by reminding people that a filesystem is just one of the places you can get an assembly from. For instance, there are APIs for loading assemblies from byte arrays, such as Assembly.Load(byte[]).

For those who still don't see the distinction: in the managed world, the assemblyname gives identity to the code that resides in the assembly. If you have 2 assemblies with the same assemblyname, you expect them to represent the same identity (perhaps different versions, build flavors, bitness, etc.). If we relied on the filesystem name, the identity of the code could change just by renaming the file. That's not the semantics we expect.

So, why does the filename matter?  Why do we recommend keeping them the same?  Some of the reasons are simple convenience.  It's nice to look at a file and know what it is without cracking it open.  If the names are different, it's like me going to a party and wearing a nametag that says, "Peter".  While there is nothing keeping me from doing it, it causes confusion.  However, another more important reason to keep them the same is that assemblies are rarely loaded by filename.  References and most dynamic loads are done by assemblyname.  You don't take a reference to System.dll, you take a reference to System.  At some point, the loader has to find an appropriate file to load to satisfy that reference.  If System's filename is Peter.dll, then it's going to have a difficult time finding it to load.  This is actually the very reason that gacutil will not let you install an assembly into the global assembly cache if the filename doesn't match the assemblyname.  However, I think it's silly that it doesn't just fix the name for you.
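Since the loader locates files by assembly name, a quick sanity check can catch a mismatch before it bites. This is a minimal sketch (the AssemblyNameCheck helper is hypothetical): AssemblyName.GetAssemblyName reads the identity from the file's manifest, and the helper compares the simple name with the file name, case-insensitively, since simple names compare that way.

```csharp
using System;
using System.IO;
using System.Reflection;

// Hypothetical helper for illustration.
public static class AssemblyNameCheck
{
    // Reads the simple name from the file's assembly manifest and compares it
    // with the file name minus its extension (.dll or .exe).
    // Throws BadImageFormatException if the file isn't an assembly.
    public static bool FileNameMatchesAssemblyName(string path)
    {
        AssemblyName identity = AssemblyName.GetAssemblyName(path);
        string fileStem = Path.GetFileNameWithoutExtension(path);
        return string.Equals(fileStem, identity.Name, StringComparison.OrdinalIgnoreCase);
    }
}
```

Renaming System.dll to Peter.dll would make this return false, because the manifest inside still says System.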

What about multi-module assemblies?  Well, it's the module with the assembly manifest that matters.  It's the one that should match.  Then the rest of the files need to match the assembly manifest :). But, if you're using multi-module assemblies, let me know.  I'd like to know why.

posted on Monday, January 29, 2007 12:46:56 PM (Pacific Standard Time, UTC-08:00)  #    Comments [1]
# Monday, January 22, 2007

I've mentioned before that one of my ownership areas at MS is the CLR "shim". Most people I've said that to ask me, "What's that?" I usually reply that, generally, it's mscoree.dll, to which they typically respond, "Oh yeah. What does that do?" In general terms, the shim is in charge of starting up the runtime in a process. In addition, it exposes all the hosting APIs and the other entry points you need to work with the CLR from unmanaged code. If you look at a managed app, you will see that it has a dependency on mscoree.dll and nothing else CLR-related. After the runtime is spun up, most of the things that mscoree exposes are simply forwarded calls into the mscorwks.dll of the runtime version that was loaded.

What's interesting about mscoree.dll, is that it is the only piece of the runtime that doesn't run side by side.  You can have v1.1 and v2.0 installed on the machine, but you will only have one mscoree.dll.  You always have the version of mscoree.dll that corresponds to the latest version of the runtime installed on your machine (unless you have installed a patch or something that services mscoree, in which case you may have a v2.0 shim even though only v1.1 is installed on your machine).

So, naturally, backwards compatibility is extremely important in the Shim.  When you start up managed code, the Shim decides which version of the CLR to fire up based on lots of different things.  These things are all fairly well documented and all have a specific scenario they enable, but by their nature they are very confusing.

Aside: When I was job hunting, I interviewed at several companies other than Microsoft. During some of those other interviews, I was asked questions about what runtime would be started under certain conditions. The rules are so confusing that some of the interviewers, although all extremely smart people, had formed incorrect models of what the rules were. Some told me I had the wrong answer to their question, when in fact it was correct. (That's not to say that I knew the correct answers to all of them.)

I'm not certain that I can clear up the confusion, but I do hope to have a series of posts in the coming months on why the shim does what it does under certain circumstances.  Then, at least you might understand what's going on when it happens.

posted on Monday, January 22, 2007 10:50:36 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Wednesday, January 03, 2007

Someone asked me the other day if you could reflect on other people's assemblies using the CLR.  My answer was, "ABSOLUTELY!!!!!"

As it turns out, they were having some problems achieving this, and they were wondering if there was some kind of security mechanism in place that was preventing reflection on 3rd party assemblies. Here's the basic scenario. They had a 3rd party library with a type that defined several "constants" using fields. They needed to be able to specify a named constant via a string and return the value for the constant. Security and performance arguments aside (they had already been considered), they simply wanted to look up the field by name via reflection and get its value. This can be accomplished via just a few fairly reasonable lines of code using one of the GetField family of methods on System.Type and then getting the value. I'll leave this as an exercise for the reader so they can have the fun of wrestling with the silly BindingFlags enum.

After some discussion, I learned that the trouble actually revolved around getting the Type object in the first place.  They were using Type.GetType to load the type with a namespace-qualified name as the string argument.  It was returning null (Nothing in VB).  They were validating that the string was correct using Intellisense, and concluding that since Intellisense could see the type, that Type.GetType should also "see" it (which seems like a perfectly reasonable assumption).

Type.GetType() takes a string argument specifying the type to retrieve. When specifying types in strings, you can usually use a namespace-qualified name ("[namespace.]type") or an assembly-qualified name ("[namespace.]type, assemblyName"). If you don't provide the assembly name, Type.GetType() only looks in the calling assembly and in mscorlib. If an assembly name is specified, a bind occurs to that assembly and it is loaded if necessary.

In this case, there was a reference to the assembly containing the type, so Intellisense was picking it up and providing completion. The trouble was that, at runtime, a namespace-qualified name never caused that third-party assembly to be searched, so Type.GetType() returned null.

So, the options were (in order of appropriateness in my opinion):

  • Use an early-bound type - Since the type was known at compile-time, use typeof() (GetType() in VB). This will create a compile-time, assembly-qualified type reference in the IL rather than a runtime parsing/bind/load of the type string.
  • Use an assembly-qualified type string - Adding the assembly name to the type string will let the CLR know what assembly to look in (and load if necessary). There are some subtle versioning issues with this approach, especially for strongly-named assemblies.
  • Search the assemblies already loaded in the AppDomain - Enumerate AppDomain.CurrentDomain.GetAssemblies() and call Assembly.GetType() on each. This only finds the type if the assembly has already been loaded (for instance, by an earlier early-bound call into it). This seems like a fragile solution and I would not recommend it, although it will technically work.

The real issue here is the number of samples (especially in VB) provided by MS that use Type.GetType() with a name that isn't assembly-qualified (ex. "System.String"). That only works because System.String lives in mscorlib; the same pattern fails as soon as it's applied to a type in another assembly.
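Here's a sketch of the first two options plus a helper that searches the assemblies already loaded in the AppDomain (fragile, since it only succeeds once the assembly is loaded). System.Uri stands in for the third-party type, and the qualified name is computed from typeof(Uri) rather than hard-coded, because the exact string, including version and public key token, depends on the framework you're running. The TypeLookup class is hypothetical.

```csharp
using System;
using System.Reflection;

// Hypothetical helper for illustration.
public static class TypeLookup
{
    // Option 1: early-bound; the compiler emits an assembly-qualified type reference.
    public static Type EarlyBound() { return typeof(Uri); }

    // Option 2: assembly-qualified string; the runtime binds to (and loads if
    // necessary) the named assembly. The full name carries version, culture and
    // public key token, which is where the versioning subtleties come from.
    public static Type ByQualifiedName()
    {
        string qualified = typeof(Uri).AssemblyQualifiedName;
        return Type.GetType(qualified, true);
    }

    // Fragile fallback: look only at assemblies that are already loaded.
    // Returns null if the defining assembly hasn't been loaded yet.
    public static Type SearchLoadedAssemblies(string fullName)
    {
        foreach (Assembly assembly in AppDomain.CurrentDomain.GetAssemblies())
        {
            Type found = assembly.GetType(fullName, false);
            if (found != null)
                return found;
        }
        return null;
    }
}
```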

posted on Wednesday, January 03, 2007 9:09:25 AM (Pacific Standard Time, UTC-08:00)  #    Comments [1]
# Wednesday, November 15, 2006

Historically, when I've gotten questions from friends and colleagues regarding performance, I've often sent them straight to Rico Mariani's blog. It is a wonderful source of advice, guidance, rules and humor in the realm of performance. His latest entry is another wonderful piece.

I've spoken with several individuals who have said things like, "Well, our goal was to be able to handle a request in X seconds, or handle Y amount of throughput, and we're too slow. Any ideas?" More than likely, they've done their design and implementation without any work to make sure they reach their performance goals, and by that time, my advice to "get rid of that dependency" or "redesign this component" comes too late, when factoring the performance goals into those design decisions would have raised the red flag immediately.

I love his approaches because they tend to sound a lot more like engineering (which is my background) than a lot of the guidance that gets thrown around. Here's my favorite quote:

I get very worried when people say things like “Productivity and cleanliness always trump performance.”   Productivity is about creating product.  A “clean” design which fundamentally fails to address performance requirements is not an example of a productive enterprise, it is a looming disaster.  A developer productively engaged in creating a failure is uninteresting. 

Now that Rico and I work for the same company (across the street from each other) perhaps I can come up with a good excuse to meet him in person.

[UPDATE] After reading this, I thought it came off a little snobbish. I think we've all been in the above situation, many times through no fault of our own. We're often the victims of process, bureaucracy, or other external forces that work against success. Please don't feel like I'm talking down to anyone. Hindsight is 20/20.

posted on Wednesday, November 15, 2006 1:07:39 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Tuesday, November 14, 2006

Something that always bothered me at my previous job was having to install the framework SDK to get a copy of gacutil.exe.  I know the guys still there hate having to install the SDK on a server so they can manipulate the GAC.  Richard Lander gives some interesting information on the topic, but he doesn't go into why it's not included in the redist.

Since I've been at MS, I've learned a great deal more about the Global Assembly Cache (GAC) and the fusion APIs. Last night, I was taking a shower after cutting my hair, and the reason came to me.  That reason is... installers, or more importantly... uninstallers.

During a discussion recently, I heard an amazingly profound saying:

"It is better to fail to do, than to fail to undo"

I don't recall who said it, and they probably got it from someone else, but it is right on the money.

When programs uninstall, they have to correctly remove things they've placed in the GAC. Let's make up an example. Let's say I have some software company. We've developed a magical managed library that makes it wicked easy to develop our software, so we use it in all our products. Let's say that in our deployment model, it makes sense to deploy that library to the GAC. So, uninstalling our software should remove it from the GAC, right? Well, what if one of our other products is on the machine? We don't want to uninstall one and break the remaining one.

When you install an assembly into the GAC via the Fusion APIs, you do so with a traced reference.  That reference tells Fusion "who" installed it.  If 2 installers install the same assembly, it's smart enough to know not to remove the assembly until both uninstall.
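To illustrate why traced references make shared installs safe, here's a toy model. It is not the Fusion API (the real interface is IAssemblyCache::InstallAssembly, which takes a FUSION_INSTALL_REFERENCE describing the installer); the ToyAssemblyCache class and its methods are hypothetical and simply track which installers hold a reference.

```csharp
using System;
using System.Collections.Generic;

// Toy model only, NOT the Fusion API: it mimics the bookkeeping that
// traced references enable, so a shared assembly outlives any one installer.
public class ToyAssemblyCache
{
    // assembly name -> identities of the installers holding a reference
    private readonly Dictionary<string, HashSet<string>> _references =
        new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase);

    public void Install(string assembly, string installer)
    {
        HashSet<string> owners;
        if (!_references.TryGetValue(assembly, out owners))
        {
            owners = new HashSet<string>();
            _references[assembly] = owners;
        }
        owners.Add(installer);
    }

    // Drops one installer's reference; the assembly is actually removed
    // (and true returned) only when the last reference goes away.
    public bool Uninstall(string assembly, string installer)
    {
        HashSet<string> owners;
        if (!_references.TryGetValue(assembly, out owners))
            return false;
        owners.Remove(installer);
        if (owners.Count > 0)
            return false;
        _references.Remove(assembly);
        return true;
    }

    public bool IsInstalled(string assembly) { return _references.ContainsKey(assembly); }
}
```

Installing with no reference at all, or forcing a removal regardless of the remaining owners, is exactly what breaks this bookkeeping.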

GACUtil, as a management tool, lets you use traced references as well, but it also allows you to install without traced references. It also allows you to force uninstalls and do lots of other screwy things. In other words, the tool is too powerful. Devs need to be able to do screwy things. Administrators need to be able to do screwy things. Regular users don't. Give them an install package that handles everything. Otherwise, you're bound to have a support nightmare.

If I have time, or enough requests, we'll go into how to use the fusion APIs directly to manage the GAC.

posted on Tuesday, November 14, 2006 9:10:30 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Monday, November 13, 2006

Several people have complained that I haven't made any entries in a while.  There are several reasons for this.  Aside from being generally busy, I'm signed up to do some posts about the CLR in general, and some from my ownership areas as part of my job.  This has resulted in several "half-baked" entries ranging from hosting the runtime to the Global Assembly Cache, to some other more general CLR-related entries.

In addition, MS has had several new products in the queue that were on the verge of releasing, and I've had entries waiting for them to release.  Of course, being on the safe side, I've waited for someone else to blog first, at which point an extra post from me doesn't make much difference.

And thirdly, Jenna is growing like a weed and it's hard to find time to sit down and formulate posts when you're chasing her around the house.

So, regarding technical content.  I'd like to get an idea of what people are interested in hearing about. So shoot me an email or leave a comment with suggestions for posts regarding the CLR in general or specifically within my areas of ownership:

  • The unmanaged hosting APIs (CorBindToRuntime, etc.). These are what you'd use to host the CLR in your own app in order to more tightly control things, or provide additional isolation or escalation policies.
  • The global assembly cache
  • "automatic" CLR activation - what I mean by this is what happens when the runtime is spun up by a managed app, or via COM interop. Things like how to decide which runtime to use, etc.

In addition, I'll try to keep the personal updates coming for those who are not looking for just technical content.

posted on Monday, November 13, 2006 12:49:21 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Tuesday, September 12, 2006

For some reason, it really bothers me when I see this pattern in code (using no particular language):

someVar = false;
if (someCondition) {
   someVar = true;
}

This can, of course, simply be specified as:

someVar = someCondition;

Now, I understand that as code is refactored, this pattern can appear due to simplifications...but still...come on. It's even worse when someVar is used only as the condition of another if block.

UPDATE: This seems to occur most often in code whose ownership has changed hands several times. New owners seem to either edit the code while maintaining its structure as closely as possible, or refactor it heavily. Many times, the former choice is made, especially if the original owner is still around somewhere. Usually, my first instinct when I inherit code is to refactor massively. I'll post a more in-depth view on this later.

posted on Tuesday, September 12, 2006 3:09:58 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [6]
# Tuesday, May 23, 2006

I thought some others might find this useful.

I have been baffled for a few days why certain actions in an experimental Rails app would work fine in dev mode, and then give me mysterious HTTP 500 errors when deployed.  The Rails logs would tell me everything was just fine and there was no problem. But, there they were in Fiddler... status 500.  I haven't figured out how to get ahold of the Apache logs from my host yet, so they couldn't help me.

I finally set up Apache with FastCGI myself so I could try to reproduce the problem.  It was immediately apparent.  The Apache log was complaining about invalid headers in the FastCGI communication.  I had been using "puts" to write to the console in dev mode so I could quickly see what was going on.  That works fine under the WEBrick standalone server, but FastCGI on Apache evidently uses stdout to communicate between Apache and the FastCGI processes.  Writing to stdout corrupts that communication, so Apache reports HTTP 500 even though Rails thinks everything's A-OK.

The lovely thing about Ruby is that I was able to fix it by redefining puts to do nothing.  Ideally, you ought to use the logging mechanisms, and I will.  But that made a great short-term fix.

posted on Tuesday, May 23, 2006 12:59:36 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [0]
# Friday, May 12, 2006

During the last PDC, I drooled over the announcements regarding C# 3 and LINQ.  Now we are one step closer. Yesterday, they released a new preview of those features that bind against the current version of the framework.

Skeptical language junkies* should take a gander at Don Box's latest post, which attempts to explain what the big deal is. I agree with some of the people saying that the syntax really gets in the way (and I feel that way even more after spending a lot of time in Ruby lately).  They need to add a language feature that indicates whether you want the lambda or the expression; then you can fall back on type inference to keep the typing to a minimum.

Overall, I couldn't be more giddy about the way integrated query works, and being able to get your hands dirty with it now is really awesome.  We're already working with data in this way in our APIs at work, so I've been able to experiment with the language and library features directly with some of our data and it's an absolute dream.  They should not wait for the next version of the framework to give us a go-live license for this.

Now, this kind of thinking needs to be applied to the web stack and MS could have a compelling alternative to Rails.

* That is, language junkies who are skeptical, not junkies of skeptical languages.

posted on Friday, May 12, 2006 12:38:07 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [0]
# Sunday, February 12, 2006

One of the most popular search hits for my blog is "Managed XMP Parser".  A while back (actually it was 1 year ago today...whoa, freaky), I blogged about extracting the XMP data out of my pictures after screwing up the upload into Flickr.  I ended up writing my own code to pull out the XMP data.  I mentioned making it available, but it was relatively straightforward, so I never got around to posting it.

In the last week, I've gotten lots of requests for the code, so here it is, ugliness and all.  One interesting thing about my approach is that I do not rely on any particular file format.  I simply look for the XMP markers and pull out the XML in between.  This means it will work on ANY file with embedded XMP.

All the usual disclaimers apply.  I don't claim this is the best way, but it works.  I've just plucked it out of the little date-fixing app I built.  At the end, you'll have an XPathNavigator and a namespace manager set up to run XPath queries.  There's probably some sweet stuff in 2.0 that could help us out here, but I haven't updated it.  Enjoy:

MemoryStream xmpStream = new MemoryStream();
byte[] beginPattern = Encoding.ASCII.GetBytes("<?xpacket begin");
int beginIndex = 0;
bool beginFound = false;
byte[] beginStopPattern = Encoding.ASCII.GetBytes(">\n");
int beginStopIndex = 0;
bool xmlStartFound = false;
byte[] endPattern = Encoding.ASCII.GetBytes("<?xpacket end");
int endIndex = 0;
bool endFound = false;
bool backedUp = false;

//open read-only so read-only files (e.g. from a camera card) don't throw
using (Stream stream = new FileStream(path, FileMode.Open, FileAccess.Read)) {
      int data;
      while ((data = stream.ReadByte()) != -1) {
            byte b = (byte)data;
            if (!beginFound) {
                  //scan for the start of the xpacket header
                  if (b == beginPattern[beginIndex]) {
                        beginIndex++;
                        if (beginIndex >= beginPattern.Length) {
                              beginFound = true;
                        }
                  }
                  else {
                        if (beginIndex != 0) {
                              //partial match failed: rewind one byte and retry it against the start of the pattern
                              beginIndex = 0;
                              stream.Seek(-1, SeekOrigin.Current);
                        }
                  }
            }
            else if (!xmlStartFound) {
                  //skip the rest of the header, up to its closing ">" and newline
                  if (b == beginStopPattern[beginStopIndex]) {
                        beginStopIndex++;
                        if (beginStopIndex >= beginStopPattern.Length) {
                              xmlStartFound = true;
                        }
                  }
                  else {
                        if (beginStopIndex != 0) {
                              beginStopIndex = 0;
                              stream.Seek(-1, SeekOrigin.Current);
                        }
                  }
            }
            else if (!endFound) {
                  //load up the memorystream
                  if (backedUp) {
                        //this byte was already written before we rewound
                        backedUp = false;
                  }
                  else {
                        xmpStream.WriteByte(b);
                  }
                  if (b == endPattern[endIndex]) {
                        endIndex++;
                        if (endIndex >= endPattern.Length) {
                              endFound = true;
                              //trim the end marker that was copied into the stream along with the XML
                              xmpStream.SetLength(xmpStream.Length - endPattern.Length);
                              break;
                        }
                  }
                  else {
                        if (endIndex != 0) {
                              endIndex = 0;
                              stream.Seek(-1, SeekOrigin.Current);
                              backedUp = true;
                        }
                  }
            }
      }
}

if (!endFound) {
      Console.WriteLine("No XMP data found");
      //"break" only compiles inside a loop (as in the original app); use return in a standalone method
      return;
}

//load up the xmp
xmpStream.Position = 0;
XPathDocument xmpDocument = new XPathDocument(xmpStream);
XPathNavigator xmpNav = xmpDocument.CreateNavigator();
XmlNamespaceManager nsManager = new XmlNamespaceManager(xmpNav.NameTable);
nsManager.AddNamespace("x", "adobe:ns:meta/");
nsManager.AddNamespace("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
nsManager.AddNamespace("iX", "http://ns.adobe.com/iX/1.0/");
nsManager.AddNamespace("crs", "http://ns.adobe.com/camera-raw-settings/1.0/");
nsManager.AddNamespace("exif", "http://ns.adobe.com/exif/1.0/");
nsManager.AddNamespace("aux", "http://ns.adobe.com/exif/1.0/aux/");
nsManager.AddNamespace("pdf", "http://ns.adobe.com/pdf/1.3/");
nsManager.AddNamespace("photoshop", "http://ns.adobe.com/photoshop/1.0/");
nsManager.AddNamespace("tiff", "http://ns.adobe.com/tiff/1.0/");
nsManager.AddNamespace("xap", "http://ns.adobe.com/xap/1.0/");
nsManager.AddNamespace("xapMM", "http://ns.adobe.com/xap/1.0/mm/");
nsManager.AddNamespace("dc", "http://purl.org/dc/elements/1.1/");

XPathExpression dateExpr = xmpNav.Compile("string(/x:xmpmeta/rdf:RDF/rdf:Description/exif:DateTimeOriginal)");
dateExpr.SetContext(nsManager);
string dateTimeStr = (string)xmpNav.Evaluate(dateExpr);
DateTime date = XmlConvert.ToDateTime(dateTimeStr);

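For comparison, here is a more compact sketch of the same marker-scanning idea. It is not the code from my app: it assumes the whole file fits comfortably in memory (so it uses File.ReadAllBytes instead of reading byte by byte), treats the packet header as ending at its "?>" rather than at ">\n", and assumes the packet is UTF-8 encoded, which is the usual case. The class and method names are made up for this example.

```csharp
using System;
using System.IO;
using System.Text;

static class XmpSketch {
    // Naive byte search: index of the first occurrence of pattern at or after start, or -1.
    static int IndexOf(byte[] data, byte[] pattern, int start) {
        for (int i = start; i <= data.Length - pattern.Length; i++) {
            int j = 0;
            while (j < pattern.Length && data[i + j] == pattern[j]) {
                j++;
            }
            if (j == pattern.Length) {
                return i;
            }
        }
        return -1;
    }

    // Returns the XML between the "<?xpacket begin ...?>" header and the "<?xpacket end" marker,
    // or null if no complete packet is found. Assumes a UTF-8 packet.
    public static string Extract(byte[] data) {
        byte[] begin = Encoding.ASCII.GetBytes("<?xpacket begin");
        byte[] headerClose = Encoding.ASCII.GetBytes("?>");
        byte[] end = Encoding.ASCII.GetBytes("<?xpacket end");

        int beginAt = IndexOf(data, begin, 0);
        if (beginAt < 0) return null;

        int closeAt = IndexOf(data, headerClose, beginAt + begin.Length);
        if (closeAt < 0) return null;

        int xmlStart = closeAt + headerClose.Length;
        int endAt = IndexOf(data, end, xmlStart);
        if (endAt < 0) return null;

        return Encoding.UTF8.GetString(data, xmlStart, endAt - xmlStart);
    }

    // Convenience wrapper: loads the whole file into memory first.
    public static string ExtractFromFile(string path) {
        return Extract(File.ReadAllBytes(path));
    }
}
```

The returned string can be wrapped in a StringReader and handed to XPathDocument, then queried with the same kind of namespace manager setup as in the original code.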
 

 

posted on Sunday, February 12, 2006 2:48:18 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Thursday, January 26, 2006

Back in the days of .NET 1.1, you'd use a Hashtable to store values for fast lookup by key.  Hashtable used a single array of "buckets", each of which stored the key, the value, and a collision index.  The index into the bucket array is a transform of the key's hashcode (unless there's a collision).  Long story short, there was no stable ordering to the values in a Hashtable (technically, the order is super-important to the algorithm, but it's not useful on the outside).

Fast-forward to 2.0, and we have the generic Dictionary.  Its algorithm is quite a bit different.  It has an array of Entry<Key, Value>, a private nested struct similar to bucket in the Hashtable.  But the "bucket" data is stored in a separate array of int, which holds the index into the Entry array for that hash.  When you add something to a Dictionary, it is simply appended to the Entry array, and the bucket array is the one that gets updated (and rebuilt when the table grows).  Long story short, there IS meaning to the order you get from a generic Dictionary (using the enumerator): it's the order you added them, as long as you haven't removed anything.  This subtle change adds a lot of value to Dictionary in my opinion.
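Here's a quick way to see this for yourself, sketched in C# with a made-up class name. The first snapshot shows plain insertion order. The second shows what happens after a removal: the implementation puts the freed Entry slot on a free list and reuses it for the next Add, so the new key shows up where the removed one used to be. Keep in mind this is observed implementation behavior; the documentation calls the enumeration order undefined.

```csharp
using System;
using System.Collections.Generic;

static class DictionaryOrderDemo {
    // Returns the keys in the order the dictionary's enumerator yields them.
    static string KeysInOrder(Dictionary<string, int> d) {
        return string.Join(",", new List<string>(d.Keys).ToArray());
    }

    // Takes two snapshots: after three adds, then after a remove followed by an add.
    public static string[] Snapshots() {
        Dictionary<string, int> d = new Dictionary<string, int>();
        d.Add("c", 3);
        d.Add("a", 1);
        d.Add("b", 2);
        string afterAdds = KeysInOrder(d);      // "c,a,b": insertion order

        d.Remove("a");                          // frees the Entry slot "a" occupied
        d.Add("d", 4);                          // reuses that freed slot
        string afterRemove = KeysInOrder(d);    // "c,d,b": "d" appears where "a" was

        return new string[] { afterAdds, afterRemove };
    }
}
```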

It was just a very interesting realization to me, and I thought someone else might find it useful.

posted on Thursday, January 26, 2006 9:03:31 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Friday, January 06, 2006

As I mentioned earlier, I've been playing with WF.  When debugging in Visual Studio, a program running a workflow pretty much uses 99% of CPU the whole time.  I thought that was pretty scary, especially since I had a lot of Delay activities in my workflow.  Well, not to worry: running the .EXE outside of devenv gives you what you expect, a very well-behaved program.

I suspect it has to do with the sweet debugging stuff that's built into the workflow designer surface.  You can add breakpoints to activities.  It's pretty cool, despite crashing on a regular basis.  Hopefully, they'll fix this before RTM.

posted on Friday, January 06, 2006 8:00:06 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Thursday, January 05, 2006

I've had a bit of time this week to dive into WF (formerly WWF... he he).  I'm very interested in utilizing this both professionally and personally.  In my experience, a HUGE amount of the software I interact with and write deals with workflow.  I need more than one hand to count the pieces of software I deal with at work that attempt to give the user a design surface to create workflow on.  Most of them fail pretty miserably.  That's why I was delighted that the WF designer was so extensible and reusable.

After orienting myself with the concepts of WF, I set out to extend the base SequentialWorkflow to wrap some custom behavior around the execution of the workflow.  I immediately ran into problems.  It seems that if you inherit from SequentialWorkflow, the project immediately thinks you are going to use the designer to create a workflow, rather than build a base type that will be used to build workflows in other projects.  It immediately starts throwing around validation messages like:

Activity 'BlahBlahBlah' validation failed: Property 'ID' not set.

Well, of course it's not set, I'm not even remotely trying to create one yet.  So, I add the following brain-dead code in the constructor:

if (this.DesignMode) {
      this.ID = "FakeID";
}

This gets rid of the validation error, but now it gripes that my class is not partial, so I slap a partial modifier on it and let it go.  Finally, it compiles.

Once I figure that out, I'm golden and things run fairly smoothly.

posted on Thursday, January 05, 2006 11:46:27 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Tuesday, December 27, 2005

The two main High Definition standards are 720p and 1080i.  Since there is plenty of information on the two, I will not go into detail on them individually.  What there isn't a lot of is objective, well-reasoned explanation of which one is better.  Mostly, this is because it is not a simple topic, and the winner depends largely on the context of the question.  Another reason this is so hotly debated is that people spend a lot of money on HDTVs, and that turns them into zealots for whatever particular technology they've embraced. I'll try to compare the two within each of the important contexts, without personal bias.

Let's start with the fundamental question.  From a HDTV technology perspective, which is better?  The answer is, they are the same.  Before you leave a comment on how stupid I am, keep reading.  Fundamentally, we're talking about how much data can be displayed at a time.  For TV, that is limited far more by transmission bandwidth than anything else.  To maintain that bandwidth, the signal is compressed in a lossy fashion (see update below). In either format, you're going to get roughly the same "picture quality" watching TV.

Now, you're saying "shut up, idiot.  One has got to be better.  Why won't my TV do 720p (or 1080i) if they are the same?" This is where the argument gets interesting.  The signal formats were created for "analog" TV technology, namely CRTs.  For CRTs, it's all about frequency (how many lines can you draw per second?), and 720p has more lines per second.  The problem is, you're still bandwidth limited, whatever that limit may be. Horizontal "resolution" comes down to bandwidth.  Any increase in bandwidth basically gives you more capacity for horizontal resolution, so again the two are the same.  The problem is, many HD sets are based on technologies with a "native resolution", meaning the TV is locked in to a certain resolution.  In these cases, the interlacing trick used by 1080i to gain temporal resolution at the cost of spatial resolution is useless.  [Added:] Many CRT-based HD sets cannot do 720p because it requires a higher horizontal scan rate, and a flyback circuit built for 1080i can't run at that speed (in short, you got screwed (myself included) so they could increase their profit margin). In terms of scan frequency, 1080i is comparable to 540p (if that existed as a standard).

So, the result is that most HD sets have a native resolution at (or near) 720p.  So, for those sets, a 720p signal matches the native resolution of the set and "looks" better.  There's a lot of hype around 1080p sets, capable of displaying, you guessed it, a 1080p signal, which only really exists as the output of a PC at 1920x1080, which brings me to my next point.

Since basically all HD content is stored digitally, we're not just looking at lines of resolution; we've got frame sizes.  A 720p frame is 1280x720, and at the usual 60 frames per second that's 55,296,000 pixels per second.  A 1080i frame is 1920x1080, but because it's interlaced you only get 30 complete frames (60 half-height fields) per second.  That's 62,208,000 pixels per second.  So, compression and bandwidth aside, dealing with digital signal sources, you're getting more data (in the form of pixels per second) with 1080i than you are from 720p.  If that's the definition of better, then 1080i is better.

But, as I mentioned before, for fixed-resolution TV technology (anything but CRT), it takes a 1080p set to show 1080i in all its glory.  Otherwise, 720p will probably look better due to the down-conversion resampling. Clear as mud, eh?  Well, I hope this clears it up for at least one person. [Added:] A signal will almost always look better in its native form.  Whenever there is a resampling step in up/down converting, you're going to have some degradation due to aliasing.

[Added:] I owned a 1080p DLP set for a few short days, and it was by far the prettiest thing I've seen.  It displayed both 1080i and 720p signals beautifully.  I sent it back, though.  All the upconverting in the video pipeline introduced a noticeable delay that made playing video games very frustrating. :)

[UPDATE:] I had a comment asking about the lossy compression applied to video.  I'm referring to the digital compression that the video signal undergoes as part of the ATSC standard.  It uses MPEG2 compression (basically the same used for DVDs) to reduce the digital bandwidth (and therefore the analog bandwidth) that the video signal takes up so it fits in the "channel" (6MHz wide if I remember correctly).  This particular compression degrades picture quality (the amount can be controlled).  It is both a spatial and temporal compression, and gives very dramatic compression ratios.  Cable companies have more bandwidth to work with, but they usually pack the crap in as tight as they can rather than giving us better quality...jerks.

Note, there is also some "analog" encoding going on to squeeze the ~20 Mbit/sec digital signal into the 6 MHz channel.  The ATSC standard uses 8-level vestigial sideband modulation (8VSB), while cable companies use a form of Quadrature Amplitude Modulation (64-QAM).  Both are beyond the scope of this article, and neither affects picture quality, since these modulation schemes carry the compressed bits without discarding any information.

If you haven't guessed, I did a research paper on HDTV back in college.

[Yet another UPDATE:]  Want some hands-on proof?  Go download the 720p (146MB) and 1080i (211MB) versions of the King Kong movie trailer from Apple's site.  Compare the file sizes and then open them both, and you decide which one shows you more "data" (my definition of "better").

posted on Tuesday, December 27, 2005 2:07:39 PM (Pacific Standard Time, UTC-08:00)  #    Comments [2]
# Tuesday, November 01, 2005

Another geeky post.

We built our own query system several years ago.  It used the concept of "query by example" with templates that were interpreted into SQL.  You could generate the equivalent of an "IN" clause by assigning something that implemented System.Collections.ICollection to a template property.  Recently, with the introduction of generics and the generic collections, we decided to relax that contract to IEnumerable since that's really the least common denominator of all collections.  This produced some hilarious results.  As you can imagine, anywhere where we were using strings to specify template values, they were being interpreted as collections of characters.  This manifested itself mostly by having queries return nothing, since our system properly handled Chars and treated them as strings.

But, this reminded me of a similar incident a few years ago that involved a framework that rendered objects as comma-separated values. So, instead of:

Mark, Miller

You'd get:

M, a, r, k, M, i, l, l, e, r

I thought since I had been bitten twice, I would write it down.  I usually don't forget stuff I take the time to write down.

So, the moral of the story: Remember that even though System.String implements System.Collections.IEnumerable (and now System.Collections.Generic.IEnumerable<System.Char>), you usually don't want to treat it as a collection.  You may need to special-case it.
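Here's a minimal sketch of the special case in C#. CsvHelper.Render is a hypothetical helper, not the framework I mentioned; the important part is that the string check comes before the IEnumerable check, because a string would otherwise match the more general case and be split into characters.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

static class CsvHelper {
    // Hypothetical helper: renders a value (or each item of a collection) as comma-separated text.
    public static string Render(object value) {
        if (value == null) {
            return string.Empty;
        }

        // Check for string first: System.String implements IEnumerable (of char),
        // so without this test "Mark" would render as "M, a, r, k".
        string text = value as string;
        if (text != null) {
            return text;
        }

        IEnumerable items = value as IEnumerable;
        if (items != null) {
            List<string> parts = new List<string>();
            foreach (object item in items) {
                parts.Add(Render(item));
            }
            return string.Join(", ", parts.ToArray());
        }

        return value.ToString();
    }
}
```

With the check in place, CsvHelper.Render(new string[] { "Mark", "Miller" }) gives "Mark, Miller" rather than "M, a, r, k, M, i, l, l, e, r".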

Anyone else think of another type like this?

posted on Tuesday, November 01, 2005 12:23:02 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Monday, August 22, 2005

A while back, I showed you my file-based viewstate persistence solution.  Thanks to Google search hits and traffic from Scott Hanselman's analysis, it's been one of my more popular entries.  With ASP.net's improvements in this area, I felt that it was due for an update.  So, here's a quick whack at it.  The usual disclaimers apply.

2.0 adds the notion of "control state" to state persistence, which is very cool.  It's an opt-in mechanism for things that need to survive postbacks even if ViewState is turned off.  In addition, there's some new flexibility with page adapters and such, but we'll ignore that complexity for now and go for a direct port of my old sample, but include the new ControlState mechanism.

public class FileBasedPageStatePersister : HiddenFieldPageStatePersister {
      public FileBasedPageStatePersister(Page page) : base(page) {}

      public override void Load() {
            //let the base class do its thing
            base.Load();
            //get the control state
            object baseControlState = base.ControlState;
            if (baseControlState != null) {
                  //the control state should be our Guid
                  Guid guid = (Guid)baseControlState;
                  //read the contents of the file and set the two states
                  using (TextReader reader = new StreamReader(CreateOfflineViewStateFilePath(guid))) {
                        Pair pair = this.StateFormatter.Deserialize(reader.ReadToEnd()) as Pair;
                        base.ViewState = pair.First;
                        base.ControlState = pair.Second;
                  }
            }
      }

      public override void Save() {
            //create a guid for this viewstate
            Guid guid = Guid.NewGuid();

            //serialize the states into a temp file
            using (TextWriter writer = new StreamWriter(CreateOfflineViewStateFilePath(guid))) {
                  Pair pair = new Pair(base.ViewState, base.ControlState);
                  writer.Write(this.StateFormatter.Serialize(pair));
            }
            //trick the normal system into thinking all it needs to save is the guid
            base.ControlState = guid;
            base.ViewState = null;
            base.Save();
      }

      string CreateOfflineViewStateFilePath(Guid guid) {
            //TODO: put these files wherever you like
            return Path.Combine(Path.GetTempPath(), string.Format("{0}.viewstate", guid));
      }
}

So, we immediately see that it's much shorter.  This is because ASP.net uses a mechanism very similar to my 1.1 solution, so a lot of the plumbing is built in. The only thing you need to do to use it is override the PageStatePersister property on Page and return one of these. Again, we're piggybacking on the "hidden field" persistence mechanism and using that to store our Guid for the request.  Not much different, and I'm pretty happy to say that converting to this model from my old one is very simple.

Another interesting idea would be to leave the ControlState in the hidden field and only store the ViewState in the file. It would be a simple change that I'll leave as an exercise for you.  Then you could be very aggressive about purging old or large files without worrying about breaking anything (provided, of course, that you've made your controls tolerant of losing their ViewState).

posted on Monday, August 22, 2005 2:21:14 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [0]
# Saturday, August 20, 2005

My friend Peter and I often use Remote Assistance to collaborate on projects.  He and I were doing some things today in Visual Studio, and while I was watching, a beautiful window appeared that showed the open files and windows and such, complete with appropriate icons. "Whoa!", I said, "What was that?!"  "What", he replied, "this?"  The window re-appeared.  Turns out, it was the Ctrl-Tab window.  For some reason, I never even thought of using it to browse the open windows. I always assumed I had to navigate the frightful tabbed interface, which just might actually be usable with that little trick.

I felt better when I typed in a type name and Intellisense didn't recognize it.  A quick right-click->Resolve instantly added the appropriate using statement.  Peter hadn't seen that feature.  Although I guess Ctrl-Tab's been a standard for much longer than Resolve.

posted on Saturday, August 20, 2005 12:26:33 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [0]
# Thursday, August 18, 2005

This past week, I've found myself writing a lot of methods with the following pattern:

IEnumerable<T> DoSomething(IEnumerable<T> stream);

Then I use C#'s wonderful new iterator syntax to manipulate the stream, or simply intercept and process the data.  I've found this wonderfully useful for a lot of the statistical algorithms that I implement for our project.  It's been especially cool to combine this with a bit of reflection for creating "dynamic filters" based on user input.  All of a sudden, all your algorithms are incredibly flexible with very little effort.

The beauty of this pattern is that it can be chained.  Obviously, there is a practical limit here, but I've found it extremely useful and I thought I'd share it for those who haven't stumbled across this concept before.
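Here's a minimal sketch of the pattern, with made-up operators (Filter and RunningAverage) standing in for the statistical algorithms. Each one takes a stream and returns a stream, so they chain by nesting calls, and because they're iterators nothing actually runs until the outermost sequence is enumerated.

```csharp
using System;
using System.Collections.Generic;

static class StreamOps {
    // Passes through only the items the predicate accepts.
    public static IEnumerable<T> Filter<T>(IEnumerable<T> stream, Predicate<T> keep) {
        foreach (T item in stream) {
            if (keep(item)) {
                yield return item;
            }
        }
    }

    // Emits the running average of everything seen so far.
    public static IEnumerable<double> RunningAverage(IEnumerable<double> stream) {
        double sum = 0;
        int count = 0;
        foreach (double x in stream) {
            sum += x;
            count++;
            yield return sum / count;
        }
    }
}
```

For example, StreamOps.RunningAverage(StreamOps.Filter(new double[] { 1, 5, 3, 7 }, x => x > 2)) keeps 5, 3, and 7 and yields the running averages 5, 4, and 5.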

Now, if we could only create iterators using anonymous delegates, then I would proceed to drown in my own drool.

[UPDATE] fixed title grammar. Sorry if that pops it back up in your aggregators, RSS readers.

posted on Thursday, August 18, 2005 1:29:34 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [0]
# Wednesday, August 17, 2005

I blogged a few days ago about a crazy problem I had with garbage collection in ASP.net under 2.0.  I finally tracked the problem down to Array.Sort.  I created a framework for working with large sets of data in a dynamic way.  This allows us to group data and do other complex statistical things based on fairly open-ended user input, rather than having to write special case code for each scenario.  It uses some reflection to be able to drill into objects.  When sorting, we pass in a dynamic IComparer that can drill into each object and do dynamic comparisons.  Unfortunately, many of our algorithms were dependent on sorting the data first.  This was causing a huge number of allocations due to drilling in and boxing lots of value types.  And, the nature of the sort causes that to happen multiple times per item.

Under 1.1, we took the hit on memory.  It didn't take that long, and it was soon collected.  Under 2.0, the GC seems to recognize this allocation pattern early and proactively begins collecting aggressively.  The problem seems to be that this slows down the sort tremendously, to the point where it essentially comes to a halt.  After a long time, you get a bizarre-looking exception about a failed sort from deep in the framework.

What I can't figure out is, during all this time, there's still plenty of memory, and activity on other requests is not impacted horribly.

Well, I took a long, hard look at some of the operations, and re-implemented them with a hashing strategy rather than relying on a sort.  As it turns out, for most cases, this is much more efficient.  Long story short, overnight stress testing came back clean this morning.
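A rough sketch of the hashing approach, assuming you only need items grouped by key rather than fully ordered. Grouping.GroupBy is a hypothetical helper (our real code drilled into objects with reflection to get the key); the point is that each key is extracted exactly once and dropped into a hash table, whereas a comparison sort calls the comparer on the order of n log n times and, with a dynamic comparer like ours, drills into both objects on every call.

```csharp
using System;
using System.Collections.Generic;

static class Grouping {
    // Hypothetical helper: groups items in a single pass using a hash table instead of sorting first.
    // keyOf is called once per item, so any expensive key extraction happens n times, not O(n log n).
    public static Dictionary<TKey, List<T>> GroupBy<T, TKey>(IEnumerable<T> items, Func<T, TKey> keyOf) {
        Dictionary<TKey, List<T>> groups = new Dictionary<TKey, List<T>>();
        foreach (T item in items) {
            TKey key = keyOf(item);
            List<T> bucket;
            if (!groups.TryGetValue(key, out bucket)) {
                bucket = new List<T>();
                groups.Add(key, bucket);
            }
            bucket.Add(item);
        }
        return groups;
    }
}
```

Because the key type is generic, value-type keys are hashed through EqualityComparer<TKey>.Default without being boxed, which also sidesteps the allocation pattern described above.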

posted on Wednesday, August 17, 2005 7:37:49 AM (Pacific Daylight Time, UTC-07:00)  #    Comments [1]
# Friday, August 12, 2005

This is awesome.  Joe Duffy reveals what I had suspected.  A change to the CLR that will fix the issues I've previously discussed about nullable types in version 2.0!  From his blog:

The core of this change is that the IL box instruction has been modified to recognize Nullable<T>s. For non-Nullables, behavior remains the same; but upon seeing one, it inspects its HasValue property. If HasValue is true, box peeks inside the structure, extracts the T value, and boxes that instead; otherwise, box simply leaves behind a null reference. Obviously, unbox has also been changed to allow nulls to be unboxed back into Nullable<T> structures. This had a rippling effect in the CLR codebase and also required changes to late-bound semantics to mimic the static case.

This is fantastic, and reveals just how strong Microsoft's commitment is to the development community.  I gave my feedback on this before.  I felt it was a problem that aware developers could understand and live with, but that novice developers would struggle with it, and ultimately it would make the feature and the platform less understandable and approachable.

As you can tell from my earlier post, I had pretty much decided that MS would be unable to fix it at this late stage in the game, and that would be a shame.  But, thanks to some good decision making, they've done it.  Also, the solution was quite similar to my suggestion, I'm happy to say.
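A quick C# illustration of the behavior described in the quote: boxing an empty int? produces a null reference, boxing one with a value produces an ordinary boxed int, and both unbox back into int?. The class and method names are made up for the example.

```csharp
using System;

static class NullableBoxing {
    // HasValue is false, so boxing produces a null reference rather than a boxed Nullable<int>.
    public static object BoxEmpty() {
        int? none = null;
        return none;
    }

    // HasValue is true, so the underlying int is boxed; GetType() reports System.Int32.
    public static object BoxValue() {
        int? some = 42;
        return some;
    }

    // Unboxing accepts either form: null becomes an empty int?, a boxed int becomes a filled one.
    public static int? Unbox(object boxed) {
        return (int?)boxed;
    }
}
```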

Be sure to follow through and read Somasegar's post on the subject.

posted on Friday, August 12, 2005 11:04:23 AM (Pacific Daylight Time, UTC-07:00)  #    Comments [0]
# Monday, August 01, 2005

As if I didn't have enough fun stuff to play with, the first Windows Vista beta was released last week.  I put it on my little laptop first, where it ran very well, but I was disappointed that the new visuals didn't run on the crappy video card it has.  So, last night, I put the 64-bit version on my PC, which has a recent ATI card on it.  Everything runs great on it and it's quite beautiful.  Both installs I've done have been utterly trouble free.  It brought back pleasant memories of installing the 95 beta and waiting with anticipation through the "Starting windows for the first time" screen, which always reminded me of that cereal commercial..."taste it again... for the first time".

This evening, I'll be putting my apps back on the PC, so we'll see if I run across any problems there.

posted on Monday, August 01, 2005 11:06:07 AM (Pacific Daylight Time, UTC-07:00)  #    Comments [0]
# Thursday, July 28, 2005

Sorry for the sparse updates.  I've been head-down on quite a few projects recently, so most of my ramblings have taken the form of internal blog entries and emails.  For the last several months, I've been pushing my co-workers to think about migrating to Whidbey (the next version of Microsoft's .Net Framework, the CLR, whatever you want to call it) sooner rather than later.  Most people know I'm an early adopter of new technology, and always take my suggestions as the ravings of someone infatuated with new things.  However, we're finally moving in that direction for one of our biggest tools as a result of some performance optimizations I was able to do with new features provided in 2.0 (namely generics, iterators, and some new asp.net goodness).  The performance opportunities alone are return enough for the investment.

Rico shares with us an overview of these opportunities, and it's a good enough list that even a manager can see the benefits, many of them handed to you without any additional work.

posted on Thursday, July 28, 2005 7:53:41 AM (Pacific Daylight Time, UTC-07:00)  #    Comments [1]
# Tuesday, June 28, 2005

A large part of my time at work recently has gone into addressing scalability issues in some applications.  Most of these stem from poor use of memory.  Most of these applications were proofs of concept that made their way into production code for whatever reason.  More and more, when we identify a problem and point it out to the developer, this is the typical exchange:

Me: You're taking too much memory here by using type Xxxxx when you could have gotten by with type Yyyyy

Dev: I have to use type Xxxxx, that's what the data is.

Sometimes, you just have to take a step back and look at your bytes.  Sometimes, you need a profiler, but sometimes you can just do some math.

For example, if I have an array of dates that correspond to events, a logical choice for a datatype is System.DateTime.  However, if you've got millions of events, System.DateTime is relatively expensive.  Especially if you consider that your event times only have resolution down to the nearest minute and only represent times within the last several weeks.  The range, or domain of DateTime (ticks since the epoch) is overkill for your circumstances. You may have been able to get away with UInt16, which would have been 2 bytes instead of 8, which is a huge reduction. (Provided that you've shown this array of DateTime to be a significant part of your memory consumption.)

I'm sure there is some official name for this, but I'll refer to it as "constrained domain", where you know something about your circumstances or usage that allows you to reduce the range or domain of a concept in order to store it as something smaller to improve memory consumption.  Remember, just because a big, rich datatype makes things easier, matches a db schema, or is named the same as the concept you need to store, doesn't mean it's a good match for your scenario.
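Here's a minimal sketch of the DateTime example, assuming one-minute resolution and a shared base time stored once for the whole array. CompactEventTime is a hypothetical type: a UInt16 holds up to 65,535 minutes, a little over 45 days (about six and a half weeks), so anything outside that window is rejected rather than silently wrapped.

```csharp
using System;

// Hypothetical compact encoding: whole minutes elapsed since a shared base time, stored in 2 bytes.
struct CompactEventTime {
    private readonly ushort minutesSinceBase;

    public CompactEventTime(DateTime time, DateTime baseTime) {
        // Truncate to whole minutes; the constrained domain only needs minute resolution.
        double minutes = Math.Floor((time - baseTime).TotalMinutes);
        if (minutes < 0 || minutes > ushort.MaxValue) {
            throw new ArgumentOutOfRangeException("time", "Outside the ~45-day window this encoding covers.");
        }
        minutesSinceBase = (ushort)minutes;
    }

    public DateTime ToDateTime(DateTime baseTime) {
        return baseTime.AddMinutes(minutesSinceBase);
    }
}
```

An array of these costs 2 bytes per element instead of the 8 bytes a DateTime takes; the trade-off is that every reader has to know the base time and the one-minute resolution.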

There are, of course, other considerations to make.  The Y2K problem was the direct result of taking this concept to the extreme (though resources were in much shorter supply then than they are today).

So, when you are looking for places to trim your memory usage, look for places where you can constrain the domain of a concept and store it in fewer bytes.  Also, if you're in CLR-land when you do this, look for boxed value types.  Remember, a byte stored in a System.Collections.ArrayList costs at least 5 bytes, not 1: a 4-byte reference plus the byte itself, and more once the boxed object's header is counted (at least 9 bytes in 64-bit land).  (Yay, generics!)
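A small illustration of the boxing point (BoxingDemo is a hypothetical name). Exact per-object overhead varies by runtime and bitness, so this shows the shape of the problem rather than measuring it:

```csharp
using System.Collections;
using System.Collections.Generic;

public static class BoxingDemo
{
    // ArrayList holds object references, so every byte added is boxed into its own heap object
    // (a reference in the list, plus an object header and payload on the heap).
    public static bool BytesAreSeparateHeapObjects()
    {
        ArrayList list = new ArrayList();
        list.Add((byte)7);
        list.Add((byte)7);
        return !object.ReferenceEquals(list[0], list[1]); // same value, two distinct boxes
    }

    // List<byte> keeps its elements in a byte[] internally: one byte per element, no boxing.
    public static List<byte> Unboxed(params byte[] values)
    {
        return new List<byte>(values);
    }
}
```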

posted on Tuesday, June 28, 2005 9:17:05 AM (Pacific Daylight Time, UTC-07:00)  #    Comments [1]
# Friday, June 10, 2005

Last year, I outlined a problem I had with calling Response.End from a thread other than the one that handled the request.  It stemmed from a progress mechanism we had implemented that could cancel long-running tasks when the user hit stop or closed the browser.  As it turns out, that wasn't the whole story.

We recently added a few reports where gathering the data usually takes more than a couple of minutes.  While doing memory and performance optimizations, we noticed our worker process was recycling with the following message in the System event log:

A process serving application pool 'XXXXXXXXXX' terminated unexpectedly. The process id was '####'. The process exit code was '0xff'.

Googling for this error doesn't do much good, since it can have a variety of causes.  Most of the advice I came across wasn't well thought out and made quite a few bad assumptions.  So we added some more debug logging and determined that this was happening when our canceling mechanism kicked in because a user decided they didn't really want to wait 5 minutes for the report.  I was greatly puzzled by this, since this feature had been tested thoroughly and had been running in production for some time.  Looking back through the server logs, it was evident that it had been happening all along, just not very often: we had just gotten the vast majority of pages to perform very well, so users rarely had a reason to cancel.  The problem only came to the surface when we added the reports that always take a while.

Here's the problem.  If the handling thread is aborted, it causes a condition that IIS considers to be bad, and that forces the worker process to recycle (presumably, some communication doesn't occur).  This doesn't happen when Response.End() is called, because it passes a special exception as the exception state to Thread.Abort.  The HttpApplication catches ThreadAbortException and checks the ExceptionState.  If it is an HttpApplication.CancelModuleException, it knows there was either a timeout or a call to Response.End(), and it cancels the abort by calling Thread.ResetAbort, which allows the thread to continue running at that point.  I thought that was pretty slick.

When I was aborting the handling thread, I was on a different thread, so I had to call Thread.Abort manually.  That meant CancelModuleException was not being used, so the thread was ending completely and causing the recycle.  Since HttpApplication.CancelModuleException is internal (and rightly so), I could not simply use that mechanism.

The good news is that the Unload event always happens (for all practical purposes), even when the thread is being aborted.  So I added my own PageIsCancelling property to our base Page class, check it along with the current ThreadState in Unload, and cancel any pending abort if the page is canceling.  The abort is contained within the call stack of the page, the thread stays alive, and all is well.  No more crazy recycling.

As an aside, it seems this is aggravated by multiple processors, which might explain how it passed testing on the developer's machine, although I don't have any proof of this.

posted on Friday, June 10, 2005 11:50:44 AM (Pacific Daylight Time, UTC-07:00)  #    Comments [0]
# Tuesday, May 17, 2005

The generic Nullable type in Whidbey and the C# language features it supports are great.  They've enabled me to eliminate gobs of code and simplify things incredibly.  But Joe Duffy's latest request for feedback on some of the "syntactic sugar" for this feature brings up some interesting points.  He shows us several different equality tests against a nullable value type and asks if the results are intuitive.  I suspect it's a loaded question, because they are not.  Someone with a good understanding of the framework can fairly easily guess the outcome, but deep down I think they would have to admit it's not ideal.

What it boils down to is the purpose of the generic Nullable type.  I originally viewed it as helping to code against scenarios where values can be null, like databases.  Without nullable value types, you frequently have to check whether you're dealing with a null value and carry around a lot of state information; Nullable simply does that for you.  I think some of the designers of this feature see it much more broadly, as a concept to unify the type system in a more performant way than simply being able to cast to object.  This requires some special casing, as is evident from the special case for type parameter constraints on generic types: specifying the struct constraint will not allow you to use a generic Nullable as the type argument, despite the fact that it is a value type.  The problem is that the special casing stops short of completely abstracting the concepts.

The problem can readily be seen in Joe's last two tests: boxing a valueless "int?", and using the equality operator "==" against a generic argument that is instantiated as "int?" and has no value.  Both do the opposite of what you might expect if you don't think it through first: neither compares equal to null.

I pondered this for quite a while before deciding how bad this was (and I'm still not exactly sure).  For me, this is no problem, since I understand the issues at work.  For the future of the language, the outcome is not so clear.  The real question is, "Is a value-less value type different from a null reference?"  I don't think it should be.  If you cast a null string reference to object, does it retain its identity as a string? No. Can you cast it back to a string reference? Yes. So, should there be a difference in behavior with nullables? I'll let you decide.  Then there is the issue of how (and if) to effectively fix it.  Adding a conversion from Nullable to object that makes it a null reference instead of a boxed value type might do it, at least for the boxing scenario.  That's probably a pretty naive solution; I haven't really thought it through.  For the generic situation, Nullable seems to be special cased to some degree already, so it doesn't seem like it would be too much of a stretch to generate appropriate code for Nullable.  The big problem here is that since this happens at JIT time, it's no longer just a C# feature.  Unfortunately, it's likely too late to address the issue in any real way, and that will likely set too much of a precedent to change it in the future, even if a solution could be reached.  Perhaps some FxCop rules could find these types of situations and at least flag them so they could be looked at more closely.
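For reference, here is roughly what the two tests in question look like in C#. NullableChecks and its methods are illustrative names, not Joe's actual code. Worth noting: the runtime was changed before .NET 2.0 shipped so that boxing a value-less nullable produces a null reference, so on released runtimes both checks return true.

```csharp
public static class NullableChecks
{
    // Test 1: box a value-less int? and compare the resulting object with null.
    public static bool BoxedIsNull()
    {
        int? noValue = null;
        object boxed = noValue; // boxing conversion
        return boxed == null;
    }

    // Test 2: compare an unconstrained generic argument with null.
    // A "where T : struct" constraint would reject int? as the type argument.
    public static bool GenericIsNull<T>(T value)
    {
        return value == null;
    }
}
```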

posted on Tuesday, May 17, 2005 1:43:10 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [0]
# Monday, May 02, 2005

We have a big configuration file that maps business logic concepts to their currently used implementations so we can dynamically switch these out.  Under ASP.NET 1.0, all those implementations were in assemblies with "well-known" names (meaning we know them deterministically before compile-time).  Some of the implementations were in nested classes of user controls (we'll save the debate over whether that is a good idea for another time).  This means that the ASP.NET compiler is now in charge of them, which makes their names more difficult to discover.

Luckily, there's someone looking out for us.  The System.Web.Compilation namespace has all kinds of goodies to help us out here, namely the BuildManager class.  It has a GetType overload that at first glance appears to do exactly what we want; unfortunately, it only seems to work if the assembly in question has already been loaded.  This is not usually the case when our configuration code runs the first time.  Instead, you can use a combination of GetCompiledAssembly and good ol' Assembly.GetType.  So now, instead of knowing the assembly name, we need to know the virtual path to the compiled control.  Here's a snippet that does generally what I want:

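// Requires: using System; using System.Web.Compilation;
// urlString (a virtual path) and typeString come from our configuration.
if (!String.IsNullOrEmpty(urlString)) {
    // The type lives in an assembly the ASP.NET compiler builds from a virtual path,
    // so get the compiled assembly for that path and look the type up in it.
    System.Reflection.Assembly assembly = BuildManager.GetCompiledAssembly(urlString);
    theTypeIWant = assembly.GetType(typeString, true);
}
else {
    // Ordinary type: typeString is an assembly-qualified name.
    theTypeIWant = Type.GetType(typeString, true);
}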

So, for those types that reside under ASP.NET's control, I add the virtual path to the configuration, leave out the assembly name, and use the path's presence to determine whether the BuildManager needs to get involved.

Also notice the wonderful String.IsNullOrEmpty method.  Now, if there were only some kind of operator I could use to be even lazier about that check.

posted on Monday, May 02, 2005 11:16:09 AM (Pacific Daylight Time, UTC-07:00)  #    Comments [0]
# Thursday, April 14, 2005

We're making the move to 64-bit at work, and one of the things annoying me the most is discovering all the 32-bit shell extensions I rely on without realizing it.  For example, we use Subversion for source control and use the excellent TortoiseSVN shell extension to work with it from Explorer.  Since it's a 32-bit extension, it doesn't show up in the 64-bit Explorer's context menu.

There are two workarounds I've found for using these 32-bit extensions.

  1. Use the 32-bit explorer.exe in the SysWow64 directory.  You have to have the "Launch folder windows in a separate process" option turned on, otherwise it will just see that explorer is already running and start a new 64-bit window instead of a new 32-bit process.
  2. Use 32-bit IE.  There's already a shortcut to it by default in the Start menu.  Just fire it up and navigate to the filesystem instead of a web page.  Voila! 32-bit extensions start showing up.  I like this method since I don't end up with a separate explorer process for each window.

There's probably a cleaner way of using the 32-bit explorer, but I haven't figured it out yet.

posted on Thursday, April 14, 2005 11:42:18 AM (Pacific Daylight Time, UTC-07:00)  #    Comments [0]
# Friday, April 08, 2005

Last night, I completed the STDF file parser that I started over the weekend.  I'm pretty happy with it.  It now fully supports all of the version 4 spec.  It's written with the Feb. 2005 CTP of Whidbey.  It was primarily a good example of a "real world" application, so I wanted to use it to try out some of the spiffy new features and see how they might fit into an application that was not built for the sole purpose of showing them off.

I'm not entirely sure what I should do with it now that I've built it.  As far as I'm aware, there's not much non-commercial demand for something like that.  I might be persuaded to license it for commercial use once there is a "go live" license for Whidbey.  I'm sure it will need some testing to work out issues.  I'll probably build a viewer using it and use it at work to get the kinks out of it.  Anyway, I thought it might be useful to capture how some of these features played out in a real scenario.

Lightweight Code Generation

I used attributes to annotate my record classes in order to define the "on disk" layout of the record.  I originally put the attributes on the properties of the record, but decided to move them to the class level so that I could define the fields that are only used for parsing and not useful after that.  At runtime, I register the record types with the parser, which uses lightweight code generation to generate converters from unknown records to the concrete records.  If you're not familiar with LCG, it is essentially Reflection.Emit without the overhead of a dynamic assembly, module, or type.  If you're unfamiliar with Reflection.Emit, it is essentially generating executable IL code on the fly, which has many benefits over generating C# or some other language and running it through the compiler with CodeDOM or something else.  (If you're not familiar with IL, then this entry is somewhat irrelevant.)  Having dealt with assembly on lots of different instruction sets, from 68000 to x86 to DSPs and microcontrollers, I must say that working in IL is wonderful.  I was a bit worried about the startup time for the parser, but it seems to happen very quickly.  I'll need to experiment more to quantify the overhead of the dynamic code.  Again, I was disappointed in the lack of debugging support in LCG.
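A minimal sketch of LCG with DynamicMethod, assuming a converter that just reads an Int32 out of a byte buffer. The real record converters described above would be generated from the layout attributes and are considerably more involved; LcgDemo is a hypothetical name:

```csharp
using System;
using System.Reflection.Emit;

public static class LcgDemo
{
    // Emits, at runtime, a method equivalent to (data, offset) => BitConverter.ToInt32(data, offset).
    public static Func<byte[], int, int> BuildInt32Reader()
    {
        DynamicMethod method = new DynamicMethod(
            "ReadInt32", typeof(int), new[] { typeof(byte[]), typeof(int) }, typeof(LcgDemo).Module);

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0); // first parameter (data); there is no implicit "this"
        il.Emit(OpCodes.Ldarg_1); // second parameter (offset)
        il.Emit(OpCodes.Call, typeof(BitConverter).GetMethod("ToInt32", new[] { typeof(byte[]), typeof(int) }));
        il.Emit(OpCodes.Ret);

        // The generated code is collectible once the delegate is no longer referenced.
        return (Func<byte[], int, int>)method.CreateDelegate(typeof(Func<byte[], int, int>));
    }
}
```

Because the DynamicMethod is static, ldarg.0 refers to the first parameter rather than to a "this" instance, which is the argument-alignment difference mentioned under "LCG debugging" in the April 4 post.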

Generics

Unfortunately, generics did not work into the equation nearly as much as I had hoped.  I did find them very useful in places where I would normally pass a Type.  Now I can have a much stronger contract on such methods using generic methods with constraints.  I must say that I love the generic collections.  Working with strongly-typed data without having to generate the classes is very nice.  I also like being able to do a custom sort with an inline anonymous delegate with closure-type semantics, rather than having the overhead of a separate class with the code in a different place.
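A short sketch of both points (GenericsDemo and its members are hypothetical): a constrained generic method standing in for a method that would otherwise take a Type, and a custom sort written as an inline anonymous delegate that captures a local variable:

```csharp
using System.Collections.Generic;

public static class GenericsDemo
{
    // Instead of CreateRecord(Type recordType) plus a cast, the constraints let the compiler
    // check that T is a reference type with a public parameterless constructor.
    public static T CreateRecord<T>() where T : class, new()
    {
        return new T();
    }

    // Sorts by string length; the anonymous delegate captures "direction" from the enclosing method.
    public static List<string> SortByLength(List<string> names, bool descending)
    {
        int direction = descending ? -1 : 1;
        names.Sort(delegate(string a, string b) { return direction * a.Length.CompareTo(b.Length); });
        return names;
    }
}
```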

Iterators

Iterators came into play in several places.  In the pull-based record parser, it was simple to implement IEnumerable.  I believe it only took 3 or 4 lines of code.
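A minimal sketch of an iterator-based, pull-style reader. It assumes STDF V4-style record headers (a 2-byte record length, then type and subtype bytes) and, for simplicity, little-endian data; UnknownRecord and RecordParser are hypothetical stand-ins, not the parser described here:

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for an "unknown record": header fields plus raw payload.
public sealed class UnknownRecord
{
    public byte Type, SubType;
    public byte[] Data;
}

public static class RecordParser
{
    // Records are produced lazily, one per iteration step, as the caller pulls them.
    public static IEnumerable<UnknownRecord> ReadRecords(Stream stream)
    {
        BinaryReader reader = new BinaryReader(stream);
        while (stream.Position < stream.Length)
        {
            ushort length = reader.ReadUInt16(); // payload length, excluding the 4-byte header
            UnknownRecord record = new UnknownRecord();
            record.Type = reader.ReadByte();
            record.SubType = reader.ReadByte();
            record.Data = reader.ReadBytes(length);
            yield return record;
        }
    }
}
```

BinaryReader always reads little-endian, which is the endian-ness gap described in the April 4 post; a big-endian file would need its bytes swapped by hand.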

Delegate type inference

I don't remember what the official name of this feature is, but instead of having to do something like SomeEvent += new EventHandler(SomeMethod), you can just do SomeEvent += SomeMethod.  This seems small, but I appreciated it quite a bit.
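In the C# 2.0 specification this is called a method group conversion. A tiny illustration (RecordSource, RecordCounter, and MethodGroupDemo are hypothetical types):

```csharp
using System;

public class RecordSource
{
    public event EventHandler RecordRead;

    public void ReadOne()
    {
        EventHandler handler = RecordRead;
        if (handler != null) handler(this, EventArgs.Empty);
    }
}

public class RecordCounter
{
    public int Count;
    public void OnRecordRead(object sender, EventArgs e) { Count++; }
}

public static class MethodGroupDemo
{
    public static int Run()
    {
        RecordSource source = new RecordSource();
        RecordCounter counter = new RecordCounter();
        source.RecordRead += new EventHandler(counter.OnRecordRead); // C# 1.x: explicit delegate creation
        source.RecordRead += counter.OnRecordRead;                   // C# 2.0: method group conversion
        source.ReadOne();
        return counter.Count; // both subscriptions fire, so this is 2
    }
}
```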

Visual Studio Enhancements

These were a very pleasant surprise.  I hadn't thought about the exercise as testing the VS enhancements.  The snippets were the most useful improvement in this project.  Implementing properties didn't make me want to rip my hair out. Just type 'prop' and hit tab and fill in the blanks.  The built-in snippets for things like exceptions, attributes, indexers, etc. all work really well.  The one for implementing an indexer/iterator as a nested class was very interesting as well. It was also easy to create my own snippets.  I also found the strongly-typed resources to be very handy.

The debugging stuff is awesome.  I love being able to mouse over variables and drill into them dynamically.  The little popup windows for exceptions are cool, but don't really give you any good information that you couldn't get before.

I found the refactoring support to be extremely useful, although it was a little hard to shift into the mode of making sure everything compiled all the time.  Before, I would jump around a lot in the code, so I would leave things in a state where they didn't compile.  It worked great once I got used to it.

posted on Friday, April 08, 2005 11:22:28 AM (Pacific Daylight Time, UTC-07:00)  #    Comments [3]
# Monday, April 04, 2005

I had two projects over the weekend.  One was to get my bathroom ready to install a new bathtub.  The other was an experimental coding project.

The bathroom preparation went fairly well.  I ripped out all the cabinets and countertops.  I'm glad my brother showed up unexpectedly, because that countertop was incredibly heavy.  We also got a lot of the carpet ripped up.  I'm going to be putting down new flooring as well.

On the coding front, as an experiment in adopting new features in Whidbey, I implemented a binary file parser for Standard Test Datalog Format (STDF) files.  These files make up 99% of the data we work with at work, and they fill our many-terabyte test result database.  We have a fairly complex parser and db loader framework, implemented in C# on 1.x.  It works very well, but it was written early on in our adoption of .NET, with little knowledge of what the CLR could do for us.  So, my experiment was basically to see how new features in Whidbey, along with my now deep experience in .NET, could make the parser better.

STDF is record-based.  The spec defines a lot of records, and leaves room for user-defined records.  The new parser reads chunks of the file based on the record headers and produces "unknown records".  I define the record layouts using attributes on record classes.  Then, the parser uses LCG (lightweight code generation using DynamicMethod) to generate converters to read the content of the unknown records into the concrete record classes (based on the attributes).  The benefit of using LCG is that record types could be registered or removed on the fly and the GC could collect the generated code.  I could have just as easily implemented it using on-the-fly interpretation of the attributes.  I'll measure and see how the performance works out.  The parser is pull-based, meaning that you ask it for records, or alternately just "foreach" through them using an iterator-based IEnumerable implementation, which is pretty sweet.  On top of the pull-based parser, I built an event-based "processor" where a consumer can register to receive certain record types.  This is the model used in our current parser, but after the XmlReader vs. SAX discussions, I thought exposing the pull-based approach was the right thing to do.
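A sketch of an event-based layer on top of a pull-based sequence, assuming handlers are keyed by record type and subtype. The types here are minimal hypothetical stand-ins, not the actual processor:

```csharp
using System;
using System.Collections.Generic;

// Minimal stand-in for an "unknown record" (hypothetical shape).
public sealed class UnknownRecord
{
    public byte Type, SubType;
    public UnknownRecord(byte type, byte subType) { Type = type; SubType = subType; }
}

// Consumers register for the (type, subtype) pairs they care about and are called back
// as records are pulled from the underlying sequence.
public sealed class RecordProcessor
{
    private readonly Dictionary<int, List<Action<UnknownRecord>>> handlers =
        new Dictionary<int, List<Action<UnknownRecord>>>();

    private static int Key(byte type, byte subType) { return (type << 8) | subType; }

    public void Register(byte type, byte subType, Action<UnknownRecord> handler)
    {
        List<Action<UnknownRecord>> list;
        if (!handlers.TryGetValue(Key(type, subType), out list))
        {
            list = new List<Action<UnknownRecord>>();
            handlers.Add(Key(type, subType), list);
        }
        list.Add(handler);
    }

    // Pulls every record and dispatches it; records nobody registered for are skipped.
    public void Process(IEnumerable<UnknownRecord> records)
    {
        foreach (UnknownRecord record in records)
        {
            List<Action<UnknownRecord>> list;
            if (handlers.TryGetValue(Key(record.Type, record.SubType), out list))
                foreach (Action<UnknownRecord> handler in list) handler(record);
        }
    }
}
```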

I had a few challenges, which I think represent work for the next version of the CLR:

  • Endian-ness - To my knowledge, the framework does not have any mechanism to work with binary data of non-native endian-ness.  STDF files are written in whatever endian-ness is native to the platform that produced them, so the parser must adapt.  This was a simple enough problem to solve, but now that most of the other gaps have been filled, endian-ness represents a hole in what the framework provides.
  • Generics' proliferation - Generics are great, and saved me tons of code, but they have not made their way into the rest of the platform where they could be leveraged.  For instance, if I create a RecordField<T>, there's not a simple way to do something like BinaryReader.Read<T>() to actually get one, so I was forced into tons of ifs and switches, and passing Types around to get the work done.  It just didn't feel right.
  • LCG debugging - From what I understand, this was cut from Whidbey.  The workaround for me was to have two generation paths.  One would do LCG, and the other would do traditional Reflection.Emit that could be debugged and PEVerified, etc.  The problem with this was that the arguments were not aligned between the two.  When doing the traditional Reflection.Emit, ldarg.0 would give you the "this" instance, which didn't exist in LCG.
  • Handler registration (Generics compatibility) - Ideally, record handlers should work with concrete record types, but the way generics work, a Converter<UnknownRecord, Mir> is not assignable to a Converter<UnknownRecord, StdfRecord> even though Mir : StdfRecord.  Of course, implementing that would complicate many things.  Interestingly, delegate(UnknownRecord unknownRecord) { return new Mir(); } will satisfy both delegate types.  So, it was just frustrating that generics didn't help me out in solving my record handler registration problems.  There may be a solution that I'm not seeing here because of my approach.  Any ideas?
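One possible workaround for the registration problem, sketched with minimal stand-in types (UnknownRecord, StdfRecord, and Mir here are empty placeholders, and ConverterRegistry is hypothetical): accept a converter for the concrete type through a constrained generic method, then adapt it to the base-type delegate with an anonymous delegate, relying on the same trick noted in the last item above.

```csharp
using System;
using System.Collections.Generic;

// Empty placeholders for the real record types.
public class UnknownRecord { }
public class StdfRecord { }
public class Mir : StdfRecord { }

public sealed class ConverterRegistry
{
    private readonly Dictionary<Type, Converter<UnknownRecord, StdfRecord>> converters =
        new Dictionary<Type, Converter<UnknownRecord, StdfRecord>>();

    // Stores a Converter<UnknownRecord, TRecord> as a Converter<UnknownRecord, StdfRecord>:
    // the anonymous delegate's return value converts implicitly because TRecord : StdfRecord.
    public void Register<TRecord>(Converter<UnknownRecord, TRecord> converter) where TRecord : StdfRecord
    {
        converters[typeof(TRecord)] = delegate(UnknownRecord unknown) { return converter(unknown); };
    }

    public StdfRecord Convert(Type recordType, UnknownRecord unknown)
    {
        return converters[recordType](unknown);
    }
}
```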

Oddly enough, I spent about equal time on both projects, but I seem to have a lot more to say about the latter.

[UPDATE] I realized that the entry box swallowed some of my generics syntax, so I fixed that, as well as fixing some minor spelling and grammatical errors.

posted on Monday, April 04, 2005 7:57:16 AM (Pacific Daylight Time, UTC-07:00)  #    Comments [7]
# Saturday, March 05, 2005

For the last 2 months, I've been having a blast playing Halo 2 with my buddies online. (except Jeff, who for some reason refuses to get on despite having all the necessary ingredients) It has worked reasonably well, with the exception of some weird incompatibilities with certain people.  If they were the party leaders, I'd get the famous "We are experiencing network issues." message.  Bungie says that this is almost always caused by NAT incompatibilities.  This didn't get in the way too much because we could usually juggle around the party leader until everyone could join.

At first, I attributed this to my out-of-the-norm network configuration.  I have VoIP, so I have several routers.  When I first started playing, I had the VoIP box as the outer-most router, followed by the venerable WRT54G.  So I was doing a double NAT.  The last several weeks, I've been trying to reduce my incompatibilities.  First, I managed to set up the VoIP router inside the firewall (I have AT&T CallVantage).  This involved....absolutely nothing.  It simply worked.  I did set up QoS on the Linksys to ensure the phone would always have enough bandwidth.  This fixed some of my incompatibilities, but other issues began to crop up, like being unable to hear everyone in the game lobby sometimes.

I had been running the wifibox firmware on the Linksys, and decided to upgrade to a more recent, official firmware, so I upgraded to the Live-certified firmware version.  This didn't help.  Then, after changing nothing, I began having problems joining people I had never had trouble with before.  Last night, after putting the XBox in the DMZ and still having problems, I got fed up and systematically hunted down the issue.  It was definitely something with the router, which didn't make sense, since it was XBox Live certified, and other people with the same router weren't having problems.  It just didn't add up.

So, I broke down and got a new router, and it fixed everything.  I played until 2:00 this morning without a single problem.  VoIP still works. Wireless works.  All the PCs work.  All without any additional port forwarding, DMZ settings, or other configuration besides the QoS.  It just works now.  And do you want to know the most bizarre part?  I got the exact same model of router.

My conclusion is that the wifi-box firmware screwed it up somehow.  It's the only variable left in the equation.  I'm curious if anyone else who is running 3rd party firmware has had problems like that. Perhaps I'll link to someone that I know does, and see if that generates any discussion from his previous post.

posted on Saturday, March 05, 2005 7:59:19 AM (Pacific Standard Time, UTC-08:00)  #    Comments [3]
# Monday, February 28, 2005

I've been toying with ECMAScript for the past week or so, mostly to update some dynamic user interface elements in a web application we have.  Every time I use it, I'm always impressed at the cool things I can do with it.  It also makes me wish that Microsoft had not messed with it as part of the CLR (JScript.NET).  Their newest version of JScript seems to merely make it into an alternate syntax, and not really take advantage of what it could do as a prototype-driven language with closure support.

I found this article that claims that it is the world's most misunderstood language.  I think I agree.  It's incredibly powerful, but very underutilized because of the confusion surrounding it.  Closures coupled with prototypes can yield some very slick code if you understand what you're doing (and I guess maybe that's the problem).  All you really need is a nice IDE to help you manage your objects and their prototype chain.

I'm also getting more into Python and Ruby, which are also very slick.  I'm just not sure how they should fit into a project.  In general, a project with fewer languages is more maintainable.  But imagine trying to build a skyscraper when your only tool is a hammer that works really well. (OK, maybe hammer is not the best analogy.  Maybe a blowtorch?  I don't know)

[UPDATE] I just had to include this link, where the author shows us how to implement a lot of different code reuse patterns, including multiple inheritance, in JavaScript.  My favorite quote:

This large set of code reuse patterns comes from a language which is considered smaller and simpler than Java

posted on Monday, February 28, 2005 10:56:23 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Tuesday, February 15, 2005
Turns out, my problem with metadata in my pictures was that I didn't have the latest RAW support installed.  This was evidently something that Adobe fixed.  Oh well, at least I learned a lot about XMP and got more of the Flickr API implemented.  I'll have to come up with a strategy for replacing 1GB of pictures.  I also want to add a few more steps to my processing.  A lot of the pictures turned out a little dull.  I think I won't be in such a hurry to get them uploaded in the future.
posted on Tuesday, February 15, 2005 8:15:06 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Saturday, February 12, 2005

Whew, what an ordeal.  I finally got my Adobe Photoshop CS in the other day, and last night, I created a "Droplet" for batch converting all the raw files from my Canon Digital Rebel.  Droplets are a really sweet feature that lets you create a little executable from recorded actions for batch operations.  So, I burned through about 2000 images, creating big JPEGs suitable for uploading to my newly upgraded Flickr account.  It was getting pretty late, so I fired up an uploader and set it to upload up to my 1GB limit (roughly 900 pictures).

This morning, I checked my account, only to find that none of the metadata had been uploaded.  All my images appeared to have been taken on Feb 11, 2005...uh oh.  So, I set out to find out what happened.  Turns out, Photoshop saves the metadata in its XMP format within the file.  XMP is simply an RDF encoding of the data in an XML payload within the file.  It's actually pretty cool, but Flickr doesn't read this data yet.  So I set out to "fix" my pictures, since I can't upload any more until next month and I have lots more to upload.

After looking at lots of libraries and Adobe's XMP SDK, I decided it would be easy enough to pull the data out myself.  So, I built a little app using my FlickrApi library I just created that would blast through my uploaded pictures, find the corresponding image on my local pc, pull the xmp data out of the file, and set the "date taken" on the Flickr site.  That way, I can at least organize them more easily.

It worked perfectly.  It blasted through about 900MB in less than a minute.  Look for the pictures as I tag them, annotate them, and change them from private to public.  I'll have to see if there's a way to have Photoshop preserve that data next time because I'd really like to have the rest of the metadata available.  I'll probably make my XMP parser available as well if anyone's interested.  As far as I know, there is not another managed implementation available.
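A rough sketch of the XMP extraction step, assuming the packet contains an rdf:RDF element using the conventional rdf prefix and that the date of interest is stored as exif:DateTimeOriginal (the post doesn't say which property was used; XmpSketch is a hypothetical name). Production code would locate the XMP segment properly instead of scanning text:

```csharp
using System;
using System.Text;
using System.Xml;

public static class XmpSketch
{
    const string RdfNs = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    const string ExifNs = "http://ns.adobe.com/exif/1.0/";

    // XMP is stored as UTF-8 text inside the file, so a plain text search finds it.
    public static string ExtractRdf(byte[] file)
    {
        string text = Encoding.UTF8.GetString(file);
        const string startTag = "<rdf:RDF", endTag = "</rdf:RDF>";
        int start = text.IndexOf(startTag, StringComparison.Ordinal);
        if (start < 0) return null;
        int end = text.IndexOf(endTag, start, StringComparison.Ordinal);
        return end < 0 ? null : text.Substring(start, end + endTag.Length - start);
    }

    // Returns exif:DateTimeOriginal whether it is serialized as an element or as an
    // attribute on rdf:Description; null if it is absent.
    public static string GetDateTaken(string rdf)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(rdf);
        XmlNodeList elements = doc.GetElementsByTagName("DateTimeOriginal", ExifNs);
        if (elements.Count > 0) return elements[0].InnerText;
        foreach (XmlElement description in doc.GetElementsByTagName("Description", RdfNs))
        {
            string value = description.GetAttribute("DateTimeOriginal", ExifNs);
            if (value.Length > 0) return value;
        }
        return null;
    }
}
```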

posted on Saturday, February 12, 2005 2:59:33 PM (Pacific Standard Time, UTC-08:00)  #    Comments [5]
# Friday, February 11, 2005

Last night, I made a lot of progress creating a .NET API for Flickr.  There doesn't seem to be a lot of activity on the Flickr.NET project, and I don't like some of the decisions they made in the design.  They seem to have taken a direct wrapping approach rather than designing an API that fits in the .NET world.  It's amazing how spoiled you get when you're used to using good APIs.  You start saying things like, "You mean I get an array back from this method?  Eww." or "That's a silly name for that member."  Of course, it's probably appropriate to note that Flickr.NET seems to have been created with an emphasis on uploading.  I'm more interested in sifting through the metadata.

Anyway, I've got most of the functionality that I'm interested in implemented, and I'll be doing the rest a little at a time.  I've used the design guidelines fairly strictly, and I think I'm coming up with something that's really approachable for .NET users, and is still correlated enough to the Flickr API documentation that the parallels are easily discoverable.  You may notice some bizarreness here as I play with the different kinds of integration I have planned for my blog.

posted on Friday, February 11, 2005 7:36:12 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Wednesday, February 09, 2005

This is one of the coolest things I've seen in a long time.  Try my Flickr name (marklio).  It lets you drill through my contacts and see their information.  Keep clicking and you'll see all kinds of cool things.  It really helped me see Flickr as a web service rather than just a photo sharing site.  It gave me a lot of ideas, like using tag matching to automatically create blog entry backgrounds that are relevant to the content.  I'm going to be playing with that.

Their API is really nice and well-documented.  Some people made a .NET wrapper for the API, but it's not up-to-date and I wasn't really impressed with the design.  I'll probably roll my own and just implement the things I need as I go.

 

posted on Wednesday, February 09, 2005 8:52:26 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Tuesday, February 08, 2005

In an earlier post, I mentioned that I wished HttpModules were more scriptable in ASP.NET.  You can easily create Controls, Pages, HttpHandlers, and web services all without having to compile anything, but as far as I know HttpModules have to be compiled and registered in the web.config in order to start working.

So, I created a ScriptableHttpModule.  You register it in the config the same as a regular module, but it allows you to create .asmodx files that are compiled and called just like regular modules and give you the same kind of dynamic compilation model as the other .asXx files.  I'm still tweaking it a bit, but it looks pretty promising.

posted on Tuesday, February 08, 2005 6:39:29 PM (Pacific Standard Time, UTC-08:00)  #    Comments [2]
# Monday, February 07, 2005

I just ran across BlogMap, a cool project that attempts to organize RSS information by location.  It lets you "geo-code" your blog so that it can be browsed and searched by location. It doesn't put the icon exactly where you live since it's by zip code rather than exact address, but it's pretty close.  My house is just under the little blob in the lower-right-hand corner (it's a park).

Anyway, once you've geo-coded your site, your feed is added and can be found through a somewhat clumsy (at this point) map, or by searching for a city.  The system needs some work to be really useful, but it's pretty dang cool!  Chandu has done some excellent work on it and I hope he continues to make this into an extremely useful tool.

It really adds a lot of scope to content to know where it's coming from.


posted on Monday, February 07, 2005 9:50:16 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Tuesday, February 01, 2005

I use a dynamic DNS service to host my DNS entries.  Back when I started hosting my blog, I was using marklio.dnsalias.com instead of www.marklio.com.  Since I still have that name pointed at this server, some search engines still have not let go of that old DNS name.  In a few months, I'm going to have to move this to a hosted environment (it's currently just running on a server at my house), at which point the two names will point to different places.  I've been strategizing the move for a while now, and I've just implemented step 1, which is to wean HTTP traffic from the old alias.

I've done that by implementing a permanent redirect for any request to the old name.  I created a little IHttpModule that hooks into the BeginRequest event and looks for the old DNS name.  If found, it returns status 301 (Moved Permanently) with a Location header that it builds dynamically, pointing to the same URL on www.marklio.com.  It seems to work well.  Hopefully the search engines will handle this appropriately.

For anyone interested, here's the relevant code:

using System;
using System.Web;

public class RedirectModule : IHttpModule
{
    #region IHttpModule Members
    public void Dispose()
    {
        // nothing to clean up
    }

    public void Init(HttpApplication context)
    {
        context.BeginRequest += new EventHandler(context_BeginRequest);
    }
    #endregion

    void context_BeginRequest(object sender, EventArgs e)
    {
        HttpApplication application = (HttpApplication)sender;
        HttpContext context = application.Context;
        Uri url = context.Request.Url;
        // Requests for the old alias get a 301 pointing at the same URL on the new host name.
        if (url.Host == "marklio.dnsalias.com")
        {
            context.Response.AddHeader("Location",
                url.ToString().Replace("marklio.dnsalias.com", "www.marklio.com"));
            context.Response.StatusCode = 301;
            context.Response.End();
        }
    }
}

The whole pluggable nature of ASP.NET is really sweet, although this really got me wishing there was a way to "script" this class rather than having to build it and add it to the web.config.

posted on Tuesday, February 01, 2005 6:29:04 PM (Pacific Standard Time, UTC-08:00)  #    Comments [2]
# Friday, January 28, 2005

I've had my performance hat on at work for the last couple of weeks, optimizing memory and CPU performance ahead of the deployment of a new system.  I've been pretty pleased with the improvements we've made.  We've got a lot of in-process caching that increases speed incredibly, but has a hefty footprint.  We were able to get that down quite a bit with some pretty clever ideas.

One of the major pushes was to reduce references, especially boxed references.  That has proven a very effective strategy to reduce memory.  I wish I could just throw generics at the boxing problem, but we'll have to wait a while longer for that.  Another strategy was to take a close look at the data structures that hold the cached data.  I rolled my own AVL tree implementation for a date-based index of the cached data, and I was able to improve both the CPU performance and its footprint substantially.

We also implemented what I've called a local string intern pool.  We've got a lot of redundant string data, and we've used String.Intern in the past fairly successfully, but some analysis of our data revealed some local redundancy that we could use to reduce a 4-byte reference to a single byte that acts as an index into the local string pool.  This will also help keep our memory from bloating on the move to 64-bit, when all the references in the system double in size (although at that point, we should have loads more memory to work with).  This has the added benefit of eliminating string comparisons when running searches against the cache.
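Here's a minimal sketch of the local string pool idea, assuming a pool that holds at most 256 distinct strings so that every index fits in one byte. `LocalStringPool` and its members are hypothetical names for illustration, not the production code described above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical local string pool: each distinct string is stored once,
// and callers keep a one-byte index instead of a 4-byte reference.
public sealed class LocalStringPool
{
    private readonly List<string> _strings = new List<string>();
    private readonly Dictionary<string, byte> _indexes =
        new Dictionary<string, byte>(StringComparer.Ordinal);

    public int Count { get { return _strings.Count; } }

    // Returns the existing index for value, or adds it if there is room.
    public byte GetOrAdd(string value)
    {
        if (value == null) throw new ArgumentNullException("value");
        byte index;
        if (_indexes.TryGetValue(value, out index))
            return index;
        if (_strings.Count > byte.MaxValue)
            throw new InvalidOperationException("Pool is full (256 entries).");
        index = (byte)_strings.Count;
        _strings.Add(value);
        _indexes.Add(value, index);
        return index;
    }

    // Resolves an index back to the pooled string.
    public string this[byte index]
    {
        get { return _strings[index]; }
    }
}
```

Because equal strings always map to the same index, a search over cached records can compare two bytes instead of two strings.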

All in all, we were able to reduce the footprint of the cache by about 40%.  I thought someone googling for ways to reduce memory footprint and increase performance might benefit some from this information.  Oh, and so Google puts this in the right context, this is related to: ASP.NET, CLR, C#, DotNet.

posted on Friday, January 28, 2005 8:57:45 AM (Pacific Standard Time, UTC-08:00)  #    Comments [2]
# Tuesday, January 18, 2005

Well, I just completed the upgrade to dasBlog 1.7.  (dasBlog is the software that runs my blog) I've been pretty excited about it because it will address several woes I've been having, not the least of those being referral spam.  Omar, Scott, and a host of others deserve a big pat on the back for their contributions.  Thanks, guys. Check out Scott's announcement for more links and details on new features if you're interested...you're probably not.

The upgrade was complex since it attempts to do a lot of fixing and cleaning of old data, but everything went fairly smoothly once I figured out I had accidentally wiped out my custom theme in the move.  One nice side-effect is that my really old entries from when I was running BlogX are editable again.  Some hadn't even been viewable.  I immediately took the opportunity to clean up my categories which had gotten way out of hand.  I still have too many, but it's actually manageable now.  I'm looking forward to playing with the new features and seeing the performance improvements.  I'll probably be doing a lot of upgrading of other blogs I manage tomorrow after I've had a chance to work any and all kinks out of the system.

posted on Tuesday, January 18, 2005 6:28:30 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Thursday, January 06, 2005

I just got finished debugging a crazy problem with some C# code.  This particular code is part of an application that deals with file spooling, and it deals with a lot of ambiguous file locking issues by always locking greedily.  Any spooling task that is unable to gain an exclusive lock on a file simply assumes another thread must be working on the file and it leaves it alone.

The developer used a pattern that handles System.IO.IOException and assumes that any IOException must be because of a concurrency issue.  The problem with this is that there are lots of calls that can cause file IO that you may be unaware of.  In this case, there was an assembly binding issue caused by some plugin-style dynamic loading that was throwing System.IO.FileLoadException (which actually seems out of place as a subclass of IOException, since it is specific to assembly loading and not IO in general).  The pattern in the code assumed that any IOException was not an exceptional event, but simply meant another thread held the file. So, the task never did any work and never reported the exceptional condition.

Eric Gunnerson wrote a nice overview piece on exceptions in C# on MSDN.  Some of his guidelines are:

Catch the most specific exception

If your code needs to recover from some exceptions, make sure to catch only those exceptions. If you catch more general ones, it's more likely you'll mistakenly swallow exceptions you don't want to swallow.

Only swallow if you're sure

This is really a corollary of the previous guideline. When you swallow an exception, you're saying that you understand all cases where this exception could arise, and that the recovery code you're writing handles all of those cases.

Use lock or using if applicable

If you can use the lock or using statements, use them. They make the code more readable and make it more likely you'll do the right thing.

Wrap exceptions if applicable

If you can add additional information to an exception, by all means do so. If I pass a parameter on to another function, it might be useful for me to add additional information about the parameter.

The first two here are obviously directly applicable to the scenario, and would have at least raised some flags if I had first checked all the IOExceptions to see what they encompass.  The solution for me was to take a look at the pattern and reduce the scope of the IOException catches to only those statements that I expected might throw the exception for locking.
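Here's a rough sketch of that narrowed pattern, assuming a spooler that claims a file by opening it with `FileShare.None`. The `SpoolLock` class and its methods are hypothetical. Only the `FileStream` constructor sits inside the `try`, and `FileNotFoundException` (another `IOException` subclass) is rethrown rather than treated as contention; treating every other `IOException` from the open as a sharing violation is itself an assumption.

```csharp
using System;
using System.IO;

public static class SpoolLock
{
    // Tries to take an exclusive lock on a spool file. Only the open call is
    // guarded, so an IOException raised by unrelated work (for example a
    // FileLoadException from assembly loading) can't pass for lock contention.
    public static FileStream TryLock(string path)
    {
        try
        {
            return new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None);
        }
        catch (FileNotFoundException)
        {
            // A missing file is a real error, not contention: let it propagate.
            throw;
        }
        catch (IOException)
        {
            // Assumed to be a sharing violation: another worker owns the file.
            return null;
        }
    }

    // Runs work only if the lock was acquired; exceptions thrown by work
    // (including FileLoadException) are not caught here.
    public static void ProcessFile(string path, Action<FileStream> work)
    {
        FileStream stream = TryLock(path);
        if (stream == null)
            return; // someone else is working on it
        using (stream)
        {
            work(stream);
        }
    }
}
```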

[UPDATE] I wanted to note that the exception handling pattern worked great until I extended it to a more complicated scenario involving dynamic assembly loading.

posted on Thursday, January 06, 2005 11:05:29 AM (Pacific Standard Time, UTC-08:00)  #    Comments [2]
# Tuesday, January 04, 2005

I've been a proponent of DLP over other display technologies like CRT, plasma, and LCD (direct view and projection) for a while.  I've gotten to the point where I can somewhat easily tell the technology from a glance at the screen alone. All the current technologies have distinguishing characteristics:

  • CRT
    • Direct view - The screen is obviously the front of a CRT
    • Rear Projection - The image is more fuzzy and floor models usually have horrible convergence issues
  • Plasma - Beautiful, bright image.  Super thin
  • DLP - Beautiful bright image, but rainbow effect if you shake your eyeballs really fast on high-contrast images (caused by the color wheel)
  • LCD
    • Direct View - Thin, but not as thin as plasma, and duller than plasma.  Backlit
    • Rear projection - No convergence issues, no rainbow effect, not as bright as DLP

I was browsing in Best Buy the other day and ran into a JVC set whose technology I could not place.  It was obviously rear-projection and it was as beautiful and bright as DLP and plasma, but had no rainbow effect.  I was stumped.  I felt better when I examined it further and found it was a technology I was not familiar with.  It was HD-ILA.  It's based on what they call a direct-drive image light amplifier chip.  It's a reflective technology like DLP, but they use a 3-color process rather than a color wheel, which explains the missing rainbow effect.  It also has a price that's comparable to the DLP sets. Obviously, you could achieve the same results with 3 DLP chips, but at a higher cost.  A further benefit is that they can pack the pixels' reflective surfaces closer together, so more of the lamp light is reflected, so the image can be brighter.

There are a few things I'm still trying to figure out.  They are only using one lamp for all three colors, which may be possible as a result of using the lamp's brightness more efficiently, but I'd still like to break one open and take a look.  JVC's presentations don't have a picture of the insides.  From the presentation, it sounds like they are not using hinges, but perhaps controlling the reflective properties of the pixels themselves to turn pixels on and off. I'm still learning about the technology, but it seems this new player may be the way to go on a new set, which may be closer than I would like, given the problems I've been having with my old Toshiba set lately.

[UPDATE 1:42pm] The technology behind this is LCOS, which is a variant of LCD that has evidently become more manufacturable recently. So, the RGB sub-pixels are on the surface itself and a single lamp simply reflects off of the surface and is focused onto the screen.  This means it avoids convergence issues entirely, which was a concern of mine.  Another perk is that, if the subpixels are aligned the same as LCD, ClearType should work with this technology.

[UPDATE 1:57pm] Here's a review of the set I saw.  Pretty good and fairly humorous.

[UPDATE 1/5/2005 12:19pm] After some more reviews and research, it appears that JVC's D-ILA implementation of LCOS uses three chips, and I would think that eliminates my excitement about ClearType and a convergence-problem-free set. I'd still like to see the physical configuration. Most reviews recommend it, but the numbers seem to indicate that the current line is targeted at the average consumer rather than at videophiles (super-bright, high black levels, etc).  Unfortunately, I always want videophile performance on an average consumer budget.

posted on Tuesday, January 04, 2005 10:10:39 AM (Pacific Standard Time, UTC-08:00)  #    Comments [1]
# Monday, December 06, 2004

I recently began putting just about everything, especially at work, into a revision control repository.  I find that I feel much more comfortable toying with changes in a presentation or document if I know I can easily get the previous version of it.  I noticed Martin Fowler seems to be headed in the same direction and has come to the same conclusions I have about revision control support in applications.  For instance, right now it's very difficult to tell the difference between two versions of a presentation.

I noticed the latest version of FxCop has an option to save the file in a format that is friendly for version control.  Perhaps the new XML-based office formats will give us similar capability?

[Update 12:35] I forgot to mention Subversion, without which the above would be very painful.  I'd also be interested in other people's revision control stories.  How much has it seeped into your computing life beyond coding?

posted on Monday, December 06, 2004 9:16:56 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Tuesday, November 30, 2004

Joe has a cool entry on using a “Stream” concept to approach algorithmic sequences.  I realized its particular beauty when I saw his Fibonacci example, which blows the doors off some code I barfed out recently for this type of problem.
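The following is not Joe's code, just a rough C# analog of the idea: a lazily evaluated Fibonacci sequence built with an iterator (`yield return`, C# 2.0 and later), where the caller pulls only as many terms as it needs. `Sequences`, `Fibonacci`, and `Take` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class Sequences
{
    // Infinite, lazily evaluated Fibonacci sequence: 0, 1, 1, 2, 3, 5, ...
    // Each term is computed only when the caller asks for it.
    // (long overflows after roughly the 92nd term.)
    public static IEnumerable<long> Fibonacci()
    {
        long current = 0, next = 1;
        while (true)
        {
            yield return current;
            long sum = current + next;
            current = next;
            next = sum;
        }
    }

    // Takes the first count items from any (possibly infinite) sequence.
    public static List<T> Take<T>(IEnumerable<T> source, int count)
    {
        List<T> result = new List<T>();
        if (count <= 0) return result;
        foreach (T item in source)
        {
            result.Add(item);
            if (result.Count == count) break;
        }
        return result;
    }
}
```

For example, `Sequences.Take(Sequences.Fibonacci(), 10)` yields 0, 1, 1, 2, 3, 5, 8, 13, 21, 34 without ever materializing the rest of the sequence.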

posted on Tuesday, November 30, 2004 2:11:33 PM (Pacific Standard Time, UTC-08:00)  #    Comments [2]
# Sunday, October 17, 2004

Recently, I started having problems browsing in 64-bit XP.  All command line utilities work fine, as well as serving things with IIS, but anything network-related in the UI seems to go stupid.  Accessing the network connections folder, or the connections tab of internet settings just hangs.  If I boot in safe-mode everything works fine, which leads me to believe that the firewall has something to do with it.  Of course, I can't seem to turn it off because it isn't enabled in safe mode, and won't respond in regular mode.

I thought I'd get this out here to see if anyone googling had a similar problem, and perhaps had a suggestion if they'd fixed it.

posted on Sunday, October 17, 2004 2:00:51 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [0]
# Sunday, September 19, 2004

During some of my free time, I've been experimenting with coding a hybrid weblog, wiki, and general content management system I call ChickenScratch.  This has been the project I've used to familiarize myself with the new features in Whidbey.  My goals thus far have been:

  • Maintain socialness of a weblog (link tracking like referrals, pingbacks, and trackbacks)
  • Maintain the ease of use, openness, and automatic cross referencing features of a Wiki (You don't have to worry with markup)
  • Maintain the organizational/security capabilities of a content management system (Role-based ACLs, user profiles, deeply hierarchical content structure)

This project was the result of some frustration with how hard it would be to add these features to FlexWiki.  I wanted a system that would be just as suitable for holding personal brainstorming ideas not suitable for public consumption as for holding blog entries, or collaborating with my friends on coding projects or film scripts.

I've succeeded on these goals to a great extent, but one of my biggest problems has been deciding on grammar for the content.  I think one of the most powerful features of a Wiki is when you mention a topic and that topic gets cross referenced “accidentally”.  At work, we use a Wiki to help us track features, bugs, and changes to our API (which we treat differently than features) and we have lots of these happy accidents, but only because we are disciplined on how we name topics.

Most Wikis create topics by concatenating words together (or removing the space between them) like MyVacation2004.  The downfall of this (I've found) is that non-coders find this unnatural and annoying.  Other wikis use some syntax to identify a topic such as square brackets like [My Vacation 2004].  The problem here is that this removes the ability to have those happy accidents that I like so much.

So, I've decided to not put the burden of topic identification on the user (the person entering content), but rather on the system itself.  I think it makes a much more interesting problem to solve.
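One way a system could take on that burden, sketched under the assumption that it already knows the titles of existing topics: scan plain text for any known title as a whole-word, case-insensitive match, trying longer titles first so "My Vacation 2004" wins over "Vacation". `TopicLinker` and `FindTopics` are hypothetical names, and this is not ChickenScratch's actual implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TopicLinker
{
    // Returns the known topic titles mentioned in text, in order of first
    // appearance. Assumes titles start and end with word characters so the
    // \b anchors behave.
    public static List<string> FindTopics(string text, IEnumerable<string> knownTitles)
    {
        List<string> titles = new List<string>(knownTitles);
        if (titles.Count == 0) return new List<string>();

        // Longest titles first, so the alternation prefers the most specific match.
        titles.Sort((a, b) => b.Length.CompareTo(a.Length));

        List<string> parts = new List<string>();
        foreach (string title in titles)
            parts.Add(Regex.Escape(title));

        // \b on both sides keeps "Cat" from matching inside "Category".
        Regex pattern = new Regex(@"\b(?:" + string.Join("|", parts.ToArray()) + @")\b",
                                  RegexOptions.IgnoreCase);

        List<string> found = new List<string>();
        foreach (Match match in pattern.Matches(text))
        {
            // Map the matched text back to the canonical title.
            string title = titles.Find(t =>
                string.Equals(t, match.Value, StringComparison.OrdinalIgnoreCase));
            if (title != null && !found.Contains(title))
                found.Add(title);
        }
        return found;
    }
}
```

The user writes plain prose, and a mention like "my vacation 2004" becomes a happy accident without any CamelCase or brackets.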

posted on Sunday, September 19, 2004 2:34:57 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [1]
# Wednesday, July 28, 2004

Today I needed an AVL tree implementation for some time-based indexing of cached data.  Unfortunately, I can't use the wonderful-looking OrderedDictionary in the PowerCollections project since .NET 2.0 is not due out for a while.  I searched around without any luck, so I rolled my own.  It's been a while since I messed with tree-based data structures.  I was pretty pleased with its performance characteristics, even with millions of records.

I'm thinking about releasing it as a short-term alternative for people having to wait on 2.0 and Peter Golde's PowerCollections project, so I thought I'd write a short blurb about it and link to some appropriate sites to generate some referrals and feedback.  Is anyone interested in using it?

For those unfamiliar with AVL trees, it's a self-balancing binary search tree. Its characteristics that are of interest to me are:

  • self-ordering - values are stored and can be retrieved in order simply by traversing the tree, something a hashtable cannot give you.
  • self-balancing - This ensures that search times are O(log(n))
  • fairly straightforward to implement, as opposed to Red-Black, or other self-balancing BSTs
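For reference, here's a compact sketch of AVL insertion with rotations, lookup, and in-order traversal. It uses .NET 2.0 generics for brevity, so it is not the 1.x-compatible implementation described above; `AvlTree` and its members are hypothetical names, and deletion is omitted.

```csharp
using System;
using System.Collections.Generic;

public sealed class AvlTree<TKey, TValue> where TKey : IComparable<TKey>
{
    private sealed class Node
    {
        public TKey Key;
        public TValue Value;
        public Node Left, Right;
        public int Height = 1;
        public Node(TKey key, TValue value) { Key = key; Value = value; }
    }

    private Node _root;
    public int Count { get; private set; }
    public int TreeHeight { get { return Height(_root); } }

    private static int Height(Node n) { return n == null ? 0 : n.Height; }
    private static void Update(Node n) { n.Height = 1 + Math.Max(Height(n.Left), Height(n.Right)); }
    private static int Balance(Node n) { return Height(n.Left) - Height(n.Right); }

    private static Node RotateRight(Node y)
    {
        Node x = y.Left;
        y.Left = x.Right;
        x.Right = y;
        Update(y); Update(x);
        return x;
    }

    private static Node RotateLeft(Node x)
    {
        Node y = x.Right;
        x.Right = y.Left;
        y.Left = x;
        Update(x); Update(y);
        return y;
    }

    // Inserts or replaces a value, rebalancing on the way back up.
    public void Insert(TKey key, TValue value) { _root = Insert(_root, key, value); }

    private Node Insert(Node node, TKey key, TValue value)
    {
        if (node == null) { Count++; return new Node(key, value); }
        int cmp = key.CompareTo(node.Key);
        if (cmp < 0) node.Left = Insert(node.Left, key, value);
        else if (cmp > 0) node.Right = Insert(node.Right, key, value);
        else { node.Value = value; return node; }

        Update(node);
        int balance = Balance(node);
        if (balance > 1)
        {
            if (Balance(node.Left) < 0) node.Left = RotateLeft(node.Left); // left-right case
            return RotateRight(node);                                      // left-left case
        }
        if (balance < -1)
        {
            if (Balance(node.Right) > 0) node.Right = RotateRight(node.Right); // right-left case
            return RotateLeft(node);                                           // right-right case
        }
        return node;
    }

    // O(log n) lookup, guaranteed by the balancing above.
    public bool TryGetValue(TKey key, out TValue value)
    {
        Node n = _root;
        while (n != null)
        {
            int cmp = key.CompareTo(n.Key);
            if (cmp == 0) { value = n.Value; return true; }
            n = cmp < 0 ? n.Left : n.Right;
        }
        value = default(TValue);
        return false;
    }

    // Keys in sorted order, which a hashtable cannot provide.
    public IEnumerable<TKey> InOrderKeys()
    {
        Stack<Node> stack = new Stack<Node>();
        Node n = _root;
        while (n != null || stack.Count > 0)
        {
            while (n != null) { stack.Push(n); n = n.Left; }
            n = stack.Pop();
            yield return n.Key;
            n = n.Right;
        }
    }
}
```

Inserting keys in strictly ascending or descending order, the worst case for an unbalanced BST, still produces a tree of logarithmic height.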

Again, if you're interested in using it, leave me some feedback and maybe I'll make it a sourceforge or GDN project.

posted on Wednesday, July 28, 2004 5:23:06 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [7]
# Wednesday, June 23, 2004

I had a request from Scott Hanselman (who has been vocal on the subject of ViewState on many occasions) to share my approach to file-based ViewState persistence that I mentioned in my previous post.  Feel free to comment on my approach.  I need to get code formatting set up.  Pasting from Visual Studio is a pain.

Here are my overrides of my base class page's LoadPageStateFromPersistenceMedium and SavePageStateToPersistenceMedium:


protected override object LoadPageStateFromPersistenceMedium() {
     return _viewStatePersister.LoadViewState();
}

protected override void SavePageStateToPersistenceMedium(object viewState) {
     _viewStatePersister.SaveViewState(viewState);
}


Doesn't tell you much, except I'm delegating persistence to a ViewStatePersister, which looks like (with some things renamed to protect the innocent):


public abstract class ViewStatePersister {
     public ViewStatePersister(BasePage page) {
          _page = page;
     }
     protected BasePage Page {
          get {return _page;}
     }
     BasePage _page;
     public abstract object LoadViewState();
     public abstract void SaveViewState(object viewState);
}


Delegating this responsibility to a separate class gives me finer control over how the persistence happens, as well as modularizing that functionality.  Naturally, I have a class that wraps the default ViewState persistence functionality (which is fairly uninteresting), as well as a FileBasedViewStatePersister.  It extends the DefaultViewStatePersister and harnesses that existing behavior to store, in the __VIEWSTATE field, a single Guid that uniquely identifies the request.  Its methods of interest look like this (you might look at SaveViewState first so Load makes more sense; I'm not about to screw with the formatting again to re-order them):


public override object LoadViewState() {
     object viewState = base.LoadViewState();
     if (viewState != null) {
          Guid guid = (Guid)viewState;
          LosFormatter formatter = new LosFormatter();
          using (FileStream fileStream = new FileStream(CreateOfflineViewStateFilePath(guid), FileMode.Open, FileAccess.Read, FileShare.None)) {
               viewState = formatter.Deserialize(fileStream);
          }
     }
     return viewState;
}
public override void SaveViewState(object viewState) {
     //create a guid for this viewstate
     Guid guid = Guid.NewGuid();
     LosFormatter formatter = new LosFormatter();
     using (FileStream fileStream = new FileStream(CreateOfflineViewStateFilePath(guid), FileMode.CreateNew, FileAccess.Write, FileShare.None)) {
          formatter.Serialize(fileStream, viewState);
     }
     //trick the regular system into thinking all it needs to save is the guid
     base.SaveViewState(guid);
}


The appropriate persister is created with a call to a virtual CreateViewStatePersister() method, which a page can use to create the persister of its choice.  As I said in my previous post, I was disappointed that I had to use the LosFormatter rather than the BinaryFormatter.  The downside of this approach is the possibility of creating LOTS of files, as opposed to creating a single file per session.  But I already have a file cleaning mechanism in place that cleans up files created by my graphing library, which creates even more files.
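A minimal sketch of the kind of cleanup pass that keeps those files in check, assuming the persisted ViewState files live in one directory and that anything older than a chosen age (for example, the session timeout) can be discarded. `ViewStateFileCleaner`, the `*.viewstate` pattern, and the age threshold are hypothetical.

```csharp
using System;
using System.IO;

public static class ViewStateFileCleaner
{
    // Deletes files matching searchPattern whose last write is older than
    // maxAge. Returns the number of files removed.
    public static int DeleteOlderThan(string directory, string searchPattern, TimeSpan maxAge)
    {
        DateTime cutoff = DateTime.UtcNow - maxAge;
        int deleted = 0;
        foreach (string path in Directory.GetFiles(directory, searchPattern))
        {
            if (File.GetLastWriteTimeUtc(path) >= cutoff)
                continue; // still young enough to be posted back
            try
            {
                File.Delete(path);
                deleted++;
            }
            catch (IOException)
            {
                // On Windows, a file still open by an in-flight request can't be
                // deleted; it will be picked up on the next pass.
            }
        }
        return deleted;
    }
}
```

Running this from a timer or at application start keeps the directory bounded without tracking files per session.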

posted on Wednesday, June 23, 2004 11:02:08 AM (Pacific Daylight Time, UTC-07:00)  #    Comments [0]

On a recent iteration of a project at work, we were analyzing our cache usage for several data analysis pages we have using ASP.net.  The usage of these pages evolved as we updated them with features, and in a pinch we were forced to turn off caching because of the hit to memory we were taking.  As we loaded more and more data into our database, and we had more and more users, more stuff was being cached, but we were not seeing benefits from the caching across users because each user was looking at different data.  Turning off the cache caused individual users to take a big performance hit when reloading the same report twice or making minor adjustments to the options, but we just couldn't justify the memory usage for that small percentage.

I decided the solution was a multi-level cache, where objects expired to lower levels, like from memory to high-speed disk, to slower disk, to db (really, I only want the memory and disk levels, but there's no need to limit it to that).  Unfortunately, there is no mechanism for injecting new behavior into the System.Web.Caching classes to accomplish that, which is a shame because they have already implemented the expiration and dependency code that would be the most complex part.
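A rough sketch of the multi-level idea, assuming a small in-memory LRU level that demotes evicted entries to a slower level instead of dropping them, and promotes them back on a hit. The slower level here is just a dictionary standing in for disk; `TwoLevelCache` and its members are hypothetical, and expiration and dependencies (the hard part that System.Web.Caching already solves) are left out.

```csharp
using System;
using System.Collections.Generic;

public sealed class TwoLevelCache<TKey, TValue>
{
    private readonly int _memoryCapacity;
    // Most recently used entries sit at the front of the list.
    private readonly LinkedList<KeyValuePair<TKey, TValue>> _lru =
        new LinkedList<KeyValuePair<TKey, TValue>>();
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> _memory =
        new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>();
    // Stand-in for the slower level (disk, database, ...).
    private readonly Dictionary<TKey, TValue> _backing = new Dictionary<TKey, TValue>();

    public TwoLevelCache(int memoryCapacity)
    {
        if (memoryCapacity < 1) throw new ArgumentOutOfRangeException("memoryCapacity");
        _memoryCapacity = memoryCapacity;
    }

    public int MemoryCount { get { return _memory.Count; } }
    public int BackingCount { get { return _backing.Count; } }

    public void Add(TKey key, TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> existing;
        if (_memory.TryGetValue(key, out existing))
        {
            _lru.Remove(existing);
            _memory.Remove(key);
        }
        _backing.Remove(key); // the fresh value supersedes any demoted copy
        _memory[key] = _lru.AddFirst(new KeyValuePair<TKey, TValue>(key, value));

        // Demote the least recently used entry instead of discarding it.
        if (_memory.Count > _memoryCapacity)
        {
            LinkedListNode<KeyValuePair<TKey, TValue>> last = _lru.Last;
            _lru.RemoveLast();
            _memory.Remove(last.Value.Key);
            _backing[last.Value.Key] = last.Value.Value;
        }
    }

    public bool TryGet(TKey key, out TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (_memory.TryGetValue(key, out node))
        {
            _lru.Remove(node); // refresh recency
            _lru.AddFirst(node);
            value = node.Value.Value;
            return true;
        }
        if (_backing.TryGetValue(key, out value))
        {
            Add(key, value); // promote back to memory (may demote another entry)
            return true;
        }
        return false;
    }
}
```

A real lower level would serialize to disk and add more levels behind it, but the demote-on-evict, promote-on-hit flow stays the same.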

So, I'm faced with the possibility of creating my own caching framework, or tricking the System.Web.Caching classes to do my bidding using the existing mechanisms like dependencies (which is a possibility).

On another note, I implemented file-based ViewState quite successfully.  It was much simpler and more straightforward to address issues like a user with multiple windows than any article led me to believe.  In doing so, I noticed some VERY annoying things about the ViewState persistence mechanism.  The normal behavior uses the LosFormatter to serialize the ViewState to base64 to be put inline with the html.  This is fine for storing ViewState inline, but if I'm serializing to files, I'd rather have the speed and efficiency of the BinaryFormatter.  The problem is that LosFormatter is special and doesn't play by the same rules as the other formatters in the framework (BinaryFormatter, etc).  Most of the built-in controls use Pair and Triplet to store their data in ViewState, but they aren't marked with SerializableAttribute, which means that BinaryFormatter can't serialize them!  So I had to continue to use the LosFormatter, which bloats data horrendously.  I hope some of this is cleaned up for ASP.NET 2.0.

posted on Wednesday, June 23, 2004 10:02:47 AM (Pacific Daylight Time, UTC-07:00)  #    Comments [2]
# Monday, June 07, 2004

I was adding some complex usage logging in a data abstraction layer today, and I wanted to minimize its impact on performance.  I was dreading having to manage worker threads, then I remembered the ThreadPool. It was so easy to shuffle off my logging to a managed worker thread.  That saved me a lot of trouble.  One of those “pit of success” moments for sure.
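A minimal sketch of that hand-off, assuming a hypothetical `UsageLogger` whose sink delegate does the slow work (a database insert, say). The caller queues the entry with `ThreadPool.QueueUserWorkItem` and returns immediately. Work items are not guaranteed to run in order, and anything still queued when the process exits is lost.

```csharp
using System;
using System.Threading;

public sealed class UsageLogger
{
    private readonly Action<string> _sink;

    // sink is the slow part (e.g. a database insert); hypothetical here.
    public UsageLogger(Action<string> sink) { _sink = sink; }

    // Queues the entry on a pool thread so the data access call isn't blocked.
    public void LogAsync(string entry)
    {
        ThreadPool.QueueUserWorkItem(delegate(object state)
        {
            try
            {
                _sink((string)state);
            }
            catch (Exception)
            {
                // An unhandled exception on a pool thread terminates the process
                // (CLR 2.0 and later), so a logging failure must not escape here.
            }
        }, entry);
    }
}
```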

(I would have rather linked directly to Rico's site, but Brad had a higher Google rank.)

posted on Monday, June 07, 2004 2:55:25 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [2]
# Tuesday, June 01, 2004

I'm experimenting with a project which will need some complicated file parsing abilities and I don't want to go the XML route for various reasons.  Therefore, I am looking for a parsing framework built to take advantage of the CLR.  Everything seems to be a port of some archaic C library, or some Java framework that's been modified to produce C# code.  I'm used to being able to do a simple Google search to find the kinds of libraries I need, but I'm getting very few good hits for this.  I've only built a few pieces of code that really qualify as parsers over the years, so maybe it's my inexperience that's my problem, but it just seems like it ought to be easier to find a good tool.

We've had enough time for the CLR tools to take on their own identity and take advantage of the CLR rather than remain lagging clones of their Java or C++ counterparts.  I see this in almost every space.  There just seems to be a huge hole for file parsing.  Maybe most CLR developers have embraced XML as the one and only file format.  But, when it comes to human-edited files or content, XML is pretty cumbersome and bloated.

For now, I've settled on Grammatica, which is the most straightforward (and working) parser generator thus far.  It's written in Java, but that's not a big deal.  The problem is that its output is Java-centric, and only modified slightly to be C# code.  It's got callbacks instead of events to handle tokens or products.  I found myself modifying the output (a no-no in code generation) to make it simpler before I realized I'd just be doing it again if I changed my grammar.  So, I'm really frustrated and just ranting, but I should be able to mold it to my purposes.

posted on Tuesday, June 01, 2004 5:50:40 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [0]
# Saturday, May 22, 2004

The other day, I griped about phone numbers and suggested a DNS-mapping for phone numbers. Looks like this is underway on several fronts.  Here's a slashdot story that points to several of these projects.

posted on Saturday, May 22, 2004 3:11:02 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [0]
# Thursday, March 25, 2004

I did a bunch of profiling at work today.  I use the Allocation Profiler for memory profiling.  It does a great job of letting you see your allocations visually in several very useful forms.  Today, I took a second look at nprof, a more CPU-oriented profiler that attempts to give you an idea of what calls are taking up the most time.  When I first looked at it almost a year ago, it was not usable at all.  Now, it seems to only have a few problems with GUI apps, and multi-threaded/multi-AppDomain projects.

The app I spend most of my time on at work is a data analysis application.  Data is loaded into the DB from a spool, so 99% of the app is “read-only” access to the DB.  This means I can optimize the heck out of it since I'm not concerned with transactions and such.  The downside is that I must optimize the heck out of it.  A common query can return millions of rows from the database, so taking even a few milliseconds out of the loop can save a lot over that many iterations.

My first insight is: Don't use NUnit to run performance tests.  At first, my unit tests seemed to be a very convenient location to put performance tests.  That is a bad idea.  The perf numbers scared me to death.  It appeared as though I had created a performance monster, and not the good kind of monster.  Turns out, NUnit goes to a lot of trouble to isolate the test runs in separate AppDomains so it can unload them easily and you don't get undesired interaction.  This seems to add a great deal of overhead, probably in marshaling across the boundaries.

My second insight is: Use the Allocation Profiler (or some other profiler that lets you look at memory usage)!  I don't want to go into a big discussion on how to use it.  It's pretty straightforward.  If you're interested in specifics, leave me a comment.  Remember, allocation in the CLR is cheap, but excessive garbage collection can be costly.

My last insight is: Use nprof (or another profiler that gives you CPU info).  nprof gives you a good idea of where your CPU bottlenecks are.  I was really able to get a good idea about what needed attention.

In a particular performance test of my data access layer, I was pulling over a million rows.  I was able to cut memory consumption in half, and increase my speed by a factor of 10 by using information gleaned from the profilers.  This, of course, means the code was pretty crappy.  He he, just kidding.  Like I said, it's a very tight loop, and a little goes a long way.

If you want a good reference for performance-related stuff, check out Rico Mariani's blog.  He appears to be THE Microsoft performance guy.

posted on Thursday, March 25, 2004 6:41:23 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Wednesday, March 24, 2004

With Mozilla (FireFox, Netscape, etc) becoming a more viable alternative to Internet Explorer, we've had a push for Gecko (the Mozilla rendering engine) compatibility at work, so we've been going back and redoing some things to lay out correctly.

We noticed a few things that just weren't working.  When we looked at the HTML source using FireFox, I was intrigued by the fact that all my div tags had become tables.  Turns out, the ASP.NET browser capability detection identifies Gecko as a down-level browser, and gives you an Html32TextWriter rather than a regular HtmlTextWriter.  And, sure enough, digging around using the Reflector confirmed that div tags are replaced by tables when using Html32TextWriter.

This guy has a solution.  Just thought some of you would be interested.

The browser capabilities section is extremely powerful and insulates you from a lot of headaches, but the default configuration is very annoying.  Take a look at your machine.config and you'll see what I mean.

posted on Wednesday, March 24, 2004 3:14:30 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Saturday, March 13, 2004

At work, we create weekly reports to let our boss know what we're doing.  The important stuff then gets pushed up the chain of command so that upper management knows what we're doing.  When I first started at Motorola, I automated the system our group used so that it was a web-based app that emailed the report at the end of the week.  It went through several iterations and feature additions over the years.

Recently, after I began blogging both at home and at work, I noticed a striking similarity between blogging and entering our weekly reports.  We scrapped the old system and built one that uses the RSS syndication of our blogs to publish the weekly report.  We've used it the last several weeks and it's a fantastic hierarchical model.

This week, we began to see the real power of this approach.  We recently wrote some software to aid in embedded memory bitmapping on devices we test. I posted an image of a memory defect the software found to my blog, and it was automatically aggregated, and emailed up the chain as part of the report.

I'm trying to get management to let us clean up the report software and sell or license it because it's pretty sweet.

posted on Saturday, March 13, 2004 8:15:29 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Thursday, March 11, 2004

I finally had some “spare” time at work today, so I re-worked the layout of our huge data analysis web site using CSS styling and layout rather than the original old skool methods we used when we first built it many years ago.  The original design was done back in about 2000, and I didn't have a lot of time to do it, so I just threw it together.

I've had a fair amount of experience in the last couple of years with all-CSS layouts.  My favorite place to learn is CSSZenGarden.  Anyway, the benefits of separating content from presentation are well known, and CSS is not the only method of decoupling, so I won't really go into why I was doing it.  Although I did come to the conclusion that CSS is lacking some things that would make what I was doing much better.  Anyway, I did do some searching for some examples when I couldn't eliminate a stray pixel spacing here and there.

What I found was a lot of people raving about having no tables on their site.  Personally, I think that's a little silly.  Not to say I'm not impressed by what people can do without tables, but that would be like me saying, “I built a house, but I didn't use any bricks” and expect people to be impressed.  Certainly, there are many houses built without bricks.  Very functional, beautiful, well-built houses.  But you can't build a brick house without bricks.  Most people who brag about not having any tables don't realize why they would want to banish tables other than they read someone else bragging about being table-free.

The point is not to rid the earth of the table because it's evil. Tables are perfectly fine for identifying content as being part of a table, after all, you don't replace all the images on your site with thousands of tiny DIV tags perfectly sized, colored, and positioned to replace every pixel.  That sounds silly, but I've seen people do the equivalent of that in trying to replace a table in the quest of table-less HTML.  The point is to organize the structure of the content so that it can be interpreted as simple data, and can then be “styled” for presentation.

Well, looking back over this entry, I find it to be one of the lamest entries ever.  Oh well, I feel better having griped about it.

posted on Thursday, March 11, 2004 9:04:16 PM (Pacific Standard Time, UTC-08:00)
# Thursday, February 26, 2004

I know what you're thinking...“Too many posts!  He's mad!  He's beating the pants off of Jenkies!”  Well, this one's technical.

If you're like me, you've always been annoyed at the inherent coupling between pages that pass data to each other through a Server.Transfer() call.  I think it leads to poorly designed, tightly coupled workflows, and tends to lead people to take shortcuts or break the model to make their stuff work.

Until now, I've tried to minimize this issue by giving my base page class a TransferData property, typed as object, that every page can use to pass data.  This has its own problems.  For instance, if you call Server.Transfer twice in the same request and use Context.Handler to retrieve the transferring page (which seems like a hack to me), you get the first page, and nothing in the call chain refers to the second page.

I've now discovered a nifty little storage location for things like this: HttpContext.Items.  It's just an IDictionary that stores stuff in the context of the current request.  Since you can always get the current HttpContext through HttpContext.Current (an implementation worth looking at with Reflector), you can get to it from anywhere, whether or not you have a reference to ANY page.  It works even if you're passing control to or from a handler that isn't a page.

Think of it as the analog to Application state, or Session state, but for the current Request only.  It automatically decouples your pages because they only have to agree on a common key.

UPDATE: I should note that it was my co-worker Casey Marshall who initially brought the property to my attention.  Thanks, Casey.

posted on Thursday, February 26, 2004 10:52:25 AM (Pacific Standard Time, UTC-08:00)
# Thursday, January 22, 2004

In the past year, I've learned an incredible amount about software design.  It seems that the amount I take in increases exponentially.  I suppose this is standard in most fields of study, but it's still surprising to me.

My recent learning has involved design patterns.  I've spoken about them before, so I won't go into much detail.  I've recently come across a general pattern known as Inversion of Control (IoC for short).  It is a pretty broad concept and is frequently used by developers, most of whom would tell you it's common sense once you explained it.  The real benefit is in formalizing and categorizing it, which lets you talk about approaches to solving problems in a common vocabulary.  By comparison, imagine getting some musicians together to write a song without allowing them to use any musical terms.

The particular pattern I've just become acquainted with is Dependency Injection, in which each software module is a component that declares its dependencies on other components as part of its definition, rather than tying itself to some specific implementation or third-party factory.  I've been playing with PicoContainer, a project that provides a framework for Dependency Injection.  I'm still trying to get my hands around it, and trying to see how this pattern fits into some of the projects I'm working on.
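Here is a rough sketch of constructor-based Dependency Injection, the style PicoContainer is built around. PicoContainer is a Java library and this is not its API; it's a toy Python container under the assumption that a component's constructor type annotations are its declared dependencies. `Container`, `MessageStore`, `InMemoryStore`, and `Greeter` are hypothetical names.

```python
import abc
import inspect


class Container:
    """Toy constructor-injection container (hypothetical, not PicoContainer's API)."""

    def __init__(self):
        self._registrations = {}  # requested type -> concrete class
        self._instances = {}      # requested type -> cached instance (one per container)

    def register(self, key, implementation=None):
        self._registrations[key] = implementation or key

    def resolve(self, key):
        if key in self._instances:
            return self._instances[key]
        implementation = self._registrations[key]
        args = []
        # The constructor's annotated parameters ARE the component's dependencies.
        for param in inspect.signature(implementation).parameters.values():
            if param.annotation is inspect.Parameter.empty:
                raise TypeError(f"{implementation.__name__}.{param.name} has no type annotation")
            args.append(self.resolve(param.annotation))
        instance = implementation(*args)
        self._instances[key] = instance
        return instance


class MessageStore(abc.ABC):
    @abc.abstractmethod
    def save(self, text): ...


class InMemoryStore(MessageStore):
    def __init__(self):
        self.saved = []

    def save(self, text):
        self.saved.append(text)


class Greeter:
    # Declares a dependency on the abstract MessageStore; never names a concrete class.
    def __init__(self, store: MessageStore):
        self.store = store

    def greet(self, name):
        message = f"Hello, {name}"
        self.store.save(message)
        return message


container = Container()
container.register(MessageStore, InMemoryStore)  # the wiring lives here, not in Greeter
container.register(Greeter)
greeter = container.resolve(Greeter)
print(greeter.greet("Pico"))                  # Hello, Pico
print(container.resolve(MessageStore).saved)  # ['Hello, Pico']
```

The point of the design is that swapping `InMemoryStore` for some other store means changing one `register` call; `Greeter` never changes. A real container would also handle cycles, lifetimes, and multiple implementations, which this sketch ignores.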

posted on Thursday, January 22, 2004 5:44:04 PM (Pacific Standard Time, UTC-08:00)
# Friday, December 12, 2003

I just read a fantastic article on designing for performance by Rico Mariani, the .NET runtime performance guy at Microsoft.

posted on Friday, December 12, 2003 6:49:43 PM (Pacific Standard Time, UTC-08:00)
# Monday, September 22, 2003

[Technical]

After answering Peter's comment on my last entry, about the threshold of complexity that justifies moving to an HttpHandler-based web application, I've done some more thinking.

I originally said that the threshold is reached when you start thinking about the application in terms of a flow or state machine, but by that test every web application except the most basic "list-of-standard-reports" kind would merit this approach.  While it is true that all of them would benefit from it, I think it's important to consider Microsoft's vision for ASP.NET, which was to speed development and provide a more powerful YET FAMILIAR framework for building web applications.

It's the familiar part that's most important.  People have been creating handler-based solutions all along using ISAPI filters, and .NET reduces the complexity of this approach IMMENSELY with IHttpHandler, even though the handler approach is still significantly more complex than the classic ASP page model.  .NET reduced the effort of the page model as well.

I think .NET brings simplification to both approaches.  Most people building handler-based solutions already have a significant amount of conceptual design invested in their current systems, and probably wouldn't use any canned solution for such an approach.  Classic ASP developers, however, were overdue for an improvement.

I believe most applications would benefit from a handler-based solution, but without a standard coding model for it, most developers will feel more comfortable with the page-based model.  What's worse, Microsoft's suggested implementation still relies on the Page class, without fully explaining how to handle complex user interaction in this hybrid environment.
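For readers who haven't seen the "application as a flow or state machine" idea in practice, here is a minimal sketch of what a handler-based approach might look like. It's a Python analog, not ASP.NET code: `process_request` and `is_reusable` loosely mirror IHttpHandler's ProcessRequest and IsReusable, a plain dict stands in for Session state, and the report-wizard steps and `ReportWizardHandler` are hypothetical.

```python
# Hypothetical transition table: (current step, user action) -> next step.
# The whole flow is visible in one place instead of being spread across pages.
TRANSITIONS = {
    ("choose_report", "next"): "pick_dates",
    ("pick_dates", "back"): "choose_report",
    ("pick_dates", "next"): "show_report",
    ("show_report", "restart"): "choose_report",
}


class ReportWizardHandler:
    """Hypothetical analog of an IHttpHandler that owns an entire workflow."""

    # Mirrors IHttpHandler.IsReusable: the handler keeps no per-request state.
    is_reusable = True

    def process_request(self, session, action):
        step = session.get("step", "choose_report")
        # An action that isn't valid for the current step leaves the user where they are.
        next_step = TRANSITIONS.get((step, action), step)
        session["step"] = next_step
        return self.render(next_step)

    def render(self, step):
        # Placeholder rendering; a real handler would write a full response.
        return f"<h1>{step}</h1>"


handler = ReportWizardHandler()
session = {}
print(handler.process_request(session, None))       # <h1>choose_report</h1>
print(handler.process_request(session, "next"))     # <h1>pick_dates</h1>
print(handler.process_request(session, "restart"))  # <h1>pick_dates</h1>
print(handler.process_request(session, "next"))     # <h1>show_report</h1>
```

The trade-off is the one described above: the flow is explicit and enforced in one place, but nothing about it is as familiar as dropping a form onto a Page.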

I'm afraid I haven't answered many questions.

[UPDATED 9:15 same day after reviewing dasBlog source code]

When should you stray?  As soon as you're ready.

posted on Monday, September 22, 2003 6:05:45 PM (Pacific Daylight Time, UTC-07:00)
# Sunday, September 21, 2003
Warning, it's technical.
posted on Sunday, September 21, 2003 3:23:43 PM (Pacific Daylight Time, UTC-07:00)