Monday, April 04, 2005

I had two projects over the weekend.  One was to get my bathroom ready to install a new bathtub.  The other was an experimental coding project.

The bathroom preparation went fairly well.  I ripped out all the cabinets and countertops.  I'm glad my brother showed up unexpectedly because that countertop was incredibly heavy.  We also got alot of the carpet ripped up.  I'm going to be putting down new flooring as well.

On the coding front, as an experiment in adopting new features in Whidbey, I implemented a binary file parser for Standard Test Datalog Format (STDF) files.  These files make up 99% of the data we work with at work and that fill our many-terabyte test result database.  We have a fairly complex parser and db loader framework, implemented in C# on 1.x.  It works very well, but it was written early on in our adoption of .net with little knowledge of what the CLR could do for us. So, my experiment was basically to see how new features in Whidbey, along with my now deep experience in .net, could make the parser better.

STDF is record-based.  The spec defines alot of records, and leaves room for user-defined records.  The new parser reads chunks of the file based on the record headers and produces "unknown records".  I define the record layouts using attributes on record classes.  Then, the parser uses LCG (lightweight code generation using DynamicMethod) to generate converters to read the content of the unknown records into the concrete record classes (based on the attributes).  The benefit of using LCG is that record types could be registered or removed on the fly and the GC could collect the generated code.  I could have just as easily implemented it using on-the-fly interpretation of the attributes.  I'll measure and see how the performance works out.  The parser is pull-based, meaning that you ask it for records, or alternately just "foreach" through them using an iterator-based IEnumerable implementation, which is pretty sweet.  On top of the pull-based parser, I built an event-based "processor" where a consumer can register to receive certain record types.  This is the model used in our current parser, but after the XmlReader vs. SAX discussions, I thought exposing the pull-based approach was the right thing to do.

I had a few challenges, which I think represent work for the next version of the CLR:

  • Endian-ness - To my knowledge the framework does not have any mechanism to work with binary data with non-native endian-ness.  The STDF is written in whatever endian-ness is native to the platform, so the parser must adapt.  This was a simple enough problem to solve, but now that most of the other gaps have been filled, endian-ness represents a hole in what the framework provides.
  • Generics' proliferation - Generics are great, and saved me tons of code, but they have not made their way into the rest of the platform where they could be leveraged.  For instance, if I create a RecordField, there's not a simple way to do something like BinaryReader.Read() to actually get one, so I was forced into tons of ifs and switches, and passing Types around to get the work done.  It just didn't feel right.
  • LCG debugging - From what I understand, this was cut from Whidbey.  The workaround for me was to have two generation paths.  One would do LCG, and the other would do traditional Reflection.Emit that could be debugged and PEVerified, etc.  The problem with this was the argument were not aligned between the two.  When doing the traditional Reflection.Emit, ldarg.0 would give you the "this" instance, which didn't exist in LCG.
  • Handler registration (Generics compatibility) - Ideally, record handlers should work with concrete record types, but the way generics work a Converter is not assignable to a Converter even though Mir : StdfRecord.  Of course implementing that would complicate many things.  Interestingly, delegate(UnknownRecord unknownRecord) { return new Mir(); } will satisfy both delegate types.  So, this was just frustrating that Generics didn't help me out in solving my record handler registration problems.  There may be a solution that I'm not seeing here because of my approach.  Any ideas?

Oddly enough, I spent about equal time on both projects, but I seem to have alot more to say about the later.

[UPDATE] I realized that the entry box swallowed some of my generics syntax, so I fixed that, as well as fixing some minor spelling and grammatical errors.