Monday, April 04, 2005

I had two projects over the weekend.  One was to get my bathroom ready to install a new bathtub.  The other was an experimental coding project.

The bathroom preparation went fairly well.  I ripped out all the cabinets and countertops.  I'm glad my brother showed up unexpectedly because that countertop was incredibly heavy.  We also got a lot of the carpet ripped up.  I'm going to be putting down new flooring as well.

On the coding front, as an experiment in adopting new features in Whidbey, I implemented a binary file parser for Standard Test Datalog Format (STDF) files.  These files make up 99% of the data we deal with at work, and they fill our many-terabyte test result database.  We have a fairly complex parser and database loader framework, implemented in C# on .NET 1.x.  It works very well, but it was written early in our adoption of .NET, with little knowledge of what the CLR could do for us.  So my experiment was basically to see how the new features in Whidbey, along with my now deeper experience in .NET, could make the parser better.

STDF is record-based.  The spec defines a lot of records, and leaves room for user-defined records.  The new parser reads chunks of the file based on the record headers and produces "unknown records".  I define the record layouts using attributes on record classes.  Then, the parser uses LCG (lightweight code generation using DynamicMethod) to generate converters that read the content of the unknown records into the concrete record classes (based on the attributes).  The benefit of using LCG is that record types can be registered or removed on the fly and the GC can collect the generated code.  I could have just as easily implemented it using on-the-fly interpretation of the attributes.  I'll measure and see how the performance works out.  The parser is pull-based, meaning that you ask it for records, or alternatively just "foreach" through them using an iterator-based IEnumerable implementation, which is pretty sweet.  On top of the pull-based parser, I built an event-based "processor" where a consumer can register to receive certain record types.  This is the model used in our current parser, but after the XmlReader vs. SAX discussions, I thought exposing the pull-based approach was the right thing to do.
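To make the pull/event layering concrete, here is a minimal sketch of the idea rather than the actual parser.  It assumes the standard STDF record header (a 2-byte length, then a 1-byte record type and a 1-byte sub-type) and, to keep things short, assumes little-endian data.  UnknownRecord is the term used above; StdfPullReader, RecordProcessor and their members are hypothetical names made up for the illustration, and the attribute-driven conversion to concrete record classes is left out entirely.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// A raw record: header fields plus the undecoded content bytes.
public sealed class UnknownRecord
{
    public readonly byte RecordType;
    public readonly byte RecordSubType;
    public readonly byte[] Content;

    public UnknownRecord(byte recordType, byte recordSubType, byte[] content)
    {
        RecordType = recordType;
        RecordSubType = recordSubType;
        Content = content;
    }
}

// Pull model (hypothetical name): records are read lazily as the caller iterates.
public static class StdfPullReader
{
    public static IEnumerable<UnknownRecord> ReadRecords(Stream stream)
    {
        BinaryReader reader = new BinaryReader(stream);
        while (true)
        {
            // Header: length (2 bytes), type (1 byte), sub-type (1 byte).
            byte[] header = reader.ReadBytes(4);
            if (header.Length < 4) yield break;            // end of stream

            // Assumption for this sketch: the length is little-endian.
            int length = header[0] | (header[1] << 8);
            byte[] content = reader.ReadBytes(length);
            if (content.Length < length) yield break;      // truncated record

            yield return new UnknownRecord(header[2], header[3], content);
        }
    }
}

// Event model (hypothetical name) layered on top of the pull model.
public sealed class RecordProcessor
{
    private readonly Dictionary<int, List<Action<UnknownRecord>>> handlers =
        new Dictionary<int, List<Action<UnknownRecord>>>();

    private static int Key(byte type, byte subType)
    {
        return (type << 8) | subType;
    }

    // Register a callback for one record type / sub-type pair.
    public void Register(byte type, byte subType, Action<UnknownRecord> handler)
    {
        int key = Key(type, subType);
        List<Action<UnknownRecord>> list;
        if (!handlers.TryGetValue(key, out list))
        {
            list = new List<Action<UnknownRecord>>();
            handlers.Add(key, list);
        }
        list.Add(handler);
    }

    // Drain the pull-based sequence and dispatch to any registered handlers.
    public void Process(IEnumerable<UnknownRecord> records)
    {
        foreach (UnknownRecord record in records)
        {
            List<Action<UnknownRecord>> list;
            if (handlers.TryGetValue(Key(record.RecordType, record.RecordSubType), out list))
            {
                foreach (Action<UnknownRecord> handler in list) handler(record);
            }
        }
    }
}
```

A consumer that wants the pull model just writes foreach (UnknownRecord r in StdfPullReader.ReadRecords(stream)); a consumer that prefers events registers handlers on a RecordProcessor and passes the same sequence to Process.  The point of the layering is that the event model costs nothing extra to offer once the pull model exists, while the reverse is not true.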

I had a few challenges, which I think represent work for the next version of the CLR:

  • Endian-ness - To my knowledge the framework does not have any mechanism for working with binary data in non-native endian-ness.  STDF files are written in whatever endian-ness is native to the platform that produced them, so the parser must adapt.  This was a simple enough problem to solve, but now that most of the other gaps have been filled, endian-ness represents a hole in what the framework provides.
  • Generics' proliferation - Generics are great, and saved me tons of code, but they have not made their way into the rest of the platform where they could be leveraged.  For instance, if I create a RecordField<T>, there's no simple way to do something like BinaryReader.Read<T>() to actually get one, so I was forced into tons of ifs and switches, and passing Types around to get the work done.  It just didn't feel right.
  • LCG debugging - From what I understand, this was cut from Whidbey.  The workaround for me was to have two generation paths.  One would do LCG, and the other would do traditional Reflection.Emit that could be debugged, PEVerified, etc.  The problem with this was that the arguments were not aligned between the two.  When doing the traditional Reflection.Emit, ldarg.0 would give you the "this" instance, which didn't exist in LCG.
  • Handler registration (Generics compatibility) - Ideally, record handlers should work with concrete record types, but the way generics work, a Converter<UnknownRecord, Mir> is not assignable to a Converter<UnknownRecord, StdfRecord> even though Mir : StdfRecord.  Of course, implementing that would complicate many things.  Interestingly, delegate(UnknownRecord unknownRecord) { return new Mir(); } will satisfy both delegate types.  So it was just frustrating that generics didn't help me solve my record handler registration problems.  There may be a solution that I'm not seeing here because of my approach.  Any ideas?
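For the endian-ness item, the kind of helper I mean looks roughly like the sketch below.  This is an illustration and not the code from the parser: EndianBytes and its method names are made up, and the caller is assumed to already know (for example from the file's first record) whether the data is little-endian.  Integers are assembled with shifts, so the result does not depend on the host; floats fall back to BitConverter, which reads in host order, so the bytes are reversed only when the file and the host disagree.

```csharp
using System;

// Hypothetical helper: reads values from a byte buffer in a caller-specified byte order.
public static class EndianBytes
{
    public static ushort ReadUInt16(byte[] buffer, int offset, bool littleEndian)
    {
        // Pick which byte is the high-order one based on the data's byte order.
        int low = littleEndian ? buffer[offset] : buffer[offset + 1];
        int high = littleEndian ? buffer[offset + 1] : buffer[offset];
        return (ushort)((high << 8) | low);
    }

    public static uint ReadUInt32(byte[] buffer, int offset, bool littleEndian)
    {
        uint b0 = buffer[offset];
        uint b1 = buffer[offset + 1];
        uint b2 = buffer[offset + 2];
        uint b3 = buffer[offset + 3];
        return littleEndian
            ? (b0 | (b1 << 8) | (b2 << 16) | (b3 << 24))
            : ((b0 << 24) | (b1 << 16) | (b2 << 8) | b3);
    }

    public static float ReadSingle(byte[] buffer, int offset, bool littleEndian)
    {
        // Copy so the caller's buffer is never modified by the reversal.
        byte[] raw = new byte[4];
        Buffer.BlockCopy(buffer, offset, raw, 0, 4);

        // BitConverter uses the host's byte order; swap only on a mismatch.
        if (littleEndian != BitConverter.IsLittleEndian) Array.Reverse(raw);
        return BitConverter.ToSingle(raw, 0);
    }
}
```

For example, EndianBytes.ReadUInt16(new byte[] { 0x12, 0x34 }, 0, false) treats the bytes as big-endian and yields 0x1234, while passing true yields 0x3412.  It is not much code, which is sort of the point: it is small enough that it feels like it belongs in the framework.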

Oddly enough, I spent about equal time on both projects, but I seem to have a lot more to say about the latter.

[UPDATE] I realized that the entry box swallowed some of my generics syntax, so I fixed that, as well as fixing some minor spelling and grammatical errors.

Tuesday, April 05, 2005 4:00:56 AM (Pacific Standard Time, UTC-08:00)
Hey Mark, if you decide to tile the floor in there, give Dave a call... he's helped with tiling projects lots over the past year and between him and my dad, they have lots of cool helpful tools.
Monday, June 20, 2005 4:55:53 PM (Pacific Standard Time, UTC-08:00)
Hi,

I am interested in the stdf parser you have done. Can I have it? It's not for commercial purpose.

Thanks,
kmhui
Thursday, November 24, 2005 10:44:17 PM (Pacific Standard Time, UTC-08:00)
Hi,

The STDF parser looks interesting. Can I have it? Actually, I am trying to write one in Java.....

Thanks,
Satya.
Wednesday, September 19, 2007 5:33:06 AM (Pacific Standard Time, UTC-08:00)
I'm working on my final project, an STDF analysis tool, and I'll save a lot of time if you decide to share your parser with me. I have some solutions in Perl but I prefer C#.
Thanks in advance
Dan
Thursday, November 08, 2007 5:32:54 PM (Pacific Standard Time, UTC-08:00)
Hello,

Did you decide to post your C# STDF parser code?

If not can you elaborate on why creating class attributes dynamically improves performance?

Thanks,
/Carl

Tuesday, April 08, 2008 7:24:43 AM (Pacific Standard Time, UTC-08:00)
I am interested in seeing your code for the stdf parser. Are you willing to share it?