The other day a friend, who’s not a developer, approached me looking for help with a problem they were having dealing with a few large server log files. Each file contained hundreds of MIME headers listed one after the other, each with a starting comment and separated by a blank line. I was given a sample containing 1400 MIME headers, which I first opened in Windows Notepad, where it looked like what you see to the right. They had been trying to work with this file using Excel but were not having much luck because the headers were inconsistent sizes, and I imagine extracting the right fields was a problem. The desired result was a format from which they could perform some analysis of the data, particularly of the X-Originating-IP field.
After about 10 seconds of staring at the data in Notepad I opened it in Notepad++, where things looked a bit more sane, and it dawned on me what my friend’s first thoughts probably were when they first glanced at this file. Excel looked better but didn’t make the process any easier.
A few choices entered my mind:
- Write a simple parser, regex etc.
- Look for an existing MIME parser
- Use VS.NET editor Macros to extract the content
My main goals were to:
- not spend much time
- produce a CSV file
- do something my friend could duplicate (lessen “support”)
I opted to search for a MIME parser, largely because I figured one written in C# had to exist, and then write a tool to spit out a CSV file. My first Google search was “parse email header C#”, which gave me a few interesting links but nothing that really caught my eye. The next attempt was “parse MIME header C#”:
The CodeProject article is largely code snippets and the first one looked interesting.
Mime m = Mime.Parse("message.eml"); // Do your stuff with mime
I thought, if there is .eml message parsing then I’m good regardless of the fact that it was expecting a file. I downloaded the source and it compiled without error, always encouraging. Next, I looked into the parsing support:
    // Summary:
    //     Parses mime message from byte data.
    //
    // Parameters:
    //   data:
    //     Mime message data.
    public static Mime Parse(byte[] data);
    //
    // Summary:
    //     Parses mime message from stream.
    //
    // Parameters:
    //   stream:
    //     Mime message stream.
    public static Mime Parse(Stream stream);
    //
    // Summary:
    //     Parses mime message from file.
    //
    // Parameters:
    //   fileName:
    //     Mime message file.
    public static Mime Parse(string fileName);
For testing I saved off a single MIME header and created a simple console application to try and parse a fake .eml file which worked like a charm. All that was left to do was write some code to read the log file one header at a time and spit out a .CSV file.
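Reading the log one header at a time mostly comes down to splitting on blank lines. The following is a minimal sketch of that step rather than the code I actually shipped: `HeaderBlockReader` and `ReadBlocks` are names made up for illustration, and it assumes a blank (or whitespace-only) line never appears inside a header. It does nothing special with the starting comment on each header, so that line stays part of the block it belongs to.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the MIME library discussed in this post.
public static class HeaderBlockReader
{
    // Yields one string per header block; blocks are separated by one or
    // more blank lines. Lines within a block are re-joined with "\n".
    public static IEnumerable<string> ReadBlocks(TextReader reader)
    {
        var current = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Trim().Length == 0)
            {
                // A blank line closes the current block, if there is one.
                if (current.Count > 0)
                {
                    yield return string.Join("\n", current.ToArray());
                    current.Clear();
                }
            }
            else
            {
                current.Add(line);
            }
        }

        // The last block may not be followed by a blank line.
        if (current.Count > 0)
        {
            yield return string.Join("\n", current.ToArray());
        }
    }
}
```

Each block could then be handed to one of the `Parse` overloads listed above, for example after encoding it to bytes or wrapping it in a stream.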
I made one minor change to the MIME parsing code, which was to change its HeaderFieldCollection from an IEnumerable to IEnumerable<HeaderField> so as to leverage LINQ to search for the “X-Originating-IP”. Of course, I later found out that the code attached to the article is outdated.
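The reason for that change is that LINQ operators such as `Where` are defined on `IEnumerable<T>`, not on the non-generic `IEnumerable`. Here is a rough sketch of the kind of lookup this enables. The `HeaderField` class below is a hypothetical stand-in with assumed `Name` and `Value` properties, not the library's real type, and `HeaderLookup.FindValue` is a name invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the library's header field type; the real
// class may expose different members.
public sealed class HeaderField
{
    public string Name { get; private set; }
    public string Value { get; private set; }

    public HeaderField(string name, string value)
    {
        Name = name;
        Value = value;
    }
}

public static class HeaderLookup
{
    // Returns the value of the first field whose name matches
    // (case-insensitively), or null when no such field exists.
    public static string FindValue(IEnumerable<HeaderField> fields, string name)
    {
        return fields
            .Where(f => string.Equals(f.Name, name, StringComparison.OrdinalIgnoreCase))
            .Select(f => f.Value)
            .FirstOrDefault();
    }
}
```

With something like this in place, the lookup becomes a single call such as `HeaderLookup.FindValue(fields, "X-Originating-IP")`. An alternative that would avoid touching the library source is to call `Cast<HeaderField>()` on the non-generic collection before applying the LINQ operators.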
At any rate, I quickly had the file parsed and output to .CSV using a simple console application with input and output filename params, which I mailed off. So, if you’re looking for MIME header parsing, this library worked well for the 1400 headers I tried, and I’m glad I could offer this tiny bit of help in a situation that sounds very serious for the folks involved.
Btw, kudos to Ivar Lumi for making this available. Heck, I think writing this post took longer than developing the solution.
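One thing to watch for in the CSV step is that header values can contain commas or quotes, so fields need quoting before Excel will read them cleanly. This is a small sketch of that, with `CsvSketch`, `Escape` and `ToLine` being illustrative names rather than anything from my tool or the library. It follows the common convention of wrapping a field in double quotes and doubling any embedded quotes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for writing one CSV row at a time.
public static class CsvSketch
{
    // Quotes a field only when it contains a comma, a quote, or a line break.
    public static string Escape(string field)
    {
        if (field == null)
        {
            return "";
        }

        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        if (!needsQuotes)
        {
            return field;
        }

        // Embedded quotes are doubled, then the whole field is wrapped.
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Builds a single CSV line from the given fields.
    public static string ToLine(IEnumerable<string> fields)
    {
        return string.Join(",", fields.Select(Escape).ToArray());
    }
}
```

Each parsed header would contribute one such line to the output file, for instance via `File.AppendAllText` or a `StreamWriter`.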