Scott Hanselman recently posted about
parsing CSV files from PowerShell and while I found it interesting I didn’t find the solution to be intuitive though perhaps that’s just me. I’ve recently purchased
PowerShell in Action by Bruce Payette and I like the idea of PowerShell though I find myself bailing back to a plain old Command Prompt quite frequently. Anyway, a few days ago I blogged about using the Open Source
FileHelpers library for parsing fixed and delimited length files. I thought it might be fun to tackle Scotts problem and see just how difficult it would be to duplicate his solution in C#. Here is what he was parsing (at least a snippet of it anyway):
"File","Hits","Bandwidth"
"/hanselminutes_0026_lo.wma","78173","163625808"
"/hanselminutes_0076_robert_pickering.wma","24626","-1789110063"
"/hanselminutes_0077.wma","17204","1959963618"
"/hanselminutes_0076_robert_pickering.mp3","15796","-55874279"
"/hanselminutes_0078.wma","14832","-1241370004"
"/hanselminutes_0075.mp3","13685","-1840937989"
"/hanselminutes_0075.wma","12129","1276597408"
"/hanselminutes_0078.mp3","11058","-1186433073"
With FileHelpers the first thing to do is define a class that maps the data in your file which for Scott’s data would look something like this:
[DelimitedRecord(",")]
class Show
{
public string Filename;
[FieldConverter(typeof(IntConverter))]
public Int32 downloads;
[FieldConverter(typeof(IntConverter))]
public Int32 bandwidth;
}
Notice the class attribute indicates the separator used. Also notice the member attributes that provide string conversion for the integer values read from the file. Since the integers, unfortunately, have quotes the stock FileHelpers converter won’t work though that’s easily solved by writing a new IntConverted like this:
public class IntConverter : ConverterBase
{
public override object StringToField(string from)
{
if (from != "" || from != "\"\"")
return Convert.ToInt32(from.Substring(1, from.Length - 2));
else
return string.Empty;
}
}
For parsing the file we use FileHelpers DelimitedFileEngine like this:
DelimitedFileEngine e = new DelimitedFileEngine(typeof(Show));
e.Options.IgnoreFirstLines = 1;
object[] shows = e.ReadFile("c:\\shows.txt");
This creates an instance of the engine passing our Show class type with an option to ignore the first line of the file (the one with column headings). Calling ReadFile parses it into and array of Show objects populated with the data. Next, we’ll change the filename to the episode number using Scott’s regex and sort the data both using anonymous delegates:
Regex regex = new Regex("\\d{2}(?=[_.])");
Array.ForEach((Show[])shows, delegate(Show one)
{ one.Filename = regex.Match(one.Filename).ToString(); }
);
Array.Sort((Show[])shows, delegate(Show one, Show two)
{ return one.Filename.CompareTo(two.Filename); }
);
Finally, we sum the downloads and print the data. Note this for loop modifies the array using the first instance of an episode to aggregate the total downloads, perhaps not optimal but gets the job done:
Show sh = (Show)shows[0];
for (int i = 1; i < shows.Length; i++)
{
if (((Show)shows[i]).Filename.CompareTo(sh.Filename) == 0)
sh.downloads += ((Show)shows[i]).downloads;
else
{
Console.WriteLine("{0,-4} {1,-6}", sh.Filename, sh.downloads);
sh = ((Show)shows[i]);
}
}
Console.WriteLine("{0,-4} {1,-6}", sh.Filename, sh.downloads);
So, what’s the point? Well, at the end of Scott’s post he wrote:
It took less time than it would take to write a C# program and it’s easily modified ad-hoc.
Armed with FileHelpers this took no time at all, the bulk of the code is spent working with the data not importing into a usable form. Another point is, IMO the C# code is considerably easier to read. I’m finding that getting my head around PowerShell commands to be a bit awkward. For me PowerShell is tantilizingly powerful but I feel requires a different enough mind set to perhaps thwart rapid adoption. I can empathize with all those people who have "been meaning to try..." Of course, I suppose if we all had the chance to sit down with
Lee Holmes things might be different.