Processing CSV files in C# using Open Source FileHelpers Library


Scott Hanselman recently posted about parsing CSV files from PowerShell and while I found it interesting I didn’t find the solution to be intuitive though perhaps that’s just me. I’ve recently purchased PowerShell in Action by Bruce Payette and I like the idea of PowerShell though I find myself bailing back to a plain old Command Prompt quite frequently. Anyway, a few days ago I blogged about using the Open Source FileHelpers library for parsing fixed and delimited length files. I thought it might be fun to tackle Scotts problem and see just how difficult it would be to duplicate his solution in C#. Here is what he was parsing (at least a snippet of it anyway):

“File”,”Hits”,”Bandwidth”
“/hanselminutes_0026_lo.wma”,”78173″,”163625808″
“/hanselminutes_0076_robert_pickering.wma”,”24626″,”-1789110063″
“/hanselminutes_0077.wma”,”17204″,”1959963618″
“/hanselminutes_0076_robert_pickering.mp3″,”15796″,”-55874279″
“/hanselminutes_0078.wma”,”14832″,”-1241370004″
“/hanselminutes_0075.mp3″,”13685″,”-1840937989″
“/hanselminutes_0075.wma”,”12129″,”1276597408″
“/hanselminutes_0078.mp3″,”11058″,”-1186433073″

With FileHelpers the first thing to do is define a class that maps the data in your file which for Scott’s data would look something like this:

    [DelimitedRecord(“,”)]

    class Show

    {

        public string Filename;

        [FieldConverter(typeof(IntConverter))]

        public Int32 downloads;

        [FieldConverter(typeof(IntConverter))]

        public Int32 bandwidth;

    }

Notice the class attribute indicates the separator used. Also notice the member attributes that provide string conversion for the integer values read from the file. Since the integers, unfortunately, have quotes the stock FileHelpers converter won’t work though that’s easily solved by writing a new IntConverted like this:

    public class IntConverter : ConverterBase

    {

        public override object StringToField(string from)

        {

            if (from != “” || from != “\”\“”)

                return Convert.ToInt32(from.Substring(1, from.Length – 2));

            else

                return string.Empty;

        }

    }

For parsing the file we use FileHelpers DelimitedFileEngine like this:

     DelimitedFileEngine e = new DelimitedFileEngine(typeof(Show));

     e.Options.IgnoreFirstLines = 1;

     object[] shows = e.ReadFile(“c:\\shows.txt”);

This creates an instance of the engine passing our Show class type with an option to ignore the first line of the file (the one with column headings). Calling ReadFile parses it into and array of Show objects populated with the data. Next, we’ll change the filename to the episode number using Scott’s regex and sort the data both using anonymous delegates:

    Regex regex = new Regex(“\\d{2}(?=[_.])“);

    Array.ForEach((Show[])shows, delegate(Show one)

        { one.Filename = regex.Match(one.Filename).ToString(); }

    );

    Array.Sort((Show[])shows, delegate(Show one, Show two)

   &nb
sp;    { return one.Filename.CompareTo(two.Filename); }

    );

Finally, we sum the downloads and print the data. Note this for loop modifies the array using the first instance of an episode to aggregate the total downloads, perhaps not optimal but gets the job done:

    Show sh = (Show)shows[0];

    for (int i = 1; i < shows.Length; i++)

    {

        if (((Show)shows[i]).Filename.CompareTo(sh.Filename) == 0)

            sh.downloads += ((Show)shows[i]).downloads;

        else

        {

            Console.WriteLine(“{0,-4} {1,-6}”, sh.Filename, sh.downloads);

            sh = ((Show)shows[i]);

        }

    }

    Console.WriteLine(“{0,-4} {1,-6}”, sh.Filename, sh.downloads);


So, what’s the point? Well, at the end of Scott’s post he wrote:

It took less time than it would take to write a C# program and it’s easily modified ad-hoc.

Armed with FileHelpers this took no time at all, the bulk of the code is spent working with the data not importing into a usable form. Another point is, IMO the C# code is considerably easier to read. I’m finding that getting my head around PowerShell commands to be a bit awkward. For me PowerShell is tantilizingly powerful but I feel requires a different enough mind set to perhaps thwart rapid adoption. I can empathize with all those people who have “been meaning to try…” Of course, I suppose if we all had the chance to sit down with Lee Holmes things might be different.

9 thoughts on “Processing CSV files in C# using Open Source FileHelpers Library

  1. # Done in python (if I understood the problem statement)
    shows = [] # data destination
    totaldownloads = 0 # sum var
    for show in file(‘shows.txt’,’r’):
    show = show.replace(‘"’,”) # no quotes
    show = dict(zip([‘filename’,’downloads’,’bandwidth’],show.split(‘,’)))
    try:
    show[‘downloads’] = int(show[‘downloads’]) # hits
    show[‘bandwidth’] = int(show[‘bandwidth’]) # bandwidth
    # filename -> episode number. Ugly, could use regex if needed
    show[‘filename’] = int(show[‘filename’].split(‘_’)[1].split(‘.’)[0].split(‘_’)[0])
    totaldownloads += show[‘downloads’] # accumulate
    shows.append(show) # add to list
    except:
    print ‘problem with line: ‘+str(show[‘filename’])
    def myCmp(a,b): # custom sort condition
    return a[‘filename’] > b[‘filename’]
    shows.sort(cmp=myCmp) # in-place sort
    for show in shows: # output
    print ‘%10d%20d%20d’ % tuple(show.values())
    print ‘Total Downloads = %d’ % (totaldownloads)

  2. Thanks a lot for your comments about the library =)
    You can use the generics version of the engine to get even more readable code and avoid casts =)
    DelimitedFileEngine<Show> e = new DelimitedFileEngine<Show>();
    e.Options.IgnoreFirstLines = 1;
    Show[] shows = e.ReadFile("c:\\shows.txt");
    so you can use…
    Array.ForEach(shows, delegate(Show one)
    { one.Filename = regex.Match(one.Filename).ToString(); }
    );
    Array.Sort(shows, delegate(Show one, Show two)
    { return one.Filename.CompareTo(two.Filename); }
    );
    .. and …
    Show sh = shows[0];
    for (int i = 1; i < shows.Length; i++)
    {
    if (shows[i].Filename.CompareTo(sh.Filename) == 0)
    sh.downloads += (shows[i]).downloads;
    else
    {
    Console.WriteLine("{0,-4} {1,-6}", sh.Filename, sh.downloads);
    sh = shows[i];
    }
    }
    Best Regards

  3. Steve, one minor quibble: in the title, CVS should be CSV 😉
    But still a great article, for those of us for whom CSV is an everyday fact of life.

  4. Actually, I think you’re typing in an editor <g>.
    But a proofreader might be a good thing… an actual editor, competent to find fault in your content, would be harder to locate. Or at least, one who was competent AND had the free time available AND was willing to give it 😉
    By the way, I see the box to enter the code, but no code image to enter. So I click on Save Comment, and the page redraws, but with a code I can then type into the box. Odd, and minorly annoying.

  5. i was initially using Microsoft Text Driver (*.txt; *.csv), and at time this will fail,
    so my manager suggested use (stream reader and *figure it out*)
    but im a person who like to focus on getting the problem fixed, and not spend too much time sorting out stream readers… so found ur post, and thanks! it works, just need 3 line and result is a nice class object.

Comments are closed.