Steve Trefethen
Contact me
About Me View my LinkedIn profile

Powered by discountASP.NET
referal ID: sdtref
Why recommend discountASP.NET?
Need consulting?
Need Consulting?

Spread Thunderbird

Disclaimer

The posts on this weblog are provided AS IS with no warranties, and confer no rights. The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

TestComplete Keyword Testing Online Training

I will be presenting a three day webinar April 12-14th 2010 from 9am-1pm PST on AutomatedQA's TestComplete. The cost is $499/person and you can register on Falafel Software's website here. For detailed information on this training click here.

Processing CSV files in C# using Open Source FileHelpers Library

November 01 2007 8:39AM

Scott Hanselman recently posted about parsing CSV files from PowerShell and while I found it interesting I didn’t find the solution to be intuitive though perhaps that’s just me. I’ve recently purchased PowerShell in Action by Bruce Payette and I like the idea of PowerShell though I find myself bailing back to a plain old Command Prompt quite frequently. Anyway, a few days ago I blogged about using the Open Source FileHelpers library for parsing fixed and delimited length files. I thought it might be fun to tackle Scotts problem and see just how difficult it would be to duplicate his solution in C#. Here is what he was parsing (at least a snippet of it anyway):

"File","Hits","Bandwidth"
"/hanselminutes_0026_lo.wma","78173","163625808"
"/hanselminutes_0076_robert_pickering.wma","24626","-1789110063"
"/hanselminutes_0077.wma","17204","1959963618"
"/hanselminutes_0076_robert_pickering.mp3","15796","-55874279"
"/hanselminutes_0078.wma","14832","-1241370004"
"/hanselminutes_0075.mp3","13685","-1840937989"
"/hanselminutes_0075.wma","12129","1276597408"
"/hanselminutes_0078.mp3","11058","-1186433073"

With FileHelpers the first thing to do is define a class that maps the data in your file which for Scott’s data would look something like this:

    [DelimitedRecord(",")]

    class Show

    {

        public string Filename;

        [FieldConverter(typeof(IntConverter))]

        public Int32 downloads;

        [FieldConverter(typeof(IntConverter))]

        public Int32 bandwidth;

    }

Notice the class attribute indicates the separator used. Also notice the member attributes that provide string conversion for the integer values read from the file. Since the integers, unfortunately, have quotes the stock FileHelpers converter won’t work though that’s easily solved by writing a new IntConverted like this:

    public class IntConverter : ConverterBase

    {

        public override object StringToField(string from)

        {

            if (from != "" || from != "\"\"")

                return Convert.ToInt32(from.Substring(1, from.Length - 2));

            else

                return string.Empty;

        }

    }

For parsing the file we use FileHelpers DelimitedFileEngine like this:

     DelimitedFileEngine e = new DelimitedFileEngine(typeof(Show));

     e.Options.IgnoreFirstLines = 1;

     object[] shows = e.ReadFile("c:\\shows.txt");

This creates an instance of the engine passing our Show class type with an option to ignore the first line of the file (the one with column headings). Calling ReadFile parses it into and array of Show objects populated with the data. Next, we’ll change the filename to the episode number using Scott’s regex and sort the data both using anonymous delegates:

    Regex regex = new Regex("\\d{2}(?=[_.])");

    Array.ForEach((Show[])shows, delegate(Show one)

        { one.Filename = regex.Match(one.Filename).ToString(); }

    );

    Array.Sort((Show[])shows, delegate(Show one, Show two)

        { return one.Filename.CompareTo(two.Filename); }

    );

Finally, we sum the downloads and print the data. Note this for loop modifies the array using the first instance of an episode to aggregate the total downloads, perhaps not optimal but gets the job done:

    Show sh = (Show)shows[0];

    for (int i = 1; i < shows.Length; i++)

    {

        if (((Show)shows[i]).Filename.CompareTo(sh.Filename) == 0)

            sh.downloads += ((Show)shows[i]).downloads;

        else

        {

            Console.WriteLine("{0,-4} {1,-6}", sh.Filename, sh.downloads);

            sh = ((Show)shows[i]);

        }

    }

    Console.WriteLine("{0,-4} {1,-6}", sh.Filename, sh.downloads);


So, what’s the point? Well, at the end of Scott’s post he wrote:

It took less time than it would take to write a C# program and it’s easily modified ad-hoc.

Armed with FileHelpers this took no time at all, the bulk of the code is spent working with the data not importing into a usable form. Another point is, IMO the C# code is considerably easier to read. I’m finding that getting my head around PowerShell commands to be a bit awkward. For me PowerShell is tantilizingly powerful but I feel requires a different enough mind set to perhaps thwart rapid adoption. I can empathize with all those people who have "been meaning to try..." Of course, I suppose if we all had the chance to sit down with Lee Holmes things might be different.
Digg It!DZone It!StumbleUponTechnoratiRedditDel.icio.usNewsVineFurlBlinkList

Tags: , , ,

Comments

11/1/2007 12:34:48 PM #

Caleb

# Done in python (if I understood the problem statement)

shows = [] # data destination
totaldownloads = 0 # sum var
for show in file('shows.txt','r'):
    show = show.replace('"','') # no quotes
    show = dict(zip(['filename','downloads','bandwidth'],show.split(',')))
    try:
        show['downloads'] = int(show['downloads']) # hits
        show['bandwidth'] = int(show['bandwidth']) # bandwidth
        # filename -> episode number.  Ugly, could use regex if needed
        show['filename'] = int(show['filename'].split('_')[1].split('.')[0].split('_')[0])
        totaldownloads += show['downloads'] # accumulate
        shows.append(show) # add to list
    except:
        print 'problem with line: '+str(show['filename'])

def myCmp(a,b): # custom sort condition
    return a['filename'] > b['filename']
shows.sort(cmp=myCmp) # in-place sort

for show in shows: # output
    print '%10d%20d%20d' % tuple(show.values())
print 'Total Downloads = %d' % (totaldownloads)

Caleb

11/1/2007 4:02:37 PM #

Marcos Meli

Thanks a lot for your comments about the library =)

You can use the generics version of the engine to get even more readable code and avoid casts =)

DelimitedFileEngine<Show> e = new DelimitedFileEngine<Show>();
e.Options.IgnoreFirstLines = 1;
Show[] shows = e.ReadFile("c:\\shows.txt");

so you can use...

   Array.ForEach(shows, delegate(Show one)
        { one.Filename = regex.Match(one.Filename).ToString(); }
    );

    Array.Sort(shows, delegate(Show one, Show two)
        { return one.Filename.CompareTo(two.Filename); }
    );

.. and ...

Show sh = shows[0];
for (int i = 1; i < shows.Length; i++)
    {
        if (shows[i].Filename.CompareTo(sh.Filename) == 0)
            sh.downloads += (shows[i]).downloads;
        else
        {
            Console.WriteLine("{0,-4} {1,-6}", sh.Filename, sh.downloads);
            sh = shows[i];
        }
    }

Best Regards

Marcos Meli

11/1/2007 4:37:15 PM #

Steve Trefethen

Caleb,
Sweet, thanks!

Marcos,
Thanks for the tip. I hadn't noticed the generic version of the engine. Great work, btw!

Steve Trefethen

11/1/2007 7:48:36 PM #

Bill Meyer

Steve, one minor quibble: in the title, CVS should be CSV ;)

But still a great article, for those of us for whom CSV is an everyday fact of life.

Bill Meyer

11/1/2007 8:48:54 PM #

Steve Trefethen

Fixed. Thanks Bill. I need an editor.

Steve Trefethen

11/2/2007 4:01:02 AM #

Bill Meyer

Actually, I think you're typing in an editor <g>.

But a proofreader might be a good thing... an actual editor, competent to find fault in your content, would be harder to locate. Or at least, one who was competent AND had the free time available AND was willing to give it ;)

By the way, I see the box to enter the code, but no code image to enter. So I click on Save Comment, and the page redraws, but with a code I can then type into the box. Odd, and minorly annoying.

Bill Meyer

11/2/2007 5:08:44 PM #

Steve Trefethen

Bill,
Thanks for the chuckle. I think as long as you keep reading (and catching issues) I'll be ok.  Smile

Steve Trefethen

11/3/2007 3:41:09 AM #

Bill Meyer

Oh, I feel so used....

;)

Hey! There's a code to type in the box...  first time!

Bill Meyer

Add comment


(Will show your Gravatar icon)

  Country flag

biuquote
  • Comment
  • Preview
Loading



Spam filtering provided by: Spam Counter
331 comments approved, 1455 spam caught since October 28, 2009
Powered by Commentor