Steve Trefethen

Steve Trefethen is a Director of Engineering at Reply.


Text file processing with LINQ

September 09 2009 7:15AM

After working on this problem the other day, I started Googling for posts about using LINQ for text file processing. I found the post Parsing textfiles with LINQ (or LINQ-to-TextReader) by Arjan Einbu, which frames the problem this way:

LINQ shows us alternate ways to write code, introducing a more declarative coding paradigm. To use LINQ over the lines of a file, we can read all the lines in the file into a collection, and use LINQ over that collection. There’s some overhead to this; the need to read the entire file upfront and to fit the entire file in memory at once.

The solution was to create an extension method on TextReader that returns an IEnumerable<string>. That post was followed up by another, rather unfortunately titled, which improves on the solution by using the TextFieldParser class in the Microsoft.VisualBasic.FileIO namespace. I wasn’t aware that class existed, and I find it odd that it’s stuck so far off in left field.
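To make the idea concrete, here is a minimal sketch of such an extension method. It is not Arjan’s code: the class name TextReaderExtensions and the method name Lines are made up for illustration. The point is only that an iterator can hand lines to LINQ one at a time, so the file never has to fit in memory all at once.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class TextReaderExtensions
{
    // Hypothetical name. Validates eagerly, then defers the actual reading.
    public static IEnumerable<string> Lines(this TextReader reader)
    {
        if (reader == null) throw new ArgumentNullException(nameof(reader));
        return Iterate(reader);
    }

    // Iterator block: each line is read only when the consumer asks for it.
    private static IEnumerable<string> Iterate(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}

public static class LinesDemo
{
    public static void Main()
    {
        // StringReader stands in for a StreamReader over a real file.
        using (var reader = new StringReader("alpha\nbeta\n\ngamma"))
        {
            var nonEmpty = reader.Lines().Where(l => l.Length > 0).ToList();
            Console.WriteLine(string.Join(",", nonEmpty)); // prints "alpha,beta,gamma"
        }
    }
}
```

Because the enumeration is lazy, the query has to be consumed while the reader is still open, which is why ToList() is called inside the using block.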

One of the reasons this subject interests me is that I’ve been working with EDI files for a while now, and querying data directly from this file format would be really nice. For example, given a PO with line item segments like this:

PO1*1*36*CA*11.15*PE***VP*RRSKRC85*PI*0001111091127~
PID*F****PRSL ROMNE BBY TRAY ORGNC~
PO1*2*84*CA*11.15*PE***VP*RSMKRC85*PI*0001111091131~
PID*F****PRSL SPRG BBY TRAY ORGNC~
PO1*3*84*CA*11.15*PE***VP*RBSKRC85*PI*0001111091128~
PID*F****PRSL SPNCH BBY TRAY ORGNC~
PO1*4*72*CA*11.15*PE***VP*RHEKRC85*PI*0001111091126~
PID*F****PRSL SPRG W/HRB CLM ORGNC~

You can calculate the total quantity of all line items (the third field of each PO1 segment: 36, 84, 84 and 72 above) using LINQ like this:

using (var reader = new StreamReader("c:\\edi\\inbound\\850_09022009_1311_89.txt"))
{
    // Sum the quantity (field index 2) of every PO1 line item segment.
    var query = (from line in reader.GetSplittedLines("*")
                 where line[0].Equals("PO1") && line[2].Length > 0
                 select Convert.ToInt32(line[2])).Sum();
    // ...
}

Using Arjan’s implementation of GetSplittedLines (that’s his name, not mine, for the extension method he wrote), you can apply logic to any of the columns from the file, which is pretty cool.
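If you don’t have Arjan’s code handy, the following self-contained sketch shows roughly how the pieces could fit together. SplitLines is a hypothetical stand-in for GetSplittedLines, not his implementation, and its char parameters are my own assumption. It also strips the trailing ~ segment terminator, which the sample data above carries at the end of each line.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class EdiReaderExtensions
{
    // Hypothetical stand-in for GetSplittedLines: lazily yields each line's fields.
    public static IEnumerable<string[]> SplitLines(
        this TextReader reader, char separator, char terminator)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Drop the trailing segment terminator before splitting into fields.
            yield return line.TrimEnd(terminator).Split(separator);
        }
    }
}

public static class EdiDemo
{
    // Sums field index 2 (quantity) across all PO1 segments.
    public static int TotalQuantity(TextReader reader)
    {
        return (from fields in reader.SplitLines('*', '~')
                where fields[0] == "PO1" && fields.Length > 2 && fields[2].Length > 0
                select Convert.ToInt32(fields[2])).Sum();
    }

    public static void Main()
    {
        var sample = string.Join("\n", new[]
        {
            "PO1*1*36*CA*11.15*PE***VP*RRSKRC85*PI*0001111091127~",
            "PID*F****PRSL ROMNE BBY TRAY ORGNC~",
            "PO1*2*84*CA*11.15*PE***VP*RSMKRC85*PI*0001111091131~",
            "PID*F****PRSL SPRG BBY TRAY ORGNC~",
            "PO1*3*84*CA*11.15*PE***VP*RBSKRC85*PI*0001111091128~",
            "PID*F****PRSL SPNCH BBY TRAY ORGNC~",
            "PO1*4*72*CA*11.15*PE***VP*RHEKRC85*PI*0001111091126~",
            "PID*F****PRSL SPRG W/HRB CLM ORGNC~"
        });

        // StringReader stands in for the StreamReader over the inbound file.
        using (var reader = new StringReader(sample))
        {
            Console.WriteLine(TotalQuantity(reader)); // 36 + 84 + 84 + 72 = 276
        }
    }
}
```

The extra fields.Length > 2 check is there so a short or malformed PO1 line is skipped rather than throwing an IndexOutOfRangeException.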

Of course, there are a myriad of ways of doing the same thing, but it’s interesting to have access to the columns, allowing for calculations and querying. For my EDI work I’m using FileHelpers, which works well, though I really like this LINQ option. That said, I haven’t done any benchmarking, so I’m not sure about the performance, but most of the POs I’m working with are less than 4KB and the volume isn’t so great that this would be a major factor. At any rate, I hope you find it useful too.

Btw, if you’re looking for custom EDI implementations feel free to contact me.
