Why do we need to format HTML in Delphi?

I know this issue has been pretty well discussed, but someone recently asked me about it again, so I put together a small example to illustrate the problems that arise when you use the MSHTML control and try to preserve the source code.

Here is what we hand the MSHTML control:


  
    
  

  

        

  • one
        
  • two
      

  this is
wrapped text
  

    

      

  

We get this back (without editing anything, btw):

  • one
  • two

this is wrapped text

Now, to point out the differences…

  1. All tags are now uppercase
  2. The runat="server" attribute on the title tag is gone
  3. Missing the first closing (notice the last one is preserved)
  4. A single space was added between “two” and
  5. The wrapped text has been unwrapped
  6. The order of the table tag attributes has been reversed
  7. The quotes around the table tag attributes have been removed
  8. The case of the table tag attributes has been changed
  9. TBODY tag has been added
  10. A closing TR tag has been added (but no closing TD??)
  11. All whitespace has been removed (except, of course, where it was added; see above)

So, as you can see, when it comes to source preservation with the MSHTML control we definitely have our work cut out for us. That is why we currently reformat the markup to make it readable again. I’ll have more on this later.
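To give a rough idea of what "reformatting to make it readable again" can mean, here is a minimal sketch. It is not our Delphi formatter: it is written in Python with the standard html.parser module purely for illustration, and the names Reformatter and reformat are made up for this example. It assumes the input looks like the MSHTML output described above (uppercase tags, unquoted attributes, no line breaks) and re-emits it with lowercase tags, quoted attributes, and one node per indented line. It does not restore the original attribute order, the original line wrapping, or anything MSHTML dropped.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Reformatter(HTMLParser):
    """Re-emit markup with lowercase tags, quoted attributes, one node per line."""

    def __init__(self, indent="  "):
        super().__init__(convert_charrefs=True)
        self.indent = indent
        self.open_tags = []  # stack of currently open element names
        self.lines = []

    def _emit(self, text):
        self.lines.append(self.indent * len(self.open_tags) + text)

    def handle_starttag(self, tag, attrs):
        # A new <li> implicitly closes an <li> that is still open.
        if tag == "li" and self.open_tags and self.open_tags[-1] == "li":
            self.open_tags.pop()
        parts = [tag]  # the parser has already lowercased tag and attribute names
        for name, value in attrs:
            parts.append(name if value is None
                         else '%s="%s"' % (name, escape(value, quote=True)))
        self._emit("<%s>" % " ".join(parts))
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or tag not in self.open_tags:
            return  # stray or void closing tag: skip it
        # Pop back to the matching element, dropping unclosed children.
        while self.open_tags.pop() != tag:
            pass
        self._emit("</%s>" % tag)

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text:
            self._emit(escape(text, quote=False))


def reformat(markup):
    parser = Reformatter()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return "\n".join(parser.lines)
```

For example, reformat('<UL><LI>one<LI>two </LI></UL>') puts each tag and each text run on its own line, indented by nesting depth, and keeps the indentation stable even though the first list item is never closed. A real formatter also has to decide which elements should stay inline, which is where most of the work is.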