I know this issue has been pretty well discussed, but someone recently asked me about it again, so I put together a small example to illustrate the problems that arise when you use the MSHTML control and try to preserve source code.
Here is what we hand the MSHTML control:
- one
- two
this is
wrapped text
We get this back (without editing anything, by the way):
- one
- two
this is wrapped text
Now, to point out the differences:
- All tags are now uppercase
- runat=”server” on the title tag is gone
- Missing the first closing (notice the last one is preserved)
- A single space was added between “two” and
- The wrapped text has been unwrapped
- The order of the table tag attributes has been reversed
- The quotes around the table tag attributes have been removed
- The case of the table tag attributes has been changed
- A TBODY tag has been added
- A closing TR tag has been added (but no closing TD?)
- All whitespace has been removed (except, of course, where it was added; see above)
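Several of the differences listed above (tag and attribute case, attribute order, quoting, unwrapped whitespace) are cosmetic, while others (the added TBODY, the added closing TR, the dropped runat attribute) change the document's structure. One way to tell the two kinds apart, which is not something this post itself describes, is to parse both versions, normalize away the cosmetic variation, and diff what remains. The following minimal sketch uses Python's standard `html.parser` and `difflib`; the names `TokenCollector`, `tokens`, and `structural_diff` are hypothetical, and the parser here is only a comparison tool, not a stand-in for MSHTML.

```python
import difflib
from html.parser import HTMLParser


class TokenCollector(HTMLParser):
    """Collects a normalized token stream from an HTML fragment.

    HTMLParser already lowercases tag and attribute names and strips
    attribute quotes, so sorting attributes is enough to make case,
    quoting, and attribute order irrelevant when comparing tokens.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # Sort attributes by name (and value) so their order does not matter.
        # Boolean attributes have a value of None, so sort on `value or ""`.
        ordered = tuple(sorted(attrs, key=lambda a: (a[0], a[1] or "")))
        self.tokens.append(("start", tag, ordered))

    def handle_startendtag(self, tag, attrs):
        # Treat <br/> and <br> the same way.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))

    def handle_data(self, data):
        # Collapse whitespace runs (so wrapped and unwrapped text match)
        # and drop whitespace-only text between tags.
        text = " ".join(data.split())
        if text:
            self.tokens.append(("text", text))


def tokens(markup):
    """Parse a markup string into a list of normalized tokens."""
    parser = TokenCollector()
    parser.feed(markup)
    parser.close()
    return parser.tokens


def structural_diff(original, roundtripped):
    """Return the token-level differences that survive normalization."""
    a, b = tokens(original), tokens(roundtripped)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (op, a[i1:i2], b[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


if __name__ == "__main__":
    before = '<table width="100%" border="0"><tr><td>one</td></tr></table>'
    after = "<TABLE BORDER=0 WIDTH=100%><TBODY><TR><TD>one</TD></TR></TBODY></TABLE>"
    # Case, quoting, and attribute order are ignored; only the TBODY
    # start and end tokens are reported as inserted.
    print(structural_diff(before, after))
```

The two sample strings are made up to resemble the kinds of changes described above, not the actual markup from this example. Anything `structural_diff` still reports (an added TBODY, an extra closing TR, a missing runat attribute) is a real change to the markup rather than a formatting difference.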
So, as you can see, when it comes to source preservation with the MSHTML control we definitely have our work cut out for us. For now, we reformat the markup to make it readable again. I’ll have more on this later.