
Batch Conversion Of Docx To Clean HTML

I'm starting to wonder if this is even possible. I've searched for solutions on Google and come up with nothing that works exactly how I'd like it to. I think it'd benefit to expla

Solution 1:

This looks like just what you need: http://msdn.microsoft.com/en-us/library/ff628051(v=office.14).aspx

The author, Eric White, blogged about his experience developing that tool. The list of those posts is on his blog here: http://blogs.msdn.com/b/ericwhite/archive/2008/10/20/eric-white-s-blog-s-table-of-contents.aspx#Open_XML_to_XHtml


Solution 2:

Since I'm a big fan of Aspose.Words, a commercial library to create/process Word documents, I would do something like:

  1. Open the Word document with Aspose.Words.
  2. Save the Word document as HTML.
  3. Use something like SgmlReader or HTML Agility Pack (or even regular expressions, if they are suitable) to remove unwanted HTML tags/attributes.

Since you wrote that you work at a university, I'm not sure whether commercial packages are an option, though.
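To illustrate step 3, here is a minimal sketch of the cleanup pass. It is written in Python with only the standard library rather than SgmlReader or HTML Agility Pack, so treat it as a stand-in for the idea, not as either library's API. The whitelist, the names `clean_html` and `clean_folder`, and the assumption that the exported files are UTF-8 `.html` files in one folder are all mine; adjust them to whatever "clean" means for your pages.

```python
from html import escape
from html.parser import HTMLParser
from pathlib import Path

# Hypothetical whitelist: which tags and attributes survive the cleanup.
ALLOWED_TAGS = {"p", "h1", "h2", "h3", "ul", "ol", "li", "a",
                "strong", "em", "table", "tr", "th", "td", "br"}
ALLOWED_ATTRS = {"a": {"href"}}
VOID_TAGS = {"br"}                  # tags that never get a closing tag
DROP_CONTENT = {"style", "script"}  # tags whose text content is discarded


class HtmlCleaner(HTMLParser):
    """Re-emits only whitelisted tags/attributes; comments are dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <style> or <script>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # drop the tag itself; any text inside is still kept
        keep = ALLOWED_ATTRS.get(tag, set())
        attr_text = "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name in keep
        )
        self.out.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def clean_html(html_text):
    """Return html_text with non-whitelisted markup removed."""
    cleaner = HtmlCleaner()
    cleaner.feed(html_text)
    cleaner.close()
    return "".join(cleaner.out)


def clean_folder(src_dir, dst_dir):
    """Clean every *.html file in src_dir into dst_dir (assumes UTF-8)."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.glob("*.html")):
        target = dst / path.name
        cleaned = clean_html(path.read_text(encoding="utf-8"))
        target.write_text(cleaned, encoding="utf-8")
        written.append(target)
    return written
```

Calling `clean_folder("exported", "cleaned")` (both folder names are placeholders) would process a whole batch in one go. A whitelist is used instead of a blacklist because Word-style HTML tends to contain more junk markup than you can enumerate up front.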


Solution 3:

Hi, I'm not sure what the rules are on promoting your own solutions, so do let me know if I am out of line.

I am a web developer who had the same issues, so I created my own tool: http://www.convertwordtohtml.com

We are also working on a new version that will have even better conversion quality and one-click conversion, e.g. you can right-click on a Word file and it will be converted directly to HTML, with the code placed on the clipboard. The current version also supports command-line access, and the new version will have a server version too.

There is a free trial version downloadable from the site, and if you have any questions, do contact me any time.

