
Convert DOCX to HTML Including Images

I am using docx4j to convert DOCX to HTML. The conversion itself works and I get the HTML output. I will be using the HTML as an email body to send an

Solution 1:

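One option is Apache POI with the XDocReport XHTML converter, which can extract the images during conversion. You can add something like this to your code: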

package tcg.doc.web.managedBeans;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.poi.xwpf.converter.core.FileImageExtractor;
import org.apache.poi.xwpf.converter.core.FileURIResolver;
import org.apache.poi.xwpf.converter.xhtml.XHTMLConverter;
import org.apache.poi.xwpf.converter.xhtml.XHTMLOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Scope;
import org.springframework.stereotype.Component;

@Component
@Scope("session")
@Qualifier("ConvertWord")
public class ConvertWord {
    private static final String docName = "TestDocx.docx";
    private static final String outputlFolderPath = "d:/";

    String htmlNamePath = "docHtml.html";
    String zipName = "_tmp.zip";
    File docFile = new File(outputlFolderPath + docName);
    File zipFile = new File(zipName);

    public void ConvertWordToHtml() {
        // try-with-resources closes both streams even if the conversion fails
        try (InputStream doc = new FileInputStream(docFile);
             OutputStream out = new FileOutputStream(new File(htmlPath()))) {

            // 1) Load DOCX into XWPFDocument
            XWPFDocument document = new XWPFDocument(doc);

            // 2) Prepare XHTML options. Alternatively, set the IURIResolver to load
            //    images from a "word/media" folder:
            //    XHTMLOptions.create().URIResolver(new FileURIResolver(new File("word/media")));
            XHTMLOptions options = XHTMLOptions.create();

            // Extract images to target/images/<document name>
            // (the original concatenated the InputStream object here, not the file name)
            String root = "target";
            File imageFolder = new File(root + "/images/" + docName);
            options.setExtractor(new FileImageExtractor(imageFolder));
            // URI resolver: makes the <img> src attributes point at the extracted files
            options.URIResolver(new FileURIResolver(imageFolder));

            // 3) Convert the document to XHTML
            XHTMLConverter.getInstance().convert(document, out, options);
            System.out.println("HTML written to " + htmlPath());
        } catch (FileNotFoundException ex) {
            // Input DOCX missing or output path not writable
            ex.printStackTrace();
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }

    public static void main(String[] args) {
        ConvertWord cwoWord = new ConvertWord();
        cwoWord.ConvertWordToHtml();
    }

    public String htmlPath() {
        // d:/docHtml.html
        return outputlFolderPath + htmlNamePath;
    }

    public String zipPath() {
        // d:/_tmp.zip
        return outputlFolderPath + zipName;
    }
}

Maven dependency for pom.xml:

<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>org.apache.poi.xwpf.converter.xhtml</artifactId>
    <version>1.0.4</version>
</dependency>

Or download the JAR directly.
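Note that the extractor in Solution 1 writes the images to a folder on disk, and the generated HTML typically refers to them by file path, which a mail client cannot resolve. One possible post-processing step, sketched here under stated assumptions, is to rewrite each img src as a Base64 data URI. The names ImageInliner, inlineImages, mimeTypeFor and the loader callback are hypothetical and are not part of POI or XDocReport; the regex assumes double-quoted src attributes, and the MIME type is guessed from the file extension.

```java
import java.util.Base64;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of POI or XDocReport.
final class ImageInliner {

    // Captures: (1) everything up to the opening quote of src, (2) the path, (3) the closing quote.
    private static final Pattern IMG_SRC =
            Pattern.compile("(<img\\b[^>]*?\\bsrc=\")([^\"]+)(\")", Pattern.CASE_INSENSITIVE);

    /** Guesses a MIME type from the file extension (assumption: only common formats occur). */
    static String mimeTypeFor(String path) {
        String p = path.toLowerCase();
        if (p.endsWith(".png")) return "image/png";
        if (p.endsWith(".jpg") || p.endsWith(".jpeg")) return "image/jpeg";
        if (p.endsWith(".gif")) return "image/gif";
        return "application/octet-stream";
    }

    /**
     * Replaces every local img src in the HTML with a data URI.
     * The loader maps a src path to the image bytes (for example by reading the file).
     */
    static String inlineImages(String html, Function<String, byte[]> loader) {
        Matcher m = IMG_SRC.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String src = m.group(2);
            String newSrc = src;
            // Leave images alone if they are already inlined or hosted remotely.
            boolean external = src.startsWith("data:")
                    || src.startsWith("http:") || src.startsWith("https:");
            if (!external) {
                String base64 = Base64.getEncoder().encodeToString(loader.apply(src));
                newSrc = "data:" + mimeTypeFor(src) + ";base64," + base64;
            }
            // quoteReplacement guards against '$' or '\' being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1) + newSrc + m.group(3)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // In-memory loader with placeholder bytes, so the sketch runs without any files.
        String html = "<p>Hi</p><img src=\"target/images/a.png\" alt=\"x\"/>";
        System.out.println(inlineImages(html, path -> new byte[] {1, 2, 3}));
    }
}
```

In real use the loader would read the extracted file, for example with Files.readAllBytes, and the input would be the HTML written by the converter.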

Solution 2:

For images to work in an email body, I guess you need to either embed them as data URIs or publish them to a web-reachable location.
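To make the two options concrete, here is a minimal sketch in plain Java, independent of docx4j, showing what each kind of img src looks like. The class DataUriExample, its methods, the placeholder bytes and the example.com URL are all assumptions made for illustration.

```java
import java.util.Base64;

// Hypothetical illustration; not part of docx4j.
final class DataUriExample {

    /** Builds a data URI of the form data:<mime>;base64,<payload>. */
    static String toDataUri(byte[] imageBytes, String mimeType) {
        return "data:" + mimeType + ";base64,"
                + Base64.getEncoder().encodeToString(imageBytes);
    }

    /** Builds an img element pointing at the given src. */
    static String imgTag(String src, String alt) {
        return "<img src=\"" + src + "\" alt=\"" + alt + "\"/>";
    }

    public static void main(String[] args) {
        byte[] placeholder = {1, 2, 3}; // placeholder bytes, not a real image

        // Option 1: the image travels inside the HTML itself.
        System.out.println(imgTag(toDataUri(placeholder, "image/png"), "logo"));

        // Option 2: the HTML only links to an image hosted elsewhere (made-up URL).
        System.out.println(imgTag("https://example.com/images/logo.png", "logo"));
    }
}
```

With a data URI the message is self-contained but larger; with a hosted image the HTML stays small but the recipient's client has to fetch the file.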

In either case, you'll need to write an implementation of:

public interface ConversionImageHandler {

    /**
     * @param picture
     * @param relationship of the image
     * @param part of the image, if it is an internal image, otherwise null
     * @return uri for the image we've saved, or null
     * @throws Docx4JException this exception will be logged, but not propagated
     */
    public String handleImage(AbstractWordXmlPicture picture, Relationship relationship, BinaryPart part)
            throws Docx4JException;
}

and configure docx4j to use it with htmlSettings.setImageHandler.

You can look at some of the existing implementations in the docx4j source code, and take advantage of the helper methods in AbstractConversionImageHandler (e.g. createEncodedImage if you want data URIs).
