Java: how to convert an MS Word file to PDF using Apache POI?

How can I convert an MS Word file to PDF using Apache POI? I am using the following code, but it is not working and gives errors. I guess I am importing the wrong classes?

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.OutputStream;

    import org.apache.poi.hslf.record.Document;
    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.extractor.WordExtractor;
    import org.apache.poi.hwpf.usermodel.Paragraph;
    import org.apache.poi.hwpf.usermodel.Range;
    import org.apache.poi.poifs.filesystem.POIFSFileSystem;

    public class TestCon {

        /**
         * @param args
         */
        public static void main(String[] args) {
            // TODO Auto-generated method stub
            POIFSFileSystem fs = null;
            Document document = new Document();

            try {
                System.out.println("Starting the test");
                fs = new POIFSFileSystem(new FileInputStream("/document/test2.doc"));

                HWPFDocument doc = new HWPFDocument(fs);
                WordExtractor we = new WordExtractor(doc);

                OutputStream file = new FileOutputStream(new File("/document/test.pdf"));

                PdfWriter writer = PdfWriter.getInstance(document, file);

                Range range = doc.getRange();
                document.open();
                writer.setPageEmpty(true);
                document.newPage();
                writer.setPageEmpty(true);

                String[] paragraphs = we.getParagraphText();
                for (int i = 0; i < paragraphs.length; i++) {
                    org.apache.poi.hwpf.usermodel.Paragraph pr = range.getParagraph(i);
                    // CharacterRun run = pr.getCharacterRun(i);
                    // run.setBold(true);
                    // run.setCapitalized(true);
                    // run.setItalic(true);
                    paragraphs[i] = paragraphs[i].replaceAll("\\cM?\r?\n", "");
                    System.out.println("Length:" + paragraphs[i].length());
                    System.out.println("Paragraph" + i + ": " + paragraphs[i].toString());

                    // add the paragraph to the document
                    document.add(new Paragraph(paragraphs[i]));
                }

                System.out.println("Document testing completed");
            } catch (Exception e) {
                System.out.println("Exception during test");
                e.printStackTrace();
            } finally {
                // close the document
                document.close();
            }
        }
    }

asked Jun 1, 2011 at 13:16

Hello Denis, when I try to convert a Word file to PDF I get errors on the following imports: import com.lowagie.text.Document; import com.lowagie.text.DocumentException; import com.lowagie.text.Paragraph; import com.lowagie.text.pdf.PdfWriter; Please tell me which library I forgot to add and, if possible, give me a download link.

Commented Aug 8, 2011 at 11:36

8 Answers

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.OutputStream;

    import com.lowagie.text.Document;
    import com.lowagie.text.DocumentException;
    import com.lowagie.text.Paragraph;
    import com.lowagie.text.pdf.PdfWriter;

    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.extractor.WordExtractor;
    import org.apache.poi.hwpf.usermodel.Range;
    import org.apache.poi.poifs.filesystem.POIFSFileSystem;

    public class TestCon {

        /**
         * @param args
         */
        public static void main(String[] args) {
            // TODO Auto-generated method stub
            POIFSFileSystem fs = null;
            Document document = new Document();

            try {
                System.out.println("Starting the test");
                fs = new POIFSFileSystem(new FileInputStream("D:/Resume.doc"));

                HWPFDocument doc = new HWPFDocument(fs);
                WordExtractor we = new WordExtractor(doc);

                OutputStream file = new FileOutputStream(new File("D:/test.pdf"));

                PdfWriter writer = PdfWriter.getInstance(document, file);

                Range range = doc.getRange();
                document.open();
                writer.setPageEmpty(true);
                document.newPage();
                writer.setPageEmpty(true);

                String[] paragraphs = we.getParagraphText();
                for (int i = 0; i < paragraphs.length; i++) {
                    org.apache.poi.hwpf.usermodel.Paragraph pr = range.getParagraph(i);
                    // CharacterRun run = pr.getCharacterRun(i);
                    // run.setBold(true);
                    // run.setCapitalized(true);
                    // run.setItalic(true);
                    paragraphs[i] = paragraphs[i].replaceAll("\\cM?\r?\n", "");
                    System.out.println("Length:" + paragraphs[i].length());
                    System.out.println("Paragraph" + i + ": " + paragraphs[i].toString());

                    // add the paragraph to the document
                    document.add(new Paragraph(paragraphs[i]));
                }

                System.out.println("Document testing completed");
            } catch (Exception e) {
                System.out.println("Exception during test");
                e.printStackTrace();
            } finally {
                // close the document
                document.close();
            }
        }
    }

answered Jun 2, 2011 at 5:14
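The one non-obvious line in the answer above is the `replaceAll("\\cM?\r?\n", "")` call. Here is a minimal stdlib-only sketch of what that pattern does, assuming each string returned by `getParagraphText()` ends in a CR/LF pair (an assumption about POI's output, not something the answer states). The class and method names are hypothetical.

```java
/** Sketch: what the answer's line-ending pattern removes (names are hypothetical). */
final class ParagraphEndings {

    /**
     * Applies the same pattern as the answer. In Java regex syntax \cM is
     * Ctrl-M, i.e. a carriage return, so the pattern reads:
     * optional CR, optional CR, then a mandatory LF.
     */
    static String stripLineEnding(String paragraph) {
        return paragraph.replaceAll("\\cM?\r?\n", "");
    }

    public static void main(String[] args) {
        String raw = "First paragraph\r\n";
        String clean = stripLineEnding(raw);
        // The CR/LF pair is gone, so the text is two characters shorter.
        System.out.println(raw.length() + " -> " + clean.length());

        // A paragraph that ends in a bare CR has no LF, so nothing matches.
        System.out.println(stripLineEnding("bare\r").length());
    }
}
```

Because the LF is mandatory in the pattern, text that ends in a bare carriage return passes through unchanged; if that matters for your documents, `String.strip()` or a broader pattern may be a better fit.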

@Harinder I am trying to run this sample (converting doc to pdf) and I am getting this exception: java.lang.NullPointerException: Attempt to invoke interface method 'org.w3c.dom.Node org.w3c.dom.Node.removeChild(org.w3c.dom.Node)' on a null object reference. Were you able to run this successfully on the Android platform?

Commented May 16, 2018 at 3:56

Hi @Harinder, as a long-term user of Stack Overflow you would know that code-only answers are discouraged here. Could you edit your answer to explain why this answers the question? It'll help teach others rather than just encouraging copy-paste coding. Thanks very much :-)

Commented Jan 3, 2019 at 18:28

As @VetrivelPS mentioned, adding more information to this answer would have been useful, especially the versions of the APIs used.

Commented Aug 6, 2019 at 20:50

It removes all headings and ignores images inside doc files.

Commented Oct 21, 2019 at 5:49

@cody123 Of course it does, because it attempts to fix the very specific case the OP had and doesn't even try to help future visitors. This answer is most likely useless for anyone stumbling on it, but luckily other answers point in the right direction.

Commented May 31 at 13:27

Version 2.0.6 of the converter used below supports POI 5.x.x.

    package pdf;

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.OutputStream;

    import fr.opensagres.poi.xwpf.converter.pdf.PdfConverter;
    import fr.opensagres.poi.xwpf.converter.pdf.PdfOptions;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;

    public class PDF {
        public static void main(String[] args) throws Exception {
            String inputFile = "D:/TEST.docx";
            String outputFile = "D:/TEST.pdf";
            if (args != null && args.length == 2) {
                inputFile = args[0];
                outputFile = args[1];
            }
            System.out.println("inputFile:" + inputFile + ",outputFile:" + outputFile);
            FileInputStream in = new FileInputStream(inputFile);
            XWPFDocument document = new XWPFDocument(in);
            File outFile = new File(outputFile);
            OutputStream out = new FileOutputStream(outFile);
            PdfOptions options = null;
            PdfConverter.getInstance().convert(document, out, options);
        }
    }

answered Apr 18, 2017 at 12:32 Kushagra Sahni

org.apache.poi.xwpf.converter.pdf.PdfConverter (and PdfOptions) is not part of Apache POI but of xDocReport, which misused the Apache POI namespace. See github.com/opensagres/xdocreport/issues/174. Nowadays their PdfConverter is in the package fr.opensagres.odfdom.converter.pdf.

Commented Nov 6, 2017 at 16:55

Hi @Kushagra Sahni, as a new user of Stack Overflow you may not know that code-only answers are discouraged here. Could you edit your answer to explain why this answers the question? It'll help teach others rather than just encouraging copy-paste coding. Thanks very much :-)

Commented Jan 3, 2019 at 18:29

Getting this error: java.lang.NoClassDefFoundError: org/apache/poi/POIXMLDocumentPart

Commented Apr 7, 2022 at 19:51

It's currently fr.opensagres.poi.xwpf.converter.pdf.PdfConverter

Commented May 31 at 13:28

In addition to Kushagra's answer, here are the updated Maven dependencies:
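The NoClassDefFoundError for org/apache/poi/POIXMLDocumentPart reported in the comments usually points to a version mismatch: as far as I know, POI 4.0 moved that class into the org.apache.poi.ooxml package, so a converter built against the older layout cannot find it. A small diagnostic sketch, using only the JDK, that reports which layout is visible on the classpath (the class name `PoiClasspathCheck` and its messages are hypothetical):

```java
/** Sketch: report which POI OOXML package layout is on the classpath. */
final class PoiClasspathCheck {

    /** Returns true if the named class can be located, without initializing it. */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, PoiClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Describes the detected layout in one line. */
    static String describe() {
        // Assumption: POI 4.0 and later ship the class in org.apache.poi.ooxml.
        if (isPresent("org.apache.poi.ooxml.POIXMLDocumentPart")) {
            return "newer layout: org.apache.poi.ooxml.POIXMLDocumentPart";
        }
        // Older releases kept it directly in org.apache.poi.
        if (isPresent("org.apache.poi.POIXMLDocumentPart")) {
            return "older layout: org.apache.poi.POIXMLDocumentPart";
        }
        return "poi-ooxml not found on the classpath";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the newer layout is reported while the error names the older class, the converter artifact is likely the outdated one, and switching to a converter release that matches the POI version on the classpath is the usual remedy.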

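    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>fr.opensagres.xdocreport.converter.docx.xwpf</artifactId>
        <version>2.0.1</version>
    </dependency>
    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>fr.opensagres.xdocreport.converter</artifactId>
        <version>2.0.1</version>
    </dependency>
    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>fr.opensagres.poi.xwpf.converter.pdf</artifactId>
        <version>2.0.1</version>
    </dependency>
    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>fr.opensagres.poi.xwpf.converter.xhtml</artifactId>
        <version>2.0.1</version>
    </dependency>

answered May 24, 2018 at 9:35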

Hi @Erich13, code-only answers are discouraged here, could you edit your answer to explain why this answers the question? It'll help teach others rather than just encouraging copy-paste coding. Thanks very much :-)

Commented Jan 3, 2019 at 18:27

The below code worked for me:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.OutputStream;

    // Assumed: the iText 2.x (com.lowagie) classes used elsewhere in this thread
    import com.lowagie.text.Document;
    import com.lowagie.text.Paragraph;
    import com.lowagie.text.pdf.PdfWriter;

    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.extractor.WordExtractor;
    import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;

    public class DocToPdfConverter {
        public static void main(String[] args) {
            String k = null;
            OutputStream fileForPdf = null;
            try {
                String fileName = "/document/test2.doc";
                // Below code is for .doc files
                if (fileName.endsWith(".doc")) {
                    HWPFDocument doc = new HWPFDocument(new FileInputStream(fileName));
                    WordExtractor we = new WordExtractor(doc);
                    k = we.getText();
                    fileForPdf = new FileOutputStream(new File("/document/DocToPdf.pdf"));
                    we.close();
                }
                // Below code is for .docx files
                else if (fileName.endsWith(".docx")) {
                    XWPFDocument docx = new XWPFDocument(new FileInputStream(fileName));
                    // using XWPFWordExtractor class
                    XWPFWordExtractor we = new XWPFWordExtractor(docx);
                    k = we.getText();
                    fileForPdf = new FileOutputStream(new File("/document/DocxToPdf.pdf"));
                    we.close();
                }

                // Write the extracted plain text into a single PDF paragraph
                Document document = new Document();
                PdfWriter.getInstance(document, fileForPdf);
                document.open();
                document.add(new Paragraph(k));
                document.close();
                fileForPdf.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

answered Aug 12, 2016 at 7:23 Rohit Dubey

Hello, welcome to StackOverflow and thank you for your answer. When posting code, please indent it by 4 characters (or use the code-formatting button on the toolbar) to ensure it displays as code (I've suggested an edit for you to fix that). Also, as code-only answers are discouraged here, could you edit your answer to explain why this answers the question? It'll help teach others rather than just encouraging copy-paste coding. Thanks very much!

Commented Aug 12, 2016 at 7:46

There are several steps here:

  1. Read Word document using POI into a format-agnostic form
  2. Convert format-agnostic form into PDF
  3. Write PDF

I don't know if POI will do step 2 for you. I'd recommend something else, like iText.

answered Jun 1, 2011 at 13:19

The code in your initial post didn't mention the lowagie/iText packages, and I was puzzled as to where to find anything PDF-related in the POI library. Duffymo is correct in the steps he listed. In a similar situation I use 'WordML' (the Word 2003 XML format), which is transformed into FO and then rendered using Apache FOP. There are other possibilities, including the OpenOffice API. Search through StackOverflow and you'll find plenty of questions and answers about Office2PDF.

Commented Jun 6, 2011 at 10:59

As a side note, it's also possible to read content on-the-fly directly from a Word/Excel content stream instead of reading it from the filesystem and serializing it to disk, for example when retrieving content from CMIS repositories:

    //HWPFDocument docx = new HWPFDocument(fs);
    HWPFDocument docx = new HWPFDocument(doc.getContentStream().getStream());

(doc is of type org.apache.chemistry.opencmis.client.api.Document and in this case I adapted your code to retrieve a word file from an Alfresco repository by means of opencmis and transformed it to PDF)

answered Sep 6, 2012 at 16:46

This saved my day. I load a docx file from a URL and convert it to PDF:

pom.xml

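    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi</artifactId>
        <version>3.13</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml</artifactId>
        <version>3.13</version>
    </dependency>
    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>org.apache.poi.xwpf.converter.pdf</artifactId>
        <version>LATEST</version>
    </dependency>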

main_class

public String wordToPDFPOI(String url) throws Exception
answered Oct 18, 2019 at 7:39

All of the answers above will fail if the document has images. I would not suggest using Apache POI, since its library for converting Word to PDF has been discontinued. As of today I don't think there is any open source library that does the conversion on its own (they need extra dependencies; some require MS Word to be installed, etc.). The best way I can think of (it will only work if you are deploying the project on a Linux machine) is to install LibreOffice (open source) on the Linux machine and run this:

    String command = "libreoffice --headless --convert-to pdf " + inputPath + " --outdir " + outputPath;
    try {
        Runtime.getRuntime().exec(command);
    } catch (IOException e) {
        e.printStackTrace();
    }

answered Sep 14, 2022 at 3:32 Anmol Jain
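Two things are worth tightening in the LibreOffice snippet above: `Runtime.exec(String)` splits the command on whitespace, so paths containing spaces break, and the call returns before the conversion has finished. Below is a minimal sketch using `ProcessBuilder` that passes each argument separately and waits for the process to exit. It reuses the flags from the answer; the class name, method names, and the 120-second timeout are hypothetical, and it assumes the `libreoffice` binary is on the PATH (on some installations it is called `soffice`).

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Sketch: run LibreOffice headless to convert one document to PDF. */
final class LibreOfficePdf {

    /** Builds the argument list; each path is its own argument, so spaces are safe. */
    static List<String> buildCommand(String inputPath, String outputDir) {
        return List.of("libreoffice", "--headless",
                "--convert-to", "pdf", inputPath,
                "--outdir", outputDir);
    }

    /** Runs the conversion; returns true if the process exits with code 0 in time. */
    static boolean convert(String inputPath, String outputDir, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(inputPath, outputDir))
                .redirectErrorStream(true)
                // Discard output so a full pipe buffer cannot stall the child (Java 9+).
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly(); // give up on a hung conversion
            return false;
        }
        return process.exitValue() == 0;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: LibreOfficePdf <input-file> <output-dir>");
            return;
        }
        System.out.println(convert(args[0], args[1], 120) ? "converted" : "failed");
    }
}
```

With this shape, the caller only looks for the PDF in the output directory after `convert` has returned true, instead of racing against a conversion that is still running.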
