I want to parse PDF files that are hosted on websites.
Can anyone tell me how to extract all the words (word by word) from a PDF file using Java?
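To make "word by word" concrete, here is a minimal sketch of the output step I have in mind. It assumes the text of a page is already available as a String (iText 5 has `PdfTextExtractor.getTextFromPage(reader, pageNumber)` in `com.itextpdf.text.pdf.parser` for that, but wiring it in is left out here). The class name `WordDumper`, its two methods, and the file name `words.txt` are hypothetical and only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of iText.
final class WordDumper {

    // Splits text on runs of whitespace; punctuation stays attached to its word.
    static List<String> splitWords(String text) {
        List<String> words = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    // Writes one word per line to a UTF-8 text file, replacing any existing file.
    static void writeWords(List<String> words, Path target) throws IOException {
        Files.write(target, words, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the text of one PDF page (assumption: already extracted).
        String pageText = "Studying medicine\nin the UK";
        writeWords(splitWords(pageText), Paths.get("words.txt"));
    }
}
```

Splitting on whitespace is the simplest definition of a "word"; stripping punctuation or lower-casing would be a separate step if that is needed.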
The code below extracts the content of a PDF file and writes it to another PDF file. I want the program to write it to a text file instead.

    import java.io.FileOutputStream;
    import java.io.IOException;

    import com.itextpdf.text.*;
    import com.itextpdf.text.pdf.*;

    public class pdf {
        private static String INPUTFILE = "http://www.britishcouncil.org/learning-infosheets-medicine.pdf";
        private static String OUTPUTFILE = "c:/new3.pdf";

        public static void main(String[] args) throws DocumentException, IOException {
            // Create the output PDF document.
            Document document = new Document();
            PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(OUTPUTFILE));
            document.open();

            // Open the source PDF and copy each page into the output as an image.
            PdfReader reader = new PdfReader(INPUTFILE);
            int n = reader.getNumberOfPages();
            PdfImportedPage page;
            for (int i = 1; i <= n; i++) {
                page = writer.getImportedPage(reader, i);
                Image instance = Image.getInstance(page);
                document.add(instance);
            }
            document.close();
        }
    }

Thanks in advance.