How to extract bookmarks from a PDF file

Question

I have a large PDF file. I need its bookmarks extracted to a text file or an Excel spreadsheet, and I also need to validate them. How would I do that?

vinc17 · Accepted Answer · 2019-07-30 14:39:21Z

You can use pdftk to extract data (in particular, bookmarks) from PDF files.

Example: with pdftk 2.02,

pdftk file.pdf dump_data_utf8 | grep '^Bookmark'

outputs the list of bookmarks, four lines per bookmark, in the form:

BookmarkBegin
BookmarkTitle: <title in UTF8>
BookmarkLevel: <number>
BookmarkPageNumber: <number>

where, for instance, level 1 corresponds to sections, level 2 to subsections, and so on. Instead of dump_data_utf8, you can use dump_data, which will give you HTML/XML numeric entities for non-ASCII characters (e.g. &#232; for "è").
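Since the question asks for a text file or spreadsheet, here is a minimal Python sketch that turns the four-line records above into CSV, which Excel can open. It assumes the dump looks exactly as described in this answer; the function names parse_pdftk_bookmarks and bookmarks_to_csv are made up for illustration and are not part of pdftk.

```python
import csv
import io


def parse_pdftk_bookmarks(dump_text):
    """Collect bookmark records from `pdftk file.pdf dump_data_utf8` output.

    Assumes each record starts with a `BookmarkBegin` line followed by
    `BookmarkTitle: `, `BookmarkLevel: ` and `BookmarkPageNumber: ` lines.
    """
    bookmarks = []
    current = None
    for line in dump_text.splitlines():
        if line == "BookmarkBegin":
            current = {}
            bookmarks.append(current)
        elif line.startswith("Bookmark") and current is not None:
            # Split on the first ": " only, so titles may contain colons.
            key, _, value = line.partition(": ")
            if key == "BookmarkTitle":
                current["title"] = value
            elif key == "BookmarkLevel":
                current["level"] = int(value)
            elif key == "BookmarkPageNumber":
                current["page"] = int(value)
    return bookmarks


def bookmarks_to_csv(bookmarks):
    """Render the records as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["level", "page", "title"])
    for b in bookmarks:
        writer.writerow([b.get("level"), b.get("page"), b.get("title")])
    return out.getvalue()
```

To use it, redirect the pdftk output to a file, read that file as UTF-8, and write the result of bookmarks_to_csv(parse_pdftk_bookmarks(text)) to a .csv file.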

Note: Without the grep, you can get other interesting data, such as the metadata (creation date, author, keywords, title, etc.), the number of pages and the dimensions of each page. This pdftk utility can do other things on the PDF file(s); see its man page for a full description.
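For the validation part of the question, the page count from the full dump can be compared against the bookmark page numbers. The sketch below is only one guess at what "validate" could mean (non-empty titles, pages within range, levels that never skip a step). It assumes the dump contains a line of the form `NumberOfPages: <number>` and that each bookmark is a dict with "title", "level" and "page" keys; the function names are hypothetical.

```python
import re


def page_count(dump_text):
    """Read the page count from pdftk dump output, or None if it is absent."""
    match = re.search(r"^NumberOfPages: (\d+)\s*$", dump_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def validate_bookmarks(bookmarks, number_of_pages):
    """Return a list of human-readable problems; an empty list means no issues found.

    `bookmarks` is a list of dicts with "title", "level" and "page" keys
    (an assumed shape, chosen for this example).
    """
    problems = []
    previous_level = 0
    for index, b in enumerate(bookmarks, start=1):
        title = b.get("title", "")
        if not title.strip():
            problems.append(f"bookmark {index}: empty title")
        page = b.get("page")
        if page is None or not 1 <= page <= number_of_pages:
            problems.append(
                f"bookmark {index} ({title!r}): page {page} outside 1..{number_of_pages}"
            )
        level = b.get("level", 0)
        # A bookmark may go at most one level deeper than the one before it.
        if level < 1 or level > previous_level + 1:
            problems.append(
                f"bookmark {index} ({title!r}): level {level} after level {previous_level}"
            )
        previous_level = level
    return problems
```

Which checks actually matter depends on the document, so treat the three rules here as a starting point rather than a definition of a valid outline.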

Just a note to self: pdftk can export as well as import bookmark data to and from a text file. pdflabs.com/blog/export-and-import-pdf-bookmarks
– Prem, Commented Sep 6, 2021 at 20:12

Matthias Braun · Accepted Answer · 2020-06-29 15:26:47Z

With qpdf

This should get you started:

qpdf --json your.pdf | jq '.objects' | grep -Po 'Title": \K.*'

That command will also yield the title of the PDF, though.

Have a look at the qpdf manual regarding its JSON output.

I'm pretty sure the command can be simplified, getting rid of grep, by using jq's wildcards.
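As an alternative to the grep step, the JSON can be walked in a few lines of Python. This is a rough sketch, not a tested recipe against every qpdf version: it assumes the JSON has a top-level "outlines" array whose entries carry "title", "destpageposfrom1" and "kids" keys, which is my reading of the qpdf manual and worth checking against your own output. The function names are made up for illustration.

```python
import json


def flatten_outlines(nodes, level=1):
    """Walk the nested outline entries depth-first.

    Returns (level, page, title) tuples; page is whatever the assumed
    "destpageposfrom1" key holds, or None if the key is missing.
    """
    rows = []
    for node in nodes:
        rows.append((level, node.get("destpageposfrom1"), node.get("title")))
        # Child bookmarks are assumed to live under "kids".
        rows.extend(flatten_outlines(node.get("kids", []), level + 1))
    return rows


def outlines_from_qpdf_json(json_text):
    """Parse qpdf's JSON text and flatten its outline tree."""
    document = json.loads(json_text)
    return flatten_outlines(document.get("outlines", []))
```

Feed it the text produced by qpdf --json your.pdf. Unlike the grep approach, this only looks at the outline tree, so the title of the PDF itself should not show up in the result.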

Kudos for qpdf! A JSON output of bookmarks using `--json --json-key=outlines` could not be simpler. Easy to parse for further processing; this is what I searched for, for far too long.
– CodeBrauer, Commented Aug 25, 2020 at 16:17
Thanks! qpdf --json --json-key=outlines test.pdf > test.json works like a charm :)
– Alex G, Commented Dec 22, 2023 at 9:05

Glutanimate · Accepted Answer · 2014-07-10 23:49:50Z

You can use the CLI of jpdftweak to extract bookmarks in CSV format:

java -jar -Xmx512M jpdftweak.jar "file.pdf" -savebookmarks "bmarks.csv" /dev/null

After validating and possibly modifying the bookmark data you could load it back into the PDF file with the following command:

java -jar -Xmx512M jpdftweak.jar "file.pdf" -loadbookmarks "bmarks.csv" "file_updated.pdf"

The -Xmx512M Java parameter is optional but can help with processing larger PDF files that require more memory.

You might want to read this related Q&A as well.
