https://github.com/dbrizard/add-pdf-contents

Add bookmarks or an outline (the two terms are equivalent) to PDF or DJVU files to make it easier to navigate their contents

Keywords

bookmarks contents djvu outline pdf

Repository

Basic Info
  • Host: GitHub
  • Owner: dbrizard
  • License: agpl-3.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 57.6 KB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 5 years ago · Last pushed over 1 year ago

README.md

Add bookmarks/outline to PDF or DJVU files to better scroll through their contents.

These small tools simplify the addition of bookmarks/contents to PDF or DJVU files.

Installation

Python module

Put the Python module contents.py somewhere on your Python path, or modify the Python path so that the module can be found.
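
If you would rather not move the file, one way to extend the search path from within a Python session is sketched below. The folder name is a placeholder chosen for illustration, not something defined by this repository. Setting the PYTHONPATH environment variable is an alternative.

```python
import sys
from pathlib import Path

# Hypothetical folder holding contents.py; replace it with your own location.
module_dir = Path.home() / "tools" / "add-pdf-contents"

# Prepend the folder so that "from contents import Contents" can find the module.
if str(module_dir) not in sys.path:
    sys.path.insert(0, str(module_dir))
```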

PDF manipulating tool

Install either CPDF or PDFTK.

DJVU tool

Make sure you have djvused installed (via DjVuLibre).

Shell script

Put the shell scripts (.sh files) in an accessible folder, for example one on your PATH such as ~/bin/.

In case you only have PDFTK, the second part of the script has to be modified to use PDFTK instead of CPDF.

Usage

Write the contents of the pdf/djvu file

First, manually write the table of contents of the PDF file you want to bookmark in a text file (e.g. contents.txt):

  • one line per bookmark;
  • blank lines are ignored;
  • the page number must be at the end of the line, separated from the title by whitespace (other separators are possible but not tested yet);
  • indentation is functional: the depth of the bookmark is proportional to the number of whitespace characters at the beginning of the line;
  • page offsets are given in the form +10 or -8, each on its own line;
  • if you use CPDF, the character " is reserved and cannot be used (see generated file contents.bmk).
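
The rules above can be illustrated with a small sample file and a minimal parser. This is only a sketch, not the code of contents.py. It assumes that one leading space equals one level of depth, and that an offset line applies to every following entry until the next offset line replaces it. The names SAMPLE and parse_contents are made up for this example.

```python
import re

# Sample contents.txt: the first entry uses the raw page number,
# and the "+10" line shifts every entry that follows it.
SAMPLE = """\
Preface 3
+10
Chapter 1 Introduction 1
 1.1 Motivation 2
 1.2 Overview 5

Chapter 2 Methods 9
"""

OFFSET_RE = re.compile(r"^[+-]\d+$")


def parse_contents(text, indent_unit=1):
    """Return a list of (depth, title, page) tuples."""
    entries = []
    offset = 0
    for raw in text.splitlines():
        stripped = raw.strip()
        if not stripped:
            continue  # blank lines are ignored
        if OFFSET_RE.match(stripped):
            offset = int(stripped)  # assumption: replaces the previous offset
            continue
        # The page number is the last whitespace-separated token.
        title, page = stripped.rsplit(None, 1)
        depth = (len(raw) - len(raw.lstrip(" "))) // indent_unit
        entries.append((depth, title, int(page) + offset))
    return entries


for entry in parse_contents(SAMPLE):
    print(entry)
```

A line without a trailing page number would raise an error in this sketch. The real module may handle such cases differently.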

Convert contents.txt into contents.bmk

The Python module contents.py converts the contents file into the bookmark format expected by the chosen tool (CPDF or PDFTK). By default, it reads contents.txt and writes contents.bmk.

The simplest call is therefore, in a Python terminal:

from contents import Contents
Contents().write4CPDF()

See the Python module for the following options:

  • page number offset;
  • open bookmarks or not (CPDF);
  • debug option in case writing the contents.bmk file fails.
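
For orientation, a CPDF bookmark file holds one line per bookmark: the level, the title in double quotes, the page number, and optionally the word open. Because the title is delimited by double quotes, the " character is reserved. The sketch below writes that format from (depth, title, page) tuples. It is an illustration only, not the code of contents.py, and the function name to_cpdf_bookmarks is hypothetical.

```python
def to_cpdf_bookmarks(entries, open_bookmarks=False):
    """Format (depth, title, page) tuples as CPDF bookmark lines."""
    lines = []
    for depth, title, page in entries:
        if '"' in title:
            # The title is wrapped in double quotes, so it cannot contain one.
            raise ValueError(f'double quote not allowed in title: {title!r}')
        suffix = " open" if open_bookmarks else ""
        lines.append(f'{depth} "{title}" {page}{suffix}')
    return "\n".join(lines) + "\n"


print(to_cpdf_bookmarks([(0, "Chapter 1", 11), (1, "1.1 Motivation", 12)]))
```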

Add the bookmarks to the file

This now takes place in a terminal.

With CPDF:

cpdf -add-bookmarks contents.bmk in.pdf -o out.pdf
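
If you prefer to stay in Python, the same command can be launched with subprocess. This is a minimal sketch that assumes cpdf is on your PATH. The helper names cpdf_command and add_bookmarks are made up for this example.

```python
import shutil
import subprocess


def cpdf_command(pdf_in, pdf_out, bookmarks="contents.bmk"):
    """Build the argument list for the cpdf call shown above."""
    return ["cpdf", "-add-bookmarks", bookmarks, pdf_in, "-o", pdf_out]


def add_bookmarks(pdf_in, pdf_out, bookmarks="contents.bmk"):
    """Run cpdf and raise an error if it is missing or fails."""
    if shutil.which("cpdf") is None:
        raise FileNotFoundError("cpdf was not found on PATH")
    subprocess.run(cpdf_command(pdf_in, pdf_out, bookmarks), check=True)
```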

With PDFTK:

First, get the metadata of the PDF file:

pdftk file.pdf dump_data > metadata.txt

Then, modify the metadata file by inserting the contents of contents.bmk after the line containing the keyword NumberOfPages:.

Finally, update the PDF metadata:

pdftk file.pdf update_info metadata.txt output newfile.pdf
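
The manual editing step can also be scripted. The sketch below inserts a block of bookmark text right after the NumberOfPages: line of a metadata dump. It is not part of this repository, and the function name insert_bookmarks is hypothetical. The sample strings only imitate the general shape of PDFTK output.

```python
def insert_bookmarks(metadata, bookmarks):
    """Return metadata with bookmarks inserted after the NumberOfPages: line."""
    out = []
    inserted = False
    for line in metadata.splitlines(keepends=True):
        if not line.endswith("\n"):
            line += "\n"  # keep the inserted block on its own lines
        out.append(line)
        if not inserted and line.startswith("NumberOfPages:"):
            out.append(bookmarks)
            inserted = True
    if not inserted:
        raise ValueError("no NumberOfPages: line found in the metadata")
    return "".join(out)


metadata = "InfoBegin\nInfoKey: Title\nInfoValue: Book\nNumberOfPages: 120\nPageMediaBegin\n"
bmk = "BookmarkBegin\nBookmarkTitle: Chapter 1\nBookmarkLevel: 1\nBookmarkPageNumber: 11\n"
print(insert_bookmarks(metadata, bmk))
```

In practice you would read metadata.txt and contents.bmk from disk, call the function, and write the result back before running update_info.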

With DJVU files

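Use the following two commands. The first prints the current outline; the second sets the outline from contents.bmk and saves the file:

djvused -e print-outline book.djvu
djvused -s -e 'set-outline contents.bmk' book.djvu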
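
For reference, djvused expects the outline as a nested expression of the form (bookmarks ("title" "#page" ...children)). The sketch below builds such an expression from (depth, title, page) tuples. It is an illustration under assumptions, not the code of contents.py: the function name to_djvu_outline is hypothetical, and the backslash escaping of quotes in titles is an assumption to check against the djvused documentation.

```python
def to_djvu_outline(entries):
    """Build a djvused outline expression from (depth, title, page) tuples."""
    lines = ["(bookmarks"]
    stack = []  # depths of the entries that are still open
    for depth, title, page in entries:
        # Close every open entry that is not a parent of the new one.
        while stack and stack[-1] >= depth:
            stack.pop()
            lines[-1] += ")"
        indent = " " * (len(stack) + 1)
        # Assumption: quotes and backslashes are escaped with a backslash.
        escaped = title.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{indent}("{escaped}" "#{page}"')
        stack.append(depth)
    # Close the remaining entries, then the top-level (bookmarks list.
    lines[-1] += ")" * len(stack) + ")"
    return "\n".join(lines) + "\n"


print(to_djvu_outline([(0, "A", 1), (1, "B", 2), (0, "C", 3)]))
```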

The shell scripts

The shell script addpdfcontents.sh adds the bookmarks to a pdf file with CPDF in a single command, provided the contents.txt file is in the current directory:

addpdfcontents.sh file.pdf

The shell script adddjvucontents.sh does the same on djvu files, using djvused.

The third shell script, watchpdfcontents.sh, adds the bookmarks to the pdf file each time the file contents.txt is modified. This can be useful to see the resulting pdf file while typing the contents file.
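
The idea of re-running on every change can be sketched in Python by polling the modification time of the file. This is an assumption about one possible approach, not a description of how watchpdfcontents.sh works. The names has_changed and watch are made up for this example.

```python
import os
import time


def has_changed(path, last_mtime_ns):
    """Return (changed, current_mtime_ns) for the file at path."""
    current = os.stat(path).st_mtime_ns
    return current != last_mtime_ns, current


def watch(path, on_change, interval=1.0):
    """Call on_change(path) whenever the file is modified (runs forever)."""
    _, last = has_changed(path, None)
    while True:
        time.sleep(interval)
        changed, last = has_changed(path, last)
        if changed:
            on_change(path)
```

A call such as watch("contents.txt", rebuild) would then trigger a user-supplied rebuild function after each save. Stop it with Ctrl+C.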

Last, the autoindentcontents.sh script performs automatic indentation of a text file containing the table of contents:

  • works only for "1.2.3 Subsection title"-like TOC;
  • also has a -d option to remove lines of dots ("........" often present between title and page number).
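
The indentation rule can be illustrated in Python. This sketch is not the shell script itself. It assumes that the depth equals the number of dots in the leading section number, and that a run of four or more dots (optionally spaced) is a leader to remove. The name auto_indent is hypothetical.

```python
import re

# Leading section number such as "1", "1.2" or "1.2.3" (optional final dot).
NUMBER_RE = re.compile(r"^\s*(\d+(?:\.\d+)*)\.?\s")
# Dot leaders such as "........" or ". . . . ." with surrounding spaces.
DOTS_RE = re.compile(r"\s*(?:\.\s?){4,}\s*")


def auto_indent(line, remove_dots=False, indent=" "):
    """Indent one TOC line according to the depth of its section number."""
    text = line.strip()
    if remove_dots:
        text = DOTS_RE.sub(" ", text)
    match = NUMBER_RE.match(text)
    depth = match.group(1).count(".") if match else 0
    return indent * depth + text


print(auto_indent("1.2.3 Subsection title 42"))
print(auto_indent("2.1 Setup . . . . . 7", remove_dots=True))
```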

Other similar tools

  • pdfoutline is very similar. It relies on ghostscript to add the outline to the pdf. I had not found this tool when I developed the present one. It is very slow, but the page number offset can be given anywhere in the contents file (this functionality is now also available here). It can also significantly increase the size of the pdf (I have yet to find out why and in which cases precisely);
  • pdfoutliner seems a good option (not tested, only recently discovered), relies on pdftk;
  • simple-PDF-outline-adder also uses ghostscript. The main drawback is that the outline text file must have, on each line, the page numbers BEFORE the title text;
  • pdfoutline is Haskell based and requires the level of each entry to be written explicitly (it is not determined from the indentation of the text file);
  • doc-tools-toc is a tool to manage table of contents (TOC) of pdf and djvu files with Emacs.

See also

  • QuickOutline has a GUI and seems to be very powerful (automatically creates the outline from the OCR of the outline);
  • HandyOutliner also treats PDF and DJVU files, GUI based.

Owner

  • Login: dbrizard
  • Kind: user
  • Location: Lyon, France
  • Company: @ifsttar

Research fellow, Biomechanics and impact mechanics laboratory (Laboratoire de Biomécanique et Mécanique des Chocs). Wave propagation, SHPB, UQ, GSA
