Working with PDFs in Python: Inserting, Deleting, and Reordering Pages

Introduction

This article is part three of a little series on working with PDFs in Python. The previous articles gave an introduction to reading PDF documents using Python. So far you have learned how to manipulate existing PDFs, and how to read and extract their content, both text and images. Furthermore, we have discussed splitting documents into their individual pages, as well as adding watermarks and barcodes.

Now in this article we will go one step further and demonstrate how to rearrange a PDF document in a few different ways.

Deleting Pages with pdfrw

Deleting individual pages from a PDF file is as simple as the following:

  • Read a PDF as an input file
  • Write selected pages to a new PDF as an output file

The following example removes the first two pages from a PDF document. Using the pdfrw library, the file is first read with the help of the PdfReader() class. Every page except the first and second is then added to a PdfWriter() object using the addpage() method, and the result is finally written to disk.

Figure 1 shows the output when executing the code on a four-page PDF file.

#!/usr/bin/python3
# Remove the first two pages (cover sheet) from the PDF

from pdfrw import PdfReader, PdfWriter

input_file = "example.pdf"
output_file = "example-updated.pdf"

# Define the reader and writer objects
reader_input = PdfReader(input_file)
writer_output = PdfWriter()

# Go through the pages one after the next. Indices are zero-based,
# so indices 0 and 1 (the first two pages) are skipped
for current_page in range(len(reader_input.pages)):
    if current_page > 1:
        writer_output.addpage(reader_input.pages[current_page])
        print("adding page %i" % (current_page + 1))

# Write the modified content to disk
writer_output.write(output_file)

Figure 1: Delete the first two pages from a PDF
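
The same read-then-write-selected-pages approach works for any set of pages, not just the first two. The following is a minimal sketch under a few assumptions: the helper names pages_to_keep() and delete_pages() are our own and not part of pdfrw, and page numbers are passed in one-based, the way a reader counts them. Only PdfReader, PdfWriter, addpage(), and write() come from the library, exactly as in the example above.

```python
def pages_to_keep(page_count, pages_to_delete):
    """Return the zero-based indices of the pages that survive the deletion.

    pages_to_delete holds one-based page numbers (hypothetical convention
    chosen for this sketch); numbers outside the document are ignored.
    """
    doomed = set(pages_to_delete)
    return [index for index in range(page_count) if index + 1 not in doomed]


def delete_pages(input_file, output_file, pages_to_delete):
    """Copy input_file to output_file, leaving out the given pages."""
    # Imported here so that pages_to_keep() can be used without pdfrw installed
    from pdfrw import PdfReader, PdfWriter

    reader = PdfReader(input_file)
    writer = PdfWriter()

    # Add only the surviving pages to the writer
    for index in pages_to_keep(len(reader.pages), pages_to_delete):
        writer.addpage(reader.pages[index])

    writer.write(output_file)
```

Calling delete_pages("example.pdf", "example-updated.pdf", [1, 2]) should then behave like the script above, while a list such as [2, 4] would drop the second and fourth pages instead.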

Deleting Pages with PyMuPDF

The PyMuPDF library comes with quite a few methods that simplify deleting pages from a PDF file. It allows you to specify either a single page to delete (using the deletePage() method), a range of page numbers to delete (using the deletePageRange() method), or a list of the page numbers to keep (using the select() method). Newer PyMuPDF releases spell the first two in snake_case, as delete_page() and delete_pages().

The following example demonstrates how to use a list in order to select the pages to keep from the original document. Be aware that the pages that are not specified will not be part of the output document. Page numbers in PyMuPDF are zero-based, so the list [0, 1, 3] means that the output document contains the first, second, and fourth pages only.

#!/usr/bin/python3

# Recall that PyMuPDF is imported as fitz
import fitz

input_file = "example.pdf"
output_file = "example-rearranged.pdf"

file_handle = fitz.open(input_file)

# Define the pages to keep - 1, 2 and 4 (as zero-based indices)
pages_list = [0, 1, 3]

# Select the pages and save the output
file_handle.select(pages_list)
file_handle.save(output_file)
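
The list passed to select() does not have to be in ascending order, which makes the same method usable for reordering pages. The sketch below illustrates this under some assumptions: the helper names reversed_order(), move_page(), and reorder_pdf() are hypothetical and exist only for this example, while fitz.open(), select(), and save() are used exactly as in the script above.

```python
def reversed_order(page_count):
    """Zero-based page indices from the last page down to the first."""
    return list(range(page_count - 1, -1, -1))


def move_page(page_count, source, target):
    """Page order that moves the page at index source to index target.

    Both positions are zero-based; all other pages keep their relative order.
    """
    order = list(range(page_count))
    order.insert(target, order.pop(source))
    return order


def reorder_pdf(input_file, output_file, order):
    """Save a copy of input_file whose pages follow the given order."""
    # Imported here so the two index helpers work without PyMuPDF installed
    import fitz

    document = fitz.open(input_file)
    document.select(order)
    document.save(output_file)
```

For a four-page document, reorder_pdf("example.pdf", "example-rearranged.pdf", move_page(4, 3, 0)) would bring the last page to the front, and reversed_order(4) would flip the whole document.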

Inserting Pages with PyMuPDF

The PyMuPDF library allows you to insert pages as well. It provides the method newPage() for adding completely blank pages, and insertPDF() for adding pages from another PDF document (new_page() and insert_pdf() in newer releases). The next example shows how to add the pages of a different PDF document at the end of another one.

#!/usr/bin/python3

# Recall that PyMuPDF is imported as fitz
import fitz

original_pdf_path = "example.pdf"
extra_page_path = "extra-page.pdf"
output_file_path = "example-extended.pdf"

original_pdf = fitz.open(original_pdf_path)
extra_page = fitz.open(extra_page_path)

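# Append all pages of the extra document to the original one.
# Older PyMuPDF releases spell this method insertPDF()
original_pdf.insert_pdf(extra_page)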
original_pdf.save(output_file_path)
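
Appending is only the default behaviour. The method also accepts a start_at argument, the zero-based position that the first inserted page takes in the target document. The following sketch is an illustration rather than a definitive recipe: start_index() and insert_after() are hypothetical helper names, and the getattr() lookup is our own workaround for the two spellings of the method across PyMuPDF versions.

```python
def start_index(page_count, after_page):
    """Translate "insert after one-based page N" into a start_at value.

    after_page=0 inserts at the very front; a value at or beyond the
    last page returns -1, which means "append at the end".
    """
    if after_page < 0:
        raise ValueError("after_page must not be negative")
    if after_page >= page_count:
        return -1
    return after_page


def insert_after(original_path, extra_path, output_path, after_page):
    """Insert all pages of extra_path after the given page of original_path."""
    # Imported here so that start_index() works without PyMuPDF installed
    import fitz

    original_pdf = fitz.open(original_path)
    extra_pdf = fitz.open(extra_path)

    # Newer releases name the method insert_pdf(), older ones insertPDF()
    insert = getattr(original_pdf, "insert_pdf", None) or original_pdf.insertPDF
    insert(extra_pdf, start_at=start_index(len(original_pdf), after_page))

    original_pdf.save(output_path)
```

With these helpers, insert_after("example.pdf", "extra-page.pdf", "example-extended.pdf", 1) would place the extra pages directly behind the first page instead of at the end.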

Splitting Even and Odd Pages with PyPDF2

The following example uses PyPDF2 to take a file and separate it into its even and odd pages, saving the even pages in the file even.pdf, and the odd pages in odd.pdf.

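This Python script starts with the definition of two output files, even.pdf and odd.pdf, as well as their corresponding writer objects pdf_writer_even and pdf_writer_odd. Next, in a for-loop the script goes through the entire PDF file, and reads one page after the other. Because getPage() counts from zero, an even index belongs to an odd page number: index 0 is page 1. Pages with even page numbers are added to the writer pdf_writer_even using addPage(), and pages with odd page numbers are added to the writer pdf_writer_odd. At the end the two writers are saved to disk in separate files, as defined before. Note that these names belong to PyPDF2 releases before 3.0; from 3.0 on the classes are called PdfReader and PdfWriter, and the method is add_page().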

#!/usr/bin/python3

from PyPDF2 import PdfFileReader, PdfFileWriter

pdf_document = "example.pdf"
pdf = PdfFileReader(pdf_document)

# Output files for new PDFs
output_filename_even = "even.pdf"
output_filename_odd = "odd.pdf"

pdf_writer_even = PdfFileWriter()
pdf_writer_odd = PdfFileWriter()

# Get each page and add it to the corresponding output file.
# The index is zero-based, so an even index is an odd page number
for page in range(pdf.getNumPages()):
    current_page = pdf.getPage(page)
    if page % 2 == 0:
        pdf_writer_odd.addPage(current_page)
    else:
        pdf_writer_even.addPage(current_page)

# Write the data to disk
with open(output_filename_even, "wb") as out:
    pdf_writer_even.write(out)
    print("created", output_filename_even)

# Write the data to disk
with open(output_filename_odd, "wb") as out:
    pdf_writer_odd.write(out)
    print("created", output_filename_odd)

Conclusion

Re-writing and re-arranging the structure of a PDF is fairly easy with the libraries pdfrw, PyMuPDF, and PyPDF2. With just a few lines of Python code you can delete pages, separate them, and add new content.

IT developer, trainer, and author. Coauthor of the Debian Package Management Book (http://www.dpmb.org/).