This article is the third in a series on working with PDFs in Python:
- Reading and Splitting Pages
- Adding Images and Watermarks
- Inserting, Deleting, and Reordering Pages (you are here)
Introduction
In the previous articles we introduced reading PDF documents with Python. So far you have learned how to manipulate existing PDFs and how to read and extract their content, both text and images. We have also discussed splitting a document into its individual pages, as well as adding watermarks and barcodes.
In this article we go one step further and demonstrate how to rearrange a PDF document in a few different ways:
- Deleting Pages with pdfrw
- Deleting Pages with PyMuPDF
- Inserting Pages with PyMuPDF
- Splitting Even and Odd Pages with PyPDF2
Deleting Pages with pdfrw
Deleting individual pages from a PDF file is as simple as the following:
- Read a PDF as an input file
- Write selected pages to a new PDF as an output file
The following example removes the first two pages from a PDF document. Using the pdfrw library, the file is first read with the PdfReader() class. Every page except the first and second is then added to the output with the addpage() method, and the result is finally written to disk.
Figure 1 shows the output when executing the code on a four-page PDF file.
#!/usr/bin/python
# Remove the first two pages (cover sheet) from the PDF
from pdfrw import PdfReader, PdfWriter
input_file = "example.pdf"
output_file = "example-updated.pdf"
# Define the reader and writer objects
reader_input = PdfReader(input_file)
writer_output = PdfWriter()
# Go through the pages one after the next
for current_page in range(len(reader_input.pages)):
    # Page indices are zero-based, so 0 and 1 are the first two pages
    if current_page > 1:
        writer_output.addpage(reader_input.pages[current_page])
        print("adding page %i" % (current_page + 1))
# Write the modified content to disk
writer_output.write(output_file)
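The loop above can also be written as a list slice, because pdfrw exposes the pages of a document as an ordinary Python list. The following is a minimal sketch under that assumption. The helper names drop_leading() and remove_leading_pages() are hypothetical and not part of pdfrw, and the library is imported inside the function so that the pure-Python helper can be tried without pdfrw installed.

```python
def drop_leading(items, count):
    """Return a new list without the first `count` items."""
    if count < 0:
        raise ValueError("count must not be negative")
    return list(items)[count:]


def remove_leading_pages(input_file, output_file, count=2):
    """Write a copy of input_file without its first `count` pages."""
    # Imported here so that drop_leading() works without pdfrw installed
    from pdfrw import PdfReader, PdfWriter

    reader = PdfReader(input_file)
    writer = PdfWriter()
    # addpages() adds every page of the given list in order
    writer.addpages(drop_leading(reader.pages, count))
    writer.write(output_file)


# Simulate a four-page document with plain labels instead of real pages
print(drop_leading(["p1", "p2", "p3", "p4"], 2))
```

Calling remove_leading_pages("example.pdf", "example-updated.pdf") should then produce the same file as the script above.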
Deleting Pages with PyMuPDF
The PyMuPDF library comes with several methods that simplify deleting pages from a PDF file. It allows you to specify either a single page (using the deletePage() method), a range of page numbers (using the deletePageRange() method), or a list of the page numbers to keep (using the select() method). In recent PyMuPDF releases the first two methods are named delete_page() and delete_pages(); the camelCase names used here come from older versions.
The following example will demonstrate how to use a list in order to select the pages to keep from the original document. Be aware that the pages that are not specified will not be part of the output document. In our case the output document contains the first, second, and fourth pages only.
#!/usr/bin/python
# Recall that PyMuPDF is imported as fitz
import fitz
input_file = "example.pdf"
output_file = "example-rearranged.pdf"
# Open the input document
file_handle = fitz.open(input_file)
# Define the pages to keep - 1, 2 and 4 (zero-based indices 0, 1 and 3)
pages_list = [0, 1, 3]
# Select the pages and save the output
file_handle.select(pages_list)
file_handle.save(output_file)
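Because select() keeps the pages in exactly the order given in the list, the same call can also reorder a document. The sketch below illustrates this under a few assumptions: the helper names to_indices() and rearrange() are hypothetical, page numbers are passed in the 1-based form a reader would use, and PyMuPDF is imported inside the function so that the index conversion can be tried on its own.

```python
def to_indices(page_numbers, page_count):
    """Convert 1-based page numbers to the 0-based indices select() expects."""
    indices = []
    for number in page_numbers:
        if not 1 <= number <= page_count:
            raise ValueError(f"page {number} is outside 1..{page_count}")
        indices.append(number - 1)
    return indices


def rearrange(input_file, output_file, page_numbers):
    """Save a copy of input_file with only page_numbers, in that order."""
    # Imported here so that to_indices() works without PyMuPDF installed
    import fitz

    document = fitz.open(input_file)
    # len(document) is the number of pages in the document
    document.select(to_indices(page_numbers, len(document)))
    document.save(output_file)


# Keep pages 1, 2 and 4 of a four-page file, but move page 4 to the front
print(to_indices([4, 1, 2], page_count=4))
```

With these helpers, rearrange("example.pdf", "example-rearranged.pdf", [4, 1, 2]) would write a three-page file that starts with the original fourth page.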
Inserting Pages with PyMuPDF
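The PyMuPDF library allows you to insert pages as well. It provides the methods newPage() for adding completely blank pages, insertPage() for adding a new page that can optionally be filled with text, and insertPDF() for copying pages from another document. In recent PyMuPDF releases these methods are named new_page(), insert_page(), and insert_pdf(). The next example shows how to append the pages of a different PDF document to the end of another one.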
#!/usr/bin/python
# Recall that PyMuPDF is imported as fitz
import fitz
original_pdf_path = "example.pdf"
extra_page_path = "extra-page.pdf"
output_file_path = "example-extended.pdf"
original_pdf = fitz.open(original_pdf_path)
extra_page = fitz.open(extra_page_path)
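# Append all pages of extra_page to the end of original_pdf
# (recent PyMuPDF releases name this method insert_pdf())
original_pdf.insertPDF(extra_page)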
original_pdf.save(output_file_path)
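The example above appends the extra pages at the end. To place them elsewhere, insert_pdf() accepts a start_at argument that gives the zero-based position of the first copied page in the target document, with -1 meaning "append". The following sketch assumes a PyMuPDF release that provides the snake_case method names. The helpers insert_at() and insert_document() are hypothetical: the first only models the resulting page order with plain lists, and the second wraps the library call.

```python
def insert_at(target_pages, new_pages, start_at=-1):
    """Model of page insertion: return target_pages with new_pages inserted.

    start_at is the 0-based index the first new page ends up at;
    -1 appends the new pages at the end.
    """
    target = list(target_pages)
    if start_at == -1:
        start_at = len(target)
    if not 0 <= start_at <= len(target):
        raise ValueError("start_at is out of range")
    return target[:start_at] + list(new_pages) + target[start_at:]


def insert_document(original_path, extra_path, output_path, start_at=-1):
    """Copy all pages of extra_path into original_path at start_at."""
    # Imported here so that insert_at() works without PyMuPDF installed
    import fitz

    original = fitz.open(original_path)
    extra = fitz.open(extra_path)
    original.insert_pdf(extra, start_at=start_at)
    original.save(output_path)


# Insert one extra page so that it becomes the second page
print(insert_at(["a1", "a2", "a3"], ["b1"], start_at=1))
```

For example, insert_document("example.pdf", "extra-page.pdf", "example-extended.pdf", start_at=0) would put the extra pages in front of the original ones.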
Splitting Even and Odd Pages with PyPDF2
The following example uses PyPDF2 to take a file and separate it into its even and odd pages, saving the even pages in the file even.pdf and the odd pages in odd.pdf.
This Python script starts by opening the input document and defining two output files, even.pdf and odd.pdf, as well as their corresponding writer objects pdf_writer_even and pdf_writer_odd. Next, a for-loop goes through the entire PDF file and reads one page after the other. Because the loop counter is zero-based, an even counter value corresponds to an odd page number: those pages are added to pdf_writer_odd using addPage(), and the remaining, even-numbered pages are added to pdf_writer_even. At the end the two writers are saved to disk in separate files, as defined before.
#!/usr/bin/python3
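# These names are from older PyPDF2 releases; PyPDF2 3.0 and its successor
# pypdf use PdfReader, PdfWriter, and add_page() instead
from PyPDF2 import PdfFileReader, PdfFileWriter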
pdf_document = "example.pdf"
pdf = PdfFileReader(pdf_document)
# Output files for new PDFs
output_filename_even = "even.pdf"
output_filename_odd = "odd.pdf"
pdf_writer_even = PdfFileWriter()
pdf_writer_odd = PdfFileWriter()
# Get each page and add it to the corresponding
# output file based on its page number
for page in range(pdf.getNumPages()):
    current_page = pdf.getPage(page)
    # page is zero-based: index 0 is page 1, an odd page number
    if page % 2 == 0:
        pdf_writer_odd.addPage(current_page)
    else:
        pdf_writer_even.addPage(current_page)
# Write the even pages to disk
with open(output_filename_even, "wb") as out:
    pdf_writer_even.write(out)
    print("created", output_filename_even)
# Write the odd pages to disk
with open(output_filename_odd, "wb") as out:
    pdf_writer_odd.write(out)
    print("created", output_filename_odd)
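For readers on a current installation, the same split can be sketched with pypdf, the successor to PyPDF2, where the reader and writer classes are PdfReader and PdfWriter and pages are added with add_page(). This is an illustrative variant, not the script above: the helper names split_odd_even() and write_odd_even() are hypothetical, and pypdf is imported inside the function so that the splitting logic can be tried without it.

```python
def split_odd_even(pages):
    """Return (odd, even) lists of pages, judged by 1-based page number."""
    odd, even = [], []
    for index, page in enumerate(pages):
        # index 0 is page 1, so an even index means an odd page number
        if index % 2 == 0:
            odd.append(page)
        else:
            even.append(page)
    return odd, even


def write_odd_even(input_file, odd_file="odd.pdf", even_file="even.pdf"):
    """Write the odd and even pages of input_file to two separate files."""
    # Imported here so that split_odd_even() works without pypdf installed
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(input_file)
    odd, even = split_odd_even(reader.pages)
    for pages, filename in ((odd, odd_file), (even, even_file)):
        writer = PdfWriter()
        for page in pages:
            writer.add_page(page)
        with open(filename, "wb") as out:
            writer.write(out)


# Simulate a five-page document with plain labels instead of real pages
odd, even = split_odd_even(["p1", "p2", "p3", "p4", "p5"])
print(odd)
print(even)
```

Keeping the odd/even decision in its own function makes the zero-based versus one-based distinction explicit and easy to check in isolation.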
Conclusion
Rewriting and rearranging the structure of a PDF is fairly easy with the libraries pdfrw, PyMuPDF, and PyPDF2. With just a few lines of Python code you can delete pages, separate them, and add new content.