This article is the third in a series on working with PDFs in Python:
- Reading and Splitting Pages
- Adding Images and Watermarks
- Inserting, Deleting, and Reordering Pages (you are here)
In the previous articles we introduced reading PDF documents with Python. So far you have learned how to manipulate existing PDFs and how to read and extract their content, both text and images. We have also discussed splitting documents into single pages, as well as adding watermarks and barcodes.
In this article we go one step further and demonstrate several ways to rearrange a PDF document:
- Deleting Pages with pdfrw
- Deleting Pages with PyMuPDF
- Inserting Pages with PyMuPDF
- Splitting Even and Odd Pages with PyPDF2
Deleting Pages with pdfrw
Deleting individual pages from a PDF file is as simple as the following:
- Read a PDF as an input file
- Write selected pages to a new PDF as an output file
The following example removes the first two pages from a PDF document. Using the pdfrw library, the file is first read with the PdfReader() class. Every page except the first and second is then added to the output using the addpage() method, and the result is finally written to disk.
Figure 1 shows the output when executing the code on a four-page PDF file.
    #!/usr/bin/python
    # Remove the first two pages (cover sheet) from the PDF

    from pdfrw import PdfReader, PdfWriter

    input_file = "example.pdf"
    output_file = "example-updated.pdf"

    # Define the reader and writer objects
    reader_input = PdfReader(input_file)
    writer_output = PdfWriter()

    # Go through the pages one after the next
    for current_page in range(len(reader_input.pages)):
        if current_page > 1:
            writer_output.addpage(reader_input.pages[current_page])
            print("adding page %i" % (current_page + 1))

    # Write the modified content to disk
    writer_output.write(output_file)
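The loop above hard-codes the first two pages. As a minimal sketch, the same read, filter, and write idea can be wrapped in a function that accepts any collection of one-based page numbers to drop. The names `pages_to_keep` and `remove_pages` are made up for this illustration and are not part of pdfrw; only `PdfReader`, `PdfWriter`, `addpage()`, and `write()` come from the library, used the same way as in the script above.

```python
def pages_to_keep(page_count, pages_to_remove):
    """Return the zero-based indices of the pages that are kept.

    pages_to_remove holds one-based page numbers, as a reader would count them.
    """
    unwanted = set(pages_to_remove)
    return [index for index in range(page_count) if index + 1 not in unwanted]


def remove_pages(input_file, output_file, pages_to_remove):
    """Copy input_file to output_file without the given pages (hypothetical helper)."""
    # Imported here so that pages_to_keep() can be tried without pdfrw installed
    from pdfrw import PdfReader, PdfWriter

    reader = PdfReader(input_file)
    writer = PdfWriter()
    for index in pages_to_keep(len(reader.pages), pages_to_remove):
        writer.addpage(reader.pages[index])
    writer.write(output_file)


# Dropping pages 1 and 2 of a four-page file keeps indices 2 and 3
print(pages_to_keep(4, [1, 2]))
```

Calling `remove_pages("example.pdf", "example-updated.pdf", [1, 2])` should then give the same result as the script above.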
Deleting Pages with PyMuPDF
The PyMuPDF library comes with several methods that simplify deleting pages from a PDF file. You can specify a single page (using the deletePage() method), a range of page numbers (using the deletePageRange() method), or a list of page numbers (using the select() method). Recent PyMuPDF releases spell the first two delete_page() and delete_pages().
The following example will demonstrate how to use a list in order to select the pages to keep from the original document. Be aware that the pages that are not specified will not be part of the output document. In our case the output document contains the first, second, and fourth pages only.
    #!/usr/bin/python
    # Recall that PyMuPDF is imported as fitz

    import fitz

    input_file = "example.pdf"
    output_file = "example-rearranged.pdf"

    # Define the pages to keep - 1, 2 and 4
    file_handle = fitz.open(input_file)
    pages_list = [0, 1, 3]

    # Select the pages and save the output
    file_handle.select(pages_list)
    file_handle.save(output_file)
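The single-page and page-range variants mentioned at the start of this section have no example yet, so here is a rough sketch. It assumes a recent PyMuPDF release, where these methods are spelled `delete_page()` and `delete_pages()` (older releases use `deletePage()` and `deletePageRange()`). The helper names `delete_range_then_first` and `trim_example`, as well as the output file name, are invented for illustration.

```python
def delete_range_then_first(doc, range_start, range_end):
    """Delete pages range_start..range_end (inclusive), then the first page.

    `doc` is an open PyMuPDF document and all page numbers are zero-based.
    The range is removed first so that deleting page 0 afterwards does not
    shift the indices that are still needed.
    """
    doc.delete_pages(from_page=range_start, to_page=range_end)
    doc.delete_page(0)
    return len(doc)


def trim_example():
    """Apply the helper to the example file used in this article."""
    import fitz  # PyMuPDF, imported lazily so the helper has no hard dependency

    doc = fitz.open("example.pdf")
    remaining = delete_range_then_first(doc, 2, 3)
    print("pages left:", remaining)
    doc.save("example-trimmed.pdf")
```

On a four-page example.pdf, `trim_example()` would remove the third and fourth pages and then the first one, leaving only the original second page.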
Inserting Pages with PyMuPDF
The PyMuPDF library allows you to insert pages as well. It provides the methods newPage() for adding completely blank pages, and insertPDF() for adding pages from an existing PDF document. The next example shows how to add a page from a different PDF document at the end of another one.
    #!/usr/bin/python
    # Recall that PyMuPDF is imported as fitz

    import fitz

    original_pdf_path = "example.pdf"
    extra_page_path = "extra-page.pdf"
    output_file_path = "example-extended.pdf"

    original_pdf = fitz.open(original_pdf_path)
    extra_page = fitz.open(extra_page_path)

    original_pdf.insertPDF(extra_page)
    original_pdf.save(output_file_path)
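Two variations may be worth sketching: inserting at a specific position rather than at the end, and adding a blank page. The sketch assumes a recent PyMuPDF release, where the methods are spelled `insert_pdf()` and `new_page()` (older releases use `insertPDF()` and `newPage()`). The parameters `from_page`, `to_page`, and `start_at` are zero-based. The helper names and the output file name are hypothetical.

```python
def prepend_pages_and_add_blank(target, source, first, last):
    """Copy pages first..last of `source` to the front of `target`,
    then append one blank page at the end of `target`.

    Both arguments are open PyMuPDF documents; page numbers are
    zero-based and the range is inclusive.
    """
    # start_at=0 places the copied pages before the current first page
    target.insert_pdf(source, from_page=first, to_page=last, start_at=0)
    # Called without arguments, new_page() appends an empty page
    target.new_page()
    return len(target)


def extend_example():
    """Apply the helper to the files used in this article."""
    import fitz  # PyMuPDF, imported lazily

    original_pdf = fitz.open("example.pdf")
    extra_page = fitz.open("extra-page.pdf")
    total = prepend_pages_and_add_blank(original_pdf, extra_page, 0, 0)
    print("pages in result:", total)
    original_pdf.save("example-with-front-page.pdf")
```

Leaving out `start_at` appends the copied pages at the end, which matches the behaviour of the script above.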
Splitting Even and Odd Pages with PyPDF2
The following example uses PyPDF2 to take a file and separate it into its even and odd pages, saving the even pages in the file even.pdf and the odd pages in odd.pdf.
This Python script starts with the definition of two output files, even.pdf and odd.pdf, as well as their corresponding writer objects pdf_writer_even and pdf_writer_odd. Next, in a for-loop the script goes through the entire PDF file and reads one page after the other. Pages with even page numbers are added to the stream pdf_writer_even using addPage(), and pages with odd numbers are added to the stream pdf_writer_odd. Because the loop index is zero-based, an even index corresponds to an odd page number. At the end the two streams are saved to disk in separate files, as defined before.
    #!/usr/bin/python3

    from PyPDF2 import PdfFileReader, PdfFileWriter

    pdf_document = "example.pdf"
    pdf = PdfFileReader(pdf_document)

    # Output files for new PDFs
    output_filename_even = "even.pdf"
    output_filename_odd = "odd.pdf"

    pdf_writer_even = PdfFileWriter()
    pdf_writer_odd = PdfFileWriter()

    # Get each page and add it to the corresponding
    # output file based on its page number
    for page in range(pdf.getNumPages()):
        current_page = pdf.getPage(page)
        if page % 2 == 0:
            # Even zero-based index means odd page number
            pdf_writer_odd.addPage(current_page)
        else:
            pdf_writer_even.addPage(current_page)

    # Write the even pages to disk
    with open(output_filename_even, "wb") as out:
        pdf_writer_even.write(out)
        print("created", output_filename_even)

    # Write the odd pages to disk
    with open(output_filename_odd, "wb") as out:
        pdf_writer_odd.write(out)
        print("created", output_filename_odd)
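The class and method names used above (`PdfFileReader`, `PdfFileWriter`, `getNumPages()`, `getPage()`, `addPage()`) belong to older PyPDF2 releases. PyPDF2 3.x drops them in favour of `PdfReader`, `PdfWriter`, `reader.pages`, and `add_page()`. The following is a sketch of the same split under that assumption. The helper names `split_odd_even` and `write_odd_even` are invented for this illustration.

```python
def split_odd_even(page_count):
    """Return (odd, even) lists of zero-based page indices.

    "Odd" and "even" refer to one-based page numbers, so index 0
    (page 1) lands in the odd list.
    """
    odd = list(range(0, page_count, 2))
    even = list(range(1, page_count, 2))
    return odd, even


def write_odd_even(pdf_document, odd_filename="odd.pdf", even_filename="even.pdf"):
    """Write the odd and even pages of pdf_document to two files."""
    # Imported lazily; assumes the PyPDF2 3.x naming
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(pdf_document)
    odd, even = split_odd_even(len(reader.pages))
    for indices, filename in ((odd, odd_filename), (even, even_filename)):
        writer = PdfWriter()
        for index in indices:
            writer.add_page(reader.pages[index])
        with open(filename, "wb") as out:
            writer.write(out)
        print("created", filename)


# A five-page document: pages 1, 3, 5 are odd and pages 2, 4 are even
print(split_odd_even(5))
```

Keeping the index arithmetic in its own function makes the zero-based versus one-based distinction easy to check without touching a PDF file.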
Re-writing and re-arranging the structure of a PDF is fairly easy with the libraries pdfrw, PyMuPDF, and PyPDF2. With just a few lines of Python code you can delete pages, separate them, and add new content.