Working with PDFs in Python: Adding Images and Watermarks

Introduction

Today, a world without the Portable Document Format (PDF) is hard to imagine. It has become one of the most commonly used data formats. Up to PDF version 1.4, displaying a PDF document in a suitable PDF viewer works fine. Unfortunately, features from newer PDF revisions, such as forms, are tricky to implement and still require further work to be fully functional in the tools. Using various Python libraries, you can create your own application with comparatively little effort.

This article is part two of a short series on PDFs with Python. Part one introduced reading PDF documents with Python and started with a summary of the various Python libraries. It then showed how to manipulate existing PDFs and how to read and extract their content, both text and images. It also showed how to split documents into their individual pages.

In this article you will learn how to add images to your PDF in the form of watermarks, stamps, and barcodes. This is helpful, for example, to stamp or mark documents that are intended for a specific audience only or that are still drafts, or simply to add a barcode for identification purposes.

Adding a Watermark via Command Line with pdftk

To add a watermark to an existing PDF on a Unix/Linux command line we can use pdftk. The name abbreviates "PDF Toolkit", and the tool describes itself as "a simple tool for doing everyday things with PDF documents". pdftk has been ported to Java and is available as a package for Debian GNU/Linux.

For this to work you need a background file (here background.pdf) that contains the word "DRAFT" on a transparent layer, which you can apply to an existing single-page PDF as follows:

$ pdftk input.pdf background background.pdf output output.pdf


The pdftk tool takes in the PDF file input.pdf, merges it with background.pdf, and outputs the result to the file output.pdf. Figure 1 shows the output of this action.
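
The same call can also be issued from a Python script. The following is a minimal sketch, assuming pdftk is installed and available on the PATH; the function names build_pdftk_command() and add_background() are hypothetical helpers introduced here for illustration and are not part of pdftk.

```python
import shutil
import subprocess


def build_pdftk_command(input_file, background_file, output_file):
    """Return the argument list for the pdftk background call shown above."""
    return ["pdftk", input_file, "background", background_file,
            "output", output_file]


def add_background(input_file, background_file, output_file):
    """Run pdftk; raises CalledProcessError if pdftk reports a failure."""
    # assumption: pdftk is installed and can be found on the PATH
    if shutil.which("pdftk") is None:
        raise FileNotFoundError("pdftk was not found on the PATH")
    command = build_pdftk_command(input_file, background_file, output_file)
    subprocess.run(command, check=True)
```

Calling add_background("input.pdf", "background.pdf", "output.pdf") would then correspond to the command line above. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.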

For more complex actions, like stamping a document with different stamps per page, have a look at the description at the PDF Labs project page. We also show the stamping use case later in this article, although our example uses the library pdfrw instead of pdftk.

Adding a Watermark with PyPDF2

The PyPDF2 library provides a method called mergePage() that accepts another PDF page to be used as a watermark or stamp.

In the example below we start by reading the first page of both the original PDF document and the watermark. To read the files we use the PdfFileReader() class. As a second step we merge the two pages using the mergePage() method. Finally, we write the result to the output file. This is done in three steps: creating an object based on the PdfFileWriter() class, adding the merged page to this object using the addPage() method, and writing the new content to the output file using the write() method.

#!/usr/bin/python
# Adding a watermark to a single-page PDF
# Note: this uses the camelCase API of PyPDF2 releases before 3.0

import PyPDF2

input_file = "example.pdf"
output_file = "example-drafted.pdf"
watermark_file = "draft.pdf"

with open(input_file, "rb") as filehandle_input:
    # read content of the original file
    pdf = PyPDF2.PdfFileReader(filehandle_input)

    with open(watermark_file, "rb") as filehandle_watermark:
        # read content of the watermark
        watermark = PyPDF2.PdfFileReader(filehandle_watermark)

        # get first page of the original PDF
        first_page = pdf.getPage(0)

        # get first page of the watermark PDF
        first_page_watermark = watermark.getPage(0)

        # merge the two pages
        first_page.mergePage(first_page_watermark)

        # create a pdf writer object for the output file
        pdf_writer = PyPDF2.PdfFileWriter()
        pdf_writer.addPage(first_page)

        with open(output_file, "wb") as filehandle_output:
            # write the watermarked file to the new file
            pdf_writer.write(filehandle_output)
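
The example above expects a watermark file named draft.pdf to exist. If none is at hand, a very simple one can be generated with the standard library alone. The following is only a rough sketch under several assumptions: the function name build_draft_watermark() is hypothetical, the page size is roughly A4 (595 by 842 points), the text is plain ASCII without parentheses or backslashes, and the position, gray value, and font size are arbitrary choices. A dedicated PDF generation library would normally be the more robust route.

```python
def build_draft_watermark(text="DRAFT", width=595, height=842):
    """Return the bytes of a one-page PDF showing large, gray, rotated text."""
    # content stream: light gray fill, Helvetica-Bold at 96 pt,
    # rotated by 45 degrees through the text matrix (cos, sin, -sin, cos)
    content = (
        "BT\n"
        "0.75 g\n"
        "/F1 96 Tf\n"
        "0.7071 0.7071 -0.7071 0.7071 150 250 Tm\n"
        f"({text}) Tj\n"
        "ET\n"
    ).encode("ascii")

    # the five objects of a minimal PDF: catalog, page tree, page, font, content
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {width} {height}] "
         "/Resources << /Font << /F1 4 0 R >> >> "
         "/Contents 5 0 R >>").encode("ascii"),
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica-Bold >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"endstream",
    ]

    pdf = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        # remember the byte offset of each object for the cross-reference table
        offsets.append(len(pdf))
        pdf += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # cross-reference table: one 20-byte entry per object, plus the free entry
    xref_position = len(pdf)
    pdf += b"xref\n0 %d\n" % (len(objects) + 1)
    pdf += b"0000000000 65535 f \n"
    for offset in offsets:
        pdf += b"%010d 00000 n \n" % offset

    pdf += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    pdf += b"startxref\n%d\n%%%%EOF\n" % xref_position
    return bytes(pdf)
```

Writing the returned bytes to a file opened with open("draft.pdf", "wb") yields a page without any background fill, so only the gray text is laid over the original page when the two are merged.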


Adding an Image with PyMuPDF

PyMuPDF provides the Python bindings for MuPDF, which is a lightweight PDF and XPS viewer. In your Python script the module that needs to be imported is named fitz, and this name goes back to the previous name of PyMuPDF.

In this section we show how to add an image, using a barcode as an example since this is a common task. The same steps can be applied to adding any other kind of image to a PDF.

In order to decorate a PDF document with a barcode we simply add an image as another PDF layer at the desired position. As for image formats, PyMuPDF accepts PNG or JPEG, but not SVG.
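
Since the image format matters, it can be useful to check a file before trying to insert it. The sketch below looks at the first bytes of a file and compares them with the well-known PNG and JPEG file signatures. It only mirrors the formats named above; the function names detect_image_format() and is_supported_image() are hypothetical helpers and not part of PyMuPDF.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def detect_image_format(header):
    """Return "png", "jpeg", or None based on the first bytes of a file."""
    if header.startswith(PNG_SIGNATURE):
        return "png"
    if header.startswith(JPEG_SIGNATURE):
        return "jpeg"
    return None


def is_supported_image(path):
    """Read the first eight bytes of the file and check them against the signatures."""
    with open(path, "rb") as handle:
        return detect_image_format(handle.read(8)) is not None
```

An SVG file is plain XML text and therefore matches neither signature, so is_supported_image() would return False for it.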

The position of the image is defined as a rectangle using fitz.Rect(), which requires two pairs of coordinates: (x1,y1) for the upper-left corner and (x2,y2) for the lower-right corner of the rectangle. PyMuPDF interprets the upper-left corner of the page as (0,0).
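
Because the origin sits in the upper-left corner, the coordinates for an image near the upper-right corner follow from the page width. The following sketch illustrates the arithmetic; the helper upper_right_rect() and its default margins are assumptions chosen for illustration, as is the page width of 595 points, which is roughly that of an A4 page.

```python
def upper_right_rect(page_width, image_width, image_height,
                     margin_right=45, margin_top=20):
    """Return (x0, y0, x1, y1) for an image placed near the upper-right corner."""
    # x grows to the right, y grows downwards from the top of the page
    x1 = page_width - margin_right
    x0 = x1 - image_width
    y0 = margin_top
    y1 = y0 + image_height
    return (x0, y0, x1, y1)


# a 100 x 100 point image on a page that is 595 points wide
coordinates = upper_right_rect(595, 100, 100)
print(coordinates)  # (450, 20, 550, 120)
```

The four values can be handed to fitz.Rect() in this order, for example as fitz.Rect(*coordinates).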

Having opened the input file and extracted the first page from it, we add the image containing the barcode using the method insertImage(). This method requires two parameters: the position, passed via image_rectangle, and the name of the image file to be inserted. Using the save() method, the modified PDF is stored to disk. Figure 2 shows the barcode after it was added to the example PDF.

#!/usr/bin/python

import fitz

input_file = "example.pdf"
output_file = "example-with-barcode.pdf"
barcode_file = "barcode.png"

# define the position (upper-right corner)
image_rectangle = fitz.Rect(450,20,550,120)

# retrieve the first page of the PDF
file_handle = fitz.open(input_file)
first_page = file_handle[0]

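# add the barcode image at the defined position
# (the keyword is filename; newer PyMuPDF releases name this method insert_image())
first_page.insertImage(image_rectangle, filename=barcode_file)

# store the modified PDF to disk
file_handle.save(output_file)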


Adding Stamps with pdfrw

pdfrw is a pure Python PDF parser for reading and writing PDF documents. It faithfully reproduces vector formats without rasterization. For Debian GNU/Linux, the package repository contains releases for both Python 2 and 3.

The following example demonstrates how to add a barcode or watermark to an existing PDF that contains multiple pages. From the pdfrw package it is sufficient to import the three classes PdfReader, PdfWriter, and PageMerge. Next, you create the corresponding reader and writer objects to access the contents of both the PDF and the watermark. For each page in the original document you then create a PageMerge object, to which you add the watermark and which is rendered using the render() method. Finally, you write the modified pages to the output file. Figure 3 shows the modified document next to the code that made the addition possible.

#!/usr/bin/python
# Adding a watermark to a multi-page PDF

from pdfrw import PdfReader, PdfWriter, PageMerge

input_file = "example.pdf"
output_file = "example-drafted.pdf"
watermark_file = "barcode.pdf"

# define the reader and writer objects
reader_input = PdfReader(input_file)
writer_output = PdfWriter()
watermark_input = PdfReader(watermark_file)
watermark = watermark_input.pages[0]

# go through the pages one after the next