Image Captioning with CNNs and Transformers with Keras - Data Preprocessing

Data Preprocessing

David Landup

Let's start with importing all of the packages and libraries we'll be using:

# tensorflow version
import tensorflow as tf
print('tensorflow: %s' % tf.__version__)
# keras version
from tensorflow import keras
print('keras: %s' % keras.__version__)
import keras_cv
print('keras_cv: %s' % keras_cv.__version__)
import keras_nlp
print('keras_nlp: %s' % keras_nlp.__version__)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import cv2

import os

Downloading the Data

Next up, let's download the dataset we'll be working with: Flickr8k. It's no longer publicly available from its original website, but it's widely mirrored on Kaggle and other repositories. It contains 8K images with 5 human-written captions each. Flickr30K is a larger dataset that follows the same format, so you can easily substitute it into this project.

Let's use kaggle datasets to download the dataset and unzip it:

! kaggle datasets download -d adityajn105/flickr8k
! unzip flickr8k.zip -d Flickr8k_Dataset
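
If the unzip utility isn't available (on Windows, for example), the extraction step can be sketched with Python's standard zipfile module instead. The archive name flickr8k.zip and the helper name extract_archive are assumptions made for illustration, so adjust them to match your download:

```python
import os
import zipfile


def extract_archive(archive_path, target_dir):
    """Extract a zip archive into target_dir and return its top-level entries."""
    os.makedirs(target_dir, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
    return sorted(os.listdir(target_dir))


# Assumed archive name; only runs if the file is actually present.
if os.path.exists('flickr8k.zip'):
    print(extract_archive('flickr8k.zip', 'Flickr8k_Dataset'))
```

The returned listing is a quick way to confirm that the archive was extracted where you expected.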

The archive is unzipped into a Flickr8k_Dataset directory, which contains a text file named captions.txt and an Images directory with all of the images. Let's save these paths in a config dictionary, alongside the batch size, for global access:
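
A minimal sketch of such a config dictionary might look like the following. The key names, the batch size of 32, and the missing_paths helper are assumptions made for illustration; only the directory and file names come from the layout described above:

```python
import os

# Key names and the batch size are assumed; the paths follow the unzipped layout.
config = {
    'IMAGES_DIR': os.path.join('Flickr8k_Dataset', 'Images'),
    'CAPTIONS_PATH': os.path.join('Flickr8k_Dataset', 'captions.txt'),
    'BATCH_SIZE': 32,  # assumed value, tune to your hardware
}


def missing_paths(cfg):
    """Return the names of path entries in cfg that do not exist on disk."""
    path_keys = ('IMAGES_DIR', 'CAPTIONS_PATH')
    return [key for key in path_keys if not os.path.exists(cfg[key])]


# An empty list means both the images directory and the captions file were found.
print(missing_paths(config))
```

Checking the paths once up front tends to surface a wrong working directory or an incomplete download before any data loading code runs.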


© 2013-2022 Stack Abuse. All rights reserved.