We've covered YOLO in more detail in the previous lesson and will focus on applying it in a practical setting here, but let's do a quick recap. YOLO is an object detection methodology, upon which a family of models has been built, oftentimes named YOLOv[Version] or some other variation.
The methodology works by passing the image through a CNN backbone and outputting an SxS grid (feature map) that encodes the spatial locations of objects of interest, as well as their labels and confidence scores, in a single output tensor.
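To make the single-tensor output concrete, here's a minimal sketch of how its shape is determined in the original YOLO formulation. The numbers below (S=7, B=2, C=20) match YOLOv1 on PASCAL VOC; newer versions use different grid sizes, anchors and heads, so treat this purely as an illustration of the encoding.

```python
# YOLOv1-style output tensor: an S x S grid where each cell predicts
# B bounding boxes (x, y, w, h, confidence) and C class probabilities.
S = 7   # grid size (the image is divided into S x S cells)
B = 2   # boxes predicted per cell
C = 20  # number of classes (PASCAL VOC)

# Each cell contributes B * 5 box values plus C class scores,
# all packed into one tensor of shape (S, S, B * 5 + C).
per_cell = B * 5 + C
output_shape = (S, S, per_cell)
print(output_shape)  # (7, 7, 30)
```

The key point is that localization, confidence and classification all live in this one tensor, which is why YOLO can produce detections in a single forward pass.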
YOLOv1, YOLOv2 and YOLOv3 were created by the same authors and established the methodology. YOLOv4, YOLOv5 and YOLOv7 were created and open-sourced by individual contributors and companies. PP-YOLO, YOLOX and YOLOR are some other well-performing models.
YOLOv7 was released in July of 2022, and the source code and training scripts can be found in the official YOLOv7 GitHub repository. At the time of writing, it doesn't have a programmatic API like the one YOLOv5, developed by Ultralytics, does. The previous highest-ranking model on the associated PapersWithCode leaderboard was ConvNeXt-XL++, with 9 FPS and 55.2 box AP. YOLOv7 achieves 36 FPS with 56.8 box AP. That's real-time!
Although YOLOv7 does perform better than YOLOv5 by a couple of percentage points in accuracy, the speeds are fairly comparable, and the practicality of YOLOv5's API, its export and serving options, and its rich documentation make it an easier choice for solving practical problems and deploying solutions.