We've covered YOLO in more detail in the previous lesson and will focus on applying it in a practical setting here, but let's do a quick recap. YOLO is an object detection methodology, upon which a family of models has been built, oftentimes named YOLOv[Version] or some other variation.
The methodology works by passing the image through a CNN backbone and outputting an SxS grid (feature map) that encodes the spatial locations of objects of interest, as well as their labels and confidence scores, in a single output tensor.
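To make the single-tensor output concrete, here's a minimal sketch of how its shape is determined in the original YOLO formulation. The numbers below (S=7, B=2, C=20) match YOLOv1 on PASCAL VOC; newer versions use different grid sizes, anchors and heads, so treat this purely as an illustration of the encoding.

```python
# YOLOv1-style output tensor: an S x S grid where each cell predicts
# B bounding boxes (x, y, w, h, confidence) and C class probabilities.
S = 7   # grid size (the image is divided into S x S cells)
B = 2   # boxes predicted per cell
C = 20  # number of classes (PASCAL VOC)

# Each cell contributes B * 5 box values plus C class scores,
# all packed into one tensor of shape (S, S, B * 5 + C).
per_cell = B * 5 + C
output_shape = (S, S, per_cell)
print(output_shape)  # (7, 7, 30)
```

The key point is that localization, confidence and classification all live in this one tensor, which is why YOLO can produce detections in a single forward pass.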
YOLOv1, YOLOv2 and YOLOv3 were created by the same authors and established the methodology. YOLOv4, YOLOv5 and YOLOv7 were created and open-sourced by individual contributors and companies. PP-YOLO, YOLOX and YOLOR are some other well-performing models.
YOLOv7 was released in July of 2022, and the source code and training scripts can be found in the official YOLOv7 GitHub repository. At the time of writing, it doesn't have a programmatic API like the one YOLOv5, developed by Ultralytics, does. The previous highest-ranking model on the associated PapersWithCode leaderboard was ConvNeXt-XL++, with 9 FPS and 55.2 box AP. YOLOv7 achieves 36 FPS with 56.8 box AP. That's real-time!
Although YOLOv7 does perform better than YOLOv5 by a couple of percentage points in accuracy, the speeds are fairly comparable, and the practicality of YOLOv5's API, its export and serving options, and its rich documentation make it an easier choice for solving practical problems and deploying solutions.