Sergey Shutov
Developer at OpenCV.ai
Ekaterina Romanova
Developer at OpenCV.ai
The Ultimate Guide to YOLO (You Only Look Once)

YOLO Unraveled: A Clear Guide

This comprehensive guide offers insights into the latest YOLO models and algorithms comparison, helping developers and researchers choose the most effective solution for their projects.
January 25, 2024

The YOLO (You Only Look Once) family of models is a popular and rapidly evolving series of object detection algorithms. Independent research teams are constantly releasing new models that outperform their predecessors in quality, speed, and size, while also providing open access to the code, weights, and detailed analysis of their experiments. Eight teams have contributed to the development of YOLO, with six releasing new models in the last year and a half. We've compiled key information about YOLO in brief, structured materials to help you navigate this diversity. You Only Look Once to comprehend YOLO :).

These materials will be helpful for:

• Developers: To find the best solution that meets technical and legal requirements. YOLO's diverse range of models offers options for various applications, from real-time processing on mobile devices to high-precision detection in complex scenes. Understanding the strengths and limitations of each model can guide the selection of the most appropriate version for a specific project.

• Researchers: To understand the history of algorithm development and draw inspiration for their research. YOLO's evolution highlights how different techniques have impacted the performance and accuracy of the models. For instance, the transition from YOLOv3 to YOLOv4 and further to YOLOv5 and beyond illustrates significant improvements in detection accuracy and processing speed, often achieved through architectural changes, advanced training techniques, and more efficient use of computational resources.

Materials

The provided diagram summarizes key information about the YOLO (You Only Look Once) family of object detection models. Let's explore what this information reveals and how it can be utilized.

1. Chronology

The models are organized according to their release dates.

Generally, newer models are more optimized (their points are positioned lower and to the right on the latency-accuracy graph). Remember, each YOLO model has several versions, ranging from the fastest and least accurate to the slowest and most accurate.

The release dates also let you indirectly assess whether the library versions used in a model's code are compatible with your own. This is important for developers who need to integrate these models into existing systems or those planning an upgrade.

2. Authors

Models made by the same team of authors are placed in a band of one color.

A uniform codebase makes it easier to switch to a new model if it's made by the same team.

A consistent development track helps you understand the logic behind changes and the direction of future research.

You can also contact the authors with questions about the models. This can be an excellent option for both developers and researchers to clarify doubts or even collaborate on future projects.

3. Base Model

Arrows on the diagram represent the parent-child relationship, showing which model was used as the foundation for developing a subsequent one. This relationship can be utilized in various ways:

Viewing a complex algorithm as a series of modifications of a base model makes its development simpler to understand.

Transitioning to a new model is more straightforward if it's a descendant of the current one, as this often eliminates the need to learn the architecture from scratch.

By studying parent-child model pairs, you can effectively assess how various techniques have contributed to the enhancement of the models.

4. License

Understanding the license of a model is important. Models with licenses that allow commercial use are marked in green, and those that prohibit such use are marked in red. This knowledge is essential for developers to ensure that the use of the model is legally valid, especially for commercial purposes.

5. Framework

Consider models written in the framework you're already proficient with or the one used in your existing pipeline. This approach streamlines development and integration processes, saving time both in learning new frameworks and in integrating the YOLO model into your system or workflow.

Invest time in learning the framework that is likely to host future top models. To evaluate this likelihood, count the frequency of each framework's use over the years and extrapolate over time. The diagram indicates that, currently, the trend favors PyTorch.
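
To make the counting idea concrete, here is a toy sketch in Python. The (model, framework) pairs below are an illustrative, abridged sample, not an authoritative census of YOLO releases.

```python
# A toy sketch of the "count and extrapolate" idea. The (model, framework)
# pairs below are an illustrative, abridged sample, not a complete census.
from collections import Counter

releases = [
    ("YOLOv4", "Darknet"), ("YOLOv5", "PyTorch"), ("PP-YOLOE", "PaddlePaddle"),
    ("YOLOX", "PyTorch"), ("YOLOv6", "PyTorch"), ("YOLOv7", "PyTorch"),
    ("DAMO-YOLO", "PyTorch"), ("YOLOv8", "PyTorch"),
]

# Count how often each framework hosts a release; the dominant framework is
# a reasonable bet for where future top models will appear.
print(Counter(framework for _, framework in releases).most_common())
# [('PyTorch', 6), ('Darknet', 1), ('PaddlePaddle', 1)]
```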

Image Source: OpenCV.ai (data from YOLO-NAS repository)

The provided latency-accuracy graphs from the repositories of the two latest models (where DAMO-YOLO represents DAMO-YOLO v0.3.1) offer insightful comparisons. Although speed measurements in both cases were performed on an NVIDIA T4 GPU, it's important to note that the values are only directly comparable within each graph. Since both models were released around the same time, there's no single graph featuring all the latest models.

As mentioned, the rule that newer models are typically more optimized generally holds. However, there are cases where the faster versions of an older model achieve higher quality than the faster versions of a newer one, while for the slower versions the trend is reversed (as seen with YOLOX and YOLOv8). This underlines the importance of consulting these graphs when choosing a model for a specific task, especially when you have known performance constraints.

Additionally, we have prepared a comprehensive table. It includes:

Links to the article and repository.

Whether the repository is still being updated.

Whether the model is anchor-based or anchor-free.

Details about the model's Backbone, Neck, and Head.

Information on loss functions, data augmentations, and training strategies.

Unique features of the model.

Usage examples

Here's a recommended course of action for popular scenarios using the provided materials.

How do you choose a model for your task using these materials?

1. License Consideration: Identify models that align with the license requirements of your task. For example, if your project is commercial, focus on models whose licenses permit commercial use (marked green on the diagram).

2. Framework Filtering: Focus on models written in the framework you’re familiar with.

3. Consider Recent Models: Look at models from the last couple of years.

4. Refer to Latency-Accuracy Graphs: Identify points of interest based on your priorities: if quality is paramount, choose points from the right side; if speed is essential, select points from the left.

5. Compatibility with Target Device: Choose models that can be successfully converted to the required format (ONNX, TorchScript, TensorRT, etc.). Check the repository (via the links in the table) for available conversion scripts, or write one yourself; a minimal export sketch follows this list.

6. Speed Constraints: Select models that meet the latency requirements of the target device. Many models expose scaling parameters (such as depth and width multipliers) that yield a whole family of versions with different latency-accuracy trade-offs, and the input resolution can also be varied. Keep in mind, however, that if you want to use pre-trained weights, you are limited to the published configurations. A simple timing harness is sketched after this list.

7. Opt for the Best Quality on Target Data: Quality on COCO is only an approximate indicator of quality on your own data, so train several candidate models and choose the one that performs best; see the selection sketch after this list.
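
For step 5, here is a minimal export sketch. It assumes your YOLO variant is available as a torch.nn.Module; "my_yolo" is a placeholder, and real repositories usually ship their own export scripts with model-specific pre- and post-processing, so prefer those when available.

```python
# Minimal ONNX export sketch (PyTorch). "my_yolo" is a placeholder for your
# loaded detector; prefer the repository's own export script if one exists.
import torch

def export_to_onnx(model: torch.nn.Module, path: str, imgsz: int = 640) -> None:
    """Trace the model with a dummy input and write an ONNX file."""
    model.eval()
    dummy = torch.randn(1, 3, imgsz, imgsz)  # NCHW dummy image
    torch.onnx.export(
        model,
        dummy,
        path,
        input_names=["images"],
        output_names=["predictions"],
        opset_version=12,
        dynamic_axes={"images": {0: "batch"}},  # allow variable batch size
    )

# usage: export_to_onnx(my_yolo, "my_yolo.onnx")
```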
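
For step 6, a simple timing harness is sketched below (PyTorch, CUDA device assumed). Latency numbers are only meaningful when measured on the actual target device, at the input resolution you plan to use.

```python
# Minimal latency-measurement sketch (PyTorch, CUDA assumed). Warm-up runs
# let cuDNN select kernels before timing; synchronize so GPU work is counted.
import time
import torch

@torch.no_grad()
def measure_latency_ms(model, imgsz=640, runs=100, warmup=10, device="cuda"):
    model = model.to(device).eval()
    x = torch.randn(1, 3, imgsz, imgsz, device=device)
    for _ in range(warmup):
        model(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) * 1000.0 / runs

# usage: print(f"{measure_latency_ms(my_yolo):.2f} ms per image")
```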
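
For step 7, here is a selection sketch using the Ultralytics package as one example codebase; "my_data.yaml" is a placeholder for your own dataset config, and note that Ultralytics models are AGPL-3.0 licensed, which matters for the license check in step 1.

```python
# Sketch: fine-tune several candidate models on your data, keep the best.
# Uses the Ultralytics API as one example; "my_data.yaml" is a placeholder.
from ultralytics import YOLO

candidates = ["yolov8n.pt", "yolov8s.pt", "yolov8m.pt"]

scores = {}
for ckpt in candidates:
    model = YOLO(ckpt)
    model.train(data="my_data.yaml", epochs=50, imgsz=640)  # fine-tune
    metrics = model.val()            # evaluate on your validation split
    scores[ckpt] = metrics.box.map   # mAP@0.50:0.95 on *your* data

best = max(scores, key=scores.get)
print(f"Best on target data: {best} (mAP {scores[best]:.3f})")
```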

If you're already using one of the models, how do you meet new requirements with minimal changes to the project?

a. To Increase Quality While Maintaining Speed:

1. Move within the same row towards the right: A descendant model by the same authors, on the same framework, saves time on learning a new architecture and code, as well as on integration with other parts of the pipeline. It will likely have the same license (e.g., YOLOv5→YOLOv8, PP-YOLO→PP-YOLOE, YOLOv6→YOLOv6 v3.0).

2. Alternatively, study the techniques that allowed more modern models to surpass their predecessors in quality and implement them in your code. This could be faster than switching to a new model. It also allows the use of techniques from models with licenses that are unavailable to you.

b. To Switch from a Model with a “Red” License to a “Green” License:

1. Find a model with a green license that shares a common predecessor with your current model (e.g., YOLOv7 → YOLOX, where YOLOv3-pytorch is the common predecessor).

We hope our road map helps you find the shortest path to the best solution for the challenge you are facing!
