Anastasia Reshetova
Developer at OpenCV.ai
Navigating 3D Space: Guide to Object Position Prediction
Within the broad field of 3D object pose estimation, one group of tasks deserves special attention: predicting the position of rigid objects in 3D, which comes down to 6DoF (six Degrees of Freedom) estimation.
April 10, 2024

Introduction

Object Position in 3D Space: 6DoF Metrics

The term 6DoF refers to an object's six degrees of freedom in 3D space: translation along three axes and rotation around them. It is a fundamental concept for our exploration of object position prediction. In this guide, we dissect the process of predicting an object's position in 3D space, discuss the mechanics of rotation, translation, and scale, and focus on the metrics used to evaluate these predictions.

The 6DoF estimation task encompasses predicting an object's position in 3D space (X, Y, Z coordinates) along with its rotation around these axes, called yaw, pitch, and roll.

6DoF Estimation

Various approaches exist for predicting these six parameters, but measuring how well they work is no simple task. For 2D tasks, the community has reached a certain consensus on quality metrics; in 3D, things are more complicated.

Before we start reviewing metrics, let’s look closer at this task.

Task overview

3D pose estimation starts from an RGB (sometimes RGB-D) image containing the target object. The aim is to predict the object's 6D pose, i.e. the rigid transformation from the object's coordinate system to the camera's coordinate system.

3D Pose Estimation task Overview

A complete 6D pose consists of two elements: the object's 3D rotation (a 3x3 matrix R) and its 3D translation (a 3x1 vector t). For computational convenience, both can be padded to 4x4 homogeneous matrices and combined into a single 4x4 transformation.
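Combining R and t into one 4x4 matrix can be sketched with NumPy. This is a minimal illustration rather than code from the article: the helper names `make_pose` and `transform_points` are our own, and we assume the object's points are stored as an (N, 3) array in the object's coordinate frame.

```python
import numpy as np

def make_pose(R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Pack a 3x3 rotation R and a 3-vector t into the 4x4 matrix [[R, t], [0, 0, 0, 1]]."""
    pose = np.eye(4)
    pose[:3, :3] = R
    pose[:3, 3] = np.asarray(t, dtype=float).reshape(3)
    return pose

def transform_points(pose: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Apply a 4x4 pose to (N, 3) points by appending a homogeneous coordinate of 1."""
    homogeneous = np.hstack([points, np.ones((points.shape[0], 1))])  # shape (N, 4)
    return (homogeneous @ pose.T)[:, :3]

# Example: rotate 90 degrees about z, then shift by 1 along x.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([1.0, 0.0, 0.0])
pts = np.array([[1.0, 0.0, 0.0]])
print(transform_points(make_pose(R, t), pts))  # → [[1. 1. 0.]]
```

With a single matrix, applying or chaining poses reduces to matrix multiplication, which is the convenience the padding buys.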

This task is also helpful when we deal with multi-object tracking.

Rotation

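Rotation is described by a rotation matrix R, which can be decomposed into a product of three elementary rotations, one about each coordinate axis; each of them acts as a 2D rotation within one coordinate plane. The rotation matrix is thus defined by the yaw, pitch, and roll angles: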

Rotation Matrix Explanation

where α, β, and γ are yaw, pitch, and roll angles, respectively.
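The order in which the three elementary rotations are multiplied depends on the convention. The sketch below assumes the common z-y-x order, R = Rz(α) · Ry(β) · Rx(γ), with yaw about z, pitch about y, and roll about x; the function names are illustrative, not from a library.

```python
import numpy as np

def rot_x(a):
    """Roll: 2D rotation in the y-z plane, embedded in 3D."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(a):
    """Pitch: 2D rotation in the z-x plane, embedded in 3D."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rot_z(a):
    """Yaw: 2D rotation in the x-y plane, embedded in 3D."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rotation_from_ypr(yaw, pitch, roll):
    """Compose R = Rz(yaw) @ Ry(pitch) @ Rx(roll) (assumed z-y-x convention)."""
    return rot_z(yaw) @ rot_y(pitch) @ rot_x(roll)

R = rotation_from_ypr(np.radians(30), np.radians(10), np.radians(-5))
# Any valid rotation matrix is orthonormal with determinant +1.
print(np.allclose(R.T @ R, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))  # → True True
```

Other conventions (for example x-y-z) produce a different R from the same three angles, so it is worth fixing the convention before comparing predictions with ground truth.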

Translation

The translation matrix encodes the displacements along the x, y, and z axes (𝒗x, 𝒗y, 𝒗z).

In 3D, the translation transformation matrix T is a 4x4 matrix that acts on homogeneous coordinates and has the following structure:

Translation Matrix Explanation

where 𝒗x, 𝒗y, 𝒗z are the translation distances in x, y, and z.

In other words, the translation is a vector t that, when added to each point's original position, shifts the entire model in 3D space.
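Assuming T has the standard homogeneous form (identity in the upper-left 3x3 block, 𝒗x, 𝒗y, 𝒗z in the last column), a short NumPy check shows that multiplying by T is the same as adding t. `translation_matrix` is a hypothetical helper for illustration.

```python
import numpy as np

def translation_matrix(vx, vy, vz):
    """Hypothetical helper: 4x4 homogeneous translation matrix with the shift in the last column."""
    T = np.eye(4)
    T[:3, 3] = [vx, vy, vz]
    return T

p = np.array([2.0, -1.0, 0.5])
t = np.array([0.1, 0.2, 0.3])

# Multiply T by the point in homogeneous coordinates (x, y, z, 1)...
p_h = np.append(p, 1.0)
shifted = (translation_matrix(*t) @ p_h)[:3]

# ...which matches simply adding the vector t.
print(np.allclose(shifted, p + t))  # → True
```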

GT Metrics

These transformations are usually evaluated with two groups of metrics: those that measure the whole transformation matrix, and those that measure the rotation (R), translation (T), and scale (S) separately. We will review the two most common metrics, one from each group.

The most common overall metric is the average distance (ADD) or ADD(s) for symmetric objects. Here, the goal is to measure the distance between the Ground Truth (GT) 3D point cloud and the predicted 3D point cloud resulting from the transformation.

First, the predicted and GT 3D point clouds are obtained by transforming the points of the object's base model with the predicted and GT transformation matrices, respectively. Then the distance between each pair of corresponding points is measured, and these distances are averaged, giving one mean distance per object.
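The per-object mean distance can be sketched in NumPy as follows. The function names are our own, and for ADD(s) we assume the usual closest-point definition (each GT point is matched to the nearest predicted point rather than to its corresponding point).

```python
import numpy as np

def transform(R, t, points):
    """Apply rotation R (3x3) and translation t (3,) to model points of shape (N, 3)."""
    return points @ R.T + t

def add_distance(R_gt, t_gt, R_pred, t_pred, model_points):
    """ADD: mean distance between corresponding GT-posed and predicted-posed model points."""
    gt = transform(R_gt, t_gt, model_points)
    pred = transform(R_pred, t_pred, model_points)
    return float(np.linalg.norm(gt - pred, axis=1).mean())

def adds_distance(R_gt, t_gt, R_pred, t_pred, model_points):
    """ADD(s), assumed closest-point variant: mean distance from each GT point to the nearest predicted point."""
    gt = transform(R_gt, t_gt, model_points)
    pred = transform(R_pred, t_pred, model_points)
    # (N, N) pairwise distances: fine for small models; a KD-tree scales better.
    pairwise = np.linalg.norm(gt[:, None, :] - pred[None, :, :], axis=2)
    return float(pairwise.min(axis=1).mean())

# Corners of a cube centred at the origin: symmetric under a 90-degree turn about z.
cube = np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)], dtype=float)
R_z90 = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
I, zero = np.eye(3), np.zeros(3)
print(add_distance(I, zero, R_z90, zero, cube), adds_distance(I, zero, R_z90, zero, cube))  # → 2.0 0.0
```

The example shows why symmetric objects need the closest-point variant: the rotated cube looks identical to the GT one, yet plain ADD reports a large error.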

Distance Calculation

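Second, a threshold on the mean distance is chosen, and the percentage of objects whose mean distance falls below it is calculated. This number is called the ADD accuracy. ADD(s) follows the same procedure for symmetric objects, except that each point is matched to the closest point of the other cloud rather than to its corresponding point, so poses that are indistinguishable because of the symmetry are not penalized.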
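The accuracy step can be sketched as below. The threshold of 10% of the object's diameter is a common choice in the literature, but it is an assumption here, since the text above does not fix a value; the distances and diameters in the example are made-up illustration values.

```python
import numpy as np

def add_accuracy(mean_distances, diameters, fraction=0.1):
    """Share of objects whose ADD mean distance is below fraction * object diameter."""
    mean_distances = np.asarray(mean_distances, dtype=float)
    thresholds = fraction * np.asarray(diameters, dtype=float)
    return float(np.mean(mean_distances < thresholds))

# Made-up per-object ADD values and model diameters (same units, e.g. metres).
distances = [0.004, 0.012, 0.015, 0.007]
diameters = [0.10, 0.10, 0.20, 0.05]
print(add_accuracy(distances, diameters))  # → 0.5
```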

R, T, and S

However, in certain cases, evaluating rotation, translation, and scale separately can provide deeper insights into the error sources. 

The translation error is typically measured as the Euclidean distance between the predicted and GT translation vectors.

The scale error is calculated by dividing the GT scale by the predicted scale.
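Both separate errors are one-liners; here is a minimal NumPy sketch with illustrative function names. A scale ratio of 1.0 means the predicted scale matches the GT.

```python
import numpy as np

def translation_error(t_gt, t_pred):
    """Euclidean distance between the GT and predicted translation vectors."""
    return float(np.linalg.norm(np.asarray(t_gt, dtype=float) - np.asarray(t_pred, dtype=float)))

def scale_error(s_gt, s_pred):
    """GT scale divided by predicted scale; 1.0 means no scale error."""
    return s_gt / s_pred

print(translation_error([0.0, 0.0, 0.0], [3.0, 4.0, 0.0]))  # → 5.0
print(scale_error(1.0, 0.5))  # → 2.0
```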

Calculating the rotation error, on the other hand, is more challenging. Rotation matrices belong to the 3D rotation group, usually denoted SO(3). Therefore, the difference between two rotation matrices, Rgt and Rpred, can be measured with a distance metric defined on SO(3).

There are several ways to define a distance function, or metric, on the 3D rotation group; you can take a closer look at them in section 3 of this paper. Some are based on quaternions, while others compare the matrices directly or, for example, measure how far the relative rotation deviates from the identity matrix. Here we review the most representative and intuitive method.

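This method calculates the angle of the relative rotation between the Rgt and Rpred matrices, using the relation between the trace of a rotation matrix and its rotation angle: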

R, T, S calculation

From this equation, the angle of rotation can easily be calculated:

Angle of rotation calculation
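For reference, the standard form of this relation (our reconstruction, assuming the usual trace formula) is given below. Since tr(Rgtᵀ Rpred) = tr(Rgt Rpredᵀ), either ordering of the relative rotation gives the same angle θ.

```latex
R_{\mathrm{diff}} = R_{gt}^{\top} R_{pred}, \qquad
\operatorname{tr}\left(R_{\mathrm{diff}}\right) = 1 + 2\cos\theta
\;\;\Longrightarrow\;\;
\theta = \arccos\left(\frac{\operatorname{tr}\left(R_{gt}^{\top} R_{pred}\right) - 1}{2}\right),
\quad \theta \in [0, \pi]
```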

As a result, we get a single angle that represents the overall rotation error in 3D space. The advantage of this metric is its intuitive geometric meaning: it is the angle by which the predicted orientation must be rotated, about a suitable axis, to coincide with the GT orientation.
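In code, the angle follows from the trace relation θ = arccos((tr(Rgtᵀ Rpred) − 1) / 2). The minimal NumPy sketch below uses illustrative function names and clips the cosine, because floating-point rounding can push it slightly outside the domain of `arccos`.

```python
import numpy as np

def rotation_error_deg(R_gt, R_pred):
    """Angle of the relative rotation R_gt^T @ R_pred (geodesic distance on SO(3)), in degrees."""
    R_diff = R_gt.T @ R_pred
    cos_theta = (np.trace(R_diff) - 1.0) / 2.0
    # Rounding can make |cos_theta| slightly exceed 1, which would give NaN.
    cos_theta = np.clip(cos_theta, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))

def rot_z(a):
    """Rotation by angle a (radians) about the z axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# The prediction is off by 15 degrees of yaw.
print(round(rotation_error_deg(rot_z(np.radians(40)), rot_z(np.radians(55))), 6))  # → 15.0
```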

Conclusion

Evaluation is one of the critical steps in deep learning, and choosing the right evaluation metrics is crucial. Only with reliable and interpretable metrics can we make the right decisions and also explain them to our colleagues or customers.

Discover how integrating AI can elevate your projects across different sectors, thanks to the specialized computer vision services from OpenCV.ai. Our team is passionate about utilizing AI Services to innovate and redefine practices within numerous industries.
