Our team of experts is ready to answer!
You can contact us directly
Telegram iconFacebook messenger iconWhatApp icon
Fill in the form below and you will receive an answer within 2 working days.
Or fill in the form below and you will receive an answer within 2 working days.
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Reading Time
3 Minutes
Daniil Osokin
Developer at OpenCV.ai
Revolutionizing Object Tracking: "Tracking Everything Everywhere All at Once" Paper Review

Tech track #1. "Tracking Everything Everywhere All at Once" review

The "Tracking Everything Everywhere All at Once" paper, a collaborative work by Cornell University, Google Research, and UC Berkeley, offers a breakthrough solution to the problem of tracking any point in video footage. The method maps each pixel from every frame into a common 3D space, tracing its trajectory across time. This method is not designed for real-time tracking but for in-depth recorded video analysis. In this article, we highlight the most exciting points of this paper.
June 15, 2023

Introduction

Last week Cornell University, Google Research, and UC Berkeley published a paper with an intriguing title "Tracking Everything Everywhere All at Once". It proposes a solution for the track any point problem. As the name hints, it tracks everything everywhere - meaning pixels across all frames, even if they are occluded. See the beautiful visualizations from the paper’s OmniMotion website:

objects trajectory tracking

So, if you select a specific pixel, you can find its coordinates on all previous frames, as well as on all next frames. That is fantastic! However, we believe (while no code is available) this pixel should be a good feature to track, e.g. corner, not from single-color or low-textured areas.

Two things to note:

The first thing is that this algorithm is intended to work on the whole video all at once. It runs an optimization process given all video frames, so it is not designed for real-time tracking. However, it might be useful for the analysis of surveillance videos or sports analytics.

Second thing: it needs to run an external algorithm for supervision to perform the track optimization process. The authors compute the RAFT optical flow for all frame pairs before the optimization.

Idea

The authors propose to map each object pixel from every frame to a single common 3D space (the canonical 3D volume G in the paper). Thus one point in this space corresponds to a 3D trajectory across time. They use two algorithms to create this 3D space:

NeRF to model the volume density and color prediction.

Invertible neural networks  capture the camera parameters and scene motion from different frames.

The algorithm overview is shown in the picture below:

Two algorithms to create the 3D space

The loss functions are flow loss, photometric loss, and penalty for large displacement between consecutive 3D point locations to ensure temporal smoothness of the 3D motion. OmniMotion compares tracking results on the target TAP-Vid benchmark and shows an impressive improvement.

Object tracking remains one of the hard computer vision problems. This paper significantly advances the state-of-the-art using complex but elegant algorithms. It is a great research paper - and we are looking forward to trying it live!

Paper: https://arxiv.org/pdf/2306.05422.pdf

Let's discuss your project

Book a complimentary consultation

Read also

October 17, 2024

Ethics in artificial intelligence and computer vision

Smart machines are making decisions in people's life death and taxes, — and let's be honest, they don't always do it well or transparently
October 4, 2024

Artificial intelligence and computer vision in education

How smart machines make learning easier and cheating harder
September 27, 2024

Robotics and agriculture

Let's explore how AI for agriculture improving the lives of animals, plants, and humans alike