Georgy Dyuldin
Developer at OpenCV.ai
Anzhella Pankratova
Content Author at OpenCV.ai
Depth Estimation: Revolutionizing Photography and AR with iPhone's LiDAR Technology

Depth Estimation Technology in iPhones

The article examines the iPhone's LiDAR technology, detailing its use in depth measurement for improved photography, augmented reality, and navigation. Through experiments, it highlights how LiDAR contributes to more engaging digital experiences by accurately mapping environments.
April 4, 2024

Introduction

Today's smartphones are not just a means of communication; they're our access to the world. iPhones have evolved into sophisticated devices with many features, but one you may not know about is their ability to measure depth accurately. LiDAR depth measurement on the iPhone opens the door to advanced photography, augmented reality apps, and improved navigation, making our interactions with the digital and real worlds more engaging.

These capabilities are enabled by LiDAR technology, which is more typically associated with the bulky sensors on Google Street View cars than with a device that fits in your pocket. Curious how depth measurement works on your iPhone? In this article, we walk through several experiments we conducted to understand how the iPhone's depth technology, specifically LiDAR, works.

1. How Depth Is Estimated

To understand how the iPhone measures depth, let's start by explaining the general depth estimation methods. Don't worry if this overview seems brief - it's intended to give you a foundational understanding before diving deeper.

Depth refers to the distance between a sensor or camera and the objects it's observing or measuring in its environment. It's how far away things are from the device's point of view. Understanding depth is crucial for accurately interpreting and interacting with the three-dimensional (3D) world.

Depth estimation can be achieved through various technologies, each with its unique approach.

1.1 Stereo Vision

The stereo vision method for depth estimation mimics human binocular vision, leveraging the minor differences in the images captured by two cameras placed a short distance apart, similar to how our eyes see the world from different angles. This method falls into the category of passive sensing techniques because it does not require the emission of light to measure depth; instead, it relies on analyzing visual information that naturally reaches the cameras.

How Stereo Vision Works

1. Capture: Two cameras (or a stereo camera with two lenses) take images of the same scene from slightly different viewpoints.

2. Correspondence Matching: The system identifies similar features (edges, corners, or textured areas) in both images. By comparing these features' locations across the two images, the system can determine each feature's disparity, which is the difference in its position between the left and right images.

3. Triangulation: Triangulation is then used to calculate the distance to each matched feature. The geometry of the setup is crucial here: knowing the distance between the two cameras (the baseline) and the disparity of each feature, one can apply trigonometry to determine the distance from the cameras to each point of interest in the scene. For rectified cameras this reduces to Z = f · B / d, where Z is the depth, f the focal length in pixels, B the baseline, and d the disparity in pixels, so depth is inversely proportional to disparity.

4. Depth Map Creation: By repeating this process for numerous points across the scene, the system constructs a depth map, representing the scene where each value corresponds to the distance from the sensor to objects at different points in the visual field.
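The correspondence and triangulation steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production matcher: it assumes rectified images (so a feature stays on the same row in both views), uses a simple sum-of-absolute-differences (SAD) window search for one pixel, and the function names and parameters (`half_window`, `max_disparity`, `focal_px`, `baseline_m`) are our own hypothetical choices.

```python
import numpy as np


def match_disparity(left_row, right_row, x, half_window, max_disparity):
    """Disparity (pixels) of pixel x in the left row, found by SAD block matching.

    Assumes rectified images: a point at column x in the left image appears at
    column x - d in the right image, with d >= 0.
    """
    left_row = np.asarray(left_row, dtype=np.float64)
    right_row = np.asarray(right_row, dtype=np.float64)
    patch = left_row[x - half_window : x + half_window + 1]
    best_d, best_cost = 0, np.inf
    for d in range(max_disparity + 1):
        xr = x - d
        if xr - half_window < 0:  # candidate window would leave the image
            break
        candidate = right_row[xr - half_window : xr + half_window + 1]
        cost = np.abs(patch - candidate).sum()
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Triangulated depth Z = f * B / d; zero disparity means a point at infinity."""
    if disparity_px <= 0:
        return float("inf")
    return focal_px * baseline_m / disparity_px
```

Repeating `match_disparity` for every pixel and converting with `depth_from_disparity` yields a depth map. Real systems add subpixel refinement, smoothness constraints, and left-right consistency checks; OpenCV's `cv2.StereoSGBM_create` is a common practical starting point.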

In addition, active stereo vision adds its own light source. Unlike passive stereo, which relies only on existing ambient light, active stereo projects a structured light pattern, such as dots or lines, onto the scene. The projected pattern enriches the scene with identifiable features and makes it easier to match them between the two camera views, improving depth estimation even when natural textures or lighting conditions are poor.

1.2 Active Infrared Pattern

The active infrared pattern method for depth estimation uses an infrared light source to project a known pattern (such as dots, lines, or grids) onto a scene. This approach belongs to the broader category of structured light systems, which add texture to the environment with patterns that are invisible to the human eye but detectable by specialized cameras.

How It Works

1. Projection: An infrared projector emits a specific pattern onto the scene. The choice of pattern is crucial: the way it distorts across different surface orientations and distances must carry enough information to recover the scene's geometry.

2. Detection: An infrared camera, sensitive to the same wavelength as the projector, captures the distorted pattern. Since this pattern is known and controlled, any deviations from its expected geometry can be attributed to the scene's three-dimensional structure.

3. Depth Calculation: By analyzing the distortions in the captured pattern, the system can calculate the depth of each point in the scene. The process involves comparing the known pattern with its captured, distorted version and using the differences to estimate how far each part of the pattern is from the projector and camera setup.
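The depth calculation step can be illustrated with a simplified one-row sketch. We do not know how Apple's TrueDepth pipeline decodes its pattern; the model assumed here is one common structured-light formulation, in which the observed pattern is compared with a reference image of the same pattern recorded on a flat plane at a known depth Z0, and the local shift between them encodes depth. All names and the sign convention are hypothetical.

```python
import numpy as np


def pattern_shift(reference_row, observed_row, max_shift):
    """Integer shift (pixels) that best aligns the reference pattern with the observed one.

    Uses zero-mean cross-correlation over candidate shifts. np.roll wraps around
    at the edges, which is a simplification for this sketch.
    """
    ref = np.asarray(reference_row, dtype=np.float64)
    obs = np.asarray(observed_row, dtype=np.float64)
    obs_centered = obs - obs.mean()
    best_shift, best_score = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        shifted = np.roll(ref, s)
        score = float(np.dot(shifted - shifted.mean(), obs_centered))
        if score > best_score:
            best_shift, best_score = s, score
    return best_shift


def depth_from_shift(shift_px, focal_px, baseline_m, ref_depth_m):
    """Depth (m) from a pattern shift measured against a reference plane at ref_depth_m.

    Assumed model: shift = f * b * (1/Z - 1/Z0), so a positive shift means the
    surface is closer than the reference plane. The sign depends on the layout
    of projector and camera.
    """
    inv_depth = 1.0 / ref_depth_m + shift_px / (focal_px * baseline_m)
    return 1.0 / inv_depth
```

For example, with a hypothetical focal length of 600 px, a 7.5 cm projector-camera baseline, and a reference plane at 1 m, a shift of 45 px corresponds to a surface at 0.5 m. A real decoder works on 2D windows around each dot rather than whole rows.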

The main advantages of active IR pattern systems are that the infrared pattern does not affect the color camera image and that they work a little better in haze. This technology is also particularly effective indoors and at close range, in applications such as gesture recognition, object scanning, and augmented reality, where detailed depth information is critical. However, bright sunlight carries substantial power in the infrared part of the spectrum, which can wash out the projected pattern and reduce the system's effectiveness.

1.3 Time of Flight (ToF), including LiDAR

Time of Flight (ToF) technology, including its advanced form known as LiDAR (Light Detection and Ranging), measures depth by calculating the time it takes for a light signal to travel to an object and back to the sensor. This approach offers a direct and highly accurate way to determine the distance to objects in a scene, enabling detailed 3D mapping and spatial understanding. LiDAR is a specific type of ToF sensor that uses pulsed laser beams to measure distances. It can capture fine detail over long ranges and is used in applications ranging from autonomous vehicles and drones to environmental mapping and archaeology.

How ToF Works

1. Emission: The ToF sensor emits a light signal, often in the form of infrared or laser light, towards the scene.

2. Reflection: The light hits objects in the scene and is reflected to the sensor.

3. Measurement: The sensor measures the time it takes for the light to return. Since the speed of light is constant (approximately 299,792 kilometers per second in a vacuum), this "time of flight" gives the distance directly: the light covers the sensor-to-object distance twice, so distance = speed of light × time / 2.

4. Depth Map Creation: By repeating this process across the scene, the ToF sensor can generate a depth map representing the distance from the sensor to objects at each point in the image.
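The arithmetic behind the measurement step is simple, but the numbers show why ToF sensors need extremely precise timing. The sketch below covers the pulsed case described above (many ToF cameras instead infer the delay from the phase shift of modulated light, which is not shown); the function names are our own.

```python
import numpy as np

SPEED_OF_LIGHT_M_S = 299_792_458.0  # exact value in a vacuum, metres per second


def round_trip_time(distance_m):
    """Time (s) for light to reach an object at distance_m and return to the sensor."""
    return 2.0 * np.asarray(distance_m, dtype=np.float64) / SPEED_OF_LIGHT_M_S


def distance_from_round_trip(t_s):
    """Distance (m) from measured round-trip times; works on scalars or whole depth maps."""
    # The light covers the sensor-object distance twice, hence the division by two.
    return SPEED_OF_LIGHT_M_S * np.asarray(t_s, dtype=np.float64) / 2.0


if __name__ == "__main__":
    print(f"1 m  -> {round_trip_time(1.0) * 1e9:.3f} ns round trip")
    print(f"1 mm -> {round_trip_time(0.001) * 1e12:.3f} ps round trip")
    times = round_trip_time(np.array([[0.5, 1.0], [2.0, 3.0]]))
    print(distance_from_round_trip(times))  # a tiny 2x2 "depth map"
```

An object 1 m away returns the pulse after about 6.7 ns, and resolving 1 mm of depth means resolving a difference of about 6.7 ps, which is why ToF hardware relies on specialized timing circuitry.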

The ToF method provides highly accurate distance measurements, even over long ranges or in complex environments. It is also capable of providing real-time data, making it suitable for applications requiring immediate spatial analysis, such as navigation and obstacle avoidance in autonomous systems.

However, advanced ToF and LiDAR systems, especially those offering high precision over long distances, can be expensive compared to other depth-sensing technologies. In addition, the performance of ToF and LiDAR can be influenced by extreme lighting conditions (e.g., direct sunlight) and atmospheric conditions (e.g., fog or rain) that affect light travel.

2. Depth in iPhone

On iPhone Pro models, depth is measured by two active sensors: the LiDAR Scanner on the back and the TrueDepth camera on the front. These sensors are crucial for mapping environments, capturing depth on a per-pixel basis.

The iPhone creates its depth maps using the LiDAR scanner, TrueDepth camera, and Scene Geometry. The LiDAR scanner measures distances by timing light reflections; the TrueDepth camera uses an infrared pattern to enhance depth perception. Scene Geometry constructs a 3D scene layout, integrating this data for precise spatial understanding.

Let's explore in detail how the iPhone achieves high-quality depth mapping step by step.

1. Depth Measurement with Light Travel: Both the LiDAR Scanner and the TrueDepth camera on iPhones use light to measure depth. The LiDAR Scanner does this by emitting light pulses and measuring the time they take to bounce back from objects. This "time of flight" measurement gives a precise distance from the sensor to each object, creating a depth map of the scene.

2. TrueDepth Camera and Pattern Projection: The TrueDepth camera complements depth measurement by projecting an infrared pattern. It then observes how this pattern is distorted by objects in front of it. This distortion helps the system calculate the distance to various points in the scene, enhancing the depth map created by the LiDAR Scanner.

3. Scene Geometry Contribution: Scene Geometry further refines the depth data by creating a 3D mesh of the environment. This mesh is generated using the detailed depth information from the LiDAR Scanner and the TrueDepth camera. It represents the shapes and layout of the surroundings, allowing for more accurate and realistic interactions between the digital and physical worlds.

Together, these steps enable iPhones to have a sophisticated understanding of space, allowing various applications from augmented reality to more accurate spatial measurements. Scene Geometry enhances performance by interpreting the depth data to create a comprehensive 3D environment model.
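ARKit builds the Scene Geometry mesh internally, and we do not know its exact algorithm. A step that any such reconstruction shares, though, is lifting per-pixel depth into 3D points. The sketch below assumes a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy` in pixels, depth in metres, and NaN or non-positive values for missing depth. It uses an x-right, y-down, z-forward convention, which may differ from ARKit's own camera space. If the intrinsics refer to a higher-resolution color image, they must be rescaled to the depth map's size first; the helper names are hypothetical.

```python
import numpy as np


def backproject(depth, fx, fy, cx, cy):
    """Lift an (H, W) depth map in metres to an (N, 3) array of camera-space points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column and row indices
    z = np.where(np.isfinite(depth), depth, 0.0)  # treat NaN/inf as missing
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # keep only pixels with a valid depth


def scale_intrinsics(fx, fy, cx, cy, src_size, dst_size):
    """Rescale intrinsics from a (width, height) src_size image to dst_size."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    return fx * sx, fy * sy, cx * sx, cy * sy
```

The resulting point cloud is the usual input to meshing methods such as TSDF fusion or Poisson reconstruction; which of these, if any, resembles Apple's approach is not something we can confirm.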

For more detailed insights, you can visit the Apple Developer Documentation and Apple's ARKit documentation.

3. iPhone Depth Experiments: Possibilities and Limitations

Now we move on to the experimental part of this article. Having covered how the iPhone measures depth with the LiDAR Scanner and TrueDepth camera, let's examine in practice how depth measurement differs between the rear and front cameras.

We pay particular attention to the patterns these technologies project. In our first experiment, we captured and compared the patterns emitted by the front and rear cameras. All experiments use an iPhone 14 Pro.

3.1 LiDAR Pattern of Emission

We used an astro camera, which captures both visible and near-infrared light, to record the IR emission from the sensors. The camera and the iPhone were both fixed on tripods.

Figure: IR emission patterns (panels: front camera, rear camera)

The patterns projected by the iPhone's front and rear cameras differ notably. As part of the TrueDepth system, the front camera projects a dense grid of infrared dots, resulting in a detailed pattern. This density is vital for facial recognition and precise depth measurement of close-up objects.

In contrast, the rear LiDAR sensor emits a simpler, sparser pattern. This pattern is optimized for quickly measuring larger areas, which suits augmented reality applications that need to map environments rather than capture intricate detail. It is designed to maintain accuracy over greater distances and integrates with the iPhone's camera system to provide comprehensive scene analysis.

Together, these systems illustrate Apple's dual approach to depth sensing: one finely tuned for close interaction, the other for the broader world around us.

IR emission from the rear LiDAR is much stronger: we had to turn off the room lights to capture the front camera's pattern. Note the difference in illumination on the iPhone and notebook surfaces.

3.2 iPhone's Depth Technology Exploration

The distinct patterns we observe between the iPhone's front and rear cameras—the dense, intricate grid on the front and the sparser, broader pattern on the back—significantly influence their operational use cases. To understand the practical impact of these differences, we'll evaluate how they perform in various real-world scenarios.

3.2.1 Estimate Noise Level of Depth

First, we examine how temporal depth noise appears in the data captured by ARKit with the rear and front cameras. For this experiment, we fixed the iPhone on a tripod and captured a sequence of depth frames of a static scene with each camera.

Despite its much sparser emission pattern, the rear camera produces quite refined depth data.

As described in Apple's documentation, "The colored RGB image from the wide-angle camera and the depth readings from the LiDAR scanner are fused together using advanced machine learning algorithms to create a dense depth map that is exposed through the API."

Depth from the front camera looks unprocessed or only lightly processed. Some areas (white in the images) have no depth value.

Figure: static-scene depth captures (panels: front camera, rear camera)

The standard deviation map for the rear camera shows less variation across the scene than the front camera's, suggesting that in this scenario the rear LiDAR produces more uniform depth data with potentially less noise. For applications that require broader environmental mapping, the rear LiDAR may therefore offer more consistent data.

Note also that the rear camera's depth map is considerably smaller than the front camera's: 256x192 versus 640x480, which is 40% of the width and height, or roughly one-sixth as many pixels. Even so, the rear depth maps remain detailed, which suggests an efficient representation of the depth information.

Figure: boxplots of depth standard deviation versus distance (panels: front camera, rear camera)

The boxplot of errors shows how the standard deviation of distance measurements changes at different distances, providing a clear view of how precise the depth estimations are. The use of a logarithmic scale on the vertical axis makes it easier to compare errors across a wide range of values.
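The per-pixel statistics behind the standard deviation maps and boxplots can be computed in a few lines. This sketch assumes the captured frames are stacked into a (T, H, W) NumPy array in metres with NaN for missing pixels (if an export uses 0 for missing depth, convert it to NaN first); the function names and the binning scheme are our own, not taken from our capture tooling.

```python
import warnings

import numpy as np


def temporal_stats(frames):
    """Per-pixel mean and standard deviation over a (T, H, W) stack; NaN marks invalid depth."""
    stack = np.asarray(frames, dtype=np.float64)
    with warnings.catch_warnings():
        # Pixels that are invalid in every frame would otherwise emit RuntimeWarnings.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        mean = np.nanmean(stack, axis=0)
        std = np.nanstd(stack, axis=0)
    return mean, std


def std_by_distance(mean, std, bin_edges):
    """Group per-pixel standard deviations by mean distance, one array per bin."""
    valid = np.isfinite(mean) & np.isfinite(std)
    m, s = mean[valid], std[valid]
    return [s[(m >= lo) & (m < hi)] for lo, hi in zip(bin_edges[:-1], bin_edges[1:])]
```

Passing the returned list to matplotlib's `plt.boxplot` and calling `plt.yscale("log")` produces the kind of distance-binned, log-scaled plot described above.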

Comparing the front and rear boxplots, the rear camera's variability stays at a similar level across the measured distances: its standard deviation remains steady even at farther ranges.

However, it's important to note that this experiment focuses on temporal consistency, meaning the rear camera shows less variation over time but not necessarily greater accuracy. The aim here was to observe how depth measurements from the same static scene vary across different frames.

3.2.2 Estimate Min and Max Value

In our next experiment to determine the range of depth measurement capabilities of the iPhone's ARKit, we captured a scene containing objects at varying distances. We analyzed the captured data to determine the minimum and maximum depth values detected by the sensors.

Our findings are as follows:

For the ARKit front sensor, the minimum measurable distance was approximately 0.22 meters (8.66 inches), and the maximum was around 8.09 meters (318.50 inches).

For the ARKit rear sensor, the minimum was about 0.26 meters (10.24 inches), and the maximum was roughly 6.42 meters (252.76 inches).

Note that we did not test the closest possible range, so the true lower bound may be lower than the values we captured.

On the other hand, the front sensor's maximum slightly exceeds the estimated size of the test room, suggesting that its range covers a typical indoor environment.

The rear sensor's shorter maximum range is unexpected. It may reflect the sensor's optimization for different kinds of spatial interaction and object placement in augmented reality applications.
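Estimating the measurable range boils down to finding the extreme valid depth values across the captured frames. A minimal sketch, assuming invalid pixels are stored as NaN or non-positive values; the optional percentile bounds are our addition for discarding isolated outlier pixels, not part of the experiment as described.

```python
import numpy as np


def depth_range(frames, low_pct=0.0, high_pct=100.0):
    """(min, max) of valid depth values; percentiles other than 0/100 trim outliers."""
    values = np.asarray(frames, dtype=np.float64).ravel()
    finite = values[np.isfinite(values)]
    valid = finite[finite > 0]  # non-positive depth is treated as missing
    if valid.size == 0:
        raise ValueError("no valid depth values")
    return float(np.percentile(valid, low_pct)), float(np.percentile(valid, high_pct))
```

With the defaults this returns the plain minimum and maximum; calling it with, say, `low_pct=0.1, high_pct=99.9` makes the estimate less sensitive to a handful of spurious pixels.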

3.2.3 Estimate Depth Accuracy

We've observed that Apple employs post-processing to refine depth data captured by iPhone sensors. This has implications for accuracy, particularly in detecting planes—a key task in augmented reality and 3D modeling. Depth accuracy is crucial for ensuring that virtual objects placed on real-world surfaces appear stable and correctly sized. Let's explore what this post-processing means for the precision of plane detection using depth data from the iPhone's cameras.

Measuring absolute depth accuracy is difficult, so we used a proxy: we captured a flat wall, estimated the wall's surface, and computed the mean absolute error (MAE) between that surface and the points from the depth map. MAE values:

~1 mm for the rear camera

~7 mm for the front camera

The distance from the iPhone to the wall was about 70 cm in both cases.
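We do not detail here exactly how the wall surface was estimated; a common choice, assumed in this sketch, is a least-squares plane fit via SVD on the 3D points back-projected from the depth map, followed by the mean absolute point-to-plane distance. The function names are hypothetical, and for captures with outliers a robust fit such as RANSAC would be a reasonable substitute.

```python
import numpy as np


def fit_plane(points):
    """Least-squares plane through (N, 3) points: returns (centroid, unit normal)."""
    centroid = points.mean(axis=0)
    # The right singular vector with the smallest singular value is the plane normal.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    return centroid, vt[-1]


def plane_mae(points):
    """Mean absolute distance (same units as points) from each point to the fitted plane."""
    points = np.asarray(points, dtype=np.float64)
    centroid, normal = fit_plane(points)
    return float(np.abs((points - centroid) @ normal).mean())
```

Applied to wall points in metres, a result of 0.001 corresponds to the ~1 mm figure reported for the rear camera. Note that this measures flatness, not absolute distance accuracy.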

3.2.4 Depth Sensing in Varied Lighting Conditions

The adaptability of the iPhone's depth-sensing technology to various lighting conditions is crucial for its effectiveness across a wide range of applications. In this experiment, we explore how the iPhone's LiDAR and TrueDepth systems perform in both daylight and nighttime conditions, specifically examining the impact of direct sunlight on the projector and the challenges of low-light environments.

As expected, the rear camera works better in well-lit scenes. At night, the missing RGB information makes the results less accurate, since the rear depth map is produced by fusing the color image with the LiDAR readings.

The front camera's depth map is less sensitive to low light, but under direct sunlight it cannot determine depth at distances beyond about 1.5 m.

3.2.5 Detail Resolution of iPhone's LiDAR Sensors

The resolution of depth sensors like LiDAR is a crucial metric that determines how finely the system can resolve details in its environment. In this experiment, we assess the resolution of both the iPhone's front and rear depth sensors, examining how it changes as the distance to the observed objects increases.

The test target has features of 1 cm, 3 cm, and 5 cm, and an outer diameter of 12 cm.

Figure: front camera depth maps of the target at 0.3 m, 1 m, and 3 m

To compare the depth data from the iPhone's front and rear cameras at 0.3 m, 1 m, and 3 m, we focus on four key aspects:

1. Resolution and Detail:

1.1 Front Camera: The front camera's depth map showed a high level of detail at 0.3 meters. At 1 meter, the detail decreased, but objects remained distinct. At 3 meters, the detail further decreased.

1.2 Rear Camera: The rear camera's depth map at 0.3 meters also showed a high level of detail. At 1 meter, detail was preserved to some extent, though some definition was lost. At 3 meters, resolution dropped further.

2. Depth Accuracy:

2.1 Front Camera: The front camera's depth accuracy appeared very high at close range, suitable for applications like facial recognition that require fine detail. At 1 meter and beyond, accuracy on small details diminished.

2.2 Rear Camera: The rear camera's depth accuracy at 0.3 meters was suitable for broader AR applications. At increased distances, the depth map showed a more significant degradation in accuracy.

3. Depth Map Consistency:

3.1 Front Camera: The consistency across the depth map was strong at close distances but started to display irregularities at 3 meters.

3.2 Rear Camera: Similarly, there was consistent depth mapping at closer distances, with inconsistencies becoming more apparent at 3 meters.

4. Overall Impressions:

4.1 The front camera's depth sensor is optimized for short-range interactions, delivering higher resolution and more detailed depth maps at closer distances.

4.2 The rear camera's depth sensor can handle a wider range of distances but is optimized for medium-range use, as seen in its ability to maintain detail at 1 meter compared to the front camera.

4.3 At 3 meters, both sensors show a reduction in detail resolution and depth accuracy, which is expected as distance challenges the technology's limits.
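The loss of detail with distance follows largely from geometry: a fixed number of depth pixels spans a scene width that grows linearly with distance. The sketch below estimates the footprint of one depth pixel using the depth map widths from our tests (640 px front, 256 px rear) and a hypothetical 60° horizontal field of view, which is not an Apple-published figure. The rear depth map is also ML-fused with the RGB image, so its effective resolution need not match its pixel grid.

```python
import math

# Hypothetical horizontal field of view, chosen only for illustration.
ASSUMED_HFOV_DEG = 60.0


def pixel_footprint_m(distance_m, hfov_deg, width_px):
    """Width of scene (m) covered by one depth pixel at distance_m (pinhole model)."""
    scene_width = 2.0 * distance_m * math.tan(math.radians(hfov_deg) / 2.0)
    return scene_width / width_px


if __name__ == "__main__":
    for width_px in (640, 256):
        for d in (0.3, 1.0, 3.0):
            mm = pixel_footprint_m(d, ASSUMED_HFOV_DEG, width_px) * 1000
            print(f"{width_px} px wide, {d} m: one depth pixel spans about {mm:.1f} mm")
```

Under these assumptions, a 256-pixel-wide map at 3 m has pixels roughly 1.4 cm across, so the target's 1 cm feature falls below a single pixel, consistent with the loss of fine detail we observed at that distance.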

4. Applications

Image Source: Apple Vision Pro

Integrating depth-sensing technology directly into the iPhone transforms the device into a powerful tool capable of many practical applications. Here's a closer look at the potential applications:

4.1 Object Scanning

The front camera's depth sensor is ideal for high-fidelity object scanning thanks to its close-range accuracy. This capability can be valuable in fields like digital archiving, where artifacts can be captured in three dimensions, or in retail, where products can be scanned for virtual try-ons that enhance the e-commerce experience.

4.2 Spatial Reconstruction

Another area where the iPhone's depth sensing shines is reconstructing rooms and surfaces. Apple's own applications set a benchmark in this domain, letting users capture the dimensions of their surroundings with ease. This technology can transform interior design, where precise room measurements are crucial, and could also help architects and builders with preliminary surveys.

4.3 Measurements with Relative Precision

For tasks that require a degree of measurement precision, such as DIY home improvement projects or professional on-site evaluations, the iPhone's depth sensors can provide quick and relatively accurate estimations. This eliminates the need for bulky traditional measuring tools, offering convenience and efficiency.

4.4 Augmented Reality Experiences

The rear camera excels at providing environmental depth data that is critical for AR experiences. This allows developers to create immersive games and educational apps that integrate seamlessly with the user's surroundings, offering a more engaging and interactive experience.

5. Conclusion

We ran a series of experiments to explore the iPhone's depth capabilities and compared the front and rear cameras. Our tests are not exhaustive, and we encourage you to draw your own conclusions. Apple has developed a powerful technology by taking an unconventional approach.

In summary, building the iPhone's depth-sensing features into apps presents both opportunities and challenges. The precision of close-range depth data enables detailed object scanning and room mapping, which is helpful for a range of practical applications. Accuracy diminishes at greater distances, but this limitation gives developers a clear boundary to design around.

The iPhone's capabilities are sufficient for tasks like quick measurements or augmented reality that can accommodate a margin of error. However, applications requiring long-range precision must account for the current technological constraints.

As this technology progresses, it may expand the possibilities of how we interact with our environment through our devices. For now, the effective use of depth-sensing is about understanding and applying its strengths within the limits of its range.
