Ekaterina Romanova
Developer at OpenCV.ai
Integrating MediaPipe with Python: A Comprehensive Guide

A look inside MediaPipe solutions with Python

MediaPipe is an open-source framework that brings real-time machine learning to video and audio processing on a wide range of devices. This article guides you through accessing intermediate results, declaring new graph outputs, and customizing solutions via the Python API, all the way down to the underlying C++ code. Unveil the data your pipelines keep hidden and elevate your media projects with MediaPipe.
April 10, 2024

What is MediaPipe Python?

MediaPipe Python is an open-source, cross-platform framework for building machine learning pipelines that process sequential data such as video and audio, and for deploying them on a wide range of target devices.

MediaPipe empowers your application with state-of-the-art machine learning algorithms that run at real-time speed on edge devices, accessible through low-code APIs or a no-code studio builder.

You are free to use any pre-built solution as a "black box" or to customize it to your needs. You can even fully reimplement the algorithm. This article will help you do exactly that, as it shows how to dive as deep inside a solution as you want.

Let's figure out how to access any intermediate result inside the solution graph from the Python API, using an official MediaPipe solution as our example: mediapipe.solutions.pose.Pose.

The very first step is to build the MediaPipe Python package from its source code.

Building the MediaPipe Python package from source code

To achieve our goal of building the MediaPipe Python package from its sources, we follow the official build instructions.

It should be as easy as:

1. clone the source code

2. install all the dependencies and build tools

3. build the Python wheel / install the package into a virtual environment (a minimal command sequence is sketched below).
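On Linux, assuming you have already installed Bazel and the other dependencies from the official instructions, the whole sequence might look roughly like this (the --link-opencv flag matches the rebuild command used later in this article; drop or adjust it depending on how your OpenCV is installed):

# Clone the MediaPipe sources.
git clone https://github.com/google/mediapipe.git
cd mediapipe

# Build the package and install it into the active virtual environment,
# linking against a locally installed OpenCV.
python3 setup.py install --link-opencv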

However, sometimes you get the following error while building the MediaPipe Python package:


mediapipe/tasks/cc/components/processors/proto/detection_postprocessing_graph_options.proto:39:12:
Explicit 'optional' labels are disallowed in the Proto3 syntax. To define 'optional' fields in Proto3,
simply remove the 'optional' label, as fields are 'optional' by default.

To fix it, you should change the file mediapipe/tasks/cc/components/processors/proto/detection_postprocessing_graph_options.proto:

1. find the line containing the text syntax = "proto3"; (for me it was line 16)

2. change it to syntax = "proto2"; and try again.
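If you prefer to make the change from the shell, a one-line substitution does the same thing (run from the repository root; this assumes GNU sed):

sed -i 's/syntax = "proto3";/syntax = "proto2";/' \
    mediapipe/tasks/cc/components/processors/proto/detection_postprocessing_graph_options.proto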

Note: If you skipped some steps and jumped directly into building the package, you may keep getting annoying error messages even after going back and following all the instructions. One possible solution is to execute the bazel clean --expunge command and rebuild the package. Sometimes even this is not enough, and your best option is to delete your MediaPipe copy and start the process from the beginning.

Note: you may have to manually fix the __init__.py file of the built MediaPipe package after each rebuild (delete the duplicated code).

Accessing intermediate results

There are several possible situations you may get into while trying to extract intermediate processing results from a MediaPipe solution:

intermediate results are exposed from the graph, but not passed to the Python code;

you want to expose graph node inputs/outputs as new outputs of the graph and the Python code;

you want to print some information from inside the C++ code of the graph nodes.

Let's take a look at all these situations one by one.

Accessing outputs that are already declared in the calculation graph

We will use the Pose solution from MediaPipe as the example for our discussion. The Pose solution is interesting not only because it is a practically applicable pipeline (as the other MediaPipe solutions are), but also because it holds much more information inside than it exposes to the outside.

Let's find out what it keeps hidden from us!

1. Find the calculation graph of your solution.

- Look at the mediapipe.solutions.pose.Pose class code (in the installed package) and find the path to the binary file with the graph: _BINARYPB_FILE_PATH = mediapipe/modules/pose_landmark/pose_landmark_cpu.binarypb

- Look at the source file from which this binary file was generated. It is located in the source code under the same path but with the .pbtxt extension.

- Here you can find the outputs that you get from this graph. They are declared as follows:

output_stream: "LANDMARKS:pose_landmarks"

For mediapipe/modules/pose_landmark/pose_landmark_cpu.pbtxt we have 6 outputs:


output_stream: "LANDMARKS:pose_landmarks"
output_stream: "WORLD_LANDMARKS:pose_world_landmarks"
output_stream: "SEGMENTATION_MASK:segmentation_mask"
output_stream: "DETECTION:pose_detection"
output_stream: "ROI_FROM_LANDMARKS:pose_rect_from_landmarks"
output_stream: "ROI_FROM_DETECTION:pose_rect_from_detection"

Only 3 of them (pose_landmarks, pose_world_landmarks, and segmentation_mask) are available by default as attributes of the mediapipe.python.solution_base.SolutionOutputs object returned by the process method of the mediapipe.solutions.pose.Pose class.
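For reference, a minimal sketch of calling the solution and reading those default attributes (this assumes opencv-python is installed and your_image.jpg exists):

import cv2
import mediapipe as mp

# Run the Pose solution on a single image.
with mp.solutions.pose.Pose(static_image_mode=True) as pose:
    image = cv2.imread('your_image.jpg')
    results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

print(results.pose_landmarks)        # available by default
print(results.pose_world_landmarks)  # available by default
print(results.segmentation_mask)     # None unless enable_segmentation=True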

2. Add graph outputs to the Python object with the results.

To make the SolutionOutputs object contain the other outputs, you should modify the file mediapipe/python/solutions/pose.py: change the outputs parameter in the call of the superclass __init__ method in the constructor of the Pose class.

Extend the list of output names (the part after the ":") with the ones you want to access, as sketched below.
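For orientation, the modified call could look roughly like this (a sketch only: the other arguments passed to super().__init__ stay exactly as they are in your MediaPipe version, and their exact names may differ between releases):

super().__init__(
    binary_graph_path=_BINARYPB_FILE_PATH,
    # side_inputs and calculator_params are kept as in the original file.
    outputs=[
        'pose_landmarks', 'pose_world_landmarks', 'segmentation_mask',
        # Extra streams already declared in pose_landmark_cpu.pbtxt:
        'pose_detection',
        'pose_rect_from_landmarks',
        'pose_rect_from_detection',
    ])

After this change the extra streams appear as attributes of the returned object, e.g. results.pose_detection.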

Great, now we know how to get additional information from the computation graph in Python when the graph itself already provides it. But what should we do when the graph has no such outputs?

Declaring new outputs in the calculation graph

We continue working with the MediaPipe Pose Estimation solution. Our goal now is slightly different: we are trying to debug the inner state of the graph, and to do that we'd like to see the inputs of one of the nodes (or the outputs of another one). Unfortunately, these are not graph outputs, so we cannot just add them to the outputs list in the Python code. However, we are still able to get these results; we just need to go deeper inside MediaPipe to achieve it. Let's start:

1. You can declare any output of any node of the graph as an output of the whole graph.

For example, the mediapipe/modules/pose_landmark/pose_landmark_cpu.pbtxt graph has the following node:


node {
  calculator: "PoseLandmarkByRoiCpu"
  input_side_packet: "MODEL_COMPLEXITY:model_complexity"
  input_side_packet: "ENABLE_SEGMENTATION:enable_segmentation"
  input_stream: "IMAGE:image"
  input_stream: "ROI:pose_rect"
  output_stream: "LANDMARKS:unfiltered_pose_landmarks"
  output_stream: "AUXILIARY_LANDMARKS:unfiltered_auxiliary_landmarks"
  output_stream: "WORLD_LANDMARKS:unfiltered_world_landmarks"
  output_stream: "SEGMENTATION_MASK:unfiltered_segmentation_mask"
}

To get the node output as the graph output, we should change the graph description. We modify the same file, mediapipe/modules/pose_landmark/pose_landmark_cpu.pbtxt, since it contains both the node and the graph description. At the top level of the file, we add the following line:


output_stream: "UNFILTERED_LANDMARKS:unfiltered_pose_landmarks"

Note: here the first part of the name (before the ":") is chosen by us, while the second part must already be declared among the outputs in the node description.

After that, we rebuild MediaPipe with python3 setup.py install --link-opencv.

Now we have a new graph output and can follow the steps described in the previous section to add the new output name to outputs in mediapipe/python/solutions/pose.py.

2. If you want to obtain an intermediate result of a node that is not declared as one of the node's outputs, the very first step is to find the declaration of that node. It might be declared in the same protobuf file as the graph, as we've already seen, or in another graph described in a separate .pbtxt file.

For example, in mediapipe/modules/pose_landmark/pose_landmark_cpu.pbtxt we have the following node:


# Detects poses.
node {
  calculator: "PoseDetectionCpu"
  input_stream: "IMAGE:image_for_pose_detection"
  output_stream: "DETECTIONS:pose_detections"
}

Imagine we want to get the input tensor of the pose detector network inference.

To do that, we should make some changes to the file mediapipe/modules/pose_detection/pose_detection_cpu.pbtxt.

Find the node with the required output there:


node: {
  calculator: "ImageToTensorCalculator"
  input_stream: "IMAGE:image"
  output_stream: "TENSORS:input_tensors"
  output_stream: "LETTERBOX_PADDING:letterbox_padding"
  options: {
    [mediapipe.ImageToTensorCalculatorOptions.ext] {
      output_tensor_width: 224
      output_tensor_height: 224
      keep_aspect_ratio: true
      output_tensor_float_range {
        min: -1.0
        max: 1.0
      }
      border_mode: BORDER_ZERO
      # If this calculator truly operates in the CPU, then gpu_origin is
      # ignored, but if some build switch insists on GPU inference, then we will
      # still need to set this.
      gpu_origin: TOP_LEFT
    }
  }
}

In the general case, to pass input_tensors to the main graph, we make several changes:

- declare the output stream at the top level of the sub-graph description in mediapipe/modules/pose_detection/pose_detection_cpu.pbtxt:


output_stream: "DET_INPUT_TENSORS:input_tensors"

- declare an additional output in the node that uses this sub-graph within the main graph, in mediapipe/modules/pose_landmark/pose_landmark_cpu.pbtxt:


node {
  calculator: "PoseDetectionCpu"
  input_stream: "IMAGE:image_for_pose_detection"
  output_stream: "DETECTIONS:pose_detections"
  output_stream: "DET_INPUT_TENSORS:det_input_tensors"
}

- add a graph output at the top level of the main graph (mediapipe/modules/pose_landmark/pose_landmark_cpu.pbtxt), as we did before:


output_stream: "DET_INPUT_TENSORS:det_input_tensors"

Note: the part of the name before the ":" must coincide between the top-level declaration in mediapipe/modules/pose_detection/pose_detection_cpu.pbtxt and the related node description in mediapipe/modules/pose_landmark/pose_landmark_cpu.pbtxt. This is how streams are matched.

But this particular example will not work yet. You will get an error saying that the new output has a type that MediaPipe cannot pass as an output. To fix this issue, add a node that converts the stream to an appropriate type:


node {
  calculator: "TensorsToFloatsCalculator"
  input_stream: "TENSORS:input_tensors"
  output_stream: "FLOATS:input_tensors_image"
}

Then use input_tensors_image as the output: output_stream: "DET_INPUT_TENSORS:input_tensors_image".

Printing any intermediate variable

You can also print any information in C++ code with std::cout or std::printf.

The latter helps when several calculators run in parallel: each printf call typically writes its message in one piece, so prints from different parts of the code do not get interleaved.
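For instance, a sketch of what this could look like inside a calculator's Process method (MyCalculator is a hypothetical name; depending on your MediaPipe version the status type may be mediapipe::Status instead of absl::Status):

#include <cstdio>

#include "mediapipe/framework/calculator_framework.h"

absl::Status MyCalculator::Process(mediapipe::CalculatorContext* cc) {
  // A single printf call is emitted in one piece, so messages from
  // calculators running in parallel do not get interleaved mid-line.
  std::printf("[%s] packet at timestamp %lld\n", cc->NodeName().c_str(),
              static_cast<long long>(cc->InputTimestamp().Value()));
  return absl::OkStatus();
}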

Launch example

Use a demo script to print all the added outputs, for example:

python demo.py -i your_image.jpg -o pose_detection unfiltered_pose_landmarks det_input_tensors
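The demo script itself is not part of MediaPipe; a minimal sketch of what such a demo.py could look like is shown below (the -i/-o argument names follow the command above; everything else here is an assumption):

import argparse

import cv2
import mediapipe as mp


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('-i', '--image', required=True,
                        help='Path to the input image.')
    parser.add_argument('-o', '--outputs', nargs='+', default=[],
                        help='Names of the solution outputs to print.')
    args = parser.parse_args()

    image = cv2.imread(args.image)
    # The Pose class must come from the rebuilt package with the extended
    # outputs list; otherwise the extra attributes will not exist.
    with mp.solutions.pose.Pose(static_image_mode=True) as pose:
        results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

    for name in args.outputs:
        # Every stream listed in `outputs` becomes an attribute of `results`.
        print(name, getattr(results, name, 'not available'))


if __name__ == '__main__':
    main()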

Conclusion

Discover how AI integration can enhance your projects in multiple areas, guided by OpenCV.ai's expertise in computer vision. Our team focuses on using Computer Vision Services to innovate and improve processes across different industries.
