In the first part of our "How To Budget For a Computer Vision AI Solution?" article series, we took a closer look at the details of hardware selection and its impact on the design and budgeting of a computer vision solution. It's a good idea to start there, as the hardware is the foundation of everything else. If you haven't read it yet, please do.
Having armed you with the knowledge of hardware selection, we are now moving on to another critical component – algorithm development.
Think of software as the brain of your project. It makes everything work as it should. Choosing the right software is crucial because it affects how well your project performs and how much it costs. Let's dive into how to make smart choices in software for your AI solution.
In this article, we'll dive into the key factors of software that influence your budget when building an AI project. We'll explore:
• Task Analysis: How to tackle the challenge of choosing the right AI method. We'll discuss balancing your budget, meeting deadlines, and task metrics.
• Data Collection: The costs of gathering the data you need, and finding the right mix of quality versus quantity.
• Data Annotation: Understanding the effort and expense involved in making your data usable for AI.
• Deep Learning Model Development: Choosing between various architectures, and deciding whether to build your models from scratch or use pre-made ones.
• Building the Pipeline: The complexities of piecing together all parts of a computer vision project, including preparing the data, refining the output, and ensuring everything works together smoothly.
Join us as we continue our exploration into the world of AI solution development, ensuring that you're equipped with comprehensive knowledge to make informed financial decisions.
Analyzing the problem is a crucial first step in developing any solution. While we touched on this when selecting hardware, ideally, problem analysis and hardware selection should happen together. Choosing the right AI method is about balancing several things at once: keeping costs under control, meeting deadlines, and maintaining the solution's quality.
Expert Guidance: Having an expert handle the first steps of looking at the problem and designing the solution is important. This expert can see potential issues ahead of time and help guide the project correctly from the beginning.
The customer's job is to prioritize: what matters more, speed, cost, or quality? This choice significantly influences the project's direction.
Setting Clear Expectations: It falls to the customer to set clear priorities and expectations. Understanding which factors are most important — be it faster implementation, cost savings, or the highest possible quality — guides the project's strategic decisions.
Before jumping into any solution, deciding on the right metrics is our first step. Why is this necessary?
Guiding Better Decisions: Metrics are the only way to measure which solution performs better. Every improvement we make is based on these metrics, ensuring we're always moving in the right direction. It's essential to compare like with like, meaning the same datasets, metrics, and measurement methods, to get accurate comparisons.
Alignment with Business Goals: The metrics we choose must closely align with the project's business objectives. For instance, if we're tracking objects, what matters is how accurately our system's tracks match real-world movements. The goal isn't just to pick a generic metric like detection mAP but to use metrics that reflect our specific needs. Often, multiple metrics are involved, each covering a different aspect of the project. Identifying the key metric(s) helps focus efforts, though balancing several important metrics can be more challenging. In these cases, setting clear benchmarks (e.g., an FPR not exceeding 0.0001) helps manage the complexity.
Flexibility in Metrics: Metrics aren't set in stone. If a metric no longer serves its purpose or fails to meet project demands, we're prepared to switch it out. It's important, however, to adjust any new metrics to match established standards to keep things consistent.
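To make the benchmark idea concrete, here is a minimal sketch (with made-up counts and a hypothetical `false_positive_rate` helper) of how two candidate solutions could be compared on the same test set against the FPR limit mentioned above:

```python
# Minimal sketch: comparing two candidate solutions on the same test set
# against a hard false-positive-rate (FPR) benchmark. The 0.0001 threshold
# mirrors the example above; all counts are hypothetical.

def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """FPR = FP / (FP + TN)."""
    return false_positives / (false_positives + true_negatives)

FPR_BENCHMARK = 1e-4  # hard requirement from the business side

candidates = {
    # name: (false positives, true negatives, key quality metric, e.g. recall)
    "baseline_detector": (12, 80_000, 0.91),
    "tuned_detector":    (5,  80_000, 0.89),
}

for name, (fp, tn, recall) in candidates.items():
    fpr = false_positive_rate(fp, tn)
    verdict = "meets" if fpr <= FPR_BENCHMARK else "violates"
    print(f"{name}: FPR={fpr:.6f}, recall={recall:.2f}, {verdict} the FPR benchmark")
```

Note how the tuned detector sacrifices a little recall but is the only candidate that stays within the FPR limit, which is exactly the kind of trade-off the chosen benchmarks make explicit.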
Starting Point: We begin by looking into existing research and the various methods available. It's rare to find the optimal solution on the first try. This is where an experimental mindset comes in: we run multiple experiments, testing different approaches to choose the one that fits best. This process takes time, as it's about exploration and refinement.
Expecting Evolution: Be prepared for solutions to evolve. Sometimes, the first method we try meets the quality standards we set. However, as we deal with more data, the complexity of the task can increase, often requiring a new, more sophisticated approach. Here, we have to choose what is more important: getting a quick result or working longer for a better solution.
This phase of pre-research and experimentation is vital. It lays the foundation for everything that comes next, ensuring we’re not just moving forward but achieving the most effective solution possible.
Data collection is an essential phase in any solution development. To achieve great outcomes, we need high-quality data because the quality of the data directly impacts the solution's effectiveness. Here's what's essential:
1. Data Relevance: It's crucial for the data to closely match what we expect to encounter in real-world use. We optimize our solutions based on the test data available. If this data isn't relevant, we risk optimizing for the wrong objectives.
2. Expanding Tests: Achieving high metrics on a current test is a sign to broaden our testing scope. Ideally, this should be an ongoing effort, making it important to consider efficient data collection methods from the start.
Therefore, the cost and complexity of data collection primarily relate to the specificity and uniqueness of the required data. In many cases, it's enough to use publicly available datasets because the problem isn't unique, which greatly reduces the difficulty and cost associated with data collection.
Depending on the complexity of the task, gathering data can be straightforward or time-consuming and costly due to the specificity of the case. Let's explore some examples.
Imagine setting up a basic video surveillance system in a store using regular cameras to track customer movements and transactions. This data is straightforward to collect and can train AI models to spot unusual activities.
However, if we switch to infrared cameras for after-hours monitoring, the data changes significantly. These cameras pick up heat, showing humans and machines as bright against a dark background. Adding fisheye cameras, which give a wide-angle view but distort the edges, adds another layer of complexity.
Upgrading to specialized cameras means adjusting how we collect and use data, increasing both the project's cost and complexity.
Consider the task of detecting people in a public park. Using open-source datasets or a short recording period, we can easily gather diverse data on people's appearances and activities due to the general nature of the task.
However, specific tasks like developing a system to detect elks on roads for ADAS (Advanced Driver Assistance Systems) introduce significant challenges. Elks, mainly active at dawn and dusk, require data that captures their size, varied postures, and movements in low light. This specificity means collecting targeted data in their natural settings, increasing the effort and cost.
The diversity and volume of data directly influence the development of an effective AI solution. The need varies significantly based on the application:
1. Controlled Environments: In scenarios like object detection on a factory's conveyor belt, where conditions like lighting and object placement are consistent, less data is required. The predictability of the environment simplifies the AI's task, allowing for a smaller, more focused dataset.
2. Varied Environments: Applications like object detection in diverse urban settings demand a vast and varied dataset. Factors such as different lighting conditions, a wide range of objects, and varied backgrounds all need to be covered. For instance, an AI designed for traffic analysis must understand diverse elements, from cars and pedestrians to animals and bicycles, in varying lighting and weather conditions.
Adding to the complexity:
• Enhancing ADAS: Introducing elements like snow on roads requires the AI to adapt to additional challenges, ensuring reliability under less common but critical conditions.
• Detection Challenges in Different Regions: Certain objects, like tuk-tuks in Asia, may be poorly detected due to their uniqueness, highlighting the need for region-specific data.
• Consistency in Data Distribution: The data used for testing and training the AI should match the real-world conditions it will operate in. For instance, if an AI is to function in snowy conditions, its training data must include scenarios with snow to ensure accuracy and reliability in those settings.
Synthetic data is revolutionizing data collection by providing tailored datasets for specific needs, like training AI for drone navigation in cities. Instead of costly and complex real-world data gathering, developers can now simulate urban environments, complete with varied weather and obstacles, entirely in virtual setups. Tools like NeRF enable the creation of realistic 3D scenes from 2D images, allowing for endless training possibilities without physical trials.
Services like Kopikat use data augmentation to enhance datasets, adding variability such as different weather and lighting conditions, further enriching training material. Moreover, synthetic data simplifies the process by enabling automatic labeling of images or videos, significantly reducing manual annotation efforts.
However, it's crucial to ensure the synthetic data's quality closely mirrors real-world conditions to train AI models that are effective outside of simulated environments.
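As a generic illustration of the augmentation idea (not Kopikat's actual pipeline), a few torchvision transforms can already add lighting and viewpoint variability to an existing dataset; the file names below are placeholders:

```python
# Generic augmentation sketch: add lighting and geometric variability
# to existing images using torchvision transforms.
from PIL import Image
from torchvision import transforms

augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.3),  # lighting changes
    transforms.RandomHorizontalFlip(p=0.5),                                # viewpoint variability
    transforms.RandomRotation(degrees=10),                                 # slight camera tilt
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),              # mild blur, e.g. bad weather
])

image = Image.open("park_frame_0001.jpg")       # hypothetical source frame
augmented = augment(image)
augmented.save("park_frame_0001_aug.jpg")
```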
After gathering data, we must label it accurately for our computer vision algorithms to learn correctly. For instance, if detecting elks on roads is the goal, we need to mark each elk's position in the images. This process's complexity and cost vary based on the data's nature and the annotation's required detail.
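For illustration, a single bounding-box label in the widely used COCO format might look like the snippet below; the file name, image size, and box coordinates are made up:

```python
# Illustration only: one COCO-style bounding-box label
# ("elk" is the category from the example above; values are made up).
annotation = {
    "images": [
        {"id": 1, "file_name": "road_cam_0421.jpg", "width": 1920, "height": 1080}
    ],
    "categories": [
        {"id": 1, "name": "elk"}
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            "bbox": [640.0, 380.0, 220.0, 310.0],  # [x, y, width, height] in pixels
            "area": 220.0 * 310.0,
            "iscrowd": 0,
        }
    ],
}
```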
The task complexity influences annotation difficulty. Simple image classification tasks, like identifying an object in an image, are straightforward. However, more intricate tasks such as object detection, pose estimation, and instance segmentation demand more time and a higher attention level from annotators.
3D annotation introduces additional challenges, requiring advanced tools and deeper data understanding, increasing costs.
Furthermore, the difficulty escalates when faced with ambiguous situations. Annotating low-resolution images, for instance, can be particularly challenging due to the lack of clarity.
Or consider the muffin-chihuahua problem: it takes real concentration from annotators to assign the correct label to each image!
Specialized fields, such as medical image annotation, require not only attention to detail but also specific knowledge, significantly raising the costs. In summary, annotation complexity depends on the task type, the precision of data, the necessary tools, and the annotators' expertise.
Another important aspect is the amount of data to be annotated. The volume required to produce effective results varies greatly from project to project. We may build a computer vision solution with a modest dataset; on the other hand, if an AI solution is to perform consistently across multiple environments, it requires a much more diverse dataset.
Let's examine a few scenarios to illustrate this:
1. Simple Tasks in Controlled Settings: A project in a consistent environment, like spotting defects on a production line, might need around 5,000 images. With simple conditions, annotating an image could take 10 seconds, totaling about 14 hours for the whole dataset.
2. Complex Tasks in Diverse Settings: For a solution estimating human poses in different settings, we might need a diverse set of 200,000 images. The complexity could mean annotating each image takes up to 1 minute, leading to around 3,300 hours or 140 days of work.
3. Specialized Tasks Like Medical Imaging: Detecting anomalies in MRI scans could require annotating 50,000 images. Given the need for medical knowledge, each image might take 2 minutes to annotate, totaling 1,667 hours or about 70 days.
These scenarios show that data volume and annotation time can vary widely, from a few days to several months, emphasizing the need for efficient planning and tools like CVAT for streamlining the annotation process.
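As a sanity check, the back-of-the-envelope estimates above can be reproduced with a few lines of Python (the figures are continuous hours, not 8-hour working days):

```python
# Reproduce the three annotation-effort estimates above.
scenarios = [
    # (name, number of images, seconds per image)
    ("defect detection",   5_000,  10),
    ("pose estimation",  200_000,  60),
    ("MRI anomalies",     50_000, 120),
]

for name, n_images, sec_per_image in scenarios:
    hours = n_images * sec_per_image / 3600          # total annotation time in hours
    print(f"{name}: {hours:,.0f} hours (~{hours / 24:.0f} days)")
```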
In the model development phase, it's crucial to decide on the hardware where the model will be deployed. This choice impacts the development process, as different hardware configurations can support varying levels of model complexity and processing speed. Higher-spec equipment can handle more complex models and faster processing, which is vital for real-time applications. For more details on selecting the appropriate hardware, refer to the earlier article.
Developing deep learning models is more about experimentation than following a set path. Unlike traditional software, AI development involves a lot of trial and error and refining along the way.
The process starts with understanding the task at hand, similar to task analysis. This might mean diving into research to find an existing solution that fits or can be adapted to our needs. This could involve testing solutions from academic papers or open-source projects to see if they work with our data and meet performance standards.
Developers face a choice between using an existing open-source model or building a custom solution. This decision depends on how unique the task is and the specific requirements of the project.
1. Using Pre-built Open-source Solutions: Developers can choose pre-existing models, especially for common problems. While this can speed up the development process, it is important to ensure that these models meet the specific requirements of the project. Even open-source solutions may require fine-tuning or adaptation to fit a unique dataset or problem context (see the sketch after this list).
2. Training a Custom Model: When developing a unique solution from scratch, a series of experiments is carried out: selecting the right architecture, tuning hyperparameters, and continuously training and validating the model. Depending on the complexity of the task, this training phase can demand significant computational power, often translating to many GPU hours. For complex problems, these hours add up quickly, driving up costs. It is critical to monitor key metrics for each experiment, ensuring that the model is moving closer to the desired performance targets.
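As a hedged sketch of the "pre-built model" route, the snippet below adapts torchvision's COCO pre-trained Faster R-CNN to a hypothetical single-class "elk" detector by swapping its classification head; the dataset and training loop are only indicated:

```python
# Adapting an open-source detector to a custom problem: start from a
# pre-trained Faster R-CNN and replace its classification head for the
# project's own classes (hypothetical "elk vs background" setup).
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

num_classes = 2  # background + "elk"

model = fasterrcnn_resnet50_fpn(weights="DEFAULT")          # COCO pre-trained weights
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Fine-tuning follows the usual pattern; the DataLoader is omitted here.
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9, weight_decay=5e-4)
model.train()
# for images, targets in data_loader:        # hypothetical DataLoader of images + boxes
#     loss_dict = model(images, targets)     # detection models return a dict of losses
#     loss = sum(loss_dict.values())
#     optimizer.zero_grad()
#     loss.backward()
#     optimizer.step()
```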
Model development can range from a few weeks to over a year, depending on the project's complexity and requirements. The choice of development framework (like TensorFlow or PyTorch) also plays a role, affecting development speed and ease of deployment.
Integrating multiple models for tasks like tracking a person across different cameras adds layers of complexity, demanding attention to each step of the process.
Overall, building deep learning models is a complex and iterative process, where each decision impacts the project's timeline, effectiveness, and cost.
Putting together multiple Deep Learning (DL) models is key to our solution. It's not just about training a model but making sure different models work together smoothly to do what we need. In real setups, this involves steps like preparing the data before it goes into a model and processing the data after it comes out, with dedicated glue code making sure data moves correctly from one step to the next.
Pre-processing and post-processing are vital. Initially, raw data is transformed (e.g., normalized or resized) to fit model requirements. Following model processing, data might undergo further refinement, like eliminating duplicate detections in object tracking, to prepare for final analysis.
Once we have developed pre-processing and post-processing for all DL models, we need to integrate each atomic solution into one pipeline, where data flows smoothly from one stage to the next and each component is aware of its upstream and downstream neighbors. Middleware or "glue" code ensures this consistency, handling tasks like buffering data, managing model inferences in parallel, and synchronizing results.
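A minimal sketch of this pre-process, inference, post-process chain for one detection model might look as follows; the model itself is a placeholder, and non-maximum suppression (NMS) stands in for "eliminating duplicate detections":

```python
# Sketch of the pre-process -> inference -> post-process chain for a single
# detection model. The model is assumed to return raw boxes and scores.
import torch
import torchvision.transforms.functional as F
from torchvision.ops import nms

def preprocess(image):
    """Raw frame -> normalized tensor in the shape the model expects."""
    tensor = F.to_tensor(image)                       # HWC uint8 -> CHW float in [0, 1]
    tensor = F.resize(tensor, [640, 640], antialias=True)
    return tensor.unsqueeze(0)                        # add batch dimension

def postprocess(boxes, scores, score_thr=0.5, iou_thr=0.5):
    """Drop low-confidence and duplicate detections."""
    keep = scores > score_thr
    boxes, scores = boxes[keep], scores[keep]
    keep = nms(boxes, scores, iou_thr)                # non-maximum suppression
    return boxes[keep], scores[keep]

def run_pipeline(image, model):
    batch = preprocess(image)
    with torch.no_grad():
        boxes, scores = model(batch)                  # hypothetical model output
    return postprocess(boxes, scores)
```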
For instance, in a system that tracks people across multiple cameras, we bring together various basic computer vision tasks to create a complete solution. This includes adjusting the cameras, detecting people, keeping track of their movements, and recognizing them across different camera views, all within a single workflow.
The pipeline's data flow can range from straightforward linear sequences to complex, nonlinear arrangements in which multiple models operate in parallel and their results must be synchronized.
1. Data Buffering: To manage varying processing speeds among components, we employ data buffering, ensuring no data is lost when the production rate exceeds the consumption rate (see the sketch after this list).
2. Parallel Model Inference: Running multiple model instances in parallel enhances throughput, crucial for handling extensive datasets or ensuring real-time data processing across multiple sources.
3. Synchronizing Results: Coordinating outputs from various processes is essential for consistency, especially in systems monitoring multiple inputs or where tasks are performed out of sequence.
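To illustrate the buffering and parallel-inference points, here is a small sketch built on Python's standard-library queues and threads; `run_model` is a stand-in for real inference code:

```python
# Buffering and parallel inference with standard-library queues and threads.
import queue
import threading

frame_buffer = queue.Queue(maxsize=64)   # buffers frames when cameras outpace the models
result_queue = queue.Queue()

def run_model(frame):
    # Placeholder for real model inference.
    return {"frame_id": frame["id"], "detections": []}

def inference_worker():
    while True:
        frame = frame_buffer.get()       # blocks until a frame is available
        if frame is None:                # sentinel: shut the worker down
            break
        result_queue.put(run_model(frame))
        frame_buffer.task_done()

# Several workers consume from the same buffer, giving parallel inference.
workers = [threading.Thread(target=inference_worker, daemon=True) for _ in range(4)]
for w in workers:
    w.start()

# Producer side: cameras push frames; downstream code can later re-order
# (synchronize) results by frame_id before passing them on.
for i in range(10):
    frame_buffer.put({"id": i, "data": b"..."})
for _ in workers:
    frame_buffer.put(None)               # one shutdown sentinel per worker
```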
While integrating a single complex model may seem straightforward, combining several simpler models often introduces significant complexity. This complexity is due to the need for custom "glue" code to link the models and manage data flow, making the integration of atomic solutions into a cohesive pipeline a critical yet challenging task.
In summary, developing an end-to-end DL pipeline is an intricate process, involving careful consideration of how each component interacts, the necessity for real-time data handling, and the overarching need for seamless integration of all parts to ensure coherent and reliable results.
The expenses and challenges of this phase are shaped by the chosen software stack and hardware. A straightforward option is deploying the solution in Python on an on-premise Linux server, simplifying compatibility and resource concerns.
Moving to edge devices introduces additional complexities. This shift usually involves adapting the pipeline to different programming languages (such as C++ or Objective-C) and optimization technologies (such as TensorRT, CoreML, TFLite, or DeepStream). The device's popularity often dictates the ease of porting, with more common platforms offering broader support and resources.
When deploying on cloud solutions like Google Cloud, it's crucial to first configure the appropriate virtual machines, storage buckets, and other cloud resources. The scalability advantage of the cloud is achieved through the use of services that automatically adapt to the workload, such as Google Cloud ML Engine.
For mobile platforms, models are converted into mobile-friendly formats using tools specific to the operating system, such as CoreML for iOS or TFLite for Android. The model is then embedded into the mobile application using the platform's specific SDKs and APIs. It's essential to ensure that the inference speed is optimized and the application remains responsive, offering real-time feedback to users.
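For instance, converting a TensorFlow SavedModel to TFLite for an Android app typically takes only a few lines; the export path and optimization choice below are placeholders:

```python
# Hypothetical example: convert a TensorFlow SavedModel to TFLite for Android.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("exported/detector")  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # default post-training optimizations

tflite_model = converter.convert()
with open("detector.tflite", "wb") as f:
    f.write(tflite_model)
```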
The complexity of integrating the computer vision pipeline often surpasses the complexity of the models themselves. In practice, deploying a single advanced deep learning model tends to be less complicated than managing multiple simpler models connected by glue code. This distinction highlights the importance of considering not just the individual model complexities but the overall system architecture when planning deployment strategies.
This article provides a comprehensive guide to budgeting for a computer vision solution, focusing on the complex balance between hardware and software components.
Developing a computer vision solution is a multifaceted task. Each stage, starting from understanding the problem, collecting and annotating data, creating AI models, implementing the pipeline, to deploying the final solution, has its own complexities and financial implications.
Perhaps the next step is to build a neural network that designs the optimal solution and finds the best model architectures for you! After all, who better to design AI than another AI? 😉