In this episode, Dr. Satya Mallick interviews Kwabena Agyeman, president of OpenMV. He is a Senior Embedded Systems Engineer by day and an Entrepreneur in his free time developing Computer Vision Algorithms for Microcontrollers.
In 2015 he did a Kickstarter Campaign for a smart camera and raised $100k, but faced a setback when one component of the camera failed and 80% of the cameras were not usable. In this episode, we will see how he fought back to deliver what he promised and how he makes time in his busy schedule to follow his passion. Here is Kwabena Agyeman's inspiring story.
You can find the video and audio versions on YouTube, Spotify, SoundCloud, Google Podcasts, Apple Podcasts, and Stitcher.
The interview was recorded in September 2020.
SM (Satya Mallick): In 2015 they did a Kickstarter campaign for a smart camera based on a microcontroller. The campaign was a huge success, and they raised over $100,000, but then came a disaster that they were not prepared for – one of the components in the smart camera failed, and 80 percent of the cameras they had built were unusable. Nevertheless, they fought back, and they delivered the cameras. This is the amazing story of OpenMV, and in this episode, we will learn what microcontrollers are, what the challenges of putting computer vision algorithms onto microcontrollers are, and why microcontrollers matter. But above all, we will learn about the amazing journey of a resilient entrepreneur, Kwabena Agyeman, who is one of the co-founders of OpenMV. Welcome to the show! This is our podcast AI for Entrepreneurs and I'm your host, Satya Mallick from OpenCV.
Kwabena, it is an honor to have you on the show! I'm pleased that you could make the time! It is a show for entrepreneurs, and who could be a better entrepreneur in AI than you. You have such great stories to share with us. So, before we dive in, I would like to know a little bit more about the OpenMV project. It is a very popular project, but still, some people may not know about it. So, you can start with a little introduction to the OpenMV project.
KA (Kwabena Agyeman): Hi! I'm Kwabena Agyeman, I am a co-founder and the president of OpenMV LLC. We're a tiny two-person startup that's run in our free time because both my co-founder and I have day jobs. We develop computer vision algorithms that run on a microcontroller. This is a little bit different than OpenCV because we've had to tackle the data input problem of interfacing with the camera, then storing the data, interfacing with DRAM, and attaching an SD card to save that data, and then also doing the vision processing algorithms and driving IO pins. So, it encompasses the full compute stack. But the goal was basically to raise the bar for embedded computers for computer vision, so when we started no one had the thought that you could run an algorithm like AprilTag detection on a microcontroller with less than one megabyte of RAM. But back in 2017, we actually demonstrated that you could run AprilTags on a microcontroller with 512 kilobytes of RAM. And the reason this kind of stuff is cool to research, and develop, and bring to the forefront is the vision of a world where computer vision is a commodity and it's embedded in everything – like every single light switch in your house having a camera that just tracks you and turns on the lights when you walk around. That sounds like Big Brother, maybe, but these things just throw away the data immediately. We can only get to that future if we work really hard on how we make our computer vision algorithms super embedded and run on extremely cheap hardware.
SM: That's cool! So, for people who are not familiar with the embedded vision and embedded systems in general, can you just describe the difference between what a microcontroller is versus a system on a chip – like a Raspberry Pi – what's the difference between the two?
KA: It's gotten blurred now, to be honest, but the basic way it worked was as follows. Microcontrollers were essentially little processors that have a limited amount of IO, a limited amount of RAM, a limited amount of flash, and they were designed to do one thing in life. So, a microcontroller might be designed to spin a motor – for example, it would look at current and voltage on a motor. For instance, there are things called ESCs (electronic speed controllers) that are often used for quadcopters – those have a microcontroller on them, and they basically take a servo input and then spin a motor. So, it's a little computer executing code, and its one job in life is to read that servo input and then spin a motor to match that servo input. When you talk about a full computing system like a Raspberry Pi, or a Linux PC, this has a whole suite of software designed to do anything, but that extra flexibility adds cost, size, and complexity, and reduces your ability to deploy it as a cheap, less-than-one-dollar kind of device for doing one job. For sure it's more powerful, but it gets away from that one specific purpose and life goal.
SM: So, do the microcontrollers have an operating system, or they are completely programmed in a different way?
KA: Typically, microcontrollers don't come with an OS, they don't have something called virtual memory, they have a processor. The processor executes code just like a regular processor, but there's a fundamental difference in that they don't have something called a memory management unit. A memory management unit allows you to virtualize your different processes, so that the kernel is separate from all the other applications running on board. Each application then gets its own separate four-gigabyte (32-bit) or 64-bit memory space to play with. On a microcontroller, everything is running at the same time, there's no process separation, and because of this, there's no real concept of dynamically loading different software from other people that's in a pre-compiled format. Typically, on a microcontroller, you compile a binary image for the whole system that you want it to run, and then that is flashed onto the system, and it will execute that code and only that code.
SM: Got it! So, as for the OpenMV camera, could you explain the various components of the camera itself? What does it have – could you give some details, and how does it compare to some other systems, let's say a camera attached to a Raspberry Pi Zero?
KA: So, this is the OpenMV Cam. This is our first popular version, the OpenMV Cam M7, and the system is just one chip. That's the important thing to note. There are a few other things like voltage regulation, crystals, and capacitors, and there's an SD card slot, and then there's a camera chip beneath this lens. This is the new one that we launched in 2019. It is effectively one chip, so what we've tried to do is stay away from having SDRAM and flash onboard the system, because the goal is effectively to shrink it down to a small size. We were going to launch a new product called the OpenMV Cam Micro this year, but Covid-19 happened, and it was delayed. We've effectively shrunk the whole system down to the size of a quarter. You see that thing on the top – that's a two-millimeter by two-millimeter camera, and it does 640x480. We have the processor in the center, which is seven millimeters by seven millimeters. And we have a four-pin JST connector on one side; on the back we have the IMU, a whole bunch of passives for power control, and crystals. But the idea is that by experimenting with these one-chip-type solutions, we can move towards a future where you can have a one-chip camera setup with a small embedded device that is designed to do one thing in life and nothing more. In terms of the difference from a Raspberry Pi camera module, we're working with a lot fewer pixels generally just because we don't have the SDRAM to play with. We did make a version called the OpenMV Cam H7 Plus, where we reduced these constraints because people kept asking for higher resolution, so we did add SDRAM to the system. However, the general goal is to design a full compute solution that just runs on one chip with internal SRAM.
SM: And the software stack for all these different solutions, is it the same?
KA: It is. The OpenMV Cam comes from a long lineage of sensors called the CMUcam, which was created back in 2001 by Anthony Rowe. Basically, it was a color tracking sensor, where a microcontroller was strapped onto a camera, and it could do one job – literally threshold the image by looking for color thresholds. Then it would output a very simple centroid calculation of all the pixels in the image. It was very limited, but it sold quite well back in the day because it was the first cheap computer vision system in 2001. It kept getting revved, and eventually, there was the CMUcam 4, that I helped to build in 2012 or so. That was another color camera, and you may now still see it on the market as the CMUcam 5. More or less, it maintained the concept of being just a straightforward color tracker, and it's a good solution just to get started with computer vision. With the OpenMV Cam, the idea was that we wanted to add a little bit more functionality to the system, and, in particular, add in the ability to be programmable. One issue with the CMUcam is that typically it is faster than the microcontroller that you use to control it, so typically the CMUcam would be attached as a secondary sensor to one device, and then that device would tell the CMUcam to look for this color, and then the camera would respond with where that color is. Of course, the problem is now you have this huge communication layer stack that you have to run to share that data. So, the idea of the OpenMV Cam was that it's easier to have richer, more complex data being used internally if we can get that on one chip. In order to make that possible we leveraged something called MicroPython, which we've been using from the beginning. MicroPython is a really interesting project, where a guy named Damien George got Python 3 running on a microcontroller. Just to understand, this is more or less all the cool stuff from Python 3.
He got that running onboard a microcontroller with basically 32 kilobytes of RAM and 256k of flash, so this is the executable and the whole virtual environment needed to run a Python interpreter, fitting into an incredibly small size. Oddly enough, if you take the MicroPython executable and put it on Linux, it outperforms CPython for many applications. For example, CPython natively stores integers as these big numbers that can be variable-sized; it doesn't use the optimization where, if you're storing a variable that fits in 32 bits, you only use 32 bits to store it. CPython stores an integer as a class that can have a flexible number of bits, and the number of bits can be unbounded. MicroPython does tricks: when the integer fits in less than 32 bits, it is stored in a machine word of 32 bits in length, and MicroPython then tries to use the actual processor's built-in instructions to add numbers together, so it can go very quickly versus emulating arbitrary-bit-width math as CPython does.
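To illustrate the integer point above (this sketch is not from the interview), the following Python snippet, meant to be run under CPython, shows that a CPython integer is an object whose size grows with the value, while a raw 32-bit machine word always takes four bytes. The helper name `fits_in_machine_word` is made up for illustration, and MicroPython's real small-integer encoding may differ in detail.

```python
import struct
import sys

def fits_in_machine_word(value, bits=32):
    """Return True if value fits in a signed integer of the given width.

    Hypothetical helper: it only illustrates the "small enough for one
    machine word" test, not MicroPython's actual internal check.
    """
    limit = 1 << (bits - 1)
    return -limit <= value < limit

small = 12345
big = 2 ** 100

# CPython keeps every int as a heap object; larger values need more bytes.
small_object_bytes = sys.getsizeof(small)
big_object_bytes = sys.getsizeof(big)

# A raw signed 32-bit machine word is always 4 bytes.
word_bytes = len(struct.pack("<i", small))

print(fits_in_machine_word(small), fits_in_machine_word(big))
print(small_object_bytes, big_object_bytes, word_bytes)
```

The exact object sizes printed depend on the CPython build, which is why no specific numbers are quoted here.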
SM: As for MicroPython, there are some limitations: you can't take any external Python library and expect it to work.
KA: MicroPython does have limitations, it doesn't run all of the external libraries that Python comes with, but it provides a very easy development environment experience for Python programmers to program a microcontroller. One of the best features of MicroPython is that as long as you keep your program small, you can write Python code that runs on the microcontroller, and the compile and load time is very quick. It is an important thing to understand because with the OpenMV Cam we're using the highest-end microcontrollers on the market, and these have about two megabytes of flash, and two megabytes of flash doesn't sound like a huge program. But consider what happens when you program and erase that flash. When you're writing data to an SD card or moving data around on a PC, your operating system does this fancy thing called caching, where it'll store the writes that you're doing in RAM and not perform them immediately, to give you the appearance that your hard drive is fast. Microcontrollers don't do that kind of thing for you, so when you write directly to the medium, it might take on the order of a minute or two to write two megabytes of flash. The reason that's a hassle: if you were updating a program repeatedly and changing one line of code, you would incur a two-minute penalty to rewrite the flash per line of code you changed. With MicroPython, all of a sudden that's reduced to less than a second for compile, load, and execute.
SM: That’s nice. Suppose somebody gets the OpenMV Cam and wants to implement something. There are a few routines that come with the library that you have written. What can it do out of the box?
KA: Basically, with the OpenMV Cam, we got this project started with a Kickstarter and that turned into a burning trash fire immediately. Because of that, we had to scramble to add features to keep the project alive. In the beginning, there was a goal that it would be very similar to OpenCV, and we tried copying a lot of the features from OpenCV, but generally what we ran into were memory constraint limits. We're on our third-generation processor now, so we started with the Cortex-M4, and then we moved to an M7 and then an even stronger M7, but a lot of the limitations on how the library was designed were based on that M4, which had significantly less RAM than we currently have. Well, more or less what it came down to is we have a computer vision library that's not as full-featured as OpenCV, but if you are an actual practitioner of computer vision, and you understand how binary image processing works, or just general-purpose image processing, you'll find that most of the stuff we have on the OpenMV Cam mirrors exactly the kind of functionality you'd expect to find in OpenCV. We've taken the liberty to mix and match things though, in order to save RAM, so one of the things the OpenMV Cam does is… OpenCV, for example, will typically store images as 32-bit floats per color channel, and that's very good for having the best accuracy per pixel and being able to do nice mathematics; however, it is bad for image size. So, here is what our algorithms typically do: we have pretty much three copies of the same algorithm for every function, one that operates on a one-byte-per-pixel grayscale image and one that operates on a two-byte-per-pixel RGB565 image, and we also have support for Bayer pattern images. Actually, to save RAM, they operate directly on the underlying image data structure that's stored that way.
So, for example, if you want to perform a morph operation – say you want to multiply an RGB565 image by a smoothing kernel – our library will just automatically handle unpacking those pixels while it's doing the operation, and repacking them, and storing them. This is all done to save RAM. Anyway, I mention that because it's not exactly OpenCV, command for command, but if you understand computer vision, it's not a big leap – it's pretty much the same kind of stuff on our system. As for how you can work with the library: we have two ways of programming with it. There are a lot of customers who never email us, and I only meet them randomly in person. They're typically pretty silent, but what they do is directly take our C code and modify it to make it meet their needs. These customers are typically folks who are pretty hardcore C programmers, who know how to operate on image arrays directly and aren't afraid to write code. Otherwise, you have a Python-level interface, where you can use the built-in features of our library. Some built-in features that help to move the product and that make it popular are things like built-in QR code detection and decoding support, built-in data matrix detection and decoding support, built-in barcode detection and decoding support (courtesy of the ZBar library), and built-in AprilTag detection and decoding support. Recently we just added a new feature called TensorFlow Lite for Microcontrollers. TensorFlow Lite for Microcontrollers is the full TensorFlow implementation on microcontrollers.
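To illustrate the unpack, operate, and repack idea described above (this sketch is not from the interview and is not OpenMV's C code), here is a minimal Python example that blurs one row of 16-bit RGB565 pixels without ever storing an expanded copy of the image. The function names and the 1x3 box kernel with clamped edges are assumptions chosen for illustration.

```python
def unpack_rgb565(pixel):
    """Split a 16-bit RGB565 value into 8-bit R, G, B components."""
    r5 = (pixel >> 11) & 0x1F
    g6 = (pixel >> 5) & 0x3F
    b5 = pixel & 0x1F
    # Expand to 8 bits by replicating the high bits into the low bits.
    r8 = (r5 << 3) | (r5 >> 2)
    g8 = (g6 << 2) | (g6 >> 4)
    b8 = (b5 << 3) | (b5 >> 2)
    return r8, g8, b8

def pack_rgb565(r8, g8, b8):
    """Pack 8-bit R, G, B components back into one 16-bit RGB565 value."""
    return ((r8 >> 3) << 11) | ((g8 >> 2) << 5) | (b8 >> 3)

def box_blur_row(row):
    """Apply a 1x3 box blur to a row of RGB565 pixels.

    Pixels are unpacked only while they are being averaged and are
    repacked immediately, so the row stays at two bytes per pixel.
    Edge pixels reuse their own value for the missing neighbour.
    """
    out = []
    last = len(row) - 1
    for i in range(len(row)):
        indices = (max(i - 1, 0), i, min(i + 1, last))
        neighbours = [unpack_rgb565(row[j]) for j in indices]
        averaged = [sum(channel) // 3 for channel in zip(*neighbours)]
        out.append(pack_rgb565(*averaged))
    return out

blurred = box_blur_row([0xF800, 0xF800, 0x001F, 0x001F])
print([hex(p) for p in blurred])
```

The trade-off is extra shifting and masking per pixel in exchange for never holding a larger intermediate image in RAM.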
SM: And there’s also TinyML that comes into play.
KA: Yes, I'm part of the TinyML Group, and a lot of people are interested in this new domain. An example of what we can do onboard the system: we can run a person detector at about 6 fps on the processor using MobileNetV2 that was trained on a data set of around 50,000 images. Right out of the box the camera will be able to execute this neural network that'll literally tell you a binary “yes” or “no” – whether there’s a person in the scene or not. Just using that data there are all types of applications you can make – e.g. if you want to make a light switch that turns on when a person walks in or a door that just swings open when a person walks in, that will all run with the help of person / no person detection on the OpenMV Cam Micro. It is the size of a quarter and will draw about 100 milliamperes of current at 3.3 volts.
SM: So, 100 milliamperes means you can power it using a coin battery?
KA: You can use it with a coin battery, though if you keep running it constantly, not for long. The idea is that you can power cycle it. A basic example would be as follows: because the OpenMV Cam has no operating system and it's loading your program from flash, there's no penalty to cutting its power anytime you choose, and there's no hard drive to corrupt. The benefit is that you can have another low-power microcontroller with a Wi-Fi radio, for example, connected to our system, and that low-power microcontroller can watch a motion detector such as a PIR sensor. If motion happens, it turns on the camera, takes a picture, gets the analysis, and then powers the camera off again.
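As a back-of-the-envelope illustration of why power cycling matters (not from the interview), the sketch below compares battery life when the camera runs constantly with battery life when it only wakes briefly. Only the 100 mA active current comes from the conversation; the 200 mAh capacity, the 0.05 mA idle draw of the supervising microcontroller, and the wake-up schedule are assumed numbers, and the model ignores the limited peak current a real coin cell can deliver.

```python
def average_current_ma(active_ma, idle_ma, active_s, period_s):
    """Time-weighted average current for a device that is active for
    active_s seconds out of every period_s seconds."""
    duty = active_s / period_s
    return active_ma * duty + idle_ma * (1.0 - duty)

def battery_life_hours(capacity_mah, current_ma):
    """Ideal battery life, ignoring self-discharge and peak-current limits."""
    return capacity_mah / current_ma

ACTIVE_MA = 100.0     # camera running (figure quoted in the interview)
IDLE_MA = 0.05        # assumed: supervisor microcontroller plus PIR sensor
CAPACITY_MAH = 200.0  # assumed coin-cell capacity

# Camera left on all the time.
always_on_hours = battery_life_hours(CAPACITY_MAH, ACTIVE_MA)

# Assumed schedule: camera powered for 2 seconds once every 10 minutes.
avg_ma = average_current_ma(ACTIVE_MA, IDLE_MA, active_s=2.0, period_s=600.0)
duty_cycled_hours = battery_life_hours(CAPACITY_MAH, avg_ma)

print(always_on_hours, avg_ma, duty_cycled_hours)
```

Changing the assumed schedule or idle draw shifts the result considerably, so the numbers are only useful for comparing the two modes.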
SM: And you have also built an IDE, which is very easy to use, you can run your code right there and see the effects. It is possible to plug this camera in using a USB cable and see the results immediately.
KA: Exactly! One of the goals with OpenMV was to provide an Arduino-like experience. OpenCV gives you all the power but it's still the command line. You can launch the GUI tools, but you still have to build that for a complex application. A lot of times you may not care to spend that effort. With the OpenMV Cam, the IDE takes care of the front end displaying what the camera sees, and then with the IDE, you can just write code to focus on giving the camera the ability to perform the operations you want. If you want to take a picture and execute the median kernel on it to do median filtering on that image, followed by some type of binary image filter, and then erode and dilate, you can write four lines of code and add them to your script or one of our default scripts in the IDE, and then you'll be able to see that effect happening on the camera immediately. And you do not have to worry about building up all of the communication environment to stream that image data to a PC.
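To make the filter chain mentioned above concrete (this sketch is not from the interview and does not use the OpenMV API), here is a pure-Python version of the binary threshold, erode, and dilate steps on a tiny grayscale image stored as nested lists. The median step is left out for brevity, and the function names, 3x3 neighbourhood, and threshold values are illustrative assumptions.

```python
def threshold(gray, lo, hi):
    """Binary filter: 1 where lo <= pixel <= hi, else 0."""
    return [[1 if lo <= p <= hi else 0 for p in row] for row in gray]

def neighbourhood(img, y, x):
    """Values in the 3x3 window around (y, x), clipped at the image border."""
    height, width = len(img), len(img[0])
    return [img[j][i]
            for j in range(max(y - 1, 0), min(y + 2, height))
            for i in range(max(x - 1, 0), min(x + 2, width))]

def erode(img):
    """Keep a pixel only if every pixel in its window is set."""
    return [[1 if all(neighbourhood(img, y, x)) else 0
             for x in range(len(img[0]))]
            for y in range(len(img))]

def dilate(img):
    """Set a pixel if any pixel in its window is set."""
    return [[1 if any(neighbourhood(img, y, x)) else 0
             for x in range(len(img[0]))]
            for y in range(len(img))]

# 6x6 dark image with a bright 3x3 blob and one bright noise pixel.
gray = [[10] * 6 for _ in range(6)]
for y in range(2, 5):
    for x in range(2, 5):
        gray[y][x] = 200
gray[0][0] = 220

binary = threshold(gray, 128, 255)
opened = dilate(erode(binary))  # erode then dilate removes the lone pixel

for row in opened:
    print(row)
```

Erosion followed by dilation removes specks smaller than the window while restoring the larger blob to its original size.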
SM: And you also have the IO pins, which you can use to control the output, which can be used to control other things like a light switch or something like that.
KA: I want to get around to launching the Micro this year; hopefully at the end of the year maybe we can do a production run for it. A basic example is that the OpenMV Cam has 10 IO pins, and those 10 IO pins allow you to do an SPI bus for high-speed communication from one device to another, a UART, and a CAN interface for CAN communication, which is a very reliable protocol that cars use to communicate internally. You can also do PWM output on IO pins, interrupt triggering on high/low edge events, servo control, and pulse-length measurement. We have an ADC pin to read analog IO, and a DAC to output an analog voltage. And there is I2C for communication on a serial bus, so you can have multiple devices hanging off it. And what's nice about all this stuff is that we have just finished an interface library for the OpenMV Cam which turns it into a sensor that is unlike any other camera sensor you've messed with. With our IO interface library, it's possible to set up an OpenMV Cam so that, using two wires, you can have multiple OpenMV Cams all sharing the same data bus going back to one master unit that could be commanding and controlling. So, for example, you could have a Raspberry Pi that talks to 10 OpenMV Cams at the same time on a robot or another system, and those 10 OpenMV Cams could all be doing different things, and then relaying that data back to the Raspberry Pi.
SM: That makes it so much more interesting because once a robot has multiple of these, each could be doing different things, but even if they work together to do a single thing, it's very useful. Coming to robotics and drone applications, some people are building some cool applications, I heard that there is a company that uses your AprilTag detector for landing drones – tell me more about that!
KA: Basically, some folks wanted to land a drone more precisely, and our system can detect AprilTags. I can't say we have the perfect Python-level API for it, but a lot of folks just take our C code and modify it to whatever they need. But we actually have AprilTags 2, the implementation that you can download online. We ported that to the system, and using the AprilTags 2 algorithm, you will get a tag, and we perform all of the necessary image matrix calculations from the camera sensor to give you the exact distance, the translation in XYZ, and the rotation in XYZ of the tag in 3D space. Once you have that information, you can do that AprilTag detection on a high-resolution image. We have the OpenMV Cam H7 Plus, which enables you to go really high-res; you can do up to 1080p if you'd like. You're not going to get a frame rate of above one hertz on our system at that resolution, we're not an application processor, but typically what someone will do is this thing called virtual windowing, where you try to detect the tag on an image pyramid – you shrink the resolution and then look for the tag at different resolutions, each being a subset, and then you'll get a much higher fps, around 20 frames a second. By doing that, you would take the image, downscale it to 160 by 120, and look for a tag – if you don't see it, ok then, upscale it to look for a tag again, and keep doing that until you find that particular tag. Then you can zoom in on it and track it. Once you do that and have that tag being tracked, you can find that tag in a very high-resolution image, and then, as the camera gets closer to the ground, unzoom that effect. By doing all of that the system can find the AprilTag, and then precisely land a drone on it.
SM: That's neat. The code is not trivial, and fitting it onto a microcontroller is not trivial at all. AprilTag is quite involved – I don't know how large this thing is, but it is quite involved.
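The coarse-to-fine search described above can be sketched as follows (not from the interview). The `detect` callback stands in for a real tag detector and is purely hypothetical, as are the function names and the list of resolutions; the point is only the control flow of trying the cheapest resolution first and stepping up until something is found.

```python
def find_tag_coarse_to_fine(detect, resolutions):
    """Try resolutions from fewest to most pixels; stop at the first hit.

    detect(width, height) is a hypothetical callback that returns a
    detection (any non-None value) or None if no tag was seen.
    """
    for width, height in sorted(resolutions, key=lambda wh: wh[0] * wh[1]):
        hit = detect(width, height)
        if hit is not None:
            return (width, height), hit
    return None

def make_fake_detector(min_width, calls):
    """Stand-in detector: pretends the tag is only resolvable once the
    image is at least min_width pixels wide. Records every call made."""
    def detect(width, height):
        calls.append((width, height))
        if width >= min_width:
            return {"id": 7}
        return None
    return detect

calls = []
detector = make_fake_detector(min_width=320, calls=calls)
result = find_tag_coarse_to_fine(detector, [(640, 480), (160, 120), (320, 240)])
print(result, calls)
```

In this run the 640x480 pass is never executed, which is where the frame-rate gain comes from when the tag is large in the frame.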
KA: Well, it comes down to 15,000 lines of code. It was a challenge to port that. It took me about seven days of really hard debugging. There were all kinds of issues in that code. We had to shrink all the math from doubles to floats, because the whole goal of our system isn't necessarily precision; the whole purpose is to get high-level features onto a microcontroller. Going from doubles to floats uncovered different situations where they would have an optimization loop that would try to get an epsilon of 0.000001 accuracy, and when you shrink the thing from a double to a float that's impossible, and then you have an infinite loop in the code. Debugging was a challenge. As for AprilTags, the fundamental concept is that they have this thing called a 36-bit code – like a six by six tag. There are six boxes here, six boxes down. Let me tell you what the algorithm effectively is doing in a nutshell. AprilTags first will take the image and threshold it, so it does a thresholding pass, and this takes up most of the time as your resolution gets larger. Almost 90 percent of AprilTags time is spent thresholding the image. The default AprilTag algorithm does thresholding just using the CPU and not any complex fancy MMX stuff. It doesn't use vector operations, so the default algorithm is not that fast on a PC just because it's not using the best features available. But for thresholding, it'll segment the image into no-contrast areas and lots-of-contrast areas. Then it finds black pixels on the border and white pixels on the border of the region. The reason why it has those no-contrast regions is that it can quickly ignore them. After it finds those, it then builds a list of points that connect around regions, so it might say: okay, these are all the pixels in this black region that are touching this border, and these are all the pixels in this white region that are touching this border.
Using that data, it can then do something called quad detection. With quad detection, there's a whole bunch more stuff going on – you have to sort these points by the centroid and then try to define corners. Using those edge regions it'll try to find things that look like four points, find those four corners, and then do a standard homography. Then in 3D space, the tag is rotated to be a flat image. Once that has been done, it just does a simple line scan back and forth over the tag’s pixels, and then it goes to a lookup table to figure out what that tag is. Here is the memory-wasting problem – in order to be fast they didn't want to scan a lookup table linearly to try to find the closest Hamming match. Let me back up for just one second: the tag has 36 bits. It's a Hamming code. There might only be 512 tags, and all tags are some number of bits distance away from each other.
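The double-to-float problem mentioned in this answer can be reproduced with a toy loop (not from the interview and not the AprilTag code). The sketch rounds every intermediate value to 32-bit float precision using `struct`, then tries to advance a value near 100 in steps of 0.000001. The function names, the starting value, and the iteration cap are all assumptions made for illustration.

```python
import struct

def to_float32(x):
    """Round a Python float (a double) to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def steps_to_reach(start, goal, step, quantize, max_iters=1000):
    """Add step repeatedly until the value reaches goal.

    quantize is applied after every addition to emulate the working
    precision. Returns the number of steps taken, or None if the loop
    is still short of the goal after max_iters (i.e. it would hang).
    """
    x = quantize(start)
    for i in range(max_iters):
        if x >= goal:
            return i
        x = quantize(x + step)
    return None

# In double precision a 1e-6 step near 100 makes steady progress.
double_steps = steps_to_reach(100.0, 100.00001, 1e-6, float)

# In float32 the spacing between values near 100 is about 7.6e-6, so
# adding 1e-6 rounds straight back to 100.0 and the loop never advances.
float_steps = steps_to_reach(100.0, 100.00001, 1e-6, to_float32)

print(double_steps, float_steps)
```

Without the iteration cap, the float32 case would spin forever, which is the kind of hang described above.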
SM: It's error-correcting for automatically making sure that there is no collision, and the 36 comes from that 6x6 grid that you just mentioned.
KA: This tag family, in particular, is called TAG36H11, where H11 means that the Hamming distance is 11. So out of the 36 bits, 11 bits can randomly flip in the tag, and it will not collide with another tag, that's what that means.
SM: Very robust in that sense!
KA: Yes! If you scan a tag in the picture and you want to find what tag that is in your table of tags, you have to compare it with every tag in that table. You count the number of bits that differ, and if that is less than 11, then you have found a match, and you keep track of the lowest one, which is the closest match. That's slow though. So, what AprilTags does is compute all permutations of 11-bit flips on startup, and then compute that again for 10-bit flips, and so they allocate about 64 megabytes for that table to get O(1) lookup speed.
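The slow-but-small linear search described above can be sketched like this (not from the interview and not the AprilTag source). The three 36-bit codes in the codebook are made up for the example and are not real tag codes, and the `max_distance` cut-off is passed in as a parameter rather than fixed to any particular value.

```python
def hamming_distance(a, b):
    """Number of bit positions in which two integers differ."""
    return bin(a ^ b).count("1")

def closest_tag(observed, codebook, max_distance):
    """Linear scan for the code nearest to observed.

    Returns (index, distance) for the best match whose distance is at
    most max_distance, or None if no code is close enough.
    """
    best_index = None
    best_distance = max_distance + 1
    for index, code in enumerate(codebook):
        distance = hamming_distance(observed, code)
        if distance < best_distance:
            best_index, best_distance = index, distance
    if best_index is None:
        return None
    return best_index, best_distance

# Made-up 36-bit codes (9 hex digits each), for illustration only.
codebook = [0x000000000, 0xFFFFFFFFF, 0xAAAAAAAAA]

# Second code with two bits flipped, as if the scan were slightly noisy.
observed = 0xFFFFFFFFF ^ 0b101

print(closest_tag(observed, codebook, max_distance=5))
```

This costs one pass over the table per scanned tag but needs no memory beyond the table itself, which is the trade-off against the precomputed lookup table.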
SM: That's their optimization though, they are optimizing for speed, not for memory or anything like that. That's their optimization. Your choices would be different.
KA: There wasn't a switch for turning that off, though. I had to go into the library, more or less understand the algorithm that they wrote, and then rewrite it so it can do a slow linear search. The benefit though is all of a sudden you save 64 megabytes of lookup table that would have been allocated previously. And additionally, there are other things like “union find”, which they use to store a table of how they match image coordinates and how they track which edges are part of the same surface. They use 32-bit integers to store that table. If the resolution is less than 64k pixels in the image, you can store that in 16-bit numbers. There are just a lot of different little optimizations I had to do on the code to shrink the amount of RAM it used from megabytes down to kilobytes.
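The 16-bit union-find idea can be sketched as follows (not from the interview and not the ported code). The class name is hypothetical, and Python's `array` type with the `"H"` type code is used here to stand in for a C array of unsigned 16-bit integers; the sketch assumes a platform where that type code is two bytes wide, which is the usual case.

```python
from array import array

class SmallUnionFind:
    """Union-find whose parent table uses 16-bit entries.

    Works only when there are at most 65536 elements, e.g. one element
    per pixel of a 160x120 image (19200 pixels).
    """

    def __init__(self, count):
        if count > 0x10000:
            raise ValueError("16-bit parents can index at most 65536 elements")
        # Each element starts as its own parent (its own set).
        self.parent = array("H", range(count))

    def find(self, i):
        """Return the representative of the set containing i."""
        root = i
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node straight at the root.
        while self.parent[i] != root:
            next_i = self.parent[i]
            self.parent[i] = root
            i = next_i
        return root

    def union(self, a, b):
        """Merge the sets containing a and b."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

uf = SmallUnionFind(160 * 120)
uf.union(0, 1)      # pixel 0 joins its right-hand neighbour
uf.union(1, 161)    # pixel 1 joins the pixel diagonally below it
table_bytes = uf.parent.itemsize * len(uf.parent)
print(uf.find(161) == uf.find(0), table_bytes)
```

With 32-bit entries the same table would take twice as many bytes, which matters when total RAM is measured in hundreds of kilobytes.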
SM: That's amazing – the level of dedication you have to make these things work on a microcontroller! I want to talk about the main Kickstarter campaign: how you built the OpenMV community, and how you launched the Kickstarter campaign. There was a big failure after you launched the Kickstarter campaign, but you came out of it, so I want to hear about the whole deal – how you built the community. That’s an amazing story!
KA: I'll go ultra quick on that! When we launched the OpenMV Cam product, the sensor choice was this thing called the OV2640. It’s not in production anymore, but there are just so many different modules floating around on the black market; you can find a lot of them there. All those modules, though, are “defect grade”, meaning they were units rejected from the factory in terms of their quality. Anyway, we decided to buy some of those sensors. They had been left outside, and the packaging says that you can expose them to moisture for 24 hours before they start to corrode. We bought some that had been exposed to moisture probably for decades.
SM: And you cannot tell by looking at these things that there is micro rust, very small rust elements.
KA: Yes. Well, what happens when you try to solder the sensor onto a PCB is that it just fails to do that – it corrodes instead and creates a layer of oxide, which doesn't electrically connect. It's structurally intact, but there's no electrical current flowing, so what you pump out is dead.
SM: How did you debug this? With no electrical connection.
KA: They will have an X-ray machine that'll look at the cross-section of the bumps, and it'll be able to tell whether or not there was a failure to connect by the color of the bump underneath the chip.
SM: And 80 percent of the chips that you had ordered were bad, right?
KA: Yes, 80 percent of them. To get the product out to the market we had to take an approach where we had to charge everyone for shipping twice. We used our shipping funds to restart production and pay for all of the lost components. And then by doing that we were able to generate hype. Basically, we did that by constantly coding, adding new features, and making the system better. If you don't like the OpenMV Cam now and think it’s underpowered, trust me, it was much worse when it started. We've continually built up the system – when it first started there were so many bugs to fix!
SM: How much did you raise during the Kickstarter campaign?
KA: We did a pretty modest 100k which was enough to get the production started.
SM: 100k is pretty big for a first-time product launch; it's a big amount. Nobody gets 100k by just launching a Kickstarter campaign, you need to have some sort of a community built around it!
KA: OpenMV had been on this website called Hackaday for quite a while, so Hackaday is a website with a blog about different electronics. Back in 2011-2012, Hackaday launched this thing called hackaday.io where they let people put projects up, and the OpenMV Cam at that time was the most popular project there. It had the highest number of likes, views, etc. There was a lot of interest back then for people who want to see this kind of thing happen – could you get a microcontroller to have computer vision, and could you do it in a way where it was not just one function but pretty flexible.
SM: You have been building the community for more than a year, haven’t you?
KA: I think the community has been under construction for about a year and a half.
SM: Well, they all came together to help you in the Kickstarter campaign, and the campaign was successful. But after the debacle happened during delivery, how much was your shipping date pushed back?
KA: I think we shipped like a year later, it was pretty bad, but our hardcore backers have tolerance for that, so we managed to get out the door. When we sold the M4 variant it wasn't that good, but we somehow managed to sell it. I was surprised by that.
SM: Do large companies order large volumes or are there mostly hobbyists who are buying the cameras?
KA: Mostly hobbyists and the educational segment. This year was hard for us as the educational market was destroyed during Covid-19. The OpenMV Cam pretty much shines in competitive robotic applications. One of the nice things about our camera system is that the frame rate is rather unlocked. You have direct control of the camera and you can do crazy things like changing the readout window to increase the fps beyond what the manufacturer wants you to run it at.
SM: Do people run into weird problems when they try to hack those things like that?
KA: Yes, but I think they're fun weird problems. Did you know most camera sensors support a readout window setting where you can define where the readout area of the camera sensor is? The benefit is that when you make the readout window really tiny, the frame rate will typically go way up. On some cameras you can easily get to 200 fps, or even several hundred, if you make the readout window very small.
SM: Let's say you have already detected a face and detected an object, and the object is not moving that fast, so you can read out only around that object, it's almost like tracking, so you're reading out a smaller area and then tracking that object very fast because now you're at 200 fps.
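The windowed-readout idea discussed above can be sketched in a few lines. This is only an illustration in plain Python, not the OpenMV API: the function names are hypothetical, and the frame-rate estimate assumes readout time scales with the number of rows read, which ignores blanking intervals and exposure limits on a real sensor. The 15 fps full-frame figure in the example is likewise an assumed number.

```python
def readout_window(cx, cy, win_w, win_h, sensor_w, sensor_h):
    """Return (x, y, w, h) of a window centered on (cx, cy), clamped to the sensor."""
    w = min(win_w, sensor_w)
    h = min(win_h, sensor_h)
    # Shift the window back inside the sensor area if the object is near an edge.
    x = min(max(cx - w // 2, 0), sensor_w - w)
    y = min(max(cy - h // 2, 0), sensor_h - h)
    return x, y, w, h


def estimated_fps(full_fps, sensor_h, window_h):
    """Crude model (assumption): frame time is proportional to rows read out."""
    return full_fps * sensor_h / window_h


# Track an object near the bottom-right corner of a 2592x1944 sensor.
window = readout_window(2590, 1940, 320, 240, 2592, 1944)
fps = estimated_fps(15, 1944, 240)  # 15 fps full-frame is an assumed figure
```

Re-centering the window on the last known position each frame is what turns a small readout area into fast tracking.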
KA: Yes, and we have an example of that with our OpenMV Cam H7 Plus. This is a five-megapixel camera sensor, it has readout window control, and we have an example of doing IR LED tracking where we can track an IR LED at 100 fps in the 2592x1944-pixel region. I think it does a little bit higher than 100 fps but our camera sensor driver isn't quite at the final level of optimization. With super huge deficiencies in place, somehow the product has managed to sell. Our camera system data pipeline only accepts one image at a time, we do not do double buffering – right now I’m working on getting double buffering. But we've been able to survive basically where our camera only runs at half the frame rate it's capable of, just because we only have one image data buffer, and so while you're working on that image you can't accept a new one. With double buffering, our software architecture will be able to accept a new image while it's processing the current one. That allows you to double your frame rate.
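The frame-rate doubling follows from simple timing. The sketch below is a steady-state model under stated assumptions, not OpenMV's pipeline: with one buffer, capture and processing take turns, so a frame costs their sum; with two buffers they overlap, so a frame costs whichever is slower. The function name and the 10 ms timings are hypothetical.

```python
def frames_per_second(capture_ms, process_ms, double_buffered):
    """Steady-state frame rate for a capture-then-process pipeline."""
    if double_buffered:
        # Capture of the next frame overlaps processing of the current one.
        period_ms = max(capture_ms, process_ms)
    else:
        # One buffer: the sensor must wait until processing is finished.
        period_ms = capture_ms + process_ms
    return 1000.0 / period_ms


single = frames_per_second(10, 10, double_buffered=False)
double = frames_per_second(10, 10, double_buffered=True)
```

The full 2x gain only appears when capture and processing take about the same time; if one stage dominates, the improvement is smaller.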
SM: So, with the same code that you had written before, it is going to now run twice as fast with the same processor, isn’t it?
KA: Yes, with the same processor. We never did it originally just because we've always been RAM limited, and on the Plus model we added 32 megabytes of SDRAM. When you want to record a video, for example, you have to deal with something called SD cards. We're going to implement a FIFO in RAM, and this will basically allow the camera to collect maybe 30 to 40 frames in memory, and then write those all quickly to the SD card once the erase finishes. This will also give you the ability to do things like smooth video recording, just like a commercial Linux system will do.
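A small simulation shows why a RAM FIFO smooths out recording. It is a sketch under assumptions that are not from the interview: one frame arrives per tick, the card is unavailable for a fixed stall (standing in for an erase), and once ready it can write two frames per tick. All names and numbers are hypothetical.

```python
from collections import deque


def record(num_frames, fifo_depth, stall_start, stall_len):
    """Simulate recording; returns (frames_written, frames_dropped)."""
    fifo = deque()
    written = dropped = 0
    tick = 0
    # Keep ticking until every frame has arrived and the FIFO has drained.
    while tick < num_frames or fifo:
        if tick < num_frames:
            if len(fifo) < fifo_depth:
                fifo.append(tick)      # queue the new frame in RAM
            else:
                dropped += 1           # no room: the frame is lost
        card_busy = stall_start <= tick < stall_start + stall_len
        if not card_busy:
            # Assumption: a ready card can write two frames per tick.
            for _ in range(2):
                if fifo:
                    fifo.popleft()
                    written += 1
        tick += 1
    return written, dropped


no_fifo = record(100, fifo_depth=1, stall_start=20, stall_len=30)
with_fifo = record(100, fifo_depth=40, stall_start=20, stall_len=30)
```

With a depth of 40 frames the 30-tick stall is absorbed and nothing is dropped, while a single-frame buffer loses every frame that arrives during the stall.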
SM: What fascinates me is that OpenMV is still your side project, it's not your main day job. How do you manage to get so much done? Most people have difficulty just keeping a single job. What does time management look like?
KA: Well, it depends on what makes you happy. I like being of service to others and helping people, and so I get the most energy from doing that. I used to be into video games, but those aren't fun for me anymore. I've already experienced all the standard plots and tropes, and I don't want to do that. Also, I never feel energized after watching TV. I did watch The Last Dance with Michael Jordan and that made me feel sympathetic, so I still like documentaries, but I've gotten over just watching random action flicks, where things explode and strong guys go around punching people. There are a lot of different things to watch and consume, but you can be a little bit more focused instead of just unwinding.
SM: I do something similar. I don't watch TV, it doesn't even bother me, and sometimes all the TV I watch is while doing exercise, and that's not too much. It helps me focus. I don't watch the news either – people think I will be uninformed, but I think the news makes no difference to our life.
KA: That’s so true! It’s a reality that you can't change or affect any of those decisions.
SM: It has an aftereffect that you keep thinking about that news item long after you finished reading the news. I used to read news mostly on weekends, but recently things have gotten so busy that I don't even read the news during the weekends anymore. When I scroll through my social media page that gives me enough information about what is going on. That's an important way to save time.
KA: And it comes down to also having a good healthy exercise habit. When you start letting your body slip, it'll result in a decrease in your energy levels, which will hurt your ability to continue working hard. You need to exercise more to build up your internal energy levels and have a good regime. I don’t necessarily say I want to do it all the time – I’ve been picking up things in Covid-19 like golfing, trying to get out and do other things just to keep my mind sane. But it's all about finding that healthy work-life balance and sticking to it more or less.
SM: Tell me a little bit about your childhood and growing up, what influenced you to come into robotics and machine learning. Did you have any early influences which led you into this?
KA: I think one of the big things is that when I was a kid I played with Lego – Lego makes engineers. Also, there were electronic kits back in the day that you soldered up. As an example, they'd have one PCB, and the PCB had nothing on it, and you would literally build a robot's brain. It would drive straight and when it hit something, it had a microphone that would make an impulse when it ran into a door. That impulse would then trigger a set/reset latch for it to back up. And the set/reset latch had like a capacitor that would decay in power, and so it backs up for so long, and then it would run out of power, and then it would go straight again. And so, all it would do is run into something, back up, go straight, run into something, back up, go straight… What was cool is that you had to build that using electronic circuits, there was no processing onboard, it was just analog circuits and a few digital latches. And from there I got into the BASIC Stamp a little bit before the Arduino had taken off, and that was my first experience of playing with microcontrollers. It was wonderful because it was the first time I could make something of my own. I was a high school student then. Before this, I wasn’t following this stuff, I was more into just standard childhood things like playing games, hanging out with friends.
SM: And then you went to Carnegie Mellon for your undergrad as well as masters. What got you interested in electrical engineering at Carnegie Mellon?
KA: I would say I didn't come in as a superstar at all. At school, I was determined to work hard and try to go somewhere. A guy who really helped me was Professor James C. Hoe, he took me under his wing, gave me the budget to work on research projects, and more. This was in my undergrad years. Typically, undergraduates can go to school and can sign up to do research.
SM: So, you just talk to this professor. How did you make a connection? A lot of undergrads would just spend their time doing undergrad stuff, they won’t even think about approaching a professor saying that they want to be part of a research lab – because they would be shy. They may be thinking that they cannot pull it off. How did you do that as an undergrad?
KA: Intel was a big help for this. Back when I went to school at Carnegie Mellon, Intel was running something like a scholar’s fund. Intel would give you a stipend, they'd pay you to do research. It was not competitive at all because most people didn't care to do research. I just applied, and they didn’t judge me based on my previous accomplishments; being a freshman in Electrical and Computer Engineering was enough. They suggested I pick a professor from a special list. James Hoe was into microcontrollers, and I decided to approach him. That’s how my path got started.
SM: It is luck that you happened to be with the right professor, who helped you a lot.
KA: It is definitely luck! I could have picked any other professor on that list, and some of them may not have had time to bother with me.
SM: But it's also initiative – a lot of people in your situation may not even bother approaching a professor or applying to this kind of activity. For somebody who is an undergrad at this level, maybe in their sophomore year, what recommendation would you have for them to enter the field? There is a book called So Good They Can't Ignore You by Cal Newport, and there’s another book Deep Work.
KA: Yes, when you're getting started in the field, there are two ways to think about it. When you're in school, one of the primary goals that you should try to achieve is to get that basic education on how things work. The mark of a good engineer is that a person wants to understand as much as they fundamentally can beneath the hood. You shouldn't enjoy mystery, so if you're going to, for instance, leverage OpenCV, but you have no idea how these algorithms work fundamentally at any level, that's not the best. You should try to have some understanding of those algorithms. If you understand things from a really core competency level, it'll help you leverage that and go faster for bigger and greater things in the future. I don’t mean you need to reimplement all of OpenCV as an undergraduate, but if you are taking classes to understand these algorithms, it's really helpful if you know how to code them by hand. You need to get some knowledge of how these things work. For example, reading the AprilTag code made me 10 to 15 times smarter just because I saw all these different tricks that you would use in computer vision and linear algebra. It is good for you to reach into these black boxes and try to figure out how things operate. At the same time, I would also say the education you'll receive at your university will give you a resume saying you were educated, but you shouldn't depend on what your university or institution gives you as being the only education you can get. Make an effort to do your own projects and build your own things in your free time, it'll help you get beyond and push your abilities to new levels. Technology is always changing, things are always changing, so try to learn as much as you can and then implement that in practice, and build your own things to see what can happen. I had an operating system design course, so one of the things I tried my hand at was building my own operating system for a microcontroller. I learned in class how that would be built on a computer with a memory management unit, and then I tried my hand at doing that on a microcontroller. It had no commercial retail value, it didn't have anywhere to go, but it was a project that I tried to build just to understand more.
SM: That's a great strategy. If you manage to understand things deeply, it clarifies the prospects. For embedded vision, the work that you're doing, are there any specific books or courses, or blogs that you think are useful? Specifically, for microcontrollers, I don't think there is much information.
KA: I think we are popular – we sold like 40,000, probably 50,000 units now, it's hard to know the numbers because we get cloned a lot. Our software and our hardware designs are open source, so there's a huge market of clones, especially in China.
SM: How do you feel about being cloned? The whole reason you open-sourced is that somebody can clone you, but at the same time that profit could have been yours.
KA: The open sourcing wasn't there so you could clone me; it was there so you could remix it into a product. One of the things that makes OpenMV Cam different from the Raspberry Pi is that our processor is available for purchase on Digikey and Mouser. You can actually buy our processor, take our code, and build a commercial product on it. My goal isn't to have zero revenue from that per se, but to unlock situations where people could build things using microcontrollers, with a code base for them to start with versus building it all by themselves. Some people, though, just copy the product directly and produce an exact mirror of it, which competes against my product. That's been happening less often now. There's a new Chinese processor on the market; they didn't clone my processor, they don't clone the hardware anymore, but it’s pretty much the same – the whole software development environment follows the software concept of OpenMV. It is now being cloned but with different hardware. You can find that our IDE has been rebranded and changed into a different product.
SM: In your licensing, could you have written something saying that it is open source, but if you're using it for commercial purposes you pay some licensing fee? Is it possible or is it not enforceable?
KA: It's not enforceable. For our IDE we leveraged Qt Creator, which was already GPL, to build our system on, and so the cloners also keep their software open source too, so they're not violating any licenses. I have to accept it; it's flattering that someone thinks my idea is great.
SM: I completely agree. When somebody copies your idea and clones it, even builds a business around it, it is flattering, it means your engineering has succeeded at some level.
KA: There is a general trend for microcontroller stuff right now, we're really on the cusp. It proves to the world it's possible. But a lot of the applications we have are underpowered, and we work well in a few particular scenarios, but it's not a knockout hit like OpenCV. That's going to start changing a little bit in the future, though, because silicon manufacturers are starting to get an understanding of how many devices they can deploy in the market, and what kind of revenue they can make by developing a future where it is possible to cheaply deploy cameras. In the next couple of years you're going to start seeing stupendously powerful microcontrollers arriving on the market, with about 100x the performance of what we have now.
SM: And it is driven by the cost consideration and the form factor.
KA: Yes! This little processor is running at 480 megahertz, and it can do four ops per clock cycle, and that gives you six fps person detection. That's going to be 100x in a few years. At that point then this little device the size of a quarter is going to be the same footprint, it's going to be able to do a complete computer vision solution where it's tracking multiple people instead of just like person / no person detection. It'll be full bounding box image segmentation on all the people in the field of view, and tracking them.
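The figures quoted above can be turned into a rough compute budget. The arithmetic below uses only the numbers from the interview (480 MHz, four ops per cycle, six fps, 100x); treating them as a peak rate and dividing by the frame rate is a simplifying assumption, since real workloads rarely sustain peak throughput.

```python
clock_hz = 480_000_000   # 480 MHz, as quoted
ops_per_cycle = 4        # as quoted
person_detect_fps = 6    # as quoted

# Peak operations per second available on today's part.
peak_ops_per_second = clock_hz * ops_per_cycle

# Upper bound on operations spent per frame at six fps (assumes peak rate).
ops_budget_per_frame = peak_ops_per_second // person_detect_fps

# The same calculation with the projected 100x improvement.
future_ops_per_second = 100 * peak_ops_per_second
```

That works out to roughly 1.92 billion ops per second today and about 192 billion with a 100x part, which is the headroom that would allow multi-person tracking instead of a single person / no person decision.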
SM: Yeah, that's amazing. These things are so low cost and low power, and the market for these things is huge. People would put it in stuff that maybe doesn't even require it. I read somewhere that you said that a toaster could have one of these, and it's not inconceivable when the price goes down substantially. Everyone can start having these microcontrollers.
KA: Well, yes. Imagine we've reduced the cost of the compute and camera for deep learning down to less than ten dollars per unit, and we just slap a camera on each level of your fridge. Right now everyone thinks they can only have one camera because that camera is too expensive, so they have one camera which has to do everything. We have to do the perfect lens correction somehow, get everything into the field of view… We can solve it by just throwing more cameras at it, and so now your fridge has a camera per side of the wall or even three cameras on each level, and from that, you can identify all objects within the fridge, and then it can literally tell you, when you're home or away, what's in your fridge, what to buy. There are all kinds of really cool things once we can kind of drive that price down to a ridiculously low level. Cameras then become a commodity instead of being treated as this ultimate sensor that requires the ultimate compute package. They're treated more as a sensor that we can use for random whatever applications.
SM: The people who want to help you with OpenMV, can just go to GitHub and contribute to it through a pull request, or is there any other way? Is there a forum that people can join?
KA: We've been running for about five years now, basically we have an online store where you can buy our camera system – it's just me and my co-founder Ibrahim, and we have two partners in China. It's not a big team, but we've managed to go pretty far. We also have a software optimizer hired this year, who's helped us to increase the performance of some algorithms by like a thousand percent. A lot of my stuff was naively coded just because I didn't have time to think about what's the best way to do it.
SM: I just want the audience to know that you are very down to earth, and you try to downplay a lot of work that you have done. Some things that you have done are spectacular – like putting AprilTag onto a microcontroller.
KA: I think that was one of the most impressive things I did. I wrote a median filter where I did it the naive way. I would compute the histogram for each neighborhood and then sort it, and then repeatedly do that, and so our software optimizer came in and said that's bad. Instead, he used a sliding window histogram, and the performance difference was stupendously huge. Now the OpenMV Cam supports a median filter where you can do like a 21 by 21 median filter on a 320x240 image and still hit above 10 fps on this microcontroller, which would have been maybe five seconds per image or something before. Basically, with a huge corpus of computer vision algorithms that we ported over, maybe it's kind of similar to OpenCV.
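The difference between the two median filters can be sketched as follows. This is a minimal pure-Python illustration of the general sliding-window histogram technique, not the code that ships in OpenMV; it assumes an 8-bit grayscale image stored as a list of rows, an odd kernel size, and border pixels replicated at the edges.

```python
def median_filter_naive(img, k):
    """Gather and sort every k x k neighborhood from scratch (the slow way)."""
    h, w = len(img), len(img[0])
    r = k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = sorted(
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
            )
            out[y][x] = vals[len(vals) // 2]
    return out


def median_filter_sliding(img, k):
    """Keep a 256-bin histogram per row and update it as the window slides."""
    h, w = len(img), len(img[0])
    r = k // 2
    target = (k * k) // 2 + 1  # rank of the median within the window
    out = [[0] * w for _ in range(h)]

    def px(y, x):
        # Clamp coordinates so border pixels are replicated.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    for y in range(h):
        hist = [0] * 256
        # Build the histogram for the window centered on column 0.
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                hist[px(y + dy, dx)] += 1
        for x in range(w):
            if x > 0:
                # Slide right: remove the column that left, add the one that entered.
                for dy in range(-r, r + 1):
                    hist[px(y + dy, x - r - 1)] -= 1
                    hist[px(y + dy, x + r)] += 1
            # Walk the bins until the cumulative count reaches the median rank.
            count = 0
            for value in range(256):
                count += hist[value]
                if count >= target:
                    out[y][x] = value
                    break
    return out


image = [[(x * 37 + y * 91) % 256 for x in range(12)] for y in range(9)]
same = median_filter_naive(image, 5) == median_filter_sliding(image, 5)
```

For a 21 by 21 kernel the naive version sorts 441 values per pixel, while the sliding version touches only 42 histogram entries per step plus a scan of at most 256 bins, which is where the large speedup comes from.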
SM: The core team is at Intel, and so there is a push towards making it faster and faster for all the Intel processors, but we also do Google Summer of Code which adds in completely new features. Then we have collaborations with other companies where we do optimizations for other processors. So, it's a mix of a consistent effort to make things faster. The DNN module is something that everybody is interested in, and everybody wants to make the DNN module faster on their processors, so that's a big focus right now. But there are also pretty random applications that get added during Google Summer of Code, which is very nice. These are rich applications; they make the library so much more usable.
KA: With OpenMV we've just been adding random features that people ask for, and there's never been a plan for what the library should have in it.
SM: That's been true for OpenCV as well. When I was a student back in 2001, I used it for two purposes: one was to read and write images, videos, etc. – that was very convenient. And the second thing was the face detector. The OpenCV face detector was always available. So, those were the two killer applications. After that, things just kept getting added, and every once in a while, the library is reorganized. We cannot have a plan to implement all the different algorithms because it completely depends on what the users want. If you try to do it yourself, you would end up implementing some good algorithms, but nobody is going to use them.
KA: Originally, I was trying to update OpenMV to have as many features as people needed to mirror all of OpenCV's functionality, and then I realized that's impossible, there's too much stuff in there, and I realized I had added a bunch of stuff that people never use. There’s a bunch of garbage in our library right now that needs to get thrown away, which is what I plan to do. Moving forward means focusing on really useful things, so mirroring OpenCV's image processing module, which is like the core functionality of OpenCV that most people use, followed by really leveraging TensorFlow Lite for Microcontrollers for deep learning. And then we also have someone bringing on support for NumPy. There's a library called ulab and it is effectively NumPy on MicroPython. It doesn't have all the necessary integration features yet, but once it fully supports our image array type, this will let you do NumPy matrix algorithms on images, more or less, on the OpenMV Cam. If you are a NumPy coder you can write all of the matrix ops to happen on OpenMV Cam and have it running on a microcontroller. How can people help us? Well, it's always just Ibrahim and me running the company, keeping OpenMV afloat, and also working on different things like an interface library for the Arduino. Driver development primarily takes up all my time because that's hard for anybody else to do if they don't have the necessary tools or understand the architecture. But we're always looking for help from anyone who wants to contribute to the image library and round out features that are missing. For example, all of our drawing algorithms are non-anti-aliased, so you can help to update them to be anti-aliased.
SM: Have you tried applying for Google Summer of Code and getting some interns?
KA: I haven't actually, I didn't know there was an option available.
SM: It is available, just apply and they may choose you, they will choose you I’m pretty certain.
KA: Ok I will do that. We did apply for GitHub sponsorship and we've got maybe $200 a month.
SM: Google Summer of Code is huge because they give you an intern and they pay the intern well, so if you get two or three interns that can completely fast-track some of the algorithms, some of the work that you're doing.
KA: One thing we want to do is to port the library out as an external module, so right now we have a few different things coming into play for our codebase. There's MicroPython, there's the STM32, and then there's our library, and all of them are a blur in their functionality being combined. You can't take our image library as a module and just put it into another system without editing a lot of different pieces of code. So, one thing that'd be helpful for us is if we could split our image library off completely from being integrated too heavily with MicroPython and our STM32 in particular. Luckily, my co-founder had the foresight not to directly inject a lot of MicroPython into our image library, so our image library does not depend on MicroPython, but there's a lot of STM32 specific things going on in it. One of the things we'd love to do is to break that into its own separate module.
SM: Cool. So, people can simply go to the GitHub repository, join your forum – do you guys have a forum?
KA: We have a forum, it's not on GitHub. We chose to put up a phpBB 3 forum so that Google can index it, unlike Discord-based or Slack-based forums – those are impossible for Google to index.
SM: What is the best way people can reach you?
KA: OpenMV is a company created to get the OpenMV cams on the market. It is hard to sell products otherwise. We have a website, a commercial email address, so you can sign up for our email list – we post updates when I have time. We have a Twitter @OpenMVcam, we do not own @OpenMV, that's owned by another account, and you can do #OpenMV to hit us up. We have been around for a while, so we've got products on sale on SparkFun if you're in the USA and you want to buy from there. We have a partner in China you can buy from, and then we have various distributors all over the world who hold our products that you can buy and play around with.
SM: That's awesome! And are you going to do the Kickstarter campaign for the micro camera that you showed?
KA: It's cheap to make, so we're going to be able to pay for it out of pocket and do a production run of about a thousand units without even needing Kickstarter. By the way, the camera does autorotation. It has an IMU located directly behind the camera, so when you rotate it, you can automatically rotate the camera field of view using the direction of gravity to figure out which way we're pointing. That'll also be helpful for visual odometry.
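Autorotation from gravity can be sketched with a single `atan2`. This is an illustration, not the camera's firmware: the axis convention is an assumption (accelerometer x and y lie in the image plane, and the reading is (0, -1) g when the camera is upright), and the function name is hypothetical.

```python
import math


def rotation_from_gravity(ax, ay):
    """Return the image rotation in degrees (0, 90, 180 or 270).

    Assumed convention: (ax, ay) = (0, -1) means the camera is upright.
    """
    # Angle of the gravity vector relative to the upright direction.
    angle = math.degrees(math.atan2(ax, -ay))
    # Snap to the nearest quarter turn and wrap into 0..270.
    return int(round(angle / 90.0)) % 4 * 90


upright = rotation_from_gravity(0.0, -1.0)
on_its_side = rotation_from_gravity(1.0, 0.0)
```

Snapping to quarter turns keeps the image axis-aligned; a real implementation would likely add some hysteresis so the picture does not flip back and forth near 45 degrees.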
SM: The production version would be out this December, right?
KA: I can’t say for sure; funds have been tight to be honest, with Covid-19. And we have just paid for a remanufacture of literally everything in stock, so we managed to pay that huge bill. I hope that we'll be able to announce a new production run of the micro probably in December or January. Probably we’ll just go directly to building a thousand units, and then putting it on the store and trying to sell it from there. One thing that's nice about it is that the whole purpose of this device is that you can plug it into another microcontroller easily and have it be commanded and controlled by that, so we have a remote control library that's going to let you talk to the camera and command it to do different things. For instance, you can tell the camera to track AprilTags, and then you can get that data on where the AprilTag detections are. Our RPC (remote procedure call) library is quite flexible, so this will let your main processor do something like ask the camera to detect AprilTags. The camera will then detect the AprilTags, and then it can return a JSON list of every tag in the image. That list will be serialized over the serial link to another processor, and then the processor can take that JSON string and decode it to get the same object representation of what the camera saw. So, it's a lot more featureful than what you may be used to if you've ever used embedded devices. We designed our RPC library to be a full-featured data link so that you can move anything over that link. You can get a JPEG image over that link if you feel like it. You want a JPEG image plus all the AprilTag detections – you can get both at the same time.
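The request/response flow described above can be sketched with Python's standard `json` and `struct` modules. This is not the OpenMV RPC library or its wire format: the length-prefixed framing, the `call`/`args` message shape, the procedure table, and the fixed detections are all hypothetical stand-ins, and the "serial link" is replaced by a direct function call.

```python
import json
import struct


def encode_message(obj):
    """Serialize an object as JSON with a 4-byte big-endian length prefix."""
    payload = json.dumps(obj).encode("utf-8")
    return struct.pack(">I", len(payload)) + payload


def decode_message(data):
    """Inverse of encode_message; returns the decoded object."""
    (length,) = struct.unpack(">I", data[:4])
    return json.loads(data[4:4 + length].decode("utf-8"))


# Camera side: a table of callable procedures (names are hypothetical).
def find_apriltags(_args):
    # Stand-in for real detection: returns fixed, made-up detections.
    return [{"id": 3, "cx": 120, "cy": 80}, {"id": 7, "cx": 40, "cy": 200}]


PROCEDURES = {"find_apriltags": find_apriltags}


def camera_handle(request_bytes):
    """Decode a request, run the named procedure, and encode its result."""
    request = decode_message(request_bytes)
    result = PROCEDURES[request["call"]](request.get("args"))
    return encode_message(result)


# Controller side: send a request over the link and decode the reply.
def call(name, args=None, link=camera_handle):
    return decode_message(link(encode_message({"call": name, "args": args})))


tags = call("find_apriltags")
```

Because the reply is ordinary JSON, the controller ends up with the same list of dictionaries the camera produced, which is the "same object representation" idea from the interview.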
SM: That's very nice! I wish you all the best with the new camera. It was such a pleasure to talk to you! I have wanted to talk to you for a long time because embedded vision is something that I don't know much about, and it is great to know from somebody who has so much experience – we are learning from you! Brandon, who is the chief architect of OpenCV AI Kit, thinks of you as a mentor, your name comes up in our discussions quite a lot. There are things that you directly tell Brandon, but there are also other things that we learn by watching your videos online. It's been very useful, especially that Hackaday: Fail of the Week was such a big learning experience. We were inspired that despite that debacle you just came back in a very big way and you delivered the product. We are in that stage right now, we have to deliver our product, so it gives us a lot of inspiration, a lot of hope.
KA: I think OpenCV AI Kit is amazing and it's going to be super powerful, and your colleagues know what they're doing in terms of hardware design. There are always pitfalls when choosing the right sensor, and building hardware is difficult because there are real constraints on where you can buy things from, when you can buy them, and how much you can buy.
SM: In your case, it was not a design fault, it was one of these weird things: you found a good sensor, a good chip, but it had just this weird quality, so it was not really your fault, but you still have to bear the responsibility and you did.
KA: Yes! When you're a manufacturer of hardware and you ship a product, it's your fault if that product doesn't arrive, and you have to refund the customer no matter what, even if you handed it to the shipping company correctly, and you wrote down the address correctly, and everything on your side went well. If that shipment happened to get robbed by pirates on the way to the customer, you still have to refund.
SM: These products often get stuck at customs for some time, and it's difficult to know where things were lost.
KA: One of the issues of selling hardware is that it doesn't come with a box that has pretty pictures and big words, so customs officials typically will look at that as contraband.
SM: Wow! Fortunately, ours will come in a case!
KA: Make sure then to put that in a box that has some labeling like OpenCV, and it's all in a white box with all the regulation stamps. That's the best way to get through customs. With OpenMV cams we spend a little less on the packaging, just put it in anti-static bagging with a barcode on it, but having those things though does help.
SM: That's good advice, that is something only people with experience can advise us on.
KA: Here's a footnote. My co-founder lives in Egypt, and I have to ship him hardware now and then, and it's a challenge to do that because of customs and regulations. Sometimes they impound the cams that I send to him.
SM: It's been good talking to you, let's stay in touch, thank you so much for making the time for this long interview, I'm glad we covered a lot, and I’m sure that our audience will be inspired by all the work that you have done, and OpenMV will probably get a few new people who would work with you.
KA: We're looking forward to hopefully working closer with OpenCV in the future just to enable computer vision more on all the embedded devices that are going to start appearing. In the next ten years, we're going to witness high volume integration of ultra-powerful machine accelerators for all different types of AI applications, so we hope that OpenMV will enable folks to make that a reality and make it easier for people to see this new wave of deep learning and computer vision devices being integrated into all sorts of random trivial things.
SM: That's great, thank you, we'll keep in touch!