In this episode, Anna Petrovicheva (CTO of OpenCV.AI) meets Ara Ghazaryan, Scylla Tech co-founder and lead data scientist, to talk about the mission of Scylla, a real-time threat detection system, and its prospects.
Anna Petrovicheva (AP): Hi, my name is Anna Petrovicheva and today I’m going to be speaking with Ara Ghazaryan, Scylla Technologies co-founder and lead data scientist. Scylla tech focuses on real-time threat detection systems. I’m excited to have you here! To start with, would you please describe what your company is all about?
Ara Ghazaryan (AG): Thanks for having me here! I’m Ara, Scylla technical co-founder and lead data scientist. In short, Scylla is a world-leading real-time physical threat detection solution. Our mission is to empower the private security industry with next-generation AI solutions. And with every new product, we strive to make safety more accessible to those who could not afford it otherwise. We offer a wide variety of powerful AI physical security solutions, including object detection within a perimeter, intrusion detection, thermal screening, anomaly detection, behavior recognition, personal search, and false alarm filtering, to name a few. We started off as a startup a few years ago, and now we are proud to deliver our solutions to companies such as Daimler, Major League Baseball’s Chicago Cubs, Oman International Airport, Legrand Mexico Facilities, and USC Verdugo Hills, among others.
AP: Thank you. So, could you please tell us a bit about how you got into this industry, what is your story?
AG: Well, my story is a little bit cumbersome. I started in academia: my background is in optical physics, and I was doing research in biomedical optics and imaging. I used to develop tools that nowadays are available with a single line of code; back then we had to develop all this matrix analysis and everything from scratch and do all the mathematics behind it. Back then image analysis, conversion, and all of that were pretty exciting. Then at some point I met the co-founders of Scylla: Albert, Armen, and Ashot. I joined them and probably brought some insight; that was when Scylla was just deciding what ideas to pursue. I jumped from my academic research life into this very exciting field. The company was chosen as one of the startups in Startup Boot Camp in Amsterdam and spent several months there; after that, we gained our first, very important customers.
AP: When you research algorithms in academia and when you create products in a startup, what is the main difference between these types of work?
AG: Well, I must confess, there is a difference, of course. If you're developing a product that is supposed to go into production and sales later on, the requirements are very strict and, depending on the use case, customer-oriented and goal-oriented. In academia the goals are slightly different: it's more generic, it's more about the community. The most important difference is that what I'm doing here is very exciting to see in action: you see that it is potentially saving, or at least safeguarding, people, that it saves some trouble, and that it works on a day-to-day basis. There's a person without a mask, and the system raises an alert; there's an intruder, an unwanted person, on the premises, and the owner gets an alert, and we get the confirmation that the system worked and performed. And that's something that you typically don't have in academia. You just write an article and send it out, and you don't even know whether it will be okay. Well, people cite you and you know that some part of it was used in later research, but the endpoint is not obvious unless you jump from academia and create your own company and develop your own product or whatever it is. So the clear difference is the endpoint: here it is very distinct, and it's very inspiring too. Other than the commercial part and everything, you really work on making the world a safer place.
AP: Well, that's a big mission, I would say. I’d guess that there are also differences in, you know, the timelines. In product development, you do have some deadlines, which you may not necessarily have in academia. Obviously, in academia there are deadlines for, you know, paper submissions, but I would guess that timelines are way shorter in product development, am I right?
AG: For sure, you don't have the luxury of taking your time and checking and re-checking everything, but at the same time, of course, the product should be working. You can't just take shortcuts: they're not always justified, and you cannot always afford them. And yes, the pace is much more aggressive here. To be honest, one of the personal reasons I jumped into this type of environment was that I craved this faster pace, some, let's say, aggression in the development. Because in academia you are too relaxed and you don't see things changing. Sometimes it drags, and there is nothing like that fast-fast-fast race.
AP: And one point that I’m personally really interested in: I would guess that the reliability requirements are completely different in academia and product development. I’m not necessarily right here, so we can discuss it. When you create something in academia, the aim of your work is “to be published” and “to be continued”, but in threat detection, in the product, you have a completely different level of responsibility when you release something, so you will probably need to test it way more thoroughly.
AG: Definitely! If we are talking about computer vision and AI, we all know that there's a big portion of statistics underneath. Statistics means that there's a portion of true versus false results, right, and, of course, you want to bring the true portion as high as possible. But it's always 99.99-something. The question that you raise is very important because this 0.01 is something you are responsible and liable for. There are two things that you have to keep in mind. The first thing is that the end client should be aware of that. There's no magic solution that works a hundred percent of the time; this should be clarified and this information should be provided clearly. That's why we have very thorough white papers for each and every product, and these white papers contain all sorts of information, including these numbers. So the client is aware that it's an augmentation; in some cases it doesn't completely substitute for a human. I mean especially cases where a human has to be there; there are many solutions that I can talk about, e.g. drones. In some cases, you are increasing the number of eyes of the helicopter pilot: you put cameras on the helicopter so that the system does a job that one human being otherwise wouldn't be able to do, but at the same time, it's still 99.99-something. Our task is to add as many nines as possible. In many cases I can give some astonishingly high numbers, like two misses per week for 30 cameras or something like that. So, again, we demonstrate quite high accuracy, but it's still not 100. And if that is reliable, and legally everything is covered, then we're good to go.
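Editor's note: to put the "two misses per week for 30 cameras" figure on a per-camera, per-day scale, here is a minimal arithmetic sketch. It assumes the misses are spread evenly across cameras and days, which the interview does not state; the function name is purely illustrative.

```python
def misses_per_camera_per_day(misses: int, cameras: int, days: int) -> float:
    """Average miss rate, assuming misses are spread evenly (an assumption)."""
    return misses / (cameras * days)


# Figures quoted in the interview: two misses per week across 30 cameras.
rate = misses_per_camera_per_day(misses=2, cameras=30, days=7)
print(f"{rate:.4f} misses per camera per day")
```

Under that assumption the rate comes out at roughly 0.01 misses per camera per day.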
AP: This makes a lot of sense, especially given that the industry you're working with is very specific; you have a very big mission, but also a lot of responsibilities.
AP: So, I wanted to talk a bit more about threat detection again. This is a very specific industry you're working in. What is the difference in artificial intelligence algorithms for threat detection compared to some general artificial intelligence applications now?
AG: Again, to clarify: of course, we all understand that artificial intelligence itself, as a topic and as an idea, is slightly different from what computer vision tasks are doing. I would just add what we usually claim, and that's about our training and implementation process. When you start to teach the models, at least in my own experience, there is a sigma type of jump when the model reaches the level where it aids the training process itself. How it works: of course, we're talking about data sets, we're talking about the more the merrier, etc., but what we have preached so far is that our models are, first of all, so accurate that they aid the collection of the data, so there's this feedback loop into the data sources that we create. Previously we used a lot of manpower for that. Nowadays the system itself is at some level self-learning, and yes, we can say that we have much more of artificial intelligence than is typically claimed, because it's a hype word and everyone is using it. So it's not just clean machine learning in this sense: the data set keeps being created. We taught the child to the level that now the child can look at the world and say “okay, I see another weapon, here is another weapon, let's add it to my data set”, and on the next iteration it becomes smarter. So there's a clear difference in that respect. And of course use cases define a lot of directions: it's not just boosting the numbers, you have to know what is important for the client at the end of the day. There might be cases where one thing is not as important as another; for example, detecting smaller objects can be much more important than detecting in the darkness.
The use case tells us, for example, that for the implementation on a drone, smaller objects are much more important than anything else, because the drone looks down from high up and everything is small there. So different use cases define the direction in which the models will be trained. There can be a super cool model with the highest accuracies, but for a specific use case, a particular direction would be much more interesting and desirable.
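Editor's note: the loop described in this answer, where a trained model proposes new examples for its own data set, resembles what is commonly called pseudo-labeling or self-training. The sketch below only illustrates that general idea; the `Detection` type, the function name, and both thresholds are hypothetical and are not taken from Scylla's pipeline.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    frame_id: int
    label: str
    score: float  # model confidence in [0, 1]


def select_pseudo_labels(detections, accept_threshold=0.95, review_threshold=0.60):
    """Split detections into auto-accepted labels and items for human review.

    Detections scoring below review_threshold are discarded. Both thresholds
    are illustrative assumptions.
    """
    accepted, review = [], []
    for det in detections:
        if det.score >= accept_threshold:
            accepted.append(det)   # confident enough to join the data set
        elif det.score >= review_threshold:
            review.append(det)     # uncertain: send to a human annotator
    return accepted, review


detections = [
    Detection(frame_id=1, label="weapon", score=0.99),
    Detection(frame_id=2, label="weapon", score=0.72),
    Detection(frame_id=3, label="weapon", score=0.30),
]
accepted, review = select_pseudo_labels(detections)
print([d.frame_id for d in accepted], [d.frame_id for d in review])
```

In a loop of this kind, the accepted items would be added to the training set before the next training round, and the review queue is where the remaining manual effort would go.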
AP: I would also guess that data is really hard to combine in threat detection, am I right?
AG: Exactly, you are right, especially nowadays with all the privacy issues and concerns. You might think that if there is some publicly available data you can get it and use it, but at the end of the day that's at least not allowed for commercial use, and we are very careful in that respect; that's why we ended up creating our own data sets. A lot of the data sets in the field are created by us in different locations. We also use synthetic data: we synthesize it, and we have our own teams that create the data. We started with the commercially available data sets, and then we added our own on top of that.
AP: I would also guess that one of the problems with public data, apart from the licensing issues, is that in some cases it's captured not in your real environment with your threats and weapons, but in some artificial environment with actors who are overplaying their roles, so it's a bit too much and not similar to what happens in real life.
AG: For sure, and that is a big concern, especially when we're talking about, let's say, action detection models. We have two branches: the main branch is object detection and tracking, and the second one mainly detects the actions themselves. The action detection models are very specific in that respect. It's very hard unless we are given data by the client, and usually that's how it works. For example, we have the shoplifting solution, which detects shoplifting acts. The dataset is provided by the client, and it's a huge dataset of long-term recordings, with all the prices, and everything in mind. That's how it works, it's trained on this specific data: you cannot stage shoplifting, because none of us is an expert in this field, and the same goes for fights. You should see it: sometimes we do live demos and tests, and with the fight detection model that we have, it's always so hard to showcase the demo, because we are peaceful guys here and we need to mimic the fighting, and it doesn't always look worrisome.
AP: I expect that the threat detection industry is going to be huge. I wanted to ask you about its current state: how many clients or how many installations do you see now and what do you expect in the future? My impression – I may be wrong here – is that right now it's getting more and more common to install automatic threat detection systems everywhere, not only in major big facilities but also in small shops, for example. What do you think about the future of threat detection using artificial intelligence?
AG: You are completely right. Because the field is new, the solutions are new, of course, some people are still reluctant. In some cases, these are people who are decision-makers and they are, for example, an old police officer who is used to this old-fashioned way. Even if they are presented with this solution, we can judge by the questions they ask, that they are not fully aware of the capabilities and sometimes they overvalue it, or they think it's magical. They cannot be persuaded that the system can augment what a human person can do, and, in many cases, it is doing the job much better.
When I am asked about the limits of the system, sometimes I’m given video footage and asked if it can detect the gun in this footage. The rule-of-thumb answer that I usually give, the straightforward answer, is: if you can see the weapon, then the system will see it too, and most likely it will do it better. It's not that I think a machine is superior. The reason is that the algorithm, at least the one embedded in our solutions, zooms in on the area. That's the checking and re-checking: it has internal cascading decision-makers that check and re-check the zoomed-in area, and that's why we don't miss the weapon, and that's why we don't make more than 0.01–0.05 mistakes per camera per day. If a human being could zoom in on different areas five times a second, well, that would be pretty amazing and he would outperform the system that we are offering, but usually that's not the case. And human-factor errors are quite common, especially since 24/7 monitoring is a tough and tiresome job to do.
To cut a long story short, there is an augmentation of human capabilities in many aspects. We're talking about weapon detection, but if you look at the solutions that we provide, there is also, say, counting people as they pass. I mean, there are so many similar solutions, each with its own problems. I won't dwell on each and every one, but I can give some examples of common solutions, and the way we overcome those problems: we use the capabilities of computer vision, which is typically faster, works 24/7, is free of human-factor errors, etc.
Of course, it has its own limits; it's not magic. We cannot detect the intention to commit a violation, a question which we are sometimes asked during the demos. A person reviewing the system asks if we can detect intention. Unfortunately (or, judging from some movies out there, luckily) we cannot do that yet. But if you can see it, if a human being can see it, then most likely the system can too, and maybe even better, because of these integrated smart features; that makes the difference. That is the main thing that I talk about when a person asks this question.
AP: It's like having a person expert in the field, who is never quite hungry or unhappy while looking at this area to decide whether it's a weapon or not, right?
AG: Exactly, and again, using all the capabilities of the system. Just an example: imagine a 4K camera looking at something which is 50 meters away. To see whether or not there's a weapon in the hand of the man out there, you need to zoom in on this very small portion. Now imagine a system that does it five times a second: zooms in on the section and checks it. It doesn't miss a frame, it looks at each and every frame separately, does the analysis in time, and draws a specific conclusion based on that. Human beings physically cannot do that manually. In certain areas, yes, but not that fast, and it’s always like jumping from here to there in a crowd of people. That's why in these specific cases the system definitely outperforms human beings. But still, it's an augmentation of the existing surveillance system that you have there, so the final decision to take action is still at human disposal. They have to make the final decision; the system will not do that. Basically, the maximum it can do automatically is to close the doors and wait for a confirmation from a supervisor to open them up.
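Editor's note: one common way to "zoom in" on every part of a high-resolution frame is to cut it into overlapping tiles and run a detector on each tile. The sketch below shows only the tiling geometry. The tile size, the overlap, and the function names are assumptions made for illustration; the interview does not say how Scylla selects its zoom regions.

```python
def tile_frame(width, height, tile=640, overlap=64):
    """Return (x0, y0, x1, y1) boxes that cover the frame with overlapping tiles."""
    stride = tile - overlap

    def starts(length):
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, stride))
        positions.append(length - tile)  # last tile sits flush with the edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


def to_frame_coords(tile_box, det_box):
    """Shift a detection box from tile coordinates back to frame coordinates."""
    tx, ty = tile_box[0], tile_box[1]
    x0, y0, x1, y1 = det_box
    return (x0 + tx, y0 + ty, x1 + tx, y1 + ty)


tiles = tile_frame(3840, 2160)  # a 4K UHD frame
print(len(tiles), tiles[0], tiles[-1])
```

With these assumed settings, a 4K frame yields 28 tiles. Running a detector on each of them five times a second is the kind of workload a human operator cannot reproduce by zooming manually.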
AP: So, it's not autonomous at this point, at least yet, right?
AG: In many cases not. We alert people, that's the idea.
AP: Do you expect these systems to be more and more common in the upcoming years, probably exponentially?
AG: For sure, especially because the technical background is developing so rapidly. Initially, AI as a whole became available and abundant thanks to this emerging sector: the educational side that raised so many specialists, the hardware behind it, like the GPUs that are used in most cases, and, moreover, the testing and the physically available items. I can run our system on my notebook and make it analyze 25 cameras. One notebook can analyze 25 cameras! That is more than we could have dreamed of 10 years back. But looking at the near future: currently, for example, embedded solutions cannot be that accurate. Cameras that are being sold nowadays have some embedded analytics. But at the end of the day, in our experience at least, many users have been overpromised some things, or maybe they imagined things differently, and they end up not liking the accuracy levels because there are misses and false positives, especially, for example, in motion detection for perimeter protection. It gives probably something like 30 alerts a day per camera (we know this number from real installations), and guards start not paying attention, because it's a shadow that raises the alert, and so on. That's unless you put some smart analytics into this type of solution. For example, one of the solutions we provide is intrusion detection based on the actual visual content: if it's a human being or a car (or both), then you get the alert. Or we filter the false positives that are sent out by the camera. For the moment, yes, embedded systems cannot afford to be that accurate, but I foresee that shortly there will be smaller and smaller hardware that can provide this type of computing base, so the solutions can be embedded locally. That's one of the directions, as well as cloud-based solutions.
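Editor's note: the false alarm filtering idea mentioned here (forward a camera's motion alert only if the frame actually contains a person or a car) can be sketched as below. The alert format, the `classify` callback, the score threshold, and the stub classifier are all hypothetical stand-ins; a real deployment would call an actual detection model.

```python
ALERT_CLASSES = {"person", "car"}  # content that should trigger an alert


def filter_alerts(motion_alerts, classify, min_score=0.5):
    """Keep only motion alerts whose frame contains a relevant object.

    `classify` is any callable mapping a frame to (label, score) pairs.
    """
    forwarded = []
    for alert in motion_alerts:
        detections = classify(alert["frame"])
        if any(label in ALERT_CLASSES and score >= min_score
               for label, score in detections):
            forwarded.append(alert)
    return forwarded


def stub_classifier(frame):
    """Stand-in for a real detector: returns canned results per frame name."""
    canned = {
        "frame_shadow": [],
        "frame_person": [("person", 0.91)],
        "frame_cat": [("cat", 0.88)],
    }
    return canned.get(frame, [])


alerts = [
    {"camera": 3, "frame": "frame_shadow"},
    {"camera": 3, "frame": "frame_person"},
    {"camera": 7, "frame": "frame_cat"},
]
kept = filter_alerts(alerts, stub_classifier)
print(kept)
```

Here the shadow and the cat are dropped and only the alert containing a person reaches the guard, which is the effect described for cameras that otherwise raise around 30 motion alerts a day.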
All of our solutions can be installed both locally and in the cloud, because we know that end-users want different architectures. We know the benefits of the cloud, of course; that's straightforward: no hardware needs to be bought, etc. But in my opinion, for the moment cloud-based installations have two issues that haven’t been solved yet. Maybe 5G can fix the first one, which is the network: the bandwidth is not always sufficient for the analysis to be fast and reliable, which can make it infeasible. The second is price. There are some affordable solutions, but if you do the calculation, especially for low-hardware solutions like ours (as I said, it really needs little hardware), you end up with the following numbers: for the price of two months of cloud payments you can buy the hardware, own it permanently, and have the solution on-premises. Also, people so far don't trust the cloud that much, especially when it comes to security: sending the video feed to some cloud to be analyzed is not something everyone will like. There are names out there that are trustworthy, but still, it's something that is not that easy nowadays. So we end up having a lot of customers that want their own solution on their premises: no touch, no connection to the Internet or whatever, just a closed system. And yeah, we do provide that.
AP: This makes total sense to me. I would expect that in some facilities it would not be even possible to send a whole video feed due to no remote Internet connection. And you cannot afford not to detect something because of an unstable Internet connection obviously.
AP: This brings me to my last question about the general future of deep learning and computer vision, for example, for the next five years. You already mentioned that embedded systems will be big and also cloud systems will probably be growing, but maybe not at the same speed as embedded. So what other trends do you see based on your experience over the next five years?
AG: Some ideas, definitely: one is a generalization of the solutions. I also foresee, and hopefully that will happen sometime soon, a change in the monopoly that currently exists in AI solutions, especially with hardware. There's some kind of a monopoly and we all know that. NVidia provides superb products, very good in everything, hardware-wise, and the libraries that come with it are also very nice and everything, but it's still a monopoly. There is no alternative out there and we all know what monopoly brings with it, right? So that's one of the things I would at least like to see changed in the near future.
As for the generalization of the processes, there are already some startups, some ideas. For example, there are some models already out there that you can purchase and use as-is. Of course, the model is not always the whole solution; there are a lot of things that (at least in our case) we do, e.g., post-processing and pre-processing. For example, our solutions are usually not just one single model, but a cascade of models. Anyway, I can envision that there will be some markets of pre-trained, production-grade models; that's one thing that will be there. Also, of course, hardware will get better and better, especially with breaking news like the two-nanometer scale from IBM. Let's hope that at some point we will see that in other fields too, like GPUs, and everything will add up. And maybe quantum computing will come to the field and become available to general consumers and developers, let's see. And that brings a few new opportunities, because there are many ideas that are currently collecting dust on the shelf, not because they are not cool or not feasible (in fact, they are feasible) but because they are not commercially attractive. I can make something very cool work on one computer, but that will be for very specific tasks and very specific niches, for example, military solutions. In that respect, in some cases, yes, hardware is pretty affordable. Just an example: for the server hardware of one of our solutions, and we're talking about not just a general computer but a decent server with everything included, ready for 24/7 operation, the hardware will be something like 50 dollars per camera for the installation, and typically the camera itself costs more than 250. So it's a small addition to the camera cost if you want to install the server, which will be analyzing 24/7 from then on. It's not the bottleneck, but some other solutions are waiting their turn just because commercially they wouldn't be that attractive.
Any solution that is based on pose estimation will depend heavily on the number of people in view, and either it will not be usable when, let's say, there's a crowd, or it will require huge computing power.
AP: You mentioned these marketplaces with pre-trained models that work. It seems that we are very much aligned here: OpenCV AI will very soon release OpenCV AI Model Place, which is exactly what you were talking about, so please check it out when there is a public release of this product.
Thank you so much, it was a wonderful talk, a lot of insights here. Thank you for sharing this with our followers!
AG: Thanks a lot, Anna. Let's boost this field and, as I said, let's also make the world a safer place, if that is possible by means of technical advances and development! That's what we are going to do.