Episode 06
28 OCT 2020 — 69 MINUTES

In this episode, Satya Mallick, OpenCV CEO, talks with Serge Belongie, professor of computer science at Cornell Tech, whose research interests include computer vision, machine learning, and human-in-the-loop computing. He is also a co-founder of several companies, including DigitalPersona, which built what has been called “the world’s first mass-market fingerprint identification device”, and Anchovi Labs, which was acquired by Dropbox.

The video and audio versions are available on YouTube, Spotify, SoundCloud, Google Podcasts, Apple Podcasts, and Stitcher.

Satya Mallick (SM): Hello everybody! In this episode of AI for Entrepreneurs we will talk to Dr. Serge Belongie. He is a professor of computer science at Cornell Tech, and he has made significant and fundamental contributions to the field of computer vision through his research. In addition, he has started successful AI companies. His first company, DigitalPersona, created the first mass-market fingerprint recognition system. And his second company, Anchovi Labs, was acquired by Dropbox. Not only that – he is the founder of Visipedia, and, as a professor, he started a rock band.

So, without further ado, let's welcome Dr. Serge Belongie. You're listening to AI for Entrepreneurs and I'm your host, Satya Mallick. 

Welcome to the podcast, Serge. It is such a pleasure to have you with us here. And we have such a long history at UCSD and I'm so glad that you could take the time to be on this podcast.

Serge Belongie (SB): Thanks, Satya. Yeah, it's a privilege to be able to do this.

So, I have a very interesting story from UCSD – I don't know if I've ever told you. We were all taking your class at UCSD, and for the first two days one of the students thought that you were the TA because you were so young – you were, I think, three or four years older than us. She thought that you were the TA, and I had to tell her, “No, this is Professor Serge Belongie.” But for two days she thought you were the TA. And another interesting thing was that we were all surprised that you sat down and coded with us and the other lab mates...

Yeah, those were the days.

Those were the days. Do you still code?

Not very much. The last time I did was for the Image Processing class. At some point, there were no students left that were using MATLAB, so we just moved everything over to Python. And so, I had to go through all the sort of history of the exercises and switch them to Python.

So, now all the students are using Python, right?

Yeah, it's done.

How I Met Pietro Perona

SM: So, what's the trend these days? Do people use OpenCV with Python, and for Deep Learning people use PyTorch or TensorFlow? What's the trend?

Certainly, on the Deep Learning side people just choose their favorite packages, whether TensorFlow-based or PyTorch-based. But really for the computer vision fundamentals – things like camera calibration, basic image processing, pre-processing routines, or color space conversion… any of those nuts-and-bolts things that you need for, I guess you'd say, the computer vision part of the pipeline, not the machine-learning-focused parts, but especially things like calibration or tracking – it's OpenCV that people are using.

Right, right. So, that's our experience also. As soon as we jump to Deep Learning then we start using PyTorch usually in our company. And there was also this flow from Caffe to TensorFlow to finally PyTorch. So, I wanted to talk a little bit about your background because when you started in Computer Vision back in 1996 as an undergrad the field was almost non-existent.

It was before 1996. Yeah, so I started as an undergrad in 1991, and I met Pietro Perona in 1991. And we started doing work maybe a year later or so.

Interesting. So, Pietro Perona is one of the top researchers in our field. How do you get somebody like Pietro to become your advisor when you're an undergrad? I mean, this is a question that students ask me all the time. And I don't have a very good answer because I didn't have this firsthand experience. 

SB: Yeah, I do have a story about that. So, when I got to Caltech, I briefly considered majoring in Physics or Math. Then I kind of realized that wasn't going to happen and I switched to Electrical Engineering. And there was this one course that people were very afraid of. It was called EE-14B. It was a sort of Solid-State Circuits course. And there was this brand-new professor that Caltech's EE Department had hired. He had just been a postdoc at MIT, and before that he got his Ph.D. at Berkeley. And his name was Pietro Perona. So, this guy was assigned to teach EE-14B, this notorious weed-out course. I was out in the hallway outside Bob McEliece's office – he was another professor, of Information Theory, at Caltech at the time – and I happened to be standing there when Pietro got the news that he had to teach this course. And it was not a course that he wanted to teach, and he was very upset about it. And I remember just trembling outside thinking “oh my gosh, this is the person I'm going to be taking this class from.” And sure enough, I took EE-14A, which is a Linear Circuits course, and then Pietro was our professor. This was my sophomore year, 1992 to 1993. So, I was taking this Circuits course with him and it was awful. It was a really bad class, except there was a point in the class where we were talking about filters. And it was a stretch, but you could see that during that part of the course Pietro just lit up, and he was so interested in talking about filtering. I thought “oh, okay, this seems to be what he's interested in.” And I remember dropping by his office hours, and he confided in me that he wasn't that interested in the core material of the course – it was just something that he had to teach. But he was interested in talking about filters, and that led to him introducing me to a very interesting paper on image filtering called Steerable Filters, which is a famous paper by Bill Freeman and Ted Adelson.
And in some sense, the rest is history, because when he gave me that paper, it just changed everything.

Actually, that's also a paper that you recommended in your class, and it was one of those papers that you read and are just really impressed by: wow, this is really cool! I had that experience too when I read it. So, what happened next? How did you convince Pietro Perona to do research with you? Or how did you join his group?

It was indirect, because there was a postdoc named Hayit Greenspan who was working with another professor, Rod Goodman, who was at Caltech at the time. And Hayit was already working with Pietro and Rod on some steerable-filter-related projects. Caltech has an undergrad research program called SURF (Summer Undergraduate Research Fellowship), and I just applied for that. In those days there weren't these incredible industry internships like people have today at Google and Facebook, and so on. So, it was quite appealing to be able to do a summer research project right there at Caltech. So, I did that in Rod Goodman's lab. And it was through that project, where we were working on efficient implementation of steerable filters, that I got to know Pietro, basically as a second advisor for that project. And then he just opened the door to his lab, and that was when I met people like Stefano Soatto, Jean-Yves Bouguet, Roberto Manduchi, and so on. Thomas Leung was there too; he was also an undergrad, one year ahead of me. No doubt, it was a formative experience in terms of understanding how, as an undergrad, there was something that I could bring to the table. But it was also very scary because these people were extremely smart and accomplished.
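(For curious readers: the core idea of the Freeman–Adelson steerable filters paper mentioned above is that a derivative-of-Gaussian filter at any orientation can be synthesized as a linear combination of a small fixed set of basis filters. A minimal NumPy sketch of the first-order case – illustrative only, and not the efficient implementation discussed in the episode:)

```python
import numpy as np

def gaussian_derivative_basis(size=9, sigma=1.5):
    """First x- and y-derivative-of-Gaussian basis filters on a size x size grid."""
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax)
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    gx = -x / sigma**2 * g  # derivative along x
    gy = -y / sigma**2 * g  # derivative along y
    return gx, gy

def steer(gx, gy, theta):
    """Synthesize the filter at orientation theta from the two basis filters."""
    return np.cos(theta) * gx + np.sin(theta) * gy

gx, gy = gaussian_derivative_basis()
g45 = steer(gx, gy, np.pi / 4)  # a 45-degree oriented filter, no re-convolution needed
```

The appeal is that filtering an image with just the two basis kernels lets you compute the response at any orientation afterward by the same linear combination.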

That's really interesting because when I see your work – it is spread over so many collaborators. It is just the who's who of Computer Vision and you have worked with all of them. It's almost like Paul Erdős who used to work with so many mathematicians. It's just incredible! Across so many different problem domains as well as people that you've worked with. 

Digital Persona and First Fingerprint Recognition System

SM: So, tell us a little bit about DigitalPersona. During your undergrad, you started DigitalPersona, a company that made the first widely adopted fingerprint recognition system. Could you walk us through what the process was like? How did you come up with the idea? And fingerprint recognition is extremely hard. It's not an easy problem: there's the hardware component, and then there is the software component. Even getting the hardware right is not trivial. How did you go about working on that?

In 1994 Pietro Perona and Yaser Abu-Mostafa – another professor of Pattern Recognition at Caltech – jointly offered a project course for upper-division undergrads and grad students. It was just called Projects in Pattern Recognition. So, Vance Bjorn and I – Vance was another undergrad, in Computational Neuroscience – took the class and proposed a project on fingerprint recognition using wavelets. At the time wavelets were as big as Deep Learning is right now. We just wanted to use wavelets for something. And so, we proposed a project using discrete wavelet transforms to do fingerprint recognition, which was actually a bad idea, at least the way we were doing it. Pietro and Yaser let us run with it, and in the first couple of weeks we realized our approach didn't work, and we transitioned to some other approaches. And we came up with something that worked reasonably well. We were working with a third-party sensor from a company called Identix. It was as big as a Kleenex box and it probably cost a thousand dollars. And at the end of that class, Pietro pulled us aside and said “if you're interested in commercializing this, I happen to know some of the founders of Logitech.” Logitech at the time was a Switzerland-based company with an office in the San Francisco Bay Area. And so, Vance and I said “sure, make the introductions.” And we met Pierluigi Zappacosta, who was one of the founders of Logitech. And this was a time when not that many people were spinning off startup companies, especially at Caltech, which was pretty focused on core Math and Science. So, we got to know Pierluigi and explored various ways of starting a company. And Pietro and Yaser became some of our advisors for that startup.

Well, how did you go about it? Did you take VC funding?

Pietro made the introduction to the founders of Logitech. We went through some false starts before connecting with the Logitech team – we tried to start it on our own, and we didn't really know what we were doing. With Pietro and Pierluigi's help, we were able to get back on track and then raise VC funding to start this company. One of the things that really helped us – and I was just mentioning how at the time there weren't that many spin-off companies happening, particularly at Caltech – was that during the graduation address, Gordon Moore, who was chairman of Caltech's board of trustees, was speaking before all the degrees were handed out, and he shocked us by mentioning Vance and myself as examples of the entrepreneurship he was hoping Caltech would embrace in coming years.

Wow! That’s impressive!

And it was really one of those imposter syndrome moments because we hadn't succeeded yet. We were sort of trying to do this thing, and we weren't sure what we were doing. But it was kind of an out-of-body experience to see Gordon Moore talking about this. And he actually helped in many ways down the road in terms of raising additional funds and making introductions. And that was all just through that Caltech network. Vance and I both assumed that we'd be going on to grad school to get PhDs, and I ended up going to Berkeley to work with Jitendra Malik, and Vance went to MIT to work with Tommy Poggio. Vance was flying across the country probably multiple times per month, because you pretty much had to be in the Bay Area to do this kind of thing in the mid-90s. In my case, I was just crossing the San Mateo or Dumbarton bridge on a regular basis, and I realized that my heart was really in academia. In any given semester I would go to a conference one month, and then a tradeshow the next month. At tradeshows, I had to wear a tie, and I didn't know how to tie a tie. I had to talk to business people and investors, and I was so out of my element. But at CVPR – when it was in San Francisco in 1996 – it just felt like the right environment for me.

However, Vance was so at home, so at ease, and so effective in entrepreneurial circles that he ended up going on leave from MIT on very good terms. Tommy Poggio became one of the advisors of DigitalPersona, and Vance really came into his own as a leader, as the CTO of DigitalPersona, setting up shop in Redwood City. And so, I worked closely with DigitalPersona until I graduated. Actually, I did a short postdoc at Berkeley after graduating, and then eventually, through Professor Anil Jain at Michigan State, we found and hired someone to replace me as the chief researcher at DigitalPersona. And that was when I ended up going to UC San Diego.

That's cool. So, how big was the team before it was acquired, I think in 2012 or 2013?

By that point, it was probably 80 or 90 people, with offices in many places around the world. And at that point I remember going to Brazil for ICCV 2007, I think, and some of my students were just so excited to see DigitalPersona sensors in random stores. And there was one point where someone pulled me into the store and said “this guy invented that” and the people at the store looked at me like “yeah, right.” 

There's a similar story about Feynman. Feynman had these diagrams painted on his vehicle. And somebody said “you can't do that, those are Feynman diagrams”, and he answered, “I am Richard Feynman.”

Right. “I'm literally the guy.”

SM: So, when you were an undergrad, these questions about which hardware to use – those kinds of things – did you get help from Pietro and other people, or is it something just trial and error? The hardware part – how did you figure that out?

SB: There were a couple of other professors involved on the hardware side, from Demetri Psaltis's group. He had a postdoc named Fai H. Mok. Fai and Demetri were both experts in holography, and they taught us a lot about optics and FTIR sensing – basically the guts of what would become the DigitalPersona fingerprint sensor. Vance started taking some Fourier optics courses to get up to speed on that. But my role was very focused on the software part of matching fingerprints.

UCSD: Research Interests and Collaborations

SM: Right. Now, after that, you came to UCSD, and then – as I said, you have worked on a very broad range of problems. Could you pick a few of them to explain to our audience – the various things that you have worked on? And also give us a sense of where the field is moving, so that people who are starting out can choose good problems to work on.

Yeah. So, I joined UCSD as an assistant professor in Computer Science in 2001. 

Thank god! There's a personal story but I'm not going into it.

Sure. Now I'm curious. And my whole thesis was focused on shape matching. So, for a few years in grad school, I worked on normalized-cut-based image segmentation together with Jianbo Shi and Thomas Leung, and a few other students. But my thesis itself was on something called shape context and, more generally, this problem of shape matching. During my time in grad school, I was interested in Multiview Geometry, but Jitendra told me not to get tempted, not to fall for it, saying it is beautiful math, but it's done, it's solved. Hartley and Zisserman have proved whatever it is that you need to know; they wrote the book, it's done. And occasionally, he'd see a Multiview Geometry book on my desk and say “oh, what are you doing?”. And so, this made me so interested in Multiview Geometry, but it was completely different from what I was working on, which was really more like machine learning applied to images. So, when I got to UCSD I thought, well, now Jitendra's not here anymore, so if I want to learn Multiview Geometry, then it's just my decision. And so one of the first students that I took on in my group was Sameer Agarwal, who's now at Google – he had been a math undergrad and master's student before this, and he was looking for an advisor and I was looking for students – and we just started brainstorming about what courses I could teach. I thought “well, one way to make sure I learn this topic is to teach a course on Multiview Geometry; it's going to force me to learn it.” And Sameer said, “all right, I'll help.” Then another student, Josh Wills, joined the group and was helpful. Every day we would collectively go through a different section of Hartley and Zisserman or other books on Multiview Geometry and basically figure out the material the day before the lecture, put together homework assignments, and things like that. And it was pretty terrifying because I had to be one step ahead, I had to anticipate questions.
And eventually, I had people like you in the class, and people like Manmohan Chandraker, to whom I might as well have just handed the chalk to give the lecture. But I think I just wanted to learn this thing that was sort of forbidden. And Sameer and Manmohan really picked up on that and have had quite distinguished careers.

And years later Jitendra chuckled at that – that it was something that skipped a generation. But really, I just liked how Computer Vision in principle is a balanced discipline: geometry, photometry, video, still images, human perception. Before the days of Deep Learning, if you looked at a Computer Vision conference, the topics were diverse. And I basically liked all of them. And the conferences were small and nothing worked anyway; it was all just these anecdotal experiments. So, in those days when it was just three of us – Sameer, Josh and me – we nicknamed the group SO(3) (named after the special orthogonal group of 3×3 rotation matrices, whose columns form three orthonormal basis vectors).
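(For readers unfamiliar with the notation: a matrix R in SO(3) is orthogonal – its columns are three orthonormal basis vectors – and has determinant +1. A quick NumPy check of those two defining properties, using an arbitrary illustrative rotation:)

```python
import numpy as np

def rotation_about_z(theta):
    """A rotation by theta radians about the z-axis: an element of SO(3)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c,  -s,  0.0],
                     [s,   c,  0.0],
                     [0.0, 0.0, 1.0]])

R = rotation_about_z(np.pi / 3)

# The two defining properties of SO(3):
assert np.allclose(R.T @ R, np.eye(3))    # orthonormal columns
assert np.isclose(np.linalg.det(R), 1.0)  # determinant +1 (a rotation, not a reflection)
```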

SM: So, were you the three?

SB: Yeah, we were the three. So, it was just three completely different directions, but somehow, we needed a name that would indicate that we were moving forward together.

That was very interesting, because I always thought that when you came to UCSD your interests – if you looked at your bio – lay in shape matching and that kind of thing. So, choosing a group name like SO(3) was very confusing, but now I get the context. The other thing I wanted to mention to our audience is that the shape matching work you mentioned won the Helmholtz Prize, which is given to one of the most influential Computer Vision papers of the preceding ten years. So, it's a big deal. And Sameer Agarwal, whom he mentioned – he wrote the Ceres Solver, which is one of the most popular solvers out there. You should check it out. Very well written, very well documented. And Manmohan Chandraker won the Best Paper Award at ICCV as a student. So, these are really great guys in Computer Vision, and they were all my lab mates, and it was such a pleasure to work with all of them. And when I said “thank god” you came to UCSD: at that time I was not 100% sure I would do my Ph.D., and I took your class. I wanted to work with you, to be your student. So, the whole point of taking your class was to score an A+. That's the only course at UCSD where I scored an A+.

Oh, wow, and you got it?

Yeah, I did. I got it and Sameer Agarwal was the other one, right.

Wow, okay.

And then I go to you and you say that “oh, unfortunately, I don't have funding right now.”

Right, I was broke. I should have just stuck with my core research area and I would have gotten funding, but I didn't.

SM: You had two students at that time, you were starting out. But fortunately, you introduced me to David Kriegman who had funding, and it worked out very well. So, thank you so much for that.

Of course.

It was just a pleasure working with a professor who can code.

SB: Yeah, that's great to hear.

But now your research has moved on. There are a few different threads there. You worked on segmentation afterward, and then the new Deep Learning stuff. There's a lot of work there. So, where's the field moving in general? Of course, Deep Learning is big but if you were a new researcher or a young researcher what are the fields that you would work on now?

Yeah. I think I'm pretty excited about augmented reality. So, Computer Vision with augmented reality applications, Embodied Computer Vision.

SM: What is Embodied Computer Vision?

Think of it as, for example, always-on wearable cameras or agents that have cameras – say, at some point Apple comes out with augmented reality glasses, or, as you know, there's Magic Leap, there's HoloLens. Right now, those technologies are quite bulky and expensive, but at some point, I tell my students, they will get refined. I mean, it's one of those things that's always five years away, which is convenient if you're a Ph.D. advisor. But I think the technology will get there. And then what I tell prospective students is that right now we're still in the throes of dataset AI. So, a lot of what we've done, particularly in my group, over the past 10 years has been putting together and curating datasets, lobbying to get other people in the community to use them, setting up benchmarks, and trying to do the best possible job you can. You publish this dataset, and it sits on a shelf and collects dust and everybody uses it. And it's very helpful in terms of apples-to-apples comparisons, but it does have a tendency to get people to work on corner cases and chase tiny performance gains. Whereas if you look at this from an embodied perspective – for example, in simulation environments like Facebook's Habitat, or the Gibson environment, or AI2-THOR, and things like that – if you're able to move around, some of those corner cases don't matter anymore.

SM: Interesting.

In other words, take a mosquito moving around: it doesn't have that many neurons to begin with, but if it just moves a little bit, it might get a better view, in which case a very low-capacity model would be able to recognize something more sophisticated. So, I mean, this isn't new at all – people have talked about active vision for decades. I don't know when, but I think the dataset AI era will come to a close, and people will move toward rich simulation environments. So, instead of publishing datasets and code with your papers, you would publish a simulation environment – maybe a hybrid of real and simulated data – along with code. And in theory, you could pull out some multi-terabyte dataset snapshot from that simulation environment. But I think it will be a kind of generational change, where some group of new students comes in and looks down on the students who insist on downloading those terabytes to the box under their desk to do these experiments. It's just a new perspective: the data is always coming.

SM: Right, right.

And I think that's what will happen in the Computer Vision for augmented reality context, but it's very hard to let go of that era that started with MNIST digits and progressed through ImageNet and so on. It's all just these big versions of ImageNet, but that's the dataset AI paradigm, and I think students should be very wary of digging too deep into that, for fear that they will just get lost in the shuffle.

That's great. So, you're saying that there are projects like CARLA which are doing car simulation – they are creating virtual environments for cars – and that's where things will move. You will still need real data to augment it and things like that. But most of the work would be done in this virtual, synthetic dataset setting.

SB: Yeah, I don't know if I can say “most”. But just like you were saying I think that virtual environments like CARLA will provide the substrate for infinitely replenishable simulation experiences, and you can put real data into that. What the ideal mix is between synthetic and real I don't know. I think it's just going to be some hybrid.

But the funny part is that you have made huge contributions even in the dataset generation, right?

Yeah, guilty as charged.

COCO Dataset, ImageNet, and Visipedia: A Brief Overview

SM: So, you were one of the people who started the MS COCO dataset and all the challenges associated with it. Was the goal to build on top of ImageNet or go through the same line of reasoning as ImageNet and build something better? 

A lot of what I did in my career at that point was based on not liking ImageNet, which is kind of a funny way to proceed.

Because 40% of it is dogs or something like that?

SB: I don't know. It just seemed like the main appeal of ImageNet was that it was big. And I'm not going to use the word, but there's an abbreviation, BDD. It's Big-something-Data, and I won't say the middle word. But the main thing was it was big. And there were a lot of people doing those kinds of projects at the time: write a script, collect as many images as you can, get some data, label it, and release it, put it out there. It's big, you know. And I got it. I understood big data was happening. But I looked through it, and it was just so messy and haphazard, and the question was: who were the stakeholders? Who was supposed to be using this thing? But it was sort of this unstoppable train, and I remember Fei-Fei handing out pens at ICCV 2009 that had ImageNet on them. Fei-Fei and I overlapped at Caltech. I don't remember exactly when… I think we overlapped when I did my sabbatical. But in any case, we're academic siblings. And I remember teasing her about these pens but also thinking “wow, this thing… this is unstoppable.” I mean, everyone I saw was using an ImageNet pen, and this was even before Deep Learning happened.


But I went back to the drawing board and asked “why don't I like ImageNet? What is it that's bothering me about it?” With respect to COCO, I had been talking to Piotr Dollar and Larry Zitnick, who were at Microsoft at the time, and one of the things we agreed we didn't like in ImageNet was that the images were too iconic: they had this strong compositional bias. For example, an image in COCO that has a dog more often just happens to have a dog – the dog is photobombing, more or less – as opposed to a canonical Instagram-type image of a dog, where the average position would be in the center. So, that shaped a lot of our thinking; the COCO consortium came together, with lots of universities contributing ideas, and we wanted to have at least two objects in each image and to make it as natural as possible. And of course, no matter what dataset you build, there's bias in it. In that respect, COCO fixes some things and not others. It's a never-ending process. But it was born from wanting to correct some things that we didn't like about ImageNet.

SM: So, making the images more natural, instead of making them almost like Instagram pictures, where the dominant thing you're looking for is right in front rather than in a natural setting – that was the main idea. And the scale was actually the big part you did keep – the scale of MS COCO is huge.

Yeah. I mean the person that made COCO happen was Tsung-Yi or T.Y. Lin. So, he was a Ph.D. student at the time and Piotr just took a big gamble on him because in his first year he was willing to take him on as a summer intern. And at the time I think that was uncommon at Microsoft research to do that. But T.Y. was so motivated and we nicknamed him “the great master with determination” (it was a play on words for what his Chinese name means). So, he was a great master with determination or the GMD and these COCO consortium meetings felt like a dozen professors and Tsung-Yi. And basically, he always said yes to everything we wanted. T.Y. just carried this enormous burden and then there was this big movement of researchers from Microsoft research to what became FAIR – Facebook AI Research. And they brought in T.Y. again and Facebook really helped a lot with advancing the cause of COCO.

SM: That's very interesting! Could you also tell us a little bit about Visipedia, the project that you started, again with Pietro Perona? What is it, and what is its state right now?

Yeah. So, Visipedia is a project that Pietro and I started during my first sabbatical, around 2008. I had just gotten tenure at UC San Diego. I had basically wanted to be a professor since I was a little kid, so clearly by the time I met Pietro he knew that I wanted to be a professor. So, when it finally happened he said “someday, when you get tenure, you should do a sabbatical at Caltech.” But both Pietro and Jitendra said, “you shouldn't work with your former advisors before that; you have to do your own thing, and then once you get tenure, not a problem, you can work with us again.” And I held to that, except for maybe one paper. So, I got the sabbatical, I went to Caltech, and just by some bad timing and poor coordination, it turned out Pietro had also scheduled a sabbatical, in Italy. So, we barely overlapped, but we overlapped enough. And I already knew lots of students in the lab – I can't remember when I overlapped with Fei-Fei, but certainly there was a period when we were both there. So, I got to his lab, we had maybe a month of overlap, and then we formulated this idea of fine-grained visual categorization, or subordinate categorization. As I was saying before, I had these problems with how ImageNet was coming together: it didn't seem to have a user. Who were the stakeholders? I believe in this principle of “nothing about us without us,” so I wanted to know who the “us” in ImageNet was. It's certainly cool to take a dataset like that, train a model, and show that you can recognize a bicycle; but if you show that to a non-technical person, they're really not impressed: check out this demo – you take your phone, it takes like a minute, you point it at the bike, and at the end it says “bicycle.” Like, I knew that already. Tell me something I don't know. So, I was really interested in this idea of fine-grained or subordinate categorization.
And human computation was becoming big at the time – this idea of crowdsourcing and Mechanical Turk. By coincidence, Pietro was really interested in the same thing, so we just brainstormed a bunch about what topics would be good to kick off this Visipedia project. And we looked at things like making a model of cars, airplanes, coins, stamps, sides of beef… I mean, strange categories of things. But what was common among all of them is that there were these communities of knowledge behind them – human communities who would actually care, or have strong opinions, about the curation of the dataset and the performance of models trained on it. And it became addictive because it was so fun to meet with people and see the passion that they have for these topics. It took work to fight that Computer Science instinct of shutting out the world, sitting at a laptop, writing a script. Someone says “all right, you want a bird recognizer; fine, I write a script, I get the birds, I get the species names, I train it – done.” And then you produce this mutant Frankenstein weird thing, and you show it to that cousin of yours who loves birding, and they look at it having no idea what it is, asking “did an alien train this thing?” So, I just thought how fun it would be to work on Computer Vision and Machine Learning but also have these human communities of knowledge involved. And so, we ended up settling on birds, for a variety of reasons. Partly because of having connections to the Lab of Ornithology at Cornell, but also because the birding community is just enormously enthusiastic. So, it was just really fun to work with them.

A Professor and A Musician: How to Double-Job 

SM: That's great. And is it the same sabbatical where you guys started a band and created a music album?

SB: Yeah! That's the other SO3, without parentheses. So, yeah. It's a little-known fact that I pursued a musical career at the same time. My friend Mike and I had decided to start a band a few years before that. And it's sort of parallel to how Pietro said “someday, when you become a professor, if you get tenure, you should come visit my lab.” So, Mike, the guitarist in my band, who was working as a software engineer at the time, said “if you get a sabbatical, why don't we do a sabbatical too: record an album, go to L.A., learn recording engineering, get to know people in studios, write songs, and do it?” And we did. We moved to L.A.

SM: Yes, I know.

SB: And being at Caltech was like the day job, and they have a really nice jam room and a recording studio. And like 20 years before, I had been an undergrad there, and my music teacher, Bill Bing, was still there. And he remembered me. That's how small Caltech is. And he gave me a key to the jam room and let me use the studio. And we played some of our recordings for him; he's polite, and he's an extremely well-trained musician. We were doing sloppy rock music, but he was happy to see an alum come back. But yeah, that was a really interesting period of kind of trying to make it in music and also work on Visipedia at the same time.

SM: So, I remember you complaining because you had these two personas, right? You're the professor who's well respected, and then you're the musician who is not well respected.

SB: Who gets no respect.

SM: And you were… I remember over lunch or something you were complaining about these two personalities you have. One which has so much respect and the other which is not getting enough respect.

SB: Right. No, it gave me enormous appreciation for the struggle that artists face every day. Their work is so undervalued. We would practice for hours and hours every week, and then get a gig at some dive bar in Pacific Beach. And there were three of us in the band at the time, and we'd play for like three hours and get a couple of drink tickets. And maybe at the end of the night, the bar owner hands out 120 bucks to the three of us. And because I was wearing that different hat of being in a band, it was like “wow, that's a lot of money!” And then we'd go to Denny's and basically blow half of it. But yeah, it's just extremely hard to make a living in that world, but it was also some of the best years of my life. Just a really different experience.

SM: Yeah. I think I bought the music album when it came out.

SB: Yeah, in fact, I have one right here.

The Opportunities of LDV Vision Summit

SM: Yes, that’s funny. So, now you are also very closely associated with the Entrepreneurship Community at Cornell. Can you tell us a little bit about LDV Vision Summit and what are the competitions there for entrepreneurs? What is it all about?

SB: Yeah. So, first a little bit about how I came to join Cornell Tech. I was quite happy at UC San Diego. There was nothing wrong with it; nothing drove me away. It was a fantastic Computer Science Department because of students like you and colleagues like David Kriegman. I was visiting a friend in New York, and Alyosha Efros, my former office mate when I was at Berkeley (Alyosha was also a Ph.D. student in Jitendra's group), told me that he had bumped into Dan Huttenlocher during a previous trip to New York City. Dan told him they were creating this new campus for Cornell University in Manhattan that would come to be called Cornell Tech – at the time it was just a proposal – and Dan actually mentioned that they were looking for people with a Serge Belongie type of combination of academics and entrepreneurship. In my case, I was in San Diego with a startup company in Silicon Valley, and I subsequently had several students who spun off companies. And so, I was a frequent visitor to Silicon Valley, but I liked to keep my distance from it because there were aspects of Silicon Valley that I didn't like at the time. And that aspect has become gargantuan by now, but that's another topic. So, Dan Huttenlocher mentioned this to Alyosha, like “I want this Serge Belongie type profile,” and Alyosha said “well, why don't you talk to Serge Belongie?” And Dan said “oh, there's no way he would leave California; I mean he's in San Diego, like, why would he come to New York?” But Alyosha mentioned it to me, and I think I had another trip to New York City not that long after that, and I just met Dan for a beer in Manhattan somewhere.
And he told me “this is what we're planning to do, we want to create this Cornell Tech campus, it would be graduate-only, focused on external engagement, and deep tech, and fundamental research, and if we get it” – it was an extensive bidding process, so it was far from certain that Cornell would get this – but he said “if we get it, why don't you come out for six months and help us get it started?” And I thought “wow.” I mean, it would be a risk for my band, but by that point it was clear that we weren't going to break through. And why not get a chance to live in New York City? So, I did. And it turned out I just loved what Cornell Tech was becoming. It was the so-called beta semester, the first half of 2013. And it was just a handful of us in a temporary space in the Google Building in Chelsea, just kind of dreaming up what it means to put entrepreneurism and academia together. And then I met my future wife during that trip as well, and it just became clear that it was time to move. And that's what brought me to the East Coast. Also during that preliminary visit – it was a six-month visiting faculty position – that was when I started giving talks at local meet-ups. I don't even know if meet-ups are still a thing but…

SM: Yeah, it was still a thing before Covid.

SB: So, I did a meet-up talk on Visipedia talking about bird recognition. And there was someone that attended it named Karen Moon who became the founder of Trendalytics which was applying Computer Vision to fashion. And she came up after the talk and she said “you need to meet Evan Nisselson”, like, “here's his number, here's his email, call this guy, you guys should talk.” And so, I talked to him. And Evan is the one who started LDV Capital which was just getting started at the time – so this is around 2013, 2014. 

SM: And LDV is a VC firm, right?

SB: Yeah. It's a venture capital firm focused on Visual Technology. And so, Evan had a series of networking dinners and they were bringing together people who did research or worked in Technology involving Computer Vision, as well as business and product-type people. And a typical dinner would have 50 people.

SM: Amazing!

SB: So, he would rent out a space and he'd subsidize it with contributions from partners of LDV. And it was always 50% women / 50% men, and really balanced between the business-and-product part and the technology part. So, I attended one of those dinners, and then we had another coffee chat and he said “I want to do a summit, I want to capture the spirit of what's happening at these dinners.” And I mean this seems so quaint in retrospect, but at the time there wasn't that much going on in Computer Vision, and Machine Learning, and Venture Capital in New York City. Obviously in California that was already happening, but Evan said “I want to do a summit with fireside chats and keynotes and entrepreneurial competitions and stuff; let's do this.” And I said “that's great” because that's actually part of my job description at Cornell Tech, this external engagement. So, we actually put together the first LDV Vision Summit, and I begged everyone I knew – I knew Rob Fergus and Yann LeCun and a few other people at NYU – I just begged everyone I could to participate in it in some way. And maybe 300 people were there or something. And then Evan and I were just completely worn out after that was done. And then one day passed and we said we should do this again. And then we did the second one, and we've been doing it every year since, until Covid hit.

SM: How big is the Conference now? How many people usually visit the Conference?

SB: That's a good question. I think it's something like 800.

SM: Okay, quite big.

SB: And it's in its steady state. It's a two-day Summit. One day focused on business and product and the other on Deep Tech. 

SM: And there are some challenges also, right? Are there competitions for entrepreneurs? What's the structure of it? Like, do they have to apply?

SB: Yeah, we call these the Entrepreneurial Computer Vision Challenges. So, a typical LDV Vision Summit has a general startup challenge, which you'll see at a lot of VC firms that hold that type of general startup pitch competition. But what Evan and I created that, I think, was unique was something like the intersection of a Computer Vision conference with a VC roundtable meeting. And so, the idea of the ECVC, or the Entrepreneurial Computer Vision Challenge, was that Ph.D. students or faculty who have an interesting idea – published or maybe not even published – and some vague sense that it could become the nucleus of a company could take part. They don't even need to have a business plan. They just need some kind of aspiration that the technology could make it into the real world, with applications in, let's say, Computer Vision for radiology, for diagnosing fractures, or something applied to remote sensing. And so, it was really nice because we could build bridges to the academic community and say “don't worry about having a big pitch, just bring yourself, bring your idea, and explain how it could be useful.” And that was really fun because I felt like I was really in my element, and Evan was in his element with the other part. And it was an excuse to invite some pretty illustrious judges to serve on a panel.

SM: So, how many companies usually participate in these competitions? And do they get exposure to VCs? Do they get funded after these?

SB: Yeah. I mean Evan has a whole workflow for how the nominees are processed. You might see something like 30 to 40 people on a shortlist – I don't know how many it starts out with, because at this point Evan actually has a team that can vet some of the early stages of the proposals – but you'll have something like 30 or 40, and then Evan and myself and volunteer students do some kind of vetting. Like T.Y., whom I mentioned before from the COCO project: even though he was a mainstream Ph.D. student, he still saw this interesting opportunity to connect with the external community, so he volunteered as a judge. And Oscar Beijbom volunteered as a judge, and so on. And then some subset of those would actually come into a law firm that works closely with LDV Capital, do a live pitch, and get lots of coaching, mainly from Evan but also from me on the technical side. And then we just do a bunch of voting and back and forth. And eventually three or four teams would compete in the ECVC startup competition – maybe it's more than that, but that's the basic flow.

SM: Interesting. So, obviously, if you're listening to this podcast and interested in some of these competitions – whether you're a new startup that wants to try out new ideas, or not even a startup, just someone with an idea – you should go check out LDV Vision Summit. All the information is out there. I have already exceeded the time that you had given me. So, there are a few other things that you have done: Anchovi Labs, which got acquired by Dropbox, and now you're also involved with a company called Headroom, which is a Zoom competitor and raised $5 million. So, that's a lot of startup experience for somebody who says that their main goal is to be a professor, right?

SB: Right.

Hints and Tips For Aspiring AI Start-ups

SM: So, I would like to wrap it up by asking you what advice would you have for people who are wanting to start AI startups right now. It's obviously very hot but what are the areas that they could work on? Or what is the way to go about it?

SB: Yeah, it's hard to answer that question without thinking about the pandemic. I mean there's so much happening right now that's influenced by Covid: everybody having to work from home, people being reluctant to go to routine doctor appointments, elderly or immuno-compromised people hesitant to go outside. And it's just been influencing so much of my thinking about where people should direct their creative energy. And maybe that's wrong-headed; maybe the vaccine's going to roll out and everything just goes back to normal, it's possible. But I think that there's a huge learning opportunity right now, where everybody's home most of the time working on Zoom. And we should not miss this opportunity to make note of what's working, what could be done better, and what the pain points are. In the case of Headroom… I think all of us are using Zoom or Microsoft Teams or Google Meet all day. I mean we're sitting in front of our computers having meetings all the time. And my kids, my poor kids… they're age three and five and they're sitting in front of Zoom, trying to have some kind of preschool and kindergarten experience. So, these are really hard problems. They're combinations of technology and social science and humanities, and they require a deep understanding of human nature. So, I think first we have to realize that there are no simple technology solutions to most of these problems. These are fundamentally human problems that require extensive engagement with the stakeholders at every stage: the teachers, the essential workers, the doctors trying to do telemedicine, and so on. So, even though I know that I'm perhaps overly influenced by the Covid paradigm, or the Covid era that we're in, I think it is an opportunity not to be missed.
I mean, think about what could be done better: what are the things that can connect people more and prevent feelings of isolation and loneliness, or improve the ability for doctors to diagnose patients? For someone undergoing stroke rehabilitation, what technology could they have in their home that reduces the need for them to go into the clinic? So, I think it's a rare opportunity for us to have this really focused attention on the work-from-home paradigm and come up with some really neat ideas to reduce suffering and improve human connections.

SM: Right, right. I'm also very curious because you're a professor. Now a lot of things have moved online, and I have this idea about online courses. Right now, when universities give online courses, it's almost like recording a lecture. The professor is there, but it has all the bad features of online and none of the good features of in-person.

SB: Absolutely.

SM: So, one of the things that I've been thinking about – and I have not really done anything about it – is the future of online courses. It's going to be almost like movie production, right? Because the crowd is so big, there will be millions of dollars spent, and it would be a real production, as if you're creating a movie or a game; and that would be one course. Do you have any thoughts about that? Is your university also thinking along those lines, to fundamentally rethink the course experience, where it's not just a professor lecturing but more like a video game or a movie, where there is a whole production, animations, and things like that?

SB: So, I think you're right for courses like Calculus or Introduction to Organic Chemistry; these are the courses that probably first appeared on the massive open online course platforms. Certain people are just brilliant at teaching Calculus, and it's unlikely that any given community college is going to have someone who comes close to that person's ability. And so, it does make sense to make a high-production, movie-type experience, as you described, for something like Multivariable Calculus. But as we were discussing with this Covid-era technology inspiration, students still crave that human connection. Somehow it seems okay to me to have that sort of blockbuster vibe around a big class like Calculus, because chances are a class like Calculus is in an auditorium anyway, right? So, there isn't a whole lot of back and forth with the instructor.

SM: Right.

SB: What I think is much more likely for upper-division courses is a different model. So, I think you're right: the Zoomification of an already dreary lecture-hall experience is not making anybody happy. But where we can learn a thing or two is from Twitch streamers and YouTubers. I'm learning from my five-year-old son, who watches live streaming of Minecraft. And yes, to me it's overstimulating: there's the YouTuber with their little face in the corner, and they're just sort of hyper all the time, and every time they make a point their face gets a little bigger, and they have the green screen and the special microphone. I wouldn't say it's high production, but it really keeps your attention, and it comes in sort of bite-sized chunks. I'm not advocating Twitch streaming for Machine Learning or something, but I do think that the new generation will be using things like OBS (Open Broadcaster Software). Instructors like Harold Haroldson, who's the Director of the Extended Reality Collaboratory at Cornell Tech, and Sasha Rush, a professor at Cornell Tech working on NLP – they're both using OBS in their courses. And so, the students are getting something richer than Zoom, and it feels engaging. I think once that starts to happen, the previous generation of professors are going to realize “I need to do this or else I'm going to lose my students.” The pioneers like Harold and Sasha, who in turn were learning from the guy who did the Coding Train out of NYU, show that you make your rig at home with the proper lighting and the green screen and the desk setup; you invest some time in that, but then you can compare notes with other people and figure out the right way to do it and make an engaging experience. So, I think you're right that the high-production blockbuster thing can be done for those big auditorium-style courses, and then I think we're going to see a lot more of that live-streaming, narration-type thing for the smaller courses.

SM: Yeah, my kids also follow these YouTube celebrities. They are very good. There is one called Backyard Scientist; they blow things up, and they keep it interesting, but kids also learn a lot about stuff in general. So, I do agree that's a very important channel. One last question. Where can people reach you if they have any questions about, maybe, LDV Vision Summit or anything else? Where can you be reached?

SB: Yeah. So, my group at Cornell Tech is called SE3. The lab at UC San Diego was SO(3), and then when that group translated to the East Coast it became SE(3) (the Special Euclidean Group). So, if you just google SE(3) Cornell you'll find my group's website, and you can see how to contact me. LDV has its own website as well, and you can look at all the team members there. Right now, due to the pandemic, there's unfortunately no Summit coming together, but there are still quite a few resources there. And of course, I'm always happy to make referrals for people in Computer Vision who kind of want to test the waters of entrepreneurship.

SM: Thank you. It was such a pleasure to talk to you, Serge. You have been hugely influential in my life. Maybe I've never told you in person but yes, that's true.

SB: No, I didn't know.

SM: And thank you so much for being on the show. I really appreciate it. Thanks for your time.

SB: Great. Thanks a lot.
