E112: Adam Coates, Director of Baidu Silicon Valley AI Lab – Interview

May 24, 2017

https://www.linkedin.com/in/apcoates/

I was lucky enough to interview Adam Coates. Adam is the director of Baidu’s Silicon Valley AI Lab. Their lab’s first big focus is Deep Speech, an end-to-end deep learning system for speech recognition.

Here’s a voice first texting/typing app created by Adam and his team. It’s great.

TalkType: https://play.google.com/store/apps/details?id=com.baidu.research.talktype&hl=en

Of course Baidu is based in China and is one of the largest Internet companies in the world.

Adam has quite an AI background. He spent the years 2000-2012 at Stanford where he ended up with his bachelor's, master's and PhD. His research has been around computer vision for autonomous cars, deep learning for speech recognition, large scale deep learning, computer vision for robotics and machine learning for helicopter aerobatics, which we get to hear about.

In 2015, Adam was named one of MIT's 35 Innovators Under 35.

Here are some other things we talk about:

-How can you train a helicopter to fly autonomously? What are the challenges?
-What kind of hardware is best for machine learning?
-What is one of the biggest problems that needs to be solved to accelerate adoption of AI?
-How should companies start thinking about AI when incorporating it into their products?
-How do you train speech recognition models without labeled data?
 
 

Transcript

Dave Kruse: Hey everyone. Welcome to another episode of Flyover Labs and today we get to talk to Adam Coates. Adam is the Director of Baidu's Silicon Valley AI Lab, and of course Baidu, based in China, is one of the largest internet companies in the world. As you can guess, Adam has quite a background in artificial intelligence. He spent the years 2000 to 2012 at Stanford where he ended up with his Bachelor's, Master's and Ph.D., and his research has been around computer vision for autonomous cars, deep learning for speech, large scale deep learning in general, computer vision for robotics and machine learning for helicopter aerobatics, which we're probably going to have to hear about. Yeah, and in 2015 Adam was named as one of MIT's 35 Innovators Under 35. So I'm really glad that Adam came on the show. I'm curious to see what he's up to now and what he is excited about. So Adam…

Adam Coates: Thanks for having me Dave.

Dave Kruse: Yeah, definitely. I really appreciate you taking the time to chat. And so, yeah, as I mentioned, it will be fun to hear a little bit about your background. I'm kind of curious; you spent a lot of time at Stanford. Like you must have been the Stanford master by the time you left. But I was curious what kind of projects you worked on, and yeah.

Adam Coates: Well, when I first started school I kind of intended to get my bachelor's degree and then go off and start working. But the thing that happened is that it started with remote controlled helicopters. I had learned to fly them in high school just as a hobby, and one of the professors at Stanford convinced me to take an introductory artificial intelligence course, and it so happened that the professor teaching that course had a project that involved getting helicopters to fly using machine learning algorithms. So even though – I actually had a really crazy quarter and ended up dropping that class, but oddly enough, before I got out of the class I got in touch with that professor, who it turned out was Andrew Ng, as many people know now, and started working on building a helicopter that was later flown by machine learning algorithms. Because of that project I actually stuck around at Stanford to do my master's degree and later to start the Ph.D. program there, and so that was really one of the first projects that got me involved in artificial intelligence.

Dave Kruse: And what year was that, that you started on that project?

Adam Coates: I probably started hacking on it in about 2003, 2004 and then I think most of the final results, where we really decided that there was nothing left for us to do was maybe in 2008.

Dave Kruse: Okay, all right. Wow! And whose idea was it to kind of direct the helicopter to do a certain task or certain location or what was the goal there?

Adam Coates: So one of the things which is really cool about the project is that, you know, as a computer programmer you sort of get into this mindset that you can build virtually anything, as long as you can buy enough computers, because growing up I loved building things, and computers gave you a way to sort of get outside of the physical world and build pretty much anything you could imagine. You were just limited by sort of your own mind and how much time you had to code it. And when it came to things like helicopters, though, it turns out that I as a human being could learn to hover a helicopter or, you know, do simple maneuvers, and you could even find expert pilots who could fly crazy aerobatics, things that didn't look aerodynamically possible, but somehow they had figured out, by trial and error and by being taught by other experts, how to fly these incredible routines. Unfortunately, if you just try to build a helicopter that can do this under computer control, it is not enough to be a great programmer. There is just no way for me to sit down, even today, and write all of the instructions for how it is that you fly a loop or do a flip in the air, because the aerodynamics of a helicopter are just so complex that no human being, no expert in aerodynamics even, can write down the equations for how that thing behaves. So the idea behind this project was that we were going to take machine learning algorithms and artificial intelligence methods and see if we could write a computer program that would learn to fly this helicopter based on watching humans fly and based on looking at data from human flight, and that turned out to be really successful. Even though I still cannot, to this day, write down all the instructions for how you do a flip, I can write a piece of software that learns to perform a flip on its own for a very sophisticated aircraft, and this was one of the extremely thrilling events early in my career that got me excited about AI, because it's sort of like computer programming taken to the next level: that thing that you couldn't even describe how to solve on your own, you could still build a computer program that would learn to solve it all by itself.

Dave Kruse: Got you. And we don't have to talk about the helicopter the whole time, but one last question, and for the audience: the way that the helicopter or the machine learns, essentially you would perform the loop or the flip and then the machine would understand kind of like the – well, I don't know all the little nuances of helicopter control, but it would understand the controls and exactly what type of inputs are necessary.

Adam Coates: Yes. Well, it turns out that the aerodynamics of how a helicopter behaves is just horribly complicated. There are all kinds of nasty things that can happen to you. For example, if a helicopter is just hovering in place, it churns up a lot of air around the rotor blades and it sort of gets this tube of air going that's being pulled down through the rotor. And if you descend straight down at just the right speed, you can have really awful chaotic effects happening that you really wouldn't expect unless somebody explained it to you. And describing what happens in all of these cases is just too complex, and so what we did was we had a human fly the helicopter around and fly through all of these different unusual states, many of which would be quite dangerous to do in a real helicopter, and we collected a bunch of data from that and then we let the computer look at all of the data to understand how the helicopter behaves: if we controlled it this way in this situation, this is what is going to happen next. And very often the computer actually realizes that if I move the controls this way in a certain situation, there could actually be a wide range of outcomes. One of the challenges after that is, given how the helicopter behaves, which you've learned roughly from the data, you also have to be able to learn from experience, because if I as a person tell you to fly a perfect loop, it turns out that it's almost impossible, just because of momentum and air flow and all of these different variables. It's almost impossible to fly it perfectly, and if you ask the computer to fly this loop perfectly, it often makes a weird mistake, because you are asking it for something that is literally impossible. But if you have a human demonstrate it, the human flies several imperfect loops. It turns out that the computer is capable of looking at all of the things that the human has done, taking out the parts that are extraneous, all the mistakes that the human made, and keeping the parts that are really consistent, and so you can actually watch a human fly the same routine several times and figure out what the routine was that they intended to do. You can see, for instance, that a loop is almost a perfect circle but involves speeding up and slowing down at different parts, and these are the things that as a human would be very hard to code yourself, but the machine learning algorithm is able to figure out by watching an expert fly. When you put all these things together, it turns out you can go fly these amazing aerobatic routines, just by watching an expert do it a few times.
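
To make the first half of that concrete, here is a minimal sketch of the "learn how the helicopter behaves from data" step, assuming logged state and control data and a simple linear dynamics model fit by least squares. The arrays, shapes and random stand-in data are hypothetical; the actual Stanford work used far richer models, and this is only meant to illustrate the idea of fitting dynamics from demonstration logs.

```python
import numpy as np

# Hypothetical flight log: states x_t (position, velocity, attitude, ...) and
# pilot controls u_t recorded at each timestep while a human flies around.
T, state_dim, ctrl_dim = 5000, 12, 4
X = np.random.randn(T, state_dim)        # stand-in for real logged states
U = np.random.randn(T, ctrl_dim)         # stand-in for real logged controls

# Fit a simple linear dynamics model  x_{t+1} ~ A x_t + B u_t  by least squares.
inputs = np.hstack([X[:-1], U[:-1]])     # (T-1, state_dim + ctrl_dim)
targets = X[1:]                          # (T-1, state_dim)
W, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
A, B = W[:state_dim].T, W[state_dim:].T

# The learned model predicts what happens next if we apply a given control,
# which is what a trajectory-following controller is then built on top of.
x_next_pred = A @ X[0] + B @ U[0]
```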

Dave Kruse: Interesting. Oh man! This is good. That could be a whole podcast, so we'll stop there, but we'll keep moving on. There are so many other questions and you are very good at explaining how it all works. So I have more questions on Stanford, but maybe if we have time we'll come back to it. You did mention – I am always interested in how the people I interview grew up. You know, you said you liked to build stuff. What type of stuff did you like to build growing up?

Adam Coates: Oh! I mean I think I was a pretty normal kid. I grew up in a small town, about 4,000 people. It's actually a little tourist town in California, and you know there's lots of stuff to do outside, but I always grew up with a love of building things. My father was a contractor, so I kind of started out with hammering nails and stuff like that, but as a little kid you can't build a whole house. So I was always kind of tinkering with small things, and when we first got a computer, that's when I really realized, oh wow, here's this thing that lets me build things much bigger than I could possibly build myself in the real world. And so as much as I loved Legos and Erector sets and all these sorts of fun toys for learning to build things, the thing that really got me about computers was their ability to build things as big as your imagination.

Dave Kruse: Interesting, okay. All right, so from 2000 to about 2012 – well, you spent some time in Indiana too. But I was curious, you know, how was it leaving Stanford? Like did you enjoy your time there for the most part, or did you ever think that Stanford…

Adam Coates: Yes, I definitely had this sensation when I left Stanford that it was the end of a very long era, but I think, you know, I was actually really very happy moving on. When I first moved out to Indiana, I spent some time at Indiana University Bloomington and really just fell in love with Bloomington and all the faculty at IU. So I met Geoffrey Fox, who is actually a senior faculty member there, and he introduced me to a lot of the background in supercomputing, and I think this was very much inspirational for a lot of the work that I've been doing today, where we are figuring out how to run neural networks much faster on much more powerful machines, which is powering a lot of progress in AI. So I think one of the exciting things about moving to Indiana wasn't just that I think it's actually a nice place to live outside of Silicon Valley, but also that there is a ton of knowledge and expertise in all of these universities that have large supercomputing facilities, which for many years was useful for things like physics or chemistry, but suddenly you're seeing a ton of adoption for AI and machine learning technology. So I feel like it was one of those fateful instances where I was in the right place at the right time to pick up a lot of important skills that are being very helpful now.

Dave Kruse: Interesting. And can you – do you remember – can you share one of those skills or kind of learnings where you were like, oh, this is great; it's really helpful?

Adam Coates: Yeah. So it turns out that for deep learning algorithms where we are training extremely large neural networks, part of the challenge is just computational; the neural networks are getting so big and the amount of data we need to train them on is so large that most of the barrier we face to having a much higher performing system is actually, can we train this model in a reasonable amount of time? If it takes you six months to train a model, it turns out that most research can't make progress, because if I have to wait six months to get feedback, then I can't make progress in coming up with new ideas, because it takes too long to find out whether one idea works or not. So this sort of research process is very iterative, and even when you can train a model, it really helps to get feedback as quickly as possible. It also happens that most of the operations that we run in these learning algorithms look like linear algebra. It's just big matrix multiplies, and that involves a lot of computation, and as I came to IU I was thinking a lot about, gosh, how do we solve all of these really challenging problems of doing very large scale matrix multiplication; and it turned out that if you had been working on supercomputers all these years, you knew exactly how to solve that problem. One of my favorite things that Geoff told me was, yeah, all these problems that you're talking about, we solved them 40 years ago, but nobody cared back then, and it really just hit me that all this knowledge about how to handle these large dense computations had been waiting for a killer app, and I think AI is that application. I picked up a lot of knowledge about how to solve dense linear algebra problems and a lot of intuition about the scale of things that are possible on a large computer, and that has really been helpful since I've come to Baidu in thinking about some of the things that should be possible now with the computing power we have.
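
A quick sketch of why "it's just big matrix multiplies": the forward pass of one fully connected neural network layer over a batch of inputs is a single dense matrix multiplication, so almost all of the training time goes into operations like the one below. The sizes here are made up for illustration, not taken from any system discussed in the interview.

```python
import time
import numpy as np

batch, in_dim, out_dim = 512, 4096, 4096
x = np.random.randn(batch, in_dim).astype(np.float32)    # a batch of inputs
W = np.random.randn(in_dim, out_dim).astype(np.float32)  # one layer's weights

start = time.time()
y = x @ W                 # one dense layer's forward pass: a matrix multiply
elapsed = time.time() - start

# ~2 * batch * in_dim * out_dim floating point operations in that one call;
# a full network repeats this for every layer, every batch, millions of times,
# which is why dense linear algebra performance dominates training time.
flops = 2 * batch * in_dim * out_dim
print(f"{flops / 1e9:.1f} GFLOPs in {elapsed * 1e3:.1f} ms")
```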

Dave Kruse: Got you, okay. And so some of these methods – they just allow you to iterate a lot faster. Do they cut down on the training time a lot? I would be curious, like if you weren't doing these methods, how long would a model take you versus what you are doing now? How much time might it save you?

Adam Coates: Well, let me give you an example. One of the first things we tried to do a couple of years back – many people will remember there was this result from the Google Brain team, where they had an unsupervised learning algorithm that just looks at a whole bunch of data and tries to find interesting patterns. It doesn't have an objective of any kind, and this neural network had about a billion connections, with a billion free parameters to tune, and they ran this experiment on 1,000 machines with 15,000 CPU cores for, I think, at least a week or probably several weeks. And so when we first saw this result, I remember talking to friends like, well great, what are the rest of us going to do? None of us have 1,000 machines with 15,000 cores that we can be running for weeks on end. But when I started getting involved in more supercomputing applications, I started to realize that deep learning isn't that well suited to these giant pieces of cloud infrastructure, and that in fact when you try to run things like this on the existing cloud infrastructure that big internet companies have built, they are actually terribly inefficient. And so what we did was we reimplemented that experiment on what is basically a small supercomputer. It was actually just three machines with a bunch of GPUs, so that was actually 12 GPUs, which in hindsight was quite small, but for the time was a large experiment, and we could reproduce that result in a few days on just three machines instead of 1,000 machines. Now if I look at the work we do at Baidu, that has really turned into something highly effective, in that today we can train neural networks for speech recognition that have, say, 10 billion connections, and we can train them on 40 or more GPUs and do it vastly more efficiently than we could on a single machine or a piece of cloud infrastructure like before.

Dave Kruse: And why is that the case? Is it so much faster on a supercomputer versus more distributed computing?

Adam Coates: Most distributed systems are built to handle throughput, where let's say I want to do a search over an enormous dataset and I want to handle millions of users all at once. These parallelize really well, because I can have all of my machines searching their own data in parallel, or I can have lots of machines handling different users in parallel, and that scales really nicely, and those machines don't have to talk to each other very much. They just kind of have to coordinate, so that if each machine does some work they can send their response back to the user and it can all be aggregated at the end in one very small step. The problem that deep learning has is that most of the algorithms not only involve a large amount of computation, but every machine has to communicate very frequently with all the other machines and has to move a lot of data back and forth in order to make progress. So one of the things that you could often see is, if you ran your favorite deep learning algorithm on your home computer network with gigabit ethernet, let's say, most of the time would be spent moving data back and forth over the network and you actually wouldn't get that much computation done. And once you start packing a large number of GPUs into a machine, you have this huge amount of computing power, but it is incredibly hard to use it all at once, because the machines are going to sit idle waiting for the other machines to catch up. And so when you start building something that looks more like a supercomputer, basically you are targeting that problem. You are saying, I am going to build, in effect, a single machine that is very tightly connected by an extremely fast network with lots of bandwidth and very low latency communication, and all of these machines are going to run together in lockstep, and that requires a lot more orchestration, a lot more regularity to the problem, where when one machine finishes its computation and sends out data to the other machines, the other machine is just finishing and is waiting to catch the ball, and it's all sort of tightly coupled so that you don't waste any time. If you do that well, you can run much, much faster than these larger systems with very loosely coupled machines.
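
Here is a rough back-of-envelope version of that communication bottleneck, assuming synchronous data-parallel training where every worker exchanges a full copy of the gradients each step. The model size and link speeds are illustrative assumptions, not Baidu's actual numbers, and real systems overlap and compress this traffic in various ways.

```python
# Rough estimate: time to exchange gradients once per training step.
params = 1e9                       # a 1-billion-parameter model
bytes_per_step = params * 4        # float32 gradients: ~4 GB to move per sync

gigabit_ethernet = 1e9 / 8         # ~125 MB/s
fast_interconnect = 100e9 / 8      # ~12.5 GB/s (100 Gb/s class link)

for name, bw in [("gigabit ethernet", gigabit_ethernet),
                 ("100 Gb/s interconnect", fast_interconnect)]:
    seconds = bytes_per_step / bw
    print(f"{name}: ~{seconds:.1f} s per step just for communication")

# On gigabit ethernet the sync alone takes ~32 s, dwarfing the fraction of a
# second of GPU compute per step, so the GPUs mostly sit idle; on a fast,
# low-latency interconnect run in lockstep, communication stops dominating.
```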

Dave Kruse: Interesting; it makes sense. The founder of Cray Computers, of course, is based here in Wisconsin, so hopefully this will bode well for them, I don't know. So let's talk about what you're doing at Baidu. Can you tell us about some of the projects you are working on first?

Adam Coates: So the AI Lab at Baidu, the Silicon Valley AI Lab here, is one of Baidu's four research labs, and one of the things that is sort of unique about our group is that we have a bunch of different skill sets all in one team. So I believe that great AI research will not just involve working on AI algorithms and these learning algorithms. I think, as you can guess from the last conversation, that there are a bunch of components to making progress in AI, where there are actually things like systems research. So within the AI Lab we actually have two, well actually three machine learning teams. One of them is working on better text to speech methods, another one is working on speech recognition, and we also have an applied machine learning team that I'll talk about in a second. And so those teams are working on machine learning algorithms, but we also have a high performance computing systems research team, and they are solving all of these problems that I was just talking about, where if we want to train our speech models faster, or we have a text to speech model that we don't know how to put on a cloud server to serve users, these are often systems problems, and so we have the systems research capacity in-house to help solve them. We also have an AI product team, because I think there are just so many things we could do in AI research that for us the most important thing to do is make sure we solve the problems that are going to have the most impact for people, and it helps a lot to be developing your own products and have product experts who are thinking hard about user experience and what the pain point is for everyday people, so that when we do AI research we solve the problems that matter to them. And that's also why we have this applied machine learning team that spends a lot of time looking at the AI research that's coming out of the research community and our own team and thinking about how to tweak those things or guide them so that they land in a product. So those are all the teams that we have working right now, and each of them has a whole portfolio of problems that they are working on. For example, the high performance computing team is especially exciting right now, because the hardware that we are starting to see today is on this trajectory where in a few years' time we are going to see something like between a 100X and a 1,000X speed up possible for our deep learning algorithms, through a combination of getting more powerful hardware from hardware vendors just because of Moore's Law, through architectural changes that hardware vendors are making specifically for deep learning now, and also just through capital: if we just buy more hardware and we string it together in a bigger network, we can get more speed ups. The trouble is that none of that stuff comes for free. Even if we can buy the machines, most of the software techniques we use today just aren't suitable for the types of hardware that we are going to see over the next several years. And so what our systems team is figuring out is how to change all of our training algorithms, how to change the way that we deploy these models, so that we can actually use all of that new hardware that's coming up. So that's a really exciting trend in AI, the fact that all of the hardware now is starting to follow AI rather than the other way around. And then as I mentioned, we have a speech research and a text to speech team.
The reason that we are working on those two problems is because, especially in China and in developing economies, many people are actually mobile first, and the next generation of devices out there aren't necessarily going to have screens, and so your ability to interact with a device in a very natural way becomes crucial. So being able to have a device that speaks to you in a very natural voice that you enjoy, but also making it possible for you to speak to that device and have it understand you – those two components are crucial to these next generation devices, because we're not going to have touch screens anymore, or perhaps our screens are going to be too small for touch to really be viable, and so it's going to be really critical for us to have these technologies to enable that next generation of interfaces.

Dave Kruse: And yeah, so you're working on some really interesting projects of course, and so what are some of your biggest challenges? Like what – either in text to speech or speech recognition – I mean I know you have lots of challenges, but what's up in that kind of a…

Adam Coates: One of the things that keeps me up at night is how we are going to get away from labeled data. I think one of the really exciting things that's happening today is that most of the new products that people are imagining are things where speech recognition will be incredibly valuable, as an interface to the product, as a way for people to interact much more naturally and efficiently with technology. The trouble is that every time a manufacturer releases a new product, the acoustics of that system, because of the microphone, because of the product design, can actually change, and sometimes they'll release it to a market that's a little bit different, where people have maybe an accent that you haven't heard before, and this is especially challenging in China where you often have communities with very thick accents that can be hard even for a native speaker to understand. And so when you're faced with an ecosystem like that, it's not enough for us just to do speech research on our benchmark dataset and make improvements there. We actually have to be thinking about how we are going to solve the machine learning problem where I need to make a speech recognition system that works for people with an accent I've never heard before, with a use case I haven't thought about, and I need to make it work for them quickly, before the product is even launched. And so you have so little data that a lot of our traditional machine learning methods, and especially deep learning methods, have a hard time adapting to that scenario easily. It becomes very expensive to try to get all of the data for these situations. But if we're really going to change the world with speech recognition and we're going to make it work for everybody, not just for the lucky few of us that don't have much of an accent, then we really have to think hard about how we are going to make deep learning work with a lot less data than we're accustomed to having.

Dave Kruse: Yeah, that seems like a problem in a lot of areas. It seems like no one has really cracked that nut of reducing the amount of data necessary, because you hear about a lot of different applications for AI, but they all require fairly large amounts of data, and that's held up development in some areas. But how far away are we from having models that don't require that – this is kind of a generic, broad question, but how far away are we from those models?

Adam Coates: It's hard to say. I think we've all known in the research community that unsupervised learning would be a huge boon if we could figure out how to get it to work. Many years ago a lot of us thought that unsupervised learning was going to be sort of the driving force behind progress in deep learning, and it actually turned out that supervised learning, where I just give you the correct answer and you have to update your neural network based on being fed the correct answer – supervised learning actually worked perfectly fine if you had a huge amount of data and a really big neural network. It turned out that the answer was scale, and so we've been pushing hard on scale and that's got us pretty far, and I think we've got a little bit more room left, but increasingly I think we are returning to this unsupervised learning challenge and thinking, gosh, we really have to figure this out if we are going to make progress. And there are so many things going on in the field – I think we are making a ton of progress on applications; I think reinforcement learning has shown that we can get really good results if we have good simulators and so on. I don't find that stuff quite as surprising. What I think will be the huge watershed is when we start to make a lot of progress on unsupervised learning. So over the last year or two I think there have been some very exciting developments – something called adversarial networks, and also just a number of different methods for improving how well we can generalize from data. I don't see a clear-cut path for how those things are going to solve this problem of letting us generalize to all these new applications the way that a human can, but I think the starting points are starting to show up. I'm hopeful that over the next couple of years we are going to see a breakthrough, that if we can find the right lever to open up those floodgates, then we will be able to use unlabeled data at scale, or perhaps be able to get by with a lot less data overall.

Dave Kruse: Interesting, okay. And can you describe quickly what the difference is between supervised and unsupervised, just for the audience, just in case?

Adam Coates: Yeah. So to take an example with our speech recognition system, the way that you train these speech systems is you give them a lot of examples of the input, which is an audio clip that you recorded from someone, and then you also give the learning algorithm the correct answer. So if I say, "Hello from Sunnyvale, Dave," then I can record that audio clip and give it to the speech system, and the speech system can make a prediction for what it thinks that I said, and early on in its learning process it will probably make a terrible prediction. But if I also give it the transcription, if I actually write out the text "Hello from Sunnyvale" and I give it to the learning algorithm, then it will go and tweak all of the parameters in the neural network to try to give the correct answer the next time. If we do this over and over and over again, then eventually the system will learn to give the correct transcription, and the important defining characteristic of this supervised learning approach is that I have to have that transcription; I've got to have the text of the correct transcription. Unsupervised learning, by contrast, would mean giving the speech recognition system a huge amount of audio, but with no transcription, and then asking it to please do something useful, so that if I later ask it to transcribe audio given just a handful of examples, it can do that much faster. We'd really like to do this, because especially for something like speech recognition, it turns out that getting high quality transcribed data is very expensive.
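
To make the supervised case concrete, here is a minimal sketch of one training step on a single (audio, transcript) pair, in the spirit of end-to-end systems like Deep Speech, written with PyTorch and CTC loss. The tiny model, the shapes, and the random stand-in tensors are all illustrative assumptions, not Baidu's actual setup.

```python
import torch
import torch.nn as nn

class TinySpeechModel(nn.Module):
    def __init__(self, n_mels=40, n_chars=29, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(n_mels, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_chars)    # character set + CTC blank

    def forward(self, feats):                         # feats: (batch, time, n_mels)
        h, _ = self.rnn(feats)
        return self.out(h).log_softmax(dim=-1)        # per-frame character scores

model = TinySpeechModel()
ctc = nn.CTCLoss(blank=0)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# One labeled example: audio features for a clip, plus the character indices
# of its transcript ("hello from sunnyvale"), here faked with random tensors.
feats = torch.randn(1, 200, 40)                       # 200 frames of features
transcript = torch.randint(1, 29, (1, 20))            # 20 stand-in label indices

log_probs = model(feats).transpose(0, 1)              # CTC wants (time, batch, chars)
loss = ctc(log_probs, transcript,
           torch.tensor([200]), torch.tensor([20]))   # input and target lengths
opt.zero_grad()
loss.backward()                                       # nudge the weights toward
opt.step()                                            # the correct transcription
```

The unsupervised setting he contrasts this with would drop the `transcript` tensor entirely and ask the model to learn something useful from the audio alone, which is exactly the part that remains hard.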

Dave Kruse: Nice, thank you, that was good. And so you have been around AI for many years now. Does AI ever surprise you? Like does it ever come up with results where you're like, huh, that's interesting?

Adam Coates: I think the thing that, just as a phenomenon, shocks me the most right now is how easy it is to get something 90% working and how unbelievably hard it is to get something that's like 95% to 99% working. The deep learning algorithms we have today are remarkably generic. You can throw almost any pattern matching problem at them and they will find a way to give you a pretty decent solution. So when we started working on text to speech, there are all of these sub-problems that you have to solve. For example, you need to learn how to take English characters and match them to a phonetic representation that tells you how to pronounce them. And English spelling is horribly complex, so trying to figure out how to pronounce a new word that you have never seen before, just based on its English spelling, is hard even for a human, and yet if you just train this thing with a whole bunch of known pronunciations, it can actually learn to do really well. It's not perfect, but it actually performs comparably to other state of the art methods you could choose. And likewise, you can take the same deep learning system and throw it at the problem of converting those phonetic representations to audio waves, or throw it at the problem of looking at a bunch of audio and chopping it up into little slices that the text to speech engine has to learn from. So all of these different sub-problems can be solved with the exact same technique up to a reasonably good level of performance. But the thing that turned out to be incredibly hard is getting from the thing that's working decently well all the way to 99%, something that's almost indistinguishable from a person's performance, and that's where I think the AI Lab is really trying to push forward. I think we do a fair bit of research ourselves, but the thing that we are especially good at is solving all of the other challenges that are necessary to get to 99%. You might need to come up with a fantastic new way of getting large quantities of data. You might need a totally new piece of software written to run at extremely large scale on a supercomputer. You might even need to think about how you change the user interface of a product and handle all the human issues involved with how a person interacts with a speech engine, in order to get to a product experience that feels a lot like talking to a human or listening to a human. And so we are set up to handle all of those things together, so that we don't just produce a research paper that gives you 90% performance, but we can solve all of the things in the entire pipeline that are going to get you to 99% performance.
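
As an illustration of how generic those building blocks are, the same kind of sequence model sketched above for speech can be pointed at the grapheme-to-phoneme sub-problem he describes: characters in, phoneme symbols out, trained on pairs from a pronunciation dictionary. This is a hypothetical minimal sketch (the vocabulary sizes and the CMUdict-style example are assumptions), not the lab's actual model.

```python
import torch
import torch.nn as nn

class TinyG2P(nn.Module):
    """Predict a phoneme sequence from a word's spelling, trained with CTC."""
    def __init__(self, n_letters=28, n_phones=45, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(n_letters, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_phones)    # phoneme set + CTC blank

    def forward(self, letters):                       # letters: (batch, word_len)
        h, _ = self.rnn(self.embed(letters))
        return self.out(h).log_softmax(dim=-1)

# Training pairs would come from a pronunciation lexicon, e.g. ("though", [DH, OW]);
# the training step is the same CTC recipe as the speech sketch above
# (it assumes the spelling is at least as long as its phoneme string).
model = TinyG2P()
word = torch.tensor([[20, 8, 15, 21, 7, 8]])          # stand-in letter indices
phones = torch.tensor([[11, 30]])                     # stand-in phoneme indices
loss = nn.CTCLoss(blank=0)(model(word).transpose(0, 1), phones,
                           torch.tensor([6]), torch.tensor([2]))
loss.backward()
```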

Dave Kruse: Interesting! And if it's not confidential, I was curious – you mentioned the UI example, what you could change in the UI in order to improve the performance. I'm kind of curious what you had in mind there.

Adam Coates: Some of the things that don't get a lot of attention in the research world are things like how you set up the user interface so a person knows when the device is listening to them. How do they know when the device has stopped listening to them, and then, importantly, how does the device know what the person is doing? So how does the device know that the person has stopped speaking and it should take action, and how does the device give you feedback about what its current beliefs are about what you want it to do. And even better – this is something that frankly none of the current systems do very well – if I make a mistake as a machine learning algorithm, and you as a human try to correct it and say, no, I didn't say that, I meant this; these are things that humans aren't bothered by. We are not bothered by stuttering; we are not bothered by not knowing when a person has stopped speaking or started speaking. None of these things actually matter, because we are remarkably good at figuring out the beginnings and ends of what people want to say. But the machine learning algorithms we have today, and the way that they are integrated with products, don't really solve this to my satisfaction. I still have a hard time with products from pretty much any big internet company at this point, and it's a mix of getting that machine learning algorithm to understand when you stop speaking and do that really well, but also setting up the user interface so that you can push back and give the person some feedback on what you believe is going on.

Dave Kruse: Interesting, okay. Yeah that’s – I didn’t think about that. That’s clever; it makes sense.

Adam Coates: Actually, one of the things that listeners can try out is a voice-first keyboard we created called TalkType for Android. So if you go to talktype.baidu.com you can try it out. And what we were trying to do there is look at the way most companies design the keyboards on their phones and recognize that very often we just have this keyboard, which is the way we've always interacted with our devices, and we've tacked on a speech engine that's pretty good but not amazing. So what we do as product designers is build a little microphone button, and if you press the microphone it switches mode and all of that, and this always felt to me like a very clunky experience. And so what we were trying to do with TalkType is make speech the first thing that you use, put it in your face all the time, and see what happens. We actually learned a ton from it. So for example, I often hear people say, well, I've never used speech to interact with my phone because I don't really want to talk to it in public, or it doesn't work all the time, and things like that. But once I started using it, and was really sort of forced to rely on it because it's the default method on my phone, I was actually shocked at how quickly my own habits changed. Even if I'm sitting on a bus or something, I don't mind kind of murmuring into my phone the way I would talk to a friend in the seat next to me. It actually is an easy habit to build, and once you start to recognize that, you can start developing other features that, if they are natural enough, if it's like talking to the person in the seat next to you, people will actually use. But you end up seeing that you really have to get to that same level of naturalness in order for it to be successful.

Dave Kruse: Interesting, yeah. I can see that – I've been using voice more on my phone because it's nice, especially if you want to send a long text. And my mom, she never did texting, but I'm like, oh, you can just press this mic like you mentioned, and so now she texts a lot more. But you know it's not great, like the user experience isn't great, but I can see – it would be awesome to try it out. TalkType is what it's called, right?

Adam Coates: Yeah, that's right. And you know the ultimate goal for all of these things is to make devices as natural to interact with as people. I think it's pretty remarkable for something like navigation. My wife and I took a lot of road trips while we were in the Midwest, and to avoid having our eyeballs on our phones, which is just an incredibly dangerous thing that is going on right now in the United States, the person in the passenger seat would just hold the phone, and you could actually get virtually anything done that way, just by talking to the passenger. We'd say, oh, when is the next turn coming up? And if something wasn't clear they could clarify, and you could get a hotel booked, and it was pretty amazing that even if you are not really using the intelligence of your passenger to get things done, just being able to in effect speak to the device was incredibly freeing, and it's clearly possible to get all of your questions answered. But unfortunately the device itself just doesn't offer that kind of experience yet. We don't have the natural language and speech technologies to make that happen. But I think in the future this is going to happen, and thankfully too, since I think right now we could all certainly do with being able to interact with technology but also keep our eyes on the road.

Dave Kruse: Interesting, all right. I think that almost answers my next question, and I know we are almost out of time. We are probably going over, but this is pretty interesting, so hopefully that's okay. I was curious about the next five years – what do you see coming out of using neural networks or AI that people aren't necessarily thinking about now? You just mentioned the voice kind of interaction; that could be your answer right there. I don't know if you have any other ideas on what could be coming our way.

Adam Coates: I think one of the realizations that's happened over the last few years, especially in Silicon Valley, is that AI is not a product by itself, and you could often see startups, when deep learning was becoming popular, that were founded on this idea that we are going to do amazing deep learning technology and we are going to build a platform out of that. And to some degree that's been helpful. It gives people access to the technology in a way that wasn't possible before. But it's very difficult to build a great product out of that – partly because the deep learning algorithms aren't sufficiently automated that you can get the best performance that way, but also because of the need for integration, the need to have your speech user interface very closely coupled with the speech engine and how it functions, for example. So I think the thing that is coming up that everyone needs to be thinking about is how we are going to create these sort of delightful, vertically integrated AI products. And I think that's not just a technology challenge, like how do I make a speech engine better; it's also a challenge for organizations that innovate. How do you get people who are product experts and machine learning researchers and systems researchers – how do you get them all to work under one roof, so that you can make all that vertical integration happen; so that a machine learning engineer who is thinking about which classifiers they should tweak today to make the product better actually knows how to think like a product person and can think about user pain and how to address it. And likewise, how can a product manager or product developer who really understands users and really thinks hard about user interfaces understand what's possible with machine learning technology and how to interact with machine learning engineers and communicate well with them. I think that's something that we haven't really faced before, and in the course of AI development we've always been happy to throw a speech engine up on the web and let the product people figure it out, but I don't think that's going to cut it anymore.

Dave Kruse: That's good, and so one last question and then we'll end it. But I'm curious, how do you get away from work? Like what do you like to do outside of AI – and maybe it's more AI, but what do you like to do?

Adam Coates: Right now I have two children under two, so they are pretty much my life outside of work.

Dave Kruse: Got you.

Adam Coates: But that's okay. Actually, as an AI researcher it is especially remarkable to see how far we are even from the capabilities of a two year old. I made the mistake of showing my two year old son that you could flip this latch on the back door in order to open the slider and go outside, and I was like, oh well, he is not going to remember how to do this, so I'll just show him once – and sure enough, he learned it instantly. Even though it's a very non-obvious computer vision problem, a very difficult thing to manipulate, it's a heavy door – it didn't matter. He just had to see it once, partly demonstrated, and now he is capable of getting out of the house. And when I saw him do this I was just stunned; I was like, oh my goodness, I have no idea how to build a robot that can do what he just did, that learned from so little data. Although I have to say that we are making progress on things like that – our ability to get our machines to learn from one or two examples for very simple tasks is getting closer, but I think they are still nowhere near a toddler.

Dave Kruse: Well, and that’s a good way to end. Yeah, kids definitely keep you humble and exhausted, but yes.

Adam Coates: Especially as a researcher.

Dave Kruse: Yeah exactly, like oh man. All right, well I think that's a great place to end, and Adam, I really appreciate your time; this has been great. I mean, really good articulation of a pretty complex subject, which isn't easy to do, so yeah, I appreciate it. I learned a lot, and I'm hoping everyone else will learn a lot too. So thanks Adam.

Adam Coates: Thanks a lot Dave. I really appreciated it.

Dave Kruse: Definitely. And thanks everyone for listening to another episode of Flyover Labs. As always, I greatly appreciate it. We’ll see you next time. Bye.