This interview is all about robotics. I was fortunate enough to interview Julie Shah. Julie is an Associate Professor at MIT in the Department of Aeronautics and Astronautics and leads the Interactive Robotics Group in CSAIL, the Computer Science and Artificial Intelligence Laboratory.
Julie’s research is fascinating. It’s around collaborative robots: the study of how humans and robots will work together in the future. Some of her work definitely reminds me of Tony Stark’s robots. A lot of her work revolves around collaborative robots in manufacturing, disaster response and space exploration.
Here are some other things we talk about:
-How do her planning systems for disaster response analyze conversation to detect planning discrepancies?
-How do her collaborative robots anticipate human actions? She shares some interesting anecdotes about how predictable humans are.
-What needs to happen before we can have a robot cooking for us?
David Kruse: Hey everyone. Welcome to another episode of Flyover Labs and today we are lucky enough to have Julie Shah with us. And Julie is an Assistant Professor at MIT in the Department of Aeronautics and Astronautics.
Julie Shah: Actually, can I stop you?
David Kruse: Oh! Yeah, sure.
Julie Shah: I’m actually an Associate Professor.
David Kruse: Oh! Associate Professor, all right. Associate Professor, sorry about that. Good, I’m glad you stopped me. Associate Professor, and she is in the Department of Aeronautics and Astronautics and leads the Interactive Robotics Group in CSAIL, which is the Computer Science and Artificial Intelligence Laboratory. And Julie’s research is quite fascinating. It’s all around collaborative robotics, which is the study of how humans and robots interact and work together. So some of her work definitely reminds me of Tony Stark’s robots in his machine shop, except Julie’s robots are actually real. So a lot of her work revolves around collaborative robotics in manufacturing, disaster response and space exploration. So I invited Julie on the show to hear more about her research and what she sees for the future of her collaborative robots. So Julie, thanks for coming on the show.
Julie Shah: Thank you for having me.
David Kruse: Was there anything else in the intro that we need to revise?
Julie Shah: All good.
David Kruse: That’s good, all right. So let’s start with your background. I’m curious to see how you got to be at MIT and what you are doing now?
Julie Shah: Yeah. So I’ve actually done all my degrees at MIT. I did my PhD here in Artificial Intelligence, focusing especially on automated planning and scheduling. So how you can get machines to do planning and scheduling tasks for us, and I sort of specialized in how these machines could support people in making decisions and in making robots smart enough to work collaboratively with people. My undergrad degree was actually in Aeronautics and Astronautics here at MIT. And so I focus on human-machine collaboration in time-critical, safety-critical domains. Places where safety is important and timing really matters. That interaction has to be basically perfect for the system to add value.
David Kruse: Interesting. And so growing up, how did you decide you wanted to go to MIT? It’s not easy, of course, to get in. When did you get interested in kind of this space?
Julie Shah: So for me, I grew up loving space and loving science and wanted to be an aerospace engineer, and MIT has one of the best, maybe arguably the best program in aerospace engineering. That was always my dream and I was fortunate enough to get in. Growing up I would tell everybody I wanted to be an aerospace engineer and everybody would say, “oh! That’s so specific.” And so I came to MIT thinking this is a very specific thing to major in, and then you start studying aerospace engineering and what you realize is that it’s a broad discipline, it’s a systems discipline. There are many, many different areas you can specialize in, and I really became interested in human-machine interaction with autopilot control systems and then the higher-level decision-making systems that support pilots on the job.
David Kruse: Interesting, okay. And so let’s talk a little bit about what you are working on now. I mean you have a number of projects. I don’t know if you want to go through all of them or you can list kind of the top two or three projects you are working on, that would be interesting.
Julie Shah: Yeah. Yeah, so the work in my lab, everything we do is focused on trying to harness the relative strengths of people and machines working together. And so the real question is how can we make a machine or a robot smart enough to work together with us as a team. And we look at both physical collaboration and teamwork in decision-making support. How do people bring their information together to decide how they are going to work as a team? So in terms of physical interaction, we deploy robots to work alongside people in factories, to improve the efficiency with which you can build planes and build cars. You imagine robots in factories, but they are usually caged and they are usually physically separate from people, and you can just pre-program them to do the same thing over and over. If you have a reason to make that robot actually work alongside people, to do any work that’s interdependent with people, well, the problem is that people don’t work like robots. So the robot now needs to be able to flexibly make decisions on the sequencing of its work, the timing of its actions, so that it can effectively coordinate with other people and other robots on the line. So we have an area of our research where we develop fast algorithms for scheduling robot work or adapting robot motions, so that a person and a robot can be a team on building a car, building a plane, more the way that people work together to do that. And then on the other side we’ve developed machines that model human decision making. How in a conversation do we even decide how we are going to work together, who is going to do what and in what order? It involves inferring our partner’s mental state, like what are they currently thinking, what do they currently understand about what we said.
So we design machines that are able to make those sorts of inferences, so that they can participate in the conversation of figuring out the plan, which then you would go use, for example, to build a plane or a car or deploy an emergency response team.
David Kruse: Interesting and so with the decision making, how – can you give an example of kind of a project you are working on and yeah then you might want to dive a little deeper into that, but…
Julie Shah: Yeah, so in one of our projects what we are trying to do is listen to natural human conversations, like in a meeting where you’re trying to plan how to execute some task and you are making decisions about who is going to do what. What we want to do is listen to that natural language and try to infer a few things. One is, if the team is going to work together, you want to try to figure out whether they are on the same page, whether they have a consistent understanding of what they have agreed upon. And then if they do, well, then you want to try to figure out what it is they’ve actually agreed upon, because that’s what forms the shared plan that you’re then going to go execute. We are all in meetings every day, so this probably makes sense: we leave these meetings with confusion, where not everybody is on the same page, or what we think we agreed upon doesn’t actually make sense once we go to try to execute. And so we are designing machines that can perform these inference tasks: are we in consensus, and what have we agreed upon? If we are not in consensus, or if what we’ve agreed upon is logically inconsistent or doesn’t make sense in some way, we design the machine to actually involve itself in the conversation and help us strengthen our shared understanding or fix up the plan.
David Kruse: Wow! Okay and it does sound like the future. So can you provide like an exact example of what kind of conversation you had and then what a robot is supposed to do or maybe the robot pushed back on something because it wasn’t you know…
Julie Shah: Yeah, yeah. So we developed this capability with the motivation of supporting emergency response teams. Every situation is very fluid, and the quality of the team’s execution really depends on a shared understanding of what they thought they should go out and execute. And even a small weakness in that shared understanding can have very, very significant consequences: wasted resources, delays, which can be costly in terms of time and money and maybe the safety of people. So the motivation was emergency response, and ultimately we developed a machine learning technique. Any machine learning system needs a training data set to learn from, in this case to learn about our conversations and the patterns in our conversations. So we needed a publicly available large data set to train the system on conversation in team planning. And there is a publicly available data set that involves many teams of four people designing a remote control. It’s annotated and publicly available, but it really looks nothing like what an emergency response team is doing in their planning. But this is what we had, and there is literature in Cognitive and Behavioral Psychology that tells us, well, it doesn’t really matter whether you are talking about a remote control or emergency response or any number of other domains. The patterns of information exchange tend to be very similar, and how we exchange information relates to our shared consensus. If we spend our whole meeting asking each other questions, we’re probably not going to leave with a very solid understanding. But if we fully exchange information related to the task, I make some proposals, you accept those proposals, I confirm them, then we’re much more likely to leave on the same page.
So we actually trained our machine learning system on this corpus of data of people discussing the design of a remote control, and then we deployed it in an experiment where we had teams of people come into the lab and plan for an emergency response deployment. So they were given a map, they were given travel times on the map, they were given locations of patients with different issues that had to be brought to hospitals, and various resources like ambulances, helicopters, police cars, and they had to figure out how to deploy those resources to bring those patients to the hospitals, all in a timely manner. And it turned out that the machine learning system that learned on the remote control data set was actually able to identify weaknesses in the consensus of the team discussing the emergency response plan. And ultimately it was able to interact with the people to improve the quality of their plan by almost 20%, which is really cool.
David Kruse: Wow! Okay, so boy, I have so many questions now. So how much data did it take to train the machines? It’s a pretty complex problem. It doesn’t seem like you would have a huge amount of data. It sounds like you had enough data, but…
Julie Shah: Yeah, yeah. I mean, getting any data from people is very time consuming, and in comparison the data set is very small, so that’s actually the trick in the research, that’s the creativity of the innovation. When you design a machine learning model you need to design the features that it looks for, you need to design its structure, sort of the scaffold that it uses to piece together its observations of us, so that it can do this job efficiently. And the way we do that is we translate well-established cognitive models for how people do this into computational models for machines. So we look at the cognitive models for how people develop their strong shared understanding, and we translate elements of the cognitive models into a machine-understandable representation. We look at the scaffolding that people use in their minds to try to piece together a conversation, and then we design the structure of the computational model to mirror that scaffolding that people use. And the result is a machine that can learn relatively efficiently on relatively little data, much the way people do, which is neat.
David Kruse: And so can you give an example of how the robot picked up issues or discrepancies between the people talking? I don’t know if you can give an exact example, and then how was the robot trained on that? I mean, it seems like there would be an unlimited number of ways that you could have discrepancies. So how did you think of…
Julie Shah: Yes, there is an unlimited number of ways, yeah, yeah. So you have to start with: well, what can a machine observe about our natural conversations, and what can it observe automatically, like in real time? And what the cognitive models tell us is that it’s really patterns of information exchange, these patterns of like question, response, accept, reject, that really matter for shared understanding. And it just so happens that there are existing techniques that will translate, sentence by sentence, our natural language into these dialog acts related to information exchange, so that’s actually something a machine can observe. And then what we know is that there are actually patterns to these different dialog acts related to information exchange. We don’t know exactly what those patterns are, but we know that they fall into a few different categories based on the literature. So what we do is we look at those dialog acts related to information exchange and we process them into these higher-level meta-labels related to information exchange, and we take those meta-labels from the literature. So decades of research in Cognitive and Behavioral Psychology tell us that these patterns specifically are useful in answering the consensus question. Then what we need to do is learn patterns of those meta-labels, and so we train two machine learning models: one to learn patterns of weak shared understanding and another to learn patterns of strong shared understanding. So with these two computational models, given a new conversation we feed it into each of the models and we see which one gives us the maximum-likelihood answer, and that’s the one we use to infer the strength of the agreement, is it strong or is it weak.
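As a rough sketch of the two-model, maximum-likelihood idea Julie describes here: train one sequence model on conversations that ended in strong consensus and another on weak ones, then score a new conversation under both. The specific meta-labels and the first-order Markov structure below are illustrative assumptions, not her group's actual models.

```python
from collections import defaultdict
from math import log

class DialogActModel:
    """First-order Markov model over dialog-act meta-labels such as
    'question', 'proposal', 'accept', 'reject', 'confirm' (label set
    is hypothetical; add-one smoothing keeps unseen transitions finite)."""
    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing
        self.counts = defaultdict(lambda: defaultdict(float))
        self.labels = set()

    def fit(self, sequences):
        # sequences: meta-label sequences from training conversations
        for seq in sequences:
            for prev, cur in zip(seq, seq[1:]):
                self.counts[prev][cur] += 1
                self.labels.update((prev, cur))
        return self

    def log_likelihood(self, seq):
        """Smoothed log-probability of a conversation's transitions."""
        vocab = len(self.labels) or 1
        total = 0.0
        for prev, cur in zip(seq, seq[1:]):
            row = self.counts[prev]
            denom = sum(row.values()) + self.smoothing * vocab
            total += log((row[cur] + self.smoothing) / denom)
        return total

def classify(seq, strong_model, weak_model):
    """Pick whichever model assigns the conversation higher likelihood."""
    if strong_model.log_likelihood(seq) >= weak_model.log_likelihood(seq):
        return "strong"
    return "weak"
```

A conversation dominated by proposal/accept/confirm patterns then scores higher under the "strong" model, reproducing the "which model explains this better" decision she describes.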
David Kruse: So how generic is the model? Like, could you take it into, let’s say, a business setting or a military setting or any other setting, or do you have to tweak it?
Julie Shah: So that’s, yeah, that’s the key question, that’s the million dollar question. So the studies that our models are based on are looking at shared understanding in design tasks. The remote control design task that those people were doing looks very different than emergency response. Ultimately it’s a design task: they are trying to figure out where to put buttons, who the first remote controls will be marketed to. And the emergency response task is a planning task, which is a more complex type of design task. Very different content, very different domains, but those are the only two we tested it on. So the next question is how well does this generalize? Will it work for military team planning, will it work just for everyday meetings, you know, the millions of meetings a day?
David Kruse: We only…
Julie Shah: Wouldn’t that be neat. Someone once joked to me that this capability was potentially incredibly useful, because not only could it help us figure out if we were not in consensus and, if we were not in consensus, help us get in consensus, but there are also the kinds of meetings where everybody is in agreement but you are still talking. So wouldn’t it be wonderful if it could just tell us: you are in agreement, move on.
David Kruse: Move on, yes exactly. Well, that’s good, yeah having a robot lead the meeting. I like that.
Julie Shah: Yeah.
David Kruse: So can we go to these other domains, I kind of cut you off I think or…
Julie Shah: Oh! Yeah, we are still looking into it. That’s our next question, is to look at how well it generalizes other domains.
David Kruse: Interesting. And I’m curious, do you remember an example from that disaster planning where a robot picked up a discrepancy? If you don’t that’s okay, but like a sentence or some subject matter where it picked up?
Julie Shah: Yeah, so in that experiment there were sort of four decision points that the team had to agree on, and there was conversation around each of those four decision points. So in what order should those patients be rescued, which hospitals should they be brought to, what vehicles should be used for each of them; sort of the categories of decisions that you knew you had to make for this domain. And it was different for each team, there wasn’t one consistent topic that was always the problem for all the teams, which sort of indicates the complexity here. It just depends on how the team’s conversation evolves, and so yeah, it would be a different topic each time.
David Kruse: Okay, got you. Interesting. Well, all right, I could ask a lot more questions about that, but let’s move on to more of the mobile-assistant type of interaction, how robots interact with humans. I’ll try to remember to post these when posting your podcast: you have some interesting videos on YouTube. I saw one where there was a robot and a researcher, and the researcher would do something, and then the robot would wait and pause for the researcher to finish and then would go over and do a task, which is, you know, so cool. It looked almost like two humans working together. So I imagine that’s not an easy thing to do, and so what’s kind of the technology behind that? That’s a pretty broad question, but how do you even start building a collaborative workspace like that?
Julie Shah: Yeah. So there are two challenges there that we’ve been working on. One is, just like when a machine is helping us formulate a plan it needs to infer what we are thinking to be able to figure out how to help us, when we are working together on a physical task like that, we have to deal with a similar problem. At every point in time for that robot trying to figure out how to work with this very unpredictable human, there is an enormous number of possible futures that could unfold. So one part is, given there is an enormous number of possible futures, when one of those futures materializes, how does the robot quickly figure out what its right next action should be? Because it can’t do it all in advance; it can’t pre-compute everything it could possibly do for every way this could unfold. So there is a computational challenge of how it does its thinking fast enough based on the real-time information that it’s getting. And then there is another part. You can watch what happens and then try to quickly react to it; that’s sort of one capability, but humans do more than that when we are working together. We anticipate what our partner is going to do, we model that in our minds, and so how do we do that? How can a robot watch the progress of a person doing their task and try to infer what their likely next task sequence will be, what the timing of their actions will be? Because if it can do that, well, then it can not just react in the next moment, but it can project into the future and make a plan around the predicted actions of the person. So we work on both those aspects: making the robot think quickly enough as things happen, and allowing the robot to make accurate predictions of what its partner will do as the task unfolds.
David Kruse: Okay. And how do you make those predictions? Some of that must be from research on how humans predict. I mean, are you looking at research on how humans predict other humans’ movements and actions, and then trying to somehow create an algorithm for the robot to do the same?
Julie Shah: Yeah, yeah. So we do rely a lot on human studies. People do this well, so is there something that we can learn from that? In prediction, there is predicting the next steps in the task, which are sort of discrete chunks, the next step that a person is going to perform, and then there is what are they going to do at the motion level. Like how are their arms going to move, how is their body going to move, and of course that depends on what their next step is going to be and vice versa, so the problems are kind of intercoupled. The motion-level one is really interesting, because again, people do this pretty well. So what someone explained to me was, you know, it’s a very small observation of a very small movement. Like when you are looking at people that are fencing or people that are doing karate, there is a small movement of the body that they are able to very quickly match to what’s going to happen next, what the gross behavior is. And it turns out that we were able to train a robot to do something very similar. We gave the robot information about the biomechanical model of the person, and we trained it to look for these very small movements in the arm or in the walking motion, to try to predict where a person is going to reach next, or whether a person walking is going to turn or not. And there are prior studies that say that with full motion capture, in a very controlled environment, you can figure out whether someone walking is going to turn right or left with like 70% accuracy, just by looking at the mediolateral velocity, sort of the velocity of the midsection, and head turn. So we used those features and trained machine learning techniques to try to do that, and we were able to reproduce those results, which is really cool.
So you know, a robot walking, moving in a crowded space with a person can track these very small movements in the body of the person and figure out two steps ahead whether that person is going to turn left or right around the robot, which would be very useful for the robot. And similarly, with just a few hundred milliseconds of arm motion, like a fraction of a second, the robot can predict with 70% accuracy where a person is going to reach on a table within four quadrants. So I spent a lot of time watching people walking in hallways and watching people reaching at someone else’s desk, wondering whether I can do that. Like, do I know in a fraction of a second what they are going to reach for? And I’m not sure I do, but what people tell me is that people that fence or do karate have trained their minds to basically do that same task. So it is a humanly possible capability.
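For a sense of what a classifier over those walking features might look like, here is a minimal sketch. The features (mediolateral velocity and head turn) come from the interview, but the logistic model form, the weights, and the sign conventions are all hypothetical; a real system would fit the weights from motion-capture data.

```python
import math

def extract_features(window):
    """window: list of (mediolateral_velocity, head_yaw) samples
    covering a few hundred milliseconds of walking motion.
    Returns the mean of each feature over the window."""
    ml = [v for v, _ in window]
    yaw = [y for _, y in window]
    return (sum(ml) / len(ml), sum(yaw) / len(yaw))

def predict_turn(window, w_ml=1.0, w_yaw=1.0, bias=0.0):
    """Logistic score over the window features: positive score is
    read as an upcoming right turn, negative as a left turn.
    Weights here are placeholders, not fitted values."""
    mean_ml, mean_yaw = extract_features(window)
    score = w_ml * mean_ml + w_yaw * mean_yaw + bias
    p_right = 1.0 / (1.0 + math.exp(-score))
    return ("right" if p_right >= 0.5 else "left"), p_right
```

The point of the sketch is that a fraction of a second of low-dimensional body signal, not full trajectory history, is enough input for a usable two-way prediction.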
David Kruse: Interesting. And can you tell us how you train the robot, you know in order to do that, kind of the steps you go through to make that happen.
Julie Shah: Yeah. So we started by using the biomechanical model of a person. We are capturing information about the links in the arms and the legs and the body and their relationships, and we use those as features in a machine learning model that performs time-series classification. So if you imagine every joint in your arm or leg, you can get a trace of what it’s doing through time, so it’s like time-series data. Then what we learn is a model, a statistical model, of multiple of those traces. A person walks across the room a few times, and we have a library of what those traces look like. So we then develop this model.
David Kruse: And these are images, right, that we are feeding the model?
Julie Shah: So the model really needs the [inaudible] of the person, the lengths and the angles of the arms and legs. So the robot needs to go from vision to try to process that, which is a hard job. In our first set of studies we actually used a full motion capture system. We put little dots on the person, so that with some cameras situated all around the room we could very, very precisely pull this information out. But a robot in a factory or moving through corridors with people needs to be able to do that from vision on its own, using its own cameras, and so that’s actually our current research: to design a technique for the robot to pull all that same information, but from sensors onboard rather than off-board sensors situated around the room.
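The library of joint-angle traces Julie describes can be caricatured as a nearest-trace lookup: store labelled traces, then match a short partial observation against the same-length prefix of each one. This sketch uses plain Euclidean distance, where a real system would learn a statistical time-series model, so treat every detail here as an illustrative assumption.

```python
def trace_distance(a, b):
    """Euclidean distance between two equal-length joint-angle traces."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

class TraceLibrary:
    """Library of labelled motion traces (e.g. one joint angle sampled
    over time); a new partial observation is matched to the label of
    its nearest stored trace."""
    def __init__(self):
        self.traces = []  # (label, trace) pairs

    def add(self, label, trace):
        self.traces.append((label, trace))

    def predict(self, partial):
        """Compare the partial observation against the same-length
        prefix of every stored trace; return the closest label."""
        best_label, best_d = None, float("inf")
        for label, trace in self.traces:
            d = trace_distance(partial, trace[: len(partial)])
            if d < best_d:
                best_label, best_d = label, d
        return best_label
```

Because the matching runs on prefixes, the prediction is available after only the first few samples of motion, which is what makes anticipation (rather than reaction) possible.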
David Kruse: Interesting. Okay, and then how about the algorithm that helps the robot think quickly? How does that work?
Julie Shah: Yeah, so there the computational challenge is that the robot needs to reason about time. It needs to sort of unfold time into the future and look at, at every little time step, what could possibly happen. And if you are working very closely together, the time steps need to be very small, and then the branching factor of what could happen at this next time step, and at the next time step, turns into a very large tree of possible futures very quickly. And the problem is because you have to discretize time. So in the lab we work on developing algorithms that reason efficiently about time. What are ways we can very quickly cut that search space, to prune parts of the space that it is very unlikely the person would go down, because it doesn’t make sense for the task, basically. And that involves designing constraint programs and schedulability tests that work very, very quickly online, where you query: is this a possible future that could happen? No, okay. Is this a possible future that could happen? No, okay. And so we expand the tree of options, we use these tests to quickly prune it, and then we work with what’s left.
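The expand-then-prune loop Julie describes can be sketched like this. A real implementation would use constraint programs and temporal-network schedulability tests; the deadline check below is a deliberately toy stand-in, and all names and numbers are illustrative.

```python
def feasible(schedule, deadline):
    """Toy schedulability test: a cheap check that discards futures
    which cannot meet the task deadline. Stands in for the fast
    constraint programs described in the interview."""
    return sum(dur for _, dur in schedule) <= deadline

def expand(schedule, actions, deadline):
    """One level of the search tree: branch on each candidate next
    action, keep only branches that pass the quick feasibility test."""
    children = []
    for name, dur in actions:
        child = schedule + [(name, dur)]
        if feasible(child, deadline):
            children.append(child)
    return children

def plan(actions, horizon, deadline):
    """Breadth-first expansion of the tree of possible futures,
    pruning infeasible branches at every discretized time step."""
    frontier = [[]]
    for _ in range(horizon):
        nxt = []
        for sched in frontier:
            nxt.extend(expand(sched, actions, deadline))
        if not nxt:
            break
        frontier = nxt
    return frontier
```

The design choice to test feasibility at every expansion, rather than after enumerating all sequences, is what keeps the branching tree of futures from blowing up as the time discretization gets finer.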
David Kruse: Got you. And so with this anticipation and quick reaction, from a technical standpoint, what’s holding it back from being a completely seamless and amazing experience, like the Iron Man example? What kind of challenges are you facing in order to get there?
Julie Shah: Yeah. So the areas that we work in, I mentioned factories, doing manufacturing, building planes, building cars: these are areas where, you know, you have a person doing the task, and there is going to be variability in how the task is done, and the person isn’t a robot. But it’s still what’s called a structured task, with a goal and with at least a high-level procedure that’s specified someplace. What’s really challenging is actually a robot in the home or a robot in your workshop. You are not going to cook the same meal every day if the robot is helping you in the kitchen, and it would be kind of time consuming to tell it in advance what that recipe is before you go ahead and do it. If you did that, and it had that sort of structural information, then the robot could be much, much more helpful, right? But the reality of our lives is that we change the recipe as we go, or we are juggling two recipes because we are multitasking, and so these other environments that are less structured are still a very big challenge.
David Kruse: Got you, okay. So what do you need? Is it just a lot more training, more advanced algorithms, in order to, let’s say, have a robot cook a meal? We are going to the extreme here, but of course we all want that someday. What’s necessary in order for that to happen?
Julie Shah: Yeah, so in a lot of the work that we do that I’ve described, the coordination is very implicit; there is a task plan. In a factory the person and robots can’t really talk to each other; it’s often too noisy. But if you think about how people coordinate in the kitchen, there is a lot more than just watching your partner and sort of guessing in your mind what they are going to do. One partner might specifically ask someone else, like, can you get the bowl from over there, or can you turn on the oven. There is often a lot that can be communicated through verbal information, but nonverbal information as well, like finger points. The ways that people coordinate are very complex and very rich, and we use these strategies effectively even when the task is relatively less structured. Now that being said, we also have a whole lifetime of experience in cooking. So if I get out the eggs and I’ve been cracking them, I guess you would know that maybe I’m going to want to beat those eggs in a moment. So more data, I think, is a part of the recipe to making robots able to help us in these less structured tasks, because we are obviously using a whole lifetime of experience in doing this, but the communication has to play an important part as well.
David Kruse: Okay, that makes sense. Yeah, I mean with your robots, in theory, if they are all networked, that should keep making them smarter, especially if they are doing the same task in the same workspace.
Julie Shah: Yeah, absolutely.
David Kruse: But is that part of the design? I assume that’s something you may have baked in.
Julie Shah: I think that’s coming. So there is a field called cloud robotics, where the idea is that you can train a robot in one area and it can sort of upload the knowledge that it’s learning, and then other robots in other areas have those skills and those capabilities without another user training each robot specifically. So I think that is the future for sure.
David Kruse: Okay, and we are almost out of time here, but I have a couple of questions that, to me, are kind of the same question. Where do you want your research to end up in five years? Like if you had your vision realized right now, would it be heavily in manufacturing settings and disaster response? What’s kind of your vision for your research now?
Julie Shah: Yeah, so the vision is: even today we don’t have many robots around, and when we do have robots around they are either teleoperated or they are specifically commanded step by step, and it’s been hard, because by doing that you basically have to change everything about the way people are doing their work or living their lives to accommodate that robot. So the vision for what we are trying to do in the lab is to make robots smart enough to watch us and understand how we work, how we communicate and how we execute plans, and make them smart enough to figure out on their own how they can just plug in or integrate seamlessly. And so we’ve deployed robots in real factories, which is exciting. I’m very interested in these emergency response applications. These sort of time-critical, safety-critical applications are actually where we are pushing the technology forward, because the technology really has to be basically perfect to add value in these areas. In the kitchen there is less to lose; if the robot sometimes gets the wrong thing, you know, it’s okay there. But what we are betting is that effective teamwork will translate: if you can get it right when it really counts, it’s going to translate into better robot team partners in the home as well.
David Kruse: Interesting. Well, that’s a great vision, I’m excited for it. So keep working! And for the audience, what type of robots do you typically work with on these projects?
Julie Shah: So we work with a set of industrial robots; those are for the factories. And we have a next-generation bomb disposal robot. It looks more humanoid, it’s got two arms, it’s on a mobile base, and it’s meant to dismantle explosive devices. And then we have a PR2 robot, which doesn’t look industrial; it’s more of a research robot, but it’s geared more for working alongside people and doing more helpful tasks. So we try out the techniques on various types of robots.
David Kruse: Interesting, okay. Well unfortunately I think that just about does it. This has been a great interview on robots. I really appreciate your time Julie coming on our show.
Julie Shah: Well, thank you for having me.
David Kruse: Yeah, definitely. I’m excited for the vision you are creating. It must be exciting working right on the cutting edge of what…
Julie Shah: It is very exciting. I’m biased, but I think so, yeah.
David Kruse: Well, I mean, the internet changed a lot of things, but robots actually interact physically with the world. So it’s a whole other level of how it could change our lives in some ways, so it’s exciting. So yeah, thanks again for coming on the show.
Julie Shah: Thank you.
David Kruse: And thanks everyone for listening to another episode of Flyover Labs and we’ll see you next time. Bye everyone.