Vision-Guided Self-Localization for Autonomous Cars
Veera Ganesh Yalla, NIO
For autonomous driving, we need to be able to localize the vehicle on the map very precisely. Additionally, in urban areas with high-rise buildings, autonomous cars face problems with poor GPS signal reception. Using various MATLAB® toolboxes, NIO was able to demonstrate a proof-of-concept algorithm in a short period of time. In his talk, Veera Ganesh Yalla discusses the company's progress in self-localization of a vehicle for autonomous driving.
Published: 16 Mar 2018
Today, I'll be presenting our progress on the topic of vision-guided self-localization for autonomous cars. This is an ongoing project at NIO, and the results we share with you today are from R&D. The major part of the work was done by Satya during his 2017 summer internship at NIO, and he is continuing his master's research at NIO under my guidance.
Actually, Satya is here, so if you can quickly stand up and show yourself. Satya is from Worcester Polytechnic Institute, and he is continuing his research at NIO right now. The other author is Daviday, my colleague at NIO, who leads our simulation and mapping efforts.
Before we go into the specifics of this particular project, let me introduce the company, NIO, a little bit, and then I'll highlight a few projects that use MATLAB toolboxes. We picked three projects for this talk. The first project is the EP9, our supercar that drove autonomously at the Circuit of the Americas and broke a few world records.
For this project, we used the Control Systems Toolbox, MATLAB Coder, and Simulink Coder. The second project I'll briefly touch upon is camera calibration. For this, we use the Computer Vision System Toolbox. For the third project, which is the main topic of today's presentation, we are currently using the Image Processing Toolbox and the Computer Vision System Toolbox.
I'll provide my conclusions at the end of this talk. About NIO: NIO is a global startup, now growing beyond startup mode, with more than 3,000 employees globally. The two largest entities are NIO US, which is based in San Jose, and NIO China, which is based in Shanghai.
Our mission in the US is to change driving by building autonomous vehicles and experiences with our Car 3.0 technology stack. We recently showcased our first production vehicle, the ES8, targeted for the China market, at the Shanghai Auto Show. The ES8 will be released to customers in China towards the end of this year.
The NIO China team is focused on electric vehicles and the associated membership services for those vehicles. Additionally, we have a performance team in the UK which participates in Formula E and develops ultra-high-performance vehicles, specifically our NIO EP9 supercar. We have set production vehicle world records with the EP9, including becoming the fastest electric vehicle in the world.
Munich is where the car design is done. There are over 100 people working on that program, with Munich supporting both NIO US and China. Although we are a relatively young company, less than three years in existence, we have made significant progress to date, and I'll highlight a few examples.
Especially on the US side, we have defined what's called a Car 3.0 full-stack architecture. We have received autonomous vehicle testing permits, and we are actually testing our vehicles on the road for L2 and L4 features today as we speak. We set the world's fastest autonomous lap.
We also unveiled the NIO EVE, our autonomous vision car for L4, targeting a mid-2020 launch. On the China side, we won the first Formula E championship, again a joint collaboration, and the NIO EP9 is the world's fastest EV. We have many manufacturing partnerships in China for the NIO ES8 SUV, which, like I said, we'll be launching to the China market at the end of this year.
Now let's talk about our EP9 supercar a little bit. The key personnel involved in making the EP9 drive autonomously around the Circuit of the Americas are Aaron Bailey, head of driving dynamics; Dennis Polischuk, head of platform engineering and firmware; and Kamran Turkoglu, head of controls and algorithms.
A little bit more about the EP9. This is a limited-edition, 1-megawatt, 3g-capable (3g meaning not the cellular 3G; it's three g of force) electric supercar born to push limits. This is the supercar that NIO launched in November of last year. We accomplished the fastest autonomous lap on February 23rd, 2017.
We set the autonomous lap record of 2 minutes 40.33 seconds at the Circuit of the Americas, and the top speed we achieved was 160 miles per hour. The EP9 then went on to set a driven lap record with one of our engineers, Aaron Bailey, who I mentioned earlier and who is also a race car driver. That lap record was 2 minutes 11.5 seconds at a top speed of 170 miles per hour.
I'll now show you a quick video that tells you more about this accomplishment.
We embarked on a project to upgrade an existing supercar to incorporate additional technology elements that would allow it to drive without a driver behind the wheel.
We go out and map what we know is the inside edge of the track and the outside edge of the track with GPS, and then we go out and drive a lap so that the car can actually learn and see where the human would drive the car. And from that data, we try to match the actual car's position as closely as possible to what I'm doing.
That's the lap record we set autonomously going around the Circuit of the Americas. In the video, you saw Aaron Bailey, Dennis Polischuk, Kamran Turkoglu, and our leader, Padmasree Warrior, the CEO of NIO US. Like I said, this project was done using mostly MATLAB: we used the Control Systems Toolbox, MATLAB Coder, and Simulink Coder to achieve this goal.
And to be honest, this was achieved in three months. Our team of three engineers spent three months in the UK with our performance engineering team, and we developed the algorithms and applied them to break the record. The second project is camera calibration, which I'll briefly touch upon.
This is some rapid prototyping work we did in the camera calibration space. The motivation is that, for our ES8 vehicle, we are using a trifocal camera for ADAS. The three cameras have three different fields of view, so you cannot use the same calibration method for all three cameras.
We have a narrow field-of-view camera at 28 degrees, a wide field-of-view camera at 150 degrees, and a main camera whose field of view is 52 degrees. The wide field of view in particular is very challenging to calibrate. At this time, I contacted MathWorks, especially my friend Avi, who I've known for many, many years now, and asked him, hey, do you have anything that would expedite my rapid prototyping?
And Avi told me that, as a matter of fact, MathWorks had been experimenting with fisheye calibration technology, and he said he could give me a pre-release version. I said, please give it to me. I really thank Avi and the MathWorks team for providing this early access, which really expedited our rapid prototyping of the calibration. And we could provide some valuable feedback to our vendors based on the calibration results.
The second part of this project, which Satya is actually involved in, is calibrating a stereo camera system for a mapping pilot project. We are working with a few different mapping companies and doing some pilot studies right now. We have a stereo rig on our car, and we are trying to calibrate it.
We want to make it as easy as possible for our engineers, since we keep changing the configuration of the cameras and the sensors, and we find the stereo calibration app to be very, very useful. These are screenshots of the single camera calibration app.
It's a very intuitive app. You create a new session, load all the images, and MATLAB automatically picks the images that are of good quality and rejects the bad ones. You can also set whether you want a two-coefficient or a three-coefficient radial distortion calibration. Really cool: what would take me maybe days, maybe weeks to code up is all done and ready to use.
We find it very useful, especially the graphs that show the reprojection errors and the extrinsics. They really tell us how good our calibration is. I think that in release R2017b, the fisheye model has actually been added into MATLAB.
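To give a feel for what the app is doing, here is a minimal sketch of the equivalent programmatic workflow; the image folder name and the 25 mm square size are assumptions for illustration, not our actual setup:

```matlab
% Detect checkerboard corners in a folder of calibration images
% (folder name and square size are hypothetical).
images = imageDatastore('calib_images');
[imagePoints, boardSize] = detectCheckerboardPoints(images.Files);

% Generate the corresponding world coordinates of the corners.
squareSize = 25; % millimeters, assumed
worldPoints = generateCheckerboardPoints(boardSize, squareSize);

% Estimate intrinsics and extrinsics; choose two or three radial
% distortion coefficients, just like the option in the app.
params = estimateCameraParameters(imagePoints, worldPoints, ...
    'NumRadialDistortionCoefficients', 3);

% The same diagnostics the app shows: reprojection errors and extrinsics.
showReprojectionErrors(params);
figure; showExtrinsics(params);
```

For the R2017b fisheye model I mentioned, `estimateFisheyeParameters` plays the role of `estimateCameraParameters` in the same workflow, with an additional image-size argument.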
The next one is stereo camera calibration, using the Stereo Camera Calibrator app, which we use for our mapping pilot project. I won't go into any specific details of the project at this time, as we are at a very early R&D stage. Perhaps in the near future, once we have results, we can share more information.
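For the stereo case, a similar sketch of the app's programmatic equivalent might look like this; the folder names are hypothetical:

```matlab
% Detect checkerboard corners in synchronized left/right image pairs.
leftImages  = imageDatastore('stereo_left');
rightImages = imageDatastore('stereo_right');
[imagePoints, boardSize] = detectCheckerboardPoints( ...
    leftImages.Files, rightImages.Files);

% World coordinates of the corners (square size assumed).
squareSize  = 25; % millimeters
worldPoints = generateCheckerboardPoints(boardSize, squareSize);

% With image points from two cameras, this returns stereo parameters,
% including the rotation and translation between the cameras.
stereoParams = estimateCameraParameters(imagePoints, worldPoints);
showReprojectionErrors(stereoParams);
```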
Now, coming to the main topic of today's presentation: vision-guided self-localization for autonomous cars. This is one of many approaches NIO is currently evaluating for self-localization of autonomous cars. It started off as a summer internship project for Satya; we found some really good results, and we are continuing down this path.
What is the motivation for this research? One motivation is that GPS is not always available, and the accuracy of GPS in a production car is very limited: the error is greater than 5 meters. But when we want to achieve something like L4 autonomy, we want centimeter-level accuracy for decision making and path planning tasks.
The other problem is poor GPS signal reception. In urban areas and urban canyons, you will not get a reliable GPS signal, nor when you are inside a tunnel. So how do we fix this problem? What can we develop? These are the input signals we look at right now.
First, the last received GPS estimate: where my last position was, based on the GPS coordinates. Currently, we don't have access to HD maps, but we are investigating them with some of our map partners. We have some early proofs of concept, but nothing that I could put into a production stage right now.
So we have the SD maps, and we have the odometry, which is the change in position over time. We are currently integrating the camera and LiDAR, which is work in progress. This is where we use MATLAB extensively for feature extraction and visual feature recognition. I'll go into the details a little bit.
The method is implemented as a particle filter. This is the algorithm in a snapshot: we take the last received GPS estimate, and we also take the SD map data. Like I mentioned, we use particle filtering for this approach, so we initialize particles along the map segments.
I'll show you images, so you don't have to memorize any of this. Given a motion update using an odometry or velocity model, we update the weight of each particle based on its distance to the map segment. Then we resample the particles using stochastic universal sampling, and the mean of all the particles gives the approximate location of the vehicle.
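To make these steps concrete, here is a minimal sketch of one filter iteration in MATLAB. The function and variable names, the noise levels, and the 3-meter weighting sigma are all illustrative assumptions, not our actual parameters:

```matlab
function [particles, estimate] = pfStep(particles, odom, mapPoints)
% One particle filter iteration (hypothetical sketch, saved as pfStep.m).
% particles: M-by-3 array of [x y theta]; mapPoints: N-by-2 densified
% map-segment points; odom: struct with fields dDist and dTheta.
M = size(particles, 1);

% 1) Motion update: apply the odometry increment with additive noise.
particles(:,3) = particles(:,3) + odom.dTheta + 0.02*randn(M,1);
step = odom.dDist + 0.1*randn(M,1);
particles(:,1) = particles(:,1) + step.*cos(particles(:,3));
particles(:,2) = particles(:,2) + step.*sin(particles(:,3));

% 2) Weight update: weight decays with distance to the nearest map point.
dx = particles(:,1) - mapPoints(:,1)';  % M-by-N, implicit expansion
dy = particles(:,2) - mapPoints(:,2)';
dmin = min(hypot(dx, dy), [], 2);
sigma = 3;                              % meters, assumed
w = exp(-0.5*(dmin/sigma).^2);
w = w / sum(w);

% 3) Stochastic universal sampling: M equally spaced pointers into the
% cumulative weight distribution, started from one random offset.
edges = cumsum(w);
edges(end) = 1;                         % guard against round-off
u = rand/M + (0:M-1)'/M;
idx = arrayfun(@(ui) find(edges >= ui, 1, 'first'), u);
particles = particles(idx, :);

% 4) The mean of the particles approximates the vehicle location.
estimate = mean(particles(:,1:2), 1);
end
```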
That's the big picture. The input to this algorithm is the SD map, which looks something like the image on the slide. I don't think you can see it very clearly, but when you look at a close-up of the SD map, it is very sparse: it only has some segment geometry, and the points in each segment are very sparse.
For our application, we need very closely spaced points to generate particles all along the segments. So what do we do? First, we do an SD map augmentation: the number of points along each segment of the map is increased, which gives us denser map information for localization. This is simple interpolation; we interpolate between the points that are available in the SD map.
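For instance, the densification could be done with linear interpolation along the arc length of each segment, something like this sketch (the function and variable names are hypothetical):

```matlab
function densePts = densifySegment(segPts, spacing)
% Resample an N-by-2 [x y] polyline at a fixed spacing in meters.
% Assumes no repeated consecutive points in the polyline.
% Parameterize the polyline by cumulative arc length.
s = [0; cumsum(hypot(diff(segPts(:,1)), diff(segPts(:,2))))];
% Query points every `spacing` meters along the segment.
sq = (0:spacing:s(end))';
densePts = interp1(s, segPts, sq, 'linear');
end
```

Calling `densifySegment(segPts, 1)` would give roughly one point per meter along the segment.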
Next, particle filtering for localization. Once we have the augmented map, we need to sample particles along the segments of the map. The state of each particle is x, y, and theta, the heading. First, we extract a 400-meter-by-500-meter tile. This size is a variable; you can change it. The tile is chosen based on the last received GPS point.
After augmentation, each point along a segment is considered a possible location, because at this point we don't know where we are on the map; that's what we are trying to solve. So we initialize particles along these segments, and the particles are generated with orientations equal to theta and theta plus 180 degrees, covering both directions.
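Initialization along the densified segments might then look like this sketch (again, the names are hypothetical); each map point spawns two particles, one per travel direction:

```matlab
function particles = initParticles(densePts)
% Initialize [x y theta] particles on a densified segment's points.
% Approximate the segment heading at each point from local differences.
theta = atan2(gradient(densePts(:,2)), gradient(densePts(:,1)));
% One particle per point and per direction: theta and theta + 180 deg.
particles = [densePts, theta; ...
             densePts, theta + pi];
end
```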
The image on the right (my right) shows a close-up of the tile, and the image on the left shows the particles initialized on the map segments. Here is a test we do on a map segment where we want to localize: as soon as we enter from one corner and exit, I want my algorithm to localize and say, this is where I am. I'm not taking any information from the GPS at this point; I only have the last received GPS signal.
This is a video. As you can see, the particles start to disperse. The green dot is the ground truth, and the blue dot is the mean of all the particles. As the car drives around the loop, you can see the blue dot converging toward the green dot. There you go. That's the localization, relying only on odometry and particle filtering.
As you have seen, one of the downsides is that this is a very iterative approach. We track these particles over time and localize iteratively based on the odometry, so the solution takes some time to converge to a location.
And the computed location can have ambiguities if the driving path is mostly along straight lines. These are problems we're aware of. So the next question is: how do we solve this? What can we add to this algorithm to make the problem solvable?
The proposed solution is to add visual features to the map nodes. This is where the mapping project comes into play. The idea is to extract visual features from the camera sensors and send them to a map cloud.
The visual features enhance the map data, which can then be used in real-time localization. The localization itself can be done in real time, which means we can do feature extraction and matching online. This reduces the number of possible locations for the particle filter before we converge to a solution.
This is a form of map crowdsourcing. Every car driving along a route can capture these visual features and augment the map in the cloud, and a user's car can then download these features and use them for localization.
What this does is reduce the number of locations where you need to initialize your particles, so the solution converges much faster. This is the idea we are currently looking into. To represent it visually: each map has nodes, as I've shown in the SD map, and the green dots represent the different map segments. We can extract features at each of these node points and store those visual features. Maps, if you think about it, are actually layers, so we can add an additional layer on top and put the visual feature nodes in it.
The yellow dots are the map nodes. For illustration, I'm showing just a few nodes; in reality, each intersection can be a node of its own, and the road segments connect the different nodes in the map. The visual features can be whatever features you like, even a deep learning approach: if you take the activations of the last layer of a YOLO network, that could be a feature vector you embed in the map.
We are currently experimenting with these visual features, and this is where we use MATLAB a lot: the Image Processing Toolbox, the Computer Vision System Toolbox, and the deep learning capabilities. During localization, we extract these features and do feature matching, and this reduces the number of particles we need and the places where we search for a matching location. In places like urban canyons, where you have a number of strong visual features, I think this method will work really well.
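As one example of what that matching could look like with the Computer Vision System Toolbox, here is a SURF-based sketch; the image file names are hypothetical, and SURF is just one possible choice of feature:

```matlab
% Compare the live camera frame against the image stored at a map node.
Inode = rgb2gray(imread('node_image.png')); % feature image at a map node
Ilive = rgb2gray(imread('live_frame.png')); % current camera frame

% Detect and describe SURF features in both images.
ptsNode = detectSURFFeatures(Inode);
ptsLive = detectSURFFeatures(Ilive);
[fNode, vptsNode] = extractFeatures(Inode, ptsNode);
[fLive, vptsLive] = extractFeatures(Ilive, ptsLive);

% Match descriptors; a high match count suggests we are near this node,
% which prunes the candidate locations before the particle filter runs.
pairs = matchFeatures(fNode, fLive, 'Unique', true);
score = size(pairs, 1);
showMatchedFeatures(Inode, Ilive, ...
    vptsNode(pairs(:,1)), vptsLive(pairs(:,2)), 'montage');
```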
These are some of the references we followed to an extent, but I think our method is slightly different in how it augments the map with visual features. In conclusion, in this talk I presented an overview of various projects at NIO that use MATLAB toolboxes. Project one was the EP9 driving autonomously at COTA.
With the help of the Control Systems Toolbox, MATLAB Coder, and Simulink Coder, NIO was able to implement the autonomous driving feature, which is essentially GPS waypoint following, and successfully demo the EP9 at COTA. Like I said, it took three engineers three months to achieve this goal. Our controls and algorithms team uses the Control Systems Toolbox, MATLAB Coder, and Simulink Coder almost daily to develop our L2 and L4 features, which are currently in development.
Project two was camera calibration with the Computer Vision System Toolbox. We were able to rapidly prototype the various calibration algorithms for the ADAS trifocal system we plan to launch in the ES8 vehicle. Project three, which I just talked about, is vision-guided self-localization for autonomous cars. We rely heavily on the Image Processing Toolbox and the Computer Vision System Toolbox for the visual feature recognition we are incorporating to improve the self-localization algorithm.
In addition, we are also using the Automated Driving System Toolbox for some of our R&D projects; maybe in a future talk, I can showcase some of those. That's my presentation. I would like to thank the MathWorks team for providing this wonderful opportunity to share some of our projects using MATLAB toolboxes with you. Thank you so much for listening.