What Is SLAM?

3 things you need to know

SLAM (simultaneous localization and mapping) is a method used for autonomous vehicles that lets you build a map and localize your vehicle in that map at the same time. SLAM algorithms allow the vehicle to map out unknown environments. Engineers use the map information to carry out tasks such as path planning and obstacle avoidance.

Why SLAM Matters

SLAM has been the subject of technical research for many years. But with vast improvements in computer processing speed and the availability of low-cost sensors such as cameras and laser range finders, SLAM is now used for practical applications in a growing number of fields.

To understand why SLAM is important, let's look at some of its benefits and application examples.

SLAM Examples

Consider a home robot vacuum. Without SLAM, it will just move randomly within a room and may not be able to clean the entire floor surface. In addition, this approach uses excessive power, so the battery will run out more quickly. On the other hand, robots with SLAM can use information such as the number of wheel revolutions and data from cameras and other imaging sensors to determine the amount of movement needed. This is called localization. The robot can also simultaneously use the camera and other sensors to create a map of the obstacles in its surroundings and avoid cleaning the same area twice. This is called mapping.
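
The localization step described above can be sketched as dead reckoning from wheel revolutions. This is a minimal illustration that assumes a differential-drive robot; the function name integrate_odometry, the wheel radius, the track width, and the encoder readings are all hypothetical values chosen for the example, not taken from any particular product.

```python
import math

def integrate_odometry(pose, left_revs, right_revs, wheel_radius, track_width):
    """Dead-reckon a differential-drive pose (x, y, heading) from wheel revolutions."""
    x, y, theta = pose
    # Distance each wheel rolled along the floor.
    d_left = 2.0 * math.pi * wheel_radius * left_revs
    d_right = 2.0 * math.pi * wheel_radius * right_revs
    d_center = (d_left + d_right) / 2.0          # forward travel of the robot centre
    d_theta = (d_right - d_left) / track_width   # change in heading, in radians
    # Midpoint approximation: move along the average heading of the step.
    x += d_center * math.cos(theta + d_theta / 2.0)
    y += d_center * math.sin(theta + d_theta / 2.0)
    # Wrap the heading to the interval [-pi, pi).
    theta = (theta + d_theta + math.pi) % (2.0 * math.pi) - math.pi
    return (x, y, theta)

pose = (0.0, 0.0, 0.0)
# Hypothetical encoder readings: (left revolutions, right revolutions) per step.
for left_revs, right_revs in [(1.0, 1.0), (1.0, 1.0), (0.5, 0.6)]:
    pose = integrate_odometry(pose, left_revs, right_revs,
                              wheel_radius=0.035, track_width=0.23)
print(pose)
```

Because every step adds a small error, an estimate built this way drifts over time, which is why it is usually combined with camera or lidar data.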

SLAM is useful in many other applications such as navigating a fleet of mobile robots to arrange shelves in a warehouse, parking a self-driving car in an empty spot, or delivering a package by navigating a drone in an unknown environment. MATLAB and Simulink provide SLAM algorithms, functions, and analysis tools to develop various applications. You can implement simultaneous localization and mapping along with other tasks such as sensor fusion, object tracking, path planning and path following.

How SLAM Works

Broadly speaking, there are two types of technology components used to achieve SLAM. The first type is sensor signal processing, including the front-end processing, which is largely dependent on the sensors used. The second type is pose-graph optimization, including the back-end processing, which is sensor-agnostic.

To learn more about the front-end processing component, let’s take a look at visual SLAM and lidar SLAM – two different methods of SLAM.

Visual SLAM

As the name suggests, visual SLAM (or vSLAM) uses images acquired from cameras and other image sensors. Visual SLAM can use simple cameras (wide angle, fish-eye, and spherical cameras), compound eye cameras (stereo and multi cameras), and RGB-D cameras (depth and ToF cameras).

Visual SLAM can be implemented at low cost with relatively inexpensive cameras. In addition, since cameras provide a large volume of information, they can be used to detect landmarks (previously measured positions). Landmark detection can also be combined with graph-based optimization, achieving flexibility in SLAM implementation.

Monocular SLAM is a form of vSLAM that uses a single camera as the only sensor, which makes it challenging to determine depth. This can be solved either by detecting AR markers, checkerboards, or other known objects in the image for localization or by fusing the camera information with another sensor such as an inertial measurement unit (IMU), which measures physical quantities such as acceleration and angular velocity, from which velocity and orientation can be estimated. Technology related to vSLAM includes structure from motion (SfM), visual odometry, and bundle adjustment.

Visual SLAM algorithms can be broadly classified into two categories. Sparse methods match feature points of images and use algorithms such as PTAM and ORB-SLAM. Dense methods use the overall brightness of images and use algorithms such as DTAM, LSD-SLAM, DSO, and SVO.
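
To make the idea of matching feature points concrete, here is a minimal sketch of brute-force matching between binary descriptors, the kind of descriptor ORB features use. It is an illustration only, not how PTAM or ORB-SLAM is implemented: the function names, the 8-bit toy descriptors, and the thresholds max_distance and ratio are assumptions made for the example.

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")

def match_descriptors(desc_a, desc_b, max_distance=2, ratio=0.8):
    """Match each descriptor in desc_a to its nearest neighbour in desc_b.

    A match is kept only if it is close enough and clearly better than the
    second-best candidate (ratio test), which filters out ambiguous matches.
    """
    if not desc_b:
        return []
    matches = []
    for i, da in enumerate(desc_a):
        ranked = sorted((hamming(da, db), j) for j, db in enumerate(desc_b))
        best_dist, best_j = ranked[0]
        second_dist = ranked[1][0] if len(ranked) > 1 else float("inf")
        if best_dist <= max_distance and best_dist < ratio * second_dist:
            matches.append((i, best_j, best_dist))
    return matches

# Toy 8-bit descriptors for two frames (real ORB descriptors are 256 bits long).
frame1 = [0b10110010, 0b01001101, 0b11110000]
frame2 = [0b01001111, 0b10110011, 0b00001111]

# Each tuple is (index in frame1, index in frame2, Hamming distance).
print(match_descriptors(frame1, frame2))   # [(0, 1, 1), (1, 0, 1)]
```

The third descriptor in frame1 has no close counterpart in frame2, so it is left unmatched rather than forced onto a poor candidate.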

Structure from motion.

Point cloud registration for RGB-D SLAM

LiDAR SLAM

Light detection and ranging (lidar) is a method that primarily uses a laser sensor (or distance sensor).

Compared to cameras, ToF, and other sensors, lasers are significantly more precise, and are used for applications with high-speed moving vehicles such as self-driving cars and drones. The output values from laser sensors are generally 2D (x, y) or 3D (x, y, z) point cloud data. The laser sensor point cloud provides high-precision distance measurements, and works very effectively for map construction with SLAM. Generally, movement is estimated sequentially by matching the point clouds. The calculated movement (traveled distance) is used for localizing the vehicle. For lidar point cloud matching, iterative closest point (ICP) and normal distributions transform (NDT) algorithms are used. 2D or 3D point cloud maps can be represented as a grid map or voxel map.
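
The scan-matching step can be illustrated with a bare-bones 2D iterative closest point (ICP) loop. This is a sketch under simplifying assumptions: brute-force nearest-neighbor search, no outlier rejection, and a small motion between scans. The function names and the synthetic L-shaped wall are hypothetical and chosen only for the example.

```python
import math

def apply_transform(points, theta, tx, ty):
    """Rotate 2D points by theta (radians), then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def best_rigid_transform(src, dst):
    """Least-squares rotation and translation mapping paired points src onto dst."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        px, py, qx, qy = px - sx, py - sy, qx - dx, qy - dy
        dot += px * qx + py * qy
        cross += px * qy - py * qx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)

def icp(source, target, iterations=20):
    """Estimate the rigid transform (theta, tx, ty) that aligns source with target."""
    theta_total, tx_total, ty_total = 0.0, 0.0, 0.0
    current = list(source)
    for _ in range(iterations):
        # 1. Pair every source point with its closest target point.
        paired = [min(target, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
                  for p in current]
        # 2. Solve for the rigid motion that best aligns the pairs.
        theta, tx, ty = best_rigid_transform(current, paired)
        current = apply_transform(current, theta, tx, ty)
        # 3. Compose the incremental motion with the running estimate.
        c, s = math.cos(theta), math.sin(theta)
        tx_total, ty_total = (c * tx_total - s * ty_total + tx,
                              s * tx_total + c * ty_total + ty)
        theta_total += theta
    return theta_total, tx_total, ty_total

# Hypothetical L-shaped wall sampled every 0.5 m, seen in two consecutive scans.
scan_a = [(0.5 * i, 0.0) for i in range(5)] + [(0.0, 0.5 * j) for j in range(1, 4)]
scan_b = apply_transform(scan_a, 0.05, 0.10, -0.05)
theta, tx, ty = icp(scan_a, scan_b)
print(round(theta, 3), round(tx, 3), round(ty, 3))
```

The example recovers the motion because the shift between scans is small compared with the point spacing. With larger motions or sparse features, the nearest-neighbor pairing can lock onto the wrong points, which is one reason an initial guess from odometry is often used.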

On the other hand, point clouds are not as finely detailed as images in terms of density and do not always provide sufficient features for matching. For example, in places where there are few obstacles, it is difficult to align the point clouds and this may result in losing track of the vehicle location. In addition, point cloud matching generally requires high processing power, so it is necessary to optimize the processes to improve speed. Due to these challenges, localization for autonomous vehicles may involve fusing other measurement results such as wheel odometry, global navigation satellite system (GNSS), and IMU data. For applications such as warehouse robots, 2D lidar SLAM is commonly used, whereas SLAM using 3D lidar point clouds can be used for UAVs and automated parking.

SLAM with 2D LiDAR

SLAM with 3D LiDAR

Common Challenges with SLAM

Although SLAM is used for some practical applications, several technical challenges prevent more general-purpose adoption. Each has a countermeasure that can help overcome the obstacle.

1. Localization errors accumulate, causing substantial deviation from actual values

SLAM estimates sequential movement, and each estimate includes some margin of error. The error accumulates over time, causing substantial deviation from actual values. It can also cause map data to collapse or distort, making subsequent searches difficult. Let’s take an example of driving around a square-shaped passage. As the error accumulates, the robot’s starting and ending points no longer match up. This is called a loop closure problem. Pose estimation errors like these are unavoidable. It is important to detect loop closure and determine how to correct or cancel out the accumulated error.

Example of constructing a pose graph and minimizing errors.

One countermeasure is to remember some characteristics from a previously visited place as a landmark and minimize the localization error. Pose graphs are constructed to help correct the errors. By solving error minimization as an optimization problem, more accurate map data can be generated. This kind of optimization is called bundle adjustment in visual SLAM.
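
The square-passage example can be turned into a tiny pose graph to show how a loop closure pulls the drifted trajectory back together. This sketch makes strong simplifying assumptions: poses are 2D positions without heading, every constraint has equal weight, and the least-squares problem is solved with plain Gauss-Seidel sweeps. The function names and the odometry values are hypothetical.

```python
def dead_reckon(odometry):
    """Chain odometry displacements into positions, starting at the origin."""
    poses = [(0.0, 0.0)]
    for dx, dy in odometry:
        x, y = poses[-1]
        poses.append((x + dx, y + dy))
    return poses

def optimize_pose_graph(num_poses, edges, iterations=500):
    """Equal-weight least squares over 2D positions; pose 0 is fixed at the origin.

    Each edge (i, j, (dx, dy)) states that pose j should sit at pose i + (dx, dy).
    Every sweep moves each pose to the average of what its edges predict.
    """
    poses = [(0.0, 0.0)] * num_poses
    for _ in range(iterations):
        for k in range(1, num_poses):          # pose 0 anchors the map
            sum_x = sum_y = 0.0
            count = 0
            for i, j, (dx, dy) in edges:
                if j == k:                     # edge predicts pose k from pose i
                    sum_x += poses[i][0] + dx
                    sum_y += poses[i][1] + dy
                    count += 1
                elif i == k:                   # edge predicts pose k from pose j
                    sum_x += poses[j][0] - dx
                    sum_y += poses[j][1] - dy
                    count += 1
            if count:
                poses[k] = (sum_x / count, sum_y / count)
    return poses

# Square loop, two steps per side. The outbound steps over-read by 10%,
# so dead reckoning ends about 0.2 m away from the start on both axes.
odometry = [(1.1, 0.0), (1.1, 0.0), (0.0, 1.1), (0.0, 1.1),
            (-1.0, 0.0), (-1.0, 0.0), (0.0, -1.0), (0.0, -1.0)]
drifted = dead_reckon(odometry)

edges = [(i, i + 1, step) for i, step in enumerate(odometry)]
edges.append((8, 0, (0.0, 0.0)))   # loop closure: pose 8 was recognized as the start

optimized = optimize_pose_graph(len(drifted), edges)
print("end pose, dead reckoning:", drifted[-1])
print("end pose, optimized:     ", optimized[-1])
```

With equal weights, the accumulated error is spread evenly over the nine constraints, so the optimized end pose sits much closer to the start than the dead-reckoned one. Practical back ends also optimize orientation and weight each constraint by its uncertainty.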

2. Localization fails and the position on the map is lost

Image and point-cloud mapping does not consider the characteristics of a robot’s movement. In some cases, this approach can generate discontinuous position estimates. For example, a calculation result might show that a robot moving at 1 m/s suddenly jumped forward by 10 meters. This kind of localization failure can be prevented either by using a recovery algorithm or by fusing the motion model with multiple sensors to make calculations based on the sensor data.

There are several methods for using a motion model with sensor fusion. A common method is using Kalman filtering for localization. Since most differential drive robots and four-wheeled vehicles generally use nonlinear motion models, extended Kalman filters and particle filters (Monte Carlo localization) are often used. More flexible Bayes filters such as unscented Kalman filters can also be used in some cases. Some commonly used sensors are inertial measurement devices such as inertial measurement units (IMUs), attitude and heading reference systems (AHRS), inertial navigation systems (INS), accelerometers, gyroscopes, and magnetic sensors. Wheel encoders attached to the vehicle are often used for odometry.
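
The predict-and-update cycle behind these filters can be shown with a linear, one-dimensional Kalman filter. This is the simplest member of the family, not the extended or particle filters mentioned above, and it assumes a robot moving along a line with odometry as the motion model and a noisy position measurement. The function name, the noise variances, and the measurements are hypothetical.

```python
def kalman_step(mean, variance, motion, motion_var, measurement, measurement_var):
    """One predict/update cycle of a 1D Kalman filter for position along a line."""
    # Predict: apply the motion model (odometry) and grow the uncertainty.
    mean = mean + motion
    variance = variance + motion_var
    # Update: blend in the measurement, weighted by the Kalman gain.
    gain = variance / (variance + measurement_var)
    mean = mean + gain * (measurement - mean)
    variance = (1.0 - gain) * variance
    return mean, variance

# Robot commanded to move 1 m per step; position measurements are noisy.
mean, variance = 0.0, 1.0
for measured_position in [1.1, 1.9, 3.2, 4.0]:
    mean, variance = kalman_step(mean, variance,
                                 motion=1.0, motion_var=0.04,
                                 measurement=measured_position,
                                 measurement_var=0.25)
    print(f"estimate = {mean:.3f} m, variance = {variance:.3f}")
```

Because the prediction carries the motion model forward, a single implausible measurement moves the estimate only in proportion to the gain instead of replacing it outright.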

When localization fails, one way to recover is to remember a landmark as a key-frame from a previously visited place. When searching for a landmark, a feature extraction process is applied in a way that allows it to scan at high speed. Some methods based on image features include bag of features (BoF) and bag of visual words (BoVW). More recently, deep learning has been used to compare distances between features.
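
A bag-of-visual-words lookup can be sketched in a few lines. The example assumes the image features of each key-frame have already been quantized into visual-word IDs from a fixed vocabulary; the key-frame names, the word IDs, and the vocabulary size are made up for illustration.

```python
import math
from collections import Counter

def bow_histogram(word_ids, vocabulary_size):
    """L2-normalized histogram of visual-word occurrences in one key-frame."""
    counts = Counter(word_ids)
    hist = [float(counts.get(w, 0)) for w in range(vocabulary_size)]
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist

def cosine_similarity(h1, h2):
    """Cosine similarity of two histograms that are already L2-normalized."""
    return sum(a * b for a, b in zip(h1, h2))

def best_keyframe(query, keyframes):
    """Return (name, score) of the stored key-frame most similar to the query."""
    scored = ((name, cosine_similarity(query, hist)) for name, hist in keyframes.items())
    return max(scored, key=lambda item: item[1])

VOCABULARY_SIZE = 6
# Hypothetical visual-word IDs observed in three previously visited places.
keyframes = {
    "hallway": bow_histogram([0, 0, 1, 2, 2, 2], VOCABULARY_SIZE),
    "kitchen": bow_histogram([3, 3, 4, 5, 5], VOCABULARY_SIZE),
    "doorway": bow_histogram([0, 1, 1, 4], VOCABULARY_SIZE),
}

# Words seen in the current image after tracking was lost.
query = bow_histogram([0, 0, 1, 2, 2], VOCABULARY_SIZE)
name, score = best_keyframe(query, keyframes)
print(name, round(score, 2))   # hallway 0.98
```

Comparing compact histograms instead of raw images is what keeps the search fast enough to run while the robot is moving.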

3. High computational cost for image processing, point cloud processing, and optimization

Computing cost is a problem when implementing SLAM on vehicle hardware. Computation is usually performed on compact and low-energy embedded microprocessors that have limited processing power. To achieve accurate localization, it is essential to execute image processing and point cloud matching at high frequency. In addition, optimization calculations such as loop closure are computationally expensive. The challenge is how to execute such computationally expensive processing on embedded microcomputers.

One countermeasure is to run different processes in parallel. Processes such as feature extraction, which is a preprocessing step for matching, are relatively well suited to parallelization. Using multicore CPUs, single instruction multiple data (SIMD) calculation, and embedded GPUs can further improve speeds in some cases. Also, since pose graph optimization can be performed over a relatively long cycle, lowering its priority and carrying it out at regular intervals can also improve performance.
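
As a rough sketch of why feature extraction parallelizes well, the example below splits an image into tiles and processes each tile independently. The detector is a toy (interior pixels brighter than their four neighbors), and the tile data and function names are hypothetical. Note that in CPython, threads running pure Python code are limited by the global interpreter lock, so a real speedup would typically come from native image-processing routines or separate processes; the point here is only the structure of the split.

```python
from concurrent.futures import ThreadPoolExecutor

def extract_features(tile):
    """Toy detector: (row, col) of interior pixels brighter than their 4 neighbors."""
    features = []
    for r in range(1, len(tile) - 1):
        for c in range(1, len(tile[r]) - 1):
            v = tile[r][c]
            if (v > tile[r - 1][c] and v > tile[r + 1][c]
                    and v > tile[r][c - 1] and v > tile[r][c + 1]):
                features.append((r, c))
    return features

def extract_parallel(tiles, workers=4):
    """Run the detector on every tile concurrently; results keep the tile order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(extract_features, tiles))

# Synthetic 8x8 tiles standing in for regions of a camera image.
tiles = [[[(r * 7 + c * 13 + k * 5) % 11 for c in range(8)] for r in range(8)]
         for k in range(4)]

features_per_tile = extract_parallel(tiles)
for index, features in enumerate(features_per_tile):
    print(f"tile {index}: {len(features)} features")
```

Each tile is processed without reference to the others, so the work can be distributed across cores with no coordination beyond collecting the results.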

SLAM with MATLAB

MATLAB® provides capabilities for implementing SLAM applications for your target system and addressing many of the countermeasures to known technical challenges with SLAM.

  1. Sensor signal and image processing for SLAM front end
  2. 2D / 3D pose graphs for SLAM back end
  3. Occupancy grids with SLAM Map Builder app
    • Import lidar data from MATLAB workspace or rosbag files and create occupancy grids
    • Find and modify loop closures, and export the map as an occupancy grid for path planning
  4. Use output map from SLAM algorithms for path planning and controls
  5. Speed up computationally intensive processes such as those related to image processing by running them in parallel using Parallel Computing Toolbox™
  6. Deploy standalone ROS nodes and communicate with your ROS-enabled robot from MATLAB and Simulink® using ROS Toolbox
  7. Deploy your image processing and navigation algorithms developed in MATLAB and Simulink on embedded microprocessors using MATLAB Coder™ and GPU Coder™

Learn More About SLAM

Develop a map of an environment and localize the pose of a robot or a self-driving car for autonomous navigation using Navigation Toolbox.
This example uses pose graph optimization with a collected series of 2D lidar scans to implement SLAM. The 2D lidar scan data is used to construct an environment map and estimate the robot's position and trajectory.
This example uses IMU readings to process 3D lidar data from vehicle-mounted sensors and construct a map, then compares the estimated vehicle trajectory with global positioning system (GPS) records.
Structure from motion (SfM) is a method of determining a 3D scene from a sequence of 2D images. In this example, the position of the calibrated camera is determined from the view sequence, and the 3D scene is reconstructed.
Visual odometry is the process of estimating camera position and orientation by analyzing image sequences. This example shows how to trace the path of a single calibrated camera from a series of images.
This example shows how to process image data from a monocular camera to build a map of an indoor environment and estimate the trajectory of the camera. The example uses ORB-SLAM, which is a feature-based vSLAM algorithm.
This example demonstrates an application of the Monte Carlo Localization (MCL) algorithm on TurtleBot® in a simulated Gazebo® environment.