Difficulties in obtaining good results with the ORB-SLAM2 algorithm in MATLAB.

I am interested in running the ORB-SLAM2 algorithm that is presented in this example, but with a different set of images and camera parameters as input. I have already set up the simulation and got it to work, but the optimized trajectory is way off compared with the one obtained from the GPS data. I ran multiple tests in different scenarios, and the most relevant ones are listed below:
  1. Example from the link above, runs as expected;
  2. In the US City Block virtual environment with Unreal Engine, I captured the video frames from this other example: https://it.mathworks.com/help/vision/ug/stereo-visual-slam-for-uav-navigation-in-3d-simulation.html, and used them as input. After setting up the parameters as in this second example, the results obtained are good;
  3. KITTI dataset. The script runs, but the results are not satisfactory;
  4. In the US City Block virtual environment with Unreal Engine, a trajectory drawn by me using Select Waypoints for Unreal Engine Simulation (https://it.mathworks.com/help/driving/ug/select-waypoints-for-3d-simulation.html). Here I managed to obtain good results even after changing the camera intrinsics so that they were the same as those used for KITTI;
  5. Another outdoor, urban virtual environment downloaded from the Unreal Engine Marketplace, same camera parameters as the previous case, but no good results.
Some useful information about the tests:
  • I created a new bag of features for each environment, as described in the example;
  • The initial pose has been updated with the correct values;
  • Camera intrinsics, baseline, and all the other parameters related to the stereo configuration should have been updated correctly;
  • The loop closure is obtained;
  • Some parameters of the script have been modified to make it work with the new set of images and cameras: numSkipFrames has been reduced, since the KITTI dataset is acquired at only 10 Hz, and numPoints has been increased, since the resolution is higher than that of the example's images;
  • The length of the paths is similar in all cases, about 700-800 meters.
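To make the frame-rate reasoning above concrete, here is a rough sketch (in Python, for illustration only) of how much time and distance separate two consecutive processed frames. It assumes a simplified model in which exactly one frame out of every numSkipFrames is used, which may not match the key-frame rule of the example, and the 10 m/s vehicle speed is a made-up value, not taken from KITTI.

```python
def frame_gap_seconds(num_skip_frames, frame_rate_hz):
    """Time between two processed frames under a fixed-stride assumption."""
    return num_skip_frames / frame_rate_hz


def travel_between_frames(num_skip_frames, frame_rate_hz, speed_mps):
    """Distance covered between two processed frames at constant speed."""
    return speed_mps * frame_gap_seconds(num_skip_frames, frame_rate_hz)


if __name__ == "__main__":
    frame_rate_hz = 10.0  # KITTI acquisition rate mentioned above
    speed_mps = 10.0      # hypothetical vehicle speed, for illustration only
    for num_skip_frames in (2, 3, 4):
        gap = frame_gap_seconds(num_skip_frames, frame_rate_hz)
        dist = travel_between_frames(num_skip_frames, frame_rate_hz, speed_mps)
        print(f"numSkipFrames={num_skip_frames}: {gap:.2f} s, {dist:.1f} m")
```

At a low frame rate, each extra skipped frame adds a comparatively large baseline between processed frames, which is one reason a value tuned for a faster camera may need to be reduced.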
I can't figure out what I'm doing wrong and why in a couple of cases it works well while in the others there is no way to get a good trajectory.
The image below shows what I obtain from a KITTI sequence.
I have made many runs, and it would seem that for even very small changes in some parameters the results can vary significantly. For example, numSkipFrames = 3 might work well, but = 2 or = 4 has detrimental consequences. Is this supposed to happen? In any case, the results that I have obtained are no better than what is shown here (from the Map Points it is also clear that the optimized trajectory is tilted out of the XY plane).
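One way to put a number on the mismatch with GPS and on the tilt is to compute the position RMSE together with the RMS error along Z alone. Below is a minimal Python sketch, assuming both trajectories are already expressed in the same coordinate frame, sampled at the same key frames, and given as lists of (x, y, z) tuples. The function names are hypothetical and are not part of the MATLAB example.

```python
import math


def position_rmse(estimated, ground_truth):
    """Root-mean-square of the 3-D position error over matching samples."""
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    squared = [
        sum((e - g) ** 2 for e, g in zip(p, q))
        for p, q in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared) / len(squared))


def vertical_rmse(estimated, ground_truth):
    """Root-mean-square of the Z error only; large values suggest a tilt."""
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    squared = [(p[2] - q[2]) ** 2 for p, q in zip(estimated, ground_truth)]
    return math.sqrt(sum(squared) / len(squared))
```

If the vertical RMSE accounts for most of the total RMSE on a roughly flat route, the error is dominated by the trajectory leaving the XY plane rather than by drift within it.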
I am fairly new to the world of SLAM, and I hope I have stated my problem correctly. I would be extremely grateful to anyone who can give me some tips to get good results even with KITTI or other image sets. I am, of course, available to provide any additional information that might be helpful.
Thank you

Accepted Answer

Qu Cao on 9 Dec 2022
Thank you for posting the question.
In general, tuning the hyperparameters for a visual SLAM system can be hard and requires a lot of heuristics. You have identified the key hyperparameters, such as numSkipFrames and numPoints, and are moving in the right direction.
The behavior you saw with different numSkipFrames values might be attributed to some specific frames in the dataset that cause weak tracking (the number of matched points is too small). When you set numSkipFrames=3, the system happened to skip those frames. This is only my guess, and I totally agree with you that the visual SLAM system should be more robust and the results should be more predictable. We are actively working on this and you should be able to see the improvement soon.
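The guess above can be illustrated with a toy model. The Python sketch below is an illustration only: it assumes that exactly every numSkipFrames-th frame is processed, whereas the actual example decides on key frames using more than a fixed stride, and the match counts and threshold are invented.

```python
def processed_frames(num_frames, num_skip_frames):
    """Frame indices visited when every num_skip_frames-th frame is used."""
    return list(range(0, num_frames, num_skip_frames))


def weak_frames_hit(match_counts, num_skip_frames, min_matches):
    """Processed frames whose match count falls below min_matches."""
    return [
        i
        for i in processed_frames(len(match_counts), num_skip_frames)
        if match_counts[i] < min_matches
    ]


# Invented data: 20 frames with plenty of matches, except frame 8.
match_counts = [150] * 20
match_counts[8] = 12

if __name__ == "__main__":
    for num_skip_frames in (2, 3, 4):
        hit = weak_frames_hit(match_counts, num_skip_frames, min_matches=20)
        print(f"numSkipFrames={num_skip_frames}: weak frames processed = {hit}")
```

In this toy data set, strides of 2 and 4 both land on the weak frame 8, while a stride of 3 steps over it, which mirrors the sensitivity described in the question.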
Regarding the tips on tuning the SLAM system, you may find this post helpful. There is a script with the best parameters I could find for the above KITTI dataset.
  1 Comment
Davide S. on 11 Dec 2022
Thank you Qu for your answer.
I am glad to hear that I'm probably not making significant mistakes. After many trials and hyperparameter changes, I managed to obtain errors somewhat comparable with those from the example, so I am satisfied with the results now. In particular, two changes seem to have made an important difference: increasing the number of iterations in bundleAdjustment when refining the local key frames and map points, and splitting minNumMatches into two separate parameters, one for helperCreateNewMapPointsStereo and the other for optimizePoses, with the second increased to 100.
The behavior with numSkipFrames makes perfect sense now that you point it out. Thank you for your explanation.
I wish you a good continuation of your work and happy holidays!
