Document Type : Research Paper

Authors

1 Department of Photogrammetry and Remote Sensing, Faculty of Geodesy and Geomatics Engineering, K. N. Toosi University of Technology, Tehran, Iran

2 Assistant Professor, Department of Photogrammetry and Remote Sensing, Faculty of Geodesy and Geomatics Engineering, K. N. Toosi University of Technology, Tehran, Iran

Extended Abstract

Introduction

Because processing every frame for positioning and mapping is computationally expensive, visual odometry (VO) and visual simultaneous localization and mapping (VSLAM) algorithms use key-frame selection methods to improve performance and reduce the number of frames that must be processed while maintaining accuracy and robustness. The key-frames selected by these methods serve as a compact, representative subset of all available frames. Current key-frame selection methods rely on heuristic thresholds in their selection procedure, whose optimum values researchers have tuned by trial and error over several datasets. Consequently, a proposed method may not work as expected on a new dataset when the sensor, environment, or platform changes.


Materials & Method

The present study proposes an improved geometric and photogrammetric key-frame selection method built upon ORB-SLAM3, a state-of-the-art visual SLAM algorithm. The proposed Photogrammetric Key-frame Selection (PKS) algorithm replaces inflexible heuristic thresholds with photogrammetric principles, thereby safeguarding the robustness of the algorithm and the quality of the point cloud obtained from the key-frames. First, an adaptive threshold limits the allowable number of points whose line-of-sight zone has changed on a four-zone cone built around each point. A larger number of points with a changed line-of-sight zone indicates larger changes and displacements of the frame and, therefore, a greater need for a new key-frame. Next, a 3×3 grid is formed in each frame, and the number of points with a more-than-30-degree change in line-of-sight angle (effective points) is counted in each cell. The Equilibrium of Center Of Gravity (ECOG) criterion then decides whether the distribution of points is appropriate, using the center of gravity of the points inside the frame. An appropriate distribution of effective points within the frame indicates high geometric strength and thus improves the strength of the key-frame network. Finally, because the IMU sensor does not depend on the position of the frames or the camera sensor, it independently triggers a key-frame when significant changes in acceleration occur. The acceleration threshold was set experimentally to 1 m/s²; this value depends entirely on the type of robot, so for ground robots with slower moving speeds it must be re-tuned.
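The selection procedure above can be sketched as follows. This is a minimal illustration of the described checks, not the paper's implementation: the thresholds `change_frac` and `cog_tol` are illustrative stand-ins for the adaptive values, and the ECOG test is approximated as the center of gravity of the 3×3 cell counts staying near the image center.

```python
import numpy as np

def ecog_keyframe_check(points_uv, zone_changed, img_w, img_h,
                        change_frac=0.25, cog_tol=0.2):
    """Sketch of the PKS geometric checks (thresholds are assumptions).

    points_uv    : (N, 2) pixel coordinates of tracked points
    zone_changed : (N,) bool, True where the point's line-of-sight zone
                   changed on its four-zone cone
    """
    n = len(points_uv)
    if n == 0:
        return False

    # Step 1: has a large enough fraction of points changed zone?
    if zone_changed.sum() / n < change_frac:
        return False  # too little motion -> no new key-frame yet

    # Step 2: count the "effective" points in each cell of a 3x3 grid
    eff = points_uv[zone_changed]
    gx = np.clip((eff[:, 0] * 3 / img_w).astype(int), 0, 2)
    gy = np.clip((eff[:, 1] * 3 / img_h).astype(int), 0, 2)
    grid = np.zeros((3, 3), dtype=int)
    np.add.at(grid, (gy, gx), 1)

    # Step 3 (ECOG): center of gravity of the cell counts should stay
    # near the image center, i.e. the effective points are well spread
    cell_cx = (np.arange(3) + 0.5) * img_w / 3
    cell_cy = (np.arange(3) + 0.5) * img_h / 3
    total = grid.sum()
    cog = np.array([(grid.sum(axis=0) * cell_cx).sum() / total,
                    (grid.sum(axis=1) * cell_cy).sum() / total])
    center = np.array([img_w / 2, img_h / 2])
    offset = np.abs(cog - center) / center  # normalized per-axis offset
    return bool(np.all(offset < cog_tol))

def imu_keyframe_trigger(accel, thresh=1.0):
    """Independent IMU trigger: acceleration magnitude above ~1 m/s^2."""
    return bool(np.linalg.norm(accel) > thresh)
```

A frame thus becomes a key-frame either when enough well-distributed effective points are observed or, independently, when the IMU reports a large acceleration.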


Results & Discussion

The present study employed data collected by the European Robotics Challenge (EuRoC) flying robot, which contains synchronized camera and IMU measurements as well as ground-truth data such as the robot trajectory and a point cloud produced by a laser scanner. To evaluate the proposed method, extensive experiments were run on the EuRoC dataset in mono-inertial and stereo-inertial modes. The trajectory of each algorithm was compared with the reference trajectory, and the point clouds formed from the key-frames were compared as well. Beyond these qualitative evaluations, the absolute trajectory error (ATE) obtained from running the PKS and ORB-SLAM3 algorithms 10 times each was compared quantitatively, and an error histogram was used to evaluate the point clouds. The processing time of each algorithm was also measured for every EuRoC sequence. Results indicate that the proposed algorithm improves ORB-SLAM3 accuracy by 18.1% in the stereo-inertial mode and by 20.4% in the mono-inertial mode, producing a more complete and accurate point cloud and thus extracting more detail from the environment. Furthermore, despite the higher density of the point cloud, the error histogram did not change significantly, and fewer errors were observed than with the ORB-SLAM3 algorithm.
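The ATE metric used above is commonly computed by first rigidly aligning the estimated trajectory to the ground truth and then taking the RMSE of the remaining position errors. A minimal sketch, assuming matched position pairs and a standard Kabsch/Umeyama-style alignment without scale (the paper's exact evaluation protocol may differ):

```python
import numpy as np

def absolute_trajectory_error(est, gt):
    """RMSE position error after rigid alignment (no scale).

    est, gt : (N, 3) arrays of time-matched camera positions.
    """
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    # Optimal rotation from the cross-covariance via SVD (Kabsch)
    H = (est - mu_e).T @ (gt - mu_g)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    aligned = (R @ (est - mu_e).T).T + mu_g
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))
```

Because the alignment removes any global rigid offset between the two trajectories, the ATE reflects only the drift and local errors of the estimator itself.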


Conclusion

Findings indicate that the PKS method succeeds in extracting key-frames using photogrammetric and geometric principles. Besides improving the positioning accuracy of the robot, the method produces a much more complete and dense point cloud than the ORB-SLAM3 algorithm. The dependency of the PKS method on environmental conditions and on the type of system used (stereo or mono camera) is also greatly reduced. Future studies can extend this key-frame selection method to fisheye cameras or visual-only systems. Additional geometric conditions, such as a near- and far-point condition and the vertex angle of the triangle formed by a point in the current frame, the camera, and the corresponding point in the last key-frame, can also be added to the key-frame selection method.
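The proposed vertex-angle condition reduces to measuring the angle at one vertex of that triangle. A minimal sketch of the geometry (the triangle's vertices and any angle threshold are left to future work; this only shows the computation):

```python
import numpy as np

def vertex_angle_deg(vertex, a, b):
    """Angle in degrees at `vertex` of the triangle (a, vertex, b),
    e.g. at the camera center between the rays to a point observed in
    the current frame and to its correspondence in the last key-frame."""
    r1 = np.asarray(a, dtype=float) - np.asarray(vertex, dtype=float)
    r2 = np.asarray(b, dtype=float) - np.asarray(vertex, dtype=float)
    c = np.dot(r1, r2) / (np.linalg.norm(r1) * np.linalg.norm(r2))
    # Clip for numerical safety before arccos
    return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))
```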

References

1- Ahmadabadian, A. H., Robson, S., Boehm, J., & Shortis, M. (2013). Image selection in photogrammetric multi-view stereo methods for metric and complete 3D reconstruction. Paper presented at the Videometrics, Range Imaging, and Applications XII; and Automated Visual Inspection.
2- Bloesch, M., Burri, M., Omari, S., Hutter, M., & Siegwart, R. (2017). Iterated extended Kalman filter based visual-inertial odometry using direct photometric feedback. The International Journal of Robotics Research, 36(10), 1053-1072.
3- Campos, C., Elvira, R., Rodríguez, J. J. G., Montiel, J. M., & Tardós, J. D. (2021). ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual–Inertial, and Multimap SLAM. IEEE Transactions on Robotics.
4- Engel, J., Koltun, V., & Cremers, D. (2017). Direct sparse odometry. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(3), 611-625.
5- Engel, J., Schöps, T., & Cremers, D. (2014). LSD-SLAM: Large-scale direct monocular SLAM. Paper presented at the European conference on computer vision.
6- Forster, C., Pizzoli, M., & Scaramuzza, D. (2014). SVO: Fast semi-direct monocular visual odometry. Paper presented at the 2014 IEEE international conference on robotics and automation (ICRA).
7- Hosseininaveh, A., & Remondino, F. (2021). An Imaging Network Design for UGV-Based 3D Reconstruction of Buildings. Remote Sensing, 13(10), 1923.
8- Hosseininaveh, A., Serpico, M., Robson, S., Hess, M., Boehm, J., Pridden, I., & Amati, G. (2012). Automatic image selection in photogrammetric multi-view stereo methods.
9- Kerl, C., Sturm, J., & Cremers, D. (2013). Dense visual SLAM for RGB-D cameras. Paper presented at the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems.
10- Klein, G., & Murray, D. (2007). Parallel tracking and mapping for small AR workspaces. Paper presented at the 2007 6th IEEE and ACM international symposium on mixed and augmented reality.
11- Leutenegger, S., Lynen, S., Bosse, M., Siegwart, R., & Furgale, P. (2015). Keyframe-based visual–inertial odometry using nonlinear optimization. The International Journal of Robotics Research, 34(3), 314-334.
12- Lin, X., Wang, F., Guo, L., & Zhang, W. (2019). An automatic key-frame selection method for monocular visual odometry of ground vehicle. IEEE Access, 7, 70742-70754.
13- Lv, C., Li, J., & Tian, J. (2021). Key Frame Extraction for Sports Training Based on Improved Deep Learning. Scientific Programming, 2021.
14- Mur-Artal, R., Montiel, J. M. M., & Tardos, J. D. (2015). ORB-SLAM: a versatile and accurate monocular SLAM system. IEEE Transactions on Robotics, 31(5), 1147-1163.
15- Mur-Artal, R., & Tardós, J. D. (2017). ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Transactions on Robotics, 33(5), 1255-1262.
16- Qin, T., Li, P., & Shen, S. (2018). VINS-Mono: A robust and versatile monocular visual-inertial state estimator. IEEE Transactions on Robotics, 34(4), 1004-1020.
17- Rosinol, A., Abate, M., Chang, Y., & Carlone, L. (2020). Kimera: an open-source library for real-time metric-semantic localization and mapping. Paper presented at the 2020 IEEE International Conference on Robotics and Automation (ICRA).
18- Savran Kızıltepe, R., Gan, J. Q., & Escobar, J. J. (2021). A novel keyframe extraction method for video classification using deep neural networks. Neural Computing and Applications, 1-12.
19- Sheng, L., Xu, D., Ouyang, W., & Wang, X. (2019). Unsupervised collaborative learning of keyframe detection and visual odometry towards monocular deep slam. Paper presented at the Proceedings of the IEEE International Conference on Computer Vision.
20- Sze, K.-W., Lam, K.-M., & Qiu, G. (2005). A new key frame representation for video segment retrieval. IEEE Transactions on Circuits and Systems for Video Technology, 15(9), 1148-1155.
21- Tan, W., Liu, H., Dong, Z., Zhang, G., & Bao, H. (2013). Robust monocular SLAM in dynamic environments. Paper presented at the 2013 IEEE International Symposium on Mixed and Augmented Reality (ISMAR).
22- Zhang, Z., & Scaramuzza, D. (2018). A tutorial on quantitative trajectory evaluation for visual (-inertial) odometry. Paper presented at the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
23- Zhuang, Y., Rui, Y., Huang, T. S., & Mehrotra, S. (1998). Adaptive key frame extraction using unsupervised clustering. Paper presented at the Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No. 98CB36269).