LaneCMKT: Boosting Monocular 3D Lane Detection with Cross-Modal Knowledge Transfer
Open access
Authors: Runkai Zhao, Heng Wang, Weidong Cai
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
Pages 4283 - 4291
Published: 28 October 2024
Abstract
Detecting 3D lane lines from monocular images is attracting increasing attention in Autonomous Driving (AD) because of its cost-effectiveness. However, current monocular image models perceive road scenes without 3D spatial awareness, which makes them error-prone under adverse changes in conditions. In this work, we design a novel cross-modal knowledge transfer scheme, LaneCMKT, that addresses this challenge by transferring 3D geometric cues learned by a pre-trained LiDAR model to the image model. Operating on a unified Bird's-Eye-View (BEV) grid, our monocular image model acts as a student network and benefits from the spatial guidance of the 3D LiDAR teacher model in the intermediate feature space. Because LiDAR points and image pixels are intrinsically different modalities, we propose a dual-path knowledge transfer mechanism to facilitate such heterogeneous feature transfer at matching levels. We divide the feature space into shallow and deep paths, in which the image student model is prompted to focus on lane-favored geometric cues from the LiDAR teacher model. We conduct extensive experiments and thorough analysis on the large-scale public benchmark OpenLane. Our model improves the F1 score by 5.3% over the image baseline and by 2.7% over the current BEV-driven SoTA method, without introducing any extra computational overhead. We also observe that the 3D capabilities acquired from the teacher model are critical for handling complex spatial lane properties from a 2D perspective.
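The dual-path idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the loss used for the transfer, so the code below assumes a plain mean-squared-error feature-mimicking term on each path, which is a common choice in feature distillation and not necessarily the authors' formulation. The names `feature_mse`, `dual_path_transfer_loss`, `w_shallow`, and `w_deep` are hypothetical, and the teacher features are treated as fixed targets on the assumption that the pre-trained LiDAR model is not updated.

```python
from typing import List

# A single-channel BEV feature map stored as rows of floats (H x W).
Grid = List[List[float]]


def feature_mse(student: Grid, teacher: Grid) -> float:
    """Mean squared error between two BEV feature maps of equal shape."""
    same_rows = len(student) == len(teacher)
    if not same_rows or any(len(s) != len(t) for s, t in zip(student, teacher)):
        raise ValueError("student and teacher maps must share one BEV grid shape")
    diffs = [
        (s - t) ** 2
        for s_row, t_row in zip(student, teacher)
        for s, t in zip(s_row, t_row)
    ]
    return sum(diffs) / len(diffs)


def dual_path_transfer_loss(
    student_shallow: Grid,
    teacher_shallow: Grid,
    student_deep: Grid,
    teacher_deep: Grid,
    w_shallow: float = 1.0,  # assumed weight, not taken from the paper
    w_deep: float = 1.0,     # assumed weight, not taken from the paper
) -> float:
    """Weighted sum of a shallow-path and a deep-path mimicking term.

    The teacher maps act as fixed targets; only the student would be
    trained against this value.
    """
    shallow_term = feature_mse(student_shallow, teacher_shallow)
    deep_term = feature_mse(student_deep, teacher_deep)
    return w_shallow * shallow_term + w_deep * deep_term


# Toy example: a 2x2 shallow map and a 1x1 deep map.
loss = dual_path_transfer_loss(
    student_shallow=[[0.0, 0.0], [0.0, 0.0]],
    teacher_shallow=[[1.0, 1.0], [1.0, 1.0]],
    student_deep=[[2.0]],
    teacher_deep=[[0.0]],
    w_shallow=1.0,
    w_deep=0.5,
)
print(loss)  # 3.0
```

Because such a term would only be evaluated during training, the LiDAR teacher can be dropped at inference time, which is consistent with the abstract's claim of no extra computational overhead.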
Index Terms
LaneCMKT: Boosting Monocular 3D Lane Detection with Cross-Modal Knowledge Transfer
Computing methodologies
Artificial intelligence
Computer vision
Machine learning
Learning paradigms
Multi-task learning
Transfer learning
Published In
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
October 2024
11719 pages
ISBN:9798400706868
DOI:10.1145/3664647
General Chairs:
- Jianfei Cai, Monash University, Australia
- Mohan Kankanhalli, NUS, Singapore
- Balakrishnan Prabhakaran, UT Dallas, USA
- Susanne Boll, University of Oldenburg, Germany
Program Chairs:
- Ramanathan Subramanian, University of Canberra & IIT Ropar, Australia
- Liang Zheng, Australian National University, Australia
- Vivek K. Singh, Rutgers University, USA
- Pablo Cesar, Centrum Wiskunde & Informatica, Netherlands
- Lexing Xie, Australian National University, Australia
- Dong Xu, University of Hong Kong, Hong Kong
Copyright © 2024 Owner/Author.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Sponsors
- SIGMM: ACM Special Interest Group on Multimedia
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
- 3d vision
- lane detection
- multi-modality
- transfer learning
Qualifiers
- Research-article
Conference
MM '24
Sponsor:
- SIGMM
MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne VIC, Australia
Acceptance Rates
MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;
Overall Acceptance Rate 2,145 of 8,556 submissions, 25%