LaneCMKT: Boosting Monocular 3D Lane Detection with Cross-Modal Knowledge Transfer | Proceedings of the 32nd ACM International Conference on Multimedia (2024)

research-article

Open access

Authors: Runkai Zhao, Heng Wang, Weidong Cai

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

Pages 4283 - 4291

Published: 28 October 2024

Abstract

Detecting 3D lane lines from monocular images is attracting growing attention in Autonomous Driving (AD) because of its low cost. However, current monocular image models perceive road scenes without 3D spatial awareness, which makes them error-prone under adverse changes in conditions. In this work, we design a novel cross-modal knowledge transfer scheme, LaneCMKT, that addresses this challenge by transferring 3D geometric cues learned by a pre-trained LiDAR model to the image model. Operating on a unified Bird's-Eye-View (BEV) grid, our monocular image model acts as a student network and benefits from the spatial guidance of the 3D LiDAR teacher model in the intermediate feature space. Because LiDAR points and image pixels are intrinsically different modalities, we propose a dual-path knowledge transfer mechanism that facilitates this heterogeneous feature transfer at matching levels. We divide the feature space into shallow and deep paths, in which the image student model is prompted to focus on lane-favored geometric cues from the LiDAR teacher model. We conduct extensive experiments and thorough analysis on the large-scale public benchmark OpenLane. Our model improves the F1 score by 5.3% over the image baseline and by 2.7% over the current BEV-driven SoTA method, without introducing any extra computational overhead. We also observe that the 3D abilities acquired from the teacher model are critical for handling complex spatial lane properties from a 2D perspective.
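
The abstract does not give the loss formulation, so the following is only a minimal sketch of what feature-level transfer on a shared BEV grid could look like, not the authors' implementation. It assumes a plain mean-squared-error mimicry term on each of the two paths, an optional binary lane mask to emphasize lane cells, and equal BEV resolution on both paths. The names `masked_mse`, `dual_path_transfer_loss`, `lane_mask`, `w_shallow`, and `w_deep` are hypothetical.

```python
import numpy as np


def masked_mse(student, teacher, mask=None):
    """Mean squared error between two BEV feature maps of shape (C, H, W).

    If `mask` of shape (H, W) is given, only cells with a non-zero mask
    value contribute (assumption: the mask marks lane cells).
    """
    diff2 = (student - teacher) ** 2
    if mask is None:
        return float(diff2.mean())
    weights = np.broadcast_to(mask, diff2.shape).astype(float)
    denom = weights.sum()
    if denom == 0:
        return 0.0  # no lane cells in this sample
    return float((diff2 * weights).sum() / denom)


def dual_path_transfer_loss(student_feats, teacher_feats, lane_mask=None,
                            w_shallow=1.0, w_deep=1.0):
    """Weighted sum of a shallow-path and a deep-path mimicry term.

    `student_feats` and `teacher_feats` are dicts with "shallow" and "deep"
    entries (hypothetical layout). The teacher features are treated as
    fixed targets, since the LiDAR model is pre-trained.
    """
    shallow = masked_mse(student_feats["shallow"], teacher_feats["shallow"], lane_mask)
    deep = masked_mse(student_feats["deep"], teacher_feats["deep"], lane_mask)
    return w_shallow * shallow + w_deep * deep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    student = {"shallow": rng.normal(size=(8, 16, 16)),
               "deep": rng.normal(size=(32, 16, 16))}
    teacher = {"shallow": rng.normal(size=(8, 16, 16)),
               "deep": rng.normal(size=(32, 16, 16))}
    lane_mask = np.zeros((16, 16))
    lane_mask[:, 7:9] = 1.0  # a toy vertical lane stripe
    print(dual_path_transfer_loss(student, teacher, lane_mask))
```

In a sketch like this the teacher is only needed to compute the training loss, which is one way a student could gain from LiDAR guidance while running at its usual cost at inference time.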

Index Terms

  1. LaneCMKT: Boosting Monocular 3D Lane Detection with Cross-Modal Knowledge Transfer

    1. Computing methodologies

      1. Artificial intelligence

        1. Computer vision

        2. Machine learning

          1. Learning paradigms

            1. Multi-task learning

              1. Transfer learning

      Published In

      MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

      October 2024

      11719 pages

      ISBN:9798400706868

      DOI:10.1145/3664647

      General Chairs:
      • Jianfei Cai, Monash University, Australia
      • Mohan Kankanhalli, NUS, Singapore
      • Balakrishnan Prabhakaran, UT Dallas, USA
      • Susanne Boll, University of Oldenburg, Germany

      Program Chairs:
      • Ramanathan Subramanian, University of Canberra & IIT Ropar, Australia
      • Liang Zheng, Australian National University, Australia
      • Vivek K. Singh, Rutgers University, USA
      • Pablo Cesar, Centrum Wiskunde & Informatica, Netherlands
      • Lexing Xie, Australian National University, Australia
      • Dong Xu, University of Hong Kong, Hong Kong

      Copyright © 2024 Owner/Author.

      This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike International 4.0 License.

      Sponsors

      • SIGMM: ACM Special Interest Group on Multimedia

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 28 October 2024

      Author Tags

      1. 3d vision
      2. lane detection
      3. multi-modality
      4. transfer learning

      Qualifiers

      • Research-article

      Conference

      MM '24

      Sponsor:

      • SIGMM

      MM '24: The 32nd ACM International Conference on Multimedia

      October 28 - November 1, 2024

      Melbourne VIC, Australia

      Acceptance Rates

      MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;

      Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
