China Intelligent Driving Fusion Algorithm Industry (2024)
China Intelligent Driving Fusion Algorithm Research Report, 2024
Product code: 1482383
Research firm: ResearchInChina
Published: May 2024
Pages: 380 (English)
Licenses & Pricing (VAT excluded)
US $4,300 / KRW 6,197,000
Unprintable PDF (Single User License)
A license for a single user of the PDF report. Printing is not permitted, and text cannot be copied and pasted.
US $6,400 / KRW 9,224,000
Printable & Editable PDF (Enterprise-wide License)
A license for all members of the same company. Printing is permitted, and printed copies may be used within the same scope as the PDF.



Intelligent Driving Fusion Algorithm Research: sparse algorithms, temporal fusion, and enhanced planning and control are becoming the trend.

China Intelligent Driving Fusion Algorithm Research Report, 2024, released by ResearchInChina, analyzes the status quo and trends of intelligent driving fusion algorithms (covering perception, positioning, prediction, planning, decision making, etc.), sorts out the algorithm solutions and cases of chip vendors, OEMs, Tier 1 & Tier 2 suppliers, and L4 algorithm providers, and summarizes the development trends of intelligent driving algorithms.

In the eight months from Musk's live test drive of FSD V12 Beta in August 2023 to the 30-day free trial of FSD V12 Supervised in March 2024, advanced intelligent driving features such as urban NOA began to become the arena of major OEMs, and application cases of end-to-end algorithms, BEV Transformer algorithms, and AI foundation model algorithms have kept growing.

1. Sparse algorithms improve efficiency and reduce intelligent driving cost.

At present, most BEV algorithms are dense, consuming considerable computing power and storage. Sustaining a smooth frame rate of more than 30 frames per second requires expensive computing resources such as the NVIDIA A100, and even then only five or six 2MP cameras can be supported; 8MP cameras demand far costlier resources, such as multiple H100 GPUs.

The real world itself has sparse features, and sparsification helps sensors reduce noise and improve robustness. Moreover, as distance increases, grids inevitably become sparse: a dense network can only be maintained within roughly 50 meters. By reducing the interaction between queries and features, sparse perception algorithms speed up computation, lower storage requirements, greatly improve the computing efficiency and system performance of the perception model, shorten system latency, extend the range over which perception stays accurate, and ease the impact of vehicle speed.
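The scaling argument above can be made concrete with a back-of-the-envelope comparison. The token counts, feature width, and FLOP formula below are illustrative assumptions chosen for this sketch, not figures from any vendor's implementation:

```python
# Illustrative sketch: cross-attention workload of a dense BEV grid,
# where every grid cell is a query, versus a small sparse query set.

FEAT_DIM = 256
N_IMAGE_TOKENS = 6 * 40 * 40          # 6 cameras, 40x40 feature map each

def attention_flops(n_queries: int, n_keys: int, dim: int) -> int:
    """Rough FLOPs for one cross-attention layer: Q @ K^T plus attn @ V."""
    return 2 * n_queries * n_keys * dim * 2

dense_queries = 200 * 200             # one query per BEV grid cell
sparse_queries = 900                  # a few hundred object-level queries

dense_cost = attention_flops(dense_queries, N_IMAGE_TOKENS, FEAT_DIM)
sparse_cost = attention_flops(sparse_queries, N_IMAGE_TOKENS, FEAT_DIM)
print(f"sparse attention uses {dense_cost / sparse_cost:.0f}x fewer FLOPs")
# → sparse attention uses 44x fewer FLOPs
```

The ratio depends only on the query counts, which is why shrinking the query set (rather than the image backbone) is where sparse methods save most of their compute.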

Therefore, academia has been shifting from dense grid-based algorithms to sparse target-level algorithms since 2021. With long-term effort, sparse target-level algorithms can now perform almost as well as dense grid-based ones, and industry keeps iterating on sparse algorithms as well. Horizon Robotics recently open-sourced Sparse4D, a vision-only algorithm that ranks first on both the nuScenes vision-only 3D detection and 3D tracking leaderboards.

Sparse4D is a series of algorithms aimed at long-temporal-sequence sparse 3D target detection, falling within the scope of multi-view temporal fusion perception technology. In line with the industry trend toward sparse perception, Sparse4D builds a purely sparse fusion perception framework that makes perception algorithms more efficient and accurate and simplifies the perception system. Compared with dense BEV algorithms, Sparse4D reduces computational complexity, breaks the computing-power limit on perception range, and outperforms them in both perception quality and inference speed.

Another significant advantage of sparse algorithms is that they cut the cost of intelligent driving solutions by reducing dependence on sensors and consuming less computing power. For example, Megvii Technology says that by taking a range of measures, including optimizing the BEV algorithm, reducing computing-power requirements, removing HD maps, RTK, and LiDAR, unifying the algorithm framework, and automating annotation, it lowered the cost of its intelligent driving solutions based on the PETR series of sparse algorithms by 20%-30% compared with conventional solutions on the market.

2. 4D algorithms offer higher accuracy and make intelligent driving more reliable.

As seen from OEMs' sensor configurations, ever more sensors have been installed over the past three years as intelligent driving functions and application scenarios have multiplied. Most urban NOA solutions carry 10-12 cameras, 3-5 radars, 12 ultrasonic sensors, and 1-3 LiDARs.

With more sensors, ever more perception data is generated, and improving the utilization of that data is now on the agenda of OEMs and algorithm providers. Although details differ a little from company to company, the general idea behind current mainstream BEV Transformer solutions is basically the same: conversion from 2D to 3D and then to 4D.

Temporal fusion greatly improves algorithm continuity. Remembering obstacles lets the system handle occlusion and perceive speed information better, while remembering road signs improves driving safety and the accuracy of vehicle behavior prediction. Fusing information from historical frames improves perception accuracy for the current object, and fusing information from future frames can verify that accuracy, enhancing the algorithm's reliability and precision.
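As a toy illustration of how an obstacle memory can coast through occlusion, the following sketch (a hypothetical structure, not any production tracker) propagates remembered tracks with a constant-velocity model whenever the detector loses them:

```python
from dataclasses import dataclass

@dataclass
class Track:
    obj_id: int
    x: float          # position along the ego lane, meters
    vx: float         # estimated speed, m/s
    missed: int = 0   # consecutive frames without a detection

def fuse_frame(memory: dict, detections: dict, dt: float = 0.1,
               max_missed: int = 5) -> dict:
    """detections maps obj_id -> observed x; memory maps obj_id -> Track."""
    for obj_id, track in list(memory.items()):
        if obj_id in detections:
            x_obs = detections[obj_id]
            track.vx = (x_obs - track.x) / dt     # crude velocity update
            track.x = x_obs
            track.missed = 0
        else:
            track.x += track.vx * dt              # coast through occlusion
            track.missed += 1
            if track.missed > max_missed:
                del memory[obj_id]                # drop stale tracks
    for obj_id, x_obs in detections.items():
        if obj_id not in memory:
            memory[obj_id] = Track(obj_id, x_obs, 0.0)
    return memory

mem = {1: Track(1, x=10.0, vx=5.0)}
fuse_frame(mem, {})          # object occluded this frame
print(mem[1].x)              # predicted forward to 10.5 instead of vanishing
```

A real system would use a learned motion model and fuse the coasted state back into perception, but the memory-plus-prediction structure is the same idea the paragraph above describes.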

Tesla's Occupancy Network algorithm is a typical 4D algorithm.

Tesla adds height information to the vector space of 2D BEV + temporal information output by its original Transformer algorithm, building a 4D representation of 3D BEV + time. The network runs every 10 ms on FSD hardware, i.e., at 100 FPS, greatly improving the speed of model detection.
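The representation itself can be illustrated in a few lines. The grid sizes below are arbitrary assumptions for illustration, not Tesla's actual resolutions:

```python
import numpy as np

# 2D BEV grid -> add a height axis (3D occupancy) -> stack frames over
# time to get the 4D form discussed above.
X, Y, Z, T = 200, 200, 16, 4            # BEV extent, height bins, frames

bev_2d = np.zeros((X, Y), dtype=bool)                 # 2D BEV occupancy
occ_3d = np.repeat(bev_2d[:, :, None], Z, axis=2)     # + height information
occ_4d = np.stack([occ_3d] * T, axis=-1)              # + temporal axis

occ_4d[100, 100, 2, -1] = True   # an obstacle 2 height-bins up, latest frame
print(occ_4d.shape)              # (200, 200, 16, 4)
```

Occupancy grids at this resolution are large, which is one reason the sparse methods of section 1 and this 4D representation are complementary rather than competing trends.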

3. End-to-end algorithms integrating perception, planning and control enable more anthropomorphic intelligent driving.

Mainstream intelligent driving algorithms have adopted the "BEV + Transformer" architecture, and many innovative perception algorithms have emerged. However, rule-based approaches still prevail in planning and control. At some OEMs the perception and planning & control systems remain in a "split" state: in some complex scenarios the perception module may fail to accurately recognize or understand the environment, and the decision module may make incorrect driving decisions due to mishandled perception results or algorithm limitations. This restricts the development of advanced intelligent driving to some extent.

UniAD, an end-to-end intelligent driving algorithm jointly released by SenseTime, OpenDriveLab, and Horizon Robotics, was rated Best Paper at CVPR 2023. UniAD integrates three main tasks (perception, prediction, and planning) and six sub-tasks (target detection, target tracking, scene mapping, trajectory prediction, grid prediction, and path planning) into a unified, Transformer-based end-to-end network framework for the first time, obtaining a general-purpose model covering full-stack driving tasks. On the nuScenes real-scene dataset, UniAD performs best in the field on all tasks, especially in prediction and planning, where its results far exceed the previous best solutions.

A basic end-to-end algorithm maps sensor inputs directly to planning and control outputs, but it is difficult to optimize: network modules lack effective feature communication, tasks lack effective interaction, and results must still be output in phases. The decision-oriented, perception-decision-integrated design proposed by UniAD instead deep-fuses token features along the perception-prediction-decision pipeline, so that the metrics of all decision-oriented tasks improve consistently.
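The token-passing idea can be sketched as follows. The dimensions and simple linear stages are stand-ins chosen for illustration, not UniAD's actual architecture; the point is that each stage hands query tokens, rather than post-processed results, to the next, so the whole stack can be optimized toward the planning objective:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64                                   # shared token width (assumption)

def module(tokens: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One stage: mix tokens through a linear layer + ReLU."""
    return np.maximum(tokens @ w, 0.0)

w_track, w_pred, w_plan = (rng.normal(size=(D, D)) * 0.1 for _ in range(3))

image_tokens = rng.normal(size=(16, D))        # stand-in for camera features
track_tokens = module(image_tokens, w_track)   # perception: detection+tracking
motion_tokens = module(track_tokens, w_pred)   # prediction: trajectory forecast
plan_tokens = module(motion_tokens, w_plan)    # planning: ego trajectory query

print(plan_tokens.shape)   # (16, 64): tokens flow end to end, no hand-offs
```

Because every hand-off is a differentiable tensor rather than boxes or discrete labels, gradients from a planning loss can reach the perception weights, which is what "all tasks targeting decision" improving together means in practice.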

In planning and control, Tesla adopts interactive search plus an evaluation model, yielding a comfortable, effective algorithm that combines conventional search with artificial intelligence:

First, candidate objects are obtained from lane lines, occupancy networks, and obstacles, and decision trees and candidate object sequences are generated.

Next, trajectories reaching those objects are constructed, using conventional search and neural networks in parallel.

Finally, the interaction between the vehicle and other participants in the scene is predicted to form new trajectories, and after multiple rounds of evaluation the final trajectory is selected. During trajectory generation, Tesla applies conventional search algorithms and neural networks, then scores each generated trajectory on collision checks, comfort, the likelihood of the driver taking over, and similarity to human driving, and finally decides the strategy to execute.
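The search-plus-evaluation loop described above can be sketched in miniature. The cost weights, candidate set, and 1D geometry below are illustrative assumptions, not Tesla's implementation:

```python
def score(traj, obstacles):
    """Lower is better: penalize near-collisions and harsh acceleration."""
    collision = sum(1.0 for p in traj for obs in obstacles if abs(p - obs) < 1.0)
    # comfort: sum of squared second differences (discrete acceleration)
    comfort = sum((traj[i + 1] - 2 * traj[i] + traj[i - 1]) ** 2
                  for i in range(1, len(traj) - 1))
    return 100.0 * collision + comfort

def plan(start, goals, obstacles, steps=5):
    # one straight-line candidate per goal; a real planner would expand a
    # decision tree and refine candidates with a neural proposal network
    candidates = [[start + (g - start) * t / steps for t in range(steps + 1)]
                  for g in goals]
    return min(candidates, key=lambda tr: score(tr, obstacles))

best = plan(start=0.0, goals=[10.0, 4.0], obstacles=[4.0])
print(best[-1])   # 10.0: the candidate ending on the obstacle scores worse
```

The structure mirrors the text: generate candidates, score each against collision and comfort terms, keep the minimum-cost trajectory.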

XBrain, the ultimate architecture of Xpeng's all-scenario intelligent driving, is composed of XNet 2.0, a deep vision neural network, and XPlanner, a neural-network-based planning and control module with the following features:

Neural network in place of rule-based algorithms

Long time sequence (minute-level)

Multi-object (multi-agent decision, gaming capability)

Strong reasoning

Previous advanced driving and ADAS functional architectures were separate, consisting of many small scenario-specific planning and control logics, whereas XPlanner has a unified planning and control architecture. XPlanner is backed by a foundation model and trained in simulation on a large number of extreme driving scenarios, ensuring it can cope with a variety of complex situations.

Table of Contents

1 Overview of Intelligent Driving Fusion Algorithms

2 End-to-end Algorithms

3 BEV Transformer Foundation Model Algorithms

4 Data Is the Cornerstone of Fusion Algorithms

5 Algorithms of Chip Vendors

6 Algorithms of Tier 1 & Tier 2 Vendors

7 Algorithms of Emerging Automakers and OEMs

8 Robotaxi Algorithms of L4 Intelligent Driving

Global Information, Inc. 02-2025-2992 kr-info@giikorea.co.kr
ⓒ Copyright Global Information, Inc. All rights reserved.