https://news.ycombinator.com/item?id=15945906
The process of autonomy, at every time instant, may be broken down into the following: 1) sensor observation, 2) perception, 3) intent modeling, 4) path planning, 5) control action.

1) Sensor observation is the collection of video, radar, lidar, etc. data.
2) Perception is the interpretation of that sensor data into a meaningful representation of the 3D environment, both static and dynamic: tasks like object detection, localization, tracking, and semantic understanding (think of it as computing a physics engine for the world).
3) Intent modeling is the prediction of what the moving objects might do in the future (e.g. is that car just drifting a bit, or is it about to merge into my lane?).
4) Given the outcomes of 2) and 3), path planning answers the question: where should I plan to drive the car, given my estimate of the environment and how it might change?
5) Control is the execution of the planned path, by manipulating the steering wheel, gas, brake, etc.

Of the different aspects of autonomy, perception and intent modeling are the unsolved pieces; the other aspects are relatively well understood. The quality of your sensors (resolution, dynamic range, depth range for lidar/radar, etc.) affects the difficulty of the perception task, as does computational power, but even with perfect sensors and abundant compute the problem is difficult (recognizing the difference between a rock and a crumpled piece of paper requires algorithmic processing of sensor data). The difficulty of perception is best illustrated by pointing to the field of computer vision, which is essentially focused on solving that problem. What seems easy to a human is quite hard for a computer; really it's only easy at the conscious level, while in fact some 70% of the human brain is engaged in solving the vision problem at any given time. All the steps after perception rely crucially on it.
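The five-stage loop above can be sketched as a simple pipeline. Everything here, including the type names and the placeholder function bodies, is a hypothetical illustration of the data flow between stages, not any real vendicle stack's API:

```python
from dataclasses import dataclass

# Hypothetical minimal types standing in for each stage's output.
@dataclass
class SensorFrame:        # stage 1: raw camera/radar/lidar samples
    points: list

@dataclass
class WorldModel:         # stage 2: static + dynamic scene representation
    obstacles: list

@dataclass
class Prediction:         # stage 3: forecast motion of other agents
    trajectories: list

@dataclass
class Path:               # stage 4: planned trajectory for the ego vehicle
    waypoints: list

@dataclass
class ControlCommand:     # stage 5: actuator targets
    steering: float
    throttle: float

def perceive(frame: SensorFrame) -> WorldModel:
    # Placeholder: real perception fuses detection, localization,
    # tracking, and semantic understanding into a scene model.
    return WorldModel(obstacles=frame.points)

def model_intent(world: WorldModel) -> Prediction:
    # Placeholder: real intent modeling forecasts what each moving
    # object might do next (drift vs. merge, stop vs. go, ...).
    return Prediction(trajectories=[[obs] for obs in world.obstacles])

def plan_path(world: WorldModel, pred: Prediction) -> Path:
    # Placeholder: real planners search for a safe trajectory through
    # the estimated environment and its predicted evolution.
    return Path(waypoints=[(0.0, 0.0), (1.0, 0.0)])

def control(path: Path) -> ControlCommand:
    # Placeholder: real controllers (e.g. PID or MPC) track the path
    # by actuating steering, gas, and brake.
    return ControlCommand(steering=0.0, throttle=0.3)

def autonomy_tick(frame: SensorFrame) -> ControlCommand:
    world = perceive(frame)        # 2) perception
    pred = model_intent(world)     # 3) intent modeling
    path = plan_path(world, pred)  # 4) path planning
    return control(path)           # 5) control action

# One tick of the loop on a toy sensor frame:
cmd = autonomy_tick(SensorFrame(points=[(5.0, 1.0)]))
```

The point of the sketch is the argument in the comment: the interfaces between stages are well defined, and the hard part is making `perceive` and `model_intent` produce faithful outputs.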
Even if perception were perfectly solved, intent modeling would remain a difficult problem, but it is relatively easier than perception, as it involves reasoning in a lower-dimensional state-action space, albeit with partial information. To make a comparison, intent modeling for driving in urban environments is perhaps harder than beating humans at Go, and may be as hard as beating humans at poker. If perception and intent modeling are solved, the execution of path planning and control is relatively well understood. To summarize, the main issues are perception and intent modeling, and these are fundamentally difficult AI problems. So the main thing holding back GM/Volvo/Google is algorithms.
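To make the "lower-dimensional state-action space, with partial information" point concrete, here is a toy two-hypothesis intent filter for the drifting-vs-merging example from above: the filter observes only a noisy lateral velocity and updates a belief with Bayes' rule. All the motion-model parameters are invented for the example:

```python
from math import exp, pi, sqrt

def gaussian(x: float, mean: float, std: float) -> float:
    # Density of a 1-D normal distribution at x.
    return exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * sqrt(2 * pi))

def update_belief(p_merge: float, lateral_vel: float) -> float:
    # Assumed (made-up) motion models for the neighboring car:
    #   drifting: lateral velocity ~ N(0.0, 0.1) m/s
    #   merging:  lateral velocity ~ N(0.5, 0.2) m/s
    like_merge = gaussian(lateral_vel, 0.5, 0.2)
    like_drift = gaussian(lateral_vel, 0.0, 0.1)
    # Bayes' rule over the two hypotheses.
    num = like_merge * p_merge
    return num / (num + like_drift * (1.0 - p_merge))

belief = 0.5  # uninformative prior: merge and drift equally likely
for v in [0.05, 0.3, 0.45, 0.5]:  # observed lateral velocities (m/s)
    belief = update_belief(belief, v)
# By the end of this sequence, the belief strongly favors "merging".
```

The whole state here is one probability and one scalar observation per step; contrast that with perception, where the input is megapixels of raw sensor data per frame. That gap in dimensionality is why, even with partial information, intent modeling is the relatively easier of the two unsolved pieces.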