Reactive vs Predictive AI

This post is an extension of my previous post on Statistics vs Dynamics in machine learning. Here I'll try to expand on what I think is the key missing ingredient (though possibly not the only one) in efforts such as self-driving cars and other robotic projects aimed at unrestricted environments.

The problem of control in machine learning is typically approached today by end-to-end training of motor commands from sensory input (e.g. here). The authors argue that the optimisation algorithm will do a better job than explicitly breaking the task down into perceptual and planning submodules, because it can do everything at once. This logic is influenced by behaviourism and by the observation that humans appear to do essentially the same thing: map sensory input onto motor commands.

This approach is flawed, as I will try to explain in the paragraphs that follow.

What looks like direct sensory to motor mapping may be a lot more complex

Looking naively at a human performing a task, say driving a car, one may think that the human is simply matching what they currently see onto a motor command. It certainly looks that way: if we treat the brain as a black box, then what goes in are visual and auditory stimuli and what comes out are muscle movements. However, much more may be going on. Consider the following hypothesis:

The brain predicts the state of the environment for, say, up to 20 seconds ahead. Part of the prediction involves generating multiple paths of behaviour; branches that lead to dangerous or unwanted outcomes are pruned. From the remaining branches, those that lead to favourable outcomes are quickly selected. Favourable can be defined as, say, avoiding situations that may lead to danger. Ultimately the current motor plan is updated and a motor command is generated for the "here and now" in order to proceed along the selected trajectory.
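
To make the hypothesis concrete, here is a minimal sketch of such a loop in Python. Everything in it is an assumption made for illustration: the 1-D road, the three actions, the four-step horizon, the hand-written forward_model (standing in for whatever the brain or a learned model would provide) and the "most progress" notion of favourable. It is not a claim about how the brain, or any real driving system, implements this.

```python
from itertools import product

# Hypothetical toy world: a car on a 1-D road, state = (position, speed),
# with a stationary obstacle somewhere ahead.
ACTIONS = {"brake": -2.0, "coast": 0.0, "accelerate": 1.0}  # speed change per step
HORIZON = 4  # how many steps ahead we predict (assumed)


def forward_model(state, action):
    """Predict the next state; a stand-in for a learned model of the world."""
    position, speed = state
    speed = max(0.0, speed + ACTIONS[action])
    return (position + speed, speed)


def rollout(state, plan):
    """Predict the states that would follow from a candidate sequence of actions."""
    states = []
    for action in plan:
        state = forward_model(state, action)
        states.append(state)
    return states


def is_dangerous(states, obstacle):
    """A branch is dangerous if any predicted position reaches the obstacle."""
    return any(position >= obstacle for position, _ in states)


def next_motor_command(state, obstacle):
    """Generate branches, prune the dangerous ones, pick a favourable one,
    and return only its first action as the command for here and now."""
    safe = []
    for plan in product(ACTIONS, repeat=HORIZON):
        states = rollout(state, plan)
        if not is_dangerous(states, obstacle):
            safe.append((plan, states))
    if not safe:
        return "brake"  # assumed fallback when every branch is dangerous
    # "Favourable" here means most progress by the end of the horizon.
    best_plan, _ = max(safe, key=lambda item: item[1][-1][0])
    return best_plan[0]


print(next_motor_command((0.0, 1.0), obstacle=100.0))  # open road
print(next_motor_command((0.0, 4.0), obstacle=5.0))    # obstacle close ahead
```

On the open road the selected plan starts with "accelerate". With the obstacle at position 5 and a speed of 4, every plan that does not start with "brake" is pruned, so the command is "brake". From the outside, only that single command is visible; the pruned branches are not.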

From the outside we cannot see the details of this complex process. What we can see is what appears to be a direct sensory->motor mapping, so we can try to make a naive association between this empirically observed mapping and something like deep learning. This mapping will statistically seem to work because, statistically, the next 20 seconds of the future are very regular (on a nice freeway without surprises, at least). The particular motor commands, which are really parts of a more complex predictive trajectory, can therefore be reasonably approximated from the here and now visual input (and perhaps a bit of history).

Except for a relatively small number of motor commands which cannot. Which are those? Exactly the ones that rely more on the predictive model and on anticipating an outcome that cannot easily be statistically associated with the here and now sensory input. In other words, the commands that rely on the future rather than the present. These are actually crucial, because they are the commands generated in an "irregular" situation. Irregular not now, but irregular in 20 seconds. Those are the commands that save lives, but they are statistically insignificant because life-threatening situations are relatively rare. The naive statistical model appears to work great even if it is completely clueless about these crucial situations.

Fooled by statistics

The above example shows how we can be fooled by statistics: a score of 99% on a statistical measure looks great. But life is different. A trajectory that ends in disaster is not great at all, even if 99% of the trajectory is fine and the disaster appears only in the final 1%.
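
A few lines of Python make the arithmetic explicit. The numbers are invented for illustration: a hypothetical 100-step drive in which the predictive driver brakes exactly once, and a reactive imitation that always outputs the statistically most common command. The survives rule is likewise an assumption, a stand-in for "the disaster is avoided only if someone brakes in time".

```python
def per_step_agreement(expert, imitation):
    """Fraction of time steps on which the two command sequences match."""
    matches = sum(e == i for e, i in zip(expert, imitation))
    return matches / len(expert)


def survives(commands, brake_deadline):
    """Hypothetical outcome rule: the drive ends well only if a brake
    command is issued at or before the deadline step."""
    return "brake" in commands[: brake_deadline + 1]


STEPS = 100
DEADLINE = 98  # assumed: braking is needed just before the final step

# Predictive driver: coasts throughout, but anticipates trouble and brakes once.
expert = ["coast"] * STEPS
expert[DEADLINE] = "brake"

# Reactive imitation: always emits the most frequent command in the data.
imitation = ["coast"] * STEPS

print(per_step_agreement(expert, imitation))  # 0.99
print(survives(expert, DEADLINE))             # True
print(survives(imitation, DEADLINE))          # False
```

The step-wise score and the outcome measure answer different questions: the imitation agrees with the expert on 99 of 100 commands and still ends in disaster.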

Throw more data and expect miracles

What is almost everyone doing today to fight the proverbial 1% of problematic situations? Throwing more data at the naive model. That is temptingly simple, and since people treat machine learning models as black boxes, they expect them to figure out from the here and now what the brain is only able to figure out because it knows how the here and now may result in danger in the near future. However, this requires a huge amount of prior knowledge about "how the world works", which statistical machine learning models cannot gather (partly because that is not what they are optimised for). So the miracles will not come (instead, an AI winter is likely).

Reactive and predictive

So there is a subtle difference. Predictive systems may look like reactive systems from the outside, and can in fact be statistically very well approximated by naive reactive associations. But that does not change the fact that the remaining, statistically insignificant outliers are actually the most significant data points for survival. PVM (the Predictive Vision Model) is an attempt to build a large, scalable predictive system. By a liberal abuse of terminology, the role of PVM is to build the forward model of both the world and the agent (typically "forward model" refers to a model of the agent itself). Some additional module should likely use the PVM to generate possible future trajectories, and at the end yet another module could associate those trajectories with motor commands. All of these modules could be built from associative memories, but they could possibly also be engineered to a degree. I'm very confident that this is absolutely necessary to build truly intelligent agents; until then we will just be fooling ourselves with reactive impostors.