Statistics and dynamics

There are two very important branches of mathematics relevant for building intelligent systems: statistics and dynamics. The rationale is the following:

  • data has regularities and patterns that repeat, therefore an intelligent system should analyse them statistically
  • things in the world are in motion and that motion has regularities, therefore the intelligent system should build models of those dynamics

Although these approaches seem very compatible, it is important to understand the different modes of thinking. Statistics tries to find a pattern based on many samples, given that we know nothing else about the system (often assuming that things come from a known distribution). Dynamics tries to write down the equations of motion of the system given very few samples. Statistics wants to estimate the expected value and variance of things. Dynamics wants to predict the exact value of something with a strict error estimate.
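
To make the contrast concrete, here is a deliberately simple toy sketch. Everything in it is an assumption made for illustration: the constant-velocity trajectory, and the names `positions` and `predict`. It summarises the same ten samples statistically (mean and variance) and then, separately, recovers the equation of motion from just the first two samples and uses it to predict the next position.

```python
import statistics

# Hypothetical trajectory: an object moving at constant velocity,
# x(t) = 2 + 3*t, observed at t = 0..9.
positions = [2.0 + 3.0 * t for t in range(10)]

# Statistical view: summarise the samples, ignoring their order in time.
mean = statistics.mean(positions)
var = statistics.pvariance(positions)

# Dynamical view: assume constant velocity and recover the equation of
# motion from the first two samples only.
x0 = positions[0]
v = positions[1] - positions[0]

def predict(t):
    """Position at time t under the recovered constant-velocity model."""
    return x0 + v * t

# Compare both views as predictors of the next, unseen position at t = 10.
true_next = 2.0 + 3.0 * 10
stat_error = abs(mean - true_next)        # best guess from statistics alone
dyn_error = abs(predict(10) - true_next)  # guess from the dynamical model

print(f"mean={mean}, variance={var}")
print(f"error using the mean: {stat_error}")
print(f"error using the dynamics: {dyn_error}")
```

The example is rigged in favour of dynamics, since the data is noiseless and the form of the equation is known in advance. The point is only to show the two modes of thinking side by side on the same samples.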

Current machine learning is heavily biased towards statistics. Although some priors are inserted into the models, the general approach is to throw more data and compute power at a system and expect miracles, rather than building a system that could intelligently infer based on the dynamics (see e.g. ImageNet and similar purely statistical approaches to understanding images, which assume that reality is a set of frozen visual data points devoid of time and dynamics).

Dynamics in the context of connectionist models would trigger keywords such as: feedback, online learning, online inference, prediction, time series and so on. Statistics, on the other hand, could be recognised by the use of terms such as: batch learning, layer normalisation, dropout, regularisation, offline learning, cross-validation, bias estimation, entropy and so on.

If you follow deep learning, you can probably tell easily which approach dominates.

My hunch is that for real world applications, such as robotics (and yes, a self-driving car is a robot too), a purely statistical approach will never be sufficient. We generally have no doubt that when we know the equations of motion of some system, it is much better to use those rather than statistics (see weather prediction for example). However, in the context of AI we somehow believe that everything can be solved with statistics.

The real solution is to use statistics to build the model of dynamics. Once this is done, the agent can use the model to make predictions about what will happen and use those to intelligently guide its behaviour (instead of trying to make those same decisions directly based on statistics). This is not easy because it requires people to think in both the statistical and the dynamical framework.
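
As a minimal sketch of that idea (and not of PVM itself), consider a hypothetical one-dimensional system that evolves as x[t+1] = a*x[t] + b plus a little noise. The system, the parameter values and the function names `simulate`, `fit_linear_dynamics` and `rollout` are all assumptions made for illustration. Ordinary least squares (the statistics) estimates a and b from an observed trajectory, and the fitted model (the dynamics) is then rolled forward to predict what happens next.

```python
import random

def simulate(a, b, x0, steps, noise=0.0, seed=0):
    """Generate a trajectory of x[t+1] = a*x[t] + b + Gaussian noise."""
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(steps):
        xs.append(a * xs[-1] + b + rng.gauss(0.0, noise))
    return xs

def fit_linear_dynamics(xs):
    """Estimate (a, b) in x[t+1] ~ a*x[t] + b by ordinary least squares."""
    prev, nxt = xs[:-1], xs[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_next = sum(nxt) / n
    cov = sum((p - mean_prev) * (q - mean_next) for p, q in zip(prev, nxt))
    var = sum((p - mean_prev) ** 2 for p in prev)
    a = cov / var
    b = mean_next - a * mean_prev
    return a, b

def rollout(a, b, x0, steps):
    """Predict future states by iterating the fitted model without noise."""
    xs = [x0]
    for _ in range(steps):
        xs.append(a * xs[-1] + b)
    return xs

# Statistics: learn the equation of motion from noisy observations.
observed = simulate(a=0.9, b=1.0, x0=0.0, steps=200, noise=0.05, seed=1)
a_hat, b_hat = fit_linear_dynamics(observed)

# Dynamics: use the learned equation to predict the next few states.
predicted = rollout(a_hat, b_hat, x0=observed[-1], steps=5)

print(f"estimated a={a_hat:.3f}, b={b_hat:.3f}")
print("predicted next states:", [round(x, 3) for x in predicted[1:]])
```

An agent would then base its decisions on `predicted` rather than on raw statistics of `observed`. Real systems are of course nonlinear and high-dimensional, so the linear fit here stands in for whatever learning machinery is actually used.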

That exact principle stands behind building the Predictive Vision Model (PVM). Building PVM also revealed a number of interesting consequences, which include:

  • scalability in both training time and compute
  • seamless accommodation of feedback connectivity (a long-standing puzzle in neuroscience)
  • possibility of asynchronous operation for ultimately scalable parallel implementation
  • robustness to errors and component failures

But appreciating PVM requires a change in thinking: recognising what dynamics can offer that statistics will not be able to match, while at the same time understanding that the only way to derive the complex dynamical equations of an agent acting in the real world is via statistics.