In many of my posts I'm directly or indirectly postulating learning physics as a way to create a "real AI". The point I'm trying to make is so obvious that it actually is not obvious at all, and it took me some time to realise it. As with many such obvious/non-obvious things, it takes multiple angles before the essence can be captured, which is partly why I write this blog. I'm trying to express myself in many ways until I hit the explanation that everyone simply gets. So let me try again in this post:
The world around us is complex. Everything to some degree interacts with everything else; there are lots of regularities, but there is also a fair amount of chaos. No two trees look identical, yet we manage to categorise them. In the language of physics, it appears that a good chunk of our reality is a "mixing system" at the "edge of chaos" (or otherwise critical). We therefore cannot predict very well what will happen. Yet I'm postulating prediction as a training paradigm. Does this make any sense?
It does, and here is why: even though the world is chaotic, there are numerous aspects of it that are highly predictable at various spatial/temporal scales. My immediate surroundings, my desk and office, are stable and predictable. So is my apartment. The weather over longer time scales is not. Neither are the vortices in my tea cup while I stir it. Regular and irregular dynamics are therefore intertwined at different scales. What is irregular at one scale can be regular on a larger scale. What is irregular on a large scale can have regular smaller-scale patterns. An agent acting in such a reality has to be able to extract the regularities and ignore the irregularities (not allocate cognitive resources to predicting the unpredictable), and that often requires operating at multiple scales at the same time. It is pointless to predict the exact trajectory of a leaf falling from a tree if I know it will generally end up on the ground, and either way it is light and harmless (irrelevant). The interaction between scales is so ubiquitous that we rarely perceive it explicitly, yet pretty much all the AI systems we build are ignorant of that interplay.
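To make the "irregular at one scale, regular at another" point a bit more concrete, here is a toy sketch of my own (not part of any AI system discussed here). It assumes the logistic map at r = 4 as a stand-in for a chaotic, mixing system; the function names are hypothetical. Two runs that start 1e-10 apart soon disagree completely on the exact state, yet their long-run averages agree, so the coarse-scale quantity is predictable while the fine-scale one is not.

```python
def logistic(x, r=4.0):
    # r = 4 puts the logistic map in its fully chaotic, mixing regime.
    return r * x * (1.0 - x)

def trajectory(x0, n):
    # Return n successive states starting from x0.
    xs = [x0]
    for _ in range(n - 1):
        xs.append(logistic(xs[-1]))
    return xs

# Fine scale: two starting points differing by 1e-10 soon disagree completely,
# so the exact state far ahead is practically unpredictable.
a = trajectory(0.2, 100)
b = trajectory(0.2 + 1e-10, 100)
first_big_gap = next((i for i, (p, q) in enumerate(zip(a, b)) if abs(p - q) > 0.1), None)

# Coarse scale: the long-run average is stable and the same for both runs,
# even though the individual states are not.
long_a = trajectory(0.2, 50_000)
long_b = trajectory(0.2 + 1e-10, 50_000)
mean_a = sum(long_a) / len(long_a)
mean_b = sum(long_b) / len(long_b)

print(first_big_gap, round(mean_a, 3), round(mean_b, 3))
```

An agent that spends its resources on the second kind of quantity (where will the leaf end up on average) rather than the first (where exactly it is at step 60) gets useful predictions out of a chaotic process.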
Cortex and recurrence
I'm a bit sceptical (what a surprise!) about how much biology can teach us about exactly how to build AI: biology is complex and full of stuff that might not be relevant. Yet it is informative to look at what we do know from biology and see whether there is any greater sense in it. I claim there is.
It appears that the neocortex is the part of the brain that builds the model of the world, since it processes the sensory information. A salient feature of the cortex is that it appears relatively uniform across different areas, and so it is hypothesised that the "cortical algorithm", whatever it might be, is the same everywhere. Another important feature is feedback. This was not that obvious at the time of Hubel and Wiesel's work on simple and complex cells (which, through the Neocognitron, inspired contemporary deep learning), but it is well documented now: the cortex is full of feedback connections. In fact, there seem to be more feedback connections than feedforward ones. The feedback goes all over the place, often across modalities: e.g. there are auditory projections in the primary visual cortex. There are few of them (more grow in those who lose their sight), but they exist even in perfectly healthy individuals. Feedback does affect the responses, even in the primary visual cortex. In fact, it appears that the cortex is in some form of critical state, which indicates that the feedback is so strong that the whole thing is at the edge of "blowing up". Sometimes it does blow up, in an epileptic seizure.
Why would the cortex need all that recurrence? Precisely to be able to fit a model of a complex world. Since the world interacts at multiple scales and levels of abstraction, the model of the world built by the cortex needs to do the same. Hence the structure of recurrence reflects the complexity of the processes being modelled. Auditory projections in the visual cortex obviously make sense, since auditory and visual events are often related and co-inform each other. Seeing an explosion without hearing a blast is an anomaly. Hearing a blast without any visual indication of an explosion is an anomaly.
Generalisation and memory
There is an interplay between generalisation and memorising. One can build a perfect machine learning algorithm: a lookup table. The algorithm can memorise the entire training set and achieve 100% performance on it. Yet it will fail miserably on any test data. To achieve generalisation, information needs to be lost. Generalisation is all about similarity in some aspect and ignoring the details that don't matter. The big question is: what to ignore? What is irrelevant? What constitutes the essence, and what is simply noise? I claim that in real-world applications it is physics that determines these decisions. It is all intermixed with statistics (which confuses us), but the physics is primary and more informative.
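The lookup-table argument can be checked with a toy sketch. Everything here is a hypothetical illustration of my own: the task (label is 1 when x > 0.5), the class names, and the threshold learner are assumptions. The lookup table keeps every detail of the training data and scores perfectly on it, but it has nothing to say about inputs it has not seen verbatim. The threshold learner throws almost everything away, keeping a single number, and that loss of information is exactly what lets it generalise.

```python
import random

def make_data(n, rng):
    # Hypothetical toy task: the label is 1 when x > 0.5; the exact value of x
    # beyond that is an "irrelevant detail".
    xs = [rng.random() for _ in range(n)]
    return [(x, int(x > 0.5)) for x in xs]

class LookupTable:
    """Memorises every training pair verbatim and discards nothing."""
    def fit(self, data):
        self.table = dict(data)
        return self

    def predict(self, x):
        return self.table.get(x)  # None for anything not seen verbatim

class Threshold:
    """Keeps a single decision boundary and forgets everything else."""
    def fit(self, data):
        neg = max(x for x, y in data if y == 0)
        pos = min(x for x, y in data if y == 1)
        self.t = (neg + pos) / 2
        return self

    def predict(self, x):
        return int(x > self.t)

def accuracy(model, data):
    return sum(model.predict(x) == y for x, y in data) / len(data)

rng = random.Random(0)
train, test = make_data(200, rng), make_data(200, rng)
lut, thr = LookupTable().fit(train), Threshold().fit(train)

print("lookup table:", accuracy(lut, train), accuracy(lut, test))
print("threshold:   ", accuracy(thr, train), accuracy(thr, test))
```

Of course, in this toy I chose by hand what to ignore. The hard question raised above is how a system should discover that by itself, which is where I believe the physics comes in.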
For example, suppose an object is illuminated from the left. If the system understands the physics of illumination, it will understand that there is a primary cause (the object) and a secondary effect (the illumination). A purely statistical system will need to see the same object illuminated from the left and from the right to create a similar decomposition. And even then, what about illumination from the top, or from another angle the statistical system has not seen before? The true physical cause remains elusive to the statistical system and requires a huge amount of data to cover.¹
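Here is a deliberately simplified sketch of the illumination example. It rests on loudly stated assumptions of my own: a Lambertian (matte) shading model, surface normals that are already known, and a 2-D toy "object" made of four patches; all names are hypothetical. The "physical" learner separates the cause (the albedo of each patch) from the effect (shading), so a single observation lets it predict the object under a light angle it has never seen. The "statistical" learner is a caricature: a memory of angle-to-image pairs that can only return the nearest stored image. This does not show how the physics itself gets learned, only what having it buys you.

```python
import math

def unit(deg):
    """Unit 2-D vector at the given angle (degrees, counter-clockwise from +x)."""
    t = math.radians(deg)
    return (math.cos(t), math.sin(t))

def render(albedo, normals, light):
    """Lambertian shading: brightness = albedo * max(0, n . l)."""
    lx, ly = light
    return [a * max(0.0, nx * lx + ny * ly) for a, (nx, ny) in zip(albedo, normals)]

def fit_albedo(image, normals, light):
    """Invert the shading model; assumes every patch faces the light (n . l > 0)."""
    lx, ly = light
    return [v / (nx * lx + ny * ly) for v, (nx, ny) in zip(image, normals)]

# Hypothetical object: four surface patches with known normals and hidden albedo.
normals = [unit(d) for d in (60, 80, 100, 120)]
true_albedo = [0.9, 0.4, 0.7, 0.2]

# The only experience: one image, lit from the upper left (120 degrees).
seen_light = unit(120)
seen_image = render(true_albedo, normals, seen_light)

# Query: light from the top (90 degrees), an angle never observed.
new_light = unit(90)
truth = render(true_albedo, normals, new_light)

# "Physical" learner: recover the cause (albedo), then re-render under the new light.
physical = render(fit_albedo(seen_image, normals, seen_light), normals, new_light)

# "Statistical" learner: a memory of angle -> image; for an unseen angle the
# best it can do is return the nearest stored image.
memory = {120: seen_image}
statistical = memory[min(memory, key=lambda ang: abs(ang - 90))]

def max_err(pred):
    return max(abs(p - t) for p, t in zip(pred, truth))

print(max_err(physical), max_err(statistical))
```

The statistical memory would only close the gap by storing many more angles, which is the "huge amount of data" problem in miniature.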
The point is subtle and takes some thinking before it becomes clear. Let me illustrate this with the two figures below:
Figure 1. The statistical approach to AI. Impressive algorithms associate data that is disconnected from the reality that generated it (the physical cause). However, the data contains enough statistical signal to weakly associate the picture with a meaning.
Figure 2. The dynamical approach to AI. The algorithm interacts with dynamical data (continuous and unlabelled). The system is allowed to discover the physical regularities. Consequently the system develops representations corresponding to the physical causes of things. Those representations allow for physically correct generalisation.
The ghost of the Chinese room
This goes back to the original Chinese room argument. As much as I dislike this formulation (it is aimed against interpreting "intelligence" as a "computable" process, which, mildly speaking, I don't buy), I believe it would make more sense if restated as follows:
A "true" AI is one that develops physically correct generalisation (aka common sense, folk physics, etc.). Can an AI develop physically correct generalisation without being directly exposed to physics (embodied in some way)? Theoretically maybe (with an astronomical amount of data), but practically it is unlikely. Until then we can build "ludic AIs" (AIs operating in some artificial, statistical/game world), and they may achieve impressive results in those domains (universes of discourse), but they will fail to generalise when applied to real-world data.
General recurrent predictive system
PVM (the Predictive Vision Model) has all the qualities of the neocortex discussed above, and it works (at least for the tracking task we evaluated it on). Unlike the recurrent deep nets found in the contemporary literature, the recurrence in PVM is not restricted to a single layer or module but is global, spanning the whole architecture. The feedback connections in PVM grow strong and hugely affect the responses (I will write a separate post on this with a few demos). It is clear that the system builds a complex, multiscale, recurrent model of the reality it sees. There are plenty of questions left: the exact details of the architecture may need to be tweaked, and the best way to use such a recurrent model needs more research. But I'm sure this is the right path. Too many things come together and make sense for it to be a coincidence.
¹ There is a subtle point to note here. Ultimately, the physics needs to be learned using statistics as well. But if the system has the capacity to represent a physical aspect of the world, it no longer needs to rely on statistics to learn it for other objects. So once a system learns what, e.g., illumination/shadow is in a given set of conditions, it can generalise to a new set of objects/conditions without needing to experience all of those, or even similar, cases.