There is a widespread belief among the "artificial intelligentsia" that, with the advent of deep learning, all it takes to conquer some new land (application) is to create a relevant dataset, get a fast GPU and train, train, train. However, as often happens with complex matters, this approach has certain limitations and hidden assumptions. There are at least two important epistemological assumptions:

- Given a big enough sample from some distribution, we can approximate it efficiently with a statistical/connectionist model
- A statistical sample of a phenomenon is enough to automate, reason about, or predict the phenomenon

Neither of these assumptions is universally correct.

### Universal approximation is not really universal

There is a theoretical result known as the universal approximation theorem. In summary, it states that any continuous function on a compact domain can be approximated to arbitrary precision by a composition of real functions at least three levels deep, e.g. a multilayer perceptron with sigmoidal activations. This is a mathematical statement, but a purely existential one. It does not say whether such an approximation would be practical, or achievable with, say, gradient descent. It merely states that such an approximation exists. As with many existential arguments, its applicability to the real world is limited. In the real world, we …
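To make the existential flavour of the theorem concrete, here is a minimal sketch in plain Python. It builds a one-hidden-layer sigmoid network whose weights are picked by hand, with no training at all, so that it traces a staircase through a target function on [0, 1]. The function names (`one_hidden_layer_net`, `max_error`), the staircase construction and the `steepness` parameter are assumptions made for illustration; they are not the proof of the theorem and not a recipe from any library.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def one_hidden_layer_net(f, n_units, steepness=50.0):
    """Hand-construct a one-hidden-layer sigmoid network for f on [0, 1].

    Hidden unit k is a steep sigmoid that switches on halfway between the
    grid points (k-1)/n and k/n; its output weight is the jump of f between
    those two grid points. Summing the units yields a smoothed staircase.
    """
    grid = [k / n_units for k in range(n_units + 1)]
    slope = steepness * n_units  # hidden-layer input weight (illustrative choice)
    centers = [(grid[k - 1] + grid[k]) / 2 for k in range(1, n_units + 1)]
    jumps = [f(grid[k]) - f(grid[k - 1]) for k in range(1, n_units + 1)]
    bias = f(grid[0])  # output-layer bias

    def net(x):
        return bias + sum(j * sigmoid(slope * (x - c)) for j, c in zip(jumps, centers))

    return net


def max_error(net, f, n_points=2000):
    # Largest absolute deviation on an evenly spaced evaluation grid.
    return max(abs(net(i / n_points) - f(i / n_points)) for i in range(n_points + 1))


def target(x):
    return math.sin(2 * math.pi * x)


if __name__ == "__main__":
    for n in (20, 200):
        print(n, "hidden units, max error:", max_error(one_hidden_layer_net(target, n), target))
```

The error shrinks as hidden units are added, which is all the theorem promises. Nothing in this construction says that gradient descent would find such weights, or that the number of units needed stays manageable for a harder target or a higher-dimensional input.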