The peculiar perception of the problem of perception

In the previous posts I investigated state-of-the-art deep nets for casual vision applications: telling what is in an image taken in an average office or on an average, boring street. I also played a bit with adversarial examples to show how deep nets can be fooled. These failure modes tell us something important about the level of perception we are dealing with: a very basic level. In this post I will discuss why I think perception is such an elusive problem. Let's begin with vision.

The blindspot

Each of us is born with a blindspot in our visual field - the place where nerve fibres from the retina exit the eyeball. However, unless somebody tells us how to discover it, we are completely ignorant of its existence. In some sense it could be qualified as an example of anosognosia - a condition in which a person is not aware of a defect in their perception. A more extreme case is known as Anton-Babinski syndrome, typically occurring after brain damage, in which the patient claims to see even though he is technically blind! As unbelievable as this seems, patients will confabulate … Read more...

Adversarial red flag

In the previous post I applied an off-the-shelf deep net to get an idea of how it performs on average street/office video. The purpose of this exercise was to critically examine and reveal what these award-winning models are actually like. The results were a mixed bag. The network was able to capture the gist of the scene, but made serious mistakes every once in a while. Granted, the model I used for that experiment was trained on ImageNet, which has a few biases and is probably not the best set to test "visual capabilities in the real world". In the current post I will discuss another problem plaguing deep learning models - adversarial stimuli.

Deep nets can be made to fail on purpose. This was first shown in [1], and there have been quite a few papers since then with different methods to construct stimuli that fool deep models. In the simplest case one can derive these stimuli directly from the network itself. Since ConvNets are purely feedforward systems (most of them at least), we can trace back the gradients. Typically gradients are used to modify the weights such that they better fit the given … Read more...

Just how close are we to solving vision?

There is a lot of hype today about deep learning, a class of multilayer perceptrons with some 5-20 layers featuring convolutional and pooling layers. Many blogs [1,2,3] discuss the structure of these networks and there is plenty of code published, so I won't go into much detail here. Several tech companies have invested a lot of money in this research and everyone has very high expectations of the performance of these models. Indeed, they've been winning image classification competitions for several years now, and every once in a while the media report superhuman performance on some visual classification task.

Now, just looking at the numbers from the ImageNet competition does not really tell us much about how good these models really are; at most we can confirm that they are much better than whatever came before them (for that benchmark at least). With the media reporting superhuman abilities and high ImageNet numbers, and big CEOs pumping hype and showing sexy movies of a car tracking other cars on the road (a 2min video looped X times, which seems a bit suspicious), one can get the impression that vision is a solved problem.

In this blog post (and a few others coming … Read more...

Intelligence is real

So we are trying to build Artificial Intelligence. But what is it? Is a program playing chess or go intelligent? After some thought, I think most people would agree: not really. It's just a computer program that managed to master a game. Is a large neural network -- optimised with gradient descent to approximate a dataset -- intelligent? Well, it is just a function approximator, so technically I would say no. All these exercises do capture some aspect of what we would call intelligence, but the core of the idea seems elusive.

So why all the fuss about Artificial Intelligence?

A bit of history

The term "Artificial Intelligence" was coined by Prof. John McCarthy for the famous Dartmouth Conference in 1956. By his own account, he had to invent something to get the funding. From its very origin the term has caused controversy and boom-bust cycles known as AI winters, among which the better documented ones are the Lighthill report in 1973, the Minsky and Papert book Perceptrons in 1969 (which busted connectionist studies for quite a while), the 1987 collapse of expert systems (predicted by Minsky and Schank), and a more recent, smaller crisis in backpropagation-powered neural networks … Read more...

When you are 80% there means you are not there

Apparently we live in a world where the singularity is about to happen and artificial intelligence (AI) will cover every aspect of our lives. But the field of AI has always been prone to bubbles and busts, the latter known as AI winters. Why is that so, and is this time different?

Human psychology

There are several weaknesses of human psychology that make us very susceptible to hype in AI. First of all, we should note that humans have amazing perception, particularly visual perception. The problem is that the great majority of our marvellous vision develops by the age of 2, and so none of us remembers what it's like to not perceive the world correctly. By the time we begin to verbalise (and remember anything), all the low and mid level perceptual machinery is up and running. So our psyche wakes up in a world where everything already makes sense, and what remains to be learned and achieved are the higher cognitive tasks.

This phenomenon is reflected in our approach to AI. We tend to believe that artificial intelligence is about playing chess or go (or Atari) because that is the kind of higher cognitive task that we are excited about by the … Read more...