Can a deep net see a cat?

In this post I will explore the capabilities of contemporary deep learning models on the vitally important task of detecting a cat. Not an ordinary cat though, but a sketch of an abstract cat. This task matters because success tells us something about whether a visual system has learned generalization and abstraction, at least on par with a 2-year-old. This post is inspired by my former co-worker Peter O'Connor, who tried similar experiments on LeNet several years ago. It is also a continuation of this blog's highly popular "Just how close are we to solving vision?", which to date has amassed nearly 15,000 hits. Let's begin by introducing my menagerie:

Figure 1. The cat menagerie. From left to right (top to bottom): "abstract cat", rough sketch of a real cat, less rough sketch of a real cat, the "best cat" I could draw, "best cat" inverted.

I made these sketches myself, based on a photo of a cat. NOTE: Whenever you test a deep net (or any other machine learning model), always use new data. Anything you find on the Internet is either already in the training set or soon will be.

VGG 16


Scaling up AI

Caution: due to a large number of animations this post may take a while to load (depending on your connection speed), please be patient and don't reload unless necessary. The animations will likely load before you read the text.

Scalability in Machine Learning

Scalability is a word with many meanings and can be confusing, particularly when applied to machine learning. For me the meaning of scalability is the answer to this question:

Can an instance of the algorithm be practically scaled to larger/parallel hardware and achieve better results in approximately the same (physical) time?

That is different from the typical understanding of data parallelism, in which multiple instances of an algorithm are deployed in parallel to process chunks of data simultaneously. An example of scalability of a single instance (in the sense defined above) is computational fluid dynamics (CFD). Aside from the need to obtain better initial conditions, one can run the fluid dynamics on a finer grid and achieve better (more accurate) results. Obviously this requires more compute, but generally the increase in complexity can be offset by adding more processors (there are some subtleties related to Amdahl's law and synchronisation). For that reason, most of the world's giant supercomputers are …
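To make the Amdahl's law caveat a bit more concrete, here is a minimal sketch of the textbook formula for a fixed-size problem. The function name `amdahl_speedup` and the 95% parallel fraction are my own illustrative assumptions, not measurements from any real CFD code, and the formula ignores synchronisation and communication costs entirely.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Ideal speedup predicted by Amdahl's law for a fixed-size problem.

    parallel_fraction: share of the run time that can be parallelised (0..1).
    processors: number of processors the parallel part is spread over.
    Both names are hypothetical, chosen for this illustration only.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is untouched; only the parallel part shrinks with more processors.
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # Assumed workload: 95% of the run time parallelises perfectly.
    for n in (1, 8, 64, 1024):
        print(f"{n:5d} processors -> speedup {amdahl_speedup(0.95, n):.2f}")
```

With a 5% serial part the speedup can never exceed 1 / 0.05 = 20, no matter how many processors are added, which is why the serial and synchronisation portions matter so much when scaling a single instance.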

Recurrent dreams and filling in

Caution: due to a large number of animations, a fair amount of traffic and the tiny size of my web hosting machine, this post may take a while to load. Please be patient and don't reload unless necessary.

There has recently been a fair amount of deep learning work on video prediction and generative models that focuses on infusing motion into static pictures. One such paper is available here:

The approach taken in that paper was to train a model on a huge amount of data and explicitly separate the task of prediction into (1) the generation of a static background and (2) a moving object. As impressive as this work is, the separation into background and foreground prediction seems a bit unnatural. Given, however, the mesmerising quality of video (and the importance of prediction), I decided to play a little with our Predictive Vision Model (PVM), which is also capable of generating such "dreams". For the sake of this post I only trained a very small instance of PVM on a single relatively short video, so the results shown here are mainly illustrative and this is by no means a full-blown scientific study.…