Many people these days are fascinated by deep learning, as it has enabled new capabilities in many areas, particularly in computer vision. Deep nets are, however, black boxes, and most people have no idea how they work (and frankly most of us, scientists trained in the field, can't tell exactly how they work either). But the success of deep learning and a set of its surprising failure modes teach us a valuable lesson about the data we process.
In this post I will present a perspective on what deep learning actually enables, how it relates to classical computer vision (which is far from being dead) and what the potential dangers of relying on DL for critical applications are.
The vision problem
First of all, some things need to be said about the problem of vision/computer vision. In principle it could be formulated as follows: given an image from a camera, allow the computer to answer questions about the contents of that image. Such questions can range from "is there a triangle in the image" or "is there a human face in the image" to more complex instances such as "is there a dog chasing a cat in the image". Although many of … Read more...
Once upon a time, in the 1980s, there was a magical place called Silicon Valley. Wonderful things were about to happen there and many people were about to make a ton of money. These things were all related to the miracle of a computer and how it would revolutionize pretty much everything.
Computers had a ton of applications in front of them: completely overhauling office work, enabling entertainment via computer games and changing the way we communicate, shop and use the banking system. But back then they were clumsy, slow and expensive. And although the hope was there, many of these things wouldn't be accomplished unless computers somehow got orders of magnitude faster and cheaper.
But there was Moore's law: over the decade of the 1970s, the number of transistors in an integrated circuit doubled every ~18 months. If this law were to hold, the future would be rosy and beautiful. The applications that the markets were waiting for would be unlocked. Money was to be made.
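To get a feel for what a doubling every ~18 months implies, here is a minimal sketch of the arithmetic. The function name `growth_factor` is made up for this illustration, and the 18-month period is simply the figure quoted above, not a measured value.

```python
def growth_factor(months: float, doubling_period_months: float = 18.0) -> float:
    """Multiplicative growth after `months`, assuming one doubling per period."""
    return 2.0 ** (months / doubling_period_months)


# Ten years is 120 months, i.e. 120 / 18 = 6.67 doublings.
decade = growth_factor(120)
print(f"Growth in transistor count over one decade: ~{decade:.0f}x")
```

Under that assumption, a single decade gives roughly a hundredfold increase, which is the kind of "orders of magnitude" improvement the applications were waiting for.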
By the mid-1990s it was clear that it worked. Computers were getting faster and software was getting more complex so rapidly that upgrades had to happen on a yearly basis to keep up … Read more...
It has become a tradition that I write a quick update on the state of self-driving car development every year when the California DMV releases its disengagement data [2017 post here, 2018 post here]. 2018 was an important year for self-driving, as we saw the first fatal accident caused by an autonomous vehicle (the infamous Uber crash in Arizona).
Let me start with a disclaimer: I plot disengagements against human crashes and fatalities not because it is a good comparison, but because this is the only comparison we have. There are many reasons why this is not the best measure and, depending on the reason, the actual "safety" of AVs may be either somewhat better or significantly worse than indicated here. Below are some of my reasons:
- A disengagement is a situation in which a machine cannot be trusted and the human operator takes over to avoid any danger. The precise definition under California law is:
“a deactivation of the autonomous mode when a failure of the autonomous technology is detected or when the safe operation of the vehicle requires that the autonomous vehicle test driver disengage the autonomous mode and take immediate manual
… Read more...
Every rule of thumb in data science has a counterexample. Including this one.
In this post I'd like to explore several simple, low-dimensional examples that expose how our typical intuitions about the geometry of data may be fatally flawed. This is generally a practical post, focused on examples, but there is a subtle message I'd like to convey. In essence: be careful. It is easy to draw data-based conclusions which are totally wrong.
Dimensionality reduction is not always a good idea
It is a fairly common practice to reduce the input data dimension via some projection, typically principal component analysis (PCA), to get lower-dimensional, more "condensed" data. This often works fine, as the directions along which data is separable often align with the principal axes. But this does not have to be the case; see the synthetic example below:
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from scipy.stats import ortho_group
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import matplotlib.pyplot as plt
N = 10 # Dimension of the data
M = 500 # Number of samples
# Random rotation matrix
R = ortho_group.rvs(dim=N)
# Per-axis data variances, sorted in descending order
variances = np.sort(np.random.rand(N))[::-1]
… Read more...
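The effect described above can be reproduced in a few lines. The following is a self-contained sketch, not the original example: it assumes a two-class dataset whose class signal sits on a single low-variance axis, uses a QR decomposition instead of `ortho_group` for the random rotation, and scores separability with a simple nearest-centroid rule (`nearest_centroid_accuracy` is a helper made up for this illustration) rather than an MLP.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 10, 500  # dimension of the data, number of samples

labels = rng.integers(0, 2, size=M)
# High-variance axes carry no class information at all.
stds = np.linspace(3.0, 1.0, N - 1)
X = rng.normal(size=(M, N - 1)) * stds
# The class signal lives on one extra axis with very small variance.
signal = 0.2 * (2 * labels - 1) + 0.02 * rng.normal(size=M)
X = np.column_stack([X, signal])

# Random rotation (orthogonal matrix from a QR decomposition).
Q, _ = np.linalg.qr(rng.normal(size=(N, N)))
X = X @ Q

# PCA via SVD of the centered data; rows of Vt are the principal axes,
# ordered by decreasing variance.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T


def nearest_centroid_accuracy(Z, y):
    """Training accuracy of a nearest-class-centroid rule on features Z."""
    c0, c1 = Z[y == 0].mean(axis=0), Z[y == 1].mean(axis=0)
    d0 = np.linalg.norm(Z - c0, axis=1)
    d1 = np.linalg.norm(Z - c1, axis=1)
    return np.mean((d1 < d0) == (y == 1))


acc_top = nearest_centroid_accuracy(scores[:, :2], labels)
acc_last = nearest_centroid_accuracy(scores[:, -1:], labels)
print(f"top-2 components: {acc_top:.2f}, last component: {acc_last:.2f}")
```

In this construction, keeping only the leading principal components leaves the classes close to indistinguishable, while the component PCA would discard first separates them almost perfectly.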
Elon Musk is a polarizing figure. His ideas frequently come up in casual conversations. People are often amused and impressed by his achievements. I must admit, a few years back I thought he was literally the next Steve Jobs, only actually better, since he was onto so many things... I admired SpaceX, thought that Tesla cars had many great solutions in them...
At some point in 2015 or 2016 Elon started talking outrageous stuff in the domain of AI, a domain of my own expertise, which I could tell right away was total bullshit. And then I began looking at all this stuff in detail. Doing some math here and there. Reading various opinions. As a result, my opinion on Musk and many of his ideas has changed somewhat substantially. At this point, I can pretty much say with confidence that 90% of his stuff is utter BS, and the remaining 10% is perhaps impressive but still questionable.
Nevertheless he is quite a character with many fans almost religiously believing everything he says. Any time I meet somebody who is a Musk fan I have to go over these issues so I decided to write this post as a point … Read more...
Almost six months ago (May 28th 2018) I posted the "AI winter is well on its way" post that went viral. The post amassed nearly a quarter million views, got picked up in Bloomberg, Forbes, Politico, VentureBeat, BBC, Datascience Podcast and numerous other smaller media outlets and blogs [1, 2, 3, 4, ...], and triggered violent debate on Hacker News and Reddit. I could not have anticipated this post to be so successful and hence I realized I touched on a very sensitive subject. One can agree with my claims or not, but the sheer popularity of the post almost itself serves as proof that something is going on behind the scenes and people are actually curious and doubtful whether there is anything solid behind the AI hype.
Since the post made a prediction that the AI hype is cracking (particularly in the space of autonomous vehicles) and that as a result we will have another "AI winter" episode, I decided to periodically go over those claims, see what has changed and bring some new evidence.
First of all a bit of clarification: some readers have … Read more...
There are many, many deep learning models out there doing various things. Depending on the exact task they are solving, they may be constructed differently. Some will use convolution followed by pooling. Some will use several convolutional layers before there is any pooling layer. Some will use max-pooling. Some will use mean-pooling. Some will have dropout added. Some will have a batch-norm layer here and there. Some will use sigmoid neurons, some will use half-rectifiers. Some will classify and therefore optimize for cross-entropy. Others will minimize mean-squared error. Some will use unpooling layers. Some will use deconvolutional layers. Some will use stochastic gradient descent with momentum. Some will use Adam. Some will have ResNet layers, some will use Inception. The choices are plentiful (see e.g. here).
Reading any of these particular papers, one is faced with a set of choices the authors had made, followed by the evaluation on the dataset of their choice. The discussion of choices typically refers strongly to papers where given techniques were first introduced, whereas the results section typically discusses in detail the previous state of the art. The shape of the architecture is often broken down into obvious and non-obvious decisions. … Read more...
In some recent email exchanges I've realized that when people by some coincidence make it to this blog, they rarely end up visiting my main website, and even if they do, they rarely browse through the teaching materials. This is not really a complaint, I hardly ever visit my website myself, but there are some materials there that I go back to every once in a while (though I have copies on my laptop). These are the lecture notes I made for a lecture on mathematical foundations of neuroscience.
As a bit of background, in 2009, after I defended my PhD and before I joined Brain Corporation, I was briefly an Adjunct Professor at the Faculty of Mathematics and Computer Science of Nicolaus Copernicus University in Torun. During that time I decided to refresh everything I had gathered about the mathematics of neuroscience and prepare a lecture series complete with exercises, lots of pictures, graphs, and all the necessary theory. And even though 9 years have passed since then, the lectures hold up pretty well, so why not bring that content to a broader audience?
The lecture consists of 15 main pdf presentations, a number of sample exercises as well … Read more...
Since it is fashionable these days to compare the performance of connectionist models with humans (even though these models, often referred to as deep learning, only stand a chance of competing with humans in extremely narrow contests), there is a popular belief that these models, powered by modern GPUs, somehow approach the computational power of the human brain.
Now the latter is really not defined, since we don't even know how brains work, and therefore it is extremely hard to estimate at which level of abstraction to assign the fundamental computation, but we can still play with some numbers just to get some vague idea of where we are.
So let us start with neurons: the average human brain has roughly 80 billion neurons. The popular belief is that neurons are responsible for the function of the brain, but there are plenty of other cells there, called glia, whose function is not yet understood. So it is very likely there are actually orders of magnitude more cells that somehow realize the computational function, but for now let us stick to the "official" 80B figure.
Each of these neurons is an extremely complex cell, with membrane, electrochemical dynamics of action potentials … Read more...
There has been a lot of stuff going on recently and I've been super busy. I have a few posts in an early stage of development and a few ideas in the pipeline, but it will likely take me quite some time before I get this stuff to a state in which it would be readable.
In the meantime, by complete coincidence I've learned that the 2017 PVM talk I gave at the University of California, Merced is actually available online. It was a very good visit, organised by Chris Kello, David Noelle and others. I had some good chats with these guys and with Jeff Yoshimi (author of simbrain) among others. Somehow I did not realize the talk was recorded... Anyway, here it is, better late than never I guess. Since I generally hate to listen to myself, I had to increase the playback speed to 2.0, at which point it actually sounded OK, so I recommend those settings (plus it only takes 50% of the time).
Slides are available here.
… Read more...
I read a lot of deep learning papers, typically a few per week. I've probably read several thousand papers. My general problem with papers in machine learning or deep learning is that they often sit in some strange no man's land between science and engineering; I call it "academic engineering". Let me describe what I mean:
- A scientific paper IMHO, should convey an idea that has the ability to explain something. For example a paper that proves a mathematical theorem, a paper that presents a model of some physical phenomenon. Alternatively a scientific paper could be experimental, where the result of an experiment tells us something fundamental about the reality. Nevertheless the central point of a scientific paper is a relatively concisely expressible idea of some nontrivial universality (and predictive power) or some nontrivial observation about the nature of reality.
- An engineering paper shows a method of solving a particular problem. Problems may vary and depend on an application, sometimes they could be really uninteresting and specific but nevertheless useful for somebody somewhere. For an engineering paper, things that matter are different than for a scientific paper: the universality of the solution may not be of paramount importance. What matters
… Read more...
In recent weeks I've been forced to reformulate and distill my views on AI. After my winter post went viral, many people contacted me over email and on Twitter with many good suggestions. Since there is now more attention to what I have to offer, I decided to write down in a condensed form what I think is wrong with our approach to AI and what we could fix. Here are my 10 points:
- We are trapped by Turing's definition of intelligence. In his famous formulation Turing defined intelligence as a solution to a verbal game played against humans. This in particular (1) sets intelligence as a solution to a game, and (2) puts humans in the judgement position. This definition is extremely deceptive and has not served the field well. Dogs, monkeys, elephants and even rodents are very intelligent creatures but are not verbal and hence would fail the Turing test.
- The central problem of AI is Moravec's Paradox. It is vastly more stark today than it was when it was originally formulated in 1988, and the fact that we've done so little to address it over those 30 years is embarrassing. The central thesis of the paradox is
… Read more...
My previous post on AI winter went viral almost to the point of killing my Amazon instance (it got well north of 100k views). It triggered serious tweet storms and lots of discussion on Hacker News and Reddit. From this empirical evidence one thing is clear - whether the AI winter is close or not, it is a very sensitive and provocative subject. Almost as if many people felt something under their skin...
Anyway, in this quick followup post, I'd like to respond to some of the points and explain some misunderstandings.
Hype is not fading, it is cracking.
First off, many citations to my post were put in the context that the AI hype is fading. This was not my point at all. The hype is doing very well. Some of the major propagandists have gone quieter but, much like I explained in the post, on the surface everything is still nice and colorful. You have to look below the propaganda to see the cracks. It would actually be great if the hype faded, but that is not how it works. When the stock market crashes, it is not like everybody slowly begins to admit that they overpaid for … Read more...
Deep learning has been at the forefront of the so-called AI revolution for quite a few years now, and many people believed that it was the silver bullet that would take us to the world of wonders of technological singularity (general AI). Many bets were made in 2014, 2015 and 2016, when new boundaries were still being pushed, such as with AlphaGo. Companies such as Tesla were announcing through the mouths of their CEOs that a fully self-driving car was very close, to the point that Tesla even started selling that option to customers [to be enabled by a future software update].
It is now mid-2018 and things have changed. Not on the surface yet: the NIPS conference is still oversold, corporate PR still has AI all over its press releases, Elon Musk still keeps promising self-driving cars and Google's CEO keeps repeating Andrew Ng's slogan that AI is bigger than electricity. But this narrative is beginning to crack. And as I predicted in my older post, the place where the cracks are most visible is autonomous driving - an actual application of the technology in the real world.
The dust settled on deep learning
When … Read more...
I have long been fascinated with the mysterious black holes. Over the years I've been following the literature and improved my mathematical skills to better understand what we know about these objects. Over the past several years I followed several heated debates related to numerous paradoxes that our understanding of black holes had caused. Here I'd like to present a few issues I have with our contemporary understanding of the subject. If you are a black hole specialist, I will appreciate feedback.
The existence of black holes is a straightforward result of the theory of general relativity (in fact it is conceivable even in classical Newtonian mechanics). In essence the observation is that an object dense enough would eventually have an escape velocity equal to the speed of light, at which point it becomes black (since it cannot radiate anything out) and anything that happens to get trapped inside it has no hope of getting out, or at least has the same hope of getting out as we may have of traveling faster than light. The solution for that particular object was first put forward by Karl Schwarzschild, who observed that there is a particular size/radius below … Read more...
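The Newtonian version of this argument can be made concrete with a short sketch. Setting the classical escape velocity sqrt(2GM/r) equal to the speed of light gives r = 2GM/c^2, which happens to coincide with the Schwarzschild radius from general relativity. The function names below are made up for this illustration, and the solar mass is an approximate value.

```python
import math

G = 6.67430e-11      # gravitational constant, m^3 kg^-1 s^-2
C = 299_792_458.0    # speed of light, m/s
M_SUN = 1.98847e30   # solar mass, kg (approximate)


def escape_velocity(mass_kg: float, radius_m: float) -> float:
    """Classical escape velocity from the surface of a spherical body."""
    return math.sqrt(2.0 * G * mass_kg / radius_m)


def schwarzschild_radius(mass_kg: float) -> float:
    """Radius at which the classical escape velocity equals c."""
    return 2.0 * G * mass_kg / C**2


r_s = schwarzschild_radius(M_SUN)
print(f"Radius for one solar mass: {r_s / 1000:.2f} km")
print(f"Escape velocity there, relative to c: {escape_velocity(M_SUN, r_s) / C:.6f}")
```

For one solar mass this comes out at roughly 3 km, so the Sun would have to be compressed to about that radius before light could no longer escape in this simplified picture.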