Every rule of thumb in data science has a counterexample. Including this one.
In this post I'd like to explore several simple and low dimensional examples that expose how our typical intuitions about the geometry of data may be fatally flawed. This is generally a practical post, focused on examples, but there is a subtle message I'd like to provide. In essence: be careful. It is easy to make data based conclusions which are totally wrong.
Dimensionality reduction is not always a good idea
It is a fairly common practice to reduce the input data dimension via some projection, typically via principal component analysis (PCA) to get a lower-dimensional, more "condensed" data. This often works fine, as often the directions along which data is separable align with the principal axis. But this does not have to be the case, see a synthetic example below:
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from scipy.stats import ortho_group
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import matplotlib.pyplot as plt
N = 10 # Dimension of the data
M = 500 # Number of samples
# Random rotation matrix
R = ortho_group.rvs(dim=N)
# Data variances
variances = np.sort(np.random.rand((N)))[::-1]
… Read more...
Almost six months ago (May 28th 2018) I posted the "AI winter is well on its way" post that went viral. The post amassed nearly a quarter million views and got picked up in Bloomberg, Forbes, Politico, Venturebeat, BBC, Datascience Podcast and numerous other smaller media outlets and blogs [1, 2, 3, 4, ...], triggered violent debate on Hacker news and Reddit. I could not have anticipated this post to be so successful and hence I realized I touched on a very sensitive subject. One can agree with my claims or not, but the sheer popularity of the post almost itself serves as a proof that something is going on behind the scenes and people are actually curious and doubtful if there is anything solid behind the AI hype.
Since the post made a prediction, that the AI hype is cracking (particularly in the space of autonomous vehicles) and as a result we will have another "AI winter" episode, I decided to periodically go over those claims, see what has changed and bring some new evidence.
First of all a bit of clarification: some readers have … Read more...
There are many many deep learning models out there doing various things. Depending on the exact task they are solving, they may be constructed differently. Some will use convolution followed by pooling. Some will use several convolutional layers before there is any pooling layer. Some will use max-pooling. Some will use mean-pooling. Some will have a dropout added. Some will have a batch-norm layer here and there. Some will use sigmoid neurons, some will use half-recitfiers. Some will classify and therefore optimize for cross-entropy. Others will minimize mean-squared error. Some will use unpooling layers. Some will use deconvolutional layers. Some will use stochastic gradient descent with momentum. Some will use ADAM. Some will have RESNET layers, some will use Inception. The choices are plentiful (see e.g. here).
Reading any of these particular papers, one is faced with a set of choices the authors had made, followed by the evaluation on the dataset of their choice. The discussion of choices typically refers strongly to papers where given techniques were first introduced, whereas the results section typically discusses in detail the previous state of the art. The shape of the architecture is often broken down into obvious and non obvious decisions. … Read more...
In some recent email exchanges I've realized that when people by some coincidence make it to this blog, they rarely end up visiting my main website, and even if they do, they rarely browse through the teaching materials. This is not really a complaint, I hardly ever visit my website myself, but there are some materials there that I go back to every once in a while (though I have copies on my laptop). These are the lecture notes I made for a lecture on mathematical foundations of neuroscience.
As a bit of a background, in 2009 after I defended my PhD and before I joined Brain Corporation I was briefly an Adjunct Professor at the Faculty of Mathematics and Computer Science Nicolaus Copernicus University in Torun. During that time I decided to refresh everything I gathered about mathematics of neuroscience and prepare a lecture series complete with exercises, lots of pictures, graphs, and all the necessary theory. And even though 9 years have passed since then, the lectures hold up pretty well, hence why not bring that content to a broader audience?
The lecture consists of 15 main pdf presentations, a number of sample exercises as well … Read more...
Since it is fashionable these days to compare the performance of connectionist models with humans (even though these models, often referred to as deep learning only stand a chance of competing with humans in extremely narrow contests), there is a popular belief that these models powered by modern GPU's somehow approach the computational power of the human brain.
Now the latter is really not defined, since we don't even know how brains work and therefore it is extremely hard to estimate at which level of abstraction to assign the fundamental computation but we can still play with some numbers just to get some vague idea of where are we.
So let us start with neurons: average human brain has roughly 80 billion neurons. The popular belief is that neurons are responsible for the function of the brain but there are plenty other cells there, called glia, whose function is not yet understood. So it is very likely there are actually orders of magnitude more cells that somehow realize the computational function, but for now let us stick to the "official" 80B figure.
Each of these neurons is an extremely complex cell, with membrane, electrochemical dynamics of action potentials … Read more...
There has been a lot of stuff going on recently and I've been super busy. I have a few posts in early stage of development and a few ideas in the pipeline but it will likely take me quite some time before I get this stuff to a state in which it would be readable.
In the meanwhile, by a complete coincidence I've learned that my 2017 PVM talk I gave a University of California Merced is actually available online. It was a very good visit, organised by Chris Kello, David Noelle and others. I had some good chats with these guys and with Jeff Yoshimi (author of simbrain) among others. Somehow I did not realize the talk was recorded... Anyway, here it is, better late than never I guess. Since I generally hate to listen to myself, I had to increase the playback speed to 2.0 at which point it actually sounded OK, so I recommend those settings (plus it only takes 50% of the time).
Slides are available here.
If you found an error, highlight it and press Shift + Enter or click here to inform us.… Read more...
I read a lot of deep learning papers, typically a few/week. I've read probably several thousands of papers. My general problem with papers in machine learning or deep learning is that often they sit in some strange no man's land between science and engineering, I call it "academic engineering". Let me describe what I mean:
- A scientific paper IMHO, should convey an idea that has the ability to explain something. For example a paper that proves a mathematical theorem, a paper that presents a model of some physical phenomenon. Alternatively a scientific paper could be experimental, where the result of an experiment tells us something fundamental about the reality. Nevertheless the central point of a scientific paper is a relatively concisely expressible idea of some nontrivial universality (and predictive power) or some nontrivial observation about the nature of reality.
- An engineering paper shows a method of solving a particular problem. Problems may vary and depend on an application, sometimes they could be really uninteresting and specific but nevertheless useful for somebody somewhere. For an engineering paper, things that matter are different than for a scientific paper: the universality of the solution may not be of paramount importance. What matters
… Read more...
In recent weeks I've been forced to reformulate and distill my views on AI. After my winter post went viral many people contacted me over email and on twitter with many good suggestions. Since there is now more attention to what I have to offer, I decided to write down in a condensed form what I think is wrong with our approach to AI and what could we fix. Here are my 10 points:
- We are trapped by Turing's definition of intelligence. In his famous formulation Turing confined intelligence as a solution to a verbal game played against humans. This in particular sets intelligence as a (1) solution to a game, and (2) puts human in the judgement position. This definition is extremely deceptive and has not served the field well. Dogs, monkeys, elephants and even rodents are very intelligent creatures but are not verbal and hence would fail the Turing test.
- The central problem of AI is Moravec's Paradox. It is vastly more stark today than it was when it was originally formulated in 1988 and the fact we've done so little to address it over those 30 years is embarrassing. The central thesis of the paradox is
… Read more...
My previous post on AI winter went viral almost to the point of killing my Amazon instance (it got well north of 100k views). It triggered a serious tweet storms, lots of discussion on hackernews and reddit. From this empirical evidence one thing is clear - whether the AI winter is close or not, it is a very sensitive and provocative subject. Almost as if many people felt something under their skin...
Anyway, in this quick followup post, I'd like to respond to some of the points and explain some misunderstandings.
Hype is not fading, it is cracking.
First off, many citations to my post were put in context that the AI hype is fading. This was not my point at all. The hype is doing very well. Some of the major propagandists have gone quieter but much like I explained in the post, on the surface everything is still nice and colorful. You have to look below the propaganda to see the cracks. It would actually be great if the hype faded down but that is not how it works. When the stock market crashes, it is not like everybody slowly begin to admit that they overpaid for … Read more...
Deep learning has been at the forefront of the so called AI revolution for quite a few years now, and many people had believed that it is the silver bullet that will take us to the world of wonders of technological singularity (general AI). Many bets were made in 2014, 2015 and 2016 when still new boundaries were pushed, such as the Alpha Go etc. Companies such as Tesla were announcing through the mouths of their CEO's that fully self driving car was very close, to the point that Tesla even started selling that option to customers [to be enabled by future software update].
We have now mid 2018 and things have changed. Not on the surface yet, NIPS conference is still oversold, the corporate PR still has AI all over its press releases, Elon Musk still keeps promising self driving cars and Google CEO keeps repeating Andrew Ng's slogan that AI is bigger than electricity. But this narrative begins to crack. And as I predicted in my older post, the place where the cracks are most visible is autonomous driving - an actual application of the technology in the real world.
The dust settled on deep learning
When … Read more...
Those who regularly read my blog are aware that I'm a bit skeptical of the current AI "benchmarks" and whether they serve the field well. In particular I think that the lack of definition of intelligence is the major elephant in the room. For a proof that this apparently is not a well recognized issue take this recent twitter thread:
Aside from the broader context of this thread discussing evolution and learning, Ilya Sutskever, one of the leading deep learning researchers, is expressing a nice sounding empirical approach: we don't have to argue, we can just test. Well, as it may clearly follow from my reply, I don't think this is really the case. I have no idea what Sutskever means by "obviously more intelligent" - do you? Does he mean better ability to overfit existing datasets? Play yet another Atari computer game? I find this approach prevalent in the circles associated with deep learning, as if this field had some very well defined empirical measurement foundation. Quite the opposite is true: the field is driven by a dogma that a "dataset" (blessed as standard in the field by some committee) and some God given measure (put Hinton, LeCun or … Read more...
A year ago I wrote a post summarizing the disengagement data that the state of California requires from the companies developing Autonomous Vehicles. The thesis of my post back then was that the achieved disengagement rates were not yet comparable to human safety levels. It is 2018 now and new data has been released to it is perhaps a good time to revisit my claims.
Let me first show the data:
And in a separate plot for better readability just Waymo, the unquestionable leader of that race (so far at least):
So where did that data came from? There are several sources:
- California DMV disengagement reports for years 2017, 2016 and 2015
- Insurance Institute for Highway Safety fatality data.
- RAND driving to safety report.
- Bureau of Transportation Statistics
One can easily verify the numbers plotted above with all of these sources. Now before we start any discussion let's recall what California defines as a qualifying event:
“a deactivation of the autonomous mode when a failure of the autonomous technology is detected or when the safe operation of the vehicle requires that the autonomous vehicle test driver disengage the autonomous mode and take immediate manual control of
… Read more...
In this post I'd like to present a slightly different take on AI and expose one dimension of Intelligence which we hardly explore with mainstream efforts. In order to do that, I'll use a metaphor, which should hopefully make things clear. As with every analogy, this one is also bound to be imperfect, but should be sufficient to get certain idea across.
The way I see progress in artificial intelligence (AI) could be summarized with the following (visual) metaphor:
Imagine that elevation symbolizes ability to accomplish certain level of performance in a given task, and each horizontal position (latitude/longitude) represents a task. Human intelligence is like a mountain, tall and rather flat, just like one of those buttes in Monument Valley. Let's call this mountain "Mt. Intelligence". A lot of horizontal space is covered by the hill (representing the vast amount of tasks that can be accomplished), less intelligent animals can be represented by lower hills covering different areas of task space.
In this setting our efforts in AI resemble building towers. They are very narrow and shaky, but by strapping together a bunch of cables and duct tape we can often reach elevation higher than the "human … Read more...
This post is a bit of a mixed bag about technology and fragility, a bit about AI and tiny bit on politics. You've been warned.
Back in the communist and then early capitalist Poland, where I grew up, one could often get used soviet equipment such as optics, power tools etc. Back in the day these things were relatively cheap and had the reputation of being very sturdy and essentially unbreakable (often described as pseudo Russian "gniotsa nie łamiotsa" which essentially meant you could "bend it and it would not break"). There are multiple possible reasons why that equipment was so sturdy, one hypothesis is that soviet factories could not control very well the quality of their steel and so the designers had to put in additional margin into their designs. When the materials actually turned out to be of high quality, such over engineered parts would then be extra strong. Other explanation is that some of that equipment was ex-military and therefore designed with an extra margin. Nevertheless, these often heavy and over-engineered products were contrasted in the early 90's with modern, optimized, western made things. Western stuff was obviously better designed and optimized, lighter, but as soon … Read more...
Most people (at least those with college education) are well aware of how exponential growth works. The typical (correct) intuition is that when things are growing exponentially, they may initially look like nothing, in fact things may go very slow for quite a while, but eventually there is an explosion and exponential growth eventually outpaces everything sub-exponential. What is less commonly appreciated is that exponential decay works similarly - things exist, get smaller and effectively at some point become nonexistent. It is almost as if there was a discrete transition. Let us keep that in mind while we discuss some probability theory below.
Gauss and Cauchy
Gauss and Cauchy were two very famous mathematicians, both having countless contributions in various areas of mathematics. Coincidentally, two seemingly similarly looking probability distributions are named after these two individuals. And although many people working in data science and engineering have relatively good understanding of Gaussian distribution (otherwise known as "normal" distribution), Cauchy distribution is less known. It is also a very interesting beast, as it is an example of a much less "normal" distribution than Gaussian and most intuitions from typical statistics fail in the context of Cauchy. Although Cauchy like distributions are … Read more...