Optimality, technology and fragility

This post is a bit of a mixed bag: something about technology and fragility, a bit about AI and a tiny bit about politics. You've been warned.


Back in communist and then early capitalist Poland, where I grew up, one could often get used Soviet equipment such as optics, power tools etc. At the time these things were relatively cheap and had a reputation for being very sturdy and essentially unbreakable (often described with the pseudo-Russian "gniotsa nie łamiotsa", which essentially meant you could "bend it and it would not break"). There are multiple possible reasons why that equipment was so sturdy. One hypothesis is that Soviet factories could not control the quality of their steel very well, so the designers had to build additional margin into their designs. When the materials actually turned out to be of high quality, such over-engineered parts would be extra strong. Another explanation is that some of that equipment was ex-military and therefore designed with an extra margin. Either way, in the early 90s these often heavy and over-engineered products were contrasted with modern, optimized, Western-made things. Western stuff was obviously better designed and optimized, lighter, but as soon … Read more...

When the tail is bigger than the dog

Most people (at least those with a college education) are well aware of how exponential growth works. The typical (correct) intuition is that when things grow exponentially, they may initially look like nothing, and in fact may progress very slowly for quite a while, but eventually there is an explosion, and exponential growth outpaces everything sub-exponential. What is less commonly appreciated is that exponential decay works similarly: things exist, get smaller and, at some point, effectively become nonexistent. It is almost as if there were a discrete transition. Let us keep that in mind while we discuss some probability theory below.
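To make the "eventually outpaces" part concrete, here is a small sketch that finds where an exponential first overtakes a polynomial. The base 2 and the power 10 are arbitrary illustrative choices; Python integers make the comparison exact:

```python
def crossover(base: int, power: int, start: int = 2) -> int:
    """Smallest integer n >= start with base**n > n**power.

    Python integers are arbitrary precision, so the comparison is exact.
    """
    n = start
    while base ** n <= n ** power:
        n += 1
    return n


n = crossover(2, 10)
print(n)  # 59
# Up to n = 58 the polynomial n**10 is larger; from n = 59 on, 2**n stays ahead.
print(2 ** 58 <= 58 ** 10, 2 ** 59 > 59 ** 10)  # True True
```

Read in reverse, the same numbers describe decay: past that point 2**-n is smaller than n**-10 and only falls further behind, which is exactly the kind of tail comparison that matters below.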

Gauss and Cauchy

Gauss and Cauchy were two very famous mathematicians, both with countless contributions to various areas of mathematics. Coincidentally, two similar-looking probability distributions are named after them. And although many people working in data science and engineering have a relatively good understanding of the Gaussian distribution (otherwise known as the "normal" distribution), the Cauchy distribution is less well known. It is also a very interesting beast, as it is much less "normal" than the Gaussian, and most intuitions from typical statistics fail in the context of Cauchy. Although Cauchy-like distributions are … Read more...
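One way to see how differently the two distributions behave is to compare their tails, i.e. the probability of landing more than t away from the center. The sketch below uses the standard versions of both (location 0, scale 1) and their closed-form tail probabilities; the values of t are arbitrary:

```python
import math


def gaussian_tail(t: float) -> float:
    """P(|X| > t) for a standard normal X (mean 0, variance 1)."""
    return math.erfc(t / math.sqrt(2))


def cauchy_tail(t: float) -> float:
    """P(|X| > t) for a standard Cauchy X (location 0, scale 1)."""
    return 1 - 2 / math.pi * math.atan(t)


for t in (1, 3, 5, 10):
    print(f"t={t:>2}  gaussian={gaussian_tail(t):.3e}  cauchy={cauchy_tail(t):.3e}")
```

The Gaussian tail falls off roughly like exp(-t**2/2), so by t = 10 it is below 1e-20, while the Cauchy tail decays only like 2/(pi*t), leaving about 6% of the probability mass beyond |x| > 10. Heavy tails like this are also why the Cauchy distribution has no mean: the average of n standard Cauchy samples is again standard Cauchy, no matter how large n is.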

The paradox of reproducibility in statistical research


One of the hallmarks of science is the reproducibility of results. It lies at the very foundation of our epistemology that the objectivity of a result can only be assured if others are able to independently reproduce the experiment.

One could argue that science today actually has various issues with reproducibility, e.g. results obtained with a unique instrument (such as the LHC, the Large Hadron Collider) cannot be reproduced anywhere else, simply because nobody has another such instrument. At least in this case the results are reproducible in principle, and aside from the lack of another instrument, the basic scientific methodology remains intact. Things get a bit more hairy with AI.

Determinism, reproducibility and randomness

The one hidden assumption behind reproducibility is that reality is roughly deterministic, and that the results of the experiment depend deterministically on the experimental setup. After carefully tuning the initial conditions, we expect the same experimental result. But things get more complex when the experiment itself is statistical in nature and relies on a random sample.

Take, for example, the experiment called elections: once performed, it cannot be reproduced, since the outcome of the first experiment substantially affects the system under study … Read more...
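A statistical experiment can be "reproduced" only in the sense of drawing a new random sample, which gives a somewhat different number. The sketch below simulates a hypothetical poll (the 52% true support and the 1000 respondents are made-up values). Repeated runs disagree by roughly the standard error, while fixing the random seed reproduces the computation exactly, but not the experiment:

```python
import math
import random


def run_poll(true_support: float, n: int, seed: int) -> float:
    """One 'experiment': ask n random voters and return the fraction in favour."""
    rng = random.Random(seed)
    return sum(rng.random() < true_support for _ in range(n)) / n


def standard_error(p: float, n: int) -> float:
    """Typical spread of a sample proportion around the true value p."""
    return math.sqrt(p * (1 - p) / n)


TRUE_SUPPORT = 0.52  # hypothetical population value
runs = [run_poll(TRUE_SUPPORT, 1000, seed) for seed in range(5)]
print(runs)  # five different answers to the "same" experiment
print(standard_error(TRUE_SUPPORT, 1000))  # about 0.016
print(run_poll(TRUE_SUPPORT, 1000, seed=3) == run_poll(TRUE_SUPPORT, 1000, seed=3))  # True
```

So "reproducing" a statistical result really means checking agreement within sampling error, not equality of numbers.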

Problems with measuring performance in Machine Learning

With today's advances in AI we often see media reports of superhuman performance in some task. These often quite dramatic announcements should, however, be treated with a dose of skepticism, as many of them may result purely from pathologies in the measures applied to the problem. In this post I'd like to show what I mean by a "measurement pathology". To that end I constructed a simple example, which will hopefully help get the point across.

Example: measuring lemons

Imagine somebody came to your machine learning lab/company with the following problem: identify lemons in a photo. This problem sounds clear enough, but in order to build an actual machine learning system that accomplishes such a task, we have to formalize what it means in the form of a measure (of performance). This typically begins with some student laboriously labeling the dataset. For the sake of this example, my dataset consists of a single image with approximately 50 lemons in it:

As mentioned, the picture was carefully labeled:

And here is the human-labeled mask:

Now that there is a ground truth label we can establish a measurement. One way to formally express the desire to identify lemons in this picture … Read more...
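The post's own choice of measure is cut off here, so as a stand-in, here is a minimal sketch of one classic measurement pathology, assuming per-pixel accuracy as the measure (a hypothetical choice, not necessarily the one used in the post). When lemons cover only a small fraction of the image, a "detector" that never reports a lemon still scores highly, while intersection over union (IoU) exposes it. Masks are flat lists of booleans, and the 5% lemon fraction is made up:

```python
def pixel_accuracy(pred: list[bool], truth: list[bool]) -> float:
    """Fraction of pixels whose predicted label matches the ground-truth mask."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def iou(pred: list[bool], truth: list[bool]) -> float:
    """Intersection over union of the predicted and ground-truth lemon pixels."""
    inter = sum(p and t for p, t in zip(pred, truth))
    union = sum(p or t for p, t in zip(pred, truth))
    return inter / union if union else 1.0


# Hypothetical 100-pixel image in which 5 pixels belong to lemons.
truth = [True] * 5 + [False] * 95
all_background = [False] * 100  # a "detector" that never finds a lemon

print(pixel_accuracy(all_background, truth))  # 0.95
print(iou(all_background, truth))             # 0.0
```

Both numbers describe the same useless detector; which one gets reported decides whether it looks impressive.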

Silent immobility

I have a few AI-related posts in the pipeline, but before I publish them (most still need some work), I want to share my recent experience and some thoughts on it.

I just came back from a trip to Europe, a typical summer visit. The trip went fine, the children are happy, and the whole flight was uneventful. I spent a week there, back in my hometown, visiting friends and family. This time, however, I decided to pay attention to something different than usual: instead of focusing on the things that have changed, I decided to look for the things that remained the same.

It's been more than 7 years since I moved from Poland to California, and yet there are countless things there that do not seem to have changed at all, e.g. particular stores and institutions, my neighbors, bars and cafés etc. Wound up in the constant push for progress, we tend not to notice how many things appear to be frozen in time.


Now let me get to a concrete example of what I'm talking about: on my way there I obviously took an intercontinental flight with one of the major European airlines. A nice and neat Airbus A380 welcomed us at … Read more...

Autonomous car safety myths and facts

Regular readers may have gathered by now that I'm skeptical about the current self-driving car hype. To make things clear: this is not because I would not like to use a driverless car, or because I think it is fundamentally impossible. My skepticism stems merely from my concern that the technology we have right now is not mature enough for such an application. That includes both the fundamental technological primitives in the space of AI and economic feasibility. Also, the increasing hype and sensational press reports do not help the realistic, fact-based discussion that should take place.

The argument that is often repeated in the popular press and used by proponents of autonomous cars is that they will be much safer than humans. This argument is very potent and emotional, as nearly every one of us has had a relative killed in a car accident, and the number of these accidents is still too high (even though, in absolute terms, motor vehicle fatalities are very rare). I would certainly like to see safety improved by whatever means; the lowest-hanging fruits in this space, I think, are: better training and testing of drivers … Read more...
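The rarity of fatalities cuts both ways: it also makes the safety claim hard to verify. A back-of-the-envelope sketch, assuming fatalities follow a Poisson process and a human baseline of roughly 1.1 fatalities per 100 million miles (an assumed round figure, not taken from this post). If a fleet drives m miles with zero fatalities, a rate at least as high as the human one would produce that outcome with probability exp(-rate * m), so ruling it out at 95% confidence requires exp(-rate * m) <= 0.05:

```python
import math


def miles_to_bound_rate(rate_per_mile: float, confidence: float = 0.95) -> float:
    """Fatality-free miles needed before a rate >= rate_per_mile can be rejected.

    Poisson model: P(zero fatalities in m miles) = exp(-rate_per_mile * m).
    Solve exp(-rate_per_mile * m) = 1 - confidence for m.
    """
    return -math.log(1 - confidence) / rate_per_mile


HUMAN_RATE = 1.1e-8  # assumed: about 1.1 fatalities per 100 million miles
print(f"{miles_to_bound_rate(HUMAN_RATE) / 1e6:.0f} million miles")
```

Under these assumptions that is a few hundred million fatality-free miles just to show the fleet is not worse than the human baseline, and showing it is ten times safer would take ten times as many.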

Intelligence confuses the intelligent

This post is again about the definition of AI. It was provoked by a certain tweet exchange in which it turned out, once again, that everybody understands the term AI in their own way. So here we go again.


Let us deal again with this fundamental question: what is and what is not AI. Determining what is artificial is not the problem; the problem is determining whether something is intelligent. This has confused, confuses, and likely will confuse even very intelligent people.

Many application- and research-focused people, particularly in machine learning, avoid asking this question altogether, arguing that it is philosophical, undefined and therefore not scientific (and touching this matter inevitably causes a mess). Instead they use the equivalent of duck typing (if it looks intelligent, it is intelligent), a somewhat extreme extension of the Turing test. I disagree with this opportunistic approach: I think getting this definition right is crucial to the field, even if it means getting into another s%#t storm. In fact, if the machine learning people argue that this discussion is too informal and messy, I'd like to kindly suggest that it is their duty to formalize it, not to … Read more...

Inside of a nebula

I'm taking a break from AI in this short post; it's time for something more general about the universe [see the last post in this category, "what if we had a warp drive"].

In our daily activities we may not notice how lucky we are - we can see the sky. I mean the deep sky, even far beyond our Galaxy. And by looking at those things, we can learn that the Universe is expanding, that there are quasars, active galaxies, large scale cosmic structures, galaxy clusters, cosmic background radiation and many other marvels. We treat all that as obvious.

But imagine the Sun, along with the solar system, were trapped inside one of the countless dense nebulae in our Galaxy. Say we were trapped somewhere deep inside the Orion Nebula. All we would see in the night sky would be the faint pink glow of hydrogen and maybe a few blurred stars shining through the fog.

And best of all, since the nebula is many, many light years across, we could do nothing to see beyond it. Absolutely nothing. Discovering anything about the outside universe would require sending a probe light years … Read more...
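To put "sending a probe light years" into numbers, here is a rough sketch assuming a probe moving at roughly Voyager 1's speed (about 17 km/s relative to the Sun) and a hypothetical 10 light-year trip to the edge of the nebula:

```python
LIGHT_YEAR_KM = 9.4607e12      # kilometres in one light-year
SECONDS_PER_YEAR = 3.15576e7   # one Julian year in seconds
VOYAGER_SPEED_KM_S = 17.0      # assumed: roughly Voyager 1's speed


def travel_years(distance_ly: float, speed_km_s: float) -> float:
    """Years needed to cover distance_ly light-years at a constant speed_km_s."""
    return distance_ly * LIGHT_YEAR_KM / speed_km_s / SECONDS_PER_YEAR


one_way = travel_years(10, VOYAGER_SPEED_KM_S)
# The probe's report then needs another 10 years to come back at light speed.
print(f"probe: ~{one_way:,.0f} years, plus 10 years for the signal to return")
```

With these assumptions the trip alone takes well over a hundred thousand years.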

The complexity of simplicity - balancing on the Occam's razor

While rereading my recent post [the meta-parameter slot machine], as well as a few papers suggested by the readers in the comments, I've realized several things.

On the one hand we have Occam's razor: prefer the simplest model that explains the data. On the other hand we know that in order to build intelligence we need to create a very complex artifact (namely something like a brain) that has to contain lots of memories (parameters). There is an inherent conflict between these two constraints.

Many faces of overfitting

If we have a model that is too complex for the task, we will often find that it overfits, since it has the capacity to "remember the training set". But things may not be so obvious in reality. For example, there is another, counterintuitive situation in which overfitting may hit us: the model is clearly too simple to solve the task we have in mind, but the task as specified by the dataset is actually much simpler than we originally thought (and intended).

Let me explain this counterintuitive case with an example (an actual anecdote I heard from Simon Thorpe, as far as I remember):

Figure 1.
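The first, familiar kind of overfitting mentioned above, a model with enough capacity to "remember the training set", can be sketched in its most extreme form: a hypothetical lookup-table "model" trained on labels that are pure coin flips. It is perfect on the training data and no better than chance on anything new:

```python
import random


def make_data(start: int, n: int, rng: random.Random) -> list[tuple[int, bool]]:
    """Distinct integer inputs with coin-flip labels: there is nothing to learn."""
    return [(x, rng.random() < 0.5) for x in range(start, start + n)]


class LookupTable:
    """A maximally complex 'model' that stores every training example verbatim."""

    def fit(self, data: list[tuple[int, bool]]) -> "LookupTable":
        self.table = dict(data)
        return self

    def predict(self, x: int) -> bool:
        # Unseen inputs fall back to a constant guess.
        return self.table.get(x, False)


def accuracy(model: LookupTable, data: list[tuple[int, bool]]) -> float:
    return sum(model.predict(x) == y for x, y in data) / len(data)


rng = random.Random(0)
train = make_data(0, 1000, rng)
test = make_data(1000, 1000, rng)  # inputs disjoint from the training set
model = LookupTable().fit(train)
print(accuracy(model, train))  # 1.0
print(accuracy(model, test))   # close to 0.5, i.e. chance
```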

Who will figure out intelligence?

In my career I've encountered researchers in several fields who try to address the (artificial) intelligence problem. What I found, though, is that researchers within each of those fields had only a vague idea of all the others trying to answer the same question from a different perspective (in fact, initially I had only a very faint idea myself). In addition, following the best tradition of Sayre's law, there is often tension and competition between the researchers occupying their niches, resulting in violent arguments. I've had the chance to interact with researchers representing pretty much all of the disciplines I'll mention here, and as many readers of this blog may be involved in research in one or a few of them, I decided it might be worthwhile to introduce them to each other. For each community I'll try to explain (at least from my shallow perspective) the core assumption, the prevalent methodology, and the possible benefits and drawbacks of the approach, as well as a few representative works/examples (a purely subjective choice). My personal view is that the answer to the big AI question cannot be obtained within any single one of these disciplines, but will eventually be found somewhere between them, and … Read more...

The meta-parameter slot machine

Today we'll step back a bit and consider the psychology of a machine learning researcher at work, a subject which interests me deeply and one that I've already touched on in another post. Some of this comes from my own introspection, as I've been doing machine learning for quite a few years now.

Emails and ML models trigger dopamine

It is a well-known fact from biology that small achievements trigger the release of small amounts of dopamine, a neurotransmitter that is believed to be involved in reinforcement learning. The dopamine makes us feel good and also triggers plasticity in certain parts of the brain (likely allowing the brain to "remember" what behaviour led to the reward). Reinforcement learning, however, has its issues, since the reward can appear by coincidence and therefore reinforce the "wrong cause". This is very much visible these days with the Internet, emails and texts: since receiving an important and rewarding message reinforces the behaviour which led to it (most likely pressing the "get mail" button), we get addicted to checking email! The same applies to social media and texting, and it is also the mechanism underlying gambling. In reality rewards … Read more...

Give me a dataset to train, and I shall move the world

There is a widespread belief among the "artificial intelligentsia" that with the advent of deep learning, all it takes to conquer some new land (application) is to create a relevant dataset, get a fast GPU and train, train, train. However, as is often the case with complex matters, this approach has certain limitations and hidden assumptions. There are at least two important epistemological assumptions:

  1. Given a big enough sample from some distribution, we can approximate it efficiently with a statistical/connectionist model
  2. A statistical sample of a phenomenon is enough to automate/reason about/predict the phenomenon

Neither of these assumptions is universally correct.

Universal approximation is not really universal

There is a theoretical result known as the universal approximation theorem. In summary, it states that any continuous function on a compact domain can be approximated to arbitrary precision by an (at least) three-level composition of real functions, such as a multilayer perceptron with a single hidden layer of sigmoidal units. This is a mathematical statement, but a rather existential one. It does not say whether such an approximation would be practical or achievable with, say, gradient descent. It merely states that such an approximation exists. As with many such existential arguments, their applicability to the real world is limited. In the real world, we … Read more...
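The existential flavour of the theorem can be illustrated with a hand-built network, no training involved. The sketch below uses a standard constructive idea (not the theorem's general proof): one hidden layer of very steep sigmoids, each acting as a step, with output weights equal to the jumps of the target function between grid points. It does approximate sin on [0, 2π], but the error shrinks only as the number of hidden units grows, and nothing in the construction says how, or whether, gradient descent would find such weights. The target function, grid and steepness are illustrative choices:

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def staircase_net(f, a: float, b: float, n_units: int, steepness: float):
    """One-hidden-layer sigmoid network built by hand to approximate f on [a, b].

    Hidden unit i is a steep sigmoid centred between grid points i-1 and i;
    its output weight is the jump f(x_i) - f(x_{i-1}).
    """
    xs = [a + (b - a) * i / n_units for i in range(n_units + 1)]
    bias = f(xs[0])
    units = [(f(xs[i]) - f(xs[i - 1]), (xs[i - 1] + xs[i]) / 2)
             for i in range(1, n_units + 1)]

    def net(x: float) -> float:
        return bias + sum(w * sigmoid(steepness * (x - c)) for w, c in units)

    return net


def max_error(f, g, a: float, b: float, samples: int = 1000) -> float:
    """Largest |f - g| over an evenly spaced sample of [a, b]."""
    return max(abs(f(x) - g(x))
               for x in (a + (b - a) * k / samples for k in range(samples + 1)))


for n in (10, 100, 1000):
    net = staircase_net(math.sin, 0, 2 * math.pi, n, steepness=10 * n)
    print(n, max_error(math.sin, net, 0, 2 * math.pi))
```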

Self-made time capsule, part 2

In my previous post I described the hardware components of my self-made time capsule/home server. It consisted of an Intel NUC micro-PC, a managed Netgear 1 Gbps switch and an Edimax 802.11ac access point. Here I'll go over the basic configuration necessary to achieve the functionality I've mentioned.


I'm using Ubuntu 16.04 LTS (Long Term Support). It is a very decent Debian-based distribution and works very well on the Intel NUC. In this post I'll assume that Linux is already installed and all the hardware components are detected by the kernel (I had no issues whatsoever; it worked out of the box). The only thing that may be a problem on the NUC is Secure Boot: if it is enabled in the BIOS, it should be disabled before you install Linux. Also make sure the boot sequence in the BIOS makes sense. After you install Linux, it's a good idea to run an update to make sure all installed packages are the latest versions.

It is good to install a few essentials before we begin the setup (and possibly screw up our Internet connection):

sudo apt-get update
sudo apt-get install openssh-server git vim dnsmasq vlan



Self-made time capsule, part 1

This time it will not be about AI, nor will it be about sci-fi. It will be exactly about what the title says. So let's begin.

Since a certain incident in the late 90s involving an 850 MB drive, I'm quite paranoid about having backups. For many years this paranoia was satisfied by the Apple Time Capsule, a handy device that acts as a Wi-Fi router and network-attached storage and offers the Time Machine service to Mac computers over the AFP protocol. I have one back in Poland and I had one here in California, until one day in January 2017 the device suddenly died. I'd had it since 2010, so it served me well for quite a few years (I upgraded the drive to 3 TB in the meantime), but its death was still surprising and disappointing.

But what was even more disappointing was Apple's current offering in that segment. As mentioned, I bought my (back then 1.5 TB) Capsule in 2010; now it is 2017 and Apple offers... a 2 TB Capsule for $299 and a 3 TB Capsule for $399. This is ridiculous!

Ultimately, I decided to build one myself, and I'm very happy … Read more...

PVM on the GPU (dev update)

I've mentioned several times that the Predictive Vision Model (PVM) is not expressible in any of the current deep learning frameworks such as TensorFlow or Caffe (at least not easily, or efficiently for that matter). This is due to its inherent feedback and multi-scale structure. PVM is not an end-to-end trained system; it is a collection of intertwined sequence learners. That being said, I'm currently working in my free time to bring PVM to the GPU.

I'm not the most experienced person in the GPU programming domain, but I can definitely write a kernel and use the Nvidia profiler. So far my results look very encouraging: I can train more than 210 million float32 parameters at 21 fps on an Nvidia Titan X based on the Pascal architecture. In other words, that is 4.4 billion trained float32 parameters per second. This training performance matches that of deep learning models, where e.g. ResNet-50 with ~25 million parameters can be trained at approximately 100-150 samples/s (single GPU). In fact, my GPU utilisation is now close to 97% for most kernels. To some degree I feel that PVM will be even better suited for GPU implementation than end-to-end deep learning because of … Read more...
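The comparison above rests on a simple throughput figure: parameters trained per second = number of parameters × samples (frames) per second. A quick check of the numbers quoted in this post; treating this product as comparable across architectures is itself a rough simplification (an assumption of this sketch), since the work done per parameter differs between models:

```python
def param_throughput(params: float, samples_per_s: float) -> float:
    """Parameters trained per second: model size times training samples (frames) per second."""
    return params * samples_per_s


pvm = param_throughput(210e6, 21)          # PVM on a Pascal Titan X, figures from this post
resnet_low = param_throughput(25e6, 100)   # ResNet-50, lower end of the quoted range
resnet_high = param_throughput(25e6, 150)  # ResNet-50, upper end of the quoted range
print(f"PVM:       {pvm / 1e9:.2f} billion/s")
print(f"ResNet-50: {resnet_low / 1e9:.2f}-{resnet_high / 1e9:.2f} billion/s")
```

This gives about 4.41 billion per second for PVM against 2.5 to 3.75 billion per second for the quoted ResNet-50 range.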