Some new ideas for AI benchmarks

Those who regularly read my blog know that I'm somewhat skeptical of the current AI "benchmarks" and of whether they serve the field well. In particular, I think the lack of a definition of intelligence is the major elephant in the room. As evidence that this apparently is not a widely recognized issue, take this recent Twitter thread:

Aside from the broader context of this thread, which discusses evolution and learning, Ilya Sutskever, one of the leading deep learning researchers, is expressing a nice-sounding empirical approach: we don't have to argue, we can just test. Well, as should be clear from my reply, I don't think this is really the case. I have no idea what Sutskever means by "obviously more intelligent". Do you? Does he mean a better ability to overfit existing datasets? Playing yet another Atari computer game? I find this attitude prevalent in the circles associated with deep learning, as if the field had a very well-defined empirical measurement foundation. Quite the opposite is true: the field is driven by a dogma that a "dataset" (blessed as standard in the field by some committee) and some God-given measure (put Hinton, LeCun or …

Autonomous vehicle safety myths and facts, 2018 update

A year ago I wrote a post summarizing the disengagement data that the state of California requires from companies developing autonomous vehicles. My thesis back then was that the achieved disengagement rates were not yet comparable to human safety levels. It is 2018 now and new data has been released, so it is perhaps a good time to revisit my claims.
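The comparison at the heart of this post is plain rate arithmetic: miles driven per disengagement for an AV fleet versus miles driven per adverse event (a crash, an injury or a fatality) for human drivers. Below is a minimal Python sketch of that calculation. The names (`FleetReport`, `miles_per_event`, `gap_to_human`) are hypothetical, no real figures are hard-coded (the inputs would come from the DMV reports and the crash statistics cited in this post), and treating one disengagement as comparable to one human-driver event is an assumption of this sketch, not necessarily the exact methodology behind the plots.

```python
from dataclasses import dataclass


@dataclass
class FleetReport:
    """Annual totals for one company (hypothetical fields mirroring a DMV report)."""
    company: str
    autonomous_miles: float
    disengagements: int

    def miles_per_disengagement(self) -> float:
        # A fleet with zero reported disengagements gets an unbounded rate.
        if self.disengagements == 0:
            return float("inf")
        return self.autonomous_miles / self.disengagements


def miles_per_event(total_miles: float, events: float) -> float:
    """Human-driver baseline: miles driven per event (e.g. per fatality)."""
    if events <= 0:
        raise ValueError("events must be positive")
    return total_miles / events


def gap_to_human(report: FleetReport, human_miles_per_event: float) -> float:
    """Ratio of the human baseline to the fleet's miles per disengagement.

    ASSUMPTION: one disengagement is treated as comparable to one
    human-driver event. A result above 1 means the fleet is still short
    of the chosen human baseline by that factor.
    """
    return human_miles_per_event / report.miles_per_disengagement()
```

A gap of, say, 10 would mean the fleet needs ten times more miles between disengagements to match the chosen human baseline. Which human baseline to use (fatalities, injuries or all police-reported crashes) changes the answer by orders of magnitude, so it should always be stated alongside the result.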

Let me first show the data:

And here, in a separate plot for better readability, is just Waymo, the unquestionable leader of the race (at least so far):


So where did the data come from? There are several sources:

  1. California DMV disengagement reports for 2017, 2016 and 2015
  2. Insurance Institute for Highway Safety (IIHS) fatality data
  3. RAND "Driving to Safety" report
  4. Bureau of Transportation Statistics

One can easily verify the numbers plotted above against these sources. Before starting any discussion, let's recall what California defines as a qualifying event:

“a deactivation of the autonomous mode when a failure of the autonomous technology is detected or when the safe operation of the vehicle requires that the autonomous vehicle test driver disengage the autonomous mode and take immediate manual control of