This was a paper I worked on during my PhD at the Max Planck Institute. It merged ideas from computer vision, experimental psychology and optimization to create the somewhat novel concept of Detectability.

Avoiding collisions is one of the central goals of driver assistance systems. Modern approaches envision self-driving cars that will do this automatically. A less automated approach has assistance systems that simply help the driver make optimal decisions. A key assumption is that drivers will try to avoid collisions with pedestrians as best they can. But what about the pedestrians that are hard to notice? If you don’t know that a pedestrian is there, you’re not going to avoid them successfully. But how likely is it that you’ll notice the pedestrian standing in the middle of the road with the red shirt? How probable is it that you’re going to miss the one in the grey shirt walking in between two cars? Enter the idea of Detectability.

Detectability

In its simplest form, detectability is the likelihood that you’ll notice an object in a scene “at a glance”. Given enough time, most human observers will find any pedestrian (or, for that matter, any object) in a scene. The classic “Where’s Waldo” search puzzles show this. But if you’re only given a short amount of time (on the order of a few hundred milliseconds), some pedestrians will be easily noticed by everyone while others are really hard to spot.

We took a dataset of annotated images (i.e. pictures taken in a city where all cars, pedestrians, etc. are already labelled) and showed them to test subjects in a lab setup. The participants had their heads fixed in front of the monitor, a street scene would briefly flash on the monitor, and the participants then had to click everywhere in the image where they had noticed pedestrians. With this data we could calculate, for every pedestrian in every street image, how likely it was that a human observer would notice them “at a glance”.
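To make the calculation concrete, here is a minimal sketch of how a per-pedestrian detectability score could be derived from click data. It assumes that a pedestrian counts as noticed when a participant clicks inside the annotated bounding box, and that detectability is the fraction of participants who did so. The `Box` class, the `detectability` function and the toy data are all made up for illustration; the paper's actual matching rule may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of one annotated pedestrian (pixel coordinates)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def detectability(boxes, clicks_per_participant):
    """Return, for each box, the fraction of participants who clicked inside it."""
    n = len(clicks_per_participant)
    if n == 0:
        raise ValueError("need at least one participant")
    scores = []
    for box in boxes:
        # A participant counts once per pedestrian, however often they clicked.
        hits = sum(
            any(box.contains(x, y) for x, y in clicks)
            for clicks in clicks_per_participant
        )
        scores.append(hits / n)
    return scores


# Toy data (assumed): two pedestrians, four participants.
boxes = [Box(100, 200, 140, 300), Box(400, 210, 430, 290)]
clicks = [
    [(120, 250), (415, 250)],  # participant 1 found both pedestrians
    [(118, 240)],              # participant 2 found only the first
    [(125, 260)],              # participant 3 found only the first
    [(130, 255), (50, 50)],    # participant 4 found the first, plus a false alarm
]
scores = detectability(boxes, clicks)
print(scores)  # [1.0, 0.25]
```

In this toy scene the first pedestrian is noticed by everyone, while the second is only noticed by one participant in four.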

Predicting detectability

Going one step further, we extracted a feature set for every pedestrian. We then trained a support vector regression model (those were the days before deep learning neural networks had won the ML wars) to predict the detectability of a pedestrian from its feature set. Besides the traditional image descriptors, several of our features were higher-level measures such as “distance to the original fixation point” or “distance to other pedestrians”.
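For readers who want to see what such a regression looks like in practice, the following is a rough sketch using scikit-learn's `SVR`. Everything about the data is an assumption: the three feature columns, the synthetic ground truth and the hyperparameters are invented for illustration and are not the features or settings from the paper.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in data: one row per pedestrian.
# Hypothetical columns: distance to fixation point, distance to the nearest
# other pedestrian, and one generic image descriptor value.
n_pedestrians = 200
X = rng.uniform(0.0, 1.0, size=(n_pedestrians, 3))

# Made-up ground truth: detectability drops with distance to fixation.
noise = rng.normal(0.0, 0.05, n_pedestrians)
y = np.clip(1.0 - 0.8 * X[:, 0] + 0.1 * X[:, 1] + noise, 0.0, 1.0)

# Simple hold-out split: first 150 pedestrians for training, rest for testing.
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# Standardize features, then fit an RBF-kernel support vector regressor.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.05))
model.fit(X_train, y_train)

# SVR output is unbounded, so clip predictions to the valid range [0, 1].
predicted = np.clip(model.predict(X_test), 0.0, 1.0)
mean_abs_error = float(np.mean(np.abs(predicted - y_test)))
print(f"mean absolute error on held-out pedestrians: {mean_abs_error:.3f}")
```

Scaling the features first matters for kernel methods like this one, since features on very different scales would otherwise dominate the distance computation.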

Interestingly, this meant that we could then use the learned function to predict the optimal fixation point (i.e. where the test subject should focus before the street scene is flashed) to maximize the detectability of all pedestrians in the scene.
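The idea can be sketched as a search over candidate fixation points. The snippet below is only an illustration under stated assumptions: `toy_detectability` is a hypothetical stand-in for the learned regression function (it only looks at the distance to the fixation point), the objective is assumed to be the mean predicted detectability over all pedestrians, and the search is a plain grid search. The optimization actually used in the paper may well have been different.

```python
import math


def toy_detectability(pedestrian, fixation):
    """Hypothetical stand-in for the learned regression function.

    Assumes detectability decays with distance from the fixation point;
    the real function used many more features.
    """
    distance = math.dist(pedestrian, fixation)
    return math.exp(-distance / 300.0)


def best_fixation(pedestrians, width, height, step=20, predict=toy_detectability):
    """Grid-search the fixation point with the highest mean predicted detectability."""
    best_point, best_score = None, -1.0
    for x in range(0, width + 1, step):
        for y in range(0, height + 1, step):
            score = sum(predict(p, (x, y)) for p in pedestrians) / len(pedestrians)
            if score > best_score:
                best_point, best_score = (x, y), score
    return best_point, best_score


# Toy scene (assumed): pedestrian centers in pixel coordinates, 800x600 image.
pedestrians = [(100, 300), (140, 320), (600, 280)]
point, score = best_fixation(pedestrians, width=800, height=600)

# Compare against simply fixating on the center point between all pedestrians.
centroid = (
    sum(p[0] for p in pedestrians) / len(pedestrians),
    sum(p[1] for p in pedestrians) / len(pedestrians),
)
centroid_score = sum(toy_detectability(p, centroid) for p in pedestrians) / len(pedestrians)

print("best fixation:", point, "mean predicted detectability:", round(score, 3))
print("centroid:", centroid, "mean predicted detectability:", round(centroid_score, 3))
```

Because the centroid of this toy scene lies on the search grid, the grid search can never score worse than the centroid here; with a richer predictor the two points can differ considerably.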

Predicted optimal fixation point (blue) and center point between all pedestrians (red). The blue one led to higher detection rates.

We then got a new set of test subjects into the lab and showed them the same images as the first set of participants, but this time asked them to fixate on the predicted optimal point. As it turns out, this actually improved the detectability of pedestrians, meaning that the positions of more pedestrians were correctly reported in the second experiment.

The results show that the fixation point predicted by our algorithm outperforms both fixating on the center point between all pedestrians and the baseline (fixating on the center of the screen).

Abstract

How likely is it that a driver notices a person standing on the side of the road? In this paper we introduce the concept of pedestrian detectability. It is a measure of how probable it is that a human observer perceives pedestrians in an image. We acquire a dataset of pedestrians with their associated detectabilities in a rapid detection experiment using images of street scenes. On this dataset we learn a regression function that allows us to predict human detectabilities from an optimized set of image and contextual features. We exploit this function to infer the optimal focus of attention for pedestrian detection. With this combination of human perception and machine vision we propose a method we deem useful for the optimization of Human-Machine Interfaces in driver assistance systems.

Citation

D. Engel and C. Curio, “Detectability Prediction for Increased Scene Awareness,” in IEEE Intelligent Transportation Systems Magazine, vol. 5, no. 4, pp. 146-157, winter 2013, doi: 10.1109/MITS.2013.2272473

PDF for download

IEEE