Project description: Society’s need to understand big data, and visual data in particular, is constantly
growing. For instance, virtually all major public areas are covered by surveillance cameras producing video streams
that need to be analyzed on demand or even online to prevent major acts of crime or terrorism. This analysis relies
predominantly on fully automatic detection, tracking, and recognition of objects, e.g., people, and on predicting their
actions, behavior, and anomalies. Deep learning approaches have recently achieved previously unseen performance on
visual detection and recognition problems; here, convolutional networks are trained on enormous datasets,
e.g., ImageNet (more than 14 million images), which takes weeks even on powerful platforms. Owing to this success,
many vision problems have recently been addressed within the same framework. In contrast to many other
researchers who apply deep learning as a black box, we have addressed new problems, such as action recognition and
object tracking, by investigating the individual layers of these networks.
In the present project, we aim to make a major leap beyond current deep learning approaches. We will look into the
procedural fundamentals of the learning algorithm and investigate learning with respect to the overall task,
which often requires reinforcement learning rather than supervised learning. Furthermore, we will derive methods
for incremental and online learning, based on a deeper understanding of the semantic structure of the network.
This structural knowledge will also allow us to introduce regularization, constraints, invariances, and existing
models. Finally, we plan to go beyond feed-forward networks and look into recurrent network design and dynamic
schemes. Building on these novel approaches to deep learning and on our previous experience with visual object
tracking, we will address the problem of multiple object tracking. Visual object tracking requires online adaptation
of the tracked model; hence online deep learning is needed, and we will investigate approaches to constructing compact
and discriminative deep features. Tracking is typically assessed by region overlap, i.e., a success score between zero
and one that rates the outcome of the overall task rather than providing a supervised target for each network output,
thus requiring reinforcement learning. Multiple object tracking is typically an iterative process, which can be
implemented by a dynamic network with recurrent procedures. Finally, depending on the camera setup, prior knowledge
such as geometric constraints may be available and should be modelled into the solution as well.
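The region-overlap score used to assess tracking is commonly computed as the intersection-over-union (IoU) of the
predicted and the ground-truth region. The following minimal Python sketch assumes axis-aligned bounding boxes given
as (x, y, w, h); the function name region_overlap is hypothetical and not taken from any particular tracking benchmark
toolkit. A per-frame score of this kind is what a reinforcement-learning formulation could use as its reward.

```python
def region_overlap(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x, y, w, h).

    Assumption: boxes are axis-aligned rectangles with non-negative width/height.
    Returns a score in [0, 1]: 0 for disjoint boxes, 1 for identical boxes.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the intersection rectangle, clamped at zero if the boxes do not overlap.
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    # Union = sum of both areas minus the doubly counted intersection.
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 10x10 boxes shifted by 5 pixels in x and y.
    print(region_overlap((0, 0, 10, 10), (5, 5, 10, 10)))
```

For example, two 10×10 boxes offset by 5 pixels in both directions overlap in a 5×5 region, so the score is
25 / (100 + 100 - 25) = 25 / 175 ≈ 0.14.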