Task 1: Deep architectures
Because the role played by the architecture is not well understood, its design often relies on know-how about combining fixed, plain, or convolutional layers. Structured data such as audio, images, or video call for specific models that encode the underlying signal models and constraints, such as invariance, and that exploit the independence structure in the data. In particular, we will work on architectures mixing two worlds (tensors and deep learning) and develop new deep learning algorithms for tensor-structured data. Special attention will be devoted to designing new deep architectures for forecasting large-scale spatiotemporal streams.
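One of the simplest ways the tensor and deep-learning worlds meet is replacing a dense layer's weight matrix with a low-rank factorization. The sketch below is purely illustrative (the dimensions and rank are invented for the example, not taken from any planned model) and shows the parameter saving and the factored forward pass:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dense layer: 512 inputs -> 512 outputs.
d_in, d_out, rank = 512, 512, 16

# Dense weight matrix: d_out * d_in parameters.
W = rng.standard_normal((d_out, d_in))

# Low-rank factorization W ~= U @ V: (d_out + d_in) * rank parameters.
U = rng.standard_normal((d_out, rank))
V = rng.standard_normal((rank, d_in))

dense_params = W.size
factored_params = U.size + V.size

x = rng.standard_normal(d_in)
# The factored forward pass applies V then U, never forming U @ V explicitly.
y = U @ (V @ x)

print(dense_params, factored_params)  # 262144 vs 16384: a 16x reduction
```

The same idea extends to higher-order decompositions (CP, Tucker, tensor-train) of convolutional kernels, which is the direction the task above targets.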

Task 2: Learning issues for deep nets
The most widely used approach for training deep networks is stochastic gradient descent coupled with many heuristics. Even if the basic setup of deep learning is well understood, skepticism has been voiced about what is sometimes referred to as “the black magic required for their effective training”, reflecting the lack of understanding of the nature of deep learning. We also propose to explore unsupervised strategies for learning deep representations by suitably exploiting different priors on the data (e.g., the Slow Feature Analysis principle). Unsupervised and supervised learning schemes will also be combined in a joint semi-supervised optimization of deep nets.
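The Slow Feature Analysis prior mentioned above prefers representations that vary slowly over time. A minimal sketch of the slowness criterion on two toy signals (the signals are invented for illustration; in practice the criterion is minimized over learned features of the data):

```python
import numpy as np

def slowness(features):
    """Mean squared temporal difference: the quantity SFA minimizes."""
    return np.mean(np.diff(features, axis=0) ** 2)

t = np.linspace(0, 2 * np.pi, 200)
slow_feature = np.sin(t)        # varies slowly over time
fast_feature = np.sin(20 * t)   # varies quickly over time

# The slowly varying signal scores better under the slowness prior.
print(slowness(slow_feature) < slowness(fast_feature))  # True
```

In a semi-supervised setting, a term like this can be added to a supervised loss so that unlabeled temporal data also shapes the representation.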

Task 3: Implementation issues and green deep learning
One of the main drawbacks of current deep learning solutions lies in the computational power required to train their millions of parameters. This requirement currently prevents embedded applications of such methods. Current research toward embedded deep learning relies mainly on designing new hardware, which is efficient for very specific tasks, in robotics for instance. We will consider more generic low-energy embedded platforms such as, for instance but not only, the NVIDIA Jetson series (Tegra K1, Tegra X1). Low-energy, high-computation clusters are already in development in other contexts. This will be our starting point for developing a new high-performance, energy-efficient cluster for deep learning.
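A software-side lever for green and embedded deployment, complementary to the hardware directions above, is post-training quantization of a trained network's weights. A minimal sketch (the weight count is invented for the example) of symmetric int8 quantization and the memory it saves:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical trained weights stored in float32.
w = rng.standard_normal(1_000_000).astype(np.float32)

# Symmetric linear quantization to int8: w ~= scale * q.
scale = np.abs(w).max() / 127.0
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_hat = scale * q.astype(np.float32)

memory_ratio = w.nbytes / q.nbytes  # 4.0: float32 -> int8
max_err = np.abs(w - w_hat).max()   # round-to-nearest error, at most scale / 2

print(memory_ratio)  # 4.0
```

A 4x memory reduction (and cheaper integer arithmetic) is often what makes inference feasible on platforms like the Jetson series.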

Task 4: Demonstration systems

  • Image search. We will combine convolutional and recurrent neural networks (e.g. LSTM) to synthesize natural language
    image descriptions (e.g. “two people walking in a park on a sunny day”).
  • Video forecasting. We will develop a deep architecture demonstrator dedicated to very-large scale structured data
    prediction with application on video forecasting.
  • Audio scene recognition. We plan to develop novel methodologies for everyday/urban sound event recognition;
    important breakthroughs can be expected in this area by capturing the short-term and long-term features that
    characterize these sound events.
  • Robotics and Transportation. We will develop new architectures for grasp detection and real-time object detection for
    smart drones and intelligent cars.
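The image-search demonstrator above follows the usual CNN-to-RNN captioning data flow: an image feature vector initializes a recurrent state, and words are decoded greedily. The sketch below uses random, untrained stand-in parameters and an invented toy vocabulary, so the output sequence is meaningless; it only illustrates the wiring:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["<end>", "two", "people", "walking", "park", "sunny"]
d_img, d_h = 8, 8

# Random stand-ins for the learned parameters of a CNN+LSTM captioner.
W_init = rng.standard_normal((d_h, d_img)) * 0.1      # image feature -> initial state
W_h = rng.standard_normal((d_h, d_h)) * 0.1           # recurrent transition
W_out = rng.standard_normal((len(vocab), d_h)) * 0.1  # state -> vocabulary logits

def greedy_caption(image_feature, max_len=10):
    h = np.tanh(W_init @ image_feature)  # condition the decoder on the image
    words = []
    for _ in range(max_len):
        word_id = int(np.argmax(W_out @ h))  # greedily pick the next word
        if vocab[word_id] == "<end>":
            break
        words.append(vocab[word_id])
        h = np.tanh(W_h @ h)  # advance the recurrent state
    return words

caption = greedy_caption(rng.standard_normal(d_img))
print(caption)
```

In the actual demonstrator, the image feature would come from a trained convolutional network, the recurrence would be an LSTM fed with the previously emitted word, and decoding would typically use beam search rather than greedy argmax.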