Human communication is inherently multi-modal, comprising not only the
linguistic utterance but also signals transmitted via the visual channel,
including gaze, gesture, and object manipulation, among others. When
robots are to interact with humans in real-world environments, within
concrete task-specific situations, it is essential to endow them with
capabilities for multi-modal natural language processing and

Based on an analysis of human-human communication in task-oriented
scenarios, we will discuss the natural language processing capabilities
and related mechanisms a robot should be equipped with. This talk gives
insights into current basic research in multi-modal natural language
understanding and related challenges.