Invited Speakers

Yiannis Aloimonos
University of Maryland, USA

Title of the Talk: The theory of Therbligs: A compositional approach to incremental robot learning

Michael Beetz
University of Bremen, Germany

Title of the Talk: DTKR&R — a simulation-based predictive modelling engine for cognition-enabled robot manipulation

Recent years have seen impressive progress in robot simulators and environments as fully developed software systems that provide simulations as a substitute for real-world activity. They are primarily used for training modules of robot control programs, which are, after completing the learning process, deployed on real-world robots. In contrast, simulation in (artificial) cognitive systems is a core cognitive capability: it is assumed to provide a "small-scale model of external reality and of its own possible actions within its head", with which the organism "is able to try out various alternatives, conclude which is the best of them, react to future situations before they arise, utilise the knowledge of past events in dealing with the present and future, and in every way to react in a much fuller, safer, and more competent manner to the emergencies which face it" (Craik, The Nature of Explanation, 1943). This means that simulation can be considered as an embodied, online predictive modelling engine that enables robots to contextualize vague task requests such as "bring me the milk" into concrete body motions that achieve the implicit goal and avoid unwanted side effects. In this setting, a robot can run small-scale simulation and rendering processes for different reasoning tasks all the time and can continually compare simulation results with reality – a promising Sim2Real2Sim setup that has the potential to create much more powerful robot simulation engines. We introduce DTKR&R, a robot simulation framework that is currently being designed and developed with this vision in mind.
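
To make the idea of simulation as an online predictive modelling engine more concrete, here is a minimal, hypothetical Python sketch (not the DTKR&R API; all functions, parameters, and scores are placeholders): the robot mentally simulates candidate motions for a vague request, picks the best one, and compares the prediction against the observed outcome in a Sim2Real2Sim loop.

```python
import random

# Hypothetical stand-ins; DTKR&R's real interface is not described in the
# abstract, so simulate() and execute() are assumed placeholder functions.
def simulate(plan):
    """Predict the outcome of a candidate motion plan in the internal simulator."""
    return {"goal_reached": random.random() > 0.3,
            "spill_risk": random.random()}          # unwanted side effect

def execute(plan):
    """Execute the plan on the real robot and return the observed outcome."""
    return {"goal_reached": True, "spill_risk": 0.1}

def score(outcome):
    # Prefer plans that achieve the goal and avoid side effects.
    return (1.0 if outcome["goal_reached"] else 0.0) - outcome["spill_risk"]

# "Bring me the milk": contextualize the vague request into candidate motions.
candidate_plans = [{"grasp": g, "speed": s}
                   for g in ("top", "side") for s in (0.2, 0.5)]

# Mental simulation: try out the alternatives and conclude which is best.
predictions = {i: simulate(p) for i, p in enumerate(candidate_plans)}
best = max(predictions, key=lambda i: score(predictions[i]))

# Sim2Real2Sim: execute the best plan and compare prediction with reality.
observed = execute(candidate_plans[best])
mismatch = abs(predictions[best]["spill_risk"] - observed["spill_risk"])
if mismatch > 0.2:
    print("Prediction diverged from reality; refine the simulation model.")
```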

Angelo Cangelosi
The University of Manchester, United Kingdom

Title of the Talk: Developmental Robotics for Language Learning, Trust and Theory of Mind

Growing theoretical and experimental research on action and language processing and on number learning and gestures clearly demonstrates the role of embodiment in cognition and language processing. In psychology and neuroscience, this evidence constitutes the basis of embodied cognition, also known as grounded cognition (Pezzulo et al. 2012). In robotics and AI, these studies have important implications for the design of linguistic capabilities in cognitive agents and robots for human-robot collaboration, and have led to the new interdisciplinary approach of Developmental Robotics, part of the wider Cognitive Robotics field (Cangelosi & Schlesinger 2015; Cangelosi & Asada 2021). During the talk we will present examples of developmental robotics models and experimental results from iCub experiments on embodiment biases in early word acquisition and grammar learning (Morse et al. 2015; Morse & Cangelosi 2017) and experiments on pointing gestures and finger counting for number learning (De La Cruz et al. 2014). We will then present a novel developmental robotics model, and experiments, on Theory of Mind and its use for autonomous trust behavior in robots (Vinanzi et al. 2019). The implications of such embodied approaches for embodied cognition in AI and the cognitive sciences, and for robot companion applications, will also be discussed.

David Hsu
National University of Singapore, Singapore

Title of the Talk: Interactive Visual Grounding and Grasping in Clutter

"Pass me the blue notebook right next to the coffee mug." This is a spoken instruction to the robot, which is faced with a pile of objects on the table. What would it take for the robot to succeed? It must understand natural language instructions, recognize objects and their spatial relationships visually, and, most importantly, connect language understanding and visual perception with robot actions. One main challenge here is the inevitable ambiguity of human language and uncertainty in visual perception. In this talk, I will introduce INVIGORATE, a robot system that interacts with humans through natural language and grasps a specified object in clutter. By integrating model-based reasoning and data-driven deep learning, INVIGORATE takes one step towards a service robot that helps with household tasks at home.
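
As a rough illustration of interactive grounding under ambiguity (a sketch only, not the actual INVIGORATE implementation; the grounding scores and threshold below are assumed), the robot can keep a belief over candidate objects and either grasp or ask a clarifying question, depending on how confident that belief is:

```python
# Minimal sketch: maintain a belief over candidate referents and either grasp
# or ask a clarifying question. The scores stand in for hypothetical outputs
# of a visual grounding network.

def normalize(scores):
    total = sum(scores.values())
    return {obj: s / total for obj, s in scores.items()}

def decide(belief, confidence_threshold=0.7):
    best_obj, best_p = max(belief.items(), key=lambda kv: kv[1])
    if best_p >= confidence_threshold:
        return ("grasp", best_obj)
    return ("ask", f"Do you mean the {best_obj}?")

# "Pass me the blue notebook right next to the coffee mug."
grounding_scores = {"blue notebook (left)": 0.45,
                    "blue notebook (right)": 0.40,
                    "coffee mug": 0.15}
belief = normalize(grounding_scores)

action, arg = decide(belief)
print(action, "->", arg)   # belief is ambiguous, so the robot asks a question

# After the human answers, the belief is re-weighted and the robot re-decides.
belief["blue notebook (right)"] *= 4.0
belief = normalize(belief)
print(decide(belief))      # now confident enough to grasp
```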

Tetsunari Inamura
National Institute of Informatics, Japan

Title of the Talk: Cloud-based VR gamification towards learning explanation of the daily-life activity

In recent years, attempts to bridge the gap between natural language processing research and robotics research have accelerated. A typical example is the visual navigation task, which learns the relationship between a sequence of visual information about an agent's movement and the sentences that describe its navigation. Researchers have proposed various machine learning models by sharing open and large datasets on such navigation tasks. However, the main behaviors are often two-dimensional movements in a room or a city. Large datasets of complex behaviors, such as assembling objects or physical and social interactions with others, are not easily available due to the enormous cost of building them. On the other hand, simulators in robotics research are becoming more and more important, and VR systems that allow humans to intervene in robot simulators have been proposed. In this talk, I introduce an attempt at gamification using a VR system as a mechanism to collect natural language expressions that correspond to social interactions and complex physical behaviors. I have developed the SIGVerse system, which combines a robot simulator with a VR space where humans log in as avatars. Based on this VR system, I designed a robot competition task in which humans and robots interact linguistically and perform social and physical actions. This system enables us to collect interaction data while providing fun for the competitors and participants. I will also introduce our recent attempt to accelerate HRI research during the coronavirus pandemic with the VR system.

Karinne Ramirez-Amaro
Chalmers University of Technology, Sweden

Title of the Talk: Robots that Reason – A Semantic Reasoning Method for the Recognition of Human Activities

Autonomous robots are expected to learn new skills and to re-use past experiences in different situations as efficiently, intuitively and reliably as possible. Robots need to adapt to different sources of information, for example videos, robot sensors, virtual reality, etc. To advance research on the understanding of human activities in robotics, learning methods that adapt to different sensors are therefore needed. In this talk, I will introduce a novel learning method that generates compact and general semantic models to infer human activities. This learning method allows robots to obtain and determine a higher-level understanding of a demonstrator's behavior via semantic representations. First, low-level information is extracted from the sensory data; then a meaningful high-level semantic description is obtained by reasoning about the intended human behaviors. The introduced method has been assessed on different robots, e.g. the iCub, REEM-C, and TOMM, with different kinematic chains and dynamics. Furthermore, the robots use different perceptual modalities, under different constraints, in several scenarios ranging from making a sandwich to driving a car, and in different domains (home-service and industrial scenarios). One important aspect of our approach is its scalability and adaptability toward new activities, which can be learned on demand. Overall, the presented compact and flexible solutions are suitable to tackle complex and challenging problems for autonomous robots.
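
A toy sketch of the two-level idea described above, with entirely hypothetical predicates and rules (not the learned models used in the talk): continuous sensor data are first abstracted into low-level predicates, and compact semantic rules then infer the activity.

```python
# Low-level extraction followed by high-level semantic inference (illustrative
# only; thresholds, predicates, and rules are assumptions, not the method's
# actual learned decision trees).

def extract_predicates(sensor_frame):
    """Abstract continuous sensor data into boolean low-level predicates."""
    return {
        "hand_moving": sensor_frame["hand_speed"] > 0.05,      # m/s
        "object_in_hand": sensor_frame["grip_force"] > 1.0,    # N
        "object_acted_on": sensor_frame["target_contact"],
    }

def infer_activity(p):
    """Compact, human-readable rules mapping predicates to an activity label."""
    if not p["hand_moving"] and not p["object_in_hand"]:
        return "idle"
    if p["hand_moving"] and not p["object_in_hand"]:
        return "reach"
    if p["hand_moving"] and p["object_in_hand"] and p["object_acted_on"]:
        return "use_object"   # e.g. cutting, pouring
    return "take_or_carry"

frame = {"hand_speed": 0.2, "grip_force": 3.5, "target_contact": True}
print(infer_activity(extract_predicates(frame)))   # -> use_object
```

Because the rules operate on abstracted predicates rather than raw signals, the same semantic model can in principle be reused across robots and sensor setups, which is the scalability property emphasized in the abstract.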

Giulio Sandini
Istituto Italiano di Tecnologia, Italy

Title of the Talk: TBA
Abstract – TBA

Emre Ugur
Bogazici University, Turkey

Title of the Talk: Learning discrete representations from continuous self-supervised interactions: A neuro-symbolic robotics approach

Interaction with the world requires processing low-level continuous sensorimotor representations, whereas abstract reasoning requires the use of high-level symbolic representations. Truly intelligent robots are expected to form abstractions continually from their interactions with the world and to use them on the fly for complex planning and reasoning in novel environments. In this talk, we address the challenging problem of the autonomous discovery of discrete symbols and the unsupervised learning of rules via a novel neuro-symbolic architecture. In this architecture, action-grounded categories are formed in the binary bottleneck layer of a predictive deep encoder-decoder network that processes the robot's image of the scene. To distill the knowledge represented by the neural network into rules and plans, PPDDL representations are formed from learned decision trees that replace the decoder functionality of the network. The discovered symbols are interpretable, are formed incrementally, can be re-used to learn more complex symbols, and can be directly used by off-the-shelf planners in order to achieve manipulation tasks such as building towers from objects with different affordances.
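
A minimal PyTorch sketch of the binary-bottleneck idea, under simplifying assumptions (a plain reconstruction objective stands in for the effect-prediction objective, and the layer sizes are arbitrary, so this is not the paper's actual architecture):

```python
# Encoder-decoder with a binarized bottleneck: the hard 0/1 codes can later be
# read off as discrete symbols for decision-tree and PPDDL rule learning.
import torch
import torch.nn as nn

class BinaryBottleneckAE(nn.Module):
    def __init__(self, input_dim=64, code_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 32), nn.ReLU(),
                                     nn.Linear(32, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 32), nn.ReLU(),
                                     nn.Linear(32, input_dim))

    def forward(self, x):
        logits = self.encoder(x)
        soft = torch.sigmoid(logits)
        # Straight-through binarization: hard 0/1 codes in the forward pass,
        # gradients flow through the sigmoid in the backward pass.
        hard = (soft > 0.5).float()
        code = hard + soft - soft.detach()
        return self.decoder(code), hard

model = BinaryBottleneckAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(16, 64)                      # stand-in for scene observations
for _ in range(100):
    recon, symbols = model(x)
    loss = nn.functional.mse_loss(recon, x)  # simplified predictive loss
    opt.zero_grad(); loss.backward(); opt.step()

# The binary codes ("symbols") could then be fed to a decision-tree learner
# and translated into PPDDL-style rules for off-the-shelf planners.
print(symbols[0])
```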

Florentin Wörgötter
Georg-August University Göttingen, Germany

Title of the Talk: How Humans Recognize Actions: Behavioral and fMRI Experiments Support Robotic Action Grammar

Since about 2010, several groups (e.g. the groups of Aloimonos, Asfour, Kjellström, and others) have advocated and used different but related grammar-like representations to encode actions for robots. Our representation is based on so-called Semantic Event Chains (SECs). It separates actions into temporal chunks defined by touching and untouching relations between objects (including the actor's hand or other body parts), because these transition events are highly characteristic of different action types. Recently we have focused on the question of whether humans use the same "algorithm" to predict and recognize actions. Here we first show a set of virtual reality experiments that support this notion. This was flanked by a second study using functional magnetic resonance imaging that shows how our brain "hooks on" to these transition events. These results thus indicate that the SEC framework may also have direct explanatory value for the human processing of actions.
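
A small illustrative sketch of the Semantic Event Chain idea (the frames and objects below are made up, not data from the talk): pairwise touching relations are tracked per frame, and only the touch/untouch transition events are kept as the action's temporal chunk boundaries.

```python
# Extracting touch/untouch transition events from a sequence of contact
# relations (illustrative example of the SEC idea, not the full framework).

frames = [  # pick-and-place of a cup, as sets of object pairs in contact
    {("cup", "table")},
    {("cup", "table"), ("hand", "cup")},   # hand touches cup
    {("hand", "cup")},                     # cup lifted off the table
    {("hand", "cup"), ("cup", "shelf")},   # cup placed on the shelf
    {("cup", "shelf")},                    # hand releases the cup
]

def event_chain(frames):
    """Return (frame_index, relation, 'touch'|'untouch') transition events."""
    events = []
    for t in range(1, len(frames)):
        for rel in frames[t] - frames[t - 1]:
            events.append((t, rel, "touch"))
        for rel in frames[t - 1] - frames[t]:
            events.append((t, rel, "untouch"))
    return events

for t, rel, kind in event_chain(frames):
    print(f"t={t}: {rel[0]}-{rel[1]} {kind}")

# Two demonstrations with the same chain of touch/untouch events are treated
# as the same action type, regardless of trajectories or object appearance.
```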