HCI @METU: Multimodal Interaction through Visual and Haptic Modalities

Cengiz Acartürk, Middle East Technical University, Turkey


The HCI Research Group at Middle East Technical University (METU) has been studying multimodal interaction and communication through various modalities, including the visual and haptic modalities, in collaboration with national and international partners. The talk will briefly present three projects that focus on multimodal interaction and communication:

• Öztek: Investigating the process of teaching basic essential and cognitive concepts to special education students using Technology Enhanced Learning environments, and the effectiveness of this approach.

• IRIS: Towards Natural Interaction and Communication. The major goal of the project is to provide a natural interaction and communication platform that is accessible to and adapted for all users, particularly people with speech impairments and elderly people in indoor scenarios.

• The study of cognitive processes in multimodal communication through line graphs: The project aims to develop a verbal assistance system for haptic line graph comprehension, based on insights obtained from a series of experimental studies.

Recognizing Affective Touch for Pet Robots

Kerem Altun, Kemerburgaz University, Turkey


Pet cats and dogs often have an uncanny ability to respond to humans' emotional states, and we would like pet robots to do the same. Humans interact with their pets mostly through touch, yet touch is an understudied element in affect recognition. We present experimental results on recognizing various touch gestures and their emotional content, using the Haptic Creature as our robot platform. We also discuss requirements for next-generation touch sensing technology in human-pet robot interaction.

Creating Multimodal Cues of Empathy in Human-Robot Interaction

Elisabeth André, University of Augsburg, Germany


Interpersonal communication is inherently emotional. Human conversational partners usually try to interpret, consciously or unconsciously, the speaker’s or listener’s affective cues and respond to them accordingly. With the objective of contributing to more natural and intuitive ways of communicating with machines, an increasing number of research projects have started to investigate how to simulate emotional behaviors using computer models. On the one hand, robust techniques are being researched that recognize emotional states from multi-sensory input, such as facial expressions, gestures and speech. Through the use of miniaturized sensor technology, it has become possible to unobtrusively capture natural human behaviors in everyday situations.

On the other hand, mechanisms are being developed that generate and display the emotional states of artificial agents appearing in the role of animated virtual interlocutors, anthropomorphic robots or even digitally enhanced everyday objects. The behavior of these artificial agents does not follow a given script; rather, it develops spontaneously based on operationalized emotional models that are informed by theories from the cognitive and social sciences. Behaviors of varying complexity are examined, including processes that simply mirror a human’s emotional state in the artificial agent, processes that require a deeper understanding of the causes of and reasons for emotional states, and processes that attempt to influence those states positively.

Pervasive Computing Technologies for Maintaining Wellness and Healthy Lifestyle

Bert Arnrich, Bogazici University, Turkey


Major determinants of many health outcomes are manifested in people’s lifestyle and behavior. Pervasive computing technologies are expected to play a major role in maintaining wellness and a healthy lifestyle in everyday life. In recent years, a huge variety of mobile, wearable and ambient computing technologies have been investigated to provide a data-rich basis for capturing, connecting and analyzing various health-related dimensions of individuals. This talk presents pervasive computing technologies that monitor health-damaging factors such as work-related stress.

Overview of Research Projects at the AI Group of Freiburg University

Christian Becker-Asano, Freiburg University, Germany


The "Foundations of Artificial Intelligence" group is mainly concerned with developing and testing fundamental AI planning algorithms, which are applied to action planning in robotics. Apart from this basic research, we are also interested in the role that emotions play in Human-Agent Interaction (HAI). This includes both emotion expression by the interface and the emotional effects elicited in the user. This talk will give an overview of our recent experiments in the field of HAI and the conclusions we derived from the results.

HCI Research at Bahçeşehir University: Affect Recognition, Human Behavior Analysis and Mobile Localization

Çiğdem Eroğlu Erdem, Bahçeşehir University, Turkey


This presentation will give an overview of the current research efforts at Bahçeşehir University that are related to the field of human-computer interaction.

First, research on multimodal affect recognition using facial expressions and speech will be presented. Two naturalistic audio-visual databases will be introduced, one collected from movies and the other recorded in Turkish in our labs. These challenging databases are expected to be useful test beds for researchers working on affect recognition. A method for facial expression recognition based on estimating the unknown neutral face will also be presented. This work was funded by the Scientific and Technological Research Council of Turkey (TÜBİTAK-1001).

Next, research on human behavior analysis will be presented, involving gesture spotting in continuous video and the fusion of multimodal cues (RGB, depth, etc.) for gesture recognition. An application for home-based physiotherapy guidance using Kinect data will be discussed.

Finally, a scalable and accurate localization system for modern mobile smart devices will be presented, which combines classical RF techniques with computer vision methods. The system is expected to enable a new breed of applications that require high accuracy, including augmented reality and indoor information systems. This work was funded by Avea and the Turkish Ministry of Science and Industry.

HCI Related Research @ GSU: Mobile Phone Sensing, Mobile Activity Recognition, Brain-Computer Interfaces

Özlem Durmaz İncel, Galatasaray University, Turkey


Human activity recognition using wireless sensors and crowdsourced sensing are both emerging topics in the domain of pervasive computing. Activity recognition involves the use of different sensing technologies to automatically collect and classify user activities for application domains ranging from medical applications to home monitoring. Crowdsourced sensing, in contrast, aims to collect environmental or personal data, especially using the mobile devices that people carry. In this sense, smartphones provide a unique platform for both human activity recognition and crowdsourced sensing applications, thanks to their rich set of integrated sensors, their ubiquity, their ease of use and their wireless communication capabilities over various interfaces, which make it possible to transfer sensing data to backend servers.

In this talk, I will first present an ongoing TÜBİTAK-supported research project on recognizing the activities of crowds and communities. I will introduce the ARServ application and the dataset that will be created during the project.

In the rest of the talk, I will focus on our brain-computer interface (BCI) research, in which the Emotiv EPOC wireless EEG headset is used to design and test BCI systems that generate small-vocabulary speech or control a robot hand, helping disabled people interact with their surroundings.

Conversational Systems and Multimodal Coordination Dynamics

Stefan Kopp, Bielefeld University, Germany


Multimodality has become a cornerstone notion both for understanding human communication and for realizing adaptive human-computer interfaces. It is particularly important for conversational systems, such as embodied virtual humans or social robots, that are meant to engage in interaction using speech, gesture, gaze, etc. Much work has studied how multimodal behaviors or, more recently, subtle cues can be recognized, fused, or automatically generated. However, it is still an open problem to understand which features of rich multimodal delivery are significant and what they indicate or communicate in a specific situational context. One key challenge is to account for how the significance of multimodal behavior varies over time, as a function of the multidimensional coordination dynamics between interaction partners, modalities, underlying cognitive processes, or situational factors. The "Social Cognitive Systems Group" at Bielefeld University aims to develop conversational systems that are able to understand, predict, and, ultimately, engage in these dynamic coordinative processes. I will present examples from speech-gesture production and understanding, interpersonal alignment, communicative feedback, and situated language processing.

Multimodal & Face-based Games

Hatice Köse, Istanbul Technical University, Turkey


This presentation will give an overview of the current research projects at Istanbul Technical University in the fields of human-computer interaction and human-robot interaction.

First, research activities on multimodal interaction in HRI are summarized. In the “Robotic Sign Language Tutor” project, the motivation is to design and implement imitation-based turn-taking games for child-humanoid robot interaction based on sign language. In this setup, children communicate with the robot via touch, sound (speech and drumming), signs from Turkish Sign Language (TSL), non-verbal gestures, colored flashcards, and face recognition. TSL signs consist of basic upper-torso actions and hand, body, and facial gestures, and are recognized using RGB-D camera systems.

The second study involves face analysis. Faces are one of the main sources of social signals in communication; therefore, using them for gaming is expected to lead to an entertaining experience. We briefly present two mobile gaming applications based on automatic face analysis. One is a facial expression imitation game for children with social disorders, and the other is a multiplayer mobile game that enables individuals to play paintball- or laser-tag-style games using their smartphones.

Attention and Interruptions in Massive Multimodal Ubiquitous Computing Environments

Antonio Krüger, Saarland University, Germany


In this talk, I will discuss the relevance of the concepts of "attention" and "interruption" when designing ubiquitous computing interfaces. In fact, these two concepts have been at the core of Mark Weiser's vision of ubiquitous and calm computing. As it becomes increasingly possible to measure human attention and interruptions, these concepts gain relevance in the actual design of user interfaces in ubiquitous computing environments, especially when combined with several other input modalities; we call such settings massive multimodal environments.

I will present various examples of our work in this area, including studies on how users handle interruptions on mobile devices and how to assess attention in ubiquitous computing environments.

Scalable, Customizable, and Reusable Techniques for Multimodal Processing

Marc Erich Latoschik, University of Würzburg, Germany


Input processing of multimodal utterances incorporates a variety of tasks to derive meaning from signals. Typical tasks include sampling, segmentation, feature detection, classification, annotation, and fusion. These tasks often have to be customized and combined in various ways, frequently in a modality- or even application-specific manner. Hence, scalability, customizability and reusability become paramount requirements for technical solutions targeting the various multimodal processing tasks.

We will introduce two approaches to meeting these requirements. With respect to the overall processing tasks, a decentralized actor-based architecture interconnects concurrent and task-specific solutions, and domain-specific languages provide an alternative to hard-coding low-level processing details. With respect to one prominent task, the fusion of multimodal utterances, we introduce recent work that combines a central unification-based approach with a transition network to optimize performance and reduce the search space.
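The abstract does not describe the fusion algorithm itself. As a generic illustration only (not the authors' system, and leaving out the transition network entirely), the sketch below shows what unification-based fusion means in principle: a partial feature structure from speech and one from a pointing gesture are merged, and incompatible values cause the fusion to fail. The function `unify`, the exception `UnificationFailure` and the feature names are hypothetical.

```python
from typing import Any


class UnificationFailure(Exception):
    """Raised when two feature structures carry incompatible values (hypothetical helper)."""


def unify(a: Any, b: Any) -> Any:
    """Unify two feature structures represented as nested dicts.

    Illustrative sketch only: shared keys are unified recursively,
    keys present in only one structure are copied over, and atomic
    values must be equal. Neither input is mutated.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # shallow copy so that `a` is left untouched
        for key, value in b.items():
            result[key] = unify(result[key], value) if key in result else value
        return result
    if a == b:  # atomic values unify only if they are identical
        return a
    raise UnificationFailure(f"cannot unify {a!r} with {b!r}")


if __name__ == "__main__":
    # Hypothetical speech input "delete that file": the target is underspecified.
    speech = {"action": "delete", "target": {"category": "file"}}
    # Hypothetical pointing gesture that resolves the deictic reference.
    gesture = {"target": {"category": "file", "id": "report.txt"}}
    print(unify(speech, gesture))
    # prints {'action': 'delete', 'target': {'category': 'file', 'id': 'report.txt'}}

    try:
        unify(speech, {"target": {"category": "folder"}})
    except UnificationFailure as err:
        print(f"rejected: {err}")  # prints rejected: cannot unify 'file' with 'folder'
```

A naive fusion engine would try such unifications over many candidate pairs of speech and gesture hypotheses; constraining which combinations are attempted, for instance with a transition network as the abstract mentions, is one way to reduce that search space.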

Dialogue Systems Research at Ulm University: Adaptive Speech Interfaces for Technical Companions

Wolfgang Minker, University of Ulm, Germany


Spoken Language Dialogue Systems (SLDSs) providing natural interfaces to computer-based applications have been an active research area for many years. However, most state-of-the-art systems are still static in use, and their field of application is rather limited. Our work aims at overcoming this limitation. In this presentation, we will provide a brief overview of our research activities in the domain of adaptive SLDSs for next-generation Technical Companions in Ambient Intelligent Environments. These activities include Emotion Detection from Multimodal Signals, Verbal Intelligence Estimation based on the Analysis of Spoken Utterances, and Statistical Modeling for User-centered Adaptive Spoken Dialogue Systems.

Action Understanding and Generation

Erhan Öztop, Özyeğin University, Turkey


Recent findings in neuroscience indicate that action understanding and action generation share common mechanisms in biological systems. Similar approaches are being pursued in technical fields such as artificial intelligence and robotics, with the goal of obtaining effective means of generating intelligent behaviors, as well as algorithms that make sense of observed behaviors. In our laboratory, we focus on ‘mirror neurons’, which seem to be fundamental for action understanding and generation in biological systems. Complementarily, we develop engineering methods for generating dexterous skills on robots and for action understanding, based on dynamical systems and on human and machine learning. The talk will include highlights of our general approach and give some details on the use of Dynamic Motor Primitives for processing an observed action for recognition and segmentation.

Multimodal Human State and Trait Recognition: Quo plures, eo feliciores?

Björn Schuller, Munich University of Technology, Germany


Human state and trait recognition plays an ever-increasing role in today’s intelligent user interfaces, lending them social competence for more natural interaction. Obviously, the automatic assessment of user characteristics such as emotion, personality, or cognitive and physical load, to name just a few, is challenging. To address this, it is widely believed that a multimodal approach is beneficial. Here, we touch upon a question that often arises when multiple modalities are considered in computer-based human behavior analysis: the more the merrier? The modalities considered comprise the "usual suspects", namely speech, facial expression, and physiology, alongside less typical candidates. Synergies, such as complementarity with respect to emotion or personality primitives, are highlighted alongside the problems that arise in multimodal fusion. Examples are drawn from a series of recent public research competitions co-organized by the presenter.

Reading User Intentions: Predictive and Proactive Multimodal Interfaces

Tevfik Metin Sezgin, Koç University, Turkey


Early computers were nothing more than sophisticated calculators. They lacked proper user interfaces, and users communicated their intentions by punching carefully placed holes in cards. Today's advanced computers and handhelds are no better: we communicate our intentions by punching virtual buttons and menus. We need to move beyond punching buttons by building smart systems that can reason about the intentions of their users. In this talk, I will discuss how insights gained from careful analysis of natural human behavior can be used to build systems that predict user intentions and take proactive action accordingly.

Towards Building a Mobility and Navigation Aid for Seeing Impaired Users

Rainer Stiefelhagen, Karlsruhe Institute of Technology, Germany


According to the World Health Organization, there are 285 million visually impaired persons worldwide, of whom 39 million are blind and 246 million have low vision. The partial or full loss of sight leads to a number of challenges that visually impaired persons have to face. These include difficulties with mobility and navigation in unknown terrain, missing information in social interaction, and handling and finding objects in daily life.

In recent years, the field of computer vision has made tremendous progress, and methods for object, person and text recognition are now widely used in commercial applications in areas such as production, gaming, security, image retrieval and driver assistance.

Computer vision methods also have huge potential for building assistive technology for blind or low-vision users, for example to inform them about objects, people, landmarks, texts or obstacles in their vicinity. So far, however, there has been little effort to build practical assistive technology for visually impaired persons that makes use of such methods.

In my presentation, I will talk about our recent efforts towards building a mobility aid for visually impaired users. Our goal is to build a mobile assistive device that uses cameras to observe the user's surroundings in order to detect obstacles, traffic signs and landmarks, and then informs the user about the relevant information. An important issue here is the design of the user interface, especially the way in which information is conveyed to the user. To this end, we are investigating the use of haptic and audio feedback. I will report first results from a small user study with blindfolded users and with visually impaired users.

Multimodal Interaction Research at Sabancı University

Berrin Yanıkoğlu, Sabancı University, Turkey


I will give an overview of research in the areas of human-computer interfaces and data visualization at Sabancı University. Handwriting recognition is gaining interest with the spread of pen-enabled devices (tablets, phones) and provides a natural form of communication. Handwriting recognition presents extra challenges for Turkish, as it is an agglutinative language. While the handwriting modality is useful or even required in some applications, EEG-based brain-computer interfaces (BCIs) are necessary for people with disabilities who cannot use motor controls. Furthermore, BCI applications for healthy individuals have started to emerge as well. Lastly, augmented reality and big data visualization help people interact with their environment and with data. I will present a brief overview of the state of the art and current challenges in these three areas.