What Communication Modalities Do Users Prefer in Real Time HRI?

Ori Novanda1,2,3, Maha Salem3, Joe Saunders3, Michael L. Walters3, and Kerstin Dautenhahn3

1 Dept. of Electrical Engineering, Universitas Sumatera Utara, Indonesia, ori@usu.ac.id
2 This author received a scholarship from the General Directorate of Higher Education of Ministry of Education and Culture of Indonesia
3 Adaptive Systems Research Group, University of Hertfordshire, United Kingdom

Abstract. This paper investigates users’ preferred interaction modalities when playing an imitation game with KASPAR, a small child-sized humanoid robot. The study involved 16 adult participants teaching the robot to mime a nursery rhyme using three interaction modalities in a real-time Human-Robot Interaction (HRI) experiment: voice, guiding touch and visual demonstration. The findings suggest that users had no preference in terms of the effort required to complete the task. However, there was a significant difference in how enjoyable users found each input modality and a marginal difference in the robot’s perceived ability to imitate.

1 INTRODUCTION

Humans often use multi-modal interaction in daily communication and frequently use speech, physical gesture, and eye gaze when communicating with each other. In contrast, people do not usually interact with machines in the same way they interact with other humans. For example, when we open the fridge door in the morning, we do not usually greet it as we would another person.

With the recent advances in technology, it is now quite common for people to speak to some machines. High-end consumer products such as smartphones and tablets have enough computing power to capture human speech and translate it into text commands. This allows people to use their voice to interact with the applications running on the device. This technology has given rise to digital virtual assistants such as Siri [1] on the iOS platform, Google Now [2] on the Android platform, and Cortana [3] on the Windows platform. These systems enable people to get information simply by asking the device, for example what the weather will be like or when a flight will leave. Language learning programs, such as Duolingo [4], prompt users to say sentences and use speech-to-text conversion to accept their answers.

Traditionally, robots have been associated with factories that build products such as cars. However, robots are now increasingly being used in application areas where people can interact with them in a more natural way, in some ways similar to how they would interact with living creatures, as indicated in the survey by Leite et al. [5]. For example, Pleo [6] changes its behaviour depending on how the user interacts with it, and Fernaeus et al. [7] used it to learn how people play with a robotic animal. KASPAR, a child-sized humanoid robot, has primarily been developed as a mediator to interact with children with autism in order to encourage basic communication and social interaction skills [8]. The consumer and research robot NAO [9] has been programmed to fulfil many roles, one of which is that of a companion robot (see Dautenhahn [10]), as in the research by Baxter et al. [11].

Since Sheridan [12] first associated Human-Robot Interaction (HRI) with teleoperation of factory robotic platforms, HRI research has extended into a number of different research areas (Goodrich and Schultz [13]). One area of particular interest in recent years is multi-modal interfaces for multi-modal interactions. Stiefelhagen et al. [14] suggested that multi-modal interfaces are required to facilitate natural interaction.
When humans interact with machines that have some human-like characteristics, they tend to anthropomorphise the machine and communicate in ways similar to human-human communication [15]. One of the objectives of HRI is to make human-robot interaction easier, more intuitive and more user friendly. Providing a multi-modal interface may help to keep users engaged and allow the robot to interact with them in a more familiar manner, similar in some ways to how they may interact with other humans.

Although interactive multi-modal systems have some distinct advantages, developing such systems poses many challenges. According to Turk [16], the performance of a multi-modal system depends on each unimodal technology. Each modality is currently an active research field in its own right. For example, a survey by Argall and Billard [17] lists research that focuses solely on the tactile input modality.

Developing multi-modal interactive systems requires a substantial amount of computing power and robust integration algorithms. The integration algorithm of the robot’s sensing system needs to decide in real time which input to consider in order to give an appropriate response or action through the robot’s actuators. The system has to be powerful enough to process different inputs such as visual, audio, and gesture cues. Integrating these social cues so that they flow naturally throughout the interaction session also consumes additional processing power.

Providing robust input modalities and a fusion mechanism that integrates all input data is a technically challenging task. Many hours of work would need to be devoted just to prepare the robot for a relatively simple task. This is one of the reasons that some HRI studies use Wizard-of-Oz [18] approaches to run experiments.
With these approaches, the limitations of the technology can be set aside and replaced by behind-the-scenes controllers that produce robot behaviour which users perceive as autonomous.

The challenge of creating a multi-modal interactive robotic system inspired the current study, which investigates users’ preferences of input modality when providing information to a robot. The study was designed to ask users to experience three different modalities whilst delivering the same instructions to the robot.

2 RELATED WORK

The study took related research in Human-Computer Interaction (HCI) into consideration. As suggested by Kiesler and Hinds [19], and Breazeal [20], existing work in HCI offers rich resources and inspiration for research in HRI.

The experiment “Put That There” by Bolt [21] is widely considered a pioneering demonstration that first showed the value and opportunity of multi-modal interfaces over uni-modal interfaces in HCI. The experiment used speech and gesture as command channels to draw a map. The multi-modal interface raised the question of whether users will actually interact multi-modally when a system is capable of multi-modal interaction.

Oviatt [22] discussed ten myths about multi-modal interaction that give useful guidance to researchers building multi-modal systems. Oviatt stated that with multi-modally capable systems, users tend to switch between uni-modal and multi-modal interaction, with multi-modal interaction being most predictable from the type of action being performed. In a previous study, Oviatt et al. [23] found that participants used multi-modal commands 86% of the time when navigating a map in order to move, add, modify, or calculate the distance between objects. For tasks that required no navigation of the map, such as printing the map, the participants interacted multi-modally less than 1% of the time.

Later, Oviatt et al. [24] conducted an experiment using a Wizard-of-Oz approach, and concluded that the cognitive load of the task drives the users’ preference towards either uni-modal or multi-modal interaction. Tasks with higher difficulty will often cause users to utilise the multi-modality of the system. With repetitive tasks, users would initially communicate multi-modally. Once the tasks became more familiar, they tended to prefer one particular interaction modality four times more often than interacting multi-modally.

Schüssel et al. [25] experimented with speech, gesture, and touch in multi-modal interactions to select graphical icons on a computer monitor. This experiment was also conducted using the Wizard-of-Oz approach and measured which modalities the users used and combined to complete the task. The overall results for the modalities used were: touch (63.2%), speech (21.6%), gesture (11.2%), speech+gesture (3.6%), speech+touch (0.5%). None of the participants used speech+gesture+touch at the same time.

Carbini et al. [26] observed users’ preferences in a storytelling game. Each user was given the task of composing a coherent story from a set of objects on a computer screen. It was found that children interacted using speech and gesture more easily than adults. The results for the full dataset were: gesture (45%), speech (5%), gesture+speech (50%).

All of the research cited above was conducted in HCI domains, where the users interacted with computers. The current research focuses on the interaction between humans and robots. Presented below are some studies that are more closely related to research in HRI.

Khan [27] surveyed 134 respondents about their preferred interaction modalities with a robot. One of the questions asked in this survey concerned the preferred method of communicating with a service robot that is to take care of clothes on a couch, or that is to inform the user that the task has been completed.
The results showed that speech was the most preferred interaction modality (82%), followed by touch screen (63%), gestures (51%), and typing commands (45%). However, the results of this study are limited because the participants completed a questionnaire without having interacted with an actual robot.

Salem et al. [28] conducted research to compare modality preferences in HRI. In contrast to the current research, they investigated the output side of the multi-modal interface. They examined users’ perceptions of a robot when the robot provides information to the human uni-modally (voice only) and multi-modally (voice and gesture). It was found that the robot was evaluated more positively if it displayed non-verbal behaviours, such as hand and arm gestures along with speech, even if they did not semantically match the spoken utterances.

Humphrey and Adams [29] also conducted a study relevant to our current research, measuring users’ preference for visualising a tele-operated robot’s compass. They compared two different compass visualisations: top-down and world-aligned. The top-down visualisation received a higher preference, but the difference from the world-aligned visualisation was not significant.

3 THE STUDY

The study presented in this paper builds on two main observations from the related work discussed above:

1. As described in [24], simple task interaction can be conducted sufficiently using a uni-modal system only.
2. Previous research found significant differences in modality preference, and the most-preferred modality differed between studies ([25], [26], and [27]).

These observations come from the HCI research domain, where humans interact with computers. This study places them in an HRI perspective, where humans interact with robots, to see whether they are applicable to the HRI domain.
Based on the first observation (1), our research investigated the modality comparison further by conducting an experiment that asked users to perform a simple task, comparing the use of specific, different modalities in different sessions. Based on the second observation (2), the study also evaluated which modality was most preferred.

This research aimed to develop an autonomous humanoid robot that can perform real-time multi-modal interaction. The developed system can detect voice commands and interpret gestures and touch. All processes run in parallel in real time. In the discussion section, this paper presents the comparison of user preferences for the three input channel modalities when instructing the robot to move its arms.

The basic idea of the experiment was to develop a robot that can be taught to dance following music. This idea was scaled down to match the robot’s physical limitations in speed of movement. The dance was changed to a simple mime task, and the music was limited to a single nursery rhyme. With these changes, the experiment became teaching the robot to mime following a nursery rhyme. The robot could be instructed to move its arms using voice commands, by the users' gestures, and by physically guiding the arms.

The experiment was run non-intrusively, so the users did not need to wear gloves or markers. The users also did not have to wear a microphone or headphones. The voice command system was speaker-independent, so it did not have to be trained prior to the experiment.

4 EXPERIMENT SETUP

This section describes the experimental setup for the study. The study was approved by the University of Hertfordshire Ethics Committee under protocol number a1213/10.

Figure 1. KASPAR Robot

4.1 The Robot

This research uses KASPAR [30], a child-like humanoid robot (shown in Figure 1).
It has 17 Degrees of Freedom (DoFs) and an internal PC that runs the robot autonomously. The robot uses the eSpeak [31] text-to-speech engine for speaking.

Figure 2. Compliance Mechanism

For the study, a program was developed to provide a servo compliance system. The block diagram of the compliance system is shown in Figure 2. It has a controller that measures the servos’ torque values. This measurement allows the software to detect whether the arms are being moved by an external force. It then adjusts the servos’ positions to comply with the external force. With this feature, users can move KASPAR’s arms without damaging the servos. This controller works independently and can override any arm movement commands sent by the higher level controller.

In the current implementation, the hardware interface introduced a time delay in the compliance controller’s loop path. As a result, the control bandwidth of the servos reached only 1 Hz, which is lower than the human force control bandwidth of around 20 Hz [32]. This made the arms slightly stiff to move.

The system used an additional external PC besides the internal PC. The two PCs communicated using TCP/IP through an Ethernet connection. The robot also has a WiFi connection, but this wireless connection was not used in the experiment because of the latency in data transmission.

Figure 3. System Architecture

The external PC runs the high-demand processes, such as gesture detection and speech recognition. The global architecture of the system can be seen in Figure 3. The GUI controller runs on the external PC and sends commands to the internal PC to control the robot.

The robot has several force sensitive resistor (FSR) sensors to detect touches. They are located on both palms and on the upper arms. This research did not restrict where the participants could touch the robot when moving its arms.
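The compliance behaviour described in this section (measure the servo torque, detect an external force, and let the servo position yield to it) can be illustrated with a minimal sketch. This is not the controller used on KASPAR: the class name, the torque threshold, the gain, the angle limits, and the units are all hypothetical values chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass
class ComplianceController:
    """Illustrative single-joint compliance rule (all parameters are assumptions)."""

    torque_threshold: float = 0.2  # hypothetical: torque below this is treated as noise
    gain: float = 5.0              # hypothetical: degrees of yield per unit of excess torque
    min_angle: float = -90.0       # hypothetical joint limits in degrees
    max_angle: float = 90.0

    def step(self, target_angle: float, measured_torque: float) -> float:
        """Return the servo target for this control cycle.

        If the measured torque suggests that someone is pushing the arm, the
        returned target overrides the commanded one and moves in the direction
        of the external force; otherwise the commanded target is kept.
        """
        excess = abs(measured_torque) - self.torque_threshold
        if excess <= 0:
            return target_angle  # no external force detected
        direction = 1.0 if measured_torque > 0 else -1.0
        yielded = target_angle + direction * self.gain * excess
        # Never yield beyond the joint limits.
        return max(self.min_angle, min(self.max_angle, yielded))
```

In such a scheme, `step` would be called once per control cycle for each arm servo; how often that cycle runs determines how stiff the arm feels to the person moving it.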
During the experiment, the system used only the compliance system mentioned above to allow the participants to move the robot’s arms physically.

4.2 Sensors

KASPAR was equipped with sensors to provide the following input modalities: (i) voice command, (ii) gesture, and (iii) touch.

The developed system uses the Microsoft speech recognition engine. With non-intrusive interaction in mind, the system uses a directional microphone to listen to the user’s voice. The microphone location was adjusted so that the sound coming from the robot (voice and mechanical servo movements) was less likely to interfere with the user’s voice. The speech recognition engine was programmed to detect 5 different commands that could be used to instruct the robot to move its arms. The robot has colour markers on its fingers (see Figure 1) so that the arms can be referred to by colour instead of left and right (this was deemed easier for participants to use when facing the robot). The markers are red and blue. The commands are: (i) red up, (ii) blue up, (iii) arms open, (iv) red down, and (v) blue down. As suggested by the names, the ‘up’ and ‘down’ commands instruct the corresponding red or blue arm to go up or down. The ‘arms open’ command makes both arms open wide. The system could only detect one command at a time. After saying a command, the user was expected to wait for the robot to respond before saying the next command.

A Microsoft Kinect was used by the system to detect the human partner's gestures. The Kinect SDK provided a skeleton representation of the user's position and pose. The positions of the wrists were measured and interpreted as commands to move the robot arms. The system was programmed so that it only detected 5 positions, which were equivalent to the 5 voice commands.

The touch input modality was provided by the developed compliance system. The users could move the robot’s arms by moving the arm directly.
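As an illustration of how wrist positions from a skeleton tracker might be reduced to the five commands listed above, the following minimal sketch classifies each arm from its wrist and shoulder coordinates and emits a command only when the pose changes. The thresholds, the coordinate convention, and the names `Joint`, `arm_pose`, and `to_command` are assumptions made for this example; the paper does not specify how the positions were actually interpreted, and no Kinect SDK calls are used here.

```python
from typing import NamedTuple, Optional

# The five commands shared by the voice and gesture modalities.
COMMANDS = ("red up", "blue up", "arms open", "red down", "blue down")


class Joint(NamedTuple):
    x: float  # sideways position in metres (assumed convention)
    y: float  # height in metres (assumed convention)


def arm_pose(wrist: Joint, shoulder: Joint,
             margin: float = 0.15, reach: float = 0.45) -> Optional[str]:
    """Classify one arm as 'up', 'down', 'open', or None when ambiguous.

    `margin` and `reach` are hypothetical thresholds in metres.
    """
    if wrist.y > shoulder.y + margin:
        return "up"
    if wrist.y < shoulder.y - margin:
        return "down"
    if abs(wrist.x - shoulder.x) > reach:
        return "open"
    return None


def to_command(red_pose: Optional[str], blue_pose: Optional[str],
               last_red: Optional[str], last_blue: Optional[str]) -> Optional[str]:
    """Return at most one command, and only when a pose has changed."""
    if red_pose == "open" and blue_pose == "open":
        if (last_red, last_blue) != ("open", "open"):
            return "arms open"
        return None
    if red_pose in ("up", "down") and red_pose != last_red:
        return f"red {red_pose}"
    if blue_pose in ("up", "down") and blue_pose != last_blue:
        return f"blue {blue_pose}"
    return None
```

Returning at most one command per update mirrors the constraint that only one command is detected at a time. Which of the user's hands drives the red or the blue arm is left to the caller, since that mapping is not stated here.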
The users could hold any part of the arm in order to move it, e.g. they could move the arm by moving the upper arm or by moving the hand. The latter requires less force because the hand is further away from the shoulder joint.

4.3 Layout

The physical layout of the experiment is shown in Figure 4. The robot was ‘sitting’ on the table and the Kinect sensor was located next to the robot. Video cameras were used to record the activities during the experiment sessions.

Figure 4. Experiment layout

Next to the robot was an instruction sign (see Figure 5) which reminded the user of the five instructions that could be used to control the robot. The instruction sign showed arrows to indicate the direction of the arm movements.

Figure 5. Instruction sign

4.4 Interaction Scenario

The task given to the participants in this study was teaching a humanoid robot to mime to a rhyme. The rhyme was ‘Hickory Dickory Dock’. The participants had to instruct the robot to move its arms to mime the lines of the rhyme. The task was repeated in several sub-sessions, each allowing only one or two of the modalities: voice, gesture, touch, and voice+gesture.

4.5 Experiment Procedure

Before starting the experiment, the participants completed a demographic questionnaire and signed a consent form. The experiment was divided into two main sessions:

1. Introduction session

In the beginning, the participant was introduced to the robot and asked to shake its hand. This was to familiarise the participants with the robot, and to let them know that it was fine to physically move its ‘red arm’ (right arm), even though it felt slightly stiff. Next, they were introduced to the nursery rhyme, and told what to do during the main trial session. The participants were also instructed on how to move the arms using each input modality. During the introduction session, the robot was operated semi-autonomously using a wireless clicker to advance between sub-sessions.
At the end of the introduction session, the participants were told that the main trial would follow, and that the robot would run fully autonomously.

2. Main trial session

In the main trial, the participants were left alone to interact with the robot, which ran autonomously. The investigator stayed in the same room reading a book, sitting with their back to the participants at a table without any computer or electronic devices. The participants were told that in case of emergency, or if they wanted to stop, they could notify the investigator at any time. Each trial session was run individually with a single participant.

The robot first asked the participants to instruct it on how to move in order to follow the nursery rhyme. The robot said the rhyme, and the participant then instructed the robot to move for each line of the rhyme. The participant could instruct the robot to move its arms while the robot said the rhyme, except in the voice command sub-session, where the participants were instructed (by the robot) to say the command after the robot had finished saying the rhyme. In the touch modality sub-sessions, the participants had to move forward, close to the robot, to move its arms.

In total, there were 4 sub-sessions in the main trial. Each sub-session presented a different input modality to the participant. The first three were arranged so that each participant had a different order of voice, gesture, and touch modalities. In total there were 9 possible different orders. In the fourth sub-session, the participant was asked to instruct the robot using a freely chosen combination of gesture and voice commands. After each sub-session, the robot performed the complete ‘dance’ with movements and timings specified by the commands that had been given by the participant.

After the main trial session, a second questionnaire recorded the users’ preferences regarding the methods used to teach the robot.
Before the whole session ended, the participants were also asked verbally whether they had any comments they wanted to express regarding the experiment.

4.6 Dependent Measurements

The post-trial questionnaire asked four questions, which the participants answered on Likert scales from 1 to 5. The first question was “Did you fully understand what instructions KASPAR said during the main session?” (1 = not very well, 5 = very well). The second question was “In terms of effort, how did you feel about the different methods to teach KASPAR to dance?” (1 = very hard, 5 = very easy). The third question was “In terms of enjoyment, how did you feel about the different methods to teach KASPAR to dance?” (1 = least enjoyable, 5 = most enjoyable). The fourth question asked “When KASPAR showed what it had learned, how well did you feel KASPAR followed your instruction?” (1 = not very well, 5 = very well). Questions 2 to 4 each had separate answers for each interaction modality.

5 RESULTS

The experiment was conducted with 16 participants: 6 females and 10 males, aged 20 to 48 years. They were recruited from the university staff and students. The invitation was advertised verbally and they were given a link to an online scheduler (Doodle [33]) to pick the available time slots that suited them. In each gender category, 1 person was very familiar with robotic systems, while none had prior knowledge of the robot setup that was used in this experiment.

Figure 6. Questionnaire result on human effortlessness

Figure 7. Questionnaire result on human enjoyment

Figure 8. Questionnaire result on different instruction modalities

For the first question of the questionnaire, which asked whether the participants fully understood what the robot said during the experiment, no participant selected a value lower than 4. The mean score was 4.56 (SD = 0.51). The midpoint of the scale was 3.
The questionnaire result on the effort to teach the robot to dance is shown in Figure 6. The data were analysed using a one-way repeated-measures ANOVA. The result was F(3,42) = 0.848, p = 0.476, indicating no significant difference between the modalities. The result suggests that no particular modality is perceived as harder than the others.

Figure 7 shows participants' perceived enjoyment of conducting the task for each modality. The touch modality received the lowest enjoyment rating. The statistical analysis indicated a significant difference in preferences, F(3,42) = 6.461, p = 0.001. The pairwise comparisons indicated a significant difference (p = 0.008) between participants' ratings for the gesture (M = 4.4, SD = 0.74) and touch (M = 3.07, SD = 1.28) interaction modalities.

Finally, Figure 8 shows the participants’ perception of the robot's ability to follow instructions. The difference was marginally significant, F(3,39) = 2.56, p = 0.069. The pairwise comparisons showed a preference (p = 0.011) for touch (M = 4.43, SD = 0.65) over voice+gesture (M = 3.43, SD = 0.85).

6 DISCUSSION

This research has investigated a robotic system that can be taught movements to follow a nursery rhyme. The development of the software is presented only briefly here, as it is better suited to a separate technical paper. Three modalities were provided as input channels to give the robot commands to move its arms: voice, gesture, and touch. Two modalities were provided as output channels: voice and gesture. The robot operated autonomously during individual sessions. The robot had touch compliance, which allows humans to physically move its arms into a desired pose. The system supported integration of multiple modalities through a TCP/IP-based inter-process communication mechanism. The experiment was conducted with adult participants.
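For readers who wish to reproduce the style of analysis reported in Section 5, the F statistic of a one-way repeated-measures ANOVA can be computed as in the minimal sketch below. The function name `rm_anova_f` and the ratings in the usage note are invented for illustration and are not the study's data; the statistics software actually used is not stated here, and no sphericity correction is applied.

```python
from typing import List, Sequence, Tuple


def rm_anova_f(scores: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """One-way repeated-measures ANOVA.

    scores[i][j] is the rating given by participant i to condition j.
    Returns (F, df_conditions, df_error).
    """
    n = len(scores)            # number of participants
    k = len(scores[0])         # number of conditions (e.g. modalities)
    if any(len(row) != k for row in scores):
        raise ValueError("every participant needs one rating per condition")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    cond_means: List[float] = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means: List[float] = [sum(row) / k for row in scores]

    # Partition the total variability into conditions, subjects, and error.
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_cond = n * sum((m - grand_mean) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand_mean) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    if ss_error <= 0:
        raise ValueError("error variance is zero; F is undefined")

    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, df_cond, df_error
```

For example, `rm_anova_f([[1, 2, 3], [2, 3, 5], [3, 5, 6]])` evaluates three made-up participants across three conditions. The corresponding p-value can then be obtained from the F distribution, for instance with `scipy.stats.f.sf(F, df_cond, df_error)` if SciPy is available.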
The research findings indicated that, for the task of teaching a robot to mime actions that follow a nursery rhyme, there was no statistically significant difference in preference ratings regarding human effort. In contrast, there was a significant difference in preferences regarding enjoyment. The touch modality was the least preferred and the gesture modality was rated the highest. The authors argue that the touch modality scored lowest because the participants worried about breaking the arms of the robot. This was because the compliance controller only operated at a 1 Hz cycle rate instead of 20 Hz (see [32]).

For the robot’s perceived ability to follow instructions, the touch modality received the highest rating. The combined voice+gesture modality received the lowest. This could be because the robot only performed the instructed action after the voice command had completed, whereas a gesture was followed by the action immediately. However, the overall difference was not statistically significant at the 5% level, and only indicated a trend towards a higher mean rating for the touch modality.

In general, without considering the task, the results are in contrast to the results in [25], [26], and [27]. However, this contrast indicates an agreement with [22] and [24], namely that for certain tasks humans can communicate with robots effectively using a uni-modal communication channel.

7 FUTURE WORK

This research ultimately aims to evaluate how best to teach a robot and what constitutes an effective teaching strategy. The work presented here is an initial step in that direction, and further research is required. The software system could be further developed to accommodate more complex input interfaces. It would also be useful to conduct the same experiment with different user groups, e.g. children or people with special needs.

8 REFERENCES

[1] “Siri.” [Online]. Available: https://www.apple.com/uk/ios/siri/. [Accessed: 03-Jul-2015].
[2] “Google Now.” [Online]. Available: https://www.google.com/landing/now/. [Accessed: 03-Jul-2015].
[3] “Cortana.” [Online]. Available: www.microsoft.com/en- /mobile/experiences/campaign-cortana/. [Accessed: 03-Jul-2015].
[4] L. von Ahn, “Duolingo: learn a language for free while helping to translate the web,” in Proceedings of the 2013 international conference on Intelligent user interfaces, 2013, pp. 1–2.
[5] I. Leite, C. Martinho, and A. Paiva, “Social robots for long-term interaction: a survey,” Int. J. Soc. Robot., vol. 5, no. 2, pp. 291–308, 2013.
[6] “Pleoworld.com.” [Online]. Available: pleoworld.com. [Accessed: 03-Jun-2015].
[7] Y. Fernaeus, M. Håkansson, M. Jacobsson, and S. Ljungblad, “How do you play with a robotic toy animal?: a long-term study of pleo,” in Proceedings of the 9th international Conference on interaction Design and Children, 2010, pp. 39–48.
[8] B. Robins, K. Dautenhahn, and P. Dickerson, “From isolation to communication: a case study evaluation of robot assisted play for children with autism with a minimally expressive humanoid robot,” in Advances in Computer-Human Interactions, 2009. ACHI’09. Second International Conferences on, 2009, pp. 205–211.
[9] D. Gouaillier, V. Hugel, P. Blazevic, C. Kilner, J. Monceaux, P. Lafourcade, B. Marnier, J. Serre, and B. Maisonnier, “The nao humanoid: a combination of performance and affordability.”
[10] K. Dautenhahn, “Socially intelligent robots: dimensions of human-robot interaction,” Philos. Trans. R. Soc. B Biol. Sci., vol. 362, no. 1480, pp. 679–704, 2007.
[11] P. Baxter, T. Belpaeme, L. Canamero, P. Cosi, Y. Demiris, V. Enescu, A. Hiolle, I. Kruijff-Korbayova, R. Looije, M. Nalin, and others, “Long-term human-robot interaction with young users,” in IEEE/ACM Human-Robot Interaction 2011 Conference (Robots with Children Workshop), 2011.
[12] T. B. Sheridan, Telerobotics, automation, and human supervisory control. MIT press, 1992.
[13] M. A. Goodrich and A. C. Schultz, “Human-robot interaction: a survey,” Found. trends human-computer Interact., vol. 1, no. 3, pp. 203–275, 2007.
[14] R. Stiefelhagen, C. Fügen, P. Gieselmann, H. Holzapfel, K. Nickel, and A. Waibel, “Natural human-robot interaction using speech, head pose and gestures,” in Intelligent Robots and Systems, 2004. (IROS 2004). Proceedings. 2004 IEEE/RSJ International Conference on, 2004, vol. 3, pp. 2422–2427.
[15] D. Perzanowski, A. C. Schultz, W. Adams, E. Marsh, and M. Bugajska, “Building a multimodal human-robot interface,” Intell. Syst. IEEE, vol. 16, no. 1, pp. 16–21, 2001.
[16] M. Turk, “Multimodal interaction: A review,” Pattern Recognit. Lett., vol. 36, pp. 189–195, 2014.
[17] B. D. Argall and A. G. Billard, “A survey of tactile human-robot interactions,” Rob. Auton. Syst., vol. 58, no. 10, pp. 1159–1176, 2010.
[18] A. Steinfeld, O. C. Jenkins, and B. Scassellati, “The oz of wizard: simulating the human for interaction research,” in Human-Robot Interaction, 2009 4th ACM/IEEE Int. Conf. on, 2009, pp. 101–107.
[19] S. Kiesler and P. Hinds, “Introduction to this special issue on human-robot interaction,” Human-Computer Interact., vol. 19, no. 1–2, pp. 1–8, 2004.
[20] C. Breazeal, “Social interactions in HRI: the robot view,” Syst. Man, Cybern. Part C Appl. Rev. IEEE Trans., vol. 34, no. 2, pp. 181–186, 2004.
[21] R. A. Bolt, “‘Put-that-there’: Voice and gesture at the graphics interface,” ACM SIGGRAPH Comput. Graph., vol. 14, no. 3, pp. 262–270, Jul. 1980.
[22] S. Oviatt, “Ten myths of multimodal interaction,” Commun. ACM, vol. 42, no. 11, pp. 74–81, Nov. 1999.
[23] S. Oviatt, A. DeAngeli, and K. Kuhn, “Integration and synchronization of input modes during multimodal human-computer interaction,” in Referring Phenomena in a Multimedia Context and their Computational Treatment, 1997, pp. 1–13.
[24] S. Oviatt, R. Coulston, and R. Lunsford, “When do we interact multimodally?: cognitive load and multimodal communication patterns,” … Conf. Multimodal interfaces, pp. 129–136, 2004.
[25] F. Schüssel, F. Honold, and M. Weber, “Influencing factors on multimodal interaction during selection tasks,” J. Multimodal User Interfaces, vol. 7, no. 4, pp. 299–310, 2013.
[26] S. Carbini, L. Delphin-Poulat, L. Perron, and J.-E. Viallet, “From a wizard of Oz experiment to a real time speech and gesture multimodal interface,” Signal Processing, vol. 86, no. 12, pp. 3559–3577, 2006.
[27] Z. Khan, “Attitudes towards intelligent service robots,” NADA KTH, Stock., vol. 17, 1998.
[28] M. Salem, S. Kopp, I. Wachsmuth, K. Rohlfing, and F. Joublin, “Generation and evaluation of communicative robot gesture,” Int. J. Soc. Robot., vol. 4, no. 2, pp. 201–217, 2012.
[29] C. M. Humphrey and J. A. Adams, “Compass visualizations for human-robotic interaction,” in Proceedings of the 3rd ACM/IEEE int. conf. on Human robot interaction, 2008, pp. 49–56.
[30] K. Dautenhahn, C. L. Nehaniv, M. L. Walters, B. Robins, H. Kose-Bagci, N. A. Mirza, and M. Blow, “KASPAR - a minimally expressive humanoid robot for human-robot interaction research,” Appl. Bionics Biomech., vol. 6, no. 3–4, pp. 369–397, 2009.
[31] “eSpeak.” [Online]. Available: http://espeak.sourceforge.net/. [Accessed: 03-Jul-2015].
[32] H. Z. Tan, M. A. Srinivasan, B. Eberman, and B. Cheng, “Human factors for the design of force-reflecting haptic interfaces,” Dyn. Syst. Control, vol. 55, no. 1, pp. 353–359, 1994.
[33] “Doodle.” [Online]. Available: doodle.com. [Accessed: 03-Jun-2015].