Is Spoken Language All-or-Nothing? Implications for future speech-based human-machine interaction

Roger K. Moore

Speech and Hearing Research Group, Dept. Computer Science, University of Sheffield, Regent Court, 211 Portobello, Sheffield, S1 4DP, UK
r.k.moore@sheffield.ac.uk
http://www.dcs.shef.ac.uk/~roger/

Abstract. Recent years have seen significant market penetration for voice-based personal assistants such as Apple’s Siri. However, despite this success, user take-up is frustratingly low. This position paper argues that there is a habitability gap caused by the inevitable mismatch between the capabilities and expectations of human users and the features and benefits provided by contemporary technology. Suggestions are made as to how such problems might be mitigated, but a more worrisome question emerges: “is spoken language all-or-nothing”? The answer, based on contemporary views on the special nature of (spoken) language, is that there may indeed be a fundamental limit to the interaction that can take place between mismatched interlocutors (such as humans and machines). However, it is concluded that interactions between native and non-native speakers, or between adults and children, or even between humans and dogs, might provide critical inspiration for the design of future speech-based human-machine interaction.

Keywords: spoken language; habitability gap; human-machine interaction

1 Introduction

The release in 2011 of Siri, Apple’s voice-based personal assistant for the iPhone, signalled a step change in the public perception of spoken language technology. For the first time, a significant number of everyday users were exposed to the possibility of using their voice to enter information, navigate applications or pose questions - all by speaking to their mobile device.
Of course, voice dictation software had been publicly available since the release of Dragon NaturallySpeaking in 1997, but such technology only found success in niche market areas for document creation (by users who could not or would not type). In contrast, Siri appeared to offer a more general-purpose interface that thrust the potential benefits of automated speech-based interaction into the forefront of the public’s imagination. By combining automatic speech recognition and speech synthesis with natural language processing and dialogue management, Siri promoted the possibility of a more conversational interaction between users and smart devices. As a result, competitors such as Google Now and Microsoft’s Cortana soon followed¹.

arXiv:1607.05174v1 [cs.HC] 18 Jul 2016

Of course, it is well established that, while voice-based personal assistants such as Siri are now very familiar to the majority of mobile device users, their practical value is still in doubt. This is evidenced by the preponderance of videos on YouTube™ that depict humorous rather than practical uses; it seems that people give such systems a try, play around with them for a short while and then go back to their more familiar ways of doing things. Indeed, this has been confirmed by a recent survey of users from around the world which showed that only 13% of the respondents used a facility such as Siri every day, whereas 46% had tried it once and then given up (citing inaccuracy and a lack of privacy as key reasons for abandoning it) [2].

This lack of serious take-up of voice-based personal assistants could be seen as the inevitable teething problems of a new(ish) technology, or it could be evidence of something more deep-seated. This position paper addresses these issues, and attempts to tease out some of the overlooked features of spoken language that might have a bearing on the success or failure of voice-based human-machine interaction.
In particular, attention is drawn to the inevitable mismatch between the capabilities and expectations of human users and the features and benefits provided by contemporary technical solutions. Some suggestions are made as to how such problems might be mitigated, but a more worrisome question emerges: “is spoken language all-or-nothing”?

2 The Nature of the Problem

There are many challenges facing the development of effective voice-based human-machine interaction [3,4]. As the technology has matured, so the applications that are able to be supported have grown in depth and complexity (see Fig. 1). From the earliest military Command and Control Systems to contemporary commercial Interactive Voice Response (IVR) Systems and the latest Voice-Enabled Personal Assistants (such as Siri), the variety of human accents, competing signals in the acoustic environment and the complexity of the application scenario have always presented significant barriers to practical usage. Considerable progress has been made in all of the core technologies, particularly following the emergence of the data-driven stochastic modelling paradigm [5] (now supplemented by deep learning [6]) as a key driver in pushing regularly benchmarked performance in a positive direction. Yet, as we have seen, usage remains a serious issue; not only does a speech interface compete with very effective non-speech GUIs [7], but people have a natural aversion to talking to machines in public spaces [2]. As Nass & Brave stated in their seminal book Wired for Speech [8]:

“voice interfaces have become notorious for fostering frustration and failure” (p.6).

¹ See [1] for a comprehensive review of the history of speech technology R&D up to, and including, the release of Siri.

Fig. 1. The evolution of spoken language technology applications from early military Command and Control Systems to future Autonomous Social Agents (robots).
These problems become magnified as the field moves forward to developing voice-based interaction with Embodied Conversational Agents (ECAs) and Autonomous Social Agents (robots). In these futuristic scenarios, it is assumed that spoken language will provide a “natural” conversational interface between human beings and so-called intelligent systems. However, there are many additional challenges which need to be overcome in order to address such a requirement . . .

“We need to move from developing robots that simply talk and listen to evolving intelligent communicative machines that are capable of truly understanding human behaviour, and this means that we need to look beyond speech, beyond words, beyond meaning, beyond communication, beyond dialogue and beyond one-off interactions.” [9] (p.321)

Of these, a perennial problem seems to be how to evolve the complexity of voice-based interfaces from simple structured dialogues to more flexible conversational designs without confusing the user [10,11,12]. Indeed, it has been known for some time that there appears to be a non-linear relationship between flexibility and usability [13] - see Fig. 2. As flexibility increases with advancing technology, so usability increases until users no longer know what they can and cannot say, at which point usability tumbles and interaction falls apart.

Fig. 2. Illustration of the consequences of increasing the flexibility of spoken language dialogue systems; increasing flexibility can lead to a habitability gap where usability drops catastrophically (reproduced, with permission, from Mike Phillips [13]). This means that it is surprisingly difficult to deliver a technology corresponding to the point marked ‘??’. Siri corresponds to the point marked ‘Add NL/Dialog’.
2.1 The “Habitability Gap”

Progress is being made in this area: for example, by providing targeted help to users [14,15,16] and by replacing the traditional notion of turn-taking with a more fluid interaction based on incremental processing [17,18]. Likewise, simple slot-filling approaches to language understanding and generation are being replaced by sophisticated statistical methods for estimating dialogue states and optimal next moves [19,20]. Nevertheless, it is still the case that there is a habitability gap of the form illustrated in Fig. 2.

In fact, the shape of the curve illustrated in Fig. 2 is virtually identical to the famous Uncanny Valley effect [21] in which a near human-looking artefact (such as a humanoid robot) can trigger feelings of eeriness and repulsion in an observer; as human likeness increases, so affinity increases until a point where artefacts start to appear creepy and affinity goes negative. A wide variety of explanations have been suggested for this non-linear relationship but, to date, there is only one quantitative model [22], and this is founded on the combined effect of categorical perception and mismatched perceptual cues giving rise to a form of perceptual tension. The implication of this model is that uncanniness - and hence, a lack of habitability - can be avoided if care is taken to align how an autonomous agent looks, sounds and behaves [23,9]. In other words, if a speech-enabled agent is to converse successfully with a human being, it should make clear its interactional affordances [24,25].

This analysis leads to an important implication - since a spoken language system consists of a number of different components, each of which possesses a certain level of technical capability, then in order to be coherent (and hence usable), the design of the overall system needs to be aligned to the component with the lowest level of performance.
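The alignment principle stated above can be made concrete with a minimal sketch. It assumes, purely for illustration, that each component's capability can be summarised as a single score between 0 and 1; the paper proposes no such scale, and the names `Component`, `weakest_component` and `overclaiming_components`, the `tolerance` parameter and the example scores are all hypothetical. The sketch finds the weakest component, to which the overall design would be aligned, and lists the components whose apparent capability runs far enough ahead of it to risk sending mixed messages to the user.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Component:
    """One part of a spoken language system (hypothetical representation)."""
    name: str
    capability: float  # assumed score in [0, 1]; not a measure defined in the paper


def weakest_component(components: List[Component]) -> Component:
    """Return the component with the lowest capability score.

    Under the alignment principle, the overall design (voice, appearance,
    behaviour) would be pitched at this component's level.
    """
    if not components:
        raise ValueError("at least one component is required")
    return min(components, key=lambda c: c.capability)


def overclaiming_components(components: List[Component],
                            tolerance: float = 0.2) -> List[str]:
    """Name the components that exceed the weakest one by more than `tolerance`.

    These are the parts most likely to signal abilities that the rest of the
    system cannot match. The tolerance value is an arbitrary assumption.
    """
    floor = weakest_component(components).capability
    return [c.name for c in components if c.capability - floor > tolerance]


if __name__ == "__main__":
    # Illustrative scores only: a very humanlike voice paired with limited dialogue.
    system = [
        Component("speech synthesis", 0.9),
        Component("speech recognition", 0.6),
        Component("dialogue management", 0.3),
    ]
    print("Align design to:", weakest_component(system).name)
    print("Over-claiming:", overclaiming_components(system))
```

In this toy configuration the humanlike synthesiser is flagged as over-claiming relative to the dialogue manager, which is the situation the text warns against.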
For example, giving an automated personal assistant a natural human voice is a recipe for user confusion in the (normal) situation where the other speech technology components are limited in their abilities. In other words, in order to maximise the effectiveness of the interaction, a speech-enabled robot should have a robot voice. As Bruce Balentine succinctly puts it [26]: “It’s better to be a good machine than a bad person”! This is an unpopular result², but there is evidence of its effectiveness [27], and it clearly has implications for contemporary voice-based personal assistants such as Siri, Google Now and Cortana which employ very humanlike voices³.

Of course, some might claim that the habitability problem only manifests itself in applications where task-completion is a critical measure of success. The suggestion would be that the situation might be different for applications in domains such as social robots, education or games in which the emphasis would be more on the spoken interaction itself. However, the argument presented in this paper is not concerned with the nature of the interaction; rather, it questions whether such speech-based interaction can be sustained without access to the notion of full language.

² It is often argued that such an approach is unimportant as users will habituate. However, habituation only occurs after sustained exposure, and a key issue here is how to increase the effectiveness of first encounters (since that has a direct impact on the likelihood of further usage).
³ Interestingly, these ideas do appear to be having some impact on the design of contemporary autonomous social agents such as Jibo (which has a childlike and mildly robotic voice) [28].

2.2 Half a Language?

So far, so good - as component technologies improve, so the flexibility of the overall system would increase, and as long as the capabilities of the individual
components are aligned, it should be possible to avoid falling into the habitability gap. However, sending mixed messages about the capabilities of a spoken language system is only one part of the story; even if a speech-based autonomous social agent looks, sounds and behaves in a coherent way, will users actually be able to engage in conversational interaction if the overall capability is less than that normally enjoyed by a human being? What does it mean for a language-based system to be compromised in some way? How can users know what they may and may not say [29,15], or even if this is the right question? Is there such a thing as half a language and, if so, is it habitable? Indeed, what is language anyway?

3 What is Language?

Unfortunately there is no space here to review the extensive and, at times, controversial history of the scientific study of language, or of the richness and variety of its spoken (and gestural) forms. Suffice to say that human beings have evolved a prolific system of (primarily vocal) interactive behaviours that is vastly superior to that enjoyed by any other animal [30,31,32,33,34]. As has been said a number of times . . .

“Spoken language is the most sophisticated behaviour of the most complex organism in the known universe.” [35].

The complexity and sophistication of (spoken) language tends to be masked by the apparent ease with which we, as human beings, use it. As a consequence, engineered solutions are often dominated by a somewhat naïve perspective involving the coding and decoding of messages passing from one brain (the sender) to another brain (the receiver). In reality, languaging is better viewed as an emergent property of the dynamic coupling between cognitive unities that serves to facilitate distributed sense-making through cooperative behaviours and, thereby, social structure [36,37,38,39,40].
Furthermore, the contemporary view is that language is based on the co-evolution of two key traits - ostensive-inferential communication and recursive mind-reading (including ‘Theory-of-Mind’) [41,42,43] - and that abstract (mental) meaning is grounded in the concrete (physical) world through metaphor [44,45].

These modern perspectives on language not only place strong emphasis on pragmatics [46], but they are also founded on an implicit assumption that interlocutors are conspecifics⁴ and hence share significant priors. Indeed, evidence suggests that some animals draw on representations of their own abilities (expressed as predictive models [47]) in order to interpret the behaviours of others [48,49]. For human beings, this is thought to be a key enabler for efficient recursive mind-reading and hence for language [50,51].

Several of these advanced concepts may be usefully expressed in pictographic form [52] - see Fig. 3.

⁴ Members of the same species.

Fig. 3. Pictographic representation of language-based coupling (dialogue) between two human interlocutors [52]. One interlocutor (and its environment) is depicted using solid lines and the other interlocutor (and its environment) is depicted using broken lines. As can be seen, communicative interaction is founded on two-way ostensive recursive mind-reading (including mutual Theory-of-Mind).

So now we arrive at an interesting position; if (spoken) language interaction between human beings is grounded through shared experiences, representations and priors, to what extent is it possible to construct a technology that is intended to replace one of the participants? For example, if one of the interlocutors illustrated in Fig. 3 is replaced by a cognitive robot (as in Fig. 4), then there will be an inevitable mismatch between the capabilities of the two partners, and coupled ostensive recursive mind-reading (i.e. full language) cannot emerge.
Fig. 4. Pictographic representation of coupling between a human being (on the left) and a cognitive robot (on the right). The robot lacks the capability of ostensive recursive mind-reading (it has no Theory-of-Mind), so the interaction is inevitably constrained.

Could it be that there is a fundamental limit to the language-based interaction that can take place between unequal partners - between humans and machines? Indeed, returning to the question posed in Section 2.2 “Is there such a thing as half a language?”, the answer seems to be “no”; spoken language does appear to be all-or-nothing . . .

“The assumption of continuity between a fully coded communication system at one end, and language at the other, is simply not justified.” [41] (p.46).

4 The Way Forward?

The story thus far provides a compelling explanation of the less-than-satisfactory experiences enjoyed by existing users of speech-enabled systems and identifies the source of the habitability gap outlined in Section 2.1. It would appear that, due to the gross mismatch between their respective priors, it might be impossible to create an automated system that would be capable of a sustained and productive language-based interaction with a human being (except in narrow specialised domains involving experienced users). The vision of constructing a general-purpose voice-enabled autonomous social agent may be fundamentally flawed - the equivalent of trying to build a vehicle that travels faster than light!

However, before we give up all hope, it is important to note that there are situations where voice-based interaction between mismatched partners is successful - but these are very different from the scenarios that are usually considered when designing current speech-based systems. For example, human beings regularly engage in vocal interaction with members of a different cultural and/or linguistic and/or generational background⁵.
In such cases, all participants dynamically adjust many aspects of their behaviour - the clarity of their pronunciation, their choice of words and syntax, their style of delivery, etc. - all of which may be controlled by the perceived effectiveness of the interaction (that is, using feedback in a coupled system). Indeed, a particularly good example of such accommodation between mismatched interlocutors is the different way in which caregivers talk to young children (termed “parentese”) [53]. Maybe these same principles should be applied to speech-based human-machine interaction? Indeed, perhaps we should be explicitly studying the particular adaptations that human beings make when attempting to converse with autonomous social agents - a new variety of spoken language that could be appropriately termed “robotese”⁶.

Of course, these scenarios all involve spoken interaction between one human being and another, hence in reality there is a huge overlap of priors in terms of bodily morphology, environmental context and cognitive structure, as well as learnt social and cultural norms. Arguably the largest mismatch arises between an adult and a very young child, yet this is still interaction between members of the same species. A more extreme mismatch exists between non-conspecifics; for example, between humans and animals. However, it is interesting to note that our nearest relatives - the apes - do not have language, and this seems to be because they do not have the key precursor to language: ostensive communication (apes do not seem to understand pointing gestures) [41].

Interestingly, one animal - the domestic dog - appears to excel in ostensive communication and, as a consequence, dogs are able to engage in very productive spoken language interaction with human partners (albeit one-sided and somewhat limited in scope) [55,41].
Spoken human-dog interaction may thus be a potentially important example of a heavily mismatched yet highly effective cooperative configuration that might usefully inform spoken human-robot interaction in hitherto unanticipated ways.

⁵ Interestingly, Nass & Brave [8] noted that people speak to poor automatic speech recognition systems as if they were non-native listeners.
⁶ Unfortunately, this term has already been coined to refer to a robot’s natural language abilities in robot-robot and robot-human communication [54].

5 Final Remarks

This paper has argued that there is a fundamental habitability problem facing contemporary spoken language systems, particularly as they penetrate the mass market and attempt to provide a general-purpose voice-based interface between human users and (so-called) intelligent systems. It has been suggested that the source of the difficulty in configuring genuinely usable systems is twofold: first, the need to align the visual, vocal and behavioural affordances of the system, and second, the need to overcome the huge mismatch between the capabilities and expectations of a human being and the features and benefits offered by even the most advanced autonomous social agent. This led to the preliminary conclusion that spoken language may indeed be all-or-nothing.

Finally, and on a positive note, it was observed that there are situations where successful spoken language interaction can take place between mismatched interlocutors (such as between native and non-native speakers, or between an adult and a child, or even between a human being and a dog). It is thus concluded that these scenarios might provide critical inspiration for the design of future speech-based human-machine interaction.
Acknowledgement

This work was supported by the European Commission [grant numbers EU-FP6-507422, EU-FP6-034434, EU-FP7-231868 and EU-FP7-611971], and the UK Engineering and Physical Sciences Research Council [grant number EP/I013512/1].

References

1. Pieraccini, R. (2012). The Voice in the Machine. MIT Press, Cambridge, MA.
2. Liao, S.-H. (2015). Awareness and Usage of Speech Technology. Masters thesis, Dept. Computer Science, University of Sheffield.
3. Deng, L., & Huang, X. (2004). Challenges in adopting speech recognition. Communications of the ACM, 47(1), 69-75.
4. Minker, W., Pittermann, J., Pittermann, A., Strauß, P.-M., & Bühler, D. (2007). Challenges in speech-based human-computer interfaces. International Journal of Speech Technology, 10(2-3), 109-119.
5. Gales, M., & Young, S. J. (2007). The application of hidden Markov models in speech recognition. Foundations and Trends in Signal Processing, 1(3), 195-304.
6. Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T. N., & Kingsbury, B. (2012). Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. Signal Processing Magazine, IEEE.
7. Moore, R. K. (2004). Modelling data entry rates for ASR and alternative input methods. In INTERSPEECH-ICSLP. Jeju, Korea.
8. Nass, C., & Brave, S. (2005). Wired for Speech: How Voice Activates and Advances the Human-computer Relationship. Cambridge, MA: MIT Press.
9. Moore, R. K. (2015). From talking and listening robots to intelligent communicative machines. In J. Markowitz (Ed.), Robots That Talk and Listen (pp. 317-335). Boston, MA: De Gruyter.
10. Bernsen, N. O., Dybkjaer, H., & Dybkjaer, L. (1998). Designing Interactive Speech Systems: From First Ideas to User Testing. London: Springer-Verlag.
11. McTear, M. F. (2004). Spoken Dialogue Technology: Towards the Conversational User Interface. London: Springer-Verlag.
12. Lopez Cozar Delgado, R. (2005). Spoken, Multilingual and Multimodal Dialogue Systems: Development and Assessment. Wiley.
13. Phillips, M. (2006). Applications of spoken language technology and systems. In M. Gilbert & H. Ney (Eds.), IEEE/ACL Workshop on Spoken Language Technology (SLT).
14. Tomko, S., Harris, T. K., Toth, A., Sanders, J., Rudnicky, A., & Rosenfeld, R. (2005). Towards efficient human machine speech communication. ACM Transactions on Speech and Language Processing, 2(1), 1-27.
15. Tomko, S. L. (2006). Improving User Interaction with Spoken Dialog Systems via Shaping. PhD Thesis, Carnegie Mellon University.
16. Komatani, K., Fukubayashi, Y., Ogata, T., & Okuno, H. G. (2007). Introducing utterance verification in spoken dialogue system to improve dynamic Help generation for novice users. In 8th SIGdial Workshop on Discourse and Dialogue (pp. 202-205).
17. Schlangen, D., & Skantze, G. (2009). A general, abstract model of incremental dialogue processing. 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL-09). Athens, Greece.
18. Hastie, H., Lemon, O., & Dethlefs, N. (2012). Incremental spoken dialogue systems: Tools and data. In Proceedings of NAACL-HLT Workshop on Future Directions and Needs in the Spoken Dialog Community (pp. 15-16). Montreal, Canada.
19. Williams, J. D., & Young, S. J. (2007). Partially observable Markov decision processes for spoken dialog systems. Computer Speech and Language, 21(2), 393-422.
20. Gasic, M., Breslin, C., Henderson, M., Kim, D., Szummer, M., Thomson, B., Tsiakoulis, P., & Young, S. J. (2013). POMDP-based dialogue manager adaptation to extended domains. In SIGDIAL (pp. 214-222). Metz, France.
21. Mori, M. (1970). Bukimi no tani (the uncanny valley). Energy, 7, 33-35.
22. Moore, R. K. (2012). A Bayesian explanation of the “Uncanny Valley” effect and related psychological phenomena. Nature Scientific Reports, 2(864).
23. Moore, R. K., & Maier, V. (2012).
Visual, vocal and behavioural affordances: some effects of consistency. 5th International Conference on Cognitive Systems (CogSys 2012). Vienna.
24. Gibson, J. J. (1977). The theory of affordances. In R. Shaw & J. Bransford (Eds.), Perceiving, Acting, and Knowing: Toward an Ecological Psychology (pp. 67-82). Hillsdale, NJ: Lawrence Erlbaum.
25. Worgan, S., & Moore, R. K. (2010). Speech as the perception of affordances. Ecological Psychology, 22(4), 327-343.
26. Balentine, B. (2007). It’s Better to Be a Good Machine Than a Bad Person: Speech Recognition and Other Exotic User Interfaces at the Twilight of the Jetsonian Age. Annapolis: ICMI Press.
27. Moore, R. K., & Morris, A. (1992). Experiences collecting genuine spoken enquiries using WOZ techniques. 5th DARPA Workshop on Speech and Natural Language. New York.
28. Jibo: The World’s First Social Robot for the Home, https://www.jibo.com
29. Jokinen, K., & Hurtig, T. (2006). User expectations and real experience on a multimodal interactive system. In INTERSPEECH-ICSLP Ninth International Conference on Spoken Language Processing. Pittsburgh, PA, USA.
30. Gardiner, A. H. (1932). The Theory of Speech and Language. Oxford, England: Oxford Univ. Press.
31. Bickerton, D. (1995). Language and Human Behavior. Seattle, WA, US: University of Washington Press.
32. Hauser, M. D. (1997). The Evolution of Communication. The MIT Press.
33. Hauser, M. D., Chomsky, N., & Fitch, W. T. (2002). The faculty of language: what is it, who has it, and how did it evolve? Science, 298, 1569-1579.
34. Everett, D. (2012). Language: The Cultural Tool. London: Profile Books.
35. Moore, R. K. (2007). Spoken language processing: piecing together the puzzle. Speech Communication, 49(5), 418-435.
36. Maturana, H. R., & Varela, F. J. (1987). The Tree of Knowledge: The Biological Roots of Human Understanding. Boston, MA: New Science Library/Shambhala Publications.
37. Cummins, F. (2014).
Voice, (inter-)subjectivity, and real time recurrent interaction. Frontiers in Psychology, 5, 760.
38. Bickhard, M. H. (2007). Language as an interaction system. New Ideas in Psychology, 25(2), 171-187.
39. Cowley, S. J. (Ed.). (2011). Distributed Language. John Benjamins Publishing Company.
40. Fusaroli, R., Raczaszek-Leonardi, J., & Tylén, K. (2014). Dialog as interpersonal synergy. New Ideas in Psychology, 32, 147-157.
41. Scott-Phillips, T. (2015). Speaking Our Minds: Why human communication is different, and how language evolved to make it special. Palgrave MacMillan.
42. Baron-Cohen, S. (1999). Evolution of a theory of mind? In M. Corballis & S. Lea (Eds.), The Descent of Mind: Psychological Perspectives on Hominid Evolution. Oxford University Press.
43. Malle, B. F. (2002). The relation between language and theory of mind in development and evolution. In T. Givón & B. F. Malle (Eds.), The Evolution of Language out of Pre-Language (pp. 265-284). Amsterdam: Benjamins.
44. Lakoff, G., & Johnson, M. (1980). Metaphors We Live By. Chicago: University of Chicago Press.
45. Feldman, J. A. (2008). From Molecules to Metaphor: A Neural Theory of Language. Bradford Books.
46. Levinson, S. C. (1983). Pragmatics. Cambridge: Cambridge University Press.
47. Friston, K., & Kiebel, S. (2009). Predictive coding under the free-energy principle. Phil. Trans. R. Soc. B, 364(1521), 1211-1221.
48. Rizzolatti, G., & Craighero, L. (2004). The mirror-neuron system. Annual Review of Neuroscience, 27, 169-192.
49. Wilson, M., & Knoblich, G. (2005). The case for motor involvement in perceiving conspecifics. Psychological Bulletin, 131(3), 460-473.
50. Pickering, M. J., & Garrod, S. (2007). Do people use language production to make predictions during comprehension? Trends in Cognitive Sciences, 11(3), 105-110.
51. Garrod, S., Gambi, C., & Pickering, M. J. (2013). Prediction at all levels: forward model predictions can enhance comprehension.
Language, Cognition and Neuroscience, 29(1), 46-48.
52. Moore, R. K. (2016). Introducing a pictographic language for envisioning a rich variety of enactive systems with different degrees of complexity. Int. J. Advanced Robotic Systems, 13(74).
53. Fernald, A. (1985). Four-month-old infants prefer to listen to Motherese. Infant Behavior and Development, 8, 181-195.
54. Matson, E. T., Taylor, J., Raskin, V., Min, B.-C., & Wilson, E. C. (2011). A natural language exchange model for enabling human, agent, robot and machine interaction. In The 5th International Conference on Automation, Robotics and Applications (pp. 340-345). IEEE.
55. Serpell, J. (1995). The Domestic Dog: Its Evolution, Behaviour and Interactions with People. Cambridge University Press.