Perception of Personality and Naturalness through Dialogues by Native Speakers of American English and Arabic

Maxim Makatchev
Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
mmakatch@cs.cmu.edu

Reid Simmons
Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
reids@cs.cmu.edu

Abstract

Linguistic markers of personality traits have been studied extensively, but few cross-cultural studies exist. In this paper, we evaluate how native speakers of American English and Arabic perceive personality traits and naturalness of English utterances that vary along the dimensions of verbosity, hedging, lexical and syntactic alignment, and formality. The utterances are the turns within dialogue fragments that are presented as text transcripts to the workers of Amazon’s Mechanical Turk. The results of the study suggest that all four dimensions can be used as linguistic markers of all personality traits by both language communities. A further comparative analysis shows cross-cultural differences for some combinations of measures of personality traits and naturalness, the dimensions of linguistic variability and dialogue acts.

1 Introduction

English has been used as a lingua franca across the world, but the usage differs. The variabilities in English introduced by dialects, cultures, and non-native speakers result in different syntax and words expressing similar meanings and in different meanings attributed to similar expressions. These differences are a source of pragmatic failures (Thomas, 1983): situations when listeners perceive meanings and affective attitudes unintended by speakers.
For example, Thomas (1984) reports that usage of Illocutionary Force Indicating Devices (IFIDs, such as “I warn you” (Searle, 1969)) in English by native speakers of Russian causes the speakers to sometimes appear “inappropriately domineering in interactions with English-speaking equals.” Dialogue systems, just like humans, may misattribute attitudes and misinterpret the intent of users’ utterances. Conversely, they may also cause misattributions and misinterpretations on the user’s part. Hence, taking into account the user’s dialect, culture, or native language may help reduce pragmatic failures.

arXiv:1105.4582v1 [cs.CL] 23 May 2011

This kind of adaptation requires a mapping from utterances, or more generally, their linguistic features, to meanings and affective attributions for each of the target language communities. In this paper we present an exploratory study that evaluates such a mapping from the linguistic features of verbosity, hedging, alignment, and formality (as defined in Section 3.1) to the perceived personality traits and naturalness across the populations of native speakers of American English and Arabic.

Estimating the relationship between linguistic features and their perception across language communities faces a number of methodological difficulties. First, language communities must be outlined in a way that affords generalizing within their populations. Defining language communities is a hard problem, even if it is based on the “mother tongue” (McPherson et al., 2000). Next, linguistic features that are potentially important for the adaptation must be selected. These are, for example, the linguistic devices that contribute to realization of rich points (Agar, 1994), i.e. the behaviors that signal differences between language communities. To be useful for dialogue system research, the selected linguistic features should be feasible to implement in natural language generation and interpretation modules.
Then, a corpus of stimuli that span the variability of the linguistic features must be created. The stimuli should reflect the context where the dialogue system is intended to be used. For example, in the case of an information-giving dialogue system, the stimuli should include some question-answer adjacency pairs (Schegloff and Sacks, 1973). Finally, scales should be chosen to allow for scoring of the stimuli with respect to the metrics of interest. These scales should be robust enough to be applied within each of the language communities.

In the remainder of this paper, we describe each of these steps in the context of an exploratory study that evaluates perception of English utterances by native speakers of American English and Arabic. Our application is an information-giving dialogue system that is used by the robot receptionists (roboceptionists) in Qatar and the United States (Makatchev et al., 2009; Makatchev et al., 2010). In the next section, we continue with an overview of the related work. Section 3 introduces the experiment, including the selection of stimuli, measures, and design, and describes the recruitment of participants via Amazon’s Mechanical Turk (MTurk). We discuss results in Section 4 and provide a conclusion in Section 5.

2 Related work

2.1 Cross-cultural variability in English

Language is tightly connected with culture (Agar, 1994). As a result, even native speakers of a language use it differently across dialects (e.g. African American Vernacular English and Standard American English), genders (see, for example, (Lakoff, 1973)) and social statuses (e.g. (Huspek, 1989)), among other dimensions.

Speakers of English as a second language display variabilities in language use that are consistent with their native languages and backgrounds. For example, Nelson et al. (1996) report that Syrian speakers of Arabic tend to use different compliment response strategies as compared with Americans.
Aguilar (1998) reviews types of pragmatic failures that are influenced by native language and culture. In particular, that review cites Davies (1987) on a pragmatic failure due to non-equivalence of formulas: native speakers of Moroccan Arabic use a spoken formulaic expression to wish a sick person quick recovery, whereas in English the formula “get well soon” is not generally used in speech. Feghali (1997) reviews features of Arabic communicative style, including indirectness (concealment of wants, needs or goals (Gudykunst and Ting-Toomey, 1988)), elaborateness (rich and expressive language use, e.g. involving rhetorical patterns of exaggeration and assertion (Patai, 1983)) and affectiveness (i.e. “intuitive-affective style of emotional appeal” (Glenn et al., 1977), related to the patterns of organization and presentation of arguments).

In this paper, we are concerned with English usage by native speakers of American English and native speakers of Arabic. We have used the features of the Arabic communicative style outlined above as a guide in selecting the dimensions of linguistic variability that are presented in Section 3.1.

2.2 Measuring pragmatic variation

Perception of pragmatic variation of spoken language and text has been shown to vary across cultures along the dimensions of personality (e.g. (Scherer, 1972)), emotion (e.g. (Burkhardt et al., 2006)), deception (e.g. (Bond et al., 1990)), among others. Within a culture, personality traits such as extraversion have been shown to have consistent markers in language (see overview in (Mairesse et al., 2007)). For example, Furnham (1990) notes that in conversation, extraverts are less formal and use more verbs, adverbs and pronouns. However, the authors are not aware of any quantitative studies that compare linguistic markers of personality across cultures. The present study aims to help fill this gap.
A mapping between linguistic dimensions and personality has been evaluated by grading essays and conversation extracts (Mairesse et al., 2007), and by grading utterances generated automatically with a random setting of linguistic parameters (Mairesse and Walker, 2008). In the exploratory study presented in this paper, we ask our participants to grade dialogue fragments that were manually created to vary along each of the four linguistic dimensions (see Section 3.1).

3 Experiment

In the review of related work, we presented some evidence supporting the claim that linguistic markers of personality may differ across cultures. In this section, we describe a study that evaluates perception of personality traits and naturalness of utterances by native speakers of American English and Arabic.

3.1 Stimuli

The selection of stimuli attempts to satisfy three objectives. First, our application: our dialogue system is intended to be used on a robot receptionist. Hence, the stimuli are snippets of dialogue that include four dialogue acts that are typical in this kind of embodied information-giving dialogue (Makatchev et al., 2009): a greeting, a question-answer pair, a disagreement (with the user’s guess of an answer), and an apology (for the robot not knowing the answer to the question).

Second, we would like to vary our stimuli along the linguistic dimensions that are potentially strong indicators of personality traits. Extraverts, for example, are reported to be more verbose (use more words per utterance and more dialogue turns to achieve the same communicative goal), less formal (Furnham, 1990) (in choice of address terms, for example), and less likely to hedge (use expressions such as “perhaps” and “maybe”) (Nass et al., 1995).
Lexical and syntactic alignment, namely, the tendency of a speaker to use the same lexical and syntactic choices as their interlocutor, is considered, at least in part, to reflect the speaker’s co-operation and willingness to adopt the interlocutor’s perspective (Haywood et al., 2003). There is some evidence that the degree of alignment is associated with personality traits of the speakers (Gill et al., 2004).

Third, we would like to select linguistic dimensions that potentially expose cross-cultural differences in perception of personality and naturalness. In particular, we are interested in the linguistic devices that help realize rich points (the behaviors that signal differences) between the native speakers of American English and Arabic. We choose to realize indirectness and elaborateness, characteristic of Arabic spoken language (Feghali, 1997), by varying the dimensions of verbosity and hedging. High power distance, or influence of relative social status on the language (Feghali, 1997), can be realized by the degrees of formality and alignment.

In summary, the stimuli are dialogue fragments where utterances of one of the interlocutors vary across (1) dialogue acts: a greeting, question-answer pair, disagreement, apology, and (2) four linguistic dimensions: verbosity, hedging, alignment, and formality. Each of the linguistic dimensions is parameterized by 3 values of valence: negative, neutral and positive. Within each of the four dialogue acts, stimuli corresponding to the neutral valences are represented by the same dialogue across all four linguistic dimensions. The four linguistic dimensions are realized as follows:

• Verbosity is realized as number of words within each turn of the dialogue. In the case of the greeting, positive verbosity is realized by increased number of dialogue turns (see footnote 1).
• Positive valence of hedging implies more tentative words (“maybe,” “perhaps,” etc.)
or expressions of uncertainty (“I think,” “if I am not mistaken”). Conversely, negative valence of hedging is realized via words “sure,” “definitely,” etc.
• Positive valence of alignment corresponds to preference towards the lexical and syntactic choices of the interlocutor. Conversely, negative alignment implies less overlap in lexical and syntactic choices between the interlocutors.
• Our model of formality deploys the following linguistic devices: in-group identity markers that target positive face (Brown and Levinson, 1987) such as address forms, jargon and slang, and deference markers that target negative face, such as “kindly”, terms of address, hedges. These devices are used in Arabic politeness phenomena (Farahat, 2009), and there is evidence of their pragmatic transfer from Arabic to English (e.g. (Bardovi-Harlig et al., 2007) and (Ghawi, 1993)).

The complete set of stimuli is shown in Tables 2–6. Each dialogue fragment is presented as a text on an individual web page. On each page, the participant is asked to imagine that he or she is one of the interlocutors and the other interlocutor is described as “a female receptionist in her early 20s and of the same ethnic background” as that of the participant. The description of the occupation, age, gender and ethnicity of the interlocutor whose utterances the participant is asked to evaluate should provide minimal context and help avoid variability due to the implicit assumptions that subjects may make.

Footnote 1: The multi-stage greeting dialogue was developed via ethnographic studies conducted at Alelo by Dr. Suzanne Wertheim. Used with permission from Alelo, Inc.

3.2 Measures

In order to avoid a possible interference of scales, we ran two versions of the study in parallel.
In one version, participants were asked to evaluate the receptionist’s utterances with respect to measures of the Big Five personality traits (John and Srivastava, 1999), namely the traits of extraversion, agreeableness, conscientiousness, emotional stability, and openness, using the ten-item personality questionnaire (TIPI, see (Gosling et al., 2003)). In the other version, participants were asked to evaluate the receptionist’s utterances with respect to their naturalness on a 7-point Likert scale by answering the question “Do you agree that the receptionist’s utterances were natural?” Variants of such a naturalness scale were used by Burkhardt et al. (2006) and Mairesse and Walker (2008).

3.3 Experimental design

The experiment used a crossed design with the following factors: dimensions of linguistic variability (verbosity, hedging, alignment, or formality), valence (negative, neutral, or positive), dialogue acts (greeting, question-answer, disagreement, or apology), native language (American English or Arabic) and gender (male or female).

In an attempt to balance the workload of the participants, the experimental sessions consisted of one or two linguistic variability conditions (12 or 24 dialogues, respectively), depending on whether the participant was assigned to the study that used personality or naturalness scales. Hence valence and dialogue act were within-subject factors, while the linguistic variability dimension was treated as an across-subject factor, as were native language and gender. Within each session the items were presented in a random order to minimize possible carryover effects.

Language          Country                  N
Arabic            Algeria                  1
                  Bahrain                  1
                  Egypt                   56
                  Jordan                  32
                  Morocco                 45
                  Palestinian Territory    1
                  Qatar                    1
                  Saudi Arabia             5
                  United Arab Emirates    13
                  Total                  155
American English  United States          166

Table 1: Distribution of study participants by country.
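The crossed design of Section 3.3 can be illustrated with a short enumeration. This is a minimal sketch and not the authors' tooling: the function names session_items and distinct_stimuli are hypothetical, and the count of distinct dialogues assumes, as stated in Section 3.1, that the neutral dialogue for each dialogue act is shared by all four dimensions.

```python
from itertools import product

# Factor levels as listed in Section 3.3.
DIMENSIONS = ["verbosity", "hedging", "alignment", "formality"]
VALENCES = ["negative", "neutral", "positive"]
DIALOGUE_ACTS = ["greeting", "question-answer", "disagreement", "apology"]


def session_items(dimensions):
    """Return the (dimension, valence, dialogue act) cells of one session."""
    return list(product(dimensions, VALENCES, DIALOGUE_ACTS))


def distinct_stimuli():
    """Count distinct dialogues when neutral cells share one control
    dialogue per dialogue act (assumption taken from Section 3.1)."""
    keys = set()
    for dim, valence, act in product(DIMENSIONS, VALENCES, DIALOGUE_ACTS):
        if valence == "neutral":
            keys.add(("control", act))  # shared across all dimensions
        else:
            keys.add((dim, valence, act))
    return len(keys)
```

A session with one dimension yields 12 cells and a session with two dimensions yields 24, matching the session sizes reported above. Under the shared-control assumption, the full design contains 36 distinct dialogues.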
3.4 Participants

We used Amazon’s Mechanical Turk (MTurk) to recruit native speakers of American English from the United States and native speakers of Arabic from any of the set of predominantly Arabic-speaking countries (according to the IP address).

Upon completion of each task, participants received a monetary reward as a credit to their MTurk account. Special measures were taken to prevent multiple participation of one person in the same study condition: the study website refused access to such a user based on the IP address, and MTurk logs were checked for repeated MTurk user names to detect logging into the same MTurk account from different IP addresses. Hidden questions were planted within the study to verify the fluency in the participant’s reported native language.

The distribution of the participants across countries is shown in Table 1. We observed a regional gender bias similar to the one reported by Ross et al. (2010): there were 100 male and 55 female participants in the Arabic condition, and 63 male and 103 female participants in the American English condition.

4 Results

We analyzed the data by fitting linear mixed-effects (LME) models (Pinheiro and Bates, 2000) and performing model selection using ANOVA. The comparison of models fitted to explain the personality and naturalness scores (controlling for language and gender) shows significant main effects of valence and dialogue acts for all pairs of personality traits (and naturalness) and linguistic features. The results also show that for every personality trait (and naturalness) there is a linguistic feature that results in a significant three-way interaction between its valence, the native language, and the dialogue act.
These results suggest that (a) for both language communities, every linguistic dimension is associated with every personality trait and naturalness, for at least some of the dialogue acts, and (b) there are differences in the perception of every personality trait and naturalness between the two language communities.

To further explore the latter finding, we conducted a post-hoc analysis consisting of paired t-tests that were performed pairwise between the three values of valence for each combination of language, linguistic feature, and personality trait (and naturalness). Note that comparing raw scores between the language conditions would be prone to finding spurious differences due to potential culture-specific tendencies in scoring on the Likert scale: (a) perception of magnitudes and (b) appropriateness of the intensity of agreeing or disagreeing. Instead, we compare the language conditions with respect to (a) the relative order of the three valences and (b) the binarized scores, namely whether the score is above 4 or below 4 (with scores that are not significantly different from 4 excluded from comparison), where 4 is the neutral point of the 7-point Likert scale.

The selected results of the post-hoc analysis are shown in Figure 1. The most prominent cross-cultural differences were found in the scoring of naturalness across the valences of the formality dimension. Speakers of American English, unlike the speakers of Arabic, find formal utterances unnatural in greetings, question-answer and disagreement dialogue acts. Formal utterances also tend to be perceived as indicators of openness and conscientiousness by Arabic speakers, and not by American English speakers, in disagreements and apologies respectively. Finally, hedging in apologies is perceived as an indicator of agreeableness by American English speakers, but not by speakers of Arabic.
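The two qualitative comparisons described above, the relative order of the valences and the binarized scores, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' analysis code: the function names are hypothetical, and a confidence interval that excludes the neutral point is used here as a stand-in for the significance test against 4, which the text does not specify.

```python
NEUTRAL_POINT = 4  # midpoint of the 7-point Likert scale


def binarize(ci_low, ci_high, neutral=NEUTRAL_POINT):
    """Classify a mean score by the side of the neutral point it falls on.

    Returns "above" or "below" when the interval excludes the neutral
    point, and None otherwise (such scores are excluded from comparison).
    """
    if ci_low > neutral:
        return "above"
    if ci_high < neutral:
        return "below"
    return None


def valence_order(means):
    """Return valence labels sorted from lowest to highest mean score."""
    return sorted(means, key=means.get)


def qualitative_difference(group_a, group_b):
    """List valences whose binarized scores disagree between two groups.

    Each argument maps a valence label to a (ci_low, ci_high) pair.
    Valences excluded in either group are skipped.
    """
    differing = []
    for valence in group_a:
        side_a = binarize(*group_a[valence])
        side_b = binarize(*group_b[valence])
        if side_a and side_b and side_a != side_b:
            differing.append(valence)
    return differing
```

Comparing only the side of the neutral point and the ordering of valences, rather than raw magnitudes, is what makes the comparison robust to culture-specific response styles on the Likert scale.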
Interestingly, no qualitative differences across language conditions were found in the perception of extraversion and stability. It is possible that this cross-cultural consistency confirms the view of extraversion, in particular, as one of the most consistently identified dimensions (see, for example, (Gill and Oberlander, 2002)). It is also possible that our stimuli were unable to pinpoint the extraversion-related rich points due to the choice of the linguistic dimensions or the particular wording chosen. A larger variety of stimuli per condition, and an ethnography to identify potentially culture-specific linguistic devices of extraversion, could shed light on this issue.

5 Conclusion

We presented an exploratory study to evaluate a set of linguistic markers of Big Five personality traits and naturalness across two language communities: native speakers of American English living in the US, and native speakers of Arabic living in one of the predominantly Arabic-speaking countries of North Africa and the Middle East. The results suggest that the four dimensions of linguistic variability are recognized as markers of all five personality traits by both language communities. A comparison across language communities uncovered some qualitative differences in the perception of openness, conscientiousness, agreeableness, and naturalness.

The results of the study can be used to adapt natural language generation and interpretation to native speakers of American English or Arabic. This exploratory study also supports the feasibility of the crowdsourcing approach to validate the linguistic devices that realize rich points, i.e. behaviors that signal differences across languages and cultures.

Future work shall evaluate effects of regional dialects and address the issue of particular wording choices by using multiple stimuli per condition.

Acknowledgments

This publication was made possible by the support of an NPRP grant from the Qatar National Research Fund.
The statements made herein are solely the responsibility of the authors.

The authors are grateful to Ameer Ayman Abdulsalam, Michael Agar, Hatem Alismail, Justine Cassell, Majd Sakr, Nik Melchior, and Candace Sidner for their comments on the study.

References

[Agar1994] Michael Agar. 1994. Language shock: Understanding the culture of conversation. William Morrow, New York.
[Aguilar1998] Maria Jose Coperias Aguilar. 1998. Intercultural (mis)communication: The influence of L1 and C1 on L2 and C2. A tentative approach to textbooks. Cuadernos de Filología Inglesa, 7(1):99–113.
[Bardovi-Harlig et al.2007] Kathleen Bardovi-Harlig, Marda Rose, and Edelmira L. Nickels. 2007. The use of conventional expressions of thanking, apologizing, and refusing. In Proceedings of the 2007 Second Language Research Forum, pages 113–130.
[Bond et al.1990] Charles F. Bond, Adnan Omar, Adnan Mahmoud, and Richard Neal Bonser. 1990. Lie detection across cultures. Journal of Nonverbal Behavior, 14:189–204.
[Brown and Levinson1987] P. Brown and S. C. Levinson. 1987. Politeness: Some universals in language usage. Cambridge University Press, Cambridge.
[Burkhardt et al.2006] F. Burkhardt, N. Audibert, L. Malatesta, O. Türk, L. Arslan, and V. Auberge. 2006. Emotional prosody - does culture make a difference? In Proc. Speech Prosody.
[Davies1987] Eirlys E. Davies. 1987. A contrastive approach to the analysis of politeness formulas. Applied Linguistics, 8(1):75–88.
[Farahat2009] Said Hassan Farahat. 2009. Politeness phenomena in Palestinian Arabic and Australian English: A cross-cultural study of selected contemporary plays (PhD thesis). Australian Catholic University, Australia.
[Feghali1997] Ellen Feghali. 1997. Arab cultural communication patterns. International Journal of Intercultural Relations, 21(3):345–378.
[Furnham1990] A. Furnham. 1990. Language and personality. In H. Giles and W. Robinson, editors, Handbook of Language and Social Psychology, pages 73–95. Wiley.
[Ghawi1993] Mohammed Ghawi. 1993. Pragmatic transfer in Arabic learners of English. El Two Talk, 1(1):39–52.
[Gill and Oberlander2002] A. Gill and J. Oberlander. 2002. Taking care of the linguistic features of extraversion. In Proceedings of the 24th Annual Conference of the Cognitive Science Society, pages 363–368.
[Gill et al.2004] A. Gill, A. Harrison, and J. Oberlander. 2004. Interpersonality: Individual differences and interpersonal priming. In Proceedings of the 26th Annual Conference of the Cognitive Science Society, pages 464–469.
[Glenn et al.1977] E. S. Glenn, D. Witmeyer, and K. A. Stevenson. 1977. Cultural styles of persuasion. International Journal of Intercultural Relations, 1(3):52–66.
[Gosling et al.2003] Samuel D. Gosling, Peter J. Rentfrow, and William B. Swann Jr. 2003. A very brief measure of the Big-Five personality domains. Journal of Research in Personality, 37:504–528.
[Gudykunst and Ting-Toomey1988] W. B. Gudykunst and S. Ting-Toomey. 1988. Culture and interpersonal communication. Sage, Newbury Park, CA.
[Haywood et al.2003] S. Haywood, M. Pickering, and H. Branigan. 2003. Co-operation and co-ordination in the production of noun phrases. In Proceedings of the 25th Annual Conference of the Cognitive Science Society, pages 533–538.
[Huspek1989] Michael Huspek. 1989. Linguistic variability and power: An analysis of you know/I think variation in working-class speech. Journal of Pragmatics, 13(5):661–683.
[John and Srivastava1999] Oliver P. John and Sanjay Srivastava. 1999. The Big Five trait taxonomy: History, measurement, and theoretical perspectives. In Lawrence A. Pervin and Oliver P. John, editors, Handbook of Personality: Theory and Research, pages 102–138.
[Lakoff1973] Robin Lakoff. 1973. Language and woman’s place. Language in Society, 2(1):45–80.
[Mairesse and Walker2008] Francois Mairesse and Marilyn Walker. 2008. Trainable generation of big-five personality styles through data-driven parameter estimation.
In Proc. of 46th Annual Meeting of the Association for Computational Linguistics (ACL).
[Mairesse et al.2007] F. Mairesse, M. A. Walker, M. R. Mehl, and R. K. Moore. 2007. Using linguistic cues for the automatic recognition of personality in conversation and text. Journal of Artificial Intelligence Research, 30:457–500.
[Makatchev et al.2009] Maxim Makatchev, Min Kyung Lee, and Reid Simmons. 2009. Relating initial turns of human-robot dialogues to discourse. In Proc. of the Int. Conf. on Human-Robot Interaction (HRI), pages 321–322. ACM.
[Makatchev et al.2010] Maxim Makatchev, Imran Aslam Fanaswala, Ameer Ayman Abdulsalam, Brett Browning, Wael Mahmoud Gazzawi, Majd Sakr, and Reid Simmons. 2010. Dialogue patterns of an Arabic robot receptionist. In Proc. of the Int. Conf. on Human-Robot Interaction (HRI), pages 167–168. ACM.
[McPherson et al.2000] M. McPherson, L. Smith-Lovin, and J. M. Cook. 2000. What is a language community? American Journal of Political Science, 44(1):142–155.
[Nass et al.1995] Clifford Nass, Y. Moon, B. Fogg, and B. Reeves. 1995. Can computer personalities be human personalities? Journal of Human-Computer Studies, 43:223–239.
[Nelson et al.1996] Gayle L. Nelson, Mahmoud Al-Batal, and Erin Echols. 1996. Arabic and English compliment responses: Potential for pragmatic failure. Applied Linguistics, 17(4):411–432.
[Patai1983] R. Patai. 1983. The Arab mind. Charles Scribner’s Sons, New York.
[Pinheiro and Bates2000] J. C. Pinheiro and D. M. Bates. 2000. Mixed-Effects Models in S and S-PLUS. Springer.
[Ross et al.2010] Joel Ross, Lilly Irani, M. Six Silberman, Andrew Zaldivar, and Bill Tomlinson. 2010. Who are the crowdworkers?: shifting demographics in Mechanical Turk. In Proceedings of the 28th of the international conference extended abstracts on Human factors in computing systems, CHI EA ’10, pages 2863–2872, New York, NY, USA. ACM.
[Schegloff and Sacks1973] Emanuel A. Schegloff and Harvey Sacks. 1973. Opening up closings.
Semiotica, 8(4):289–327.
[Scherer1972] Klaus R. Scherer. 1972. Judging personality from voice: A cross-cultural approach to an old issue in interpersonal perception. Journal of Personality, 40:191–210.
[Searle1969] John Searle. 1969. Speech acts: An essay in the philosophy of language. Cambridge University Press.
[Thomas1983] Jenny Thomas. 1983. Cross-cultural pragmatic failure. Applied Linguistics, 4(2):91–112.
[Thomas1984] Jenny Thomas. 1984. Cross-cultural discourse as ‘unequal encounter’: Towards a pragmatic analysis. Applied Linguistics, 5(3):226–235.

Control (neutral)
  Greeting
    A: Good morning.
    B: Good morning. How may I help you?
  Question-Answer
    A: Could you tell me where the library is?
    B: It’s at the end of the hallway on your left.
  Disagreement
    A: Could you tell me where the library is?
    B: It’s on the second floor.
    A: I thought it was on the first floor.
    B: No, there is no library on the first floor.
  Apology
    A: Could you tell me where the library is?
    B: Sorry, I don’t know.

Table 2: Stimuli corresponding to the neutral control condition, shared by all dimensions of linguistic variability.

Verbosity negative
  Greeting
    A: Good morning.
    B: Morning. May I help you?
  Question-Answer
    A: Could you tell me where the library is?
    B: End of the hallway on your left.
  Disagreement
    A: Could you tell me where the library is?
    B: Second floor.
    A: I thought it was on the first floor.
    B: No, it is not.
  Apology
    A: Could you tell me where the library is?
    B: I don’t know.

Verbosity positive
  Greeting
    A: Good morning.
    B: Good morning. How are you today?
    A: I am doing well, thanks. You?
    B: Very well, thank you. How’s your family?
    A: Everyone is doing fine, thanks. How about yours?
    B: Mine is doing well too. How may I help you?
  Question-Answer
    A: Could you tell me where the library is?
    B: Yes, just follow this hallway until it ends and you will find the library on your left hand side.
  Disagreement
    A: Could you tell me where the library is?
    B: Yes, the library is on the second floor.
    A: I thought it was on the first floor.
    B: No, there is no library on the first floor of this building.
  Apology
    A: Could you tell me where the library is?
    B: I am so sorry, I don’t really know where the library is in this building.

Table 3: Stimuli along the verbosity dimension.

Hedging negative
  Greeting
    A: How are you?
    B: Definitely good, and you?
    A: Good, thank you.
    B: How may I help you?
  Question-Answer
    A: Could you tell me where the library is?
    B: Sure, it is at the end of the hallway on your left.
  Disagreement
    A: Could you tell me where the library is?
    B: The library is on the second floor.
    A: I thought it was on the first floor.
    B: No, there is definitely no library on the first floor.
  Apology
    A: Could you tell me where the library is?
    B: Sorry, I have no idea where it is.

Hedging positive
  Greeting
    A: How are you?
    B: Good, I guess, and you?
    A: Good, thank you.
    B: Could I help you?
  Question-Answer
    A: Could you tell me where the library is?
    B: I think it is at the end of the hallway on your left.
  Disagreement
    A: Could you tell me where the library is?
    B: I think the library is on the second floor.
    A: I thought it was on the first floor.
    B: No, if I am not mistaken, there is no library on the first floor.
  Apology
    A: Could you tell me where the library is?
    B: Sorry, I am not sure.

Table 4: Stimuli along the hedging dimension.

Alignment negative
  Greeting
    A: How are you?
    B: Good morning. How may I help you?
  Question-Answer
    A: Could you tell me where the bathroom is?
    B: The restroom is at the end of the hallway on your left.
  Disagreement
    A: Could you tell me where the bathroom is?
    B: The restroom is on the second floor.
    A: I thought the bathroom was on the first floor.
    B: No, there is no restroom on the first floor.
  Apology
    A: Could you tell me where the bathroom is?
    B: Sorry, I don’t know about the restroom.

Alignment positive
  Greeting
    A: How are you?
    B: Good, how are you? How may I help you?
  Question-Answer
    A: Could you tell me where the bathroom is?
    B: The bathroom is at the end of the hallway on your left.
  Disagreement
    A: Could you tell me where the bathroom is?
    B: The bathroom is on the second floor.
    A: I thought the bathroom was on the first floor.
    B: No, there is no bathroom on the first floor.
  Apology
    A: Could you tell me where the bathroom is?
    B: Sorry, I don’t know where the bathroom is.

Table 5: Stimuli along the lexical and syntactic alignment dimension.

Formality negative
  Greeting
    A: Good morning.
    B: What’s up? Need anything?
  Question-Answer
    A: Could you tell me where the library is?
    B: Just go to the end of the hallway, you can’t miss it.
  Disagreement
    A: Could you tell me where the library is?
    B: Go to the second floor.
    A: I thought it was on the first floor.
    B: No, honey, there is none on the first floor.
  Apology
    A: Could you tell me where the library is?
    B: Sorry about that, I have no idea.

Formality positive
  Greeting
    A: Good morning.
    B: Good morning, sir (madam). Would you allow me to help you with anything?
  Question-Answer
    A: Could you tell me where the library is?
    B: Kindly follow this hallway and you will encounter the entrance on your left.
  Disagreement
    A: Could you tell me where the library is?
    B: Yes, you may find the library on the second floor.
    A: I thought it was on the first floor.
    B: I am afraid that is not correct, there is no library on the first floor.
  Apology
    A: Could you tell me where the library is?
    B: I have to apologize, but I don’t know.

Table 6: Stimuli along the formality dimension.
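The definitions of verbosity, hedging, and alignment in Section 3.1 suggest simple surface measures that can be computed on the stimuli of Tables 2–6. The sketch below is illustrative only; the stimuli themselves were created manually. The function names, the tokenizer, and the HEDGES list (seeded with the hedging examples quoted in Section 3.1 and Table 4) are assumptions introduced here, and formality is not covered.

```python
import re

# Hypothetical hedge list, seeded with examples from Section 3.1 and Table 4.
HEDGES = ["maybe", "perhaps", "i think", "i guess", "if i am not mistaken"]


def tokens(utterance):
    """Lowercase an utterance and split it into word tokens."""
    return re.findall(r"[a-z'’]+", utterance.lower())


def word_count(utterance):
    """Verbosity proxy: number of word tokens in one turn."""
    return len(tokens(utterance))


def hedge_count(utterance):
    """Hedging proxy: number of hedge expressions found in one turn."""
    text = " ".join(tokens(utterance))
    return sum(len(re.findall(r"\b" + re.escape(h) + r"\b", text)) for h in HEDGES)


def lexical_overlap(prompt, response):
    """Alignment proxy: share of distinct response words also used in the prompt."""
    prompt_words = set(tokens(prompt))
    response_words = set(tokens(response))
    if not response_words:
        return 0.0
    return len(prompt_words & response_words) / len(response_words)
```

Applied to the question-answer stimuli, word_count gives 7, 10, and 19 words for the negative, neutral, and positive verbosity answers, and lexical_overlap is higher for the positive-alignment answer ("bathroom") than for the negative-alignment one ("restroom").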
[Figure 1 image not reproduced. Eight bar-plot panels, each with score (1 to 7) on the vertical axis and dialogue act (greeting, qa, disagree, apology) on the horizontal axis: American English, formality, naturalness; Arabic, formality, naturalness; American English, formality, openness; Arabic, formality, openness; American English, formality, conscientiousness; Arabic, formality, conscientiousness; American English, hedging, agreeableness; Arabic, hedging, agreeableness.]

Figure 1: A subset of data comparing scores on the Big Five personality traits and naturalness as given by native speakers of American English (left half of the page) and Arabic (right half of the page). Blue, white, and pink bars correspond to negative, neutral, and positive valences of the linguistic features respectively. Dialogue acts listed along the horizontal axis are a greeting, question-answer pair, disagreement, and apology. Error bars show the 95% confidence intervals; brackets above the plots correspond to p-values of paired t-tests at significance levels of 0.05 (∗) and 0.01 (∗∗) after Bonferroni correction.
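The significance markers in Figure 1 follow from a Bonferroni correction of the paired t-test p-values. A minimal sketch of that step is given below; the function names are hypothetical, and the size of the family of tests over which the correction was applied is not stated in the text, so here it is simply taken to be the length of the list passed in.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def stars(p_adjusted):
    """Map an adjusted p-value to the markers used in Figure 1."""
    if p_adjusted < 0.01:
        return "**"
    if p_adjusted < 0.05:
        return "*"
    return ""
```

For example, with four tests, an uncorrected p-value of 0.02 is adjusted to 0.08 and receives no marker, whereas it would have been marked with one asterisk without the correction.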