Mammalian Value Systems

Gopal P. Sarma, School of Medicine, Emory University, Atlanta, GA, USA (gopal.sarma@emory.edu)
Nick J. Hay, Vicarious FPC, San Francisco, CA, USA (nnickhay@gmail.com)

Keywords: Friendly AI, value alignment, human values, biologically inspired AI, human-mimetic AI

Characterizing human values is a topic deeply interwoven with the sciences, humanities, political philosophy, art, and many other human endeavors. In recent years, a number of thinkers have argued that accelerating trends in computer science, cognitive science, and related disciplines foreshadow the creation of intelligent machines which meet and ultimately surpass the cognitive abilities of human beings, thereby entangling an understanding of human values with future technological development. Contemporary research accomplishments suggest that increasingly sophisticated AI systems will become widespread and responsible for managing many aspects of the modern world, from preemptively planning users’ travel schedules and logistics, to fully autonomous vehicles, to domestic robots assisting in daily living. The extrapolation of these trends has been most forcefully described in the context of a hypothetical “intelligence explosion,” in which the capabilities of an intelligent software agent would rapidly increase due to the presence of feedback loops unavailable to biological organisms. The possibility of superintelligent agents, or simply the widespread deployment of sophisticated, autonomous AI systems, highlights an important theoretical problem: the need to separate the cognitive and rational capacities of an agent from the fundamental goal structure, or value system, which constrains and guides the agent’s actions. The “value alignment problem” is to specify a goal structure for autonomous agents compatible with human values. In this brief article, we suggest that ideas from affective neuroscience and related disciplines aimed at characterizing neurological and behavioral universals in the mammalian class provide important conceptual foundations relevant to describing human values. We argue that the notion of “mammalian value systems” points to a potential avenue for fundamental research in AI safety and AI ethics.

1 Introduction

Artificial intelligence, a term coined in the 1950s at the now famous Dartmouth Conference, has come to have a widespread impact on the modern world [1, 2]. If we broaden the phrase to include all software, and in particular, software responsible for the control and operation of physical machinery, planning and operations management, or other tasks requiring sophisticated information processing, then it goes without saying that artificial intelligence has become a critical part of the infrastructure supporting modern human society. Indeed, prominent venture capitalist Marc Andreessen famously wrote that “software is eating the world,” in reference to the ubiquitous deployment of software systems across all industries and organizations, and the corresponding growth of financial investment in software companies [3].

Nonetheless, there is a fundamental gap between the abilities of the most sophisticated software-based control systems today and the capacities of a human child or even many animals. Our AI systems have yet to display the capacity for learning, creativity, independent thought, and discovery that defines human intelligence.
It is a near-consensus position, however, that at some point in the future we will be able to create software-based agents whose cognitive capacities rival those of human beings. While there is substantial variability in researchers’ forecasts about the time horizons of the critical breakthroughs and the consequences of achieving human-level artificial intelligence, there is little disagreement that it is an attainable milestone [4, 5].¹

¹ There have been a number of prominent thinkers who have expressed strongly conservative viewpoints about AI timelines. See, for example, commentaries by David Deutsch, Rodney Brooks, and Douglas Hofstadter [6–8].

Some have argued that the creation of human-level artificial intelligence would be followed by an “intelligence explosion,” whereby the intelligence of the software-based system would rapidly increase due to its ability to analyze, model, and improve its cognition by rewriting its codebase, a feat of self-improvement impossible for biological organisms. The net result would be a “superintelligence,” that is, an agent whose fundamental cognitive abilities vastly exceed our own [9–12]. To be more explicit, let us consider a superintelligence to be any agent which can surpass the sum total of human cognitive and emotional abilities. These abilities might include intellectual tasks such as mathematical or scientific research, artistic invention in musical composition or poetry, political philosophy and the crafting of public policy, or social skills and the ability to recognize and respond to human emotions. Many commentators in recent years and decades have predicted that convergent advances in computer science, robotics, and related disciplines will give rise to the development of superintelligent machines during the 21st century [4].

If it is possible to create a superintelligence, then a number of natural questions arise: What would such an agent choose to do? What are the constraints that would guide its actions, and to what degree can these actions be shaped by the designers? If a superintelligence can reason about and influence the world to a substantially greater degree than human beings themselves, how can we design a system to be compatible with human values? Is it even possible to formalize the notion of human values? Are human values a monolithic, internally consistent entity, or are there intrinsic conflicts and contradictions between the values of individuals and between the value systems of different cultures? [9, 12–16]

It is our belief that the value alignment problem is of fundamental importance, both for its relevance to near-term developments likely to be realized by the computer and robotics industries, and for the longer-term possibility of more sophisticated AI systems leading to superintelligence. Furthermore, the broader set of problems posed by the realization of intelligent, autonomous, software-based agents may provide an important unifying framework that brings together disparate areas of inquiry spanning computer science, cognitive science, philosophy of mind, behavioral neuroscience, and anthropology, to name just a few.

In this article, we set aside the question of how, when, and if AI systems will be developed that are of sufficient sophistication to require a solution to the value alignment problem. This is a substantial topic in its own right which has been analyzed elsewhere.
We assume the feasibility of these systems as a starting point for further analysis of the goal structures of autonomous agents, and propose the notion of “mammalian value systems” as providing a framework for further research.

2 Goal Structures for Autonomous Agents

2.1 The Orthogonality Thesis

The starting point for discussing AI goal structures is the observation that the cognitive capacities of an intelligent agent are independent of the goal structure that constrains or guides the agent’s actions, what Bostrom calls the “orthogonality thesis”:

“We have seen that a superintelligence could have a great ability to shape the future according to its goals. But what will its goals be? What is the relation between intelligence and motivation in an artificial agent? Here we develop two theses. The orthogonality thesis holds (with some caveats) that intelligence and final goals are independent variables: any level of intelligence could be combined with any final goal. The instrumental convergence thesis holds that superintelligent agents having any of a wide range of final goals will nevertheless pursue similar intermediary goals because they have common instrumental reasons to do so. Taken together, these theses help us to think about what a superintelligent agent would do.” [9]

The orthogonality thesis allows us to illustrate the importance of autonomous agents being guided by human-compatible goal structures, whether they are truly superintelligent as Bostrom envisions, or merely the modestly intelligent but highly sophisticated AI systems likely to be developed in industry in the future. Consider the example of a domestic robot that is able to clean the house, monitor a security system, and prepare meals independently and without human intervention. A robot with a slightly incorrect or inadequately specified goal structure might correctly infer that a household pet has high nutritional value to its owners, but fail to recognize its social and emotional relationship to the family. We can easily imagine the consequences for companies involved in creating domestic robots if a family dog or cat ends up on the dinner plate [14]. Although such a scenario is unlikely to occur without some amount of warning²—we may notice odd or annoying behavior in the robot in other tasks, for example—it highlights an important nuance about value alignment. For example, the exact difference between animals that we value for their emotional role in our lives and those that many have deemed ethically acceptable for food is far from obvious. Indeed, for someone who lives on a farm, the line can be blurred, and some creatures may play both roles.

² What exactly counts as sufficient warning, and whether the warning is heeded or not, is another matter.

As the intelligent capabilities of an agent grow, the consequences of slight deviations from human values will become greatly magnified. The reason is that such an agent possesses an increasing capacity to achieve its goals, however arbitrary those goals might be. It is for this reason that researchers concerned with the value alignment problem have distanced themselves from the fictitious and absurd scenarios portrayed in Hollywood thrillers. These movies often depict outright malevolent agents whose explicit aim is to destroy or enslave humanity. What is implicit in these stories is a goal structure that has been explicitly defined to be in opposition to human values.
But as the simple example of the domestic robot illustrates, this is hardly the risk we face with sophisticated AI systems. The true risk is that if we incorrectly or inadequately specify the goals of a sufficiently capable agent, it will devote its cognitive capacities to a task that is at odds with our values in ways that may be subtle or even bizarre. In the example given above, there was no malevolence or ulterior motive behind the robot making a nutritious meal out of the household pet. Rather, it simply did not recognize—due to the failure of its human designers—that the pet was valued by its owners not for nutritional reasons, but rather for social and emotional ones [13, 14].

2.2 Anthropomorphic Bias Versus Anthropomorphic Design

Before proceeding, we mention an important caveat with regard to the orthogonality thesis, namely, that it is not a free orthogonality. The particular goal structure of an agent will almost certainly constrain the cognitive capabilities required for the agent to operate. In other words, the orthogonality thesis does not suggest that one can pair an arbitrary set of algorithms with an arbitrary goal structure. For instance, if we are building an AI system to process a large number of photographs and videos so that families can efficiently find their most memorable moments amidst terabytes of data, we know that the underlying algorithms will be those from computer vision and not computer algebra.

The primary takeaway from the orthogonality thesis is that when reasoning about intelligence in the abstract, we should not assume that any particular goal structure is implied. In particular, there is no reason to believe that an arbitrary AI system having the cognitive capacity of humans will necessarily have a goal structure compatible with or in opposition to that of humans. It may very well be completely arbitrary from the perspective of human values.

This observation about the orthogonality thesis brings to light an important point with regard to AI goal structures, namely the difference between anthropomorphic bias and anthropomorphic design. Anthropomorphic bias refers to the default assumption that an arbitrary AI system will behave in a manner possessing commonalities with human beings. In practice, instances of anthropomorphic bias almost always go hand in hand with the assumption of malevolent intentions on the part of an AI system—recall our previous dismissal of Hollywood thrillers depicting agents intent on destroying or enslaving humanity. On the other hand, it may very well be the case, perhaps even necessary, that solving the value alignment problem requires us to build a specific AI system that possesses important commonalities with the human mind. This latter perspective is what we refer to as anthropomorphic design.³

³ Anthropomorphic design refers to a narrower class of systems than the term “human-compatible AI,” which has recently come into use. See, for example, the Berkeley Center for Human-Compatible AI.

2.3 Inferring Human-Compatible Value Systems

An emerging train of thought among AI safety researchers is that a human-compatible goal structure will have to be inferred by the AI system itself, rather than pre-programmed by its designers. The reason is that human values are rich and complex, and in addition, often contradictory and conflicting. Therefore, if we incorrectly specify what we believe to be a safe goal structure, even slight deviations can be magnified and lead to detrimental consequences.
On the other hand, if an AI system begins with an uncertain model of human values, and then learns our values by observing our behavior, we can substantially reduce the risks of a misspecified goal structure. Furthermore, just as we are more likely to trust mathematical calculations performed by a computer than by humans, if we build an AI system that we know to have greater capacity than ourselves at performing those cognitive operations required to infer the values of other agents from their behavior, then we gain the additional benefit of knowing that these operations will be performed with greater certainty and accuracy than if they were pre-programmed by human AI researchers.

There is context in contemporary research for this kind of indirect inference, such as Inverse Reinforcement Learning (IRL) [17, 18] and Bayesian Inverse Planning (BIP) [19]. In these approaches, an agent learns the values, or utility function, of another agent—whether a human, an animal, or a software system—by observing its behavior. While these ideas are in their nascent stages, practical techniques have already been developed for designing AI systems [20–23].
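To make the flavor of these approaches concrete, the following minimal sketch maintains a Bayesian posterior over candidate value systems and updates it as choices are observed. It is our own toy illustration rather than any of the cited methods: the outcomes, the three hypotheses, and the Boltzmann-rational choice model are all invented for exposition.

```python
# A minimal sketch of Bayesian indirect inference of values, in the
# spirit of (but far simpler than) [17-19]. The toy outcomes and
# hypotheses are invented for exposition; real IRL systems operate
# over full Markov decision processes.
import math

# Outcomes an observed agent can choose among.
OUTCOMES = ["feed_pet", "cook_pet", "play_with_pet"]

# Candidate "value systems": each assigns a reward to every outcome.
HYPOTHESES = {
    "pet_is_food":   {"feed_pet": 0.0, "cook_pet": 1.0,  "play_with_pet": 0.0},
    "pet_is_family": {"feed_pet": 1.0, "cook_pet": -5.0, "play_with_pet": 1.0},
    "indifferent":   {"feed_pet": 0.0, "cook_pet": 0.0,  "play_with_pet": 0.0},
}

def choice_likelihood(outcome, reward, beta=3.0):
    """Probability a Boltzmann-rational agent picks `outcome` under `reward`."""
    weights = {o: math.exp(beta * reward[o]) for o in OUTCOMES}
    return weights[outcome] / sum(weights.values())

def update(posterior, observed_outcome):
    """One Bayesian update of the posterior over reward hypotheses."""
    new = {h: p * choice_likelihood(observed_outcome, HYPOTHESES[h])
           for h, p in posterior.items()}
    z = sum(new.values())
    return {h: p / z for h, p in new.items()}

# Start maximally uncertain about which value system is correct...
posterior = {h: 1.0 / len(HYPOTHESES) for h in HYPOTHESES}

# ...then refine the model by observing the other agent's choices.
for obs in ["feed_pet", "play_with_pet", "feed_pet"]:
    posterior = update(posterior, obs)

for h, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{h}: {p:.3f}")  # most of the mass shifts to "pet_is_family"
```

Even in this toy setting, the learner never commits to full certainty; it merely concentrates probability mass on the hypotheses consistent with observed behavior, which echoes the spirit of the principles stated below.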
Russell summarizes the notion of indirect inference of human values by stating three principles that should guide the development of AI systems [14]:

1. The machine’s purpose must be to maximize the realization of human values. In particular, it has no purpose of its own and no innate desire to protect itself.

2. The machine must be initially uncertain about what those human values are. The machine may learn more about human values as it goes along, but it may never achieve complete certainty.

3. The machine must be able to learn about human values by observing the choices that we humans make.

There are almost certainly many conceptual and practical obstacles ahead in designing a system that infers the values of human beings by observing our behavior. In particular, human desires can be masked by many layers of conflicting emotions, they can be inconsistent, and the desires of one individual may outright contradict the desires of another. In the context of a superintelligent agent capable of exerting substantial influence on the world (as opposed to a domestic robot), it is natural to ask about variations in the value systems of different cultures. It is often assumed that many human conflicts on a global scale stem from conflicts in the underlying value systems of the respective cultures or nation-states. Is it even possible, therefore, for an AI system, no matter how intelligent, to arrive at a consensus goal structure that respects the desires of all people and cultures?

We make two observations in response to this important set of questions. The first is that when we say that cultures have conflicting values, implicit in this statement are our own limited cognitive capacities and limited ability to model the behavior and mental states of other individuals and groups. An AI system with capabilities vastly greater than our own may quickly perceive fundamental commonalities and avenues for conflict resolution that we are unable to envision. To motivate this scenario, we give a highly simplified example from negotiation theory. A method known as “principled negotiation” distinguishes between values and positions [24]. As an example, if two friends are deciding on a restaurant for dinner, and one wants Indian food and the other Italian, it may be that the first person simply likes spicy food and the second person wants noodles. These preferences are the values, spicy food and noodles, that the corresponding positions, Indian and Italian, instantiate. In this school of thought, when two parties are attempting to resolve a conflict, they should negotiate from values rather than positions. That is, if we have some desire that is in conflict with another’s, we should ask ourselves—whether in the context of a business negotiation, a family dispute, or a major international conflict—what underlying value the desire reflects. By understanding the underlying values, we may see that there is a mutually satisfactory set of outcomes that we failed to see initially. In this particular instance, if the friends are able to state their true underlying preferences, they may recognize that Thai cuisine will satisfy both parties.

We mention this example from negotiation theory to raise the possibility that what we perceive to be fundamentally conflicting values in human society might actually be conflicting positions arising from distinct, but reconcilable, values when viewed from the perspective of a higher level of intelligence.
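The values-versus-positions distinction is simple enough to render in code. The sketch below, entirely our own illustration with invented names and menus, shows how a conflict that is irresolvable at the level of positions dissolves at the level of underlying values:

```python
# A toy rendering of the restaurant example. All data here is
# invented for illustration.

# Stated positions (apparently incompatible).
positions = {"alice": "indian", "bob": "italian"}

# Underlying values each position was meant to serve.
values = {"alice": {"spicy"}, "bob": {"noodles"}}

# What each available option actually offers.
options = {
    "indian":  {"spicy"},
    "italian": {"noodles"},
    "thai":    {"spicy", "noodles"},
}

# Negotiating over positions finds no common ground...
common_positions = {positions["alice"]} & {positions["bob"]}
print(common_positions)  # set()

# ...but negotiating over values does: find options satisfying everyone.
needed = values["alice"] | values["bob"]
resolutions = [o for o, offers in options.items() if needed <= offers]
print(resolutions)  # ['thai']
```

The point of the sketch is only that the resolution is invisible to any procedure that compares positions alone; it becomes computable once the underlying values are represented explicitly.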
The second observation is that what we colloquially refer to as the values of a particular culture, or even collective human values, reflects not only innate features of the human mind, but also the development of human society. In other words, to understand the underlying value system that guides human behavior, which would ultimately need to be modeled and inferred by an AI system, it may be helpful to disentangle those aspects of modern cultural values which were latent, but not explicitly evident, during earlier periods of human history.

Although an agent utilizing Inverse Reinforcement Learning or Bayesian Inverse Planning will learn and refine its model of human values by observing our behavior, it must begin with some very rough or approximate initial assumptions about the nature of the values it is trying to learn. By starting from a more accurate initial goal structure, an agent might learn from fewer examples, thus minimizing the likelihood of real-world actions having adverse effects. In the remainder of this article, we argue that the neurological substrate common to mammals and their corresponding behaviors may provide a framework for characterizing the structure of the initially uncertain value system of an autonomous, intelligent agent.

2.4 Mammalian Value Systems

Our core thesis is the following: what we call human values can be informally decomposed into 1) mammalian values, 2) human cognition, and 3) several millennia of human social and cultural evolution. This decomposition suggests that contemporary research broadly spanning the study of animal behavior, biological anthropology, and comparative neuroanatomy may be relevant to the value alignment problem, and in particular, to characterizing the initially uncertain goal structure which is refined through observation by an AI system. Additionally, in analyzing the subsequent behavioral trajectories of intelligent, autonomous agents, we can describe the resulting dynamics as being guided by mammalian values merged with AI cognition. Aspects of contemporary human values which are the result of incidental historical processes—the third component of our decomposition above—might naturally arise in the course of the evolution of the AI system (though not necessarily), even though they were not directly programmed into the agent.⁴ There are many factors that might influence the extent to which this third component of human values continues to be represented in the AI system. Examples include whether or not these values remain meaningful in a world where other problems have been solved, and the extent to which certain cultural values which were perceived to be in conflict with others could be reconciled through a more fundamental understanding stemming from the combination of mammalian values and AI cognition.⁵

⁴ Many human values communicated to children during the course of maturation and development are the result of incidental historical processes. As an example, consider the rich set of cultural norms and social rituals surrounding food preparation. One does not need to have lived the entire history of a given culture to learn these norms. The same may be true of an AI system.

⁵ Ethical norms can often vary depending on resource constraints, which may themselves be the result of incidental historical processes. The norms of behavior may be different in a war zone, where individuals are fighting for survival, than in an affluent society during peacetime. If a family struggling to survive in a war-torn country is able to escape and move to a more stable region, these same behaviors may no longer be necessary. In a similar vein, imagine an AI system that has significantly impacted global affairs by solving major problems in food or energy production, or by discovering novel insights into diplomatic strategy. Such an agent may find that previously necessary behaviors with a rich human history are no longer needed.

We want to emphasize that our claim is not that mammalian values are synonymous with human values. Rather, our thesis is that there are many aspects of human values which are the result of historical processes driven by human cognition. Consequently, many structural aspects of human experience and human society which we colloquially refer to as “values” are derived entities, rather than features of the initial AI goal structure. As a thought experiment, consider a scenario whereby the fully digitized corpus of human literature, cinema, and ongoing global developments communicated via the Internet is analyzed and modeled by an AI system constructed around a core mammalian goal structure. In the conceptual framework that we propose, this initially mammalian structure would gradually come to reflect the more nuanced aspects of human society as the AI refines its model of human values via analysis and hypothesis generation. We also mention that, as our aim in this article is to focus on the structure of the initial AI motivational system and not on other aspects of AI more broadly, we set aside the possible role human interaction and feedback may play in the subsequent development of the AI system’s cognition and instrumental values.

2.4.1 Neural Correlates of Values: Behavioral and Neurological Foundations

Our thesis about mammalian values is predicated on two converging lines of evidence, one primarily behavioral and the other primarily neuroscientific. Behaviorally, it is not difficult to characterize intuitively what human values are when viewed from the perspective of the mammalian class.
Like many other animals, humans are social creatures, and many, if not most, of our fundamental drives originate in our relationships with others. Attachment, loss, anger, territoriality, playfulness, joy, anxiety, and love are all deeply rooted emotions that guide our behavior and have been foundational elements in the emergence of human cognition, culture, and the structure of society⁶ [25–36].

⁶ While we have mentioned several active areas of research, there are certainly others that we are simply not aware of. We apologize in advance to those scholars whose work we have not cited here.

The scientific study of behavior is largely the domain of the disciplines of ethology and behaviorism. As we are primarily concerned with emotions, we will focus on behavioral insights and taxonomies originating from the sub-community of affective neuroscience, which also aims to correlate these behaviors with the underlying neural architecture. More formally, Panksepp and Biven categorize the informal list given above into seven motivational and emotional systems that are common to mammals: seeking, rage, fear, lust, care, panic/grief, and play [37]. We now give brief summaries of each of these systems; a short code sketch following the list illustrates how such a taxonomy might be made machine-usable.

1. SEEKING: This is the system that primarily mediates exploratory behavior and also enables the other systems. The seeking system can give rise to both positive and negative emotions. For instance, a mother who needs to feed her offspring will go in search of food, and the resulting maternal/child bonding (via the CARE system; see below) creates positive emotional reinforcement. On the other hand, physical threats can generate negative emotions and prompt an animal to seek shelter and safety. The behaviors corresponding to SEEKING have been broadly associated with the dopaminergic systems of the brain, specifically regions interconnected with the ventral tegmental area and nucleus accumbens.

2. RAGE: The behaviors corresponding to rage are targeted and more narrowly focused than those governed by the seeking system. Rage compels animals towards specific threats and is generally accompanied by negative emotions. However, it should be noted that in an adversarial scenario where rage can lead to victory, it can also be accompanied by the positive emotions of triumph or glory. The RAGE system involves medial regions of the amygdala, medial regions of the hypothalamus, and the periaqueductal gray.

3. FEAR: The two systems described thus far are directly linked to externally directed, action-oriented behavior. In contrast, fear describes a system which places an animal in a negative affective state, one which it would prefer not to be in. In the early stages, fear tends to correspond to stationary states, after which it can transition to seeking or rage, and ultimately, to attempts to flee from the offending stimulus. However, these are secondary effects, and the primary physical state of fear is typically considered to be an immobile one. The FEAR system involves central regions of the amygdala, anterior and medial regions of the hypothalamus, and dorsal regions of the periaqueductal gray.

4. LUST: Lust describes the system leading to behaviors of courtship and reproduction. Like fear, it will tend to trigger the seeking system, but it can also lead to negative affective states if satisfaction is not achieved. The LUST system involves anterior and ventromedial regions of the hypothalamus.
5. CARE: Care refers to acts of tenderness directed towards loved ones, and in particular, an animal’s offspring. As we described in the context of seeking, the feelings associated with caring and nurturing can be profoundly positive and play a crucial role in the social behavior of mammals. CARE is associated with the ventromedial hypothalamus and the oxytocin system.

6. PANIC/GRIEF: Activation of the panic/grief system corresponds to profound psychological pain, and is generally not associated with external physical causes. In young animals, this system is typically activated by separation from caregivers, and is the underlying network behind “separation anxiety.” Like care, the panic/grief system is a fundamental component of mammalian social behavior. It is the negative affective system which drives animals towards relationships with other animals, thereby stimulating the care system, generating feelings of love and affection, and giving rise to social bonding. This system is associated with the periaqueductal gray, ventral septal area, and anterior cingulate.

7. PLAY: The play system corresponds to light-hearted behavior in younger animals and is a key component of social bonding, friendship, and the learning of survival-oriented skills. Although play can superficially resemble aggression, there are fundamental differences between play and adult aggression. At an emotional level, it goes without saying that play corresponds to positive affective states and, unlike aggressive behavior, is typically part of a larger, orchestrated sequence of events. In play, for example, animals often alternate between assuming dominant and submissive roles. The PLAY system is currently less neuroanatomically localized, but involves midline thalamic regions.
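To connect this taxonomy back to the value alignment problem, the sketch below shows one hypothetical way it could seed the initially uncertain goal structure discussed in Section 2.3. The ValuePrimitive structure, its fields, and all numerical weights are our own illustrative placeholders; they are not constructs from affective neuroscience or from any existing value-learning system.

```python
# One hypothetical way to render Panksepp and Biven's taxonomy [37] as a
# machine-usable structure: each primary system becomes a value primitive
# with an affective sign and an uncertain weight to be refined by
# observation. Fields and numbers are illustrative placeholders only.
from dataclasses import dataclass

@dataclass
class ValuePrimitive:
    name: str
    valence: str        # typical affective sign: "positive", "negative", "mixed"
    social: bool        # central to mammalian social bonding?
    weight_mean: float  # prior mean of its importance (arbitrary units)
    weight_sd: float    # prior uncertainty, to be narrowed by value learning

MAMMALIAN_PRIOR = [
    ValuePrimitive("SEEKING",     "mixed",    False, 1.0, 0.5),
    ValuePrimitive("RAGE",        "negative", False, 1.0, 0.5),
    ValuePrimitive("FEAR",        "negative", False, 1.0, 0.5),
    ValuePrimitive("LUST",        "mixed",    False, 1.0, 0.5),
    ValuePrimitive("CARE",        "positive", True,  1.0, 0.5),
    ValuePrimitive("PANIC_GRIEF", "negative", True,  1.0, 0.5),
    ValuePrimitive("PLAY",        "positive", True,  1.0, 0.5),
]

# A value learner in the sense of Section 2.3 would treat these entries as
# the support of its initially uncertain goal structure, shrinking each
# weight_sd as observed behavior accumulates.
social_core = [p.name for p in MAMMALIAN_PRIOR if p.social]
print(social_core)  # ['CARE', 'PANIC_GRIEF', 'PLAY']
```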
As we stated earlier, our thesis about mammalian values originates from two convergent lines of evidence, one behavioral and the other neuroscientific. What we refer to as the “neural correlates of values,” or NCV, are the common mammalian neural structures which underlie the motivational and emotional systems summarized above. To the extent that human values are intertwined with our emotions, these architectural commonalities suggest that the shared mammalian neurological substrate is important for understanding human value alignment in sophisticated learning systems. Panksepp and Biven write,

“To the best of our knowledge, the basic biological values of all mammalian brains were built upon the same basic plan, laid out in . . . affective circuits that are concentrated in subcortical regions, far below the neocortical ‘thinking cap’ that is so highly developed in humans. Mental life would be impossible without this foundation. There, among the ancestral brain networks that we share with other mammals, a few ounces of brain tissue constitute the bedrock of our emotional lives, generating the many primal ways in which we can feel emotionally good or bad within ourselves. As we mature and learn about ourselves, and the world in which we live, these systems provide a solid foundation for further mental developments.” [37]

Latent in this excerpt is the decomposition that we have suggested earlier. The separation of the mammalian brain into subcortical and neocortical regions, roughly corresponding to emotions and cognition respectively, implies that we can attempt to reason by analogy about what the architecture of an AI system with a human-compatible value system would look like. In particular, the initially uncertain goal structure that the AI system refines via observation may be much simpler than we might imagine by reflecting on the complexities of human society and individual desires. As we have illustrated using our simple example from negotiation theory, our intuitive understanding of human values, and the conflicts that we regularly witness between individuals and groups, may in fact represent conflicting positions stemming from a shared fundamental value system, a value system that originates in the subcortical regions of the brain and which other mammals share with us.⁷

⁷ There is a contemporary and light-hearted social phenomenon which provides an evocative illustration of the universality of mammalian emotions, namely, the volume of animal videos posted to YouTube. From ordinary citizens with pets, to clips from nature documentaries, animal videos are regularly watched by millions of viewers worldwide. Individual videos and compilations of “animal odd couples,” “unlikely animal friends,” “dogs and babies,” and “animal friendship between different species” are commonly searched enough to be auto-completed by YouTube’s search capabilities. It is hardly surprising that these charming and heart-warming videos are so compelling to viewers of all age groups, genders, and ethnic backgrounds. Our relationships with other animals, whether home owners and their pets, or scientists and the wild animals that they study, tell us something deeply fundamental about ourselves [38]. The strong emotional bonds that humans form with other animals, in particular with our direct relatives in the mammalian class, and the draw of simply watching this social behavior in other mammals, are a vivid illustration of the fundamental role that emotions play in our inner life and in guiding our behavior. In the future, the potential to apply inverse reinforcement learning (or related techniques) to large datasets of videos, including short clips from YouTube, movies, TV shows, documentaries, etc., opens up an interesting avenue to evaluate and further refine the hypothesis presented here. For instance, when such technology becomes available, we might imagine comparing the inferred goal structures when restricted to videos of human behavior versus those restricted to mammalian behavior. There are many other variations along these lines, for instance, restricting to videos of non-mammalian behavior, mammals as well as humans, different cultures, etc.

Referring once again to the work of Panksepp,

“In short, many of the ancient, evolutionarily derived brain systems all mammals share still serve as the foundations for the deeply experienced affective proclivities of the human mind. Such ancient brain functions evolved long before the emergence of the human neocortex with its vast cognitive skills. Among living species, there is certainly more evolutionary divergence in higher cortical abilities than in subcortical ones.” [39]

The emphasis on the diversity of higher cortical abilities is of particular relevance to the decomposition that we have proposed. We might ask what the full spectrum of higher cortical abilities is that could be built on top of the common mammalian substrate provided by the evolutionarily
older parts of the brain. We need not confine ourselves to those manifestations of higher cognition that we see in nature, or that would even be hypothetical consequences of continued evolution by natural selection. Indeed, one restatement of our core thesis is to consider—in the abstract or as a thought experiment—the consequences of extending the diversity of brain architectures to include higher cortical abilities arising not from natural selection, but rather from the de novo architectures of artificial intelligence.

2.5 Relationship to Moral Philosophy

It is hardly a surprise that a vibrant area of research within AI safety is the relationship of contemporary and historical theories of moral philosophy to the problem of value alignment. Indeed, researchers have specifically argued for the relevance of moral philosophy in the context of the inverse reinforcement learning (IRL) paradigm that is the starting point for the analysis in this article [40]. Is the framework we propose in opposition to those that are oriented towards moral philosophy? On the one hand, our perspective is that the field of AI safety is simply too young to make such judgments. At our present level of understanding, we believe each of these agendas forms a solid foundation for further research, and there seems little reason to pursue one to the exclusion of the other. On the other hand, we would also argue that this distinction is a false dichotomy. Indeed, there are active areas of research in the ethics community aimed at understanding the neurological and cognitive underpinnings of human moral reasoning [41, 42]. Therefore, it is quite possible that a hybrid approach to value alignment emerges, bridging the “value primitives” perspective we advocate here with research from moral philosophy.⁸

⁸ In a recent article, Baum has argued that the normative basis for “social choice” and “bottom-up” approaches to AI ethics must overcome strong obstacles that have been insufficiently explored by the AI safety community [43]. Although the approach we describe here decomposes values into more fundamental components, it is not a priori in opposition to top-down ethics. In an extreme case, one could certainly imagine employing a purely predetermined approach to ethics within the context of mammalian values in which no value learning takes place. However, as we stated above, we suspect that an intermediate ground will be found when the issues are more thoroughly examined, and for that reason, we are reluctant to endorse either a bottom-up or a top-down approach too strongly. Given the intellectual youth of the field of AI safety, we see little reason to give strong preference to one set of approaches over the other. Moreover, an important observation that Baum makes in framing his argument is that considerable work relevant to AI ethics already exists in the social choice literature, and yet none of this work has been discussed in any detail by the AI safety community. In our minds, this is a more fundamental point, namely, that there is substantial scholarship in many areas of academic research relevant to AI safety. For this reason, we believe that where there is controversy, the first step should be to ensure that the best possible representations of given viewpoints have been made visible and adequately discussed before endorsing particular courses of action.

3 Discussion

The possibility of autonomous, software-based agents, whether self-driving cars, domestic robots, or the longer-term possibility of superintelligence, highlights an important theoretical problem—the need to separate the intelligent capabilities of such a system from the fundamental values which guide the agent’s actions. For such an agent to exist in a human world and to act in a manner compatible with human values, these values would need to be explicitly modeled and formalized. An emerging train of thought in AI safety research is that this modeling process would need to be conducted by the AI system itself, rather than by the system’s designers. In other words, the agent would start off with an initially uncertain goal structure and infer human values over time by observing our behavior.
The question that motivates this article is the following: what can we say about the broad features of the initial goal structure that the agent then refines through observation and hypothesis generation? The perspective we advocate is to view human values within the context of the broader mammalian class, thereby providing implicit priors on the latent structure of the values we aim to infer. The shared neurological structures underlying mammalian emotions and their corresponding social behaviors provide a starting point for formalizing an initial value system for autonomous, software-based agents.

There are several practical implications of having a more detailed understanding of the structure of human values. By having more detailed prior information, it may be possible to learn from fewer examples. For an agent that is actively making decisions and having an impact on the world, learning an ethical framework more efficiently can minimize potential catastrophes. Furthermore, an informative prior may turn approaches to AI safety which are otherwise computationally intractable into practical options.
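The sample-efficiency point admits a simple illustration. In the toy calculation below, which is our own and uses invented priors, an invented behavioral stream, and an arbitrary confidence criterion, two Bayesian learners estimate the same behavioral tendency; the one that starts from an informative prior reaches a tight posterior with roughly half the observations. The caveat, of course, is that an informative prior helps only insofar as it is accurate.

```python
# A toy illustration of the sample-efficiency claim: two Beta-Bernoulli
# learners estimate how often an observed agent prefers a prosocial
# outcome. One starts from a flat prior, the other from an informative
# prior (standing in for a "mammalian" initial goal structure). All
# numbers are invented for illustration.
from itertools import cycle, islice
from math import sqrt

def beta_sd(a, b):
    """Standard deviation of a Beta(a, b) posterior."""
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

def observations_until_confident(a, b, stream, max_sd=0.08):
    """Count observations until the posterior is tight (sd <= max_sd)."""
    for n, obs in enumerate(stream, start=1):
        a, b = a + obs, b + (1 - obs)
        if beta_sd(a, b) <= max_sd:
            return n
    return None

# Observed behavior: the agent acts prosocially 4 times out of 5.
behavior = list(islice(cycle([1, 1, 1, 1, 0]), 100))

flat = observations_until_confident(1, 1, behavior)         # Beta(1, 1) prior
informative = observations_until_confident(8, 2, behavior)  # Beta(8, 2) prior
print(f"flat prior:        {flat} observations")         # 23
print(f"informative prior: {informative} observations")  # 13
```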
From this vantage point, we argue that what we colloquially refer to as human values can be informally decomposed into 1) mammalian values, 2) human cognition, and 3) several millennia of human social and cultural evolution. In the context of a de novo artificially intelligent agent, we can characterize desirable, human-compatible behavior as mammalian values merged with AI cognition. It goes without saying that we have left out a considerable amount of detail in this description. The specifics of Inverse Reinforcement Learning, the many neuroscientific nuances underlying the comparative neuroanatomy, physiology, and function of the mammalian brain, as well as the controversies and competing theories in the respective disciplines, are all substantial topics in their own right. Our omission of these issues is not out of a lack of recognition or a belief that they are unimportant. Rather, our aim in this article has been to present a high-level overview of a richly interdisciplinary set of questions whose broad outlines have only recently begun to take shape. We will tackle these issues and others in a subsequent series of manuscripts and invite interested researchers to join us. Our fundamental motivation in proposing this framework is to bring together scholars from diverse communities who may not be aware of each other’s research and its potential for synergy. We believe that there is a wealth of existing research which can be fruitfully re-examined and re-conceptualized from the perspective of artificial intelligence and the value alignment problem. We hope that additional interaction between these communities will help to refine and more precisely define research problems relevant to designing safe AI goal structures.

Acknowledgements

We would like to thank Adam Safron, Owain Evans, Daniel Dewey, Miles Brundage, and several anonymous reviewers for insightful discussions and feedback on the manuscript. We would also like to thank the guest editors of Informatica, Ryan Carey, Matthijs Maas, Nell Watson, and Roman Yampolskiy, for organizing this special issue.

References

[1] S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Upper Saddle River, NJ, USA: Prentice Hall Press, 3rd ed., 2009.

[2] N. J. Nilsson, The Quest for Artificial Intelligence. Cambridge University Press, 2009.

[3] M. Andreessen, “Why Software Is Eating The World,” Wall Street Journal, vol. 20, 2011.

[4] V. C. Müller and N. Bostrom, “Future Progress in Artificial Intelligence: A Survey of Expert Opinion,” in Fundamental Issues of Artificial Intelligence, pp. 553–570, Springer, 2016.

[5] K. Grace, J. Salvatier, A. Dafoe, B. Zhang, and O. Evans, “When Will AI Exceed Human Performance? Evidence from AI Experts,” arXiv e-prints, May 2017.

[6] R. Brooks, “The Seven Deadly Sins of AI Predictions,” MIT Technology Review, vol. 10, no. 6, 2017.

[7] D. Deutsch, “How Close Are We to Creating Artificial Intelligence?,” AEON Magazine, vol. 10, no. 3, 2012.

[8] J. Somers, “The Man Who Would Teach Machines to Think,” The Atlantic, vol. 11, 2013.

[9] N. Bostrom, Superintelligence: Paths, Dangers, Strategies. OUP Oxford, 2014.

[10] M. Shanahan, The Technological Singularity. MIT Press, 2015.

[11] I. J. Good, “Speculations Concerning the First Ultraintelligent Machine,” Advances in Computers, vol. 6, no. 99, pp. 31–83, 1965.

[12] D. Chalmers, “The Singularity: A Philosophical Analysis,” Journal of Consciousness Studies, vol. 17, no. 9–10, pp. 7–65, 2010.

[13] E. Yudkowsky, “Artificial Intelligence as a Positive and Negative Factor in Global Risk,” in Global Catastrophic Risks (N. Bostrom and M. Cirkovic, eds.), p. 303, Oxford University Press, Oxford, UK, 2008.

[14] S. Russell, “Should We Fear Supersmart Robots?,” Scientific American, vol. 314, no. 6, pp. 58–59, 2016.

[15] S. Omohundro, “Autonomous Technology and the Greater Human Good,” Journal of Experimental & Theoretical Artificial Intelligence, vol. 26, no. 3, pp. 303–315, 2014.

[16] S. M. Omohundro, “The Basic AI Drives,” in AGI, vol. 171, pp. 483–492, 2008.

[17] A. Y. Ng and S. J. Russell, “Algorithms for Inverse Reinforcement Learning,” in International Conference on Machine Learning, pp. 663–670, 2000.

[18] D. Hadfield-Menell, A. Dragan, P. Abbeel, and S. Russell, “Cooperative Inverse Reinforcement Learning,” 2016.

[19] C. L. Baker, R. R. Saxe, and J. B. Tenenbaum, “Bayesian Theory of Mind: Modeling Joint Belief-Desire Attribution,” in Proceedings of the Thirty-Second Annual Conference of the Cognitive Science Society, pp. 2469–2474, 2011.

[20] O. Evans, A. Stuhlmüller, and N. D.
Goodman, “Learning the Preferences of Ignorant, Inconsistent Agents,” arXiv:1512.05832, 2015.

[21] O. Evans and N. D. Goodman, “Learning the Preferences of Bounded Agents,” in NIPS Workshop on Bounded Optimality, 2015.

[22] M. O. Riedl and B. Harrison, “Using Stories to Teach Human Values to Artificial Agents,” in Proceedings of the 2nd International Workshop on AI, Ethics and Society, Phoenix, Arizona, 2016.

[23] M. O. Riedl, “Computational Narrative Intelligence: A Human-Centered Goal for Artificial Intelligence,” arXiv preprint arXiv:1602.06484, 2016.

[24] R. Fisher and W. Ury, Getting to Yes. Simon & Schuster Sound Ideas, 1987.

[25] I. Horswill, “Men Are Dogs (and Women Too),” in AAAI Fall Symposium: Naturally-Inspired Artificial Intelligence, pp. 67–71, 2008.

[26] L. W. Swanson, “Cerebral Hemisphere Regulation of Motivated Behavior,” Brain Research, vol. 886, no. 1, pp. 113–164, 2000.

[27] L. W. Swanson, Brain Architecture: Understanding the Basic Plan. Oxford University Press, 2012.

[28] J. H. Barkow, L. Cosmides, and J. Tooby, The Adapted Mind: Evolutionary Psychology and the Generation of Culture. Oxford University Press, 1995.

[29] S. Dehaene and L. Cohen, “Cultural Recycling of Cortical Maps,” Neuron, vol. 56, no. 2, pp. 384–398, 2007.

[30] C. Peterson and M. E. Seligman, Character Strengths and Virtues: A Handbook and Classification. Oxford University Press, 2004.

[31] S. Schnall, J. Haidt, G. L. Clore, and A. H. Jordan, “Disgust as Embodied Moral Judgment,” Personality and Social Psychology Bulletin, 2008.

[32] J. B. Tenenbaum, C. Kemp, T. L. Griffiths, and N. D. Goodman, “How to Grow a Mind: Statistics, Structure, and Abstraction,” Science, vol. 331, no. 6022, pp. 1279–1285, 2011.

[33] J. Bowlby, Attachment and Loss, vol. 3. Basic Books, 1980.

[34] S. W. Porges, “Orienting in a Defensive World: Mammalian Modifications of Our Evolutionary Heritage. A Polyvagal Theory,” Psychophysiology, vol. 32, no. 4, pp. 301–318, 1995.

[35] J. Cassidy, Handbook of Attachment: Theory, Research, and Clinical Applications. Rough Guides, 2002.

[36] M. Tomasello, The Cultural Origins of Human Cognition. Harvard University Press, 1999.

[37] J. Panksepp and L. Biven, The Archaeology of Mind: Neuroevolutionary Origins of Human Emotions. W. W. Norton & Company, 2012.

[38] R. List, “Why I Identify as a Mammal,” The New York Times, October 2015.

[39] J. Panksepp, Affective Neuroscience: The Foundations of Human and Animal Emotions. Oxford University Press, 1998.

[40] S. Armstrong and J. Leike, “Towards Interactive Inverse Reinforcement Learning,” in NIPS, 2016.

[41] J. Greene and J. Haidt, “How (and Where) Does Moral Judgment Work?,” Trends in Cognitive Sciences, vol. 6, no. 12, pp. 517–523, 2002.

[42] J. D. Greene, “The Cognitive Neuroscience of Moral Judgment,” The Cognitive Neurosciences, vol. 4, pp. 1–48, 2009.

[43] S. D. Baum, “Social Choice Ethics in Artificial Intelligence,” AI & Society, pp. 1–12, 2017.