creators_name: Oudeyer, Pierre-Yves
creators_name: Kaplan, Frédéric
creators_name: Hafner, Véréna
type: journalp
datestamp: 2007-04-04
lastmod: 2011-03-11 08:56:49
metadata_visibility: show
title: Intrinsic Motivation Systems for Autonomous Mental Development
ispublished: pub
subjects: comp-sci-mach-dynam-sys
subjects: dev-psy
subjects: comp-sci-art-intel
subjects: comp-sci-robot
full_text_status: public
keywords: Active learning, autonomy, behavior, complexity,
curiosity, development, developmental trajectory, epigenetic
robotics, intrinsic motivation, learning, reinforcement learning,
values.
abstract: Exploratory activities seem to be intrinsically rewarding
for children and crucial for their cognitive development.
Can a machine be endowed with such an intrinsic motivation
system? This is the question we study in this paper, presenting a number of computational systems that try to capture this drive towards novel or curious situations. After discussing related research coming from developmental psychology, neuroscience, developmental robotics, and active learning, this paper presents the mechanism of Intelligent Adaptive Curiosity, an intrinsic motivation system which pushes a robot towards situations in which it maximizes its learning progress. This drive makes the robot focus on situations which are neither too predictable nor too unpredictable, thus permitting autonomous mental development. The complexity of the robot’s activities autonomously increases and complex developmental sequences self-organize without being constructed in a supervised manner. Two experiments are presented illustrating the stage-like organization emerging with this mechanism. In one of them, a physical robot is placed on a baby play mat with objects that it can learn to manipulate. Experimental results show that the robot first spends time in situations
which are easy to learn, then shifts its attention progressively to situations of increasing difficulty, avoiding situations in which nothing can be learned. Finally, these various results are discussed in relation to more complex forms of behavioral organization and data coming from developmental psychology.

date: 2007
date_type: published
publication: IEEE Transactions on Evolutionary Computation
volume: 11
number: 6
refereed: TRUE
referencetext: [1] J. Weng, J. McClelland, A. Pentland, O. Sporns, I. Stockman, M. Sur,
and E. Thelen, “Autonomous mental development by robots and animals,”
Science, vol. 291, pp. 599–600, 2001.
[2] M. Lungarella, G. Metta, R. Pfeifer, and G. Sandini, “Developmental
robotics: A survey,” Connection Sci., vol. 15, no. 4, pp. 151–190, 2003.
[3] M. Asada, S. Noda, S. Tawaratsumida, and K. Hosoda, “Purposive
behavior acquisition on a real robot by vision-based reinforcement
learning,” Mach. Learn., vol. 23, pp. 279–303, 1996.
[4] J. Elman, “Learning and development in neural networks: The importance
of starting small,” Cognition, vol. 48, pp. 71–99, 1993.
[5] R. White, “Motivation reconsidered: The concept of competence,” Psychol.
Rev., vol. 66, pp. 297–333, 1959.
[6] E. Deci and R. Ryan, Intrinsic Motivation and Self-Determination in
Human Behavior. New York: Plenum, 1985.
[7] D. Berlyne, Conflict, Arousal and Curiosity. New York: McGraw-Hill, 1960.
[8] M. Csikszentmihalyi, Flow: The Psychology of Optimal Experience.
New York: Harper Perennial, 1991.
[9] W. Schultz, P. Dayan, and P. Montague, “A neural substrate of prediction
and reward,” Science, vol. 275, pp. 1593–1599, 1997.
[10] P. Dayan and B. Balleine, “Reward, motivation and reinforcement
learning,” Neuron, vol. 36, pp. 285–298, 2002.
[11] S. Kakade and P. Dayan, “Dopamine: Generalization and bonuses,”
Neural Netw., vol. 15, pp. 549–559, 2002.
[12] J.-C. Horvitz, “Mesolimbocortical and nigrostriatal dopamine responses
to salient non-reward events,” Neuroscience, vol. 96, no. 4,
pp. 651–656, 2000.
[13] M. Csikszentmihalyi, Creativity: Flow and the Psychology of Discovery
and Invention. New York: Harper Perennial, 1996.
[14] J. Schmidhuber, “Curious model-building control systems,” in Proc.
Int. Joint Conf. Neural Netw., Singapore, 1991, vol. 2, pp. 1458–1463.
[15] S. Thrun, “Exploration in active learning,” in Handbook of Brain
Science and Neural Networks, M. Arbib, Ed. Cambridge, MA: MIT
Press, 1995.
[16] J. Herrmann, K. Pawelzik, and T. Geisel, “Learning predictive representations,”
Neurocomputing, vol. 32–33, pp. 785–791, 2000.
[17] J. Weng, “A theory for mentally developing robots,” in Proc. 2nd Int.
Conf. Development Learn., 2002, pp. 131–140.
[18] X. Huang and J. Weng, “Novelty and reinforcement learning in the
value system of developmental robots,” in Proc. 2nd Int. Workshop
Epigenetic Robotics: Modeling Cognitive Development in Robotic
Systems, C. Prince, Y. Demiris, Y. Marom, H. Kozima, and C.
Balkenius, Eds., 2002, vol. 94, Lund University Cognitive Studies,
pp. 47–55.
[19] F. Kaplan and P.-Y. Oudeyer, “Motivational principles for visual
know-how development,” in Proc. 3rd Int. Workshop Epigenetic
Robotics: Modeling Cognitive Development in Robotic Systems, C.
Prince, L. Berthouze, H. Kozima, D. Bullock, G. Stojanov, and C.
Balkenius, Eds., 2003, vol. 101, Lund University Cognitive Studies,
pp. 73–80.
[20] J. Marshall, D. Blank, and L. Meeden, “An emergent framework for
self-motivation in developmental robotics,” in Proc. 3rd Int. Conf. Development
Learn., San Diego, CA, 2004, pp. 104–111.
[21] A. Barto, S. Singh, and N. Chentanez, “Intrinsically motivated learning
of hierarchical collections of skills,” in Proc. 3rd Int. Conf. Development
Learn., San Diego, CA, 2004, pp. 112–119.
[22] V. Fedorov, Theory of Optimal Experiment. New York, NY: Academic,
1972.
[23] D. Cohn, Z. Ghahramani, and M. Jordan, “Active learning with statistical
models,” J. Artif. Intell. Res., vol. 4, pp. 129–145, 1996.
[24] M. Hasenjager and H. Ritter, Active Learning in Neural Networks.
Berlin, Germany: Physica-Verlag GmbH, 2002, Physica-Verlag Studies
In Fuzziness and Soft Computing Series, pp. 137–169.
[25] J. Denzler and C. Brown, “Information theoretic sensor data selection
for active object recognition and state estimation,” IEEE Trans. Pattern
Anal. Mach. Intell., vol. 24, no. 2, pp. 145–157, Feb. 2002.
[26] M. Plutowski and H. White, “Selecting concise training sets from clean
data,” IEEE Trans. Neural Netw., vol. 4, no. 2, pp. 305–318, Mar. 1993.
[27] T. Watkin and A. Rau, “Selecting examples for perceptrons,” J. Physics
A: Mathematical and General, vol. 25, pp. 113–121, 1992.
[28] D. MacKay, “Information-based objective functions for active data selection,”
Neural Comput., vol. 4, pp. 590–604, 1992.
[29] M. Belue, K. Bauer, and D. Ruck, “Selecting optimal experiments for
multiple output multi-layer perceptrons,” Neural Comput., vol. 9, pp.
161–183, 1997.
[30] G. Paass and J. Kindermann, “Bayesian query construction for neural
network models,” in Advances in Neural Information Processing Systems, G.
Tesauro, D. Touretzky, and T. Leen, Eds. Cambridge, MA: MIT Press, 1995, vol. 7,
pp. 443–450.
[31] K. O. M. Hasenjager and H. Ritter, Active Learning in Self-Organizing
Maps. New York: Elsevier, 1999, pp. 57–70.
[32] D. Cohn, L. Atlas, and R. Ladner, “Improving generalization with active
learning,” Mach. Learn., vol. 15, no. 2, pp. 201–221, 1994.
[33] J. Poland and A. Zell, “Different criteria for active learning in neural
networks: A comparative study,” in Proc. 10th Eur. Symp. Artif. Neural
Netw., M. Verleysen, Ed., 2002, pp. 119–124.
[34] J. Weng, “Developmental robotics: Theory and experiments,” Int. J.
Humanoid Robotics, vol. 1, no. 2, pp. 199–236, 2004.
[35] N. Roy and A. McCallum, “Towards optimal active learning through
sampling estimation of error reduction,” in Proc. 18th Int. Conf. Mach.
Learn., 2001, pp. 441–448.
[36] R. Collobert and S. Bengio, “SVMTorch: Support vector machines for
large-scale regression problems,” J. Mach. Learn. Res., vol. 1, pp.
143–160, 2001.
[37] R. Sutton and A. Barto, Reinforcement Learning: An Introduction.
Cambridge, MA: MIT Press, 1998.
[38] C. Watkins and P. Dayan, “Q-learning,” Mach. Learn., vol. 8, pp.
279–292, 1992.
[39] K. Kaneko and I. Tsuda, Complex Systems: Chaos and Beyond.
Berlin, Germany: Springer-Verlag, 2000.
[40] O. Sporns and T. Pegors, “Information-theoretical aspects of embodied
artificial intelligence,” in Embodied Artificial Intelligence, F. Iida, R.
Pfeifer, L. Steels, and Y. Kuniyoshi, Eds. Berlin, Germany: Springer-Verlag,
2003, LNAI 3139, pp. 74–85.
[41] J. Piaget, The Origins of Intelligence in Children. New York, NY:
Norton, 1952.
[42] O. Michel, “Webots: Professional mobile robot simulation,” Int. J. Advanced
Robotic Syst., vol. 1, no. 1, pp. 39–42, 2004.
[43] J. Rekimoto and Y. Ayatsuka, “Cybercode: Designing augmented reality
environments with visual tags,” in Proc. Designing Augmented
Reality Environments, 2000, pp. 1–10.
[44] S. Schaal, C. Atkeson, and S. Vijayakumar, “Scalable techniques from
nonparametric statistics for real-time robot learning,” Appl. Intell.,
vol. 17, no. 1, pp. 49–60, 2002.
[45] E. Thelen and L. B. Smith, A Dynamic Systems Approach to the Development
of Cognition and Action. Cambridge, MA: MIT Press, 1994.
[46] R. D. Beer, “The dynamics of active categorical perception in an
evolved model agent,” Adaptive Behav., vol. 11, no. 4, pp. 209–243,
2003.
[47] S. Nolfi and J. Tani, “Extracting regularities in space and time through
a cascade of prediction networks,” Connection Sci., vol. 11, no. 2, pp.
129–152, 1999.
[48] M. Arbib, The Handbook of Brain Theory and Neural Networks.
Cambridge, MA: MIT Press, 2003.
[49] M. Minsky, “A framework for representing knowledge,” in The Psychology
of Computer Vision, P. Winston, Ed. New York: McGraw-Hill,
1975, pp. 211–277.
[50] R. Schank and R. Abelson, Scripts, Plans, Goals and Understanding:
An Inquiry into Human Knowledge Structures. Hillsdale, NJ:
Lawrence Erlbaum, 1977.
[51] G. L. Drescher, Made-Up Minds. Cambridge, MA: MIT Press, 1991.
[52] R. Sutton, D. Precup, and S. Singh, “Between MDPs and
semi-MDPs: A framework for temporal abstraction in reinforcement
learning,” Artif. Intell., vol. 112, pp. 181–211, 1999.
[53] K. Doya, K. Samejima, K. Katagiri, and M. Kawato, “Multiple
model-based reinforcement learning,” Neural Comput., vol. 14, pp.
1347–1369, 2002.
[54] J. Tani and S. Nolfi, “Learning to perceive the world as articulated: An
approach for hierarchical learning in sensory-motor systems,” Neural
Netw., vol. 12, pp. 1131–1141, 1999.
[55] M. Tomasello, M. Carpenter, J. Call, T. Behne, and H. Moll, “Understanding
and sharing intentions: The origins of cultural cognition,”
Behav. Brain Sci., vol. 28, no. 5, pp. 675–691, 2005.
[56] F. Dignum and R. Conte, “Intentional agents and goal formation,”
in Proc. 4th Int. Workshop Intell. Agents IV, Agent Theories, Architectures,
and Languages, London, U.K., 1997, vol. 1365, LNCS, pp.
231–243.
[57] F. Kaplan and V. Hafner, “The challenges of joint attention,” Interaction
Studies, vol. 7, no. 2, pp. 128–134, 2006.
[58] A. Robins, “Transfer in cognition,” Connection Sci., vol. 8, no. 2, pp.
185–204, 1996.
[59] G. Lakoff and M. Johnson, Philosophy in the Flesh: The Embodied
Mind and its Challenge to Western Thought. New York: Basic Books,
1998.
[60] D. Gentner, K. Holyoak, and N. Kokinov, The Analogical Mind: Perspectives
from Cognitive Science. Cambridge, MA: MIT Press, 2001.
[61] L. Pratt and B. Jennings, “A survey of connectionist network reuse
through transfer,” Connection Sci., vol. 8, no. 2, pp. 163–184, 1996.
[62] J. Tani, M. Ito, and Y. Sugita, “Self-organization of distributedly represented
multiple behavior schema in a mirror system,” Neural Netw.,
vol. 17, pp. 1273–1289, 2004.
[63] F. Kaplan and P.-Y. Oudeyer, “The progress-drive hypothesis: An interpretation
of early imitation,” in Models and Mechanisms of Imitation
and Social Learning: Behavioral, Social and Communication Dimensions,
K. Dautenhahn and C. Nehaniv, Eds. Cambridge, U.K.: Cambridge
Univ. Press, 2007, pp. 361–377.
[64] L. Vygotsky, Mind in Society. Cambridge, MA: Harvard Univ. Press,
1978, The Development of Higher Psychological Processes.
[65] L. Steels, “The autotelic principle,” in Embodied Artificial Intelligence,
F. Iida, R. Pfeifer, L. Steels, and Y. Kuniyoshi, Eds. Berlin,
Germany: Springer-Verlag, 2004, vol. 3139, Lecture Notes in AI, pp.
231–242.
[66] A. Meltzoff and A. Gopnik, “The role of imitation in understanding
persons and developing a theory of mind,” in Understanding Other
Minds, S. Baron-Cohen, H. Tager-Flusberg, and D. Cohen, Eds. Oxford, U.K.:
Oxford Univ. Press, 1993, pp. 335–366.
[67] C. Moore and V. Corkum, “Social understanding at the end of the first
year of life,” Developmental Rev., vol. 14, pp. 349–372, 1994.
[68] P. Rochat, “Ego function of early imitation,” in The Imitative Mind:
Development, Evolution and Brain Bases, A. Meltzoff and W. Prinz,
Eds. Cambridge, U.K.: Cambridge Univ. Press, 2002.
[69] J. Baldwin, Mental Development in the Child and the Race. New
York: Macmillan, 1925.
[70] H. Schaffer, “Early interactive development in studies of mother-infant
interaction,” in Proc. Loch Lomond Symp., New York, 1977, pp. 3–18.
[71] J. Piaget, Play, Dreams and Imitation in Childhood. New York:
Norton Press, 1962.
[72] J. Gibson, The Ecological Approach to Visual Perception. Mahwah,
NJ: Lawrence Erlbaum, 1986.
[73] J.-C. Baillie, “Urbi: Towards a universal robotic low-level programming
language,” in Proc. IEEE Int. Conf. Intell. Robots Syst., Aug.
2005, pp. 820–825.
citation: Oudeyer, Pierre-Yves and Kaplan, Frédéric and Hafner, Véréna (2007) Intrinsic Motivation Systems for Autonomous Mental Development. [Journal (Paginated)]
document_url: http://cogprints.org/5473/1/ims.pdf