creators_name: Oudeyer, Pierre-Yves
creators_name: Kaplan, Frédéric
creators_name: Hafner, Véréna
type: journalp
datestamp: 2007-04-04
lastmod: 2011-03-11 08:56:49
metadata_visibility: show
title: Intrinsic Motivation Systems for Autonomous Mental Development
ispublished: pub
subjects: comp-sci-mach-dynam-sys
subjects: dev-psy
subjects: comp-sci-art-intel
subjects: comp-sci-robot
full_text_status: public
keywords: Active learning, autonomy, behavior, complexity, curiosity, development, developmental trajectory, epigenetic robotics, intrinsic motivation, learning, reinforcement learning, values.
abstract: Exploratory activities seem to be intrinsically rewarding for children and crucial for their cognitive development. Can a machine be endowed with such an intrinsic motivation system? This is the question we study in this paper, presenting a number of computational systems that try to capture this drive towards novel or curious situations. After discussing related research coming from developmental psychology, neuroscience, developmental robotics, and active learning, this paper presents the mechanism of Intelligent Adaptive Curiosity, an intrinsic motivation system which pushes a robot towards situations in which it maximizes its learning progress. This drive makes the robot focus on situations which are neither too predictable nor too unpredictable, thus permitting autonomous mental development. The complexity of the robot’s activities autonomously increases and complex developmental sequences self-organize without being constructed in a supervised manner. Two experiments are presented illustrating the stage-like organization emerging with this mechanism. In one of them, a physical robot is placed on a baby play mat with objects that it can learn to manipulate.
Experimental results show that the robot first spends time in situations which are easy to learn, then shifts its attention progressively to situations of increasing difficulty, avoiding situations in which nothing can be learned. Finally, these various results are discussed in relation to more complex forms of behavioral organization and data coming from developmental psychology.
date: 2007
date_type: published
publication: IEEE Transactions on Evolutionary Computation
volume: 11
number: 6
refereed: TRUE
referencetext: [1] J. Weng, J. McClelland, A. Pentland, O. Sporns, I. Stockman, M. Sur, and E. Thelen, “Autonomous mental development by robots and animals,” Science, vol. 291, pp. 599–600, 2001.
[2] M. Lungarella, G. Metta, R. Pfeifer, and G. Sandini, “Developmental robotics: A survey,” Connection Sci., vol. 15, no. 4, pp. 151–190, 2003.
[3] M. Asada, S. Noda, S. Tawaratsumida, and K. Hosoda, “Purposive behavior acquisition on a real robot by vision-based reinforcement learning,” Mach. Learn., vol. 23, pp. 279–303, 1996.
[4] J. Elman, “Learning and development in neural networks: The importance of starting small,” Cognition, vol. 48, pp. 71–99, 1993.
[5] R. White, “Motivation reconsidered: The concept of competence,” Psychol. Rev., vol. 66, pp. 297–333, 1959.
[6] E. Deci and R. Ryan, Intrinsic Motivation and Self-Determination in Human Behavior. New York: Plenum, 1985.
[7] D. Berlyne, Conflict, Arousal and Curiosity. New York: McGraw-Hill, 1960.
[8] M. Csikszentmihalyi, Flow: The Psychology of Optimal Experience. New York: Harper Perennial, 1991.
[9] W. Schultz, P. Dayan, and P. Montague, “A neural substrate of prediction and reward,” Science, vol. 275, pp. 1593–1599, 1997.
[10] P. Dayan and B. Balleine, “Reward, motivation and reinforcement learning,” Neuron, vol. 36, pp. 285–298, 2002.
[11] S. Kakade and P. Dayan, “Dopamine: Generalization and bonuses,” Neural Netw., vol. 15, pp. 549–559, 2002.
[12] J.-C. Horvitz, “Mesolimbocortical and nigrostriatal dopamine responses to salient non-reward events,” Neuroscience, vol. 96, no. 4, pp. 651–656, 2000.
[13] M. Csikszentmihalyi, Creativity: Flow and the Psychology of Discovery and Invention. New York: Harper Perennial, 1996.
[14] J. Schmidhuber, “Curious model-building control systems,” in Proc. Int. Joint Conf. Neural Netw., Singapore, 1991, vol. 2, pp. 1458–1463.
[15] S. Thrun, “Exploration in active learning,” in Handbook of Brain Theory and Neural Networks, M. Arbib, Ed. Cambridge, MA: MIT Press, 1995.
[16] J. Herrmann, K. Pawelzik, and T. Geisel, “Learning predictive representations,” Neurocomputing, vol. 32–33, pp. 785–791, 2000.
[17] J. Weng, “A theory for mentally developing robots,” in Proc. 2nd Int. Conf. Development Learn., 2002, pp. 131–140.
[18] X. Huang and J. Weng, “Novelty and reinforcement learning in the value system of developmental robots,” in Proc. 2nd Int. Workshop Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems, C. Prince, Y. Demiris, Y. Marom, H. Kozima, and C. Balkenius, Eds., 2002, vol. 94, Lund University Cognitive Studies, pp. 47–55.
[19] F. Kaplan and P.-Y. Oudeyer, “Motivational principles for visual know-how development,” in Proc. 3rd Int. Workshop Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems, C. Prince, L. Berthouze, H. Kozima, D. Bullock, G. Stojanov, and C. Balkenius, Eds., 2003, vol. 101, Lund University Cognitive Studies, pp. 73–80.
[20] J. Marshall, D. Blank, and L. Meeden, “An emergent framework for self-motivation in developmental robotics,” in Proc. 3rd Int. Conf. Development Learn., San Diego, CA, 2004, pp. 104–111.
[21] A. Barto, S. Singh, and N. Chentanez, “Intrinsically motivated learning of hierarchical collections of skills,” in Proc. 3rd Int. Conf. Development Learn., San Diego, CA, 2004, pp. 112–119.
[22] V. Fedorov, Theory of Optimal Experiment. New York, NY: Academic, 1972.
[23] D. Cohn, Z. Ghahramani, and M. Jordan, “Active learning with statistical models,” J. Artif. Intell. Res., vol. 4, pp. 129–145, 1996.
[24] M. Hasenjager and H. Ritter, Active Learning in Neural Networks. Berlin, Germany: Physica-Verlag GmbH, 2002, Physica-Verlag Studies In Fuzziness and Soft Computing Series, pp. 137–169.
[25] J. Denzler and C. Brown, “Information theoretic sensor data selection for active object recognition and state estimation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 2, pp. 145–157, Feb. 2002.
[26] M. Plutowski and H. White, “Selecting concise training sets from clean data,” IEEE Trans. Neural Netw., vol. 4, no. 2, pp. 305–318, Mar. 1993.
[27] T. Watkin and A. Rau, “Selecting examples for perceptrons,” J. Physics A: Mathematical and General, vol. 25, pp. 113–121, 1992.
[28] D. MacKay, “Information-based objective functions for active data selection,” Neural Comput., vol. 4, pp. 590–604, 1992.
[29] M. Belue, K. Bauer, and D. Ruck, “Selecting optimal experiments for multiple output multi-layer perceptrons,” Neural Comput., vol. 9, pp. 161–183, 1997.
[30] G. Paass and J. Kindermann, “Bayesian query construction for neural network models,” in Advances in Neural Information Processing Systems, G. Tesauro, D. Touretzky, and T. Leen, Eds. Cambridge, MA: MIT Press, 1995, vol. 7, pp. 443–450.
[31] K. O. M. Hasenjager and H. Ritter, Active Learning in Self-Organizing Maps. New York: Elsevier, 1999, pp. 57–70.
[32] D. Cohn, L. Atlas, and R. Ladner, “Improving generalization with active learning,” Mach. Learn., vol. 15, no. 2, pp. 201–221, 1994.
[33] J. Poland and A. Zell, “Different criteria for active learning in neural networks: A comparative study,” in Proc. 10th Eur. Symp. Artif. Neural Netw., M. Verleysen, Ed., 2002, pp. 119–124.
[34] J. Weng, “Developmental robotics: Theory and experiments,” Int. J. Humanoid Robotics, vol. 1, no. 2, pp. 199–236, 2004.
[35] N. Roy and A. McCallum, “Towards optimal active learning through sampling estimation of error reduction,” in Proc. 18th Int. Conf. Mach. Learn., 2001, pp. 441–448.
[36] R. Collobert and S. Bengio, “SVMTorch: Support vector machines for large-scale regression problems,” J. Mach. Learn. Res., vol. 1, pp. 143–160, 2001.
[37] R. Sutton and A. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
[38] C. Watkins and P. Dayan, “Q-learning,” Mach. Learn., vol. 8, pp. 279–292, 1992.
[39] K. Kaneko and I. Tsuda, Complex Systems: Chaos and Beyond. Berlin, Germany: Springer-Verlag, 2000.
[40] O. Sporns and T. Pegors, “Information-theoretical aspects of embodied artificial intelligence,” in Embodied Artificial Intelligence, F. Iida, R. Pfeifer, L. Steels, and Y. Kuniyoshi, Eds. Berlin, Germany: Springer-Verlag, 2003, LNAI 3139, pp. 74–85.
[41] J. Piaget, The Origins of Intelligence in Children. New York, NY: Norton, 1952.
[42] O. Michel, “Webots: Professional mobile robot simulation,” Int. J. Advanced Robotic Syst., vol. 1, no. 1, pp. 39–42, 2004.
[43] J. Rekimoto and Y. Ayatsuka, “Cybercode: Designing augmented reality environments with visual tags,” in Proc. Designing Augmented Reality Environments, 2000, pp. 1–10.
[44] S. Schaal, C. Atkeson, and S. Vijayakumar, “Scalable techniques from nonparametric statistics for real-time robot learning,” Appl. Intell., vol. 17, no. 1, pp. 49–60, 2002.
[45] E. Thelen and L. B. Smith, A Dynamic Systems Approach to the Development of Cognition and Action. Cambridge, MA: MIT Press, 1994.
[46] R. D. Beer, “The dynamics of active categorical perception in an evolved model agent,” Adaptive Behav., vol. 11, no. 4, pp. 209–243, 2003.
[47] S. Nolfi and J. Tani, “Extracting regularities in space and time through a cascade of prediction networks,” Connection Sci., vol. 11, no. 2, pp. 129–152, 1999.
[48] M. Arbib, The Handbook of Brain Theory and Neural Networks. Cambridge, MA: MIT Press, 2003.
[49] M. Minsky, “A framework for representing knowledge,” in The Psychology of Computer Vision, P. Winston, Ed. New York: McGraw-Hill, 1975, pp. 211–277.
[50] R. Schank and R. Abelson, Scripts, Plans, Goals and Understanding: An Inquiry into Human Knowledge Structures. Hillsdale, NJ: Lawrence Erlbaum, 1977.
[51] G. L. Drescher, Made-Up Minds. Cambridge, MA: MIT Press, 1991.
[52] R. Sutton, D. Precup, and S. Singh, “Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning,” Artif. Intell., vol. 112, pp. 181–211, 1999.
[53] K. Doya, K. Samejima, K. Katagiri, and M. Kawato, “Multiple model-based reinforcement learning,” Neural Comput., vol. 14, pp. 1347–1369, 2002.
[54] J. Tani and S. Nolfi, “Learning to perceive the world as articulated: An approach for hierarchical learning in sensory-motor systems,” Neural Netw., vol. 12, pp. 1131–1141, 1999.
[55] M. Tomasello, M. Carpenter, J. Call, T. Behne, and H. Moll, “Understanding and sharing intentions: The origins of cultural cognition,” Behav. Brain Sci., vol. 28, no. 5, pp. 675–691, 2005.
[56] F. Dignum and R. Conte, “Intentional agents and goal formation,” in Proc. 4th Int. Workshop Intell. Agents IV, Agent Theories, Architectures, and Languages, London, U.K., 1997, vol. 1365, LNCS, pp. 231–243.
[57] F. Kaplan and V. Hafner, “The challenges of joint attention,” Interaction Studies, vol. 7, no. 2, pp. 128–134, 2006.
[58] A. Robins, “Transfer in cognition,” Connection Sci., vol. 8, no. 2, pp. 185–204, 1996.
[59] G. Lakoff and M. Johnson, Philosophy in the Flesh: The Embodied Mind and its Challenge to Western Thought. New York: Basic Books, 1998.
[60] D. Gentner, K. Holyoak, and N. Kokinov, The Analogical Mind: Perspectives from Cognitive Science. Cambridge, MA: MIT Press, 2001.
[61] L. Pratt and B. Jennings, “A survey of connectionist network reuse through transfer,” Connection Sci., vol. 8, no. 2, pp. 163–184, 1996.
[62] J. Tani, M. Ito, and Y. Sugita, “Self-organization of distributedly represented multiple behavior schema in a mirror system,” Neural Netw., vol. 17, pp. 1273–1289, 2004.
[63] F. Kaplan and P.-Y. Oudeyer, “The progress-drive hypothesis: An interpretation of early imitation,” in Models and Mechanisms of Imitation and Social Learning: Behavioral, Social and Communication Dimensions, K. Dautenhahn and C. Nehaniv, Eds. Cambridge, U.K.: Cambridge Univ. Press, 2007, pp. 361–377.
[64] L. Vygotsky, Mind in Society: The Development of Higher Psychological Processes. Cambridge, MA: Harvard Univ. Press, 1978.
[65] L. Steels, “The autotelic principle,” in Embodied Artificial Intelligence, F. Iida, R. Pfeifer, L. Steels, and Y. Kuniyoshi, Eds. Berlin, Germany: Springer-Verlag, 2004, vol. 3139, Lecture Notes in AI, pp. 231–242.
[66] A. Meltzoff and A. Gopnik, “The role of imitation in understanding persons and developing a theory of mind,” in Understanding Other Minds, S. Baron-Cohen, H. Tager-Flusberg, and D. Cohen, Eds. Oxford, U.K.: Oxford Univ. Press, 1993, pp. 335–366.
[67] C. Moore and V. Corkum, “Social understanding at the end of the first year of life,” Developmental Rev., vol. 14, pp. 349–372, 1994.
[68] P. Rochat, “Ego function of early imitation,” in The Imitative Mind: Development, Evolution and Brain Bases, A. Meltzoff and W. Prinz, Eds. Cambridge, U.K.: Cambridge Univ. Press, 2002.
[69] J. Baldwin, Mental Development in the Child and the Race. New York: Macmillan, 1925.
[70] H. Schaffer, “Early interactive development in studies of mother-infant interaction,” in Proc. Loch Lomond Symp., New York, 1977, pp. 3–18.
[71] J. Piaget, Play, Dreams and Imitation in Childhood. New York: Norton Press, 1962.
[72] J. Gibson, The Ecological Approach to Visual Perception. Mahwah, NJ: Lawrence Erlbaum, 1986.
[73] J.-C. Baillie, “Urbi: Towards a universal robotic low-level programming language,” in Proc. IEEE Int. Conf. Intell. Robots Syst., Aug. 2005, pp. 820–825.
citation: Oudeyer, Pierre-Yves and Kaplan, Frédéric and Hafner, Véréna (2007) Intrinsic Motivation Systems for Autonomous Mental Development. [Journal (Paginated)]
document_url: http://cogprints.org/5473/1/ims.pdf