---
abstract: |-
  An inductive learning algorithm takes a set of data as input and generates a hypothesis as output. A set of data is typically consistent with an infinite number of hypotheses; therefore, there must be factors other than the data that determine the output of the learning algorithm. In machine learning, these other factors are called the bias of the learner. Classical learning algorithms have a fixed bias, implicit in their design. Recently developed learning algorithms dynamically adjust their bias as they search for a hypothesis. Algorithms that shift bias in this manner are not as well understood as classical algorithms. In this paper, we show that the Baldwin effect has implications for the design and analysis of bias shifting algorithms. The Baldwin effect was proposed in 1896 to explain how phenomena that might appear to require Lamarckian evolution (inheritance of acquired characteristics) can arise from purely Darwinian evolution. Hinton and Nowlan presented a computational model of the Baldwin effect in 1987. We explore a variation on their model, which we constructed explicitly to illustrate the lessons that the Baldwin effect has for research in bias shifting algorithms. The main lesson is that a good strategy for shifting bias in a learning algorithm appears to be to begin with a weak bias and gradually shift to a strong bias.
altloc: []
chapter: ~
commentary: ~
commref: ~
confdates: ~
conference: ~
confloc: ~
contact_email: ~
creators_id: []
creators_name:
  - family: Turney
    given: Peter D.
    honourific: ''
    lineage: ''
date: 1996
date_type: published
datestamp: 2001-10-11
department: ~
dir: disk0/00/00/18/18
edit_lock_since: ~
edit_lock_until: ~
edit_lock_user: ~
editors_id: []
editors_name: []
eprint_status: archive
eprintid: 1818
fileinfo: /style/images/fileicons/application_postscript.png;/1818/1/Baldwin.ps|/style/images/fileicons/application_pdf.png;/1818/5/Baldwin.pdf
full_text_status: public
importid: ~
institution: ~
isbn: ~
ispublished: pub
issn: ~
item_issues_comment: []
item_issues_count: 0
item_issues_description: []
item_issues_id: []
item_issues_reported_by: []
item_issues_resolved_by: []
item_issues_status: []
item_issues_timestamp: []
item_issues_type: []
keywords: 'bias, instinct, bias shift, Baldwin effect, concept learning, induction.'
lastmod: 2011-03-11 08:54:48
latitude: ~
longitude: ~
metadata_visibility: show
note: ~
number: 3
pagerange: 271-295
pubdom: FALSE
publication: Evolutionary Computation
publisher: ~
refereed: TRUE
referencetext: |
  [1] Ackley, D., and Littman, M. (1991). Interactions between learning and evolution. In Proceedings of the Second Conference on Artificial Life, C. Langton, C. Taylor, D. Farmer, and S. Rasmussen, editors. California: Addison-Wesley.
  [2] Anderson, R.W. (1995). Learning and evolution: A quantitative genetics approach. Journal of Theoretical Biology, 175, 89-101.
  [3] Bala, J., Huang, J., Vafaie, H., DeJong, K., and Wechsler, H. (1995). Hybrid learning using genetic algorithms and decision trees for pattern classification. Proceedings of the 14th International Joint Conference on Artificial Intelligence, IJCAI-95, Montreal, Canada, pp. 719-724.
  [4] Balakrishnan, K., and Honavar, V. (1995). Evolutionary design of neural architectures: A preliminary taxonomy and guide to literature. Artificial Intelligence Research Group, Department of Computer Science, Iowa State University, Technical Report CS TR #95-01.
  [5] Baldwin, J.M. (1896). A new factor in evolution. American Naturalist, 30, 441-451.
  [6] Barkow, J.H., Cosmides, L., and Tooby, J. (1992). Editors, The Adapted Mind: Evolutionary Psychology and the Generation of Culture. New York: Oxford University Press.
  [7] Belew, R.K., and Mitchell, M. (1996). Editors, Adaptive Individuals in Evolving Populations: Models and Algorithms. Massachusetts: Addison-Wesley.
  [8] Geman, S., Bienenstock, E., and Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural Computation, 4, 1-58.
  [9] Glover, F. (1989). Tabu search — Part I. ORSA (Operations Research Society of America) Journal on Computing, 1, 190-260.
  [10] Glover, F. (1990). Tabu search — Part II. ORSA (Operations Research Society of America) Journal on Computing, 2, 4-32.
  [11] Gordon, D.F., and desJardins, M. (1995). Evaluation and selection of biases in machine learning. Machine Learning, 20, 5-22.
  [12] Grefenstette, J.J. (1983). A user's guide to GENESIS. Technical Report CS-83-11, Computer Science Department, Vanderbilt University.
  [13] Grefenstette, J.J. (1986). Optimization of control parameters for genetic algorithms. IEEE Transactions on Systems, Man, and Cybernetics, 16, 122-128.
  [14] Harvey, I. (1993). The puzzle of the persistent question marks: A case study of genetic drift. In S. Forrest (editor), Proceedings of the Fifth International Conference on Genetic Algorithms, ICGA-93. California: Morgan Kaufmann.
  [15] Haussler, D. (1988). Quantifying inductive bias: AI learning algorithms and Valiant's learning framework. Artificial Intelligence, 36, 177-221.
  [16] Hinton, G.E., and Nowlan, S.J. (1987). How learning can guide evolution. Complex Systems, 1, 495-502.
  [17] Hinton, G.E. (1986). Learning distributed representations of concepts. Proceedings of the Eighth Annual Conference of the Cognitive Science Society, 1-12. Hillsdale: Erlbaum.
  [18] Davis, L. (1987). Editor, Genetic Algorithms and Simulated Annealing. California: Morgan Kaufmann.
  [19] Maynard Smith, J. (1987). When learning guides evolution. Nature, 329, 761-762.
  [20] Morgan, C.L. (1896). On modification and variation. Science, 4, 733-740.
  [21] Nolfi, S., Elman, J., and Parisi, D. (1994). Learning and evolution in neural networks. Adaptive Behavior, 3, 5-28.
  [22] Nowlan, S.J., and Hinton, G.E. (1992). Simplifying neural networks by soft weight-sharing. Neural Computation, 4, 473-493.
  [23] Osborn, H.F. (1896). Ontogenic and phylogenic variation. Science, 4, 786-789.
  [24] Pinker, S. (1994). The Language Instinct: How the Mind Creates Language. New York: William Morrow and Co.
  [25] Provost, F.J., and Buchanan, B.G. (1995). Inductive policy: The pragmatics of bias selection. Machine Learning, 20, 35-61.
  [26] Rendell, L. (1986). A general framework for induction and a study of selective induction. Machine Learning, 1, 177-226.
  [27] Schaffer, C. (1993). Selecting a classification method by cross-validation. Machine Learning, 13, 135-143.
  [28] Schaffer, C. (1994). A conservation law for generalization performance. Proceedings of the Eleventh International Machine Learning Conference, ML-94. California: Morgan Kaufmann.
  [29] Tcheng, D., Lambert, B., Lu, S., and Rendell, L. (1989). Building robust learning systems by combining induction and optimization. Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, IJCAI-89, pp. 806-812. Detroit, Michigan.
  [30] Turney, P.D. (1995). Cost-sensitive classification: Empirical evaluation of a hybrid genetic decision tree induction algorithm. Journal of Artificial Intelligence Research, 2, 369-409.
  [31] Utgoff, P., and Mitchell, T. (1982). Acquisition of appropriate bias for inductive concept learning. Proceedings of the National Conference on Artificial Intelligence, AAAI-82, Pittsburgh, pp. 414-417.
  [32] Utgoff, P. (1986). Shift of bias for inductive concept learning. In Machine Learning: An Artificial Intelligence Approach, Volume II. Edited by R.S. Michalski, J.G. Carbonell, and T.M. Mitchell. California: Morgan Kaufmann.
  [33] Waddington, C.H. (1942). Canalization of development and the inheritance of acquired characters. Nature, 150, 563-565.
  [34] Wcislo, W.T. (1989). Behavioral environments and evolutionary change. Annual Review of Ecology and Systematics, 20, 137-169.
  [35] Weigend, A.S., Rumelhart, D.E., and Huberman, B.A. (1990). Predicting the future: A connectionist approach. In T.J. Sejnowski, G.E. Hinton, and D.S. Touretzky, editors, Proceedings of the 1990 Connectionist Models Summer School. San Mateo, CA: Morgan Kaufmann.
  [36] Weigend, A.S., Rumelhart, D.E., and Huberman, B.A. (1991). Generalization by weight elimination with application to forecasting. In R.P. Lippman, J.E. Moody, and D.S. Touretzky, editors, Advances in Neural Information Processing Systems 3 (NIPS 3), pp. 875-882. San Mateo, CA: Morgan Kaufmann.
  [37] Weigend, A.S., and Rumelhart, D.E. (1994). Weight-elimination and effective network size. In S.J. Hanson, G.A. Drastal, and R.L. Rivest, editors, Computational Learning Theory and Natural Learning Systems, pp. 457-476. Cambridge, MA: MIT Press.
  [38] Whitley, D., and Gruau, F. (1993). Adding learning to the cellular development of neural networks: Evolution and the Baldwin effect. Evolutionary Computation, 1, 213-233.
  [39] Whitley, D., Gordon, S., and Mathias, K. (1994). Lamarckian evolution, the Baldwin effect and function optimization. Parallel Problem Solving from Nature — PPSN III. Y. Davidor, H.P. Schwefel, and R. Manner, editors, pp. 6-15. Berlin: Springer-Verlag.
  [40] Wolpert, D. (1992). On the connection between in-sample testing and generalization error. Complex Systems, 6, 47-94.
  [41] Wolpert, D. (1994). Off-training set error and a priori distinctions between learning algorithms. Technical Report SFI-TR-95-01-003, Santa Fe Institute.
relation_type: []
relation_uri: []
reportno: ~
rev_number: 14
series: ~
source: ~
status_changed: 2007-09-12 16:41:02
subjects:
  - bio-evo
  - comp-sci-mach-learn
  - comp-sci-stat-model
succeeds: ~
suggestions: ~
sword_depositor: ~
sword_slug: ~
thesistype: ~
title: 'How to shift bias: Lessons from the Baldwin effect'
type: journalp
userid: 2175
volume: 4