How to shift bias: Lessons from the Baldwin effect

Turney, Peter D. (1996) How to shift bias: Lessons from the Baldwin effect. [Journal (Paginated)]

An inductive learning algorithm takes a set of data as input and generates a hypothesis as output. A set of data is typically consistent with an infinite number of hypotheses; therefore, there must be factors other than the data that determine the output of the learning algorithm. In machine learning, these other factors are called the bias of the learner. Classical learning algorithms have a fixed bias, implicit in their design. Recently developed learning algorithms dynamically adjust their bias as they search for a hypothesis. Algorithms that shift bias in this manner are not as well understood as classical algorithms. In this paper, we show that the Baldwin effect has implications for the design and analysis of bias-shifting algorithms. The Baldwin effect was proposed in 1896 to explain how phenomena that might appear to require Lamarckian evolution (inheritance of acquired characteristics) can arise from purely Darwinian evolution. Hinton and Nowlan presented a computational model of the Baldwin effect in 1987. We explore a variation on their model, which we constructed explicitly to illustrate the lessons that the Baldwin effect has for research in bias-shifting algorithms. The main lesson is that a good strategy for shifting bias in a learning algorithm appears to be to begin with a weak bias and gradually shift to a strong bias.
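The abstract cites Hinton and Nowlan's 1987 model without describing it. As a rough illustration, here is a minimal Python sketch of that model as it is commonly described: 20 loci whose alleles are fixed-correct (`1`), fixed-wrong (`0`), or learnable (`?`); 1000 random learning trials per individual; and fitness 1 + 19n/1000, where n is the number of trials left after the target is found. It is not the variation explored in the paper, which the abstract does not specify. The all-ones target, the function and parameter names, and the shortcut of drawing the first successful trial from a geometric distribution (instead of looping over every trial) are assumptions of this sketch. Read in the paper's terms, `?` alleles stand for a weak bias (room for learning) and fixed alleles for a strong, instinctive bias.

```python
import math
import random
from itertools import accumulate

# Assumed setup, following common descriptions of Hinton & Nowlan (1987).
LOCI = 20  # loci per genotype; the (assumed) target setting is all '1'


def random_genotype(rng):
    # Initial allele probabilities: 0.25 '1', 0.25 '0', 0.5 '?'.
    return rng.choices("10?", weights=(1, 1, 2), k=LOCI)


def fitness(genotype, trials, rng):
    """1 + 19 * remaining / trials if learning reaches the target, else 1."""
    if "0" in genotype:
        return 1.0  # a fixed wrong allele can never be corrected by learning
    k = genotype.count("?")
    if k == 0:
        first_success = 1  # innately correct: target found on the first trial
    else:
        # Each trial guesses every '?' at random, so it succeeds with p = 2**-k.
        # Rather than loop over trials, draw the index of the first success
        # from the equivalent geometric distribution (inverse-CDF sampling).
        p = 0.5 ** k
        u = rng.random()
        first_success = math.floor(math.log1p(-u) / math.log1p(-p)) + 1
    remaining = max(trials - first_success, 0)
    return 1.0 + 19.0 * remaining / trials


def next_generation(population, trials, rng):
    # Fitness-proportional parent selection with single-point crossover.
    cum = list(accumulate(fitness(g, trials, rng) for g in population))
    parents = rng.choices(population, cum_weights=cum, k=2 * len(population))
    children = []
    for mom, dad in zip(parents[0::2], parents[1::2]):
        cut = rng.randrange(1, LOCI)
        children.append(mom[:cut] + dad[cut:])
    return children


def allele_fractions(population):
    # Fraction of all alleles in the population that are '1', '0', and '?'.
    total = len(population) * LOCI
    return {a: sum(g.count(a) for g in population) / total for a in "10?"}


def simulate(pop_size=1000, trials=1000, generations=50, seed=0):
    rng = random.Random(seed)
    population = [random_genotype(rng) for _ in range(pop_size)]
    history = [allele_fractions(population)]
    for _ in range(generations):
        population = next_generation(population, trials, rng)
        history.append(allele_fractions(population))
    return history


if __name__ == "__main__":
    for gen, frac in enumerate(simulate()):
        if gen % 10 == 0:
            print(gen, {a: round(f, 3) for a, f in frac.items()})
```

Running the script prints the allele fractions every 10 generations; exact numbers depend on the seed. One would expect the `0` alleles to be purged once some individuals begin learning the target, and the `?` alleles to give way to fixed `1` alleles more slowly, which mirrors the weak-to-strong bias shift the abstract describes.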

Item Type: Journal (Paginated)
Keywords: bias, instinct, bias shift, Baldwin effect, concept learning, induction.
Subjects: Biology > Evolution; Computer Science > Machine Learning; Computer Science > Statistical Models
ID Code: 1818
Deposited By: Turney, Peter
Deposited On: 11 Oct 2001
Last Modified: 11 Mar 2011 08:54

References in Article

[1] Ackley, D., and Littman, M. (1991). Interactions between learning and evolution. In Proceedings of the Second Conference on Artificial Life, C. Langton, C. Taylor, D. Farmer, and S. Rasmussen, editors. California: Addison-Wesley.

[2] Anderson, R.W. (1995). Learning and evolution: A quantitative genetics approach. Journal of Theoretical Biology, 175, 89-101.

[3] Bala, J., Huang, J., Vafaie, H., DeJong, K., and Wechsler, H. (1995). Hybrid learning using genetic algorithms and decision trees for pattern classification. Proceedings of the 14th International Joint Conference on Artificial Intelligence, IJCAI-95, Montreal, Canada, pp. 719-724.

[4] Balakrishnan, K., and Honavar, V. (1995). Evolutionary design of neural architectures: A preliminary taxonomy and guide to literature. Artificial Intelligence Research Group, Department of Computer Science, Iowa State University, Technical Report CS TR #95-01.

[5] Baldwin, J.M. (1896). A new factor in evolution. American Naturalist, 30, 441-


[6] Barkow, J.H., Cosmides, L., and Tooby, J. (1992). Editors, The Adapted Mind: Evolutionary Psychology and the Generation of Culture, New York: Oxford University Press.

[7] Belew, R.K., and Mitchell, M. (1996). Editors, Adaptive Individuals in Evolving Populations: Models and Algorithms. Massachusetts: Addison-Wesley.

[8] Geman, S., Bienenstock, E., and Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural Computation, 4, 1-58.

[9] Glover, F. (1989). Tabu search: Part I. ORSA (Operations Research Society of America) Journal on Computing, 1, 190-260.

[10] Glover, F. (1990). Tabu search: Part II. ORSA (Operations Research Society of America) Journal on Computing, 2, 4-32.

[11] Gordon, D.F., and desJardins, M. (1995). Evaluation and selection of biases in machine learning. Machine Learning, 20, 5-22.

[12] Grefenstette, J.J. (1983). A user’s guide to GENESIS. Technical Report CS-83-11, Computer Science Department, Vanderbilt University.

[13] Grefenstette, J.J. (1986). Optimization of control parameters for genetic algorithms. IEEE Transactions on Systems, Man, and Cybernetics, 16, 122-128.

[14] Harvey, I. (1993). The puzzle of the persistent question marks: A case study of genetic drift. In S. Forrest (editor) Proceedings of the Fifth International Conference on Genetic Algorithms, ICGA-93, California: Morgan Kaufmann.

[15] Haussler, D. (1988). Quantifying inductive bias: AI learning algorithms and Valiant’s learning framework. Artificial Intelligence, 36, 177-221.

[16] Hinton, G.E., and Nowlan, S.J. (1987). How learning can guide evolution. Complex Systems, 1, 495-502.

[17] Hinton, G.E. (1986). Learning distributed representations of concepts. Proceedings of the Eighth Annual Conference of the Cognitive Science Society, 1-12, Hillsdale: Erlbaum.

[18] Lawrence, D. (1987). Genetic Algorithms and Simulated Annealing. California: Morgan Kaufmann.

[19] Maynard Smith, J. (1987). When learning guides evolution. Nature, 329, 761-


[20] Morgan, C.L. (1896). On modification and variation. Science, 4, 733-740.

[21] Nolfi, S., Elman, J., and Parisi, D. (1994). Learning and evolution in neural networks. Adaptive Behavior, 3, 5-28.

[22] Nowlan, S.J., and Hinton, G.E. (1992). Simplifying neural networks by soft weight-sharing. Neural Computation, 4, 473-493.

[23] Osborn, H.F. (1896). Ontogenic and phylogenic variation. Science, 4, 786-789.

[24] Pinker, S. (1994). The Language Instinct: How the Mind Creates Language. New York: William Morrow and Co.

[25] Provost, F.J., and Buchanan, B.G. (1995). Inductive policy: The pragmatics of bias selection. Machine Learning, 20, 35-61.

[26] Rendell, L. (1986). A general framework for induction and a study of selective induction. Machine Learning, 1, 177-226.

[27] Schaffer, C. (1993). Selecting a classification method by cross-validation. Machine Learning, 13, 135-143.

[28] Schaffer, C. (1994). A conservation law for generalization performance. Proceedings of the Eleventh International Machine Learning Conference, ML-94. California: Morgan Kaufmann.

[29] Tcheng, D., Lambert, B., Lu, S., and Rendell, L. (1989). Building robust learning systems by combining induction and optimization. Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, IJCAI-89, pp. 806-812. Detroit, Michigan.

[30] Turney, P.D. (1995). Cost-sensitive classification: Empirical evaluation of a hybrid genetic decision tree induction algorithm. Journal of Artificial Intelligence Research, 2,


[31] Utgoff, P., and Mitchell, T. (1982). Acquisition of appropriate bias for inductive concept learning. Proceedings of the National Conference on Artificial Intelligence, AAAI-82, Pittsburgh, pp. 414-417.

[32] Utgoff, P. (1986). Shift of bias for inductive concept learning. In Machine Learning: An Artificial Intelligence Approach, Volume II. Edited by R.S. Michalski, J.G. Carbonell, and T.M. Mitchell. California: Morgan Kaufmann.

[33] Waddington, C.H. (1942). Canalization of development and the inheritance of acquired characters. Nature, 150, 563-565.

[34] Wcislo, W.T. (1989). Behavioral environments and evolutionary change. Annual Review of Ecology and Systematics, 20, 137-169.

[35] Weigend, A.S., Rumelhart, D.E., and Huberman, B.A. (1990). Predicting the future: A connectionist approach. In T.J. Sejnowski, G.E. Hinton, and D.S. Touretzky, editors, Proceedings of the 1990 Connectionist Models Summer School, San Mateo, CA, Morgan Kaufmann.

[36] Weigend, A.S., Rumelhart, D.E., and Huberman, B.A. (1991). Generalization by weight elimination with application to forecasting. In R.P. Lippmann, J.E. Moody, and D.S. Touretzky, editors, Advances in Neural Information Processing Systems 3 (NIPS 3), pp. 875-882. San Mateo, CA, Morgan Kaufmann.

[37] Weigend, A.S., and Rumelhart, D.E. (1994). Weight-elimination and effective network size. In S.J. Hanson, G.A. Drastal, and R.L. Rivest, editors, Computational Learning Theory and Natural Learning Systems, pp. 457-476. Cambridge, MA: MIT Press.

[38] Whitley, D., and Gruau, F. (1993). Adding learning to the cellular development of neural networks: Evolution and the Baldwin effect. Evolutionary Computation, 1, 213-233.

[39] Whitley, D., Gordon, S., and Mathias, K. (1994). Lamarckian evolution, the Baldwin effect and function optimization. Parallel Problem Solving from Nature, PPSN III. Y. Davidor, H.P. Schwefel, and R. Manner, editors, pp. 6-15. Berlin: Springer-Verlag.

[40] Wolpert, D. (1992). On the connection between in-sample testing and generalization error. Complex Systems, 6, 47-94.

[41] Wolpert, D. (1994). Off-training set error and a priori distinctions between learning algorithms. Technical Report SFI-TR-95-01-003, Santa Fe Institute.

