A theory of cross-validation error

Turney, Peter D. (1994) A theory of cross-validation error. [Journal (Paginated)]

This paper presents a theory of error in cross-validation testing of algorithms for predicting real-valued attributes. The theory justifies the claim that predicting real-valued attributes requires balancing the conflicting demands of simplicity and accuracy. Furthermore, it indicates precisely how these demands must be balanced to minimize cross-validation error. A general theory is presented and then developed in detail for linear regression and instance-based learning.
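The tension between simplicity and accuracy that the abstract describes can be observed empirically. The following is a minimal Python sketch under stated assumptions. It does not reproduce the paper's theory, which analyzes the error rather than measuring it. It estimates leave-one-out cross-validation error with squared loss for the two learners the abstract names: linear regression on polynomial features, where complexity grows with the degree, and a k-nearest-neighbour average as a simple form of instance-based learning, where the model becomes smoother (simpler) as k grows. The data set (y = x² plus Gaussian noise) and every function name (`loo_cv_error`, `poly_predictor`, `knn_predictor`, `make_data`, `solve`) are hypothetical choices for illustration.

```python
import random


def loo_cv_error(xs, ys, fit_predict):
    """Mean squared leave-one-out cross-validation error.

    fit_predict(train_x, train_y, query_x) trains on the given data and
    returns a prediction for query_x.
    """
    n = len(xs)
    total = 0.0
    for i in range(n):
        # Hold out example i and train on the remaining n - 1 examples.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        prediction = fit_predict(train_x, train_y, xs[i])
        total += (ys[i] - prediction) ** 2
    return total / n


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def poly_predictor(degree):
    """Least-squares linear regression on the features 1, x, ..., x**degree."""
    def fit_predict(train_x, train_y, query):
        p = degree + 1
        rows = [[x ** j for j in range(p)] for x in train_x]
        # Normal equations: (X^T X) w = X^T y
        xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
        xty = [sum(r[i] * y for r, y in zip(rows, train_y)) for i in range(p)]
        w = solve(xtx, xty)
        return sum(wj * query ** j for j, wj in enumerate(w))
    return fit_predict


def knn_predictor(k):
    """Instance-based prediction: mean y of the k training points nearest the query."""
    def fit_predict(train_x, train_y, query):
        ranked = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query))
        nearest = ranked[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return fit_predict


def make_data(n, noise, seed=0):
    """Hypothetical data: y = x**2 plus Gaussian noise, x evenly spaced in [-1, 1]."""
    rng = random.Random(seed)
    xs = [-1 + 2 * i / (n - 1) for i in range(n)]
    ys = [x ** 2 + rng.gauss(0, noise) for x in xs]
    return xs, ys


if __name__ == "__main__":
    xs, ys = make_data(30, noise=0.1)
    for degree in range(6):
        err = loo_cv_error(xs, ys, poly_predictor(degree))
        print(f"linear regression, degree {degree}: {err:.4f}")
    for k in (1, 3, 5, 10, 29):
        err = loo_cv_error(xs, ys, knn_predictor(k))
        print(f"instance-based, k = {k}: {err:.4f}")
```

Very simple settings (degree 0, or k equal to the whole training set, which both predict roughly the mean) underfit, while very flexible ones start to fit the noise, so the lowest cross-validation error usually lies somewhere in between. The exact numbers depend on the seed, the noise level and the sample size.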

Item Type: Journal (Paginated)
Subjects: Computer Science > Artificial Intelligence; Computer Science > Machine Learning; Computer Science > Statistical Models
ID Code: 1820
Deposited By: Turney, Peter
Deposited On: 13 Oct 2001
Last Modified: 11 Mar 2011 08:54
