The Journal of Psychology, 1987, 121(3), 259-271

Reprinted from The Journal of Psychology, published by HELDREF PUBLICATIONS, 4000 Albemarle Street, N.W., Washington, D.C. 20016.

Meta-Analysis of Pragmatic and Theoretical Research: A Critique

SIU L. CHOW

Department of Psychology

University of Wollongong, Australia

ABSTRACT. Meta-analysis refers to a set of statistical procedures used to summarize and integrate many empirical studies that focus on one issue. This numerical method of integrating research findings is said to be superior to the narrative type of reviews because it is more objective, reliable, and rigorous. Moreover, the meta-analytic approach is supposedly capable of resolving research controversies, strengthening empirical hypotheses, and discovering new relationships among variables. In this study, these claims are examined and found to be wanting. Some objections to the use of meta-analysis as a means of substantiating theoretical assertions are raised with reference to the rationale of experimentation and to how knowledge evolves. It is concluded that it is inappropriate to apply meta-analysis to integrate theoretical research.

AS THE NUMBER of empirical studies bearing on a research topic increases, a need arises to integrate existing findings in the form of a review. Most literature reviews in psychology are non-numerical and narrative in nature (Glass, 1976). Some investigators concerned with "outcome evaluation" studies have suggested that a more formal or quantitative approach to literature review should be adopted (Cooper, 1979; Feldman, 1971; Glass, 1976, 1978; Glass, McGaw, & Smith, 1981; Jackson, 1980; Light & Smith, 1971). The alternative approach suggested by Glass (1976) is called meta-analysis, for which specific numerical procedures are available (Cooper, 1982; Glass et al., 1981; Rosenthal, 1978, 1983; Rosenthal & Rubin, 1979, 1982a, 1982b). It has been claimed that the meta-analytic approach is superior to the traditional, narrative approach (Cooper & Rosenthal, 1980).

Meehl (1978) distinguished between two types of research, namely, (a) pragmatic research analogous to that used in agronomy (e.g., "Would adding some potash to the soil help?") and (b) investigation of "entity-postulating" theories (e.g., "By virtue of what underlying mechanisms is a teacher's behavior affected by the teacher's expectation?"). For ease of exposition, these two kinds of research will be called pragmatic and theory-corroboration or theoretical research, respectively, in subsequent discussion.

Glass and Kliegl (1983) considered the objective of a meta-analysis of pragmatic research to be a means of influencing policy makers (see also Gallo, 1978; Mintz, 1983). The meta-analytic approach, however, has recently been adopted to integrate empirical findings with the hope of validating theoretical notions and advancing knowledge (e.g., Cooper, 1979; Harris & Rosenthal, 1985). In view of Meehl's (1978) distinction between pragmatic and theoretical research, a question may be raised about the propriety of applying meta-analysis to theoretical research. The main features of the meta-analytic approach and its difficulties are presented here with reasons why this approach should not be applied to theoretical research.

Main Features of Meta-Analysis

In meta-analysis, the significance levels (or effect sizes) of two or more empirical studies are treated as analogous to the means of two or more random samples. Because a procedure is available for comparing the means of two or more randomly chosen samples, a comparable procedure was developed for comparing the significance levels or effect sizes of two or more empirical studies (Rosenthal, 1983; Rosenthal & Rubin, 1979). Another characteristic of the meta-analytic approach is its emphasis on effect size, which is taken seriously because an effect size is used as an indication of practical importance (Rosenthal, 1983). Analogous to a planned comparison among means, a method for performing planned comparisons among effect sizes has also been developed (Rosenthal, 1983).
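To make the analogy with comparing sample means concrete, the following Python sketch shows one common way such comparisons are computed. It assumes the standard normal-deviate formulation: each one-tailed p is converted to a Z, two independent studies are compared with (Z1 - Z2)/sqrt(2), and effect sizes expressed as correlations are compared after Fisher's z transformation. The function names are hypothetical, and the sketch illustrates the general technique rather than reproducing any cited procedure in every detail.

```python
from math import atanh, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def z_from_one_tailed_p(p: float) -> float:
    """Return the standard normal deviate whose upper-tail area is p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return _STD_NORMAL.inv_cdf(1.0 - p)


def compare_significance_levels(p1: float, p2: float) -> float:
    """Z for the difference between two independent one-tailed p values."""
    return (z_from_one_tailed_p(p1) - z_from_one_tailed_p(p2)) / sqrt(2)


def compare_correlations(r1: float, n1: int, r2: float, n2: int) -> float:
    """Z for the difference between two independent correlations.

    Each r is converted with Fisher's z transformation (atanh); the
    difference is divided by its standard error sqrt(1/(n1-3) + 1/(n2-3)).
    """
    se = sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (atanh(r1) - atanh(r2)) / se


if __name__ == "__main__":
    # Hypothetical studies: A with p = .01, B with p = .50.
    print(round(compare_significance_levels(0.01, 0.50), 3))
    # Hypothetical correlations: r = .50 vs r = .00, each with n = 103.
    print(round(compare_correlations(0.50, 103, 0.00, 103), 3))
```

Note that both functions treat each study's summary statistic as a single observation drawn independently, which is exactly the assumption examined later under the nonindependence problem.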

The advocacy of meta-analysis is predicated on the assertion that literature reviews can, and should, be conducted in a standard, replicable, and rigorous manner (Cooper & Rosenthal, 1980; Glass et al., 1981). Moreover, the apparent chaos created by conflicting findings may be resolved by testing the overall significance level or effect size of all relevant studies (Harris & Rosenthal, 1985; Rosenthal & Rubin, 1978). Furthermore, it is believed that something new may emerge when the findings of many independent studies are combined (Light & Smith, 1971). For example, areas of research that have hitherto been neglected may be brought to the surface (Strube & Hartman, 1982), and theoretical notions may be sharpened as a result of providing a precise summary of existing findings (Fiske, 1983; Strube & Hartman, 1982). Moreover, the meta-analytic approach is believed to be compatible with the view that scientific knowledge accumulates (Jackson, 1980; Light & Smith, 1971).
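"Testing the overall significance level" of a set of studies is often done by adding the studies' Z values. The sketch below uses Stouffer's method (sum of Zs divided by the square root of the number of studies), which is one of several combining procedures; choosing it here is an assumption made for illustration. The input p values are hypothetical, and the method presupposes that the studies are independent.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def stouffer_z(one_tailed_ps):
    """Combine independent one-tailed p values by adding their Zs (Stouffer)."""
    zs = [_STD_NORMAL.inv_cdf(1.0 - p) for p in one_tailed_ps]
    return sum(zs) / sqrt(len(zs))


def combined_p(one_tailed_ps):
    """One-tailed p value corresponding to the combined Z."""
    return 1.0 - _STD_NORMAL.cdf(stouffer_z(one_tailed_ps))


# Ten hypothetical studies, none significant at .05 on its own (each p = .16).
studies = [0.16] * 10
print(f"combined Z = {stouffer_z(studies):.2f}, p = {combined_p(studies):.4f}")
```

In this hypothetical case, ten individually nonsignificant results yield a combined p well below .05, which is the kind of outcome advocates have in mind when they say that something new may emerge from combining studies.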

Difficulties of Meta-Analysis

Meta-analysis is not without critics (e.g., Cook & Leviton, 1980; Eysenck, 1978; Gallo, 1978; Leviton & Cook, 1981; Mintz, 1983; Rachman & Wilson, 1980; Sohn, 1980; Wilson & Rachman, 1983). Glass et al. (1981) summarized four difficulties of meta-analysis: quality, commensurability, selection-bias, and nonindependence.

The quality problem

To avoid any bias in the selection of studies to be included in a review, Glass (1976, 1978) suggested that as many studies as possible should be included in the review, regardless of their sources, commensurability, and quality. Some critics of meta-analysis have reservations about this practice because they question the propriety of including studies of poor quality (Eysenck, 1978; Gallo, 1978; Mintz, 1983; Rachman & Wilson, 1980; Strube & Hartman, 1982, 1983; Wilson & Rachman, 1983).

Glass (1978) found the inclusion of studies of questionable quality acceptable for the following reasons. First, to question the quality of a study is to confuse research findings with research design. Second, many weak studies may add up to a strong conclusion. Third, it was argued that if design flaws are important, there should be a correlation between the quality of a study and the magnitude of the effect size. Yet, there is no such correlation (see also Glass et al., 1981; Glass & Kliegl, 1983; Harris & Rosenthal, 1985; Rosenthal & Rubin, 1978). Similarly, Fiske (1983) ignored the quality problem because there was no consensus as to the overall quality of a study.
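Glass's third argument rests on an empirical check: correlate a rating of each study's quality with its effect size. A minimal sketch of that check follows, using a plain Pearson correlation; the quality ratings and effect sizes are made up, chosen so that r comes out essentially zero. The computation presupposes a numerical quality score for every study, which is the very thing on which, according to Fiske (1983), there is no consensus.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical quality ratings (1 = poor, 3 = good) and effect sizes.
quality = [1, 1, 2, 2, 3, 3]
effect_size = [0.40, 0.55, 0.35, 0.60, 0.45, 0.50]
print(round(pearson_r(quality, effect_size), 3))
```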

Some researchers sympathetic to meta-analysis recommended that studies of poor quality might be given less weight (Strube & Hartman, 1983). This is a poor solution because it introduces a source of unreliability. For example, precisely how much weight should be given to a particular study? Could all meta-analysts assign weights in a comparable manner?
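The unreliability of quality weighting can be illustrated with a small sketch. The four effect sizes, the quality ratings, and the three weighting schemes below are all hypothetical; each scheme is a defensible-looking choice, yet they produce different summary values from the same studies.

```python
def weighted_mean(effects, weights):
    """Weighted mean effect size; weights need not sum to 1."""
    if len(effects) != len(weights) or sum(weights) <= 0:
        raise ValueError("need matching lengths and a positive total weight")
    return sum(d * w for d, w in zip(effects, weights)) / sum(weights)


# Hypothetical effect sizes for four studies and a hypothetical quality rating (3 = best).
effects = [0.80, 0.60, 0.20, 0.10]
quality = [1, 2, 3, 3]

schemes = {
    "equal weights": [1, 1, 1, 1],
    "weight = rating": quality,
    "drop rating 1": [0 if q == 1 else 1 for q in quality],
}
for name, weights in schemes.items():
    print(f"{name:>15}: {weighted_mean(effects, weights):.3f}")
```

Nothing in the data dictates which of these schemes is correct; the choice is left to the meta-analyst.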

Alternatively, Smith, Glass, and Miller (1980) suggested the following strategy. The meta-analyst could first compare the results of good and bad studies. If they produced different results, the bad studies would be discarded. If, on the other hand, they produced comparable results, they should be given equal weights. The question, however, then becomes what additional information could be gained by including studies of poor quality (Mintz, 1983).
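The compare-then-pool strategy can be sketched as a simple decision rule. The strategy as summarized here does not say how large a difference counts as "different," so the `tolerance` parameter below is a hypothetical cut-off, and the effect sizes are made up.

```python
from statistics import mean


def pool_good_and_bad(good, bad, tolerance):
    """Sketch of the compare-then-pool strategy.

    good, bad: effect sizes from studies judged good or bad in quality.
    tolerance: hypothetical cut-off for deciding that results "differ".
    """
    if abs(mean(good) - mean(bad)) > tolerance:
        return mean(good)        # results differ: discard the bad studies
    return mean(good + bad)      # results comparable: weight all studies equally


good = [0.30, 0.40]  # hypothetical
bad = [0.35, 0.45]   # hypothetical
print(pool_good_and_bad(good, bad, tolerance=0.10), mean(good))
```

When the two sets agree, the pooled value differs little from the value based on the good studies alone, which is Mintz's (1983) point about what the poor studies add.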

The commensurability problem

Conclusions based on aggregating incommensurable studies may be misleading; it is not legitimate to compare apples and oranges (Cook & Leviton, 1980; Mintz, 1983; Presby, 1978). Glass et al. (1981) made several rejoinders. First, apples and oranges can properly be mixed together because they are both fruits. In other words, if some conceptual superset can be found, the apparently different studies can be treated as the same. Second, although the critics insist that only studies involving the same independent and dependent variables can be aggregated, the critics have never defined "same." Third, there are obvious individual differences among the subjects in an experiment. Yet, even the critics of meta-analysis would aggregate data from individual subjects. If data from different subjects can be aggregated, it should also be permissible to aggregate the results of diverse empirical studies.

Glass et al.'s (1981) appeal to individual differences is not a valid one. In the case of aggregating individual subjects' data, there are special procedures for minimizing the effects due to individual differences. For example, most investigators take care to assign a group of theoretically homogeneous subjects randomly to the control and experimental conditions of an experiment. Alternatively, subjects are first matched or blocked in some theoretically relevant way before they are randomly assigned to the various conditions of an experiment. Comparable procedures are not available in the case of meta-analysis.

Glass et al.'s (1981) "apples and oranges are fruits" argument is not convincing. For example, adopting such a superset may oversimplify the situation. There are real and important differences between apples and oranges, which may (but should not) be overlooked. Studies of oranges may have been specifically designed to study unique features of oranges, not because they are fruits (see Presby, 1978, for a discussion in the context of psychotherapy). For example, the acidity of oranges (or the texture of apples) is not a property common to all fruits. The outcome of Study A may be due to the acidity of oranges; the texture of apples may be the reason for the outcome of Study B. In other words, oranges and apples should not be mixed.

The selection-bias problem

Despite the claim that the meta-analytic approach is comprehensive, selective inclusion of studies in the integration or comparison exercise is inevitable (see Rachman & Wilson, 1980; Strube & Hartman, 1982; Wilson & Rachman, 1983). Hence, the meta-analytic approach is also susceptible to selection bias (Leviton & Cook, 1981; Rachman & Wilson, 1980; Wilson & Rachman, 1983). Instead of answering the criticism that the meta-analytic approach is not inherently immune from selection bias, Glass et al. (1981) treated the problem as though it were about "selection biases in reported research" (Glass et al., 1981, p. 227). This, however, is not a proper solution to the selection-bias problem.

The nonindependence problem

In a meta-analysis, summary statistics from individual studies are treated as if they were observations generated by individual subjects in an experiment. That is, underlying the meta-analytic approach is the assumption that ". . . each study can be viewed as a random sample taken from a common population" (Light & Smith, 1971, p. 445). In appealing to the theoretical notion of "random sampling," meta-analysts have to assume that the individual studies are independent of one another. Glass and his associates acknowledged this nonindependence problem (Glass, 1978; Glass et al., 1981; see also Strube & Hartman, 1983). Their solution was to assume that the studies were independent because it was the practical thing to do (Glass, 1978, p. 376).

This creates two problems. First, is Glass's (1978) practical assumption warranted if the objective of a review is to advance knowledge and not to influence a policy maker? Second, the nonindependence problem arises only when a quantitative review is deemed necessary or appropriate. Is such a review necessary or appropriate for the advancement of knowledge?

Implications of the unresolved problems

Having seen that the four problems identified by Glass et al. (1981) have not been satisfactorily resolved, it is now necessary to consider their implications for the meta-analysts' claims about (a) rigor, replicability, and objectivity, (b) advancing knowledge, (c) resolving controversies, and (d) strengthening conclusions.

The "rigor, replicability, and objectivity" claim

In order for the meta-analytic approach to be rigorous, replicable, and objective, there must be self-evident or well-defined criteria for selecting as well as coding individual studies, estimating effect size, aggregating significance levels, and combining effects. In view of the unanswered selection-bias problem mentioned previously, it is obvious that such a set of objective criteria is missing. Moreover, the coding of studies is also susceptible to subjective influences (Cook & Leviton, 1980; Mintz, 1983; Wilson & Rachman, 1983), and the choice of the technique used to estimate effect size is an arbitrary one (see Strube, 1981; Strube & Hartman, 1982).

Mintz's (1983) distinction between "public" and "objective" criteria is instructive. What the meta-analysts have done is to state publicly the set of criteria used in selecting and coding studies. The problem of bias still arises, however, because there is no self-evident way of setting up the criteria. Moreover, it may not be meaningful to set up the criteria in an algorithmic manner (Mintz, 1983). That is, the root of the problem of selection bias is not the "selection but the arbitrary sampling criteria …" (Wilson & Rachman, 1983, p. 56, italics added).

Strengthening of conclusion

To Glass (1978, see particularly p. 356), a collection of weak studies can lead to a strong conclusion in a meta-analysis. The difficulty with Glass's contention is his total disregard for validity. Glass (1978) did not consider whether a conclusion was warranted. This is antithetical to the objective of advancing knowledge (Campbell & Stanley, 1963; Chow, in press-c; Cook & Campbell, 1979). It is also important to note that one can be sympathetic with the meta-analytic approach without being as extreme as Glass (see, e.g., Shapiro & Shapiro, 1983).

Resolution of controversies

Meta-analysts envisage two ways of resolving controversies. First, there is the "voting method" (Light & Smith, 1971) in which the meta-analyst assembles a set of relevant studies. A tally is then made of the number of studies that show a significant result and the number of studies that do not. A positive conclusion is made if a majority of the studies show a significant result; a negative conclusion is made otherwise (Light & Smith, 1971).
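The voting method as described above reduces to a tally and a majority rule. In the sketch below, the .05 cut-off and the strict inequality are assumptions, ties are treated as a negative verdict following "a negative conclusion is made otherwise," and the p values are hypothetical.

```python
def vote_count(p_values, alpha=0.05):
    """Tally significant vs. nonsignificant studies and return the majority verdict.

    A study counts as significant when p < alpha (an assumed cut-off).
    Ties yield a negative verdict.
    """
    significant = sum(1 for p in p_values if p < alpha)
    nonsignificant = len(p_values) - significant
    verdict = "positive" if significant > nonsignificant else "negative"
    return significant, nonsignificant, verdict


print(vote_count([0.01, 0.03, 0.20, 0.04, 0.50]))  # hypothetical p values
```

The tally uses only whether each p falls below the cut-off. Sample sizes, effect magnitudes, and the interpretation of each study play no part in the verdict.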

A more sophisticated version of this method has recently been used by Rosenthal and Rubin (1978). They counted the number of studies that reported a significant interpersonal expectancy effect. The median proportion of studies (out of 345 studies) that reported a significant expectancy effect in the predicted direction was 0.39 (Rosenthal & Rubin, 1978, p. 379). The interpersonal expectancy effect is believed to be supported because such a proportion is higher than what would be expected by chance (see also Rosenthal, 1973). Nowhere is such an argument more explicit than in the following contention by Rosenthal:

But we must not reject the theory because "only" 84 studies support it; on the contrary. According to the rules of statistical significance, we could expect 5% of those 242 studies (about 12) to have come out as predicted just by chance. (Rosenthal, 1973, p. 59).
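The arithmetic in the quoted passage can be checked with a one-sided binomial test, assuming (as the quotation implicitly does) that each of the 242 studies independently has a 5% chance of coming out as predicted when there is no effect. The sketch confirms only that 84 far exceeds the roughly 12 studies expected by chance; it says nothing about why the remaining studies did not support the theory.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_studies, n_supporting, alpha = 242, 84, 0.05
expected_by_chance = n_studies * alpha  # the "about 12" in the quotation
tail = binomial_upper_tail(n_supporting, n_studies, alpha)
print(f"expected {expected_by_chance:.1f}, observed {n_supporting}, "
      f"P(X >= {n_supporting}) = {tail:.2e}")
```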

The question raised is whether research controversies are merely arguments about whether an effect appears in a majority of studies. Disputes among researchers are more often about data interpretation. The more serious implication of Rosenthal's (1973; Rosenthal & Rubin, 1978) reasoning is that, to substantiate a theoretical notion, it is not necessary for the theory to be correct every time. All that is required is that the theoretical notion be supported more than 5% of the time (Mixon, personal communication). Such a position has been characterized as "preposterous" (Meehl, 1978).

Meta-theoretical issues implicated by meta-analysis

The lesson to be learned from Meehl's (1978) distinction between pragmatic and theoretical research is that they implicate different meta-theoretical questions. For example, the success of pragmatic research is judged by its utility, whereas the success of theoretical research has to be judged by its explanatory power as well as by its ability to define what should happen under some well-defined conditions. Although questions about "warranted assertability" are important to theoretical research (Chow, in press-a; Manicas & Secord, 1983), they are ignored in pragmatic research (Glass et al., 1981). Moreover, although in pragmatic research it is sufficient to determine whether there is an effect without asking why, the "why" question is important in theoretical research.

The primary concern of the theoreticians is how to choose among many theoretical contenders. Experimentation is ultimately the only way to choose among alternative theories (Berkowitz & Donnerstein, 1982; Campbell & Stanley, 1963; Chow, in press-a, in press-c; Cook & Campbell, 1979; Manicas & Secord, 1983; Meehl, 1978; Mook, 1983). Literature reviews of theoretical research should be concerned with how well a certain theoretical position is supported (as well as in what way) by experimental studies (see, for example, Coltheart's, 1975, 1980, and Haber's, 1983, discussions of the interesting fate of iconic memory).

Consequently, to assess the propriety of applying meta-analysis to literature reviews of theoretical research, it is essential that the objectives, procedures, and functions of meta-analysis be considered with reference to the rationale of experimentation. For example, were Glass et al. (1981) and Fiske (1983) warranted in dismissing the commensurability and the quality problems on the grounds that there was not any clear criterion of sameness and quality? Is questioning the quality of some experimental data equivalent to a confusion between research findings and research design, as asserted by Glass (1978)? Ultimately, it is necessary to question the suggestion that literature review should be algorithmic in nature, and that knowledge accumulates in a quantitative way.

Problems of meta-analysis of theoretical research

Often, the diverse experiments bearing on a theory are "converging operations" (Garner, Hake, & Eriksen, 1956), "conceptual replications" (Cozby, 1981), or "constructual replications" (Cook & Leviton, 1980) devised to test various aspects of the theory. The important points about these converging operations are that (a) they are not literal replications of the same experiment, (b) the experimental conditions and procedures used in the converging operations (i.e., the individual experiments) are often very different, (c) the experimental task may be very different from the original phenomenon for which the theory is proposed, and (d) it is inevitable that some auxiliary assumptions have to be made implicitly (see Chow, in press-c, for a detailed discussion).

In sum, there are theoretical, methodological, and empirical reasons why the various experiments within a series of converging operations are different. They have to be different because they are concerned with different aspects of the theory. More importantly, these differences cannot be ignored by appealing to a super-category. If data from these experiments are aggregated in the way suggested by the meta-analysts (e.g., Glass et al., 1981; Harris & Rosenthal, 1985), apples are being mixed with oranges with no defensible justification.

The quality problem in meta-analysis of theoretical research

The quality problem is a validity problem. An experiment that is deficient in statistical conclusion validity, internal validity, or construct validity is meaningless and, therefore, worthless. Consequently, it should not be used (Cook & Campbell, 1979; Meehl, 1978).

Knowledge evolves rather than accumulates

Formal logic dictates that no unambiguous conclusion about the truth of a theory is possible, no matter how long the series of converging operations is extended. Every time we fail to reject a component of a theory, however, our understanding of the theory is advanced. Moreover, included in the auxiliary assumptions are meta-theoretical and methodological assumptions. In order to retain a theory in the face of negative experimental outcomes, an investigator has to demonstrate that one (or more) of the auxiliary assumptions is not warranted. That is, the investigator has to critically evaluate the measurement used, the experimental design adopted, or the statistical procedures chosen. It is true that the number of probable alternative explanations is potentially large. It does not follow, however, that this is a source of embarrassment, as suggested by Glass (1978). In fact, this is as it should be, because there are logically numerous ways in which an investigator can be wrong. Moreover, one way for knowledge to grow is to refine previously held assumptions.

Glass's (1978) appeal to parsimony is questionable for two reasons. First, critical examinations of the auxiliary assumptions do not necessarily increase the complexity of the theory under investigation. Second, the principle of parsimony refers to a situation in which there are two or more contending theories that can account for the same phenomenon equally well. The simplest one among the contenders is the preferred one. This is very different from Glass's (1978) representation of the issue. Our knowledge changes as a series of converging operations progresses. As has been shown, this accumulation of changes is not a numerical exercise; it is a conceptual one. It is more appropriate and less misleading to say "knowledge evolves," rather than "knowledge accumulates."

The Meta-Analytic Objectives Revisited

Two questions should be raised in regard to the meta-analyst's call for a formal, standardized, replicable, and rigorous literature-review process. First, should the review be formal, standardized, and replicable? Second, should the rigor be a quantitative one?

Research in psychology is very often a theory-corroboration process. That is, an observable behavioral phenomenon is to be explained by an unobservable hypothetical mechanism that is assumed to be real and efficacious (Chow, in press-c; Harré & Secord, 1972; Manicas & Secord, 1983; Popper, 1968/1962). The theoretical notions involved are the products of speculation, however educated (Popper, 1968/1962).

In other words, theoretical disputes are often about interpretations, not data. The manner in which these speculative accounts are exemplified is heavily colored by the theorist's meta-theoretical beliefs and methodological assumptions. Consequently, to properly review a set of studies requires a critical examination of these assumptions. This is a conceptual, not a statistical, exercise. Hence, a literature review should not be a formalized or standardized one. Furthermore, a second review is carried out only when the second reviewer does not agree with the interpretation expressed in the first review. By its very nature, the second review is often different from the first one. In other words, the requirement of replicability is contrary to one of the main functions of literature review, namely, to exchange theoretical understanding.

It has also been claimed that the meta-analytic approach may be used to test a theory (Harris & Rosenthal, 1985) and to suggest new relationships (Light & Smith, 1971; Strube & Hartman, 1982). Harris and Rosenthal's (1985) exercise was an attempt to confirm a theoretical notion in a post hoc manner. Yet, one of the important features of theory-corroboration experimentation is that the theoretical expectation of an experiment should be stated before data collection. The post hoc nature of meta-analysis is antithetical to the rationale of experimentation (see also Chow, in press-b).

The traditional, narrative kind of literature review is characterized as being "casual" and without any evidential support by Glass (1978). Meta-analysis owes its apparent attractiveness to its being described as the opposite of the narrative approach. Even if Glass's (1978) characterization of narrative reviews were true of pragmatic research, it is definitely not true of narrative reviews of theoretical research (see, e.g., Kahneman's, 1968, review of masking, and Coltheart's, 1980, review of iconic memory). It is not the case that narrative reviews lack rigor. On the contrary, rigor is maintained by reviewers of the traditional approach when they evaluate the validity of individual studies. They judge the validity of a study with reference to well-defined statistical procedures and experimental designs. Moreover, study-selection is guided by theoretical relevance. Yet, these features of rigor are derogated as an effort ". . . to carp on the design or analysis deficiencies . . ." by Glass (1976, p. 4). A reviewer's judicious and theoretically guided selection and evaluation of individual studies is ironically considered as an exemplification of selection bias.

Summary

Theoretical understanding of a phenomenon is extended by converging operations, not literal replications. Yet, meta-analysts have to assume that all of the studies to be integrated are literal replications of the same experiment. By their very nature, properly designed experiments forming a series of converging operations for corroborating a theory are different experiments. Consequently, they are not commensurable in the way required by meta-analysis.

A case is made that knowledge evolves as a result of trial and error at the conceptual level. Such a process cannot (and should not) be represented in a numerical or algorithmic manner. This is the case because theoretical disputes have to be settled by evaluating contending interpretations of data with respect to theoretical and methodological assumptions, as well as to the requirements of different kinds of validity. Narrative reviews can be rigorous because they are guided by principles of logic, experimental design, and statistical procedures. In sum, the method of literature review aimed at influencing policy makers cannot be validly applied to theoretical reviews and should not be used.

REFERENCES

Berkowitz, L., & Donnerstein, E. (1982). External validity is more than skin deep: Some answers to criticisms of laboratory experiments. American Psychologist, 37, 245-257.

Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research. Chicago: Rand McNally.

Chow, S. L. (in press-a). Science, ecological validity, and experimentation. Journal for the Theory of Social Behaviour.

Chow, S. L. (in press-b). Some reflections on Harris and Rosenthal's thirty-one meta-analyses. Journal of Psychology.

Chow, S. L. (in press-c). Experimental psychology: Rationale, procedures, and issues. Detselig Enterprises Ltd.

Coltheart, M. (1975). Iconic memory: A reply to Professor Holding. Memory & Cognition, 3, 42-48.

Coltheart, M. (1980). Iconic memory and visible persistence. Perception & Psychophysics, 27, 183-228.

Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Chicago: Rand McNally.

Cook, T. D., & Leviton, L. C. (1980). Reviewing the literature: A comparison of traditional methods with meta-analysis. Journal of Personality, 48, 449-472.

Cooper, H. M. (1979). Statistically combining independent studies: A meta-analysis of sex differences in conformity research. Journal of Personality and Social Psychology, 37, 131-146.

Cooper, H. M. (1982). Scientific guidelines for conducting integrative research reviews. Review of Educational Research, 52, 291-302.

Cooper, H. M., & Rosenthal, R. (1980). Statistical versus traditional procedures for summarizing research findings. Psychological Bulletin, 87, 442-449.

Cozby, P. C. (1981). Methods in behavioral research (2nd ed.). Palo Alto, CA: Mayfield.

Eysenck, H. J. (1978). An exercise in mega-silliness. American Psychologist, 33, 517.

Feldman, K. A. (1971). Using the work of others: Some observations on reviewing and integrating. Sociology of Education, 44, 86-102.

Fiske, D. W. (1983). The meta-analysis revolution in outcome research. Journal of Consulting and Clinical Psychology, 51, 65-70.

Gallo, P. S., Jr. (1978). Meta-analysis: A mixed metaphor? American Psychologist, 33, 515-517.

Garner, W. R., Hake, H. W., & Eriksen, C. W. (1956). Operationalism and the concept of perception. Psychological Review, 63, 149-159.

Glass, G. V. (1976). Primary, secondary and meta-analysis of research. Educational Researcher, 5, 3-8.

Glass, G. V., & Kliegl, R. M. (1983). An apology for research integration in the study of psychotherapy. Journal of Consulting and Clinical Psychology, 51, 28-41.

Glass, G. V., McGaw, B., & Smith, M. L. (1981). Meta-analysis in social research. Beverly Hills, CA: Sage.

Haber, R. N. (1983). The impending demise of the icon: A critique of the concept of iconic storage in visual information processing. Behavioral and Brain Sciences, 6, 1-11.

Harré, R., & Secord, P. F. (1972). The explanation of social behavior. Oxford: Blackwell.

Harris, M. J., & Rosenthal, R. (1985). Mediation of interpersonal expectancy effects: 31 meta-analyses. Psychological Bulletin, 97, 363-386.

Jackson, G. B. (1980). Methods for integrative reviews. Review of Educational Research, 50, 438-460.

Kahneman, D. (1968). Methods, findings, and theory in studies of visual masking. Psychological Bulletin, 70, 404-425.

Leviton, L. C., & Cook, T. D. (1981). What differentiates meta-analysis from other forms of review. Journal of Personality, 49, 231-236.

Light, R. J., & Smith, P. V. (1971). Accumulating evidence: Procedures for resolving contradictions among different research studies. Harvard Educational Review, 41, 429-471.

Manicas, P. T., & Secord, P. F. (1983). Implications for psychology of the new philosophy of science. American Psychologist, 38, 399-413.

Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806-834.

Mintz, J. (1983). Integrating research evidence: A commentary on meta-analysis. Journal of Consulting and Clinical Psychology, 51, 71-75.

Mook, D. G. (1983). In defense of external invalidity. American Psychologist, 38, 379-387.

Popper, K. R. (1968). Conjectures and refutations (originally published in 1962). New York: Harper & Row.

Presby, S. (1978). Overly broad categories obscure important differences between therapies. American Psychologist, 33, 514-515.

Rachman, S., & Wilson, G. T. (1980). The effects of psychological therapy. Oxford: Pergamon.

Rosenthal, R. (1973, September). The Pygmalion effect lives. Psychology Today, 56-64.

Rosenthal, R. (1978). Combining results of independent studies. Psychological Bulletin, 85, 185-193.

Rosenthal, R. (1983). Assessing the statistical and social importance of the effects of psychotherapy. Journal of Consulting and Clinical Psychology, 51, 4-13.

Rosenthal, R., & Rubin, D. B. (1978). Interpersonal expectancy effects: The first 345 studies. Behavioral and Brain Sciences, 1, 377-386.

Rosenthal, R., & Rubin, D. B. (1979). Comparing significance levels of independent studies. Psychological Bulletin, 86, 1165-1168.

Rosenthal, R., & Rubin, D. B. (1982a). Comparing effect sizes of independent studies. Psychological Bulletin, 92, 500-504.

Rosenthal, R., & Rubin, D. B. (1982b). A simple, general purpose display of magnitude of experimental effect. Journal of Educational Psychology, 74, 166-169.

Shapiro, D. A., & Shapiro, D. (1983). Comparative therapy outcome research: Methodological implications of meta-analysis. Journal of Consulting and Clinical Psychology, 51, 42-53.

Smith, M. L., & Glass, G. V. (1977). Meta-analysis of psychotherapy outcome studies. American Psychologist, 32, 752-760.

Smith, M. L., Glass, G. V., & Miller, T. I. (1980). Benefits of psychotherapy. Baltimore, MD: Johns Hopkins University Press.

Sohn, D. (1980). Critique of Cooper's meta-analytic assessment of the findings of sex differences in conformity behavior. Journal of Personality and Social Psychology, 39, 1215-1221.

Strube, M. J. (1981). Meta-analysis and cross-cultural comparison: Sex differences in child competitiveness. Journal of Cross-Cultural Psychology, 12, 3-20.

Strube, M. J., & Hartman, D. P. (1982). A critical appraisal of meta-analysis. British Journal of Clinical Psychology, 21, 129-139.

Strube, M. J., & Hartman, D. P. (1983). Meta-analysis: Techniques, applications, and functions. Journal of Consulting and Clinical Psychology, 51, 14-27.

Wilson, G. T., & Rachman, S. J. (1983). Meta-analysis and the evaluation of psychotherapy outcome: Limitations and liabilities. Journal of Consulting and Clinical Psychology, 51, 54-64.

*This research was supported by a Category A grant from the University of Wollongong. I would like to thank Philip de Lacey, Dennis Hunt, Don Mixon, and Nicola Ronan for their helpful comments.

Requests for reprints should be sent to Siu L. Chow, Department of Psychology, University of Regina, Regina, Saskatchewan, Canada, S4S 0A2.