Decision Noise and Response Time

Using response time to qualify the classification response, Balakrishnan and MacDonald analyzed those conditions in our experiment (Mueller & Weidemann, 2008) that required only a classification response (but no confidence ratings). Their results replicated those from our analyses that used confidence ratings to qualify the classification response. Balakrishnan and MacDonald argued that, unlike the confidence rating results, the RT results could not be explained by differential levels of decision noise, because only one type of response was required.

Crucial to our account of confidence rating data (Mueller & Weidemann, 2008) was the notion of different amounts of decision noise for classification responses and for dependent measures indexing response confidence. Confidence ratings are not the only conceivable measure of response confidence, and RTs in particular could reasonably be used for this purpose (with faster RTs often indexing more confident responses). Confidence noise, therefore, is not specific to confidence ratings but naturally extends to other dependent variables.

The notion of response confidence, however, need not be invoked. Many dependent measures (e.g., RTs, or electrophysiological or brain imaging data) could qualify a classification response and could reasonably be used to generate ROC or likelihood ratio functions. To the extent that these measures are noisy indices of internal states, they could produce patterns of results similar to those observed for confidence ratings. Depending on the nature of these measures, "confidence noise" or even "decision noise" may not be the most appropriate term to describe the noise associated with them, but their effects could nevertheless mirror those of decision noise.

Indeed, even when RTs are generated randomly, the main RT results (i.e., crossing RT-ROC functions and likelihood ratio functions that pass through 1.0 at the middle RT bin regardless of base rate) can be captured simply by setting the hit and false alarm rates as well as the stimulus base rates to the actual values. These assumptions are consistent with a shift in a decision criterion (although they do not necessarily imply it), and they illustrate that the main aspects of the data can be captured even with very basic assumptions. As we show below, with the additional constraint (confirmed in our data) that RTs for correct responses are slightly less variable than those for incorrect responses, one can capture the more subtle features of the RT likelihood ratio function: namely, the smooth transition between values below and above 1.0 and the small tendency to approach 1.0 for extreme (fast or slow) RTs.

To illustrate how decision noise can produce RT-ROC functions that are similar to the confidence ROC (C-ROC) functions that we observed (Mueller & Weidemann, 2008, Figure 9) and to RT likelihood ratio functions like those reported by Balakrishnan and MacDonald (Figure 2), we simulated data from a classification experiment with hit and false alarm rates equivalent to those in our experiment. We randomly generated RTs for all conditions by sampling from normal distributions with means of 1,000 msec and standard deviations of 100 msec for correct trials and 110 msec for incorrect trials. These assumptions lead to relatively more incorrect responses in the tails of the distributions, which is necessary to capture the smooth transition between values below and above 1.0 and the small tendency for the likelihood ratios to approach 1.0 for extreme (fast or slow) RTs. To validate the assumptions for this simulation, we confirmed that our data indeed showed higher variability for incorrect trials and that the distribution of RTs was approximately log normal. For each condition, we simulated 200,000 trials with "yes" and "no" response proportions set equal to the empirical hit and false alarm rates, and with stimulus base rates in both 2:1 and 1:2 proportions. We then recoded and binned the simulated RTs as Balakrishnan and MacDonald did, in order to obtain the RT-ROC and RT likelihood ratio functions shown in Figures 1 and 2 (we used a bin size of 5,000 samples when the entire simulated data set was sorted by RT, which resulted in 2.5% of the data falling into any RT bin).
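To make the simulation procedure concrete, the following is a minimal sketch of its logic in Python. The base rate and the hit and false alarm rates (P_B, HIT, FA) are illustrative placeholders rather than the empirical values, and the recoding of responses and RTs onto a single evidence axis is one plausible reading of Balakrishnan and MacDonald's binning procedure, not their exact method.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Illustrative placeholder parameters; the empirical hit and false
# alarm rates and base rates would be substituted here.
N = 200_000   # simulated trials for one condition
P_B = 2 / 3   # P(stimulus B); 1/3 would give the 1:2 base rate condition
HIT = 0.80    # assumed P("B" response | B stimulus)
FA = 0.30     # assumed P("B" response | A stimulus)

stimulus = rng.random(N) < P_B                           # True = B stimulus
response = rng.random(N) < np.where(stimulus, HIT, FA)   # True = "B" response
correct = stimulus == response

# RTs are pure noise: normally distributed, slightly more variable on
# incorrect trials.
rt = rng.normal(1000.0, np.where(correct, 100.0, 110.0))

# Recode response and RT onto one axis: fast "A" responses at one extreme,
# fast "B" responses at the other, slow responses of both kinds in the middle.
evidence = np.where(response, 1.0, -1.0) / rt

order = np.argsort(evidence)
bins = np.array_split(order, N // 5_000)   # bins of 5,000 trials (2.5% each)

# RT likelihood ratio per bin: P(bin | B stimulus) / P(bin | A stimulus).
n_b, n_a = stimulus.sum(), (~stimulus).sum()
lr = [(stimulus[b].sum() / n_b) / ((~stimulus[b]).sum() / n_a) for b in bins]

# RT-ROC: proportions of B (signal) and A (noise) trials beyond each point
# on the recoded axis; plotting one against the other traces the ROC.
sorted_b = stimulus[order]
p_signal_beyond = 1.0 - np.cumsum(sorted_b) / n_b
p_noise_beyond = 1.0 - np.cumsum(~sorted_b) / n_a
```

Because incorrect trials are slightly more variable, the extreme bins contain relatively more errors, which is what pulls the likelihood ratios back toward 1.0 at the extremes of the recoded axis.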
[Figure 1. Simulated response time (RT) receiver operating characteristic (ROC) functions, plotting the proportion of signal trials with RT bin > k against the proportion of noise trials with RT bin > k for base rate ratios (A to B) of 1:2 and 2:1. These simulated functions show the same qualitative pattern as those for the ROC functions based on confidence ratings (Mueller & Weidemann, 2008).]

We note that the qualitative pattern of the simulated RT-ROC functions in Figure 1 is very similar to that seen in the confidence data (Mueller & Weidemann, 2008, Figure 9). Likewise, the simulated RT likelihood ratio functions (Figure 2) are consistently below 1.0 for "A" responses and consistently above 1.0 for "B" responses irrespective of stimulus base rates, much like the actual data (Balakrishnan & MacDonald, Figure 2). We made the assumptions described above primarily for simplicity and convenience. The fact that these patterns can be observed for randomly generated RTs with a slight difference in variance for correct and incorrect trials shows that they do not depend on sophisticated assumptions about the mapping between internal states and RTs.

It is also obvious that the simple model described above cannot account for every detail of the empirical likelihood ratio functions shown by Balakrishnan and MacDonald (Figure 2). In particular, the empirical functions seem somewhat noisier and also seem to approach 1.0 equally from both sides at the transition between classification responses, whereas the tendency to approach 1.0 at the transition is more pronounced for the more frequent classification responses in our simulations.

[Figure 2. Simulated response time (RT) likelihood ratio functions across RT bins, for base rate ratios (A to B) of 2:1 and 1:2. As in the RT data (Balakrishnan & MacDonald, 2008), the likelihood ratio is consistently below 1 for "A" responses and consistently above 1 for "B" responses, irrespective of stimulus base rate.]

These simulations suggest that more realistic models of RTs in signal detection tasks do not necessarily have to shun the notion of flexible decision criteria to account for RT likelihood ratio functions. The assumption that RTs for correct responses are less variable than those for incorrect responses is admittedly ad hoc (although confirmed by our data); it serves only to smooth the transition in the likelihood ratios and could be implemented in process models with a number of reasonable mechanisms.

Constraints in the Likelihood Ratio Function

Classical SDT predicts that the likelihood ratio should equal 1.0 at the point between the two classification responses for unbiased responses and that it should deviate from 1.0 if the response is biased. The likelihood ratio at any point is equivalent to the slope of the ROC function at that point (see Zhang & Mueller, 2005), and therefore any peak in the ROC function where the slope changes from below 1.0 to above 1.0 (see Mueller & Weidemann, 2008, Figures 9 and 12) corresponds to the point where the likelihood ratio crosses 1.0.
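In standard SDT notation, this relation can be stated compactly. Let $f_S$ and $f_N$ be the densities of the decision variable on signal and noise trials, and let $H(c)$ and $F(c)$ be the hit and false alarm rates obtained by integrating those densities beyond a criterion $c$. Then

$$\mathrm{LR}(c) \;=\; \frac{f_S(c)}{f_N(c)} \;=\; \frac{-\,dH/dc}{-\,dF/dc} \;=\; \frac{dH}{dF},$$

so the likelihood ratio at a given criterion equals the slope of the ROC function at the corresponding $(F, H)$ point, and the likelihood ratio crosses 1.0 exactly where the ROC slope does (see Zhang & Mueller, 2005).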
Balakrishnan and MacDonald (2008) argue that this point is rarely, if ever, found to deviate from the point between the classification responses (such a deviation would indicate a biased decision rule, according to classical SDT) and furthermore argue that the likelihood ratios associated with low confidence responses tend to be very close to 1.0. As Balakrishnan and MacDonald point out, the DNM can produce these results, but it is not constrained to do so, a point that they view as a disadvantage of the DNM.

Whereas the results described above (likelihood ratios transitioning from below to above 1.0 between the response categories and approaching 1.0 for low confidence ratings) have been found repeatedly, they are by no means universal. Figure 3 shows the ratios of the smallest confidence likelihood ratio above 1.0 and the largest below 1.0 in the data presented by Van Zandt (2000) and modeled with the DNM by us (Mueller & Weidemann, 2008). For example, a ratio of 4.0 indicates that the likelihood ratio changed by a factor of 4.0 (e.g., from 0.5 to 2.0) between the confidence responses flanking the neutral likelihood ratio (usually the two lowest confidence ratings for either classification response). Most of these ratios are close to 1.0, but several ratios on the upper end of the scale shown in Figure 3 indicate that the neutral likelihood ratio is not always smoothly approached. In light of these data, a model constrained in the ways suggested by Balakrishnan and MacDonald does not seem warranted. Indeed, our model predicts that violations of these constraints should be observable to the extent that the response policy can be shifted substantially, especially if the difference between classification and confidence noise can be reduced.

[Figure 3. Frequency histogram of the ratios of the smallest confidence likelihood ratio greater than 1 and the largest confidence likelihood ratio less than 1 for the data reported in Van Zandt (2000) and modeled with the decision noise model in Mueller and Weidemann (2008). The transition through 1.0 usually occurred (with some exceptions) between the two response classes. The ratio is between adjacent likelihood ratios, except for rare cases in which a likelihood ratio was exactly 1.0 (in such cases, the ratio was taken between the two values flanking the 1.0 likelihood ratio).]
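The quantity plotted in Figure 3 can be computed as in the following sketch; the likelihood ratio values here are hypothetical and are not data from Van Zandt (2000).

```python
def transition_ratio(lrs):
    """Ratio of the smallest likelihood ratio above 1.0 to the largest
    below 1.0. With likelihood ratios ordered along the confidence scale
    (high-confidence "A" to high-confidence "B"), these are normally the
    two adjacent values flanking the neutral point."""
    smallest_above = min(x for x in lrs if x > 1.0)
    largest_below = max(x for x in lrs if x < 1.0)
    return smallest_above / largest_below

# Hypothetical likelihood ratios for a 10-point confidence scale:
lrs = [0.20, 0.30, 0.45, 0.60, 0.80, 1.70, 2.10, 3.00, 4.50, 6.00]
print(transition_ratio(lrs))  # 1.70 / 0.80 = 2.125
```

A ratio near 1.0 indicates a smooth passage through the neutral likelihood ratio; the large ratios at the upper end of Figure 3 indicate abrupt jumps.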
Sequential Dependencies

We identified substantial sequential dependencies in our data, which may represent one of presumably many sources of decision noise (Mueller & Weidemann, 2008). More specifically, we showed that participants in our experiment were likely to repeat the previous confidence rating even when the classification response changed (e.g., a high-confidence "A" response was more likely to be followed by a high-confidence "B" response than by a low-confidence "B" response). Because the source of the decision noise was not crucial for our argument, the DNM does not incorporate any sequential dependencies, and we specifically did not attempt to account for any data by assuming that a classification criterion shifts back and forth on each trial depending on the previous response, as suggested by Balakrishnan and MacDonald.

Balakrishnan and MacDonald calculated d′ for trials conditioned on the prior classification response and found small (<.06) differences within each base rate and response (confidence rating vs. forced choice) condition. In particular, the overall d′ values fell between the d′ values obtained for "A" responses and for "B" responses, although the relative order of the d′ values was not consistent across base rate or response conditions. Balakrishnan and MacDonald argue that, contrary to the data, our model should predict consistently higher d′ values when conditioning on the previous response, because this should reduce decision noise.

To the extent that these small differences in d′ values are reliable, their relative order would indeed be difficult to model, because it appears to interact with the base rate and response conditions. In particular, as Balakrishnan and MacDonald (2008) pointed out, these data seem at odds with a simple model that adjusts a decision criterion up or down on every trial depending on the previous classification response. We do not have an account for these results, but we note that more complex effects of the previous response on a decision criterion (possibly contingent on whether or not the previous response was correct) may be able to explain the small fluctuations in hit and false alarm rates that give rise to these results.
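As an illustration of this conditioning analysis (not Balakrishnan and MacDonald's actual procedure), the following sketch computes d′ conditioned on the previous classification response, assuming Boolean trial arrays stimulus and response like those in the simulation sketch above:

```python
from scipy.stats import norm  # Gaussian quantile function for the z-transform

def dprime(stim, resp):
    """d' = z(hit rate) - z(false alarm rate); True codes B."""
    return norm.ppf(resp[stim].mean()) - norm.ppf(resp[~stim].mean())

prev_b = response[:-1]   # was the previous classification response "B"?
d_after_b = dprime(stimulus[1:][prev_b], response[1:][prev_b])
d_after_a = dprime(stimulus[1:][~prev_b], response[1:][~prev_b])
d_overall = dprime(stimulus, response)

# With randomly generated trials there are no sequential dependencies,
# so the conditional d' values differ from d_overall only by sampling error.
```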
Discussion

As we have shown above and previously (Mueller & Weidemann, 2008), the indices of criterion shifts proposed by Balakrishnan (1998a, 1998b, 1999) are not always able to detect such shifts in the presence of decision noise. We have shown that decision noise does indeed seem to have a large influence on confidence rating and forced choice responses (Mueller & Weidemann, 2008) and have argued that the RT results presented by Balakrishnan and MacDonald (2008) are consistent with a high degree of noise in the mapping between internal states and RT for a two-alternative forced choice response.

As we stated previously (Mueller & Weidemann, 2008), we do not mean to argue in favor of wholesale acceptance of SDT or against sequential sampling models. With the DNM, we simply showed that the data are compatible with flexible decision criteria that adapt to task contingencies (Mueller & Weidemann, 2008). To understand the processes involved in signal detection better, it is crucial to carefully analyze the extent and the limits of the implications of theory violations. The outcomes of such analyses provide important guidance and constraints for more adequate models of choice under uncertainty.

AUTHOR NOTE

This research was supported in part by a postdoctoral fellowship to C.T.W. from the German Academic Exchange Service (DAAD). Correspondence may be sent via e-mail to C. T. Weidemann, Department of Psychology, University of Pennsylvania (ctw@cogsci.info).

REFERENCES

Balakrishnan, J. D. (1998a). Measures and interpretations of vigilance performance: Evidence against the detection criterion. Human Factors, 40, 601-623.
Balakrishnan, J. D. (1998b). Some more sensitive measures of sensitivity and response bias. Psychological Methods, 3, 68-90.
Balakrishnan, J. D. (1999). Decision processes in discrimination: Fundamental misrepresentations of signal detection theory. Journal of Experimental Psychology: Human Perception & Performance, 25, 1189-1206.
Balakrishnan, J. D., & MacDonald, J. A. (2008). Decision criteria do not shift: Commentary on Mueller and Weidemann (2008a). Psychonomic Bulletin & Review, 15, 1022-1030.
Mueller, S. T., & Weidemann, C. T. (2008). Decision noise: An explanation for observed violations of signal detection theory. Psychonomic Bulletin & Review, 15, 465-494.
Van Zandt, T. (2000). ROC curves and confidence judgments in recognition memory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 26, 582-600.
Zhang, J., & Mueller, S. T. (2005). A note on ROC analysis and non-parametric estimate of sensitivity. Psychometrika, 70, 203-212.

NOTE

1. We use this term in the loose sense defined earlier (Mueller & Weidemann, 2008, note 1) to refer to the basic paradigm in which two stimulus classes are discriminated and categorized into two classes (old or new, signal or noise, yes or no, A or B, etc.).

(Manuscript received May 5, 2008; revision accepted for publication July 19, 2008.)