Review
Nat Neurosci. 2009 May;12(5):535-40.
doi: 10.1038/nn.2303.

Circular analysis in systems neuroscience: the dangers of double dipping

Nikolaus Kriegeskorte et al. Nat Neurosci. 2009 May.

Abstract

A neuroscientific experiment typically generates a large amount of data, of which only a small fraction is analyzed in detail and presented in a publication. However, selection among noisy measurements can render circular an otherwise appropriate analysis and invalidate results. Here we argue that systems neuroscience needs to adjust some widespread practices to avoid the circularity that can arise from selection. In particular, 'double dipping', the use of the same dataset for selection and selective analysis, will give distorted descriptive statistics and invalid statistical inference whenever the results statistics are not inherently independent of the selection criteria under the null hypothesis. To demonstrate the problem, we apply widely used analyses to noise data known to not contain the experimental effects in question. Spurious effects can appear in the context of both univariate activation analysis and multivariate pattern-information analysis. We suggest a policy for avoiding circularity.

Figures

Fig. 1. Intuitive diagrams for understanding circular analysis
(a) The top row serves to remind us that our results reflect our data indirectly: through the lens of an often complicated analysis, whose assumptions are not always fully explicit. The bottom row illustrates how the assumptions (and hypotheses) can interact with the data to shape the results. Ideally (bottom left), the results reflect some aspect of the data (blue) without distortion (although the assumptions will determine what aspect of the data is reflected in the results). But sometimes (bottom center) a close inspection of the analysis reveals that the data get lost in the process and the assumptions (red) predetermine the results. In that case the analysis is completely circular (red dotted line). More frequently in practice (bottom right), the assumptions tinge the results (magenta). The results are then distorted by circularity, but still reflect the data to some degree (magenta dotted lines). (b) Three diagrams illustrate the three most common causes of circularity: selection (left), weighting (center), and sorting (right). Selection, weighting, and sorting criteria reflect assumptions and hypotheses (red). Each of the three can tinge the results, distorting the estimates presented and invalidating statistical tests, if the results statistics are not independent of the criteria for selection, weighting, or sorting.
Fig. 2. Example 1: Data selection can bias pattern-information analysis
(a) In order to assess to what extent human inferior-temporal activity patterns reflect bottom-up sensory signals and top-down task constraints, we measured activity patterns with fMRI while subjects viewed object images of different categories and judged either whether the object shown was “animate” (task 1) or whether it was “pleasant” (task 2). (b) We selected all inferior-temporal voxels for which any two-sided t test contrasting two conditions was significant at p<0.001 (uncorrected for multiple tests). We then cleanly divided the data by using odd runs for training and even runs for testing. We used a linear classifier to determine whether the activity pattern would allow us to decode the stimulus category (light gray bars) and the judgment task (dark gray bars). Results (top left) suggested that both stimulus and task can be decoded with high accuracy, significantly above chance. However, application of the same analysis to Gaussian random data (top right), also suggested high decoding accuracies significantly above chance. This shows that spurious effects can appear when data from the test set is used in the initial data-selection process. Such spurious effects can be avoided by performing selection using data independent of the test data (bottom row). Error bars indicate +/−1 across-subject standard error of the mean. For details on experiment and analysis, see Example 1: Pattern-information analysis.
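
The effect described in this caption can be reproduced in a few lines. The following is a minimal sketch, not the authors' analysis: it assumes pure Gaussian noise, two conditions labelled A and B, a nearest-centroid classifier standing in for the linear classifier, and selection of the top-ranked voxels by |t| instead of a fixed p<0.001 threshold. All function names and parameter values are hypothetical.

```python
import random


def t_stat(a, b):
    """Two-sample t statistic for two equally sized groups."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / (n - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (n - 1)
    return (mean_a - mean_b) / ((var_a + var_b) / n) ** 0.5


def noise_dataset(rng, n_trials, n_vox):
    """Pure Gaussian noise: n_trials patterns for each of two conditions."""
    return {c: [[rng.gauss(0.0, 1.0) for _ in range(n_vox)]
                for _ in range(n_trials)] for c in "AB"}


def select_voxels(data, n_keep):
    """Indices of the n_keep voxels with the largest |t| for A versus B."""
    n_vox = len(data["A"][0])
    score = [abs(t_stat([t[v] for t in data["A"]], [t[v] for t in data["B"]]))
             for v in range(n_vox)]
    return sorted(range(n_vox), key=score.__getitem__, reverse=True)[:n_keep]


def decode_accuracy(train, test, voxels):
    """Nearest-centroid decoding restricted to the selected voxels."""
    n = len(train["A"])  # both conditions have the same number of trials
    cen_a = [sum(t[v] for t in train["A"]) / n for v in voxels]
    cen_b = [sum(t[v] for t in train["B"]) / n for v in voxels]
    correct = 0
    for label, sign in (("A", 1), ("B", -1)):
        for trial in test[label]:
            # Project onto the centroid difference, relative to the midpoint.
            proj = sum((a - b) * (trial[v] - (a + b) / 2)
                       for a, b, v in zip(cen_a, cen_b, voxels))
            correct += int(sign * proj > 0)
    return correct / (len(test["A"]) + len(test["B"]))


def simulate(n_subjects=20, n_vox=1000, n_trials=10, n_keep=20, seed=0):
    """Mean decoding accuracy with circular and with independent selection."""
    rng = random.Random(seed)
    circular, independent = [], []
    for _ in range(n_subjects):
        train = noise_dataset(rng, n_trials, n_vox)  # stands in for odd runs
        test = noise_dataset(rng, n_trials, n_vox)   # stands in for even runs
        both = {c: train[c] + test[c] for c in "AB"}
        # Circular: voxels are selected using all data, including the test set.
        circular.append(decode_accuracy(train, test, select_voxels(both, n_keep)))
        # Independent: voxels are selected using the training set only.
        independent.append(decode_accuracy(train, test, select_voxels(train, n_keep)))
    return sum(circular) / n_subjects, sum(independent) / n_subjects


circular_acc, independent_acc = simulate()
print(f"circular selection:    {circular_acc:.2f}")
print(f"independent selection: {independent_acc:.2f}")
```

Training and testing use separate data in both variants; the only difference is whether the test set took part in voxel selection. Under these assumptions the circular variant should decode noise well above the 0.5 chance level, while the independent variant should stay near chance.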
Fig. 3. Example 2: ROI definition can bias activation analysis
A simulated fMRI block-design experiment demonstrates that nonindependent ROI definition can distort effects and produce spuriously significant results, even when the ROI is defined by rigorous mapping procedures (accounting for multiple tests) and highlights a truly activated region. Error bars indicate +/− 1 standard error of the mean. (a) The layout of this panel matches the intuitive diagrams of Fig. 1a: The data in Fig. 1a correspond to the true effects (left); the assumptions to the contrast hypothesis (top), and the results to ROI-average activation analyses (right). A 100-voxel region (blue contour in central slice map) was simulated to be active during conditions A and B, but not during conditions C and D (left). The t map for contrast A-D is shown for the central slice through the region (center). When thresholded at p<0.05 (corrected for multiple tests by a cluster threshold criterion), a cluster appears (magenta contour), which highlights the true activated region (blue contour). The ROI is somewhat affected by the noise in the data (difference between blue and magenta contours). The noise pushes some truly activated voxels below the threshold and lifts some nonactivated voxels above the threshold (white arrows). This can be interpreted as overfitting. The bar graph for the overfitted ROI (bottom right, same data as used for mapping) reflects the activation of the region during conditions A and B as well as the absence of activation during conditions C and D. However, in comparison to the true effects (left) it is substantially distorted by the selection contrast A-D (top). In particular, the contrast A-B (simulated to be zero) exhibits spurious significance (p<0.01). When we use independent data to define the ROI (green contour), no such distortion is observed (top right). For details on the simulation and analysis, see Example 2: Regional activation analysis in the text. 
(b) The simulation illustrates how data selection blends truth (left) and hypothesis (right) by distorting results (top) so as to better conform to the selection criterion.
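
A much simplified version of this simulation can illustrate the bias. The sketch below is an assumption-laden stand-in for the one in the paper: voxels are independent (no spatial structure), the ROI is defined by a plain threshold on the A-D difference instead of a cluster-corrected t map, and the effect size, noise level, and threshold are hypothetical values chosen for illustration.

```python
import random

CONDITIONS = "ABCD"


def simulate_estimates(rng, n_vox, n_active, effect, noise_sd):
    """Noisy per-voxel response estimates for conditions A to D.

    The first n_active voxels respond equally to A and B and not to C or D,
    so the true A-B contrast is zero everywhere.
    """
    data = []
    for v in range(n_vox):
        voxel = {}
        for c in CONDITIONS:
            true_effect = effect if (v < n_active and c in "AB") else 0.0
            voxel[c] = true_effect + rng.gauss(0.0, noise_sd)
        data.append(voxel)
    return data


def define_roi(data, threshold):
    """Voxels whose estimated A-D difference exceeds the threshold."""
    return [v for v, est in enumerate(data) if est["A"] - est["D"] > threshold]


def roi_contrast_ab(data, roi):
    """ROI-average A-B contrast (true value: zero)."""
    return sum(data[v]["A"] - data[v]["B"] for v in roi) / len(roi)


def run(n_reps=20, n_vox=1000, n_active=100, effect=1.5,
        noise_sd=0.5, threshold=1.5, seed=0):
    """Average A-B contrast for a nonindependent and an independent ROI."""
    rng = random.Random(seed)
    circular, independent = [], []
    for _ in range(n_reps):
        main = simulate_estimates(rng, n_vox, n_active, effect, noise_sd)
        localizer = simulate_estimates(rng, n_vox, n_active, effect, noise_sd)
        # Nonindependent: ROI defined on the same data that is then analyzed.
        circular.append(roi_contrast_ab(main, define_roi(main, threshold)))
        # Independent: ROI defined on separate data with the same true effects.
        independent.append(roi_contrast_ab(main, define_roi(localizer, threshold)))
    return sum(circular) / n_reps, sum(independent) / n_reps


circular_ab, independent_ab = run()
print(f"A-B contrast, ROI from same data:        {circular_ab:+.2f}")
print(f"A-B contrast, ROI from independent data: {independent_ab:+.2f}")
```

Selecting on A-D favors voxels whose noise happens to be positive in A, which is why the A-B contrast should come out positive in the nonindependent ROI even though its true value is zero.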
Fig. 4. A policy for noncircular analysis
This flow diagram suggests a procedure for choosing an appropriate analysis that avoids the pitfalls of circularity. Considering the most common errors (bottom left, red letter references) can help recognize circularity in assessing a given analysis. The gist of the policy is as follows: We first consider performing a nonselective analysis only. If selective analysis is needed and we can demonstrate that the results are independent of the selection criterion under the null hypothesis, then all data are used for selective analysis. If we cannot demonstrate this, then a split-data analysis can serve to ensure independence. (For details, see Supplementary Information, A policy for noncircular analysis.)
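
The gist of the policy, as summarized in this caption, can be written as a small decision function. This is only a sketch of the three branches named above; the full flow diagram contains more detail, and the names `Analysis` and `choose_analysis` are hypothetical.

```python
from enum import Enum


class Analysis(Enum):
    NONSELECTIVE = "nonselective analysis of all data"
    SELECTIVE_ALL_DATA = "selective analysis using all data"
    SPLIT_DATA = "split-data analysis: select on one part, analyze the other"


def choose_analysis(selection_needed: bool, independence_shown: bool) -> Analysis:
    """Pick an analysis following the gist of the policy.

    independence_shown means it has been demonstrated that the results are
    independent of the selection criterion under the null hypothesis.
    """
    if not selection_needed:
        return Analysis.NONSELECTIVE
    if independence_shown:
        return Analysis.SELECTIVE_ALL_DATA
    return Analysis.SPLIT_DATA


print(choose_analysis(selection_needed=True, independence_shown=False).value)
```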

Comment in

  • Double-dipping revisited.
    Button KS. Nat Neurosci. 2019 May;22(5):688-690. doi: 10.1038/s41593-019-0398-z. PMID: 31011228.
