FEDS Notes
June 26, 2025
How to Design Rules for Ex-Post Evaluation1
Benjamin S. Kay and Marco Migueis
Introduction
Ex-ante cost-benefit analyses and other impact assessments are now a standard part of the rulemaking process. Yet some important effects of regulation are difficult—or impossible—to assess before a rule takes effect. In such cases, ex-post (or retrospective) evaluation, conducted after a rule is in effect, offers an opportunity to measure real-world outcomes that could not reliably be predicted in advance. Such an evaluation can help determine whether a rule achieves its intended goals, whether the imposed costs are justified, and whether any unintended consequences have emerged. However, when rules are not designed with ex-post evaluation in mind, measuring certain impacts may be difficult or impossible in practice. This article discusses options to revise the rulemaking process to avoid this problem. Designing rules with ex-post evaluation in mind would facilitate such evaluations, ultimately leading to better rules.
One rationale for evaluating rules is that they can impose costs on the public. To promote efficiency and minimize unintended harm, policymakers may prefer rules that are simple where complexity is unnecessary, inexpensive where high costs can be avoided, and designed to avoid adverse side effects. Recognizing these tradeoffs, the U.S. Congress has required federal agencies to conduct impact analyses for certain rulemakings.2 In addition, a standing executive order mandates that agencies perform cost-and-benefit analyses before issuing major new rules.3, 4 The ex-ante assessments help ensure that regulatory costs are justified and prevent some of the least cost-effective rules from being implemented.
However, rules can prove more costly, yield fewer benefits, and produce more undesirable consequences than analysts anticipate in ex-ante assessments. These estimates are inherently uncertain, as they depend on future economic conditions and the reactions of regulated entities. Moreover, behavioral and political factors—such as bureaucratic optimism (Krause and Corder, 2007; Liu, Stoutenborough, and Vedlitz, 2016), ideologically motivated policymaking (Hogwood and Gunn, 1984; Prendergast, 2007), and regulatory capture by special interests (Dal Bo, 2006)—can cause analysts to overestimate the net benefits of new rules.
Current U.S. executive branch policy requires agencies to conduct retrospective analyses to support the periodic review of regulations.5 Similarly, an executive order issued by President Obama encourages independent agencies to conduct retrospective analysis—a recommendation that remains in effect.6 In addition, Section 610 of the Regulatory Flexibility Act mandates that federal agencies periodically review regulations that have a significant economic impact on a substantial number of small entities. Banking regulators are also required to regularly perform a systematic review of their regulations under the Economic Growth and Regulatory Paperwork Reduction Act (EGRPRA).
While these motivations for ex-post evaluation are evergreen, interest in such evaluations has grown in response to legal and political shifts over the past decade. During the first Trump administration, Executive Order 13771 (2017) required that "…any new incremental costs associated with new regulations shall, to the extent permitted by law, be offset by the elimination of existing costs associated with at least two prior regulations." Although the executive order was rescinded during the Biden administration, that rescission was itself reversed in the second Trump administration.7 Executive Order 14192 now requires that the "…total incremental cost of all new regulations, including repealed regulations, being finalized this year, shall be significantly less than zero."8 For policymakers aiming to meet such cost-reduction mandates, ex-post evaluation may help quantify regulatory cost changes and track progress toward aggregate cost targets.
Further contributing to a changing policy environment, oversight bodies are increasingly pressing federal agencies to provide stronger analytic justification for their rules. For example, guidance from the Office of Management and Budget (OMB) instructs agencies to quantify the total incremental costs or savings of each new and repealed significant regulatory action or guidance document.9 The instructions in this guidance amount to establishing a regulatory cost budget for each agency. If agencies are expected to meet cost targets for new rules, they may be incentivized to identify regulations with high costs and low benefits to repeal or replace. Because retrospective evaluations often yield more accurate estimates of realized costs than ex-ante projections, they can be useful in quantifying current burdens and informing regulatory cost budgeting.
Separate from this recent guidance on regulatory cost budgets, the OMB has historically encouraged agencies to plan for ex-post review of regulatory outcomes. In its 2017 annual report to Congress, OMB's Office of Information and Regulatory Affairs (OIRA) advised that "Rules should be written and designed to facilitate retrospective analysis of their effects, including consideration of the data that will be needed for future evaluation of the rules' ex-post costs and benefits."10 The currently effective OMB Circular A-4 (2003) similarly supports using retrospective analysis to assess whether a regulation's costs have changed after implementation. The now-rescinded 2023 version of Circular A-4 went further, stating that "…agencies may consider the benefits and costs of regulatory alternatives that would facilitate data collection to support future analyses or retrospective review. These alternatives may be especially valuable if there are significant uncertainties about benefits or costs, or if benefits or costs may change over time."
In practice, agencies are expected to estimate a rule's costs and benefits before issuance and, in some cases, to design regulations in ways that facilitate later evaluation of their realized impacts. When agencies seek to support their rules with empirical evidence in legal proceedings, ex-post analyses can help demonstrate whether predicted benefits and costs materialized.
The need to defend regulations in court may be growing. Two Supreme Court decisions in 2024 significantly increased the legal risks for federal rulemaking. In Corner Post, the Court held that the six-year statute of limitations under the Administrative Procedure Act begins when a regulated party is first subject to the rule, rather than when the rule is finalized, which had been the prior interpretation. In Loper Bright, the Court overturned the Chevron doctrine, which had required courts to defer to reasonable agency interpretations of ambiguous statutes. This decision shifted interpretive authority from agencies to the judiciary and may lead to greater scrutiny in litigation. While these are only two among many legal risks agencies face, ex-post evaluation may strengthen a rule's legal defensibility by providing evidence that its benefits outweigh its costs.
The Government Accountability Office (GAO) has consistently emphasized the importance of retrospective review in strengthening regulatory quality. In reports from 2007, 2014, and 2018, the GAO called for expanding or improving the use of ex-post evaluation across federal agencies.11 Despite such calls, challenges persist. For example, a 2024 GAO review of independent federal financial regulators found that they had conducted few retrospective reviews of their existing rules.12
Finally, some policymakers may view ex-post evaluation as a tool to promote public trust in the government. Demonstrating that regulatory decisions are reviewed and evidence-based can signal accountability and responsiveness. Research suggests that transparency, such as disclosing government performance, can positively influence citizens' perceptions of government.13
When conducted, ex-post evaluations can help identify rules that are underperforming or no longer justified. In some cases, this evidence may support repeal. In others, evidence of strong performance and low costs could justify expanding a rule's scope: for example, by extending it to additional firms, increasing its stringency, or replacing alternative policies based on demonstrated effectiveness. This article proposes strategies to improve the government's capacity to conduct meaningful ex-post evaluations, supporting such analysis in contexts where policymakers choose to pursue it.14
Facilitating ex-post evaluation
While ex-post evaluations are valuable for improving the regulatory environment, conducting them can be difficult. Key challenges include identifying an appropriate counterfactual, gathering data on outcomes, and attributing these outcomes to the rule itself. Fortunately, these challenges can be mitigated when regulations are designed and implemented with evaluation in mind. The sections that follow explore organizational and implementation choices that can support ex-post evaluation.15
Staffing rulemakings with an eye toward ex-post evaluation
Ex-post evaluation benefits from the early involvement of evaluation staff in the rulemaking process. Such staff can help identify opportunities to design rules in ways that support future evaluation. The marginal costs of such an assignment are likely to be modest. Under current practices, evaluation staff are often already engaged in the rulemaking process—making suggestions to reduce burdens and increase benefits. Asking them to also identify opportunities to support ex-post evaluation is unlikely to add significant cost.
However, embedding economic analysis staff in the rulemaking process poses tradeoffs. Their economic expertise can improve the quality of rules, but their participation may also reduce the independence of the evaluation process from policy development. The Securities and Exchange Commission (SEC) mitigates this issue by assigning different cost-benefit analysts to the drafting and evaluation phases of a rule, an approach that requires more personnel and increases agency costs.16 But staff responsible for ex-ante impact analysis are often already engaged in the rulemaking process. Asking them to identify opportunities to support ex-post evaluation is unlikely to further reduce their independence. Therefore, the incremental cost of this added responsibility is likely modest, particularly when compared with the broader compliance costs and regulatory burdens of major regulations.
To the extent that designing for ex-post evaluation leads to more such evaluations being conducted, costs may rise relative to the current baseline, where formal ex-post analyses are relatively uncommon. However, the cost of conducting these evaluations is typically modest compared to the potential benefits—such as improving the efficacy and efficiency of existing rules. Moreover, by designing rules to support evaluation, agencies may create natural experiments that external researchers can study at no additional cost to taxpayers. In some cases, this could reduce the incremental cost of evaluation to nearly zero. Policymakers should weigh these cost considerations alongside the potential political, procedural, and reputational benefits of ex-post evaluations.
Differentiated implementation and application of rules
Ex-post evaluation can be facilitated by differentiated application of rules across similar firms. Applying different rules to otherwise comparable entities creates natural treatment and control groups, improving the ability to identify regulatory effects.17 Two commonly used forms of differentiation that support such analyses are: (1) tiered application of rules based on firm size or other relevant criteria and (2) staggered implementation.18 In addition, while pure randomization of regulatory policies across firms is uncommon and poses several challenges, it has been used in public policy and will be discussed at the end of this section.
Many regulations apply only to firms that exceed a certain threshold of activity, leaving firms below that threshold exempt.19 This tiered approach can be appropriate because imposing stricter requirements on certain entities, such as those that are larger and more complex, often yields greater benefits than applying the same rules to other entities.20 Also, regulatory compliance costs tend to increase less than proportionally with activity levels (Brock and Evans, 1985).21 However, tiering can also create incentives for firms to remain below regulatory thresholds to avoid compliance costs, which may discourage growth and reduce economic efficiency.22 In addition to these potential costs and benefits, tiering enables regulators to compare outcomes between firms above and below the threshold, helping to identify the effects of regulation.
Ex-post analyses that utilize regulatory thresholds have provided valuable insights into the effects of financial regulations. Examples include Acharya et al. (2018), who found that the introduction of stress testing for large banks led to reductions in credit to risky borrowers; Kay et al. (2014), who found that banks subject to the Durbin amendment experienced substantial declines in interchange income relative to exempt banks, though they offset more than 90 percent of the lost income through higher deposit fees;23 Alvero, Ando, and Xiao (2023), who used bunching of banks around Dodd-Frank size thresholds to estimate the costs of the act's regulatory burdens; Bindal et al. (2017), who studied the effects of the $10 billion and $50 billion thresholds that trigger increased bank merger scrutiny; and Wood (2023), who examined how the 2005 amendments to the Federal Deposit Insurance Corporation Improvement Act and the Community Reinvestment Act affected bank liquidity creation. In addition, many studies use regulatory thresholds to evaluate non-financial regulations.24 These examples demonstrate that ex-post evaluation can provide insights at a limited marginal cost when rules apply differently across otherwise comparable firms.
Using tiered thresholds to identify the effects of regulations typically presents several challenges. Ideally, policy implementation would be unanticipated, preventing firms from strategically adjusting their behavior to remain out of scope, but this is rarely the case in practice. Still, reducing activity to avoid regulation can be costly, limiting firms' ability to respond.25 And while firms may have some discretion in staying above or below a threshold, ex-post comparisons between those on either side of the cutoff can still shed light on how rules affect firm behavior and incentives.
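The comparison of firms on either side of a cutoff can be sketched in a few lines. The example below is a minimal illustration, not a method taken from the studies cited above: the function name `threshold_gap`, the bandwidth, and all data values are hypothetical. It computes a simple difference in mean outcomes for firms within a fixed distance of the threshold. If firms can choose which side of the threshold to be on, this difference should not be read as a causal effect.

```python
from statistics import mean


def threshold_gap(firms, threshold, bandwidth):
    """Difference in mean outcome: firms just above minus firms just below.

    firms: iterable of (size, outcome) pairs.
    Only firms whose size lies within `bandwidth` of `threshold` are used,
    so that the two groups are as comparable as possible.
    """
    above = [y for size, y in firms if threshold <= size <= threshold + bandwidth]
    below = [y for size, y in firms if threshold - bandwidth <= size < threshold]
    if not above or not below:
        raise ValueError("need firms on both sides of the threshold within the bandwidth")
    return mean(above) - mean(below)


# Hypothetical data: (total assets in $ billions, outcome measure).
firms = [
    (42.0, 10.0), (46.0, 11.0), (49.0, 12.0),  # just below the threshold (exempt)
    (51.0, 9.0), (54.0, 8.0), (58.0, 7.0),     # just above the threshold (covered)
    (120.0, 3.0),                              # far from the threshold, excluded
]

gap = threshold_gap(firms, threshold=50.0, bandwidth=10.0)
print(gap)
```

A narrower bandwidth makes the two groups more alike but leaves fewer firms to compare, so in practice the choice of bandwidth is itself a judgment call.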
Even when firms do not strategically avoid regulatory thresholds, anticipation effects may arise as firms adjust behavior well before formal implementation, in response to rules they expect to face because of speeches, draft proposals, investor pressure, or other signals. Such adjustments are especially likely when compliance is complex, time-consuming, and costly. A firm may also adjust in response to rules that currently apply only to others, anticipating that similar requirements will extend to them in the future. These dynamics complicate causal inference by blurring the timing of firm responses. A related phenomenon is the announcement effect, in which other entities respond immediately when information about rules is publicly disclosed. For instance, securities prices may adjust on the day a regulation is announced, regardless of when it takes effect. Both types of effects can complicate causal inference by shifting behavior away from the formal implementation date, but they differ in timing, observability, and the clarity of the policy signal that drives the response.
Another challenge in using tiered requirements to identify policy effects is that certain thresholds trigger multiple regulations simultaneously. For example, under the Dodd-Frank Act, crossing the $50 billion asset threshold subjects banks to several prudential standards, including enhanced stress testing, liquidity, and risk management requirements.26 Because these requirements were implemented concurrently, isolating their individual effects is difficult. Nevertheless, even when multiple rules are triggered at once, comparing firms above and below the threshold can provide useful insights, especially when estimating the most direct and distinct effects of a requirement, such as the impact of liquidity rules on holdings of high-quality liquid assets. In addition, when new requirements are layered onto an existing threshold, it may be possible to identify their marginal impact if the new rule was not anticipated at the time the threshold was originally introduced. For example, while the 2010 Dodd-Frank Act established prudential liquidity requirements for large firms, a key element of the liquidity framework, the net stable funding ratio, was proposed in 2016, finalized in February 2021, and became effective in July 2021.
Tiered application of regulations can support a long-term strategy of expanding regulations that demonstrate positive net benefits and scaling back or eliminating those that do not. Policymakers can (1) apply a rule initially to a narrow set of firms, (2) conduct an ex-post evaluation of the rule's costs and benefits, and (3) adjust the scope of the rule based on the findings.27 Arguably, certain expansions of banking requirements have followed this pattern. For instance, the long-term debt requirement for large banking organizations proposed in 2023 was influenced by the perceived successes of similar mandates for global systemically important banks introduced in 2016.28 Still, regulators could benefit from applying this approach more consistently, regularly reassessing whether to expand or narrow regulations in response to ex-post evidence.
Staggered implementation, whether based on predetermined criteria or random assignment, is another common feature of rules that can support ex-post evaluation.29 Like tiering, applying a rule to similar firms at different times enables comparisons between early and late adopters, helping to assess a rule's impact. Therefore, regulators may benefit from using staggered implementation more often to evaluate the effectiveness of new policies.
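One common way to compare early and late adopters is a two-by-two difference-in-differences calculation. The sketch below is illustrative only: the function name `did_estimate`, the group and period labels, and the data are hypothetical. It also relies on an assumption the data cannot confirm, namely that outcomes for the two groups would have moved in parallel without the rule.

```python
from statistics import mean


def did_estimate(rows):
    """Two-by-two difference-in-differences estimate.

    rows: iterable of (group, period, outcome) with
      group  in {"early", "late"}  (early vs. late adopters of the rule)
      period in {"pre", "post"}    ("post" = window in which only the
                                    early group is subject to the rule)
    Returns (early post - early pre) - (late post - late pre).
    """
    rows = list(rows)

    def cell(group, period):
        values = [y for g, p, y in rows if g == group and p == period]
        if not values:
            raise ValueError(f"no observations for {group}/{period}")
        return mean(values)

    early_change = cell("early", "post") - cell("early", "pre")
    late_change = cell("late", "post") - cell("late", "pre")
    return early_change - late_change


# Hypothetical outcomes for four firms observed before and after the
# early adopters become subject to the rule.
rows = [
    ("early", "pre", 5.0), ("early", "pre", 7.0),
    ("early", "post", 8.0), ("early", "post", 10.0),
    ("late", "pre", 4.0), ("late", "pre", 6.0),
    ("late", "post", 5.0), ("late", "post", 7.0),
]

effect = did_estimate(rows)
print(effect)
```

Subtracting the late adopters' change removes movements common to both groups, such as economy-wide trends, leaving the part of the early adopters' change that coincides with the rule.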
In some cases, staggered implementation has been deliberately designed to test policy effects. For instance, the SEC used a staggered rollout in 2000 to evaluate the decimalization of stock market prices, and the Internal Revenue Service (IRS) employed a staggered approach to distribute stimulus payments during the Great Recession.30, 31 Thanks to these clever implementation strategies, we now know far more about the consequences of these policies.32 In other cases, staggered implementation has resulted from practical considerations, such as prioritizing the most relevant entities or varying levels of readiness. For example, the Financial Industry Regulatory Authority's (FINRA) Trade Reporting and Compliance Engine (TRACE), mandated by the SEC for broker-dealer trade reporting, was initially introduced in 2002 for investment grade corporate bonds with large issuances. Over time, its scope was expanded to cover a broader range of securities, including those with smaller issuances and lower liquidity. This gradual rollout enabled researchers to assess TRACE's effects by comparing securities subject to the requirement with those that remained exempt.33
Staggered implementation has also enabled policy evaluation in cases where the staggering was not intended for that purpose. For example, Bremer and Sommer (2025) evaluated the costs and benefits of the European emission trading regime using incidental variation in implementation timing. Similarly, Barreca, Neidell, and Sanders (2021) used staggered rollout to estimate the benefits of the U.S. Acid Rain Program. Comparable setups exist in U.S. policies on long-term care staffing and overtime protections, but, to our knowledge, no ex-post evaluations have yet exploited this variation.34
Staggered implementation has also been employed in bank regulation and by quasi-regulatory organizations in the private sector. Notable examples include the current expected credit losses (CECL) accounting standard,35 stress-testing requirements for large banks,36 and the liquidity coverage ratio (LCR).37 These staggered rollouts have enabled valuable ex-post analyses. Some examples include Loudis et al. (2021), who assessed the impact of early CECL adoption during the COVID-19 stress and found that reserves were substantially more responsive to the economic outlook among early adopters; and Banerjee and Mio (2018), who examined the staggered rollout of liquidity regulation in the UK and its effects on bank balance sheets.
Staggered implementation also has limitations as a tool for policy evaluation. Because the units serving as controls are typically treated within a short time frame, often a year or two, analyses based on staggered rollout are more informative for short-term outcomes than for outcomes that depend on the long-term state of the regulation. For example, the decimalization of stock prices allowed researchers to assess the impact on trading costs, which materialized quickly. By contrast, most banking regulations affect outcomes over longer horizons, and firms may begin adjusting their behavior in anticipation of rules before they take effect. Also, just as with identifying a regulation's effect through tiered application, possible anticipation and announcement effects limit how much staggered implementation can reveal about a regulation's true impact. For these reasons, evaluations based on staggered implementation require caution in interpreting results and modesty about the strength and generalizability of their findings.
Although the staggered implementations of TRACE and CECL were not designed for evaluation purposes, they enabled valuable ex-post policy analysis. Like tiering, staggered implementation supports the evaluation of regulatory effectiveness and can inform iterative improvements. Problematic or costly implementations can be slowed or abandoned, while successful ones can be accelerated and expanded when early evidence indicates strong positive effects.38
A more modest alternative to randomized staggered implementation is the use of small-scale pilot programs, in which a regulation is applied, potentially on a voluntary basis, to a limited number of firms for a defined period. These pilots can reduce the cost of experimentation while providing useful comparisons between treated and untreated firms. The SEC has conducted several such pilots, including during the rollout of TRACE, the "up-tick" pilot (which removed a short-sale price test), and the "tick-size" pilot (which increased the minimum price increment for stocks from one to five cents). These experiments have generated multiple studies evaluating the effectiveness of the associated regulatory changes.39 However, the voluntary nature of many pilot programs can limit their value for causal inference (see Harris et al. (2021) for a discussion of these limitations in financial regulation).
Beyond short-term randomized staggering, such as the decimalization of stock prices or the IRS stimulus payments, a long-term strategy for evidence-based regulation would involve more open-ended policy experiments using randomization.40 Randomly assigning treatments to comparable units remains the gold standard for identifying causal effects, which is why it has long been used in clinical trials. Experimental economics has similarly used controlled trials to explore economic questions in lab-like environments. Although fully controlled trials are rarely feasible in the context of firm regulation, regulators may still seek opportunities to adopt similar approaches where possible.
Policy randomization can result in unequal treatment among regulated entities. Still, society accepts randomization in high-stakes decisions—such as the military draft, jury selection, and clinical trials—when randomization is grounded in ex-ante fairness.41, 42 Governments have also used randomization to evaluate the effectiveness of anti-poverty programs and the benefits of Medicare coverage.43
Credible evidence on policy effectiveness can require accepting some temporary inequities among regulated entities or individuals; this tradeoff may be justified when the resulting knowledge has the potential to substantially improve long-run outcomes. This does not mean that arbitrarily large inequities should be introduced solely for the sake of experimentation. Instead, the potential benefits of policy experimentation should be weighed against the potential costs associated with any resulting inequities. In general, policy randomization is more ex-ante justifiable when there is greater uncertainty about a policy's effects (including on the bottom line of regulated entities) and when the costs of unequal treatment are relatively low.
Bank supervision is one area where policy randomization could be used to learn more. Given that appropriate baseline levels of supervision are in place, the incremental costs and benefits of different supervisory approaches could be studied through randomized assignment. For example, supervisors could test the efficacy of varying examination frequencies, team sizes and compositions, triggers for examinations, or areas of focus. Over time, such experimentation could help regulatory agencies allocate their limited resources more efficiently.
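A randomized assignment of this kind can be made reproducible and auditable with very little machinery. The following is a minimal sketch under stated assumptions: the function name `assign_arms`, the bank identifiers, the two examination schedules, and the seed are all hypothetical and do not describe any agency's actual practice.

```python
import random
from collections import Counter


def assign_arms(unit_ids, arms, seed):
    """Randomly assign each unit to one arm, keeping arm sizes balanced.

    A fixed seed makes the assignment reproducible, so that it can be
    documented in advance and audited afterwards.
    """
    rng = random.Random(seed)
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled units out to the arms in turn, like dealing cards.
    return {unit: arms[i % len(arms)] for i, unit in enumerate(shuffled)}


# Hypothetical set of comparable banks and two examination schedules.
bank_ids = [f"bank_{i:02d}" for i in range(12)]
arms = ["annual_exam", "18_month_exam"]

assignment = assign_arms(bank_ids, arms, seed=2025)
print(Counter(assignment.values()))
```

Because every bank has the same chance of landing in each arm, the assignment is fair ex ante, and later differences in average outcomes between arms can be attributed to the examination schedule rather than to how banks were selected.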
The socially optimal scope of a regulation may not align with the scope that best enables evaluation of its effects. While employing a suboptimal scope for the sake of evaluation may involve some theoretical cost, there is generally little empirical evidence that current scopes of application are optimally efficient. Therefore, experimentation with regulatory scope may be feasible without demonstrable loss of efficiency at the outset.
Even with greater use of randomization, pilot programs, tiering, and staggering, ex-post evaluation can be difficult for certain policies. As discussed earlier, the application of multiple regulatory requirements at the same threshold can challenge efforts to isolate individual effects. These challenges highlight the importance of designing policies with evaluation in mind: by embedding variation in implementation, improving opportunities for measurement, and reducing overlap across rules. Still, in practice, stakeholders may struggle to agree on a structure that is both legally defensible and cost-effective while also supporting rigorous ex-post analysis.
When policies are implemented in ways that preclude meaningful evaluation, more fundamental concerns arise, including ethical ones. For example, Campbell (1969) argued, "To be truly scientific we must be able to experiment. We must be able to advocate without that excess of commitment that blinds us to reality testing." Similarly, Boruch (1975) warned that, "[A] failure to discover whether a program is effective is unethical. Insofar as a failure to obtain unequivocal data on effects then leads to decisions which are wrong and ultimately damaging, that failure may violate good standards of both social and professional ethics." Or as Karl Popper put it, "… we make progress, if and only if, we are prepared to learn from our mistakes: to recognize our errors and to utilize them critically instead of persevering in them dogmatically." (Popper 1957). Careful examination of policy effects helps ensure that government decisions are responsible, accountable, and guided by evidence.
Dual reporting
Identifying the costs and benefits of rules requires more than clear identification of treatment and control groups; it also depends on having good data, a concern sometimes overlooked in the rulemaking process. For example, the definitions of capital and risk-weighted assets have changed several times since the Dodd-Frank Act. Yet regulatory reporting typically requires firms to provide data only under the latest definitions, making it difficult to measure whether changes in reported capital or assets reflect changes in behavior or shifts in definitions.
New rules can avoid this issue by requiring firms to report data using both old and new definitions for a limited time. This approach is similar to how banks report the value of securities under both mark-to-market and amortized cost valuations. Such parallel data proved invaluable in understanding the banking problems that emerged in March 2023, helping regulators identify banks with significant unrealized losses from interest rate changes that were not reflected in regulatory capital. Similarly, when revising questions in its Current Population Survey, the Census Bureau asks the old and new versions of a question to maintain consistency and data usability over time.44
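The value of overlapping data can be seen in a simple decomposition. When a metric is reported under both definitions in the period after a rule change, the total change in the reported figure splits into a behavioral part (measured under the old definition) and a definitional part (the gap between definitions in the same period). The sketch below is illustrative; the function name `decompose_change` and the capital ratios are hypothetical.

```python
def decompose_change(old_def_before, old_def_after, new_def_after):
    """Split the change in a reported metric into behavior and definition.

    old_def_before: value before the rule change, under the old definition
    old_def_after:  value after the rule change, under the old definition
                    (available only if dual reporting is required)
    new_def_after:  value after the rule change, under the new definition
    """
    behavioral = old_def_after - old_def_before    # same yardstick, different period
    definitional = new_def_after - old_def_after   # same period, different yardstick
    total = new_def_after - old_def_before         # what is seen without dual reporting
    return {"behavioral": behavioral, "definitional": definitional, "total": total}


# Hypothetical capital ratios, in percent.
parts = decompose_change(old_def_before=12.0, old_def_after=12.5, new_def_after=11.0)
print(parts)
```

In this hypothetical case the reported ratio falls by one percentage point, yet under a constant definition it rose by half a point. Without the overlapping figure, only the total would be observed and the two components could not be separated.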
Dual reporting can also be useful when the scope of a rule is being narrowed. Maintaining data from firms that were previously covered can help evaluate whether regulatory simplifications and burden reductions achieve their intended outcomes.
Failing to collect overlapping data often creates problems. Famously, the Census Bureau changed its annual health insurance survey at the same time the Affordable Care Act was implemented, making it much harder to evaluate the law through ex-post analysis.45 Another example was the Bureau of Labor Statistics' 1994 introduction of the concept of discouraged workers and other major changes to the U-4 through U-6 unemployment measures.46 These changes made comparisons with earlier data more difficult and complicated analysis of laws such as the Family and Medical Leave Act of 1993 and the Violent Crime Control and Law Enforcement Act of 1994.
Dual reporting would temporarily raise compliance costs. Still, when reporting under old definitions can rely on existing systems, the marginal costs may be low. In such cases, the potential benefits of dual reporting are likely to outweigh the costs.
Conclusion
Effective ex-post evaluation of rules can offer substantial benefits. This article outlines two rule design strategies to support such evaluation. First, expanding the use of tiering, randomization, and staggered implementation can improve the identification of a rule's effects by creating treatment and control groups. Table 1 illustrates several applications of these techniques in public policy, showcasing how they can yield insights across a range of settings. Second, dual-reporting requirements, where both new and legacy data are collected during a transitional window, help distinguish the true effects of rules from changes in measurement. In addition, analysts tasked with ex-post evaluation could be brought into the rulemaking process early to help identify ways to enable meaningful evaluation. Once designing rules for ex-post evaluation becomes standard practice, policymakers and researchers are likely to discover more tools to support it. With a broader toolkit, many more ex-post evaluations will become possible.
Table 1. Selected Examples of Ex-Post Evaluations Using Randomization, Tiering, and Staggering
Category | Example | Details/Implications | Sources |
---|---|---|---|
Tiering | Dodd-Frank size thresholds for banking regulations | Size thresholds determine which regulatory requirements apply, allowing comparisons of firms near the threshold | Acharya et al. (2018), Alvero, Ando, and Xiao (2023) |
Tiering | Stress testing for large banks | Stress testing led to reduced credit to risky borrowers, aiding in understanding regulation effects | Acharya et al. (2018) |
Tiering | Durbin amendment effect on banks | The effect of the Durbin amendment on interchange income and banks’ offsetting strategies | Kay et al. (2014) |
Tiering | Bank merger regulatory thresholds ($10 billion and $50 billion) | Assessed the effects of increased scrutiny on bank mergers | Bindal et al. (2017) |
Tiering | Bank liquidity creation effects of FDICIA/CRA amendments | Amendments to banking acts provided a basis to study liquidity changes | Wood (2023) |
Staggering | Decimalization of stock market prices by the SEC | Consequences of decimalization on market health and function | Chung et al. (2004) |
Staggering | IRS distribution of stimulus payments during the Great Recession | Economic effects of fiscal stimulus from transfer payments | Parker et al. (2013) |
Staggering | FINRA's TRACE implementation for corporate bonds | The effects of disclosure on bond-market functioning | Bessembinder, Maxwell, and Venkataraman (2006), Goldstein, Hotchkiss, and Sirri (2007) |
Staggering | CECL accounting standard adoption and its COVID-19 impact assessment | Assessed reserve responsiveness during economic stress periods | Loudis et al. (2021) |
Staggering | Effect of liquidity regulation in the UK | Assessed the effect on the balance sheet of banks | Banerjee and Mio (2018) |
Staggering | Effect of European Union emissions trading regime | Assessed the effects of emissions trading on firms’ economic performance | Bremer and Sommer (2025) |
Staggering | US Acid Rain Program | The Acid Rain Program (ARP) caused a dramatic decline in SO2 emissions, which reduced cardiorespiratory mortality | Barreca, Neidell, and Sanders (2021) |
Randomization | Stock tick size pilot program | Effects of tick and trading increments on stock liquidity | Chung, Lee, and Rösch (2020), Werner et al. (2022) |
Randomization | Vietnam draft lottery for military service | Randomized draft assignments illuminated political, social, and economic outcomes | Erikson and Stoker (2009), Angrist (1990) |
Randomization | Mexican Progresa / Oportunidades program | Randomized access to conditional cash transfers | Parker and Todd (2017) |
Randomization | Clinical trials for new medicines | Randomization provides gold-standard causal identification for treatment effects | Crofton and Mitchison (1948), Meldrum (2000) |
Randomization | Oregon’s 2008 Medicaid expansion | Randomized access to Medicaid insurance from a waitlist of eligible households, showing the effect of insurance coverage on household wellbeing | Allen et al. (2010), DeVoe et al. (2015) |
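The tiering and staggering examples in Table 1 generally rest on comparing changes for entities subject to a rule with changes for entities not (or not yet) subject to it. A minimal sketch of that difference-in-differences logic follows, in Python. It is an illustration under stated assumptions, not the method of any study cited in the table: the function name `diff_in_diff` and all numbers are hypothetical, and the estimate can be read as a causal effect only if the two groups would have followed parallel trends absent the rule and the rule does not spill over onto the comparison group.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences estimate from group outcomes.

    Each argument is a list of outcomes (e.g., loan growth in percent)
    for one group in one period. Assumes parallel trends absent the rule.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    # Subtracting the control change removes shocks common to both groups.
    return treated_change - control_change


# Hypothetical outcomes for firms just above (treated) and just below
# (control) a size threshold, before and after a rule takes effect.
above_pre, above_post = [5.0, 6.0, 7.0], [3.0, 4.0, 5.0]
below_pre, below_post = [5.0, 5.0, 5.0], [4.0, 4.0, 4.0]

effect = diff_in_diff(above_pre, above_post, below_pre, below_post)
print(f"Estimated effect of the rule: {effect:+.1f} percentage points")
```

Here the treated group's outcome falls by 2 points and the control group's by 1, so the sketch attributes a 1-point decline to the rule. Under randomization, the same comparison applies with the control group chosen by lottery rather than by a threshold or rollout date.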
Many have argued that policymakers may resist this approach, worrying they have more to lose by exposing poorly performing policies than they have to gain from highlighting successful ones (Wolf, 1979; Hogwood and Gunn, 1984; Twight, 1991; Weiss, 1993; Rossi, 1987; Hanson, 2003; Prendergast, 2003; Prendergast, 2007; and Bovens, Hart, and Kuipers, 2009). Policymakers may also be concerned about the costs of ex-post evaluations or worry that designing for evaluation could slow down the rulemaking process, delaying public access to well-crafted policies. Still, the benefits of strengthening ex-post evaluation are worth serious consideration. The regulatory community has managed the discomforts of ex-ante impact analyses by normalizing them. Today, nearly every major rule includes such an analysis, offering valuable information to both policymakers and the public. Similarly, normalizing ex-post evaluation could reduce the stigma of admitting that some rules underperform, and in turn encourage more independent, timely, and accurate evaluations. Better ex-post evaluation would help the public understand the real impacts of government policy and support the reform or repeal of rules that fall short.
References
Abramowicz, Michael, Ian Ayres, and Yair Listokin. 2011. "Randomizing Law." University of Pennsylvania Law Review 159 (4): 929–1005.
Acharya, Viral V., Allen N. Berger, and Raluca A. Roman. 2018. "Lending Implications of U.S. Bank Stress Tests: Costs or Benefits?" Journal of Financial Intermediation 34: 58–90. www.sciencedirect.com/science/article/pii/S104295731830010X.
Alessandro, Martín, Bruno Cardinale Lagomarsino, Carlos Scartascini, and Jerónimo Torrealday. 2019. "Transparency and Trust in Government: Evidence from a Survey Experiment." Paper presented at the 2nd IZA Workshop: Gender and Family Economics.
Alvero, Adrien, Sakai Ando, and Kairong Xiao. 2023. "Watch What They Do, Not What They Say: Estimating Regulatory Costs from Revealed Preferences." Review of Financial Studies 36 (6): 2224–2273. academic.oup.com/rfs/article/36/6/2224/6873755.
Allen, Heidi, Katherine Baicker, Amy Finkelstein, Sarah Taubman, Bill J. Wright, and the Oregon Health Study Group. 2010. "What the Oregon Health Study Can Tell Us About Expanding Medicaid." Health Affairs 29 (8): 1498–1506.
Angrist, Joshua D. 1990. "Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records." American Economic Review 80 (3): 313–336.
Banerjee, Ryan N., and Hitoshi Mio. 2018. "The Impact of Liquidity Regulation on Banks." Journal of Financial Intermediation 35: 30–44. dx.doi.org/10.1016/j.jfi.2017.05.008.
Barreca, Alan I., Matthew Neidell, and Nicholas J. Sanders. 2021. "Long-run pollution exposure and mortality: Evidence from the Acid Rain Program." Journal of Public Economics 200: Article 104440. www.sciencedirect.com/science/article/pii/S0047272721000761.
Bessembinder, Hendrik, William Maxwell, and Kumar Venkataraman. 2006. "Market Transparency, Liquidity Externalities, and Institutional Trading Costs in Corporate Bonds." Journal of Financial Economics 82 (2): 251–288. www.sciencedirect.com/science/article/pii/S0304405X06000699.
Bindal, Shradha, Christa H.S. Bouwman, Shuting Sophia Hu, and Shane A. Johnson. 2017. "Regulatory Size Thresholds and Merger and Acquisition Behavior." Mays Business School Research Paper No. 2974303. papers.ssrn.com/sol3/papers.cfm?abstract_id=2974303.
Board of Governors of the Federal Reserve System. 2009. Supervisory Capital Assessment Program: Design and Implementation (PDF). April 24.
Board of Governors of the Federal Reserve System. 2013. Dodd-Frank Act Stress Testing. March 28.
Boruch, Robert F. 1975. On common contentions about randomized field experiments. In Experimental testing of public policy: The Proceedings of the 1974 Social Sciences Research Council Conference on Social Experimentation, edited by R.F. Boruch and H.W. Riecken, 107-142. Boulder, CO: Westview Press.
Bovens, Mark, Paul 't Hart, and Sanneke Kuipers. 2009. "The Politics of Policy Evaluation." In The Oxford Handbook of Public Policy. doi.org/10.1093/oxfordhb/9780199548453.003.0015.
Bremer, Leon, and Konstantin Sommer. 2025. "Economic performance and investments under emissions trading: Untangling the effects of a staggered regulation." Energy Economics 142: Article 108170. www.sciencedirect.com/science/article/pii/S014098832400879X.
Brock, William A., and David S. Evans. 1985. "The Economics of Regulatory Tiering." Rand Journal of Economics 16 (3): 398–409.
Bureau of Labor Statistics. 1995. BLS Introduces New Range of Alternative Unemployment Measures (PDF). October.
Campbell, D.T. 1969. "Reforms as experiments." American Psychologist 24 (4): 409-429. doi.org/10.1037/h0027982.
Census Bureau. 2020. "How a Question Becomes Part of the ACS (PDF)." In Understanding and Using American Community Survey Data: What Federal Agencies Need to Know.
Chakravarty, Sugato, Stephen P. Harris, and Robert Wood. 2001. "Decimal Trading and Market Impact." SSRN Working Paper No. 266877. papers.ssrn.com/sol3/papers.cfm?abstract_id=266877.
Chung, Kee H., Bonnie F. Van Ness, and Robert A. Van Ness. 2004. "Trading Costs and Quote Clustering on the NYSE and NASDAQ after Decimalization." Journal of Financial Research 27 (3): 309–328. onlinelibrary.wiley.com/doi/full/10.1111/j.1475-6803.2004.00096.x.
Congressional Research Service. 2017. Tailoring Bank Regulations: Differences in Bank Size, Activities, and Capital Levels. CRS Report No. R45051. December.
Congressional Research Service. 2024. Cost-Benefit Analysis in Federal Agency Rulemaking. CRS In Focus No. IF12058. October 28.
Corner Post, Inc. v. Board of Governors of the Federal Reserve System, 603 U.S. 799 (2024). Supreme Court of the United States, 1 July 2024 (PDF).
Crofton, John, and D.A. Mitchison. 1948. "Streptomycin Resistance in Pulmonary Tuberculosis." British Medical Journal 2 (4588): 1009.
de Jong, Gerard, Silvia Vigneti, and Chiara Pancotti. 2020. "Ex-Post Evaluation of Major Infrastructure Projects." Transportation Research Procedia 42: 75–84. doi.org/10.1016/j.trpro.2019.12.008.
Dal Bó, Ernesto. 2006. "Regulatory capture: A review." Oxford Review of Economic Policy 22 (2): 203–225. academic.oup.com/oxrep/article/22/2/203/334718.
Department of Energy. 2011. Final Plan for Retrospective Analysis of Existing Rules (PDF). Department of Energy. August 23.
DeVoe, Jennifer E., Miguel Marino, Rachel Gold, Megan J. Hoopes, Stuart Cowburn, Jean P. O'Malley, John Heintzman, et al. 2015. "Community Health Center Use after Oregon's Randomized Medicaid Experiment." The Annals of Family Medicine 13 (4): 312–320.
Erikson, Robert S., and Laura Stoker. 2009. "Vietnam Draft Lottery Status and Political Attitudes (PDF)." Annual meeting of the Midwest Political Science Association, Chicago, IL.
Executive Office of the President. 1993. Executive Order 12866 of September 30, 1993: Regulatory Planning and Review (PDF). Federal Register, October 4.
Executive Office of the President. 2011. Executive Order 13563 of January 18, 2011: Improving Regulation and Regulatory Review. Federal Register, January 21.
Executive Office of the President. 2011. Executive Order 13579 of July 11, 2011: Regulation and Independent Regulatory Agencies. Federal Register, July 14.
Executive Office of the President. 2017. Executive Order 13771 of January 30, 2017: Reducing Regulation and Controlling Regulatory Costs. Federal Register, February 3.
Executive Office of the President. 2017. Guidance Implementing Executive Order 13771, Titled "Reducing Regulation and Controlling Regulatory Costs (PDF)". Memorandum M-17-21. Office of Management and Budget, April 5.
Executive Office of the President. 2021. Executive Order 13992 of January 20, 2021: Revocation of Certain Executive Orders Concerning Federal Regulation. Federal Register, January 25.
Executive Office of the President. 2025. Executive Order 14189 of January 25, 2025: Initial Rescissions of Harmful Executive Orders and Actions. Federal Register, January 28.
Executive Office of the President. 2025. Executive Order 14192 of January 31, 2025: Unleashing Prosperity Through Deregulation. Federal Register, February 6.
Executive Office of the President. 2025. Executive Order 14215 of February 18, 2025: Ensuring Accountability for All Agencies. Federal Register, February 24.
Federal Reserve System. 2016. Total Loss-Absorbing Capacity, Long-Term Debt, and Clean Holding Company Requirements for Systemically Important U.S. Bank Holding Companies and Intermediate Holding Companies of Systemically Important Foreign Banking Organizations (PDF). Federal Register, Docket No. R-1523, RIN 7100-AE37. December.
Financial Accounting Standards Board. 2016. Accounting Standards Update (ASU) 2016-13, Financial Instruments—Credit Losses (Topic 326). Financial Accounting Foundation.
Garicano, Luis, Claire Lelarge, and John Van Reenen. 2016. "Firm Size Distortions and the Productivity Distribution: Evidence from France." American Economic Review 106 (11): 3439–3479.
Goldstein, Michael A., Edith S. Hotchkiss, and Erik R. Sirri. 2007. "Transparency and Liquidity: A Controlled Experiment on Corporate Bonds." Review of Financial Studies 20 (2): 235–273. academic.oup.com/rfs/article/20/2/235/1573567.
Government Accountability Office. 2007. Reexamining Regulations: Opportunities Exist to Improve Effectiveness and Transparency of Retrospective Reviews. July 16.
Government Accountability Office. 2014. Reexamining Regulations: Agencies Often Made Regulatory Changes, but Could Strengthen Linkages to Performance Goals. April 11.
Government Accountability Office. 2018. Bank Secrecy Act: Derisking along the Southwest Border Highlights Need for Regulators to Enhance Retrospective Reviews (PDF). February 26.
Government Accountability Office. 2024. "Financial Services Regulations—Improvements Needed to Policies and Procedures for Regulatory Analysis." GAO Report No. GAO-24-106206. July.
Gramlich, John. 2017. "Jury Duty Is Rare, but Most Americans See It as Part of Good Citizenship." Pew Research Center, August 24.
Hanson, Robin. 2003. "Warning Labels as Cheap-Talk: Why Regulators Ban Drugs." Journal of Public Economics 87 (9–10): 2013–2029. doi.org/10.1016/S0047-2727(01)00223-7.
Harris, Larry, Charles Kahn, Robert McDonald, and Chester Spatt. 2019. The Role of Pilot Studies in Financial Regulation. SSRN Working Paper No. 3629339. October 25. ssrn.com/abstract=3629339.
Herrick, Charles, and Daniel Sarewitz. 2000. "Ex Post Evaluation: A More Effective Role for Scientific Assessments in Environmental Policy." Science, Technology, & Human Values 25 (3): 309–331.
Hogwood, Brian W., and Lewis A. Gunn. 1984. Policy Analysis for the Real World. Oxford: Oxford University Press.
Kay, Benjamin S., Mark D. Manuszak, and Cindy M. Vojtech. 2018. "Competition and Complementarities in Retail Banking: Evidence from Debit Card Interchange Regulation." Journal of Financial Intermediation 34: 91–108. www.sciencedirect.com/science/article/pii/S1042957318300184.
Kling, Arnold. 2020. "Micro Experiments and Macro Experiments." askblog, May 1.
Kovacic, William E. 2006. "Using Ex Post Evaluations to Improve the Performance of Competition Policy Authorities." Journal of Corporation Law 31: 503–547.
Krause, George A., and J. Kevin Corder. 2007. "Explaining Bureaucratic Optimism: Theory and Evidence from U.S. Executive Agency Macroeconomic Forecasts." American Political Science Review 101 (1): 129–142.
Labonte, Marc, and David W. Perkins. 2017. Bank Systemic Risk Regulation: The $50 Billion Threshold in the Dodd-Frank Act (PDF). Congressional Research Service.
Leonardi, Marco, and Giovanni Pica. 2006. "Effects of Employment Protection Legislation on Wages: A Regression Discontinuity Approach (PDF)." University of Milan.
Leuz, Christian, and Peter D. Wysocki. 2016. "The economics of disclosure and financial reporting regulation: Evidence and suggestions for future research." Journal of Accounting Research 54 (2): 525–622. onlinelibrary.wiley.com/doi/full/10.1111/1475-679X.12115.
List, John A. 2024. "Optimally Generate Policy-Based Evidence before Scaling." Nature 626: 491–499. www.nature.com/articles/s41586-023-06972-y.
Lorenc, Amy G., and Jeffery Y. Zhang. 2020. "How Bank Size Relates to the Impact of Bank Stress on the Real Economy." Journal of Corporate Finance 62: 101592. doi.org/10.1016/j.jcorpfin.2020.101592.
Loudis, Bert, Sasha Pechenik, Ben Ranish, Cindy M. Vojtech, and Helen Xu. 2021. "New Accounting Framework Faces Its First Test: CECL during the Pandemic." FEDS Notes, December 3.
Meldrum, Marcia L. 2000. "A Brief History of the Randomized Controlled Trial: From Oranges and Lemons to the Gold Standard." Hematology/Oncology Clinics of North America 14, no. 4: 745–760.
Mueller, Paul S., Victor M. Montori, Dirk Bassler, Barbara A. Koenig, and Gordon H. Guyatt. 2007. "Ethical Issues in Stopping Randomized Trials Early Because of Apparent Benefit." Annals of Internal Medicine 146, no. 12: 878–881.
Office of Management and Budget. 2003. Circular A-4: Regulatory Analysis. Executive Office of the President. www.regulationwriters.com/downloads/Circular-A-4.pdf.
Office of Management and Budget. 2017. 2017 Report to Congress on the Benefits and Costs of Federal Regulations and Agency Compliance with the Unfunded Mandates Reform Act (PDF). Executive Office of the President.
Office of Management and Budget. 2023. "Circular A-4: Regulatory Analysis (PDF)." Executive Office of the President.
Office of the Comptroller of the Currency, Federal Reserve System, and Federal Deposit Insurance Corporation. 2014. "Liquidity Coverage Ratio: Liquidity Risk Measurement Standards." Federal Register, vol. 79, no. 197, October 10.
Office of the Comptroller of the Currency, Federal Reserve System, Federal Deposit Insurance Corporation. 2016. "Net Stable Funding Ratio: Liquidity Risk Measurement Standards and Disclosure Requirements." Federal Register, June 1.
Office of the Comptroller of the Currency, Federal Reserve System, Federal Deposit Insurance Corporation. 2021. "Net Stable Funding Ratio: Liquidity Risk Measurement Standards and Disclosure Requirements." Federal Register, February 11.
Office of the Comptroller of the Currency, Federal Reserve System, Federal Deposit Insurance Corporation. 2023. "Long-Term Debt Requirements for Large Bank Holding Companies, Certain Intermediate Holding Companies of Foreign Banking Organizations, and Large Insured Depository Institutions." Federal Register, vol. 88, no. 64524, September 19.
Loper Bright Enterprises v. Raimondo, 603 U.S. 369 (2024). Supreme Court of the United States, 28 June 2024 (PDF).
Liu, Xinsheng, James W. Stoutenborough and Arnold Vedlitz. 2017. "Bureaucratic expertise, overconfidence, and policy choice." Governance 30: 705-725. onlinelibrary.wiley.com/doi/abs/10.1111/gove.12257.
Meyer, Bruce D. 1995. "Natural and quasi-experiments in economics." Journal of Business & Economic Statistics 13 (2): 151–161. www.tandfonline.com/doi/abs/10.1080/07350015.1995.10524589.
Parker, Jonathan A., Nicholas S. Souleles, David S. Johnson, and Robert McClelland. 2013. "Consumer Spending and the Economic Stimulus Payments of 2008." American Economic Review 103, no. 6: 2530–2553.
Parker, Susan W., and Petra E. Todd. 2017. "Conditional cash transfers: The case of Progresa/Oportunidades." Journal of Economic Literature 55 (3): 866–915.
Popper, Karl. 1957. The Poverty of Historicism. Routledge. doi.org/10.4324/9780203538012.
Prendergast, Canice. 2003. "The Limits of Bureaucratic Efficiency." Journal of Political Economy 111, no. 5: 929–958.
Prendergast, Canice. 2007. "The Motivation and Bias of Bureaucrats." American Economic Review 97, no. 1: 180–196.
Rawls, John. 1971. A Theory of Justice. Belknap Press of Harvard University Press.
Rossi, Peter H. 1987. "The Iron Law of Evaluation and Other Metallic Rules (PDF)." Research in Social Problems and Public Policy, vol. 4, no. 1, pp. 3-20.
Securities and Exchange Commission. 2000. "Commission Notice: Decimals Implementation Plan for the Equities and Options Markets." Exchange Committee on Decimals, July 24.
Silverman, William A., and Iain Chalmers. 2001. "Casting and Drawing Lots: A Time Honoured Way of Dealing with Uncertainty and Ensuring Fairness." BMJ 323, no. 7327: 1467–68.
Tavernise, Sabrina. 2014. "Census Survey Revisions Mask Health Law Effects." New York Times, April 15. www.nytimes.com/2014/04/16/us/politics/census-survey-revisions-mask-health-law-effects.html.
Twight, Charlotte. 1991. "From Claiming Credit to Avoiding Blame: The Evolution of Congressional Strategy for Asbestos Management." Journal of Public Policy, vol. 11, no. 2, pp. 153–86.
U.S. Congress. 1980. Regulatory Flexibility Act, Public Law 96-354 (PDF), 94 Stat. 1164.
U.S. Congress. 1995. Unfunded Mandates Reform Act of 1995 (PDF), Public Law 104-4, 109 Stat. 48.
U.S. Congress. 2008. Economic Stimulus Act of 2008, Public Law 110-185, 122 Stat. 613.
U.S. Congress. 2010. Dodd-Frank Wall Street Reform and Consumer Protection Act (PDF), Public Law 111-203, 124 Stat. 1376.
U.S. Congress. 2018. Economic Growth, Regulatory Relief, and Consumer Protection Act, Public Law 115-174, 132 Stat. 1296.
Vaccaro, Giannina. 2018. "Using Econometrics to Reduce Gender Discrimination: Evidence from a Difference-in-Discontinuity Design (PDF)." Paper presented at the 2nd IZA Workshop: Gender and Family Economics.
Wang, Qiushi, and Zhenghui Guan. 2022. "Can Sunlight Disperse Mistrust? A Meta-Analysis of the Effect of Transparency on Citizens' Trust in Government." Journal of Public Administration Research and Theory.
Weiss, Carol H. 1993. "Where Politics and Evaluation Research Meet." American Journal of Evaluation. doi.org/10.1177/109821409301400119.
Weldon, Kathleen. 2017. "Suppose They Gave a War and Nobody Came: Changing Opinions on the Draft." Roper Center for Public Opinion, July 24.
Wolf Jr., Charles. 1979. "A Theory of Nonmarket Failure: Framework for Implementation Analysis." The Journal of Law and Economics 22 (1): 107–139.
1. We thank Julianna Sterling for excellent research assistance. Also, we thank Mark Manuszak, Robert Stewart, and the participants of the Board's Supervision and Regulation Policy Research and Analytics seminar for their helpful comments and suggestions. The views expressed in this note are ours and do not reflect official positions of the Federal Reserve Board or the Federal Reserve System.
2. Two key laws here are the Regulatory Flexibility Act of 1980 and the Unfunded Mandates Reform Act of 1995. The Regulatory Flexibility Act requires agencies to consider the impact of their rules on small entities. The Unfunded Mandates Reform Act requires agencies to assess the costs and benefits of any regulation that has a material impact on state, local, or tribal governments. Both laws define agencies according to Title 5, Section 551(1) of the US Code; however, the Regulatory Flexibility Act includes independent regulatory agencies, while the Unfunded Mandates Reform Act excludes them.
3. See Executive Order No. 12866, 58 Federal Register 51735 (October 4, 1993) and Circular A-4: Regulatory Analysis, Office of Management and Budget, Executive Office of the President (final-2003 and proposed-2023 versions).
4. Historically these requirements did not apply to independent regulatory agencies. However, recently issued Executive Order No. 14215, 90 Federal Register 10447 (February 24, 2025) mandates that "…all executive departments and agencies, including so-called independent agencies, shall submit for review all proposed and final significant regulatory actions to the Office of Information and Regulatory Affairs (OIRA)…" along with their cost-benefit analyses. OIRA traditionally reviews agency submissions of the potential costs and benefits of "significant" rules (Congressional Research Service, 2024).
5. See Executive Order No. 13563, 76 Federal Register 3821 (January 21, 2011).
6. See Executive Order No. 13579, 76 Federal Register 41587 (July 14, 2011).
7. See Executive Order No. 13992, 86 Federal Register 7049 (January 25, 2021) and Executive Order No. 14189, 90 Federal Register 8655 (January 28, 2025).
8. See Executive Order No. 14192, 90 Federal Register 9783 (February 6, 2025).
9. See Memorandum M-17-21, Guidance Implementing Executive Order 13771, Office of Management and Budget (April 5, 2017).
10. See 2017 Report to Congress on the Benefits and Costs of Federal Regulations and Agency Compliance with the Unfunded Mandates Reform Act, Office of Management and Budget (2017).
11. See Government Accountability Office (2007, 2014, and 2018).
12. See Government Accountability Office (2024).
13. Alessandro, Cardinale Lagomarsino, Scartascini, and Torrealday (2019) show that government transparency, and particularly transparency about government performance (e.g., implementing effective rules), promotes citizen confidence in government. In a meta-analysis of 49 studies, Wang and Guan (2022) show that government transparency has a positive and significant effect on citizen trust in government.
14. See Herrick and Sarewitz (2000), Kovacic (2006), and de Jong et al. (2020) for a discussion of the benefits of ex-post evaluation of policies in the context of environmental policy, competition policy, and infrastructure projects, respectively.
15. The Department of Energy previously highlighted a related motivation, stating that it would "consider how regulations might be designed and written" to better support retrospective evaluation and measurement of outcomes (DOE 2011). Although DOE's subsequent actions regarding this specific goal are uncertain, our work explicitly advances this approach.
16. Even with this separation of drafting and analysis roles, both sets of analysts ultimately answer to SEC leadership. Skeptics could still justifiably worry about the analysts' independence when they have the same bosses.
17. The typical identification assumptions used in measuring the effects of regulations do not require that firms be "similar." Rather, the (partly non-testable) assumptions needed for causal identification are exchangeability (i.e., there is conditional mean independence of treatment and control units), positivity (i.e., the probability of a unit being treated is positive but below one), and stable unit treatment value (i.e., potential outcomes of each unit are unaffected by the treatment assignment of other units). Similarity of treated and control units is one way to satisfy these assumptions but is not necessary.
A potential objection to using differentiated application of regulations to similar firms to identify treatment effects is that, by changing the conditions of competition across firms, introducing a regulation for a subset of firms can also affect outcomes for the non-treated firms, which would violate the stable unit treatment value assumption. This is a potential limitation to consider when applying the identification methodologies discussed below.
18. Differentiated application of regulations is often mandated by statute. This article focuses on the ex-post evaluation of regulations issued by the executive branch agencies, but similar approaches can be followed by Congress to evaluate the effectiveness of statutes ex-post.
19. In the banking regulation context, regulations with thresholds of application abound (see Congressional Research Service (2017) for a listing of all the thresholds that were in place at that time). Several regulations include enhanced requirements for large firms, complex firms, or both, including capital, liquidity, and resolution requirements.
20. Large banking organizations tend to pose proportionally more systemic risk to the financial system than smaller banking organizations (see, for instance, Lorenc and Zhang (2020)).
21. Given the same regulatory requirements, a firm with $1 billion in activity may be able to use the same system, and incur a similar absolute cost, as a firm that performs $2 billion of the activity.
22. France imposes numerous regulations on firms once they employ 50 or more workers, resulting in an unusually high number of firms with exactly 49 employees. Such "bunching" has been estimated to reduce French aggregate productivity by 3.4 percent of gross domestic product (Garicano et al., 2016).
23. The Durbin amendment to the Dodd-Frank Wall Street Reform and Consumer Protection Act requires the Federal Reserve to limit fees charged to retailers for debit card processing.
24. See, for example, Leonardi and Pica (2006) and Vaccaro (2018).
25. For example, consider that the average domestic banking organization subject to Category III requirements had $550 billion in assets as of 2023:Q4. To fall below the $250 billion asset threshold and become subject to Category IV requirements instead, such a bank would have to shed roughly $300 billion in assets; at a typical return on assets of 1 percent, that means giving up about $3 billion in returns. Given observed growth in balance sheets, the current regulations have not caused banks to downsize in this way.
26. See Congressional Research Service (2017) and Labonte and Perkins (2017). The $50 billion threshold was generally increased to $250 billion in the Economic Growth, Regulatory Relief, and Consumer Protection Act of 2018.
27. See List (2024) for a discussion of how to optimally generate policy-relevant evidence before scaling an intervention.
28. See Federal Reserve System (2016) and Office of the Comptroller of the Currency, Federal Reserve System, and Federal Deposit Insurance Corporation (2023).
29. Using staggered implementation for causal identification has long been standard practice in policy analysis (Meyer, 1995). The earliest reference we know of calling for expanded use of staggering to promote policy evaluation is Leuz and Wysocki (2016).
30. Between August 2000 and January 2001, the New York Stock Exchange, in a series of incremental steps, began trading and quoting all its listed securities in increments of a penny. See SEC (2000) for additional details.
31. The primary component of the Economic Stimulus Act of 2008 was a $100 billion program of economic stimulus using tax rebates. The IRS implementation of this stimulus was randomized based on the last two digits of the filer's Social Security number. See Parker et al. (2013) for details.
32. Chung et al. (2004) studied the staggered rollout of decimalization. The Nasdaq Stock Market and the New York Stock Exchange have also performed ex-post analyses of this policy. As predicted, these analyses found that decimal pricing reduced bid-ask spreads. There were concerns the rule would reduce market liquidity, but the analyses failed to find evidence that liquidity declined. Chung, Lee, and Rösch (2020) and Werner et al. (2022) examine a related policy experiment, the 2016 tick size pilot program, which used stratified random sampling to assign different quote and trading price increments. Parker et al. (2013) studied the effect of the staggered IRS rollout of the economic stimulus payments of 2008. This paper helped establish the efficacy of fiscal stimulus spending, finding a marginal propensity to consume of 50 to 90 percent in the first three months after stimulus payments were received. These results imply that such payments are highly effective stimulus.
33. See Bessembinder et al. (2006) and Goldstein et al. (2007).
34. The 2024 CMS staffing rule for long-term care facilities phases in nurse staffing minimums across facilities at different times, based on geographic and facility-specific factors, allowing for future comparisons across rollout cohorts. Similarly, the Department of Labor's 2024 overtime rule raises salary thresholds in two steps, creating a time-based regression discontinuity that could support evaluation of labor market effects. To date, neither rule has been the subject of published ex-post evaluation exploiting this variation.
35. The CECL accounting standard was required for entities that meet the definition of an SEC filer as of January 2020, and all other entities were required to adopt CECL by January 2023 (see Financial Accounting Standards Board, 2016).
36. Stress tests were initially implemented for the largest 19 banks in 2009 (under the Supervisory Capital Assessment Program) and later extended to all banks with more than $50 billion in total assets in 2013, following on the statutory mandate from the Dodd-Frank Act; see Board of Governors (2009, 2013).
37. In an implementation that combines staggering and tiering, the LCR was initially introduced for the largest firms in 2015:Q1, and, later, a modified LCR was applied to a set of smaller yet large firms starting in 2016:Q1 (Office of the Comptroller of the Currency, Federal Reserve System, and Federal Deposit Insurance Corporation, 2014).
38. This occasionally happens in medical trials where a treatment is so effective that the study is ended early over ethical concerns about denying potentially life-saving treatment to the control group. See Mueller, Montori, Bassler, Koenig, and Guyatt (2007) for a discussion of this issue.
39. See Harris et al. (2019) for a description of these and other pilot programs and for several references to studies that were based on these pilot programs.
40. See Abramowicz, Ayres, and Listokin (2011) for further discussion of the benefits of randomizing laws and regulations.
41. See Silverman and Chalmers (2001), Erikson and Stoker (2009), Gramlich (2017), and Weldon (2017) for a discussion of public perspectives on fairness in these contexts.
42. See Rawls (1971) for a discussion of the fairness of equality prior to resolution of uncertainty (i.e., being behind the veil of ignorance).
43. The Mexican anti-poverty program Progresa/Oportunidades was designed for evaluation using a randomized design. The ongoing program provides cash transfers to poor families based on school attendance and doctor visits (Parker and Todd, 2017). Because available funds could not cover all eligible citizens, Oregon randomized who was covered by a Medicaid expansion in 2008, giving important insights into the actual benefits of government health insurance (Allen et al., 2010; DeVoe et al., 2015).
44. The Census Bureau frequently asks multiple versions of a question when introducing a potential question to the American Community Survey. In addition, extensive effort typically goes into the revision of existing questions to validate when and if the new and old series are comparable. See Census Bureau (2020).
45. See Tavernise (2014).
46. See Bureau of Labor Statistics (1995).
Kay, Benjamin S., and Marco Migueis (2025). "How to Design Rules for Ex-Post Evaluation," FEDS Notes. Washington: Board of Governors of the Federal Reserve System, June 26, 2025, https://doi.org/10.17016/2380-7172.3698.
Disclaimer: FEDS Notes are articles in which Board staff offer their own views and present analysis on a range of topics in economics and finance. These articles are shorter and less technically oriented than FEDS Working Papers and IFDP papers.