Transparency for Congress's Scorekeepers
Two of the most important and powerful agencies of the federal government are barely known to the American public. Those in the know refer to them by their three-letter acronyms, but they are not intelligence agencies or secretive law-enforcement units. They don't cut checks, negotiate treaties, regulate commerce, or make policy of any kind. Rather, they consist of teams of economists, modelers, software developers, data scientists, and other technicians and wonks who run computer projections that estimate what proposed legislation will cost and what effects it might have.
The Congressional Budget Office (known as CBO) and the Joint Committee on Taxation (known as JCT) are Congress's scorekeepers. The JCT, the older of the two, was created in 1926 to oversee the Internal Revenue Service. But both agencies have served their scorekeeping role since the enactment of the Congressional Budget and Impoundment Control Act of 1974. That law, which created the CBO and defined the modern mission of the JCT, sought to make Congress less dependent on the executive branch for analysis of the potential costs and fiscal effects of legislation. It empowered CBO to produce economic projections, analyses of the president's budget proposals, and — perhaps most important — cost estimates for any proposed legislation that would require any federal spending. The JCT does the same for legislation that would affect federal revenue. The two agencies therefore work together to provide Congress with estimates of the fiscal effects of its work.
The scorekeepers play a pivotal role in the budget process created by that same 1974 law. Their projections set the parameters for each year's budget process and determine whether proposed legislation meets the requirements for spending, revenue, deficits, and debt that Congress establishes for itself. That means that the assumptions made by CBO and JCT, and the assessments they produce, shape nearly every federal policy debate. Although the agencies acknowledge the uncertainty involved in modeling and projection, they are nonetheless required to give Congress specific numerical estimates of spending and revenue and of other anticipated effects of legislation (effects, for instance, on the number of people with health insurance or the number of people receiving welfare or unemployment benefits).
These projections come to be treated as facts, shaping the way various policy proposals are understood and debated. A May 2017 primer in the New York Times on a proposed Republican health-care bill offered a classic example: After describing how the bill would alter the Medicaid program, the article simply asserted, without citing any source, that "Medicaid cuts would total $880 billion over 10 years." A casual reader might have imagined that such a number was stated in the legislation being summarized. In fact, it was the result of a highly complex projection by the CBO — an estimate laden with assumptions and layers of uncertain modeling decisions, most of which have never been fully revealed to policymakers or the public. But the projection was presented as a simple fact, and defined the contours of the argument about the GOP proposal.
Congress's scorekeepers understand the risks involved in the stature they have attained, and they do work hard to avoid rank partisanship in their analysis. They strive to remain scrupulously neutral — hiring on the basis of professional qualifications, avoiding policy recommendations, basing their assumptions on the consensus of relevant experts, and providing estimates that fall in the middle of the range of possible outcomes in their models.
And members of both parties in Congress clearly value their work. Congress, justifiably, allots over $50 million a year to CBO and JCT, much of which is spent on modeling. To put this into perspective, the combined CBO and JCT budget is equal to more than half of the National Science Foundation's spending on not just economics, but all of the social sciences. All told, Congress has spent billions of dollars on the development and application of the scorekeepers' simulation models. That may be a drop in the bucket of the federal budget, but these are surely some of the most expensive economic models ever developed. While some of the models are little more than spreadsheets in Microsoft Excel, others involve complex computational suites with thousands of moving parts. Some contain sensitive assumptions that dramatically influence a bill's score. All of the models, simple and complex, are constantly evolving to meet the demands of new policy ideas and proposals.
Congress also gives the CBO and JCT rare access to private data that the government collects through its administration of the tax system, the census, and other functions. These administrative data are of enormous value for understanding the consequences of public policy, and the scorekeepers are society's only regular conduit — for the purpose of fiscal-policy analysis — to this treasure trove of government data.
The immense influence and importance of the scorekeepers' modeling, and its capacity to shape the most significant federal policy debates, mean that the stakes of their work are incredibly high, and that mistakes and biases that creep into that work can create dangerous distortions in American public life. The scorekeepers' privileged access to data means that even merely suboptimal analysis — analysis that is free of mistakes or biases but that doesn't extract the full information content of the data — can cost society dearly.
In light of such risks, simply trusting that the agencies will hire well and play things straight cannot be enough. The CBO and JCT have earned Congress's trust, but that trust cannot be blind. And today it too often is. The scorekeepers' models are subject to neither congressional oversight nor systematic independent review. The scorekeepers neglect their duty to research by hiding their models from open review, and Congress shirks its responsibility of oversight by allowing the models to remain hidden.
Secret models make it harder than it needs to be to discover errors or analytical biases, and to get a sense of the degree of uncertainty or the sorts of assumptions baked into key projections. Secrecy undermines the scientific quality of the scorekeepers' work and diminishes its value to the public. Greater transparency, made possible by new technologies but so far studiously avoided by the CBO and JCT, would do both agencies a lot of good — and it would protect the integrity of our policy debates and empower far greater policy experimentation. It is time for Congress to modernize its scorekeepers.
BLACK-BOX MODELING
The secrecy in which the scorekeepers shroud their models threatens both their own work and the broader work of Congress.
For one thing, this secrecy compromises the scientific quality of the work of CBO and JCT by limiting their access to independent review and outside ideas and suggestions. It is true that the scorekeepers document the general approach of some models and publish selected details — in some cases they even share extensive detail, in confidence, with hand-picked reviewers — but they do not provide enough information for Congress or independent experts to systematically review or contribute to their analyses.
This is not to say that the scorekeepers do not have good modelers working for them, or that the models and estimates that they produce are systematically of low quality. That is certainly not the case. The scorekeepers have considerable financial resources, a near monopoly on the best data available to our government, and many excellent modelers and economists. CBO even has advisory panels full of prominent economists to give it structured feedback on its methodology.
But the economy is fundamentally complicated, and the task of modeling it is even more so. The scorekeepers have no monopoly on economic wisdom, technical expertise, or good, practical ideas for how to plausibly model complicated human interactions and decisions. However well-funded they may be, and even if their staffs and budgets were to grow considerably (as they would surely like), it will always be the case that the majority of modeling technicians are not on their payroll and are instead employed by academic institutions and by for-profit and nonprofit firms. This is to say nothing of the growing legions of data scientists, data engineers, and computational-modeling experts from other domains outside the policy world and the economics profession, all of whom could contribute to the scorekeepers' work, if permitted.
We can safely assume that the scorekeepers would produce more accurate scores if their modeling teams could participate openly in the scientific and research communities. But at this point, CBO and JCT leadership enforce rules that prohibit their staff from following the principles of transparency and openness that accommodate scientific collaboration and have become standard practice in the social sciences.
Beyond the detriment to their predictive accuracy, the scorekeepers' guardedness means that their models are not available for Congress or the public to use for themselves. The only available estimates are those that the scorekeepers produce for bills before Congress or in response to special requests from members or committees. Policymakers cannot run the models to test alternative policy scenarios in order to design creative policy proposals. They cannot see how their proposals would fare under alternative assumptions about how the economy will evolve or might respond to a given policy change, since the scorekeepers provide point estimates under only one set of assumptions and conditions. This denies policymakers and staff the opportunity to develop a deeper understanding of why policies have the effects they do and of what risks they might involve, and it makes it harder for them to design good policy.
Members of Congress are actually restricted by the ethics rules of both houses from having proposals modeled by outside experts, since such modeling, which is very expensive, would amount to in-kind contributions to congressional offices. (They can sometimes work informally with academic or think-tank modelers, but only under very restrictive rules.) So the CBO and JCT are effectively their only sources of guidance on the cost and other effects of policy proposals, and yet the scorekeepers do not allow members and staff to understand the details of their modeling work or to consider alternative scenarios. In fact, because the scorekeepers are kept intensely busy by the flow of legislative proposals, they generally cannot engage in exploratory modeling of potential ideas with any but the most senior members of the relevant committees or the congressional leadership. They therefore often act as bottlenecks restraining policy development, rather than enablers of it.
Perhaps worst of all, interested American citizens cannot access information that could inform their votes and shape their understanding of key policy issues. Obviously, most Americans wouldn't look into CBO models on health care or infrastructure even if those were available. But some voters would, and many journalists, activists, and people directly affected by potential policy changes could digest and channel such information for others. The modeling CBO and JCT do, developed at great expense to the public, could serve as an immensely valuable public resource. But instead, the scorekeepers position themselves as gatekeepers and guard such vital information from the public.
Taken together, then, the secrecy surrounding the work of the CBO and JCT creates two kinds of problems: First, a fundamentally important research product is not adequately exposed to outside ideas or the scientific process; and second, a fundamentally important source of information about public policy is kept from both the people's elected representatives and the people themselves.
The stakes are not limited to the cost of mismanaging the tens of millions of dollars that the scorekeepers spend on modeling every year. We are at risk of Congress systematically making ill-informed policy decisions because the scorekeepers — who often have unique access to the best available data — are not fully participating in the wider community of experts, and because their models are not as accessible as they should be. And perhaps more worrying still, conducting important analyses behind a curtain of secrecy can degrade trust in government at a time when such trust is sorely lacking already.
Both Congress and the scorekeepers themselves should want to change this state of affairs. Greater transparency and availability of information would serve everyone involved.
TOWARD REAL TRANSPARENCY
Any significant move toward transparency would confront some practical and logistical challenges that must be thought through. But before considering those challenges, we should think about what greater transparency might involve if there were no constraints. In an ideal world, what would it aim to achieve?
Given that the two most significant pitfalls of the scorekeepers' approach today are that their work is neither adequately exposed to the scientific process nor adequately accessible to outsiders, the question becomes more pointed: What approach might maximize review and contributions from outside experts, maximize accessibility for Congress and the public, and be most effective for building trust in government?
The answer is straightforward: The optimal standard of transparency and accessibility is what scientists call replicability, so the ideal solution is for the scorekeepers to provide enough information about their work for independent experts to replicate their analyses from beginning to end. In practice, this would mean publishing their data along with the source code that prepares and cleans the data, conducts intermediate and final analyses, and generates the results. (Remember, we are discussing the optimal solution in an ideal world — we will soon enough get to constraints and necessary workarounds.)
Replication facilitates scientific review by enabling reviewers to understand exactly what is going on in an analysis in a way that reading an incomplete and possibly inaccurate natural-language description of the analysis never can. Replication also, as the name suggests, solves the problem of accessibility. When outsiders can replicate a model, they can also run the model for themselves. Replicability would allow outsiders to run alternative policy scenarios with alternative assumptions, and so to project what the CBO and JCT would make of various proposals. The principle of scientific replication, applied to the scorekeepers, would allow for outside reviewers and contributors, would facilitate access to the models by Congress and the public, and would be a step toward building greater trust in government.
The pursuit of replication in policy analysis is a topic that I am intimately familiar with as the director of the Open Source Policy Center (OSPC) at the American Enterprise Institute. Our mission is to champion transparent, accessible, and replicable public-policy research. We don't just pontificate on these issues: OSPC staff members, including me, contribute to several open-source economic models for public-policy analysis. These models are similar to, but separate from, the models developed by the CBO and JCT. Open source goes a step beyond replicability, in that the modeling projects are actually designed to welcome and encourage external contributions.
We also lead the development of PolicyBrain, a web application that gives non-expert users access to these open-source models via an easy-to-use online interface. PolicyBrain enables users to design their own policy proposals from scratch or analyze existing proposals. The tools OSPC develops are used by congressional offices, the White House, leading political candidates, think-tank analysts, journalists, academics, students, and the general public.
In many ways, OSPC goes beyond what I am recommending for the scorekeepers — but it is a demonstration of the viability of this more modest proposal. It is not necessary, for instance, to task the CBO and JCT with seeking out external reviewers, contributors, and users, as OSPC does. There should be no mandate that the scorekeepers' code be user-friendly for non-experts, or that they develop web applications like PolicyBrain to facilitate ease of use. The scorekeepers' only mandate should be to produce scores for Congress and to make available the materials that would allow other professionals to fully replicate those scores.
Mandates for further steps aren't necessary because such steps would surely be undertaken by outside experts if the CBO and JCT achieved real replicability. With or without encouragement from the scorekeepers, independent experts would certainly offer their reviews and suggestions, and would make the replicated models easy for non-experts to use. OSPC, for instance, could develop web applications like PolicyBrain expressly for CBO and JCT models. We could market these tools to policymakers and the public so that everyone would have access to the information. The demand for this information would be substantial: You could easily imagine digital teams at the top newspapers, think tanks, and many other organizations making competing web applications available. The models would ultimately be accessible to everyone, and competition among providers would lead to polished, easy-to-use products for both policymakers and the public.
Imagine congressional staffers and lawmakers trying out various iterations of their policy proposals to see which one the CBO would score as having the most positive effects. Imagine them testing and fine-tuning their policies before proposing them. Imagine engaged members of the public doing the same, and policy debates hinging on disagreements over key assumptions in the models, which could be addressed with further empirical study. All of that would be possible with full replicability.
Full replicability is, therefore, the optimal solution. Should this replication standard exist in its ideal form, outside experts would be allowed to make intellectual and technical contributions at will. Society would fully utilize the private administrative data available to the scorekeepers, and the scientific component of the scorekeepers' work would be reinforced.
FALSE CONSTRAINTS
If replicability would have such enormous advantages, why hasn't Congress required the scorekeepers to pursue it? And why haven't CBO and JCT done it on their own?
Proposals to move in this direction have been made by a variety of experts and observers in recent years but have met with resistance and concern. Some of the most common causes for worry are actually easily answered. But others are more serious, and would have to be addressed before any meaningful moves in this direction could be made.
One frequently cited constraint certainly isn't a reason to hold off: When presented with the case for replicability, the scorekeepers sometimes respond that severe resource constraints — like having too little funding, too few staff, and generally not enough time given their legal obligations to Congress — stand in their way. But these arguments are misleading at best. There may need to be some consideration for the transition to replicability, and for how to carry it out while still doing all the work Congress demands of the scorekeepers. But beyond the transition, the scorekeepers' budgets need not be higher to accommodate full replicability. In fact, there is a path to ensuring replicability that should increase the scorekeepers' efficiency, make their models cheaper and easier to maintain, and make their staffing and training requirements easier to satisfy. And that is not even accounting for the free technical contributions from independent experts that the scorekeepers are likely to receive.
To understand this point, we need to delve into a few basics of software development. Remember, the scorekeepers' models, both spreadsheets and complex computational suites, are fundamentally software products. There are four basic techniques that are widely recommended to ensure the quality of any software project. These four techniques are somewhat technical, but understanding them is essential to understanding why the scorekeepers are wrong to suggest that resource constraints would prevent replicability. They are version control, automated testing, documentation, and internal code review.
Implementing version control means adopting a system for carefully tracking changes to software and data over time. Implementing automated testing means building the capability to run your software automatically in order to see, among other things, how your results change over time. Writing documentation means recording descriptions of the structure and details of your software and data, along with instructions for how to use them. Practicing internal code review means ensuring that every substantive change is reviewed by someone beyond the modeler proposing it. These four techniques provide immeasurable efficiency gains for development teams. Most important, they are essential for limiting the bugs in software and generally ensuring its quality.
If these basic standards for software quality are followed in a modeling project, then the additional cost of providing external replicability (in terms of time, staff resources, and money) should be negligible. Every one of the requirements for external replication is satisfied simply by following these standard software-development practices internally. With version control you have an easy way to track the versions of your model that contributed to any particular analysis. With automated testing you have concrete examples — in the automatic tests themselves — of exactly how to run your model under a wide variety of scenarios, including, ideally, the scenarios that you publish in your reports. With documentation you have pre-written English descriptions of your code and data. With internal code review you gain experience making your project accessible to reviewers. Once these basic quality assurances are in place, the marginal cost of providing external replicability, above and beyond writing high-quality models, is essentially zero. And again, this is not even accounting for any valuable external contributions.
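To make this concrete, consider a minimal sketch, in Python, of how an automated test doubles as a replication recipe. The toy model, its parameters, and the expected value below are all invented for illustration and reflect no agency's actual code; the point is simply that a test records exactly how a model was run and what result a given version should produce, which is most of what an external replicator needs.

```python
# A minimal, hypothetical sketch of an automated test for a toy scoring model.
# Nothing here comes from CBO or JCT code; all numbers are made up.

def score_cigarette_tax(tax_per_pack, baseline_price, baseline_packs,
                        elasticity=-0.4):
    """Toy model: a per-pack tax raises the price, consumption responds,
    and revenue is the tax times the packs still sold."""
    pct_price_change = tax_per_pack / baseline_price
    new_packs = baseline_packs * (1 + elasticity * pct_price_change)
    return tax_per_pack * new_packs


def test_score_matches_published_estimate():
    # The expected value is illustrative; in practice it would be the
    # number published in the corresponding report.
    revenue = score_cigarette_tax(tax_per_pack=1.00, baseline_price=5.00,
                                  baseline_packs=1_000_000)
    assert abs(revenue - 920_000.0) < 0.01
```

A replicator who can run a test like this can also change an assumption, such as the elasticity, and see how the score moves, which is precisely the kind of exploration that members and the public cannot do today.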
Given the importance of their work and the need for accuracy, it may be surprising to learn that the scorekeepers do not consistently follow these standards (a fact I have learned through conversations with scorekeeping staff and others close to their work). Failing to follow these practices leads to something called "technical debt." And the term is awfully apt. It describes a situation in which you have sacrificed quality for some expediency and are now paying a long-term carrying cost in the form of lower-quality software that is more expensive to maintain.
This is not to say that the scorekeepers must necessarily adopt these four techniques before they begin practicing replication. But it does show that resource constraints are no excuse: If making their models replicable is not free or close to free, then some other mismanagement or inefficiency must be at work, and it would need to be addressed regardless of any new effort to achieve replicability.
The scorekeepers themselves should want to pay off their technical debt and adopt these techniques as their path to replicability, but they should also be encouraged to do so by Congress. This is both because they need to know that Congress prioritizes this work and will allow them to devote some energy and resources to the required transition, and because they need to know that Congress will not penalize or embarrass them for their past practices. It is likely that part of the reason CBO and JCT do not already practice replicability is that they are embarrassed by the software quality of their models. Given that trust in the scorekeeping institutions should be carefully protected, Congress should be sensitive to their plight, and, should the scorekeepers request a transition period and temporary funding, Congress should consider those requests seriously.
Beyond resource constraints or the potential for embarrassment, both of which could be addressed through better software practices, the scorekeepers might also object to replicability on the grounds that nefarious external influences (like interest groups and lobbyists) would propose changes to the models that are not scientifically founded and that are paid for by rich clients. But this, too, is a false constraint. For better or worse, the scorekeepers are already a target for lobbying and political influencers. Ex-staffers of both agencies play an active role in the industry as private intermediaries, valuable for their collegial relationships with the scorekeepers and their intimate and non-public knowledge of the scorekeepers' methods and assumptions. There are no public records of the meetings that ex-staffers or other paid influencers have with CBO and JCT staff.
It would be much better for information on the scorekeepers' methods to be available to all and for any changes that the scorekeepers introduce to their models — perhaps influenced by these external lobbyists — to be visible to all. The ultimate responsibility for accepting suggestions will continue to fall to the scorekeepers, so the only way that a nefarious change could be introduced is if the scorekeepers accept it. The scorekeepers' responsibility would be to accept good ideas from experts and reject bad ideas from special interests, ideologues, and cranks. They have the same responsibility today, but replicability will mean that their assurances to Congress and the public of their diligence can be verified by external review and oversight, and not rely solely on trust.
Just as any honest manager appreciates having someone else review his expenses to prevent even the appearance of impropriety, an honest scorekeeper should appreciate having outside review of his imperviousness to nefarious external influencers. If this seems like an additional burden on the scorekeepers, it doesn't have to be: They can easily set up a system by which all proposals and suggestions would be made publicly, enabling outside reviewers to help the scorekeepers evaluate their merits. Whatever process the scorekeepers choose to adopt for dealing with outside influencers, the situation would be no worse than it is today — and these adjustments offer an easy path to significant improvement.
PROTECTING PRIVATE DATA
False constraints aside, there is one true limit to full replicability of the work of CBO and JCT: the data themselves. The private data that the scorekeepers use — valuable data provided by all of us through our interactions with government and made available to CBO by agencies like the Internal Revenue Service, the Social Security Administration, and the Census Bureau — cannot readily be shared due to (justified) legislation that protects data privacy. This is not a constraint on the replicability of all of CBO's and JCT's work, as not all of their modeling involves private data. But some does. And in those cases, the optimal level of replication — allowing independent experts to go from basic data through final analyses, step alpha through step omega — is in fact impossible.
But even in those cases, all hope is not lost. The limitation of having incomplete access to private data cannot be fully overcome, but it can be mitigated. In particular, if the scorekeepers carefully document their private data, without disclosing any of them, then near-complete transparency and accessibility are still possible. The documentation must be thorough. The scorekeepers will need to describe exactly which private variables their data contain and the statistical properties of each of those variables, including observation counts, averages, distributions, and the important arithmetic and statistical relationships among the variables.
With sufficient public documentation, independent experts can create alternative datasets that will stand in for the private data and allow the models to run. A simple dummy dataset (or "test dataset") might do no more than that. It could be constructed with randomly generated data to match the basic outline of the real data in such a way that the model would run but would produce unintelligible results. But a carefully constructed alternative-production dataset could do much more than that — not just allowing the model to run but producing sensible results. Alternative-production datasets could be constructed from publicly available data sources that are analogous to but of lower quality than the administrative data, and they could go through multiple stages of "statistical therapy" until they match the statistical properties of the private data closely enough to accommodate specific models.
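As a rough illustration of how far documentation alone can go, here is a sketch, in Python, of how an outside replicator might construct a dummy file from nothing but published variable names, observation counts, and summary statistics. The variables and figures below are invented for the example and bear no relation to any actual CBO or JCT data.

```python
# A hypothetical sketch of building a dummy dataset from published
# documentation alone: variable names, an observation count, and summary
# statistics. All values are invented for illustration.
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
N_RECORDS = 100_000  # documented observation count (illustrative)

# Documented statistical properties of each private variable (illustrative).
documented_stats = {
    "wages":          {"mean": 52_000, "std": 40_000},
    "capital_gains":  {"mean": 3_500,  "std": 25_000},
    "num_dependents": {"mean": 0.9,    "std": 1.1},
}

dummy = pd.DataFrame({
    name: rng.normal(stats["mean"], stats["std"], N_RECORDS)
    for name, stats in documented_stats.items()
})
dummy["num_dependents"] = dummy["num_dependents"].clip(lower=0).round()

# The dummy file has the right shape and a rough statistical profile, so
# data-preparation and model code can run on it end to end, even though
# the results it produces would not be meaningful.
```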
If the scorekeepers adopt replicability, dummy datasets will be available first due to their simplicity and relative ease of construction. And the value of even such starter dummy datasets should not be underestimated. It can be difficult to understand how a model works without running it yourself; you can't change some code and see what happens. But with dummy data, you can. Because the source code for a model fully documents its assumptions and methods, and because dummy data would allow independent experts to run and fully understand that source code, the production of dummy data would facilitate complete methodological transparency.
By allowing independent experts to run the source code and modify it, simple dummy data can also facilitate technical contributions from outside experts. These contributions can be methodological suggestions, implemented in a way that still allows the model to run and pass all of its tests, or they can be contributions to basic software quality, like refactoring code or adding new tests or documentation. Certainly the scorekeepers won't be required to accept any of these suggestions, but, as someone who maintains economic models, I can tell you that it is far better to receive suggestions that work than suggestions that break your model. So even simple dummy data are enough to enable external review and collaboration, and to kick-start scientific progress.
That said, the fact that experts can't use dummy data to generate actual alternative estimates would certainly be an enormous constraint on real replicability. That is where alternative-production datasets come into play. For example, in one of the projects that I am currently working on, we rely on a dataset from the Internal Revenue Service's Statistics of Income division that costs over $8,000. For many users, that price is prohibitive, and the data may as well be private. The first thing we did was create a dummy dataset that allowed external contributors and reviewers to assess our model. But many members of our community still wanted to generate sensible results for themselves, and the dummy dataset wouldn't allow that. To solve the problem, our community has developed the first version of an alternative-production dataset. That dataset is based on the Current Population Survey (which is freely available), and it includes various imputations and other adjustments to make it similar to the IRS file. Now users without the IRS file can run analyses for themselves with our model; the answers may not always be perfect, but in most cases they are close and provide valuable policy insights. Independent experts could create similar alternative datasets for the scorekeepers' models to solve the same problem.
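For a sense of what one stage of that "statistical therapy" can look like, here is a simplified sketch of reweighting a public survey file so that its weighted totals match a published target from the administrative data. Real projects rely on far more sophisticated methods, including imputation and statistical matching, and every number below is invented for illustration.

```python
# A simplified, hypothetical sketch of one stage of "statistical therapy":
# reweighting a public survey file so that its weighted total matches a
# published target from the administrative data.
import numpy as np
import pandas as pd

# Public survey records with initial weights (an illustrative stand-in
# for a file like the Current Population Survey).
survey = pd.DataFrame({
    "wages":  [30_000, 55_000, 120_000, 250_000],
    "weight": [40_000, 35_000,  15_000,   5_000],
})

# Published target from the administrative file: total wages (illustrative).
target_total_wages = 9.0e9

current_total = (survey["wages"] * survey["weight"]).sum()
survey["adjusted_weight"] = survey["weight"] * (target_total_wages / current_total)

# After adjustment, the weighted total matches the published target.
assert np.isclose((survey["wages"] * survey["adjusted_weight"]).sum(),
                  target_total_wages)
```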
OTHER PATHS TO FULL REPLICABILITY
If the scorekeepers were to adopt replicability, dummy data and alternative datasets would surely be created. They would satisfy the requirements of transparency, enable contributions, and go a long way toward accessibility. Yet there are still more ways — theoretical for the moment, but promising nonetheless — that greater accessibility might be achieved.
First, the government agencies that currently warehouse the private data could set up research-data centers that would allow researchers, under close supervision, to run the scorekeeping models on the real data. Research-data centers already exist in some government agencies, and more are being created. There would need to be coordination among government agencies to ensure that the necessary data are all in one place, but there has been movement in this direction; the Commission on Evidence-Based Policymaking was created in 2016 to develop a strategy to ensure that evidence about government programs is collected efficiently and routinely. (The commission was intended to be temporary and recently published its final report.)
The second possibility is that the government agencies, the scorekeepers, or researchers working on research datasets could create synthetic datasets from the original private data that could be shared with fewer restrictions than the private data themselves. This project would need to overcome significant technical challenges, but again, there is already some progress underway, including a major project at the Urban-Brookings Tax Policy Center that aims to create synthetic tax data and would be invaluable for running revenue-estimation models and more.
Finally, outside experts could develop a web application, like the Open Source Policy Center's PolicyBrain simulation suite, that would make the scorekeepers' models available to non-technicians via an easy-to-use interface. This web application could be powered by an alternative-production dataset, a synthetic dataset, or — in cooperation with the scorekeepers or the private-data providers, and after implementing disclosure-avoidance algorithms and other security measures — even the private data themselves. In fact, all of the source code for PolicyBrain is open source, and PolicyBrain itself could be adapted to work with at least some of the scorekeepers' models. PolicyBrain already implements the disclosure-avoidance algorithm that is necessary for making tabulations available from private data, because we have to protect the $8,000 IRS data file mentioned above.
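For readers unfamiliar with disclosure avoidance, here is a minimal sketch of one common rule, a minimum cell count for tabulations. It illustrates the general idea rather than the particular algorithm PolicyBrain implements, and the threshold is invented for the example.

```python
# A minimal, hypothetical sketch of one common disclosure-avoidance rule:
# suppress any tabulation cell built from fewer than a minimum number of
# underlying records.
import pandas as pd

MIN_CELL_COUNT = 10  # illustrative threshold


def safe_tabulation(df: pd.DataFrame, group_col: str, value_col: str) -> pd.DataFrame:
    """Tabulate value_col by group_col, withholding thinly populated cells."""
    grouped = df.groupby(group_col)[value_col].agg(["count", "sum"]).astype(float)
    thin = grouped["count"] < MIN_CELL_COUNT
    grouped.loc[thin, ["count", "sum"]] = float("nan")  # suppress both figures
    return grouped
```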
Of these three possibilities, the development of web applications like OSPC's PolicyBrain running on alternative-production datasets is the most practicable and most likely. In fact, it is almost certain to happen if the OSPC is still around when the scorekeepers become replicable: We'll build it ourselves. Others surely will too.
For many reasons, then, the limiting factor of data availability is not really all that limiting. Using both already-available techniques and some readily imaginable ones, the challenge certainly can be overcome.
LEGISLATING REPLICABILITY
The Congressional Budget Office and Joint Committee on Taxation work for Congress. They have some freedom to alter their techniques, but ultimately a serious move toward replicability should be guided by congressional instructions, and these will need to take the form of legislation enacted into law. Congress should work with the scorekeepers in developing such a law, but lawmakers and staff should keep in mind a few essential elements.
First, the scorekeepers should be instructed to release source code for all models, intermediate and final analyses, and all data-preparation routines used during their analysis. The source code that implements a model is the only complete documentation for that analysis. Natural-language descriptions will invariably leave out crucial details (as we see in today's CBO and JCT reports). Independent experts cannot replicate a model based on a written description alone.
Adding intermediate analyses to the list of required disclosures would capture any analysis that yields estimates, assumptions, or other inputs that affect the final model. For example, if the JCT needs to estimate the relationship between the price of cigarettes and cigarette consumption before it can estimate the revenue impact of a cigarette tax, then the source code for that estimate should be fully disclosed. Data-preparation routines, meanwhile, comprise some of the core work of constructing a model. Analysts spend much of their time preparing databases that serve as inputs to their models. Replicators should be able to recreate every step from raw data to final output, and that includes being able to create these databases from scratch, including all cleaning, statistical matching, imputations, and other steps. These scripts should be made available even where the raw data cannot be released.
Second, the scorekeepers should be given relatively detailed instructions for the release of data. They should be required to release all unprocessed data used in their analyses not covered by any statute prohibiting release. For data that cannot be released because of laws that protect privacy, analysts should release comprehensive variable descriptions and summary statistics. This information should allow outside replicators to construct a test dataset that enables them to run the data-preparation routines and model code from beginning to end. The scorekeepers should also cite the statute that prevents release of the data and provide contact information for the governmental provider of that data.
Third, the law must insist on full documentation. This means the scorekeepers should be required to release documentation for their analysis, code, and data. This is the most subjective of the requirements laid out here, so legislation might need to be especially specific. The goal is for the documentation to allow a competent outsider to understand the methodology, program implementation, and approach to running the source code.
Fourth, the law should specify a trigger to release replication materials whenever a numerical result is distributed. Simply put, this means requiring the CBO and JCT to release replication materials at the same time they release any of their official products: a score, assessment, or projection for Congress's use. If that product is issued privately to a member of Congress or committee, then replication materials may also be released privately. (The private recipient, of course, may choose to release the replication materials publicly.) If the result is distributed publicly, then the replication materials should also be released for public consumption. This trigger mechanism will ensure the timely release of replication materials.
Fifth and finally, legislators should give the CBO and JCT a reasonable transition period to prepare for this new way of doing business. Every item proposed for public release above should already be maintained by a diligent and careful team of analysts. Each of these items is essential for quality control. That said, not every analyst spends as much time on quality control as he should or would like to, especially when faced with pressing deadlines backed by legal force. And it does appear that some of these materials are not currently maintained by CBO and JCT analysts, which means that they will need to be prepared. Separating data that can be published from data that cannot will also take time.
This suggests an effective date no sooner than six months after the legislation is passed. If the scorekeepers make a strong commitment to adopt better internal software practices as a path toward achieving replicability, then a longer transition period and a temporary budget increase should be considered as well.
THE PROMISE OF TRANSPARENCY
Some legislators have already begun to see the promise of such steps. The CBO Show Your Work Act, introduced in this Congress by Senator Mike Lee and Representative Warren Davidson (Republicans from Utah and Ohio, respectively), would apply standards much like the ones described above to the CBO. There is no pending legislation to address replicability at the other congressional scorekeeper, the JCT, but such a requirement could easily be added to the same measure.
Congress should also consider applying the same standards to the executive-branch scorekeepers at the Office of Management and Budget and the Office of Tax Analysis at the Treasury Department. And beyond scorekeeping, pending legislation called the Comprehensive Listing of Evidence for Assessments of Regulations (or CLEAR) Act would apply similar standards to the regulatory agencies for assessments of regulation, including cost-benefit analyses. This could lead to major improvements in the openness and transparency of the regulatory process.
Going further still, the statistical agencies, and indeed any government agencies that produce analyses that influence policy, should over time be expected to submit their work to the scientific process by enabling replication through transparency.
But Congress's scorekeepers should come first. Their assessments play an essential part in the work of the legislative branch, and that work has already suffered from the excessive secrecy in which the development of those assessments is shrouded. For the sake of their own integrity and reputation, and for the sake of good legislating, it is time to make the work of the Congressional Budget Office and the Joint Committee on Taxation more transparent, more replicable, and therefore more reliable.