Diagnosing Healthcare

Kevin Lewis

May 18, 2026

Performance of a large language model on the reasoning tasks of a physician
Peter Brodeur et al.
Science, 30 April 2026, Pages 524-527

Abstract:
More than 65 years ago, complex clinical diagnostic reasoning cases were introduced as the gold standard for the evaluation of expert medical computing systems, a standard that has held ever since. In this study, we report the results of a physician evaluation of a large language model (LLM) on challenging clinical cases across five experiments with a baseline of hundreds of physicians. We then report a real-world study comparing human expert and artificial intelligence (AI) second opinions in randomly selected patients in the emergency room of a major tertiary academic medical center. In all experiments, the LLM outperformed physician baselines and displayed continued improvement from prior generations of AI clinical decision support. Our study suggests that LLMs have eclipsed most benchmarks of clinical reasoning, motivating the urgent need for prospective trials.

Sociodemographic Variability in Pediatric Emergency Decisions by AI
Mahmud Omar et al.
Pediatrics, forthcoming

Methods: We analyzed sociodemographic variations in pediatric emergency recommendations from an ensemble of 10 LLMs, evaluating 500 validated standardized cases and 500 real clinical scenarios, totaling more than 3.7 million model outputs.

Results: Significant deviations emerged, particularly for cases labeled with socioeconomic adversity, such as unstable housing or low family income. Although increased vigilance toward certain risk factors might be clinically reasonable, the magnitude and consistency of model recommendations were notably high compared with the physician-derived ground truth, especially for low-income and immigrant groups. Intersectionality involving Black race consistently intensified these differences. For example, cases labeled Black unhoused received substantially higher recommendations for urgent interventions (+10.5 percentage points [pp]; adjusted P < .001), additional investigations (+14.1 pp; adjusted P < .001), and suspicion of maltreatment (+26.6 pp; adjusted P < .001), even without clinical justification, compared with white or high-income cases. The LLMs also demonstrated clinical sensitivity to caregiver demographics, as expected. However, caregiver factors were associated with different recommendation patterns to a slightly lesser degree yet still showed significant variations and similar trends as child factors.

Changes in Nonprofit Hospitals’ Finances, Operations, and Quality of Care After Using Management Consultants
Joseph Dov Bruch et al.
Journal of the American Medical Association, forthcoming

Design, Setting, and Population: Observational study using a stacked difference-in-differences design to compare 306 US nonprofit hospitals that used a management consultant firm for the first time in 2010-2022 with 513 matched hospitals that did not use management consultants during 2009-2023.

Results: More than 20% of nonprofit hospitals hired management consultants during the study period. Nonprofit hospitals that hired management consultants paid an average of $15.7 million for their services, and nonprofit hospitals collectively spent more than $7.8 billion on these services from 2009 to 2023. Despite this substantial investment, analyses of hospitals’ financial performance, operational decisions, and claims-based patient outcomes revealed little evidence of substantial, statistically significant, or systematic improvements attributable to consulting engagements. Relative changes were estimated for financial measures, such as net patient revenue (−2.22%; 95% CI, −5.11% to 0.76%; P = .14), operating expenses (−1.07%; 95% CI, −3.56% to 1.49%; P = .41), fixed assets (2.05%; 95% CI, −6.54% to 11.42%; P = .65), bad debt (−6.31%; 95% CI, −19.82% to 9.48%; P = .41), days’ cash on hand (−8.56%; 95% CI, −28.00% to 16.13%; P = .46), total margin (−0.19 [95% CI, −1.20 to 0.82] percentage points; P = .71), and operating margin (0.15 [95% CI, −0.94 to 1.23] percentage points; P = .79). Relative changes were estimated for operational measures, such as inpatient length of stay (1.71%; 95% CI, −0.34% to 3.81%; P = .10) and total inpatient days (0.29%; 95% CI, −2.57% to 3.23%; P = .85). Relative changes for quality-of-care outcomes were also generally not significant. The sole exception was 30-day readmission for patients with stroke (1.37 [95% CI, 0.14 to 2.61] percentage points; P = .03), which was not robust to alternative specifications.

Primum Non Nocere: The Unintended Consequences of Financial Reporting Frequency in Healthcare
Bin Li et al.
Vanderbilt University Working Paper, April 2026

Abstract:
Does a change in financial reporting frequency affect managerial decisions in the healthcare sector? The answer is yes, and notably, such decisions can have detrimental societal outcomes. Using a regulatory change that required Massachusetts hospitals to move from annual to quarterly financial reporting, we document that more frequent reporting shifts managerial attention and resource allocation toward near-term financial outcomes. Following the mandate, hospitals earn higher profitability and improve operating efficiency. Further analysis reveals that these outcomes stem from scaling back labor and capital investment, rather than from improved revenues. More importantly, these changes are accompanied by a deterioration in patient care quality, reflected in higher patient mortality rates, increased hospital readmissions, and lower patient satisfaction. Our paper highlights a negative externality associated with a well-intentioned change in the frequency of financial statement reporting.

Beyond the Drug Label: Regulatory-Induced Complexities in Health Information
Marie Yeh et al.
Journal of Consumer Affairs, Summer 2026

Abstract:
We investigate how content-centric regulation obligates pharmaceutical companies to provide material information that includes balanced information about a drug's benefits and risks to consumers. Paradoxically, this regulatory-compliant information results in information so complex it leaves a vacuum of easy-to-digest and useful consumer information. This research compares company-provided information regulated by the Food and Drug Administration (patient package inserts) with consumer-to-consumer provided information, user-generated content (UGC) in the form of prescription drug reviews, coded for risk and benefit and calculating readability and linguistic metrics. We apply a consumer-centric information complexity framework which identifies the disconnect between pharmaceutical company practices (as regulated) and UGC (relatively unregulated). Analyses show that UGC drug reviews present risks and benefits in a more balanced manner than manufacturer-created patient labels. Findings identify regulatory complexity as a driver of information inadequacy that may push consumers to UGC, indicating a need for information co-production between consumers, producers and regulators.

Regulated versus unregulated competition: How drug shortages boost illegal pharmacy sales
Luis Diestre & Benjamin Barber
Strategic Management Journal, forthcoming

Abstract:
This study examines how product shortages among legal firms create competitive opportunities for illegal firms. We analyze 713 drug shortages in the United States between 2017 and 2023 and show that shortages substantially increase illicit pharmacies’ sales of affected drugs, which rise by 40.5% during shortages and remain 32.8% higher in the six months after shortages are resolved. The effects are strongest for drugs treating chronic conditions, drugs with fewer substitutes, and drugs experiencing more intense shortages. We also find spillover effects, as illegal sales of drugs typically consumed alongside the shortage drug also increase during shortages and remain higher after shortages are resolved. This evidence underscores how regulatory constraints that limit legal firms’ flexibility create opportunities for illegal competitors to capture unmet demand.

Common Agent or Double Agent? Pharmacy Benefit Managers in the Prescription Drug Market
Rena Conti et al.
Review of Economics and Statistics, forthcoming

Abstract:
Pharmacy benefit managers dominate the U.S. pharmaceutical market but are controversial and poorly understood. We analyze PBMs as market intermediaries that operate formulary contests in which on-patent brand-drug makers compete for favorable placement by offering rebates off list price. These formulary contests deliver efficiency gains compared to drug makers selling directly to consumers; PBMs capture some of these gains. Our approach answers key questions regarding the determinants of efficiency, rebates, list prices, and PBM market power in the pharmaceutical market. Our analysis also explains how common contracting practices, federal regulations, and incentives within formulary contests can undermine market efficiency.

Can Employees’ Past Helping Behavior Be Used to Improve Shift Scheduling? Evidence from ICU Nurses
Zhaohui (Zoey) Jiang et al.
Management Science, forthcoming

Abstract:
Employees routinely make valuable contributions at work that are not part of their formal job description, such as helping a struggling coworker. These contributions, termed organizational citizenship behavior, are studied from many angles in the organizational behavior literature. However, the degree to which the past helping behavior of employees scheduled to a shift impacts that shift’s operational outcomes remains an underexplored question. We define two measures of past helping behavior for members of a shift -- the total past helping of each employee and the past helping between each pair of employees -- and hypothesize that they are associated with shift performance. We empirically confirm our hypotheses with detailed scheduling and patient outcome data from six intensive care units (ICUs) at a large academic medical center, using the hospital’s electronic medical records to identify cases of one nurse helping another. Our empirical results indicate that both measures of past helping are predictive of patient length of stay (LOS), more so than the broadly studied notion of team familiarity. Counterfactual analysis shows that relatively small changes in shift composition can yield significant reduction in total LOS, indicating the managerial significance of the results. Overall, our study suggests the potential value of shift scheduling using data on past helping behaviors, and this may have promise far beyond the selected application to ICU nursing.

Nurse practitioner scope of practice law and safety net participation: Evidence from WIC
Owen Fleming & Lilly Springer
Contemporary Economic Policy, forthcoming

Abstract:
Given that nurse practitioners (NPs) are likely to practice in underserved areas, NP scope of practice reform may have spillover effects on safety net program participation. Leveraging the staggered rollout of NP full practice authority (FPA) across states, we estimate the effect of NP FPA on Women, Infants, and Children (WIC) participation. We find that starting 4–5 years after FPA, WIC participation increases by 4.1%, rising to 7.4% a decade post-FPA. The increase is driven by the enrollment of women and children and may be attributable to the patient-centered care delivery of NPs, which reduces the stigma and information costs associated with WIC participation.

Increasing Accountability and Compliance with Robot Advice
Jana Holthöwer, Jenny van Doorn & Stephanie Noble
Journal of Marketing, forthcoming

Abstract:
Service robots on organizational frontlines, notably in health and elderly care settings, promise to tackle staff shortages. In such service contexts, compliance is crucial for consumer well-being, but compliance with robot advice remains problematically low. This research explores how the source of robot advice affects compliance in human-robot interactions. In six studies, including four field studies with real human-robot interactions, the authors demonstrate that consumers are more likely to comply with advice given by a robot service provider when the source of advice is a human rather than the robot itself. This is because a human source of robot advice increases the feeling of accountability, or the expectation that one might need to justify one’s actions to others, which is more difficult to achieve with only robot social presence. In turn, this fosters advice adherence, which also persists over time across repeated interactions. However, when the robot embeds social cues in the advice, the difference in accountability and compliance between robot-only and robot advice with a human source attenuates. These insights hold enormous promise, especially for health care practitioners, institutions, and consumers for whom increased compliance can lead to better health outcomes, reduced hospital readmissions, improved recovery, and elevated well-being.

number 67 • Spring 2026
