In recent months, health officials in the Trump administration have consistently emphasized the role of artificial intelligence (AI) in revolutionizing healthcare. They have touted it as a game-changer for expediting the approval of new life-saving drugs, streamlining operations within multibillion-dollar health agencies, and helping curb unnecessary government spending without compromising quality. Health and Human Services Secretary Robert F. Kennedy Jr. has been particularly vocal, declaring in congressional hearings that a new “AI revolution” has commenced.
During those hearings, Kennedy said that the Department of Health and Human Services (HHS) has begun implementing AI technologies to improve data management in healthcare, keeping data secure while accelerating drug approvals. The declaration illustrated the palpable excitement about AI applications in healthcare among certain officials.
Just weeks before Kennedy’s statements, the U.S. Food and Drug Administration (FDA), the regulatory arm of HHS that oversees vast sectors of the American pharmaceutical and food systems, introduced an AI tool named Elsa, designed to dramatically expedite the approval of drugs and medical devices. Despite the tool’s promise, the response within the agency has ranged from indifference to concern.
According to six current and former FDA officials who spoke on condition of anonymity, Elsa has proven useful for tasks such as drafting meeting notes and templated communications. However, they also warned that the tool has shown troubling tendencies, including fabricating nonexistent studies, a phenomenon referred to as AI “hallucination,” and misrepresenting existing research. This unreliability raises significant concerns about using Elsa for critical FDA work.
The implications of these inaccuracies are profound. For instance, one FDA employee said that any output that cannot be double-checked must be treated as unreliable, and emphasized that the AI can “hallucinate confidently,” a stark departure from the assurances offered by its developers. Another employee noted that while the AI is intended to save time, it often leads to wasted effort because of the added vigilance required to verify its output.
Furthermore, Elsa’s capabilities in drug and device review remain limited. The AI lacks access to essential documents such as industry submissions, and so cannot answer fundamental questions like how many times a company has sought FDA approval or which related products it markets. This raises serious concerns about the efficacy of a tool that FDA Commissioner Dr. Marty Makary has claimed will revolutionize drug and medical device approvals, at a time when federal oversight of AI use in medicine appears scant.
Despite these concerns, the FDA maintains that it is already using Elsa to speed up protocol reviews and prioritize inspection targets. Makary himself has said that most FDA scientists use Elsa mainly for organizational tasks such as identifying studies and summarizing meetings. The agency’s head of AI, Jeremy Walsh, acknowledged that Elsa can “hallucinate,” noting that such shortcomings are common across generative AI models.
Walsh also indicated that improvements are on the horizon, with updates expected in the coming months, including broader information access through user document uploads. However, when confronted about the AI’s mistakes, Makary stated that the use of Elsa is not mandatory for staff and that concerns had not been explicitly raised with him.
This acknowledgment raises questions regarding the touted gains in efficiency, with staff indicating that the necessity for double-checking AI output detracts from potential time savings. Moreover, the initial versions of Elsa were based on an earlier AI model developed during the Biden administration, indicating a prolonged period of evolution for this tool.
FDA leadership engaged staff from various centers to gather insights that informed Elsa’s development and launch. Training on Elsa remains voluntary, and while over half of FDA employees have used the platform, engagement in some areas has been considerably lower. Despite an optimistic start, adoption among staff has been relatively weak.
Moreover, employees testing Elsa’s capabilities have encountered significant errors. Basic functions, such as accurately summarizing studies, remain unrefined, since the AI cannot reliably identify the most important parts of multi-page documents. Testing has shown that Elsa frequently miscounts products and returns erroneous information when queried, underscoring the need for human oversight to validate its output.
In light of these issues, Walsh acknowledged that some of the reported problems did not surprise him, pointing to the procedural adjustments needed to close gaps in Elsa’s capabilities. He framed these refinements as part of improving AI tools as the underlying models evolve, suggesting the FDA remains committed to harnessing AI’s potential despite the evident obstacles.
Ultimately, the rapid integration of AI into U.S. health agencies reflects an interest dating back to at least 2018, when discussions of AI’s applications in national security built momentum for further evaluation. Meanwhile, the U.S. has lagged behind other regions, such as Europe, where rules on AI use, particularly in healthcare, have been firmly established through legislation like the recently approved EU AI Act.