AI Helps EPA Standardize and Sharpen Toxic Chemical Data
Each year, more than 20,000 facilities across the U.S. report to the Toxics Release Inventory (TRI), which tracks the management and release of billions of pounds of toxic chemical waste. This resource is one of the Environmental Protection Agency’s (EPA) most powerful tools for understanding toxic chemical releases from industrial and federal facilities and for supporting informed decision-making by communities. Facilities must report what happens to each listed chemical. This includes on-site activities, where waste is treated, released to the environment, or recycled at the facility, and transfers off-site to other locations for further waste management.
Unfortunately, EPA’s off-site transfer data had long been difficult to analyze because of inconsistent names, outdated addresses, and slight variations in how the same receiving location was identified. Without standardized off-site location data, a single recycling center might appear under a dozen spellings and address formats, which made it harder to track where chemicals were going. The result was a fragmented picture that made the data harder to analyze and map.
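Some of this variation is purely mechanical. The sketch below is a minimal illustration that assumes a small, hypothetical abbreviation table, not EPA's or Abt's actual rules. It shows how basic text normalization can collapse simple spelling variants of one hypothetical recycler's name into a single form:

```python
import re

# Hypothetical abbreviation table for illustration only; a real pipeline
# would use a fuller list (for example, USPS street-suffix abbreviations).
ABBREVIATIONS = {
    "incorporated": "inc",
    "corporation": "corp",
    "company": "co",
    "street": "st",
    "avenue": "ave",
    "road": "rd",
}


def normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces, collapse whitespace, abbreviate."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # punctuation -> spaces
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)


variants = [
    "Acme Recycling, Inc.",
    "ACME RECYCLING INCORPORATED",
    "Acme  Recycling Inc",
]
print({normalize(v) for v in variants})
```

Normalization like this only removes formatting noise. Typos, renamed facilities, and different names at the same address still call for record-linkage methods.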
EPA needed a way to clean, match, and standardize millions of records without losing the data’s integrity. It also needed to speed up the work, because manual processing could take nearly a full year for a single reporting year of data.
That is where EPA tapped Abt to come up with a solution.
Pairing TRI Expertise with Modern AI-Enabled Tools
Abt brought together nearly 30 years of TRI program expertise and advanced machine learning capabilities to tackle the problem at scale.
We applied a two-part AI approach:
1. Unsupervised machine learning entity resolution
Abt used Splink, an open-source Python package designed for large-scale deduplication and record linkage, to process the data and identify clusters of records that represent the same location. We customized the model with:
- State-based blocking to optimize processing speed
- Informed parameter tuning that weighted certain fields over others
- Cluster-assignment logic that reflected real-world TRI reporting patterns
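Splink is built on the Fellegi-Sunter model of record linkage. The stdlib-only Python sketch below illustrates two ideas from the list above without using Splink's own API. The first is blocking on state, so that only records in the same state are compared. The second is per-field m and u probabilities, so that agreement on some fields (here, name) counts for more than agreement on others (here, city). The field names, probabilities, and records are hypothetical, and exact string equality stands in for the fuzzy comparisons a production model would use.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical per-field parameters:
#   m = P(field agrees | records are the same location)
#   u = P(field agrees | records are different locations)
FIELD_PARAMS = {
    "name": (0.90, 0.01),
    "street": (0.85, 0.02),
    "city": (0.95, 0.10),
    "zip": (0.90, 0.05),
}


def match_weight(a: dict, b: dict) -> float:
    """Sum Fellegi-Sunter log2 weights over fields (exact agreement only)."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        if a[field] == b[field]:
            total += math.log2(m / u)  # agreement adds evidence of a match
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts
    return total


def candidate_pairs(records: list[dict]):
    """Block on state: yield only pairs of records that share a state."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[rec["state"]].append(rec)
    for block in blocks.values():
        yield from combinations(block, 2)


records = [
    {"id": 1, "state": "OH", "name": "acme recycling inc",
     "street": "12 river rd", "city": "dayton", "zip": "45402"},
    {"id": 2, "state": "OH", "name": "acme recycling inc",
     "street": "12 river rd", "city": "dayton", "zip": "45403"},
    {"id": 3, "state": "TX", "name": "acme recycling inc",
     "street": "900 main st", "city": "austin", "zip": "78701"},
]

for a, b in candidate_pairs(records):
    print(a["id"], b["id"], round(match_weight(a, b), 2))
```

Blocking means record 3 is never compared with the Ohio records, which is where the speed gain comes from at scale. In Splink itself, m and u can be estimated from the data rather than hard-coded as they are here.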
These customizations reduced noise in the dataset, streamlined entity resolution, and produced preliminary clusters of likely matches.
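Splink's clustering step works by taking every scored pair whose match weight clears a threshold and grouping those pairs into connected components. A stdlib sketch of that idea follows, using a union-find structure. The record IDs, weights, and threshold are hypothetical.

```python
def cluster(ids, scored_pairs, threshold):
    """Group record IDs into connected components of pairs scoring >= threshold."""
    parent = {i: i for i in ids}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, weight in scored_pairs:
        if weight >= threshold:
            parent[find(a)] = find(b)  # merge the two components

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())


ids = [1, 2, 3, 4, 5]
scored_pairs = [(1, 2, 11.9), (2, 3, 8.4), (4, 5, -2.1)]
print(cluster(ids, scored_pairs, threshold=5.0))
```

Clustering is transitive, so records 1 and 3 end up together even though that pair was never scored directly. Chains like this are one way false positives can enter a cluster, and a later review pass can catch them.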
2. Large language model review to refine complex cases
For clusters that needed deeper review, we used large language models (LLMs) to assess whether a group of facilities truly represented the same location or needed to be separated. This second pass was especially valuable for ambiguous records, where name similarity alone could mislead traditional algorithms. Human oversight remained central to testing and validating the LLM outputs. Abt’s subject matter experts confirmed clusters, identified false positives that should be reassigned to new, smaller clusters, and ensured that every automated step aligned with EPA’s business rules for off-site locations.
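Here is a minimal sketch of how a review step like this might be wired together. It assumes a hypothetical `ask_llm` callable that returns JSON text; the source does not describe the prompts, model, or output format actually used. The code checks that the model's proposed groups cover each record exactly once, and it queues every result for expert review instead of accepting it automatically.

```python
import json
from typing import Callable


def build_prompt(cluster: list[dict]) -> str:
    """Format one candidate cluster as a question for an LLM (hypothetical format)."""
    lines = [
        f'{r["id"]}: {r["name"]}, {r["street"]}, {r["city"]}, {r["state"]}'
        for r in cluster
    ]
    return (
        "Do these off-site transfer records describe the same physical location? "
        'Reply only with JSON: {"same": true or false, "groups": [[record ids], ...]}\n'
        + "\n".join(lines)
    )


def review_cluster(cluster: list[dict], ask_llm: Callable[[str], str]) -> dict:
    """Return the LLM's proposed grouping, validated and queued for human review."""
    ids = {r["id"] for r in cluster}
    try:
        verdict = json.loads(ask_llm(build_prompt(cluster)))
        if verdict["same"]:
            groups = [sorted(ids)]
        else:
            groups = [sorted(g) for g in verdict["groups"]]
        flat = [i for g in groups for i in g]
        valid = sorted(flat) == sorted(ids)  # every record in exactly one group
    except (json.JSONDecodeError, KeyError, TypeError):
        groups, valid = [sorted(ids)], False  # keep the cluster intact
    return {"proposed_groups": groups, "valid_llm_output": valid,
            "status": "pending_expert_review"}


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call.
    return '{"same": false, "groups": [[1, 2], [3]]}'


cluster = [
    {"id": 1, "name": "acme recycling inc", "street": "12 river rd",
     "city": "dayton", "state": "OH"},
    {"id": 2, "name": "acme recycling", "street": "12 river road",
     "city": "dayton", "state": "OH"},
    {"id": 3, "name": "acme metals", "street": "400 canal st",
     "city": "dayton", "state": "OH"},
]
print(review_cluster(cluster, fake_llm))
```

Every result stays in the review queue, whatever the model says. This mirrors the role the article gives to subject matter experts, who confirm clusters and reassign false positives.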
Results: 97% Record Completion Achieved in a Fraction of the Time
Nearly complete standardization of off-site transfer locations
In EPA’s dataset, the share of off-site transfer records coded with an official facility identifier rose from about 80 percent to 97 percent, with significantly improved accuracy. Stakeholders and data users can now better analyze where toxic chemical waste is going, and the cleaner data improves quality, accessibility, and public transparency.
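Put another way, using the approximate figures above, the share of records without an official facility identifier fell from about 20 percent to about 3 percent. That is a relative reduction of roughly 85 percent:

```latex
\text{uncoded share: } 1 - 0.80 = 0.20 \;\longrightarrow\; 1 - 0.97 = 0.03,
\qquad
\text{relative reduction} = \frac{0.20 - 0.03}{0.20} = 0.85
```

Because the starting figure is approximate, the 85 percent figure is an estimate as well.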
Year of manual review reduced to weeks
EPA staff reported that AI-enabled workflows condensed a review process that had taken nearly a year per reporting year into just a few weeks for multiple reporting years. The time saved lets EPA focus its staff and effort on other important analyses, tasks, and deliverables.
Cleaner, mappable dataset for the first time
Because most locations are now tied to a standardized facility ID with latitude and longitude coordinates, EPA and the public can better visualize transfer pathways on a map. This was not possible before with the raw data.
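Coordinates are what make the data mappable. A minimal Python sketch can turn standardized transfer records into GeoJSON line features that common mapping tools can display. The field names and sample values below are hypothetical. Note that GeoJSON (RFC 7946) expects coordinates in longitude, latitude order.

```python
import json


def transfer_features(transfers: list[dict]) -> dict:
    """Build a GeoJSON FeatureCollection of origin-to-destination lines.

    Records missing any coordinate are skipped, since they cannot be mapped.
    """
    features = []
    for t in transfers:
        coords = (t["origin_lat"], t["origin_lon"], t["dest_lat"], t["dest_lon"])
        if None in coords:
            continue  # not yet tied to a located facility
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "LineString",
                # GeoJSON order is [longitude, latitude].
                "coordinates": [[t["origin_lon"], t["origin_lat"]],
                                [t["dest_lon"], t["dest_lat"]]],
            },
            "properties": {"chemical": t["chemical"], "pounds": t["pounds"],
                           "dest_facility_id": t["dest_facility_id"]},
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical sample records.
transfers = [
    {"chemical": "toluene", "pounds": 1200, "dest_facility_id": "HYPOTHETICAL-001",
     "origin_lat": 39.76, "origin_lon": -84.19, "dest_lat": 39.96, "dest_lon": -83.00},
    {"chemical": "xylene", "pounds": 300, "dest_facility_id": None,
     "origin_lat": 39.76, "origin_lon": -84.19, "dest_lat": None, "dest_lon": None},
]
collection = transfer_features(transfers)
print(json.dumps(collection, indent=2))
```

The second record shows why standardization mattered: a transfer that cannot be tied to a located facility simply drops off the map.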
Better inputs for the Office of Pollution Prevention and Toxics
By cleaning and standardizing off-site transfer records, EPA can now rely on this dataset for a wider range of analyses and decision-making. This includes data needs for the existing chemicals program under the Toxic Substances Control Act, the Emergency Planning and Community Right-to-Know Act, and the Pollution Prevention Act.
A Model for Mission-Driven AI Modernization
EPA’s chemical safety mission depends on high-quality, well-structured data. With Abt’s support, EPA now has cleaner information, faster workflows, and a stronger foundation for broader AI readiness and future AI-enabled modernization.
This effort shows how deep subject matter expertise and advanced AI engineering work together to create real value. EPA can now deliver on its mission more effectively, while reducing burden and saving taxpayer dollars.