February 27, 2026

AI Helps EPA Standardize and Sharpen Toxic Chemical Data

Across the U.S., more than 20,000 facilities report to the Toxics Release Inventory (TRI) every year, which tracks the management and release of billions of pounds of toxic chemical waste. This resource is one of the Environmental Protection Agency’s (EPA) most powerful tools for understanding toxic chemical releases from industrial and federal facilities and supporting informed decision-making by communities. Facilities must report what happens to each listed chemical, including on-site activities where waste is treated, released to the environment, or recycled at the facility, as well as transfers off-site to other locations for further waste management. 

Unfortunately, EPA’s off-site transfer data had long been difficult to analyze because of inconsistent names, outdated addresses, and slight variations in how the same receiving location was identified. Without standardized off-site location data, a single recycling center might appear under a dozen spellings and address formats, which complicated efforts to track where chemicals were going. The result was a fragmented picture that was hard for analysts to interpret and map.

EPA needed a way to clean, match, and standardize millions of records without compromising the data’s integrity. The agency also needed to speed up the work, because manual processing could take nearly a full year for a single reporting year of data.

That is when EPA tapped Abt to develop a solution.


Pairing TRI Expertise with Modern AI-Enabled Tools

Abt brought together nearly 30 years of TRI program expertise and advanced machine learning capabilities to tackle the problem at scale.  

We applied a two-part AI approach: 

1. Unsupervised machine learning entity resolution

Abt used Splink, an open-source Python package designed for large-scale deduplication and record linkage, to process the data and identify clusters of records representing the same location. We customized the model with:

  • State-based blocking to optimize processing speed
  • Informed parameter tuning that weighted certain fields over others
  • Logic to assign clusters that reflected real-world TRI reporting patterns 
     

This reduced the noise in the dataset, streamlined the entity resolution process, and created preliminary clusters of likely matches. 
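The general idea behind blocking, field weighting, and clustering can be illustrated with a minimal sketch. This is not Abt's model and does not use Splink; it is a simplified stand-in built on the Python standard library. The field names, weights, threshold, and sample records are all hypothetical, chosen only to show how the three steps fit together.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical field weights and cutoff, for illustration only (not tuned values).
FIELD_WEIGHTS = {"name": 0.4, "address": 0.4, "city": 0.2}
MATCH_THRESHOLD = 0.85


def normalize(text):
    """Lowercase the text and replace punctuation with spaces."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(kept.split())


def similarity(a, b):
    """Weighted average of per-field string similarity, between 0 and 1."""
    return sum(
        weight * SequenceMatcher(None, normalize(a[field]), normalize(b[field])).ratio()
        for field, weight in FIELD_WEIGHTS.items()
    )


def cluster_records(records):
    """Return lists of record indices that likely describe one location."""
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Blocking: only records that share a state are ever compared.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[rec["state"]].append(idx)

    for members in blocks.values():
        for pos, i in enumerate(members):
            for j in members[pos + 1:]:
                if similarity(records[i], records[j]) >= MATCH_THRESHOLD:
                    parent[find(j)] = find(i)  # merge the two clusters

    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())


# Made-up sample records, not TRI data.
records = [
    {"name": "Acme Recycling Inc.", "address": "100 Main St", "city": "Dayton", "state": "OH"},
    {"name": "ACME Recycling Inc", "address": "100 Main St.", "city": "Dayton", "state": "OH"},
    {"name": "Acme Recycling Inc.", "address": "100 Main St", "city": "Dayton", "state": "TX"},
    {"name": "Buckeye Landfill", "address": "42 County Road 9", "city": "Lima", "state": "OH"},
]
clusters = cluster_records(records)
print(clusters)
```

In this sketch the two Ohio spellings of the same facility land in one cluster, while the Texas record is never compared with them because it sits in a different block. A probabilistic tool such as Splink estimates match weights from the data rather than relying on hand-set weights like these.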

2. Large language model review to refine complex cases 

For clusters that needed deeper review, we used large language models (LLMs) to assess whether a group of facilities truly represented the same location or needed to be separated. This second pass was especially valuable for ambiguous records, where name similarity alone could mislead traditional algorithms. Human oversight remained central to testing and validating the LLM outputs. Abt’s subject matter experts confirmed clusters, identified false positives that should be reassigned to new, smaller clusters, and ensured that every automated step aligned with EPA’s business rules for off-site locations.
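A review pass of this kind might be structured roughly as follows. This is a hedged sketch, not the workflow Abt built: the prompt wording, the JSON reply format, the outcome labels, and the `ask_llm` callable are all assumptions. The model call is passed in as a function so that no particular LLM provider or API is implied, and every outcome is only a proposal for a human reviewer.

```python
import json


def build_review_prompt(cluster):
    """Format one candidate cluster as a same-location question."""
    lines = [
        f"{i}. {rec['name']} | {rec['address']} | {rec['city']}, {rec['state']}"
        for i, rec in enumerate(cluster, start=1)
    ]
    return (
        "Do all of these off-site transfer records describe the same physical "
        "location? Reply with JSON: "
        '{"same_location": true or false, "reason": "..."}\n' + "\n".join(lines)
    )


def review_cluster(cluster, ask_llm):
    """Return a proposed action for one cluster, for later human validation.

    ask_llm is a hypothetical callable that takes a prompt string and
    returns the model's reply as a string.
    """
    reply = ask_llm(build_review_prompt(cluster))
    try:
        same = json.loads(reply)["same_location"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return "needs_human_review"  # unparseable reply: do not guess
    if same is True:
        return "keep"
    if same is False:
        return "split"
    return "needs_human_review"


def fake_llm(prompt):
    # Stand-in for a real model call, used only to make the sketch runnable.
    return '{"same_location": false, "reason": "different street addresses"}'


# Made-up example cluster, not TRI data.
cluster = [
    {"name": "Acme Recycling", "address": "100 Main St", "city": "Dayton", "state": "OH"},
    {"name": "Acme Recycling", "address": "900 River Rd", "city": "Toledo", "state": "OH"},
]
decision = review_cluster(cluster, fake_llm)
print(decision)
```

Routing any malformed or ambiguous reply to `needs_human_review` reflects the principle described above: the automated step narrows the workload, and subject matter experts make the final call.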


Results: 97% Record Completion Achieved in a Fraction of the Time 

Nearly complete standardization of off-site transfer locations 
In EPA’s dataset, coding of official facility identifiers increased from about 80 percent to 97 percent, with significantly improved accuracy. Stakeholders and data users can now better analyze where toxic chemical waste is going, and the dataset’s quality, accessibility, and public transparency have improved.

Year of manual review reduced to weeks 
EPA staff reported that AI-enabled workflows condensed a yearlong review process into just a few weeks for multiple reporting years. The time saved allows EPA to focus on other important analyses, tasks, and deliverables.

Cleaner, mappable dataset for the first time 
Because most locations are now tied to a standardized facility ID with latitude and longitude coordinates, EPA and the public can better visualize transfer pathways on a map. This was not possible before with the raw data. 

Better inputs for the Office of Pollution Prevention and Toxics  
By cleaning and standardizing off-site transfer records, EPA can now rely on this dataset for a wider range of analyses and decision-making. This includes data needs for the existing chemicals program under the Toxic Substances Control Act, the Emergency Planning and Community Right-to-Know Act, and the Pollution Prevention Act.


A Model for Mission-Driven AI Modernization

EPA’s chemical safety mission depends on high-quality, well-structured data. With Abt’s support, EPA now has cleaner information, faster workflows, and a stronger foundation for broader AI readiness and future AI-enabled modernization. 

This effort shows how deep subject matter expertise and advanced AI engineering work together to create real value. EPA can now deliver on its mission more effectively, while reducing burden and saving taxpayer dollars.

