How should financial institutions assess the accuracy of third-party AML data?

As technology becomes more integral to financial industry regulation, data accuracy has become a leading concern for many institutions. In fact, a 2017 Thomson Reuters Legal survey of anti-money laundering (AML) professionals found that only 23 percent of participants expressed extreme confidence in their AML and customer due diligence (CDD) data vendors[1]. Rating their concerns on a descending scale from seven to one, participants listed the following leading causes for doubt: coverage gaps in certain regions (6); timeliness (6); data structure, or the organization of records (5); and lack of a data coverage definition (5). These four blind spots pose a serious problem for financial institutions (FIs) because the costs of bad data are steep.

Bad data is costing the industry billions[2]. It strains budgets through the investigation of false-positive suspicious activity alerts, which may account for 90 percent or more of all risk alerts. It also creates the threat of million-dollar regulatory penalties when false negatives let illicit activity in customer accounts go undetected. This gap in accurate information is the reason banking compliance teams spend 80 percent of their time on issues of low or moderate materiality and only 20 percent on critical high-risk issues, according to a 2017 report from consulting firm McKinsey & Company[3].

Compliance managers are aware of this dilemma and are actively investing in third-party data solutions. These solutions help them comply with increasingly cyber-focused AML enforcement regimes at both the federal level[4] and the state level. The New York State Department of Financial Services' (NYDFS) final rule on transaction monitoring and filtering[5] is largely driving the trend, and it is a regulation that could inspire industry-wide reform. The rule focuses on two things: documenting and reporting data lineage, or the time-stamped origins of data feeds and inputs, and the internal processes in place to vet third-party vendors.

Both of these NYDFS priorities have raised the stakes for AML and Know Your Customer (KYC) data vendor selection, creating a new layer of risk. But how can FIs assess the quality and accuracy of such complex information systems when, unlike technology firms, they are not native to big data? Compounding the difficulty, some AML data vendors may deliberately conceal their coverage gaps through misleading marketing. Determining the quality of AML vendor data therefore depends on asking the right questions. The first thing to keep in mind when beginning this inquiry is timeliness: eventually, all data goes stale[6].

There are no publicly available metrics on AML data decay. However, marketing analytics firm MarketingSherpa found that business-to-business data decays at a rate of 2.1 percent per month, or roughly 22.5 percent annually[2]. In a KYC context, this means customer data points such as address, phone number, email address and related business entities can quickly become outdated. The first question FIs should ask their vendors, therefore, is: How often do you validate your data? After this initial screen, FIs should focus on three further issues to assess vendor suitability:

  • the structure and variety of recordkeeping fields
  • the completion ratio for each field
  • whether the vendor is a data originator or a reseller

The following sections offer compliance officers a step-by-step guide to assessing the quality of third-party AML data.
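The two MarketingSherpa figures are consistent if the monthly rate compounds. Simply multiplying 2.1 percent by 12 would give 25.2 percent. A record that survives each month with probability 0.979, however, survives a full year with probability 0.979^12 ≈ 0.775, so about 22.5 percent of records go stale. The short Python sketch below reproduces that arithmetic. It assumes a constant, independent monthly decay rate, which is a simplification rather than something the study states.

```python
def annual_decay(monthly_rate: float, months: int = 12) -> float:
    """Fraction of records expected to go stale after `months`.

    Assumes (simplification) that every record independently decays
    at the same constant monthly rate, so survival compounds.
    """
    surviving = (1.0 - monthly_rate) ** months
    return 1.0 - surviving


rate = annual_decay(0.021)
print(f"{rate:.1%}")  # prints 22.5%
```

Applied to a hypothetical dataset of one million records, that rate would leave roughly 225,000 outdated entries after a year unless the vendor revalidates them.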

How often do you validate your data?

Unscrupulous data vendors often exaggerate their value proposition by touting the size of their datasets. They do so without telling enterprise buyers how often they purge expired records or how accurate the information is. As a result, these vendors' datasets keep growing by the terabyte while the seller has no insight into how actionable those records really are.

With this in mind, it’s up to the data buyer to press prospective suppliers about the processes they use to keep their data current. Some key questions to ask are:

  • Are they just “pinging” servers to see whether email domains and other digital identifiers exist? This is no longer a reliable check, as many Internet service providers silently drop these requests[7].
  • Do they have policies in place to delete inaccurate records?
  • Do they have any statistically reliable systems to verify timeliness?

If the account executives cannot confidently answer these questions, it’s best to look elsewhere for AML and KYC analytics. Best-in-class solutions will leverage machine learning to organize, clean, structure, and timestamp records at regular intervals.
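One way to make the timeliness questions concrete is to ask for a per-record verification timestamp, then measure how much of a sample has gone unverified for too long. The Python sketch below does that. The `last_verified` field and the 90-day threshold are hypothetical choices for illustration, not a vendor schema or a regulatory standard.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold for illustration only; not a regulatory standard.
MAX_AGE = timedelta(days=90)


def stale_records(records, now=None, max_age=MAX_AGE):
    """Return records that were never verified, or whose hypothetical
    `last_verified` timestamp is older than `max_age`."""
    if now is None:
        now = datetime.now(timezone.utc)
    stale = []
    for rec in records:
        verified = rec.get("last_verified")
        if verified is None or now - verified > max_age:
            stale.append(rec)
    return stale


# Hypothetical sample records with timezone-aware timestamps.
records = [
    {"id": 1, "last_verified": datetime(2018, 1, 5, tzinfo=timezone.utc)},
    {"id": 2, "last_verified": datetime(2017, 6, 1, tzinfo=timezone.utc)},
    {"id": 3, "last_verified": None},
]
as_of = datetime(2018, 2, 1, tzinfo=timezone.utc)
print([r["id"] for r in stale_records(records, now=as_of)])  # [2, 3]
```

A vendor that timestamps records at regular intervals should be able to report this share for its own data. One that cannot may not be tracking verification at all.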

Structure and variety of recordkeeping fields

This point is the most basic: enterprise data buyers need to ask prospective vendors how they organize their data and what types of information their data sets include. Specifically, FIs should ask AML vendors the following:

  • What type of data do they offer?
  • What are the column headers of the data set?
  • What are the table-driven values of fields, that is, analytics derived from the calculation or analysis of the data set as a whole[8]?

Any vendor worth its salt will be eager to share its “output file,” or data set format, with a prospective customer. By requesting this document, FIs can gauge the depth of the information collected and the business logic guiding the organization of records. The best AML data providers can also modify output files to align with the customer's business needs.
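Once a vendor supplies its output file, a first pass can be as simple as comparing the header row against the fields the FI's own KYC program requires. The sketch below assumes a CSV output file. The sample data, column names and required-field list are all hypothetical.

```python
import csv
import io

# Hypothetical sample "output file" from a vendor; real column names will differ.
SAMPLE_OUTPUT = """entity_name,address,phone,email,related_entities,last_verified
Acme Holdings LLC,1 Main St,555-0100,info@acme.example,Acme Corp,2018-01-05
"""

# Hypothetical set of fields an FI might consider essential for its KYC program.
REQUIRED_FIELDS = {"entity_name", "address", "phone", "email",
                   "beneficial_owner", "last_verified"}


def missing_columns(csv_text, required):
    """Return the required fields that do not appear in the file's header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = {col.strip() for col in next(reader)}
    return sorted(required - header)


print(missing_columns(SAMPLE_OUTPUT, REQUIRED_FIELDS))  # ['beneficial_owner']
```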

Completion ratio for each field

Unfortunately, even a thorough and thoughtful database structure offers no guarantee that its records are complete, recent or accurate. FIs therefore need to ask their AML analytics vendors for their fill rates on the fields most pertinent to their compliance risks. A fill rate is the share of records in which a field is actually populated.
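Fill rates are straightforward to compute from a sample extract, which also lets an FI check the figures a vendor reports. The sketch below counts a field as filled when it holds a non-blank value. The records and field names are hypothetical.

```python
def _is_filled(value):
    """Treat None, empty strings and whitespace-only strings as unpopulated."""
    return value is not None and str(value).strip() != ""


def fill_rates(records, fields):
    """Share of records in which each field holds a non-blank value."""
    total = len(records)
    rates = {}
    for field in fields:
        filled = sum(1 for r in records if _is_filled(r.get(field)))
        rates[field] = filled / total if total else 0.0
    return rates


# Hypothetical vendor sample.
records = [
    {"name": "A. Smith", "address": "1 Main St", "phone": "555-0100", "email": None},
    {"name": "B. Jones", "address": "", "phone": "555-0101", "email": "b@example.com"},
    {"name": "C. Lee", "address": "9 Oak Ave", "phone": None, "email": None},
    {"name": "D. Kim", "address": "4 Elm Rd", "phone": "555-0103", "email": None},
]

for field, rate in fill_rates(records, ["name", "address", "phone", "email"]).items():
    print(f"{field}: {rate:.0%}")
# name: 100%
# address: 75%
# phone: 75%
# email: 25%
```

In this sample the email column exists in the schema but is populated in only a quarter of the records. A review of the column headers alone would miss exactly this kind of gap.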

Is the vendor an originator or reseller?

Lastly, FIs need to determine their AML vendor's underlying data lineage. Did the vendor construct these datasets independently, or did it buy them from another third party and repackage them as its own? The latter isn't necessarily a bad thing. It can be acceptable provided the vendor aggregates data from a variety of high-integrity sources and has mechanisms in place to certify the quality, structure and completeness of the information.

This due diligence on data provenance is paramount given the increasing regulatory emphasis on data lineage reporting, as highlighted by the NYDFS final rule.
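To make lineage questions concrete, an FI might ask whether each record can carry its originating source and retrieval time. The Python sketch below models that with a hypothetical record structure; it is not an NYDFS-prescribed format or any vendor's schema. The sketch flags records with no documented lineage and tallies how many records each upstream source feeds. A dataset that leans heavily on a single resold feed is not necessarily a problem, but it is a prompt for follow-up questions about how the vendor certifies that feed.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class SourceEntry:
    source: str             # hypothetical identifier of the originating feed
    retrieved_at: datetime  # when the vendor pulled the value from that feed


@dataclass
class Record:
    record_id: str
    lineage: List[SourceEntry] = field(default_factory=list)


def lineage_report(records):
    """Return IDs of records with no documented lineage, plus a count
    of how many records each source contributes to."""
    undocumented = [r.record_id for r in records if not r.lineage]
    by_source = Counter(entry.source for r in records for entry in r.lineage)
    return undocumented, by_source


# Hypothetical sample data.
records = [
    Record("r1", [SourceEntry("county_recorder", datetime(2018, 1, 3))]),
    Record("r2", [SourceEntry("reseller_feed_x", datetime(2017, 11, 20)),
                  SourceEntry("county_recorder", datetime(2018, 1, 3))]),
    Record("r3"),
]

undocumented, by_source = lineage_report(records)
print(undocumented)              # ['r3']
print(by_source.most_common())   # [('county_recorder', 2), ('reseller_feed_x', 1)]
```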

Seeing CLEARly

The most glaring flaw in the big data economy, for AML purposes and otherwise, is that input systems inherently assume that user field entries are truthful and accurate. When a computer or sensor gathers data from an end user, potentially fictitious inputs are taken at face value[9]. The data ecosystem is increasingly adopting sophisticated artificial intelligence (AI) and machine learning tools to address this problem. Even so, FIs can still benefit from an AML vendor that employs proven data-cleansing techniques and draws information from the most trusted feeds.
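As a rough illustration of what basic rule-based cleansing involves, the Python sketch below normalizes names, emails and phone numbers. It then rejects rows with a malformed email and removes duplicates that differed only in formatting. The field names and rules are hypothetical and far simpler than a production pipeline. Note too that a syntax check only catches malformed entries; it cannot establish that a well-formed entry is truthful, which is the deeper problem described above.

```python
import re

# Deliberately loose syntax check; it cannot prove a mailbox exists or is truthful.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def clean(records):
    """Normalize fields, drop rows with malformed emails, and remove
    duplicates that only differed in formatting (hypothetical rules)."""
    seen = set()
    cleaned = []
    for rec in records:
        name = " ".join((rec.get("name") or "").split()).title()
        email = (rec.get("email") or "").strip().lower()
        phone = re.sub(r"\D", "", rec.get("phone") or "")  # keep digits only
        if not name or not EMAIL_RE.match(email):
            continue  # reject rows that fail basic plausibility checks
        key = (name, email, phone)
        if key in seen:
            continue  # duplicate once formatting differences are removed
        seen.add(key)
        cleaned.append({"name": name, "email": email, "phone": phone})
    return cleaned


raw = [
    {"name": "  jane   DOE ", "email": "Jane.Doe@Example.com", "phone": "(555) 010-0100"},
    {"name": "Jane Doe", "email": "jane.doe@example.com ", "phone": "555.010.0100"},
    {"name": "John Roe", "email": "not-an-email", "phone": "555-0199"},
]
print(clean(raw))
# [{'name': 'Jane Doe', 'email': 'jane.doe@example.com', 'phone': '5550100100'}]
```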

Thomson Reuters CLEAR for AML/KYC is an investigative public records tool engineered to address the specific challenges facing financial AML and KYC professionals. Not only does CLEAR offer complete transparency about its data lineage and source feeds, but it also provides real-time coverage for reverse-phone checks that include current name and address information to identify individual and business subscribers by phone number.

Additionally, CLEAR extends this real-time gateway functionality to arrest and incarceration records, motor vehicle registrations and credit reports. This coverage also encompasses Voice over IP (VoIP) and burner phones, which are more frequently linked to criminal activity. CLEAR updates its datasets daily and verifies the currency of records with statistically proven machine learning applications. In a compliance ecosystem where bad AML data is draining billions from financial organizations, FIs need a reliable analytics vendor now more than ever. With CLEAR, FIs can mitigate the risks of bad data at the front end and build more sustainable compliance programs.

Thomson Reuters is not a consumer reporting agency and none of its services or the data contained therein constitute a ‘consumer report’ as such term is defined in the Federal Fair Credit Reporting Act (FCRA), 15 U.S.C. sec. 1681 et seq. The data provided to you may not be used as a factor in consumer debt collection decisioning, establishing a consumer’s eligibility for credit, insurance, employment, government benefits, or housing, or for any other purpose authorized under the FCRA. By accessing one of our services, you agree not to use the service or data for any purpose authorized under the FCRA or in relation to taking an adverse action relating to a consumer application.