White paper

Why using search engines for financial risk assessment is risky

Popular search engines are not reliable tools for meaningful due diligence

Financial risk assessments in the era of big data might seem easy. After all, so much information is available from so many sources that responsible due diligence is only a few keystrokes away. Want to vet a new customer or find out if a business seeking a loan has a good reputation?

Just type their name into a search bar, and there it is — all the information one could possibly want.

Or not.

The truth is that popular search engines are unreliable tools for meaningful due diligence, particularly when identifying suspicious financial activity, fraud, money laundering, criminal involvement, or other forms of legal malfeasance. Organizations relying on mainstream search engines for fraud protection and regulatory compliance may be unwittingly exposing themselves to dangerously high levels of risk. Why? Because, beyond its technical limitations, a basic internet search can fool a financial institution’s stakeholders into believing they have done their due diligence when they haven’t.

“Google is a great basic search engine, but it’s a horrible risk-assessment tool,” says Eric Gerhard, manager of product development for Thomson Reuters. “It’s simply not designed to perform the functions that compliance officers and BSA/AML professionals need.”

Popular search engines don’t:

  • Aggregate international corporate registries and ultimate beneficial ownership
  • Authenticate identity documents and uncover synthetic identities
  • Extrapolate intelligently to establish true identities
  • Have access to curated proprietary databases
  • Keep customer data secure
  • Look specifically for criminally related activity
  • Locate suspicious personal and business associates
  • Perform statistical modeling of any kind
  • Provide a defensible audit trail
  • Provide a comprehensive list of sanctions and adverse media
  • Search deeply into obscure or hard-to-access databases
  • Score, rank, or flag potential risk factors 
  • Save time that could be used on other duties

Not all search algorithms are the same

To understand why conventional search engines are such inadequate tools for uncovering critical financial risk factors, and why targeted software developed specifically for fraud prevention is so much more effective, it helps to understand how these search engines work and where they fall short.

Take Google, for example. Well before any search is conducted, Google’s bots, called spiders or crawlers, scour the internet and index what they find. When a query arrives, Google’s search algorithm matches it against that index, identifies the webpages most people use for most purposes, and ranks them largely according to popularity. For any given search, the pages most likely to contain the information one is looking for, based on the pages others conducting similar searches find helpful, are the pages listed first.

The reason Google works this way isn’t just to provide users with reliable results; it’s to connect advertisers with people conducting searches related to whatever product or service they’re selling. Google’s ranking systems, of which the PageRank algorithm is the best known, weigh keyword relevance and place a higher value on sites with an established history, high traffic, or an extensive network of inbound links. Popular search engines are very good at guessing which webpages are most likely to provide information the average person is looking for. However, they’re not so good at locating nuggets of information hidden in infrequently searched databases, such as civil court records, or at finding information that isn’t deliberately indexed or optimized to attract their algorithm’s attention.
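
To make the popularity effect concrete, here is a minimal sketch of link-based ranking in the style of the textbook PageRank formulation. It is not Google's production algorithm, and the page names and link graph are hypothetical. It only illustrates why a page that nothing links to, such as an obscure court record, tends to sink to the bottom of the results.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over a dict of page -> outbound links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: popular pages link to each other,
# but nothing links to the court record.
LINKS = {
    "news_site": ["company_home", "review_site"],
    "review_site": ["company_home", "news_site"],
    "company_home": ["news_site"],
    "county_court_record": [],
}

ranks = pagerank(LINKS)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{page:22s} {score:.3f}")
```

In this toy graph the court record ends up with the lowest score even though, for a compliance officer, it may be the most important page of the four.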

More isn’t always better

More is better in many ways, but the great paradox of big data is that more information does not necessarily mean better information. Often it is quite the opposite. The ease with which certain types of information can be found is often inversely proportional to the amount of information available. When the Google search engine was introduced in 1998, there were about 2.4 million websites on the internet. Now, there are more than 1.5 billion websites worldwide containing 33 zettabytes of data, or 33 trillion gigabytes. The International Data Corporation (IDC) estimates that the volume of data will explode to 175 zettabytes by 2025 and keep growing exponentially.

With so much more data to sift through, it’s more important than ever to know where to look for information, what to look for, and how to determine the quality and trustworthiness of the info one uncovers. It’s also much easier for criminals to hide content in obscure corners of the internet and for investigators to miss critical facts buried deep inside databases where conventional spider bots do not roam.

“Often, the most revealing aspect of a search is what’s not there that should be there,” says Jim Richards, founder of RegTech Consulting and the former global head of financial crimes risk management for Wells Fargo. Legitimate businesses should have licenses and certifications, and professionals should have records of activity and alliances in their chosen field. Diligent investigators using conventional search engines can unearth these and other types of pertinent information — but, says Richards, “While Google can be a useful tool for compliance, it shouldn’t be the only tool.”

Haystacks versus needles

Indeed, there is a profound difference between a random internet search and a search conducted using software specifically designed to locate information in court records and identify suspicious patterns of financial activity. If data were haystacks, a basic search engine would collect hundreds of them and invite you, via links, to search for needles. Risk-assessment software is more like a giant magnet that pulls the needles up and leaves the hay behind.

The universe of information that people actually search is also quite small compared to the entirety of information available on the internet. Most analysts estimate that Google searches no more than 1% or 2% of the internet’s total webpages. The majority of the internet exists on the so-called deep web, behind paywalls, in proprietary databases, or in databases that do not contain much consumer-friendly information. Granted, most of this information is inaccessible by design, but much of it is also unavailable to searchers who don’t know it’s there and wouldn’t know how to access it even if they did.

More problematic is the way search engines present their results. In most cases, searches yield little more than a list of links that may or may not lead an investigator to the information they seek. To find out, the investigator must individually explore each link and hope they stumble on the information they are looking for. Most searches yield many thousands, if not millions, of results, and examining them with anything close to the thoroughness required for responsible due diligence is an extraordinarily tedious and time-consuming task. It’s also a waste of money, and much of the information gleaned from such searches is insufficient at best and misleading or false at worst.

A better risk-assessment tool

Because popular search engines are consumer-oriented, advertising-driven tools, they are in no way designed to find the kinds of information financial investigators, compliance officers, or BSA/AML professionals need. Suppose a new customer or corporate client has been involved in civil litigation in several states and has received fines and sanctions but no criminal charges or convictions. Unless news reports covered the matters, the likelihood they’d appear in a standard internet search is very low. To locate the information, an investigator would have to search individual court records in each state, which is possible but time-consuming.

By contrast, a software tool using intelligent analytics can be programmed to search such databases automatically, flag any activity involving the person or entity being searched, and evaluate or score the level of risk they represent. It might also have access to proprietary databases that contain information such as up-to-date property records, liens, judgments, defaults, bankruptcies, and other pertinent financial data. 
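
As a rough illustration of the "search, flag, score" workflow described above, the following sketch scans a set of records for a named subject, flags the risk-relevant ones, and totals a score. Everything in it is an assumption made for illustration: the record types, the weights, the thresholds, and the sample data are hypothetical, and a real risk-assessment product would calibrate them very differently.

```python
from dataclasses import dataclass

# Hypothetical weights per record type; illustrative values only.
RISK_WEIGHTS = {
    "bankruptcy": 30,
    "judgment": 25,
    "regulatory_fine": 25,
    "lien": 20,
    "civil_litigation": 15,
}


@dataclass(frozen=True)
class Record:
    source: str       # database the record came from
    subject: str      # person or entity named in the record
    record_type: str  # e.g. "lien", "civil_litigation"


def flag_records(subject, records):
    """Return records that name the subject and carry a known risk type."""
    wanted = subject.strip().lower()
    return [
        r for r in records
        if r.subject.strip().lower() == wanted and r.record_type in RISK_WEIGHTS
    ]


def risk_score(flagged, cap=100):
    """Sum the weights of the flagged records, capped at `cap`."""
    return min(cap, sum(RISK_WEIGHTS[r.record_type] for r in flagged))


def risk_band(score):
    """Map a numeric score to a coarse band (thresholds are assumptions)."""
    if score >= 50:
        return "high"
    if score >= 20:
        return "medium"
    return "low"


# Hypothetical sample data spanning several sources.
RECORDS = [
    Record("state_court_a", "Acme Holdings LLC", "civil_litigation"),
    Record("state_court_b", "acme holdings llc", "civil_litigation"),
    Record("regulator_x", "Acme Holdings LLC", "regulatory_fine"),
    Record("state_court_a", "Other Co", "bankruptcy"),
    Record("property_db", "Acme Holdings LLC", "property_record"),
]

flagged = flag_records("Acme Holdings LLC", RECORDS)
score = risk_score(flagged)
print(len(flagged), score, risk_band(score))
```

The point of the sketch is the shape of the output: a short list of flagged records and a single score an investigator can act on, rather than a page of links.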

Furthermore, if someone is searching for information on a single individual, a well-designed algorithm can match the searched name with an identity profile, distinguishing it from the hundreds or thousands of other people in the world with the same name.
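
One simple way to picture this kind of name-to-profile matching is to combine name similarity with other identifying attributes. The sketch below is an assumption-laden illustration, not any vendor's method: the profile fields, the weights, the threshold, and the sample profiles are all hypothetical. It uses `difflib.SequenceMatcher` from the Python standard library for the name comparison.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Profile:
    profile_id: str
    name: str
    birth_year: int
    state: str


def normalize(name):
    """Lowercase, drop simple punctuation, and collapse whitespace."""
    return " ".join(name.lower().replace(".", "").replace(",", "").split())


def match_score(query_name, birth_year, state, profile):
    """Blend name similarity with attribute agreement (weights are assumptions)."""
    name_similarity = SequenceMatcher(
        None, normalize(query_name), normalize(profile.name)
    ).ratio()
    score = 0.6 * name_similarity
    if birth_year == profile.birth_year:
        score += 0.25
    if state == profile.state:
        score += 0.15
    return score


def best_match(query_name, birth_year, state, profiles, threshold=0.8):
    """Return the highest-scoring profile, or None if nothing clears the threshold."""
    if not profiles:
        return None
    scored = [(match_score(query_name, birth_year, state, p), p) for p in profiles]
    top_score, top_profile = max(scored, key=lambda pair: pair[0])
    return top_profile if top_score >= threshold else None


# Hypothetical profiles that share (nearly) the same name.
PROFILES = [
    Profile("P1", "John Smith", 1980, "TX"),
    Profile("P2", "John Smith", 1975, "OH"),
    Profile("P3", "Jon Smith", 1975, "CA"),
]

match = best_match("John Smith", 1975, "OH", PROFILES)
print(match.profile_id if match else "no confident match")
```

Returning `None` below the threshold is a deliberate choice in this sketch: for due diligence, "no confident match" is safer than silently attaching records to the wrong person.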

“Popular search engines don’t have access to the kinds of curated databases that private vendors have,” says RegTech’s Jim Richards. “Those databases have come a long way in the past several years, and can save investigators a lot of time.”

Identifying suspicious activity

Financial investigators are often looking for evidence of fraud and money laundering as part of BSA/AML compliance obligations. Money laundering, in particular, doesn’t happen in a vacuum; it requires networks of people working together to filter money through banks in seemingly legitimate ways. Deception is part of the game.

Popular search engines are also completely unaware of BSA/AML regulatory requirements, so they provide little or no data security and aren’t designed to record a reliable data trail for auditors, regulators, or managers. Beyond “bookmarking,” there is no way to record an investigative trail should regulators want to investigate a suspicious activity report (SAR). 

Conversely, an intelligent fraud-prevention tool will keep customer data secure, create detailed logs, and have built-in reporting capabilities, allowing users to access and organize data in any number of ways. It will also include ways to score levels of risk and quickly clear the legitimate entities, freeing up time and resources.
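
To show what a defensible audit trail can look like in practice, here is a minimal sketch of an append-only log in which each entry stores a hash of the previous one, so later tampering is detectable. Hash chaining is one common technique chosen here for illustration; the source does not say how any particular product implements its logs, and the class name, field names, and sample entries are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


class AuditTrail:
    """Append-only log where each entry is chained to the previous entry's hash."""

    GENESIS = "0" * 64  # placeholder hash for the first entry

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(entry):
        # Hash every field except the stored hash itself.
        payload = {key: value for key, value in entry.items() if key != "hash"}
        encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()

    def record(self, user, action, detail):
        """Append a timestamped entry and return a copy of it."""
        prev_hash = self._entries[-1]["hash"] if self._entries else self.GENESIS
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "detail": detail,
            "prev_hash": prev_hash,
        }
        entry["hash"] = self._digest(entry)
        self._entries.append(entry)
        return dict(entry)

    def verify(self):
        """Return True only if no entry has been altered, removed, or reordered."""
        prev_hash = self.GENESIS
        for entry in self._entries:
            if entry["prev_hash"] != prev_hash or entry["hash"] != self._digest(entry):
                return False
            prev_hash = entry["hash"]
        return True

    def entries(self):
        """Return copies of all entries, e.g. for a report to auditors."""
        return [dict(entry) for entry in self._entries]


trail = AuditTrail()
trail.record("analyst_1", "search", "subject: Acme Holdings LLC")
trail.record("analyst_1", "flag", "civil litigation found in two states")
print(trail.verify())
```

Compared with browser bookmarks, a log like this records who did what and when, and lets a reviewer confirm afterward that the record has not been edited.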


Finance and compliance professionals need to remember that while Google and other popular search engines can be helpful tools, they are built for public use, not for professional purposes. They are certainly not built for identifying information and behavior patterns specific to fraud, money laundering, or criminal activity. Anyone who uses them for fraud detection and regulatory compliance may not only be wasting time and money but also exposing their institution to multiple levels of risk.

It’s true that today’s internet contains a vast and ever-expanding wealth of publicly available data, but financial investigators need better filters and search mechanisms to make that data more useful. Advances in machine learning, deep learning, and artificial intelligence have made it possible to create sophisticated new data-mining algorithms that perform specific tasks such as fraud detection extraordinarily well. With the right tool, one person can do more reliable, higher-quality work in less time — work that financial institutions can be confident is thorough, accurate, defensible, and protected.

Thomson Reuters is not a consumer reporting agency and none of its services or the data contained therein constitute a “consumer report” as such term is defined in the Federal Fair Credit Reporting Act (FCRA), 15 U.S.C. sec. 1681 et seq. The data provided to you may not be used as a factor in consumer debt collection decisioning; establishing a consumer’s eligibility for credit, insurance, employment, government benefits, or housing; or for any other purpose authorized under the FCRA. By accessing one of our services, you agree not to use the service or data for any purpose authorized under the FCRA or in relation to taking an adverse action relating to a consumer application.
