Bot Filtering


What is a Bot?
A bot is an automated program that performs one or more specific tasks. There are many types of bots, both good and bad.

The most common forms of bots are spiders and crawlers. Although the terms are often used interchangeably, each describes a slightly different task. A spider follows links from page to page, building a “web” of data, while a crawler may download pages from the internet and may repeatedly perform actions such as searches on a site.
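The following minimal sketch shows how a spider follows links. It is an illustration only: a hypothetical in-memory map of pages to links replaces real HTTP fetches. The spider walks the map breadth-first and records each page-to-page link as one edge of the “web”.

```python
from collections import deque

# Hypothetical site map standing in for real HTTP fetches:
# each page maps to the links found on it.
LINKS = {
    "/": ["/about", "/blog"],
    "/about": ["/"],
    "/blog": ["/blog/post-1", "/blog/post-2"],
    "/blog/post-1": ["/blog"],
    "/blog/post-2": ["/about", "/contact"],
    "/contact": [],
}


def spider(start):
    """Follow links breadth-first, returning visited pages and every link seen."""
    seen = {start}
    queue = deque([start])
    edges = []  # (from_page, to_page) pairs: the "web" of data
    while queue:
        page = queue.popleft()
        for link in LINKS.get(page, []):
            edges.append((page, link))
            if link not in seen:  # visit each page only once
                seen.add(link)
                queue.append(link)
    return seen, edges


pages, web = spider("/")
print(sorted(pages))
```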

The Good, The Bad and The Ugly
Bots account for nearly half of all internet traffic, but not all bots are bad. Googlebot crawls the internet daily, indexing new and old pages for Google’s search engine. Other bots provide security checks or site monitoring.

Good bots identify themselves as bots when crawling the web, so ad servers such as AdButler can filter out their impressions.

A bad bot will often mimic or mask its identity to trick a site into treating it as a legitimate visitor. Bad bots may impersonate a person, scrape your pages for personal information, spam forums, or probe for security flaws. They simulate legitimate traffic with varying degrees of sophistication.

At AdButler, we recognize three classes of bots:

  • Good Bots – legitimate, MRC-compliant bots (e.g., Googlebot, Bingbot, W3C bots)
  • Gray Bots – malware-detection bots and security researchers, which pretend to be legitimate users in order to expose malicious ads
  • Bad Bots (Malicious Bots) – bots that intentionally pretend to be legitimate users to generate fraudulent ad requests or for similar purposes

What is Bot filtering?

Good bots identify themselves in the User-Agent string sent with every request. This allows ad servers to compare the User-Agent against a list of known bots, or check it for telltale keywords, and filter those impressions out so they do not cause discrepancies in your reporting.
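This minimal sketch shows list-based filtering. The token list is a hypothetical, abbreviated subset of the known bots named in this article, and the User-Agent strings are samples. It is not AdButler’s implementation. Impressions whose User-Agent contains a known bot token are set aside so they do not inflate reported counts.

```python
# Hypothetical, abbreviated list of self-identifying bot tokens (lowercase).
# A production list would be far longer and regularly updated.
KNOWN_BOT_TOKENS = ("googlebot", "bingbot", "duckduckbot", "applebot", "facebookexternalhit")


def is_known_bot(user_agent: str) -> bool:
    """True if the User-Agent contains any known bot token (case-insensitive)."""
    ua = user_agent.lower()
    return any(token in ua for token in KNOWN_BOT_TOKENS)


def split_impressions(user_agents):
    """Separate impressions to count in reports from those to filter out."""
    counted, filtered = [], []
    for ua in user_agents:
        (filtered if is_known_bot(ua) else counted).append(ua)
    return counted, filtered


# Sample User-Agent strings for illustration.
requests = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)",
]
counted, filtered = split_impressions(requests)
print(len(counted), len(filtered))
```

Only the Chrome request remains in the counted impressions. The Googlebot and facebookexternalhit requests are filtered out.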

Why is Bot filtering important?

Ad campaigns are still generally priced on a CPM (cost per thousand impressions) basis. When an advertiser pays for every 1,000 impressions, it is essential to ensure those impressions are seen by real people.
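The cost of bot traffic under CPM pricing is simple arithmetic. Spend is impressions ÷ 1,000 × CPM, so every bot impression costs the same as a human one. All figures below are hypothetical.

```python
def campaign_cost(impressions: int, cpm: float) -> float:
    """Cost of an impression-based buy; CPM is the price per 1,000 impressions."""
    return impressions / 1000 * cpm


# Hypothetical figures for illustration only.
CPM = 5.00                # dollars per 1,000 impressions
served = 200_000          # impressions recorded by the ad server
bot_impressions = 30_000  # portion of those generated by bots

total = campaign_cost(served, CPM)
wasted = campaign_cost(bot_impressions, CPM)
print(f"total ${total:,.2f}, spent on bots ${wasted:,.2f}")
```

In this example, $150 of the $1,000 spend buys impressions that no person saw.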

Known Bots Filtered by AdButler:

  • AdsBot-Google
  • Applebot
  • BingBot
  • Bing Preview Bot
  • Bloomberg Financial Market
  • BrandVerity
  • CloudFlare-AlwaysOnline
  • Coveo Bot
  • DuckDuckBot
  • facebookexternalhit
  • GitHub Bots
  • GomezAgent
  • Googlebot
  • Google Page Speed Insights
  • Majestic 12 Bot
  • MOAT Bot
  • PhantomJS
  • SimplePie
  • Yahoo! Slurp
  • Yandex
  • Any bot that identifies itself with a descriptor that includes ‘bot’, ‘crawl’, or ‘spider’
  • We also use industry-standard lists and techniques, such as the IAB bots and crawlers list and Project Honey Pot, to complement our own internal ad fraud detection.
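The keyword rule in the list above can be sketched as a case-insensitive substring check. This sketch rests on assumptions and does not reproduce AdButler’s matching logic. The sample User-Agent strings are for illustration. The facebookexternalhit string contains none of the keywords, which is one reason a named list is needed alongside the rule.

```python
# Keywords from the rule above; matching is case-insensitive.
BOT_KEYWORDS = ("bot", "crawl", "spider")


def matches_bot_keyword(user_agent: str) -> bool:
    """True if the User-Agent contains 'bot', 'crawl', or 'spider'."""
    ua = user_agent.lower()
    return any(keyword in ua for keyword in BOT_KEYWORDS)


# Sample User-Agent strings for illustration.
samples = {
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "facebook": "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)",
    "chrome": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
}

for name, ua in samples.items():
    print(name, matches_bot_keyword(ua))
```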

Why am I seeing a discrepancy between page views and impressions?
If you have exhausted standard troubleshooting and still see a discrepancy, your site may be receiving traffic from bad bots or other bad actors that do not identify themselves.

If your site is the victim of bad bot activity, you can take measures to detect and prevent bot traffic. A robots.txt file tells well-behaved crawlers which parts of your site to avoid, but it is advisory only: bad bots can simply ignore it. You may also wish to consider DDoS protection from companies such as Cloudflare.
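Python’s standard-library `urllib.robotparser` shows how a compliant crawler interprets robots.txt. The rules and the crawler name `ExampleScraper` are hypothetical. A bad bot never has to run this check, so robots.txt alone cannot stop it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block a crawler named "ExampleScraper" entirely,
# and keep every other crawler out of /ads/.
ROBOTS_TXT = """\
User-agent: ExampleScraper
Disallow: /

User-agent: *
Disallow: /ads/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler asks before fetching each URL.
for agent, url in [
    ("Googlebot", "https://example.com/articles/1"),
    ("Googlebot", "https://example.com/ads/banner"),
    ("ExampleScraper", "https://example.com/articles/1"),
]:
    print(agent, url, parser.can_fetch(agent, url))
```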

What is being done to combat Ad Fraud?

More and more advertisers and publishers are adopting a CPV (cost per view) model, which lets advertisers pay only for legitimate views seen by users. Advertisers who care more about sales and retention than brand awareness are increasingly turning to downstream conversion tracking instead of traditional click and impression tracking.
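This minimal sketch shows view-based counting. It assumes the MRC display guideline, under which at least 50% of an ad’s pixels must be in view for at least one continuous second. It also assumes a hypothetical per-impression measurement field. AdButler’s Viewability measurement may define views differently.

```python
from dataclasses import dataclass

# Assumed MRC display threshold: >= 50% of pixels in view
# for at least one continuous second.
MIN_CONTINUOUS_SECONDS = 1.0


@dataclass
class Impression:
    # Hypothetical field: longest continuous stretch, in seconds, during
    # which at least 50% of the ad's pixels were on screen.
    seconds_half_in_view: float


def is_viewable(imp: Impression) -> bool:
    return imp.seconds_half_in_view >= MIN_CONTINUOUS_SECONDS


impressions = [Impression(0.0), Impression(0.4), Impression(1.0), Impression(3.2)]
served = len(impressions)
views = sum(is_viewable(i) for i in impressions)
print(f"{served} impressions served, {views} billable as views")
```

Under a CPM model all four impressions would be billed. Under a view-based model only the two that met the threshold would be.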

If you haven’t enabled Viewability in your AdButler account, reach out to your account manager today to discuss our enhanced analytics package.

3rd Party Ad Fraud Platforms

There are many services out there that help publishers and advertisers ensure their statistics are as accurate as possible. We highly recommend the following services.

Robert Janes