Bot Traffic: Are Your Visitors Using You?
March 2, 2015 George Weiner
“The monthly users are up this year. We did it!” Many nonprofit managers rely on web data to inform and deliver their services, as well as tell the story of organizational impact. Nonprofit websites are used for impact in multiple ways — to increase resource downloads, new registrations/sign-ups, and of course, monthly users.
What if it turned out that these users were actually bots? Would the “impact” still feel the same? In 2014, bot traffic reached a new all-time high of 56% of all Internet traffic (Incapsula Bot Traffic Report, 2014 bit.ly/NPT-botreport). Most of this traffic is filtered out by analytics platforms such as Google Analytics, but a certain percentage of bots still fools analytics. Over the coming years, we will need to share lanes on the information highway with an increasing number of bots.
“Bot” is short for robot: a program that acts as an agent for a user or another program, or that simulates human activity. On the Internet, bots (also called spiders or crawlers) are the programs that access websites and gather their content for search engine indexes.
Just like people, not all bots are good drivers. The 2014 Incapsula Report revealed that 29% of Internet traffic was driven by malicious bots that steal listed emails, SPAM your website, or click on ads. Even worse, some bots inject corrupt code and disrupt your site. This percentage has been on the rise for the past three years and shows no signs of slowing down.
Select Bot vs. Human Showdown
Humans have been facing off against machines since long before The Terminator. Sadly, humans are accumulating marks in the loss column.
• Mechanical Turk — 18th Century: The first instance of a human posing as a computer to beat humans at chess: a chess “machine” secretly operated by a hidden player. It is now the name of Amazon’s marketplace for Human Intelligence Tasks (HITs) done by low-cost human labor. There is an application programming interface (API) for the service that bots can use to make humans do these tasks (bit.ly/NPT-captchas).
• Chess: Garry Kasparov vs. Deep Blue – Feb. 10, 1996: The first time a computer beat a reigning world chess champion in a game played under standard tournament conditions. Kasparov went on to win that first six-game match against Deep Blue, but lost the heavily publicized 1997 rematch after losing his nerve in the second game (bit.ly/NPT-deepblue).
• “Jeopardy!” — Ken Jennings & Brad Rutter vs. Watson – Feb. 15, 2011: IBM’s supercomputer beat the top human “Jeopardy!” players in history over two matches (bit.ly/NPT-watson).
• Turing Test Beaten — Chat bot convinces human panel it is human, July 2014: In a 1950 paper, Alan Turing proposed the Turing Test as a standard method for deciding whether a machine could be considered intelligent. According to the test, a machine passes if it can convince human interrogators through conversation that it is human.
In 2014, for the first time in the test’s 64-year history, a chat bot passed, convincing a panel of human judges that it was human. The chat bot “Eugene” emulated a 13-year-old boy, hardly the pinnacle of intellectual prowess, but this follows Turing’s suggestion that a young thinking machine would be easier to educate (bit.ly/NPT-turing). Smart guy…
Bot Drivers Ed
Before entering the bot highway, it is important to know the name of thine enemy:
• SPAM — Something Posing As Messaging/mail. Unwanted online messages sent en masse or posted to websites. Broader definitions include SPAM link building and other activities to game search results (bit.ly/NPT-googlespam).
• SPIM — SPAM over Instant Messaging.
• SPAT — Something Posing As Traffic.
There are four types of bad bots behind a lot of this activity:
• Scrapers: Bots that crawl websites and copy select content, such as email addresses;
• Hacking tools: Bots that try to inject malicious code, take control or disrupt websites/servers;
• Spammers: Bots that post SPAM content on websites through comments and open forms. Viagra anybody?; and,
• Impersonators: Bots that mimic human users for the purpose of information gathering, ad fraud, web attacks or service disruption.
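The first three types often announce themselves in the User-Agent header, which is why impersonators, who deliberately spoof a browser, are the hardest class to catch. As a rough illustration, here is a minimal, hypothetical sketch of a first-pass check; the signature list is illustrative only (real detection relies on maintained lists such as the IAB Spiders & Bots List, plus behavioral signals):

```python
import re

# Illustrative substrings only; real bot detection uses maintained
# blocklists and behavioral analysis, not a short regex like this.
BOT_SIGNATURES = re.compile(
    r"bot|crawler|spider|scraper|curl|wget|python-requests",
    re.IGNORECASE,
)

def looks_like_bot(user_agent: str) -> bool:
    """Naive first-pass check: flag requests whose User-Agent matches
    a bot-like substring. Note the limitation: an impersonator bot
    that spoofs a normal browser User-Agent sails right past this."""
    return bool(BOT_SIGNATURES.search(user_agent or ""))

print(looks_like_bot("Googlebot/2.1 (+http://www.google.com/bot.html)"))  # True
print(looks_like_bot("Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36"))  # False
```

The asymmetry this sketch exposes is exactly the problem the rest of this article describes: honest crawlers identify themselves, while the bots doing ad fraud and data theft look like your best visitors.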
Paying for Bots
This past July, Google Analytics quietly added an automatic spam filter option inside every account, based on the IAB International Spiders & Bots List (bit.ly/NPT-spamfilter). Bot traffic’s impact on advertising tracking is something the IAB and the large ad networks should take very seriously, as it threatens to erode advertiser trust in the entire medium.
According to reports in The Wall Street Journal and other reputable online watchdogs, about one-third of online advertising traffic is fraudulent (bit.ly/NPT-WSJadfraud).
Google Analytics has started bringing this advanced filtering to the larger market, though large ad networks have been slow to adopt any new tracking standards in response. Dr. Augustine Fou, an expert in ad fraud, warned that “Nonprofit advertisers are some of the most vulnerable to ad fraud as they frequently receive remnant ad inventory with higher levels of fraud.”
In the coming years, nonprofits need to “beware of vague reporting of where ads are being served, and focus on clear user actions taken on their sites instead of impressions and clicks. Always test and verify traffic, remember that bots don’t donate or volunteer, that’s what you should be measuring against ad spends.” (Dr. Augustine Fou, Group Chief Digital Officer; Ad Fraud Researcher)
Google’s Spider.io project (http://www.spider.io/blog/) documents the fake display ad click bots that are defrauding advertisers. It has identified bots such as TDSS rootkits, which run in the background of infected machines and then browse hundreds of sites, clicking on select display ads (watch one of these bots in action: bit.ly/NPT-badbot).
Another notable impersonator bot in 2014 came from a Ukrainian company called Semalt. Traffic from the source “Semalt.com” started showing up in analytics reports and confusing analysts because of the sudden volume of traffic that looked human. Semalt later responded by creating a tool that lets websites unlist themselves from its crawler, which it says is simply gathering SEO data (bit.ly/NPT-semalt).
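If referral spam like this has already polluted your reports, one option is to screen it out before counting sessions. The sketch below is hypothetical: the blocklist, log format, and field names are assumptions for illustration, and in practice a maintained analytics filter is the better tool.

```python
from urllib.parse import urlparse

# Hypothetical blocklist of known referral-spam domains; a real one
# would be longer and maintained from a shared community list.
SPAM_REFERRERS = {"semalt.com", "buttons-for-website.com"}

def is_spam_referral(referrer: str) -> bool:
    """Return True if the referrer's hostname is a blocked domain
    or any subdomain of one (e.g. semalt.semalt.com)."""
    host = urlparse(referrer).hostname or ""
    return any(host == d or host.endswith("." + d) for d in SPAM_REFERRERS)

# Assumed session records for illustration.
sessions = [
    {"referrer": "https://semalt.semalt.com/crawler", "pages": 1},
    {"referrer": "https://www.google.com/", "pages": 3},
]
clean = [s for s in sessions if not is_spam_referral(s["referrer"])]
print(len(clean))  # 1
```

Matching subdomains matters here because referral spammers routinely rotate prefixes on the same base domain to dodge exact-match filters.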
Driving blindly into the future
The percentage of more devious bots is going to increase during the next five years because of the economic engine behind them. As long as there is a penny to be made online through these bots, they will continue to grow and become more profitable as the cost of cloud computing drops.
What will be important is the ability to detect and defend against bots, especially when website behavior is being analyzed to make decisions. The nonprofit sector will continue to depend on web traffic data to increase and quantify impact for funders. The need for web data analysts and advanced preventive web measures will become standard once the industry wakes up to the scale of fraudulent traffic.
There is a switch in Google Analytics that can be turned on to filter out most SPAM traffic, but doing so might reduce the reported traffic numbers. That reduction could hurt your reputation with funders and supporters, or make you look smaller than competitors that do not filter their traffic. The switch is currently turned off by default for most accounts.
The question is: will we swallow our medicine now, or continue in willful ignorance while thousands of bots confuse our data? NPT
George Weiner is chief whaler at Whole Whale, an online communication, technology and fundraising firm in New York City. Tweet: @WholeWhale