Are Those Robots or People Clicking on Your Site?


The robots have landed.

It should come as no surprise to you that some of the traffic to your websites and mobile apps does not come from real human beings. There are spiders out there crawling websites to index them, there are malicious hackers poking and prodding away to find a moment of vulnerability and, of course, there are technologies in place to track people and their usage. It sounds a little too Orwellian for some, but it’s a functional part of the digitization of media.

The bigger question: is all of this non-human traffic getting to be a little too much?

Last week, Tom Foremski had a post over at ZDNet titled, Report: 51% of web site traffic is ‘non-human’ and mostly malicious. The title of the news piece tells the entire story. Before looking at the two major issues that need to be thought about moving forward, here is how the website traffic breaks down, according to a study by Incapsula, a company that provides cloud-based security for websites (the study is based on a sample of 1,000 websites that are Incapsula clients):

  • 5% is hacking tools searching for an unpatched or new vulnerability in a web site.
  • 5% is scrapers.
  • 2% is automated comment spammers.
  • 19% is from “spies” collecting competitive intelligence.
  • 20% is from search engines – which is non-human traffic but benign.
  • 49% is from people browsing the Internet.
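
For anyone who wants to check the headline against the list, here is a minimal Python sketch that simply adds up the shares above. The percentages come straight from the breakdown; the category names and the grouping into "malicious" are my own assumptions for the sake of the arithmetic, not Incapsula's definitions.

```python
# Shares of all website traffic, in percent, copied from the list above.
# The short category names are labels chosen for this sketch.
traffic_share = {
    "hacking tools": 5,
    "scrapers": 5,
    "comment spammers": 2,
    "spies": 19,
    "search engines": 20,
    "people": 49,
}

def share_of(categories):
    """Add up the traffic share (in percent) of the named categories."""
    return sum(traffic_share[name] for name in categories)

total = sum(traffic_share.values())

# Everything that is not a person browsing the Internet.
non_human = share_of(name for name in traffic_share if name != "people")

# Assumption: non-human traffic other than search engines counts as malicious.
malicious = share_of(["hacking tools", "scrapers", "comment spammers", "spies"])

# The three categories that scrape, spam or spy.
scrape_spam_spy = share_of(["scrapers", "comment spammers", "spies"])

print(f"Total accounted for: {total}%")
print(f"Non-human: {non_human}%")
print(f"Malicious: {malicious}% of all traffic, "
      f"{malicious / non_human:.0%} of non-human traffic")
print(f"Scrapers, comment spammers and spies: {scrape_spam_spy}%")
```

Under that grouping, non-human traffic comes to 51% and the malicious slice to 31% of all traffic, which is roughly 61% of the non-human share. That is where "mostly malicious" comes from.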

The high cost of living.

Who pays for this traffic? You do. Along with the server and usage costs, all of this non-human traffic also affects overall performance. The more people and technology sucking up bandwidth, the slower your servers respond. If over half of this traffic isn’t even real people, just imagine what your bandwidth and server costs could look like. Above and beyond that, what is the likelihood of this non-human traffic decreasing? Marketing is becoming that much more sophisticated and technologically inclined, so these types of pings and pokes are clearly going to increase over the next short (and long) while. Once this gets to the point where more website owners are aware of this intrusion, the government will step in and legislate this. Nobody wants government intervention here, but this is another prime case of technology and new media companies stepping over the line by the sheer act of overdoing it.

The third-party problem.

It’s one thing for websites to be tracking their usage and allowing non-human crawlers from search engines to index their websites in order to rank higher. But, if you look at the list above, you’ll note that scrapers, automated comment spammers and spies are all third parties trying to leverage the website for their own marketing initiatives. Together, they make up over twenty-five percent of all traffic (5% + 2% + 19% = 26%). This allowance of third parties to infiltrate and leverage website traffic is only a small fraction of the issue. What about the other third parties that the website has partnered with and allowed access to the website and its users? It’s probably unimaginable to think about what that combined piece of website traffic may look like. We have to remember that most consumers simply don’t understand the terms and conditions of a website and have little knowledge or understanding of all the tracking that is happening. The number must be nothing short of astounding.

It’s time for fair play.

If we, as the New Media collective, do not start governing ourselves, you can rest assured that public outcry will increase and the government will step in. What information are we keeping, what information are we tracking, and do we need it all? Understandably, it will be next-to-impossible to stop the malicious spies and infiltrators that are leveraging this information for spam (and knowing that this clocks in at over twenty-five percent of all website traffic, it should come as a rude awakening for publishers), but the crawling and sniffing that we can control should be looked at with a discerning eye. The use of robots to crawl the Internet is nothing new. The use of robots to crawl the Internet to grab as much information as possible in a malicious way is nothing new. The ability of website owners to get smarter and ensure that they are protecting their consumers (from both the robots and third-party deals) is nothing new, either… but the numbers are getting out of control and they’re only going to increase.

It’s time to act. What are we going to do about it? 

The above posting is my twice-monthly column for The Huffington Post called, Media Hacker. I cross-post it here with all the links and tags for your reading pleasure, but you can check out the original version online here:


  1. Mitch, glad you are stepping up to discuss this issue. As with most things, we go to extremes until we hit the wall, and then we focus on the next thing. Now, however, we are talking about responsibility for our clients’ data. We asked for it and they gave it, so the receiver must take responsibility for the information. If it is not handled with care, the recipient should face consequences.
    Now is the time to start talking with your clients and making sure they understand the processes and procedures you follow to secure the data given to your organization.
    Thanks for taking the lead on bringing awareness to this next challenge.

  2. Very interesting information. Am now curious about the trends over time and the effectiveness vs effort to detect and combat it. How does one determine that ‘traffic’ is malicious until it ‘does’ something? Very difficult question…
    Will explore the linked resources, thanks for citing sources!
