2005: Way Beyond "Fully Rendered Ads"
- From <a href="http://publications.mediapost.com/index.cfm?fuseaction=Articles.showArticleHomePage&art_aid=32882" target="_blank">Gavin O'Malley</a>:
<blockquote>YAHOO! HAS RELEASED A NEW ad measurement platform that only counts and reports impressions after ads have been displayed on a user's Web page, the company is expected to announce today. The new platform is designed to comply with Interactive Advertising Bureau standards released in December that require ads to be "fully rendered" on the receiving browser. Previously, ads were considered "served" once they had been sent by clients, regardless of whether the ads were stopped by a pop-up blocker or a consumer left the page before a video ad fully loaded. --8/8/05</blockquote>
It is also worth noting here that there is no way to determine the origin of a hitbot. Google will be in for big trouble once it figures this out, if it hasn't already. Google is already banning Adsense publishers and accusing them of using automated clicking mechanisms--essentially accusing webmasters without proof. When these webmasters complain, Google's response is that its algorithm is proprietary and that it isn't able to disclose any details. In my opinion this is a stall tactic--Google is not saying anything because it doesn't know what to do about this problem! The cat is already out of the bag, and Google can't stall forever. While it might be catching "real" cheaters, it will be condemning many honest webmasters in the process, and this will hurt Google's reputation greatly unless it comes up with a solution quickly. I believe the solution will involve mandatory use of Urchin for *any* website wanting to buy/sell traffic with Adsense/Adwords. This is the only way to detect and subsequently block/ignore non-human behavior on such a massive scale--and it would be one step closer to tracking sales. The benefits of additional metrics will probably outweigh the cost of migrating thousands of websites. This could be a painful process, but in the end I think we will all come out ahead of the thieves.
Now back to why the origin of hitbots cannot be determined. Many "hitbots" are surfing your website via proxy servers, very often anonymous ones. It's a long, slow process just to determine which surfers are using proxy servers, or anonymous proxy servers. Even if this information were useful for catching cheaters, collecting it would slow websites down to the point where they wouldn't be usable. Just for kicks, let's pretend you had a list of all known anonymous proxy servers (suspected hitbots) in the world at any given moment and you were able to check each incoming visitor to your website against this list--what would that tell you? It wouldn't tell you much, but here's one assumption you might be tempted to make. Let us pretend that one of our ads on website XYZ is resulting in a high volume of (incoming) anonymous proxy traffic. Can we now assume that the website running this ad is cheating us? The answer is firmly--NO!
First of all, this proxy traffic could be coming from another website entirely, two or more sites removed. The fact that robots are spidering links, and clicking our ad in the process, tells us nothing about the origin of the robots or the motive for deploying them. Even if we observe where the clicks are going and compare that with where the clicks came from, we still can't determine who stands to profit from this apparently non-human traffic--and even if we could, we can't explicitly prove there was a motive.
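The lookup imagined above can be sketched in a few lines of Python. Everything here is hypothetical: the proxy list, the click log, and the names `KNOWN_PROXIES`, `CLICKS`, and `proxy_share_by_referrer` are made up for illustration, and the addresses come from ranges reserved for documentation.

```python
from collections import defaultdict

# Hypothetical list of known anonymous proxy IPs.
KNOWN_PROXIES = {"203.0.113.5", "203.0.113.9", "198.51.100.77"}

# Hypothetical click log: (visitor IP, site the click came from).
CLICKS = [
    ("203.0.113.5", "xyz.example"),
    ("203.0.113.9", "xyz.example"),
    ("192.0.2.10", "xyz.example"),
    ("192.0.2.44", "abc.example"),
]

def proxy_share_by_referrer(clicks, proxies):
    """Return, per referring site, the fraction of clicks from listed proxies."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for ip, referrer in clicks:
        totals[referrer] += 1
        if ip in proxies:
            flagged[referrer] += 1
    return {ref: flagged[ref] / totals[ref] for ref in totals}

shares = proxy_share_by_referrer(CLICKS, KNOWN_PROXIES)
for site, share in shares.items():
    print(f"{site}: {share:.0%} of clicks came from listed proxies")
```

All the lookup produces is a ratio per referring site. Nothing in the result says who deployed the robots, where they started crawling, or who profits from the clicks.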
To make matters even worse, we can generally assume that hitbots are often the indirect result of spyware--essentially "trojans". There could be a hitbot quietly running in the background on your machine right now (remotely and secretly commanded by an anonymous hacker) and you might not even know about it. Sounds like science fiction? No--this has been the reality for years. So the bottom line here is this: suppose you could manage to detect an anonymous proxy, trace it back to an ISP, and then call up the person on the phone. Just bear with me. You would probably get to speak with your average Joe surfer. If you were really lucky, and assuming this hitbot traffic had criminal origins, you might determine which spyware application was generating it, but you would still need to take additional measures (beyond the scope of this document) to determine who was commanding the hitbot. Now we are really stretching: the person telling these hitbots what to do could cover their tracks with a long chain of anonymous proxy servers linked together using encrypted passwords. Even if we were able to catch this hacker, and even if he were to admit what he had done, that would likely only serve to establish the innocence of our original webmaster, who was simply running our ad--in this scenario, the victim of circumstance.
So you can see--there's no way to point the finger here. It is one thing, and a wise one, for Google to accuse webmasters of clicking their own links (that only takes tracking something as basic as the IP of the webmaster who signed up to Adsense), but it is quite another matter to make assumptions about all the "non-human" traffic on the internet--the hitbots, the crawlers, and on and on. It would be difficult to estimate the percentage of all internet traffic that is non-human. No doubt, many <a href="http://www.robotstxt.org/wc/active/html/type.html">documented robots</a> can be turned away with robots.txt, provided they choose to honor it, but there's a very good likelihood that there are thousands more undocumented robots crawling the web unrestricted. Hitbots are a reality--and that has been the case for years now. We need to learn to live with hitbots/robots and not spend so much time counting "hits" and obsessing over them, not just because they are difficult to detect, but because they are useful.
What we need to do instead is look at the bigger picture. If I am Google, can I get a big, global picture of internet traffic by looking at things from the macro level? The answer is yes, because the Adsense code is everywhere--imagine how much data can be collected! There will need to be humans looking over this data carefully, looking for patterns that don't appear to be human and paying attention to language- and geography-specific information--looking to classify traffic for everyone's benefit. At the more micro level we can simply look at sales information and determine which keywords or websites deliver higher-quality, better-converting traffic.
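At the micro level, the comparison could look something like the following Python sketch. The keywords, the figures, and the names `STATS` and `conversion_rates` are invented for illustration; a real report would draw on whatever sales tracking the advertiser already has.

```python
# Hypothetical per-keyword totals taken from an advertiser's own sales records.
STATS = {
    "blue widgets": {"clicks": 200, "sales": 10},
    "cheap widgets": {"clicks": 1000, "sales": 5},
    "widget repair": {"clicks": 0, "sales": 0},
}

def conversion_rates(stats):
    """Map each keyword to sales per click; keywords with no clicks get 0.0."""
    return {
        keyword: (s["sales"] / s["clicks"] if s["clicks"] else 0.0)
        for keyword, s in stats.items()
    }

rates = conversion_rates(STATS)
# Best-converting keywords first.
ranked = sorted(rates, key=rates.get, reverse=True)
for keyword in ranked:
    print(f"{keyword}: {rates[keyword]:.1%} of clicks became sales")
```

In this made-up data "cheap widgets" draws five times the clicks of "blue widgets" but converts at a tenth of the rate. Counting hits alone would rank the two keywords the wrong way around, whether the extra clicks come from robots or from uninterested humans.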
Back to advertising. Paying for public impressions/clicks has been obsolete for at least five years. Remember LinkExchange? Banner swap programs failed miserably because of cheating and abuse. This is also why the original "PPC" Goto/Overture portal failed. Perhaps it is not so obvious, but Google Adwords is not simply PPC. Google is measuring *behavior*. The key to "detecting humanity" is observing surfing behavior--not something you can do simply by counting impressions. One of the early complaints about Overture was that a competitor could simply click my ad and I would lose money. Google seems to have solved this by awarding "clicked ads" superior placement. If you want to click my ad for a malicious reason, sure, I will be paying for it, but you will also be helping me to increase my CTR and earn higher placement in search results.
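Here is a rough Python sketch of that trade-off. The scoring rule (bid multiplied by click-through rate) is an assumption made for illustration, not a description of Google's proprietary algorithm. The `Ad` class, the `rank_score` function, and all of the numbers are hypothetical, and the sketch assumes each click is charged at the full bid.

```python
from dataclasses import dataclass

@dataclass
class Ad:
    name: str
    bid: float         # maximum price per click, in dollars
    impressions: int
    clicks: int

    @property
    def ctr(self) -> float:
        """Click-through rate: clicks divided by impressions."""
        return self.clicks / self.impressions if self.impressions else 0.0

def rank_score(ad: Ad) -> float:
    # Assumed rule for this sketch: placement follows bid weighted by CTR.
    return ad.bid * ad.ctr

mine = Ad("mine", bid=0.50, impressions=1000, clicks=20)
rival = Ad("rival", bid=0.60, impressions=1000, clicks=18)
print(rank_score(mine) < rank_score(rival))    # the rival starts out ahead

# The rival maliciously views and clicks my ad five times.
malicious_clicks = 5
clicked = Ad("mine", mine.bid, mine.impressions + malicious_clicks,
             mine.clicks + malicious_clicks)
cost = malicious_clicks * mine.bid             # what the attack costs me
print(rank_score(clicked) > rank_score(rival), cost)
```

Under these assumptions the five hostile clicks cost me $2.50 and lift my ad above the rival's. Whether that trade is a bargain depends entirely on the real ranking rule, which is not public.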
Should the most clicked ads get the top listings? Does this result in the highest click-output, in the short term at least? Would this undermine Google's integrity? That's up for debate, and it explains why Google's ranking algorithm has become so complex. Another issue here is that if a surfer finds what he/she is looking for quickly, you can't serve as many ads. Following this shortsighted line of reasoning, you could even make a case for showing the search results that net the highest Adsense payout. Again, Google needs to be seen as the most useful search resource on the internet, and so far that seems to be the case--although they have a real challenge ahead. Adsense/Adwords are essentially affiliate programs. Running affiliate programs of this magnitude will require *many* new employees--and where will they come from? You don't learn this stuff in college. Catching PPC cheaters will not be easy--these hackers are usually very intelligent, hungry, and from third-world countries where cheating the system is a way of life--a way to survive.
On the other hand, Google can afford to be a little lazy--after all, they are in the lead, and more clicks equal higher profits for Google, even if Adwords/Adsense ROI is low or nonexistent! Buying into Adwords/Adsense could get expensive, but without them online advertising would be just as expensive, if not more so, and much more time-consuming. Because of the reach of Adwords/Adsense--both in terms of pageviews (assisted by Google's powerful context-sensitive "neighborhood" based indexing) and universal recognition among advertisers--there won't be much competition.
Because of the rarity and usefulness of Google's Adwords/Adsense technology, Google will be able to dictate the rules of "webmastering" to us with its enormous TOS. You can expect it to get longer and more restrictive. So far I support Google's stance on most issues in terms of content restrictions, and I believe that this will significantly contribute to the "cleaning up" of the internet (e.g. penalizing sites using vulgar language, sites linking to "bad neighborhoods", disreputable affiliate programs, etc.).