
Google And Duplicate Content

  • Richard Lowe
    Message 1 of 1 , Aug 1, 2002
      Copyright (C) Richard Lowe Jr. and Claudia Arevalo-Lowe, 1999-2002.
      Permission is granted to reprint the following article as long as
      no changes are made and the byline, copyright information, and
      the resource box is included. Please let me know if you use this
      article by sending an email to mailto:articles@...

      Article Title: Google And Duplicate Content
      Author: Richard Lowe, Jr.
      Contact Author: mailto:articles@...
      Publishing Guidelines: May be freely published w/bylines
      Web Address: http://www.internet-tips.net
      Autoresponder: mailto:article-339@...?subject=send

      I've been following the discussion about Google and mirrored
      information for some time. It is "common knowledge" that Google
      penalizes page rank when it determines that content is
      duplicated somewhere else. In fact, I've read many experts
      stating that there should be no duplicate domain names and no
      duplicate content anywhere.

      On the face of it the arguments appear to be sound. Google
      obviously has several billion pages in its database and could,
      it appears, easily determine if content is duplicated. It also
      seems, again on the face of it, that it's reasonable to check
      for duplicate content, as this is the "mark of a spammer" and
      not necessary on the web with hyperlinking available. At least,
      this is the common wisdom.

      However, sometimes what seems reasonable and possible is not:
      not by a long shot.

      Let's begin with the technical side of things. You've got
      domain x and domain y with exactly the same content. How on
      earth would Google be able to figure that out? Let's say Google
      had 3 billion pages in its database. To compare every page to
      every page would be an enormous task - quadrillions of
      comparisons.

      Now, if site x had page "page1" which linked to site y which
      also had "page1", then it would be possible for Google to
      determine the duplicate content. Conceivably, it could check
      this out.
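
      To put a rough number on the page-to-page comparison described
      above, here is a back-of-the-envelope sketch. It assumes the
      naive approach of comparing every page with every other page
      exactly once. The 3 billion figure comes from the example above;
      the rate of one million comparisons per second is purely an
      assumption picked for illustration, not a measured figure.

```python
# Back-of-the-envelope count of naive page-to-page comparisons.
PAGES = 3_000_000_000               # index size used in the example above
COMPARISONS_PER_SECOND = 1_000_000  # assumption, chosen only for illustration
SECONDS_PER_YEAR = 60 * 60 * 24 * 365


def pairwise_comparisons(n: int) -> int:
    """Number of distinct unordered pairs among n pages: n * (n - 1) / 2."""
    return n * (n - 1) // 2


pairs = pairwise_comparisons(PAGES)
years = pairs / COMPARISONS_PER_SECOND / SECONDS_PER_YEAR

print(f"{pairs:,} comparisons")
print(f"about {years:,.0f} years at {COMPARISONS_PER_SECOND:,} per second")
```

      The count works out to roughly 4.5 x 10^18 pairs, which is
      thousands of quadrillions. Even at the assumed rate, a single
      pass would take well over a hundred thousand years.
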

      Not only is the task enormous, but the benefit is so tiny as to
      be insignificant. Duplicate content does not, in any way, shape
      or form, imply spamming. In actual fact, a duplicate site is
      generally going to lower page rank of BOTH sites. Instead of
      having 100 links to one site, there will presumably be 50 links
      to one and 50 to another. This would tend (all things being
      equal) to lower the page ranking of both sites. So Google gains
      nothing by this incredible expenditure of resources.

      There are several reasons for duplicate content which have
      nothing to do with spamming. Sometimes the content is actually
      duplicated, and sometimes it's just that there are several
      different domains (at least the www and non-www versions) for
      the same website:

      Mirroring a site for load balancing - This is very common. The
      purpose is to split up the traffic between two copies of the

      Mirroring for region - Sometimes site mirroring is done simply
      to make it more efficient on the internet backbone itself. You
      might put an identical copy of a site in Europe, for example,
      to reduce traffic across the Atlantic, which should make it
      faster in European countries.

      Viral marketing - It's extremely common to allow other sites to
      republish articles in return for a link.

      Different domain names - Sometimes a site might be referenced
      on many different domain names. You might want to allow the
      .com, .net and .org versions of the name to all work the same,
      you might allow for common misspellings or you might cover
      different keywords (sewing-tips and sewing-secrets are examples
      of possible combinations).

      Different domain names for different markets - you might also
      want to reference your site by different names in order to
      target different markets. You could, for example, have a site
      about search engine optimization and want to target both SEO
      and web designers. Thus domain names like seo.com and
      webdesign.com would make sense.

      www - Any good webmaster knows his or her site needs to be
      referenced with and without the www.

      Okay, so what's the smart thing to do? Well, it is possible
      that search engines do compare a limited number of pages to
      check for duplication. They could certainly check if someone
      reported something, and they might check directly linked pages
      (although this is still a heck of a lot of overhead for very
      little benefit).

      Of course, Google and the other search engines can account for
      a hefty percentage of the traffic received by a site. In fact,
      sometimes the number can exceed 70 percent. So it's wise to
      spend some time ensuring that you are totally clean when it
      comes to search engine optimization. In other words, a
      technician from any search engine should be able to examine
      your site down to its smallest detail and find no evidence of
      any kind of search engine spamming (attempting to get higher
      rankings by unethical means). This is absolutely critical to a
      site's survival for the long term.

      Keeping that in mind, here's what I tend to do.

      Multiple domains - Using multiple domains for the same site has
      a tremendous number of advantages. Thus, I tend to follow the
      advice given by others: take advantage of permanent redirection.
      In other words, set up a redirection (a 301 status code) which
      simply tells the browser "this page has moved, proceed to this
      page, and the move is permanent." This tells the spider about the
      redirection with no possibility of misunderstanding, yet allows
      for the multiple domains.
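
      As a minimal sketch of what such a permanent redirect looks
      like, the Python example below answers any request that arrives
      under a non-preferred host name with a 301 status code and a
      Location header pointing at the preferred one. The host name
      www.example.com, the helper redirect_target and the class
      RedirectHandler are all hypothetical names used only for
      illustration. In practice most sites would configure this in
      the web server itself rather than write code for it.

```python
from http.server import BaseHTTPRequestHandler
from typing import Optional

# Hypothetical preferred host name, for illustration only.
CANONICAL_HOST = "www.example.com"


def redirect_target(host: str, path: str) -> Optional[str]:
    """Return the preferred URL to redirect to, or None if no redirect is needed."""
    hostname = host.split(":")[0].lower()  # drop any port, ignore case
    if hostname == CANONICAL_HOST:
        return None
    return f"http://{CANONICAL_HOST}{path}"


class RedirectHandler(BaseHTTPRequestHandler):
    """Serves the preferred host and permanently redirects every other host."""

    def do_GET(self) -> None:
        target = redirect_target(self.headers.get("Host", ""), self.path)
        if target is None:
            # Request already uses the preferred host: serve content.
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.end_headers()
            self.wfile.write(b"Site content goes here.\n")
        else:
            # 301 = moved permanently; Location names the new address.
            self.send_response(301)
            self.send_header("Location", target)
            self.end_headers()


# To try it locally (port 8080 is an arbitrary choice):
#   from http.server import HTTPServer
#   HTTPServer(("", 8080), RedirectHandler).serve_forever()
```

      With this running, a request for /page1 under example.com
      would be sent on to http://www.example.com/page1, which also
      covers the with and without www case mentioned earlier.
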

      Republished articles - I allow others to republish many of my
      articles, and at this time I have records of over 10,000 of
      them all over the internet on thousands of web sites. This is
      not a problem, as these articles are sent in text format. The
      webmaster must then drop this text into his site, which requires
      some reformatting and shuffling around. Thus, the finished
      articles may have the same text but the formatting is very, very
      different. This is a highly respected method of gaining a large
      number of incoming links: I give you something (an article,
      i.e., content) and you give me something (a link back to my

      Mirroring - I haven't needed to do this yet, so I have no advice
      as to what to do if a site requires actual, physical multiple
      versions of itself. I would tend to just do it overtly (out in
      the open) and not worry about it.

      To see a list of articles available for reprint, you can send an
      email to:
      or visit

      NOTE: The following information must be included if you reprint
      this article:
      Richard Lowe Jr. is the webmaster of Internet Tips And Secrets
      at http://www.internet-tips.net - Visit our website any time to
      read over 1,000 complete FREE articles about how to improve your
      internet profits, enjoyment and knowledge.