To WWW or not to WWW, that is the duplicate content question!

Google is really keen on removing duplicate content from its search results. In the old days we would submit our sites in every way possible to get more results in searches, e.g.:

  • http://www.craigharris.org/
  • http://craigharris.org/
  • http://66.281.47.79
  • http://www.craigharris.org/index.html
  • http://craigharris.org/index.html

You get the idea. These are all the same page. Google wants you to pick just one. Only one is going to get into the search results and you will be penalized if Google finds more than one! So are you going to go for the traditional www.craigharris.org or the short craigharris.org? Does it make a difference? It makes no difference to Google. I like the shorter craigharris.org, but most of the links I’ve created over the years have been to www.craigharris.org, so I use the www format on all my sites.

Now that I’ve chosen to WWW, what can I do to make Google happy? I don’t want to break any non-WWW links, but I can’t have them keep serving the same page either, since that will upset Google. If you do a Google search on the subject you will get a lot of answers that tell you to add some code to your .htaccess file that rewrites any request for craigharris.org to www.craigharris.org and returns a 301 “Moved Permanently” status at the same time. This does work and does make Google happy BUT… using .htaccess to do this puts a strain on your server, especially if you run a high-traffic site, because Apache has to look for and process .htaccess files on every request. Also, if you are like me, you already have stuff in your .htaccess file and you really don’t want to make it any more complicated than it already is. My solution is to modify the Apache httpd.conf file.
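For comparison, the .htaccess approach those search results describe usually looks something like the sketch below. This is only an illustration of the common pattern, not a tested configuration: it assumes mod_rewrite is enabled and that the file sits in the site's document root.

```apache
# .htaccess in the document root (assumes mod_rewrite is available)
RewriteEngine On

# Match requests whose Host header is the bare domain (case-insensitive)
RewriteCond %{HTTP_HOST} ^craigharris\.org$ [NC]

# Send them to the www host with a 301, keeping the requested path
RewriteRule ^(.*)$ http://www.craigharris.org/$1 [R=301,L]
```

Apache has to evaluate these rules on every request that reaches the directory, which is the overhead the httpd.conf approach avoids.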

Modifying the core Apache httpd.conf file can only be done if you have root access to your server. Most people are on hosted accounts, which explains all the .htaccess solutions. If you can’t modify the Apache files yourself, ask your system administrators if they will do it for you. Making the change is really simple.

On my server all the Virtual Hosts are listed in the httpd-vhosts.conf file. Older servers may still have them in the main httpd.conf file. Edit this file with your favorite text editor; I like Nano. My old file looked like this:

<VirtualHost *:80>
  ServerAdmin webmaster@nospam.org
  DocumentRoot "/home/craigharris/public_html"
  ServerName www.craigharris.org
  ServerAlias craigharris.org *.craigharris.org
  ScriptAlias /cgi-bin/ /home/charris/cgi-bin/
  ErrorLog "/home/craigharris/logs/error_log"
  TransferLog "/home/craigharris/logs/transfer_log"
  CustomLog "/home/craigharris/logs/referer" common
</VirtualHost>

My new modified file looks like this:

<VirtualHost *:80>
  ServerAdmin webmaster@nospam.org
  DocumentRoot "/home/craigharris/public_html"
  ServerName www.craigharris.org
  ScriptAlias /cgi-bin/ /home/charris/cgi-bin/
  ErrorLog "/home/craigharris/logs/error_log"
  TransferLog "/home/craigharris/logs/transfer_log"
  CustomLog "/home/craigharris/logs/referer" common
</VirtualHost>
<VirtualHost *:80>
  ServerName craigharris.org
  ServerAlias *.craigharris.org
  Redirect permanent / http://www.craigharris.org/
</VirtualHost>

I removed the “ServerAlias” line from the original VirtualHost block and added a second VirtualHost block after it. The result is that all requests for craigharris.org are redirected to www.craigharris.org, with the requested path carried over. Even made-up sub-domains like xyz.craigharris.org will redirect to www.craigharris.org.

After you finish editing the file, remember to restart Apache; “apachectl graceful” is the typical command.

Using the “Redirect permanent” line in Apache’s httpd.conf file uses almost zero resources, because the configuration is read once when Apache starts and kept in memory rather than being re-read on every request the way .htaccess files are.

If you want to non-WWW rather than WWW, it shouldn’t be too hard to flip the bits around.
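As a rough sketch of that flip, assuming the same paths and log settings as my WWW setup above (I have not run this variant myself), the bare domain becomes the real site and the www name, plus any other sub-domain, becomes the redirect:

```apache
# The bare domain now serves the site
<VirtualHost *:80>
  ServerAdmin webmaster@nospam.org
  DocumentRoot "/home/craigharris/public_html"
  ServerName craigharris.org
  ScriptAlias /cgi-bin/ /home/charris/cgi-bin/
  ErrorLog "/home/craigharris/logs/error_log"
  TransferLog "/home/craigharris/logs/transfer_log"
  CustomLog "/home/craigharris/logs/referer" common
</VirtualHost>

# www and every other sub-domain are redirected to the bare domain
<VirtualHost *:80>
  ServerName www.craigharris.org
  ServerAlias *.craigharris.org
  Redirect permanent / http://craigharris.org/
</VirtualHost>
```

The wildcard alias only matches names with something in front of the dot, so it does not catch craigharris.org itself and cannot cause a redirect loop.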

When you are done, make sure your sitemap matches the format you have chosen, and set things up to match in Google Webmaster Tools, Bing, Yahoo, etc.

Read my follow-ups to this article here: http://www.craigharris.org/2011/03/30/avoid-duplicate-content-from-index-html-or-index-php/

And here: http://www.craigharris.org/2011/03/29/more-to-www-or-not-to-www-that-is-the-duplicate-content-question/
