
Fix Canonical Issues (www vs non-www) and More on an Apache Server

Fix the www vs non-www Canonical Issue for Your Domain

As a bonus, redirect the index page to the root, add a trailing slash to folder names, or redirect the IP address to the domain.

You can even redirect URLs of parked domains to the corresponding URLs on a single main domain.

NB: This only works on an Apache server with the mod_rewrite module enabled and where you are allowed to make or modify an .htaccess file.

It may not work properly if FrontPage Extensions are enabled. However, the following article describes a way that may make it work:

http://www.geekswhoknow.com/articles/how-to-make-mod-rewrite-work-with-frontpage-extensions.php 
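If you are unsure whether mod_rewrite is available, one common precaution is to wrap the directives in an <IfModule> block. This is a minimal sketch and not part of the directives given in this article: with the wrapper, a server without mod_rewrite skips the rules instead of returning a 500 error for the unknown RewriteEngine directive. The trade-off is that the redirects then fail silently, so it is probably best to remove the wrapper once you know the module is present.

```apache
# Sketch: apply the rewrite rules only when mod_rewrite is loaded.
<IfModule mod_rewrite.c>
    RewriteEngine on
    RewriteBase /
    # ... the redirect directives from this article go here ...
</IfModule>
```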

 

First check whether you already have an .htaccess file. Since that’s typically a hidden file, you have to use whatever command your FTP program has to display hidden files (or the equivalent option in the file manager in the control panel of your website).

If you already have an .htaccess file, great: download it and edit it in a plain text editor like Notepad.

Otherwise, create a new text document in a plain text editor like Notepad and, when you are done, save it with the name .htaccess.

Keep these directives together in your .htaccess file. Replace example.com with your own domain, and index.html with your own index page (whatever your homepage actually is, e.g. home.html or index.php):

 

RewriteEngine on
RewriteBase /
### re-direct index.html to root / ###
RewriteCond %{THE_REQUEST} ^.*\/index\.html\ HTTP/
RewriteRule ^(.*)index\.html$ /$1 [R=301,L]
### re-direct index.htm to root / ###
RewriteCond %{THE_REQUEST} ^.*\/index\.htm\ HTTP/
RewriteRule ^(.*)index\.htm$ /$1 [R=301,L]
### re-direct index.php to root / ###
RewriteCond %{THE_REQUEST} ^.*\/index\.php\ HTTP/
RewriteRule ^(.*)index\.php$ /$1 [R=301,L]
### re-direct default.html to root / ###
RewriteCond %{THE_REQUEST} ^.*\/default\.html\ HTTP/
RewriteRule ^(.*)default\.html$ /$1 [R=301,L]
### re-direct home.html to root / ###
RewriteCond %{THE_REQUEST} ^.*\/home\.html\ HTTP/
RewriteRule ^(.*)home\.html$ /$1 [R=301,L]
### re-direct IP address to www
### re-direct non-www to www
### re-direct any parked domain to www of main domain
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,NC,L]
####

 

############################ from here ########################################

###### This group will not be needed by most websites - try without it first

### add a missing trailing slash to end of domain name or folder name
### ONLY in case the server does not compensate for it
### ONLY use it if needed - otherwise remove this block of directives.

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !(.*)/$
RewriteRule ^(.*)$ http://www.example.com/$1/ [L,R=301]

If using https, the last line should be:

RewriteRule ^(.*)$ https://www.example.com/$1/ [L,R=301]

############################# to here ##########################################

 

 

NB:  The $ at the end of example.com in the RewriteCond step may or may not be needed.

 

The two lines in blue might not be needed or might not work, depending on server configuration, so they can be skipped.

Some Apache servers are set up so that add-on domains (not merely parked domains) share the same root folder.

This requires a different .htaccess directive, one that matches only the bare main domain, to 301 redirect non-www to www:

RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,NC,L]

If using https, the last line should be:

RewriteRule ^(.*)$ https://www.example.com/$1 [r=301,nc,L] 

 


A variant of the directive for adding www to URLs may be needed on your server:

RewriteEngine on

## Add www. to all URLs
## (a negated RewriteCond cannot set %1, so the host is taken from %{HTTP_HOST})
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]

If using https, the last line should be:

RewriteRule ^(.*)$ https://www.%{HTTP_HOST}/$1 [R=301,L]
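When the whole site is served over https, the scheme and the www prefix can be fixed in a single redirect. The following is only a sketch under assumptions not stated in this article: that the server itself sets the %{HTTPS} variable (behind a proxy or load balancer that terminates SSL it may always read "off", which would cause a redirect loop), and that every host name reaching this .htaccess file should get the www prefix.

```apache
RewriteEngine on
# Redirect if the request came in over plain http,
# or if the host name does not start with www.
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
# Capture the host name without any leading www. into %1
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
RewriteRule ^(.*)$ https://www.%1/$1 [R=301,L]
```

The third condition always matches; it is there only to capture the bare host name, because the negated condition above it cannot supply %1.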

 

Or maybe you want to redirect the www version of your URLs (or IP address based ones, or other parked domains) to the non-www version of the main domain. No problem. Here it is:

 

RewriteEngine on
RewriteBase /
### re-direct index.html to root / ###
RewriteCond %{THE_REQUEST} ^.*\/index\.html\ HTTP/
RewriteRule ^(.*)index\.html$ /$1 [R=301,L]
### re-direct index.htm to root / ###
RewriteCond %{THE_REQUEST} ^.*\/index\.htm\ HTTP/
RewriteRule ^(.*)index\.htm$ /$1 [R=301,L]
### re-direct index.php to root / ###
RewriteCond %{THE_REQUEST} ^.*\/index\.php\ HTTP/
RewriteRule ^(.*)index\.php$ /$1 [R=301,L]
### re-direct default.html to root / ###
RewriteCond %{THE_REQUEST} ^.*\/default\.html\ HTTP/
RewriteRule ^(.*)default\.html$ /$1 [R=301,L]
### re-direct home.html to root / ###
RewriteCond %{THE_REQUEST} ^.*\/home\.html\ HTTP/
RewriteRule ^(.*)home\.html$ /$1 [R=301,L]
### re-direct IP address to non-www form of main domain
### re-direct www to non-www of main domain
### re-direct any parked domain to non-www form of main domain
RewriteCond %{HTTP_HOST} !^example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,NC,L]
####

 

############################ from here ########################################

###### This group will not be needed by most websites - try without it first

### add a missing trailing slash to end of domain name or folder name
### ONLY in case the server does not compensate for it
### ONLY use it if needed - otherwise remove this block of directives.

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !(.*)/$
RewriteRule ^(.*)$ http://example.com/$1/ [L,R=301]

############################# to here ##########################################

 

NB:  The $ at the end of example.com in the RewriteCond step may or may not be needed.

 

Save the file (as .htaccess) and upload it to the root of your website.

Some Apache servers are set up so that add-on domains (not merely parked domains) share the same root folder. I consider it a poor setup.

This requires a different .htaccess directive, one that matches only the www form of the main domain, to 301 redirect www to non-www:

RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,NC,L]

If using https, the last line should be:

RewriteRule ^(.*)$ https://example.com/$1 [r=301,nc,L]

——————————————–

A variant of the directive for stripping www from URLs may be needed on your server:

 

RewriteEngine on

## Strip www. from all URLs
## (%1 is the host name with the leading www. removed)
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]

If using https, the last line should be:

RewriteRule ^(.*)$ https://%1/$1 [R=301,L]

 

 

Now don’t forget to make sure all the links on your site are consistent with this: don’t mix and match www and non-www links, and don’t link to index.html (or whatever your homepage is). Instead use the root URL (“http://www.example.com/” or “http://example.com/”, whichever form you chose).

 

Or, if you prefer, make all links on your site relative (no domain specified), so that the homepage is just “/”.

 

These .htaccess directives will also help with links to the index page of a subfolder, exactly the same way as links to the index page of the site.

 

So the index page of a folder is “/foldername/” (i.e. relative to the root of the site).
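Because the index redirects work the same way for the site root and for any subfolder, the per-filename rule pairs could probably be condensed into one. The following is only a sketch, not the directives as given in this article: the alternation also matches combinations the original rules do not list (for example default.php or home.htm), so trim it to the file names your site really uses.

```apache
RewriteEngine on
# Only act on what the browser actually requested, not on the server's
# internal lookup of the index file, which would otherwise cause a loop.
RewriteCond %{THE_REQUEST} ^.*/(index|default|home)\.(html?|php)\ HTTP/
# $1 is the folder path, so /foldername/index.html becomes /foldername/
RewriteRule ^(.*)(index|default|home)\.(html?|php)$ /$1 [R=301,L]
```

With this in the root .htaccess file, a request for /foldername/index.html should be answered with a 301 redirect to /foldername/, and /index.html with a redirect to /.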

 

Don’t leave folders without an index page (or appropriate .htaccess directives to suppress directory listings).
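For folders that have no index page, directory listings can usually be suppressed with a single directive. This is a minimal sketch that assumes your host allows the Options directive in .htaccess files (AllowOverride Options); if it does not, the server will typically answer with a 500 error, in which case remove the line and ask your host.

```apache
# Sketch: turn off automatic directory listings for this folder and below.
Options -Indexes
```

A visitor requesting a folder without an index page then gets a 403 Forbidden response instead of a list of the files in it.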

 

ATTENTION:

If you have a separate folder for an admin login, for instance, you will have to add an .htaccess file in that folder as well, containing this directive:

 

RewriteEngine off

 

Why bother?

Well, it’s really so that you concentrate your URLs under one form only and never have more than one URL for the same page of content.

This method has nothing to do with general URL rewriting for the purpose of eliminating file extensions or transforming dynamic URLs (which have query strings) into static URLs. For that you need other directives.

One good tool that helps generate somewhat more SEF (search engine friendly) URLs is: http://www.webmaster-toolkit.com/mod_rewrite-rewriterule-generator.shtml .

 

Other resources

More useful tools: https://www.webado.net/resources/.