Strip Unwanted Query String (Apache)

This will work only on an Apache server.

Method 1: Using the .htaccess File Only

The following .htaccess code will strip any query string from a URI that ends in .html.

 

RewriteEngine On

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^?]*)\?
RewriteRule \.html$ http://www.example.com/%1? [R=301,L]
Or, as a more general directive, to strip an unwanted query string from every URI, including the domain root (e.g. from http://www.example.com/?blahblah&blah):
RewriteEngine On

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^?]*)\?
RewriteRule (.*) /$1? [R=301,L]

This can obviously be adapted to other file extensions. With care, extra conditions can be added as well, to provide exceptions or handle specific cases.
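As a rough illustration of such an adaptation, the sketch below applies the first rule to several extensions at once. The choice of .html, .htm and .php is only an assumption for the example; list whichever extensions your own site uses.

```apache
RewriteEngine On

# Strip the query string from any URI ending in .html, .htm or .php.
# %1 is the path captured by the RewriteCond, without the query string.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^?]*)\?
RewriteRule \.(html?|php)$ http://www.example.com/%1? [R=301,L]
```

On Apache 2.4 or later, the QSD flag (e.g. [R=301,L,QSD]) should achieve the same thing as the trailing ? in the substitution.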
Here’s an example where you want to strip off one particular query string, e.g. param=paramvalue, from any URL:
RewriteEngine On

RewriteCond %{QUERY_STRING} ^param=paramvalue
RewriteRule (.*) /$1? [R=301,L]
A more general example, which strips off any query string that starts with either doing_wp_cron or refsite, whether that parameter appears alone or is followed by others:
RewriteEngine On
RewriteCond %{QUERY_STRING} ^doing_wp_cron=(.*)$ [NC,OR]
RewriteCond %{QUERY_STRING} ^refsite=(.*)$ [NC]
RewriteRule (.*) /$1? [R=301,L]
Or, more general still, matching either parameter anywhere in the query string:
RewriteCond %{QUERY_STRING} (^|&)doing_wp_cron= [NC,OR]
RewriteCond %{QUERY_STRING} (^|&)refsite= [NC]
RewriteRule (.*) /$1? [R=301,L]
Here’s another example, which removes the query string from all requests EXCEPT those for a particular URI, e.g. some-script.php:
RewriteCond %{QUERY_STRING} .
RewriteCond %{THE_REQUEST} !^[A-Z]{3,9}\ [^?]*/some-script\.php\?
RewriteRule (.*) /$1? [R=301,L]
So now we need to ensure that whatever query string comes attached to some-script.php contains ONLY the query string parameters that some-script.php actually handles, and nothing else. The parameters also have to be in the correct order, to avoid what amount to duplicate URLs. Assume the script checks the parameter values for validity anyway (as it should).

Thus you need some PHP code added at the top of some-script.php to attempt a preliminary validation of the query string. I will post it once I have tested the code. Until then…
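As a rough illustration only (this is not the author's code), a check of that kind could look like the sketch below. The parameter names id and page and the function name canonical_query() are hypothetical; substitute whatever some-script.php really accepts. It also assumes the site is served over http, as in the other examples on this page.

```php
<?php
// Hypothetical whitelist: the parameters some-script.php accepts, in canonical order.
$allowed = array('id', 'page');

// Returns the canonical query string (allowed parameters only, in whitelist order),
// or null if the given query string is already canonical.
function canonical_query($query_string, array $allowed) {
    parse_str($query_string, $params);
    $clean = array();
    foreach ($allowed as $name) {
        // Keep only plain string values; array-style parameters (id[]=1) are dropped.
        if (isset($params[$name]) && is_string($params[$name])) {
            $clean[$name] = $params[$name];
        }
    }
    $canonical = http_build_query($clean, '', '&');
    return ($canonical === $query_string) ? null : $canonical;
}

$query = isset($_SERVER['QUERY_STRING']) ? $_SERVER['QUERY_STRING'] : '';
$canonical = canonical_query($query, $allowed);
if ($canonical !== null) {
    // Redirect to the same path with the cleaned-up query string (or none at all).
    $path = strtok($_SERVER['REQUEST_URI'], '?');
    $goto = 'http://' . $_SERVER['HTTP_HOST'] . $path;
    if ($canonical !== '') {
        $goto .= '?' . $canonical;
    }
    header('Location: ' . $goto, true, 301);
    exit;
}
```

With this whitelist, a request for some-script.php?page=2&id=5 would be redirected to some-script.php?id=5&page=2, and any parameter not in the list would be dropped. The values themselves still have to be validated by the script.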

 

If instead of stripping off the query string you actually want to respond with a 404 or 410 for a url with certain query string parameters, see this other page.

Method 2: Using PHP scripting

Modify the .htaccess file by adding this Apache handler if your .html files are not already being parsed as PHP:

 

AddHandler application/x-httpd-php .html

 

The above .htaccess directive may differ depending on your own server’s configuration.

 

For instance it may need to be this instead:

 

AddType application/x-httpd-php .html

or

AddType application/x-httpd-php5 .html

 

Whichever directive you use, it needs to allow your .html files to be parsed for PHP code as well as HTML.

 

 

Then create a PHP script (using a plain-text editor such as Notepad) with this code in it:

 

<?php
// This script strips off any query string that may have been tacked onto a URL
// and redirects to the same URL without the query string.
// Only to be used for pages that are not supposed to have query strings.
// Add this script at the top of any page that should not have any query string at all.
// The page must be parsed by the server as PHP. If the page does not end in .php,
// you need to add a directive to an .htaccess file in the root of the website.
// For example, to make .html be parsed as PHP, add this line to the .htaccess file
// (without the leading //):
//AddHandler application/x-httpd-php .html

// NB: the ? and the query string are part of REQUEST_URI.
$request_uri = $_SERVER['REQUEST_URI'];
$host = $_SERVER['HTTP_HOST'];

$qm = strpos($request_uri, "?");
// strpos() returns false when there is no "?", so compare strictly.
if ($qm !== false) {
    $request_uri = substr($request_uri, 0, $qm);
    // Drop a trailing "index.php" so the redirect goes to the directory URL.
    if (substr($request_uri, -10) === "/index.php") {
        $request_uri = substr($request_uri, 0, -9);
    }
    // Change to "https://" if the site is served over HTTPS.
    $goto = "http://" . $host . $request_uri;

    header("HTTP/1.1 301 Moved Permanently");
    header("Location: $goto");
    exit;
}
?>

 

Save the above script under the name strip-query.php and upload it to your site to a folder such as /includes/ (or another folder that’s in the path of all your pages).

 

Add this bit of PHP code at the very top of every one of your pages (NB: only files that are served as pages at their own URL, not files that are merely included by other pages), before any other code and before the doctype:

 

<?php include("includes/strip-query.php"); ?>

 

This will strip off unwanted query strings that may have been added to your static urls.

 

NB: Do not use this script if your site already uses URLs with query strings. A different script is needed to selectively strip off only certain query strings or only certain query string parameters.
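A minimal sketch of what such a selective script might look like is given below. It is an illustration under stated assumptions, not a tested replacement for strip-query.php: the function name strip_params() is hypothetical, the parameter names doing_wp_cron and refsite are simply borrowed from the .htaccess examples in Method 1, and the site is assumed to be served over http.

```php
<?php
// Parameters to remove (lower case); every other parameter is left untouched.
$unwanted = array('doing_wp_cron', 'refsite');

// Returns $request_uri with the unwanted parameters removed,
// or null if none of them is present (so no redirect is needed).
function strip_params($request_uri, array $unwanted) {
    $qm = strpos($request_uri, '?');
    if ($qm === false) {
        return null; // no query string at all
    }
    $path = substr($request_uri, 0, $qm);
    $kept = array();
    $removed = false;
    foreach (explode('&', substr($request_uri, $qm + 1)) as $pair) {
        if ($pair === '') {
            continue; // ignore empty segments such as a trailing "&"
        }
        $parts = explode('=', $pair, 2);
        // Compare names case-insensitively, like the [NC] flag in the .htaccess rules.
        if (in_array(strtolower(urldecode($parts[0])), $unwanted, true)) {
            $removed = true;
        } else {
            $kept[] = $pair; // keep the original encoding of the pair
        }
    }
    if (!$removed) {
        return null;
    }
    return $kept ? $path . '?' . implode('&', $kept) : $path;
}

$target = isset($_SERVER['REQUEST_URI'])
    ? strip_params($_SERVER['REQUEST_URI'], $unwanted)
    : null;
if ($target !== null) {
    header('Location: http://' . $_SERVER['HTTP_HOST'] . $target, true, 301);
    exit;
}
```

It would be included at the top of each page in the same way as strip-query.php. Because it redirects only when one of the listed parameters is actually present, URLs that legitimately carry other query string parameters are left alone.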