
Force a 404 or 410 Using the .htaccess File on an Apache Server

How to force a 404 or 410 for a url

Easy enough to do on an Apache server with the mod_rewrite module enabled, using the .htaccess file.

Add these directives (adjusted for your particular situation):

 

RewriteEngine on

## list uris which are supposed to return 404: rewrite them to a non-existent url, for which the server should respond with a 404 anyway**
## in .htaccess the pattern is matched against the path relative to this folder, without the leading slash
RewriteRule ^path-to/url-of-bad-page-i-want-to-return-404\.html$ /nonexistent-file-which-gives-404.php [L]

## list uris which are supposed to return 410 (the G flag answers with 410 Gone)
RewriteRule ^path-to/url-of-bad-page-i-want-to-see-gone-forever\.html$ - [G]

## custom error documents: use only their uri relative to the root (/), NEVER the fully qualified url with the http://www.example.com/ part,
## because with a full url Apache sends the client a redirect to the error page instead of the 404 or 410 status
ErrorDocument 404 /error-404.php
ErrorDocument 410 /error-410.php

 

**If the server does not naturally respond with a 404 for a non-existent file, create a PHP script called nonexistent-file-which-gives-404.php containing this code:

 

<?php
// This file exists, but it answers every request with a 404 status and no body.
header( "HTTP/1.1 404 Not Found" );
exit;
?>

 

Even though the file exists, it responds with a 404, so as far as browsers, search engine crawlers and any other user agent checking the response are concerned, it does not exist.
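
One detail in the RewriteRule patterns is easy to get wrong. In a .htaccess file, mod_rewrite removes the directory prefix, including its trailing slash, before matching, so in the root .htaccess a request for /path-to/page.html is matched as path-to/page.html. The short Python sketch below models that behaviour to show that a pattern starting with ^(.*)/path-to/ only matches when something sits in front of /path-to/, while ^path-to/ matches the page at the root. It is a sketch under assumptions: Python's re module stands in for Apache's regex engine (they agree on simple patterns like these), page.html is a placeholder name, and the helper functions are hypothetical.

```python
import re


def per_directory_path(url_path: str) -> str:
    """Return the path as a RewriteRule in the root .htaccess sees it.

    In per-directory context mod_rewrite strips the directory prefix, which
    always ends with a slash, so the pattern never sees a leading slash.
    """
    return url_path[1:] if url_path.startswith("/") else url_path


def rule_matches(pattern: str, url_path: str) -> bool:
    """Return True if the RewriteRule pattern would match this request path.

    Assumption: Python's re module stands in for Apache's PCRE here.
    """
    return re.search(pattern, per_directory_path(url_path)) is not None


if __name__ == "__main__":
    # page.html is a placeholder for the page you want to remove.
    patterns = (r"^(.*)/path-to/page\.html$", r"^path-to/page\.html$")
    paths = ("/path-to/page.html", "/blog/path-to/page.html")
    for pattern in patterns:
        for path in paths:
            print(f"{pattern:26} {path:24} {rule_matches(pattern, path)}")
```

Running it prints False then True for the first pattern and True then False for the second, so pick the form that matches where the page actually lives.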

How to force a 404 or 410 for a url containing a query string you don’t want to allow

This is similar to how we strip off a query string, except that we serve a custom 404 (or 410) error page and respond with a 404 (or 410) status.

This is useful for sites that have been hacked and then cleaned up, when you want to make sure that all the spammy urls created during the hack, formed with query strings your site doesn’t otherwise use, respond with 404 or 410.

In this example there are 3 query string parameters, parm1 through parm3, which together, in any order, make a url undesirable. Consecutive RewriteCond lines are combined with a logical AND, so the rule only fires when all three are present; put [NC,OR] on the first two conditions if any one of them should be enough. We want to respond with 404.
 
 
RewriteEngine on

RewriteCond %{QUERY_STRING} (^|&)parm1= [NC]
RewriteCond %{QUERY_STRING} (^|&)parm2= [NC]
RewriteCond %{QUERY_STRING} (^|&)parm3= [NC]
RewriteRule (.*) /error-404.php? [R=404,L]

If you don’t want to allow any query string at all:
 
RewriteEngine on
RewriteCond %{QUERY_STRING} .
RewriteRule (.*) /error-404.php? [L,R=404]
 
 
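NB: /error-404.php is your custom 404 error page, located in the root folder in these examples. Because R=404 is outside the 300-399 redirect range, mod_rewrite drops the substitution and simply returns the 404 status, so the page the visitor actually sees is the one set with ErrorDocument 404.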
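
To see which query strings each of the two rule sets catches, here is a minimal Python model of their conditions. It is only a sketch under assumptions: Python's re module stands in for Apache's regex engine (the patterns are simple enough that both agree), [NC] is modelled with re.IGNORECASE, and the function names are made up for illustration.

```python
import re

# Hypothetical model of the RewriteCond lines above.
# [NC] (no case) is modelled with re.IGNORECASE.
PARAM_CONDITIONS = [
    re.compile(r"(^|&)parm1=", re.IGNORECASE),
    re.compile(r"(^|&)parm2=", re.IGNORECASE),
    re.compile(r"(^|&)parm3=", re.IGNORECASE),
]


def blocked_by_param_rule(query_string: str) -> bool:
    """First rule set: consecutive RewriteCond lines are ANDed, so all three must match."""
    return all(cond.search(query_string) for cond in PARAM_CONDITIONS)


def blocked_by_any_query_rule(query_string: str) -> bool:
    """Second rule set: RewriteCond %{QUERY_STRING} . matches any non-empty query string."""
    return re.search(r".", query_string) is not None


if __name__ == "__main__":
    samples = ("parm3=a&parm1=b&parm2=c", "parm1=a&parm2=b", "xparm1=a&parm2=b&parm3=c", "")
    for qs in samples:
        print(repr(qs), blocked_by_param_rule(qs), blocked_by_any_query_rule(qs))
```

Run as a script it prints True True, False True, False True and False False for the four sample query strings. The third one shows what the (^|&) part is for: xparm1= is not mistaken for parm1=.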
 
 
Special Case: A WordPress Site
 
For a WordPress site that’s been hacked with spammy urls added with query strings (usually based on the homepage url), once you have fixed the hack itself you need to make all those spammy urls respond with a 404.
 
 
This assumes your WordPress site uses permalinks without query strings, and it only works properly when WordPress is installed in the root folder.
 
This needs to go at the top of the root .htaccess file:
 
RewriteEngine on

#### Apache does not allow a comment on the same line as a directive, so each comment sits on its own line
RewriteCond %{THE_REQUEST} !/wp-login\.php
RewriteCond %{THE_REQUEST} !/wp-admin/
RewriteCond %{QUERY_STRING} .
#### to allow for search page urls
RewriteCond %{QUERY_STRING} !^s= [NC]
#### to allow for .js and .css files that have version numbers
RewriteCond %{QUERY_STRING} !^ver= [NC]
#### to allow for some feed urls
RewriteCond %{QUERY_STRING} !^rsd= [NC]
#### my-404.php is a dummy 404 page uri - it could be nearly anything
RewriteRule (.*) /my-404.php? [L,R=404]
 
 
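If you find other valid query string parameters, add them in the same manner. For example, WordPress requests wp-cron.php with a doing_wp_cron= query string when it runs scheduled tasks, so you will probably also need a RewriteCond %{QUERY_STRING} !^doing_wp_cron= [NC] line.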
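
To check which requests the WordPress rule set catches before uploading it, here is a small Python model of its conditions. It is a sketch under assumptions: Python's re module stands in for Apache's regex engine, the_request imitates %{THE_REQUEST} (for example "GET /?spam=1 HTTP/1.1"), the two request-line exclusions are modelled as searches for /wp-login.php and /wp-admin/, and the name gets_404 is made up for illustration.

```python
import re

# Hypothetical model of the WordPress rule set; re stands in for Apache's PCRE.
EXCLUDED_REQUEST_PATTERNS = [r"/wp-login\.php", r"/wp-admin/"]
ALLOWED_QUERY_PREFIXES = [r"^s=", r"^ver=", r"^rsd="]


def gets_404(the_request: str, query_string: str) -> bool:
    """Return True if the rule set would answer this request with a 404."""
    # Negated THE_REQUEST conditions: the login page and the dashboard are left alone.
    if any(re.search(p, the_request) for p in EXCLUDED_REQUEST_PATTERNS):
        return False
    # RewriteCond %{QUERY_STRING} . : only requests with a query string are affected.
    if not re.search(r".", query_string):
        return False
    # Negated [NC] conditions: whitelisted parameters at the start of the query string.
    if any(re.search(p, query_string, re.IGNORECASE) for p in ALLOWED_QUERY_PREFIXES):
        return False
    return True


if __name__ == "__main__":
    samples = [
        ("GET /?spam=1 HTTP/1.1", "spam=1"),
        ("GET /?s=apache HTTP/1.1", "s=apache"),
        ("GET /wp-login.php?action=lostpassword HTTP/1.1", "action=lostpassword"),
        ("GET /?page=2&s=x HTTP/1.1", "page=2&s=x"),
    ]
    for the_request, query_string in samples:
        print(gets_404(the_request, query_string), the_request)
```

A whitelisted parameter only counts at the start of the query string: /?page=2&s=x still gets a 404, exactly as the ^ anchor in the RewriteCond lines implies. To try out another parameter before adding it to .htaccess, append its pattern to ALLOWED_QUERY_PREFIXES, mirroring the extra RewriteCond line.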