To help combat the perennial problems that webmasters of every ilk face with search engines, especially Google, here’s an umpteen-point website tune-up plan that may help:
- Check the server response headers of any page with web-sniffer.net, starting with the homepage. An existing page should return a 200 status code; a missing page should return 404 Not Found or 410 Gone.
- Resolve any canonical domain issues before they become a headache (www vs. non-www: make a decision) by 301-redirecting the version you drop to the one you keep. Here’s a tutorial for Apache servers. To force a site to use http rather than https, or vice versa, see this other tutorial for Apache servers.
- Run a spider simulator on the homepage and other pages. If it finds little or no text, search engines have little or no content to index.
- Run Xenu Link Sleuth starting at the homepage and check for broken (and redirected) links and for the number of valid links found. If it never completes the crawl, that in itself signals a problem. Compare the list of good links found with your sitemap. In your navigation, replace redirected URLs with the final destinations of those redirects.
- Run a web page spam detector: more than 100 outbound links on a page can signal a potential link farm, and linking to too many unrelated sites can be a liability. Even if all the links point to pages on your own site, 100 or more links on one page are unlikely to be useful to visitors either.
- Validate your homepage and other pages. A proper doctype and charset declaration help, while broken markup, especially at the block level, can prevent bots from crawling a page properly.
- Pages not reachable through a normal crawl (orphan pages, or pages reachable only through JavaScript, Ajax or Flash navigation) might not be crawled or indexed properly, if at all, even if they are listed in the sitemap. Make sure the sitemap is correctly laid out.
- Affiliate links on your pages will set alarm bells ringing. You need to ensure you have plenty of original and unique content to supplement the products available through the affiliate links, and you should use rel="nofollow" on all such links. Forget about cloaking them behind any sort of redirect; that is sneaky. Remember that affiliate links are on a par with paid links.
- Page titles, headings in the text, proper use and distribution of keywords and key phrases (don’t keyword-stuff, don’t spam), anchor text, alt text for images and greater use of CSS are all part of the internal (on-page) optimization of pages. Resize and optimize images and other media files appropriately so pages load fast.
- Watch your page layout and page size carefully. Putting ads (whether AdSense or any others) in the prime viewing area of a page (e.g. above the fold, above the top menu, in the left navigation area or smack bang where one expects actual content) signals a low-quality site and a poor user experience. Ads disguised as regular content or website links send a bad signal too. Serious penalty material.
- Get, or better still attract, relevant quality incoming links, but not from link farms. Don’t buy rank-passing links. Forget about blog and forum signatures unless your post is truly appropriate and relevant to that forum or blog. Don’t spam. Be careful with any SEO specialists you hire: make sure you know exactly what they do and that they don’t break any guidelines, especially by procuring followed backlinks or building multiple sites or micro-sites. The statistics are grim.
- Check the indexing situation by searching Google.com for site:example.com and site:www.example.com, and investigate omitted (previously called supplemental) pages. When Google indicates that it has omitted similar pages, that is usually because they have the same title or meta description as pages already listed, which should be fixed; or because they have very thin content and don’t deserve to rank; or because they are among the URLs blocked in robots.txt. Some may benefit from a robots “noindex” meta tag instead of being blocked in robots.txt. It depends.
- For dead URLs that are still indexed, request their removal from the Google index, unless you have appropriate equivalent new URLs to which you can 301 redirect them. There’s no rush with URL removals, as dead URLs will eventually drop out, but if they are embarrassing (as after a hack) get them removed. See this site for more help on URL removals. Do not request removal of, or block in robots.txt, any URL that you will 301 redirect elsewhere, because search engines will then never discover the redirection.
- The file robots.txt is your friend. Tweak it well, streamline it and make sure you know why you are blocking what you are blocking. See some special instructions for WordPress sites.
- Make use of Google Search Console (formerly Webmaster Tools).
- If your site has been hacked, whether with a spam hack or a malware hack, you really need to be on the ball and get it fixed as soon as possible. Use this tool to help find what has been hacked and for hints on what to look for.
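For the canonical-domain step, the redirect itself is configured on the server (the Apache tutorials mentioned cover that). Once it is in place, it helps to know which URL each variant *should* 301 to, so it can be compared with the Location header from the response-header check. A small sketch, assuming for illustration that the site keeps `www` and https; `CANONICAL_HOST` and `redirect_target` are hypothetical names.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed choices for illustration: keep "www" and use https.
CANONICAL_SCHEME = "https"
CANONICAL_HOST = "www.example.com"


def canonical_url(url: str) -> str:
    """Rewrite scheme and host to the canonical choice, keeping path and query."""
    parts = urlsplit(url)
    return urlunsplit(
        (CANONICAL_SCHEME, CANONICAL_HOST, parts.path or "/", parts.query, parts.fragment)
    )


def redirect_target(url: str):
    """Return the URL a 301 should point to, or None if url is already canonical."""
    target = canonical_url(url)
    return None if target == url else target
```

Every non-canonical variant maps to exactly one canonical URL in a single hop, which is what avoids redirect chains.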
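The spider-simulator step boils down to "what text is in the raw HTML before any JavaScript runs". A minimal approximation with Python's built-in `html.parser`, ignoring `<script>` and `<style>` contents; `indexable_words` is a hypothetical helper, and real crawlers (Googlebot included) may render some JavaScript, so treat this as a lower bound rather than a definitive view.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def indexable_words(html: str) -> list:
    """Return the words a text-only crawler would see in the raw HTML."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks).split()
```

A page whose body is an empty `<div id="app">` filled in by script will yield almost no words here, which is the warning sign the step describes.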
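For replacing redirected URLs in navigation with their final destinations, a crawler such as Xenu reports each redirect as a source and a target. Assuming those pairs have been exported into a dictionary (the export format and the names `final_destination` and `fix_navigation` are hypothetical), following each chain to its end is straightforward, with a guard against loops.

```python
def final_destination(url: str, redirects: dict, max_hops: int = 10) -> str:
    """Follow source -> target pairs until a URL that does not redirect."""
    seen = {url}
    hops = 0
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen or hops > max_hops:
            raise ValueError(f"redirect loop or overly long chain at {url}")
        seen.add(url)
    return url


def fix_navigation(nav_links: list, redirects: dict) -> list:
    """Replace every navigation link with the end of its redirect chain."""
    return [final_destination(link, redirects) for link in nav_links]
```

Links that do not redirect pass through unchanged, and a loop raises an error instead of hanging.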
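The link-count check from the spam-detector step can be approximated locally. This sketch counts http(s) links on one page, how many leave your host, and how many distinct outside hosts they reach; the 100-link figure comes from the step itself, while `link_report` and its return keys are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class _LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def link_report(html: str, page_url: str, limit: int = 100) -> dict:
    """Count a page's http(s) links and flag pages above the limit."""
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    own_host = urlsplit(page_url).hostname
    links = [urljoin(page_url, h) for h in collector.hrefs]
    links = [u for u in links if urlsplit(u).scheme in ("http", "https")]
    external = [u for u in links if urlsplit(u).hostname != own_host]
    return {
        "total": len(links),
        "external": len(external),
        "external_hosts": len({urlsplit(u).hostname for u in external}),
        "over_limit": len(links) > limit,
    }
```

Relative links are resolved against the page URL first, and mailto: or javascript: links are left out of the count.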
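A full validation should go through the W3C validator, but the two specific items the validation step names, a doctype and a charset declaration, are easy to spot-check in bulk. A rough sketch; `head_checks` is a hypothetical helper and the regular expression is a heuristic, not a parser. HTML5 requires the charset declaration to appear within the first 1024 bytes, approximated here with characters.

```python
import re

# Matches <meta charset="..."> and the older http-equiv Content-Type form.
_CHARSET = re.compile(r"""<meta[^>]+charset\s*=\s*["']?([\w-]+)""", re.IGNORECASE)


def head_checks(html: str) -> dict:
    """Report whether a doctype comes first and which charset is declared."""
    start = html.lstrip("\ufeff \t\r\n")  # ignore a byte-order mark and whitespace
    has_doctype = start[:9].lower() == "<!doctype"
    # Only the first 1024 characters are searched (approximating the 1024-byte rule).
    match = _CHARSET.search(html[:1024])
    return {"doctype": has_doctype, "charset": match.group(1) if match else None}
```

This does nothing about broken block-level markup; that still needs a real validator.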
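Comparing the sitemap with what a crawl actually reaches is how orphan pages show up. A sketch under stated assumptions: the sitemap uses the standard sitemaps.org namespace, and the crawl result is available as a dictionary mapping each page to the links found on it (that structure and the function names are hypothetical). A breadth-first search from the homepage gives the reachable set; sitemap URLs outside it are candidates for orphans or JavaScript-only navigation.

```python
import xml.etree.ElementTree as ET
from collections import deque

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list:
    """Return the <loc> values of a standard XML sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]


def reachable(start: str, links: dict) -> set:
    """Breadth-first search over plain links starting from the homepage."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in links.get(page, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def orphans(sitemap: list, start: str, links: dict) -> list:
    """Sitemap URLs that a normal crawl from start never reaches."""
    return sorted(set(sitemap) - reachable(start, links))
```

The reverse difference (crawled but missing from the sitemap) is equally worth a look.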
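To enforce the affiliate-link advice, pages can be scanned for links to affiliate hosts that lack rel="nofollow". The host list is an assumption you would fill in yourself (`affiliate.example.net` is a placeholder), subdomains are treated as matches, and the helper names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class _Anchors(HTMLParser):
    def __init__(self):
        super().__init__()
        self.anchors = []  # (href, rel) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attributes = dict(attrs)
            if attributes.get("href"):
                self.anchors.append((attributes["href"], attributes.get("rel") or ""))


def _is_affiliate(host, affiliate_hosts) -> bool:
    """True for an affiliate host or any of its subdomains."""
    return host is not None and any(
        host == h or host.endswith("." + h) for h in affiliate_hosts
    )


def affiliate_links_missing_nofollow(html: str, affiliate_hosts: set) -> list:
    """Return affiliate hrefs whose rel attribute has no 'nofollow' token."""
    parser = _Anchors()
    parser.feed(html)
    parser.close()
    return [
        href
        for href, rel in parser.anchors
        if _is_affiliate(urlsplit(href).hostname, affiliate_hosts)
        and "nofollow" not in rel.lower().split()
    ]
```

`rel` is split into tokens, so `rel="nofollow noopener"` counts as compliant.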
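Several of the on-page items (title, headings, image alt text) can be reported mechanically. A small sketch with a hypothetical `on_page_report` helper; it only gathers facts, and judging keyword use or stuffing is left to a human. An empty `alt=""` is deliberately not flagged, since it is the usual way to mark decorative images.

```python
from html.parser import HTMLParser


class _OnPage(HTMLParser):
    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title_parts = []
        self.h1_count = 0
        self.img_missing_alt = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "img" and "alt" not in attributes:
            # Only a missing alt attribute is flagged; alt="" is allowed.
            self.img_missing_alt.append(attributes.get("src") or "(no src)")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def on_page_report(html: str) -> dict:
    """Summarize the title, number of <h1> headings and images lacking alt."""
    parser = _OnPage()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.title_parts).split())
    return {"title": title, "h1_count": parser.h1_count, "img_missing_alt": parser.img_missing_alt}
```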
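Duplicate titles and meta descriptions, one common reason for the "similar pages" notice in the indexing step, are easy to find once each page's title and description have been collected (for instance by a crawl). A sketch assuming a dictionary of URL to (title, description); `find_duplicates` is a hypothetical name. Comparison ignores case and extra whitespace, and empty values are skipped because a missing description is a separate problem.

```python
from collections import defaultdict


def _normalize(text: str) -> str:
    return " ".join(text.lower().split())


def find_duplicates(pages: dict) -> tuple:
    """Group URLs sharing a title or a meta description."""
    by_title = defaultdict(list)
    by_description = defaultdict(list)
    for url, (title, description) in pages.items():
        if _normalize(title):
            by_title[_normalize(title)].append(url)
        if _normalize(description):
            by_description[_normalize(description)].append(url)
    dup_titles = {k: v for k, v in by_title.items() if len(v) > 1}
    dup_descriptions = {k: v for k, v in by_description.items() if len(v) > 1}
    return dup_titles, dup_descriptions
```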
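The URL-removal and robots.txt steps meet at one easy mistake: blocking a URL you intend to 301 redirect. Python's standard `urllib.robotparser` can check a redirect plan against a robots.txt file. The robots.txt content and redirect map here are hypothetical. Note that this parser applies rules in file order (first match) and does not implement Google's wildcard and most-specific-rule handling, so results can differ for overlapping Allow/Disallow lines.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt and redirect plan, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /old-section/
"""

REDIRECTS = {
    "https://www.example.com/old-section/page.html": "https://www.example.com/new/page.html",
    "https://www.example.com/old.html": "https://www.example.com/new.html",
}


def blocked_redirect_sources(robots_txt: str, redirects: dict, user_agent: str = "Googlebot") -> list:
    """List redirect sources that robots.txt blocks, hiding the 301 from crawlers."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return sorted(src for src in redirects if not parser.can_fetch(user_agent, src))
```

Any URL this returns needs either its robots.txt rule lifted or the redirect reconsidered.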
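When looking for what has been hacked, a quick local scan of the site files for a few well-known injection markers can narrow the search before using a dedicated tool. The patterns are illustrative and far from exhaustive: obfuscated code, hidden iframes and `document.write(unescape(...))` are common signs, but legitimate code can match too and plenty of hacks will not, so treat every hit as something to inspect, not a diagnosis. The function names and file suffixes are assumptions.

```python
import re
from pathlib import Path

# A few well-known injection markers; illustrative, not exhaustive.
SUSPICIOUS = {
    "eval/base64_decode": re.compile(r"eval\s*\(\s*base64_decode\s*\(", re.I),
    "eval/gzinflate": re.compile(r"eval\s*\(\s*gzinflate\s*\(", re.I),
    "document.write/unescape": re.compile(r"document\.write\s*\(\s*unescape\s*\(", re.I),
    "hidden iframe": re.compile(
        r"<iframe[^>]*(?:width\s*=\s*[\"']?0|height\s*=\s*[\"']?0|display\s*:\s*none)", re.I
    ),
}


def scan_text(text: str) -> list:
    """Return the names of the suspicious patterns found in text."""
    return [name for name, pattern in SUSPICIOUS.items() if pattern.search(text)]


def scan_tree(root: str, suffixes=(".php", ".html", ".htm", ".js")) -> dict:
    """Scan site files under root and map each suspicious file to its hits."""
    hits = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            found = scan_text(path.read_text(encoding="utf-8", errors="replace"))
            if found:
                hits[str(path)] = found
    return hits
```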
Finally, remember: CRAWLABILITY is the number 1 technical requirement for a site to even start being indexed. View Matt Cutts’ video, in which he explains many of the concepts involved in SEO. He mentions CRAWLABILITY at around 1 minute and again at around 3 minutes into the video.