A common problem for web development companies building a new site for a client is that the test site somehow gets indexed by Google during the redesign stage, either because developers have not followed the correct steps or because they are unaware of what to do. When the client or the web development company discovers that the site is indexed (I hope, for the developer's sake, that they found it first), panic tends to set in and nobody seems to know what to do.
Here are some simple steps that you can take to have the content removed swiftly from the Google index.
To remove a directory and its contents, or your whole site, you must first ensure that the pages/URLs you want to remove have been blocked using a robots.txt file.
You can block Google and all other major search engine crawlers from crawling the development domain by adding a robots.txt file to the document root. The file should contain the following:
User-agent: *
Disallow: /
This should stop search engine bots such as Google from crawling the content on the site.
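Before relying on the file, it can be worth confirming that it really does disallow everything. The following is a minimal sketch using Python's standard-library robots.txt parser; the function name `is_blocked` and the default user agent string are illustrative assumptions, and the URL is the example test site used later in this article.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content recommended above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if robots_txt disallows user_agent from fetching url.

    `is_blocked` is a hypothetical helper name used for illustration only.
    """
    parser = RobotFileParser()
    # parse() accepts the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_blocked(ROBOTS_TXT, "http://test.test.com/"))
```

This only checks what the rules say. It does not tell you whether a particular crawler chooses to obey them.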
To speed up the process of having the content removed from Google's index, you can submit a removal request for a whole site or a specific URL using Google Webmaster Tools.
How to request the removal of content from Google:
- 1. Set up a Google Webmaster Tools account for the test site. You can do this by visiting http://www.google.com/webmasters/tools/
- 2. Verify the site in Google Webmaster Tools, so Google knows you are the site owner
- 3. Once the site has been verified, click on Optimisation > Remove URLs
- 4. Click the Create a new removal request button and input the homepage URL for the test site you want to remove
If you want to remove the whole test site from Google's index, enter the top-level URL of the test server. For example, if the URL of the test site is http://test.test.com, that is the URL you would enter.
Removal of a site from Google's index can take from 2 to 24 hours. As a final check, you can use the site: operator to confirm that the content has been removed from the index by typing the following in Google search:
site:http://dev.yourdomain.com (substituting the homepage URL of your own test site).
If you see no results for the URL, then this means all of the content has been removed from Google’s index.
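If you have several test sites to check, the site: query can be generated rather than typed by hand. The sketch below is only an illustration: the helper names `site_query` and `search_url` are made up for this example, and it assumes the operator is given just the hostname (the scheme is dropped), which is the form most commonly used.

```python
from urllib.parse import urlencode, urlsplit


def site_query(homepage_url: str) -> str:
    """Build a 'site:' query from a homepage URL (hypothetical helper)."""
    # Prefix '//' when no scheme is present so urlsplit treats the value as a host.
    if "://" not in homepage_url:
        homepage_url = "//" + homepage_url
    return "site:" + urlsplit(homepage_url).netloc


def search_url(homepage_url: str) -> str:
    """Return a Google search URL for the site: query (hypothetical helper)."""
    return "https://www.google.com/search?" + urlencode({"q": site_query(homepage_url)})


if __name__ == "__main__":
    print(site_query("http://dev.yourdomain.com"))
    print(search_url("http://dev.yourdomain.com"))
```

Open the resulting URL in a browser and check the results yourself; the script only builds the query and does not contact Google.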
To avoid going through this process again, make it best practice in your business to set up a robots.txt file on all your test servers, so that search engine bots such as Google do not crawl your test/dev sites.
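One way to make this routine is to have whatever sets up a test server write the file automatically. The following is a rough sketch under the assumption that you know the document root path of each test server; `ensure_blocking_robots` is a hypothetical name, not part of any existing tool.

```python
from pathlib import Path

# Same rules as the robots.txt shown earlier: disallow every crawler everywhere.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"


def ensure_blocking_robots(doc_root: str) -> Path:
    """Write a block-everything robots.txt into doc_root if it is missing or different.

    Hypothetical helper for illustration; doc_root is assumed to be the
    document root of a test/dev server, never the live site.
    """
    target = Path(doc_root) / "robots.txt"
    if not target.exists() or target.read_text() != BLOCK_ALL:
        target.write_text(BLOCK_ALL)
    return target


if __name__ == "__main__":
    # Example: the current directory stands in for a test server's document root.
    print(ensure_blocking_robots("."))
```

Take care that this file is not copied across when the site goes live, because the same two lines would ask search engines to stay away from the production site as well.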