How to stop indexing & password protect your development server


Hey everyone. Recently I noticed that Google had indexed my whole development server! To combat this I immediately put a robots.txt file in the document root of my Apache server so that Google would stop indexing everything.

After a month Google was still listing pages from my development server, but it was no longer updating them (so the robots file was working, but I wanted Google to forget the data). I realised at this point that I needed to put some security in place so that even if someone accidentally came across the server, they couldn't access it without my permission. So I set up an .htaccess file so that you are required to log in to access anything on the server. This has secured my development pages from unwanted viewing, and now Google is finally getting the hint and has started removing results from its search engine!

So I thought I would blog about the various methods you could use to stop unwanted visitors (or just stop Google thinking your server is an actual website).

Robots.txt

A robots file is a simple text file called “robots.txt” which you place in the root directory of your website. This file is read by all respectable search engines and is used (most often) to tell them what they can and cannot crawl, and therefore index, on your server.

You can use this method to tell search engines NOT to index your development server, for example:

User-agent: *
Disallow: /

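The above example asks every crawler (user agent) not to access anything on your server. Keep in mind that it is only a request: well-behaved crawlers follow it, but nothing is actually blocked.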

You can also exclude specific directories and/or files, for example:

User-agent: *
Disallow: /checkout/
Disallow: /account.php
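If you want to sanity-check rules like the two examples above before uploading them, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name allowed and the sample paths are my own, made up for illustration, and the module only approximates how a real search engine interprets the file.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules: str, path: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text.

    `allowed` is a hypothetical helper for this post, not part of any library.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())  # parse the rules without any network access
    return parser.can_fetch(agent, path)


# The two rule sets shown above.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
BLOCK_SOME = "User-agent: *\nDisallow: /checkout/\nDisallow: /account.php\n"

if __name__ == "__main__":
    # Sample paths are invented for the demonstration.
    for path in ("/index.php", "/checkout/cart.php", "/account.php"):
        print(path, allowed(BLOCK_ALL, path), allowed(BLOCK_SOME, path))
```

With the first rule set every path should come back as not allowed, while the second only refuses the checkout directory and account.php.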

As I said above, though, if Google has already indexed your website then it may keep the existing results even though it has now been told not to crawl them, so perhaps the .htaccess approach is for you…

No-Index Meta Tags

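It is possible to exclude pages from search engines by placing a meta tag in the HTML of the page. This is useful for dynamic sites where you don't want to keep updating a robots.txt file for every page you wish to exclude. Bear in mind that a crawler has to be able to fetch the page to see the tag, so it will go unnoticed on pages that robots.txt already blocks.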

To exclude a page from all search engines you would place the below code into your <head> tag:

<meta name="robots" content="noindex" />
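To confirm that a page really carries the tag, something like the following rough sketch could be run against its HTML. It uses only Python's standard html.parser module; the class and function names (RobotsMetaFinder, has_noindex) and the sample markup are hypothetical, and it only looks at the generic "robots" meta name.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content attribute may hold several comma-separated directives.
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with 'noindex'."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex" /></head></html>'
    print(has_noindex(page))
```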

Password protect the server using .htaccess

This is the best solution to protect your Development Server from search engines and unwelcome visitors!

You have to create two files in the root of your web server, one called .htaccess and another called .htpasswd. This is surprisingly awkward on Windows machines, as Windows says they're invalid names! An app that will help you create these files is coming soon.

In the .htaccess file put the following code, replacing “Deanos Development Server” with a string that represents your server and setting AuthUserFile to the full path of your .htpasswd file.

AuthType Basic
AuthName "Deanos Development Server"
AuthUserFile "C:\Program Files (x86)\EasyPHP-5.3.8.1\www\.htpasswd"
require valid-user

Once this is in place head over to: http://www.htaccesstools.com/htpasswd-generator/

Input a desired Username and Password and click generate.

Copy the string returned and save that into the .htpasswd file.
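If you would rather not type a password into a third-party site, a line for the .htpasswd file can also be built locally. The sketch below is only an illustration: it uses the {SHA} scheme (a base64-encoded SHA-1 digest), which Apache accepts but which is unsalted and weak, so the htpasswd tool that ships with Apache (for example with its -B bcrypt option) is the better choice where available. The function name and the sample credentials are made up.

```python
import base64
import hashlib


def htpasswd_sha_line(username: str, password: str) -> str:
    """Build one 'user:{SHA}digest' line for an .htpasswd file.

    Hypothetical helper; {SHA} is a legacy, unsalted format.
    """
    if not username or ":" in username:
        raise ValueError("username must be non-empty and must not contain ':'")
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    encoded = base64.b64encode(digest).decode("ascii")
    return f"{username}:{{SHA}}{encoded}"


if __name__ == "__main__":
    # Sample credentials, invented for the demonstration.
    print(htpasswd_sha_line("dean", "change-me"))
```

Save the printed line into the .htpasswd file, one line per user.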

Congratulations! Your server is now blocking unwanted guests, and soon Google may give up and get the hint too! ;)

Author: Dean Williams

I'm a Web Developer, Graphics Designer and Gamer. This is my personal site, which provides PHP programming advice, hints and tips.

  • Kyle

    Password protecting your whole server is the most bullet-proof option for ensuring your development content is not indexed. However, you can also use the "X-Robots-Tag" HTTP header. If you're on an Apache server and have the "headers" module (mod_headers) enabled, just stick this line in your httpd.conf/apache2.conf config file (or similar).

    ####
    Header add X-Robots-Tag noindex
    ####

    This will apply to every page served from the whole dev server. Handy if you host multiple projects in development. You don't have to worry about copying in robots.txt files or HTML Meta tags everywhere! 😉

    More details on the HTTP tag here.

    https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag