If, like us, you run your own development server that is reachable over the internet via a public IP address or a domain name, you may find that Google and other search engines somehow discover it and start indexing it! This is very frustrating when you do not want projects leaked through search results, and it can cause serious problems with non-disclosure agreements if you are supposed to be keeping a client's project secret!
There are several ways we as developers can combat this, but the most obvious ways are not the best ones. Adding a noindex/nofollow rule to the .htaccess file of each project is one of the most obvious methods. However, it relies on you actually REMEMBERING to do this for every single project you set up, and just as frustratingly, on remembering to REMOVE the rule once you put the site live. I would call this a very dangerous approach: it leaves the work up to the developer, and a missed step has serious consequences either way (a leaked project, or a live site that search engines refuse to index).
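For reference, the per-project rule described above might look roughly like the sketch below. It assumes mod_headers is loaded and that the server's AllowOverride setting permits FileInfo directives in .htaccess files; neither is guaranteed on a given install.

```apache
# .htaccess in the project's web root (per-project approach).
# Assumption: mod_headers is available and AllowOverride includes FileInfo.
<IfModule mod_headers.c>
    # Ask crawlers not to index any response served from this directory
    # or its subdirectories, and not to follow links found in them.
    Header set X-Robots-Tag "noindex, nofollow"
</IfModule>
```

Because the directive is wrapped in IfModule, it is silently skipped when mod_headers is missing, which is one more way this approach can fail without anyone noticing.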
The second idea is to apply noindex and nofollow in your project's code or, if it's a system such as WordPress, to tick the box that discourages search engines from indexing the site. Again, this requires a manual action, and if forgotten it leads you down the same path as the first idea above.
The third option is quite clever but very annoying: hide the whole server behind authentication. This will certainly stop Google or anyone else from viewing the site, but it gets in the way of development, especially if you have many developers who need to know the login details, remember them, and enter them every time they visit the site in a new session. That is cumbersome for everyday work, but it is vital for those serious projects where you really don't want anyone to stumble upon your server (in that case, USE IT).
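As a rough sketch of the authentication option, HTTP Basic authentication can be switched on for the whole document root. The paths and the realm name below are illustrative assumptions, and the password file is assumed to have been created beforehand with Apache's htpasswd utility.

```apache
# Assumption: /var/www is the document root and /etc/httpd/.htpasswd is a
# password file created with htpasswd. Both paths are hypothetical examples.
<Directory "/var/www">
    AuthType Basic
    # Realm text shown in the browser's login prompt.
    AuthName "Development server"
    AuthUserFile "/etc/httpd/.htpasswd"
    # Any user listed in the password file may log in.
    Require valid-user
</Directory>
```

Basic authentication sends credentials with every request, so it is best combined with HTTPS if the server is reachable over the public internet.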
A Better Approach...
So the question arises: how can we do this automatically across the whole development server? How do we tell Google and other search engines to leave our server alone and ignore anything on it, without any per-project work? The answer is the Apache configuration, and here is how you can do it:
You will need to find your Apache httpd.conf file. Its location varies from one installation to another; here are some of the most common paths:
Red Hat: /etc/httpd/conf/httpd.conf
Wamp (Windows): C:\wamp\bin\apache\apache2.4.9\conf\httpd.conf
Once you have found it, open it in an editor such as Notepad++ (Windows) or nano (Linux).
You should be able to find the section below somewhere in the file, either from its comment or by simply searching for <Directory />. Don't worry if yours is a little different (on Apache 2.4, for example, you will typically see "Require all denied" in place of the Order/Deny lines), as long as you establish which one is the MAIN configuration section. It is usually just below the "DocumentRoot" declaration:
    #
    # Each directory to which Apache has access can be configured with respect
    # to which services and features are allowed and/or disabled in that
    # directory (and its subdirectories).
    #
    # First, we configure the "default" to be a very restrictive set of
    # features.
    #
    <Directory />
        Options FollowSymLinks
        AllowOverride None
        Order deny,allow
        Deny from all
        Satisfy all
    </Directory>
Inside the <Directory /> block, below all the other rules, add a Header directive that sets the X-Robots-Tag response header to noindex, nofollow, for example:
    #
    # Each directory to which Apache has access can be configured with respect
    # to which services and features are allowed and/or disabled in that
    # directory (and its subdirectories).
    #
    # First, we configure the "default" to be a very restrictive set of
    # features.
    #
    <Directory />
        Options FollowSymLinks
        AllowOverride None
        Order deny,allow
        Deny from all
        Satisfy all
        Header set X-Robots-Tag "noindex, nofollow"
    </Directory>
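Two details are worth checking before relying on this. The Header directive is provided by mod_headers, so that module has to be loaded, and on Apache 2.4 the default <Directory /> block uses different access-control directives from the 2.2-style listing above. The fragment below is a sketch of both points; the module path is the common default and is an assumption about your install.

```apache
# Header comes from mod_headers. If the module is not loaded, Apache
# rejects the configuration with an "Invalid command 'Header'" error.
# Assumption: the module file lives at the default relative path.
LoadModule headers_module modules/mod_headers.so

# Apache 2.4 equivalent of the root block: "Require all denied"
# replaces the 2.2-style Order/Deny/Satisfy lines.
<Directory />
    AllowOverride none
    Require all denied
    Header set X-Robots-Tag "noindex, nofollow"
</Directory>
```

Many distributions already ship with the LoadModule line present but commented out, in which case uncommenting it is enough.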
Save the file and restart Apache so the change takes effect. Now every response from the development server carries the noindex, nofollow header - search engines that honour it will not index your development server, and your team can get on with the important stuff without needing to worry about search engines.