Main Page

robotsxml.org

This site has been set up to propose a robots.xml file for discussion. The file could enable client applications to discover a site's services automatically.
Please feel free to create an account and contribute to this wiki.

Articles


The existing Robots Exclusion Standard robots.txt file provides a mechanism to inform robots of content for exclusion from search indexes.

The existing Sitemaps Protocol file provides a mechanism to inform robots of content for inclusion in search indexes.


The proposed robots.xml file could provide a general data exchange index in which the robots exclusion file, sitemaps file and new forms of data indexes could be referenced. This generic approach enables the automatic discovery of site services.


For example, it would be useful to point an RSS reader at any domain name and have it attempt to load the robots.xml file. The reader could then present a list of all RSS feeds available from the site, from which the user selects the feeds to subscribe to.


A sample robots.xml file containing references to the sitemaps index file and an RSS index file may look like this:


<?xml version="1.0" encoding="UTF-8"?>

 <xmlmap xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" 
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
         xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 
                             http://xml.robotsxml.org/schemas/xmlmap/0.1/xmlmap.xsd">

  <url>
   <loc>http://www.bbc.co.uk/sitemap.xml</loc>
   <namespace>http://www.sitemaps.org/schemas/sitemap/0.9</namespace>
   <lastmod>200910111110</lastmod>
  </url>

  <url>
   <loc>http://news.bbc.co.uk/rss/feeds.opml</loc>
   <namespace>http://www.opml.org/spec2</namespace>
   <lastmod>200910111110</lastmod>
  </url>

  <url>
   <loc>http://backstage.bbc.co.uk/feeds/tvradio/20091011.tar.gz</loc>
   <namespace>urn:tva:metadata:2005</namespace>
   <lastmod>200910110000</lastmod>
  </url>

 </xmlmap>


In this example the schema is derived from the sitemaps schema.
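
As a rough illustration of how a client might read such a file, the sketch below parses the sample into a list of entries using Python's standard library. The names `Entry` and `parse_robots_xml` are hypothetical, and the `lastmod` values are assumed to follow a YYYYMMDDHHMM pattern, which the sample suggests but this proposal does not state. Because the sample declares the sitemaps namespace as its default, every element has to be looked up with that namespace.

```python
from dataclasses import dataclass
from datetime import datetime
import xml.etree.ElementTree as ET

# Default namespace declared on <xmlmap> in the sample above.
NS = {"m": "http://www.sitemaps.org/schemas/sitemap/0.9"}


@dataclass
class Entry:
    loc: str            # URL of the referenced index or data file
    namespace: str      # namespace identifying the format of that file
    lastmod: datetime   # last modification time of the referenced file


def parse_robots_xml(data: bytes) -> list[Entry]:
    """Parse a robots.xml document into a list of Entry records."""
    root = ET.fromstring(data)
    entries = []
    for url in root.findall("m:url", NS):
        loc = url.findtext("m:loc", default="", namespaces=NS).strip()
        namespace = url.findtext("m:namespace", default="", namespaces=NS).strip()
        raw = url.findtext("m:lastmod", default="", namespaces=NS).strip()
        # Assumption: lastmod is written as YYYYMMDDHHMM, as in the sample.
        lastmod = datetime.strptime(raw, "%Y%m%d%H%M")
        entries.append(Entry(loc, namespace, lastmod))
    return entries
```

A client could then group or filter the returned entries by their `namespace` value to find the kinds of data a site offers.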

In the RSS reader example, a client that auto-discovers site feeds would be coded to look up <anydomain>/robots.xml and match any URLs whose <namespace> element is http://www.opml.org/spec2, which identifies lists of available feeds.
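
The lookup-and-match step could be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the file is assumed to be served over plain http at the root of the domain, and the element names are taken from the sample above rather than from a finalised schema.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Default namespace used by the sample robots.xml file.
XMLMAP_NS = {"m": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Namespace value that marks an entry as an OPML list of feeds.
OPML_NAMESPACE = "http://www.opml.org/spec2"


def robots_xml_url(domain: str) -> str:
    """Build the robots.xml address for a domain (assumes plain http)."""
    return f"http://{domain}/robots.xml"


def match_locations(robots_xml: bytes, wanted_namespace: str) -> list[str]:
    """Return the <loc> of every <url> whose <namespace> equals wanted_namespace."""
    root = ET.fromstring(robots_xml)
    matches = []
    for url in root.findall("m:url", XMLMAP_NS):
        ns = url.findtext("m:namespace", default="", namespaces=XMLMAP_NS).strip()
        loc = url.findtext("m:loc", default="", namespaces=XMLMAP_NS).strip()
        if ns == wanted_namespace and loc:
            matches.append(loc)
    return matches


def discover_feed_lists(domain: str, timeout: float = 10.0) -> list[str]:
    """Fetch <domain>/robots.xml and return the URLs of its OPML feed lists."""
    with urllib.request.urlopen(robots_xml_url(domain), timeout=timeout) as resp:
        return match_locations(resp.read(), OPML_NAMESPACE)
```

An RSS reader would call `discover_feed_lists` with the domain the user typed, then load each returned OPML file to show the individual feeds. Keeping the matching logic in `match_locations` separate from the network call lets the same function serve other clients that look for a different namespace.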


Example: BBC robots.txt file containing a link to the sitemaps file
Example: BBC sitemap.xml file using an alternative sitemap index namespace called siteindex.xsd
Example: BBC feeds index OPML file

Example of a data feed which could be presented in a new data format referenced by robots.xml - Weather data

