Introduction to writing spiders and agents
Fault Tolerance: Be proactive
       
   1. Watch for site layout changes
        Look for changes in the "Last-Modified:" response header
        Look for changes in the "Location:" response header (new redirects)
        Write code that flags anomalies (e.g., a sudden change in page size)
        Compare current content against a stored copy
        Automate the above checks so they run regularly
       
   2. Assume that people (and machines) write non-compliant HTML
   3. Assume that HTML tags will span multiple lines
   4. Use protocol descriptors (URL schemes)
        http://www.site.com
        https://www.site.com
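The checks under "Watch for site layout changes" could be sketched roughly as follows. This is a minimal illustration, not a prescribed method: the function names snapshot and anomalies, the snapshot fields, and the 50% size threshold are all hypothetical choices. Each fetch is reduced to its Last-Modified and Location headers plus a content hash and length, and two snapshots of the same URL are then compared.

```python
import hashlib

def snapshot(headers, body):
    """Reduce one fetch of a page to the fields worth comparing later.

    headers: dict of response headers; body: page content as bytes.
    (Hypothetical helper; field names are illustrative.)
    """
    lower = {k.lower(): v for k, v in headers.items()}
    return {
        "last-modified": lower.get("last-modified"),
        "location": lower.get("location"),
        # Store a hash so the full page need not be kept for comparison.
        "sha256": hashlib.sha256(body).hexdigest(),
        "length": len(body),
    }

def anomalies(old, new, size_tolerance=0.5):
    """List the differences between two snapshots of the same URL."""
    found = []
    if old["last-modified"] != new["last-modified"]:
        found.append("Last-Modified changed")
    if old["location"] != new["location"]:
        found.append("Location changed")
    if old["sha256"] != new["sha256"]:
        found.append("content changed")
    # Assumed heuristic: a large swing in page size often accompanies a redesign.
    if old["length"] and abs(new["length"] - old["length"]) / old["length"] > size_tolerance:
        found.append("size changed sharply")
    return found

before = snapshot({"Last-Modified": "Mon, 01 Jan 2024 00:00:00 GMT"}, b"<html>old</html>")
after = snapshot({"Last-Modified": "Tue, 02 Jan 2024 00:00:00 GMT",
                  "Location": "http://www.site.com/new/"}, b"<html>new</html>")
print(anomalies(before, after))  # ['Last-Modified changed', 'Location changed', 'content changed']
```

To automate the checks, one option is to run this from a scheduler such as cron and alert whenever the returned list is non-empty.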
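For the assumption that people and machines write non-compliant HTML, one approach is an event-based parser that never insists on a valid document. The sketch below uses Python's standard html.parser module; the class LinkCollector and function extract_links are hypothetical names. It still finds links when tags are unclosed, attribute values are unquoted, or case is inconsistent.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values from <a> tags without requiring valid HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> matches too.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()  # flush any buffered input; unclosed tags raise no error
    return parser.links

# Uppercase tags, an unquoted attribute, and unclosed <a> and <p> elements.
sloppy = "<P><A HREF=page1.html>One<a href='page2.html'>Two"
print(extract_links(sloppy))  # ['page1.html', 'page2.html']
```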
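The point about tags spanning lines can be illustrated by contrasting a line-by-line search with a search over the whole document. This is only a demonstration of the pitfall: the regular expression HREF and both helper functions are hypothetical, and a real spider would more likely use an HTML parser, which also handles multi-line tags.

```python
import re

# Matches an <a ...> tag with a double-quoted href. Both \s and [^>] match
# newlines, so the pattern works when a tag is split across lines.
HREF = re.compile(r'<a\s[^>]*?href\s*=\s*"([^"]*)"', re.IGNORECASE)

def links_line_by_line(html):
    # Fragile: each line is searched on its own, so split tags are missed.
    found = []
    for line in html.splitlines():
        found.extend(HREF.findall(line))
    return found

def links_whole_document(html):
    # More robust: search the document as one string.
    return HREF.findall(html)

page = '<p>Go to the\n<a class="nav"\n   href="next.html">next page</a></p>'
print(links_line_by_line(page))    # []
print(links_whole_document(page))  # ['next.html']
```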
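Using protocol descriptors means every URL the spider stores should carry an explicit scheme such as http:// or https://. A minimal sketch, assuming links are resolved against the page they were found on with the standard urllib.parse.urljoin; the function absolute_url and the choice to accept only http and https are hypothetical. The last line shows why the descriptor matters: a bare host name is treated as a relative path.

```python
from urllib.parse import urljoin, urlparse

# Assumption: this spider only follows web links.
ALLOWED_SCHEMES = {"http", "https"}

def absolute_url(base, href):
    """Resolve href against the page it came from.

    Returns an absolute URL with an explicit scheme, or None for
    links the spider should skip (mailto:, javascript:, ftp:, ...).
    """
    url = urljoin(base, href.strip())
    if urlparse(url).scheme not in ALLOWED_SCHEMES:
        return None
    return url

base = "https://www.site.com/docs/index.html"
print(absolute_url(base, "intro.html"))             # https://www.site.com/docs/intro.html
print(absolute_url(base, "//www.site.com/a.html"))  # https://www.site.com/a.html
print(absolute_url(base, "http://www.site.com"))    # http://www.site.com
print(absolute_url(base, "mailto:me@site.com"))     # None
# Without a protocol descriptor, the host name becomes a relative path:
print(absolute_url(base, "www.site.com"))           # https://www.site.com/docs/www.site.com
```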