juust ~ php oddities


tweeting pipes

juust | 30/06/2009

…but seriously, channels on Twitter are a hot item.

Twitter seems to want branded channels for commerce, using verified accounts to prevent spoofing of celebrities, and the same goes for brand names. There is already a growing trade in Twitter accounts like @nike-shoes, @skyeurope.

To build an attractive channel I need credibility, which means providing fresh, good-quality content on a regular basis. So where do I get that?

Yahoo pipes

I am very lazy and Yahoo has a nice example online: the news aggregator with 14 sources like blogsearch, icerocket and technorati, which you can clone and use out of the box. So I cloned it, replaced the technorati api key and ran the pipe with ‘banking’ as keyword. I grab the rss feed url and read it with simplexml (you can use that pipe with any keyword).

Then I take a Twitter php api class by simon wippich from sourceforge (it only reads the account, it doesn't have the post routines), wire in the rss feed and start posting content.

    require_once('twitter.class.php');
    $Twitter = Twitter::getInstance();
    $Twitter->setUser('Account','SomePassword');

    $rss = simplexml_load_file("http://pipes.yahoo.com/pipes/pipe.run?_id=1234567890&_render=rss&textinput1=banking");

    if($rss)
    {
        foreach($rss->channel->item as $e)
        {
            //shorten the link; urlencode it so its own query string survives
            $shrunk = trim(file_get_contents('http://bit.ly/api?url='.urlencode((string) $e->link)));
            //cut the title so that title + space + short url fit in 140 characters
            $msg = trim(substr((string) $e->title, 0, (137-strlen($shrunk)))).' '.$shrunk;
            $output = $Twitter->post($msg);
        }
    }
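The message trimming in the loop can also be pulled into a small function, which makes the 140 character rule easier to check. This is only a sketch: `build_tweet()` is a made-up name and not part of the twitter class, and it counts bytes with `strlen()`, so titles with multibyte characters would need `mb_strlen()` and `mb_substr()` instead.

```php
<?php
// Hypothetical helper (not part of the twitter class): fit a title and a
// short url into one message of at most $limit characters.
function build_tweet($title, $shortUrl, $limit = 140)
{
    $title    = trim($title);
    $shortUrl = trim($shortUrl);
    // room left for the title: the limit minus the url and one space
    $room = $limit - strlen($shortUrl) - 1;
    if ($room < 4) {
        // no sensible room for a title, so send only the url
        return substr($shortUrl, 0, $limit);
    }
    if (strlen($title) > $room) {
        // cut the title and mark the cut with three dots
        $title = rtrim(substr($title, 0, $room - 3)) . '...';
    }
    return $title . ' ' . $shortUrl;
}

echo build_tweet('Banking news of the day', 'http://bit.ly/abc'), "\n";
```

Inside the feed loop the call would be something like `$Twitter->post(build_tweet((string) $e->title, $shrunk));`.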

neofinance

Now I can post proper stuff.

The second part of a channel is the audience.

Where do I get my audience?

Google Search

Google serp scrapers are always good for 1000 targeted results on any keyword. I use
allinanchor:twitter.com/ site:twitter.com banking
as search phrase, which gets me 95% valid accounts with my keyword banking in the description.

    $key = 'banking';
    //scrape urls
    $urls = twt_Google('allinanchor:twitter.com/ site:twitter.com '.$key);
    //get the account names
    $accounts = twt_Google_getaccounts($urls);

    function twt_Google($keywords, $pages=1) {
        //scrape results off of google serp
        $results = 100;
        $the_urls = array();
        //the page counter has its own name, so the inner loop cannot reset it
        for($page=0;$page<$pages;$page++){
            //google's start parameter is a zero-based offset
            $start = $page*$results;
            $vargoogleresultpage = "http://www.google.com/search?as_q=".urlencode(trim($keywords))."&num=".$results."&start=".$start."&hl=en&lr=lang_en";
            $googleresponse = @file_get_contents($vargoogleresultpage);
            if($googleresponse === false) continue;
            $googlehits = preg_split('/class=r><a /', $googleresponse);
            //the first chunk is the page header, not a hit
            array_shift($googlehits);
            foreach($googlehits as $googlehit){
                if(preg_match("/href=\"(.*?)\"/", $googlehit, $t)) {
                    $the_urls[] = $t[1];
                }
            }
        }

        //return a set with twitter urls http://www.twitter.com/account
        return $the_urls;
    }

    function twt_Google_getaccounts($arr) {
        //get the account name from the twitter-url
        $myaccounts = array();
        for($i=0;$i<count($arr);$i++){
            $parts = explode('/', $arr[$i]);
            //account is 3 : http: // … / account
            if(!empty($parts[3])) $myaccounts[] = $parts[3];
        }
        return $myaccounts;
    }
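The search returns about 95% valid accounts, so the remaining urls (status pages, other hosts) have to be filtered out somewhere. A minimal sketch of a stricter extraction step, using `parse_url()` instead of splitting on slashes. The function name is made up, and the rule that an account name is 1 to 15 letters, digits or underscores is my assumption about Twitter's naming rules, not something from the scraper itself.

```php
<?php
// Hypothetical helper: pull the account name out of a twitter profile url.
// Returns null for anything that is not a plain profile url.
function twitter_account_from_url($url)
{
    $host = parse_url($url, PHP_URL_HOST);
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($host) || !is_string($path)) {
        return null;
    }
    // accept twitter.com and www.twitter.com only
    if (!preg_match('/^(www\.)?twitter\.com$/i', $host)) {
        return null;
    }
    // assumption: a profile path is one segment of 1-15 word characters
    if (!preg_match('~^/([A-Za-z0-9_]{1,15})/?$~', $path, $m)) {
        return null;
    }
    return $m[1];
}

$urls = array(
    'http://twitter.com/some_bank',
    'http://www.twitter.com/some_bank/',
    'http://twitter.com/some_bank/status/123',
    'http://example.com/some_bank',
);
// keep the valid names and drop duplicates
$names    = array_map('twitter_account_from_url', $urls);
$names    = array_filter($names, function ($n) { return $n !== null; });
$accounts = array_values(array_unique($names));
print_r($accounts);
```

Dropping duplicates matters here, because the same account can show up on several result pages and would otherwise get a follow request more than once.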

There is my audience, let's make some friends:

    for($i=0;$i<count($accounts);$i++){
        followthisone($accounts[$i], 'Account','SomePassword');
    }

    function followthisone($accountname, $name, $pass) {
        $url = "http://twitter.com/friendships/create/".urlencode($accountname).".xml";
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_POST, 1);
        curl_setopt($ch, CURLOPT_USERPWD, $name.":".$pass);
        //return the response instead of echoing it
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        $result = curl_exec($ch);
        curl_close($ch);
        return $result;
    }

hello friends!

Anyway, those are the basic ingredients of a marketing channel: proper content and an audience.

         

Categories
seo tips and tricks
Tags
twitter

curl trackbacks

juust | 25/03/2009

I figured I’d blog a post on trackback linkbuilding. A trackback is … (post a few and you’ll get it). The trackback protocol itself isn’t that interesting, but the way blog platforms and cms’es implement it makes it an excellent means for network development, because it uses a simple http post (cUrl makes that easy).

To post a successful link proposal I need some basic data:

about my page

  • url (must exist)
  • blog owner (free)
  • blog name (free)

about the other page

  • url (must exist)
  • excerpt (should be proper normal text)

my page : this is preferably a php routine that hacks some text, pictures and videos, PLR or articles together, with a url rewrite. I prefer using xml text files instead of a database, it works faster when you set stuff up.

other page : don’t use “I liked your article so much…”, use text that matches text on the target pages. Preferably get some proper excerpts from xml feeds like blogsearch, msn and yahoo (excerpts contain the keywords I searched for, and as anchor text that works better for search engine visibility and link value).
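The data listed above ends up as four form fields in the trackback post: url, title, blog_name and excerpt. A rough sketch of how that payload could be put together with `http_build_query()`. The function name and the 150 character cut are illustration only.

```php
<?php
// Hypothetical helper: build the urlencoded body of a trackback ping.
// The field names are the usual trackback ones: url, title, blog_name, excerpt.
function trackback_postfields($myUrl, $myTitle, $blogName, $excerpt, $maxExcerpt = 150)
{
    // strip markup and collapse whitespace so the excerpt is plain text
    $excerpt = trim(preg_replace('/\s+/', ' ', strip_tags($excerpt)));
    if (strlen($excerpt) > $maxExcerpt) {
        $excerpt = substr($excerpt, 0, $maxExcerpt);
    }
    return http_build_query(array(
        'url'       => $myUrl,
        'title'     => $myTitle,
        'blog_name' => $blogName,
        'excerpt'   => '[...] ' . $excerpt . ' [...]',
    ));
}

echo trackback_postfields(
    'http://www.domain.com/financial+trends.html',
    'financial trends',
    'my blog',
    "Banks <b>report</b>\n higher margins"
), "\n";
```

`http_build_query()` takes care of the url encoding, including the `+` in my own page url, so the receiving blog gets the url back exactly as it was typed.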

Let’s get some stuff from the MSN rss feed :

    //a generic query = 5% success
    //add "(powered by) wordpress"
    //urlencode turns the spaces into +, a literal + would become %2B
    $query = urlencode('keywords wordpress trackback');
    $xml = @simplexml_load_file("http://search.live.com/results.aspx?q=$query&count=50&first=1&format=rss");
    $count = 0;
    $trackbacks = array();
    if($xml) {
        foreach($xml->channel->item as $i) {

            $count++;
            $target = array();

            //the data from msn
            $target['link'] = (string) $i->link;
            $target['title'] = (string) $i->title;
            $target['excerpt'] = (string) $i->description;

            //some variables I'll need later on
            $target['id'] = $count;
            $target['trackback'] = '';
            $target['trackback_success'] = 0;

            $trackbacks[] = $target;
        }
    }

25% of the cms sites at the top of the search engines are Wordpress scripts, and Wordpress always uses /trackback/ in the rdf url. I fetch the source of each url in the search feed and grab all the link urls in it. If any contains /trackback/, I post a trackback to that url and see if it sticks.

(I could also spider all links and check whether there is an rdf segment in the target’s source (*1), but that takes a lot of time. I could also program a curl array and use multicurl, but for my purposes this works fast enough.)

    for($t=0;$t<count($trackbacks);$t++) {
        //I could use curl
        //but 95% of the urls offered are kosher and respond fast
        $content = @file_get_contents($trackbacks[$t]['link']);
        if($content === false) continue;
        //$matches is filled by reference, a & at the call is a fatal error since php 5.4
        preg_match_all("/a[\s]+[^>]*?href[\s]?=[\s\"\']+".
            "(.*?)[\"\']+.*?>"."([^< ]+|.*?)?<\/a>/",
            $content, $matches);
        $uri_array = $matches[1];
        foreach($uri_array as $key => $link) {
            //case-insensitive, and !== false also catches a match at position 0
            if(stripos($link, 'trackback') !== false) {
                $trackbacks[$t]['trackback'] = $link;
                break;
            }
        }
    }
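The link scan can also be done with DOMDocument instead of a regular expression, which copes a bit better with odd attribute order and quoting. This is an alternative sketch, not the routine from my script, and `find_trackback_link()` is a made-up name.

```php
<?php
// Hypothetical helper: return the first link in $html whose href
// contains "trackback", or an empty string if there is none.
function find_trackback_link($html)
{
    if (trim($html) === '') {
        return '';
    }
    $doc = new DOMDocument();
    // real-world html is rarely valid, so keep the parser warnings quiet
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if (stripos($href, 'trackback') !== false) {
            return $href;
        }
    }
    return '';
}

$html = '<p><a href="/about/">about</a> '
      . '<a href="http://www.example.com/post/1/trackback/">Trackback</a></p>';
echo find_trackback_link($html), "\n";
```

It is slower than the regex on big pages, so for a quick batch of 50 urls the choice is mostly a matter of taste.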

When I fire a trackback, the other script will check whether my page has a link and matching text. I have to make sure my page shows the excerpts and links, so I stuff all candidates in a cached xml file.

    function cache_xml_store($trackbacks, $pagetitle)
    {
        $xml = '<?xml version="1.0" encoding="UTF-8"?>'."\n".'<trackbacks>';
        for($a=0;$a<count($trackbacks);$a++) {
            $arr = $trackbacks[$a];
            $xml .= '<entry>';
            $xml .= '<id>'.$arr['id'].'</id>';
            //escape &, < and > so the file stays valid xml
            $xml .= '<excerpt>'.htmlspecialchars($arr['excerpt']).'</excerpt>';
            $xml .= '<link>'.htmlspecialchars($arr['link']).'</link>';
            $xml .= '<title>'.htmlspecialchars($arr['title']).'</title>';
            $xml .= '</entry>';
        }
        $xml .= '</trackbacks>';

        $fname = 'cache/trackback'.urlencode($pagetitle).'.xml';
        //$fname already contains the cache/ prefix
        if(file_exists($fname)) unlink($fname);
        $fhandle = fopen($fname, 'w');
        fwrite($fhandle, $xml);
        fclose($fhandle);
        return;
    }

I use simplexml to read that cached file and show the excerpts and links once the page is requested.

    // retrieve the cached xml and return it as array.
    function cache_xml_retrieve($pagetitle)
    {
        $fname = 'cache/trackback'.urlencode($pagetitle).'.xml';
        if(file_exists($fname)) {
            $xml = @simplexml_load_file($fname);
            if(!$xml) return false;
            $trackbacks = array();
            foreach($xml->entry as $e) {
                $trackback = array();
                $trackback['id'] = (string) $e->id;
                //the full script runs the link through rid(), a helper not shown in this post
                $trackback['link'] = (string) $e->link;
                $trackback['title'] = (string) $e->title;
                //the store routine writes <excerpt>, so read that element
                $trackback['excerpt'] = (string) $e->excerpt;

                $trackbacks[] = $trackback;
            }
            return $trackbacks;
        }
        return false;
    }

(this setup requires a subdirectory cache set to read/write with chmod 777)
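Excerpts and links from a search feed are full of `&` and `<`, and those have to be escaped or the cache file stops being valid xml. One way to get that for free is to let DOMDocument build the file. The sketch below does the round trip in memory; both function names are made up, and writing the string to the cache directory is left out.

```php
<?php
// Hypothetical helpers: serialize trackback candidates to an xml string
// and read them back. Special characters are escaped on the way in.
function trackbacks_to_xml(array $trackbacks)
{
    $doc  = new DOMDocument('1.0', 'UTF-8');
    $root = $doc->appendChild($doc->createElement('trackbacks'));
    foreach ($trackbacks as $t) {
        $entry = $root->appendChild($doc->createElement('entry'));
        foreach (array('id', 'excerpt', 'link', 'title') as $field) {
            $node = $entry->appendChild($doc->createElement($field));
            // a text node escapes &, < and > automatically
            $node->appendChild($doc->createTextNode((string) $t[$field]));
        }
    }
    return $doc->saveXML();
}

function xml_to_trackbacks($xmlString)
{
    $xml = @simplexml_load_string($xmlString);
    if ($xml === false) {
        return array();
    }
    $trackbacks = array();
    foreach ($xml->entry as $e) {
        $trackbacks[] = array(
            'id'      => (string) $e->id,
            'excerpt' => (string) $e->excerpt,
            'link'    => (string) $e->link,
            'title'   => (string) $e->title,
        );
    }
    return $trackbacks;
}

$in = array(array(
    'id'      => '1',
    'excerpt' => 'Rates & fees <explained>',
    'link'    => 'http://www.domain.com/?a=1&b=2',
    'title'   => 'Banking "trends"',
));
$out = xml_to_trackbacks(trackbacks_to_xml($in));
var_dump($out === $in);
```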

I use http://www.domain.com/financial+trends.html and extract the pagetitle as “financial trends”, which has an xml-file http://www.domain.com/cache/financial+trends.xml. (In my own script I use sef urls with mod_rewrite; you can also use the $_SERVER array.)
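For the $_SERVER route, a minimal sketch of getting from the request uri to the page title. The function name is made up, and the character whitelist is my own precaution, added because the title ends up in a cache filename.

```php
<?php
// Hypothetical helper: turn a request uri like /financial+trends.html
// into the page title "financial trends".
function pagetitle_from_uri($requestUri)
{
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return '';
    }
    // last path segment without the .html extension
    $slug = basename($path, '.html');
    // urldecode() turns "+" and %20 into spaces
    $title = urldecode($slug);
    // keep letters, digits, spaces, dashes and underscores only
    $title = preg_replace('/[^A-Za-z0-9 _-]/', '', $title);
    return trim($title);
}

// in a live script the argument would be $_SERVER['REQUEST_URI']
echo pagetitle_from_uri('/financial+trends.html?ref=feed'), "\n";
```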

    $pagetitle = preg_replace('/\+/', ' ', htmlentities($_REQUEST['title'], ENT_QUOTES, "UTF-8"));

    $cached_excerpts = cache_xml_retrieve($pagetitle);
    //no cache file yet : nothing to list
    if(!$cached_excerpts) $cached_excerpts = array();

    //do some stuff with it, make it look nice :
    for($s=0;$s<count($cached_excerpts);$s++) {
        //this lists the trackback (candidates)
        echo htmlspecialchars($cached_excerpts[$s]['excerpt']);
        echo '<a href="'.htmlspecialchars($cached_excerpts[$s]['link']).'">'.htmlspecialchars($cached_excerpts[$s]['title']).'</a>';
    }

Now I prepare the data for the trackback post :

    for($t=0;$t<count($trackbacks);$t++) {

        $trackback_url = $trackbacks[$t]['trackback'];
        //does it have a trackback target url ? then prepare data :
        if($trackback_url != '') {
            $trackback_data = array(
                "url" => "url of my page with the link to the target",
                "title" => "title of my page",
                "blog_name" => "name of my blog",
                //the msn loop stored the text under 'excerpt'
                "excerpt" => '[...]'.trim(substr($trackbacks[$t]['excerpt'], 0, 150)).'[...]'
            );
            //…and try the trackback
            $trackbacks[$t]['trackback_success'] = trackback_ping($trackback_url, $trackback_data);
        }
    }

This is the actual trackback post using cUrl. cUrl has a convenient timeout setting; I use three seconds. If a host does not respond in half a second it’s probably dead, so three seconds is generous.

    function trackback_ping($trackback_url, $trackback)
    {

        //make a string of the data array to post
        $strout = array();
        foreach($trackback as $key=>$value) $strout[] = $key."=".rawurlencode($value);
        $postfields = implode('&', $strout);

        //create a curl instance
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $trackback_url);
        curl_setopt($ch, CURLOPT_TIMEOUT, 3);
        curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

        //set a custom form header
        curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-type: application/x-www-form-urlencoded'));

        //no CURLOPT_NOBODY here : the response body holds the result
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $postfields);

        $content = curl_exec($ch);
        curl_close($ch);
        unset($ch);

        //if the return has a tag 'error' with as value 0 it went flawless
        $success = 0;
        if(is_string($content) && preg_match('/<error>\s*0\s*<\/error>/', $content)) $success = 1;
        return $success;
    }
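A trackback endpoint answers with a tiny xml document: an error element holding 0 on success, or 1 plus a message element when it refuses the ping. If you want to log why a ping was refused, the response can be parsed instead of string-matched. A sketch with a made-up function name, tested here on canned responses rather than a live blog:

```php
<?php
// Hypothetical helper: interpret the xml a trackback endpoint sends back.
// Returns array(success, message).
function trackback_parse_response($content)
{
    if (!is_string($content) || trim($content) === '') {
        return array(false, 'empty response');
    }
    $xml = @simplexml_load_string(trim($content));
    if ($xml === false || !isset($xml->error)) {
        return array(false, 'no error element in response');
    }
    if (trim((string) $xml->error) === '0') {
        return array(true, '');
    }
    return array(false, (string) $xml->message);
}

$ok   = '<?xml version="1.0" encoding="utf-8"?><response><error>0</error></response>';
$fail = '<?xml version="1.0" encoding="utf-8"?><response><error>1</error>'
      . '<message>Duplicate ping</message></response>';

var_dump(trackback_parse_response($ok));
var_dump(trackback_parse_response($fail));
```

Blogs that answer with an html error page instead of xml simply end up as "no error element in response", which is good enough for pruning.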

Now the last routine : rewrite the cached xml file with only the successful trackbacks (seo stuff) :

    $store_trackbacks = array();
    for($t=0;$t<count($trackbacks);$t++) {
        if($trackbacks[$t]['trackback_success']>0) {
            $store_trackbacks[] = $trackbacks[$t];
        }
    }
    cache_xml_store($store_trackbacks, $pagetitle);

voila : a page with only successful trackbacks.

Google (the backrub engine) doesn’t like sites that use automated link-building methods; other engines (Baidu, MSN, Yahoo) use a more conventional link popularity and keyword matching algorithm. Trackback linking helps you get a clear engine profile at relatively low cost.

0) for brevity and clarity, the code above is rewritten (taken from a trackback script I am developing on another site), so it can contain some typos.

*1) If you want to spider links for rdf segments: TYPO3v4 has some code for easy retrieval of trackback uris:

    /**
     * Fetches ping url from the given url
     *
     * @param string $url URL to probe for RDF
     * @return string Ping URL
     */
    protected function getPingURL($url) {
        $pingUrl = '';
        // Get URL content
        $urlContent = t3lib_div::getURL($url);
        if ($urlContent && ($rdfPos = strpos($urlContent, '<rdf:RDF')) !== false) {
            // RDF exists in this content. Get it and parse
            $urlContent = substr($urlContent, $rdfPos);
            if (($endPos = strpos($urlContent, '</rdf:RDF>', $rdfPos)) !== false) {
                // We will use quick regular expression to find ping URL
                $rdfContent = substr($urlContent, $rdfPos, $endPos);
                $pingUrl = preg_replace('/trackback:ping="([^"]+)"/', '\1', $rdfContent);
            }
        }
        return $pingUrl;
    }
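Outside TYPO3 the same idea fits in a few lines: find the rdf block, then read the trackback:ping attribute from it. This is a standalone sketch with a made-up function name, not the TYPO3 code, and the sample page is invented for the dry run.

```php
<?php
// Hypothetical helper: find the trackback ping url in a page's embedded RDF.
// Returns an empty string when there is no rdf block or no ping url.
function find_ping_url($html)
{
    // grab the first rdf:RDF block, across newlines
    if (!preg_match('~<rdf:RDF\b.*?</rdf:RDF>~is', $html, $rdf)) {
        return '';
    }
    // the ping url sits in a trackback:ping="..." attribute
    if (!preg_match('/trackback:ping="([^"]+)"/i', $rdf[0], $m)) {
        return '';
    }
    // attribute values are entity-encoded, so decode &amp; and friends
    return html_entity_decode($m[1], ENT_QUOTES, 'UTF-8');
}

$page = <<<'HTML'
<html><body>
<!--
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
<rdf:Description rdf:about="http://www.example.com/post/1"
    dc:title="Some post"
    trackback:ping="http://www.example.com/post/1/trackback/?a=1&amp;b=2" />
</rdf:RDF>
-->
</body></html>
HTML;

echo find_ping_url($page), "\n";
```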
         
Categories
links, php, seo tips and tricks
Tags
links, php, seo tips and tricks, trackback

proxies !

juust | 21/02/2009

I got a site banned at Google, so I got pissed and took a script from the blackbox @ digerati marketing to scrape proxy addresses, and wired a database and curl into it. Now it scrapes proxies, picks a proxy at random, prunes dead proxies and returns data.

It is basic and uses anonymous (level 2) proxies, but it works. You can check the source here:

    /* (mysql table)
    CREATE TABLE IF NOT EXISTS `serp_proxies` (
      `id` int(11) NOT NULL auto_increment,
      `ip` text NOT NULL,
      `port` text NOT NULL,
      PRIMARY KEY  (`id`)
    ) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
    */

    //initialize database class, replace with own code
    //note : the mysql_* calls below need the old mysql extension (gone in php 7),
    //with mysqli use the matching mysqli_* functions
    include('init.php');

    //main class
    $p = new MyProxies;

    //do I have proxies in the database ?
    //if not, get some and store them
    if($p->GetCount() < 1) {
        $p->GetSomeAir(1);
        $p->store2database();
    }

    //pick one
    $p->RandomProxy();

    //get the page
    $p->ThisProxy->DoRequest('http://www.domain.com/robots.txt');

    //error handling
    if($p->ThisProxy->ProxyError > 0) {
        //7   no connect
        //28   timed out
        //52   empty reply
        //if it is dead, doesn't allow connections : prune it
        if($p->ThisProxy->ProxyError==7) $p->DeleteProxy($p->ThisProxy->proxy_ip);
        if($p->ThisProxy->ProxyError==52) $p->DeleteProxy($p->ThisProxy->proxy_ip);
    }
    //you could loop back until you get a 0-error proxy, but that ain't the point

    //give me the content
    echo $p->ThisProxy->Content;


    Class MyProxies {

        var $Proxies = array();
        var $ThisProxy;
        var $MyCount;


        //picks a random proxy from the database
        function RandomProxy() {

            global $serpdb;
            $offset_result = $serpdb->query("SELECT FLOOR(RAND() * COUNT(*)) AS `offset` FROM `serp_proxies`");
            $offset_row = mysql_fetch_object($offset_result);
            $offset = (int) $offset_row->offset;
            $result = $serpdb->query("SELECT * FROM `serp_proxies` LIMIT $offset, 1");
            while($row=mysql_fetch_assoc($result)) {
                //make instance of Proxy, with proxy_host ip and port
                $this->ThisProxy = new Proxy($row['ip'].':'.$row['port']);
                $this->ThisProxy->proxy_ip = $row['ip'];
                $this->ThisProxy->proxy_port = $row['port'];
                break;
            }
        }

        //visit the famous russian site
        function GetSomeAir($pages) {
            for($index=0; $index<$pages; $index++)
            {
                $pageno = sprintf("%02d",$index+1);
                $page_url = "http://www.samair.ru/proxy/proxy-" . $pageno . ".htm";
                $page_html = @file_get_contents($page_url);

                //get rid of the crap and extract the proxies
                if(!preg_match("/<tr><td>(.*)<\/td><\/tr>/", $page_html, $matches)) continue;
                $txt = $matches[1];
                //split() is gone in php 7, explode() does the same for plain strings
                $main = explode('</td><tr><td>', $txt);
                for($x=0;$x<count($main);$x++) {
                    $arr = explode('</td><td>', $main[$x]);
                    $this->Proxies[] = explode(':', $arr[0]);
                }
            }
        }

        //store the retrieved proxies (stored in this->Proxies) in the database
        function store2database() {
            global $serpdb;
            foreach($this->Proxies as $p) {
                //scraped text goes into the sql, so only accept a real ip and a numeric port
                if(count($p) < 2 || !filter_var($p[0], FILTER_VALIDATE_IP)) continue;
                $port = (int) $p[1];
                $result = $serpdb->query("SELECT * FROM serp_proxies WHERE ip='".$p[0]."'");
                if(mysql_num_rows($result)<1) $serpdb->query("INSERT INTO serp_proxies (`ip`, `port`) VALUES ('".$p[0]."', '".$port."')");
            }
            $serpdb->query("DELETE FROM serp_proxies WHERE `ip`=''");
        }


        function DeleteProxy($ip) {
            global $serpdb;
            $serpdb->query("DELETE FROM serp_proxies WHERE `ip`='".$ip."'");
        }


        function GetCount()
        {
            //use this to check how many proxies there are in the database
            global $serpdb;
            $this->MyCount = mysql_num_rows($serpdb->query("SELECT * FROM `serp_proxies`"));
            return $this->MyCount;
        }


    }

    Class Proxy {

        var $proxy_ip;
        var $proxy_port;

        var $proxy_host;
        var $proxy_auth;
        var $ch;
        var $Content;
        var $USERAGENT = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";
        var $ProxyError = 0;
        var $ProxyErrorMsg = '';
        var $TimeOut=3;
        var $IncludeHeaders = 0;

        //php 8 no longer treats a method named Proxy() as the constructor
        function __construct($host, $username='', $pwd='') {
            //initialize class, set host
            $this->proxy_host = $host;
            if (strlen($username) > 0 || strlen($pwd) > 0) {
                $this->proxy_auth = $username.":".$pwd;
            }
        }

        function CURL_PROXY($cc) {
            if (strlen($this->proxy_host) > 0) {
                curl_setopt($cc, CURLOPT_PROXY, $this->proxy_host);
                if (strlen((string) $this->proxy_auth) > 0)
                    curl_setopt($cc, CURLOPT_PROXYUSERPWD, $this->proxy_auth);
            }
        }

        function DoRequest($url) {
            $this->ch = curl_init();
            curl_setopt($this->ch, CURLOPT_URL, $url);
            $this->CURL_PROXY($this->ch);
            curl_setopt($this->ch, CURLOPT_HEADER, $this->IncludeHeaders); // include the response headers or not

            curl_setopt($this->ch, CURLOPT_USERAGENT, $this->USERAGENT);
            curl_setopt($this->ch, CURLOPT_RETURNTRANSFER, 1);
            curl_setopt($this->ch, CURLOPT_TIMEOUT, $this->TimeOut);
            $this->Content = curl_exec($this->ch);

            //if an error occurs, store the number and message
            if (curl_errno($this->ch))
            {
                $this->ProxyError = curl_errno($this->ch);
                $this->ProxyErrorMsg = curl_error($this->ch);
            }
        }

    }

There is not much to say about it, just a rough outline. I would prefer elite level 1 proxies but for now it will have to do.
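The "loop back until you get a 0-error proxy" idea can be sketched without touching the database. In the sketch below `$fetch` stands in for `Proxy::DoRequest()`, and the function name, the fake fetcher and the 10.0.0.x addresses are all made up for the dry run.

```php
<?php
// Hypothetical sketch: try proxies in random order until one answers.
// $fetch is any callable taking (proxy, url) and returning
// array(curl_error_number, content).
function fetch_via_first_live_proxy(array $proxies, $url, $fetch, $maxTries = 5)
{
    $dead = array();
    shuffle($proxies);
    foreach (array_slice($proxies, 0, $maxTries) as $proxy) {
        list($errno, $content) = call_user_func($fetch, $proxy, $url);
        if ($errno === 0) {
            return array('content' => $content, 'proxy' => $proxy, 'dead' => $dead);
        }
        // 7 = could not connect, 52 = empty reply: candidates for pruning
        if ($errno === 7 || $errno === 52) {
            $dead[] = $proxy;
        }
    }
    return array('content' => false, 'proxy' => null, 'dead' => $dead);
}

// fake fetcher for a dry run: only 10.0.0.3:8080 "works"
$fake = function ($proxy, $url) {
    return $proxy === '10.0.0.3:8080'
        ? array(0, "User-agent: *\n")
        : array(7, false);
};

$result = fetch_via_first_live_proxy(
    array('10.0.0.1:8080', '10.0.0.2:3128', '10.0.0.3:8080'),
    'http://www.domain.com/robots.txt',
    $fake
);
echo $result['proxy'], "\n";
```

The `dead` list is what you would feed to `DeleteProxy()` afterwards. A timeout (28) is left alone on purpose, since a slow proxy may still be alive on the next run.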

         
Categories
php, seo tips and tricks
Tags
php, scrape, seo tips and tricks
