juust ~ php oddities

a social spider

juust | 24/01/2009

I was reading about the BloGee project and some other stuff, and then I thought “how much trouble would it be to write a Wordpress plugin that does some basic ‘social spidering’?”. BloGee is about a micro-content format, and that’s a bit out of my scope.

I want a simple ‘social’ spider for Wordpress, so I am going to take some functions from simpleTags and add them to the Wordpress xml-rpc server methods. That gives me some basic functionality I can call through the xml-rpc endpoint.

I don’t know if I mentioned it before, but adding methods to the Wordpress XML-RPC server differs from the straightforward use of Incutio: Wordpress uses its own filter/hook system, and the actual IXR_Server instance is made and managed by Wordpress itself.

So I don’t use the

    class PeekAtYouServer extends IXR_Server {

style of class instancing. Instead I make a class that hooks the function I would normally hand to the IXR_Server as callback into Wordpress, by adding the method/callback pair to the ‘xmlrpc_methods’ filter array.

    <?php
    class PeekAtYouServer {

        // ids of the published posts, and the tags found for them
        public $postids = array();
        public $tags_currentposts = array();

        function __construct() {
            // add callbacks as methods to the array (filter) xmlrpc_methods
            add_filter('xmlrpc_methods', array($this, 'filterXmlrpcMethods'));
        }

        /**
         * Here I connect the methodName pay.PeekAtYou to a custom function
         *
         * This is the array that is added (as pair) to the xmlrpc_methods filter
         **/
        function filterXmlrpcMethods($methods) {
            $methods['pay.PeekAtYou'] = array($this, 'onXmlRpcpayPeekAtYou');
            return $methods;
        }

        /**
         * the custom function used as callback for pay.PeekAtYou
         **/
        function onXmlRpcpayPeekAtYou($args) {
            global $wpdb;

            // grab posts
            $sql = "SELECT ID, post_title FROM " . $wpdb->posts . " WHERE post_status = 'publish'";
            $posts = $wpdb->get_results($sql);

            if (empty($posts)) {
                return new IXR_Error(404, 'no posts for the selected criterion.');
            }

            // cycle through all posts and grab the IDs
            $this->postids = array();
            foreach ($posts as $post) {
                $this->postids[] = (int) $post->ID;
            }

            // I got all the ids in an array,
            // now I grab the tags (query is from the simpleTags plugin)
            $this->getTagsFromCurrentPosts();
            return $this->tags_currentposts;
        }

        /**
         * Get tags from current post views
         * (SimpleTags plugin)
         * @return boolean
         */
        function getTagsFromCurrentPosts() {
            if ( is_array($this->postids) && count($this->postids) > 0 ) {

                // Generate SQL from post id
                $postlist = implode( "', '", $this->postids);

                global $wpdb;
                $results = $wpdb->get_results("
                    SELECT t.name AS name, t.term_id AS term_id, tt.count AS count
                    FROM {$wpdb->term_relationships} AS tr
                    INNER JOIN {$wpdb->term_taxonomy} AS tt ON (tr.term_taxonomy_id = tt.term_taxonomy_id)
                    INNER JOIN {$wpdb->terms} AS t ON (tt.term_id = t.term_id)
                    WHERE tt.taxonomy = 'post_tag'
                    AND ( tr.object_id IN ('{$postlist}') )
                    GROUP BY t.term_id
                    ORDER BY tt.count DESC");

                $this->tags_currentposts = $results;
            }
            return true;
        }
    }

I saved the file as PeekAtYouServer.class.php.

Now I need a simple file that lets Wordpress ‘spot the plugin’, so I can activate it and make the class instance that adds the custom method and callback function.

    <?php
    /*
    Plugin Name: PeekAtYou XMLRPC Server
    Plugin URI: http://www.juust.org/
    Description: Adds Social Spidering to your blog
    Author: juust
    Author URI: http://www.juust.org/
    License: GPL
    Version: 1.1
    */

    require_once 'PeekAtYouServer.class.php';
    $PeekAtYouServer = new PeekAtYouServer();

I save that as PeekAtYouServer.php and upload the lot to a directory /wp-content/plugins/pay-xmlrpc-server.

In the Plugin screen (wp 2.5) I can activate the plugin, and then make a call to the xmlrpc-endpoint of the blog using pay.PeekAtYou as methodName.

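    include('wp-includes/class-IXR.php');
    $client = new IXR_Client('http://www.juust.org/xmlrpc.php');
    // query() returns false when the call fails
    if (!$client->query('pay.PeekAtYou')) {
        die($client->getErrorCode() . ' : ' . $client->getErrorMessage());
    }
    $response = $client->getResponse();
    print_r($response);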

That returns all tags the blog uses.
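
The response can then be boiled down to something easier to work with. A rough sketch, assuming the rows arrive on the client as associative arrays with the name and count fields from the query above (as far as I know the Incutio library sends objects as structs, but check with print_r first). The helper name peekTagCounts and the sample values are made up for illustration.

```php
<?php
// Hypothetical helper (not part of the plugin): turn the rows returned by
// pay.PeekAtYou into a tag name => count map, highest count first.
function peekTagCounts(array $response) {
    $counts = array();
    foreach ($response as $row) {
        // skip anything that does not look like a tag row
        if (!is_array($row) || !isset($row['name'], $row['count'])) {
            continue;
        }
        $counts[$row['name']] = (int) $row['count'];
    }
    arsort($counts);
    return $counts;
}

// sample data shaped like the query result (values are made up)
$sample = array(
    array('name' => 'xml-rpc', 'term_id' => '12', 'count' => '7'),
    array('name' => 'wordpress', 'term_id' => '3', 'count' => '15'),
);
print_r(peekTagCounts($sample));
```

In real use, $sample would be the $response from the client call.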

Next week : adding some basic social blog-spider functions.

Categories: wordpress, xml-rpc
Tags: wordpress, xml-rpc

hands on xml-rpc : copying mysql tables

juust | 08/01/2009

I don’t have anything to blog on, so I will bore you all with a quick generic function to copy mysql tables from one host to another, using xml-rpc.

I use the Incutio xml-rpc library on both hosts, to handle the tedious stuff (xml formatting and parsing). That leaves only some snippets to send and receive the table data and store it in a mysql database.

First : how to handle the table data on the sending end:

  • I take an associative array from a mysql query
  • I make an array to hold the records
  • I add each row as array
  • I make an IXR-client.
  • I add some general parameters
  • I hand these and the entire table array to my IXR-client.
  • send…

    //the snippet with the client is at the bottom of the post
    $ThisClient = new SerpClient('http://serp.trismegistos.net/db/xmlrpc.php', 'user', 'pass', 'sender');

    $tablename = "serp_tags_keys";
    $tableid = "id";

    //assumption : $serpdb is a mysqli connection
    //(the old mysql_* functions are gone since php 7)
    $result = $serpdb->query("SELECT * FROM `".$tablename."`");
    $recordcount = $result->num_rows;

    //each row is an associative array of fieldname => value
    $records = array();
    while($row = $result->fetch_assoc()) {
     $records[] = $row;
    }

    $ThisClient->putTable($tablename, $recordcount, $tableid, $records);

I consider some additional fields necessary for basic integrity checks : I add “ID” as key field, so on the receiving end the server knows which field is my table’s auto-increment field. Other fields are a username, password, tablename and the batch recordcount.
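
On the receiving end, those extra fields could feed a check along these lines. This is only a sketch: checkBatch is a hypothetical helper that is not in the putTable function of this post, and it assumes the table arrives as an array of associative arrays.

```php
<?php
// Hypothetical check, not in the original putTable: compare the announced
// record count with what actually arrived, and make sure every row
// carries the key field.
function checkBatch($tcount, $id, array $table) {
    if ((int) $tcount !== count($table)) {
        return 'expected ' . (int) $tcount . ' records, got ' . count($table);
    }
    foreach ($table as $i => $row) {
        if (!is_array($row) || !array_key_exists($id, $row)) {
            return 'record ' . $i . ' has no field ' . $id;
        }
    }
    return true;
}

// two rows announced, two rows received
$rows = array(
    array('id' => '4', 'tag' => 'ranking'),
    array('id' => '94', 'tag' => 'firm'),
);
var_dump(checkBatch(2, 'id', $rows));
```

It returns true when the batch looks complete and a short message otherwise, so the server can answer with an error instead of storing half a table.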

The IXR_Client then generates a tangled mess of xml-tags holding the entire procedure call and data (you can put the client on ‘debug’, then it dumps the generated xml to the screen).

The first part of the xml file contains the single parameters :

  • username
  • password
  • tablename
  • recordcount
  • id-field

<methodCall>
<methodName>serp.putTable</methodName>
<params>
<param><value><string>user</string></value></param>
<param><value><string>pass</string></value></param>
<param><value><string>serp_tags_keys</string></value></param>
<param><value><int>91</int></value></param>
<param><value><string>id</string></value></param>

Then the entire table is sent as one parameter in the procedure call.

That parameter is built from an array containing the table rows as ’struct’. If I want to use the routine for any table, I need the fieldname-value pairs to compose a standard mysql insert statement. A struct type allows me to use key-value pairs in the xml-file that can be parsed back into an array.

<param><value><array>

<data>

<value><struct>
<member><name>id</name><value><string>4</string></value></member>
<member><name>tag</name><value><string>ranking</string></value></member>
<member><name>cat</name><value><string>alexa ranking seo internet ranking internet positi</string></value></member>
<member><name>date</name><value><string>200901</string></value></member>
</struct></value>

<value><struct>
<member><name>id</name><value><string>94</string></value></member>
<member><name>tag</name><value><string>firm</string></value></member>
<member><name>cat</name><value><string>firm seo</string></value></member>
<member><name>date</name><value><string>200901</string></value></member>
</struct></value>

</data>

</array></value></param>

That was the last of the param holding the table, so the entire tag-mess is closed :

</params></methodCall>
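
To make the encoding less of a mystery, here is a simplified illustration of how rows could be turned into that array-of-structs param. It is not the Incutio code (the library does all of this for you and also handles ints, dates and nesting); rowToStruct and tableToParam are names made up for this sketch, and every value is sent as a string.

```php
<?php
// Simplified illustration of the struct encoding, NOT the Incutio code.
function rowToStruct(array $row) {
    $xml = '<value><struct>';
    foreach ($row as $name => $value) {
        // escape both the field name and the value for use inside xml
        $xml .= '<member><name>' . htmlspecialchars((string) $name) . '</name>'
              . '<value><string>' . htmlspecialchars((string) $value) . '</string></value></member>';
    }
    return $xml . '</struct></value>';
}

// wrap all rows in one <param> holding an <array>
function tableToParam(array $records) {
    return '<param><value><array><data>'
         . implode('', array_map('rowToStruct', $records))
         . '</data></array></value></param>';
}

echo tableToParam(array(
    array('id' => '94', 'tag' => 'firm', 'cat' => 'firm seo', 'date' => '200901'),
));
```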

Then the second part : on the receiving end the Incutio class parses the whole tag-mess, and hands an array of the param sections as input to my function putTable.

    function putTable($args)
    {
     $user   = $args[0];
     $pass   = $args[1];
     $tname  = $args[2];
     $tcount = $args[3];
     $id     = $args[4];
     $table  = $args[5];

$table is a straightforward array; each item is an array ($t) created from a struct, holding the fieldname-value pairs. I turn each record’s key-value struct into a mysql INSERT query :
    $query = "INSERT INTO `".$tname."` (" field, field… ") VALUES (" fieldvalue, fieldvalue ")";

All I have to do is add the fieldnames and fieldvalues to the mysql insert query.

     foreach($table as $t) {

      //make the (`fieldname`, `fieldname`, `fieldname`) query-bit
      //and the ('fieldvalue', 'fieldvalue', 'fieldvalue') query-bit,
      //skipping the auto-increment field
      $fields = array();
      $values = array();
      foreach($t as $key => $value) {
       if($key != $id) {
        $fields[] = "`".str_replace("`", "", $key)."`";
        //assumption : $this->connection is a mysqli instance,
        //the values are escaped before they go into the query
        $values[] = "'".$this->connection->real_escape_string($value)."'";
       }
      }

      //glue the fixed parts and the two lists together
      $query = 'INSERT INTO `'.str_replace("`", "", $tname).'` ('.implode(', ', $fields)
             .') VALUES ('.implode(', ', $values).')';

      //query…
      $this->connection->query($query);
     }
    }

That generates mysql queries like
    INSERT INTO `serp_tags_keys` (`tag`, `cat`, `date`) VALUES ('ranking', 'alexa ranking', '200901')
and copies the entire table.

That is how I handle the table data.
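
Since the values come straight off the wire, a variant with placeholders may be safer than gluing them into the query text. A minimal sketch, assuming a PDO connection on the receiving end (the post uses its own connection wrapper, so that part is an assumption); buildInsert is a hypothetical helper name.

```php
<?php
// Hypothetical helper: build a parameterised INSERT for one record,
// leaving out the auto-increment field named in $id.
function buildInsert($tname, $id, array $record) {
    $fields = array();
    $values = array();
    foreach ($record as $key => $value) {
        if ((string) $key === (string) $id) {
            continue;
        }
        // backticks are stripped so a field name cannot break out of its quotes
        $fields[] = '`' . str_replace('`', '', $key) . '`';
        $values[] = $value;
    }
    $sql = 'INSERT INTO `' . str_replace('`', '', $tname) . '` ('
         . implode(', ', $fields) . ') VALUES ('
         . implode(', ', array_fill(0, count($values), '?')) . ')';
    return array($sql, $values);
}

list($sql, $values) = buildInsert('serp_tags_keys', 'id',
    array('id' => '4', 'tag' => 'ranking', 'cat' => 'alexa ranking', 'date' => '200901'));
echo $sql, "\n";
// with PDO (assumed, see above) the record would then be stored with:
// $pdo->prepare($sql)->execute($values);
```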

Of course I have to define two custom classes to process the serp.putTable procedure itself, using the Incutio class.

First the class for the sending script, which is pretty straight forward :

  • make an IXR_Client instance
  • hand the record set to it
  • have it formatted and sent

    //include the library
    include('class-IXR.php');

    //make a custom class that uses the IXR_Client
    class SerpClient
    {
     public $rpcurl;    //endpoint
     public $username;  //you go figure
     public $password;
     public $bClient;   //incutio ixr-client instance
     public $myclient;  //machine/host-id

     function __construct($rpcurl, $username, $password, $myclient)
     {
      //standard variables to send in the message
      $this->rpcurl   = (string) $rpcurl;
      $this->username = (string) $username;
      $this->password = (string) $password;
      $this->myclient = (string) $myclient;
      $this->connect();
     }

     function connect()
     {
      //basic client, it takes the endpoint url;
      //nothing is sent until query() is called
      $this->bClient = new IXR_Client($this->rpcurl);
      return true;
     }

     //the function I use to send the data
     function putTable($tablename, $recordcount, $tableid, $array)
     {
      //first parameter is always the methodname, then the parameters, which are
      //added sequentially to the xml-file (with the appropriate tags for the datatypes,
      //the script figures that out). note : strings get their special characters escaped.
      //returns false if the call failed
      return $this->bClient->query('serp.putTable', $this->username, $this->password, $tablename, $recordcount, $tableid, $array);
     }
    }

I use it in the snippets above with :

    $ThisClient = new SerpClient('http://serp.trismegistos.net/db/xmlrpc.php', 'user', 'pass', 'sender');
    //…
    $ThisClient->putTable($tablename, $recordcount, $tableid, $records);

Then, on the receiving end, my program has to know how to handle the xml containing the remote procedure call.

I define an extension of IXR_Server and pass serp.putTable as a new ‘method’ (callback function).

    //go away cookie…
    $_COOKIE = array();

    //make sure you get the posted crap, the ixr instance grabs its input from it
    if ( !isset( $HTTP_RAW_POST_DATA ) ) $HTTP_RAW_POST_DATA = file_get_contents( 'php://input' );
    if ( isset($HTTP_RAW_POST_DATA) ) $HTTP_RAW_POST_DATA = trim($HTTP_RAW_POST_DATA);

    //include the library
    include('class-IXR.php');

    //make an extended class
    class serp_xmlrpc_server extends IXR_Server {

     //the constructor…
     function __construct() {

      //build an array of methods :
      //first the procedurename you use in the xml-text,
      //then which function in the extended class (this one) it maps to
      //to be used as $this->method
      $methods = array('serp.putTable' => 'this:putTable');

      //hand em to the IXR server instance that will map it as callback
      $this->IXR_Server($methods);
     }

     //now the IXR_Server instance uses ($this->)putTable
     //to process incoming xml-text
     //containing serp.putTable as methodname
     function putTable($args)
     {
      //(for routine : see the snippet above to store the xml data in mysql)
     }
    }

    //make the class instance like any regular get-post php program,
    //the only actual program line, that instantiates the extended class,
    //which handles the posted xml
    $serp_xmlrpc_server = new serp_xmlrpc_server();
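
If the ‘this:putTable’ notation looks odd, a toy dispatcher shows the idea. This is NOT the Incutio code, only an illustration of how a map of procedure names can route a call to a method of the class itself; ToyRpcServer and ToySerpServer are names invented for the sketch.

```php
<?php
// Toy dispatcher, NOT the Incutio code: it only illustrates how a map of
// 'procedure name' => 'this:method' can route a call to a class method.
class ToyRpcServer {
    protected $methods = array();

    function __construct(array $methods) {
        $this->methods = $methods;
    }

    function call($methodName, array $args) {
        if (!isset($this->methods[$methodName])) {
            return 'unknown method ' . $methodName;
        }
        $target = $this->methods[$methodName];
        if (substr($target, 0, 5) === 'this:') {
            // 'this:putTable' means: call $this->putTable($args)
            $method = substr($target, 5);
            return $this->$method($args);
        }
        // anything else is treated as a plain function name
        return call_user_func($target, $args);
    }
}

class ToySerpServer extends ToyRpcServer {
    function __construct() {
        parent::__construct(array('serp.putTable' => 'this:putTable'));
    }

    function putTable($args) {
        return 'got ' . count($args[5]) . ' records for ' . $args[2];
    }
}

$server = new ToySerpServer();
$answer = $server->call('serp.putTable',
    array('user', 'pass', 'serp_tags_keys', 1, 'id', array(array('id' => '4'))));
echo $answer;
```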

That’s all. I am not going to list a cut-and-paste version. You have to build some stuff with it, then you will come up with lots of stuff you can do with it.

Wordpress and iPhone built a plugin that receives pictures from iPhone. Wordpress uses Incutio so you can ‘piggyback’ on that and have an iPhone plugin for your own website in two days flat using an ajax lightbox gallery script. Or go monetize small websites with some seo oriented ‘optimisation’ functions like ChangeFooterLinks(array($paidurl, $anchortext)) :) or whatever… boring, isn’t it ?

Categories: optimisation, php, xml-rpc
Tags: optimisation, php, xml-rpc

RedHat Seo : scraper auto-blogging

juust | 26/12/2008

Just give us your endpoint and we’ll take it from there, sparky!

I was going to make one of these tools that scrape Google and conjure a full blog out of nowhere, as a Christmas special: RedHat Seo. The rough sketch has arrived. It is far from perfect, but it does produce a blog and it doesn’t even look too shabby. I scraped a small batch of posts off of blogs, keeping the links intact and adding a tribute link. I hope they will pardon me for it.

structure

I use three main classes,

    BlogMaker    the application
    Target       the blogs you aim for
    WPContent    the scraped goodies

…and two support classes

    SerpResult   scraped urls
    Custom_RPC   a simple rpc-poster

Target blogs have three text files,

    file              contents                         maintenance
    blog categories   category you post under          manual
    blog tags         tags you list on the blog        manual
    blog urls         urls already used for the blog   system

routine

The BlogMaker class grabs a result list (up to 1000 urls per phrase) from Google, extracts the urls and stores them in SerpResult, scrapes the urls and extracts the entry divs, stores the div entries in the WPContent class (which has some basic functions to sanitize the text), and uses the Target definitions to post it all to the blogs with xml-rpc.

usage

My highlighter tends to mess up text with div markers in it, so copying off the blog may not work;
the full text source (about 500 lines) is over here. Underneath I’ll list the main program loop :

    //make main instance
    $Blog = new BlogMaker("keyword");

    //define a target blog, you can define multiple blogs and refer with code
    //then add rpc-url, password and user
    //and for every target blog three text-files
    $T = $Blog->AddTarget(
     'blogcode',
     'http://my.blog.com/xmlrpc.php',
     'password',
     'user',
     'keyword.categories.txt',
     'keyword.tags.txt',
     'keyword.urls.txt'
    );

    //read the tags, cats and url text files stored on the server
    //all retrieved urls are tested, if the target blog already has that
    //scraped url, it is discarded.
    $T->CSV_GetTags();
    $T->List_GetCats();
    $T->ReadURL();

    //grab the google result list
    //use params (pages, keywords) to specify search
    $Blog->GoogleResults();

    foreach($Blog->Results as $BlogUrl) {
     echo $BlogUrl->url;

     //see if the url isnt used yet
     if($T->checkURL(trim($BlogUrl->url)) != true) {
      echo '…checking ';
      flush();

      //if not used, get the source
      $BlogUrl->scrape();

      //check for divs marked "entry", if they arent there, check "post"
      //some blogs use other indications for the content
      //but entry and post cover 40%
      $entries = $BlogUrl->get_entries();
      if(count($entries) < 1) {
       echo 'no entries…';
       flush();
       $entries = $BlogUrl->get_posts();
       if(count($entries) < 1) {
        echo 'no posts either…';
        //if no entry-post div, mark url as done
        $T->RegisterURL($BlogUrl->url);
       }
      }

      //in the get_entries/get_posts function the fragments are stored
      //as wpcontent
      foreach($BlogUrl->WpContentPieces as $WpContent) {
       if($WpContent->judge(2000, 200, 5)) {
        $WpContent->tribute();  //add tribute link
        $T->settags($WpContent->divcontent); //add tags
        $T->postCustomRPC($WpContent->title, $WpContent->divcontent, 1); //1=publish, 0=draft
        $T->RegisterURL($WpContent->url);  //register use of url
        sleep(20);  //20 seconds break, for sitemapping
       }
      }
     }
    }
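
The full source is not listed here, so as an illustration of the "entry, else post" check in the loop, this is one way to pull such divs out of a page with DOMDocument and DOMXPath. It is a sketch under assumptions: extractDivs is a hypothetical function, not the original get_entries()/get_posts(), and it expects a fixed class name without quotes in it.

```php
<?php
// Sketch of pulling divs with a given class out of a page.
// Hypothetical function, not the original get_entries()/get_posts() code.
function extractDivs($html, $class) {
    if (trim($html) === '') {
        return array();
    }
    $doc = new DOMDocument();
    // real-world markup is rarely valid, so keep the parser warnings quiet
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // match the class as a whole word inside the class attribute
    $query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' " . $class . " ')]";
    $found = array();
    foreach ($xpath->query($query) as $div) {
        $found[] = $doc->saveHTML($div);
    }
    return $found;
}

$html = '<html><body><div class="post"><div class="entry"><p>hello</p></div></div></body></html>';
$entries = extractDivs($html, 'entry');
if (count($entries) < 1) {
    // no entry divs, try the other common marker
    $entries = extractDivs($html, 'post');
}
print_r($entries);
```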

notes

  • xml-rpc needs to be activated explicitly on the wordpress dashboard under settings/writing.
  • categories must be present in the blog
  • url file must be writeable by the server (777)
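
The url file is nothing fancy. A minimal sketch of how the bookkeeping could work, assuming one url per line in a plain text file; urlIsUsed and registerUrl are hypothetical stand-ins for checkURL() and RegisterURL(), and the example url is made up.

```php
<?php
// Hypothetical stand-ins for checkURL()/RegisterURL(): one url per line in
// a plain text file that the web server can write to.
function urlIsUsed($file, $url) {
    if (!is_readable($file)) {
        return false;
    }
    $used = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    return in_array(trim($url), $used, true);
}

function registerUrl($file, $url) {
    // append and lock, so two runs do not garble the file
    return file_put_contents($file, trim($url) . "\n", FILE_APPEND | LOCK_EX) !== false;
}

// demo on a throwaway file in the temp directory
$file = tempnam(sys_get_temp_dir(), 'urls');
if (!urlIsUsed($file, 'http://example.com/a-post/')) {
    registerUrl($file, 'http://example.com/a-post/');
}
var_dump(urlIsUsed($file, 'http://example.com/a-post/'));
```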

It seems wordpress builds the sitemap as a background process: the standard google xml sitemap plugin will attempt to build it in the cache (which takes anywhere between 2 and 10 seconds), and apart from building a sitemap the posts also get pinged around. Giving the install 10 to 20 seconds between posts allows all the hooked-in functions to complete.

period

That’s about all,
consider it gpl, I added some comments in the source but I will not develop this any further. A mysql backed blogfarm tool (euphemistically called ‘publishing tool’) is more interesting, besides, I am off to the wharves to do some painting.

if you use it, send some feedback,
merry christmas dogheads

Categories: google, seo, seo tips and tricks, tool, wordpress, xml-rpc
Tags: google, scrape, seo, seo tips and tricks, tool, wordpress, xml-rpc
