{"id":375,"date":"2009-03-25T11:53:13","date_gmt":"2009-03-25T09:53:13","guid":{"rendered":"http:\/\/www.juust.org\/?p=375"},"modified":"2020-06-12T21:34:25","modified_gmt":"2020-06-12T19:34:25","slug":"curl-trackbacks","status":"publish","type":"post","link":"https:\/\/www.juust.org\/index.php\/curl-trackbacks\/2009\/03\/","title":{"rendered":"curl trackbacks"},"content":{"rendered":"<p>I figured I&#8217;d blog a post on trackback linkbuilding. A trackback is &#8230; (post a few and you&#8217;ll get it). The trackback protocol isn&#8217;t that interesting, but the implementation of it by blog-platforms and cms&#8217;es makes it an excellent means for network development, because it uses a simple http-post (cUrl makes that easy).<\/p>\n<p>To post a successful link proposal I need some basic data :<\/p>\n<p>about my page <\/p>\n<ul>\n<li>url (must exist)<\/li>\n<li>blog owner (free)<\/li>\n<li>blog name (free)<\/li>\n<\/ul>\n<p>about the other page<\/p>\n<ul>\n<li>url (must exist)<\/li>\n<li>excerpt (should be proper normal text)<\/li>\n<\/ul>\n<p><em>my page :<\/em> this is preferably a php routine that hacks some text, pictures and videos, PLR or articles together, with a url rewrite. I prefer using xml text files instead of a database; that works faster when you set stuff up.<\/p>\n<p><em>other page :<\/em> don&#8217;t use &#8220;I liked your article so much&#8230;&#8221;, use text that matches text on the target pages. Preferably get some proper excerpts from xml-feeds like blogsearch, msn and yahoo (excerpts contain the keywords I searched for; as anchor text that works better for search engine visibility and link value). 
<\/p>\n<p>Let&#8217;s get some stuff from the MSN rss feed :<\/p>\n<pre lang=\"php\">\r\n\/\/a generic query = 5% success\r\n\/\/add \"(powered by) wordpress\" \r\n\/\/urlencode() turns the spaces into '+'\r\n      $query=urlencode('keywords wordpress trackback');\r\n      $xml = @simplexml_load_file(\"http:\/\/search.live.com\/results.aspx?q=$query&count=50&first=1&format=rss\");\r\n      if(!$xml) die('could not load the feed');\r\n      $trackbacks=array();\r\n      $count=0;\r\n      foreach($xml->channel->item as $i) {\r\n\r\n           $count++;\r\n           $target=array();\r\n\r\n\/\/the data from msn\r\n           $target['link'] = (string) $i->link;\r\n           $target['title'] = (string) $i->title;\r\n           $target['excerpt'] = (string) $i->description;\r\n\r\n\/\/some variables I'll need later on\r\n           $target['id'] = $count;\r\n           $target['trackback'] = '';\r\n           $target['trackback_success'] = 0;\r\n\r\n           $trackbacks[]=$target;\r\n       }\r\n<\/pre>\n<p>25% of the cms sites at the top of the search engines are WordPress scripts, and WordPress always uses \/trackback\/ in the rdf-url. I get the source of the urls in the search feed and grab all link urls in it; if any contains \/t<strong>rackbac<\/strong>k\/, I post a trackback to that url and see if it sticks. 
<\/p>\n<p>(I can also spider all links and check if there is an rdf-segment in the target&#8217;s source (*1), but that takes a lot of time. I could also program a curl array and use multicurl, but for my purposes this works fast enough.)<\/p>\n<pre lang=\"php\">\r\nfor($t=0;$t < count($trackbacks);$t++) {\r\n\/\/I could use curl \r\n\/\/but 95% of the urls offered are kosher and respond fast\r\n     $content = @file_get_contents($trackbacks[$t]['link']);\r\n     if($content === false) continue;\r\n     preg_match_all (\"\/a[\\s]+[^>]*?href[\\s]?=[\\s\\\"\\']+\".\r\n           \"(.*?)[\\\"\\']+.*?>\".\"([^< ]+|.*?)?<\\\/a>\/\",\r\n        $content, $matches);\r\n\t$uri_array = $matches[1];\r\n\tforeach($uri_array as $key => $link) { \r\n\/\/!== false also catches a match at position 0\r\n             if(strpos($link, 'rackbac') !== false) { \r\n                $trackbacks[$t]['trackback'] = $link;\r\n                break; \r\n             }\r\n        }\r\n}\r\n<\/pre>\n<p>When I fire a trackback, the other script will try to verify that my page has a link and matching text. I have to make sure my page shows the excerpts and links, so I stuff all candidates in a cached xml file.  <\/p>\n<pre lang=\"php\">\r\nfunction cache_xml_store($trackbacks, $pagetitle) \r\n{\r\n\t$xml = '<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n\t<trackbacks>';\r\n\tfor($a=0;$a < count($trackbacks);$a++) {\r\n\t\t$arr = $trackbacks[$a];\r\n\/\/escape the values so stray markup cannot break the xml\r\n\t\t$xml .= '<entry>';\r\n\t\t$xml .= '<id>'.$arr['id'].'<\/id>';\r\n\t\t$xml .= '<excerpt>'.htmlspecialchars($arr['excerpt']).'<\/excerpt>';\r\n\t\t$xml .= '<link>'.htmlspecialchars($arr['link']).'<\/link>';\r\n\t\t$xml .= '<title>'.htmlspecialchars($arr['title']).'<\/title>';\r\n\t\t$xml .= '<\/entry>';\r\n\t}\r\n\t$xml .= '<\/trackbacks>';\r\n\t\r\n\t$fname = 'cache\/trackback'.urlencode($pagetitle).'.xml';\r\n\tif(file_exists($fname)) unlink($fname);\r\n\t$fhandle = fopen($fname, 'w');\r\n\tfwrite($fhandle, $xml);\r\n\tfclose($fhandle);\r\n\treturn;\r\n}\r\n<\/pre>\n<p>I use simplexml to read that cached file and show the excerpts and links once the page is requested. 
<\/p>\n<pre lang=\"php\">\r\n\/\/ retrieve the cached xml and return it as array.\r\nfunction cache_xml_retrieve($pagetitle)\r\n{\r\n\t$fname = 'cache\/trackback'.urlencode($pagetitle).'.xml';\r\n\tif(file_exists($fname)) {\r\n\t\t$xml=@simplexml_load_file($fname);\r\n\t\tif(!$xml) return false;\r\n\t\t$trackbacks = array();\r\n\t\tforeach($xml->entry as $e) {\r\n\t\t\t$trackback = array();\r\n\t\t\t$trackback['id'] =(string) $e->id;\r\n\/\/rid() is not defined in this post\r\n\t\t\t$trackback['link'] =  rid((string) $e->link);\r\n\t\t\t$trackback['title'] =  (string) $e->title;\r\n\/\/the cache stores the text as 'excerpt'\r\n\t\t\t$trackback['excerpt'] =  (string) $e->excerpt;\r\n\r\n\t\t\t$trackbacks[] = $trackback;\r\n\t\t}\r\n\t\treturn $trackbacks;\r\n\t} \r\n\treturn false;\r\n}\r\n<\/pre>\n<p>(this setup requires a subdirectory <strong>cache<\/strong> set to read\/write with chmod 777)<\/p>\n<p>I use http:\/\/www.domain.com\/financial+trends.html and extract the pagetitle as &#8220;financial trends&#8221;, which has an xml-file http:\/\/www.domain.com\/cache\/trackbackfinancial+trends.xml. (In my own script I use sef urls with mod_rewrite; you can also use the $_SERVER array.)<\/p>\n<pre lang=\"php\">\r\n$pagetitle=preg_replace('\/\\+\/', ' ', htmlentities($_REQUEST['title'], ENT_QUOTES, \"UTF-8\"));\r\n\r\n$cached_excerpts = cache_xml_retrieve($pagetitle);\r\n\/\/no cache file yet : nothing to list\r\nif(!$cached_excerpts) $cached_excerpts = array();\r\n\r\n\/\/do some stuff with it, make it look nice  :\r\nfor($s=0;$s < count($cached_excerpts);$s++) {\r\n\/\/this lists the trackback (candidates)\r\n    echo $cached_excerpts[$s]['excerpt'];\r\n    echo '<a href=\"'.$cached_excerpts[$s]['link'].'\">'.$cached_excerpts[$s]['title'].'<\/a>';\r\n}\r\n<\/pre>\n<p>Now I prepare the data for the trackback post :<\/p>\n<pre lang=\"php\">\r\nfor($t=0;$t < count($trackbacks);$t++) {\r\n\r\n    $trackback_url = $trackbacks[$t]['trackback'];\r\n\/\/does it have a trackback target url ? then prepare data :\r\n    if($trackback_url !='') {\r\n        $trackback_data = array(\r\n\t\"url\" => \"url of my page with the link to the target\",\r\n \t\"title\" => \"title of my page\",\r\n\t\"blog_name\" => \"name of my blog\",\r\n\t\"excerpt\" => '[...]'.trim(substr($trackbacks[$t]['excerpt'], 0, 150)).'[...]'\r\n        );\r\n        \/\/...and try the trackback\r\n        $trackbacks[$t]['trackback_success'] = trackback_ping($trackback_url, $trackback_data);\r\n    }\r\n}\r\n<\/pre>\n<p>This is the actual trackback post using cUrl. cUrl has a convenient timeout setting; I use three seconds. If a host does not respond in half a second it&#8217;s probably dead, so three seconds is generous.<\/p>\n<pre lang=\"php\">\r\nfunction trackback_ping($trackback_url, $trackback)\r\n\t{\r\n\r\n\/\/make a string of the data array to post\r\n\t$strout = array();\r\n\tforeach($trackback as $key=>$value) $strout[]=$key.\"=\".rawurlencode($value);\r\n        $postfields= implode('&', $strout);\r\n\t\t\r\n\/\/create a curl instance\r\n\t$ch = curl_init();\r\n\tcurl_setopt($ch, CURLOPT_URL, $trackback_url);\r\n\tcurl_setopt($ch, CURLOPT_TIMEOUT, 3);\r\n\tcurl_setopt($ch, CURLOPT_USERAGENT, \"Mozilla\/4.0 (compatible; MSIE 5.01; Windows NT 5.0)\");\r\n\tcurl_setopt($ch, CURLOPT_RETURNTRANSFER, true);\r\n\r\n\/\/set a custom form header\r\n\tcurl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-type: application\/x-www-form-urlencoded'));\r\n\r\n\/\/no CURLOPT_NOBODY here : I need the response body to check the result\r\n        curl_setopt($ch, CURLOPT_POST, true);\r\n\tcurl_setopt($ch, CURLOPT_POSTFIELDS, $postfields);\t\r\n\t\t\r\n\t$content = curl_exec($ch);\r\n\r\n\/\/if the return has a tag 'error' with as value 0 it went flawless\r\n\t$success = 0;\t\r\n\tif($content !== false && strpos($content, '<error>0<\/error>') !== false) $success = 1; \r\n\tcurl_close ($ch);\r\n\tunset($ch);\r\n\treturn $success;\r\n\t}\r\n<\/pre>\n<p>Now the last routine : rewrite the cached xml file with only the successful trackbacks (seo stuff) :<\/p>\n<pre 
lang=\"php\">\r\n$store_trackbacks=array();\r\nfor($t=0;$t < count($trackbacks);$t++) {\r\n    if($trackbacks[$t]['trackback_success']>0) {\r\n        $store_trackbacks[]=$trackbacks[$t];\r\n    }\r\n}\r\ncache_xml_store($store_trackbacks, $pagetitle);\r\n<\/pre>\n<p>Voila : a page with only successful trackbacks. <\/p>\n<p>Google (the backrub engine) doesn&#8217;t like sites that use automated link-building methods; other engines (Baidu, MSN, Yahoo) use a more normal link popularity keyword matching algorithm. Trackback linking helps you get a clear engine profile at relatively low cost. <\/p>\n<p>0) for brevity and clarity, the code above is rewritten (taken from a trackback script I am developing on another site), so it can contain some typos.<\/p>\n<p>*1) If you want to spider links for rdf-segments : <a href=\"https:\/\/svn.typo3.org\/TYPO3v4\/Extensions\/yablog\/trunk\/class.tx_yablog_ping.php\" rel=\"nofollow noopener\" target=\"_blank\">TYPO3v4<\/a> has some code for easy retrieval of trackback-uris :<\/p>\n<pre lang=\"php\">\r\n\/**\r\n\t * Fetches ping url from the given url\r\n\t *\r\n\t * @param\tstring\t$url\tURL to probe for RDF\r\n\t * @return\tstring\tPing URL\r\n\t *\/\r\n\tprotected function getPingURL($url) {\r\n\t\t$pingUrl = '';\r\n\t\t\/\/ Get URL content\r\n\t\t$urlContent = t3lib_div::getURL($url);\r\n\t\tif ($urlContent && ($rdfPos = strpos($urlContent, '<rdf:RDF')) !== false) {\r\n\t\t\t\/\/ RDF exists in this content. Get it and parse\r\n\t\t\t$urlContent = substr($urlContent, $rdfPos);\r\n\t\t\tif (($endPos = strpos($urlContent, '<\/rdf:RDF>', $rdfPos)) !== false) {\r\n\t\t\t\t\/\/ We will use quick regular expression to find ping URL\r\n\t\t\t\t$rdfContent = substr($urlContent, $rdfPos, $endPos);\r\n\t\t\t\t$pingUrl = preg_replace('\/trackback:ping=\"([^\"]+)\"\/', '\\1', $rdfContent);\r\n\t\t\t}\r\n\t\t}\r\n\t\treturn $pingUrl;\r\n\t}\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>I figured I&#8217;d blog a post on trackback linkbuilding. A trackback is &#8230; (post a few and you&#8217;ll get it). The trackback protocol isn&#8217;t that interesting, but the implementation of it by blog-platforms and cms&#8217;es makes it an excellent means for network development, because it uses a simple http-post (cUrl makes that easy). To post [&hellip;]<\/p>\n","protected":false},"author":5796,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_sitemap_exclude":false,"_sitemap_priority":"","_sitemap_frequency":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[3],"tags":[103,94,104,27],"class_list":["post-375","post","type-post","status-publish","format-standard","hentry","category-php","tag-links","tag-php","tag-seo-tips-and-tricks","tag-trackback"],"_links":{"self":[{"href":"https:\/\/www.juust.org\/index.php\/wp-json\/wp\/v2\/posts\/375","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.juust.org\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.juust.org\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.juust.org\/index.php\/wp-json\/wp\/v2\/users\/5796"}],"replies":[{"embeddable":true,"href":"https:\/\/www.juust.org\/index.php\/wp-json\/wp\/v2\/comments?post=375"}],"version-history":[{"count":0,"href":"https:\/\/www.juust.org\/index.php\/wp-json\/wp\/v2\/posts\/375\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.juust.org\/index.php\/wp-json\/wp\/v2\/media?parent=375"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.juust.org\/index.php\/wp-json\/wp\/v2\/categories?post=375"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.juust.org\/index.php\/wp-json\/wp\/v2\/tags?post=375"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}