<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>juust ~ php oddities &#187; scrape</title>
	<atom:link href="http://www.juust.org/index.php/tag/scrape/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.juust.org</link>
	<description>Link theory and search engine optimization</description>
	<lastBuildDate>Thu, 19 Jan 2012 09:39:32 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>google suggest scraper (php &amp; simplexml)</title>
		<link>http://www.juust.org/index.php/google-suggest-scraper-php-simplexml/2011/12/</link>
		<comments>http://www.juust.org/index.php/google-suggest-scraper-php-simplexml/2011/12/#comments</comments>
		<pubDate>Mon, 19 Dec 2011 00:08:12 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[google]]></category>
		<category><![CDATA[seo tips and tricks]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[scrape]]></category>
		<category><![CDATA[simplexml]]></category>
		<category><![CDATA[tool]]></category>

		<guid isPermaLink="false">http://www.juust.org/?p=1834</guid>
		<description><![CDATA[Today&#8217;s goal is a basic php Google Suggest scraper because I wanted traffic data and keywords for free. Before we start : google scraping is bad ! Good People use the Google Adwords API : 25 cents for 1000 units, &#8230; <a href="http://www.juust.org/index.php/google-suggest-scraper-php-simplexml/2011/12/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Today&#8217;s goal is a basic php Google Suggest scraper because I wanted traffic data and keywords for free.</p>
<p>Before we start :</p>
<h2 style="text-align: center;">google scraping is <strong>bad</strong> !</h2>
<p>Good People use the <a title="google adwords api" href="http://code.google.com/apis/adwords/">Google Adwords API</a> : 25 cents per 1000 units, and 15 or more units per keyword suggestion, so they pay 4 or 5 dollars for 1000 keyword suggestions (if they can find a good programmer, who also costs a few dollars). Or they opt for <a title="search engine marketing data" href="http://www.semrush.com" rel="nofollow">SemRush</a> (also my preference), <a href="http://www.keywordspy.com" rel="nofollow">KeywordSpy</a>, <a href="http://www.spyfu.com" rel="nofollow">Spyfu</a>, and other services like 7Search PPC programs to get keyword and traffic data, plus data on their competitors. These charge from about 80 dollars per month for a limited account up to a few hundred per month for seo companies. Good people pay plenty.</p>
<p>We tiny grey webmice of marketing, however, just want a few estimates at low or, better, no cost, like this :</p>
<table width="276" border="0" cellspacing="0" cellpadding="0">
<colgroup>
<col width="206" />
<col width="70" /> </colgroup>
<tbody>
<tr>
<td width="206" height="20">data</td>
<td align="right" width="70">num queries</td>
</tr>
<tr>
<td width="206" height="20">google suggest</td>
<td align="right" width="70">57800000</td>
</tr>
<tr>
<td height="20">google suggestion box</td>
<td align="right">5390000</td>
</tr>
<tr>
<td height="20">google suggest api</td>
<td align="right">5030000</td>
</tr>
<tr>
<td height="20">google suggestion tool</td>
<td align="right">3670000</td>
</tr>
<tr>
<td height="20">google suggest a site</td>
<td align="right">72700000</td>
</tr>
<tr>
<td height="20">google suggested users</td>
<td align="right">57000000</td>
</tr>
<tr>
<td height="20">google suggestions funny</td>
<td align="right">37400000</td>
</tr>
<tr>
<td height="20"><strong>google suggest scraper</strong></td>
<td align="right">62800</td>
</tr>
<tr>
<td height="20">google suggestions not working</td>
<td align="right">87100000</td>
</tr>
<tr>
<td height="20">google suggested user list</td>
<td align="right">254000000</td>
</tr>
</tbody>
</table>
<p>Suggestion autocomplete is an AJAX call, and it returns XML :</p>
<pre>&lt;?xml version="1.0"?&gt;
   &lt;toplevel&gt;
     &lt;CompleteSuggestion&gt;
       &lt;suggestion data="senior quotes"/&gt;
       &lt;num_queries int="30000000"/&gt;
     &lt;/CompleteSuggestion&gt;
     &lt;CompleteSuggestion&gt;
       &lt;suggestion data="senior skip day lyrics"/&gt;
       &lt;num_queries int="441000"/&gt;
     &lt;/CompleteSuggestion&gt;
   &lt;/toplevel&gt;</pre>
<p>Using SimpleXML, the PHP routine is as simple as querying <strong><em>g00gle.c0m/complete/search?</em></strong>, grabbing the autocomplete xml, and extracting the attribute data :</p>
<div class="geshi no php">
<ol>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span><span class="re1">$_SERVER</span><span class="br0">&#91;</span><span class="st0">&#39;QUERY_STRING&#39;</span><span class="br0">&#93;</span><span class="sy0">==</span><span class="st0">&#39;&#39;</span><span class="br0">&#41;</span> <span class="kw3">die</span><span class="br0">&#40;</span><span class="st0">&#39;enter a query like http://host/filename.php?query&#39;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
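<li class="li1">
<div class="de1">&nbsp;<span class="co1">//the whole query string is the keyword (filename.php?query)</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="re1">$kw</span> <span class="sy0">=</span> <span class="kw3">urldecode</span><span class="br0">&#40;</span><span class="re1">$_SERVER</span><span class="br0">&#91;</span><span class="st0">&#39;QUERY_STRING&#39;</span><span class="br0">&#93;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="re1">$contentstring</span> <span class="sy0">=</span> <span class="sy0">@</span><span class="kw3">file_get_contents</span><span class="br0">&#40;</span><span class="st0">&quot;http://g00gle.c0m/complete/search?output=toolbar&amp;q=&quot;</span><span class="sy0">.</span><span class="kw3">urlencode</span><span class="br0">&#40;</span><span class="re1">$kw</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span> &nbsp;</div>
</li>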
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="re1">$content</span> <span class="sy0">=</span> simplexml_load_string<span class="br0">&#40;</span><span class="re1">$contentstring</span> <span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">foreach</span><span class="br0">&#40;</span><span class="re1">$content</span><span class="sy0">-&gt;</span>CompleteSuggestion <span class="kw1">as</span> <span class="re1">$c</span><span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="re1">$term</span> <span class="sy0">=</span> <span class="br0">&#40;</span>string<span class="br0">&#41;</span> <span class="re1">$c</span><span class="sy0">-&gt;</span>suggestion<span class="sy0">-&gt;</span>attributes<span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">-&gt;</span>data<span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">//note : traffic data is sometimes missing &nbsp; </span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="re1">$traffic</span> <span class="sy0">=</span> <span class="br0">&#40;</span>string<span class="br0">&#41;</span> <span class="re1">$c</span><span class="sy0">-&gt;</span>num_queries<span class="sy0">-&gt;</span>attributes<span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">-&gt;</span>int<span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw3">echo</span> <span class="re1">$term</span><span class="sy0">.</span> <span class="st0">&quot; &quot;</span><span class="sy0">.</span><span class="re1">$traffic</span> <span class="sy0">.</span> <span class="st0">&quot;<span class="es0">\n</span>&quot;</span> <span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="br0">&#125;</span></div>
</li>
</ol>
</div>
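<p>A minimal sketch of the same extraction, with a guard for the missing traffic data mentioned in the code comment. It parses an inline copy of the sample XML instead of calling the live endpoint, and the function name <code>parse_suggestions</code> is made up for this example (it is not taken from the downloadable script) :</p>

```php
<?php
// Hypothetical helper, not part of the original script:
// turns the autocomplete XML into an array of term/traffic pairs.
function parse_suggestions($xml)
{
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        // Unparseable or empty response: return nothing instead of failing.
        return array();
    }

    $rows = array();
    foreach ($doc->CompleteSuggestion as $c) {
        // Attributes can be read with array syntax on a SimpleXMLElement.
        $term = (string) $c->suggestion['data'];
        // num_queries is sometimes absent, so check before casting.
        $traffic = isset($c->num_queries['int']) ? (int) $c->num_queries['int'] : null;
        $rows[] = array('term' => $term, 'traffic' => $traffic);
    }
    return $rows;
}

// Inline sample modelled on the XML shown earlier; the second entry
// deliberately has no num_queries element.
$sample = '<toplevel>'
    . '<CompleteSuggestion><suggestion data="senior quotes"/><num_queries int="30000000"/></CompleteSuggestion>'
    . '<CompleteSuggestion><suggestion data="senior skip day lyrics"/></CompleteSuggestion>'
    . '</toplevel>';

foreach (parse_suggestions($sample) as $row) {
    echo $row['term'] . ' ' . ($row['traffic'] === null ? 'n/a' : $row['traffic']) . "\n";
}
```

<p>Returning <code>null</code> for a missing count keeps "no data" apart from a real zero, which matters if you sort or sum the numbers later.</p>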
<p>I made a quick php script that outputs the terms as a list of new queries so you can walk through the suggestions :</p>
<p><a href="http://www.juust.org/wp-content/uploads/2011/12/google-suggest-scraper-tool.jpg" rel="shadowbox[post-1834];player=img;"><img class="alignnone size-medium wp-image-1835" title="google suggest scraper tool" src="http://www.juust.org/wp-content/uploads/2011/12/google-suggest-scraper-tool-300x289.jpg" alt="" width="529" height="400" /></a></p>
<p><a href="http://www.juust.org/wp-content/uploads/2011/12/google-suggest-scraper-tool-II1.jpg" rel="shadowbox[post-1834];player=img;"><img class="alignnone size-medium wp-image-1837" title="google suggest scraper tool II" src="http://www.juust.org/wp-content/uploads/2011/12/google-suggest-scraper-tool-II1-300x286.jpg" alt="" width="462" height="440" /></a></p>
<p>The source is up for download as a text file <a title="google suggest scraper tool code" href="http://www.juust.org/suggestit.txt">over here</a> (rename it to suggestit.php and it should run on any server with php5.* and simplexml).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.juust.org/index.php/google-suggest-scraper-php-simplexml/2011/12/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>proxies !</title>
		<link>http://www.juust.org/index.php/icanhazproxies/2009/02/</link>
		<comments>http://www.juust.org/index.php/icanhazproxies/2009/02/#comments</comments>
		<pubDate>Sat, 21 Feb 2009 03:41:16 +0000</pubDate>
		<dc:creator>juust</dc:creator>
				<category><![CDATA[php]]></category>
		<category><![CDATA[seo tips and tricks]]></category>
		<category><![CDATA[scrape]]></category>

		<guid isPermaLink="false">http://www.juust.org/?p=336</guid>
		<description><![CDATA[I got a site banned at Google so I got pissed and took a script from the blackbox @ digerati marketing to scrape proxy addresses, wired a database and curl into it, so now it scrapes proxies, random picks a &#8230; <a href="http://www.juust.org/index.php/icanhazproxies/2009/02/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I got a site banned at Google so I got pissed and took a script from the blackbox <a href="http://www.digeratimarketing.co.uk/2008/06/12/blackhat-seo-tools-scripts-the-digerati-blackbox/" rel="nofollow">@ digerati marketing</a> to scrape proxy addresses, and wired a database and curl into it. Now it scrapes proxies, picks a proxy at random, prunes dead proxies and returns data.</p>
<p>It is basic and only uses anonymous (level 2) proxies, but it works.</p>
<div class="geshi no php">
<ol>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">/* (mysql table)</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">CREATE TABLE IF NOT EXISTS `serp_proxies` (</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp; `id` int(11) NOT NULL auto_increment,</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp; `ip` text NOT NULL,</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp; `port` text NOT NULL,</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp; PRIMARY KEY &nbsp;(`id`)</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">) ENGINE=MyISAM &nbsp;DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">*/</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//initialize database class, replace with own code</span></div>
</li>
<li class="li1">
<div class="de1"><span class="kw1">include</span><span class="br0">&#40;</span><span class="st0">&#39;init.php&#39;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//main class</span></div>
</li>
<li class="li1">
<div class="de1"><span class="re1">$p</span><span class="sy0">=</span><span class="kw2">new</span> MyProxies<span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//do I have proxies in the database ?</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//if not, get some and store them</span></div>
</li>
<li class="li1">
<div class="de1"><span class="kw1">if</span><span class="br0">&#40;</span><span class="re1">$p</span><span class="sy0">-&gt;</span><span class="me1">GetCount</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="sy0">&lt;</span> <span class="nu0">1</span><span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="re1">$p</span><span class="sy0">-&gt;</span><span class="me1">GetSomeAir</span><span class="br0">&#40;</span><span class="nu0">1</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="re1">$p</span><span class="sy0">-&gt;</span><span class="me1">store2database</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//pick one</span></div>
</li>
<li class="li1">
<div class="de1"><span class="re1">$p</span><span class="sy0">-&gt;</span><span class="me1">RandomProxy</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//get the page</span></div>
</li>
<li class="li1">
<div class="de1"><span class="re1">$p</span><span class="sy0">-&gt;</span><span class="me1">ThisProxy</span><span class="sy0">-&gt;</span><span class="me1">DoRequest</span><span class="br0">&#40;</span><span class="st0">&#39;http://www.domain.com/robots.txt&#39;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//error handling</span></div>
</li>
<li class="li1">
<div class="de1"><span class="kw1">if</span><span class="br0">&#40;</span><span class="re1">$p</span><span class="sy0">-&gt;</span><span class="me1">ThisProxy</span><span class="sy0">-&gt;</span><span class="me1">ProxyError</span> <span class="sy0">&gt;</span> <span class="nu0">0</span><span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//7 &nbsp; no connect</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//28 &nbsp; timed out</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//52 &nbsp; empty reply</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//if it is dead, doesn&#39;t allow connections : prune it</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw1">if</span><span class="br0">&#40;</span><span class="re1">$p</span><span class="sy0">-&gt;</span><span class="me1">ThisProxy</span><span class="sy0">-&gt;</span><span class="me1">ProxyError</span><span class="sy0">==</span><span class="nu0">7</span><span class="br0">&#41;</span> <span class="re1">$p</span><span class="sy0">-&gt;</span><span class="me1">DeleteProxy</span><span class="br0">&#40;</span><span class="re1">$p</span><span class="sy0">-&gt;</span><span class="me1">ThisProxy</span><span class="sy0">-&gt;</span><span class="me1">proxy_ip</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw1">if</span><span class="br0">&#40;</span><span class="re1">$p</span><span class="sy0">-&gt;</span><span class="me1">ThisProxy</span><span class="sy0">-&gt;</span><span class="me1">ProxyError</span><span class="sy0">==</span><span class="nu0">52</span><span class="br0">&#41;</span> <span class="re1">$p</span><span class="sy0">-&gt;</span><span class="me1">DeleteProxy</span><span class="br0">&#40;</span><span class="re1">$p</span><span class="sy0">-&gt;</span><span class="me1">ThisProxy</span><span class="sy0">-&gt;</span><span class="me1">proxy_ip</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//you could loop back until you get a 0-error proxy, but that ain&#39;t the point</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//give me the content</span></div>
</li>
<li class="li1">
<div class="de1"><span class="kw3">echo</span> <span class="re1">$p</span><span class="sy0">-&gt;</span><span class="me1">ThisProxy</span><span class="sy0">-&gt;</span><span class="me1">Content</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="kw2">Class</span> MyProxies <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw2">var</span> <span class="re1">$Proxies</span> <span class="sy0">=</span> <span class="kw3">array</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw2">var</span> <span class="re1">$ThisProxy</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw2">var</span> <span class="re1">$MyCount</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//picks a random proxy from the database</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw2">function</span> RandomProxy<span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="kw3">global</span> <span class="re1">$serpdb</span><span class="sy0">;</span> </div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="re1">$offset_result</span> <span class="sy0">=</span> &nbsp;<span class="re1">$serpdb</span><span class="sy0">-&gt;</span><span class="me1">query</span><span class="br0">&#40;</span><span class="st0">&quot;SELECT FLOOR(RAND() * COUNT(*)) AS `offset` FROM `serp_proxies`&quot;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="re1">$offset_row</span> <span class="sy0">=</span> <span class="kw3">mysql_fetch_object</span><span class="br0">&#40;</span><span class="re1">$offset_result</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="re1">$offset</span> <span class="sy0">=</span> <span class="re1">$offset_row</span><span class="sy0">-&gt;</span><span class="me1">offset</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="re1">$result</span> <span class="sy0">=</span> <span class="re1">$serpdb</span><span class="sy0">-&gt;</span><span class="me1">query</span><span class="br0">&#40;</span><span class="st0">&quot;SELECT * FROM `serp_proxies` LIMIT $offset, 1&quot;</span> <span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="kw1">while</span><span class="br0">&#40;</span><span class="re1">$row</span><span class="sy0">=</span><span class="kw3">mysql_fetch_assoc</span><span class="br0">&#40;</span><span class="re1">$result</span><span class="br0">&#41;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//make instance of Proxy, with proxy_host ip and port</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">ThisProxy</span> <span class="sy0">=</span> <span class="kw2">new</span> Proxy<span class="br0">&#40;</span><span class="re1">$row</span><span class="br0">&#91;</span><span class="st0">&#39;ip&#39;</span><span class="br0">&#93;</span><span class="sy0">.</span><span class="st0">&#39;:&#39;</span><span class="sy0">.</span><span class="re1">$row</span><span class="br0">&#91;</span><span class="st0">&#39;port&#39;</span><span class="br0">&#93;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">ThisProxy</span><span class="sy0">-&gt;</span><span class="me1">proxy_ip</span> <span class="sy0">=</span> <span class="re1">$row</span><span class="br0">&#91;</span><span class="st0">&#39;ip&#39;</span><span class="br0">&#93;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">ThisProxy</span><span class="sy0">-&gt;</span><span class="me1">proxy_port</span> <span class="sy0">=</span> <span class="re1">$row</span><span class="br0">&#91;</span><span class="st0">&#39;port&#39;</span><span class="br0">&#93;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="kw1">break</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//visit the famous russian site </span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw2">function</span> GetSomeAir<span class="br0">&#40;</span><span class="re1">$pages</span><span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="kw1">for</span><span class="br0">&#40;</span><span class="re1">$index</span><span class="sy0">=</span><span class="nu0">0</span><span class="sy0">;</span> <span class="re1">$index</span><span class="sy0">&lt;</span> <span class="re1">$pages</span><span class="sy0">;</span> <span class="re1">$index</span><span class="sy0">++</span><span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="re1">$pageno</span> <span class="sy0">=</span> <span class="kw3">sprintf</span><span class="br0">&#40;</span><span class="st0">&quot;%02d&quot;</span><span class="sy0">,</span><span class="re1">$index</span><span class="nu0">+1</span><span class="br0">&#41;</span><span class="sy0">;</span> </div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="re1">$page_url</span> <span class="sy0">=</span> <span class="st0">&quot;http://www.samair.ru/proxy/proxy-&quot;</span> <span class="sy0">.</span> <span class="re1">$pageno</span> <span class="sy0">.</span> <span class="st0">&quot;.htm&quot;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="re1">$page_html</span> <span class="sy0">=</span> <span class="sy0">@</span><span class="kw3">file_get_contents</span><span class="br0">&#40;</span><span class="re1">$page_url</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//get rid of the crap and extract the proxies</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw3">preg_match</span><span class="br0">&#40;</span><span class="st0">&quot;/&lt;tr&gt;&lt;td&gt;(.*)&lt;<span class="es0">\/</span>td&gt;&lt;<span class="es0">\/</span>tr&gt;/&quot;</span><span class="sy0">,</span> <span class="re1">$page_html</span><span class="sy0">,</span> <span class="re1">$matches</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="re1">$txt</span> <span class="sy0">=</span> <span class="re1">$matches</span><span class="br0">&#91;</span><span class="nu0">1</span><span class="br0">&#93;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="re1">$main</span> <span class="sy0">=</span> <span class="kw3">explode</span><span class="br0">&#40;</span><span class="st0">&#39;&lt;/td&gt;&lt;tr&gt;&lt;td&gt;&#39;</span><span class="sy0">,</span> <span class="re1">$txt</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw1">for</span><span class="br0">&#40;</span><span class="re1">$x</span><span class="sy0">=</span><span class="nu0">0</span><span class="sy0">;</span><span class="re1">$x</span><span class="sy0">&lt;</span>count <span class="br0">&#40;</span><span class="re1">$main</span><span class="br0">&#41;</span><span class="sy0">;</span><span class="re1">$x</span><span class="sy0">++</span><span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp;<span class="re1">$arr</span> <span class="sy0">=</span> <span class="kw3">explode</span><span class="br0">&#40;</span><span class="st0">&#39;&lt;/td&gt;&lt;td&gt;&#39;</span><span class="sy0">,</span> <span class="re1">$main</span><span class="br0">&#91;</span><span class="re1">$x</span><span class="br0">&#93;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp;<span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">Proxies</span><span class="br0">&#91;</span><span class="br0">&#93;</span> <span class="sy0">=</span> <span class="kw3">explode</span><span class="br0">&#40;</span><span class="st0">&#39;:&#39;</span><span class="sy0">,</span> <span class="re1">$arr</span><span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//store the retrieved proxies (stored in this-&gt;Proxies) in the database</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw2">function</span> store2database<span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="kw3">global</span> <span class="re1">$serpdb</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="kw1">foreach</span><span class="br0">&#40;</span><span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">Proxies</span> <span class="kw1">as</span> <span class="re1">$p</span><span class="br0">&#41;</span> <span class="br0">&#123;</span> </div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="re1">$result</span> <span class="sy0">=</span> <span class="re1">$serpdb</span><span class="sy0">-&gt;</span><span class="me1">query</span><span class="br0">&#40;</span><span class="st0">&quot;SELECT * FROM serp_proxies WHERE ip=&#39;&quot;</span><span class="sy0">.</span><span class="re1">$p</span><span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span><span class="sy0">.</span><span class="st0">&quot;&#39;&quot;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="kw1">if</span><span class="br0">&#40;</span><span class="kw3">mysql_num_rows</span><span class="br0">&#40;</span><span class="re1">$result</span><span class="br0">&#41;</span><span class="sy0">&lt;</span><span class="nu0">1</span><span class="br0">&#41;</span> <span class="re1">$serpdb</span><span class="sy0">-&gt;</span><span class="me1">query</span><span class="br0">&#40;</span><span class="st0">&quot;INSERT INTO serp_proxies (`ip`, `port`) VALUES (&#39;&quot;</span><span class="sy0">.</span><span class="re1">$p</span><span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span><span class="sy0">.</span><span class="st0">&quot;&#39;, &#39;&quot;</span><span class="sy0">.</span><span class="re1">$p</span><span class="br0">&#91;</span><span class="nu0">1</span><span class="br0">&#93;</span><span class="sy0">.</span><span class="st0">&quot;&#39;)&quot;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="re1">$serpdb</span><span class="sy0">-&gt;</span><span class="me1">query</span><span class="br0">&#40;</span><span class="st0">&quot;DELETE FROM serp_proxies WHERE `ip`=&#39;&#39;&quot;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw2">function</span> DeleteProxy<span class="br0">&#40;</span><span class="re1">$ip</span><span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="kw3">global</span> <span class="re1">$serpdb</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="re1">$serpdb</span><span class="sy0">-&gt;</span><span class="me1">query</span><span class="br0">&#40;</span><span class="st0">&quot;DELETE FROM serp_proxies WHERE `ip`=&#39;&quot;</span><span class="sy0">.</span><span class="re1">$ip</span><span class="sy0">.</span><span class="st0">&quot;&#39;&quot;</span><span class="br0">&#41;</span><span class="sy0">;</span> &nbsp; </div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw2">function</span> GetCount<span class="br0">&#40;</span><span class="br0">&#41;</span> </div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//use this to check how many proxies there are in the database</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="kw3">global</span> <span class="re1">$serpdb</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">MyCount</span> <span class="sy0">=</span> <span class="kw3">mysql_num_rows</span><span class="br0">&#40;</span><span class="re1">$serpdb</span><span class="sy0">-&gt;</span><span class="me1">query</span><span class="br0">&#40;</span><span class="st0">&quot;SELECT * FROM `serp_proxies`&quot;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="kw1">return</span> <span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">MyCount</span><span class="sy0">;</span> </div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="kw2">Class</span> Proxy <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw2">var</span> <span class="re1">$proxy_ip</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw2">var</span> <span class="re1">$proxy_port</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw2">var</span> <span class="re1">$proxy_host</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw2">var</span> <span class="re1">$proxy_auth</span><span class="sy0">;</span> </div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw2">var</span> <span class="re1">$ch</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw2">var</span> <span class="re1">$Content</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw2">var</span> <span class="re1">$USERAGENT</span> <span class="sy0">=</span> <span class="st0">&quot;Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)&quot;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw2">var</span> <span class="re1">$ProxyError</span> <span class="sy0">=</span> <span class="nu0">0</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw2">var</span> <span class="re1">$ProxyErrorMsg</span> <span class="sy0">=</span> <span class="st0">&#39;&#39;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw2">var</span> <span class="re1">$TimeOut</span><span class="sy0">=</span><span class="nu0">3</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw2">var</span> <span class="re1">$IncludeHeaders</span> <span class="sy0">=</span> <span class="nu0">0</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw2">function</span> Proxy<span class="br0">&#40;</span><span class="re1">$host</span><span class="sy0">,</span> <span class="re1">$username</span><span class="sy0">=</span><span class="st0">&#39;&#39;</span><span class="sy0">,</span> <span class="re1">$pwd</span><span class="sy0">=</span><span class="st0">&#39;&#39;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//initialize class, set host </span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">proxy_host</span> <span class="sy0">=</span> <span class="re1">$host</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">if</span> <span class="br0">&#40;</span><span class="kw3">strlen</span><span class="br0">&#40;</span><span class="re1">$username</span><span class="br0">&#41;</span> <span class="sy0">&gt;</span> <span class="nu0">0</span> <span class="sy0">||</span> <span class="kw3">strlen</span><span class="br0">&#40;</span><span class="re1">$pwd</span><span class="br0">&#41;</span> <span class="sy0">&gt;</span> <span class="nu0">0</span><span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">proxy_auth</span> <span class="sy0">=</span> <span class="re1">$username</span><span class="sy0">.</span><span class="st0">&quot;:&quot;</span><span class="sy0">.</span><span class="re1">$pwd</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw2">function</span> CURL_PROXY<span class="br0">&#40;</span><span class="re1">$cc</span><span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="kw1">if</span> <span class="br0">&#40;</span><span class="kw3">strlen</span><span class="br0">&#40;</span><span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">proxy_host</span><span class="br0">&#41;</span> <span class="sy0">&gt;</span> <span class="nu0">0</span><span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; curl_setopt<span class="br0">&#40;</span><span class="re1">$cc</span><span class="sy0">,</span> CURLOPT_PROXY<span class="sy0">,</span> <span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">proxy_host</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span><span class="kw3">strlen</span><span class="br0">&#40;</span><span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">proxy_auth</span><span class="br0">&#41;</span> <span class="sy0">&gt;</span> <span class="nu0">0</span><span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp;curl_setopt<span class="br0">&#40;</span><span class="re1">$cc</span><span class="sy0">,</span> CURLOPT_PROXYUSERPWD<span class="sy0">,</span> <span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">proxy_auth</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw2">function</span> DoRequest<span class="br0">&#40;</span><span class="re1">$url</span><span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">ch</span> <span class="sy0">=</span> curl_init<span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; curl_setopt<span class="br0">&#40;</span><span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">ch</span><span class="sy0">,</span> CURLOPT_URL<span class="sy0">,</span><span class="re1">$url</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">CURL_PROXY</span><span class="br0">&#40;</span><span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">ch</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; curl_setopt<span class="br0">&#40;</span><span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">ch</span><span class="sy0">,</span> CURLOPT_HEADER<span class="sy0">,</span> <span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">IncludeHeaders</span><span class="br0">&#41;</span><span class="sy0">;</span> <span class="co1">// baca header</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; </div>
</li>
<li class="li1">
<div class="de1">&nbsp; curl_setopt<span class="br0">&#40;</span><span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">ch</span><span class="sy0">,</span> CURLOPT_USERAGENT<span class="sy0">,</span> <span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">USERAGENT</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; curl_setopt<span class="br0">&#40;</span><span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">ch</span><span class="sy0">,</span> CURLOPT_RETURNTRANSFER<span class="sy0">,</span> <span class="nu0">1</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; curl_setopt<span class="br0">&#40;</span><span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">ch</span><span class="sy0">,</span> CURLOPT_TIMEOUT<span class="sy0">,</span> <span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">TimeOut</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp;<span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">Content</span> <span class="sy0">=</span> curl_exec<span class="br0">&#40;</span><span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">ch</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//if an error occurs, store the number and message</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>curl_errno<span class="br0">&#40;</span><span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">ch</span><span class="br0">&#41;</span><span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="br0">&#123;</span> </div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">ProxyError</span> <span class="sy0">=</span> &nbsp;curl_errno<span class="br0">&#40;</span><span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">ch</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">ProxyErrorMsg</span> <span class="sy0">=</span> &nbsp;curl_error<span class="br0">&#40;</span><span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">ch</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="br0">&#125;</span></div>
</li>
</ol>
</div>
<p>There is not much to say about it, just a rough outline. I would prefer elite level 1 proxies but for now it will have to do.</p>
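<p>A minimal sketch of how the proxy list could be validated before it reaches the INSERT above. It assumes the raw list is one ip:port pair per line, which is not shown in the listing; the function name parse_proxy_list is made up for illustration. Filtering up front might make the cleanup query for empty ip values less necessary.</p>

```php
<?php
// Hypothetical helper: split "ip:port" lines into validated [ip, port] pairs.
// The input format (one proxy per line) is an assumption, not the original code.
function parse_proxy_list(string $text): array
{
    $proxies = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $parts = explode(':', trim($line));
        if (count($parts) !== 2) {
            continue; // skip blank or malformed lines
        }
        [$ip, $port] = $parts;
        // keep only valid IPv4 addresses and ports in the range 1..65535
        $validIp = filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false;
        $validPort = ctype_digit($port) && (int)$port >= 1 && (int)$port <= 65535;
        if ($validIp && $validPort) {
            $proxies[$ip . ':' . $port] = [$ip, (int)$port]; // the key de-duplicates
        }
    }
    return array_values($proxies);
}

$list = "10.0.0.1:8080\n\nnot-a-proxy\n10.0.0.1:8080\n192.168.1.5:99999\n172.16.0.9:3128";
print_r(parse_proxy_list($list));
```

<p>Only the two well-formed, distinct entries survive; the duplicate, the blank line, the junk line and the out-of-range port are dropped.</p>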
]]></content:encoded>
			<wfw:commentRss>http://www.juust.org/index.php/icanhazproxies/2009/02/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>RedHat Seo : scraper auto-blogging</title>
		<link>http://www.juust.org/index.php/redhat-seo-christmas-edition/2008/12/</link>
		<comments>http://www.juust.org/index.php/redhat-seo-christmas-edition/2008/12/#comments</comments>
		<pubDate>Fri, 26 Dec 2008 18:07:01 +0000</pubDate>
		<dc:creator>juust</dc:creator>
				<category><![CDATA[google]]></category>
		<category><![CDATA[seo]]></category>
		<category><![CDATA[seo tips and tricks]]></category>
		<category><![CDATA[tool]]></category>
		<category><![CDATA[wordpress]]></category>
		<category><![CDATA[xml-rpc]]></category>
		<category><![CDATA[scrape]]></category>

		<guid isPermaLink="false">http://www.juust.org/?p=270</guid>
		<description><![CDATA[Just give us your endpoint and we&#8217;ll take it from there, sparky! I was going to make one of these tools to scrape google and conjure a full blog out of nowhere, as Christmas special, RedHat Seo. The rough sketch &#8230; <a href="http://www.juust.org/index.php/redhat-seo-christmas-edition/2008/12/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<blockquote><p>Just give us your endpoint and we&#8217;ll take it from there, sparky!</p></blockquote>
<p>I was going to make one of these tools to scrape Google and conjure a full blog out of nowhere, as a Christmas special: RedHat Seo. The rough sketch has arrived, far from perfect, but it does produce a blog and it doesn&#8217;t even look too shabby. I scraped a <a href="" rel="nofollow" target="_blank">small batch</a> of posts off blogs, keeping the links intact and adding a tribute link. I hope they will pardon me for it.</p>
<h3>structure</h3>
<p>I use three main classes:</p>
<table>
<tbody>
<tr>
<td>BlogMaker    </td>
<td>     the application</td>
</tr>
<tr>
<td>Target         </td>
<td>     the blogs you aim for</td>
</tr>
<tr>
<td>WPContent   </td>
<td>     the scraped goodies</td>
</tr>
</tbody>
</table>
<p>&#8230;and two support classes</p>
<table>
<tbody>
<tr>
<td>SerpResult    </td>
<td>    scraped urls</td>
</tr>
<tr>
<td>Custom_RPC   </td>
<td>    a simple rpc-poster</td>
</tr>
</tbody>
</table>
<p>Each target blog has three text files:</p>
<table>
<tbody>
<tr>
<td>file</td>
<td>contents</td>
<td>maintenance</td>
</tr>
<tr>
<td>blog categories</td>
<td>category you post under</td>
<td>manual</td>
</tr>
<tr>
<td>blog tags</td>
<td> tags you list on the blog</td>
<td>manual</td>
</tr>
<tr>
<td>blog urls</td>
<td> urls already used for the blog</td>
<td>system</td>
</tr>
</tbody>
</table>
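<p>The urls file is the only one the system maintains itself. A minimal sketch of how such a file could be checked and appended to, assuming one url per line; url_is_registered and register_url are hypothetical stand-ins for the checkURL and RegisterURL methods that appear in the usage listing, not the original implementation.</p>

```php
<?php
// Hypothetical stand-ins for checkURL / RegisterURL.
// The one-url-per-line file format is an assumption.
function url_is_registered(string $file, string $url): bool
{
    if (!is_file($file)) {
        return false; // nothing registered yet
    }
    $known = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    return in_array(trim($url), $known, true);
}

function register_url(string $file, string $url): void
{
    if (!url_is_registered($file, $url)) {
        // append with an exclusive lock so parallel runs do not interleave lines
        file_put_contents($file, trim($url) . "\n", FILE_APPEND | LOCK_EX);
    }
}

$file = tempnam(sys_get_temp_dir(), 'urls');
register_url($file, 'http://example.com/post-1');
register_url($file, 'http://example.com/post-1'); // ignored, already present
var_dump(url_is_registered($file, 'http://example.com/post-1')); // bool(true)
var_dump(url_is_registered($file, 'http://example.com/post-2')); // bool(false)
unlink($file);
```

<p>A flat file is fine for a few thousand urls; beyond that a database table with a unique index would probably be the better fit.</p>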
<h3>routine</h3>
<p>The BlogMaker class grabs a result list (up to 1000 urls per phrase) from Google, extracts the urls and stores them in SerpResult, scrapes those urls and extracts the <strong>entry</strong> divs, stores the extracted divs in the WPContent class (which has some basic functions to sanitize the text), and uses the Target definitions to post them to the blogs over xml-rpc.</p>
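<p>The div extraction step can be sketched roughly as follows. This is not the code behind get_entries; it uses DOMDocument and DOMXPath as one possible approach, the function name extract_divs is made up, and the class name is assumed to be a plain token without quotes.</p>

```php
<?php
// Sketch: collect the inner HTML of every <div> whose class list contains $class.
// DOMDocument/DOMXPath stand in for whatever the original get_entries() does.
function extract_divs(string $html, string $class): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // real-world blog markup is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // match "entry" as a whole word inside the class attribute
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    $found = [];
    foreach ($xpath->query($query) as $div) {
        $inner = '';
        foreach ($div->childNodes as $child) {
            $inner .= $doc->saveHTML($child);
        }
        $found[] = trim($inner);
    }
    return $found;
}

$page = '<html><body><div class="post"><h2>Title</h2>'
      . '<div class="entry"><p>Hello world</p></div></div></body></html>';

$entries = extract_divs($page, 'entry');
if (count($entries) < 1) {
    $entries = extract_divs($page, 'post'); // fall back to "post" when "entry" is absent
}
print_r($entries);
```

<p>A DOM parser copes with nested divs, which a plain string search or regex tends to get wrong. Note that loadHTML assumes ISO-8859-1 unless the page declares its charset, so UTF-8 pages may need extra care.</p>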
<h3>usage</h3>
<div class="geshi no php">
<ol>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//make main instance</span></div>
</li>
<li class="li1">
<div class="de1"><span class="re1">$Blog</span> <span class="sy0">=</span> <span class="kw2">new</span> BlogMaker<span class="br0">&#40;</span><span class="st0">&quot;keyword&quot;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//define a target blog, you can define multiple blogs and refer with code</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//then add rpc-url, password and user</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//and for every target blog three text-files </span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="re1">$T</span><span class="sy0">=</span><span class="re1">$Blog</span><span class="sy0">-&gt;</span><span class="me1">AddTarget</span><span class="br0">&#40;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="st0">&#39;blogcode&#39;</span><span class="sy0">,</span> </div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="st0">&#39;http://my.blog.com/xmlrpc.php&#39;</span><span class="sy0">,</span> </div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="st0">&#39;password&#39;</span><span class="sy0">,</span> </div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="st0">&#39;user&#39;</span><span class="sy0">,</span> </div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="st0">&#39;keyword.categories.txt&#39;</span><span class="sy0">,</span> </div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="st0">&#39;keyword.tags.txt&#39;</span><span class="sy0">,</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="st0">&#39;keyword.urls.txt&#39;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//read the tags, cats and url text files stored on the server </span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//all retrieved urls are tested, if the target blog already has that</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//scraped url, it is discarded.</span></div>
</li>
<li class="li1">
<div class="de1"><span class="re1">$T</span><span class="sy0">-&gt;</span><span class="me1">CSV_GetTags</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="re1">$T</span><span class="sy0">-&gt;</span><span class="me1">List_GetCats</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="re1">$T</span><span class="sy0">-&gt;</span><span class="me1">ReadURL</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//grab the google result list</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//use params (pages, keywords) to specify search</span></div>
</li>
<li class="li1">
<div class="de1"><span class="re1">$Blog</span><span class="sy0">-&gt;</span><span class="me1">GoogleResults</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="re1">$a</span><span class="sy0">=</span><span class="nu0">0</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="kw1">foreach</span><span class="br0">&#40;</span><span class="re1">$Blog</span><span class="sy0">-&gt;</span><span class="me1">Results</span> <span class="kw1">as</span> <span class="re1">$BlogUrl</span><span class="br0">&#41;</span> <span class="br0">&#123;</span> </div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="re1">$a</span><span class="sy0">++;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="kw3">echo</span> <span class="re1">$BlogUrl</span><span class="sy0">-&gt;</span><span class="me1">url</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//see if the url isnt used yet</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw1">if</span><span class="br0">&#40;</span><span class="re1">$T</span><span class="sy0">-&gt;</span><span class="me1">checkURL</span><span class="br0">&#40;</span><span class="kw3">trim</span><span class="br0">&#40;</span><span class="re1">$BlogUrl</span><span class="sy0">-&gt;</span><span class="me1">url</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">!=</span><span class="kw2">true</span><span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="kw3">echo</span> <span class="st0">&#39;&#8230;checking &#39;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="kw3">flush</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//if not used, get the source</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="re1">$BlogUrl</span><span class="sy0">-&gt;</span><span class="me1">scrape</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//check for divs marked &quot;entry&quot;, if they arent there, check &quot;post&quot;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//some blogs use other indications for the content</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//but entry and post cover 40%</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="re1">$entries</span> <span class="sy0">=</span> <span class="re1">$BlogUrl</span><span class="sy0">-&gt;</span><span class="me1">get_entries</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="kw1">if</span><span class="br0">&#40;</span><span class="kw3">count</span><span class="br0">&#40;</span><span class="re1">$entries</span><span class="br0">&#41;</span><span class="sy0">&amp;</span>lt<span class="sy0">;</span><span class="nu0">1</span><span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw3">echo</span> <span class="st0">&#39;no entries&#8230;&#39;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw3">flush</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="re1">$entries</span> <span class="sy0">=</span> <span class="re1">$BlogUrl</span><span class="sy0">-&gt;</span><span class="me1">get_posts</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp;<span class="kw1">if</span><span class="br0">&#40;</span><span class="kw3">count</span><span class="br0">&#40;</span><span class="re1">$entries</span><span class="br0">&#41;</span><span class="sy0">&amp;</span>lt<span class="sy0">;</span><span class="nu0">1</span><span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; <span class="kw3">echo</span> <span class="st0">&#39;no posts either&#8230;&#39;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//if no entry-post div, mark url as done</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; <span class="re1">$T</span><span class="sy0">-&gt;</span><span class="me1">RegisterURL</span><span class="br0">&#40;</span><span class="re1">$BlogUrl</span><span class="sy0">-&gt;</span><span class="me1">url</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp;<span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="re1">$ct</span><span class="sy0">=</span><span class="nu0">0</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="kw1">foreach</span><span class="br0">&#40;</span><span class="re1">$BlogUrl</span><span class="sy0">-&gt;</span><span class="me1">WpContentPieces</span> <span class="kw1">as</span> <span class="re1">$WpContent</span><span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//in the get_entries/get_post function the fragments are stored</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//as wpcontent</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="re1">$ct</span><span class="sy0">++;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw1">if</span><span class="br0">&#40;</span><span class="re1">$WpContent</span><span class="sy0">-&gt;</span><span class="me1">judge</span><span class="br0">&#40;</span><span class="nu0">2000</span><span class="sy0">,</span> <span class="nu0">200</span><span class="sy0">,</span> <span class="nu0">5</span><span class="br0">&#41;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp;<span class="re1">$WpContent</span><span class="sy0">-&gt;</span><span class="me1">tribute</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span> &nbsp;<span class="co1">//add tribute link</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp;<span class="re1">$T</span><span class="sy0">-&gt;</span><span class="me1">settags</span><span class="br0">&#40;</span><span class="re1">$WpContent</span><span class="sy0">-&gt;</span><span class="me1">divcontent</span><span class="br0">&#41;</span><span class="sy0">;</span> <span class="co1">//add tags</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp;<span class="re1">$T</span><span class="sy0">-&gt;</span><span class="me1">postCustomRPC</span><span class="br0">&#40;</span><span class="re1">$WpContent</span><span class="sy0">-&gt;</span><span class="me1">title</span><span class="sy0">,</span> <span class="re1">$WpContent</span><span class="sy0">-&gt;</span><span class="me1">divcontent</span><span class="sy0">,</span> <span class="nu0">1</span><span class="br0">&#41;</span><span class="sy0">;</span> <span class="co1">//1=publish, 0=draft</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp;<span class="re1">$T</span><span class="sy0">-&gt;</span><span class="me1">RegisterURL</span><span class="br0">&#40;</span><span class="re1">$WpContent</span><span class="sy0">-&gt;</span><span class="me1">url</span><span class="br0">&#41;</span><span class="sy0">;</span> &nbsp;<span class="co1">//register use of url</span></div>
</li>
<li class="li1">
<div class="de1"><span class="kw3">usleep</span><span class="br0">&#40;</span><span class="nu0">20000000</span><span class="br0">&#41;</span><span class="sy0">;</span> &nbsp;<span class="co1">//20 seconds break, for sitemapping</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="br0">&#125;</span></div>
</li>
</ol>
</div>
<h3>notes</h3>
<ul>
<li>xml-rpc needs to be activated explicitly on the wordpress dashboard under settings/writing.</li>
<li>categories must be present in the blog</li>
<li>url file must be writeable by the server (777)</li>
</ul>
<p>It seems WordPress builds the sitemap as a background process: the standard Google XML sitemap plugin will attempt to build it in the cache (which takes anywhere between 2 and 10 seconds), and apart from building a sitemap the posts also get pinged around. Giving the install 10 to 20 seconds between posts allows all the hooked-in functions to complete.</p>
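<p>The Custom_RPC class itself is not listed here. As a rough sketch of what a post over xml-rpc could look like, the request body for the standard metaWeblog.newPost method can be built by hand. The function name, parameter order and blog id 1 are assumptions for illustration, not the original class.</p>

```php
<?php
// Sketch: build the XML body for a metaWeblog.newPost call by hand.
// Function and parameter names are hypothetical, not the original Custom_RPC class.
function build_new_post_request(string $user, string $pass, string $title,
                                string $body, array $categories, bool $publish): string
{
    // escape text so markup in the post body cannot break the XML envelope
    $esc = function (string $s): string {
        return htmlspecialchars($s, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    };

    $cats = '';
    foreach ($categories as $cat) {
        $cats .= '<value><string>' . $esc($cat) . '</string></value>';
    }

    return '<?xml version="1.0" encoding="UTF-8"?>'
        . '<methodCall><methodName>metaWeblog.newPost</methodName><params>'
        . '<param><value><int>1</int></value></param>' // blog id, assumed to be 1
        . '<param><value><string>' . $esc($user) . '</string></value></param>'
        . '<param><value><string>' . $esc($pass) . '</string></value></param>'
        . '<param><value><struct>'
        . '<member><name>title</name><value><string>' . $esc($title) . '</string></value></member>'
        . '<member><name>description</name><value><string>' . $esc($body) . '</string></value></member>'
        . '<member><name>categories</name><value><array><data>' . $cats . '</data></array></value></member>'
        . '</struct></value></param>'
        . '<param><value><boolean>' . ($publish ? '1' : '0') . '</boolean></value></param>'
        . '</params></methodCall>';
}

$xml = build_new_post_request('user', 'password', 'A title', '<p>Body & more</p>', ['news'], false);
echo $xml, "\n";

// Sending it (not executed here): POST $xml with Content-Type: text/xml to the
// blog's xmlrpc.php, then wait 10 to 20 seconds before the next post.
```

<p>The categories listed in the request have to exist on the blog already, as mentioned in the notes.</p>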
<h3>period</h3>
<p>That&#8217;s about all,<br />
consider it gpl, I added some comments in the source but I will not develop this any further. A mysql backed blogfarm tool (euphemistically called &#8216;publishing tool&#8217;) is more interesting, besides, I am off to the wharves to do some painting.</p>
<p>if you use it, send some feedback,<br />
merry christmas dogheads</p>
]]></content:encoded>
			<wfw:commentRss>http://www.juust.org/index.php/redhat-seo-christmas-edition/2008/12/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>google trends III</title>
		<link>http://www.juust.org/index.php/google-trends-iii/2008/12/</link>
		<comments>http://www.juust.org/index.php/google-trends-iii/2008/12/#comments</comments>
		<pubDate>Wed, 24 Dec 2008 23:53:14 +0000</pubDate>
		<dc:creator>juust</dc:creator>
				<category><![CDATA[google]]></category>
		<category><![CDATA[scrape]]></category>
		<category><![CDATA[trends]]></category>

		<guid isPermaLink="false">http://www.juust.org/?p=271</guid>
		<description><![CDATA[How to get the urls and snippets from the Google Trends details page. The news articles on the details page are listed with an &#8216;Ajax&#8217; call, they are not sent to the browser in the html source. No easy way &#8230; <a href="http://www.juust.org/index.php/google-trends-iii/2008/12/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>How to get the urls and snippets from the Google Trends details page. The news articles on the details page are listed with an &#8216;Ajax&#8217; call, they are not sent to the browser in the html source. No easy way to scrape that. </p>
<p>The blog articles are pretty straightforward. First, the ugly fast way:</p>
<div class="geshi no php">
<ol>
<li class="li1">
<div class="de1"><span class="re1">$mytitle</span><span class="sy0">=</span><span class="st0">&#39;manuel benitez&#39;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="re1">$mydate</span><span class="sy0">=</span><span class="st0">&#39;&#39;</span><span class="sy0">;</span> <span class="co1">//2008-12-24</span></div>
</li>
<li class="li1">
<div class="de1"><span class="re1">$html</span><span class="sy0">=</span><span class="kw3">file_get_contents</span><span class="br0">&#40;</span><span class="st0">&#39;http://www.google.com/trends/hottrends?q=&#39;</span><span class="sy0">.</span><span class="kw3">urlencode</span><span class="br0">&#40;</span><span class="re1">$mytitle</span><span class="br0">&#41;</span><span class="sy0">.</span><span class="st0">&#39;&amp;date=&#39;</span><span class="sy0">.</span><span class="kw3">urlencode</span><span class="br0">&#40;</span><span class="re1">$mydate</span><span class="br0">&#41;</span><span class="sy0">.</span><span class="st0">&#39;&amp;sa=X&#39;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="re1">$start</span> <span class="sy0">=</span> <span class="kw3">strpos</span><span class="br0">&#40;</span><span class="re1">$html</span><span class="sy0">,</span> <span class="st0">&#39;&lt;div class=&quot;gsc-resultsbox-visible&quot;&gt;&#39;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="re1">$end</span> <span class="sy0">=</span> <span class="kw3">strpos</span><span class="br0">&#40;</span><span class="re1">$html</span><span class="sy0">,</span> <span class="st0">&#39;&lt;div class=&quot;gsc-trailing-more-results&quot;&gt;&#39;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="re1">$content</span> <span class="sy0">=</span> <span class="kw3">substr</span><span class="br0">&#40;</span><span class="re1">$html</span><span class="sy0">,</span> <span class="re1">$start</span><span class="sy0">,</span> <span class="re1">$end</span><span class="sy0">-</span><span class="re1">$start</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="kw3">echo</span> <span class="re1">$content</span><span class="sy0">;</span></div>
</li>
</ol>
</div>
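<p>One caveat with the quick cut: strpos returns false when a marker is not in the page, and the substr call then cuts at the wrong place without any warning. A minimal sketch of a guarded version follows. The helper name cut_between and the sample markup are my own assumptions for illustration, they are not part of the original script.</p>

```php
<?php
// Hypothetical helper (not in the original script): returns the text from
// $startMarker up to, but not including, $endMarker, or null when either
// marker is missing.
function cut_between($html, $startMarker, $endMarker)
{
    $start = strpos($html, $startMarker);
    if ($start === false) {
        return null;
    }
    // Look for the end marker only after the start marker.
    $end = strpos($html, $endMarker, $start);
    if ($end === false) {
        return null;
    }
    return substr($html, $start, $end - $start);
}

// Made-up sample input standing in for the downloaded page.
$sample = '<p>intro</p>'
    . '<div class="gsc-resultsbox-visible"><div class="gs-title">A title</div></div>'
    . '<div class="gsc-trailing-more-results">more</div>';

$content = cut_between(
    $sample,
    '<div class="gsc-resultsbox-visible">',
    '<div class="gsc-trailing-more-results">'
);

echo $content === null ? 'markers not found' : $content;
```

<p>With the real page you would pass the result of file_get_contents instead of $sample, and skip the output when the helper returns null.</p>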
<p>The quick cut returns the blog snippets with Google&#8217;s own markup still around them, which is ugly. The other way is regular expression pattern matching: you can grab the divs that each content item has, marked with</p>
<ul>
<li>div class=&quot;gs-title&quot;</li>
<li>div class=&quot;gs-relativePublishedDate&quot;</li>
<li>div class=&quot;gs-snippet&quot;</li>
<li>div class=&quot;gs-visibleUrl&quot;</li>
</ul>
<p>from the html source and organize them as an array of &#8220;Content&#8221; objects, after which you can list the content items with your own markup or store them in a database.</p>
<div class="geshi no php">
<ol>
<li class="li1">
<div class="de1"><span class="co1">//I assume $mytitle is taken from the $_GET array.</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//class &#39;Content&#39; with its members </span></div>
</li>
<li class="li1">
<div class="de1"><span class="kw2">Class</span> Content <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw2">var</span> <span class="re1">$id</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw2">var</span> <span class="re1">$title</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw2">var</span> <span class="re1">$pubdate</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw2">var</span> <span class="re1">$snippet</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw2">var</span> <span class="re1">$url</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw2">public</span> <span class="kw2">function</span> __construct<span class="br0">&#40;</span><span class="re1">$id</span><span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="re1">$this</span><span class="sy0">-&gt;</span><span class="me1">id</span><span class="sy0">=</span><span class="re1">$id</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//grab the source from the google page</span></div>
</li>
<li class="li1">
<div class="de1"><span class="re1">$html</span><span class="sy0">=</span><span class="kw3">file_get_contents</span><span class="br0">&#40;</span><span class="st0">&#39;http://www.google.com/trends/hottrends?q=&#39;</span><span class="sy0">.</span><span class="kw3">urlencode</span><span class="br0">&#40;</span><span class="re1">$mytitle</span><span class="br0">&#41;</span><span class="sy0">.</span><span class="st0">&#39;&amp;date=&amp;sa=X&#39;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//cut out the part I want</span></div>
</li>
<li class="li1">
<div class="de1"><span class="re1">$start</span> <span class="sy0">=</span> <span class="kw3">strpos</span><span class="br0">&#40;</span><span class="re1">$html</span><span class="sy0">,</span> <span class="st0">&#39;&lt;div class=&quot;gsc-resultsbox-visible&quot;&gt;&#39;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="re1">$end</span> <span class="sy0">=</span> <span class="kw3">strpos</span><span class="br0">&#40;</span><span class="re1">$html</span><span class="sy0">,</span> <span class="st0">&#39;&lt;div class=&quot;gsc-trailing-more-results&quot;&gt;&#39;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="re1">$content</span> <span class="sy0">=</span> <span class="kw3">substr</span><span class="br0">&#40;</span><span class="re1">$html</span><span class="sy0">,</span> <span class="re1">$start</span><span class="sy0">,</span> <span class="re1">$end</span><span class="sy0">-</span><span class="re1">$start</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//grab the divs that contain title, publish date, snippet and url with regular pattern match</span></div>
</li>
<li class="li1">
<div class="de1"><span class="kw3">preg_match_all</span><span class="br0">&#40;</span><span class="st0">&#39;!&lt;div class=&quot;gs-title&quot;&gt;.*?&lt;/div&gt;!si&#39;</span><span class="sy0">,</span> <span class="re1">$html</span><span class="sy0">,</span> <span class="re1">$titles</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="kw3">preg_match_all</span><span class="br0">&#40;</span><span class="st0">&#39;!&lt;div class=&quot;gs-relativePublishedDate&quot;&gt;.*?&lt;/div&gt;!si&#39;</span><span class="sy0">,</span> <span class="re1">$html</span><span class="sy0">,</span> <span class="re1">$pubDates</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="kw3">preg_match_all</span><span class="br0">&#40;</span><span class="st0">&#39;!&lt;div class=&quot;gs-snippet&quot;&gt;.*?&lt;/div&gt;!si&#39;</span><span class="sy0">,</span> <span class="re1">$html</span><span class="sy0">,</span> <span class="re1">$snippets</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="kw3">preg_match_all</span><span class="br0">&#40;</span><span class="st0">&#39;!&lt;div class=&quot;gs-visibleUrl&quot;&gt;.*?&lt;/div&gt;!si&#39;</span><span class="sy0">,</span> <span class="re1">$html</span><span class="sy0">,</span> <span class="re1">$urls</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="re1">$Contents</span> <span class="sy0">=</span> <span class="kw3">array</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//organize them under Content;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="re1">$count</span><span class="sy0">=</span><span class="nu0">0</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="kw1">foreach</span><span class="br0">&#40;</span><span class="re1">$titles</span><span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span> <span class="kw1">as</span> <span class="re1">$title</span><span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//make a new instance of Content;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="re1">$Contents</span><span class="br0">&#91;</span><span class="br0">&#93;</span> <span class="sy0">=</span> <span class="kw2">new</span> Content<span class="br0">&#40;</span><span class="re1">$count</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//add title</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="re1">$Contents</span><span class="br0">&#91;</span><span class="re1">$count</span><span class="br0">&#93;</span><span class="sy0">-&gt;</span><span class="me1">title</span><span class="sy0">=</span><span class="re1">$title</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="re1">$count</span><span class="sy0">++;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="re1">$count</span><span class="sy0">=</span><span class="nu0">0</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="kw1">foreach</span><span class="br0">&#40;</span><span class="re1">$pubDates</span><span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span> <span class="kw1">as</span> <span class="re1">$pubDate</span><span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//add publishing date (contains some linebreak, remove it with strip_tags)</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="re1">$Contents</span><span class="br0">&#91;</span><span class="re1">$count</span><span class="br0">&#93;</span><span class="sy0">-&gt;</span><span class="me1">pubdate</span><span class="sy0">=</span><span class="kw3">strip_tags</span><span class="br0">&#40;</span><span class="re1">$pubDate</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="re1">$count</span><span class="sy0">++;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="re1">$count</span><span class="sy0">=</span><span class="nu0">0</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="kw1">foreach</span><span class="br0">&#40;</span><span class="re1">$snippets</span><span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span> <span class="kw1">as</span> <span class="re1">$snippet</span><span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//add snippet</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="re1">$Contents</span><span class="br0">&#91;</span><span class="re1">$count</span><span class="br0">&#93;</span><span class="sy0">-&gt;</span><span class="me1">snippet</span><span class="sy0">=</span><span class="re1">$snippet</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="re1">$count</span><span class="sy0">++;</span> </div>
</li>
<li class="li1">
<div class="de1"><span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="re1">$count</span><span class="sy0">=</span><span class="nu0">0</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="kw1">foreach</span><span class="br0">&#40;</span><span class="re1">$urls</span><span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span> <span class="kw1">as</span> <span class="re1">$url</span><span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//add display url</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="re1">$Contents</span><span class="br0">&#91;</span><span class="re1">$count</span><span class="br0">&#93;</span><span class="sy0">-&gt;</span><span class="me1">url</span><span class="sy0">=</span><span class="re1">$url</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="re1">$count</span><span class="sy0">++;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//leave $count as is, the number of content-items with a 0-base array</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//add rel=nofollow to links to prevent pagerank assignment to blogs</span></div>
</li>
<li class="li1">
<div class="de1"><span class="kw1">for</span><span class="br0">&#40;</span><span class="re1">$ct</span><span class="sy0">=</span><span class="nu0">0</span><span class="sy0">;</span><span class="re1">$ct</span><span class="sy0">&lt;</span> <span class="re1">$count</span><span class="sy0">;</span><span class="re1">$ct</span><span class="sy0">++</span><span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="re1">$Contents</span><span class="br0">&#91;</span><span class="re1">$ct</span><span class="br0">&#93;</span><span class="sy0">-&gt;</span><span class="me1">url</span> <span class="sy0">=</span> <span class="kw3">preg_replace</span><span class="br0">&#40;</span><span class="st0">&#39;/ target/&#39;</span><span class="sy0">,</span> <span class="st0">&#39; rel=&quot;nofollow&quot; target&#39;</span><span class="sy0">,</span> <span class="re1">$Contents</span><span class="br0">&#91;</span><span class="re1">$ct</span><span class="br0">&#93;</span><span class="sy0">-&gt;</span><span class="me1">url</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="re1">$Contents</span><span class="br0">&#91;</span><span class="re1">$ct</span><span class="br0">&#93;</span><span class="sy0">-&gt;</span><span class="me1">title</span> <span class="sy0">=</span> <span class="kw3">preg_replace</span><span class="br0">&#40;</span><span class="st0">&#39;/ target/&#39;</span><span class="sy0">,</span> <span class="st0">&#39; rel=&quot;nofollow&quot; target&#39;</span><span class="sy0">,</span> <span class="re1">$Contents</span><span class="br0">&#91;</span><span class="re1">$ct</span><span class="br0">&#93;</span><span class="sy0">-&gt;</span><span class="me1">title</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//it&#39;s complete, list all content-items with some markup</span></div>
</li>
<li class="li1">
<div class="de1"><span class="kw1">for</span><span class="br0">&#40;</span><span class="re1">$ct</span><span class="sy0">=</span><span class="nu0">0</span><span class="sy0">;</span><span class="re1">$ct</span><span class="sy0">&lt;</span> <span class="re1">$count</span><span class="sy0">;</span><span class="re1">$ct</span><span class="sy0">++</span><span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw3">echo</span> <span class="st0">&#39;&lt;h3&gt;&#39;</span><span class="sy0">.</span><span class="re1">$Contents</span><span class="br0">&#91;</span><span class="re1">$ct</span><span class="br0">&#93;</span><span class="sy0">-&gt;</span><span class="me1">title</span><span class="sy0">.</span><span class="st0">&#39;&#39;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw3">echo</span> <span class="st0">&#39;&lt;p&gt;&lt;strong&gt;&#39;</span><span class="sy0">.</span><span class="re1">$Contents</span><span class="br0">&#91;</span><span class="re1">$ct</span><span class="br0">&#93;</span><span class="sy0">-&gt;</span><span class="me1">pubdate</span><span class="sy0">.</span><span class="st0">&#39;&lt;/strong&gt;:&lt;em&gt;&#39;</span><span class="sy0">.</span><span class="re1">$Contents</span><span class="br0">&#91;</span><span class="re1">$ct</span><span class="br0">&#93;</span><span class="sy0">-&gt;</span><span class="me1">snippet</span><span class="sy0">.</span><span class="st0">&#39;&lt;/em&gt;&lt;/p&gt;&#39;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw3">echo</span> <span class="re1">$Contents</span><span class="br0">&#91;</span><span class="re1">$ct</span><span class="br0">&#93;</span><span class="sy0">-&gt;</span><span class="me1">url</span><span class="sy0">.</span><span class="st0">&#39;&lt;br /&gt;&#39;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="br0">&#125;</span></div>
</li>
</ol>
</div>
<p>It ain&#8217;t perfect, but it works. The highlighter I use gets a bit confused by the preg_match_all statements containing unclosed divs, so code copied from the blog may not work as is.</p>
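<p>If the pattern matching turns out too brittle, PHP&#8217;s DOM extension can pick out the same divs by class name. The sketch below is an alternative I am swapping in for illustration, not what the script above does. The sample markup is made up around the class names listed earlier, the helper name texts_by_class is hypothetical, and the real page may be structured differently.</p>

```php
<?php
// Made-up sample markup built around the class names from the post.
$sample = '<div class="gsc-resultsbox-visible">'
    . '<div class="gs-title"><a href="http://example.com/a" target="_blank">First post</a></div>'
    . '<div class="gs-relativePublishedDate">2 hours ago</div>'
    . '<div class="gs-snippet">First snippet</div>'
    . '<div class="gs-visibleUrl">example.com</div>'
    . '<div class="gs-title"><a href="http://example.org/b" target="_blank">Second post</a></div>'
    . '<div class="gs-relativePublishedDate">1 day ago</div>'
    . '<div class="gs-snippet">Second snippet</div>'
    . '<div class="gs-visibleUrl">example.org</div>'
    . '</div>';

// Hypothetical helper: trimmed text of every div whose class attribute is
// exactly $class (a div carrying several classes would not match).
function texts_by_class(DOMXPath $xpath, $class)
{
    $texts = array();
    foreach ($xpath->query('//div[@class="' . $class . '"]') as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

$doc = new DOMDocument();
// Scraped html is rarely valid, so collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($sample);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$titles   = texts_by_class($xpath, 'gs-title');
$pubDates = texts_by_class($xpath, 'gs-relativePublishedDate');
$snippets = texts_by_class($xpath, 'gs-snippet');
$urls     = texts_by_class($xpath, 'gs-visibleUrl');

// Group the four lists by position, one entry per content item.
$items = array();
foreach ($titles as $i => $title) {
    $items[] = array(
        'title'   => $title,
        'pubdate' => isset($pubDates[$i]) ? $pubDates[$i] : '',
        'snippet' => isset($snippets[$i]) ? $snippets[$i] : '',
        'url'     => isset($urls[$i]) ? $urls[$i] : '',
    );
}

print_r($items);
```

<p>This keeps only the text of each div, so the links inside the titles are dropped. If you want the links, read the href attribute from the anchor nodes and write your own markup, with rel=&quot;nofollow&quot; where you need it.</p>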
]]></content:encoded>
			<wfw:commentRss>http://www.juust.org/index.php/google-trends-iii/2008/12/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

