<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>juust ~ php oddities &#187; simplexml</title>
	<atom:link href="http://www.juust.org/index.php/tag/simplexml/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.juust.org</link>
	<description>Link theory and search engine optimization</description>
	<lastBuildDate>Thu, 19 Jan 2012 09:39:32 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>google suggest scraper (php &amp; simplexml)</title>
		<link>http://www.juust.org/index.php/google-suggest-scraper-php-simplexml/2011/12/</link>
		<comments>http://www.juust.org/index.php/google-suggest-scraper-php-simplexml/2011/12/#comments</comments>
		<pubDate>Mon, 19 Dec 2011 00:08:12 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[google]]></category>
		<category><![CDATA[seo tips and tricks]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[scrape]]></category>
		<category><![CDATA[simplexml]]></category>
		<category><![CDATA[tool]]></category>

		<guid isPermaLink="false">http://www.juust.org/?p=1834</guid>
		<description><![CDATA[Today&#8217;s goal is a basic php Google Suggest scraper because I wanted traffic data and keywords for free. Before we start : google scraping is bad ! Good People use the Google Adwords API : 25 cents for 1000 units, &#8230; <a href="http://www.juust.org/index.php/google-suggest-scraper-php-simplexml/2011/12/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Today&#8217;s goal is a basic PHP Google Suggest scraper, because I wanted keyword and traffic data for free.</p>
<p>Before we start:</p>
<h2 style="text-align: center;">google scraping is <strong>bad</strong> !</h2>
<p>Good people use the <a title="google adwords api" href="http://code.google.com/apis/adwords/">Google Adwords API</a>: 25 cents per 1000 API units, and 15 or more units per keyword suggestion, so they pay 4 or 5 dollars for 1000 keyword suggestions (plus a good programmer, if they can find one, who also costs a few dollars). Or they opt for <a title="search engine marketing data" href="http://www.semrush.com" rel="nofollow">SemRush</a> (also my preference), <a href="http://www.keywordspy.com" rel="nofollow">KeywordSpy</a>, <a href="http://www.spyfu.com" rel="nofollow">Spyfu</a> and other services like 7Search PPC programs to get keyword, traffic and competitor data, but these charge from about 80 dollars per month for a limited account up to a few hundred per month for SEO companies. Good people pay plenty.</p>
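<p>The 4-or-5-dollar estimate is simple arithmetic on the quoted rates. A small sketch (PHP 7 or later), assuming a flat 25 cents per 1000 units and reading "15++" as 15 or more units per suggestion; the 20-unit case is only an illustrative higher figure, not an AdWords number, and suggestionCost is a hypothetical helper:</p>

```php
<?php
// Back-of-the-envelope AdWords API cost, using the figures quoted above:
// $0.25 per 1000 API units and "15++" units per keyword suggestion.
function suggestionCost(int $suggestions, int $unitsPerSuggestion, float $dollarsPer1000Units): float
{
    $units = $suggestions * $unitsPerSuggestion;   // total API units consumed
    return $units / 1000 * $dollarsPer1000Units;   // convert units to dollars
}

// 1000 suggestions at 15 units each (the stated minimum) and at 20 units each
// (a hypothetical higher figure, since "15++" means "15 or more").
printf("%.2f\n", suggestionCost(1000, 15, 0.25)); // 3.75
printf("%.2f\n", suggestionCost(1000, 20, 0.25)); // 5.00
```

<p>So "4 or 5 dollars" corresponds to somewhere around 15 to 20 units per suggestion.</p>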
<p>We tiny grey webmice of marketing, however, just want a few estimates at low or, better, no cost, like these:</p>
<table width="276" border="0" cellspacing="0" cellpadding="0">
<colgroup>
<col width="206" />
<col width="70" /> </colgroup>
<tbody>
<tr>
<td width="206" height="20">data</td>
<td align="right" width="70">num queries</td>
</tr>
<tr>
<td width="206" height="20">google suggest</td>
<td align="right" width="70">57800000</td>
</tr>
<tr>
<td height="20">google suggestion box</td>
<td align="right">5390000</td>
</tr>
<tr>
<td height="20">google suggest api</td>
<td align="right">5030000</td>
</tr>
<tr>
<td height="20">google suggestion tool</td>
<td align="right">3670000</td>
</tr>
<tr>
<td height="20">google suggest a site</td>
<td align="right">72700000</td>
</tr>
<tr>
<td height="20">google suggested users</td>
<td align="right">57000000</td>
</tr>
<tr>
<td height="20">google suggestions funny</td>
<td align="right">37400000</td>
</tr>
<tr>
<td height="20"><strong>google suggest scraper</strong></td>
<td align="right">62800</td>
</tr>
<tr>
<td height="20">google suggestions not working</td>
<td align="right">87100000</td>
</tr>
<tr>
<td height="20">google suggested user list</td>
<td align="right">254000000</td>
</tr>
</tbody>
</table>
<p>The suggestion autocomplete is an AJAX service; queried with output=toolbar, it returns XML like this:</p>
<pre>&lt;?xml version="1.0"?&gt;
   &lt;toplevel&gt;
     &lt;CompleteSuggestion&gt;
       &lt;suggestion data="senior quotes"/&gt;
       &lt;num_queries int="30000000"/&gt;
     &lt;/CompleteSuggestion&gt;
     &lt;CompleteSuggestion&gt;
       &lt;suggestion data="senior skip day lyrics"/&gt;
       &lt;num_queries int="441000"/&gt;
     &lt;/CompleteSuggestion&gt;
   &lt;/toplevel&gt;</pre>
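<p>Because the response is plain XML, the parsing step can be tried offline on the sample above. A minimal sketch (PHP 7 or later; the function name parseSuggestions is made up for this example). It reads attributes with SimpleXML's array syntax, $c->suggestion['data'], which is equivalent to ->attributes()->data for un-namespaced attributes, and it tolerates a missing num_queries element:</p>

```php
<?php
// Parse Google Suggest "toolbar" XML (same shape as the sample above) into
// an array of term => traffic. Works offline on a string, no HTTP request.
function parseSuggestions(string $xml): array
{
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        return [];                                   // not well-formed XML
    }
    $result = [];
    foreach ($doc->CompleteSuggestion as $c) {
        $term = (string) $c->suggestion['data'];     // attribute via array access
        // num_queries is not always present, so fall back to null
        $result[$term] = isset($c->num_queries) ? (int) $c->num_queries['int'] : null;
    }
    return $result;
}

$sample = '<?xml version="1.0"?>
<toplevel>
  <CompleteSuggestion>
    <suggestion data="senior quotes"/>
    <num_queries int="30000000"/>
  </CompleteSuggestion>
  <CompleteSuggestion>
    <suggestion data="senior skip day lyrics"/>
    <num_queries int="441000"/>
  </CompleteSuggestion>
</toplevel>';

print_r(parseSuggestions($sample));
```

<p>Returning null rather than 0 for a missing num_queries keeps "no traffic data" distinguishable from "zero queries".</p>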
<p>Using SimpleXML, the PHP routine is as simple as querying <strong><em>g00gle.c0m/complete/search?</em></strong>, grabbing the autocomplete XML and extracting the attribute data:</p>
<div class="geshi no php">
<ol>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span><span class="re1">$_SERVER</span><span class="br0">&#91;</span><span class="st0">&#39;QUERY_STRING&#39;</span><span class="br0">&#93;</span><span class="sy0">==</span><span class="st0">&#39;&#39;</span><span class="br0">&#41;</span> <span class="kw3">die</span><span class="br0">&#40;</span><span class="st0">&#39;enter a query like http://host/filename.php?query&#39;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="re1">$contentstring</span> <span class="sy0">=</span> <span class="sy0">@</span><span class="kw3">file_get_contents</span><span class="br0">&#40;</span><span class="st0">&quot;http://g00gle.c0m/complete/search?output=toolbar&amp;amp;q=&quot;</span><span class="sy0">.</span><span class="kw3">urlencode</span><span class="br0">&#40;</span><span class="re1">$kw</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span> &nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="re1">$content</span> <span class="sy0">=</span> simplexml_load_string<span class="br0">&#40;</span><span class="re1">$contentstring</span> <span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">foreach</span><span class="br0">&#40;</span><span class="re1">$content</span><span class="sy0">-&amp;</span>gt<span class="sy0">;</span>CompleteSuggestion <span class="kw1">as</span> <span class="re1">$c</span><span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="re1">$term</span> <span class="sy0">=</span> <span class="br0">&#40;</span>string<span class="br0">&#41;</span> <span class="re1">$c</span><span class="sy0">-&amp;</span>gt<span class="sy0">;</span>suggestion<span class="sy0">-&amp;</span>gt<span class="sy0">;</span>attributes<span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">-&amp;</span>gt<span class="sy0">;</span>data<span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">//note : traffic data is sometimes missing &nbsp; </span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="re1">$traffic</span> <span class="sy0">=</span> <span class="br0">&#40;</span>string<span class="br0">&#41;</span> <span class="re1">$c</span><span class="sy0">-&amp;</span>gt<span class="sy0">;</span>num_queries<span class="sy0">-&amp;</span>gt<span class="sy0">;</span>attributes<span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">-&amp;</span>gt<span class="sy0">;</span>int<span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw3">echo</span> <span class="re1">$term</span><span class="sy0">.</span> <span class="st0">&quot; &quot;</span><span class="sy0">.</span><span class="re1">$traffic</span> <span class="sy0">.</span> <span class="st0">&quot;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="st0">&quot;</span> <span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="br0">&#125;</span></div>
</li>
</ol>
</div>
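<p>The one-line request is fine for a quick tool. A slightly sturdier variant, as a sketch (PHP 7.1 or later): build the query string with http_build_query and give file_get_contents an explicit timeout through a stream context. The function names suggestUrl and fetchSuggestXml are hypothetical, and the host is left exactly as written in the post:</p>

```php
<?php
// Hypothetical helper: build the suggest URL with http_build_query instead of
// string concatenation. The host is kept as written in the post (g00gle.c0m).
function suggestUrl(string $kw): string
{
    return 'http://g00gle.c0m/complete/search?' . http_build_query([
        'output' => 'toolbar',
        'q'      => $kw,          // encoded for us, spaces become "+"
    ]);
}

// Fetch with an explicit timeout instead of relying on the
// default_socket_timeout ini setting; returns null on any failure.
function fetchSuggestXml(string $kw, float $timeoutSeconds = 5.0): ?string
{
    $context = stream_context_create(['http' => ['timeout' => $timeoutSeconds]]);
    $body = @file_get_contents(suggestUrl($kw), false, $context);
    return $body === false ? null : $body;
}

echo suggestUrl('google suggest'), "\n";
// http://g00gle.c0m/complete/search?output=toolbar&q=google+suggest
```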
<p>I made a quick PHP script that outputs the terms as a list of new queries, so you can walk through the suggestions:</p>
<p><a href="http://www.juust.org/wp-content/uploads/2011/12/google-suggest-scraper-tool.jpg" rel="shadowbox[post-1834];player=img;"><img class="alignnone size-medium wp-image-1835" title="google suggest scraper tool" src="http://www.juust.org/wp-content/uploads/2011/12/google-suggest-scraper-tool-300x289.jpg" alt="" width="529" height="400" /></a></p>
<p><a href="http://www.juust.org/wp-content/uploads/2011/12/google-suggest-scraper-tool-II1.jpg" rel="shadowbox[post-1834];player=img;"><img class="alignnone size-medium wp-image-1837" title="google suggest scraper tool II" src="http://www.juust.org/wp-content/uploads/2011/12/google-suggest-scraper-tool-II1-300x286.jpg" alt="" width="462" height="440" /></a></p>
<p>The source is up as a text file for <a title="google suggest scraper tool code" href="http://www.juust.org/suggestit.txt">download here</a> (rename it to suggestit.php; it should run on any server with PHP 5.x and SimpleXML).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.juust.org/index.php/google-suggest-scraper-php-simplexml/2011/12/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>ga api sample : get pageviews</title>
		<link>http://www.juust.org/index.php/google-analytics-api-sample-get-pageviews/2009/05/</link>
		<comments>http://www.juust.org/index.php/google-analytics-api-sample-get-pageviews/2009/05/#comments</comments>
		<pubDate>Wed, 13 May 2009 14:58:01 +0000</pubDate>
		<dc:creator>juust</dc:creator>
				<category><![CDATA[google]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[ga]]></category>
		<category><![CDATA[namespaces]]></category>
		<category><![CDATA[simplexml]]></category>

		<guid isPermaLink="false">http://www.juust.org/?p=504</guid>
		<description><![CDATA[I had been meaning to put this online: how to get the pageviews out of the Google Analytics API using SimpleXML and PHP. Google uses three namespaces in the output feed, which makes it less easy to access, so here&#8217;s a &#8230; <a href="http://www.juust.org/index.php/google-analytics-api-sample-get-pageviews/2009/05/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I had been meaning to put this online: how to get the pageviews out of the Google Analytics API using SimpleXML and PHP. Google uses three namespaces in the output feed, which makes it less easy to access, so here&#8217;s a quick sample of how to get your site&#8217;s pageviews out of it:</p>
<div class="geshi no php">
<ol>
<li class="li1">
<div class="de1"><span class="co1">//ids &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = site identifier (from the site data feed)</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//metrics &nbsp; &nbsp; = what i want to see</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//start-date </span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//end-date </span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="re1">$feedUri</span> <span class="sy0">=</span> <span class="st0">&quot;https://www.google.com/analytics/feeds/data?ids=ga:10516419&amp;metrics=ga:pageviews&amp;start-date=2009-04-01&amp;end-date=2009-05-01&quot;</span><span class="sy0">;</span> &nbsp; &nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="re1">$curl</span> <span class="sy0">=</span> curl_init<span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;curl_setopt<span class="br0">&#40;</span><span class="re1">$curl</span><span class="sy0">,</span> CURLOPT_URL<span class="sy0">,</span> <span class="re1">$feedUri</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;curl_setopt<span class="br0">&#40;</span><span class="re1">$curl</span><span class="sy0">,</span> CURLOPT_CONNECTTIMEOUT<span class="sy0">,</span> <span class="nu0">3</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;curl_setopt<span class="br0">&#40;</span><span class="re1">$curl</span><span class="sy0">,</span> CURLOPT_RETURNTRANSFER<span class="sy0">,</span> <span class="nu0">1</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp;<span class="re1">$headers</span><span class="br0">&#91;</span><span class="br0">&#93;</span> <span class="sy0">=</span> <span class="st0">&quot;Authorization: GoogleLogin auth=&quot;</span><span class="sy0">.</span><span class="re1">$Authtoken</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//for authtoken : see previous post</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;curl_setopt<span class="br0">&#40;</span><span class="re1">$curl</span><span class="sy0">,</span> CURLOPT_HTTPHEADER<span class="sy0">,</span> <span class="re1">$headers</span><span class="br0">&#41;</span><span class="sy0">;</span> </div>
</li>
<li class="li1">
<div class="de1">&nbsp;curl_setopt<span class="br0">&#40;</span><span class="re1">$curl</span><span class="sy0">,</span> CURLOPT_SSL_VERIFYHOST<span class="sy0">,</span> <span class="nu0">0</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;curl_setopt<span class="br0">&#40;</span><span class="re1">$curl</span><span class="sy0">,</span> CURLOPT_SSL_VERIFYPEER<span class="sy0">,</span> <span class="kw2">false</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;curl_setopt<span class="br0">&#40;</span><span class="re1">$curl</span><span class="sy0">,</span> CURLOPT_VERBOSE<span class="sy0">,</span> <span class="nu0">1</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//get the string containing the xml file</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="re1">$gA</span> <span class="sy0">=</span> curl_exec<span class="br0">&#40;</span><span class="re1">$curl</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
</ol>
</div>
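<p>The four parameters listed in the comments at the top of the request code can also be assembled with http_build_query rather than typed into the URI by hand. A sketch (PHP 7 or later; gaFeedUri is a hypothetical helper) reusing the profile id and dates from the example. Note that http_build_query percent-encodes the colons, so ga:10516419 becomes ga%3A10516419; that is ordinary URL encoding and should decode to the same value on the server:</p>

```php
<?php
// Sketch: assemble the data feed URI from the four parameters described in
// the comments, instead of hand-writing the query string.
function gaFeedUri(string $ids, string $metrics, string $start, string $end): string
{
    return 'https://www.google.com/analytics/feeds/data?' . http_build_query([
        'ids'        => $ids,
        'metrics'    => $metrics,
        'start-date' => $start,   // YYYY-MM-DD
        'end-date'   => $end,     // YYYY-MM-DD
    ]);
}

// Same profile id and period as the hand-written $feedUri in the post.
echo gaFeedUri('ga:10516419', 'ga:pageviews', '2009-04-01', '2009-05-01'), "\n";
```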
<p>The feed has three namespaces (Atom, OpenSearch and dxp, the Analytics namespace). A simple approach is to access the ENTRY tags (from the Atom namespace): each entry contains one dxp:metric line, and that holds the answer to the question:</p>
<pre>&lt;dxp:metric confidenceInterval='0.0' name='ga:pageviews' type='integer' value='755'/&gt;</pre>
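<p>To check which namespaces a feed actually uses before picking one with children(), SimpleXML's getNamespaces(true) lists them as prefix =&gt; URI. The feed below is a hand-made, heavily trimmed stand-in (hypothetical, not real API output) containing just the elements described above; the openSearch URI is the usual OpenSearch 1.1 one and is an assumption here:</p>

```php
<?php
// A hand-made, heavily trimmed feed (hypothetical; a real Data Export feed
// has many more elements) using the three namespaces mentioned above.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/"
      xmlns:dxp="http://schemas.google.com/analytics/2009">
  <openSearch:totalResults>1</openSearch:totalResults>
  <entry>
    <dxp:metric confidenceInterval="0.0" name="ga:pageviews" type="integer" value="755"/>
  </entry>
</feed>
XML;

$feed = simplexml_load_string($xml);

// getNamespaces(true) walks the whole document and returns prefix => URI for
// every namespace in use; Atom is the default namespace, so its key is "".
print_r($feed->getNamespaces(true));
```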
<div class="geshi no php">
<ol>
<li class="li1">
<div class="de1"><span class="co1">//load the string into a simple xml object</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="re1">$feed</span> <span class="sy0">=</span> simplexml_load_string<span class="br0">&#40;</span><span class="re1">$gA</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//take the atom namespace</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="re1">$children</span> <span class="sy0">=</span> &nbsp;<span class="re1">$feed</span><span class="sy0">-&gt;</span><span class="me1">children</span><span class="br0">&#40;</span><span class="st0">&#39;http://www.w3.org/2005/Atom&#39;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="co1">//take the entry tags</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="re1">$parts</span> <span class="sy0">=</span> <span class="re1">$children</span><span class="sy0">-&gt;</span><span class="me1">entry</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw1">foreach</span> <span class="br0">&#40;</span><span class="re1">$parts</span> <span class="kw1">as</span> <span class="re1">$entry</span><span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">//from the entry tag,</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">//access the dxp namespace</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="re1">$dxp</span> <span class="sy0">=</span> <span class="br0">&#40;</span>object<span class="br0">&#41;</span> <span class="re1">$entry</span><span class="sy0">-&gt;</span><span class="me1">children</span><span class="br0">&#40;</span><span class="st0">&#39;http://schemas.google.com/analytics/2009&#39;</span><span class="br0">&#41;</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">//METRIC contains the answer to the question</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">//grab from the tag METRIC the attribute VALUE</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw3">echo</span> &nbsp; <span class="br0">&#40;</span>string<span class="br0">&#41;</span> <span class="re1">$dxp</span><span class="sy0">-&gt;</span><span class="me1">metric</span><span class="sy0">-&gt;</span><span class="me1">attributes</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">-&gt;</span><span class="me1">value</span><span class="sy0">;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span></div>
</li>
</ol>
</div>
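<p>Walking the entry tags with children() works fine. An alternative, sketched below, is to let XPath find the metric attributes directly; this is a different technique from the one above, using SimpleXML's registerXPathNamespace() and xpath(). The prefixes a and dxp are arbitrary choices, pageviewsViaXpath is a hypothetical name, and the code assumes PHP 7 or later:</p>

```php
<?php
// Alternative to walking entry -> children(): query the metric attributes
// directly with XPath. The Atom namespace needs a prefix too, because
// unprefixed names in an XPath 1.0 query only match elements in no namespace.
function pageviewsViaXpath(string $xml): array
{
    $feed = simplexml_load_string($xml);
    if ($feed === false) {
        return [];
    }
    $feed->registerXPathNamespace('a', 'http://www.w3.org/2005/Atom');
    $feed->registerXPathNamespace('dxp', 'http://schemas.google.com/analytics/2009');

    $values = [];
    $nodes = $feed->xpath("//a:entry/dxp:metric[@name='ga:pageviews']/@value") ?: [];
    foreach ($nodes as $attr) {
        $values[] = (int) $attr;   // each result is a SimpleXMLElement attribute node
    }
    return $values;
}

// Usage with the string fetched by cURL in the post:
// print_r(pageviewsViaXpath($gA));
```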
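<p>The (string) typecast is important: SimpleXML normally returns a SimpleXMLElement object, and forcing a string type gives you the actual content of the ga:pageviews metric&#8217;s <strong>value</strong> attribute, for example 755. It is still a string of digits; cast it to (int) if you want to calculate with it.</p>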
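<p>A quick way to see what the cast does, using an un-namespaced stand-in for the dxp:metric tag to keep it short (a simplification, not real API output):</p>

```php
<?php
// What the (string) cast changes: the attribute comes back as a
// SimpleXMLElement object; casting extracts its text.
$metric = simplexml_load_string(
    '<metric name="ga:pageviews" type="integer" value="755"/>'
);

$raw = $metric->attributes()->value;   // SimpleXMLElement, not a plain value
$str = (string) $raw;                  // the text of the attribute
$num = (int) $raw;                     // an integer, if you want to do arithmetic

var_dump($raw instanceof SimpleXMLElement, $str, $num);
```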
]]></content:encoded>
			<wfw:commentRss>http://www.juust.org/index.php/google-analytics-api-sample-get-pageviews/2009/05/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

