Categories
php seo

bing api with php and simplexml

About scraping results off of Bing : Bing uses a set of about eight cookies. You can grab 200 results with php curl, as 20 pages of 10, but after the first 200 the Bing server checks for the cookies and, finding none, returns a blank page. I could fiddle with the curl cookiejar, but Bing also offers a straightforward API.

Using the Bing API to list search results is easier.

Bing TOS : not for seo rank checks

In the last paragraph of the api guide, Bing gives a quick recap of their TOS : you can do at most 7 queries per second, and using the results for SEO rank checks is explicitly prohibited.

The following snippets (text source) are hence explicitly not to be used for bing search engine result page (‘serp’) rank checks.

bing api with simplexml

So here is one for web results using php simplexml. The web api (which uses namespaces) allows retrieving at most 1000 results per term, at a maximum of 50 results per query; you can specify the number of results and the offset, i.e. where to start grabbing results.

$Appid = "A_VERY_LONG_STRING";
$Query = "seo rank check";
$Numres = 50; //max 50
$Offset = 1;  //up to 1000

$url = 'http://api.search.live.net/xml.aspx?';
$url .= 'Appid='.$Appid;
$url .= '&query='.urlencode($Query);
$url .= '&sources=web';
$url .= '&web.count='.$Numres;
$url .= '&web.offset='.$Offset;

$feed = simplexml_load_file($url);
//use the web: namespace
$children = $feed->children('http://schemas.microsoft.com/LiveSearch/2008/04/XML/web');
foreach ($children->Web->Results->WebResult as $d) {
    echo $d->Title."\n";
    echo $d->Description."\n";
    echo $d->Url."\n";
    echo $d->DisplayUrl."\n";
}
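Since the web source caps out at 50 results per query, grabbing more than one page means stepping web.offset in increments of web.count. A minimal paging sketch, assuming a helper of my own (buildBingUrl) and an arbitrary 200-result cap :

```php
<?php
// Build one request URL for the Live Search XML endpoint.
// The helper name is mine, not part of the API.
function buildBingUrl($appid, $query, $count, $offset) {
    return 'http://api.search.live.net/xml.aspx?'
         . 'Appid=' . $appid
         . '&query=' . urlencode($query)
         . '&sources=web'
         . '&web.count=' . $count
         . '&web.offset=' . $offset;
}

// Page through the first 200 results, 50 per request.
$appid = 'A_VERY_LONG_STRING';
for ($offset = 1; $offset <= 151; $offset += 50) {
    $url = buildBingUrl($appid, 'seo rank check', 50, $offset);
    // $feed = simplexml_load_file($url); // then walk the web: namespace as above
    echo $url . "\n";
}
```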

…and one for pictures, using php simplexml :

$Appid="A_VERY_LONG_STRING";
$Query = "alkmaar";
$Numres = 10;
$Offset = 1;

$url = 'http://api.search.live.net/xml.aspx?';
$url .= 'Appid='.$Appid;
$url .= '&query='.urlencode($Query);
$url .= '&sources=image';
$url .= '&image.count='.$Numres;
$url .= '&image.offset='.$Offset;

$feed = simplexml_load_file($url);

//use the mms: namespace
$children = $feed->children('http://schemas.microsoft.com/LiveSearch/2008/04/XML/multimedia');

foreach ($children->Image->Results->ImageResult as $d) {
    echo $d->Title."\n";
    echo $d->Thumbnail->ContentType."\n";
    echo $d->Thumbnail->Height."\n";
    echo $d->Thumbnail->Width."\n";
    echo $d->Thumbnail->FileSize."\n";
}

I actually like that api, I am going to use it.

bing api with json

Bing seems to prefer that you use json, as it means less bandwidth usage. After their example in the api basics guide :


$Appid = "A_VERY_LONG_STRING";
$Numres = 10;
$Offset = 1;
$Query = 'alkmaar';

$url = 'http://api.search.live.net/json.aspx?';
$url .= 'Appid='.$Appid;
$url .= '&query='.urlencode($Query);
$url .= '&sources=image';
$url .= '&image.count='.$Numres;
$url .= '&image.offset='.$Offset;

$response = file_get_contents($url);
$jsonobj = json_decode($response);
foreach ($jsonobj->SearchResponse->Image->Results as $value) {
    echo $value->Title."\n";
}
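The decoded object mirrors the XML layout : the results sit under SearchResponse->Image->Results. A sketch of walking that shape, with a tiny hand-made sample response (the field values are invented, just to show the structure) :

```php
<?php
// Pull the titles out of a decoded Live Search JSON response.
// The function name is mine, for illustration.
function imageTitles($json) {
    $obj = json_decode($json);
    $titles = array();
    foreach ($obj->SearchResponse->Image->Results as $r) {
        $titles[] = $r->Title;
    }
    return $titles;
}

// A tiny hand-made response, just to show the shape.
$sample = '{"SearchResponse":{"Image":{"Results":[{"Title":"Alkmaar canal"},{"Title":"Alkmaar cheese market"}]}}}';
print_r(imageTitles($sample));
```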

Of course there is the old RSS option, which doesn’t require an appid but still falls under the api 2.0 tos, and a soap option.
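For the RSS route, the old Live Search front end would serve a feed when asked for one; note that the format=rss parameter here is from memory and is an assumption, not something confirmed above, and the same TOS applies :

```php
<?php
// Sketch: build a Live Search RSS URL. The results.aspx endpoint and
// the format=rss parameter are assumptions from memory -- verify before use.
function buildBingRssUrl($query) {
    return 'http://search.live.com/results.aspx?q=' . urlencode($query) . '&format=rss';
}

echo buildBingRssUrl('alkmaar') . "\n";
// $feed = simplexml_load_file(buildBingRssUrl('alkmaar'));
// foreach ($feed->channel->item as $item) { echo $item->title . "\n"; }
```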

other sources :
There is a bing api php class made over at routecafe, and a jquery bing plugin using json over at Einar Otto Stangvik’s blog.

Categories
google php seo

google trends II

I wanted to reply to a question elsewhere on the site, but a ‘comment’ box isn’t fit for it, so I’ll put the reply here. The question was about creating ‘search engine friendly’ descriptive URLs based on keywords from the Google Trends atom feed, with pages listing a graph of the trend.

You can get a site to serve http://domain.com/trend_title.html type URLs by using mod_rewrite, an apache module.

In the server directory of the application you can use an .htaccess file to set rules for file access in those folders. When the server gets a request from a browser or another server, it applies any rewriting rules you define in .htaccess to that request.

I tried this one :


	RewriteEngine On
	RewriteCond %{REQUEST_FILENAME} !-f
	RewriteCond %{REQUEST_FILENAME} !-d
	RewriteRule ^(.*)\.html$ /trendinfo.php?title=$1

RewriteEngine On
sets the rewrite mechanism on

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

tell the apache server that the rewrite rule only applies to requests that do not match an existing file (!-f) or directory (!-d). If the requested filename is anywhere in the server’s file table, the server dishes out that file; otherwise it will try to apply a RewriteRule. Applying the rule generates a new request; if that returns anything, the server dishes that out, otherwise it returns an http-404 ‘file not found’.

The actual url rewrite rule is :
RewriteRule ^(.*)\.html$ /trendinfo.php?title=$1
which means :

  • if any filename is requested that matches the pattern ^(.*)\.html$, then
  • take everything before .html,
  • pass it as variable $1 to trendinfo.php?title=$1,
  • and see if it sticks
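The match-and-capture the rule performs can be mimicked in php with preg_match, which is handy for checking what title value a given request would produce (the function name is mine, just for illustration) :

```php
<?php
// Mimic: RewriteRule ^(.*)\.html$ /trendinfo.php?title=$1
// Returns the rewritten target, or null when the rule does not match.
function rewriteTrendUrl($request) {
    if (preg_match('#^(.*)\.html$#', $request, $m)) {
        return '/trendinfo.php?title=' . $m[1];
    }
    return null;
}

echo rewriteTrendUrl('bob+bowersox.html') . "\n"; // /trendinfo.php?title=bob+bowersox
```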

If the browser requests http://domain.com/bob+bowersox.html, the server will assert it is not a file or directory on the server, and test the available rules. When it notices that the requested file ends with .html, it applies the rewrite rule and tries to access http://domain.com/trendinfo.php?title=bob+bowersox.

A browsing user does not notice a thing.

In trendinfo.php I wrote some code to handle the ‘new’ request :

if (!isset($_REQUEST['title'])) {
//if there is no $1 added as title, serve a 404 "file not found"
        header('HTTP/1.0 404 Not Found');
        echo 'the emptiness...';
} else {
//get the title from the request
  $mytitle = htmlentities($_REQUEST['title'], ENT_QUOTES, "UTF-8");
//put the google trends graph url together
  $graphurl = 'http://www.google.com/trends/viz?hl=&q=';
  $graphurl .= urlencode($mytitle);
  $graphurl .= '&date=';                        //leave date blank to get the current graph
  $graphurl .= '&graph=hot_img&sa=X';
//output the graph image
  echo '<img src="'.$graphurl.'" alt="'.$mytitle.'" />';
}
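The interesting bit is the urlencode step; pulled into a helper of my own it is easy to check what graph URL a given title produces (the helper name is mine; the viz endpoint is the one from the snippet above) :

```php
<?php
// Assemble the Google Trends graph URL for a given title.
function trendGraphUrl($title) {
    return 'http://www.google.com/trends/viz?hl=&q='
         . urlencode($title)
         . '&date=&graph=hot_img&sa=X'; // blank date = current graph
}

echo trendGraphUrl('bob bowersox') . "\n";
```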

…that outputs the Google trend graph on the url http://domain.com/bob+bowersox.html

You can also put this in index.php :

$feed = simplexml_load_file('http://www.google.com/trends/hottrends/atom/hourly');
$children = $feed->children('http://www.w3.org/2005/Atom');
$parts = $children->entry;
foreach ($parts as $entry) {
    $details = $entry->children('http://www.w3.org/2005/Atom');
    $dom = new domDocument();
    $html = (string) $details->content;
    @$dom->loadHTML($html);
    $anchors = $dom->getElementsByTagName('a');
    foreach ($anchors as $anchor) {
        $url = $anchor->getAttribute('href');
        $urltext = $anchor->nodeValue;
        echo '<a href="'.$url.'">'.$urltext.'</a> ';
    }
}
unset($dom);
unset($anchors);
unset($parts);
unset($feed);

That lists the current 100 google trends with a link. If you use the .htaccess rewrite rules, the server reroutes all the links to trendinfo.php with descriptive urls.

I hope that helps.

Categories
links seo seo tips and tricks

seo tricks : the magpie incident

Some universities, like Southern California, Harvard and Michigan State, have their web gurus explain to us how rss feeds work with the elegant Magpie parser demo :

Some examples of how to use Magpie:

* magpie_simple.php *
Simple example of fetching and parsing an RSS file. Expects to be
called with a query param ‘rss_url=http://(some rss file)’
….

* magpie_debug.php *
Displays all the information available from a parsed feed.

Note : magpie_debug.php is the one to watch for, you can do a google search on :

site:.edu magpie_debug.php

and you get a number of educational facilities that kindly demonstrate the use of the magpie rss parser.

These demo pages have a textbox where you can enter an rss feed url, the magpie demo parses your feed and outputs it as an html-page.

You have to be careful with these programs, though : I actually found one domain (www.scripps.edu) with this remark under the ‘parse rss’ button :

Security Note:
This is a simple example script. If this was a real script we probably wouldn’t allow strangers to submit random URLs, and we certainly wouldn’t simply echo anything passed in the URL. Additionally its a bad idea to leave this example script lying around.

Thank you, you are surely wise like the buddha, I shall try to remember your insight !

….
note: after a while I decided I had had enough fun with magpies and took the blog off-line.