scrape rss flickr pics (one)

Today, ungodly ones, we scrape flickr-pics with php.

I first checked the normal tag-listing page, and you can scrape that one as well, but sometimes RSS files are easier. In this case the nice flickr people stuff the image link in a [description] or something, so I have to use preg_split anyway; otherwise I’d use simplexml.

The http://api.flickr.com/services/feeds/photos_public.gne rss-file has less obsolete stuff in it so I’ll use that one.

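$mytag="apes";
$flikker = join("",file("http://api.flickr.com/services/feeds/photos_public.gne?tags=$mytag&format=rss"));
// the description holds entity-escaped html, so the split marker is img src=&quot;
$flikkerhits = preg_split('/img src=&quot;/', $flikker, -1, PREG_SPLIT_OFFSET_CAPTURE);
$i=0;
foreach($flikkerhits as $flikkerhit){
    $i++;
    // each chunk starts with the image url, followed by '&quot; ' (7 chars) and 'width'
    if($i>1) echo "<img src='".substr($flikkerhit[0], 0, strpos($flikkerhit[0], "width")-7)."'>";
}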

$mytag is the picture tag I want
photos_public.gne is the public pictures feed
I break up the RSS text on [img src=&quot;]
that gives me a list of chunks, each starting with an image URL
every image URL is followed by [&quot; width]
so I do a strpos on “width” and take 7 off
and I’ve got my image URL,
add an image tag and I’ve got a basic dump of the images
the first chunk is crap, so I use a counter to exclude it.
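
To try the split-and-trim trick without hitting the network, here is a minimal sketch that runs on a hard-coded fragment. The function name flickr_image_urls, the sample string and the example.com URLs are made up for illustration, and the real feed markup may differ. It also swaps preg_split for explode, since the marker is a fixed string.

```php
<?php
// Hypothetical helper: pull image URLs out of RSS text whose description
// holds entity-escaped HTML, i.e. img src=&quot;URL&quot; width=&quot;...
function flickr_image_urls($rss)
{
    $urls = array();
    $chunks = explode('img src=&quot;', $rss);
    array_shift($chunks); // the first chunk is everything before the first image
    foreach ($chunks as $chunk) {
        $pos = strpos($chunk, 'width');
        if ($pos === false || $pos < 7) {
            continue; // no width attribute where we expect one, skip this chunk
        }
        // '&quot; ' is 7 characters, so the URL ends 7 characters before "width"
        $urls[] = substr($chunk, 0, $pos - 7);
    }
    return $urls;
}

// Made-up sample in the shape described above (not real feed output).
$sample = '&lt;img src=&quot;http://example.com/a_m.jpg&quot; width=&quot;240&quot; /&gt;'
        . ' text &lt;img src=&quot;http://example.com/b_m.jpg&quot; width=&quot;180&quot; /&gt;';

foreach (flickr_image_urls($sample) as $url) {
    echo "<img src='" . htmlspecialchars($url, ENT_QUOTES) . "'>\n";
}
```

Returning the URLs instead of echoing them straight away makes it easy to filter or cache them before building the image tags.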

a basic 7 lines flickr rss-scraper, short, I can handle that.

ape beasty …that is a cute ape

on to part two…

php serp scripts

Some basic PHP search engine result page scripts (always come in handy for yer basic seo adventures).

MSN Serp

$first=1;
$query="php+serp";
$count=50;

$xml = @simplexml_load_file("http://search.live.com/results.aspx?q=$query&count=$count&first=$first&format=rss");
// simplexml_load_file returns false on failure, so check before looping
if($xml) foreach($xml->channel->item as $i) echo $i->link."<br />\n";

www.tellinya.com has another version for non-simplexml servers. It uses curl and the ehttp.client class, and the technique is closer to a sequential line parser, so sometimes it’s more practical (hint: blog-pipe parsing with “streams”).
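
For the simplexml route, a small offline sketch of the same item loop may help. It parses a made-up RSS 2.0 string instead of a live feed; the function name rss_item_links and the sample feed are assumptions for illustration, not anything MSN serves.

```php
<?php
// Hypothetical helper: return the <link> of every <item> in an RSS 2.0 string.
function rss_item_links($rssText)
{
    $previous = libxml_use_internal_errors(true); // collect parse errors quietly
    $xml = simplexml_load_string($rssText);
    libxml_use_internal_errors($previous);

    $links = array();
    if ($xml === false || !isset($xml->channel->item)) {
        return $links; // not parseable, or no items
    }
    foreach ($xml->channel->item as $item) {
        $links[] = (string) $item->link; // cast the SimpleXMLElement to a plain string
    }
    return $links;
}

// Made-up feed, only here to exercise the helper.
$sampleFeed = <<<XML
<rss version="2.0">
  <channel>
    <title>sample</title>
    <item><title>one</title><link>http://example.com/one</link></item>
    <item><title>two</title><link>http://example.com/two</link></item>
  </channel>
</rss>
XML;

foreach (rss_item_links($sampleFeed) as $link) {
    echo $link . "<br />\n";
}
```

Checking for false before the loop avoids the warnings you would otherwise get when the feed is down or returns something that is not XML.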


Yahoo Serp


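        $varkeywords=$_GET["keywords"];
        $vardomain=$_GET["domain"];

        $strResult='';
        $strHitsResult='';
        $strHitsCount=0;
        $strHits='';

        $numberofresults = 10;
        for($ii=0; $ii<20; $ii++) {
            $jj=$ii*$numberofresults+1;

            $vargoogleresultpage = "http://search.yahoo.com/search?p=".urlencode(trim($varkeywords))."&ei=UTF-8&fr=sfp&xargs=0&pstart=1&b=".$jj;
            flush();

            $googleresponse = join("",file($vargoogleresultpage));
            // NOTE: the split pattern and the loop header were lost from this listing.
            // RESULT_LINK_MARKER is a placeholder, not Yahoo's real markup: replace it with
            // whatever precedes each result link. The href match below is also an assumption.
            $googlehits = preg_split('/RESULT_LINK_MARKER/', $googleresponse, -1, PREG_SPLIT_OFFSET_CAPTURE);
            $i=0;
            foreach($googlehits as $googlehit){
                $i++;
                if(!preg_match('/href="([^"]+)"/i', $googlehit[0], $t, PREG_OFFSET_CAPTURE)) continue;
                // the first chunk is the page header, not a hit
                if($i > 1){
                    $serp = $i-1;
                    // preg_quote keeps dots in the domain from acting as wildcards
                    $SearchForDomain = "~".preg_quote($vardomain, "~")."~i";
                    if(preg_match($SearchForDomain, $t[1][0])){
                        $strHitsCount++;
                        $strHits .= $serp.', ';
                        $strResult .= $serp." ".$t[1][0]."<br />\n";
                        $strHitsResult .= $serp." ".$t[1][0]."<br />\n";
                    } else {
                        $strResult .= $serp." ".$t[1][0]."<br />\n";
                    }
                }
            }
        }
        // line-break markup was stripped from this listing; <br /> is assumed
        echo $strHits.' total = '.$strHitsCount.'<br />';
        echo $strHitsResult;
        echo '<br />full list :<br />';
        echo $strResult;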
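
The paging part of the script can be looked at on its own. This is a rough sketch, not the script's actual code: the function name yahoo_page_urls is made up, and only the p, ei and b parameters from the URL above are carried over.

```php
<?php
// Hypothetical helper: build one search URL per result page.
// b is the 1-based offset of the first result on the page (1, 11, 21, ...).
function yahoo_page_urls($keywords, $pages, $perPage = 10)
{
    $urls = array();
    for ($page = 0; $page < $pages; $page++) {
        $query = http_build_query(array(
            'p'  => trim($keywords),
            'ei' => 'UTF-8',
            'b'  => $page * $perPage + 1,
        ), '', '&');
        $urls[] = 'http://search.yahoo.com/search?' . $query;
    }
    return $urls;
}

foreach (yahoo_page_urls(' php serp ', 3) as $url) {
    echo $url . "\n";
}
```

http_build_query does the URL encoding for every value, so there is no need to call urlencode by hand on each piece.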


Google SERP

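        $varkeywords=$_GET["keywords"];
        $vardomain=$_GET["domain"];

        $strResult='';
        $strHitsResult='';
        $strHitsCount=0;
        $strHits='';

        $numberofresults = 100;
        for($ii=0; $ii<11; $ii++) {
            // Google's start parameter is a zero-based offset
            $jj=$ii*$numberofresults;
            $vargoogleresultpage = "http://www.google.com/search?as_q=".urlencode(trim($varkeywords))."&num=".$numberofresults."&start=".$jj."&hl=en&lr=lang_en";
            flush();

            $googleresponse = join("",file($vargoogleresultpage));
            // NOTE: only the start of the split pattern (class=r>) survived in this listing;
            // the rest of it and the loop header were lost. The href match below is an assumption.
            $googlehits = preg_split('/class=r>/', $googleresponse, -1, PREG_SPLIT_OFFSET_CAPTURE);
            $i=0;
            foreach($googlehits as $googlehit){
                $i++;
                if(!preg_match('/href="([^"]+)"/i', $googlehit[0], $t, PREG_OFFSET_CAPTURE)) continue;
                // the first chunk is the page header, not a hit
                if($i > 1){
                    $serp = $i-1;
                    // preg_quote keeps dots in the domain from acting as wildcards
                    $SearchForDomain = "~".preg_quote($vardomain, "~")."~i";
                    if(preg_match($SearchForDomain, $t[1][0])){
                        $strHitsCount++;
                        $strHits .= $serp.', ';
                        $strResult .= $serp." ".$t[1][0]."<br />\n";
                        $strHitsResult .= $serp." ".$t[1][0]."<br />\n";
                    } else {
                        $strResult .= $serp." ".$t[1][0]."<br />\n";
                    }
                }
            }
        }
        // line-break markup was stripped from this listing; <br /> is assumed
        echo $strHits.' total = '.$strHitsCount.'<br />';
        echo $strHitsResult;
        echo '<br />full list :<br />';
        echo $strResult;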
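
Both scripts boil down to the same step: walk the result URLs in order and note where your domain shows up. Here is a minimal sketch of just that step on a hard-coded list. The function name serp_positions and the sample URLs are made up, and it assumes you already have the URLs in ranking order.

```php
<?php
// Hypothetical helper: given result URLs in ranking order, return the
// 1-based positions whose URL contains the given domain (case-insensitive).
function serp_positions(array $urls, $domain)
{
    // preg_quote keeps dots and other regex characters in the domain literal
    $pattern = '~' . preg_quote($domain, '~') . '~i';
    $positions = array();
    foreach (array_values($urls) as $index => $url) {
        if (preg_match($pattern, $url)) {
            $positions[] = $index + 1;
        }
    }
    return $positions;
}

// Made-up result list, only here to exercise the helper.
$urls = array(
    'http://example.org/page',
    'http://www.Example.com/serp',
    'http://examplexcom.net/',
    'http://blog.example.com/php',
);
$hits = serp_positions($urls, 'example.com');
echo implode(', ', $hits) . ' total = ' . count($hits) . "\n";
```

Without preg_quote the dot in example.com would match any character, so examplexcom.net would count as a hit.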

MSN (being Microsoft) doesn’t encourage commercial use of the RSS feed.

scrape keywords

If you are truly desperately seeking keywords:

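	$seed="test";
	$html = file_get_contents("http://freekeywords.wordtracker.com/?seed=".urlencode($seed)."&suggest=Hit+Me&adult_filter=remove_dubious");
	// eregi() was removed in PHP 7; stripos() does the same case-insensitive check
	if($html !== false && stripos($html, "Apologies") === false){
		preg_match_all("/remove_dubious\">(.+?)<\/a>/", $html, $keywords);
		foreach($keywords[1] as $keyword){
			// the line-break markup was stripped from this listing; <br /> is assumed
			echo $keyword."<br />";
		}
	}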
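
The extraction step can be tried offline on a fragment. This is a sketch under assumptions: the function name link_texts_after and the sample HTML are made up, and the real page markup may look different.

```php
<?php
// Hypothetical helper: collect the text of every link whose opening tag
// ends with the given marker followed by "> (e.g. the tail of its href).
function link_texts_after($html, $marker)
{
    $pattern = '/' . preg_quote($marker, '/') . '">(.+?)<\/a>/';
    if (!preg_match_all($pattern, $html, $matches)) {
        return array(); // no matches, or the regex failed
    }
    return $matches[1];
}

// Made-up fragment in the shape the pattern expects.
$sampleHtml = '<a href="?seed=test+drive&adult_filter=remove_dubious">test drive</a>'
            . '<a href="?seed=speed+test&adult_filter=remove_dubious">speed test</a>';

foreach (link_texts_after($sampleHtml, 'remove_dubious') as $keyword) {
    echo $keyword . "<br />\n";
}
```

The lazy (.+?) matters here: a greedy (.+) would swallow everything up to the last closing link tag on the page.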

I’d use 7Search; you can’t scrape that one, but it does have pretty good results for the American/English region (compare its search volume estimate to Google AdWords, they match).