Getting images from an RSS feed that has no image URLs

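I am wondering how other developers reliably extract the first image from the main content of a blog post, starting from the URL given in an RSS feed. This is the approach I came up with, because the RSS feed has no image URL for the posts/blog entries. I do keep seeing

<img src="http://feeds.feedburner.com/~r/CookingLight/EatingSmart/~4/sIG3nePOu-c" />

in the feed, but it is only a 1px image. Does it carry any value related to the feed entry, or can I convert it into the actual image? This is the RSS: http://feeds.cookinglight.com/CookingLight/EatingSmart?format=xml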

Anyway, here is my attempt at extracting the image using the URL from the feed:

function extact_first_image( $url ) {  
  $content = file_get_contents($url);
  // Narrow the html to get the main div with the blog content only.
  // source: http://stackoverflow.com/questions/15643710/php-get-a-div-from-page-x
  $PreMain = explode('<div id="main-content"', $content);
  $main = explode("</div>" , $PreMain[1] );
  // Regex that finds matches with img tags.
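  // The quotes inside the single-quoted pattern must be escaped, otherwise this is a PHP parse error.
  $output = preg_match_all('/<img[^>]+src=[\'"]([^\'"]+)[\'"][^>]*>/i', $main[12], $matches);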
  // Return the img in html format.
  return $matches[0][0];  
}
$url = 'http://www.cookinglight.com/eating-smart/nutrition-101/foods-that-fight-fat'; //Sample URL from the feed.
echo extact_first_image($url);

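The obvious drawback of this function: it only works when <div id="main-content" is found in the HTML, because that is what the explode splits on. A feed whose pages use a different structure would need yet another explode. It is very static.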

I think load time is also worth mentioning. When I loop over all the entries in the feed, it takes even longer.

I hope I have made the point clear. Feel free to share any ideas that could help optimize the solution.

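The image URLs are in the RSS file, so you can get them just by parsing the XML. Each <item> element contains a <media:group> element, which contains a <media:content> element. The URL of the item's image is in the "url" attribute of the <media:content> element. Here is some basic code (PHP) for extracting the image URLs into an array: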

$xml = simplexml_load_file("http://feeds.cookinglight.com/CookingLight/EatingSmart?format=xml");
$imageUrls = array();
foreach($xml->channel->item as $item)
{
    array_push($imageUrls, (string)$item->children('media', true)->group->content->attributes()->url);
}

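Keep in mind that the media is not necessarily an image. It could be a video or an audio recording. There could even be more than one. You can inspect the <media:content> element to see what it is.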
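
As a rough sketch of that check: Media RSS defines optional "medium" and "type" attributes on <media:content>, so one option is to keep only the entries that declare themselves as images. The sample XML, the example.com URLs, and the helper name image_urls_for_item are made up for illustration, and this assumes the real feed actually fills in at least one of those attributes.

```php
<?php
// Hypothetical sample feed, inlined so the sketch runs without network access.
$rss = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>First</title>
      <media:group>
        <media:content url="http://example.com/a.mp4" medium="video" type="video/mp4"/>
        <media:content url="http://example.com/a.jpg" medium="image" type="image/jpeg"/>
      </media:group>
    </item>
    <item>
      <title>Second</title>
    </item>
  </channel>
</rss>
XML;

// Hypothetical helper: return the URLs of all image entries in one <item>.
function image_urls_for_item(SimpleXMLElement $item)
{
    $urls = array();
    foreach ($item->children('media', true)->group as $group) {
        foreach ($group->children('media', true)->content as $content) {
            $attrs  = $content->attributes();
            $medium = (string) $attrs->medium; // empty string when the attribute is absent
            $type   = (string) $attrs->type;   // MIME type, e.g. "image/jpeg"
            if ($medium === 'image' || strpos($type, 'image/') === 0) {
                $urls[] = (string) $attrs->url;
            }
        }
    }
    return $urls;
}

$xml = simplexml_load_string($rss);
$imageUrls = array();
foreach ($xml->channel->item as $item) {
    $found = image_urls_for_item($item);
    // Keep the first image per item; null marks an item without any image.
    $imageUrls[] = $found ? $found[0] : null;
}
print_r($imageUrls);
```

For the real feed, simplexml_load_string($rss) would be swapped for the simplexml_load_file call shown above. Items that come back as null could then fall back to scraping the article page, which keeps the slow page fetches to the entries that really need them.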
