Need to extract article tag content from the original website, style it, and post it on my website



OK, here is an example URL: https://www.finn.no/car/used/search.html?orgId=3553552&sort=PUBLISHED_DESC

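On that page the set of ads is stored in article tags. I need to collect them every time a page on my website is loaded and show them to visitors, while also changing some styling options, such as the background and how the ads appear on my site. There is also pagination, so that needs to be carried over as well.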

The only option this marketplace offers is an iFrame, which looks really bad in 2023.

Original website: https://bbvest.no

I tried code with no success:

<?php
$url="https://www.finn.no/car/used/search.html?orgId=3553552&sort=PUBLISHED_DESC";
$html=file_get_contents($url);
$doc = new DOMDocument();
$doc->loadHTML($html);
$div=$doc->getElementsByClassName("ads__unit");


?>
<div><?php echo $div; ?></div>

Thanks for your help.

The DOMDocument class does not have a getElementsByClassName method.
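One common workaround is to select by class with DOMXPath instead. The following is a minimal sketch run against an inline HTML string rather than the live page; the helper name findByClass and the sample markup are made up for illustration and do not come from finn.no.

```php
<?php
// Hypothetical helper: returns every <$tag> element whose class attribute
// contains the token $class.
function findByClass(DOMDocument $doc, string $tag, string $class): DOMNodeList
{
    $xpath = new DOMXPath($doc);
    // Pad the class list with spaces so "ads__unit" does not also match
    // a longer class name such as "ads__unit-title".
    $query = sprintf(
        '//%s[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $tag,
        $class
    );
    return $xpath->query($query);
}

// Sample markup, made up for illustration.
$html = '<div><article class="ads__unit big"><h2>Car A</h2></article>'
      . '<article class="other"><h2>Skip me</h2></article>'
      . '<article class="ads__unit"><h2>Car B</h2></article></div>';

libxml_use_internal_errors(true); // keep libxml parse warnings quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$out = '';
foreach (findByClass($doc, 'article', 'ads__unit') as $node) {
    $out .= $doc->saveHTML($node) . "\n"; // serialize each matched node back to HTML
}
echo $out;
```

Matching on the class token rather than the whole attribute value means the query should keep working if the source page adds extra classes to the same element.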

To get the text and images:

<?php
$url = "https://www.finn.no/car/used/search.html?orgId=3553552&sort=PUBLISHED_DESC";
$html = file_get_contents($url);
$doc = new DOMDocument();
libxml_use_internal_errors(true); // use this if you get "DOMDocument::loadHTML(): Tag finn-topbar invalid in Entity"
$doc->loadHTML($html);
$arts = $doc->getElementsByTagName('article'); // all <article> elements
$display = "";
foreach ($arts as $art) {
    $display .= $art->textContent . "<br>"; // text of the article
    // search for <img> inside this article only, not in the whole document
    foreach ($art->getElementsByTagName('img') as $img) {
        $display .= $img->getAttribute('src') . "<br>"; // src of the image
    }
}
?>
<div><?php echo $display; ?></div>
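The libxml_use_internal_errors(true) call above tells libxml to buffer parse warnings instead of printing them. A minimal sketch of inspecting and then clearing that buffer follows; the sample markup is made up, and whether a custom element such as finn-topbar actually produces a warning may depend on the libxml version.

```php
<?php
// Made-up sample markup containing a custom element like the one in the warning.
$html = '<finn-topbar></finn-topbar><article><p>Car A</p><img src="/a.jpg"></article>';

$previous = libxml_use_internal_errors(true); // buffer warnings, remember the old setting
$doc = new DOMDocument();
$doc->loadHTML($html);

foreach (libxml_get_errors() as $error) {
    // Each entry is a LibXMLError object with message, line and level properties.
    error_log(trim($error->message) . ' (line ' . $error->line . ')');
}
libxml_clear_errors();                 // free the buffered errors
libxml_use_internal_errors($previous); // restore the previous setting

$count = $doc->getElementsByTagName('article')->length;
echo $count . " article(s) parsed\n";
```

Logging the buffered errors instead of discarding them makes it easier to notice when the source page changes in a way that breaks parsing.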

Try using preg_match_all:

<?php
$str = file_get_contents('https://www.finn.no/car/used/search.html?orgId=3553552&sort=PUBLISHED_DESC');
// The "s" modifier lets "." match newlines; [^"]* keeps the class match inside the attribute
preg_match_all('#<article class="ads__unit[^"]*">.*?</article>#s', $str, $matches);
$div = "";
// $matches[0] holds the full <article>...</article> matches
foreach ($matches[0] as $match) {
    $div .= $match;
}
?>
<div><?php echo $div ?></div>

<?php
$str = file_get_contents('https://www.finn.no/car/used/search.html?orgId=3553552&sort=PUBLISHED_DESC');

$div = "";
// Without the "s" modifier, "." does not match newlines, so this only matches a container that sits on a single line
if (preg_match('#<div class="ads (.*)">(.*)</div>#', $str, $m)) {
    $div .= $m[0]; // the full matched markup
} else {
    echo 'Regex syntax has to be improved to your search criteria' . PHP_EOL;
}
?>
<div><?php echo $div; ?></div>
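The question also asks for pagination to be carried over. A minimal sketch of passing a page number from your own site through to the source URL is shown here. The helper name buildSourceUrl is hypothetical, and the "page" query parameter is an assumption that this thread does not confirm; check how the source site actually paginates before relying on it.

```php
<?php
// Hypothetical helper: builds the source URL for a requested page number.
// ASSUMPTION: the source site accepts a "page" query parameter.
function buildSourceUrl(string $orgId, $requestedPage): string
{
    // Accept only positive integers; anything else falls back to page 1.
    $page = filter_var(
        $requestedPage,
        FILTER_VALIDATE_INT,
        ['options' => ['min_range' => 1]]
    );
    if ($page === false) {
        $page = 1;
    }
    $query = http_build_query([
        'orgId' => $orgId,
        'sort'  => 'PUBLISHED_DESC',
        'page'  => $page,
    ]);
    return 'https://www.finn.no/car/used/search.html?' . $query;
}

// On a real page the value would come from the visitor's request,
// for example $_GET['page'] ?? 1.
echo buildSourceUrl('3553552', '2') . "\n";
echo buildSourceUrl('3553552', 'abc') . "\n";
```

Validating the page number before it goes into the URL keeps arbitrary visitor input out of the request your server makes.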

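I went with the method below. It takes less time to load, and it grabs all the content and lays it out nicely. By importing the CSS I can get everything I want. Now I can move on to adding CSS styles and other settings.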

The plan is to make this work as a WP/Joomla plugin.

<?php
// Returns the HTML of every <$finnTagName> element whose $finnAttrName attribute equals $finnAttrValue
function getTags( $finnDom, $finnTagName, $finnAttrName, $finnAttrValue ){
    $domxpath = new DOMXPath($finnDom);
    $newDom = new DOMDocument;
    $newDom->formatOutput = true;
    // '//' when you don't know 'absolute' path,
    // e.g. $filtered = $domxpath->query('//div[@class="className"]');
    $filtered = $domxpath->query("//$finnTagName" . '[@' . $finnAttrName . "='$finnAttrValue']");
    // since above returns DomNodeList Object
    // I use following routine to convert it to string(html); copied it from someone's post in this site. Thank you.
    $i = 0;
    while( $myItem = $filtered->item($i++) ){
        $node = $newDom->importNode( $myItem, true );    // import node
        $newDom->appendChild($node);                    // append node
    }
    return $newDom->saveHTML();
}

$merchantID = '3553552';
$finn_link = 'https://www.finn.no/car/used/search.html?orgId=' . $merchantID;
$finnTagName = 'article';
$finnAttrName = 'class';
$finnAttrValue = 'ads__unit';
$finnDom = new DOMDocument;
$finnDom->preserveWhiteSpace = false;
@$finnDom->loadHTMLFile($finn_link); // @ suppresses the HTML parse warnings
$finnHtml = getTags( $finnDom, $finnTagName, $finnAttrName, $finnAttrValue );
?>
<?php echo $finnHtml; ?>
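For the styling step, one option is to tag each extracted article with a class of your own, so that your CSS does not depend only on the source site's class names. This is a rough sketch on made-up sample markup; the my-ads id, the my-ad class, and the CSS values are placeholders chosen for illustration.

```php
<?php
// Made-up sample standing in for the extracted markup.
$extracted = '<article class="ads__unit"><h2>Car A</h2></article>'
           . '<article class="ads__unit"><h2>Car B</h2></article>';

libxml_use_internal_errors(true); // keep libxml parse warnings quiet
$dom = new DOMDocument();
// Wrap the fragment in a container of our own so CSS rules can be scoped to it.
$dom->loadHTML('<div id="my-ads">' . $extracted . '</div>');
libxml_clear_errors();

$wrapper = $dom->getElementsByTagName('div')->item(0);
foreach ($wrapper->getElementsByTagName('article') as $article) {
    // Keep the original classes and append our own styling hook.
    $classes = trim($article->getAttribute('class') . ' my-ad');
    $article->setAttribute('class', $classes);
}

// Placeholder rule; replace with the real background and layout settings.
$styled = '<style>#my-ads .my-ad { background: #f5f5f5; margin-bottom: 1em; }</style>'
        . $dom->saveHTML($wrapper);
echo $styled;
```

Scoping the rules under a wrapper id keeps the overrides from leaking into the rest of the page, which matters if this later becomes a plugin.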
