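I have a script that monitors some Facebook pages. Since the Facebook API removed public access to pages on 4 September 2019, I need to parse the page content with XPath instead.
Each Facebook post is wrapped in div[contains(@class,"userContentWrapper")]. I want to loop over the posts one by one to find the data I need.
I don't know why $message = $post->findvalue('//div[@data-testid="post_message"]//p'); returns, for every post, the text of the <p> elements of all posts.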
use LWP::UserAgent;
$ua = new LWP::UserAgent;
$request = new HTTP::Request;
$request->url('https://www.facebook.com/pg/FIFA/posts/');
$request->method('GET');
$request->header('User-Agent' => 'Mozilla/5.0 Chrome/71.0.3578.98 Safari/537.36');
$response = $ua->request($request);
open(HTM, ">zzz.htm");
print HTM $response->content;
close(HTM);
use HTML::TreeBuilder::XPath;
$tree = HTML::TreeBuilder::XPath->new_from_content($response->content);
$posts = $tree->findnodes('//div[contains(@class,"userContentWrapper")]');
for my $post (@{$posts})
{
    $id = $post->findnodes('//div[@data-testid="story-subtitle"]/@id');
    $id = $id->[0]->getValue;
    print "id = $id\n\n";
    $object_id = $post->findnodes('//div[@data-testid="story-subtitle"]//a/@href');
    $object_id = 'https://www.facebook.com' . $object_id->[0]->getValue;
    print "object_id = $object_id\n\n";
    $message = $post->findvalue('//div[@data-testid="post_message"]//p');
    # $message = $message->[0]->getValue;
    print "$message\n\n";
    $ajaxify = $post->findnodes('//div[@class="mtm"]//a/@ajaxify');
    $ajaxify = $ajaxify->[0]->getValue;
    print "ajaxify = $ajaxify\n\n";
    $ploi = $post->findnodes('//div[@class="mtm"]//a/@data-ploi');
    $ploi = $ploi->[0]->getValue;
    print "ploi = $ploi\n\n";
    # $plsi = $post->findnodes('//div[@class="mtm"]//a/@data-plsi');
    # $plsi = $plsi->[0]->getValue;
    # print "plsi = $plsi\n\n";
    $href = $post->findnodes('//div[@class="mtm"]//a/@href');
    $href = 'https://www.facebook.com' . $href->[0]->getValue;
    print "href = $href\n\n";
    print "---------------------------------------------------------\n\n";
}
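The post is unclear and seems to contain several questions. That needs fixing, but in the meantime I'll address the following:
"I want to loop over the posts one by one to find the data I need."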
From HTML::TreeBuilder::XPath,
findnodes ($path)
Returns a list of nodes found by $path. In scalar context returns a Tree::XPathEngine::NodeSet object.
From Tree::XPathEngine::NodeSet,
get_nodelist()
Returns a list of nodes. See Tree::XPathEngine::XMLParser for the format of the nodes.
So
my @posts = $tree->findnodes('...');
for my $post (@posts) { ... }
or
my $posts = $tree->findnodes('...');
for my $post ($posts->get_nodelist()) { ... }
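Both forms can be tried on a small, self-contained sample. The sketch below is only an illustration: the inline HTML is made up (it is not real Facebook markup), and @from_list and @from_nodeset are hypothetical names. Inside the loop the query starts with ".//" so that it is evaluated relative to the current post; a path that begins with "//" is evaluated from the document root even when it is called on a node.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Made-up sample markup standing in for the fetched page (an assumption).
my $html = <<'HTML';
<html><body>
<div class="x userContentWrapper"><div data-testid="post_message"><p>first post</p></div></div>
<div class="x userContentWrapper"><div data-testid="post_message"><p>second post</p></div></div>
</body></html>
HTML

my $tree    = HTML::TreeBuilder::XPath->new_from_content($html);
my $wrapper = '//div[contains(@class,"userContentWrapper")]';
my $message = './/div[@data-testid="post_message"]//p';   # relative to one post

# Form 1: list context gives a plain list of nodes.
my @posts = $tree->findnodes($wrapper);
my @from_list;
for my $post (@posts) {
    push @from_list, '' . $post->findvalue($message);
}

# Form 2: scalar context gives a NodeSet; get_nodelist returns its nodes.
my $posts = $tree->findnodes($wrapper);
my @from_nodeset;
for my $post ($posts->get_nodelist()) {
    push @from_nodeset, '' . $post->findvalue($message);
}

print "$_\n" for @from_list;

$tree->delete;   # free the parse tree
```

Either loop visits each post node once; which one to use is mostly a matter of taste.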
Any other questions should be posted as separate questions.