Regular expression to extract the content inside a script tag in PHP



I am trying to extract download URLs from a web page. The code I tried is below.

function getbinaryurl($url)
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_FRESH_CONNECT, true);
    $value1 = curl_exec($curl);
    curl_close($curl);
    $start = preg_quote('<script type="text/x-component">', '/');
    $end = preg_quote('</script>', '/');
    $rx = preg_match("/$start(.*?)$end/", $value1, $matches);
    var_dump($matches);
}

$url = "https://www.sourcetreeapp.com/download-archives";
getbinaryurl($url);

This way I get the tag information rather than the content inside the script tag. How can I get the information inside it?

The expected result is: https://product-downloads.atlassian.com/software/sourcetree/ga/Sourcetree_4.0.1_234.zip, https://product-downloads.atlassian.com/software/sourcetree/windows/ga/SourceTreeSetup-3.3.6.exe, https://product-downloads.atlassian.com/software/sourcetree/windows/ga/SourcetreeEnterpriseSetup_3.3.6.msi

I am very new to writing regular expressions. Please help me.

Rather than using a regular expression, using DOMDocument and XPath gives you more control over which elements you select.

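Although XPath can be difficult (much like regular expressions), some people find it more intuitive. The code uses //script[@type="text/x-component"][contains(text(), "macURL")], which breaks down as

  • //script = any script node
  • [@type="text/x-component"] = which has an attribute called type with that specific value
  • [contains(text(), "macURL")] = whose text contains the string macURL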
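
To see how each predicate narrows the match, here is a minimal sketch run against a small inline HTML string rather than the live page. The sample markup and the example.com URL are made up for illustration; only the XPath expression is taken from this answer.

```php
<?php
// Hypothetical sample markup; the real page's structure may differ.
$html = <<<'HTML'
<html><body>
<script type="text/javascript">var macURL = "ignored";</script>
<script type="text/x-component">{"params":{"other":"value"}}</script>
<script type="text/x-component">{"params":{"macURL":"https://example.com/app.zip"}}</script>
</body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // suppress warnings about imperfect HTML
$doc->loadHTML($html);
libxml_clear_errors();

$xp = new DOMXPath($doc);

// Each query adds one more filter to the previous one.
$all     = $xp->query('//script');
$typed   = $xp->query('//script[@type="text/x-component"]');
$matched = $xp->query('//script[@type="text/x-component"][contains(text(), "macURL")]');

// Print how many nodes survive each step.
echo $all->length, ' ', $typed->length, ' ', $matched->length, PHP_EOL;
```

The first script mentions macURL but has the wrong type, and the second has the right type but no macURL, so only the third should survive both predicates.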

The query() method returns a list of matches, so loop over them. The content is JSON, so decode it and output the values...

function getbinaryurl($url)
{
    // Download the page body as a string.
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_FRESH_CONNECT, true);
    $value1 = curl_exec($curl);
    curl_close($curl);

    // Parse the HTML, silencing libxml warnings about malformed markup.
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($value1);
    libxml_use_internal_errors(false);

    // Select only the component scripts whose text mentions macURL.
    $xp = new DOMXPath($doc);
    $srcs = $xp->query('//script[@type="text/x-component"][contains(text(), "macURL")]');
    foreach ($srcs as $src) {
        // Each matching script body is JSON; decode it into an array.
        $content = json_decode($src->textContent, true);
        echo $content['params']['macURL'] . PHP_EOL;
        echo $content['params']['windowsURL'] . PHP_EOL;
        echo $content['params']['enterpriseURL'] . PHP_EOL;
    }
}

$url = "https://www.sourcetreeapp.com/download-archives";
getbinaryurl($url);

Which outputs

https://product-downloads.atlassian.com/software/sourcetree/ga/Sourcetree_4.0.1_234.zip
https://product-downloads.atlassian.com/software/sourcetree/windows/ga/SourceTreeSetup-3.3.8.exe
https://product-downloads.atlassian.com/software/sourcetree/windows/ga/SourcetreeEnterpriseSetup_3.3.8.msi
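
If you would still like to stay with the regex from the question, a minimal sketch follows. It assumes the problem was reading the whole match instead of the capture group: $matches[0] holds the full match including the tags, while $matches[1] holds only the captured content. The s modifier is added so that . also matches newlines, and preg_match_all() collects every such block. The helper name extractComponentJson and the sample input are hypothetical, not part of the original code.

```php
<?php
/**
 * Hypothetical helper: returns the raw text of every
 * <script type="text/x-component"> block found in $html.
 */
function extractComponentJson(string $html): array
{
    $start = preg_quote('<script type="text/x-component">', '/');
    $end   = preg_quote('</script>', '/');

    // "s" lets "." span newlines; group 1 is the content between the tags.
    if (!preg_match_all("/$start(.*?)$end/s", $html, $matches)) {
        return [];
    }
    return $matches[1];
}

// Made-up sample input standing in for the downloaded page.
$sample = '<script type="text/x-component">{"params":{"macURL":"https://example.com/app.zip"}}</script>';

$urls = [];
foreach (extractComponentJson($sample) as $json) {
    $data = json_decode($json, true);
    // Skip blocks that are not valid JSON or lack the key we want.
    if (is_array($data) && isset($data['params']['macURL'])) {
        $urls[] = $data['params']['macURL'];
        echo $data['params']['macURL'], PHP_EOL;
    }
}
```

This pattern stops matching as soon as the tag gains another attribute or changes its attribute order, which is one reason the DOM approach tends to be more robust.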
