
How to scrape all texts from <a href> to List with net.ruippeixotog.scalascraper

Updated: 2023-09-16

This is the HTML:

<tr class="countries" valign="top"> 
<td nowrap> </td>
<td nowrap>
<a href="https://ar.indeed.com/"><img src="/images/flags/ar.png"></a> 
<a href="https://ar.indeed.com/">Argentina</a> <br> 
<a href="https://au.indeed.com/"><img src="/images/flags/au.png"></a> 
<a href="https://au.indeed.com/">Australia</a> <br> 
<a href="https://at.indeed.com/"><img src="/images/flags/at.png"></a> 
<a href="https://at.indeed.com/">Austria</a> <br> 
</td> 
</tr>

I want to get a list of the text elements between <a href ...> and </a>. When I write:

items >> allText("a")

I get a list with a single element:

ArgentinaAustraliaAustria

How can I get these texts as a list of n elements?

You can use the texts method as follows:

(items >> texts("a")).filter(_.nonEmpty)

It produces:

List(Argentina, Australia, Austria)
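
For reference, here is a minimal self-contained sketch of the same call. It assumes scala-scraper is on the classpath and uses its JsoupBrowser to parse the markup from a string. The object name ScrapeAnchorTexts and the enclosing <table> tag are assumptions added for illustration: the question does not show how items was obtained, and a bare <tr> would likely be dropped by the HTML parser when parsed outside a table.

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

// Hypothetical wrapper object, not part of the original question.
object ScrapeAnchorTexts {
  // The row from the question, wrapped in <table> so the parser keeps <tr>/<td>.
  private val html =
    """<table>
      |<tr class="countries" valign="top">
      |<td nowrap> </td>
      |<td nowrap>
      |<a href="https://ar.indeed.com/"><img src="/images/flags/ar.png"></a>
      |<a href="https://ar.indeed.com/">Argentina</a> <br>
      |<a href="https://au.indeed.com/"><img src="/images/flags/au.png"></a>
      |<a href="https://au.indeed.com/">Australia</a> <br>
      |<a href="https://at.indeed.com/"><img src="/images/flags/at.png"></a>
      |<a href="https://at.indeed.com/">Austria</a> <br>
      |</td>
      |</tr>
      |</table>""".stripMargin

  private val doc = JsoupBrowser().parseString(html)

  // Assumed stand-in for `items`: the single row with class "countries".
  private val items = doc >> element("tr.countries")

  // texts("a") yields one string per <a>; drop the empty ones.
  val names: List[String] = (items >> texts("a")).filter(_.nonEmpty).toList

  def main(args: Array[String]): Unit = println(names)
}
```

Unlike allText, which joins the text of all matches into one string, texts keeps one entry per matched element, which is why the result has n elements.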

The filter is needed for anchors such as

<a href="https://at.indeed.com/"><img src="/images/flags/at.png"></a>

because they contain only an image, so the text of the <a> tag is empty.
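
To make the empty entries visible, the sketch below extracts the texts without the filter, and then shows one possible variation that works on the elements themselves so each name stays paired with its href. This is only an illustration under assumptions: the object name EmptyAnchorTexts, the shortened two-link markup, and the enclosing <table> are not from the original question.

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

// Hypothetical wrapper object, not part of the original answer.
object EmptyAnchorTexts {
  // Shortened version of the row: one image-only link and one text link.
  private val html =
    """<table><tr class="countries"><td nowrap>
      |<a href="https://at.indeed.com/"><img src="/images/flags/at.png"></a>
      |<a href="https://at.indeed.com/">Austria</a> <br>
      |</td></tr></table>""".stripMargin

  private val doc = JsoupBrowser().parseString(html)

  // Without the filter, the image-only link contributes an empty string.
  val raw: List[String] = (doc >> texts("a")).toList

  // Variation: filter the elements instead, keeping text and href together.
  val links: List[(String, String)] =
    (doc >> elementList("a"))
      .filter(_.text.nonEmpty)
      .map(a => a.text -> a.attr("href"))

  def main(args: Array[String]): Unit = {
    println(raw)
    println(links)
  }
}
```

Filtering on the elements rather than on the strings is a design choice that helps when more than the text is needed from each link; for the plain list of names, the one-line filter above is enough.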
