I want to extract the content inside the p tag below.
<section id="abstractSection" class="row">
<h3 class="h4">Abstract<span id="viewRefPH" class="pull-right hidden"></span>
</h3>
<p> Variation of the (<span class="ScopusTermHighlight">EEG</span>), has functional and. behavioural effects in sensory <span class="ScopusTermHighlight">EEG</span>. We can interpret our. Individual <span class="ScopusTermHighlight">EEG</span> text to extract <span class="ScopusTermHighlight">EEG</span> power level.</p>
</section>
A single line of Selenium, as follows,
document_abstract = WebDriverWait(self.browser, 20).until(
    EC.visibility_of_element_located((By.XPATH, '//*[@id="abstractSection"]/p'))).text
can easily extract the p tag content and produce the following output:
Variation of the EEG, has functional and. behavioural effects in sensoryEEG. We can interpret our. Individual EEG text to extract EEG power level.
Nevertheless, for speed reasons, I would like to use BeautifulSoup instead. The following bs code was tested by referencing the css selector (i.e., #abstractSection):
from bs4 import BeautifulSoup as soup

url = r'scopus_offilne_specific_page.html'
with open(url, 'r', encoding='utf-8') as f:
    page_soup = soup(f, 'html.parser')
home = page_soup.select_one('#abstractSection').next_sibling
for item in home:
    for a in item.find_all("p"):
        print(a.get_text())
However, the interpreter returns the following error:
AttributeError: 'str' object has no attribute 'find_all'
Furthermore, since Scopus requires a login ID, the issue above can be reproduced with the offline html accessible via this link.
May I know where I went wrong? Any insight is appreciated.
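For reference, the AttributeError most likely arises because .next_sibling of the #abstractSection element is the whitespace text node that follows the closing tag, i.e. a NavigableString (a str subclass), not a Tag. Iterating over it yields plain str characters, which have no find_all. A minimal sketch reproducing this, with an inline HTML string standing in for the offline page:

```python
from bs4 import BeautifulSoup

# Inline HTML standing in for scopus_offilne_specific_page.html
html = """<section id="abstractSection" class="row">
<h3 class="h4">Abstract</h3>
<p>Variation of the <span class="ScopusTermHighlight">EEG</span> power level.</p>
</section>
"""

page_soup = BeautifulSoup(html, 'html.parser')
sibling = page_soup.select_one('#abstractSection').next_sibling

# next_sibling is the whitespace text node after </section>,
# i.e. a NavigableString, not the <p> tag we wanted
print(type(sibling).__name__)

for item in sibling:       # iterating over a string yields characters
    print(type(item))      # plain str -- no .find_all on these
    break
```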
Thanks to this OP, the problem posted above apparently can be solved simply as follows:
document_abstract = page_soup.select('#abstractSection > p')[0].text
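A self-contained sketch of that fix, using an inline HTML string in place of the offline file:

```python
from bs4 import BeautifulSoup

# Inline HTML standing in for scopus_offilne_specific_page.html
html = """<section id="abstractSection" class="row">
<h3 class="h4">Abstract</h3>
<p>Variation of the <span class="ScopusTermHighlight">EEG</span> power level.</p>
</section>"""

page_soup = BeautifulSoup(html, 'html.parser')

# The CSS selector targets the <p> directly inside #abstractSection,
# so there is no need to walk siblings at all
document_abstract = page_soup.select('#abstractSection > p')[0].text
print(document_abstract)

# Equivalent single-match form
same_text = page_soup.select_one('#abstractSection > p').get_text()
```

Note that select_one returns None when nothing matches, whereas select(...)[0] raises an IndexError, which may matter if some pages lack an abstract.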