How to copy everything after a tag in Beautiful Soup

While working on a homework assignment, I have a file "doc.html" containing this data:

<span class="descriptor">Title:</span> Automated Scalable Bayesian Inference via Hilbert Coresets
<span class="descriptor">Title:</span> PASS-GLM: polynomial approximate sufficient statistics for scalable  Bayesian GLM inference
<span class="descriptor">Title:</span> Covariances, Robustness, and Variational Bayes
<span class="descriptor">Title:</span> Edge-exchangeable graphs and sparsity (NIPS 2016)
<span class="descriptor">Title:</span> Fast Measurements of Robustness to Changing Priors in Variational Bayes
<span class="descriptor">Title:</span> Boosting Variational Inference

For each line, I am trying to get whatever comes after </span>, so the expected output is:

Automated Scalable Bayesian Inference via Hilbert Coresets
PASS-GLM: polynomial approximate sufficient statistics for scalable  Bayesian GLM inference
Covariances, Robustness, and Variational Bayes
Edge-exchangeable graphs and sparsity (NIPS 2016)
Fast Measurements of Robustness to Changing Priors in Variational Bayes
Boosting Variational Inference

I tried the code below, but it does not work:

from bs4 import BeautifulSoup
with open("doc.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')
    for line in soup.find_all('span'):
        print(line.get_text())

What am I missing?

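You need the span element's next sibling (nextSibling, spelled next_sibling in current Beautiful Soup), not the text inside the span! get_text() returns only what is between <span> and </span>, which is "Title:" here, while the title itself is a separate text node that follows the span.

Note: use strip() to remove the leading space and the trailing newline.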

>>> from bs4 import BeautifulSoup
>>> with open("doc.html") as fp:
...     soup = BeautifulSoup(fp, 'html.parser')
...     for line in soup.find_all('span'):
...         print(line.next_sibling.strip())
... 
Automated Scalable Bayesian Inference via Hilbert Coresets
PASS-GLM: polynomial approximate sufficient statistics for scalable  Bayesian GLM inference
Covariances, Robustness, and Variational Bayes
Edge-exchangeable graphs and sparsity (NIPS 2016)
Fast Measurements of Robustness to Changing Priors in Variational Bayes
Boosting Variational Inference
>>>
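
If some spans might not be followed by plain text (for example the last span in the file, or a span followed directly by another tag), calling strip() on the sibling can fail. Here is a minimal sketch of a more defensive version, assuming the same markup as in the question. The function name text_after_spans, the css_class parameter, and the SAMPLE_HTML string are made up for illustration and are not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical sample input modelled on the question's doc.html.
# The last span deliberately has nothing after it.
SAMPLE_HTML = """\
<span class="descriptor">Title:</span> Boosting Variational Inference
<span class="descriptor">Title:</span> Covariances, Robustness, and Variational Bayes
<span class="descriptor">Title:</span>"""


def text_after_spans(html, css_class="descriptor"):
    """Return the stripped text node that directly follows each matching span."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for span in soup.find_all("span", class_=css_class):
        sibling = span.next_sibling
        # next_sibling is None when the span is the last node, and it is a
        # Tag when another element follows immediately; skip both cases.
        if isinstance(sibling, NavigableString):
            text = sibling.strip()
            if text:
                results.append(text)
    return results


if __name__ == "__main__":
    for title in text_after_spans(SAMPLE_HTML):
        print(title)
```

Filtering on class_="descriptor" is optional. It only keeps other span elements in a larger page from being picked up by accident.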
