How to get the ID from an HTML tag that has one, while parsing with BeautifulSoup



I want to write the IDs from an HTML source to a CSV file, but I'm having trouble finding the right code.

There are two cases I want to solve.

Case 1:

<footnotes>
<footnote id="F1">Includes 4,675.96 restricted stock units that will vest and settle in shares of the Company's common stock on a one-for-one basis on February 23, 2012.</footnote>
</footnotes>

I want to write this to the CSV file as follows.

Case 1, desired output:

F1 Includes 4,675.96 restricted stock units that will vest and settle in shares of the Company's common stock on a one-for-one basis on February 23, 2012.

Basically, I want to keep "F1", which is the tag's ID, and write it to the file together with the text.

Case 2:

<exerciseDate>
<footnoteId id="F5"/>
</exerciseDate>

I want to write this to the CSV file as follows.

Case 2, desired output:

F5

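I can't find the right code to write this to a file.

Unfortunately, I don't have any code ready.

It would be very helpful if you could show me how to solve either of these cases.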

In short:

soup.find('footnote').get('id') 

Example code:

from bs4 import BeautifulSoup as BS
text = '''<footnotes>
<footnote id="F1">Includes 4,675.96 restricted stock units that will vest and settle in shares of the Company's common stock on a one-for-one basis on February 23, 2012.</footnote>
</footnotes>
<exerciseDate>
<footnoteId id="F5"/>
</exerciseDate>'''
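soup = BS(text, 'html.parser')
# Case 1: print the id attribute and the text of the <footnote> tag
item = soup.find('footnote')
print(item.get('id'), item.get_text())
# Case 2: html.parser lowercases tag names, so <footnoteId> is found as 'footnoteid'
item = soup.find('footnoteid')
print(item.get('id'))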
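
The question also asks about writing the result to a CSV file, which the snippet above does not cover. A minimal sketch of one way to do it with the standard csv module follows. The helper names (footnote_rows, footnote_id_rows, write_csv) are made up for illustration, and the sketch writes to an in-memory buffer rather than a real file.

```python
import csv
import io

from bs4 import BeautifulSoup

text = '''<footnotes>
<footnote id="F1">Includes 4,675.96 restricted stock units that will vest and settle in shares of the Company's common stock on a one-for-one basis on February 23, 2012.</footnote>
</footnotes>
<exerciseDate>
<footnoteId id="F5"/>
</exerciseDate>'''


def footnote_rows(markup):
    """Case 1: one [id, text] row per <footnote> tag (hypothetical helper)."""
    soup = BeautifulSoup(markup, 'html.parser')
    return [[tag.get('id'), tag.get_text(strip=True)]
            for tag in soup.find_all('footnote')]


def footnote_id_rows(markup):
    """Case 2: one [id] row per <footnoteId> tag (hypothetical helper)."""
    soup = BeautifulSoup(markup, 'html.parser')
    # html.parser lowercases tag names, so search for 'footnoteid'.
    return [[tag.get('id')] for tag in soup.find_all('footnoteid')]


def write_csv(rows, handle):
    """Write the rows to any file-like object opened in text mode."""
    csv.writer(handle).writerows(rows)


# An in-memory buffer stands in for a real file in this sketch.
buffer = io.StringIO()
write_csv(footnote_rows(text) + footnote_id_rows(text), buffer)
csv_output = buffer.getvalue()
print(csv_output)
```

To write to disk instead, the buffer could be replaced with something like open('footnotes.csv', 'w', newline='') (the file name is only an example). Because the footnote text contains a comma, the csv module puts that field in quotes, so the ID and the text stay in separate columns.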

Here is a short code example to help you get started:

from bs4 import BeautifulSoup
html_text = """
<footnotes>
<footnote id="F1">Includes 4,675.96 restricted stock units that will vest and settle in shares of the Company's common stock on a one-for-one basis on February 23, 2012.</footnote>
</footnotes>
"""
# ~~ Parse HTML ~~ #
soup = BeautifulSoup(html_text,'html.parser')
# ~~ Find footnote tags in the html ~~ #
footnote_tag = soup.find("footnote")
# From footnote tag, get id
footnote_id = footnote_tag['id']
# From footnote tag, get text
footnote_text = footnote_tag.get_text()
# Combine the id and the text, matching the "F1 Includes ..." layout asked for
return_statement = "{0} {1}".format(footnote_id, footnote_text)
print(return_statement)
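
One thing to watch in the example above: soup.find() returns None when no matching tag exists, and footnote_tag['id'] raises a KeyError when the tag has no id attribute, whereas .get('id') returns None. A small defensive sketch is below; the function name id_and_text is hypothetical.

```python
from bs4 import BeautifulSoup


def id_and_text(markup, tag_name):
    """Return (id, text) for the first matching tag, or None if there is no such tag."""
    soup = BeautifulSoup(markup, 'html.parser')
    tag = soup.find(tag_name)
    if tag is None:
        # No tag with this name in the markup.
        return None
    # .get('id') gives None instead of raising when the attribute is absent.
    return tag.get('id'), tag.get_text(strip=True)


print(id_and_text('<footnote id="F1">Note text</footnote>', 'footnote'))
print(id_and_text('<footnote>No id here</footnote>', 'footnote'))
print(id_and_text('<p>unrelated</p>', 'footnote'))
```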
