Web parsing: get the fourth <p> tag in a div



I have this HTML code:

<!--- (...) --->
<article>
<div class="card">
<p class="date">08.10.2020 - 14:55</p>
<p class="customer">
<a class="story-customer">Customer Name</a>
</p>
<h1>TITLE</h1>
<div class="story-sharing"></div>
<p>
<i class="story-city"><a>CITY</a></i>
</p>
<p>
"IMPORTANT TEXT"
</p>
<!--- (...) --->
</div>
</article>

我需要解析标题(h1标签(、城市和";重要文本"。我用解析了h1标签

import requests
from bs4 import BeautifulSoup as bs

def connection(url):
    return requests.get(url)

def connectionSoup(url):
    return bs(connection(url).content, 'html.parser')

def get_title(url):
    return connectionSoup(url).h1.text

But I don't know how to get the fourth p tag with BeautifulSoup.
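If the position really is fixed, the fourth <p> can be picked directly by index. The sketch below assumes the layout from the question (four <p> elements as direct children of div.card) and shows two ways: indexing the result of find_all("p"), and the CSS selector p:nth-of-type(4) via select_one. The names card, fourth_by_index and fourth_by_css are only illustrative.

```python
from bs4 import BeautifulSoup

# Copy of the HTML from the question, with the "(...)" placeholder comments dropped.
html_doc = """
<article>
<div class="card">
<p class="date">08.10.2020 - 14:55</p>
<p class="customer">
<a class="story-customer">Customer Name</a>
</p>
<h1>TITLE</h1>
<div class="story-sharing"></div>
<p>
<i class="story-city"><a>CITY</a></i>
</p>
<p>
"IMPORTANT TEXT"
</p>
</div>
</article>
"""

soup = BeautifulSoup(html_doc, "html.parser")
card = soup.find("div", class_="card")

# Option 1: every <p> inside the card, in document order; index 3 is the fourth one.
fourth_by_index = card.find_all("p")[3].get_text(strip=True)

# Option 2: CSS selector; :nth-of-type(4) matches the fourth <p> among the card's direct children.
fourth_by_css = soup.select_one("div.card > p:nth-of-type(4)").get_text(strip=True)

print(fourth_by_index)
print(fourth_by_css)
```

Both lines print "IMPORTANT TEXT" (with its quotes) for this markup, but a positional index silently picks the wrong paragraph if the site adds or removes a <p> above it, so anchoring on a nearby element with a stable class is usually more robust.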

from bs4 import BeautifulSoup
html_doc = '''
<!--- (...) --->
<article>
<div class="card">
<p class="date">08.10.2020 - 14:55</p>
<p class="customer">
<a class="story-customer">Customer Name</a>
</p>
<h1>TITLE</h1>
<div class="story-sharing"></div>
<p>
<i class="story-city"><a>CITY</a></i>
</p>
<p>
"IMPORTANT TEXT"
</p>
<!--- (...) --->
</div>
</article>'''
soup = BeautifulSoup(html_doc, 'html.parser')
# h1 tag:
print(soup.h1.text)
# CITY:
print(soup.find(class_="story-city").text)
# IMPORTANT TEXT (the <p> tag that follows the CITY element):
print(soup.find(class_="story-city").find_next('p').text.strip())

Prints:

TITLE
CITY
"IMPORTANT TEXT"
