How can I extract the name, email, and phone number and print them all on one line? Here is the content of mydivs:
<div class="card-name"><a href="contact.php?leaduuid=9dfe">Mike <b>Denis</b></a></div>
<div class="activity-value">mdniz@gmail.com</div>
<div class="activity-value">(233) 333-9814</div>
<div class="card-name"><a href="contact.php?leaduuid=78f3">Sami <b>Baney</b></a></div>
<div class="activity-value">sadt@gmail.com</div>
<div class="activity-value">(123) 763-2322</div>
I want the output to look like this:
Mike Denis, mdniz@gmail.com, (233) 333-9814
Sami Baney, sadt@gmail.com, (123) 763-2322
The closest I have been able to get is the code below:
mydivs = soup.find_all('div', ['card-name', 'activity-value'])
for div in mydivs:
    print(div)
Thanks
You can try this:
from bs4 import BeautifulSoup
import re
html_doc = '''
<div class="card-name"><a href="contact.php?leaduuid=9dfe">Mike <b>Denis</b></a></div>
<div class="activity-value">mdniz@gmail.com</div>
<div class="activity-value">(233) 333-9814</div>
<div class="card-name"><a href="contact.php?leaduuid=78f3">Sami <b>Baney</b></a></div>
<div class="activity-value">sadt@gmail.com</div>
<div class="activity-value">(123) 763-2322</div>
'''
soup = BeautifulSoup(html_doc, 'html.parser')
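mydivs = soup.find_all('div', ['card-name', 'activity-value'])
st = ''
for div in mydivs:
    # A phone number such as "(233) 333-9814" is the last field of a record,
    # so end the line there; otherwise append a comma and continue.
    if re.search(r'^\(\d{3}\)', div.text):
        st += f'{div.text}\n'
    else:
        st += f'{div.text}, '
print(st)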
Output:
Mike Denis, mdniz@gmail.com, (233) 333-9814
Sami Baney, sadt@gmail.com, (123) 763-2322
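The regex check above only works while every record ends with a phone number written as "(ddd) ...". As a rough alternative, here is a minimal sketch that starts a new record whenever it sees a card-name div instead, so it does not depend on the phone format. The helper name group_cards is hypothetical, and the sketch assumes every activity-value div belongs to the nearest preceding card-name div:

```python
from bs4 import BeautifulSoup

html_doc = '''
<div class="card-name"><a href="contact.php?leaduuid=9dfe">Mike <b>Denis</b></a></div>
<div class="activity-value">mdniz@gmail.com</div>
<div class="activity-value">(233) 333-9814</div>
<div class="card-name"><a href="contact.php?leaduuid=78f3">Sami <b>Baney</b></a></div>
<div class="activity-value">sadt@gmail.com</div>
<div class="activity-value">(123) 763-2322</div>
'''


def group_cards(html):
    """Hypothetical helper: return one comma-separated line per card-name div."""
    soup = BeautifulSoup(html, 'html.parser')
    records = []
    for div in soup.find_all('div', class_=['card-name', 'activity-value']):
        text = div.text.strip()
        if 'card-name' in div.get('class', []):
            # A card-name div opens a new record.
            records.append([text])
        elif records:
            # Any other matched div is a field of the current record.
            records[-1].append(text)
    return [', '.join(record) for record in records]


lines = group_cards(html_doc)
print('\n'.join(lines))
```

This keys on the class attribute, so a record with a missing phone number or an extra field would still end up on its own line.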
If your divs always follow the structure shown in your question (one <div class="card-name"> followed by two <div class="activity-value">), then you can do this:
from bs4 import BeautifulSoup
txt = '''<div class="card-name"><a href="contact.php?leaduuid=9dfe">Mike <b>Denis</b></a></div>
<div class="activity-value">mdniz@gmail.com</div>
<div class="activity-value">(233) 333-9814</div>
<div class="card-name"><a href="contact.php?leaduuid=78f3">Sami <b>Baney</b></a></div>
<div class="activity-value">sadt@gmail.com</div>
<div class="activity-value">(123) 763-2322</div>'''
soup = BeautifulSoup(txt, 'html.parser')
divs = soup.select('.card-name, .activity-value')
# Step through the flat list three at a time: name, email, phone.
for name, email, phone in zip(divs[::3], divs[1::3], divs[2::3]):
    print('Name: {}\tE-Mail: {}\t Phone: {}'.format(name.text, email.text, phone.text))
Prints:
Name: Mike Denis E-Mail: mdniz@gmail.com Phone: (233) 333-9814
Name: Sami Baney E-Mail: sadt@gmail.com Phone: (123) 763-2322
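Note that zip stops at the shortest slice, so a trailing incomplete record would be dropped silently. A possible variant, sketched here under the same assumption (each name is followed by exactly two activity-value siblings), looks up the fields relative to each card-name div with find_next_siblings rather than slicing a flat list. The variable name rows is only illustrative:

```python
from bs4 import BeautifulSoup

txt = '''<div class="card-name"><a href="contact.php?leaduuid=9dfe">Mike <b>Denis</b></a></div>
<div class="activity-value">mdniz@gmail.com</div>
<div class="activity-value">(233) 333-9814</div>
<div class="card-name"><a href="contact.php?leaduuid=78f3">Sami <b>Baney</b></a></div>
<div class="activity-value">sadt@gmail.com</div>
<div class="activity-value">(123) 763-2322</div>'''

soup = BeautifulSoup(txt, 'html.parser')

rows = []
for name in soup.select('div.card-name'):
    # Take the next two activity-value siblings after this name
    # (assumed to be the email and the phone number, in that order).
    fields = name.find_next_siblings('div', class_='activity-value', limit=2)
    rows.append(', '.join([name.text.strip()] + [f.text.strip() for f in fields]))

print('\n'.join(rows))
```

If a card had fewer than two fields of its own, this would pick up fields from the following card, so it is only as reliable as the assumption about the page structure.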