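I want to scrape a website. Can you tell me how to get only the text, in this format: "BEV,Enyaq Coupé iV vRS,Skoda,UK,Volkswagen"? Currently my output also includes the HTML tags etc.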
Thanks for your input!
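from bs4 import BeautifulSoup
import requests
import csv  # not used yet

# Download the article page and parse it with the lxml parser
source = requests.get('https://www.electrive.com/2022/02/13/skoda-reveals-uk-pricing-for-enyaq-coupe-iv-vrs/').text
soup = BeautifulSoup(source, 'lxml')
# find() with no arguments just returns the first tag in the document
article = soup.find()
# find_all() returns a ResultSet of whole <div class="tags"> elements, markup included
tags2 = article.find_all('div', class_='tags')
print(tags2)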
Output:
[<div class="tags">
<a href="https://www.electrive.com/tag/bev/" rel="tag">BEV</a><a href="https://www.electrive.com/tag/enyaq-coupe-iv-vrs/" rel="tag">Enyaq Coupé iV vRS</a><a href="https://www.electrive.com/tag/skoda/" rel="tag">Skoda</a><a href="https://www.electrive.com/tag/uk/" rel="tag">UK</a><a href="https://www.electrive.com/tag/volkswagen/" rel="tag">Volkswagen</a> </div>]
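You have to select your elements more specifically, because the information you want is in the <a> tags, and then iterate over the ResultSet, e.g. with a list comprehension: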
tags2 = [e.text for e in soup.find('div', class_='tags').find_all('a')]
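To check the list comprehension without hitting the live site, here is a small sketch that runs it against an offline copy of the markup from the question's output. Assumptions: the HTML string is pasted from that output (the real page may differ today), and the built-in 'html.parser' is used instead of 'lxml' so no extra parser has to be installed.

```python
from bs4 import BeautifulSoup

# Offline copy of the markup shown in the question's output, so the example
# runs without a network request (the live page may have changed since).
HTML = (
    '<div class="tags">'
    '<a href="https://www.electrive.com/tag/bev/" rel="tag">BEV</a>'
    '<a href="https://www.electrive.com/tag/enyaq-coupe-iv-vrs/" rel="tag">Enyaq Coupé iV vRS</a>'
    '<a href="https://www.electrive.com/tag/skoda/" rel="tag">Skoda</a>'
    '<a href="https://www.electrive.com/tag/uk/" rel="tag">UK</a>'
    '<a href="https://www.electrive.com/tag/volkswagen/" rel="tag">Volkswagen</a> </div>'
)

# 'html.parser' ships with Python, so lxml is not required for this sketch
soup = BeautifulSoup(HTML, 'html.parser')

# find() returns a single Tag (the first matching div); find_all('a') then
# returns a ResultSet of <a> Tags, and .text gives only their visible text
tags2 = [e.text for e in soup.find('div', class_='tags').find_all('a')]
print(tags2)
```

The key difference from the question's code is that the comprehension reads `.text` from each `<a>` instead of printing the `<div>` elements themselves.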
Alternatively, using CSS selectors:
tags2 = [e.text for e in soup.select('div.tags a')]
#output
['BEV', 'Enyaq Coupé iV vRS', 'Skoda', 'UK', 'Volkswagen']
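One practical reason to prefer the CSS selector version: `soup.find(...)` returns None when the div is missing, so chaining `.find_all('a')` onto it raises AttributeError, while `select()` simply returns an empty list. The sketch below wraps that in a helper; the name `extract_tags` and the sample HTML strings are hypothetical, and `get_text(strip=True)` is used to trim stray whitespace around the link text.

```python
from bs4 import BeautifulSoup


def extract_tags(html: str) -> list[str]:
    """Return the text of every <a> inside <div class="tags"> (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    # select() always returns a list, so a page without the tags div
    # yields [] instead of raising AttributeError on None.find_all(...)
    return [a.get_text(strip=True) for a in soup.select('div.tags a')]


page = '<div class="tags"><a rel="tag">BEV</a><a rel="tag"> Skoda </a></div>'
print(extract_tags(page))                     # ['BEV', 'Skoda']
print(extract_tags('<p>no tags here</p>'))    # []
```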
If you want a single string instead of a list, just join() the elements:
tags2 = ','.join([e.text for e in soup.find('div', class_='tags').find_all('a')])
#output
BEV,Enyaq Coupé iV vRS,Skoda,UK,Volkswagen
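Since the question already imports csv, the tags may end up in a CSV file. A sketch assuming that goal: `csv.writer` produces the same line as `','.join()` for these values, but it also quotes a field that contains a comma, which a plain join would not. The list is hard-coded here to match the output above, and the second row uses a made-up tag purely to show the quoting.

```python
import csv
import io

# Tag texts as produced by the list comprehension above
tags = ['BEV', 'Enyaq Coupé iV vRS', 'Skoda', 'UK', 'Volkswagen']

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(tags)
# csv.writer ends rows with '\r\n' by default
print(repr(buf.getvalue()))   # 'BEV,Enyaq Coupé iV vRS,Skoda,UK,Volkswagen\r\n'

# A hypothetical tag containing a comma is quoted, which ','.join() would not do
buf2 = io.StringIO()
csv.writer(buf2).writerow(['Skoda', 'Enyaq, Coupé'])
print(repr(buf2.getvalue()))  # 'Skoda,"Enyaq, Coupé"\r\n'
```

To write a real file instead of a StringIO, open it with `open('tags.csv', 'w', newline='', encoding='utf-8')`. The `newline=''` argument stops extra blank lines on Windows, and `encoding='utf-8'` keeps the "é" intact.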