Scrape text from a web page's heading tags and save it to a CSV file



I'm trying to extract the text from all h3 and h4 tags on a page and save it to a CSV file:

Sample:

<div class="vc_column-inner">
<div class="wpb_wrapper">
<div class="wpb_text_column wpb_content_element ">
<div class="wpb_wrapper">
<h4>service text</h4>
</div>
</div>
<div class="wpb_text_column wpb_content_element ">
<div class="wpb_wrapper">
<h3 style="color: #2ac4ea; font-size: 35px;">2.900</h3>
</div>
</div>
</div> 
</div>
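Since the goal is every h3 and h4 on the page, one option is to pass a list of tag names to find_all(), which returns the matches in document order. The sketch below inlines the sample markup so it runs without a network request. The names SAMPLE_HTML and headings are illustrative only. Pairing services with prices this way assumes the h4 and h3 always appear in a consistent order.

```python
from bs4 import BeautifulSoup

# Sample markup from the question, inlined so no network request is needed.
SAMPLE_HTML = """
<div class="vc_column-inner">
<div class="wpb_wrapper">
<div class="wpb_text_column wpb_content_element ">
<div class="wpb_wrapper">
<h4>service text</h4>
</div>
</div>
<div class="wpb_text_column wpb_content_element ">
<div class="wpb_wrapper">
<h3 style="color: #2ac4ea; font-size: 35px;">2.900</h3>
</div>
</div>
</div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all() accepts a list of tag names and returns matches in document
# order, so this collects every h3 and h4, not just the first of each.
headings = soup.find_all(["h3", "h4"])
for tag in headings:
    print(tag.name, tag.get_text(strip=True))
```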

My code:

import pandas as pd
import requests
from bs4 import BeautifulSoup

service = []
price = []
url = 'https://www.site.com'  # placeholder URL; requests needs the scheme
reqs = requests.get(url)
soup = BeautifulSoup(reqs.content, 'html.parser')

for div in soup.find_all(class_='row'):
    for div1 in div.find_all(class_='vc_column-inner'):
        services = div1.find('h4')  # a bs4 Tag object, not a string
        prices = div1.find('h3')
        service.append(services)
        price.append(prices)

df = pd.DataFrame({'service': service, 'price': price})
df.to_csv('results.csv', index=False, encoding='utf-8')

Result:

service,price
<h4>service text</h4>,"<h3 style=""color: #2ac4ea; font-size: 35px;"">2.900</h3>"
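A likely explanation for the output above: find() returns bs4 Tag objects, and a Tag converted to a string is its full markup, not just its text. When the row is written as CSV, a field containing double quotes is wrapped in quotes and its inner quotes are doubled, which is where ""color: ..."" comes from. The sketch below reproduces that row with the standard csv module instead of pandas, an assumption made to keep the example small.

```python
import csv
import io

from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<h4>service text</h4>'
    '<h3 style="color: #2ac4ea; font-size: 35px;">2.900</h3>',
    "html.parser",
)
service, price = soup.find("h4"), soup.find("h3")

# Converting a Tag to str gives its full markup, not just its text.
print(str(service))

# With the default QUOTE_MINIMAL quoting, a field that contains a double
# quote is wrapped in quotes and each inner quote is doubled.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerow([str(service), str(price)])
print(buf.getvalue())
```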

What I need is:

service,price
service text,2.900

Is that possible? Thanks.

When appending the services and prices variables to the lists, call the .get_text() method on them:

service.append(services.get_text(strip=True))
price.append(prices.get_text(strip=True))

The result will then be:

service,price
service text,2.900
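get_text(strip=True) strips leading and trailing whitespace from each text fragment inside the tag before joining them, which removes any padding around the heading text. One caveat: the fragments are joined with an empty separator by default, so a heading containing nested tags can run words together. Passing a separator as the first argument avoids that. The markup strings in this sketch are made up for illustration.

```python
from bs4 import BeautifulSoup

# Heading text padded with whitespace, as often happens in real pages.
padded = BeautifulSoup("<h4>\n  service text \n</h4>", "html.parser").h4
raw = padded.get_text()               # keeps the surrounding whitespace
clean = padded.get_text(strip=True)   # strips each text fragment

# Heading whose text is split across a nested tag.
nested = BeautifulSoup("<h4>service <b>text</b></h4>", "html.parser").h4
joined = nested.get_text(strip=True)        # fragments joined with ""
spaced = nested.get_text(" ", strip=True)   # fragments joined with a space

print(repr(raw), repr(clean))
print(repr(joined), repr(spaced))
```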

Full code:

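import pandas as pd
import requests
from bs4 import BeautifulSoup

service = []
price = []
url = 'https://www.site.com'  # placeholder URL; requests needs the scheme
reqs = requests.get(url)
soup = BeautifulSoup(reqs.content, 'html.parser')

for div in soup.find_all(class_='row'):
    for div1 in div.find_all(class_='vc_column-inner'):
        services = div1.find('h4')
        prices = div1.find('h3')
        # .get_text() keeps only the tag's text; it raises AttributeError
        # if find() returned None because a block has no h4/h3.
        service.append(services.get_text(strip=True))  # <-- .get_text()
        price.append(prices.get_text(strip=True))      # <-- .get_text()

df = pd.DataFrame({'service': service, 'price': price})
df.to_csv('results.csv', index=False, encoding='utf-8')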
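The code above fails with AttributeError if any .vc_column-inner block lacks an h4 or h3, because find() then returns None. Below is a sketch of one way to guard against that, under a few assumptions: the hypothetical helper text_or_empty() writes an empty cell for a missing tag, the blocks are searched directly rather than through the outer .row divs, and the standard csv module stands in for pandas to keep dependencies down. The sample HTML is made up, with a second block that has no price.

```python
import csv
import io

from bs4 import BeautifulSoup


def text_or_empty(tag):
    """Hypothetical helper: stripped text of a Tag, or "" if find() gave None."""
    return tag.get_text(strip=True) if tag is not None else ""


def extract_rows(html):
    """Collect (service, price) pairs from every .vc_column-inner block."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for block in soup.find_all(class_="vc_column-inner"):
        rows.append((text_or_empty(block.find("h4")),
                     text_or_empty(block.find("h3"))))
    return rows


def write_csv(rows, fileobj):
    """Write a header plus one line per pair, like df.to_csv(index=False)."""
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow(["service", "price"])
    writer.writerows(rows)


sample = """
<div class="row">
  <div class="vc_column-inner">
    <h4>service text</h4>
    <h3 style="color: #2ac4ea; font-size: 35px;">2.900</h3>
  </div>
  <div class="vc_column-inner">
    <h4>block without a price</h4>
  </div>
</div>
"""

buf = io.StringIO()
write_csv(extract_rows(sample), buf)
print(buf.getvalue())
```

To write a real file, pass open('results.csv', 'w', newline='', encoding='utf-8') to write_csv() instead of the StringIO buffer; the csv module documentation recommends newline='' for file objects it writes to.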
