How to scrape a line of text inside a <div> tag that is itself nested inside a <div class="..."> tag

<div class="style__font-bold___1k9Dl style__font-14px___YZZrf style__flex-row___2AKyf style__space-between___2mbvn style__padding-bottom-5px___2NrDR">
<div>Augmentin 625 Duo Tablet</div></div>

I want to scrape the text "Augmentin 625 Duo Tablet", but I can't seem to get it.

The code I'm currently using:

import requests
import bs4
import lxml
result=requests.get("https://www.pharmadude.com")
#print((type(result)))

soup = bs4.BeautifulSoup(result.text,"lxml")
#print(soup)
scrape=soup.find_all('div', attrs={'class': 'style__font-bold___1k9Dl style__font-14px___YZZrf style__flex-row___2AKyf style__space-between___2mbvn style__padding-bottom-5px___2NrDR'})
for div in scrape:
    bar=soup.find_all('div')
    print(bar.text)

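You define bar with find_all, so bar is a list (a ResultSet) and has no text attribute. Use find instead of find_all, and call it on the loop variable div rather than on soup. For example: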

import requests
import bs4
import lxml
result=requests.get("https://www.1mg.com/drugs-all-medicines")
#print((type(result)))
soup = bs4.BeautifulSoup(result.text,"lxml")
#print(soup)
scrape=soup.find_all('div', attrs={'class': 'style__font-bold___1k9Dl style__font-14px___YZZrf style__flex-row___2AKyf style__space-between___2mbvn style__padding-bottom-5px___2NrDR'})
for div in scrape:
    bar=div.find('div')
    print(bar.text)

(You can also refer to this answer.)
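A minimal offline sketch of the find vs find_all difference, assuming the HTML snippet from the question and Python's built-in "html.parser" (so lxml is not needed). It shows that find_all() hands back a list-like ResultSet without a .text attribute, while find() returns a single tag whose .text works:

```python
from bs4 import BeautifulSoup

# The snippet from the question, parsed offline instead of fetched.
HTML = """<div class="style__font-bold___1k9Dl style__font-14px___YZZrf style__flex-row___2AKyf style__space-between___2mbvn style__padding-bottom-5px___2NrDR">
<div>Augmentin 625 Duo Tablet</div></div>"""

soup = BeautifulSoup(HTML, "html.parser")

# find_all() returns a ResultSet (a list of Tags); a list has no .text,
# which is why bar.text fails in the question's loop.
all_divs = soup.find_all("div")
print(hasattr(all_divs, "text"))  # False

# find() returns a single Tag (or None), so .text works on it.
# class_ matches any one of the tag's space-separated classes.
outer = soup.find("div", class_="style__font-bold___1k9Dl")
inner = outer.find("div") if outer is not None else None
if inner is not None:
    print(inner.text)  # Augmentin 625 Duo Tablet
```

The None checks are an extra safeguard: on a live page where the class names change, find() quietly returns None instead of a tag.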

import requests
import bs4
import lxml
result=requests.get("https://www.1mg.com/drugs-all-medicines")
soup = bs4.BeautifulSoup(result.text,"lxml")
box = soup.find('a', attrs={"href": "/drugs/augmentin-625-duo-tablet-138629"})
text_content = box.find('span')
for paragraph in text_content.find_all('p'):
    print(paragraph.text)
price = text_content.find('div').find('span').text
print(price)

Output:

Augmentin 625 Duo Tablet
Prescription Required
strip of 10 tablets
Glaxo SmithKline Pharmaceuticals Ltd
Amoxycillin  (500mg) +  Clavulanic Acid (125mg)
MRP ₹200.59

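First, identify the box (the <a> link) that contains your data. Then get the span that holds the text. For each paragraph inside that span, print its text.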

If you're interested in the price, step into the div and then further into its span.
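The same box, span, paragraphs, price walk can be tried without hitting the site. The markup below is a hypothetical, simplified stand-in shaped to match what the code above navigates, not the real 1mg page, and the None checks are an extra safeguard not in the original answer (find() returns None when nothing matches):

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real 1mg listing may differ.
HTML = """
<a href="/drugs/augmentin-625-duo-tablet-138629">
  <span>
    <p>Augmentin 625 Duo Tablet</p>
    <p>Prescription Required</p>
    <div><span>MRP ₹200.59</span></div>
  </span>
</a>
"""

soup = BeautifulSoup(HTML, "html.parser")

# 1. Identify the box (the <a> link) that contains the product data.
box = soup.find("a", href="/drugs/augmentin-625-duo-tablet-138629")
# 2. Get the first <span> inside it, which holds the text.
text_content = box.find("span") if box is not None else None

paragraphs = []
price = None
if text_content is not None:
    # 3. Collect the text of every <p> inside the span.
    paragraphs = [p.get_text(strip=True) for p in text_content.find_all("p")]
    # 4. For the price, step into the <div> and then its <span>.
    price_div = text_content.find("div")
    price_span = price_div.find("span") if price_div is not None else None
    if price_span is not None:
        price = price_span.get_text(strip=True)

for line in paragraphs:
    print(line)
print(price)
```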

import requests
import bs4
import lxml
result=requests.get("https://www.1mg.com/drugs-all-medicines")
soup = bs4.BeautifulSoup(result.text,"lxml")
scrape=soup.find_all('div', attrs={'class': 'style__font-bold___1k9Dl style__font-14px___YZZrf style__flex-row___2AKyf style__space-between___2mbvn style__padding-bottom-5px___2NrDR'})
bar = soup.find_all('div')[1].text
print(bar)

Output: Augmentin 625 Duo Tablet

This code does exactly what you need.

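soup.find_all returns a list, so you need to pick the item you want out of that list and then read its .text attribute. In your case the item is the second one in the list, so use index 1.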
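A small sketch of the indexing idea on the question's snippet. On a full page, index 1 only lands on the product name if the layout puts exactly one <div> before it, so a CSS selector via soup.select() is shown as a less position-dependent alternative; the selector is a suggestion, not part of the original answer:

```python
from bs4 import BeautifulSoup

HTML = """<div class="style__font-bold___1k9Dl style__font-14px___YZZrf style__flex-row___2AKyf style__space-between___2mbvn style__padding-bottom-5px___2NrDR">
<div>Augmentin 625 Duo Tablet</div></div>"""

soup = BeautifulSoup(HTML, "html.parser")

divs = soup.find_all("div")   # ResultSet: [outer div, inner div]
by_index = divs[1].text       # index 1 = the second <div>, the inner one

# Alternative: select any <div> that is a direct child of a <div>
# carrying the style__font-bold___1k9Dl class.
by_selector = [d.get_text(strip=True)
               for d in soup.select("div.style__font-bold___1k9Dl > div")]

print(by_index)      # Augmentin 625 Duo Tablet
print(by_selector)   # ['Augmentin 625 Duo Tablet']
```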
