Find div classes by substring, then extract the entire class name



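I'm trying to find all divs whose class contains the substring "auction-results", and then extract the full class name. Here is an example: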

<div class="auction-results high-bid has-price"></div>

I can find all divs containing "auction-results" like this:

results = soup.select("div[class*=auction-results]")
results
Out: [<div class="auction-results high-bid has-price">
<i class="icon"></i>
<span class="lot-price">       $700,000</span>
</div>]
type(results)
Out: bs4.element.ResultSet

What I want is to store the entire class name "auction-results high-bid has-price" in a pandas column, like this:

class_text = ['auction-results high-bid has-price']
'auction-results high-bid has-price'
scraped_data = pd.DataFrame({'class_text': class_text})
scraped_data
class_text
0   auction-results high-bid has-price

I haven't found a solution yet, so I hope someone can help me. Thanks!

Try it like this:

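import pandas as pd

columns = ['class_text']
rows = []
for result in results:
    # result['class'] is a list of class tokens; join them back into one string
    rows.append(' '.join(result['class']))
# pass rows directly so each class string becomes its own row
scraped_data = pd.DataFrame(rows, columns=columns)
scraped_data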

Output:

class_text
0   auction-results high-bid has-price
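For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample markup is made up for illustration (only the first div comes from the question), and it assumes `bs4` and `pandas` are installed and that the built-in `html.parser` is good enough for the page.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample markup; only the first div comes from the question.
html_doc = """
<div class="auction-results high-bid has-price"><span class="lot-price">$700,000</span></div>
<div class="auction-results no-bid"></div>
<div class="lot-details"></div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# class*= does a substring match on the class attribute value.
results = soup.select('div[class*="auction-results"]')

# bs4 exposes the multi-valued class attribute as a list of tokens,
# so join the tokens with spaces to rebuild the full class string.
class_text = [" ".join(div.get("class", [])) for div in results]

# One row per matching div.
scraped_data = pd.DataFrame({"class_text": class_text})
print(scraped_data)
```

Building the DataFrame from a dict of lists gives one row per matched div, so it should keep working when the page contains more than one match.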

See the example below. You can treat the result as an HTML document and use lxml to extract the full class attribute value.

from lxml import html

results = '<div class="auction-results high-bid has-price"><i class="icon"></i><span class="lot-price">$700,000</span></div>'
tree = html.fromstring(results)
name = tree.xpath("//div[contains(@class,'auction-results')]/@class")
print(name)

It prints the full class name:

['auction-results high-bid has-price']
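One thing to keep in mind: `contains(@class, 'auction-results')` is a plain substring test, so it would also match a class such as `old-auction-results-archive`. If that matters, a common XPath 1.0 idiom is to pad the normalized attribute with spaces and match the whole token. The sketch below uses made-up markup (the second div and the wrapping `section` are hypothetical) to show the difference.

```python
from lxml import html

# Hypothetical markup: the second div only contains the substring,
# it does not carry "auction-results" as a separate class token.
doc = html.fromstring(
    '<section>'
    '<div class="auction-results high-bid has-price"></div>'
    '<div class="old-auction-results-archive"></div>'
    '</section>'
)

# Substring match: both divs qualify.
loose = doc.xpath("//div[contains(@class, 'auction-results')]/@class")

# Token match: wrap the normalized class value in spaces,
# then look for ' auction-results ' as a whole word.
strict = doc.xpath(
    "//div[contains(concat(' ', normalize-space(@class), ' '), "
    "' auction-results ')]/@class"
)

print(loose)
print(strict)
```

Either way, the `/@class` step returns the complete attribute string, so no joining is needed as it is with BeautifulSoup.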
