Selenium: scraping multiple attributes from the same block at once

I have a web page that follows this pattern:

<a class="card cardlisting0"  href="abc/def/gh.com">
<div class="contentWrapper"> 
<div class="card-content">
<time datetime="2020-05-31">3 hours ago</time>
</div>
</div>
</a>
<a class="card cardlisting1"  href="ijk/lmn/op.com">
<div class="contentWrapper">
<div class="card-content">
<time datetime="2020-04-30">20200430</time>
</div>
</div>
</a>
...

I want to scrape the href and datetime attributes in pairs: [abc/def/gh.com, 2020-05-31], [ijk/lmn/op.com, 2020-04-30]

How can I achieve this?

Thanks.

You can try the following:

from bs4 import BeautifulSoup
t='''<a class="card cardlisting0"  href="abc/def/gh.com">
<div class="contentWrapper"> 
<div class="card-content">
<time datetime="2020-05-31">3 hours ago</time>
</div>
</div>
</a>
<a class="card cardlisting1"  href="ijk/lmn/op.com">
<div class="contentWrapper">
<div class="card-content">
<time datetime="2020-04-30">20200430</time>
</div>
</div>
</a>'''
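soup = BeautifulSoup(t, "lxml")
# each <a> card wraps exactly one <time> element
aTags = soup.select('a')
data = []
for aTag in aTags:
    timeTag = aTag.select_one('time')
    # read both attributes from the same card so they stay paired
    data.append([aTag.get('href'), timeTag['datetime']])
print(data)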

You can pass Selenium's response (for example, driver.page_source) in place of t.

Output:

[['abc/def/gh.com', '2020-05-31'], ['ijk/lmn/op.com', '2020-04-30']]

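In Python, you can use the find_elements_by_xpath() and get_attribute() functions, as follows:

# Note: Selenium 4.3+ removed find_elements_by_xpath(); there, use
# driver.find_elements(By.XPATH, ...) instead.
# for the hrefs (match every card, not only cardlisting0)
urls = [a.get_attribute('href') for a in driver.find_elements_by_xpath('//a[contains(@class, "cardlisting")]')]
# for the datetimes
dates = [time_element.get_attribute('datetime') for time_element in driver.find_elements_by_xpath('//a//time')]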
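
The two comprehensions above return separate lists, which only line up if every card contains exactly one time element. To keep each href with its own datetime, one option is to look up the time element relative to each card. The following is a minimal sketch, assuming Selenium's generic find_elements(by, value) and find_element(by, value) methods; scrape_pairs is a hypothetical helper name, and the locator strings "xpath" and "tag name" are the values behind By.XPATH and By.TAG_NAME, used directly so the function works with whatever driver object is passed in.

```python
def scrape_pairs(driver):
    """Return one [href, datetime] pair per card link (hypothetical helper)."""
    pairs = []
    # "xpath" is the string value of By.XPATH
    cards = driver.find_elements("xpath", '//a[contains(@class, "cardlisting")]')
    for card in cards:
        # "tag name" is the string value of By.TAG_NAME; the search is
        # scoped to this card, so the <time> always belongs to this <a>
        time_element = card.find_element("tag name", "time")
        pairs.append([card.get_attribute("href"),
                      time_element.get_attribute("datetime")])
    return pairs
```

Calling scrape_pairs(driver) after the page has loaded should give one pair per card. Selenium usually reports href resolved against the page URL, so the values may come back as absolute URLs rather than the relative paths shown in the markup.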
