Selenium: scraping multiple attributes from the same block at once



I have a web page that follows this pattern:

<a class="card cardlisting0"  href="abc/def/gh.com">
<div class="contentWrapper"> 
<div class="card-content">
<time datetime="2020-05-31">3 hours ago</time>
</div>
</div>
</a>
<a class="card cardlisting1"  href="ijk/lmn/op.com">
<div class="contentWrapper">
<div class="card-content">
<time datetime="2020-04-30">20200430</time>
</div>
</div>
</a>
...

I want to scrape the href and datetime attributes as pairs: [abc/def/gh.com, 2020-05-31], [ijk/lmn/op.com, 2020-04-30]

How can I do this?

Thanks.

You can try the following:

from bs4 import BeautifulSoup
t='''<a class="card cardlisting0"  href="abc/def/gh.com">
<div class="contentWrapper"> 
<div class="card-content">
<time datetime="2020-05-31">3 hours ago</time>
</div>
</div>
</a>
<a class="card cardlisting1"  href="ijk/lmn/op.com">
<div class="contentWrapper">
<div class="card-content">
<time datetime="2020-04-30">20200430</time>
</div>
</div>
</a>'''
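soup = BeautifulSoup(t, "lxml")  # needs the lxml package; the built-in "html.parser" also works
aTags = soup.select('a')
data = []
for aTag in aTags:
    # search for <time> inside this <a> only, so each href stays paired with its own date
    timeTag = aTag.select_one('time')
    data.append([aTag.get('href'), timeTag['datetime'] if timeTag else None])
print(data)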

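With Selenium, pass the rendered HTML (driver.page_source) in place of t, e.g. soup = BeautifulSoup(driver.page_source, "lxml").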

Output:

[['abc/def/gh.com', '2020-05-31'], ['ijk/lmn/op.com', '2020-04-30']]
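If you would rather not depend on BeautifulSoup and lxml, the same per-card pairing can be sketched with the standard library's html.parser. This is a swapped-in technique, not part of the answer above; it assumes each card is an `<a>` whose class list contains `card`, and it takes the first `<time datetime>` inside each card. `CardPairParser` and `extract_pairs` are hypothetical names.

```python
from html.parser import HTMLParser


class CardPairParser(HTMLParser):
    """Collect [href, datetime] pairs: one per card <a>, using the first <time datetime> inside it."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._current = None  # [href, datetime] for the card <a> currently open, else None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Assumption: cards are <a> elements whose class list includes "card"
        if tag == "a" and "card" in (attrs.get("class") or "").split():
            self._current = [attrs.get("href"), None]
        elif tag == "time" and self._current is not None and self._current[1] is None:
            self._current[1] = attrs.get("datetime")

    def handle_endtag(self, tag):
        # Closing the card finalizes its pair (datetime stays None if no <time> was seen)
        if tag == "a" and self._current is not None:
            self.pairs.append(self._current)
            self._current = None


def extract_pairs(html):
    """Parse an HTML string (for example driver.page_source) and return the collected pairs."""
    parser = CardPairParser()
    parser.feed(html)
    parser.close()
    return parser.pairs


sample = (
    '<a class="card cardlisting0" href="abc/def/gh.com"><div class="card-content">'
    '<time datetime="2020-05-31">3 hours ago</time></div></a>'
    '<a class="card cardlisting1" href="ijk/lmn/op.com"><div class="card-content">'
    '<time datetime="2020-04-30">20200430</time></div></a>'
)
print(extract_pairs(sample))
# [['abc/def/gh.com', '2020-05-31'], ['ijk/lmn/op.com', '2020-04-30']]
```

Because a pair is only emitted when its `<a>` closes, a card without a `<time>` yields `[href, None]` instead of shifting later dates onto the wrong link.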

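You can also do it directly in Selenium with find_elements() and get_attribute(). The older find_elements_by_xpath() style helpers were removed in Selenium 4.3, so the current form is find_elements(By.XPATH, ...):

from selenium.webdriver.common.by import By

# for the hrefs (match every card, not only cardlisting0)
urls = [a.get_attribute('href') for a in driver.find_elements(By.XPATH, '//a[contains(@class, "cardlisting")]')]
# for the datetimes
dates = [time_element.get_attribute('datetime') for time_element in driver.find_elements(By.XPATH, '//a[contains(@class, "cardlisting")]//time')]
# pair them up; this assumes every card contains exactly one <time>
pairs = [[url, date] for url, date in zip(urls, dates)]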
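Collecting hrefs and datetimes in two separate page-wide lists only lines up if every card has exactly one `<time>`. A minimal sketch that keeps each pair together instead loops over the cards and searches inside each one with a relative XPath (`.//time`). `scrape_card_pairs` is a hypothetical helper, the `cardlisting` class match is an assumption based on the sample markup, and the string `"xpath"` is used because it is the value of `By.XPATH`; in real code you would normally import `By` and write `By.XPATH`.

```python
def scrape_card_pairs(driver):
    """Return one [href, datetime] pair per card link, reading each card's <time> via a relative XPath."""
    pairs = []
    # "xpath" is the string value of selenium.webdriver.common.by.By.XPATH
    for card in driver.find_elements("xpath", '//a[contains(@class, "cardlisting")]'):
        # The leading "." scopes the search to this card instead of the whole page
        times = card.find_elements("xpath", ".//time")
        # None (rather than a shifted date) when a card has no <time>
        date = times[0].get_attribute("datetime") if times else None
        pairs.append([card.get_attribute("href"), date])
    return pairs
```

Usage, with a live driver after `driver.get(...)`: `pairs = scrape_card_pairs(driver)`. In a real browser, `get_attribute('href')` typically returns the resolved absolute URL rather than the relative value in the markup; Selenium 4's `get_dom_attribute('href')` returns the attribute as written.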
