Scraping Giphy with Selenium: unable to retrieve the correct 'src' attribute

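I am trying to scrape giphy.com with the Python Selenium package. When I select the "src" attribute of the element located by my XPath, the value returned differs from what the browser's "Inspect" panel shows for the site.

It returns: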

data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7

whereas I want to extract the src value shown on the site:

src="https://media0.giphy.com/media/j6x5zFoaJN9rAejDfZ/giphy.gif?cid=ecf05e47v5k8qf29vp649xd8nsbba2c0ai8m6ftuifkrnipp&amp;rid=giphy.gif&amp;ct=g"

Strangely, when I ran this before, it gave me the value I wanted, but now it has stopped doing so!

from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'https://giphy.com/search/fall-over'
img_x_path = '//*[@id="react-target"]/div/div[6]/div[2]/div[1]/a[11]/div/picture/img'
#%%
# first initialise the driver and then get the webpage
def initialise_chrome():
    driver = webdriver.Chrome()
    driver.get(url)
    return driver

driver = initialise_chrome()
print(driver)
#%%
# then locate the img element with the pre-defined XPath
# (find_element_by_xpath was removed in Selenium 4; find_element(By.XPATH, ...) works in both 3 and 4)
x_path_req = driver.find_element(By.XPATH, img_x_path)

def retrive_image_link(x_path_req):
    print(x_path_req)
    # from that element, pick the src attribute
    image_link = x_path_req.get_attribute('src')
    print(image_link)

retrive_image_link(x_path_req)

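It looks to me like Giphy is putting a base64-encoded image in the <img> src rather than loading the image from a URL there. For example, when I inspect the page, I see this: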

<a href="https://giphy.com/gifs/1stLookTV-montreal-johnny-bananas-1st-look-tv-fCUCWxvDVyuE9gLQSC" class="giphy-gif  css-r2u7fp" tabindex="0" style="width: 248px; height: 136px; position: absolute; transform: translate3d(792px, 517px, 0px);">
  <div style="width: 248px; height: 136px; position: relative;">
    <picture>
      <source type="image/webp" srcset="https://media4.giphy.com/media/fCUCWxvDVyuE9gLQSC/200w.webp?cid=ecf05e478y8zdwetr9wadyulcwbss0njcd0gvhp8wbw1sf59&amp;rid=200w.webp&amp;ct=g">
      <img class="giphy-gif-img " src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" width="248" height="136" alt="fall over wipe out GIF by 1st Look" style="background: rgb(153, 51, 255);">
    </picture>
  </div>
</a>
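
As a quick check on what that src value actually contains, the data URI can be decoded with the standard library. This is only a sketch: the helper name describe_data_uri_gif is made up for illustration, and it assumes the payload is a GIF whose header stores the width and height as little-endian 16-bit integers right after the 6-byte signature.

```python
import base64
import struct


def describe_data_uri_gif(data_uri):
    """Return (signature, width, height) for a base64-encoded GIF data URI.

    Hypothetical helper for illustration; assumes a well-formed GIF payload.
    """
    # Everything after the first comma is the base64 payload.
    _, _, payload = data_uri.partition(",")
    raw = base64.b64decode(payload)
    # Bytes 0-5: ASCII signature, e.g. "GIF89a".
    signature = raw[:6].decode("ascii")
    # Bytes 6-9: logical screen width and height, little-endian unsigned shorts.
    width, height = struct.unpack("<HH", raw[6:10])
    return signature, width, height


placeholder = (
    "data:image/gif;base64,"
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)
print(describe_data_uri_gif(placeholder))  # ('GIF89a', 1, 1)
```

The decoded image is a 1x1 GIF, which suggests the src here is a tiny placeholder and not the animation itself.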

However, the <source> element above the <img> has the actual URL in its srcset attribute, so you can change your XPath to extract that instead.

I think the edited XPath would be this:

//*[@id="react-target"]/div/div[6]/div[2]/div[1]/a[11]/div/picture/source/@srcset
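
One caveat when using this from Selenium: find_element expects the XPath to resolve to an element, so an expression ending in /@srcset is rejected as an invalid selector. A possible workaround, sketched below, is to locate the <source> element and read srcset with get_attribute. The names first_url_from_srcset and get_gif_url are hypothetical, and the naive comma split assumes the URLs themselves contain no commas.

```python
def first_url_from_srcset(srcset):
    """Return the URL of the first candidate in a srcset string, or None.

    Hypothetical helper; assumes candidates are comma-separated and that
    the URLs themselves contain no commas.
    """
    if not srcset:
        return None
    first_candidate = srcset.split(",")[0].strip()
    # A candidate looks like "URL [descriptor]"; keep only the URL part.
    return first_candidate.split()[0] if first_candidate else None


def get_gif_url(driver, img_xpath):
    """Read the real media URL from the <source> next to the located <img>.

    Sketch only: assumes img_xpath ends in "/img" and that a <source>
    element sits in the same <picture>, as in the markup shown above.
    """
    # Imported lazily so first_url_from_srcset can be used without Selenium.
    from selenium.webdriver.common.by import By

    # Swap the trailing "/img" step for "/source".
    source_xpath = img_xpath.rsplit("/", 1)[0] + "/source"
    source = driver.find_element(By.XPATH, source_xpath)
    return first_url_from_srcset(source.get_attribute("srcset"))


# The parsing helper can be tried on its own with a made-up srcset value:
print(first_url_from_srcset("small.webp 1x, large.webp 2x"))  # small.webp
```

With the driver and img_x_path from the question, the call would be get_gif_url(driver, img_x_path).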
