Scraping Giphy with Selenium: unable to retrieve the correct 'src' attribute



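I am trying to scrape giphy.com with the Python selenium package. When I select the desired "src" attribute via XPath, it returns something different from what the website's "inspect" panel shows.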

It returns:

data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7

whereas I want to extract the src shown on the website:

src="https://media0.giphy.com/media/j6x5zFoaJN9rAejDfZ/giphy.gif?cid=ecf05e47v5k8qf29vp649xd8nsbba2c0ai8m6ftuifkrnipp&rid=giphy.gif&ct=g"

Strangely, when I ran this before it gave me the element I wanted, but now it has stopped doing so!

from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'https://giphy.com/search/fall-over'
img_x_path = '//*[@id="react-target"]/div/div[6]/div[2]/div[1]/a[11]/div/picture/img'
#%%
# first initialise the driver and then get the webpage
def initialise_chrome():
    driver = webdriver.Chrome()
    driver.get(url)
    return driver

driver = initialise_chrome()
print(driver)
#%%
# then locate the img element with the pre-defined XPath
x_path_req = driver.find_element(By.XPATH, img_x_path)

def retrieve_image_link(x_path_req):
    print(x_path_req)
    # from that element, pick the src attribute
    image_link = x_path_req.get_attribute('src')
    print(image_link)
    return image_link

retrieve_image_link(x_path_req)

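It looks to me as though Giphy is using a base64-encoded image for the img element rather than loading the image from a URL source. For example, when I inspect the page, I see this: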

<a href="https://giphy.com/gifs/1stLookTV-montreal-johnny-bananas-1st-look-tv-fCUCWxvDVyuE9gLQSC" class="giphy-gif  css-r2u7fp" tabindex="0" style="width: 248px; height: 136px; position: absolute; transform: translate3d(792px, 517px, 0px);">
<div style="width: 248px; height: 136px; position: relative;">
<picture>
<source type="image/webp" srcset="https://media4.giphy.com/media/fCUCWxvDVyuE9gLQSC/200w.webp?cid=ecf05e478y8zdwetr9wadyulcwbss0njcd0gvhp8wbw1sf59&amp;rid=200w.webp&amp;ct=g">
<img class="giphy-gif-img " src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" width="248" height="136" alt="fall over wipe out GIF by 1st Look" style="background: rgb(153, 51, 255);">
</picture>
</div>
</a>
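As a quick sanity check on the base64 observation, the data URI from the img tag can be decoded with the standard library alone. This is only a sketch: it reads the GIF signature and the width and height fields from the header, which suggests the src is a tiny placeholder rather than the real GIF. The variable names are my own.

```python
import base64
import struct

data_uri = "data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"

# Everything after the first comma is the base64 payload.
payload = data_uri.split(",", 1)[1]
raw = base64.b64decode(payload)

# A GIF file starts with a 6-byte signature, followed by the
# little-endian 16-bit width and height of the image.
signature = raw[:6]
width, height = struct.unpack("<HH", raw[6:10])

print(signature, width, height)
```

The header reports a GIF89a image of 1 by 1 pixels, so the src value is not the animation shown on the page.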

However, the <source> element above the <img> carries the actual URL in its srcset attribute, so you can change the XPath to extract that instead. I think it would be this:

//*[@id="react-target"]/div/div[6]/div[2]/div[1]/a[11]/div/picture/source/@srcset
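One caveat: as far as I know, Selenium's find_element expects the XPath to resolve to an element, so an expression ending in /@srcset is likely to be rejected. A minimal sketch of a workaround, assuming Selenium 4, is to stop the XPath at the <source> element and read the attribute with get_attribute. The helper names urls_from_srcset and fetch_gif_urls are hypothetical, and the srcset parsing is deliberately simple (it assumes the URLs themselves contain no commas).

```python
def urls_from_srcset(srcset):
    """Split a srcset value into candidate URLs, dropping width/density descriptors."""
    urls = []
    for candidate in srcset.split(","):
        parts = candidate.split()
        if parts:
            urls.append(parts[0])
    return urls


def fetch_gif_urls(driver, source_xpath):
    """Locate the <source> element and return the URLs in its srcset attribute."""
    # Imported lazily so urls_from_srcset can be used without Selenium installed.
    from selenium.webdriver.common.by import By

    source = driver.find_element(By.XPATH, source_xpath)
    return urls_from_srcset(source.get_attribute("srcset") or "")


# XPath from the answer, stopping at the <source> element instead of /@srcset.
SOURCE_XPATH = '//*[@id="react-target"]/div/div[6]/div[2]/div[1]/a[11]/div/picture/source'

# The srcset value from the inspected HTML above, used here to try the parser offline.
example = (
    "https://media4.giphy.com/media/fCUCWxvDVyuE9gLQSC/200w.webp"
    "?cid=ecf05e478y8zdwetr9wadyulcwbss0njcd0gvhp8wbw1sf59&rid=200w.webp&ct=g"
)
print(urls_from_srcset(example))
```

With a live driver from the question's initialise_chrome(), the call would be fetch_gif_urls(driver, SOURCE_XPATH). The positional XPath is fragile, since the div and a indices change whenever the page layout does.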
