Extracting img src from HTML using BeautifulSoup4


<div id="thumbnailsImagePreview">
     <img src="getImage.do?imageSize=Small&amp;imageId=730645&amp;r=150521020" imageindex="0" hspace="0" vspace="0" loaded="false" class="selected">
     <img src="getImage.do?imageSize=Small&amp;imageId=7589956&amp;r=150521020" imageindex="1" hspace="0" vspace="0" loaded="false">
     <img src="getImage.do?imageSize=Small&amp;imageId=7590018&amp;r=150521020" imageindex="2" hspace="0" vspace="0" loaded="false">
     <img src="getImage.do?imageSize=Small&amp;imageId=2803850&amp;r=150521020" imageindex="3" hspace="0" vspace="0" loaded="false">
     <img src="getImage.do?imageSize=Small&amp;imageId=2973197&amp;r=150521020" imageindex="4" hspace="0" vspace="0" loaded="false">
     <img src="getImage.do?imageSize=Small&amp;imageId=7589888&amp;r=150521020" imageindex="5" hspace="0" vspace="0" loaded="false">
     <img src="getImage.do?imageSize=Small&amp;imageId=7877267&amp;r=150521020" imageindex="6" hspace="0" vspace="0" loaded="false">
     <img src="getImage.do?imageSize=Small&amp;imageId=7877375&amp;r=150521020" imageindex="7" hspace="0" vspace="0" loaded="false">
     <img src="getImage.do?imageSize=Small&amp;imageId=6812892&amp;r=150521020" imageindex="8" hspace="0" vspace="0" loaded="false">
</div>

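I am trying to extract the img src values from this HTML (for the links that have an associated imageindex), but because they are all contained in the div with id "thumbnailsImagePreview", the line of code below gives me one large block of text, and I cannot parse out the individual img src links.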

images = soup.find_all('div', attrs = {'id' : 'thumbnailsImagePreview'})

How can I get an array of the links?

When I print images, this is what I get:

[<div id="thumbnailsImagePreview">\n<img class="selected" hspace="0" 
imageindex="0" loaded="false" src="getImage.do?
imageSize=Small&amp;imageId=730645&amp;r=150521020" vspace="0"/>\n<img 
hspace="0" imageindex="1" loaded="false" src="getImage.do?
imageSize=Small&amp;imageId=7589956&amp;r=150521020" vspace="0"/>\n<img 
hspace="0" imageindex="2" loaded="false" src="getImage.do?
imageSize=Small&amp;imageId=7590018&amp;r=150521020" vspace="0"/>\n<img 
hspace="0" imageindex="3" loaded="false" src="getImage.do?
imageSize=Small&amp;imageId=2803850&amp;r=150521020" vspace="0"/>\n<img 
hspace="0" imageindex="4" loaded="false" src="getImage.do?
imageSize=Small&amp;imageId=2973197&amp;r=150521020" vspace="0"/>\n<img 
hspace="0" imageindex="5" loaded="false" src="getImage.do?
imageSize=Small&amp;imageId=7589888&amp;r=150521020" vspace="0"/>\n<img 
hspace="0" imageindex="6" loaded="false" src="getImage.do?
imageSize=Small&amp;imageId=7877267&amp;r=150521020" vspace="0"/>\n<img 
hspace="0" imageindex="7" loaded="false" src="getImage.do?
imageSize=Small&amp;imageId=7877375&amp;r=150521020" vspace="0"/>\n<img 
hspace="0" imageindex="8" loaded="false" src="getImage.do?
imageSize=Small&amp;imageId=6812892&amp;r=150521020" vspace="0"/>\n<img 
hspace="0" imageindex="9" loaded="false" 
</div>]

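find_all returns a list containing the single div element, which is why you get one large block. You need to locate the inner img elements and read each one's src attribute value by treating the element as a dictionary:
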
image_srcs = [img['src'] for img in soup.select('#thumbnailsImagePreview img[src]')]

#thumbnailsImagePreview img[src] is a CSS selector that matches every img element that has a src attribute and sits under the element with id="thumbnailsImagePreview".
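
For completeness, here is a minimal runnable sketch of the approach described above. It assumes the markup is already available as a string (the html variable below is a shortened copy of the question's snippet with only three of the images) and uses the built-in "html.parser" backend, which is my choice rather than something stated in the question. It also shows what should be an equivalent lookup with find and find_all, in case you prefer not to use CSS selectors.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question (three of the nine images).
html = """
<div id="thumbnailsImagePreview">
     <img src="getImage.do?imageSize=Small&amp;imageId=730645&amp;r=150521020" imageindex="0" class="selected">
     <img src="getImage.do?imageSize=Small&amp;imageId=7589956&amp;r=150521020" imageindex="1">
     <img src="getImage.do?imageSize=Small&amp;imageId=7590018&amp;r=150521020" imageindex="2">
</div>
"""

# "html.parser" is an assumption; any parser installed for bs4 should work here.
soup = BeautifulSoup(html, "html.parser")

# CSS selector: every <img> that has a src attribute inside the div.
image_srcs = [img["src"] for img in soup.select("#thumbnailsImagePreview img[src]")]

# Alternative without CSS selectors: find the div, then the <img> tags inside it.
container = soup.find("div", id="thumbnailsImagePreview")
image_srcs_alt = [img["src"] for img in container.find_all("img", src=True)]

for src in image_srcs:
    print(src)
```

Note that the parser decodes the &amp;amp; entities, so each value in image_srcs contains a plain & between the query parameters. The values are also relative URLs, so they would still need to be joined with the page's base URL before being requested.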
