How to extract images into a DataFrame with Beautiful Soup



How do I extract the images from the Asos website and put them into a DataFrame? I'd like the column heading to come out right.

import pandas as pd
import requests
from bs4 import BeautifulSoup

image = []  # collected image URLs, filled in by asos()

def asos(soup_in):
    # Image: every product tile <img> carries the class _2r9Zh0W
    image_div = soup_in.find_all('img', class_='_2r9Zh0W', alt=True)

    for container1 in image_div:
        container1 = container1["src"]
        print(container1)
        image.append(container1)

url = "https://www.asos.com/men/t-shirts-vests/cat/?cid=7616&nlid=mw%7Cclothing%7Cshop%20by%20product%7Ct-shirts%20%26%20vests&page=1"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0"}
results = requests.get(url, headers=headers)
soup = BeautifulSoup(results.text, "html.parser")
asos(soup)  # asos() takes a single argument

asos_t_shirt = pd.DataFrame({
    'Images': image,
})

It currently returns four random images, taken from the src attribute:

//images.asos-media.com/products/levis-2-horses-t-shirt-in-red/2116073-1-red?$n_480w$&wid=476&fit=constrain
//images.asos-media.com/products/levis-spaced-t-shirt-in-light-blue2/11168144-1-lightblue?$n_480w$&wid=476&fit=constrain
//images.asos-media.com/products/tommy-hilfiger-larg-patch-broken-flag-box-embroid-logo-long-sheef-to-in-white/23377256-1-white?$n_480w$&wid=476&fit=constrain
//images.asos-media.com/products/tommy-jeans-big-tall-corp-logo-t-shirt-in-twillight-navy/2345661-1-twilightnavy?$n_480w$&wid=476&fit=constrain

The img tags with that class look like this in the HTML:

<img alt="" class="_2r9Zh0W" data-auto-id="productTileImage" sizes="(min-width: 768px) 317px, 238px" src="//images.asos-media.com/products/under-armour-training-lockertag-logo-t-shirt-in-red/21982249-1-red?$n_480w$&amp;wid=476&amp;fit=constrain" srcset="//images.asos-media.com/products/under-armour-training-lockertag-logo-t-shirt-in-red/21982249-1-red?$n_240w$&amp;wid=238&amp;fit=constrain 238w,//images.asos-media.com/products/under-armour-training-lockertag-logo-t-shirt-in-red/21982249-1-red?$n_320w$&amp;wid=317&amp;fit=constrain 317w,//images.asos-media.com/products/under-armour-training-lockertag-logo-t-shirt-in-red/21982249-1-red?$n_480w$&amp;wid=476&amp;fit=constrain 476w,//images.asos-media.com/products/under-armour-training-lockertag-logo-t-shirt-in-red/21982249-1-red?$n_640w$&amp;wid=634&amp;fit=constrain 634w,//images.asos-media.com/products/under-armour-training-lockertag-logo-t-shirt-in-red/21982249-1-red?$n_750w$&amp;wid=714&amp;fit=constrain 714w,//images.asos-media.com/products/under-armour-training-lockertag-logo-t-shirt-in-red/21982249-1-red?$n_960w$&amp;wid=952&amp;fit=constrain 952w">
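The `srcset` attribute in that tag lists the same picture at several widths (238w up to 952w), while `src` only holds the 480w rendition. If you want the largest version, a minimal sketch is to parse the string you get from `img["srcset"]` and keep the candidate with the biggest `w` descriptor. It assumes the URLs themselves contain no commas, which holds for the Asos URLs shown here but not for every site; `parse_srcset` and `largest_image` are hypothetical helper names.

```python
def parse_srcset(srcset: str) -> list[tuple[str, int]]:
    """Split an <img srcset> value into (url, width_in_px) pairs.

    Assumes no URL contains a comma (true for the Asos URLs above).
    """
    candidates = []
    for entry in srcset.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Each candidate looks like "<url> <width>w"; rpartition splits on the last space only.
        url, _, descriptor = entry.rpartition(" ")
        if url and descriptor.endswith("w") and descriptor[:-1].isdigit():
            candidates.append((url, int(descriptor[:-1])))
    return candidates


def largest_image(srcset: str) -> str:
    """Return the URL of the widest candidate in a srcset string."""
    url, _width = max(parse_srcset(srcset), key=lambda pair: pair[1])
    return url


# Usage inside the question's loop (hypothetical):
#     image.append(largest_image(container1["srcset"]))
```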

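If you have already imported requests, you can run

r = requests.get(img_url, allow_redirects=False)

on any of the four addresses you got (though I would drop everything after the question mark, and note that the addresses start with `//`, so you need to prefix them with `https:` for requests to accept them). The content of the image (a .webp image file) will then be available as

img = r.content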
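The `src` values are protocol-relative (they start with `//`), and `requests` refuses URLs without a scheme. A minimal sketch using the standard library's `urllib.parse` adds a scheme and, following the suggestion to drop everything after the question mark, strips the query string. `to_download_url` is a hypothetical helper, and the assumption that the image server returns a usable default size without the `?$n_480w$...` query comes from the answer, not from any Asos documentation.

```python
from urllib.parse import urlsplit, urlunsplit


def to_download_url(src: str, scheme: str = "https") -> str:
    """Make a protocol-relative src absolute and drop its query string and fragment."""
    parts = urlsplit(src)
    # "//host/path?query" parses with an empty scheme, so fill one in.
    return urlunsplit((parts.scheme or scheme, parts.netloc, parts.path, "", ""))


# Usage (needs network access, so shown as comments only):
#     import requests
#     r = requests.get(to_download_url(src), timeout=10)
#     img = r.content  # raw bytes of the image
```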

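So you can either save it as a .webp file somewhere on disk and store its path in the DataFrame, or assign its value (the raw bytes) directly to a cell in the DataFrame. WebP is not the most widely supported format, so you may want to convert it with PIL (Pillow) or another tool.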
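To confirm that the downloaded bytes really are WebP before naming the file, you can check the header: a WebP file is a RIFF container whose first four bytes are `RIFF` and whose bytes 8 to 11 are `WEBP`. The sketch below writes the bytes to disk with `pathlib` and returns the path, which is what you would store in the DataFrame instead of the image itself. `is_webp`, `save_image` and the `.bin` fallback extension are hypothetical choices; the Pillow conversion is only shown as a comment because it assumes Pillow is installed with WebP support.

```python
from pathlib import Path


def is_webp(data: bytes) -> bool:
    """WebP files start with b"RIFF", a 4-byte size, then b"WEBP"."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP"


def save_image(data: bytes, dest_dir: Path, stem: str) -> Path:
    """Write downloaded image bytes to dest_dir/<stem>.<ext> and return the path."""
    ext = ".webp" if is_webp(data) else ".bin"  # hypothetical fallback for unknown formats
    dest_dir.mkdir(parents=True, exist_ok=True)
    path = dest_dir / f"{stem}{ext}"
    path.write_bytes(data)
    return path


# Optional conversion (assumes Pillow with WebP support):
#     from PIL import Image
#     Image.open(path).convert("RGB").save(path.with_suffix(".jpg"))
```

Each returned path (as `str(path)`) can be appended to a list and passed to `pd.DataFrame` next to the `'Images'` URLs, the same way the question builds the `image` list.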
