Can't scrape jpg image links from JSON



I'm trying to scrape the jpg image from each product; each product's url is saved in a csv. The image links are in JSON data, so I'm trying to access them through the JSON keys. When I run the code, two things go wrong: first, it returns all the key values instead of just the image url links, and second, even though all the urls are saved in the csv, my code only scrapes the last product url.

{'name': {'b': {'src': {'xs': 'https://ctl.s6img.com/society6/img/xVx1vleu7iLcR79ZkRZKqQiSzZE/w_125/artwork/~artwork/s6-0041/a/18613683_5971445', 'lg': 'https://ctl.s6img.com/society6/img/W-ESMqUtC_oOEUjx-1E_SyIdueI/w_550/artwork/~artwork/s6-0041/a/18613683_5971445', 'xl': 'https://ctl.s6img.com/society6/img/z90VlaYwd8cxCqbrZ1ttAxINpaY/w_700/artwork/~artwork/s6-0041/a/18613683_5971445', 'xxl': None}, 'type': 'image', 'alt': "I'M NOT ALWAYS A BITCH (Red) Cutting Board", 'meta': None}, 'c': {'src': {'xs': 'https://ctl.s6img.com/society6/img/KQJbb4jG0gBHcqQiOCivLUbKMxI/w_125/cutting-board/rectangle/lifestyle/~artwork,fw_1572,fh_2500,fx_93,fy_746,iw_1386,ih_2142/s6-0041/a/18613725_13086827/~~/im-not-always-a-bitch-red-cutting-board.jpg', 'lg': 'https://ctl.s6img.com/society6/img/ztGrxSpA7FC1LfzM3UldiQkEi7g/w_550/cutting-board/rectangle/lifestyle/~artwork,fw_1572,fh_2500,fx_93,fy_746,iw_1386,ih_2142/s6-0041/a/18613725_13086827/~~/im-not-always-a-bitch-red-cutting-board.jpg', 'xl': 'https://ctl.s6img.com/society6/img/PHjp9jDic2NGUrpq8k0aaxsYZr4/w_700/cutting-board/rectangle/lifestyle/~artwork,fw_1572,fh_2500,fx_93,fy_746,iw_1386,ih_2142/s6-0041/a/18613725_13086827/~~/im-not-always-a-bitch-red-cutting-board.jpg', 'xxl': 'https://ctl.s6img.com/society6/img/m-1HhSM5CIGl6DY9ukCVxSmVDIw/w_1500/cutting-board/rectangle/lifestyle/~artwork,fw_1572,fh_2500,fx_93,fy_746,iw_1386,ih_2142/s6-0041/a/18613725_13086827/~~/im-not-always-a-bitch-red-cutting-board.jpg'}, 'type': 'image', 'alt': "I'M NOT ALWAYS A BITCH (Red) Cutting Board", 'meta': None}, 'd': {'src': {'xs': 'https://ctl.s6img.com/society6/img/G9TikRnVvy1w0kwKCAmgWsWy42Q/w_125/cutting-board/rectangle/front/~artwork,fw_1572,fh_2500,fx_93,fy_746,iw_1386,ih_2142/s6-0041/a/18613725_13086827/~~/im-not-always-a-bitch-red-cutting-board.jpg', 'lg': 
'https://ctl.s6img.com/society6/img/uVOYOxbHmhrNhmGQAi6QeydrFdY/w_550/cutting-board/rectangle/front/~artwork,fw_1572,fh_2500,fx_93,fy_746,iw_1386,ih_2142/s6-0041/a/18613725_13086827/~~/im-not-always-a-bitch-red-cutting-board.jpg', 'xl': 'https://ctl.s6img.com/society6/img/-WIIUx9oB6jQKJdkSkq2ofhjLzc/w_700/cutting-board/rectangle/front/~artwork,fw_1572,fh_2500,fx_93,fy_746,iw_1386,ih_2142/s6-0041/a/18613725_13086827/~~/im-not-always-a-bitch-red-cutting-board.jpg', 'xxl': 'https://ctl.s6img.com/society6/img/HlSFppIm7Wk6aVxO17fI4b5s0ts/w_1500/cutting-board/rectangle/front/~artwork,fw_1572,fh_2500,fx_93,fy_746,iw_1386,ih_2142/s6-0041/a/18613725_13086827/~~/im-not-always-a-bitch-red-cutting-board.jpg'}, 'type': 'image', 'alt': "I'M NOT ALWAYS A BITCH (Red) Cutting Board", 'meta': None}}}

That is the JSON data. I only want to scrape the jpg image links. Here is my code:

import json
import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd

contents = []
with open('test.csv','r') as csvf: # Open file in read mode
    urls = csv.reader(csvf)
    for url in urls:
        contents.append(url) # Add each url to list contents

newlist = []
for url in contents:
    try:
        page = urlopen(url[0]).read()
        soup = BeautifulSoup(page, 'html.parser')
        scripts = soup.find_all('script')[7].text.strip()[24:]
        data = json.loads(scripts)
        link = data['product']['response']['product']['data']['attributes']['media_map']
    except:
        link = 'no data'
    detail = {
        'name': link
    }
    print(detail)
    newlist.append(detail)

df = pd.DataFrame(detail)
df.to_csv('s1.csv')

I'm trying to scrape all the jpg image links. I have a csv file with every product url saved in it, so I want to open the csv file and loop over each url.

A few things:

  1. df = pd.DataFrame(detail) should be df = pd.DataFrame(newlist)
  2. Your loop indentation is off. Also, why loop over the urls twice? You can read the urls from test.csv (you should probably use pandas for that anyway), put them into the contents list, and loop over that list once
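The pandas-based CSV read suggested in point 2 could look like the sketch below. `io.StringIO` stands in for the real `test.csv` here, and the file is assumed to have one URL per row with no header:

```python
import io
import pandas as pd

# Stand-in for test.csv: one URL per row, no header row (assumption)
csv_text = "https://example.com/product/a\nhttps://example.com/product/b\n"

# read_csv replaces the csv.reader loop; column 0 holds the URLs
urls = pd.read_csv(io.StringIO(csv_text), header=None)[0].tolist()
print(urls)  # ['https://example.com/product/a', 'https://example.com/product/b']
```

With a real file you would pass `'test.csv'` instead of the `StringIO` object.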

Try this:

import json
import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd

contents = []
with open('test.csv','r') as csvf: # Open file in read mode
    urls = csv.reader(csvf)
    for url in urls:
        try:
            page = urlopen(url[0]).read()
            soup = BeautifulSoup(page, 'html.parser')
            scripts = soup.find_all('script')[7].text.strip()[24:]
            data = json.loads(scripts)
            link = data['product']['response']['product']['data']['attributes']['media_map']
        except:
            link = 'no data'
        detail = {
            'name': link
        }
        print(detail)
        contents.append(detail)

df = pd.DataFrame(contents)
df.to_csv('s1.csv')
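Note that this still stores the whole media_map dict rather than just the jpg links. Since the jpg URLs sit at varying depths (under `src` keys like `xs`/`lg`/`xl`/`xxl`, some of which are None), one way to filter them out is to walk the structure recursively and keep only strings ending in `.jpg`. This is a sketch: `collect_jpg_links` is a hypothetical helper, and the sample dict below is a heavily trimmed version of the media_map shown above:

```python
def collect_jpg_links(node):
    """Recursively walk nested dicts/lists, collecting string values ending in .jpg."""
    links = []
    if isinstance(node, dict):
        for value in node.values():
            links.extend(collect_jpg_links(value))
    elif isinstance(node, list):
        for item in node:
            links.extend(collect_jpg_links(item))
    elif isinstance(node, str) and node.endswith('.jpg'):
        links.append(node)
    return links

# Trimmed sample shaped like the media_map above (URLs shortened)
media_map = {
    'name': {
        'b': {'src': {'xs': 'https://ctl.s6img.com/.../18613683_5971445', 'xxl': None},
              'type': 'image'},
        'c': {'src': {'xs': 'https://ctl.s6img.com/.../im-not-always-a-bitch-red-cutting-board.jpg'},
              'type': 'image'},
    }
}

print(collect_jpg_links(media_map))
# Only the .jpg URL is kept; the artwork link with no extension and the None value are skipped
```

In the loop above you would call `collect_jpg_links(link)` instead of storing `link` directly, which also answers the original question of returning only the image links rather than every key value.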
