I'm new to web scraping and am practicing by scraping Amazon using product IDs. Suppose I have a list of product IDs:
asin = ['B09DKZYBR5','B098QGLXXY','B09BQV32W2']
Now I want to use each value in the list, one at a time, to build a list of product names, so I tried:
for i in asin:
page = requests.get("https://www.amazon.com.au/dp/" + {i})
soup = page.content
doc = BeautifulSoup(soup, "html.parser")
data = []
data.append({
name = doc.find_all(class_="a-size-large product-title-word-break")[0].parent.find('span').string
})
Then, when I run this:
pd.DataFrame(data)
I want something like this:
                                       name
0  Auriko Cat Teaser Toy with Wooden Hand..
1  Cactus Cat Scratching Posts Pole Tree...
2  LNLtoy 8 Cat Toys Kitten Toys Assortme..
But I get this error:
File "C:\Users\Parth\AppData\Local\Temp/ipykernel_21420/2238400382.py", line 2
page = requests.get("https://www.amazon.com.au/dp/" + {i})
^
IndentationError: expected an indented block
I think you need to pass a dict to append: {name:doc.find_all(class_="a-size-large....}
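The dict-per-row idea from the comment above can be sketched as follows. The product names here are hypothetical placeholders standing in for scraped values, so the snippet runs without any network access.

```python
import pandas as pd

# Hypothetical names standing in for the scraped product titles.
names = ["Toy A", "Toy B", "Toy C"]

data = []
for n in names:
    # Each dict becomes one row; the key "name" becomes the column label.
    data.append({"name": n})

df = pd.DataFrame(data)
print(df)
```

Note that the key has to be a quoted string followed by a colon ("name": value); writing name = value inside braces is a syntax error.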
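The main cause of the error is that the body of the for loop is not indented correctly. In addition, {i} builds a set, which cannot be concatenated to a string, and data = [] inside the loop would reset the list on every iteration. To get the result you want, do it like this: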
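import requests
import pandas as pd
from bs4 import BeautifulSoup

asin = ['B09DKZYBR5','B098QGLXXY','B09BQV32W2']
data = []  # create the list once, before the loop
for i in asin:
    # build the product URL from the ID and download the page
    page = requests.get("https://www.amazon.com.au/dp/" + str(i))
    soup = page.content
    doc = BeautifulSoup(soup, "html.parser")
    # take the text of the span that holds the product title
    name = doc.find_all(class_="a-size-large product-title-word-break")[0].parent.find('span').string
    data.append(name)
df = pd.DataFrame({"name": data})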
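One caveat with the loop above: indexing find_all(...)[0] raises an IndexError whenever a response does not contain the title element. Below is a minimal sketch of a more defensive variant. The helper names extract_title, names_frame and fake_fetch are hypothetical, and the sample HTML is invented for illustration rather than taken from a real Amazon page. The download step is passed in as a function so the example runs offline.

```python
from typing import Callable, List, Optional

import pandas as pd
from bs4 import BeautifulSoup


def extract_title(html: str) -> Optional[str]:
    """Return the product title text, or None if the element is missing."""
    doc = BeautifulSoup(html, "html.parser")
    # Assumption: the title element carries the class "product-title-word-break".
    tag = doc.find(class_="product-title-word-break")
    return tag.get_text(strip=True) if tag else None


def names_frame(asins: List[str], fetch: Callable[[str], str]) -> pd.DataFrame:
    """Build a DataFrame with one row per product ID using the given fetch function."""
    rows = [{"asin": a, "name": extract_title(fetch(a))} for a in asins]
    return pd.DataFrame(rows)


# Invented sample markup used only for this offline demonstration.
SAMPLE = '<h1><span class="a-size-large product-title-word-break"> Cat Toy </span></h1>'


def fake_fetch(asin: str) -> str:
    # Stand-in for a real download: only the first ID returns a page with a title.
    return SAMPLE if asin == "B09DKZYBR5" else "<html></html>"


df = names_frame(["B09DKZYBR5", "B098QGLXXY"], fake_fetch)
print(df)
```

For real use, fetch could be something like lambda a: requests.get("https://www.amazon.com.au/dp/" + a, timeout=10).text. Rows whose page lacks the title then show up as missing values instead of stopping the loop.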