Using the values of a list in a loop to collect web-scraping results



I'm new to web scraping and practicing it by trying to scrape Amazon by product ID. Suppose I have a list of product IDs:

asin = ['B09DKZYBR5','B098QGLXXY','B09BQV32W2']

Now I want to use each value in the list, one at a time, to build a list of product names, so I tried:

for i in asin:
page = requests.get("https://www.amazon.com.au/dp/" + {i})
soup = page.content
doc = BeautifulSoup(soup, "html.parser")
data = []
data.append({
name = doc.find_all(class_="a-size-large product-title-word-break")[0].parent.find('span').string
})

Then, when I run this:

pd.DataFrame(data)

I would like to get something like this:

name
0   Auriko Cat Teaser Toy with Wooden Hand..
1   Cactus Cat Scratching Posts Pole Tree...
2   LNLtoy 8 Cat Toys Kitten Toys Assortme..

Instead, I get this error:

File "C:\Users\Parth\AppData\Local\Temp/ipykernel_21420/2238400382.py", line 2
page = requests.get("https://www.amazon.com.au/dp/" + {i})
^
IndentationError: expected an indented block

I think you have to do it inside the append: {name:doc.find_all(class_="a-size-large....}
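
The suggestion in this comment can be sketched roughly as follows. A dict literal is written as "key": value rather than key = value, and the list has to be created once, before the loop, so that it is not reset on every iteration. The titles list is a hypothetical stand-in for scraped product names, so the snippet runs without network access.

```python
# Hypothetical placeholder titles; in the real script each one would
# come from the parsed product page.
titles = ["Title A", "Title B", "Title C"]

data = []  # created once, outside the loop
for title in titles:
    # A dict literal uses "key": value, not key = value.
    data.append({"name": title})

print(data)
```

Passing a list of dicts like this to pd.DataFrame(data) produces one row per dict, with a column for each key.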

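The main cause of the error is that the body of the for loop is not indented. Python expects an indented block after the line ending in a colon, which is why the IndentationError points at line 2. The data list should also be created once, before the loop, so it is not reset on every iteration. To get the result you want, do this: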

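import requests
import pandas as pd
from bs4 import BeautifulSoup

asin = ['B09DKZYBR5','B098QGLXXY','B09BQV32W2']
data = []  # create the list once, outside the loop
for i in asin:
    # build the product URL from the ASIN string
    page = requests.get("https://www.amazon.com.au/dp/" + str(i))
    soup = page.content
    doc = BeautifulSoup(soup, "html.parser")
    # take the text of the first product-title element
    name = doc.find_all(class_="a-size-large product-title-word-break")[0].parent.find('span').string
    data.append(name)
df = pd.DataFrame({"name":data})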
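
Besides the indentation, the original line also wrapped the loop variable in braces. As a small illustration (not part of the original answer), the sketch below shows why "..." + {i} would fail even once the indentation is fixed, and how the URLs can be built instead. BASE_URL, concat_error, and urls are names introduced here only for the example; no request is sent.

```python
BASE_URL = "https://www.amazon.com.au/dp/"  # illustrative constant
asin = ['B09DKZYBR5', 'B098QGLXXY', 'B09BQV32W2']

# {asin[0]} is a one-element set, and str + set is not defined,
# so this concatenation raises TypeError.
try:
    BASE_URL + {asin[0]}
    concat_error = None
except TypeError as exc:
    concat_error = exc

# Each ASIN is already a string, so an f-string (or plain +) works.
urls = [f"{BASE_URL}{i}" for i in asin]
print(urls)
```

Since every entry in asin is already a string, the str(i) call is harmless but not strictly required.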
