BeautifulSoup: how do I stop my list of web links from overwriting each other?



The code below only gives me the last word in my list:

import csv
wo = csv.reader(open('WORD.csv'))
row = list(wo)
from bs4 import BeautifulSoup as soup  # HTML data structure
from urllib.request import urlopen as uReq  # Web client
# URL to web scrape from.
# in this example we web scrape lexico
with open("WORD.csv") as f:
    for row in csv.reader(f):
        for word in row:
            # Number of pages plus one
            url = "https://www.lexico.pt/{}".format(word)

# opens the connection and downloads html page from url
uClient = uReq(url)
page_html = uClient.read()

# parses html into a soup data structure to traverse html
# as if it were a json data type.
page_soup = soup(page_html, "html.parser")
# finds each product from the store page
containers = page_soup.find("div", {"class": "card card-pl card-pl-significado"})
# name the output file to write to local disk
out_filename = "test.csv"

# opens file, and writes headers
f = open(out_filename, "w")


Word = containers.h2.text
Defention = containers.p.text

f.write("\n" + Word + ", " + Defention + "\n")

f.close()

Please help me, I have tried everything. I am a beginner with BeautifulSoup, so apologies for my poor code formatting.

As I mentioned above, I believe you have almost reached your goal.

In Python, scope is determined by indentation: it defines the region in which a statement belongs to a loop. Since your example does not follow this consistently, the iteration has already finished by the time the first request is sent. The loop variables have been reassigned on every pass and only hold the values of the last iteration step, which is why only the final word is written out.
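A minimal sketch of the difference (the `words` list here is a stand-in for your CSV rows): the first loop keeps its body indented, so the work runs once per word; the second leaves the statement at top level, so it runs once, after the loop, with the variable still bound to the last element.

```python
words = ["a", "b", "c"]

# Body indented under the loop: runs once per word.
results = []
for word in words:
    results.append(word.upper())
# results is now ['A', 'B', 'C']

# Statement left outside the loop: runs once, after the
# loop, with word still bound to the last element.
for word in words:
    pass
leftover = word.upper()
# leftover is now 'C'
```

This is exactly what happens in your script: only the `url = ...` line is inside the loops, so every request, parse, and write operates on the last word alone.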

import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup as soup

# open files for reading and writing
with open('WORD.csv') as src, open('test.txt', 'w') as dst:
    # read row by row
    for row in csv.reader(src):
        # get words separated by comma
        for word in row:
            # open connection and create parser with read data
            url = f'https://www.lexico.pt/{word}'
            resp = urlopen(url)
            html = soup(resp.read(), 'html.parser')
            # find card/content
            card = html.find('div', {'class': 'card-pl-significado'})
            word = card.h2.text
            desc = card.p.text
            # write formatted result to file
            dst.write(f'{word}, {desc}\n')

Have fun!
