I'm trying to scrape data for three states using the same URL format.
states = ['123', '124', '125']
urls = []
for state in states:
    url = f'www.something.com/geo={state}'
    urls.append(url)
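That gives me three separate URLs, each containing a different state ID.
However, when I process them with BeautifulSoup, the output only shows data from state 123.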
for url in urls:
    client = ScrapingBeeClient(api_key="API_KEY")
    response = client.get(url)
    doc = BeautifulSoup(response.text, 'html.parser')
After that, I extracted the columns I wanted with:
listings = doc.select('.is-9-desktop')
rows = []
for listing in listings:
    row = {}
    try:
        row['name'] = listing.select_one('.result-title').text.strip()
    except:
        print("no name")
    try:
        row['add'] = listing.select_one('.address-text').text.strip()
    except:
        print("no add")
    try:
        row['mention'] = listing.select_one('.review-mention-block').text.strip()
    except:
        pass
    rows.append(row)
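But as mentioned, it only shows the data for state 123. I'd appreciate it if someone could tell me where I went wrong. Thanks!
Edit
I appended the parsed output for each URL to a list and was able to get the data for all three states.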
doc = []
for url in urls:
    client = ScrapingBeeClient(api_key="API_KEY")
    response = client.get(url)
    docs = BeautifulSoup(response.text, 'html.parser')
    doc.append(docs)
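However, when I ran it through BeautifulSoup, it raised this error:
AttributeError: 'list' object has no attribute 'select'
Do I need to run it through another loop?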
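The error comes from calling select on a plain Python list, which has no such method; only the individual BeautifulSoup objects inside the list do. A minimal sketch of looping over the list follows. The inline HTML strings and the "Shop" names are made-up stand-ins for the real responses, so nothing is fetched over the network.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins for the three response bodies (response.text);
# the markup only mimics the class names used in the question.
pages = [
    '<div class="is-9-desktop"><p class="result-title"> Shop A </p></div>',
    '<div class="is-9-desktop"><p class="result-title"> Shop B </p></div>',
    '<div class="is-9-desktop"><p class="result-title"> Shop C </p></div>',
]

# A list of parsed documents, like the `doc` list built in the edit above.
docs = [BeautifulSoup(html, 'html.parser') for html in pages]

names = []
for doc in docs:  # each item is a BeautifulSoup object, so .select() works here
    for listing in doc.select('.is-9-desktop'):
        names.append(listing.select_one('.result-title').text.strip())

print(names)
```

So yes, a second loop over the list would work, although parsing and extracting inside a single loop avoids keeping the intermediate list at all.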
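It doesn't need all of those loops. Just iterate over the states and append each listing to rows.
Most importantly, rows = [] is placed outside the for loops so that it isn't overwritten on each iteration.
Example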
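from bs4 import BeautifulSoup
from scrapingbee import ScrapingBeeClient

states = ['123', '124', '125']
rows = []  # created once, outside the loops, so results from every state accumulate
for state in states:
    url = f'www.something.com/geo={state}'  # use the loop variable, not the whole list
    client = ScrapingBeeClient(api_key="API_KEY")
    response = client.get(url)
    doc = BeautifulSoup(response.text, 'html.parser')
    listings = doc.select('.is-9-desktop')
    for listing in listings:
        row = {}
        # select_one() returns None when nothing matches, so .text raises AttributeError
        try:
            row['name'] = listing.select_one('.result-title').text.strip()
        except AttributeError:
            print("no name")
        try:
            row['add'] = listing.select_one('.address-text').text.strip()
        except AttributeError:
            print("no add")
        try:
            row['mention'] = listing.select_one('.review-mention-block').text.strip()
        except AttributeError:
            pass
        rows.append(row)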
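To see why the position of rows = [] matters, here is a stripped-down sketch with no scraping involved. The names rows_inside and rows_outside are only illustrative, and each "row" is just a dict holding the state ID.

```python
states = ['123', '124', '125']

# Accumulator re-created inside the loop: every iteration starts from an
# empty list, so only the last state's row is left when the loop ends.
for state in states:
    rows_inside = []
    rows_inside.append({'state': state})

# Accumulator created once, before the loop: rows from all states are kept.
rows_outside = []
for state in states:
    rows_outside.append({'state': state})

print(len(rows_inside), len(rows_outside))
```

The same reasoning applies to any variable assigned inside the loop and read only after it: by then it holds the value from the final iteration alone.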