The last iteration of a loop is not fully executed



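I am currently writing a short script to scrape all outlets of a retailer in my home country. I first scrape all possible postal codes from the postal service's website, then use Selenium to enter them one by one into the retailer's location finder. After each search, I check whether the locations found are already in my result DataFrame and add the ones I have not found yet. Below is the code I use: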

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Define options and webdriver
options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
prefs = {"profile.default_content_setting_values.geolocation": 2}
options.add_experimental_option("prefs", prefs)
driver = webdriver.Chrome("path", options=options)

# Get postal codes
driver.get("website post office")
soup = BeautifulSoup(driver.page_source)
postal_codes = [code.string for code in soup.find_all("tag", class_="class")]

# Get retail location
driver.get("retail website url")
option1_button = driver.find_element(By.XPATH, "xpath")
driver.execute_script("arguments[0].click();", option1_button)
option2_button = driver.find_element(By.XPATH, "xpath")
driver.execute_script("arguments[0].click();", option2_button)
outlets = pd.DataFrame(columns=["Name", "Address"])
for i in range(len(postal_codes)):
    searchbar = driver.find_element(By.XPATH, "xpath")
    searchbar.clear()
    searchbar.send_keys(postal_codes[i])
    searchbar.send_keys(Keys.RETURN)
    # The page source is read immediately after the search is submitted
    soup = BeautifulSoup(driver.page_source)
    names = [name.strong.string for name in soup.find_all("div", class_="class")]
    addresses = [address.div.string for address in soup.find_all("div", class_="class")]
    for j in range(len(addresses)):
        if addresses[j] in outlets["Address"].values:
            print(addresses[j] + " added already")
        else:
            # DataFrame.append was removed in pandas 2.0; this line needs pandas < 2.0
            outlets = outlets.append({"Name": names[j], "Address": addresses[j]}, ignore_index=True)

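I manage to scrape all locations except those for the last postal code. The script drives the location finder correctly and opens the retail locations for every postal code except the last one in the postal_codes list. For that last postal code, it opens the page and enters the right code, but it does not seem to register the outlets' addresses and names. When I inspect the addresses and names lists, they still contain the elements for the postal code before the last one. It looks as if the loop does not finish completely. Can anyone tell me what the problem is and how to fix it? Thank you!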
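One plausible reading of this symptom (the lists lagging one search behind) is that `driver.page_source` is read right after `Keys.RETURN` is sent, before the location finder has rendered the new results. The question does not confirm this. Under that assumption, a minimal sketch of waiting for the page to change before parsing it could look like the following. `wait_for_new_page_source` is a hypothetical helper, not a Selenium function, and it relies only on the driver's `page_source` attribute.

```python
import time


def wait_for_new_page_source(driver, old_source, timeout=10.0, poll=0.2):
    """Poll driver.page_source until it differs from old_source.

    Returns the new page source, or raises TimeoutError if nothing changed
    within `timeout` seconds. Hypothetical helper, not part of Selenium.
    """
    deadline = time.monotonic() + timeout
    while True:
        source = driver.page_source
        if source != old_source:
            return source
        if time.monotonic() >= deadline:
            raise TimeoutError("page source did not change after the search")
        time.sleep(poll)
```

In the loop, this would mean saving `old_source = driver.page_source` before sending `Keys.RETURN`, then passing the returned string to `BeautifulSoup`. The comparison is crude: it times out when two consecutive postal codes yield an identical page, and it can return early if something unrelated on the page changes. Selenium's own `WebDriverWait`, combined with a condition from `expected_conditions` on a specific result element, is the more usual way to express such a wait.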

import requests
import pandas as pd

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/107.0'
}


def main(url):
    with requests.Session() as req:
        req.headers.update(headers)
        params = {
            'q': '9000',
            'filter': [
                'KBC_PALO',
                'CBC_PALO',
            ],
            'language': 'nl',
        }
        r = req.get(url, params=params)
        df = pd.DataFrame(r.json()['branches'])
        print(df)


if __name__ == "__main__":
    main('https://www.kbc.be/X9Y-P/elasticsearch-service/api/v3/branches/search')

Output:

branchId                    branchName  ...           saturdayOH cashCd
0  ORG7441          KBC BANK GENT KOUTER  ...  N;09.00.00;12.00.00      2
1  ORG7426       KBC BANK GENT DE STERRE  ...  N;09.00.00;12.00.00      2
2  ORG6304     KBC BANK GENT GRAVENSTEEN  ...                  NaN      3
3  ORG3225  KBC BANK GENT WATERSPORTBAAN  ...  N;09.00.00;12.00.00      3
4  ORG7446           KBC BANK GENTBRUGGE  ...                  NaN      3
5  ORG7447            KBC BANK WONDELGEM  ...                  NaN      3
6  ORG7434           KBC BANK ZWIJNAARDE  ...                  NaN      3
7  ORG7439           KBC BANK MARIAKERKE  ...                  NaN      1
8  ORG3407            KBC BANK OOSTAKKER  ...                  NaN      3
9  ORG3395       KBC BANK ST.-AMANDSBERG  ...  N;09.00.00;12.00.00      2
[10 rows x 20 columns]
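
The request above looks up a single postal code (`'q': '9000'`). To cover every postal code while keeping each outlet only once, the results could be merged by `branchId`, as in the sketch below. It assumes that `branchId` uniquely identifies a branch, which the output suggests but the source does not state. `collect_branches` and its `fetch` argument are hypothetical names introduced for illustration.

```python
def collect_branches(fetch, postal_codes):
    """Call fetch(code) for every postal code and keep each branch once.

    `fetch` is a hypothetical callable returning a list of dicts that each
    carry a 'branchId' key, like the 'branches' list in the JSON response.
    """
    seen = {}
    for code in postal_codes:
        for branch in fetch(code):
            # setdefault keeps the first record seen for a given branchId
            seen.setdefault(branch["branchId"], branch)
    return list(seen.values())
```

Here `fetch` could wrap the `req.get(url, params=params)` call with `'q'` set to the given code and return `r.json()['branches']`. The combined list can then be passed to `pd.DataFrame` once at the end, which avoids growing a DataFrame row by row.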
