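I'm currently writing a short script to scrape all the outlets of a retailer in my home country. I first scrape all possible postal codes from the postal service's website, then use Selenium to enter those codes one by one into the retailer's store locator. After that, I check whether the locations found are already in my results DataFrame and add the ones I haven't found yet. Here is the code I'm using: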
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Define options and webdriver
options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
prefs = {"profile.default_content_setting_values.geolocation": 2}
options.add_experimental_option("prefs", prefs)
driver = webdriver.Chrome("path", options=options)

# Get postal codes
driver.get("website post office")
soup = BeautifulSoup(driver.page_source)
postal_codes = [code.string for code in soup.find_all("tag", class_="class")]

# Get retail location
driver.get("retail website url")
option1_button = driver.find_element(By.XPATH, "xpath")
driver.execute_script("arguments[0].click();", option1_button)
option2_button = driver.find_element(By.XPATH, "xpath")
driver.execute_script("arguments[0].click();", option2_button)

outlets = pd.DataFrame(columns=["Name", "Address"])
for i in range(len(postal_codes)):
    searchbar = driver.find_element(By.XPATH, "xpath")
    searchbar.clear()
    searchbar.send_keys(postal_codes[i])
    searchbar.send_keys(Keys.RETURN)
    soup = BeautifulSoup(driver.page_source)
    names = [name.strong.string for name in soup.find_all("div", class_="class")]
    addresses = [address.div.string for address in soup.find_all("div", class_="class")]
    for j in range(len(addresses)):
        if addresses[j] in outlets["Address"].values:
            print(addresses[j] + " added already")
        else:
            outlets = outlets.append({"Name": names[j], "Address": addresses[j]}, ignore_index=True)
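I manage to scrape all the locations except those for the last postal code. The script drives the store locator correctly and opens the retail locations for every postal code except the last one in the postal_codes list. For that last postal code, it opens the page correctly and enters the right postal code, but it doesn't seem to register the outlets' addresses and names. When I inspect the addresses and names lists, they still contain the elements from the postal code before the last one. It looks as if the loop doesn't fully complete. Can anyone tell me what the problem is and how to fix it? Thank you!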
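One plausible explanation (an assumption; nothing in the post confirms it) is a timing issue: `driver.page_source` is read immediately after `send_keys(Keys.RETURN)`, before the locator has refreshed, so each iteration parses the results of the previous postal code and the last code's results are never read. A minimal sketch of a polling helper that waits until the parsed results change is below; `wait_for_change` and `read` are hypothetical names, not part of Selenium.

```python
import time


def wait_for_change(read, previous, timeout=10.0, poll=0.25):
    """Call read() repeatedly until it returns something other than `previous`.

    `read` is a hypothetical zero-argument callable, for example one that
    parses driver.page_source and returns the list of addresses shown.
    Raises TimeoutError if nothing changes within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        current = read()
        if current != previous:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError("results did not change within the timeout")
        time.sleep(poll)
```

In the loop, `read` could be a small function that parses `driver.page_source` and returns the address list, with `previous` set to the list from the prior iteration. This sketch would time out if two consecutive postal codes return identical results; Selenium's own `WebDriverWait` with `expected_conditions.staleness_of` on an old result element is the more usual tool for this kind of wait.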
import requests
import pandas as pd

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/107.0'
}


def main(url):
    # Query the branch-search API directly instead of driving a browser
    with requests.Session() as req:
        req.headers.update(headers)
        params = {
            'q': '9000',  # search term (here a postal code)
            'filter': [
                'KBC_PALO',
                'CBC_PALO',
            ],
            'language': 'nl',
        }
        r = req.get(url, params=params)
        # The JSON response holds the results under the 'branches' key
        df = pd.DataFrame(r.json()['branches'])
        print(df)


if __name__ == "__main__":
    main('https://www.kbc.be/X9Y-P/elasticsearch-service/api/v3/branches/search')
Output:
branchId branchName ... saturdayOH cashCd
0 ORG7441 KBC BANK GENT KOUTER ... N;09.00.00;12.00.00 2
1 ORG7426 KBC BANK GENT DE STERRE ... N;09.00.00;12.00.00 2
2 ORG6304 KBC BANK GENT GRAVENSTEEN ... NaN 3
3 ORG3225 KBC BANK GENT WATERSPORTBAAN ... N;09.00.00;12.00.00 3
4 ORG7446 KBC BANK GENTBRUGGE ... NaN 3
5 ORG7447 KBC BANK WONDELGEM ... NaN 3
6 ORG7434 KBC BANK ZWIJNAARDE ... NaN 3
7 ORG7439 KBC BANK MARIAKERKE ... NaN 1
8 ORG3407 KBC BANK OOSTAKKER ... NaN 3
9 ORG3395 KBC BANK ST.-AMANDSBERG ... N;09.00.00;12.00.00 2
[10 rows x 20 columns]
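To cover every postal code with the API request above, the same call can be repeated per code and the branches deduplicated. A minimal sketch, assuming each result carries a unique `branchId` (the column appears in the output, but its uniqueness is an assumption); `collect_branches` and `fetch` are hypothetical names, and `fetch(code)` stands in for a function that sends the request with `'q': code` and returns `r.json()['branches']`.

```python
def collect_branches(fetch, postal_codes):
    """Query every postal code and keep each branch once, keyed by branchId.

    `fetch` is a hypothetical callable: fetch(code) -> list of branch dicts.
    """
    seen = {}
    for code in postal_codes:
        for branch in fetch(code):
            # setdefault keeps the first occurrence and ignores repeats
            seen.setdefault(branch["branchId"], branch)
    return list(seen.values())


if __name__ == "__main__":
    # Made-up stand-in data, only to show the deduplication
    fake_results = {
        "code_a": [{"branchId": "ID1"}, {"branchId": "ID2"}],
        "code_b": [{"branchId": "ID2"}, {"branchId": "ID3"}],
    }
    branches = collect_branches(lambda code: fake_results[code], ["code_a", "code_b"])
    print([b["branchId"] for b in branches])  # ['ID1', 'ID2', 'ID3']
```

The returned list of dicts can be passed to `pd.DataFrame` once at the end, which avoids growing a DataFrame row by row inside the loop.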