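I have written some code that scrapes the "Address" and "Phone" for a few shop names, and it works fine. However, it needs two parameters to be filled in before it can run. I would like to do the same thing from a csv file, where "Name" is in the first column and "Lid" is in the second column, and the scraped results are placed in the third and fourth columns accordingly. At this point I can't figure out how to run the search from a csv file. Any suggestion would be greatly appreciated.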
import requests
from lxml import html

Names=["Literati Cafe","Standard Insurance Co","Suehiro Cafe"]
Lids=["3221083","497670909","12183177"]

for Name in Names and Lids:
    Page_link="https://www.yellowpages.com/los-angeles-ca/mip/"+Name.replace(" ","-")+"-"+Name
    response = requests.get(Page_link)
    tree = html.fromstring(response.text)
    titles = tree.xpath('//article[contains(@class,"business-card")]')
    for title in titles:
        Address= title.xpath('.//p[@class="address"]/span/text()')[0]
        Contact = title.xpath('.//p[@class="phone"]/text()')[0]
        print(Address,Contact)
You can read the Names and Lids lists from the CSV, for example:
import csv

Names, Lids = [], []
with open("file_name.csv", "r") as f:
    reader = csv.DictReader(f)
    for line in reader:
        Names.append(line["Name"])
        Lids.append(line["Lid"])
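(Never mind the PEP violations for now ;).) You can then use these lists in the rest of your code. I'm not sure what you were trying to achieve with your "for Name in Names and Lids:" loop, but it doesn't do what you think it does. The expression "Names and Lids" evaluates to Lids whenever Names is non-empty, so the loop never iterates over the Names list, only over the Lids list.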
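To make that concrete, here is a minimal sketch using the sample values from the question. It assumes the intent was to walk both lists in step, in which case zip is the usual tool. The lowercase variable names are mine, chosen only for illustration.

```python
names = ["Literati Cafe", "Standard Insurance Co", "Suehiro Cafe"]
lids = ["3221083", "497670909", "12183177"]

# `a and b` returns b when a is truthy, so this is just the lids list
looped_over = names and lids

# zip pairs each name with its id, which is probably what was intended
base = "https://www.yellowpages.com/los-angeles-ca/mip/"
links = [
    base + name.replace(" ", "-") + "-" + lid
    for name, lid in zip(names, lids)
]
```

Here looped_over is the very same object as lids, while links holds one URL per (name, id) pair.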
Also, a first-order optimization would be to replace your loop with a loop over the CSV itself, for example:
with open("file_name.csv", "r") as f:
    reader = csv.DictReader(f)
    for entry in reader:
        page_link = "https://www.yellowpages.com/los-angeles-ca/mip/{}-{}".format(entry["Name"].replace(" ","-"), entry["Lid"])
        # rest of your scraping code...
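The question also asks for the results to end up in the third and fourth columns. One possible sketch, assuming the CSV has a header row with "Name" and "Lid" as above: read each row, scrape it, and write it to a second file with two extra columns. The names fill_csv, build_link, the scrape parameter, and the "Address" and "Phone" headers are all hypothetical. scrape stands in for your existing requests/lxml code wrapped in a function that takes a page link and returns an (address, phone) pair.

```python
import csv

BASE_URL = "https://www.yellowpages.com/los-angeles-ca/mip/{}-{}"


def build_link(name, lid):
    """Build the listing URL the same way as the loop above."""
    return BASE_URL.format(name.replace(" ", "-"), lid)


def fill_csv(in_path, out_path, scrape):
    """Copy in_path to out_path, adding Address and Phone columns.

    scrape is a hypothetical callable: it receives a page link and
    returns an (address, phone) tuple.
    """
    with open(in_path, "r", newline="") as src, \
            open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(
            dst, fieldnames=["Name", "Lid", "Address", "Phone"]
        )
        writer.writeheader()
        for entry in reader:
            address, phone = scrape(build_link(entry["Name"], entry["Lid"]))
            writer.writerow({
                "Name": entry["Name"],
                "Lid": entry["Lid"],
                "Address": address,
                "Phone": phone,
            })
```

Writing to a separate output file rather than editing the input in place means a failed request halfway through does not damage the original data. You would call it as fill_csv("file_name.csv", "results.csv", your_scrape_function), where the output file name is again just a placeholder.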