Web scraping with Python and BeautifulSoup - error when saving to a csv file



I am trying to write a script that scrapes the names, roles, and phone numbers of real estate agents from this website.

My code:

containers = page_soup.findAll("div", {"class": "card horizontal-split vcard"})
filename = "agents.csv"
f = open(filename, "w")
headers = "name, role, number\n"
f.write(headers)
for container in containers:
    agent_name = container.findAll("li", {"class": "agent-name"})
    if agent_name:
        name = agent_name[0].text
    agent_role = container.findAll("li", {"class": "agent-role"})
    if agent_role:
        role = agent_role[0].text
    filterfn = lambda x: 'href' in x.attrs and x['href'].startswith("tel")
    phones = list(map(lambda x: x.text, filter(filterfn, container.findAll("a"))))
    print("name: " + name)
    print("role: " + role)
    print("phones:" + repr(phones))
    f.write(name + "," + role + "," + phones.replace(",", "|") + "," + "\n")
f.close()

My code worked in the terminal before I tried saving the output to a csv file that can be opened in Excel. Now, however, I get two error messages:

TypeError: must be str, not list
f.write(name + "," + role + "," + phones.replace(",", "|") + "," + "\n")

f.write(name + "," + role + "," + phones.replace(",", "|") + "," + "\n")
AttributeError: 'list' object has no attribute 'replace'

**Note: I replace "," with "|" to avoid creating extra columns in the csv file.

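As the error says, phones is a list, and lists have no replace() method. You can use .join() instead to concatenate the elements of the list with a given separator (here, |):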

f.write(name + "," + role + "," + '|'.join(phones) + "," + "\n")

For example:

>>> phones = ['123', '321', '123']
>>> '|'.join(phones)
'123|321|123'
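
As a possible alternative to building each line by hand, the standard-library csv module can write the rows and quote any field that itself contains a comma. The following is only a minimal sketch: the helper name write_agents and the sample data are hypothetical and not part of the original script, and it writes to an in-memory buffer so it runs without the scraped page.

```python
import csv
import io


def write_agents(rows, out):
    """Write (name, role, phones) rows as CSV; phones is a list of strings.

    Hypothetical helper for illustration only.
    """
    writer = csv.writer(out)
    writer.writerow(["name", "role", "number"])
    for name, role, phones in rows:
        # Join the phone list into a single field, as in the answer above.
        # csv.writer quotes a field automatically if it contains a comma.
        writer.writerow([name, role, "|".join(phones)])


# Made-up sample data standing in for the scraped values.
sample = [
    ("Jane Doe", "Sales Agent", ["123", "321"]),
    ("Smith, John", "Manager", []),
]

buf = io.StringIO()
write_agents(sample, buf)
print(buf.getvalue())
```

To write to a real file instead of the buffer, something like open("agents.csv", "w", newline="") could be passed as out; the csv documentation recommends newline="" so that line endings are not doubled on some platforms.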
