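I'm trying to scrape a website and convert the data to a CSV file for practice. When the script reaches the point where it collects the data and stores it in variables, I get this error: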
TypeError: string indices must be integers
on this line:
email = address['email'].strip()
I expect it to collect all the data to be written to the CSV file. The full code is below:
from urllib.request import urlopen as uReq
import json
import re
import csv
my_url = 'https://www.haart.co.uk/umbraco/api/branches/getsales/HRT'
uClient = uReq(my_url)
page_json = uClient.read()
uClient.close()
records = []
filename = 'haartscrape.csv'
addresses = json.loads(page_json)
for address in addresses:
    headline = address['headline']
    address = re.sub(r'<.*?>', '', address['address'])
    email = address['email'].strip()
    tel = address['telephone']
    records.append({'Name':headline, 'Address':address, 'Email': email, 'Telephone':tel})
with open(filename, 'w') as f:
    writer = csv.DictWriter(f, ['Name', 'Address', 'Email', 'Telephone'])
    writer.writeheader()
    for r in records:
        writer.writerow(r)
Full traceback:
Traceback (most recent call last):
  File "haart_webscrape.py", line 18, in <module>
    email = address['email'].strip()
TypeError: string indices must be integers
Any help is appreciated. Thanks in advance.
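You are reassigning the JSON element inside the loop. After the re.sub line, address is no longer the dict from the JSON list but the cleaned-up string, so address['email'] tries to index a string with a string key:
for address in addresses:
    headline = address['headline']
    address = re.sub(r'<.*?>', '', address['address'])  # here: address becomes a str
Rename either the loop variable or the variable that holds the cleaned address.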
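To make the cause concrete, here is a minimal sketch that reproduces the error and shows the rename fix side by side. The sample record and the names `branch`, `broken_lookup`, and `clean_record` are made up for illustration; only the keys come from the question's code.

```python
import re

# Hypothetical sample record shaped like one element of the JSON list.
branch = {
    'headline': 'Example Branch',
    'address': '<p>1 High Street</p>',
    'email': ' branch@example.com ',
    'telephone': '01234 567890',
}

def broken_lookup(address):
    """Mirrors the question's loop body: rebinds `address` to a string."""
    address = re.sub(r'<.*?>', '', address['address'])
    # `address` is now a str, so a string key raises TypeError.
    return address['email'].strip()

def clean_record(branch):
    """Same extraction, but the cleaned address gets its own name."""
    address_text = re.sub(r'<.*?>', '', branch['address'])
    return {
        'Name': branch['headline'],
        'Address': address_text,
        'Email': branch['email'].strip(),
        'Telephone': branch['telephone'],
    }

try:
    broken_lookup(branch)
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

row = clean_record(branch)
print(error_name, row)
```

The only difference between the two functions is the name on the left-hand side of the re.sub assignment.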
Or skip the intermediate variables and build each row directly:
with open(filename, 'w', newline='') as f:
    writer = csv.DictWriter(f, ['Name', 'Address', 'Email', 'Telephone'])
    writer.writeheader()
    for address in addresses:
        # Build the row in one expression so address is never reassigned.
        r = {
            'Name': address['headline'],
            'Address': re.sub(r'<.*?>', '', address['address']),
            'Email': address['email'].strip(),
            'Telephone': address['telephone']}
        writer.writerow(r)
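If you want to check the row-building logic without hitting the network, something like the following should work. It is only a sketch: it assumes the API returns a JSON list of objects with the four keys used in the question, and the sample data, `FIELDS`, and `write_branches` are hypothetical names introduced here.

```python
import csv
import io
import json
import re

FIELDS = ['Name', 'Address', 'Email', 'Telephone']

def write_branches(json_text, out):
    """Parse a JSON list of branch dicts and write them as CSV to `out`."""
    branches = json.loads(json_text)
    writer = csv.DictWriter(out, FIELDS)
    writer.writeheader()
    for branch in branches:
        writer.writerow({
            'Name': branch['headline'],
            # Strip HTML tags from the address, as in the question.
            'Address': re.sub(r'<.*?>', '', branch['address']),
            'Email': branch['email'].strip(),
            'Telephone': branch['telephone'],
        })
    return len(branches)

# Made-up payload standing in for the real API response.
sample = json.dumps([{
    'headline': 'Example Branch',
    'address': '<p>1 High Street</p>',
    'email': ' branch@example.com ',
    'telephone': '01234 567890',
}])

buffer = io.StringIO()
count = write_branches(sample, buffer)
print(buffer.getvalue())
```

For the real script, pass a file opened with open(filename, 'w', newline='') instead of the StringIO buffer; the csv module documentation recommends newline='' so that no extra blank lines appear on Windows.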