TypeError: list indices must be integers or slices, not str, when scraping JSON data



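I am trying to scrape JSON data with Scrapy, and I get this error while doing so.

Update:

The first six values work fine. The remaining values print nothing. If I include them, every field prints N/A, including the ones that worked before. The values exist in the JSON, but nothing is returned.

The expressions that cause the error are: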

"Website": value['_source']['AgentMarketingCenter']['0']['Website'],
"Facebook": value['_source']['AgentMarketingCenter']['0']['Facebook_URL'],
"LinkedIn": value['_source']['AgentMarketingCenter']['0']['LinkedIn_URL'],
"Twitter": value['_source']['AgentMarketingCenter']['0']['Twitter'],
"BIO": value['_source']['AgentMarketingCenter']['0']['Bio'],

import scrapy
import json


class MainSpider(scrapy.Spider):
    name = 'main'
    start_urls = ['https://experts.expcloud.com/api4/std?searchterms=AB&size=216&from=0']

    def parse(self, response):
        resp = json.loads(response.body)
        values = resp['hits']['hits']
        for value in values:
            try:
                yield {
                    'Full Name': value['_source']['fullName'],
                    'Primary Phonenumber': value['_source']['primaryPhone'],
                    "Email": value['_source']['primaryEmail'],
                    "City": value['_source']['agentPrimaryLocation'][0]['city'],
                    "State": value['_source']['agentPrimaryLocation'][0]['state'],
                    "Zip": value['_source']['agentPrimaryLocation'][0]['zipcode'],
                    "Website": value['_source']['AgentMarketingCenter']['0']['Website'],
                    "Facebook": value['_source']['AgentMarketingCenter']['0']['Facebook_URL'],
                    "LinkedIn": value['_source']['AgentMarketingCenter']['0']['LinkedIn_URL'],
                    "Twitter": value['_source']['AgentMarketingCenter']['0']['Twitter'],
                    "BIO": value['_source']['AgentMarketingCenter']['0']['Bio'],
                }
            except KeyError:
                yield {
                    'Full Name': 'N/A',
                    'Primary Phonenumber': 'N/A',
                    'Email': 'N/A',
                    'City': 'N/A',
                    'State': 'N/A',
                    'Zip': 'N/A',
                    'Website': 'N/A',
                    'Facebook': 'N/A',
                    'LinkedIn': 'N/A',
                    'Twitter': 'N/A',
                    'BIO': 'N/A',
                }

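Not every dict contains the information you want to collect, so use the get method with a default value to avoid errors. Also note that AgentMarketingCenter is a list, so it must be indexed with the integer 0 rather than the string '0'; indexing a list with a string is what raises the TypeError in the title.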

item = {
    'Full Name': value['_source']['fullName'],
    'Primary Phonenumber': value['_source']['primaryPhone'],
    "Email": value['_source']['primaryEmail'],
    "City": value['_source']['agentPrimaryLocation'][0]['city'],
    "State": value['_source']['agentPrimaryLocation'][0].get('stateName', 'NA'),
    "Zip": value['_source']['agentPrimaryLocation'][0]['zipcode'],
    "Website": value['_source']['AgentMarketingCenter'][0].get('Website', 'NA'),
    "Facebook": value['_source']['AgentMarketingCenter'][0].get('Facebook_URL', 'NA'),
    "LinkedIn": value['_source']['AgentMarketingCenter'][0].get('LinkedIn_URL', 'NA'),
    "Twitter": value['_source']['AgentMarketingCenter'][0].get('Twitter', 'NA'),
    "BIO": value['_source']['AgentMarketingCenter'][0].get('Bio', 'NA'),
}
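If some records might also lack the AgentMarketingCenter list itself, or have it empty, the same idea can be pushed one level up. The following is only a sketch of that approach: `first_entry` and `build_item` are hypothetical helper names, the sample record is invented for illustration, and only a few of the fields are shown.

```python
def first_entry(source, key):
    """Return the first dict in source[key], or {} if the list is missing or empty."""
    entries = source.get(key) or []
    return entries[0] if entries else {}


def build_item(value):
    """Build an output row, falling back to 'NA' for anything that is absent."""
    source = value.get('_source', {})
    location = first_entry(source, 'agentPrimaryLocation')
    marketing = first_entry(source, 'AgentMarketingCenter')
    return {
        'Full Name': source.get('fullName', 'NA'),
        'City': location.get('city', 'NA'),
        'Website': marketing.get('Website', 'NA'),
        'Twitter': marketing.get('Twitter', 'NA'),
    }


# Hypothetical record shaped like one element of resp['hits']['hits'].
sample = {
    '_source': {
        'fullName': 'Jane Doe',
        'agentPrimaryLocation': [{'city': 'Austin'}],
        'AgentMarketingCenter': [{'Website': 'https://example.com'}],
    }
}
item = build_item(sample)
print(item)
```

Inside the spider, `parse` could then simply `yield build_item(value)` for each value, which removes the need for the try/except block.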
