TypeError: string indices must be integers when creating a new column with apply in pandas



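I have a pandas DataFrame containing a list of incomplete addresses. I send each one to the Google Maps API to get as much data about the address as possible and store the result in a column named Components, which I then parse with other functions to extract the area name, postal code, and so on.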

This is what it looks like:

df['Components'][0]:
"{'access_points': [],
'address_components': [{'long_name': '350',
'short_name': '350',
'types': ['subpremise']},
{'long_name': '1313', 'short_name': '1313', 'types': ['street_number']},
{'long_name': 'Broadway', 'short_name': 'Broadway', 'types': ['route']},
{'long_name': 'New Tacoma',
'short_name': 'New Tacoma',
'types': ['neighborhood', 'political']},
{'long_name': 'Tacoma',
'short_name': 'Tacoma',
'types': ['locality', 'political']},
{'long_name': 'Pierce County',
'short_name': 'Pierce County',
'types': ['administrative_area_level_2', 'political']},
{'long_name': 'Washington',
'short_name': 'WA',
'types': ['administrative_area_level_1', 'political']},
{'long_name': 'United States',
'short_name': 'US',
'types': ['country', 'political']},
{'long_name': '98402', 'short_name': '98402', 'types': ['postal_code']}],
'formatted_address': '1313 Broadway #350, Tacoma, WA 98402, USA',
'geometry': {'location': {'lat': 47.250653, 'lng': -122.43913},
'location_type': 'ROOFTOP',
'viewport': {'northeast': {'lat': 47.2520019802915,
'lng': -122.4377810197085},
'southwest': {'lat': 47.2493040197085, 'lng': -122.4404789802915}}},
'place_id': 'ChIJcysCMHtVkFQRRUkEIPwScyk',
'plus_code': {'compound_code': '7H26+78 Tacoma, Washington, United States',
'global_code': '84VV7H26+78'},
'types': ['establishment', 'finance', 'point_of_interest']}"

I then use the following function to get the area name:

def get_area(address_data):
    for item in address_data['address_components']:
        typs = set(item['types'])
        if typs == set(['neighborhood', 'political']):
            return item['long_name']
    return None

df.loc[:10000, 'area'] = df['Components'][:10000].apply(get_area)

Running it raises the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-233-eb2932e010e3> in <module>
----> 1 dfm.loc[:10000, 'area'] = dfm['Components'][:10000].apply(get_area)
2 dfm['area'].value_counts()
~/virt_env/virt2/lib/python3.6/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
4040             else:
4041                 values = self.astype(object).values
-> 4042                 mapped = lib.map_infer(values, f, convert=convert_dtype)
4043 
4044         if len(mapped) and isinstance(mapped[0], Series):
pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()
<ipython-input-232-ede4aa629b42> in get_area(address_data)
149 
150 def get_area(address_data):
--> 151     for item in address_data['address_components']:
152         typs = set(item['types'])
153         if typs == set(['neighborhood', 'political']):
TypeError: string indices must be integers

How can I fix this so that I can run this function, and others, on the Components column?

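The problem is that each value in df['Components'] is a string, not a dict, so address_data['address_components'] tries to index a string with a string key. The string has to be parsed into a dict first. Because the sample shown uses single quotes, it is a Python literal rather than valid JSON, so ast.literal_eval is used here; json.loads would work only if the column held valid, double-quoted JSON. There are a couple of ways to do it. The first is to parse inside the function:

import ast

def get_area(address_data_raw):
    # Parse the string into a dict before indexing it
    address_data = ast.literal_eval(address_data_raw)
    for item in address_data['address_components']:
        ...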
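
To see why the parsing step is needed, here is a minimal sketch that uses only the standard library. The `raw` string is a shortened, made-up stand-in for one cell of the Components column, not the real API response.

```python
import ast
import json

# Hypothetical, shortened stand-in for one cell of df['Components'].
raw = "{'address_components': [{'long_name': 'New Tacoma', 'types': ['neighborhood', 'political']}]}"

# Indexing a str with a str key is what raises the TypeError in the question.
try:
    raw['address_components']
except TypeError as exc:
    string_error = str(exc)

# The text uses single quotes, so it is not valid JSON.
try:
    json.loads(raw)
    json_ok = True
except json.JSONDecodeError:
    json_ok = False

# ast.literal_eval only evaluates Python literals (dicts, lists, strings, numbers).
parsed = ast.literal_eval(raw)
area = parsed['address_components'][0]['long_name']

print(string_error)
print(json_ok, area)
```

If the column had been written with json.dumps instead of str, the strings would be valid JSON and json.loads would be the natural choice.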

The second way is to parse the column once and pass the resulting dicts to the original function:

import ast

def get_area(address_data):
    ...

# ast.literal_eval, not json.loads: the sample strings use single quotes
parsed = df['Components'][:10000].apply(ast.literal_eval)
df.loc[:10000, 'area'] = parsed.apply(get_area)

Those are a couple of ways to get it working!
