解决方案
我有一个数据框,其中一列作为列表,另一列作为字典。然而,这并不一致。它可以是单个元素,也可以是NULL
df = pd.DataFrame({'item_id':[1,1,1,2,3,4,4],
'shop_id':['S1','S2','S3','S2','S3','S1','S2'],
'price_list':["{'10':['S1','S2'], '20':['S3'], '30':['S4']}","{'10':['S1','S2'], '20':['S3'], '30':['S4']}","{'10':['S1','S2'], '20':['S3'], '30':['S4']}",'50','NaN',"{'10':['S1','S2','S3'],'25':['S4']}","{'10':['S1','S2','S3'],'25':['S4']}"]})
+---------+---------+--------------------------------------------------+
| item_id | shop_id | price_list |
+---------+---------+--------------------------------------------------+
| 1 | S1 | {'10': ['S1', 'S2'], '20': ['S3'], '30': ['S4']} |
| 1 | S2 | {'10': ['S1', 'S2'], '20': ['S3'], '30': ['S4']} |
| 1 | S3 | {'10': ['S1', 'S2'], '20': ['S3'], '30': ['S4']} |
| 2 | S2 | 50 |
| 3 | S3 | NaN |
| 4 | S1 | {'10': ['S1', 'S2', 'S3'], '25': ['S4']} |
| 4 | S2 | {'10': ['S1', 'S2', 'S3'], '25': ['S4']} |
+---------+---------+--------------------------------------------------+
我想把这个扩展为:
+---------+---------+-------+
| item_id | shop_id | price |
+---------+---------+-------+
| 1 | S1 | 10 |
| 1 | S2 | 10 |
| 1 | S3 | 20 |
| 2 | S2 | 50 |
| 3 | S3 | NaN |
| 4 | S1 | 10 |
| 4 | S2 | 10 |
+---------+---------+-------+
我试过apply:
def get_price(row):
if row['price_list'][0]=='{':
prices = eval(row['price_list'])
for key,value in prices.items():
if str(row['shop_id']) in value:
price = key
break
price = np.nan
else:
price = row["price_list"]
return price
df['price'] = df.apply(lambda row: get_price(row),axis=1)
price_list列中的字典元素实际上是字符串,所以我可能需要首先将它们作为字典求值?
但是上面的方法花费了很多时间,因为我的数据框架非常大。
实现这一目标的最佳方法是什么?任何建议都很感激。谢谢!
我将使用带有生成器的列表推导式从值中搜索键:
df['price'] = [next((k for k,l in d.items() if s in l), None)
if isinstance(d, dict) else d
for s, d in zip(df['shop_id'], df.pop('price_list'))]
NB。pop
删除了"price_list";列就位
输出:
item_id shop_id price
0 1 S1 10
1 1 S2 10
2 1 S3 20
3 2 S2 50
4 3 S3 NaN
5 4 S1 10
6 4 S2 10
如果您有dicts的字符串表示形式,则使用解决方案import ast
df['price'] = [next((k for k,l in ast.literal_eval(d).items() if s in l), None)
if isinstance(d, str) and d.startswith('{') else d
for s, d in zip(df['shop_id'], df.pop('price_list'))]
import ast
df['price'] = [next((k for k,l in ast.literal_eval(d).items() if s in l), None)
if isinstance(d, str) and d.startswith('{') else d
for s, d in zip(df['shop_id'], df.pop('price_list'))]