从列表字典的数据框列中提取值并创建新列



我有一个数据框,其中一列作为列表,另一列作为字典。然而,这并不一致。它可以是单个元素,也可以是NULL

df = pd.DataFrame({'item_id':[1,1,1,2,3,4,4],
'shop_id':['S1','S2','S3','S2','S3','S1','S2'], 
'price_list':["{'10':['S1','S2'], '20':['S3'], '30':['S4']}","{'10':['S1','S2'], '20':['S3'], '30':['S4']}","{'10':['S1','S2'], '20':['S3'], '30':['S4']}",'50','NaN',"{'10':['S1','S2','S3'],'25':['S4']}","{'10':['S1','S2','S3'],'25':['S4']}"]})

+---------+---------+--------------------------------------------------+
| item_id | shop_id |                      price_list                  |
+---------+---------+--------------------------------------------------+
|       1 | S1      | {'10': ['S1', 'S2'], '20': ['S3'], '30': ['S4']} |
|       1 | S2      | {'10': ['S1', 'S2'], '20': ['S3'], '30': ['S4']} |
|       1 | S3      | {'10': ['S1', 'S2'], '20': ['S3'], '30': ['S4']} |
|       2 | S2      | 50                                               |
|       3 | S3      | NaN                                              |
|       4 | S1      | {'10': ['S1', 'S2', 'S3'], '25': ['S4']}         |
|       4 | S2      | {'10': ['S1', 'S2', 'S3'], '25': ['S4']}         |
+---------+---------+--------------------------------------------------+

我想把这个扩展为:

+---------+---------+-------+
| item_id | shop_id | price |
+---------+---------+-------+
|       1 | S1      | 10    |
|       1 | S2      | 10    |
|       1 | S3      | 20    |
|       2 | S2      | 50    |
|       3 | S3      | NaN   |
|       4 | S1      | 10    |
|       4 | S2      | 10    |
+---------+---------+-------+

我试过apply:

def get_price(row):
if row['price_list'][0]=='{':
prices = eval(row['price_list'])
for key,value in prices.items():
if str(row['shop_id']) in value:
price = key
break
price =  np.nan
else:
price =  row["price_list"]
return price

df['price'] = df.apply(lambda row: get_price(row),axis=1)

price_list列中的字典元素实际上是字符串,所以我可能需要首先将它们作为字典求值?

但是上面的方法花费了很多时间,因为我的数据框架非常大。

实现这一目标的最佳方法是什么?任何建议都很感激。谢谢!

我将使用带有生成器的列表推导式从值中搜索键:

df['price'] = [next((k for k,l in d.items() if s in l), None)
if isinstance(d, dict) else d
for s, d in zip(df['shop_id'], df.pop('price_list'))]

NB。pop删除了"price_list";列就位

输出:

item_id shop_id price
0        1      S1    10
1        1      S2    10
2        1      S3    20
3        2      S2    50
4        3      S3   NaN
5        4      S1    10
6        4      S2    10
如果您有dicts的字符串表示形式,则使用

解决方案
import ast
df['price'] = [next((k for k,l in ast.literal_eval(d).items() if s in l), None)
if isinstance(d, str) and d.startswith('{') else d
for s, d in zip(df['shop_id'], df.pop('price_list'))]

相关内容

  • 没有找到相关文章

最新更新