How to create a new key-value pair in the dictionary stored in a Pandas DataFrame cell, based on a condition



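I need to add a new key-value pair to a pandas DataFrame column based on a condition. The target column holds dictionaries. If the condition is true, the pair should be created; otherwise nothing should happen.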

import numpy as np
import pandas as pd
df = pd.DataFrame({"amenity": ["1","2","3","4"], "tags": [{"building":"yes"},{"entrance": "yes"},{},{}], "sport": [None, "hockey", "football", None], "leisure":["multi", "some", "field", "wake"]})
leisure_var_add = ["field", "multi"]
# Attempt that does not work: df['tags']['sport'] targets a row label 'sport' of the tags Series, not a key inside each row's dict
df['tags']['sport'] = np.where(df['sport'].notna() | df['leisure'].isin(leisure_var_add), df['sport'], None)
df['tags']['leisure'] = np.where(df['sport'].isna() & df['leisure'].notna() & ~df['leisure'].isin(leisure_var_add), df['leisure'], None)

I want something like this:

amenity                                         tags     sport leisure
0       1          {'building':'yes','sport': 'multi'}      None   multi
1       2        {'entrance': 'yes','sport': 'hockey'}    hockey    some
2       3    {'sport': 'football', 'leisure': 'field'}  football   field
3       4                          {'leisure': 'wake'}      None    wake

I have already done this by looping over every row and using index operations, but that way I lose all the benefits of Pandas. Do you know how to implement it?
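A minimal sketch of one way to keep the conditions vectorized, assuming the two conditions in the attempt above are the intended rule: compute the candidate value for each key with np.where over whole columns (using .notna()/.isna() instead of comparing with None), then merge only the non-null values into a copy of each row's dict. Because the cells hold plain Python dicts, the merge step is still a per-row Python loop. merge_tags is a hypothetical helper, not a pandas function. With these conditions row 0 gets no 'sport' key (its sport is None) and row 2 gets no 'leisure' key, so the result differs from the expected table above; the two masks would need adjusting to match it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"amenity": ["1", "2", "3", "4"],
                   "tags": [{"building": "yes"}, {"entrance": "yes"}, {}, {}],
                   "sport": [None, "hockey", "football", None],
                   "leisure": ["multi", "some", "field", "wake"]})
leisure_var_add = ["field", "multi"]

# Vectorized conditions (same logic as the question's attempt).
# Rows that fail a condition get None for that key.
sport_new = np.where(df["sport"].notna() | df["leisure"].isin(leisure_var_add),
                     df["sport"], None)
leisure_new = np.where(df["sport"].isna() & df["leisure"].notna()
                       & ~df["leisure"].isin(leisure_var_add),
                       df["leisure"], None)


def merge_tags(tags, **new):
    """Hypothetical helper: copy `tags` and add every non-null entry of `new`."""
    out = dict(tags)
    out.update({k: v for k, v in new.items() if pd.notna(v)})
    return out


# The dict merge itself runs as a plain Python loop over the rows.
df["tags"] = [merge_tags(t, sport=s, leisure=l)
              for t, s, l in zip(df["tags"], sport_new, leisure_new)]
print(df["tags"].tolist())
```

Copying the dict inside merge_tags avoids mutating the original dictionaries in place, which matters if they are shared with another DataFrame.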

Answer: build the dictionary with a comprehension inside apply:

# Build each row's tags from its non-null sport/leisure values (this replaces the existing tags)
df['tags'] = (df[['sport', 'leisure']]
              .apply(lambda x: {k: v for k, v in x[x.notna()].items()}, axis=1))

Output:

>>> df
amenity                                       tags     sport leisure
0       1                       {'leisure': 'multi'}      None   multi
1       2     {'sport': 'hockey', 'leisure': 'some'}    hockey    some
2       3  {'sport': 'football', 'leisure': 'field'}  football   field
3       4                        {'leisure': 'wake'}      None    wake
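The comprehension above replaces the existing tags, so {'building': 'yes'} and {'entrance': 'yes'} are lost. A sketch of a variant that keeps them, assuming the goal is to add the non-null sport and leisure values on top of whatever each dict already contains; it swaps the explicit comprehension for Series.dropna followed by Series.to_dict:

```python
import pandas as pd

df = pd.DataFrame({"amenity": ["1", "2", "3", "4"],
                   "tags": [{"building": "yes"}, {"entrance": "yes"}, {}, {}],
                   "sport": [None, "hockey", "football", None],
                   "leisure": ["multi", "some", "field", "wake"]})

# Start from each row's existing tags, then add its non-null sport/leisure values.
df["tags"] = df.apply(
    lambda r: {**r["tags"], **r[["sport", "leisure"]].dropna().to_dict()},
    axis=1,
)
print(df["tags"].tolist())
```

Keys from the existing dict come first; if a dict already had a 'sport' or 'leisure' key, the column value would overwrite it because it is unpacked last.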

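I used apply to move all the data into columns, then iterated over the rows, building the tags dictionary from the column data and excluding amenity: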

import pandas as pd

df = pd.DataFrame({"amenity": ["1","2","3","4"], "tags": [{"building":"yes"},{"entrance": "yes"},{},{}], "sport": [None, "hockey", "football", None], "leisure":["multi", "some", "field", "wake"]})

def EmptyList(x):
    # Return the first element of a list, or None if the list is empty
    if len(x) > 0:
        return x[0]
    else:
        return None

# Move the 'building' and 'entrance' entries of each tags dict into their own columns
df['building'] = df['tags'].apply(lambda x: [v for k, v in x.items() if k == 'building']).apply(EmptyList)
df['entrance'] = df['tags'].apply(lambda x: [v for k, v in x.items() if k == 'entrance']).apply(EmptyList)
df.drop(['tags'], inplace=True, axis=1)
print(df)

# Rebuild a tags dict for each row from every non-None column except 'amenity'
tags_dict = {}
columns = df.columns
for key, value in df.iterrows():
    for column in columns:
        if value[column] is not None and column != 'amenity':
            tags_dict[column] = value[column]
    # The dict is stored as its string representation, not as a dict
    df.loc[key, 'tags'] = str(tags_dict)
    tags_dict.clear()
print(df)

Output:
amenity     sport leisure building entrance  
0       1      None   multi      yes     None   
1       2    hockey    some     None      yes   
2       3  football   field     None     None   
3       4      None    wake     None     None   
tags  
0            {'leisure': 'multi', 'building': 'yes'}  
1  {'sport': 'hockey', 'leisure': 'some', 'entran...  
2          {'sport': 'football', 'leisure': 'field'}  
3                                {'leisure': 'wake'}  
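The same flattening can be sketched without iterrows, assuming a real dict (rather than the string produced by str(tags_dict)) is wanted in the tags column: expand the dicts into columns with the DataFrame constructor, join them back on the index, and build one dict per row from to_dict("records"), skipping missing values with pd.notna. The names expanded, flat and records are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"amenity": ["1", "2", "3", "4"],
                   "tags": [{"building": "yes"}, {"entrance": "yes"}, {}, {}],
                   "sport": [None, "hockey", "football", None],
                   "leisure": ["multi", "some", "field", "wake"]})

# Expand each tags dict into columns (missing keys become NaN), keeping the row index.
expanded = pd.DataFrame(df["tags"].tolist(), index=df.index)
flat = df.drop(columns="tags").join(expanded)

# One dict per row from every column except 'amenity', skipping None/NaN values.
records = flat.drop(columns="amenity").to_dict("records")
flat["tags"] = [{k: v for k, v in rec.items() if pd.notna(v)} for rec in records]
print(flat["tags"].tolist())
```

The final comprehension is still Python-level work per row, but it avoids iterrows and the repeated df.loc writes, and the key order follows the column order of flat.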
