I have this pandas DataFrame:
import pandas as pd
df = pd.DataFrame([{'col1': ['plane', 'chair']}, {'col1': ['computer', 'beach', 'book', 'language']}, {'col1': ['rice', 'bus', 'street']}])
I have this dictionary:
categories = {
    'transport': ['car', 'truck', 'plane'],
    'reading': ['book', 'library'],
    'food': ['rice', 'milk', 'tea']
}
I want a final output like this:
index col1 col2
0: ['plane', 'chair'], transport-plane
1: ['computer', 'beach', 'book', 'language'], reading-book
2: ['rice', 'bus', 'street'], food-rice
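I want col2 to contain both the key and the value from the dictionary.
So far I have only managed to add the dictionary key, not the key together with the matching value.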
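Try:
import numpy as np

# Invert the dictionary so each word maps to its category name
tmp = {vv: k for k, v in categories.items() for vv in v}
# One row per list element; the original index is repeated
x = df.explode("col1")
x["col2"] = x["col1"].apply(
    lambda w: "{}-{}".format(tmp[w], w) if w in tmp else np.nan
)
# Rebuild the lists and join the non-missing labels per original row
x = x.groupby(level=0).agg(
    col1=("col1", list), col2=("col2", lambda s: ", ".join(s[s.notna()]))
)
print(x)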
Prints:
                                col1             col2
0                     [plane, chair]  transport-plane
1  [computer, beach, book, language]     reading-book
2                [rice, bus, street]        food-rice
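The key step in this answer is the inverted dictionary, which maps each word back to its category. Here is a minimal sketch of that lookup in plain Python, without explode. The names `word_to_category`, `label_row` and `rows` are made up for illustration and are not part of the answer.

```python
categories = {
    "transport": ["car", "truck", "plane"],
    "reading": ["book", "library"],
    "food": ["rice", "milk", "tea"],
}

# Invert the mapping: every word points back to its category name.
word_to_category = {
    word: name for name, words in categories.items() for word in words
}

def label_row(words):
    """Return 'category-word' for each word found in the lookup, comma-joined."""
    return ", ".join(
        f"{word_to_category[w]}-{w}" for w in words if w in word_to_category
    )

# The same lists as in the question's col1 (illustrative stand-in for the column).
rows = [
    ["plane", "chair"],
    ["computer", "beach", "book", "language"],
    ["rice", "bus", "street"],
]
labels = [label_row(r) for r in rows]
print(labels)
```

With the DataFrame from the question, the same function could presumably be applied per row with df["col1"].apply(label_row). A row with no matching word yields an empty string.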
How about:
import pandas as pd
df = pd.DataFrame([{'col1': ['plane', 'chair']}, {'col1': ['computer', 'beach', 'book', 'language']}, {'col1': ['rice', 'bus', 'street']}])
categories = {
    'transport': ['car', 'truck', 'plane'],
    'reading': ['book', 'library'],
    'food': ['rice', 'milk', 'tea']
}
def match_pairs(categories, df):
    col2 = []
    index = 0
    # Note: this pairs the i-th category with the i-th row of df
    for v in categories:
        print(f'{df["col1"][index]} at index {index}')
        for entry in df['col1'][index]:
            print(f"Finding [{entry}] in {categories[v]}...")
            if entry in categories[v]:
                # Record "key-value" for the first match, then move on
                col2.append(v + '-' + entry)
                break
        index += 1
    print(col2)
    df['col2'] = col2
    return df

print(match_pairs(categories, df))
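Note that the loop above checks the i-th row only against the i-th category. That works for the sample data, but it would break if the rows came in a different order or if a row matched nothing. Here is a hedged sketch of a variant that checks every category for every row. The function name `match_pairs_any_order` and the sample rows are hypothetical, and it assumes only the first match per row is wanted.

```python
import pandas as pd

categories = {
    "transport": ["car", "truck", "plane"],
    "reading": ["book", "library"],
    "food": ["rice", "milk", "tea"],
}

def match_pairs_any_order(categories, df):
    """Hypothetical variant: label each row with its first 'category-word' match."""
    col2 = []
    for words in df["col1"]:
        label = None  # stays None when no word in the row is in any category
        for word in words:
            for name, members in categories.items():
                if word in members:
                    label = f"{name}-{word}"
                    break
            if label is not None:
                break  # keep only the first match per row
        col2.append(label)
    # assign returns a new DataFrame and leaves the input untouched
    return df.assign(col2=col2)

# Rows deliberately not in dictionary order, plus one row with no match.
sample = pd.DataFrame({"col1": [["rice", "bus"], ["sofa"], ["plane", "chair"]]})
result = match_pairs_any_order(categories, sample)
print(result)
```

Returning a copy via assign instead of mutating df is a design choice in this sketch. The original function modifies the DataFrame it is given.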