Python pandas: create a column from the strings in a list that are contained in another column



Given this DataFrame:

import pandas as pd

data = {'Description': ['with milk and orange', 'champagne', 'BANANA', 'bananas and apple', 'fafsa Lemons', 'GIN LEMON'],
        'Amount': ['10', '20', '10', '5', '9', '15']}
df = pd.DataFrame(data)
print(df)

and the following list:

Fruits = ['apple','banana','lemon', 'orange']

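How can I get the "Fruit" column? (Search the Description column for every element of Fruits and, if an element is contained in the description, add it to the "Fruit" column.)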

datanew = {'Description': ['with milk and orange', 'champagne', 'BANANA', 'bananas and apple', 'fafsa Lemons', 'GIN LEMON'],
           'Amount': ['10', '20', '10', '5', '9', '15'],
           'Fruit': ['orange', '', 'banana', 'banana-apple', 'lemon', 'lemon']}
df2 = pd.DataFrame(datanew)
print(df2)

Here is another approach using str.findall():

(df.assign(Fruit=df['Description'].str.lower()
                                  .str.findall('|'.join(Fruits))
                                  .str.join('-')))
            Description Amount         Fruit
0  with milk and orange     10        orange
1             champagne     20              
2                BANANA     10        banana
3     bananas and apple      5  banana-apple
4          fafsa Lemons      9         lemon
5             GIN LEMON     15         lemon
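
The str.findall() approach above lowercases the descriptions first and joins the fruit names directly into a regular expression. Below is a self-contained sketch of the same idea under two assumptions that are not in the original answer: the names are passed through re.escape in case a list entry contains regex metacharacters, and matching is made case-insensitive with flags=re.IGNORECASE instead of lowercasing beforehand. The variable names pattern and result are illustrative only.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'Description': ['with milk and orange', 'champagne', 'BANANA',
                    'bananas and apple', 'fafsa Lemons', 'GIN LEMON'],
    'Amount': ['10', '20', '10', '5', '9', '15'],
})
Fruits = ['apple', 'banana', 'lemon', 'orange']

# Assumption: escape each name so characters such as "." or "+" match literally.
pattern = '|'.join(re.escape(fruit) for fruit in Fruits)

result = df.assign(
    Fruit=df['Description']
    .str.findall(pattern, flags=re.IGNORECASE)  # list of matches per row
    .str.join('-')                              # an empty list becomes ''
    .str.lower()                                # normalise the matched text
)
print(result)
```

Because the pattern has no word boundaries, "bananas" still yields "banana", as in the output above.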

You can use str.extractall together with groupby.agg:

import re

# Extract every case-insensitive match, then join the matches of each row with '-'
df['Fruit'] = (df['Description']
               .str.extractall(f"({'|'.join(Fruits)})", flags=re.I)
               .groupby(level=0).agg('-'.join)[0]
               .str.lower()
               )

Output:

            Description Amount         Fruit
0  with milk and orange     10        orange
1             champagne     20           NaN
2                BANANA     10        banana
3     bananas and apple      5  banana-apple
4          fafsa Lemons      9         lemon
5             GIN LEMON     15         lemon
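
In the str.extractall output, row 1 (champagne) is NaN because rows without any match are absent from the grouped result, whereas the expected DataFrame in the question has an empty string there. One possible way to close that gap, shown as a sketch and not as part of the original answer, is to reindex the grouped result against df.index with fill_value=''. The intermediate names pattern, matches, and joined are illustrative.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'Description': ['with milk and orange', 'champagne', 'BANANA',
                    'bananas and apple', 'fafsa Lemons', 'GIN LEMON'],
    'Amount': ['10', '20', '10', '5', '9', '15'],
})
Fruits = ['apple', 'banana', 'lemon', 'orange']

pattern = f"({'|'.join(Fruits)})"

# One row per match, indexed by (original row, match number); column 0 is the capture group.
matches = df['Description'].str.extractall(pattern, flags=re.I)

# Join the matches of each original row, then lowercase the joined text.
joined = matches.groupby(level=0)[0].agg('-'.join).str.lower()

# Rows with no match are missing from `joined`; reindex restores them as ''.
df['Fruit'] = joined.reindex(df.index, fill_value='')
print(df)
```

Calling .fillna('') on the assigned column should give the same result; reindex simply makes the missing rows explicit.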
