Given this DataFrame:
import pandas as pd

data = {'Description': ['with milk and orange', 'champagne', 'BANANA', 'bananas and apple', 'fafsa Lemons', 'GIN LEMON'],
        'Amount': ['10', '20', '10', '5', '9', '15']}
df = pd.DataFrame(data)
print(df)
and the following list:
Fruits = ['apple','banana','lemon', 'orange']
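How can I get the "Fruit" column? (Search the Description column for every element of Fruits and, if an element is contained in the description, add it to the "Fruit" column.) The desired result is: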
datanew = {'Description': ['with milk and orange', 'champagne', 'BANANA', 'bananas and apple', 'fafsa Lemons', 'GIN LEMON'],
           'Amount': ['10', '20', '10', '5', '9', '15'],
           'Fruit': ['orange', '', 'banana', 'banana-apple', 'lemon', 'lemon'],
           }
df2 = pd.DataFrame(datanew)
print(df2)
Here is another approach, using str.findall():
(df.assign(Fruit=df['Description'].str.lower()
                                  .str.findall('|'.join(Fruits))
                                  .str.join('-')))
            Description Amount         Fruit
0  with milk and orange     10        orange
1             champagne     20
2                BANANA     10        banana
3     bananas and apple      5  banana-apple
4          fafsa Lemons      9         lemon
5             GIN LEMON     15         lemon
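
For reference, the findall approach can be written as a self-contained script. This is only a sketch: the use of re.escape and the variable name pattern are my additions, not part of the answer above. Escaping makes no difference for these four fruit names, but it should keep the alternation safe if a search term ever contains a regex metacharacter such as "." or "+".

```python
import re

import pandas as pd

df = pd.DataFrame({
    'Description': ['with milk and orange', 'champagne', 'BANANA',
                    'bananas and apple', 'fafsa Lemons', 'GIN LEMON'],
    'Amount': ['10', '20', '10', '5', '9', '15'],
})
Fruits = ['apple', 'banana', 'lemon', 'orange']

# Build one alternation pattern; re.escape is an added precaution (assumption),
# it is a no-op for plain alphabetic terms like these.
pattern = '|'.join(re.escape(fruit) for fruit in Fruits)

# Lowercase first so matching is case-insensitive, collect every match per row,
# then join the matches with '-'. Rows without a match yield an empty string.
result = df.assign(
    Fruit=df['Description'].str.lower().str.findall(pattern).str.join('-')
)
print(result)
```

Note that assign returns a new DataFrame and leaves df unchanged, so the result has to be stored if it is needed later.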
You can use str.extractall and groupby.agg:
import re

df['Fruit'] = (df['Description']
               .str.extractall(f"({'|'.join(Fruits)})", flags=re.I)
               .groupby(level=0).agg('-'.join)[0]
               .str.lower()
               )
Output:
            Description Amount         Fruit
0  with milk and orange     10        orange
1             champagne     20           NaN
2                BANANA     10        banana
3     bananas and apple      5  banana-apple
4          fafsa Lemons      9         lemon
5             GIN LEMON     15         lemon
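
Rows with no match come out as NaN here, whereas the desired output in the question has an empty string. One possible way to close that gap, sketched under the assumption that an empty string is what you want, is to reindex the aggregated result against the original index with fill_value=''. The intermediate names pattern, matches and fruit, and the re.escape call, are illustrative additions rather than part of the answer above.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'Description': ['with milk and orange', 'champagne', 'BANANA',
                    'bananas and apple', 'fafsa Lemons', 'GIN LEMON'],
    'Amount': ['10', '20', '10', '5', '9', '15'],
})
Fruits = ['apple', 'banana', 'lemon', 'orange']

# extractall needs a capture group, hence the surrounding parentheses.
pattern = f"({'|'.join(map(re.escape, Fruits))})"

# One row per match, indexed by (original row, match number); column 0 holds the match.
matches = df['Description'].str.extractall(pattern, flags=re.I)

# Lowercase each match, then join the matches of each original row with '-'.
fruit = matches[0].str.lower().groupby(level=0).agg('-'.join)

# Rows without any match are absent from `fruit`; reindex restores them as ''.
df['Fruit'] = fruit.reindex(df.index, fill_value='')
print(df)
```

Calling .fillna('') on the assigned column afterwards should give the same result; reindex just keeps it in one chain.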