I'm curious whether it is possible to apply multiple functions to a single pandas DataFrame column. For example, suppose I have three functions:
in:
def foo(col):
    if 'hi' in col:
        return 'TRUE'
def bar(col):
    if 'bye' in col:
        return 'TRUE'
def baz(col):
    if 'ok' in col:
        return 'TRUE'
and the following DataFrame:
import pandas as pd
dfs = pd.DataFrame({'col': ['The quick hi brown fox hi jumps over the lazy dog',
                            'The quick hi brown fox bye jumps over the lazy dog',
                            'The NO quick brown fox ok jumps bye over the lazy dog']})
If I want to apply each function to col, I would normally use the pandas apply function:
dfs['new_col1'] = dfs['col'].apply(foo)
dfs['new_col2'] = dfs['col'].apply(bar)
dfs['new_col3'] = dfs['col'].apply(baz)
dfs
out:
col new_col1 new_col2 new_col3
0 The quick hi brown fox hi jumps over the lazy dog TRUE None None
1 The quick hi brown fox bye jumps over the lazy... TRUE TRUE None
2 The NO quick brown fox ok jumps bye over the l... None TRUE TRUE
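However, as you can see, this creates 3 columns. So my question is: how can I efficiently apply the 3 functions above to a specific column of a large DataFrame, all at once, so that the results end up in a single column? The expected result would be: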
col new_col
0 The quick hi brown fox hi jumps over the lazy dog TRUE
1 The quick hi brown fox bye jumps over the lazy... TRUE, TRUE
2 The NO quick brown fox ok jumps bye over the l... TRUE, TRUE
Note that I know I could merge the 3 columns into a single column. Still, I would like to know whether the above is possible.
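Why not wrap all the functions in one giant function?
def oneGiantFunc(col):
    def foo(col):
        if 'hi' in col:
            return 'TRUE'
    def bar(col):
        if 'bye' in col:
            return 'TRUE'
    def baz(col):
        if 'ok' in col:
            return 'TRUE'
    a = foo(col)
    b = bar(col)
    c = baz(col)
    # join only the results that are not None, so the output reads 'TRUE, TRUE'
    # rather than 'TRUE TRUE None'
    return ', '.join(r for r in (a, b, c) if r is not None)
dfs['new_col'] = dfs['col'].apply(oneGiantFunc)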
You can use apply together with a list comprehension that filters out the None values:
dfs['new_col'] = dfs['col'].apply(lambda x: ', '.join([r for r in
                                  [foo(x), bar(x), baz(x)] if r is not None]))
print (dfs)
col new_col
0 The quick hi brown fox hi jumps over the lazy dog TRUE
1 The quick hi brown fox bye jumps over the lazy... TRUE, TRUE
2 The NO quick brown fox ok jumps bye over the l... TRUE, TRUE
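I don't think you can actually do it "simultaneously". Nevertheless, here are 2 options:
1. Assuming the functions are defined as above (the & operator needs boolean operands, so the 'TRUE'/None results are converted with notna() first):
dfs['new_col1'] = dfs['col'].apply(foo).notna() & dfs['col'].apply(bar).notna() & dfs['col'].apply(baz).notna()
2. Redefine the function
def aao(col):  # all at once
    if ('hi' in col) and ('bye' in col) and ('ok' in col):
        return 'TRUE'
dfs['new_col'] = dfs['col'].apply(aao)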
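Worth noting: both options above combine the checks with a logical AND, so a row is flagged only when all three substrings occur in it. That is different from the comma-joined output the question asks for. A minimal sketch to illustrate, using the question's sample rows plus one extra row ('hi bye ok') that is purely an assumption added here so that at least one row matches:

```python
import pandas as pd

def aao(col):  # "all at once": every substring must be present
    if 'hi' in col and 'bye' in col and 'ok' in col:
        return 'TRUE'

dfs = pd.DataFrame({'col': [
    'The quick hi brown fox hi jumps over the lazy dog',
    'The quick hi brown fox bye jumps over the lazy dog',
    'The NO quick brown fox ok jumps bye over the lazy dog',
    'hi bye ok',  # hypothetical extra row, not in the question's data
]})

all_three = dfs['col'].apply(aao)

# Only the extra row contains all three substrings.
flagged = all_three.notna().tolist()
print(flagged)
```

None of the three original rows contains 'hi', 'bye' and 'ok' together, so only the added row is flagged.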
Use a lambda function such as
lambda x: ', '.join([f(x) for f in [foo, bar, baz] if f(x)])
in the call to apply. Full example:
In : dfs['new_col'] = dfs['col'].apply(lambda x: ', '.join([f(x) for f in [foo, bar, baz] if f(x)]))
In : dfs
Out:
col new_col
0 The quick hi brown fox hi jumps over the lazy dog TRUE
1 The quick hi brown fox bye jumps over the lazy... TRUE, TRUE
2 The NO quick brown fox ok jumps bye over the l... TRUE, TRUE