How to apply multiple functions to a single pandas DataFrame column



I am curious whether it is possible to apply multiple functions to a single pandas DataFrame column. For example, suppose I have three functions:

in:

def foo(col):
    if 'hi' in col:
        return 'TRUE'
def bar(col):
    if 'bye' in col:
        return 'TRUE'
def baz(col):
    if 'ok' in col:
        return 'TRUE'

and the following DataFrame:

import pandas as pd

dfs = pd.DataFrame({'col':['The quick hi brown fox hi jumps over the lazy dog',
                           'The quick hi brown fox bye jumps over the lazy dog',
                           'The NO quick brown fox ok jumps bye over the lazy dog']})

If I want to apply each function to col, I would normally use the pandas apply function:

dfs['new_col1'] = dfs['col'].apply(foo)
dfs['new_col2'] = dfs['col'].apply(bar)
dfs['new_col3'] = dfs['col'].apply(baz)
dfs

out:

    col     new_col1    new_col2    new_col3
0   The quick hi brown fox hi jumps over the lazy dog   TRUE    None    None
1   The quick hi brown fox bye jumps over the lazy...   TRUE    TRUE    None
2   The NO quick brown fox ok jumps bye over the l...   None    TRUE    TRUE

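However, as you can see, I created three columns. So my question is: how can I efficiently apply the three functions above to a specific column of a large DataFrame, all at the same time? The expected result would be: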

    col                                                 new_col
0   The quick hi brown fox hi jumps over the lazy dog   TRUE
1   The quick hi brown fox bye jumps over the lazy...   TRUE, TRUE
2   The NO quick brown fox ok jumps bye over the l...   TRUE, TRUE

Note that I know I could merge the three columns into a single column. Still, I would like to know whether the above is possible.
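For reference, the merge-the-columns approach mentioned above might look like the sketch below. It assumes the three functions return either 'TRUE' or None, as in the question; the names `parts` and the `keys` labels are only illustrative.

```python
import pandas as pd

def foo(col):
    if 'hi' in col:
        return 'TRUE'

def bar(col):
    if 'bye' in col:
        return 'TRUE'

def baz(col):
    if 'ok' in col:
        return 'TRUE'

dfs = pd.DataFrame({'col': ['The quick hi brown fox hi jumps over the lazy dog',
                            'The quick hi brown fox bye jumps over the lazy dog',
                            'The NO quick brown fox ok jumps bye over the lazy dog']})

# Build the three intermediate results side by side without adding them to dfs.
parts = pd.concat([dfs['col'].apply(f) for f in (foo, bar, baz)],
                  axis=1, keys=['foo', 'bar', 'baz'])

# For each row, drop the missing results and join what is left.
dfs['new_col'] = parts.apply(lambda row: ', '.join(row.dropna()), axis=1)
print(dfs['new_col'].tolist())
```

This still runs apply three times over the column, so it is a convenience rather than a speed-up.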

Why not combine all the functions into one giant function?

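def oneGiantFunc(col):
    def foo(col):
        if 'hi' in col:
            return 'TRUE'
    def bar(col):
        if 'bye' in col:
            return 'TRUE'
    def baz(col):
        if 'ok' in col:
            return 'TRUE'
    # Call each helper once and keep only the results that are not None,
    # so the output is 'TRUE, TRUE' rather than 'TRUE TRUE None'.
    results = [foo(col), bar(col), baz(col)]
    return ', '.join(r for r in results if r is not None)

dfs['new_col'] = dfs['col'].apply(oneGiantFunc)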

You can use apply together with a list comprehension that filters out the None values:

dfs['new_col'] = dfs['col'].apply(lambda x: ', '.join([r for r in
                                            [foo(x), bar(x), baz(x)] if r is not None]))
print(dfs)
                                                 col     new_col
0  The quick hi brown fox hi jumps over the lazy dog        TRUE
1  The quick hi brown fox bye jumps over the lazy...  TRUE, TRUE
2  The NO quick brown fox ok jumps bye over the l...  TRUE, TRUE
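Since the question asks about large DataFrames, here is a hedged alternative that swaps the Python functions for pandas' vectorized `Series.str.contains`. It is not what the answer above does, and it only works because foo, bar and baz are plain substring checks; the `keywords` list and the `flags` name are assumptions made for this sketch.

```python
import pandas as pd

dfs = pd.DataFrame({'col': ['The quick hi brown fox hi jumps over the lazy dog',
                            'The quick hi brown fox bye jumps over the lazy dog',
                            'The NO quick brown fox ok jumps bye over the lazy dog']})

# The substrings tested by foo, bar and baz in the question.
keywords = ['hi', 'bye', 'ok']

# One boolean column per keyword; regex=False makes this a literal substring test.
flags = pd.concat([dfs['col'].str.contains(k, regex=False) for k in keywords],
                  axis=1, keys=keywords)

# Count the matches per row and emit that many 'TRUE' entries.
dfs['new_col'] = flags.sum(axis=1).map(lambda n: ', '.join(['TRUE'] * int(n)))
print(dfs['new_col'].tolist())
```

Whether this is actually faster depends on the data, so it is worth timing both versions on a realistic sample.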

I don't think you can really do it "all at once". Nevertheless, here are two options:

1. With the functions defined as above:

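# foo, bar and baz return 'TRUE' or None, so convert each result to a boolean
# with notna() before combining; & does not work on the raw strings.
dfs['new_col1'] = (dfs['col'].apply(foo).notna()
                   & dfs['col'].apply(bar).notna()
                   & dfs['col'].apply(baz).notna())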

2. Redefine the function:

def aao(col): # all at once
    if ('hi' in col) and ('bye' in col) and ('ok' in col):
        return 'TRUE'
dfs['new_col'] = dfs['col'].apply(aao)
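Note that both options test whether all three substrings occur in the same row, which is not the same as the comma-joined column the question expects. A small sketch on the question's sample data illustrates the difference; `aao` is redefined here only so the snippet is self-contained.

```python
import pandas as pd

dfs = pd.DataFrame({'col': ['The quick hi brown fox hi jumps over the lazy dog',
                            'The quick hi brown fox bye jumps over the lazy dog',
                            'The NO quick brown fox ok jumps bye over the lazy dog']})

def aao(col):
    # Returns 'TRUE' only when every substring is present, otherwise None.
    if ('hi' in col) and ('bye' in col) and ('ok' in col):
        return 'TRUE'

result = dfs['col'].apply(aao)

# No sample row contains 'hi', 'bye' and 'ok' together, so every entry is missing.
print(result.isna().all())
```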

Use a lambda function such as

lambda x: ', '.join([f(x) for f in [foo, bar, baz] if f(x)])

in the call to apply. Full example:

In : dfs['new_col'] = dfs['col'].apply(lambda x: ', '.join([f(x) for f in [foo, bar, baz] if f(x)]))
In : dfs
Out: 
                                                 col     new_col
0  The quick hi brown fox hi jumps over the lazy dog        TRUE
1  The quick hi brown fox bye jumps over the lazy...  TRUE, TRUE
2  The NO quick brown fox ok jumps bye over the l...  TRUE, TRUE
