How to filter on a specific condition in pandas and apply a function at the same time



I have a DataFrame like this:

text                    pred      score                               logits
No thank you.           positive  [[0, 0, 1], [1, 0, 2], [1, 0, 0]]   [0.01, 0.02, 0.97]
They didn't respond me  negative  [[], [0, 1, 0], [], []]             [0.81, 0.10, 0.18]

It can be reproduced with:

import pandas as pd

# Note: score and logits are stored as strings here, not as real lists.
df = pd.DataFrame({'text': ['No thank you', 'They didnt respond me negative'],
                   'pred': ['positive', 'negative'],
                   'score': ['[[0, 0, 1], [1, 0, 2],[1, 0, 0]]', '[[], [0, 1, 0], [], []]'],
                   'logits': ['[0.01, 0.02, 0.97]', '[0.81, 0.10, 0.18]']})

What I need to do is:

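If df['pred'] == 'positive', I want to sum the first element of every list in score (loosely, sum(df['score'][0]), i.e. 0 + 1 + 1) and multiply the result by the third element of logits, df['logits'][2] (0.97).

For negative we do the same thing, only with different positions: sum the second element of every list in score (loosely, sum(df['score'][1]), i.e. 0 + 1 + 0 + 0) and multiply by the first element of logits, df['logits'][0] (0.81).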

So the output should look like this:

text                    pred      score                               logits              decision
No thank you.           positive  [[0, 0, 1], [1, 0, 2], [1, 0, 0]]   [0.01, 0.02, 0.97]  1.94
They didn't respond me  negative  [[], [0, 1, 0], [], []]             [0.81, 0.10, 0.18]  0.81

This is what I tried (or rather, the logic I need to follow). The code obviously does not run, and I think the problem is in sum(df['score'][0]):

df[df['pred'] == 'positive','decision'] = df[df['pred'] == 'positive', df['logits'][2] * sum(df['score'][0])]

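To clarify:

In score there is one list per word, which is why the first row has 3 lists and the second has 4. Each list holds the (positive, negative, neutral) scores for that word. If a list is empty, it counts as zero in the calculation.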

One possible solution is to create mapping dictionaries that encode the rules (e.g. for positive, sum only index 0 of each list, and so on):

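# Index to sum inside each per-word score list, keyed by prediction.
m_sum = {"positive": 0, "negative": 1}
# Index of the logit to multiply by, keyed by prediction.
m_mul = {"positive": 2, "negative": 0}
df["decision"] = df.apply(
    # "if v" skips empty lists, so they contribute zero to the sum.
    lambda x: sum(v[m_sum[x["pred"]]] for v in x["score"] if v)
    * x["logits"][m_mul[x["pred"]]],
    axis=1,
)
print(df)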

Prints:

                    text,      pred                              score              logits  decision
0           No thank you.  positive  [[0, 0, 1], [1, 0, 2], [1, 0, 0]]  [0.01, 0.02, 0.97]      1.94
1  They didn't respond me  negative            [[], [0, 1, 0], [], []]   [0.81, 0.1, 0.18]      0.81
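
To make the one-liner easier to follow, the lambda can be unpacked into a named function. This is only a sketch of the same logic in plain Python: `decision_for_row` is a hypothetical name introduced here (it is not part of pandas), and the index conventions are assumed from the question's example.

```python
def decision_for_row(pred, score, logits):
    """Hypothetical helper: apply the rule from the question to one row."""
    # Assumption from the question's example: each per-word list is ordered
    # (positive, negative, neutral), while the positive logit sits at
    # index 2 and the negative logit at index 0.
    score_index = {"positive": 0, "negative": 1}[pred]
    logit_index = {"positive": 2, "negative": 0}[pred]
    # Empty lists are skipped, which is the same as counting them as zero.
    total = sum(word[score_index] for word in score if word)
    return total * logits[logit_index]


positive = decision_for_row(
    "positive", [[0, 0, 1], [1, 0, 2], [1, 0, 0]], [0.01, 0.02, 0.97]
)
negative = decision_for_row(
    "negative", [[], [0, 1, 0], [], []], [0.81, 0.10, 0.18]
)
print(positive, negative)
```

With real lists in the columns, something like `df.apply(lambda x: decision_for_row(x["pred"], x["score"], x["logits"]), axis=1)` should then give the same column as the lambda version.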

EDIT: If the columns contain strings, as in the question's snippet, parse them first with ast.literal_eval:

import pandas as pd
from ast import literal_eval

df = pd.DataFrame(
    {
        "text": ["No thank you", "They didnt respond me negative"],
        "pred": ["positive", "negative"],
        "score": [
            "[[0, 0, 1], [1, 0, 2],[1, 0, 0]]",
            "[[], [0, 1, 0], [], []]",
        ],
        "logits": ["[0.01, 0.02, 0.97]", "[0.81, 0.10, 0.18]"],
    }
)

df["score"] = df["score"].apply(literal_eval)
df["logits"] = df["logits"].apply(literal_eval)
m_sum = {"positive": 0, "negative": 1}
m_mul = {"positive": 2, "negative": 0}

df["decision"] = df.apply(
    lambda x: sum(v[m_sum[x["pred"]]] for v in x["score"] if v)
    * x["logits"][m_mul[x["pred"]]],
    axis=1,
)
print(df)

Prints:

                             text      pred                              score              logits  decision
0                    No thank you  positive  [[0, 0, 1], [1, 0, 2], [1, 0, 0]]  [0.01, 0.02, 0.97]      1.94
1  They didnt respond me negative  negative            [[], [0, 1, 0], [], []]   [0.81, 0.1, 0.18]      0.81
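
If you would rather filter on the condition first, as the attempt in the question tries to, a boolean mask with `df.loc` is one way to do it. The following is a minimal sketch, assuming the score and logits columns already hold real lists rather than strings; the `rules` list is a name introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "pred": ["positive", "negative"],
        "score": [[[0, 0, 1], [1, 0, 2], [1, 0, 0]], [[], [0, 1, 0], [], []]],
        "logits": [[0.01, 0.02, 0.97], [0.81, 0.10, 0.18]],
    }
)

# (label, index summed inside each score list, index of the logit to use)
rules = [("positive", 0, 2), ("negative", 1, 0)]

for label, score_idx, logit_idx in rules:
    mask = df["pred"] == label  # boolean filter for this label
    if not mask.any():
        continue  # no rows with this label, nothing to assign
    # Compute the value only for the filtered rows and write it back
    # to the same rows of the "decision" column.
    df.loc[mask, "decision"] = df.loc[mask].apply(
        lambda row: sum(v[score_idx] for v in row["score"] if v)
        * row["logits"][logit_idx],
        axis=1,
    )

print(df[["pred", "decision"]])
```

The dictionary approach above handles every row in a single `apply`, while this variant keeps the filter explicit, which may be easier to extend if each label later needs a different rule.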
