I have a DataFrame like this:
text                    pred      score                              logits
No thank you.           positive  [[0, 0, 1], [1, 0, 2], [1, 0, 0]]  [0.01, 0.02, 0.97]
They didn't respond me  negative  [[], [0, 1, 0], [], []]            [0.81, 0.10, 0.18]
It can be created with:
import pandas as pd

df = pd.DataFrame({'text': ['No thank you', 'They didnt respond me negative'],
                   'pred': ['positive', 'negative'],
                   'score': ['[[0, 0, 1], [1, 0, 2], [1, 0, 0]]', '[[], [0, 1, 0], [], []]'],
                   'logits': ['[0.01, 0.02, 0.97]', '[0.81, 0.10, 0.18]']})
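What I need to do:
If df['pred'] == 'positive', I want to add up the elements in the first position of every list in score, which I wrote as sum(df['score'][0]), i.e. (0+1+1), and multiply that sum by the third element of logits, df['logits'][2], i.e. 0.97.
(For negative we do the same thing with different positions: add up the elements in the second position of every list in score, sum(df['score'][1]), i.e. 0+1+0+0, and multiply by the first element of logits, df['logits'][0], i.e. 0.81.)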
So the output would look like this:
text                    pred      score                              logits              decision
No thank you.           positive  [[0, 0, 1], [1, 0, 2], [1, 0, 0]]  [0.01, 0.02, 0.97]  1.94
They didn't respond me  negative  [[], [0, 1, 0], [], []]            [0.81, 0.10, 0.18]  0.81
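To make the arithmetic behind the expected decision values concrete, here is a minimal plain-Python check of the two rows. It assumes the score and logits cells are already real Python lists, and that an empty per-word list contributes nothing to the sum; the variable names are only illustrative.

```python
# Row 1 (pred == "positive") and row 2 (pred == "negative") as plain lists.
score_pos = [[0, 0, 1], [1, 0, 2], [1, 0, 0]]
logits_pos = [0.01, 0.02, 0.97]
score_neg = [[], [0, 1, 0], [], []]
logits_neg = [0.81, 0.10, 0.18]

# positive: sum position 0 of every per-word list, then multiply by logits[2]
decision_pos = sum(v[0] for v in score_pos if v) * logits_pos[2]

# negative: sum position 1 of every per-word list (empty lists are skipped),
# then multiply by logits[0]
decision_neg = sum(v[1] for v in score_neg if v) * logits_neg[0]

print(round(decision_pos, 2), round(decision_neg, 2))
```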
This is what I tried (or at least the logic I need to follow). Obviously my code does not run, and I think the problem is in sum(df['score'][0]).
df[df['pred'] == 'positive','decision'] = df[df['pred'] == 'positive', df['logits'][2] * sum(df['score'][0])]
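To clarify:
In score there is one list per word. That is why the first row has 3 lists and the second row has 4. Each list holds the (positive, negative, neutral) scores for that word. If a list is empty, it counts as zero in the calculation.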
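The "empty list counts as zero" rule can be sketched as a small helper. The name position_sum is hypothetical and only used for illustration; it assumes each non-empty list has at least idx + 1 entries.

```python
def position_sum(word_scores, idx):
    """Sum entry `idx` of every per-word list; an empty list counts as 0."""
    return sum(triple[idx] if triple else 0 for triple in word_scores)


# First row, position 0 (positive scores): 0 + 1 + 1
first = position_sum([[0, 0, 1], [1, 0, 2], [1, 0, 0]], 0)
# Second row, position 1 (negative scores): 0 + 1 + 0 + 0
second = position_sum([[], [0, 1, 0], [], []], 1)
# A row where every word has an empty list simply sums to 0
all_empty = position_sum([[], []], 0)

print(first, second, all_empty)
```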
One possible solution is to create mapping dictionaries that encode the rules (e.g. for positive, sum only index 0 of each list, and so on):
m_sum = {"positive": 0, "negative": 1}
m_mul = {"positive": 2, "negative": 0}

df["decision"] = df.apply(
    lambda x: sum(v[m_sum[x["pred"]]] for v in x["score"] if v)
    * x["logits"][m_mul[x["pred"]]],
    axis=1,
)
print(df)
Prints:
text, pred score logits decision
0 No thank you. positive [[0, 0, 1], [1, 0, 2], [1, 0, 0]] [0.01, 0.02, 0.97] 1.94
1 They didn't respond me negative [[], [0, 1, 0], [], []] [0.81, 0.1, 0.18] 0.81
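For comparison, here is a sketch that stays closer to the boolean-mask style attempted in the question. Indexing as df[mask, 'decision'] is not valid pandas; the label-based form is df.loc[mask, 'decision']. The rules dictionary and the loop are an illustrative variant, not a replacement for the apply version above, and the columns are assumed to hold real lists rather than strings.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "pred": ["positive", "negative"],
        "score": [[[0, 0, 1], [1, 0, 2], [1, 0, 0]], [[], [0, 1, 0], [], []]],
        "logits": [[0.01, 0.02, 0.97], [0.81, 0.10, 0.18]],
    }
)

# label -> (position to sum in score, position to take from logits)
rules = {"positive": (0, 2), "negative": (1, 0)}

df["decision"] = float("nan")  # create the column before filling it by mask
for label, (sum_idx, mul_idx) in rules.items():
    mask = df["pred"] == label
    # per-row sum over the non-empty per-word lists
    sums = df.loc[mask, "score"].apply(lambda s: sum(v[sum_idx] for v in s if v))
    # per-row logit at the chosen position
    factors = df.loc[mask, "logits"].apply(lambda l: l[mul_idx])
    df.loc[mask, "decision"] = sums * factors

print(df[["pred", "decision"]])
```

Rows whose label is not in rules keep NaN in decision, which makes unexpected labels easy to spot.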
EDIT: If the score and logits columns contain strings, convert them to lists first with ast.literal_eval:
import pandas as pd
from ast import literal_eval

df = pd.DataFrame(
    {
        "text": ["No thank you", "They didnt respond me negative"],
        "pred": ["positive", "negative"],
        "score": [
            "[[0, 0, 1], [1, 0, 2],[1, 0, 0]]",
            "[[], [0, 1, 0], [], []]",
        ],
        "logits": ["[0.01, 0.02, 0.97]", "[0.81, 0.10, 0.18]"],
    }
)

df["score"] = df["score"].apply(literal_eval)
df["logits"] = df["logits"].apply(literal_eval)

m_sum = {"positive": 0, "negative": 1}
m_mul = {"positive": 2, "negative": 0}

df["decision"] = df.apply(
    lambda x: sum(v[m_sum[x["pred"]]] for v in x["score"] if v)
    * x["logits"][m_mul[x["pred"]]],
    axis=1,
)
print(df)
Prints:
text pred score logits decision
0 No thank you positive [[0, 0, 1], [1, 0, 2], [1, 0, 0]] [0.01, 0.02, 0.97] 1.94
1 They didnt respond me negative negative [[], [0, 1, 0], [], []] [0.81, 0.1, 0.18] 0.81
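As a side note on the conversion step, literal_eval only accepts well-formed Python literals, so a string with an unbalanced bracket raises an exception instead of being parsed. The sketch below shows both cases; safe_parse is a hypothetical helper, and returning None for malformed cells is just one possible choice.

```python
from ast import literal_eval

# A well-formed string becomes a real nested list.
parsed = literal_eval("[[], [0, 1, 0], [], []]")
print(type(parsed).__name__, parsed)


def safe_parse(text):
    """Hypothetical helper: parsed literal, or None if the text is malformed."""
    try:
        return literal_eval(text)
    except (ValueError, SyntaxError):
        return None


# Note the extra closing bracket at the end of this string.
broken = safe_parse("[[0, 0, 1], [1, 0, 2],[1, 0, 0]]]")
print(broken)
```

With a DataFrame, the helper could be passed to apply in place of literal_eval, for example df["score"].apply(safe_parse), so that malformed cells can be inspected before computing decision.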