I have a DataFrame like this:
import pandas as pd

df = pd.DataFrame({'text': ['No thank you', 'They didnt respond me'],
                   'pred': ['positive', 'negative'],
                   'score': ["[[0, 0, 1], [1, 0, 2], [1, 0, 0]]", "[[], [0, 1, 0], [], []]"]
                   })
(The score values are strings, but they can be converted to lists with from ast import literal_eval followed by df["score"] = df["score"].apply(literal_eval).)
It looks like this:
text                    pred      score
No thank you            positive  [[0, 0, 1], [1, 0, 2], [1, 0, 0]]
They didnt respond me   negative  [[], [0, 1, 0], [], []]
score is a 2D list in which the first element of each inner list is the positive value, the second is the negative value, and the third is the neutral value.
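What I want: if pred is positive, count the inner lists in score that are non-empty and have a non-zero value in the positive position. The same logic applies to negative and neutral.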
The result would then look like this:
text                    pred      score                              count
No thank you            positive  [[0, 0, 1], [1, 0, 2], [1, 0, 0]]  2
They didnt respond me   negative  [[], [0, 1, 0], [], []]            1
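This is because in the first row pred is positive and two of the inner lists in score have a non-empty, non-zero value in the first position. The same reasoning applies to the negative row.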
What I have done so far:
m_sum = {"positive": 0, "negative": 1, "neutral": 2}
df["count"] = df.apply(
lambda x: count(v[m_sum[x["pred"]]] for v in x["score"] if v and v!=0),
axis=1)
But count cannot be used this way.
Thanks.
You can use sum() to count the non-zero elements:
# if not converted already, convert the "score" column to list:
# from ast import literal_eval
# df["score"] = df["score"].apply(literal_eval)
m_sum = {"positive": 0, "negative": 1, "neutral": 2}
df["count"] = df.apply(
lambda x: sum(v[m_sum[x["pred"]]] != 0 for v in x["score"] if v),
axis=1,
)
print(df)
Prints:
text pred score count
0 No thank you positive [[0, 0, 1], [1, 0, 2], [1, 0, 0]] 2
1 They didnt respond me negative [[], [0, 1, 0], [], []] 1
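The reason sum() works here is that Python booleans are integers (True equals 1), so summing a generator of comparisons counts how many of them were true. A minimal stand-alone sketch of the same logic without pandas follows; count_nonzero_for_label is a hypothetical helper name used only for illustration, not something from the answer above:

```python
from ast import literal_eval

# Column index of each label inside every inner list (same mapping as above).
m_sum = {"positive": 0, "negative": 1, "neutral": 2}


def count_nonzero_for_label(score, pred):
    """Count inner lists that are non-empty and non-zero at pred's position.

    Hypothetical helper for illustration; `score` may be a list of lists
    or its string representation.
    """
    if isinstance(score, str):
        score = literal_eval(score)
    idx = m_sum[pred]
    # `if v` skips empty lists; each comparison is a bool, and True counts as 1.
    return sum(v[idx] != 0 for v in score if v)


print(count_nonzero_for_label("[[0, 0, 1], [1, 0, 2], [1, 0, 0]]", "positive"))  # 2
print(count_nonzero_for_label("[[], [0, 1, 0], [], []]", "negative"))  # 1
```

With a DataFrame shaped like the one in the question, this could presumably be applied as df.apply(lambda x: count_nonzero_for_label(x["score"], x["pred"]), axis=1).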
Just replace '[]' with '[0, 0, 0]' (while score is still a string), compute the column sums, and pick the right column based on 'pred':
import ast
import numpy as np

get_cnt = lambda x: np.sum(ast.literal_eval(x['score']), axis=0)[m_sum[x['pred']]]
df['count'] = (df.replace({'score': {r'\[\]': '[0, 0, 0]'}}, regex=True)
                 .apply(get_cnt, axis=1))
Output:
>>> df
text pred score count
0 No thank you positive [[0, 0, 1], [1, 0, 2], [1, 0, 0]] 2
1 They didnt respond me negative [[], [0, 1, 0], [], []] 1
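One caveat: np.sum adds the values up rather than counting them, so it matches the requested count only while every entry in the target position is 0 or 1. If larger values can occur, a variant based on np.count_nonzero might be closer to what was asked. The following is a sketch under that assumption; get_nonzero_cnt is a made-up name, and the third row is invented purely to show the difference:

```python
import ast

import numpy as np
import pandas as pd

m_sum = {"positive": 0, "negative": 1, "neutral": 2}

df = pd.DataFrame({
    "pred": ["positive", "negative", "neutral"],
    # The third row is invented to show a value greater than 1.
    "score": ["[[0, 0, 1], [1, 0, 2], [1, 0, 0]]",
              "[[], [0, 1, 0], [], []]",
              "[[0, 0, 3], [], [0, 0, 0]]"],
})


def get_nonzero_cnt(row):
    # Pad empty inner lists so the parsed result is a rectangular 2D array.
    padded = row["score"].replace("[]", "[0, 0, 0]")
    arr = np.array(ast.literal_eval(padded))
    # Count non-zero entries per column, then pick the column for this label.
    return int(np.count_nonzero(arr, axis=0)[m_sum[row["pred"]]])


df["count"] = df.apply(get_nonzero_cnt, axis=1)
print(df["count"].tolist())  # [2, 1, 1]
```

For the third row np.sum would return 3 for the neutral column, whereas the non-zero count is 1.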