How to use count over a 2D list in pandas



I have a DataFrame like this:

import pandas as pd

df = pd.DataFrame({'text': ['No thank you', 'They didnt respond me'],
                   'pred': ['positive', 'negative'],
                   'score': ["[[0, 0, 1], [1, 0, 2], [1, 0, 0]]", "[[], [0, 1, 0], [], []]"]
                   })

(The score values are strings, but they can be converted to lists with from ast import literal_eval and then df["score"] = df["score"].apply(literal_eval).)
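A minimal sketch of that conversion step, assuming the score column holds Python-literal strings exactly as in the constructor above. ast.literal_eval parses each string into nested lists and, unlike eval(), only accepts Python literals:

```python
from ast import literal_eval

import pandas as pd

df = pd.DataFrame({
    "text": ["No thank you", "They didnt respond me"],
    "pred": ["positive", "negative"],
    "score": ["[[0, 0, 1], [1, 0, 2], [1, 0, 0]]", "[[], [0, 1, 0], [], []]"],
})

# Parse each string into a real list of lists
df["score"] = df["score"].apply(literal_eval)

print(type(df.loc[0, "score"]))  # <class 'list'>
print(df.loc[1, "score"])        # [[], [0, 1, 0], [], []]
```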

It looks like this:

                    text      pred                              score
0           No thank you  positive  [[0, 0, 1], [1, 0, 2], [1, 0, 0]]
1  They didnt respond me  negative            [[], [0, 1, 0], [], []]

score is a 2D list: in each inner list, the first element is the positive score, the second is the negative score, and the third is the neutral score.

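What I want: if pred = positive, count the inner lists in score that are non-empty and have a non-zero value in the positive position. The same logic applies to negative and neutral.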

The result would then look like this:

                    text      pred                              score  count
0           No thank you  positive  [[0, 0, 1], [1, 0, 2], [1, 0, 0]]      2
1  They didnt respond me  negative            [[], [0, 1, 0], [], []]      1

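because in the first row pred = positive, and two of the inner lists are non-empty with a non-zero value in the first position. The same reasoning applies to negative in the second row.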
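Before involving pandas, the rule can be sketched in plain Python for a single row. The names COLUMN and count_nonempty_nonzero are hypothetical, chosen for illustration; COLUMN mirrors the m_sum mapping used in the attempt below:

```python
# Position of each label inside an inner score list (same mapping as m_sum)
COLUMN = {"positive": 0, "negative": 1, "neutral": 2}

def count_nonempty_nonzero(score, pred):
    """Count inner lists that are non-empty and non-zero at pred's position."""
    idx = COLUMN[pred]
    # An empty inner list is falsy, so `v and ...` skips it before indexing
    return sum(1 for v in score if v and v[idx] != 0)

print(count_nonempty_nonzero([[0, 0, 1], [1, 0, 2], [1, 0, 0]], "positive"))  # 2
print(count_nonempty_nonzero([[], [0, 1, 0], [], []], "negative"))           # 1
```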

What I have tried so far:

m_sum = {"positive": 0, "negative": 1, "neutral": 2}
df["count"] = df.apply(
lambda x: count(v[m_sum[x["pred"]]] for v in x["score"] if v and v!=0),
axis=1)

but count cannot be used like this (Python has no built-in count() function).

Thanks.

You can use sum() to count the non-zero elements, because each != 0 comparison yields a bool and True sums as 1:

# if not converted already, convert the "score" column to list:
# from ast import literal_eval
# df["score"] = df["score"].apply(literal_eval)
m_sum = {"positive": 0, "negative": 1, "neutral": 2}
df["count"] = df.apply(
lambda x: sum(v[m_sum[x["pred"]]] != 0 for v in x["score"] if v),
axis=1,
)
print(df)

Prints:

                    text      pred                              score  count
0           No thank you  positive  [[0, 0, 1], [1, 0, 2], [1, 0, 0]]      2
1  They didnt respond me  negative            [[], [0, 1, 0], [], []]      1
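A variant of the same sum() idea that avoids DataFrame.apply(axis=1) by zipping the two columns in a list comprehension. This is an alternative sketch, not part of the original answer, and it assumes score has already been converted to lists with literal_eval:

```python
import pandas as pd

m_sum = {"positive": 0, "negative": 1, "neutral": 2}
df = pd.DataFrame({
    "text": ["No thank you", "They didnt respond me"],
    "pred": ["positive", "negative"],
    "score": [[[0, 0, 1], [1, 0, 2], [1, 0, 0]], [[], [0, 1, 0], [], []]],
})

# Each comparison yields a bool; sum() treats True as 1, so this counts matches.
# `if v` skips empty inner lists before they are indexed.
df["count"] = [
    sum(v[m_sum[p]] != 0 for v in score if v)
    for p, score in zip(df["pred"], df["score"])
]
print(df["count"].tolist())  # [2, 1]
```

Iterating over plain Python objects with zip is usually faster than apply(axis=1), which builds a Series for every row.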

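Alternatively, replace every '[]' with '[0, 0, 0]', sum the inner lists column-wise, and pick the column that matches 'pred'. This works on the original string column (before literal_eval is applied):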

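import ast
import numpy as np
get_cnt = lambda x: np.sum(ast.literal_eval(x['score']), axis=0)[m_sum[x['pred']]]
# Escape the brackets: as a regex, '[]' is an unterminated character set
df['count'] = (df.replace({'score': {r'\[\]': '[0, 0, 0]'}}, regex=True)
               .apply(get_cnt, axis=1))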

Output:

>>> df
                    text      pred                              score  count
0           No thank you  positive  [[0, 0, 1], [1, 0, 2], [1, 0, 0]]      2
1  They didnt respond me  negative            [[], [0, 1, 0], [], []]      1
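Note that np.sum(..., axis=0) adds up the scores rather than counting them. It matches the expected output here only because every non-zero value in the positive and negative positions is 1; for the neutral position of the first row it would give 3 (1 + 2) instead of 2. A sketch that keeps the same padding idea but counts with np.count_nonzero, assuming score still holds strings (get_cnt_nonzero is a hypothetical name):

```python
import ast

import numpy as np
import pandas as pd

m_sum = {"positive": 0, "negative": 1, "neutral": 2}
df = pd.DataFrame({
    "text": ["No thank you", "They didnt respond me"],
    "pred": ["positive", "negative"],
    "score": ["[[0, 0, 1], [1, 0, 2], [1, 0, 0]]", "[[], [0, 1, 0], [], []]"],
})

def get_cnt_nonzero(row):
    # Parse the padded string into a 2D array (one row per inner list)
    arr = np.array(ast.literal_eval(row["score"]))
    # Count non-zero entries per column, then keep the column for this row's pred
    return int(np.count_nonzero(arr, axis=0)[m_sum[row["pred"]]])

# Pad empty inner lists; the brackets are escaped so the regex matches a literal '[]'
padded = df.replace({"score": {r"\[\]": "[0, 0, 0]"}}, regex=True)
df["count"] = padded.apply(get_cnt_nonzero, axis=1)
print(df["count"].tolist())  # [2, 1]
```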
