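I ran a sentiment analysis model on my tweet dataset and stored its output in a new column named 'scores'. Each output is a set of 3 probabilities: the first is the probability that the tweet is negative, the second that it is neutral, and the third that it is positive. For example: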
[0.013780469, 0.94494355, 0.041276094]
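[Screenshot: a few rows of the 'scores' column]
Running df.scores.dtype shows that the column's dtype is object.
I want to create three separate columns, 'Negative', 'Neutral', and 'Positive', one for each probability, so I need to split 'scores'. How can I do that?
I have tried: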
df[['Negative', 'Neutral', 'Positive']] = pd.DataFrame(df.scores.tolist(), index=df.index)
But I get this error:
ValueError: Columns must be same length as key
I also tried this:
df[['Negative', 'Neutral', 'Positive']] = pd.DataFrame([ x.split('~') for x in df['scores'].tolist() ])
But I get this error:
AttributeError: 'float' object has no attribute 'split'
When I use str(x).split() instead of x.split(), I get this error:
ValueError: Columns must be same length as key
Here is the output of print(df['scores']):
0 [0.07552529 0.7626313 0.16184345]
1 [0.0552146 0.7753107 0.16947475]
2 [0.06891786 0.6625086 0.26857358]
3 [0.10522033 0.7078265 0.18695314]
4 [0.04945428 0.78878057 0.16176508]
...
4976 [0.0196455 0.9556966 0.02465796]
4977 [0.02270025 0.94873595 0.02856365]
4978 [0.01378047 0.94494355 0.04127609]
4979 [0.05239033 0.9061995 0.04141007]
4980 [0.0651902 0.9061197 0.02869013]
Name: scores, Length: 4981, dtype: object
Here is the output of df.loc[0:5, "scores"].to_dict():
{0: '[0.07552529 0.7626313 0.16184345]',
1: '[0.0552146 0.7753107 0.16947475]',
2: '[0.06891786 0.6625086 0.26857358]',
3: '[0.10522033 0.7078265 0.18695314]',
4: '[0.04945428 0.78878057 0.16176508]',
5: '[0.02224329 0.87228 0.10547666]'}
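The to_dict() output suggests each cell is a string rather than a list, and the AttributeError suggests at least one cell is a float (most likely NaN). A quick way to confirm this is to count the Python type of every cell. The sketch below uses a hypothetical three-row stand-in for the real column; the `scores` series and its values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real column: two strings and one missing value.
scores = pd.Series(
    ["[0.07552529 0.7626313 0.16184345]", "[0.0552146 0.7753107 0.16947475]", np.nan],
    name="scores",
    dtype=object,
)

# Count the Python type of each cell (NaN is a float).
type_counts = scores.map(type).value_counts()
print(type_counts)

# A list of strings produces a single-column DataFrame, which cannot be
# assigned to three column names.
single_col = pd.DataFrame(scores.dropna().tolist())
print(single_col.shape)
```

If the real column shows the same mix of str and float, that would account for both the "Columns must be same length as key" error and the "'float' object has no attribute 'split'" error.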
You can try this approach:
import pandas as pd
# Create some sample data
df = pd.DataFrame(columns=["scores"], data=["[0.013780469, 0.94494355, 0.041276094]",
"[0.013780469, 0.94494355, 0.941276094]",
"[0.513780469, 0.74494355, 0.041276094]",
"[0.813780469, 0.14494355, 0.541276094]"])
# Strip the surrounding brackets, then split on ", " into three columns.
# (A bare "[" is not a valid regular expression, so str.replace("[", "", regex=True) raises an error.)
df[['Negative', 'Neutral', 'Positive']] = df.scores.str.strip("[]").str.split(", ", expand=True)
# Drop the original scores column
df.drop("scores", axis=1, inplace=True)
print(df)
Output:
      Negative     Neutral     Positive
0  0.013780469  0.94494355  0.041276094
1  0.013780469  0.94494355  0.941276094
2  0.513780469  0.74494355  0.041276094
3  0.813780469  0.14494355  0.541276094
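Note that the sample data above is comma-separated, while the to_dict() output in the question shows values separated by spaces, and the AttributeError hints at missing values in the column. A minimal sketch for that case follows, assuming every non-missing cell is a string of three space-separated numbers in brackets; the sample rows and the `labels` and `parts` names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the question's to_dict() output:
# space-separated numbers inside brackets, plus one missing value.
df = pd.DataFrame({"scores": [
    "[0.07552529 0.7626313 0.16184345]",
    "[0.0552146 0.7753107 0.16947475]",
    np.nan,
]})

labels = ["Negative", "Neutral", "Positive"]

# Strip brackets and surrounding spaces, then split on runs of whitespace
# (the default when no separator is given). Missing cells stay missing.
parts = df["scores"].str.strip("[] ").str.split(expand=True)
parts.columns = labels

# Convert the string pieces to floats so they can be used numerically.
df[labels] = parts.astype(float)
print(df[labels])
```

Splitting on whitespace rather than ", " also tolerates the variable spacing NumPy uses when printing arrays. The astype(float) step matters because str.split returns strings, so comparisons or arithmetic on the new columns would otherwise not behave as expected.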