How do I iterate over a DataFrame and get polarity scores for existing text (transcripts) so that I end up with one row per ID in Python?



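I was able to do this with a script that loops over the files in a directory, but I can't apply the same logic when all the transcripts are in a single table/DataFrame. My previous script: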

import os    
from glob import glob
import pandas as pd
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
files = glob('C:/Users/jj/Desktop/Bulk_Wav_Completed_CancelsvsSaves/*.csv')
sid = SentimentIntensityAnalyzer()
# use a dict comprehension to score the joined transcript of each file
data = {os.path.basename(file): sid.polarity_scores(' '.join(pd.read_csv(file, encoding="utf-8")['transcript'])) for file in files}
# create a data frame from the dictionary above
df = pd.DataFrame.from_dict(data, orient='index')
df.to_csv("sentimentcancelvssaves.csv")

How do I apply the above to the table below?

dfo
Out[52]: 
InteractionId             Agent  Transcript  
0      100392327420210105      David Michel  hi how are you          
1      100392327420210105      David Michel  yes i am not fine       
2      100390719220210104    Mindy Campbell  .,xyz..        
3      100390719220210104    Mindy Campbell  no       
4      100390719220210104    Mindy Campbell  maybe    
...               ...  ...       ...      ...
93407  300390890320200915    Sandra Yacklin  ...   
93408  300390890320200915    Sandra Yacklin  ...     
93409  300390890320200915    Sandra Yacklin  ...     

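As you can see, I have an InteractionId column that identifies each interaction. My final dataset should have one row per ID, and I need the polarity scores for the sentiment attached to that ID.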

Expected output for 100390719220210104:

InteractionId             Agent     Transcript       Positive     Compound
2      100390719220210104    Mindy Campbell  xyz no maybe     0.190   0.5457

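How can I do this for all interaction IDs? I was able to do it when I had to apply my script to all the transcript CSVs in a directory and loop over them. But how do I apply it when all the data is in one place rather than in separate DataFrames?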

So instead of iterating over files, iterate over the unique InteractionId values. You can use: for interaction_id in dfo['InteractionId'].unique()

Then you can join the Transcript values that belong to that ID with:
' '.join(dfo[dfo['InteractionId'] == interaction_id]['Transcript'])

Putting it all together, you have:

import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# small sample of the table from the question
dfo = pd.DataFrame(
    data={
        'InteractionId': [
            100392327420210105,
            100390719220210104,
            100390719220210104,
            100390719220210104,
        ],
        'Transcript': ['hi how are you', '.,xyz..', 'no', 'maybe'],
    }
)
sid = SentimentIntensityAnalyzer()
# use a dict comprehension to score the joined transcript of each InteractionId
data = {
    interaction_id: sid.polarity_scores(
        ' '.join(dfo[dfo['InteractionId'] == interaction_id]['Transcript'])
    )
    for interaction_id in dfo['InteractionId'].unique()
}
# create a data frame from the dictionary above (one row per InteractionId)
df = pd.DataFrame.from_dict(data, orient='index')
df.to_csv("sentimentcancelvssaves.csv")
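
The loop above filters the whole table once per ID and keeps only the scores. If you also want the Agent and the joined Transcript in the result, as in the expected output in the question, a groupby is one possible alternative. This is only a sketch: the names score_per_interaction and score_fn are made up for illustration, and it assumes Agent is the same on every row of an interaction, so the first value is taken.

```python
import pandas as pd


def score_per_interaction(df, score_fn):
    """Return one row per InteractionId with Agent, joined Transcript and scores.

    score_fn is any callable that maps a string to a dict of scores,
    for example SentimentIntensityAnalyzer().polarity_scores.
    """
    grouped = (
        df.groupby('InteractionId', sort=False)
        .agg(
            # assumption: Agent is constant within an interaction
            Agent=('Agent', 'first'),
            # join all transcript fragments of the interaction with spaces
            Transcript=('Transcript', lambda s: ' '.join(s.astype(str))),
        )
        .reset_index()
    )
    # one dict of scores per joined transcript, expanded into columns
    scores = pd.DataFrame(
        grouped['Transcript'].map(score_fn).tolist(), index=grouped.index
    )
    return pd.concat([grouped, scores], axis=1)


# With VADER (requires nltk and its 'vader_lexicon' resource):
# from nltk.sentiment.vader import SentimentIntensityAnalyzer
# result = score_per_interaction(dfo, SentimentIntensityAnalyzer().polarity_scores)
# result.to_csv("sentimentcancelvssaves.csv", index=False)
```

Passing the scorer in as an argument keeps the grouping step independent of NLTK, so you can check the one-row-per-ID logic on a small sample before scoring all rows.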
