Python IBM Watson Speech to Text API: converting a transcript to CSV



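I'm using the IBM Watson Speech to Text API in Python and storing the JSON response as a nested dictionary. I can access a single record with pprint(data_response['results'][0]['alternatives'][0]['transcript']), but I can't print all of the transcripts. I need to dump the entire transcript to a CSV file. I tried a generator expression in the same format that was suggested to me in another post, print(a["confidence"] for r in data_response["results"] for a in r["alternatives"]), but I must not understand how generator expressions work.

Here is what the nested dictionary looks like with pretty print: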

{'result_index': 0,
 'results': [{'alternatives': [{'confidence': 0.99, 'transcript': 'hello '}],
              'final': True},
             {'alternatives': [{'confidence': 0.9,
                                'transcript': 'good morning any this is '}],
              'final': True},
             {'alternatives': [{'confidence': 0.59,
                                'transcript': "I'm on a recorded morning "
                                              '%HESITATION today start running '
                                              "yeah it's really good how are "
                                              "you %HESITATION it's one three "
                                              'six thank you so much for '
                                              'asking '}],
              'final': True},
             {'alternatives': [{'confidence': 0.87,
                                'transcript': 'I appreciate this opportunity '
                                              'to get together with you and '
                                              '%HESITATION you know learn more '
                                              'about you your interest in '}],
              'final': True},

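Edit: this is my final solution for converting a list of .pkl files to .csv files. It uses @seachchange's answer, which helped me export only the transcript portion of the nested dictionary. I'm sure there is a more efficient way to convert the files, but it works well for my application.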

import glob
import os
import pickle

import pandas as pd

# set the input path
input_path = "00_dataWatson Responses"
# set the output path
output_path = "00_dataWatson Scripts"
# list all files under the input path (including subdirectories) ending in .pkl
files = glob.glob(os.path.join(input_path, "**", "*.pkl"), recursive=True)
# open each pkl file, convert the list to a dataframe, and export to a csv
for file in files:
    base_name = os.path.basename(file)
    f_name, f_ext = os.path.splitext(base_name)
    # open the path returned by glob; the with-block closes the file for us
    with open(file, 'rb') as pkl_file:
        data_response = pickle.load(pkl_file)
    # collect every transcript string from every result
    transcripts = [a["transcript"] for r in data_response["results"] for a in r["alternatives"]]
    dataframe = pd.DataFrame(transcripts)
    # one transcript per row, without index or header columns
    dataframe.to_csv(os.path.join(output_path, f'{f_name}.csv'), index=False, header=False)
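
transcripts = [a["transcript"] for r in data_response["results"] for a in r["alternatives"]]

gives a list of all the transcripts. From there, it only depends on how you want to format the output file. If you want each transcript on its own line, you can use writelines. Note that writelines does not add line endings itself, so append "\n" to each item.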
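As a minimal sketch of the writelines approach: the sample dictionary below is a shortened, hypothetical copy of the structure shown in the question, and the helper names extract_transcripts, transcript_lines and write_transcripts are made up for illustration. Stripping the trailing space from each transcript is an assumption about the desired output.

```python
# Hypothetical sample mirroring the first two results shown in the question.
data_response = {
    "result_index": 0,
    "results": [
        {"alternatives": [{"confidence": 0.99, "transcript": "hello "}],
         "final": True},
        {"alternatives": [{"confidence": 0.9,
                           "transcript": "good morning any this is "}],
         "final": True},
    ],
}


def extract_transcripts(response):
    """Return every transcript in order, with surrounding whitespace stripped."""
    return [
        alt["transcript"].strip()
        for result in response["results"]
        for alt in result["alternatives"]
    ]


def transcript_lines(response):
    """Return the transcripts as newline-terminated lines.

    writelines() writes each item as-is, so the "\n" is added here.
    """
    return [text + "\n" for text in extract_transcripts(response)]


def write_transcripts(response, path):
    """Write one transcript per line to the file at `path`."""
    with open(path, "w", encoding="utf-8") as out_file:
        out_file.writelines(transcript_lines(response))
```

Calling write_transcripts(data_response, "transcripts.txt") (the file name is only an example) should then produce a text file with one transcript per line.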

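On the print(...) attempt from the question: a generator expression is lazy, so passing it to print shows the generator object rather than its values. The sketch below uses a shortened, hypothetical copy of the response to show this, then writes confidence and transcript as two CSV columns with the standard csv module. The column names and the in-memory buffer are assumptions made for illustration.

```python
import csv
import io

# Hypothetical sample mirroring the first two results shown in the question.
data_response = {
    "result_index": 0,
    "results": [
        {"alternatives": [{"confidence": 0.99, "transcript": "hello "}],
         "final": True},
        {"alternatives": [{"confidence": 0.9,
                           "transcript": "good morning any this is "}],
         "final": True},
    ],
}

# A generator expression produces values only when something iterates over it.
confidences_gen = (
    a["confidence"] for r in data_response["results"] for a in r["alternatives"]
)
print(confidences_gen)  # shows the generator object, not the values

# Consuming the generator (here with list) yields the actual values.
confidences = list(confidences_gen)
print(confidences)  # [0.99, 0.9]

# Build one (confidence, transcript) row per alternative.
rows = [
    (a["confidence"], a["transcript"].strip())
    for r in data_response["results"]
    for a in r["alternatives"]
]

# Write the rows as CSV into an in-memory buffer.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["confidence", "transcript"])  # assumed header names
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file instead of the buffer, the same writer calls should work on a file opened with open(path, "w", newline="", encoding="utf-8").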
