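I'm relatively new to working with txt and json datasets. I have a dialog dataset in a txt file, and I want to convert it into a csv file in which each line becomes a column. When the next dialog (the next paragraph) begins, it should start a new row. So I would get the data in the format
Header = ['Q1', 'A1', 'Q2', 'A2', ...]
Here is the dialog data for reference (the file is in txt format):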
1 hello hello what can i help you with today
2 may i have a table in a moderate price range for two in rome with italian cuisine i'm on it
3 <SILENCE> ok let me look into some options for you
4 <SILENCE> api_call italian rome two moderate
1 hi hello what can i help you with today
2 can you make a restaurant reservation in a expensive price range with british cuisine in rome for eight people i'm on it
3 <SILENCE> ok let me look into some options for you
4 <SILENCE> api_call british rome eight expensive
1 hi hello what can i help you with today
2 may i have a table in london with spanish cuisine i'm on it
3 <SILENCE> how many people would be in your party
4 we will be six which price range are looking for
5 i am looking for a moderate restaurant ok let me look into some options for you
6 <SILENCE> api_call spanish london six moderate
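A CSV file is a list of strings separated by commas, with newlines (\n) separating the rows.
Because of this simple layout, it is often a poor fit for strings that may themselves contain commas, such as dialog.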
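To illustrate the comma problem, here is a minimal sketch using Python's standard csv module. The two dialog turns in `row` are made up for the illustration and are not taken from the dataset; the point is only that a plain join splits a field at its embedded comma, while `csv.writer` quotes it.

```python
import csv
import io

# Hypothetical dialog turns; the second one contains a comma.
row = ["hello", "sure, let me check"]

# Naive join: the embedded comma makes this look like three fields.
naive = ",".join(row)

# csv.writer quotes any field that contains the delimiter.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
quoted = buffer.getvalue()

print(naive)
print(quoted)

# Reading the quoted line back recovers the original two fields.
recovered = next(csv.reader(io.StringIO(quoted)))
print(recovered)
```

The sample data in the question happens to contain no commas, so this only matters if other dialogs in the file do.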
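That said, for your input file you can use a regex to replace each newline with a comma, which effectively meets the "each new line becomes a column, each new paragraph becomes a new row" requirement (assuming the dialogs are separated by a blank line):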
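import re

# Read the whole input file into one string.
with open('input.txt', 'r') as reader:
    text = reader.read()

# Replace the newline ending each line of three or more characters with a comma;
# the extra newline of a blank separator line is kept and ends the row.
text = re.sub(r"(...)\n", r"\1,", text)
print(text)

with open('output.csv', 'w') as writer:
    writer.write(text)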
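As an alternative to the regex, the same "one line per column, one dialog per row" idea can be sketched with the csv module, which also takes care of quoting. The function names `dialogs_to_rows` and `rows_to_csv` are hypothetical, and the sketch assumes a new dialog starts either at a blank line or when the leading turn number restarts at 1; the sample text is abbreviated from the data in the question.

```python
import csv
import io

def dialogs_to_rows(text):
    """Group lines into dialogs: one list of lines per dialog."""
    rows, current = [], []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: a blank line, or a turn number restarting at 1,
        # marks the start of a new dialog.
        starts_new = not line or line.split(" ", 1)[0] == "1"
        if starts_new and current:
            rows.append(current)
            current = []
        if line:
            current.append(line)
    if current:
        rows.append(current)
    return rows

def rows_to_csv(rows):
    """Serialize rows with csv.writer so embedded commas get quoted."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()

# Abbreviated sample based on the question's data.
sample = (
    "1 hello hello what can i help you with today\n"
    "2 <SILENCE> api_call italian rome two moderate\n"
    "\n"
    "1 hi hello what can i help you with today\n"
    "2 <SILENCE> ok let me look into some options for you\n"
)
rows = dialogs_to_rows(sample)
print(rows_to_csv(rows))
```

To write a file instead, the same `csv.writer` call can be pointed at `open('output.csv', 'w', newline='')`. Note that rows will have different lengths when dialogs have different numbers of turns, and that splitting each line into separate Q and A columns would need a delimiter between the two utterances that is not visible in the data as posted.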