Convert a .txt file to .csv, where each line becomes a new column and each paragraph becomes a row



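I am relatively new to working with txt and json datasets. I have a dialog dataset in a txt file, and I want to convert it to a csv file in which each line becomes a column. When the next dialog (the next paragraph) begins, it should start a new row. So I want the data in this format: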

Header = ['Q1' , 'A1' , 'Q2' , 'A2' .......]

Here is the data for reference (this file is in txt format): dialog data

1 hello hello what can i help you with today
2 may i have a table in a moderate price range for two in rome with italian cuisine i'm on it
3 <SILENCE> ok let me look into some options for you
4 <SILENCE> api_call italian rome two moderate
1 hi    hello what can i help you with today
2 can you make a restaurant reservation in a expensive price range with british cuisine in rome for eight people    i'm on it
3 <SILENCE> ok let me look into some options for you
4 <SILENCE> api_call british rome eight expensive
1 hi    hello what can i help you with today
2 may i have a table in london with spanish cuisine i'm on it
3 <SILENCE> how many people would be in your party
4 we will be six    which price range are looking for
5 i am looking for a moderate restaurant    ok let me look into some options for you
6 <SILENCE> api_call spanish london six moderate

A CSV file is a list of strings separated by commas, with newline characters (\n) separating the rows.

Because of this simple layout, it is often a poor fit for strings that may themselves contain commas, such as dialog.
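To illustrate the comma problem, here is a small sketch using Python's standard `csv` module, which quotes any field that contains a comma so that it still reads back as a single column. The sample row is made up for illustration and is not taken from the dataset.

```python
import csv
import io

# Hypothetical sample row: the second field contains a comma.
row = ["hi", "hello, what can i help you with today"]

# csv.writer quotes fields containing the delimiter (QUOTE_MINIMAL is the default).
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
encoded = buffer.getvalue()
print(encoded)

# Reading the line back recovers the two original fields.
decoded = next(csv.reader(io.StringIO(encoded)))
print(decoded)
```

A plain string replacement does no such quoting, so a comma inside an utterance would silently split it into two columns.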

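That said, for your input file you can use a regex to replace each newline that ends a line of text with a comma. This effectively implements the "each new line becomes a column, each new paragraph becomes a new row" requirement: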

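import re

# Read the whole dialog file into one string.
with open('input.txt', 'r') as reader:
    text = reader.read()
# Turn a newline that follows three non-newline characters into a comma.
# The second newline of a blank-line paragraph break does not match and stays.
text = re.sub(r"(...)\n", r"\1,", text)
print(text)
# Write the converted text out as the CSV file.
with open('output.csv', 'w') as writer:
    writer.write(text)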

Working example here.
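The regex approach depends on blank lines separating the dialogs and does not quote commas inside utterances. As an alternative, here is a minimal sketch built on the `csv` module. It rests on assumptions that are not confirmed by the question: each line is taken to be a turn number, a space, the user utterance, a tab, and the reply, and a new dialog is taken to start either after a blank line or when the turn number goes back to 1. The names `dialogs_to_rows` and `write_csv` are hypothetical helpers introduced for this example.

```python
import csv
import io


def dialogs_to_rows(lines):
    """Group numbered dialog lines into one flat [Q1, A1, Q2, A2, ...] row per dialog."""
    rows, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line ends the current dialog.
            if current:
                rows.append(current)
                current = []
            continue
        number, _, rest = line.partition(" ")
        if number == "1" and current:
            # Turn numbering restarted, so a new dialog begins.
            rows.append(current)
            current = []
        # Assumption: utterance and reply are separated by a tab.
        question, _, answer = rest.partition("\t")
        current.extend([question.strip(), answer.strip()])
    if current:
        rows.append(current)
    return rows


def write_csv(rows, handle):
    """Write a Q1, A1, Q2, A2, ... header, then the rows padded to equal width."""
    width = max((len(row) for row in rows), default=0)
    header = []
    for i in range(1, width // 2 + 1):
        header.extend([f"Q{i}", f"A{i}"])
    writer = csv.writer(handle)
    writer.writerow(header)
    for row in rows:
        writer.writerow(row + [""] * (width - len(row)))


# Small in-memory sample shaped like the data in the question (tabs assumed).
sample = (
    "1 hi\thello what can i help you with today\n"
    "2 <SILENCE>\tapi_call british rome eight expensive\n"
    "1 hi\thello what can i help you with today\n"
    "2 we will be six\twhich price range are looking for\n"
    "3 <SILENCE>\tapi_call spanish london six moderate\n"
)
rows = dialogs_to_rows(sample.splitlines())
out = io.StringIO()
write_csv(rows, out)
print(out.getvalue())
```

For a real file, the same functions could be fed `open('input.txt')` and a handle from `open('output.csv', 'w', newline='')`. Shorter dialogs are padded with empty cells so that every row has the same number of columns as the header.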
