I'm trying to join reviews that span several paragraphs into a single paragraph. This is what I have so far:
for x in docs:
    with open(fp) as data_file:
        data_item = json.load(data_file)
    b = data_item['reviews']
    for item in b:
        name = '000' + str(counter) + '.txt'
        file = open(name, 'wb')
        output = item['text']
        " ".join(output.split())
        counter = counter+1
        file.write(output.encode('utf-8'))
        file.close()
But it isn't working: every output .txt file is written as-is (with the (... in the JSON field
Sample JSON:
{ "reviews": [ { "created": "2008-07-09T00:00:00", "text": "There's something reassuring etc. \n\nThe band's skill etc. \n\nCraig Finn's vocals etc.", "votes_negative": 0, "votes_positive": 0 } ] }
Resulting output (.txt):
There's something reassuring etc.
The band's skill etc.
Craig Finn's vocals etc.
Thanks a lot in advance.
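You aren't assigning the result of join to a variable. Strings are immutable, so join returns a new string and leaves output unchanged. Try this: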
# Side note: use enumerate to replace the manual counter
for counter, item in enumerate(b):
    name = '000' + str(counter) + '.txt'
    output = item['text']
    # Keep the joined string; split() with no arguments splits on any whitespace run
    output = ' '.join(output.split())
    # imho, with is always nicer than open/close
    with open(name, 'wb') as file:
        file.write(output.encode('utf-8'))
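To see the difference in isolation, here is a minimal sketch. The helper name collapse_whitespace and the sample string are made up for illustration, and the sample assumes the review text separates paragraphs with "\n\n".

```python
def collapse_whitespace(text):
    """Return text with every run of whitespace replaced by a single space."""
    # split() with no arguments splits on spaces, tabs and newlines alike
    return " ".join(text.split())


# Hypothetical sample, modelled on the review text in the question
sample = "There's something reassuring etc. \n\nThe band's skill etc. \n\nCraig Finn's vocals etc."

# Calling join without keeping the result has no effect on `sample`,
# because Python strings are immutable.
" ".join(sample.split())
unchanged = sample

# Assigning the result is what actually gives the flattened text.
flattened = collapse_whitespace(sample)
print(flattened)
```

Because split() discards every whitespace run, this approach also removes tabs and doubled spaces, which may or may not be what you want for your reviews.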
If I read your question correctly, you want everything on a single line. You can do that with:
...
output = item['text'].replace('\n',' ')
...
Output:
There's something reassuring etc. The band's skill etc. Craig Finn's vocals etc.
Or, if you want a single line break between each line:
...
output = item['text'].replace('\n\n','\n')
...
Output:
There's something reassuring etc.
The band's skill etc.
Craig Finn's vocals etc.
# One extra blank line here
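One caveat worth checking against your own data: replace only swaps the newline characters, so any spaces sitting next to them stay in place. The sketch below assumes each paragraph ends with a trailing space before "\n\n", as the sample JSON seems to; the variable names are illustrative only.

```python
# Hypothetical sample: a trailing space precedes each pair of newlines
sample = "first. \n\nsecond. \n\nthird."

# Swapping each newline for a space keeps the existing trailing space,
# so several spaces end up between sentences.
one_line = sample.replace("\n", " ")

# Collapsing double newlines keeps the trailing space before each newline.
one_per_line = sample.replace("\n\n", "\n")

# Stripping each non-empty line removes those leftovers.
tidy_lines = "\n".join(
    line.strip() for line in sample.splitlines() if line.strip()
)

print(repr(one_line))
print(repr(one_per_line))
print(repr(tidy_lines))
```

If the stray spaces matter for your output files, the split/join version or the per-line strip is probably the safer choice; if they don't, plain replace is fine.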