Python: read x lines, process them, then read another y lines



I have a 2-column TSV file that looks like this (the real one is much longer):

n       txt
0001    space delim string 1
0001    space delim string 2
0001    space delim string 3
0001    space delim string 4
0001    space delim string 5
0002    space delim string 6
0002    space delim string 7
0003    space delim string 8
0003    space delim string 9
0003    space delim string 10
0003    space delim string 11

Try defaultdict from the collections module.

from collections import defaultdict

new_data = defaultdict(list)
with open('readme.txt') as f:
    heading = f.readline()  # skip the header row
    lines = [line.strip().split("\t") for line in f]

# Collect every second-column value under its first-column key
for i in lines:
    new_data[i[0]].append(i[1])

for i, j in new_data.items():
    print(i, ','.join(j))

This gives you the following output:

0001 space delim string 1,space delim string 2,space delim string 3,space delim string 4,space delim string 5
0002 space delim string 6,space delim string 7
0003 space delim string 8,space delim string 9,space delim string 10,space delim string 11

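This should do it. It is basically a "report writer" pattern with one level of grouping: keep collecting values while the first column stays the same, and print the group whenever it changes.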

col1 = ''
columns = []
with open('x.txt', 'r') as f:
    for line in f:
        parts = line.strip().split('\t')
        if parts[0] != col1:
            # Key changed: print the previous group before starting a new one
            if col1:
                print(col1+'\t'+(', '.join(columns)))
            col1 = parts[0]
            columns = []
        columns.append( parts[1] )
# Print the last group, which the loop never flushes
if col1:
    print(col1+'\t'+(', '.join(columns)))

Input:

0001    space delim string 1
0001    space delim string 2
0001    space delim string 3
0001    space delim string 4
0001    space delim string 5
0002    space delim string 6
0002    space delim string 7
0002    space delim string 8
0002    space delim string 9
0003    space delim string 10
0003    space delim string 11
0003    space delim string 12
0003    space delim string 13

This produces:

0001    space delim string 1, space delim string 2, space delim string 3, space delim string 4, space delim string 5
0002    space delim string 6, space delim string 7, space delim string 8, space delim string 9
0003    space delim string 10, space delim string 11, space delim string 12, space delim string 13
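
The same one-level grouping could also be sketched with itertools.groupby from the standard library, which is not what either answer above uses. This is only a sketch under some assumptions: every non-empty line contains a tab, rows with the same first column are adjacent, and there is no header row. The function name grouped_rows and the in-memory sample are made up for illustration.

```python
import io
from itertools import groupby


def grouped_rows(lines, sep="\t"):
    """Yield (key, [values]) for each run of consecutive lines sharing a first column."""
    # Split each non-empty line into [key, value] on the first separator only.
    rows = (line.rstrip("\n").split(sep, 1) for line in lines if line.strip())
    # groupby only merges adjacent rows, so the input must already be ordered by key.
    for key, group in groupby(rows, key=lambda row: row[0]):
        yield key, [row[1] for row in group]


# Hypothetical in-memory sample standing in for the real file.
sample = io.StringIO(
    "0001\tspace delim string 1\n"
    "0001\tspace delim string 2\n"
    "0002\tspace delim string 6\n"
)
for key, values in grouped_rows(sample):
    print(key + "\t" + ", ".join(values))
```

Because the generator reads one group at a time, it never holds more than the current group in memory, which may matter for a much longer file. An open file object can be passed in place of the sample.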
