Reversing part of a .csv file with Python



I have a csv file that looks like this:

index label observation groundTruth
0     1     10.00       0
1     3      5.50       0
2     1     18.90       1
---------------------------
3     1     12.00       1
4     3     23.68       0
5     1     21.45       0
6     3      6.57       1
7     1     10.00       1

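The data represents time-series observations in which every chain is supposed to have a fixed length of 5. Because not every observation chain is 5 rows long by default, I added padding to lengthen the short ones artificially. The code further down produces this file: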

index label observation groundTruth
0     1     10.00       0
1     3      5.50       0
2     1     18.90       1
3     0        0        0
4     0        0        0
--------------------------
5     1     12.00       1
6     3     23.68       0
7     1     21.45       0
8     3      6.57       1
9     1     10.00       1

Here is the code:

line = [0,0,0]
with open(input_file, 'r') as inp, open(output_file, 'a') as out:
    writer = csv.writer(out)
    reader = csv.reader(inp)
    counter = 0

    for row in reader:
        counter += 1
        if(row[0]=='s' and counter<6):
            while(counter<6):
                writer.writerow(line)
                counter+=1
            counter=0
        else:
            writer.writerow(row)

My problem is that the padding needs to go at the start of each sequence, not at the end.

What I need is a file like this:

index label observation groundTruth
0     0        0        0
1     0        0        0
2     1     10.00       0
3     3      5.50       0
4     1     18.90       1
--------------------------
5     1     12.00       1
6     3     23.68       0
7     1     21.45       0
8     3      6.57       1
9     1     10.00       1

I tried simply reversing the output csv file, like this:

with open('data/test.csv', 'r') as inp, open('data/test_reverse.csv', 'a') as out:
    writer = csv.writer(out)
    reader = csv.reader(inp)

    for row in reversed(list(reader)):
        writer.writerow(row)

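but this reverses the entire time series, not just the position of the padding, so I again end up with nonsensical data that I don't want: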

index label observation groundTruth
0     0        0        0
1     0        0        0
2     1     18.90       1
3     3      5.50       0
4     1     10.00       0
--------------------------
5     1     10.00       1
6     3      6.57       1
7     1     21.45       0
8     3     23.68       0
9     1     12.00       1

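Do you know how to do this?

Note: the --- line is not part of my .csv; it is only there to make the question clearer.

Note 2: padding rows can be detected reliably, because label 0 never occurs naturally in the data (in case that helps solve the problem).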

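If all observation chains have length 5 after padding, you can use the following example, which moves all rows with label == "0" to the front of each chain: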

import csv
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

with open("data.csv", "r") as f_in, open("out.csv", "w") as f_out:
    reader = csv.reader(f_in)
    writer = csv.writer(f_out)
    # write headers to output
    writer.writerow(next(reader))
    for rows in grouper(reader, 5):
        # save index column
        index_column, *_ = zip(*rows)
        # move rows with label=="0" to front:
        rows = sorted(rows, key=lambda k: k[1] != "0")
        # correct index column
        for i, r in zip(index_column, rows):
            r[0] = i
        # write to csv file
        writer.writerows(rows)

out.csv:

index,label,observation,groundTruth
0,0,0,0
1,0,0,0
2,1,10.00,0
3,3,5.50,0
4,1,18.90,1
5,1,12.00,1
6,3,23.68,0
7,1,21.45,0
8,3,6.57,1
9,1,10.00,1
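
The reordering step above depends on two properties of Python's `sorted`: `False` sorts before `True`, and the sort is stable, so rows with equal keys keep their original relative order. A minimal sketch on a single in-memory chain (the `rows` literal is illustrative data copied from the question, not read from a file):

```python
# One padded chain, as csv.reader would yield it (all fields are strings).
rows = [
    ["0", "1", "10.00", "0"],
    ["1", "3", "5.50", "0"],
    ["2", "1", "18.90", "1"],
    ["3", "0", "0", "0"],
    ["4", "0", "0", "0"],
]

# The key is False for padding rows (label "0") and True for data rows.
# False sorts before True, and the sort is stable, so the data rows
# keep their original time order behind the padding rows.
reordered = sorted(rows, key=lambda r: r[1] != "0")

labels = [r[1] for r in reordered]
print(labels)  # ['0', '0', '1', '3', '1']
```

The index column still carries the old values after sorting, which is why the answer saves it beforehand and writes it back.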

How can the original program be modified so that it writes the padding correctly?

(I'm using Python 3.10)

import csv
from typing import Any

Rows = list[list[Any]]


def pad_rows(rows: Rows) -> Rows:
    max_rows = 6
    n_rows = len(rows)
    if n_rows >= max_rows:
        return rows
    pad_n = max_rows - n_rows
    pad = [[0, 0, 0]] * pad_n
    return rows + pad


with (
    open("input.csv", newline="") as f_in,  # the csv module docs recommend newline=""
    open("output.csv", "w", newline="") as f_out,  # I changed "a" to "w" for my dev/testing
):
    reader = csv.reader(f_in)
    writer = csv.writer(f_out)
    writer.writerow(next(reader))  # header
    series: Rows = []
    for row in reader:
        if row[0] == "s" and series != []:
            writer.writerows(pad_rows(series))
            series = []
            continue
        series.append(row)
    # Write final series if "s" (break) wasn't the last non-empty row
    if series != []:
        writer.writerows(pad_rows(series))

As written, this reproduces the original, unwanted output:

| label | observation | groundTruth |
|-------|-------------|-------------|
| 1     | 10.00       | 0           |
| 3     | 5.50        | 0           |
| 1     | 18.90       | 1           |
| 0     | 0           | 0           |
| 0     | 0           | 0           |
| 1     | 12.00       | 1           |
| 3     | 23.68       | 0           |
| 1     | 21.45       | 0           |
| 3     | 6.57        | 1           |
| 1     | 10.00       | 1           |

I'm sure you can find the one-line change that makes it work the way you want. (Hint: it's in the pad_rows function.)
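
One way to read that hint, sketched under assumptions: build the padding first and concatenate the data after it, instead of the other way round. The name `pad_rows_front` and the `max_rows` and `width` parameters are hypothetical, introduced only for this illustration. The default of 5 comes from the chain length stated in the question, whereas the answer's own code uses 6.

```python
from typing import Any, List

Rows = List[List[Any]]


def pad_rows_front(rows: Rows, max_rows: int = 5, width: int = 3) -> Rows:
    """Return rows preceded by enough zero-filled rows to reach max_rows."""
    # Assumption: max_rows=5 and width=3 mirror the question's data layout.
    pad_n = max(0, max_rows - len(rows))
    # Build independent padding rows rather than repeating one shared list.
    pad = [[0] * width for _ in range(pad_n)]
    # Padding goes first, then the data rows in their original order.
    return pad + rows


series = [["1", "10.00", "0"], ["3", "5.50", "0"], ["1", "18.90", "1"]]
padded = pad_rows_front(series)
```

A chain that already has `max_rows` rows or more comes back unchanged, which matches how the original `pad_rows` treats full-length chains.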
