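I have a large CSV file that I am reading in chunks. Partway through the process I ran out of memory, so I want to restart from where it left off. I know which chunk that was, but I don't know how to jump straight to it.
Here is what I have tried.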
import os
import pandas as pd

# data is the path to the txt file
reader = pd.read_csv(data,
                     delimiter="\t",
                     chunksize=1000
                     )
# When my last run broke, i (the chunk index) was 154, so I think it should
# resume from about line 154000. This time I don't plan to read the
# whole file at once, so I have an end point at line 160000.
first = 154*1000
last = 160*1000
output_path = 'usa_hotspot_data_' + str(first) + '_' + str(last) + '.csv'
print("Output file: ", output_path)
try:
    os.remove(output_path)
except OSError:
    pass
# Read chunks and save to a new csv
for i, chunk in enumerate(reader):
    # i counts chunks of 1000 rows, not lines, so compare in chunk units
    if first // 1000 <= i < last // 1000:
        pass  # < -- here I do something -- >
    # Progress bar to keep track
    if i % 1000 == 0:
        print("#", end='')
However, it takes a long time to reach the i-th chunk I want. How can I skip reading the earlier chunks and go straight there?
pandas.read_csv
skiprows: line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file.
You can pass skiprows to read_csv, and it will act as an offset.
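To make the offset idea concrete, here is a minimal sketch. It assumes the file has a single header line and is tab-separated, as in the question. The helper name `iter_rows_from` and the sample data are made up for illustration. Note that `skiprows=154000` (a plain int) would also skip the header line, so the sketch passes `range(1, first_row + 1)` instead, which keeps line 0 as the header. pandas still has to scan past the skipped lines, but it does not build DataFrames for them, so this should generally be faster than iterating over every chunk.

```python
import io
import pandas as pd


def iter_rows_from(source, first_row, last_row, chunksize=1000, sep="\t"):
    """Yield chunks covering data rows first_row .. last_row - 1 (0-based).

    Hypothetical helper for illustration; it is not part of pandas.
    """
    reader = pd.read_csv(
        source,
        sep=sep,
        # Skip data lines 1..first_row; line 0 (the header) is kept.
        skiprows=range(1, first_row + 1),
        chunksize=chunksize,
    )
    remaining = last_row - first_row
    for chunk in reader:
        if remaining <= 0:
            break
        # Trim the final chunk so we stop exactly at last_row.
        yield chunk.iloc[:remaining]
        remaining -= len(chunk)


# Small in-memory stand-in for the real file: a header plus 10 data rows.
sample = "id\tvalue\n" + "".join(f"{i}\t{i * 10}\n" for i in range(10))

ids = []
for chunk in iter_rows_from(io.StringIO(sample), first_row=4, last_row=9, chunksize=2):
    ids.extend(int(x) for x in chunk["id"])
print(ids)  # prints [4, 5, 6, 7, 8]
```

For the file in the question, the call would look something like `iter_rows_from(data, 154000, 160000)`, with each yielded chunk processed and appended to the output CSV.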