Rebuilding a .csv and filling its gaps in a loop using for line in and .append



Please help me. I don't understand why it behaves this way.

Problem to solve: the csv file must contain 1-second data, like

time,open,high,low,close,Extremum,Fib 1,Fib 2,Fib 3,l100
2022-04-03 02:00:00,3.294,3.294,3.294,3.294,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:04,3.294,3.295,3.292,3.292,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:05,3.293,3.293,3.292,3.292,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:07,3.293,3.293,3.293,3.293,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:08,3.293,3.293,3.293,3.293,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:09,3.292,3.292,3.292,3.292,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634

but it doesn't. Some seconds are missing, so I take the last row seen, which is basically one line:

sep = ','    
data = line.split(sep)

and keep data[1] through data[9] unchanged, with only data[0] advancing by 1 second at a time to fill the gap:

time,open,high,low,close,Extremum,Fib 1,Fib 2,Fib 3,l100
2022-04-03 02:00:00,3.294,3.294,3.294,3.294,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:01,3.294,3.294,3.294,3.294,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:02,3.294,3.294,3.294,3.294,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:03,3.294,3.294,3.294,3.294,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:04,3.294,3.295,3.292,3.292,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:05,3.293,3.293,3.292,3.292,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:06,3.293,3.293,3.292,3.292,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:07,3.293,3.293,3.293,3.293,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:08,3.293,3.293,3.293,3.293,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:09,3.292,3.292,3.292,3.292,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634

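I implemented this logic, but .append is playing tricks on me. The output CSV file gets one row for each row the loop produces, but all records are identical: every one is the last row read from the source file, e.g.: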

time,open,high,low,close,Extremum,Fib 1,Fib 2,Fib 3,l100
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634

Whyyyyyy? Here is the code:

import glob
import datetime
import time
import pandas as pd
# make files sync on sec's
filenames = [i for i in glob.glob("*unique_sorted.csv")]
for filename in filenames:
    coin_name = filename[0:18]
    print(filename)
    with open (filename, "r") as f:
        x = -1
        memory = {}
        memory["data"] = {}
        memory["data"]["time"] = {}

        new_file = []
        tolist = {}
        tolist["total"] = {}
        tolist["memory"] = {}
        sep = ','
        start = 1
        for line in f:
            data = line.split(sep)
            if data[0] != "time":
                x = x + 1
                if start == 0 and data[0][0:19] != sec_1_more:
                    memory_time = datetime.datetime.strptime(tolist["memory"]['time'], "%Y-%m-%d %H:%M:%S") # data[0] from previous line
                    read_line_time = datetime.datetime.strptime(data[0][0:19], "%Y-%m-%d %H:%M:%S") # current_line
                    diff = read_line_time - sec_1_more
                    diff_sec = diff.total_seconds()
                    sec = int(diff_sec)
                    i = 1
                    while i < sec:
                        time_for_same_data = memory_time + datetime.timedelta(seconds=1) # 2:00:00 + 1 second
                        time_for_same_data_str = str(time_for_same_data)
                        #2022-04-03 02:00:04,1.4073,1.4073,1.4071,1.4072,1.375,1.4137077251573131,1.4242302135495926,1.4304935994973778,1.437633859477853
                        tolist["total"]['time'] = time_for_same_data_str
                        tolist["total"]['open'] = data[1]
                        tolist["total"]['high'] = data[2]
                        tolist["total"]['low'] = data[3]
                        tolist["total"]['close'] = data[4]
                        tolist["total"]['Extremum'] = data[5]
                        tolist["total"]['Fib 1'] = data[6]
                        tolist["total"]['Fib 2'] = data[7]
                        tolist["total"]['Fib 3'] = data[8]
                        tolist["total"]['l100'] = data[9].strip()
                        new_file.append(tolist["total"])
                        memory_time = memory_time + datetime.timedelta(seconds=1)
                        i = i + 1
                    tolist["total"]['time'] = data[0]
                    tolist["total"]['open'] = data[1]
                    tolist["total"]['high'] = data[2]
                    tolist["total"]['low'] = data[3]
                    tolist["total"]['close'] = data[4]
                    tolist["total"]['Extremum'] = data[5]
                    tolist["total"]['Fib 1'] = data[6]
                    tolist["total"]['Fib 2'] = data[7]
                    tolist["total"]['Fib 3'] = data[8]
                    tolist["total"]['l100'] = data[9].strip()
                    new_file.append(tolist["total"])

                elif start == 0 and data[0][0:19] == sec_1_more:
                    tolist["total"]['time'] = data[0]
                    tolist["total"]['open'] = data[1]
                    tolist["total"]['high'] = data[2]
                    tolist["total"]['low'] = data[3]
                    tolist["total"]['close'] = data[4]
                    tolist["total"]['Extremum'] = data[5]
                    tolist["total"]['Fib 1'] = data[6]
                    tolist["total"]['Fib 2'] = data[7]
                    tolist["total"]['Fib 3'] = data[8]
                    tolist["total"]['l100'] = data[9].strip()
                    new_file.append(tolist["total"])

                memory["data"]["data"] = str(line)
                memory["data"]["time"] = str(data[0][0:19])

                tolist["memory"]['time'] = data[0]
                tolist["memory"]['open'] = data[1]
                tolist["memory"]['high'] = data[2]
                tolist["memory"]['low'] = data[3]
                tolist["memory"]['close'] = data[4]
                tolist["memory"]['Extremum'] = data[5]
                tolist["memory"]['Fib 1'] = data[6]
                tolist["memory"]['Fib 2'] = data[7]
                tolist["memory"]['Fib 3'] = data[8]
                tolist["memory"]['l100'] = data[9].strip()

                #t = "2022-04-03 02:00:04"
                t = datetime.datetime.strptime(memory["data"]["time"], "%Y-%m-%d %H:%M:%S")
                #sec_1_more = (t + datetime.timedelta(seconds=1)).strftime("%Y-%m-%d %H:%M:%S")
                #or
                sec_1_more = t + datetime.timedelta(seconds=1)

                if start == 1:
                    new_file.append(tolist["memory"])

                start = 0

                if x == 10:
                    #print(new_file)
                    #quit()
                    break # for test to see only 10 first
        f.close() # needed
    csvData = pd.DataFrame(new_file)
    csvData.to_csv(coin_name+"_unique_sorted_synced.csv", mode="w", index=False)
    quit() # coz just one file processing for testing
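
As for why every row comes out identical: new_file.append(tolist["total"]) appends a reference to the same dict on every pass, and the assignments that follow overwrite that one dict in place, so each list element ends up showing whatever was written last. The sketch below first reproduces the effect in isolation and then shows one way to write the fill rule so that every row is its own dict. It is a minimal sketch under stated assumptions, not the original script: fill_gaps, FMT, ONE_SEC, sample and the shortened five-column sample data are names and data made up for illustration.

```python
import csv
import datetime
import io

# 1) The aliasing effect in isolation: one dict, appended three times.
shared = {}
rows = []
for value in (1, 2, 3):
    shared["x"] = value
    rows.append(shared)  # appends a reference, not a snapshot
# rows now holds three references to one dict, all showing x == 3

# 2) A gap-filling loop that builds a separate dict for every output row.
FMT = "%Y-%m-%d %H:%M:%S"          # hypothetical name for the time format
ONE_SEC = datetime.timedelta(seconds=1)

def fill_gaps(lines):
    """Yield one dict per second, repeating the previous row across gaps."""
    prev_row = None
    prev_time = None
    for row in csv.DictReader(lines):  # DictReader builds a new dict per line
        now = datetime.datetime.strptime(row["time"][:19], FMT)
        if prev_row is not None:
            t = prev_time + ONE_SEC
            while t < now:
                filler = dict(prev_row)  # copy, so later changes cannot leak in
                filler["time"] = t.strftime(FMT)
                yield filler
                t += ONE_SEC
        yield row
        prev_row, prev_time = row, now

# Shortened, illustrative sample (only five of the ten columns).
sample = """time,open,high,low,close
2022-04-03 02:00:00,3.294,3.294,3.294,3.294
2022-04-03 02:00:04,3.294,3.295,3.292,3.292
2022-04-03 02:00:05,3.293,3.293,3.292,3.292
2022-04-03 02:00:07,3.293,3.293,3.293,3.293
"""
filled = list(fill_gaps(io.StringIO(sample)))
```

In the original loop, the smallest change that removes the identical-rows symptom would be to append dict(tolist["total"]) instead of tolist["total"]. Note also that data[0][0:19] is a string while sec_1_more is a datetime, so the != test in the first branch is always true and the elif branch never runs.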

It looks like you can read the data directly and use asfreq:


# instead of read_clipboard, you'd read it with pd.read_csv
df = pd.read_clipboard(sep=',', parse_dates = ['time']) 
df.set_index('time').asfreq(freq='1S').ffill()
                      open   high    low  close  Extremum     Fib 1     Fib 2     Fib 3     l100
time
2022-04-03 02:00:00  3.294  3.294  3.294  3.294     3.277  3.332898  3.348094  3.357139  3.36745
2022-04-03 02:00:01  3.294  3.294  3.294  3.294     3.277  3.332898  3.348094  3.357139  3.36745
2022-04-03 02:00:02  3.294  3.294  3.294  3.294     3.277  3.332898  3.348094  3.357139  3.36745
2022-04-03 02:00:03  3.294  3.294  3.294  3.294     3.277  3.332898  3.348094  3.357139  3.36745
2022-04-03 02:00:04  3.294  3.295  3.292  3.292     3.277  3.332898  3.348094  3.357139  3.36745
2022-04-03 02:00:05  3.293  3.293  3.292  3.292     3.277  3.332898  3.348094  3.357139  3.36745
2022-04-03 02:00:06  3.293  3.293  3.292  3.292     3.277  3.332898  3.348094  3.357139  3.36745
2022-04-03 02:00:07  3.293  3.293  3.293  3.293     3.277  3.332898  3.348094  3.357139  3.36745
2022-04-03 02:00:08  3.293  3.293  3.293  3.293     3.277  3.332898  3.348094  3.357139  3.36745
2022-04-03 02:00:09  3.292  3.292  3.292  3.292     3.277  3.332898  3.348094  3.357139  3.36745
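
For a file on disk, the same idea could look roughly like the following. This is a sketch, assuming the time column is unique and sorted (asfreq raises an error on duplicate index labels); the in-memory sample, the variable names and the shortened column set are illustrative, and io.StringIO stands in for a real file path. The lowercase '1s' alias is used here because recent pandas versions deprecate the uppercase 'S' spelling.

```python
import io

import pandas as pd

# Illustrative stand-in for one "*unique_sorted.csv" file (five columns only).
sample = io.StringIO(
    "time,open,high,low,close\n"
    "2022-04-03 02:00:00,3.294,3.294,3.294,3.294\n"
    "2022-04-03 02:00:04,3.294,3.295,3.292,3.292\n"
    "2022-04-03 02:00:05,3.293,3.293,3.292,3.292\n"
    "2022-04-03 02:00:07,3.293,3.293,3.293,3.293\n"
)

# Parse the time column as datetimes so it can serve as a DatetimeIndex.
df = pd.read_csv(sample, parse_dates=["time"])

# asfreq inserts the missing seconds as NaN rows; ffill copies the last
# seen values into them.
synced = df.set_index("time").asfreq("1s").ffill()

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = synced.to_csv()
```

Passing a path such as coin_name + "_unique_sorted_synced.csv" to to_csv would write the result to disk instead, with the time index written as the first column.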
