I have a Python script that pulls down a huge chunk of JSON data and then iterates over it to build two lists:
# Get all price data
response = c.get_price_history_every_minute(symbol)

# Build prices list
prices = list()
for i in range(len(response.json()["candles"])):
    prices.append(response.json()["candles"][i]["prices"])

# Build times list
times = list()
for i in range(len(response.json()["candles"])):
    times.append(response.json()["candles"][i]["datetime"])
This works fine, but it takes a long time to pull in all the data and build the lists. I'm doing some testing while building a more complex script, and I'd like to save these two lists to two files, then import the data from those files and recreate the lists when running subsequent tests, skipping the fetch, iteration, and JSON parsing.
I've been trying the following:
# Write Price to a File
a_file = open("prices7.txt", "w")
content = str(prices)
a_file.write(content)
a_file.close()
Then in a later script:
# Load Prices from File
prices_test = array('d')
a_file = open("prices7.txt", "r")
prices_test = a_file.read()
The output from my JSON-built list and the data loaded from the file look the same, but when I try to do anything with the data loaded from the file it's garbage…
print(prices)
# The output looks like this: [69.73, 69.72, 69.64, ... 69.85, 69.82, etc]
print(prices_test)
# The output looks identical
If I run a simple query such as:
print(prices[1], prices[2])
# I get the expected output: (69.73, 69.72)
If I do the same with the list created from the file:
print(prices_test[1], prices_test[2])
# I get the output: ( [,6 )
It's pulling each character of the string individually instead of treating the comma-separated values the way I expected…
I've googled every combination of search terms I can think of, so any help would be greatly appreciated!!
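The "garbage" above can be reproduced in a few lines with a small sample list: `str(prices)` flattens the list into one long string, and `file.read()` also returns a string, so indexing it returns single characters rather than floats. A minimal sketch:

```python
# str() turns the list into one long string; reading a file back
# yields a string too, so indexing gives single characters:
prices = [69.73, 69.72, 69.64]
content = str(prices)            # "[69.73, 69.72, 69.64]"
print(content[1], content[2])    # prints: 6 9
print(type(content[1]))          # <class 'str'> — not a float
```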
I've had to do something like this before. I used pickle for it.
import pickle

def pickle_the_data(pickle_name, list_to_pickle):
    """This function pickles a given list.

    Args:
        pickle_name (str): name of the resulting pickle.
        list_to_pickle (list): list that you need to pickle
    """
    with open(pickle_name + '.pickle', 'wb') as pikd:
        pickle.dump(list_to_pickle, pikd)
    file_name = pickle_name + '.pickle'
    print(f'{file_name}: Created.')

def unpickle_the_data(pickle_file_name):
    """This will unpickle a pickled file.

    Args:
        pickle_file_name (str): file name of the pickle

    Returns:
        list: when we pass a pickled list, it will return an
        unpickled list.
    """
    with open(pickle_file_name, 'rb') as pk_file:
        unpickleddata = pickle.load(pk_file)
    return unpickleddata
So first pickle your list: pickle_the_data(name_for_pickle, your_list)
Then when you need the list back: unpickle_the_data(name_of_your_pickle_file)
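For completeness, the same round trip condensed into one self-contained snippet (a minimal sketch; the file path is illustrative):

```python
import os
import pickle
import tempfile

prices = [69.73, 69.72, 69.64]
path = os.path.join(tempfile.gettempdir(), 'prices_list.pickle')

# Save the list as binary pickle data...
with open(path, 'wb') as f:
    pickle.dump(prices, f)

# ...and load it back in a later run:
with open(path, 'rb') as f:
    restored = pickle.load(f)

print(restored == prices)  # True — the floats survive the round trip
```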
This is what I was trying to explain in the comments section. Note that I replaced response.json() with jsonData, which takes that call out of each for loop, and merged the two loops into one for efficiency. The code should run noticeably faster now.
import json

def saveData(filename, data):
    # Convert the data to a JSON string
    data = json.dumps(data)
    # Open the file, then save it
    try:
        file = open(filename, "wt")
    except OSError:
        print("Failed to save the file.")
        return False
    else:
        file.write(data)
        file.close()
        return True

def loadData(filename):
    # Open the file, then load its contents
    try:
        file = open(filename, "rt")
    except OSError:
        print("Failed to load the file.")
        return None
    else:
        data = file.read()
        file.close()
        # data is a JSON string, so now we convert it back
        # to a Python structure:
        data = json.loads(data)
        return data
# Get all price data
response = c.get_price_history_every_minute(symbol)
jsonData = response.json()

# Build prices and times lists:
#
# Since you're iterating over the same "candles" index in both loops
# when building those two lists, just reduce it to a single loop
prices = list()
times = list()
for i in range(len(jsonData["candles"])):
    prices.append(jsonData["candles"][i]["prices"])
    times.append(jsonData["candles"][i]["datetime"])

# Now, whenever you need to, just save each list like this:
saveData("prices_list.json", prices)
saveData("times_list.json", times)

# And retrieve them later when you need them:
prices = loadData("prices_list.json")
times = loadData("times_list.json")
By the way, pickle does the same thing, but it uses binary data instead of JSON, which may be faster for saving/loading data. I don't know; I haven't tested it.
With JSON you get the advantage of readability, since you can open each file and read it directly, as long as you understand JSON syntax.
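As a quick check that this approach actually solves the original problem: json.dumps/json.loads round-trips the list with its types intact, which is exactly what str() followed by file.read() does not do. A minimal sketch with a sample list:

```python
import json

prices = [69.73, 69.72, 69.64]
s = json.dumps(prices)           # '[69.73, 69.72, 69.64]' — a string
restored = json.loads(s)         # back to a real list of floats

print(restored[1], restored[2])  # prints: 69.72 69.64
print(type(restored[1]))         # <class 'float'>
```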