How can I use multiprocessing/multithreading to read CSV files and store them in newly generated variables?


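  • I have a list of file names and use it to generate strings that become the names of new variables holding the DataFrames.
  • The code below does not work.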
    import concurrent.futures
    import pandas as pd

    def filename(name):
        filename = f'{name}.csv'
        return pd.read_csv(filename(name))

    with concurrent.futures.ProcessPoolExecutor() as executor:
        files = [
            '20190702',
            '20190703',
            '20190708',
        ]
        # list of strings which will be new variable names
        name_list = ['df_' + i.split('2019')[1] for i in files]
        # list to store new variables
        executor_list = []
        for i in range(len(files)):
            name = name_list[i]
            dataframe = executor.submit(filename, files[i])
            exec(f"{name} = {dataframe}")  # Some error here!
            exec(f"executor_list.append({name})")
        for i in executor_list:
            exec(f"{i} = {i.result()}")

I ran this in Colab, but got this error:

      File "<string>", line 1
        df_0702 = <Future at 0x7f0e5b8cc3c8 state=running>
                  ^
    SyntaxError: invalid syntax

You don't need ProcessPoolExecutor here, because your operation is I/O-bound. Spawning threads is cheaper than spawning processes, so you can use ThreadPoolExecutor instead.

    import concurrent.futures
    import pandas as pd

    def filename(name):
        # Reads <name>.csv and returns (file name, DataFrame)
        filename = f'{name}.csv'
        return filename, pd.read_csv(filename)

    files = ['20190702', '20190703', '20190708']
    futures = []
    with concurrent.futures.ThreadPoolExecutor() as executor:
        for name in files:
            # submit() schedules the read on a worker thread and returns a Future
            futures.append(executor.submit(pd.read_csv, name + '.csv'))
    # result() blocks until the corresponding read has finished
    results = [f.result() for f in futures]
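
The same reads can also be written more compactly with `executor.map`, which yields results in the order of its inputs. This is only a minimal sketch, assuming pandas is installed; the temporary directory and the sample CSV contents are made up here so that the snippet runs on its own.

```python
import concurrent.futures
import tempfile
from pathlib import Path

import pandas as pd

files = ['20190702', '20190703', '20190708']

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample data: one tiny CSV per name so the example is runnable.
    paths = [Path(tmp) / f'{name}.csv' for name in files]
    for name, path in zip(files, paths):
        path.write_text(f'day,value\n{name},1\n')

    with concurrent.futures.ThreadPoolExecutor() as executor:
        # map() schedules every read and yields the DataFrames in the order of `paths`.
        results = list(executor.map(pd.read_csv, paths))

print(len(results))
```

With `map`, an exception raised by any read surfaces when its result is consumed, so `submit` is the better fit when each file needs its own error handling.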

executor.submit returns a Future object, so you need to retrieve the result from that Future object.

    import concurrent.futures
    import pandas as pd

    files = ['20190702', '20190703', '20190708']
    futures = {}
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for filename in files:
            vname = 'df_' + filename.split('2019')[1]
            filename = filename + '.csv'
            future = executor.submit(pd.read_csv, filename)
            # keep each Future under the name the variable would have had
            futures[vname] = future
    for vname, f in futures.items():
        dataframe = f.result()
        # do something with vname and dataframe
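
If some files may be missing, note that `f.result()` re-raises whatever exception occurred in the worker. One way to handle that per file is `concurrent.futures.as_completed` together with a reverse mapping from each Future to its name. The following is a sketch under stated assumptions: pandas is installed, and `load_frames`, its `directory` parameter, and the sample files are illustrative names that do not appear in the original code.

```python
import concurrent.futures
import tempfile
from pathlib import Path

import pandas as pd


def load_frames(directory, files):
    """Read <name>.csv for each name in parallel; return (frames, errors) dicts."""
    frames, errors = {}, {}
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # Map each Future back to the variable-style name it belongs to.
        future_to_vname = {
            executor.submit(pd.read_csv, Path(directory) / f'{name}.csv'):
                'df_' + name.split('2019')[1]
            for name in files
        }
        # as_completed() yields each Future as soon as its read has finished.
        for future in concurrent.futures.as_completed(future_to_vname):
            vname = future_to_vname[future]
            try:
                frames[vname] = future.result()
            except OSError as exc:  # e.g. FileNotFoundError for a missing file
                errors[vname] = exc
    return frames, errors


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample data; '20190708.csv' is deliberately left out.
    for name in ['20190702', '20190703']:
        (Path(tmp) / f'{name}.csv').write_text('a,b\n1,2\n')
    frames, errors = load_frames(tmp, ['20190702', '20190703', '20190708'])

print(sorted(frames), sorted(errors))
```

Returning two dictionaries keeps the successfully loaded DataFrames usable even when one read fails, and `frames['df_0702']` gives the by-name access the question was after.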

Also, never use the exec/eval functions except for debugging or testing purposes. They make your code insecure and hard to debug.
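
The error in the question illustrates the point: formatting a Future into an f-string inserts its repr, and that text is not valid Python source. The standard-library sketch below reproduces this without pandas; `load` is a hypothetical stand-in for `pd.read_csv`.

```python
import concurrent.futures


def load(name):
    # Hypothetical stand-in for pd.read_csv: returns some data for the given name.
    return {'file': f'{name}.csv'}


with concurrent.futures.ThreadPoolExecutor() as executor:
    future = executor.submit(load, '20190702')
    source = f"df_0702 = {future}"  # the kind of string the question passed to exec()
    data = future.result()          # blocks until the task finishes, then returns its value

print(source)  # the right-hand side is the Future's repr, "<Future at 0x... state=...>"

# Check whether the generated text is valid Python, without executing it.
try:
    compile(source, '<string>', 'exec')
    is_valid_python = True
except SyntaxError:
    is_valid_python = False

print(is_valid_python)
```

Holding the object itself (here `data`) in an ordinary variable or dictionary avoids the round trip through source text entirely.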
