Python: non-blocking CSV file writing

I'm writing some Python code that does some calculations and writes the results to a file. This is my current code:

for name, group in data.groupby('Date'):
    df = lot_of_numpy_calculations(group)
    with open('result.csv', 'a') as f:
        df.to_csv(f, header=False, index=False)

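Sometimes the calculation and the writing take a while. I've read some articles about async in Python, but I don't know how to implement it. Is there a simple way to optimize this loop so that it doesn't wait for the write to finish before starting the next iteration?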

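Since neither numpy nor pandas I/O is asyncio-aware, this is probably a better use case for threads than for asyncio. (Besides, an asyncio-based solution would use threads behind the scenes anyway.)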

For example, this code spawns a writer thread to which you submit work through a queue:

import threading, queue
to_write = queue.Queue()
def writer():
    # Call to_write.get() until it returns None
    for df in iter(to_write.get, None):
        with open('result.csv', 'a') as f:
            df.to_csv(f, header=False, index=False)
t = threading.Thread(target=writer)
t.start()
for name, group in data.groupby('Date'):
    df = lot_of_numpy_calculations(group)
    to_write.put(df)
# enqueue None to instruct the writer thread to exit
to_write.put(None)
# wait until the writer has written everything that was queued
t.join()
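If you'd rather not manage the queue and the None sentinel yourself, a single-worker ThreadPoolExecutor from the standard library gives similar behavior. This is a sketch of an alternative, not part of the answer above: `compute()` is a hypothetical stand-in for `lot_of_numpy_calculations()`, and to keep it dependency-free it writes plain lists with the `csv` module where the real code would call `df.to_csv(f, header=False, index=False)`.

```python
import csv
from concurrent.futures import ThreadPoolExecutor


def compute(group):
    # Hypothetical stand-in for lot_of_numpy_calculations(): squares each value.
    return [[x, x * x] for x in group]


def append_rows(path, rows):
    # Runs on the worker thread. With pandas this would be
    # df.to_csv(f, header=False, index=False) instead of csv.writer.
    with open(path, "a", newline="") as f:
        csv.writer(f).writerows(rows)


def process(groups, path):
    # max_workers=1: writes run one at a time, in submission order,
    # while the main thread moves on to computing the next group.
    with ThreadPoolExecutor(max_workers=1) as pool:
        futures = [pool.submit(append_rows, path, compute(g)) for g in groups]
    # Leaving the with-block waits for all pending writes.
    for fut in futures:
        fut.result()  # re-raise any exception that happened during a write
```

Because there is only one worker, writes never overlap and land in the file in the order they were submitted.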

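Note that if writing is consistently slower than calculation, the queue will keep accumulating data frames, which can eventually consume a lot of memory. In that case, give the queue a maximum size by passing the maxsize argument to its constructor (queue.Queue(maxsize=...)); put() will then block until the writer catches up.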
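A minimal sketch of the bounded variant, under the same assumptions as before (a hypothetical squaring calculation in place of `lot_of_numpy_calculations()`, and the stdlib `csv` module in place of `DataFrame.to_csv`). With `maxsize` set, the producer can only run a few results ahead of the writer.

```python
import csv
import queue
import threading


def run_pipeline(groups, path, maxsize=4):
    # put() blocks once `maxsize` items are waiting, so memory stays bounded
    # even if the writer is slower than the calculation.
    to_write = queue.Queue(maxsize=maxsize)

    def writer():
        # Consume items until the None sentinel arrives.
        for rows in iter(to_write.get, None):
            with open(path, "a", newline="") as f:
                csv.writer(f).writerows(rows)

    t = threading.Thread(target=writer)
    t.start()
    for g in groups:
        rows = [[x, x * x] for x in g]  # hypothetical stand-in for the real calculation
        to_write.put(rows)              # blocks while the queue is full
    to_write.put(None)  # tell the writer to exit
    t.join()            # wait until every queued result has been written
```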

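Also, keep in mind that reopening the file for every write can slow writing down. If the amount of data written each time is small, you may get better performance by opening the file once, up front.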
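One way to do that with the writer thread is to open the file once, before the writer's loop, and keep it open until the sentinel arrives. `start_writer()` is a hypothetical helper and, as before, the stdlib `csv` module stands in for `df.to_csv(f, header=False, index=False)`.

```python
import csv
import queue
import threading


def start_writer(path, to_write):
    # Hypothetical helper: the writer thread opens the file a single time
    # and reuses the handle for every result instead of reopening it.
    def writer():
        with open(path, "a", newline="") as f:
            w = csv.writer(f)
            for rows in iter(to_write.get, None):
                # With pandas: df.to_csv(f, header=False, index=False)
                w.writerows(rows)
        # The file is flushed and closed when the with-block exits.

    t = threading.Thread(target=writer)
    t.start()
    return t
```

Use it like the queue example above: put results on a `queue.Queue`, then put None and join the returned thread.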

Since most operating systems don't support asynchronous file I/O, the common cross-platform approach today is to use threads.

For example, the aiofiles module wraps a thread pool to provide a file I/O API for asyncio.
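To illustrate the same idea with the standard library rather than aiofiles (whose API is not shown here), an asyncio version can hand each blocking write to the event loop's default thread pool with `loop.run_in_executor()`. This is a sketch under the same assumptions as before (hypothetical `compute()`, stdlib `csv` instead of `DataFrame.to_csv`); it overlaps computing the next group with the previous write and keeps at most one write in flight so rows stay in order.

```python
import asyncio
import csv


def compute(group):
    # Hypothetical stand-in for lot_of_numpy_calculations().
    return [[x, x * x] for x in group]


def append_rows(path, rows):
    # Blocking write; with pandas: df.to_csv(f, header=False, index=False).
    with open(path, "a", newline="") as f:
        csv.writer(f).writerows(rows)


async def process(groups, path):
    loop = asyncio.get_running_loop()
    pending = None  # future for the write currently running in a worker thread
    for g in groups:
        rows = compute(g)  # runs while the previous write proceeds in the pool
        if pending is not None:
            await pending  # at most one outstanding write keeps rows ordered
        # run_in_executor submits to the default thread pool right away.
        pending = loop.run_in_executor(None, append_rows, path, rows)
    if pending is not None:
        await pending
```

The calculation itself still runs on the event loop thread; the speedup comes only from the write happening in a worker thread, which is the thread-based approach described above.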
