How to run the optimal number of instances of a single Python script on Linux

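I have a script that performs an independent task on about 1200 different files. It loops through each file and checks whether the task has already been completed or is in progress; if neither, it performs the task on the file. It follows this general outline: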

myScript.py:

for file in directory:
    fileStatus = getFileStatus(file)
    if fileStatus != 'Complete' and fileStatus != 'inProgress':
        setFileStatus(file, 'inProgress')
        doTask(file)
        setFileStatus(file, 'Complete')

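doTask() takes 20-40 minutes on my machine. Its RAM usage starts at a minimal level, rises to about 8GB in the middle, and then falls back to the minimum. Depending on the file, this happens over a variable amount of time.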

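I would like to run this script in parallel with itself so that I use as much of my computer's resources as possible and finish all the tasks in the shortest time. Assuming (out of ignorance) that the limiting resource is RAM (my computer has 64GB) and that the scripts will all hit peak RAM consumption at the same time, I could mimic the answer to this question in the following way: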

python myScript.py & 
python myScript.py & 
python myScript.py & 
python myScript.py & 
python myScript.py & 
python myScript.py & 
python myScript.py & 
python myScript.py &

However, I imagine I could fit in more, depending on where each process is in its execution.

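Is there a way to dynamically determine how many resources I have available, and accordingly create, destroy, or pause instances of this script so that the machine works at maximum efficiency with respect to time? I would like to avoid making changes to myScript and instead control it from another script that handles the creating, destroying, and pausing.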

GNU Parallel is built for doing things like:

python myScript.py & 
python myScript.py & 
python myScript.py & 
python myScript.py & 
python myScript.py & 
python myScript.py & 
python myScript.py & 
python myScript.py &

It also has some features for enforcing resource limits. Finding the optimal number is really hard, however, given that:

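Each job runs for 20-40 minutes (if this were fixed, it would be easier)
Each job has a RAM usage envelope like a mountain (if it stayed at the same level throughout the run, it would be easier)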

If the 64 GB of RAM is the limiting resource, then it is always safe to run 8 jobs:

cat filelist | parallel -j8 python myScript.py

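If you have plenty of CPU power and are willing to risk wasting some of it, you can start a job whenever 8 GB of memory is free and the previous job was started more than 3 minutes ago (assuming jobs reach their peak memory usage within 3-5 minutes). Note that the delay is given in seconds, so the 300 in the command below is 5 minutes. GNU Parallel will kill the newest job and put it back on the queue if free memory drops below 4 GB: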

cat filelist | parallel -j0 --memfree 8G --delay 300 python myScript.py
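
To make the start condition above concrete, here is a minimal Python sketch of the same idea: launch the next job only when enough memory is free and enough time has passed since the last launch. This is not how GNU Parallel is implemented, and it does not kill and requeue jobs when memory runs low. The names `parse_mem_available`, `may_start_job`, and `run_gated` are made up for illustration, and the sketch assumes a Linux kernel whose /proc/meminfo contains a `MemAvailable` line.

```python
import subprocess
import time


def parse_mem_available(meminfo_text):
    """Return MemAvailable in bytes, given the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # The line looks like "MemAvailable:   12345678 kB".
            return int(line.split()[1]) * 1024
    raise ValueError("no MemAvailable line found (assumption: modern Linux kernel)")


def read_free_bytes(path="/proc/meminfo"):
    """Read the currently available memory in bytes (Linux only)."""
    with open(path) as f:
        return parse_mem_available(f.read())


def may_start_job(free_bytes, seconds_since_last_start,
                  min_free_bytes=8 * 1024**3, min_delay_s=300):
    """True when enough memory is free AND the last start is old enough.

    The defaults mirror the 8 GB and 300-second values in the command above.
    """
    return (free_bytes >= min_free_bytes
            and seconds_since_last_start >= min_delay_s)


def run_gated(commands, poll_s=5, **limits):
    """Start each command (an argv list) once may_start_job allows it.

    Returns the exit codes in launch order. Jobs are never killed or requeued.
    """
    running = []
    last_start = float("-inf")  # so the first job starts immediately
    for argv in commands:
        while not may_start_job(read_free_bytes(),
                                time.monotonic() - last_start, **limits):
            time.sleep(poll_s)
        running.append(subprocess.Popen(argv))
        last_start = time.monotonic()
    return [proc.wait() for proc in running]
```

A call such as `run_gated([["python", "myScript.py"]] * 16)` would then launch up to 16 instances, each one held back until both conditions are met.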

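Update:
Thanks for the further clarification.
However, with the requirements and the approach you just mentioned, you will end up reinventing multithreading. I suggest you avoid multiple script calls and control everything from inside the loop (like the one in my original response).
You are probably looking for the memory usage of a process (like this).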
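
As a rough illustration of querying a process's memory usage on Linux, the sketch below reads the resident set size from /proc/<pid>/status and the caller's own peak usage via the standard `resource` module. The helper names are hypothetical, and this is only one of several ways to get these numbers.

```python
import resource


def parse_vm_rss_kib(status_text):
    """Return the VmRSS value in KiB from the text of /proc/<pid>/status.

    Returns None when there is no VmRSS line (e.g. kernel threads).
    """
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     123456 kB".
            return int(line.split()[1])
    return None


def current_rss_kib(pid):
    """Current resident memory of a running process, in KiB (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_rss_kib(f.read())


def own_peak_rss_kib():
    """Peak resident memory of the calling process so far.

    On Linux ru_maxrss is reported in kilobytes; other platforms may differ.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
```

A controlling script could poll `current_rss_kib(proc.pid)` for each child it started to see which instances are near their peak.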
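One specific component that may help you here is setting the priority of individual tasks (mentioned here).
You may find this link particularly useful for scheduling task priorities.
In fact, I would suggest using the threading2 package here, since it has built-in priority control.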
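
If the tasks stay as separate processes rather than threads, priority can also be adjusted with the standard library alone. The following is a small sketch using `os.setpriority` on a child process; it does not use threading2, and `start_low_priority` is a hypothetical helper name. It assumes a Unix system, where an unprivileged user can normally only raise a process's niceness (that is, lower its priority).

```python
import os
import subprocess


def start_low_priority(argv, niceness=10):
    """Start argv as a child process and raise its niceness to at least `niceness`."""
    proc = subprocess.Popen(argv)
    current = os.getpriority(os.PRIO_PROCESS, proc.pid)
    # Never request a value below the current one: lowering niceness
    # usually requires elevated privileges.
    os.setpriority(os.PRIO_PROCESS, proc.pid, max(current, niceness))
    return proc
```

For example, `start_low_priority(["python", "myScript.py"], niceness=15)` would let a late-started instance yield CPU time to the ones already running.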

Original response:
Since you have roughly identified which parts need how much memory, you can easily use multithreading.

import threading
thread1 = threading.Thread(target=process1 , args=(yourArg1,)) # process1 takes 1 GB
thread2 = threading.Thread(target=process2 , args=(yourArg1,)) # process2 takes 1 GB
threadList1 = [thread1,thread2]
thread3 = threading.Thread(target=process3 , args=(yourArg1,)) # process3 takes 0.5 GB
thread4 = threading.Thread(target=process4 , args=(yourArg1,)) # process4 takes 0.5 GB
threadList2 = [thread3,thread4]

# Batch1 : 
for thread in threadList1:
    thread.start()
for thread in threadList1:
    thread.join()

# Batch2 : 
for thread in threadList2:
    thread.start()
for thread in threadList2:
    thread.join()
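
The two hand-written batches above can be generalized. Below is a minimal sketch, with a hypothetical helper `run_in_batches`, that starts every thread of one batch, waits for all of them, and only then moves on to the next batch. Keep in mind that in standard CPython builds threads do not execute CPU-bound Python code in parallel, so this pattern pays off mainly when the tasks spend their time in I/O or in external programs.

```python
import threading


def run_in_batches(batches):
    """Run batches one after another; tasks within a batch run concurrently.

    `batches` is a list of batches, each a list of (function, args_tuple) pairs.
    """
    for batch in batches:
        threads = [threading.Thread(target=func, args=args)
                   for func, args in batch]
        for thread in threads:
            thread.start()
        # Wait for the whole batch before starting the next one.
        for thread in threads:
            thread.join()
```

With the names from the snippet above, the call would look like `run_in_batches([[(process1, (yourArg1,)), (process2, (yourArg1,))], [(process3, (yourArg1,)), (process4, (yourArg1,))]])`, grouping tasks so that each batch stays within the memory budget.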
