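I'm trying to understand how to program correctly with ray.
The results below don't seem to agree with the performance improvements of ray, as explained here.
Environment:
- Python version: 3.6.10
- ray version: 0.7.4
Here are the machine specs: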
>>> import psutil
>>> psutil.cpu_count(logical=False)
4
>>> psutil.cpu_count(logical=True)
8
>>> mem = psutil.virtual_memory()
>>> mem.total
33707012096 # 32 GB
First, traditional Python multiprocessing using Queue (multiproc_function.py):
import time
from multiprocessing import Process, Queue

N_PARALLEL = 8
N_LIST_ITEMS = int(1e8)

def loop(n, nums, q):
    print(f"n = {n}")
    s = 0
    start = time.perf_counter()
    for e in nums:
        s += e
    t_taken = round(time.perf_counter() - start, 2)
    q.put((n, s, t_taken))

if __name__ == '__main__':
    results = []
    nums = list(range(N_LIST_ITEMS))
    q = Queue()
    procs = []
    for i in range(N_PARALLEL):
        procs.append(Process(target=loop, args=(i, nums, q)))
    for proc in procs:
        proc.start()
    for proc in procs:
        n, s, t_taken = q.get()
        results.append((n, s, t_taken))
    for proc in procs:
        proc.join()
    for r in results:
        print(r)
The result is:
$ time python multiproc_function.py
n = 0
n = 1
n = 2
n = 3
n = 4
n = 5
n = 6
n = 7
(0, 4999999950000000, 11.12)
(1, 4999999950000000, 11.14)
(2, 4999999950000000, 11.1)
(3, 4999999950000000, 11.23)
(4, 4999999950000000, 11.2)
(6, 4999999950000000, 11.22)
(7, 4999999950000000, 11.24)
(5, 4999999950000000, 11.54)
real 0m19.156s
user 1m13.614s
sys 0m24.496s
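Checking htop while this was running, memory went from a 2.6 GB baseline to 8 GB, and all 8 processors were fully utilized. Also, user+sys > real makes it clear that parallel processing is happening.
Here is the ray test code (ray_test.py):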
import time
import psutil
import ray

N_PARALLEL = 8
N_LIST_ITEMS = int(1e8)
use_logical_cores = False

num_cpus = psutil.cpu_count(logical=use_logical_cores)
if use_logical_cores:
    print(f"Setting num_cpus to # logical cores = {num_cpus}")
else:
    print(f"Setting num_cpus to # physical cores = {num_cpus}")
ray.init(num_cpus=num_cpus)

@ray.remote
def loop(nums, n):
    print(f"n = {n}")
    s = 0
    start = time.perf_counter()
    for e in nums:
        s += e
    t_taken = round(time.perf_counter() - start, 2)
    return (n, s, t_taken)

if __name__ == '__main__':
    nums = list(range(N_LIST_ITEMS))
    list_id = ray.put(nums)
    results = ray.get([loop.remote(list_id, i) for i in range(N_PARALLEL)])
    for r in results:
        print(r)
The result is:
$ time python ray_test.py
Setting num_cpus to # physical cores = 4
2020-04-28 16:52:51,419 INFO resource_spec.py:205 -- Starting Ray with 18.16 GiB memory available for workers and up to 9.11 GiB for objects. You can adjust these settings with ray.remote(memory=<bytes>, object_store_memory=<bytes>).
(pid=78483) n = 2
(pid=78485) n = 1
(pid=78484) n = 3
(pid=78486) n = 0
(pid=78484) n = 4
(pid=78483) n = 5
(pid=78485) n = 6
(pid=78486) n = 7
(0, 4999999950000000, 5.12)
(1, 4999999950000000, 5.02)
(2, 4999999950000000, 4.8)
(3, 4999999950000000, 4.43)
(4, 4999999950000000, 4.64)
(5, 4999999950000000, 4.61)
(6, 4999999950000000, 4.84)
(7, 4999999950000000, 4.99)
real 0m45.082s
user 0m22.163s
sys 0m10.213s
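The real time is much longer than with Python multiprocessing. Also, real is greater than user+sys. Checking htop, memory went as high as 30 GB and the cores were not fully saturated either. All of this seems to contradict what ray is supposed to do.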
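To put numbers on the user+sys versus real comparison, here is a small sketch that divides CPU time by wall-clock time for both runs. The helper name `cpu_to_wall_ratio` is made up for illustration; the timings are copied from the two `time` outputs above.

```python
def cpu_to_wall_ratio(real_s, user_s, sys_s):
    """Return (user + sys) / real, a rough count of cores kept busy on average."""
    return (user_s + sys_s) / real_s


# Timings in seconds, taken from the `time` output of each run.
multiproc = cpu_to_wall_ratio(real_s=19.156, user_s=73.614, sys_s=24.496)
ray_run = cpu_to_wall_ratio(real_s=45.082, user_s=22.163, sys_s=10.213)

print(f"multiprocessing: {multiproc:.2f}")  # multiprocessing: 5.12
print(f"ray:             {ray_run:.2f}")  # ray:             0.72
```

A ratio above 1 means more than one core was busy on average. This assumes `time` attributes all worker CPU time to the timed command; if the ray worker processes are not accounted for that way, the second ratio understates the actual CPU use.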
Then I set use_logical_cores to True. The run was killed because it ran out of memory:
$ time python ray_test.py
Setting num_cpus to # logical cores = 8
2020-04-28 16:27:43,709 INFO resource_spec.py:205 -- Starting Ray with 17.29 GiB memory available for workers and up to 8.65 GiB for objects. You can adjust these settings with ray.remote(memory=<bytes>, object_store_memory=<bytes>).
Killed
real 0m25.205s
user 0m15.056s
sys 0m4.028s
Am I doing something wrong?
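First, Ray does not guarantee CPU affinity or resource isolation. That may be why CPU usage is not saturated. (I'm not 100% sure, though.) You can try setting CPU affinity with psutil and see whether the cores still fail to saturate. (https://psutil.readthedocs.io/en/latest/#psutil.Process.cpu_affinity).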
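A minimal sketch of what setting affinity with psutil could look like. The helper `pin_current_process` and its `slot` argument are hypothetical names, not part of psutil or Ray; `Process.cpu_affinity` is the psutil call linked above, and it is only available on some platforms (Linux, Windows, FreeBSD), so the sketch returns None where it is missing.

```python
import os

try:
    import psutil  # third-party package, the one the link above points to
except ImportError:
    psutil = None


def pin_current_process(slot):
    """Pin the calling process to one CPU chosen from its allowed set.

    `slot` is a hypothetical task index; it wraps around the CPUs this
    process is allowed to use. Returns the new affinity list, or None
    when psutil or cpu_affinity is unavailable on this platform.
    """
    if psutil is None:
        return None
    proc = psutil.Process(os.getpid())
    if not hasattr(proc, "cpu_affinity"):  # e.g. macOS
        return None
    allowed = proc.cpu_affinity()          # CPUs we may currently run on
    target = allowed[slot % len(allowed)]  # pick one of them
    proc.cpu_affinity([target])            # restrict this process to it
    return proc.cpu_affinity()


if __name__ == "__main__":
    print(pin_current_process(0))
```

In the ray script, one would presumably call this at the top of the remote function body, for example `pin_current_process(n)` inside `loop`, and then watch htop again. Whether that changes the saturation is untested here.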
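As for the results, would you mind trying the latest version of Ray? There has been pretty good progress in performance and memory management in Ray since version 0.7.4.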