
Access a single element in large published array with Dask

Tags: dask, dask-delayed, dask-distributed
Updated: 2023-09-14

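Is there a faster way to retrieve only a single element from a large published array with Dask, without retrieving the entire array?

In the example below, client.get_dataset('array1')[0] takes roughly the same time as client.get_dataset('array1').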

import distributed
client = distributed.Client()
data = [1]*10000000
payload = {'array1': data}
client.publish_dataset(**payload)
one_element = client.get_dataset('array1')[0]

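Note that anything you publish goes to the scheduler, not to the workers, so publishing a plain Python list this way is inefficient: every get_dataset call ships the whole list back to the client before the index is applied. Publishing was intended to be used with Dask collections such as dask.array.

Client 1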

import dask.array as da
x = da.ones(10000000, chunks=(100000,))  # 1e7 size array cut into 1e5 size chunks
x = x.persist()  # persist array on the workers of the cluster
client.publish_dataset(x=x)  # store the metadata of x on the scheduler

Client 2

x = client.get_dataset('x')  # get the lazy collection x
x[0].compute()  # this selection happens on the worker, only the result comes down
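
The two client snippets above can be combined into one self-contained sketch. This is only an illustration under a few assumptions: it uses an in-process cluster (Client(processes=False)) so that it runs on a single machine, a smaller array than in the answer to keep memory use low, and a hypothetical helper name fetch_first_element. In a real deployment the publishing and the reading would happen in two separate clients connected to the same scheduler.

```python
import dask.array as da
from distributed import Client


def fetch_first_element():
    """Publish a persisted Dask array, then read back a single element."""
    # Assumption: an in-process cluster, used here purely for illustration.
    with Client(processes=False, dashboard_address=None) as client:
        # 1e6-element array cut into 1e5-element chunks, kept on the workers.
        x = da.ones(1_000_000, chunks=(100_000,)).persist()
        # Only the metadata of x is stored on the scheduler.
        client.publish_dataset(x=x)
        try:
            # get_dataset returns the lazy collection, not the data itself.
            lazy = client.get_dataset("x")
            # The indexing runs on a worker; only the scalar comes back.
            return float(lazy[0].compute())
        finally:
            # Remove the published name so repeated runs do not collide.
            client.unpublish_dataset("x")
```

Calling fetch_first_element() returns the first element of the array; the key point is that get_dataset hands back a lazy dask.array, so the selection is done before any data is transferred to the client.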
