How to get numpy-style indexing with an xarray Dataset

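I know the x and y indices into a 2D array (numpy indexing).

According to this documentation, xarray uses e.g. Fortran-style indexing.

So when I pass, for example,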

ind_x = [1, 2]
ind_y = [3, 4]

I expect 2 values, one for each of the index pairs (1,3) and (2,4), but xarray returns a 2x2 matrix.
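To make the difference concrete, here is a minimal NumPy-only sketch of the two behaviours. The array `a` is hypothetical stand-in data (not the real dataset), built so that each value encodes its own position; `np.ix_` is used only to reproduce the 2x2 "outer" result that xarray gives for plain lists.

```python
import numpy as np

# Hypothetical stand-in data: a[y, x] == y * 6 + x
a = np.arange(30).reshape(5, 6)

ind_x = [1, 2]
ind_y = [3, 4]

# numpy fancy indexing pairs the lists up element by element:
# it picks (y=3, x=1) and (y=4, x=2)
pointwise = a[ind_y, ind_x]

# Orthogonal (outer) indexing takes every combination of ind_y and ind_x
outer = a[np.ix_(ind_y, ind_x)]

print(pointwise)    # [19 26]
print(outer.shape)  # (2, 2)
```

The pointwise result is the diagonal of the outer one, which is why the 2x2 matrix contains the two wanted values plus two unwanted ones.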

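Now I would like to know how to get numpy-like indexing with xarray.

Note: I want to avoid loading the whole data into memory, so using the .values API is not part of the solution I am looking for.

You can access the underlying numpy array and index it directly: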

import xarray as xr
x = xr.tutorial.load_dataset("air_temperature")
ind_x = [1, 2]
ind_y = [3, 4]
print(x.air.data[0, ind_y, ind_x].shape)
# (2,)

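Edit:

Assuming your data is in a dask-backed xarray object and you do not want to load all of it into memory, you need to use vindex on the dask array behind the xarray data object: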

import xarray as xr

ind_x = [1, 2]
ind_y = [3, 4]
# simple chunk to convert to dask array
x = xr.tutorial.load_dataset("air_temperature").chunk({"time": 1})
extract = x.air.data.vindex[0, ind_y, ind_x]
print(extract.shape)
# (2,)
print(extract.compute())
# array([267.1, 274.1], dtype=float32)
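As an alternative that stays inside xarray instead of reaching for the underlying array, xarray's vectorized indexing can be sketched as follows. This is not the approach used above: it relies on passing `DataArray` indexers that share a dimension name. The array `da`, its dimension names `y` and `x`, and the new dimension name `points` are all made up for illustration. As far as I know this also works lazily on dask-backed data, but I have not timed it.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in data: value at (y, x) is y * 6 + x
da = xr.DataArray(np.arange(30).reshape(5, 6), dims=("y", "x"))

ind_x = [1, 2]
ind_y = [3, 4]

# Plain lists index each dimension independently (orthogonal indexing)
outer = da.isel(x=ind_x, y=ind_y)

# Indexers sharing a dimension ("points", an arbitrary name) are paired up,
# which gives numpy-style pointwise selection
pointwise = da.isel(
    x=xr.DataArray(ind_x, dims="points"),
    y=xr.DataArray(ind_y, dims="points"),
)

print(outer.shape)       # (2, 2)
print(pointwise.values)  # [19 26]
```

The result is a one-dimensional DataArray along `points`, so coordinates and attributes are kept, which the raw `.data` route drops.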

To see how the speed compares, I benchmarked the different approaches.

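from pathlib import Path
from typing import List

import numpy as np
import xarray
from netCDF4 import Dataset

def method_1(file_paths: List[Path], indices) -> List[np.ndarray]:
    # netCDF4 library, indexing the variable directly
    data = []
    for file in file_paths:
        d = Dataset(file, 'r')
        data.append(d.variables['hrv'][indices])
        d.close()
    return data

def method_2(file_paths: List[Path], indices) -> List[np.ndarray]:
    # xarray, loading the variable via .values and then indexing
    data = []
    for file in file_paths:
        data.append(xarray.open_dataset(file, engine='h5netcdf').hrv.values[indices])
    return data

def method_3(file_paths: List[Path], indices) -> List[np.ndarray]:
    # xarray + dask, pointwise indexing with vindex, then compute
    data = []
    for file in file_paths:
        data.append(xarray.open_mfdataset([file], engine='h5netcdf').hrv.data.vindex[indices].compute())
    return data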

In [1]: len(file_paths)
Out[1]: 4813

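Results:

Method 1 (using the netcdf4 library): 101.9s
Method 2 (using xarray and the .values API): 591.4s
Method 3 (using xarray + dask): 688.7s

My guess is that xarray + dask spends a lot of time in the .compute step.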
