Overlapping chunks in an xarray dataset for kernel operations

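I am trying to run a 9x9 pixel kernel over large satellite images using a custom filter. A single satellite scene is about 40 GB, so to fit it into my RAM I use xarray's option to chunk my dataset with dask.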

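My filter includes a check of whether the kernel is complete (i.e. no data is missing at the image edge). If the kernel is incomplete, NaN is returned to prevent potential bias (I don't really care about the edges). I now realize that this introduces NaNs not only at the edges of the image (expected behaviour) but also at the edges of every chunk, because the chunks don't overlap. dask provides options to create chunks with overlap, but is there comparable functionality in xarray? I found this issue, but there does not seem to have been any progress on it.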

Some example code (a shortened version of my original code):


import numpy as np
import numba
import math
import xarray as xr

@numba.jit("f4[:,:](f4[:,:],i4)", nopython=True)
def water_anomaly_filter(input_arr, window_size=9):
    # check if window size is odd
    if window_size % 2 == 0:
        raise ValueError("Window size must be odd!")

    # prepare an output array with NaNs and the same dtype as the input
    output_arr = np.zeros_like(input_arr)
    output_arr[:] = np.nan

    # calculate how many pixels in x and y direction around the center pixel
    # are in the kernel
    pix_dist = math.floor(window_size / 2 - 0.5)

    # create a dummy weight matrix
    weights = np.ones((window_size, window_size))

    # get the shape of the input array
    xn, yn = input_arr.shape

    # iterate over the x axis
    for x in range(xn):
        # determine limits of the kernel in x direction
        xmin = max(0, x - pix_dist)
        xmax = min(xn, x + pix_dist + 1)

        # iterate over the y axis
        for y in range(yn):
            # determine limits of the kernel in y direction
            ymin = max(0, y - pix_dist)
            ymax = min(yn, y + pix_dist + 1)
            # extract data values inside the kernel
            kernel = input_arr[xmin:xmax, ymin:ymax]

            # if the kernel is complete (i.e. not at image edge...) and it
            # is not all NaN
            if kernel.shape == weights.shape and not np.isnan(kernel).all():
                # apply the filter. In this example simply keep the original
                # value
                output_arr[x, y] = input_arr[x, y]

    return output_arr


def run_water_anomaly_filter_xr(xds, var_prefix="band",
                                window_size=9):

    variables = [x for x in list(xds.variables) if x.startswith(var_prefix)]

    for var in variables[:2]:
        xds[var].values = water_anomaly_filter(xds[var].values,
                                               window_size=window_size)

    return xds


def create_test_nc():
    data = np.random.randn(1000, 1000).astype(np.float32)
    rows = np.arange(54, 55, 0.001)
    cols = np.arange(10, 11, 0.001)
    ds = xr.Dataset(
        data_vars=dict(
            band_1=(["x", "y"], data)
        ),
        coords=dict(
            lon=(["x"], rows),
            lat=(["y"], cols),
        ),
        attrs=dict(description="Testdata"),
    )
    ds.to_netcdf("test.nc")


if __name__ == "__main__":
    # if required, create test data
    create_test_nc()

    # import data
    with xr.open_dataset("test.nc",
                         chunks={"x": 50, "y": 50},
                         ) as xds:
        xds_2 = xr.map_blocks(run_water_anomaly_filter_xr,
                              xds,
                              template=xds).compute()
        xds_2["band_1"][:200, :200].plot()

This produces the following plot (image not reproduced here).

You can clearly see the rows and columns of NaNs along the edges of each chunk.

I would be happy to hear any suggestions. I would love to get overlapping chunks within xarray, but I am also open to any other solution.

You can use Dask's map_overlap as follows:

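import dask.array

# xds.band_1.data is the dask array backing the chunked variable
arr = dask.array.map_overlap(
    water_anomaly_filter, xds.band_1.data, dtype='f4', depth=4, window_size=9
).compute()
# wrap the filtered NumPy result back into a DataArray with the original coords
da = xr.DataArray(arr, dims=xds.band_1.dims, coords=xds.band_1.coords)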

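Note that you may want to tune depth and window_size for your specific application. depth is the number of pixels shared with each neighbouring chunk, so it should be at least the half-width of the kernel (4 for a 9x9 window).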
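The relationship between depth and window_size can be checked on a small array. The sketch below is only an illustration under a few assumptions: complete_window_filter is a hypothetical pure-NumPy stand-in for water_anomaly_filter (so it runs without numba), the array and chunk sizes are arbitrary, and boundary="none" is passed explicitly on the assumption that the NaN border at the true image edge should be kept. The result is wrapped with DataArray.copy(data=...) so that it stays lazy until compute() is called.

```python
import numpy as np
import dask.array as da
import xarray as xr


def complete_window_filter(block, window_size=9):
    """Hypothetical stand-in for water_anomaly_filter: keep a pixel only if
    its full window fits inside the block and is not all NaN, else NaN."""
    half = window_size // 2
    out = np.full_like(block, np.nan)
    nx, ny = block.shape
    for x in range(half, nx - half):
        for y in range(half, ny - half):
            window = block[x - half:x + half + 1, y - half:y + half + 1]
            if not np.isnan(window).all():
                out[x, y] = block[x, y]
    return out


window_size = 9
depth = window_size // 2  # half-width of the kernel, 4 for a 9x9 window

# small random test band, chunked the same way a large scene would be
rng = np.random.default_rng(0)
data = rng.standard_normal((120, 120)).astype(np.float32)
band = xr.DataArray(data, dims=("x", "y"), name="band_1")
band = band.chunk({"x": 40, "y": 40})

# each chunk is extended by `depth` pixels from its neighbours, filtered,
# and trimmed back; boundary="none" adds no padding at the outer image edge
lazy = da.map_overlap(
    complete_window_filter,
    band.data,
    depth=depth,
    boundary="none",
    dtype=band.dtype,
    window_size=window_size,
)

# same dims and coords as the input, still backed by a lazy dask array
filtered = band.copy(data=lazy)

# reference: the same filter applied to the whole array in one piece
reference = complete_window_filter(data, window_size)
result = filtered.compute()
```

If the chunked result matches the single-pass reference, the NaN rows and columns at the chunk edges are gone and only the border of the full image remains NaN. For a scene that does not fit into memory, it may be preferable to skip compute() and write the lazy result out instead, for example with to_netcdf.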
