我有点沮丧,因为我找不到解决我的问题的方法,这在r中用包gapfill似乎很容易做到,但在python中更难。
我的问题是:我有一个三维数组,维度是纬度、经度和时间。我想要的是在每个光栅/阵列中插入nan值(由云和其他失真引起(。nan值形成块(由于云(,有时相对较大。我的想法是不仅对每个时间步长的相邻像素进行插值,还对前后的时间步长进行插值(假设前几天和后几天的像素具有实际相似的值,因为陆地覆盖率变化不太快(。我的目标是用相同的像素位置做一个随时间的线性插值。(前后有多少时间步长也是我不确定如何在interpn函数中定义的问题?(
我找到了不同的选择来做这件事,但还没有奏效。我发现最有前途的方法是从带有interpole.interpn函数的软件包scipy中找到的。此函数使用的是numpy数组,而不是xarray。我的尝试:
#change from xarray to numpy
my array_np = my array.to_numpy()
# lable dimensions (what is done when building a numpy with meshgrid)
x = array_np [0]
y = array_np [1]
z = array_np [2]
#get index of nan values
nanIndex= np.isnan(array_np ).nonzero()
nanIndex
#name dimensions of nan values
xc= nanIndex[0]
yc= nanIndex[1]
zc= nanIndex[2]
# For using the scipy interpolate. interpn function:
# points = the regular grid - in my case x,y,z
# values = the data on the regular grid - in my case my array (my_array_np)
# point_nan = the point that is evaluate in the 3D grid - in my case xc, y,c, zy
points = (x, y, z) # dimensions
points_nan = (xc, yc, zc) #nandimensions
print(interpolate.interpn(points, my_array_np, points_nan))
我现在得到的错误是:
"The points in dimension 0 must be strictly ascending"
我哪里错了?感谢您的提前帮助!如果你有其他解决方案,也解决了我的问题,我也很乐意帮助!
这是我的数组的样子:
array([[[ nan, nan, nan, ..., 279.64 , 282.16998,
279.66998],
[277.62 , 277.52 , 277.88 , ..., 281.75998, 281.72 ,
281.66 ],
[277.38 , 277.75 , 277.88998, ..., 281.75998, 281.75998,
280.91998],
...,
[ nan, nan, nan, ..., 280.72998, 280.33 ,
280.94 ],
[ nan, nan, nan, ..., nan, nan,
nan],
[ nan, nan, nan, ..., nan, nan,
nan]],
[[ nan, nan, nan, ..., 272.22 , 271.54 ,
271.02 ],
[280.02 , 280.44998, 281.18 , ..., 271.47998, 271.88 ,
272.03 ],
[280.32 , 281. , 281.27 , ..., 270.83 , 271.58 ,
272.03 ],
...,
[ nan, nan, nan, ..., 290.34 , 290.25 ,
288.365 ],
[ nan, nan, nan, ..., nan, nan,
nan],
[ nan, nan, nan, ..., nan, nan,
nan]],
[[ nan, nan, nan, ..., nan, nan,
nan],
[276.44998, 276.19998, 276.19 , ..., nan, nan,
nan],
[276.50998, 276.79 , 276.58 , ..., nan, nan,
nan],
...,
[ nan, nan, nan, ..., nan, nan,
nan],
[ nan, nan, nan, ..., nan, nan,
nan],
[ nan, nan, nan, ..., nan, nan,
nan]],
...,
[[ nan, nan, nan, ..., 276.38998, 276.44 ,
275.72998],
[ nan, nan, nan, ..., 276.55 , 276.81 ,
276.72998],
[ nan, nan, nan, ..., 279.74 , 277.11 ,
276.97 ],
...,
[ nan, nan, nan, ..., nan, nan,
nan],
[ nan, nan, nan, ..., nan, nan,
nan],
[ nan, nan, nan, ..., nan, nan,
nan]],
[[ nan, nan, nan, ..., 277.38 , 278.08 ,
277.79 ],
[279.66998, 280.00998, 283.13 , ..., 277.34 , 277.41998,
277.62 ],
[ nan, 277.41 , 277.41 , ..., 277.825 , 277.31 ,
277.52 ],
...,
[ nan, nan, nan, ..., 276.52 , nan,
nan],
[ nan, nan, nan, ..., nan, nan,
nan],
[ nan, nan, nan, ..., nan, nan,
nan]],
[[ nan, nan, nan, ..., nan, nan,
nan],
[ nan, nan, nan, ..., nan, nan,
nan],
[ nan, nan, nan, ..., nan, nan,
nan],
...,
[ nan, nan, nan, ..., nan, nan,
nan],
[ nan, nan, nan, ..., nan, nan,
nan],
[ nan, nan, nan, ..., nan, nan,
nan]]], dtype=float32)
interpn
不能用于填充规则网格中的间隙-interpn是将完整的规则网格(没有间隙(插值到不同坐标的快速方法。
为了用N维插值填充缺失值,对非结构化N维数据使用scipy插值方法。
由于您正在将插值到常规网格,我将演示scipy.interpolate.griddata
:的使用
import xarray as xr, pandas as pd, numpy as np, scipy.interpolate
# create dummy data
x = y = z = np.linspace(0, 1, 5)
da = xr.DataArray(
np.sin(x).reshape(-1, 1, 1) * np.cos(y).reshape(1, -1, 1) + z.reshape(1, 1, -1),
dims=['x', 'y', 'z'],
coords=[x, y, z],
)
# randomly fill with NaNs
da = da.where(np.random.random(size=da.shape) > 0.1)
这看起来像下面的
In [11]: da
Out[11]:
<xarray.DataArray (x: 5, y: 5, z: 5)>
array([[[0. , 0.25 , 0.5 , 0.75 , 1. ],
[0. , 0.25 , 0.5 , 0.75 , 1. ],
[0. , 0.25 , 0.5 , 0.75 , 1. ],
[0. , 0.25 , 0.5 , 0.75 , 1. ],
[0. , 0.25 , 0.5 , 0.75 , 1. ]],
[[0.24740396, 0.49740396, 0.74740396, 0.99740396, nan],
[ nan, 0.48971277, nan, 0.98971277, 1.23971277],
[0.2171174 , 0.4671174 , 0.7171174 , 0.9671174 , 1.2171174 ],
[0.18102272, 0.43102272, 0.68102272, 0.93102272, 1.18102272],
[ nan, 0.38367293, 0.63367293, 0.88367293, 1.13367293]],
[[0.47942554, nan, 0.97942554, 1.22942554, 1.47942554],
[0.46452136, 0.71452136, 0.96452136, 1.21452136, 1.46452136],
[0.42073549, 0.67073549, 0.92073549, 1.17073549, nan],
[0.35079033, 0.60079033, 0.85079033, 1.10079033, 1.35079033],
[ nan, 0.50903472, 0.75903472, nan, 1.25903472]],
[[0.68163876, 0.93163876, 1.18163876, 1.43163876, 1.68163876],
[0.66044826, 0.91044826, nan, 1.41044826, 1.66044826],
[0.59819429, 0.84819429, 1.09819429, 1.34819429, nan],
[ nan, 0.74874749, 0.99874749, 1.24874749, nan],
[0.36829099, 0.61829099, 0.86829099, 1.11829099, 1.36829099]],
[[0.84147098, 1.09147098, nan, 1.59147098, 1.84147098],
[0.81531169, 1.06531169, 1.31531169, 1.56531169, 1.81531169],
[0.73846026, 0.98846026, 1.23846026, 1.48846026, nan],
[0.61569495, 0.86569495, 1.11569495, 1.36569495, 1.61569495],
[0.45464871, 0.70464871, 0.95464871, nan, 1.45464871]]])
Coordinates:
* x (x) float64 0.0 0.25 0.5 0.75 1.0
* y (y) float64 0.0 0.25 0.5 0.75 1.0
* z (z) float64 0.0 0.25 0.5 0.75 1.0
要使用非结构化scipy插值器,必须将具有缺失值的网格数据转换为没有缺失值的1D点的矢量:
# ravel all points and find the valid ones
points = da.data.ravel()
valid = ~np.isnan(points)
points_valid = points[valid]
# construct arrays of (x, y, z) points, masked to only include the valid points
xx, yy, zz = np.meshgrid(x, y, z)
xx, yy, zz = xx.ravel(), yy.ravel(), zz.ravel()
xxv = xx[valid]
yyv = yy[valid]
zzv = zz[valid]
# feed these into the interpolator, and also provide the target grid
interpolated = scipy.interpolate.griddata(np.stack([xxv, yyv, zzv]).T, points_valid, (xx, yy, zz), method="linear")
# reshape to match the original array and replace the DataArray values with
# the interpolated data
da.values = interpolated.reshape(da.shape)
这导致阵列被填充
In [32]: da
Out[32]:
<xarray.DataArray (x: 5, y: 5, z: 5)>
array([[[0. , 0.25 , 0.5 , 0.75 , 1. ],
[0. , 0.25 , 0.5 , 0.75 , 1. ],
[0. , 0.25 , 0.5 , 0.75 , 1. ],
[0. , 0.25 , 0.5 , 0.75 , 1. ],
[0. , 0.25 , 0.5 , 0.75 , 1. ]],
[[0.24740396, 0.49740396, 0.74740396, 0.99740396, 1.23971277],
[0.23226068, 0.48971277, 0.73226068, 0.98971277, 1.23971277],
[0.2171174 , 0.4671174 , 0.7171174 , 0.9671174 , 1.2171174 ],
[0.18102272, 0.43102272, 0.68102272, 0.93102272, 1.18102272],
[0.12276366, 0.38367293, 0.63367293, 0.88367293, 1.13367293]],
[[0.47942554, 0.71452136, 0.97942554, 1.22942554, 1.47942554],
[0.46452136, 0.71452136, 0.96452136, 1.21452136, 1.46452136],
[0.42073549, 0.67073549, 0.92073549, 1.17073549, 1.40765584],
[0.35079033, 0.60079033, 0.85079033, 1.10079033, 1.35079033],
[0.24552733, 0.50903472, 0.75903472, 1.00903472, 1.25903472]],
[[0.68163876, 0.93163876, 1.18163876, 1.43163876, 1.68163876],
[0.66044826, 0.91044826, 1.16044826, 1.41044826, 1.66044826],
[0.59819429, 0.84819429, 1.09819429, 1.34819429, 1.57184545],
[0.48324264, 0.74874749, 0.99874749, 1.24874749, 1.48324264],
[0.36829099, 0.61829099, 0.86829099, 1.11829099, 1.36829099]],
[[0.84147098, 1.09147098, 1.34147098, 1.59147098, 1.84147098],
[0.81531169, 1.06531169, 1.31531169, 1.56531169, 1.81531169],
[0.73846026, 0.98846026, 1.23846026, 1.48846026, 1.71550332],
[0.61569495, 0.86569495, 1.11569495, 1.36569495, 1.61569495],
[0.45464871, 0.70464871, 0.95464871, 1.20464871, 1.45464871]]])
Coordinates:
* x (x) float64 0.0 0.25 0.5 0.75 1.0
* y (y) float64 0.0 0.25 0.5 0.75 1.0
* z (z) float64 0.0 0.25 0.5 0.75 1.0
请注意,这填充了整个阵列,因为可用点的凸包覆盖了整个阵列。如果不是这种情况,则可能需要使用最近邻居或将样条曲线拟合到填充数据的第二步。