How to reshape an xarray dataset by collapsing a coordinate



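I currently have a dataset that, when opened with xarray, has three coordinates: x, y, band. The band coordinate holds temperature and dew point at each of 4 different time intervals, so there are 8 bands in total. Is there a way to reshape this so that I have x, y, band, time, where the band coordinate now has length 2 and the time coordinate has length 4?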

I figured I could add a new coordinate named time and then add the bands in, but

ds = ds.assign_coords(time=[1,2,3,4])

returns ValueError: cannot add coordinates with new dimensions to a DataArray
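The error arises because a DataArray coordinate can only use dimensions the array already has, and there is no time dimension yet. A minimal sketch that reproduces it, assuming a zero-filled array of the same shape as in the question (the array contents and the `raised` flag are illustrative, not from the original post):

```python
import numpy as np
import xarray as xr

# Stand-in for the data in the question: 8 bands and no "time" dimension.
da = xr.DataArray(np.zeros((4, 4, 8)), dims=["x", "y", "band"])

raised = False
try:
    # "time" would introduce a new dimension, which assign_coords cannot do.
    da.assign_coords(time=[1, 2, 3, 4])
except ValueError as err:
    raised = True
    print(err)
```

The coordinate has to be attached to an existing dimension (here, band) or the array has to gain the new dimension first.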

You can reassign the "band" coordinate to a MultiIndex:

In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: import xarray as xr
In [4]: da = xr.DataArray(np.random.random((4, 4, 8)), dims=['x', 'y', 'band'])
In [5]: da.coords['band'] = pd.MultiIndex.from_arrays(
...:     [
...:         [1, 1, 1, 1, 2, 2, 2, 2],
...:         pd.to_datetime(['2020-01-01', '2021-01-01', '2022-01-01', '2023-01-01'] * 2),
...:     ],
...:     names=['band_stacked', 'time'],
...: )
In [6]: da
Out[6]:
<xarray.DataArray (x: 4, y: 4, band: 8)>
array([[[2.55228052e-01, 6.71680777e-01, 8.76158643e-01, 5.23808010e-01,
8.56941412e-01, 2.75757101e-01, 7.88877551e-02, 1.54739786e-02],
[3.70350510e-01, 1.90604842e-02, 2.17871931e-01, 9.40704074e-01,
4.28769745e-02, 9.24407375e-01, 2.81715762e-01, 9.12889594e-01],
[7.36529770e-02, 1.53507827e-01, 2.83341417e-01, 3.00687140e-01,
7.41822972e-01, 6.82413237e-01, 7.92126231e-01, 4.84821281e-01],
[5.24897891e-01, 4.69537663e-01, 2.47668326e-01, 7.56147251e-02,
6.27767921e-01, 2.70630355e-01, 5.44669493e-01, 3.53063860e-01]],
...
[[1.56513994e-02, 8.49568142e-01, 3.67268562e-01, 7.28406400e-01,
2.82383223e-01, 5.00901504e-01, 9.99643260e-01, 1.16446139e-01],
[9.98980637e-01, 2.45060112e-02, 8.12423749e-01, 4.49895624e-01,
6.64880037e-01, 8.73506549e-01, 1.79186788e-01, 1.94347924e-01],
[6.32000394e-01, 7.60414128e-01, 4.90153658e-01, 3.40693056e-01,
5.19820559e-01, 4.49398587e-01, 1.90339730e-01, 6.38101614e-02],
[7.64102189e-01, 6.79961676e-01, 7.63165470e-01, 6.23766131e-02,
5.62677420e-01, 3.85784911e-01, 4.43436365e-01, 2.44385584e-01]]])
Coordinates:
* band          (band) MultiIndex
- band_stacked  (band) int64 1 1 1 1 2 2 2 2
- time          (band) datetime64[ns] 2020-01-01 2021-01-01 ... 2023-01-01
Dimensions without coordinates: x, y

Then you can expand the dimension by unstacking:

In [7]: da.unstack('band').rename({'band_stacked': 'band'})
Out[7]:
<xarray.DataArray (x: 4, y: 4, band: 2, time: 4)>
array([[[[2.55228052e-01, 6.71680777e-01, 8.76158643e-01,
5.23808010e-01],
[8.56941412e-01, 2.75757101e-01, 7.88877551e-02,
1.54739786e-02]],
...
[[7.64102189e-01, 6.79961676e-01, 7.63165470e-01,
6.23766131e-02],
[5.62677420e-01, 3.85784911e-01, 4.43436365e-01,
2.44385584e-01]]]])
Coordinates:
* band     (band) int64 1 2
* time     (time) datetime64[ns] 2020-01-01 2021-01-01 2022-01-01 2023-01-01
Dimensions without coordinates: x, y
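The session above can be collected into a single script. This is a sketch under a few assumptions: the seeded random data stands in for the real file, `pd.MultiIndex.from_product` is used as a shorthand for the same band-major ordering, and the `hasattr` branch is a guess at handling newer xarray releases, which warn when a pandas MultiIndex is assigned directly and offer `xr.Coordinates.from_pandas_multiindex` instead.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Illustrative data: 8 bands = 2 variables x 4 times, times varying fastest.
rng = np.random.default_rng(0)
data = rng.random((4, 4, 8))
da = xr.DataArray(data, dims=["x", "y", "band"])

times = pd.to_datetime(["2020-01-01", "2021-01-01", "2022-01-01", "2023-01-01"])
# Same ordering as the explicit arrays above: (1, t0..t3), then (2, t0..t3).
midx = pd.MultiIndex.from_product([[1, 2], times], names=["band_stacked", "time"])

if hasattr(xr, "Coordinates"):
    # Newer xarray: build the coordinates explicitly from the MultiIndex.
    da = da.assign_coords(xr.Coordinates.from_pandas_multiindex(midx, "band"))
else:
    # Older xarray: assign the MultiIndex directly, as in the session above.
    da = da.assign_coords(band=midx)

# Unstack the MultiIndex into two dimensions, then restore the name "band".
unstacked = da.unstack("band").rename({"band_stacked": "band"})
print(unstacked.dims, unstacked.shape)
```

The level is named band_stacked rather than band because a MultiIndex level cannot share the name of the dimension it lives on; the rename at the end puts the original name back.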

Another, more manual option is to reshape in numpy and create a new DataArray. Note that for larger arrays, this manual reshape is much faster:

In [8]: reshaped = xr.DataArray(
...:     da.data.reshape((4, 4, 2, 4)),
...:     dims=['x', 'y', 'band', 'time'],
...:     coords={
...:         'time': pd.to_datetime(['2020-01-01', '2021-01-01', '2022-01-01', '2023-01-01']),
...:         'band': [1, 2],
...:     },
...: )
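The manual reshape relies on the band axis being ordered with time varying fastest within each variable (bands 0 to 3 are the first variable, bands 4 to 7 the second). A small check of that mapping, assuming integer test data and a hypothetical helper `layout_matches` that is not part of the original answer:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Integer data makes it easy to see where each original band ends up.
da = xr.DataArray(np.arange(4 * 4 * 8).reshape(4, 4, 8), dims=["x", "y", "band"])

reshaped = xr.DataArray(
    da.data.reshape((4, 4, 2, 4)),
    dims=["x", "y", "band", "time"],
    coords={
        "time": pd.to_datetime(["2020-01-01", "2021-01-01", "2022-01-01", "2023-01-01"]),
        "band": [1, 2],
    },
)

def layout_matches(original, result, n_time=4):
    """Return True if original band k landed at (band=k // n_time, time=k % n_time)."""
    for k in range(original.sizes["band"]):
        b, t = divmod(k, n_time)
        if not np.array_equal(
            result.isel(band=b, time=t).values, original.isel(band=k).values
        ):
            return False
    return True

print(layout_matches(da, reshaped))
```

If the file instead interleaves the variables (temperature, dew point, temperature, ...), the reshape would need to be `(4, 4, 4, 2)` with dims `['x', 'y', 'time', 'band']`, optionally followed by a transpose.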

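Note that if your data is chunked (and assuming you'd like to keep it that way), your options are a bit more limited; see the dask docs on reshaping dask arrays. The first (MultiIndex unstack) approach does work with dask arrays, as long as the arrays are not chunked along the unstacked dimension. See this question for an example.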
