Flatten/ravel/collapse a 3-dimensional xr.DataArray (Xarray) into 2 dimensions along an axis?

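I have a dataset in which I store replicates for different classes/subtypes (not sure what to call them), and then attributes for each one. Essentially, there are 5 subtypes/classes, 4 replicates for each subtype/class, and 100 measured attributes.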

Is there a method like np.ravel or np.flatten that can merge 2 dimensions using Xarray?

In this example, I want to merge the dims subtype and replicates so that I end up with a 2D array (or pd.DataFrame) of attributes vs. subtype/replicates.

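The merged labels don't need to have the format "coord_1|coord_2" or anything like that. It would be useful if the original coord names were kept. Maybe there is something like groupby that could do this? Groupby always confuses me, so it would be great if there were something native to xarray.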

import xarray as xr
import numpy as np
# Set up xr.DataArray
dims = (5,4,100)
DA_data = xr.DataArray(np.random.random(dims), dims=["subtype","replicates","attributes"])
DA_data.coords["subtype"] = ["subtype_%d"%_ for _ in range(dims[0])]
DA_data.coords["replicates"] = ["rep_%d"%_ for _ in range(dims[1])]
DA_data.coords["attributes"] = ["attr_%d"%_ for _ in range(dims[2])]
# DA_data.coords
# Coordinates:
#   * subtype     (subtype) <U9 'subtype_0' 'subtype_1' 'subtype_2' ...
#   * replicates  (replicates) <U5 'rep_0' 'rep_1' 'rep_2' 'rep_3'
#   * attributes  (attributes) <U7 'attr_0' 'attr_1' 'attr_2' 'attr_3' ...
# DA_data.dims
# ('subtype', 'replicates', 'attributes')
# Naive way to collapse the replicate dimension into the subtype dimension
desired_columns = list()
for subtype in DA_data.coords["subtype"]:
    for replicate in DA_data.coords["replicates"]:
        desired_columns.append(str(subtype.values) + "|" + str(replicate.values))
desired_columns
# ['subtype_0|rep_0',
#  'subtype_0|rep_1',
#  'subtype_0|rep_2',
#  'subtype_0|rep_3',
#  'subtype_1|rep_0',
#  'subtype_1|rep_1',
#  'subtype_1|rep_2',
#  'subtype_1|rep_3',
#  'subtype_2|rep_0',
#  'subtype_2|rep_1',
#  'subtype_2|rep_2',
#  'subtype_2|rep_3',
#  'subtype_3|rep_0',
#  'subtype_3|rep_1',
#  'subtype_3|rep_2',
#  'subtype_3|rep_3',
#  'subtype_4|rep_0',
#  'subtype_4|rep_1',
#  'subtype_4|rep_2',
#  'subtype_4|rep_3']

Yes, this is exactly what .stack is for:

In [33]: stacked = DA_data.stack(desired=['subtype', 'replicates'])
In [34]: stacked
Out[34]:
<xarray.DataArray (attributes: 100, desired: 20)>
array([[ 0.54020268,  0.14914837,  0.83398895, ...,  0.25986503,
         0.62520466,  0.08617668],
       [ 0.47021735,  0.10627027,  0.66666478, ...,  0.84392176,
         0.64461418,  0.4444864 ],
       [ 0.4065543 ,  0.59817851,  0.65033094, ...,  0.01747058,
         0.94414244,  0.31467342],
       ...,
       [ 0.23724934,  0.61742922,  0.97563316, ...,  0.62966631,
         0.89513904,  0.20139552],
       [ 0.21157447,  0.43868899,  0.77488211, ...,  0.98285015,
         0.24367352,  0.8061804 ],
       [ 0.21518079,  0.234854  ,  0.18294781, ...,  0.64679141,
         0.49678393,  0.32215219]])
Coordinates:
  * attributes  (attributes) |S7 'attr_0' 'attr_1' 'attr_2' 'attr_3' ...
  * desired     (desired) object ('subtype_0', 'rep_0') ...

The resulting stacked coordinate is a pandas.MultiIndex, whose values are given by tuples:

In [35]: stacked['desired'].values
Out[35]:
array([('subtype_0', 'rep_0'), ('subtype_0', 'rep_1'),
       ('subtype_0', 'rep_2'), ('subtype_0', 'rep_3'),
       ('subtype_1', 'rep_0'), ('subtype_1', 'rep_1'),
       ('subtype_1', 'rep_2'), ('subtype_1', 'rep_3'),
       ('subtype_2', 'rep_0'), ('subtype_2', 'rep_1'),
       ('subtype_2', 'rep_2'), ('subtype_2', 'rep_3'),
       ('subtype_3', 'rep_0'), ('subtype_3', 'rep_1'),
       ('subtype_3', 'rep_2'), ('subtype_3', 'rep_3'),
       ('subtype_4', 'rep_0'), ('subtype_4', 'rep_1'),
       ('subtype_4', 'rep_2'), ('subtype_4', 'rep_3')], dtype=object)
