Split "_"-delimited strings in a pandas Series into a variable number of fields, filling in the missing fields



In the following DataFrame, I want to split the Series "resource" into its components, which are separated by the character "_":

resource
MTUG1_ABO_DPP_1
MTUG1_ABO_DPP_2
MTUG1_ABO_DPP_3
MTUG1_ABO_DPP_4
MTUG1_ABO_DPP_5
MTUG1_ABO_DPU_1
MTUG1_ABO_DPU_2
MTUG1_ABO_DPU_3
MTUG1_ABO_UUB_VDU1_1
MTUG1_ABO_UUB_VDU1_2
MTUG1_ABO_UUB_VDU1_3
MTUG1_ABO_UUB_VDU2_1
MTUG1_ABO_UUB_VDU2_2

You can use boolean indexing to handle the two cases separately, checking whether vdu_num is NaN:

df.loc[~vdu_num.isna(), 'vm']=df['Unit'].str.cat(unit_num, sep="_").str.cat(vdu_num, sep="_")
df.loc[vdu_num.isna(), 'vm']=df['Unit'].str.cat(unit_num, sep="_")

This produces:

    resource              Node    Unit    unit_num      vdu_num  vm
--  --------------------  ------  ------  ----------  ---------  ----------
 0  MTUG1_ABO_DPP_1       MTUG1   DPP     1                 nan  DPP_1
 1  MTUG1_ABO_DPP_2       MTUG1   DPP     2                 nan  DPP_2
 2  MTUG1_ABO_DPP_3       MTUG1   DPP     3                 nan  DPP_3
 3  MTUG1_ABO_DPP_4       MTUG1   DPP     4                 nan  DPP_4
 4  MTUG1_ABO_DPP_5       MTUG1   DPP     5                 nan  DPP_5
 5  MTUG1_ABO_DPU_1       MTUG1   DPU     1                 nan  DPU_1
 6  MTUG1_ABO_DPU_2       MTUG1   DPU     2                 nan  DPU_2
 7  MTUG1_ABO_DPU_3       MTUG1   DPU     3                 nan  DPU_3
 8  MTUG1_ABO_UUB_VDU1_1  MTUG1   UUB     VDU1                1  UUB_VDU1_1
 9  MTUG1_ABO_UUB_VDU1_2  MTUG1   UUB     VDU1                2  UUB_VDU1_2
10  MTUG1_ABO_UUB_VDU1_3  MTUG1   UUB     VDU1                3  UUB_VDU1_3
11  MTUG1_ABO_UUB_VDU2_1  MTUG1   UUB     VDU2                1  UUB_VDU2_1
12  MTUG1_ABO_UUB_VDU2_2  MTUG1   UUB     VDU2                2  UUB_VDU2_2
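
For reference, here is a minimal self-contained sketch of the boolean-indexing approach. It assumes that unit_num and vdu_num are the fourth and fifth fields of the split, which matches the table above; the three-row sample frame and the names parts and has_vdu are only illustrative and do not come from the original post.

```python
import pandas as pd

# Small illustrative sample (assumption: same naming pattern as the question)
df = pd.DataFrame({
    "resource": ["MTUG1_ABO_DPP_1", "MTUG1_ABO_DPU_2", "MTUG1_ABO_UUB_VDU1_3"]
})

# expand=True pads rows that have fewer fields with missing values
parts = df["resource"].str.split("_", expand=True)
df["Node"] = parts[0]
df["Unit"] = parts[2]
unit_num = parts[3]
vdu_num = parts[4]  # missing for resources with only four fields

# Build 'vm' separately for rows with and without a fifth field
has_vdu = vdu_num.notna()
df.loc[has_vdu, "vm"] = (
    df["Unit"].str.cat(unit_num, sep="_").str.cat(vdu_num, sep="_")
)
df.loc[~has_vdu, "vm"] = df["Unit"].str.cat(unit_num, sep="_")

print(df[["resource", "vm"]])
```

The right-hand sides are computed for every row, but the assignment through df.loc aligns on the index, so only the rows selected by the mask are written.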

You can simplify the split (do it only once), and then use where to pick unit_num or vdu_num, depending on whether vdu_num is non-null:

df2 = (
    df['resource']
    .str.split('_', expand=True)[[0, 2, 3, 4]]
    .set_axis('Node Unit unit_num vdu_num'.split(), axis=1)
)
df2['vm'] = df2['Unit'].str.cat(
    df2['unit_num'].where(df2['vdu_num'].isnull(), df2['vdu_num']), sep='_')
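
To see what the where call selects, here is a small sketch on three hand-made values (the Series below are illustrative, not taken from the post). Series.where(cond, other) keeps the original value where cond is True and takes other elsewhere, so the expression above keeps unit_num only when vdu_num is missing.

```python
import pandas as pd

# Illustrative values for one 4-field-style row pair and one 5-field row
unit = pd.Series(["DPP", "DPU", "UUB"])
unit_num = pd.Series(["1", "2", "VDU1"])
vdu_num = pd.Series([None, None, "3"])

# Keep unit_num where vdu_num is missing, otherwise take vdu_num
chosen = unit_num.where(vdu_num.isnull(), vdu_num)
vm = unit.str.cat(chosen, sep="_")
print(vm.tolist())  # ['DPP_1', 'DPU_2', 'UUB_3']

# Possible variant (an assumption about the desired result): keep both
# parts when vdu_num exists, falling back to unit_num alone otherwise
suffix = unit_num.str.cat(vdu_num, sep="_").fillna(unit_num)
vm_full = unit.str.cat(suffix, sep="_")
print(vm_full.tolist())  # ['DPP_1', 'DPU_2', 'UUB_VDU1_3']
```

Note that with where, the five-field row becomes UUB_3, whereas the vm column in the table above shows the UUB_VDU1_3 form. If the longer form is what you need, the fillna variant is one way to get it.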

Or, if you prefer to overwrite the original df:

df = pd.concat([
    df['resource'],
    df['resource']
    .str.split('_', expand=True)[[0, 2, 3, 4]]
    .set_axis('Node Unit unit_num vdu_num'.split(), axis=1)
], axis=1)
df['vm'] = df['Unit'].str.cat(
    df['unit_num'].where(df['vdu_num'].isnull(), df['vdu_num']), sep='_')
